Attractor Metafeatures and Their Application in Biomolecular Data Analysis

Total Page:16

File Type:pdf, Size:1020Kb

Attractor Metafeatures and Their Application in Biomolecular Data Analysis Attractor Metafeatures and Their Application in Biomolecular Data Analysis Tai-Hsien Ou Yang Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences COLUMBIA UNIVERSITY 2018 ©2018 Tai-Hsien Ou Yang All rights reserved ABSTRACT Attractor Metafeatures and Their Application in Biomolecular Data Analysis Tai-Hsien Ou Yang This dissertation proposes a family of algorithms for deriving signatures of mutually associated features, to which we refer as attractor metafeatures, or simply attractors. Specifically, we present multi-cancer attractor derivation algorithms, identifying correlated features in signatures from multiple biological data sets in one analysis, as well as the groups of samples or cells that exclusively express these signatures. Our results demonstrate that these signatures can be used, in proper combinations, as biomarkers that predict a patient’s survival rate, based on the transcriptome of the tumor sample. They can also be used as features to analyze the composition of the tumor. Through analyzing large data sets of 18 cancer types and three high-throughput platforms from The Cancer Genome Atlas (TCGA) PanCanAtlas Project and multiple single-cell RNA-seq data sets, we identified novel cancer attractor signatures and elucidated the identity of the cells that express these signatures. Using these signatures, we developed a prognostic biomarker for breast cancer called the Breast Cancer Attractor Metagenes (BCAM) biomarker as well as a software platform to analyze the tumor sample, called Analysis of the Single-Cell Omics for Tumor (ASCOT). Contents List of Figures ................................................................................................................................. ii List of Tables ................................................................................................................................. iii Acknowledgements ........................................................................................................................ iv Chapter 1 Introduction .................................................................................................................... 1 1.1 Background .......................................................................................................................... 1 1.2 Prior Work on Cancer Signature Identification .................................................................... 6 1.3 Contributions of the Thesis ................................................................................................ 14 Chapter 2 Derivation of the Attractor Metafeatures ..................................................................... 17 2.1 Introduction ........................................................................................................................ 17 2.2 Definitions .......................................................................................................................... 17 2.3 Implementation ................................................................................................................... 32 2.4 Comparison with Previous Work ....................................................................................... 33 Chapter 3 Attractor Biological Signatures .................................................................................... 37 3.1 Introduction ........................................................................................................................ 37 3.2 Multicancer Signatures ....................................................................................................... 37 3.3 Single-Cell-Attractor Signatures ........................................................................................ 53 Chapter 4 Prognostic Cancer Biomarker ...................................................................................... 61 4.1 Introduction ........................................................................................................................ 61 4.2 Methods .............................................................................................................................. 62 4.3 Results ................................................................................................................................ 72 4.4 Discussion .......................................................................................................................... 79 Chapter 5 Tumor Analysis Using Single-Cell Transcriptomes .................................................... 85 5.1 Introduction ........................................................................................................................ 85 5.2 Methods .............................................................................................................................. 88 5.3 Results ................................................................................................................................ 91 5.4 Discussion .......................................................................................................................... 96 Chapter 6 Conclusions and Future Work ...................................................................................... 98 6.1 Single-Cell Epigenomic-Attractor Signatures .................................................................... 99 6.2 Reconstruction of the evolution and the development processes of the cell populations 100 6.3 Predictive markers for precision medicine in cancer ....................................................... 101 Bibliography ............................................................................................................................... 103 Appendix A Comparison of the Consensus Attractors and the Attractor Clusters ..................... 117 Appendix B The Consensus Attractors in the TCGA PanCanAtlas Data Sets ........................... 118 Appendix C Significance of Consistency of the Attractor Signatures ........................................ 123 Appendix D The Single Cell Consensus Attractors .................................................................... 125 Appendix E Interrelationships and Concordance Indices of the Attractor Signatures ............... 126 Appendix F Cell-Specific Expression of the Single Cell Consensus Attractors ........................ 130 i List of Figures Figure 2.1 (A) Strength and (B) number of attractors against exponent 휶 for the consensus- attractor derivation method in TCGA bulk data sets. ............................................................30 Figure 2.2 (A) Strength and (B) number of attractors against exponent 휶 for the consensus- attractor derivation method in scRNA-seq data sets. .............................................................30 Figure 3.1 Scatter plots of(A) CD53, SASH3, IL10RA in 18 TCGA data sets, (B) Leukocyte, M+, M- attractor metafeatures in 17 TCGA data sets ...........................................................45 Figure 3.2 Scatter plots of (A) the T-cell signature and (B) the B-cell signature against the mitosis signature. ...................................................................................................................57 Figure 3.3 Top hits of the MSigDB gene-set enrichment analysis against the top genes of the stem-like signature. ................................................................................................................59 Figure 3.4 Scatter plots of the stem-like signature against the fibroblast signature (A) in the melanoma data set and (B) in the breast cancer data set. .......................................................60 Figure 3.5 Boxplots of the expression levels of the stem-like signature by type of cells in (A) the melanoma data set and in (B) the breast cancer data set. .................................................60 Figure 4.1 Screenshot of the feature-selector facility. ...................................................................66 Figure 4.2 Scatter plot of the YAP1 and the SMAD6 proteins in the AML patients. ...................70 Figure 4.3 Scores of the optimal combinations of different numbers of the features. ...................74 Figure 4.4 Estimated breast cancer-specific ten-year survival rate as a function of the BCAM score. ......................................................................................................................................76 Figure 4.5 Kaplan–Meier plots of (A) FGD3-SUSD3 and (B) ESR1 in breast cancer data sets. ..78 Figure 4.6 Frequency distributions of BCAM scores for subgroups of ER-negative breast cancer cases. ...........................................................................................................................82 Figure 5.1 Relationships of the exponent for metacell discovery and the number of discovered metacells in the annotated scRNA-seq data set GSE36552.................................90 Figure 5.2 t-SNE plot of the classification result of the samples from (A) melanoma patient CY79 and (B) melanoma patient CY84. ................................................................................93 Figure 5.3 Boxplots of the stem-like signatures by the type of cells in (A) the cells from patient CY79 and (B) the cells from patient CY84 of the melanoma data set. ......................94 Figure 5.4 t-SNE plot of the cell populations in the breast cancer data set. ..................................96 ii List of Tables Table 3.1 List
Recommended publications
  • Molecular Cytogenetic Analysis for TFE3 Rearrangement In
    Modern Pathology (2014) 27, 113–127 & 2014 USCAP, Inc. All rights reserved 0893-3952/14 $32.00 113 Molecular cytogenetic analysis for TFE3 rearrangement in Xp11.2 renal cell carcinoma and alveolar soft part sarcoma: validation and clinical experience with 75 cases Jennelle C Hodge, Kathryn E Pearce, Xiaoke Wang, Anne E Wiktor, Andre M Oliveira and Patricia T Greipp Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA Renal cell carcinoma with TFE3 rearrangement at Xp11.2 is a distinct subtype manifesting an indolent clinical course in children, with recent reports suggesting a more aggressive entity in adults. This subtype is morphologically heterogeneous and can be misclassified as clear cell or papillary renal cell carcinoma. TFE3 is also rearranged in alveolar soft part sarcoma. To aid in diagnosis, a break-apart strategy fluorescence in situ hybridization (FISH) probe set specific for TFE3 rearrangement and a reflex dual-color, single-fusion strategy probe set involving the most common TFE3 partner gene, ASPSCR1, were validated on formalin-fixed, paraffin- embedded tissues from nine alveolar soft part sarcoma, two suspected Xp11.2 renal cell carcinoma, and nine tumors in the differential diagnosis. The impact of tissue cut artifact was reduced through inclusion of a chromosome X centromere control probe. Analysis of the UOK-109 renal carcinoma cell line confirmed the break-apart TFE3 probe set can distinguish the subtle TFE3/NONO fusion-associated inversion of chromosome X. Subsequent extensive clinical experience was gained through analysis of 75 cases with an indication of Xp11.2 renal cell carcinoma (n ¼ 54), alveolar soft part sarcoma (n ¼ 13), perivascular epithelioid cell neoplasms (n ¼ 2), chordoma (n ¼ 1), or unspecified (n ¼ 5).
    [Show full text]
  • Rap-3971 Alveolar Soft Part Sarcoma Chromosome
    DATA SHEET Alveolar Soft Part Sarcoma Chromosome Region, Candidate 1 Human Item Number rAP-3971 Synonyms ASPCR1, ASPL, ASPS, RCC17, TUG, UBXD9, UBXN9, Tether containing UBX domain for GLUT4, Alveo- lar soft part sarcoma chromosomal region candidate gene 1 protein, Alveolar soft part sarcoma locus, Re- nal papillary cell carcinoma protein 17, UBX domain-containin Description ASPSCR1 Human Recombinant produced in E. coli is a single polypeptide chain containing 576 amino acids (1-553) and having a molecular mass of 62.6kDa. ASPSCR1 is fused to a 23 amino acid His-tag at N- terminus & purified by proprietary chromatographic techniques. Uniprot Accesion Number Q9BZE9 Amino Acid Sequence MGSSHHHHHH SSGLVPRGSH MGSMAAPAGG GGSAVSVLAP NGRRHTVKVT PSTVLLQVLE DTCRRQDFNP CEYDLKFQRS VLDLSLQWRF ANLPNNAKLE MVPASRSREG PENMVRIALQ LDDGS- RLQDS FCSGQTLWEL LSHFPQIREC LQHPGGATPV CVYTRDEVTG EAALRGTTLQ SLGLTGGSAT IRFVMKCYDP VGKTPGSLGS SASAGQAAAS APLPLESGEL SRGDLSRPED ADTSGPCCEH TQEKQSTRAP AAAPFVPFSG GGQRLGGPPG PTRPLTSSSA KLPKSLSSPG GPSKPKKSKS GQDPQQEQEQ ERERDPQQEQ ERERPVDREP VDREPVVCHP DLEERLQAWP AELPDEFFEL Source Escherichia Coli. Physical Appearance Sterile Filtered clear solution. Store at 4°C if entire vial will be used within 2-4 weeks. Store, frozen at -20°C and Stability for longer periods of time. For long term storage it is recommended to add a carrier protein (0.1% HSA or BSA).Avoid multiple freeze-thaw cycles. Formulation and Purity The ASPSCR1 solution (0.25mg/ml) contains 20mM Tris-HCl buffer (pH 8.0), 0.15M NaCl, 10% glycerol and 1mM DTT. Greater than 85% as determined by SDS-PAGE. Application Solubility Biological Activity Shipping Format and Condition Lyophilized powder at room temperature. Optimal dilutions should be determined by each laboratory for each application.
    [Show full text]
  • The Expression of the Human Apolipoprotein Genes and Their Regulation by Ppars
    CORE Metadata, citation and similar papers at core.ac.uk Provided by UEF Electronic Publications The expression of the human apolipoprotein genes and their regulation by PPARs Juuso Uski M.Sc. Thesis Biochemistry Department of Biosciences University of Kuopio June 2008 Abstract The expression of the human apolipoprotein genes and their regulation by PPARs. UNIVERSITY OF KUOPIO, the Faculty of Natural and Environmental Sciences, Curriculum of Biochemistry USKI Juuso Oskari Thesis for Master of Science degree Supervisors Prof. Carsten Carlberg, Ph.D. Merja Heinäniemi, Ph.D. June 2008 Keywords: nuclear receptors; peroxisome proliferator-activated receptor; PPAR response element; apolipoprotein; lipid metabolism; high density lipoprotein; low density lipoprotein. Lipids are any fat-soluble, naturally-occurring molecules and one of their main biological functions is energy storage. Lipoproteins carry hydrophobic lipids in the water and salt-based blood environment for processing and energy supply in liver and other organs. In this study, the genomic area around the apolipoprotein genes was scanned in silico for PPAR response elements (PPREs) using the in vitro data-based computer program. Several new putative REs were found in surroundings of multiple lipoprotein genes. The responsiveness of those apolipoprotein genes to the PPAR ligands GW501516, rosiglitazone and GW7647 in the HepG2, HEK293 and THP-1 cell lines were tested with real-time PCR. The APOA1, APOA2, APOB, APOD, APOE, APOF, APOL1, APOL3, APOL5 and APOL6 genes were found to be regulated by PPARs in direct or secondary manners. Those results provide new insights in the understanding of lipid metabolism and so many lifestyle diseases like atherosclerosis, type 2 diabetes, heart disease and stroke.
    [Show full text]
  • Seq2pathway Vignette
    seq2pathway Vignette Bin Wang, Xinan Holly Yang, Arjun Kinstlick May 19, 2021 Contents 1 Abstract 1 2 Package Installation 2 3 runseq2pathway 2 4 Two main functions 3 4.1 seq2gene . .3 4.1.1 seq2gene flowchart . .3 4.1.2 runseq2gene inputs/parameters . .5 4.1.3 runseq2gene outputs . .8 4.2 gene2pathway . 10 4.2.1 gene2pathway flowchart . 11 4.2.2 gene2pathway test inputs/parameters . 11 4.2.3 gene2pathway test outputs . 12 5 Examples 13 5.1 ChIP-seq data analysis . 13 5.1.1 Map ChIP-seq enriched peaks to genes using runseq2gene .................... 13 5.1.2 Discover enriched GO terms using gene2pathway_test with gene scores . 15 5.1.3 Discover enriched GO terms using Fisher's Exact test without gene scores . 17 5.1.4 Add description for genes . 20 5.2 RNA-seq data analysis . 20 6 R environment session 23 1 Abstract Seq2pathway is a novel computational tool to analyze functional gene-sets (including signaling pathways) using variable next-generation sequencing data[1]. Integral to this tool are the \seq2gene" and \gene2pathway" components in series that infer a quantitative pathway-level profile for each sample. The seq2gene function assigns phenotype-associated significance of genomic regions to gene-level scores, where the significance could be p-values of SNPs or point mutations, protein-binding affinity, or transcriptional expression level. The seq2gene function has the feasibility to assign non-exon regions to a range of neighboring genes besides the nearest one, thus facilitating the study of functional non-coding elements[2]. Then the gene2pathway summarizes gene-level measurements to pathway-level scores, comparing the quantity of significance for gene members within a pathway with those outside a pathway.
    [Show full text]
  • Two Locus Inheritance of Non-Syndromic Midline Craniosynostosis Via Rare SMAD6 and 4 Common BMP2 Alleles 5 6 Andrew T
    1 2 3 Two locus inheritance of non-syndromic midline craniosynostosis via rare SMAD6 and 4 common BMP2 alleles 5 6 Andrew T. Timberlake1-3, Jungmin Choi1,2, Samir Zaidi1,2, Qiongshi Lu4, Carol Nelson- 7 Williams1,2, Eric D. Brooks3, Kaya Bilguvar1,5, Irina Tikhonova5, Shrikant Mane1,5, Jenny F. 8 Yang3, Rajendra Sawh-Martinez3, Sarah Persing3, Elizabeth G. Zellner3, Erin Loring1,2,5, Carolyn 9 Chuang3, Amy Galm6, Peter W. Hashim3, Derek M. Steinbacher3, Michael L. DiLuna7, Charles 10 C. Duncan7, Kevin A. Pelphrey8, Hongyu Zhao4, John A. Persing3, Richard P. Lifton1,2,5,9 11 12 1Department of Genetics, Yale University School of Medicine, New Haven, CT, USA 13 2Howard Hughes Medical Institute, Yale University School of Medicine, New Haven, CT, USA 14 3Section of Plastic and Reconstructive Surgery, Department of Surgery, Yale University School of Medicine, New Haven, CT, USA 15 4Department of Biostatistics, Yale University School of Medicine, New Haven, CT, USA 16 5Yale Center for Genome Analysis, New Haven, CT, USA 17 6Craniosynostosis and Positional Plagiocephaly Support, New York, NY, USA 18 7Department of Neurosurgery, Yale University School of Medicine, New Haven, CT, USA 19 8Child Study Center, Yale University School of Medicine, New Haven, CT, USA 20 9The Rockefeller University, New York, NY, USA 21 22 ABSTRACT 23 Premature fusion of the cranial sutures (craniosynostosis), affecting 1 in 2,000 24 newborns, is treated surgically in infancy to prevent adverse neurologic outcomes. To 25 identify mutations contributing to common non-syndromic midline (sagittal and metopic) 26 craniosynostosis, we performed exome sequencing of 132 parent-offspring trios and 59 27 additional probands.
    [Show full text]
  • Supplementary Information Integrative Analyses of Splicing in the Aging Brain: Role in Susceptibility to Alzheimer’S Disease
    Supplementary Information Integrative analyses of splicing in the aging brain: role in susceptibility to Alzheimer’s Disease Contents 1. Supplementary Notes 1.1. Religious Orders Study and Memory and Aging Project 1.2. Mount Sinai Brain Bank Alzheimer’s Disease 1.3. CommonMind Consortium 1.4. Data Availability 2. Supplementary Tables 3. Supplementary Figures Note: Supplementary Tables are provided as separate Excel files. 1. Supplementary Notes 1.1. Religious Orders Study and Memory and Aging Project Gene expression data1. Gene expression data were generated using RNA- sequencing from Dorsolateral Prefrontal Cortex (DLPFC) of 540 individuals, at an average sequence depth of 90M reads. Detailed description of data generation and processing was previously described2 (Mostafavi, Gaiteri et al., under review). Samples were submitted to the Broad Institute’s Genomics Platform for transcriptome analysis following the dUTP protocol with Poly(A) selection developed by Levin and colleagues3. All samples were chosen to pass two initial quality filters: RNA integrity (RIN) score >5 and quantity threshold of 5 ug (and were selected from a larger set of 724 samples). Sequencing was performed on the Illumina HiSeq with 101bp paired-end reads and achieved coverage of 150M reads of the first 12 samples. These 12 samples will serve as a deep coverage reference and included 2 males and 2 females of nonimpaired, mild cognitive impaired, and Alzheimer's cases. The remaining samples were sequenced with target coverage of 50M reads; the mean coverage for the samples passing QC is 95 million reads (median 90 million reads). The libraries were constructed and pooled according to the RIN scores such that similar RIN scores would be pooled together.
    [Show full text]
  • A Computational Approach for Defining a Signature of Β-Cell Golgi Stress in Diabetes Mellitus
    Page 1 of 781 Diabetes A Computational Approach for Defining a Signature of β-Cell Golgi Stress in Diabetes Mellitus Robert N. Bone1,6,7, Olufunmilola Oyebamiji2, Sayali Talware2, Sharmila Selvaraj2, Preethi Krishnan3,6, Farooq Syed1,6,7, Huanmei Wu2, Carmella Evans-Molina 1,3,4,5,6,7,8* Departments of 1Pediatrics, 3Medicine, 4Anatomy, Cell Biology & Physiology, 5Biochemistry & Molecular Biology, the 6Center for Diabetes & Metabolic Diseases, and the 7Herman B. Wells Center for Pediatric Research, Indiana University School of Medicine, Indianapolis, IN 46202; 2Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN, 46202; 8Roudebush VA Medical Center, Indianapolis, IN 46202. *Corresponding Author(s): Carmella Evans-Molina, MD, PhD ([email protected]) Indiana University School of Medicine, 635 Barnhill Drive, MS 2031A, Indianapolis, IN 46202, Telephone: (317) 274-4145, Fax (317) 274-4107 Running Title: Golgi Stress Response in Diabetes Word Count: 4358 Number of Figures: 6 Keywords: Golgi apparatus stress, Islets, β cell, Type 1 diabetes, Type 2 diabetes 1 Diabetes Publish Ahead of Print, published online August 20, 2020 Diabetes Page 2 of 781 ABSTRACT The Golgi apparatus (GA) is an important site of insulin processing and granule maturation, but whether GA organelle dysfunction and GA stress are present in the diabetic β-cell has not been tested. We utilized an informatics-based approach to develop a transcriptional signature of β-cell GA stress using existing RNA sequencing and microarray datasets generated using human islets from donors with diabetes and islets where type 1(T1D) and type 2 diabetes (T2D) had been modeled ex vivo. To narrow our results to GA-specific genes, we applied a filter set of 1,030 genes accepted as GA associated.
    [Show full text]
  • An Interaction Map of Circulating Metabolites, Immune Gene Networks, and Their Genetic Regulation Artika P
    Nath et al. Genome Biology (2017) 18:146 DOI 10.1186/s13059-017-1279-y RESEARCH Open Access An interaction map of circulating metabolites, immune gene networks, and their genetic regulation Artika P. Nath1,2, Scott C. Ritchie2,3, Sean G. Byars3,4, Liam G. Fearnley3,4, Aki S. Havulinna5,6, Anni Joensuu5, Antti J. Kangas7, Pasi Soininen7,8, Annika Wennerström5, Lili Milani9, Andres Metspalu9, Satu Männistö5, Peter Würtz7,10, Johannes Kettunen5,7,8,11, Emma Raitoharju12, Mika Kähönen13, Markus Juonala14,15, Aarno Palotie6,16,17,18, Mika Ala-Korpela7,8,11,19,20, Samuli Ripatti6,21, Terho Lehtimäki12, Gad Abraham2,3,4, Olli Raitakari22,23, Veikko Salomaa5, Markus Perola5,6,9 and Michael Inouye1,2,3,4* Abstract Background: Immunometabolism plays a central role in many cardiometabolic diseases. However, a robust map of immune-related gene networks in circulating human cells, their interactions with metabolites, and their genetic control is still lacking. Here, we integrate blood transcriptomic, metabolomic, and genomic profiles from two population-based cohorts (total N = 2168), including a subset of individuals with matched multi-omic data at 7-year follow-up. Results: We identify topologically replicable gene networks enrichedfordiverseimmunefunctions including cytotoxicity, viral response, B cell, platelet, neutrophil, and mast cell/basophil activity. These immune gene modules show complex patterns of association with 158 circulating metabolites, including lipoprotein subclasses, lipids, fatty acids, amino acids, small molecules, and CRP. Genome-wide scans for module expression quantitative trait loci (mQTLs) reveal five modules with mQTLs that have both cis and trans effects. The strongest mQTL is in ARHGEF3 (rs1354034) and affects a module enriched for platelet function, independent of platelet counts.
    [Show full text]
  • Growth and Molecular Profile of Lung Cancer Cells Expressing Ectopic LKB1: Down-Regulation of the Phosphatidylinositol 3؅-Phosphate Kinase/PTEN Pathway1
    [CANCER RESEARCH 63, 1382–1388, March 15, 2003] Growth and Molecular Profile of Lung Cancer Cells Expressing Ectopic LKB1: Down-Regulation of the Phosphatidylinositol 3؅-Phosphate Kinase/PTEN Pathway1 Ana I. Jimenez, Paloma Fernandez, Orlando Dominguez, Ana Dopazo, and Montserrat Sanchez-Cespedes2 Molecular Pathology Program [A. I. J., P. F., M. S-C.], Genomics Unit [O. D.], and Microarray Analysis Unit [A. D.], Spanish National Cancer Center, 28029 Madrid, Spain ABSTRACT the cell cycle in G1 (8, 9). However, the intrinsic mechanism by which LKB1 activity is regulated in cells and how it leads to the suppression Germ-line mutations in LKB1 gene cause the Peutz-Jeghers syndrome of cell growth is still unknown. It has been proposed that growth (PJS), a genetic disease with increased risk of malignancies. Recently, suppression by LKB1 is mediated through p21 in a p53-dependent LKB1-inactivating mutations have been identified in one-third of sporadic lung adenocarcinomas, indicating that LKB1 gene inactivation is critical in mechanism (7). In addition, it has been observed that LKB1 binds to tumors other than those of the PJS syndrome. However, the in vivo brahma-related gene 1 protein (BRG1) and this interaction is required substrates of LKB1 and its role in cancer development have not been for BRG1-induced growth arrest (10). Similar to what happens in the completely elucidated. Here we show that overexpression of wild-type PJS, Lkb1 heterozygous knockout mice show gastrointestinal hamar- LKB1 protein in A549 lung adenocarcinomas cells leads to cell-growth tomatous polyposis and frequent hepatocellular carcinomas (11, 12). suppression. To examine changes in gene expression profiles subsequent to Interestingly, the hamartomas, but not the malignant tumors, arising in exogenous wild-type LKB1 in A549 cells, we used cDNA microarrays.
    [Show full text]
  • Triplet Repeat Length Bias and Variation in the Human Transcriptome
    Triplet repeat length bias and variation in the human transcriptome Michael Mollaa,1,2, Arthur Delcherb,1, Shamil Sunyaevc, Charles Cantora,d,2, and Simon Kasifa,e aDepartment of Biomedical Engineering and dCenter for Advanced Biotechnology, Boston University, Boston, MA 02215; bCenter for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742; cDepartment of Medicine, Division of Genetics, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115; and eCenter for Advanced Genomic Technology, Boston University, Boston, MA 02215 Contributed by Charles Cantor, July 6, 2009 (sent for review May 4, 2009) Length variation in short tandem repeats (STRs) is an important family including Huntington’s disease (10) and hereditary ataxias (11, 12). of DNA polymorphisms with numerous applications in genetics, All Huntington’s patients exhibit an expanded number of copies in medicine, forensics, and evolutionary analysis. Several major diseases the CAG tandem repeat subsequence in the N terminus of the have been associated with length variation of trinucleotide (triplet) huntingtin gene. Moreover, an increase in the repeat length is repeats including Huntington’s disease, hereditary ataxias and spi- anti-correlated to the onset age of the disease (13). Multiple other nobulbar muscular atrophy. Using the reference human genome, we diseases have also been associated with copy number variation of have catalogued all triplet repeats in genic regions. This data revealed tandem repeats (8, 14). Researchers have hypothesized that inap- a bias in noncoding DNA repeat lengths. It also enabled a survey of propriate repeat variation in coding regions could result in toxicity, repeat-length polymorphisms (RLPs) in human genomes and a com- incorrect folding, or aggregation of a protein.
    [Show full text]
  • Toxicogenomics Applications of New Functional Genomics Technologies in Toxicology
    \-\w j Toxicogenomics Applications of new functional genomics technologies in toxicology Wilbert H.M. Heijne Proefschrift ter verkrijging vand egraa dva n doctor opgeza gva nd e rector magnificus vanWageninge n Universiteit, Prof.dr.ir. L. Speelman, in netopenbaa r te verdedigen op maandag6 decembe r200 4 des namiddagst e half twee ind eAul a - Table of contents Abstract Chapter I. page 1 General introduction [1] Chapter II page 21 Toxicogenomics of bromobenzene hepatotoxicity: a combined transcriptomics and proteomics approach[2] Chapter III page 48 Bromobenzene-induced hepatotoxicity atth etranscriptom e level PI Chapter IV page 67 Profiles of metabolites and gene expression in rats with chemically induced hepatic necrosis[4] Chapter V page 88 Liver gene expression profiles in relation to subacute toxicity in rats exposed to benzene[5] Chapter VI page 115 Toxicogenomics analysis of liver gene expression in relation to subacute toxicity in rats exposed totrichloroethylen e [6] Chapter VII page 135 Toxicogenomics analysis ofjoin t effects of benzene and trichloroethylene mixtures in rats m Chapter VII page 159 Discussion and conclusions References page 171 Appendices page 187 Samenvatting page 199 Dankwoord About the author Glossary Abbreviations List of genes Chapter I General introduction Parts of this introduction were publishedin : Molecular Biology in Medicinal Chemistry, Heijne etal., 2003 m NATO Advanced Research Workshop proceedings, Heijne eral., 2003 81 Chapter I 1. General introduction 1.1 Background /.1.1 Toxicologicalrisk
    [Show full text]
  • WO 2019/079361 Al 25 April 2019 (25.04.2019) W 1P O PCT
    (12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) (19) World Intellectual Property Organization I International Bureau (10) International Publication Number (43) International Publication Date WO 2019/079361 Al 25 April 2019 (25.04.2019) W 1P O PCT (51) International Patent Classification: CA, CH, CL, CN, CO, CR, CU, CZ, DE, DJ, DK, DM, DO, C12Q 1/68 (2018.01) A61P 31/18 (2006.01) DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN, C12Q 1/70 (2006.01) HR, HU, ID, IL, IN, IR, IS, JO, JP, KE, KG, KH, KN, KP, KR, KW, KZ, LA, LC, LK, LR, LS, LU, LY, MA, MD, ME, (21) International Application Number: MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, PCT/US2018/056167 OM, PA, PE, PG, PH, PL, PT, QA, RO, RS, RU, RW, SA, (22) International Filing Date: SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ, TM, TN, 16 October 2018 (16. 10.2018) TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW. (25) Filing Language: English (84) Designated States (unless otherwise indicated, for every kind of regional protection available): ARIPO (BW, GH, (26) Publication Language: English GM, KE, LR, LS, MW, MZ, NA, RW, SD, SL, ST, SZ, TZ, (30) Priority Data: UG, ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, RU, TJ, 62/573,025 16 October 2017 (16. 10.2017) US TM), European (AL, AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, ΓΕ , IS, IT, LT, LU, LV, (71) Applicant: MASSACHUSETTS INSTITUTE OF MC, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM, TECHNOLOGY [US/US]; 77 Massachusetts Avenue, TR), OAPI (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, Cambridge, Massachusetts 02139 (US).
    [Show full text]