Identifying Tissue Specific Distal Regulatory Sequences in the Mouse Genome


by Julie Chih-yu Chen

A thesis submitted in conformity with the requirements for the degree of Master
Cell and Systems Biology
University of Toronto

© Copyright by Julie Chih-yu Chen 2011

Abstract

Epigenetic modifications, transcription factor (TF) availability and chromatin conformation influence how a genome is interpreted by the transcriptional machinery responsible for gene expression. Enhancers buried in non-coding regions are associated with significant differences in histone marks between different cell types. In contrast, gene promoters show more uniform modifications across cell types. In this report, enhancer identification is first carried out using an enhancer-associated feature in mouse erythroid cells. Taking advantage of public domain ChIP-Seq data sets in mouse embryonic stem cells, an integrative model is then used to assess features in enhancer prediction, and subsequently to locate enhancers. Significant associations with multiple TF-bound loci, higher expression in the closest genes, and active enhancer marks support the functionality and tissue specificity of these enhancers. Motif enrichment analysis further determines known and novel TFs regulating the target cell type. Furthermore, the features identified can facilitate more accurate enhancer prediction in other cell types.

“You cannot open a book without learning something.”
Confucius

Acknowledgments

My time at U of T has been a life-changing experience. As a result of the environment and my commitment during this time, my desire to pursue research has never been more evident.
Much more than I set out to achieve, I have expanded my horizons and developed diverse sets of skills both in research and in personal growth.

I would like to thank my supervisor, Dr. Jennifer Mitchell, for the introduction to the research, the opportunity to learn wet lab techniques, the experience, and the frequent discussions that shaped the biological framework of the study. I am particularly grateful for her guidance on biological interpretations and her help in editing my NSERC application. I would also like to thank Dr. Quaid Morris for computational guidance in the mouse embryonic stem cell project, and for the discussions that enriched my research experience. I also thank both of my committee members, Dr. Sue Varmuza and Dr. Nicholas Provart, for their input on my study and all their help with the completion of my graduate study. I am especially thankful to Dr. Nicholas Provart for the thorough editing of my thesis.

I am very grateful to the government funding agencies, OGS and NSERC, for the awards that supported my research, and to the conference organizations, BioC2010 and CDB Symposium 2011, for the travel fellowships, which allowed me not only to present my posters but also to interact with other researchers and broaden my scope of knowledge.

I would like to thank my lab mates, Anandi Bhattacharya and Mike Schwartz, and Jessica Yang and Dr. Yunchen Gong from the Guttman lab, for the discussions on biological and bioinformatics research, and for being there to brighten up the days. I also thank Dr. Paul Boutros for his final and very helpful input, and Dr. Ieuan Clay for the opportunity to participate in one of his projects. Furthermore, many of the bioinformatics and statistical skills applied in this thesis were acquired during periods when I was supervised by Dr. Chao A. Hsiung, Dr. I-Shou Chang and Dr. Von-Wun Soo. I am deeply grateful for their influence, and for the opportunities to learn and develop these skills.
Lastly, I would like to thank my parents and my best friend, Hui-yi, for the constant support and encouragement from the other end of the world throughout my graduate study. I also thank my family, Andy, Alice, Cat and n, for their warm and loving support. I could not have reached this point without them.

Finally, on a non-scientific note, I am grateful for the five positive messages in the fortune cookies, which had strong relevance to the stages of my life at the moments I received them.

Table of Contents

Acknowledgments
Table of Contents
Declaration
List of Abbreviations
List of Tables
List of Figures
List of Appendices

Chapter 1 Introduction
  1.1. One genome, multiple epigenomes and transcriptomes
  1.2. Distal regulatory elements: Enhancers
    1.2.1. Significance not to be overlooked
    1.2.2. Epigenetic states at enhancers in relation to tissue specificity
    1.2.3. Regulation of gene expression through chromatin looping
  1.3. Features predictive of enhancers
    1.3.1. Interaction of proteins and enhancers
    1.3.2. Histone modification states at enhancers
    1.3.3. Active and poised enhancers in embryonic stem cells
  1.4. Computational approaches relevant to enhancer identification
    1.4.1. Position specific matrices and comparative genomics
    1.4.2. Integrative modeling of ChIP-Seq data
  1.5. Thesis overview

Chapter 2 Methods
  2.1. Methods for enhancer identification in mouse erythroid cell
    2.1.1. Mapping of datasets to the mouse genome
    2.1.2. Enrichment of nucRNA-Seq and ChIP-Seq datasets
    2.1.3. Conservation, motif identification, and function annotation analyses
    2.1.4. Native ChIP-qPCR of H3K4me1
  2.2. Methods for enhancer identification in mouse ES cells
    2.2.1. Public datasets
    2.2.2. Data pre-processing
    2.2.3. Training data sets
    2.2.4. Feature combination assessment using Naive Bayes
    2.2.5. Feature extraction with lasso regularized multinomial logistic regression
    2.2.6. Absolute gene expression in mouse ES cell
    2.2.7. Gene Ontology functional enrichment analysis
    2.2.8. Association with multiple transcription factor bound loci
    2.2.9. Supervised motif analysis
    2.2.10. Comparison to other high throughput sequencing datasets

Chapter 3 Enhancer Identification in Erythroid Cells: A Biologically Directed Approach
  3.1. Introduction
  3.2. Results
    3.2.1. A closer look at the Hbb locus control region
    3.2.2. Identification of putative enhancers
    3.2.3. Overlap with transcription factors and conserved regions of the genome
    3.2.4. Multiple transcription factor peaks in proximity to putative enhancers
    3.2.5. H3K4me1 ChIP-qPCR results support putative