Supp Supplement.Pdf

Supplement. April 11, 2008

Contents

1 Simulation: comparison of our method with the traditional linkage
2 Overlap Significance
3 Simulated Annealing
  3.1 Network Jumps
  3.2 Expectation-maximization (EM) Update versus Maximum-Likelihood Estimation of Cluster Probabilities
  3.3 Iteration Number and Temperature Schedule
4 Unlinked Data Simulation
5 Analysis Settings and Important Observations
6 Data
  6.1 Molecular Network
  6.2 Genes
  6.3 Linkage Datasets
  6.4 Marker Maps
7 Linkage Tools and Parameters

List of Figures

S1 Simulation LOD scores.
S2 Simulation Results.
S3 An outline of the generative model of data in our analysis, depicted as a graphical model.
S4 A hypothetical family.
S5 A graphical model representation of a pedigree.
S6 Testing significance of the gene-specific statistic values.

List of Tables

S1 Additional Highly significant and suggestively significant genes
S2 Overlaps for the analyses including the X chromosome.
S3 Analyses parameters.
S4 Two- and three-way global overlap p-values.
S5 Data sets.
S6 Multipoint linkage analysis tools.

1 Simulation: comparison of our method with the traditional linkage

We used our generative model to produce and analyze 100 artificial data sets. For each data set we defined a randomly chosen “true” (disease-predisposing) cluster of ten genes. We then simulated genotypes and phenotypes of individuals in 1,000 families in the following way. First, we generated 1,000 “empty” pedigree topologies by randomly choosing a real pedigree with fewer than 17 meioses (to reduce computation time) from “our” autism data. Second, for each pedigree, we sampled a gene from the predisposing gene cluster using uniform cluster probabilities (1/10), and assigned the sampled gene to the pedigree.
Third, we simulated states for all genetic marker loci and for the family-specific “causal” (disease) gene for all individuals in each family. We assigned states of genetic markers for founders of each pedigree by sampling from the empirical frequency distribution estimated from the autism dataset. We assumed a single disease allele and a single wild-type allele for each “causal” gene, setting allele frequencies to 0.01 and 0.99, respectively. Given genotypes of pedigree founders, we simulated meiotic events leading to the rest of the pedigree, assigning allelic states to all “empty” genomes in the family. Fourth, given known genotypes of all individuals, we simulated their phenotypes. Considering only the family-specific “causal” disease gene (and not the rest of the genes in the disease cluster), we attributed the disease phenotype to individuals with two wild-type alleles of the gene with probability 0.001. Similarly, individuals with one wild-type and one disease allele, or with two disease alleles, of the gene were assigned the disease phenotype with probability 0.8. Fifth, we modeled ascertainment by rejecting families with fewer than two affected individuals.

Using these 100 artificial datasets of 1,000 families each, we compared our current approach with conventional linkage analysis. We could not afford computationally to estimate a separate background distribution for each of the 100 simulated datasets, as we did in the analyses of the real datasets. Instead, we estimated one background distribution for all 100 samples by simulating 10 unlinked genotype sets for each of the 100 simulations, thus gathering a total of 1,000 unlinked simulations.

As expected, due to the extreme heterogeneity of the data, conventional linkage analysis produces overwhelmingly low LOD scores, often smaller than -100 (see Figure S1 for an example LOD score curve over one of the correct genes). Such abysmal LOD scores would certainly lead to the conclusion of absence of linkage; the data would likely be discarded without reporting.

Figure S1: Simulation LOD scores. In the simulation shown, TRIT1 was part of the correct cluster and was assigned to the top 10% of the families. No other locus on chromosome 1 was linked to the phenotype. A. LOD score curve produced by traditional linkage analysis. B. Per-family LOD scores.

However, if we use gene-specific LOD scores to rank genes by their likely involvement in the disease etiology (by integrating over all possible thresholds for the LOD scores), we achieve a rather high mean AUC (area under the ROC curve) value of 0.77 [0.76, 0.78]. This suggests that even traditional linkage analysis can be modified to extract more information under the assumption of genetic heterogeneity than the approach does now. When we applied our current method to prioritize the genes based on the significance of their estimated cluster probabilities, we achieved a significantly higher mean AUC value of 0.92 [0.90, 0.93].

Further, we measured how many of the 20 top-ranking genes produced by each method are true positives. As expected, our method, implementing the “true” statistical model, significantly outperformed conventional linkage. The average numbers of true positives were 4.13 [3.79, 4.47] and 0.11 [0.05, 0.17] for our method and the conventional linkage method, respectively (see Figure S2).

Figure S2: Simulation Results
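The simulation steps described in this section can be sketched in a few lines of Python. The sketch is illustrative only and is not the authors' code: it assumes a hypothetical nuclear-family topology (two founders plus a fixed number of children) in place of real pedigrees from the autism data, it omits marker loci entirely, and all function and variable names are invented for illustration. The disease-allele frequency (0.01), the penetrances (0.001 and 0.8), the uniform choice of a causal gene, and the rejection of families with fewer than two affected members are taken from the text.

```python
import random

P_DISEASE_ALLELE = 0.01       # disease-allele frequency stated in the text
PENETRANCE_WILD_TYPE = 0.001  # P(affected | two wild-type alleles)
PENETRANCE_CARRIER = 0.8      # P(affected | one or two disease alleles)


def sample_founder_genotype(rng):
    """Draw two alleles independently; 1 = disease allele, 0 = wild type."""
    return tuple(int(rng.random() < P_DISEASE_ALLELE) for _ in range(2))


def transmit(parent, rng):
    """Mendelian transmission: pass on one of the parent's two alleles."""
    return rng.choice(parent)


def sample_phenotype(genotype, rng):
    """Return True if the individual is affected, given the causal-gene genotype."""
    p = PENETRANCE_CARRIER if sum(genotype) > 0 else PENETRANCE_WILD_TYPE
    return rng.random() < p


def simulate_family(cluster, n_children, rng):
    """Simulate one family. ASSUMPTION: nuclear family, not a real pedigree."""
    causal_gene = rng.choice(cluster)  # uniform cluster probability 1/len(cluster)
    mother = sample_founder_genotype(rng)
    father = sample_founder_genotype(rng)
    children = [(transmit(mother, rng), transmit(father, rng))
                for _ in range(n_children)]
    genotypes = [mother, father] + children
    affected = [sample_phenotype(g, rng) for g in genotypes]
    return {"causal_gene": causal_gene,
            "genotypes": genotypes,
            "affected": affected}


def simulate_ascertained_families(cluster, n_families, n_children=3, seed=0):
    """Rejection sampling: keep only families with at least two affected members."""
    rng = random.Random(seed)
    families = []
    while len(families) < n_families:
        family = simulate_family(cluster, n_children, rng)
        if sum(family["affected"]) >= 2:
            families.append(family)
    return families
```

For example, `simulate_ascertained_families([f"gene{i}" for i in range(10)], 1000)` would draw 1,000 ascertained families for a ten-gene cluster. Because the disease allele is rare, most candidate families are rejected, so the loop draws many more families than it keeps.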
Figure S3: An outline of the generative model of data in our analysis, depicted as a graphical model. The legend distinguishes assumptions, dependences, parameters, and data or variables.
Assumptions shown in the figure: every gene in the disease-predisposing gene cluster C has one normal and one disease-predisposing allele, and the frequency of the disease-predisposing allele is the same for all genes in the cluster C; penetrance parameters are common for all genes in the gene cluster C; all markers and genes are arranged according to a sex-averaged genetic map; the prior distribution over connected components of size c is uniform; there are no errors in genotyping.
Nodes shown in the figure: molecular network; map (gene and marker positions in the genome); penetrance parameters; pedigree topology; allelic frequencies for genes and markers; gene cluster of c genes; gene-specific cluster probability; pedigree-specific predisposing gene; pedigree-specific genotypes (marker alleles, gene alleles); observed phenotype; observed marker alleles.

Figure S4: A hypothetical family (individuals are numbered 1 to 15). Each pedigree (family) described in our data is provided with a kinship structure and phenotypic information. Circles and squares represent female and male individuals, respectively. By convention, the filled shapes represent affected (sick) individuals; the open shapes represent unaffected (healthy) ones.

Figure S5: The same pedigree as that shown in Figure S4, represented as a graphical model. We can directly observe phenotypes (Φ_0 and Φ_1 are phenotypes of unaffected and affected individuals, respectively). We can think of the phenotypes as emitted messages produced by the hidden (not directly observable) genotypes, g_i. Variables h_{k,l} represent haplotypes that are passed from parents to their children.
Figure S6: Testing significance of the gene-specific statistic values. See the main text of the paper for discussion and explanation.

Input to both branches: original data (pedigrees, disease phenotypes, marker map, marker genotypes).

Real-data branch:
  Real chromosome LOD scores (LOD score for every position on every chromosome)
  Real gene LOD score matrix (LOD score for every gene and every pedigree)
  Bootstrap Loop
  Real-data gene-specific statistic value

Distribution under the null model (repeat K times):
  Simulate the k-th set of disease-unlinked genotypes
  k-th simulated set of chromosome LOD scores
  k-th simulated gene LOD score matrix (LOD score for every gene and every pedigree)
  Bootstrap Loop
  k-th simulation-based gene-specific statistic value

For every gene: K gene-specific statistic values, compared with the real-data value, give the p-value.

Bootstrap Loop:
  Input: gene LOD score matrix [LOD scores for G genes and F pedigrees]
  Set all gene statistic counts to 0
  Repeat B times
    Sample with replacement F pedigrees
    Find the cluster C with the highest likelihood (via simulated annealing)
    Update statistic values for all genes
  Output: gene-specific statistic values
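The bootstrap loop and the p-value step outlined in Figure S6 can be sketched as follows. This is a minimal illustration under loud assumptions, not the procedure used in the paper: the simulated-annealing search over the molecular network is replaced by a hypothetical stand-in (`best_cluster_stand_in`) that simply picks the genes with the largest summed LOD scores, the gene statistic is assumed to be the fraction of bootstrap replicates in which a gene is selected, and the p-value is assumed to be the fraction of null statistics at least as large as the real one. The statistics actually used are defined in the main text.

```python
import random


def best_cluster_stand_in(lod, pedigrees, cluster_size):
    """HYPOTHETICAL placeholder for the simulated-annealing cluster search.

    Returns the indices of the `cluster_size` genes with the largest LOD score
    summed over the resampled pedigrees. It ignores the molecular network.
    """
    totals = [sum(row[f] for f in pedigrees) for row in lod]
    ranked = sorted(range(len(lod)), key=lambda g: totals[g], reverse=True)
    return ranked[:cluster_size]


def bootstrap_statistics(lod, n_boot, cluster_size, seed=0):
    """Bootstrap loop over pedigrees.

    `lod` is a G x F matrix (list of G rows, one per gene, each with F
    per-pedigree LOD scores). ASSUMED statistic: selection frequency.
    """
    rng = random.Random(seed)
    n_genes, n_peds = len(lod), len(lod[0])
    counts = [0] * n_genes                       # set all gene counts to 0
    for _ in range(n_boot):                      # repeat B times
        # sample F pedigrees with replacement
        sample = [rng.randrange(n_peds) for _ in range(n_peds)]
        # find the best cluster and update the per-gene statistic
        for g in best_cluster_stand_in(lod, sample, cluster_size):
            counts[g] += 1
    return [c / n_boot for c in counts]


def empirical_p_values(real_stats, null_stats):
    """Per-gene fraction of the K null statistics >= the real-data statistic."""
    k = len(null_stats)
    return [sum(null[g] >= real for null in null_stats) / k
            for g, real in enumerate(real_stats)]
```

In this sketch, `bootstrap_statistics` would be called once on the real gene LOD score matrix and once on each of the K simulated (disease-unlinked) matrices, and the K resulting lists would be passed to `empirical_p_values` as `null_stats`.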
Table S1: Additional significant and suggestively significant genes

GeneID  Symbol    Chromosome location  Gene name                                                     Max p-value  Sum p-value

autism-no-x
10913   EDAR      2q11-q13             ectodysplasin A receptor                                      <0.000       <0.000
3991    LIPE      19q13.2              lipase, hormone-sensitive                                     <0.000       <0.000
889     KRIT1     7q21-q22             ankyrin repeat containing                                     <0.000       0.001

autism-x
2067    ERCC1     19q13.2-q13.3        excision repair protein                                       <0.000       <0.000
10658   CUGBP1    11p11                CUG triplet repeat, RNA binding                               <0.000       0.001
7486    WRN       8p12-p11.2           Werner syndrome                                               <0.000       0.003
10018   BCL2L11   2q13                 BCL2-like 11 (apoptosis facilitator)                          <0.000       0.009
1747    DLX3      17q21                distal-less homeo box 3                                       <0.000       0.013
11030   RBPMS     8p12-p11             RNA binding protein                                           0.002        <0.000
23209   MLC1      22q13.33             megalencephalic leukoencephalopathy with subcortical cysts 1  0.002        <0.000

autism-x-dom-rec
10893   MMP24     20q11.2              matrix metallopeptidase 24                                    <0.000       <0.000
1992    SERPINB1  6p25                 serpin peptidase inhibitor                                    <0.000       <0.000
5903    RANBP2    2q12.3               RAN binding protein 2                                         <0.000       0.007
8205    TAM*      21q11.2              myeloproliferative syndrome, transient                        0.004        <0.000

bipolar-x
1437    CSF2      5q31.1               colony stimulating factor 2                                   0.001        <0.000
1604    CD55      1q32                 decay accelerating for complement                             0.005        <0.000
1869    E2F1      20q11.2              E2F transcription factor 1                                    0.002        <0.000
3586    IL10      1q31-q32             interleukin 10                                                0.001        <0.000
5075    PAX1      20p11.2              paired box gene 1                                             0.009        <0.000
6696    SPP1      4q21-q25             secreted phosphoprotein