Cancer Mutant Proteome Database

Published online 14 November 2014
Nucleic Acids Research, 2015, Vol. 43, Database issue D849–D855
doi: 10.1093/nar/gku1182

CMPD: cancer mutant proteome database

Po-Jung Huang1,2,†, Chi-Ching Lee1,†, Bertrand Chin-Ming Tan3, Yuan-Ming Yeh4, Lichieh Julie Chu2, Ting-Wen Chen1, Kai-Ping Chang5, Cheng-Yang Lee1, Ruei-Chi Gan1, Hsuan Liu6,* and Petrus Tang1,*

1Bioinformatics Core Laboratory, Chang Gung University, Taoyuan 333, Taiwan, 2Molecular Medicine Research Center, Chang Gung University, Taoyuan 333, Taiwan, 3Department of Biomedical Sciences, Chang Gung University, Taoyuan 333, Taiwan, 4Bioinformatics Division, Tri-I Biotech, Inc., Taipei 221, Taiwan, 5Department of Otolaryngology, Head and Neck Surgery, Chang Gung Memorial Hospital, Lin-Kou, Taoyuan 333, Taiwan and 6Department of Molecular and Cellular Biology, Chang Gung University, Taoyuan 333, Taiwan

Received August 15, 2014; Accepted November 02, 2014

*To whom correspondence should be addressed. Tel: +886 3 2118800 (Ext 5136); Fax: +886 3 2118122; Email: [email protected]
Correspondence may also be addressed to Hsuan Liu. Email: [email protected]
†The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.

© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]

ABSTRACT

Whole-exome sequencing, which centres on the protein coding regions of disease/cancer associated genes, represents the most cost-effective method to date for deciphering the association between genetic alterations and diseases. Large-scale whole exome/genome sequencing projects have been launched by various institutions, such as NCI, Broad Institute and TCGA, to provide a comprehensive catalogue of coding variants in diverse tissue samples and cell lines. Further functional and clinical interrogation of these sequence variations must rely on extensive cross-platform integration of sequencing information and a proteome database that explicitly and comprehensively archives the corresponding mutated peptide sequences. While such a data resource is critical for the mass spectrometry-based proteomic analysis of exomic variants, no database is currently available for the collection of mutant protein sequences that correspond to recent large-scale genomic data. To address this issue and serve as a bridge to integrate genomic and proteomics datasets, CMPD (http://cgbc.cgu.edu.tw/cmpd) collected over 2 million genetic alterations, which not only facilitates the confirmation and examination of potential cancer biomarkers but also provides an invaluable resource for translational medicine research and opportunities to identify mutated proteins encoded by mutated genes.

INTRODUCTION

Cancer is a genetic disease that arises as a consequence of genomic abnormalities stemming from somatically acquired mutations or inherited gene mutations. Genomic sequences can be altered on different scales, which include single nucleotide variations (SNVs), small insertions and deletions (INDELs), rearrangements of genome segments and changes in the copy number of DNA fragments. Since specific mutations of cancer-associated genes are generally considered as DNA biomarkers for diagnosis and as molecular markers for therapeutic drug selection in clinical settings, several large-scale sequencing studies have been performed with the aim of further understanding cancer genomics. To this end, the NCI Cancer Genome Atlas (TCGA) project has sequenced the genomes of over 10 000 tumour samples in 33 cancer types (https://tcga-data.nci.nih.gov/tcga/) (1). Sequencing data on the NCI-60 cell lines (2) as well as 947 cancer cell lines (3) also provide an extensive catalogue of cancer relevant variants as well as pharmacogenomics correlations between specific variants and anticancer agents. As mutations that alter the protein sequences have the most significant impact on protein stability and functionality, researchers have started to design multilayer experiments encompassing both genomic and proteomic methods in order to better understand the roles of coding variants in tumorigenesis and progression, as well as their clinical implications (4).

Sequence database search is the most widely used method for protein identification in the field of mass spectrometry-based proteomics (5,6). However, search results are directly affected by the specificity and completeness of the sequence database: the mass-spectrometry search engine will likely miss mutated peptides that are not included in the reference databases. Recently, a mutated peptide database, named XMAn (7), was developed to translate disease- and cancer-related mutations from gene-level into mutated peptide sequences for proteomics search, based on known mutations gathered from widely accepted sources such as UniProt, IARC P53, OMIM and COSMIC (8–11). However, no existing mutated peptide database has been developed for global exploration of gene alterations discovered in large-scale sequencing studies such as those mentioned above, and efforts on mapping these mutations to protein sequences for further proteomics analysis are still very limited. Only two existing databases, Human Protein Mutant Database (HPMD) (12) and Human Cancer Proteome Variation Database (CanProVar) (13), were designed for collecting mutated proteins specific to colon cancer cell lines and single amino acid alterations from the human cancer proteome, respectively. However, as the data sources were limited to specific cell lines or cancer types, it is unlikely that these resources archive the full repertoire of protein sequence variations derived from currently known genetic alterations. Moreover, these databases only annotate variant information in the header line without providing the mutated protein FASTA sequences. Therefore, existing peptide mass fingerprint search engines cannot directly access these information portals, limiting their applicability in identifying mutant proteins derived from genetic variation databases (8,14) and cancer genome/exome sequencing studies (1,15,16).

Cancer Mutant Proteome Database (CMPD) is designed to address this issue, aiming at improving the link between genomic and proteomic mutations. The mutated protein sequence collection was based on the exome or genome sequencing datasets from NCI-60 cell lines, 947 cancer cell lines from the Cancer Cell Line Encyclopaedia project, and 5500 more cases from 20 TCGA cancer cohort studies (1,15–17). The identified genetic alterations (SNVs and INDELs) were converted to all possible mutated protein-coding sequences according to sample-specific transcript isoforms. A wide variety of databases, such as dbSNP (14), dbNSFP (18), ClinVar (19), COSMIC (8) and OMIM (10), have been integrated into our annotation database to facilitate compilation and exploration of associations between diseases and mutations and to eliminate any inconsistency in data format between annotation sources. Functional prediction results are pre-compiled from various prediction algorithms for evaluating the potential functional [...] CMPD to identifying mutated peptides was demonstrated by whole-exome sequencing and proteome resources of the COLO 205 cancer cell line.

MATERIALS AND METHODS

Construction of tryptic peptide database

A comprehensive list of gene mutations was extracted from large-scale cancer genomic sequencing projects including NCI-60 (2), CCLE (3), and TCGA (1,15–17). The Variant Call Format (VCF) as well as the Mutation Annotation Format (MAF) files corresponding to these sequencing projects can be downloaded from CellMiner (http://discover.nci.nih.gov/cellminer/), the CCLE website (http://www.broadinstitute.org/ccle/home), and the MAF Dashboard (https://confluence.broadinstitute.org/display/GDAC/MAF+Dashboard) of the Broad Institute's Genome Data Analysis Center (GDAC), respectively. Gene information and FASTA-formatted sequence files at both protein and mRNA levels can be downloaded from ENSEMBL (22) and UCSC through BioMart (23) and UCSC Table Browser (24), respectively. Additional information on genes and proteins, such as HGNC gene symbol (25), identifier in external databases such as RefSeq (26) and UniProtKB (11), GO annotations (27) and disease descriptions, and KEGG pathway information (28), is available through the cross-reference functionality provided by BioMart (23). Moreover, ANNOVAR (29) was used to evaluate the overall consequences of SNVs and INDELs on corresponding transcripts. In addition, observed allele frequencies in the 1000 Genomes Project (30) and NHLBI Exome Sequencing Project (31), single nucleotide polymorphisms reported in dbSNP (14), somatic mutations categorized in COSMIC (8), medically important variants collected in ClinVar (19), and functional prediction results of all non-synonymous SNVs from popular algorithms (18,32–35) were compiled into an integrated SQLite database to facilitate variant-based prioritization. Non-synonymous coding variants and small INDELs identified by NCI-60, CCLE, and TCGA studies were introduced into protein sequences to create sequence [...]