Computational Approaches for Disease Gene Identification


YANG PENG
School of Computer Engineering

A Thesis Submitted to Nanyang Technological University in Fulfillment of the Requirement for the Degree of Doctor of Philosophy

May 2013

Acknowledgements

First and foremost, my special gratitude goes to my supervisors, Professor Kwoh Chee-Keong, Professor Li Xiaoli and Professor Ng See-Kiong, for the immense patience and invaluable advice they provided during this essential part of my life. The work that I have done here would not have been possible without them. While working with them, I gained both scientific knowledge and an understanding of how to do original research. I have benefited from their thoroughness and diligence in revising every word and sentence in my research write-ups and from their strict attitude toward scientific research.

I thank Assistant Professor Zheng Jie, who impressed me with his sharp eye for finding the key underlying problems and whose critical thinking skills let me see many issues with my own eyes. I also thank Associate Professor Lin Feng and Assistant Professor Manoranjan Dash for their effective comments and support during my Ph.D. qualifying examination.

Besides the professors, I warmly thank my colleague and friend Dr. Mei Jianping for her friendly help and for kindly sharing with me her MATLAB code for many of the machine learning algorithms used in this study. I also extend my special appreciation to Dr. Li Yongjin, who inspired me to learn a great deal about the knowledge and approaches related to disease gene prediction. I also enjoyed the time that I spent with my other colleagues in the Bioinformatics Research Center, including Dr. Liu Qian, Dr. Wu Min, Dr. Zhao Liang, Dr. Li Zhenhua and Dr. Piyushkumar, and with my lab mates, including Ouyang Xuchang, Su Tran To Chinh, Liu Wenting, Chen Haifen and Zhang Fan, who worked closely with me to discover new problems in bioinformatics.
I would especially like to thank my brother, girlfriend and parents. Their encouragement and understanding are my motivation to finish my Ph.D. study. Finally, I express my sincere gratitude to everyone who has contributed to this thesis.

Contents

ACKNOWLEDGEMENTS
ABSTRACT
IMPORTANT ABBREVIATIONS USED
LIST OF FIGURES
LIST OF TABLES

CHAPTER 1. INTRODUCTION
  1.1 BACKGROUND
    1.1.1 Motivation and Objective of Disease Gene Identification
    1.1.2 Challenges of Disease Gene Identification
    1.1.3 Wet-lab Experiments for Disease Gene Identification
  1.2 RELATED PRIOR WORKS
  1.3 MAJOR CONTRIBUTIONS AND ORGANIZATION
  1.4 OUTLINE

CHAPTER 2. LITERATURE REVIEW
  2.1 PRIORITIZATION OF CANDIDATE GENES BASED ON BIOLOGICAL DATA SOURCE
    2.1.1 Sequence Based Methods
    2.1.2 Gene Expression Based Methods
    2.1.3 Ontology Based Methods
    2.1.4 PPI Network Based Methods
  2.2 INTEGRATION METHODOLOGIES ON CANDIDATE GENES PRIORITIZATION
  2.3 SUMMARY

CHAPTER 3. PREDICTING DISEASE GENE VIA PROTEIN COMPLEX NETWORK PROPAGATION
  3.1 INTRODUCTION
  3.2 METHOD
    3.2.1 Overall Network Structure in RWPCN
    3.2.2 Constructing Phenotype Network
    3.2.3 Constructing Protein Complex Network
    3.2.4 Random walk with restart on protein complexes network (RWPCN)
  3.3 EXPERIMENT RESULTS
    3.3.1 Experimental settings and evaluation metrics
    3.3.2 Experimental Results
  3.4 SUMMARY

CHAPTER 4. POSITIVE UNLABELED LEARNING FOR DISEASE GENE IDENTIFICATION
  4.1 INTRODUCTION
  4.2 METHOD
    4.2.1 Gene characterization
    4.2.2 Feature Selection
    4.2.3 PU learning to identify the disease genes from U
  4.3 RESULT
    4.3.1 Experimental data, settings and evaluation metrics
    4.3.2 Experimental Result
  4.4 SUMMARY

CHAPTER 5. ENSEMBLE BASED POSITIVE UNLABELED LEARNING FOR DISEASE GENE IDENTIFICATION
  5.1 INTRODUCTION
  5.2 MATERIAL AND METHOD
    5.2.1 Experimental data and gene network modeling
    5.2.2 The proposed Technique EPU
    5.2.3 Ensemble positive unlabeled learning EPU
  5.3 EXPERIMENTAL RESULTS
    5.3.1 Experimental setting
    5.3.2 Evaluation metrics
    5.3.3 Experimental result
  5.4 SUMMARY

CHAPTER 6. CONCLUSIONS AND FUTURE DIRECTIONS
  6.1 CONCLUSIONS AND DISCUSSION
  6.2 FUTURE DIRECTIONS
    6.2.1 Integration of More Data Sources for Disease Gene Classification
    6.2.2 Phenotype Entities Similarity Calculation
    6.2.3 Improving the network propagation based method (RWPCN) using machine learning classification approaches
    6.2.4 Prioritization of loci using GWAS data
  6.3 FINAL REMARKS

REFERENCES
AUTHOR'S PUBLICATIONS

Abstract

Identifying disease genes from the human genome is an important and fundamental problem in biomedical research. Although many publications have applied machine learning methods to discover new disease genes, the task remains a challenge because of the pleiotropy of genes, the limited number of confirmed disease genes in the whole genome and the genetic heterogeneity of diseases. Recent approaches have applied the concept of 'guilt by association' to investigate the association between a disease phenotype and its causative genes, which means that candidate genes with