Supercomputing Pipeline for the Ark (SPARK) Thesis

Total Page:16

File Type:pdf, Size:1020Kb

Supercomputing Pipeline for the Ark (SPARK) Thesis Supercomputing Pipeline for The Ark (SPARK) Thilina Sunimal Ranaweera Submitted in total fulfilment of the requirements of the degree of Doctor of Philosophy Melbourne School of Population and Global Health The University of Melbourne August 2018 Produced on archival quality paper. Copyright c 2018 Thilina Sunimal Ranaweera All rights reserved. No part of the publication may be reproduced in any form by print, photoprint, microfilm or any other means without written permission from the author. Abstract Data is the primary asset of biomedical researchers and the engine for both dis- covery and translation. The challenge is to facilitate the discovery of new medical knowledge, supported by advanced computing technologies based on heterogeneous medical datasets. At the beginning of the 21st century, the much-anticipated Hu- man Genome Project was completed, thus commencing an era of high-dimensional genomic datasets. Additionally, the internet era has introduced advanced and more user-friendly computing technologies for data storage, processing, and analysis. The biomedical informatics research field has emerged to fill the void between medical research and computing technologies. The Ark is an open-source web-based medical research data management plat- form which currently supports multiple medical research studies. The Ark’s soft- ware architecture is modular, capable of managing multiple studies and sub-studies in a single database, including laboratory information and questionnaire data. The Ark’s role-based security infrastructure provides a secure environment for biomed- ical datasets. Prior to the work presented in this thesis, The Ark did not have the capability to effectively manage or analyse genomic data, nor model pedigree structures. This thesis presents a next-generation biomedical data management system, the Supercomputer Pipeline for The Ark (SPARK). SPARK provides a novel data management approach to pedigree and genomic datasets. The SPARK project en- iii ables access to high-performance computing (HPC) using a software architecture that builds upon The Ark’s researcher-friendly web-based interface. The Ark ge- nomics module can accommodate multiple HPC facilities simultaneously, enabling multiple SPARK nodes that are attached to each HPC facility to operate indepen- dently. A SPARK plugin format for analysis algorithms is implemented, promoting the exchange of algorithm implementations within the global genomics community. In addition to HPC support, the genomics module provides functions to manage high-dimensional genome-wide association study (GWAS) data. A mechanism to manage and analyse GWAS datasets using a mixed relational/non-relational model is designed and implemented. A further major contribution of this thesis is the first open-source pedigree data modelling and visualisation tool that is integrated with a sophisticated medical data management system. The Ark’s new pedigree module is capable of modelling n-th degree relatives and twins, including multiples. It can dynamically infer family structures based upon parental and twin relationships alone, and can visualize these structures with configurable annotations, for example, affected status. In summary, SPARK implements a novel knowledge discovery platform that in- corporates HPC and genomic data management capabilities into The Ark. SPARK empowers medical researchers, who do not have a computer science background, to exploit HPC resources for genomic medical research, share algorithms and data, and extract results in a user-friendly way. The pedigree module extends The Ark to han- dle family and twin-based studies, which are a powerful tool to better understand the origins of disease. SPARK represents an important milestone in translating HPC and Big Data management technologies to the medical research community. The design and implementation described in this thesis are sufficiently generic that they could be readily extended to other medical domains involving Big Data, and analyses that demand HPC. iv Declaration This is to certify that 1. the thesis comprises only my original work towards the PhD, 2. due acknowledgement has been made in the text to all other material used, 3. the thesis is less than 100,000 words in length, exclusive of tables, maps, bibliographies and appendices. Thilina Sunimal Ranaweera, August 2018 v Acknowledgements First and foremost, I would like to express my sincere gratitude to my primary supervisor, Dr Adrian Bickerstaffe for his continuous support, guidance and inspi- ration throughout my PhD journey. Thank you for your understanding, patience, bestowing your knowledge and guiding me to the right research direction, and en- riching my life with your advice and words of wisdom. I could not have imagined a better supervisor for my PhD candidature. Besides my primary supervisor, heartfelt thanks go to my co-supervisors Dr Enes Makalic and Professor John L. Hopper. Thank you for your encouragement, insightful discussions, providing guidance and feedback to enrich my research im- mensely. Specially, I would like to thank Professor John L. Hopper for providing the opportunity to work in his research group and supporting my research work from his funds. I would also like to thank my fellow researchers and work colleagues at the Centre for Epidemiology and Biostatistics, The University of Melbourne School of Population and Global Health, for timely help and insightful discussions. I have met some amazing people during my PhD candidature, especially my colleagues at the Centre for Genetic Origins of Health and Disease, The University of Western Australia. Specially, I would like to thank Professor Eric Moses on initiating open-source based The Ark project to manage the biomedical research datasets, which has become the foundation of my PhD research. Also, I would like to thank The Ark development team based on the Centre for Genetic Origins vi of Health and Disease, The University of Western Australia for their effort and dedication to this project. Thank you for making my life enjoyable and for many unforgettable experiences with The Ark developments. Also, I would like to thank Dr Gillian Dite and Dr Julie Cantrill for proofreading my thesis and providing suggestions for improvements. Their suggestions have helped to improve my thesis content immensely. PhD journey is enriched with achievements and constant challenges, and I was very fortunate to have my wife by my side taking the same ride. My dearest wife Nilu, thank you for all your understanding, supporting me during difficult times, taking care of me and being patient with me when I was too busy to manage responsibilities as a husband. Also, thank you for making me laugh and making my life enjoyable. My PhD journey has become extra special as I had the greatest gift of my life, my daughter Minuli during my candidature. She made my life super busy but changed it for the better and motivated me to be successful. I appreciate my little girl for abiding my ignorance and the patience she showed during my thesis writing. Please accept sincere apologies of your Thaththa for not being able to spend more time and play with you. Without my mother, father, sister, and brother I would not be who I am today, and I am very fortunate to have them by my side. My mother supported me, motivated me and stayed with me like my own shadow during good times and hard times. Thank you Amma for being so special, kind, understanding and always wishing the best for me. My father always guided me through his knowledge and encouraged me to pursue higher studies. Thank you Taththa for believing in me and guiding me in the right direction. Finally, I would like to thank everyone who has been a friend to me, thank you very much for your friendship and the good times we shared. vii To my beloved family and respected teachers viii Contents List of Figures xv List of Tables xix 1 Introduction1 1.1 Biomedical informatics.........................1 1.1.1 Genomics............................4 1.2 The software for biomedical research.................5 1.2.1 The Ark............................8 1.2.2 Limitations of existing tools and systems...........9 1.3 SPARK................................. 11 1.3.1 Pedigree data management.................. 11 1.3.2 Access to high-performance computing resources...... 13 1.3.3 Genomic data management.................. 14 1.3.4 Fostering biomedical research collaboration......... 18 1.4 Thesis outline............................. 19 2 Literature Review 23 2.1 Introduction............................... 23 2.2 Data Management........................... 25 2.2.1 Data management systems................... 25 2.2.2 Relational database limitations................ 35 2.2.3 Current approaches to manage high-dimensional datasets. 38 2.3 Informatics applied to medical research and practice......... 46 2.3.1 Biomedical Informatics.................... 47 2.3.2 Informatics approaches to medical data management.... 47 2.4 GWAS.................................. 54 2.4.1 GWAS analysis techniques.................. 58 2.4.2 Systems for GWAS data management and analysis..... 60 2.4.3 Notable GWAS findings.................... 64 2.4.4 Beyond GWAS and towards personalised medicine..... 66 2.5 HPC................................... 67 ix 2.5.1 Introduction to HPC...................... 68 2.5.2 Types of HPC......................... 71 2.5.3 Strengths and weaknesses................... 74 2.6 Conclusion............................... 77 3 The Ark 79 3.1 Introduction............................... 79 3.2
Recommended publications
  • Ovarian Gene Expression in the Absence of FIGLA, an Oocyte
    BMC Developmental Biology BioMed Central Research article Open Access Ovarian gene expression in the absence of FIGLA, an oocyte-specific transcription factor Saurabh Joshi*1, Holly Davies1, Lauren Porter Sims2, Shawn E Levy2 and Jurrien Dean1 Address: 1Laboratory of Cellular and Developmental Biology, NIDDK, National Institutes of Health, Bethesda, MD 20892, USA and 2Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37232, USA Email: Saurabh Joshi* - [email protected]; Holly Davies - [email protected]; Lauren Porter Sims - [email protected]; Shawn E Levy - [email protected]; Jurrien Dean - [email protected] * Corresponding author Published: 13 June 2007 Received: 11 December 2006 Accepted: 13 June 2007 BMC Developmental Biology 2007, 7:67 doi:10.1186/1471-213X-7-67 This article is available from: http://www.biomedcentral.com/1471-213X/7/67 © 2007 Joshi et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Abstract Background: Ovarian folliculogenesis in mammals is a complex process involving interactions between germ and somatic cells. Carefully orchestrated expression of transcription factors, cell adhesion molecules and growth factors are required for success. We have identified a germ-cell specific, basic helix-loop-helix transcription factor, FIGLA (Factor In the GermLine, Alpha) and demonstrated its involvement in two independent developmental processes: formation of the primordial follicle and coordinate expression of zona pellucida genes. Results: Taking advantage of Figla null mouse lines, we have used a combined approach of microarray and Serial Analysis of Gene Expression (SAGE) to identify potential downstream target genes.
    [Show full text]
  • Whole Exome Sequencing in Families at High Risk for Hodgkin Lymphoma: Identification of a Predisposing Mutation in the KDR Gene
    Hodgkin Lymphoma SUPPLEMENTARY APPENDIX Whole exome sequencing in families at high risk for Hodgkin lymphoma: identification of a predisposing mutation in the KDR gene Melissa Rotunno, 1 Mary L. McMaster, 1 Joseph Boland, 2 Sara Bass, 2 Xijun Zhang, 2 Laurie Burdett, 2 Belynda Hicks, 2 Sarangan Ravichandran, 3 Brian T. Luke, 3 Meredith Yeager, 2 Laura Fontaine, 4 Paula L. Hyland, 1 Alisa M. Goldstein, 1 NCI DCEG Cancer Sequencing Working Group, NCI DCEG Cancer Genomics Research Laboratory, Stephen J. Chanock, 5 Neil E. Caporaso, 1 Margaret A. Tucker, 6 and Lynn R. Goldin 1 1Genetic Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD; 2Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD; 3Ad - vanced Biomedical Computing Center, Leidos Biomedical Research Inc.; Frederick National Laboratory for Cancer Research, Frederick, MD; 4Westat, Inc., Rockville MD; 5Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD; and 6Human Genetics Program, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD, USA ©2016 Ferrata Storti Foundation. This is an open-access paper. doi:10.3324/haematol.2015.135475 Received: August 19, 2015. Accepted: January 7, 2016. Pre-published: June 13, 2016. Correspondence: [email protected] Supplemental Author Information: NCI DCEG Cancer Sequencing Working Group: Mark H. Greene, Allan Hildesheim, Nan Hu, Maria Theresa Landi, Jennifer Loud, Phuong Mai, Lisa Mirabello, Lindsay Morton, Dilys Parry, Anand Pathak, Douglas R. Stewart, Philip R. Taylor, Geoffrey S. Tobias, Xiaohong R. Yang, Guoqin Yu NCI DCEG Cancer Genomics Research Laboratory: Salma Chowdhury, Michael Cullen, Casey Dagnall, Herbert Higson, Amy A.
    [Show full text]
  • Goat Anti-Actin-Like 7B Antibody Peptide-Affinity Purified Goat Antibody Catalog # Af1021b
    10320 Camino Santa Fe, Suite G San Diego, CA 92121 Tel: 858.875.1900 Fax: 858.622.0609 Goat Anti-Actin-like 7B Antibody Peptide-affinity purified goat antibody Catalog # AF1021b Specification Goat Anti-Actin-like 7B Antibody - Product Information Application WB Primary Accession Q9Y614 Other Accession NP_006677, 10880 Reactivity Human Host Goat Clonality Polyclonal Concentration 100ug/200ul Isotype IgG Calculated MW 45234 Goat Anti-Actin-like 7B Antibody - Additional AF1021b (0.03 µg/ml) staining of human Information testis lysate (35 µg protein in RIPA buffer). Primary incubation was 1 hour. Detected by Gene ID 10880 chemiluminescence. Other Names Actin-like protein 7B, Actin-like-7-beta, Goat Anti-Actin-like 7B Antibody - ACTL7B Background Format The protein encoded by this gene is a member 0.5 mg IgG/ml in Tris saline (20mM Tris of a family of actin-related proteins (ARPs) pH7.3, 150mM NaCl), 0.02% sodium azide, which share significant amino acid sequence with 0.5% bovine serum albumin identity to conventional actins. Both actins and ARPs have an actin fold, which is an Storage ATP-binding cleft, as a common feature. The Maintain refrigerated at 2-8°C for up to 6 ARPs are involved in diverse cellular processes, months. For long term storage store at including vesicular transport, spindle -20°C in small aliquots to prevent orientation, nuclear migration and chromatin freeze-thaw cycles. remodeling. This gene (ACTL7B), and related gene, ACTL7A, are intronless, and are located Precautions Goat Anti-Actin-like 7B Antibody is for approximately 4 kb apart in a head-to-head research use only and not for use in orientation within the familial dysautonomia diagnostic or therapeutic procedures.
    [Show full text]
  • Genome-Wide Association Studies of Smooth Pursuit and Antisaccade Eye Movements in Psychotic Disorders: findings from the B-SNIP Study
    OPEN Citation: Transl Psychiatry (2017) 7, e1249; doi:10.1038/tp.2017.210 www.nature.com/tp ORIGINAL ARTICLE Genome-wide association studies of smooth pursuit and antisaccade eye movements in psychotic disorders: findings from the B-SNIP study R Lencer1, LJ Mills2, N Alliey-Rodriguez3, R Shafee4,5,AMLee6, JL Reilly7, A Sprenger8, JE McDowell9, SA McCarroll4, MS Keshavan10, GD Pearlson11,12, CA Tamminga13, BA Clementz9, ES Gershon3, JA Sweeney13,14 and JR Bishop6,15 Eye movement deviations, particularly deficits of initial sensorimotor processing and sustained pursuit maintenance, and antisaccade inhibition errors, are established intermediate phenotypes for psychotic disorders. We here studied eye movement measures of 849 participants from the Bipolar-Schizophrenia Network on Intermediate Phenotypes (B-SNIP) study (schizophrenia N = 230, schizoaffective disorder N = 155, psychotic bipolar disorder N = 206 and healthy controls N = 258) as quantitative phenotypes in relation to genetic data, while controlling for genetically derived ancestry measures, age and sex. A mixed-modeling genome-wide association studies approach was used including ~ 4.4 million genotypes (PsychChip and 1000 Genomes imputation). Across participants, sensorimotor processing at pursuit initiation was significantly associated with a single nucleotide polymorphism in IPO8 (12p11.21, P =8×10− 11), whereas suggestive associations with sustained pursuit maintenance were identified with SNPs in SH3GL2 (9p22.2, P =3×10− 8). In participants of predominantly African ancestry, sensorimotor processing was also significantly associated with SNPs in PCDH12 (5q31.3, P = 1.6 × 10 − 10), and suggestive associations were observed with NRSN1 (6p22.3, P = 5.4 × 10 −8) and LMO7 (13q22.2, P = 7.3x10−8), whereas antisaccade error rate was significantly associated with a non-coding region at chromosome 7 (P = 6.5 × 10− 9).
    [Show full text]
  • Methylation of Leukocyte DNA and Ovarian Cancer
    Fridley et al. BMC Medical Genomics 2014, 7:21 http://www.biomedcentral.com/1755-8794/7/21 RESEARCH ARTICLE Open Access Methylation of leukocyte DNA and ovarian cancer: relationships with disease status and outcome Brooke L Fridley1*, Sebastian M Armasu2, Mine S Cicek2, Melissa C Larson2, Chen Wang2, Stacey J Winham2, Kimberly R Kalli3, Devin C Koestler1,4, David N Rider2, Viji Shridhar5, Janet E Olson2, Julie M Cunningham5 and Ellen L Goode2 Abstract Background: Genome-wide interrogation of DNA methylation (DNAm) in blood-derived leukocytes has become feasible with the advent of CpG genotyping arrays. In epithelial ovarian cancer (EOC), one report found substantial DNAm differences between cases and controls; however, many of these disease-associated CpGs were attributed to differences in white blood cell type distributions. Methods: We examined blood-based DNAm in 336 EOC cases and 398 controls; we included only high-quality CpG loci that did not show evidence of association with white blood cell type distributions to evaluate association with case status and overall survival. Results: Of 13,816 CpGs, no significant associations were observed with survival, although eight CpGs associated with survival at p < 10−3, including methylation within a CpG island located in the promoter region of GABRE (p = 5.38 x 10−5, HR = 0.95). In contrast, 53 CpG methylation sites were significantly associated with EOC risk (p <5 x10−6). The top association was observed for the methylation probe cg04834572 located approximately 315 kb upstream of DUSP13 (p = 1.6 x10−14). Other disease-associated CpGs included those near or within HHIP (cg14580567; p =5.6x10−11), HDAC3 (cg10414058; p = 6.3x10−12), and SCR (cg05498681; p = 4.8x10−7).
    [Show full text]
  • A Rapidly Evolving Actin Mediates Fertility and Developmental Tradeoffs in Drosophila
    bioRxiv preprint doi: https://doi.org/10.1101/2020.09.28.317503; this version posted September 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 1 A rapidly evolving actin mediates fertility and developmental tradeoffs in Drosophila Courtney M. Schroeder1,*, Sarah A. Tomlin1,2, John R. Valenzuela1, and Harmit S. Malik1,2,* 1Division of Basic Sciences & 2Howard HuGhes Medical Institute, Fred Hutchinson Cancer Research Center, Seattle, WA RunninG Title: Actin paraloG Arp53D plays opposinG roles in fertility and development *Correspondence should be addressed to: Courtney Schroeder, 1100 Fairview Avenue N. A2-025, Seattle, WA 98109, [email protected]; Harmit S. Malik, 1100 Fairview Avenue N. A2-205, Seattle, WA 98109, [email protected] bioRxiv preprint doi: https://doi.org/10.1101/2020.09.28.317503; this version posted September 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 2 Abstract Most actin-related proteins (Arps) are highly conserved in eukaryotes, where they carry out well-defined cellular functions. Drosophila and mammals also encode divergent non- canonical Arps in their male-germline whose roles remain unknown. Here, we show that Arp53D, a rapidly-evolving Drosophila Arp, localizes to fusomes and actin cones, two male germline-specific actin structures critical for sperm maturation, via its non- canonical N-terminal tail.
    [Show full text]
  • The Human Gene Connectome As a Map of Short Cuts for Morbid Allele Discovery
    The human gene connectome as a map of short cuts for morbid allele discovery Yuval Itana,1, Shen-Ying Zhanga,b, Guillaume Vogta,b, Avinash Abhyankara, Melina Hermana, Patrick Nitschkec, Dror Friedd, Lluis Quintana-Murcie, Laurent Abela,b, and Jean-Laurent Casanovaa,b,f aSt. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065; bLaboratory of Human Genetics of Infectious Diseases, Necker Branch, Paris Descartes University, Institut National de la Santé et de la Recherche Médicale U980, Necker Medical School, 75015 Paris, France; cPlateforme Bioinformatique, Université Paris Descartes, 75116 Paris, France; dDepartment of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel; eUnit of Human Evolutionary Genetics, Centre National de la Recherche Scientifique, Unité de Recherche Associée 3012, Institut Pasteur, F-75015 Paris, France; and fPediatric Immunology-Hematology Unit, Necker Hospital for Sick Children, 75015 Paris, France Edited* by Bruce Beutler, University of Texas Southwestern Medical Center, Dallas, TX, and approved February 15, 2013 (received for review October 19, 2012) High-throughput genomic data reveal thousands of gene variants to detect a single mutated gene, with the other polymorphic genes per patient, and it is often difficult to determine which of these being of less interest. This goes some way to explaining why, variants underlies disease in a given individual. However, at the despite the abundance of NGS data, the discovery of disease- population level, there may be some degree of phenotypic homo- causing alleles from such data remains somewhat limited. geneity, with alterations of specific physiological pathways under- We developed the human gene connectome (HGC) to over- come this problem.
    [Show full text]
  • Epigenetic Mechanisms Are Involved in the Oncogenic Properties of ZNF518B in Colorectal Cancer
    Epigenetic mechanisms are involved in the oncogenic properties of ZNF518B in colorectal cancer Francisco Gimeno-Valiente, Ángela L. Riffo-Campos, Luis Torres, Noelia Tarazona, Valentina Gambardella, Andrés Cervantes, Gerardo López-Rodas, Luis Franco and Josefa Castillo SUPPLEMENTARY METHODS 1. Selection of genomic sequences for ChIP analysis To select the sequences for ChIP analysis in the five putative target genes, namely, PADI3, ZDHHC2, RGS4, EFNA5 and KAT2B, the genomic region corresponding to the gene was downloaded from Ensembl. Then, zoom was applied to see in detail the promoter, enhancers and regulatory sequences. The details for HCT116 cells were then recovered and the target sequences for factor binding examined. Obviously, there are not data for ZNF518B, but special attention was paid to the target sequences of other zinc-finger containing factors. Finally, the regions that may putatively bind ZNF518B were selected and primers defining amplicons spanning such sequences were searched out. Supplementary Figure S3 gives the location of the amplicons used in each gene. 2. Obtaining the raw data and generating the BAM files for in silico analysis of the effects of EHMT2 and EZH2 silencing The data of siEZH2 (SRR6384524), siG9a (SRR6384526) and siNon-target (SRR6384521) in HCT116 cell line, were downloaded from SRA (Bioproject PRJNA422822, https://www.ncbi. nlm.nih.gov/bioproject/), using SRA-tolkit (https://ncbi.github.io/sra-tools/). All data correspond to RNAseq single end. doBasics = TRUE doAll = FALSE $ fastq-dump -I --split-files SRR6384524 Data quality was checked using the software fastqc (https://www.bioinformatics.babraham. ac.uk /projects/fastqc/). The first low quality removing nucleotides were removed using FASTX- Toolkit (http://hannonlab.cshl.edu/fastxtoolkit/).
    [Show full text]
  • The Human Gene Connectome As a Map of Short Cuts for Morbid Allele Discovery
    The human gene connectome as a map of short cuts for morbid allele discovery Yuval Itana,1, Shen-Ying Zhanga,b, Guillaume Vogta,b, Avinash Abhyankara, Melina Hermana, Patrick Nitschkec, Dror Friedd, Lluis Quintana-Murcie, Laurent Abela,b, and Jean-Laurent Casanovaa,b,f aSt. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065; bLaboratory of Human Genetics of Infectious Diseases, Necker Branch, Paris Descartes University, Institut National de la Santé et de la Recherche Médicale U980, Necker Medical School, 75015 Paris, France; cPlateforme Bioinformatique, Université Paris Descartes, 75116 Paris, France; dDepartment of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel; eUnit of Human Evolutionary Genetics, Centre National de la Recherche Scientifique, Unité de Recherche Associée 3012, Institut Pasteur, F-75015 Paris, France; and fPediatric Immunology-Hematology Unit, Necker Hospital for Sick Children, 75015 Paris, France Edited* by Bruce Beutler, University of Texas Southwestern Medical Center, Dallas, TX, and approved February 15, 2013 (received for review October 19, 2012) High-throughput genomic data reveal thousands of gene variants to detect a single mutated gene, with the other polymorphic genes per patient, and it is often difficult to determine which of these being of less interest. This goes some way to explaining why, variants underlies disease in a given individual. However, at the despite the abundance of NGS data, the discovery of disease- population level, there may be some degree of phenotypic homo- causing alleles from such data remains somewhat limited. geneity, with alterations of specific physiological pathways under- We developed the human gene connectome (HGC) to over- come this problem.
    [Show full text]
  • Isolation and Identification of Proteins from Swine Sperm Chromatin and Nuclear Matrix
    DOI: 10.21451/1984-3143-AR816 Anim. Reprod., v.14, n.2, p.418-428, Apr./Jun. 2017 Isolation and identification of proteins from swine sperm chromatin and nuclear matrix Guilherme Arantes Mendonça1,3, Romualdo Morandi Filho2, Elisson Terêncio Souza2, Thais Schwarz Gaggini1, Marina Cruvinel Assunção Silva-Mendonça1, Robson Carlos Antunes1, Marcelo Emílio Beletti1,2 1Post-graduation Program in Veterinary Science, Federal University of Uberlandia, Uberlandia, MG, Brazil. 2Post-graduation Program in Cellular and Molecular Biology, Federal University of Uberlandia, Uberlandia, MG, Brazil. Abstract (Yamauchi et al., 2011). According to the same authors, these active sperm chromatin sites in protamine toroids The aim of this study was to perform a may contain important epigenetic information for the proteomic analysis to isolate and identify proteins from developing embryo. the swine sperm nuclear matrix to contribute to a The isolated use of genomic and transcriptomic database of swine sperm nuclear proteins. We used pre- information may be insufficient to fully understand a chilled diluted semen from seven boars (19 to 24 week- complex organism because proteomics and old) from the commercial line Landrace x Large White transcriptomics can be discordant and DNA-RNA x Pietran. The semen was processed to separate the relationships cannot be fully correlated. Thus, sperm heads and extract the chromatin and nuclear measurements of other metabolic levels should also be matrix for protein quantification and analysis by mass obtained, such as the study of proteins (Wright et al., spectrometry, by LTQ Orbitrap ELITE mass 2012). According to these same authors, large-scale spectrometer (Thermo-Finnigan) coupled to a nanoflow protein research in organisms (i.e., the proteome-protein chromatography system (LC-MS/MS).
    [Show full text]
  • Content Based Search in Gene Expression Databases and a Meta-Analysis of Host Responses to Infection
    Content Based Search in Gene Expression Databases and a Meta-analysis of Host Responses to Infection A Thesis Submitted to the Faculty of Drexel University by Francis X. Bell in partial fulfillment of the requirements for the degree of Doctor of Philosophy November 2015 c Copyright 2015 Francis X. Bell. All Rights Reserved. ii Acknowledgments I would like to acknowledge and thank my advisor, Dr. Ahmet Sacan. Without his advice, support, and patience I would not have been able to accomplish all that I have. I would also like to thank my committee members and the Biomed Faculty that have guided me. I would like to give a special thanks for the members of the bioinformatics lab, in particular the members of the Sacan lab: Rehman Qureshi, Daisy Heng Yang, April Chunyu Zhao, and Yiqian Zhou. Thank you for creating a pleasant and friendly environment in the lab. I give the members of my family my sincerest gratitude for all that they have done for me. I cannot begin to repay my parents for their sacrifices. I am eternally grateful for everything they have done. The support of my sisters and their encouragement gave me the strength to persevere to the end. iii Table of Contents LIST OF TABLES.......................................................................... vii LIST OF FIGURES ........................................................................ xiv ABSTRACT ................................................................................ xvii 1. A BRIEF INTRODUCTION TO GENE EXPRESSION............................. 1 1.1 Central Dogma of Molecular Biology........................................... 1 1.1.1 Basic Transfers .......................................................... 1 1.1.2 Uncommon Transfers ................................................... 3 1.2 Gene Expression ................................................................. 4 1.2.1 Estimating Gene Expression ............................................ 4 1.2.2 DNA Microarrays ......................................................
    [Show full text]
  • IL-1Β Regulates FHL2 and Other Cytoskeleton-Related Genes in Human Chondrocytes
    IL-1β Regulates FHL2 and Other Cytoskeleton-Related Genes in Human Chondrocytes Helga Joos,1 Wolfgang Albrecht,2 Stefan Laufer,3 Heiko Reichel,4 and Rolf E Brenner1 1Division for Biochemistry of Joint and Connective Tissue Diseases, Department of Orthopedics, University of Ulm, Ulm, Germany; 2ratiopharm group, Department of Drug Research, Ulm, Germany; 3Institute of Pharmacy, Department of Pharmaceutical and Medicinal Chemistry, Eberhard-Karls University Tübingen, Tübingen, Germany; 4Department of Orthopedics, University of Ulm, Ulm, Germany In osteoarthritis (OA), cartilage destruction is associated not only with an imbalance of anabolic and catabolic processes but also with alterations of the cytoskeletal organization in chondrocytes, although their pathogenetic origin is largely unknown so far. Therefore, we have studied possible effects of the proinflammatory cytokine IL-1β on components of the cytoskeleton in OA chondrocytes on gene expression level. Using a whole genome array, we found that IL-1β is involved in the regulation of many cytoskeleton-related genes. Apart from well-known cytoskeletal components, the expression and regulation of four genes cod- ing for LIM proteins were shown. These four genes were previously undescribed in the chondrocyte context. Quantitative PCR analysis confirmed significant downregulation of Fhl1, Fhl2, Lasp1, and Pdlim1 as well as Tubb and Vim by IL-1β. Inhibition of p38 mitogen-activated protein kinase (MAPK) by SB203580 counteracted the influence of IL-1β on Fhl2 and Tubb expression, indicat- ing partial involvement of this signaling pathway. Downregulation of the LIM-only protein FHL2 was confirmed additionally on the protein level. In agreement with these results, IL-1β induced changes in the morphology of chondrocytes, the organization of the cytoskeleton, and the cellular distribution of FHL2.
    [Show full text]