Bioinformatic Characterization and Analysis of Polymorphic Inversions in the Human Genome

Author: Alexander Martínez Fundichely
DOCTORAL THESIS UPF / YEAR 2013
Supervisor: Dr. Mario Cáceres Aguilar
Tutor: Dr. Xavier Estivill Pallejà
Department of Experimental and Health Sciences

A thesis submitted for the degree of Philosophiæ Doctor (PhD) in Biomedicine in the Department of Experimental and Health Sciences of the Universitat Pompeu Fabra (UPF), by the doctoral candidate MSc. Alexander Martínez Fundichely.

Supervisor: Dr. Mario Cáceres Aguilar, principal investigator at the Institut de Biotecnologia i de Biomedicina (IBB), Universitat Autònoma de Barcelona (UAB), Bellaterra, Barcelona, Spain, and member of the Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain.

Tutor: Dr. Xavier Estivill Pallejà, senior investigator at the Center for Genomic Regulation (CRG), Barcelona, Catalonia, Spain, and associate professor at Pompeu Fabra University (UPF).

Day of the defense: 12/12/2013

To my Mom
To my maternal grandma
Especially to my missing grandparents and, from the bottom of my heart, I wish to share this achievement with my Dad (without you physically present, everything is harder than what we talked about before, but step by step I keep on playing).

ACKNOWLEDGEMENTS

First and foremost I offer my sincerest and utmost gratitude to my senior supervisor, Dr. Mario Cáceres, who has supported me throughout my graduate studies with his encouragement, guidance and resourcefulness. In addition to his experience in genomics, which was a crucial factor in the success of this thesis, I have learned from him what it takes to be a good researcher. I thank him once more for all his help. I would like to give my regards and appreciation to Dr. Xavier Estivill; this whole story began when he accepted my visit to his lab.
Starting from the very beginning, I would also like to thank my other collaborators at the CRG, and especially the people in the IBB lab, really nice people who offered me friendship and collaboration; all of them, without exception, have been important in the development of this thesis. Their motivation was a huge driving force for me. Special thanks to my new family for the love, affection and support they have given me over all these years, which helped me to overcome the sadness of being far away from my loved ones and to feel like one more Catalan. I would also like to thank all my friends (Cuban, Catalan, Spanish...); every one of them is included here even though no list of names appears, and their help has been very important in this achievement. Last, but not least, I thank my mom and grandma, whose support even from a distance has been essential to reaching this goal, and all the other members of my Cuban family who take an interest in me and always help my mother when she needs it.

Alexander Martinez Fundichely

ABSTRACT

Amid the great interest in characterizing genomic structural variants (SVs) in the human genome, inversions present unique challenges and have been little studied. This thesis has developed GRIAL, a new algorithm focused specifically on detecting and accurately mapping inversions from paired-end mapping (PEM) data, which is the most widely used method to detect SVs. GRIAL is based on geometrical rules to cluster, merge and refine both breakpoints of putative inversions. In this way, we have been able to predict hundreds of inversions in the human genome. In addition, thanks to the different GRIAL quality scores, we have been able to identify spurious PEM patterns and their causes, and to discard a large fraction of the predicted inversions as false positives.
Furthermore, we have created InvFEST, the first database of human polymorphic inversions, which represents the most reliable catalogue of inversions and integrates all the associated information from multiple sources. Currently, InvFEST combines information from 34 different studies and contains 1092 candidate inversions, which are categorized based on internal scores and manual curation. Finally, the analysis of all the data generated has provided information on the genomic patterns of inversions, contributing decisively to the understanding of the map of human polymorphic inversions.

RESUMEN

Within the study of structural variants in the human genome, inversions present specific challenges and have been poorly characterized. This thesis addresses this problem through the implementation of GRIAL, a new algorithm specifically designed to detect and precisely locate inversions from paired-end mapping (PEM) data, which is the most widely used method to study structural variation. GRIAL is based on geometrical rules to group the PEM patterns corresponding to the possible breakpoints and to refine their location for each inversion. The GRIAL results allowed us to predict hundreds of inversions in the human genome. In addition, thanks to the creation of reliability indices for the predictions, it has been possible to identify incorrect inversion patterns and their causes, discarding a large number of possibly false predictions. Furthermore, InvFEST has been created, the first database dedicated to polymorphic inversions in the human genome, which represents the most reliable catalogue of inversions and integrates all the associated information available from multiple sources. Currently, InvFEST combines information from 34 different studies and includes 1092 inversions classified according to internal criteria and manual annotation. Finally, the analysis of all the information generated has allowed us to describe the genomic patterns of inversions, contributing decisively to deciphering the map of human polymorphic inversions.

PREFACE

A little more than a decade after the completion of the Human Genome Project, the efficiency of DNA sequencing has drastically improved. High-throughput next-generation sequencing (NGS) technologies have been reducing the cost and increasing the capacity of sequence production for thousands of individuals at an unprecedented rate within different international projects [1000 Genomes Project et al., 2012; International Cancer Genome et al., 2010]. These newest sequencing technologies, added to traditional Sanger sequencing, have significantly changed how genomic research is conducted, and have provided new insights into the genetic basis of phenotypic and disease-susceptibility differences between individuals by uncovering an unprecedented degree of structural variation in the human genome.

Although the Database of Genomic Variants (DGV) shows the great advances made in identifying several types of variation among individual genomes, genomic inversions have been relatively disregarded compared to copy number variations (CNVs). They are difficult to study, and their analysis in sequenced genomes is very challenging, mainly because of the complex and repetitive nature of human genomes, which becomes a much harder problem when dealing with balanced rearrangements.

This thesis starts in chapter 1 with a general introduction to the study of structural variation, with particular emphasis on inversions. This chapter sets out the scientific problem addressed and the main objective of the thesis project.
The results have three integral parts that are strongly connected to each other from a methodological point of view; each part addresses a different level of the study of inversions, and they are presented as three manuscripts in the results chapters (2, 3 and 4). The first result, in chapter 2, is focused on genome sequence analysis, in particular the development of computational methods for structural variation discovery in sequenced genomes. We present our effort in developing a novel paired-end mapping algorithm for specifically identifying inversions, adding two scores to assess the reliability of the predictions, as well as a new dataset of inversion predictions discovered in multiple sample genomes. In this part, we also address some problems of paired-end sequencing experiments based on fosmid clones. The second result, in chapter 3, is focused on developing a database for the comprehensive integration of all information about inversions in the human genome. In this case we create an alternative data source centered on human inversion studies, to store the predicted inversions and their accurate breakpoints, validation status and population distribution, among other data. The third and last result, in chapter 4, is focused on the descriptive analysis of genomic patterns in the current information about human inversion polymorphisms. We discuss several features of the most reliable catalogue of inversions, which represent a preliminary characterization of this variant in the human genome that is paramount for a better understanding of the possible biases and trends in the detection of inversions in the human genome. Chapter 5 is a general discussion that presents, as an integrated global result, the three topics developed in the thesis project and their contributions. The thesis concludes with chapter 6, which gives the general conclusions of the thesis.
The different results of this work have been presented at the RECOMB/ISCB 2012 and ISMB/ECCB 2012 and 2013 conferences. Chapter 3 is already published in Nucleic Acids Research (NAR) 2013. Finally, the appendix presents two scientific articles to which work carried out during this thesis has contributed.

CONTENTS

Abstract
Resumen
Preface
List of Tables
List of Figures
Glossary
1 General introduction
  1.1 The study of the human genome
  1.2 Genomic structural variation
  1.3 Copy number variation (CNV)
  1.4 Genomic inversions
    1.4.1 Origin of genomic inversions
      1.4.1.1 Homologous recombination mechanisms
      1.4.1.2 Non-homologous and microhomology mechanisms
    1.4.2 Inversion effects
      1.4.2.1 Mutational effect