BOOKLET Santiago
14th International Symposium on Variants in the Genome: detection, sequencing & interpretation
5 - 7 June 2017
NH Collection Santiago, Santiago de Compostela, Spain

Scientific Programme Committee
Prof. Johan T. den Dunnen (Leiden, Netherlands), CHAIR
Prof. Sir John Burn (Newcastle, UK)
Prof. Angel Carracedo (Santiago de Compostela, Spain)
Dr Reece Hart (San Francisco, CA, USA)
Dr Andreas Laner (Munich, Germany)
Dr Maria-Jesus Sobrido (Santiago de Compostela, Spain)

Organising Committee
Maria Torres (CEGEN-PRB2, Universidade de Santiago de Compostela)
Rania Horaitis, Event Manager (Meeting Makers, www.meeting-makers.com)

Previous Meetings
1991 Oxford, UK
1993 Lago D’Orta, Italy
1995 Visby, Sweden
1997 Brno, Czech Republic
1999 Vicoforte, Italy
2001 Bled, Slovenia
2003 Palm Cove, Australia
2005 Santorini, Greece
2007 Xiamen, China
2009 Paphos, Cyprus
2011 Santorini, Greece
2013 Lake Louise, Canada
2015 Leiden, Netherlands

Oral Presentation Abstracts

SESSION 1

Phenotype Driven Genomic Diagnostics

Peter N Robinson

Professor of Computational Biology, The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
www.jax.org
Robinson lab: https://robinsongroup.github.io/

The analysis of phenotype plays a key role in clinical practice and medical research but is difficult to “compute” on. The Human Phenotype Ontology (HPO) provides a standardized, controlled vocabulary that allows phenotypic information to be described in an unambiguous fashion and has been adopted by a number of groups including the NIH Undiagnosed Diseases Program (UDP) and the UK 100,000 Genomes Project. The HPO makes clinical data “computable”, and allows a shift from binary phenotype analysis (patient cohort vs. control) to an analysis of phenotypic profiles that offers a foundation for computational approaches to integrating clinical data in precision medicine. HPO-based algorithms integrate phenotype specificity, imprecision, noise and frequency to identify matching diseases and patients. We have developed Exomiser to identify promising candidate genes in whole exome sequencing (WES) studies by ranking candidate genes according to phenotypic similarity to human, mouse, and zebrafish mutant phenotypes. Analyses using simulated exomes and the NIH UDP patient cohort showed that Exomiser ranked the causal variant as the top hit in 97% of known disease–gene associations. We have developed a machine learning algorithm that provides a pathogenicity score for each base of the non-coding genome, and extended the Exomiser to include an assessment of regulatory regions, in an application called the Genomiser. Simulations show an ability to rank the seeded causative variant in first place over an entire genome in over 60% of cases. In this talk, I will review how computational phenotype analysis with the HPO works and how we have used it for the Exomiser and Genomiser.

CNV detection from targeted next-generation panel sequencing data in routine diagnostics

Anna Benet-Pagès, Anke M. Nissen, Janine Graf, Christina Rapp, Melanie Locher, Andreas Laner, Elke Holinski-Feder

Medizinisch Genetisches Zentrum, MGZ, Munich, Germany

Gene dosage abnormalities account for a significant proportion of pathogenic mutations in rare genetic disease related genes. In times of next generation sequencing (NGS), a single analysis approach to detect SNVs and CNVs from the same data source would be of great benefit for routine diagnostics. However, CNV detection from exon-capture NGS data has no standard methods or quality measures so far. Current bioinformatics tools depend solely on read depth, which is systematically biased. We developed a novel approach based on: (1) utilization of five independent detection tools to increase sensitivity; (2) different reference sets for different kits and normalization against samples from the same sequencing run to improve robustness against workflow conditions; (3) definition of special quality thresholds for single exon events to minimize false negatives; (4) identification of reliable regions by assessment of capture efficiency using a reference set of CNV negative patients to minimize false positives. A CNV is called in a reliable region if at least two out of five tools are concordant for the respective CNV. The pipeline shows a sensitivity of 80% and a precision of 95%. Within routine gene panel diagnostics we analyzed a total of 1088 patients indicated to have rare Mendelian diseases for SNVs and CNVs. In 32 patients a CNV was detected in genes associated with the respective individual phenotype. Interestingly, in several cases the CNV completed the patient's report as it was detected in genes with a recessive mode of inheritance where previously only a heterozygous pathogenic SNV was found. Overall, with the additional analysis of CNVs we increased the diagnostic yield from 15% (class 4, 5 single nucleotide events) to 18%. However, there are still issues in the detection of CNVs from NGS data for routine diagnostics. CNV pipelines are very prone to errors caused by enrichment inconsistencies compared to SNV detection tools. The assessment of sensitivity and specificity is difficult due to the lack of datasets to validate CNV detection pipelines. Originally, the analysis of CNVs was performed mainly in patients with mental retardation disorders, resulting in a paucity of CNV data linked to other Mendelian diseases. Moreover, the identification of the actual size and thus the assessment of pathogenicity of a CNV is difficult, because targeted NGS gene panels do not cover all genes. In conclusion, NGS data is a suitable data source for the simultaneous detection of SNVs and CNVs for clinical diagnosis; however, with the current tools it is only applicable in accurately validated regions.

GeneHancer and VarElect: disease interpretation of whole genome sequence variants

Simon Fishilevich, Naomi Rosen, Michal Twik, Rotem Hadar, Tsippi Iny-Stein, Marilyn Safran and Doron Lancet

Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 7610001, Israel

Purpose: The emergence of whole genome sequencing (WGS) poses considerable challenges to variant disease interpretation. A typical WGS of an individual identifies ~5 million non-reference variants, a 50-fold increase compared to whole exome sequencing (WES). A considerable proportion of this “variant avalanche” (10-15%) resides within transcription regulatory elements: promoters and enhancers. Promoters are relatively easy to identify due to their stereotyped positioning at the immediate 5’ neighborhood of genes, and their targets are quite obvious. In contrast, the identification of enhancers constitutes a major challenge. An equally difficult task is identifying the connections between these distant-acting regulatory elements and their target genes. Genomic enhancers are centrally involved in the spatiotemporal orchestration of gene expression in embryonic development and in cell differentiation. This makes them prime novel targets for annotating the plethora of non-coding variants in WGS, and interpreting them in the realms of health and diseases.

Methodology: We created GeneHancer, a novel regulatory element database, in the framework of the GeneCards suite (www.genecards.org). We integrated four enhancer data sources: a) 176,000 enhancer regions from the ENCODE project (https://www.encodeproject.org/); b) 213,000 elements from the Ensembl regulatory build (http://www.ensembl.org/index.html); c) 43,000 elements from FANTOM (http://fantom.gsc.riken.jp/), identified via enhancer RNAs (eRNAs); d) 1,700 experimentally-validated elements from the VISTA enhancer browser (http://enhancer.lbl.gov/). Subsequently, we consolidated gene-enhancer links obtained by five methodologies: a) GTEx expression quantitative trait loci (eQTLs, http://www.gtexportal.org/home/); b) Capture Hi-C promoter-enhancer long range interactions (PMID 25938943); c) FANTOM expression correlations between eRNAs and candidate target genes; d) Expression correlations between enhancer-targeted transcription factors and genes; e) Enhancer-gene genomic distance scores.

Results: GeneHancer portrays 285,000 integrated non-redundant candidate enhancers (covering 12.4% of the genome), along with annotation-derived confidence scores. In parallel, our database incorporates ~1.02 million integrated and scored gene-enhancer links involving 101,337 genes. Among these, we define a subset of “double elite” enhancer-gene pairs, based on the conjunction of two or more methods for both entities. This allows WGS variants within enhancers to be interpreted with high confidence, based on high-probability target gene links. These WGS analysis capabilities are being embedded within the GeneCards Suite, among others by modifying VarElect and TGex, its next generation sequencing (NGS) disease interpretation tools [PMID: 27357693]. For WES, VarElect prioritizes a list of variant-containing genes by seeking the relevance of

Chromium™: Full spectrum genome analysis with Linked-Reads

Steve Glavas, Sarah Garcia, Claudia Catalanotti, Haynes Heaton, Patrick Marks, Michael Schnall-Levin, Stephen Williams, Andrew Wei Xu, Grace Zheng, Deanna M. Church

10x Genomics, Inc.

High-throughput sequencing has revolutionized genome analysis. However, it is clear that traditional short read methods provide an incomplete view of the genome and result in an incomplete understanding of the clinical and biological complexity present. In particular, the lack of long-range information combined with inherent limitations such