CENTER FOR INDIVIDUALIZED MEDICINE
Clinical Variant Interpretation Lab
June 22, 2021
Joel A. Morales-Rosado, MD
©2012 MFMER
Variant vs. Gene Information
We have to consider information at two levels:
Gene
• Is the gene central to processes related to disease development? (e.g. growth, development, multifactorial)
• Is the gene sensitive to perturbation? (e.g. haploinsufficiency)
Variant
• What is the variant effect on the gene product?
• Is the variant causative of disease?
• Is it clinically actionable?
Tools for Gene / Variant Interpretation
Disease association
Variant Frequency
Deleteriousness or Biological Consequence?
Genotype-Phenotype Databases
Disease association: Known vs. Unknown
• Known – Variant of Uncertain Significance (VUS): “Variant in a gene known to be related to the clinical question or a disease”
• Unknown – Gene of Uncertain Significance (GUS): “A connection to a human disease has not been established.”
Is there enough evidence to confirm or rule out pathogenicity?
Second part of the course, “novel disease associations”: Gene of Uncertain Significance (GUS)
PCAN
OMIM: Online Mendelian Inheritance in Man
1966: 1,467 phenotype entities
January 16th 2020 update: 25,302 entries
• 16,253 – gene descriptions
• 5,709 – phenotype, molecular basis known
• 1,552 – phenotype, molecular basis unknown
• “Gene” level information and curated summary of the history of the gene
• OMIM is most useful in understanding the connection of a specific gene to disease
http://omim.org/
Some genes have several phenotypes associated with them – Pleiotropy
Other genes have no phenotype associated with them, but there may still be useful information about function of the gene
The “phenotype” entry name can be misleading and usually is not explanatory enough – look at the detailed description and always read the text
Also keep in mind the “creation” and “edit” dates
Every OMIM page has a “References” section that you can review for more details, but it is better to complement it with a PubMed search, especially for publications after the last “Edit” date
GeneCards
Comprehensive database with multiple links to other helpful resources.
The gene pages contain:
• Summaries
• Protein information
• Antibody products
• Associated diseases
• Protein localization
• Pathways
• Drugs
• Expression
• Orthologs/paralogs
• …
http://genecards.org/
UniProt
Protein-oriented database
Useful to go beyond the DNA sequence alterations and get information about the possible effect in the context of the protein structure
http://uniprot.org/
GeneReviews
Overview of the gene and related phenotype, molecular genetics (MolGen) and various diagnostic testing methods
https://www.ncbi.nlm.nih.gov/books/NBK1326/
HGMD®: The Human Gene Mutation Database
A good overview of BOTH the variant spectrum of genes and the linked disease phenotypes
Updated monthly
“Public version” vs. “professional version”: almost 2x more mutation entries
When exploring a gene-disease association, always think about the nature of the variants: are they mostly missense? Are they deletions? Nonsense?
• Includes specific variants and their disease association, with curated information and links to manuscripts relevant to that variant
• Classifications are independent of ACMG guidelines (e.g. DM – disease mutation) and they are not always accurate: REVIEW THE REFERENCES
• You can get a good overview of the published history of that specific variant in chronological order.
http://www.hgmd.cf.ac.uk/ac/index.php
ClinVar
Released for the first time in April 2013.
• Specific variants submitted by different clinical laboratories
• Information is not curated and the classification often comes from the laboratory’s own classification.
• Unfortunately, classification information is often limited, and different reporters can reach contradictory conclusions
http://www.ncbi.nlm.nih.gov/clinvar/
The more submitters reporting with no conflicts, the more reliable the classification probably is
Again, keep in mind the dates of submission. The same lab can change its pathogenicity assessment of the same variant if more information becomes available
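The two points above (agreement between submitters, and submission dates) can be sketched in a few lines of Python. This is only an illustration: the tuple layout, the lab names and the function name `summarize_submissions` are hypothetical and are not ClinVar's data model or API.

```python
from collections import Counter


def summarize_submissions(submissions):
    """Summarize (submitter, year, classification) tuples; the layout is hypothetical."""
    # Keep only the most recent assertion per submitter, because a lab
    # may revise its classification when more information is available.
    latest = {}
    for submitter, year, classification in submissions:
        if submitter not in latest or year > latest[submitter][0]:
            latest[submitter] = (year, classification)

    calls = Counter(classification for _, classification in latest.values())
    return {
        "n_submitters": len(latest),
        "calls": dict(calls),
        "conflicting": len(calls) > 1,  # more than one distinct classification
        "oldest_year": min(year for year, _ in latest.values()),
    }


# Made-up example: Lab A revised its call between 2014 and 2019.
example = [
    ("Lab A", 2014, "Uncertain significance"),
    ("Lab A", 2019, "Likely pathogenic"),
    ("Lab B", 2018, "Likely pathogenic"),
]
summary = summarize_submissions(example)
```

A concordant summary from several recent submitters is still only a starting point; the underlying evidence in each submission should be read.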
ClinVar
Always check the year!
1. ACMG criteria applied?
2. ExAC / gnomAD… BEFORE? Allele frequencies update.
No further evidence provided
Detailed summary provided
ClinGen
• Authoritative central resource that defines the clinical relevance of genes and variants for use in precision medicine and research.
• Standardizes clinical annotation and interpretation of variants
• 1734 genes in 2020
COSMIC
Somatic variants, mostly in the context of cancer
DECIPHER: DatabasE of genomiC varIation and Phenotype in Humans using Ensembl Resources
• Incorporates a suite of tools designed to aid the interpretation of genomic variants.
• International community of academic departments of clinical genetics and rare disease genomics, now numbering more than 250 centres, with more than 35,000 cases uploaded (2020).
• Biased toward neurodevelopmental disorders and CNV/SV alterations.
https://decipher.sanger.ac.uk/
https://decipher.sanger.ac.uk/search?q=gene%3APKHD1#consented-patients/results
DIDA: Digenic Diseases Database
Novel database that provides for the first time detailed information on genes and associated genetic variants involved in digenic diseases, the simplest form of oligogenic inheritance.
Last update: July 2017
http://dida.ibsquare.be/
Disease association: Known vs. Unknown
Variant Frequency: Rare vs. Common
Allele Frequency Databases
Allele frequency and disease penetrance
• Age of disease onset?
• Effect and frequency of protective variation?
Exome Aggregation Consortium (ExAC)
https://www.nature.com/articles/nature19057
gnomAD browser
• 125,748 exome sequences
• 15,708 whole-genome sequences
• 141,456 individuals
http://gnomad.broadinstitute.org/
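As a quick arithmetic check, a population allele frequency is the allele count divided by the allele number (the number of chromosomes successfully genotyped at that site). The sketch below is a minimal illustration: the function name is hypothetical, and the counts are the ones quoted in this deck (the gnomAD cohort sizes above and the RTEL1 entry on the later summary slide).

```python
def allele_frequency(allele_count, allele_number):
    """Return allele count / allele number as a fraction."""
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    return allele_count / allele_number


# Cohort size quoted on the gnomAD slide: exomes plus genomes.
total_individuals = 125_748 + 15_708

# RTEL1 c.932C>T counts quoted on the summary slide: 61 alleles out of 282,590.
af = allele_frequency(61, 282_590)
af_percent = round(af * 100, 3)  # expressed as a percentage
```

The allele number is at most twice the number of individuals for an autosomal site, and it is lower wherever coverage or call quality drops, which is one reason to check the quality of the call as well as the frequency.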
Constraint information (intolerance to variation)
• Z-score: deviation of observed counts from the expected number of variants for a specific gene. The scale runs from negative to positive; increased constraint is marked at 3.0.
• pLI value: probability that a gene is haploinsufficient, i.e. that heterozygous LoFs are not tolerated. The scale runs from 0 to 1; extremely intolerant is marked at 0.9.
This is not absolute – interpret with caution! Factors that modify this include variable expressivity and penetrance, and some specific regions of a gene may be more intolerant to missense variation than others.
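A minimal sketch of how these two markers might be applied when screening a gene list. The cutoffs 0.9 and 3.0 are simply the values marked on the slide and are treated as adjustable defaults; the function name and the flag labels are hypothetical.

```python
def constraint_flags(pli=None, z_score=None, pli_cutoff=0.9, z_cutoff=3.0):
    """Return a list of constraint flags for one gene.

    Cutoffs default to the markers shown on the slide (assumption: values at
    or above the marker are flagged). Missing metrics are skipped, not guessed.
    """
    flags = []
    if pli is not None and pli >= pli_cutoff:
        flags.append("LoF-intolerant (pLI)")
    if z_score is not None and z_score >= z_cutoff:
        flags.append("constrained (Z-score)")
    return flags


# Made-up values for illustration only.
high = constraint_flags(pli=0.99, z_score=3.4)
low = constraint_flags(pli=0.02, z_score=0.5)
partial = constraint_flags(pli=0.95)
```

An empty list means only that no gene-level flag was raised; as noted above, individual regions of a gene can still be intolerant.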
https://www.nature.com/articles/nature19057
http://gnomad.broadinstitute.org/gene/ENSG00000170927
http://gnomad.broadinstitute.org/variant/6-51944718-G-A
JUST GETTING STARTED – Many projects underway
• US – Million genome project; MC Biobank will serve as the repository.
• UK – 100,000 Genomes Project
• Chinese Million Genomes endeavor – BGI (Beijing Genomics Institute); Phase 1 – 140K – October 2018
• Personal Genome Projects (PGP): 2005 – Harvard PGP (United States); 2012 – PGP Canada (Canada); 2013 – PGP UK (United Kingdom); 2014 – GenomAustria (Austria); 2017 – PGP China (People's Republic of China)
• Asian Genome Project
Disease association: Known vs. Unknown
Variant Frequency: Rare vs. Common (Allele Frequency Databases)
“Deleteriousness”: Assessment of Biological Consequence
• Gene Product Function & Pathways
• Variant Effect on Gene Product
Gene Product Effect: Protein-coding
• Nonsense: adds a STOP codon → truncated protein
• Frameshift In/Del: shifts the reading frame → protein translated incorrectly from that point
• Splicing: alters key sites guiding splicing
• In-frame In/Del: removes/adds one or more amino acids
• Stoploss: loss of STOP codon → additional coding region added to protein
• Missense: modifies 1 amino acid
• Synonymous: no amino acid change
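The categories above follow directly from the standard genetic code and from the length of an insertion or deletion. The sketch below illustrates this for a single codon and for a simple indel; the function names are hypothetical, and splicing effects are not covered.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_change(ref_codon, alt_codon):
    """Classify a substitution by comparing the translated reference and alternate codons."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"   # new STOP codon
    if ref_aa == "*":
        return "stoploss"   # STOP codon lost
    return "missense"


def classify_indel(ref_len, alt_len):
    """Classify a coding indel by whether the length change is a multiple of 3."""
    delta = abs(alt_len - ref_len)
    if delta == 0:
        raise ValueError("not an insertion or deletion")
    return "in-frame indel" if delta % 3 == 0 else "frameshift indel"
```

For example, `classify_codon_change("CAG", "TAG")` corresponds to a Gln to STOP change of the kind written p.Q156*, and `classify_indel(1, 0)` to a single-base deletion.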
Variant Effect on Gene Product
Impact on protein structure: Synonymous, Missense, Loss of Function
Still be careful… They look benign, but they may not be…
Loss of Function (LOF) Variants
• Nonsense: c.466C>T → p.Q156*
• Frameshift: c.2247delT → p.Asp749Glufs*4
• Splicing: c.1685+3T>G → p.? (exon skipping, in-frame, out of frame)
These are more disruptive, but that does not necessarily mean they are deleterious:
• What percentage of the protein is affected?
• NMD?
• Are there multiple transcript isoforms?
• Splicing effect is difficult to predict
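The coding-DNA descriptions above have a regular shape, so a small parser can pull out the position and the intronic offset. This is a deliberately narrow sketch: the regular expression handles only simple substitutions and deletions like the examples in this deck, not the full HGVS nomenclature, and the function name and region labels are hypothetical. Treating offsets of ±1 and ±2 as the canonical splice site is the usual convention.

```python
import re

# Matches e.g. c.466C>T, c.2247delT, c.1685+3T>G, c.1527-2A>G (narrow subset only).
CDNA_PATTERN = re.compile(
    r"^c\.(?P<pos>\d+)(?P<offset>[+-]\d+)?"
    r"(?:(?P<ref>[ACGT])>(?P<alt>[ACGT])|del(?P<deleted>[ACGT]*))$"
)


def parse_cdna(hgvs):
    """Parse a simple c. substitution or deletion into its parts."""
    match = CDNA_PATTERN.match(hgvs)
    if match is None:
        raise ValueError(f"unsupported description: {hgvs}")
    offset = int(match.group("offset") or 0)  # 0 means no intronic offset
    kind = "substitution" if match.group("ref") else "deletion"
    if offset == 0:
        region = "exonic"
    elif abs(offset) <= 2:
        region = "canonical splice site"
    else:
        region = "intronic"
    return {
        "position": int(match.group("pos")),
        "offset": offset,
        "kind": kind,
        "region": region,
    }
```

A variant such as c.1685+3T>G falls just outside the canonical dinucleotide, which is one reason its splicing effect is hard to predict from position alone.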
Missense Variants
How do we tell if a missense alters protein function?
• Type of amino acid change: size differences, electrochemical properties
• Conservation across species
• Conserved protein domain
• Secondary protein structure
• Tertiary (3D) protein structure
• Other functional features (PTM)
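The first item in the list, the type of amino acid change, can be illustrated with a coarse lookup of side-chain classes. The five-class grouping below is one common textbook scheme and is an assumption made for this sketch; other groupings exist, and a class change is only one line of evidence.

```python
# Coarse side-chain classes (one common textbook grouping, assumed here).
_GROUPS = {
    "nonpolar": "GAVLIMP",
    "aromatic": "FWY",
    "polar": "STCNQ",
    "positive": "KRH",
    "negative": "DE",
}
AA_CLASS = {residue: name for name, residues in _GROUPS.items() for residue in residues}


def class_change(ref, alt):
    """Return (ref class, alt class, changed?) for one-letter amino acid codes."""
    ref_class = AA_CLASS[ref.upper()]
    alt_class = AA_CLASS[alt.upper()]
    return ref_class, alt_class, ref_class != alt_class
```

For instance, a Trp to Arg substitution (`class_change("W", "R")`) replaces a large aromatic residue with a positively charged one, while a Thr to Ile substitution (`class_change("T", "I")`) replaces a polar residue with a nonpolar one.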
Conservation
Conservation is a powerful and broadly used idea
• How conserved is a given nucleotide or genomic interval, comparing different species to human?
• How conserved is an amino acid in a protein sequence?
Available from UCSC (nucleotide conservation):
• PhyloP score – useful to assess single variants; tests whether nucleotide substitution rates are faster or slower than expected under neutral drift
• PhastCons score/element – useful to assess putative regulatory regions and genes not coding proteins
• Multi-species alignment – generally useful
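To make the multi-species alignment idea concrete, the sketch below computes, for one alignment column, the fraction of other species that share the human residue. This is a simple identity measure on a made-up toy alignment, not PhyloP or PhastCons, which are model-based scores; the function name and the species sequences are hypothetical.

```python
def column_identity(alignment, column, human="human"):
    """Fraction of non-human sequences matching the human residue at one column.

    Gaps ("-") are ignored; returns None if no other species is informative.
    """
    human_residue = alignment[human][column]
    others = [seq[column] for name, seq in alignment.items() if name != human]
    informative = [residue for residue in others if residue != "-"]
    if not informative:
        return None
    return sum(residue == human_residue for residue in informative) / len(informative)


# Toy protein alignment, invented for illustration.
toy = {
    "human": "MTWK",
    "mouse": "MTWR",
    "chicken": "MSWK",
    "zebrafish": "M-WQ",
}
conserved = column_identity(toy, 2)   # W in every species
variable = column_identity(toy, 3)    # K matched by one of three species
```

A fully conserved column across distant species is suggestive of functional importance, but the number and evolutionary spread of the species compared matters as much as the fraction itself.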
Missense Variants: protein-level features can be reviewed at https://www.uniprot.org/
Synonymous variants are not always “silent”…
Does no amino acid change equal no functional effect? Often, but not always!
• Codon usage
• Cryptic regulatory sequences (e.g. splicing enhancers)
• Strong conservation can be suggestive
• No broadly used tool to handle these
Be careful with mutations at the exon-intron boundary
Functional Prediction Algorithms
Lots of different algorithms!
Tools for different types of variants
Splice variants:
• Human Splicing Finder 3.1
• ESE Finder
• SpliceSiteFinder-like
• MaxEntScan
• NNSplicer
• GeneSplicer
• SpliceAI
Point mutation / missense variant / nonsynonymous variant:
• M-CAP
• PredictSNP2
• SIFT
• CADD
• PolyPhen-2
Many others…
http://www.humgen.nl/SNP_databases.html
Splice Variants:
CRYPTIC SPLICE SITES CAN BE CREATED…
Human Splicing Finder
• Calculates the consensus values of potential splice sites and searches for branch points.
• Integrates all available matrices to identify exonic and intronic motifs, as well as new matrices to identify hnRNP A1, Tra2-β and 9G8.
INPUT → ERCC6 c.1527-2A>G
OUTPUT → [figure]
http://www.umd.be/HSF3/index.html
Polyphen-2 (Polymorphism Phenotyping v2) Tool
• Predicts possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations.
• Integrates multiple features: 8 sequence-based, 3 structure-based (nucleotide and amino acid level)
• Alterations are appraised “qualitatively”: Benign, Possibly damaging, Probably damaging
http://genetics.bwh.harvard.edu/pph2/
CADD Score
Scores deleteriousness of single nucleotide variants as well as insertion/deletion variants in the human genome.
It considers:
• Allelic frequencies
• Conservation metrics
• Functional genomic data
• Protein-level scores like Grantham, SIFT, and PolyPhen.
How to interpret them? PHRED-scale meaning:
a) CADD > 30 – top 0.1%
b) CADD > 20 – top 1.0%
c) CADD > 10 – top 10%
d) CADD < 10 – 7.74 billion
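The PHRED-scale thresholds above follow from the definition of a PHRED-like score, which is minus ten times the base-10 logarithm of the rank fraction. A minimal sketch of the conversion in both directions, with hypothetical function names:

```python
import math


def phred_to_top_fraction(phred_score):
    """Fraction of ranked variants at or above a PHRED-scaled score."""
    return 10 ** (-phred_score / 10)


def rank_to_phred(rank, total):
    """PHRED-scaled score for the variant ranked `rank` (1 = most deleterious) out of `total`."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return -10 * math.log10(rank / total)


top_at_10 = phred_to_top_fraction(10)   # about 0.1, i.e. top 10%
top_at_20 = phred_to_top_fraction(20)   # about 0.01, i.e. top 1%
top_at_30 = phred_to_top_fraction(30)   # about 0.001, i.e. top 0.1%
```

Because the scale is logarithmic, the step from 20 to 30 is a tenfold change in rank, not a 50% increase in "deleteriousness".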
M-CAP Score
Input: [figure]
Output: [figure]
http://bejerano.stanford.edu/mcap/
PredictSNP2
Constructed three datasets covering different types of disease-related variants, which were divided across five categories: (i) regulatory, (ii) splicing, (iii) missense, (iv) synonymous, and (v) nonsense variants. These datasets were used to develop category-optimal decision thresholds and to evaluate six tools for variant prioritization: CADD, DANN, FATHMM, FitCons, FunSeq2 and GWAVA. This evaluation revealed some important advantages of the category-based approach. The results obtained with the five best-performing tools were then combined into a consensus score.
https://loschmidt.chemi.muni.cz/predictsnp2/
Things to keep in mind: What features are used to calculate it?
• Nucleotide / amino acid conservation
• Amino acid physicochemical properties
• Conservation metrics
• Transcript diversity
Scores are not deterministic of biological effect/deleteriousness; they are used as “supporting evidence”
gDNA: Chr6(GRCh37):g.51720765A>G
cDNA: NM_138694.3(PKHD1):c.7837T>C
Protein: p.Trp2613Arg
Polyphen-2: Probably damaging
CADD: 29
M-CAP: Probably
PredictSNP2: Deleterious
Scores agree towards SNV being deleterious
Likelihood of pathogenicity is affected, not determined.
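One way to keep predictor output in its place as supporting evidence is to record only whether the tools agree, never a verdict. The sketch below assumes each tool's call has already been reduced to True (predicts deleterious), False (predicts benign) or None (no call); that reduction, the function name and the status labels are hypothetical.

```python
def supporting_evidence(calls):
    """Summarize agreement between in silico predictors.

    `calls` maps a tool name to True (deleterious), False (benign) or None (no call).
    The result describes concordance only; it is not a pathogenicity classification.
    """
    informative = {tool: call for tool, call in calls.items() if call is not None}
    n_total = len(informative)
    n_deleterious = sum(informative.values())
    if n_total == 0:
        status = "no in silico evidence"
    elif n_deleterious == n_total:
        status = "supports deleterious effect"
    elif n_deleterious == 0:
        status = "supports benign effect"
    else:
        status = "conflicting predictions"
    return {"deleterious": n_deleterious, "total": n_total, "status": status}


# Made-up calls for illustration only.
agreeing = supporting_evidence({"tool_a": True, "tool_b": True, "tool_c": None})
mixed = supporting_evidence({"tool_a": True, "tool_b": False})
```

Many predictors share input features such as conservation, so agreement between them is weaker evidence than agreement between independent sources.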
Try to get a “big picture” and assemble all the information in a single place
TERT HET c.1885G>A p.Gly629Arg
• Parents: Unknown
• VAF: ???
• AD/AR Disease:
• GnomAD: Not described
• ClinVar: Not described
• HGMD: DM? – Pulmonary Fibrosis – CM173470
• In silico (SIFT/Polyphen/Mutation Taster/MCAP): Deleterious/Probably Damaging/Disease Causing/Pos.. Pathogenic
• Splicing: No splicing effect predicted by SpliceAI
• Location: Chr5 – Exon 4/16
• Conservation: Moderately conserved across species
• Constraint: Intolerant to MS and LoF
• Comments: Telomerase is a ribonucleoprotein polymerase that maintains telomere ends by addition of the telomere repeat TTAGGG. The enzyme consists of a protein component with reverse transcriptase activity, encoded by this gene, and an RNA component which serves as a template for the telomere repeat. Telomerase expression plays a role in cellular senescence, as it is normally repressed in postnatal somatic cells resulting in progressive shortening of telomeres.
RTEL1 HET c.932C>T p.Thr311Ile
• Parents: Unknown
• AD/AR Disease:
• gnomAD: ALL: (61/282590 – 0 Hom) 0.022% - AMR: 0.10% - NFE: 0.012% - OTH: 0.14%
• ClinVar: 1 submission (573620) – VUS
• HGMD: Not reported
• In silico (SIFT/Polyphen/Mutation Taster/MCAP): Tolerated/Benign/Polymorphism/Possibly Pathogenic
• Location: Chr20 – Ex10 of 35 (Helicase ATP binding)
• Conservation: Not conserved across species
• Constraint: Tolerant to MS
• Comments: This gene encodes a DNA helicase which functions in the stability, protection and elongation of telomeres and interacts with proteins in the shelterin complex known to protect telomeres during DNA replication. Read-through transcription of this gene into the neighboring downstream gene, which encodes tumor necrosis factor receptor superfamily, member 6b, generates a non-coding transcript.
Pathogenicity Interpretation: Disease association, Variant Frequency, Biological Consequence, Other information
Always remember:
a) Verify frequency (and the quality of the calling)
b) Disease associations: Are they known phenotype/genotype correlations? What’s the level of evidence? Functional studies…
c) Conservation scores, constraint scores, in silico prediction tools, variation intolerance scores… are often disease and context dependent. THEY SUPPORT a PATHOGENICITY and/or deleteriousness assertion; they DO NOT DETERMINE IT!