CENTER FOR INDIVIDUALIZED MEDICINE

Clinical Variant Interpretation Lab
June 22, 2021
Joel A. Morales-Rosado, MD
©2012 MFMER

Variant vs. Gene Information

We have to consider information at two levels:

Gene
• Is the gene central to processes related to disease development? (e.g. growth, development, multifactorial)
• Is the gene sensitive to perturbation? (e.g. haploinsufficiency)

Variant
• What is the variant's effect on the gene product?
• Is the variant causative of disease?
• Is it clinically actionable?

Tools for Gene / Variant Interpretation

Disease association

Variant Frequency

Deleteriousness or Biological Consequence?

Genotype-Phenotype Databases

Disease association: Known / Unknown

• Known: Variant of Uncertain Significance (VUS). “Variant in a gene known to be related to the clinical question or a disease.”
• Unknown: Gene of Uncertain Significance (GUS). “A connection to a human disease has not been established.”

Is there enough evidence to confirm or rule out pathogenicity?

Gene of Uncertain Significance (GUS): second part of the course, “novel disease associations”

PCAN

OMIM: Online Mendelian Inheritance in Man

January 16th, 2020 update: 25,302 entries
• 16,253 – gene descriptions
• 5,709 – phenotype, molecular basis known
• 1,552 – phenotype, molecular basis unknown

1966: 1,467 phenotype entities

• “Gene”-level information and a curated summary of the history of the gene
• OMIM is most useful for understanding the connection of a specific gene to disease

http://omim.org/

Some genes have several phenotypes associated with them – Pleiotropy

Other genes have no phenotype associated with them, but there may still be useful information about function of the gene


The “phenotype” entry name can be misleading and usually is not explanatory enough – Look at the detailed description and always read the text


Also keep in mind the “creation” and “edit” dates


Every OMIM page has a “References” section that you can review for more details, but it is better to complement it with a PubMed search, especially for publications after the last “Edit” date
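A date-restricted PubMed search can also be scripted. The sketch below only builds a query URL for the NCBI E-utilities ESearch endpoint; the gene symbol, the "last edit" date, and the helper name `pubmed_search_url` are illustrative assumptions, not part of the original slides.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (public, documented by NCBI).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(gene_symbol, last_edit, today):
    """Build an ESearch URL for PubMed records published in a date range.

    Dates are 'YYYY/MM/DD' strings. ESearch expects mindate and maxdate
    to be supplied together.
    """
    params = {
        "db": "pubmed",
        "term": f"{gene_symbol}[Title/Abstract]",
        "datetype": "pdat",      # filter on publication date
        "mindate": last_edit,    # e.g. the OMIM entry's last "Edit" date
        "maxdate": today,
        "retmode": "json",
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# Hypothetical example: the gene and dates are placeholders.
url = pubmed_search_url("PKHD1", "2019/06/01", "2021/06/22")
print(url)
```

The URL can be pasted into a browser or fetched with any HTTP client; the returned identifiers still need to be read and judged manually.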

GeneCards

http://genecards.org/

Comprehensive database with multiple links to other helpful resources.


The gene pages contain:

• Summaries
• Protein information
• Antibody products
• Associated diseases
• Protein localization
• Pathways
• Drugs
• Expression
• Orthologs/paralogs
• …

UniProt

Protein-oriented database

http://uniprot.org/

Useful for going beyond the DNA sequence alteration and getting information about its possible effect in the context of the protein structure

Gene Reviews

Overview of a gene and its related phenotypes, molecular genetics (MolGen), and the various diagnostic testing methods

https://www.ncbi.nlm.nih.gov/books/NBK1326/

HGMD: The Human Gene Mutation Database

A good overview of BOTH the variant spectrum of genes and the linked disease phenotypes

• Updated monthly
• “Public version” vs. “professional version”: almost 2x more mutation entries in the professional version

http://www.hgmd.cf.ac.uk/ac/index.php

When exploring a gene-disease association, always think about the nature of the variants: are they mostly missense? Are they deletions? Nonsense?

• Includes specific variants and their disease association, with curated information and links to manuscripts relevant to that variant
• Classifications are independent of ACMG guidelines (e.g. DM – disease mutation) and they are not always accurate: REVIEW THE REFERENCES
• You can get a good overview of the published history of a specific variant in chronological order.

ClinVar

Released for the first time in April 2013.

• Specific variants submitted by different clinical laboratories
• Information is not curated, and the classification often comes from the submitting laboratory's own assessment.
• Unfortunately, classification information is often limited, and different submitters can reach contradictory conclusions

http://www.ncbi.nlm.nih.gov/clinvar/

The more submitters that report without conflicts, the more reliable the classification probably is

Again, keep in mind the dates of submission. The same lab can change its pathogenicity assessment for the same variant if more information becomes available

Always check the year!
1. Were ACMG criteria applied?
2. Was the submission made BEFORE ExAC / gnomAD? Allele frequencies get updated.

No further evidence provided

Detailed summary provided
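The checks above (number of submitters, conflicts, submission dates) can be sketched in a few lines. The records below are invented for illustration and do not come from ClinVar; the function name `summarize` and the tuple layout are assumptions.

```python
from collections import Counter
from datetime import date

# Hypothetical submissions for one variant: (submitter, classification, date).
submissions = [
    ("Lab A", "Uncertain significance", date(2014, 3, 1)),
    ("Lab B", "Likely pathogenic", date(2018, 7, 15)),
    ("Lab A", "Likely pathogenic", date(2019, 1, 10)),  # Lab A revised its call
]


def summarize(subs):
    """Summarize the most recent classification from each submitter."""
    latest = {}
    # Sorting by date means later submissions overwrite earlier ones.
    for submitter, label, when in sorted(subs, key=lambda s: s[2]):
        latest[submitter] = (label, when)
    counts = Counter(label for label, _ in latest.values())
    return {
        "submitters": len(latest),
        "classifications": dict(counts),
        "conflict": len(counts) > 1,
        "oldest": min(when for _, when in latest.values()),
    }


summary = summarize(submissions)
print(summary)
```

An old "oldest" date is a prompt to ask whether the submission predates current classification criteria and population frequency data.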

ClinGen

• Authoritative central resource that defines the clinical relevance of genes and variants for use in precision medicine and research.
• Standardizes clinical annotation and interpretation of variants

1734 genes in 2020

COSMIC

Somatic variants mostly in the context of

DECIPHER: DatabasE of genomiC varIation and Phenotype in Humans using Ensembl Resources

• Incorporates a suite of tools designed to aid the interpretation of genomic variants.
• International community of academic departments of clinical and rare disease genomics, now numbering more than 250 centres and having uploaded more than 35,000 cases (2020).
• Biased toward neurodevelopmental disorders + CNV/SV alterations.

https://decipher.sanger.ac.uk/

Example search: https://decipher.sanger.ac.uk/search?q=gene%3APKHD1#consented-patients/results

DIDA: Digenic Diseases Database

Novel database that provides, for the first time, detailed information on genes and associated genetic variants involved in digenic diseases, the simplest form of oligogenic inheritance.

Last Update: July 2017

http://dida.ibsquare.be/

Disease association: Known / Unknown

Variant Frequency: Rare / Common

Allele Frequency Databases

Allele frequency and disease penetrance
• Age of disease onset?
• Effect and frequency of protective variation?

Exome Aggregation Consortium (ExAC)

https://www.nature.com/articles/nature19057

gnomAD browser
• 125,748 exome sequences
• 15,708 whole-genome sequences
• 141,456 individuals

http://gnomad.broadinstitute.org/
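As a worked example of how the frequencies shown in the browser are derived, allele frequency is the allele count divided by the allele number (the number of chromosomes successfully genotyped at that site, roughly two per individual). The sketch below uses the RTEL1 counts quoted in the summary slide later in this deck (61 alleles out of 282,590); the function name is an assumption.

```python
def allele_frequency(allele_count, allele_number):
    """Return allele count / allele number as a fraction."""
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    return allele_count / allele_number


# RTEL1 c.932C>T counts as quoted later in the deck: 61 / 282590, 0 homozygotes.
af = allele_frequency(61, 282590)
print(f"{af:.6f} ({af:.3%})")
```

Population-specific frequencies (AMR, NFE, OTH in the same summary) are computed the same way from each population's own counts, which is why they can differ markedly from the overall figure.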

Constraint information (intolerance to variation)

• Z-score: deviation of observed counts from the expected number of variants for a specific gene. The scale runs from negative to positive, and higher positive values mean increased constraint (3.0 is marked on the scale).
• pLI value: probability that a gene is haploinsufficient, i.e. that heterozygous LoFs are not tolerated. The scale runs from 0 to 1, and values near 1 are extremely intolerant (0.9 is marked on the scale).

This is not absolute. Interpret with caution! Factors that modify this include variable expressivity and penetrance; in addition, some regions of a gene may be more intolerant to missense variation than others.
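A minimal sketch of how the two marks on the constraint scales (0.9 for pLI, 3.0 for the Z-score) might be applied as flags. Treating them as hard cutoffs is an assumption made here for illustration, and the gene values in the example are invented; as stated above, the result is supporting information only.

```python
PLI_CUTOFF = 0.9   # value marked on the pLI scale in the slide
Z_CUTOFF = 3.0     # value marked on the Z-score scale in the slide


def constraint_flags(pli, z_score):
    """Flag a gene as constrained using the slide's marked values.

    Using the marks as strict thresholds is an illustrative assumption.
    """
    return {
        "lof_intolerant": pli >= PLI_CUTOFF,
        "variation_constrained": z_score >= Z_CUTOFF,
    }


# Hypothetical gene-level values, not taken from gnomAD.
flags = constraint_flags(pli=0.98, z_score=1.2)
print(flags)
```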

https://www.nature.com/articles/nature19057

Gene page example: http://gnomad.broadinstitute.org/gene/ENSG00000170927
Variant page example: http://gnomad.broadinstitute.org/variant/6-51944718-G-A

JUST GETTING STARTED – Many projects underway

• US – Million genome project: MC Biobank will serve as the repository.
• 100,000 UK Genome Project
• Chinese Million endeavor: BGI (Beijing Genomics Institute), Phase 1 – 140K – October 2018
• Asian Genome Project

• 2005 - Harvard PGP (United States)
• 2012 - PGP Canada (Canada)
• 2013 - PGP UK (United Kingdom)
• 2014 - GenomAustria (Austria)
• 2017 - PGP China (People's Republic of China)

Disease association: Known / Unknown

Variant Frequency: Rare / Common

“Deleteriousness”: Assessment of Biological Consequence
• Gene Product Function & Pathways
• Variant Effect on Gene Product

Gene Product Effect: Protein-coding

• Nonsense: adds a STOP codon → truncated protein
• Frameshift In/Del: shifts the reading frame → protein translated incorrectly from that point
• Splicing: alters key sites guiding splicing
• In-frame In/Del: removes/adds one or more amino acids
• Stoploss: loss of STOP codon → additional coding region added to protein
• Missense: modifies 1 amino acid
• Synonymous: no amino acid change
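For single-base substitutions, four of the categories above can be told apart by translating the reference and alternate codons with the standard genetic code. The sketch below is illustrative only: the example codons are assumptions (the slides give amino acid changes, not codons), and frameshifts, in-frame in/dels, and splicing effects are outside its scope.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a within-codon substitution by its effect on the protein."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stoploss"
    return "missense"


# Illustrative codons (assumed, not taken from the slides).
print(classify_substitution("CAG", "TAG"))  # Gln -> stop
print(classify_substitution("TGG", "CGG"))  # Trp -> Arg
print(classify_substitution("CTG", "CTA"))  # Leu -> Leu
print(classify_substitution("TAA", "CAA"))  # stop -> Gln
```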

Variant Effect on Gene Product

Impact on protein structure (increasing): Synonymous → Missense → Loss of Function

Still be careful with synonymous variants: they look benign, but they may not be…

Loss of Function (LOF) Variants

• Nonsense: c.466C>T → p.Q156*
• Frameshift: c.2247delT → p.Asp749Glufs*4
• Splicing: c.1685+3T>G → p.? (exon skipping, in-frame, out of frame)


These are more disruptive, but that does not necessarily mean they are deleterious:
• What percentage of the protein is affected?
• NMD?
• Are there multiple transcript isoforms?
• Splicing effects are difficult to predict
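The link between the coding-DNA position and the protein position in the examples above is simple arithmetic: three coding bases per codon. The sketch below checks that c.466 falls in codon 156 and c.2247 in codon 749, and estimates the fraction of protein lost after a premature stop. The protein length of 620 residues is a made-up value for illustration, and the function names are assumptions.

```python
def codon_of(c_pos):
    """Map a 1-based coding position to (codon number, position in codon)."""
    if c_pos < 1:
        raise ValueError("coding positions start at 1")
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def fraction_lost(stop_codon, protein_length):
    """Fraction of residues missing when translation stops at stop_codon."""
    retained = stop_codon - 1  # residues translated before the new stop
    return (protein_length - retained) / protein_length


print(codon_of(466))    # nonsense example: c.466C>T, p.Q156*
print(codon_of(2247))   # frameshift example: c.2247delT, p.Asp749Glufs*4

# Hypothetical protein length, used only to show the calculation.
print(fraction_lost(stop_codon=156, protein_length=620))
```

A large lost fraction is only one input; NMD and alternative isoforms, listed above, still have to be considered separately.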

Missense Variants

How do we tell if a missense variant alters protein function?

• Type of amino acid change
  • Size differences
  • Electrochemical properties
• Conservation across species
• Conserved protein domain
• Secondary protein structure
• Tertiary (3D) protein structure
• Other functional features (PTM)
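To make the first criterion concrete, the sketch below compares the side-chain class of the reference and alternate residues. The four-class grouping is one common textbook scheme (other groupings exist, and size is not modeled here), so the output is a rough cue and an assumption of this example rather than a method described in the slides.

```python
# One common textbook grouping of the 20 amino acids by side chain.
SIDE_CHAIN_CLASS = {
    **dict.fromkeys("GAVLIMPFW", "nonpolar"),
    **dict.fromkeys("STCYNQ", "polar"),
    **dict.fromkeys("KRH", "positive"),
    **dict.fromkeys("DE", "negative"),
}


def describe_change(ref_aa, alt_aa):
    """Return (ref class, alt class, whether the class changed)."""
    ref_class = SIDE_CHAIN_CLASS[ref_aa]
    alt_class = SIDE_CHAIN_CLASS[alt_aa]
    return ref_class, alt_class, ref_class != alt_class


print(describe_change("W", "R"))  # e.g. a Trp -> Arg change
print(describe_change("T", "I"))  # e.g. a Thr -> Ile change
print(describe_change("L", "I"))  # Leu -> Ile stays within one class
```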

Conservation

• Conservation is a powerful and broadly used idea
  • How conserved is a given nucleotide or genomic interval, comparing different species to human?
  • How conserved is an amino acid in a protein sequence?
• Available from UCSC (nucleotide conservation):
  • PhyloP score – useful to assess single variants; tests whether nucleotide substitution rates are faster or slower than expected under neutral drift
  • PhastCons score/element – useful to assess putative regulatory regions and genes not coding proteins
  • Multi-species alignment – generally useful
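The idea of reading conservation off a multi-species alignment can be illustrated with a simple per-column identity count. This is not PhyloP or PhastCons, which are model-based scores; it is a much cruder stand-in, and the aligned fragments below are invented for the example.

```python
# Hypothetical aligned protein fragments (same length, no gaps).
alignment = {
    "human": "MKWLR",
    "mouse": "MKWLR",
    "chicken": "MRWLK",
    "zebrafish": "MKWIR",
}


def column_identity(aln, index, reference="human"):
    """Fraction of non-reference species matching the reference residue."""
    ref_residue = aln[reference][index]
    others = [seq[index] for name, seq in aln.items() if name != reference]
    return sum(residue == ref_residue for residue in others) / len(others)


for i, residue in enumerate(alignment["human"]):
    print(i + 1, residue, round(column_identity(alignment, i), 2))
```

Which species are included matters: identity across close relatives says little, while identity across distant species is more informative.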

Synonymous variants are not always “silent”…

• Does no amino acid change equal no functional effect?
  • Often, but not always!
  • Codon usage
  • Cryptic regulatory sequences (e.g. splicing enhancers)
• Strong conservation can be suggestive
• No broadly used tool to handle these

Be careful with variants at the exon-intron boundary

Functional Prediction Algorithms

Splicing:
• ESE Finder
• SpliceSiteFinder-like
• MaxEntScan
• NNSplicer
• GeneSplicer
• SpliceAI

Lots of different algorithms!

Tools for different types of variants

Splice variants:
• Human Splicing Finder 3.1
• ESE Finder
• SpliceSiteFinder-like
• MaxEntScan
• NNSplicer
• GeneSplicer

Point mutation / missense variant / nonsynonymous variant:
• M-CAP
• PredictSNP2
• SIFT
• CADD
• PolyPhen-2

Many others…

http://www.humgen.nl/SNP_databases.html

Splice Variants:

CRYPTIC SPLICE SITES CAN BE CREATED…

Human Splicing Finder 3.1

• Calculates the consensus values of potential splice sites and searches for branch points.
• Integrates all available matrices to identify exonic and intronic motifs, as well as new matrices to identify hnRNP A1, Tra2-β and 9G8.

INPUT → ERCC6 c.1527-2A>G

OUTPUT → [screenshot]

http://www.umd.be/HSF3/index.html

PolyPhen-2 (Polymorphism Phenotyping v2) Tool

• Predicts the possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations.
• Integrates multiple features: 8 sequence-based and 3 structure-based (nucleotide and amino acid level)

Alterations are appraised “qualitatively”: Benign, Possibly damaging, or Probably damaging

http://genetics.bwh.harvard.edu/pph2/

CADD Score

Scores the deleteriousness of single nucleotide variants as well as insertion/deletion variants in the human genome.

It considers:
• Allelic frequencies
• Conservation metrics
• Functional genomic data
• Protein-level scores like Grantham, SIFT, and PolyPhen.

How to interpret them? PHRED-scale meaning:
a) CADD > 30 – top 0.1%
b) CADD > 20 – top 1.0%
c) CADD > 10 – top 10%
d) CADD < 10 – 7.74 billion
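The PHRED-scale cutoffs listed above follow from the usual PHRED relation, score = -10 · log10(fraction), where the fraction is the variant's rank among all scored variants. A minimal sketch of the conversion in both directions (function names are assumptions):

```python
import math


def phred_to_top_fraction(score):
    """Fraction of variants ranked at or above a PHRED-scaled score."""
    return 10 ** (-score / 10)


def top_fraction_to_phred(fraction):
    """PHRED-scaled score corresponding to a top-ranked fraction."""
    return -10 * math.log10(fraction)


for score in (10, 20, 30):
    print(score, f"{phred_to_top_fraction(score):.1%}")

# A score between the listed cutoffs, e.g. 29, can be converted the same way.
print(f"{phred_to_top_fraction(29):.3%}")
```

Because the scale is a rank transform, a difference of 10 points always corresponds to a tenfold change in rank, not to a tenfold change in biological effect.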

M-CAP Score

http://bejerano.stanford.edu/mcap/

PredictSNP2

Three datasets covering different types of disease-related variants were constructed and divided across five categories: (i) regulatory, (ii) splicing, (iii) missense, (iv) synonymous, and (v) nonsense variants. These datasets were used to develop category-optimal decision thresholds and to evaluate six tools for variant prioritization: CADD, DANN, FATHMM, FitCons, FunSeq2 and GWAVA. This evaluation revealed some important advantages of the category-based approach. The results obtained with the five best-performing tools were then combined into a consensus score.

https://loschmidt.chemi.muni.cz/predictsnp2/

Things to keep in mind:
• What features are used to calculate it?
  • Nucleotide / amino acid conservation
  • Amino acid physicochemical properties
  • Conservation metrics
  • Transcript diversity

Scores are not deterministic of biological effect/deleteriousness; they are used as “supporting evidence”

gDNA: Chr6(GRCh37):g.51720765A>G
cDNA: NM_138694.3(PKHD1):c.7837T>C
Protein: p.Trp2613Arg

• Polyphen-2: Probably damaging
• CADD: 29
• M-CAP: Probably
• PredictSNP2: Deleterious

Scores agree towards the SNV being deleterious

Likelihood of pathogenicity is affected, not determined.
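One way to keep predictions in the "supporting evidence" role is to tally how many tools agree instead of trusting any single one. The sketch below uses the in silico labels exactly as they appear in the two summary slides later in this deck (including the truncated "Pos.. Pathogenic"); the keyword lists and function names are assumptions made for illustration, and the tally itself carries no formal weight.

```python
# Keywords used to bucket free-text tool labels (illustrative assumption).
BENIGN = ("tolerated", "benign", "polymorphism")
DAMAGING = ("deleterious", "damaging", "disease causing", "pathogenic")


def call(label):
    """Bucket one tool's label as 'benign', 'damaging', or 'unknown'."""
    text = label.lower()
    if any(keyword in text for keyword in BENIGN):
        return "benign"
    if any(keyword in text for keyword in DAMAGING):
        return "damaging"
    return "unknown"


def tally(labels):
    """Return (number of damaging calls, number of tools)."""
    calls = [call(label) for label in labels]
    return calls.count("damaging"), len(calls)


# Labels in SIFT / Polyphen / Mutation Taster / MCAP order, as in the deck.
tert = ["Deleterious", "Probably Damaging", "Disease Causing", "Pos.. Pathogenic"]
rtel1 = ["Tolerated", "Benign", "Polymorphism", "Possibly Pathogenic"]

print("TERT:", tally(tert))
print("RTEL1:", tally(rtel1))
```

Many of these tools share input features such as conservation, so agreement among them is not independent confirmation.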

Try to get a “big picture” and assemble all the information in a single place

TERT HET c.1885G>A p.Gly629Arg
Parents: Unknown
VAF: ???
AD/AR Disease:
gnomAD: Not described
ClinVar: Not described
HGMD: DM? – Pulmonary Fibrosis – CM173470
Splicing: No splicing effect predicted by SpliceAI
In silico (SIFT/Polyphen/Mutation Taster/MCAP): Deleterious/Probably Damaging/Disease Causing/Pos.. Pathogenic
Location: Chr5 – Exon 4/16
Comments: Telomerase is a ribonucleoprotein polymerase that maintains telomere ends by addition of the telomere repeat TTAGGG. The enzyme consists of a protein component with reverse transcriptase activity, encoded by this gene, and an RNA component which serves as a template for the telomere repeat. Telomerase expression plays a role in cellular senescence, as it is normally repressed in postnatal somatic cells resulting in progressive shortening of telomeres.

Moderately conserved across species

Intolerant to MS and LoF

RTEL1 HET c.932C>T p.Thr311Ile
Parents: Unknown
AD/AR Disease:
gnomAD: ALL: (61/282590 – 0 Hom) 0.022% - AMR: 0.10% - NFE: 0.012% - OTH: 0.14%
ClinVar: 1 submission (573620) – VUS
HGMD: Not reported
In silico (SIFT/Polyphen/Mutation Taster/MCAP): Tolerated/Benign/Polymorphism/Possibly Pathogenic
Location: Chr20 – Ex10 of 35 (Helicase ATP binding)
Comments: This gene encodes a DNA helicase which functions in the stability, protection and elongation of telomeres and interacts with proteins in the shelterin complex known to protect telomeres during DNA replication. Read-through transcription of this gene into the neighboring downstream gene, which encodes tumor necrosis factor receptor superfamily, member 6b, generates a non-coding transcript.

Not conserved across species

Tolerant to MS

Pathogenicity Interpretation draws on: Disease association, Variant Frequency, Biological Consequence, Other information

Always remember:
a) Verify frequency (and the quality of the calling)
b) Disease associations: Are they known phenotype/genotype correlations? What's the level of evidence? Functional studies…
c) Conservation scores, constraint scores, in silico prediction tools, variation intolerance scores… are often disease and context dependent. THEY SUPPORT a PATHOGENICITY and/or deleteriousness assertion, they DO NOT DETERMINE IT!