
Annotation: curation, tools, ontologies, databases

Mike Cherry

Genomics Lecture 7 Genetics 211 - Winter 2014

Database of the Past

Staats, Joan. The Classified Bibliography of Inbred Strains of Mice. Science 119(3087): 295-296 (1954-02-26).

What’s this guy talking about?

"So you want to make a nnn, there are a couple of steps. You acquire data through partners. You do a bunch of engineering on that data to get it into the right format and conflate it with other sources of data, and then you do a bunch of operations, which is what this tool is about, to hand massage the data. And out the other end pops something that is higher quality than the sum of its parts."

Michael Weiss-Malik, The Atlantic 9/2012

Why curate publications?

• Standardize vocabulary – full-text search is easy to use, but it is difficult to know whether the search was complete
• Integrate results – why don’t publishers mandate submission of standardized data?
  • for decades, crystallographic data & coordinates and GenBank accession numbers have been required
  • GEO & SRA accessions should be required, but this is not enforced.

Manual Curation

• Read published literature or use tools for analysis of results to make the best annotation
• Identify the experimental methods used
• Connect associated IDs from ontologies/vocabularies, sequences and their IDs, and connections to other databases (pathway, chemical, orthology, interactions, disease, expression, etc.)

Examples of Curated Databases

• Database, UniProtKB
• , database
• Chemical, CHEBI or PubChem
• Human Genetic Disease, OMIM
• Function, Consortium
• Gene Models, GENCODE & UCSC
• Sequence Variants, LSVD & HGMD
• Personal Genomics, Ingenuity & OMICIA

How would you curate this paper?

Drosophila Hedgehog

The gene hedgehog is referred to in FlyBase by the symbol Dmel\hh (CG4637, FBgn0004644). It is a protein_coding_gene from Drosophila melanogaster. Its sequence location is 3R:18953425..18967881. It has the cytological map location 94E1. There is experimental evidence that it has the molecular function: surface binding; protein binding. There is experimental evidence for 24 unique biological process terms, many of which group under: anatomical structure development; cellular component movement; organ morphogenesis; regulation of biological process; sensory organ development; reproductive process in a multicellular organism; organ development; regulation of protein transport; cell projection organization; open tracheal system development; central nervous system development. 169 alleles are reported. The phenotypes of these alleles are annotated with: organ system; adult segment; adult mesothoracic segment; peripheral nervous system; primordium; nervous system; embryonic nervous system; late extended germ band embryo; ventral nerve cord primordium; abdominal ventral denticle belt. It has 2 annotated transcripts and 2 annotated polypeptides.

Saccharomyces SGS1

SGS1 encodes a helicase with similarity to E. coli RecQ and human BLM and WRN helicases (6). Mutations in BLM are implicated in the cancer-prone Bloom's Syndrome, and mutations in WRN cause the premature-aging Werner's Syndrome (6). SGS1 was identified in a screen for suppressors of the slow growth phenotype of top3 mutants, and Sgs1p has been shown to interact with the topoisomerase Top3p (7). Sgs1p appears to be involved in the maintenance of genome stability and the suppression of illegitimate recombination; sgs1 null mutants show mitotic hyperrecombination and elevated levels of missegregation (8, 9, 10). Sgs1p has been localized to the nucleolus, and is needed to maintain the integrity of rDNA repeats (2). Sgs1p shows ATPase activity and unwinds duplex DNA; it preferentially binds to branched DNA substrates and has a 3' to 5' polarity of unwinding (11, 12).

www.pharmgkb.org
Knowledge Base of Pharmacogenetic and Pharmacogenomic Information

Imaging expression patterns during Drosophila embryogenesis.

Tomancak et al. Genome Biology 2002 3:research0088.1

The Gene Wiki article about Cyclin-dependent kinase

Good B M et al. Nucl. Acids Res. 2011;nar.gkr925

Trust distributions of Gene Wiki revisions versus general (non-Gene Wiki) Wikipedia revisions.

Good B M et al. Nucl. Acids Res. 2011;nar.gkr925

Directed Acyclic Graph

How can ontologies help organize information?

• Describe material entities of nature
• Represent events, actions, procedures & relationships as immaterial entities
• Specify connections between entities
• Standardize nomenclature
• Enhance computability of information
• Provide a framework to communicate information

An ontology is a set of terms…

… with different types of relationships to each other.
All relationships must be true, because inferences can be made based on these relationships.

[Figure: example ontology graph with the terms cell, mitochondrion, chromosome, nucleus and mitochondrial chromosome. Parent and child terms are connected by part_of and is_a relationships.]
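Why every relationship has to be true can be seen in a minimal sketch of inference over such a graph. The edges below are an assumption read off the example diagram (mitochondrial chromosome is_a chromosome and part_of mitochondrion; mitochondrion and nucleus part_of cell), and `inferred_part_of` is a hypothetical helper name, not part of any GO tool.

```python
# Edges assumed from the example diagram; this is not an official ontology file.
IS_A = {"mitochondrial chromosome": {"chromosome"}}
PART_OF = {
    "mitochondrial chromosome": {"mitochondrion"},
    "mitochondrion": {"cell"},
    "nucleus": {"cell"},
}

def inferred_part_of(term):
    """Return every term that `term` is part_of, following part_of
    transitively and inheriting part_of edges from is_a parents."""
    found = set()
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for whole in PART_OF.get(current, ()):
            found.add(whole)
            stack.append(whole)
        for parent in IS_A.get(current, ()):
            stack.append(parent)  # an is_a parent passes on its part_of edges
    return found

print(sorted(inferred_part_of("mitochondrial chromosome")))
```

If the graph wrongly asserted that chromosome is part_of nucleus, the same traversal would conclude that a mitochondrial chromosome is part of the nucleus, which is the kind of false inference an untrue relationship produces.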

www.geneontology.org/GO.ontology.relations.shtml

Example Gene Ontology Term

id: GO:0000016
name: lactase activity
namespace: molecular_function
def: "Catalysis of the reaction: lactose + H2O = D-glucose + D-galactose." [EC:3.2.1.108]
synonym: "lactase-phlorizin hydrolase activity" BROAD [EC:3.2.1.108]
synonym: "lactose galactohydrolase activity" EXACT [EC:3.2.1.108]
xref: EC:3.2.1.108
xref: MetaCyc:LACTASE-RXN
xref: Reactome:20536
is_a: GO:0004553 ! hydrolase activity, hydrolyzing O-glycosyl compounds
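A term record like this is plain `tag: value` text, so it is straightforward to read programmatically. The following is a minimal sketch, assuming one `tag: value` pair per line and that ` ! ` introduces a trailing human-readable comment on `is_a` lines; `parse_stanza` is a hypothetical helper, not a function from any GO library, and the stanza is abbreviated from the term above.

```python
from collections import defaultdict

STANZA = """id: GO:0000016
name: lactase activity
namespace: molecular_function
xref: EC:3.2.1.108
xref: MetaCyc:LACTASE-RXN
is_a: GO:0004553 ! hydrolase activity, hydrolyzing O-glycosyl compounds
"""

def parse_stanza(text):
    """Collect tag -> list of values; tags such as xref may repeat."""
    record = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        tag, _, value = line.partition(": ")
        if tag == "is_a":
            value = value.split(" ! ")[0]  # drop the human-readable comment
        record[tag].append(value.strip())
    return dict(record)

term = parse_stanza(STANZA)
print(term["id"][0], term["is_a"], term["xref"])
```

Keeping every value in a list, even for single-valued tags, avoids special cases for repeatable tags such as `xref` and `synonym`.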

Subset of Evidence Code Ontology

[Term]
id: ECO:00000067
name: inferred from electronic annotation
def: "Used for annotations that depend directly on computation or automated transfer of annotations from a database, particularly when the analysis is performed internally and not published. A key feature that distinguishes this evidence code from others is that it is not made by a curator; use IEA when no curator has checked the specific annotation to verify its accuracy. The actual method used (BLAST search, Swiss-Prot keyword mapping, etc.) doesn't matter." [GO:IEA]
synonym: "IEA" RELATED []
is_a: ECO:0000043 ! inferred from in-silico analysis

[Term]
id: ECO:0000007
name: inferred from immunofluorescence
def: "Used when an annotation is made based on methods that detect the presence of macromolecules, , and compounds by the use of a fluorescent-labeled ." [TAIR:TED]
synonym: "IDA: immunofluorescence" RELATED []
is_a: ECO:0000040 ! inferred from immunological assay

Evidence Codes for Gene Product Annotations

Direct assay (IDA). Enzyme assays; in vitro reconstitution; immunofluorescence (for cellular component).
Expression pattern (IEP). Transcript levels (e.g. Northerns, microarray data).
Genetic interaction (IGI). "Traditional" genetic interactions such as suppressors, synthetic lethals, etc.
Mutant phenotype (IMP). Any gene mutation/knockout.
Physical interaction (IPI). 2-hybrid interactions; co-purification.
Sequence or structural similarity (ISS). Sequence similarity (homologue of/most closely related to); recognized domains.
Sequence alignment (ISA). Pairwise or multiple sequence alignment to experimentally characterized proteins.
Sequence orthology (ISO). Orthologs share a common ancestor and have arisen due to a speciation event. Phylogenetic analysis with maximum likelihood or neighbor joining.
Sequence model (ISM). Statistical modeling tool determined protein membership in a functional family. HMM, tRNAscan and TMHMM are examples of this type of algorithm.
Genomic context (IGC). Annotations based on synteny; in these cases sequence similarity alone is not enough. For example, presence within an operon in bacterial systems.
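Evidence codes make it possible to filter annotations by how they were established. The following is a minimal sketch, assuming annotations are held as simple (gene, GO ID, evidence code) tuples. The grouping of codes into two sets follows the list above; the `geneX` row is hypothetical and included only to show an uncurated IEA annotation being excluded.

```python
# Codes grouped as in the list above.
EXPERIMENTAL = {"IDA", "IEP", "IGI", "IMP", "IPI"}
COMPUTATIONAL = {"ISS", "ISA", "ISO", "ISM", "IGC"}

annotations = [
    ("sty1", "GO:0034504", "IMP"),
    ("pap1", "GO:0036091", "IMP"),
    ("geneX", "GO:0000016", "IEA"),  # hypothetical row for illustration
]

def by_evidence(rows, allowed):
    """Keep only the rows whose evidence code is in `allowed`."""
    return [row for row in rows if row[2] in allowed]

experimental = by_evidence(annotations, EXPERIMENTAL)
print(experimental)
```

Holding the code groups as sets keeps the membership check cheap and makes it easy to define other selections, for example everything except IEA.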

[Figure: Number of Annotated Genes per Organism by Evidence Type, December 2013 (using gene name as unique ID). Bar chart with a y-axis from 0 to 30,000 and P, F and C bars (biological process, molecular function, cellular component) for each of EcoCyc, PomBase, SGD, dictyBase, WB, FB, Chicken, Cow, ZFIN, Pig, Dog, RGD, MGI, TAIR and Human.]

Biological process GO enrichment of 88 human peroxisome proteins before the focused manual peroxisome protein annotation effort.

Mutowo-Meullenet P et al. Database 2013;2013:bas062

© The Author(s) 2013. Published by Oxford University Press. Biological process GO enrichment of 88 human peroxisome proteins after the focused manual peroxisome protein annotation effort.

Mutowo-Meullenet P et al. Database 2013;2013:bas062

Annotations are unconnected

[Figure: sty1 is annotated to protein localization to nucleus [GO:0034504] and to cellular response to oxidative stress [GO:0034599]; pap1 is annotated to positive regulation of transcription from pol II promoter in response to oxidative stress [GO:0036091]. The three annotations are not linked to one another.]

DB       Object                 Term        Ev   Ref            ..
PomBase  sty1 (SPAC24B11.06c)   GO:0034504  IMP  PMID:9585505   ..
PomBase  sty1 (SPAC24B11.06c)   GO:0034599  IMP  PMID:9585505   ..
PomBase  pap1 (SPAC1783.07c)    GO:0036091  IMP  PMID:9585505   ..

Annotation with Relationships

[Figure: the same terms connected by relationships. For sty1, protein localization to nucleus [GO:0034504] happens during cellular response to oxidative stress [GO:0034599] and has input pap1; for pap1, positive regulation of transcription from pol II promoter in response to oxidative stress [GO:0036091] has a regulation target.]

DB       Object                 Term                                         Ev   Ref            Extension
PomBase  sty1 (SPAC24B11.06c)   GO:0034504 (protein localization to nucleus)  IMP  PMID:9585505   happens_during(GO:0034599), has_input(SPAC1783.07c)
PomBase  pap1 (SPAC1783.07c)    GO:0036091                                   IMP  PMID:9585505   has_regulation_target(…)

Shah NH, Cole T, Musen MA (2012) Chapter 9: Analyses Using Disease Ontologies. PLoS Comput Biol 8(12): e1002827. doi:10.1371/journal.pcbi.1002827
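The Extension column in the PomBase table above is a comma-separated list of relation(target) pairs, which can be split into structured form. The following is a minimal sketch, assuming that syntax holds for every entry; `parse_extension` is a hypothetical helper and not part of any annotation toolkit.

```python
import re

# One relation(target) pair, allowing surrounding whitespace.
PAIR = re.compile(r"\s*([A-Za-z_]+)\(([^()]+)\)\s*")

def parse_extension(text):
    """Split 'rel(target),rel(target)' into a list of (relation, target) tuples."""
    pairs = []
    for chunk in text.split(","):
        match = PAIR.fullmatch(chunk)
        if match is None:
            raise ValueError("unrecognized extension: %r" % chunk)
        pairs.append((match.group(1), match.group(2)))
    return pairs

ext = "happens_during(GO:0034599),has_input(SPAC1783.07c)"
print(parse_extension(ext))
```

Raising an error on anything that does not match keeps malformed extensions from being silently dropped during curation checks.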

[Figure: lung-related terms from several ontologies (GO, FMA, EHDAA2, MA, MPO), linked by is_a (SubClassOf), part_of, develops_from and surrounded_by relationships. Terms shown include multicellular organismal process, respiratory gaseous exchange, respiratory system process, respiratory system, respiratory primordium, lung bud, lung, solid organ, parenchymatous organ, lobular organ, thoracic cavity organ, pleural sac, pulmonary acinus, alveolar sac, pulmonary alveolus, abnormal respiratory system morphology, abnormal lung morphology and abnormal pulmonary alveolus morphology.]

Mungall et al. Genome Biology 2012 13:R5 doi:10.1186/gb-2012-13-1-r5

Uberon, unified anatomy ontology

[Figure: Uberon classes linking species-specific anatomy ontologies. Relationship types: is_a (SubClassOf), part_of, develops_from, capable_of, is_a (taxon equivalent) and only_in_taxon. Terms shown include anatomical structure, endoderm, foregut, endoderm of foregut, organ part, swim bladder, respiration organ, respiratory primordium, lung primordium, lung bud, lung, pulmonary acinus, alveolar sac, alveolus and alveolus of lung, with the taxa NCBITaxon:Actinopterygii and NCBITaxon:Mammalia and the cross-referenced classes GO:respiratory gaseous exchange, FMA:lung, FMA:pulmonary alveolus, MA:lung, MA:lung alveolus and EHDAA:lung bud.]

Mungall et al. Genome Biology 2012 13:R5 doi:10.1186/gb-2012-13-1-r5

Metadata

• HepG2 and HuH-7 are derived from liver
• HepG2 and HuH-7 are associated with liver carcinomas
• HepG2 and HuH-7 are stable cell lines
• HepG2 and HuH-7 are of human origin

By using the ontology terms HepG2 and HuH-7 to describe the cell lines used in the experiment, addi

• Bioproject (e.g. collection of results)
• Biosamples (e.g. cell type)
• Treatments (e.g. protocol)
• Analysis (e.g. pipeline)
• Antibody (e.g. ChIP-seq, RIP-seq)
• Results (e.g. datasets)
• Replicates (e.g. association between results)

A model experiment submitted to the modENCODE DCC and its mapping to metadata components BIR-TAB SDRF and the wiki.

Washington N L et al. Database 2011;2011:bar023

[Figure: phenologs. Panel labels: (a) human trait A, mouse trait B, phenologs predicted for human trait A; (b) human Waardenburg syndrome, Arabidopsis negative gravitropism defective, with the genes SNAI2, MITF, PAX3, DDHD2/SEC23IP, STX7/STX12 and DNAJC13; (c) human trait A, Arabidopsis trait B, mouse trait C, with candidates strongly predicted or weakly predicted for A.]

Waardenburg syndrome, certain mammalian neural crest defects and plant gravitropism defects share and partly arise from an ancient, highly conserved vesicle trafficking system.

Prediction of disease–genes from orthologous phenotypes.

Woods et al. BMC Bioinformatics 2013 14:203 doi:10.1186/1471-2105-14-203

Annotation Challenges

• Functional annotations of genes and annotations of known processes are not complete.
• Model organisms are used to explore different aspects of biology.
• There are systematic differences between annotation groups.

Evolution of the gene model and its relationship to wild-type and mutant phenotypes

T.R. Gingeras Genome Res. 2007; 17: 682-690

GENCODE: Creating a Validated Manually Annotated Geneset for the Whole Human Genome

A. Bignell1, A. Frankish1, B. Aken1, M. Diekhans7, F. Kokocinski1, M. Lin3, M. Tress2, J. Van Baren4, I. Barnes1, T. Hunt1, D. Carvalho-Silva1, C. Davidson1, S. Donaldson1, J. Gilbert1, E. Hart1, M. Kay1, R. Kinsella1, D. Lloyd1, J. E. Loveland1, J. Mudge1, C. Snow1, J. Vamathevan1, L. Wilming1, M. Brent4, M. Gerstein6, R. Guigo5, R. Harte7, M. Kellis3, S. Searle1, J. Harrow1 and T. Hubbard1.

1 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
2 Spanish National Cancer Research Centre, Madrid, Spain.
3 MIT Computer Science and Artificial Intelligence Laboratory, Broad Institute of MIT and Harvard, Cambridge, USA.
4 Laboratory for Computational Genomics and Department of Computer Science, Washington University, St. Louis, USA.
5 Centre for Genomic Regulation, Barcelona, Catalonia, Spain.
6 Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, USA.
7 Center for Biomolecular Science and Engineering, University of California, Santa Cruz, USA.

HAVANA

The Human and Vertebrate Analysis and Annotation (HAVANA) group at the Wellcome Trust Sanger Institute produces high quality manual annotation of protein-coding, non-coding and pseudogene loci. All HAVANA annotation is supported by transcript (EST, mRNA) and/or protein evidence and provides unparalleled coverage of untranslated regions, pseudogenes and poly-adenylation features. All HAVANA annotation can be viewed on our Vertebrate Genome Annotation browser (VEGA): http://vega.sanger.ac.uk

ENCODE

The aim of the ENCODE (Encyclopedia of DNA Elements) project is to identify all functional elements in the human genome sequence. During the pilot phase investigating 1% of the human genome, HAVANA produced a manually annotated geneset that was validated computationally and experimentally by our collaborators in the GENCODE subgroup. Following the success of the ENCODE pilot project, GENCODE are reprising their previous role and providing high quality gene annotation for the whole human genome. This geneset will be used in the analyses of all the other members of the ENCODE consortium.

GENCODE Annotation

Data from all members of GENCODE are distributed via DAS (Distributed Annotation System) and are now visible in our Zmap annotation interface. For example, KLHL22 (shown below) contains a U12 intron prediction, CONGO coding exon predictions and an intronic Yale pseudogene prediction.

[Figure: GENCODE annotation sources. Manual annotation: HAVANA. Computational and experimental contributors shown: UCSC, GENCODE (EBI), WashU, Yale, CRG, Lausanne, MIT, CNIO, Ensembl. Zmap track labels: HAVANA Annotation, CRG U12 Intron, Yale Pseudogene, Protein Coding Transcript, Pseudogene.]

Computational predictions are produced independently of manual annotation and used as both a guide for new annotation and for validation of completed annotation. Potential novel loci and variants are identified by state-of-the-art algorithms for finding exons, splice junctions, transcripts and pseudogenes. The coding potential of all annotated CDSs is assessed by investigating sequence conservation and comparing predicted secondary structures to similar proteins with solved structures.

Although initial experimental validation of transcripts was based on RT-PCR and extension by 5' and 3' RACE, short-read sequences (RNA_Seq) have recently been added to validate annotated splice junctions. RNA_Seq data will also allow the identification of novel transcripts and provide information on tissue specificity of all annotated transcripts. Where novel features are confirmed the annotation is updated.

[Figure track labels: PolyA Features, CCDS, Protein Evidence, EST Evidence, mRNA Evidence, MIT CONGO Exons, Ensembl Predictions.]

Validation

Computational validation of the manual annotation of chromosomes 21 and 22 demonstrated that, while HAVANA annotation is both comprehensive and robust, it has been enriched by comparison with good computational predictions.

GENTRACK

The Gentrack database was built specifically to hold data provided by GENCODE groups and facilitate the investigation of all identified differences between manual annotation and automated predictions.

Artemis showing RNA-Seq data for Plasmodium falciparum

Carver T et al. Bioinformatics 2012;28:464-469

Overview of PRO.

Natale D A et al. Nucl. Acids Res. 2011;39:D539-D545

InterPro -- http://www.ebi.ac.uk/interpro/

There are many identifiers for the same thing

Database                          Identifier
NCBI GI                           25777711
UniProt                           P63208
HUGO Gene Name                    SKP1A (HGNC:10899)
OMIM                              MIM:601434
Locus Map                         5q31
GeneID                            6500
HomoloGene                        38775
UniProtKB/Swiss-Prot              SKP1_HUMAN
RefSeq                            NP_008861
Protein_ID                        NP_008861.2
DNA coordinates                   NM_006930.2:140..622
Ensembl                           ENSP00000331708, product of ENSG00000113558
Consensus CDS Protein Set         CCDS4172.1
Human Protein Reference Dataset   3255

Mapping between any of these ID spaces

GI, EMBL, EMBL-CDS, UniProtKB-ID, UniParc, NCBI_TaxID, UniRef100, UniRef90, UniRef50, RefSeq, RefSeq_NT, GeneID, KEGG, GenomeReviews, OMA, ProtClustDB, KO, EnsemblGenome, BioCyc, Ensembl, GeneTree, OrthoDB, UniGene, IPI, HSSP, TIGR, FlyBase, EuPathDB, UCSC, PDB, NextBio, VectorBase, euHCVdb, MEROPS, HGNC, MGI, ZFIN, WormBase, GermOnline, RGD, TAIR, REBASE, GenoList, neXtProt, GeneCards, PharmGKB, MIM, DMDM, DIP, HPA, H-InvDB, dictyBase, Reactome, CGD, SGD, LegioList, CYGD, DrugBank, EcoGene, EchoBASE, Orphanet, Allergome, PeroxiBase, MaizeGDB, DisProt, PptaseDB

Interconvert ~100 ID spaces

http://www.uniprot.org/faq/28#id_mapping_examples

#!/usr/bin/env python3
# Map UniProt accessions to RefSeq protein accessions.
# This is the mapping endpoint from the FAQ above; UniProt has since
# changed its ID-mapping service, so the URL and parameters may need updating.
import urllib.parse
import urllib.request

url = 'http://www.uniprot.org/mapping/'
params = {
    'from': 'ACC',
    'to': 'P_REFSEQ_AC',
    'format': 'tab',
    'query': 'P13368 P20806 Q9UM73 P97793 Q17192',
}
data = urllib.parse.urlencode(params).encode('utf-8')  # POST body must be bytes
request = urllib.request.Request(url, data)
contact = ""  # Add email address to help debug
request.add_header('User-Agent', 'Python %s' % contact)
with urllib.request.urlopen(request) as response:
    page = response.read(200000)
print(page.decode('utf-8'))

GEO: Gene Expression Omnibus (www.ncbi.nlm.nih.gov/geo)
SRA: Sequence Read Archive (www.ncbi.nlm.nih.gov/sra)
NCBI dbGaP: GWAS (genotype / phenotype), medical sequencing, molecular diagnostic assays, association between genotype and non-clinical traits
NCBI dbSNP: single-base nucleotide polymorphisms (SNP), small-scale multi-base deletions or insertions (DIP), retroposable element insertions and microsatellite repeat variations (short tandem repeats or STR)
NCBI dbVar: genomic structural variation (SV)
NCBI ClinVar: clinical variation (SNP, DIP, STR, SV)

The Cloud

what is the cloud?
where is the cloud?
are we in the cloud right now?

[Figure: the Galaxy ecosystem. Labels: Galaxies on private clouds, Galaxies on public clouds, private Tool Sheds, Galaxy Tool Shed, private Galaxy installations (1, 2, 3, ... ∞). URLs shown: http://usegalaxy.org and http://usegalaxy.org/community]

What is Galaxy?

A collection of bioinformatics tools for:
• data conversion and manipulation
• statistical analysis
• next generation sequencing analysis
• provides integration of useful tools into reusable pipelines that can also be shared
• unified and consistent interface for easy exploration

Toolbox for:
• retrieving (“get”) data
• manipulating data (liftOver, filter, sort, set operations, format conversion)
• data analysis (statistics, sequence alignment, variant calling and annotation)

Dozens of tools for different NGS applications packaged with Galaxy

Galaxy Pipeline

Managing Workflows

Galaxy Visualization

SDHAF2, indicated in PGL syndromes

Hereditary paraganglioma (PGL) syndromes are characterized by paragangliomas (tumors that arise from neuroendocrine tissues symmetrically distributed along the paravertebral axis from the base of the skull to the pelvis).

Clinical Labs testing for SDHAF2

• Ambry: C, D
• Children’s Philadelphia: T, D
• Transgenomics: C
• Courtagen: C
• PreventionGenetics: C, D
• Baylor: D
• Partners: C, D, E
• Genetaq: E

C = entire coding region, D = deletion, E = selected exon, T = targeted variant

ClinGen Portal

[Figure: categories of information the portal attaches to a variant. Central elements: Variant, Disease, Protein/Pathway, Drugs, Population, Genetic/genomic test or sequence (Public, Private, dbGaP), Advocacy/support.]

PHENOTYPES (PhenoDB/HPO descriptors)
1. Variant is known cause of disease - *evidence* (Mendelian)
2. Variant predisposes to disease - *evidence* (multifactorial risk factors)
3. Variant common in disease population (association/GWAS)

LINKS
1. PubMed, OMIM, GTR etc
2. Gene/disease specific sites
3. Advocacy/support groups

EXPRESSION
1. Recessive/dominant
2. Loss/gain of function
3. Variable penetrance
4. Monogenic/multifactorial
5. X-linked/imprinted

TYPE
1. SNP
2. CNV – dosage
3. Indel
4. Inversion
5. Translocation
6. Epigenetic

CLINICAL UTILITY/ACTIONABILITY
1. Diagnostic?
2. Predictive?
3. Drug/treatment intervention?
4. Prenatal testing?
5. Guidelines and standards

EFFECT (ICCG standards)
1. Pathogenic
2. Likely pathogenic
3. Uncertain
4. Likely benign
5. Benign

CLINICAL TESTS
1. What?
2. How?
3. Where?

NAME/LOCATION (HGVS nomenclature)
1. Genomic coordinates
2. Cytogenetic band
3. Nuclear or mitochondrial

OCCURRENCE
1. Allele frequency
2. Genotype frequency
3. Population/cohort-specific frequency

MODE OF TRANSMISSION
1. De novo
2. Inherited
3. Genetic vs somatic (cancer)

NON-GENIC MODIFIERS
1. Age
2. Sex
3. Smoking
4. Weight

Data provenance for expert curation

[Figure: four example variant records for SDHAF2 (Disease: Paragangliomas 2; Variants: 4/12), each combining fields from automated computation (location, HGVS names, Sequence Ontology consequence, population frequency, conservation, Blosum and PolyPhen2 scores, database cross-references) with fields from manual curation of the literature (detection method, cohort, inheritance, penetrance, validation, numbers of affected individuals and mutation carriers).]

1. VariantLocation: NC_000011.9:61205292; HGVSGenomicChange: NM_017841.2:c.232G>A; HGVSProteinChange: NP_060311.1:p.Gly78Arg; VariantType: single nucleotide variant; MolecularConsequence: Missense (SO:0001583); OMIMAllele: 613019.0001; ConditionName: Paragangliomas 2 (PGL2); ClinicalAssertion: Pathogenic (LOVD: 0000721); ClinVar: RCV000000428.1; GeneContext: Exon 2; PolyPhen2: 1.000, probably damaging.
2. VariantLocation: NC_000011.9:61205534; HGVSGenomicChange: NM_017841.2:c.319C>T; HGVSProteinChange: NP_060311.1:p.Arg107Cys; VariantType: single nucleotide variant; MolecularConsequence: Missense (SO:0001583); OMIMAllele: n/a; ClinVar: n/a; ConditionName: Paragangliomas 2 (PGL2); ClinicalAssertion: Uncertain significance; GeneContext: Exon 3; PolyPhen2: 0.641, possibly damaging.
3. VariantLocation: NC_000011.9:60823776-61446398; VariantType: CNV duplication; MolecularConsequence: Copy number increase (SO:0001911); Inheritance: maternal; ConditionName: Global developmental delay; ClinicalAssertion: Likely pathogenic; DECIPHER: 269756.
4. VariantLocation: NC_000011.9:3575068-67694736 (outer); VariantType: CNV loss; GeneSymbol: SDHAF2; MolecularConsequence: Copy number decrease (SO:0001912); ConditionName: None; ClinicalAssertion: Likely benign; dbVar: nsv436122; this is Yoruban HapMap sample NA18505.

Manually curated studies:
• PMID: 23666964. Detection method: exon sequencing. Cohort: DNA samples from 205 individuals at British institutions affected with adrenal or extraadrenal pheochromocytoma/head and neck paraganglioma (PPGL/HNPGL) were analyzed. Affected individuals: 205. Affected individuals with mutation: 1.
• PMID: 22241717. Detection method: exon sequencing. Cohort: consecutive patients (52 females and 27 males; mean age, 45.7 +/- 16.8 years; age range, 14–82 years) affected with HNPGL, evaluated between January 1, 2003 and March 30, 2011 in Italy. Penetrance: age at presentation was significantly lower in mutation carriers than in sporadic cases, and this difference is still present when excluding patients with a positive FH, which might have played a role in anticipating the detection of the tumor. Affected individuals: 79. Affected individuals with mutation: 1 (Italian).
• PMID: 19628817. Detection method: exon sequencing. Validation: yeast mutant rescue to confirm loss of function. Cohort: 1 Dutch family. Inheritance: paternal (maternal imprinting). Penetrance: 5 individuals (median age 42 years) with a paternal mutation had not developed overt paraganglioma; this reduced penetrance was thought to relate to young age and/or presence of undetected tumors. Affected individuals: 33. Individuals with mutation: 45. Affected individuals with mutation: 33. Non-affected individuals with mutation: 7 maternal inheritance, 5 paternal inheritance. Unaffected controls: 400.

Modeling disease/gene/variant/phenotype relationships

[Figure: gene (Gene1 to Gene5), variant (Var001 to Var006), allele (Allele1, Allele2) and phenotype (PhenotypeW to PhenotypeZ) nodes linked to Disease and to a Biochemical Pathway. Patterns illustrated: Mendelian recessive; compound heterozygote with a less severe/modified phenotype; Mendelian dominant with variable penetrance; X-linked; genomic imprinting/silencing; multigenic, multifactorial and complex; dosage sensitive/copy number variants; contiguous gene syndromes. Notes on the figure: each population/cohort has a variant allele genotype frequency and a haplotype associated with disease (although we may not have the data); modifiers include genetic background, allele origin, epigenetic factors, ethnicity, and environment/lifestyle; drugs may act on specific protein isoforms determined by the underlying genetic variation.]

PolyPhen

Assessment of Variant Effect

Directed graph of relationships among SNP prediction webservers and their bioinformatics sources.

Karchin R Brief Bioinform 2009;10:35-52

© The Author 2009. Published by Oxford University Press.

Interpretome

GENE210 - Genomics and Personalized Medicine