The European Variation Archive, programmatically

Gary Saunders & Pablo Acre-Garcia
November 22nd, 2017
www.ebi.ac.uk/eva
[email protected]

Data resources by category:

• Genes, genomes & variation: European Nucleotide Archive, Ensembl, European Genome-phenome Archive, 1000 Genomes, Ensembl Genomes, Metagenomics portal, European Variation Archive
• Gene, protein & metabolite expression: ArrayExpress, MetaboLights, Expression Atlas, PRIDE
• Literature & ontologies: Europe PubMed Central, Gene Ontology, Experimental Factor Ontology
• Protein sequences, families & motifs: InterPro, Pfam, UniProt
• Molecular structures: Protein Data Bank in Europe, Electron Microscopy Data Bank
• Chemical biology: ChEMBL, ChEBI
• Reactions, interactions & pathways: IntAct, Reactome, MetaboLights
• Systems: BioModels, BioSamples, Enzyme Portal

European Variation Archive – EVA (Eva)

• Submission-based data sharing & analysis platform
• All types of variation:
  • SNVs, MNVs, small indels and structural variation
  • Germ line, somatic, within / cross population, potentially between species

Sharing variation files, and variant data, from a single resource

European Variation Archive – EVA (Eva)

• Why archive variation files?
  • Administer file accessions
    • publications
    • share between researchers / labs
    • stable
  • Administer variant accessions
    • Individual variants accessioned
    • dbSNP
    • stable

Any variation represented as a change against a reference genome sequence

EVA Data Model

Diagram labels: Submitter, Analysis, EVA, Sample(s), Methodology, Genome

EVA Content

- 98 short variation studies (40 human)
- 154 structural variation studies (118 human)
- >1 billion submitted variants

The EVA API

• URLs
  • Fixed part
    • https://www.ebi.ac.uk/eva/webservices/rest/v1
  • Variable path
    • /{category}/{IDs}/{resource}?{filters}
  • Example: https://www.ebi.ac.uk/eva/webservices/rest/v1/genes/BRCA2/variants?species=hsapiens_grch37
• https (http calls are redirected to https)
• Only queries allowed: GET
• HTTP returned codes
  • 200 OK
  • 400 Bad Request

The EVA API: paths
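As a rough illustration of how the fixed part and the variable path fit together, the example URL can be taken apart with Python's standard library. This is only a sketch: the function name `split_eva_url` and the layout of the returned dictionary are invented for illustration and are not part of the EVA API.

```python
from urllib.parse import urlsplit, parse_qs

# Path of the fixed part given on the slide, without scheme and host.
BASE_PATH = "/eva/webservices/rest/v1"


def split_eva_url(url):
    """Split an EVA REST URL into category, IDs, resource and filters (hypothetical helper)."""
    parts = urlsplit(url)
    if not parts.path.startswith(BASE_PATH):
        raise ValueError("not an EVA v1 web-service URL")
    # Whatever follows the fixed part is /{category}/{IDs}/{resource}.
    segments = parts.path[len(BASE_PATH):].strip("/").split("/")
    category = segments[0]
    ids = segments[1].split(",") if len(segments) > 1 else []
    resource = segments[2] if len(segments) > 2 else None
    # parse_qs returns a list per key; keep the first value of each filter.
    filters = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {"category": category, "ids": ids, "resource": resource, "filters": filters}


example = split_eva_url(
    "https://www.ebi.ac.uk/eva/webservices/rest/v1/genes/BRCA2/variants?species=hsapiens_grch37"
)
print(example)
```

A path with no IDs or resource, such as `/segments?species=hsapiens_grch37`, comes back with an empty ID list and `None` as the resource.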

/{category}/{IDs}/{resource}?{filters}

• Categories
  • variants
  • segments
  • genes
  • files
  • studies
• IDs: if several, separated by commas
• resource: variants, info, all, files, view, summary

The EVA API: filters
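A small helper can assemble a request URL from a category, a list of IDs, a resource and a dictionary of filters. This is a minimal sketch under stated assumptions: the name `build_eva_url` is hypothetical, the second region in the usage lines is a made-up value, and the resource is deliberately not validated because endpoints shown elsewhere in these slides (for example `exists`) do not appear in the resource list.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebi.ac.uk/eva/webservices/rest/v1"
CATEGORIES = {"variants", "segments", "genes", "files", "studies"}


def build_eva_url(category, ids=(), resource=None, filters=None):
    """Assemble /{category}/{IDs}/{resource}?{filters} on top of the fixed part (hypothetical helper)."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    path = [category]
    if ids:
        # Several IDs are separated by commas.
        path.append(",".join(ids))
    if resource:
        path.append(resource)
    url = BASE_URL + "/" + "/".join(path)
    if filters:
        # A dict is used because filter names such as annot-ct contain hyphens.
        url += "?" + urlencode(filters)
    return url


gene_url = build_eva_url("genes", ["BRCA2"], "variants", {"species": "hsapiens_grch37"})
# The second region below is an invented value, used only to show the comma separator.
region_url = build_eva_url(
    "segments", ["8:200000-201000", "9:1-100"], "variants", {"species": "hsapiens_grch37"}
)
print(gene_url)
print(region_url)
```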

/{category}/{IDs}/{resource}?{filters}

• Common
  • studies: PRJ ids
    • https://www.ebi.ac.uk/eva/?Study-Browser&browserType=sgv
  • species

    aaegypti_aaegl3              hsapiens_grch38
    agambiae_agamp4              hvulgare_030312v2
    aminimus_1v1                 lcrocea_10
    aquadriannulatus_quad4av1    mmusculus_grcm38
    asinensis_v1                 oaries_oarv31
    astephensi_sda500v1          osativa_irgsp10
    athaliana_tair10             sbicolor_sorbi1
    btaurus_umd31                slycopersicum_sl240
    chircus_10                   smansoni_23792v2
    csabaeus_chlsab11            spombe_asm294v2
    ggallus_galgal4              sratti_ed321v504
    hsapiens_grch37              zmays_agpv3
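The species codes listed here can be checked on the client side before a request is sent. The sketch below rests on two assumptions: the set is a snapshot of the codes on this slide and may not match what the service currently accepts, and reading each code as an organism part and an assembly part joined by an underscore is an interpretation of the pattern, not something the slides state. The function name `split_species_code` is hypothetical.

```python
# Snapshot of the species codes listed on the slide.
SPECIES_CODES = {
    "aaegypti_aaegl3", "agambiae_agamp4", "aminimus_1v1",
    "aquadriannulatus_quad4av1", "asinensis_v1", "astephensi_sda500v1",
    "athaliana_tair10", "btaurus_umd31", "chircus_10", "csabaeus_chlsab11",
    "ggallus_galgal4", "hsapiens_grch37", "hsapiens_grch38",
    "hvulgare_030312v2", "lcrocea_10", "mmusculus_grcm38", "oaries_oarv31",
    "osativa_irgsp10", "sbicolor_sorbi1", "slycopersicum_sl240",
    "smansoni_23792v2", "spombe_asm294v2", "sratti_ed321v504", "zmays_agpv3",
}


def split_species_code(code):
    """Return (organism, assembly) for a listed code; raise ValueError otherwise."""
    if code not in SPECIES_CODES:
        raise ValueError(f"species code not in the listed set: {code}")
    # Assumed reading: text before the first underscore is the organism,
    # text after it is the assembly.
    organism, assembly = code.split("_", 1)
    return organism, assembly


print(split_species_code("hsapiens_grch37"))
```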

• Variant filters
  • annot-ct, annot-vep-version, annot-vep-cache-version
  • maf, polyphen, sift
  • exclude, include
• Other options:
  • limit: limit number of returned results
  • skip: skip first n results

The EVA API: output

The EVA API: response
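The `limit` and `skip` options lend themselves to paging through a large result set, using the `numResults` and `numTotalResults` counts reported with each response to decide when to stop. The following is a sketch under stated assumptions, not the service's documented behaviour: `fetch_page` is a hypothetical callable standing in for the HTTP request, and it is assumed to return a dictionary holding `numResults`, `numTotalResults` and `result`. Where those fields sit inside the real JSON payload is not covered here.

```python
def iter_results(fetch_page, limit=100):
    """Yield every result by advancing `skip` in steps of at most `limit`."""
    skip = 0
    while True:
        page = fetch_page(skip=skip, limit=limit)
        yield from page["result"]
        skip += page["numResults"]
        # Stop on an empty page or once everything reported has been seen.
        if page["numResults"] == 0 or skip >= page["numTotalResults"]:
            break


# Stand-in data and fetcher so the sketch runs without network access.
FAKE_VARIANTS = [f"variant_{i}" for i in range(7)]


def fake_fetch(skip, limit):
    chunk = FAKE_VARIANTS[skip:skip + limit]
    return {
        "numResults": len(chunk),
        "numTotalResults": len(FAKE_VARIANTS),
        "result": chunk,
    }


all_variants = list(iter_results(fake_fetch, limit=3))
print(len(all_variants))
```

In real use, `fake_fetch` would be replaced by a function that issues the GET request with the `skip` and `limit` filters appended to the query string.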

• numResults: number of returned results
• numTotalResults: number of total results for that query in the database
• result: list of
  • Strings (contig names)
  • Booleans (variant exists or not)
  • JSON objects
    • Variant
    • Study

The EVA API: variants

• /variants/rs666/exists?species=hsapiens_grch37
  • Returns true if the variant is in the archive
• /variants/rs666/info?species=hsapiens_grch37

The EVA API: segments and genes
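Because `result` may hold strings, Booleans or JSON objects depending on the endpoint, client code may want to inspect a response before using it. The sketch below assumes the three fields have already been pulled out into a plain dictionary. The function name `summarise_response` and both sample dictionaries are invented for illustration and do not reproduce actual service output.

```python
def summarise_response(response):
    """Report counts, completeness and the Python types found in `result`."""
    results = response["result"]
    return {
        "returned": response["numResults"],
        "total": response["numTotalResults"],
        # True when the query was answered in full by this response.
        "complete": response["numResults"] == response["numTotalResults"],
        # 'str' for contig names, 'bool' for exists checks, 'dict' for JSON objects.
        "kinds": sorted({type(item).__name__ for item in results}),
    }


# Invented sample data shaped like the fields described on the slide.
contig_listing = {"numResults": 3, "numTotalResults": 3, "result": ["1", "2", "X"]}
exists_check = {"numResults": 1, "numTotalResults": 1, "result": [True]}

print(summarise_response(contig_listing))
print(summarise_response(exists_check))
```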

• /segments?species=hsapiens_grch37
  • Return list of contig names
• /segments/8:200000-201000/variants?species=hsapiens_grch37
  • Return all the variants located in the given region(s) for that assembly
• /genes/BRCA2/variants?species=hsapiens_grch37
  • Return all the variants in the given gene(s)

Using the EVA API: examples

Where to get help

Help and Documentation for the EVA API

• Documentation, including our FAQs, at the EVA website:
  • www.ebi.ac.uk
• EVA web-services GitHub repo:
  • https://github.com/EBIvariation/eva-ws
• Helpdesk email:
  • [email protected]

Summary

The European Variation Archive, programmatically

• Introduction to the EVA

• Detailed description of the API structure and endpoints

• Command line example

• Swagger documentation

• Real world example using Python

• All materials available

Acknowledgments

• EVA / DGVA: Thomas Keane, Cristina Yenyxe Gonzalez, Pablo Acre, Jose Miguel Mut, Tom Smith, Jag Kandasamy, Diego Piggioli
• Ensembl Variation: Fiona Cunningham, Sarah Hunt, William McLaren, Anja Thormann, Laurent Gil
• Ensembl Genomes: Paul Kersey, Dan Bolser, Christoph Grabmuller
• BioSamples: Helen Parkinson, Tony Burdett, Adam Faulconbridge
• ENA: Rasko Leinonen, Marc Rosello, Daniel Vaughan
• Funding

@evarchive