Ensembl Variation API
Hinxton, May 2013
Anja Thormann
EBI is an Outstation of the European Molecular Biology Laboratory.

Lactose Intolerance
rs4988235: a SNP near the LCT gene that controls whether the lactase enzyme is turned on or off as a person grows older.
Alleles: G/A
Genotype   What it means
GG         Likely to be lactose intolerant.
AG, AA     Likely to be tolerant due to lactase persistence.

Different paths to Variation API
• Variation name (e.g. Variation consequence)
• Genomic location (e.g. Linkage disequilibrium)
• Population or individual specific (variation, location)
• Phenotype
• Study (Structural variation data)

Ensembl Variation
• Build a new variation database: import data from e.g. dbSNP, …
• Quality control
• Data annotation: e.g. phenotype data, ancestral alleles, …
• Calculate consequences of variant on transcript
• SIFT and PolyPhen scores

• Variation – Data description
• Variation – Predicted data

Course outline
• Central data objects: Variation, VariationFeature, Allele
• Structural variations
• Variation consequences
• Phenotype annotation
• Linkage disequilibrium
• Resequencing data
• Variation sets
• What next?

http://www.ensembl.org/info/docs/Doxygen/variation-api/index.html
http://www.ensembl.org/info/docs/Doxygen/core-api/index.html
http://www.ensembl.org/Homo_sapiens/Variation/Summary?r=2:136608146-136609146;source=dbSNP;v=rs4988235;vdb=variation;vf=3877172

[Diagram of API objects: StructuralVariation, OverlapConsequence, Allele, StructuralVariationFeature, PhenotypeFeature, TranscriptVariationAllele, LDFeatureContainer, TranscriptVariation, SupportingStructuralVariation, VariationFeature, Variation]
[Diagram of adaptors: TranscriptVariationAdaptor, VariationAdaptor, …Adaptor]

Variation
Variation: name, var_class, source, …

$variation->name         > rs5571078
                         > COSM998
$variation->var_class    > SNV
                         > indel
$variation->source       > dbSNP
                         > COSMIC

Allele
Allele: allele, frequency, population, …

$allele->allele          > A
$allele->frequency       > 0.85
$allele->population      > Bio::EnsEMBL::Variation::Population

Object creation
• Using adaptors
  • Fetch object(s) according to some property, e.g. name, location
  • Check the documentation to see which methods the adaptor provides
• Using API objects, e.g. Variation
  • Get object(s) from an API object
  • $variation->get_all_Alleles() # returns a listref of Allele objects

Adaptors

use Bio::EnsEMBL::Registry;
my $registry = 'Bio::EnsEMBL::Registry';

$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous'
);

# get VariationAdaptor; the arguments are species, group, object type
my $va = $registry->get_adaptor('human', 'variation', 'variation');
…
my $variation = $va->fetch_by_name('rs334');
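
The two routes to an object described above (through an adaptor, and through another API object) can be combined in one short script. The following is a minimal sketch, assuming the Ensembl core and variation Perl APIs are installed and the public database server is reachable. The variation rs334 is taken from the slide; the output layout and the error message are illustrative choices, not part of the course material.

```perl
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';

# Connect to the public Ensembl database server (needs network access).
$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous',
);

# Route 1: an adaptor fetches an object by one of its properties.
my $va        = $registry->get_adaptor('human', 'variation', 'variation');
my $variation = $va->fetch_by_name('rs334');
die "rs334 was not found\n" unless defined $variation;

print $variation->name, "\t", $variation->var_class, "\n";

# Route 2: an API object hands out related objects.
# get_all_Alleles() returns a listref, so dereference it before looping.
my $alleles = $variation->get_all_Alleles();
printf "%d allele records\n", scalar @{$alleles};
foreach my $allele (@{$alleles}) {
    print "  ", $allele->allele, "\n";
}
```

Checking that fetch_by_name() returned a defined value before calling methods on it avoids a fatal "Can't call method on an undefined value" error when a name is not in the database.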
Exercise 1: Variation, Allele
• Retrieve source and variation class for the following variations in human:
  • rs55710239
  • rs56385407
  • COSM998
  • CI003207
• For SNP rs1333049 in human, retrieve the following information for each of its alleles:
  • Allele
  • Frequency*
  • Population name*
  • Submitter name (‘handle’)*
*if exists
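
One possible way to approach this exercise is sketched below; it is not an official solution. It assumes the Ensembl core and variation Perl APIs are installed and the public server is reachable. The helpers source_label() and or_dash() are hypothetical names introduced here. The call to subsnp_handle() for the submitter handle, and the fallback between source_name() and source(), reflect how the API is understood to differ between releases and should be checked against the documentation of the installed version.

```perl
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous',
);
my $va = $registry->get_adaptor('human', 'variation', 'variation');

# Assumption: newer API releases provide the source name via source_name(),
# while the release used in this course returns it from source().
sub source_label {
    my ($variation) = @_;
    return $variation->can('source_name')
        ? $variation->source_name
        : $variation->source;
}

# Hypothetical helper: print '-' for values that are not recorded.
sub or_dash {
    my ($value) = @_;
    return defined $value ? $value : '-';
}

# Part 1: source and variation class for each named variation.
foreach my $name (qw(rs55710239 rs56385407 COSM998 CI003207)) {
    my $variation = $va->fetch_by_name($name);
    unless (defined $variation) {
        print "$name\tnot found\n";
        next;
    }
    print join("\t", $name,
        or_dash(source_label($variation)),
        or_dash($variation->var_class)), "\n";
}

# Part 2: allele, frequency, population name and submitter handle.
my $snp = $va->fetch_by_name('rs1333049');
die "rs1333049 was not found\n" unless defined $snp;

foreach my $allele (@{ $snp->get_all_Alleles() }) {
    my $population = $allele->population;    # may be undefined
    print join("\t",
        or_dash($allele->allele),
        or_dash($allele->frequency),
        or_dash($population ? $population->name : undef),
        or_dash($allele->subsnp_handle),
    ), "\n";
}
```

The starred items in the exercise are optional in the data, which is why every value passes through or_dash() before printing.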