VectorBase Overview

November 6, 2006

E.O. Stinson • Ryan Butler • Robert Bruggner

VectorBase Website Overview

• Central starting point

• Organism branding

• Data
  • Genomic data (Ensembl browser)
  • Other data

• Tools

• Data Mining (BioMart)

Homepage

[Screenshot: VectorBase Site Overview]

A. gambiae Homepage

[Screenshot: A. gambiae homepage]

A. aegypti Homepage

[Screenshot: A. aegypti homepage]

Ensembl Browser

[Screenshot: A. gambiae @ VectorBase, Genome: ContigView, Chromosome X 11,227,080 - 11,327,081, captured 11/04/2006. URL: http://agambiae.vectorbase.org/Genome/ContigView/?seq_region_right=…&seq_region_left=1&seq_region_width=100000&vclick.x=23&vclick.y=248]

[Screenshot: A. gambiae @ VectorBase, Genome: GeneView, Ensembl Gene Report for ENSANGG00000016645, Chromosome X 11,248,409 - 11,252,253, captured 11/04/2006. URL: http://agambiae.vectorbase.org/Genome/GeneView/?panel_das=off;db=core;gene=ENSANGG00000016645]

• Large user base

• Feature-rich

• Easy to use

• Actively developed

• DAS client

• Allows use of Ensembl databases
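The browser screenshots for this slide show a region string ("X : 11227080 - 11327081") and a GeneView URL keyed by an Ensembl gene ID. As a minimal sketch of how a script might work with these, the Python below parses such a region string and rebuilds the GeneView URL. The names `Region`, `parse_region`, and `geneview_url` are hypothetical helpers, not part of VectorBase or Ensembl. The inclusive-coordinate convention is an assumption. The URL pattern is copied from the 2006 screenshot footer and may not match any current site.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    """Hypothetical container for a chromosome region."""
    chromosome: str
    start: int
    end: int

    @property
    def width(self) -> int:
        # Assumes 1-based inclusive coordinates.
        return self.end - self.start + 1


def parse_region(text: str) -> Region:
    """Parse a region string such as 'X : 11227080 - 11327081'."""
    match = re.fullmatch(r"\s*(\w+)\s*:\s*([\d,]+)\s*-\s*([\d,]+)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized region: {text!r}")
    chromosome, start, end = match.groups()
    # Thousands separators (as in '11,227,080') are stripped before conversion.
    return Region(chromosome, int(start.replace(",", "")), int(end.replace(",", "")))


def geneview_url(gene_id: str, host: str = "agambiae.vectorbase.org") -> str:
    """Build a GeneView URL following the pattern in the screenshot footer."""
    return f"http://{host}/Genome/GeneView/?panel_das=off;db=core;gene={gene_id}"
```

For example, `parse_region("X : 11227080 - 11327081")` yields a `Region` on chromosome X, and `geneview_url("ENSANGG00000016645")` reproduces the URL printed under the GeneView screenshot.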

Other Data

• Images

• Documents

• Downloads

• Analyzed data

[Figure labels: A. gambiae, A. aegypti]

Tools

• BLAST

• ClustalW

• HMMER

• Integrated pipeline

Data Mining: BioMart

• “Query-oriented data management system”

• Configurable filters and output

• Can include pre-calculated analyzed data

• Data-source independent (Chado, Ensembl, others)

[Screenshots: Data Mining: BioMart]

Search

• Pre-indexed data from Chado database
  • Can include non-genomic data

• Provides links to more resources
  • Genome view in Ensembl viewer
  • Other resources
  • BAC in situ hybridization images

[Screenshots: keyword search, image search]

Data Migration

• Chado integrates multiple data sources

• Most recent Ensembl database

• External data sources

• Community data

• Non-genomic data

Tools

• BLAST

• ClustalW

• In development
  • HMMER
  • Others

BLAST

• Against local VectorBase databases

• Interact with results and alignments
  • Formatted
  • View alignment in genome browser
  • Download the sequences that were hit

• BLAST Demo and Skip Slides

[Screenshots: BLAST input sequence, BLAST status, BLAST results]

Integrated Pipeline

• Consistent interface

• One-site analyses

• Integration with genome browser

• Saved analyses

[Diagram labels: Sequences, BLAST, ClustalW, HMMER]

Under Development

• Community Annotation Pipeline

• Controlled Vocabulary Browser

• AnoBase Tool Migration

Manual Annotation

• Manual Gene Models
  • Sequenced cDNAs of experimentally confirmed genes
  • Gene names
  • Associated metadata

• Uses
  • Top-tier evidence for gene builds
  • Benchmarks for automatic annotation

• Sources
  • Manual gene annotators
  • Community researchers

• Sustainability & Goals
  • Continual improvement of gene builds
  • Attractiveness in ease of use

Manual Annotation Pipeline

• Gene models created by manual annotators through personal inspection.

• Imported into Chado, temporarily displayed in Genome Browser via DAS.

• Exported to Ensembl via GFF format & used as primary evidence for gene build.

• Ensembl data updated at VectorBase.

[Diagram labels: Manual Annotators, Community Annotators, Chado, Review, Gene Build, Ensembl]
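The GFF export step in this pipeline can be sketched as follows. This is a minimal illustration, assuming a GFF3-style nine-column, tab-separated layout; the slides do not say which GFF version was used. The `GeneModel` class, the `to_gff_lines` function, the `manual_annotation` source tag, and the `.t1` transcript suffix are all hypothetical and are not taken from the VectorBase or Ensembl code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GeneModel:
    """Hypothetical container for one manually annotated gene model."""
    gene_id: str
    seqid: str                    # chromosome or scaffold name
    strand: str                   # "+" or "-"
    exons: List[Tuple[int, int]]  # 1-based inclusive (start, end) pairs


def to_gff_lines(model: GeneModel, source: str = "manual_annotation") -> List[str]:
    """Render a gene model as GFF3-style lines: gene, mRNA, then exons."""
    exons = sorted(model.exons)
    start = exons[0][0]
    end = max(exon_end for _, exon_end in exons)
    transcript_id = f"{model.gene_id}.t1"  # assumed naming scheme

    rows = [
        (model.seqid, source, "gene", start, end, ".", model.strand, ".",
         f"ID={model.gene_id}"),
        (model.seqid, source, "mRNA", start, end, ".", model.strand, ".",
         f"ID={transcript_id};Parent={model.gene_id}"),
    ]
    for number, (exon_start, exon_end) in enumerate(exons, start=1):
        rows.append(
            (model.seqid, source, "exon", exon_start, exon_end, ".",
             model.strand, ".",
             f"ID={transcript_id}.exon{number};Parent={transcript_id}")
        )
    # Each row becomes one tab-separated line with exactly nine columns.
    return ["\t".join(str(field) for field in row) for row in rows]
```

Calling `to_gff_lines(GeneModel("example_gene_1", "X", "+", [(100, 200), (300, 400)]))` returns one gene line, one mRNA line, and two exon lines; the ID and coordinates here are illustrative only.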

Community Annotation Pipeline

• Users submit cDNA sequences and metadata via Excel spreadsheet.
  • Automatically aligned to genome via Exonerate.
  • Sequences with ambiguous locations require submitter resolution.

• Organism community rep reviews submissions.
  • Can contact submitter to resolve questions.

• Approved models temporarily displayed via DAS.

• Approved models exported to Ensembl via GFF and used as primary evidence for gene build.
  • New stable IDs passed back to VectorBase.

• Ensembl data updated at VectorBase.

• Submission credit displayed with models built from community submissions.

[Diagram labels: Manual Annotators, Community Annotators, Chado, Review, Gene Build, Ensembl]

Controlled Vocabulary Feature Browser

• Hierarchical browsing of features by functionality.

AnoBase Tool Migration

• Integration of AnoBase
  • Store and relate data within the VectorBase database (Chado).

• Component Progress:
  • Full Data & Interface Integration
    ‣ In situ BAC hybridization images
    ‣ Gene Tool
  • Stand Alone Entities
    ‣ AnoXcel
    ‣ AegyXcel
    ‣ A. gambiae mitochondrial data
  • Not yet migrated
    ‣ Insecticide Resistance
    ‣ Marker Data

Questions, Comments and Feedback