Genome Viewing

Ensembl – An Overview
Twitter: #Ensembl
Dr. Giulietta M. Spudich, Ensembl Outreach
06 March 2014
EMBL-EBI. EBI is an Outstation of the European Molecular Biology Laboratory.

This talk …
• Genome Sequencing and Browsers
• Ensembl Data
  – Genes
  – Variation
  – Comparative Genomics
  – Regulation
• Access

Beginnings …
• 1995: First free-living organism sequenced: the bacterium Haemophilus influenzae (1.8 million bp)
• 2001: First draft of the human sequence (3 Gb)
• 2004: ‘Finished’ human sequence
• 2014: Polished human sequence with haplotypes (GRCh38)

Today’s genomics – human
• 1000 Genomes Project
• ENCODE
Image credits: COURTESY OF NIH / THOMAS POROSTOCKY; SOURCE: MEETINGZONE

Today’s genomics – other species

Ensembl – Access to …

Sister project …
Bacteria, Protists, Plants, Fungi, (non-vertebrate) Metazoa

Raw sequence
CGCCGGGAGAAGCGTGAGGGGACAGATTTGTGACCGGCGCGGTTTTTGTCAGCTTACTC
CGGCCAAAAAAGAACTGCACCTCTGGAGCGGACTTATTTACCAAGCATTGGAGGAATATCG
TAGGTAAAAATGCCTATTGGATCCAAAGAGAGGCCAACATTGCACTGCTGCGCCTCTGCTG
CGCCTCGGGTGTCTTTTGCGGCGGTGGGTCGCCGCCGGGAGAAGCGTGAGGGGACAGA
TTTGTGACCGGCGCGGTTTTTGTCAGCTTACTCCGGCCAAAAAAGAACTGCACCTCTGGA
GCGGACTTATTTACCAAGCATTGGAGGAATATCGTAGGTAAAAATGCCTATTGGATCCAAAG
AGAGGCCAACATTTTTTGAAATTTTTAAGACACGCTGCAACAAAGCAGATTTAGGACCAATA
AGTCTTAATTGGTTTGAAGAACTTTCTTCAGAAGCTCCACCCTATAATTCTGAACCTGCAG
ACTAAAATGGATCAAGCAGATGATGTTTCCTGTCCACTTCTAAATTCTTGTCTTAGAAGAATC
TGAACATAAAAACAACAATTACGAACCAAACCTATTTAAAACTCCACAAAGGAAACCATCTTA
TAATCAGCTGGCTTCAACTCCAATAATATTCAAAGAGCAAGGGCTGACTCTGCCGCTGTAC
CAATCTCCTGTAAAAGAATTAGATAAATTCAAATTAGACTTAGGAAGGAATGTTCCCAATAGT
AGACTAAAAGTCTTCGCACAGTGAAAT
CGCCGGGAGAAGCGTGAGGGGACAGATTTGTGACCGGCGCGGTTTTTGTCAGCTTACTC
CGGCCAAAAAAGAACTGCACCTCTGGAGCGGACTTATTTACCAAGCATTGGAGGAATATCG
TAGGTAAAAATGCCTATTGGATCCAAAGAGAGGCCAACATTACTAAAATGGATCAAGCAGAT
GATGTTTCCTGTCCACTTCTAAATTCTTGTCTTAG
AATTGGTTTGAAGAACTTTCTTCAGAAGCTCCACCCTATAATTCTGAACCTGCAGTGAAAGT
CCTGTTGTTCTACAATGTACACATGTAACACCACAAAGAGATAAGTCA

Ensembl – unlocking the code
[Figure labels: Regulation, Conserved sequence, Gene, Allele]
• Splice variants, proteins, non-coding RNA
• Small- and large-scale sequence variation, phenotype associations
• Whole genome alignments, protein trees
• Potential promoters and enhancers, DNA methylation
• User upload, custom data
Figure adapted from the ENCODE project, www.nature.com/nature/focus/encode/

Challenge: the number of gene/protein sequences keeps increasing
• UniProtKB/Swiss-Prot (e.g. Q8IU82): 542,258
• UniProtKB/TrEMBL: 51,616,950
• NCBI RefSeq (e.g. NP_006570): 37,371,278

Is there a consensus?
• Reaching a consensus coding sequence (CCDS) set for human and mouse.
• Human: 29,045 CCDS IDs, 18,683 Ensembl Gene IDs (e74)
• Mouse: 23,093 CCDS IDs, 19,988 Ensembl Gene IDs (e72)

The GENCODE set
www.gencodegenes.org
• Ensembl has long been respected for its high-quality gene sets
• GENCODE genes = Ensembl Automatic Pipeline + Havana Manual Annotation (+ Yale pseudogenes)
• GENCODE is used by ENCODE, 1000 Genomes, and other projects.

Ensembl Variation
Aims:
• Collect, integrate and annotate all known variants
• Provide tools for comparison to other genomic data
• Provide a framework for access and to improve understanding

Practical applications of variation
Molecular and clinical medicine
• Diagnosis, detection and treatment:
  – e.g. myotonic dystrophy, fragile X syndrome, inherited colon cancer, familial breast cancer
• Pharmacogenomics: "custom drugs"
DNA forensics
• Identification of suspects, catastrophe victims, and endangered species
Agriculture, livestock breeding
• Disease-, insect-, and drought-resistant crops
• Healthier, disease-resistant animals
• Marker-assisted breeding
• More nutritious produce
• Reducing the costs of agriculture
Anthropology, evolution, and human migration

Variation Sources
• dbSNP (1000 Genomes, ClinVar, etc.)
• ESP (Exome Sequencing Project)
• UniProt
• COSMIC
• HGMD_Public
• NHGRI-GWAS
• & more …
www.ensembl.org/info/genome/variation/sources_documentation

Variation in the Browser

Ensembl Variant Effect Predictor
New Interface!
• Uses an Ensembl gene set to annotate:
  – SNPs
  – Indels
  – Variants in regulatory regions
  – Structural variants
• Web interface, Perl script, REST API, XML
• Publication: McLaren et al. 2010 (Bioinformatics)

[Figure: phylogenetic tree of species, labelled "Ensembl"; legible species labels include Felis_catus, Danio_rerio and Gadus_morhua.]
Image obtained using Dendroscope (D.H. Huson and C. Scornavacca, Dendroscope 3: An interactive tool for rooted phylogenetic trees and networks, Systematic Biology, 2012)

Comparative Genomics
Whole genome alignments
Clade labels on the figure: Great apes, Old world monkeys, Primates, Rodents, Glires, Boroeutherian, Laurasiatheria

Homo sapiens             ACGT-GG--CCAGCGCGGGCTTGTGGCGCGAGCTTCTGAAAC-TAGGCG-GCAGAGGCGGAGC--CGCTG-TGGC---ACTGCTGCGCCTCTG-CTGCGCCTCGGGTGTCTTTTGCGGCG
Ancestral sequences      ACGT-GG--CCAGCGCGGGCTTGTGGCGCGAGCTTCTGAAAC-TAGGCG-GCAGAGGCGGAGC--CGCTG-TGGC---TCTGCTGCGCCTCTG-CTGCGCCTCGGGTGTCTTTTGCGGCG
Pan troglodytes          ACGT-GG--CCAGCGCGGGCTTGTGGCGCGAGCTTCTGAAAC-TAGGCG-GCAGAAGCGGAGC--CGCTG-TGGC---TCTGCTGCGCCACTG-CTGCGCCTCGGGTGTCTTTTGCGGCG
Ancestral sequences      ACGT-GG--CCAGCGCGGGCTTGTGGCGCGAGCTTCTGAAAC-TAGGCG-GCAGAGGCGGAGC--CGCTG-TGGC---TCTGCTGCGCCTCTG-CTGCGCCTCGGGTCTCTTTTGCGGCG
Gorilla gorilla gorilla  ........................................................................................................................
Ancestral sequences      ........................................................................................................................
Pongo abelii             ACGT-GG--CCAGCGCGGGCTTGTGGCGCGAGCTTCTGAAAC-TAGGCG-GCAGAGGCGGAGC--CGCTG-TGGC---TGTGCTGCACCTGTG-CTGCGCCTCGGGTCTCTTTTGCGGCG
Ancestral sequences      ACGT-GG--CCAGCGCGGGCTTGTGGCGCGAGCTTCTGAAAT-TAGGCG-GCAGAGGCGGAGC--TGCTG-TGGC--------------TCTG-CTGCGCCTCGGGTCTCTTTTGCGGCG
Macaca mulatta           ACGT-GG--CCAGCGCGGGCTTGTGGCGCGAGCTTCTGAAAT-CAGGCG-GCAGAGGTGGAAC--TGCTGCTGGC--------------TCTG-CTGCGCCTCGGGTCTCTTTTGCGGCG
Ancestral sequences      ACGT-GG--CCAGCGCGGGCTTGTGGCGCGAGCGTCTGAAAT-GAGGCG-GCAGAGGCGGAGC--TGCTG-TGGC--------------TCTG-CCGCGCCTCGGGTCTTTTCTGCGGCG
Callithrix jacchus       ACGT-GG--TCAGCGCGGGCTTGTGGCGCGAGCGTCTGAAAT-GAGGCG-GCAGAGGCGGACC--TGCTG-TGTC--------------TCTG-CCGCGCCTCCGGTCTTTTCTGCGACG
Ancestral sequences      ACGT-GC--CGAGAGCGGGCTTTTGGCGCGAGCGTCTGAAAT-AAGGCG-GCGGAGGCGGAGC--TGCTG-CGGCT------------------CCGCGTCTCGGGTCTTTTCTGCGGCA
Mus musculus             ACGG-GC--AGAGCGCGGGCTTTTCGCGGGAGCGGGAGCCGT-G----------AGGCGTTGCCGTCAGT-CAGCT-----------------ACCGCTGC-------------------
Ancestral sequences      ACGG-GC--AGAGCGCGGGCTTTTCGCGGGAGCGTGAGAAGT-G----------AGGCGGTGCCGTCCGT-CAGCT-----------------ACCGCAAC-------------------
Rattus norvegicus        ACGGCGC--AGAGCGCGGGCTTTTCGCAGGAGCGTGAGAAGT-G----------AGGCGGCGCCGTCCGT-CAGCG-----------------GCCGCAAC-------------------
Ancestral sequences      ACGT-GC--CGAGAGCGGGCTTTTGGCGCGAGCGTCTGAAAT-AAGGCG-GCGGAGGCGGAGC--TCCTT-CAGCT------------------CCGCGTCTCGGGTCTTTTCTGCGGCA
Oryctolagus cuniculus    ACGT-GC--CCAGAGCGGGCTTTTGGCGCGAGCGTCTGAAAA-AAGGCT-ATGGAGGCGGAGC--TCCTT-CAGCT------------------CCGCGTCTGGGGTCTTGCCTAGGGCA
Ancestral sequences      ACGT-GC--CGAGAGCGGGCTTTTGGCGCGAGCGTCTGAAAT-AAGGCG-GCGGAGGCGGAGC--TGCTG-CGGCT------------------CCGCGTCTCGGGTCTTTTCTGCGGCA
Bos taurus               ACAT-ATCCCGAGAGCAGGCTTTTGGCGCGAGAATCTGAAAC-CCGGTGGGCGGAGGTGCGGC--TGCTG-AAGTTTG----------------C--TGTCTCGGGCGG-T---------
Ancestral sequences      ACGT-GCTCCGAGAGCGGGCTTTTGGCGCGAGCGTCTGAAAT-AAGGCGAGCGGAGGCGGAGC--TGCTG-GGGCTCC----------------C--TGTCTCGGGTGG-TTCTGTGGCA
Canis lupus familiaris   ........................................................................................................................
Ancestral sequences      ........................................................................................................................
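One way to read a whole genome alignment like the one above is to compare two rows column by column. The sketch below is an illustration only, not part of Ensembl: the function names `percent_identity` and `differing_columns` are hypothetical, gaps are assumed to be written as "-", and the two input strings are just the first 20 columns of the Homo sapiens and Mus musculus rows from the slide.

```python
# Minimal sketch: compare two rows of a gapped multiple alignment.
# Function names are hypothetical; "-" is assumed to be the gap character.

def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percentage of columns with the same base in both rows.

    Columns where either row has a gap are skipped, so the result
    describes only positions that are aligned base-to-base.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def differing_columns(row_a: str, row_b: str) -> list[int]:
    """0-based column indexes where the two rows differ (gaps included)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return [i for i, (a, b) in enumerate(zip(row_a, row_b)) if a != b]


# First 20 alignment columns of two rows from the slide.
HUMAN = "ACGT-GG--CCAGCGCGGGC"
MOUSE = "ACGG-GC--AGAGCGCGGGC"

if __name__ == "__main__":
    print(f"identity: {percent_identity(HUMAN, MOUSE):.1f}%")
    print("differing columns:", differing_columns(HUMAN, MOUSE))
```

Skipping gapped columns is a design choice: it measures substitutions only. To treat insertions and deletions as differences as well, count those columns in `compared` instead of skipping them.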
