Genome Viewing

Exploring Genomes with Ensembl
Prague, June 2014
Dr. Giulietta M. Spudich, Ensembl
EMBL-EBI
EBI is an Outstation of the European Molecular Biology Laboratory.

This Course
Monday: EBI Introduction (Giulietta), Ensembl (Giulietta), ArrayExpress (Karyn)
Tuesday: Phylogenies (Laura)
Wednesday: Proteomics (Sandra)
www.ebi.ac.uk/training/Prague2014

This Morning
Ensembl Introduction
Ensembl Browser Walkthrough
Hands-on Exercises
11 AM Coffee
BioMart
1 PM Lunch

Beginnings …
1995: First free-living organism sequenced: the bacterium Haemophilus influenzae (1.8 million bp)
2001: First draft of the human sequence (3 Gb)
2004: ‘Finished’ human sequence
2014: Polished human sequence with haplotypes (GRCh38)

Today’s genomics – human
1000 Genomes Project
ENCODE
Image credits: courtesy of NIH; Thomas Porostocky; source: MeetingZone

Today’s genomics – other species

Ensembl – Access to …

Ensembl Genomes – Expanding Species
Bacteria, Protists, Plants, Fungi, (non-vertebrate) Metazoa

Ensembl Genomes – Examples
http://ensemblgenomes.org
Metazoa: Ixodes scapularis, Schistosoma mansoni, Anopheles gambiae
Protists: Leishmania major, Plasmodium falciparum
Plants: Arabidopsis thaliana, Triticum aestivum
Bacteria (10,000+): Acetobacter tropicalis, Escherichia coli
Fungi: Saccharomyces cerevisiae, Schizosaccharomyces pombe

Raw sequence
CGCCGGGAGAAGCGTGAGGGGACAGATTTGTGACCGGCGCGGTTTTTGTCAGCTTACTC
CGGCCAAAAAAGAACTGCACCTCTGGAGCGGACTTATTTACCAAGCATTGGAGGAATATCG
TAGGTAAAAATGCCTATTGGATCCAAAGAGAGGCCAACATTGCACTGCTGCGCCTCTGCTG
CGCCTCGGGTGTCTTTTGCGGCGGTGGGTCGCCGCCGGGAGAAGCGTGAGGGGACAGA
TTTGTGACCGGCGCGGTTTTTGTCAGCTTACTCCGGCCAAAAAAGAACTGCACCTCTGGA
GCGGACTTATTTACCAAGCATTGGAGGAATATCGTAGGTAAAAATGCCTATTGGATCCAAAG
AGAGGCCAACATTTTTTGAAATTTTTAAGACACGCTGCAACAAAGCAGATTTAGGACCAATA
AGTCTTAATTGGTTTGAAGAACTTTCTTCAGAAGCTCCACCCTATAATTCTGAACCTGCAG
ACTAAAATGGATCAAGCAGATGATGTTTCCTGTCCACTTCTAAATTCTTGTCTTAGAAGAATC
TGAACATAAAAACAACAATTACGAACCAAACCTATTTAAAACTCCACAAAGGAAACCATCTTA
TAATCAGCTGGCTTCAACTCCAATAATATTCAAAGAGCAAGGGCTGACTCTGCCGCTGTAC
CAATCTCCTGTAAAAGAATTAGATAAATTCAAATTAGACTTAGGAAGGAATGTTCCCAATAGT
AGACTAAAAGTCTTCGCACAGTGAAAT

CGCCGGGAGAAGCGTGAGGGGACAGATTTGTGACCGGCGCGGTTTTTGTCAGCTTACTC
CGGCCAAAAAAGAACTGCACCTCTGGAGCGGACTTATTTACCAAGCATTGGAGGAATATCG
TAGGTAAAAATGCCTATTGGATCCAAAGAGAGGCCAACATTACTAAAATGGATCAAGCAGAT
GATGTTTCCTGTCCACTTCTAAATTCTTGTCTTAG

AATTGGTTTGAAGAACTTTCTTCAGAAGCTCCACCCTATAATTCTGAACCTGCAGTGAAAGT
CCTGTTGTTCTACAATGTACACATGTAACACCACAAAGAGATAAGTCA

Ensembl – unlocking the code
Figure labels: Regulation, Conserved sequence, Gene, Allele
• Splice variants, proteins, non-coding RNA
• Small and large scale sequence variation, phenotype associations
• Whole genome alignments, protein trees
• Potential promoters and enhancers, DNA methylation
• User upload, custom data
Figure adapted from the ENCODE project, www.nature.com/nature/focus/encode/
29 May 2014

This talk …
Genome Sequencing and Browsers
Ensembl Data
  Genes
  Variation
  Comparative Genomics
  Regulation
Access

Challenge: the number of gene/protein sequences keeps increasing
• UniProtKB/Swiss-Prot (e.g. Q8IU82): 542,258
• UniProtKB/TrEMBL: 51,616,950
• NCBI RefSeq (e.g. NP_006570): 37,371,278

Is there a consensus?
• Reaching a consensus coding sequence (CCDS) set for human and mouse.
• Human: 29,045 CCDS IDs - 18,683 Ensembl Gene IDs (e74)
• Mouse: 23,093 CCDS IDs - 19,988 Ensembl Gene IDs (e72)

The GENCODE set
www.gencodegenes.org
• Human and mouse
• Not just the CDS (coding sequence)
• Ensembl contributes genes to GENCODE
• Havana contributes genes to GENCODE
• GENCODE is used by ENCODE, 1000 Genomes, and other projects.

Ensembl Variation
Aims:
• Collect, integrate and annotate all known variants
• Provide tools for comparison to other genomic data
• Provide a framework for access and to improve understanding

Practical applications of variation
Molecular and clinical medicine
• Diagnosis, detection and treatment, e.g. myotonic dystrophy, fragile X syndrome, inherited colon cancer, familial breast cancer
• Pharmacogenomics ("custom drugs")
DNA forensics
• Identification of suspects
• Catastrophe victims
• Endangered species
Agriculture, livestock breeding
• Disease-, insect-, and drought-resistant crops
• Healthier, disease-resistant animals
• Marker-assisted breeding
• More nutritious produce
• Reducing the costs of agriculture
Anthropology, evolution, and human migration

Variation Sources
• dbSNP (1000 Genomes, ClinVar, etc.)
• ESP (Exome Sequencing Project)
• UniProt
• COSMIC
• HGMD_Public
• NHGRI-GWAS
• & more …
www.ensembl.org/info/genome/variation/sources_documentation

Variation in the Browser

Ensembl Variant Effect Predictor
New interface!
• Uses an Ensembl gene set to annotate:
  • SNPs
  • Indels
  • Variants in regulatory regions
  • Structural variants
Figure labels: REST API, Web interface, Perl script, XML
• Publication: McLaren et al. 2010 (Bioinformatics)

Ensembl
[Figure: circular tree of the species in Ensembl. Image obtained using Dendroscope (D.H. Huson and C. Scornavacca, Dendroscope 3: An interactive tool for rooted phylogenetic trees and networks, Systematic Biology, 2012)]
• Whole Genome Alignments
• Gene Trees
• Homologues
• Protein Families

Comparative Genomics
Whole genome alignments
Clade labels on the tree: Great apes, Old world monkeys, Primates, Rodents, Glires, Boroeutherian, Laurasiatheria

Homo sapiens             ACGT-GG--CCAGCGCGGGCTTGTGGCGCGAGCTTCTGAAAC-TAGGCG-GCAGAGGCGGAGC--CGCTG-TGGC---ACTGCTGCGCCTCTG-CTGCGCCTCGGGTGTCTTTTGCGGCG
Ancestral sequences      ACGT-GG--CCAGCGCGGGCTTGTGGCGCGAGCTTCTGAAAC-TAGGCG-GCAGAGGCGGAGC--CGCTG-TGGC---TCTGCTGCGCCTCTG-CTGCGCCTCGGGTGTCTTTTGCGGCG
Pan troglodytes          ACGT-GG--CCAGCGCGGGCTTGTGGCGCGAGCTTCTGAAAC-TAGGCG-GCAGAAGCGGAGC--CGCTG-TGGC---TCTGCTGCGCCACTG-CTGCGCCTCGGGTGTCTTTTGCGGCG
Ancestral sequences      ACGT-GG--CCAGCGCGGGCTTGTGGCGCGAGCTTCTGAAAC-TAGGCG-GCAGAGGCGGAGC--CGCTG-TGGC---TCTGCTGCGCCTCTG-CTGCGCCTCGGGTCTCTTTTGCGGCG
Gorilla gorilla gorilla  ........................................................................................................................
Ancestral sequences      ........................................................................................................................
Pongo abelii             ACGT-GG--CCAGCGCGGGCTTGTGGCGCGAGCTTCTGAAAC-TAGGCG-GCAGAGGCGGAGC--CGCTG-TGGC---TGTGCTGCACCTGTG-CTGCGCCTCGGGTCTCTTTTGCGGCG
Ancestral sequences      ACGT-GG--CCAGCGCGGGCTTGTGGCGCGAGCTTCTGAAAT-TAGGCG-GCAGAGGCGGAGC--TGCTG-TGGC--------------TCTG-CTGCGCCTCGGGTCTCTTTTGCGGCG
Macaca mulatta           ACGT-GG--CCAGCGCGGGCTTGTGGCGCGAGCTTCTGAAAT-CAGGCG-GCAGAGGTGGAAC--TGCTGCTGGC--------------TCTG-CTGCGCCTCGGGTCTCTTTTGCGGCG
Ancestral sequences      ACGT-GG--CCAGCGCGGGCTTGTGGCGCGAGCGTCTGAAAT-GAGGCG-GCAGAGGCGGAGC--TGCTG-TGGC--------------TCTG-CCGCGCCTCGGGTCTTTTCTGCGGCG
Callithrix jacchus       ACGT-GG--TCAGCGCGGGCTTGTGGCGCGAGCGTCTGAAAT-GAGGCG-GCAGAGGCGGACC--TGCTG-TGTC--------------TCTG-CCGCGCCTCCGGTCTTTTCTGCGACG
Ancestral sequences      ACGT-GC--CGAGAGCGGGCTTTTGGCGCGAGCGTCTGAAAT-AAGGCG-GCGGAGGCGGAGC--TGCTG-CGGCT------------------CCGCGTCTCGGGTCTTTTCTGCGGCA
Mus musculus             ACGG-GC--AGAGCGCGGGCTTTTCGCGGGAGCGGGAGCCGT-G----------AGGCGTTGCCGTCAGT-CAGCT-----------------ACCGCTGC-------------------
Ancestral sequences      ACGG-GC--AGAGCGCGGGCTTTTCGCGGGAGCGTGAGAAGT-G----------AGGCGGTGCCGTCCGT-CAGCT-----------------ACCGCAAC-------------------
Rattus norvegicus        ACGGCGC--AGAGCGCGGGCTTTTCGCAGGAGCGTGAGAAGT-G----------AGGCGGCGCCGTCCGT-CAGCG-----------------GCCGCAAC-------------------
Ancestral sequences      ACGT-GC--CGAGAGCGGGCTTTTGGCGCGAGCGTCTGAAAT-AAGGCG-GCGGAGGCGGAGC--TCCTT-CAGCT------------------CCGCGTCTCGGGTCTTTTCTGCGGCA
Oryctolagus cuniculus    ACGT-GC--CCAGAGCGGGCTTTTGGCGCGAGCGTCTGAAAA-AAGGCT-ATGGAGGCGGAGC--TCCTT-CAGCT------------------CCGCGTCTGGGGTCTTGCCTAGGGCA
Ancestral sequences      ACGT-GC--CGAGAGCGGGCTTTTGGCGCGAGCGTCTGAAAT-AAGGCG-GCGGAGGCGGAGC--TGCTG-CGGCT------------------CCGCGTCTCGGGTCTTTTCTGCGGCA
Bos taurus               ACAT-ATCCCGAGAGCAGGCTTTTGGCGCGAGAATCTGAAAC-CCGGTGGGCGGAGGTGCGGC--TGCTG-AAGTTTG----------------C--TGTCTCGGGCGG-T---------
Ancestral sequences      ACGT-GCTCCGAGAGCGGGCTTTTGGCGCGAGCGTCTGAAAT-AAGGCGAGCGGAGGCGGAGC--TGCTG-GGGCTCC----------------C--TGTCTCGGGTGG-TTCTGTGGCA
Canis lupus familiaris   .......................................................................................................................
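
The aligned rows above can be compared column by column. The following is a minimal Python sketch of one way to do that, not Ensembl's own method: the function name percent_identity is hypothetical, and treating "-" as a gap and "." as "no data" is an assumption about how the alignment is displayed. The two example strings are the first 20 columns of the Homo sapiens and Mus musculus rows.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent of compared columns where two aligned rows carry the same base.

    Hypothetical helper for illustration; not part of any Ensembl API.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        # Assumption: '-' marks a gap and '.' marks missing data,
        # so a column with either symbol in either row is skipped.
        if a in "-." or b in "-.":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# First 20 columns of the Homo sapiens and Mus musculus rows.
human = "ACGT-GG--CCAGCGCGGGC"
mouse = "ACGG-GC--AGAGCGCGGGC"
print(f"{percent_identity(human, mouse):.1f}% identity over compared columns")
```

Skipping gapped columns keeps the figure comparable between rows with different amounts of missing sequence, but it also hides insertions and deletions, so the result should be read as identity over shared columns only.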
