
Introduction to

Swiss Institute of Bioinformatics Institut Suisse de Bioinformatique LF-2004.10

SIB and EMBnet Bioinformatics resources for biomedical scientists

The Swiss Institute of Bioinformatics

- Founded in March 1998
- Collaborative structure: Lausanne - Geneva - Basel
- Groups at ISREC, Ludwig Institute, Unil, HUG, UniGe, recently UniBas and soon EPFL
- Several roles: teaching, services, research
- Currently: ~160 employees


Projects at SIB

- Databases
  - SWISS-PROT, PROSITE, EPD, World-2DPAGE, SWISS-MODEL
  - TrEST, TrGEN (predicted proteins), tromer (transcriptome)
- Software
  - Melanie, Deep View, proteomic tools, ESTScan, pftools, Java applets
- Services
  - Web servers: ExPASy, EMBnet, MyHits
  - Teaching and helpdesk
- Research
  - Mostly sequence and expression analysis, 3D structure, and proteomics


Teaching

- Master's degrees in Bioinformatics (Bologna type): 90 ECTS credits at UniGe, Unil and UniBas
- EMBnet courses: 4 x 1 week per year in Lausanne, Basel and Zürich
- Undergraduate courses at the Universities of Geneva, Fribourg and Lausanne
- Other courses at CHUV and EPFL
- Courses in other countries: Colombia, Cambodia, Peru, …

Research

- New algorithms (faster alignments…)
- New technology (GRID or cluster computing)
- New tools (protein analysis, microarrays, confocal microscopy)
- New databases (microarrays, transcriptome, proteome)

- Collaborations with lab researchers!


Three levels of services

- Simple web access to software and databases
  - Easy to use for basic, occasional research with few sequences
  - Potentially insecure
- Command-line access with a local Unix account
  - More powerful (automation) and secure
  - Requires an understanding of Unix and frequent practice
- Collaboration with SIB
  - Access to experts in the field (help desk)
  - For projects requiring heavy programming or special hardware resources

- Help desk
  - [email protected] or http://www.expasy.org/contact.html

SIB’s important sites

- Home
  - www.isb-sib.ch
- ExPASy - Expert Protein Analysis System
  - www..org
- MyHits database and tools
  - myhits.isb-sib.ch
- EMBnet Switzerland
  - www.ch.embnet.org
- Geneva Bioinformatics
  - www.genebio.ch


SIB home

Expert Protein Analysis System


MyHits http://myhits.isb-sib.ch

Swiss node http://www.ch.embnet.org


EMBnet organisation

- European at its creation in 1988, now spread worldwide
- 32 country nodes, 8 special nodes
- Role
  - Training, education (EMBER)
  - Software development (EMBOSS, SRS)
  - Computing resources (databases, websites, services)
  - Helpdesk and technical support
  - Publications (EMBnet.news, Briefings in Bioinformatics)
- Access: www.embnet.org
  - Each node is at “www.xx.embnet.org”, where xx is the country code (e.g., ch for Switzerland)

EMBnet home


European Molecular Biology Open Software Suite (EMBOSS)

- Free and open source (for most Unix platforms)
- GCG successor (compatible with the GCG file format)
- More than 150 programs (ver. 2.9.0)
- Easy to install locally
  - but no interface; requires local databases
  - Unix command line only
- Interfaces
  - Jemboss, wEMBOSS, www2gcg, w2h… (with account)
  - Pise, EMBOSS-GUI, SRSWWW (no account)
  - Staden, Kaptain, CoLiMate, Jemboss (local)
- Access: www..org or emboss.sourceforge.net

Other important sites

- ExPASy - Expert Protein Analysis System
  - www.expasy.org
- EBI - European Bioinformatics Institute
  - www.ebi.ac.uk
- NCBI - National Center for Biotechnology Information
  - www.ncbi.nlm.nih.gov
- Sanger - The Sanger Institute
  - www.sanger.ac.uk


Bioinformatics: definition

- Every application of computer science to biology
  - Sequence analysis, image analysis, sample management, population modelling, …
- Analysis of data coming from large-scale biological projects
  - Genomes, transcriptomes, proteomes, metabolomes, etc.

The new biology

- Traditional biology
  - Small team working on a specialized topic
  - Well-defined experiment to answer precise questions
- New «high-throughput» biology
  - Large international teams using cutting-edge technology that defines the project
  - Results are released raw to the scientific community, without any underlying hypothesis


Examples of «high-throughput»

- Complete genomes
- Large-scale sampling of the transcriptome (EST)
- Simultaneous expression analysis of thousands of genes (DNA microarrays, SAGE)
- Large-scale sampling of the proteome
- Protein-protein interaction analysis by large-scale two-hybrid (yeast, worm)
- Large-scale 3D structure production (yeast)
- Metabolism modelling
- Simulations
- Biodiversity

Role of bioinformatics

- Control and management of the data
- Analysis of primary data, e.g.
  - Base calling from chromatograms
  - Mass spectra analysis
  - DNA microarray image analysis
- Statistics
- Database storage and access
- Results analysis in a biological context


First information: a sequence?

- Nucleotide
  - RNA (or cDNA)
  - Genomic (intron-exon)
  - Complete or incomplete?
  - mRNA with 5’ and 3’ UTR regions
  - Entire chromosome
- Protein
  - Pre/Pro or functional protein?
  - Function prediction
  - Post-translational modifications?
  - Holy Grail: 3D structure?

Genomes in numbers

            Size (nt)          Gene number
virus       10^3 to 10^5       3 to 100
bacteria    10^5 to 10^7       ~1000
yeast       1.35 x 10^7        ~7000
mammals     10^8 to 10^10      ~30’000
plants      10^10 to 10^11     30’000-50’000?


Sequencing projects

- «Small» genomes (<10^7 nt): bacteria, viruses
  - Many already sequenced (industry excluded)
  - More than 150 microbial genomes already in the public domain
  - More to come! (one new every two weeks…)
- «Large» genomes (10^7-10^10 nt): eukaryotes
  - >30 finished (S. cerevisiae, S. pombe, E. cuniculi, G. theta, C. elegans, D. melanogaster, A. gambiae, P. falciparum, P. yoelii, D. rerio, F. rubripes, A. thaliana, O. sativa (2x), M. musculus, Homo sapiens, P. troglodytes, R. norvegicus, C. familiaris, G. gallus…)
  - Many more to come: cat, elephant, pig, cow, maize (and other plants), insects, fishes, many pathogenic parasites (Leishmania…)
- EST sequencing
  - Partial mRNA sequences
  - ~20 x 10^6 sequences in the public domain

Human genome

- Size: 3 x 10^9 nt for a haploid genome
- Highly repetitive sequences 25%, moderately repetitive sequences 25-30%
- Size of a gene: from 900 to >2’000’000 bases (introns included)
- Proportion of the genome coding for proteins: 5-7%
- Number of chromosomes: 22 autosomes, 1 sex chromosome
- Size of a chromosome: 5 x 10^7 to 5 x 10^8 bases

[Figure: chromosome diagram labelled centromere, telomere, exons of a gene, locus control region, regulatory elements, repetitive sequences]
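The figures on this slide can be cross-checked with a little arithmetic. The following is a minimal Python sketch using only the numbers quoted above; the variable and function names are illustrative and not part of the lecture.

```python
# Figures quoted on the slide (haploid human genome).
GENOME_NT = 3_000_000_000          # 3 x 10^9 nt
CODING_PERCENT = (5, 7)            # 5-7% of the genome codes for proteins
HAPLOID_CHROMOSOMES = 22 + 1       # 22 autosomes + 1 sex chromosome


def coding_nucleotides(genome_nt, percent):
    """Number of nucleotides covered by a given percentage of the genome."""
    return genome_nt * percent // 100


def mean_chromosome_size(genome_nt, chromosomes):
    """Average chromosome length if the genome were split evenly."""
    return genome_nt / chromosomes


low, high = (coding_nucleotides(GENOME_NT, p) for p in CODING_PERCENT)
mean_size = mean_chromosome_size(GENOME_NT, HAPLOID_CHROMOSOMES)

print(f"coding sequence: {low:.2e} to {high:.2e} nt")
print(f"mean chromosome size: {mean_size:.2e} nt")
```

The mean chromosome size obtained this way (about 1.3 x 10^8 nt) falls inside the 5 x 10^7 to 5 x 10^8 range given on the slide, so the quoted numbers are mutually consistent.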

How to sequence the human genome?

- «International» consortium approach:
  - Generate genetic maps (meiotic recombination) and pseudogenetic maps (chromosome hybrids) for indicator sequences
  - Generate a physical map based on large clones (BAC or PAC)
  - Sequence enough large clones to cover the genome
- «Commercial» approach (Celera):
  - Generate random libraries of fixed-length genomic clones (2 kb and 10 kb)
  - Sequence both ends of enough clones to obtain 10x coverage
  - Use computer techniques to reconstitute the chromosomal sequences; check against the public project's physical map
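To give a feel for what "10x coverage" implies, here is a rough Python sketch. It assumes a read length of 500 nt, which is a hypothetical value not given in the lecture, and it uses the idealized Lander-Waterman approximation (reads placed uniformly at random) for the fraction of the genome left unsequenced. Real projects deviate from this model because of repeats and cloning bias.

```python
import math


def reads_for_coverage(genome_nt, read_nt, coverage):
    """Reads needed so that (reads x read length) / genome size = coverage."""
    return math.ceil(coverage * genome_nt / read_nt)


def expected_uncovered_fraction(coverage):
    """Idealized Lander-Waterman estimate: P(base not covered) = e^(-coverage)."""
    return math.exp(-coverage)


GENOME_NT = 3_000_000_000   # haploid human genome, 3 x 10^9 nt
READ_NT = 500               # ASSUMPTION: typical read length, not from the slides
COVERAGE = 10               # 10x coverage, as on the slide

n_reads = reads_for_coverage(GENOME_NT, READ_NT, COVERAGE)
n_clones = n_reads // 2     # both ends of each clone are sequenced
gap_fraction = expected_uncovered_fraction(COVERAGE)

print(f"reads needed:  {n_reads:.1e}")
print(f"clones needed: {n_clones:.1e}")
print(f"expected uncovered fraction: {gap_fraction:.1e}")
```

Under these assumptions about 6 x 10^7 reads are needed, and even at 10x a small fraction of bases is expected to remain uncovered, which is one reason the assembly is checked against an independent physical map.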

Interpretation of the human draft

- All chromosomes considered as finished
- Even a genomic sequence does not tell you where the genes are encoded. The genome is far from being «decoded»

- One must combine genome and transcriptome to have a better idea

[Figure label: Last freeze NCBI34, July 2003]


The transcriptome

- The set of all functional RNAs (tRNA, rRNA, mRNA, etc.) that can potentially be transcribed from the genome
- The documentation of the localization (cell type) and conditions under which these RNAs are expressed
- The documentation of the biological function(s) of each RNA species

Public draft transcriptome

- Information about the expression specificity and the function of mRNAs
  - «Full» cDNA sequences of known function
  - «Full» cDNA sequences (HTC), but «anonymous» (e.g. KIAA or DKFZ collections)
  - EST sequences
    - cDNA libraries derived from many different tissues
    - Rapid random sequencing of the ends of all clones
  - ORESTES sequences
- Growing set of expression data (microarrays, SAGE, etc.)
- Increasing evidence for multiple alternative splicing and polyadenylation


Example mapping of ESTs and mRNAs

[Figure: mapping with tracks labelled mRNAs, ESTs and Computer prediction]

The proteome

- Set of proteins present in a particular cell type under particular conditions
- Set of proteins potentially expressed from the genome
- Information about the specific expression and function of the proteins


Information on the proteome

- Separation of a complex mixture of proteins
  - 2D PAGE (IEF + SDS PAGE)
  - Capillary chromatography
- Individual characterisation of proteins
  - Tryptic peptide signature (MS)
  - Sequencing by chemistry or MS/MS
  - All post-translational modifications (PTMs)
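The "tryptic peptide signature" relies on trypsin cutting a protein at predictable positions. As a minimal sketch, the Python function below applies the commonly used simplified rule (cleave after K or R, except when the next residue is P); it ignores missed cleavages and other exceptions, and the example sequence is made up for illustration.

```python
def tryptic_digest(protein):
    """Split a protein sequence into peptides using a simplified trypsin rule.

    Cleaves on the C-terminal side of K or R, unless the following residue is P.
    Missed cleavages and other enzyme-specific exceptions are not modelled.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])   # C-terminal peptide without K/R
    return peptides


# Hypothetical sequence: the R at position 7 is followed by P, so it is not cut.
example = "MKWVTFRPLLKAA"
for peptide in tryptic_digest(example):
    print(peptide)
```

In a real workflow the mass of each predicted peptide would then be computed and compared with the masses measured by MS; that step is left out here.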

Three-dimensional structures

- Methods to determine structures
  - X-ray crystallography
  - NMR
- Data format
  - Atom coordinates (except H) in a Cartesian space
- Databases
  - For proteins and nucleic acids (RCSB, formerly PDB)
  - Independent databases for sugars and small organic molecules
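To illustrate what "atom coordinates in a Cartesian space" look like in practice, here is a minimal Python sketch that reads the fixed-width ATOM records of the classic PDB file format and computes a distance between two atoms. The two sample lines and their coordinate values are made up for illustration, and the parser handles only the fields shown.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    name: str
    residue: str
    chain: str
    residue_number: int
    x: float
    y: float
    z: float


def parse_atom_line(line):
    """Extract a few fields from one fixed-width PDB ATOM record."""
    return Atom(
        name=line[12:16].strip(),        # columns 13-16: atom name
        residue=line[17:20].strip(),     # columns 18-20: residue name
        chain=line[21],                  # column 22: chain identifier
        residue_number=int(line[22:26]), # columns 23-26: residue number
        x=float(line[30:38]),            # columns 31-38: x in angstroms
        y=float(line[38:46]),            # columns 39-46: y in angstroms
        z=float(line[46:54]),            # columns 47-54: z in angstroms
    )


def distance(a, b):
    """Euclidean distance between two atoms, in angstroms."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


# Illustrative records (values are invented, layout follows the PDB format).
SAMPLE = [
    "ATOM      1  N   MET A   1      27.340  24.430   2.614",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842",
]

atoms = [parse_atom_line(line) for line in SAMPLE if line.startswith("ATOM")]
print(atoms[0])
print(f"N-CA distance: {distance(atoms[0], atoms[1]):.2f} angstroms")
```

Slicing by column position rather than splitting on whitespace matters here, because adjacent fields in PDB records can touch when values are large.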


Visualisation of the structures

- Secondary structure elements
  - Alpha helices, beta sheets, other

- Software
  - Various representations (atoms, bonds, secondary…)
  - Wide choice of commercial and free software (e.g., DeepView)

Sequence information, and so what?

- How to store and organise?
  - Databases (next lecture)
- How to access, search, compare?
  - Pairwise alignments, dot plots (Tuesday)
  - BLAST searches in databases (Tuesday)
  - EST clustering (Wednesday)
  - Multiple alignments (Wednesday)
  - Patterns, PSI-BLAST, profiles and HMMs (Thursday)
  - Gene prediction (Thursday)
  - Protein function prediction (Friday)
  - User problems (Friday)


Thank you
