Viikki Science Park 1999 Lars Paulin

New DNA sequencing technologies

DNA Sequencing and Genomics

Institute of Biotechnology, University of Helsinki

http://www.biocenter.helsinki.fi/bi/DNA/

Lars Paulin, Institute of Biotechnology, University of Helsinki

DNA Sequencing Laboratory

Cultivator 2, Viikinkaari 4

1990 Started with DNA Synthesis
1991 DNA Sequencing
1994 EU Yeast Project
1999 - 2000 High-throughput pipeline
1999 - 2002 Five EST Sequencing Projects
2003 First Microbe Genome Project
– Move together with Microarray Laboratory to Cultivator 2
2006 Genome Sequencer 20

Core Facility
– Service: DNA sequencing and whole projects
– Collaborative projects, "Research hotel"
– Develop high-throughput methods
– Consulting

DNA Sequencing and Microarray Pipeline

Bacterial growth
– QPix Colony Picker
– QFill 2 Microplate filler
– 4 Multidrop, 96 and 384 well
– 4 Titramax shakers

PCR
– 1 Tetrad MJ Research [1 x 96, 3 x (2 x 48)]
– 2 Tetrad MJ Research (4 x 96)
– 1 Eppendorf Mastercycler (384)
– 1 ABI 9700 (2 x 384)
– 4 ABI 9700 (96)

Robotics
– Qiagen Biorobot 9600
– Qiagen Biorobot 8000
– Qiagen Biorobot 3000 (2 m)
– 2 Tecan Genesis RSP100, low volume pipetting
– Tecan Freedom Evo, low volume pipetting
– Biomek NXp, 96-pipettor

Sequencers
– ABI 3130 16-Capillary Sequencer
– ABI 3730 48-Capillary Sequencer
– Genome Sequencer 20

Centrifuges
– 2 Eppendorf 5810
– 1 Eppendorf 5804
– 1 Beckman

Other
– Tecan SpectraFluor Microplate Reader
– ImageMaster VDS, GE Healthcare
– 8 Servers for analysis: Phred, Staden Program, AceDB, ARB etc.
– SUN cluster
– ABI Prism 7000
– LightCycler 480, Roche
– Coulter Counter Z2
– Bioanalyzer
– LIMS, Sapphire

What can we do?

All kinds of service
– Sequencing projects
– Fragment analysis (T-RFLP)
– SNP detection
– Colony picking
– Plasmid, cosmid, fosmid, BAC purification
– PCR, PCR purification
– rDNA sequencing
– EST, SAGE sequencing
– Metagenomic sequencing
– MLST sequencing
– etc.

High-throughput
– Picking 3 500 colonies / h (Petri dishes, 22 x 22 cm plate)
– Growth ~50 x 96 / day (70 µl - 1 ml)
– PCR ~5 000 / day (50 - 100 µl)
– PCR purification 12 x 96 / day (2 x 100 µl)
– Plasmid, cosmid, fosmid, BAC purification (96 / 384 well)
– Sequencing 0.5 - 1 Mb / 24 h
– Gridding 12 membranes / day (1536 spots in duplicate, 384 controls)
– Genome Sequencer FLX

EST Projects 1999 - 2002

Barley
– 44 829 ESTs
– 13 448 clusters
– 7 062 cluster 1

Birch (wood, cold, oxidative stress, pathogen)
– 73 881 ESTs
– 20 103 clusters
– 10 554 cluster 1

Gerbera (Genome Res 2005, 15, 475-86)
– 16 994 ESTs
– 11 266 clusters
– 8 817 cluster 1

Spruce (Plant Mol Biol 2007, 65, 311-328)
– 8 432 ESTs
– 5 817 clusters
– 4 343 cluster 1

Poplar (Genome Biol 2005, 6, R101)
– 13 845 ESTs
– 8 043 clusters
– 5 860 cluster 1

Total
– 157 981 ESTs in databases
– ~260 000 sequencing reactions

Leuconostoc gasicomitatum genome project
Paulin, Björkroth and Auvinen

First described by Johanna Björkroth – Appl Environ Microbiol. 2000, 66, 3764-72.

Food spoiling
– can grow at +6 °C
– marinated poultry products
– fish and meat

Leuconostoc gasicomitatum genome
– 1 953 872 bp
– 1 923 genes

Björkroth, J.

Lactobacillus rhamnosus GG

Widely used probiotic

Biggest Lactobacillus genome so far
– 3 009 877 bp
– 3 120 ORFs: 2 129 ORFs with assigned functions, 991 ORFs without assigned functions
– 5 rRNA operons
– G+C % 46.69
– Annotation done at ERGO http://ergo.integratedgenomics.com/ERGO/
– Agilent microarrays done

– The sequence announced at the conference: SSA GutImpact 3rd Platform meeting on Foods for Intestinal Health 29.- 31.8.2007, Haikko Manor & Spa, Finland

Genome Projects

L. gasicomitatum, 1.9 Mb
– Shotgun, fosmids
– Status: finished, annotated

C. pneumoniae, 1.3 Mb
– PCR products
– Status: almost finished

L. rhamnosus, 3 Mb
– Shotgun, fosmids, 454
– 2 strains, GG and Lc
– Status: 2 finished, 1 annotated

L. oligofermentans, 1.8 Mb
– Shotgun, 454
– Status: closing phase

C. botulinum, 3.8 Mb
– Shotgun, 454
– Status: closing phase

H. bizzozeronii, 1.7 Mb
– 454, plasmids
– Status: closing phase

H. salomonis, 1.7 Mb
– 454, plasmids
– Status: closing phase

E. carotovora, 5 Mb
– 454, fosmids
– Status: closing phase

S. islandicus, 2.7 Mb
– 454, fosmids
– Status: closing phase

A. brierleyi, 2.7 Mb
– 454, fosmids
– Status: closing phase

Short History of DNA Sequencing

1977 Maxam-Gilbert, Sanger
1986 First Automated DNA Sequencer, ABI 370 (373)
1988 Pharmacia ALF
1995 ABI 377, up to 96 lanes
1996 First Capillary DNA Sequencer, ABI 310
1998 First 96 Capillary instruments: MegaBace, ABI 3700
2000 ABI 3100, 16 Capillary
2002 ABI 3730, 48 or 96 Capillary
2005 Genome Sequencer GS20
2006 Solexa (Illumina)
2007 SOLiD

Sanger DNA Sequencing

1. Template
– ssDNA or dsDNA (ssDNA or denatured plasmid)
2. Primer annealing
– Sequencing primer
3. Elongation
– DNA polymerase
Steps 2 and 3 can be done repeatedly => cycle sequencing
4. Electrophoresis (gel electrophoresis and autoradiography)

Figure (labels translated from Finnish)

Template: 3' AACGGTACACG 5'

Sequencing reactions
A: dATP+ddATP, dCTP, dGTP, dTTP
C: dATP, dCTP+ddCTP, dGTP, dTTP
G: dATP, dCTP, dGTP+ddGTP, dTTP
T: dATP, dCTP, dGTP, dTTP+ddTTP

Terminated products
A: TTGCCddA
C: TTGddC, TTGCddC, TTGCCATGTGddC
G: TTddG, TTGCCATddG, TTGCCATGTddG
T: ddT, TddT, TTGCCAddT, TTGCCATGddT

Gel, read from top to bottom: 3' C G T G T A C C G T T 5'
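The chain-termination logic of the figure can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the primer is ignored, every possible termination product is assumed to be present, and the function names are invented for this example. The template is the one shown on the slide.

```python
# Toy model of Sanger chain termination (illustrative only; names are hypothetical).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def synthesized_strand(template_3to5):
    """Strand made by the polymerase, written 5'->3', for a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def terminated_fragments(strand):
    """Group every chain-terminated product by the ddNTP that ended it."""
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for end in range(1, len(strand) + 1):
        fragment = strand[:end]
        # The last base of each fragment is the incorporated dideoxynucleotide.
        lanes[fragment[-1]].append(fragment)
    return lanes


def read_gel(lanes):
    """Read the sequence 5'->3' by ordering fragments from shortest to longest."""
    by_length = sorted((len(f), base) for base, frags in lanes.items() for f in frags)
    return "".join(base for _, base in by_length)


template = "AACGGTACACG"          # written 3'->5', as on the slide
strand = synthesized_strand(template)
lanes = terminated_fragments(strand)
print(strand)                     # synthesized strand, 5'->3'
print(lanes)                      # products per ddNTP reaction
print(read_gel(lanes))            # sequence recovered from fragment lengths
```

Sorting by fragment length plays the role of electrophoresis here: the shortest product runs furthest, so reading from the bottom of the gel upward gives the synthesized strand 5' to 3'.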

[Figure: structures of deoxy-TTP and dideoxy-TTP]

Incorporating Labels

Labelled primers
• 1 or 4 labels

Labelled deoxynucleotides
• 1 label

Labelled dideoxynucleotides
• 1 or 4 labels
• BigDye, ET terminators

[Figure labels, translated from Finnish: deoxynucleotide, primer, template, dideoxynucleotide, synthesized strand]

Automated DNA Sequencing

Single-dye systems / 4-dye systems
Slab-gel systems / capillary systems

[Figure: electrophoresis -> data collection -> raw data -> processing -> processed data, with systems arranged from low capacity to high capacity]

Strategies for Genome Sequencing

Shotgun approach
– random sequencing of different sized libraries
– assembly using different software
– closing of gaps using different methods

Libraries
– usually made by random shearing of genomic DNA
– 2 kb, 4-6 kb, 10 kb plasmid libraries
– fosmid or cosmid libraries with 30 - 50 kb inserts

Whole Genome

[Figure: whole genome (~3 Mb) -> sheared DNA (~2 kb) -> sequencing templates -> random reads from both ends]

Shotgun Sequencing: Assembly

[Figure labels: Contig 1, Contig 2, consensus sequence, low base quality, single-stranded region, sequence gap, mis-assembly (inverted)]

• 0.5 - 1.0 X (2 reads/kb) - 'Skimming'
• 3.5 - 4.0 X (~9 reads/kb) - 'half-shotgun'
• 6.5 - 8.0 X (~18 reads/kb) - 'pre-finished'
• 10 X (22-24 reads/kb) - 'deep shotgun'
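The relation between reads per kb and fold coverage can be checked with a small calculation. The sketch below assumes an average usable read length of about 430 bp, which is not stated on the slide and is chosen only because it roughly reproduces the listed coverage levels. The uncovered fraction uses the idealized Poisson (Lander-Waterman) model, in which a genome sequenced at coverage c leaves a fraction e^(-c) of bases unsampled.

```python
import math

# Assumed average usable read length in bp (not given on the slide).
ASSUMED_READ_LENGTH_BP = 430


def coverage(reads_per_kb, read_length_bp=ASSUMED_READ_LENGTH_BP):
    """Fold coverage = sequenced bases per kb of genome / 1000."""
    return reads_per_kb * read_length_bp / 1000.0


def expected_uncovered_fraction(fold_coverage):
    """Poisson estimate of the fraction of bases not covered by any read."""
    return math.exp(-fold_coverage)


for label, reads_per_kb in [("skimming", 2), ("half-shotgun", 9),
                            ("pre-finished", 18), ("deep shotgun", 23)]:
    c = coverage(reads_per_kb)
    print(f"{label}: {c:.1f} X, ~{expected_uncovered_fraction(c):.4%} uncovered")
```

Real projects deviate from the Poisson estimate because of cloning bias and repeats, which is one reason a separate gap-closing phase is needed.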

Phred, Phrap and Staden Package Program

Phred and Phrap
University of Washington
Phil Green, http://www.phrap.org/

Phred quality score:
QV = -10 * log10(Pe)
where Pe is the probability that the base call is an error.

Phred score   Pe             Accuracy of the base call
10            1 in 10        90%
20            1 in 100       99%
30            1 in 1,000     99.9%
40            1 in 10,000    99.99%
50            1 in 100,000   99.999%

Staden Program
Cambridge, Sanger Center
Roger Staden, http://staden.sourceforge.net/

Trace editing
Phrap assembly and Gap4 editing
– display of traces from sequencers
– translations, orfs, RE etc.
– good capacity
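The Phred formula can be applied directly in code. The following minimal sketch converts between quality values and error probabilities and reproduces the table; the function names are chosen for this example and are not part of Phred itself.

```python
import math


def phred_to_error(qv):
    """Error probability Pe for a Phred quality value QV."""
    return 10 ** (-qv / 10)


def error_to_phred(pe):
    """Phred quality value QV = -10 * log10(Pe)."""
    return -10 * math.log10(pe)


for qv in (10, 20, 30, 40, 50):
    pe = phred_to_error(qv)
    print(f"QV {qv}: 1 error in {round(1 / pe):,} bases, accuracy {1 - pe:.5%}")
```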

New DNA Sequencing Technology

Parallel Sequencing Technology
– Massive throughput
– Fast sequencing
– No cloning step
– PCR

Currently three systems ready
– Genome Sequencer (http://www.454.com/, http://www.roche.com): 454 Life Sciences, Roche. Launched in October 2005
– Solexa (http://www.illumina.com): Illumina. Launched 2006
– SOLiD (http://www.appliedbiosystems.com): Launched in October 2007

Genome Sequencer (http://www.454.com/, http://www.roche.com)

Genome Sequencer GS20; FLX
– Manufacturer: 454 Life Sciences
– Marketing: Roche

Parallel Sequencing
– Shotgun sequencing
  No plasmid libraries
  Linkers ligated to fragments
  Emulsion PCR
  Picotiter plate, 1 600 000 wells
– Pyrosequencing (Nyren, P. et al. Anal Biochem. 1993, 208, 171-5)
  Detection with sensitive CCD camera
  Run time ca. 4.5 h (GS20); 7.5 h (FLX)
  Read length 100 - 120 bp (GS20); 250 - 300 bp (FLX)
  Raw sequence ca. 25 - 35 Mb/run (GS20); 80 - 100 Mb/run (FLX)

Genome Sequencer GS20/FLX

Library preparation

Emulsion PCR

PicoTiterPlate (PTP)

Pyrosequencing

[Figure labels: Adaptor, Taq, TCAG -- CTGA]

Genome Sequencer GS20/FLX

Flowgram
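In a flowgram, each nucleotide flow yields a light signal roughly proportional to the number of bases incorporated in that flow, so a homopolymer run appears as one taller peak. A minimal decoding sketch follows. It assumes a cyclic T, A, C, G flow order and simple rounding of the signals; the signal values are invented for illustration and the real base caller is considerably more involved. The first four bases decode to TCAG, the sequence shown in the slide figure.

```python
def decode_flowgram(signals, flow_order="TACG"):
    """Turn per-flow signal intensities into a base sequence.

    Each signal is rounded to the nearest whole number of incorporated bases;
    the nucleotide for flow i is flow_order[i % len(flow_order)].
    """
    bases = []
    for i, signal in enumerate(signals):
        count = int(signal + 0.5)  # round half up; signals are non-negative
        bases.append(flow_order[i % len(flow_order)] * count)
    return "".join(bases)


# Invented example signals: two flow cycles for TCAG, then TT and GGG homopolymers.
signals = [1.02, 0.03, 0.98, 0.05,
           0.04, 1.10, 0.02, 0.95,
           2.10, 0.00, 0.10, 2.90]
print(decode_flowgram(signals))
```

Because the signal must be rounded to an integer, long homopolymers are the typical source of errors in this kind of read.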

Amplicon sequencing

Paired-end Sequencing

Illumina/Solexa Genome Analyzer (http://www.illumina.com)

Clonal Single Molecule Array technology
– Sequencing-by-synthesis technology
– Reversible terminator-based sequencing, removable fluorescence
– Flow cell with > 10 million clusters / cm2, each cluster ~1,000 copies of template
– 1-8 samples / run
– 3 laser system (660, 635, and 532 nm)
– Read length 35 - 50 bp, 1 - 2 Gb / run
– Run time 3 - 6 days

[Figure labels: Cluster Station, Flow cell]

Illumina/Solexa

Sample preparation
– 100 ng - 1 µg
– Attaching to flow cell
– Bridging
– PCR: elongation, denaturation, clonal amplification

Illumina/Solexa sequencing

Sequencing
– First bases
– Fluorescent reversible terminators
– Detection with laser and CCD camera

Sequencing
– Second bases detected after removal of label and blocking

SOLiD, Applied Biosystems (http://www.appliedbiosystems.com)

Sequencing by Ligation (Shendure, J. et al. Science 2005, 309, 1728-1732)
– emPCR: small beads, 1 µm
– Attaching to glass slides
– Labelled probes: four colours, 2 base encoding system
– Repeated ligation steps
– Detection with 4 Mpixel camera
– Read length 25-30 bp
– 1-2 slides / run
– 1-2 Gb / run
– Run time 5-10 days
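The run figures quoted for the three platforms can be put on a common scale by converting them to raw megabases per instrument day. The sketch below uses only the per-run yields and run times from the slides, takes 1 Gb as 1000 Mb, and ignores library preparation and instrument turnaround, so it is a rough upper bound rather than a benchmark.

```python
# (min, max) raw yield in Mb per run and (min, max) run time in hours, from the slides.
PLATFORMS = {
    "Genome Sequencer FLX": ((80, 100), (7.5, 7.5)),
    "Illumina/Solexa": ((1000, 2000), (3 * 24, 6 * 24)),
    "SOLiD": ((1000, 2000), (5 * 24, 10 * 24)),
}


def mb_per_day(mb_range, hours_range):
    """Return (low, high) raw Mb per 24 h of instrument time."""
    low = mb_range[0] * 24 / hours_range[1]   # least data, longest run
    high = mb_range[1] * 24 / hours_range[0]  # most data, shortest run
    return low, high


for name, (mb, hours) in PLATFORMS.items():
    low, high = mb_per_day(mb, hours)
    print(f"{name}: {low:.0f} - {high:.0f} Mb/day")
```

Per day of instrument time the three systems come out within the same order of magnitude; the main differences are in read length and in yield per run.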

SOLiD

Library preparation

SOLiD Probes
– 1 024 Octamer Probes
– 4 Dyes
– 4 dinucleotides
– 256 probes / dye

[Figure label: Cleavage site]

N = degenerate bases

Z = universal base
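The probe counts and the 2 base encoding can be illustrated with a short sketch. It assumes the commonly described scheme in which each colour reports a dinucleotide and equals the XOR of the two bases coded as A=0, C=1, G=2, T=3, and it assumes probes with 2 interrogated bases, 3 degenerate (N) and 3 universal (Z) positions. Colours are numbered 0 to 3 because the slide does not name the dyes.

```python
from itertools import product

BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_colours(sequence):
    """One colour per adjacent base pair (assumed XOR scheme)."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(sequence, sequence[1:])]


def decode_colours(first_base, colours):
    """Rebuild the bases from a known first base and the colour calls."""
    bases = [first_base]
    for colour in colours:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ colour])
    return "".join(bases)


def dinucleotides_per_colour():
    """Group the 16 dinucleotides by the colour that encodes them."""
    groups = {}
    for a, b in product("ACGT", repeat=2):
        groups.setdefault(BASE_TO_BITS[a] ^ BASE_TO_BITS[b], []).append(a + b)
    return groups


n_probes = 4 ** (2 + 3)  # 2 interrogated + 3 degenerate positions (assumption)
print(n_probes, n_probes // 4)              # probes in total, probes per dye
print(dinucleotides_per_colour())           # 4 dinucleotides per colour
print(encode_colours("ATGGC"))
print(decode_colours("A", encode_colours("ATGGC")))
```

Decoding needs a known first base, and a single wrong colour call changes every base after it; comparing reads in colour space is what allows such errors to be told apart from true variants.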


Applications

Whole genome sequencing
– de novo sequencing: Genome Sequencer FLX
– Comparative sequencing: All three systems

Transcriptome sequencing
– cDNA: All three systems
– Small RNA: All three systems

ChIP sequencing
– Genome Sequencer FLX
– All three systems

Amplicon sequencing
– Mutations / SNP
– All three systems

Methylation sequencing
– All three systems

Other technologies

Helicos (www.helicosbio.com)
– Sequencing-by-synthesis
– No PCR amplification
– 25-90 Mb/h

VisiGen (www.visigenbio.com)
– Real-time detection of DNA synthesis, FRET
– Intact DNA fragments
– 1 Mb/sec/machine

http://genomics.xprize.org/genomics

$10M to the First Team to Sequence 100 Human Genomes in 10 Days

Registered Teams
– 454 Life Sciences (Roche) (www.454.com)
– VisiGen (www.visigenbio.com)
– FfAME (www.ffame.org)
– Reveo (www.reveo.com)
– Base4innovation (www.base4innovation.co.uk/)
– Personal Genome X-Team (PGx), George Church

Assemblers

Phrap, Phil Green: University of Washington
TIGR assembler: TIGR
Celera: Celera Corporation
Euler: University of California
Arachne: Broad Institute
CAP3, PCAP: Iowa State University
gsAssembler: Roche, 454 Life Sciences
Amos: University of Maryland
SHARCGS: Genome Res, 2007, 17, 1697
SSAKE: Bioinformatics, 2007, 23, 500
SHRAP: PLoS ONE, 2007, 2:e484
New Euler: Genome Res, 2007, Dec 14
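To give a feel for what an assembler does, here is a toy greedy overlap-merge sketch. It is not the algorithm of any program listed above: it ignores sequencing errors, reverse complements, quality values and repeats, and the example genome and reads are invented. It repeatedly merges the pair of sequences with the longest exact suffix-prefix overlap.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def drop_contained(seqs):
    """Remove duplicates and sequences fully contained in another sequence."""
    unique = list(dict.fromkeys(seqs))
    return [s for s in unique if not any(s != t and s in t for t in unique)]


def greedy_assemble(reads, min_len=5):
    """Merge the best-overlapping pair until no overlap of min_len remains."""
    contigs = drop_contained(reads)
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        rest = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs = drop_contained(rest + [merged])


genome = "GATTACAGGCTCATGCCAAGTT"                      # invented example sequence
reads = [genome[i:i + 10] for i in (8, 0, 12, 4)]      # error-free 10 bp reads
print(greedy_assemble(reads))
```

With repeats longer than the minimum overlap, this greedy strategy can join the wrong reads, which is the kind of mis-assembly the finishing step has to detect.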
