www.454.com

454 The Road to the Future November 13, 2012 IMPORTANT NOTICE Intended Use

Unless explicitly stated otherwise, all Roche Applied Science and 454 Life Sciences products and services referenced in this presentation/document are intended for the following use:

For life science research only. Not for use in diagnostic procedures.

454 Sequencing Read Length Evolution Raising the bar in next-gen sequencing

1,000 Up to 1,000 bp 750 GS FLX+ New!

500 400 bp GS FLX Titanium Read Length (bp)

250 250 bp GS FLX Standard 100 bp GS 20

2005 2007 2008 2011 GS FLX+ System Overview

• Latest generation of the GS FLX System • Available as new instrument or upgrade to existing instrument • New GS FLX Titanium Sequencing Kit XL+ for extra- long read sequencing. Uses existing Rapid Library Prep and emPCR kits • Backwards compatible with existing GS FLX Titanium Sequencing Kit XLR70 • New GS FLX+ Computingwww.454.com Station: 4 Off-the-shelf workstation

GS Junior System Widely published NGS benchtop platform

Ultra-Deep Sequencing Whole Microbial Transcripto Sequencing me Sequencing Genotyping

Read the complete list of peer-reviewed publications at www.454.com/publications One Fragment, One Bead, One Read

One fragment One bead One well

Shear DNA and add ‘Emulsion PCR’ Deposition of beads into wells linkers Clonal amplificationof PTP device

One read

Sequencing-by- synthesis Detection of PPi release GS FLX+ Chemistry Emulsion PCR

A) Single-stranded template mixed with Capture Beads

Before PCR B) Emulsify millions of beads in PCR reagents to form water-in-oil microreactors • Microreactor contains complete amplification mix After PCR C) Amplify

D) Break Microreactors www.454.com E) Enrich for DNA positive

454 Sequencing Systems Key Concept

Bases are flowed sequentially and in a specified cycle order (TACG)

A complementary to the template strand generates light

The light signal is recorded by the CCD camera during each flow

Signal intensity is proportional to the number of incorporated

Massively parallel sequencing of clonally amplified beads in picoliter-size wells Sequencing By Synthesis GS FLX+ Chemistry Sequencing-by-synthesis

Repeated dNTP flow sequence:T A C G Process continues until user-defined number of nucleotide flow cycles are completed.

A A T C G G C A T G C T A A A A C T G A T T A G C C G T A C G A T T T T CG GAT CA T Anneal Primer

www.454.com Data Processing GS RunProcessor

Image Processing T C • Region definition & background subtractionG • Well identification on the PTP device A T

Signal Processing • Normalization & correction steps • Quality filtering & quality read trimming • Flowgram generation & basecalling 454 Sequencing Systems Flowgram Generation & Basecalling

Flow Order T A C G TTCTGCGAA

Key sequence TCAG or GACT (library reads) Key sequence CATG or ATGC (control reads) Used for well identification and signal calibration Data Analysis Overview

• (SFF Tools) – SFF file manipulation tools • GS de novo Assembler – Genome or Transcriptome de novo assembly; Large up to human size; Hybrid assemblies – Pairwise alignments of reads resulting in consensus and corresponding quality scores of contigs, ACE file of the alignment, metrics files and contig scaffolds (when using Paired-end data) • GS Reference Mapper – Targeted mapping using Sequence Capture; Genome or Transcriptome reference mapping; Variation detection including large rearrangements – Individual reads mapped to a reference sequence resulting in consensus and corresponding quality score of contigs, ACE file of the alignment, metrics files and list of variants (difference between the sample consensus and reference) and structural rearrangements. • GS Amplicon Variant Analyzer – Ultra-deep sequencing for variant analyses; Haplotyping www.454.com – Identification and quantitation of sequence variants in an amplicon library

Why Read Length, Speed & Quality Matter Sequencing a killer E. coli bug • Deadly E. coli outbreak in Europe began in May 2011 • Scientists from various institutes and companies around the globe quickly sequenced strain isolates • Those with first access to samples had the lead withWith speed, the GS but Junior what assembly, of the experts were dataable quality to answer and biologicallyread -relevant length?questions where other assemblies failed. www.454.com E. coli O104 Outbreak Sequencing timeline

German authorities UK Health report three Protection Agency deaths and 80 cases releases de novo of haemolytic assembly of uraemic syndrome H112180280 isolate from E. coliEurope’s top University using infection health official Hospital GS3 GS Junior Junior System runs announces “a Muenster & Ion serious crisis” Torrent announce Beijing Genomics “assembly” of Institute LB22669210 PGM runs isolate releases assembly of TY2428 isolate 7 PGM + 1 HiSeq runs

May 24 June 1 June 2 June 6 June 10

www.454.com E. coli O104 Outbreak Sequencing timeline Lots of … answers! questions… Bioinforma Community embraces HPA tics assembly as Global bioinformatics community Community reference –answers biologically analyzing short-read data sets Hits a Wall meaningful “More sequence data, “Afterquestions only 5 • Lots of data but limited particularly minutes of looking context paired-end over the HPA • Highly fragmented sequencing and assembly we very assemblies ranging from longer reads than clearly saw what we have in the we had expected to >450 to >3,000 contigs current assembly see all along but • Recognized outbreak are really needed was missing in the strain genome was to resolve these previous assemblies different but couldn’t prophage regions...” – a 62 Kb phage arrange the pieces Nico Petty, insertion event in University of theHigh wrbA quality gene.” draft David Queensland Hirschberg,assembly Columbia University(13 scaffolds)

May 24 June 1 June 2 June 9 June 10

www.454.com Source: http://bacpathgenomics.wordpress.com/2011/06/09/german-e-coli-phage-analysis-by-nico-petty/ BGI Releases Ion Torrent – Illumina Assembly

LotsBGI of Analysis data but few answers Complicated post- Workflow assembly analysis (even for the experts)

Complex, manual Questions remain assembly Why is this strain so toxic? Some pieces don’t match the www.454.comprocess expected biology! (Is that real?)

Source: http://bacpathgenomics.wordpress.com/2011/06/09/german-e-coli-phage-analysis-by-nico-petty/ HPA Releases GS Junior Assembly Community454 Workflow starts answering questionsThe biology becomes clear GS Junior Sequence Data Generated

De Novo Assembly (GS Assembler)

Push-button Now the story starts to de novo assembly form… Performed by •Large insertion events healthcare scientist •Acquired multi-drug resistance at HPA – not a ~1 minute set-up •Adhesin (known virulence bioinformatician!~20 min computation factor) Answers in •Integrated into the hours chromosome!

www.454.com Source: http://bacpathgenomics.wordpress.com/2011/06/11/tn21-resistance-transposon-in- the-chromosome The Biology: Why Long Quality Reads Matter Biologically accurate insertion reconstruction WrbA 5’ Stx2 converting WrbA 3’ end end phage

Representation of Stx2 phage insertion on HPA GS Junior scaffold 1 • Stx-phages carries genes for Shiga toxin • Phage (62 kb long) is typically integrated into the wrbA gene • Both short read assemblies failed to accuratelyBlog Title: “Read reconstruct length matters: thisidentifying portion the phiStx2 of the att genomesite” “A multisequence alignment of the wrbA gene clearly reveals that the Life-Technologies AFOB00000000 assembling is not reliable at the attachment site. It contains three contigs matching the wrbA gene, two of them (AFOB01000188.1,AFOB01000143.1) representing the interrupted gene, GS Juniorwhile the assembly third contig correctly (AFOB01000030.1) contains shows an the uninterrupted interruptedgene (though wrbA with a distortedgene with reading stx2 frame).” phage inserted on a single scaffold as is expected from the biology Assembly Comparison Overview Fewer runs, more complete assembly, better biology

HPA Assembly vs. BGI Assembly 3 GS Junior runs 7 Ion Torrent + unspecified HiSeq runs Single-step assembly Complex manual expert assembly 13 scaffolds (35X fewer) 453 scaffolds 969 Kb N50 scaffold (8.5X longer) 115 Kb N50 scaffold ~50% of assembled bases in scaffolds >1 Mb 0% www.454.com Note: Without Illumina data, Ion Torrent & University of Muenster abandoned assembly

Cost Comparison Overview Fewer runs, easier analysis, lower cost

• HPA GS Junior assembly – 3 runs = ~ $3,000 – 1 instrument = $100,000 – Total sequencing run time = ~ 1 day – Assembly software = FREE – Bioinformatics expertise = molecular biologist friendly • BGI (Ion Torrent & Illumina) assembly – 7 Ion Torrent runs = ~$3,500 – Unknown HiSeq paired end runs = $20,000? – 1 Ion Torrent PGM & 1 Illumina HiSeq instrument = ~ $750,000 – Total sequencing run time = 1 week+ www.454.com– Assembly Software / bioinformatics expertise =$$??? Expensive! GS Junior Assembly Overview Delivering what the community needed

3 GS Junior Push-button Biological Runs de novo answers • Upon release, communityassembly immediately embraced HPA assembly as the reference • GS Junior assembly provided clear answers to key biological questions regarding the pathogenicity of this unique strain – Correctly identified and located drug resistance and toxin genes integration into chromosome – while other assemblies couldn’t – De novo assembled majority of plasmid content –AfterCorrected working mis with-assembled short read phage assemblies regions for in the days,short experts-read assembliesturned to the GS Junior data and within ~5 minutes found the answers

454 Sequencing Developments Improving ease of use, sequencing performance, and applications

Upfront Workflow Sequencing Read Applications & Automation Length Improvements Assays

• Ease of use • Extend GS Junior • Launch dedicated improvements to read length assays for HLA, GS Junior beyond 400 bp leukemia and workflow (shotgun & HIV/HBV/HCV amplicon) • Step-by-step • Planned CE-IVD automation • Optimize GS FLX+ approved kits for solutions for long reads for select assays library prep and amplicon seq

emPCR (breaking & • Improve enrichment) robustness and consistency of results GS GType Assays for 454 Sequencing Systems Available Now!

GS GType HLA MR & GS GType Leukemia Primer HR Primer Sets Sets (TET2/CBL/KRAS & RUNX1) • Sequence-based assays for • Sequence-based assays for high- and medium- leukemia research resolution HLA typing • For use on either the GS • For use on either the GS Junior or GS FLX System Junior or GS FLX System Expanded 454 Assay Menu Beyond 2012 – focus on virology assays

• Virology – GS VTect HIV Resistance (Planned CE-IVD in Europe) – HCV Resistance/Genotyping (Planned CE-IVD in Europe) – HBV Resistance/Genotyping (Planned CE-IVD in Europe)

• HLA – GS GType HLA Primer Sets (LSR) - Available Now! – GS GType HLA Registry (Planned CE-IVD in Europe)

Collaboration with DNA Electronics ISFET Sequencing Technology Description

• Collaboration to develop semiconductor-based sequencing system

• License DNAe’s ISFET (ion-sensitive, field effect transistor) sequencing technology

• Enables detection of pH changes

from nucleotide incorporation on semiconductor chip • Builds on current 454 Sequencing portfolio by moving from optical DNA Electronics to electrochemical detection Spun out of Imperial College London • Target applications in amplicon assays, gene panels, exome sequencing, RNA-seq, and whole

Collaboration with DNAe: Technology ISFETs enable label free pH Sensing on a Semiconductor Chip

Technology Concept Description • High-density array of Micro micro-wells on an well integrated circuit. Each Array well holds a unique DNA template bead. • Beneath each well is an Package ion-sensitive transistor. d Integra • The chip is sequentially ted flooded with one Circuit nucleotide after another. If the nucleotide that floods the chip is not a match to the DNA H+ ions template, no change will detecte be detected and no base d will be called. • If a nucleotide is incorporated into the

Collaboration with IBM Nanopore Sequencing Technology

Description

• Collaboration with IBM Research to develop nanopore-based sequencing system • Technology based on IBM DNA Transistor concept

• Leverage IBM as a world leader in microelectronics, information IBM technology and computational biology T.J. Watson Research Center • Access to world class prototype facilities for semiconductor manufacturing and nanofabrication Collaboration with IBM: Technology Concept Combining a dielectric nanopore transistor with a nucleotide sensing technology to identify DNA Technology Concept Description bases • IBM DNA transistor concept DNA comprises of layers of Transisto DNA interdigitated electrodes with dielectric material, built into a r membrane that contains the nanopores. • An electric field is used to move DNA across a nanopore one base at a time (electrophoretic ratchet). • The electric field within the nanopore interacts with electric charges on DNA backbone where individual nucleotides are trapped at local field minima. Local Nanopor • Switching the polarity of the Electric voltage across the electrodes e Field permits the drift of the DNA Minima molecule in one direction until the nucleotides get trapped at new local field minima • A nucleotide sensing methodology Watch the technology video at: http://www.youtube.com/watch?v=wvclP3GySUYbased on recognition tunneling from ASU will be coupled to the DNA transistor to determine the

454 Sequencing Developments Summary

Our objective is to: improve– GS Junior & GS FLX+ Systems strengthen - Assays grow – DNAe collaboration innovate – IBM collaboration Roche Technical Support [email protected] 1.800.262.4911

For life science research only. Not for use in diagnostic procedures. 454, 454 SEQUENCING, 454 LIFE SCIENCES, EMPCR, GS 20, GS FLX, GS JUNIOR, GS FLX www.454.com TITANIUM, PTP, FASTSTART and PICOTITERPLATE are trademarks of Roche. 2-23-12 Version 1 Another Recent Example Sequencing Streptococcus in Toxic Scarlet Fever Outbreak • June 15: University of Hong Kong used GS FLX System to sequence Streptococcus strain that caused scarlet fever outbreak and one reported death • Draft genome finished within 3 days – ½ GS FLX run (514,422 reads) assembled using GS De Novo Assembler – 47 contigs Researchers identified a unique mutation (48 Kb – insertion)81 Kb N50 withincontig genomesize that is absent in other published genomes of related strains Source: http://www.hku.hk/press/news_detail_6505.html