JULIA SANDBERG Stockholm, Sweden, 2011 Sweden, Stockholm, cells and nucleic acids and nucleic cells Doctoral Thesis in Biotechnology Doctoral Massively parallel analysis of analysis parallel Massively

JULIA SANDBERG Massively parallel analysis of and nucleic acids KTH 2011 www.kth.se ISSN 1654-2312 ISBN 978-91-7501-123-3 TRITA-BIO ReportTRITA-BIO 2011:22

Massively parallel analysis of cells and Massively parallel analysis of cells and nucleic acids nucleic acids

Julia Sandberg Julia Sandberg

Royal Institute of Technology Royal Institute of Technology School of Biotechnology School of Biotechnology Stockholm 2011 Stockholm 2011

© Julia Sandberg © Julia Sandberg Stockholm 2011 Stockholm 2011

Royal Institute of Technology Royal Institute of Technology School of Biotechnology School of Biotechnology Division of Technology Division of Gene Technology Science for Laboratory Science for Life Laboratory SE-104 50 Stockholm SE-104 50 Stockholm Sweden Sweden

Printed by E-PRINT Printed by E-PRINT Oxtorgsgatan 9-11 Oxtorgsgatan 9-11 SE-111 57 Stockholm SE-111 57 Stockholm Sweden Sweden

ISBN 978-91-7501-123-3 ISBN 978-91-7501-123-3 TRITA-BIO Report 2011:22 TRITA-BIO Report 2011:22 ISSN 1654-2312 ISSN 1654-2312 Cover Illustration: Microscope image of DNA capture beads (Roche/454) that have been Cover Illustration: Microscope image of DNA capture beads (Roche/454) that have been barcode-specifically labeled. barcode-specifically labeled.

Julia Sandberg (2011): Massively parallel analysis of cells and nucleic acids Julia Sandberg (2011): Massively parallel analysis of cells and nucleic acids Division of Gene Technology, School of Biotechnology, Royal Institute of Technology (KTH), Division of Gene Technology, School of Biotechnology, Royal Institute of Technology (KTH), Stockholm, Sweden. Stockholm, Sweden.

Abstract Abstract

Recent proceedings in biotechnology have enabled completely new avenues in life science research to be Recent proceedings in biotechnology have enabled completely new avenues in life science research to be explored. By allowing increased parallelization an ever-increasing complexity of cell samples or experiments can explored. By allowing increased parallelization an ever-increasing complexity of cell samples or experiments can be investigated in shorter time and at a lower cost. This facilitates for example large-scale efforts to study cell be investigated in shorter time and at a lower cost. This facilitates for example large-scale efforts to study cell heterogeneity at the single cell level, by analyzing cells in parallel that also can include global genomic analyses. heterogeneity at the single cell level, by analyzing cells in parallel that also can include global genomic analyses. The work presented in this thesis focuses on massively parallel analysis of cells or samples, The work presented in this thesis focuses on massively parallel analysis of cells or nucleic acid samples, demonstrating technology developments in the field as well as use of the technology in life sciences. demonstrating technology developments in the field as well as use of the technology in life sciences. In stem cell research issues such as cell morphology, cell differentiation and effects of reprogramming factors are In stem cell research issues such as cell morphology, cell differentiation and effects of reprogramming factors are frequently studied, and to obtain information on cell heterogeneity these experiments are preferably carried out on frequently studied, and to obtain information on cell heterogeneity these experiments are preferably carried out on single cells. In paper I we used a high-density microwell device in silicon and glass for culturing and screening of single cells. In paper I we used a high-density microwell device in silicon and glass for culturing and screening of stem cells. Maintained pluripotency in stem cells from and mouse was demonstrated in a screening assay stem cells. Maintained pluripotency in stem cells from human and mouse was demonstrated in a screening assay by antibody staining and the chip was furthermore used for studying neural differentiation. The chip format allows by antibody staining and the chip was furthermore used for studying neural differentiation. The chip format allows for low sample volumes and rapid high-throughput analysis of single cells, and is compatible with Fluorescence for low sample volumes and rapid high-throughput analysis of single cells, and is compatible with Fluorescence Activated Cell Sorting (FACS) for precise cell selection. Activated Cell Sorting (FACS) for precise cell selection. Massively parallel DNA sequencing is revolutionizing genomics research throughout the life sciences by Massively parallel DNA sequencing is revolutionizing genomics research throughout the life sciences by constantly producing increasing amounts of data from one sequencing run. However, the reagent costs and labor constantly producing increasing amounts of data from one sequencing run. However, the reagent costs and labor requirements in current massively parallel sequencing protocols are still substantial. In paper II-IV we have requirements in current massively parallel sequencing protocols are still substantial. In paper II-IV we have focused on flow-sorting techniques for improved sample preparation in bead-based massive sequencing platforms, focused on flow-sorting techniques for improved sample preparation in bead-based massive sequencing platforms, with the aim of increasing the amount of quality data output, as demonstrated on the Roche/454 platform. In paper with the aim of increasing the amount of quality data output, as demonstrated on the Roche/454 platform. In paper II we demonstrate a rapid alternative to the existing shotgun sample titration protocol and also use flow-sorting to II we demonstrate a rapid alternative to the existing shotgun sample titration protocol and also use flow-sorting to enrich for beads that carry amplified template DNA after emulsion PCR, thus obtaining pure samples and with no enrich for beads that carry amplified template DNA after emulsion PCR, thus obtaining pure samples and with no downstream sacrifice of DNA sequencing quality. This should be seen in comparison to the standard 454- downstream sacrifice of DNA sequencing quality. This should be seen in comparison to the standard 454- enrichment protocol, which gives rise to varying degrees of sample purity, thus affecting the sequence data output enrichment protocol, which gives rise to varying degrees of sample purity, thus affecting the sequence data output of the sequencing run. Massively parallel sequencing is also useful for deep sequencing of specific PCR-amplified of the sequencing run. Massively parallel sequencing is also useful for deep sequencing of specific PCR-amplified targets in parallel. However, unspecific product formation is a common problem in amplicon sequencing and since targets in parallel. However, unspecific product formation is a common problem in amplicon sequencing and since these shorter products may be difficult to fully remove by standard procedures such as gel purification, and their these shorter products may be difficult to fully remove by standard procedures such as gel purification, and their presence inevitably reduces the number of target sequence reads that can be obtained in each sequencing run. In presence inevitably reduces the number of target sequence reads that can be obtained in each sequencing run. In paper III a gene-specific fluorescent probe was used for target-specific FACS enrichment to specifically enrich for paper III a gene-specific fluorescent probe was used for target-specific FACS enrichment to specifically enrich for beads with an amplified target gene on the surface. Through this procedure a nearly three-fold increase in fraction beads with an amplified target gene on the surface. Through this procedure a nearly three-fold increase in fraction of informative sequences was obtained and with no sequence bias introduced. Barcode labeling of different DNA of informative sequences was obtained and with no sequence bias introduced. Barcode labeling of different DNA libraries prior to pooling and emulsion PCR is standard procedure to maximize the number of experiments that can libraries prior to pooling and emulsion PCR is standard procedure to maximize the number of experiments that can be run in one sequencing lane, while also decreasing the impact of technical noise. However, variation between be run in one sequencing lane, while also decreasing the impact of technical noise. However, variation between libraries in quality and GC content affects amplification efficiency, which may result in biased fractions of the libraries in quality and GC content affects amplification efficiency, which may result in biased fractions of the different libraries in the sequencing data. In paper IV barcode specific labeling and flow-sorting for normalization different libraries in the sequencing data. In paper IV barcode specific labeling and flow-sorting for normalization of beads with different barcodes on the surface was used in order to weigh the proportion of data obtained from of beads with different barcodes on the surface was used in order to weigh the proportion of data obtained from different samples, while also removing mixed beads, and beads with no or poorly amplified product on the different samples, while also removing mixed beads, and beads with no or poorly amplified product on the surface, hence also resulting in an increased sequence quality. surface, hence also resulting in an increased sequence quality.

In paper V, cell heterogeneity within a human being is being investigated by low-coverage whole In paper V, cell heterogeneity within a human being is being investigated by low-coverage whole genome sequencing of single cell material. By focusing on the most variable portion of the human genome, polyguanine sequencing of single cell material. By focusing on the most variable portion of the human genome, polyguanine nucleotide repeats regions, variability between different cells is investigated and highly variable polyguanine nucleotide repeats regions, variability between different cells is investigated and highly variable polyguanine repeat loci are identified. By selectively amplifying and sequencing polyguanine nucleotide repeats from single repeat loci are identified. By selectively amplifying and sequencing polyguanine nucleotide repeats from single cells for which the phylogenetic relationship is known, we demonstrate that massively parallel sequencing can be cells for which the phylogenetic relationship is known, we demonstrate that massively parallel sequencing can be used to study cell-cell variation in length of these repeats, based on which a phylogenetic tree can be drawn. used to study cell-cell variation in length of these repeats, based on which a phylogenetic tree can be drawn.

Keywords: Massively parallel sequencing, 454, Illumina, multiplex amplification, whole genome amplification, Keywords: Massively parallel sequencing, 454, Illumina, multiplex amplification, whole genome amplification, single cell, polyguanine, flow-cytometry single cell, polyguanine, flow-cytometry © Julia Sandberg © Julia Sandberg

iii iii

iv iv

Den här boken tillägnas Ulla som gav en bit av sig själv och räddade min pappas liv. Den här boken tillägnas Ulla som gav en bit av sig själv och räddade min pappas liv.

v v

vi vi

List of publications List of publications

This thesis is based upon the following five original papers, which are referred to in the This thesis is based upon the following five original papers, which are referred to in the text by their Roman numerals (I-V). The papers are appended at the end of this thesis. text by their Roman numerals (I-V). The papers are appended at the end of this thesis.

I. Lindstrom, S*, Eriksson, M*, Vazin, T, Sandberg, J, Lundeberg, J, Frisén, J and I. Lindstrom, S*, Eriksson, M*, Vazin, T, Sandberg, J, Lundeberg, J, Frisén, J and Andersson-Svahn, H (2009) High-density microwell chip for culture and analysis Andersson-Svahn, H (2009) High-density microwell chip for culture and analysis of stem cells. PLoS One, 4, e6997. of stem cells. PLoS One, 4, e6997. II. Sandberg, J, Stahl, PL, Ahmadian, A, Bjursell, MK and Lundeberg, J (2009) II. Sandberg, J, Stahl, PL, Ahmadian, A, Bjursell, MK and Lundeberg, J (2009) Flow cytometry for enrichment and titration in massively parallel DNA Flow cytometry for enrichment and titration in massively parallel DNA sequencing. Nucleic Acids Res, 37, e63. sequencing. Nucleic Acids Res, 37, e63. III. Sandberg, J, Neiman, M, Ahmadian, A and Lundeberg, J (2010) Gene-specific III. Sandberg, J, Neiman, M, Ahmadian, A and Lundeberg, J (2010) Gene-specific FACS sorting method for target selection in high-throughput amplicon FACS sorting method for target selection in high-throughput amplicon sequencing. BMC Genomics, 11, 140. sequencing. BMC Genomics, 11, 140. IV. Sandberg, J, Werne, B, Dessing, M, Lundeberg, J (2011) Rapid flow-sorting to IV. Sandberg, J, Werne, B, Dessing, M, Lundeberg, J (2011) Rapid flow-sorting to simultaneously resolve multiplex massively parallel sequencing products. Scientific simultaneously resolve multiplex massively parallel sequencing products. Scientific Report. 1, 108. Report. 1, 108. V. Sandberg, J*, Borgström, E*, Ernst, A, Paterlini, M, Réu, P, Ahmadian, A, V. Sandberg, J*, Borgström, E*, Ernst, A, Paterlini, M, Réu, P, Ahmadian, A, Frisén, J, Lundeberg, J (2011) Interrogation of polyguaninine nucleotide repeat Frisén, J, Lundeberg, J (2011) Interrogation of polyguaninine nucleotide repeat variability in human T-cells by whole genome sequencing of single cells. variability in human T-cells by whole genome sequencing of single cells. Manuscript. Manuscript.

*Authors have contributed equally to the work. *Authors have contributed equally to the work.

All papers are reproduced with permission from the copyright holders. All papers are reproduced with permission from the copyright holders.

Additional publications Additional publications

Lofblom, J., Sandberg, J., Wernerus, H. and Stahl, S. (2007) Evaluation of Lofblom, J., Sandberg, J., Wernerus, H. and Stahl, S. (2007) Evaluation of staphylococcal cell surface display and flow cytometry for postselectional staphylococcal cell surface display and flow cytometry for postselectional characterization of affinity proteins in combinatorial protein engineering characterization of affinity proteins in combinatorial protein engineering applications. Appl Environ Microbiol, 73, 6714-6721. applications. Appl Environ Microbiol, 73, 6714-6721.

vii vii

viii viii

Contents Contents

INTRODUCTION ...... 1 INTRODUCTION ...... 1 1. The cell ...... 1 1. The cell ...... 1 1.1. DNA encodes life ...... 2 1.1. DNA encodes life ...... 2 1.2. RNA links the information...... 2 1.2. RNA links the information...... 2 1.3. Proteins do the work...... 3 1.3. Proteins do the work...... 3 1.4. If it only was this simple ...... 3 1.4. If it only was this simple ...... 3 2. Studying single cells ...... 5 2. Studying single cells ...... 5 2.1. A short recapitulation...... 6 2.1. A short recapitulation...... 6 2.2. Acquiring cells...... 7 2.2. Acquiring cells...... 7 2.2.1. Mechanical micromanipulation ...... 7 2.2.1. Mechanical micromanipulation ...... 7 2.2.2. Laser Capture Microdissection ...... 7 2.2.2. Laser Capture Microdissection ...... 7 2.2.3. Flow cytometry...... 7 2.2.3. Flow cytometry...... 7 2.2.4. Non-contact cell printing ...... 8 2.2.4. Non-contact cell printing ...... 8 2.3. Compartmentalization and manipulation...... 8 2.3. Compartmentalization and manipulation...... 8 2.3.1. Microwells ...... 8 2.3.1. Microwells ...... 8 2.3.2. Patterns...... 9 2.3.2. Patterns...... 9 2.3.3. Droplets...... 9 2.3.3. Droplets...... 9 2.3.4. Cell manipulation in chips ...... 10 2.3.4. Cell manipulation in chips ...... 10 3. Getting enough - complexity reduction and amplification in 3. Getting enough - complexity reduction and amplification in DNA samples ...... 12 DNA samples ...... 12 3.1. Targeted amplification using polymerase ...... 13 3.1. Targeted amplification using polymerase ...... 13 3.1.1. Polymerase Chain Reaction (PCR)...... 13 3.1.1. Polymerase Chain Reaction (PCR)...... 13 3.1.2. Real-time, Digital, Single Cell and Long-range PCR...... 13 3.1.2. Real-time, Digital, Single Cell and Long-range PCR...... 13 3.1.3. Multiplexity limitations of PCR and some ways around them ...... 15 3.1.3. Multiplexity limitations of PCR and some ways around them ...... 15 3.1.4. Isothermal targeted amplification strategies...... 17 3.1.4. Isothermal targeted amplification strategies...... 17 3.2. Using ligation for increased specificity ...... 17 3.2. Using ligation for increased specificity ...... 17 3.2.1. Ligase chain rection (LCR) ...... 17 3.2.1. Ligase chain rection (LCR) ...... 17 3.2.2. Multiplex Ligation Dependent Probe Amplification (MLPA) ...... 18 3.2.2. Multiplex Ligation Dependent Probe Amplification (MLPA) ...... 18 3.2.3. Golden Gate...... 18 3.2.3. Golden Gate...... 18 3.2.4. Tri-nucleotide Threading (TnT)...... 19 3.2.4. Tri-nucleotide Threading (TnT)...... 19 3.3. Selective Circularization ...... 19 3.3. Selective Circularization ...... 19 3.3.1. Padlock probes ...... 19 3.3.1. Padlock probes ...... 19 3.3.2. Molecular Inversion probes (MIP) ...... 20 3.3.2. Molecular Inversion probes (MIP) ...... 20 3.3.3. Selector...... 20 3.3.3. Selector...... 20 3.4. Targeted capture by hybridization ...... 22 3.4. Targeted capture by hybridization ...... 22 3.5. Decisions, decisions. What to choose? ...... 23 3.5. Decisions, decisions. What to choose? ...... 23 3.6. Global amplification of genomic DNA ...... 24 3.6. Global amplification of genomic DNA ...... 24 3.6.1. PCR-based amplification strategies...... 24 3.6.1. PCR-based amplification strategies...... 24 3.6.2. Multiple Displacement Amplification (MDA)...... 24 3.6.2. Multiple Displacement Amplification (MDA)...... 24 3.6.3. Linear amplification techniques...... 25 3.6.3. Linear amplification techniques...... 25

ix ix

3.6.4. Choosing a global amplification strategy ...... 27 3.6.4. Choosing a global amplification strategy ...... 27 4. Massively parallel DNA sequencing...... 28 4. Massively parallel DNA sequencing...... 28 4.1. Deciphering the ...... 29 4.1. Deciphering the genetic code...... 29 4.2. The Massively Parallel Sequencing toolbox...... 30 4.2. The Massively Parallel Sequencing toolbox...... 30 4.2.1. Sequencing chemistry...... 30 4.2.1. Sequencing chemistry...... 30 Sequencing by synthesis ...... 30 Sequencing by synthesis ...... 30 Sequencing by ligation ...... 31 Sequencing by ligation ...... 31 4.2.2. Template amplification ...... 31 4.2.2. Template amplification ...... 31 Emulsion PCR...... 31 Emulsion PCR...... 31 Bridge amplification ...... 32 Bridge amplification ...... 32 Rolling Circle amplification ...... 32 Rolling Circle amplification ...... 32 4.2.3. Shrinking observation window for single molecule visualization...... 32 4.2.3. Shrinking observation window for single molecule visualization...... 32 4.3. Massive sequencing by washing and scanning...... 33 4.3. Massive sequencing by washing and scanning...... 33 4.3.1. 454 – pyrosequencing en masse ...... 33 4.3.1. 454 – pyrosequencing en masse ...... 33 4.3.2. SOLiD – chained ligation in color space ...... 34 4.3.2. SOLiD – chained ligation in color space ...... 34 4.3.3. Illumina – reversible terminators at their best ...... 35 4.3.3. Illumina – reversible terminators at their best ...... 35 4.3.4. Complete Genomics – outsourced sequencing of DNA nano-balls...... 35 4.3.4. Complete Genomics – outsourced sequencing of DNA nano-balls...... 35 4.3.5. Ion Torrent – moving beyond light ...... 36 4.3.5. Ion Torrent – moving beyond light ...... 36 4.3.6. Helicos – single molecule sequencing in its infancy ...... 37 4.3.6. Helicos – single molecule sequencing in its infancy ...... 37 4.4. Massive sequencing in real time...... 40 4.4. Massive sequencing in real time...... 40 4.4.1. Pacific Biosciences – immobilized polymerase in a tiny well ...... 40 4.4.1. Pacific Biosciences – immobilized polymerase in a tiny well ...... 40 4.4.2. Nanopore sequencing – electrical measurement through a pore...... 41 4.4.2. Nanopore sequencing – electrical measurement through a pore...... 41 4.5. Choosing sequencing platform(s) ...... 42 4.5. Choosing sequencing platform(s) ...... 42 4.6. Applications of massive sequencing ...... 44 4.6. Applications of massive sequencing ...... 44 4.6.1. Whole genome sequencing...... 44 4.6.1. Whole genome sequencing...... 44 De novo sequencing ...... 44 De novo sequencing ...... 44 Resequencing ...... 45 Resequencing ...... 45 4.6.2. Targeted resequencing...... 46 4.6.2. Targeted resequencing...... 46 4.6.3. Expression profiling...... 47 4.6.3. Expression profiling...... 47 PRESENT INVESTIGATIONS...... 49 PRESENT INVESTIGATIONS...... 49 5. The papers...... 49 5. The papers...... 49 5.1. Culturing of single stem cells in a microchip (I) ...... 50 5.1. Culturing of single stem cells in a microchip (I) ...... 50 5.2. Bead sorting approaches for improved performance in massively 5.2. Bead sorting approaches for improved performance in massively parallel sequencing (II-IV)...... 51 parallel sequencing (II-IV)...... 51 5.2.1. Library titration and collection of DNA covered beads (II)...... 51 5.2.1. Library titration and collection of DNA covered beads (II)...... 51 5.2.2. Selecting beads with a particular gene fragment (III) ...... 52 5.2.2. Selecting beads with a particular gene fragment (III) ...... 52 5.2.3. Normalizing barcode presence (IV) ...... 53 5.2.3. Normalizing barcode presence (IV) ...... 53 5.3. Investigating polyG variation between individual cells (V)...... 53 5.3. Investigating polyG variation between individual cells (V)...... 53 5.4. Concluding remarks and future perspectives ...... 54 5.4. Concluding remarks and future perspectives ...... 54 ABBREVIATIONS ...... 57 ABBREVIATIONS ...... 57 ACKNOWLEDGEMENTS...... 58 ACKNOWLEDGEMENTS...... 58 REFERENCES ...... 60 REFERENCES ...... 60 x x

xi xi

INTRODUCTION INTRODUCTION

1. The cell 1. The cell

Last year, the breaking news of the creation of the first-ever synthetic cell was published by Last year, the breaking news of the creation of the first-ever synthetic cell was published by Craig Venter, regarded by some as an icon of the world of biotechnology1. By Craig Venter, regarded by some as an icon of the world of biotechnology1. By computationally designing the DNA code, synthesizing it in the lab and finally transplanting computationally designing the DNA code, synthesizing it in the lab and finally transplanting it into a recipient cell to replace its original DNA, Venter’s team created living and it into a recipient cell to replace its original DNA, Venter’s team created living and proliferating bacterial cells containing only man-made genomic material. To verify that the proliferating bacterial cells containing only man-made genomic material. To verify that the bacterial cell was in fact carrying the synthetic DNA, they inserted a few watermarks in the bacterial cell was in fact carrying the synthetic DNA, they inserted a few watermarks in the form of quotations (such as “What I cannot build I cannot understand”- Richard Feynman) form of quotations (such as “What I cannot build I cannot understand”- Richard Feynman) into the DNA of the synthetic cell. Despite the fact that the information within its genome is into the DNA of the synthetic cell. Despite the fact that the information within its genome is essentially identical to that of existing , this is astonishing news and I will leave the essentially identical to that of existing bacteria, this is astonishing news and I will leave the discussion about “playing God” outside the scope of this thesis and instead focus on the discussion about “playing God” outside the scope of this thesis and instead focus on the biological side of this achievement. biological side of this achievement. Imagine building a car from scratch. The work will be much easier if you know what parts Imagine building a car from scratch. The work will be much easier if you know what parts are essential for a car to be able to run, decelerate, turn and stop, when you start building it. are essential for a car to be able to run, decelerate, turn and stop, when you start building it. However, if you don’t, you will certainly learn a lot about the subject along the way. The However, if you don’t, you will certainly learn a lot about the subject along the way. The same is true when making synthetic cells. Being able to build a living cell opens up same is true when making synthetic cells. Being able to build a living cell opens up opportunities to learn more about what parts of the genomic makeup of a cell are really opportunities to learn more about what parts of the genomic makeup of a cell are really necessary for life. The current state of knowledge of cell biology in general, and some necessary for life. The current state of knowledge of cell biology in general, and some essential technologies in particular2-4, have reached the stage where creating a synthetic cell essential technologies in particular2-4, have reached the stage where creating a synthetic cell is possible. This Chapter contains a brief introduction to the basic components and is possible. This Chapter contains a brief introduction to the basic components and processes of cells, describing a path between three types of macromolecules - DNA, RNA processes of cells, describing a path between three types of macromolecules - DNA, RNA and protein - that are essential for most known forms of life. and protein - that are essential for most known forms of life.

The concept of a cell was first recognized in the mid-nineteenth century, when Matthias The concept of a cell was first recognized in the mid-nineteenth century, when Matthias Jacob Schleiden and Theodor Schwann defined the cell as “the basic structural and Jacob Schleiden and Theodor Schwann defined the cell as “the basic structural and functional unit of living ”. This theory was soon extended to include the view that functional unit of living organisms”. This theory was soon extended to include the view that new cells are produced by division of preexisting cells and that tissues and organs are made new cells are produced by division of preexisting cells and that tissues and organs are made up of specialized cells that are created by cell multiplication5. However, it has taken a long up of specialized cells that are created by cell multiplication5. However, it has taken a long time to establish how the life of a cell works and is regulated, and there is still much left to time to establish how the life of a cell works and is regulated, and there is still much left to be determined. be determined.

1.1. DNA encodes life 1.1. DNA encodes life

Genetic information instructing the eukaryotic cell how to grow, divide and behave is stored Genetic information instructing the eukaryotic cell how to grow, divide and behave is stored in digital form as a set of long molecules, in the cell nucleus. The characteristic double- in digital form as a set of long molecules, in the cell nucleus. The characteristic double- helical arrangement of deoxyribonucleic acid (DNA) was deduced by Watson and Crick in helical arrangement of deoxyribonucleic acid (DNA) was deduced by Watson and Crick in 19536. DNA is made up of four different building blocks, nucleotides, each of which is in 19536. DNA is made up of four different building blocks, nucleotides, each of which is in turn are composed of a phosphate group, a sugar molecule and a nitrogenous base. Each turn are composed of a phosphate group, a sugar molecule and a nitrogenous base. Each block can only match – basepair - to one of the other three nucleotides. Basepairing of a block can only match – basepair - to one of the other three nucleotides. Basepairing of a DNA nucleotide to its complementary nucleotide is an essential feature of nature as it DNA nucleotide to its complementary nucleotide is an essential feature of nature as it enables DNA replication, a DNA copying process that is carried out when a cell is to divide enables DNA replication, a DNA copying process that is carried out when a cell is to divide into two. into two. In order for the DNA code to have an impact in the cell, it needs to be translated into a In order for the DNA code to have an impact in the cell, it needs to be translated into a more functional code. This system of multiple layers allows stability and backup, since the more functional code. This system of multiple layers allows stability and backup, since the information source in itself is quite inert and the functional elements (RNA and proteins), information source in itself is quite inert and the functional elements (RNA and proteins), whether they are correctly synthesized or not, are created and degraded in a continuous whether they are correctly synthesized or not, are created and degraded in a continuous flow. This process is carried out in two steps. flow. This process is carried out in two steps.

1.2. RNA links the information 1.2. RNA links the information

With the help of a collection of enzymes, a part of the long DNA code is copied into code With the help of a collection of enzymes, a part of the long DNA code is copied into code consisting of a similar polymer molecule, ribonucleic acid (RNA). This process, which is consisting of a similar polymer molecule, ribonucleic acid (RNA). This process, which is called , is done through the incorporation of RNA nucleotides matching and called transcription, is done through the incorporation of RNA nucleotides matching and base-pairing to the DNA template, and getting attached to each other, one base at a time, base-pairing to the DNA template, and getting attached to each other, one base at a time, to form an RNA molecule. to form an RNA molecule. There are many types of RNA molecules with different functions in the cell, a subject to There are many types of RNA molecules with different functions in the cell, a subject to which I will return later. Messenger RNA (mRNA) carries the information from DNA to which I will return later. Messenger RNA (mRNA) carries the information from DNA to the protein factories of the cells. A few hundred thousand mRNA molecules are present in a the protein factories of the cells. A few hundred thousand mRNA molecules are present in a human cell at any time; however, their internal ratios and origins may vary7. The newly human cell at any time; however, their internal ratios and origins may vary7. The newly synthesized mRNA molecules are transported out of the nucleus into the ribosomes in the synthesized mRNA molecules are transported out of the nucleus into the ribosomes in the cell cytoplasm or bound to the endoplasmic reticulum, for a final transformation of the cell cytoplasm or bound to the endoplasmic reticulum, for a final transformation of the genetic code into the third class of macromolecules – proteins. genetic code into the third class of macromolecules – proteins.

2 2 Julia Sandberg Julia Sandberg

1.3. Proteins do the work 1.3. Proteins do the work

The next step involves the interpretation of the ribonucleic acid code, which consists of four The next step involves the interpretation of the ribonucleic acid code, which consists of four letters, into the protein code of twenty letters in a process called . Different letters, into the protein code of twenty letters in a process called translation. Different combinations of three consecutive nucleotides in the RNA sequence correspond to different combinations of three consecutive nucleotides in the RNA sequence correspond to different amino acids. Thus, by using an mRNA sequence as template, amino acids can be coupled amino acids. Thus, by using an mRNA sequence as template, amino acids can be coupled together in a long chain. Based on the order of the amino acids, proteins are folded into together in a long chain. Based on the order of the amino acids, proteins are folded into well-defined three-dimensional structures, which are fundamental to their particular well-defined three-dimensional structures, which are fundamental to their particular functions. Proteins appear in practically all cellular machineries and processes, with widely functions. Proteins appear in practically all cellular machineries and processes, with widely varying functions, including catalyzing chemical reactions, giving structure and firmness to varying functions, including catalyzing chemical reactions, giving structure and firmness to tissues and organs, or defending the cell against hostile attack. A cell contains proteins in tissues and organs, or defending the cell against hostile attack. A cell contains proteins in varying concentrations; abundant ones may be present in counts of ~10 million in a cell, varying concentrations; abundant ones may be present in counts of ~10 million in a cell, whereas rare proteins can be found at only 1-10 copies per cell8. whereas rare proteins can be found at only 1-10 copies per cell8. The behavior of the cell is largely determined by its protein content. A useful image is to The behavior of the cell is largely determined by its protein content. A useful image is to think of the DNA sequence as a cookbook, where the are the recipes describing to the think of the DNA sequence as a cookbook, where the genes are the recipes describing to the cell how to cook up the different proteins. In an over- simplification one could state that the cell how to cook up the different proteins. In an over- simplification one could state that the more mRNA coding for a particular gene present at the ribosomes, the more of that protein more mRNA coding for a particular gene present at the ribosomes, the more of that protein will be produced. The pattern of protein levels in a cell gives the cell its particular traits and will be produced. The pattern of protein levels in a cell gives the cell its particular traits and behavior. A mutation in a gene causes altered information to be present in the RNA behavior. A mutation in a gene causes altered information to be present in the RNA transcript and thus in the corresponding protein or in the amount of gene product. Genetic transcript and thus in the corresponding protein or in the amount of gene product. Genetic mutations can thus alter the function of the protein, alter the protein amount or even mutations can thus alter the function of the protein, alter the protein amount or even completely block production of the protein. It is easy to see that damage to DNA can have a completely block production of the protein. It is easy to see that damage to DNA can have a considerable effect on the cell. considerable effect on the cell. The flow of information from the nuclear DNA through messenger RNA molecules to The flow of information from the nuclear DNA through messenger RNA molecules to functional proteins determines the cellular phenotype. This model of informational flow, in functional proteins determines the cellular phenotype. This model of informational flow, in combination with DNA replication, is called the central dogma of cell biology and was first combination with DNA replication, is called the central dogma of cell biology and was first proposed in 1958 by Francis Crick, and twelve years later updated in a widely cited article9. proposed in 1958 by Francis Crick, and twelve years later updated in a widely cited article9. To some extent this model still holds true, but since then the picture has become much To some extent this model still holds true, but since then the picture has become much more complex. more complex.

1.4. If it only was this simple 1.4. If it only was this simple

The picture of basic cellular processes has become more complicated since the central The picture of basic cellular processes has become more complicated since the central dogma was first proposed. It is now widely accepted that the regulatory machinery that dogma was first proposed. It is now widely accepted that the regulatory machinery that tightly controls cell behavior acts in response to information input from the surrounding tightly controls cell behavior acts in response to information input from the surrounding environment and neighboring cells. Tight regulation of transcription rate, transcript environment and neighboring cells. Tight regulation of transcription rate, transcript degradation, protein translation and the breakdown of proteins are all necessary for cell degradation, protein translation and the breakdown of proteins are all necessary for cell survival and this regulation plays a crucial role during embryonic development, as different survival and this regulation plays a crucial role during embryonic development, as different developmental states require different protein combinations in the cell. Biological processes developmental states require different protein combinations in the cell. Biological processes are driven by networks rather than genes. Gene networks are clusters of genes with causal are driven by networks rather than genes. Gene networks are clusters of genes with causal interactions so that their state influences the levels of expression of other genes directly or interactions so that their state influences the levels of expression of other genes directly or indirectly, through their RNA and protein expression products10. indirectly, through their RNA and protein expression products10. It is not only the protein-coding regions of the DNA that contain biologically meaningful It is not only the protein-coding regions of the DNA that contain biologically meaningful information. The ∼20,500 protein-coding genes in the human genome are surrounded by information. The ∼20,500 protein-coding genes in the human genome are surrounded by 3 3

large amounts of mostly repetitive DNA which is commonly called non-coding DNA11. In large amounts of mostly repetitive DNA which is commonly called non-coding DNA11. In fact, almost all the DNA sequences in the genome, the majority of which does not encode fact, almost all the DNA sequences in the genome, the majority of which does not encode proteins, is transcribed in the cell and various functions for these sequences have been proteins, is transcribed in the cell and various functions for these sequences have been identified10. RNA molecules can play a more direct role in determining cellular identified10. RNA molecules can play a more direct role in determining cellular characteristics by catalyzing chemical reactions, such as cutting and ligating other RNA characteristics by catalyzing chemical reactions, such as cutting and ligating other RNA molecules11, and small RNA molecules are widely involved in regulation of transcription molecules11, and small RNA molecules are widely involved in regulation of transcription and translation through a series of mechanisms. Moreover, splicing of mRNAs prior to and translation through a series of mechanisms. Moreover, splicing of mRNAs prior to translation into protein is a common cellular process that gives rise to differences in protein translation into protein is a common cellular process that gives rise to differences in protein function. Post-translational protein modification is another level at which protein function function. Post-translational protein modification is another level at which protein function and thus cell behavior is shaped. Finally, chromatin structure affects how accessible the and thus cell behavior is shaped. Finally, chromatin structure affects how accessible the DNA is to the enzymatic machinery needed for transcription, and through this affects the DNA is to the enzymatic machinery needed for transcription, and through this affects the transcription level. Chromatin structure can be inherited over many cell generations and transcription level. Chromatin structure can be inherited over many cell generations and therefore allows cellular memory10. therefore allows cellular memory10. It is obvious that there is no unidirectional flow of information as suggested by Crick et al It is obvious that there is no unidirectional flow of information as suggested by Crick et al in19709. Instead there is a much more complicated picture, which requires information in19709. Instead there is a much more complicated picture, which requires information flow in several dimensions and with the participation of many types of molecules10. Another flow in several dimensions and with the participation of many types of molecules10. Another interesting issue is the cell-to-cell variation in genetic makeup and phenotype. This will be interesting issue is the cell-to-cell variation in genetic makeup and phenotype. This will be discussed in the following chapter. discussed in the following chapter.

4 4 Julia Sandberg Julia Sandberg

2. Studying single cells 2. Studying single cells

Cells in a particular tissue are not identical. Instead, cells that are identical from a genomic Cells in a particular tissue are not identical. Instead, cells that are identical from a genomic point of view may have considerable variation in gene expression profile and protein point of view may have considerable variation in gene expression profile and protein levels12-14, giving rise to a heterogeneous collection of cells with different behavior and levels12-14, giving rise to a heterogeneous collection of cells with different behavior and appearance. Sometimes the genomic DNA sequence varies between neighboring cells appearance. Sometimes the genomic DNA sequence varies between neighboring cells within a tissue, a feature that can be used to study tissue and tumor development15,16. within a tissue, a feature that can be used to study tissue and tumor development15,16. Tissues are commonly made up from several cell types. When studying multiple cells, or a Tissues are commonly made up from several cell types. When studying multiple cells, or a mixture of genomic DNA from several cells, the average characteristics of the bulk are mixture of genomic DNA from several cells, the average characteristics of the bulk are usually obtained and information on rare cells may be lost17. Investigating single cells usually obtained and information on rare cells may be lost17. Investigating single cells renders a more detailed view and thus is preferable. Traditionally, and in many cases renders a more detailed view and thus is preferable. Traditionally, and in many cases currently, studies have been carried out on a collection of cells; in studies aimed at global currently, studies have been carried out on a collection of cells; in studies aimed at global analysis and/or investigating large/average differences between samples this approach is analysis and/or investigating large/average differences between samples this approach is still valid. still valid. Cell-cell variability is an interesting aspect to study in many biological contexts. In some Cell-cell variability is an interesting aspect to study in many biological contexts. In some diseases, a combination of traits is known to give a cell its particular disease phenotype and diseases, a combination of traits is known to give a cell its particular disease phenotype and obtaining information on cell heterogeneity within a tissue is sometimes crucial. For obtaining information on cell heterogeneity within a tissue is sometimes crucial. For instance, it has been argued that knowing more about the co-occurrence of several different instance, it has been argued that knowing more about the co-occurrence of several different mutations within the same cell is essential to fully understand the nature and evolution of mutations within the same cell is essential to fully understand the nature and evolution of cancer18,19. In other cases, a cell population may be made up of two subpopulations with cancer18,19. In other cases, a cell population may be made up of two subpopulations with distinctly varying traits, making the information obtained dependant upon their proportions distinctly varying traits, making the information obtained dependant upon their proportions in a mixed sample. Circulating tumor cells in cancer patients and fetal cells in the blood of in a mixed sample. Circulating tumor cells in cancer patients and fetal cells in the blood of pregnant women are two examples of rare cells that are difficult to study in their natural pregnant women are two examples of rare cells that are difficult to study in their natural habitat and therefore need to be isolated for proper characterization of cancer genotype, habitat and therefore need to be isolated for proper characterization of cancer genotype, and risk of metastasis or fetal aneuploidy, respectively. In microbial analysis, many species and risk of metastasis or fetal aneuploidy, respectively. In microbial analysis, many species cannot be cultured in the laboratory making single cell analysis the only option20. By cannot be cultured in the laboratory making single cell analysis the only option20. By monitoring single cell proliferation over time, the response of a heterogeneous cell monitoring single cell proliferation over time, the response of a heterogeneous cell

5 5

population to various agents can be evaluated21. Also, multiplication and analysis of single population to various agents can be evaluated21. Also, multiplication and analysis of single cell clones is a way to increase the genetic material available for analysis, by having the cell clones is a way to increase the genetic material available for analysis, by having the selected cell pass through an evolutionary bottleneck22. selected cell pass through an evolutionary bottleneck22. In order to get information from single cells one need to isolate, study, and sometimes In order to get information from single cells one need to isolate, study, and sometimes culture them, separately. In this chapter a selection of tools that can be used to single out culture them, separately. In this chapter a selection of tools that can be used to single out and isolate particular cells of interest will be described. and isolate particular cells of interest will be described.

2.1. A short recapitulation 2.1. A short recapitulation

The field of cell biology has always been dependent on technology development. The field of cell biology has always been dependent on technology development. Development of high-resolution microscopy in combination with specific staining methods Development of high-resolution microscopy in combination with specific staining methods in 17th century made it possible to observe tissues and individual cells for the first time23,24. in 17th century made it possible to observe tissues and individual cells for the first time23,24. Nowadays, microscopy is used daily in most biological laboratories, to investigate, e.g., the Nowadays, microscopy is used daily in most biological laboratories, to investigate, e.g., the intracellular localization of particular proteins and even the division of living cells, when intracellular localization of particular proteins and even the division of living cells, when using live-cell imaging in time-lapse microscopy25,26. Methods such as electrophysiology, using live-cell imaging in time-lapse microscopy25,26. Methods such as electrophysiology, where the voltage over a single cell or ion channel is measured by attaching a pipette where the voltage over a single cell or ion channel is measured by attaching a pipette containing a volt-meter to the surface of a cell, give information about one cell at a time27,28. containing a volt-meter to the surface of a cell, give information about one cell at a time27,28. The same is true for flow cytometry, by which the sizes of, and fluorescent signals from, The same is true for flow cytometry, by which the sizes of, and fluorescent signals from, individual cells in a stream can be monitored (cf. below). These methods have been around individual cells in a stream can be monitored (cf. below). These methods have been around for a long time and by their nature give information about the properties of single cells. for a long time and by their nature give information about the properties of single cells. However, when it comes to biochemical approaches for investigating DNA sequences, gene However, when it comes to biochemical approaches for investigating DNA sequences, gene expression levels or the amount of small molecules or protein in a cell, there is a need for expression levels or the amount of small molecules or protein in a cell, there is a need for tools offering higher sensitivity29. tools offering higher sensitivity29. One way to increase sensitivity is through miniaturization. Since human cells are about 10 One way to increase sensitivity is through miniaturization. Since human cells are about 10 µm in size and occupy a volume of less than a fl30, the standard volumes of 50 µl and more, µm in size and occupy a volume of less than a fl30, the standard volumes of 50 µl and more, commonly used in laboratories, can be excessive when studying small numbers of cells. A commonly used in laboratories, can be excessive when studying small numbers of cells. A reduction in reaction volume has been identified as a way to enhance the sensitivity and reduction in reaction volume has been identified as a way to enhance the sensitivity and speed of reactions by allowing shorter diffusion distances and smaller dilution factors31-33. speed of reactions by allowing shorter diffusion distances and smaller dilution factors31-33. Another obvious advantage of smaller reaction volumes is a decrease in reagent costs. In Another obvious advantage of smaller reaction volumes is a decrease in reagent costs. In recent times, the numbers of publications in the fields of genomics, transcriptomics and recent times, the numbers of publications in the fields of genomics, transcriptomics and proteomics describing single cell analysis utilizing miniaturization techniques have proteomics describing single cell analysis utilizing miniaturization techniques have dramatically increased. Transcriptomics and genomics will be described in more detail in dramatically increased. Transcriptomics and genomics will be described in more detail in later chapters, however a few examples will also be presented in this section. later chapters, however a few examples will also be presented in this section. Regardless of the biomolecule of interest, single cell analysis has some general technical Regardless of the biomolecule of interest, single cell analysis has some general technical features, such as picking up single cells (possibly based on certain traits such as cell surface features, such as picking up single cells (possibly based on certain traits such as cell surface markers) and putting these cells in a container where the downstream analysis can be markers) and putting these cells in a container where the downstream analysis can be carried out. There are several readily available platforms for these types of tasks, a handful carried out. There are several readily available platforms for these types of tasks, a handful of which will be presented here. of which will be presented here.

6 6 Julia Sandberg Julia Sandberg

2.2. Acquiring cells 2.2. Acquiring cells

There are many ways to manipulate cells into certain positions. Manual seeding, causing There are many ways to manipulate cells into certain positions. Manual seeding, causing cells to be randomly placed in response to gravity, is the simplest, and standard for some cells to be randomly placed in response to gravity, is the simplest, and standard for some platforms, but when a more precise selection or positioning of cells is wanted, alternative platforms, but when a more precise selection or positioning of cells is wanted, alternative methods may be used. methods may be used.

2.2.1. Mechanical micromanipulation 2.2.1. Mechanical micromanipulation Mechanical micromanipulation is a technique whereby cells can be handled on an inverted Mechanical micromanipulation is a technique whereby cells can be handled on an inverted microscope stage, carrying out extremely small-scale operations through a joystick that microscope stage, carrying out extremely small-scale operations through a joystick that hydraulically operates glass microtools such as transfer pipettes. Mechanical hydraulically operates glass microtools such as transfer pipettes. Mechanical micromanipulation of cells is an accessible but labor-intensive method34 that can be used for micromanipulation of cells is an accessible but labor-intensive method34 that can be used for manipulating cultured cells or cells from liquid samples such as blood, saliva and semen35. manipulating cultured cells or cells from liquid samples such as blood, saliva and semen35. The method first saw clinical use in in-vitro fertilization (IVF) treatment in cases of “slow The method first saw clinical use in in-vitro fertilization (IVF) treatment in cases of “slow sperm”, and it is still commonly used in assisted fertilization36 which was awarded the Nobel sperm”, and it is still commonly used in assisted fertilization36 which was awarded the Nobel price in Physiology or Medicine in 2010. Disadvantages of mechanical micromanipulation price in Physiology or Medicine in 2010. Disadvantages of mechanical micromanipulation include low throughput and the fact that it subjects the cells to mechanical shearing35. include low throughput and the fact that it subjects the cells to mechanical shearing35.

2.2.2. Laser Capture Microdissection 2.2.2. Laser Capture Microdissection Laser capture microdissection (LCM) was invented in 1996 and basically uses a microscope Laser capture microdissection (LCM) was invented in 1996 and basically uses a microscope coupled to a laser constructed to collect cells. Cell(s) are selected based on morphology or coupled to a laser constructed to collect cells. Cell(s) are selected based on morphology or immunohistochemical labeling, liberated from surrounding tissue by laser cutting and then, immunohistochemical labeling, liberated from surrounding tissue by laser cutting and then, either by catapulting or gravitation depending on the system, fall into a well or the lid of a either by catapulting or gravitation depending on the system, fall into a well or the lid of a tube or onto a slide37. A wide range of tissue preparation methods are possible with this tube or onto a slide37. A wide range of tissue preparation methods are possible with this technique, including formalin or alcohol fixation, paraffin embedding and fresh-freezing technique, including formalin or alcohol fixation, paraffin embedding and fresh-freezing and living cells in suspension can also be handled38 and re-culturing of LCM-collected cells and living cells in suspension can also be handled38 and re-culturing of LCM-collected cells is possible. LCM has been used when investigating neurogenesis by comparing genome- is possible. LCM has been used when investigating neurogenesis by comparing genome- wide transcriptional profiles from mature neurons and progenitor cells in the olfactory wide transcriptional profiles from mature neurons and progenitor cells in the olfactory system39, for picking single cells or for cytogenetic40 and genetic41 analysis. system39, for picking single cells or chromosomes for cytogenetic40 and genetic41 analysis. However, if starting with tissue sections, a limitation is the fact that some nuclei may be However, if starting with tissue sections, a limitation is the fact that some nuclei may be sliced which means loss of parts, which will affect the genomic data42. sliced which means loss of chromosome parts, which will affect the genomic data42.

2.2.3. Flow cytometry 2.2.3. Flow cytometry When flow cytometry was invented in the 1970s it radically improved the throughput of When flow cytometry was invented in the 1970s it radically improved the throughput of single cell analysis43,44. The principle of flow cytometry is as follows. Labeled cells are single cell analysis43,44. The principle of flow cytometry is as follows. Labeled cells are hydrodynamically focused in the center of a buffer stream and laser beams interrogate the hydrodynamically focused in the center of a buffer stream and laser beams interrogate the cells as they flow by. Information on cell size, granularity and presence of up to 17 cells as they flow by. Information on cell size, granularity and presence of up to 17 fluorescent dyes45 is acquired separately for each cell, at a high speed of up to ∼200,000 cells fluorescent dyes45 is acquired separately for each cell, at a high speed of up to ∼200,000 cells per second. For collection of interesting cells a Fluorescence Activated Cell Sorter (FACS) per second. For collection of interesting cells a Fluorescence Activated Cell Sorter (FACS) uses vibrations to break the sample stream into droplets and those containing interesting uses vibrations to break the sample stream into droplets and those containing interesting cells receive an electrical charge and are later diverted by an electrostatic deflection system cells receive an electrical charge and are later diverted by an electrostatic deflection system into a desired spot46. With current FACS instruments, sorting can be done at a speed of into a desired spot46. With current FACS instruments, sorting can be done at a speed of

7 7

70,000 cells/second. However, if cell viability is of the essence slower speeds are 70,000 cells/second. However, if cell viability is of the essence slower speeds are recommended. Flow cytometry is commonly used to study cell surface markers47, recommended. Flow cytometry is commonly used to study cell surface markers47, intracellular protein abundance29, binding of peptide libraries displayed on cells48 and for intracellular protein abundance29, binding of peptide libraries displayed on cells48 and for investigating ploidy status of tumor cells16. Double FACS sorting has been used as a means investigating ploidy status of tumor cells16. Double FACS sorting has been used as a means to purify single marine cyanobacterial cells free from contaminating DNA in order to to purify single marine cyanobacterial cells free from contaminating DNA in order to sequence the genome49. Despite the requirement for sophisticated instrumentation, FACS sequence the genome49. Despite the requirement for sophisticated instrumentation, FACS instruments are found in most research institutions and hospitals and the technique is instruments are found in most research institutions and hospitals and the technique is routinely used to sort and analyze cells from leukemia patients50. routinely used to sort and analyze cells from leukemia patients50.

2.2.4. Non-contact cell printing 2.2.4. Non-contact cell printing An ordinary desktop computer printer can be modified to print cells onto “bio-papers” or An ordinary desktop computer printer can be modified to print cells onto “bio-papers” or membranes, at high resolution, without touching the surface51. Non-contact printing (aka membranes, at high resolution, without touching the surface51. Non-contact printing (aka inkjet printing) is preferable when cells are sensitive to touch and inkjet printing has been inkjet printing) is preferable when cells are sensitive to touch and inkjet printing has been carried out using a number of different cell types including mouse motor neurons, with high carried out using a number of different cell types including mouse motor neurons, with high viabilities of ∼90%51-53. Cell printing is a method still in development and its final viabilities of ∼90%51-53. Cell printing is a method still in development and its final application is not yet clear, however it is considered a promising tool for printing three- application is not yet clear, however it is considered a promising tool for printing three- dimensional structures containing several cell types. Applications would include building dimensional structures containing several cell types. Applications would include building organs for transplantation or constructing cellular model systems to study cell-cell organs for transplantation or constructing cellular model systems to study cell-cell interactions52. interactions52. Oil-submerged inkjet printing has been used to avoid cell dehydration when printing single- Oil-submerged inkjet printing has been used to avoid cell dehydration when printing single- cell arrays on a glass slide53 and a cell printing device guided by a CCD camera has been cell arrays on a glass slide53 and a cell printing device guided by a CCD camera has been used to print single unlabeled HeLa cells in microwells and on slides with 75% viability54. used to print single unlabeled HeLa cells in microwells and on slides with 75% viability54. An obvious disadvantage of cell printing techniques is that cell selection needs to be done in An obvious disadvantage of cell printing techniques is that cell selection needs to be done in a separate step, prior to printing; however the high throughput and flexibility of printing a separate step, prior to printing; however the high throughput and flexibility of printing pattern are advantages of this method. pattern are advantages of this method.

2.3. Compartmentalization and manipulation 2.3. Compartmentalization and manipulation

There are a number of ways to compartmentalize the cells of choice, all of which have the There are a number of ways to compartmentalize the cells of choice, all of which have the advantage of requiring smaller volumes than the 50 µl -1 ml volumes commonly used in the advantage of requiring smaller volumes than the 50 µl -1 ml volumes commonly used in the lab. lab.

2.3.1. Microwells 2.3.1. Microwells Dividing cells between separate wells on a chip is an efficient way of analyzing many cells in Dividing cells between separate wells on a chip is an efficient way of analyzing many cells in parallel. There are a number of microwell devices on the market and many more “home- parallel. There are a number of microwell devices on the market and many more “home- made” alternatives; their varying features make them useful for a range of applications. made” alternatives; their varying features make them useful for a range of applications. Well sizes vary from just enough to hold one bacterium (2-3 µm Ø, circular)55 or cell Well sizes vary from just enough to hold one bacterium (2-3 µm Ø, circular)55 or animal cell (10 µm Ø, circular)56, up to those sufficiently large to enable cell culturing for days or weeks (10 µm Ø, circular)56, up to those sufficiently large to enable cell culturing for days or weeks (150-650 µm sides, at well bottom)57,58. (150-650 µm sides, at well bottom)57,58.

8 8 Julia Sandberg Julia Sandberg

Glass bottoms enable microscopy analysis and high-content screening, whereas chips with Glass bottoms enable microscopy analysis and high-content screening, whereas chips with bottoms made from less transparent materials such as silicon have an impact on the types of bottoms made from less transparent materials such as silicon have an impact on the types of analysis that are possible21,57. Microwell chips can also be made by cutting wells in polyester analysis that are possible21,57. Microwell chips can also be made by cutting wells in polyester using a laser58, or molding chips from polyethyleneglycol (PEG)59,60 and using a laser58, or molding chips from polyethyleneglycol (PEG)59,60 and poly(dimethylsiloxane) (PDMS)61. Adherent cell culturing in wells may require coating with poly(dimethylsiloxane) (PDMS)61. Adherent cell culturing in wells may require coating with extracellular matrix components such as fibronectin for adhesion57, and chemical treatment extracellular matrix components such as fibronectin for adhesion57, and chemical treatment has been used to enhance seeding efficiency by placing repelling surfaces on the well walls has been used to enhance seeding efficiency by placing repelling surfaces on the well walls and attracting surfaces at the bottom of the wells62. and attracting surfaces at the bottom of the wells62. Analysis of single cells in microwells has been utilized to screen hybridomas for cells Analysis of single cells in microwells has been utilized to screen hybridomas for cells producing antigen-specific antibodies63,64 as well as cytokine release from single T-cells59. producing antigen-specific antibodies63,64 as well as cytokine release from single T-cells59. Due to their wide range of characteristics, microwell devices may be used for many types of Due to their wide range of characteristics, microwell devices may be used for many types of cell deposition techniques such as flow cytometry, manual seeding and LCM. cell deposition techniques such as flow cytometry, manual seeding and LCM.

2.3.2. Patterns 2.3.2. Patterns Patterning of glass slides can be utilized both for culturing purposes and for Patterning of glass slides can be utilized both for culturing purposes and for compartmentalization of cells and reaction buffers for direct analysis of single cells. For compartmentalization of cells and reaction buffers for direct analysis of single cells. For culturing of single seeded cells, patterns of cell adhesion molecules such as collagen can be culturing of single seeded cells, patterns of cell adhesion molecules such as collagen can be printed using a modified computer printer, resulting in a dense pattern of 2-500 µm spots65. printed using a modified computer printer, resulting in a dense pattern of 2-500 µm spots65. Alternatively, hydrophilic patches created by photolithography, may be used together with Alternatively, hydrophilic patches created by photolithography, may be used together with a gas permeable oil layer66, both facilitating both separate attachment and growth of cells in a gas permeable oil layer66, both facilitating both separate attachment and growth of cells in an ordered manner. Patterning of extracellular matrix spots has been used to investigate the an ordered manner. Patterning of extracellular matrix spots has been used to investigate the impact of cell shape on apoptosis rate in fibroblasts67. impact of cell shape on apoptosis rate in fibroblasts67. Enzymatic reactions at the single-cell level can be done using slides with hydrophilic Enzymatic reactions at the single-cell level can be done using slides with hydrophilic reaction sites surrounded by hydrophobic circles that enable encapsulation of aqueous reaction sites surrounded by hydrophobic circles that enable encapsulation of aqueous reaction solution under a lid of oil preventing evaporation during heating. DNA reaction solution under a lid of oil preventing evaporation during heating. DNA methylation pattern analysis and DNA fingerprinting in combination with cytogenetic methylation pattern analysis and DNA fingerprinting in combination with cytogenetic analysis of single cells have been performed using the Ampligrid platform (BC, CA, USA), analysis of single cells have been performed using the Ampligrid platform (BC, CA, USA), which allows for 48 reactions, each in an area 1 µm in diameter on one slide40,68. which allows for 48 reactions, each in an area 1 µm in diameter on one slide40,68. Depending on the pattern, these tools may be suitable for cell seeding by hand, LCM, flow Depending on the pattern, these tools may be suitable for cell seeding by hand, LCM, flow cytometry, or inkjet printing and also allow easy interrogation of cell seeding by microscopy cytometry, or inkjet printing and also allow easy interrogation of cell seeding by microscopy or scanning68. or scanning68.

2.3.3. Droplets 2.3.3. Droplets Microreactors in aqueous phase can be encapsulated in oil, as in a vinaigrette, to allow Microreactors in aqueous phase can be encapsulated in oil, as in a vinaigrette, to allow separation of single cells or molecules. Emulsions may be created by dropwise addition of separation of single cells or molecules. Emulsions may be created by dropwise addition of aqueous phase to oil phase while stirring69, vigorous shaking of the two70 or, with the aqueous phase to oil phase while stirring69, vigorous shaking of the two70 or, with the advantage of a more homogeneous emulsion, through merging of channels containing oil advantage of a more homogeneous emulsion, through merging of channels containing oil and aqueous phase in a microfluidic chip73. The latter requires more sophisticated and aqueous phase in a microfluidic chip73. The latter requires more sophisticated laboratory apparatus, but it opens up additional possibilities, such as stirring, splitting and laboratory apparatus, but it opens up additional possibilities, such as stirring, splitting and merging, for the manipulation of droplets in the chip; it also facilitates fluorescence analysis merging, for the manipulation of droplets in the chip; it also facilitates fluorescence analysis and droplet sorting71. With some additional work, sorting of emulsion droplets in a double and droplet sorting71. With some additional work, sorting of emulsion droplets in a double emulsion can be done using an ordinary FACS instrument72 and this approach has been emulsion can be done using an ordinary FACS instrument72 and this approach has been utilized in enzyme library screening experiments73. utilized in enzyme library screening experiments73.

9 9

A number of different cell types have been successfully studied in emulsion with acceptable A number of different cell types have been successfully studied in emulsion with acceptable or good viabilities74. Drug-screening experiment has been performed in color-coded or good viabilities74. Drug-screening experiment has been performed in color-coded droplets75 and enzymatic signal amplification in droplets has been used for analysis of rare droplets75 and enzymatic signal amplification in droplets has been used for analysis of rare cell-surface markers76. Single-cell multiplexed mutation detection has been done by flow cell-surface markers76. Single-cell multiplexed mutation detection has been done by flow cytometry analysis of beads collected from an agarose-in-oil emulsion77. cytometry analysis of beads collected from an agarose-in-oil emulsion77.

2.3.4. Cell manipulation in chips 2.3.4. Cell manipulation in chips All the functions needed for cell manipulation, chemical experimentation and analysis can All the functions needed for cell manipulation, chemical experimentation and analysis can be incorporated into a single microfabricated device; this is a cost effective approach which be incorporated into a single microfabricated device; this is a cost effective approach which allows rapid screening and has the great advantages of minimal sample dilution and allows rapid screening and has the great advantages of minimal sample dilution and possibilities for automation33. Fluids are directed through an intricate system of channels, possibilities for automation33. Fluids are directed through an intricate system of channels, inlets and outlets, and cells and biomolecules can be fully or partly isolated in microvalves inlets and outlets, and cells and biomolecules can be fully or partly isolated in microvalves (volumes ranging from picoliters to tens of nanoliters) or physical traps. A characteristic (volumes ranging from picoliters to tens of nanoliters) or physical traps. A characteristic attribute of microfluidics is laminar flow, which makes diffusion virtually the only attribute of microfluidics is laminar flow, which makes diffusion virtually the only spontaneously-occurring mixing process. Hence, if mixing is needed, this issue needs to be spontaneously-occurring mixing process. Hence, if mixing is needed, this issue needs to be approached by incorporating active mixing modules into the chip. Initially, chip fabrication approached by incorporating active mixing modules into the chip. Initially, chip fabrication involved silicon and glass, using methods borrowed from the electronics industry. Today, involved silicon and glass, using methods borrowed from the electronics industry. Today, stainless steel and polymers such as PMMA and PDMS are also used78. Cells can be lysed stainless steel and polymers such as PMMA and PDMS are also used78. Cells can be lysed on-chip through electroporation, chemical breaking of the cell membrane, shearing against on-chip through electroporation, chemical breaking of the cell membrane, shearing against sharp nanostructures, acoustic lysis or laser illumination79. Purification of nucleic acids can sharp nanostructures, acoustic lysis or laser illumination79. Purification of nucleic acids can be done using functionalized silica beads for DNA and polyT beads for RNA purification be done using functionalized silica beads for DNA and polyT beads for RNA purification whereas protein metabolites are commonly analyzed immediately on the chip using e.g. whereas protein metabolites are commonly analyzed immediately on the chip using e.g. electrophoresis, chromatography or microbiosensors33,80. Cell culturing in microfluidics electrophoresis, chromatography or microbiosensors33,80. Cell culturing in microfluidics chips is customary, and in a recent paper Kang and coworkers used a multi-layer chips is customary, and in a recent paper Kang and coworkers used a multi-layer microfluidic array chip to culture and replate embryoid bodies by simply flipping the chip microfluidic array chip to culture and replate embryoid bodies by simply flipping the chip upside-down60. upside-down60. Various types of experiments can be performed in microfluidic chips. By capturing a single Various types of experiments can be performed in microfluidic chips. By capturing a single beating heart cell it was possible to carry out real-time measurement of ionic and beating heart cell it was possible to carry out real-time measurement of ionic and metabolomic flux, while also observing the cell by microscopy80. For nucleic acid analysis, metabolomic flux, while also observing the cell by microscopy80. For nucleic acid analysis, Quake and coworkers used a microchip to carry out multiplex PCR of 16S rRNA from Quake and coworkers used a microchip to carry out multiplex PCR of 16S rRNA from single bacterial cells for digital analysis of fluorescent signals and further sequencing single bacterial cells for digital analysis of fluorescent signals and further sequencing analysis81. They then developed a similar chip approach for efficient whole-genome analysis81. They then developed a similar chip approach for efficient whole-genome amplification in nanoliter volumes and subsequent phylogeny analysis of single bacterial amplification in nanoliter volumes and subsequent phylogeny analysis of single bacterial cells from different microbial communities20,82. A haplotype-resolved human genome was cells from different microbial communities20,82. A haplotype-resolved human genome was also recently obtained using microchips for separate amplification and sequencing of every also recently obtained using microchips for separate amplification and sequencing of every chromosome from a single cell83. chromosome from a single cell83. A number of cell-manipulating tools can be incorporated into microfluidic systems. The use A number of cell-manipulating tools can be incorporated into microfluidic systems. The use of optical fields to pick and position particles was pioneered in 1960 by Ashkin and of optical fields to pick and position particles was pioneered in 1960 by Ashkin and coworkers who developed the first optical tweezers device84. Today it is possible to trap, coworkers who developed the first optical tweezers device84. Today it is possible to trap, rotate and sort multiple cells based on size or refractive index, by creating a pattern of rotate and sort multiple cells based on size or refractive index, by creating a pattern of varying optical intensity over the sample area using lasers37,84,85. Optical methods have been varying optical intensity over the sample area using lasers37,84,85. Optical methods have been used both in open chips and in enclosed microchambers, however sample heating is a used both in open chips and in enclosed microchambers, however sample heating is a disadvantage84. Electrical fields may also be used to trap and manipulate even charge- disadvantage84. Electrical fields may also be used to trap and manipulate even charge- neutral cells, a technique called dielectrophoresis, invented in the 1950s86. The resolution is neutral cells, a technique called dielectrophoresis, invented in the 1950s86. The resolution is

10 10 Julia Sandberg Julia Sandberg low in comparison with optical methods and the system is non-flexible because the low in comparison with optical methods and the system is non-flexible because the electrodes are placed in fixed positions. Disadvantages for cell culturing include fluid electrodes are placed in fixed positions. Disadvantages for cell culturing include fluid heating and physical stress on cells87. In a recent study, dielectrophoresis was used to heating and physical stress on cells87. In a recent study, dielectrophoresis was used to capture and pull down one bacterial cell into each well of a chip, followed by lysis to release capture and pull down one bacterial cell into each well of a chip, followed by lysis to release the cellular contents55, and the technique can also be used for particle focusing in a micro- the cellular contents55, and the technique can also be used for particle focusing in a micro- chip flow cytometer88. Acoustic waves can be used for manipulating and capturing cells in chip flow cytometer88. Acoustic waves can be used for manipulating and capturing cells in cages in chips89, and with high reported cell viabilities the technique has been used to cause cages in chips89, and with high reported cell viabilities the technique has been used to cause controlled cell aggregation in the center of microwells for cell-cell interaction studies90. controlled cell aggregation in the center of microwells for cell-cell interaction studies90. Sound waves have also been utilized for cell depletion of a blood sample to enable analysis Sound waves have also been utilized for cell depletion of a blood sample to enable analysis of the plasma fraction for prostate-specific-antigen detection using an antibody array91. of the plasma fraction for prostate-specific-antigen detection using an antibody array91. This chapter has described tools for picking and placing cells for culturing and/or analysis, This chapter has described tools for picking and placing cells for culturing and/or analysis, and applications of some of these methods are demonstrated in Present Investigations. In and applications of some of these methods are demonstrated in Present Investigations. In nucleic acids research, which is the focus of this thesis, the next step often involves some nucleic acids research, which is the focus of this thesis, the next step often involves some kind of amplification, and this will be described in the following chapter. kind of amplification, and this will be described in the following chapter.

11 11

3. Getting enough - complexity 3. Getting enough - complexity reduction and amplification in DNA reduction and amplification in DNA samples samples

This chapter is about ways of getting what you want from a DNA sample. In many This chapter is about ways of getting what you want from a DNA sample. In many biomedical settings work often involves analysis and comparison of highly variable regions biomedical settings work often involves analysis and comparison of highly variable regions or positions in the genome, or comparison of candidate genes between healthy and diseased or positions in the genome, or comparison of candidate genes between healthy and diseased individuals. In these cases it is more efficient to direct the focus of the sequencing towards a individuals. In these cases it is more efficient to direct the focus of the sequencing towards a small fraction of the genome rather than to sequence it all. With this approach, the small fraction of the genome rather than to sequence it all. With this approach, the available sequencing capacity can be used to obtain high coverage of the selected genes, or available sequencing capacity can be used to obtain high coverage of the selected genes, or to increase the number of assayed individuals. There are a number of approaches for to increase the number of assayed individuals. There are a number of approaches for decreasing the complexity of the genome by enriching or amplifying targets of interest from decreasing the complexity of the genome by enriching or amplifying targets of interest from the three billion basepairs that are present in the haploid human genome and that make up the three billion basepairs that are present in the haploid human genome and that make up the background of the sample. The method of choice will depend on the number of genes the background of the sample. The method of choice will depend on the number of genes or regions as well as the number of samples included in the study. or regions as well as the number of samples included in the study. When starting from scarce material such as single, or a few, cells, amplification in some When starting from scarce material such as single, or a few, cells, amplification in some form is usually necessary either as a stand-alone procedure or as a pre-amplification step. form is usually necessary either as a stand-alone procedure or as a pre-amplification step. Retaining the relative quantities of components while amplifying is important for obtaining Retaining the relative quantities of components while amplifying is important for obtaining accurate information on, e.g., copy number variation in whole genome analysis, and also accurate information on, e.g., copy number variation in whole genome analysis, and also because a skewed representation of the original sample would impact on the amount of because a skewed representation of the original sample would impact on the amount of sequence data needed for coverage of the complete genome. Amplifying a DNA sample sequence data needed for coverage of the complete genome. Amplifying a DNA sample while introducing minimal bias and errors is therefore of the essence. Here, the most while introducing minimal bias and errors is therefore of the essence. Here, the most important methods for target selection, as well as for selective and global amplification of important methods for target selection, as well as for selective and global amplification of DNA samples, will be described. DNA samples, will be described. 12 12 Julia Sandberg Julia Sandberg

3.1. Targeted amplification using polymerase 3.1. Targeted amplification using polymerase

DNA polymerases are important enzymes in the cell that catalyze the incorporation of DNA polymerases are important enzymes in the cell that catalyze the incorporation of nucleotides into a growing strand of DNA, while using another strand as a template. These nucleotides into a growing strand of DNA, while using another strand as a template. These enzymes are widely used in different amplification strategies in laboratories worldwide in enzymes are widely used in different amplification strategies in laboratories worldwide in order to obtain multiple copies of a given target region. Different polymerases exhibit order to obtain multiple copies of a given target region. Different polymerases exhibit different features, for example some have proofreading activity - i.e. an incorrectly different features, for example some have proofreading activity - i.e. an incorrectly incorporated nucleotide can be removed from the end of the extended strand - and some incorporated nucleotide can be removed from the end of the extended strand - and some are thermo-stable, meaning that they can work at elevated temperatures92. are thermo-stable, meaning that they can work at elevated temperatures92.

3.1.1. Polymerase Chain Reaction (PCR) 3.1.1. Polymerase Chain Reaction (PCR) It all started with the Polymerase Chain Reaction (PCR), that made it possible for the first It all started with the Polymerase Chain Reaction (PCR), that made it possible for the first time to selectively amplify a piece of DNA in vitro, thus allowing the remarkable reduction time to selectively amplify a piece of DNA in vitro, thus allowing the remarkable reduction of complexity from the 3 billion basepairs, that make up the human genome, down to the of complexity from the 3 billion basepairs, that make up the human genome, down to the size of an ordinary PCR product i.e. a few hundred basepairs. PCR is now used routinely in size of an ordinary PCR product i.e. a few hundred basepairs. PCR is now used routinely in almost every biology lab and has enabled countless new techniques and findings since it was almost every biology lab and has enabled countless new techniques and findings since it was first invented. first invented. The history of the invention is interesting in itself. In 1971, Khorana and coworkers The history of the invention is interesting in itself. In 1971, Khorana and coworkers described most of the features that are essential to what we know today as PCR, but the described most of the features that are essential to what we know today as PCR, but the article lacked experimental evidence and received very limited attention93. Fourteen years article lacked experimental evidence and received very limited attention93. Fourteen years later, the method described by Kleppe et al was once more published, this time, in a more later, the method described by Kleppe et al was once more published, this time, in a more extensive way, by Kary Mullis and colleagues, who later received the Nobel price for this extensive way, by Kary Mullis and colleagues, who later received the Nobel price for this work94,95. When a thermostable polymerase was introduced, the potential of the reaction work94,95. When a thermostable polymerase was introduced, the potential of the reaction really escalated, since higher temperature for annealing and extension markedly increased really escalated, since higher temperature for annealing and extension markedly increased the specificity, sensitivity and yield of the reaction96. Since then, new and improved the specificity, sensitivity and yield of the reaction96. Since then, new and improved polymerases with markedly decreased error rates have been isolated or engineered97 and polymerases with markedly decreased error rates have been isolated or engineered97 and PCR has become an indispensible tool in most biological sciences. PCR has become an indispensible tool in most biological sciences.

3.1.2. Real-time, Digital, Single Cell and Long-range PCR 3.1.2. Real-time, Digital, Single Cell and Long-range PCR Due to the exponential nature of the amplification in PCR, the amount of PCR product Due to the exponential nature of the amplification in PCR, the amount of PCR product obtained at the end of the reaction does not reflect the initial number of target molecules in obtained at the end of the reaction does not reflect the initial number of target molecules in a sample, hence end point measurements can only distinguish a positive sample from a a sample, hence end point measurements can only distinguish a positive sample from a negative one. For quantification purposes, real-time PCR may be used where a reporter negative one. For quantification purposes, real-time PCR may be used where a reporter generates a fluorescent signal that indicates the exponential accumulation of product as the generates a fluorescent signal that indicates the exponential accumulation of product as the reaction proceeds. For absolute quantification a standard curve for that particular target reaction proceeds. For absolute quantification a standard curve for that particular target can be used. Reporters may be fluorescent dyes such as SYBR green, which fluoresce when can be used. Reporters may be fluorescent dyes such as SYBR green, which fluoresce when bound to dsDNA98, with the inherent limitation that primer-dimers also give fluorescent bound to dsDNA98, with the inherent limitation that primer-dimers also give fluorescent signals. For assays based on dsDNA-binding fluorophores, melting curve analysis is a useful signals. For assays based on dsDNA-binding fluorophores, melting curve analysis is a useful way to obtain information on product length and hence primer-dimer formation in the way to obtain information on product length and hence primer-dimer formation in the PCR reaction. PCR reaction. Another option is to use labeled oligonucleotides that show a change in fluorescent signal Another option is to use labeled oligonucleotides that show a change in fluorescent signal upon target sequence binding7. TaqMan probes are the most common detection probes upon target sequence binding7. TaqMan probes are the most common detection probes used in real-time PCR today. They are designed to hybridize within the target region and used in real-time PCR today. They are designed to hybridize within the target region and are dually labeled with a 5’ fluorophore and its quencher at the 3’ end. When the TaqMan are dually labeled with a 5’ fluorophore and its quencher at the 3’ end. When the TaqMan

13 13

probe is hybridized to a template molecule it forms a suitable substrate for the 5’ probe is hybridized to a template molecule it forms a suitable substrate for the 5’ exonuclease activity of the Taq polymerase, which hydrolyses the probe thus releasing the exonuclease activity of the Taq polymerase, which hydrolyses the probe thus releasing the fluorophore from the quencher, and so a fluorescent signal is emitted. The process repeats fluorophore from the quencher, and so a fluorescent signal is emitted. The process repeats in every cycle hence the fluorescence signal reflects the amount of specific product created in every cycle hence the fluorescence signal reflects the amount of specific product created in the reaction99,100. Another example is Molecular Beacons, which are dually labeled with a in the reaction99,100. Another example is Molecular Beacons, which are dually labeled with a fluorophore and its quencher at the two ends; when not bound to the target DNA, the fluorophore and its quencher at the two ends; when not bound to the target DNA, the molecule forms a stem-loop structure, which causes quenching of the fluorophore. molecule forms a stem-loop structure, which causes quenching of the fluorophore. However, when a Molecular Beacon is bound to its target sequence, quencher and However, when a Molecular Beacon is bound to its target sequence, quencher and fluorophore are sufficiently far away from each other for fluorescence to occur and an fluorophore are sufficiently far away from each other for fluorescence to occur and an amplification signal is emitted101. The advantage of specific detection probes is that target amplification signal is emitted101. The advantage of specific detection probes is that target product formation may be monitored instead of the total mass of dsDNA in the reaction. product formation may be monitored instead of the total mass of dsDNA in the reaction. Another advantage of using probes for detection is the possibility of amplifying multiple Another advantage of using probes for detection is the possibility of amplifying multiple targets in one sample using color-coded probes7. targets in one sample using color-coded probes7. Real-time PCR on cDNA has for a long time been the 'gold standard' in expression Real-time PCR on cDNA has for a long time been the 'gold standard' in expression profiling, where quantification of a particular transcript is based on the number of cycles profiling, where quantification of a particular transcript is based on the number of cycles required for the dye to reach a threshold. The method has also been used for single cell required for the dye to reach a threshold. The method has also been used for single cell gene expression analysis by immobilization of cDNA onto reusable beads102. Real time gene expression analysis by immobilization of cDNA onto reusable beads102. Real time PCR has also proven useful for quantification of libraries for high-throughput sequencing, PCR has also proven useful for quantification of libraries for high-throughput sequencing, where knowledge of the number of amplifiable molecules is important before a loading a where knowledge of the number of amplifiable molecules is important before a loading a sample onto a sequencer103,104. sample onto a sequencer103,104. Digital PCR, as implied by the name, relies on binary positive/negative results at the end- Digital PCR, as implied by the name, relies on binary positive/negative results at the end- point of the PCR and the reaction is normally done on a massive scale with up to one point of the PCR and the reaction is normally done on a massive scale with up to one million reactions in parallel105. By diluting a sample so that each chamber in a microfluidic million reactions in parallel105. By diluting a sample so that each chamber in a microfluidic chip is expected to contain one DNA molecule or less, an absolute readout of the number of chip is expected to contain one DNA molecule or less, an absolute readout of the number of amplifiable fragments in a sample per unit volume, as indicated by a fluorescent signal, is amplifiable fragments in a sample per unit volume, as indicated by a fluorescent signal, is obtained106. Analogous to real-time PCR, digital PCR can be used for expression analysis of obtained106. Analogous to real-time PCR, digital PCR can be used for expression analysis of transcription factors expressed at low levels in single cells107, quantification of rare SNPs transcription factors expressed at low levels in single cells107, quantification of rare SNPs and chromosomal copy number aberrations105 and library titration for high-throughput and chromosomal copy number aberrations105 and library titration for high-throughput sequencing108. Digital PCR may be slightly more sensitive and better at quantifying low sequencing108. Digital PCR may be slightly more sensitive and better at quantifying low copy number templates than real-time PCR105; however for many applications digital PCR copy number templates than real-time PCR105; however for many applications digital PCR is overkill, and the inaccessibility of the advanced instrumentation needed for digital PCR is overkill, and the inaccessibility of the advanced instrumentation needed for digital PCR as compared to real-time PCR makes the latter the method of choice for general users as compared to real-time PCR makes the latter the method of choice for general users today. today. PCR from single cells was first reported in the 1980s and was used early on for genetic PCR from single cells was first reported in the 1980s and was used early on for genetic analysis of sperm cells109 and to investigate rearrangements of the immune cells that give analysis of sperm cells109 and to investigate rearrangements of the immune cells that give rise to a diverse immune system110. Increased sensitivity is commonly obtained through rise to a diverse immune system110. Increased sensitivity is commonly obtained through decreased reaction volumes, careful optimization of reaction conditions and increased decreased reaction volumes, careful optimization of reaction conditions and increased number of reaction cycles111. In the field of prenatal diagnostics the method has played an number of reaction cycles111. In the field of prenatal diagnostics the method has played an important role in pre-implantation genetic analysis of a number of X-linked diseases and important role in pre-implantation genetic analysis of a number of X-linked diseases and single gene disorders (e.g. cystic fibrosis), and the approach is in clinical use in several single gene disorders (e.g. cystic fibrosis), and the approach is in clinical use in several countries today112. Miniaturization has been frequently utilized in recent years to achieve countries today112. Miniaturization has been frequently utilized in recent years to achieve increased sensitivity in amplification from minute amounts of DNA75,77,81. The level of increased sensitivity in amplification from minute amounts of DNA75,77,81. The level of amplification and the high sensitivity of the analysis that often follows amplification of amplification and the high sensitivity of the analysis that often follows amplification of minute amounts of nucleic acids increases the risk of amplifying background contaminants minute amounts of nucleic acids increases the risk of amplifying background contaminants from sample and reagents. Approaches to address this issue include UV treatment and/or from sample and reagents. Approaches to address this issue include UV treatment and/or filtering of reagents, and detergent and hypochlorite cleaning of tools and glassware113. filtering of reagents, and detergent and hypochlorite cleaning of tools and glassware113.

14 14 Julia Sandberg Julia Sandberg

Long-range PCR (LR-PCR) was, as the name suggests, developed to increase the length of Long-range PCR (LR-PCR) was, as the name suggests, developed to increase the length of the amplified fragment, which was previously limited to about 5 kb, by using a two- the amplified fragment, which was previously limited to about 5 kb, by using a two- polymerase system to obtain optimal levels of processive polymerase activity while adding polymerase system to obtain optimal levels of processive polymerase activity while adding proofreading114. Increased primer lengths, lower pH and the use of additives such as proofreading114. Increased primer lengths, lower pH and the use of additives such as glycerol that would enable shorter denaturation times and prevent nicking are strategies glycerol that would enable shorter denaturation times and prevent nicking are strategies commonly applied to obtain amplification of fragments up to 42 kb long92,114. commonly applied to obtain amplification of fragments up to 42 kb long92,114. Fragmentation of LR-amplicons followed by library preparation for massive parallel Fragmentation of LR-amplicons followed by library preparation for massive parallel sequencing (MPS) is commonly used115-119 and applications of this as well as common sequencing (MPS) is commonly used115-119 and applications of this as well as common amplicon sequencing by MPS will be discussed in more detail in Chapter 4.6.2 (Targeted amplicon sequencing by MPS will be discussed in more detail in Chapter 4.6.2 (Targeted resequencing). resequencing).

3.1.3. Multiplexity limitations of PCR and some ways around 3.1.3. Multiplexity limitations of PCR and some ways around them them For most applications, increased throughput and multiplexing is desirable in order to For most applications, increased throughput and multiplexing is desirable in order to reduce turn-around time, save money and minimize the amount of input material required. reduce turn-around time, save money and minimize the amount of input material required. PCR amplification of single loci or fragments is generally robust; however when it comes to PCR amplification of single loci or fragments is generally robust; however when it comes to multiplexing, limitations arise. Several factors, such as primer-dimer formation, unspecific multiplexing, limitations arise. Several factors, such as primer-dimer formation, unspecific product formation and amplification bias, limit the degree to which multiplexing can be product formation and amplification bias, limit the degree to which multiplexing can be used. used. As the number of targets amplified in parallel increases, so does the risk that two primers As the number of targets amplified in parallel increases, so does the risk that two primers will cause primer-dimer formation by priming each other through 3’-end complementarity. will cause primer-dimer formation by priming each other through 3’-end complementarity. Primer-dimers are problematic since due to their short length they are efficiently amplified Primer-dimers are problematic since due to their short length they are efficiently amplified and because of the competitive nature of the PCR reaction they therefore threaten to and because of the competitive nature of the PCR reaction they therefore threaten to outcompete the target for polymerase and reagents, thus taking over the whole outcompete the target for polymerase and reagents, thus taking over the whole reaction120,121. Several protocols addressing primer-dimer formation have been published reaction120,121. Several protocols addressing primer-dimer formation have been published that point towards careful primer design, titration of primer concentrations, optimization of that point towards careful primer design, titration of primer concentrations, optimization of cycling conditions120, increased amount of template122 and lower number of amplification cycling conditions120, increased amount of template122 and lower number of amplification cycles as ways to avoid primer-dimer formation123. cycles as ways to avoid primer-dimer formation123. Another difficulty of multiplex PCR, which is harder to tackle, is unspecific product Another difficulty of multiplex PCR, which is harder to tackle, is unspecific product formation. When targeting several regions in a PCR reaction and hence increasing the formation. When targeting several regions in a PCR reaction and hence increasing the number of primers in the tube, the chance that two primers will hybridize to unintended number of primers in the tube, the chance that two primers will hybridize to unintended sequences thus creating nonspecific amplification products, increases steeply124,125, making sequences thus creating nonspecific amplification products, increases steeply124,125, making primer design of a highly multiplex reaction a walk in a mine field. Attempts to address this primer design of a highly multiplex reaction a walk in a mine field. Attempts to address this issue have been made by introducing nested PCR, where two sets of primers are used in issue have been made by introducing nested PCR, where two sets of primers are used in two consecutive PCR amplification steps. Initial amplification of a region containing the two consecutive PCR amplification steps. Initial amplification of a region containing the target area is followed by amplification using a second pair of primers, which hybridize to target area is followed by amplification using a second pair of primers, which hybridize to the first PCR product inside the first primer pair and thus amplify the target sequence126. the first PCR product inside the first primer pair and thus amplify the target sequence126. Nested PCR has been an effective approach for lower degrees of multiplexity127 but for Nested PCR has been an effective approach for lower degrees of multiplexity127 but for higher levels of multiplexity the above-mentioned issues remain unresolved. higher levels of multiplexity the above-mentioned issues remain unresolved. Yet another issue of multiplex PCR is a result of the competitive nature of PCR Yet another issue of multiplex PCR is a result of the competitive nature of PCR amplification, which causes unequal amplification of different target regions and even loss amplification, which causes unequal amplification of different target regions and even loss of some targets when the degree of multiplexity is increased. These differences may occur of some targets when the degree of multiplexity is increased. These differences may occur due to different primer hybridization efficiencies, product lengths or target sequence due to different primer hybridization efficiencies, product lengths or target sequence secondary structures. Due to the exponential nature of the amplification, differences in the secondary structures. Due to the exponential nature of the amplification, differences in the yields of unequally amplified fragments, as well as errors introduced in early cycles, are yields of unequally amplified fragments, as well as errors introduced in early cycles, are

15 15

propagated in every cycle. To some extent thorough optimization of reaction conditions propagated in every cycle. To some extent thorough optimization of reaction conditions and primer design can discourage artifacts from occurring and PCR reactions of up to and primer design can discourage artifacts from occurring and PCR reactions of up to 1000-plex have been reported128. However, for most target regions this would probably not 1000-plex have been reported128. However, for most target regions this would probably not be possible, hence multiplex PCR of more than 10-20 loci is rare128,129. be possible, hence multiplex PCR of more than 10-20 loci is rare128,129. Several approaches have been employed to specifically amplify many targets in parallel by Several approaches have been employed to specifically amplify many targets in parallel by PCR. Introduction of universal amplification tags is a common theme in amplification since PCR. Introduction of universal amplification tags is a common theme in amplification since it enables amplification of an entire complex library using a single primer pair. The it enables amplification of an entire complex library using a single primer pair. The approach can be used in a two-step PCR where each primer carries 5’ universal tail approach can be used in a two-step PCR where each primer carries 5’ universal tail sequences, which are introduced in the first amplification cycles, after which amplification is sequences, which are introduced in the first amplification cycles, after which amplification is carried out using one universal primer pair only. This method was first described in 1996, carried out using one universal primer pair only. This method was first described in 1996, when it was demonstrated by 26-plex amplification130. In a similar two-stage PCR protocol, when it was demonstrated by 26-plex amplification130. In a similar two-stage PCR protocol, reduced primer-dimer formation was approached by designing the universal primer pair so reduced primer-dimer formation was approached by designing the universal primer pair so that potential primer-dimers were folded into an unamplifiable stem-loop structure121. The that potential primer-dimers were folded into an unamplifiable stem-loop structure121. The concept of introducing universal amplification handles is a common theme in amplification concept of introducing universal amplification handles is a common theme in amplification and with the advent of high-throughput sequencing, the concept has become a routine and with the advent of high-throughput sequencing, the concept has become a routine method for amplification. method for amplification. Another approach involves complete or partial physical separation of PCR reactions to Another approach involves complete or partial physical separation of PCR reactions to avoid competition and cross-reaction of multiple primers in the same reaction. avoid competition and cross-reaction of multiple primers in the same reaction. Immobilizing PCR primers on a solid support in solid-phase PCR is a way to limit primer- Immobilizing PCR primers on a solid support in solid-phase PCR is a way to limit primer- dimer formation and background amplification by physical isolation of each primer pair in dimer formation and background amplification by physical isolation of each primer pair in a gel matrix or on the surface of beads or chips131. The efficiency of solid-phase PCR is a gel matrix or on the surface of beads or chips131. The efficiency of solid-phase PCR is usually lower than that of in-solution reactions. A combination of solid-phase PCR and usually lower than that of in-solution reactions. A combination of solid-phase PCR and “universal tailing” has been used, in which introduction of a general PCR primer pair in “universal tailing” has been used, in which introduction of a general PCR primer pair in the initial cycles on a solid support is followed by a more efficient in-solution PCR the initial cycles on a solid support is followed by a more efficient in-solution PCR amplification using a universal primer pair. The method is named MegaPlex PCR and it amplification using a universal primer pair. The method is named MegaPlex PCR and it has been used on 75 target regions in parallel132. Bridge amplification, where universal has been used on 75 target regions in parallel132. Bridge amplification, where universal primers are immobilized on a chip surface is used in preparation for sequencing by Illumina primers are immobilized on a chip surface is used in preparation for sequencing by Illumina and will be described in more detail in chapter 4. PCR reactions may also be separated and will be described in more detail in chapter 4. PCR reactions may also be separated physically in the chambers of a microchip. The Access Array platform, which was recently physically in the chambers of a microchip. The Access Array platform, which was recently released by Fluidigm, enables 48 separate amplifications of choice (PCR or LR-PCR) to be released by Fluidigm, enables 48 separate amplifications of choice (PCR or LR-PCR) to be carried out in nanoliter volumes133. carried out in nanoliter volumes133. Yet another method of carrying out massive numbers of single-locus PCRs by physical Yet another method of carrying out massive numbers of single-locus PCRs by physical separation is in an emulsion, which was briefly described in Chapter 2. Through separation is in an emulsion, which was briefly described in Chapter 2. Through compartmentalization of each primer pair into separate emulsion droplets, competition compartmentalization of each primer pair into separate emulsion droplets, competition between different primer pairs is prohibited and each PCR reaction is allowed to saturate between different primer pairs is prohibited and each PCR reaction is allowed to saturate without competition134. Primer pairs for each target are separately introduced into droplets, without competition134. Primer pairs for each target are separately introduced into droplets, which are then pooled to generate a primer library after which gDNA is introduced into the which are then pooled to generate a primer library after which gDNA is introduced into the droplets for emulsion PCR135. The approach has been demonstrated by amplification of droplets for emulsion PCR135. The approach has been demonstrated by amplification of 4000 targets with a specificity of 84%136 and in a test for future parents to determine 4000 targets with a specificity of 84%136 and in a test for future parents to determine carriership of a number of recessive genetic disorders137. Emulsions with up to 20,000 carriership of a number of recessive genetic disorders137. Emulsions with up to 20,000 different primer pairs designed towards the customer’s regions of interest are now available different primer pairs designed towards the customer’s regions of interest are now available for purchase through RainDance Technologies138. Emulsion PCR is used for sample for purchase through RainDance Technologies138. Emulsion PCR is used for sample preparation in several massive sequencing techniques and will be described in more detail preparation in several massive sequencing techniques and will be described in more detail in chapter 4. in chapter 4.

16 16 Julia Sandberg Julia Sandberg

3.1.4. Isothermal targeted amplification strategies 3.1.4. Isothermal targeted amplification strategies In living cells, DNA helicase plays an important role in DNA replication by separating two In living cells, DNA helicase plays an important role in DNA replication by separating two complementary DNA strands so that the polymerase can start copying the template. In complementary DNA strands so that the polymerase can start copying the template. In vitro, the enzyme can be used in a reaction analogous to PCR but where heat denaturation vitro, the enzyme can be used in a reaction analogous to PCR but where heat denaturation has been replaced by enzymatic DNA separation. This method is called Helicase has been replaced by enzymatic DNA separation. This method is called Helicase Dependent Amplification (HDA) and it has the great advantage of being isothermal, which Dependent Amplification (HDA) and it has the great advantage of being isothermal, which makes the demands on equipment low. HDA is therefore a natural choice in the field of makes the demands on equipment low. HDA is therefore a natural choice in the field of development of point-of-care diagnostics devices139. Detection and readout in HDA can be development of point-of-care diagnostics devices139. Detection and readout in HDA can be done in real-time as described for real-time PCR, or by using lateral flow or done in real-time as described for real-time PCR, or by using lateral flow or biosensors139,140. By using a thermostable helicase, the reaction can be done at elevated biosensors139,140. By using a thermostable helicase, the reaction can be done at elevated temperature (thermophilic helicase dependent reaction, tHDA) with increased specificity as temperature (thermophilic helicase dependent reaction, tHDA) with increased specificity as a result140. tHDA protocols have been developed for detection of several pathogens a result140. tHDA protocols have been developed for detection of several pathogens including Herpes virus139, gonorrhea141 and tuberculosis140. The product size of ordinary including Herpes virus139, gonorrhea141 and tuberculosis140. The product size of ordinary tHDA is limited to 400 bp, presumably because of re-annealing of separated strands before tHDA is limited to 400 bp, presumably because of re-annealing of separated strands before the primers have been extended, or because the helicase interferes with the activity of the the primers have been extended, or because the helicase interferes with the activity of the polymerase by knocking it off the template. By fusing a polymerase and a helicase together polymerase by knocking it off the template. By fusing a polymerase and a helicase together into a helimerase, amplification of products of up to 2.3kb has been demonstrated142. into a helimerase, amplification of products of up to 2.3kb has been demonstrated142.

3.2. Using ligation for increased specificity 3.2. Using ligation for increased specificity

DNA ligases are enzymes that can join oligonucleotides that are hybridized adjacent to DNA ligases are enzymes that can join oligonucleotides that are hybridized adjacent to each other on a DNA strand by forming a phosphodiester bond, provided that the each other on a DNA strand by forming a phosphodiester bond, provided that the nucleotides at the junction are perfectly matched to their target143. By introducing a nucleotides at the junction are perfectly matched to their target143. By introducing a selective ligation step into a polymerase-based amplification, an additional level of selective ligation step into a polymerase-based amplification, an additional level of specificity can be obtained144. specificity can be obtained144.

3.2.1. Ligase chain rection (LCR) 3.2.1. Ligase chain rection (LCR) In a reaction similar to PCR, thermostable ligases can be used in amplification approaches In a reaction similar to PCR, thermostable ligases can be used in amplification approaches with the aim of reducing sample complexity while amplifying one or a few target regions. with the aim of reducing sample complexity while amplifying one or a few target regions. Here, amplification is done by ligases joining two primers that are hybridized adjacent to Here, amplification is done by ligases joining two primers that are hybridized adjacent to each other on the same strand of the target DNA. By placing the last base of the upstream each other on the same strand of the target DNA. By placing the last base of the upstream primer over a variable position, LCR allows discrimination of SNPs, since the enzyme will primer over a variable position, LCR allows discrimination of SNPs, since the enzyme will only join perfectly matched primers. In total, four primers are used, hybridizing to either only join perfectly matched primers. In total, four primers are used, hybridizing to either strand in pairs, and as more template is created in each cycle the amplification is strand in pairs, and as more template is created in each cycle the amplification is exponential145. Although LCR is known to be highly specific, background ligation still exponential145. Although LCR is known to be highly specific, background ligation still occurs. To address this issue Gap-LCR was developed. As the name suggests, a gap occurs. To address this issue Gap-LCR was developed. As the name suggests, a gap between the two primers of each pair is introduced and so a template-dependent extension between the two primers of each pair is introduced and so a template-dependent extension is required to fill in the gap, before the primers can be joined. By restricting the number of is required to fill in the gap, before the primers can be joined. By restricting the number of different nucleotides added in the reaction, specificity can be increased further. Readout has different nucleotides added in the reaction, specificity can be increased further. Readout has been achieved using gel electrophoresis and labeled primers or nucleotides; for real-time been achieved using gel electrophoresis and labeled primers or nucleotides; for real-time monitoring, fluorescent dyes or labeled probes can be used. LCR and variants thereof, most monitoring, fluorescent dyes or labeled probes can be used. LCR and variants thereof, most prominently real-time Gap-LCR, have proven useful for genotyping pools of genomic prominently real-time Gap-LCR, have proven useful for genotyping pools of genomic 17 17

DNA, and are reportedly good at evaluating the frequency of rare SNPs in a cell DNA, and are reportedly good at evaluating the frequency of rare SNPs in a cell population146. A variant of the method has recently been used to detect and quantify a population146. A variant of the method has recently been used to detect and quantify a disease-causing mutation in a genomic DNA pool147. The relatively low amplification disease-causing mutation in a genomic DNA pool147. The relatively low amplification efficiency of ligation-based amplification may be addressed by an initial PCR step to efficiency of ligation-based amplification may be addressed by an initial PCR step to achieve greater sensitivity in the assay145. achieve greater sensitivity in the assay145. A linear version of this technique, ligase detection reaction (LDR), can be done using one A linear version of this technique, ligase detection reaction (LDR), can be done using one pair of primers hybridizing adjacent to each other on the target strand145. In a recently pair of primers hybridizing adjacent to each other on the target strand145. In a recently published article, a PCR/LDR approach was used to detect rare point mutations in a published article, a PCR/LDR approach was used to detect rare point mutations in a human DNA sample. An initial PCR step was followed by LDR with biotinylated 3’ human DNA sample. An initial PCR step was followed by LDR with biotinylated 3’ primers and 5’ primers containing a tag sequence. Detection was done by binding of primers and 5’ primers containing a tag sequence. Detection was done by binding of products to tag-complementary spots and visualizing them by streptavidin-coupled silver products to tag-complementary spots and visualizing them by streptavidin-coupled silver staining148. Generally, ligation is less efficient at each step than polymerization, and the staining148. Generally, ligation is less efficient at each step than polymerization, and the amount of material obtained from ligation-based strategies is a limitation of these methods. amount of material obtained from ligation-based strategies is a limitation of these methods.

3.2.2. Multiplex Ligation Dependent Probe Amplification 3.2.2. Multiplex Ligation Dependent Probe Amplification (MLPA) (MLPA) A way to overcome the low product output of ligase-dependent amplification strategies is to A way to overcome the low product output of ligase-dependent amplification strategies is to follow them up with a PCR using a universal primer pair. Ligation of one unique primer follow them up with a PCR using a universal primer pair. Ligation of one unique primer pair per target, primers that hybridize adjacent to each other, in a linear fashion, allows pair per target, primers that hybridize adjacent to each other, in a linear fashion, allows introduction of universal amplification primers into the 5’ ends of ligated primer products. introduction of universal amplification primers into the 5’ ends of ligated primer products. These universal primers enable multiplex PCR amplification of successfully ligated probes. These universal primers enable multiplex PCR amplification of successfully ligated probes. Readout is commonly done by capillary electrophoresis and the unique length of each Readout is commonly done by capillary electrophoresis and the unique length of each amplified fragment identifies the targeted regions. The reaction is called multiplex ligation amplified fragment identifies the targeted regions. The reaction is called multiplex ligation dependent probe amplification (MLPA) and its strength is that by using one universal PCR dependent probe amplification (MLPA) and its strength is that by using one universal PCR primer pair the competitive nature of PCR in multiplex amplification is decreased primer pair the competitive nature of PCR in multiplex amplification is decreased substantially. Since the method is based on ligation, it can also be used for detection of substantially. Since the method is based on ligation, it can also be used for detection of single nucleotide changes. Another benefit is the linearity of the initial ligation step, which single nucleotide changes. Another benefit is the linearity of the initial ligation step, which enables comparative quantification of the different target regions. This makes MLPA useful enables comparative quantification of the different target regions. This makes MLPA useful for analysis of copy number variation, as demonstrated for up to 50 genomic DNA for analysis of copy number variation, as demonstrated for up to 50 genomic DNA sequences from as little as 20 ng of gDNA149, it and has also been used to detect sequences from as little as 20 ng of gDNA149, it and has also been used to detect chromosomal abnormalities in melanoma150. chromosomal abnormalities in melanoma150.

3.2.3. Golden Gate 3.2.3. Golden Gate A very common method for multiplex amplification for genotyping purposes, although not A very common method for multiplex amplification for genotyping purposes, although not for massive sequencing, is the Golden Gate assay. This assay is based on Allele-Specific for massive sequencing, is the Golden Gate assay. This assay is based on Allele-Specific Extension151, in which two primers that hybridize upstream of, and end in, each SNP Extension151, in which two primers that hybridize upstream of, and end in, each SNP position, are used, matching the two different genotypes at that locus. The two allele- position, are used, matching the two different genotypes at that locus. The two allele- specific primers contain a unique amplification handle in their 5’ end and the perfectly- specific primers contain a unique amplification handle in their 5’ end and the perfectly- matched primer will be extended and then ligated onto a third annealed primer. The third matched primer will be extended and then ligated onto a third annealed primer. The third probe is a locus-specific oligonucleotide since it contains a SNP-specific address tag and a probe is a locus-specific oligonucleotide since it contains a SNP-specific address tag and a universal amplification handle in its 3’ end. PCR is carried out with three universal primers, universal amplification handle in its 3’ end. PCR is carried out with three universal primers, two color-coded forward primers matching one SNP-specific sequence in the Allele-specific two color-coded forward primers matching one SNP-specific sequence in the Allele-specific oligonucleotide handle each, and one reverse primer that is complementary to the universal oligonucleotide handle each, and one reverse primer that is complementary to the universal handle at the locus-specific oligonucleotide. Analysis is done by hybridizing the sample to a handle at the locus-specific oligonucleotide. Analysis is done by hybridizing the sample to a random array of beads with different address tags that are tucked into wells on a chip152. random array of beads with different address tags that are tucked into wells on a chip152. 18 18 Julia Sandberg Julia Sandberg

Determination of the identity (i.e. what SNP does the address tag code for) of each bead in Determination of the identity (i.e. what SNP does the address tag code for) of each bead in the array, is carried out by repeated hybridization of fluorescently labeled probes, imaging the array, is carried out by repeated hybridization of fluorescently labeled probes, imaging and probe removal, to obtain a combinatorial code representing the identity of each and probe removal, to obtain a combinatorial code representing the identity of each bead153. Decoding is followed by hybridization of amplified sample for genotype bead153. Decoding is followed by hybridization of amplified sample for genotype interpretation. One color indicates a sample from a homozygote, and dual coloring signifies interpretation. One color indicates a sample from a homozygote, and dual coloring signifies a heterozygous SNP152. Random bead arrays are available through Illumina and presently a heterozygous SNP152. Random bead arrays are available through Illumina and presently supports 96-3072 SNPs per sample and 12 to 96 samples in parallel154. supports 96-3072 SNPs per sample and 12 to 96 samples in parallel154.

3.2.4. Tri-nucleotide Threading (TnT) 3.2.4. Tri-nucleotide Threading (TnT) Tri-nucleotide threading (TnT) is an amplification method that was developed at the Royal Tri-nucleotide threading (TnT) is an amplification method that was developed at the Royal Institute of Technology (KTH) in order to increase the degree of multiplexity that is Institute of Technology (KTH) in order to increase the degree of multiplexity that is possible155. In an initial linear amplification step, polymerase extension bridges a gap possible155. In an initial linear amplification step, polymerase extension bridges a gap between two primers that are hybridized onto the same strand and ligation seals the two between two primers that are hybridized onto the same strand and ligation seals the two probes, of which one is biotinylated. Cleanup, using streptavidin-coupled paramagnetic probes, of which one is biotinylated. Cleanup, using streptavidin-coupled paramagnetic beads, is followed by universal amplification to amplify ligated threads in parallel. By beads, is followed by universal amplification to amplify ligated threads in parallel. By designing the primer locations so that each gap contains only three types of nucleotides and designing the primer locations so that each gap contains only three types of nucleotides and adding only these to the reaction, the specificity of the assay is increased further. The adding only these to the reaction, the specificity of the assay is increased further. The technique has been used for expression profiling156 and genotyping from 1200 cells155 using technique has been used for expression profiling156 and genotyping from 1200 cells155 using chip readout; for analysis of short tandem repeats by capillary electrophoresis157; and for chip readout; for analysis of short tandem repeats by capillary electrophoresis157; and for determination of allele frequencies of two sets of 147 SNPs in a cohort of hundreds of determination of allele frequencies of two sets of 147 SNPs in a cohort of hundreds of individuals by Roche/454 sequencing158. individuals by Roche/454 sequencing158.

3.3. Selective Circularization 3.3. Selective Circularization 3.3.1. Padlock probes 3.3.1. Padlock probes By joining two specific ligation primers using a linker that allows primer folding, a ligation- By joining two specific ligation primers using a linker that allows primer folding, a ligation- based circularization reaction can be obtained with only one probe per target sequence. based circularization reaction can be obtained with only one probe per target sequence. This ingenious design decreases the number of potential byproducts that can be formed This ingenious design decreases the number of potential byproducts that can be formed with increasing multiplexity. Furthermore, as the ends are joined together while hybridized with increasing multiplexity. Furthermore, as the ends are joined together while hybridized to the target sequence, the circular probe is interlocked with its target, hence the name to the target sequence, the circular probe is interlocked with its target, hence the name Padlock probes. This allows rigorous washing schemes to be used in situ while containing Padlock probes. This allows rigorous washing schemes to be used in situ while containing the probes124,159. Padlock probes were first described in the literature in 1994 and since then the probes124,159. Padlock probes were first described in the literature in 1994 and since then the technique has been refined and utilized in various genotyping and mRNA studies160. the technique has been refined and utilized in various genotyping and mRNA studies160. Using exonuclease degradation of linear non-ligated Padlock probes, cross-reactivity can be Using exonuclease degradation of linear non-ligated Padlock probes, cross-reactivity can be kept to a minimum159. Amplification of the circularized probes can been done by PCR160 or kept to a minimum159. Amplification of the circularized probes can been done by PCR160 or Rolling Circle Amplification (RCA). The latter method takes advantage of the high Rolling Circle Amplification (RCA). The latter method takes advantage of the high processivity and strand displacement activity of the enzyme phi29 to create a very long processivity and strand displacement activity of the enzyme phi29 to create a very long DNA product containing a number of copies of the circularized fragment. Amplification DNA product containing a number of copies of the circularized fragment. Amplification products may be fluorescently labeled for in situ detection or chip readout160-162 or labeled products may be fluorescently labeled for in situ detection or chip readout160-162 or labeled with magnetic nano-sized beads and detected based on response to a magnetic field in a with magnetic nano-sized beads and detected based on response to a magnetic field in a portable device163. Padlock probes have been used for in situ detection and genotyping of portable device163. Padlock probes have been used for in situ detection and genotyping of mitochondrial DNA164 and mRNA molecules162 as well as for analysis of infection in mitochondrial DNA164 and mRNA molecules162 as well as for analysis of infection in cattle165 and more recently for CpG island methylation analysis161. cattle165 and more recently for CpG island methylation analysis161.

19 19

3.3.2. Molecular Inversion probes (MIP) 3.3.2. Molecular Inversion probes (MIP) By introducing a gap between adjacent primer ends, which required polymerase fill in By introducing a gap between adjacent primer ends, which required polymerase fill in before ligation, Padlock probes were transformed into Molecular Inversion probes (MIP)143. before ligation, Padlock probes were transformed into Molecular Inversion probes (MIP)143. Release of circularized products from the template by cleavage at a uracil site in the linker Release of circularized products from the template by cleavage at a uracil site in the linker region is followed by PCR amplification with primers that hybridize to the linker part of the region is followed by PCR amplification with primers that hybridize to the linker part of the circularized fragments. MIPs have been used to capture large numbers of target regions in circularized fragments. MIPs have been used to capture large numbers of target regions in parallel prior to analysis on microarrays143 or by massively parallel sequencing166,167 for parallel prior to analysis on microarrays143 or by massively parallel sequencing166,167 for analysis of SNPs in genomic DNA, and identification of RNA editing sites168 or alternative analysis of SNPs in genomic DNA, and identification of RNA editing sites168 or alternative splicing events169. The methodology has received extensive attention due to its impressive splicing events169. The methodology has received extensive attention due to its impressive multiplexing capabilities, high specificity and low input DNA requirements167. The initial multiplexing capabilities, high specificity and low input DNA requirements167. The initial number of 1000 targets in one tube143 has since been greatly exceeded through capture of number of 1000 targets in one tube143 has since been greatly exceeded through capture of 10,000 and 50,080 target regions166,167. Direct sequencing of MIPs by the introduction of 10,000 and 50,080 target regions166,167. Direct sequencing of MIPs by the introduction of Illumina handles in the final PCR step has been carried out in a slightly modified protocol Illumina handles in the final PCR step has been carried out in a slightly modified protocol requiring sub-micrograms of input material. High sensitivity was demonstrated by ~98% of requiring sub-micrograms of input material. High sensitivity was demonstrated by ~98% of all targets being captured per reaction167. In a recently optimized protocol called Long- all targets being captured per reaction167. In a recently optimized protocol called Long- Padlock, the maximum length of the capture regions was increased from the 191 bp Padlock, the maximum length of the capture regions was increased from the 191 bp achievable in the ordinary MIP approach to 546 bp. As is commonly seen for enzymatic achievable in the ordinary MIP approach to 546 bp. As is commonly seen for enzymatic amplification strategies, the limitation of MIPs appears to lie in bias of capture170. amplification strategies, the limitation of MIPs appears to lie in bias of capture170. In methods where complex oligonucleotide libraries are required cost is usually a limiting In methods where complex oligonucleotide libraries are required cost is usually a limiting factor. For several liquid-phase capture methods such as MIPs, this problem has been factor. For several liquid-phase capture methods such as MIPs, this problem has been solved by using probe release from a programmable microarray to obtain a vast number of solved by using probe release from a programmable microarray to obtain a vast number of oligonucleotide probes cheaply166. Microarrays from vendors such as Affymetrix, Agilent oligonucleotide probes cheaply166. Microarrays from vendors such as Affymetrix, Agilent and NimbleGen contain up to several million of customizable primers between 25 and 200 and NimbleGen contain up to several million of customizable primers between 25 and 200 bp long171. bp long171.

3.3.3. Selector 3.3.3. Selector Selector is a highly specific method, based on circularization, in which the actual targets, Selector is a highly specific method, based on circularization, in which the actual targets, rather than the probes, are circularized. Starting with digestion of gDNA, using one or rather than the probes, are circularized. Starting with digestion of gDNA, using one or several restriction enzymes, selector probes are hybridized to specific ends of the target; several restriction enzymes, selector probes are hybridized to specific ends of the target; fragments of interest are then folded and circularized through a process of gap fill and fragments of interest are then folded and circularized through a process of gap fill and ligation. As for previously described strategies involving circularization, uncircularized or ligation. As for previously described strategies involving circularization, uncircularized or erroneously ligated linear fragments are digested by exonuclease172 before the final erroneously ligated linear fragments are digested by exonuclease172 before the final amplification step, which may be PCR with primers specific for the common part of the amplification step, which may be PCR with primers specific for the common part of the circularized fragments172,173 or RCA174,175. Recently, the latter protocol has been shown to circularized fragments172,173 or RCA174,175. Recently, the latter protocol has been shown to reduce the enrichment bias in an experiment when capturing 501 exons using 1883 selector reduce the enrichment bias in an experiment when capturing 501 exons using 1883 selector probes in a single reaction175. probes in a single reaction175.

20 20 Targeted amplification strategies Targeted amplification strategies Tailed PCR Nested PCR Tailed PCR Nested PCR

Universal PCR Universal PCR

Ligase Chain Reaction Gapped Ligase Chain Reaction Multiplex Ligation Dependent Ligase Chain Reaction Gapped Ligase Chain Reaction Multiplex Ligation Dependent Probe Amplification Probe Amplification

2 primer pairs 2 primer pairs 2 primer pairs 2 primer pairs 1 primer pair 1 primer pair X X X X X X

Universal PCR Universal PCR Exponential Exponential Exponential Exponential amplification amplification amplification amplification

Tri-nucleotide Threading Golden Gate Tri-nucleotide Threading Golden Gate

Allele-specific Allele-specific Linear X Linear X amplification extension amplification extension X X X X

Linear Linear Universal PCR amplification Universal PCR amplification

Universal PCR Universal PCR

Padlock probes Molecular Inversion probes Selector Padlock probes Molecular Inversion probes Selector

digested DNA digested DNA

cleavage cleavage

Cleavage Cleavage

21 21

3.4. Targeted capture by hybridization 3.4. Targeted capture by hybridization

Assays based solely on hybridization offer by far the highest multiplexity level and can be Assays based solely on hybridization offer by far the highest multiplexity level and can be done either on a solid support or in solution. Custom-designed planar arrays can be used to done either on a solid support or in solution. Custom-designed planar arrays can be used to enrich for regions of interest from a linker-ligated DNA library, through a simple but enrich for regions of interest from a linker-ligated DNA library, through a simple but protracted hybridization – washing – release by heat protocol, which is followed by protracted hybridization – washing – release by heat protocol, which is followed by universal PCR amplification of enriched library material, prior to sequencing. This was first universal PCR amplification of enriched library material, prior to sequencing. This was first demonstrated in 2007176,177. One way of increasing the number of samples enriched in demonstrated in 2007176,177. One way of increasing the number of samples enriched in array-based strategies is by barcoding individual DNA libraries prior to enrichment178,179; array-based strategies is by barcoding individual DNA libraries prior to enrichment178,179; alternatively, arrays may be reused177. alternatively, arrays may be reused177. Capture may also be done in solution using biotinylated DNA or RNA180 oligonucleotides Capture may also be done in solution using biotinylated DNA or RNA180 oligonucleotides and binding to streptavidin-covered beads, target elution and universal PCR amplification. and binding to streptavidin-covered beads, target elution and universal PCR amplification. Solution-based capture has lower reagent costs, a lower DNA requirement (i.e. 10-20 µg Solution-based capture has lower reagent costs, a lower DNA requirement (i.e. 10-20 µg gDNA for an array171,177 0.5-3 µg in solution181) and is easily automated181. Hence, in- gDNA for an array171,177 0.5-3 µg in solution181) and is easily automated181. Hence, in- solution capture is currently the method of choice and several vendors, such as Agilent and solution capture is currently the method of choice and several vendors, such as Agilent and NimbleGen, provide kits targeting whole exomes, or parts thereof, and also offer custom NimbleGen, provide kits targeting whole exomes, or parts thereof, and also offer custom made kits. Uniformity is generally higher in hybrid capture methods than in amplification made kits. Uniformity is generally higher in hybrid capture methods than in amplification based methods, and this affects the number of reads required to obtain sufficient based methods, and this affects the number of reads required to obtain sufficient information on all target regions. However the specificity is low in comparison with that of information on all target regions. However the specificity is low in comparison with that of amplification-based methods180,181. amplification-based methods180,181. The method allows identification of insertions, deletions and fusion genes as demonstrated The method allows identification of insertions, deletions and fusion genes as demonstrated in a recent paper focused on leukemia-associated genes182 and capture has been done on in a recent paper focused on leukemia-associated genes182 and capture has been done on formalin fixed paraffin embedded material183. Since the advent of the technique, many formalin fixed paraffin embedded material183. Since the advent of the technique, many interesting studies using targeted resequencing by hybridization capture have emerged and interesting studies using targeted resequencing by hybridization capture have emerged and these will be described in more detail in Chapter 4.6.2 (Targeted resequencing). these will be described in more detail in Chapter 4.6.2 (Targeted resequencing).

22 22 Julia Sandberg Julia Sandberg

&'()"*+,-.$/)0 &'()"*+,-.$/)0

102/0%,"%3+4"()-)'+.)0.-)-$"#% 102/0%,"%3+4"()-)'+.)0.-)-$"#%

5%+6#4/$"#%+,-.$/)0 7))-'+,-.$/)0 5%+6#4/$"#%+,-.$/)0 7))-'+,-.$/)0

!0-*+,-.$/)0 !0-*+,-.$/)0

9-6:+;+<4/$0 9-6:+;+<4/$0

!"#$"% !"#$"% 1$)0.$-8"*"%0 1$)0.$-8"*"%0 7=.4">'+;+102/0%,0 7=.4">'+;+102/0%,0

3.5. Decisions, decisions. What to choose? 3.5. Decisions, decisions. What to choose?

Several studies comparing selected enrichment strategies have been carried out Several studies comparing selected enrichment strategies have been carried out recently135,137,178. Generally, enzymatic amplification strategies have higher specificity and recently135,137,178. Generally, enzymatic amplification strategies have higher specificity and lower uniformity than hybrid-capture methods. It appears that those regions which are GC lower uniformity than hybrid-capture methods. It appears that those regions which are GC rich and/or occur frequently in repetitive regions of the genome are difficult to capture rich and/or occur frequently in repetitive regions of the genome are difficult to capture whichever of the methods is used; they may also tend to get lost in downstream processes whichever of the methods is used; they may also tend to get lost in downstream processes such as sequencing and mapping. Thus, there appears to be little to gain by combining such as sequencing and mapping. Thus, there appears to be little to gain by combining more than one enrichment method178. more than one enrichment method178. The choice of enrichment strategy depends on the application. For amplifying and The choice of enrichment strategy depends on the application. For amplifying and analyzing a small number of target regions, PCR or multiplex PCR-derived methods may analyzing a small number of target regions, PCR or multiplex PCR-derived methods may be useful. PCR enrichment in emulsion, which is commercially available, allows a moderate be useful. PCR enrichment in emulsion, which is commercially available, allows a moderate number (>20,000 PCR reactions) of targets and samples. For a larger number of targets in number (>20,000 PCR reactions) of targets and samples. For a larger number of targets in combination with a large number of samples, methods which scale and multiplex combination with a large number of samples, methods which scale and multiplex effectively, such as selection by circularization, e.g. MIP and solution phase hybrid capture, effectively, such as selection by circularization, e.g. MIP and solution phase hybrid capture, of which the latter is commercially available for selection (MIPs are commercially available of which the latter is commercially available for selection (MIPs are commercially available only for genotyping purposes), will serve the researcher well for reducing the complexity of only for genotyping purposes), will serve the researcher well for reducing the complexity of the initial sample184. the initial sample184.

23 23

3.6. Global amplification of genomic DNA 3.6. Global amplification of genomic DNA

In contrast to the targeted amplification and selection methods presented previously, the In contrast to the targeted amplification and selection methods presented previously, the aim of global amplification of a nucleic acid sample is to maintain the complexity of the aim of global amplification of a nucleic acid sample is to maintain the complexity of the starting material unchanged while increasing the amount of nucleic acid. The approach can starting material unchanged while increasing the amount of nucleic acid. The approach can be used as a preparative step or as a stand-alone protocol prior to, for example, sequencing be used as a preparative step or as a stand-alone protocol prior to, for example, sequencing or microarray analysis. or microarray analysis.

3.6.1. PCR-based amplification strategies 3.6.1. PCR-based amplification strategies Most early global amplification strategies utilized PCR amplification together with partly or Most early global amplification strategies utilized PCR amplification together with partly or fully random primers. Degenerate Oligonucleotide-Primed PCR (DOP-PCR) uses primers fully random primers. Degenerate Oligonucleotide-Primed PCR (DOP-PCR) uses primers with a defined sequence at each end and a random hexamer sequence in the middle185 with a defined sequence at each end and a random hexamer sequence in the middle185 whereas in Primer Extension Preamplification PCR (PEP-PCR) completely random whereas in Primer Extension Preamplification PCR (PEP-PCR) completely random primers are used for amplification186. Despite protocol modifications since the methods primers are used for amplification186. Despite protocol modifications since the methods were first developed, both techniques have shown limitations in coverage when amplifying were first developed, both techniques have shown limitations in coverage when amplifying scarce material187. scarce material187. The recurrent trick of universal amplification tags can also be applied to global The recurrent trick of universal amplification tags can also be applied to global amplification. Initial tag incorporation can be carried out using primers with random 3’ amplification. Initial tag incorporation can be carried out using primers with random 3’ ends and a 5’ tag sequence either by a few initial PCR cycles188,189 or through linear ends and a 5’ tag sequence either by a few initial PCR cycles188,189 or through linear isothermal amplification to obtain overlapping copies of the initial DNA molecule190. isothermal amplification to obtain overlapping copies of the initial DNA molecule190. Another way is ligation of amplification tags to gDNA that has been fragmented physically Another way is ligation of amplification tags to gDNA that has been fragmented physically by shearing194 or cut by restriction enzymes (MseI or RsaI)15,191-193. A proprietary variant of by shearing194 or cut by restriction enzymes (MseI or RsaI)15,191-193. A proprietary variant of the approach is available as a commercial kit under the name GenomePlex (Sigma Aldrich, the approach is available as a commercial kit under the name GenomePlex (Sigma Aldrich, MO, USA) and this has been used in work reported in a number of publications for single MO, USA) and this has been used in work reported in a number of publications for single cell analysis by CGH arrays194, SNP arrays195 and massive sequencing16. cell analysis by CGH arrays194, SNP arrays195 and massive sequencing16. PCR-based WGA methods are generally affected by secondary DNA structures, which can PCR-based WGA methods are generally affected by secondary DNA structures, which can cause polymerase slippage196 and dissociation of enzyme from template; this may result in cause polymerase slippage196 and dissociation of enzyme from template; this may result in poor coverage, short product length and uneven representation of the template due to poor coverage, short product length and uneven representation of the template due to amplification bias and other amplification artifacts such as formation of chimers197. Despite amplification bias and other amplification artifacts such as formation of chimers197. Despite this, DOP-PCR and PEP-PCR and methods based on general amplification handles are this, DOP-PCR and PEP-PCR and methods based on general amplification handles are still used today, for example in the development of approaches such as pre-implantation still used today, for example in the development of approaches such as pre-implantation genetic diagnosis16,194,195,198,199. genetic diagnosis16,194,195,198,199.

3.6.2. Multiple Displacement Amplification (MDA) 3.6.2. Multiple Displacement Amplification (MDA) A major milestone in the field of molecular biology was the discovery of Phi29 (φ29), a A major milestone in the field of molecular biology was the discovery of Phi29 (φ29), a polymerase that showed extraordinary strength when holding on to the DNA template polymerase that showed extraordinary strength when holding on to the DNA template during synthesis. Through its remarkable processivity in combination with random during synthesis. Through its remarkable processivity in combination with random priming, the displacement activity of φ29 leads to a hyper-branched structure of amplified priming, the displacement activity of φ29 leads to a hyper-branched structure of amplified material with an average product length of >10 kb. The method is called Multiple material with an average product length of >10 kb. The method is called Multiple Displacement Amplification (MDA) and it can be used to isothermally amplify the human Displacement Amplification (MDA) and it can be used to isothermally amplify the human genome 1000-fold. Although amplification bias is lower than for PCR-based WGA genome 1000-fold. Although amplification bias is lower than for PCR-based WGA methods201, non-specific synthesis, either from DNA contamination or from endogenously methods201, non-specific synthesis, either from DNA contamination or from endogenously generated DNA such as primer-dimers, is an issue for this means of amplification too. A generated DNA such as primer-dimers, is an issue for this means of amplification too. A 24 24 Julia Sandberg Julia Sandberg decrease in reaction volume from 50 µl to 60 nl, using a microfluidic chip, has been shown decrease in reaction volume from 50 µl to 60 nl, using a microfluidic chip, has been shown to improve specificity and lower the amplification bias, while maintaining the level of to improve specificity and lower the amplification bias, while maintaining the level of amplification, possibly due to reduced competition from background contaminants or to amplification, possibly due to reduced competition from background contaminants or to reduced damage to the DNA template20, and similar a microfluidic setup has been used for reduced damage to the DNA template20, and similar a microfluidic setup has been used for evaluating nucleic acid contamination in commercially-available MDA reagents200. evaluating nucleic acid contamination in commercially-available MDA reagents200. MDA has been used to prepare material from single cells for various applications such as MDA has been used to prepare material from single cells for various applications such as CNV analysis by comparative genomic hybridization on, CGH40, cloning and shotgun CNV analysis by comparative genomic hybridization on, CGH40, cloning and shotgun sequencing in a process called “ploning”201, or as a pre-amplification step prior to PCR sequencing in a process called “ploning”201, or as a pre-amplification step prior to PCR amplification of target regions40,41. However, it has recently been shown that MDA amplification of target regions40,41. However, it has recently been shown that MDA introduces inversions but not translocations in the sequenced material202. Through double introduces inversions but not translocations in the sequenced material202. Through double WGA of FACS sorted cells, de novo sequencing and genome assembly of single bacteria WGA of FACS sorted cells, de novo sequencing and genome assembly of single bacteria has been done. Coverage varied 1000-fold however there was no correlation between has been done. Coverage varied 1000-fold however there was no correlation between coverage and GC content49. In a recent publication MDA-assisted pre-implantation coverage and GC content49. In a recent publication MDA-assisted pre-implantation diagnosis was used to ensure the birth of a healthy child from parents carrying genes for a diagnosis was used to ensure the birth of a healthy child from parents carrying genes for a recessive kidney disease36. Several kits for φ29-based amplification are commercially recessive kidney disease36. Several kits for φ29-based amplification are commercially available. available.

3.6.3. Linear amplification techniques 3.6.3. Linear amplification techniques Linear amplification strategies have the advantage of repeated copying the original Linear amplification strategies have the advantage of repeated copying the original template molecule over and over, instead of making more templates that in turn will be template molecule over and over, instead of making more templates that in turn will be copied. In this way linear amplification avoids the problem of error propagation that are an copied. In this way linear amplification avoids the problem of error propagation that are an unwelcome feature of exponential amplification strategies. unwelcome feature of exponential amplification strategies. One method of linear amplification of dsDNA goes via RNA production using a modified One method of linear amplification of dsDNA goes via RNA production using a modified variant of a widespread RNA amplification protocol203. Briefly, by addition of a poly(dT)- variant of a widespread RNA amplification protocol203. Briefly, by addition of a poly(dT)- tail to the 3’ end of the template, introduction of a T7 promoter sequence is enabled by tail to the 3’ end of the template, introduction of a T7 promoter sequence is enabled by extension from a polyA-tagged primer. This promoter sequence can then be utilized for extension from a polyA-tagged primer. This promoter sequence can then be utilized for linear amplification through in vitro transcription of the DNA template followed by reverse linear amplification through in vitro transcription of the DNA template followed by reverse transcription to convert the resulting RNA amplification products back into DNA203-205. transcription to convert the resulting RNA amplification products back into DNA203-205. The T7 promoter sequence can also be introduced by ligation206. Single Primer Isothermal The T7 promoter sequence can also be introduced by ligation206. Single Primer Isothermal Amplification (SPIA) is another elegant strategy in which a general RNA sequence is Amplification (SPIA) is another elegant strategy in which a general RNA sequence is introduced into the 5’ ends of template fragments by using a chimeric random-DNA and introduced into the 5’ ends of template fragments by using a chimeric random-DNA and tag-RNA primer that is either extended207 or introduced by ligation208, thus incorporating tag-RNA primer that is either extended207 or introduced by ligation208, thus incorporating priming sites for the forthcoming linear amplification. Chimeric RNA-DNA primers are priming sites for the forthcoming linear amplification. Chimeric RNA-DNA primers are then used to prime amplification by the strand-displacing polymerase. Cleavage of the RNA then used to prime amplification by the strand-displacing polymerase. Cleavage of the RNA part of the introduced primer by RNAse H exposes new priming sites, which enables part of the introduced primer by RNAse H exposes new priming sites, which enables template amplification in a linear fashion. template amplification in a linear fashion.

25 25 Primer Extension PreamplificationGlobalGlobalGlobal amplification amplification amplification PCR strategies strategies strategies Degenerate Oligonucleotide-primed PCR Primer Extension Preamplification PCR GlobalGlobal amplificationamplificationDegenerate strategiesstrategies Oligonucleotide-primed PCR Primer Extension Preamplification PCR (PEP-PCR)GlobalGlobal amplification amplificationDegenerate strategies strategies Oligonucleotide-primed PCR(DOP-PCR) (PEP-PCR)PrimerPrimerPrimer Extension Extension Extension Preamplification Preamplification Preamplification PCR PCR GlobalPCRGlobal amplification amplification strategies(DOP-PCR) DegeneratestrategiesDegenerateDegenerate Oligonucleotide-primed Oligonucleotide-primed Oligonucleotide-primed PCR PCR PCR PrimerPrimer(PEP-PCR) ExtensionExtension PreamplificationPreamplification PCRPCR DegenerateDegenerate(DOP-PCR) Oligonucleotide-primedOligonucleotide-primed PCRPCR PrimerPrimer ExtensionExtension(PEP-PCR)(PEP-PCR) (PEP-PCR) PreamplificationPreamplification PCRPCR Degenerate Oligonucleotide-primed(DOP-PCR)(DOP-PCR)(DOP-PCR) PCR Primer Extension(PEP-PCR)(PEP-PCR) Preamplification PCR Degenerate(DOP-PCR)(DOP-PCR) Oligonucleotide-primed PCR (PEP-PCR)(PEP-PCR)(PEP-PCR) (DOP-PCR)(DOP-PCR)(DOP-PCR)

Introduction of general amplification IntroductionIntroduction of general of amplificationgeneral amplificationhandles by extension handles IntroductionbyhandlesIntroduction extensionIntroduction by extension of of general ofgeneral general amplification amplification amplification IntroductionIntroductionhandleshandles ofhandlesof general generalby by extension byextension amplificationamplificationextension Global amplification strategies IntroductionIntroductionIntroductionhandleshandles ofof byby generalgeneral of extensionextension general amplificationamplification amplification handleshandleshandles byby extension extensionby extensionIsothermal extension or PCR Linear amplification with RNA primers IsothermalPrimer extension Extension or PCR PreamplificationCleanup PCR Linear amplificationDegenerate with RNA Oligonucleotide-primed primers PCR Cleanup Isothermal extension or PCR Linear amplification with RNA primers Cleanup Isothermal(PEP-PCR)IsothermalIsothermal extension extension extension or PCR or PCR or PCR LinearLinearLinear amplification amplification amplification(DOP-PCR) with with withRNA RNA RNA primers primers primers IsothermalIsothermalCleanupCleanupCleanup extension extension or or PCR PCR LinearLinear amplificationamplification withwith RNARNA primersprimers CleanupIsothermalIsothermal extensionextension oror PCRPCR Cleanup Isothermal extension orIsothermal PCR extension or PCR LinearLinearLinear amplificationamplification amplification withwith with RNARNA RNA primersprimers primers CleanupCleanup Isothermal extension or PCR Cleanup Cleanup Cleanup Isothermal extension or PCR Cleanup IsothermalIsothermalIsothermal extension extension extension or PCR or PCR or PCR IsothermalIsothermalCleanupCleanupCleanup extension extension or or PCR PCR CleanupCleanupIsothermalIsothermalIsothermal extensionextension extension oror PCRPCR or PCR CleanupCleanupCleanup Universal PCR Universal PCR Universal PCR UniversalUniversalUniversal PCR PCR PCR RNAseRNAse HRNAse cleavage H cleavage H cleavageto free to freeprimer to primerfree hybridization primer hybridization hybridization site site site RNAse H cleavage to free primer hybridization site UniversalUniversal PCR PCR RNAse H cleavage to free primer hybridization site RNAse H cleavage to free primer hybridization site UniversalUniversalUniversal PCRPCR PCR RNAse RNAseH cleavage H cleavage to free primerto free primerhybridization hybridization site site

Repeated priming and extension Introduction of general amplification IntroductionIntroduction ofIntroduction general of amplificationgeneral of general amplification amplificationhandles by ligation handlesIntroduction byhandlesIntroduction ligationIntroductionhandles by ligation ofby of generalextension ofgeneral general amplification amplification amplification IntroductionIntroductionhandleshandles ofofhandles generalgeneral by by ligation byligation amplificationamplification ligation IntroductionIntroductionIntroductionhandleshandles ofof by bygeneralgeneral of ligationligation general amplificationamplification amplification handleshandleshandles byby ligation ligationby ligation Isothermal extension or PCR Linear amplification with RNA primers Cleanup FragmentizeFragmentizeFragmentize FragmentizeFragmentize Fragmentize Isothermal extension or PCR FragmentizeFragmentize Cleanup LigateLigate adaptorsLigate adaptors adaptors LigateLigate adaptors adaptors LigateLigateLigate adaptorsadaptors adaptors LinearLinearLinearLinear amplification amplification amplification amplification through through through through in in invitro vitro vitroin vitrotranscription transcription transcription transcription Universal PCR LinearLinear amplificationamplification throughRNAsethrough H cleavage inin to vitro vitrofree primer transcriptiontranscription hybridization site LinearLinearLinear amplificationamplification amplification throughthrough through inin vitrovitro in vitro transcriptiontranscription transcription Fragmentize & dT-tail FragmentizeFragmentizeFragmentize

UniversalUniversalUniversal PCR PCR PCR FragmentizedT-tail TTTTTT FragmentizedT-taildT-tail

Universal PCR FragmentizeTTTTTT

Universal PCR TTTTTT TTTTTT FragmentizedT-taildT-tailFragmentize

Universal PCR TTTTTT TTTTTT

UniversalUniversal PCR PCR dT-tail TTTTTT

TTTTTT dT-tail TTTTTT TTTTTT TTTTTTTTTTTT dT-tail

TTTTTT TTTTTT

TTTTTT TTTTTT

TTTTTT TTTTTT

Introduction of general amplification TTTTTT TTTTTT TTTTTT TTTTTT

TTTTTT TTTTTT

TTTTTT TTTTTT

TTTTTT TTTTTT TTTTTT TTTTTT

TTTTTT

Multiple Displacement Amplification TTTTTT

MultipleMultiple Displacement Displacement Amplification Amplification TTTTTTTTTTTT TTTTTTTTTTTT TTTTTT

handles by ligation TTTTTT TTTTTTTTTTTT TTTTTT

TTTTTT

TTTTTT

Multiple Displacement Amplification TTTTTT TTTTTTTTTTTTTTTTTT

Multiple Displacement Amplification TTTTTT TTTTTT

TTTTTT TTTTTT TTTTTT

Multiple Displacement Amplification TTTTTT

TTTTTT TTTTTT

MultipleMultiple Displacement Displacement Amplification Amplification TTTTTT TTTTTTTTTTTT

TTTTTT TTTTTT

TTTTTT TTTTTT Priming with T7 promoter TTTTTT TTTTTT PrimePriming with with T7 promoterTTTTTT AAAA AAAA Priming withAAAA T7 promoter containing adapter AAAA AAAAdA-primercontaining adapter PrimingPrimingPriming with with T7with T7 promoter T7promoter promoter Fragmentize AAAA containingAAAA adapter

AAAAAAAAAAAAAAAA AAAA PrimingPrimingcontaining withwith T7T7adapter promoterpromoter

AAAA containingcontaining adapter adapter Priming with T7 promoter AAAA TT

AAAAAAAA AAAA containingPrimingPriming withadapter withT7 promoter T7 promoterTTTTTT

AAAA TTTTTTAAAA AAAATTAAAAcontaining adapter

AAAAAAAA TTTTTTAAAA containingcontainingTT containing adapteradapter adapter

AAAA TT AAAA TT

TTTTTTTTTTTTAAAA TTTTTTTT

Ligate adaptors TTTTTTAAAA TT

TTTTTTAAAA TT

TTTTTTAAAA TT

AAAA TT 3’-5’ exonuclease activity 3’-5’ exonuclease activity TTTTTTTTTTTTAAAA TT Linear amplification through3’-5’ exonuclease in vitro transcription activity Extension Extension 3’-5’ exonuclease activity

Extension3’-5’ exonuclease3’-5’ exonuclease activity activity

3’-5’3’-5’ExtensionExtension exonucleaseexonucleaseExtension activityactivity TTTTAAAA

TTTTAAAA 3’-5’ exonuclease activity

AAAA 3’-5’ exonuclease activity

TTTTExtensionExtension3’-5’ exonuclease activity AAAA

AAAA

ExtensionExtensionTTTTExtensionTTTTAAAA TTTT In vitro transcription FragmentizeAAAA

In vitro transcription TTTTTTTTAAAA

Universal PCR AAAA

TTTTAAAA In vitro transcriptiondT-tail AAAA In Invitro vitroInTTTT transcriptionvitro transcriptionTTTT transcription RNA

RNA InIn vitrovitro transcriptiontranscription TTTTTT TTTTTTRNA

InIn vitrovitroIn vitro transcriptiontranscriptionRNALigationRNA transcriptionRNA Multiple Displacement Amplification TTTTTT RNARNAExtension

RNA reverseTTTTTT Transcription TTTTTT reverse TranscriptionTTTTTT RandomRNA bases reverse TranscriptionRNA reversereversereverse Transcription Transcription Transcription cDNA cDNA reversereverse TranscriptionTranscription cDNAreverse Transcription Primingreversereverse with TranscriptioncDNA T7cDNA Transcription promotercDNA AAAA AAAA containing adaptercDNAcDNA

cDNAcDNAcDNA 26 TTTTTTAAAA TT

3’-5’ exonuclease activity

Extension TTTTAAAA

In vitro transcription Julia Sandberg Julia Sandberg

3.6.4. Choosing a global amplification strategy 3.6.4. Choosing a global amplification strategy Amplification bias and allelic dropout are still issues in current WGA methods when Amplification bias and allelic dropout are still issues in current WGA methods when starting from scarce material. Several comparative studies have been published starting from scarce material. Several comparative studies have been published demonstrating that MDA amplification gives a better representation of the initial sample in demonstrating that MDA amplification gives a better representation of the initial sample in comparison with other amplification strategies197; however, others have found linker-PCR comparison with other amplification strategies197; however, others have found linker-PCR strategies to give a better34 or equal209 result in terms of dropout rate and amplification bias strategies to give a better34 or equal209 result in terms of dropout rate and amplification bias in comparison with MDA-based methods. The short product length of many PCR-based in comparison with MDA-based methods. The short product length of many PCR-based amplification strategies is a limitation for some applications, and this, together with the ease amplification strategies is a limitation for some applications, and this, together with the ease of use of the MDA approach, makes many labs turn to MDA. However the potential for of use of the MDA approach, makes many labs turn to MDA. However the potential for amplifying degraded DNA is, for some applications, an obvious advantage of PCR-based amplifying degraded DNA is, for some applications, an obvious advantage of PCR-based approaches209. approaches209. This chapter has described various ways to treat a DNA sample in preparation for, e.g., This chapter has described various ways to treat a DNA sample in preparation for, e.g., sequencing analysis. The wide repertoire of methods available reflects the enormous sequencing analysis. The wide repertoire of methods available reflects the enormous importance of sample preparation and the limitations of available techniques. In the next importance of sample preparation and the limitations of available techniques. In the next chapter, current and forthcoming methods for Massively Parallel Sequencing will be chapter, current and forthcoming methods for Massively Parallel Sequencing will be described in more detail. described in more detail.

27 27

4. Massively parallel DNA sequencing 4. Massively parallel DNA sequencing

DNA sequencing can help us to answer fundamental questions about disease and human DNA sequencing can help us to answer fundamental questions about disease and human biology and thus contributes to both the saving of and increases human quality of life. biology and thus contributes to both the saving of lives and increases human quality of life. The first whose genome was completely sequenced was a , The first organism whose genome was completely sequenced was a virus, bacteriophage ΦX174210. This was accomplished in 1977; an immense amount of work was required to ΦX174210. This was accomplished in 1977; an immense amount of work was required to finish that tiny genome of roughly 5000 basepairs. finish that tiny genome of roughly 5000 basepairs. Today, 600,000 times larger than this (such as the human genome) can be Today, genomes 600,000 times larger than this (such as the human genome) can be sequenced in a matter of days. The technical revolution that has occurred in the field of sequenced in a matter of days. The technical revolution that has occurred in the field of gene technology during the last six years has made it possible to rapidly obtain tremendous gene technology during the last six years has made it possible to rapidly obtain tremendous amounts of sequence data, with a comparable reduction in cost. In fact, the amount of amounts of sequence data, with a comparable reduction in cost. In fact, the amount of sequence data obtained per day doubles every year. The increase is so rapid that it outstrips sequence data obtained per day doubles every year. The increase is so rapid that it outstrips Moore’s law (a proposal made in the 1960s, which stated that the amount of available Moore’s law (a proposal made in the 1960s, which stated that the amount of available computing power would double every two years for the foreseeable future)211. The rapid computing power would double every two years for the foreseeable future)211. The rapid progress in this field has greatly increased the scope of experimentation beyond simply progress in this field has greatly increased the scope of experimentation beyond simply determining the order of bases in a gene or viral genome, making it possible to ask determining the order of bases in a gene or viral genome, making it possible to ask completely new questions. completely new questions. This chapter is about massively parallel DNA sequencing (MPS). Current and emerging This chapter is about massively parallel DNA sequencing (MPS). Current and emerging sequencing technologies will be presented, starting with some molecular and technical tools sequencing technologies will be presented, starting with some molecular and technical tools that are used in these platforms. Since today’s technologies were developed on the that are used in these platforms. Since today’s technologies were developed on the foundation of those that came before, the chapter begins with a brief recapitulation of the foundation of those that came before, the chapter begins with a brief recapitulation of the history of the field. history of the field.

28 28 Julia Sandberg Julia Sandberg

4.1. Deciphering the genetic code 4.1. Deciphering the genetic code

While the double helix structure of DNA was first reported in 19536, it took another fifteen While the double helix structure of DNA was first reported in 19536, it took another fifteen to twenty years for the development of experimental methods for determining DNA to twenty years for the development of experimental methods for determining DNA sequences. In 1977, two approaches for reading DNA sequences were presented, both of sequences. In 1977, two approaches for reading DNA sequences were presented, both of which were based on the generation of ssDNA fragments of varying lengths spanning the which were based on the generation of ssDNA fragments of varying lengths spanning the template molecule and ending with a certain base, for subsequent size separation and base template molecule and ending with a certain base, for subsequent size separation and base calling by gel electrophoresis. The method of Maxam & Gilbert relied on base-specific calling by gel electrophoresis. The method of Maxam & Gilbert relied on base-specific chemical cleavage of DNA212, whereas Sanger used so-called dideoxy nucleotides (ddNTPs) chemical cleavage of DNA212, whereas Sanger used so-called dideoxy nucleotides (ddNTPs) - modified nucleotides that lack a 3’ hydroxyl group and therefore cannot be extended by a - modified nucleotides that lack a 3’ hydroxyl group and therefore cannot be extended by a polymerase213. In both methods, four separate reactions are performed in order to obtain polymerase213. In both methods, four separate reactions are performed in order to obtain radioisotope-labeled fragments ending with a specific base, the identity of which varies from radioisotope-labeled fragments ending with a specific base, the identity of which varies from reaction to reaction. The Sanger method was slightly easier and less hazardous than that reaction to reaction. The Sanger method was slightly easier and less hazardous than that presented by Maxam & Gilbert and therefore became the method of choice. The first step presented by Maxam & Gilbert and therefore became the method of choice. The first step towards the automation of Sanger sequencing was taken by Hood and colleagues in 1986; towards the automation of Sanger sequencing was taken by Hood and colleagues in 1986; these workers used color-coded primers for each of the four synthesis reactions, making it these workers used color-coded primers for each of the four synthesis reactions, making it possible to analyze the sample during the gel electrophoresis step214. Color-coded possible to analyze the sample during the gel electrophoresis step214. Color-coded terminating nucleotides have since replaced color-coded primers, to allow all four synthesis terminating nucleotides have since replaced color-coded primers, to allow all four synthesis reactions to be carried out in a single tube. Gel electrophoresis in capillaries has further reactions to be carried out in a single tube. Gel electrophoresis in capillaries has further increased the method’s sensitivity, increasing the read length to its current 700-1000 bp215. increased the method’s sensitivity, increasing the read length to its current 700-1000 bp215. Using contemporary techniques, it is possible to run 96 or 384 samples simultaneously. Using contemporary techniques, it is possible to run 96 or 384 samples simultaneously. The Sanger method has dominated the DNA sequencing field for nearly thirty years and The Sanger method has dominated the DNA sequencing field for nearly thirty years and has enabled a number of important achievements. The first sequenced genome of a free- has enabled a number of important achievements. The first sequenced genome of a free- living organism (a bacterium) was published in 1995216 and several others followed shortly living organism (a bacterium) was published in 1995216 and several others followed shortly thereafter. The first human genome sequence was a key achievement of biotechnology and thereafter. The first human genome sequence was a key achievement of biotechnology and resulted from a race between a public and a private initiative. The two competing teams resulted from a race between a public and a private initiative. The two competing teams used different approaches to stitch Sanger reads together into contigs and then assemble used different approaches to stitch Sanger reads together into contigs and then assemble these into a consistent whole. The public initiative used a direct mapping approach in these into a consistent whole. The public initiative used a direct mapping approach in which the genome was broken into ordered and overlapping fragments that were cloned which the genome was broken into ordered and overlapping fragments that were cloned and then sequenced, whereas the private initiative used “whole genome shotgun and then sequenced, whereas the private initiative used “whole genome shotgun sequencing” of random DNA fragments followed by sequence assembly on the basis of sequencing” of random DNA fragments followed by sequence assembly on the basis of overlapping sequences216,217. The private initiative also introduced paired-end sequencing of overlapping sequences216,217. The private initiative also introduced paired-end sequencing of both ends of the cloned fragment to facilitate the assembly of sequence reads217. Despite the both ends of the cloned fragment to facilitate the assembly of sequence reads217. Despite the fact that the private initiative started years later than the publically funded group, the two fact that the private initiative started years later than the publically funded group, the two groups published their drafts of the human genome on the same week in 2001 and the race groups published their drafts of the human genome on the same week in 2001 and the race was officially a tie218,219. Both shotgun sequencing and paired-end sequencing have since was officially a tie218,219. Both shotgun sequencing and paired-end sequencing have since found widespread use. It took 13 years and the efforts of a very large number of people to found widespread use. It took 13 years and the efforts of a very large number of people to accomplish the sequencing of the human genome, at a total cost of approximately $3 accomplish the sequencing of the human genome, at a total cost of approximately $3 billion220. billion220. New methods for DNA sequencing were developed during the late 80’s and early 90’s. One New methods for DNA sequencing were developed during the late 80’s and early 90’s. One such method involves sequencing by the hybridization of labeled template DNA molecules such method involves sequencing by the hybridization of labeled template DNA molecules to nucleotide probes that have been immobilized on a membrane or surface. By analyzing to nucleotide probes that have been immobilized on a membrane or surface. By analyzing the hybridization pattern, it is possible to assemble an entire sequence, although this the hybridization pattern, it is possible to assemble an entire sequence, although this method is incapable of detecting single-base alterations221. A sequencing method involving method is incapable of detecting single-base alterations221. A sequencing method involving cyclical monitoring of DNA synthesis events by detecting luminescence, known as cyclical monitoring of DNA synthesis events by detecting luminescence, known as pyrosequencing, was also developed; this technique is discussed in more detail later on in pyrosequencing, was also developed; this technique is discussed in more detail later on in 29 29

this chapter222,223. Shortly thereafter, methods for sequencing by mass spectrometry that this chapter222,223. Shortly thereafter, methods for sequencing by mass spectrometry that make it possible to rapidly analyze short sequences were reported224. A common factor in make it possible to rapidly analyze short sequences were reported224. A common factor in all of these methods, including Sanger sequencing, is the use of a labor-intense preparative all of these methods, including Sanger sequencing, is the use of a labor-intense preparative protocol prior to sequencing. The preparation of samples for sequencing typically involves a protocol prior to sequencing. The preparation of samples for sequencing typically involves a sequence of template amplification by PCR, cloning into a vector, clone picking and PCR sequence of template amplification by PCR, cloning into a vector, clone picking and PCR amplification of the target DNA. Methods for the in vitro amplification of templates in amplification of the target DNA. Methods for the in vitro amplification of templates in agarose gel225 or on solid surfaces69,131,226 prior to sequencing first appeared in the early agarose gel225 or on solid surfaces69,131,226 prior to sequencing first appeared in the early 2000s and paved the way for future efforts towards parallel sequencing. The advent of 2000s and paved the way for future efforts towards parallel sequencing. The advent of massively parallel sequencing in 200570,227 represented the start of a new era, making it massively parallel sequencing in 200570,227 represented the start of a new era, making it possible to amplify numerous templates separately in a single reaction and then sequence possible to amplify numerous templates separately in a single reaction and then sequence them in parallel. them in parallel.

4.2. The Massively Parallel Sequencing toolbox 4.2. The Massively Parallel Sequencing toolbox

Massively parallel sequencing can be done on solitary or clonally amplified templates, using Massively parallel sequencing can be done on solitary or clonally amplified templates, using different chemistries, although there are some common themes that are present in several different chemistries, although there are some common themes that are present in several platforms. The unique combination of techniques used in a given MPS technology platforms. The unique combination of techniques used in a given MPS technology determines the type of data that platform can produce. determines the type of data that platform can produce.

4.2.1. Sequencing chemistry 4.2.1. Sequencing chemistry

Sequencing by synthesis Sequencing by synthesis There are various different ways of sequencing a DNA template by synthesizing There are various different ways of sequencing a DNA template by synthesizing complementary DNA. By sequentially adding one nucleotide species at a time (and complementary DNA. By sequentially adding one nucleotide species at a time (and including washing steps between each successive addition step) and monitoring some effect including washing steps between each successive addition step) and monitoring some effect induced by the incorporation of that nucleotide into the growing DNA strand, it is possible induced by the incorporation of that nucleotide into the growing DNA strand, it is possible to observe the extension of the strand as it proceeds and determine which nucleotide was to observe the extension of the strand as it proceeds and determine which nucleotide was added. This concept is used in different ways in the Roche/45470 and Ion Torrent228 added. This concept is used in different ways in the Roche/45470 and Ion Torrent228 sequencing systems. sequencing systems. Fluorescently-labeled terminating nucleotides with a removable blocking group that Fluorescently-labeled terminating nucleotides with a removable blocking group that prevents chain extension while it is present are used in a procedure called Cyclic Reversible prevents chain extension while it is present are used in a procedure called Cyclic Reversible Termination (CRT). Chain extension with a modified nucleotide is followed by washing Termination (CRT). Chain extension with a modified nucleotide is followed by washing and fluorescent imaging, to analyze the incorporation event. The fluorophore is then and fluorescent imaging, to analyze the incorporation event. The fluorophore is then removed and the newly-incorporated nucleotide is de-blocked, rendering it competent in removed and the newly-incorporated nucleotide is de-blocked, rendering it competent in chain extension and allowing the next cycle to proceed. The different nucleotides may be chain extension and allowing the next cycle to proceed. The different nucleotides may be coded with differently-colored dyes and added as a mixture (Illumina)229 or labeled with the coded with differently-colored dyes and added as a mixture (Illumina)229 or labeled with the same dye and added in separate steps, as is done in the Helicos230,231 method. The former same dye and added in separate steps, as is done in the Helicos230,231 method. The former case is arguably better, since having all four bases present at once should reduce the case is arguably better, since having all four bases present at once should reduce the likelihood of the wrong base being incorporated at any one time – the incorporation of the likelihood of the wrong base being incorporated at any one time – the incorporation of the correct nucleotide is more kinetically favorable than that of a mismatched one230. correct nucleotide is more kinetically favorable than that of a mismatched one230. There are a variety of blocking groups that can be used in these techniques. The Illumina There are a variety of blocking groups that can be used in these techniques. The Illumina method accomplishes 3’ blocking by modifying the 3’ hydroxyl group, which is essential for method accomplishes 3’ blocking by modifying the 3’ hydroxyl group, which is essential for

30 30 Julia Sandberg Julia Sandberg the formation of a covalent bond with the incoming nucleotide. In addition, a cleavable the formation of a covalent bond with the incoming nucleotide. In addition, a cleavable fluorescent label is appended to the base229. Both modifications are chemically removed in fluorescent label is appended to the base229. Both modifications are chemically removed in each cycle. In Helicos sequencing, rather than modifying the nucleotide at two positions, a each cycle. In Helicos sequencing, rather than modifying the nucleotide at two positions, a cleavable linker containing a fluorescent dye is incorporated into the base in a position that cleavable linker containing a fluorescent dye is incorporated into the base in a position that allows its steric bulk to prevent further extension without requiring chemical modification of allows its steric bulk to prevent further extension without requiring chemical modification of the 3’-hydroxy group231. In both of these methods, some fragments of the linkers that the 3’-hydroxy group231. In both of these methods, some fragments of the linkers that connected the label/blocking group to the nucleotide remain attached after cleavage, connected the label/blocking group to the nucleotide remain attached after cleavage, causing so-called “molecular scars” to accumulate in the DNA strand during sequencing232. causing so-called “molecular scars” to accumulate in the DNA strand during sequencing232. A third approach used in some single-molecule sequencing methods involves real-time A third approach used in some single-molecule sequencing methods involves real-time monitoring of synthesis events by observing the incorporation of fluorescently-labeled monitoring of synthesis events by observing the incorporation of fluorescently-labeled nucleotides into the growing DNA chain233-235. Real-time analysis requires nucleotides that nucleotides into the growing DNA chain233-235. Real-time analysis requires nucleotides that are capable of undergoing further extension without chemical modification, but are color- are capable of undergoing further extension without chemical modification, but are color- coded with fluorescent tags that are released during the incorporation event itself. The coded with fluorescent tags that are released during the incorporation event itself. The fluorophores are bound to the terminal phosphate of the nucleotide; the release of the fluorophores are bound to the terminal phosphate of the nucleotide; the release of the fluorescent tag generates natural unmodified DNA232,234. fluorescent tag generates natural unmodified DNA232,234.

Sequencing by ligation Sequencing by ligation Ligation of fluorescently labeled oligonucleotides to a complementary template molecule is Ligation of fluorescently labeled oligonucleotides to a complementary template molecule is another way of interrogating a template DNA molecule. Like some of the other methods another way of interrogating a template DNA molecule. Like some of the other methods discussed above, this approach also involves cyclical sequencing using a mixture of color- discussed above, this approach also involves cyclical sequencing using a mixture of color- coded probes that are allowed to hybridize to the primed template. A ligase then forms a coded probes that are allowed to hybridize to the primed template. A ligase then forms a covalent bond between the probe and the primer. After washing, fluorescent imaging is covalent bond between the probe and the primer. After washing, fluorescent imaging is used to determine which probe paired up with the template. Probes may encode one227,236 used to determine which probe paired up with the template. Probes may encode one227,236 or two237 bases, which are combined with random or universal bases for stabilization. or two237 bases, which are combined with random or universal bases for stabilization. Before the next cycle begins, a free phosphate group needs to be re-generated to allow Before the next cycle begins, a free phosphate group needs to be re-generated to allow ligation. This is done either by cleaving off the last nucleotides of the incorporated probe, ligation. This is done either by cleaving off the last nucleotides of the incorporated probe, allowing ligation of the next probe in a chained manner (SOLiD)237, or by resetting the allowing ligation of the next probe in a chained manner (SOLiD)237, or by resetting the template by stripping off the whole primer-probe complex after each ligation step; in this template by stripping off the whole primer-probe complex after each ligation step; in this case, several probes are used that are complementary to different positions on the target case, several probes are used that are complementary to different positions on the target strand (Complete Genomics, Polonator)227,236. In both approaches, sequencing primers with strand (Complete Genomics, Polonator)227,236. In both approaches, sequencing primers with different intercepts are used in order to start the reaction at different bases. different intercepts are used in order to start the reaction at different bases.

4.2.2. Template amplification 4.2.2. Template amplification Most imaging systems are not built to detect single fluorescent events and so clonal Most imaging systems are not built to detect single fluorescent events and so clonal amplification of templates prior to sequencing is commonly used. Immobilization on amplification of templates prior to sequencing is commonly used. Immobilization on spatially separated positions avoids PCR competition and allows hundreds of thousands to spatially separated positions avoids PCR competition and allows hundreds of thousands to billions of templates to be amplified in parallel. billions of templates to be amplified in parallel.

Emulsion PCR Emulsion PCR By compartmentalizing separate PCR reactions in water-in-oil emulsions, template By compartmentalizing separate PCR reactions in water-in-oil emulsions, template molecules can be clonally amplified on the surface of primer beads69,70,97. Adaptors molecules can be clonally amplified on the surface of primer beads69,70,97. Adaptors containing universal priming sites are introduced onto the ends of the templates by ligation containing universal priming sites are introduced onto the ends of the templates by ligation or PCR, and when mixed with beads that have primers on the surface, templates are or PCR, and when mixed with beads that have primers on the surface, templates are captured by hybridization; the aim is to produce no more than one copy per bead. Beads captured by hybridization; the aim is to produce no more than one copy per bead. Beads are introduced into droplets containing PCR reagents and a small amount of the same are introduced into droplets containing PCR reagents and a small amount of the same 31 31

primer that is found on the bead. Beads that carry the amplified template on their surface primer that is found on the bead. Beads that carry the amplified template on their surface are then enriched and either embedded in agarose gel (Polonator)227, chemically cross- are then enriched and either embedded in agarose gel (Polonator)227, chemically cross- linked to an amino coated glass surface (SOLiD; Polonator)237,238 or placed in wells of a linked to an amino coated glass surface (SOLiD; Polonator)237,238 or placed in wells of a PicoTiterPlate (Roche/454)70 for subsequent sequencing. PicoTiterPlate (Roche/454)70 for subsequent sequencing.

Bridge amplification Bridge amplification Solid phase amplification on a glass slide makes it possible to prepare randomly distributed Solid phase amplification on a glass slide makes it possible to prepare randomly distributed clusters of clonally amplified molecules on an easily-interrogated flow cell. Adapter-ligated clusters of clonally amplified molecules on an easily-interrogated flow cell. Adapter-ligated template library molecules are captured by hybridization to the lawn of high-density template library molecules are captured by hybridization to the lawn of high-density forward and reverse primers that are covalently attached to the chip surface. An initial forward and reverse primers that are covalently attached to the chip surface. An initial extension step starting from the immobilized primer on the surface generates one surface- extension step starting from the immobilized primer on the surface generates one surface- attached template molecule; residual unbound DNA is removed by denaturation and attached template molecule; residual unbound DNA is removed by denaturation and washing. The attached molecule is then allowed to bend and hybridize to the surface washing. The attached molecule is then allowed to bend and hybridize to the surface primer corresponding to its outer end sequence, forming a bridge and a priming site for primer corresponding to its outer end sequence, forming a bridge and a priming site for extension. Bridge amplification is carried out in a cyclic manner under isothermal extension. Bridge amplification is carried out in a cyclic manner under isothermal conditions, using formamide for denaturation131; the result is a dense lawn of micron-sized conditions, using formamide for denaturation131; the result is a dense lawn of micron-sized clusters on the chip surface. To make the clusters suitable for sequencing, one of the primer clusters on the chip surface. To make the clusters suitable for sequencing, one of the primer sequences on the surface contains a cleavage site; cleavage generates allows clusters sequences on the surface contains a cleavage site; cleavage generates allows clusters containing identical 5’-attached DNA strands that are ready for sequencing. containing identical 5’-attached DNA strands that are ready for sequencing.

Rolling Circle amplification Rolling Circle amplification Rolling circle amplification (RCA) using the polymerase φ29 is described in the preceding Rolling circle amplification (RCA) using the polymerase φ29 is described in the preceding chapters of this thesis. Briefly, the high processivity and strand displacement activity of φ29 chapters of this thesis. Briefly, the high processivity and strand displacement activity of φ29 enables it to amplify a circularized template into a very long DNA molecule containing a enables it to amplify a circularized template into a very long DNA molecule containing a large number of copies of the original molecules by looping round the template while large number of copies of the original molecules by looping round the template while synthesizing174. The amplification product then collapses into a sphere, which is referred to synthesizing174. The amplification product then collapses into a sphere, which is referred to as a DNA Nanoball (DNB) in the terminology used by Complete Genomics236. as a DNA Nanoball (DNB) in the terminology used by Complete Genomics236. The sequencing of clonally amplified target molecules must occur in a synchronized The sequencing of clonally amplified target molecules must occur in a synchronized fashion, with all target copies undergoing the same process at the same time, since the fashion, with all target copies undergoing the same process at the same time, since the addition of multiple nucleotides/probes or incomplete extension of a cluster would result in addition of multiple nucleotides/probes or incomplete extension of a cluster would result in a de-synchronization of that cluster (leading/lagging strand de-phasing) that would increase a de-synchronization of that cluster (leading/lagging strand de-phasing) that would increase slightly in every subsequent cycle. In the long run, de-phasing causes increased background slightly in every subsequent cycle. In the long run, de-phasing causes increased background signal levels and results in base call errors, a decrease in sequence quality towards the end of signal levels and results in base call errors, a decrease in sequence quality towards the end of the read, and a limited read length239,240. the read, and a limited read length239,240.

4.2.3. Shrinking observation window for single molecule 4.2.3. Shrinking observation window for single molecule visualization visualization The sequencing of solitary template molecules by fluorescent labeling imposes high The sequencing of solitary template molecules by fluorescent labeling imposes high demands on the imaging system, as it must be able to accurately detect single fluorophores demands on the imaging system, as it must be able to accurately detect single fluorophores against the background signal originating from the other nucleotides present in the reaction. against the background signal originating from the other nucleotides present in the reaction. A few tricks have been used to focus the window of visualization in order to make the A few tricks have been used to focus the window of visualization in order to make the signal-to-noise ratio high enough for stable base calling. Polymerases perform optimally at signal-to-noise ratio high enough for stable base calling. Polymerases perform optimally at micromolar nucleotide concentrations241. At these levels, the background signal from micromolar nucleotide concentrations241. At these levels, the background signal from 32 32 Julia Sandberg Julia Sandberg surrounding fluorescently labeled nucleotides is very high. This issue is eliminated in surrounding fluorescently labeled nucleotides is very high. This issue is eliminated in cyclical wash-and-scan approaches by simply removing unincorporated nucleotides/probes cyclical wash-and-scan approaches by simply removing unincorporated nucleotides/probes before image capture. For real-time sequencing with an optical readout, the issue can be before image capture. For real-time sequencing with an optical readout, the issue can be addressed by shrinking the volume of observation beyond what can be obtained by addressed by shrinking the volume of observation beyond what can be obtained by conventional methods such as confocal or total internal reflection microscopy234. conventional methods such as confocal or total internal reflection microscopy234. The metallic grid in the door of a microwave oven prevents hazardous microwaves from The metallic grid in the door of a microwave oven prevents hazardous microwaves from getting out of the oven while still allowing us to peek through the holes to monitor the getting out of the oven while still allowing us to peek through the holes to monitor the cooking process. The reason for this is that microwaves, as inferred by the name, have cooking process. The reason for this is that microwaves, as inferred by the name, have wavelengths in the µm-range, which makes them too large to pass through the holes of the wavelengths in the µm-range, which makes them too large to pass through the holes of the grid. Visual light on the other hand has short enough wavelengths to get through the grid. Visual light on the other hand has short enough wavelengths to get through the wholes, and we can therefore see through the grid in the door. This concept is used in Zero- wholes, and we can therefore see through the grid in the door. This concept is used in Zero- mode waveguides (ZMW), which are the tiny wells used in Pacific Biosciences sequencing mode waveguides (ZMW), which are the tiny wells used in Pacific Biosciences sequencing technology. ZMWs are 100 nm deep, circular holes (30 nm Ø) in a metal film, placed on a technology. ZMWs are 100 nm deep, circular holes (30 nm Ø) in a metal film, placed on a glass surface. When illuminated from the bottom by laser beamlets whose wavelength is glass surface. When illuminated from the bottom by laser beamlets whose wavelength is greater than the wells’ diameter, this smart construction allows the beamlets to reach 20-30 greater than the wells’ diameter, this smart construction allows the beamlets to reach 20-30 nm into the well. This distance is just enough to reach the polymerase that sits at the nm into the well. This distance is just enough to reach the polymerase that sits at the bottom of the well and excite nucleotides that are close to the polymerase, but the high bottom of the well and excite nucleotides that are close to the polymerase, but the high concentrations of labeled nucleotides that are present in the rest of the well are not concentrations of labeled nucleotides that are present in the rest of the well are not illuminated by the laser and therefore do not affect the analysis234,242. illuminated by the laser and therefore do not affect the analysis234,242. Another variant on the same theme involves using a fluorescent nanocrystal (a so-called Another variant on the same theme involves using a fluorescent nanocrystal (a so-called quantum dot, or Qdot) that is affixed to the polymerase as the source of light. The single quantum dot, or Qdot) that is affixed to the polymerase as the source of light. The single stranded DNA to be sequenced is immobilized on a glass slide and the modified polymerase stranded DNA to be sequenced is immobilized on a glass slide and the modified polymerase binds to the DNA. UV laser illumination excites the polymerase-linked Qdot, causing it to binds to the DNA. UV laser illumination excites the polymerase-linked Qdot, causing it to emit fluorescent light at a lower wavelength. The wavelength emitted by the Qdot matches emit fluorescent light at a lower wavelength. The wavelength emitted by the Qdot matches that required to excite the fluorescently labeled nucleotides. When dye-labeled nucleotides that required to excite the fluorescently labeled nucleotides. When dye-labeled nucleotides are close enough to the polymerase, the light emitted by the Qdot can in turn excite the are close enough to the polymerase, the light emitted by the Qdot can in turn excite the four different fluorophores with which the nucleotides are labeled, in a process called four different fluorophores with which the nucleotides are labeled, in a process called Förster resonance energy transfer (FRET). In this way, nucleotides that are bound to the Förster resonance energy transfer (FRET). In this way, nucleotides that are bound to the polymerase during incorporation will absorb the light from the Qdot and in turn emit a polymerase during incorporation will absorb the light from the Qdot and in turn emit a nucleotide-specific light signal. This makes it possible to determine DNA sequences by real- nucleotide-specific light signal. This makes it possible to determine DNA sequences by real- time monitoring using a CCD camera, and yields exceptionally high read lengths 233,235 time monitoring using a CCD camera, and yields exceptionally high read lengths 233,235 .The method was initially called Visigen, but when LifeTech purchased the technology in .The method was initially called Visigen, but when LifeTech purchased the technology in 2008, it was rebranded as Startlight. More details regarding this approach were presented 2008, it was rebranded as Startlight. More details regarding this approach were presented to a full auditorium at the 2010 AGBT Conference in Florida, but little has been heard to a full auditorium at the 2010 AGBT Conference in Florida, but little has been heard about it since. about it since.

33 33

4.3. Massive sequencing by washing and scanning 4.3. Massive sequencing by washing and scanning 4.3.1. 454 – pyrosequencing en masse 4.3.1. 454 – pyrosequencing en masse The first sign of what was to revolutionize the field of sequencing appeared with the The first sign of what was to revolutionize the field of sequencing appeared with the landmark publication describing 454 sequencing in 200570. For the first time, it became landmark publication describing 454 sequencing in 200570. For the first time, it became possible to sequence hundreds of thousands of amplicons at once. The 454 technology, possible to sequence hundreds of thousands of amplicons at once. The 454 technology, which was named for the project number under which it was developed, is based on the which was named for the project number under which it was developed, is based on the pyrosequencing approach, which involves sequencing by synthesis and was developed at the pyrosequencing approach, which involves sequencing by synthesis and was developed at the Royal Institute of Technology (KTH) in the late 1990s222,223. Briefly, each time a nucleotide Royal Institute of Technology (KTH) in the late 1990s222,223. Briefly, each time a nucleotide is incorporated into the growing DNA chain, one inorganic pyrophosphate (PPi) molecule is is incorporated into the growing DNA chain, one inorganic pyrophosphate (PPi) molecule is released. Through an enzymatic cascade, involving ATP sulfurylase and firefly luciferase, released. Through an enzymatic cascade, involving ATP sulfurylase and firefly luciferase, this results in the emission of a detectable light signal. By sequentially adding one type of this results in the emission of a detectable light signal. By sequentially adding one type of nucleotide at a time, with continuous imaging by CCD cameras, it is possible to determine nucleotide at a time, with continuous imaging by CCD cameras, it is possible to determine the sequence of the target DNA223. An additional enzyme, apyrase, is used to continuously the sequence of the target DNA223. An additional enzyme, apyrase, is used to continuously degrade unused nucleotides in order to keep the background signal low222. Pyrosequencing degrade unused nucleotides in order to keep the background signal low222. Pyrosequencing was originally used with PCR amplicons in 96-well sequencing plates and its read length was originally used with PCR amplicons in 96-well sequencing plates and its read length was limited to 40 bp due to the enzyme-inhibiting effects of the by-products that are formed was limited to 40 bp due to the enzyme-inhibiting effects of the by-products that are formed in each cycle and accumulate as the reactions proceed. The 454 platform uses the same in each cycle and accumulate as the reactions proceed. The 454 platform uses the same sequencing chemistry but the reactions are performed in a fluidic system that makes it sequencing chemistry but the reactions are performed in a fluidic system that makes it possible to perform washing steps in between each cycle to remove byproducts. This has possible to perform washing steps in between each cycle to remove byproducts. This has increased the read lengths from the 40 bp of the original pyrosequencing setup to 700 bp at increased the read lengths from the 40 bp of the original pyrosequencing setup to 700 bp at present. present. Clonal amplification of fragmented and linker-ligated template molecules on the surface of Clonal amplification of fragmented and linker-ligated template molecules on the surface of 20-25 µm sepharose beads is performed within the droplets of an emulsion. DNA beads are 20-25 µm sepharose beads is performed within the droplets of an emulsion. DNA beads are enriched and loaded into individual titanium-coated picolitre volume wells that are etched enriched and loaded into individual titanium-coated picolitre volume wells that are etched into the surface of a fiber-optic slide. Beads are then embedded in smaller, enzyme-covered into the surface of a fiber-optic slide. Beads are then embedded in smaller, enzyme-covered beads and pyrosequencing is carried out by sequentially flooding the chip with different beads and pyrosequencing is carried out by sequentially flooding the chip with different reagents while imaging individual DNA-beads using a CCD camera. Because the method reagents while imaging individual DNA-beads using a CCD camera. Because the method uses natural non-terminated nucleotides, the only way to determine the length of uses natural non-terminated nucleotides, the only way to determine the length of homopolymer stretches in the target is to assess the intensity of the light signal. However, homopolymer stretches in the target is to assess the intensity of the light signal. However, the relationship between repeat length and signal intensity is only linear for homopolymeric the relationship between repeat length and signal intensity is only linear for homopolymeric stretches of up to 5-8 nucleotides70,243,244. Consequently, insertions and deletions are the stretches of up to 5-8 nucleotides70,243,244. Consequently, insertions and deletions are the most common errors in 454 sequence reads and the error rates for homopolymeric regions most common errors in 454 sequence reads and the error rates for homopolymeric regions and their surroundings are high due to the increased risk of carry-forward (premature and their surroundings are high due to the increased risk of carry-forward (premature incorporation of the nucleotide present in the homopolymer) and incomplete incorporation of the nucleotide present in the homopolymer) and incomplete extension243,244. extension243,244. Today, the throughput of 454 sequencing is relatively low and its cost per base is higher Today, the throughput of 454 sequencing is relatively low and its cost per base is higher than other MPS platforms. The power of 454 sequencing is its read length, which puts 454 than other MPS platforms. The power of 454 sequencing is its read length, which puts 454 in a niche of its own for de novo sequencing applications. Currently, 1M sequences with in a niche of its own for de novo sequencing applications. Currently, 1M sequences with average read length of over 700 bp can be obtained in a 23h-sequencing run245. average read length of over 700 bp can be obtained in a 23h-sequencing run245.

34 34 Julia Sandberg Julia Sandberg

4.3.2. SOLiD – chained ligation in color space 4.3.2. SOLiD – chained ligation in color space In Sequencing by Oligonucleotide Ligation and Detection (SOLiD) base interrogation is In Sequencing by Oligonucleotide Ligation and Detection (SOLiD) base interrogation is carried out by cyclic ligation of two-base color-coded probes, using templates that were carried out by cyclic ligation of two-base color-coded probes, using templates that were previously amplified on the surfaces of beads237. Library production and amplification are previously amplified on the surfaces of beads237. Library production and amplification are performed using very similar methods to those used in 454 sequencing, but following performed using very similar methods to those used in 454 sequencing, but following emPCR (cf. the section on “Template amplification”) and DNA bead enrichment, emPCR (cf. the section on “Template amplification”) and DNA bead enrichment, amplified DNA is chemically modified at the 3’ end to enable random immobilization of amplified DNA is chemically modified at the 3’ end to enable random immobilization of the beads on a glass slide for sequencing. By priming at the 5’ end close to the bead surface, the beads on a glass slide for sequencing. By priming at the 5’ end close to the bead surface, the hybridization of color-coded probes to the primer and the ligation of matching probes the hybridization of color-coded probes to the primer and the ligation of matching probes can be followed by washing and imaging to infer the identity of the known bases in the can be followed by washing and imaging to infer the identity of the known bases in the probe. The identities of the bases in the first two positions of each probe are known and are probe. The identities of the bases in the first two positions of each probe are known and are associated with specific fluorescent dyes. Chemical cleavage with silver ions removes the last associated with specific fluorescent dyes. Chemical cleavage with silver ions removes the last three bases and thereby also the fluorescent label. A series of ligation-steps is performed, three bases and thereby also the fluorescent label. A series of ligation-steps is performed, giving rise to equal numbers of color calls with three unknown bases between each color- giving rise to equal numbers of color calls with three unknown bases between each color- call. The extension product is then removed and the template is reset. A new sequencing call. The extension product is then removed and the template is reset. A new sequencing primer that binds to a site one position along the template is then hybridized and a new primer that binds to a site one position along the template is then hybridized and a new round of ligation cycles is performed. There are a variety of different implementations of round of ligation cycles is performed. There are a variety of different implementations of this method that use different numbers of chained ligations and sequencing primers; using this method that use different numbers of chained ligations and sequencing primers; using the most recent setups and instruments, read lengths of up to 75 bp can be attained246. the most recent setups and instruments, read lengths of up to 75 bp can be attained246. While SOLiD sequencing has lagged behind the alternatives (with the exception of Helicos) While SOLiD sequencing has lagged behind the alternatives (with the exception of Helicos) in terms of read lengths since it entered the market, this most recent upgrade greatly in terms of read lengths since it entered the market, this most recent upgrade greatly increases its general utility. increases its general utility. The two-base encoding confers inbuilt error correction and hence gives higher accuracy The two-base encoding confers inbuilt error correction and hence gives higher accuracy than one-base encoding232; SNPs give rise to color-changes in two positions rather than just than one-base encoding232; SNPs give rise to color-changes in two positions rather than just one. However, the technique requires data manipulation in color-space, whereby color- one. However, the technique requires data manipulation in color-space, whereby color- space sequence reads are aligned to a color-space reference genome; this requirement is space sequence reads are aligned to a color-space reference genome; this requirement is unique to SOLiD sequencing. Ligation-based methods also reduce problems with unique to SOLiD sequencing. Ligation-based methods also reduce problems with homopolymeric regions, which can give rise to insertions or deletions in sequence data with homopolymeric regions, which can give rise to insertions or deletions in sequence data with several other platforms. Instead, the most prominent sequencing error is substitution. The several other platforms. Instead, the most prominent sequencing error is substitution. The accuracy of the method can be increased from 99.94% to 99.99% by switching to a more accuracy of the method can be increased from 99.94% to 99.99% by switching to a more expensive protocol that uses a second, three-base encoding probe set. This method yields expensive protocol that uses a second, three-base encoding probe set. This method yields base-calls directly rather than color-calls246. base-calls directly rather than color-calls246.

4.3.3. Illumina – reversible terminators at their best 4.3.3. Illumina – reversible terminators at their best Thanks to a combination of respectable read lengths and high throughput, Illumina Thanks to a combination of respectable read lengths and high throughput, Illumina sequencing dominates the sequencing market today. The platform utilizes bridge sequencing dominates the sequencing market today. The platform utilizes bridge amplification (cf. the section on “Template amplification”) to obtain billions of clusters amplification (cf. the section on “Template amplification”) to obtain billions of clusters made up of identical single-stranded template molecules that are densely packed on a glass made up of identical single-stranded template molecules that are densely packed on a glass slide and ready to be sequenced. Illumina is a “sequencing by synthesis” technology, using slide and ready to be sequenced. Illumina is a “sequencing by synthesis” technology, using reversibly-terminated nucleotides (cf. the section on “Chemistry”). Briefly, the sequencing reversibly-terminated nucleotides (cf. the section on “Chemistry”). Briefly, the sequencing primer is hybridized to a template molecule, a mixture of reversibly terminated color-coded primer is hybridized to a template molecule, a mixture of reversibly terminated color-coded nucleotides is added, and one matching nucleotide is incorporated into each strand. This nucleotides is added, and one matching nucleotide is incorporated into each strand. This step is followed by washing, imaging, and de-blocking, and the whole process is repeated step is followed by washing, imaging, and de-blocking, and the whole process is repeated cyclically. 4-color readouts are obtained using a 2-laser system229. Paired-end sequencing cyclically. 4-color readouts are obtained using a 2-laser system229. Paired-end sequencing

35 35

can be done by switching the direction of the clusters on the slide, through an additional can be done by switching the direction of the clusters on the slide, through an additional bridge amplification step, followed by a second round of sequencing cycles. Illumina bridge amplification step, followed by a second round of sequencing cycles. Illumina sequencing currently offers 100 +100 bp paired-end reads or 150 bp long single end reads sequencing currently offers 100 +100 bp paired-end reads or 150 bp long single end reads from three billion DNA clusters247. The most common sequencing errors are substitutions, from three billion DNA clusters247. The most common sequencing errors are substitutions, which occur when DNA clusters become out-of-sync due to double or incomplete which occur when DNA clusters become out-of-sync due to double or incomplete extension. extension.

4.3.4. Complete Genomics – outsourced sequencing of DNA 4.3.4. Complete Genomics – outsourced sequencing of DNA nano-balls nano-balls The technology used by Complete Genomics is a comparatively inaccessible and not The technology used by Complete Genomics is a comparatively inaccessible and not amenable to modification by the customer, since the company only offers a sequencing amenable to modification by the customer, since the company only offers a sequencing service. Sample preparation and sequencing is achieved by subjecting fragmented DNA service. Sample preparation and sequencing is achieved by subjecting fragmented DNA using a number of cutting - ligation and circularization processes to yield, circularized using a number of cutting - ligation and circularization processes to yield, circularized strands of target DNA containing 4-8 different adapter sequences in between short stretches strands of target DNA containing 4-8 different adapter sequences in between short stretches of template sequence. The adapter sequences are then used as starting points for of template sequence. The adapter sequences are then used as starting points for sequencing. The circular template is amplified by rolling circle amplification to obtain sequencing. The circular template is amplified by rolling circle amplification to obtain numerous copies of the original molecule that are joined in a head-to-toe fashion174 in long numerous copies of the original molecule that are joined in a head-to-toe fashion174 in long single-stranded DNA molecules. These self-assemble into DNA Nano-balls (DNB) that are single-stranded DNA molecules. These self-assemble into DNA Nano-balls (DNB) that are about 200 nm in size. The DNA Nano-balls are then positioned on patterned nanoarrays, about 200 nm in size. The DNA Nano-balls are then positioned on patterned nanoarrays, aligned with the pixels of the imaging camera. Sequencing of each DNB is carried out by an aligned with the pixels of the imaging camera. Sequencing of each DNB is carried out by an unchained ligation technique called combinatorial probe anchor ligation (cPAL) unchained ligation technique called combinatorial probe anchor ligation (cPAL) chemistry236. An adapter-complementary anchor primer is hybridized to the DNB and chemistry236. An adapter-complementary anchor primer is hybridized to the DNB and degenerate nonamer (9-unit)-probes containing one known, color-coded nucleotide in degenerate nonamer (9-unit)-probes containing one known, color-coded nucleotide in position one are hybridized and ligated to the DNB. After washing and fluorescent readout, position one are hybridized and ligated to the DNB. After washing and fluorescent readout, the extended construct is removed from the template and the reaction is repeated, this time the extended construct is removed from the template and the reaction is repeated, this time using probes with a known base in position two. The step is repeated five times to identify using probes with a known base in position two. The step is repeated five times to identify the bases in positions 1-5 and then a new adapter sequence containing five degenerate bases the bases in positions 1-5 and then a new adapter sequence containing five degenerate bases in the end is used to interrogate nucleotides in position 6-10, in a similar manner. Up to ten in the end is used to interrogate nucleotides in position 6-10, in a similar manner. Up to ten bases may be read from each adapter-DNA junction, ultimately yielding a read length of 35 bases may be read from each adapter-DNA junction, ultimately yielding a read length of 35 + 35 bp from each end of the original DNA fragment. The sequencing method was first + 35 bp from each end of the original DNA fragment. The sequencing method was first demonstrated by sequencing three human genomes236. demonstrated by sequencing three human genomes236. The business model adopted by Compete Genomics has been available since November The business model adopted by Compete Genomics has been available since November 2010, and is very different to that used with other MPS platforms. Instead of selling 2010, and is very different to that used with other MPS platforms. Instead of selling instruments and reagents, Complete Genomics offers a sequencing service that specialized instruments and reagents, Complete Genomics offers a sequencing service that specialized in the sequencing of entire human genomes. Customers send in a DNA sample and the in the sequencing of entire human genomes. Customers send in a DNA sample and the whole process of library preparation, sequencing, assembly and variant calling is done by whole process of library preparation, sequencing, assembly and variant calling is done by the company220. For $5,000-$8,000 per genome, customers receive sequence data with 40x the company220. For $5,000-$8,000 per genome, customers receive sequence data with 40x coverage in less than three months248,249. The low price stems from the low cost of the coverage in less than three months248,249. The low price stems from the low cost of the reagents used and the extreme throughput of the system. Complete Genomics has already reagents used and the extreme throughput of the system. Complete Genomics has already sequenced over 1,000 high-coverage genomes for its customers249. sequenced over 1,000 high-coverage genomes for its customers249. Due to the closed nature of the business model of Complete Genomics, information on raw Due to the closed nature of the business model of Complete Genomics, information on raw sequence read quality has been difficult to obtain. An educated guess would be that the sequence read quality has been difficult to obtain. An educated guess would be that the most common errors obtained from unchained ligation, which has no error propagation or most common errors obtained from unchained ligation, which has no error propagation or phasing, would be substitutions due to hybridization and ligation of the wrong probes, phasing, would be substitutions due to hybridization and ligation of the wrong probes, giving rise to miscalls. giving rise to miscalls.

36 36 Julia Sandberg Julia Sandberg

4.3.5. Ion Torrent – moving beyond light 4.3.5. Ion Torrent – moving beyond light The first “post-light sequencing” platform was recently presented and has since then The first “post-light sequencing” platform was recently presented and has since then received significant attention. The platform is basically a miniaturized pH meter, which received significant attention. The platform is basically a miniaturized pH meter, which monitors sequencing by synthesis on a large number of DNA templates, in parallel. When a monitors sequencing by synthesis on a large number of DNA templates, in parallel. When a phosphodiester bond is formed in the nucleotide incorporated into a growing strand, a phosphodiester bond is formed in the nucleotide incorporated into a growing strand, a proton is released from the 3’-OH group of the primer strand and rapidly diffuses away proton is released from the 3’-OH group of the primer strand and rapidly diffuses away from the DNA molecule, which has gained one negative charge unit. Thus, by monitoring from the DNA molecule, which has gained one negative charge unit. Thus, by monitoring the evolution of charge, it is possible to detect nucleotide incorporation events without the evolution of charge, it is possible to detect nucleotide incorporation events without fluorescent labeling or enzymatic cascades that generate luminescence. fluorescent labeling or enzymatic cascades that generate luminescence. The viability of detecting nucleotide incorporation by monitoring charge perturbation was The viability of detecting nucleotide incorporation by monitoring charge perturbation was first demonstrated in 2006 by Pourmand and coworkers, using DNA immobilized on the first demonstrated in 2006 by Pourmand and coworkers, using DNA immobilized on the surface of gold-coated electrodes and a voltage clamp amplifier250,251. Since then the surface of gold-coated electrodes and a voltage clamp amplifier250,251. Since then the technique was further developed to allow sequencing en masse; a commercial platform technique was further developed to allow sequencing en masse; a commercial platform named the “Ion Torrent” system entered the market at the end of 2010228,252. Sample named the “Ion Torrent” system entered the market at the end of 2010228,252. Sample preparation is quite similar to that used in Roche/454 sequencing, involving emulsion PCR preparation is quite similar to that used in Roche/454 sequencing, involving emulsion PCR amplification of adapter-ligated templates on 2 µm acrylamide beads, loading of DNA- amplification of adapter-ligated templates on 2 µm acrylamide beads, loading of DNA- beads into wells, and the cyclical addition of individual types of natural nucleotides one at a beads into wells, and the cyclical addition of individual types of natural nucleotides one at a time, with washing steps in between each addition. However, the reactions are carried out time, with washing steps in between each addition. However, the reactions are carried out in wells in a semiconductor chip, of the kind used in computers and mobile phones; these in wells in a semiconductor chip, of the kind used in computers and mobile phones; these are very cheap to produce. Since the system relies on the direct detection of voltage are very cheap to produce. Since the system relies on the direct detection of voltage changes, no imaging is involved, which avoids the need for expensive optical capture changes, no imaging is involved, which avoids the need for expensive optical capture systems and makes Ion Torrent sequencing simpler, much faster, and more scalable than systems and makes Ion Torrent sequencing simpler, much faster, and more scalable than other wash-and-scan approaches. Its throughput is currently still lower than that of other wash-and-scan approaches. Its throughput is currently still lower than that of established short read platforms (but higher than that of 454), but the manufacturers expect established short read platforms (but higher than that of 454), but the manufacturers expect that increasing the size of the semiconductor chip will allow for increased throughput in the that increasing the size of the semiconductor chip will allow for increased throughput in the near future. The average read length is currently 100 bp, but 200 bp reads have been near future. The average read length is currently 100 bp, but 200 bp reads have been acquired228 and as of 2012 an increase to 400 bp-reads is foreseen. According to Ion acquired228 and as of 2012 an increase to 400 bp-reads is foreseen. According to Ion Torrent, the per-base read accuracy is 99.5%, but as with 454 sequencing, homopolymers Torrent, the per-base read accuracy is 99.5%, but as with 454 sequencing, homopolymers are problematic: 5-mers are called with 97,5% accuracy, and the probability of error rises are problematic: 5-mers are called with 97,5% accuracy, and the probability of error rises as the repeats get longer252. as the repeats get longer252. In contrast to 454 sequencing, Ion Torrent does not require any enzymatic degradation of In contrast to 454 sequencing, Ion Torrent does not require any enzymatic degradation of the nucleotides (possibly due to the fact that the relevant patent is owned by Qiagen, who the nucleotides (possibly due to the fact that the relevant patent is owned by Qiagen, who acquired it when they bought the rights to pyrosequencing). The Ion Torrent proof of acquired it when they bought the rights to pyrosequencing). The Ion Torrent proof of concept publication claims that the well size is small enough to allow rapid diffusion of concept publication claims that the well size is small enough to allow rapid diffusion of nucleotides out of the wells228. However, since 454 sequencing and Ion Torrent have many nucleotides out of the wells228. However, since 454 sequencing and Ion Torrent have many technical similarities, the discrepancy between their maximum read lengths suggests that technical similarities, the discrepancy between their maximum read lengths suggests that the accumulation of leftover material does indeed limit the Ion Torrent read length. It is the accumulation of leftover material does indeed limit the Ion Torrent read length. It is conceivable that this could be ameliorated by using electrical charges to facilitate nucleotide conceivable that this could be ameliorated by using electrical charges to facilitate nucleotide removal while washing. removal while washing.

4.3.6. Helicos – single molecule sequencing in its infancy 4.3.6. Helicos – single molecule sequencing in its infancy Amplification prior to sequencing is problematic because it can introduce errors and bias in Amplification prior to sequencing is problematic because it can introduce errors and bias in the sequenced clones97,253,254. Notably, because amplification is affected by fragment length the sequenced clones97,253,254. Notably, because amplification is affected by fragment length and GC content97,227,254,255 amplification before sequencing may skew the coverage of the and GC content97,227,254,255 amplification before sequencing may skew the coverage of the target genome. target genome.

37 37

The Helicos Genetic Analysis technique was the first single-molecule sequencing platform The Helicos Genetic Analysis technique was the first single-molecule sequencing platform to become available, this was in 2008230. Single molecule detection makes Helicos exclusive, to become available, this was in 2008230. Single molecule detection makes Helicos exclusive, although it relies on chemistry and procedures that are rather similar to those used in although it relies on chemistry and procedures that are rather similar to those used in previously-described massive sequencing techniques. Fragmented DNA templates are 3’ previously-described massive sequencing techniques. Fragmented DNA templates are 3’ polyadenylated, Cy3-labeled and blocked by the incorporation of Cy3 ddTTP, and then polyadenylated, Cy3-labeled and blocked by the incorporation of Cy3 ddTTP, and then captured onto a planar surface by covalently bound “5’ down” poly(dT) probes230. captured onto a planar surface by covalently bound “5’ down” poly(dT) probes230. Sequencing of surface-attached templates is achieved by incorporating reversibly-inhibited Sequencing of surface-attached templates is achieved by incorporating reversibly-inhibited nucleotides, in a step-wise one-color setup. The patented nucleotide analogs carry an nucleotides, in a step-wise one-color setup. The patented nucleotide analogs carry an inhibitor and a fluorescent label, both of which are connected to the base by the same inhibitor and a fluorescent label, both of which are connected to the base by the same cleavable linker; this makes it possible to remove both the inhibitor and the label in a single cleavable linker; this makes it possible to remove both the inhibitor and the label in a single step that affects only one position on the nucleotide and leaves it ready for the next cycle231. step that affects only one position on the nucleotide and leaves it ready for the next cycle231. Imaging of single Cy5 fluorophores that have been introduced into the template strands in Imaging of single Cy5 fluorophores that have been introduced into the template strands in each cycle is achieved using a Charged Coupled Device (CCD). each cycle is achieved using a Charged Coupled Device (CCD). The incorporation of unlabeled ‘dark bases’ results in deletion sequencing errors; the The incorporation of unlabeled ‘dark bases’ results in deletion sequencing errors; the accuracy of the sequence data can be increased by increasing the amount of time spent on accuracy of the sequence data can be increased by increasing the amount of time spent on the sequencing process. This requires a slightly different library preparation protocol that the sequencing process. This requires a slightly different library preparation protocol that involves adapter ligation prior to 3’ polyadenylation in order to introduce a priming site at involves adapter ligation prior to 3’ polyadenylation in order to introduce a priming site at the distal end of the fragment. This site is captured in the same way as above; however, by the distal end of the fragment. This site is captured in the same way as above; however, by extending the immobilized probes, it is possible to obtain copies of every template that is extending the immobilized probes, it is possible to obtain copies of every template that is covalently attached to the glass surface. These copies are primed using an adapter- covalently attached to the glass surface. These copies are primed using an adapter- complementary primer and can be sequenced repeatedly by melting off the extended complementary primer and can be sequenced repeatedly by melting off the extended primer and restarting the sequencing reaction230. The same library preparation procedure primer and restarting the sequencing reaction230. The same library preparation procedure also makes it possible to perform paired-end reads by sequencing the hybridized template also makes it possible to perform paired-end reads by sequencing the hybridized template rather than performing the first copying step. Moreover, if the polymerase is exchanged for rather than performing the first copying step. Moreover, if the polymerase is exchanged for a reverse transcriptase, the same setup can be used for the direct sequencing of RNA a reverse transcriptase, the same setup can be used for the direct sequencing of RNA molecules without the need for erroneous and bias-prone cDNA synthesis and further molecules without the need for erroneous and bias-prone cDNA synthesis and further amplification256,257. amplification256,257. The amplification-free template preparation protocol is an important feature of the Helicos The amplification-free template preparation protocol is an important feature of the Helicos technology and has proven particularly useful for the sequencing of short and technology and has proven particularly useful for the sequencing of short and damaged/degraded DNA fragments from samples such as paraffin-embedded tissue or damaged/degraded DNA fragments from samples such as paraffin-embedded tissue or ancient remains. It is useful because compromised DNA quality can give rise to elevated ancient remains. It is useful because compromised DNA quality can give rise to elevated bias in library preparations that involve PCR amplification258. bias in library preparations that involve PCR amplification258. The single-molecule nature of the platform provides an advantage in terms of library The single-molecule nature of the platform provides an advantage in terms of library preparation time and costs and absence of an error-prone PCR step. However, it is a wash- preparation time and costs and absence of an error-prone PCR step. However, it is a wash- and-scan based platform, which increases the time and reagent cost of the sequencing and-scan based platform, which increases the time and reagent cost of the sequencing reaction. In fact, the single-molecule nature of the Helicos platform works to its reaction. In fact, the single-molecule nature of the Helicos platform works to its disadvantage as seen in its high error rate. Deletion is the most common error type, even disadvantage as seen in its high error rate. Deletion is the most common error type, even with two-pass sequencing, and the substantial error rates (3-7% for one pass and 2-5% for with two-pass sequencing, and the substantial error rates (3-7% for one pass and 2-5% for two passes)230,259 must be taken into account, although the impact of this is mitigated to two passes)230,259 must be taken into account, although the impact of this is mitigated to some extent by the high throughput of the instrument (600 M-1 B reads per run)259-261. The some extent by the high throughput of the instrument (600 M-1 B reads per run)259-261. The problems relating to sequencing errors also limit the method’s read length to 32-33 bp257,259. problems relating to sequencing errors also limit the method’s read length to 32-33 bp257,259. Reads this short present challenges in terms of stitching together a genome, especially in Reads this short present challenges in terms of stitching together a genome, especially in repetitive regions259. Because it has the shortest read length of all contemporary platforms repetitive regions259. Because it has the shortest read length of all contemporary platforms and the highest error rates, along with certain problems with instrument robustness, Helicos and the highest error rates, along with certain problems with instrument robustness, Helicos is the least popular of the currently-available platforms262. is the least popular of the currently-available platforms262.

38 38 Massive sequening by wash-and-scan Massive sequening by wash-and-scan Julia Sandberg Julia Sandberg Emulsion PCR Bridge amplification Rollling circle amplification Emulsion PCR Bridge amplification Rollling circle amplification Ligate Ligate Primers half- Primers half- immobilized adapters immobilized adapters Repeat Repeat Cut with Cut with restriction restriction enzyme enzyme Cyclic Circularised Cyclic Circularised amplification template with amplification template with anchors anchors

Primers on beads RCA DNA Primers on beads RCA DNA Nanoball Nanoball PCR in (DNB) PCR in (DNB) droplets droplets

DNA covered beads are sequenced DNA covered beads are sequenced DNBs are sequenced DNBs are sequenced SOLiD Cleavage SOLiD Cleavage Color-encoded Complete Genomics Color-encoded Complete Genomics oligos ligated oligos ligated Clusters are sequenced Ligation of Clusters are sequenced Ligation of probeset 4 color-encoded probeset 4 color-encoded oligos oligos 454 APS 454 APS Wash & Wash & four-color four-color Sulphurylase Wash & Sulphurylase Wash & imaging four-color imaging four-color Luciferin Luciferin PP released at imaging PP released at imaging i ATP i ATP dNTP Repeat dNTP Repeat Cleave off incorporation PPi with new Cleave off incorporation PPi with new last three probeset Reset reaction last three probeset Reset reaction Luciferase Luciferase bases & dye (total 5) bases & dye (total 5)

Apyrase Apyrase Repeat Sequencing primer Repeat Sequencing primer Repeat chained for all Repeat chained for all ligation seven times with another offset ligation seven times with another offset Sequential addition of one dNTP at a time anchors Sequential addition of one dNTP at a time anchors Repeat for Sequencing Repeat for Sequencing five primers with five primers with different offsets different offsets sequencing Illumina sequencing Illumina primers primers Second Base Second Base AC G T Introduce Wash & Cleave dye AC G T Introduce Wash & Cleave dye

A Base First terminated four-color & A Base First terminated four-color & C nucleotides imaging terminator C nucleotides imaging terminator G G T T

Ion Torrent Ion Torrent Next base is read Next base is read

+ End-labeled Helicos + End-labeled Helicos H fragments are Single H fragments are Single captured & identified captured & identified labeled Wash & labeled Wash & + pH change nucleotides Cleave + pH change nucleotides Cleave H released at + one-color dye & H released at + one-color dye & dNTP H imaging inhibitor dNTP H imaging inhibitor incorporation incorporation

A A

Sequential addition of one dNTP at a time Sequential addition of one dNTP at a time

Fragments are sequenced Fragments are sequenced DNA polymerase Next type of dNTP DNA polymerase Next type of dNTP

39 39

4.4. Massive sequencing in real time 4.4. Massive sequencing in real time 4.4.1. Pacific Biosciences – immobilized polymerase in a tiny 4.4.1. Pacific Biosciences – immobilized polymerase in a tiny well well Real-time sequencing of single DNA molecules first became available when PacBio RS Real-time sequencing of single DNA molecules first became available when PacBio RS started shipping in April 2011. As described above, there are various tricks that can be used started shipping in April 2011. As described above, there are various tricks that can be used to monitor individual DNA polymerases as they synthesize a complementary DNA strand. to monitor individual DNA polymerases as they synthesize a complementary DNA strand. Pacific Biosciences uses ZeroModeWaveguides (ZMW) to obtain a nanoscopic sequencing Pacific Biosciences uses ZeroModeWaveguides (ZMW) to obtain a nanoscopic sequencing chamber containing one DNA polymerase each, to enable real-time monitoring of DNA chamber containing one DNA polymerase each, to enable real-time monitoring of DNA polymerization for up to 15,000 bp234,242,263. polymerization for up to 15,000 bp234,242,263. The polymerase used is a modified variant of φ29, which has been engineered to work more The polymerase used is a modified variant of φ29, which has been engineered to work more slowly than the natural enzyme. In conjunction with the fact that the reaction is performed slowly than the natural enzyme. In conjunction with the fact that the reaction is performed at low temperatures, this reduces the natural speed of nucleotide incorporation to about one at low temperatures, this reduces the natural speed of nucleotide incorporation to about one nucleotide per second, which is sufficiently slow that each incorporation event can be nucleotide per second, which is sufficiently slow that each incorporation event can be captured on film. Fluorescently-labeled nucleotides diffuse freely into and out of the ZMW; captured on film. Fluorescently-labeled nucleotides diffuse freely into and out of the ZMW; the only nucleotides in the system that are not in motion are those that are being held in the only nucleotides in the system that are not in motion are those that are being held in place by the immobilized polymerase immediately prior to their incorporation. Because the place by the immobilized polymerase immediately prior to their incorporation. Because the fluorescent dye is attached to the terminal phosphate group of the nucleotide rather than to fluorescent dye is attached to the terminal phosphate group of the nucleotide rather than to the base, it is automatically cleaved off by the polymerase at incorporation and leaves no the base, it is automatically cleaved off by the polymerase at incorporation and leaves no trace in the growing DNA strand234. trace in the growing DNA strand234. Template preparation is PCR-free and involves DNA fragmentation and end repair, Template preparation is PCR-free and involves DNA fragmentation and end repair, followed by ligation of looped adapters to effect template circularization. The followed by ligation of looped adapters to effect template circularization. The circularization of the template is critical, since it “locks” the polymerase on the template circularization of the template is critical, since it “locks” the polymerase on the template and enables repeated sequencing of the template. Sequencing errors are common in this and enables repeated sequencing of the template. Sequencing errors are common in this technology and occur for two reasons. First, if a (correct) nucleotide binds to the polymerase technology and occur for two reasons. First, if a (correct) nucleotide binds to the polymerase active site but is not incorporated, it may be interpreted as an incorporation event, creating active site but is not incorporated, it may be interpreted as an incorporation event, creating an insertion error in the sequence data. Second, if the incorporation event occurs too an insertion error in the sequence data. Second, if the incorporation event occurs too quickly, it may be overlooked, resulting in a deletion error. The accuracy for each quickly, it may be overlooked, resulting in a deletion error. The accuracy for each nucleotide position in a single-end read is only 85%, which makes re-reading of the circular nucleotide position in a single-end read is only 85%, which makes re-reading of the circular template very important in obtaining accurate sequences263,264. However, light damage template very important in obtaining accurate sequences263,264. However, light damage limits the life span of the polymerase; because of this, it is currently only possible to limits the life span of the polymerase; because of this, it is currently only possible to sequence short templates for circular consensus. In other words, one must choose between sequence short templates for circular consensus. In other words, one must choose between accuracy and read length. Templates of 1-6 kb currently yield single reads with moderate accuracy and read length. Templates of 1-6 kb currently yield single reads with moderate accuracy263, while templates of 250 bp -1 kb can be sequenced repeatedly to yield a circular accuracy263, while templates of 250 bp -1 kb can be sequenced repeatedly to yield a circular consensus sequence with a lower error rate. Extended read lengths can be achieved using consensus sequence with a lower error rate. Extended read lengths can be achieved using “strobed reads”. By turning the laser illumination on and off at certain intervals, the life “strobed reads”. By turning the laser illumination on and off at certain intervals, the life span of the polymerase is prolonged and a scaffold of linked sequence reads is obtained span of the polymerase is prolonged and a scaffold of linked sequence reads is obtained from a single template. “Strobed reads” can currently be used to obtain sequence from a single template. “Strobed reads” can currently be used to obtain sequence information from templates between 6-10 kb; this can be useful in de novo assembly or for information from templates between 6-10 kb; this can be useful in de novo assembly or for resolving repetitive regions. Sequencing is done in a strip of 8 SMRTcells; each SMRTcell resolving repetitive regions. Sequencing is done in a strip of 8 SMRTcells; each SMRTcell contains 150,000 ZMWs, of which one third will contain a single DNA polymerase234,242. contains 150,000 ZMWs, of which one third will contain a single DNA polymerase234,242. The number of concurrent sequencing reactions is currently limited by the 75k laser The number of concurrent sequencing reactions is currently limited by the 75k laser beamlets that are needed to individually illuminate a single ZMW, although it is possible to beamlets that are needed to individually illuminate a single ZMW, although it is possible to analyze several SMRTcells sequentially. analyze several SMRTcells sequentially.

40 40 Julia Sandberg Julia Sandberg

The long read lengths attained with the Pacific Biosciences method should be useful for The long read lengths attained with the Pacific Biosciences method should be useful for finding fused genes and for sequencing bacterial genomes263,265. Moreover, it has been finding fused genes and for sequencing bacterial genomes263,265. Moreover, it has been shown that by analyzing the kinetic signature obtained by repeated sequencing of a sample, shown that by analyzing the kinetic signature obtained by repeated sequencing of a sample, DNA modifications such as methylation patterns can be determined264. A recently-reported DNA modifications such as methylation patterns can be determined264. A recently-reported study exploited the flexibility of the Pacific Biosciences sequencing instrument in an study exploited the flexibility of the Pacific Biosciences sequencing instrument in an interesting way: by exchanging the DNA polymerase for a ribosome and adding labeled interesting way: by exchanging the DNA polymerase for a ribosome and adding labeled tRNAs instead of nucleotides, it was possible to observe mRNA translation in real time266. tRNAs instead of nucleotides, it was possible to observe mRNA translation in real time266. The flexibility of the platform is appealing and a similar approach could likely be adopted The flexibility of the platform is appealing and a similar approach could likely be adopted for other enzymes such as reverse transcriptase or cellular receptors, thus it will be for other enzymes such as reverse transcriptase or cellular receptors, thus it will be interesting to follow the future applications of this platform. interesting to follow the future applications of this platform.

4.4.2. Nanopore sequencing – electrical measurement through 4.4.2. Nanopore sequencing – electrical measurement through a pore a pore Nanopore sequencing is still in an early stage of development, but offers the possibility of Nanopore sequencing is still in an early stage of development, but offers the possibility of achieving high speeds and long reads (>5 kb)267 with unlabeled, unamplified DNA, achieving high speeds and long reads (>5 kb)267 with unlabeled, unamplified DNA, meaning it could potentially have many significant advantages over contemporary meaning it could potentially have many significant advantages over contemporary sequencing platforms. sequencing platforms. The central concept in nanopore sequencing is to monitor the translocation of a DNA The central concept in nanopore sequencing is to monitor the translocation of a DNA molecule through a nanometer-sized pore in a membrane. By applying an electrical field to molecule through a nanometer-sized pore in a membrane. By applying an electrical field to the membrane, the negatively charged DNA molecule can be threaded through the pore in the membrane, the negatively charged DNA molecule can be threaded through the pore in sequential order. The identity of the individual bases passing through the pore can be read sequential order. The identity of the individual bases passing through the pore can be read out in various ways. Nucleotide bases (i.e. dsDNA, ssDNA and free nucleotides) passing out in various ways. Nucleotide bases (i.e. dsDNA, ssDNA and free nucleotides) passing through a nanopore block the current that naturally flows through the pore in a through a nanopore block the current that naturally flows through the pore in a characteristic way (charge blocking), giving rise to an identifiable electrical signature267. characteristic way (charge blocking), giving rise to an identifiable electrical signature267. Alternatively, the nucleotides could be labeled with a fluorescent tag that would be read out Alternatively, the nucleotides could be labeled with a fluorescent tag that would be read out optically as the nucleotide traverses the pore268. Current approaches to nanopore optically as the nucleotide traverses the pore268. Current approaches to nanopore sequencing use either natural or mutant protein nanopores, or pores manufactured in solid- sequencing use either natural or mutant protein nanopores, or pores manufactured in solid- state membranes. state membranes. The translocation velocity through the pore is extremely high; with the widely-used protein The translocation velocity through the pore is extremely high; with the widely-used protein nanopore α-hemolysin it is >1 nucleotide/10 µs, and with solid-state membranes it is >1 nanopore α-hemolysin it is >1 nucleotide/10 µs, and with solid-state membranes it is >1 nucleotide/10 ns267. It can thus be difficult to obtain satisfactory resolution in nanopore nucleotide/10 ns267. It can thus be difficult to obtain satisfactory resolution in nanopore sequencing; this issue has been addressed in various ways. It has been proposed that sequencing; this issue has been addressed in various ways. It has been proposed that converting the DNA code into a longer (binary) code would be one way of getting around converting the DNA code into a longer (binary) code would be one way of getting around the issue of the single nucleotide resolution of nanopores; this approach could be used in the issue of the single nucleotide resolution of nanopores; this approach could be used in combination with fluorescent detection by the hybridization of color-coded probes to each combination with fluorescent detection by the hybridization of color-coded probes to each DNA segment268. DNA segment268. Slowing down the translocation through the nanopore would also make signal Slowing down the translocation through the nanopore would also make signal interpretation easier. An engineered α-hemolysin nanopore has been modified to briefly interpretation easier. An engineered α-hemolysin nanopore has been modified to briefly bind each nucleotide as they pass through long enough to enable nucleotide identification bind each nucleotide as they pass through long enough to enable nucleotide identification based on charge blocking. It has been demonstrated that nucleotide identification by using based on charge blocking. It has been demonstrated that nucleotide identification by using an exonuclease to sequentially chew off bases from a DNA template and ‘spit’ them into the an exonuclease to sequentially chew off bases from a DNA template and ‘spit’ them into the nearby nanopore is possible269,270. The difficulty appears to lie in ensuring that every nearby nanopore is possible269,270. The difficulty appears to lie in ensuring that every nucleotide reaches the nanopore. Another protein nanopore (MspA), which is narrow nucleotide reaches the nanopore. Another protein nanopore (MspA), which is narrow enough to let ssDNA through, but temporarily halts at a double stranded patch, has enough to let ssDNA through, but temporarily halts at a double stranded patch, has

41 41

recently been engineered and used to distinguish nucleotides separated by double-stranded recently been engineered and used to distinguish nucleotides separated by double-stranded sections in a promising proof-of-concept study271. sections in a promising proof-of-concept study271. While nanopore methods have yet to be fully developed, the possibility of reading DNA While nanopore methods have yet to be fully developed, the possibility of reading DNA sequences without relying on sequencing by synthesis or ligation and without enzymatic sequences without relying on sequencing by synthesis or ligation and without enzymatic cascades or template amplification is very tantalizing, as it would potentially allow for very cascades or template amplification is very tantalizing, as it would potentially allow for very rapid and inexpensive sequencing; the considerable effort currently being expended on rapid and inexpensive sequencing; the considerable effort currently being expended on studying and developing this technique is wholly justified. studying and developing this technique is wholly justified.

>&,9)+%'#)+54)+%1+@4+,5&,9 >&,9)+%'#)+54)+%1+@4+,5&,9 <05&:&5%=&+,5+1 >$0/)&97$ ?0,-#/+%1+@4+,5&,9 <05&:&5%=&+,5+1 >$0/)&97$ ?0,-#/+%1+@4+,5&,9

2.+3)0(+)+"% 2.+3)0(+)+"% ,45)+#$&"+1%0/+% ,45)+#$&"+1%0/+% ,#$%+65&$+"%7+/+ ,#$%+65&$+"%7+/+ 2.+3)0(+)+"% 2.+3)0(+)+"% ,45)+#$&"+1%0/+% ,45)+#$&"+1%0/+% ,#$%+65&$+%7+/+ !"#$%&''#(#)&*+"% ,#$%+65&$+%7+/+ !"#$%&''#(#)&*+"% #,%-#).'+/01+ #,%-#).'+/01+ A4//+,$%()#5B09+%&1% A4//+,$%()#5B09+%&1% "+$+5$+" "+$+5$+"

801+/%)&97$% 801+/%)&97$% !"#$%+65&$+1% !"#$%+65&$+1% /+057+1%$7&1%:0/% /+057+1%$7&1%:0/% :)4#/#-7#/+%#,% :)4#/#-7#/+%#,% &,$#%$7+%;+))% - &,$#%$7+%;+))% - ,45)+#$&"+%$70$%&1% ,45)+#$&"+%$70$%&1% (+&,9%&,5#/-#3 (+&,9%&,5#/-#3 /0$+" A + /0$+" A + <#).'+/01+%&''#(&)&*+"%0$% <#).'+/01+%&''#(&)&*+"%0$% ;+))%(#$$#' ;+))%(#$$#'

4.5. Choosing sequencing platform(s) 4.5. Choosing sequencing platform(s)

The sequencing platform of choice usually depends on the application at hand. The The sequencing platform of choice usually depends on the application at hand. The enormous sequence capacity offered by short-read platforms such as HiSeq, SOLiD (and to enormous sequence capacity offered by short-read platforms such as HiSeq, SOLiD (and to some extent Helicos) enables mutational detection by targeted (i.e. whole exome) some extent Helicos) enables mutational detection by targeted (i.e. whole exome) resequencing as well as variant discovery by resequencing of whole genomes. The massive resequencing as well as variant discovery by resequencing of whole genomes. The massive number of reads obtained from these platforms also makes them more useful for number of reads obtained from these platforms also makes them more useful for quantitative expression analysis by RNA sequencing. Helicos, is due to its high error rate quantitative expression analysis by RNA sequencing. Helicos, is due to its high error rate and short read lengths, is currently not a system of choice in many labs, but it is the only and short read lengths, is currently not a system of choice in many labs, but it is the only platform that enables direct RNA sequencing. The longer read lengths of the 454 system platform that enables direct RNA sequencing. The longer read lengths of the 454 system give it a special niche in de novo sequencing, since long sequence reads are valuable for give it a special niche in de novo sequencing, since long sequence reads are valuable for stitching together new genomes. However, the throughput of this platform is comparatively stitching together new genomes. However, the throughput of this platform is comparatively low and its cost per base is high. A combination of several sequencing platforms may offer low and its cost per base is high. A combination of several sequencing platforms may offer the best combination of long and massive reads for whole genome sequencing. the best combination of long and massive reads for whole genome sequencing.

42 42 Julia Sandberg Julia Sandberg

The cost per base of DNA sequencing is continuously decreasing due to industrial The cost per base of DNA sequencing is continuously decreasing due to industrial competition in combination with technical innovations, which are constantly increasing the competition in combination with technical innovations, which are constantly increasing the data output from one instrument run. However the runtime of established short-read data output from one instrument run. However the runtime of established short-read platforms is measured in weeks rather than hours and the large throughput of most MPS platforms is measured in weeks rather than hours and the large throughput of most MPS platforms does not match perfectly with sequencing projects of moderate size or clinical platforms does not match perfectly with sequencing projects of moderate size or clinical applications in which rapid results are needed (e.g. preimplantation genetics). To meet the applications in which rapid results are needed (e.g. preimplantation genetics). To meet the demand for quick and simple sequencing, several scaled-down sequencing platforms have demand for quick and simple sequencing, several scaled-down sequencing platforms have been announced (e.g. 454 GS Junior, Illumina MiSeq), which yield a fraction of the reads been announced (e.g. 454 GS Junior, Illumina MiSeq), which yield a fraction of the reads (GS Junior >100,000 reads, MiSeq >3.5M reads) of the large instruments in days rather (GS Junior >100,000 reads, MiSeq >3.5M reads) of the large instruments in days rather than weeks. The Ion Torrent platform, with its low-cost 2h-runs and comparably low data than weeks. The Ion Torrent platform, with its low-cost 2h-runs and comparably low data output is also a suitable alternative for smaller labs or experiments. output is also a suitable alternative for smaller labs or experiments.

Table 1. Overview over sequencing platforms 2011 Table 1. Overview over sequencing platforms 2011

Real-time Single molecule Template amplification Average Throughput Price per Real-time Single molecule Template amplification Average Throughput Price per Platform Chemistry Reads per run Platform Chemistry Reads per run detection detection method Readlength (bp) (Gb/day) base detection detection method Readlength (bp) (Gb/day) base

First Generation Sanger sequencing Enzymatic chain termination No No Cloning & PCR 1000 96-384 0.0006 $$$$ First Generation Sanger sequencing Enzymatic chain termination No No Cloning & PCR 1000 96-384 0.0006 $$$$ Pyrosequencing SBS/Pyrosequencing No No PCR 20-40 96 0.00003 $$$$$ Pyrosequencing SBS/Pyrosequencing No No PCR 20-40 96 0.00003 $$$$$ Second generation Roche/454 SBS/Pyrosequencing No No emPCR 700 1 M 0.4-0.6 $$$ Second generation Roche/454 SBS/Pyrosequencing No No emPCR 700 1 M 0.4-0.6 $$$ Illumina/Solexa SBS/Four-color Cyclic Reversible Termination No No Bridge amplification 100+100 3000 M 25 $$ Illumina/Solexa SBS/Four-color Cyclic Reversible Termination No No Bridge amplification 100+100 3000 M 25 $$ Life Tech/SOLiD SBL/Chained, two-base encoding No No emPCR 75+35 4000 M 20-30 $$ Life Tech/SOLiD SBL/Chained, two-base encoding No No emPCR 75+35 4000 M 20-30 $$ Helicos SBS/One-color Cyclic Reversible Termination No Yes - 33 600 M-1000 M 21-35 $$ Helicos SBS/One-color Cyclic Reversible Termination No Yes - 33 600 M-1000 M 21-35 $$ Complete Genomics SBL/Unchained, one-base encoding No No RCA 35+35 5000-15000 M 30-90 $$ Complete Genomics SBL/Unchained, one-base encoding No No RCA 35+35 5000-15000 M 30-90 $$ Life Tech/Ion Torrent SBS/H+ release causes voltage change No No emPCR 200 0.5 M 1.2 $$ Life Tech/Ion Torrent SBS/H+ release causes voltage change No No emPCR 200 0.5 M 1.2 $$ Third generation Pacific Biosciences SBS/Color-coded nucleotides and ZMW Yes Yes - 2060 0.05 M/SMRTcell 0.5-0.9 $$$ Third generation Pacific Biosciences SBS/Color-coded nucleotides and ZMW Yes Yes - 2060 0.05 M/SMRTcell 0.5-0.9 $$$ Life Tech/Visigen SBS/Color-coded nucleotides and FRET Yes Yes - - - - - Life Tech/Visigen SBS/Color-coded nucleotides and FRET Yes Yes - - - - - Nanopores Current change as DNA/nucleotides pass Yes Yes - - - - - Nanopores Current change as DNA/nucleotides pass Yes Yes - - - - -

Real-time detection is defined as unhalted detection of DNA synthesis or translocation through a nanopore. Conventional pyrosequencing is commonly seen as a real-time Real-time detection is defined as unhalted detection of DNA synthesis or translocation through a nanopore. Conventional pyrosequencing is commonly seen as a real-time process, since the activity of the polymerase is monitored in real time. However, since only one type of nucleotide is present at a time, it does not fall under the current process, since the activity of the polymerase is monitored in real time. However, since only one type of nucleotide is present at a time, it does not fall under the current definition of real-time detection platform. Throughput per day is calculated, using times not including template amplification. Price per base for the different platforms is definition of real-time detection platform. Throughput per day is calculated, using times not including template amplification. Price per base for the different platforms is described by ranking, where $$$$$ indicates highest cost per base, however the bins do not correlate to a linear increase in price. described by ranking, where $$$$$ indicates highest cost per base, however the bins do not correlate to a linear increase in price.

43 43

4.6. Applications of massive sequencing 4.6. Applications of massive sequencing

The ability of massive sequencing to produce humongous amounts of data at a low price The ability of massive sequencing to produce humongous amounts of data at a low price enables questions to be asked that were unimaginable not too long ago. The publication of enables questions to be asked that were unimaginable not too long ago. The publication of the reference sequence of the human genome one decade ago218,219 was one of the largest the reference sequence of the human genome one decade ago218,219 was one of the largest scientific accomplishments in modern history. However, to maximize the impact of this scientific accomplishments in modern history. However, to maximize the impact of this landmark achievement on society, it will be necessary to identify genomic variants of landmark achievement on society, it will be necessary to identify genomic variants of medical importance, such as traits connected to disease susceptibility or drug responses. medical importance, such as traits connected to disease susceptibility or drug responses. During the last ten years, a second phase of human genomic science has taken over, with During the last ten years, a second phase of human genomic science has taken over, with the aim of finding genomic variants linked to inherited diseases and other traits. This can be the aim of finding genomic variants linked to inherited diseases and other traits. This can be achieved in various ways, including by investigating genomes as a whole, particular parts or achieved in various ways, including by investigating genomes as a whole, particular parts or modifications thereof, or by analyzing gene expression. The examination of the genomes of modifications thereof, or by analyzing gene expression. The examination of the genomes of related species can also shed light on questions such as the origin of species. related species can also shed light on questions such as the origin of species.

4.6.1. Whole genome sequencing 4.6.1. Whole genome sequencing

De novo sequencing De novo sequencing Initial, de novo, sequencing of unknown genomes, is common today; as of September 2011, Initial, de novo, sequencing of unknown genomes, is common today; as of September 2011, the IMG database272 that keeps track of finished and draft genomes, contained 6891 the IMG database272 that keeps track of finished and draft genomes, contained 6891 genomes from various animal, and microbial species273 and the figure is continuously genomes from various animal, plant and microbial species273 and the figure is continuously increasing. Reference genomes from species with symbolic value such as the kangaroo274, increasing. Reference genomes from species with symbolic value such as the kangaroo274, orangutan275 and giant panda276, as well as species that are important in the food industry orangutan275 and giant panda276, as well as species that are important in the food industry such as cacao277, wild strawberry278, soybean279 and potato280 are constantly being such as cacao277, wild strawberry278, soybean279 and potato280 are constantly being generated. Obtaining sequence information on whole genomes of related organisms has generated. Obtaining sequence information on whole genomes of related organisms has allowed a number of interesting large-scale evolutionary studies and shed light on allowed a number of interesting large-scale evolutionary studies and shed light on phylogenetic relationships in species ranging from marsupials274 to vascular plants281 and phylogenetic relationships in species ranging from marsupials274 to vascular plants281 and Hominidae, the great apes275. Hominidae, the great apes275. Although de novo assemblies have been achieved using short sequencing reads alone282,283, Although de novo assemblies have been achieved using short sequencing reads alone282,283, bioinformatic assembly is greatly facilitated by long sequence reads284, which suggests that a bioinformatic assembly is greatly facilitated by long sequence reads284, which suggests that a combination of current sequencing methods may be useful for obtaining sequences with a combination of current sequencing methods may be useful for obtaining sequences with a length and depth suitable for de novo genome assembly. Mate-pairs may be helpful in length and depth suitable for de novo genome assembly. Mate-pairs may be helpful in stitching together unknown sequences and getting through problematic repetitive regions stitching together unknown sequences and getting through problematic repetitive regions and for detecting structural variation between genomes285. Mate-pair libraries contain two and for detecting structural variation between genomes285. Mate-pair libraries contain two joined segments that were originally situated several kilobasepairs from each other; the joined segments that were originally situated several kilobasepairs from each other; the generation of such libraries provides a way to increase the physical coverage of the genome, generation of such libraries provides a way to increase the physical coverage of the genome, since the distance between the two ends is known. Mate-pairs are generally created by end- since the distance between the two ends is known. Mate-pairs are generally created by end- biotinylation and circularization of kb-sized gDNA fragments, followed by fragmentation biotinylation and circularization of kb-sized gDNA fragments, followed by fragmentation and streptavidin capture of junction fragments286. and streptavidin capture of junction fragments286. In the serious EHEC epidemic outbreak in Germany in the summer of 2011, a number of In the serious EHEC epidemic outbreak in Germany in the summer of 2011, a number of people were infected with a highly contagious infection that killed several people within a people were infected with a highly contagious infection that killed several people within a few days. The infectious E. coli strain was rapidly isolated and sequenced by Pacific few days. The infectious E. coli strain was rapidly isolated and sequenced by Pacific Biosciences263 and Ion Torrent287,288. Pacific Biosciences performed de novo assembly using Biosciences263 and Ion Torrent287,288. Pacific Biosciences performed de novo assembly using both circular consensus and single-end long reads at 75x coverage resulting in 33 contigs263. both circular consensus and single-end long reads at 75x coverage resulting in 33 contigs263. Five Ion Torrent sequencing runs generated enough sequence reads for de novo assembly, Five Ion Torrent sequencing runs generated enough sequence reads for de novo assembly, 44 44 Julia Sandberg Julia Sandberg resulting in >3000 contigs287,289. Within a few weeks of the outbreak, sequence data from resulting in >3000 contigs287,289. Within a few weeks of the outbreak, sequence data from both platforms independently showed that the epidemic strain belonged to an E. coli lineage both platforms independently showed that the epidemic strain belonged to an E. coli lineage that had acquired genes coding for antibiotic resistance and a toxin from another bacterial that had acquired genes coding for antibiotic resistance and a toxin from another bacterial strain263,289, demonstrating the power of massive sequencing to rapidly deliver relevant strain263,289, demonstrating the power of massive sequencing to rapidly deliver relevant clinical information. clinical information.

Resequencing Resequencing As the cost of sequencing continues to fall, sequencing human genomes becomes more As the cost of sequencing continues to fall, sequencing human genomes becomes more accessible and common. Resequencing of human genomes does not require long read accessible and common. Resequencing of human genomes does not require long read lengths, since reads are mapped against a known reference sequence, and all current lengths, since reads are mapped against a known reference sequence, and all current massive sequencing platforms apart from Pacific Biosciences (which presently has too low a massive sequencing platforms apart from Pacific Biosciences (which presently has too low a throughput) have been used to resequence human genomes228,229,259,290-292. Each platform’s throughput) have been used to resequence human genomes228,229,259,290-292. Each platform’s benchmark publications have reported the sequencing genomes of eminent people such as benchmark publications have reported the sequencing genomes of eminent people such as ‘one of the fathers of DNA’, James Watson244, or the stipulator of Moore’s law, Gordon ‘one of the fathers of DNA’, James Watson244, or the stipulator of Moore’s law, Gordon Moore228, or anti-apartheid activist Archbishop Desmond Tutu293; and reference genomes Moore228, or anti-apartheid activist Archbishop Desmond Tutu293; and reference genomes belonging to people of various ethnical origin such as Nigera229, Kalahari Desert and a belonging to people of various ethnical origin such as Nigera229, Kalahari Desert and a Bantu southern Africa293, Korea294 and a Han-Chinese295 were also quickly established. Bantu southern Africa293, Korea294 and a Han-Chinese295 were also quickly established. Coverage usually varies between 7.4x244 and 36x295. Coverage usually varies between 7.4x244 and 36x295. The ultimate goal of sequencing masses of human genomes is to, by obtaining plentiful The ultimate goal of sequencing masses of human genomes is to, by obtaining plentiful genetic information, be able to distinguish patterns such as variants that are linked to genetic information, be able to distinguish patterns such as variants that are linked to positive/adverse affects of a particular drug treatment, or variants that are strongly or positive/adverse affects of a particular drug treatment, or variants that are strongly or loosely associated with a particular disease. This would enable the design of drug regimes loosely associated with a particular disease. This would enable the design of drug regimes tailored to the unique genetic profiles of individual patients, the implementation of tailored to the unique genetic profiles of individual patients, the implementation of preventive treatment for individuals at elevated risk of developing a particular disease, and preventive treatment for individuals at elevated risk of developing a particular disease, and the creation of prognostic tools for determining the severity of disease progress. This idea is the creation of prognostic tools for determining the severity of disease progress. This idea is one of many that is being addressed by The 1000 Genomes Project Consortium296, a world- one of many that is being addressed by The 1000 Genomes Project Consortium296, a world- encompassing project that has the aim of sequencing a massive number of genomes of encompassing project that has the aim of sequencing a massive number of genomes of individuals from five major population groups, and to use this data to identify all genomic individuals from five major population groups, and to use this data to identify all genomic variants that are present at a frequency of 1% or more297. In a recent publication of the variants that are present at a frequency of 1% or more297. In a recent publication of the pilot-experiment of this huge task, involving low-coverage, high-coverage and targeted pilot-experiment of this huge task, involving low-coverage, high-coverage and targeted resequencing of a number of individuals, the authors were able to determine that every resequencing of a number of individuals, the authors were able to determine that every person differs from the reference sequence by putative loss-of-function mutations in 250– person differs from the reference sequence by putative loss-of-function mutations in 250– 300 genes297. All data generated by the consortium is publicly available. 300 genes297. All data generated by the consortium is publicly available. In a large-scale approach to decipher the complex questions posed by cancer, the In a large-scale approach to decipher the complex questions posed by cancer, the International Cancer Genome Consortium is aiming to genetically characterize 50 human cancer International Cancer Genome Consortium is aiming to genetically characterize 50 human cancer types298. Both within and outside this project, the genomes of individuals suffering from types298. Both within and outside this project, the genomes of individuals suffering from melanoma299, leukemia300 and cancer of breast301 and lungs290,302 have been sequenced. melanoma299, leukemia300 and cancer of breast301 and lungs290,302 have been sequenced. These studies, in combination with a number of targeted resequencing studies have These studies, in combination with a number of targeted resequencing studies have together resulted in the identification of more than 1000 potential cancer genes303. together resulted in the identification of more than 1000 potential cancer genes303. Shotgun genome sequencing has already been of clinical use, for example in identifying Shotgun genome sequencing has already been of clinical use, for example in identifying candidate genes related to a Mendelian disease after whole-genome sequencing of a family candidate genes related to a Mendelian disease after whole-genome sequencing of a family quartet in which the parents were healthy and both children suffered from Miller quartet in which the parents were healthy and both children suffered from Miller Syndrome304. Moreover, the approach has been used in noninvasive genetic tests, searching Syndrome304. Moreover, the approach has been used in noninvasive genetic tests, searching for fetal chromosomal aberrations such as the presence of an extra copy of chromosome 21 for fetal chromosomal aberrations such as the presence of an extra copy of chromosome 21 in children with Downs’s syndrome. By shotgun sequencing of cell-free DNA from the in children with Downs’s syndrome. By shotgun sequencing of cell-free DNA from the mother, duplication events may be identified based on difference in sequence coverage mother, duplication events may be identified based on difference in sequence coverage 45 45

between different chromosomes305,306. Several reports have demonstrated positive results between different chromosomes305,306. Several reports have demonstrated positive results with this approach and large-scale clinical studies are currently being launched in Europe with this approach and large-scale clinical studies are currently being launched in Europe and elsewhere307. Shotgun sequencing of cell-free DNA from blood is also an approach that and elsewhere307. Shotgun sequencing of cell-free DNA from blood is also an approach that can be used to detect rejection of a transplanted organ in a patient. Since rejection causes can be used to detect rejection of a transplanted organ in a patient. Since rejection causes increased apoptosis of cells in the donated organ, the level of alien DNA in the blood of the increased apoptosis of cells in the donated organ, the level of alien DNA in the blood of the acceptor corresponds to the level of attack that the donated organ is suffering from the acceptor corresponds to the level of attack that the donated organ is suffering from the patient’s immune system308. patient’s immune system308.

4.6.2. Targeted resequencing 4.6.2. Targeted resequencing It has been estimated that only 5% of the human genome is functional309, which is a strong It has been estimated that only 5% of the human genome is functional309, which is a strong argument for specifically directing sequencing capacity to a smaller portion of the genome argument for specifically directing sequencing capacity to a smaller portion of the genome whilst gaining increased sequence depth. Depending on the scientific question at hand, it whilst gaining increased sequence depth. Depending on the scientific question at hand, it may be desirable to sequence specific regions and either some or all protein-coding may be desirable to sequence specific regions and either some or all protein-coding sequences (exons), using a suitable strategy for genomic partitioning. sequences (exons), using a suitable strategy for genomic partitioning. The sequencing of PCR-amplified target regions by massive sequencing is well The sequencing of PCR-amplified target regions by massive sequencing is well established116-119,310,311; the use of amplification primers tailed with platform-specific established116-119,310,311; the use of amplification primers tailed with platform-specific adapters makes it possible to streamline the process of library preparation133. Applications adapters makes it possible to streamline the process of library preparation133. Applications of ultra-deep amplicon sequencing include the interrogation of the subclonal phylogenetic of ultra-deep amplicon sequencing include the interrogation of the subclonal phylogenetic structures of cancer by investigating variance in the most variable part of the antibody structures of cancer by investigating variance in the most variable part of the antibody heavy-chain312; comparing gut between lean and obese twin pairs by 16S heavy-chain312; comparing gut microbiomes between lean and obese twin pairs by 16S rRNA sequencing from fecal samples313 and HLA typing118. Barcoding is often useful to rRNA sequencing from fecal samples313 and HLA typing118. Barcoding is often useful to increase the number of samples that may be sequenced in parallel, in order to match the increase the number of samples that may be sequenced in parallel, in order to match the ever-increasing throughput of MPS. The platforms themselves offer a number of barcodes ever-increasing throughput of MPS. The platforms themselves offer a number of barcodes that can be incorporated into the adapter sequences. One may wish to add an additional that can be incorporated into the adapter sequences. One may wish to add an additional dimension to an experiment i.e. time point, or maximize the number of samples beyond the dimension to an experiment i.e. time point, or maximize the number of samples beyond the limitations of the inbuilt identification system, by using home-made barcodes. Barcodes limitations of the inbuilt identification system, by using home-made barcodes. Barcodes may be introduced by PCR tagging314-316, ligation115,317 or a combination thereof318. may be introduced by PCR tagging314-316, ligation115,317 or a combination thereof318. It has been predicted that about 85% of all disease-causing mutations lie within the protein- It has been predicted that about 85% of all disease-causing mutations lie within the protein- coding regions of the human genome, which makes up 1% of the complete human coding regions of the human genome, which makes up 1% of the complete human genome319. It is often useful to focus on these regions; the practice of doing so is known as genome319. It is often useful to focus on these regions; the practice of doing so is known as exome sequencing, and is particularly useful when studying Mendelian diseases, since these exome sequencing, and is particularly useful when studying Mendelian diseases, since these often stem from a mutation in a protein coding gene319. In fact, exome sequencing has often stem from a mutation in a protein coding gene319. In fact, exome sequencing has showed considerable promise for rare Mendelian diseases. By sequencing whole exomes of showed considerable promise for rare Mendelian diseases. By sequencing whole exomes of members of families that suffer from recessive320 or dominant321,322 Mendelian diseases, the members of families that suffer from recessive320 or dominant321,322 Mendelian diseases, the causative variant of various diseases such as congenital chloride diarrhea have been causative variant of various diseases such as congenital chloride diarrhea have been identified320. Various selection techniques can be used for exome sequence capture and identified320. Various selection techniques can be used for exome sequence capture and hybrid selection; the most widely-used technique is detailed in Chapter 3. hybrid selection; the most widely-used technique is detailed in Chapter 3. Exome sequencing has also been suggested to be suitable for finding moderate-risk low Exome sequencing has also been suggested to be suitable for finding moderate-risk low frequency alleles; it could be used to identify loss-of-function mutations in one of two copies frequency alleles; it could be used to identify loss-of-function mutations in one of two copies of a gene, which would impact the expression of that gene but would not give rise to a of a gene, which would impact the expression of that gene but would not give rise to a Mendelian inheritance pattern319. It seems that resequencing approaches that investigate a Mendelian inheritance pattern319. It seems that resequencing approaches that investigate a number of genes associated with a particular disease are likely to find clinical applications in number of genes associated with a particular disease are likely to find clinical applications in the near future, as a part of various initiatives. One such project aims to develop a the near future, as a part of various initiatives. One such project aims to develop a preconception carrier screen for prospective parents in order to determine whether they are preconception carrier screen for prospective parents in order to determine whether they are a good match. The screen was directed at 448 severe recessive childhood diseases and was a good match. The screen was directed at 448 severe recessive childhood diseases and was

46 46 Julia Sandberg Julia Sandberg developed for implementation in IVF clinics137. In addition, the sequencing of exons of developed for implementation in IVF clinics137. In addition, the sequencing of exons of genes involved in nonsyndromic hearing loss (NSHL) has proven useful for diagnosing genes involved in nonsyndromic hearing loss (NSHL) has proven useful for diagnosing patients with this disease323. patients with this disease323.

4.6.3. Expression profiling 4.6.3. Expression profiling The mixture of proteins present in a cell determines its characteristics and functions, and is The mixture of proteins present in a cell determines its characteristics and functions, and is indicative of normal or pathological cellular conditions. As such, large-scale analysis of the indicative of normal or pathological cellular conditions. As such, large-scale analysis of the protein makeup of different cell types is interesting in many aspects of research. Mass protein makeup of different cell types is interesting in many aspects of research. Mass Spectrometry (MS) is the most widely used tool in global proteomics studies and has been Spectrometry (MS) is the most widely used tool in global proteomics studies and has been extensively developed over the last twenty years324. However, its reliance on complicated extensively developed over the last twenty years324. However, its reliance on complicated sample preparation procedures, limited dynamic range and the lack of methods for protein sample preparation procedures, limited dynamic range and the lack of methods for protein amplification mean that it would be useful to be able to complement proteomics data with amplification mean that it would be useful to be able to complement proteomics data with quantitative studies on mRNA, which is relatively easy to quantify on a large scale. quantitative studies on mRNA, which is relatively easy to quantify on a large scale. However, because the pathway by which mRNA is used in protein synthesis is regulated by However, because the pathway by which mRNA is used in protein synthesis is regulated by a variety of factors, including the rate of mRNA and protein degradation, and variable a variety of factors, including the rate of mRNA and protein degradation, and variable translational efficiency, it is not accurate to assume that mRNA levels will always mirror translational efficiency, it is not accurate to assume that mRNA levels will always mirror those of the corresponding proteins325. Nevertheless, expression profiling has been used those of the corresponding proteins325. Nevertheless, expression profiling has been used extensively to obtain information on cell function and the underlying mechanisms of extensively to obtain information on cell function and the underlying mechanisms of diseases and cellular processes. diseases and cellular processes. Interest in RNA analysis peaked in the late 1990s, with the arrival of scalable and low-cost Interest in RNA analysis peaked in the late 1990s, with the arrival of scalable and low-cost hybridization-based microarray technologies326. As massive sequencing spread, microarray hybridization-based microarray technologies326. As massive sequencing spread, microarray analysis of gene expression will more or less been replaced by sequencing-based expression analysis of gene expression will more or less been replaced by sequencing-based expression analysis (RNA-Seq) in the near future, which requires no prior knowledge of the target analysis (RNA-Seq) in the near future, which requires no prior knowledge of the target sequence and therefore makes it possible to identify and quantify unknown transcripts such sequence and therefore makes it possible to identify and quantify unknown transcripts such as fusion-genes and novel splice variants327,328. Briefly, RNA is converted into cDNA and as fusion-genes and novel splice variants327,328. Briefly, RNA is converted into cDNA and then further into short pieces, known as ‘tags’, that stem from the original transcript. ‘Tags’ then further into short pieces, known as ‘tags’, that stem from the original transcript. ‘Tags’ are sequenced and reads are mapped to the reference genome. By counting the mapping are sequenced and reads are mapped to the reference genome. By counting the mapping tags, a quantitative measure of gene expression is obtained with a greater sensitivity and tags, a quantitative measure of gene expression is obtained with a greater sensitivity and dynamic range than is possible with microarray based analysis329. dynamic range than is possible with microarray based analysis329. Since RNA-Seq was first presented in 2008, it has been further developed in a variety of Since RNA-Seq was first presented in 2008, it has been further developed in a variety of ways. In the original RNA sequencing studies, in order to find splice-sites, reads were ways. In the original RNA sequencing studies, in order to find splice-sites, reads were mapped to exon-junction sequences that are unique for particular transcripts327. With the mapped to exon-junction sequences that are unique for particular transcripts327. With the growing read lengths of current platforms, it is now easier to recognize splice-junctions growing read lengths of current platforms, it is now easier to recognize splice-junctions without having to search for specific ones. Also, with the extreme read lengths of emerging without having to search for specific ones. Also, with the extreme read lengths of emerging single molecule technologies, full-length cDNA/RNA sequencing will likely soon be single molecule technologies, full-length cDNA/RNA sequencing will likely soon be possible, enabling the identification of splice events in a completely unbiased fashion. The possible, enabling the identification of splice events in a completely unbiased fashion. The number of classified RNA species has increased greatly in the last years and it is now known number of classified RNA species has increased greatly in the last years and it is now known that transcripts are also synthesized from the antisense DNA strand. Antisense transcript that transcripts are also synthesized from the antisense DNA strand. Antisense transcript are believed to primarily be involved in regulatory mechanisms, whereas the sense version are believed to primarily be involved in regulatory mechanisms, whereas the sense version of a transcript is more likely to give rise to a functional protein330. Moreover, other types of of a transcript is more likely to give rise to a functional protein330. Moreover, other types of noncoding RNA species may also be transcribed from both strands of DNA. Information noncoding RNA species may also be transcribed from both strands of DNA. Information on the original strand that an RNA molecule stems from is therefore useful. There are a on the original strand that an RNA molecule stems from is therefore useful. There are a number of strand-specific sample preparation techniques available327,331-337; alternatively, it number of strand-specific sample preparation techniques available327,331-337; alternatively, it is possible to directly sequence first strand cDNA332,335,336 or RNA256. is possible to directly sequence first strand cDNA332,335,336 or RNA256.

47 47

cDNA synthesis and forthcoming amplification come with a risk of introducing errors and cDNA synthesis and forthcoming amplification come with a risk of introducing errors and bias due to reverse transcriptase template switching, the enzyme’s preference for longer bias due to reverse transcriptase template switching, the enzyme’s preference for longer transcripts and even GC content, and its talent for creating spurious second-strand cDNAs transcripts and even GC content, and its talent for creating spurious second-strand cDNAs due to its DNA-dependent DNA polymerase activity256. Problematic second strand due to its DNA-dependent DNA polymerase activity256. Problematic second strand synthesis and amplification steps can be avoided by instead sequencing the first strand synthesis and amplification steps can be avoided by instead sequencing the first strand cDNA. This has been done on the Helicos platform by capturing polyadenylated cDNA cDNA. This has been done on the Helicos platform by capturing polyadenylated cDNA molecules on the flow cell335 or by carrying out the actual cDNA synthesis straight on the molecules on the flow cell335 or by carrying out the actual cDNA synthesis straight on the sequencing flow cell, an approach presented by both Helicos336 and Illumina334. Taken to sequencing flow cell, an approach presented by both Helicos336 and Illumina334. Taken to its extreme, this approach would result in the sequencing of RNA molecules without the its extreme, this approach would result in the sequencing of RNA molecules without the prior conversion of RNA into cDNA; this would completely avoid the problems associated prior conversion of RNA into cDNA; this would completely avoid the problems associated with the modest efficiency of cDNA synthesis and its tendency to create false cDNA with the modest efficiency of cDNA synthesis and its tendency to create false cDNA molecules and introduce bias into the resulting sample. Direct RNA sequencing is currently molecules and introduce bias into the resulting sample. Direct RNA sequencing is currently possible only with the Helicos platform, and is done by capturing 5’-polyadenylated RNA possible only with the Helicos platform, and is done by capturing 5’-polyadenylated RNA molecules in vitro on the flow cell and then sequencing these in a manner analogous to molecules in vitro on the flow cell and then sequencing these in a manner analogous to Helicos DNA sequencing, but with the DNA polymerase exchanged for a reverse Helicos DNA sequencing, but with the DNA polymerase exchanged for a reverse transcriptase; less than one 1 ng of RNA is required in this process. The limited read length transcriptase; less than one 1 ng of RNA is required in this process. The limited read length of the Helicos platform is a shortcoming, since it affects the scope for mapping; in the proof of the Helicos platform is a shortcoming, since it affects the scope for mapping; in the proof of concept paper only 50% of the reads were >18 bp long. However, a major advantage of of concept paper only 50% of the reads were >18 bp long. However, a major advantage of the method is the ability to analyze partly-degraded samples due to the lack of amplification the method is the ability to analyze partly-degraded samples due to the lack of amplification steps256. steps256. In clinical applications of RNA-seq, as well as stem cell analysis and forensics, methods for In clinical applications of RNA-seq, as well as stem cell analysis and forensics, methods for the global profiling of minute amounts of RNA are important. Some sample preparation the global profiling of minute amounts of RNA are important. Some sample preparation protocols have addressed this issue, enabling RNA-seq of scarce material338 or even single protocols have addressed this issue, enabling RNA-seq of scarce material338 or even single cells337,339 by e.g. using a linear isothermal amplification strategy (SPIA) to amplify double cells337,339 by e.g. using a linear isothermal amplification strategy (SPIA) to amplify double stranded cDNA338, double PCR amplification339 or by using the template-switching activity stranded cDNA338, double PCR amplification339 or by using the template-switching activity of the reverse transcriptase in order to incorporate a cell-specific barcoded adapter and a of the reverse transcriptase in order to incorporate a cell-specific barcoded adapter and a general adapter sequence in each end of a first strand cDNA molecule337. general adapter sequence in each end of a first strand cDNA molecule337. Transcriptome profiling has shed light on vastly different subjects such as the Transcriptome profiling has shed light on vastly different subjects such as the polyadenylation pattern of human cells257 and embryonic stem cell maturation346. In polyadenylation pattern of human cells257 and embryonic stem cell maturation346. In complex diseases a number of alleles that are common among people can cause a slight complex diseases a number of alleles that are common among people can cause a slight increase in the risk of contracting a specific disease, and these alleles usually have a minor increase in the risk of contracting a specific disease, and these alleles usually have a minor effect on gene expression319. It is thus interesting to study changes in gene expression. An in- effect on gene expression319. It is thus interesting to study changes in gene expression. An in- depth analysis of expression patterns in combination with structural genomic data from 10 depth analysis of expression patterns in combination with structural genomic data from 10 melanoma samples identified 11 novel melanoma gene-fusions. The study also showed that melanoma samples identified 11 novel melanoma gene-fusions. The study also showed that there were a large number of somatic mutations that accumulated in these melanomas, there were a large number of somatic mutations that accumulated in these melanomas, which supports the idea that point mutations are the major driving force behind tumor which supports the idea that point mutations are the major driving force behind tumor progression340. Tasmanian devils are an endangered species that frequently suffer from an progression340. Tasmanian devils are an endangered species that frequently suffer from an unusual, infectious facial cancer, which spreads through physical transfer of living cancer unusual, infectious facial cancer, which spreads through physical transfer of living cancer cells when the bite each other (which is apparently common in mating and other cells when the animals bite each other (which is apparently common in mating and other encounters). By transcriptome profiling of tumor and a testis from the same devil, it was encounters). By transcriptome profiling of tumor and a testis from the same devil, it was shown that the tumor first occurred in the nervous-system-associated Schwann cells of the shown that the tumor first occurred in the nervous-system-associated Schwann cells of the original animal. Through a number of mutations, the tumor then somehow acquired the original animal. Through a number of mutations, the tumor then somehow acquired the ability to bypass the immune system of other animals so that when spread by biting, they ability to bypass the immune system of other animals so that when spread by biting, they were able to act like a tissue engraftment, forming a tissue in the new animal341. were able to act like a tissue engraftment, forming a tissue in the new animal341.

48 48 Julia Sandberg Julia Sandberg

PRESENT INVESTIGATIONS PRESENT INVESTIGATIONS

5. The papers 5. The papers

This thesis is based on five papers that cover the process from picking cells to analyzing This thesis is based on five papers that cover the process from picking cells to analyzing their genetic makeup. their genetic makeup. In the first paper we present a miniaturized chip and use it to seed single stem cells into In the first paper we present a miniaturized chip and use it to seed single stem cells into separate wells, and then culture and analyze maintained pluripotency and differentiation, in separate wells, and then culture and analyze maintained pluripotency and differentiation, in a high throughput manner. Then we move on to massively parallel sequencing. In paper II- a high throughput manner. Then we move on to massively parallel sequencing. In paper II- IV we develop ways to manipulate template-carrying beads for sequencing by Roche/454 IV we develop ways to manipulate template-carrying beads for sequencing by Roche/454 sequencing, in order to exploit the output that one obtains from a sequencing run. This is sequencing, in order to exploit the output that one obtains from a sequencing run. This is done (i) by selecting beads that have amplified template on the surface, or (ii) beads that done (i) by selecting beads that have amplified template on the surface, or (ii) beads that carry a particular target sequence or (iii) barcode sequence tag. In the last paper we use carry a particular target sequence or (iii) barcode sequence tag. In the last paper we use massively parallel sequencing on single cell material to interrogate the cell-to-cell variation massively parallel sequencing on single cell material to interrogate the cell-to-cell variation in hyper variable repeat regions in the genome. in hyper variable repeat regions in the genome.

€ €

49 49

5.1. Culturing of single stem cells in a microchip (I) 5.1. Culturing of single stem cells in a microchip (I)

The intriguing ability of stem cells to self-renew or differentiate into numerous cell types The intriguing ability of stem cells to self-renew or differentiate into numerous cell types (pluripotency) has for instance been demonstrated by the ability of an individual stem cell to (pluripotency) has for instance been demonstrated by the ability of an individual stem cell to generate a functional mammary gland342 or prostate343 upon transplantation in mouse. generate a functional mammary gland342 or prostate343 upon transplantation in mouse. Stem cell differentiation into distinct cell types occurs through changes in cellular Stem cell differentiation into distinct cell types occurs through changes in cellular biochemistry, giving rise to specialized cells with different functions of a tissue or organ. In biochemistry, giving rise to specialized cells with different functions of a tissue or organ. In vitro culturing of mammalian adult and embryonic neural stem cells, while maintaining vitro culturing of mammalian adult and embryonic neural stem cells, while maintaining multi-or-pluripotency is obtained by addition of particular growth factors. New growth multi-or-pluripotency is obtained by addition of particular growth factors. New growth factors and mixtures thereof, are sought for and extensively tested in large-scale screening factors and mixtures thereof, are sought for and extensively tested in large-scale screening studies, for effect in maintaining pluripotency or for directing stem cell differentiation or studies, for effect in maintaining pluripotency or for directing stem cell differentiation or reprogram mature stem cells into regaining pluripotency344. The cost of the growth factors reprogram mature stem cells into regaining pluripotency344. The cost of the growth factors used are steep, thus decreased screening volumes have large effects on the cost of the assay. used are steep, thus decreased screening volumes have large effects on the cost of the assay. Embryonic stem cells are typically cultured adherently by coating plates with extracellular Embryonic stem cells are typically cultured adherently by coating plates with extracellular matrix protein; while adult stem cells may grow adherently or as free-floating spherical cell- matrix protein; while adult stem cells may grow adherently or as free-floating spherical cell- balls, so called neurospheres (NS), depending on culturing conditions. A number of balls, so called neurospheres (NS), depending on culturing conditions. A number of microwell devices are available for cell culturing/analysis, with a wide span of well counts microwell devices are available for cell culturing/analysis, with a wide span of well counts and sizes, among which the 96 and 384 well plates are the most commonly used. and sizes, among which the 96 and 384 well plates are the most commonly used. In paper I we present a miniaturized stem cell screening chip and use of it for culturing and In paper I we present a miniaturized stem cell screening chip and use of it for culturing and analysis of mouse and human embryonic stem (ES) cells and adult neural stem cells from analysis of mouse and human embryonic stem (ES) cells and adult neural stem cells from mouse forebrain. The high-density microwell chip is glass bottomed to allow easy analysis mouse forebrain. The high-density microwell chip is glass bottomed to allow easy analysis by microscopy and imaging systems. The walls of the wells are made up from etched silicon by microscopy and imaging systems. The walls of the wells are made up from etched silicon and sloped to allow cells ‘rolling down’ to the bottom57. The chip is the size of an ordinary and sloped to allow cells ‘rolling down’ to the bottom57. The chip is the size of an ordinary microscopic slide and contains 672 individually labeled microwells, with 500 nl volume. microscopic slide and contains 672 individually labeled microwells, with 500 nl volume. The chip layout is carefully selected to match the shortest movement of the plate holder of The chip layout is carefully selected to match the shortest movement of the plate holder of ordinary flow cytometers (i.e. 1500 µm centre-to-centre distance between wells) to enable ordinary flow cytometers (i.e. 1500 µm centre-to-centre distance between wells) to enable directed single cell seeding by FACS. With minor changes, a similar chip had previously directed single cell seeding by FACS. With minor changes, a similar chip had previously been used to studying carcinoma cell heterogeneity57. To prevent liquid evaporation while been used to studying carcinoma cell heterogeneity57. To prevent liquid evaporation while allowing gas exchange and easy interrogation by microscopy, the chip is covered with a allowing gas exchange and easy interrogation by microscopy, the chip is covered with a semi-permeable PDMS membrane. semi-permeable PDMS membrane. The applicability of the chip for customary culturing conditions and assays was The applicability of the chip for customary culturing conditions and assays was demonstrated for the three types of stem cells. Differentiation was stimulated by demonstrated for the three types of stem cells. Differentiation was stimulated by supplementing differentiation medium to seeded single cells in microwells. After nine days supplementing differentiation medium to seeded single cells in microwells. After nine days differentiation was verified by antibody labeling for a neuronal differentiation marker. differentiation was verified by antibody labeling for a neuronal differentiation marker. Long-term culturing and clone formation with maintained pluripotency is interesting for Long-term culturing and clone formation with maintained pluripotency is interesting for investigating stem cell proliferation and was demonstrated in the current microwell chip. investigating stem cell proliferation and was demonstrated in the current microwell chip. Maintained pluripotent state of the two ES cell samples and an adherent culture of adult Maintained pluripotent state of the two ES cell samples and an adherent culture of adult neural stem cells, was confirmed four days after single cell seeding, by antibody labeling for neural stem cells, was confirmed four days after single cell seeding, by antibody labeling for a pluripotency marker. Adult neural stem cells were also cultured in a fashion that a pluripotency marker. Adult neural stem cells were also cultured in a fashion that encourages neurosphere formation, which indicates pluripotency. Indeed after four days encourages neurosphere formation, which indicates pluripotency. Indeed after four days sphere formation was observed. The microwell chip presented here, facilitates monitoring sphere formation was observed. The microwell chip presented here, facilitates monitoring and manipulation of cells individually or en masse, which is useful in high throughput and manipulation of cells individually or en masse, which is useful in high throughput differentiation screening assays and clonal experiments. differentiation screening assays and clonal experiments.

50 50 Julia Sandberg Julia Sandberg

5.2. Bead sorting approaches for improved 5.2. Bead sorting approaches for improved performance in massively parallel sequencing performance in massively parallel sequencing (II-IV) (II-IV)

In three current massive parallel sequencing platforms, 454, SOLiD and Ion Torrent, In three current massive parallel sequencing platforms, 454, SOLiD and Ion Torrent, template molecules are propagated on the surface of beads in emulsion, to obtain beads template molecules are propagated on the surface of beads in emulsion, to obtain beads covered in thousands to millions of identical template copies, for sequencing. Beads with covered in thousands to millions of identical template copies, for sequencing. Beads with more than one amplified template on the surface give rise to mixed and unreadable more than one amplified template on the surface give rise to mixed and unreadable sequencing signals and take up valuable sequencing space. Choosing the optimal DNA-to- sequencing signals and take up valuable sequencing space. Choosing the optimal DNA-to- bead ratio is a trade-off between reducing the number of mixed beads, while still obtaining bead ratio is a trade-off between reducing the number of mixed beads, while still obtaining sufficient amounts of beads with amplified template on the surface and the value varies sufficient amounts of beads with amplified template on the surface and the value varies between libraries. Poisson bead to droplet distribution and poisson template to droplet between libraries. Poisson bead to droplet distribution and poisson template to droplet distribution normally renders less than 1/5 of the resulting beads DNA covered75,345 and distribution normally renders less than 1/5 of the resulting beads DNA covered75,345 and naturally these need to be enriched prior to sequencing. naturally these need to be enriched prior to sequencing. In the sequencing protocols, enrichment is done either by coupling DNA-covered beads to In the sequencing protocols, enrichment is done either by coupling DNA-covered beads to large low-density enrichment beads and separating based on density237 or by coupling DNA large low-density enrichment beads and separating based on density237 or by coupling DNA covered beads to paramagnetic beads and gather these using a magnet70,228. Both methods covered beads to paramagnetic beads and gather these using a magnet70,228. Both methods require substantial manual washing steps, in order to remove empty beads. DNA-covered require substantial manual washing steps, in order to remove empty beads. DNA-covered beads are then loaded into individual wells for sequencing by synthesis or ligation, in a beads are then loaded into individual wells for sequencing by synthesis or ligation, in a wash-and-scan manner. Although flow cytometry were originally developed for cell wash-and-scan manner. Although flow cytometry were originally developed for cell analysis, the apparatuses can also be used for sorting and analyzing beads88,346,347. In paper analysis, the apparatuses can also be used for sorting and analyzing beads88,346,347. In paper II-IV flow cytometry has been used in different ways to improve the performance in II-IV flow cytometry has been used in different ways to improve the performance in massive parallel pyrosequencing. massive parallel pyrosequencing.

5.2.1. Library titration and collection of DNA covered beads (II) 5.2.1. Library titration and collection of DNA covered beads (II) In paper II we used flow cytometry to evaluate DNA libraries and to enrich for DNA In paper II we used flow cytometry to evaluate DNA libraries and to enrich for DNA covered beads. The enrichment strategies used in SOLiD, Ion Torrent and 454 have covered beads. The enrichment strategies used in SOLiD, Ion Torrent and 454 have varying efficiency, and when empty beads entering the sequencing reaction these waste varying efficiency, and when empty beads entering the sequencing reaction these waste valuable sequence space. In paper II we demonstrate that by fluorescently labeling DNA valuable sequence space. In paper II we demonstrate that by fluorescently labeling DNA covered beads in an emulsion PCR sample and high-speed flow-sorting, DNA covered covered beads in an emulsion PCR sample and high-speed flow-sorting, DNA covered beads could rapidly be obtained with an immaculate purity. Samples were labeled by using beads could rapidly be obtained with an immaculate purity. Samples were labeled by using a generic fluorescently labeled probe, or streptavidin-coupled fluorophore targeting a biotin a generic fluorescently labeled probe, or streptavidin-coupled fluorophore targeting a biotin on the non-attached strands. After FACS sorting the fluorescent probes were removed by on the non-attached strands. After FACS sorting the fluorescent probes were removed by alkali washing and the beads were sequenced. Sequencing of the FACS sorted beads alkali washing and the beads were sequenced. Sequencing of the FACS sorted beads demonstrated that these beads and the DNA on their surface were intact and that the demonstrated that these beads and the DNA on their surface were intact and that the sequence quality was not affected by the procedure. sequence quality was not affected by the procedure. Moreover, we demonstrated that a simple flow cytometry experiment could replace the Moreover, we demonstrated that a simple flow cytometry experiment could replace the laborious and time-consuming titration protocol of the initial Genome Sequencer system. laborious and time-consuming titration protocol of the initial Genome Sequencer system. At the time of the launch of the 454 and SOLiD platforms, the only available way of At the time of the launch of the 454 and SOLiD platforms, the only available way of titrating a DNA library, was by preparing several (usually four) miniature emulsion PCR titrating a DNA library, was by preparing several (usually four) miniature emulsion PCR reactions, with different amount of input DNA and then evaluate the outcome by reactions, with different amount of input DNA and then evaluate the outcome by sequencing. Titration sequencing is a costly, laborious and time-consuming procedure; sequencing. Titration sequencing is a costly, laborious and time-consuming procedure; hence there was a need for simpler protocol(s). In paper II we fluorescently labeled DNA hence there was a need for simpler protocol(s). In paper II we fluorescently labeled DNA beads in the miniature emulsion PCR reactions and analyzed these by flow cytometry. In beads in the miniature emulsion PCR reactions and analyzed these by flow cytometry. In the standard protocol titration curves were drawn based on the proportion of loaded beads the standard protocol titration curves were drawn based on the proportion of loaded beads 51 51

with a readable key sequence (% keypass). The key sequence is a 4-base sequence that is with a readable key sequence (% keypass). The key sequence is a 4-base sequence that is common to all template molecules and used for instrument calibration. Since also mixed common to all template molecules and used for instrument calibration. Since also mixed beads have a readable key sequence, the value of % keypass corresponds to the total beads have a readable key sequence, the value of % keypass corresponds to the total proportion of DNA covered beads in the sample. Likewise, labeled emulsion PCR sample proportion of DNA covered beads in the sample. Likewise, labeled emulsion PCR sample analyzed by flow cytometry, determines the proportion of DNA covered beads in the analyzed by flow cytometry, determines the proportion of DNA covered beads in the sample. We demonstrated that virtually identical titration curves for a particular library, sample. We demonstrated that virtually identical titration curves for a particular library, based on which the optimal DNA:bead ratio can be determined, could rapidly be obtained based on which the optimal DNA:bead ratio can be determined, could rapidly be obtained from the flow cytometric analysis. from the flow cytometric analysis.

5.2.2. Selecting beads with a particular gene fragment (III) 5.2.2. Selecting beads with a particular gene fragment (III) Deep sequencing of hypervariable genetic regions has shown useful for various applications Deep sequencing of hypervariable genetic regions has shown useful for various applications such as studies on tumor development and high-resolution human leukocyte antigen such as studies on tumor development and high-resolution human leukocyte antigen typing118,312. Amplicon sequencing is normally carried out by PCR amplifying target typing118,312. Amplicon sequencing is normally carried out by PCR amplifying target regions while also introducing suitable adapters for massive sequencing as well as identity regions while also introducing suitable adapters for massive sequencing as well as identity tags. PCR amplification sometimes causes unspecific product formation, which is tags. PCR amplification sometimes causes unspecific product formation, which is problematic as these fragments due to their shorter length, outcompete specific target problematic as these fragments due to their shorter length, outcompete specific target products in the ordinary PCR. Methods for eliminating unspecific fragments, if sufficient products in the ordinary PCR. Methods for eliminating unspecific fragments, if sufficient size difference exists, include precipitation onto carboxylic acids coated paramagnetic size difference exists, include precipitation onto carboxylic acids coated paramagnetic beads348, size separation by gel electrophoresis (gelcut) and nowadays also commercial beads348, size separation by gel electrophoresis (gelcut) and nowadays also commercial systems for automated gelcut, Pippin Prep (Sage Science) and LabChip XT (Caliper). systems for automated gelcut, Pippin Prep (Sage Science) and LabChip XT (Caliper). However, these approaches are work intensive and require sufficient size differences. However, these approaches are work intensive and require sufficient size differences. In paper III the second exon in Dog leukocyte antigen (DLA) in 34 dogs was investigated by In paper III the second exon in Dog leukocyte antigen (DLA) in 34 dogs was investigated by deep sequencing of tagged amplicons. In this paper we took the FACS enrichment deep sequencing of tagged amplicons. In this paper we took the FACS enrichment approach one step further by specifically labeling beads carrying the target sequence approach one step further by specifically labeling beads carrying the target sequence template to remove beads with amplified unspecific product that includes the A/B linker, template to remove beads with amplified unspecific product that includes the A/B linker, on the surface. After emPCR of the pooled library, beads were fluorescently labeled with a on the surface. After emPCR of the pooled library, beads were fluorescently labeled with a target-specific probe, combined with random hexamers and the DNA double-strand target-specific probe, combined with random hexamers and the DNA double-strand specific dye SYBRgreen. Target-bearing beads were collected by flow cytometric sorting specific dye SYBRgreen. Target-bearing beads were collected by flow cytometric sorting based on target-specific dye and DNA quantity. The FACS sorting protocol rendered based on target-specific dye and DNA quantity. The FACS sorting protocol rendered virtually a three-fold increase in target reads as compared to the same emulsion PCR virtually a three-fold increase in target reads as compared to the same emulsion PCR sample enriched according to standard protocol. Naturally, since the majority of the short sample enriched according to standard protocol. Naturally, since the majority of the short unspecific product-bearing beads were removed, there was also a vast increase in average unspecific product-bearing beads were removed, there was also a vast increase in average read length. An assessment of sequence distribution demonstrated that the 50 bp long probe read length. An assessment of sequence distribution demonstrated that the 50 bp long probe used for labeling allowed many mismatches and did not impact the representation of used for labeling allowed many mismatches and did not impact the representation of different individuals or genotypes in the samples. Gene-specific FACS sorting can thus be different individuals or genotypes in the samples. Gene-specific FACS sorting can thus be used to enhance sequence output of problematic amplicon libraries. used to enhance sequence output of problematic amplicon libraries.

52 52 Julia Sandberg Julia Sandberg

5.2.3. Normalizing barcode presence (IV) 5.2.3. Normalizing barcode presence (IV) Barcoding and pooling of multiple samples is often useful to increase the number of samples Barcoding and pooling of multiple samples is often useful to increase the number of samples that may be sequenced in parallel115,314-318 and the 454 platform offers numerous short that may be sequenced in parallel115,314-318 and the 454 platform offers numerous short barcodes denoted Multiplex Identity (MID) tags, that are incorporated in the sequencing barcodes denoted Multiplex Identity (MID) tags, that are incorporated in the sequencing adapters. Multiplex procedures require accurate quantification of the individual targets adapters. Multiplex procedures require accurate quantification of the individual targets before library pooling and emulsion PCR. However, despite careful quantification prior to before library pooling and emulsion PCR. However, despite careful quantification prior to pooling, highly biased library fractions have been reported349,350. pooling, highly biased library fractions have been reported349,350. In paper IV we used color-coded barcode-specific oligonucleotide probes to specifically In paper IV we used color-coded barcode-specific oligonucleotide probes to specifically label beads based on barcode. To enable sufficient hybridization probes were designed to label beads based on barcode. To enable sufficient hybridization probes were designed to cover the 10 bp long MID sequence as well as the four bp long key sequence and two bp in cover the 10 bp long MID sequence as well as the four bp long key sequence and two bp in the general sequencing primer site. In a proof-of-concept experiment a sample containing the general sequencing primer site. In a proof-of-concept experiment a sample containing four MID libraries in highly biased fractions were MID-specifically labeled and equal four MID libraries in highly biased fractions were MID-specifically labeled and equal amounts of beads from all libraries were collected by FACS sorting. Library equalization amounts of beads from all libraries were collected by FACS sorting. Library equalization was confirmed by sequencing the FACS enriched beads. was confirmed by sequencing the FACS enriched beads. Beads with more than one amplified fragment on the surface, mixed beads, would most Beads with more than one amplified fragment on the surface, mixed beads, would most likely (unless carrying several fragments with the same barcode) be carrying more than one likely (unless carrying several fragments with the same barcode) be carrying more than one fluorescent dye. Since only single-dye labeled beads were collected in the flow-sorting fluorescent dye. Since only single-dye labeled beads were collected in the flow-sorting process the proportion of mixed beads was vastly reduced by the protocol. Collection of process the proportion of mixed beads was vastly reduced by the protocol. Collection of single template beads, with a large number of amplified templates on the surface (i.e. high single template beads, with a large number of amplified templates on the surface (i.e. high fluorescent signal) rendered an increased proportion of high quality sequence reads in the fluorescent signal) rendered an increased proportion of high quality sequence reads in the FACS sorted sample. FACS sorted sample.

5.3. Investigating polyG variation between individual 5.3. Investigating polyG variation between individual cells (V) cells (V)

Cell heterogeneity is of key importance in many aspects of disease, such as in tumor Cell heterogeneity is of key importance in many aspects of disease, such as in tumor development, and can also be used to understand how cells organize themselves into development, and can also be used to understand how cells organize themselves into complex tissues initiated at the stage of embryogenesis. Over time, cells further away from complex tissues initiated at the stage of embryogenesis. Over time, cells further away from each other in the hereditability tree become more dissimilar than cells that are more closely each other in the hereditability tree become more dissimilar than cells that are more closely related. By studying the level of similarity between the polymers of life (i.e. genes, proteins related. By studying the level of similarity between the polymers of life (i.e. genes, proteins or parts thereof) in different animal species, phylogenetic maps of interspecies relationship or parts thereof) in different animal species, phylogenetic maps of interspecies relationship have been drawn since 1960351. have been drawn since 1960351. When studying cells that are more closely related to each other than those from different When studying cells that are more closely related to each other than those from different species, the markers need to be highly variable or in large numbers in order to carry species, the markers need to be highly variable or in large numbers in order to carry sufficient information on hereditary relationships. Mutations are more likely to occur in sufficient information on hereditary relationships. Mutations are more likely to occur in genomic regions with low-complexity such as repeat sequences. Among repeating units, genomic regions with low-complexity such as repeat sequences. Among repeating units, polyguanine nucleotide repeats have a significantly higher mutation rate than other types of polyguanine nucleotide repeats have a significantly higher mutation rate than other types of repeats, such as polyA- or CA-repeats352,353. Phylogenetic fate maps, that describe how repeats, such as polyA- or CA-repeats352,353. Phylogenetic fate maps, that describe how various cells within one organism are related to one another, have been drawn by tracking various cells within one organism are related to one another, have been drawn by tracking differences in length of highly variable polyguanine nucleotide repeats in single cells from differences in length of highly variable polyguanine nucleotide repeats in single cells from different organs in mice15. Discrimination between repeat-lengths was here performed by different organs in mice15. Discrimination between repeat-lengths was here performed by PCR and fragment analysis. Thus far, fate-mapping based on multiplex fragment length PCR and fragment analysis. Thus far, fate-mapping based on multiplex fragment length 53 53

differences has been used, and the method inherently limits the number of loci that can be differences has been used, and the method inherently limits the number of loci that can be investigated. In this study we have approached large-scale analysis of length altering investigated. In this study we have approached large-scale analysis of length altering mutations in polyguanine tracts of human T-cells by massively parallel sequencing. mutations in polyguanine tracts of human T-cells by massively parallel sequencing. The high variability of polyguanine nucleotide repeats is mainly due to polymerase The high variability of polyguanine nucleotide repeats is mainly due to polymerase slippage353. Since polymerases are used in every in vitro amplification step from DNA to slippage353. Since polymerases are used in every in vitro amplification step from DNA to sequencing, this poses a problem for analysis, in particular when studying scarce material or sequencing, this poses a problem for analysis, in particular when studying scarce material or single cells. Moreover, highly biased GC fractions are known to impact PCR amplification single cells. Moreover, highly biased GC fractions are known to impact PCR amplification in a negative manner354,355. The sequencing chemistry used in Illumia is meant to enable in a negative manner354,355. The sequencing chemistry used in Illumia is meant to enable homopolymer sequencing with no difficulties356, however since the technique still involves homopolymer sequencing with no difficulties356, however since the technique still involves PCR amplification, there may be limitations. By using a model system, the ability to PCR amplification, there may be limitations. By using a model system, the ability to accurately Illumina sequence polyguanine nucleotide repeats, ranging in lengths from 5-30 accurately Illumina sequence polyguanine nucleotide repeats, ranging in lengths from 5-30 nucleotides, was evaluated. A great difficulty in correctly establishing lengths exceeding 10- nucleotides, was evaluated. A great difficulty in correctly establishing lengths exceeding 10- 15 basepairs was demonstrated. 15 basepairs was demonstrated. High-coverage whole genome sequence data from a multi-cell sample from a male High-coverage whole genome sequence data from a multi-cell sample from a male individual was used to evaluate variability and sequence coverage in polyguanine nucleotide individual was used to evaluate variability and sequence coverage in polyguanine nucleotide repeats over the genome. It was evident that polyguanine repeat lengths for our individual, repeats over the genome. It was evident that polyguanine repeat lengths for our individual, determined by Illumina sequence reads were in general shorter than those present in the determined by Illumina sequence reads were in general shorter than those present in the human reference genome. Cell to cell variability was investigated in low-coverage sequence human reference genome. Cell to cell variability was investigated in low-coverage sequence data from individual T-cells originating from the same individual as the multi-cell sample. A data from individual T-cells originating from the same individual as the multi-cell sample. A comparison between multi-cell data and single cell data verified maintained quality of the comparison between multi-cell data and single cell data verified maintained quality of the single cell data however, allelic dropout of a portion of the loci was observed, presumably single cell data however, allelic dropout of a portion of the loci was observed, presumably due to the low coverage in combination with the effects of whole genome amplification of due to the low coverage in combination with the effects of whole genome amplification of single cell material. single cell material. In an attempt to focus sequencing efforts to a selected number of polyguanine nucleotide In an attempt to focus sequencing efforts to a selected number of polyguanine nucleotide repeat loci, we employed the method Tri-nucleotide threading (TnT) in a new way, repeat loci, we employed the method Tri-nucleotide threading (TnT) in a new way, denoted Mono-nucleotide threading (MnT), to selectively amplify polyguanine repeats from denoted Mono-nucleotide threading (MnT), to selectively amplify polyguanine repeats from single cell material. The method uses an initial extension and ligation step in a cyclic single cell material. The method uses an initial extension and ligation step in a cyclic manner to introduce universal amplification primers in each amplified fragment in a linear manner to introduce universal amplification primers in each amplified fragment in a linear fashion. The universal amplification handles are then used in a universal PCR amplification fashion. The universal amplification handles are then used in a universal PCR amplification step, of all threads in parallel. By designing MnT-primers so that they hybridize adjacent to step, of all threads in parallel. By designing MnT-primers so that they hybridize adjacent to the polyguanine repeats, and by only including dCTPs in the linear amplification reaction, the polyguanine repeats, and by only including dCTPs in the linear amplification reaction, an increased selectivity is approached. By selectively amplifying a set of polyguanine loci an increased selectivity is approached. By selectively amplifying a set of polyguanine loci from single fibroblast cells, for which the phylogenetic relationship was known, we from single fibroblast cells, for which the phylogenetic relationship was known, we demonstrated the applicability of using MnT to selectively amplify polyG loci, and demonstrated the applicability of using MnT to selectively amplify polyG loci, and construct a phylogenetic tree based on the differences in polyG length between the cells. construct a phylogenetic tree based on the differences in polyG length between the cells.

5.4. Concluding remarks and future perspectives 5.4. Concluding remarks and future perspectives

Recent advances in biotechnology have enabled completely new avenues in life science Recent advances in biotechnology have enabled completely new avenues in life science research to be explored. By allowing increased parallelization an ever-increasing complexity research to be explored. By allowing increased parallelization an ever-increasing complexity of cell samples or experiments can be investigated in shorter time and at a lower cost. This of cell samples or experiments can be investigated in shorter time and at a lower cost. This facilitates for example large-scale efforts to study cell heterogeneity at the single cell level facilitates for example large-scale efforts to study cell heterogeneity at the single cell level that may also include global genome analyses. The work presented in this thesis focuses on that may also include global genome analyses. The work presented in this thesis focuses on

54 54 Julia Sandberg Julia Sandberg massively parallel analysis of cells or nucleic acid samples, demonstrating technology massively parallel analysis of cells or nucleic acid samples, demonstrating technology developments in the field as well as use of the technology in life sciences. developments in the field as well as use of the technology in life sciences. The speed of development of the gene technology field in the last five years has been The speed of development of the gene technology field in the last five years has been dramatic, with protocol changes and upgrades giving swifter protocols and more and better dramatic, with protocol changes and upgrades giving swifter protocols and more and better sequence reads. Since the advent of SOLiD and 454 sequencing, the standard library sequence reads. Since the advent of SOLiD and 454 sequencing, the standard library titration protocols have been changed and library titration in these platforms are nowadays titration protocols have been changed and library titration in these platforms are nowadays carried out not by sequencing but by quantifying the number of enriched (i.e. DNA carried out not by sequencing but by quantifying the number of enriched (i.e. DNA covered) beads from the miniature emulsion PCR reactions. Yet other methods for library covered) beads from the miniature emulsion PCR reactions. Yet other methods for library titration are available that do not include emulsion PCR. The latter methods offer an titration are available that do not include emulsion PCR. The latter methods offer an advantage over previous procedures by rendering a vast decrease in protocol advantage over previous procedures by rendering a vast decrease in protocol time106,110,360,361. It is useful to have control over the input prior to sequencing, in order to time106,110,360,361. It is useful to have control over the input prior to sequencing, in order to maximize the output. Flow-sorting of DNA capture beads can be used to collect beads with maximize the output. Flow-sorting of DNA capture beads can be used to collect beads with different features depending on application, as demonstrated in this thesis for the 454 different features depending on application, as demonstrated in this thesis for the 454 sequencing platform. sequencing platform. Ultra-deep sequencing has shed light on various issues such as cancer development and Ultra-deep sequencing has shed light on various issues such as cancer development and human leukocyte antigen typing118,312. Gene-specific flow-sorting solves a frequent problem human leukocyte antigen typing118,312. Gene-specific flow-sorting solves a frequent problem in amplicon sequencing studies, namely the plentiful low quality reads in sequencing data. in amplicon sequencing studies, namely the plentiful low quality reads in sequencing data. On the same track is barcode-specific labeling and flow-sorting in which normalization of a On the same track is barcode-specific labeling and flow-sorting in which normalization of a pooled sample containing four barcoded libraries was carried out in order to obtain even pooled sample containing four barcoded libraries was carried out in order to obtain even levels of all four libraries and with increased sequence quality. With current instrumentation levels of all four libraries and with increased sequence quality. With current instrumentation the method should be useful for up to 16 libraries in one reaction. Since half of the massive the method should be useful for up to 16 libraries in one reaction. Since half of the massive sequencing platforms available today use beads as solid support for template amplification, sequencing platforms available today use beads as solid support for template amplification, methods for handling beads in enrichment and selection in order to increase the number of methods for handling beads in enrichment and selection in order to increase the number of useful reads are of relevance. Today, investment in an expensive flow cytometer for this useful reads are of relevance. Today, investment in an expensive flow cytometer for this purpose, can probably not be fully justified. However, a downscaled flow-sorter, specialized purpose, can probably not be fully justified. However, a downscaled flow-sorter, specialized for automated bead-enrichment, could come in handy in sample preparation for core for automated bead-enrichment, could come in handy in sample preparation for core laboratories that have a steady stream of samples coming through, in order to maximize the laboratories that have a steady stream of samples coming through, in order to maximize the output of each run. output of each run. As sequencing continues to develop to offer more data for less money, enabling enormous As sequencing continues to develop to offer more data for less money, enabling enormous sequencing initiatives such as 1000 Genomes project296,297 and the Cancer genomes sequencing initiatives such as 1000 Genomes project296,297 and the Cancer genomes project298 that are currently being carried out, the knowledge of the connection between project298 that are currently being carried out, the knowledge of the connection between certain diseases and genetic signatures will increase. This will give rise to new diagnostic certain diseases and genetic signatures will increase. This will give rise to new diagnostic markers for the clinic, and increase the attention for personalized medicine. markers for the clinic, and increase the attention for personalized medicine. Cell-cell variability is interesting in many biological contexts. In some diseases, a Cell-cell variability is interesting in many biological contexts. In some diseases, a combination of traits is known to give a cell its particular disease phenotype and obtaining combination of traits is known to give a cell its particular disease phenotype and obtaining information on cell heterogeneity within a tissue is then crucial. For instance, it has been information on cell heterogeneity within a tissue is then crucial. For instance, it has been argued that knowing more about the co-occurrence of several different mutations within the argued that knowing more about the co-occurrence of several different mutations within the same cell is essential to fully understand the nature and evolution of cancer18,19. In other same cell is essential to fully understand the nature and evolution of cancer18,19. In other cases, a cell population may be made up of two subpopulations with distinctly varying traits, cases, a cell population may be made up of two subpopulations with distinctly varying traits, making the information obtained dependant upon their proportions in a mixed sample. making the information obtained dependant upon their proportions in a mixed sample. Circulating tumor cells in cancer patients and fetal cells in the blood of pregnant women Circulating tumor cells in cancer patients and fetal cells in the blood of pregnant women are two examples of rare cells that are difficult to study in their natural habitat and are two examples of rare cells that are difficult to study in their natural habitat and therefore need to be isolated for proper characterization of cancer genotype, and risk of therefore need to be isolated for proper characterization of cancer genotype, and risk of metastasis or fetal aneuploidy, respectively. High density miniaturized wells suitable for metastasis or fetal aneuploidy, respectively. High density miniaturized wells suitable for high throughput cell screening of individually seeded and possibly clonally expanded cells, high throughput cell screening of individually seeded and possibly clonally expanded cells,

55 55

will most likely prove useful in studies focusing on e.g. directed differentiation or will most likely prove useful in studies focusing on e.g. directed differentiation or reprogramming of stem cells, and may help in making screening studies more accessible. reprogramming of stem cells, and may help in making screening studies more accessible. Whole genome sequencing of single cell material was used to investigate cell-to-cell Whole genome sequencing of single cell material was used to investigate cell-to-cell variation in the most variable portion of the human genome, polyguanine nucleotide repeat variation in the most variable portion of the human genome, polyguanine nucleotide repeat regions. Starting from single cells, the challenges were many, including amplification of regions. Starting from single cells, the challenges were many, including amplification of genomic DNA while introducing minimal bias and investigating the ability to accurately genomic DNA while introducing minimal bias and investigating the ability to accurately sequence polyguanine nucleotide repeats by Illumina HiSeq sequencing. A number of sequence polyguanine nucleotide repeats by Illumina HiSeq sequencing. A number of highly variable regions were identified in multi-cell data from the same individual from highly variable regions were identified in multi-cell data from the same individual from which the single cells were collected, and the variability in these loci was confirmed in the which the single cells were collected, and the variability in these loci was confirmed in the singe cell data. By selectively amplifying and sequencing a set of polyguanine loci from eight singe cell data. By selectively amplifying and sequencing a set of polyguanine loci from eight cells for which the phylogenetic relationship was known, we further demonstrated the cells for which the phylogenetic relationship was known, we further demonstrated the ability of constructing phylogenetic trees based on massively parallel sequencing of ability of constructing phylogenetic trees based on massively parallel sequencing of amplified polyguanine loci. amplified polyguanine loci. Single cell sequencing approaches using whole genome amplification are highly interesting Single cell sequencing approaches using whole genome amplification are highly interesting in areas such as tumor development and stem cell research, nevertheless these approaches in areas such as tumor development and stem cell research, nevertheless these approaches come with major technical limitations and are currently limited to low coverage (i.e. around come with major technical limitations and are currently limited to low coverage (i.e. around 6%)16. The advent of new sequencing technologies that require less input material and with 6%)16. The advent of new sequencing technologies that require less input material and with no inherent amplification steps, may lead to new opportunities in this field within the near no inherent amplification steps, may lead to new opportunities in this field within the near future. future.

56 56 Julia Sandberg Julia Sandberg

Abbreviations Abbreviations bp base pairs bp base pairs cDNA complementary DNA cDNA complementary DNA ChIP chromatin immunoprecipitation ChIP chromatin immunoprecipitation CNV copy number variation CNV copy number variation cPAL combinatorial probe-anchor ligation cPAL combinatorial probe-anchor ligation dATP deoxyadenosine triphosphate dATP deoxyadenosine triphosphate dCTP deoxycytidine triphosphate dCTP deoxycytidine triphosphate ddNTPs dideoxy nucleotides ddNTPs dideoxy nucleotides dGTP deoxyguaninene triphosphate dGTP deoxyguaninene triphosphate DNA deoxyribonucleic acid DNA deoxyribonucleic acid DNB DNA nano-ball DNB DNA nano-ball dNTP deoxynucleotide triphosphate dNTP deoxynucleotide triphosphate DOP-PCR degenerate oligonucleotide-primed PCR DOP-PCR degenerate oligonucleotide-primed PCR ds double-stranded ds double-stranded dTTP deoxythymidine triphosphate dTTP deoxythymidine triphosphate ES embryonic stem (cell) ES embryonic stem (cell) FACS fluorescence-activated cell sorting FACS fluorescence-activated cell sorting FRET fluorescence resonance energy transfer FRET fluorescence resonance energy transfer Gb giga bases Gb giga bases gDNA genomic DNA gDNA genomic DNA HDA helicase-dependent amplification HDA helicase-dependent amplification kbp kilo base pairs kbp kilo base pairs LCM laser capture microdissection LCM laser capture microdissection LCR ligase chain reaction LCR ligase chain reaction LR-PCR long-range PCR LR-PCR long-range PCR MDA multiple displacement amplification MDA multiple displacement amplification MIP molecular inversion probe MIP molecular inversion probe MLPA multiplex ligation-dependent probe amplification MLPA multiplex ligation-dependent probe amplification mRNA messenger RNA mRNA messenger RNA MS mass spectrometry MS mass spectrometry PCR polymerase chain reaction PCR polymerase chain reaction PDMS polydimethylsiloxane PDMS polydimethylsiloxane PEP primer-extension preamplification PEP primer-extension preamplification PPi inorganic phosphate PPi inorganic phosphate RCA rolling circle amplification RCA rolling circle amplification RNA ribonucleic acid RNA ribonucleic acid RT-PCR real-time PCR RT-PCR real-time PCR SBH sequencing-by-hybridization SBH sequencing-by-hybridization SBL sequencing-by-ligation SBL sequencing-by-ligation SBS sequencing-by-synthesis SBS sequencing-by-synthesis SMRT single-molecule real-time SMRT single-molecule real-time SNP single nucleotide polymorphism SNP single nucleotide polymorphism SOLiD sequencing by oligonucleotide ligation and detection SOLiD sequencing by oligonucleotide ligation and detection ss single-stranded ss single-stranded WGA whole-genome amplification WGA whole-genome amplification ZMW zero-mode waveguide ZMW zero-mode waveguide

57 57

Acknowledgements Acknowledgements

Jag trodde aldrig att den här dagen skulle komma, men nu är den klar, boken. Det här hade Jag trodde aldrig att den här dagen skulle komma, men nu är den klar, boken. Det här hade aldrig gått utan alla som hjälpt, inspirerat, diskuterat och distraherat under åren som gått. aldrig gått utan alla som hjälpt, inspirerat, diskuterat och distraherat under åren som gått. Stort tack till Joakim för att du erbjöd mig en doktorandplats (i ett mail skickat kl 05.30 på Stort tack till Joakim för att du erbjöd mig en doktorandplats (i ett mail skickat kl 05.30 på julafton). Det har varit en resa och jag är väldigt glad att du varit där. Du är en idéspruta julafton). Det har varit en resa och jag är väldigt glad att du varit där. Du är en idéspruta utan dess like och efter ett möte med dig känner man sig alltid inspirerad. Tack också för utan dess like och efter ett möte med dig känner man sig alltid inspirerad. Tack också för allt kul genetech har varit ute på; gruppresorna till Åre och alperna har varit guld! Stort allt kul genetech har varit ute på; gruppresorna till Åre och alperna har varit guld! Stort tack också till fantastiska Afshin för att du finns där när resultaten är ’intressanta’ om och tack också till fantastiska Afshin för att du finns där när resultaten är ’intressanta’ om och om igen, lika mycket som när det går bra. Du är en klippa, och dessutom otrolig på att lösa om igen, lika mycket som när det går bra. Du är en klippa, och dessutom otrolig på att lösa kniviga problem. Jag hade tur som fick dig som bihandledare. Tack också till Jonas kniviga problem. Jag hade tur som fick dig som bihandledare. Tack också till Jonas Frisén, det har varit väldigt kul och inspirerande att jobba i gemensamma projekt med dig. Frisén, det har varit väldigt kul och inspirerande att jobba i gemensamma projekt med dig. Gamla och nya Genetechare ska ha ett stort tack. Daniel, för alla frågor och väldigt Gamla och nya Genetechare ska ha ett stort tack. Daniel, för alla frågor och väldigt långa svar, Sverker för att du håller mig sane på Scian och för att du inte snarkar, finurliga långa svar, Sverker för att du håller mig sane på Scian och för att du inte snarkar, finurliga Mårten för otaliga idéer och trevliga luncher, Henrik för sekvenser, finfin pipe och Mårten för otaliga idéer och trevliga luncher, Henrik för sekvenser, finfin pipe och dinosariebestigning i Florida, Johanna för att du sätter färg på genetech, Sara för fikor, dinosariebestigning i Florida, Johanna för att du sätter färg på genetech, Sara för fikor, FACSsnack och en resa till Helsingfors, EB för att du kan skratta när Li Li inte direkt FACSsnack och en resa till Helsingfors, EB för att du kan skratta när Li Li inte direkt levererar och för att du programmerar som en gud (vi blev klara till slut), EP och Pavan ni levererar och för att du programmerar som en gud (vi blev klara till slut), EP och Pavan ni har lärt mig mycket om TnT, punk och ribbbåtar. Tack också till medförfattare Pat (för har lärt mig mycket om TnT, punk och ribbbåtar. Tack också till medförfattare Pat (för snöänglar!) och Beata, och till seniora gruppmedlemmar Pelin, Olof, Magnus och snöänglar!) och Beata, och till seniora gruppmedlemmar Pelin, Olof, Magnus och Anders för att ni höjer nivån på diskussionen, och till nyare stjärnor Anders, Mahya, Anders för att ni höjer nivån på diskussionen, och till nyare stjärnor Anders, Mahya, Yai-Jing, Fredrik och Anna. Skidåkning, äggsprängning, heldagar på Lidingö och Yai-Jing, Fredrik och Anna. Skidåkning, äggsprängning, heldagar på Lidingö och mentobomber, tack! mentobomber, tack! Tack till min goa exjobbare Peter som är grym trots att du hatar att labba; och min Tack till min goa exjobbare Peter som är grym trots att du hatar att labba; och min jämnglada projektarbetare Nemo, tur att jag hittade dig. Thanks to my collaborators at jämnglada projektarbetare Nemo, tur att jag hittade dig. Thanks to my collaborators at CMB Aurélie, Jeff, Marta, Pedro. It’s been great working with you guys! Jag vill också CMB Aurélie, Jeff, Marta, Pedro. It’s been great working with you guys! Jag vill också passa på att tacka alla PIs på avdelningen för att vi har så välfungerande och trevliga labb passa på att tacka alla PIs på avdelningen för att vi har så välfungerande och trevliga labb både på albanova och scilife. Tack till Lasse för trädsnack. både på albanova och scilife. Tack till Lasse för trädsnack. Tack till tappra och trevliga 454 och Illumina folk: Max, Anna S, Anna W, Kicki, Tack till tappra och trevliga 454 och Illumina folk: Max, Anna S, Anna W, Kicki, Christian, Annica, Frida, Miriam, Desiree, Sofie, Elsie, Emma, Bahram för Christian, Annica, Frida, Miriam, Desiree, Sofie, Elsie, Emma, Bahram för sekvenser och småsnack. Tack till Martin och Emma för konfokalmikroskopihjälp, sekvenser och småsnack. Tack till Martin och Emma för konfokalmikroskopihjälp, Jompis och Stefan för exemplarisk handledning under exjobbet, (+ jl för FACShjälp), Jompis och Stefan för exemplarisk handledning under exjobbet, (+ jl för FACShjälp), Marcelo för att du älskar att grotta ner dig i tekniska detaljer. Mikael & Pontus tack för Marcelo för att du älskar att grotta ner dig i tekniska detaljer. Mikael & Pontus tack för att ni löste biffen när minuterna var dyra, och Roman för BGI hjälp. Och till Cajsa, Per, att ni löste biffen när minuterna var dyra, och Roman för BGI hjälp. Och till Cajsa, Per, och Jochen för att ni är så goa. och Jochen för att ni är så goa. Tack till Daniel, Otto, Sara och Johan för att ni tog er tid att läsa ett kapitel eller två. Tack till Daniel, Otto, Sara och Johan för att ni tog er tid att läsa ett kapitel eller två. Minst en stjärna i min himmel får ni. Tack Anna, Emma, Nina, Tove & Torun för Minst en stjärna i min himmel får ni. Tack Anna, Emma, Nina, Tove & Torun för puder, spooning, vin och smågodis i Kittelfjäll. puder, spooning, vin och smågodis i Kittelfjäll.

58 58 Julia Sandberg Julia Sandberg

Sen vill jag vill absolut tacka KTH-tjejerna för mycket kul Naama, Yo & Anna det var ju Sen vill jag vill absolut tacka KTH-tjejerna för mycket kul Naama, Yo & Anna det var ju där det började, och andra goda och spridda vänner Elin & Jo, Lobi & Vic, Marica, där det började, och andra goda och spridda vänner Elin & Jo, Lobi & Vic, Marica, Reb & Fred, Sunke & Rick, Lena, Cat, Ulrica, Martina, Josefine för att ni Reb & Fred, Sunke & Rick, Lena, Cat, Ulrica, Martina, Josefine för att ni förgyller livet utanför labbet! förgyller livet utanför labbet!

Inget hade gått utan familjen. Jag vill tacka mamma & pappa för obegränsat med kärlek Inget hade gått utan familjen. Jag vill tacka mamma & pappa för obegränsat med kärlek och stöd genom alla år. Moster min stora förebild, tack för att du alltid finns där för pepp, och stöd genom alla år. Moster min stora förebild, tack för att du alltid finns där för pepp, skratt, goda råd (och middagar) och en och annan uppläxning. Min fina lilla mormor, still skratt, goda råd (och middagar) och en och annan uppläxning. Min fina lilla mormor, still going strong vid 90 års ålder, jag förväntar mig dans på bordet på festen! …och resten av going strong vid 90 års ålder, jag förväntar mig dans på bordet på festen! …och resten av familjegänget; Ulla, Alexander, Isabella, Sebastian & Helene, Farmor & Ragge, familjegänget; Ulla, Alexander, Isabella, Sebastian & Helene, Farmor & Ragge, Owe, Lollo & Ralph, Maria & Martin, Ancha & Hasse, Nisse, Knut. Tack för Owe, Lollo & Ralph, Maria & Martin, Ancha & Hasse, Nisse, Knut. Tack för skidåkning, hjortronplockning, fjällvandring, fiske, mys på landet, Toscana- och Lugano- skidåkning, hjortronplockning, fjällvandring, fiske, mys på landet, Toscana- och Lugano- resor och trevliga familjemiddagar i massor. resor och trevliga familjemiddagar i massor.

Jag vill avsluta med att tacka Johan för all hjälp under den här perioden! Tack för att du Jag vill avsluta med att tacka Johan för all hjälp under den här perioden! Tack för att du finns, du är fantastisk. (Nästa år tar jag båten.) finns, du är fantastisk. (Nästa år tar jag båten.)

59 59

References References

1 Gibson, D. et al. Creation of a Bacterial Cell Controlled by a Chemically Synthesized Genome. Science 1 Gibson, D. et al. Creation of a Bacterial Cell Controlled by a Chemically Synthesized Genome. Science 329, 52-56, (2010). 329, 52-56, (2010). 2 Smith, H., Hutchison, C., Pfannkoch, C. & Venter, J. Generating a synthetic genome by whole genome 2 Smith, H., Hutchison, C., Pfannkoch, C. & Venter, J. Generating a synthetic genome by whole genome assembly: phiX174 bacteriophage from synthetic oligonucleotides. Proceedings of the National Academy of assembly: phiX174 bacteriophage from synthetic oligonucleotides. Proceedings of the National Academy of Sciences of the United States of America 100, 15440-15445, (2003). Sciences of the United States of America 100, 15440-15445, (2003). 3 Lartigue, C. et al. Genome transplantation in bacteria: changing one species to another. Science 317, 632- 3 Lartigue, C. et al. Genome transplantation in bacteria: changing one species to another. Science 317, 632- 638, (2007). 638, (2007). 4 Gibson, D. et al. Complete chemical synthesis, assembly, and cloning of a Mycoplasma genitalium 4 Gibson, D. et al. Complete chemical synthesis, assembly, and cloning of a Mycoplasma genitalium genome. Science 319, 1215-1220, (2008). genome. Science 319, 1215-1220, (2008). 5 Nurse, P. A Long Twentieth Century of the Cell Cycle and Beyond. Cell 100, 71-78, (2000). 5 Nurse, P. A Long Twentieth Century of the Cell Cycle and Beyond. Cell 100, 71-78, (2000). 6 Watson, J. D. & Crick, F. H. Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. 6 Watson, J. D. & Crick, F. H. Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature 171, 737-738, (1953). Nature 171, 737-738, (1953). 7 Kubista, M. et al. The real-time polymerase chain reaction. Molecular aspects of medicine 27, 95-125, (2006). 7 Kubista, M. et al. The real-time polymerase chain reaction. Molecular aspects of medicine 27, 95-125, (2006). 8 Kettman, J. R., Coleclough, C., Frey, J. R. & Lefkovits, I. Clonal proteomics: one gene - family of 8 Kettman, J. R., Coleclough, C., Frey, J. R. & Lefkovits, I. Clonal proteomics: one gene - family of proteins. Proteomics 2, 624-631, (2002). proteins. Proteomics 2, 624-631, (2002). 9 Crick, F. Central Dogma of Molecular Biology. Nature 227, 561-563, (1970). 9 Crick, F. Central Dogma of Molecular Biology. Nature 227, 561-563, (1970). 10 Shapiro, J. A. Revisiting the central dogma in the 21st century. Annals of the New York Academy of Sciences 10 Shapiro, J. A. Revisiting the central dogma in the 21st century. Annals of the New York Academy of Sciences 1178, 6-28, (2009). 1178, 6-28, (2009). 11 Bartel, D. P. & Szostak, J. W. Isolation of new from a large pool of random sequences. Science 11 Bartel, D. P. & Szostak, J. W. Isolation of new ribozymes from a large pool of random sequences. Science 261, 1411-1418, (1993). 261, 1411-1418, (1993). 12 Levsky, J. M., Shenoy, S. M., Pezo, R. C. & Singer, R. H. Single-cell gene expression profiling. Science 12 Levsky, J. M., Shenoy, S. M., Pezo, R. C. & Singer, R. H. Single-cell gene expression profiling. Science 297, 836-840, (2002). 297, 836-840, (2002). 13 Raj, A., Peskin, C., Tranchina, D., Vargas, D. & Tyagi, S. Stochastic mRNA synthesis in mammalian 13 Raj, A., Peskin, C., Tranchina, D., Vargas, D. & Tyagi, S. Stochastic mRNA synthesis in mammalian cells. PLoS biology 4, e309, (2006). cells. PLoS biology 4, e309, (2006). 14 Paszek, P. et al. Population robustness arising from cellular heterogeneity. Proceedings of the National Academy 14 Paszek, P. et al. Population robustness arising from cellular heterogeneity. Proceedings of the National Academy of Sciences of the United States of America 107, 11644-11649, (2010). of Sciences of the United States of America 107, 11644-11649, (2010). 15 Salipante, S. J. & Horwitz, M. S. Phylogenetic fate mapping. Proceedings of the National Academy of Sciences of 15 Salipante, S. J. & Horwitz, M. S. Phylogenetic fate mapping. Proceedings of the National Academy of Sciences of the United States of America 103, 5448-5453, (2006). the United States of America 103, 5448-5453, (2006). 16 Navin, N. et al. Tumour evolution inferred by single-cell sequencing. Nature 472, 90-94, (2011). 16 Navin, N. et al. Tumour evolution inferred by single-cell sequencing. Nature 472, 90-94, (2011). 17 Carlo, D. D. & Lee, L. P. Dynamic Single Cell Analysis for Quantitative Biology. Analytical chemistry 78, 17 Carlo, D. D. & Lee, L. P. Dynamic Single Cell Analysis for Quantitative Biology. Analytical chemistry 78, 7918-7925, (2006). 7918-7925, (2006). 18 d'Amore, F. et al. Clonal evolution in t(14;18)-positive follicular lymphoma, evidence for multiple common 18 d'Amore, F. et al. Clonal evolution in t(14;18)-positive follicular lymphoma, evidence for multiple common pathways, and frequent parallel clonal evolution. Clinical cancer research : an official journal of the American pathways, and frequent parallel clonal evolution. Clinical cancer research : an official journal of the American Association for Cancer Research 14, 7180-7187, (2008). Association for Cancer Research 14, 7180-7187, (2008). 19 Riethdorf, S., Wikman, H. & Pantel, K. Review: Biological relevance of disseminated tumor cells in 19 Riethdorf, S., Wikman, H. & Pantel, K. Review: Biological relevance of disseminated tumor cells in cancer patients. International Journal of Cancer 123, 1991-2006, (2008). cancer patients. International Journal of Cancer 123, 1991-2006, (2008). 20 Marcy, Y. et al. Nanoliter Reactors Improve Multiple Displacement Amplification of Genomes from 20 Marcy, Y. et al. Nanoliter Reactors Improve Multiple Displacement Amplification of Genomes from Single Cells. PLoS Genetics 3, e155, (2007). Single Cells. PLoS Genetics 3, e155, (2007). 21 Frisk, T. W., Khorshidi, M. A., Guldevall, K., Vanherberghen, B. & Onfelt, B. A silicon-glass microwell 21 Frisk, T. W., Khorshidi, M. A., Guldevall, K., Vanherberghen, B. & Onfelt, B. A silicon-glass microwell platform for high-resolution imaging and high-content screening with single cell resolution. Biomedical platform for high-resolution imaging and high-content screening with single cell resolution. Biomedical microdevices 13, 683-693, (2011). microdevices 13, 683-693, (2011). 22 Frumkin, D., Wasserstrom, A., Kaplan, S., Feige, U. & Shapiro, E. Genomic variability within an 22 Frumkin, D., Wasserstrom, A., Kaplan, S., Feige, U. & Shapiro, E. Genomic variability within an organism exposes its cell lineage tree. PLoS computational biology 1, e50, (2005). organism exposes its cell lineage tree. PLoS computational biology 1, e50, (2005). 23 Hooke, R. (1665). Micrographia; Some physiological descriptions of minute bodies made by magnifying 23 Hooke, R. (1665). Micrographia; Some physiological descriptions of minute bodies made by magnifying glasses. With observations and inquiries thereupon., London: J. Martyn and J. Allestry. glasses. With observations and inquiries thereupon., London: J. Martyn and J. Allestry. 24 Bode, P. K. et al. MAGEC2 is a sensitive and novel marker for seminoma: a tissue microarray analysis of 24 Bode, P. K. et al. MAGEC2 is a sensitive and novel marker for seminoma: a tissue microarray analysis of 325 testicular germ cell tumors. Modern pathology: an official journal of the United States and Canadian Academy of 325 testicular germ cell tumors. Modern pathology: an official journal of the United States and Canadian Academy of Pathology, Inc 24, 829-835, (2011). Pathology, Inc 24, 829-835, (2011). 25 Barbe, L. et al. Toward a confocal subcellular atlas of the human proteome. Molecular & cellular proteomics 7, 25 Barbe, L. et al. Toward a confocal subcellular atlas of the human proteome. Molecular & cellular proteomics 7, 499-508, (2008). 499-508, (2008). 26 Koshkin, V. & Krylov, S. N. Single-cell-kinetics approach to compare multidrug resistance-associated 26 Koshkin, V. & Krylov, S. N. Single-cell-kinetics approach to compare multidrug resistance-associated membrane transport in subpopulations of cells. Anal Chem 83, 6132-6134, (2011). membrane transport in subpopulations of cells. Anal Chem 83, 6132-6134, (2011). 27 Neher, E., Sakmann, B. & Steinbach, J. H. The extracellular patch clamp: a method for resolving 27 Neher, E., Sakmann, B. & Steinbach, J. H. The extracellular patch clamp: a method for resolving currents through individual open channels in biological membranes. Pflügers Archiv: European journal of currents through individual open channels in biological membranes. Pflügers Archiv: European journal of physiology 375, 219-228, (1978). physiology 375, 219-228, (1978).

60 60 Julia Sandberg Julia Sandberg

28 Mitsushima, D., Ishihara, K., Sano, A., Kessels, H. W. & Takahashi, T. Contextual learning requires 28 Mitsushima, D., Ishihara, K., Sano, A., Kessels, H. W. & Takahashi, T. Contextual learning requires synaptic AMPA receptor delivery in the hippocampus. Proceedings of the National Academy of Sciences of the synaptic AMPA receptor delivery in the hippocampus. Proceedings of the National Academy of Sciences of the United States of America 108, 12503-12508, (2011). United States of America 108, 12503-12508, (2011). 29 Newman, J. R. et al. Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological 29 Newman, J. R. et al. Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature 441, 840-846, (2006). noise. Nature 441, 840-846, (2006). 30 Rubakhin, S. S., Romanova, E. V., Nemes, P. & Sweedler, J. V. Profiling metabolites and peptides in 30 Rubakhin, S. S., Romanova, E. V., Nemes, P. & Sweedler, J. V. Profiling metabolites and peptides in single cells. Nature methods 8, S20-29, (2011). single cells. Nature methods 8, S20-29, (2011). 31 Manz, A., Graber, N. & Widmer, H. M. Miniaturized Total Chemical Analysis Systems: a Novel 31 Manz, A., Graber, N. & Widmer, H. M. Miniaturized Total Chemical Analysis Systems: a Novel Concept for Chemical Sensing. Sensors and Actuators B1, 244-248, (1990). Concept for Chemical Sensing. Sensors and Actuators B1, 244-248, (1990). 32 Kricka, L. J. & Wilding, P. Microchip PCR. Analytical and bioanalytical chemistry 377, 820-825, (2003). 32 Kricka, L. J. & Wilding, P. Microchip PCR. Analytical and bioanalytical chemistry 377, 820-825, (2003). 33 Zare, R. & Kim, S. Microfluidic platforms for single-cell analysis. Annual Review of Biomedical Engineering 12, 33 Zare, R. & Kim, S. Microfluidic platforms for single-cell analysis. Annual Review of Biomedical Engineering 12, 187-201, (2010). 187-201, (2010). 34 Hoeckner, M., Erdel, M., Spreiz, A., Utermann, G. & Kotzot, D. Whole genome amplification from 34 Hoeckner, M., Erdel, M., Spreiz, A., Utermann, G. & Kotzot, D. Whole genome amplification from microdissected chromosomes. Cytogenetic and genome research 125, 98-102, (2009). microdissected chromosomes. Cytogenetic and genome research 125, 98-102, (2009). 35 Navin, N. & Hicks, J. Future medical applications of single-cell sequencing in cancer. Genome medicine 3, 1- 35 Navin, N. & Hicks, J. Future medical applications of single-cell sequencing in cancer. Genome medicine 3, 1- 12, (2011). 12, (2011). 36 Lau, E. et al. Birth of a healthy infant following preimplantation PKHD1 haplotyping for autosomal 36 Lau, E. et al. Birth of a healthy infant following preimplantation PKHD1 haplotyping for autosomal recessive polycystic kidney disease using multiple displacement amplification. Journal of Assisted Reproduction recessive polycystic kidney disease using multiple displacement amplification. Journal of Assisted Reproduction and Genetics 27, 397-407, (2010). and Genetics 27, 397-407, (2010). 37 Jonáš, A. & Zemánek, P. Light at work: the use of optical forces for particle manipulation, sorting, and 37 Jonáš, A. & Zemánek, P. Light at work: the use of optical forces for particle manipulation, sorting, and analysis. Electrophoresis 29, 4813-4851, (2008). analysis. Electrophoresis 29, 4813-4851, (2008). 38 Emmert-Buck, M. R. et al. Laser Capture Microdissection. Science 274, 998-1001, (1996). 38 Emmert-Buck, M. R. et al. Laser Capture Microdissection. Science 274, 998-1001, (1996). 39 Tietjen, I. et al. Single-cell transcriptional analysis of neuronal progenitors. Neuron 38, 161-175, (2003). 39 Tietjen, I. et al. Single-cell transcriptional analysis of neuronal progenitors. Neuron 38, 161-175, (2003). 40 Kroneis, T. et al. Combined Molecular Genetic and Cytogenetic Analysis from Single Cells after 40 Kroneis, T. et al. Combined Molecular Genetic and Cytogenetic Analysis from Single Cells after Isothermal Whole-Genome Amplification. Clinical chemistry 57, 1-10, (2011). Isothermal Whole-Genome Amplification. Clinical chemistry 57, 1-10, (2011). 41 Frumkin, D. et al. Amplification of multiple genomic loci from single cells isolated by laser micro- 41 Frumkin, D. et al. Amplification of multiple genomic loci from single cells isolated by laser micro- dissection of tissues. BMC Biotechnology 8, 1-16, (2008). dissection of tissues. BMC Biotechnology 8, 1-16, (2008). 42 Gelpi, E. et al. Fluorescent in situ hybridization on isolated tumor cell nuclei: a sensitive method for 1p 42 Gelpi, E. et al. Fluorescent in situ hybridization on isolated tumor cell nuclei: a sensitive method for 1p and 19q deletion analysis in paraffin-embedded oligodendroglial tumor specimens. Modern pathology: an and 19q deletion analysis in paraffin-embedded oligodendroglial tumor specimens. Modern pathology: an official journal of the United States and Canadian Academy of Pathology, Inc 16, 708-715, (2003). official journal of the United States and Canadian Academy of Pathology, Inc 16, 708-715, (2003). 43 Hulett, H. R., Bonner, W. A., Barrett, J. & Herzenberg, L. A. Cell Sorting: Automated Separation of 43 Hulett, H. R., Bonner, W. A., Barrett, J. & Herzenberg, L. A. Cell Sorting: Automated Separation of Mammalian Cells as a Function of Intracellular Fluorescence. Science 166, 747-749, (1969). Mammalian Cells as a Function of Intracellular Fluorescence. Science 166, 747-749, (1969). 44 Shapiro, H. M. (2003). Practical Flow Cytometry, 4th Edition 4 edn, 736 New York: Wiley-Liss. 44 Shapiro, H. M. (2003). Practical Flow Cytometry, 4th Edition 4 edn, 736 New York: Wiley-Liss. 45 Perfetto, S. P., Chattopadhyay, P. K. & Roederer, M. Seventeen-colour flow cytometry: unravelling the 45 Perfetto, S. P., Chattopadhyay, P. K. & Roederer, M. Seventeen-colour flow cytometry: unravelling the immune system. Nature reviews Immunology 4, 648-655, (2004). immune system. Nature reviews Immunology 4, 648-655, (2004). 46 Fulwyler, M. J. Electronic separation of biological cells by volume. Science 150, 910-911, (1965). 46 Fulwyler, M. J. Electronic separation of biological cells by volume. Science 150, 910-911, (1965). 47 Notta, F. et al. Isolation of single human hematopoietic stem cells capable of long-term multilineage 47 Notta, F. et al. Isolation of single human hematopoietic stem cells capable of long-term multilineage engraftment. Science 333, 218-221, (2011). engraftment. Science 333, 218-221, (2011). 48 Rockberg, J., Löfblom, J., Hjelm, B., Uhlén, M. & Ståhl, S. Epitope mapping of antibodies using bacterial 48 Rockberg, J., Löfblom, J., Hjelm, B., Uhlén, M. & Ståhl, S. Epitope mapping of antibodies using bacterial surface display. Nature methods 5, 1039-1045, (2008). surface display. Nature methods 5, 1039-1045, (2008). 49 Rodrigue, S. et al. Whole genome amplification and de novo assembly of single bacterial cells. PLoS One 4, 49 Rodrigue, S. et al. Whole genome amplification and de novo assembly of single bacterial cells. PLoS One 4, e6864, (2009). e6864, (2009). 50 Bain, B. J. et al. Revised guideline on immunophenotyping in acute leukaemias and chronic 50 Bain, B. J. et al. Revised guideline on immunophenotyping in acute leukaemias and chronic lymphoproliferative disorders. Clinical and laboratory haematology 24, 1-13, (2002). lymphoproliferative disorders. Clinical and laboratory haematology 24, 1-13, (2002). 51 Xu, T., Jin, J., Gregory, C., Hickman, J. J. & Boland, T. Inkjet printing of viable mammalian cells. 51 Xu, T., Jin, J., Gregory, C., Hickman, J. J. & Boland, T. Inkjet printing of viable mammalian cells. Biomaterials 26, 93-99, (2005). Biomaterials 26, 93-99, (2005). 52 Calvert, P. Printing cells. Science 318, 208-209, (2007). 52 Calvert, P. Printing cells. Science 318, 208-209, (2007). 53 Liberski, A. R., Delaney, J. T. & Schubert, U. S. "One Cell-One Well": A New Approach to Inkjet 53 Liberski, A. R., Delaney, J. T. & Schubert, U. S. "One Cell-One Well": A New Approach to Inkjet Printing Single Cell Microarrays. ACS Combinatorial Science 13, 190-195, (2011). Printing Single Cell Microarrays. ACS Combinatorial Science 13, 190-195, (2011). 54 Yusof, A. et al. Inkjet-like printing of single-cells. Lab on a Chip 11, 2447-2454, (2011). 54 Yusof, A. et al. Inkjet-like printing of single-cells. Lab on a Chip 11, 2447-2454, (2011). 55 Kim, S., Yamamoto, T., Fourmy, D. & Fujii, T. An electroactive microwell array for trapping and lysing 55 Kim, S., Yamamoto, T., Fourmy, D. & Fujii, T. An electroactive microwell array for trapping and lysing single-bacterial cells. Biomicrofluidics 5, 024114, (2011). single-bacterial cells. Biomicrofluidics 5, 024114, (2011). 56 Tokimitsu, Y. et al. Single lymphocyte analysis with a microwell array chip. Cytom Part A 71, 1003-1010, 56 Tokimitsu, Y. et al. Single lymphocyte analysis with a microwell array chip. Cytom Part A 71, 1003-1010, (2007). (2007). 57 Lindström, S., Larsson, R. & Svahn, H. A. Towards high-throughput single cell/clone cultivation and 57 Lindström, S., Larsson, R. & Svahn, H. A. Towards high-throughput single cell/clone cultivation and analysis. Electrophoresis 29, 1219-1227, (2008). analysis. Electrophoresis 29, 1219-1227, (2008). 58 Selimović, S. et al. Microfabricated polyester conical microwells for cell culture applications. Lab on a Chip 58 Selimović, S. et al. Microfabricated polyester conical microwells for cell culture applications. Lab on a Chip 11, 2325-2332, (2011). 11, 2325-2332, (2011). 59 Zhu, H. et al. Detecting cytokine release from single T-cells. Analytical chemistry 81, 8150-8156, (2009). 59 Zhu, H. et al. Detecting cytokine release from single T-cells. Analytical chemistry 81, 8150-8156, (2009).

61 61

60 Kang, E., Choi, Y., Jun, Y., Chung, B. & Lee, S. Development of a multi-layer microfluidic array chip to 60 Kang, E., Choi, Y., Jun, Y., Chung, B. & Lee, S. Development of a multi-layer microfluidic array chip to culture and replate uniform-sized embryoid bodies without manual cell retrieval. Lab on a Chip 10, 2651- culture and replate uniform-sized embryoid bodies without manual cell retrieval. Lab on a Chip 10, 2651- 2654, (2010). 2654, (2010). 61 Han, C. et al. Integration of single oocyte trapping, in vitro fertilization and embryo culture in a 61 Han, C. et al. Integration of single oocyte trapping, in vitro fertilization and embryo culture in a microwell-structured microfluidic device. Lab on a Chip 10, 2848-2854, (2010). microwell-structured microfluidic device. Lab on a Chip 10, 2848-2854, (2010). 62 Revzin, A., Sekine, K., Sin, A., Tompkins, R. & Toner, M. Development of a microfabricated cytometry 62 Revzin, A., Sekine, K., Sin, A., Tompkins, R. & Toner, M. Development of a microfabricated cytometry platform for characterization and sorting of individual leukocytes. Lab on a Chip 5, 30-37, (2005). platform for characterization and sorting of individual leukocytes. Lab on a Chip 5, 30-37, (2005). 63 Love, J. C., Ronan, J. L., Grotenbreg, G. M., van der Veen, A. G. & Ploegh, H. L. A microengraving 63 Love, J. C., Ronan, J. L., Grotenbreg, G. M., van der Veen, A. G. & Ploegh, H. L. A microengraving method for rapid selection of single cells producing antigen-specific antibodies. Nature Biotechnology 24, 703- method for rapid selection of single cells producing antigen-specific antibodies. Nature Biotechnology 24, 703- 707, (2006). 707, (2006). 64 Jin, A. et al. Rapid isolation of antigen-specific antibody-secreting cells using a chip-based immunospot 64 Jin, A. et al. Rapid isolation of antigen-specific antibody-secreting cells using a chip-based immunospot array. Nature protocols 6, 668-676, (2011). array. Nature protocols 6, 668-676, (2011). 65 Roth, E. A. et al. Inkjet printing for high-throughput cell patterning. Biomaterials 25, 3707-3715, (2004). 65 Roth, E. A. et al. Inkjet printing for high-throughput cell patterning. Biomaterials 25, 3707-3715, (2004). 66 Lin, L. I., Chao, S. H. & Meldrum, D. R. Practical, microfabrication-free device for single-cell isolation. 66 Lin, L. I., Chao, S. H. & Meldrum, D. R. Practical, microfabrication-free device for single-cell isolation. PLoS One 4, 1-5, (2009). PLoS One 4, 1-5, (2009). 67 Chen, C. S., Mrksich, M., Huang, S., Whitesides, G. M. & Ingber, D. E. Geometric control of cell life 67 Chen, C. S., Mrksich, M., Huang, S., Whitesides, G. M. & Ingber, D. E. Geometric control of cell life and death. Science 276, 1425-1428, (1997). and death. Science 276, 1425-1428, (1997). 68 Kantlehner, M. et al. A high-throughput DNA methylation analysis of a single cell. Nucleic Acids Research 68 Kantlehner, M. et al. A high-throughput DNA methylation analysis of a single cell. Nucleic Acids Research 39, 1-5, (2011). 39, 1-5, (2011). 69 Dressman, D., Yan, H., Traverso, G., Kinzler, K. W. & Vogelstein, B. Transforming single DNA 69 Dressman, D., Yan, H., Traverso, G., Kinzler, K. W. & Vogelstein, B. Transforming single DNA molecules into fluorescent magnetic particles for detection and enumeration of genetic variations. molecules into fluorescent magnetic particles for detection and enumeration of genetic variations. Proceedings of the National Academy of Sciences of the United States of America 100, 8817-8822, (2003). Proceedings of the National Academy of Sciences of the United States of America 100, 8817-8822, (2003). 70 Margulies, M. et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 70 Margulies, M. et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376-380, (2005). 376-380, (2005). 71 Kelly, B. T., Baret, J. C., Taly, V. & Griffiths, A. D. Miniaturizing chemistry and biology in 71 Kelly, B. T., Baret, J. C., Taly, V. & Griffiths, A. D. Miniaturizing chemistry and biology in microdroplets. Chemical communications 18, 1773-1788, (2007). microdroplets. Chemical communications 18, 1773-1788, (2007). 72 Bernath, K. et al. In vitro compartmentalization by double emulsions: sorting and gene enrichment by 72 Bernath, K. et al. In vitro compartmentalization by double emulsions: sorting and gene enrichment by fluorescence activated cell sorting. Analytical biochemistry 325, 151-157, (2004). fluorescence activated cell sorting. Analytical biochemistry 325, 151-157, (2004). 73 Aharoni, A., Amitai, G., Bernath, K., Magdassi, S. & Tawfik, D. S. High-throughput screening of enzyme 73 Aharoni, A., Amitai, G., Bernath, K., Magdassi, S. & Tawfik, D. S. High-throughput screening of enzyme libraries: thiolactonases evolved by fluorescence-activated sorting of single cells in emulsion libraries: thiolactonases evolved by fluorescence-activated sorting of single cells in emulsion compartments. Chemistry & biology 12, 1281-1289, (2005). compartments. Chemistry & biology 12, 1281-1289, (2005). 74 Clausell-Tormos, J. et al. Droplet-based microfluidic platforms for the encapsulation and screening of 74 Clausell-Tormos, J. et al. Droplet-based microfluidic platforms for the encapsulation and screening of Mammalian cells and multicellular organisms. Chemistry & biology 15, 427-437, (2008). Mammalian cells and multicellular organisms. Chemistry & biology 15, 427-437, (2008). 75 Brouzes, E. et al. Droplet microfluidic technology for single-cell high-throughput screening. Proceedings of the 75 Brouzes, E. et al. Droplet microfluidic technology for single-cell high-throughput screening. Proceedings of the National Academy of Sciences of the United States of America 106, 14195-14200, (2009). National Academy of Sciences of the United States of America 106, 14195-14200, (2009). 76 Jönsson, H. N. et al. Detection and analysis of low-abundance cell-surface biomarkers using enzymatic 76 Jönsson, H. N. et al. Detection and analysis of low-abundance cell-surface biomarkers using enzymatic amplification in microfluidic droplets. Angewandte Chemie (International ed in English) 48, 2518-2521, (2009). amplification in microfluidic droplets. Angewandte Chemie (International ed in English) 48, 2518-2521, (2009). 77 Novak, R. et al. Single-Cell Multiplex Gene Detection and Sequencing with Microfluidically Generated 77 Novak, R. et al. Single-Cell Multiplex Gene Detection and Sequencing with Microfluidically Generated Agarose Emulsions. Angewandte Chemie 123, 410-415, (2010). Agarose Emulsions. Angewandte Chemie 123, 410-415, (2010). 78 Zimmerman, W. B. J. (2006). Microfluidics: History, Theory and Applications, Vol. 466 New York: 78 Zimmerman, W. B. J. (2006). Microfluidics: History, Theory and Applications, Vol. 466 New York: Springer-Verlag. Springer-Verlag. 79 Brown, R. B. & Audet, J. Current techniques for single-cell lysis. Journal of the Royal Society, Interface / the 79 Brown, R. B. & Audet, J. Current techniques for single-cell lysis. Journal of the Royal Society, Interface / the Royal Society 5 Suppl 2, S131-138, (2008). Royal Society 5 Suppl 2, S131-138, (2008). 80 Cheng, W., Klauke, N., Sedgwick, H., Smith, G. L. & Cooper, J. M. Metabolic monitoring of the 80 Cheng, W., Klauke, N., Sedgwick, H., Smith, G. L. & Cooper, J. M. Metabolic monitoring of the electrically stimulated single heart cell within a microfluidic platform. Lab on a Chip 6, 1424-1431, (2006). electrically stimulated single heart cell within a microfluidic platform. Lab on a Chip 6, 1424-1431, (2006). 81 Ottesen, E., Hong, J., Quake, S. & Leadbetter, J. Microfluidic Digital PCR Enables Multigene Analysis of 81 Ottesen, E., Hong, J., Quake, S. & Leadbetter, J. Microfluidic Digital PCR Enables Multigene Analysis of Individual Environmental Bacteria. Science 314, 1464-1467, (2006). Individual Environmental Bacteria. Science 314, 1464-1467, (2006). 82 Marcy, Y. et al. Dissecting biological "dark matter" with single-cell genetic analysis of rare and 82 Marcy, Y. et al. Dissecting biological "dark matter" with single-cell genetic analysis of rare and uncultivated TM7 microbes from the human mouth. Proceedings of the National Academy of Sciences of the United uncultivated TM7 microbes from the human mouth. Proceedings of the National Academy of Sciences of the United States of America 104, 11889-11894, (2007). States of America 104, 11889-11894, (2007). 83 Fan, H. C., Wang, J., Potanina, A. & Quake, S. R. Whole-genome molecular haplotyping of single cells. 83 Fan, H. C., Wang, J., Potanina, A. & Quake, S. R. Whole-genome molecular haplotyping of single cells. Nature Biotechnology 29, 51-57, (2010). Nature Biotechnology 29, 51-57, (2010). 84 Ashkin, A. & Dziedzic, J. M. Optcal Trapping and Manipulation of and Bacteria. Science 235, 84 Ashkin, A. & Dziedzic, J. M. Optcal Trapping and Manipulation of Viruses and Bacteria. Science 235, 1517-1520, (1987). 1517-1520, (1987). 85 MacDonald, M. P., Spalding, G. C. & Dholakia, K. Microfluidic sorting in an optical lattice. Nature 426, 85 MacDonald, M. P., Spalding, G. C. & Dholakia, K. Microfluidic sorting in an optical lattice. Nature 426, 421-424, (2003). 421-424, (2003). 86 Pohl, H. A. (1978). Dielectrophoresis: The behavior of neutral matter in nonuniform electric fields, 86 Pohl, H. A. (1978). Dielectrophoresis: The behavior of neutral matter in nonuniform electric fields, Cambridge and New York: Cambridge University Press. Cambridge and New York: Cambridge University Press. 87 Desai, S. P., Voldman, J. in Twelfth International Conference on Miniaturized Systems for Chemistry and Life Sciences. 87 Desai, S. P., Voldman, J. in Twelfth International Conference on Miniaturized Systems for Chemistry and Life Sciences. 1308-1310. 1308-1310. 62 62 Julia Sandberg Julia Sandberg

88 Holmes, D., She, J. K., Roach, P. L. & Morgan, H. Bead-based immunoassays using a micro-chip flow 88 Holmes, D., She, J. K., Roach, P. L. & Morgan, H. Bead-based immunoassays using a micro-chip flow cytometer. Lab on a Chip 7, 1048–1056, (2007). cytometer. Lab on a Chip 7, 1048–1056, (2007). 89 Manneberg, O. & Vanherberghen, B. A three-dimensional ultrasonic cage for characterization of 89 Manneberg, O. & Vanherberghen, B. A three-dimensional ultrasonic cage for characterization of individual cells. Applied Physics Letters 93, 063901-063903, (2008). individual cells. Applied Physics Letters 93, 063901-063903, (2008). 90 Vanherberghen, B. et al. Ultrasound-controlled cell aggregation in a multi-well chip. Lab on a Chip 10, 90 Vanherberghen, B. et al. Ultrasound-controlled cell aggregation in a multi-well chip. Lab on a Chip 10, 2727-2732, (2010). 2727-2732, (2010). 91 Lenshof, A. et al. Acoustic whole blood plasmapheresis chip for prostate specific antigen microarray 91 Lenshof, A. et al. Acoustic whole blood plasmapheresis chip for prostate specific antigen microarray diagnostics. Analytical chemistry 81, 6030-6037, (2009). diagnostics. Analytical chemistry 81, 6030-6037, (2009). 92 Cheng, S., Chang, S. Y., Gravitt, P. & Respess, R. Long PCR. Nature 369, 684-685, (1994). 92 Cheng, S., Chang, S. Y., Gravitt, P. & Respess, R. Long PCR. Nature 369, 684-685, (1994). 93 Kleppe, K., Ohtsuka, E., Kleppe, R., Molineux, I. & Khorana, H. G. Studies on polynucleotides. XCVI. 93 Kleppe, K., Ohtsuka, E., Kleppe, R., Molineux, I. & Khorana, H. G. Studies on polynucleotides. XCVI. Repair replications of short synthetic DNA's as catalyzed by DNA polymerases. Journal of molecular biology Repair replications of short synthetic DNA's as catalyzed by DNA polymerases. Journal of molecular biology 56, 341-361, (1971). 56, 341-361, (1971). 94 Saiki, R. K. et al. Enzymatic amplification of beta-globin genomic sequences and restriction site analysis 94 Saiki, R. K. et al. Enzymatic amplification of beta-globin genomic sequences and restriction site analysis for diagnosis of sickle cell anemia. Science 230, 1350-1354, (1985). for diagnosis of sickle cell anemia. Science 230, 1350-1354, (1985). 95 Mullis, K. et al. Specific enzymatic amplification of DNA in vitro: the polymerase chain reaction. Cold 95 Mullis, K. et al. Specific enzymatic amplification of DNA in vitro: the polymerase chain reaction. Cold Spring Harb Symp Quant Biol 51 Pt 1, 263-273, (1986). Spring Harb Symp Quant Biol 51 Pt 1, 263-273, (1986). 96 Saiki, R. K. et al. Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. 96 Saiki, R. K. et al. Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239, 487-491, (1988). Science 239, 487-491, (1988). 97 Li, M., Diehl, F., Dressman, D., Vogelstein, B. & Kinzler, K. W. BEAMing up for detection and 97 Li, M., Diehl, F., Dressman, D., Vogelstein, B. & Kinzler, K. W. BEAMing up for detection and quantification of rare sequence variants. Nature methods 3, 95-97, (2006). quantification of rare sequence variants. Nature methods 3, 95-97, (2006). 98 Zipper, H., Brunner, H., Bernhagen, J. & Vitzthum, F. Investigations on DNA intercalation and surface 98 Zipper, H., Brunner, H., Bernhagen, J. & Vitzthum, F. Investigations on DNA intercalation and surface binding by SYBR Green I, its structure determination and methodological implications. Nucleic Acids binding by SYBR Green I, its structure determination and methodological implications. Nucleic Acids Research 32, e103, (2004). Research 32, e103, (2004). 99 Holland, P. M., Abramson, R. D., Watson, R. & Gelfand, D. H. Detection of specific polymerase chain 99 Holland, P. M., Abramson, R. D., Watson, R. & Gelfand, D. H. Detection of specific polymerase chain reaction product by utilizing the 5'----3' exonuclease activity of Thermus aquaticus DNA polymerase. reaction product by utilizing the 5'----3' exonuclease activity of Thermus aquaticus DNA polymerase. Proceedings of the National Academy of Sciences of the United States of America 88, 7276-7280, (1991). Proceedings of the National Academy of Sciences of the United States of America 88, 7276-7280, (1991). 100 Daum, L. T. et al. Comparison of TaqMan and Epoch Dark Quenchers during real-time reverse 100 Daum, L. T. et al. Comparison of TaqMan and Epoch Dark Quenchers during real-time reverse transcription PCR. Molecular and cellular probes 18, 207-209, (2004). transcription PCR. Molecular and cellular probes 18, 207-209, (2004). 101 Tyagi, S. & Kramer, F. R. Molecular beacons: probes that fluoresce upon hybridization. Nature 101 Tyagi, S. & Kramer, F. R. Molecular beacons: probes that fluoresce upon hybridization. Nature Biotechnology 14, 303-308, (1996). Biotechnology 14, 303-308, (1996). 102 Taniguchi, K., Kajiyama, T. & Kambara, H. Quantitative analysis of gene expression in a single cell by 102 Taniguchi, K., Kajiyama, T. & Kambara, H. Quantitative analysis of gene expression in a single cell by qPCR. Nature methods 6, 503-506, (2009). qPCR. Nature methods 6, 503-506, (2009). 103 Meyer, M. et al. From micrograms to picograms: quantitative PCR reduces the material demands of high- 103 Meyer, M. et al. From micrograms to picograms: quantitative PCR reduces the material demands of high- throughput sequencing. Nucleic Acids Research 36, e5, (2008). throughput sequencing. Nucleic Acids Research 36, e5, (2008). 104 Zheng, Z. et al. Titration-free massively parallel pyrosequencing using trace amounts of starting material. 104 Zheng, Z. et al. Titration-free massively parallel pyrosequencing using trace amounts of starting material. Nucleic Acids Res 38, e137, (2010). Nucleic Acids Res 38, e137, (2010). 105 Heyries, K. et al. Megapixel digital PCR. Nature methods 8, 649-651, (2011). 105 Heyries, K. et al. Megapixel digital PCR. Nature methods 8, 649-651, (2011). 106 Sykes, P. J. et al. Quantitation of targets for PCR by use of limiting dilution. BioTechniques 13, 444-449, 106 Sykes, P. J. et al. Quantitation of targets for PCR by use of limiting dilution. BioTechniques 13, 444-449, (1992). (1992). 107 Warren, L., Bryder, D., Weissman, I. L. & Quake, S. R. Transcription factor profiling in individual 107 Warren, L., Bryder, D., Weissman, I. L. & Quake, S. R. Transcription factor profiling in individual hematopoietic progenitors by digital RT-PCR. Proceedings of the National Academy of Sciences of the United States hematopoietic progenitors by digital RT-PCR. Proceedings of the National Academy of Sciences of the United States of America 103, 17807-17812, (2006). of America 103, 17807-17812, (2006). 108 White, R. A., 3rd, Blainey, P. C., Fan, H. C. & Quake, S. R. Digital PCR provides sensitive and absolute 108 White, R. A., 3rd, Blainey, P. C., Fan, H. C. & Quake, S. R. Digital PCR provides sensitive and absolute calibration for high throughput sequencing. BMC Genomics 10, 116, (2009). calibration for high throughput sequencing. BMC Genomics 10, 116, (2009). 109 Li, H. H. et al. Amplification and analysis of DNA sequences in single human sperm and diploid cells. 109 Li, H. H. et al. Amplification and analysis of DNA sequences in single human sperm and diploid cells. Nature 335, 414-417, (1988). Nature 335, 414-417, (1988). 110 Maryanski, J. L., Jongeneel, C. V., Bucher, P., Casanova, J. L. & Walker, P. R. Single-cell PCR analysis 110 Maryanski, J. L., Jongeneel, C. V., Bucher, P., Casanova, J. L. & Walker, P. R. Single-cell PCR analysis of TCR repertoires selected by antigen in vivo: a high magnitude CD8 response is comprised of very few of TCR repertoires selected by antigen in vivo: a high magnitude CD8 response is comprised of very few clones. Immunity 4, 47-55, (1996). clones. Immunity 4, 47-55, (1996). 111 Zhang, C. & Xing, D. Miniaturized PCR chips for nucleic acid amplification and analysis: latest advances 111 Zhang, C. & Xing, D. Miniaturized PCR chips for nucleic acid amplification and analysis: latest advances and future trends. Nucleic Acids Research 35, 4223-4237, (2007). and future trends. Nucleic Acids Research 35, 4223-4237, (2007). 112 Hahn, S. et al. Current applications of single-cell PCR. Cellular and molecular life sciences : CMLS 57, 96-105, 112 Hahn, S. et al. Current applications of single-cell PCR. Cellular and molecular life sciences : CMLS 57, 96-105, (2000). (2000). 113 Kalisky, T. & Quake, S. R. Single-cell genomics. Nature Methods 8, 311-314, (2011). 113 Kalisky, T. & Quake, S. R. Single-cell genomics. Nature Methods 8, 311-314, (2011). 114 Barnes, W. M. PCR amplification of up to 35-kb DNA with high fidelity and high yield from Lambda 114 Barnes, W. M. PCR amplification of up to 35-kb DNA with high fidelity and high yield from Lambda bacteriophage templates. Proceedings of the National Academy of Sciences of the United States of America 91, 2216- bacteriophage templates. Proceedings of the National Academy of Sciences of the United States of America 91, 2216- 2220, (1994). 2220, (1994). 115 Craig, D. W. et al. Identification of genetic variants using bar-coded multiplexed sequencing. Nature methods 115 Craig, D. W. et al. Identification of genetic variants using bar-coded multiplexed sequencing. Nature methods 5, 887-893, (2008). 5, 887-893, (2008).

63 63

116 Harismendy, O. et al. Evaluation of next generation sequencing platforms for population targeted 116 Harismendy, O. et al. Evaluation of next generation sequencing platforms for population targeted sequencing studies. Genome biology 10, R32, (2009). sequencing studies. Genome biology 10, R32, (2009). 117 Harismendy, O. & Frazer, K. Method for improving sequence coverage uniformity of targeted genomic 117 Harismendy, O. & Frazer, K. Method for improving sequence coverage uniformity of targeted genomic intervals amplified by LR-PCR using Illumina GA sequencing-by-synthesis technology. BioTechniques 46, intervals amplified by LR-PCR using Illumina GA sequencing-by-synthesis technology. BioTechniques 46, 229-231, (2009). 229-231, (2009). 118 Lind, C. et al. Next-generation sequencing: the solution for high-resolution, unambiguous human 118 Lind, C. et al. Next-generation sequencing: the solution for high-resolution, unambiguous human leukocyte antigen typing. Human immunology 71, 1033-1042, (2010). leukocyte antigen typing. Human immunology 71, 1033-1042, (2010). 119 Melum, E. et al. SNP discovery performance of two second-generation sequencing platforms in the NOD2 119 Melum, E. et al. SNP discovery performance of two second-generation sequencing platforms in the NOD2 gene region. Human mutation 31, 875-885, (2010). gene region. Human mutation 31, 875-885, (2010). 120 Edwards, M. C. & Gibbs, R. A. Multiplex PCR: advantages, development, and applications. PCR methods 120 Edwards, M. C. & Gibbs, R. A. Multiplex PCR: advantages, development, and applications. PCR methods and applications 3, S65-75, (1994). and applications 3, S65-75, (1994). 121 Brownie, J. et al. The elimination of primer-dimer accumulation in PCR. Nucleic Acids Research 25, 3235- 121 Brownie, J. et al. The elimination of primer-dimer accumulation in PCR. Nucleic Acids Research 25, 3235- 3241, (1997). 3241, (1997). 122 Polz, M. F. & Cavanaugh, C. M. Bias in template-to-product ratios in multitemplate PCR. Applied and 122 Polz, M. F. & Cavanaugh, C. M. Bias in template-to-product ratios in multitemplate PCR. Applied and environmental microbiology 64, 3724-3730, (1998). environmental microbiology 64, 3724-3730, (1998). 123 Qiu, X. et al. Evaluation of PCR-generated chimeras, mutations, and heteroduplexes with 16S rRNA 123 Qiu, X. et al. Evaluation of PCR-generated chimeras, mutations, and heteroduplexes with 16S rRNA gene-based cloning. Applied and environmental microbiology 67, 880-887, (2001). gene-based cloning. Applied and environmental microbiology 67, 880-887, (2001). 124 Landegren, U. & Nilsson, M. Locked on target: strategies for future gene diagnostics. Ann Med 29, 585- 124 Landegren, U. & Nilsson, M. Locked on target: strategies for future gene diagnostics. Ann Med 29, 585- 590, (1997). 590, (1997). 125 Nilsson, M. et al. Making ends meet in genetic analysis using padlock probes. Human mutation 19, 410-415, 125 Nilsson, M. et al. Making ends meet in genetic analysis using padlock probes. Human mutation 19, 410-415, (2002). (2002). 126 Kamle, S., Kumar, A. & Bhatnagar, R. Development of multiplex and construct specific PCR assay for 126 Kamle, S., Kumar, A. & Bhatnagar, R. Development of multiplex and construct specific PCR assay for detection of cry2Ab transgene in genetically modified crops and product. GM Crops 2, 74-81, (2011). detection of cry2Ab transgene in genetically modified crops and product. GM Crops 2, 74-81, (2011). 127 Berg, C. et al. Direct Solid-Phase Sequence Analysis of the Human p53 Gene by Use of Multiplex 127 Berg, C. et al. Direct Solid-Phase Sequence Analysis of the Human p53 Gene by Use of Multiplex Polymerase Chain Reaction and alpha-Thiotriphosphate Nucleotides. Clinical chemistry 41, 1461-1466, Polymerase Chain Reaction and alpha-Thiotriphosphate Nucleotides. Clinical chemistry 41, 1461-1466, (1995). (1995). 128 Wang, H. Y. et al. A genotyping system capable of simultaneously analyzing >1000 single nucleotide 128 Wang, H. Y. et al. A genotyping system capable of simultaneously analyzing >1000 single nucleotide polymorphisms in a haploid genome. Genome research 15, 276-283, (2005). polymorphisms in a haploid genome. Genome research 15, 276-283, (2005). 129 Guichoux, E., Lagache, L., Wagner, S., Léger, P. & Petit, R. J. Two highly validated multiplexes (12-plex 129 Guichoux, E., Lagache, L., Wagner, S., Léger, P. & Petit, R. J. Two highly validated multiplexes (12-plex and 8-plex) for species delimitation and parentage analysis in oaks (Quercus spp.). Molecular ecology resources and 8-plex) for species delimitation and parentage analysis in oaks (Quercus spp.). Molecular ecology resources 11, 578-585, (2011). 11, 578-585, (2011). 130 Lin, Z., Cui, X. & Li, H. Multiplex genotype determination at a large number of gene loci. Proceedings of the 130 Lin, Z., Cui, X. & Li, H. Multiplex genotype determination at a large number of gene loci. Proceedings of the National Academy of Sciences of the United States of America 93, 2582-2587, (1996). National Academy of Sciences of the United States of America 93, 2582-2587, (1996). 131 Adessi, C. et al. Solid phase DNA amplification: characterisation of primer attachment and amplification 131 Adessi, C. et al. Solid phase DNA amplification: characterisation of primer attachment and amplification mechanisms. Nucl. Acids Res. 28, 1-8, (2000). mechanisms. Nucl. Acids Res. 28, 1-8, (2000). 132 Meuzelaar, L. S., Lancaster, O., Pasche, J. P., Kopal, G. & Brookes, A. J. MegaPlex PCR: a strategy for 132 Meuzelaar, L. S., Lancaster, O., Pasche, J. P., Kopal, G. & Brookes, A. J. MegaPlex PCR: a strategy for multiplex amplification. Nature methods 4, 835-837, (2007). multiplex amplification. Nature methods 4, 835-837, (2007). 133 Voelkerding, K. V., Dames, S. & Durtschi, J. D. Next generation sequencing for clinical diagnostics- 133 Voelkerding, K. V., Dames, S. & Durtschi, J. D. Next generation sequencing for clinical diagnostics- principles and application to targeted resequencing for hypertrophic cardiomyopathy: a paper from the principles and application to targeted resequencing for hypertrophic cardiomyopathy: a paper from the 2009 William Beaumont Hospital Symposium on Molecular Pathology. The Journal of molecular diagnostics : 2009 William Beaumont Hospital Symposium on Molecular Pathology. The Journal of molecular diagnostics : JMD 12, 539-551, (2010). JMD 12, 539-551, (2010). 134 Williams, R. et al. Amplification of complex gene libraries by emulsion PCR. Nature methods 3, 545-550, 134 Williams, R. et al. Amplification of complex gene libraries by emulsion PCR. Nature methods 3, 545-550, (2006). (2006). 135 Hedges, D., Guettouche, T. & Gilbert, J. Comparison of Three Targeted Enrichment Strategies on the 135 Hedges, D., Guettouche, T. & Gilbert, J. Comparison of Three Targeted Enrichment Strategies on the SOLiD Sequencing Platform. PLoS One 6, e18595, (2011). SOLiD Sequencing Platform. PLoS One 6, e18595, (2011). 136 Tewhey, R. et al. Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nature 136 Tewhey, R. et al. Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nature Biotechnology, 1-10, (2009). Biotechnology, 1-10, (2009). 137 Bell, C. et al. Carrier Testing for Severe Childhood Recessive Diseases by Next-Generation Sequencing. 137 Bell, C. et al. Carrier Testing for Severe Childhood Recessive Diseases by Next-Generation Sequencing. Science Translational Medicine 3, 65ra64-65ra64, (2011). Science Translational Medicine 3, 65ra64-65ra64, (2011). 138 Raindance Technologies, (Retrieved 2011-08-20). 138 Raindance Technologies, (Retrieved 2011-08-20). 139 Kim, H. et al. A rapid and simple isothermal nucleic acid amplification test for detection of herpes simplex 139 Kim, H. et al. A rapid and simple isothermal nucleic acid amplification test for detection of herpes simplex virus types 1 and 2. Journal of clinical virology: the official publication of the Pan American Society for Clinical Virology virus types 1 and 2. Journal of clinical virology: the official publication of the Pan American Society for Clinical Virology 50, 26-30, (2011). 50, 26-30, (2011). 140 Torres-Chavolla, E. & Alocilja, E. C. Nanoparticle based DNA biosensor for tuberculosis detection using 140 Torres-Chavolla, E. & Alocilja, E. C. Nanoparticle based DNA biosensor for tuberculosis detection using thermophilic helicase-dependent isothermal amplification. Biosensors & bioelectronics 26, 4614-4618, (2011). thermophilic helicase-dependent isothermal amplification. Biosensors & bioelectronics 26, 4614-4618, (2011). 141 Tong, Y., Lemieux, B. & Kong, H. Multiple strategies to improve sensitivity, speed and robustness of 141 Tong, Y., Lemieux, B. & Kong, H. Multiple strategies to improve sensitivity, speed and robustness of isothermal nucleic acid amplification for rapid pathogen detection. BMC Biotechnology 11, 50, (2011). isothermal nucleic acid amplification for rapid pathogen detection. BMC Biotechnology 11, 50, (2011). 142 Motré, A., Li, Y. & Kong, H. Enhancing helicase-dependent amplification by fusing the helicase with the 142 Motré, A., Li, Y. & Kong, H. Enhancing helicase-dependent amplification by fusing the helicase with the DNA polymerase. Gene 420, 17-22, (2008). DNA polymerase. Gene 420, 17-22, (2008).

64 64 Julia Sandberg Julia Sandberg

143 Hardenbol, P. et al. Multiplexed genotyping with sequence-tagged molecular inversion probes. Nature 143 Hardenbol, P. et al. Multiplexed genotyping with sequence-tagged molecular inversion probes. Nature Biotechnology 21, 673-678, (2003). Biotechnology 21, 673-678, (2003). 144 Hardenbol, P. et al. Highly multiplexed molecular inversion probe genotyping: over 10,000 targeted SNPs 144 Hardenbol, P. et al. Highly multiplexed molecular inversion probe genotyping: over 10,000 targeted SNPs genotyped in a single tube assay. Genome research 15, 269-275, (2005). genotyped in a single tube assay. Genome research 15, 269-275, (2005). 145 Wiedmann, M. et al. Ligase chain reaction (LCR) - overview and applications. PCR methods and applications 145 Wiedmann, M. et al. Ligase chain reaction (LCR) - overview and applications. PCR methods and applications 3, S51-64, (1994). 3, S51-64, (1994). 146 Harden, S. V. et al. Real-time gap ligase chain reaction: a rapid semiquantitative assay for detecting p53 146 Harden, S. V. et al. Real-time gap ligase chain reaction: a rapid semiquantitative assay for detecting p53 mutation at low levels in surgical margins and lymph nodes from resected lung and head and neck mutation at low levels in surgical margins and lymph nodes from resected lung and head and neck tumors. Clinical cancer research : an official journal of the American Association for Cancer Research 10, 2379-2385, tumors. Clinical cancer research : an official journal of the American Association for Cancer Research 10, 2379-2385, (2004). (2004). 147 Psifidi, A., Dovas, C. & Banos, G. Novel quantitative real-time LCR for the sensitive detection of SNP 147 Psifidi, A., Dovas, C. & Banos, G. Novel quantitative real-time LCR for the sensitive detection of SNP frequencies in pooled DNA: method development, evaluation and application. PLoS One 6, e14560, frequencies in pooled DNA: method development, evaluation and application. PLoS One 6, e14560, (2011). (2011). 148 Yi, P. et al. Development of a PCR/Ligase Detection Reaction/Nanogold-Based Universal Array 148 Yi, P. et al. Development of a PCR/Ligase Detection Reaction/Nanogold-Based Universal Array Approach for the Detection of Low-Abundant DNA Point Mutations. Cell Biochemistry and Biophysics Aug Approach for the Detection of Low-Abundant DNA Point Mutations. Cell Biochemistry and Biophysics Aug 17, (2011). 17, (2011). 149 Eij-Van Os, P. G. & Schouten, J. P. (2011). Methods in Molecular Biology: PCR Mutation Detection 149 Eij-Van Os, P. G. & Schouten, J. P. (2011). Methods in Molecular Biology: PCR Mutation Detection Protocols: Multiplex Ligation-dependent Probe Amplification (MLPA) for the detection of copy number Protocols: Multiplex Ligation-dependent Probe Amplification (MLPA) for the detection of copy number variation in genomic sequences, Vol. 668 97-126 New York: Springer Science. variation in genomic sequences, Vol. 668 97-126 New York: Springer Science. 150 Damato, B., Dopierala, J. A. & Coupland, S. E. Genotypic profiling of 452 choroidal melanomas with 150 Damato, B., Dopierala, J. A. & Coupland, S. E. Genotypic profiling of 452 choroidal melanomas with multiplex ligation-dependent probe amplification. Clinical cancer research : an official journal of the American multiplex ligation-dependent probe amplification. Clinical cancer research : an official journal of the American Association for Cancer Research 16, 6083-6092, (2010). Association for Cancer Research 16, 6083-6092, (2010). 151 Newton, C. et al. Analysis of any point mutation in DNA. The amplification refractory mutation system 151 Newton, C. et al. Analysis of any point mutation in DNA. The amplification refractory mutation system (ARMS). Nucleic Acids Research 17, 2503-2516, (1989). (ARMS). Nucleic Acids Research 17, 2503-2516, (1989). 152 Fan, J. B. et al. Highly parallel SNP genotyping. Cold Spring Harb Sym 68, 69-78, (2003). 152 Fan, J. B. et al. Highly parallel SNP genotyping. Cold Spring Harb Sym 68, 69-78, (2003). 153 Gunderson, K. L. et al. Decoding randomly ordered DNA arrays. Genome research 14, 870-877, (2004). 153 Gunderson, K. L. et al. Decoding randomly ordered DNA arrays. Genome research 14, 870-877, (2004). 154 Golden gate genotyping, 154 Golden gate genotyping, (Retrieved 2011-09-09). (Retrieved 2011-09-09). 155 Pettersson, E., Lindskog, M., Lundeberg, J. & Ahmadian, A. Tri-nucleotide threading for parallel 155 Pettersson, E., Lindskog, M., Lundeberg, J. & Ahmadian, A. Tri-nucleotide threading for parallel amplification of minute amounts of genomic DNA. Nucleic Acids Research 34, e49, (2006). amplification of minute amounts of genomic DNA. Nucleic Acids Research 34, e49, (2006). 156 Zajac, P., Pettersson, E., Gry, M., Lundeberg, J. & Ahmadian, A. Expression profiling of signature gene 156 Zajac, P., Pettersson, E., Gry, M., Lundeberg, J. & Ahmadian, A. Expression profiling of signature gene sets with trinucleotide threading. Genomics 91, 209-217, (2008). sets with trinucleotide threading. Genomics 91, 209-217, (2008). 157 Zajac, P., Oberg, C. & Ahmadian, A. Analysis of short tandem repeats by parallel DNA threading. PLoS 157 Zajac, P., Oberg, C. & Ahmadian, A. Analysis of short tandem repeats by parallel DNA threading. PLoS One 4, e7823, (2009). One 4, e7823, (2009). 158 Pettersson, E. et al. Allelotyping by massively parallel pyrosequencing of SNP-carrying trinucleotide 158 Pettersson, E. et al. Allelotyping by massively parallel pyrosequencing of SNP-carrying trinucleotide threads. Human mutation 29, 323-329, (2008). threads. Human mutation 29, 323-329, (2008). 159 Nilsson, M. et al. Padlock probes: circularizing oligonucleotides for localized DNA detection. Science 265, 159 Nilsson, M. et al. Padlock probes: circularizing oligonucleotides for localized DNA detection. Science 265, 2085-2088, (1994). 2085-2088, (1994). 160 Banér, J. et al. Parallel gene analysis with allele-specific padlock probes and tag microarrays. Nucleic Acids 160 Banér, J. et al. Parallel gene analysis with allele-specific padlock probes and tag microarrays. Nucleic Acids Research 31, e103, (2003). Research 31, e103, (2003). 161 Zhao, H. et al. Analysis of CpG island methylation using rolling Circle amplification (RCA) product 161 Zhao, H. et al. Analysis of CpG island methylation using rolling Circle amplification (RCA) product microarray. Journal of biomedical nanotechnology 7, 292-299, (2011). microarray. Journal of biomedical nanotechnology 7, 292-299, (2011). 162 Larsson, C., Grundberg, I., Söderberg, O. & Nilsson, M. In situ detection and genotyping of individual 162 Larsson, C., Grundberg, I., Söderberg, O. & Nilsson, M. In situ detection and genotyping of individual mRNA molecules. Nature methods 7, 395-397, (2010). mRNA molecules. Nature methods 7, 395-397, (2010). 163 de la Torre, T. Z. et al. Detection of rolling circle amplified DNA molecules using probe-tagged magnetic 163 de la Torre, T. Z. et al. Detection of rolling circle amplified DNA molecules using probe-tagged magnetic nanobeads in a portable AC susceptometer. Biosensors & bioelectronics 29, 195-199, (2011). nanobeads in a portable AC susceptometer. Biosensors & bioelectronics 29, 195-199, (2011). 164 Larsson, C. et al. In situ genotyping individual DNA molecules by target-primed rolling-circle 164 Larsson, C. et al. In situ genotyping individual DNA molecules by target-primed rolling-circle amplification of padlock probes. Nature methods 1, 227-232, (2004). amplification of padlock probes. Nature methods 1, 227-232, (2004). 165 Wamsley, H., Alleman, A., Johnson, C., Barbet, A. & Abbott, J. Investigation of endothelial cells as an in 165 Wamsley, H., Alleman, A., Johnson, C., Barbet, A. & Abbott, J. Investigation of endothelial cells as an in vivo nidus of Anaplasma marginale infection in cattle. Veterinary Microbiology 153, 264-273, (2011). vivo nidus of Anaplasma marginale infection in cattle. Veterinary Microbiology 153, 264-273, (2011). 166 Porreca, G. J. et al. Multiplex amplification of large sets of human exons. Nature methods 4, 931-936, (2007). 166 Porreca, G. J. et al. Multiplex amplification of large sets of human exons. Nature methods 4, 931-936, (2007). 167 Turner, E., Lee, C., Ng, S., Nickerson, D. & Shendure, J. Massively parallel exon capture and library-free 167 Turner, E., Lee, C., Ng, S., Nickerson, D. & Shendure, J. Massively parallel exon capture and library-free resequencing across 16 genomes. Nature methods 6, 315-316, (2009). resequencing across 16 genomes. Nature methods 6, 315-316, (2009). 168 Li, J. B. et al. Genome-wide identification of human RNA editing sites by parallel DNA capturing and 168 Li, J. B. et al. Genome-wide identification of human RNA editing sites by parallel DNA capturing and sequencing. Science 324, 1210-1213, (2009). sequencing. Science 324, 1210-1213, (2009). 169 Lin, S., Wang, W., Palm, C., Davis, R. W. & Juneau, K. A molecular inversion probe assay for detecting 169 Lin, S., Wang, W., Palm, C., Davis, R. W. & Juneau, K. A molecular inversion probe assay for detecting alternative splicing. BMC Genomics 11, 712, (2010). alternative splicing. BMC Genomics 11, 712, (2010). 170 Shen, P. et al. High-quality DNA sequence capture of 524 disease candidate genes. Proceedings of the National 170 Shen, P. et al. High-quality DNA sequence capture of 524 disease candidate genes. Proceedings of the National Academy of Sciences of the United States of America 108, 6549-6554, (2011). Academy of Sciences of the United States of America 108, 6549-6554, (2011).

65 65

171 Turner, E., Ng, S., Nickerson, D. & Shendure, J. Methods for genomic partitioning. Annual review of 171 Turner, E., Ng, S., Nickerson, D. & Shendure, J. Methods for genomic partitioning. Annual review of genomics and human genetics 10, 263-284, (2009). genomics and human genetics 10, 263-284, (2009). 172 Dahl, F., Gullberg, M., Stenberg, J., Landegren, U. & Nilsson, M. Multiplex amplification enabled by 172 Dahl, F., Gullberg, M., Stenberg, J., Landegren, U. & Nilsson, M. Multiplex amplification enabled by selective circularization of large sets of genomic DNA fragments. Nucleic Acids Research 33, e71, (2005). selective circularization of large sets of genomic DNA fragments. Nucleic Acids Research 33, e71, (2005). 173 Dahl, F. et al. Multigene amplification and massively parallel sequencing for cancer mutation discovery. 173 Dahl, F. et al. Multigene amplification and massively parallel sequencing for cancer mutation discovery. Proceedings of the National Academy of Sciences of the United States of America 104, 9387-9392, (2007). Proceedings of the National Academy of Sciences of the United States of America 104, 9387-9392, (2007). 174 Blanco, L. et al. Highly efficient DNA synthesis by the phage phi 29 DNA polymerase. Symmetrical mode 174 Blanco, L. et al. Highly efficient DNA synthesis by the phage phi 29 DNA polymerase. Symmetrical mode of DNA replication. Journal of Biological Chemistry 264, 8935-8940, (1989). of DNA replication. Journal of Biological Chemistry 264, 8935-8940, (1989). 175 Johansson, H. et al. Targeted resequencing of candidate genes using selector probes. Nucleic Acids Research 175 Johansson, H. et al. Targeted resequencing of candidate genes using selector probes. Nucleic Acids Research 39, e8, (2011). 39, e8, (2011). 176 Albert, T. J. et al. Direct selection of human genomic loci by microarray hybridization. Nature methods 4, 176 Albert, T. J. et al. Direct selection of human genomic loci by microarray hybridization. Nature methods 4, 903-905, (2007). 903-905, (2007). 177 Okou, D. T. et al. Microarray-based genomic selection for high-throughput resequencing. Nature methods 4, 177 Okou, D. T. et al. Microarray-based genomic selection for high-throughput resequencing. Nature methods 4, 907-909, (2007). 907-909, (2007). 178 Teer, J. et al. Systematic comparison of three genomic enrichment methods for massively parallel DNA 178 Teer, J. et al. Systematic comparison of three genomic enrichment methods for massively parallel DNA sequencing. Genome research 20, 1420-1431, (2010). sequencing. Genome research 20, 1420-1431, (2010). 179 Ståhl, P. L. et al. Translational database selection and multiplexed sequence capture for up front filtering 179 Ståhl, P. L. et al. Translational database selection and multiplexed sequence capture for up front filtering of reliable breast cancer biomarker candidates. PLoS One 6, e20794, (2011). of reliable breast cancer biomarker candidates. PLoS One 6, e20794, (2011). 180 Gnirke, A. et al. Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted 180 Gnirke, A. et al. Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nature Biotechnology 27, 182-189, (2009). sequencing. Nature Biotechnology 27, 182-189, (2009). 181 Bainbridge, M. N. et al. Whole exome capture in solution with 3 Gbp of data. Genome biology 11, R62, 181 Bainbridge, M. N. et al. Whole exome capture in solution with 3 Gbp of data. Genome biology 11, R62, (2010). (2010). 182 Grossmann, V. et al. Targeted next-generation sequencing detects point mutations, insertions, deletions 182 Grossmann, V. et al. Targeted next-generation sequencing detects point mutations, insertions, deletions and balanced chromosomal rearrangements as well as identifies novel leukemia-specific fusion genes in a and balanced chromosomal rearrangements as well as identifies novel leukemia-specific fusion genes in a single procedure. Leukemia: official journal of the Leukemia Society of America, Leukemia Research Fund, UK 25, 671- single procedure. Leukemia: official journal of the Leukemia Society of America, Leukemia Research Fund, UK 25, 671- 680, (2011). 680, (2011). 183 Duncavage, E. J. et al. Hybrid capture and next-generation sequencing identify viral integration sites from 183 Duncavage, E. J. et al. Hybrid capture and next-generation sequencing identify viral integration sites from formalin-fixed, paraffin-embedded tissue. The Journal of molecular diagnostics: JMD 13, 325-333, (2011). formalin-fixed, paraffin-embedded tissue. The Journal of molecular diagnostics: JMD 13, 325-333, (2011). 184 Mamanova, L. et al. Target-enrichment strategies for next-generation sequencing. Nature methods 7, 111- 184 Mamanova, L. et al. Target-enrichment strategies for next-generation sequencing. Nature methods 7, 111- 118, (2010). 118, (2010). 185 Telenius, H. et al. Degenerate oligonucleotide-primed PCR: general amplification of target DNA by a 185 Telenius, H. et al. Degenerate oligonucleotide-primed PCR: general amplification of target DNA by a single degenerate primer. Genomics 13, 718-725, (1992). single degenerate primer. Genomics 13, 718-725, (1992). 186 Zhang, L. et al. Whole genome amplification from a single cell: implications for genetic analysis. Proceedings 186 Zhang, L. et al. Whole genome amplification from a single cell: implications for genetic analysis. Proceedings of the National Academy of Sciences of the United States of America 89, 5847-5851, (1992). of the National Academy of Sciences of the United States of America 89, 5847-5851, (1992). 187 Dietmaier, W. et al. Multiple mutation analyses in single tumor cells with improved whole genome 187 Dietmaier, W. et al. Multiple mutation analyses in single tumor cells with improved whole genome amplification. Am J Pathol 154, 83-95, (1999). amplification. Am J Pathol 154, 83-95, (1999). 188 Grothues, D., Cantor, C. R. & Smith, C. L. PCR amplification of megabase DNA with tagged random 188 Grothues, D., Cantor, C. R. & Smith, C. L. PCR amplification of megabase DNA with tagged random primers (T-PCR). Nucleic Acids Research 21, 1321-1322, (1993). primers (T-PCR). Nucleic Acids Research 21, 1321-1322, (1993). 189 Updegrove, T. B., Correia, J. J., Galletto, R., Bujalowski, W. & Wartell, R. M. E. coli DNA associated 189 Updegrove, T. B., Correia, J. J., Galletto, R., Bujalowski, W. & Wartell, R. M. E. coli DNA associated with isolated Hfq interacts with Hfq's distal surface and C-terminal . Biochimica et biophysica acta with isolated Hfq interacts with Hfq's distal surface and C-terminal domain. Biochimica et biophysica acta 1799, 588-596, (2010). 1799, 588-596, (2010). 190 Pruett, A., Favello, T. & Mueller, E. Whole Genome Amplification: Moving Beyond the Limits of 190 Pruett, A., Favello, T. & Mueller, E. Whole Genome Amplification: Moving Beyond the Limits of Traditional PCR. Sigma-Aldrich Life Science Quarterly Fall, 2-6, (2004). Traditional PCR. Sigma-Aldrich Life Science Quarterly Fall, 2-6, (2004). 191 Klein, C. A. et al. Comparative genomic hybridization, loss of heterozygosity, and DNA sequence analysis 191 Klein, C. A. et al. Comparative genomic hybridization, loss of heterozygosity, and DNA sequence analysis of single cells. Proceedings of the National Academy of Sciences of the United States of America 96, 4494-4499, (1999). of single cells. Proceedings of the National Academy of Sciences of the United States of America 96, 4494-4499, (1999). 192 Thalhammer, S., Langer, S., Speicher, M., Heckl, W. M. & Geigl, J. B. Generation of chromosome 192 Thalhammer, S., Langer, S., Speicher, M., Heckl, W. M. & Geigl, J. B. Generation of chromosome painting probes from single chromosomes by laser microdissection and linker-adaptor PCR. Chromosome painting probes from single chromosomes by laser microdissection and linker-adaptor PCR. Chromosome research: an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 12, 337- research: an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 12, 337- 343, (2004). 343, (2004). 193 Geigl, J. B. & Speicher, M. Single-cell isolation from cell suspensions and whole genome amplification 193 Geigl, J. B. & Speicher, M. Single-cell isolation from cell suspensions and whole genome amplification from single cells to provide templates for CGH analysis. Nature protocols 2, 3173-3184, (2007). from single cells to provide templates for CGH analysis. Nature protocols 2, 3173-3184, (2007). 194 Fiegler, H. et al. High resolution array-CGH analysis of single cells. Nucleic Acids Research 35, e15, (2007). 194 Fiegler, H. et al. High resolution array-CGH analysis of single cells. Nucleic Acids Research 35, e15, (2007). 195 Treff, N. R., Su, J., Tao, X., Northrop, L. E. & Scott, R. T. Single-cell whole-genome amplification 195 Treff, N. R., Su, J., Tao, X., Northrop, L. E. & Scott, R. T. Single-cell whole-genome amplification technique impacts the accuracy of SNP microarray-based genotyping and copy number analyses. technique impacts the accuracy of SNP microarray-based genotyping and copy number analyses. Molecular Human Reproduction 17, 335-343, (2011). Molecular Human Reproduction 17, 335-343, (2011). 196 Viguera, E., Canceill, D. & Ehrlich, S. D. Replication slippage involves DNA polymerase pausing and 196 Viguera, E., Canceill, D. & Ehrlich, S. D. Replication slippage involves DNA polymerase pausing and dissociation. The EMBO journal 20, 2587-2595, (2001). dissociation. The EMBO journal 20, 2587-2595, (2001). 197 Dean, F. B. et al. Comprehensive human genome amplification using multiple displacement amplification. 197 Dean, F. B. et al. Comprehensive human genome amplification using multiple displacement amplification. Proceedings of the National Academy of Sciences of the United States of America 99, 5261-5266, (2002). Proceedings of the National Academy of Sciences of the United States of America 99, 5261-5266, (2002).

66 66 Julia Sandberg Julia Sandberg

198 Chong, S. S., Gore-Langton, R. E., Hughes, M. R. & Weremowicz, S. Single-cell DNA and FISH 198 Chong, S. S., Gore-Langton, R. E., Hughes, M. R. & Weremowicz, S. Single-cell DNA and FISH analysis for application to preimplantation genetic diagnosis. Curr Protoc Hum Genet Chapter 9, Unit9 10, analysis for application to preimplantation genetic diagnosis. Curr Protoc Hum Genet Chapter 9, Unit9 10, (2010). (2010). 199 Darouich, S., Popovici, C., Missirian, C. & Moncla, A. Use of DOP-PCR for amplification and labeling 199 Darouich, S., Popovici, C., Missirian, C. & Moncla, A. Use of DOP-PCR for amplification and labeling of BAC DNA for FISH. Biotech Histochem 2, (2011). of BAC DNA for FISH. Biotech Histochem 2, (2011). 200 Blainey, P. C. & Quake, S. R. Digital MDA for enumeration of total nucleic acid contamination. Nucleic 200 Blainey, P. C. & Quake, S. R. Digital MDA for enumeration of total nucleic acid contamination. Nucleic Acids Research 39, e19, (2011). Acids Research 39, e19, (2011). 201 Zhang, K. et al. Sequencing genomes from single cells by polymerase cloning. Nat Biotechnol 24, 680-686, 201 Zhang, K. et al. Sequencing genomes from single cells by polymerase cloning. Nat Biotechnol 24, 680-686, (2006). (2006). 202 Jiao, X. et al. Structural Alterations from Multiple Displacement Amplification of a Human Genome 202 Jiao, X. et al. Structural Alterations from Multiple Displacement Amplification of a Human Genome Revealed by Mate-Pair Sequencing. PLoS One 6, e22250, (2011). Revealed by Mate-Pair Sequencing. PLoS One 6, e22250, (2011). 203 Liu, C. L., Schreiber, S. L. & Bernstein, B. E. Development and validation of a T7 based linear 203 Liu, C. L., Schreiber, S. L. & Bernstein, B. E. Development and validation of a T7 based linear amplification for genomic DNA. BMC Genomics 4, 19, (2003). amplification for genomic DNA. BMC Genomics 4, 19, (2003). 204 Liu, C. L., Bernstein, B. E. & Schreiber, S. L. in Whole Genome Amplification: Methods Express. (ed S. Hughes 204 Liu, C. L., Bernstein, B. E. & Schreiber, S. L. in Whole Genome Amplification: Methods Express. (ed S. Hughes and R. Lasken) 1-22 (Scion Publishing Limited, 2005). and R. Lasken) 1-22 (Scion Publishing Limited, 2005). 205 Shankaranarayanan, P. et al. Single-tube linear DNA amplification (LinDA) for robust ChIP-seq. Nature 205 Shankaranarayanan, P. et al. Single-tube linear DNA amplification (LinDA) for robust ChIP-seq. Nature methods 8, 565-567, (2011). methods 8, 565-567, (2011). 206 Hoeijmakers, W., Bártfai, R., Francoijs, K. & Stunnenberg, H. Linear amplification for deep sequencing. 206 Hoeijmakers, W., Bártfai, R., Francoijs, K. & Stunnenberg, H. Linear amplification for deep sequencing. Nature protocols 6, 1026-1036, (2011). Nature protocols 6, 1026-1036, (2011). 207 Ovation WGA System - The Only Linear WGA Technology for Uniform Representation and 207 Ovation WGA System - The Only Linear WGA Technology for Uniform Representation and Quantitative Analysis of the Genome, (2009). Quantitative Analysis of the Genome, (2009). 208 Doug Amorese, P. C., Victor Sementchenko, Mark Stapleton. in AGBT. 208 Doug Amorese, P. C., Victor Sementchenko, Mark Stapleton. in AGBT. 209 Barker, D. L. et al. Two methods of whole-genome amplification enable accurate genotyping across a 209 Barker, D. L. et al. Two methods of whole-genome amplification enable accurate genotyping across a 2320-SNP linkage panel. Genome research 14, 901-907, (2004). 2320-SNP linkage panel. Genome research 14, 901-907, (2004). 210 Sanger, F. et al. Nucleotide sequence of bacteriophage phi X174 DNA. Nature 265, 687-695, (1977). 210 Sanger, F. et al. Nucleotide sequence of bacteriophage phi X174 DNA. Nature 265, 687-695, (1977). 211 Moore, G. Cramming more components onto integrated circuits. Electronics 38, 114-117, (1965). 211 Moore, G. Cramming more components onto integrated circuits. Electronics 38, 114-117, (1965). 212 Maxam, A. M. & Gilbert, W. A new method for sequencing DNA. Proceedings of the National Academy of 212 Maxam, A. M. & Gilbert, W. A new method for sequencing DNA. Proceedings of the National Academy of Sciences of the United States of America 74, 560-564, (1977). Sciences of the United States of America 74, 560-564, (1977). 213 Sanger, F., Nicklen, S. & Coulson, A. DNA sequencing with chain-terminating inhibitors. Proceedings of the 213 Sanger, F., Nicklen, S. & Coulson, A. DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences of the United States of America 74, 5463-5467, (1977). National Academy of Sciences of the United States of America 74, 5463-5467, (1977). 214 Smith, L. M. et al. Fluorescence detection in automated DNA sequence analysis. Nature 321, 674-679, 214 Smith, L. M. et al. Fluorescence detection in automated DNA sequence analysis. Nature 321, 674-679, (1986). (1986). 215 Karger, B. & Guttman, A. DNA sequencing by CE. Electrophoresis 30 Suppl 1, S196-202, (2009). 215 Karger, B. & Guttman, A. DNA sequencing by CE. Electrophoresis 30 Suppl 1, S196-202, (2009). 216 Fleischmann, R. D. et al. Whole-genome random sequencing and assembly of Haemophilus influenzae 216 Fleischmann, R. D. et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269, 496-512, (1995). Rd. Science 269, 496-512, (1995). 217 Edwards, A. & Caskey, C. T. Closure strategies for random DNA sequencing. METHODS: A Companion to 217 Edwards, A. & Caskey, C. T. Closure strategies for random DNA sequencing. METHODS: A Companion to Methods in Enzymology 3, 41-47, (1991). Methods in Enzymology 3, 41-47, (1991). 218 Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860-921, (2001). 218 Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860-921, (2001). 219 Venter, J. C. et al. The Sequence of the Human Genome. Science 291, 1304-1351, (2001). 219 Venter, J. C. et al. The Sequence of the Human Genome. Science 291, 1304-1351, (2001). 220 Reid, C. Company profile: Complete Genomics Inc. Future oncology (London, England) 7, 219-221, (2011). 220 Reid, C. Company profile: Complete Genomics Inc. Future oncology (London, England) 7, 219-221, (2011). 221 Drmanac, R. et al. DNA sequence determination by hybridization: a strategy for efficient large-scale 221 Drmanac, R. et al. DNA sequence determination by hybridization: a strategy for efficient large-scale sequencing. Science 260, 1649-1652, (1993). sequencing. Science 260, 1649-1652, (1993). 222 Ronaghi, M., Uhlen, M. & Nyren, P. A Sequencing Method Based on Real-Time Pyrophosphate. Science 222 Ronaghi, M., Uhlen, M. & Nyren, P. A Sequencing Method Based on Real-Time Pyrophosphate. Science 281, 363-365, (1998). 281, 363-365, (1998). 223 Ronaghi, M., Karamohamed, S., Pettersson, B., Uhlén, M. & Nyrén, P. Real-time DNA sequencing using 223 Ronaghi, M., Karamohamed, S., Pettersson, B., Uhlén, M. & Nyrén, P. Real-time DNA sequencing using detection of pyrophosphate release. Analytical biochemistry 242, 84-89, (1996). detection of pyrophosphate release. Analytical biochemistry 242, 84-89, (1996). 224 Murray, K. K. DNA Sequencing by Mass Spectrometry. Journal of Mass Spectrometry 31, 1203-1215, 224 Murray, K. K. DNA Sequencing by Mass Spectrometry. Journal of Mass Spectrometry 31, 1203-1215, (1996). (1996). 225 Mitra, R. D., Shendure, J., Olejnik, J., Edyta Krzymanska, O. & Church, G. M. Fluorescent in situ 225 Mitra, R. D., Shendure, J., Olejnik, J., Edyta Krzymanska, O. & Church, G. M. Fluorescent in situ sequencing on polymerase colonies. Analytical biochemistry 320, 55-65, (2003). sequencing on polymerase colonies. Analytical biochemistry 320, 55-65, (2003). 226 Brenner, S. et al. Gene expression analysis by massively parallel signature sequencing (MPSS) on 226 Brenner, S. et al. Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nature Biotechnology 18, 630-634, (2000). microbead arrays. Nature Biotechnology 18, 630-634, (2000). 227 Shendure, J. et al. Accurate multiplex polony sequencing of an evolved bacterial genome. Science 309, 227 Shendure, J. et al. Accurate multiplex polony sequencing of an evolved bacterial genome. Science 309, 1728-1732, (2005). 1728-1732, (2005). 228 Rothberg, J. M. et al. An integrated semiconductor device enabling non-optical genome sequencing. Nature 228 Rothberg, J. M. et al. An integrated semiconductor device enabling non-optical genome sequencing. Nature 475, 348-352, (2011). 475, 348-352, (2011). 229 Bentley, D. R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. 229 Bentley, D. R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53-59, (2008). Nature 456, 53-59, (2008). 230 Harris, T. D. et al. Single-molecule DNA sequencing of a viral genome. Science 320, 106-109, (2008). 230 Harris, T. D. et al. Single-molecule DNA sequencing of a viral genome. Science 320, 106-109, (2008).

67 67

231 Bowers, J. et al. Virtual terminator nucleotides for next-generation DNA sequencing. Nature methods 6, 593- 231 Bowers, J. et al. Virtual terminator nucleotides for next-generation DNA sequencing. Nature methods 6, 593- 595, (2009). 595, (2009). 232 Metzker, M. L. Sequencing technologies - the next generation. Nature reviews Genetics 11, 31-46, (2010). 232 Metzker, M. L. Sequencing technologies - the next generation. Nature reviews Genetics 11, 31-46, (2010). 233 Hardin, S., Gao, X., Briggs, J., Willson, R. & Tu, S.-C. . Methods for real-time single molecule sequence 233 Hardin, S., Gao, X., Briggs, J., Willson, R. & Tu, S.-C. . Methods for real-time single molecule sequence determination. US Patent patent 7,329,492 (2000). determination. US Patent patent 7,329,492 (2000). 234 Eid, J. et al. Real-time DNA sequencing from single polymerase molecules. Science 323, 133-138, (2009). 234 Eid, J. et al. Real-time DNA sequencing from single polymerase molecules. Science 323, 133-138, (2009). 235 In Sequence: 'Life Tech Details Real-Time Single-Molecule Tech at AGBT; Combines Qdots with FRET-Based 235 In Sequence: 'Life Tech Details Real-Time Single-Molecule Tech at AGBT; Combines Qdots with FRET-Based Detection', (March 2, 2010). Detection', (March 2, 2010). 236 Drmanac, R. et al. Human Genome Sequencing Using Unchained Base Reads on Self-Assembling DNA 236 Drmanac, R. et al. Human Genome Sequencing Using Unchained Base Reads on Self-Assembling DNA Nanoarrays. Science 327, 78-81, (2009). Nanoarrays. Science 327, 78-81, (2009). 237 Valouev, A. et al. A high-resolution, nucleosome position map of C. elegans reveals a lack of universal 237 Valouev, A. et al. A high-resolution, nucleosome position map of C. elegans reveals a lack of universal sequence-dictated positioning. Genome research 18, 1051-1063, (2008). sequence-dictated positioning. Genome research 18, 1051-1063, (2008). 238 Kim, J. B. et al. Polony multiplex analysis of gene expression (PMAGE) in mouse hypertrophic 238 Kim, J. B. et al. Polony multiplex analysis of gene expression (PMAGE) in mouse hypertrophic cardiomyopathy. Science 316, 1481-1484, (2007). cardiomyopathy. Science 316, 1481-1484, (2007). 239 Erlich, Y., Mitra, P., Delabastide, M., McCombie, W. R. & Hannon, G. Alta-Cyclic: a self-optimizing 239 Erlich, Y., Mitra, P., Delabastide, M., McCombie, W. R. & Hannon, G. Alta-Cyclic: a self-optimizing base caller for next-generation sequencing. Nature methods 5, 679-682, (2008). base caller for next-generation sequencing. Nature methods 5, 679-682, (2008). 240 Nakamura, K. et al. Sequence-specific error profile of Illumina sequencers. Nucleic Acids Research, (2011). 240 Nakamura, K. et al. Sequence-specific error profile of Illumina sequencers. Nucleic Acids Research, (2011). 241 Metzker, M. L. Sequencing in real time. Nature Biotechnology 27, 150-151, (2009). 241 Metzker, M. L. Sequencing in real time. Nature Biotechnology 27, 150-151, (2009). 242 Korlach, J. et al. Selective aluminum passivation for targeted immobilization of single DNA polymerase 242 Korlach, J. et al. Selective aluminum passivation for targeted immobilization of single DNA polymerase molecules in zero-mode waveguide nanostructures. Proceedings of the National Academy of Sciences of the United molecules in zero-mode waveguide nanostructures. Proceedings of the National Academy of Sciences of the United States of America 105, 1176-1181, (2008). States of America 105, 1176-1181, (2008). 243 Huse, S., Huber, J., Morrison, H. G., Sogin, M. & Welch, D. M. Accuracy and quality of massively 243 Huse, S., Huber, J., Morrison, H. G., Sogin, M. & Welch, D. M. Accuracy and quality of massively parallel DNA pyrosequencing. Genome biology 8, R143, (2007). parallel DNA pyrosequencing. Genome biology 8, R143, (2007). 244 Wheeler, D. A. et al. The complete genome of an individual by massively parallel DNA sequencing. Nature 244 Wheeler, D. A. et al. The complete genome of an individual by massively parallel DNA sequencing. Nature 452, 872-876, (2008). 452, 872-876, (2008). 245 454 GS FLX+ System flyer, ( 245 454 GS FLX+ System flyer, ( 246 Discover the New 5500 Series SOLiD™ Sequencers, (2011). 246 Discover the New 5500 Series SOLiD™ Sequencers, (2011). 247 Illumina Sequencing Portfolio, (2011). 247 Illumina Sequencing Portfolio, (2011). 248 Drmanac, R. The advent of personal genome sequencing. Genetics in medicine : official journal of the American 248 Drmanac, R. The advent of personal genome sequencing. Genetics in medicine : official journal of the American College of Medical Genetics 13, 188-190, (2011). College of Medical Genetics 13, 188-190, (2011). 249 Complete Genomics, (Retrieved 2011-08-27). 249 Complete Genomics, (Retrieved 2011-08-27). 250 Pourmand, N. et al. Direct electrical detection of DNA synthesis. Proceedings of the National Academy of Sciences 250 Pourmand, N. et al. Direct electrical detection of DNA synthesis. Proceedings of the National Academy of Sciences of the United States of America 103, 6466-6470, (2006). of the United States of America 103, 6466-6470, (2006). 251 Anderson, E. P. et al. A System for Multiplexed Direct Electrical Detection of DNA Synthesis. Sensors and 251 Anderson, E. P. et al. A System for Multiplexed Direct Electrical Detection of DNA Synthesis. Sensors and actuators B, Chemical 129, 79-86, (2008). actuators B, Chemical 129, 79-86, (2008). 252 Ion Torrent: Spring 2011 Performance Application Note, (2011). 252 Ion Torrent: Spring 2011 Performance Application Note, (2011). 253 Dong, H. et al. Artificial duplicate reads in sequencing data of 454 Genome Sequencer FLX System. Acta 253 Dong, H. et al. Artificial duplicate reads in sequencing data of 454 Genome Sequencer FLX System. Acta biochimica et biophysica Sinica, (2011). biochimica et biophysica Sinica, (2011). 254 Schwientek, P., Szczepanowski, R., Rückert, C., Stoye, J. & Pühler, A. Sequencing of high G+C 254 Schwientek, P., Szczepanowski, R., Rückert, C., Stoye, J. & Pühler, A. Sequencing of high G+C microbial genomes using the ultrafast pyrosequencing technology. Journal of biotechnology, (2011). microbial genomes using the ultrafast pyrosequencing technology. Journal of biotechnology, (2011). 255 Odeberg, J. et al. Context-dependent Taq-polymerase-mediated nucleotide alterations, as revealed by 255 Odeberg, J. et al. Context-dependent Taq-polymerase-mediated nucleotide alterations, as revealed by direct sequencing of the ZNF189 gene: implications for mutation detection. Gene 235, 103-109, (1999). direct sequencing of the ZNF189 gene: implications for mutation detection. Gene 235, 103-109, (1999). 256 Ozsolak, F. et al. Direct RNA sequencing. Nature 461, 814-818, (2009). 256 Ozsolak, F. et al. Direct RNA sequencing. Nature 461, 814-818, (2009). 257 Ozsolak, F. et al. Comprehensive polyadenylation site maps in yeast and human reveal pervasive 257 Ozsolak, F. et al. Comprehensive polyadenylation site maps in yeast and human reveal pervasive alternative polyadenylation. Cell 143, 1018-1029, (2010). alternative polyadenylation. Cell 143, 1018-1029, (2010). 258 Orlando, L. et al. True single-molecule DNA sequencing of a Pleistocene horse bone. Genome research, 258 Orlando, L. et al. True single-molecule DNA sequencing of a Pleistocene horse bone. Genome research, (2011). (2011). 259 Pushkarev, D., Neff, N. & Quake, S. R. Single-molecule sequencing of an individual human genome. 259 Pushkarev, D., Neff, N. & Quake, S. R. Single-molecule sequencing of an individual human genome. Nature Biotechnology 27, 847-850, (2009). Nature Biotechnology 27, 847-850, (2009). 260 Schadt, E. E., Turner, S. & Kasarskis, A. A window into third-generation sequencing. Human molecular 260 Schadt, E. E., Turner, S. & Kasarskis, A. A window into third-generation sequencing. Human molecular genetics 19, R227-240, (2010). genetics 19, R227-240, (2010). 261 Helicos Biosciences Corporation, (Retrieved 2011-08-24). 261 Helicos Biosciences Corporation, (Retrieved 2011-08-24). 262 In Sequence: 'Sequencing Services Help Helicos Double Q1 Revenues, Though Two Institutes Return Machines', 262 In Sequence: 'Sequencing Services Help Helicos Double Q1 Revenues, Though Two Institutes Return Machines', (May 17, 2011). (May 17, 2011). 263 Rasko, D. A. et al. Origins of the E. coli Strain Causing an Outbreak of Hemolytic-Uremic Syndrome in 263 Rasko, D. A. et al. Origins of the E. coli Strain Causing an Outbreak of Hemolytic-Uremic Syndrome in Germany. The New England journal of medicine, (2011). Germany. The New England journal of medicine, (2011). 264 Flusberg, B. A. et al. Direct detection of DNA methylation during single-molecule, real-time sequencing. 264 Flusberg, B. A. et al. Direct detection of DNA methylation during single-molecule, real-time sequencing. Nature methods 7, 461-465, (2010). Nature methods 7, 461-465, (2010). 265 Chin, C. S. et al. The origin of the Haitian cholera outbreak strain. The New England journal of medicine 364, 265 Chin, C. S. et al. The origin of the Haitian cholera outbreak strain. The New England journal of medicine 364, 33-42, (2011). 33-42, (2011). 68 68 Julia Sandberg Julia Sandberg

266 Uemura, S. et al. Real-time tRNA transit on single translating ribosomes at codon resolution. Nature 464, 266 Uemura, S. et al. Real-time tRNA transit on single translating ribosomes at codon resolution. Nature 464, 1012-1017, (2010). 1012-1017, (2010). 267 Timp, W. et al. Nanopore Sequencing: Electrical Measurements of the Code of Life. IEEE Transactions on 267 Timp, W. et al. Nanopore Sequencing: Electrical Measurements of the Code of Life. IEEE Transactions on Nanotechnology 9, 281-294, (2010). Nanotechnology 9, 281-294, (2010). 268 Noblegen, (Retrieved 2011-09-03). 268 Noblegen, (Retrieved 2011-09-03). 269 Astier, Y., Braha, O. & Bayley, H. Toward single molecule DNA sequencing: direct identification of 269 Astier, Y., Braha, O. & Bayley, H. Toward single molecule DNA sequencing: direct identification of ribonucleoside and deoxyribonucleoside 5'-monophosphates by using an engineered protein nanopore ribonucleoside and deoxyribonucleoside 5'-monophosphates by using an engineered protein nanopore equipped with a molecular adapter. Journal of the American Chemical Society 128, 1705-1710, (2006). equipped with a molecular adapter. Journal of the American Chemical Society 128, 1705-1710, (2006). 270 Clarke, J. et al. Continuous base identification for single-molecule nanopore DNA sequencing. Nature 270 Clarke, J. et al. Continuous base identification for single-molecule nanopore DNA sequencing. Nature nanotechnology 4, 265-270, (2009). nanotechnology 4, 265-270, (2009). 271 Derrington, I. M. et al. Nanopore DNA sequencing with MspA. Proceedings of the National Academy of Sciences 271 Derrington, I. M. et al. Nanopore DNA sequencing with MspA. Proceedings of the National Academy of Sciences of the United States of America 107, 16060-16065, (2010). of the United States of America 107, 16060-16065, (2010). 272 Markowitz, V. M. et al. The integrated microbial genomes system: an expanding comparative analysis 272 Markowitz, V. M. et al. The integrated microbial genomes system: an expanding comparative analysis resource. Nucleic Acids Research 38, D382-390, (2010). resource. Nucleic Acids Research 38, D382-390, (2010). 273 The Integrated Microbial Genomes (IMG), (Retrieved 2011-09- 273 The Integrated Microbial Genomes (IMG), (Retrieved 2011-09- 08). 08). 274 Renfree, M. et al. Genome sequence of an Australian kangaroo, Macropus eugenii, provides insight into 274 Renfree, M. et al. Genome sequence of an Australian kangaroo, Macropus eugenii, provides insight into the evolution of mammalian reproduction and development. Genome biology 12, R81, (2011). the evolution of mammalian reproduction and development. Genome biology 12, R81, (2011). 275 Locke, D. P. et al. Comparative and demographic analysis of orang-utan genomes. Nature 469, 529-533, 275 Locke, D. P. et al. Comparative and demographic analysis of orang-utan genomes. Nature 469, 529-533, (2011). (2011). 276 Li, R. et al. The sequence and de novo assembly of the giant panda genome. Nature 463, 311-317, (2010). 276 Li, R. et al. The sequence and de novo assembly of the giant panda genome. Nature 463, 311-317, (2010). 277 Argout, X. et al. The genome of Theobroma cacao. Nature Genetics 43, 101-108, (2011). 277 Argout, X. et al. The genome of Theobroma cacao. Nature Genetics 43, 101-108, (2011). 278 Shulaev, V. et al. The genome of woodland strawberry (Fragaria vesca). Nature Genetics 43, 109-116, (2011). 278 Shulaev, V. et al. The genome of woodland strawberry (Fragaria vesca). Nature Genetics 43, 109-116, (2011). 279 Schmutz, J. et al. Genome sequence of the palaeopolyploid soybean. Nature 463, 178-183, (2010). 279 Schmutz, J. et al. Genome sequence of the palaeopolyploid soybean. Nature 463, 178-183, (2010). 280 Potato Genome Sequencing, C. et al. Genome sequence and analysis of the tuber crop potato. Nature 475, 280 Potato Genome Sequencing, C. et al. Genome sequence and analysis of the tuber crop potato. Nature 475, 189-195, (2011). 189-195, (2011). 281 Banks, J. A. et al. The Selaginella Genome Identifies Genetic Changes Associated with the Evolution of 281 Banks, J. A. et al. The Selaginella Genome Identifies Genetic Changes Associated with the Evolution of Vascular . Science 332, 960-963, (2011). Vascular Plants. Science 332, 960-963, (2011). 282 Li, R. et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome 282 Li, R. et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome research 20, 265-272, (2010). research 20, 265-272, (2010). 283 Chapman, J. A. et al. Meraculous: de novo genome assembly with short paired-end reads. PLoS One 6, 283 Chapman, J. A. et al. Meraculous: de novo genome assembly with short paired-end reads. PLoS One 6, e23501, (2011). e23501, (2011). 284 Bao, S. et al. Evaluation of next-generation sequencing software in mapping and assembly. Journal of human 284 Bao, S. et al. Evaluation of next-generation sequencing software in mapping and assembly. Journal of human genetics 56, 406-414, (2011). genetics 56, 406-414, (2011). 285 Wetzel, J., Kingsford, C. & Pop, M. Assessing the benefits of using mate-pairs to resolve repeats in de 285 Wetzel, J., Kingsford, C. & Pop, M. Assessing the benefits of using mate-pairs to resolve repeats in de novo short-read prokaryotic assemblies. BMC Bioinformatics 12, 95, (2011). novo short-read prokaryotic assemblies. BMC Bioinformatics 12, 95, (2011). 286 Korbel, J. O. et al. Paired-end mapping reveals extensive structural variation in the human genome. Science 286 Korbel, J. O. et al. Paired-end mapping reveals extensive structural variation in the human genome. Science 318, 420-426, (2007). 318, 420-426, (2007). 287 BGI Sequences Genome of the Deadly E. Coli in Germany and Reveals New Super-Toxic Strain., 287 BGI Sequences Genome of the Deadly E. Coli in Germany and Reveals New Super-Toxic Strain., (Retrieved 2011-09-03). (Retrieved 2011-09-03). 288 In Sequence: 'Something new', (June 2, 2011). 288 In Sequence: 'Something new', (June 2, 2011). 289 Tobes, R., Manrique, M., Pareja-Tobes, P., Pareja-Tobes, E. & Pareja, E. Escherichia coli EHEC 289 Tobes, R., Manrique, M., Pareja-Tobes, P., Pareja-Tobes, E. & Pareja, E. Escherichia coli EHEC Germany outbreak preliminary functional annotation using BG7 system. Nature Precedings, 1-56, (2011). Germany outbreak preliminary functional annotation using BG7 system. Nature Precedings, 1-56, (2011). 290 Lee, W. et al. The mutation spectrum revealed by paired genome sequences from a lung cancer patient. 290 Lee, W. et al. The mutation spectrum revealed by paired genome sequences from a lung cancer patient. Nature 465, 473-477, (2010). Nature 465, 473-477, (2010). 291 Lupski, J. R. et al. Whole-Genome Sequencing in a Patient with Charcot-Marie-Tooth Neuropathy. New 291 Lupski, J. R. et al. Whole-Genome Sequencing in a Patient with Charcot-Marie-Tooth Neuropathy. New England Journal of Medicine 362, 1181-1191, (2010). England Journal of Medicine 362, 1181-1191, (2010). 292 Roach, J. C. et al. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science 292 Roach, J. C. et al. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science 328, 636-639, (2010). 328, 636-639, (2010). 293 Schuster, S. et al. Complete Khoisan and Bantu genomes from southern Africa. Nature 463, 943-947, 293 Schuster, S. et al. Complete Khoisan and Bantu genomes from southern Africa. Nature 463, 943-947, (2010). (2010). 294 Ahn, S. M. et al. The first Korean genome sequence and analysis: full genome sequencing for a socio- 294 Ahn, S. M. et al. The first Korean genome sequence and analysis: full genome sequencing for a socio- ethnic group. Genome research 19, 1622-1629, (2009). ethnic group. Genome research 19, 1622-1629, (2009). 295 Wang, J. et al. The diploid genome sequence of an Asian individual. Nature 456, 60-65, (2008). 295 Wang, J. et al. The diploid genome sequence of an Asian individual. Nature 456, 60-65, (2008). 296 The 1000 Genomes Project Consortium, (Retrieved 2011-09-04). 296 The 1000 Genomes Project Consortium, (Retrieved 2011-09-04). 297 The 1000 Genomes Project, C. A map of human genome variation from population-scale sequencing. 297 The 1000 Genomes Project, C. A map of human genome variation from population-scale sequencing. Nature 467, 1061-1073, (2010). Nature 467, 1061-1073, (2010). 298 The International Cancer Genome Project (Retrieved 2011-09-02). 298 The International Cancer Genome Project (Retrieved 2011-09-02). 299 Pleasance, E. D. et al. A comprehensive catalogue of somatic mutations from a human cancer genome. 299 Pleasance, E. D. et al. A comprehensive catalogue of somatic mutations from a human cancer genome. Nature 463, 191-196, (2010). Nature 463, 191-196, (2010). 69 69

300 Mardis, E. R. et al. Recurring mutations found by sequencing an acute myeloid leukemia genome. The 300 Mardis, E. R. et al. Recurring mutations found by sequencing an acute myeloid leukemia genome. The New England journal of medicine 361, 1058-1066, (2009). New England journal of medicine 361, 1058-1066, (2009). 301 Ding, L. et al. Genome remodelling in a basal-like breast cancer metastasis and xenograft. Nature 464, 999- 301 Ding, L. et al. Genome remodelling in a basal-like breast cancer metastasis and xenograft. Nature 464, 999- 1005, (2010). 1005, (2010). 302 Pleasance, E. D. et al. A small-cell lung cancer genome with complex signatures of tobacco exposure. 302 Pleasance, E. D. et al. A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature 463, 184-190, (2010). Nature 463, 184-190, (2010). 303 Ciccarelli, F. D. The (r)evolution of cancer genetics. BMC biology 8, 74, (2010). 303 Ciccarelli, F. D. The (r)evolution of cancer genetics. BMC biology 8, 74, (2010). 304 Roach, J. et al. Analysis of Genetic Inheritance in a Family Quartet by Whole-Genome Sequencing. Science 304 Roach, J. et al. Analysis of Genetic Inheritance in a Family Quartet by Whole-Genome Sequencing. Science 328, 636-639, (2010). 328, 636-639, (2010). 305 Fan, H. C., Blumenfeld, Y. J., Chitkara, U., Hudgins, L. & Quake, S. R. Noninvasive diagnosis of fetal 305 Fan, H. C., Blumenfeld, Y. J., Chitkara, U., Hudgins, L. & Quake, S. R. Noninvasive diagnosis of fetal aneuploidy by shotgun sequencing DNA from maternal blood. Proceedings of the National Academy of Sciences of aneuploidy by shotgun sequencing DNA from maternal blood. Proceedings of the National Academy of Sciences of the United States of America 105, 16266-16271, (2008). the United States of America 105, 16266-16271, (2008). 306 Ehrich, M. et al. Noninvasive detection of fetal trisomy 21 by sequencing of DNA in maternal blood: a 306 Ehrich, M. et al. Noninvasive detection of fetal trisomy 21 by sequencing of DNA in maternal blood: a study in a clinical setting. American journal of obstetrics and gynecology 204, 205.e201-211, (2011). study in a clinical setting. American journal of obstetrics and gynecology 204, 205.e201-211, (2011). 307 Hahn, S., Lapaire, O., Tercanli, S., Kolla, V. & Hösli, I. Determination of fetal chromosome aberrations 307 Hahn, S., Lapaire, O., Tercanli, S., Kolla, V. & Hösli, I. Determination of fetal chromosome aberrations from fetal DNA in maternal blood: has the challenge finally been met? Expert Reviews in Molecular Medicine from fetal DNA in maternal blood: has the challenge finally been met? Expert Reviews in Molecular Medicine 13, e16, (2011). 13, e16, (2011). 308 Snyder, T. M., Khush, K. K., Valantine, H. A. & Quake, S. R. Universal noninvasive detection of solid 308 Snyder, T. M., Khush, K. K., Valantine, H. A. & Quake, S. R. Universal noninvasive detection of solid organ transplant rejection. Proceedings of the National Academy of Sciences of the United States of America 108, organ transplant rejection. Proceedings of the National Academy of Sciences of the United States of America 108, 6229-6234, (2011). 6229-6234, (2011). 309 Bejerano, G., Haussler, D. & Blanchette, M. Into the heart of darkness: large-scale clustering of human 309 Bejerano, G., Haussler, D. & Blanchette, M. Into the heart of darkness: large-scale clustering of human non-coding DNA. Bioinformatics (Oxford, England) 20 Suppl, i40-i48, (2004). non-coding DNA. Bioinformatics (Oxford, England) 20 Suppl, i40-i48, (2004). 310 Thomas, R. et al. Sensitive mutation detection in heterogeneous cancer specimens by massively parallel 310 Thomas, R. et al. Sensitive mutation detection in heterogeneous cancer specimens by massively parallel picoliter reactor sequencing. Nature Medicine 12, 852-855, (2006). picoliter reactor sequencing. Nature Medicine 12, 852-855, (2006). 311 Weinstein, J. A., Jiang, N., White, R. A., Fisher, D. S. & Quake, S. R. High-throughput sequencing of the 311 Weinstein, J. A., Jiang, N., White, R. A., Fisher, D. S. & Quake, S. R. High-throughput sequencing of the zebrafish antibody repertoire. Science 324, 807-810, (2009). zebrafish antibody repertoire. Science 324, 807-810, (2009). 312 Campbell, P. J. et al. Subclonal phylogenetic structures in cancer revealed by ultra-deep sequencing. 312 Campbell, P. J. et al. Subclonal phylogenetic structures in cancer revealed by ultra-deep sequencing. Proceedings of the National Academy of Sciences of the United States of America 105, 13081-13086, (2008). Proceedings of the National Academy of Sciences of the United States of America 105, 13081-13086, (2008). 313 Turnbaugh, P. J. et al. A core gut in obese and lean twins. Nature 457, 480-484, (2009). 313 Turnbaugh, P. J. et al. A core gut microbiome in obese and lean twins. Nature 457, 480-484, (2009). 314 Binladen, J. et al. The use of coded PCR primers enables high-throughput sequencing of multiple 314 Binladen, J. et al. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS One 2, e197, (2007). homolog amplification products by 454 parallel sequencing. PLoS One 2, e197, (2007). 315 Galan, M., Guivier, E., Caraux, G., Charbonnel, N. & Cosson, J. F. A 454 multiplex sequencing method 315 Galan, M., Guivier, E., Caraux, G., Charbonnel, N. & Cosson, J. F. A 454 multiplex sequencing method for rapid and reliable genotyping of highly polymorphic genes in large-scale studies. BMC Genomics 11, for rapid and reliable genotyping of highly polymorphic genes in large-scale studies. BMC Genomics 11, 296, (2010). 296, (2010). 316 Kozarewa, I. & Turner, D. J. Amplification-free library preparation for paired-end Illumina sequencing. 316 Kozarewa, I. & Turner, D. J. Amplification-free library preparation for paired-end Illumina sequencing. Methods in molecular biology (Clifton, NJ) 733, 257-266, (2011). Methods in molecular biology (Clifton, NJ) 733, 257-266, (2011). 317 Cronn, R. et al. Multiplex sequencing of plant genomes using Solexa sequencing-by-synthesis 317 Cronn, R. et al. Multiplex sequencing of plant chloroplast genomes using Solexa sequencing-by-synthesis technology. Nucleic Acids Research 36, e122, (2008). technology. Nucleic Acids Research 36, e122, (2008). 318 Neiman, M., Lundin, S., Savolainen, P. & Ahmadian, A. Decoding a substantial set of samples in parallel 318 Neiman, M., Lundin, S., Savolainen, P. & Ahmadian, A. Decoding a substantial set of samples in parallel by massive sequencing. PLoS One 6, e17785, (2011). by massive sequencing. PLoS One 6, e17785, (2011). 319 Singleton, A. B., Hardy, J., Traynor, B. J. & Houlden, H. Towards a complete resolution of the genetic 319 Singleton, A. B., Hardy, J., Traynor, B. J. & Houlden, H. Towards a complete resolution of the genetic architecture of disease. Trends in Genetics 26, 438-442, (2010). architecture of disease. Trends in Genetics 26, 438-442, (2010). 320 Choi, M. et al. Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. 320 Choi, M. et al. Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proceedings of the National Academy of Sciences of the United States of America 106, 19096-19101, (2009). Proceedings of the National Academy of Sciences of the United States of America 106, 19096-19101, (2009). 321 Ng, S. B. et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461, 272- 321 Ng, S. B. et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461, 272- 276, (2009). 276, (2009). 322 Ng, S. B. et al. Exome sequencing identifies the cause of a mendelian disorder. Nature Genetics 42, 30-35, 322 Ng, S. B. et al. Exome sequencing identifies the cause of a mendelian disorder. Nature Genetics 42, 30-35, (2010). (2010). 323 Shearer, A. E. et al. Comprehensive genetic testing for hereditary hearing loss using massively parallel 323 Shearer, A. E. et al. Comprehensive genetic testing for hereditary hearing loss using massively parallel sequencing. Proceedings of the National Academy of Sciences of the United States of America 107, 21104-21109, sequencing. Proceedings of the National Academy of Sciences of the United States of America 107, 21104-21109, (2010). (2010). 324 Geiger, T., Cox, J., Ostasiewicz, P., Wisniewski, J. & Mann, M. Super-SILAC mix for quantitative 324 Geiger, T., Cox, J., Ostasiewicz, P., Wisniewski, J. & Mann, M. Super-SILAC mix for quantitative proteomics of human tumor tissue. Nature methods 7, 383-385, (2010). proteomics of human tumor tissue. Nature methods 7, 383-385, (2010). 325 Vogel, C. et al. Sequence signatures and mRNA concentration can explain two-thirds of protein 325 Vogel, C. et al. Sequence signatures and mRNA concentration can explain two-thirds of protein abundance variation in a human cell line. Molecular Systems Biology 6, 400, (2010). abundance variation in a human cell line. Molecular Systems Biology 6, 400, (2010). 326 Lockhart, D. J. et al. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nature 326 Lockhart, D. J. et al. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nature Biotechnology 14, 1675-1680, (1996). Biotechnology 14, 1675-1680, (1996). 327 Cloonan, N. et al. Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nature methods 5, 327 Cloonan, N. et al. Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nature methods 5, 613-619, (2008). 613-619, (2008).

70 70 Julia Sandberg Julia Sandberg

328 Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying 328 Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature methods 5, 621-628, (2008). mammalian transcriptomes by RNA-Seq. Nature methods 5, 621-628, (2008). 329 Wang, Z., Gerstein, M. & Snyder, M. RNA-Seq: a revolutionary tool for transcriptomics. Nature reviews 329 Wang, Z., Gerstein, M. & Snyder, M. RNA-Seq: a revolutionary tool for transcriptomics. Nature reviews Genetics 10, 57-63, (2009). Genetics 10, 57-63, (2009). 330 Kapranov, P., Willingham, A. & Gingeras, T. Genome-wide transcription and the implications for 330 Kapranov, P., Willingham, A. & Gingeras, T. Genome-wide transcription and the implications for genomic organization. Nature reviews Genetics 8, 413-423, (2007). genomic organization. Nature reviews Genetics 8, 413-423, (2007). 331 He, Y., Vogelstein, B., Velculescu, V. E., Papadopoulos, N. & Kinzler, K. W. The antisense 331 He, Y., Vogelstein, B., Velculescu, V. E., Papadopoulos, N. & Kinzler, K. W. The antisense transcriptomes of human cells. Science 322, 1855-1857, (2008). transcriptomes of human cells. Science 322, 1855-1857, (2008). 332 Lipson, D. et al. Quantification of the yeast transcriptome by single-molecule sequencing. Nature 332 Lipson, D. et al. Quantification of the yeast transcriptome by single-molecule sequencing. Nature Biotechnology 27, 652-658, (2009). Biotechnology 27, 652-658, (2009). 333 Parkhomchuk, D. et al. Transcriptome analysis by strand-specific sequencing of complementary DNA. 333 Parkhomchuk, D. et al. Transcriptome analysis by strand-specific sequencing of complementary DNA. Nucleic Acids Research 37, e123, (2009). Nucleic Acids Research 37, e123, (2009). 334 Mamanova, L. et al. FRT-seq: amplification-free, strand-specific transcriptome sequencing. Nature methods 334 Mamanova, L. et al. FRT-seq: amplification-free, strand-specific transcriptome sequencing. Nature methods 7, 130-132, (2010). 7, 130-132, (2010). 335 Ozsolak, F. et al. Digital transcriptome profiling from attomole-level RNA samples. Genome research 20, 335 Ozsolak, F. et al. Digital transcriptome profiling from attomole-level RNA samples. Genome research 20, 519-525, (2010). 519-525, (2010). 336 Ozsolak, F. et al. Amplification-free digital gene expression profiling from minute cell quantities. Nature 336 Ozsolak, F. et al. Amplification-free digital gene expression profiling from minute cell quantities. Nature methods 7, 619-621, (2010). methods 7, 619-621, (2010). 337 Islam, S. et al. Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. 337 Islam, S. et al. Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome research 21, 1160-1167, (2011). Genome research 21, 1160-1167, (2011). 338 Tariq, M. A., Kim, H. J., Jejelowo, O. & Pourmand, N. Whole-transcriptome RNAseq analysis from 338 Tariq, M. A., Kim, H. J., Jejelowo, O. & Pourmand, N. Whole-transcriptome RNAseq analysis from minute amount of total RNA. Nucleic Acids Research gkr547, (2011). minute amount of total RNA. Nucleic Acids Research gkr547, (2011). 339 Tang, F. et al. Tracing the derivation of embryonic stem cells from the inner cell mass by single-cell RNA- 339 Tang, F. et al. Tracing the derivation of embryonic stem cells from the inner cell mass by single-cell RNA- Seq analysis. Cell stem cell 6, 468-478, (2010). Seq analysis. Cell stem cell 6, 468-478, (2010). 340 Berger, M. F. et al. Integrative analysis of the melanoma transcriptome. Genome research 20, 413-427, 340 Berger, M. F. et al. Integrative analysis of the melanoma transcriptome. Genome research 20, 413-427, (2010). (2010). 341 Murchison, E. P. et al. The Tasmanian devil transcriptome reveals Schwann cell origins of a clonally 341 Murchison, E. P. et al. The Tasmanian devil transcriptome reveals Schwann cell origins of a clonally transmissible cancer. Science 327, 84-87, (2010). transmissible cancer. Science 327, 84-87, (2010). 342 Shackleton, M. et al. Generation of a functional mammary gland from a single stem cell. Nature 439, 84- 342 Shackleton, M. et al. Generation of a functional mammary gland from a single stem cell. Nature 439, 84- 88, (2006). 88, (2006). 343 Leong, K. G., Wang, B. E., Johnson, L. & Gao, W. Q. Generation of a prostate from a single adult stem 343 Leong, K. G., Wang, B. E., Johnson, L. & Gao, W. Q. Generation of a prostate from a single adult stem cell. Nature 456, 804-808, (2008). cell. Nature 456, 804-808, (2008). 344 Shi, Y. et al. Induction of pluripotent stem cells from mouse embryonic fibroblasts by Oct4 and Klf4 with 344 Shi, Y. et al. Induction of pluripotent stem cells from mouse embryonic fibroblasts by Oct4 and Klf4 with small-molecule compounds. Cell stem cell 3, 568-574, (2008). small-molecule compounds. Cell stem cell 3, 568-574, (2008). 345 Applied Biosystems SOLiD4 System Templated Bead Preparation Guide, (March 2010). 345 Applied Biosystems SOLiD4 System Templated Bead Preparation Guide, (March 2010). 346 Morgan, E. et al. Cytometric bead array: a multiplexed assay platform with applications in various areas of 346 Morgan, E. et al. Cytometric bead array: a multiplexed assay platform with applications in various areas of biology. Clinical immunology 110, 252-266, (2004). biology. Clinical immunology 110, 252-266, (2004). 347 Horejsh, D. et al. A molecular beacon, bead-based assay for the detection of nucleic acids by flow 347 Horejsh, D. et al. A molecular beacon, bead-based assay for the detection of nucleic acids by flow cytometry. Nucleic Acids Research 33, e13, (2005). cytometry. Nucleic Acids Research 33, e13, (2005). 348 Borgström, E., Lundin, S. & Lundeberg, J. Large scale library generation for high throughput sequencing. 348 Borgström, E., Lundin, S. & Lundeberg, J. Large scale library generation for high throughput sequencing. PLoS One 6, e19119, (2011). PLoS One 6, e19119, (2011). 349 Steuernagel, B. et al. De novo 454 sequencing of barcoded BAC pools for comprehensive gene survey and 349 Steuernagel, B. et al. De novo 454 sequencing of barcoded BAC pools for comprehensive gene survey and genome analysis in the complex genome of barley. BMC Genomics 10, 547, (2009). genome analysis in the complex genome of barley. BMC Genomics 10, 547, (2009). 350 Lundin, S., Stranneheim, H., Pettersson, E., Klevebring, D. & Lundeberg, J. Increased throughput by 350 Lundin, S., Stranneheim, H., Pettersson, E., Klevebring, D. & Lundeberg, J. Increased throughput by parallelization of library preparation for massive sequencing. PLoS One 5, e10029, (2010). parallelization of library preparation for massive sequencing. PLoS One 5, e10029, (2010). 351 Suárez-Díaz, E. & Anaya-Muñoz, V. H. History, objectivity, and the construction of molecular 351 Suárez-Díaz, E. & Anaya-Muñoz, V. H. History, objectivity, and the construction of molecular phylogenies. Studies in History and Philosophy of Biol & Biomed Sci 39, 451-468, (2008). phylogenies. Studies in History and Philosophy of Biol & Biomed Sci 39, 451-468, (2008). 352 Boyer, J. C. et al. Sequence dependent instability of mononucleotide microsatellites in cultured mismatch 352 Boyer, J. C. et al. Sequence dependent instability of mononucleotide microsatellites in cultured mismatch repair proficient and deficient mammalian cells. Human molecular genetics 11, 707-713, (2002). repair proficient and deficient mammalian cells. Human molecular genetics 11, 707-713, (2002). 353 Sagher, D., Hsu, A. & Strauss, B. Stabilization of the intermediate in frameshift mutation. Mutation research 353 Sagher, D., Hsu, A. & Strauss, B. Stabilization of the intermediate in frameshift mutation. Mutation research 423, 73-77, (1999). 423, 73-77, (1999). 354 Kozarewa, I. et al. Amplification-free Illumina sequencing-library preparation facilitates improved 354 Kozarewa, I. et al. Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G+C)-biased genomes. Nature methods 6, 291-295, (2009). mapping and assembly of (G+C)-biased genomes. Nature methods 6, 291-295, (2009). 355 Aird, D. et al. Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Genome 355 Aird, D. et al. Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Genome biology 12, R18, (2011). biology 12, R18, (2011). 356 Gunnarsdóttir, E. D., Li, M., Bauchet, M., Finstermeier, K. & Stoneking, M. High-throughput 356 Gunnarsdóttir, E. D., Li, M., Bauchet, M., Finstermeier, K. & Stoneking, M. High-throughput sequencing of complete human mtDNA genomes from the Philippines. Genome research 21, 1-11, (2011). sequencing of complete human mtDNA genomes from the Philippines. Genome research 21, 1-11, (2011).

71 71