
KTH Royal Institute of Technology

Licentiate thesis in Biotechnology

Towards spatial host-microbiome profiling

BRITTA LÖTSTEDT

Stockholm, Sweden 2021

Towards spatial host-microbiome profiling

BRITTA LÖTSTEDT

Academic Dissertation which, with due permission of the KTH Royal Institute of Technology, is submitted for public defence for the Degree of Licentiate of Engineering on Tuesday, February 23, 2021, at 1:00 p.m.

Licentiate thesis in Biotechnology
KTH Royal Institute of Technology
Stockholm, Sweden 2021

© Britta Lötstedt

ISBN 978-91-7873-767-3 TRITA-CBH-FOU-2021:2

Printed by: Universitetsservice US-AB, Sweden 2021

© Britta Lötstedt (2021): Towards spatial host-microbiome profiling, KTH Royal Institute of Technology, Stockholm, Sweden. ISBN 978-91-7873-767-3 | TRITA-CBH-FOU-2021:2

Abstract

Sequencing technologies and applications have pushed the limits and enabled novel studies of biological mechanisms, evolutionary relationships and communication networks between cells. The technical developments leading to single-cell RNA-sequencing have enabled detection of rare cell populations, while spatial resolution has added insights into larger biological environments, such as tissues and organs. Massively parallel sequencing has paved the way for integrated high-throughput analyses, including studies of gene expression, protein expression and the mapping of microbial communities. This thesis starts with an introduction describing the technical and biological advancements made in recent years, with a focus on spatially resolved approaches. Then, a summary of recent accomplishments is presented, which enabled ongoing work in the novel field of spatial host-microbiome profiling. Lastly, the concluding remarks include both a future perspective and a short reflection on current developments in the spatial multi-omics field. 16S sequencing is often used for taxonomic classification of bacteria. In Paper I, this sequencing technique was used to study the aerodigestive microbiome in pediatric lung transplant recipients. Regrettably, many of these patients reject the organ after transplantation, and the underlying cause is in many cases unknown. In this paper, multiple factors that may influence rejection were examined, including the aerodigestive microbiome. Pediatric lung transplant recipients often suffer from gastrointestinal dysmotility, so the study also focused on analyzing changes in the microbiome in relation to irregular gastric muscle movements. The results showed that lung transplant recipients had, in general, lower microbial diversity in the gastric fluid and throat, and that the microbial overlap between lung and gastric sampling sites was significantly smaller in transplant recipients than in controls.
In addition, gastrointestinal dysmotility was shown to influence the gastric microbiome in lung transplant recipients, but, given the small sample size available in this study, the correlation with patient outcome could not be examined. Integrated analysis of the transcriptome and the antibody-based proteome in the same tissue section was enabled by the method developed in Paper II. Spatial Multi-Omics (SM-Omics) uses a barcoded glass array to capture mRNA and antibody-based expression of selected proteins in the same section. The antibody-based profiling of the tissue section was enabled by either immunofluorescence or DNA-barcoded antibodies that were then decoded by sequencing. The protocol was scaled up using an automated liquid-handling system. Using this method, the transcriptome and multiple proteins were profiled simultaneously in both the mouse brain cortex and the mouse spleen. The results showed a high correlation in spatial patterns between gene expression and antibody measurements, independent of the antibody labeling technique. SM-Omics generates a high-plex, multi-omics characterization of the tissue in a high-throughput manner while exhibiting low technical variation.

Keywords: 16S sequencing, RNA sequencing, spatially resolved transcriptomics, antibody-based measurements

Summary (Sammanfattning)

Techniques and applications that use sequencing have pushed the boundaries and allowed new investigations of biological mechanisms, evolutionary relationships and communication networks between cells. The technical developments that have led to RNA sequencing of individual cells have enabled the discovery of rare cell populations, while spatial resolution has brought an increased understanding of larger biological environments, such as tissues and organs. Massively parallel sequencing has paved the way for integrated high-throughput analyses, including analysis of gene expression, protein expression and the mapping of bacterial communities. This thesis begins with an introduction describing technical and biological advances made in recent years, with a focus on spatial resolution. This is followed by a summary of recent accomplishments that have enabled ongoing work in a new field concerned with the spatial profiling of bacteria and their host. Finally, the concluding remarks contain both a future perspective and a short reflection on current developments in the fields of spatial multi-omics. 16S sequencing is often used to classify bacteria taxonomically. In Paper I, this sequencing technique was used to study the microbiome of the airways and digestive tract in children with transplanted lungs. Unfortunately, rejection of the lung after transplantation is common in many of these patients, but the underlying cause of the rejection is, in many cases, unknown. In this study, several factors that may affect rejection were examined, including the microbiome of the airways and digestive tract. Children with transplanted lungs often suffer from disturbances in gastrointestinal motility, and the paper therefore also focused on analyzing changes in the microbiome in relation to these abnormal gastrointestinal movements.
The results showed that patients with transplanted lungs generally had lower bacterial diversity in the gastric fluid and throat, and that the bacterial overlap between lung and gastric fluid was significantly smaller in patients with transplanted lungs than in the controls. Furthermore, disturbances in gastrointestinal motility were shown to affect the gastric fluid microbiome of patients with transplanted lungs, but because of the study's sample size, it could not be examined how this correlated with patient outcome. Integrated analysis of the transcriptome and antibody-based analysis of the proteome in the same tissue section has been enabled by the method developed in Paper II. Spatial Multi-Omics (SM-Omics) uses a decodable pattern of short DNA segments on a glass surface to capture mRNA and antibody-based expression of selected proteins from the same tissue section. The antibody-based profiling of the tissue section was achieved through either immunofluorescence or antibodies labeled with DNA segments that could be decoded by sequencing. The protocol was scaled up using an automated liquid-handling system. Using this method, simultaneous profiling of the transcriptome and multiple proteins was achieved in both the brain cortex and the spleen of the mouse. The results showed a high correlation in the spatial pattern between gene expression and the antibody-based measurements, independent of how the antibodies had been labeled. SM-Omics generates a large-scale, multi-omics characterization of the tissue with high throughput while exhibiting low technical variation.

Keywords: 16S sequencing, RNA sequencing, spatial resolution of the transcriptome, antibody-based measurements


List of publications

The thesis is based on the following two papers, which are referred to as Papers I-II.

I. Lötstedt B, Boyer D, Visner G, Freiberger D, Lurie M, Kane M, DiFilippo C, Lundeberg J, Narvaez-Rivas M, Setchell K, Alm E, Rosen R (2020) The impact of gastrointestinal dysmotility on the aerodigestive microbiome of pediatric lung transplant recipients. The Journal of Heart and Lung Transplantation 2020, ISSN 1053-2498. doi: 10.1016/j.healun.2020.11.013.

II. Vickovic S*, Lötstedt B*, Klughammer J, Segerstolpe Å, Rozenblatt-Rosen O, Regev A (2020) SM-Omics: An automated platform for high-throughput spatial multi-omics. Submitted manuscript

* These authors contributed equally to this work.

All papers have been reprinted with permission from the respective publishers.

Contents

Introduction
Life
Biopolymers of life
The central dogma of molecular biology
Analysis of nucleic acids
Enzymes and their applications
Microarrays
Sequencing
Early sequencing approaches
Massively parallel sequencing
RNA-sequencing in bulk and in individual cells
Unique Molecular Identifiers
16S analysis: from culturing to massively parallel sequencing
Early microbial community identification approaches
First 16S DNA sequences
Massively parallel 16S sequencing
Analysis of 16S sequencing data
Metagenomic sequencing
Long-read sequencing
Methods to resolve the spatial transcriptome
Imaging-based spatial techniques
Sequencing-based spatial methods
Cryosectioning-based spatial methods combined with RNA-seq
In situ sequencing-based methods for tissue profiling
In situ capture-based methods for tissue profiling
Analysis of spatial transcriptomics data
Antibody-based analyses
Affinity
Immunohistochemistry
Multiplex protein measurements in tissues
Combined spatial RNA and antibody-based measurements
Present investigations
Paper I - Analysis of the aerodigestive microbiome in pediatric lung transplant recipients
Paper II - Method to spatially resolve the transcriptome and antibody-based proteome in a high throughput fashion
Concluding remarks
Acknowledgments
References


Introduction

Life

There is no easy definition of life (1). The so-called ‘chemical Darwinian’ definition, ‘Life is a self-sustained chemical system capable of undergoing Darwinian evolution’, does not hold for life forms that cannot undergo Darwinian evolution, such as infertile organisms like most mules.

Life as we know it on Earth can be divided into four biological groups: archaea, viruses, bacteria and eukaryotes. Within the scope of this thesis, bacteria and eukaryotes are the most relevant; they have a long history of coexistence, interaction and symbiosis that has most likely affected the evolution of the eukaryotic host (2). In diverse animals and plants, similar epithelial surfaces have been found to be populated by bacterial communities that are involved in nutrient exchange, protection from pathogens and organ development (3). It is widely accepted that bacteria inhabit most body surfaces in mammals, and efforts to profile the bacterial communities on these surfaces have been ongoing since the 1670s, when Antonie van Leeuwenhoek made his first observations of bacteria using his microscope (4).

On a cellular level, bacteria and eukaryotes differ in several aspects: bacteria are single-celled organisms that lack a nucleus and membrane-bound organelles, typically store their genetic information in a circular rather than a linear form, and are usually smaller in size. Despite these differences, both taxonomic domains share plasma membranes that compartmentalize biological processes, ribosomes that synthesize proteins, cytoplasm that provides an environment for biological processes, and biopolymers that store and maintain genetic information and carry out the actual processes.

Biopolymers of life

Biopolymers are polymers that are produced naturally by the cells of living organisms. Covalent bonds link specific monomers into larger units, which, depending on the monomer, can be divided into three main categories: polysaccharides, polynucleotides and polypeptides. The focus of this thesis is on the latter two.


DNA was discovered by Friedrich Miescher in 1869, when he isolated a new molecule, made up of carbon, hydrogen, oxygen, nitrogen and phosphorus, from the cell nuclei of white blood cells. Miescher named the molecule ‘nuclein’ (5); it later became known as deoxyribonucleic acid (DNA). As with many great discoveries, Miescher had to fight for his science, and his findings were not published until two years later, in 1871 (6).

Around 80 years later, in 1952, Rosalind Franklin revealed the helical shape of the DNA molecule using X-ray crystallography (7), a finding that was then used by James Watson and Francis Crick to establish the three-dimensional structure of DNA as a double helix (8). By then, DNA had already been shown to be the carrier of hereditary information, first by Oswald Avery, Colin MacLeod and Maclyn McCarty in 1944 (9) and later by Hershey and Chase in 1952 (10).

The chemical structure of DNA is made up of four types of nucleotides, distinguished by their bases (adenine, cytosine, guanine and thymine; A, C, G and T). Within a single strand of the helix, the nucleotides are covalently linked through phosphodiester bonds between their phosphate and deoxyribose groups, forming the sugar-phosphate backbone. The two strands of DNA then form the double helix through hydrogen bonds between specific base pairs (A with T and G with C). A sequence of DNA nucleotides that encodes a functional product is called a gene (10, 11).
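The base-pairing rule is what allows one strand to serve as a template for the other. As a minimal sketch (the thesis itself contains no code), the partner strand can be derived from the A-T and G-C rules alone. The reversal in `reverse_complement` reflects the antiparallel orientation of the two strands, a detail assumed here rather than stated in the text, and all function names are hypothetical.

```python
# Watson-Crick base pairing: A pairs with T, G pairs with C.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(DNA_PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand written 5'->3' (the strands are antiparallel)."""
    return complement(strand)[::-1]


def pairs_perfectly(top: str, bottom: str) -> bool:
    """True if two strands, both written 5'->3', form a fully base-paired duplex."""
    return len(top) == len(bottom) and reverse_complement(top) == bottom.upper()


print(reverse_complement("ATGCGT"))  # ACGCAT
```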

Ribonucleic acid (RNA) is a single-stranded molecule, also made up of four types of nucleotides attached to a ribose-phosphate backbone, but with uracil (U) in place of thymine; like thymine, uracil pairs with adenine. RNA comes in a variety of molecular forms and functions. Two that are within the scope of this thesis are ribosomal RNA (rRNA) and messenger RNA (mRNA). rRNA is the primary component of the ribosomes and makes up the major part of the total RNA content of a cell. Based on sedimentation velocity, rRNA molecules are divided into different rRNA species. In 1977, Carl Woese and George Fox were the first to use one of these species, the 16S rRNA molecule, in phylogenetic analysis to characterize relationships between prokaryotes (12). The 16S rRNA gene contains both conserved and hypervariable DNA regions; the variable regions enable identification of prokaryotes and systematic study of the evolutionary relationships between prokaryotic organisms. mRNA, on the other hand, is the RNA molecule that carries the genetic information stored in the DNA to the ribosomes, where it is translated into proteins. Each gene can be transcribed into multiple mRNA copies and isoforms. The number of copies of these mRNA molecules depends on the cell's lineage and state, and can therefore help identify cell types, functions and the processes the cell is undergoing at the time of measurement.

Proteins, first described by Gerardus Johannes Mulder in 1838 (13), are macromolecules made up of one or several long chains of amino acids. Owing to differences in their primary structure, proteins fold into complex 3D structures that determine their function. Proteins are key components in numerous activities within an organism, from catalysts in chemical reactions to structural entities and units involved in cell signaling. Antibodies are significant factors in cell signaling and immunity. Michael Heidelberger and Oswald Avery showed in the 1920s that antibodies were indeed proteins (14, 15). Antibodies are produced by the immune system to neutralize foreign objects by targeting a specific part of that object, i.e. the antigen. The structure of an antibody enables it to recognize unique structures of an antigen, allowing the antibody and antigen to form precise interactions in space. Upon recognition of the target, large amounts of antibodies are produced to bind and neutralize it. Identical antibodies with the exact same antigen-binding site are called monoclonal antibodies, while antibodies with slightly different antigen-binding sites that nevertheless target the same object are called polyclonal antibodies.

The central dogma of molecular biology

The flow of genetic information between the biopolymers of life is described by the central dogma of molecular biology (16, 17). Information is generally transferred from DNA to DNA (DNA replication), from DNA to RNA (transcription) and from RNA to protein (translation), but special cases also involve RNA to RNA (RNA replication) and RNA to DNA (reverse transcription). DNA is the molecule that stores and replicates the genetic information. RNA, however, not only acts as the intermediate molecule between DNA and protein but also has regulatory functions within cells. Depending on regulatory mechanisms, different regions of the genetic material are transcribed, which in multicellular organisms can explain some of the variation seen between cell types. Quantitative measurements of the RNA molecules within a cell reflect the cell's gene expression, but because transcription is episodic (18), such a measurement is only valid for a short period of time.
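The DNA-to-RNA-to-protein flow can be illustrated with a toy example. This sketch assumes the input is the coding (sense) strand, so transcription amounts to replacing T with U, and it uses only a small, hypothetical subset of the standard genetic code; codons missing from the subset are reported as `X`.

```python
# Hypothetical, deliberately partial codon table (entries taken from the standard genetic code).
CODON_TABLE = {
    "AUG": "M",              # methionine, also the usual start codon
    "UUU": "F", "UUC": "F",  # phenylalanine
    "GGC": "G",              # glycine
    "GCU": "A",              # alanine
    "UGG": "W",              # tryptophan
    "AAA": "K",              # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(coding_strand: str) -> str:
    """DNA -> mRNA: the mRNA matches the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein, reading codons from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")  # X: codon not in this toy table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGTTTGGCTGGAAATAA")
print(mrna, translate(mrna))  # AUGUUUGGCUGGAAAUAA MFGWK
```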


Analysis of nucleic acids

Important discoveries and technological developments have paved the way to where DNA and RNA science are today. The following section describes a selection of significant techniques and applications used to study nucleic acids.

Enzymes and their applications

Enzymes are a fundamental part of the toolbox used to study DNA and RNA. These proteins occur naturally in living cells and act as catalysts in chemical reactions. They are often discovered in their natural context before being isolated, in some cases modified, and produced in larger batches to facilitate use in vitro.

Reverse transcriptases were first discovered in 1970. The original idea of the central dogma from 1958, that genetic information flowed in one direction, was challenged when two independent research teams showed that RNA viruses were able to make DNA from an RNA template using reverse transcriptases (19, 20). In molecular biology, these enzymes have since been used to copy RNA molecules into complementary DNA (cDNA) (21, 22), a more stable molecule than RNA itself. RNA is less stable because of the hydroxyl group on the 2' carbon of the ribose, and it is also at constant risk of degradation by ribonucleases. Thus, the discovery and development of reverse transcriptases have not only enabled stable storage of RNA information but also made that information easily accessible. This discovery is today the basis for numerous biomolecular tools and, most recently, has enabled study of the cell's whole RNA collection, the transcriptome.

Polymerases exist in great variety, each with specific features. They synthesize nucleic acid polymers using a complementary strand as template. DNA polymerases synthesize DNA from a DNA template. The first DNA polymerase discovered was DNA polymerase I in Escherichia coli (E. coli), by Arthur Kornberg in 1956 (23). The discovery of DNA polymerases later made it possible to invent the polymerase chain reaction (PCR), first briefly described in 1971 by H. Gobind Khorana (24) and then in more detail in 1988 by Kary Mullis (25). Using PCR, a particular DNA template can be amplified exponentially into many identical copies, which allows for downstream detection and analysis. The critical part of the PCR method is to maintain DNA polymerase activity at the high temperatures needed to separate the two DNA strands in each cycle of the reaction. Hence, PCR was significantly improved by the introduction of Taq DNA polymerase (26), isolated from Thermus aquaticus (27). Unlike the polymerases used before, Taq DNA polymerase is stable at the high temperatures required in PCR. In addition, PCR was developed into a version that provides real-time read-outs of the amount of amplified DNA in each cycle, by detecting fluorescent dyes (28). Quantitative PCR (qPCR), combined with reverse transcription, has since been used in gene expression analysis to measure mRNA abundance, providing a relative measure of gene activity in the form of fold changes in expression.
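The exponential growth of PCR and the fold-change read-out of qPCR can both be written as short formulas. This sketch assumes an idealized, constant per-cycle efficiency E, giving N = N0(1 + E)^n, and uses the widely used 2^-ΔΔCt method of Livak and Schmittgen for relative quantification. The threshold cycle (Ct) is the cycle at which fluorescence crosses a detection threshold; the Ct values below are hypothetical, and the cited studies may have used other quantification models.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copies after `cycles` rounds: N = N0 * (1 + E)**n.

    efficiency=1.0 means perfect doubling in every cycle (an idealization).
    """
    return initial_copies * (1.0 + efficiency) ** cycles


def fold_change_ddct(ct_target_treated: float, ct_reference_treated: float,
                     ct_target_control: float, ct_reference_control: float) -> float:
    """Relative expression by the 2**-ddCt method (assumes ~100% efficiency)."""
    delta_treated = ct_target_treated - ct_reference_treated  # normalize to a reference gene
    delta_control = ct_target_control - ct_reference_control
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)


print(pcr_copies(10, 30))                        # 10737418240.0
print(fold_change_ddct(22.0, 18.0, 25.0, 18.0))  # 8.0
```

With perfect doubling, 10 template copies reach about 1.07 × 10^10 copies after 30 cycles. A target gene that, after normalization to the reference gene, crosses the threshold 3 cycles earlier in treated than in control samples corresponds to an 8-fold increase.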

RNA polymerases, on the other hand, synthesize RNA from a DNA template. A specific type of RNA polymerase was isolated from E. coli infected with the T7 bacteriophage (29). Twenty-two years later, the T7 RNA polymerase system was used to linearly amplify a cell's mRNA into millions of copies, using cDNA as an intermediate molecule (30). This approach, later named in vitro transcription (IVT), enabled researchers to study the transcriptome of a single cell for the first time.

Microarrays

DNA microarrays are glass slides whose surfaces are covered with spots deposited in a predetermined order and location, each spot containing identical copies of a single-stranded DNA probe (31). When used in expression analysis, these probes are designed to hybridize to complementary, (most often) fluorescently labeled cDNA fragments of known sequence. The strength of the fluorescent signal from a spot correlates with the amount of cDNA bound to the surface probes in that spot, reflecting the amount of the corresponding mRNA in the sample. Microarrays were often used to study relative differences in expression between conditions, but were limited in that only targets with already known sequences could be studied. In addition, they suffered from cross-hybridization, leading to unspecific and high background signals. Nonetheless, DNA microarrays are still used to study genetic variation in the human population, such as single nucleotide polymorphisms (SNPs) and copy number variations (CNVs) (32).
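Relative differences between conditions are commonly summarized per spot as a log2 ratio of signals. The sketch below assumes a comparison of two conditions, a single global background value and a floor that keeps the logarithm defined; the intensities are hypothetical, and real analyses add normalization steps not shown here.

```python
import math


def log2_ratio(signal_a: float, signal_b: float, background: float = 0.0,
               floor: float = 1.0) -> float:
    """log2 of the background-corrected signal ratio for one spot.

    Positive values: higher signal (more bound cDNA) in condition A;
    negative values: higher in condition B. Corrected signals are floored at
    `floor` so the logarithm stays defined for spots at or below background.
    """
    a = max(signal_a - background, floor)
    b = max(signal_b - background, floor)
    return math.log2(a / b)


# Hypothetical spot intensities (condition A, condition B) for three probes.
spots = {"probe_1": (900.0, 300.0), "probe_2": (500.0, 500.0), "probe_3": (150.0, 900.0)}
for probe, (a, b) in spots.items():
    print(probe, round(log2_ratio(a, b, background=100.0), 2))
```

A value of 2 means four times more signal in condition A, 0 means no difference, and -4 means sixteen times more signal in condition B.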

Sequencing

Sequencing is the process of determining the order of the nucleotides in DNA or RNA. The Human Genome Project (HGP) was officially launched in 1990 with the aim of sequencing all the bases in the human genome. It took another 11 years before two drafts of the human genome were completed (33, 34). Technological advances both before and after the HGP led to innovations like RNA-sequencing and improved taxonomic profiling of microbes by sequencing.

Early sequencing approaches

The Maxam-Gilbert sequencing method was the first sequencing approach ever described (35), closely followed by Sanger sequencing 4 years later (36). In short, Sanger sequencing uses a mix of normal deoxynucleoside triphosphates (dNTPs) and modified, chain-terminating nucleotides (dideoxynucleotides, ddNTPs), which are incorporated by DNA polymerase using a single-stranded DNA template. The modified nucleotides lack the 3'-OH group; once one is incorporated, no phosphodiester bond can form with the next nucleotide, and elongation of the DNA strand is terminated. The resulting fragments are then separated by length to reveal the order of the nucleotides. Frederick Sanger continued to develop the sequencing technique by introducing shotgun sequencing when he and his colleagues sequenced bacteriophage lambda (37). In shotgun sequencing, DNA was randomly cleaved into shorter fragments and cloned into vectors to obtain more copies of each fragment. The fragments were sequenced using Sanger sequencing, and the resulting reads were then assembled into longer continuous sequences by combining reads with overlapping ends. This method was used to sequence the first complete genome of a free-living organism, Haemophilus influenzae, in 1995 (38), and later also the human genome (33). With its low error rate of 0.001% (39), Sanger sequencing is still used for taxonomic species identification today (40).
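Both ideas in this paragraph, reading a sequence from length-sorted chain-terminated fragments and assembling shotgun reads by their overlapping ends, can be sketched in a few lines. This is an idealized illustration under stated assumptions: every termination position is represented, there are no sequencing errors, and no read is contained inside another. The greedy largest-overlap merge is one simple assembly strategy chosen for illustration, not necessarily the one used in the cited projects, and all names and sequences are hypothetical.

```python
import random

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sanger_read(template: str, seed: int = 0) -> str:
    """Toy chain-termination read of a template written 5'->3'."""
    # The polymerase builds the complementary, antiparallel strand.
    synthesized = "".join(PAIR[b] for b in reversed(template))
    # Termination can occur at every position: one fragment per length,
    # each ending in a labeled chain-terminating nucleotide.
    fragments = [(length, synthesized[length - 1]) for length in range(1, len(synthesized) + 1)]
    random.Random(seed).shuffle(fragments)  # the reaction is an unordered mixture
    # Separating by length restores the order; reading terminal bases gives the sequence.
    return "".join(base for _, base in sorted(fragments))


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if < min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the largest overlap."""
    contigs = list(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)] + [merged]


print(sanger_read("ATGGC"))  # GCCAT
print(greedy_assemble(["ATGGCGTA", "GCGTACGTT", "ACGTTAGC"]))  # ['ATGGCGTACGTTAGC']
```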


Massively parallel sequencing

In short, DNA is extracted, fragmented, prepared into DNA libraries and clonally amplified to obtain a detectable signal during sequencing. Sequencing-by-synthesis techniques, such as pyrosequencing (41), sequencing-by-reversible termination (42) and sequencing-by-detection of hydrogen ions (43), have in common that the nucleotide sequence is determined by detecting incorporated nucleotides while the complementary DNA strand is being synthesized. The technique with the greatest impact so far has been sequencing-by-reversible termination, used today in Illumina sequencing platforms. Unlike the two other sequencing-by-synthesis techniques, which used emulsion PCR (emPCR) in the clonal amplification step, Illumina sequencing uses bridge PCR. In bridge PCR, clusters of identical DNA strands are formed on a solid surface by first hybridizing the library fragments, via their adaptor sequences, to complementary oligonucleotides on the surface. A complementary strand is then synthesized and the original strand is washed off, before the remaining DNA strand bends towards the surface and hybridizes to another oligonucleotide on the surface. The process is repeated until millions of clusters, each containing many copies of a single starting molecule, have been generated on the surface. During sequencing, the DNA polymerase incorporates fluorescently labeled, reversibly terminated nucleotides complementary to the DNA strands on the surface, one base per cycle. The sequencing machine detects the fluorescent signal from each cluster, after which the fluorophore and the terminating group are cleaved off and washed away, so that the next nucleotide can be incorporated and the sequencing-by-synthesis process can continue.
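The cycle-by-cycle detection described above can be sketched as a simple base caller for one cluster. This is a hypothetical illustration that assumes one distinguishable fluorescence channel per base and ignores phasing, cross-talk correction and other steps that real base-calling software performs. The per-cycle "purity" (share of total signal in the brightest channel) is an illustrative confidence measure, not an official quality score.

```python
def call_bases(cycles: list[dict[str, float]]) -> tuple[str, list[float]]:
    """Call one base per cycle from four-channel cluster intensities.

    Returns the read and, per cycle, the fraction of total signal in the
    brightest channel (a crude, hypothetical confidence measure).
    """
    read, purity = [], []
    for intensities in cycles:
        base = max(intensities, key=intensities.get)  # brightest fluorophore wins
        total = sum(intensities.values())
        read.append(base)
        purity.append(intensities[base] / total if total > 0 else 0.0)
    return "".join(read), purity


# Hypothetical intensities for one cluster over three cycles.
cluster = [
    {"A": 900.0, "C": 50.0, "G": 30.0, "T": 20.0},
    {"A": 40.0, "C": 60.0, "G": 800.0, "T": 100.0},
    {"A": 10.0, "C": 700.0, "G": 20.0, "T": 270.0},
]
read, purity = call_bases(cluster)
print(read, [round(p, 2) for p in purity])  # AGC [0.9, 0.8, 0.7]
```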

Sequencing-by-ligation, a slightly different way of sequencing, was invented in 2005 and has been widely adopted in later techniques (44). Instead of relying on a polymerase to incorporate nucleotides, fluorescently labeled oligonucleotides were hybridized to the template and, if complementary, ligated to an "anchor" sequencing primer. The fluorescently labeled oligonucleotides were designed to have degenerate bases at all but one specific position, whose identity was encoded by a certain fluorophore. After fluorophore detection, the anchor-oligonucleotide complex was stripped off and a new round of hybridization, ligation and fluorophore detection was performed. Applied Biosystems further developed this technique into a two-base encoding system, in which a pair of nucleotides at the 3' end of the labeled oligonucleotide was encoded by the color of a fluorophore at its 5' end. After ligation, the last 3 bases, carrying the fluorophore, were cleaved off, leaving a free phosphorylated 5' end, and the next round of ligation could start. The resulting gaps of degenerate bases between each known pair of nucleotides were sequenced in later rounds, after the whole ligation complex had been stripped off and a new primer had been hybridized at the n-1 position. Each nucleotide was thus decoded twice, which kept error rates low (45).
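The two-base encoding can be illustrated in "color space". In this sketch each base gets a 2-bit code and the color of a dinucleotide is the XOR of the two codes, which reproduces the commonly described SOLiD color scheme. Treat it as an illustration rather than a specification of the instrument chemistry: probe design, cleavage and primer resets are not modeled, and the function names are hypothetical.

```python
# 2-bit codes for the bases; the two-base color of a dinucleotide is the XOR of its codes.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_colors(seq: str) -> list[int]:
    """One color (0-3) per overlapping pair of adjacent bases."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base: str, colors: list[int]) -> str:
    """Translate colors back to bases; requires the first base to be known."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)


colors = encode_colors("ATGGCA")
print(colors, decode_colors("A", colors))  # [3, 1, 0, 3, 1] ATGGCA

# A true single-base difference changes two adjacent colors.
snp_colors = encode_colors("ATCGCA")
print([i for i, (x, y) in enumerate(zip(colors, snp_colors)) if x != y])  # [1, 2]
```

Because every base contributes to two colors, a real single-base difference alters two adjacent colors, whereas an isolated measurement error alters only one; comparing reads to a reference in color space is what allows the two cases to be told apart.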

RNA-sequencing in bulk and in individual cells

The use of sequencing-by-synthesis techniques to study a cell's transcriptome was first described in 2008, when part of the mouse transcriptome (46) and the complete yeast transcriptome (47) were published. Before that, gene expression analysis had been performed using DNA microarrays and various Sanger sequencing-based methods, such as expressed sequence tags (EST) (48), serial analysis of gene expression (SAGE) (49) and cap analysis of gene expression (CAGE) (50). Although these methods overcame the limitation of detecting only known targets, they still had difficulty detecting transcripts of low abundance and in some cases depended on bacterial cloning, which was time-consuming and costly (46, 51). In contrast, RNA-sequencing offered a straightforward protocol to detect novel transcripts, transcripts of low abundance and RNA splicing events, without prior knowledge of the transcripts present in the sample.

In short, RNA library preparation started with extraction of RNA and enrichment for mRNA, followed by fragmentation and generation of cDNA, using random hexamers, oligo(dT) primers or a combination of both to prime the reverse transcription reaction. Sequencing adaptors were then attached to the fragments. Early protocols required a large amount of input RNA, so experiments were done in bulk. Bulk measurements, however, miss the variation between individual cells. Additionally, bulk averages could misrepresent the sample when certain genes were highly expressed in only a few cells (52).


Therefore, significant efforts in the scientific community were made to develop single cell RNA sequencing (scRNA-seq) protocols.
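The averaging problem of bulk measurements can be illustrated with hypothetical numbers. In the sketch below, two 100-cell samples have the same bulk average for one gene even though the gene is expressed by 5% of the cells in one sample and by every cell in the other; only a per-cell measurement separates the two.

```python
from statistics import mean

# Hypothetical expression of one gene (molecules per cell) in two 100-cell samples.
rare_high = [200] * 5 + [0] * 95   # a few cells express the gene strongly
uniform = [10] * 100               # every cell expresses it moderately

for name, cells in [("rare_high", rare_high), ("uniform", uniform)]:
    expressing = sum(count > 0 for count in cells) / len(cells)
    # A bulk measurement only sees the average, which is identical for both samples.
    print(name, "bulk average:", mean(cells), "fraction of cells expressing:", expressing)
```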

As early as 2009, the first successful single-cell gene expression profiling using scRNA-seq was performed on a single mouse blastomere (53). The protocol was adapted from single-cell cDNA amplification protocols that had been developed for microarrays (54, 55), but included optimized cDNA synthesis and amplification reactions. The protocol detected 75% (5,270) more genes than a microarray read-out and found 1,753 novel splice junctions.

During the following years, numerous scRNA-seq protocols were developed. Advances included increased multiplexing capabilities (56, 57), improved read coverage across transcripts (58), reduced amplification biases (57), preserved information on transcript strandedness (56, 57) and, finally, transcriptome profiling with isoform- and allele-specific resolution (59). Throughput was further improved by minimizing the reaction volume, first using microfluidic approaches (60, 61) and later droplet emulsions (62–64). With droplet-based techniques, the number of single cells that could be studied in one experiment increased ~1,000-fold compared with earlier scRNA-seq protocols (65). Wu et al. also showed that RNA-seq sensitivity improved and biases were reduced when reagent volumes for sample preparation were reduced to nanoliters (66). Multiplexing was also improved by various barcoding approaches: split-and-pool barcode generation (63, 67, 68), randomized barcodes (62, 69) or combinatorial in situ barcoding (70, 71), in which cells were first fixed and their mRNAs were then barcoded in situ.
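Split-and-pool barcoding works because the number of barcode combinations grows exponentially with the number of rounds. This sketch assumes that, in every round, each cell lands in a uniformly random well and receives that well's barcode; the plate size (96 wells), number of rounds (3) and cell count (1,000) are hypothetical. It estimates how often two cells end up sharing a combination, which would make them indistinguishable.

```python
import random
from collections import Counter


def barcode_space(wells_per_round: int, rounds: int) -> int:
    """Number of distinct barcode combinations after `rounds` split-and-pool rounds."""
    return wells_per_round ** rounds


def split_and_pool(n_cells: int, wells_per_round: int, rounds: int,
                   seed: int = 0) -> list[tuple[int, ...]]:
    """Each round, every cell lands in a random well and receives that well's barcode."""
    rng = random.Random(seed)
    return [tuple(rng.randrange(wells_per_round) for _ in range(rounds)) for _ in range(n_cells)]


def fraction_sharing_barcode(barcodes: list[tuple[int, ...]]) -> float:
    """Fraction of cells whose barcode combination is shared with another cell."""
    counts = Counter(barcodes)
    return sum(n for n in counts.values() if n > 1) / len(barcodes)


def expected_sharing(n_cells: int, space: int) -> float:
    """Probability that a given cell collides with at least one other (uniform wells)."""
    return 1.0 - (1.0 - 1.0 / space) ** (n_cells - 1)


space = barcode_space(96, 3)
cells = split_and_pool(1000, 96, 3)
print(space, fraction_sharing_barcode(cells), round(expected_sharing(1000, space), 5))
```

With three rounds of 96 wells there are 884,736 combinations, so only about 0.1% of 1,000 cells are expected to share a barcode; adding cells or removing rounds raises that rate.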

Unique Molecular Identifiers

Since most RNA-seq protocols depended on PCR amplification, common issues were that (i) highly expressed transcripts were amplified more readily than low-abundance transcripts and (ii) copying errors accumulated in each amplification round. In 2011, Unique Molecular Identifiers (UMIs) were presented as a way to reduce such noise in the data and improve accuracy (72). UMIs were randomized tags, molecularly introduced into each transcript copy before a PCR step in order to give each transcript in an RNA library a unique identifying sequence. In the analysis, reads mapping to the same position in the reference genome were counted as distinct transcripts only if they carried different UMIs. UMIs were later also used in scRNA-seq (73) as well as in amplicon sequencing approaches (74, 75).
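The counting rule above can be sketched as collapsing reads that share gene, mapping position and UMI into a single molecule. The reads below are hypothetical. The sketch assumes error-free UMIs; dedicated tools such as UMI-tools additionally merge UMIs that differ only by sequencing errors, which is not done here.

```python
from collections import Counter


def count_molecules(reads: list[tuple[str, int, str]]) -> dict[str, int]:
    """Collapse reads sharing gene, mapping position and UMI into one molecule."""
    unique_molecules = set(reads)  # (gene, position, umi) triples
    counts = Counter(gene for gene, _, _ in unique_molecules)
    return dict(sorted(counts.items()))


# Hypothetical aligned reads: (gene, mapping position, UMI).
reads = [
    ("GeneA", 100, "ACGTAC"), ("GeneA", 100, "ACGTAC"), ("GeneA", 100, "ACGTAC"),  # PCR duplicates
    ("GeneA", 100, "TTAGGC"),                                                       # second molecule
    ("GeneB", 500, "ACGTAC"), ("GeneB", 500, "ACGTAC"),
]
raw_counts = dict(Counter(gene for gene, _, _ in reads))
print(raw_counts, count_molecules(reads))  # {'GeneA': 4, 'GeneB': 2} {'GeneA': 2, 'GeneB': 1}
```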

16S analysis: from culturing to massively parallel sequencing

Early microbial community identification approaches

Early approaches for microbial identification included culturing, followed by a detailed comparative morphological and phenotypic description of the bacterial isolate. Another approach, called reassociation or DNA-DNA hybridization, is still used as the gold standard for bacterial identification in some labs. For isolates to be declared the same species by reassociation, they needed to exhibit greater than 70% DNA similarity and less than 5 °C difference in melting temperature (76). However, not all microbes can easily be cultured (77), and culturing conditions can be difficult and time-consuming to optimize (78). The reassociation method is also considered expensive, labor-intensive and time-consuming (79).

First 16S DNA sequences

Instead, sequencing of the 16S rRNA gene (16S sequencing) has several advantages over the methods just mentioned. First, the 16S rRNA gene is present in almost all bacteria (80). Second, the function of the 16S rRNA gene has not changed over long periods of time. Differences in its sequence therefore reflect random changes that accumulate over time into stable evolutionary differences between species, rather than selected changes that would modify the function of the rRNA molecule (81). Third, the gene is large enough (~1,500 bp) to hold relevant information about phylogenetic relationships (81). Lastly, 16S sequencing increases the sensitivity and throughput with which bacteria can be identified in a sample.

In 1994, it was established that 70% or greater DNA similarity in the reassociation method corresponded to 97% or greater sequence identity of the 16S rRNA gene (82). Consequently, bacterial strains sharing less than 97% 16S rRNA gene sequence identity could be regarded as different species. However, this did not hold true for all strains (83, 84), and the 16S rRNA gene could not always provide a definitive identification at species level (80).
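
The 97% rule of thumb can be made concrete with a small Python sketch that computes percent identity between two already aligned 16S fragments. Several things here are assumptions: the alignment step, the toy sequences and the gap convention. Columns where both sequences have a gap are skipped, and a gap facing a base counts as a mismatch. Different tools define percent identity slightly differently, and, as noted above, the threshold itself does not hold for all strains.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity between two aligned sequences of equal length ('-' = gap).

    Columns where both sequences have a gap are skipped; a gap opposite a base
    counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("alignment contains no comparable columns")
    return 100.0 * matches / compared

def suggests_different_species(identity, threshold=97.0):
    """Rule of thumb from the text: below 97 % identity suggests different species."""
    return identity < threshold

reference = "ACGT" * 25  # toy 100-base aligned region
query = "T" + reference[1:10] + "T" + reference[11:20] + "T" + reference[21:30] + "T" + reference[31:]
identity = percent_identity(reference, query)
print(identity, suggests_different_species(identity))  # 96.0 True
```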

After conserved regions within the 16S rRNA gene had been discovered (86), the rRNA sequence was first obtained by cloning and Sanger sequencing using universal primers (85). The conserved regions flank highly variable regions, and the region with the highest degree of heterogeneity was found within the first 500 bases at the 5’ end of the 16S rRNA gene (87, 88). Cloning-free methods soon followed. These were first based on reverse transcription (89) and then on PCR, which required less bacterial input material (90, 91). During the 1990s, the number of 16S rRNA reference sequences grew rapidly in public databases such as the Ribosomal Database Project (RDP) (92) and NCBI’s GenBank (93). At the time of writing, RDP contained 3,356,809 16S rRNA sequences (94) and NCBI contained 21,699 curated 16S rRNA sequences (95).

Massively parallel 16S sequencing

The introduction of massively parallel sequencing to 16S analysis enabled improved resolution of bacterial community composition at a lower cost. Pyrosequencing was the first such technique used for 16S-based bacterial identification, owing to its longer read lengths. Both environmental (96, 97) and mammalian (98–101) bacterial communities were sequenced. Illumina sequencing offered much shorter reads but yielded ~10 times more reads per run (102), enabling more samples to be sequenced in one run. Hence, deep sequencing of the human microbiome could be performed (103). Soon after, previously undetected genera were identified in the human oral microbiome (104) and the human gastrointestinal tract (102) using Illumina sequencing.

Analysis of 16S sequencing data

Limitations associated with Illumina sequencing included read length, sequencing error rate and low sequencing quality due to the low complexity of 16S rRNA amplicon libraries (105). Illumina has improved read lengths over the years, and today 2×300 bp reads can be sequenced on the MiSeq. However, sequencing the full-length 16S rRNA gene still gives better resolution for bacterial identification than sequencing sub-regions of the gene. It has also been argued that partial 16S rRNA gene sequencing can only provide reliable taxonomic resolution at genus level (106). Sequencing quality can be somewhat improved by spiking in PhiX, a bacteriophage genome prepared as a ready-to-sequence Illumina library, which increases nucleotide diversity. Several remaining problems can be partly corrected computationally: the error rate, the low sequence quality and a further source of bias, errors accumulated during PCR amplification.

16S sequencing data analysis usually begins with the generation of operational taxonomic units (OTUs), which are clusters of sequences grouped by sequence similarity (107). Traditionally, a similarity threshold of 97% has been used, meaning that sequences with 97% or greater similarity are clustered together. Whatever causes the differences between sequences within a cluster, the OTU provides one representative sequence for all of them. Depending on how frequent each error is, sequences with errors can be clustered together with their correct taxonomic group. Alternatively, they can be incorrectly assigned to a cluster of their own, i.e. a separate OTU. OTU generation has been shown to overestimate the number of clusters in samples. It also fails to separate species into different clusters when their sequences differ only slightly (108–110). The method also has low reproducibility (111). Accuracy was, however, improved by the introduction of amplicon-specific error-correction, or denoising, algorithms. These were first developed for pyrosequencing data (112–115) and later adapted for Illumina sequencing data (115, 116). In short, a denoising algorithm uses both sequence quality and sequence abundance to first learn the error rates of the dataset. It then merges sequencing reads into unique sequences with a consensus quality score.
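
The traditional 97% OTU clustering idea can be sketched with a simple greedy, abundance-ordered algorithm in Python. This illustration rests on strong assumptions and is not any specific published tool. Reads are assumed to be trimmed to equal length and gap-free, so identity is computed position by position. `introduce_errors` is a hypothetical helper used only to create toy sequencing errors. This is not a denoising algorithm. It only shows how a low-frequency variant carrying sequencing errors can be absorbed into its parent OTU, while a more divergent sequence forms its own cluster.

```python
from collections import Counter

def identity(a, b):
    """Percent identity of two equal-length, gap-free reads."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otu_clustering(reads, threshold=97.0):
    """Greedy, abundance-ordered OTU clustering.

    Unique sequences are visited from most to least abundant. Each joins the
    first existing OTU whose centroid it matches at >= `threshold` percent
    identity; otherwise it becomes the centroid of a new OTU.
    Returns {centroid sequence: number of reads in the OTU}.
    """
    otus = {}
    for seq, count in Counter(reads).most_common():
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += count
                break
        else:
            otus[seq] = count
    return otus

def introduce_errors(seq, positions):
    """Hypothetical helper: substitute a different base at each given position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    bases = list(seq)
    for p in positions:
        bases[p] = swap[bases[p]]
    return "".join(bases)

true_seq = "ACGT" * 25                                             # toy 100-base amplicon
with_errors = introduce_errors(true_seq, [1, 50])                  # 98 % identity
other_taxon = introduce_errors(true_seq, [3, 13, 23, 33, 43, 53])  # 94 % identity
reads = [true_seq] * 50 + [with_errors] * 3 + [other_taxon] * 10
otus = greedy_otu_clustering(reads)
print(sorted(otus.values()))  # [10, 53]
```

Here the reads carrying two errors (98% identity) are absorbed into the OTU of the abundant true sequence, giving two OTUs of 53 and 10 reads. Raising the threshold to 99% instead turns the error-carrying reads into a spurious third OTU.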

Taxonomic assignment of OTUs is important because it facilitates biological interpretation and allows comparisons with published studies. A common approach is to use a naive Bayesian classifier (117), a probabilistic machine learning model based on Bayes’ theorem that assigns sequences to taxa. In short, the model first classifies sequences based on their 8-base subsequences (8-mers). It then uses bootstrapping to obtain empirical estimates of the confidence of each classification. This can enable classification of bacteria at species level, although there are many examples where OTUs could not be classified at species or even genus level (79).
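
The k-mer based naive Bayesian classification described above can be sketched in Python. This is a toy version written under assumptions, not the published classifier. Each taxon is scored by summing the log-probabilities of the query's 8-mers. Word probabilities are smoothed by a pseudocount derived from how common each word is across all references, and the exact smoothing formula here is an assumption. Confidence is estimated by re-classifying bootstrap samples of one eighth of the query's words. The reference sequences and taxon names are randomly generated placeholders.

```python
import math
import random
from collections import defaultdict

def kmers(seq, k=8):
    """Set of overlapping k-base words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

class KmerNaiveBayes:
    """Toy naive Bayesian classifier based on k-mer presence."""

    def __init__(self, k=8):
        self.k = k

    def fit(self, references):
        """`references` is a list of (sequence, taxon) pairs."""
        self.n_refs = len(references)
        self.taxon_size = defaultdict(int)                          # references per taxon
        self.word_in_taxon = defaultdict(lambda: defaultdict(int))  # per taxon: references containing a word
        self.word_total = defaultdict(int)                          # references containing a word
        for seq, taxon in references:
            self.taxon_size[taxon] += 1
            for word in kmers(seq, self.k):
                self.word_in_taxon[taxon][word] += 1
                self.word_total[word] += 1
        return self

    def log_likelihood(self, words, taxon):
        """Sum of log P(word | taxon), smoothed by overall word frequency (assumed form)."""
        size = self.taxon_size[taxon]
        in_taxon = self.word_in_taxon[taxon]
        total = 0.0
        for word in words:
            prior = (self.word_total.get(word, 0) + 0.5) / (self.n_refs + 1)
            total += math.log((in_taxon.get(word, 0) + prior) / (size + 1))
        return total

    def predict(self, words):
        return max(self.taxon_size, key=lambda taxon: self.log_likelihood(words, taxon))

    def classify(self, seq, n_boot=100, seed=0):
        """Return (best taxon, fraction of bootstrap subsamples that agree)."""
        words = sorted(kmers(seq, self.k))
        if not words:
            raise ValueError("sequence is shorter than k")
        best = self.predict(words)
        rng = random.Random(seed)
        subsample = max(1, len(words) // 8)
        agree = sum(
            self.predict([rng.choice(words) for _ in range(subsample)]) == best
            for _ in range(n_boot)
        )
        return best, agree / n_boot

# Toy references: random sequences standing in for two genera.
rng = random.Random(42)
genus_a = "".join(rng.choice("ACGT") for _ in range(200))
genus_b = "".join(rng.choice("ACGT") for _ in range(200))
references = [(genus_a, "GenusA"), (genus_a[5:], "GenusA"),
              (genus_b, "GenusB"), (genus_b[5:], "GenusB")]
model = KmerNaiveBayes(k=8).fit(references)
query = genus_a[:100] + ("A" if genus_a[100] != "A" else "C") + genus_a[101:]  # one substitution
print(model.classify(query))
```

Almost all 8-mers of the query also occur in the GenusA references, so the query is assigned to GenusA. Nearly every bootstrap subsample agrees, giving a confidence close to 1. With closely related references, bootstrap agreement would drop, which is how low-confidence assignments arise.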

Metagenomic sequencing

Taxonomic classification can be improved down to strain resolution by metagenomic sequencing, an untargeted approach in which all bacterial genomes in a sample are shotgun sequenced. The method enables both taxonomic profiling and functional assessment of the bacterial community in a sample (118). A major part of the method relies on computationally assembling the reads into longer contigs in order to identify bacteria and their functional elements. However, the details of this technique are outside the scope of this thesis.

Long-read sequencing

Sequencing of the full-length 16S rRNA gene would improve the resolution of taxonomic assignment, as already mentioned (106). In transcriptomics, full-length sequencing would allow improved examination of isoforms and splicing events (119). Various technologies are available. Since long-read sequencing was not used in this thesis work, they will not be explained in detail.

In Pacific Biosciences’ (PacBio) single-molecule real-time (SMRT) sequencing (120), a phi29 DNA polymerase bound to a single-stranded DNA template is immobilized inside a zero-mode waveguide. There, the polymerase incorporates fluorescently labeled nucleotides in real time. The fluorescence of each nucleotide is detected as it is incorporated, after which the fluorophore is cleaved off and diffuses away. The same circularized template can be read repeatedly, an approach called circular consensus sequencing (121), and this has reduced the error rate dramatically.
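
A simplified Python model shows why repeated passes over the same circular template reduce the error rate. This sketch makes three assumptions. Passes are already aligned. Errors are independent substitutions occurring with probability p per base. The consensus is a per-column majority vote. PacBio's actual consensus algorithms are more sophisticated and also handle insertions and deletions. Under these assumptions, the consensus base can only be wrong if at least half of the passes are wrong at that position, which gives the upper bound computed below.

```python
from collections import Counter
from math import comb

def majority_consensus(passes):
    """Per-column majority vote over aligned, equal-length passes of one template."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))

def consensus_error_upper_bound(p, n_passes):
    """Upper bound on the per-base consensus error for an odd number of passes.

    The majority vote is guaranteed to be correct whenever fewer than half of the
    passes are wrong, so the error is at most P(at least (n+1)/2 wrong passes)
    for independent errors with probability p.
    """
    if n_passes % 2 == 0:
        raise ValueError("use an odd number of passes to avoid ties")
    return sum(
        comb(n_passes, k) * p**k * (1 - p) ** (n_passes - k)
        for k in range((n_passes + 1) // 2, n_passes + 1)
    )

print(majority_consensus(["ACGT", "ACCT", "ACGA"]))  # ACGT
for n in (1, 3, 5):
    print(n, round(consensus_error_upper_bound(0.1, n), 5))
```

For p = 0.1, the bound falls from 0.1 for a single pass to 0.028 for three passes and below 0.01 for five passes.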

The Oxford Nanopore device relies on nanopore technology (122). As the DNA template passes through a nanopore, the short stretch of sequence occupying the pore changes the electrical current through it. This electrical signal must then be translated computationally into single-nucleotide calls.

Both techniques can produce reads ranging from 500 bp (250 bp for PacBio) to, on average, 30 kb. However, they suffer from higher error rates and lower read throughput than Illumina sequencing (123).

Methods to resolve the spatial transcriptome

Single-cell sequencing has started to explore, at single-cell resolution, the great diversity of cell types and gene expression present in cell populations (124). However, capturing cells in droplets or other microfluidic devices alters their measured gene expression (125). An important aspect of recent cell atlas efforts is to study cells in their natural environment. This has been shown to be of significant importance in tumor development (126), organ development (127, 128) and disease progression (129). Methods for spatial analysis of the transcriptome in tissues can be divided into two categories: imaging-based and sequencing-based approaches.

Imaging-based spatial techniques

Imaging-based methods strive to generate an in situ image of the transcriptome and often rely on detecting particular genes of interest.

The first subgroup of these methods comprises techniques based on in situ hybridization (ISH), first used in 1969 (130). In ISH, labeled probes are allowed to hybridize to a specific gene target directly in cells. Different labeling strategies exist. An indirect approach was used to spatially map genes in the mouse brain, resulting in the Allen Brain Atlas (131, 132), but the most common strategy today is direct labeling with a fluorophore (133). This technique is referred to as fluorescent in situ hybridization (FISH).

The development of single-molecule fluorescent in situ hybridization (smFISH) allowed detection of individual RNA molecules in single cells (134, 135). It used multiple short fluorescently labeled probes that bind along the transcript body, amplifying the detection signal. RNAscope (136) was developed as another way to deal with the low signal-to-noise ratio encountered in imaging-based assays. In RNAscope, custom-designed branched DNA probes were used. These probes bound to the target mRNA and also allowed a set of reporter probes to bind to the “branch”. Fluorescent reporters were then added, which amplified the desired signal. RNAscope was reported with a 3-plex detection scheme in formalin-fixed, paraffin-embedded tissue samples (136).

Still, traditional FISH and smFISH are the gold standard for detection and localization of a few selected targets and are often used to validate new techniques (135). Additionally, smFISH was later combined with expansion microscopy (137) to visualize mRNA in tissues. This yielded even higher-resolution images, conveniently acquired with regular laboratory microscopes, but only a few targets could be detected (138).

Further improvements in multiplexing and resolution were made using sequential barcoding, first by Sequential FISH (seqFISH) (139) and then by Multiplexed Error-Robust FISH (MERFISH) (140). Transcripts were detected through multiple rounds of hybridization. In each round, short fluorescently labeled probes bound to readout sequences on the barcode probes targeting the genes of interest. Both methods were initially used only on dissociated cells, but with improved protocols they were also demonstrated on a limited collection of tissue sections (141, 142). Historically, imaging-based methods have been associated with lower gene throughput. However, methods such as seqFISH+ (143) and MERFISH could study up to 10,000 genes at once. This has only been demonstrated in cell-sparse tissues such as the mouse brain (141, 143). SeqFISH and MERFISH were both barcoded versions of smFISH. A barcode-free approach that also overcame the limitations of optical crowding was later presented and termed cyclic-ouroboros smFISH (osmFISH) (144). In osmFISH, multiple fluorescently labeled probes were hybridized to the target of interest, imaged and then stripped before the next round of hybridization.
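
The combinatorial logic of sequential barcoding can be sketched in Python. With F colors and N hybridization rounds, a color sequence can encode up to F^N genes. To our understanding, the error robustness of MERFISH comes from its choice of binary barcodes. Each bit corresponds to one imaging round, with 1 meaning the transcript lights up in that round. Barcodes are chosen to differ in at least four positions, so a single misread bit can be corrected and two misread bits detected. The greedy code construction, the 16-round length and the weight of four below are illustrative assumptions, not the published codebook.

```python
from itertools import combinations

def seqfish_capacity(n_colors, n_rounds):
    """Number of distinct color sequences with F colors over N rounds (F**N)."""
    return n_colors ** n_rounds

def hamming(a, b):
    """Number of bits (imaging rounds) in which two integer barcodes differ."""
    return bin(a ^ b).count("1")

def build_code(n_bits=16, weight=4, min_distance=4):
    """Greedily collect fixed-weight binary barcodes with pairwise distance >= min_distance.

    Bit i set to 1 means the transcript should be fluorescent in round i.
    """
    code = []
    for on_rounds in combinations(range(n_bits), weight):
        word = sum(1 << i for i in on_rounds)
        if all(hamming(word, other) >= min_distance for other in code):
            code.append(word)
    return code

def decode(observed, code, max_errors=1):
    """Return the single codeword within `max_errors` bit flips, otherwise None."""
    hits = [word for word in code if hamming(observed, word) <= max_errors]
    return hits[0] if len(hits) == 1 else None

code = build_code()
true_barcode = code[10]
one_error = true_barcode ^ (1 << 15)  # one round misread
two_errors = true_barcode ^ 0b11      # rounds 0 and 1 misread
print(seqfish_capacity(4, 8), len(code))
print(decode(one_error, code) == true_barcode, decode(two_errors, code))
```

A barcode with one misread round is still decoded to the correct codeword. With two misread rounds, the barcode lies at distance two from its true codeword and at least two from any other, so it is rejected rather than assigned to the wrong gene.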

Sequencing-based spatial methods

Sequencing-based methods have the advantage of combining detection, localization and quantitation of the transcriptome through sequencing. These methods can be divided into three subcategories: cryosectioning methods combined with bulk RNA-seq, in situ sequencing methods and in situ capture-based methods.

Cryosectioning-based spatial methods combined with RNA-seq

In 2013, spatial gene expression in Drosophila embryos was first studied using meticulously ordered cryosections as input material for bulk RNA-seq (145). For the first time, novel genes with spatial patterns were identified using this untargeted approach. However, the technique was cumbersome. Because so little mRNA could be extracted from each tissue section, carrier RNA had to be added and was sequenced as well.

A similar method, termed Tomo-seq, was published shortly thereafter. In Tomo-seq, cryosections of the tissue were taken along three different orientations (146), followed by RNA-seq of each cryosection. The sections were then combined computationally to estimate gene expression at the points where they intersected. The method could reconstruct gene expression at single-cell resolution. However, correctly estimating the transcriptome from the three directions required biologically identical tissue replicates, which made the method unfeasible in many research settings.
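
The sketch below illustrates, for a simplified two-dimensional case, how expression measured in sections along different orientations can be combined computationally. It uses iterative proportional fitting, which alternately rescales a grid so that its row and column sums match the expression measured in each section of the two series. This sketch rests on assumptions. It uses two orthogonal section series instead of three, and both series are normalized to the same total expression. It is not necessarily the exact procedure of the cited method.

```python
def iterative_proportional_fitting(row_profile, col_profile, n_iter=100):
    """Estimate a 2-D expression grid from two orthogonal series of section profiles.

    row_profile[i]: expression measured in the i-th section of one series.
    col_profile[j]: expression in the j-th section cut perpendicular to it.
    Both profiles are assumed to be normalized to the same total.
    """
    n_rows, n_cols = len(row_profile), len(col_profile)
    grid = [[1.0] * n_cols for _ in range(n_rows)]  # uniform starting guess
    for _ in range(n_iter):
        for i in range(n_rows):  # rescale each row to its measured section value
            s = sum(grid[i])
            if s > 0:
                grid[i] = [v * row_profile[i] / s for v in grid[i]]
        for j in range(n_cols):  # rescale each column to its measured section value
            s = sum(grid[i][j] for i in range(n_rows))
            if s > 0:
                for i in range(n_rows):
                    grid[i][j] *= col_profile[j] / s
    return grid

estimate = iterative_proportional_fitting([4.0, 6.0], [3.0, 7.0])
print([[round(v, 3) for v in row] for row in estimate])  # [[1.2, 2.8], [1.8, 4.2]]
```

Starting from a uniform grid, two profiles converge to the product of the two one-dimensional patterns. This illustrates why additional section orientations, and therefore identical replicates, were needed to recover more complex spatial structure.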

A similar approach was developed a few years later, when Laser Capture Microdissection (LCM) was combined with RNA-seq, resulting in LCM-Seq (147). LCM had earlier been used to isolate subpopulations of cells from tissue sections at microscopic scale using a laser (148). LCM-Seq could isolate and sequence mRNA from single cells in a single cryosection, but with the inherent throughput limitations of LCM.

In situ sequencing-based methods for tissue profiling

When the target RNAs are sequenced directly in the tissue, the approach is called in situ sequencing (ISS). All ISS methods are based on the detection of fluorescently labeled probes or nucleotides, which decode the locations of individual transcripts in tissues. To make the fluorescence signal detectable, it is amplified by rolling circle amplification (RCA), forming clonally amplified rolling-circle products (RCPs) (149). Because the RCPs are micro- or nano-sized, optical crowding limits how many RCPs can be formed and efficiently and correctly detected.

In 2013, human breast cancer tissue was profiled using ISS (150). Tissue sections were first fixed, after which mRNAs were reverse transcribed into cDNAs directly in the tissue, followed by degradation of the mRNA strand. Then, the two ends of a padlock probe, complementary to the gene sequence of interest, were allowed to hybridize to the cDNA strand. The two ends were then joined in one of two ways. If the probe ends were designed with a 4-nucleotide gap between them, DNA polymerase and DNA ligase activity were combined. If the ends were designed without a gap, DNA ligase alone was used. This was done for multiple gene targets in parallel. The resulting DNA circles were amplified by RCA, creating RCPs. To decode which gene target each RCP represented, a gene-specific identifier on the padlock probe was read by sequencing-by-ligation. Image analysis of the resulting fluorescence pattern then revealed the sequence of the identifier, which in turn identified the gene of interest. The method was recently improved in two ways. Multiple padlock probes per target were used to efficiently detect lowly expressed genes (151), and the sequencing chemistry was switched to sequencing-by-hybridization (152). In this chemistry, fluorescently labeled probes are hybridized to a readout sequence on the gene-specific probe, similarly to multiplexed smFISH approaches. With these improvements, 160 genes could be detected at once in a mouse brain tissue section.
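
The decoding step can be sketched in Python. In each sequencing round, the brightest fluorescence channel of an RCP gives one base of its identifier. The resulting identifier is then looked up in a codebook. The four-channel base assignment, the 4-base identifiers and the codebook entries are hypothetical. The chemistry of sequencing-by-ligation is not modeled.

```python
# Hypothetical assignment of the four imaging channels to bases.
CHANNEL_TO_BASE = {"ch1": "A", "ch2": "C", "ch3": "G", "ch4": "T"}

def call_identifier(rounds):
    """Call one base per round from an RCP's channel intensities.

    `rounds` is a list with one {channel: intensity} dict per sequencing round;
    the brightest channel in a round gives that round's base.
    """
    return "".join(CHANNEL_TO_BASE[max(r, key=r.get)] for r in rounds)

def decode_rcp(rounds, codebook):
    """Gene encoded by the RCP, or None if the identifier is not in the codebook."""
    return codebook.get(call_identifier(rounds))

codebook = {"ACGT": "gene_1", "TTAG": "gene_2", "GCCA": "gene_3"}  # hypothetical identifiers
rcp = [
    {"ch1": 0.9, "ch2": 0.1, "ch3": 0.2, "ch4": 0.1},  # round 1 -> A
    {"ch1": 0.1, "ch2": 0.8, "ch3": 0.1, "ch4": 0.2},  # round 2 -> C
    {"ch1": 0.2, "ch2": 0.1, "ch3": 0.7, "ch4": 0.1},  # round 3 -> G
    {"ch1": 0.1, "ch2": 0.2, "ch3": 0.1, "ch4": 0.9},  # round 4 -> T
]
print(call_identifier(rcp), decode_rcp(rcp, codebook))  # ACGT gene_1
```

With four possible bases per round, an identifier read over L rounds can in principle distinguish 4^L genes. In this sketch, identifiers that match no codebook entry, for instance because of a misread round, return None instead of being assigned to a gene.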

A similar approach called Spatially-resolved Transcript Amplicon Readout mapping (STARmap) (153) combined two probe types. These were padlock probes containing a gene-specific identifier and Specific amplification of Nucleic Acids via Intramolecular Ligation (SNAIL) probes. Each pair of padlock and SNAIL probes was designed to hybridize to adjacent regions of the same gene target. Both probes were first hybridized directly to the mRNA in the cells. If both the padlock and the SNAIL probe hybridized to the same target, the padlock was ligated into a DNA circle with the help of the adjacent SNAIL probe and subjected to RCA. After amplification, the tissue sections were embedded in a 3D hydrogel to preserve the organization of the RCPs during detection. Each nucleotide of the gene-specific identifier in each RCP was then sequenced twice using sequencing-by-ligation. The combination of padlock and SNAIL probes reduced the error rate and resulted in detection of ~1,000 genes in 8 µm thick mouse brain tissue sections. Wang et al. also showed that ISS could be performed in tissue volumes by obtaining the 3D expression structure of selected genes in 150 µm thick tissue sections.

Unlike ISS with padlock probes, fluorescent in situ RNA sequencing (FISSEQ), published in 2014, was an untargeted ISS technique (154). In FISSEQ, cDNAs were first synthesized in cultured cells and then cross-linked to a matrix to keep the spatial organization intact. The reverse transcription reaction was supplemented with a mix of normal and modified nucleotides, the latter used in the cross-linking reaction. Tagged random hexamers were added to prime cDNA synthesis. The cross-linked cDNAs were then circularized and amplified by RCA, followed by hybridization of a sequencing primer to the RCPs. The RCPs were then read by sequencing-by-ligation using the two-base encoding technique. Although not yet peer-reviewed, FISSEQ has been used together with expansion microscopy to detect transcripts in the mouse brain (155).
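
Two-base encoding can be sketched in Python. In this scheme each color reports a pair of adjacent bases rather than a single base, so every base is interrogated twice. Each base is represented as a 2-bit number, and the color is the XOR of the two bases. To our understanding, this mapping reproduces the standard color table of SOLiD-style two-base encoding. The exact probe design used in FISSEQ is not modeled.

```python
# Each base as a 2-bit number; the color of a base pair is the XOR of its two bases.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}

def encode_colors(seq):
    """One color (0-3) per pair of adjacent bases."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]

def decode_colors(first_base, colors):
    """Rebuild the sequence from colors, starting from a known first base."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)

colors = encode_colors("ATGGCA")
print(colors, decode_colors("A", colors))  # [3, 1, 0, 3, 1] ATGGCA
```

Decoding requires a known first base. When decoded naively, a single miscalled color corrupts every downstream base. In contrast, a true single-base difference changes two adjacent colors. This pattern can be used to distinguish sequencing errors from real sequence variants.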

In situ capture-based methods for tissue profiling

In this group of spatial methods, RNA molecules from tissue sections are captured in situ before being released into a solution-based library preparation and subsequent sequencing. A major advantage of these methods is that, thanks to the sequencing readout, transcripts can be quantified relatively easily. Performance is therefore evaluated by the number of detected unique transcripts rather than the number of detected genes.

In 2016, Spatial Transcriptomics (ST), the first method to demonstrate efficient capture and sequencing of transcripts in situ, was published (156). It used a glass slide covered with barcoded spots onto which mRNA transcripts from tissue sections were captured. Each well area on the slide contained 1,007 spots, each with a unique barcode. After cDNA synthesis on the surface, the cDNA molecules were released and library preparation was performed in solution, using in vitro transcription (IVT) for linear amplification. The first spot design had a resolution of 100 µm. With it, ~17,000 genes and around 10 million unique transcripts were detected in a tissue section of the main olfactory bulb (MOB) of the mouse brain. Both the array design and the library preparation were significantly changed in 2018, when ST was acquired by 10x Genomics. ST was renamed Visium, and the array resolution was improved to 55 µm. Library preparation time was shortened, mainly by replacing parts of the original ST library preparation with a protocol based on Tn5 transposase (157). Tn5 transposase fragments DNA and simultaneously incorporates library preparation adapters.
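
The data produced by capture-based methods such as ST can be summarized as a spot-by-gene count table. The Python sketch below assumes that reads have already been processed into hypothetical `(spatial_barcode, umi, gene)` records. It also assumes a lookup from spatial barcode to array coordinates is available. The barcodes and coordinates are placeholders. Real pipelines additionally handle alignment, barcode error correction and tissue detection.

```python
from collections import defaultdict

def spot_gene_counts(records, barcode_to_xy):
    """Count unique molecules per gene for every spot on the array.

    `records` are (spatial_barcode, umi, gene) tuples, one per read. Reads with
    an identical barcode, UMI and gene are counted once, and barcodes that are
    not on the array are ignored.
    """
    molecules = {rec for rec in records if rec[0] in barcode_to_xy}
    counts = defaultdict(lambda: defaultdict(int))
    for barcode, _umi, gene in molecules:
        counts[barcode_to_xy[barcode]][gene] += 1
    return {xy: dict(genes) for xy, genes in counts.items()}

barcode_to_xy = {"AAACCGTT": (1, 1), "TTGCATGA": (1, 2)}  # hypothetical array layout
records = [
    ("AAACCGTT", "UMI1", "gene_1"),
    ("AAACCGTT", "UMI1", "gene_1"),  # PCR duplicate of the read above
    ("AAACCGTT", "UMI2", "gene_1"),
    ("TTGCATGA", "UMI3", "gene_2"),
    ("GGGGGGGG", "UMI4", "gene_1"),  # barcode not on the array
]
print(spot_gene_counts(records, barcode_to_xy))
```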

Spatial resolution was further improved to 2 µm with the introduction of High-Definition Spatial Transcriptomics (HDST) (158). HDST used spatially barcoded bead arrays in which almost 3,000,000 individual barcoded beads were placed in 2 µm wells. The barcodes were decoded through repeated rounds of hybridization with labeled oligonucleotides, assigning each bead in the array a spatial address. Almost 20,000 genes and around 1,500,000 unique transcripts were found in a MOB tissue section. In addition, the method could detect subcellular gene expression in the nucleus.

Slide-Seq used a barcoded bead approach similar to HDST, but with lower resolution (10 µm) (159). The beads were packed in a monolayer on a rubber-coated glass slide, and the spatial array was decoded before mRNA capture. By combining Slide-Seq with scRNA-seq from the same tissue type, the spatial locations of specific cell types in the mouse brain could be reconstructed. The same research group released an updated protocol in 2020 that obtained ~10 times more unique transcripts than their initial protocol (160).

The latest development in the in situ capture-based field was published in late 2020. Deterministic Barcoding in Tissue for spatial omics sequencing (DBiT-seq) (161) used microfluidic chips with 10 µm wide parallel microchannels. First, a microfluidic chip was placed on top of the tissue section. A set of n oligo(dT)-containing barcoded probes (probe set A) was flowed through the channels over the tissue to capture mRNA and perform cDNA synthesis. A second microfluidic chip was then placed over the same tissue section, rotated 90 degrees relative to the first, to flow in and ligate a second set of m barcodes (probe set B). The channels of the second chip intersected those used for probe set A, resulting in a total of n × m barcoded addresses attached to cDNA molecules directly in the tissue. The barcoded cDNAs were then collected for library preparation in solution and sequenced. Using a mouse embryo as the model system, the authors reported ~2,000 genes and ~4,000 unique transcripts per spatial measurement.
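
The n × m addressing scheme can be sketched in Python. Every combination of one barcode from set A (a channel of the first chip) and one from set B (a channel of the second, rotated chip) identifies one pixel. For example, 50 channels in each direction would give 2,500 pixels. The barcode sequences are hypothetical. The assumption that a read contains barcode A immediately followed by barcode B is a simplification.

```python
from itertools import product

def make_pixel_lookup(barcodes_a, barcodes_b):
    """Map each combined barcode (A followed by B) to its (row, column) pixel.

    Barcode A is delivered by channel `row` of the first chip and barcode B by
    channel `column` of the second chip, rotated 90 degrees.
    """
    return {
        a + b: (row, column)
        for (row, a), (column, b) in product(enumerate(barcodes_a), enumerate(barcodes_b))
    }

def pixel_of(read_barcode, lookup):
    """Pixel of a read's combined barcode, or None if it matches no A/B pair."""
    return lookup.get(read_barcode)

barcodes_a = ["AACCGGTT", "CCGGTTAA", "GGTTAACC"]  # n = 3 hypothetical A barcodes
barcodes_b = ["ACGTACGT", "TGCATGCA"]              # m = 2 hypothetical B barcodes
lookup = make_pixel_lookup(barcodes_a, barcodes_b)
print(len(lookup), pixel_of("CCGGTTAATGCATGCA", lookup))  # 6 (1, 1)
```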

Nanostring released a method named GeoMx Digital Spatial Profiler in 2020 (162). It is target-based and can only process selected areas of a tissue section, and it combines an LCM-like concept with in situ hybridization in a closed system. In GeoMx, the tissue was stained with oligonucleotide probes complementary to the target genes. These probes contained a UV-photocleavable linker. The tissue section was then fluorescently stained with antibodies and imaged, and a region of interest (ROI) was selected based on the fluorescence image. The ROI was then exposed to UV light, which released the oligonucleotides from their targets. The oligonucleotides were collected by a microcapillary tube and transferred into a plate for library preparation. The process could be repeated up to 96 times to collect 96 different ROIs from a tissue section. One ROI could contain anything from a single cell up to ~5,000 cells. The oligonucleotides from targeted transcripts were then purified, sequenced and computationally mapped back to the ROIs in the tissue section. A ~1,400-plex RNA probe cocktail was used to study gene expression in tonsil and colon tissues.

Analysis of spatial transcriptomics data

Although there is no obvious and broadly accepted definition of a cell type (163), current efforts strive to find and define specific molecular markers for identifying cells (124). High-throughput scRNA-seq offers the opportunity to define not only cell types but also cell states, and to find transcriptional markers for these cell groups in an unbiased way (164). However, scRNA-seq data suffer from technical noise, variation in observations and missing data (165). As already mentioned, scRNA-seq used on its own also cannot reconstruct the spatial organization of cell types within tissues.

ISH or ISS data describe the spatial distribution of a smaller subset of genes in a tissue section. Combining such data with scRNA-seq data from dissociated cells of the same tissue type made it possible to map the high-dimensional, cell-type-labeled expression profiles of the dissociated cells to distinct spatial locations. This was done in the brain of a marine worm (166), the zebrafish embryo (167), the Drosophila embryo (168) and, finally, the mouse brain (151). In addition, NovoSpaRc reconstructed the spatial expression of genes in various tissue types using scRNA-seq data alone (169), assuming that cells close in space have similar gene expression profiles. However, the reconstructions were greatly improved by including at least two marker genes whose spatial locations were known from FISH.
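
The general idea of mapping dissociated cells onto spatial locations using a few marker genes can be illustrated with a deliberately simple Python sketch. Each cell is assigned to the location whose marker-gene profile it correlates with best (Pearson correlation). This is not the model used by any of the cited methods, which rely on more elaborate approaches. The gene names, locations and expression values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def map_cells_to_locations(cells, locations, marker_genes):
    """Assign each cell to the location whose marker profile it correlates with best.

    `cells` and `locations` map names to {gene: expression} dictionaries; only
    the shared `marker_genes` (e.g. genes measured by ISH/ISS) are compared.
    """
    assignment = {}
    for cell, profile in cells.items():
        x = [profile.get(g, 0.0) for g in marker_genes]
        assignment[cell] = max(
            locations,
            key=lambda loc: pearson(x, [locations[loc].get(g, 0.0) for g in marker_genes]),
        )
    return assignment

markers = ["gene_1", "gene_2", "gene_3"]  # hypothetical markers
locations = {
    "anterior": {"gene_1": 5, "gene_2": 1, "gene_3": 0},
    "posterior": {"gene_1": 0, "gene_2": 1, "gene_3": 6},
}
cells = {
    "cell_1": {"gene_1": 8, "gene_2": 2, "gene_3": 1},
    "cell_2": {"gene_1": 0, "gene_2": 3, "gene_3": 9},
}
print(map_cells_to_locations(cells, locations, markers))
# {'cell_1': 'anterior', 'cell_2': 'posterior'}
```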

However, for more heterogeneous tissue types, ISH and ISS data might not provide enough detail, and other in situ spatial methods may be used instead. It is now very common to combine in situ RNA capture methods with scRNA-seq to analyze the complex biological structures in tissues. Such analysis approaches have already been applied to ST (170–172), HDST (158), Slide-Seq (159) and DBiT-seq (161).

Finally, spatial analysis is not only about localizing cell types. Another important aspect is to reveal genes that are differentially expressed across space in the tissue. In 2018, SpatialDE (173) presented a model that used either ST or smFISH data to identify spatially variable genes within a tissue section. SpatialDE also grouped genes with similar spatial expression and, based on these groups, divided the tissue section into regions. This gave insight into possible histological expression patterns. The same year, Trendsceek (174) used the same type of data to propose an alternative model for defining spatial expression patterns in tissues. A third method, Splotch, was published a year later. It was built to analyze ST data from large numbers of tissue sections acquired in population-scale studies (170, 175). Splotch had the advantage of combining multiple sections to identify differentially expressed transcripts per tissue region and per experimental parameter, for example age and sex. In addition, by integrating spatial context and the information obtained with each measurement, Splotch could compute posterior estimates of gene expression. This increased the accuracy of expression estimates for genes detected at low abundance.
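
The Python sketch below illustrates what identifying a spatially variable gene means. It computes Moran's I, a classical spatial autocorrelation statistic, over spots on a grid. This is a deliberately simpler, swapped-in statistic, not the model used by SpatialDE, Trendsceek or Splotch. The binary neighbor weights (spots within `radius` of each other) and the toy 4 × 4 grid are assumptions. Values near +1 indicate that neighboring spots have similar expression, values near 0 indicate no spatial structure, and negative values indicate an alternating pattern.

```python
def morans_i(values, coords, radius=1.0):
    """Moran's I for one gene measured at spots with (x, y) coordinates.

    Uses binary weights: w_ij = 1 if spots i != j lie within `radius`
    of each other, else 0. I = (N / W) * sum_ij w_ij z_i z_j / sum_i z_i**2,
    where z are deviations from the mean and W is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    cross, total_weight = 0.0, 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = coords[i][0] - coords[j][0]
            dy = coords[i][1] - coords[j][1]
            if dx * dx + dy * dy <= radius * radius:
                cross += z[i] * z[j]
                total_weight += 1.0
    variance_sum = sum(d * d for d in z)
    if total_weight == 0 or variance_sum == 0:
        raise ValueError("need neighboring spots and non-constant expression")
    return (n / total_weight) * (cross / variance_sum)

# Toy 4 x 4 grid of spots (hypothetical coordinates in spot units).
grid = [(x, y) for y in range(4) for x in range(4)]
gradient = [float(x) for x, y in grid]           # expression rises left to right
checker = [float((x + y) % 2) for x, y in grid]  # alternating pattern
print(round(morans_i(gradient, grid), 2), round(morans_i(checker, grid), 2))  # 0.67 -1.0
```

A gene whose expression increases smoothly across the grid gives a clearly positive value, while a checkerboard pattern gives -1.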

Antibody-based analyses

Analysis of nucleic acids has enabled detailed high-throughput observations of biological mechanisms in cells and tissues. However, proteins perform the majority of a cell’s functional and structural tasks, in both bacteria and eukaryotes. To do so efficiently, proteins exist in many more versions than those directly encoded by the nucleic acid sequence, since they also undergo extensive post-translational modifications. Like nucleic acids, proteins are used in many research applications. As already mentioned, enzymes are one such family of proteins. They are used, for example, to amplify the genetic material of a cell, which is what enables the systematic study of the DNA content of a single cell. Antibodies are another class of proteins used in research, often to identify and localize other proteins and targets. The following section briefly describes a limited selection of significant antibody-based techniques and applications. Due to the scope of this thesis, the focus will be on monoclonal antibody-based analysis.

Affinity

The affinity between an antibody and its target, the antigen, is highly dependent on reaction conditions. These include ionic strength, pH, temperature, incubation time with the antigen and the antibody concentrations used (176). All of them must be optimized for the specific antibody-antigen interaction to occur and remain strong. In immunoassays, the affinity between antibodies and their antigens is used in the protein detection and quantification steps (177). Common immunoassays include detection of proteins in bulk (e.g. enzyme-linked immunosorbent assay), detection of proteins in tissues (immunohistochemistry or immunofluorescence microscopy) and detection of proteins in cell suspensions (flow cytometry).

Immunohistochemistry

Immunohistochemistry (IHC) encompasses methods used to detect and localize antigens in tissues using antibodies that can be visualized with a microscope (178). In the 1930s, John Marrack detected typhus and cholera using a red stain (179). In 1941, Albert Coons introduced fluorescently labeled antibodies to detect Streptococcus pneumoniae in tissue sections (180). This was the first example of immunofluorescence (IF). However, both IHC and IF suffered from low quality and lack of consistency until the adoption of monoclonal antibodies (181). Today, IHC is a common tool in clinical diagnostics and has been demonstrated to perform robustly in many tissues processed with various preservation techniques (178). Interpretation of IHC results, on the other hand, is still subjective and based on pathologist experience (182). More recently, antibodies have been tagged with DNA instead of using a fluorescent or colorimetric readout to detect antibody-antigen binding. This was first demonstrated in immuno-PCR by Takeshi Sano in 1992 (183), in which the attached DNA oligonucleotide could be amplified and detected by gel electrophoresis. Today, antibodies are DNA-barcoded with unique sequences and combined with massively parallel sequencing (184). This can improve the sensitivity of antigen detection, vastly increase detection throughput and enable absolute quantification at single-molecule resolution.

Multiplex protein measurements in tissues

As in the spatial transcriptomics field, scientists in the field of spatial proteomics strive to understand tissue complexity, detect and localize cell types and study the protein distribution within tissues. This is a challenging task due to the number of different proteins present in a cell and their dynamic range, which covers roughly seven orders of magnitude (185). Additionally, simultaneous protein detection in tissue sections is often achieved by IF. There, spectral and spatial overlap of the fluorophores limits the number of antibodies that can be imaged simultaneously to only a few (186). To improve multiplexing, several strategies were first implemented (187–189). These deactivated, removed and/or decolorized the fluorescence signal between repeated rounds of imaging. However, faster and more reliable techniques, based on various approaches, were developed during the 2010s.

Instead of labeling antibodies with fluorophores, Giesen et al. tagged antibodies with heavy metal ions of defined atomic mass. This allowed simultaneous spatial detection of 32 proteins in a human breast cancer tissue sample (190). After antibody labeling, the sample was ablated spot by spot with a UV laser, and the ablated material was analyzed in a CyTOF mass cytometer. In the CyTOF mass cytometer, the metal tags are ionized and separated by their mass-to-charge ratio. They are finally detected using a time-of-flight (TOF) mass spectrometer (MS), where each ion can be identified based on this ratio (191).
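
The time-of-flight principle can be illustrated with basic physics in Python. All ions are accelerated through the same voltage U and thus receive the same kinetic energy zeU. Their velocity, and therefore their flight time over a drift length L, depends on the mass-to-charge ratio: t = L·sqrt(m / (2zeU)). The 20 kV acceleration voltage, the 1 m drift length and the two isotope masses are illustrative assumptions, not parameters of the CyTOF instrument.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # C
DALTON = 1.66053906660e-27           # kg (unified atomic mass unit)

def flight_time(mass_da, charge=1, accel_voltage=20_000.0, drift_length=1.0):
    """Flight time in seconds of an ion through a field-free drift region.

    Every ion gains kinetic energy z*e*U, so 1/2*m*v**2 = z*e*U and
    t = L / v = L * sqrt(m / (2*z*e*U)).
    """
    mass_kg = mass_da * DALTON
    velocity = math.sqrt(2 * charge * ELEMENTARY_CHARGE * accel_voltage / mass_kg)
    return drift_length / velocity

# Two hypothetical metal-isotope tags (masses in Da).
for mass in (141, 169):
    print(mass, f"{flight_time(mass) * 1e6:.2f} microseconds")
```

Because the flight time scales with the square root of the mass, an ion four times heavier arrives twice as late. This is what allows metal tags of different masses to be resolved.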

A similar approach was used in Multiplexed Ion Beam Imaging (MIBI) (192), but in MIBI the antibodies were tagged with isotopic metal reporters. After antibody labeling, the tissue surface was subjected to an ion beam, which released the metal reporters as secondary ions. The secondary ions were then detected in a magnetic sector MS, which, in contrast to a TOF instrument, separates ions using a magnetic field. MIBI reported improved throughput, with simultaneous analysis of up to 100 targets.

Disadvantages of spatial MS-based techniques include cumbersome operating procedures and expensive equipment that is not easily accessible to many labs. An approach based on more standardized reagents and instruments, combining fluorescence microscopy with DNA-barcoded antibodies, was presented in 2018 under the name CO-Detection by indEXing (CODEX) (193). In CODEX, antibodies were conjugated to specially designed oligonucleotide duplexes with a 5' overhang. The duplexes differed in length between antibodies, which allowed the antibodies to be visualized one after another using fluorescently labeled dNTPs: in each round, the 5' overhang was filled in by one nucleotide, and an antibody became visible once the labeled dNTP was incorporated into its duplex. The fluorophore was then removed and a new round of polymerization could begin. CODEX was used to visualize 30 targets in the mouse spleen and is now commercially available.
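
The cyclic read-out described above can be illustrated with a toy simulation. This is a minimal sketch under simplifying assumptions, not the published CODEX chemistry: each hypothetical antibody ("Ab1" to "Ab4") carries an overhang encoded as a string in which "-" marks an unlabeled fill-in step and "*" marks the step at which the labeled nucleotide is incorporated, and exactly one nucleotide is added per cycle.

```python
from typing import Dict, List

# Hypothetical overhang designs (NOT the published CODEX sequences):
# "-" = an unlabeled nucleotide is filled in during that cycle,
# "*" = the fluorescently labeled nucleotide is incorporated in that cycle.
OVERHANGS: Dict[str, str] = {
    "Ab1": "*",
    "Ab2": "-*",
    "Ab3": "--*",
    "Ab4": "---*",
}


def simulate_cycles(overhangs: Dict[str, str]) -> List[List[str]]:
    """Return, per cycle, the antibodies that carry a fluorophore in that cycle."""
    n_cycles = max(len(o) for o in overhangs.values())
    filled = {ab: 0 for ab in overhangs}  # nucleotides filled in so far
    visible_per_cycle: List[List[str]] = []
    for _ in range(n_cycles):
        visible = []
        for ab, overhang in overhangs.items():
            pos = filled[ab]
            if pos < len(overhang):
                filled[ab] += 1  # the polymerase adds exactly one nucleotide
                if overhang[pos] == "*":
                    visible.append(ab)
        visible_per_cycle.append(sorted(visible))
        # Here the tissue would be imaged and the fluorophores removed
        # before the next extension cycle.
    return visible_per_cycle


if __name__ == "__main__":
    for cycle, antibodies in enumerate(simulate_cycles(OVERHANGS), start=1):
        print(f"cycle {cycle}: {antibodies}")
```

With these toy designs, each antibody lights up in a different cycle, so the overhang design alone determines when an antibody is imaged.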

NanoString's GeoMx Digital Spatial Profiler was mentioned above for targeted spatial RNA profiling of manually selected regions of interest (ROIs) in a tissue section. The same instrument can also profile proteins spatially, using the same UV-cleavable linker technique: with antibodies conjugated to this linker, 44 proteins were detected in multiple 100 µm ROIs (162).

Combined spatial RNA and antibody-based measurements

Traditionally, the transcriptome and proteome have been studied in parallel on biological replicates, and rarely on the same specimen. This is mostly due to the different characteristics of RNA and protein molecules, which differ in how they are extracted, how their signals are amplified and how the data are analyzed. However, much could be gained by integrating and co-analyzing these two omics layers in the same cell or tissue. For example, linking them could give insight into how gene expression influences phenotypic variation, and could potentially improve and simplify quantification of the proteome (194).

Early multi-omics methods include fluorescence imaging of proteins and mRNA in E. coli (195) and low-plex, qPCR-based quantification of DNA, RNA and proteins in single cells from a cell culture (196). A few years later, the qPCR-based approach was scaled up in the number of single cells and to 75 mRNA/protein targets (197). Both studies reported a low correlation between mRNA and protein expression in single cells. In 2017, Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq) was presented (198). CITE-seq combined DNA-barcoded antibodies targeting cell surface proteins with droplet-based scRNA-seq. By sequencing both the scRNA-seq material and 13 different antibody-derived oligonucleotide tags, CITE-seq obtained an integrated view of the transcriptome and protein expression at single-cell resolution.
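
Because the antibody tags are read out by sequencing, protein abundance becomes a counting problem. The sketch below shows one plausible way to collapse tag reads into a cell-by-protein table; it assumes reads have already been parsed into (cell barcode, UMI, tag sequence) tuples, and the barcodes and protein names are hypothetical, not the CITE-seq panel or its software.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Set, Tuple

# Hypothetical antibody barcodes mapped to hypothetical protein names.
ADT_BARCODES: Dict[str, str] = {
    "ACGTACGTAC": "ProteinA",
    "TTGCAAGGCT": "ProteinB",
}

Read = Tuple[str, str, str]  # (cell_barcode, umi, antibody_tag_sequence)


def count_adts(reads: Iterable[Read]) -> Dict[str, Dict[str, int]]:
    """Collapse antibody-tag reads into a cell x protein UMI count table."""
    seen: Set[Tuple[str, str, str]] = set()
    counts: DefaultDict[str, DefaultDict[str, int]] = defaultdict(lambda: defaultdict(int))
    for cell, umi, tag in reads:
        protein = ADT_BARCODES.get(tag)
        if protein is None:
            continue  # tag sequence not in the panel (e.g. a sequencing error)
        key = (cell, protein, umi)
        if key in seen:
            continue  # PCR duplicate of a molecule that was already counted
        seen.add(key)
        counts[cell][protein] += 1
    return {cell: dict(per_protein) for cell, per_protein in counts.items()}


if __name__ == "__main__":
    reads = [
        ("CELL1", "UMI1", "ACGTACGTAC"),
        ("CELL1", "UMI1", "ACGTACGTAC"),  # duplicate, counted once
        ("CELL1", "UMI2", "ACGTACGTAC"),
        ("CELL1", "UMI3", "TTGCAAGGCT"),
        ("CELL2", "UMI1", "ACGTACGTAC"),
        ("CELL2", "UMI9", "NNNNNNNNNN"),  # unknown tag, skipped
    ]
    print(count_adts(reads))
```

Counting unique molecular identifiers (UMIs) rather than raw reads keeps PCR amplification from inflating the apparent protein abundance.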

Integrated transcriptomic and multiplexed protein in situ methods are needed to capture the functional complexity of tissues. In 2018, Schulz et al. presented a method combining RNAscope and imaging MS to simultaneously detect proteins and transcripts in tissues (199). They first hybridized metal-labeled probes to the branched structure created with RNAscope. After the RNAscope staining, metal-labeled antibodies were added to the same tissue, and both RNA and antibody signals were then co-detected by mass cytometry. By simultaneously detecting 16 targets in breast cancer tissues, they confirmed that mRNA-to-protein correlations differed between the single-cell level and the cell population level.
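
Why the level of aggregation matters can be shown with a small, entirely hypothetical dataset. In the sketch below, mRNA and protein levels are anti-correlated between cells within each population, yet the population means rise together, so the same measurements give opposite conclusions depending on whether cells or population averages are correlated.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical (mRNA, protein) levels per cell, grouped by cell population.
CELLS: Dict[str, List[Tuple[float, float]]] = {
    "population_1": [(0, 4), (4, 0)],
    "population_2": [(1, 5), (5, 1)],
    "population_3": [(2, 6), (6, 2)],
}


def single_cell_correlation(cells: Dict[str, List[Tuple[float, float]]]) -> float:
    """Correlate mRNA and protein across all individual cells."""
    pairs = [pair for group in cells.values() for pair in group]
    return pearson([m for m, _ in pairs], [p for _, p in pairs])


def population_correlation(cells: Dict[str, List[Tuple[float, float]]]) -> float:
    """Correlate mRNA and protein across population means."""
    mrna_means = [sum(m for m, _ in g) / len(g) for g in cells.values()]
    protein_means = [sum(p for _, p in g) / len(g) for g in cells.values()]
    return pearson(mrna_means, protein_means)


if __name__ == "__main__":
    print(f"single-cell r = {single_cell_correlation(CELLS):.2f}")   # about -0.71
    print(f"population r  = {population_correlation(CELLS):.2f}")    # 1.00
```

The example is constructed to be extreme; it only illustrates that correlations computed on averages need not reflect those between individual cells.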

In 2020, two different methods were presented for combined spatial transcriptomics and protein measurements in situ. Since their transcriptomic profiling components have already been presented in this thesis, the following paragraph focuses on the results obtained with the combined approach.

The first commercially available method enabling low-plex protein measurements combined with spatial RNA-seq was presented by 10X Genomics in early 2020. Here, two-plex IF, instead of traditional histology, was combined with spatial transcriptomics (200), although no peer-reviewed data have been made available yet. Second, DBiT-seq enabled transcriptome-wide spatial analysis in mouse embryos, as described above. In addition, Liu et al. performed a co-measurement experiment with 22 protein targets by staining the same tissue section with DNA-barcoded antibodies (161). The DNA tag contained both a unique barcode and a polyadenylated tail, so that the tags could be processed and prepared for sequencing in the same way as the spatially captured mRNA. Using this approach, they analyzed mRNA-to-protein correlations in 13 anatomical regions of the mouse embryo.
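
Because the antibody tags carry a poly(A) tail, they are captured and processed together with the mRNA, and the two molecule types have to be told apart after sequencing. The sketch below is a simplified, hypothetical classifier: it assumes the read insert starts with a 10-nt antibody barcode followed directly by the poly(A) stretch, and the barcodes and protein names are illustrative, not those used by Liu et al.

```python
from typing import Dict, Optional, Tuple

# Hypothetical antibody barcodes; not the sequences used in DBiT-seq.
TAG_BARCODES: Dict[str, str] = {
    "AACCGGTTAA": "ProteinA",
    "GGTTAACCGG": "ProteinB",
}
BARCODE_LEN = 10


def classify_insert(insert: str, min_polya: int = 8) -> Tuple[str, Optional[str]]:
    """Classify a captured, polyadenylated insert as antibody tag or mRNA.

    Assumed (simplified) tag layout: <antibody barcode><poly(A) tail>.
    Anything that does not match this layout is treated as mRNA and
    would be aligned to the genome downstream.
    """
    barcode = insert[:BARCODE_LEN]
    tail = insert[BARCODE_LEN:BARCODE_LEN + min_polya]
    if barcode in TAG_BARCODES and tail == "A" * min_polya:
        return ("protein", TAG_BARCODES[barcode])
    return ("mrna", None)


if __name__ == "__main__":
    print(classify_insert("AACCGGTTAA" + "A" * 20))        # ('protein', 'ProteinA')
    print(classify_insert("ATGGCTAGCTAGGCTA" + "A" * 20))  # ('mrna', None)
```

Requiring both a known barcode and the poly(A) stretch reduces the chance that an mRNA fragment is misread as an antibody tag.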

Finally, NanoString's GeoMx Digital Spatial Profiler uses IF to visualize tissue sections and guide ROI selection. This combination of IF with up to four antibodies and multiplexed RNA targeting has been used in various tissue types to study RNA abundance (201–203), but less attention has been given to quantifying the IF signals and using protein expression as part of the analysis pipelines. Since RNA targets and DNA-barcoded proteins cannot be analyzed simultaneously on the current platform, the manufacturer provides a setup for performing protein measurements on adjacent tissue sections (204).

Present investigations

This thesis is based on two research papers, both of which build on applications of massively parallel sequencing. Paper I shows how 16S sequencing can be used to draw clinically relevant conclusions about microbial communities in a pediatric population, while Paper II describes a technical development for high-throughput spatial RNA- and antibody-based sequencing. Together, these two papers provide a foundation for further research towards spatial host-microbiome profiling.

Paper I - Analysis of the aerodigestive microbiome in pediatric lung transplant recipients

The aerodigestive tract has gained interest in the microbiome field because its organs are interconnected and close to each other, yet differ in their biochemical and physical properties. This system is inhabited by a wide range of bacteria, each species specialized to survive in its own niche (205). Still, organs in the aerodigestive tract share microbes (206–208), but whether their presence at the different sites results from microbial migration within the system is not fully understood. It is generally accepted that the oral cavity seeds the lung with microbes through microaspiration and inhalation (209–211), but findings also suggest seeding from the oral cavity to the stomach by swallowing (210) and from the stomach to the lungs by gastroesophageal reflux (212, 213).

Gastrointestinal motility, the movement of food through the digestive tract, is of interest because irregular muscle movements (dysmotility) in the gastrointestinal tract have been shown to be a possible predictor of respiratory infections (214). A particularly vulnerable patient group is children with lung transplants, the majority of whom suffer from gastric dysmotility and many of whom experience lung rejection (215–217). By investigating the abundance and diversity of the microbial communities in gastric, bronchoalveolar lavage (BAL) and oropharyngeal samples from pediatric lung transplant recipients and non-lung transplant patients, we aimed to explore the possible effect of dysmotility on the microbiome at these three sites.

Samples were collected from 23 lung transplant recipients and 98 non-lung transplant patients (controls). 16S sequencing data were analyzed using QIIME2 (218) and DADA2 (116) to generate OTUs, which were taxonomically classified using the Greengenes database (219). Patients also underwent a gastric emptying scan (GES), which measures how quickly food leaves the stomach.

Lung transplant recipients had significantly lower microbial diversity at the gastric and oropharyngeal sites than controls (Figure 1a). Among BAL samples, levels of microbial diversity clearly separated patients into two groups. There was also less microbial overlap between the gastric fluid and BAL microbial populations in transplant recipients than in controls. The predominant taxonomic family in BAL samples from lung transplant recipients was Staphylococcaceae. In lung transplant recipients, gastric and oropharyngeal samples also showed a more heterogeneous microbial distribution, although the genera Streptococcus, Prevotella and Veillonella were common at both sites. A closer look at the BAL samples from lung transplant recipients revealed that patients with low-diversity communities dominated by a single genus also had an abnormal GES. In addition, lung transplant recipients with an abnormal GES had lower diversity in the gastric fluid than controls (Figure 1b). After stratification by GES status, average relative microbial abundances showed that lung transplant recipients were dominated by similar genera, independent of motility status (Figure 1c). Finally, we concluded that, regardless of GES status, lung transplant recipients had significantly less gastric-to-lung microbial exchange than controls.
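
The diversity comparisons above rest on two standard quantities: the Shannon diversity index, H = -Σ p_i log p_i over the relative abundances p_i of the taxa in a sample, and the rank-sum statistic used to compare groups. The sketch below computes both from scratch; the logarithm base is left as a parameter because the base used in Paper I is not stated here, and the example values are hypothetical.

```python
import math
from typing import Dict, Sequence


def shannon_index(counts: Dict[str, int], base: float = math.e) -> float:
    """Shannon diversity H = -sum(p_i * log(p_i)) over taxa with non-zero counts."""
    total = sum(counts.values())
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h


def rank_sum_u(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Mann-Whitney U statistic for group_a (equivalent to the Wilcoxon rank-sum test).

    Counts, over all pairs, how often a value in group_a exceeds one in
    group_b; ties count as one half.
    """
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


if __name__ == "__main__":
    print(shannon_index({"GenusA": 10, "GenusB": 10}))   # ln(2), about 0.693
    print(shannon_index({"GenusA": 20, "GenusB": 0}))    # 0.0, a single-taxon community
    print(rank_sum_u([1.2, 0.8, 1.0], [2.0, 2.5, 1.0]))  # 1.5
```

A community dominated by one genus has H close to zero, which is why monocommunities stand out as low-diversity samples. The p-value for U would in practice come from a library routine such as `scipy.stats.mannwhitneyu`.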

This study showed that lung transplant recipients in general had lower microbial diversity in gastric and oropharyngeal samples, and that this lack of diversity was most pronounced in patients with dysmotility. However, it was not determined how this decrease in diversity affected the lung. The fact that the microbial overlap between the gastric and lung microbiota was lower in lung transplant recipients, independent of dysmotility, raised the question of whether other factors in the gastric environment could damage the lungs. Because it was difficult to extract sufficient amounts of microbial DNA from BAL samples from patients in this rare clinical group, this study suffered from a small sample size. However, the study lays the groundwork for multicenter trials to assess the impact of dysmotility in lung transplant recipients and has, for the first time, provided microbial data from multiple sampling sites in pediatric lung transplant recipients.
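
Microbial overlap between two sampling sites can be quantified in several ways, and the metric used in Paper I is not specified in this summary. As an assumption, the sketch below uses the Jaccard index on the sets of taxa present in a gastric and a BAL sample, with an optional relative-abundance threshold for calling a taxon present; the genus names and counts are hypothetical.

```python
from typing import Dict, Set


def present_taxa(counts: Dict[str, int], min_rel_abundance: float = 0.0) -> Set[str]:
    """Taxa whose relative abundance in a sample exceeds min_rel_abundance."""
    total = sum(counts.values())
    if total == 0:
        return set()
    return {taxon for taxon, c in counts.items() if c / total > min_rel_abundance}


def jaccard_overlap(sample_a: Dict[str, int], sample_b: Dict[str, int],
                    min_rel_abundance: float = 0.0) -> float:
    """Shared taxa divided by all taxa seen in either sample (0 = none shared, 1 = identical)."""
    a = present_taxa(sample_a, min_rel_abundance)
    b = present_taxa(sample_b, min_rel_abundance)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    gastric = {"GenusA": 40, "GenusB": 30, "GenusC": 30}  # hypothetical counts
    bal = {"GenusA": 10, "GenusD": 90}                    # hypothetical counts
    print(jaccard_overlap(gastric, bal))       # 0.25
    print(jaccard_overlap(gastric, bal, 0.2))  # 0.0
```

The threshold matters: a taxon present at low abundance in one site can make two sites look more connected than a stricter presence cut-off would suggest.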

Figure 1. Characteristics of the aerodigestive microbiome in pediatric lung transplant recipients. Box-and-whisker representation of (a) the Shannon diversity index (H) in BAL, gastric fluid and oropharynx for lung transplant patients and control subjects and (b) the Shannon diversity index (H) in gastric fluid for lung transplant patients and control subjects, stratified by GES status. (c) Bar plots of the average relative abundances of genera in BAL, gastric fluid and oropharynx for lung transplant patients and control subjects, separated by GES status. Numbers in brackets denote the number of patients in each group. Only genera with an average relative abundance >10% are displayed in the legend. Statistical significance (Wilcoxon rank-sum test) markings are displayed in (a) and (b): 0.05
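
Panel (c) summarizes each group by its average relative abundance per genus and shows only genera above 10% in the legend. A minimal sketch of that summary, assuming relative abundances are computed per sample first and then averaged (a genus absent from a sample contributes zero), with hypothetical genus names and counts:

```python
from typing import Dict, List


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert read counts per genus into fractions of the sample total."""
    total = sum(counts.values())
    return {genus: c / total for genus, c in counts.items()}


def average_relative_abundance(samples: List[Dict[str, int]]) -> Dict[str, float]:
    """Mean relative abundance per genus across samples with at least one read."""
    rel = [relative_abundance(s) for s in samples if sum(s.values()) > 0]
    genera = set().union(*rel)
    return {g: sum(r.get(g, 0.0) for r in rel) / len(rel) for g in genera}


def legend_genera(avg: Dict[str, float], threshold: float = 0.10) -> List[str]:
    """Genera whose average relative abundance exceeds the threshold."""
    return sorted(g for g, v in avg.items() if v > threshold)


if __name__ == "__main__":
    group = [
        {"GenusA": 80, "GenusB": 20},
        {"GenusA": 50, "GenusB": 40, "GenusC": 10},
    ]
    avg = average_relative_abundance(group)  # GenusA 0.65, GenusB 0.30, GenusC 0.05
    print(legend_genera(avg))                # ['GenusA', 'GenusB']
```

Averaging per-sample fractions gives every patient equal weight, whereas pooling raw counts would let deeply sequenced samples dominate the group profile.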

Paper II - Method to spatially resolve the transcriptome and antibody-based proteome in a high throughput fashion

Gene expression is, as already established in this thesis, regulated at multiple levels, from transcription to protein degradation. Protein and RNA levels each carry specific information on gene function and cell state. Studying cells in their natural environment has been shown to be important for understanding, for example, organ development and cancer progression (126–129). The Spatial Transcriptomics (ST) method is limited in throughput and efficiency and was not originally designed for studying protein expression in situ. To overcome these limitations, Spatial Multi-Omics (SM-Omics) was developed. SM-Omics combines spatially resolved transcriptomics with antibody-based multiplexed protein detection in the same tissue section in situ. The protocol is compatible with both IF and DNA-barcoded antibody staining. In addition, because the protocol is automated, SM-Omics generates high-plex multi-omics characterizations of tissues in a high-throughput manner while exhibiting low technical variation.

SM-Omics uses standard ST glass slides (156), which are covered with barcoded probes. First, frozen tissue sections are placed on the glass slide. The tissue sections are then stained with hematoxylin and eosin (H&E), by IF or with DNA-barcoded antibodies. Lastly, the glass slides are loaded onto an Agilent Bravo robot, which performs the rest of the protocol. This includes the in situ reactions in which cDNA is synthesized, as well as all the library preparation steps needed to produce final libraries ready for sequencing. When DNA-barcoded antibodies are used to detect proteins in situ, the DNA barcodes are copied and spatially tagged at the same time as the cDNA is synthesized in situ and can, after amplification, be sequenced (Figure 2a).
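
Because cDNA and copied antibody barcodes receive the same spatial barcode from the array probe, both can be assigned to array positions in one pass after sequencing. The sketch below is a hypothetical illustration rather than the SM-Omics pipeline: it assumes a small barcode-to-coordinate table (in practice such a table accompanies the array design), tolerates one mismatch when matching spatial barcodes, discards ambiguous or unmatched reads, and treats gene names and antibody tags alike as features.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical spatial barcodes and (x, y) array coordinates.
SPOT_BARCODES: Dict[str, Tuple[int, int]] = {
    "AAAACCCC": (1, 1),
    "GGGGTTTT": (1, 2),
    "ACGTACGT": (2, 1),
}


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions; unequal lengths never match."""
    if len(a) != len(b):
        return max(len(a), len(b))
    return sum(x != y for x, y in zip(a, b))


def assign_spot(barcode: str, max_mismatches: int = 1) -> Optional[Tuple[int, int]]:
    """Return the unique spot within max_mismatches, or None if unmatched or ambiguous."""
    hits = [xy for bc, xy in SPOT_BARCODES.items() if hamming(barcode, bc) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


def count_per_spot(reads: Iterable[Tuple[str, str]]) -> Dict[Tuple[int, int], Dict[str, int]]:
    """Count features (genes or antibody tags) per spot from (spatial_barcode, feature) reads."""
    counts: Dict[Tuple[int, int], Dict[str, int]] = {}
    for barcode, feature in reads:
        spot = assign_spot(barcode)
        if spot is None:
            continue
        per_spot = counts.setdefault(spot, {})
        per_spot[feature] = per_spot.get(feature, 0) + 1
    return counts


if __name__ == "__main__":
    reads = [
        ("AAAACCCC", "GeneA"),
        ("AAAACCCT", "GeneA"),        # one mismatch, still assigned to (1, 1)
        ("GGGGTTTT", "Ab:ProteinA"),  # antibody tag shares the spatial barcode scheme
        ("TTTTTTTT", "GeneA"),        # no matching spot, discarded
    ]
    print(count_per_spot(reads))
```

Keeping genes and antibody tags in the same per-spot table is what makes spot-by-spot RNA and protein comparisons straightforward downstream.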

Using the automated SM-Omics protocol, we first performed spatial measurements in the mouse brain as a proof of concept. Gene expression in the mouse main olfactory bulb (MOB) agreed well with expression measurements from the Allen Brain Atlas reference. When validating the staining with fluorescently labeled antibodies, we found that our in situ reaction conditions were suitable for antibody IF staining in combination with detection of gene activity (Figure 2b). Expression of the protein NeuN in the mouse brain cortex was significantly correlated with expression of Rbfox3, the gene encoding NeuN (Figure 2c).
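
A correlation such as the one in Figure 2c can be computed per spot once IF intensity and mRNA counts are available for the same positions. This sketch assumes min-max scaling of the IF signal and a Spearman rank correlation with average ranks for ties; the scaling and correlation method actually used in Paper II may differ, and the values are hypothetical.

```python
import math
from typing import List, Sequence


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return math.nan
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    if_intensity = [120, 340, 560, 800]  # hypothetical IF signal per spot
    mrna_counts = [0, 2, 2, 5]           # hypothetical mRNA counts per spot
    print(spearman(min_max_scale(if_intensity), mrna_counts))  # about 0.95
```

Min-max scaling is a linear transform and does not change the correlation within one section; its value lies in putting sections with different overall staining intensities on a common scale before they are plotted or pooled.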

Next, we introduced staining with DNA-barcoded antibodies, given the aforementioned limitations of multiplexed IF measurements. Here, we used the mouse spleen as a model system to validate our approach, staining tissue sections with both IF and DNA-barcoded antibodies. Starting with two protein targets, F4/80, expressed in the red pulp, and IgD, expressed in the white pulp of the spleen, we obtained a specific spatial pattern from the barcoded antibodies that was significantly correlated with the corresponding IF intensities (Figure 2d). Finally, we combined DNA-barcoded antibodies with spatial transcriptomics on the same tissue section for six protein targets in the mouse spleen.

Figure 2. Highly multiplexed SM-Omics tissue patterns. (a) Frozen tissue sections are placed on an SM-Omics array, the tissues are stained with both IF and DNA-tagged antibodies and imaged, and in situ copying reactions are performed at the same time as cDNA is synthesized (I). Both the antibody tags and cDNAs are then used in the library preparation reactions and sequenced (II). Finally, spatial IF, antibody tag and gene expression patterns can be evaluated (III). (b) The neuronal target NeuN was stained by antibody IF together with DAPI, and the corresponding gene activity was labeled (cDNA), confirming the feasibility of the in situ reaction conditions for IF staining. ABA reference image for the same target, with the zoomed-in area (isocortex) labeled. (c) Correlation between scaled NeuN IF and the respective NeuN mRNA expression per tissue section (n=3). (d) Illustration of red and white pulp structures in splenic tissue, followed by spatial expression profiles of sequenced antibody tags and IF images in splenic tissue for F4/80, staining red pulp macrophages, and IgD, staining marginal zone B cells in the white pulp.

Concluding remarks

Technical developments in sequencing-based applications have improved our understanding of biological networks in cells and tissues. High-throughput technologies, such as scRNA-seq and spatial omics, pave the way for new discoveries directly applicable in translational research.

Since the development of ST in 2016, the resolution, throughput and efficiency of in situ-based techniques have greatly improved, and today they even enable high-definition, subcellular spatial RNA-seq measurements (158). Additionally, gene expression information is now coupled to quantitative information on tissue morphology, which is important because it allows high-resolution sequencing data to be linked to decades of data collected in the fields of histology and pathology. Nonetheless, for the new generation of technologies to make a translational impact, these methods need to be scalable and applicable to hundreds of different tissues. ST is currently the only method that has already been scaled to large specimen numbers: it has been used to process ~1,200 tissue sections to study ALS disease pathology (170). However, much of this work was performed manually, which encouraged us to develop the fully automated system presented in Paper II.

Morphology, cell density, and gene and protein expression are just four of the features that define a tissue type. Highly scalable methods also need to be adaptable enough to extract robust information from a myriad of tissues in order to serve the wider scientific community. To date, HDST (158), Visium (220) and ST have been used to study various tissue types (170, 172, 221–225). Although defining cell types powers initial studies of structure and function within tissues, studying subcellular interactions within individual cells is important for defining specific interaction mechanisms (226). Hence, both the resolution and the throughput of spatial methods matter.

In addition, mRNA abundances in a cell do not directly predict protein levels (227). Protein synthesis rates depend on the availability of the cell's own resources, both in time and space, and regulatory proteins and non-coding RNAs have been shown to play an important role in precisely regulating the translational machinery (228, 229). mRNA and protein degradation also influence the abundances of the respective molecules. In Paper II, we also aimed to give researchers a new way of searching for protein targets in tissues, stepping away from standard and usually low-throughput imaging-based approaches. SM-Omics uses DNA-barcoded antibodies to scale and combine spatial transcriptomics and spatial proteomics into the first user-friendly, fully sequencing-based technology. SM-Omics allows processing of up to 96 sequencing-ready libraries of high complexity in ~2 days, making it the first truly high-throughput platform for spatial multi-omics.

As part of the complex biological machinery, bacteria, whether present as natural symbionts or as pathogens, often play a key role in sending stimuli to the host. The influence of the microbiome is still not fully understood, but interactions between bacterial and host cells have been shown to affect conditions such as chronic inflammation, metabolic diseases, cancer and neurological illnesses (230). Studies in germ-free (GF) mice have demonstrated several ways in which bacteria matter for host health. For example, GF mice exhibit morbidities such as premature death, a lack of naturally occurring antibodies (231) and of essential vitamins (232), and greater susceptibility to pathogens (233). To understand whether changes in the microbiome are a response to a disease or a trigger of the disease itself, longitudinal, high-throughput spatial multi-omic studies are probably necessary.

The distinct characteristics of bacterial and mammalian cells complicate such efforts. Together, Papers I and II demonstrate the knowledge and experience needed to further develop methods, both experimental and computational, that would enable the study of spatial host-microbiome interactions in a complex environment. While RNA sequencing of single mammalian cells is fairly routine in a research setting, recent advances have also successfully demonstrated sequencing of individual gut microbes (234). In addition, multiple techniques already exist for determining the spatial organization of microbes in the mammalian gut (235, 236) and oral cavity (237). However, none of these directly address the impact of host-microbiome interactions. To bridge this gap, we are currently developing an approach for spatial host-microbiome sequencing that will, for the first time, enable a high-resolution multi-omics characterization of these complex interactions in situ.

Acknowledgments

A wise woman told me, when I first began my research studies, that doing research is like running a marathon: ‘Keep up an even pace throughout this project, there is no point in sprinting’. Although I am more of a sprinter, I have come back to this advice many times during the last three years. Luckily, I am also a heavy weightlifter, which has helped me a lot in surviving the academic jungle. However, I have not been alone in this, and for that I am and will always be deeply grateful.

To Joakim Lundeberg, my main supervisor and steady rock, who allows me to do research my way, and to Anders Andersson, my co-supervisor, for your support and interest. To Eric Alm, Aviv Regev and Ramnik Xavier, who all believed in my research and welcomed me to their labs. To my co-authors, for your effort and commitment to science.

I may be a princess and a padawan, but I am nothing without my queen and master. Not a single thing would have been possible without you.

For endless love and support from around the globe, mom and dad, family and friends.

THANK YOU!

To be continued...

References

1. C. E. Cleland, C. F. Chyba, Defining “life.” Orig. Life Evol. Biosph. 32, 387–393 (2002).

2. R. E. Ley, C. A. Lozupone, M. Hamady, R. Knight, J. I. Gordon, Worlds within worlds: evolution of the vertebrate gut microbiota. Nat. Rev. Microbiol. 6, 776–788 (2008).

3. K. R. Foster, J. Schluter, K. Z. Coyte, S. Rakoff-Nahoum, The evolution of the host microbiome as an ecosystem on a leash. Nature. 548, 43–51 (2017).

4. A. Leewenhoeck, An abstract of a letter from Mr. Anthony Leevvenhoeck at Delft, dated Sep. 17. 1683. Containing some microscopical observations, about animals in the scurf of the teeth, the substance call’d worms in the nose, the cuticula consisting of scales. Philosophical Transactions of the Royal Society of London. 14 (1684), pp. 568–574.

5. R. Dahm, Discovering DNA: Friedrich Miescher and the early years of nucleic acid research. Hum. Genet. 122, 565–581 (2008).

6. F. Miescher-Rüsch, Ueber die chemische Zusammensetzung der Eiterzellen.

7. R. E. Franklin, R. G. Gosling, Molecular configuration in sodium thymonucleate. Nature. 171, 740–741 (1953).

8. J. D. Watson, F. H. Crick, Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature. 171, 737–738 (1953).

9. O. T. Avery, C. M. Macleod, M. McCarty, Studies on the chemical nature of the substance inducing transformation of pneumococcal types: induction of transformation by a desoxyribonucleic acid fraction isolated from pneumococcus type III. J. Exp. Med. 79, 137–158 (1944).

10. A. D. Hershey, M. Chase, Independent functions of viral protein and nucleic acid in growth of bacteriophage. J. Gen. Physiol. 36, 39–56 (1952).

11. M. B. Gerstein, C. Bruce, J. S. Rozowsky, D. Zheng, J. Du, J. O. Korbel, O. Emanuelsson, Z. D. Zhang, S. Weissman, M. Snyder, What is a gene, post-ENCODE? History and updated definition. Genome Res. 17, 669–681 (2007).

12. C. R. Woese, G. E. Fox, Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proc. Natl. Acad. Sci. U. S. A. 74, 5088–5090 (1977).

13. H. B. Vickery, The origin of the word protein. Yale J. Biol. Med. 22, 387–393 (1950).

14. M. Heidelberger, O. T. Avery, The specific soluble substance of pneumococcus. Experimental Biology and Medicine. 20 (1923), pp. 434–435.

15. M. Heidelberger, O. T. Avery, The soluble specific substance of pneumococcus. Journal of Experimental Medicine. 40 (1924), pp. 301–317.

16. F. H. Crick, On protein synthesis. Symp. Soc. Exp. Biol. 12, 138–163 (1958).

17. F. Crick, Central Dogma of Molecular Biology. Nature. 227 (1970), pp. 561–563.

18. T. Fukaya, B. Lim, M. Levine, Enhancer Control of Transcriptional Bursting. Cell. 166, 358–368 (2016).

19. D. Baltimore, Viral RNA-dependent DNA Polymerase: RNA-dependent DNA Polymerase in Virions of RNA Tumour Viruses. Nature. 226 (1970), pp. 1209–1211.

20. H. M. Temin, S. Mizutani, Viral RNA-dependent DNA Polymerase: RNA-dependent DNA Polymerase in Virions of Rous Sarcoma Virus. Nature. 226 (1970), pp. 1211–1213.

21. I. M. Verma, G. F. Temple, H. Fan, D. Baltimore, In vitro Synthesis of DNA Complementary to Rabbit Reticulocyte 10S RNA. Nature New Biology. 235 (1972), pp. 163–167.

22. D. L. Kacian, S. Spiegelman, A. Bank, M. Terada, S. Metafora, L. Dow, P. A. Marks, In vitro Synthesis of DNA Components of Human Genes for Globins. Nature New Biology. 235 (1972), pp. 167–169.

23. I. R. Lehman, M. J. Bessman, E. S. Simms, A. Kornberg, Enzymatic synthesis of deoxyribonucleic acid. I. Preparation of substrates and partial purification of an enzyme from Escherichia coli. J. Biol. Chem. 233, 163–170 (1958).

24. K. Kleppe, E. Ohtsuka, R. Kleppe, I. Molineux, H. G. Khorana, Studies on polynucleotides: XCVI. Repair replication of short synthetic DNA’s as catalyzed by DNA polymerases. J. Mol. Biol. 56, 341–361 (1971).

25. R. K. Saiki, D. H. Gelfand, S. Stoffel, S. J. Scharf, R. Higuchi, G. T. Horn, K. B. Mullis, H. A. Erlich, Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science. 239, 487–491 (1988).

26. F. C. Lawyer, S. Stoffel, R. K. Saiki, S. Y. Chang, P. A. Landre, R. D. Abramson, D. H. Gelfand, High-level expression, purification, and enzymatic characterization of full-length Thermus aquaticus DNA polymerase and a truncated form deficient in 5' to 3' exonuclease activity. PCR Methods Appl. 2, 275–287 (1993).

27. A. Chien, D. B. Edgar, J. M. Trela, Deoxyribonucleic acid polymerase from the extreme thermophile Thermus aquaticus. J. Bacteriol. 127, 1550–1557 (1976).

28. R. Higuchi, G. Dollinger, P. Sean Walsh, R. Griffith, Simultaneous Amplification and Detection of Specific DNA Sequences. Bio/Technology. 10 (1992), pp. 413–417.

29. M. Chamberlin, J. McGrath, L. Waskell, New RNA polymerase from Escherichia coli infected with bacteriophage T7. Nature. 228, 227–231 (1970).

30. J. Eberwine, H. Yeh, K. Miyashiro, Y. Cao, S. Nair, R. Finnell, M. Zettel, P. Coleman, Analysis of gene expression in single live neurons. Proceedings of the National Academy of Sciences. 89 (1992), pp. 3010–3014.

31. E. Southern, K. Mir, M. Shchepinov, Molecular interactions on microarrays. Nat. Genet. 21, 5–9 (1999).

32. T. LaFramboise, Single nucleotide polymorphism arrays: a decade of biological, computational and technological advances. Nucleic Acids Res. 37, 4181–4193 (2009).

33. J. C. Venter, M. D. Adams, E. W. Myers, P. W. Li, R. J. Mural, G. G. Sutton, H. O. Smith, M. Yandell, C. A. Evans, R. A. Holt, J. D. Gocayne, P. Amanatides, R. M. Ballew, D. H. Huson, J. R. Wortman, Q. Zhang, C. D. Kodira, X. H. Zheng, L. Chen, M. Skupski, G. Subramanian, P. D. Thomas, J. Zhang, G. L. Gabor Miklos, C. Nelson, S. Broder, A. G. Clark, J. Nadeau, V. A. McKusick, N. Zinder, A. J. Levine, R. J. Roberts, M. Simon, C. Slayman, M. Hunkapiller, R. Bolanos, A. Delcher, I. Dew, D. Fasulo, M. Flanigan, L. Florea, A. Halpern, S. Hannenhalli, S. Kravitz, S. Levy, C. Mobarry, K. Reinert, K. Remington, J. Abu-Threideh, E. Beasley, K. Biddick, V. Bonazzi, R. Brandon, M. Cargill, I. Chandramouliswaran, R. Charlab, K. Chaturvedi, Z. Deng, V. Di Francesco, P. Dunn, K. Eilbeck, C. Evangelista, A. E. Gabrielian, W. Gan, W. Ge, F. Gong, Z. Gu, P. Guan, T. J. Heiman, M. E. Higgins, R. R. Ji, Z. Ke, K. A. Ketchum, Z. Lai, Y. Lei, Z. Li, J. Li, Y. Liang, X. Lin, F. Lu, G. V. Merkulov, N. Milshina, H. M. Moore, A. K. Naik, V. A. Narayan, B. Neelam, D. Nusskern, D. B. Rusch, S. Salzberg, W. Shao, B. Shue, J. Sun, Z. Wang, A. Wang, X. Wang, J. Wang, M. Wei, R. Wides, C. Xiao, C. Yan, A. Yao, J. Ye, M. Zhan, W. Zhang, H. Zhang, Q. Zhao, L. Zheng, F. Zhong, W. Zhong, S. Zhu, S. Zhao, D. Gilbert, S. Baumhueter, G. Spier, C. Carter, A. Cravchik, T. Woodage, F. Ali, H. An, A. Awe, D. Baldwin, H. Baden, M. Barnstead, I. Barrow, K. Beeson, D. Busam, A. Carver, A. Center, M. L. Cheng, L. Curry, S. Danaher, L. Davenport, R. Desilets, S. Dietz, K. Dodson, L. Doup, S. Ferriera, N. Garg, A. Gluecksmann, B. Hart, J. Haynes, C. Haynes, C. Heiner, S. Hladun, D. Hostin, J. Houck, T. Howland, C. Ibegwam, J. Johnson, F. Kalush, L. Kline, S. Koduru, A. Love, F. Mann, D. May, S. McCawley, T. McIntosh, I. McMullen, M. Moy, L. Moy, B. Murphy, K. Nelson, C. Pfannkoch, E. Pratts, V. Puri, H. Qureshi, M. Reardon, R. Rodriguez, Y. H. Rogers, D. Romblad, B. Ruhfel, R. Scott, C. Sitter, M. 
Smallwood, E. Stewart, R. Strong, E. Suh, R. Thomas, N. N. Tint, S. Tse, C. Vech, G. Wang, J. Wetter, S. Williams, M. Williams, S. Windsor, E. Winn-Deen, K. Wolfe, J. Zaveri, K. Zaveri, J. F. Abril, R. Guigó, M. J. Campbell, K. V. Sjolander, B. Karlak, A. Kejariwal, H. Mi, B. Lazareva, T. Hatton, A. Narechania, K. Diemer, A. Muruganujan, N. Guo, S. Sato, V. Bafna, S. Istrail, R. Lippert, R. Schwartz, B. Walenz, S. Yooseph, D. Allen, A. Basu, J. Baxendale, L. Blick, M. Caminha, J. Carnes-Stine, P. Caulk, Y. H. Chiang, M. Coyne, C. Dahlke, A. Mays, M. Dombroski, M. Donnelly, D. Ely, S. Esparham, C. Fosler, H. Gire, S. Glanowski, K. Glasser, A. Glodek, M. Gorokhov, K. Graham, B. Gropman, M. Harris, J. Heil, S. Henderson, J. Hoover, D. Jennings, C. Jordan, J. Jordan, J. Kasha, L. Kagan, C. Kraft, A. Levitsky, M. Lewis, X. Liu, J. Lopez, D. Ma, W. Majoros, J. McDaniel, S. Murphy, M. Newman, T. Nguyen, N. Nguyen, M. Nodell, S. Pan, J. Peck, M. Peterson, W. Rowe, R. Sanders, J. Scott, M. Simpson, T. Smith, A. Sprague, T. Stockwell, R. Turner, E. Venter, M. Wang, M. Wen, D. Wu, M. Wu, A. Xia, A. Zandieh, X. Zhu, The sequence of the human genome. Science. 291, 1304–1351 (2001).

34. E. S. Lander, L. M. Linton, B. Birren, C. Nusbaum, M. C. Zody, J. Baldwin, K. Devon, K. Dewar, M. Doyle, W. FitzHugh, R. Funke, D. Gage, K. Harris, A. Heaford, J. Howland, L. Kann, J. Lehoczky, R. LeVine, P. McEwan, K. McKernan, J. Meldrim, J. P. Mesirov, C. Miranda, W. Morris, J. Naylor, C. Raymond, M. Rosetti, R. Santos, A. Sheridan, C. Sougnez, Y. Stange-Thomann, N. Stojanovic, A. Subramanian, D. Wyman, J. Rogers, J. Sulston, R. Ainscough, S. Beck, D. Bentley, J. Burton, C. Clee, N. Carter, A. Coulson, R. Deadman, P. Deloukas, A. Dunham, I. Dunham, R. Durbin, L. French, D. Grafham, S. Gregory, T. Hubbard, S. Humphray, A. Hunt, M. Jones, C. Lloyd, A. McMurray, L. Matthews, S. Mercer, S. Milne, J. C. Mullikin, A. Mungall, R. Plumb, M. Ross, R. Shownkeen, S. Sims, R. H. Waterston, R. K. Wilson, L. W. Hillier,

J. D. McPherson, M. A. Marra, E. R. Mardis, L. A. Fulton, A. T. Chinwalla, K. H. Pepin, W. R. Gish, S. L. Chissoe, M. C. Wendl, K. D. Delehaunty, T. L. Miner, A. Delehaunty, J. B. Kramer, L. L. Cook, R. S. Fulton, D. L. Johnson, P. J. Minx, S. W. Clifton, T. Hawkins, E. Branscomb, P. Predki, P. Richardson, S. Wenning, T. Slezak, N. Doggett, J. F. Cheng, A. Olsen, S. Lucas, C. Elkin, E. Uberbacher, M. Frazier, R. A. Gibbs, D. M. Muzny, S. E. Scherer, J. B. Bouck, E. J. Sodergren, K. C. Worley, C. M. Rives, J. H. Gorrell, M. L. Metzker, S. L. Naylor, R. S. Kucherlapati, D. L. Nelson, G. M. Weinstock, Y. Sakaki, A. Fujiyama, M. Hattori, T. Yada, A. Toyoda, T. Itoh, C. Kawagoe, H. Watanabe, Y. Totoki, T. Taylor, J. Weissenbach, R. Heilig, W. Saurin, F. Artiguenave, P. Brottier, T. Bruls, E. Pelletier, C. Robert, P. Wincker, D. R. Smith, L. Doucette-Stamm, M. Rubenfield, K. Weinstock, H. M. Lee, J. Dubois, A. Rosenthal, M. Platzer, G. Nyakatura, S. Taudien, A. Rump, H. Yang, J. Yu, J. Wang, G. Huang, J. Gu, L. Hood, L. Rowen, A. Madan, S. Qin, R. W. Davis, N. A. Federspiel, A. P. Abola, M. J. Proctor, R. M. Myers, J. Schmutz, M. Dickson, J. Grimwood, D. R. Cox, M. V. Olson, R. Kaul, C. Raymond, N. Shimizu, K. Kawasaki, S. Minoshima, G. A. Evans, M. Athanasiou, R. Schultz, B. A. Roe, F. Chen, H. Pan, J. Ramser, H. Lehrach, R. Reinhardt, W. R. McCombie, M. de la Bastide, N. Dedhia, H. Blöcker, K. Hornischer, G. Nordsiek, R. Agarwala, L. Aravind, J. A. Bailey, A. Bateman, S. Batzoglou, E. Birney, P. Bork, D. G. Brown, C. B. Burge, L. Cerutti, H. C. Chen, D. Church, M. Clamp, R. R. Copley, T. Doerks, S. R. Eddy, E. E. Eichler, T. S. Furey, J. Galagan, J. G. Gilbert, C. Harmon, Y. Hayashizaki, D. Haussler, H. Hermjakob, K. Hokamp, W. Jang, L. S. Johnson, T. A. Jones, S. Kasif, A. Kaspryzk, S. Kennedy, W. J. Kent, P. Kitts, E. V. Koonin, I. Korf, D. Kulp, D. Lancet, T. M. Lowe, A. McLysaght, T. Mikkelsen, J. V. Moran, N. Mulder, V. J. Pollara, C. P. Ponting, G. Schuler, J. Schultz, G. Slater, A. F. Smit, E. Stupka, J. Szustakowki, D. Thierry-Mieg, J. Thierry-Mieg, L. Wagner, J. Wallis, R. Wheeler, A. Williams, Y. I. Wolf, K. H. Wolfe, S. P. Yang, R. F. Yeh, F. Collins, M. S. Guyer, J. Peterson, A. Felsenfeld, K. A. Wetterstrand, A. Patrinos, M. J. Morgan, P. de Jong, J. J. Catanese, K. Osoegawa, H. Shizuya, S. Choi, Y. J. Chen, J. Szustakowki, International Human Genome Sequencing Consortium, Initial sequencing and analysis of the human genome. Nature. 409, 860–921 (2001).

35. W. Gilbert, A. Maxam, The nucleotide sequence of the lac operator. Proc. Natl. Acad. Sci. U. S. A. 70, 3581–3584 (1973).

36. F. Sanger, S. Nicklen, A. R. Coulson, DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. U. S. A. 74, 5463–5467 (1977).

37. F. Sanger, A. R. Coulson, G. F. Hong, D. F. Hill, G. B. Petersen, Nucleotide sequence of bacteriophage λ DNA. J. Mol. Biol. 162, 729–773 (1982).

38. R. D. Fleischmann, M. D. Adams, O. White, R. A. Clayton, E. F. Kirkness, A. R. Kerlavage, C. J. Bult, J. F. Tomb, B. A. Dougherty, J. M. Merrick, Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science. 269, 496–512 (1995).

39. J. Shendure, H. Ji, Next-generation DNA sequencing. Nat. Biotechnol. 26, 1135–1145 (2008).

40. R. H. Deurenberg, E. Bathoorn, M. A. Chlebowicz, N. Couto, M. Ferdous, S. García-Cobos, A. M. D. Kooistra-Smid, E. C. Raangs, S. Rosema, A. C. M. Veloo, K. Zhou, A. W. Friedrich, J. W. A. Rossen, Reprint of “Application of next generation sequencing in clinical microbiology and infection prevention.” J. Biotechnol. 250, 2–10 (2017).

41. M. Ronaghi, S. Karamohamed, B. Pettersson, M. Uhlén, P. Nyrén, Real-time DNA sequencing using detection of pyrophosphate release. Anal. Biochem. 242, 84–89 (1996).

42. D. R. Bentley, S. Balasubramanian, H. P. Swerdlow, G. P. Smith, J. Milton, C. G. Brown, K. P. Hall, D. J. Evers, C. L. Barnes, H. R. Bignell, J. M. Boutell, J. Bryant, R. J. Carter, R. Keira Cheetham, A. J. Cox, D. J. Ellis, M. R. Flatbush, N. A. Gormley, S. J. Humphray, L. J. Irving, M. S. Karbelashvili, S. M. Kirk, H. Li, X. Liu, K. S. Maisinger, L. J. Murray, B. Obradovic, T. Ost, M. L. Parkinson, M. R. Pratt, I. M. J. Rasolonjatovo, M. T. Reed, R. Rigatti, C. Rodighiero, M. T. Ross, A. Sabot, S. V. Sankar, A. Scally, G. P. Schroth, M. E. Smith, V. P. Smith, A. Spiridou, P. E. Torrance, S. S. Tzonev, E. H. Vermaas, K. Walter, X. Wu, L. Zhang, M. D. Alam, C. Anastasi, I. C. Aniebo, D. M. D. Bailey, I. R. Bancarz, S. Banerjee, S. G. Barbour, P. A. Baybayan, V. A. Benoit, K. F. Benson, C. Bevis, P. J. Black, A. Boodhun, J. S. Brennan, J. A. Bridgham, R. C. Brown, A. A. Brown, D. H. Buermann, A. A. Bundu, J. C. Burrows, N. P. Carter, N. Castillo, M. Chiara E Catenazzi, S. Chang, R. Neil Cooley, N. R. Crake, O. O. Dada, K. D. Diakoumakos, B. Dominguez-Fernandez, D. J. Earnshaw, U. C. Egbujor, D. W. Elmore, S. S. Etchin, M. R. Ewan, M. Fedurco, L. J. Fraser, K. V. Fuentes Fajardo, W. Scott Furey, D. George, K. J. Gietzen, C. P. Goddard, G. S. Golda, P. A. Granieri, D. E. Green, D. L. Gustafson, N. F. Hansen, K. Harnish, C. D. Haudenschild, N. I. Heyer, M. M. Hims, J. T. Ho, A. M. Horgan, K. Hoschler, S. Hurwitz, D. V. Ivanov, M. Q. Johnson, T. James, T. A. Huw Jones, G.-D. Kang, T. H. Kerelska, A. D. Kersey, I. Khrebtukova, A. P. Kindwall, Z. Kingsbury, P. I. Kokko-Gonzales, A. Kumar, M. A. Laurent, C. T. Lawley, S. E. Lee, X. Lee, A. K. Liao, J. A. Loch, M. Lok, S. Luo, R. M. Mammen, J. W. Martin, P. G. McCauley, P. McNitt, P. Mehta, K. W. Moon, J. W. Mullens, T. Newington, Z. Ning, B. Ling Ng, S. M. Novo, M. J. O’Neill, M. A. Osborne, A. Osnowski, O. Ostadan, L. L. Paraschos, L. Pickering, A. C. Pike, A. C. Pike, D. Chris Pinkard, D. P. Pliskin, J. Podhasky, V. J. Quijano, C. Raczy, V. H. Rae, S. R. Rawlings, A. Chiva Rodriguez, P. M. Roe, J. Rogers, M. C. Rogert Bacigalupo, N. Romanov, A. Romieu, R. K. Roth, N. J. Rourke, S. T. Ruediger, E. Rusman, R. M. Sanches-Kuiper, M. R. Schenker, J. M. Seoane, R. J. Shaw, M. K. Shiver, S. W. Short, N. L. Sizto, J. P. Sluis, M. A. Smith, J. Ernest Sohna Sohna, E. J. Spence, K. Stevens, N. Sutton, L. Szajkowski, C. L. Tregidgo, G. Turcatti, S. Vandevondele, Y. Verhovsky, S. M. Virk, S. Wakelin, G. C. Walcott, J. Wang, G. J. Worsley, J. Yan, L. Yau, M. Zuerlein, J. Rogers, J. C. Mullikin, M. E. Hurles, N. J. McCooke, J. S. West, F. L. Oaks, P. L. Lundberg, D. Klenerman, R. Durbin, A. J. Smith, Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 456, 53–59 (2008).

43. J. M. Rothberg, W. Hinz, T. M. Rearick, J. Schultz, W. Mileski, M. Davey, J. H. Leamon, K. Johnson, M. J. Milgrew, M. Edwards, J. Hoon, J. F. Simons, D. Marran, J. W. Myers, J. F. Davidson, A. Branting, J. R. Nobile, B. P. Puc, D. Light, T. A. Clark, M. Huber, J. T. Branciforte, I. B. Stoner, S. E. Cawley, M. Lyons, Y. Fu, N. Homer, M. Sedova, X. Miao, B. Reed, J. Sabina, E. Feierstein, M. Schorn, M. Alanjary, E. Dimalanta, D. Dressman, R. Kasinskas, T. Sokolsky, J. A. Fidanza, E. Namsaraev, K. J. McKernan, A. Williams, G. T. Roth, J. Bustillo, An integrated semiconductor device enabling non-optical genome sequencing. Nature. 475, 348–352 (2011).

44. J. Shendure, G. J. Porreca, N. B. Reppas, X. Lin, J. P. McCutcheon, A. M. Rosenbaum, M. D. Wang, K. Zhang, R. D. Mitra, G. M. Church, Accurate multiplex polony sequencing of an evolved bacterial genome. Science. 309, 1728–1732 (2005).

45. M. L. Metzker, Sequencing technologies - the next generation. Nat. Rev. Genet. 11, 31–46 (2010).

46. A. Mortazavi, B. A. Williams, K. McCue, L. Schaeffer, B. Wold, Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods. 5, 621–628 (2008).

47. U. Nagalakshmi, Z. Wang, K. Waern, C. Shou, D. Raha, M. Gerstein, M. Snyder, The transcriptional landscape of the yeast genome defined by RNA sequencing. Science. 320, 1344–1349 (2008).

48. M. D. Adams, J. M. Kelley, J. D. Gocayne, M. Dubnick, M. H. Polymeropoulos, H. Xiao, C. R. Merril, A. Wu, B. Olde, R. F. Moreno, et al., Complementary DNA sequencing: expressed sequence tags and human genome project. Science. 252, 1651–1656 (1991).

49. V. E. Velculescu, L. Zhang, B. Vogelstein, K. W. Kinzler, Serial analysis of gene expression. Science. 270, 484–487 (1995).

50. T. Shiraki, S. Kondo, S. Katayama, K. Waki, T. Kasukawa, H. Kawaji, R. Kodzius, A. Watahiki, M. Nakamura, T. Arakawa, S. Fukuda, D. Sasaki, A. Podhajska, M. Harbers, J. Kawai, P. Carninci, Y. Hayashizaki, Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc. Natl. Acad. Sci. U. S. A. 100, 15776–15781 (2003).

51. J. Shendure, The beginning of the end for microarrays? Nat. Methods. 5, 585–587 (2008).

52. M. Bengtsson, A. Ståhlberg, P. Rorsman, M. Kubista, Gene expression profiling in single cells from the pancreatic islets of Langerhans reveals lognormal distribution of mRNA levels. Genome Res. 15, 1388–1392 (2005).

53. F. Tang, C. Barbacioru, Y. Wang, E. Nordman, C. Lee, N. Xu, X. Wang, J. Bodeau, B. B. Tuch, A. Siddiqui, K. Lao, M. A. Surani, mRNA-Seq whole-transcriptome analysis of a single cell. Nat. Methods. 6, 377–382 (2009).

54. K. Kurimoto, Y. Yabuta, Y. Ohinata, Y. Ono, K. D. Uno, R. G. Yamada, H. R. Ueda, M. Saitou, An improved single-cell cDNA amplification method for efficient high-density oligonucleotide microarray analysis. Nucleic Acids Res. 34, e42 (2006).

55. K. Kurimoto, Y. Yabuta, Y. Ohinata, M. Saitou, Global single-cell cDNA amplification to provide a template for representative high-density oligonucleotide microarray analysis. Nat. Protoc. 2, 739–752 (2007).

56. S. Islam, U. Kjällquist, A. Moliner, P. Zajac, J.-B. Fan, P. Lönnerberg, S. Linnarsson, Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Res. 21, 1160–1167 (2011).

57. T. Hashimshony, F. Wagner, N. Sher, I. Yanai, CEL-Seq: single-cell RNA-Seq by multiplexed linear amplification. Cell Rep. 2, 666–673 (2012).

58. D. Ramsköld, S. Luo, Y.-C. Wang, R. Li, Q. Deng, O. R. Faridani, G. A. Daniels, I. Khrebtukova, J. F. Loring, L. C. Laurent, G. P. Schroth, R. Sandberg, Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells. Nat. Biotechnol. 30, 777–782 (2012).

59. M. Hagemann-Jensen, C. Ziegenhain, P. Chen, D. Ramsköld, G.-J. Hendriks, A. J. M. Larsson, O. R. Faridani, R. Sandberg, Single-cell RNA counting at allele and isoform resolution using Smart-seq3. Nat. Biotechnol. 38, 708–714 (2020).

60. A. M. Streets, X. Zhang, C. Cao, Y. Pang, X. Wu, L. Xiong, L. Yang, Y. Fu, L. Zhao, F. Tang, Y. Huang, Microfluidic single-cell whole-transcriptome sequencing. Proc. Natl. Acad. Sci. U. S. A. 111, 7048–7053 (2014).

61. A. A. Pollen, T. J. Nowakowski, J. Shuga, X. Wang, A. A. Leyrat, J. H. Lui, N. Li, L. Szpankowski, B. Fowler, P. Chen, N. Ramalingam, G. Sun, M. Thu, M. Norris, R. Lebofsky, D. Toppani, D. W. Kemp 2nd, M. Wong, B. Clerkson, B. N. Jones, S. Wu, L. Knutsson, B. Alvarado, J. Wang, L. S. Weaver, A. P. May, R. C. Jones, M. A. Unger, A. R. Kriegstein, J. A. A. West, Low-coverage single-cell mRNA sequencing reveals cellular heterogeneity and activated signaling pathways in developing cerebral cortex. Nat. Biotechnol. 32, 1053–1058 (2014).

62. E. Z. Macosko, A. Basu, R. Satija, J. Nemesh, K. Shekhar, M. Goldman, I. Tirosh, A. R. Bialas, N. Kamitaki, E. M. Martersteck, J. J. Trombetta, D. A. Weitz, J. R. Sanes, A. K. Shalek, A. Regev, S. A. McCarroll, Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell. 161, 1202–1214 (2015).

63. A. M. Klein, L. Mazutis, I. Akartuna, N. Tallapragada, A. Veres, V. Li, L. Peshkin, D. A. Weitz, M. W. Kirschner, Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell. 161, 1187–1201 (2015).

64. G. X. Y. Zheng, J. M. Terry, P. Belgrader, P. Ryvkin, Z. W. Bent, R. Wilson, S. B. Ziraldo, T. D. Wheeler, G. P. McDermott, J. Zhu, M. T. Gregory, J. Shuga, L. Montesclaros, J. G. Underwood, D. A. Masquelier, S. Y. Nishimura, M. Schnall-Levin, P. W. Wyatt, C. M. Hindson, R. Bharadwaj, A. Wong, K. D. Ness, L. W. Beppu, H. J. Deeg, C. McFarland, K. R. Loeb, W. J. Valente, N. G. Ericson, E. A. Stevens, J. P. Radich, T. S. Mikkelsen, B. J. Hindson, J. H. Bielas, Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).

65. V. Svensson, R. Vento-Tormo, S. A. Teichmann, Exponential scaling of single-cell RNA-seq in the past decade. Nat. Protoc. 13, 599–604 (2018).

66. A. R. Wu, N. F. Neff, T. Kalisky, P. Dalerba, B. Treutlein, M. E. Rothenberg, F. M. Mburu, G. L. Mantalas, S. Sim, M. F. Clarke, S. R. Quake, Quantitative assessment of single-cell RNA-sequencing methods. Nat. Methods. 11, 41–46 (2014).

67. H. C. Fan, G. K. Fu, S. P. A. Fodor, Combinatorial labeling of single cells for gene expression cytometry. Science. 347, 1258367 (2015).

68. S. Bose, Z. Wan, A. Carr, A. H. Rizvi, G. Vieira, D. Pe’er, P. A. Sims, Scalable microfluidics for single-cell RNA printing and sequencing. Genome Biol. 16, 120 (2015).

69. T. M. Gierahn, M. H. Wadsworth 2nd, T. K. Hughes, B. D. Bryson, A. Butler, R. Satija, S. Fortune, J. C. Love, A. K. Shalek, Seq-Well: portable, low-cost RNA sequencing of single cells at high throughput. Nat. Methods. 14, 395–398 (2017).

70. J. Cao, J. S. Packer, V. Ramani, D. A. Cusanovich, C. Huynh, R. Daza, X. Qiu, C. Lee, S. N. Furlan, F. J. Steemers, A. Adey, R. H. Waterston, C. Trapnell, J. Shendure, Comprehensive single-cell transcriptional profiling of a multicellular organism. Science. 357, 661–667 (2017).

71. A. B. Rosenberg, C. M. Roco, R. A. Muscat, A. Kuchina, P. Sample, Z. Yao, L. T. Graybuck, D. J. Peeler, S. Mukherjee, W. Chen, S. H. Pun, D. L. Sellers, B. Tasic, G. Seelig, Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding. Science. 360, 176–182 (2018).

72. T. Kivioja, A. Vähärautio, K. Karlsson, M. Bonke, M. Enge, S. Linnarsson, J. Taipale, Counting absolute numbers of molecules using unique molecular identifiers. Nat. Methods. 9, 72–74 (2011).

73. S. Islam, A. Zeisel, S. Joost, G. La Manno, P. Zajac, M. Kasper, P. Lönnerberg, S. Linnarsson, Quantitative single-cell RNA-seq with unique molecular identifiers. Nat. Methods. 11, 163–166 (2014).

74. I. Kinde, J. Wu, N. Papadopoulos, K. W. Kinzler, B. Vogelstein, Detection and quantification of rare mutations with massively parallel sequencing. Proc. Natl. Acad. Sci. U. S. A. 108, 9530–9535 (2011).

75. Y. Kukita, R. Matoba, J. Uchida, T. Hamakawa, Y. Doki, F. Imamura, K. Kato, High-fidelity target sequencing of individual molecules identified using barcode sequences: de novo detection and absolute quantitation of mutations in plasma cell-free DNA from cancer patients. DNA Res. 22, 269–277 (2015).

76. L. G. Wayne, W. E. C. Moore, E. Stackebrandt, O. Kandler, R. R. Colwell, M. I. Krichevsky, H. G. Truper, R. G. E. Murray, P. A. D. Grimont, D. J. Brenner, M. P. Starr, L. H. Moore, Report of the Ad Hoc Committee on Reconciliation of Approaches to Bacterial Systematics. International Journal of Systematic and Evolutionary Microbiology. 37, 463–464 (1987).

77. E. J. Stewart, Growing unculturable bacteria. J. Bacteriol. 194, 4151–4160 (2012).

78. J.-C. Lagier, S. Edouard, I. Pagnier, O. Mediannikov, M. Drancourt, D. Raoult, Current and past strategies for bacterial culture in clinical microbiology. Clin. Microbiol. Rev. 28, 208–236 (2015).

79. J. M. Janda, S. L. Abbott, 16S rRNA gene sequencing for bacterial identification in the diagnostic laboratory: pluses, perils, and pitfalls. J. Clin. Microbiol. 45, 2761–2764 (2007).

80. J. B. Patel, 16S rRNA gene sequencing for bacterial pathogen identification in the clinical laboratory. Mol. Diagn. 6, 313–321 (2001).

81. C. R. Woese, Bacterial evolution. Microbiol. Rev. 51, 221–271 (1987).

82. E. Stackebrandt, B. M. Goebel, Taxonomic Note: A Place for DNA-DNA Reassociation and 16S rRNA Sequence Analysis in the Present Species Definition in Bacteriology. International Journal of Systematic and Evolutionary Microbiology. 44, 846–849 (1994).

83. S. Kusunoki, T. Ezaki, Proposal of Mycobacterium peregrinum sp. nov., nom. rev., and elevation of Mycobacterium chelonae subsp. abscessus (Kubica et al.) to species status: Mycobacterium abscessus comb. nov. Int. J. Syst. Bacteriol. 42, 240–245 (1992).

84. B. Springer, E. C. Böttger, P. Kirschner, R. J. Wallace Jr, Phylogeny of the Mycobacterium chelonae-like organism based on partial sequencing of the 16S rRNA gene and proposal of Mycobacterium mucogenicum sp. nov. Int. J. Syst. Bacteriol. 45, 262–267 (1995).

85. G. J. Olsen, D. J. Lane, S. J. Giovannoni, N. R. Pace, D. A. Stahl, Microbial ecology and evolution: a ribosomal RNA approach. Annu. Rev. Microbiol. 40, 337–365 (1986).

86. R. R. Gutell, B. Weiser, C. R. Woese, H. F. Noller, Comparative anatomy of 16-S-like ribosomal RNA. Prog. Nucleic Acid Res. Mol. Biol. 32, 155–216 (1985).

87. Y. W. Tang, N. M. Ellis, M. K. Hopkins, D. H. Smith, D. E. Dodge, D. H. Persing, Comparison of phenotypic and genotypic techniques for identification of unusual aerobic pathogenic gram-negative bacilli. J. Clin. Microbiol. 36, 3674–3679 (1998).

88. C. R. Woese, R. Gutell, R. Gupta, H. F. Noller, Detailed analysis of the higher-order structure of 16S-like ribosomal ribonucleic acids. Microbiol. Rev. 47, 621–669 (1983).

89. D. J. Lane, B. Pace, G. J. Olsen, D. A. Stahl, M. L. Sogin, N. R. Pace, Rapid determination of 16S ribosomal RNA sequences for phylogenetic analyses. Proc. Natl. Acad. Sci. U. S. A. 82, 6955–6959 (1985).

90. E. C. Böttger, Rapid determination of bacterial ribosomal RNA sequences by direct sequencing of enzymatically amplified DNA. FEMS Microbiol. Lett. 53, 171–176 (1989).

91. U. Edwards, T. Rogall, H. Blöcker, M. Emde, E. C. Böttger, Isolation and direct complete nucleotide determination of entire genes. Characterization of a gene coding for 16S ribosomal RNA. Nucleic Acids Res. 17, 7843–7853 (1989).

92. N. Larsen, G. J. Olsen, B. L. Maidak, M. J. McCaughey, R. Overbeek, T. J. Macke, T. L. Marsh, C. R. Woese, The ribosomal database project. Nucleic Acids Res. 21, 3021–3023 (1993).

93. National Center for Biotechnology Information (NCBI) [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information (1988), (available at https://www.ncbi.nlm.nih.gov/).

94. Michigan State University, Ribosomal Database Project. RDP (2020), (available at http://rdp.cme.msu.edu/).

95. National Center for Biotechnology Information, U.S. National Library of Medicine, BLASTn - rRNA/ITS databases. Standard Nucleotide BLAST (2021), (available at https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch).

96. J. A. Huber, D. B. Mark Welch, H. G. Morrison, S. M. Huse, P. R. Neal, D. A. Butterfield, M. L. Sogin, Microbial population structures in the deep marine biosphere. Science. 318, 97–100 (2007).

97. M. L. Sogin, H. G. Morrison, J. A. Huber, D. Mark Welch, S. M. Huse, P. R. Neal, J. M. Arrieta, G. J. Herndl, Microbial diversity in the deep sea and the underexplored “rare biosphere.” Proc. Natl. Acad. Sci. U. S. A. 103, 12115–12120 (2006).

98. A. F. Andersson, M. Lindberg, H. Jakobsson, F. Bäckhed, P. Nyrén, L. Engstrand, Comparative analysis of human gut microbiota by barcoded pyrosequencing. PLoS One. 3, e2836 (2008).

99. L. Dethlefsen, S. Huse, M. L. Sogin, D. A. Relman, The pervasive effects of an antibiotic on the human gut microbiota, as revealed by deep 16S rRNA sequencing. PLoS Biol. 6, e280 (2008).

100. P. McKenna, C. Hoffmann, N. Minkah, P. P. Aye, A. Lackner, Z. Liu, C. A. Lozupone, M. Hamady, R. Knight, F. D. Bushman, The macaque gut microbiome in health, lentiviral infection, and chronic enterocolitis. PLoS Pathog. 4, e20 (2008).

101. P. J. Turnbaugh, M. Hamady, T. Yatsunenko, B. L. Cantarel, A. Duncan, R. E. Ley, M. L. Sogin, W. J. Jones, B. A. Roe, J. P. Affourtit, M. Egholm, B. Henrissat, A. C. Heath, R. Knight, J. I. Gordon, A core gut microbiome in obese and lean twins. Nature. 457, 480–484 (2009).

102. M. J. Claesson, Q. Wang, O. O’Sullivan, R. Greene-Diniz, J. R. Cole, R. P. Ross, P. W. O’Toole, Comparison of two next-generation sequencing technologies for resolving highly complex microbiota composition using tandem variable 16S rRNA gene regions. Nucleic Acids Res. 38, e200 (2010).

103. R. Hummelen, A. D. Fernandes, J. M. Macklaim, R. J. Dickson, J. Changalucha, G. B. Gloor, G. Reid, Deep sequencing of the vaginal microbiota of women with HIV. PLoS One. 5, e12078 (2010).

104. V. Lazarevic, K. Whiteson, S. Huse, D. Hernandez, L. Farinelli, M. Osterås, J. Schrenzel, P. François, Metagenomic study of the oral microbiota by Illumina high-throughput sequencing. J. Microbiol. Methods. 79, 266–271 (2009).

105. M. Kircher, P. Heyn, J. Kelso, Addressing challenges in the production and analysis of illumina sequencing data. BMC Genomics. 12, 382 (2011).

106. J. S. Johnson, D. J. Spakowicz, B.-Y. Hong, L. M. Petersen, P. Demkowicz, L. Chen, S. R. Leopold, B. M. Hanson, H. O. Agresta, M. Gerstein, E. Sodergren, G. M. Weinstock, Evaluation of 16S rRNA gene sequencing for species and strain-level microbiome analysis. Nat. Commun. 10, 5029 (2019).

107. M. Blaxter, J. Mann, T. Chapman, F. Thomas, C. Whitton, R. Floyd, E. Abebe, Defining operational taxonomic units using DNA barcode data. Philos. Trans. R. Soc. Lond. B Biol. Sci. 360, 1935–1943 (2005).

108. A. M. Eren, G. G. Borisy, S. M. Huse, J. L. Mark Welch, Oligotyping analysis of the human oral microbiome. Proc. Natl. Acad. Sci. U. S. A. 111, E2875–84 (2014).

109. M. Tikhonov, R. W. Leach, N. S. Wingreen, Interpreting 16S metagenomic data without clustering to achieve sub-OTU resolution. ISME J. 9, 68–80 (2015).

110. M. J. Rosen, M. Davison, D. Bhaya, D. S. Fisher, Fine-scale diversity and extensive recombination in a quasisexual bacterial population occupying a broad niche. Science. 348, 1019–1023 (2015).

111. C. Wen, L. Wu, Y. Qin, J. D. Van Nostrand, D. Ning, B. Sun, K. Xue, F. Liu, Y. Deng, Y. Liang, J. Zhou, Evaluation of the reproducibility of amplicon sequencing with Illumina MiSeq platform. PLoS One. 12, e0176716 (2017).

112. C. Quince, A. Lanzen, R. J. Davenport, P. J. Turnbaugh, Removing noise from pyrosequenced amplicons. BMC Bioinformatics. 12, 38 (2011).

113. M. J. Rosen, B. J. Callahan, D. S. Fisher, S. P. Holmes, Denoising PCR-amplified metagenome data. BMC Bioinformatics. 13, 283 (2012).

114. J. Reeder, R. Knight, Rapidly denoising pyrosequencing amplicon reads by exploiting rank-abundance distributions. Nat. Methods. 7, 668–669 (2010).

115. R. C. Edgar, UPARSE: highly accurate OTU sequences from microbial amplicon reads. Nat. Methods. 10, 996–998 (2013).

116. B. J. Callahan, P. J. McMurdie, M. J. Rosen, A. W. Han, A. J. A. Johnson, S. P. Holmes, DADA2: High-resolution sample inference from Illumina amplicon data. Nat. Methods. 13, 581–583 (2016).

117. Q. Wang, G. M. Garrity, J. M. Tiedje, J. R. Cole, Naïve Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl. Environ. Microbiol. 73, 5261–5267 (2007).

118. C. Quince, A. W. Walker, J. T. Simpson, N. J. Loman, N. Segata, Erratum: Corrigendum: Shotgun metagenomics, from sampling to analysis. Nat. Biotechnol. 35, 1211 (2017).

119. B. Wang, E. Tseng, M. Regulski, T. A. Clark, T. Hon, Y. Jiao, Z. Lu, A. Olson, J. C. Stein, D. Ware, Unveiling the complexity of the maize transcriptome by single-molecule long-read sequencing. Nat. Commun. 7, 11708 (2016).

120. J. Eid, A. Fehr, J. Gray, K. Luong, J. Lyle, G. Otto, P. Peluso, D. Rank, P. Baybayan, B. Bettman, A. Bibillo, K. Bjornson, B. Chaudhuri, F. Christians, R. Cicero, S. Clark, R. Dalal, A. Dewinter, J. Dixon, M. Foquet, A. Gaertner, P. Hardenbol, C. Heiner, K. Hester, D. Holden, G. Kearns, X. Kong, R. Kuse, Y. Lacroix, S. Lin, P. Lundquist, C. Ma, P. Marks, M. Maxham, D. Murphy, I. Park, T. Pham, M. Phillips, J. Roy, R. Sebra, G. Shen, J. Sorenson, A. Tomaney, K. Travers, M. Trulson, J. Vieceli, J. Wegener, D. Wu, A. Yang, D. Zaccarin, P. Zhao, F. Zhong, J. Korlach, S. Turner, Real-time DNA sequencing from single polymerase molecules. Science. 323, 133–138 (2009).

121. K. J. Travers, C.-S. Chin, D. R. Rank, J. S. Eid, S. W. Turner, A flexible and efficient template format for circular consensus sequencing and SNP detection. Nucleic Acids Res. 38, e159 (2010).

122. J. Clarke, H.-C. Wu, L. Jayasinghe, A. Patel, S. Reid, H. Bayley, Continuous base identification for single-molecule nanopore DNA sequencing. Nat. Nanotechnol. 4, 265–270 (2009).

123. S. L. Amarasinghe, S. Su, X. Dong, L. Zappia, M. E. Ritchie, Q. Gouil, Opportunities and challenges in long-read sequencing data analysis. Genome Biol. 21, 30 (2020).

124. A. Regev, S. A. Teichmann, E. S. Lander, I. Amit, C. Benoist, E. Birney, B. Bodenmiller, P. Campbell, P. Carninci, M. Clatworthy, H. Clevers, B. Deplancke, I. Dunham, J. Eberwine, R. Eils, W. Enard, A. Farmer, L. Fugger, B. Göttgens, N. Hacohen, M. Haniffa, M. Hemberg, S. Kim, P. Klenerman, A. Kriegstein, E. Lein, S. Linnarsson, E. Lundberg, J. Lundeberg, P. Majumder, J. C. Marioni, M. Merad, M. Mhlanga, M. Nawijn, M. Netea, G. Nolan, D. Pe’er, A. Phillipakis, C. P. Ponting, S. Quake, W. Reik, O. Rozenblatt-Rosen, J. Sanes, R. Satija, T. N. Schumacher, A. Shalek, E. Shapiro, P. Sharma, J. W. Shin, O. Stegle, M. Stratton, M. J. T. Stubbington, F. J. Theis, M. Uhlen, A. van Oudenaarden, A. Wagner, F. Watt, J. Weissman, B. Wold, R. Xavier, N. Yosef, Human Cell Atlas Meeting Participants, The Human Cell Atlas. Elife. 6 (2017), doi:10.7554/eLife.27041.

125. S. C. van den Brink, F. Sage, Á. Vértesy, B. Spanjaard, J. Peterson-Maduro, C. S. Baron, C. Robin, A. van Oudenaarden, Single-cell sequencing reveals dissociation-induced gene expression in tissue subpopulations. Nat. Methods. 14, 935–936 (2017).

126. D. Hanahan, L. M. Coussens, Accessories to the crime: functions of cells recruited to the tumor microenvironment. Cancer Cell. 21, 309–322 (2012).

127. M. Mallo, C. R. Alonso, The regulation of Hox gene expression during animal development. Development. 140, 3951–3963 (2013).

128. G. T. Reeves, N. Trisnadi, T. V. Truong, M. Nahmad, S. Katz, A. Stathopoulos, Dorsal-ventral gene expression in the Drosophila embryo reflects the dynamics and precision of the dorsal nuclear gradient. Dev. Cell. 22, 544–557 (2012).

129. D. J. DePianto, S. Chandriani, A. R. Abbas, G. Jia, E. N. N’Diaye, P. Caplazi, S. E. Kauder, S. Biswas, S. K. Karnik, C. Ha, Z. Modrusan, M. A. Matthay, J. Kukreja, H. R. Collard, J. G. Egen, P. J. Wolters, J. R. Arron, Heterogeneous gene expression signatures correspond to distinct lung pathologies and biomarkers of disease severity in idiopathic pulmonary fibrosis. Thorax. 70, 48–56 (2015).

130. J. G. Gall, M. L. Pardue, Formation and detection of RNA-DNA hybrid molecules in cytological preparations. Proc. Natl. Acad. Sci. U. S. A. 63, 378–383 (1969).

131. E. S. Lein, M. J. Hawrylycz, N. Ao, M. Ayres, A. Bensinger, A. Bernard, A. F. Boe, M. S. Boguski, K. S. Brockway, E. J. Byrnes, L. Chen, L. Chen, T.-M. Chen, M. C. Chin, J. Chong, B. E. Crook, A. Czaplinska, C. N. Dang, S. Datta, N. R. Dee, A. L. Desaki, T. Desta, E. Diep, T. A. Dolbeare, M. J. Donelan, H.-W. Dong, J. G. Dougherty, B. J. Duncan, A. J. Ebbert, G. Eichele, L. K. Estin, C. Faber, B. A. Facer, R. Fields, S. R. Fischer, T. P. Fliss, C. Frensley, S. N. Gates, K. J. Glattfelder, K. R. Halverson, M. R. Hart, J. G. Hohmann, M. P. Howell, D. P. Jeung, R. A. Johnson, P. T. Karr, R. Kawal, J. M. Kidney, R. H. Knapik, C. L. Kuan, J. H. Lake, A. R. Laramee, K. D. Larsen, C. Lau, T. A. Lemon, A. J. Liang, Y. Liu, L. T. Luong, J. Michaels, J. J. Morgan, R. J. Morgan, M. T. Mortrud, N. F. Mosqueda, L. L. Ng, R. Ng, G. J. Orta, C. C. Overly, T. H. Pak, S. E. Parry, S. D. Pathak, O. C. Pearson, R. B. Puchalski, Z. L. Riley, H. R. Rockett, S. A. Rowland, J. J. Royall, M. J. Ruiz, N. R. Sarno, K. Schaffnit, N. V. Shapovalova, T. Sivisay, C. R. Slaughterbeck, S. C. Smith, K. A. Smith, B. I. Smith, A. J. Sodt, N. N. Stewart, K.-R. Stumpf, S. M. Sunkin, M. Sutram, A. Tam, C. D. Teemer, C. Thaller, C. L. Thompson, L. R. Varnam, A. Visel, R. M. Whitlock, P. E. Wohnoutka, C. K. Wolkey, V. Y. Wong, M. Wood, M. B. Yaylaoglu, R. C. Young, B. L. Youngstrom, X. F. Yuan, B. Zhang, T. A. Zwingman, A. R. Jones, Genome-wide atlas of gene expression in the adult mouse brain. Nature. 445, 168–176 (2007).

132. L. Ng, A. Bernard, C. Lau, C. C. Overly, H.-W. Dong, C. Kuan, S. Pathak, S. M. Sunkin, C. Dang, J. W. Bohland, H. Bokil, P. P. Mitra, L. Puelles, J. Hohmann, D. J. Anderson, E. S. Lein, A. R. Jones, M. Hawrylycz, An anatomic gene expression atlas of the adult mouse brain. Nat. Neurosci. 12, 356–362 (2009).

133. G. T. Rudkin, B. D. Stollar, High resolution detection of DNA-RNA hybrids in situ by indirect immunofluorescence. Nature. 265, 472–473 (1977).

134. A. M. Femino, F. S. Fay, K. Fogarty, R. H. Singer, Visualization of single RNA transcripts in situ. Science. 280, 585–590 (1998).

135. A. Raj, P. van den Bogaard, S. A. Rifkin, A. van Oudenaarden, S. Tyagi, Imaging individual mRNA molecules using multiple singly labeled probes. Nat. Methods. 5, 877–879 (2008).

136. F. Wang, J. Flanagan, N. Su, L.-C. Wang, S. Bui, A. Nielson, X. Wu, H.-T. Vo, X.-J. Ma, Y. Luo, RNAscope: a novel in situ RNA analysis platform for formalin-fixed, paraffin-embedded tissues. J. Mol. Diagn. 14, 22–29 (2012).

137. F. Chen, P. W. Tillberg, E. S. Boyden, Expansion microscopy. Science. 347, 543–548 (2015).

138. F. Chen, A. T. Wassie, A. J. Cote, A. Sinha, S. Alon, S. Asano, E. R. Daugharthy, J.-B. Chang, A. Marblestone, G. M. Church, A. Raj, E. S. Boyden, Nanoscale imaging of RNA with expansion microscopy. Nat. Methods. 13, 679–684 (2016).

139. E. Lubeck, A. F. Coskun, T. Zhiyentayev, M. Ahmad, L. Cai, Single-cell in situ RNA profiling by sequential hybridization. Nat. Methods. 11, 360–361 (2014).

140. K. H. Chen, A. N. Boettiger, J. R. Moffitt, S. Wang, X. Zhuang, Spatially resolved, highly multiplexed RNA profiling in single cells. Science. 348, aaa6090 (2015).

141. J. R. Moffitt, J. Hao, D. Bambah-Mukku, T. Lu, C. Dulac, X. Zhuang, High-performance multiplexed fluorescence in situ hybridization in culture and tissue with matrix imprinting and clearing. Proc. Natl. Acad. Sci. U. S. A. 113, 14456–14461 (2016).

142. S. Shah, E. Lubeck, W. Zhou, L. Cai, Editorial Note to: In Situ Transcription Profiling of Single Cells Reveals Spatial Organization of Cells in the Mouse Hippocampus. Neuron. 94, 745–746 (2017).

143. C.-H. L. Eng, M. Lawson, Q. Zhu, R. Dries, N. Koulena, Y. Takei, J. Yun, C. Cronin, C. Karp, G.-C. Yuan, L. Cai, Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH. Nature. 568, 235–239 (2019).

144. S. Codeluppi, L. E. Borm, A. Zeisel, G. La Manno, J. A. van Lunteren, C. I. Svensson, S. Linnarsson, Spatial organization of the somatosensory cortex revealed by osmFISH. Nat. Methods. 15, 932–935 (2018).

145. P. A. Combs, M. B. Eisen, Sequencing mRNA from cryo-sliced Drosophila embryos to determine genome-wide spatial patterns of gene expression. PLoS One. 8, e71820 (2013).

146. J. P. Junker, E. S. Noël, V. Guryev, K. A. Peterson, G. Shah, J. Huisken, A. P. McMahon, E. Berezikov, J. Bakkers, A. van Oudenaarden, Genome-wide RNA Tomography in the zebrafish embryo. Cell. 159, 662–675 (2014).

147. S. Nichterwitz, G. Chen, J. Aguila Benitez, M. Yilmaz, H. Storvall, M. Cao, R. Sandberg, Q. Deng, E. Hedlund, Laser capture microscopy coupled with Smart-seq2 for precise spatial transcriptomic profiling. Nat. Commun. 7, 12139 (2016).

148. M. R. Emmert-Buck, R. F. Bonner, P. D. Smith, R. F. Chuaqui, Z. Zhuang, S. R. Goldstein, R. A. Weiss, L. A. Liotta, Laser capture microdissection. Science. 274, 998–1001 (1996).

149. C. Larsson, I. Grundberg, O. Söderberg, M. Nilsson, In situ detection and genotyping of individual mRNA molecules. Nat. Methods. 7, 395–397 (2010).

150. R. Ke, M. Mignardi, A. Pacureanu, J. Svedlund, J. Botling, C. Wählby, M. Nilsson, In situ sequencing for RNA analysis in preserved tissue and cells. Nat. Methods. 10, 857–860 (2013).

151. X. Qian, K. D. Harris, T. Hauling, D. Nicoloutsopoulos, A. B. Muñoz-Manchado, N. Skene, J. Hjerling-Leffler, M. Nilsson, Probabilistic cell typing enables fine mapping of closely related cell types in situ. Nat. Methods. 17, 101–106 (2020).

152. D. Gyllborg, C. M. Langseth, X. Qian, E. Choi, S. M. Salas, M. M. Hilscher, E. S. Lein, M. Nilsson, Hybridization-based in situ sequencing (HybISS) for spatially resolved transcriptomics in human and mouse brain tissue. Nucleic Acids Res. 48, e112 (2020).

153. X. Wang, W. E. Allen, M. A. Wright, E. L. Sylwestrak, N. Samusik, S. Vesuna, K. Evans, C. Liu, C. Ramakrishnan, J. Liu, G. P. Nolan, F.-A. Bava, K. Deisseroth, Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science. 361, eaat5691 (2018).

154. J. H. Lee, E. R. Daugharthy, J. Scheiman, R. Kalhor, J. L. Yang, T. C. Ferrante, R. Terry, S. S. F. Jeanty, C. Li, R. Amamoto, D. T. Peters, B. M. Turczyk, A. H. Marblestone, S. A. Inverso, A. Bernard, P. Mali, X. Rios, J. Aach, G. M. Church, Highly multiplexed subcellular RNA sequencing in situ. Science. 343, 1360–1363 (2014).

155. S. Alon, D. R. Goodwin, A. Sinha, A. T. Wassie, F. Chen, E. R. Daugharthy, Y. Bando, A. Kajita, A. G. Xue, K. Marrett, R. Prior, Y. Cui, A. C. Payne, C.-C. Yao, H.-J. Suk, R. Wang, C.-C. (Jay) Yu, P. Tillberg, P. Reginato, N. Pak, S. Liu, S. Punthambaker, E. P. R. Iyer, R. E. Kohman, J. A. Miller, E. S. Lein, A. Lako, N. Cullen, S. Rodig, K. Helvie, D. L. Abravanel, N. Wagle, B. E. Johnson, J. Klughammer, M. Slyper, J. Waldman, J. Jané-Valbuena, O. Rozenblatt-Rosen, A. Regev, G. M. Church, A. H. Marblestone, E. S. Boyden, IMAXT Consortium, Expansion Sequencing: Spatially Precise In Situ Transcriptomics in Intact Biological Systems. bioRxiv (2020), doi:10.1101/2020.05.13.094268.

156. P. L. Ståhl, F. Salmén, S. Vickovic, A. Lundmark, J. F. Navarro, J. Magnusson, S. Giacomello, M. Asp, J. O. Westholm, M. Huss, A. Mollbrink, S. Linnarsson, S. Codeluppi, Å. Borg, F. Pontén, P. I. Costea, P. Sahlén, J. Mulder, O. Bergmann, J. Lundeberg, J. Frisén, Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science. 353, 78–82 (2016).

157. A. Adey, H. G. Morrison, Asan, X. Xun, J. O. Kitzman, E. H. Turner, B. Stackhouse, A. P. MacKenzie, N. C. Caruccio, X. Zhang, J. Shendure, Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition. Genome Biol. 11, R119 (2010).

158. S. Vickovic, G. Eraslan, F. Salmén, J. Klughammer, L. Stenbeck, D. Schapiro, T. Äijö, R. Bonneau, L. Bergenstråhle, J. F. Navarro, J. Gould, G. K. Griffin, Å. Borg, M. Ronaghi, J. Frisén, J. Lundeberg, A. Regev, P. L. Ståhl, High-definition spatial transcriptomics for in situ tissue profiling. Nat. Methods. 16, 987–990 (2019).

159. S. G. Rodriques, R. R. Stickels, A. Goeva, C. A. Martin, E. Murray, C. R. Vanderburg, J. Welch, L. M. Chen, F. Chen, E. Z. Macosko, Slide-seq: A scalable technology for measuring genome-wide expression at high spatial resolution. Science. 363, 1463–1467 (2019).

160. R. R. Stickels, E. Murray, P. Kumar, J. Li, J. L. Marshall, D. J. Di Bella, P. Arlotta, E. Z. Macosko, F. Chen, Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. Nat. Biotechnol. (2020), doi:10.1038/s41587-020-0739-1.

161. Y. Liu, M. Yang, Y. Deng, G. Su, A. Enninful, C. C. Guo, T. Tebaldi, D. Zhang, D. Kim, Z. Bai, E. Norris, A. Pan, J. Li, Y. Xiao, S. Halene, R. Fan, High-Spatial-Resolution Multi-Omics Sequencing via Deterministic Barcoding in Tissue. Cell. 183, 1665–1681.e18 (2020).

162. C. R. Merritt, G. T. Ong, S. E. Church, K. Barker, P. Danaher, G. Geiss, M. Hoang, J. Jung, Y. Liang, J. McKay-Fleisch, K. Nguyen, Z. Norgaard, K. Sorg, I. Sprague, C. Warren, S. Warren, P. J. Webster, Z. Zhou, D. R. Zollinger, D. L. Dunaway, G. B. Mills, J. M. Beechem, Multiplex digital spatial profiling of proteins and RNA in fixed tissue. Nat. Biotechnol. 38, 586–599 (2020).

163. What Is Your Conceptual Definition of “Cell Type” in the Context of a Mature Organism? Cell Syst. 4, 255–259 (2017).

164. K. D. Birnbaum, Power in Numbers: Single-Cell RNA-Seq Strategies to Dissect Complex Tissues. Annu. Rev. Genet. 52, 203–221 (2018).

165. D. Lähnemann, J. Köster, E. Szczurek, D. J. McCarthy, S. C. Hicks, M. D. Robinson, C. A. Vallejos, K. R. Campbell, N. Beerenwinkel, A. Mahfouz, L. Pinello, P. Skums, A. Stamatakis, C. S.-O. Attolini, S. Aparicio, J. Baaijens, M. Balvert, B. de Barbanson, A. Cappuccio, G. Corleone, B. E. Dutilh, M. Florescu, V. Guryev, R. Holmer, K. Jahn, T. J. Lobo, E. M. Keizer, I. Khatri, S. M. Kielbasa, J. O. Korbel, A. M. Kozlov, T.-H. Kuo, B. P. F. Lelieveldt, I. I. Mandoiu, J. C. Marioni, T. Marschall, F. Mölder, A. Niknejad, L. Raczkowski, M. Reinders, J. de Ridder, A.-E. Saliba, A. Somarakis, O. Stegle, F. J. Theis, H. Yang, A. Zelikovsky, A. C. McHardy, B. J. Raphael, S. P. Shah, A. Schönhuth, Eleven grand challenges in single-cell data science. Genome Biol. 21, 31 (2020).

166. K. Achim, J.-B. Pettit, L. R. Saraiva, D. Gavriouchkina, T. Larsson, D. Arendt, J. C. Marioni, High-throughput spatial mapping of single-cell RNA-seq data to tissue of origin. Nat. Biotechnol. 33, 503–509 (2015).

167. R. Satija, J. A. Farrell, D. Gennert, A. F. Schier, A. Regev, Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33, 495–502 (2015).

168. N. Karaiskos, P. Wahle, J. Alles, A. Boltengagen, S. Ayoub, C. Kipar, C. Kocks, N. Rajewsky, R. P. Zinzen, The embryo at single-cell transcriptome resolution. Science. 358, 194–199 (2017).

169. M. Nitzan, N. Karaiskos, N. Friedman, N. Rajewsky, Gene expression cartography. Nature. 576, 132–137 (2019).

170. S. Maniatis, T. Äijö, S. Vickovic, C. Braine, K. Kang, A. Mollbrink, D. Fagegaltier, Ž. Andrusivová, S. Saarenpää, G. Saiz-Castro, M. Cuevas, A. Watters, J. Lundeberg, R. Bonneau, H. Phatnani, Spatiotemporal dynamics of molecular pathology in amyotrophic lateral sclerosis. Science. 364, 89–93 (2019).

171. R. Moncada, D. Barkley, F. Wagner, M. Chiodin, J. C. Devlin, M. Baron, C. H. Hajdu, D. M. Simeone, I. Yanai, Author Correction: Integrating microarray-based spatial transcriptomics and single-cell RNA-seq reveals tissue architecture in pancreatic ductal adenocarcinomas. Nat. Biotechnol. 38, 1476 (2020).

172. M. Asp, S. Giacomello, L. Larsson, C. Wu, D. Fürth, X. Qian, E. Wärdell, J. Custodio, J. Reimegård, F. Salmén, C. Österholm, P. L. Ståhl, E. Sundström, E. Åkesson, O. Bergmann, M. Bienko, A. Månsson-Broberg, M. Nilsson, C. Sylvén, J. Lundeberg, A Spatiotemporal Organ-Wide Gene Expression and Cell Atlas of the Developing Human Heart. Cell. 179, 1647–1660.e19 (2019).

173. V. Svensson, S. A. Teichmann, O. Stegle, SpatialDE: identification of spatially variable genes. Nat. Methods. 15, 343–346 (2018).

174. D. Edsgärd, P. Johnsson, R. Sandberg, Identification of spatial expression trends in single-cell gene expression data. Nat. Methods. 15, 339–342 (2018).

175. T. Äijö, S. Maniatis, S. Vickovic, K. Kang, M. Cuevas, C. Braine, H. Phatnani, J. Lundeberg, R. Bonneau, Splotch: Robust estimation of aligned spatial temporal gene expression data. bioRxiv, doi:10.1101/757096.

176. R. Reverberi, L. Reverberi, Factors affecting the antigen-antibody reaction. Blood Transfus. 5, 227–240 (2007).

177. S. F. Kingsmore, Multiplexed protein measurement: technologies and applications of protein and antibody arrays. Nat. Rev. Drug Discov. 5, 310–321 (2006).

178. L. L. de Matos, D. C. Trufelli, M. G. L. de Matos, M. A. da Silva Pinhal, Immunohistochemistry as an important tool in biomarkers detection and clinical practice. Biomark. Insights. 5, 9–20 (2010).

179. J. Marrack, Nature of Antibodies. Nature. 133, 292–293 (1934).

180. A. H. Coons, H. J. Creech, R. N. Jones, Immunological Properties of an Antibody Containing a Fluorescent Group. Experimental Biology and Medicine. 47, 200–202 (1941).

181. T. E. Schutzbank, R. McGuire, D. R. Scholl, in Clinical Virology Manual, Fourth Edition (American Society of Microbiology, 2009), pp. 77–88.

182. T. Seidal, A. J. Balaton, H. Battifora, Interpretation and quantification of immunostains. Am. J. Surg. Pathol. 25, 1204–1207 (2001).

183. T. Sano, C. L. Smith, C. R. Cantor, Immuno-PCR: very sensitive antigen detection by means of specific antibody-DNA conjugates. Science. 258, 120–122 (1992).

184. R. Y. Nong, J. Gu, S. Darmanis, M. Kamali-Moghaddam, U. Landegren, DNA-assisted protein detection technologies. Expert Rev. Proteomics. 9, 21–32 (2012).

185. R. A. Zubarev, The challenge of the proteome dynamic range and its implications for in-depth proteomics. Proteomics. 13, 723–726 (2013).

186. B. Bodenmiller, Multiplexed Epitope-Based Tissue Imaging for Discovery and Healthcare Applications. Cell Syst. 2, 225–238 (2016).

187. M. J. Gerdes, C. J. Sevinsky, A. Sood, S. Adak, M. O. Bello, A. Bordwell, A. Can, A. Corwin, S. Dinn, R. J. Filkins, D. Hollman, V. Kamath, S. Kaanumalle, K. Kenny, M. Larsen, M. Lazare, Q. Li, C. Lowes, C. C. McCulloch, E. McDonough, M. C. Montalto, Z. Pang, J. Rittscher, A. Santamaria-Pang, B. D. Sarachan, M. L. Seel, A. Seppo, K. Shaikh, Y. Sui, J. Zhang, F. Ginty, Highly multiplexed single-cell analysis of formalin-fixed, paraffin-embedded cancer tissue. Proc. Natl. Acad. Sci. U. S. A. 110, 11982–11987 (2013).

188. C. Wählby, F. Erlandsson, E. Bengtsson, A. Zetterberg, Sequential immunofluorescence staining and image analysis for detection of large numbers of antigens in individual cell nuclei. Cytometry. 47, 32–41 (2002).

189. W. Schubert, B. Bonnekoh, A. J. Pommer, L. Philipsen, R. Böckelmann, Y. Malykh, H. Gollnick, M. Friedenberger, M. Bode, A. W. M. Dress, Analyzing proteome topology and function by automated multidimensional fluorescence microscopy. Nat. Biotechnol. 24, 1270–1278 (2006).

190. C. Giesen, H. A. O. Wang, D. Schapiro, N. Zivanovic, A. Jacobs, B. Hattendorf, P. J. Schüffler, D. Grolimund, J. M. Buhmann, S. Brandt, Z. Varga, P. J. Wild, D. Günther, B. Bodenmiller, Highly multiplexed imaging of tumor tissues with subcellular resolution by mass cytometry. Nat. Methods. 11, 417–422 (2014).

191. D. R. Bandura, V. I. Baranov, O. I. Ornatsky, A. Antonov, R. Kinach, X. Lou, S. Pavlov, S. Vorobiev, J. E. Dick, S. D. Tanner, Mass cytometry: technique for real time single cell multitarget immunoassay based on inductively coupled plasma time-of-flight mass spectrometry. Anal. Chem. 81, 6813–6822 (2009).

192. M. Angelo, S. C. Bendall, R. Finck, M. B. Hale, C. Hitzman, A. D. Borowsky, R. M. Levenson, J. B. Lowe, S. D. Liu, S. Zhao, Y. Natkunam, G. P. Nolan, Multiplexed ion beam imaging of human breast tumors. Nat. Med. 20, 436–442 (2014).

193. Y. Goltsev, N. Samusik, J. Kennedy-Darling, S. Bhate, M. Hale, G. Vazquez, S. Black, G. P. Nolan, Deep Profiling of Mouse Splenic Architecture with CODEX Multiplexed Imaging. Cell. 174, 968–981.e15 (2018).

194. B. B. Misra, C. D. Langefeld, M. Olivier, L. A. Cox, Integrated Omics: Tools, Advances, and Future Approaches. J. Mol. Endocrinol. (2018), doi:10.1530/JME-18-0055.

195. Y. Taniguchi, P. J. Choi, G.-W. Li, H. Chen, M. Babu, J. Hearn, A. Emili, X. S. Xie, Quantifying E. coli Proteome and Transcriptome with Single-Molecule Sensitivity in Single Cells. Science. 329, 533–538 (2010).

196. A. Ståhlberg, C. Thomsen, D. Ruff, P. Åman, Quantitative PCR Analysis of DNA, RNAs, and Proteins in the Same Single Cell. Clin. Chem. 58, 1682–1691 (2012).

197. S. Darmanis, C. J. Gallant, V. D. Marinescu, M. Niklasson, A. Segerman, G. Flamourakis, S. Fredriksson, E. Assarsson, M. Lundberg, S. Nelander, B. Westermark, U. Landegren, Simultaneous Multiplexed Measurement of RNA and Proteins in Single Cells. Cell Rep. 14, 380–389 (2016).

198. M. Stoeckius, C. Hafemeister, W. Stephenson, B. Houck-Loomis, P. K. Chattopadhyay, H. Swerdlow, R. Satija, P. Smibert, Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods. 14, 865–868 (2017).

199. D. Schulz, V. R. T. Zanotelli, J. R. Fischer, D. Schapiro, S. Engler, X.-K. Lun, H. W. Jackson, B. Bodenmiller, Simultaneous Multiplexed Imaging of mRNA and Proteins with Subcellular Resolution in Breast Cancer Tissue Samples by Mass Cytometry. Cell Syst. 6, 531 (2018).

200. Visium Spatial Proteomics - Tissue profiling with transcriptomics and protein co-detection. 10X Genomics (2020), (available at https://www.10xgenomics.com/products/spatial-proteomics).

201. S. Sangaletti, F. Iannelli, F. Zanardi, V. Cancila, P. Portararo, L. Botti, D. Vacca, C. Chiodoni, A. Di Napoli, C. Valenti, C. Rizzello, M. C. Vegliante, F. Pisati, A. Gulino, M. Ponzoni, M. P. Colombo, C. Tripodo, Intra-tumour heterogeneity of diffuse large B-cell lymphoma involves the induction of diversified stroma-tumour interfaces. EBioMedicine. 61, 103055 (2020).

202. A. Sharma, J. J. W. Seow, C.-A. Dutertre, R. Pai, C. Blériot, A. Mishra, R. M. M. Wong, G. S. N. Singh, S. Sudhagar, S. Khalilnezhad, S. Erdal, H. M. Teo, A. Khalilnezhad, S. Chakarov, T. K. H. Lim, A. C. Y. Fui, A. K. W. Chieh, C. P. Chung, G. K. Bonney, B. K.-P. Goh, J. K. Y. Chan, P. K. H. Chow, F. Ginhoux, R. DasGupta, Onco-fetal Reprogramming of Endothelial Cells Drives Immunosuppressive Macrophages in Hepatocellular Carcinoma. Cell. 183, 377–394.e21 (2020).

203. R. Cabrita, M. Lauss, A. Sanna, M. Donia, M. Skaarup Larsen, S. Mitra, I. Johansson, B. Phung, K. Harbst, J. Vallon-Christersson, A. van Schoiack, K. Lövgren, S. Warren, K. Jirström, H. Olsson, K. Pietras, C. Ingvar, K. Isaksson, D. Schadendorf, H. Schmidt, L. Bastholt, A. Carneiro, J. A. Wargo, I. M. Svane, G. Jönsson, Author Correction: Tertiary lymphoid structures improve immunotherapy and survival in melanoma. Nature. 580, E1 (2020).

204. J. Decalf, M. L. Albert, J. Ziai, New tools for pathology: a user’s review of a highly multiplexed method for in situ analysis of protein and RNA expression in tissue. J. Pathol. 247, 650–661 (2019).

205. G. B. Huffnagle, R. P. Dickson, N. W. Lukacs, The respiratory tract microbiome and lung inflammation: a two-way street. Mucosal Immunol. 10, 299–306 (2017).

206. A. Minalyan, L. Gabrielyan, D. Scott, J. Jacobs, J. R. Pisegna, The Gastric and Intestinal Microbiome: Role of Proton Pump Inhibitors. Curr. Gastroenterol. Rep. 19, 42 (2017).

207. E. S. Charlson, K. Bittinger, A. R. Haas, A. S. Fitzgerald, I. Frank, A. Yadav, F. D. Bushman, R. G. Collman, Topographical continuity of bacterial populations in the healthy human respiratory tract. Am. J. Respir. Crit. Care Med. 184, 957–963 (2011).

208. N. Segata, S. K. Haake, P. Mannon, K. P. Lemon, L. Waldron, D. Gevers, C. Huttenhower, J. Izard, Composition of the adult digestive tract bacterial microbiome based on seven mouth surfaces, tonsils, throat and stool samples. Genome Biol. 13, R42 (2012).

209. B. G. Wu, L. N. Segal, Lung Microbiota and Its Impact on the Mucosal Immune Phenotype. Microbiol Spectr. 5 (2017), doi:10.1128/microbiolspec.BAD-0005-2016.

210. C. M. Bassis, J. R. Erb-Downward, R. P. Dickson, C. M. Freeman, T. M. Schmidt, V. B. Young, J. M. Beck, J. L. Curtis, G. B. Huffnagle, Analysis of the upper respiratory tract microbiotas as the source of the lung and gastric microbiotas in healthy individuals. MBio. 6, e00037 (2015).

211. G. B. Huffnagle, R. P. Dickson, N. W. Lukacs, The respiratory tract microbiome and lung inflammation: a two-way street. Mucosal Immunol. 10, 299–306 (2017).

212. C. M. Bassis, J. R. Erb-Downward, R. P. Dickson, C. M. Freeman, T. M. Schmidt, V. B. Young, J. M. Beck, J. L. Curtis, G. B. Huffnagle, Analysis of the upper respiratory tract microbiotas as the source of the lung and gastric microbiotas in healthy individuals. MBio. 6, e00037 (2015).

213. R. Rosen, L. Hu, J. Amirault, U. Khatwa, D. V. Ward, A. Onderdonk, 16S community profiling identifies proton pump inhibitor related differences in gastric, lung, and oropharyngeal microflora. J. Pediatr. 166, 917–923 (2015).

214. C. J. Bernal, I. Aka, R. J. Carroll, J. R. Coco, J. J. Lima, S. A. Acra, D. M. Roden, S. L. Van Driest, CYP2C19 Phenotype and Risk of Proton Pump Inhibitor-Associated Infections. Pediatrics. 144 (2019), doi:10.1542/peds.2019-0857.

215. S. A. Hirji, B. C. Gulack, B. R. Englum, P. J. Speicher, A. M. Ganapathi, A. A. Osho, R. A. Shimpi, A. Perez, M. G. Hartwig, Lung transplantation delays gastric motility in patients without prior gastrointestinal surgery-A single-center experience of 412 consecutive patients. Clin. Transplant. 31 (2017), doi:10.1111/ctr.13065.

216. F. Jamie Dy, D. Freiberger, E. Liu, D. Boyer, G. Visner, R. Rosen, Impact of gastroesophageal reflux and delayed gastric emptying on pediatric lung transplant outcomes. J. Heart Lung Transplant. 36, 854–861 (2017).

217. Y. Raviv, F. D’Ovidio, A. Pierre, C. Chaparro, M. Freeman, S. Keshavjee, L. G. Singer, Prevalence of gastroparesis before and after lung transplantation and its association with lung allograft outcomes. Clin. Transplant. 26, 133–142 (2012).

218. E. Bolyen, J. R. Rideout, M. R. Dillon, N. A. Bokulich, C. C. Abnet, G. A. Al-Ghalith, H. Alexander, E. J. Alm, M. Arumugam, F. Asnicar, Y. Bai, J. E. Bisanz, K. Bittinger, A. Brejnrod, C. J. Brislawn, C. T. Brown, B. J. Callahan, A. M. Caraballo-Rodríguez, J. Chase, E. K. Cope, R. Da Silva, C. Diener, P. C. Dorrestein, G. M. Douglas, D. M. Durall, C. Duvallet, C. F. Edwardson, M. Ernst, M. Estaki, J. Fouquier, J. M. Gauglitz, S. M. Gibbons, D. L. Gibson, A. Gonzalez, K. Gorlick, J. Guo, B. Hillmann, S. Holmes, H. Holste, C. Huttenhower, G. A. Huttley, S. Janssen, A. K. Jarmusch, L. Jiang, B. D. Kaehler, K. B. Kang, C. R. Keefe, P. Keim, S. T. Kelley, D. Knights, I. Koester, T. Kosciolek, J. Kreps, M. G. I. Langille, J. Lee, R. Ley, Y.-X. Liu, E. Loftfield, C. Lozupone, M. Maher, C. Marotz, B. D. Martin, D. McDonald, L. J. McIver, A. V. Melnik, J. L. Metcalf, S. C. Morgan, J. T. Morton, A. T. Naimey, J. A. Navas-Molina, L. F. Nothias, S. B. Orchanian, T. Pearson, S. L. Peoples, D. Petras, M. L. Preuss, E. Pruesse, L. B. Rasmussen, A. Rivers, M. S. Robeson 2nd, P. Rosenthal, N. Segata, M. Shaffer, A. Shiffer, R. Sinha, S. J. Song, J. R. Spear, A. D. Swafford, L. R. Thompson, P. J. Torres, P. Trinh, A. Tripathi, P. J. Turnbaugh, S. Ul-Hasan, J. J. J. van der Hooft, F. Vargas, Y. Vázquez-Baeza, E. Vogtmann, M. von Hippel, W. Walters, Y. Wan, M. Wang, J. Warren, K. C. Weber, C. H. D. Williamson, A. D. Willis, Z. Z. Xu, J. R. Zaneveld, Y. Zhang, Q. Zhu, R. Knight, J. G. Caporaso, Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat. Biotechnol. 37, 852–857 (2019).

219. N. A. Bokulich, B. D. Kaehler, J. R. Rideout, M. Dillon, E. Bolyen, R. Knight, G. A. Huttley, J. Gregory Caporaso, Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2’s q2-feature-classifier plugin. Microbiome. 6, 90 (2018).

220. K. R. Maynard, L. Collado-Torres, L. M. Weber, C. Uytingco, B. K. Barry, S. R. Williams, J. L. Catallini, M. N. Tran, Z. Besich, M. Tippani, J. Chew, Y. Yin, J. E. Kleinman, T. M. Hyde, N. Rao, S. C. Hicks, K. Martinowich, A. E. Jaffe, Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. bioRxiv (2020), doi:10.1101/2020.02.28.969931.

221. K. Thrane, H. Eriksson, J. Maaskola, J. Hansson, J. Lundeberg, Spatially Resolved Transcriptomics Enables Dissection of Genetic Heterogeneity in Stage III Cutaneous Malignant Melanoma. Cancer Res. 78, 5970–5979 (2018).

222. M. Asp, F. Salmén, P. L. Ståhl, S. Vickovic, U. Felldin, M. Löfling, J. Fernandez Navarro, J. Maaskola, M. J. Eriksson, B. Persson, M. Corbascio, H. Persson, C. Linde, J. Lundeberg, Spatial detection of fetal marker genes expressed at low level in adult human heart tissue. Sci. Rep. 7, 12941 (2017).

223. E. Berglund, J. Maaskola, N. Schultz, S. Friedrich, M. Marklund, J. Bergenstråhle, F. Tarish, A. Tanoglidi, S. Vickovic, L. Larsson, F. Salmén, C. Ogris, K. Wallenborg, J. Lagergren, P. Ståhl, E. Sonnhammer, T. Helleday, J. Lundeberg, Spatial maps of prostate cancer transcriptomes reveal an unexplored landscape of heterogeneity. Nat. Commun. 9, 2419 (2018).

224. A. L. Ji, A. J. Rubin, K. Thrane, S. Jiang, D. L. Reynolds, R. M. Meyers, M. G. Guo, B. M. George, A. Mollbrink, J. Bergenstråhle, L. Larsson, Y. Bai, B. Zhu, A. Bhaduri, J. M. Meyers, X. Rovira-Clavé, S. T. Hollmig, S. Z. Aasi, G. P. Nolan, J. Lundeberg, P. A. Khavari, Multimodal Analysis of Composition and Spatial Architecture in Human Squamous Cell Carcinoma. Cell. 182, 1661–1662 (2020).

225. K. Carlberg, M. Korotkova, L. Larsson, A. I. Catrina, P. L. Ståhl, V. Malmström, Exploring inflammatory signatures in arthritic joint biopsies with Spatial Transcriptomics. Sci. Rep. 9, 18975 (2019).

226. A. Tanay, A. Regev, Scaling single-cell genomics from phenomenology to mechanism. Nature. 541, 331–338 (2017).

227. Y. Liu, A. Beyer, R. Aebersold, On the Dependency of Cellular Protein Levels on mRNA Abundance. Cell. 165, 535–550 (2016).

228. G.-W. Li, D. Burkhardt, C. Gross, J. S. Weissman, Quantifying absolute protein synthesis rates reveals principles underlying allocation of cellular resources. Cell. 157, 624–635 (2014).

229. L. W. Barrett, S. Fletcher, S. D. Wilton, Regulation of eukaryotic gene expression by the untranslated gene regions and other non-coding elements. Cell. Mol. Life Sci. 69, 3613–3634 (2012).

230. E. Rackaityte, S. V. Lynch, The human microbiome in the 21st century. Nat. Commun. 11 (2020), doi:10.1038/s41467-020-18983-8.

231. N. S. Ikari, Bactericidal antibody to Escherichia coli in germ-free mice. Nature. 202, 879–881 (1964).

232. J. R. Pleasants, Rearing germfree cesarean-born rats, mice, and rabbits through weaning. Ann. N. Y. Acad. Sci. 78, 116–126 (1959).

233. A. K. Hansen, C. H. F. Hansen, L. Krych, D. S. Nielsen, Impact of the gut microbiota on rodent models of human disease. World J. Gastroenterol. 20, 17727–17736 (2014).

234. W. Zheng, S. Zhao, Y. Yin, H. Zhang, D. M. Needham, E. D. Evans, C. L. Dai, P. J. Lu, E. J. Alm, D. A. Weitz, Microbe-seq: high-throughput, single-microbe genomics with strain resolution, applied to a human gut microbiome. bioRxiv (2020), doi:10.1101/2020.12.14.422699.

235. C. Tropini, K. A. Earle, K. C. Huang, J. L. Sonnenburg, The Gut Microbiome: Connecting Spatial Organization to Function. Cell Host Microbe. 21, 433–442 (2017).

236. R. U. Sheth, M. Li, W. Jiang, P. A. Sims, K. W. Leong, H. H. Wang, Spatial metagenomic characterization of microbial biogeography in the gut. Nat. Biotechnol. 37, 877–883 (2019).

237. H. Shi, Q. Shi, B. Grodner, J. S. Lenz, W. R. Zipfel, I. L. Brito, I. De Vlaminck, Highly multiplexed spatial mapping of microbial communities. Nature. 588, 676–681 (2020).
