Targeted Sequencing: Single cells and single strand breaks
by
Navpreet Singh Ranu
B.S. Chemical Engineering, University of California, Berkeley, 2011

Submitted to the Department of Biological Engineering in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Biological Engineering at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2018
© Massachusetts Institute of Technology 2018. All rights reserved.

Author: Signature redacted
Department of Biological Engineering
24 May 2018

Certified by: Signature redacted
Paul Blainey
Associate Professor
Thesis Supervisor

Accepted by: Signature redacted
Forest White
Chair of Graduate Program, Department of Biological Engineering

Thesis Committee Members

Eric J. Alm, Ph.D. (Chair) Professor of Biological Engineering Massachusetts Institute of Technology

Deborah Hung, M.D., Ph.D. Associate Professor in the Department of Microbiology and Immunobiology Harvard Medical School Associate Professor in the Department of Molecular Biology Massachusetts General Hospital

Targeted Sequencing: Single cells and single strand breaks by Navpreet Singh Ranu

Submitted to the Department of Biological Engineering on 24 May 2018, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Biological Engineering

Abstract

Sequencing the human genome has spurred systematic work on understanding how expression and genomic integrity contribute to disease. To date, 3,519 genes have been identified as the underlying cause of specific single gene disorders. However, complex diseases still pose a daunting challenge that requires an understanding of both cell function and how the genome interacts with its cellular environment. Sequencing technologies are now routinely applied to interrogate gene variants, gene expression patterns, and chromatin accessibility, among other measurements, to infer gene and cell function. We build upon past work to address the challenge of targeting sequencing effort to cells and genomic loci of interest to probe the molecular mechanisms behind disease. In this thesis, we demonstrate two novel targeted sequencing methods that can enable a greater understanding of cell function: (1) targeted sequencing in pooled single cell RNA-seq libraries and (2) a novel sequencing approach that allows for the quantification and identification of single strand break (SSB) locations across the genome.

First, we introduce a new targeted sequencing approach to identify rare cells of interest in pooled sequence libraries. Improved throughput in single cell sequencing has enabled the transcriptional profiling of thousands of cells at once. However, because these methods rely on pooled library construction, it is now more difficult to focus on and analyze particular cells of interest without analyzing the library in its entirety. We designed multiplex PCR primers to simultaneously enrich targeted cells from a complex DNA library pool of single cells. We show how molecular enrichment can be used to efficiently target rare cell types, such as the recently identified AXL+SIGLEC6+ dendritic cell (AS DC).

Next, we demonstrate a new targeted sequencing approach, called NickSeq, to locate and quantify DNA SSBs with single nucleotide resolution.
SSBs are the most common form of DNA damage at an estimated 10,000 per cell per day, but there is no available method to robustly determine the exact sites of damage. SSB accumulation correlates with disease, but it is unknown how the location and amount of damage relate to health outcomes. We intentionally create a unique mutational signature at the SSB that is a fingerprint for this specific type of DNA damage when the is