Old Herborn University Seminar Monograph 31: Evolutionary biology of the virome, and impacts in human health and disease. Editors: Peter J. Heidt, Pearay L. Ogra, Mark S. Riddle and Volker Rusch. Old Herborn University Foundation, Herborn, Germany: 89-96 (2017).

READING THE REPERTOIRE AGAINST VIRUSES, MICROBES, AND THE SELF

URI LASERSON

Department of Genetics and Genomic Sciences and Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA

SUMMARY

Accessing the information in the antibody repertoire promises to provide an unprecedented window into the exposure history and disease susceptibility of an individual. However, the non-heritability and massive diversity of the receptors are major hurdles to interpreting the repertoire. We have developed phage immunoprecipitation sequencing (PhIP-seq), an assay that inexpensively generates high-dimensional functional profiles of a complex mixture of antibodies of unknown specificity. The assay leverages recent advances in next-generation DNA sequencing along with the synthesis of large pools of DNA oligonucleotides. We have successfully applied PhIP-seq to characterize auto-antigen profiles in patients with autoimmune diseases and to conduct a global survey of antiviral immune responses. We are further developing the assay for host-microbiome interactions, among other application areas.

INTRODUCTION

The repertoire of antibodies present in a single drop of blood should reveal an enormous amount of information about an individual, such as their history of infection, travel, allergies, and even their susceptibility to future diseases (Lerner et al., 1991). However, multiple hurdles have obstructed our ability to glean such insights from the repertoire: proteins are more difficult to work with than nucleic acids, the non-germline nature of the repertoire complicates population studies (since most antibodies are rare), and the enormous sequence diversity has historically exceeded the capabilities of any measurement technology. The introduction of next-generation DNA sequencing technology (NGS) a decade ago (Shendure et al., 2017) has reinvigorated excitement in studying antibody and T cell receptor (TCR) repertoires (Georgiou et al., 2014). The use of NGS to study the immune repertoire has resulted in many insights about the core function of immunity and its role in virtually every disease, including infections (Parameswaran et al., 2013), cancer (Li et al., 2016; Liu and Mardis, 2017), and autoimmunity (Dekosky et al., 2015). A holy grail in the field of adaptive immunity is the creation of a mapping between antibody/TCR sequences and their cognate antigens, such that an antigen/epitope can be determined purely from inspecting the antibody sequence and vice versa. But despite the massive increase in antibody sequence data (Breden et al., 2017), most studies typically report either aggregate statistics (e.g., repertoire diversity metrics, germline gene usage) or compare them across experimental conditions to discover biomarkers (Yaari and Kleinstein, 2015). Alternatively, some studies have characterized antibody repertoires in the context of a single antigen and require low-throughput sorting of antigen-specific cells. The ability to predict immune receptor function from its sequence has only been demonstrated in very narrow situations and primarily for TCRs (Emerson et al., 2017; Glanville et al., 2017).

Here we describe phage immunoprecipitation sequencing (PhIP-seq) (Larman et al., 2011), an NGS-based assay that can determine the specificity of a repertoire of antibodies against hundreds of thousands of antigens simultaneously. The assay makes use of libraries containing a set of antigens, each of which will be measured for binding to some antibody of interest. Because the universe of possible antigens grows exponentially with peptide length, such peptide libraries have typically been constructed as random oligomers with lengths <20 amino acids. However, libraries of random peptides "waste" clones on sequences that are not observed in nature. Furthermore, short peptides cannot exhibit natural structure. To circumvent these limitations, we have leveraged improvements in the ability to synthesize large, complex pools of DNA oligonucleotides (Kosuri and Church, 2014). As in pooled-screen experiments, the ability to synthesize large pools of DNA oligos to order allows us to "synthesize" many experiments and execute them in parallel (Gasperini et al., 2016).

As described below, PhIP-seq has been used successfully to generate high-dimensional profiles of antibody immunity in the context of autoimmunity and viral infection. In the following sections, we review some of the results of these efforts, along with preliminary methods for expanding the scope of PhIP-seq to bacterial antigens.
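To make the exponential growth of peptide space noted in the introduction concrete, the short sketch below counts the possible peptides of a given length over the 20 standard amino acids. The function name `peptide_space` and the chosen lengths are illustrative assumptions, not part of the PhIP-seq protocol.

```python
# Count all possible peptides of a given length over the 20 standard amino acids.
ALPHABET_SIZE = 20

def peptide_space(length: int) -> int:
    """Return the number of distinct peptides with `length` residues."""
    return ALPHABET_SIZE ** length

# Lengths chosen for illustration: short random oligomers vs. 36- and 56-mer tiles.
for length in (10, 20, 36, 56):
    print(f"{length:>2} aa: {peptide_space(length):.2e} possible peptides")
```

Even at 10 residues the space far exceeds a library of a few hundred thousand clones, which is why restricting a library to sequences observed in nature is attractive.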

PHAGE IMMUNOPRECIPITATION SEQUENCING

Given a monoclonal antibody or a complex mixture of antibodies (e.g., serum) of unknown specificity, the PhIP-seq assay can determine the specificity of the antibodies from a large set of candidate epitopes that have been selected in advance. Specifically, the "query antibody" is used to immunoprecipitate a phage display library expressing many antigens of interest. The phage clones that are pulled down by the antibody are quantified using next-generation DNA sequencing. Therefore, in a single ~$30 assay, small amounts of antibody (nanogram to microgram quantities) can be assayed against hundreds of thousands of candidate epitopes in parallel. In principle, the diversity of the epitope library is only limited by what can be cloned into the phage display library.

For example, the set of human protein sequences can be tiled with a set of ~400,000 unique peptide 36-mers. The DNA oligos that code for the 36-mers can be synthesized by a commercial DNA vendor (e.g., Twist Bioscience or Agilent Technologies) and cloned into a commercially available bacteriophage T7 protein display system (e.g., Novagen T7Select) to yield a phage library that covers the human proteome in overlapping 36-mers. The phage library can then be used to identify unknown auto-antigens using serum samples from patients with autoimmune diseases.

It is estimated that a majority of antibody epitopes are "conformational" rather than linear (Sela et al., 1967; Barlow et al., 1986). A clear limitation of this method is that the epitopes are limited to the length of DNA oligonucleotides that can be synthesized en masse. As of this writing, synthesis of large oligo pools is generally limited to 200-300 bp sequences, of which some must be reserved for the common adaptor sequences used for cloning (allowing ~56 aa peptides). Even if we could synthesize or assemble significantly larger pieces of DNA, we would eventually reach the capacity of the phage display system to display larger proteins without interfering with the assembly of infective phage particles. Furthermore, because the peptides are processed in E. coli, we cannot include any mammalian post-translational modifications when applicable. These limitations ultimately lead to a loss of sensitivity and make PhIP-seq somewhat unsuitable for diagnostic applications.

It is our experience that robust immune responses generally include at least some responses to linear epitopes. Therefore, PhIP-seq is highly suitable for hypothesis-generating experiments, much like genome-wide association studies (GWAS). Instead of testing SNPs for association with a phenotype, we can use PhIP-seq to test whether particular antibody specificities are associated with a phenotype. Despite its limitations, PhIP-seq provides an inexpensive means to generate a very high-dimensional characterization of the antibody specificity repertoire.
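The quantification step described in this section can be illustrated with a minimal sketch. It assumes that each phage clone has a read count in the unselected input library and in the immunoprecipitated (IP) sample, and it scores clones by a pseudocount-smoothed fold change of read fractions. The function name `fold_enrichment`, the pseudocount, and the toy peptide IDs and counts are all hypothetical; this is not the statistical model used in the published PhIP-seq analyses.

```python
from typing import Dict

def fold_enrichment(input_counts: Dict[str, int],
                    ip_counts: Dict[str, int],
                    pseudocount: float = 1.0) -> Dict[str, float]:
    """Fold change of each clone's read fraction after IP relative to the input library."""
    clones = sorted(set(input_counts) | set(ip_counts))
    n_clones = len(clones)
    # Totals include the pseudocount added to every clone (assumed smoothing scheme).
    input_total = sum(input_counts.values()) + pseudocount * n_clones
    ip_total = sum(ip_counts.values()) + pseudocount * n_clones
    scores = {}
    for clone in clones:
        input_frac = (input_counts.get(clone, 0) + pseudocount) / input_total
        ip_frac = (ip_counts.get(clone, 0) + pseudocount) / ip_total
        scores[clone] = ip_frac / input_frac
    return scores

# Toy data: hypothetical peptide IDs and read counts.
input_counts = {"pep_A": 100, "pep_B": 100, "pep_C": 100}
ip_counts = {"pep_A": 10, "pep_B": 280, "pep_C": 10}
scores = fold_enrichment(input_counts, ip_counts)
for clone, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{clone}\t{score:.2f}")
```

In this toy example only `pep_B` rises in relative abundance after immunoprecipitation, so it is the only candidate epitope. A real analysis would also need replicates and a noise model before calling a clone enriched.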

DISCOVERY OF AUTO-ANTIGENS

PhIP-seq was first demonstrated by building a library of ~400,000 36-mer peptides tiling >24,000 human open reading frames with 7 aa overlaps (Larman et al., 2011). Our initial test searched for auto-antigens in several cancer patients with paraneoplastic neurological disorder (PND), a cancer-induced autoimmune syndrome. After successfully detecting known and novel auto-antigens in these patients, we undertook a larger survey of ~300 subjects with multiple sclerosis (MS), type 1 diabetes (T1D), or rheumatoid arthritis (RA), along with healthy controls (Larman et al., 2013A). Analysis of the repertoire of autoantibodies in healthy controls revealed that most individuals exhibit antibodies against a variety of self-epitopes. However, the vast majority of these self-specificities were "private" in the sense that they were observed in only a single individual, likely a result of a cross-reactive antibody that is not causing any pathology. Interestingly, some auto-antigen specificities were observed in >20 healthy control subjects, such as MAGEE1, ACVR2B, and TTN. But given the healthy status of these individuals, the most likely explanation is that they represent common cross-reactivities. Using the samples from T1D patients, we analysed the sensitivity of the PhIP-seq assay by comparing peptide enrichments to radioimmunoassays (RIA) for known auto-antibodies against insulin, ZnT8A, GAD2, and PTPRN. PhIP-seq with the 36-mer library was only able to recover reactivities against GAD2 and PTPRN in some patients. Analysis of RA patients did not find any new auto-antigens; however, we were able to cluster a subset of patients into one of two specificity profiles, which was not correlated with their seropositivity status. Finally, by performing a motif analysis on the positive peptide enrichments of the MS cohort, we were able to recapitulate a previously discovered reactivity to the BRRF2 protein. In a separate PhIP-seq study, the auto-antigen for inclusion body myositis was discovered to be cytosolic 5'-nucleotidase 1A (Larman et al., 2013B).
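The tiling scheme described above (36-mers with 7 aa overlaps) can be sketched as follows. The function name `tile_protein` and the toy sequence are hypothetical, and the handling of the C-terminus (one extra tile anchored at the end of the protein) is an assumption for illustration; the source does not state how the published library treats the final residues.

```python
from typing import List

def tile_protein(sequence: str, tile_len: int = 36, overlap: int = 7) -> List[str]:
    """Split a protein into tiles of `tile_len` residues; adjacent tiles share `overlap` residues."""
    if not 0 <= overlap < tile_len:
        raise ValueError("overlap must satisfy 0 <= overlap < tile_len")
    if len(sequence) <= tile_len:
        return [sequence]  # short proteins yield a single (shorter) tile
    step = tile_len - overlap
    last_start = len(sequence) - tile_len
    tiles = [sequence[i:i + tile_len] for i in range(0, last_start + 1, step)]
    # Assumed end handling: add a final tile flush with the C-terminus
    # when the regular stride leaves trailing residues uncovered.
    if last_start % step != 0:
        tiles.append(sequence[last_start:])
    return tiles

# Toy 100-residue sequence (not a real protein).
protein = "ACDEFGHIKLMNPQRSTVWY" * 5
tiles = tile_protein(protein)
print(len(tiles), "tiles of length", len(tiles[0]))
```

The same function with `tile_len=56, overlap=28` would produce the longer, more heavily overlapping tiles of the kind used for the viral library discussed in the next section.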

CHARACTERIZING GLOBAL VIRAL IMMUNITY

Following the demonstration and use of the human PhIP-seq library, the "VirScan" PhIP-seq library was constructed containing 93,904 56-mer peptides (28 aa overlap) derived from 206 viral species that infect humans (e.g., HIV, HCV, EBV, influenza) (Xu et al., 2015). Sera from 569 individuals of various ages, primarily from the United States but also from Thailand, South Africa, and Peru, were characterized with VirScan to generate a global survey of immunological memory/repertoire against viral infection. A subset of samples was tested with ELISA against HIV1, HCV, HSV1, and HSV2, and compared to the VirScan results. When aggregating VirScan epitopes to the virus level, VirScan exhibited >90% sensitivity for all viruses with 100% specificity for HIV1, HSV1, and HSV2. The VirScan-computed fraction of individuals with immunity to CMV (48.5%) and EBV (87.1%) recapitulated known prevalences of the diseases. In contrast, the computed prevalences of Influenza A (53.4%), Poliovirus (33.7%), and VZV (24.3%) were all lower than expected, possibly reflecting a narrowing of the immune response due to absence of the antigens, or the low sensitivity of PhIP-seq due to the limitation to linear epitopes. Most interestingly, the population analysis of VirScan provided a dramatic display of immunodominance across a large number of viral proteins. When individuals displayed antibodies against a 56-mer derived from a viral protein, they generally all showed specificity for the same 56-mer tile. However, in some cases the immunodominant epitopes varied with geolocale. Because the epitope mapping resolution of VirScan is only 28 amino acids, several 56-mer tiles were chosen for alanine scanning mutagenesis in order to pinpoint the location of the epitope. In most cases, the sera of individuals specific for a particular 56-mer tile were also specific for the same ~10-mer epitope. Overall, VirScan has been shown to provide an unprecedentedly high-dimensional view of global immunity against the human virome.
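The comparison with ELISA described above relies on two steps: aggregating peptide-level enrichments to a virus-level call, and scoring those calls against a reference assay. The sketch below illustrates both under simple assumptions. The rule "at least `min_hits` enriched peptides per virus", the function names, and the toy data are all hypothetical and are not the scoring scheme of the published VirScan analysis.

```python
from typing import Dict, Set, Tuple

def call_viruses(enriched_peptides: Set[str],
                 peptide_to_virus: Dict[str, str],
                 min_hits: int = 2) -> Set[str]:
    """Call a virus positive when at least `min_hits` of its peptides are enriched."""
    hits: Dict[str, int] = {}
    for peptide in enriched_peptides:
        virus = peptide_to_virus.get(peptide)
        if virus is not None:
            hits[virus] = hits.get(virus, 0) + 1
    return {virus for virus, n in hits.items() if n >= min_hits}

def sensitivity_specificity(calls: Dict[str, bool],
                            reference: Dict[str, bool]) -> Tuple[float, float]:
    """Score per-sample calls against a reference assay such as ELISA."""
    tp = sum(calls[s] and reference[s] for s in reference)
    fn = sum(not calls[s] and reference[s] for s in reference)
    tn = sum(not calls[s] and not reference[s] for s in reference)
    fp = sum(calls[s] and not reference[s] for s in reference)
    return tp / (tp + fn), tn / (tn + fp)

# Toy data: hypothetical peptides, samples, and reference results.
peptide_to_virus = {"p1": "HSV1", "p2": "HSV1", "p3": "HSV1", "p4": "HCV", "p5": "HCV"}
enriched = {"s1": {"p1", "p2"}, "s2": {"p1"}, "s3": {"p3", "p4", "p5"}, "s4": set()}
elisa_hsv1 = {"s1": True, "s2": True, "s3": False, "s4": False}

hsv1_calls = {s: "HSV1" in call_viruses(peps, peptide_to_virus) for s, peps in enriched.items()}
sens, spec = sensitivity_specificity(hsv1_calls, elisa_hsv1)
print(f"HSV1 sensitivity={sens:.2f} specificity={spec:.2f}")
```

Requiring more than one enriched peptide trades sensitivity for specificity: sample `s2` is missed here because only a single HSV1 peptide was enriched.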

BUILDING LIBRARIES FOR THE MICROBIOME

The influence of the microbiome on human health and disease has been receiving increasing attention (Cho and Blaser, 2012). This interaction is largely mediated through the mucosal immune system (McCoy et al., 2017). Indeed, the majority of antibody synthesized by the human body is IgA that is secreted into the gut lumen to interact with the commensal flora and protect against pathogens (Fagarasan and Honjo, 2003). However, the specific mechanisms and antigens that drive this interaction are currently unknown (Kubinak and Round, 2016). Significant progress has been made with the development of the IgA-seq assay, in which gut microbes are sorted into IgA-bound and IgA-free fractions, followed by 16S rRNA sequencing to determine which taxa are targeted by secretory IgA (sIgA) (Palm et al., 2014; Kau et al., 2015). While this technique has generated useful hypotheses in the pathogenesis of gut inflammation and other disorders, the specific antigens involved cannot be determined from the relatively low-resolution 16S data.

We have been developing library construction methods for bacterial antigens in order to characterize the sIgA repertoire at antigen resolution. While we have designed synthetic libraries for particular bacterial species (Staphylococcus aureus and Streptococcus pyogenes), the microbiome as a whole is too genetically diverse to practically synthesize as DNA oligos using available technology. As an alternative, we have developed a protocol for cloning shotgun metagenomic libraries into the T7 phage display system. The two main advantages of this approach are that we are not limited by current knowledge of the genetic contents of the microbiome and that we can clone larger fragments than can be synthesized as oligonucleotides. However, because the genomic DNA of the microorganisms is randomly sheared, we expect only one sixth of the clones to be in the correct reading frame (three possible reading frames in each of two orientations). While out-of-frame clones may show elevated levels of background noise, analysis of the sequences should distinguish which clones represent true ORF enrichments. Alternatively, a recent method has been developed based on padlock probes in order to capture large DNA fragments in the correct reading frame (Tosi et al., 2017). We hope that these libraries will have many applications, such as understanding the role of the mucosal immune system in conferring protection against infection, or understanding the host immune response in inflammatory bowel disease.
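The one-sixth expectation for in-frame clones can be checked with a short sketch. It assumes that every fragment falls inside a coding region, that shear points and insert orientation are uniformly random, and that only the frame at the fusion junction matters. These are simplifying assumptions for illustration, and the function names and parameters are hypothetical.

```python
import random
from fractions import Fraction

def exact_in_frame_fraction() -> Fraction:
    """Enumerate 2 insert orientations x 3 frame offsets; exactly one combination is in frame."""
    outcomes = [(strand, offset) for strand in ("+", "-") for offset in range(3)]
    in_frame = [o for o in outcomes if o == ("+", 0)]
    return Fraction(len(in_frame), len(outcomes))

def simulated_in_frame_fraction(n_clones: int = 60_000,
                                orf_len: int = 3_000,
                                seed: int = 0) -> float:
    """Monte Carlo estimate under uniformly random shear points and orientations."""
    rng = random.Random(seed)
    in_frame = 0
    for _ in range(n_clones):
        start = rng.randrange(orf_len)   # shear point within a toy ORF
        forward = rng.random() < 0.5     # insert orientation relative to the ORF
        if forward and start % 3 == 0:
            in_frame += 1
    return in_frame / n_clones

print("exact:", exact_in_frame_fraction())
print("simulated:", round(simulated_in_frame_fraction(), 4))
```

In real metagenomic DNA the usable fraction would be lower still, since some fragments fall in non-coding sequence or span ORF boundaries.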

ALTERNATIVE TECHNOLOGIES

A number of assays similar to PhIP-seq have been developed that have different trade-offs. For ease of library construction and to avoid biasing the sequences, some have used random peptides in display systems (Pantazes et al., 2016). However, these libraries tend to "waste" many clones on epitopes that are not observed in nature. Others have used alternative protein display technologies to phage display (Boder and Wittrup, 1997; Spatola et al., 2013; Zhu et al., 2013). Most intriguingly, a system for characterizing TCR specificities has been developed by expressing large libraries of peptide-MHC complexes using yeast display (Birnbaum et al., 2014). However, the system is limited to MHC-II, and each MHC allele must be individually optimized and engineered to fold correctly. To overcome the size limitation of phage display, the Elledge group has also developed PLATO, a system analogous to PhIP-seq that uses ribosome display to express full ORFs (Zhu et al., 2013; Larman et al., 2014). In contrast to direct measurement of specificities, the recently published GLIPH technique can perform semi-supervised clustering of TCRs in order to determine their specificities (Glanville et al., 2017). Finally, protein microarrays are a well-characterized alternative to protein display-based methods (Templin et al., 2002). While they can sometimes accommodate full ORFs, they suffer from solid-state kinetics, generally require larger amounts of input material, and are relatively expensive and lower-throughput than NGS-based assays.

DISCUSSION

The simultaneous improvement of next-generation DNA sequencing technology and large-scale DNA synthesis technology has enabled the development of highly multiplexed assays at low costs. PhIP-seq provides the ability to assay antibody specificities with unprecedented breadth and will provide new insights into the landscape of host immune responses to their environments. One of the primary bottlenecks is the availability of libraries containing antigens of interest. While we described the use of libraries containing auto-antigens and viral antigens, we are actively developing methods for cloning bacterial libraries. Furthermore, we have designed libraries for allergens and toxins, among other classes of antigens of interest.

By generating a large number of PhIP-seq profiles, our data sets could be used to train B cell epitope predictors (e.g., BepiPred; Larsen et al., 2006). Most intriguingly, coupling the PhIP-seq assay to single-cell RNA-seq methods (Stubbington et al., 2017) opens the possibility of generating training data sets that could allow us to achieve the holy grail of predicting antibody specificity from primary sequence.

ACKNOWLEDGEMENTS

I would like to thank Ben Larman, Meimei Shan, and Tim O'Donnell for helpful discussions.

LITERATURE

Barlow, D.J., Edwards, M.S., and Thornton, J.M.: Continuous and discontinuous protein antigenic determinants. Nature 322, 747-748 (1986).

Birnbaum, M.E., Mendoza, J.L., Sethi, D.K., Dong, S., Glanville, J., Dobbins, J., Ozkan, E., Davis, M.M., Wucherpfennig, K.W., and Garcia, K.C.: Deconstructing the peptide-MHC specificity of T cell recognition. Cell 157, 1073-1087 (2014).

Boder, E.T. and Wittrup, K.D.: Yeast surface display for screening combinatorial polypeptide libraries. Nat. Biotechnol. 15, 553-557 (1997).

Breden, F., Luning Prak, E.T., Peters, B., Rubelt, F., Schramm, C.A., Busse, C.E., Vander Heiden, J.A., Christley, S., Bukhari, S.A.C., Thorogood, A., Matsen, F.A. IV, Wine, Y., Laserson, U., Klatzmann, D., Douek, D.C., Lefranc, M.P., Collins, A.M., Bubela, T., Kleinstein, S.H., Watson, C.T., Cowell, L.G., Scott, J.K., and Kepler, T.B.: Reproducibility and Reuse of Adaptive Immune Receptor Repertoire Data. Front. Immunol. 8, 1418 (2017).

Cho, I. and Blaser, M.J.: The human microbiome: at the interface of health and disease. Nat. Rev. Genet. 13, 260-270 (2012).

Dekosky, B.J., Kojima, T., Rodin, A., Charab, W., Ippolito, G.C., Ellington, A.D., and Georgiou, G.: In-depth determination and analysis of the human paired heavy- and light-chain antibody repertoire. Nat. Med. 21, 86-91 (2015).

Emerson, R.O., DeWitt, W.S., Vignali, M., Gravley, J., Hu, J.K., Osborne, E.J., Desmarais, C., Klinger, M., Carlson, C.S., Hansen, J.A., Rieder, M., and Robins, H.S.: Immunosequencing identifies signatures of cytomegalovirus exposure history and HLA-mediated effects on the T cell repertoire. Nat. Genet. 49, 659-665 (2017).

Fagarasan, S.S. and Honjo, T.T.: Intestinal IgA synthesis: regulation of front-line body defences. Nat. Rev. Immunol. 3, 63-72 (2003).

Gasperini, M., Starita, L., and Shendure, J.: The power of multiplexed functional analysis of genetic variants. Nat. Protoc. 11, 1782-1787 (2016).

Georgiou, G., Ippolito, G.C., Beausang, J., Busse, C.E., Wardemann, H., and Quake, S.R.: The promise and challenge of high-throughput sequencing of the antibody repertoire. Nat. Biotechnol. 32, 158-168 (2014).

Glanville, J., Huang, H., Nau, A., Hatton, O., Wagar, L.E., Rubelt, F., Ji, X., Han, A., Krams, S.M., Pettus, C., Haas, N., Arlehamn, C.S.L., Sette, A., Boyd, S.D., Scriba, T.J., Martinez, O.M., and Davis, M.M.: Identifying specificity groups in the T cell receptor repertoire. Nature 547, 94-98 (2017).

Kau, A.L., Planer, J.D., Liu, J., Rao, S., Yatsunenko, T., Trehan, I., Manary, M.J., Liu, T.C., Stappenbeck, T.S., Maleta, K.M., Ashorn, P., Dewey, K.G., Houpt, E.R., Hsieh, C.S., and Gordon, J.I.: Functional characterization of IgA-targeted bacterial taxa from undernourished Malawian children that produce diet-dependent enteropathy. Sci. Transl. Med. 7, 276ra24 (2015).

Kosuri, S. and Church, G.M.: Large-scale de novo DNA synthesis: technologies and applications. Nat. Methods 11, 499-507 (2014).

Kubinak, J.L. and Round, J.L.: Do antibodies select a healthy microbiota? Nat. Rev. Immunol. 16, 767-774 (2016).

Larman, H.B., Zhao, Z., Laserson, U., Li, M.Z., Ciccia, A., Gakidis, M.A.M., Church, G.M., Kesari, S., Leproust, E.M., Solimini, N.L., and Elledge, S.J.: Autoantigen discovery with a synthetic human peptidome. Nat. Biotechnol. 29, 535-541 (2011).

Larman, H.B., Laserson, U., Querol, L., Verhaeghen, K., Solimini, N.L., Xu, G.J., Klarenbeek, P.L., Church, G.M., Hafler, D.A., Plenge, R.M., Nigrovic, P.A., De Jager, P.L., Weets, I., Martens, G.A., O'Connor, K.C., and Elledge, S.J.: PhIP-Seq characterization of autoantibodies from patients with multiple sclerosis, type 1 diabetes and rheumatoid arthritis. J. Autoimmun. 43, 1-9 (2013A).

Larman, H.B., Salajegheh, M., Nazareno, R., Lam, T., Sauld, J., Steen, H., Kong, S.W., Pinkus, J.L., Amato, A.A., Elledge, S.J., and Greenberg, S.A.: Cytosolic 5′-nucleotidase 1A autoimmunity in sporadic inclusion body myositis. Ann. Neurol. 73, 408-418 (2013B).

Larman, H.B., Liang, A.C., Elledge, S.J., and Zhu, J.: Discovery of protein interactions using parallel analysis of translated ORFs (PLATO). Nat. Protoc. 9, 90-103 (2014).

Larsen, J.E.P., Lund, O., and Nielsen, M.: Improved method for predicting linear B-cell epitopes. Immunome Res. 2, 2 (2006).

Lee, J.Y., Artiles, K.L., Zompi, S., Vargas, M.J., Simen, B.B., Hanczaruk, B., McGowan, K.R., Tariq, M.A., Pourmand, N., Koller, D., Balmaseda, A., Boyd, S.D., Harris, E., and Fire, A.Z.: Convergent antibody signatures in human dengue. Cell Host Microbe 13, 691-700 (2013).

Lerner, R.A., Barbas, C.F. 3rd., Kang, A.S., and Burton, D.R.: On the use of combinatorial antibody libraries to clone the “fossil record” of an individual’s immune response. Proc. Natl. Acad. Sci. USA 88, 9705-9706 (1991).

Li, B., Li, T., Pignon, J.C., Wang, B., Wang, J., Shukla, S.A., Dou, R., Chen, Q., Hodi, F.S., Choueiri, T.K., Wu, C., Hacohen, N., Signoretti, S., Liu, J.S., and Liu, X.S.: Landscape of tumor-infiltrating T cell repertoire of human cancers. Nat. Genet. 48, 725-732 (2016).

Liu, X.S. and Mardis, E.R.: Applications of Immunogenomics to Cancer. Cell 168, 600-612 (2017).

McCoy, K.D., Ronchi, F., and Geuking, M.B.: Host-microbiota interactions and adaptive immunity. Immunol. Rev. 279, 63-69 (2017).

Palm, N.W., De Zoete, M.R., Cullen, T.W., Barry, N.A., Stefanowski, J., Hao, L., Degnan, P.H., Hu, J., Peter, I., Zhang, W., Ruggiero, E., Cho, J.H., Goodman, A.L., and Flavell, R.A.: Immunoglobulin A coating identifies colitogenic bacteria in inflammatory bowel disease. Cell 158, 1000-1010 (2014).

Pantazes, R.J., Reifert, J., Bozekowski, J., Ibsen, K.N., Murray, J.A., and Daugherty, P.S.: Identification of disease-specific motifs in the antibody specificity repertoire via next-generation sequencing. Sci. Rep. 6, 30312 (2016).

Sela, M., Schechter, B., Schechter, I., and Borek, F.: Antibodies to Sequential and Conformational Determinants. Cold Spring Harbor Symp. Quant. Biol. 32, 537-545 (1967).

Shendure, J., Balasubramanian, S., Church, G.M., Gilbert, W., Rogers, J., Schloss, J.A., and Waterston, R.H.: DNA sequencing at 40: past, present and future. Nature 550, 345-353 (2017).

Spatola, B.N., Murray, J.A., Kagnoff, M., Kaukinen, K., and Daugherty, P.S.: Antibody repertoire profiling using bacterial display identifies reactivity signatures of celiac disease. Anal. Chem. 85, 1215-1222 (2013).

Stubbington, M.J.T., Rozenblatt-Rosen, O., Regev, A., and Teichmann, S.A.: Single-cell transcriptomics to explore the immune system in health and disease. Science 358, 58-63 (2017).

Templin, M.F., Stoll, D., Schrenk, M., Traub, P.C., Vöhringer, C.F., and Joos, T.O.: Protein microarray technology. Drug Discov. Today 7, 815-822 (2002).

Tosi, L., Sridhara, V., Yang, Y., Guan, D., Shpilker, P., Segata, N., Larman, H.B., and Parekkadan, B.: Long-adapter single-strand oligonucleotide probes for the massively multiplexed cloning of kilobase genome regions. Nat. Biomed. Eng. 1, 0092 (2017).

Xu, G.J., Kula, T., Xu, Q., Li, M.Z., Vernon, S.D., Ndung’u, T., Ruxrungtham, K., Sanchez, J., Brander, C., Chung, R.T., O'Connor, K.C., Walker, B., Larman, H.B., and Elledge, S.J.: Viral immunology. Comprehensive serological profiling of human populations using a synthetic human virome. Science 348, aaa0698 (2015).

Yaari, G. and Kleinstein, S.H.: Practical guidelines for B-cell receptor repertoire sequencing analysis. Genome Med. 7, 121 (2015).

Zhu, J., Larman, H.B., Gao, G., Somwar, R., Zhang, Z., Laserson, U., Ciccia, A., Pavlova, N., Church, G., Zhang, W., Kesari, S., and Elledge, S.J.: Protein interaction discovery using parallel analysis of translated ORFs (PLATO). Nat. Biotechnol. 31, 331-334 (2013).