Evolution of Diversely Functionalized Nucleic Acid Polymers

Citable link: http://nrs.harvard.edu/urn-3:HUL.InstRepos:40050009

A dissertation presented

by

Zhen Chen

to

The Department of Chemistry and Chemical Biology

in partial fulfillment of the requirements

for the degree of

Doctor of Philosophy

in the subject of

Chemistry

Harvard University

Cambridge, Massachusetts

April 2018

© 2016 by Zhen Chen

All rights reserved.

Dissertation Advisor: David R. Liu

Zhen Chen

Evolution of Diversely Functionalized Nucleic Acid Polymers

Abstract

Darwinian evolution gave rise to biopolymers with a myriad of folded structures and functions, as well as all the diverse life forms on earth that are supported by those molecules. Amazingly, the essence of molecular evolution can be recapitulated in a test tube with purely biochemical manipulations outside of living cells. For nearly three decades, researchers have been evolving biopolymers (and even biomimetic polymers that are not produced in any living system) by subjecting diverse libraries of polymers to iterated cycles of templated synthesis, functional selection, and amplification. In principle, any class of sequence-defined polymer whose synthesis can be templated by an amplifiable genetic material (such as DNA) is evolvable in this manner. To date, however, the evolution of polymers made of building blocks beyond those compatible with enzymes or the ribosome has not been demonstrated.

To explore new kinds of evolvable sequence-defined polymers and discover new classes of receptors, catalysts, and materials, we used a ligase-mediated DNA-templated polymerization system and in vitro selection to evolve highly functionalized nucleic acid polymers (HFNAPs) made from 32 building blocks containing eight chemically diverse side-chains on a DNA backbone. Through iterated cycles of polymer translation, selection, and reverse translation, we discovered HFNAPs that bind PCSK9 and IL-6, two protein targets implicated in human diseases. Mutation and reselection of an active PCSK9-binding polymer yielded evolved polymers with high affinity (KD = 3 nM). This evolved polymer potently inhibited binding between PCSK9 and the LDL receptor. Structure-activity relationship studies revealed that specific side-chains at defined positions in the polymers are required for binding to their respective targets. Our findings expand the chemical space of evolvable polymers to include densely functionalized nucleic acids with diverse, researcher-defined chemical repertoires.

Table of contents

Chapter 1 Introduction: Nucleic acid-templated synthesis and functional selection of sequence-defined synthetic polymers
1.1 – Overview
1.2 – Enzymatic templated syntheses of non-natural nucleic acids
1.2.1 – Polymerase-catalyzed syntheses of backbone-modified nucleic acids
1.2.2 – Polymerase-catalyzed syntheses of nucleobase-modified nucleic acids
1.2.3 – Polymerase-catalyzed syntheses of sugar-modified nucleic acids
1.2.4 – Ligase-catalyzed syntheses of non-natural nucleic acids
1.3 – Ribosomal synthesis of non-natural peptides
1.4 – Non-enzymatic polymerization of nucleic acids
1.5 – Non-enzymatic polymerization of non-nucleic-acid polymers
1.6 – Conclusion and Outlook
References

Chapter 2 Construction of a highly functionalized nucleic acid polymer library and selection for polymers that bind protein targets
2.1 – Introduction
2.2 – Design and validation of a new HFNAP translation system
2.3 – Selection for HFNAPs that bind protein targets from naïve HFNAP libraries
2.4 – Characterization of PCSK9-binding HFNAPs
2.5 – Characterization of IL-6-binding HFNAPs
2.6 – Discussion
References

Chapter 3 Evolution of HFNAP and characterization of a high-affinity binder to PCSK9
3.1 – Introduction
3.2 – Evolution of PCSK9-A5 by diversification and further selection
3.3 – Activity and SAR characterization of the new consensus HFNAP, PCSK9-Evo5
3.4 – Milligram scale chemical synthesis of Evo5
3.5 – Evo5 potently disrupts PCSK9-LDLR binding in vitro
3.6 – Efforts toward structural characterization of Evo5:PCSK9 complex
3.7 – Discussion
References

Appendix: Materials and Methods
A.1 – sequences
A.2 – Synthesis and characterization of Phosphoramidite Intermediates
A.3 – Synthesis and characterization of functionalized
A.4 – Additional experimental procedures
References

To my family.

Acknowledgements

First I would like to thank my advisor, Professor David R. Liu, for his unwavering support and trust throughout my graduate career. He taught me the importance of a broad, ambitious scientific vision, which complements my previous training that focused on technical competence and has helped me become the best scientist I could be. I also benefited tremendously from doing science within the highly interdisciplinary research program he has put together. In addition, I thank him for entrusting me with responsibilities such as writing grant applications and mentoring junior colleagues, which greatly enriched my graduate education experience.

I would like to thank all past and present Liu lab members with whom I had the great fortune to overlap. In particular, Jia Niu was my primary mentor in my first two years, and I thank him for teaching me literally everything I needed to get started in the lab. I thank Ryan Hili for crucial initial development of the project that became the main focus of my graduate work. I thank Rick McDonald for mentoring me during my rotation, and I thank both him and Lynn McGregor for enthusiastic support in my first two years. From my third year on, I have had the great privilege of working closely with Phillip Lichtor, Dennis Dobrovolsky, Adrian Berliner, and Jonathan Chen in our project area. Together, we discussed science, looked after one another’s experiments, shared material, and worked hard to meet demanding project deadlines. I also greatly appreciate the other members of our close-knit chemistry subgroup, namely Dmitry Usanov, Juan Pablo Maianti, Alix Chan, Beverly Mok, and Alexander Peterson, for valuable discussions throughout the years. All Liu lab members contributed to making the lab a supportive and enjoyable workplace, and I thank everyone for that. In addition to the chemistry subgroup members mentioned above, I especially enjoyed the friendship, support, and advice of Xue Gao, John Zuris, Weixin Tang, Bill Kim, Chihui An, Holly Rees, Jeff Bessen, Tim Roth, David Thompson, and Johnny Hu. I also thank Aleks Markovic and Keenan Holmes for keeping the lab running smoothly.

I would like to thank my Graduate Advisory Committee members, Professor Jack W. Szostak and Professor Emily P. Balskus, for their valuable mentorship in the form of both scientific advice and career guidance over the years.

I am indebted to a number of collaborators and contractors for their hard work. Dr. Sunia Trauger collected oligonucleotide mass spectrometry data as a service at the Harvard FAS Small Molecule Mass Spectrometry facility. Dr. Yun Yang and coworkers at WuXi AppTec synthesized the modified phosphoramidites under a service contract. Dr. Doug Davies and his colleagues at Beryllium Discovery, as well as Dr. Romain Rouet, our collaborator in the Doudna lab at UC Berkeley, made tremendous efforts in their attempts to obtain crystals of Evo5:PCSK9 complexes.

In addition, I thank Dr. Kelly Arnett at the Center for Macromolecular Interactions in the Department of Biological Chemistry and Molecular Pharmacology at Harvard Medical School for access to the Biacore SPR instrument, and thank Dr. Arnett, as well as Dr. Matt Blome and Dr. Brian Lang at GE Healthcare Life Sciences, for technical assistance on SPR experiments. I thank Dr. Kendra Hailey (Jennings lab, UCSD) for the IL-33 expression , and Prof. Markus Seeliger (Stony Brook) for recombinant PD-1 and PD-L1 proteins.

I would also like to thank Dr. Ana Glavan from the Whitesides lab at Harvard for leading a collaboration to make low-cost analytical devices based on direct synthesis of DNA on paper, which I participated in and thoroughly enjoyed.

My graduate work was supported by the DARPA Fold Fx program (N66001-14-2-4053), the NIH R01 EB022376 (formerly R01 GM065400) and R35 GM118062, the Howard Hughes Medical Institute, the Y. Kishi Graduate Prize in Chemistry and Chemical Biology sponsored by the Eisai Corporation, and the Chemistry and Chemical Biology department at Harvard University. I am grateful to these sources of financial support.

Lastly, I thank my family for their love, patience, and encouragement throughout my years at Harvard. I thank my parents for relentlessly instilling in me the value of both moral integrity and intellectual curiosity, even in a time and society where they themselves were not necessarily rewarded for those qualities. I thank my wife, Lan Luo, for her wonderful companionship.

List of Figures

Figure 1.1 | Backbone-modified nucleic acids
Figure 1.2 | Nucleobases with side chain functionalizations
Figure 1.3 | Novel nucleobase pairs orthogonal to canonical Watson-Crick pairs
Figure 1.4 | Non-canonical sugar moieties in nucleic acids
Figure 1.5 | DNA ligase-catalyzed synthesis of non-natural nucleic acids
Figure 1.6 | Non-natural backbone analogs of peptides
Figure 1.7 | Non-enzymatic templated polymerization of nucleic acid monomers
Figure 1.8 | Non-enzymatic templated polymerization of nucleic acid oligomers
Figure 1.9 | DNA-templated synthesis of non-nucleic-acid polymers by sequential addition of monomers
Figure 1.10 | Autonomous DNA-based polymer assembler systems
Figure 1.11 | Polymerization of DNA template-linked building block arrays
Figure 1.12 | Enzyme-free translation of DNA into sequence-defined non-nucleic-acid polymers

Figure 2.1 | Reaction scheme for DNA ligase-mediated translation of DNA templates into sequence-defined highly functionalized nucleic acid polymers (HFNAPs)
Figure 2.2 | Structures of 5’-phosphorylated trinucleotide building blocks for HFNAP library synthesis
Figure 2.3 | Translation of libraries of randomized DNA templates into HFNAPs
Figure 2.4 | A complete cycle of HFNAP translation, HFNAP strand isolation, and reverse translation back into DNA faithfully recovered sequence information from the original DNA templates
Figure 2.5 | Experimental scheme for HFNAP translation, affinity selection, and reverse translation
Figure 2.6 | PCSK9 binding selection progress
Figure 2.7 | Sequence homology of selection-enriched PCSK9-binding HFNAPs
Figure 2.8 | Selection of HFNAPs that bind IL-6, IL-33, PD-1, and PD-L1 proteins
Figure 2.9 | Affinity characterization of selection-enriched PCSK9-binding HFNAPs
Figure 2.10 | Structure and PCSK9-binding activity of PCSK9-A5
Figure 2.11 | Characterization of IL-6-binding HFNAPs selected from a random library
Figure 2.12 | SPR characterization of binding between IL-6 and HFNAPs

Figure 3.1 | Evolution of a PCSK9-binding polymer
Figure 3.2 | Cheater suppression strategy
Figure 3.3 | Structure and PCSK9-binding activity of PCSK9-Evo5, PCSK9-A5, and their truncated variants
Figure 3.4 | Side-chain structure-activity relationship of PCSK9-Evo5
Figure 3.5 | Milligram scale chemical synthesis of Evo5
Figure 3.6 | SPR sensorgram characterizing binding kinetics between PCSK9-Evo5-syn and surface-immobilized PCSK9 protein
Figure 3.7 | Electrophoretic mobility shift assay (EMSA) confirms binding of Evo5-Fluor to PCSK9 protein
Figure 3.8 | Evo5 disrupts PCSK9-LDLR binding in vitro
Figure 3.9 | Effect of truncating PCSK9 on its binding affinity to PCSK9-Evo5-syn
Figure 3.10 | Secondary structure predictions of HFNAPs based on their DNA sequences

Chapter 1

Introduction: Nucleic acid-templated synthesis and functional selection of sequence-defined synthetic polymers

Parts adapted from: Z. Chen & D. R. Liu. Nucleic Acid-Templated Synthesis of Sequence-Defined Synthetic Polymers. in Sequence-Controlled Polymers (ed. Lutz, J.-F.) 49–90 (Wiley, 2018).

1.1 – Overview

Living systems use templated polymerization to replicate DNA, synthesize RNA from DNA during transcription, and synthesize protein from RNA during translation. This flow of information links heritable genes with functional gene products. These central dogma processes, together with mechanisms such as mutation and recombination that diversify gene populations, are also the molecular basis of Darwinian evolution and give rise to biopolymers with a myriad of folded structures and functions, including those needed to support life.

For decades, researchers have used Nature’s polymerization machinery to synthesize sequence-defined biomimetic polymers with non-natural elements. One of the earliest examples was Eckstein and coworkers’ polymerase-catalyzed synthesis, reported in 1968, of RNA strands with phosphorothioates in the backbone. With this synthesis came the discovery that nucleic acid polymers with phosphorothioate linkages are more resistant to nucleases1,2, an early demonstration of a biotechnologically useful property – still used in nucleic acid reagents and therapeutics today – conferred by non-natural chemical moieties. By the late 1980s, the development of methods to introduce more diverse functional groups through polymerase-catalyzed incorporation of nucleotides carrying functionalized side chains3 and ribosomal incorporation of non-natural amino acids4 was well under way.

In 1990, Gold, Szostak, Joyce, and their respective coworkers independently reported the application of Darwinian evolution to select functional RNA molecules from random pools, a technology now known as SELEX.5–7 Those studies were the first to use minimal biochemical machinery to carry out laboratory evolution on biopolymers outside the constraints of living organisms, raising the possibility of performing evolution on non-natural polymers. This prospect was promptly fulfilled four years later with the selection of RNA aptamers with 2’-modified ribose moieties8 (which generally confer nuclease resistance9,10 and later enabled the development of the anti-VEGF aptamer pegaptanib, the first FDA-approved aptamer therapeutic) and DNA aptamers with C5-modified pyrimidines.11 The emergence of in vitro genotype-phenotype linkage techniques, such as mRNA display12,13, further expanded the scope of laboratory evolution to include non-genetic polymers. Since then, enabling an ever-broadening repertoire of non-natural nucleic acids and peptide analogs to be evolved in the laboratory has been a major driving force behind research in DNA-templated polymer synthesis. Engineered or evolved biochemical tools, such as polymerases14,15 and tRNA-charging ribozymes16, that are dedicated to the incorporation of non-natural building blocks have become increasingly common. Enzyme-free methods that direct sequence-defined monomer incorporation through DNA templating have also seen major advancements, and may one day enable the evolution of polymers of researcher-defined backbone structure.17

Studying templated synthesis of non-natural polymers has also deepened our insights into the origins of life-like systems. The difficulties of synthesizing and polymerizing RNA under simulated prebiotic conditions have prompted researchers to suggest alternative nucleic acids with non-ribose backbones (“xenonucleic acids” or XNAs) as primordial genetic materials. Through studying their templated synthesis (both enzyme-catalyzed and spontaneous) and performing evolution experiments, researchers have been able to assess the potential of many XNA scaffolds to serve as heritable information carriers with functional properties.18–21

In this chapter we review the progress in synthesizing sequence-defined non-natural polymers through nucleic-acid-templated polymerization, with an emphasis on studies that use the polymerization reaction in applications such as directed evolution.

1.2 – Enzymatic templated syntheses of non-natural nucleic acids

1.2.1 – Polymerase-catalyzed syntheses of backbone-modified nucleic acids

Polymerase-catalyzed syntheses of nucleic acid polymers containing non-canonical chemical moieties date back to 1968, when Matzura and Eckstein synthesized a polyribonucleotide containing alternating phosphodiester and phosphorothioate (Figure 1.1a) linkages using a DNA-dependent RNA polymerase.1 A report on the enzymatic synthesis of a fully phosphorothioate-linked RNA analog soon followed.2 Many DNA and RNA polymerases were subsequently found to catalyze the formation of phosphorothioate linkages (reviewed in Ref. 22).

Early applications focused on nucleic acid sequencing.23,24 More recently, the discovery of many aptamers containing phosphorothioate linkages has been reported.25–33

Huang and coworkers have reported the synthesis of DNA and RNA partially consisting of phosphoroselenoate linkages (Figure 1.1b) by Klenow fragment-catalyzed primer extension and T7 RNA polymerase-catalyzed transcription, respectively.34,35 The site-specifically incorporated selenium atoms facilitate phase and structure determination by X-ray crystallography through multi-wavelength anomalous dispersion (MAD).

Shaw and coworkers first reported Vent DNA polymerase-catalyzed PCR and T4 or Klenow (exo-) DNA polymerase-catalyzed primer extension reactions that accepted α-[P-borano]-dNTPs as substrates to produce partially36 or fully37 boranophosphate-linked products (Figure 1.1c). The same group later reported T7 RNA polymerase-catalyzed transcription into partially boranophosphate-linked RNA products38, with applications in aptamer discovery39 and the production of siRNA with enhanced activity.40,41

Multiple groups reported templated enzymatic synthesis of oligonucleotides containing methylphosphonate linkages (Figure 1.1d) in the early 1990s. Enzymes capable of consecutive incorporation of the modified α-[P-methyl]-TTP include HIV and AMV reverse transcriptases, Klenow DNA polymerase, and terminal deoxynucleotidyl-transferase.42–44

Figure 1.1 | Backbone-modified nucleic acids. (a) Phosphorothioate-linked DNA/RNA. (b) Phosphoroselenoate-linked DNA/RNA. (c) Boranophosphate-linked DNA/RNA. (d) Methylphosphonate-linked DNA. (e) Phosphonomethylene-linked DNA/RNA. (f) Phosphonomethyl-linked DNA. (g) Phosphonomethyl-linked threose nucleic acid (TNA).

Gilham and coworkers studied the templated synthesis of oligonucleotides containing phosphonomethylene linkages, in which the phosphodiester 5’-oxygens were replaced by methylene units (Figure 1.1e). They found that the 5’-deoxa analog of ATP could be incorporated by T3 RNA polymerase into RNA transcripts45, while the analog of the deoxynucleotide TTP could be incorporated by M-MLV reverse transcriptase (RNase H-, commercialized as Superscript II) in reverse transcription reactions46.

Herdewijn and coworkers reported primer extension reactions to form oligonucleotides with phosphonomethyl linkages, which lengthen the repeating unit in the polymer by one methylene group. More than twenty consecutive phosphonomethyl analogs of A and C (Figure 1.1f) or up to ten threosyl A nucleotides (Figure 1.1g) were incorporated by Therminator DNA polymerase (an engineered variant of the 9°N DNA polymerase) in DNA-templated primer extension reactions. Incorporation of phosphonomethyl T and U deoxyribonucleotides was less efficient.47,48

Together, these studies have demonstrated that many modifications to the phosphodiester backbone can be incorporated into polymerase-synthesized nucleic acids. In particular, α-P-thio-, seleno-, and borano-substituted nucleic acid polymers are synthesized with relatively high efficiency and have found interesting applications in basic research and .

1.2.2 – Polymerase-catalyzed syntheses of nucleobase-modified nucleic acids

Derivatives of canonical nucleoside triphosphates with substituents at position C5 of pyrimidines or C7 of 7-deazapurines (Figure 1.2a-d) are empirically known to be generally good substrates for a number of DNA polymerases, particularly family B archaeal DNA polymerases such as KOD, Pfu, and 9°N polymerases. A wide variety of substituted nucleobases have been incorporated into nucleic acid polymers by polymerase-catalyzed primer extension or PCR on DNA templates (reviewed in Refs. 49,50). Early reports date back to 1981, when Langer and coworkers incorporated C5-biotin-labeled dU nucleotides with T4 DNA polymerase and E. coli DNA polymerase I (Ref. 3). Double-stranded DNA fully modified on all four nucleobases (C, U, 7-deaza-A, and 7-deaza-G) could be produced by PCR.51 Substituents as large as a full-length folded protein (e.g. the 40 kDa horseradish peroxidase)52 and a 40-mer oligonucleotide53 could be conjugated to dUTP at the C5 position and subsequently incorporated into oligonucleotides. In a series of recent structural studies, Marx and coworkers confirmed that cavities in KlenTaq polymerase can accommodate substituents at these major-groove-directed positions and, upon conformational change of some polymerase residues, could allow bulky substituents to project far out of the active site.54–57 Engineered or evolved mutants of DNA polymerases have further improved the efficiency of incorporating nucleobases with bulky side chains such as cyanine dyes58,59, nitroxyl radicals60, and amphiphilic hybrid side chains containing long alkyl and polyethylene glycol chains.61 Polymerase-catalyzed incorporation of C5-alkynyl-substituted nucleobases followed by copper-catalyzed alkyne-azide cycloaddition can also be used to install side chain functionalities.62,63 Starting with libraries of modified oligonucleotides, in vitro selection campaigns have produced many aptamers11,62–71 and catalysts72–74 carrying substitutions at C5 positions.

Figure 1.2 | Nucleobases with side chain functionalizations. (a) C5-modified U. (b) C5-modified C. (c) C7-modified 7-deaza-A. (d) C7-modified 7-deaza-G. (e) C8-modified A.

Ribonucleotides substituted at position C5 of uracil (Figure 1.2a) have also been incorporated into RNA transcripts. Langer and coworkers’ early report used T7 and E. coli RNA polymerases to incorporate C5-biotin-labeled U nucleotides3. Eaton and coworkers have used T7 RNA polymerase to incorporate a variety of C5-substituted U nucleotides.75,76

Other positions on canonical nucleobases are less commonly used to introduce substitutions. Hocek and coworkers reported that C8-bromo- and methyl-substituted dATP were good substrates for primer extension with Klenow, Dynazyme II, Vent, and Pwo DNA polymerases, while the C8-phenyl counterpart was incorporated less efficiently.77 Perrin and coworkers reported a series of DNAzymes with histamine moieties attached to the C8 position of adenine (Figure 1.2e).78–81 T7 RNA polymerase-catalyzed incorporation of nucleotides carrying substitutions on N6 of adenine or N4 of cytosine has also been reported.82,83

Novel nucleobase pairs orthogonal to the canonical Watson-Crick pairs have been pursued by many researchers as new letters in an expanded genetic alphabet. Leading practitioners in this field have written detailed accounts of the orthogonal nucleobases’ development.84–86 The current state of the art is represented by the Ds:Px pair discovered by Hirao and coworkers (Figure 1.3a), the NaM:TPT3 pair discovered by Romesberg and coworkers (Figure 1.3b), and the P:Z pair discovered by Benner and coworkers (Figure 1.3c). For each of the pairs, triphosphates of the corresponding deoxynucleosides were incorporated in PCR with at least 99.8% fidelity per cycle under optimal conditions.87–89 Highly efficient transcription into RNA containing ribonucleotides of these non-natural bases (with Pa replacing Px opposite Ds; Figure 1.3a) has also been reported.90–94 The robustness of these base pairs toward replication was further demonstrated in a number of applications. In vitro selections on DNA libraries that included either the dP:dZ pair95–98 or the dDs:dPx pair99,100 have produced aptamers in which the novel bases were essential. The dNaM:dTPT3 pair (or an analog pair, dNaM:d5SICS; Figure 1.3b) was successfully propagated on a plasmid in E. coli101–104 and was demonstrated to support transcription and translation to produce functional protein105, and screens performed in E. coli have identified base pairs with further improved in vivo properties.106 Moreover, triphosphates of dPx, d5SICS, and dTPT3 with substituents on the novel nucleobases have been shown to retain high compatibility with PCR, enabling the incorporation of additional functional groups.88,89,92,107–109 Another promising series of non-natural base pair systems featuring four hydrogen bonds between the pairing nucleobases has been reported by Minakawa and coworkers, though the fidelity of their current best pair ImNN:NaOO (~99.5% per cycle of replication in PCR; Figure 1.3d) is lower than that of the other systems.110
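To give a rough sense of what these per-cycle fidelities imply over a full amplification, the sketch below compounds a per-cycle retention figure over many PCR cycles. It is only an illustration: it assumes that loss of the non-natural pair is independent in each cycle, so that retention after n cycles is simply the per-cycle fidelity raised to the power n, and the choice of 30 cycles is arbitrary rather than taken from the cited studies. The function name is hypothetical.

```python
def retention_after_cycles(fidelity_per_cycle: float, cycles: int) -> float:
    """Fraction of strands expected to retain the non-natural base pair.

    Assumption (not from the cited studies): each PCR cycle retains the
    pair independently with the same probability, so retention compounds
    geometrically as fidelity_per_cycle ** cycles.
    """
    if not 0.0 <= fidelity_per_cycle <= 1.0:
        raise ValueError("fidelity_per_cycle must lie between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return fidelity_per_cycle ** cycles


if __name__ == "__main__":
    # 0.998 and 0.995 are the per-cycle fidelities quoted in the text;
    # 30 cycles is an arbitrary illustrative choice.
    for label, fidelity in (("99.8% per cycle", 0.998), ("99.5% per cycle", 0.995)):
        kept = retention_after_cycles(fidelity, 30)
        print(f"{label}: {kept:.1%} retained after 30 cycles")
```

Under this simple model, a 0.3 percentage point difference in per-cycle fidelity grows to a difference of several percentage points in overall retention (about 94% versus about 86% after 30 cycles), which is why small per-cycle differences matter for selection experiments that involve repeated amplification.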

Figure 1.3 | Novel nucleobase pairs orthogonal to canonical Watson-Crick pairs. (a) Ds, Px, and Pa. (b) NaM, TPT3, and 5SICS. (c) Z and P. (d) ImNN and NaOO.

Together, these studies have established that a large variety of modified canonical nucleobases and several pairs of novel orthogonal nucleobases are readily accepted by polymerases as substrates, allowing the incorporation of diverse non-natural functionalities into polymerase-synthesized nucleic acid polymers.

1.2.3 – Polymerase-catalyzed syntheses of sugar-modified nucleic acids

The incorporation of 2’-modified ribose moieties (Figure 1.4a) in nucleic acid polymers has been extensively studied, in part for the development of nucleic acid-based therapeutics. Wild-type T7 RNA polymerase can incorporate 2’-amino-2’-deoxypyrimidine and 2’-fluoro-2’-deoxypyrimidine nucleotides into RNA in transcription reactions111, enabling SELEX on mixed 2’-modified pyrimidine/2’-OH RNA libraries.8,112,113 One such selection114 on a 2’-fluoro pyrimidine/2’-OH purine RNA library produced an inhibitor of vascular endothelial growth factor (VEGF) that was further derivatized to produce pegaptanib, the only approved aptamer therapeutic to date.115 Engineered mutants of T7 RNA polymerase and Syn5 RNA polymerase were reported to have superior activity in incorporating 2’-fluoro nucleotides116,117, and mutants of T7 RNA polymerase were engineered or evolved to incorporate the larger 2’-O-methyl nucleotides (which confer nuclease resistance more effectively than 2’-fluoro ones).118–121 However, sequence-general transcription into fully 2’-modified polymers containing all four canonical nucleobases was only recently reported and still suffered from reduced efficiency, which was attributed mostly to inefficient transcriptional initiation, rather than elongation, with modified building blocks.117,120 Nonetheless, aptamer selection from partially 2’-OMe-modified RNA libraries has been demonstrated.122–124 Notably, by spiking a small amount of 2’-OH GTP into their 2’-OMe NTP transcription reaction to improve transcription yield, Burmeister and coworkers performed SELEX on sequence-general, mostly 2’-OMe-modified RNA, and then demonstrated that a fully 2’-OMe-modified RNA corresponding to their SELEX-enriched sequence was a genuine VEGF aptamer.122 Other 2’-modified ribonucleotides that have been incorporated by T7 RNA polymerase mutants include 2’-thio, 2’-azido, 2’-hydroxymethyl, and 2’-methylseleno nucleotides.60,116,125–127

In a complementary line of research, the ability of DNA polymerases to incorporate 2’-modified ribonucleotides in DNA-templated primer extension reactions has been explored. Many thermostable DNA polymerases, including Pfu (exo−), Vent (exo−), Deep Vent (exo−), and UITma (Ref. 128), could synthesize fully 2’-fluoro oligonucleotides, and engineered variants of the 9°N DNA polymerase (“Therminator variants”) were found to synthesize 2’-fluoro oligonucleotides with high efficiency and accuracy.129 Holliger and coworkers found a mutant of Tgo DNA polymerase that synthesized fully 2’-fluoro and 2’-azido modified oligonucleotides on DNA templates.130 Romesberg and coworkers evolved the Stoffel fragment of Taq polymerase and discovered a mutant, SFM4-6, that could synthesize a sequence-general 60-mer fully 2’-O-methyl complementary strand on a DNA template.131 They also reported a variant SFM4-9 that can reverse transcribe a fully 2’-OMe-modified template into complementary DNA.131 Using SFM4-6 and SFM4-9, they demonstrated SELEX of fully 2’-OMe-modified aptamers.132 In addition, a variant SFM4-3 was reported to amplify DNA by PCR into partially 2’-OMe or 2’-fluoro double-stranded oligonucleotides131, and was used to select partially 2’-fluoro modified aptamers.133 More recently, SFM4-3 was demonstrated to produce 2’-azido, 2’-chloro, or 2’-amino ribose (Figure 1.4a) or arabinose-backboned (Figure 1.4f) double-stranded products by primer extension or PCR.134

Matsuda, Minakawa, and coworkers showed that a DNA template could be amplified by PCR with KOD dash DNA polymerase into sequence-general partially or fully 4’-thio-substituted (Figure 1.4b) double-stranded DNA.135,136 Using this technology, they demonstrated that dsDNA with partial (but not full) 4’-thio substitutions could be used as template for transcription, either in vitro by T7 RNA polymerase or in mammalian cell culture, into natural RNA.135,137 On the other hand, only 4’-thio-substituted pyrimidine triphosphates, but not their purine counterparts, were accepted by T7 RNA polymerase for efficient transcription from a natural dsDNA template.138,139 Primer extension and PCR amplification using 4’-seleno-substituted TTP along with natural dATP, dGTP and dCTP was also reported, though the efficiency was quite low.140

Isonucleic acid, apionucleic acid, and 2’-5’-linked 3’-deoxynucleic acid are regioisomers of natural DNA. Single isonucleic acid monomers (Figure 1.4c) could be incorporated by many DNA polymerases in primer extension assays, but consecutive incorporation was only possible using Therminator and with only up to two monomers.141,142 Sequence-general DNA-templated primer extension to synthesize apionucleic acid (Figure 1.4d), on the other hand, proceeded efficiently with Therminator DNA polymerase, especially at lower temperatures (down to 34 °C) in the presence of Mn2+.143 Holliger and coworkers combined many mutations previously shown to expand the substrate scope of Tgo DNA polymerase58,130,144 and identified a variant that efficiently polymerized 2’-5’-linked 3’-deoxy or 3’-OMe nucleic acids (Figure 1.4e) in primer extension reactions.145

Figure 1.4 | Non-canonical sugar moieties in nucleic acids. The sugar moieties are: (a) 2’-modified ribose. (b) 4’-thiodeoxyribose and 4’-selenodeoxyribose. (c) 2’-deoxy-2’-nucleobase-D-arabitol (the sugar moiety in isonucleic acid). (d) Sugar moiety in apionucleic acid. (e) 3’-deoxyribose and 3’-O-methylribose. (f) Arabinose and 2’-fluoroarabinose. (g) 2’-O,4’-C-methylene-linked ribose (the sugar moiety in locked nucleic acid, LNA) and its amino and seleno analogs. (h) Sugar moiety in α-L-LNA. (i) 2’-ONHCH2-4’-linked ribose. (j) Threose. (k) 1,5-anhydrohexitol. (l) Sugar moiety in cyclohexenyl nucleic acid (CeNA). (m) Sugar moiety in flexible nucleic acid. (n) Sugar moiety in (S)-glycerol nucleic acid.

Peng and Damha studied the ability of DNA polymerases to synthesize 2’-deoxy-2’-fluoro-β-D-arabinonucleic acid (FANA; Figure 1.4f) in DNA-templated primer extension reactions. Deep Vent (exo-), 9°N, Therminator, and Phusion DNA polymerases all efficiently catalyzed the primer extension reaction to incorporate all four FANA nucleotides A, G, T, and C. Among them, Phusion showed the highest sequence fidelity.146

Reports from Wengel, Kuwahara, and their respective coworkers collectively showed that KOD, Phusion, or 9°Nm DNA polymerase-catalyzed primer extension and PCR reactions on DNA templates could produce nucleic acid products whose backbones were partially 2’-O,4’-C-methylene-linked (“locked nucleic acid” or LNA; Figure 1.4g). The A, T, 5-methyl-C, and 5-ethynyl-C LNA nucleotides were incorporated in these reactions, though consecutive incorporation of more than six LNA nucleotides was difficult.147–153 Both the Kuwahara and Wengel groups have selected aptamers with partial LNA backbone compositions from naïve libraries.154–156 The A, 5-methyl-C, and 5-ethynyl-C LNA nucleotides were also found to be incorporated by T7 RNA polymerase into transcripts.149,151,157 In addition, analogs of LNA, including 2’-amino-LNA, 2’-seleno-LNA (Figure 1.4g), α-L-LNA (Figure 1.4h), and 2’-ONHCH2-4’-bridged LNA (Figure 1.4i) nucleotides, could be incorporated by some DNA polymerases in primer extension reactions.158–160

The ability to catalyze the DNA-templated synthesis of threose nucleic acid (TNA; Figure 1.4j) was first reported for Vent (exo-) and Deep Vent (exo-) polymerases.161,162 Therminator DNA polymerase was later found to be a more efficient catalyst163,164, enabling Chaput and coworkers to perform the first iterated selection on TNA using a DNA display strategy that produced a TNA aptamer to thrombin, though only three nucleotides – G, T, and D (diaminopurine) – were represented in that TNA library.165 Chaput and coworkers subsequently achieved efficient sequence-general copying of DNA into TNA with Therminator from a DNA library containing dA, dC, dT, and 7-deaza-dG (Ref. 166) and, more recently, from unmodified DNA templates using an engineered KOD polymerase mutant (KOD-RI)167,168 or an evolved 9°N polymerase mutant.169 KOD-RI-mediated TNA synthesis, together with Bst polymerase-mediated high-fidelity reverse transcription of TNA into DNA, supported the discovery of TNA aptamers to thrombin from an unbiased TNA library.170 KOD-RI-mediated incorporation of base-modified TNA nucleotides has also been demonstrated.171,172

Herdewijn and coworkers investigated the enzymatic polymerization of nucleotides with six-membered-ring sugar moieties. In primer extension reactions on a DNA template, Vent (exo-) DNA polymerase and the M184V mutant of HIV reverse transcriptase were found to incorporate up to eight consecutive 1,5-anhydrohexitol nucleotides (“hexitol nucleic acid” or HNA; Figure 1.4k)173,174 or up to seven cyclohexenyl nucleotides (CeNA; Figure 1.4l)175.

Family B archaeal DNA polymerases have been reported to possess modest abilities to incorporate nucleotides with acyclic backbones in DNA-templated primer extension assays. Using Therminator polymerase, up to seven monomers of flexible nucleic acid (Figure 1.4m) of both the (R) and (S) configurations176 and up to five monomers of (S)-glycerol nucleic acid (Figure 1.4n)177 could be incorporated.

Holliger and coworkers evolved Tgo DNA polymerase for XNA synthesis capabilities through a screen based on compartmentalized self-tagging.144 Plasmids encoding polymerase mutants were each compartmentalized with their corresponding polymerases in water-in-oil emulsions, and a polymerase capable of using XNA monomers to extend a biotinylated primer on the plasmid template would allow its encoding plasmid to be retained by streptavidin pulldown. With this technique, they evolved Tgo variants that catalyzed the DNA-templated synthesis of HNA (Figure 1.4k), CeNA (Figure 1.4l), LNA (Figure 1.4g), arabinonucleic acid (ANA; Figure 1.4f), FANA (Figure 1.4f), and TNA (Figure 1.4j). Aided by another mutant of Tgo that was engineered to reverse-transcribe XNAs back to DNA, Holliger, DeStefano, and coworkers have discovered RNA- and protein-binding HNA144 and FANA178 aptamers, as well as catalysts of ANA, FANA, HNA, and CeNA backbones that possess RNA endonuclease, RNA ligase, or XNA-XNA ligase activities.179,180

In summary, many nucleotides with non-natural sugar moieties can be incorporated to varying degrees into nucleic acid polymers by commonly used polymerases, and especially by polymerase mutants such as Therminator that have been engineered to exhibit broadened substrate tolerance. Moreover, dedicated engineering and evolution efforts have produced polymerase mutants that incorporate nucleotides of high-value ribose analogs (such as 2’-O-methyl ribose) and distant structural relatives of ribose (such as threose and hexitol) efficiently enough to support in vitro selection experiments.

1.2.4 – Ligase-catalyzed syntheses of non-natural nucleic acids

As an alternative to polymerase-catalyzed reactions, our group has devised a strategy in which 5’-phosphorylated short oligonucleotides are polymerized on a DNA template by T4 DNA ligase. Using this method, we polymerized trinucleotides each carrying a chemically functionalized side chain on a nucleobase to achieve the templated synthesis of DNA strands with up to eight different side chains at defined positions (Figure 1.5), which would have been difficult to achieve with a polymerase-based method. We further demonstrated a mock SELEX-like iterated selection on a library of such diversely functionalized DNA to enrich a known binder to a protein target.181 Hili and coworkers have expanded the building block set to include sixteen different functionalized side chains on pentanucleotide building blocks182, and recently demonstrated in vitro selection.183 They have further extended this strategy to synthesize sequence-defined peptide arrays on a DNA scaffold.184,185

Figure 1.5 | DNA ligase-catalyzed synthesis of non-natural nucleic acids. (a) Reaction scheme. 5’-Phosphorylated building blocks carrying diverse functionalities through nucleobases are ligated on a DNA template by T4 DNA ligase, forming a sequence-defined highly functionalized nucleic acid polymer. (b) Examples of functionalized nucleobases compatible with ligase-catalyzed synthesis.
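The templating logic of the ligase-mediated scheme described above can be pictured with a short Python sketch. It reads a DNA template in three-nucleotide codons and, for each codon, reports the complementary trinucleotide building block and the side chain assigned to it. The codon assignments, side-chain labels, and function names are hypothetical placeholders chosen for illustration; they are not the code used in the cited work, and the sketch ignores ligation efficiency, primers, and sequence context.

```python
# Watson-Crick complement table for the four DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the strand that anneals antiparallel to `seq` (both written 5'->3')."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def decode_template(template: str,
                    codon_to_side_chain: dict[str, str],
                    codon_len: int = 3) -> list[tuple[str, str]]:
    """List the (building block, side chain) pairs templated by each codon.

    Entries are returned in template order (5'->3' along the template), so the
    product strand, which is antiparallel, reads in the opposite direction.
    """
    template = template.upper()
    if len(template) % codon_len != 0:
        raise ValueError("template length must be a multiple of the codon length")
    product = []
    for start in range(0, len(template), codon_len):
        codon = template[start:start + codon_len]
        if codon not in codon_to_side_chain:
            raise KeyError(f"no building block assigned to codon {codon}")
        product.append((reverse_complement(codon), codon_to_side_chain[codon]))
    return product


# HYPOTHETICAL codon assignments, for illustration only.
EXAMPLE_CODE = {"ATG": "side_chain_1", "CCA": "side_chain_2"}
print(decode_template("ATGCCAATG", EXAMPLE_CODE))
```

Because each codon fixes both the position and the identity of a side chain, the sequence of the template alone determines the functionalization pattern of the product strand.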

1.3 – Ribosomal synthesis of non-natural peptides

Ribosomal incorporation of isolated non-canonical amino acids into proteins by genetic code expansion is a fairly mature technology and has been reviewed elsewhere.186,187 Of greater interest to this chapter are the in vitro ribosomal syntheses of polypeptides carrying primarily non-natural side chains and peptide analogs with non-natural backbones. Research in this area requires extensive genetic code reprogramming and has been enabled by two lines of technological breakthroughs: 1) the development of reconstituted cell-free translation systems such as Protein Synthesis using Recombinant Elements (PURE)188; and 2) progress in charging tRNAs with non-natural building blocks, which has advanced from chemical189 and aminoacyl-tRNA synthetase (AARS)-catalyzed186,187,190 methods to a family of in vitro selected ribozymes (“flexizymes”) with extremely broad substrate scope.16,191 In addition, methods for covalently attaching a ribosomal translation product to its genetic template, exemplified by mRNA display12,13 and its many variants192–194, have enabled the in vitro selection of bioactive molecules from large randomized libraries of ribosomally translated peptides.

Ribosomal synthesis of short peptides with predominantly non-natural side chains under a reassigned genetic code was first demonstrated by Blacklow and coworkers, who incorporated five consecutive 2-amino-4-pentenoic acid residues in a heptapeptide, as well as three different non-natural amino acids in a pentapeptide, using reconstituted translational machinery.195

Szostak and coworkers greatly expanded the length and side chain diversity of such peptides by reassigning 35 sense codons to 12 non-natural amino acids and synthesizing peptides containing up to 10 different non-natural building blocks.196 Building on their extensive survey of the substrate scope of natural E. coli AARS enzymes190 and the ribosomal translation machinery197, Szostak and coworkers synthesized an mRNA-displayed library of macrocyclic peptides containing 12 non-natural building blocks, including one α,α-disubstituted amino acid. Selection of this library for binding to thrombin produced a potent inhibitor (apparent Ki = 20 nM) in which four out of the ten building blocks in the randomized region were non-natural.198

The possibility of incorporating multiple consecutive N-methyl amino acids (Figure 1.6a) into a ribosomal translation product was first suggested by Roberts and coworkers199 and definitively confirmed by Suga and coworkers, who showed in a PURE-like system that N-methyl derivatives of amino acids with uncharged, non-bulky side chains were generally accepted by the ribosome, and that consecutive incorporation of up to ten N-methyl amino acids was possible.200 Independently, using a chemical N-methylation method for aminoacylated tRNA developed by Merryman and Green201, Szostak and coworkers demonstrated consecutive incorporation of three N-methyl amino acids.202 Suga, Murakami, and coworkers subsequently reported successful selections of E6AP ubiquitin ligase inhibitors (Kd < 1 nM) and a VEGFR2 antagonist (IC50 = 20 nM in HUVEC cells) from mRNA-displayed macrocyclic peptide libraries that utilized several N-methyl amino acid building blocks in addition to proteinogenic amino acids.203,204 In their library syntheses, cyclization was achieved by the reaction between a cysteine residue near the C-terminus and a chloroacetyl moiety incorporated during translational initiation, which can be reprogrammed in vitro to accept a wider range of substrates than translational elongation.205–209

Figure 1.6 | Non-natural backbone analogs of peptides. (a) N-methyl peptide. (b) Peptoid (polymer of N-substituted glycine). (c) Polypeptide of cyclic N-alkyl amino acids. (d) Polyester (polymer of α-hydroxy acids). (e) D-peptide. (f) β3-peptide.

Ribosomal incorporation of other N-alkyl amino acids has also been studied in reconstituted translation systems. Forster and coworkers found no evidence of incorporation for N-butylalanine and N-butylphenylalanine where their N-methyl counterparts were readily accepted.210 On the other hand, Suga and coworkers showed that many N-substituted glycines (i.e. building blocks of peptoids; Figure 1.6b) were readily accepted by the ribosome, and that consecutive incorporation of up to six such building blocks was possible, though the efficiency greatly diminished with length.211 Murakami and coworkers demonstrated the incorporation of up to eight consecutive cyclic N-alkyl amino acids (proline and its analogs; Figure 1.6c) in an mRNA-displayed library.212 The full-length display efficiency was significantly boosted when the elongation factor EF-P, which alleviates ribosomal stalling at polyproline sequences in vivo213,214, was added to the translation system. Interestingly, this effect appeared to be specific to proline-like cyclic N-alkyl amino acids, as the addition of EF-P did not improve incorporation of N-methyl amino acids.215

The ribosome has also been shown to incorporate hydroxy acids into translation products.

Suga and coworkers demonstrated that a polyester chain could be synthesized from up to twelve consecutive α-hydroxy acids (Figure 1.6d).216

Translational elongation with D-amino acids (Figure 1.6e) and β-amino acids (Figure 1.6f) has long been reported to be difficult, especially for consecutive incorporation.197,217–219 Achenbach and coworkers reported the incorporation of up to three consecutive D-amino acids in a peptide.220 The authors attributed their success to their use of tRNAGly, which has higher intrinsic affinity for EF-Tu than the tRNAs used in previous studies, as the carrier for the D-amino acid building blocks. More recently, Suga and coworkers used a tRNAGlu variant to carry D-amino acid building blocks and optimized the concentrations of IF2, EF-Tu, and EF-G in the in vitro translation reaction mixture, resulting in the incorporation of up to ten consecutive D-serine residues or up to five consecutive mixed-sequence D-amino acid residues including D-Phe, D-Cys, D-Ser, or D-Ala.221 The addition of EF-P to the translation reaction and the use of an engineered chimeric tRNA, featuring both the D arm from tRNAPro (for interaction with EF-P) and the T stem from tRNAGlu (for interaction with EF-Tu), further increased translational elongation efficiency.222

Collectively, these studies have demonstrated that the ribosomal translation machinery is capable of synthesizing many peptide analogs with non-natural side-chain substitutions, but exhibits less tolerance for certain changes in backbone structure. Improving the ribosome’s ability to synthesize non-natural peptidic polymers will likely require extensive engineering or directed evolution.

1.4 – Non-enzymatic polymerization of nucleic acids

Templated non-enzymatic primer extension using activated mononucleotides as building blocks has been studied for decades as a step towards modeling the spontaneous molecular replication of genetic material in primordial life. For the formation of RNA in its modern form (i.e. specific 3’-5’ phosphodiester linkages), templated non-enzymatic primer extension in aqueous solution was found to occur with poor sequence generality and regioselectivity223 (Szostak and coworkers recently made great progress toward overcoming this obstacle with their development of 2-aminoimidazole activated nucleotides224), leading Orgel and coworkers to pioneer the study of non-natural phosphoramidate-linked nucleic acids, which they demonstrated could be polymerized at much higher rate and fidelity in their seminal early work.225–228 More recently, to prove the plausibility of this replication process in primitive cells, Szostak and coworkers demonstrated the efficient copying of oligo-dC into 2’-5’-linked phosphoramidate oligo-dG in model protocell membranes.229 However, sequence-general copying could not be achieved for nucleic acids of the 2’-5’-linked phosphoramidate backbone, while the introduction of modified nucleobases resulted in loss of fidelity.230 In a related study, sluggish kinetics were observed for template-directed polymerization of 2’-amino-2’-deoxythreose nucleotides.231 More promising results were obtained with 3’-5’-linked phosphoramidate DNA. Encouraged by the sub-minute single-nucleotide extension kinetics reported by Richert and coworkers232, Szostak and coworkers demonstrated that 3’-5’-linked phosphoramidate DNA could be polymerized efficiently on DNA and RNA templates using 3’-amino nucleotides with a 2-methylimidazole-activated phosphate group in the presence of 1-hydroxyethylimidazole233 (Figure 1.7a). Furthermore, the use of activated monomers derived from the modified nucleosides 2-thiothymidine and 2-thiouridine resulted in major improvements in sequence-general high-fidelity copying, both on an RNA template and on a template composed of 3’-5’-linked phosphoramidate DNA (conceptually a self-copying reaction), as the modified base enhanced base pairing with adenine while suppressing wobble base pairing with guanine.234,235 It should be pointed out that the model studies mentioned above only examined the templated incorporation of a few consecutive nucleotides (no more than fifteen; usually fewer than seven), with the study of kinetics, fidelity, and sequence generality as the primary goal.

Figure 1.7 | Non-enzymatic templated polymerization of nucleic acid monomers. (a) Spontaneous primer extension using 3’-amino nucleotides with a 2-methylimidazole-activated phosphate group. (b,c) Iterated primer extension and deprotection cycles using N-protected mononucleotides with an azaoxybenzotriazole-activated phosphate group to achieve 5’-to-3’ (b) or 3’-to-5’ (c) primer extension. (d) Step-growth polymerization of 5’-amino-3’-acetaldehyde-modified deoxyribonucleoside analogs through reductive amination. (e) Templated base-filling on a peptidic backbone through dynamic thioester exchange. (f) Templated base-filling on a peptidic backbone through reductive amination. (g) Templated base-filling on a peptidic backbone through amide coupling.

Also towards the templated synthesis of 3’-5’-linked phosphoramidate DNA, Richert and coworkers demonstrated a non-enzymatic alternative to the polymerase-based reversible termination primer extension reaction commonly used in DNA sequencing.236 3’-N-protected mononucleotides with an azaoxybenzotriazole-activated 5’-phosphate could be sequentially added opposite a DNA template in iterated cycles of extension and deprotection, achieving 5’-to-3’ primer extension (Figure 1.7b). Conversely, 5’-N-protected monomers with an activated 3’-phosphate could be sequentially added to achieve templated 3’-to-5’ primer extension (Figure 1.7c), which is difficult for current polymerase-based methods.

Lynn and coworkers demonstrated the polymerization of 5’-amino-3’-acetaldehyde-modified deoxyribonucleoside analogs on a DNA template through reductive amination (Figure 1.7d).237 Multiple turnovers could be achieved by subjecting solid-phase-immobilized templates to cycles of synthesis and denaturation/elution.238

As a complementary approach to polymerization reactions in which building blocks are linked to form the polymer backbone, the templated synthesis of nucleic acids can also be achieved with base-filling reactions in which nucleobases are added to abasic sites on an existing backbone as directed by a template. Two groups have independently implemented this concept in DNA-templated synthesis of nucleic acids with peptidic backbones. Ghadiri and coworkers used reversible thioester exchange reactions between a cysteine-containing backbone and thioester-functionalized nucleobases to achieve dynamic assembly of thioester peptide nucleic acids opposite DNA templates239 (Figure 1.7e), while Heemstra and Liu used reductive amination or amide coupling reactions to achieve DNA-templated irreversible base-filling240 (Figure 1.7f-g).

Figure 1.8 | Non-enzymatic templated polymerization of nucleic acid oligomers. (a) Polymerization of PNA dimers on DNA or PNA templates through amide coupling. (b) Polymerization of 5’-amino-3’-acetaldehyde-modified amide-linked DNA oligomers on DNA templates through reductive amination. (c) Polymerization of oligomeric PNA aldehyde on DNA templates through reductive amination. (d) Polymerization of 2’-N-protected GNA dimers with an activated 3’-phosphate on DNA or GNA templates initiated by light-triggered deprotection. (e) Reversible photopolymerization of DNA hexamers on DNA templates through cycloaddition between 5-vinyluracil and pyrimidine bases (366 nm light for ligation, 302 nm light for reversal). (f) pH-controlled templated polymerization of boronic acid-functionalized DNA hexamers through boronate ester formation (pH 9.5 for assembly, pH 5.5 for dissociation). (g) Polymerization of 5’-azide, 3’-alkyne-functionalized oligonucleotides through copper-catalyzed click reaction.

Non-enzymatic templated synthesis of non-natural nucleic acid polymers may also proceed through the ligation of oligomeric building blocks. Orgel and coworkers first demonstrated that oligo-dC readily facilitated the oligomerization of peptide nucleic acid (PNA) G dimers into decamers in the presence of EDC and 2-methylimidazole (Figure 1.8a), while oligo-PNA-C facilitated the oligomerization to shorter final lengths and with lower conversion in the presence of EDC and imidazole; the reaction would have been difficult with PNA monomers since they are prone to cyclization.241 Using similar conditions, the same group later reported the copying of DNA templates of other sequences into complementary PNA, where they found a strong dependence of oligomerization efficiency on template sequence identity.242

Lynn and coworkers reported that amide-linked analogs of DNA dimers and tetramers with amine and aldehyde moieties at their termini could be oligomerized on a DNA template with reductive amination to produce polymers with a mixed amine/amide backbone (Figure 1.8b).237,238 Final products of up to 32 nucleotides in length were synthesized using either only dimers or a mixture of dimers and monomers as building blocks.243

Also using reductive amination, Liu and coworkers reported the sequence-defined polymerization of PNA aldehyde oligomers on DNA hairpin templates. Tetramer and pentamer building blocks could be efficiently polymerized, and additional small side chains were tolerated on the PNA building block backbone (Figure 1.8c).244,245 A DNA-display strategy based on this reaction allowed the same group to demonstrate iterated cycles of DNA-to-PNA translation, affinity selection, and DNA amplification as a directed evolution system for PNA.246

Szostak and coworkers reported the templated non-enzymatic synthesis of glycerol nucleic acids (GNAs) with mixed phosphodiester and phosphoramidate backbone linkages. Both DNA and phosphodiester-GNA could be used as templates for oligomerization of 3’-imidazole-activated 2’-(Nvoc)amino GNA dinucleotides triggered by in situ photolysis (Figure 1.8d). Activated GNA mononucleotides and dinucleotides without the photolabile Nvoc protecting group were prone to cyclization and difficult to use as building blocks.247

Saito and coworkers reported a reversible photopolymerization reaction through the [2+2] photocyclization between a vinyluracil and a pyrimidine at the termini of adjacent DNA hexamer building blocks. Irradiation by 366 nm light promoted the DNA-templated oligomerization of up to five hexamers, while irradiation at 302 nm promoted reversion to the starting materials (Figure 1.8e).248

Using hexanucleotide building blocks that each have a DNA backbone, a boronic acid modification at the 5’-end, and a ribose sugar at the 3’-end, Smietana and coworkers reported DNA-, RNA-, and self-templated pH-controlled reversible polymerization. Higher pH (up to 9.5) promoted formation of a boronate ester between the boronic acid and the cis-2’,3’-diol of ribose, driving polymerization, while lower pH (down to 5.5) promoted the reverse reaction (Figure 1.8f). Three pH-driven cycles of assembly and dissociation were demonstrated.249

Tavassoli, Brown, and coworkers used copper-catalyzed click chemistry to perform self-templated chemical ligation of 5’-azide, 3’-alkyne-functionalized oligonucleotides (Figure 1.8g). Ten such oligonucleotides were assembled in a one-pot reaction into a double-stranded gene of more than 300 base pairs containing eight triazole linkages. Importantly, although the synthetic gene was amplified by a polymerase at significantly lower efficiency than natural DNA, it could nonetheless be cloned into a plasmid, propagated in E. coli, and used to direct the synthesis of a functional protein product.250

These studies collectively establish that templated non-enzymatic polymerization can support the sequence-defined synthesis of nucleic acid polymers with many backbone structures using a variety of coupling chemistries. Some of these findings also lend credence to the plausibility of non-ribose-based genetic materials. With continued development, some of these systems may serve as the genetic component of model protocells. Efficient non-enzymatic template copying processes may one day enable the in vitro selection of functional nucleic acid polymers currently inaccessible by enzymatic synthesis.

1.5 – Non-enzymatic polymerization of non-nucleic-acid polymers

DNA-templated strategies have also been used to direct the sequence-defined synthesis of polymers structurally unrelated to nucleic acids. Inspired by the mechanism of ribosomal translation, researchers have devised many reaction schemes where building blocks are sequentially added to the terminus of a growing polymer chain in a manner directed by DNA base pairing. While most implementations of these strategies have focused on building DNA-encoded small molecule libraries (reviewed in Ref. 251,252), a few groups have reported the synthesis of fairly long oligomers.

Harbury and coworkers used a sequence-encoded routing strategy (Figure 1.9a), in which DNA templates were sequentially passed through anticodon DNA-coated cartridges and thereby spatially separated into subpools at each chain-growing chemical synthesis step, to generate a peptoid octamer library (theoretical size ~10^8). Using iterated cycles of synthesis and affinity enrichment, they discovered a binder to the N-terminal SH3 domain of Crk (KD = 16 µM).253
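The theoretical size of such a split-and-pool library follows from simple counting: if every position can independently carry any of n building blocks, an oligomer of length L has n^L possible sequences. The Python sketch below works through that arithmetic. The number of building blocks per position is not stated in the text above, so the value of ten used here is an assumption, chosen only because it reproduces a library of about 10^8 octamers; the function names are likewise hypothetical.

```python
def library_size(monomers_per_position: int, length: int) -> int:
    """Number of distinct sequences when every position is chosen independently."""
    if monomers_per_position < 1 or length < 0:
        raise ValueError("need at least one monomer and a non-negative length")
    return monomers_per_position ** length


def monomers_needed(target_size: int, length: int) -> int:
    """Smallest number of monomers per position that reaches `target_size`."""
    if target_size < 1 or length < 1:
        raise ValueError("target_size and length must be positive")
    n = 1
    while library_size(n, length) < target_size:
        n += 1
    return n


# ASSUMPTION: ten building blocks per position (not stated in the text).
print(library_size(10, 8))
# Monomers per position required for an octamer library of 10**8 members.
print(monomers_needed(10**8, 8))
```

The same counting shows why routing-based synthesis scales well: each additional synthesis step multiplies the theoretical diversity by the number of building blocks available at that step.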

Figure 1.9 | DNA-templated synthesis of non-nucleic-acid polymers by sequential addition of monomers. (a) Split-pool synthesis of DNA-encoded chemical library by sequence-directed routing. (b) DNA-templated polymer synthesis by strand displacement with progressively longer DNA strands carrying new building blocks. (c) DNA-templated polymer synthesis by iterated cycles of introducing imperfectly hybridizing building-block-carrying strands and removing waste strands with perfectly hybridizing removal strands. (d) DNA-templated polymer synthesis by iterated cycles of adding building blocks at three-way junctions and removing waste strands with scavenger strands.

Two groups have developed sequential strand displacement strategies to enable multistep DNA-templated synthesis. In He and Liu’s report, progressively longer substrate DNA strands displace previously annealed shorter substrates from the template, thereby introducing new building blocks that can be incorporated through amide coupling at the end of a growing chain (Figure 1.9b).254 Turberfield, O’Reilly, and coworkers introduced new building blocks using a substrate DNA that hybridizes imperfectly to the DNA carrying the growing polymer chain. After each chain growth step, performed by a Wittig reaction, they would introduce a remover strand that perfectly complemented the waste strand, releasing the polymer-carrying DNA for the next cycle (Figure 1.9c).255,256 Both systems enable relatively high-yielding syntheses (83-88% on average per step) over five or nine monomer incorporation steps, though combinatorial synthesis was not demonstrated for either system. O’Reilly, Turberfield, and coworkers later reported a variant of their method. In the new strategy, the DNA strands carrying the building block and the growing polymer chain had only short complementary regions and could only assemble in the presence of a longer instruction strand. After chain growth, a scavenger strand would be added to remove the instruction strand, releasing the polymer-carrying strand for further reaction (Figure 1.9d). The new system allowed parallel independent reactions to occur in one pot.257
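To put the per-step yields quoted above in perspective, the short Python sketch below compounds an average per-step yield over a multistep synthesis. It assumes that every step proceeds with the same yield and that losses are independent, which is a simplification; the text does not say which yield belongs to which system or step count, so the loop simply brackets the quoted range. The function name is a placeholder.

```python
def overall_yield(per_step_yield: float, steps: int) -> float:
    """Cumulative yield of a linear synthesis with identical, independent steps."""
    if not 0.0 <= per_step_yield <= 1.0:
        raise ValueError("per_step_yield must be a fraction between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_yield ** steps


# Bracket the quoted range: 83-88% per step over five or nine steps.
for per_step in (0.83, 0.88):
    for steps in (5, 9):
        total = overall_yield(per_step, steps)
        print(f"{per_step:.0%} per step over {steps} steps -> {total:.1%} overall")
```

Under these assumptions the overall yield falls to roughly 39-53% after five steps and roughly 19-32% after nine, which illustrates why per-step efficiency limits the length of polymers accessible by sequential addition.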

More recently, the same two groups have reported autonomous DNA-based multistep assemblers that do not require researcher intervention at each step of monomer incorporation. He and Liu reported a DNAzyme-based nanowalker that carried out a three-step synthesis on a predefined template (Figure 1.10a).258 O’Reilly, Turberfield, and coworkers devised a system based on the one-dimensional hairpin hybridization chain reaction that allowed the synthesis of homo-7-mers and the combinatorial synthesis of heterotetramers by consecutive amide coupling. The sequence of bond-forming events was recorded in the dsDNA that formed from the hairpin hybridizations and could be deciphered by sequencing (Figure 1.10b).259

Figure 1.10 | Autonomous DNA-based polymer assembler systems. (a) A DNAzyme-based nanowalker that performs a chain-growing building block transfer reaction at each step of translocation. (b) An autonomous system that performs a chain-growing building block transfer reaction at alternate steps in a one-dimensional hairpin hybridization chain reaction.

Figure 1.11 | Polymerization of DNA template-linked building block arrays. (a) Synthesis of conjugated conducting polymers from DNA-linked 2,5-bis(2-thienyl)pyrrole and aniline building blocks. DNA structure pre-organizes the building blocks into cyclic (top) or linear (bottom) arrays of defined sequences. (b) Synthesis of nylon nucleic acid from an array of DNA-linked alternating dicarboxylate and diamine building blocks.

As an alternative to the DNA-directed chain growth strategies described above, the DNA-templated synthesis of non-nucleic-acid polymers could also proceed through a mechanism in which an entire array of nucleic-acid-linked building blocks is assembled on a DNA template before polymerization is initiated. In a series of studies, Schuster and coworkers demonstrated that arrays of aniline260,261, 4-aminobiphenyl262, thieno[3,2-b]pyrrole263, or 2,5-bis(2-thienyl)pyrrole264 building blocks linked to the N4 of properly spaced cytosines in dsDNA could be polymerized under mildly oxidizing conditions (H2O2 with horseradish peroxidase) into DNA-linked highly conjugated linear polymers. Moreover, using nonlinear DNA structures such as three-way junctions as templates, cyclic polymers could be synthesized.265 More recently, the same group demonstrated the syntheses of linear and cyclic sequence-defined copolymers of aniline and 2,5-bis(2-thienyl)pyrrole using this strategy (Figure 1.11a).266

Seeman, Canary, and coworkers reported a DNA-templated strategy for synthesizing nucleic acid strands with a polyamide backbone (“nylon nucleic acid”). An array of alternating dicarboxylate and diamine building blocks was linked to the 2’-positions of consecutive nucleotides on one strand in a dsDNA duplex. The building blocks were then oligomerized with DMTMM-promoted amide coupling (Figure 1.11b). The reaction proceeded quite cleanly for the formation of up to 8-mers, while a control reaction performed on a building-block-carrying unhybridized ssDNA was much less efficient, demonstrating the templating effect of the rather rigid dsDNA.267

Liu and coworkers reported a strategy for translating DNA into sequence-defined non-nucleic-acid polymers. Building blocks equipped with azide and alkyne coupling handles were tethered through cleavable disulfide linkers to short oligomers of PNA to form macrocycles. In a polymerization reaction, a DNA template would direct the assembly of an array of those macrocycles through DNA-PNA hybridization, thereby arranging the polymer building blocks in the order specified by the template DNA sequence. Addition of copper catalyst would then initiate the polymerization, after which the PNA adapters were cleaved off, leaving behind a sequence-defined polymer linked to its DNA template at one terminus but otherwise free to potentially fold into three-dimensional structures. Other than the mandatory click reaction handles and basic chemical compatibility, this system in principle imposes no restriction on the structure of the polymer backbone. The authors demonstrated the templated oligomerization of building blocks with polyethylene glycol, α-(D)-peptide, or β-peptide backbones. Up to sixteen consecutive building blocks could be incorporated (Figure 1.12).17

Figure 1.12 | Enzyme-free translation of DNA into sequence-defined non-nucleic-acid polymers. (a) Reaction scheme. Each macrocyclic building block consists of a synthetic polymer segment equipped with alkyne and azide handles and tethered through cleavable disulfide linkers to short oligomers of PNA. PNA hybridization to template DNA assembles the polymer segments into a defined sequence. The polymerization proceeds through a copper(I)-catalyzed click reaction. The PNA adapters are then cleaved off to give a template-linked synthetic polymer. (b) Structure of a representative macrocyclic building block.
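The template-directed ordering described for this system can be illustrated with a minimal Python sketch. It is only a conceptual model, not the authors' procedure: the codon length of five, the codon sequences, and the names `CODON_TO_BLOCK`, `translate_template`, and `couple` are all assumptions introduced for illustration. Each entry in the table stands for one PNA adapter (whose PNA strand would be complementary to the listed template codon) carrying one polymer building block.

```python
from typing import Dict, List

# Hypothetical codon table (illustrative only): each template codon is
# recognized by one PNA adapter that carries one building block.
CODON_TO_BLOCK: Dict[str, str] = {
    "ATCGA": "PEG",
    "GGTCA": "beta-peptide",
    "CTTAG": "alpha-D-peptide",
}


def translate_template(template: str, codon_len: int = 5) -> List[str]:
    """Return building blocks in the order specified by the template.

    Models the hybridization step: consecutive codons on the template
    recruit their matching adapters side by side.
    """
    if len(template) % codon_len != 0:
        raise ValueError("template length must be a multiple of the codon length")
    blocks = []
    for start in range(0, len(template), codon_len):
        codon = template[start:start + codon_len]
        if codon not in CODON_TO_BLOCK:
            raise KeyError(f"no adapter recognizes codon {codon!r}")
        blocks.append(CODON_TO_BLOCK[codon])
    return blocks


def couple(blocks: List[str]) -> str:
    """Model the coupling step by joining adjacent blocks into one chain."""
    return "-".join(blocks)


if __name__ == "__main__":
    ordered = translate_template("ATCGAGGTCACTTAG")
    print(couple(ordered))
```

In this toy model the product sequence depends only on the order of codons in the template, which is the property that would make such polymers genetically encoded.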

Taken together, the studies reviewed above highlight the potential of DNA-templated chemistry for synthesizing sequence-defined polymers ranging from peptidomimetics to conjugated conducting polymers. Future developments may enable fine-tuned synthesis of more types of polymers that are traditionally produced as bulk materials. Improvements in the construction of template-linked combinatorial chemical libraries may one day allow researchers to perform laboratory evolution on diverse sequence-defined synthetic polymers.

1.6 – Conclusion and Outlook

The templated synthesis of non-natural nucleic acid polymers has a long history dating back to shortly after the discovery of the central dogma. Since the invention of SELEX, the prospect of subjecting non-natural nucleic acids to directed evolution, both for discovering nucleic acids with novel function and desirable in vivo properties, and for illuminating the plausibility of alternative nucleic acids as primordial genetic materials, has accelerated efforts to explore and expand the substrate scope of various DNA and RNA polymerases. Many nucleic acids with non-canonical nucleobase, sugar, or backbone structures have now been synthesized through polymerase-mediated transcription or primer extension. As the research community’s abilities to synthesize nucleotide analogs and to engineer and evolve polymerases continue to grow, so will the repertoire of evolvable nucleic acid polymers, which will lead to exciting new insights into nucleic acid structure and function. Together with research into non-enzymatic polymerization of nucleic acids under prebiotically plausible conditions, the study of xenonucleic acids may also help illuminate the chemical origins of Darwinian evolution.

Ribosomal incorporation of non-canonical building blocks into peptidic polymers is a formidable challenge. A decade ago, the convergence of PURE and Flexizyme technologies gave researchers the ability to systematically explore the substrate preference of ribosomal synthesis machinery in vitro, and to subject some nucleic-acid-displayed non-natural peptide libraries to selection. Expanding the substrate scope of ribosomal translation, on the other hand, is still difficult, particularly with regard to the formation of polymers with non-canonical backbone elements, and will likely require optimizing the interaction between tRNA and elongation factors221,222,259,268–272, engineering the ribosome (which has improved incorporation of single D-amino acids and β-amino acids273–278), or pursuing other strategies.

DNA-templated non-enzymatic synthesis methods, which have succeeded in constructing DNA-encoded small molecule libraries, have also been used to generate increasingly long and diverse synthetic polymers. In addition to synthesizing sequence-defined polymers that have traditionally been produced as stochastic mixtures, DNA-templated methods have also been demonstrated to enable the combinatorial synthesis of genetically encoded polymers without the constraints imposed by enzymes. Current methods are still limited by relatively low yields or requirements for extensive researcher intervention at each step. Nonetheless, recent developments suggest a future in which researchers can routinely perform Darwinian evolution on polymers with diverse structures to access structures and functions beyond those available to proteins and nucleic acids.

References

1. Matzura, H. & Eckstein, F. A Polyribonucleotide Containing Alternating P=O and P=S Linkages. Eur. J. Biochem. 3, 448–452 (1968).

2. Eckstein, F. & Gindl, H. Polyribonucleotides containing a thiophosphate backbone. FEBS Lett. 2, 262–264 (1969).

3. Langer, P. R., Waldrop, A. A. & Ward, D. C. Enzymatic synthesis of biotin-labeled polynucleotides: novel nucleic acid affinity probes. Proc. Natl. Acad. Sci. 78, 6633–6637 (1981).

4. Noren, C. J., Anthony-Cahill, S. J., Griffith, M. C. & Schultz, P. G. A general method for site-specific incorporation of unnatural amino acids into proteins. Science 244, 182–188 (1989).

5. Tuerk, C. & Gold, L. Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science 249, 505–510 (1990).

6. Ellington, A. D. & Szostak, J. W. In vitro selection of RNA molecules that bind specific ligands. Nature 346, 818–822 (1990).

7. Robertson, D. L. & Joyce, G. F. Selection in vitro of an RNA enzyme that specifically cleaves single-stranded DNA. Nature 344, 467–468 (1990).

8. Lin, Y., Qiu, Q., Gill, S. C. & Jayasena, S. D. Modified RNA sequence pools for in vitro selection. Nucleic Acids Res. 22, 5229–5234 (1994).

9. Pieken, W. A., Olsen, D. B., Benseler, F., Aurup, H. & Eckstein, F. Kinetic characterization of ribonuclease-resistant 2’-modified hammerhead ribozymes. Science 253, 314–317 (1991).

10. Beigelman, L. et al. Chemical Modification of Hammerhead Ribozymes: Catalytic activity and nuclease resistance. J. Biol. Chem. 270, 25702–25708 (1995).

11. Latham, J. A., Johnson, R. & Toole, J. J. The application of a modified nucleotide in aptamer selection: novel thrombin aptamers containing 5-(1-pentynyl)-2′-deoxyuridine. Nucleic Acids Res. 22, 2817–2822 (1994).

12. Nemoto, N., Miyamoto-Sato, E., Husimi, Y. & Yanagawa, H. In vitro virus: Bonding of mRNA bearing puromycin at the 3’-terminal end to the C-terminal end of its encoded protein on the ribosome in vitro. FEBS Lett. 414, 405–408 (1997).

13. Roberts, R. W. & Szostak, J. W. RNA-peptide fusions for the in vitro selection of peptides and proteins. Proc. Natl. Acad. Sci. U. S. A. 94, 12297–12302 (1997).

14. Chen, T. & Romesberg, F. E. Directed polymerase evolution. FEBS Lett. 588, 219–229 (2014).

15. Laos, R., Thomson, J. M. & Benner, S. A. DNA polymerases engineered by directed evolution to incorporate non-standard nucleotides. Front. Microbiol. 5, 565 (2014).

16. Ohuchi, M., Murakami, H. & Suga, H. The flexizyme system: a highly flexible tRNA aminoacylation tool for the translation apparatus. Curr. Opin. Chem. Biol. 11, 537–542 (2007).

17. Niu, J., Hili, R. & Liu, D. R. Enzyme-free translation of DNA into sequence-defined synthetic polymers structurally unrelated to nucleic acids. Nat. Chem. 5, 282–292 (2013).

18. Eschenmoser, A. Chemical Etiology of Nucleic Acid Structure. Science 284, 2118–2124 (1999).

19. Pinheiro, V. B. & Holliger, P. The XNA world: progress towards replication and evolution of synthetic genetic polymers. Curr. Opin. Chem. Biol. 16, 245–252 (2012).

20. Chaput, J. C., Yu, H. & Zhang, S. The Emerging World of Synthetic Genetics. Chem. Biol. 19, 1360–1371 (2012).

21. Pinheiro, V. B. & Holliger, P. Towards XNA nanotechnology: new materials from synthetic genetic polymers. Trends Biotechnol. 32, 321–328 (2014).

22. Eckstein, F. Nucleoside Phosphorothioates. Annu. Rev. Biochem. 54, 367–402 (1985).

23. Gish, G. & Eckstein, F. DNA and RNA sequence determination based on phosphorothioate chemistry. Science 240, 1520–1522 (1988).

24. Nakamaye, K. L., Gish, G., Eckstein, F. & Vosberg, H.-P. Direct sequencing of polymerase chain reaction amplified DNA fragments through the incorporation of deoxynucleoside α-thiotriphosphates. Nucleic Acids Res. 16, 9947–9959 (1988).

25. Jhaveri, S., Olwin, B. & Ellington, A. D. In vitro selection of phosphorothiolated aptamers. Bioorg. Med. Chem. Lett. 8, 2285–2290 (1998).

26. King, D. J., Ventura, D. A., Brasier, A. R. & Gorenstein, D. G. Novel Combinatorial Selection of Phosphorothioate Oligonucleotide Aptamers. Biochemistry 37, 16489–16493 (1998).

27. King, D. J. et al. Combinatorial Selection and Binding of Phosphorothioate Aptamers Targeting Human NF-κB RelA(p65) and p50. Biochemistry 41, 9696–9706 (2002).

28. Bassett, S. E. et al. Combinatorial Selection and Edited Combinatorial Selection of Phosphorothioate Aptamers Targeting Human Nuclear Factor-κB RelA/p50 and RelA/RelA. Biochemistry 43, 9105–9115 (2004).

29. Kang, J., Lee, M. S., Watowich, S. J. & Gorenstein, D. G. Combinatorial selection of a RNA thioaptamer that binds to Venezuelan equine encephalitis virus capsid protein. FEBS Lett. 581, 2497–2502 (2007).

30. Kang, J., Lee, M. S., Copland, J. A., Luxon, B. A. & Gorenstein, D. G. Combinatorial selection of a single stranded DNA thioaptamer targeting TGF-β1 protein. Bioorg. Med. Chem. Lett. 18, 1835–1839 (2008).

31. Somasunderam, A. et al. Combinatorial Selection of DNA Thioaptamers Targeted to the HA Binding Domain of Human CD44. Biochemistry 49, 9106–9112 (2010).

32. Mann, A. P. et al. Identification of Thioaptamer Ligand against E-Selectin: Potential Application for Inflamed Vasculature Targeting. PLoS ONE 5, e13050 (2010).

33. Gandham, S. H. A., Volk, D. E., Lokesh, G. L. R., Neerathilingam, M. & Gorenstein, D. G. Thioaptamers targeting dengue virus type-2 envelope protein domain III. Biochem. Biophys. Res. Commun. 453, 309–315 (2014).

34. Carrasco, N. & Huang, Z. Enzymatic Synthesis of Phosphoroselenoate DNA Using Thymidine 5′-(α-P-seleno)triphosphate and DNA Polymerase for X-ray Crystallography via MAD. J. Am. Chem. Soc. 126, 448–449 (2004).

35. Carrasco, N., Caton-Williams, J., Brandt, G., Wang, S. & Huang, Z. Efficient Enzymatic Synthesis of Phosphoroselenoate RNA by Using 5′-(α-P-Seleno)triphosphate. Angew. Chem. Int. Ed. 45, 94–97 (2006).

36. Porter, K. W., Briley, J. D. & Shaw, B. R. Direct PCR sequencing with boronated nucleotides. Nucleic Acids Res. 25, 1611–1617 (1997).

37. Sergueev, D., Hasan, A., Ramaswamy, M. & Shaw, B. R. Boranophosphate Oligonucleotides: New Synthetic Approaches. Nucleosides Nucleotides 16, 1533–1538 (1997).

38. Wan, J. & Shaw, B. R. Incorporation of ribonucleoside 5’-(α-P-borano)triphosphates into a 20-mer RNA by T7 RNA polymerase. Nucleosides Nucleotides Nucleic Acids 24, 943–946 (2005).

39. Lato, S. M. et al. Boron-containing aptamers to ATP. Nucleic Acids Res. 30, 1401–1407 (2002).

40. Hall, A. H. S., Wan, J., Shaughnessy, E. E., Ramsay Shaw, B. & Alexander, K. A. RNA interference using boranophosphate siRNAs: structure-activity relationships. Nucleic Acids Res. 32, 5991–6000 (2004).

41. Hall, A. H. S. et al. High potency silencing by single-stranded boranophosphate siRNA. Nucleic Acids Res. 34, 2773–2781 (2006).

42. Higuchi, H., Endo, T. & Kaji, A. Enzymic synthesis of oligonucleotides containing methylphosphonate internucleotide linkages. Biochemistry 29, 8747–8753 (1990).

43. Victorova, L. S. et al. Formation of phosphonester bonds catalyzed by DNA polymerase. Nucleic Acids Res. 20, 783–789 (1992).

44. Dineva, M. A., Chakurov, S., Bratovanova, E. K., Devedjiev, I. & Petkov, D. D. Complete template-directed enzymatic synthesis of a potential antisense DNA containing 42 methylphosphonodiester bonds. Bioorg. Med. Chem. 1, 411–414 (1993).

45. Higgins, E. J., Miller, T. J. & Gilham, P. T. The ATP Analogue Containing a Methylene Group in Place of the 5′ Oxygen Serves as a Substrate for Bacteriophage T3 RNA Polymerase. Nucleosides Nucleotides 14, 1671–1674 (1995).

46. Bell, M. A., Gough, G. R. & Gilham, P. T. The Analogue of Triphosphate Containing a Methylene Group in Place of the 5′ Oxygen Can Serve as a Substrate for Reverse Transcriptase. Nucleosides Nucleotides Nucleic Acids 22, 405–417 (2003).

47. Renders, M. et al. Enzymatic Synthesis of Phosphonomethyl Oligonucleotides by Therminator Polymerase. Angew. Chem. Int. Ed. 46, 2501–2504 (2007).

48. Renders, M., Lievrouw, R., Krecmerová, M., Holý, A. & Herdewijn, P. Enzymatic Polymerization of Phosphonate Nucleosides. ChemBioChem 9, 2883–2888 (2008).

49. Hocek, M. Synthesis of Base-Modified 2′-Deoxyribonucleoside Triphosphates and Their Use in Enzymatic Synthesis of Modified DNA for Applications in Bioanalysis and Chemical Biology. J. Org. Chem. 79, 9914–9921 (2014).

50. Hollenstein, M. Nucleoside Triphosphates – Building Blocks for the Modification of Nucleic Acids. Molecules 17, 13569–13591 (2012).

51. Jäger, S. et al. A Versatile Toolbox for Variable DNA Functionalization at High Density. J. Am. Chem. Soc. 127, 15071–15082 (2005).

52. Welter, M., Verga, D. & Marx, A. Sequence-Specific Incorporation of Enzyme-Nucleotide Chimera by DNA Polymerases. Angew. Chem. Int. Ed. 55, 10131–10135 (2016).

53. Baccaro, A., Steck, A.-L. & Marx, A. Barcoded Nucleotides. Angew. Chem. Int. Ed. 51, 254–257 (2012).

54. Obeid, S., Baccaro, A., Welte, W., Diederichs, K. & Marx, A. Structural basis for the synthesis of nucleobase modified DNA by Thermus aquaticus DNA polymerase. Proc. Natl. Acad. Sci. U. S. A. 107, 21327–21331 (2010).

55. Obeid, S., Bußkamp, H., Welte, W., Diederichs, K. & Marx, A. Interactions of non-polar and “Click-able” nucleotides in the confines of a DNA polymerase active site. Chem. Commun. 48, 8320–8322 (2012).

56. Obeid, S., Bußkamp, H., Welte, W., Diederichs, K. & Marx, A. Snapshot of a DNA Polymerase while Incorporating Two Consecutive C5-Modified Nucleotides. J. Am. Chem. Soc. 135, 15667–15669 (2013).

57. Bergen, K. et al. Structures of KlenTaq DNA Polymerase Caught While Incorporating C5-Modified Pyrimidine and C7-Modified 7-Deazapurine Nucleoside Triphosphates. J. Am. Chem. Soc. 134, 11840–11843 (2012).

58. Ramsay, N. et al. CyDNA: Synthesis and Replication of Highly Cy-Dye Substituted DNA by an Evolved Polymerase. J. Am. Chem. Soc. 132, 5096–5104 (2010).

59. Wynne, S. A., Pinheiro, V. B., Holliger, P. & Leslie, A. G. W. Structures of an Apo and a Binary Complex of an Evolved Archeal B Family DNA Polymerase Capable of Synthesising Highly Cy-Dye Labelled DNA. PLoS ONE 8, e70892 (2013).

60. Siegmund, V., Santner, T., Micura, R. & Marx, A. Screening mutant libraries of T7 RNA polymerase for candidates with increased acceptance of 2′-modified nucleotides. Chem. Commun. 48, 9870–9872 (2012).

61. Hoshino, H. et al. Consecutive incorporation of functionalized nucleotides with amphiphilic side chains by novel KOD polymerase mutant. Bioorg. Med. Chem. Lett. 26, 530–533 (2016).

62. MacPherson, I. S. et al. Multivalent Glycocluster Design through Directed Evolution. Angew. Chem. Int. Ed. 50, 11238–11242 (2011).

63. Tolle, F., Brändle, G. M., Matzner, D. & Mayer, G. A Versatile Approach Towards Nucleobase-Modified Aptamers. Angew. Chem. Int. Ed. 54, 10971–10974 (2015).

64. Vaught, J. D. et al. Expanding the chemistry of DNA for in vitro selection. J. Am. Chem. Soc. 132, 4141–4151 (2010).

65. Gold, L. et al. Aptamer-Based Multiplexed Proteomic Technology for Biomarker Discovery. PLoS ONE 5, e15004 (2010).

66. Davies, D. R. et al. Unique motifs and hydrophobic interactions shape the binding of modified DNA ligands to protein targets. Proc. Natl. Acad. Sci. U. S. A. 109, 19971–19976 (2012).

67. Gupta, S. et al. Chemically-Modified DNA Aptamers Bind Interleukin-6 with High Affinity and Inhibit Signaling by Blocking its Interaction with Interleukin-6 Receptor. J. Biol. Chem. 289, 8706–8719 (2014).

68. Gelinas, A. D. et al. Crystal Structure of Interleukin-6 in Complex with a Modified Nucleic Acid Ligand. J. Biol. Chem. 289, 8720–8734 (2014).

69. Imaizumi, Y. et al. Efficacy of Base-Modification on Target Binding of Small Molecule DNA Aptamers. J. Am. Chem. Soc. 135, 9412–9419 (2013).

70. Shoji, A., Kuwahara, M., Ozaki, H. & Sawai, H. Modified DNA Aptamer That Binds the (R)-Isomer of a Thalidomide Derivative with High Enantioselectivity. J. Am. Chem. Soc. 129, 1456–1464 (2007).

71. Gawande, B. N. et al. Selection of DNA aptamers with two modified bases. Proc. Natl. Acad. Sci. U. S. A. 114, 2898–2903 (2017).

72. Santoro, S. W., Joyce, G. F., Sakthivel, K., Gramatikova, S. & Barbas, C. F. RNA cleavage by a DNA enzyme with extended chemical functionality. J. Am. Chem. Soc. 122, 2433–2439 (2000).

73. Sidorov, A. V., Grasby, J. A. & Williams, D. M. Sequence-specific cleavage of RNA in the absence of divalent metal ions by a DNAzyme incorporating imidazolyl and amino functionalities. Nucleic Acids Res. 32, 1591–1601 (2004).

74. Zhou, C. et al. DNA-Catalyzed Amide Hydrolysis. J. Am. Chem. Soc. 138, 2106–2109 (2016).

75. Dewey, T. M., Mundt, A., Crouch, G. J., Zyzniewski, M. C. & Eaton, B. E. New Derivatives for Systematic Evolution of RNA Ligands by Exponential Enrichment. J. Am. Chem. Soc. 117, 8474–8475 (1995).

76. Vaught, J. D., Dewey, T. & Eaton, B. E. T7 RNA Polymerase Transcription with 5-Position Modified UTP Derivatives. J. Am. Chem. Soc. 126, 11231–11237 (2004).

77. Cahová, H. et al. Synthesis of 8-bromo-, 8-methyl- and 8-phenyl-dATP and their polymerase incorporation into DNA. Org. Biomol. Chem. 6, 3657–3660 (2008).

78. Perrin, D. M., Garestier, T. & Hélène, C. Expanding the catalytic repertoire of nucleic acid catalysts: simultaneous incorporation of two modified deoxyribonucleoside triphosphates bearing ammonium and imidazolyl functionalities. Nucleosides Nucleotides 18, 377–391 (1999).

79. Perrin, D. M., Garestier, T. & Hélène, C. Bridging the Gap between Proteins and Nucleic Acids: A Metal-Independent RNAseA Mimic with Two Protein-Like Functionalities. J. Am. Chem. Soc. 123, 1556–1563 (2001).

80. Lermer, L., Roupioz, Y., Ting, R. & Perrin, D. M. Toward an RNaseA mimic: A DNAzyme with imidazoles and cationic amines. J. Am. Chem. Soc. 124, 9960–9961 (2002).

81. Hollenstein, M., Hipolito, C. J., Lam, C. H. & Perrin, D. M. A self-cleaving DNA enzyme modified with amines, guanidines and imidazoles operates independently of divalent metal cations (M2+). Nucleic Acids Res. 37, 1638–1649 (2009).

82. Ito, Y., Suzuki, A., Kawazoe, N. & Imanishi, Y. In Vitro Selection of RNA Aptamers Carrying Multiple Biotin Groups in the Side Chains. Bioconjug. Chem. 12, 850–854 (2001).

83. Schoetzau, T. et al. Aminomodified Nucleobases: Functionalized Nucleoside Triphosphates Applicable for SELEX. Bioconjug. Chem. 14, 919–926 (2003).

84. Hirao, I., Kimoto, M. & Yamashige, R. Natural versus Artificial Creation of Base Pairs in DNA: Origin of Nucleobases from the Perspectives of Unnatural Base Pair Studies. Acc. Chem. Res. 45, 2055–2065 (2012).

85. Malyshev, D. A. & Romesberg, F. E. The Expanded Genetic Alphabet. Angew. Chem. Int. Ed. 54, 11930–11944 (2015).

86. Benner, S. A. et al. Alternative Watson–Crick Synthetic Genetic Systems. Cold Spring Harb. Perspect. Biol. 8, (2016).

87. Yang, Z., Chen, F., Alvarado, J. B. & Benner, S. A. Amplification, Mutation, and Sequencing of a Six-Letter Synthetic Genetic System. J. Am. Chem. Soc. 133, 15105–15112 (2011).

88. Yamashige, R. et al. Highly specific unnatural base pair systems as a third base pair for PCR amplification. Nucleic Acids Res. 40, 2793–2806 (2012).

89. Li, L. et al. Natural-like Replication of an Unnatural Base Pair for the Expansion of the Genetic Alphabet and Biotechnology Applications. J. Am. Chem. Soc. 136, 826–829 (2014).

90. Leal, N. A. et al. Transcription, Reverse Transcription, and Analysis of RNA Containing Artificial Genetic Components. ACS Synth. Biol. 4, 407–413 (2015).

91. Seo, Y. J., Matsuda, S. & Romesberg, F. E. Transcription of an Expanded Genetic Alphabet. J. Am. Chem. Soc. 131, 5046–5047 (2009).

92. Seo, Y. J., Malyshev, D. A., Lavergne, T., Ordoukhanian, P. & Romesberg, F. E. Site-Specific Labeling of DNA and RNA Using an Efficiently Replicated and Transcribed Class of Unnatural Base Pairs. J. Am. Chem. Soc. 133, 19878–19888 (2011).

93. Kimoto, M., Sato, A., Kawai, R., Yokoyama, S. & Hirao, I. Site-specific incorporation of functional components into RNA by transcription using unnatural base pair systems. Nucleic Acids Symp. Ser. 53, 73–74 (2009).

94. Morohashi, N., Kimoto, M., Sato, A., Kawai, R. & Hirao, I. Site-Specific Incorporation of Functional Components into RNA by an Unnatural Base Pair Transcription System. Molecules 17, 2855–2876 (2012).

95. Sefah, K. et al. In vitro selection with artificial expanded genetic information systems. Proc. Natl. Acad. Sci. 111, 1449–1454 (2014).

96. Zhang, L. et al. Evolution of functional six-nucleotide DNA. J. Am. Chem. Soc. 137, 6734–6737 (2015).

97. Zhang, L. et al. Aptamers against Cells Overexpressing Glypican 3 from Expanded Genetic Systems Combined with Cell Engineering and Laboratory Evolution. Angew. Chem. Int. Ed. 55, 12372–12375 (2016).

98. Biondi, E. et al. Laboratory evolution of artificially expanded DNA gives redesignable aptamers that target the toxic form of anthrax protective antigen. Nucleic Acids Res. 44, 9565–9577 (2016).

99. Kimoto, M., Yamashige, R., Matsunaga, K., Yokoyama, S. & Hirao, I. Generation of high-affinity DNA aptamers using an expanded genetic alphabet. Nat. Biotechnol. 31, 453–457 (2013).

100. Matsunaga, K., Kimoto, M. & Hirao, I. High-Affinity DNA Aptamer Generation Targeting von Willebrand Factor A1-Domain by Genetic Alphabet Expansion for Systematic Evolution of Ligands by Exponential Enrichment Using Two Types of Libraries Composed of Five Different Bases. J. Am. Chem. Soc. 139, 324–334 (2017).

101. Malyshev, D. A. et al. A semi-synthetic organism with an expanded genetic alphabet. Nature 509, 385–388 (2014).

102. Zhang, Y. et al. A semisynthetic organism engineered for the stable expansion of the genetic alphabet. Proc. Natl. Acad. Sci. 114, 1317 (2017).

103. Feldman, A. W., Dien, V. T. & Romesberg, F. E. Chemical Stabilization of Unnatural Nucleotide Triphosphates for the in Vivo Expansion of the Genetic Alphabet. J. Am. Chem. Soc. 139, 2464–2467 (2017).

104. Morris, S. E., Feldman, A. W. & Romesberg, F. E. Parts for the Storage of Increased Genetic Information in Cells. ACS Synth. Biol. 6, 1834–1840 (2017).

105. Zhang, Y. et al. A semi-synthetic organism that stores and retrieves increased genetic information. Nature 551, 644 (2017).

106. Feldman, A. W. & Romesberg, F. E. In Vivo Structure–Activity Relationships and Optimization of an Unnatural Base Pair for Replication in a Semi-Synthetic Organism. J. Am. Chem. Soc. 139, 11427–11433 (2017).

107. Li, Z. et al. Site-Specifically Arraying Small Molecules or Proteins on DNA Using An Expanded Genetic Alphabet. Chem. Eur. J. 19, 14205–14209 (2013).

108. Kimoto, M., Kawai, R., Mitsui, T., Yokoyama, S. & Hirao, I. An unnatural base pair system for efficient PCR amplification and functionalization of DNA molecules. Nucleic Acids Res. 37, e14 (2008).

109. Okamoto, I., Miyatake, Y., Kimoto, M. & Hirao, I. High Fidelity, Efficiency and Functionalization of Ds–Px Unnatural Base Pairs in PCR Amplification for a Genetic Alphabet Expansion System. ACS Synth. Biol. 5, 1220–1230 (2016).

110. Tarashima, N., Komatsu, Y., Furukawa, K. & Minakawa, N. Faithful PCR Amplification of an Unnatural Base-Pair Analogue with Four Hydrogen Bonds. Chem. Eur. J. 21, 10688–10695 (2015).

111. Aurup, H., Williams, D. M. & Eckstein, F. 2’-Fluoro and 2-amino-2’-deoxynucleoside 5’-triphosphates as substrates for T7 RNA polymerase. Biochemistry 31, 9636–9641 (1992).

112. Jellinek, D. et al. Potent 2’-Amino-2’-deoxypyrimidine RNA Inhibitors of Basic Fibroblast Growth Factor. Biochemistry 34, 11363–11372 (1995).

113. Green, L. S. et al. Nuclease-resistant nucleic acid ligands to vascular permeability factor/vascular endothelial growth factor. Chem. Biol. 2, 683–695 (1995).

114. Ruckman, J. et al. 2’-Fluoropyrimidine RNA-based Aptamers to the 165-Amino Acid Form of Vascular Endothelial Growth Factor (VEGF165): Inhibition of receptor binding and VEGF-induced vascular permeability through interactions requiring the exon 7-encoded domain. J. Biol. Chem. 273, 20556–20567 (1998).

115. Ng, E. W. M. et al. Pegaptanib, a targeted anti-VEGF aptamer for ocular vascular disease. Nat. Rev. Drug Discov. 5, 123 (2006).

116. Padilla, R. & Sousa, R. Efficient synthesis of nucleic acids heavily modified with non-canonical ribose 2’-groups using a mutant T7 RNA polymerase (RNAP). Nucleic Acids Res. 27, 1561–1563 (1999).

117. Zhu, B. et al. Synthesis of 2′-Fluoro RNA by Syn5 RNA polymerase. Nucleic Acids Res. 43, e94 (2015).

118. Padilla, R. & Sousa, R. A Y639F/H784A T7 RNA polymerase double mutant displays superior properties for synthesizing RNAs with non-canonical NTPs. Nucleic Acids Res. 30, e138 (2002).

119. Chelliserrykattil, J. & Ellington, A. D. Evolution of a T7 RNA polymerase variant that transcribes 2′-O-methyl RNA. Nat. Biotechnol. 22, 1155–1160 (2004).

120. Ibach, J. et al. Identification of a T7 RNA polymerase variant that permits the enzymatic synthesis of fully 2′-O-methyl-modified RNA. J. Biotechnol. 167, 287–295 (2013).

121. Meyer, A. J. et al. Transcription yield of fully 2′-modified RNA can be increased by the addition of thermostabilizing mutations to T7 RNA polymerase mutants. Nucleic Acids Res. 43, 7480–7488 (2015).

122. Burmeister, P. E. et al. Direct In Vitro Selection of a 2′-O-Methyl Aptamer to VEGF. Chem. Biol. 12, 25–33 (2005).

123. Burmeister, P. E. et al. 2’-Deoxy Purine, 2’-O-Methyl Pyrimidine (dRmY) Aptamers as Candidate Therapeutics. Oligonucleotides 16, 337–351 (2006).

124. Friedman, A. D., Kim, D. & Liu, R. Highly stable aptamers selected from a 2′-fully modified fGmH RNA library for targeting biomaterials. Biomaterials 36, 110–123 (2015).

125. Raines, K. & Gottlieb, P. A. Enzymatic incorporation of 2’-thio-CTP into the HDV ribozyme. RNA 4, 340–345 (1998).

126. Pavey, J. B. J., Lawrence, A. J., O’Neil, I. A., Vortler, S. & Cosstick, R. Synthesis and transcription studies on 5′-triphosphates derived from 2′-C-branched-uridines: 2′-homouridine-5′-triphosphate is a substrate for T7 RNA polymerase. Org. Biomol. Chem. 2, 869–875 (2004).

127. Siegmund, V., Santner, T., Micura, R. & Marx, A. Enzymatic synthesis of 2′-methylseleno-modified RNA. Chem. Sci. 2, 2224–2231 (2011).

128. Ono, T., Scalf, M. & Smith, L. M. 2’-Fluoro modified nucleic acids: polymerase-directed synthesis, properties and stability to analysis by matrix-assisted laser desorption/ionization mass spectrometry. Nucleic Acids Res. 25, 4581–4588 (1997).

129. Kasuya, T., Hori, S., Hiramatsu, H. & Yanagimoto, T. Highly accurate synthesis of the fully 2′-fluoro-modified oligonucleotide by Therminator DNA polymerases. Bioorg. Med. Chem. Lett. 24, 2134–2136 (2014).

130. Cozens, C., Pinheiro, V. B., Vaisman, A., Woodgate, R. & Holliger, P. A short adaptive path from DNA to RNA polymerases. Proc. Natl. Acad. Sci. U. S. A. 109, 8067–8072 (2012).

131. Chen, T. et al. Evolution of thermophilic DNA polymerases for the recognition and amplification of C2ʹ-modified DNA. Nat. Chem. 8, 556–562 (2016).

132. Liu, Z., Chen, T. & Romesberg, F. E. Evolved polymerases facilitate selection of fully 2′-OMe-modified aptamers. Chem. Sci. 8, 8179–8182 (2017).

133. Thirunavukarasu, D., Chen, T., Liu, Z., Hongdilokkul, N. & Romesberg, F. E. Selection of 2′-Fluoro-Modified Aptamers with Optimized Properties. J. Am. Chem. Soc. 139, 2892–2895 (2017).

134. Chen, T. & Romesberg, F. E. Enzymatic Synthesis, Amplification, and Application of DNA with a Functionalized Backbone. Angew. Chem. Int. Ed. 56, 14046–14051 (2017).

135. Inoue, N. et al. Amplification of 4′-ThioDNA in the Presence of 4′-Thio-dTTP and 4′-Thio-dCTP, and 4′-ThioDNA-Directed Transcription in Vitro and in Mammalian Cells. J. Am. Chem. Soc. 129, 15424–15425 (2007).

136. Kojima, T. et al. PCR Amplification of 4′-ThioDNA Using 2′-Deoxy-4′-thionucleoside 5′-Triphosphates. ACS Synth. Biol. 2, 529–536 (2013).

137. Maruyama, H., Furukawa, K., Kamiya, H., Minakawa, N. & Matsuda, A. Transcription of 4′-thioDNA templates to natural RNA in vitro and in mammalian cells. Chem. Commun. 51, 7887–7890 (2015).

138. Kato, Y. et al. New NTP analogs: the synthesis of 4’-thioUTP and 4’-thioCTP and their utility for SELEX. Nucleic Acids Res. 33, 2942–2951 (2005).

139. Minakawa, N., Sanji, M., Kato, Y. & Matsuda, A. Investigations toward the selection of fully-modified 4′-thioRNA aptamers: Optimization of in vitro transcription steps in the presence of 4′-thioNTPs. Bioorg. Med. Chem. 16, 9450–9456 (2008).

140. Tarashima, N. et al. Synthesis of DNA fragments containing 2′-deoxy-4′-selenonucleoside units using DNA polymerases: comparison of dNTPs with O, S and Se at the 4′-position in replication. Org. Biomol. Chem. 13, 6949–6952 (2015).

141. Jiang, C. et al. Synthesis and recognition of novel isonucleoside triphosphates by DNA polymerases. Bioorg. Med. Chem. 15, 3019–3025 (2007).

142. Ogino, T., Sato, K. & Matsuda, A. Incorporation of 2′-Deoxy-2′-isonucleoside 5′-Triphosphates (iNTPs) into DNA by A- and B-Family DNA Polymerases with Different Recognition Mechanisms. ChemBioChem 11, 2597–2605 (2010).

143. Kataoka, M., Kouda, Y., Sato, K., Minakawa, N. & Matsuda, A. Highly efficient enzymatic synthesis of 3′-deoxyapionucleic acid (apioNA) having the four natural nucleobases. Chem. Commun. 47, 8700–8702 (2011).

144. Pinheiro, V. B. et al. Synthetic Genetic Polymers Capable of Heredity and Evolution. Science 336, 341–344 (2012).

145. Cozens, C. et al. Enzymatic Synthesis of Nucleic Acids with Defined Regioisomeric 2′-5′ Linkages. Angew. Chem. Int. Ed. 54, 15570–15573 (2015).

146. Peng, C. G. & Damha, M. J. Polymerase-Directed Synthesis of 2‘-Deoxy-2‘-fluoro-β-D- arabinonucleic Acids. J. Am. Chem. Soc. 129, 5310–5311 (2007).

50 147. Veedu, R. N., Vester, B. & Wengel, J. Enzymatic Incorporation of LNA Nucleotides into

DNA Strands. ChemBioChem 8, 490–492 (2007).

148. Veedu, R. N., Vester, B. & Wengel, J. In Vitro Incorporation of LNA Nucleotides.

Nucleosides Nucleotides Nucleic Acids 26, 1207–1210 (2007).

149. Veedu, R. N., Vester, B. & Wengel, J. Polymerase Chain Reaction and Transcription

Using Locked Nucleic Acid Nucleotide Triphosphates. J. Am. Chem. Soc. 130, 8124–8125

(2008).

150. Veedu, R. N., Vester, B. & Wengel, J. Efficient enzymatic synthesis of LNA-modified

DNA duplexes using KOD DNA polymerase. Org. Biomol. Chem. 7, 1404–1409 (2009).

151. Veedu, R. N. et al. Polymerase-directed synthesis of C5-ethynyl locked nucleic acids.

Bioorg. Med. Chem. Lett. 20, 6565–6568 (2010).

152. Doessing, H., Hansen, L., Veedu, R., Wengel, J. & Vester, B. Amplification and Re-

Generation of LNA-Modified Libraries. Molecules 17, 13087–13097 (2012).

153. Kuwahara, M. et al. Systematic analysis of enzymatic DNA polymerization using oligo-

DNA templates and triphosphate analogs involving 2′,4′-bridged nucleosides. Nucleic Acids Res.

36, 4257–4265 (2008).

154. Kasahara, Y., Irisawa, Y., Ozaki, H., Obika, S. & Kuwahara, M. 2′,4′-BNA/LNA aptamers: CE-SELEX using a DNA-based library of full-length 2′-O,4′-C-methylene- bridged/linked bicyclic ribonucleotides. Bioorg. Med. Chem. Lett. 23, 1288–1292 (2013).

155. Kasahara, Y. et al. Capillary Electrophoresis–Systematic Evolution of Ligands by

Exponential Enrichment Selection of Base- and Sugar-Modified DNA Aptamers: Target Binding

Dominated by 2′-O ,4′- C -Methylene-Bridged/Locked Nucleic Acid Primer. Anal. Chem. 85,

4961–4967 (2013).

51 156. Elle, I. C. et al. Selection of LNA-containing DNA aptamers against recombinant human

CD73. Mol. BioSyst. 11, 1260–1270 (2015).

157. Kore, A., Hodeib, M. & Hu, Z. Chemical Synthesis of LNA-mCTP and Its Application for MicroRNA Detection. Nucleosides Nucleotides Nucleic Acids 27, 1–17 (2008).

158. Johannsen, M. W., Veedu, R. N., Madsen, A. S. & Wengel, J. Enzymatic polymerisation involving 2′-amino-LNA nucleotides. Bioorg. Med. Chem. Lett. 22, 3522–3526 (2012).

159. Højland, T., Veedu, R. N., Vester, B. & Wengel, J. Enzymatic synthesis of DNA strands containing α-L-LNA (α-L-configured locked nucleic acid) nucleotides. Artif. DNA PNA

XNA 3, 14–21 (2012).

160. Wheeler, M. et al. Synthesis of selenomethylene-locked nucleic acid (SeLNA)-modified oligonucleotides by polymerases. Chem. Commun. 48, 11020–11022 (2012).

161. Chaput, J. C. & Szostak, J. W. TNA Synthesis by DNA Polymerases. J. Am. Chem. Soc.

125, 9274–9275 (2003).

162. Kempeneers, V., Vastmans, K., Rozenski, J. & Herdewijn, P. Recognition of threosyl nucleotides by DNA and RNA polymerases. Nucleic Acids Res. 31, 6221–6226 (2003).

163. Ichida, J. K. et al. An in Vitro Selection System for TNA. J. Am. Chem. Soc. 127, 2802–

2803 (2005).

164. Ichida, J. K., Horhota, A., Zou, K., McLaughlin, L. W. & Szostak, J. W. High fidelity

TNA synthesis by Therminator polymerase. Nucleic Acids Res. 33, 5219–5225 (2005).

165. Yu, H., Zhang, S. & Chaput, J. C. Darwinian evolution of an alternative genetic system provides support for TNA as an RNA progenitor. Nat. Chem. 4, 183–187 (2012).

52 166. Dunn, M. R. et al. DNA Polymerase-Mediated Synthesis of Unbiased Threose Nucleic

Acid (TNA) Polymers Requires 7-Deazaguanine To Suppress G:G Mispairing during TNA

Transcription. J. Am. Chem. Soc. 137, 4014–4017 (2015).

167. Dunn, M. R., Otto, C., Fenton, K. E. & Chaput, J. C. Improving Polymerase Activity with

Unnatural Substrates by Sampling Mutations in Homologous Protein Architectures. ACS Chem.

Biol. 11, 1210–1219 (2016).

168. Chim, N., Shi, C., Sau, S. P., Nikoomanzar, A. & Chaput, J. C. Structural basis for TNA synthesis by an engineered TNA polymerase. Nat. Commun. 8, 1810 (2017).

169. Larsen, A. C. et al. A general strategy for expanding polymerase function by droplet microfluidics. Nat. Commun. 7, 11235 (2016).

170. Dunn, M. R. & Chaput, J. C. Reverse Transcription of Threose Nucleic Acid by a

Naturally Occurring DNA Polymerase. ChemBioChem 17, 1804–1808 (2016).

171. Mei, H. et al. Synthesis and polymerase activity of a fluorescent cytidine TNA triphosphate analogue. Nucleic Acids Res. 45, 5629–5638 (2017).

172. Mei, H. & Chaput, J. C. Expanding the chemical diversity of TNA with tUTP derivatives that are substrates for a TNA polymerase. Chem. Commun. 54, 1237–1240 (2018).

173. Vastmans, K. et al. Enzymatic Incorporation in DNA of 1,5-Anhydrohexitol Nucleotides.

Biochemistry 39, 12757–12765 (2000).

174. Vastmans, K., Froeyen, M., Kerremans, L., Pochet, S. & Herdewijn, P. Reverse transcriptase incorporation of 1,5-anhydrohexitol nucleotides. Nucleic Acids Res. 29, 3154–3163

(2001).

53 175. Kempeneers, V., Renders, M., Froeyen, M. & Herdewijn, P. Investigation of the DNA- dependent cyclohexenyl nucleic acid polymerization and the cyclohexenyl nucleic acid- dependent DNA polymerization. Nucleic Acids Res. 33, 3828–3836 (2005).

176. Heuberger, B. D. & Switzer, C. A Pre-RNA Candidate Revisited: Both Enantiomers of

Flexible Nucleoside Triphosphates are DNA Polymerase Substrates. J. Am. Chem. Soc. 130,

412–413 (2007).

177. Chen, J. J. et al. Enzymatic Primer-Extension with Glycerol-Nucleoside Triphosphates on

DNA Templates. PLoS ONE 4, e4949 (2009).

178. Alves Ferreira-Bravo, I., Cozens, C., Holliger, P. & DeStefano, J. J. Selection of 2’- deoxy-2’-fluoroarabinonucleotide (FANA) aptamers that bind HIV-1 reverse transcriptase with picomolar affinity. Nucleic Acids Res. 43, 9587–9599 (2015).

179. Taylor, A. I. et al. Catalysts from synthetic genetic polymers. Nature 518, 427–430

(2014).

180. Taylor, A. I. & Holliger, P. Directed evolution of artificial enzymes (XNAzymes) from diverse repertoires of synthetic genetic polymers. Nat. Protoc. 10, 1625–1642 (2015).

181. Hili, R., Niu, J. & Liu, D. R. DNA ligase-mediated translation of DNA into densely functionalized nucleic acid polymers. J. Am. Chem. Soc. 135, 98–101 (2013).

182. Kong, D., Lei, Y., Yeung, W. & Hili, R. Enzymatic Synthesis of Sequence-Defined

Synthetic Nucleic Acid Polymers with Diverse Functional Groups. Angew. Chem. Int. Ed. 55,

13164–13168 (2016).

183. Kong, D., Yeung, W. & Hili, R. In Vitro Selection of Diversely Functionalized

Aptamers. J. Am. Chem. Soc. 139, 13977–13980 (2017).

54 184. Guo, C., Watkins, C. P. & Hili, R. Sequence-Defined Scaffolding of Peptides on Nucleic

Acid Polymers. J. Am. Chem. Soc. 137, 11191–11196 (2015).

185. Guo, C. & Hili, R. Fidelity of the DNA Ligase-Catalyzed Scaffolding of Peptide

Fragments on Nucleic Acid Polymers. Bioconjug. Chem. 28, 314–318 (2017).

186. Liu, C. C. & Schultz, P. G. Adding New Chemistries to the Genetic Code. Annu. Rev.

Biochem. 79, 413–444 (2010).

187. Chin, J. W. Expanding and Reprogramming the Genetic Code of Cells and Animals.

Annu. Rev. Biochem. 83, 379–408 (2014).

188. Shimizu, Y. et al. Cell-free translation reconstituted with purified components. Nat.

Biotechnol. 19, 751–755 (2001).

189. Heckler, T. G. et al. T4 RNA ligase mediated preparation of novel ‘chemically misacylated’ tRNAPhes. Biochemistry 23, 1468–1473 (1984).

190. Hartman, M. C. T., Josephson, K. & Szostak, J. W. Enzymatic aminoacylation of tRNA with unnatural amino acids. Proc. Natl. Acad. Sci. U. S. A. 103, 4356–4361 (2006).

191. Murakami, H., Ohta, A., Ashigai, H. & Suga, H. A highly flexible tRNA acylation method for non-natural polypeptide synthesis. Nat. Methods 3, 357–359 (2006).

192. Kurz, M., Gu, K. & Lohse, P. A. Psoralen photo-crosslinked mRNA-puromycin conjugates: a novel template for the rapid and facile preparation of mRNA-protein fusions.

Nucleic Acids Res. 28, e83 (2000).

193. Tabuchi, I., Soramoto, S., Nemoto, N. & Husimi, Y. An in vitro DNA virus for in vitro protein evolution. FEBS Lett. 508, 309–312 (2001).

55 194. Ishizawa, T., Kawakami, T., Reid, P. C. & Murakami, H. TRAP Display: A High-Speed

Selection Method for the Generation of Functional Polypeptides. J. Am. Chem. Soc. 135, 5433–

5440 (2013).

195. Forster, A. C. et al. Programming peptidomimetic syntheses by translating genetic codes designed de novo. Proc. Natl. Acad. Sci. U. S. A. 100, 6353–6357 (2003).

196. Josephson, K., Hartman, M. C. T. & Szostak, J. W. Ribosomal synthesis of unnatural peptides. J. Am. Chem. Soc. 127, 11727–11735 (2005).

197. Hartman, M. C. T., Josephson, K., Lin, C.-W. & Szostak, J. W. An Expanded Set of

Amino Acid Analogs for the Ribosomal Translation of Unnatural Peptides. PLoS ONE 2, e972

(2007).

198. Schlippe, Y. V. G., Hartman, M. C. T., Josephson, K. & Szostak, J. W. In vitro selection of highly modified cyclic peptides that act as tight binding inhibitors. J. Am. Chem. Soc. 134,

10469–10477 (2012).

199. Frankel, A., Millward, S. W. & Roberts, R. W. Encodamers: Unnatural Peptide

Oligomers Encoded in RNA. Chem. Biol. 10, 1043–1050 (2003).

200. Kawakami, T., Murakami, H. & Suga, H. Messenger RNA-Programmed Incorporation of

Multiple N-Methyl-Amino Acids into Linear and Cyclic Peptides. Chem. Biol. 15, 32–42 (2008).

201. Merryman, C. & Green, R. Transformation of Aminoacyl tRNAs for the In Vitro

Selection of “Drug-like” Molecules. Chem. Biol. 11, 575–582 (2004).

202. Subtelny, A. O., Hartman, M. C. T. & Szostak, J. W. Ribosomal Synthesis of N-Methyl

Peptides. J. Am. Chem. Soc. 130, 6131–6136 (2008).

56 203. Yamagishi, Y. et al. Natural Product-Like Macrocyclic N-Methyl-Peptide Inhibitors against a Ubiquitin Ligase Uncovered from a Ribosome-Expressed De Novo Library. Chem.

Biol. 18, 1562–1570 (2011).

204. Kawakami, T. et al. In Vitro Selection of Multiple Libraries Created by Genetic Code

Reprogramming To Discover Macrocyclic Peptides That Antagonize VEGFR2 Activity in

Living Cells. ACS Chem. Biol. 8, 1205–1214 (2013).

205. Goto, Y. et al. Reprogramming the Translation Initiation for the Synthesis of

Physiologically Stable Cyclic Peptides. ACS Chem. Biol. 3, 120–129 (2008).

206. Goto, Y., Murakami, H. & Suga, H. Initiating translation with D-amino acids. RNA 14,

1390–1398 (2008).

207. Goto, Y. & Suga, H. Translation Initiation with Initiator tRNA Charged with Exotic

Peptides. J. Am. Chem. Soc. 131, 5040–5041 (2009).

208. Torikai, K. & Suga, H. Ribosomal Synthesis of an Amphotericin-B Inspired Macrocycle.

J. Am. Chem. Soc. 136, 17359–17361 (2014).

209. Rogers, J. M. et al. Ribosomal synthesis and folding of peptide-helical aromatic foldamer hybrids. Nat. Chem. (2018). doi:10.1038/s41557-018-0007-x

210. Zhang, B. et al. Specificity of Translation for N-Alkyl Amino Acids. J. Am. Chem. Soc.

129, 11316–11317 (2007).

211. Kawakami, T., Murakami, H. & Suga, H. Ribosomal Synthesis of Polypeptoids and

Peptoid−Peptide Hybrids. J. Am. Chem. Soc. 130, 16861–16863 (2008).

212. Kawakami, T., Ishizawa, T. & Murakami, H. Extensive Reprogramming of the Genetic

Code for Genetically Encoded Synthesis of Highly N-Alkylated Polycyclic Peptidomimetics. J.

Am. Chem. Soc. 135, 12297–12304 (2013).

57 213. Ude, S. et al. Translation elongation factor EF-P alleviates ribosome stalling at polyproline stretches. Science 339, 82–85 (2013).

214. Doerfel, L. K. et al. EF-P is essential for rapid synthesis of proteins containing consecutive proline residues. Science 339, 85–88 (2013).

215. Katoh, T., Wohlgemuth, I., Nagano, M., Rodnina, M. V. & Suga, H. Essential structural elements in tRNAPro for EF-P-mediated alleviation of translation stalling. Nat. Commun. 7,

11657 (2016).

216. Ohta, A., Murakami, H., Higashimura, E. & Suga, H. Synthesis of Polyester by Means of

Genetic Code Reprogramming. Chem. Biol. 14, 1315–1322 (2007).

217. Tan, Z., Forster, A. C., Blacklow, S. C. & Cornish, V. W. Amino Acid Backbone

Specificity of the Escherichia coli Translation Machinery. J. Am. Chem. Soc. 126, 12752–12753

(2004).

218. Fujino, T., Goto, Y., Suga, H. & Murakami, H. Reevaluation of the D-Amino Acid

Compatibility with the Elongation Event in Translation. J. Am. Chem. Soc. 135, 1830–1837

(2013).

219. Fujino, T., Goto, Y., Suga, H. & Murakami, H. Ribosomal Synthesis of Peptides with

Multiple β-Amino Acids. J. Am. Chem. Soc. 138, 1962–1969 (2016).

220. Achenbach, J. et al. Outwitting EF-Tu and the ribosome: translation with D-amino acids.

Nucleic Acids Res. 43, 5687–5698 (2015).

221. Katoh, T., Tajima, K. & Suga, H. Consecutive Elongation of D-Amino Acids in

Translation. Cell Chem. Biol. 24, 46–54 (2017).

222. Katoh, T., Iwane, Y. & Suga, H. Logical engineering of D-arm and T-stem of tRNA that enhances d-amino acid incorporation. Nucleic Acids Res. 45, 12601–12610 (2017).

58 223. Zielinski, M., Kozlov, I. A. & Orgel, L. E. A Comparison of RNA with DNA in

Template-Directed Synthesis. Helv. Chim. Acta 83, 1678–1684 (2000).

224. Li, L. et al. Enhanced Nonenzymatic RNA Copying with 2-Aminoimidazole Activated

Nucleotides. J. Am. Chem. Soc. 139, 1810–1813 (2017).

225. Lohrmann, R. & Orgel, L. E. Polymerization of nucleotide analogues I: Reaction of nucleoside 5’-phosphorimidazolides with 2’-amino-2’-deoxyuridine. J. Mol. Evol. 7, 253–267

(1976).

226. Lohrmann, R. & Orgel, L. E. Template-directed synthesis of high molecular weight analogues. Nature 261, 342–344 (1976).

227. Zielinski, W. S. & Orgel, L. E. Oligomerization of activated derivatives of 3′-amino-3′- on poly(C) and poly(dC) templates. Nucleic Acids Res. 13, 2469–2484 (1985).

228. Tohidi, M., Zielinski, W. S., Chen, C.-H. B. & Orgel, L. E. Oligomerization of 3’-amino-

3’-deoxyguanosine-5’-phosphorimidazolidate on a d(CpCpCpCpC) template. J. Mol. Evol. 25,

97–99 (1987).

229. Mansy, S. S. et al. Template-directed synthesis of a genetic polymer in a model protocell.

Nature 454, 122–125 (2008).

230. Schrum, J. P., Ricardo, A., Krishnamurthy, M., Blain, J. C. & Szostak, J. W. Efficient and Rapid Template-Directed Nucleic Acid Copying Using 2′-Amino-2′,3′- dideoxyribonucleoside−5′-Phosphorimidazolide Monomers. J. Am. Chem. Soc. 131, 14560–

14570 (2009).

231. Blain, J. C., Ricardo, A. & Szostak, J. W. Synthesis and Nonenzymatic Template-

Directed Polymerization of 2′-Amino-2′-deoxythreose Nucleotides. J. Am. Chem. Soc. 136,

2033–2039 (2014).

59 232. Röthlingshöfer, M. et al. Chemical primer extension in seconds. Angew. Chem. Int. Ed.

47, 6065–6068 (2008).

233. Zhang, S., Zhang, N., Blain, J. C. & Szostak, J. W. Synthesis of N3′-P5′-linked

Phosphoramidate DNA by Nonenzymatic Template-Directed Primer Extension. J. Am. Chem.

Soc. 135, 924–932 (2013).

234. Zhang, S., Blain, J. C., Zielinska, D., Gryaznov, S. M. & Szostak, J. W. Fast and accurate nonenzymatic copying of an RNA-like synthetic genetic polymer. Proc. Natl. Acad. Sci. 110,

17732–17737 (2013).

235. Izgu, E. C., Oh, S. S. & Szostak, J. W. Synthesis of activated 3’-amino-3’-deoxy-2-thio- thymidine, a superior substrate for the nonenzymatic copying of nucleic acid templates. Chem.

Commun. 52, 3684–3686 (2016).

236. Kaiser, A., Spies, S., Lommel, T. & Richert, C. Template-Directed Synthesis in 3′- and

5′-Direction with Reversible Termination. Angew. Chem. Int. Ed. 51, 8299–8303 (2012).

237. Li, X., Zhan, Z.-Y. J., Knipe, R. & Lynn, D. G. DNA-catalyzed polymerization. J. Am.

Chem. Soc. 124, 746–747 (2002).

238. Li, X. & Lynn, D. G. Polymerization on Solid Supports. Angew. Chem. Int. Ed. 41, 4567–

4569 (2002).

239. Ura, Y., Beierle, J. M., Leman, L. J., Orgel, L. E. & Ghadiri, M. R. Self-Assembling

Sequence-Adaptive Peptide Nucleic Acids. Science 325, 73–77 (2009).

240. Heemstra, J. M. & Liu, D. R. Templated Synthesis of Peptide Nucleic Acids via

Sequence-Selective Base-Filling Reactions. J. Am. Chem. Soc. 131, 11347–11349 (2009).

241. Böhler, C., Nielsen, P. E. & Orgel, L. E. Template switching between PNA and RNA oligonucleotides. Nature 376, 578–581 (1995).

60 242. Schmidt, J. G., Christensen, L., Nielsen, P. E. & Orgel, L. E. Information transfer from

DNA to peptide nucleic acids by template-directed syntheses. Nucleic Acids Res. 25, 4792–4796

(1997).

243. Li, X., Hernandez, A. F., Grover, M. A., Hud, N. V. & Lynn, D. G. Step-Growth Control in Template-Directed Polymerization. HETEROCYCLES 82, 1477–1488 (2011).

244. Rosenbaum, D. M. & Liu, D. R. Efficient and sequence-specific DNA-templated polymerization of peptide nucleic acid aldehydes. J Am Chem Soc 125, 13924–13925 (2003).

245. Kleiner, R. E., Brudno, Y., Birnbaum, M. E. & Liu, D. R. DNA-Templated

Polymerization of Side-Chain-Functionalized Peptide Nucleic Acid Aldehydes. J. Am. Chem.

Soc. 130, 4646–4659 (2008).

246. Brudno, Y., Birnbaum, M. E., Kleiner, R. E. & Liu, D. R. An in vitro translation, selection and amplification system for peptide nucleic acids. Nat. Chem. Biol. 6, 148–155

(2010).

247. Chen, J. J., Cai, X. & Szostak, J. W. N2′→P3′ Phosphoramidate Glycerol Nucleic Acid as a Potential Alternative Genetic System. J. Am. Chem. Soc. 131, 2119–2121 (2009).

248. Fujimoto, K., Matsuda, S., Takahashi, N. & Saito, I. Template-Directed Photoreversible

Ligation of Deoxyoligonucleotides via 5-Vinyldeoxyuridine. J. Am. Chem. Soc. 122, 5646–5647

(2000).

249. Barbeyron, R., Vasseur, J.-J. & Smietana, M. pH-controlled DNA- and RNA-templated assembly of short oligomers. Chem. Sci. 6, 542–547 (2015).

250. Kukwikila, M., Gale, N., El-Sagheer, A. H., Brown, T. & Tavassoli, A. Assembly of a biocompatible triazole-linked gene by one-pot click-DNA ligation. Nat. Chem. 9, 1089 (2017).

61 251. Li, G., Zheng, W., Liu, Y. & Li, X. Novel encoding methods for DNA-templated chemical libraries. Curr. Opin. Chem. Biol. 26, 25–33 (2015).

252. O’Reilly, R. K., Turberfield, A. J. & Wilks, T. R. The Evolution of DNA-Templated

Synthesis as a Tool for Materials Discovery. Acc. Chem. Res. 50, 2496–2509 (2017).

253. Wrenn, S. J., Weisinger, R. M., Halpin, D. R. & Harbury, P. B. Synthetic Ligands

Discovered by in Vitro Selection. J. Am. Chem. Soc. 129, 13137–13143 (2007).

254. He, Y. & Liu, D. R. A Sequential Strand-Displacement Strategy Enables Efficient Six-

Step DNA-Templated Synthesis. J. Am. Chem. Soc. 133, 9972–9975 (2011).

255. McKee, M. L. et al. Multistep DNA-templated reactions for the synthesis of functional sequence controlled oligomers. Angew. Chem. Int. Ed. 49, 7948–7951 (2010).

256. Milnes, P. J. et al. Sequence-specific synthesis of macromolecules using DNA-templated chemistry. Chem Commun 48, 5614–5616 (2012).

257. Mckee, M. L. et al. Programmable One-Pot Multistep Organic Synthesis Using DNA

Junctions. J. Am. Chem. Soc. 134, 1446–1449 (2012).

258. He, Y. & Liu, D. R. Autonomous multistep organic synthesis in a single isothermal solution mediated by a DNA walker. Nat. Nanotechnol. 5, 778–782 (2010).

259. Meng, W. et al. An autonomous molecular assembler for programmable chemical synthesis. Nat. Chem. 8, 542–548 (2016).

260. Datta, B., Schuster, G. B., McCook, A., Harvey, S. C. & Zakrzewska, K. DNA-Directed

Assembly of Polyanilines: Modified Cytosine Nucleotides Transfer Sequence Programmability to a Conjoined Polymer. J. Am. Chem. Soc. 128, 14428–14429 (2006).

62 261. Chen, W., Josowicz, M., Datta, B., Schuster, G. B. & Janata, J. In Situ

Electropolymerization of DNA-Templated Aniline Assemblies on a Gold Surface. Electrochem.

Solid-State Lett. 11, E11 (2008).

262. Datta, B. & Schuster, G. B. DNA-Directed Synthesis of Aniline and 4-Aminobiphenyl

Oligomers: Programmed Transfer of Sequence Information to a Conjoined Polymer Nanowire. J.

Am. Chem. Soc. 130, 2965–2973 (2008).

263. Srinivasan, S. & Schuster, G. B. A Conjoined Thienopyrrole Oligomer Formed by Using

DNA as a Molecular Guide. Org. Lett. 10, 3657–3660 (2008).

264. Chen, W. et al. Development of Self-Organizing, Self-Directing Molecular Nanowires:

Synthesis and Characterization of Conjoined DNA−2,5-Bis(2-thienyl)pyrrole Oligomers.

Macromolecules 43, 4032–4040 (2010).

265. Chen, W. & Schuster, G. B. DNA-Programmed Modular Assembly of Cyclic and Linear

Nanoarrays for the Synthesis of Two-Dimensional Conducting Polymers. J. Am. Chem. Soc. 134,

840–843 (2012).

266. Chen, W. & Schuster, G. B. Precise Sequence Control in Linear and Cyclic Copolymers of 2,5-Bis(2-thienyl)pyrrole and Aniline by DNA-Programmed Assembly. J. Am. Chem. Soc.

135, 4438–4449 (2013).

267. Liu, Y. et al. Templated synthesis of nylon nucleic acids and characterization by nuclease digestion. Chem. Sci. 3, 1930–1937 (2012).

268. Doi, Y., Ohtsuki, T., Shimizu, Y., Ueda, T. & Sisido, M. Elongation Factor Tu Mutants

Expand Amino Acid Tolerance of Protein System. J. Am. Chem. Soc. 129, 14458–

14462 (2007).

63 269. Ieong, K.-W., Pavlov, M. Y., Kwiatkowski, M., Ehrenberg, M. & Forster, A. C. A tRNA body with high affinity for EF-Tu hastens ribosomal incorporation of unnatural amino acids.

RNA 20, 632–643 (2014).

270. Park, H.-S. et al. Expanding the genetic code of Escherichia coli with phosphoserine.

Science 333, 1151–1154 (2011).

271. Lee, S. et al. A Facile Strategy for Selective Incorporation of Phosphoserine into

Histones. Angew. Chem. Int. Ed. 52, 5771–5775 (2013).

272. Aldag, C. et al. Rewiring Translation for Elongation Factor Tu-Dependent

Selenocysteine Incorporation. Angew. Chem. Int. Ed. 52, 1441–1445 (2013).

273. Dedkova, L. M., Fahmi, N. E., Golovine, S. Y. & Hecht, S. M. Enhanced d-Amino Acid

Incorporation into Protein by Modified Ribosomes. J. Am. Chem. Soc. 125, 6616–6617 (2003).

274. Dedkova, L. M., Fahmi, N. E., Golovine, S. Y. & Hecht, S. M. Construction of Modified

Ribosomes for Incorporation of d-Amino Acids into Proteins. Biochemistry 45, 15541–15551

(2006).

275. Dedkova, L. M. et al. β-Puromycin Selection of Modified Ribosomes for in Vitro

Incorporation of β-Amino Acids. Biochemistry 51, 401–415 (2012).

276. Maini, R. et al. Incorporation of β-amino acids into dihydrofolate reductase by ribosomes having modifications in the peptidyltransferase center. Bioorg. Med. Chem. 21, 1088–1096

(2013).

277. Maini, R. et al. Ribosome-Mediated Incorporation of Dipeptides and Dipeptide

Analogues into Proteins in Vitro. J. Am. Chem. Soc. 137, 11206–11209 (2015).

278. Melo Czekster, C., Robertson, W. E., Walker, A. S., Söll, D. & Schepartz, A. In Vivo

Biosynthesis of a β-Amino Acid-Containing Protein. J. Am. Chem. Soc. 138, 5194–5197 (2016).


Chapter 2

Construction of a highly functionalized nucleic acid polymer library and selection for polymers that bind protein targets

Parts adapted from: Z. Chen, P. A. Lichtor, A. P. Berliner, J. C. Chen, & D. R. Liu. Nat. Chem. (2018) doi: 10.1038/s41557-018-0008-9

2.1 – Introduction

The gene-encoded synthesis and Darwinian selection of sequence-defined biopolymers are fundamental features of all known forms of life. These processes have been harnessed in the laboratory to evolve RNA1–3, DNA4–6, and polypeptides7–10 with a variety of binding and catalytic properties through iterated cycles of biopolymer translation, selection, replication, and mutation. The speed and effectiveness of the evolutionary process have inspired efforts to apply these principles to the much larger chemical space of synthetic polymers11. To date, however, the evolution of sequence-defined non-natural polymers in the laboratory has been limited to analogs of nucleic acids12–15 and polypeptides16 that can be synthesized by polymerases and ribosomes.

Polymerases and ribosomes impose structural requirements on the building blocks that can be polymerized and thereby limit the diversity of synthetic polymers that are accessible to directed evolution. In particular, polymerases use only mononucleotides as substrates, which precludes encoding a diverse set of codons and side-chains. Known classes of polymerase-synthesized functional non-natural nucleic acid polymers include those with non-natural sugar backbones17–21, those with hydrophobic22–29 or positively charged30–36 side-chains installed uniformly on nucleobases, and those that add novel nucleobases to the four canonical possibilities37–40. Their chemical diversities are therefore only modestly expanded beyond those of natural DNA and RNA, and fall short of the much more diverse chemical functionality present in proteins.

Previously we developed an in vitro system that uses DNA ligase to translate DNA sequences into sequence-defined highly functionalized nucleic acid polymers (HFNAPs) containing a wide range of side-chains chosen by the researcher41. In our original report, we showed that DNA sequences can be translated into HFNAPs using DNA ligase to catalyze the polymerization of up to 50 consecutive short, chemically functionalized oligonucleotide building blocks along a DNA template41 (Figure 2.1). We discovered that T4 DNA ligase accepts trinucleotide building blocks with a wide range of side-chains on the 5’ nucleobase, including side-chains at the C5 position of pyrimidine nucleobases that are both synthetically accessible and unlikely to disrupt Watson-Crick base pairing41.

[Figure 2.1 schematic. Labels in the original image: trinucleotide, codon, reading frame, initiating primer binding site, terminating primer binding site, T3 DNA ligase.]

Figure 2.1 | Reaction scheme for DNA ligase-mediated translation of DNA templates into sequence-defined highly functionalized nucleic acid polymers (HFNAPs).

This artificial translation system allows researchers, in principle, to mimic and even expand the chemical repertoire of protein building blocks in an evolvable synthetic polymer system. The broad chemical scope of HFNAPs gives them the potential to adopt unique folding and functional properties distinct from those of known natural or non-natural nucleic acid polymers. The original HFNAP system proved poorly suited to supporting the evolution of functional polymers, however, likely because of the limited diversity provided by its eight-codon genetic code and the long, flexible linkers present in the building blocks. In this study, we designed a new HFNAP “genetic code”, translation system, and in vitro selection system that together overcome these challenges, then applied the resulting HFNAP evolution system to generate sequence-defined synthetic polymers that bind two protein targets of biomedical interest.

2.2 – Design and validation of a new HFNAP translation system

Our new genetic code was designed to offer a high degree of both codon and side-chain diversity to evolving polymers (Figure 2.2). We included the maximum number of different trinucleotides containing a 5’ pyrimidine (all 32 possible YNN combinations, where Y = C or T and N = A, C, G, or T) as codons. The corresponding 32 building blocks were each linked to one of eight side-chains (four codons per side-chain) that include hydrophobic, aliphatic, aromatic, halogenated, polar, and charged groups. The linkers between side-chains and nucleobases were designed to have limited conformational flexibility in order to increase the likelihood that the polymer backbone, nucleobases, and side-chains would cooperatively adopt defined folded structures. We assigned each side-chain to a set of four codons that collectively contained the same balance of A/T versus C/G bases following the side-chain-functionalized 5’ pyrimidine base.

[Figure 2.2: structures of the eight side-chains on the C- and T-based building blocks. Building-block labels as arranged in the figure:]

p-CTT  p-CAA  p-CAT  p-CTA    p-TTT  p-TAT  p-TAA  p-TTA
p-CTG  p-CAC  p-CAG  p-CCA    p-TTG  p-TAG  p-TAC  p-TTC
p-CGT  p-CTC  p-CGA  p-CCT    p-TGT  p-TGA  p-TCA  p-TCT
p-CCG  p-CGG  p-CGC  p-CCC    p-TCG  p-TGC  p-TCC  p-TGG

Figure 2.2 | Structures of 5’-phosphorylated trinucleotide building blocks for HFNAP library synthesis.
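The design constraints described above can be checked with a short script. The sketch below enumerates the 32 YNN trinucleotides and counts A/T bases at the second and third positions within each group of four building blocks. The grouping is an assumption: it reads the building-block labels of Figure 2.2 column by column and treats each column as one side-chain group, which the extracted figure text suggests but does not state.

```python
from itertools import product

# All trinucleotides with a 5' pyrimidine: Y = C or T, N = A, C, G, or T.
codons = [y + n1 + n2 for y, n1, n2 in product("CT", "ACGT", "ACGT")]

# Building-block labels from Figure 2.2, read column by column.
# ASSUMPTION: each column corresponds to one side-chain group.
groups = [
    ["CTT", "CTG", "CGT", "CCG"],
    ["CAA", "CAC", "CTC", "CGG"],
    ["CAT", "CAG", "CGA", "CGC"],
    ["CTA", "CCA", "CCT", "CCC"],
    ["TTT", "TTG", "TGT", "TCG"],
    ["TAT", "TAG", "TGA", "TGC"],
    ["TAA", "TAC", "TCA", "TCC"],
    ["TTA", "TTC", "TCT", "TGG"],
]


def at_count(group):
    """Count A/T bases at positions 2 and 3 across a group of codons."""
    return sum(base in "AT" for codon in group for base in codon[1:])


at_counts = [at_count(g) for g in groups]
covers_all = sorted(c for g in groups for c in g) == sorted(codons)

print(len(codons), covers_all, at_counts)
```

Under this reading, every group carries four A/T and four C/G bases after the functionalized 5’ pyrimidine, which matches the balance described in the text.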

We improved DNA-templated polymerization (artificial “translation”) reactions by screening ligase enzymes and adjusting polymerization conditions. We found that subjecting translation reactions to a slow (0.01 °C/s) temperature ramp to 4 °C before initiating ligation with T3 DNA ligase substantially improved yields of full-length HFNAP from libraries of DNA templates containing random coding regions of 45 nucleotides, which encoded the incorporation of 15 consecutive side-chain-functionalized trinucleotide building blocks of mixed sequence (Figure 2.3). To test the ability of translated polymers to be “reverse-translated” back into DNA, thereby enabling iterated cycles of translation, selection, reverse translation, and PCR amplification, we performed polymerization on four templates that each encoded the incorporation of eight different building blocks and collectively covered all 32 building blocks, and then subjected the resulting HFNAP products, separated from template DNA, to reverse translation in a PCR reaction using Q5 DNA polymerase39 (Figure 2.4). One of the PCR primers binds a 3’-overhang (Figure 2.4) present in the HFNAP products but absent in template DNA, precluding the amplification of any contaminating template DNA. DNA sequencing of the resulting PCR products showed that the original sequence information in the templates was faithfully recovered (Figure 2.4), indicating that both translation from DNA to HFNAPs and reverse translation from HFNAPs to their encoding DNA occur with high sequence fidelity using this set of building blocks.

[Figure 2.3 gel images. Panel a lanes: 11-mer, 13-mer, and 15-mer template libraries, each with (+) and without (-) the functionalized trimer mix; bands labeled “Template+primers hybrid” and “Polymerization product”. Panel b lanes: T4, T3, and T7 DNA ligases, each with trimer mix +, +, - and temperature ramp times of 8 h, 2 h, 2 h; bands labeled “Template (11-mer library) + primers” and “Polymerization product”.]

Figure 2.3 | Translation of libraries of randomized DNA templates into HFNAPs. (a) T3 DNA ligase-catalyzed translation reactions, as well as control reactions from which the trinucleotide building blocks were omitted, using template libraries of increasing length were analyzed by polyacrylamide gel electrophoresis on a non-denaturing 10% TBE gel and imaged by SYBR gold staining. Full-length HFNAP-DNA duplex products that incorporate up to 15 consecutive functionalized trinucleotide building blocks were obtained. (b) Screening of reaction conditions for polymerization. T3 DNA ligase mediated higher translation efficiency than T4 and T7 DNA ligases. Increasing the temperature ramp-down time from 2 hours to 8 hours had little effect on translation yield.

[Figure 2.4 scheme and sequencing traces. Left: biotinylated template DNA; 1. Translation with full building block set; 2. Capture on immobilized streptavidin; 3. Strand separation; HFNAP strand without template DNA; PCR (Q5 DNA polymerase); reverse-translated DNA duplex; DNA sequencing. Right: Sanger reads for the four templates:]

TTG AAA GTG CAA GAG ACA CCG CGA
TAG ATA TGG CTA GGG TCA AGG GCA
ATG TTA CTG GTA TCG TGA GCG GGA
AAG TAA CAG GAA ACG AGA CGG CCA

Figure 2.4 | A complete cycle of HFNAP translation, HFNAP strand isolation, and reverse translation back into DNA faithfully recovered sequence information from the original DNA templates. Left: experimental scheme; right: electrophoretic traces from Sanger sequencing. In control experiments in which the trinucleotide building blocks were omitted from the polymerization reactions, the PCR step did not generate any amplicons of the correct size.
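The statement that the four test templates collectively cover all 32 building blocks can be checked against the sequencing reads shown in Figure 2.4. The sketch below rests on two assumptions: that the reads, transcribed here from the extracted figure text with spacing removed, are letter-perfect, and that they are shown in the template sense and in frame, so that each triplet pairs with the reverse-complementary YNN building block. The helper names are illustrative only.

```python
from itertools import product

# Reads transcribed from the Sanger traces in Figure 2.4 with spacing removed.
# ASSUMPTION: the extracted figure text is accurate and in the template sense.
reads = [
    "TTGAAAGTGCAAGAGACACCGCGA",
    "TAGATATGGCTAGGGTCAAGGGCA",
    "ATGTTACTGGTATCGTGAGCGGGA",
    "AAGTAACAGGAAACGAGACGGCCA",
]

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def encoded_building_blocks(read):
    """Split a read into triplets and return the complementary 5'->3'
    trinucleotide that would pair with each triplet."""
    triplets = [read[i:i + 3] for i in range(0, len(read), 3)]
    return [reverse_complement(t) for t in triplets]


all_ynn = {y + a + b for y, a, b in product("CT", "ACGT", "ACGT")}
per_read = [encoded_building_blocks(r) for r in reads]
covered = {bb for blocks in per_read for bb in blocks}

print([len(set(blocks)) for blocks in per_read], covered == all_ynn)
```

With the reads as transcribed, each template specifies eight distinct building blocks and the four together account for every YNN trinucleotide, in line with the experimental design described in the text.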

These observations are qualitatively consistent with results from Hili and coworkers, who reported fidelities ranging from 95.1% to 98.4% per codon for ligase-mediated DNA-templated polymerization of functionalized pentanucleotides42. Perfect fidelity is not expected for a ligase-mediated polymerization, which lacks the proofreading mechanisms that polymerases may have through their exonuclease activity, but we reasoned that the level of fidelity in our system may be sufficient to support iterated selection for functional polymers, consistent with our previous mock selection results41. Modest levels of mutations may also confer a benefit to the selection, as reported by Benner and coworkers for selections of aptamers containing novel nucleobases38.
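To give a feel for what a per-codon fidelity in this range means for a full-length polymer, the following minimal sketch compounds the fidelity over 15 codons. It assumes that errors at different codons are independent, and it borrows the pentanucleotide fidelities reported by Hili and coworkers purely as illustrative inputs; they are not measured values for the trinucleotide building blocks used here. The function name is hypothetical.

```python
def error_free_fraction(per_codon_fidelity: float, n_codons: int) -> float:
    """Fraction of polymers with no miscoded codon.

    Assumes each codon is incorporated independently with the same
    fidelity (a simplifying assumption, not a measured property).
    """
    return per_codon_fidelity ** n_codons


if __name__ == "__main__":
    # Illustrative per-codon fidelities (range reported for pentanucleotides).
    for fidelity in (0.951, 0.984):
        fraction = error_free_fraction(fidelity, n_codons=15)
        print(f"fidelity {fidelity:.3f} per codon -> "
              f"{fraction:.2f} error-free over 15 codons")
```

Under these assumptions, roughly half to three quarters of 15-codon products would be error-free, which illustrates why imperfect fidelity can still support iterated selection while supplying a modest mutation rate.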

2.3 – Selection for HFNAPs that bind protein targets from naïve HFNAP libraries

Encouraged by these developments, we generated a library of HFNAPs containing 15 consecutive building blocks drawn from the set of 32 (theoretical polymer library space = 3 × 10^22; average HFNAP molecular weight = 28 kDa) and subjected the resulting library (starting quantity = 3 × 10^12 molecules) to iterated rounds of in vitro selection (Figure 2.5) for binding to five proteins of therapeutic interest: PCSK9, IL-6, IL-33, PD-1, and PD-L1. Each HFNAP library was incubated with its respective protein target immobilized on agarose beads. After washing the beads with buffer, the bound HFNAPs were eluted by boiling the beads in buffer containing detergent. The surviving HFNAPs were reverse translated in a PCR reaction using Q5 DNA polymerase, and the resulting double-stranded DNA products were purified by PAGE. The non-template strands were removed by alkaline denaturation, and the streptavidin-bound template strands were translated back into HFNAPs by ligase-catalyzed DNA-templated polymerization for the next round of PCSK9-binding selection. We note that the ability of HFNAPs to be directly reverse-translated by a DNA polymerase, though not an absolute requirement for iterated selection as demonstrated by the use of display methods to evolve nucleic acids17,43, provides a practical and high-fidelity way to complete a selection cycle.

[Figure 2.5 scheme: ligase-catalyzed polymerization on biotinylated templates; strand separation; incubation of the polymer library (in PBS with Mg2+, Ca2+, BSA, and Tween-20) with protein target on aldehyde-functionalized agarose beads to enrich binders; PCR (PAGE purified) with qPCR analysis; strand separation. Inset, selection conditions: (1) incubate 1 hr at 25 °C; (2) three 1-minute washes; (3) boil in 0.5% SDS; (4) clean up with Qiagen MinElute.]

Figure 2.5 | Experimental scheme for HFNAP translation, affinity selection, and reverse translation. Inset shows detailed selection conditions.
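The library numbers quoted above can be put in proportion with a short calculation. The sketch below computes the theoretical sequence space for 15 positions drawn from 32 building blocks (32^15, about 3.8 × 10^22, the same order of magnitude as the figure quoted in the text) and the fraction of that space sampled by a starting pool of 3 × 10^12 molecules. It assumes every molecule in the starting pool is a distinct sequence, which gives an upper bound on coverage; the variable names are illustrative.

```python
# Library parameters taken from the text.
N_BUILDING_BLOCKS = 32       # size of the building block set
N_POSITIONS = 15             # consecutive building blocks per HFNAP
STARTING_MOLECULES = 3e12    # molecules entering the first round

# Theoretical number of distinct polymers.
library_space = N_BUILDING_BLOCKS ** N_POSITIONS

# Upper bound on the fraction of sequence space sampled,
# assuming every starting molecule is unique.
coverage = STARTING_MOLECULES / library_space

if __name__ == "__main__":
    print(f"sequence space: {library_space:.2e}")
    print(f"fraction sampled (upper bound): {coverage:.1e}")
```

The starting pool therefore samples on the order of one part in 10^10 of the theoretical space, which is one motivation for the later diversification and re-selection of an initial hit.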

As the iterated rounds of selection progressed, the fraction of HFNAP that was retained on beads coated with PCSK9 (a target implicated in low-density lipoprotein (LDL) metabolism and cardiovascular disease44–46) generally increased, consistent with enrichment of PCSK9-binding polymers, even though we steadily elevated selection stringency by decreasing the amount of PCSK9 protein (Figure 2.6). At the eighth round of selection, the polymer population was retained by PCSK9 protein-conjugated agarose beads with an efficiency of approximately 10%. In contrast, less than 0.1% of the same polymer population was retained on agarose beads not conjugated to any protein (Figure 2.6), suggesting that the ability of the selected polymers to bind PCSK9-linked beads arose from their ability to bind PCSK9, rather than agarose beads.

[Figure 2.6 plot: Ct(Flow-through) - Ct(Elution) versus selection round (rounds 1 to 9), for PCSK9-coated beads and for blank beads.]

Selection round              1    2    3    4    5    6    7    8    9
PCSK9 protein (pmol)         180  180  180  120  90   60   30   30   12
Translation template (pmol)  5    2    2    2    2    2    2    1    2

Figure 2.6 | PCSK9 binding selection progress. The HFNAP pool’s bulk affinity to PCSK9-coated beads was assessed by quantifying the amount of HFNAP in the flow-through versus the elution at each round of selection by quantitative PCR. Higher values in the graph indicate higher ratios of polymers that bound to immobilized PCSK9 and were eluted relative to polymers that flowed through the immobilized PCSK9.
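The qPCR readout used in this figure can be related to a fraction of polymer retained with a short conversion. The sketch below is an illustration only: it assumes ideal amplification (template doubles every cycle) and that the flow-through and elution samples represent equal fractions of their respective pools, neither of which is stated in the text. The function names are hypothetical.

```python
import math


def fraction_retained(delta_ct: float, efficiency: float = 2.0) -> float:
    """Convert delta_ct = Ct(flow-through) - Ct(elution) to a retained fraction.

    A lower Ct means more template, so the elution-to-flow-through ratio
    is efficiency ** delta_ct (efficiency = 2.0 is ideal doubling).
    """
    ratio = efficiency ** delta_ct
    return ratio / (1.0 + ratio)


def delta_ct_for_fraction(fraction: float, efficiency: float = 2.0) -> float:
    """Inverse of fraction_retained: the delta_ct expected for a given fraction."""
    ratio = fraction / (1.0 - fraction)
    return math.log(ratio, efficiency)


if __name__ == "__main__":
    for frac in (0.10, 0.001):
        print(f"{frac:.1%} retained -> delta Ct = {delta_ct_for_fraction(frac):.2f}")
```

Under these assumptions, a retention of about 10% corresponds to a difference of roughly -3 cycles, and a retention of 0.1% to roughly -10 cycles, which helps in reading the scale of the plot.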

Results from high-throughput DNA sequencing after nine rounds of the PCSK9 selection indicated that the HFNAP pool had strongly converged to just seven sequence families containing conserved sub-sequences suggestive of common binding motifs (Figure 2.7). Sequences within the same family were likely descendants of a single parental polymer derived through mutations that accumulated through the selection process, as evidenced by high levels of homology.

Figure 2.7 | Sequence homology of selection-enriched PCSK9-binding HFNAPs. Relative abundances and sequence logos are shown for the dominant seven sequence clusters found in 3.1 million sequences from high-throughput DNA sequencing after nine rounds of PCSK9-binding selection from a naïve HFNAP library.

On the other hand, the selections for IL-6, IL-33, PD-1, and PD-L1 binding did not show similar levels of increase in the polymer pools’ bulk affinity to target-coated beads (Figure 2.8a). High-throughput sequencing data for the other four selections revealed that parasitic cheater sequences, which likely arose from the ligase erroneously incorporating a polymerization primer into the coding region (Figure 2.8b), accounted for significant portions of those polymer pools after seven rounds of selection (Figure 2.8a), which probably interfered with the selection processes and the qPCR analyses. Nonetheless, the polymer pool selected for IL-6 binding exhibited signs of convergence, which merited further investigation of the most abundant individual sequences (see below).

[Figure 2.8 panels: (a) Ct(Flow-through) - Ct(Elute) versus selection round (rounds 1 to 7) for IL-6, IL-33, PD-1, and PD-L1; cheaters at round 7: IL-6, 23%; IL-33, 88%; PD-1, 50%; PD-L1, 27%. (b) Schematic of the desired polymerization product and the cheater product.]

Figure 2.8 | Selection of HFNAPs that bind IL-6, IL-33, PD-1, and PD-L1 proteins. (a) The polymer pools’ bulk affinity to target-coated beads exhibited generally downward trends. The percentage of cheaters recovered from high-throughput sequencing data after seven rounds of selection to each target is indicated in each panel. (b) In addition to the desired polymerization reaction (top), incorporation of a polymerization primer in the coding region can generate undesired cheaters (bottom) that are rapidly enriched over rounds of selection.

2.4 – Characterization of PCSK9-binding HFNAPs

Using ligase-mediated DNA-templated polymerization, we individually synthesized the seven HFNAPs that were the most highly enriched from the PCSK9 selection, and tested their retention on immobilized PCSK9 or immobilized thrombin, an unrelated protein. Five of the seven tested polymers exhibited substantial apparent binding activity to immobilized PCSK9 beyond any apparent binding to immobilized thrombin. We also assayed unfunctionalized DNA sequences (lacking any side-chains) corresponding to these seven HFNAPs and observed no evident PCSK9-binding activity (Figure 2.9). Together, these results suggest that the HFNAP populations emerging from nine rounds of in vitro selection converged on a small number of polymer families that bind immobilized PCSK9 in a manner dependent on the polymers’ side-chains.

[Figure 2.9 plot: Ct(Flow-through) - Ct(Elution) for each selection-enriched sequence; series: HFNAP : PCSK9, HFNAP : Thrombin, DNA : PCSK9.]

Figure 2.9 | Affinity characterization of selection-enriched PCSK9-binding HFNAPs. The graph shows retention of selection-enriched HFNAPs (only the variable, functionalized parts of the sequences are shown) on immobilized PCSK9 (target; blue bars) and immobilized thrombin (non-target; red bars) and of sequence-matched unfunctionalized DNA on immobilized PCSK9 (green bars), as quantified by qPCR. n.d. = not determined.

We characterized in depth PCSK9-A5, the polymer with the highest apparent PCSK9 binding activity (Figure 2.10a). We synthesized biotinylated PCSK9-A5 by templated translation and confirmed its binding affinity for PCSK9 (dissociation constant KD = 98 nM) by surface plasmon resonance (SPR) (Figure 2.10b). To probe the role of the side-chain functional groups, we synthesized side-chain mutants of PCSK9-A5 in which all instances of each building block were replaced by the corresponding trinucleotide either lacking any side-chain or containing a linker but missing the side-chain’s functional group (Figure 2.10c). The removal of a single phenol side-chain in codon 9 or a single cyclopentyl side-chain in codon 14 completely abolished the binding between PCSK9-A5 and its target (Figure 2.10a and 2.10d, dark red). Furthermore, the removal of the isopentyl side-chain at codon 13 resulted in an approximately two-fold reduction in affinity (Figure 2.10a and 2.10d, light red). The individual removal of other side-chains had less significant effects on binding affinity.

[Figure 2.10 panels:
(a) PCSK9-A5 sequence (side-chain structures not reproduced): CGAATCAGATTGGACCAG CCA CGT TAC TGA TTC TTC TAC CGT TTT TGC CCC CGG CAC CTT CTG GAGTCCAGATGTAGGTAG
(b) SPR sensorgram, response (RU) versus time (s); fitted kon = 1.70 × 10^5 M^-1 s^-1, koff = 1.67 × 10^-2 s^-1, KD = 98 nM.
(c) Structures of the “ΔSide chain” bases for T and C and the “Linker only” base for T (not reproduced).
(d) Kinetic parameters, tabulated below.]

Variant             kon (10^5 M^-1 s^-1)  koff (10^-2 s^-1)  KD (nM)
PCSK9-A5            1.70                  1.67               98
CCA ΔSide chain     2.08                  1.76               85
CGT ΔSide chain     1.30                  1.08               83
TAC ΔSide chain     2.16                  1.79               83
TAC Linker only     1.38                  1.10               80
TGA Linker only     1.56                  1.25               80
TTC ΔSide chain     2.04                  1.77               87
TTT ΔSide chain     -                     -                  >300
TTT Linker only     -                     -                  >300
TGC ΔSide chain     1.28                  1.06               83
TGC Linker only     1.98                  1.76               89
CCC ΔSide chain     2.88                  2.13               74
CGG ΔSide chain     1.79                  1.50               84
CAC ΔSide chain     1.12                  2.29               204
CTT ΔSide chain     -                     -                  >300
CTG ΔSide chain     1.85                  2.18               118

Figure 2.10 | Structure and PCSK9-binding activity of PCSK9-A5. (a) Sequence and side-chain structure of selected polymer PCSK9-A5. Side-chains essential for binding activity are boxed. (b) SPR sensorgram characterizing binding kinetics between surface-immobilized PCSK9-A5 polymer and the target PCSK9 protein. The concentrations of injected PCSK9 were 10, 30, 100, and 300 nM. The observed sensorgram is shown in red and the fitted curve with the kinetic parameters listed is shown in black. (c) Structures of bases in side-chain mutants. (d) Kinetic parameters for binding of PCSK9-A5 or its side-chain-deficient variants to PCSK9 protein, as measured by SPR. For the variants “TTT ΔSide chain”, “TTT Linker only”, and “CTT ΔSide chain”, no SPR signal was observed at the highest analyte concentration tested (300 nM PCSK9).
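For a one-to-one binding model, the dissociation constant follows from the fitted rate constants as KD = koff / kon. The short sketch below recomputes KD for a few of the variants in Figure 2.10d from their reported kon and koff values and compares the result with the reported KD. The dictionary keys and the function name are illustrative labels, not names used in the original work.

```python
# (kon in 1e5 M^-1 s^-1, koff in 1e-2 s^-1, reported KD in nM), from Figure 2.10d.
VARIANTS = {
    "PCSK9-A5": (1.70, 1.67, 98),
    "CCC, no side chain": (2.88, 2.13, 74),
    "CAC, no side chain": (1.12, 2.29, 204),
    "CTG, no side chain": (1.85, 2.18, 118),
}


def kd_nanomolar(kon_e5: float, koff_e_minus2: float) -> float:
    """KD = koff / kon for a 1:1 model, returned in nM."""
    kon = kon_e5 * 1e5              # M^-1 s^-1
    koff = koff_e_minus2 * 1e-2     # s^-1
    return koff / kon * 1e9         # M -> nM


if __name__ == "__main__":
    parent_kd = kd_nanomolar(*VARIANTS["PCSK9-A5"][:2])
    for name, (kon, koff, reported) in VARIANTS.items():
        kd = kd_nanomolar(kon, koff)
        print(f"{name:20s} KD = {kd:6.1f} nM (reported {reported} nM), "
              f"{kd / parent_kd:.2f}x parent")
```

The recomputed values agree with the reported ones to within rounding, and the variant lacking the codon 13 (CAC) side-chain comes out at roughly twice the KD of the parent, matching the approximately two-fold reduction in affinity described above.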

2.5 – Characterization of IL-6-binding HFNAPs

The top seven HFNAPs enriched in the IL-6 selection (Figure 2.11a) were individually synthesized and assayed for binding to immobilized IL-6. Based on its high apparent binding activity to immobilized IL-6, but not to immobilized PCSK9 (Figure 2.11a), the HFNAP IL6-A7 (Figure 2.11b) was chosen for further characterization. Binding of biotinylated IL6-A7 to the target IL-6 protein was confirmed by SPR. Although the binding kinetics of IL6-A7 to IL-6 protein did not conform to a classical one-to-one binding model, a phenomenon often observed in aptamer-protein binding18,25, fitting to a heterogeneous ligand model resulted in an apparent affinity of KD = 12 nM for the major component and KD = 22 nM for the minor component (Figure 2.11c).
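A heterogeneous ligand model is commonly written as the sum of two independent one-to-one interactions, each with its own rate constants and maximal response. The sketch below uses that form with the fitted parameters reported in Figure 2.11c to recompute the two KD values and to simulate a response curve. This is an illustration under stated assumptions, not a reproduction of the instrument software's fit: the 600 s association time is a hypothetical value chosen for the example, and mass-transport and bulk refractive index effects are ignored.

```python
import math

# Fitted parameters reported in Figure 2.11c (ka in M^-1 s^-1, kd in s^-1, Rmax in RU).
COMPONENTS = [
    {"ka": 1.85e4, "kd": 2.26e-4, "rmax": 102.7},   # major component
    {"ka": 8.54e5, "kd": 1.88e-2, "rmax": 10.45},   # minor component
]


def kd_nanomolar(component: dict) -> float:
    """Equilibrium dissociation constant KD = kd / ka, in nM."""
    return component["kd"] / component["ka"] * 1e9


def response(t: float, conc: float, t_assoc: float = 600.0) -> float:
    """Simulated response (RU) at time t (s) for analyte concentration conc (M).

    Each component follows 1:1 Langmuir kinetics; the total is their sum.
    t_assoc is a hypothetical injection length, not a value from the text.
    """
    total = 0.0
    for c in COMPONENTS:
        k_obs = c["ka"] * conc + c["kd"]
        r_eq = c["rmax"] * c["ka"] * conc / k_obs
        if t <= t_assoc:
            total += r_eq * (1.0 - math.exp(-k_obs * t))
        else:
            r_end = r_eq * (1.0 - math.exp(-k_obs * t_assoc))
            total += r_end * math.exp(-c["kd"] * (t - t_assoc))
    return total


if __name__ == "__main__":
    for c in COMPONENTS:
        print(f"KD = {kd_nanomolar(c):.1f} nM")
    for t in (100, 300, 600, 1200):
        print(f"t = {t:5d} s, 100 nM IL-6: {response(t, 100e-9):.1f} RU")
```

The two components have similar KD values but very different kinetics: the minor component binds and dissociates quickly, while the major component accounts for most of the signal and dissociates slowly.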

[Figure 2.11 panels:
(a) Retention, as Ct(Flow-through) - Ct(Elute), of the seven most enriched sequences on immobilized IL-6 (HFNAP : IL-6) and immobilized PCSK9 (HFNAP : PCSK9). Abundance after seven rounds and variable-region sequence (side-chain structures not reproduced):
3.6%   TTG CGG CAG TCT CAG CCT CCG TGA CGG TGC TGC CAG TGG CCA CCA
2.3%   TTT TTG TTC TGG CTA CGG CCT TTG CCC TTC CGG TAT CCC TCC TGT
2.0%   TGT TGC TGG TTA CCG CCT CTC CTG TGG TGC TTC CGG CTG CGT TGT
1.9%   TTG TGG TGT TGC TCT CAG TGC CGG TTA CCG TCG TGA CGC TTC CGA
1.2%   TCA CCC CTT TGG CTG TGG TTG CAG TGC CCG TGG TCG CCT TAG CCA
1.1%   TCC TGG TTA CGG CCT TTG TCG CCC TTC CCC CTC TCT CCC TCA TTG
0.89%  TGC TGC TGT TTA TTT CGT TGC CCT TCC TCG TAC TCA TTC CTC CGG
(b) IL6-A7 sequence: CTCGGATGAACCTGGACT TGC TGC TGT TTA TTT CGT TGC CCT TCC TCG TAC TCA TTC CTC CGG GGACTGAGTCCAGAGTAA
(c) SPR sensorgram, response (RU) versus time (s); fitted parameters: ka1 = 1.85 × 10^4 M^-1 s^-1, kd1 = 2.26 × 10^-4 s^-1, KD1 = 12 nM, Rmax1 = 102.7 RU; ka2 = 8.54 × 10^5 M^-1 s^-1, kd2 = 1.88 × 10^-2 s^-1, KD2 = 22 nM, Rmax2 = 10.45 RU.]

Figure 2.11 | Characterization of IL-6-binding HFNAPs selected from a random library. (a) Retention of individual selection-enriched HFNAPs on immobilized IL-6 (target; blue bars) or immobilized PCSK9 (non-target; red bars). The experiment was performed as a screen to identify the most promising hits for further characterization, and was not performed in replicates. The percentages of each sequence in the pool after seven rounds of selection are listed to the left. (b) Sequence and side-chain structure of IL6-A7. Side-chains essential for binding activity are boxed. (c) SPR sensorgram characterizing binding kinetics between biotinylated IL6-A7 and its target IL-6 protein. The concentrations of injected IL-6 were 10, 30, 100, and 300 nM. The observed sensorgram is shown in red and the fitted curve with the kinetic parameters listed is shown in black.

The binding kinetics and affinity of IL6-A7 to E. coli-expressed IL-6 (used in the selection) and to human HEK293 cell-expressed IL-6 protein were comparable (Figure 2.12). We then measured by SPR the IL-6 affinity of IL6-A7 side-chain mutants in which all instances of a trimer building block were replaced by the corresponding trinucleotide lacking any side-chain. The mutants missing the isopentyl side-chain in either codon 14 or codon 15 exhibited severely impaired binding (Figure 2.12), implicating these two side-chains as key determinants of IL-6 binding activity in IL6-A7.

[Figure 2.12 plot: response (RU) versus time (s); series: IL6-A7 : IL-6 (E. coli-expressed), IL6-A7 : IL-6 (HEK293-expressed), CGG Δside-chain : IL-6 (E. coli-expressed), CTC Δside-chain : IL-6 (E. coli-expressed).]

Figure 2.12 | SPR characterization of binding between IL-6 and HFNAPs. The SPR sensorgrams show binding kinetics between biotinylated IL6-A7 (red) or two side-chain variants (blue and green) and E. coli-expressed IL-6 protein, and between biotinylated IL6-A7 and HEK293 cell-expressed IL-6 protein (purple). For all experiments, comparable amounts of the biotinylated HFNAPs (all between 90 and 120 RU) were immobilized on the active flow cell. The concentrations of injected IL-6 were 10, 30, 100, and 300 nM.

2.6 – Discussion

We used a ligase-mediated DNA-templated polymerization system and in vitro selection to evolve HFNAPs, nucleic acid polymers that are densely functionalized with chemically diverse side-chains. HFNAPs that bind PCSK9 and IL-6 were selected from random polymer libraries. We characterized structure-activity relationships within these polymers, revealing side-chains at specific positions that are critical to target-binding activity. Collectively, these findings represent to our knowledge the first laboratory evolution of functional, genetically encoded sequence-defined synthetic polymers without the constraints imposed by polymerases or ribosomes. The DNA-templated, ligase-based translation system developed here supports many rounds of iterated selection of polymers with diverse side-chains, including side-chains that mimic, and extend beyond, the repertoire of amino acid side-chains found in proteins. Both the PCSK9-binding and IL-6-binding polymers generated in this system exhibit position-dependent and side-chain-dependent structure-activity relationships resembling those of proteins. Finally, we note that the PCSK9-binding polymers generated in this work are strongly dependent on the presence of multiple side-chains with different physical properties, consistent with the importance of chemical diversity to the functional potential of these polymers.

The ligase-based polymerization method allows straightforward redesign of the genetic code of the polymer, as we exploited to expand the sequence and structural diversity of the polymers used in this work compared with those of our original system41. This feature also enables researchers to generate and select HFNAPs with side-chains tailored toward specific applications, as recently demonstrated by Hili and coworkers for scaffolding peptides on a DNA template47. Moreover, the side-chain flexibility of this polymer evolution system raises the possibility of performing parallel evolution experiments with libraries of different side-chain compositions to shed light on the fundamental relationship between the structure of the building blocks in a genetic code and the evolutionary potential of the resulting polymers.

References

1. Ellington, A. D. & Szostak, J. W. In vitro selection of RNA molecules that bind specific ligands. Nature 346, 818–822 (1990).

2. Tuerk, C. & Gold, L. Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science 249, 505–510 (1990).

3. Robertson, D. L. & Joyce, G. F. Selection in vitro of an RNA enzyme that specifically cleaves single-stranded DNA. Nature 344, 467–468 (1990).

4. Bock, L. C., Griffin, L. C., Latham, J. A., Vermaas, E. H. & Toole, J. J. Selection of single-stranded DNA molecules that bind and inhibit human thrombin. Nature 355, 564–566 (1992).

5. Ellington, A. D. & Szostak, J. W. Selection in vitro of single-stranded DNA molecules that fold into specific ligand-binding structures. Nature 355, 850–852 (1992).

6. Breaker, R. R. & Joyce, G. F. A DNA enzyme that cleaves RNA. Chem. Biol. 1, 223–229 (1994).

7. Mattheakis, L. C., Bhatt, R. R. & Dower, W. J. An in vitro polysome display system for identifying ligands from very large peptide libraries. Proc. Natl. Acad. Sci. U. S. A. 91, 9022–9026 (1994).

8. Roberts, R. W. & Szostak, J. W. RNA-peptide fusions for the in vitro selection of peptides and proteins. Proc. Natl. Acad. Sci. U. S. A. 94, 12297–12302 (1997).

9. Nemoto, N., Miyamoto-Sato, E., Husimi, Y. & Yanagawa, H. In vitro virus: Bonding of mRNA bearing puromycin at the 3’-terminal end to the C-terminal end of its encoded protein on the ribosome in vitro. FEBS Lett. 414, 405–408 (1997).

10. Yamaguchi, J. et al. cDNA display: a novel screening method for functional disulfide-rich peptides by solid-phase synthesis and stabilization of mRNA-protein fusions. Nucleic Acids Res. 37, e108 (2009).

11. Brudno, Y. & Liu, D. R. Recent Progress Toward the Templated Synthesis and Directed Evolution of Sequence-Defined Synthetic Polymers. Chem. Biol. 16, 265–276 (2009).

12. Chaput, J. C., Yu, H. & Zhang, S. The Emerging World of Synthetic Genetics. Chem. Biol. 19, 1360–1371 (2012).

13. Pinheiro, V. B. & Holliger, P. The XNA world: progress towards replication and evolution of synthetic genetic polymers. Curr. Opin. Chem. Biol. 16, 245–252 (2012).

14. Pinheiro, V. B. & Holliger, P. Towards XNA nanotechnology: new materials from synthetic genetic polymers. Trends Biotechnol. 32, 321–328 (2014).

15. Hollenstein, M. Nucleoside Triphosphates – Building Blocks for the Modification of Nucleic Acids. Molecules 17, 13569–13591 (2012).

16. Rogers, J. M. & Suga, H. Discovering functional, non-proteinogenic amino acid containing, peptides using genetic code reprogramming. Org. Biomol. Chem. 13, 9353–9363 (2015).

17. Yu, H., Zhang, S. & Chaput, J. C. Darwinian evolution of an alternative genetic system provides support for TNA as an RNA progenitor. Nat. Chem. 4, 183–187 (2012).

18. Pinheiro, V. B. et al. Synthetic Genetic Polymers Capable of Heredity and Evolution. Science 336, 341–344 (2012).

19. Taylor, A. I. et al. Catalysts from synthetic genetic polymers. Nature 518, 427–430 (2014).

20. Dunn, M. R. & Chaput, J. C. Reverse Transcription of Threose Nucleic Acid by a Naturally Occurring DNA Polymerase. ChemBioChem 17, 1804–1808 (2016).

21. Taylor, A. I. & Holliger, P. Directed evolution of artificial enzymes (XNAzymes) from diverse repertoires of synthetic genetic polymers. Nat. Protoc. 10, 1625–1642 (2015).

22. Vaught, J. D. et al. Expanding the chemistry of DNA for in vitro selection. J. Am. Chem. Soc. 132, 4141–4151 (2010).

23. Gold, L. et al. Aptamer-Based Multiplexed Proteomic Technology for Biomarker Discovery. PLoS ONE 5, e15004 (2010).

24. Davies, D. R. et al. Unique motifs and hydrophobic interactions shape the binding of modified DNA ligands to protein targets. Proc. Natl. Acad. Sci. U. S. A. 109, 19971–19976 (2012).

25. Gupta, S. et al. Chemically-Modified DNA Aptamers Bind Interleukin-6 with High Affinity and Inhibit Signaling by Blocking its Interaction with Interleukin-6 Receptor. J. Biol. Chem. 289, 8706–8719 (2014).

26. Gelinas, A. D. et al. Crystal Structure of Interleukin-6 in Complex with a Modified Nucleic Acid Ligand. J. Biol. Chem. 289, 8720–8734 (2014).

27. Imaizumi, Y. et al. Efficacy of Base-Modification on Target Binding of Small Molecule DNA Aptamers. J. Am. Chem. Soc. 135, 9412–9419 (2013).

28. Gawande, B. N. et al. Selection of DNA aptamers with two modified bases. Proc. Natl. Acad. Sci. U. S. A. 114, 2898–2903 (2017).

29. Tolle, F., Brändle, G. M., Matzner, D. & Mayer, G. A Versatile Approach Towards Nucleobase-Modified Aptamers. Angew. Chem. Int. Ed. 54, 10971–10974 (2015).

30. Santoro, S. W., Joyce, G. F., Sakthivel, K., Gramatikova, S. & Barbas, C. F. RNA cleavage by a DNA enzyme with extended chemical functionality. J. Am. Chem. Soc. 122, 2433–2439 (2000).

31. Perrin, D. M., Garestier, T. & Hélène, C. Expanding the catalytic repertoire of nucleic acid catalysts: simultaneous incorporation of two modified deoxyribonucleoside triphosphates bearing ammonium and imidazolyl functionalities. Nucleosides Nucleotides 18, 377–391 (1999).

32. Perrin, D. M., Garestier, T. & Hélène, C. Bridging the Gap between Proteins and Nucleic Acids: A Metal-Independent RNAseA Mimic with Two Protein-Like Functionalities. J. Am. Chem. Soc. 123, 1556–1563 (2001).

33. Lermer, L., Roupioz, Y., Ting, R. & Perrin, D. M. Toward an RNaseA mimic: A DNAzyme with imidazoles and cationic amines. J. Am. Chem. Soc. 124, 9960–9961 (2002).

34. Hollenstein, M., Hipolito, C. J., Lam, C. H. & Perrin, D. M. A self-cleaving DNA enzyme modified with amines, guanidines and imidazoles operates independently of divalent metal cations (M2+). Nucleic Acids Res. 37, 1638–1649 (2009).

35. Shoji, A., Kuwahara, M., Ozaki, H. & Sawai, H. Modified DNA Aptamer That Binds the (R)-Isomer of a Thalidomide Derivative with High Enantioselectivity. J. Am. Chem. Soc. 129, 1456–1464 (2007).

36. Sidorov, A. V., Grasby, J. A. & Williams, D. M. Sequence-specific cleavage of RNA in the absence of divalent metal ions by a DNAzyme incorporating imidazolyl and amino functionalities. Nucleic Acids Res. 32, 1591–1601 (2004).

37. Sefah, K. et al. In vitro selection with artificial expanded genetic information systems. Proc. Natl. Acad. Sci. U. S. A. 111, 1449–1454 (2014).

38. Zhang, L. et al. Evolution of functional six-nucleotide DNA. J. Am. Chem. Soc. 137, 6734–6737 (2015).

39. Kimoto, M., Yamashige, R., Matsunaga, K., Yokoyama, S. & Hirao, I. Generation of high-affinity DNA aptamers using an expanded genetic alphabet. Nat. Biotechnol. 31, 453–457 (2013).

40. Matsunaga, K., Kimoto, M. & Hirao, I. High-Affinity DNA Aptamer Generation Targeting von Willebrand Factor A1-Domain by Genetic Alphabet Expansion for Systematic Evolution of Ligands by Exponential Enrichment Using Two Types of Libraries Composed of Five Different Bases. J. Am. Chem. Soc. 139, 324–334 (2017).

41. Hili, R., Niu, J. & Liu, D. R. DNA ligase-mediated translation of DNA into densely functionalized nucleic acid polymers. J. Am. Chem. Soc. 135, 98–101 (2013).

42. Lei, Y., Kong, D. & Hili, R. A High-Fidelity Codon Set for the T4 DNA Ligase-Catalyzed Polymerization of Modified Oligonucleotides. ACS Comb. Sci. 17, 716–721 (2015).

43. Brudno, Y., Birnbaum, M. E., Kleiner, R. E. & Liu, D. R. An in vitro translation, selection and amplification system for peptide nucleic acids. Nat. Chem. Biol. 6, 148–155 (2010).

44. Cohen, J. C., Boerwinkle, E., Mosley, T. H. & Hobbs, H. H. Sequence Variations in PCSK9, Low LDL, and Protection against Coronary Heart Disease. N. Engl. J. Med. 354, 1264–1272 (2006).

45. Lambert, G., Charlton, F., Rye, K.-A. & Piper, D. E. Molecular basis of PCSK9 function. Atherosclerosis 203, 1–7 (2009).

46. Stein, E. A. et al. Effect of a Monoclonal Antibody to PCSK9 on LDL Cholesterol. N. Engl. J. Med. 366, 1108–1118 (2012).

47. Guo, C., Watkins, C. P. & Hili, R. Sequence-Defined Scaffolding of Peptides on Nucleic Acid Polymers. J. Am. Chem. Soc. 137, 11191–11196 (2015).

Chapter 3

Evolution of HFNAP and characterization of a high-affinity binder to PCSK9

Parts adapted from: Z. Chen, P. A. Lichtor, A. P. Berliner, J. C. Chen, & D. R. Liu. Nat. Chem. (2018) doi: 10.1038/s41557-018-0008-9.

Sunia Trauger (Harvard) collected oligonucleotide mass spectrometry data. Yun Yang and coworkers (WuXi AppTec) synthesized the modified nucleoside phosphoramidites. Doug Davies and coworkers (Beryllium Discovery) and Romain Rouet and coworkers (UC Berkeley) performed crystallization condition screening.

3.1 – Introduction

The discovery of aptamers from random sequence nucleic acid libraries through SELEX has been established since 1990.1,2 In recent years, however, the search for reliable ways to routinely generate aptamers whose affinity and specificity toward protein targets could equal or surpass those of antibodies, the current gold standard of affinity reagents, has prompted a new wave of technological innovation in the aptamer field.

One interesting line of research focused on the development of new aptamer discovery techniques. The advent of high-throughput sequencing technologies allowed researchers to profile aptamer candidate pools in much greater depth. These studies led to the finding that aptamers of the highest affinities may not attain the highest enrichment over many rounds of SELEX, perhaps due to biases accumulated in repeated amplifications3,4. To develop alternative methods to screen for the highest affinity aptamers after selection, Soh and coworkers explored strategies that feature either a small number of rounds of SELEX followed by massively parallel array-based affinity profiling5 or a single round of SELEX followed by high-throughput fluorescence-activated sorting of particles that each displays multiple clonal copies of an aptamer candidate6. Impressively, the particle display sorting strategy routinely generated natural DNA aptamers to protein targets with KD values in the low- to sub-nanomolar range6, and sorting can be performed in complex biological matrices such as serum to enrich highly specific binders7.

A complementary approach, from which our studies drew much inspiration, aimed to improve the aptamer candidate pools’ chemical composition to increase the probability of obtaining high-affinity aptamers. A systematic series of studies was published by researchers at SomaLogic, who selected aptamers against protein targets, including PCSK9, using conventional SELEX procedures from modified DNA libraries in which all instances of one or both pyrimidines (C and/or T) were replaced by side-chain-functionalized variants8,9. Libraries modified with hydrophobic aromatic side chains consistently afforded aptamers of higher affinity (KD values typically in the low- to sub-nanomolar range) than those from unmodified DNA libraries. Libraries modified with a combination of phenolic and hydrophobic aromatic side chains performed even better, while hydrophilic side chains alone did not significantly improve aptamer discovery success rate8,9. Hirao and coworkers, working with nucleic acid libraries containing base pairs orthogonal to the canonical A:T and G:C pairs, also noted that the inclusion of hydrophobic aromatic moieties in their DNA library improved the affinity of SELEX-derived aptamers toward protein targets10,11.

Crystal structures of several modified aptamers in complex with their respective target proteins have been solved. Remarkably, all these aptamers extend side-chain hydrophobic moieties toward their respective protein target, suggesting a common mechanism for affinity gain12. While the secondary structure of some of those aptamers can resemble well-known nucleic acid structural motifs13,14, other aptamers adopt strikingly unusual structures with aromatic side chains playing integral structural roles15,16. These studies inspired us to attempt to obtain detailed structural information of our PCSK9-binding HFNAPs.

The potential existence of better binders in the HFNAP sequence space that our initial SELEX may not have enriched, the reports of similarly side-chain-modified high-affinity aptamers against PCSK9 (KD down to 12 pM)9, and the desire to obtain an improved binder before committing to more detailed structural studies prompted us to evolve PCSK9-A5 for higher binding activity.

3.2 – Evolution of PCSK9-A5 by diversification and further selection

To evolve the PCSK9-A5 polymer into variants with improved PCSK9 affinity, we synthesized a library of mutated PCSK9-A5 templates containing 79% identity and 21% diversity (79:21 at the pyrimidine-only first position and 79:7:7:7 at the second and third positions of each codon) for each nucleotide in the variable region (Figure 3.1a), and then subjected the resulting mutated PCSK9-A5 library to additional iterated cycles of translation and selection for PCSK9 binding. A similar strategy has previously been used to improve RNA aptamers17,18 as well as DNA aptamers containing noncanonical bases10, and the outcome of the additional evolution could also be used to pinpoint functional sequence segments within the aptamer and to infer secondary structure information10. An initial attempt at evolving PCSK9-A5 failed due to rapid enrichment of cheaters (Figure 3.2a). We therefore used a non-extendable 2’,3’-dideoxyribose-terminated 3’ polymerization primer in subsequent selections, as truncated cheaters could be removed during PAGE purification (Figure 3.2b). After just one round of enrichment at a stringency level comparable to that of the last round of our initial selection, the mutated PCSK9-A5 library exhibited bulk affinity for PCSK9-conjugated beads (Figure 3.1b). High-throughput DNA sequencing revealed that the mutant population after a second round of translation and selection began to converge toward the sequence of PCSK9-A5 at many positions (Figure 3.1a). In subsequent rounds, we reduced the amount of immobilized PCSK9 to further increase selection stringency. Four additional rounds of translation and selection resulted in steadily improved retention of the polymer population on immobilized PCSK9 (Figure 3.1b).

[Figure 3.1 panels: (a) PCSK9-A5 is diversified and re-selected; sequencing results are shown at round 2 and, after increasing stringency, at round 6. Color key: blue = retained heterogeneity; green = enriched mutations; red = converged towards original sequence. (b) Ct(Flow-through) - Ct(Elute) versus selection round (rounds 1 to 6).]

Selection round  1    2    3    4    5    6
PCSK9 (pmol)     12   12   1.8  1.8  1.8  0.5

Figure 3.1 | Evolution of a PCSK9-binding polymer. (a) Evolution scheme and DNA sequencing results of the diversification and iterated selection of PCSK9-A5 variants with increased PCSK9 binding activity. (b) Affinity maturation of the diversified PCSK9-A5 pool. The evolving polymer pool’s bulk affinity to immobilized PCSK9 was assessed by quantifying the amount of HFNAP in the flow-through and the elution at each round of selection by quantitative PCR.
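The doping scheme described above (79% parental identity at each nucleotide of the 15-codon variable region) implies a particular distribution of mutational load in the starting library. The sketch below estimates it with a binomial model, assuming each of the 45 nucleotides is mutated independently with probability 0.21; the constant and function names are illustrative.

```python
from math import comb

P_MUTATION = 0.21        # per-nucleotide probability of a non-parental base
N_NUCLEOTIDES = 15 * 3   # 15 codons of 3 nucleotides each


def prob_k_mutations(k: int, n: int = N_NUCLEOTIDES, p: float = P_MUTATION) -> float:
    """Binomial probability that a library member carries exactly k mutations."""
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)


# Mean number of nucleotide substitutions per library member.
expected_mutations = N_NUCLEOTIDES * P_MUTATION

# Fraction of the library that is identical to the parent sequence.
parent_fraction = prob_k_mutations(0)

# Probability that a given codon is left entirely unchanged.
codon_unchanged = (1.0 - P_MUTATION) ** 3

if __name__ == "__main__":
    print(f"expected mutations per template: {expected_mutations:.2f}")
    print(f"fraction identical to parent:    {parent_fraction:.1e}")
    print(f"probability a codon is intact:   {codon_unchanged:.2f}")
```

Under this model a typical library member carries about nine or ten substitutions and roughly half of its codons are altered, so the library explores sequences well removed from PCSK9-A5 while the unmutated parent remains present only at a very low frequency.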

High-throughput sequencing revealed new consensus codons at four out of 15 positions within the population of evolved polymers (Figure 3a). Among these four positions, codons 6,

10, and 12 evolved a different side-chain compared with that of PCSK9-A5, while codon 9 evolved a different codon (TGT instead of TTT) encoding the same phenol side-chain at this position. In addition, five other codons converged to the original sequence in PCSK9-A5, and six

[Figure 3.2 panel content. (a) Desired product and cheater. (b) Desired product and would-be cheater, which is removed by PAGE after PCR.]

Figure 3.2 | Cheater suppression strategy. (a) In addition to the desired polymerization reaction (top), incorporation of a polymerization primer in the coding region can generate undesired cheaters (bottom) that are rapidly enriched over rounds of selection. While our first PCSK9-binding selection campaign was not substantially affected by cheaters, our initial attempt at evolving PCSK9-A5 for higher affinity was unsuccessful because the cheaters eventually took over the pool. (b) Using a non-extendable 2’,3’-dideoxyribose-terminated 3’ polymerization primer addresses this problem, as truncated cheaters can be removed during PAGE purification.

other codon positions, mostly near the 5’ end, retained sequence heterogeneity introduced during mutation, suggesting that these positions do not strongly contribute to binding activity (Figure 3.1a). The three side-chain functional groups that were shown to be crucial to the PCSK9-binding activity of PCSK9-A5 were all maintained in the consensus sequence of the evolved polymer population.
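The codon-level comparison described above can be reproduced with a short script. This is a minimal sketch: the PCSK9-A5 codons are taken from the PCSK9A5-DNA entry in the appendix and the evolved codons 6 to 15 from the Evo5 sequence shown in Figure 3.3; the function and variable names are illustrative and not part of the original analysis pipeline.

```python
# Variable-region codons of PCSK9-A5 (positions 1 to 15).
A5_CODONS = "CCA CGT TAC TGA TTC TTC TAC CGT TTT TGC CCC CGG CAC CTT CTG".split()
# Consensus codons 6 to 15 of the evolved polymer (Evo5).
EVO5_CODONS_6_TO_15 = "TGC TAC CGT TGT TTA CCC TGC CAC CTT CTG".split()


def changed_codons(parent, evolved, first_position):
    """Return (position, parent codon, evolved codon) for each difference.

    `first_position` is the 1-based codon number of evolved[0].
    """
    changes = []
    for offset, new in enumerate(evolved):
        position = first_position + offset
        old = parent[position - 1]
        if old != new:
            changes.append((position, old, new))
    return changes


CHANGES = changed_codons(A5_CODONS, EVO5_CODONS_6_TO_15, first_position=6)
for position, old, new in CHANGES:
    print(f"codon {position}: {old} -> {new}")
```

The positions reported by this comparison are codons 6, 9, 10, and 12, matching the four new consensus codons discussed in the text.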

3.3 – Activity and SAR characterization of the new consensus HFNAP, PCSK9-Evo5

We synthesized a biotinylated, truncated HFNAP (designated PCSK9-Evo5; Figure 3.3a) retaining only the evolved consensus sequence and the 3’ constant region and measured its affinity for PCSK9 to be KD = 3.0 nM (Figure 3.3a and 3.3b), representing a 33-fold improvement over that of PCSK9-A5. In contrast, similarly truncated versions of PCSK9-A5 that either retained (A5-trc1) or lost (A5-trc2) the 5’ constant region did not substantially improve affinity

(KD = 70 nM and 68 nM, respectively) over that of full-length PCSK9-A5 (KD = 98 nM) (Figure

3.3c). Additional truncation of PCSK9-Evo5 from the 3’-end resulted in complete loss of affinity

(Evo2, KD > 300 nM) (Figure 3.3c).
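As a quick consistency check on these numbers, the dissociation constant for a simple 1:1 binding model is the ratio of the off-rate to the on-rate. The sketch below uses the kinetic constants reported in Figure 3.3b and the KD values quoted in the text; the variable names are illustrative.

```python
# Kinetic constants for Evo5 : PCSK9 as reported in Figure 3.3b.
K_ON = 7.68e5    # M^-1 s^-1
K_OFF = 2.27e-3  # s^-1

# For 1:1 binding, KD = koff / kon; convert from M to nM.
KD_EVO5_NM = K_OFF / K_ON * 1e9

# Fold improvement over full-length PCSK9-A5 (KD = 98 nM), using the
# rounded Evo5 value (3.0 nM) quoted in the text.
FOLD_IMPROVEMENT = 98.0 / 3.0

print(f"KD(Evo5) = {KD_EVO5_NM:.2f} nM")
print(f"Fold improvement over A5 = {FOLD_IMPROVEMENT:.1f}")
```

The computed KD agrees with the reported 3.0 nM after rounding, and the ratio of the two KD values rounds to the 33-fold figure given above.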

Similar to the side-chain structure-activity relationships observed for PCSK9-A5, the removal of a single phenol or cyclopentyl side-chain from PCSK9-Evo5 abolished its affinity to

PCSK9 protein (Figure 3.4b), and the removal of the isopentyl side-chain severely impaired target binding (the binding interaction approximately fits a two-state reaction kinetic model with

KD ≈ 420 nM; Figure 3.4b and 3.4c). The individual removal of other side-chains had less significant effects on binding affinity; indeed, an HFNAP containing only three side chains

(phenol at position 9, isopentyl at position 21, and cyclopentyl at position 24) and the rest of the

[Figure 3.3 panel content.
(a) PCSK9-Evo5: GC TAC CGT TGT TTA CCC TGC CAC CTT CTG GAGTCCAGATGTAGGTAG
(b) SPR sensogram, Evo5 : PCSK9; response (RU) versus time (s). kon = 7.68 × 10^5 M^-1 s^-1; koff = 2.27 × 10^-3 s^-1; KD = 3.0 nM.
(c) Sequences and KD values:
A5 (98 nM): CGAATCAGATTGGACCAG CCA CGT TAC TGA TTC TTC TAC CGT TTT TGC CCC CGG CAC CTT CTG GAGTCCAGATGTAGGTAG
A5-trc1 (70 nM): CGAATCAGATTGGACCAG TTC TAC CGT TTT TGC CCC CGG CAC CTT CTG GAGTCCAGATGTAGGTAG
A5-trc2 (68 nM): TC TTC TAC CGT TTT TGC CCC CGG CAC CTT CTG GAGTCCAGATGTAGGTAG
Evo2 (>300 nM): TC TGC TAC CGT TGT TTA CCC TGC CAC CTT CTG GAGTCCAGATGTAG----]

Figure 3.3 | Structure and PCSK9-binding activity of PCSK9-Evo5, PCSK9-A5, and their truncated variants. (a) Structure of the evolved polymer PCSK9-Evo5. (b) SPR sensogram characterizing binding kinetics between surface-immobilized PCSK9-Evo5 polymer and the target PCSK9 protein. The concentrations of injected PCSK9 were 2, 6, 20, and 60 nM. The observed sensogram is shown in red and the fitted curve with the kinetic parameters listed is shown in black. (c) Structure and PCSK9-binding activity of PCSK9-A5, two of its truncation variants, and a truncation variant of PCSK9-Evo5. Numbers in parentheses are KD values to PCSK9 protein determined by SPR.

Evo5 sequence as unfunctionalized DNA maintains strong binding to PCSK9 (KD = 3.7 nM; Figure 3.4b and 3.4c).

Together, these results establish the evolution (iterated selection with intervening mutation and replication) of a sequence-defined synthetic polymer with improved target affinity.

[Figure 3.4 panel content.
(a) PCSK9-Evo5: GC TAC CGT TGT TTA CCC TGC CAC CTT CTG GAGTCCAGATGTAGGTAG
(b) Kinetic parameters measured by SPR:

Variant              kon (10^5 M^-1 s^-1)   koff (10^-3 s^-1)   KD (nM)
Evo5                 7.68                   2.27                3.0
TAC ΔSide chain      8.38                   2.96                3.5
TAC Linker only      6.64                   2.93                4.4
CGT ΔSide chain      6.14                   4.42                7.2
TGT ΔSide chain      -                      -                   >60
TGT Linker only      -                      -                   >60
TTA ΔSide chain      5.83                   3.08                5.3
CCC ΔSide chain      5.15                   2.16                4.2
TGC ΔSide chain      6.76                   4.10                6.1
TGC Linker only      5.34                   2.69                5.0
CAC ΔSide chain      -                      -                   ~420
CTT ΔSide chain      -                      -                   >60
CTG ΔSide chain      6.17                   4.12                6.7
3 side chains only   18.3                   6.82                3.7

(c) SPR sensograms; response (RU) versus time (s).
CAC ΔSide chain : PCSK9, two-state model A + B ⇌ AB ⇌ AB*: ka1 = 7.68 × 10^5 M^-1 s^-1; kd1 = 2.27 × 10^-3 s^-1; ka2 = 2.48 × 10^-3 s^-1; kd2 = 3.74 × 10^-3 s^-1; KD,total = 420 nM.
3 side-chains : PCSK9: kon = 1.83 × 10^6 M^-1 s^-1; koff = 6.82 × 10^-3 s^-1; KD = 3.7 nM.]

Figure 3.4 | Side-chain structure-activity relationship of PCSK9-Evo5. (a) Sequence and side-chain structure of PCSK9-Evo5. Side chains essential for binding activity are boxed. (b) Kinetic parameters for binding of PCSK9-Evo5 or its side-chain-deficient variants to PCSK9 protein, as measured by SPR. For the variants “TGT ΔSide chain”, “TGT Linker only”, and “CTT ΔSide chain”, no SPR signal was observed at the highest analyte concentration tested (60 nM PCSK9). For the variant “CAC ΔSide chain”, the binding interaction fits a two-state reaction kinetic model with KD ≈ 420 nM. (c) SPR sensograms characterizing binding kinetics between surface-immobilized PCSK9-Evo5 side-chain variants (“CAC ΔSide chain” and “3 side-chains”) and the target PCSK9 protein. The concentrations of injected PCSK9 were 2, 6, 20, and 60 nM. The observed sensograms are shown in red and the fitted curves with the kinetic parameters listed are shown in black.

3.4 – Milligram scale chemical synthesis of Evo5

To further characterize PCSK9-Evo5, we executed the multi-milligram-scale total synthesis of a variant of PCSK9-Evo5 (PCSK9-Evo5-syn), which has an additional 3’ inverted dT base for exonuclease resistance,20 using standard phosphoramidite chemistry on solid support

(Figure 3.5a). We planned to install all side-chain-functionalized building blocks entirely through corresponding side-chain-functionalized phosphoramidite reagents, with the nucleophilic

[Figure 3.5 panel content. (a) Synthetic scheme for PCSK9-Evo5-syn. (b) ESI-MS spectrum (negative mode, 0.3-0.6 min), intensity versus m/z (1800 to 2800), with charge-state peaks at m/z 1773.5579 (9-), 1995.3797 (8-), 2280.5777 (7-), and 2660.8384 (6-). For the [M-7H]7- cluster (C536H683N177O305P48), the observed isotopic peaks span m/z 2279.7200 to 2281.7228 (most intense at 2280.5777) and the predicted peaks span m/z 2279.7186 to 2281.7238 (most intense at 2280.5780).]

Figure 3.5 | Milligram scale chemical synthesis of Evo5. (a) Synthetic scheme. (b) Left: ESI-MS spectrum of PCSK9-Evo5-syn. Molecular formula of PCSK9-Evo5-syn is C536H683N177O305P48. Top right: observed spectrum of the [M-7H]7- peak cluster. Bottom right: predicted spectrum based on the molecular formula and isotopic abundances.

functional groups protected with base-labile protecting groups that would be removed under standard oligonucleotide deprotection conditions. Although most of the side-chain-functionalized nucleoside phosphoramidites were readily synthesized, the imidazole-bearing thymidine reagent proved difficult to prepare. We therefore synthesized PCSK9-Evo5-syn on solid phase with an activated ester (NHS-carboxy-dT) in place of the imidazole-functionalized thymidine, and coupled histamine to that position in the bead-bound polymer chain to install the imidazole side chain before global deprotection and cleavage from solid support with ammonium hydroxide (Figure 3.5a). The identity of PCSK9-Evo5-syn (approximately 2 mg from a one-micromole solid-phase synthesis, ~13% overall yield) was confirmed by high-resolution mass spectrometry (Figure 3.5b).
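The quoted yield and the position of the [M-7H]7- peak cluster can be checked from the molecular formula given in Figure 3.5. The following is a rough sketch, assuming standard average atomic masses and treating the stated formula as the neutral species; it estimates the average mass rather than the exact isotopic pattern, and the names are illustrative.

```python
# Standard average atomic masses (g/mol), rounded.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "P": 30.974}
# Molecular formula of PCSK9-Evo5-syn from Figure 3.5: C536H683N177O305P48.
FORMULA = {"C": 536, "H": 683, "N": 177, "O": 305, "P": 48}


def average_mass(formula):
    """Average molecular mass in g/mol from element counts."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


MW = average_mass(FORMULA)

# Theoretical mass (mg) from a one-micromole synthesis, and yield for 2 mg.
THEORETICAL_MG = 1e-6 * MW * 1e3
YIELD_FRACTION = 2.0 / THEORETICAL_MG

# Approximate average m/z of the [M-7H]7- ion (seven hydrogens removed).
MZ_7_MINUS = (MW - 7 * ATOMIC_MASS["H"]) / 7

print(f"MW = {MW:.1f} g/mol")
print(f"yield = {YIELD_FRACTION:.1%}")
print(f"[M-7H]7- average m/z = {MZ_7_MINUS:.2f}")
```

With these inputs the yield comes out at roughly 12.5%, in line with the ~13% quoted, and the estimated average m/z falls inside the observed 7- peak cluster.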

High-affinity binding (KD = 6.8 nM) between totally synthetic PCSK9-Evo5-syn and biotinylated Avi-tagged PCSK9 was confirmed by SPR (Figure 3.6), though the sensogram fit less well to classic 1:1 kinetics in this format than in previous SPR assays, in which Evo5 was immobilized and PCSK9 was injected in flow, hinting at more complex binding dynamics than we initially realized.

[Figure 3.6 content. SPR sensogram; response (RU) versus time (s). kon = 3.56 × 10^5 M^-1 s^-1; koff = 2.42 × 10^-3 s^-1; KD = 6.8 nM.]

Figure 3.6 | SPR sensogram characterizing binding kinetics between PCSK9-Evo5-syn and surface-immobilized PCSK9 protein. The concentrations of injected PCSK9-Evo5-syn were 1.8, 6, 18, 60, and 180 nM. The observed sensogram is shown in red and the fitted curve with the kinetic parameters listed is shown in black.

In addition, binding of a similarly synthesized Alexa Fluor 647-labeled Evo5 (“Evo5-Fluor”; Figure 3.7a) at low nanomolar concentrations to PCSK9 was confirmed in an electrophoretic mobility shift assay (EMSA) (Figure 3.7b). We also observed a significant amount of Evo5-Fluor homodimer under EMSA conditions (Figure 3.7b). Similar dimerization may have caused the complex binding dynamics observed in the PCSK9-Evo5-syn SPR assay (Figure 3.6). Consistent with the previously established side-chain structure-activity relationships, sequence-matched DNA without any side chains did not exhibit binding affinity to PCSK9 protein at comparable concentrations (Figure 3.7b). Interestingly, the sequence-matched DNA also formed significant amounts of homodimers, suggesting that dimerization was dependent on the DNA sequence, rather than the side chains, of Evo5-Fluor.

[Figure 3.7 panel content. (a) Solid-phase synthesis followed by (1) histamine coupling, (2) NH4OH, and (3) fluorous capture and elution gives the functionalized strand TGC TAC CGT TGT TTA CCC TGC CAC CTT CTG, which is joined to p-GAGTCCAGATGTAGGTAG-Spacer18-AlexaFluor647 by T3 DNA ligase on a template to give Evo5-Fluor: TGC TAC CGT TGT TTA CCC TGC CAC CTT CTG GAGTCCAGATGTAGGTAG-Spacer18-AlexaFluor647. (b) EMSA gel: Evo5-Fluor (1 nM) and sequence-matched DNA (1 nM), each with PCSK9 at 300, 100, 30, 10, 3, 1, 0.3, and 0 nM. Labeled bands: Evo5-Fluor/PCSK9 complex, Evo5-Fluor dimer, Evo5-Fluor.]

Figure 3.7 | Electrophoretic mobility shift assay (EMSA) confirms binding of Evo5-Fluor to PCSK9 protein. (a) Synthesis of the Alexa Fluor 647-labeled Evo5-Fluor. (b) EMSA results testing Evo5-Fluor binding to PCSK9. Evo5-Fluor (1 nM) could form a complex with PCSK9 protein at concentrations down to 0.3 nM, while a sequence-matched DNA (1 nM) formed a small amount of complex only at 300 nM PCSK9. Both Evo5-Fluor and the sequence-matched DNA dimerize under the assay conditions (4 °C), resulting in the extra band.

3.5 – Evo5 potently disrupts PCSK9-LDLR binding in vitro

PCSK9 regulates cholesterol metabolism by binding the LDL receptor (LDLR) and promoting the lysosomal degradation of LDLR.21 We tested the ability of PCSK9-Evo5-syn to disrupt PCSK9-LDLR binding in an SPR assay. PCSK9-Evo5-syn dose-dependently reduced binding of PCSK9 to surface-immobilized LDLR (Figure 3.8a and 3.8b). The potency of PCSK9-Evo5-syn inhibition of PCSK9-LDLR binding (IC50 ≈ 9 nM) was similar to that of a known PCSK9-neutralizing monoclonal antibody (Figure 3.8a). In contrast, unfunctionalized DNA of the same sequence as PCSK9-Evo5-syn produced no inhibitory effect (Figure 3.8a), consistent with the necessity of the side chains implicated in PCSK9 binding.

[Figure 3.8 panel content. (a) Normalized PCSK9-LDLR SPR response versus competitor concentration (1 to 1000 nM) for four competitors: DNA, unlabeled LDLR, PCSK9-Evo5-syn, and PCSK9-neutralizing Ab. (b) SPR sensograms, response (RU) versus time (s), for 20 nM PCSK9 alone and with 2, 6, 20, 60, or 200 nM Evo5-syn.]

Figure 3.8 | Evo5 disrupts PCSK9-LDLR binding in vitro. (a) SPR response on an LDLR-coated surface produced by flowing PCSK9 in the presence of either PCSK9-Evo5-syn, unfunctionalized DNA of identical nucleotide sequence, unlabeled LDLR, or a known PCSK9-neutralizing monoclonal antibody. The SPR response shown is normalized to the response in experiments without any competitor. Error bars represent s.d. (n = 3). (b) A representative group of SPR sensograms characterizing PCSK9-LDLR binding in the presence of varying concentrations of PCSK9-Evo5-syn.

We also tested the affinity of surface-immobilized PCSK9-Evo5 to different PCSK9 protein constructs and found that a truncated PCSK9 variant lacking the prodomain exhibited no apparent binding to PCSK9-Evo5 (Figure 3.9), implicating the PCSK9 prodomain, known to be involved in a secondary binding interface between PCSK9 and LDLR,22 in mediating the interaction of PCSK9 with PCSK9-Evo5.

[Figure 3.9 content. SPR sensograms, response (RU) versus time (s), for PCSK9 (full) and PCSK9 (153-692).]

Figure 3.9 | Effect of truncating PCSK9 on its binding affinity to PCSK9-Evo5. SPR sensograms show binding kinetics between surface-immobilized biotinylated PCSK9-Evo5 and either full-length PCSK9 protein (red) or a truncated PCSK9 protein missing the prodomain (green). The concentrations of injected protein were 2, 6, 20, and 60 nM.

3.6 – Efforts toward structural characterization of Evo5:PCSK9 complex

To gain further insight into the secondary structures of PCSK9-binding HFNAPs and the affinity improvement of PCSK9-Evo5 over PCSK9-A5, we subjected the DNA sequences of PCSK9-Evo5 and PCSK9-A5trc2 to secondary structure prediction using mFold.23 The 3’ region, conserved between PCSK9-A5trc2 and PCSK9-Evo5, was predicted to form a stem-loop structure containing a bulge (Figure 3.10). The 5’ region, which includes all the consensus mutations leading to PCSK9-Evo5, contained significantly less (in the case of PCSK9-A5trc2) or no (in the case of PCSK9-Evo5) predicted secondary structure based on Watson-Crick base-pairing (Figure 3.10). Consistent with the observed side-chain structure-activity relationships (Figure 3.4b, 3.7b, and 3.8a), these calculations further suggest crucial contributions of side chains, which are not captured in the secondary structure model, to HFNAP folding and target-binding activity.

[Figure 3.10 panel content. Predicted secondary structure drawings for (a) PCSK9-A5trc2 (DNA) and (b) PCSK9-Evo5 (DNA).]

Figure 3.10 | Secondary structure predictions of HFNAPs based on their DNA sequences. (a) Predicted minimum free energy secondary structure based on the DNA sequence of PCSK9-A5trc2. (b) Predicted minimum free energy secondary structure based on the DNA sequence of PCSK9-Evo5. Nucleotides in the constant, unfunctionalized primer-binding region are shown in lowercase letters, and nucleotides in the variable, functionalized region are shown in uppercase letters. Note that predictions do not consider side chains, which are likely to be crucial in HFNAP folding.

We attempted to obtain crystals of the Evo5:PCSK9 complex, but no crystals emerged after a crystallization condition screen. We then attempted to obtain crystals of Evo5 bound to a variant of PCSK9 (“PCSK9-trunc”) missing segments at both termini that were not resolved in any previous crystal structures. Incidentally, the affinity between PCSK9-trunc and Evo5 was improved by about 10-fold (KD = 0.3 nM) over that between full-length PCSK9 and Evo5, perhaps due to the removal of unfavorable electrostatic interactions between Evo5 and the highly acidic N-terminal segment of PCSK9. Nonetheless, macromolecular crystals unfortunately did not emerge after extensive screening of crystallization conditions.

3.7 – Discussion

Through diversification and further selection, we evolved an improved PCSK9-binding HFNAP (Evo5) with KD = 3 nM. Recently, Gawande and coworkers performed selections for PCSK9 aptamers from modified DNA libraries in which all instances of one or both pyrimidines (C and/or T) were replaced by side-chain-functionalized variants.9 High-affinity aptamers (KD down to 12 pM) were enriched from doubly modified libraries in which hydrophobic or phenolic side chains were present on 50% of the nucleobases on average. Aptamers enriched from singly modified libraries (25% hydrophobic side chains on average) were less potent (KD ≥ 100 pM), while libraries containing hydrophilic side chains or consisting of unmodified DNA did not produce aptamers with KD ≤ 30 nM. The affinity (KD = 3 nM) of the best PCSK9 binder we have found from our HFNAP library, which contains a roughly equal mix of hydrophilic and hydrophobic side chains installed at 33% total frequency, is approximately in line with the correlation that Gawande and coworkers observed between the hydrophobic composition of a modified DNA library and the affinity of selected binders. This agreement suggests a generalizable trend that may inform nucleic acid-based affinity reagent discovery efforts. Nonetheless, the diverse, balanced set of side chains in HFNAPs, similar to the natural repertoire of proteins, may be more versatile in other settings.

As in the original A5 polymer, structure-activity relationships within Evo5 revealed side chains of different physical properties at specific positions that are critical to target-binding activity. While we did not successfully obtain a crystal structure for Evo5, this side-chain SAR, coupled with secondary structure predictions, strongly suggested that the folding of Evo5 could be dependent on cooperative interactions that involve both the DNA backbone and the side-chain moieties. Aided by a multi-milligram scale chemical synthesis of PCSK9-Evo5-syn, we further determined that Evo5 potently inhibits binding between PCSK9 and the LDL receptor.

References

1. Ellington, A. D. & Szostak, J. W. In vitro selection of RNA molecules that bind specific ligands. Nature 346, 818–822 (1990).

2. Tuerk, C. & Gold, L. Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science 249, 505–510 (1990).

3. Cho, M. et al. Quantitative selection of DNA aptamers through microfluidic selection and high-throughput sequencing. Proc. Natl. Acad. Sci. 107, 15373–15378 (2010).

4. Schütze, T. et al. Probing the SELEX Process with Next-Generation Sequencing. PLOS ONE 6, e29604 (2011).

5. Cho, M. et al. Quantitative selection and parallel characterization of aptamers. Proc. Natl. Acad. Sci. 110, 18460–18465 (2013).

6. Wang, J. et al. Particle Display: A Quantitative Screening Method for Generating High-Affinity Aptamers. Angew. Chem. Int. Ed. 53, 4796–4801 (2014).

7. Wang, J. et al. Multiparameter Particle Display (MPPD): A Quantitative Screening Method for the Discovery of Highly Specific Aptamers. Angew. Chem. Int. Ed. 56, 744–747 (2017).

8. Gold, L. et al. Aptamer-Based Multiplexed Proteomic Technology for Biomarker Discovery. PLoS ONE 5, e15004 (2010).

9. Gawande, B. N. et al. Selection of DNA aptamers with two modified bases. Proc. Natl. Acad. Sci. U. S. A. 114, 2898–2903 (2017).

10. Kimoto, M., Yamashige, R., Matsunaga, K., Yokoyama, S. & Hirao, I. Generation of high-affinity DNA aptamers using an expanded genetic alphabet. Nat. Biotechnol. 31, 453–457 (2013).

11. Matsunaga, K., Kimoto, M. & Hirao, I. High-Affinity DNA Aptamer Generation Targeting von Willebrand Factor A1-Domain by Genetic Alphabet Expansion for Systematic Evolution of Ligands by Exponential Enrichment Using Two Types of Libraries Composed of Five Different Bases. J. Am. Chem. Soc. 139, 324–334 (2017).

12. Gelinas, A. D., Davies, D. R. & Janjic, N. Embracing proteins: structural themes in aptamer–protein complexes. Curr. Opin. Struct. Biol. 36, 122–132 (2016).

13. Davies, D. R. et al. Unique motifs and hydrophobic interactions shape the binding of modified DNA ligands to protein targets. Proc. Natl. Acad. Sci. U. S. A. 109, 19971–19976 (2012).

14. Gelinas, A. D. et al. Crystal Structure of Interleukin-6 in Complex with a Modified Nucleic Acid Ligand. J. Biol. Chem. 289, 8720–8734 (2014).

15. Jarvis, T. C. et al. Non-helical DNA Triplex Forms a Unique Aptamer Scaffold for High Affinity Recognition of Nerve Growth Factor. Structure 23, 1293–1304 (2015).

16. Ren, X., Gelinas, A. D., von Carlowitz, I., Janjic, N. & Pyle, A. M. Structural basis for IL-1α recognition by a modified DNA aptamer that specifically inhibits IL-1α signaling. Nat. Commun. 8, 810 (2017).

17. Hirao, I. et al. In Vitro Selection of RNA Aptamers that Bind to Colicin E3 and Structurally Resemble the Decoding Site of 16S Ribosomal RNA. Biochemistry 43, 3214–3221 (2004).

18. Duclair, S., Gautam, A., Ellington, A. & Prasad, V. R. High-affinity RNA Aptamers Against the HIV-1 Protease Inhibit Both In Vitro Protease Activity and Late Events of Viral Replication. Mol. Ther. - Nucleic Acids 4, e228 (2015).

19. Ortigão, J. F. R. et al. Antisense Effect of Oligodeoxynucleotides with Inverted Terminal Internucleotidic Linkages: A Minimal Modification Protecting against Nucleolytic Degradation. Antisense Res. Dev. 2, 129–146 (1992).

20. Lambert, G., Charlton, F., Rye, K.-A. & Piper, D. E. Molecular basis of PCSK9 function. Atherosclerosis 203, 1–7 (2009).

21. Lo Surdo, P. et al. Mechanistic implications for LDL receptor degradation from the PCSK9/LDLR structure at neutral pH. EMBO Rep. 12, 1300–1305 (2011).

22. Zuker, M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31, 3406–3415 (2003).

Appendix: Materials and Methods

A.1 – Oligonucleotide sequences

All occurrences of U below are 2’-deoxy-U. Commercially available oligonucleotide modifiers are denoted by shorthand notations used by IDT.

Evaluation of translation yield on template libraries

Name              Sequence
Template-11codon  CGTACGGTCGACGCTAGCNNRNNRNNRNNRNNRNNRNNRNNRNNRNNRNNRCACGTGGAGCTCGGATCC
Template-13codon  CGTACGGTCGACGCTAGCNNRNNRNNRNNRNNRNNRNNRNNRNNRNNRNNRNNRNNRCACGTGGAGCTCGGATCC
Template-15codon  CGTACGGTCGACGCTAGCNNRNNRNNRNNRNNRNNRNNRNNRNNRNNRNNRNNRNNRNNRNNRCACGTGGAGCTCGGATCC
pp1-library       /5Phos/GCTAGCGTCGACCGTACG
pp2-library       GGATCCGAGCTCCACGTG
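The template libraries above use IUPAC degenerate bases (N = any base, R = purine). A minimal sketch, assuming only those standard IUPAC definitions, shows how the number of distinct sequences encoded by a degenerate template can be counted; the helper names are illustrative. Each NNR template codon has 32 possible sequences, the same as the number of building blocks described in the abstract.

```python
# Standard IUPAC codes used in the template libraries.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "N": "ACGT"}


def n_variants(sequence: str) -> int:
    """Count the distinct sequences encoded by a degenerate sequence."""
    total = 1
    for base in sequence.replace(" ", ""):
        total *= len(IUPAC[base])
    return total


# Template-15codon rebuilt from its two constant regions and 15 NNR codons.
TEMPLATE_15 = "CGTACGGTCGACGCTAGC" + "NNR" * 15 + "CACGTGGAGCTCGGATCC"

print("variants per NNR codon:", n_variants("NNR"))
print("template length:", len(TEMPLATE_15))
print("theoretical library diversity:", n_variants(TEMPLATE_15))
```

The theoretical diversity of the 15-codon library is 32^15 (about 3.8 × 10^22), far more than can be sampled in a single selection experiment, so any physical library covers only a small fraction of sequence space.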

Validation of sequence specificity of translation and amplification

Name              Sequence
Template-Bt-CATA  /52-Bio//iSp18/CGTACGGTCGACGCTAGCTTGAAAGTGCAAGAGACACCGCGACACGTGGAGCTCGGATCC
Template-Bt-CBTB  /52-Bio//iSp18/CGTACGGTCGACGCTAGCATGTTACTGGTATCGTGAGCGGGACACGTGGAGCTCGGATCC
Template-Bt-CCTC  /52-Bio//iSp18/CGTACGGTCGACGCTAGCTAGATATGGCTAGGGTCAAGGGCACACGTGGAGCTCGGATCC
Template-Bt-CDTD  /52-Bio//iSp18/CGTACGGTCGACGCTAGCAAGTAACAGGAAACGAGACGGCCACACGTGGAGCTCGGAUCC
pp1-library-T7    /5Phos/GCTAGCGTCGACCGTACGAGCGTCGCTACGCGTGAC
pp2-library       GGATCCGAGCTCCACGTG
T7-out-PCR2       TAATACGACTCACTATAGGGCTCGATTTAATTTCGCCGACGTGATGACATTCCAGGCAGTGTCACGCGTAGCGACGCT


PCSK9 binder selection and evolution

Name                 Sequence
Naïve library AZ15   CGA ATC AGA TTG GAC CAG YNN YNN YNN YNN YNN YNN YNN YNN YNN YNN YNN YNN YNN YNN YNN GAG TCC AGA TGT AGG TAG
BtBt-ExtA            /52-Bio//iSp18/CTA CCT ACA TCT GGA CTC
ExtA                 CTA CCT ACA TCT GGA CTC
pp1A                 /5Phos/GAG TCC AGA TGT AGG TAG
pp2Z                 CGA ATC AGA TTG GAC CAG
MiSeqA               ACA CTC TTT CCC TAC ACG ACG CTC TTC CGA TCT NNNN CTA CCT ACA TCT GGA CTC
MiSeqZ               TGG AGT TCA GAC GTG TGC TCT TCC GAT CT NNNN CGA ATC AGA TTG GAC CAG
IlluminaAdapterFwd   AATGATACGGCGACCACCGAGATCTACAC[8-base barcode]ACACTCTTTCCCTACACGAC
IlluminaAdapterRev   CAAGCAGAAGACGGCATACGAGAT[8-base barcode]GTGACTGGAGTTCAGACGTGTGCT
Rediv library AZ15   CGA ATC AGA TTG GAC CAG XZP XFO JPZ JFP JOZ JOZ JPZ XFO JOO JFZ XZZ XFF XPZ XOO XOF GAG TCC AGA TGT AGG TAG
pp1A-3ddC            /5Phos/GAG TCC AGA TGT AGG TAG/3ddC/

Mixed-base symbols used in Rediv library AZ15:
X = 79% dC, 21% T
Z = 7% dA, 79% dC, 7% dG, 7% T
P = 79% dA, 7% dC, 7% dG, 7% T
F = 7% dA, 7% dC, 79% dG, 7% T
O = 7% dA, 7% dC, 7% dG, 79% T
J = 21% dC, 79% T
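The doping scheme of the rediversified library can be checked programmatically. The sketch below decodes each mixed-base symbol to its majority nucleotide and estimates how heavily the library is mutated. It assumes each position is drawn independently at the listed frequencies; the dictionary and function names are illustrative.

```python
# Nucleotide frequencies for each mixed-base symbol in Rediv library AZ15.
DOPING = {
    "X": {"C": 0.79, "T": 0.21},
    "J": {"C": 0.21, "T": 0.79},
    "Z": {"A": 0.07, "C": 0.79, "G": 0.07, "T": 0.07},
    "P": {"A": 0.79, "C": 0.07, "G": 0.07, "T": 0.07},
    "F": {"A": 0.07, "C": 0.07, "G": 0.79, "T": 0.07},
    "O": {"A": 0.07, "C": 0.07, "G": 0.07, "T": 0.79},
}
REDIV_CODONS = "XZP XFO JPZ JFP JOZ JOZ JPZ XFO JOO JFZ XZZ XFF XPZ XOO XOF".split()


def parent_codon(codon: str) -> str:
    """Decode a doped codon to its most frequent (parental) nucleotides."""
    return "".join(max(DOPING[s], key=DOPING[s].get) for s in codon)


PARENT = [parent_codon(codon) for codon in REDIV_CODONS]

# Probability that a codon is entirely parental (all three bases unchanged).
P_CODON_UNCHANGED = 0.79 ** 3

# Expected number of mutated nucleotides across the 45-nt variable region.
EXPECTED_MUTATIONS = sum(
    1.0 - max(DOPING[s].values()) for codon in REDIV_CODONS for s in codon
)

print(" ".join(PARENT))
print(round(P_CODON_UNCHANGED, 3), round(EXPECTED_MUTATIONS, 2))
```

The decoded parental codons match the variable region of PCSK9A5-DNA, and under the independence assumption roughly half of all codons in a library member carry at least one mutation.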

Synthesis of putative PCSK9 binders for bead retention assay

Name             Sequence
pp1A             /5Phos/GAG TCC AGA TGT AGG TAG
pp2Z             CGA ATC AGA TTG GAC CAG
PCSK9A1-BtTempl  /52-Bio//iSp18/CTA CCT ACA TCT GGA CTC CAG AAG GTG ATG CAA AGG CAA ACG GTA GGA GAG AGA ATA TAG TGG CTG GTC CAA TCT GAT TCG
PCSK9A1-DNA      CGA ATC AGA TTG GAC CAG CCA CTA TAT TCT CTC TCC TAC CGT TTG CCT TTG CAT CAC CTT CTG GAG TCC AGA TGT AGG TAG
PCSK9A2-BtTempl  /52-Bio//iSp18/CTA CCT ACA TCT GGA CTC TCA GCG CAG AAG GTG AGG GCA AAA ACG GTA GAA GAA TCA TTG TTG CTG GTC CAA TCT GAT TCG
PCSK9A2-DNA      CGA ATC AGA TTG GAC CAG CAA CAA TGA TTC TTC TAC CGT TTT TGC CCT CAC CTT CTG CGC TGA GAG TCC AGA TGT AGG TAG
PCSK9A3-BtTempl  /52-Bio//iSp18/CTA CCT ACA TCT GGA CTC TGA TCA GGA TAA GTA GTG CTA AAG ACA TGA AAG AGG TTG TAG AGA CTG GTC CAA TCT GAT TCG
PCSK9A3-DNA      CGA ATC AGA TTG GAC CAG TCT CTA CAA CCT CTT TCA TGT CTT TAG CAC TAC TTA TCC TGA TCA GAG TCC AGA TGT AGG TAG
PCSK9A4-BtTempl  /52-Bio//iSp18/CTA CCT ACA TCT GGA CTC GAA TAG GTA CCG CTA AAG ACG TGA TAG AAA CGA AAA GCA TTG GGG CTG GTC CAA TCT GAT TCG
PCSK9A4-DNA      CGA ATC AGA TTG GAC CAG CCC CAA TGC TTT TCG TTT CTA TCA CGT CTT TAG CGG TAC CTA TTC GAG TCC AGA TGT AGG TAG
PCSK9A5-BtTempl  /52-Bio//iSp18/CTA CCT ACA TCT GGA CTC CAG AAG GTG CCG GGG GCA AAA ACG GTA GAA GAA TCA GTA ACG TGG CTG GTC CAA TCT GAT TCG
PCSK9A5-DNA      CGA ATC AGA TTG GAC CAG CCA CGT TAC TGA TTC TTC TAC CGT TTT TGC CCC CGG CAC CTT CTG GAG TCC AGA TGT AGG TAG
PCSK9A6-BtTempl  /52-Bio//iSp18/CTA CCT ACA TCT GGA CTC GGA ACA ACG CAG AAG GAG CCG TGG AGA AGG CAG AAG AGG TTG AAA CTG GTC CAA TCT GAT TCG
PCSK9A6-DNA      CGA ATC AGA TTG GAC CAG TTT CAA CCT CTT CTG CCT TCT CCA CGG CTC CTT CTG CGT TGT TCC GAG TCC AGA TGT AGG TAG
PCSK9A7-BtTempl  /52-Bio//iSp18/CTA CCT ACA TCT GGA CTC CAG AAG GTG AAA GTA AGA GAA CAA ACG GTA GAG GAA TAG GGA AGA CTG GTC CAA TCT GAT TCG
PCSK9A7-DNA      CGA ATC AGA TTG GAC CAG TCT TCC CTA TTC CTC TAC CGT TTG TTC TCT TAC TTT CAC CTT CTG GAG TCC AGA TGT AGG TAG

Synthesis of biotinylated PCSK9-A5 and variants for surface plasmon resonance assay

Name           Sequence
pp1A           /5Phos/GAG TCC AGA TGT AGG TAG
BtBt-pp2Z      /52-Bio//iSp18/CGA ATC AGA TTG GAC CAG
PCSK9A5-Templ  CTA CCT ACA TCT GGA CTC CAG AAG GTG CCG GGG GCA AAA ACG GTA GAA GAA TCA GTA ACG TGG CTG GTC CAA TCT GAT TCG
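Because each polymer is synthesized on a complementary template, every template in these tables should be the reverse complement of the corresponding product sequence. The following minimal sketch checks this for PCSK9-A5, using the PCSK9A5-Templ and PCSK9A5-DNA sequences listed in this appendix; the helper name is illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence, ignoring spaces."""
    return sequence.replace(" ", "").upper().translate(COMPLEMENT)[::-1]


PCSK9A5_TEMPL = (
    "CTA CCT ACA TCT GGA CTC CAG AAG GTG CCG GGG GCA AAA ACG GTA GAA GAA "
    "TCA GTA ACG TGG CTG GTC CAA TCT GAT TCG"
)
PCSK9A5_DNA = (
    "CGA ATC AGA TTG GAC CAG CCA CGT TAC TGA TTC TTC TAC CGT TTT TGC CCC "
    "CGG CAC CTT CTG GAG TCC AGA TGT AGG TAG"
)

TEMPLATE_MATCHES = reverse_complement(PCSK9A5_TEMPL) == PCSK9A5_DNA.replace(" ", "")
print("template encodes PCSK9A5-DNA:", TEMPLATE_MATCHES)
```

The same check can be applied to any template and product pair in these tables to catch transcription errors before ordering oligonucleotides.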


Synthesis of biotinylated PCSK9-Evo5 and variants for surface plasmon resonance assay

Name                  Sequence
pp1A-3primeBtBt       /5Phos/GAG TCC AGA TGT AGG TAG/iSp18/iBiodT/3Bio/
pp1A-3primeBtBt-14nt  /5Phos/GAG TCC AGA TGT AG/iSp18/iBiodT/3Bio/
Evo5-pp2-dU           CCA CGT TAC TGA TTC UGC
PCSK9Evo5-Templ       CTA CCT ACA TCT GGA CTC CAG AAG GTG GCA GGG TAA ACA ACG GTA GCA GAA TCA GTA ACG TGG

Synthesis of PCSK9-Evo5-Fluor and negative control for EMSA assay

Name                Sequence
pp1A-Alexa647       /5Phos/GAG TCC AGA TGT AGG TAG /iSp18//3AlexF647N/
Evo5_DNA-LeftHalf   TGCTACCGTTGTTTACCCTGCCACCTTCTG
BtBt_Evo5-Template  /52-Bio//iSp18/CTA CCT ACA TCT GGA CTC CAG AAG GTG GCA GGG TAA ACA ACG GTA GCA GAA TCA GTA ACG TGG CTG

Negative control for PCSK9-Evo5-syn in SPR assay

Name           Sequence
Evo5DNA-InvdT  TGC TAC CGT TGT TTA CCC TGC CAC CTT CTG GAG TCC AGA TGT AGG TAG /3InvdT/

IL-6 binder selection

Name                Sequence
Naïve library CW15  CTC GGA TGA ACC TGG ACT YNN YNN YNN YNN YNN YNN YNN YNN YNN YNN YNN YNN YNN YNN YNN GGA CTG AGT CCA GAG TAA
BtBt-ExtC           /52-Bio//iSp18/TTA CTC TGG ACT CAG TCC
ExtC                TTA CTC TGG ACT CAG TCC
pp1C                /5Phos/GGA CTG AGT CCA GAG TAA
pp2W                CTC GGA TGA ACC TGG ACT
MiSeqC              ACA CTC TTT CCC TAC ACG ACG CTC TTC CGA TCT NNNNNN TTA CTC TGG ACT CAG TCC
MiSeqW              TGG AGT TCA GAC GTG TGC TCT TCC GAT CT NNNN CTC GGA TGA ACC TGG ACT
IlluminaAdapterFwd  AATGATACGGCGACCACCGAGATCTACAC[8-base barcode]ACACTCTTTCCCTACACGAC
IlluminaAdapterRev  CAAGCAGAAGACGGCATACGAGAT[8-base barcode]GTGACTGGAGTTCAGACGTGTGCT

Synthesis of putative IL-6-binding HFNAPs for bead retention assay

Name            Sequence
pp1C            /5Phos/GGA CTG AGT CCA GAG TAA
pp2W            CTC GGA TGA ACC TGG ACT
IL6-A1-BtTempl  /52-Bio//iSp18/TTA CTC TGG ACT CAG TCC TGG TGG CCA CTG GCA GCA CCG TCA CGG AGG CTG AGA CTG CCG CAA AGT CCA GGT TCA TCC GAG
IL6-A2-BtTempl  /52-Bio//iSp18/TTA CTC TGG ACT CAG TCC ACA GGA GGG ATA CCG GAA GGG CAA AGG CCG TAG CCA GAA CAA AAA AGT CCA GGT TCA TCC GAG
IL6-A3-BtTempl  /52-Bio//iSp18/TTA CTC TGG ACT CAG TCC ACA ACG CAG CCG GAA GCA CCA CAG GAG AGG CGG TAA CCA GCA ACA AGT CCA GGT TCA TCC GAG
IL6-A4-BtTempl  /52-Bio//iSp18/TTA CTC TGG ACT CAG TCC TCG GAA GCG TCA CGA CGG TAA CCG GCA CTG AGA GCA ACA CCA CAA AGT CCA GGT TCA TCC GAG
IL6-A5-BtTempl  /52-Bio//iSp18/TTA CTC TGG ACT CAG TCC TGG CTA AGG CGA CCA CGG GCA CTG CAA CCA CAG CCA AAG GGG TGA AGT CCA GGT TCA TCC GAG
IL6-A6-BtTempl  /52-Bio//iSp18/TTA CTC TGG ACT CAG TCC CAA TGA GGG AGA GAG GGG GAA GGG CGA CAA AGG CCG TAA CCA GGA AGT CCA GGT TCA TCC GAG
IL6-A7-BtTempl  /52-Bio//iSp18/TTA CTC TGG ACT CAG TCC CCG GAG GAA TGA GTA CGA GGA AGG GCA ACG AAA TAA ACA GCA GCA AGT CCA GGT TCA TCC GAG

Synthesis of biotinylated IL6-A7 and variants for surface plasmon resonance assay

Name          Sequence
pp1C-3ddC     /5Phos/GGACTGAGTCCAGAGTAA/3ddC/
BtBt-pp2W     /52-Bio//iSp18/CTCGGATGAACCTGGACT
IL6-A7-Templ  TTA CTC TGG ACT CAG TCC CCG GAG GAA TGA GTA CGA GGA AGG GCA ACG AAA TAA ACA GCA GCA AGT CCA GGT TCA TCC GAG


A.2 – Synthesis and characterization of Phosphoramidite Intermediates

Compounds were prepared and characterized by Wuxi AppTec Co. under the direction of Xun Hong. The protocols and characterization data furnished along with these compounds are reproduced here. NMR spectra were recorded on a Bruker Avance 400 MHz spectrometer for 1H NMR. Chemical shifts are reported in ppm (δ). Chromatographic purifications were performed by flash chromatography using 100–200 mesh silica gel. Anhydrous solvents were pre-treated with a 3 Å molecular sieve (MS) column before use. All commercially available reagents were used as received unless otherwise stated.

General synthetic routes:


Synthesis of phosphoramidites 5a-d

To a solution of 1 (40.0 g, 113.2 mmol, 1 equiv) and DMAP (0.113 g, 1.13 mmol, 0.01 equiv) in pyridine (400 mL) was added DMTrCl (40.2 g, 119 mmol, 1.05 equiv) dropwise at 0 °C. The mixture was stirred at 25 °C for 16 h. TLC (DCM/MeOH = 20/1) indicated that 1 was consumed completely. The reaction mixture was concentrated with MeOH (50 mL) under reduced pressure to remove pyridine. The residue was purified by column chromatography (SiO2, DCM/MeOH = 50/1 to 20/1) to give 2 (62 g, 84% yield) as a white foam. 1H NMR (400 MHz, DMSO-d6) δ 8.65 (d, J=4.02 Hz, 1 H), 7.99 (s, 1 H), 7.53 (dd, J=7.28, 5.77 Hz, 1 H), 7.36 - 7.42 (m, 2 H), 7.20 - 7.35 (m, 6 H), 6.90 (d, J=9.03 Hz, 4 H), 6.09 (t, J=6.78 Hz, 1 H), 4.15 - 4.24 (m, 1 H), 3.91 (d, J=3.51 Hz, 1 H), 3.74 (s, 6 H), 3.18 (d, J=3.01 Hz, 2 H), 2.22 (ddd, J=13.30, 5.77, 3.01 Hz, 1 H), 2.06 - 2.16 (m, 1 H).


To a solution of 2 (6 g, 9.15 mmol, 1 equiv), Cs2CO3 (8.95 g, 27.5 mmol, 3 equiv), (E)-

4,4,5,5-tetramethyl-2-(5-methylhex-1-en-1-yl)-1,3,2-dioxaborolane (2.46 g, 11 mmol, 1.2 equiv) and PPh3 (1.2 g, 4.58 mmol, 0.5 equiv) in dioxane (700 mL) and water (30 mL) was added

Pd(OAc)2 (2.35 g, 10.5 mmol, 0.1 equiv) at 25 °C under N2 current. The mixture was heated to

90 °C and stirred for 16 h. TLC (ethyl acetate/MeOH=20/1) showed 2 was consumed completely.

The reaction mixture was diluted with water (50 mL) and extracted with ethyl acetate (100 mL x 2).

The combined organic layers were washed with sat. aqueous NaCl (50 mL), dried over MgSO4, filtered and concentrated under reduced pressure to give a residue. The residue was purified by column chromatography (SiO2, DCM/MeOH=50/1 to 20:1) to give compound 3a (5.1 g, 8.15

mmol, 89% yield) as a light-yellow solid. 1H NMR (400 MHz, CDCl3) δ 7.86 (s, 1

H), 7.42 (d, J=7.53 Hz, 2 H), 7.18 - 7.35 (m, 7 H), 6.81 (d, J=7.53 Hz, 4 H), 6.46 (t, J=6.53 Hz, 1

H), 5.53 - 5.70 (m, 2 H), 4.47 - 4.57 (m, 1 H), 4.13 (d, J=3.01 Hz, 1 H), 3.79 (s, 5 H), 3.47 (dd,

J=10.54, 3.01 Hz, 1 H), 3.28 (dd, J=10.54, 3.01 Hz, 1 H), 2.70 (ddd, J=13.55, 5.52, 3.01 Hz, 1 H),

2.24 (dt, J=13.55, 6.78 Hz, 1 H), 1.56 - 1.83 (m, 4 H), 1.13 - 1.40 (m, 3 H), 0.91 (dtd, J=9.47, 6.43,

6.43, 3.26 Hz, 2 H), 0.75 (dd, J=6.78, 2.76 Hz, 6 H).

To a solution of 3a (4.23 g, 6.76 mmol, 1.00 equiv) in DMF (40.00 mL) was added Et3N

(1.03 g, 10.14 mmol, 1.41 mL, 1.50 equiv) and benzoic anhydride (1.84 g, 8.11 mmol, 1.53 mL,

1.20 equiv). The mixture was stirred at 0 to 25 °C for 16 h. TLC (petroleum ether/ethyl acetate

=1/1) indicated 3a was consumed completely and the reaction was clean. The reaction mixture was quenched by addition of water (20 mL) at 0-5 °C, and then extracted with ethyl acetate (50 mL). The combined organic layers were washed with sat. aqueous NaCl (20 mL), dried over MgSO4, filtered and concentrated under reduced pressure to give a residue. The residue was purified by column

chromatography (basic SiO2, petroleum ether/ethyl acetate=5/1 to 2/1) to give 4a as a yellow solid. 1H NMR

(400 MHz, CDCl3) δ 13.52 (br. s., 1 H), 8.30 (d, J=7.03 Hz, 2 H), 7.96 (s, 1 H), 7.50 - 7.57 (m, 1

H), 7.40 - 7.50 (m, 4 H), 7.21 - 7.37 (m, 7 H), 6.84 (dd, J=8.53, 1.51 Hz, 4 H), 6.40 (t, J=6.78 Hz,

1 H), 6.21 (d, J=16.06 Hz, 1 H), 5.92 - 6.03 (m, 1 H), 4.55 (d, J=3.01 Hz, 1 H), 4.10 (d, J=3.01

Hz, 1 H), 3.79 (s, 6 H), 3.56 (dd, J=10.54, 3.01 Hz, 1 H), 3.31 (dd, J=10.54, 3.01 Hz, 1 H), 2.52

(ddd, J=13.55, 5.77, 2.76 Hz, 1 H), 2.35 (dt, J=13.68, 6.96 Hz, 1 H), 2.09 (d, J=3.51 Hz, 1 H),

1.72 - 1.91 (m, 2 H), 1.41 (dt, J=13.43, 6.59 Hz, 1 H), 0.83 - 1.00 (m, 2 H), 0.77 (dd, J=6.53, 1.51

Hz, 6 H).

To a solution of 4a (2.87 g, 3.93 mmol, 1 equiv) and 4,5-dicyanoimidazole (0.696 g, 5.90 mmol, 1.5 equiv) in DCM (30 mL) was added 3-bis(diisopropylamino)phosphanyloxypropanenitrile (1.42 g, 4.72 mmol, 1.2 equiv) dropwise at 0 °C under N2 current. The mixture was then stirred at 0-25 °C for 2 h under N2 current. A clear yellow solution was obtained. TLC

(petroleum ether/ethyl acetate =2/1) showed 4a was consumed completely. The reaction mixture was concentrated under reduced pressure to give a residue. The resulting residue was purified by column chromatography (basic SiO2, petroleum ether/ethyl acetate =10/1 to 5/1) to give phosphoramidite 5a (1.75 g, 1.88 mmol, 48% yield) as a light-yellow foam. 1H NMR (400 MHz,

CDCl3) δ 13.51 (br. s., 1 H) 8.30 (d, J=7.53 Hz, 2 H) 7.99 (d, J=18.57 Hz, 1 H) 7.50 - 7.58 (m, 1

H) 7.41 - 7.50 (m, 4 H) 7.21 - 7.37 (m, 8 H) 6.84 (ddd, J=6.65, 4.64, 2.26 Hz, 4 H) 6.36 - 6.48 (m,

1 H) 6.18 (d, J=16.06 Hz, 1 H) 5.88 - 6.02 (m, 1 H) 4.62 (td, J=6.65, 3.26 Hz, 1 H) 4.15 - 4.26 (m,

1 H) 3.79 (d, J=2.51 Hz, 7 H) 3.50 - 3.66 (m, 4 H) 3.25 (dt, J=10.79, 3.39 Hz, 1 H) 2.54 - 2.70 (m,

2 H) 2.27 - 2.44 (m, 2 H) 1.66 - 1.86 (m, 2 H) 1.38 (dd, J=13.30, 6.78 Hz, 1 H) 1.12 - 1.22 (m, 9

H) 1.04 (d, J=7.03 Hz, 3 H) 0.79 - 0.93 (m, 3 H) 0.75 (t, J=5.77 Hz, 6 H). 31P NMR (162 MHz,

CDCl3) δ 148.40 - 149.28 (m, 1 P). TLC petroleum ether/ethyl acetate =2/1 (Rf = 0.43).

To a solution of 2 (8.00 g, 12.2 mmol, 1 equiv), Cs2CO3 (11.9 g, 36.6 mmol, 3 equiv), (E)-

2-(2-cyclopropylvinyl)-4,4,5,5-tetramethyl-1,3,2-dioxaborolane (2.84 g, 14.7 mmol, 1.2 equiv) and PPh3 (1.60 g, 6.10 mmol, 0.5 equiv) in dioxane (60 mL) and water (30 mL) was added

Pd(OAc)2 (0.274 g, 1.22 mmol, 0.1 equiv) at 25 °C under N2 current. The mixture was heated to

90 °C and stirred for 16 h. TLC (ethyl acetate/MeOH = 20/1) showed compound 2 was consumed completely. The reaction mixture was diluted with water (50 mL) and extracted with ethyl acetate

(100 mL x 2). The combined organic layers were washed with sat. aqueous NaCl (50 mL), dried over MgSO4, filtered and concentrated under reduced pressure to give a residue. The residue was purified by column chromatography (SiO2, DCM/MeOH=50/1 to 20:1) to give 3b (7.00 g, 11.8 mmol, 96% yield) as a light-yellow solid.

To a solution of 3b (3.4 g, 5.71 mmol, 1.00 equiv) in DMF (30.00 mL) was added Et3N

(0.693 g, 6.85 mmol, 1.50 equiv) and benzoic anhydride (1.42 g, 6.28 mmol, 1.20 equiv). The mixture was stirred at 0-25 °C for 16 h. TLC (DCM/MeOH=20/1) indicated 3b was consumed completely and the reaction was clean. The reaction mixture was quenched by addition of sat. aqueous NaCl (20 mL) at 25 °C, and then extracted with ethyl acetate (50 mL). The combined organic layers were washed with sat. aqueous NaCl (20 mL), dried over MgSO4, filtered and concentrated under reduced pressure to give a residue. The residue was purified by column chromatography (basic SiO2, petroleum ether/ethyl acetate = 3/1 to 2:1) to give 4b (2.6 g,

66% yield) as a yellow solid. 1H NMR (400 MHz, CDCl3) δ 13.53 (br. s., 1 H), 8.29 (d, J=7.53 Hz, 2

H), 7.92 (s, 1 H) 7.50 - 7.57 (m, 1 H), 7.41 - 7.49 (m, 4 H), 7.21 - 7.37 (m, 8 H), 6.85 (d, J=7.53

Hz, 4 H), 6.39 (t, J=6.53 Hz, 1 H), 6.30 (d, J=16.06 Hz, 1 H), 5.60 (dd, J=15.56, 9.03 Hz, 1 H),

4.48 - 4.56 (m, 1 H), 4.08 (d, J=3.01 Hz, 1 H), 3.80 (s, 6 H), 3.56 (dd, J=10.54, 3.01 Hz, 1 H), 3.28

(dd, J=10.79, 3.26 Hz, 1 H), 2.51 (ddd, J=13.55, 6.02, 3.01 Hz, 1 H), 2.32 (dt, J=13.93, 6.84 Hz,

1 H), 2.19 (d, J=4.02 Hz, 1 H), 1.28 (td, J=8.41, 4.77 Hz, 2 H), 0.43 - 0.61 (m, 2 H), -0.22 - 0.01

(m, 2 H).

To a solution of 4b (2.25 g, 3.22 mmol, 1 equiv) and 4,5-dicyanoimidazole (0.570 g, 4.83 mmol, 1.5 equiv) in DCM (20 mL) was added 3-bis(diisopropylamino)phosphanyloxypropanenitrile (1.16 g, 3.86 mmol, 1.2 equiv) dropwise at 0 °C under N2 current. The mixture was then stirred at 0-25 °C for 2 h under N2 current. A clear yellow solution was obtained. TLC

(DCM/MeOH=20/1) showed 4b was consumed completely. The reaction mixture was concentrated under reduced pressure to give a residue. The resulting residue was purified by column chromatography (basic SiO2, petroleum ether/ethyl acetate =6/1 to 5/1) to give phosphoramidite 5b (2.20 g, 2.44 mmol, 75.9% yield) as a light-yellow foam. 1H NMR (400 MHz,

CDCl3) δ 13.53 (br. s., 1 H) 8.29 (d, J=7.53 Hz, 2 H) 7.95 (d, J=18.57 Hz, 1 H) 7.50 - 7.57 (m, 1

H) 7.41 - 7.49 (m, 4 H) 7.21 - 7.38 (m, 7 H) 6.85 (dd, J=7.53, 4.52 Hz, 4 H) 6.36 - 6.46 (m, 1 H)

6.28 (dd, J=15.81, 3.76 Hz, 1 H) 5.57 (dd, J=15.81, 9.29 Hz, 1 H) 4.59 (td, J=6.53, 3.01 Hz, 1 H)

4.19 (dd, J=15.56, 2.01 Hz, 1 H) 3.80 (d, J=2.51 Hz, 6 H) 3.48 - 3.66 (m, 4 H) 3.22 (dt, J=10.54,

3.01 Hz, 1 H) 2.52 - 2.70 (m, 2 H) 2.27 - 2.43 (m, 2 H) 1.21 - 1.31 (m, 2 H) 1.17 (dd, J=6.78, 2.76

Hz, 10 H) 1.04 (d, J=6.53 Hz, 3 H) 0.78 - 0.92 (m, 1 H) 0.39 - 0.56 (m, 2 H) –0.29 - –0.08 (m, 2

H). 31P NMR (162 MHz, CDCl3) δ 148.36 - 149.25 (m, 1 P). TLC 20:1 DCM:methanol (Rf =

0.85).

To a solution of 2 (6.00 g, 9.15 mmol, 1 equiv), Cs2CO3 (8.95 g, 27.5 mmol, 3 equiv), (E)-

2-(2-cyclopropylvinyl)-4,4,5,5-tetramethyl-1,3,2-dioxaborolane (2.59 g, 11.0 mmol, 1.2 equiv) and PPh3 (1.20 g, 4.58 mmol, 0.5 equiv) in dioxane (40 mL) and water (20 mL) was added

Pd(OAc)2 (0.274 g, 1.22 mmol, 0.1 equiv) at 25 °C under N2 current. The mixture was heated to

90 °C and stirred for 16 h. TLC (DCM/ MeOH=20/1) showed 2 was consumed completely. The reaction mixture was diluted with water (50 mL) and extracted with ethyl acetate (100 mL x 2).

The combined organic layers were washed with sat. aqueous NaCl (50 mL), dried over MgSO4, filtered and concentrated under reduced pressure to give a residue. The residue was purified by column chromatography (SiO2, DCM/MeOH=50/1 to 20:1) to give 3c (4.90 g, 7.68 mmol, 83%

yield) as a light-yellow solid. 1H NMR (400 MHz, CDCl3) δ 7.82 (s, 1 H), 7.43 (d,

J=7.53 Hz, 2 H), 7.19 - 7.36 (m, 8 H), 6.82 (d, J=8.53 Hz, 4 H), 6.52 (t, J=6.53 Hz, 1 H), 5.55 -

5.69 (m, 2 H), 4.46 - 4.55 (m, 1 H), 4.16 (d, J=2.51 Hz, 2 H), 3.72 - 3.84 (m, 6 H), 3.45 (dd,

J=10.04, 3.01 Hz, 1 H), 3.29 (dd, J=10.29, 3.26 Hz, 1 H), 3.09 (q, J=7.19 Hz, 1 H), 2.74 (dt,

J=10.79, 2.89 Hz, 1 H), 2.21 (dt, J=13.80, 6.65 Hz, 1 H), 1.72 - 1.86 (m, 2 H), 1.35 - 1.59 (m, 9

H), 0.92 (br. s., 2 H).

To a solution of compound 3c (4.8 g, 7.53 mmol, 1.00 equiv) in DMF (50.0 mL) was added Et3N (1.14 g, 11.3 mmol, 1.50 equiv) and benzoic anhydride (2.04 g, 9.04 mmol, 1.20 equiv).

The mixture was stirred at 0-25 °C for 16 h. TLC (DCM/MeOH = 20/1) indicated 3c was consumed completely and the reaction was clean. The reaction mixture was quenched by addition of water (20 mL) at 25 °C, and then extracted with ethyl acetate (50 mL). The combined organic layers were washed with sat. aqueous NaCl (20 mL), dried over MgSO4, filtered and concentrated under reduced pressure to give a residue. The residue was purified by column chromatography (basic SiO2, petroleum ether/ethyl acetate = 5/1 to 2:1) to give compound 4c (3.60

g, 64% yield) as a yellow solid. 1H NMR (400 MHz, CDCl3) δ 13.52 (br. s., 1 H), 8.30 (d, J=7.53

Hz, 2 H), 7.90 (s, 1 H), 7.50 - 7.57 (m, 1 H), 7.40 - 7.48 (m, 4 H), 7.20 - 7.37 (m, 8 H), 6.84 (d,

J=8.03 Hz, 4 H), 6.39 (t, J=6.53 Hz, 1 H), 6.13 (s, 2 H), 4.51 - 4.58 (m, 1 H), 4.10 (d, J=3.01 Hz,

1 H), 3.79 (s, 6 H), 3.53 (dd, J=10.54, 3.51 Hz, 1 H), 3.33 (dd, J=10.54, 3.51 Hz, 1 H), 2.52 (ddd,

J=13.68, 5.90, 3.01 Hz, 1 H), 2.33 (dt, J=13.55, 6.78 Hz, 1 H), 2.18 (d, J=3.51 Hz, 1 H), 1.79 -

1.94 (m, 2 H), 1.37 - 1.61 (m, 7 H), 1.00 (br. s., 2 H). TLC DCM/MeOH = 20/1 (Rf = 0.43).

To a solution of 4c (2.40 g, 3.24 mmol, 1 equiv) and 4,5-dicyanoimidazole (0.574 g, 4.86 mmol, 1.5 equiv) in DCM (20 mL) was added 3-bis(diisopropylamino)phosphanyloxypropanenitrile (1.17 g, 3.89 mmol, 1.2 equiv) dropwise at 0 °C under N2 current. The mixture was then stirred at 0-25 °C for 2 h under N2 current. A clear yellow solution was obtained. TLC

(petroleum ether/ethyl acetate = 2/1) showed 4c was consumed completely. The reaction mixture was concentrated under reduced pressure to give a residue, which was purified by column chromatography (basic SiO2, petroleum ether/ethyl acetate = 10/1 to 6/1) to give phosphoramidite

5c (1.70 g, 2.44 mmol, 56% yield) as a white foam. 1H NMR (400 MHz, CDCl3) δ 13.51 (br. s.,

1 H), 8.30 (d, J=7.03 Hz, 2 H), 7.93 (d, J=18.57 Hz, 1 H), 7.50 - 7.57 (m, 1 H), 7.41 - 7.49 (m, 4

H), 7.22 - 7.37 (m, 8 H), 6.84 (dd, J=7.53, 5.02 Hz, 4 H), 6.36 - 6.46 (m, 1 H), 6.03 - 6.18 (m, 2

H), 4.62 (td, J=6.78, 3.01 Hz, 1 H), 4.16 - 4.26 (m, 1 H), 3.79 (d, J=2.51 Hz, 6 H), 3.49 - 3.66 (m,

4 H), 3.27 (dt, J=10.54, 3.76 Hz, 1 H), 2.53 - 2.70 (m, 2 H), 2.27 - 2.44 (m, 2 H), 1.72 - 1.91 (m,

2 H), 1.35 - 1.56 (m, 6 H), 1.14 - 1.23 (m, 9 H), 1.05 (d, J=6.53 Hz, 2 H), 0.97 (d, J=3.01 Hz, 3

H). 31P NMR (162 MHz, CDCl3) δ 148.49 - 149.26 (m, 1 P).

To a solution of 2 (6.00 g, 9.15 mmol, 1 equiv), Cs2CO3 (8.95 g, 27.5 mmol, 3 equiv),

(E)-2-(4-fluorostyryl)-4,4,5,5-tetramethyl-1,3,2-dioxaborolane (2.72 g, 11.0 mmol, 1.2 equiv) and

PPh3 (1.20 g, 4.58 mmol, 0.5 equiv) in dioxane (60 mL) and water (30 mL) was added Pd(OAc)2

(0.206 g, 0.915 mmol, 0.1 equiv) at 25 °C under N2 current. The mixture was heated to 90 °C and stirred for 16 h. TLC (DCM/ MeOH=20/1) showed 2 was consumed completely. The reaction mixture was diluted with water (50 mL) and extracted with ethyl acetate (100 mL x 2). The combined organic layers were washed with sat. aqueous NaCl (50 mL), dried over MgSO4, filtered and concentrated under reduced pressure to give a residue. The residue was purified by column chromatography (SiO2, DCM/MeOH = 50/1 to 20:1) to give 3d (3.10 g, 4.77 mmol, 52% yield) as a light-yellow solid. TLC DCM/MeOH = 20/1 (Rf = 0.20).

To a solution of 3d (2.7 g, 4.16 mmol, 1.00 equiv) in DMF (30.00 mL) was added Et3N

(0.631 g, 6.24 mmol, 1.50 equiv) and benzoic anhydride (1.13 g, 4.99 mmol, 1.20 equiv). The

mixture was stirred at 0-25 °C for 16 h. TLC (petroleum ether/ethyl acetate = 1/1) indicated 3d was consumed completely and the reaction was clean. The reaction mixture was quenched by addition of water (100 mL) at 0-5 °C, and then extracted with ethyl acetate (50 mL). The combined organic layers were washed with sat. aqueous NaCl (20 mL), dried over MgSO4, filtered and concentrated under reduced pressure to give a residue. The residue was purified by column chromatography (basic SiO2, petroleum ether/ethyl acetate = 2/1 to 1:1) to give 4d (2.00 g,

64% yield) as a yellow solid. 1H NMR (400 MHz, CDCl3) δ 13.59 (br. s., 1 H), 8.32 (d, J=7.53 Hz, 2

H), 8.22 (s, 1 H), 7.52 - 7.59 (m, 1 H), 7.42 - 7.51 (m, 4 H), 7.33 (dd, J=8.78, 1.76 Hz, 4 H), 7.22

- 7.29 (m, 3 H), 7.14 - 7.21 (m, 1 H), 7.05 (d, J=16.56 Hz, 1 H), 6.71 - 6.89 (m, 9 H), 6.43 (t,

J=6.53 Hz, 1 H), 4.58 (br. s., 1 H), 4.15 (d, J=2.51 Hz, 1 H), 3.59 - 3.75 (m, 7 H), 3.29 (dd, J=11.04,

3.01 Hz, 1 H), 2.59 (ddd, J=13.55, 6.02, 3.01 Hz, 1 H), 2.41 (dt, J=13.55, 6.78 Hz, 1 H), 2.14 (br.

s., 1 H); 19F NMR (376 MHz, CDCl3) δ -114.32 (s, 1 F).

To a solution of 4d (2.30 g, 3.05 mmol, 1 equiv) and 4,5-dicyanoimidazole (0.570 g, 4.27 mmol, 1.5 equiv) in DCM (20 mL) was added 3-bis(diisopropylamino)phosphanyloxypropanenitrile (1.10 g, 3.66 mmol, 1.2 equiv) dropwise at 0 °C under N2 current. The mixture was then stirred at 0-25 °C for 3 h under N2 current. A clear yellow solution was obtained. TLC

(petroleum ether/ethyl acetate = 2/1) showed 4d was consumed completely. The reaction mixture

was concentrated under reduced pressure to give a residue, which was purified by column chromatography (basic SiO2, petroleum ether/ethyl acetate =6/1 to 5/1) to give phosphoramidite

5d (2.10 g, 2.20 mmol, 72% yield) as a light-yellow foam. 1H NMR (400 MHz, CDCl3) δ 13.59

(s, 1 H), 8.33 (d, J=7.53 Hz, 2 H), 8.25 (d, J=19.07 Hz, 1 H), 7.52 - 7.59 (m, 1 H), 7.41 - 7.51 (m,

4 H), 7.22 - 7.38 (m, 7 H), 7.18 (dd, J=7.03, 4.02 Hz, 1 H), 7.03 (d, J=16.06 Hz, 1 H), 6.66 - 6.85

(m, 9 H), 6.45 (q, J=6.53 Hz, 1 H), 4.65 (td, J=6.53, 3.01 Hz, 1 H), 4.19 - 4.30 (m, 1 H), 3.69 (s,

6 H), 3.49 - 3.65 (m, 3 H), 3.19 - 3.28 (m, 1 H), 2.58 - 2.76 (m, 2 H), 2.36 - 2.46 (m, 2 H), 1.51 (d,

J=6.53 Hz, 1 H), 1.17 (d, J=7.03 Hz, 8 H), 1.04 (d, J=6.53 Hz, 3 H). 31P NMR (162 MHz, CDCl3)

δ 148.45 - 149.26 (m, 1 P). 19F NMR (376 MHz, CDCl3) δ -114.45 (s, 1 F). TLC 2:1 petroleum ether:ethyl acetate (Rf = 0.43).

Synthesis of phosphoramidite 11

To a solution of 6 (50.0 g, 141.2 mmol, 1 equiv) and DMAP (0.172 g, 1.41 mmol, 0.01 equiv) in pyridine (500 mL) was added DMTrCl (50.2 g, 148.2 mmol, 1.05 equiv) dropwise at

0 °C. The mixture was stirred at 25 °C for 16 h. TLC (petroleum ether/ethyl acetate = 1/1) indicated compound 6 was consumed completely. The reaction mixture was concentrated with MeOH (5 mL) under reduced pressure to remove pyridine. The residue was purified by column

chromatography (SiO2, petroleum ether/ethyl acetate = 2/1 to 1/2) to give 7 (85.0 g, 92% yield) as

a white foam. 1H NMR (400 MHz, CDCl3) δ 8.16 (s, 1 H), 7.44 (d, J=7.53 Hz, 2 H), 7.22 - 7.38

(m, 7 H), 6.87 (d, J=8.53 Hz, 4 H), 6.35 (dd, J=7.78, 5.77 Hz, 1 H), 4.53 - 4.62 (m, 1 H), 4.11 -

4.18 (m, 2 H), 3.81 (s, 6 H), 3.35 - 3.48 (m, 2 H), 2.53 (ddd, J=13.55, 5.52, 2.51 Hz, 1 H), 2.26 -

2.37 (m, 1 H). TLC petroleum ether/ethyl acetate = 1/1 (Rf = 0.15).

To a solution of 7 (68.7 g, 105 mmol, 1 equiv), methyl acrylate (54 g, 627 mmol, 2 equiv),

PPh3 (5.5 g, 20.9 mmol, 0.2 equiv), and triethylamine (21.2 g, 209 mmol, 2 equiv) in dioxane

(700 mL) was added Pd(OAc)2 (2.35 g, 10.5 mmol, 0.1 equiv) at 25 °C under N2 current. The mixture was heated to 90 °C and stirred for 16 h. TLC (petroleum ether/ethyl acetate =1/1) showed

7 was consumed completely. The reaction mixture was filtered under reduced pressure to give a residue. The residue was purified by column chromatography (SiO2, petroleum ether/ethyl acetate

=1/1 to 1.5:1) to give compound 8 (42 g, 68.3 mmol, 65.3% yield) as a light-yellow solid.

To a solution of 8 (42 g, 68.3 mmol, 1 equiv) in THF (500 mL) was added aqueous NaOH

(1 N, 102.5 mL, 1.5 equiv) at 25 °C, and the resulting mixture was stirred at 25 °C for 16 h.

TLC (DCM/MeOH=10/1) indicated 8 was consumed completely. The reaction mixture was partitioned between ethyl acetate (200 mL) and water (100 mL). The aqueous phase was separated and acidified with sat. aqueous citric acid to pH 7, and the resulting white suspension was filtered and dried under reduced pressure to give compound 9 (30.5 g, 50.8 mmol, 74% yield) as a white solid. 1H

NMR (400 MHz, CDCl3) δ 7.67 (s, 1 H), 7.40 (d, J=7.53 Hz, 2 H), 7.24 - 7.33 (m, 8 H), 7.15 -

7.22 (m, 1 H), 6.91 (d, J=15.56 Hz, 1 H), 6.82 (d, J=9.04 Hz, 4 H), 6.29 (t, J=6.27 Hz, 1 H), 4.47

(d, J=6.02 Hz, 1 H), 3.99 (d, J=5.02 Hz, 1 H), 3.75 (s, 5 H), 3.45 (dd, J=10.29, 5.27 Hz, 1 H), 3.34

(dd, J=10.04, 4.52 Hz, 1 H), 2.41 - 2.51 (m, 1 H), 2.26 (dt, J=13.80, 6.65 Hz, 1 H). TLC DCM

/MeOH = 10/1 (Rf = 0.15).

A solution of 9 (5 g, 8.32 mmol, 1 equiv), Et3N (4.21 g, 41.6 mmol, 5 equiv) and HATU

(4.75 g, 12.5 mmol, 1.5 equiv) in DMF (60 mL) was stirred for 30 min at 25 °C. Then to this mixture was added 2-aminoethyl acetate hydrochloride (1.39 g, 9.99 mmol, 1.2 equiv) at 25 °C.

The mixture was stirred at 25 °C for 16 h. TLC (DCM/MeOH=20/1) showed the acid was consumed completely. The reaction mixture was quenched by addition of sat. aqueous NaHCO3

(50 mL) at 25 °C, and then extracted with ethyl acetate (100 mL x 2). The combined organic layers

were concentrated under reduced pressure to give a residue. The residue was purified by column chromatography (basic SiO2, DCM/MeOH = 50/1 to 30/1) to give compound 10 (2.7 g, yield 47%) as a white foam. TLC DCM/MeOH = 20/1 (Rf = 0.30).

To a solution of 10 (2.10 g, 3.06 mmol, 1 equiv) and 4,5-dicyanoimidazole (0.543 g, 4.59 mmol, 1.5 equiv) in DCM (30 mL) was added 3-bis(diisopropylamino)phosphanyloxypropanenitrile (1.11 g, 3.67 mmol, 1.2 equiv) dropwise at 0 °C under

N2 current. Then the mixture was stirred at 0-25 °C for 2 h under N2 current. A clear yellow solution was obtained. TLC (DCM/MeOH=20/1) showed 10 was consumed completely. The reaction mixture was concentrated under reduced pressure to give a residue, which was purified by column chromatography (basic SiO2, DCM/Acetone= 15/1 to 8/1) to give phosphoramidite 11

(1.7 g, 1.44 mmol, 75% yield) as a white foam. 1H NMR (400 MHz, CDCl3) δ 7.79 - 7.90 (m, 1

H), 7.43 (d, J=7.53 Hz, 2 H), 7.20 - 7.36 (m, 7 H), 7.04 (d, J=15.56 Hz, 1 H), 6.82 - 6.92 (m, 4 H),

6.71 - 6.80 (m, 1 H), 6.28 (t, J=6.53 Hz, 1 H), 5.41 - 5.55 (m, 1 H), 4.57 (dt, J=6.53, 3.26 Hz, 1

H), 4.18 - 4.29 (m, 1 H), 4.07 (q, J=5.02 Hz, 2 H), 3.79 (s, 6 H), 3.54 - 3.70 (m, 3 H), 3.39 - 3.51

(m, 3 H), 3.28 - 3.37 (m, 1 H), 2.55 - 2.80 (m, 2 H), 2.45 (t, J=6.27 Hz, 1 H), 2.28 (dt, J=13.68,

6.96 Hz, 1 H), 2.06 (s, 3 H), 1.25 - 1.31 (m, 1 H), 1.18 (t, J=6.27 Hz, 9 H), 1.09 (d, J=7.03 Hz, 2

H).

Synthesis of phosphoramidite 15

Pivaloyl chloride (7.25 g, 61.1 mmol, 1 equiv) was added dropwise to a solution of 4-(2-aminoethyl)phenol (7.5 g, 54.5 mmol, 1 equiv) in DCM (50 mL) and TFA (50 mL) at 25 °C, and the resulting brown mixture was stirred for 12 h. LCMS showed that the reactant was consumed completely and that one main peak with the desired mass was present. The reaction mixture was concentrated under reduced pressure to give a residue, which was purified by column chromatography (SiO2, DCM/MeOH = 20/1 to 1:1) to give compound 13 (9.5 g, 42.9 mmol, 79%

yield) as a brown solid. 1H NMR (400 MHz, CDCl3) δ 7.21 (d, J=8.53 Hz, 2 H), 6.95 (d, J=8.53

Hz, 2 H), 3.18 (t, J=7.28 Hz, 2 H), 2.90 - 2.99 (m, 2 H), 1.34 (s, 9 H). 19F NMR (376 MHz, CDCl3) δ -75.82 (s, 1 F).

A solution of 9 (2.8 g, 4.66 mmol, 1 equiv), Et3N (0.94 g, 9.32 mmol, 2 equiv) and HATU

(3.54 g, 9.32 mmol) in DMF (50 mL) was stirred for 30 min at 25 °C. Then to this mixture was added 13 (1.13 g, 5.13 mmol, 1.1 equiv) at 25 °C. The mixture was stirred at 25 °C for 16 h. TLC

(DCM/MeOH = 20/1) showed the acid was consumed completely. The reaction mixture was concentrated under reduced pressure to remove solvents. The residue was purified by column chromatography (basic SiO2, DCM/MeOH = 20/1 to 10/1) to give compound 14 (2 g, yield 86%) as a white foam. TLC DCM/MeOH=20/1 (Rf = 0.20).

To a solution of 14 (2.90 g, 3.61 mmol, 1 equiv) and 4,5-dicyanoimidazole (0.639 g, 5.41 mmol, 1.5 equiv) in DCM (30 mL) was added 3-bis(diisopropylamino)phosphanyloxypropanenitrile (1.30 g, 4.33 mmol, 1.2 equiv) dropwise at 0 °C under

N2 current. Then the mixture was stirred at 0-25 °C for 2 h under N2 current. A clear yellow solution was obtained. TLC (DCM/MeOH=20/1) showed 14 was consumed completely. The reaction mixture was concentrated under reduced pressure to give a residue, which was purified by column chromatography (basic SiO2, DCM/acetone = 15/1 to 10/1) to give phosphoramidite 15

(1.95 g, 1.94 mmol, 54% yield) as a light-brown gum. 1H NMR (400 MHz, CDCl3) δ 7.76 - 7.91

(m, 2 H), 7.45 (d, J=6.02 Hz, 2 H), 7.27 - 7.38 (m, 7 H), 7.13 - 7.26 (m, 3 H), 6.97 - 7.09 (m, 3 H),

6.87 (dd, J=8.78, 3.76 Hz, 3 H), 6.61 (dd, J=15.56, 11.04 Hz, 1 H), 6.28 - 6.35 (m, 1 H), 6.18 (s,

1 H), 5.01 - 5.12 (m, 1 H), 4.59 (br. s., 1 H), 4.10 - 4.28 (m, 4 H), 3.80 (s, 5 H), 3.28 - 3.62 (m, 9

H), 2.78 (td, J=6.15, 1.76 Hz, 3 H), 2.60 - 2.72 (m, 10 H), 2.45 (t, J=6.27 Hz, 1 H), 2.28 (dt,

J=12.92, 6.34 Hz, 1 H), 1.37 (s, 7 H), 1.30 (t, J=6.27 Hz, 15 H), 1.16 - 1.22 (m, 7 H), 1.09 (d,

J=6.53 Hz, 3 H). 31P NMR (162 MHz, CDCl3) δ 148.76 - 149.28 (m, 1 P), 14.16 (s, 2 P).

A.3 – Synthesis and characterization of functionalized oligonucleotides.

Standard phosphoramidite reagents and 1000-Å controlled-pore glass (CPG) supports for dA, Ac-dC, dmf-dG, and dT were purchased from Glen Research, as were chemical phosphorylation reagent II (10-1901) and the phosphoramidite for NHS-carboxy-dT (10-1535).

The phosphoramidite reagent for the incorporation of the aminoallyl side-chain-functionalized nucleotide (BA 0311) was purchased from Berry and Associates, as was the perfluoroalkyl-DMT dT phosphoramidite (FL 1300). The phosphoramidite reagents for the incorporation of isopentyl

(5a), cyclopropyl (5b), cyclopentyl (5c), fluorophenyl (5d), ethanolamine (11), and tyramine (15) side-chain-functionalized nucleotides were custom synthesized by WuXi AppTec as detailed in the previous section.

Solid-phase synthesis of side-chain-functionalized DNA was performed on a PerSeptive

Biosystems Expedite 8909 DNA synthesizer. All side-chain-functionalized phosphoramidites were incubated with molecular sieves overnight before use. Syntheses were performed on 1-µmol columns using standard coupling cycles, except for 5’-phosphorylation, which required 7 minutes of coupling with chemical phosphorylation reagent II. When the histamine side-chain-functionalized nucleotide was called for, NHS-carboxy-dT was incorporated in its place. After the full-length synthesis was completed, a solution of histamine (free base; 5 mg) and diisopropylethylamine (1 µL) in 200 µL of 10% DMSO in acetonitrile was manually injected into the column and allowed to react overnight; the column was then washed with acetonitrile and dried. Similarly, when a methylamine-functionalized nucleotide was required (for probing side chain SAR; see Figure 2.10c), NHS-carboxy-dT was incorporated in its place, and methylamine

(30 equiv. with respect to solid-phase synthesis scale, from a 2 M stock in THF) and diisopropylethylamine (1 µL) in 200 µL of acetonitrile were added to the column and allowed to react

overnight before washing and drying. The functionalized DNA was then deprotected and cleaved from solid support by incubation in 30% ammonium hydroxide overnight at room temperature.
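The volume of methylamine stock implied by "30 equiv. with respect to solid-phase synthesis scale" can be worked out as follows. This is a minimal sketch that assumes the 1-µmol column scale stated above applies to this step; the function name is hypothetical.

```python
def stock_volume_uL(equiv: float, scale_umol: float, stock_molar: float) -> float:
    """Volume of stock solution (µL) that delivers `equiv` equivalents.

    µmol divided by mol/L gives µL directly, so no unit conversion is needed.
    """
    return equiv * scale_umol / stock_molar

# 30 equiv of methylamine on a 1-µmol synthesis, from a 2 M stock in THF.
methylamine_uL = stock_volume_uL(equiv=30, scale_umol=1.0, stock_molar=2.0)
print(f"{methylamine_uL:.1f} µL of 2 M methylamine stock")
```

Under these assumptions, the methylamine stock contributes a small fraction of the 200 µL acetonitrile reaction volume.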

The phosphorylated trinucleotide building blocks were synthesized DMT-off and, after deprotection, cleavage, and evaporation of ammonium hydroxide, were purified by reverse-phase

HPLC using a gradient of 0-20% acetonitrile in 0.1 M TEAA, pH 7, over 24 minutes, followed by 20-40% acetonitrile in 0.1 M TEAA, pH 7, over 10 minutes. The full-length HFNAP Evo5-syn was synthesized DMT-on and the 5’-most nucleotide was incorporated with a perfluoroalkyl-

DMT phosphoramidite (Berry and Associates, FL 1300). After deprotection/cleavage, the polymer was purified and deprotected on column with a fluorous phase purification cartridge (Fluoro-Pak

II from Berry and Associates) according to the manufacturer’s instructions (see Figure 3.5a for the synthetic scheme). The Evo5-syn used for mass spectrometry characterization and for SPR experiments was further purified by denaturing PAGE on a 10% TBE-urea gel.
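The two-segment HPLC gradient used for the trinucleotide building blocks can be written as a piecewise-linear function of time. The sketch below is illustrative only: it assumes that each segment is linear and that the second segment starts immediately after the first, neither of which is stated explicitly; the function name is hypothetical.

```python
def percent_acetonitrile(t_min: float) -> float:
    """Percent acetonitrile (in 0.1 M TEAA, pH 7) at time t_min of the run.

    Segment 1: 0 to 20% over 0-24 min.
    Segment 2: 20 to 40% over 24-34 min.
    """
    if t_min < 0 or t_min > 34:
        raise ValueError("time must lie within the 34-minute gradient")
    if t_min <= 24:
        return 20.0 * t_min / 24.0
    return 20.0 + 20.0 * (t_min - 24.0) / 10.0

# Print the solvent composition at a few time points.
for t in (0, 12, 24, 29, 34):
    print(f"{t:>2} min: {percent_acetonitrile(t):.1f}% acetonitrile")
```

The second segment is 2.4 times steeper than the first (2% per minute versus about 0.83% per minute).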

Oligonucleotide samples were analyzed in negative ion mode using a Bruker Impact II q-

TOF mass spectrometer equipped with an Agilent 1290 uHPLC using flow injection analysis. The purified samples were introduced at a constant flow rate of 0.200 mL/minute using 50% acetonitrile and 0.1% formic acid. The m/z scale of each data file was calibrated with a plug of sodium formate clusters delivered by a secondary isocratic pump through a 6-port valve. Using this internal calibration method, less than 2 ppm relative error was obtained on all samples. Bruker Data Analysis software was used to simulate the isotope pattern for each target ion.

Name                        Expected m/z   Observed m/z   Expected m/z   Observed m/z
                            ([M-2H]2-)     ([M-2H]2-)     ([M-H]-)       ([M-H]-)
phos-CAA-Isopentyl          513.6257       513.6253       1028.2588      1028.2580
phos-CAC-Isopentyl          501.6201       501.6198       1004.2475      1004.2467
phos-CTC-Isopentyl          497.1143       497.1142       995.2360       995.2355
phos-CGG-Isopentyl          529.6207       529.6204       1060.2486      1060.2478
phos-CAT-Fluorophenyl       521.0918       521.0914       1043.1908      1043.1896
phos-CAG-Fluorophenyl       533.5950       533.5946       1068.1973      1068.1961
phos-CGA-Fluorophenyl       533.5950       533.5946       1068.1973      1068.1960
phos-CGC-Fluorophenyl       521.5894       521.5890       1044.1861      1044.1849
phos-CTA-Cyclopropyl        494.0965       494.0961       989.2003       989.1990
phos-CCA-Cyclopropyl        486.5967       486.5961       974.2006       974.1993
phos-CCT-Cyclopropyl        482.0909       482.0907       965.1890       965.1885
phos-CCC-Cyclopropyl        474.5910       474.5907       950.1894       950.1885
phos-CTT-Cyclopentyl        510.6142       510.6137       1022.2356      1022.2343
phos-CTG-Cyclopentyl        523.1174       523.1170       1047.2421      1047.2406
phos-CGT-Cyclopentyl        523.1174       523.1169       1047.2421      1047.2406
phos-CCG-Cyclopentyl        515.6176       515.6170       1032.2425      1032.2408
phos-TTT-Phenol             551.5987       551.5983       -              -
phos-TTG-Phenol             564.1020       564.1019       1129.2112      1129.2099
phos-TGT-Phenol             564.1020       564.1015       1129.2112      1129.2094
phos-TCG-Phenol             556.6021       556.6017       1114.2115      1114.2103
phos-TAA-Imidazole          547.6081       547.6079       1096.2234      1096.2227
phos-TAC-Imidazole          535.6025       535.6023       1072.2122      1072.2117
phos-TCA-Imidazole          535.6025       535.6021       1072.2122      1072.2113
phos-TCC-Imidazole          523.5969       523.5964       1048.2010      1048.1999
phos-TAT-Primary alcohol    518.0889       518.0883       1037.1850      1037.1837
phos-TAG-Primary alcohol    530.5921       530.5917       1062.1915      1062.1904
phos-TGA-Primary alcohol    530.5921       530.5917       1062.1915      1062.1903
phos-TGC-Primary alcohol    518.5865       518.5862       1038.1802      1038.1793
phos-TTA-Allylamine         489.0861       489.0857       979.1795       979.1786
phos-TTC-Allylamine         -              -              955.1683       955.1672
phos-TCT-Allylamine         -              -              955.1683       955.1673
phos-TGG-Allylamine         509.5868       509.5864       1020.1809      1020.1799

Mass spectrometry data for PCSK9-Evo5-syn are given in Figure 3.5b.
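The relative mass error quoted for these measurements (less than 2 ppm) can be reproduced from any expected/observed pair in the table. The following is a minimal sketch using three entries copied from the table; the function name `ppm_error` is illustrative and is not part of the instrument software.

```python
def ppm_error(observed: float, expected: float) -> float:
    """Signed relative mass error in parts per million."""
    return (observed - expected) / expected * 1e6

# (expected m/z, observed m/z) pairs copied from the table above.
measurements = {
    "phos-CAA-Isopentyl [M-2H]2-": (513.6257, 513.6253),
    "phos-CAA-Isopentyl [M-H]-": (1028.2588, 1028.2580),
    "phos-TGG-Allylamine [M-H]-": (1020.1809, 1020.1799),
}

errors = {
    name: ppm_error(observed, expected)
    for name, (expected, observed) in measurements.items()
}

for name, err in errors.items():
    print(f"{name}: {err:+.2f} ppm")
```

The same function can be applied to the remaining rows to confirm that each observed value falls within the stated tolerance.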

A.4 – Additional experimental procedures

General information

Unless otherwise specified, all materials and compounds were prepared using commercially available reagents from Sigma-Aldrich, and used without further purification. Water was purified with a Milli-Q purification system. DNA oligonucleotides without nucleobase side chain functional groups were purchased from Integrated DNA Technologies (IDT) unless noted otherwise. In-house synthesis of side-chain-functionalized DNA was performed on a PerSeptive

Biosystems Expedite 8909 DNA synthesizer and purified by reverse-phase high-pressure liquid chromatography (HPLC, Agilent 1200) using a C18 stationary phase (Waters XBridge Prep C18,

5 µm, 10 x 250 mm) and an acetonitrile/ 100 mM triethylammonium acetate gradient. All materials and reagents used for oligonucleotide synthesis were purchased from Glen Research, Berry &

Associates, or ChemGenes, or custom synthesized by WuXi AppTec. Oligonucleotide and protein concentrations were quantified by UV spectroscopy using a Nanodrop ND1000 spectrophotometer, using extinction coefficients calculated with the IDT Oligo Analyzer and Expasy ProtParam web servers, respectively. Non-commercial oligonucleotides were characterized at the Harvard FAS

Small Molecule Mass Spectrometry Facility by ESI-MS on a Bruker Impact II q-TOF mass spectrometer equipped with an Agilent 1290 uHPLC using flow injection analysis. Polyacrylamide gels were purchased from Bio-Rad. Sanger sequencing was performed by Eton BioSciences and analyzed with ApE – A plasmid Editor. Quantitative polymerase chain reactions (qPCRs) were performed on a Bio-Rad CFX96 system. Deep sequencing was performed on an Illumina MiSeq.

Surface plasmon resonance (SPR) analysis was carried out on a Biacore X100 or Biacore T200

(GE Healthcare Life Sciences). Time-resolved FRET assays were performed on a Tecan Infinite

M1000 PRO microplate reader.

Synthesis of HFNAP by templated translation via ligase-mediated polymerization

To synthesize the double-stranded HFNAP-template hybrid, DNA template [up to 10 pmol, either in solution or immobilized on MyOne Streptavidin C1 magnetic beads (ThermoFisher

Scientific)], polymerization initiation and termination primers (1.5 equivalents each relative to template), functionalized trinucleotide building blocks (10 equivalents relative to template for each occurrence of the corresponding codon) and 10x T4 RNA ligase reaction buffer (New England

Biolabs; 1 µL) were mixed in a total volume of 8 µL in a PCR tube. The mixture was subjected to the following temperature program on a thermocycler: 95 °C for 10 s; 65 °C for 4 min; a ramp from 65 °C to 4 °C at 0.1 °C per 10 s. To the PCR tube were added 1 µL of 10 mM ATP and 1 µL of T3 DNA ligase (New England Biolabs). The reaction was incubated at 4 °C for 12 h and then at 16 °C for 2 h. For the evaluation of translation yield on template libraries, the reaction mixture was separated by nondenaturing 10% TBE polyacrylamide gel electrophoresis and stained with

SYBR Gold for characterization.
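The reagent amounts and the length of the annealing program described above follow from simple arithmetic. The sketch below is illustrative: the function names are hypothetical, the ramp is modeled as discrete 0.1 °C steps held for 10 s each, and the codon count used in the example is an invented value rather than one taken from a particular template.

```python
def ramp_seconds(start_c: float, end_c: float,
                 step_c: float = 0.1, hold_s: int = 10) -> int:
    """Duration of a stepwise temperature ramp, in seconds."""
    n_steps = round(abs(start_c - end_c) / step_c)
    return n_steps * hold_s

def building_block_pmol(template_pmol: float, codon_occurrences: int,
                        equiv_per_occurrence: float = 10.0) -> float:
    """Amount of one trinucleotide building block, in pmol."""
    return template_pmol * equiv_per_occurrence * codon_occurrences

template_pmol = 10.0                   # upper limit stated in the protocol
primer_pmol = 1.5 * template_pmol      # each of the two primers

# Hypothetical example: a codon that occurs twice in the template.
block_pmol = building_block_pmol(template_pmol, codon_occurrences=2)

# 95 °C for 10 s, 65 °C for 4 min, then the ramp from 65 °C to 4 °C.
anneal_s = 10 + 4 * 60 + ramp_seconds(65, 4)

print(f"primers: {primer_pmol} pmol each; building block: {block_pmol} pmol")
print(f"annealing program: {anneal_s} s ({anneal_s / 60:.1f} min)")
```

Under these assumptions the slow ramp dominates the annealing program, taking roughly 100 minutes before ATP and ligase are added.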

To synthesize and isolate an unbiotinylated HFNAP, unbiotinylated primers and a doubly biotinylated ssDNA template were used in a solution-phase polymerization reaction. After the reaction incubation period, the reaction mixture was combined with MyOne Streptavidin C1 magnetic beads (ThermoFisher Scientific, 65002; 1 µL of the stock 1% suspension per 4 pmol of biotinylated template), and then an equal volume of 2x bind-and-wash buffer (2 M NaCl, 2 mM

EDTA, 20 mM Tris-HCl, pH 7.5) was added. After 30 minutes of incubation on a rotor, the supernatant was removed by magnetic separation, and the beads were suspended in 18 µL of 20 mM

NaOH. The supernatant was combined with 12 µL of formamide denaturing mix (95% formamide,

1 mM EDTA) and run on a 10% TBE-urea PAGE gel. Desired product was visualized by UV shadowing at 265 nm against a TLC plate (with F254 indicator), excised from the gel, eluted in

134

200 µL of TE buffer (10 mM Tris-HCl, 1 mM EDTA, pH 8.0) overnight, filtered, mixed with 2 mL of ssDNA column loading mix (40:60:0.5 v/v/v of saturated aqueous guanidinium chloride/isopropanol/3M sodium acetate, pH 5.2) and cleaned up with a Qiagen QiaQuick column.

Typical isolated yield of HFNAP from 200 pmol of template was between 5 and 15 pmol as determined by Nanodrop or quantitative PCR. For the validation of sequence specificity of translation and amplification, a small sample (~1 fmol) was amplified by PCR using Q5 DNA polymerase and the primers T7-out-PCR2 and pp2-library. The amplicon was purified by PAGE on a nondenaturing 10% TBE gel and subjected to Sanger sequencing.
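For orientation, the bead volume and the fractional yield implied by these numbers can be worked out as follows. This is a minimal sketch; the helper names are hypothetical, and the inputs are the values quoted in the protocol (1 µL of 1% bead suspension per 4 pmol of biotinylated template; 5 to 15 pmol isolated from 200 pmol of template).

```python
def bead_suspension_ul(template_pmol, ul_per_4_pmol=1.0):
    """Volume of 1% streptavidin bead suspension for a given amount of template."""
    return template_pmol / 4.0 * ul_per_4_pmol

def isolated_yield_percent(product_pmol, template_pmol):
    """Isolated HFNAP as a percentage of input template."""
    return 100.0 * product_pmol / template_pmol

beads_ul = bead_suspension_ul(200)             # bead suspension for 200 pmol template
low_yield = isolated_yield_percent(5, 200)     # lower end of the typical range
high_yield = isolated_yield_percent(15, 200)   # upper end of the typical range
```

On these figures, a 200 pmol preparation uses 50 µL of bead suspension and returns roughly 2.5 to 7.5% of the template amount as gel-purified HFNAP.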

Synthesis and isolation of a biotinylated HFNAP followed the same procedure as above, except that an unbiotinylated ssDNA template was used, and one of the polymerization primers was doubly biotinylated. After the polymerization reaction, streptavidin bead capture, and alkaline denaturation, the bead-bound biotinylated molecules were eluted by heating in 95% formamide, 1 mM EDTA at 95 °C for 30 min. After magnetic separation, the desired product was isolated from the supernatant by PAGE on a denaturing TBE-urea gel. Typical isolated yield of HFNAP from 200 pmol of template was between 5 and 15 pmol. Synthesis and isolation of a biotinylated, truncated HFNAP (such as PCSK9-Evo5) followed the same procedure, except that the primer contained a 2’-deoxy-U nucleotide, and the ligation reaction mixture was treated with USER enzyme (New England Biolabs) at 37 °C for 2 h before proceeding to streptavidin bead capture.

During the selections, polymerization reactions were performed with templates immobilized on streptavidin beads and processed as detailed in the next section.

Selection of HFNAPs that bind protein targets

Selection of PCSK9-binding HFNAPs from a naïve library

Recombinant human PCSK9 protein (ACROBiosystems, PC9-H5223) was immobilized onto AminoLink Plus aldehyde-functionalized agarose resin via reductive amination with a MicroLink Protein Coupling Kit (ThermoFisher Scientific, 20475) at a loading of 1 mg protein per mL resin according to the manufacturer’s instructions.

To initiate the selection, primer extension was performed with 10 pmol BtBt-ExtA on 5 pmol of sense strand randomized DNA library (“naïve library AZ15”) with Klenow (exo-) polymerase (New England Biolabs, M0212S) at 37 °C overnight. The reaction mixture was combined with an equal volume of 2x bind-and-wash buffer (2 M NaCl, 2 mM EDTA, 20 mM Tris-HCl, pH 7.5) and immobilized onto 10 µL of a 1% suspension of MyOne Streptavidin C1 magnetic beads. After removal of supernatant, the beads were washed three times with 20 µL of 20 mM NaOH (leaving a biotinylated ssDNA template library on the beads) and then twice with 20 µL of 1x T4 RNA ligase reaction buffer.

To the bead-immobilized template library were added pp1A and pp2Z (7.5 pmol each), a mixture of all 32 functionalized trinucleotide building blocks (100 pmol each), 10x T4 RNA ligase reaction buffer (1 µL), and water to a total of 8 µL. The suspension was transferred to a PCR tube and subjected to the following temperature program on a thermocycler: 95 °C for 10 s; 65 °C for 4 min; a ramp from 65 °C to 4 °C at 0.1 °C per 10 s. To the PCR tube were added 1 µL of 10 mM ATP and 1 µL of T3 DNA ligase. The reaction was incubated at 4 °C for 12 h and then at 16 °C for 2 h. An additional 10 µL of a 1% suspension of MyOne Streptavidin C1 magnetic beads and 20 µL of 2x bind-and-wash buffer were added, and the mixture was incubated at room temperature for 30 min before magnetic separation. The supernatant was discarded, and then the unbiotinylated HFNAP strand was eluted from the beads by treatment with 2 × 30 µL of 20 mM NaOH. To the combined HFNAP fractions was added 600 µL of ssDNA column loading mix (40:60:0.5 v/v/v of saturated aqueous guanidinium chloride/isopropanol/3 M sodium acetate, pH 5.2), and the mixture was cleaned up with a Qiagen MinElute column, eluting into 15 µL of water.

The HFNAP was added to 35 µL of DPBS (with calcium and magnesium; Lonza 17-513Q) containing 0.1 mg/ml BSA and 0.01% Tween-20, and then incubated with PCSK9 resin in a micro-spin filtration column (Pierce 89879) at room temperature for 1 h on a rotor. (The amounts of resin-bound protein used in each round are indicated in Figure 2.6.) The flow-through was collected by centrifugation at 1000 × g into an Eppendorf tube. The beads in the column were washed three times with 50 µL each of DPBS, each wash being collected by centrifugation as well. The column was cut open and the beads were collected by centrifugation into an Eppendorf tube. To the beads was added 50 µL of a lithium dodecyl sulfate (LDS)-containing buffer (Life Technologies B0007, diluted 4-fold), and the tube was incubated at 95 °C for 15 min. After cooling, 600 µL of ssDNA column loading mix was added, and the mixture was cleaned up with a QiaQuick column, eluting the HFNAP into 50 µL of water.

Samples of 1 µL each from the flow-through, the three washes, and the elution were quantified by qPCR (20 µL reaction volume) using the iTaq Supermix (Bio-Rad, 172-5125) with pp2Z and ExtA (500 nM each) as primers under the following temperature program: 95 °C for 3 min; 35 cycles of 95 °C for 15 s, 59.5 °C for 30 s, 72 °C for 15 s. The number of cycles for the qPCR curve on the elution sample to reach the end of exponential growth was used as the number of cycles for the preparative PCR (400 µL reaction volume split into 8 × 50 µL) using Q5 Hot Start High-Fidelity 2X Master Mix (New England Biolabs, M0494), with the selection elution pool (20 µL) as template and pp2Z and BtBt-ExtA (500 nM each) as primers, under the same annealing and extension temperatures as in the qPCR. The finished reaction was mixed with 2 mL of ssDNA column loading mix and cleaned up with a Qiagen MinElute column, eluting into 15 µL of water.
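The choice of preparative PCR cycle number can be illustrated with a short sketch. The qPCR and the preparative PCR contain the same proportion of elution pool (1 µL in 20 µL and 20 µL in 400 µL, respectively), which is why a cycle number read from the qPCR curve can be carried over. How the end of exponential growth was read from the curve is not specified here; the code below assumes, purely for illustration, that it is approximated by the cycle with the largest per-cycle gain in fluorescence (the inflection of the amplification curve). The function name and the simulated curve are hypothetical.

```python
import math

def end_of_exponential_cycle(fluorescence):
    """Return the 1-based cycle showing the largest per-cycle signal increase.

    `fluorescence[0]` is the reading after cycle 1. This inflection-based
    rule is an assumed proxy for "the end of exponential growth".
    """
    increments = [b - a for a, b in zip(fluorescence, fluorescence[1:])]
    i = max(range(len(increments)), key=increments.__getitem__)
    return i + 2  # increments[i] is the gain going from cycle i+1 to cycle i+2

# Simulated 35-cycle sigmoidal amplification curve (placeholder data).
curve = [1.0 / (1.0 + math.exp(-(c - 18.5))) for c in range(1, 36)]
prep_cycles = end_of_exponential_cycle(curve)
```

Stopping the preparative PCR near this point keeps the amplification below saturation, consistent with the sub-saturation cycling used elsewhere in the workflow.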

The amplified dsDNA was purified by PAGE on a non-denaturing 10% TBE gel. A portion (indicated in Figure 2.6) of the gel-purified product was immobilized on 10 µL of a 1% suspension of MyOne Streptavidin C1 magnetic beads, strand-separated with 20 mM NaOH, and the immobilized template strand was used for HFNAP translation to initiate the next round of selection.

Evolution of PCSK9-A5 for higher affinity

The evolution of PCSK9-A5 was performed in a similar fashion with the following differences. Rediv library AZ15 (custom synthesized by TriLink) was used to initiate the selection. The primer pp1A-3ddC was used instead of pp1A for ligase-based polymerization in order to facilitate the removal of cheaters (Figure 3.2). After two rounds of selection using the same PCSK9 bead loading as before, beads with reduced loading (150 µg protein per mL resin for rounds 3-5; 40 µg protein per mL resin for round 6) were used for affinity enrichment.

Selection of IL-6-binding HFNAPs from a naïve library

The selection of IL-6-binding HFNAPs was performed in a similar fashion with the following differences. Recombinant human IL-6 protein (PeproTech, 200-06) immobilized on AminoLink Plus aldehyde resin at 0.25 mg protein per mL resin was used as the immobilized target. Throughout the selection, 240 pmol of immobilized IL-6 protein was used in each round. A primer extension on naïve library CW15 with BtBt-ExtC was used to initiate the selection. The primers pp1C and pp2W were used for ligase-based polymerization (translation) reactions. The primers ExtC and pp2W were used for qPCR reactions. The primers BtBt-ExtC and pp2W were used in PCR reactions that amplify affinity-enriched HFNAP into dsDNA for initiating the next round of selection.

High-throughput DNA sequencing and data analysis

Small samples from the elution pool of selection rounds were amplified by PCR using Q5 Hot Start High-Fidelity 2X Master Mix to a sub-saturation number of cycles (determined during the selection by qPCR) with primers that install flanking sequences. The amplicons were PAGE-purified and amplified by PCR with Illumina adapter primers. The amplicons were again PAGE-purified and subjected to high-throughput sequencing on an Illumina MiSeq.

The FASTQ files from high-throughput sequencing were first processed with Cutadapt¹ for the following operations: a quality-based trim (with a threshold Phred score of 30), removal of constant regions (with a one-base error tolerance in each region; sequences were discarded if either constant region was not found), and filtering for the correct length (45) in the remaining sequence.

Sequences that could not be completely parsed into trimer codons (whose first nucleobase should always be C or T) were discarded.
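The codon-parsing filter can be sketched in a few lines. This is a minimal illustration, not the script used in this work: the function name is hypothetical, and it assumes that reads have already been trimmed to the 45-nt variable region and are oriented so that the first base of each codon is the one expected to be C or T.

```python
def parse_codons(seq, length=45, allowed_first="CT"):
    """Split a trimmed read into trimer codons, or return None if it fails.

    A read is rejected when its length is wrong, when it contains characters
    other than A/C/G/T, or when any codon starts with a base outside
    `allowed_first`.
    """
    if len(seq) != length or length % 3 != 0:
        return None
    codons = [seq[i:i + 3] for i in range(0, length, 3)]
    for codon in codons:
        if codon[0] not in allowed_first or not set(codon) <= set("ACGT"):
            return None
    return codons

# Placeholder reads for illustration only.
ok = parse_codons("CAT" * 15)               # parses into 15 codons
bad_first = parse_codons("AAT" + "CAT" * 14)  # first codon starts with A
bad_length = parse_codons("CAT" * 14)         # only 42 nt
```

With the first position restricted to two bases and the other two positions unrestricted, the filter admits 2 × 4 × 4 = 32 codons, matching the 32 building blocks.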

For the initial PCSK9 selection and the IL-6 selection, the copy numbers of all unique sequences were tallied and the unique sequences above 5 reads per million were clustered using FASTAptamer². Sequence logos for PCSK9 selection-enriched sequences were then generated for individual clusters with WebLogo 3³. For the PCSK9-A5 evolution, sequence logos were generated directly from sequencing data with WebLogo 3.
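The tallying and abundance cutoff that precede clustering can be expressed as follows. This is a hedged sketch in plain Python rather than the FASTAptamer workflow itself; the function name and the example reads are hypothetical, and "above 5 reads per million" is interpreted as a strict inequality.

```python
from collections import Counter

def abundant_sequences(reads, min_rpm=5.0):
    """Tally unique sequences and keep those above `min_rpm` reads per million."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {seq: n for seq, n in counts.items() if n * 1e6 / total > min_rpm}

# Placeholder data: 200,000 reads, so one read corresponds to 5 reads per million.
reads = ["A"] * 199_997 + ["B"] * 2 + ["C"]
kept = abundant_sequences(reads)
```

In this toy data set the singleton sits exactly at 5 reads per million and is dropped, while the sequence seen twice (10 reads per million) is retained for clustering.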

Affinity characterization by bead retention assay

Candidate PCSK9-binding HFNAPs (from ligase-catalyzed polymerization reaction) or sequence-matched unfunctionalized DNA (0.5 pmol) in 50 µL of DPBS (with calcium and magnesium) containing 0.1 mg/ml BSA and 0.01% Tween-20 was incubated with 1 µL of either PCSK9 beads or thrombin beads (1 mg protein per mL resin) in a micro-spin filtration column at room temperature for 1 h on a rotor. Following the same procedure described for the selection, the flow-through was collected, the beads were washed three times, the retained HFNAP or DNA was eluted by heating, and the amounts of amplifiable HFNAP or DNA in the flow-through, wash, and elution samples were quantified by qPCR.
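One way to turn the qPCR readings into a retention value is sketched below. The calculation is an assumption for illustration, not a description of the analysis performed here: it takes quantification cycle (Cq) values as input, assumes perfect doubling per cycle unless another efficiency is supplied, and assumes that every fraction has the same volume and is sampled identically, so that relative template amounts can be compared directly. The function name and Cq values are hypothetical.

```python
def fraction_retained(cq_flow_through, cq_washes, cq_elution, efficiency=2.0):
    """Fraction of amplifiable material recovered in the elution.

    Relative template amount is taken as efficiency ** (-Cq); the result is
    elution / (flow-through + all washes + elution).
    """
    def relative_amount(cq):
        return efficiency ** (-cq)

    unbound = relative_amount(cq_flow_through) + sum(relative_amount(c) for c in cq_washes)
    bound = relative_amount(cq_elution)
    return bound / (bound + unbound)

# Placeholder Cq values for one candidate on target beads.
retained = fraction_retained(20.0, [22.0, 23.0, 24.0], 20.0)
```

Comparing this fraction for an HFNAP against its sequence-matched DNA, and for target beads against control beads, gives a simple side-by-side readout of specific retention.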

Candidate IL-6-binding HFNAPs and sequence-matched DNA were similarly assayed on PCSK9 beads (prepared as above, but serving as negative control) or on IL-6 beads (0.25 mg protein per mL resin).

Surface plasmon resonance (SPR) assays

All SPR assays were performed at 25 °C on a Biacore X100 or Biacore T200 (GE Healthcare Life Sciences).

Binding kinetics between enzymatically synthesized biotinylated HFNAPs and unlabeled PCSK9 (ACROBiosystems, PC9-H5223) were measured using single-cycle kinetics with the Biotin CAPture kit (GE Life Sciences, 28920233 or 28920234). HBS-EP buffer (GE Life Sciences, BR100188), diluted with MilliQ water to 0.9x, was used as the bulk running buffer. Each experiment consisted of three start-up cycles followed by multiple data collection and blank cycles. In each data collection cycle, the CAP reagent was injected onto both active and control flow cells of the CAP chip to generate streptavidin-coated surfaces, and then a doubly biotinylated HFNAP was injected onto the active flow cell as the immobilized ligand. Afterwards, four ascending concentrations of PCSK9 protein [10, 30, 100, 300 nM protein supplemented with 1 mg/ml salmon sperm DNA (Invitrogen, 15632-011) as nonspecific binding reducer for PCSK9-A5 and its variants; 2, 6, 20, 60 nM protein without salmon sperm DNA for PCSK9-Evo5 and its variants] in 0.9x HBS-EP were injected onto both flow cells in series at 30 µL/min for 150 seconds each, followed by 240 seconds of dissociation. Both flow cells were then regenerated with a 3:1 mixture of 8 M guanidinium chloride and 1 M NaOH following the manufacturer’s instructions. Blank cycles were run similarly except that 0.9x HBS-EP (containing 1 mg/ml salmon sperm DNA when PCSK9-A5 and its variants were assayed) without PCSK9 protein was injected. As signals from blank cycles were similar regardless of the immobilized HFNAP, one blank cycle was run for every two data collection cycles. Kinetic parameters were fitted to double-blank-subtracted sensorgrams using BIAEvaluation software under a 1:1 binding model, unless stated otherwise.
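To make the 1:1 binding model concrete, the following sketch computes the response expected at the end of each injection of a single-cycle run, using the closed-form solution of dR/dt = ka·C·(Rmax − R) − kd·R. It is an idealized illustration and not the fitting routine of the BIAEvaluation software: the rate constants and Rmax are hypothetical, and mass transport, bulk refractive index shifts, and drift are ignored. Only the injection scheme (four ascending concentrations, 150 s each, then 240 s of dissociation) follows the protocol.

```python
import math

def single_cycle_response(ka, kd, rmax, concs_molar, contact_s=150.0, dissoc_s=240.0):
    """Simulate a single-cycle kinetics run under a 1:1 binding model.

    Returns (responses at the end of each injection, response after the
    final dissociation phase). Injections follow one another directly.
    """
    r = 0.0
    end_points = []
    for c in concs_molar:
        k_obs = ka * c + kd                 # observed association rate
        r_eq = rmax * ka * c / k_obs        # equilibrium response at this concentration
        r = r_eq + (r - r_eq) * math.exp(-k_obs * contact_s)
        end_points.append(r)
    final = r * math.exp(-kd * dissoc_s)    # analyte-free buffer: pure dissociation
    return end_points, final

# Hypothetical constants giving KD = kd / ka = 10 nM; concentrations as in the protocol.
ends, final = single_cycle_response(1e5, 1e-3, 100.0, [10e-9, 30e-9, 100e-9, 300e-9])
```

Fitting reverses this calculation: ka, kd, and Rmax are adjusted until the simulated curve matches the double-blank-subtracted data, and the dissociation constant follows as KD = kd/ka.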

Binding between biotinylated Evo5 and truncated PCSK9 protein missing the prodomain (“human mature PCSK9”, ACROBiosystems, PC9-H5226) was also assayed using this protocol.

Binding kinetics between enzymatically synthesized biotinylated HFNAPs and unlabeled IL-6 protein, expressed in either E. coli (PeproTech, 200-06) or human HEK293 cells (ACROBiosystems, IL6-H4218), were assayed similarly with the following differences. Four ascending concentrations (10, 30, 100, 300 nM) of IL-6 without additional nonspecific binding reducer were injected in the single-cycle kinetic runs. As the binding kinetics did not fit a classical 1:1 binding model, a heterogeneous ligand model was used to fit the double-blank-subtracted sensorgrams.

Binding kinetics between chemically synthesized Evo5-syn and biotinylated Avi-tagged PCSK9 (ACROBiosystems, PC9-H82E7) were measured using single-cycle kinetics on a Series S SA chip (GE Life Sciences, BR100531) using 0.9x HBS as the bulk running buffer. Both active and control flow cells were conditioned with three consecutive one-minute injections of 1 M NaCl in 50 mM NaOH. Biotinylated PCSK9 in 0.9x HBS buffer was immobilized onto the active flow cell to ~1000 RU. Either five portions of buffer or five ascending concentrations of Evo5-syn (1.8, 6, 18, 60, 180 nM) were injected onto both flow cells in series at 30 µL/min for 150 seconds each, followed by 600 seconds of dissociation. Kinetic parameters were fitted to double-blank-subtracted sensorgrams under a 1:1 binding model using BIAEvaluation software.

Binding of PCSK9 on surface-immobilized LDLR in the presence of various competing agents was measured on a Series S SA chip. The bulk running buffer was 10 mM HEPES, 150 mM NaCl, 0.1 mM CaCl2, 0.005% Tween-20, pH 7.5. Both active and control flow cells were conditioned with three consecutive one-minute injections of 1 M NaCl in 50 mM NaOH, and then biotinylated Avi-tagged LDLR (BPS Bioscience, 71206) was immobilized onto the active flow cell to ~2000 RU. In each data collection cycle, a solution consisting of PCSK9 (20 nM final), a carboxymethyl dextran-based non-specific binding reducer (GE Healthcare, BR-1006-91, 1 mg/ml final), and varying concentrations (0, 2, 6, 20, 60, or 200 nM final) of PCSK9-Evo5-syn, Evo5DNA-InvdT, unlabeled LDLR (ACROBiosystems, LDR-H5224), or a known PCSK9-neutralizing antibody (BPS Bioscience, 71207) in bulk running buffer was injected at 10 µL/min for 420 s, followed by 15 s of dissociation. The surface was regenerated using two consecutive one-minute injections of 50 mM HCl. Blank cycles were run similarly except that running buffer containing 1 mg/ml non-specific binding reducer without protein was injected. Response levels at the end of the injection periods from double-blank-subtracted sensorgrams were recorded.
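A common way to compare competitors in an assay of this kind is to express each end-of-injection response as a percentage of the response without competitor. The sketch below shows that normalization under stated assumptions: the normalization step itself is not described in this section, the function name is hypothetical, and the response values are placeholders rather than measured data.

```python
def percent_binding(response_by_conc_nm):
    """Normalize end-of-injection responses to the zero-competitor response.

    `response_by_conc_nm` maps competitor concentration (nM) to the
    double-blank-subtracted response (RU); it must contain the key 0.0.
    """
    reference = response_by_conc_nm[0.0]
    return {conc: 100.0 * ru / reference
            for conc, ru in sorted(response_by_conc_nm.items())}

# Placeholder responses (RU) at three competitor concentrations.
normalized = percent_binding({0.0: 80.0, 20.0: 40.0, 200.0: 8.0})
```

Plotting these percentages against competitor concentration puts the HFNAP, the unfunctionalized DNA control, soluble LDLR, and the neutralizing antibody on a common scale.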

Electrophoretic mobility shift assay (EMSA)

A 7.5% Tris-Glycine polyacrylamide gel (Bio-Rad, 5671024) was pre-run at 150 V for 1 hour at 4 °C in a cold room. Mixtures (12 µL each) of PCSK9-Evo5-Fluor or a sequence-matched DNA (1 nM final), PCSK9 protein (ACROBiosystems, PC9-H5223, between 0.3 and 300 nM final), and salmon sperm DNA (Invitrogen, 15632-011, 30 µg/ml final) in 0.5x HBS-EP buffer (diluted from HBS-EP buffer, GE Life Sciences, BR100188) containing 3% v/v glycerol were incubated at 25 °C for 30 minutes, and then at 4 °C for 5 minutes. The mixtures were loaded onto the pre-run gel and run at 150 V for 15 minutes at 4 °C. The gel was imaged with a Typhoon imager using the Cy5 channel.

DNA secondary structure prediction

DNA secondary structure prediction was performed on the mfold Web server⁴ using parameters that reflect conditions used in SPR assays: the folding temperature was set to 25 °C, the sodium concentration to 0.15 M, and the magnesium concentration to zero.

References

1. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17, 10–12 (2011).

2. Alam, K. K., Chang, J. L. & Burke, D. H. FASTAptamer: A Bioinformatic Toolkit for High-throughput Sequence Analysis of Combinatorial Selections. Mol. Ther. - Nucleic Acids 4, e230 (2015).

3. Crooks, G. E., Hon, G., Chandonia, J.-M. & Brenner, S. E. WebLogo: a sequence logo generator. Genome Res. 14, 1188–1190 (2004).

4. Zuker, M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31, 3406–3415 (2003).