Research Collection

Doctoral Thesis

DNA-encoded chemical libraries

Author(s): Mannocci, Luca

Publication Date: 2009

Permanent Link: https://doi.org/10.3929/ethz-a-005783014

Rights / License: In Copyright - Non-Commercial Use Permitted



DISS. ETH NO. 18153

DNA-Encoded Chemical Libraries

A dissertation submitted to the

ETH Zurich

For the degree of

Doctor of Sciences

Presented by

Luca Mannocci

Dott. Chim., Università degli Studi di Pisa

Born September 7, 1979

Citizen of Pisa (Italy)

Accepted on the recommendation of

Prof. Dr. Dario Neri, examiner

Prof. Dr. Karl-Heinz Altmann, co-examiner

Zurich, 2009

“I believe in intuition and inspiration. Imagination is more important than knowledge. Knowledge is limited. Imagination embraces the entire world, stimulating progress, giving birth to evolution. It is, strictly speaking, a real factor in scientific research.”

Albert Einstein

To my family

TABLE OF CONTENTS

1. SUMMARY

RIASSUNTO

List of abbreviations

2. INTRODUCTION

2.1 DNA-Encoded Chemical Libraries

2.1.1 Libraries of DNA displaying one covalently linked chemical entity

2.1.1.1 DNA-encoded “Split-&-Pool”

2.1.1.2 DNA-assisted “Split-&-Pool”

2.1.1.3 DNA-templated synthesis

2.1.1.4 Stepwise coupling of coding DNA fragments to nascent organic molecules

2.1.2 DNA libraries displaying multiple covalently linked chemical entities: ESAC libraries

2.2 The decoding of DNA-encoded chemical libraries

2.2.1 Microarray-based decoding

2.2.2 Decoding by high throughput sequencing

2.2.2.1 “454” technology

2.2.2.2 Solexa technology

2.2.2.3 SOLiD technology

2.2.2.4 Single Molecule DNA Sequencing – Helicos technology

3. RESULTS

3.1 DNA-Encoded Library “DEL4000”

3.1.1 Library design and synthesis

3.1.2 Model Compounds

3.1.3 Oligonucleotides

3.1.4 Compounds

3.1.5 HPLC Purification

3.1.6 Mass Spectrometry

3.1.7 Oligonucleotide concentration determination

3.1.8 Polymerase Klenow encoding

3.1.9 Summary

3.2 Selections using the DEL4000 library

3.2.1 Streptavidin selection

3.2.1.1 Identification of streptavidin binding molecules

3.2.1.2 Characterization of streptavidin binding molecules

3.2.2 Polyclonal human IgG selection

3.2.2.1 Identification of polyclonal IgG binding molecules

3.2.2.2 Characterization of polyclonal IgG binding molecules by affinity chromatography resins

3.2.3 Matrix metalloproteinase 3 (MMP3) selection

3.2.3.2 Characterization of MMP3 binding molecules

3.2.4 Computational simulation of DEL4000 selections

3.3 General strategies for the stepwise construction of very large DNA encoded chemical libraries

3.3.1 Selective deprotection and reaction of di-amine derivatives

3.3.1.1 Orthogonal protective group and selective deprotection

3.3.1.2 Core scaffolds design and synthesis strategy

3.3.1.3 Model compounds for N-Fmoc, N’-Nvoc di-amino carboxylic acid core scaffold based library

3.3.2 Stepwise DNA-encoding

3.3.2 Encoding by ligation

3.3.2.1 Encoding by a combination of Klenow polymerase and ligation

3.3.2.2 Encoding by Klenow polymerase

3.3.3 Summary

4. DISCUSSION

5. MATERIAL AND METHODS

5.1 Reagents and general remarks

5.2 Synthesis of DEL4000 DNA Encoded Library

5.2.1 Synthesis of library model compounds oligonucleotide conjugate

5.2.2 Coupling reactions of 20 Fmoc-protected amino acids

5.2.3 Coupling reactions of 200 carboxylic acids

5.2.4 Polymerase Klenow encoding of 200 carboxylic acids reactions

5.2.5 Preparation of D-desthiobiotin oligonucleotide-conjugate (positive control)

5.3 Library DEL 4000 selections

5.3.1 Streptavidin selection

5.3.1.1 Identification of binding molecules

5.3.1.2 Synthesis of the binding molecules as fluorescein conjugates

5.3.2 Affinity measurements

5.3.3 Polyclonal human IgG selection

5.3.3.1 Polyclonal human IgG coating of sepharose beads

5.3.3.2 Identification of human IgG binding molecules

5.3.3.3 Synthesis of affinity chromatography resin containing the compound 02-40 or 16-40

5.3.3.4 Polyclonal human IgG Cy5 labeling

5.3.3.5 Biotinylated polyclonal human IgG

5.3.3.6 Affinity chromatography of CHO cells supernatant spiked with human IgG Cy5 labeled or biotinylated human IgG on IgG binding resin

5.3.4 Human MMP3 selection

5.3.4.1 Human MMP3 coating of sepharose beads

5.3.4.2 Identification of human MMP3 binding molecules

5.3.4.3 Synthesis of the MMP3 binding molecules as fluorescein conjugates

5.3.5 Computational simulation

5.4 Stepwise coupling by selective deprotection and reaction of di-amine derivatives

5.4.1 DNA-compatible cleavage of different amino protective groups

5.4.1.1 Synthesis of 2-pent-4-enamido-cis-cyclopentanecarboxylic acid (1c)

5.4.1.2 Synthesis of N-Bpoc cis-2-aminocyclopentanecarboxylic acid (1d)

5.4.1.3 Synthesis of N-Nvoc cis-2-aminocyclopentanecarboxylic acid (1b)

5.4.1.4 Synthesis of 4-pentenoic N-hydroxy succinimide ester (1e)

5.4.1.5 Synthesis of Nα-Fmoc-Nε-Nvoc-lysine (2)

5.4.1.6 Oligonucleotide conjugation of N-protected cis-2-aminocyclopentanecarboxylic acid derivatives and Nα-Fmoc-Nε-Nvoc-lysine

5.4.1.7 Cleavage of 2-pent-4-enamido-cis-cyclopentanecarboxylic acid oligonucleotide conjugate

5.4.1.8 Cleavage of N-Bpoc cis-2-aminocyclopentanecarboxylic acid oligonucleotide conjugate

5.4.1.9 Cleavage of N-Nvoc cis-2-aminocyclopentanecarboxylic acid and N-Fmoc-N’-Nvoc-lysine oligonucleotide conjugate

5.4.2 Synthesis of model scaffolds for Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid derivative based library

5.4.2.1 Synthesis of (1R,3R,4R)-methyl 3-azido-4-Boc-amino-cyclopentanecarboxylate (4)

5.4.2.2 Synthesis of (1S,3R,4R)-methyl 3-amino-4-Boc-amino-cyclopentanecarboxylate (5)

5.4.2.3 Synthesis of (1S,3R,4R)-methyl 3-Fmoc-amino-4-Boc-amino-cyclopentanecarboxylate (6)

5.4.2.4 Synthesis of (1S,3R,4R)-methyl 3-Fmoc-amino-4-Nvoc-amino-cyclopentanecarboxylate (8)

5.5 Stepwise encoding

5.5.1 Stepwise encoding by Ligation

5.5.2 Stepwise encoding by a combination of Klenow polymerase and Ligation

5.5.3 Stepwise encoding by Klenow Polymerase

5.5.4 Stepwise coupling and encoding of model compound for Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid derivative based library

5.5.5 Bacterial and sequencing

6. REFERENCES

7. CURRICULUM VITAE

8. ACKNOWLEDGMENTS

9. APPENDIX

9.1 Model compounds oligonucleotide conjugate

9.2 Library synthesis overview

1. SUMMARY

The isolation of small organic molecules capable of specific binding to biological targets is a central problem in chemistry, biology and pharmaceutical sciences. Consequently, there is considerable interest in the development of powerful and convenient technologies for the construction of large sets (“libraries”) of chemical compounds and of novel screening methodologies for the identification of binding molecules. DNA-encoded chemical libraries represent an innovative approach to the construction and screening of libraries of unprecedented dimension and quality. Such libraries consist of a collection of chemical compounds, each individually coupled to a distinctive DNA fragment which serves as an identification bar code. DNA-encoded chemical libraries can be “panned” on a target protein immobilized on a solid support. Typically, high-throughput sequencing reveals the composition of the library before and after panning, thus allowing the identification of molecules binding to the target protein of interest. In this respect, DNA-encoded chemical libraries bear a logical similarity to phage display libraries of proteins and peptides, in which the binding specificity displayed on the phage surface (“phenotype”) is physically linked to the gene coding for the polypeptide (“genotype”).
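The comparison of library composition before and after panning can be illustrated with a minimal sketch. It assumes that each sequence read has already been reduced to the DNA code of one library member; the three codes, the read counts and the pseudocount are invented for illustration and are not data from this work.

```python
from collections import Counter


def relative_frequencies(codes):
    """Return the fraction of sequence reads observed for each DNA code."""
    counts = Counter(codes)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}


def enrichment(before, after, pseudocount=1e-6):
    """Ratio of post-selection to pre-selection frequency for each code.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for codes that were not sampled before selection.
    """
    f_before = relative_frequencies(before)
    f_after = relative_frequencies(after)
    return {
        code: f_after.get(code, 0.0) / (f_before.get(code, 0.0) + pseudocount)
        for code in set(f_before) | set(f_after)
    }


# Toy reads: three hypothetical codes, equally represented before panning.
reads_before = ["AAC", "GTT", "CAG"] * 100
reads_after = ["AAC"] * 90 + ["GTT"] * 5 + ["CAG"] * 5

scores = enrichment(reads_before, reads_after)
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))
```

Codes whose relative frequency rises after panning are candidates for binding molecules; a real analysis would also have to account for sampling noise at low read counts.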

In the first part of this thesis, I present a general strategy for the stepwise coupling of coding DNA fragments to nascent organic molecules following individual reaction steps, as well as the implementation of high-throughput sequencing for the identification and relative quantification of library members. The methodology was exemplified in the construction of a DNA-encoded chemical library containing 4’000 compounds (DEL4000) covalently attached to unique DNA fragments serving as amplifiable identification bar-codes. We have also assessed the relative composition of the new library and its functionality by performing selection experiments on sepharose resin coated with streptavidin. This study has led to the identification of novel chemical compounds with submicromolar dissociation constants towards streptavidin. Moreover, we have found that selections can conveniently be decoded using a recently described high-throughput DNA sequencing technology (termed “454 technology”), originally developed for genome sequencing.


In a second selection experiment, binding molecules to polyclonal human IgG were identified. I could show that, upon coupling to resin, these compounds could be used for the affinity purification of human IgG from culture supernatants. Furthermore, we carried out a selection against the catalytic domain of human matrix metalloproteinase 3 (MMP3). Matrix metalloproteinases (MMPs) are zinc-dependent proteases which are involved in tissue remodelling in a variety of physiological and pathological processes. The selection facilitated the identification of a binding compound with a dissociation constant in the low μM range.

Encouraged by these results we investigated methodologies for the construction of very large DNA-encoded chemical libraries, featuring the stepwise addition of at least three independent sets of chemical moieties onto an initial scaffold, using suitable orthogonal chemical reactions and/or protecting strategies, followed by the sequential addition of the corresponding DNA codes. Our experiments have shown that it should be possible to construct DNA-encoded libraries containing over one million individual chemical compounds. The construction of such libraries is currently in progress.

RIASSUNTO

The isolation of organic substances able to interact specifically with biological targets is a crucial problem in chemistry, biology and the pharmaceutical field. Consequently, there is growing interest in developing new, rapid and efficient technologies for the construction and screening of large collections (“libraries”) of organic compounds. DNA-encoded chemical libraries represent an innovative and brilliant solution to this problem. Essentially, these techniques involve the construction of libraries of organic compounds in which each member is covalently conjugated to a specific DNA fragment that unambiguously “encodes” its identity. The selection of compounds of interest with specific biological activities (“screening”) from DNA-encoded libraries can therefore be performed easily, for example by incubating the library with the appropriate biological target immobilized on a solid support. After the non-binding compounds have been removed by appropriate washing of the support, modern high-throughput sequencing techniques make it possible to sequence the specific DNA codes, to determine the composition of the library before and after selection and, consequently, to identify the compounds that actually interact with the biological target of interest. From this point of view, DNA-encoded chemical libraries bear an intrinsic analogy to the phage libraries used in phage display, in which each protein or peptide (“phenotype”) is physically associated with the corresponding coding gene (“genotype”).

The first part of this thesis describes a general strategy for the construction of DNA-encoded chemical libraries and the implementation of high-throughput sequencing techniques for the identification and relative quantification of library members before and after selection. The methodology is exemplified here by the construction of a DNA-encoded chemical library containing 4’000 compounds (DEL4000), each uniquely identified by specific covalently conjugated DNA oligonucleotides. The relative composition of the library and its functionality were then determined by performing selection experiments with streptavidin immobilized on sepharose resin. These studies led to the identification of new chemical compounds with sub-micromolar dissociation constants towards streptavidin and also demonstrated that high-throughput sequencing techniques (termed “454 technology”), originally developed for genome sequencing, can be used effectively for decoding the selections.

In a second selection using DEL4000, compounds specific for polyclonal human IgG were identified. It was then shown that these compounds, after immobilization on chromatographic resin, can be used for the affinity purification of human IgG from cell culture supernatants. Finally, a selection was performed to identify new compounds specific for the catalytic domain of human matrix metalloproteinase 3 (MMP3). Matrix metalloproteinases (MMPs) are a family of zinc-dependent proteases involved in tissue remodelling in a variety of physiological and pathological processes. The selection allowed the identification of a compound with a micromolar dissociation constant.

Encouraged by these results, we decided to extend our research to the construction of a larger DNA-encoded chemical library, envisaging the sequential conjugation of at least three independent sets of compounds, using orthogonal chemical reactions and/or protection/deprotection strategies for functional groups, followed by the introduction of the corresponding DNA codes. The possibility of constructing a DNA-encoded chemical library containing over one million compounds was thus demonstrated. The construction of this library (DEL10e6) is currently in progress.

List of abbreviations

aq. aqueous

ATP Adenosine-5'-triphosphate

bp base pair

CAII Carbonic Anhydrase II

CHO Chinese Hamster Ovary

CNBr Cyanogen bromide

Cy3 2-((1E,3E)-3-(1-(5-(2,5-dioxopyrrolidin-1-yloxy)-5-oxopentyl)-3,3-dimethylindolin-2-ylidene)prop-1-enyl)-3,3-dimethyl-1-propyl-3H-indolium

Cy5 2-((1E,3E,5E)-5-(1-(5-(2,5-dioxopyrrolidin-1-yloxy)-5-oxopentyl)-3,3-dimethylindolin-2-ylidene)penta-1,3-dienyl)-1,3,3-trimethyl-3H-indolium

DCM Dichloromethane

DEL DNA Encoded Library

DIEA N,N-Diisopropylethylamine

DMBAA Dimethylbutylammonium acetate

DMF N,N-Dimethylformamide

DMSO Dimethyl sulfoxide

DMT dimethoxytrityl

DNA Deoxyribonucleic acid

dNTPs deoxyribonucleotides

DTT Dithiothreitol

ECM Extracellular Matrix

EDC N-ethyl-N'-(3-dimethylaminopropyl)-carbodiimide

EDTA Ethylenediaminetetraacetic acid

equiv. equivalent

ESAC Encoded Self-Assembling Chemical library

ESI Electrospray ionization

FG Functional Group

FITC Fluorescein isothiocyanate

Fmoc (9-fluorenylmethoxycarbonyl)

HATU O-(7-Azabenzotriazol-1-yl)-1,1,3,3-tetramethyluronium hexafluorophosphate

HBTU 2-(1H-benzotriazol-1-yl)-1,1,3,3-tetramethyluronium hexafluorophosphate

HFIP 1,1,1,3,3,3-hexafluoroisopropanol

HOBt N-hydroxybenzotriazole

HPLC High Performance Liquid Chromatography

HSA Human Serum Albumin

HTS High Throughput Screening/Sequencing

IgG Immunoglobulin G

Kd Dissociation constant

LC Liquid Chromatography

MMP3 human Matrix MetalloProteinase 3

MS Mass Spectrometry

ND NanoDrop

NHS N-hydroxysuccinimide

NMR Nuclear magnetic resonance

Nvoc 4,5-dimethoxy-2-nitrobenzyloxycarbonyl

PAGE Polyacrylamide gel electrophoresis

PBS Phosphate buffered saline

PCR Polymerase Chain Reaction

Prep Preparative

RNA Ribonucleic acid

RP Reverse Phase

SDS sodium dodecyl sulfate

SNP Single Nucleotide Polymorphism

SOLiD Sequencing by Oligonucleotide Ligation and Detection

sst single strand

TBE Tris-borate-EDTA

TEAA Triethylammonium acetate

THF Tetrahydrofuran

TFA Trifluoroacetic acid

TFE Trifluoroethanol

Tris 2-Amino-2-hydroxymethyl-propane-1,3-diol

tSMS True Single Molecule Sequencing

Tween 20 Polyoxyethylene (20) sorbitan monolaurate

UV Ultraviolet

2. INTRODUCTION

The discovery of molecules binding to macromolecular targets is a formidable task in chemistry, biology and pharmaceutical sciences. Following the sequencing of the human genome [1,2] and the advances in proteome research [3,4] and transcriptomics [5], a multitude of biological targets associated with relevant processes in healthy and diseased cells have been discovered. With an aging population and an increased understanding of the mechanisms of disease at a molecular level, biomedical scientists are facing the demand for more and better drugs. Additionally, elucidation of the biological function of proteins will, in many cases, require access to specific ligands (an approach that is often termed “Chemical Genetics” [4]). Specific binding to the biological target is not per se sufficient to turn a binding molecule into a drug, as it is widely recognized that other molecular properties (such as pharmacokinetic behaviour and stability) contribute to the performance of a drug. Nevertheless, the isolation of specific binders against a relevant biological target typically represents the starting point in the process that leads to a new drug [6].

Techniques for the general, fast and inexpensive isolation of small organic binding compounds are lacking at present. Currently, hundreds of thousands of molecules typically have to be screened in order to find a suitable candidate [6]. High-throughput screening (HTS) in certain cases allows the screening of some 100,000 compounds per day. However, HTS is cumbersome both in terms of costs (for robotic equipment and material consumption) and technical development (set-up of sophisticated bioassays, storage and handling of the chemical archives). Similarly, the preparation, storage and screening of very large synthetic libraries of organic molecules can be very demanding, not only from the synthetic point of view, but also in terms of logistics. Although combinatorial synthetic approaches such as the “split-&-pool” methods [7,8,9] and solid phase synthesis [10,11,12] have facilitated the construction of chemical pools of compounds, the complexity associated with identifying the specific binding molecules inevitably grows with the size of the chemical library to be screened, while the relative concentration of each individual member in the library decreases. Consequently, chemical libraries prepared as pools of compounds are often limited in size by the sensitivity limits of biochemical assays and of the chemical analytical methods used for structural characterization.

Over the last decade, the interest in the development of powerful and convenient technologies for chemical library construction and screening has increased dramatically. Techniques such as phage display [13,14], yeast display [15], ribosome display [16] and covalent display [17] physically link each displayed polypeptide to the nucleic acid that encodes it. In this light, it would be useful to devise strategies for the identification of small organic molecules capable of binding to target proteins with high affinity and specificity, based on the association of individual chemical compounds with unique DNA fragments serving as identification bar-codes.

2.1 DNA-Encoded Chemical Libraries

The concept of DNA-encoding was first described in a theoretical paper by Brenner and Lerner in 1992, who anticipated a “split-&-pool”-based combinatorial synthesis in which monomeric chemical compounds and coding oligonucleotide tags would be attached to beads in an alternating fashion (Figure 2-1) [18]. Shortly afterwards, the first practical implementation of this approach was presented by S. Brenner and K. Janda [19] and similarly by the group of M.A. Gallop [20]. Brenner and Janda proposed generating individual encoded library members by an alternating parallel combinatorial synthesis of the heteropolymeric chemical compound and the appropriate oligonucleotide sequence on the same bead in a “split-&-pool”-based fashion, using the solid support as a structural linker between the nascent chemical entity and its corresponding oligonucleotide label. As a test system, they developed the synthesis of a functionally active leucine-enkephalin pentapeptide, with the aim of testing the feasibility of alternating peptide and oligonucleotide synthesis on the bead. In total, they accomplished five alternating rounds of peptide and oligonucleotide synthesis [19].

[Figure 2-1: scheme showing amino acids (aa1 to aan) and tags (tag1 to tagn) coupled over m rounds of split-&-pool, followed by release from the beads, giving nᵐ compounds.]

Figure 2-1: Schematic representation of the DNA-encoding of peptides on beads. The coupling of amino acids by peptide-forming reactions to a growing peptide chain, alternated with the stepwise synthesis of a DNA bar-code, leads to DNA-encoded beads displaying peptides, which can be probed for binding to a selected target protein of interest. ‘aa’ represents the different amino acids, while ‘tag’ refers to a DNA sequence encoding the corresponding amino acid added in the split-&-pool procedure.
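The bookkeeping behind encoded split-&-pool synthesis can be sketched in a few lines. This is an illustrative simulation only: the three amino acids, the three-base tags, the number of beads and the random assignment of beads to vessels are assumptions chosen to keep the example small, and do not reproduce the chemistry or the codes used by Brenner and Janda.

```python
import random


def split_and_pool(building_blocks, tags, rounds, beads, seed=0):
    """Simulate encoded split-&-pool synthesis.

    Each bead is a (compound, code) pair. In every round each bead is
    assigned at random to one of the n vessels, receives that vessel's
    building block and the matching tag, and is then pooled again.
    """
    rng = random.Random(seed)
    pool = [("", "")] * beads
    for _ in range(rounds):
        next_pool = []
        for compound, code in pool:
            i = rng.randrange(len(building_blocks))    # split
            next_pool.append((compound + building_blocks[i], code + tags[i]))
        pool = next_pool                                # pool
    return pool


amino_acids = ["G", "F", "L"]     # hypothetical building blocks
codons = ["CAC", "CTT", "TCA"]    # hypothetical tags, one per amino acid
library = split_and_pool(amino_acids, codons, rounds=3, beads=2000)

# Reading the tag three bases at a time recovers the synthesis history.
decode = dict(zip(codons, amino_acids))
for compound, code in library:
    read = "".join(decode[code[i:i + 3]] for i in range(0, len(code), 3))
    assert read == compound

distinct = {compound for compound, _ in library}
print(len(distinct), "distinct compounds; theoretical maximum", len(amino_acids) ** 3)
```

With n building blocks and m rounds the pool can contain at most nᵐ different compounds, and each bead carries a code that records which building block it received in every round.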

Controlled-pore glass was used as a solid matrix to facilitate efficient oligonucleotide synthesis. The solid support was derivatized with a succinyl aminohexanol-sarcosine appendage that allowed the easy detachment of the oligonucleotide-encoded peptide after synthesis (Figure 2-2a). In order to fulfill orthogonality requirements, O-DMT-protected serine and N-Fmoc-protected lysine scaffolds were used for the attachment of the emerging oligonucleotide and peptide sequences, respectively (Figure 2-2a). The oligonucleotide-tagged peptides were released from the beads and Edman-sequenced. The leucine-enkephalin pentapeptide (YGGFL) constructed in this fashion (Figure 2-2b) was shown to bind to the anti-leucine-enkephalin antibody 3-E7 [21] as efficiently as the reference peptide (Kd = 7.1 nM). Remarkably, the codes of released oligonucleotide-tagged peptides could be amplified by standard polymerase chain reaction (PCR).

[Figure 2-2: a) structure of the derivatized support, labelled with the site for the nascent peptide, the cleavage site and the site for the oligonucleotide code. b) Oligonucleotide 5’-AGCTACTTCCCAAGG GAG CTG CTG CTA GTC GGGCCCTATTCTTAG-3’, with a PCR priming site at each end, joined through the linker to the peptide sequence.]

Figure 2-2: Derivatized solid support allows the oligonucleotide encoding of a nascent peptide sequence. a) Schematic representation of the derivatized support with the cleavable succinyl aminohexanol-sarcosine appendage. The cleavable linker enables the easy detachment of the oligonucleotide-encoded peptide after synthesis, while O-DMT-protected serine and N-Fmoc-protected lysine allow the bidirectional synthesis of oligonucleotide and peptide sequences. Recent approaches to DNA-encoded chemical libraries omit the beads and link the compounds directly to DNA. b) Leucine-enkephalin pentapeptide (YGGFL) oligonucleotide conjugate after release from the beads. The codes of released oligonucleotide-tagged peptides could be amplified by standard polymerase chain reaction (PCR). The leucine-enkephalin pentapeptide was shown to bind to the anti-leucine-enkephalin antibody 3-E7 as efficiently as the reference peptide (Kd = 7.1 nM).

In the same year, Gallop and co-workers constructed an 823,543-member DNA-encoded heptapeptide library by performing seven alternating split-&-pool synthesis cycles on spherical beads using seven different D- and L-amino acid building blocks [20]. Beads were conjugated with a mixture of two different linkers: one, carrying a DMT-protected hydroxyl group, served for the stepwise nucleotide addition, while the other, carrying an Fmoc-protected amine and present in ca. 20-fold excess over the first, was used for building up the polypeptide. After removal of the Fmoc group the beads were uniformly split into seven pools and each pool was reacted with one of the seven amino acid building blocks. A dinucleotide coding tag was synthesized on the beads of the individual pools, and this process was repeated until the heptapeptide had been obtained. An additional oligonucleotide sequence was attached to all beads to allow PCR-based decoding. Because the final cleavage step in trifluoroacetic acid would lead to the depurination of deoxyguanosine and deoxyadenosine, these nucleosides were deliberately excluded from the oligonucleotide. The final library was subjected to on-bead screening against the fluorescent monoclonal antibody D32.39, which specifically binds the heptapeptide RQFKVVT. The corresponding oligonucleotide sequence could be revealed after FACS-based sorting and PCR.

Since unprotected DNA is restricted to a narrow window of conventional reaction conditions, a number of alternative chemical and physical encoding strategies were envisaged until the end of the 1990s (e.g. MS-based compound tagging, peptide encoding, haloaromatic tagging, encoding by secondary amines and semiconductor devices) [22], mainly to avoid inconvenient solid-phase DNA synthesis and to create combinatorial libraries that could easily be screened in a high-throughput fashion.

There is considerable evidence that the isolation of binding polypeptides (e.g. antibodies) requires libraries comprising at least 10⁷-10⁸ members [23]. By analogy, it appears reasonable to assume that large libraries will facilitate the isolation of small organic binders to proteins of interest. However, using conventional methods, even the largest pharmaceutical companies cannot screen more than a few hundred thousand compounds in an HTS campaign. The selective amplifiability of DNA greatly facilitates library screening and becomes indispensable for the encoding of libraries of organic compounds of this unprecedented size. Consequently, at the beginning of the 2000s DNA-encoded combinatorial chemistry experienced a revival.
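The gap between these library sizes and conventional screening capacity can be made concrete with a short calculation. The sketch below takes the figure of some 100,000 compounds per day quoted earlier in this introduction as the throughput and assumes, for simplicity, that every compound is screened individually and exactly once.

```python
def hts_days(library_size, compounds_per_day=100_000):
    """Days needed to screen a library one compound at a time."""
    return library_size / compounds_per_day


# 7**7 is the heptapeptide library built from seven building blocks in
# seven cycles; 10**7 and 10**8 are the library sizes discussed above.
for size in (7 ** 7, 10 ** 7, 10 ** 8):
    print(f"{size:>11,d} compounds: {hts_days(size):8.1f} days")
```

Under these assumptions a library of 10⁸ members would occupy a screening facility for about 1,000 days, whereas a selection on a pooled, DNA-encoded library handles all members in a single experiment.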


Around 2002 several groups realized that omitting beads and attaching chemical compounds directly to oligonucleotides or DNA fragments could conveniently lead to very large DNA-encoded chemical libraries. The set-up of DNA-encoded chemical libraries (DEL) was pursued by investigating completely novel avenues. The resulting libraries can be grouped into those presenting a single oligonucleotide and those presenting multiple oligonucleotides, each oligonucleotide displaying one covalently linked putative binding molecule (Figure 2-3).

[Figure 2-3: panel a, multiple pharmacophore format; panel b, single pharmacophore format.]

Figure 2-3: Schematic representation of DNA-encoded library displaying chemical compounds directly attached to oligonucleotides. a) DNA-encoded library presenting multiple pairing oligonucleotides each displaying a covalently linked binding molecule. b) DNA-encoded library presenting a single oligonucleotide covalently linked to a putative binding molecule.

2.1.1 Libraries of DNA displaying one covalently linked chemical entity

2.1.1.1 DNA-encoded “Split-&-Pool”

An alternative strategy for constructing DNA-encoded libraries, in full analogy with the encoded “split-&-pool” technique described by Brenner and Lerner [18], features the synthesis of chemical compounds directly on the oligonucleotide, omitting the solid support (i.e., beads) (Figure 2-4). Initially, a set of unique oligonucleotides, each containing a specific sequence, is chemically conjugated to a corresponding set of small organic molecules carrying a suitable reactive group. Typically, a carboxylic acid is coupled to an amino-modified oligonucleotide. Subsequently, the oligonucleotide-conjugated compounds are mixed and divided into a number of groups.

[Figure 2-4: scheme showing building blocks (bb1 to bbn) carrying a reactive site and their tags (tag1 to tagn) taken through m rounds of split-&-pool, giving nᵐ compounds.]

Figure 2-4: Schematic representation of hypothetical DNA-encoded libraries of linear peptides constructed in a split-&-pool fashion without bead support. An initial building block is conjugated to an oligonucleotide and encoded with a further set of oligonucleotides, either by ligases or by polymerase. Subsequently, the oligonucleotide-conjugated compounds are mixed, divided into a number of groups and reacted with an additional building block. Following encoding, these steps are repeated a given number of times. ‘bb’ represents the different building blocks, while ‘tag’ refers to a DNA sequence encoding the corresponding building block added in the split-&-pool procedure.

Under appropriate conditions, a second set of building blocks is coupled to the first one, and a further oligonucleotide coding for the second modification is hybridized to the initial oligonucleotide and enzymatically encoded either by ligases or by polymerase. These steps are then repeated in a “split-&-pool” fashion. In 2002 the Danish company Nuevolution and the US company Praecis filed patent applications for proprietary enzymatic ligation strategies for DNA code assembly, enabling sequential chemical synthesis and DNA-tagging steps [24,25,26,27]. Thus far, the two companies have not described practical library applications in the literature.
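Because each coupling step adds one coding fragment, a sequenced code can be read back step by step into the building blocks that were used. The following sketch shows this decoding for two steps; the four-base tags, the building-block names and the fixed tag length are hypothetical and serve only to illustrate the principle.

```python
# Hypothetical codebooks: one per synthesis step, mapping a 4-base tag
# to the building block added in that step.
CODEBOOKS = [
    {"ACGT": "bb1-A", "TGCA": "bb1-B"},    # first coupling step
    {"GGAT": "bb2-A", "CCTA": "bb2-B"},    # second coupling step
]
TAG_LENGTH = 4


def decode(read):
    """Translate a concatenated DNA code into the ordered building blocks."""
    if len(read) != TAG_LENGTH * len(CODEBOOKS):
        raise ValueError("unexpected code length")
    blocks = []
    for step, book in enumerate(CODEBOOKS):
        tag = read[step * TAG_LENGTH:(step + 1) * TAG_LENGTH]
        if tag not in book:
            raise ValueError(f"unknown tag {tag!r} at step {step + 1}")
        blocks.append(book[tag])
    return blocks


print(decode("TGCAGGAT"))
```

Keeping a separate codebook for every step means that the position of a tag within the code identifies the reaction step, so the same tag sequence never has to be unique across the whole library.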

2.1.1.2 DNA-assisted “Split-&-Pool”

In 2004, D.R. Halpin and P.B. Harbury presented a novel method for the construction of DNA-encoded libraries [28]. For the first time, the DNA-conjugate templates served both for encoding and for programming the infrastructure of the “split-&-pool” synthesis of the library components. The design of Halpin and Harbury enabled alternating rounds of selection, amplification and diversification with small organic molecules, in complete analogy to phage-display technology.

In a further milestone paper on DNA-encoded chemical libraries, Halpin and Harbury demonstrated the efficiency of a unique DNA-routing machinery, consisting of series-connected columns bearing resin-bound anticodons, which could sequence-specifically separate a population of DNA templates into spatially distinct locations by hybridization (termed DNA-routing) (Figure 2-5).28 A 340-mer oligonucleotide template combinatorial library was constructed in two steps by PCR assembly of overlapping complementary 40-mer oligonucleotides, each containing a 20-base coding region and an adjacent 20-base non-coding constant region. In this way, a 10⁸-member 340-mer DNA-duplex template library was obtained, which was further converted into single-stranded DNA format by reverse transcription and sodium hydroxide hydrolysis of the RNA strand. These templates were used to investigate the feasibility of sequence-specific gene routing. A number of anticodon columns were produced in which the anticodon sequences complementary to the template genes were covalently coupled to sepharose resin. Under high-salt conditions, the template genes hybridized sequence-specifically to the corresponding anticodon columns connected in series. The individual sequence-specific columns were then joined in series with weak anion-exchange (DEAE) columns. Upon changing the conditions from high salt to low salt and 50% DMF, the oligonucleotides were eluted from the anticodon columns and bound to the DEAE columns, where chemical reactions can take place. Following elution from the DEAE columns under high-salt conditions, the combined DEAE column eluates were split again by sequence-specific columns, thus entering a new cycle of “split-&-pool” synthesis. Using a radioactively labelled 340-mer template, the authors showed that the routing was indeed both sequence-specific and efficient (>95% for anticodon to DEAE column and >90% for DEAE to anticodon column), resulting in an overall yield of 0.85ⁿ for n hybridization rounds. Furthermore, the anticodon columns proved to be reusable for at least 30 rounds of hybridization and elution.
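The overall routing yield quoted above can be reproduced with a few lines of Python. This is only a minimal sketch: it assumes the lower bounds of 95% and 90% reported for the two transfers, treats every round as independent, and uses an illustrative function name that does not come from the original work.

```python
def routing_yield(n_rounds, to_deae=0.95, to_anticodon=0.90):
    """Fraction of template expected to survive n_rounds of routing.

    Assumption: each round consists of one anticodon-to-DEAE transfer and
    one DEAE-to-anticodon transfer, with constant efficiencies.
    """
    per_round = to_deae * to_anticodon  # 0.855, quoted as about 0.85 in the text
    return per_round ** n_rounds


if __name__ == "__main__":
    for n in (1, 3, 6):
        print(f"{n} round(s): {routing_yield(n):.3f}")
```

Under these assumptions, a six-step synthesis would retain somewhat more than one third of the input template, which illustrates why high per-transfer efficiencies matter for multi-step routing.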

[Figure 2-5 scheme: the template, carrying coding regions (a-j)1-6 and constant regions z1-7, passes through the steps Split, Coupling and Pool; six rounds of split-&-pool are performed.]

Figure 2-5: Synthesis of a DNA-encoded chemical library by ‘DNA-routing’. The initial oligonucleotide template contains six coding regions for ten different amino acids [(a-j)1-6] as well as seven constant domains (z1-7). The library of coding oligonucleotides, comprising all possible combinations of the different coding regions, was split by affinity chromatography using specific complementary oligonucleotides bound to resin [(a*-j*)1-6]. Following separation, each oligonucleotide template was conjugated to the corresponding amino acid and the templates were subsequently pooled. The whole cycle was repeated six times in total, yielding a library of DNA-encoded hexapeptides.

According to this split-and-pool protocol (Figure 2-5), a combinatorial library composed of 10⁶ N-acylated pentapeptides conjugated to 340-mer oligonucleotides was generated.29 Ten different amino acid building blocks were used for the first positions and nine carboxylic acids for the N-acylation step. The library included acylated leucine-enkephalin pentapeptides as positive control. After conversion into a DNA duplex form, the library was subjected to an affinity-based selection against the monoclonal antibody 3-E7, which was known to bind the leucine-enkephalin pentapeptide YGGFL with 7.1 nM affinity30 (the same selection system was used by Brenner and Janda in 1993)19. Two iterated cycles of panning were performed. The eluted DNA from the first round was PCR-amplified and used as input for the following round of synthesis and selection. Sequencing of both the input DNA and the DNA eluted after two rounds of panning demonstrated a strong round-to-round enrichment of leucine-enkephalin pentapeptide DNA conjugates, leading to a consensus sequence matching leucine-enkephalin.

To confirm that the coding sequences did not bias the synthesis of leucine-enkephalin DNA-conjugates, an analogous DNA-pentapeptide library was constructed, differing only in the coding sequences. Selections performed with this library also showed a 10⁵-fold enrichment of the leucine-enkephalin encoded compound.

This novel embodiment of “split-&-pool” library construction, together with the possibility of chemical translation and diversification, holds promise for the construction of large DNA-encoded chemical libraries. While the set-up of the routing technology seems tedious at first glance, exponentially larger libraries can be constructed with only a linear increase in work. Yet, chemistry has so far been limited to peptide synthesis. In an additional publication, Harbury and co-workers describe the feasibility and efficiency of solid-phase peptide synthesis on unprotected DNA.31 Yields over 90% per individual coupling step could be achieved, which might be sufficient for the construction of large libraries. Future selection experiments will reveal whether the accumulation of synthesis failure sequences from step to step encumbers the identification of the best binders. From a drug discovery point of view, the linear peptides which have so far been produced by this approach may not represent the drug-like structures the pharmaceutical industry is interested in.32 Nonetheless, the potential of this technology can probably be increased by enlarging the repertoire of building blocks and expanding the range of chemical reactions.
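The concern about accumulating failure sequences can be made concrete with a short calculation. The following Python sketch is an illustration only: it assumes a constant yield per coupling step (0.90 is the lower bound mentioned above) and independent steps, and the function names are hypothetical.

```python
def full_length_fraction(step_yield, n_steps):
    """Fraction of templates that received every building block.

    Assumption: the same yield applies to every coupling step and the
    steps are independent of one another.
    """
    return step_yield ** n_steps


def failure_fraction(step_yield, n_steps):
    """Fraction of templates missing at least one building block."""
    return 1.0 - full_length_fraction(step_yield, n_steps)


if __name__ == "__main__":
    for steps in (1, 3, 5, 6):
        print(steps, round(full_length_fraction(0.90, steps), 3),
              round(failure_fraction(0.90, steps), 3))
```

With 90% per step, roughly half of the templates of a six-step synthesis would carry an incomplete product, which is why higher coupling yields or a purification step become increasingly important as the number of steps grows.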

2.1.1.3 DNA-templated synthesis

In 2001, David Liu and co-workers showed that complementary DNA oligonucleotides can be used to assist certain synthetic reactions which do not take place efficiently in solution at low concentration.33,34 At the same time, Summerer and Marx demonstrated that bringing reagents into close spatial proximity may enhance reaction rates.35 Indeed, a DNA-heteroduplex can be used to accelerate the reaction between chemical moieties displayed at the extremities of the two DNA strands.33,34 D.R. Liu and co-workers were the first to show an efficient series of solution-phase DNA sequence-programmed chemical reactions. In these reactions, oligonucleotides carrying one reactive chemical group are hybridized to complementary oligonucleotide derivatives carrying a different reactive chemical group (Figure 2-6).36 The close proximity conferred by DNA hybridization drastically increases the effective molarity of the reagents attached to the oligonucleotides, enabling the desired reaction to occur even in an aqueous environment at concentrations several orders of magnitude lower than those needed for the corresponding conventional, non-templated organic reaction.36 A variety of oligonucleotide derivatives can be paired and used to discover novel chemical reactions.36,37

Figure 2-6: DNA sequence-programmed chemical reactions: schematic overview of the reactions compatible with the ‘DNA-templated synthesis’ approach. The close proximity conferred by the DNA hybridization drastically increases the effective molarity of the reaction reagents attached to the oligonucleotides, enabling the desired reaction to occur. (Adapted from Li, X. and Liu, D.R.36)


To a certain extent, this proximity effect, which accelerates bimolecular reactions, is distance-independent (at least within a distance of 30 nucleotides), allowing the introduction of variable DNA coding regions at different positions on the oligonucleotide template. These DNA-templated reactions can be performed in multiple consecutive steps38 and in a step-programmed fashion39. Crucially, by linking chemical compounds directly to DNA, a linkage of phenotype and genotype may be established, in full analogy to protein display methodologies. Subsequently, the information content can be amplified by PCR after affinity capture. In a later step, sequence-programmed synthesis of DNA-conjugates may facilitate library amplification after selection. The selection efficiency which can be achieved with DNA-encoded binding molecules and affinity capture was investigated by performing selections on glutathione S-transferase with suitable inhibitors, revealing enrichment factors of the cognate DNA derivatives of up to 10,000-fold.40 Recently, Liu and co-workers described the DNA-templated set-up of a small library of macrocycles, which they subjected to in vitro selection (Figure 2-7).41 For this purpose, a DNA-template library of 48-mer oligonucleotides carrying an amino group at the 5’ end and containing three consecutive coding regions was used. A lysine was coupled to the primary amino group at the oligonucleotide extremity by amide bond formation. The lysine was ε-protected by acylation with a compound containing a vicinal diol, whose cleavage yields an aldehyde that serves for the final ring-closing step through a Wittig olefination. Initially, a code-1 complementary 10-mer oligonucleotide, carrying both a biotin at its 5′ end and an amino acid N-protected with a base-labile cleavable linker at its 3′ terminus, was hybridized to the template. The free carboxylic acid moiety of the protected amino acid was activated as a sulfo-N-hydroxysuccinimidyl ester and reacted with the free amino group displayed on the 48-mer template oligonucleotide to form an amide bond. The resulting covalent conjugate was purified by capture on avidin-coated beads, which retained all biotin-containing fragments and thus allowed residual, non-conjugated 48-mer template oligonucleotide to be washed away.

[Figure 2-7 scheme: a library of n DNA templates undergoes three rounds of annealing and DNA-templated reaction with reagent libraries 1-3, followed by ring closure, selection with the target protein, PCR amplification of the enriched conjugates and DNA sequencing; enriched library members can be reconstituted to enter the next round.]

Figure 2-7: Schematic representation of a DNA-encoded library by ‘DNA-templated synthesis’. A library of oligonucleotides (i.e., 64 different oligonucleotides) containing three coding regions was hybridized to a library of reagent compound-oligonucleotide conjugates (i.e., 4 reagent oligonucleotide conjugates) capable of pairing with the initial coding domain of the template oligonucleotide. After transfer of the compounds onto the corresponding oligonucleotide template, the synthesis cycle was repeated the desired number of times with further sets of carrier compound-oligonucleotide conjugates (i.e., two rounds with four carrier compound-oligonucleotide conjugates per round). Subsequently, a functional selection was performed and the sequence of the binding template was amplified by PCR. DNA sequencing then allowed the identification of the binding molecule. In the construction of the 65-member library, the 65th template, which served as positive control, was also subjected to the DNA-templated synthesis scheme.

By increasing the pH, the base-labile linker could be cleaved and the reaction product (i.e., the α-amino acylated 48-mer DNA fragment) could be eluted. This procedure was repeated with an additional code-2 specific 12-mer reagent oligonucleotide and a code-3 specific 12-mer reagent oligonucleotide. In the last coupling step, the reagent amino acid building block was connected to the oligonucleotide not by a base-labile linker, but by a linker containing a phosphonium group. After the third conjugation step and avidin-coated resin purification, the vicinal diol linker on the ε-amino group of the lysine attached to the 48-mer template was cleaved by periodate and the resulting aldehyde could undergo a Wittig olefination to form a fumaramide, leading to ring closure to a macrocycle. Since the P–C bond between reagent oligonucleotide and template oligonucleotide was broken in the course of the Wittig reaction, the desired macrocycle-template conjugate self-eluted from the avidin beads. The authors generated a library of 65 fumaramide macrocycles, starting from four building blocks for each of the three synthetic steps plus one additional aryl sulfonamide building block in the first step, which was known to bind to carbonic anhydrase with nanomolar affinity. The DNA template of the positive control included an NlaIII restriction site, which facilitated the monitoring of the enrichment after selection by polyacrylamide gel electrophoresis (PAGE) following PCR amplification and NlaIII digestion. 100 fmol of the DNA-conjugate macrocycle library were subjected to an in vitro selection against immobilized carbonic anhydrase. In a further pseudo-round of selection, the eluted DNA was again loaded onto a carbonic anhydrase column. To decode the positive control binder, the DNA was PCR-amplified and NlaIII-digested before selection and after each elution. Liu and co-workers demonstrated that a significant enrichment of the positive control oligonucleotide-macrocycle conjugate was detectable after the second elution. However, the decoding method described in the paper41 was quite rudimentary and not directly applicable to libraries of larger size. Furthermore, the possibility to re-synthesize the unbiased library after selection was not demonstrated.

Assisting oligonucleotide strands and proximity-based chemical reactions may represent an alternative to “split-&-pool” strategies for the construction of large libraries in solution. While amide bond forming reactions have so far been used for library construction, it is expected that different chemistries may be used in order to generate non-peptidic structures. The group of Liu considered a variety of other possible reactions which may occur in the presence of DNA (Figure 2-6).36 Additionally, even though the overall yields for the multi-step synthesis of DNA-encoded compounds were not excellent (approx. 5% over three steps), the use of avidin resins for product purification contributed to the purity of the library compounds. Nevertheless, quality control of library synthesis may become more difficult for libraries of larger size. In this light, the use of DNA-templated synthesis, as described by D.R. Liu and co-workers, for constructing libraries with complexities of pharmaceutical interest remains at present a formidable challenge.
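The library size and the yield figures given for the macrocycle library can be checked with a brief Python sketch. The building block labels are placeholders rather than the compounds used by Liu and co-workers, and the per-step figure assumes, for illustration only, that the approx. 5% overall yield is spread evenly over the three steps.

```python
from itertools import product

# Placeholder labels for the four building blocks used in each of the three steps.
blocks = ["A", "B", "C", "D"]

# Every combination of one building block per step: 4 * 4 * 4 templates.
combinatorial_members = list(product(blocks, repeat=3))

# One additional template encodes the aryl sulfonamide positive control.
library_size = len(combinatorial_members) + 1

# Geometric-mean yield per step implied by an overall yield of 5% over three steps.
overall_yield = 0.05
average_step_yield = overall_yield ** (1 / 3)

if __name__ == "__main__":
    print(library_size)
    print(round(average_step_yield, 2))
```

An even split is a simplification; in practice the individual steps (couplings and ring closure) are likely to differ in efficiency.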

2.1.1.4 Stepwise coupling of coding DNA fragments to nascent organic molecules

A promising strategy for the construction of DNA-encoded libraries is the use of multifunctional building blocks covalently conjugated to an oligonucleotide serving as a “core structure” for library synthesis. In a ‘split-&-pool’ fashion, a set of multifunctional scaffolds can undergo orthogonal reactions with series of suitable reactive partners. Following each reaction step, the identity of the modification can be encoded by the enzymatic addition of a DNA segment to the original DNA “core structure” (e.g., by ligation, Figure 2-8). This feature has been exploited for the first time by our group.42,43 Initially, we envisaged the use of either a variety of N-protected amino acids or diene carboxylic acid derivatives. N-protected amino acids covalently attached to a DNA fragment allow, after a suitable deprotection step, a further peptide bond formation with a series of carboxylic acids or a reductive amination with aldehydes. Similarly, diene carboxylic acids, used as scaffolds for library construction at the 5’-end of amino-modified oligonucleotides, can be subjected to a Diels-Alder reaction with a variety of maleimide derivatives.

[Figure 2-8 scheme: a building block bearing the functional groups FG1 and FG2 passes through two successive cycles of Split / Pool, Reaction and Encoding.]

Figure 2-8: Schematic representation of a DNA-encoded library by stepwise coupling of coding DNA fragments to nascent organic molecules. An initial set of multifunctional building blocks (FGn represents the different orthogonal functional groups) is covalently conjugated to the corresponding encoding oligonucleotides and reacted in a split-&-pool fashion on a specific functional group (FG1 in red) with a suitable collection of reagents. Following enzymatic encoding, a further round of split-&-pool is initiated. At this stage, the second functional group (FG2 in blue) undergoes an additional reaction step with a different set of suitable reagents. The identity of the final modification is again ensured by enzymatic DNA encoding by means of a further oligonucleotide carrying a specific coding region.

After completion of the desired reaction step, the identity of the chemical moiety added to the oligonucleotide can be established by annealing a partially complementary oligonucleotide and by a subsequent Klenow fill-in DNA polymerization, yielding a double-stranded DNA fragment. The synthetic and encoding strategies described above enable the facile construction of DNA-encoded libraries of up to 10⁴ member compounds carrying two sets of “building blocks”. However, the stepwise addition of at least three independent sets of chemical moieties to a tri-functional core building block, for the construction and encoding of a very large DNA-encoded library (comprising up to 10⁶ compounds), can also be envisaged (see Chapter 3.3).
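The principle of stepwise encoding can be sketched in Python as follows. This is a simplified illustration, not the actual library design: the six-base codes and compound names are hypothetical, and the constant regions needed for annealing, Klenow fill-in and PCR are omitted.

```python
from itertools import product

# Hypothetical codes; real tags also carry constant regions for annealing and PCR.
CODES_STEP1 = {"ACGTAC": "scaffold_1", "TGCATG": "scaffold_2"}
CODES_STEP2 = {"GATTCA": "reagent_1", "CTAAGT": "reagent_2", "AGGCTT": "reagent_3"}
CODE_LEN = 6


def encode(code1, code2):
    """Build the tag of a two-step compound by joining the code of each step."""
    return code1 + code2


def decode(tag):
    """Recover the two building blocks from a tag produced by encode()."""
    first = CODES_STEP1[tag[:CODE_LEN]]
    second = CODES_STEP2[tag[CODE_LEN:2 * CODE_LEN]]
    return first, second


# Each compound of the split-&-pool library is identified by a unique tag.
library = {
    encode(c1, c2): (b1, b2)
    for (c1, b1), (c2, b2) in product(CODES_STEP1.items(), CODES_STEP2.items())
}

if __name__ == "__main__":
    print(len(library))
    print(decode("ACGTACGATTCA"))
```

In this toy case two scaffolds and three reagents give six encoded compounds; two sets of 100 building blocks would give the 10⁴ members mentioned above in the same way.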

Importantly we have found that selections of DNA-encoded chemical libraries can conveniently be decoded after PCR amplification of the DNA-tags using recently described high-throughput DNA sequencing technologies (such as “454 technology”), which had originally been developed for genome sequencing (see Chapter 2.2.2).44 Recent advances in ultra high-throughput DNA sequencing allow the sequencing of over one million sequence tags per sequencing run (see Chapter 2.2.2)44,45 and may thus allow the decoding of DNA-encoded libraries containing millions of chemical compounds.
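Decoding by sequencing essentially amounts to counting how often each code occurs before and after selection. The Python sketch below illustrates the idea under simplifying assumptions: reads are taken to be already trimmed to the coding region and error-free, and the function names and the pseudo-frequency for unseen codes are illustrative choices.

```python
from collections import Counter


def code_frequencies(reads):
    """Relative frequency of each code among a list of sequence reads."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}


def enrichment(before_reads, after_reads, pseudo_frequency=1e-6):
    """Ratio of code frequencies after versus before selection.

    Codes absent from the 'before' sample are assigned pseudo_frequency
    (an assumption) to avoid division by zero.
    """
    before = code_frequencies(before_reads)
    after = code_frequencies(after_reads)
    return {
        code: after.get(code, 0.0) / max(before.get(code, 0.0), pseudo_frequency)
        for code in set(before) | set(after)
    }


if __name__ == "__main__":
    unselected = ["AAAC"] * 5 + ["GGTT"] * 5
    selected = ["AAAC"] * 9 + ["GGTT"] * 1
    print(enrichment(unselected, selected))
```

Because every read is a discrete count, the statistical confidence of an enrichment value depends directly on the number of reads obtained per library member.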

2.1.2 DNA libraries displaying multiple covalently linked chemical entities: ESAC libraries

Watson-Crick and Hoogsteen46 base pairing allow the sequence-specific assembly of oligonucleotides to form stable heterodimers and heterotrimers, respectively. Our laboratory has exploited this feature for the combinatorial self-assembly of oligonucleotide-chemical compound conjugates.47 In principle, the self-assembly of two sublibraries of only 10³ members each, containing a constant complementary hybridization domain, can yield after hybridization a combinatorial DNA-duplex library with a complexity of 10⁶ uniformly represented library members (Figure 2-9).

[Figure 2-9 scheme: panels (a) single pharmacophore, (b) affinity maturation with a known binder, (c) duplex library and (d) triplex library, each shown with the target, compound, code and hybridization domain; panel (e) selection and read-out steps I-VI.]

Figure 2-9: ESAC library technology overview. Small organic molecules are coupled to 5’-amino-modified oligonucleotides containing a hybridization domain and a unique coding sequence, which ensures the identity of the coupled molecule. The ESAC library can be used in single pharmacophore format (a), in affinity maturation of known binders (b), or in de novo selections of binding molecules by self-assembly of sublibraries in DNA double-strand format (c) as well as in DNA triplexes (d). The ESAC library in the selected format is used in a selection and read-out procedure (e). Following incubation of the library (i) with the target protein of choice (ii) and washing away of unbound molecules (iii), the oligonucleotide codes of the binding compounds are PCR-amplified and compared with the library without selection on oligonucleotide microarrays (iv, v). Identified binders/binding pairs are validated after conjugation (if appropriate) to suitable scaffolds (vi).

A third strand can be added by introducing Hoogsteen base pairing46. Hoogsteen and reversed-Hoogsteen48,49 base pairing mediate the interaction of a third cognate oligonucleotide with a Watson-Crick DNA double helix. Using a triplex DNA format, three 10³-member sublibraries could yield a 10⁹-member library (Figure 2-9). Each sub-library member would consist of an oligonucleotide containing a variable coding region flanked by a constant DNA sequence, carrying a suitable chemical modification at the oligonucleotide extremity (Figure 2-9). This approach has been termed ESAC (for Encoded Self-Assembling Chemical libraries). In contrast to the library formats described in the previous section (see Chapter 2.1.1), in which only one oligonucleotide in the DNA-heteroduplex carries a chemical group, the ESAC method allows one, two or three oligonucleotides of the assembly to display different pharmacophores. Moreover, each sub-library member can be individually produced and purified by HPLC in nanomolar quantities, thus enabling reliable analytics and quality control. These sublibraries can be used in at least four different embodiments. In a first example, a sub-library can be paired with a complementary oligonucleotide and used as a DNA-encoded library displaying a single covalently linked compound for affinity-based selection experiments (Figure 2-9a). Alternatively, a sub-library can be paired with an oligonucleotide displaying a known binder to the target, thus enabling affinity maturation strategies (Figure 2-9b). In a third embodiment, two individual sublibraries can be assembled combinatorially and used for the de novo identification of bidentate binding molecules (Figure 2-9c). Finally, three different sublibraries can be assembled to form a combinatorial triplex library (Figure 2-9d). The multiple pharmacophore display approaches may lead to high binding affinities by virtue of a simultaneous engagement of adjacent binding sites, thus exploiting the chelate effect in analogy to fragment-based drug discovery.50 The conjugation of two pharmacophores to the two strands of a DNA double helix introduces a spacing of roughly 10-15 Å, with some flexibility between the binding moieties and the core DNA structure. Preferential binders isolated from an affinity-based selection can be PCR-amplified and decoded on complementary oligonucleotide microarrays51,52 (Figure 2-9e) or by concatenation of the codes, subcloning and sequencing53. The individual building blocks can eventually be conjugated using suitable linkers to yield a drug-like high-affinity compound. The characteristics of the linker (e.g. length, flexibility, geometry, chemical nature and solubility) influence the binding affinity and the chemical properties of the resulting binder.
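The practical appeal of self-assembly is that the number of conjugates to be synthesized grows with the sum of the sublibrary sizes, whereas the number of displayed combinations grows with their product. A minimal Python sketch, assuming that every member of one sublibrary can pair with every member of the others (the function name is illustrative):

```python
from math import prod


def esac_library(sublibrary_sizes):
    """Synthetic effort versus displayed complexity of an assembled library.

    Assumption: all sublibrary members pair combinatorially through their
    constant hybridization domains.
    """
    return {
        "conjugates_to_synthesize": sum(sublibrary_sizes),
        "assembled_combinations": prod(sublibrary_sizes),
    }


if __name__ == "__main__":
    print(esac_library([1000, 1000]))        # duplex format
    print(esac_library([1000, 1000, 1000]))  # triplex format
```

For the duplex and triplex formats described above, 2000 and 3000 individually purified conjugates would correspond to 10⁶ and 10⁹ assembled combinations, respectively.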

A first 138-member ESAC library (termed ‘elib1’) consisted of carboxylic acids covalently linked to 5′ amino-modified 48-mer oligonucleotides and contained a biotin-oligonucleotide conjugate as positive control. The library was hybridized with an oligonucleotide conjugated to a cyanine dye (irrelevant for the binding) and subjected to affinity-based selection on streptavidin. A significant enhancement of the biotin-oligonucleotide conjugate signal was observed after selection and microarray-based decoding.47

In a second proof of principle, the 137-member ESAC library was employed in affinity maturation experiments. A dansylamide and a benzoyl sulfonamide conjugated at the 3’ extremity of an oligonucleotide were used as lead binders to human serum albumin (HSA) and bovine carbonic anhydrase II (CAII) respectively. The oligonucleotide derivatives were hybridized with the 137-member library and subjected to selection using immobilized HSA and CAII. Following microarray-based decoding, the enriched binding molecules were linked to the lead-binder with a set of bifunctional linkers of different length and the affinities of the respective conjugates towards the target protein were determined. The simultaneous engagement of the lead-binder and the selected compound led to a 10–40-fold increase in affinity.47

Encouraged by these results, the ‘elib1’ ESAC library was extended from 137 compounds to over 600 compounds and termed ‘elib2’. A further series of bio-panning experiments on streptavidin and HSA was then performed, leading to the identification, after microarray-based read-out, of novel target-specific binding molecules with dissociation constants ranging from the mM to the fM range.54,55 Notably, the screening of the ‘elib2’ ESAC library against HSA allowed the isolation of the 4-(p-iodophenyl)butanoic acid moiety. This compound, discovered by our group, represents the core structure of a series of portable albumin-binding molecules and of Albufluor™, a recently developed fluorescein angiographic contrast agent currently under clinical evaluation.55

Recently, ESAC technology has been used by our group for the isolation of potent inhibitors of bovine trypsin56 and for the identification of novel inhibitors of stromelysin-1 (MMP-3)57, a matrix metalloproteinase involved in both physiological and pathological tissue remodeling processes. Benzamidine, a trypsin inhibitor with an IC50 value in the 100 μM range, was used as lead in an ESAC-based affinity maturation procedure. 5-(4-carbamimidoylbenzylamino)-5-oxopentanoic acid was conjugated at the 3’-end of an amino-modified oligonucleotide and hybridized with a 620-member ESAC sublibrary. After selection using immobilized trypsin and microarray-based decoding, a number of bidentate binders were identified and synthesized, using different linkers to connect the benzamidine moiety to the other pharmacophore identified in the ESAC procedure. The most active inhibitor exhibited an IC50 value of 98 nM, but various other bidentate ligands also revealed a dramatically improved affinity compared to a set of parental benzamidine derivatives, whose IC50 values were in the 11-220 μM range. Similarly, an ESAC library of 550 DNA-encoded chemical compounds was used for the identification of novel inhibitors of stromelysin-1 (MMP-3). After selection on immobilized MMP-3 and microarray-based decoding, the best candidate was conjugated to the amino-modified 3′-extremity of a 24-mer oligonucleotide capable of pairing with the initial 550-member ESAC sublibrary and used as lead for affinity maturation selections. After a second round of selection, the enrichment of one synergistic binding moiety was observed. The newly discovered pharmacophores were used for the synthesis of low-molecular-weight bidentate MMP-3 inhibitors with a series of diamino linkers. The bidentate binder was superior to DNA conjugates displaying the individual pharmacophores or no pharmacophore at all. When the corresponding inhibition constants towards MMP-3 were measured, the best binder exhibited an IC50 value of 9.9 µM.

In most cases, the spatial arrangement and the flexibility of the linker used to conjugate the two pharmacophores identified after ESAC-library selection dramatically influence the binding affinity of the corresponding bidentate ligand. The identification of optimal linkers may sometimes be a tedious procedure. Furthermore, the decoding of ESAC libraries in a multiple DNA-stranded format comprising over 10⁴ compounds, as used for the de novo identification of binding molecules (Figure 2-9c, 2-9d), cannot be efficiently achieved by a microarray-based approach, due to suboptimal read-out quality and to physical spotting limitations. In principle, high-throughput sequencing techniques could be considered for the decoding of selections performed with ESAC libraries (see Chapter 2.2.2).58

2.2 The decoding of DNA-encoded chemical libraries

The identification of specific binding compounds from DNA-encoded chemical libraries requires the use of affinity-based selection strategies and of suitable decoding techniques. Generally, selections are performed by capture of binding compounds on a target protein immobilized on a solid support. The stringency of both the capture and the washing steps crucially influences the outcome of affinity selections.19,20,29,41,47 The decoding strategy also greatly contributes to the successful use of DNA-encoded chemical libraries. So far, most groups active in DNA-encoded library research have used rudimentary techniques, mainly aiming at demonstrating the feasibility of the DNA-encoding principle rather than exhaustively analyzing the decoding aspect of the selection.19,20,29,41 Although many authors implicitly envisaged a traditional Sanger-sequencing-based decoding (for an overview on Sanger sequencing see Ref 65), the number of codes that must be sequenced for a library of realistic complexity is clearly out of reach for a traditional Sanger-sequencing approach. If one assumes a library complexity of 10⁶ and an enrichment factor of 100 for good binders versus non-binders in a round of selection, then, statistically, 10⁵ sequences are required to identify preferential binding compounds with suitable confidence. Furthermore, the number of sequences to be read is bound to grow with library size. Nevertheless, a first implementation of Sanger sequencing for decoding DNA-encoded chemical libraries in high-throughput fashion was described by our laboratory.47 After selection and PCR amplification of the DNA tags of the library compounds, concatamers containing multiple coding sequences were generated and ligated into an EcoRI-digested pUC19 vector. Subsequent sequencing of a representative number of the resulting colonies revealed the frequencies of the codes present in the ESAC DNA sample before and after selection. Besides the Sanger-sequencing-based decoding, our group investigated a microarray-based47 methodology and very recently implemented novel, robust high-throughput sequencing techniques for efficiently decoding DNA-encoded libraries42.
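The sequencing-depth estimate given above can be illustrated with a short Python sketch. It rests on simplifying assumptions that are not part of the original argument: all library members are equally abundant before selection, binders are rare enough that their enrichment does not noticeably dilute the background, and read counts follow a Poisson distribution. The function names and the threshold of five reads are illustrative.

```python
from math import exp, factorial


def expected_counts(library_size, enrichment_factor, n_reads):
    """Expected number of reads for an enriched binder and for a non-binder."""
    background = n_reads / library_size
    return enrichment_factor * background, background


def poisson_tail(mean, k):
    """Probability of observing at least k reads for a Poisson mean."""
    return 1.0 - sum(exp(-mean) * mean ** i / factorial(i) for i in range(k))


if __name__ == "__main__":
    binder, non_binder = expected_counts(10**6, 100, 10**5)
    print(binder, non_binder)
    # Chance that a code is read at least five times, binder versus non-binder
    print(poisson_tail(binder, 5), poisson_tail(non_binder, 5))
```

Under these assumptions a 100-fold enriched binder is expected to be read about ten times in 10⁵ sequences, whereas an individual non-binder is expected far less than once, which is what makes the two classes distinguishable at this depth.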

2.2.1 Microarray-based decoding

A DNA microarray is a device for high-throughput investigations widely used in molecular biology and in medicine.59 It consists of an arrayed series of microscopic spots (‘features’ or ‘locations’), each containing a few picomoles of oligonucleotides carrying a specific DNA sequence (Figure 2-10). This can be a short section of a gene or another DNA element that is used as a probe to hybridize a DNA or RNA sample under suitable conditions. Probe-target hybridization is usually detected and quantified by fluorescence-based detection of fluorophore-labeled targets to determine the relative abundance of the target nucleic acid sequences. In standard microarrays, the probes are attached to a solid surface by a covalent bond to a chemical matrix (via epoxy-silane, amino-silane, lysine, polyacrylamide or others). The solid surface can be glass or a silicon chip, in which case the array is commonly known as a gene chip (or Affy-chip when an Affymetrix chip is used, Figure 2-10). Other microarray platforms (e.g. Illumina) use microscopic beads instead of a large solid support.

[Figure 2-10 scheme: a last-generation GeneChip® of 1.28 cm x 1.28 cm; each “feature” carries millions of DNA-probe strands (probe oligonucleotides).]

Figure 2-10: Schematic representation of an Affymetrix microarray chip. Microscopic spots (‘features’ or ‘locations’) on the solid support each contain several million immobilized single-stranded DNA probes. After hybridization of a fluorescently labelled DNA or RNA sample to the chip, detection and quantification are carried out by fluorescence-based analysis. (Adapted from http://www.affymetrix.com)

Microarray technology was originally derived from the Southern blotting60 technique, in which DNA fragments are probed with a labelled oligonucleotide complementary to the DNA segment. The use of a library of distinct DNAs in array format for expression profiling was first described in 1987, when the arrayed DNAs were used to identify genes whose expression is modulated by interferon.61 These first gene arrays were prepared by spotting cDNAs onto filter paper with a pin-spotting device. The use of miniaturized microarrays was first reported in 1995,62 and a complete eukaryotic genome (Saccharomyces cerevisiae) on a microarray was published in 199763.

So far, DNA microarrays have found many applications in a variety of technologies (gene expression profiling, SNP detection, comparative genomic hybridization, alternative splicing determination) and have dramatically accelerated many types of investigations.59 Over the last few years, our laboratory has used DNA microarrays for the decoding of DNA-encoded chemical libraries.47 In this setting, 19-mer, 5' amino-tagged oligonucleotides, each containing a specific sequence representing the code of an individual chemical compound in the library, are spotted in quintuplicate onto 25x75 mm polyethylene glycol-coated and epoxy-activated microarray slides using a BioChip Arrayer robot and incubated in a humid chamber overnight at 25 °C. Subsequently, the oligonucleotide tags of the binding compounds isolated from the affinity-based selection are PCR-amplified using a fluorescent primer and hybridized onto the DNA-microarray slide. Afterwards, the microarrays are analyzed using a laser scanner and the spot intensities are detected and quantified. The enrichment of the preferential binding compounds is revealed by comparing the spot intensities of the DNA-microarray slides before and after selection.
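The comparison of spot intensities before and after selection can be sketched in Python as follows. The intensity values are invented for illustration, the function name is hypothetical, and the sketch deliberately omits background subtraction and slide-to-slide normalization, which a real analysis would require.

```python
from statistics import mean


def spot_enrichment(before, after):
    """Ratio of mean replicate spot intensities after versus before selection.

    before, after: dicts mapping a compound code to the list of intensities
    of its replicate spots (five per code in the quintuplicate layout).
    """
    return {
        code: mean(after[code]) / mean(before[code])
        for code in before
        if code in after
    }


if __name__ == "__main__":
    # Invented intensities for two codes, five replicate spots each
    unselected = {"code_1": [100, 110, 90, 105, 95], "code_2": [100, 100, 100, 100, 100]}
    selected = {"code_1": [1000, 1100, 900, 1050, 950], "code_2": [50, 50, 50, 50, 50]}
    print(spot_enrichment(unselected, selected))
```

Averaging the replicate spots reduces the influence of individual spotting defects, but the resulting ratio remains an analog signal whose accuracy depends on the hybridization behaviour of each code.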

Although DNA microarrays have provided a powerful approach to decode DNA-encoded chemical libraries and to rapidly interrogate biological systems at a genomic level, several limitations restrict their range of application. Even for the latest generation of high-density microarray chips (up to 7 x 10^6 features), the spotting and hybridization of DNA-encoded libraries is quite demanding. Additionally, the fundamental reliance of microarrays on nucleic-acid hybridization results in a “low-fidelity” analysis of highly related sequences because of cross-hybridization. This problem is crucial in the decoding of DNA-encoded chemical libraries. Since the differences between distinct compounds can be very small at the level of the oligonucleotide tags, cross-hybridization may lead to false-positive identifications. Additionally, it is difficult to confidently detect and quantify low-abundance species by DNA-microarray-based decoding, even if the enrichment after selection is substantial. Moreover, microarray decoding is currently challenging with regard to the reproducibility of results and is very dependent on specific platforms. For instance, the “analog” rather than “digital” quantification limits the dynamic range and the comparison between samples. Last but not least, from an economic point of view, the technology is costly (DNA probes and robotic equipment). However, since 2004, massively parallel DNA sequencing technologies have become available, offering dramatically lower per-base costs64 and promising to overcome the limitations of microarrays. Millions of independently derived sequencing tags can nowadays be investigated simultaneously in a single experiment at a cost below 1000 Sfr.
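The difference between “analog” and “digital” quantification can be made concrete with a short Python sketch: with sequencing, each code is quantified simply by counting how often it occurs among the reads. The function name and the example tags are hypothetical; the sketch assumes that the code has already been extracted from each read.

```python
from collections import Counter

def code_frequencies(tags):
    """Count each code among the sequenced tags and return relative frequencies."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}

# Four made-up sequence tags; three of them carry the same code.
frequencies = code_frequencies(["AAC", "AAC", "GTT", "AAC"])
print(frequencies)
```

Because the read-out is a count rather than a fluorescence intensity, the dynamic range is limited mainly by the number of reads collected.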

2.2.2 Decoding by high throughput sequencing

Given the complexity of DNA-encoded chemical libraries (typically between 10^3 and 10^6 members), decoding based on conventional Sanger sequencing is unlikely to be usable in practice, due both to the high cost per base65 and to the tedious procedure involved.65 However, nearly three decades have passed since the invention of electrophoretic methods for DNA sequencing, and various novel sequencing technologies have recently been developed, each aiming to reduce costs to the point at which the genomes of individual humans could be sequenced as part of routine health care. Large-scale sequencing projects, including whole-genome sequencing, have usually involved the Sanger sequencing method65 using fluorescent chain-terminating nucleotide analogues66 and either slab gel or capillary electrophoresis. Recent estimates of the cost of human genome sequencing with standard sequencing technologies are between $10 million and $25 million. Alternative sequencing methods have been described;67,68,69,70,71 nonetheless, all these strategies were essentially based on bacterial vectors and Sanger sequencing as the main final generators of sequence information and consequently failed to deliver new ultra-low-cost massive sequencing techniques. Very recently, new methods have exploited strategies that parallelize the sequencing process, replacing capillary electrophoresis and producing thousands or millions of sequences at once.

Since the detection methods are often not sensitive enough to sequence a single molecule of DNA, the majority of the novel strategies use an in vitro amplification step. Typically, individual DNA molecules are isolated along with primer-coated beads in aqueous droplets within an oil phase (emulsion PCR). A polymerase chain reaction (PCR) then coats each bead with several clonal copies (called “polony”) of the isolated library DNA molecule.72 This strategy is employed in the method commercialized by 454 Life Sciences (acquired by Roche), in “polony sequencing”73 and in SOLiD sequencing (developed by Agencourt and acquired by Applied Biosystems).74 Each bead is then immobilized on a support for the subsequent sequencing step. An alternative method for in vitro amplification is “bridge PCR”, in which fragments are amplified on primers anchored to a solid surface. This system was developed and is used by Solexa (now owned by Illumina).75 Both approaches produce many physically isolated locations, each containing several copies of a single DNA fragment. In 2006, Stephen Quake's laboratory described the first second-generation method for ultra-high-throughput sequencing based on single-molecule sequencing (later commercialized by Helicos), which skips the amplification step and directly fixes DNA molecules to a surface.76

Once every single DNA sequence is physically localized to a separate position on a support, various sequencing strategies may be applied to determine the DNA sequences in parallel. “Sequencing by synthesis”, in full analogy with the dye-termination electrophoretic sequencing used in the Sanger method, employs the process of DNA synthesis by DNA polymerase to identify the bases present in the complementary DNA molecule.72 Pyrosequencing (used in “454” technology) also uses DNA polymerization to add nucleotides, detecting and quantifying the number of nucleotides added at a given location through the light emitted upon release of the attached pyrophosphates.72,77 Alternatively, “reversible terminator methods” (used by Illumina and Helicos) are employed.75,76 The nucleotides are added one at a time, the fluorescence corresponding to that position is detected, and the polymerization of another nucleotide is enabled by removal of a blocking group. “Sequencing by ligation” is another enzymatic method of sequencing, pioneered by the laboratory of G.M. Church and employed in “polony sequencing” and in the SOLiD technology offered by Applied Biosystems. Using a DNA ligase rather than a polymerase and a pool of all possible oligonucleotide sequences of a fixed length, labelled according to the sequenced position, oligonucleotides are annealed and ligated.73,74,78 The preferential ligation of matching sequences results in a signal related to the complementary sequence at that position.

In this light, advances in high-throughput DNA sequencing technologies are likely to revolutionize the strategies for the accurate decoding of DNA-encoded chemical libraries of unprecedented size.

2.2.2.1 “454” technology

The “454” technology of the Genome Sequencer FLX System (GS FLX) was developed by 454 Life Sciences and has recently (2005) been acquired by Roche. The GS FLX is a next-generation high-throughput DNA sequencing system featuring long reads, high accuracy and ultra-high throughput.72,79 Currently, the GS FLX is one of the most versatile high-throughput sequencing platforms available, supporting high-profile studies in a wide range of categories.72,79

Figure 2-11 schematically depicts the workflow of the “454” technology. Initially, large DNA samples, such as genomic material, are fragmented into smaller fragments (between 300 and 800 basepairs) by nebulisation. The DNA sample is then denatured to single-stranded DNA (sstDNA). Subsequently, specific short adaptors (called A and B) are added to each fragment using standard molecular biology techniques. An excess of sepharose beads carrying oligonucleotides complementary to, e.g., the A-adaptor sequence of the library fragments is added to the DNA library in order to ensure that each bead hybridizes to a unique single-stranded DNA sequence. The bead-bound library is emulsified with the amplification reagents in a water-in-oil mixture. An emulsion PCR is then performed, yielding clonal copies of a specific DNA fragment immobilized on each bead (ca. 10 million identical DNA molecules per bead). Afterwards, the emulsion is broken while the amplified fragments remain bound to their specific beads. The clonally amplified on-bead fragments are enriched and loaded onto a “PicoTiterPlate” device for sequencing (70 x 75 mm, containing 1.6 million wells), in which the diameter of the individual wells (44 μm) allows for only one bead (approximately 30 μm) per well. After addition of a DNA bead incubation mix (containing DNA polymerase, sulfurylase and luciferase), the fluidics subsystem of the Genome Sequencer FLX instrument flows individual nucleotides in a fixed order across the wells containing one bead each. Addition of one (or more) nucleotide(s) complementary to the template strand yields a chemiluminescent signal recorded by the CCD camera.
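How a series of light signals recorded for one well can be turned into a sequence may be sketched as follows. The sketch assumes that signals are normalised so that the incorporation of a single nucleotide gives a value close to 1.0, and that the nucleotides are flowed in the repeating order T, A, C, G; both the flow order and the signal values are assumptions made for illustration, and the function is not the instrument software.

```python
def call_flowgram(signals, flow_order="TACG"):
    """Translate per-flow light signals of one well into the synthesized sequence.

    Each signal is rounded to the nearest whole number of incorporated
    nucleotides, so a value near 2.0 is read as a run of two identical bases.
    """
    bases = []
    for flow_index, signal in enumerate(signals):
        nucleotide = flow_order[flow_index % len(flow_order)]
        run_length = max(0, int(signal + 0.5))  # round half up, never negative
        bases.append(nucleotide * run_length)
    return "".join(bases)

# Eight made-up flows (two rounds of T, A, C, G).
example_signals = [1.02, 0.05, 0.0, 2.1, 0.1, 0.97, 1.1, 0.0]
print(call_flowgram(example_signals))  # TGGAC
```

The called sequence is that of the newly synthesized strand, which is complementary to the bead-bound template. The rounding step also shows why long runs of identical bases are harder to call: the signal difference between n and n+1 incorporations becomes small relative to the total signal.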

[Figure 2-11, panels a-f. Panel labels recovered from the image: sstDNA library; sstDNA annealed to an excess of Capture Beads; emulsification of beads and PCR reagents, em-PCR (monoclonal amplification), emulsion breaking, amplicon; partitioning, deposition of beads into wells (1 well = 1 bead = 1 clonal amplification); addition of enzymes; sequencing by synthesis with chemiluminescent signals upon nucleotide incorporation (pyrophosphate release); signal; sequences.]

Figure 2-11: Workflow of the “454” high-throughput sequencing technology. Adaptors (A and B), specific for the 3' and 5' ends, are added to each sample fragment. The adaptors are used for the purification, amplification and sequencing steps. Single-stranded fragments with A and B adaptors compose the sample library used for the subsequent workflow steps (a). The single-stranded DNA library is immobilized onto specifically designed DNA Capture Beads. Each bead carries a unique single-stranded DNA library fragment. The bead-bound library is emulsified with amplification reagents in a water-in-oil mixture, resulting in microreactors containing just one bead with one unique sample-library fragment (b). The emulsion PCR (em-PCR) is performed, and each fragment is amplified to a copy number of several million per bead. Subsequently, the emulsion is broken while the amplified fragments remain bound to their specific beads (c). The enriched beads are loaded onto a PicoTiterPlate device for sequencing. The diameter of the wells allows for only one bead per well (d). After addition of sequencing enzymes, nucleotides are flowed in a fixed order across the wells containing one bead each. Addition of one (or more) nucleotide(s) complementary to the template strand results in a chemiluminescent signal recorded by the CCD camera (e). The combination of signal intensity and positional information allows the software to determine the sequence (f). (Adapted from http://www.454.com)

The nucleotide flow described above enables parallel sequencing of hundreds of thousands of beads, each carrying millions of copies of a unique single-stranded DNA molecule. Typically, 400,000 individual reads are obtained simultaneously per 7.5-hour instrument run. For sequencing-data analysis, different bioinformatics tools are available supporting the various applications, including de novo assembly, resequencing and amplicon variant detection by comparison with a known reference sequence. Currently, the 454 Genome Sequencer FLX instrument ensures read accuracies of >99.5% over the first 250 bases and 200 Mb of sequence information per day.72,79

In this Thesis, we describe a novel and convenient implementation of “454” high-throughput sequencing technology for the decoding of DNA-encoded chemical libraries.

2.2.2.2 Solexa technology

Solexa sequencing technology, acquired by Illumina in 2007, is based on massively parallel sequencing employing reversible terminator-based sequencing chemistry.75 Figure 2-12 schematically describes the Solexa technology process. Similarly to the “454” technology (see Chapter 2.2.2.1), after fragmentation of the double-stranded genomic DNA, adapters are ligated to both ends. Subsequently, the randomly fragmented genomic DNA is denatured to single-stranded DNA (sstDNA) and hybridized to the complementary adapter sequences attached to a planar, optically transparent surface. Following addition of unlabelled nucleotides and DNA polymerase, the attached adapters are extended and “bridge”-amplified, resulting in an ultra-high-density sequencing flow cell with ≥50 million clusters, each containing ~1,000 copies of the same template. These templates are sequenced using a four-colour DNA “sequencing-by-synthesis” technology that employs reversible terminators with removable fluorescent dyes. The four fluorescent dye-nucleotides are added simultaneously at the beginning of every chemistry cycle. After the unincorporated reagents have been washed away and the clusters excited by laser, the fluorescence emission from each cluster on the flow cell is recorded and the corresponding base called. Afterwards, the fluorophore dyes at the 3’ terminus are removed and the next chemistry cycle is initiated. By repeating the sequencing cycles a number of times, the entire template sequence of each cluster fragment is determined. Furthermore, after completion of the first sequence read, the templates can be regenerated in situ, enabling a second read from the opposite end of the fragments.75
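The principle of calling one base per chemistry cycle from the four fluorescence channels of a cluster can be sketched in Python. The sketch simply takes the brightest channel in each cycle; the channel order A, C, G, T and the intensity values are assumptions made for illustration, and corrections that real base callers apply (for example for overlap between dyes) are deliberately left out.

```python
def call_bases(cycles, channels="ACGT"):
    """Call one base per cycle as the channel with the highest intensity."""
    read = []
    for intensities in cycles:
        brightest = max(range(len(channels)), key=lambda i: intensities[i])
        read.append(channels[brightest])
    return "".join(read)

# Made-up intensities of one cluster over three cycles (channels A, C, G, T).
cluster_cycles = [
    (0.90, 0.10, 0.05, 0.00),
    (0.10, 0.00, 0.10, 0.80),
    (0.00, 0.10, 0.95, 0.10),
]
print(call_bases(cluster_cycles))  # ATG
```

Since exactly one base is called per cycle, the read length equals the number of chemistry cycles performed.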

[Figure 2-12, panels a-h. Panel labels recovered from the image: DNA sample; adapter ligation; attachment of DNA to the surface; bridge PCR (attached terminus, free terminus); denaturation; bridge PCR cycles; sequencing-by-synthesis cycles (A, C, G, T); laser imaging of clusters (1st cycle, 2nd cycle, n cycle); sequence determination; sequences.]

Figure 2-12: Schematic description of the Solexa sequencing workflow. Initially, adapters are ligated to the DNA samples (a), which are then hybridized to the complementary adapter sequences on the slide support (b). Following addition of nucleotides and DNA polymerase, “bridge”-PCR is performed, resulting in an ultra-high-density sequencing flow cell with ≥50 million clusters, each containing ~1,000 copies of the same template (c, d, e). “Sequencing-by-synthesis” technology employs reversible terminators with removable fluorescent dyes. After incorporation of the fluorescent dye-nucleotides and washing away of the unincorporated reagents, laser imaging captures the fluorescence emitted from each cluster; the fluorophore dyes at the 3’ terminus are then removed and the next chemistry cycle is initiated (f, g). By repeating the sequencing cycles, the sequence of each cluster fragment is determined (h).

Currently, the range of applications of the Solexa technology includes gene expression, small RNA discovery and protein-nucleic acid interactions. So far, the main limitation of the Solexa system, especially for the decoding of DNA-encoded chemical libraries, is the short maximum read length, currently up to 50 basepairs (standard 36 basepairs) for each DNA fragment, which can be extended to 100 basepairs (on average 72 basepairs) in the case of “double reading” from both adaptor ends. On the other hand, the Solexa system allows the generation of up to 600 Mb/day of sequence information, three times more than the “454” Genome Sequencer FLX instrument, with comparable accuracy (>98.5%).75

2.2.2.3 SOLiD technology

SOLiD (Sequencing by Oligonucleotide Ligation and Detection) technology was first described by the group of G.M. Church in 2005,73 and has recently been purchased by Applied Biosystems. The methodology is based on sequential ligation with dye-labelled oligonucleotides.73 The ultra-high-throughput capability and the unequalled accuracy of the SOLiD system, together with the broad range of possible applications, place it at the vanguard of next-generation high-throughput sequencing technologies.

In full analogy to “454” technology (see Chapter 2.2.2.1), after preparation of a suitable DNA fragment library containing specific adapters at both ends, the SOLiD methodology employs emulsion PCR (em-PCR) to generate clonal bead populations (Figure 2-13a). Following em-PCR, the templates are denatured and the beads carrying extended templates are enriched and separated from the undesired beads. A suitable 3’-end modification allows the selected beads to be covalently attached to the sequencing glass slide (Figure 2-13a). Thereafter, the sequencing process is started. Typically, the probe library enabling sequence determination contains 1024 different 8-mer single-stranded, 5’-fluorescently labelled synthetic DNA oligonucleotides (Figure 2-13b). Each probe comprises a fully randomized sequence of five bases, a cleavage site for removing the 5’-fluorescent dye and an additional constant domain of three bases, as depicted in Figure 2-13b. Importantly, only four different dyes are used for labelling the entire probe library (1024 probes, 256 probes per dye). Thus, a dye does not identify a single base; rather, each dye represents four of the sixteen possible di-base combinations at positions 4 and 5 of the corresponding probe (4 colours coding for 16 possible di-base combinations, Figure 2-13b).
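The numbers quoted for the probe library (1024 probes, four dyes, 256 probes per dye) can be reproduced with a short enumeration. The sketch below assigns dyes with the two-bit scheme commonly described for di-base encoding (A=0, C=1, G=2, T=3, dye = exclusive-or of the two values); this particular assignment is an assumption made for illustration, since the text above only states that each dye covers four di-base combinations.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"

def dye_for(dibase):
    """Dye index (0-3) of a di-base under the assumed two-bit XOR scheme."""
    return BASES.index(dibase[0]) ^ BASES.index(dibase[1])

# All possible sequences of the five randomized probe positions: 4^5 = 1024.
probes = ["".join(p) for p in product(BASES, repeat=5)]

# The dye of a probe is determined by positions 4 and 5 (string indices 3 and 4).
probes_per_dye = Counter(dye_for(probe[3:5]) for probe in probes)

dibases_per_dye = {}
for first, second in product(BASES, repeat=2):
    dibases_per_dye.setdefault(dye_for(first + second), []).append(first + second)

print(len(probes))              # 1024
print(dict(probes_per_dye))     # 256 probes for each of the four dyes
print(dibases_per_dye)          # four di-base combinations per dye
```

Under this scheme the dye alone never identifies a base, but together with one known neighbouring base it determines the other base of the pair.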

[Figure 2-13, panels a and b. (a) Steps I-IV, labels recovered from the image: sstDNA sample (5’ to 3’); adapter ligation; bead hybridization; em-PCR and emulsion break; random covalent bead deposition on glass slide. (b) Octamer probe drawn from 3’ to 5’ with ligation site, cleavage site and fluorescent dye; colour grid of 1st base versus 2nd base (A, C, G, T) for positions 5,4; n = degenerate bases, z = universal bases; 4 dyes, 4 di-nucleotides per dye; 1,024 octamer probes (4^5); 1,024 probes / 4 dyes = 256 probes per dye.]

Figure 2-13: Sample preparation for SOLiD sequencing and schematic representation of the probe system enabling sequence identification. a) Single-stranded DNA sample fragments (sstDNA) are ligated to specific adapters at the 5’ and 3’ termini (i). Following hybridization to capture beads carrying the corresponding complementary adapter sequence (ii), emulsion PCR with suitable primers is performed (iii). Lastly, the emulsion is broken and the amplified beads are covalently attached to the sequencing glass slide via the 3’-end (iv). b) Each probe of the probe library comprises, from the 3’-end, a random sequence of five bases, a cleavage site and an additional constant domain of three bases. Four different dyes are used for labelling the entire probe library (1024 probes, 256 probes per dye). Each dye represents four of the sixteen possible di-base combinations at positions 4 and 5 of the corresponding probe.

The sequencing process starts by hybridizing an n-base-long universal sequencing primer to the adapter attached to the bead. Subsequently, a set of four 5’-fluorescent, removable di-base probes of fixed length is flowed over the slide together with DNA ligase, competing for ligation to the sequencing primer (Figure 2-14). After laser excitation, the fluorescence emission from each cluster on the flow cell reveals the nature of the di-base probe ligated. Following removal of the fluorescent dye by cleavage of the probe at a specific position, the ligation process is repeated (Figure 2-14). Consequently, every cycle interrogates a precise “di-base position” of each template fragment. Following a series of ligation cycles, the extension product is removed and the template is reset with a primer complementary to the n-1 position for a full second round of ligation cycles (Figure 2-14). After multiple rounds of reset and ligation (typically five), every base of the template sequence has been “double interrogated” by different probes (Figure 2-14). Therefore, starting from a known base (e.g. the last base of the initial adapter), it is possible to unambiguously translate the entire colour-sequence into the corresponding base-sequence (Figure 2-14).
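The translation of a colour-sequence into a base-sequence from one known starting base can be sketched in a few lines of Python. As before, the sketch assumes the two-bit di-base scheme (A=0, C=1, G=2, T=3, colour = exclusive-or of neighbouring bases); the function names are hypothetical and the colour indices 0-3 stand in for the four dyes.

```python
BASES = "ACGT"

def encode(sequence):
    """Colour-sequence of a base-sequence: one colour per pair of adjacent bases."""
    return [BASES.index(a) ^ BASES.index(b) for a, b in zip(sequence, sequence[1:])]

def decode(known_first_base, colours):
    """Rebuild the base-sequence from a known first base and the colours."""
    sequence = [known_first_base]
    for colour in colours:
        previous = BASES.index(sequence[-1])
        sequence.append(BASES[previous ^ colour])
    return "".join(sequence)

template = "ACGTCGCATTCAC"
colours = encode(template)
print(colours)               # [1, 3, 1, 2, 3, 3, 1, 3, 0, 2, 1, 1]
print(decode("A", colours))  # ACGTCGCATTCAC
```

Each base contributes to two neighbouring colours, which is the “double interrogation” referred to above; it also means that decoding depends on the known first base, since every later base is derived from its predecessor.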

[Figure 2-14, panels a-j. Panel labels recovered from the image: 1st ligation cycle, universal n primer hybridization; 1st ligation cycle and 1st di-base calling (positions 4,5); fluorescent dye cleavage; 1st ligation cycle, 2nd di-base calling (positions 9,10); 1st ligation cycle base calling complete; 2nd ligation cycle, universal n-1 primer hybridization; 2nd ligation cycle base calling (positions 3,4; 8,9; 13); repeating ligation cycle and base calling. Positions interrogated: 1st cycle (primer n): 4,5; 9,10. 2nd cycle (primer n-1): 3,4; 8,9; 13. 3rd cycle (primer n-2): 2,3; 7,8; 12,13. 4th cycle (primer n-3): 1,2; 6,7; 11,12. 5th cycle (primer n-4): 0,1; 5,6; 10,11. Multiple cycles of di-base calling (each base is “double interrogated” by two different probes) give the ‘colour-sequence’ after n ligation cycles, which is converted into the corresponding base-sequence starting from a known base (example shown: A C G T C G C A T T C A C).]

Figure 2-14: Workflow of sequencing by oligonucleotide ligation. An n-base-long universal sequencing primer is hybridized to the adapter attached to the bead (a). Subsequently, the 5’-fluorescently labelled probe library is flowed over the slide together with DNA ligase, the probes competing for ligation to the sequencing primer (b). The fluorescence emission from each cluster (bead) on the flow cell reveals the nature of the di-base probe ligated (b). After cleavage of the fluorescent dye, the ligation process is repeated until the terminal adapter (in green) is reached (c, d). Hence, the first cycle of ligation and “di-base” calling is completed and the system reset (e). Following hybridization with an n-1 long universal sequencing primer, a second cycle of ligation is started and a second round of “di-base” calling accomplished (f, g). After repeating a number of cycles of ligation and “di-base” calling (typically 5), each base of the template sequence has been “double interrogated” by different probes and a ‘colour’-sequence can be generated (h, i). Starting from a known base (e.g. the last base of the initial adapter), the entire ‘colour’-sequence is converted into the corresponding base-sequence (j).

Although the SOLiD double-base interrogation might appear more cumbersome, it facilitates the discrimination between system errors and true polymorphisms. In essence, a true single nucleotide polymorphism (SNP) results in two consecutive colour changes between the colour-sequence of the reference template and the observed one, while sequencing errors unambiguously result in a single colour change (Figure 2-15). The double-base interrogation enables ultra-high base-calling accuracy (>99.94%). Additionally, the SOLiD system is able to generate 600 Mb of sequence information per day and up to 6 Gb in a single experiment.74
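The rule distinguishing a SNP from a sequencing error can be checked with a small Python sketch that compares an observed colour-sequence with that of the reference. It again assumes the two-bit di-base scheme (A=0, C=1, G=2, T=3, colour = exclusive-or of neighbouring bases). Under that scheme a single changed base alters two adjacent colours while leaving their exclusive-or unchanged, which is used here as an additional consistency check; the function names and classification labels are hypothetical.

```python
BASES = "ACGT"

def encode(sequence):
    """Colour-sequence of a base-sequence under the assumed XOR scheme."""
    return [BASES.index(a) ^ BASES.index(b) for a, b in zip(sequence, sequence[1:])]

def classify(reference, observed_colours):
    """Classify the difference between reference and observed colour-sequences."""
    expected = encode(reference)
    changed = [i for i, (e, o) in enumerate(zip(expected, observed_colours)) if e != o]
    if not changed:
        return "match"
    if len(changed) == 1:
        return "sequencing error"
    if len(changed) == 2 and changed[1] == changed[0] + 1:
        i, j = changed
        # A single base substitution keeps the XOR of the two colours constant.
        if expected[i] ^ expected[j] == observed_colours[i] ^ observed_colours[j]:
            return "candidate SNP"
    return "unclassified"

reference = "ACGTCGCATTCAC"
snp_colours = encode("ACGTCGGATTCAC")   # one base changed (C to G)
error_colours = encode(reference)
error_colours[5] = 0                    # one colour misread

print(classify(reference, snp_colours))    # candidate SNP
print(classify(reference, error_colours))  # sequencing error
```

Translating the misread colour-sequence naively into bases would alter every base downstream of the error, which is why the comparison is made at the level of colours rather than bases.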

[Figure 2-15, two panels. Left panel, SNP (two colour changes): expected A C G T C G C A T T C A C, observed A C G T C G G A T T C A C. Right panel, sequencing error (single colour change): expected A C G T C G C A T T C A C, observed A C G T C G G T A A G T G.]

Figure 2-15: SOLiD discrimination between true polymorphisms (SNPs) and system sequencing errors. A true single nucleotide polymorphism (SNP) results in two consecutive colour changes between the reference template and the observed ‘colour’-sequence (left panel), while sequencing errors unambiguously result in a single colour change (right panel).

As for the Solexa system (see Chapter 2.2.2.2), the main drawback of the SOLiD technology, particularly for the decoding of DNA-encoded chemical libraries, is the short maximum read length, currently fixed at 35 basepairs for standard applications. However, the double-base interrogation feature of the SOLiD approach is undoubtedly very attractive for high-fidelity decoding of large DNA-encoded libraries, where a mismatch in a single base call might be crucial for the proper identification of the binding structures.

2.2.2.4 Single Molecule DNA Sequencing – Helicos technology

An alternative, ambitious solution to the issues of cost, speed and sensitivity of conventional sequencing technologies, and to the exponentially increasing demand for DNA and RNA sequence information, was very recently presented by Stephen Quake's laboratory, which described the use of DNA polymerase and fluorescence microscopy to obtain sequence information from single DNA molecules.76 Furthermore, single-DNA-molecule sensitivity might permit direct sequencing of mRNA from rare cell populations or perhaps even individual cells.

The technology was commercialized in 2006 as Helicos True Single Molecule Sequencing (tSMS). Initially, the DNA samples are cut into fragments of up to 55 basepairs. Subsequently, the DNA library fragments are denatured, ligated to an adaptor sequence at the 3’-terminus and captured on the flow cell by hybridization to the complementary adapter sequences attached to the surface (Figure 2-16). According to a sequencing-by-synthesis approach, reversible, fluorescently labelled nucleotides are sequentially added to the nucleic acid templates (Figure 2-16). The polymerase catalyzes the sequence-specific incorporation of fluorescent nucleotides into nascent complementary strands on all the templates. After a washing step, which removes all unreacted nucleotides, the incorporated nucleotides are imaged and their positions recorded (Figure 2-16). Angstrom spatial resolution is not necessary, since the distance between hybridized templates is sufficiently large (0.1 micrometer range) and the nucleotides are inserted sequentially; only the time resolution needed to discriminate successive incorporations is required. Following removal of the fluorescent group, the process continues by flowing each of the other three bases in turn (Figure 2-16). Multiple four-base cycles thus result in the parallel determination of billions of template sequences and the generation of up to 900 Mb/day of sequence information (Figure 2-16).80 Unlike amplification-based sequencing technologies, in tSMS every strand is unique and sequenced independently. As a result, the tSMS process is not subject to the “dephasing” errors that occur when amplified DNA clusters fall out of step.80,81

[Figure 2-16, panels a-h. Panel labels recovered from the image: DNA sample; denaturation; 3’-end adapter ligation; hybridization of DNA to the surface; sequencing-by-synthesis, flow A; 1. capture image, 2. cleavage, 3. flow C; 1. capture image, 2. cleavage, 3. flow G; 1. capture image, 2. cleavage, 3. flow T; capture image; sequencing-by-synthesis, next cycle; sequences.]

Figure 2-16: True Single Molecule Sequencing (tSMS) workflow. The DNA library fragment is denatured, ligated to an adaptor sequence at the 3’-terminus and hybridized on the flow cell (a, b, c). Sequencing-by-synthesis is initiated by sequentially adding reversible, fluorescently labelled nucleotides. The polymerase catalyzes the sequence-specific incorporation of the fluorescent nucleotides into the strands complementary to the templates. After a washing step, the incorporated nucleotides are imaged and their positions recorded; the fluorescent group is then removed and the next fluorescent nucleotide flowed (d, f, g). Multiple sequencing-by-synthesis cycles result in the parallel determination of the template sequences (h).

Although the Helicos methodology is very promising and displays an accuracy of over 99%,80 research applications have not yet been reported in the literature. With a view to an implementation for DNA-encoded chemical libraries, the available read length is at present very limited (55 basepairs). However, future technology improvements may permit the use of True Single Molecule Sequencing in chemical library decoding.

3. RESULTS

3.1 DNA-Encoded Library “DEL4000”

DNA-encoding facilitates the construction and screening of large chemical libraries. Here, we describe general strategies for the stepwise coupling of coding DNA fragments to nascent organic molecules throughout individual reaction steps. The methodology was exemplified by the construction of a DNA-encoded chemical library containing 4’000 compounds, named “DEL4000” (DNA Encoded Library 4000). The synthesis of the library was achieved using a split-and-pool procedure, which featured the following sequential steps: (i) conjugation of different N-Fmoc-amino acids to distinct amino-modified synthetic oligonucleotides; (ii) deprotection of the amino moiety; (iii) pool and split; (iv) amide bond formation with selected carboxylic acids; (v) encoding of the carboxylic acid used in the previous step by hybridization of partially complementary oligonucleotides followed by Klenow-mediated DNA polymerization, yielding the final compounds in a double-stranded DNA format. Moreover, the purity of the products of the intermediate steps was extensively investigated using HPLC and mass spectrometry.

3.1.1 Library design and synthesis

Figure 3-1 describes the strategy for the construction of a DNA-encoded chemical library consisting of 20 x 200 modules (i.e., 4’000 compounds), joined together by the formation of an amide bond.

[Figure 3-1, reaction scheme. Labels recovered from the image: 20 N-Fmoc-amino acids (1-20) and 5’-amino-(C12)-modified oligonucleotides (x20); first step: 1) Sulfo-NHS, EDC, DMSO, 30 °C, 15 min; 2) TEA/HCl pH = 10, 30 °C, o/n; 3) piperidine 500 mM; 4) HPLC; further condition label: 4 °C, 1 h. POOL. Second step: 1) SPLIT into 200; 2) amide bond formation with 200 carboxylic acids; 3) EtOH precipitation. Encoding (annealing); encoding (Klenow); 1) ion exchange on cartridge; 2) POOL; pool of 4000.]

Figure 3-1: Schematic representation of the strategy used for the synthesis and encoding of the DEL4000 library. Initially, 20 different Fmoc-protected amino acids were coupled to unique oligonucleotide derivatives carrying a primary amino group at the 5’ extremity. After deprotection and HPLC purification, these derivatives were pooled and coupled to 200 carboxylic acids in parallel reactions. The identity of each carboxylic acid was encoded by means of a Klenow polymerization step, using a set of partially complementary oligonucleotides. This procedure resulted in a 4000-member library (DEL4000), in which each chemical compound was covalently attached to a double-stranded DNA fragment containing two coding domains which unambiguously identify the compound’s structure (i.e., the two chemical moieties used for compound synthesis).

Initially, 20 Fmoc-protected amino acids (for the structures see Appendix 9.2) were chemically coupled to 20 individual amino-tagged oligonucleotides. After deprotection and HPLC purification, the 20 resulting DNA-encoded primary amines were coupled to 200 carboxylic acids (for the structures see Appendix 9.2), generating a library of 4’000 members. In order to ensure that each library member contained a different DNA code, a split-and-pool strategy was chosen, which also minimizes the number of oligonucleotides needed for library construction. As indicated in Figure 3-1, the 20 primary amines covalently linked to individual single-stranded oligonucleotides were mixed and aliquoted into 200 reaction vessels prior to coupling with the 200 different carboxylic acids (one per vessel). Following the reaction, the oligonucleotides in each vessel were precipitated as sodium phosphate adducts after addition of an AcOH/AcONa solution (pH 4.7) and three volumes of ethanol. The identities of the carboxylic acids used for the coupling reactions were encoded by performing an annealing step with individual oligonucleotides, partially complementary to the first oligonucleotide carrying the chemical modification. A subsequent Klenow fill-in DNA polymerization step yielded double-stranded DNA fragments, each of which contained two identification codes (one corresponding to the initial 20 compounds and one corresponding to the 200 carboxylic acids, see Figure 3-1). The 200 reaction mixtures were then purified on an anion exchange cartridge and pooled. Model reactions performed prior to library construction had shown that the yields of the amide bond forming reaction ranged between 51% and 98% (see Chapter 3.1.2, Table 3-1). The resulting DNA-encoded chemical library, containing 4’000 compounds, was aliquoted at a total DNA concentration of 300 nM and stored frozen prior to further use.
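The bookkeeping behind the split-and-pool strategy can be illustrated with a short Python sketch: the pool of 20 encoded amines is distributed over 200 vessels, and each vessel contributes one carboxylic acid and one second code. The function name and the integer labels standing in for the codes are hypothetical; the actual code sequences are those listed in Appendix 9.2.

```python
def split_and_pool(first_codes, second_codes):
    """Enumerate the code pairs obtained by splitting a pool over the vessels."""
    pool = list(first_codes)             # the encoded primary amines, mixed
    library = []
    for acid_code in second_codes:       # one vessel per carboxylic acid
        for amine_code in pool:          # every vessel contains the whole pool
            library.append((amine_code, acid_code))
    return library

first_codes = range(1, 21)    # placeholders for the 20 first-step codes
second_codes = range(1, 201)  # placeholders for the 200 second-step codes

library = split_and_pool(first_codes, second_codes)
print(len(library))                         # 4000 members
print(len(set(library)))                    # 4000 distinct code pairs
print(len(first_codes) + len(second_codes)) # 220 coding oligonucleotides
```

Every member carries a unique pair of codes although only 20 + 200 coding oligonucleotides are required, rather than one oligonucleotide per compound.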

3.1.2 Model Compounds

A high-quality library is crucial for reliable and reproducible selection experiments. Unreacted oligonucleotides and side products may lead to erroneous interpretation of the decoding and consequently to incorrect binder identification. Therefore, since library quality relies essentially on the yield of the reactions used to produce each library member, model oligonucleotide conjugates were synthesized in order to validate reaction conditions, yields and product recovery. Three 42-mer, 5’-Fmoc-deprotected model amino acid-oligonucleotide conjugates carrying a primary amino group were individually coupled to four different carboxylic acids using a solution of N-ethyl-N’-(3-dimethylaminopropyl)-carbodiimide (EDC) and N-hydroxysulfosuccinimide, with the pH finally buffered by adding aqueous triethylamine hydrochloride, pH 9.0. Following overnight stirring and quenching by addition of Tris-Cl buffer, the reactions were analysed by HPLC and the masses of the reacted oligonucleotides detected by LC-ESI-MS. Typical HPLC coupling yields and recoveries ranged between 51% and 98% (Table 3-1, see also Appendix 9.1).

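(Chemical structures of the three oligonucleotide conjugates and of the four carboxylic acids are not reproduced; the column and row numbers below stand in for the structures drawn in the original table.)

                    Conjugate 1               Conjugate 2               Conjugate 3
Carboxylic acid     Yield %   Recovery*) %    Yield %   Recovery*) %    Yield %   Recovery*) %
1                   98        90              83        68              65        60
2                   70        60              72        60              76        65
3                   >70       70              >64       64              >57       57
4                   >52       52              >55       55              >51       51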

Table 3-1: HPLC coupling yields and recovery assessed after the peptide bond formation reaction between three selected 5’-Fmoc-deprotected amino acid-oligonucleotide conjugates and four different model carboxylic acids (see also Appendix 9.1). *) Evaluated by measuring the absorption at 260 nm using a NanoDrop instrument (ND-1000 UV-Vis spectrophotometer) following HPLC purification (see Chapter 3.1.7).

3.1.3 Oligonucleotides

Figure 3-2 schematically depicts the two distinct sets of oligonucleotides used for the unambiguous encoding of the DEL4000 compounds. The first set (Figure 3-2a) consisted of 20 unique 42mer single-stranded DNA oligonucleotides, comprising three domains: an 18-nucleotide primer region (including an EcoRI restriction site) for PCR amplification at the 5’-terminus, a region of six bases serving as code (each code differing from the others by at least three bases, see Appendix 9.2) and a hybridization domain of 18 nucleotides at the 3’-end. For the conjugation of the initial 20 Fmoc-protected amino acids, an NH2-(CH2)12 modification was added to the 5’-terminal phosphate group. The general sequence was 5’-NH2-(CH2)12PO4-GGA GCT TGT GAA TTC TGG XXXXXX GGA CGT GTG TGA ATT GTC-3’ (a list of all 20 codes used for library construction can be found in Appendix 9.2). A second set of oligonucleotides (Figure 3-2b), used to encode the 200 carboxylic acids of the second synthetic step of the DEL4000 library (see Chapter 3.1.1), consisted of 200 distinct 44mer single-stranded DNA oligonucleotides with the general sequence 5’-GTA GTC GGA TCC GAC CAC XXXXXXXX GAC AAT TCA CAC ACG TCC-3’. The sequence contains an 18-nucleotide primer region (including a BamHI restriction site) for PCR amplification at the 5’-terminus, a specific coding region of eight nucleotides (each code differing from the others by at least four bases, see Appendix 9.2) and a hybridization domain of 18 bases, always complementary to the hybridization domain of the first set of oligonucleotides (the list of all 200 codes used is given in Appendix 9.2).
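The design rules stated above (oligonucleotide lengths, restriction sites, complementary hybridization domains and a minimum distance between codes) can be checked with a short Python sketch. The constant domains are taken from the general sequences given in the text. The three six-base codes are read off the example reads of Figure 3-6b and serve only as an illustration, since the full code lists are in Appendix 9.2. The function names are hypothetical.

```python
from itertools import combinations

PRIMER_1 = "GGAGCTTGTGAATTCTGG"   # 18 nt, contains the EcoRI site GAATTC
HYB_1 = "GGACGTGTGTGAATTGTC"      # 18 nt hybridization domain of the 42mers
PRIMER_2 = "GTAGTCGGATCCGACCAC"   # 18 nt, contains the BamHI site GGATCC
HYB_2 = "GACAATTCACACACGTCC"      # 18 nt hybridization domain of the 44mers

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of positions at which two equally long codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(codes):
    """Smallest Hamming distance found between any two codes of a set."""
    return min(hamming(a, b) for a, b in combinations(codes, 2))


# Illustrative subset of first-step codes (taken from reads in Figure 3-6b).
example_codes_1 = ["CAAGCT", "ATCTTA", "TGAAAT"]

oligo_1 = PRIMER_1 + example_codes_1[0] + HYB_1   # 42mer
oligo_2 = PRIMER_2 + "N" * 8 + HYB_2              # 44mer with a placeholder code

print(len(oligo_1), len(oligo_2))
print(reverse_complement(HYB_1) == HYB_2)
print(min_pairwise_distance(example_codes_1))
```

A minimum distance of three bases means that a single sequencing error cannot turn one valid code into another, which is presumably why such a constraint was imposed on the code sets.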

[Figure 3-2, graphic not reproduced. Panel a) 5’-NH2-(CH2)12-GGA GCT TGT GAA TTC TGG XXX XXX GGA CGT GTG TGA ATT GTC-3’ (42 nt, x 20): 18 nt PCR primer domain with EcoRI restriction site, 6 nt code, 18 nt hybridization domain. Panel b) 5’-GTA GTC GGA TCC GAC CAC XXXXXXXX GAC AAT TCA CAC ACG TCC-3’ (44 nt, x 200): 18 nt PCR primer domain with BamHI restriction site, 8 nt code, 18 nt complementary hybridization domain.]

Figure 3-2: Schematic representation of the oligonucleotide sets employed for the encoding of the DEL4000 library. a) 20 unique 42mer single-stranded DNA 5’-NH2-(CH2)12PO4-oligonucleotides. The sequences contain three domains: an 18-nucleotide primer region (including an EcoRI restriction site) for PCR amplification, a coding region of six bases (each code differing from the others by at least three bases, see Appendix 9.2) and a hybridization domain of 18 nucleotides at the 3’-end. The amino modification serves as reactive group for the conjugation of the initial 20 Fmoc-protected amino acids. b) A second set of 200 unique 44mer single-stranded DNA oligonucleotides served as identification barcode for the 200 carboxylic acids used in the synthesis of DEL4000. Starting from the 5’-terminus, the sequences contain an 18-nucleotide primer region (including a BamHI restriction site) for PCR amplification, a coding region of eight nucleotides (each code differing from the others by at least four bases, see Appendix 9.2) and a complementary hybridization domain of 18 bases.

3.1.4 Compounds

Various considerations were taken into account for the selection of the 20 Fmoc-protected amino acids and the 200 carboxylic acids used to build the library. The compounds had to be commercially available and suitable for conjugation to an amino-modified oligonucleotide through a stable amide bond. The amide bond formation reaction on amino-tagged oligonucleotides had worked very well for the construction of DNA-encoded ESAC libraries in our laboratory.47 We mainly utilized the amide bond forming reaction for the conjugation of activated alkyl carboxylic acids to primary amine moieties. The molecules selected were further restricted in size to between 100 and 300 Dalton (without removable protecting groups in the case of the Fmoc-protected amino acids). We sought compounds with a range of functional groups and with both hydrophobic and hydrophilic properties. A complete list of the structures of the 20 Fmoc-protected amino acids and the 200 carboxylic acids is given in Appendix 9.2.

The protocol for the amide bond formation reaction was set up by testing several compounds and analyzing them by HPLC and MS (see Chapter 3.1.2). A typical reaction procedure is schematically depicted in Figure 3-3.

[Figure 3-3, reaction scheme not reproduced. Conditions legible in the original: a) 1. EDC 1 eq., S-NHS 4 eq., DMSO, rt, 30 min; 2. pH = 10, TEAA-HCl, rt, o/n; Fmoc removal with piperidine, 4 °C, 2 h. b) 1. EDC 1 eq., S-NHS 4 eq., DMSO, rt, 30 min; 2. pH = 10, TEAA-HCl, rt, o/n. Right panel: 42mer oligonucleotide (18mer, 6mer code, 18mer) carrying the 5’-(C12)-NH2 modification.]

Figure 3-3: Reaction scheme of library synthesis. a) Coupling of Fmoc-amino acids to the initial 5’-amino oligonucleotides and Fmoc removal. b) Amide bond formation reaction enabling the final coupling with 200 different carboxylic acids. The right panel schematically depicts the structure of the oligonucleotide.

3.1.5 HPLC Purification

A challenging task in library construction was the separation of the conjugated oligonucleotide from the unconjugated oligonucleotide precursor. Typically, purifications after the first step of library synthesis were performed by reversed-phase HPLC using an ion pairing reagent. In order to prevent the addition of contaminants, a volatile buffer was employed and removed under vacuum after the chromatographic step. The best purification profiles were obtained using a C18 column with increased pH stability (Figure 3-4a). Dimethylbutylammonium acetate (DMBAA, 100 mM, pH = 7) was used for those oligonucleotides not sufficiently resolved with the triethylammonium acetate (TEAA) buffer.

In order to distinguish oligonucleotides and oligonucleotide conjugates from starting compounds and side-products, absorption was monitored at 260 nm and 280 nm. For oligonucleotides, the ratio of the absorption at 260 nm to that at 280 nm is typically 1.8 : 1.

3.1.6 Mass Spectrometry

Electrospray ionization mass spectrometry (ESI-MS) was employed for the characterization of the reaction products after oligonucleotide conjugation in the first step of library construction. Removing sodium and potassium adducts from the oligonucleotide is crucial for ESI-MS analysis: the multiple adducts of the phosphate backbone with sodium and potassium dramatically decrease the sensitivity and complicate the interpretation of the spectra. To avoid manual desalting (e.g. by Zip Tips), desalting was performed on-flow before each mass spectrometric analysis. While several combinations of column packing material and buffer systems have been reported to desalt oligonucleotides efficiently on-flow before mass spectrometry,82 the only system that worked in our hands was 1,1,1,3,3,3-hexafluoroisopropanol (HFIP) as volatile acid component and triethylamine (TEA) as ion pairing reagent on a C18 column. Since TEA strongly suppresses ionization, its concentration was kept at 5 mM, which still allowed sufficient ion formation and desalting. This protocol enabled the ESI-MS analysis of oligonucleotides of various sizes as multiply charged molecules (Figure 3-4b) with a sensitivity down to 5 pmol.
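The way a multiply charged negative-ion spectrum relates to the mass of the conjugate can be illustrated with a short Python sketch. It assumes that each peak corresponds to the fully deprotonated species [M - zH]z- without salt adducts, and it uses a hypothetical neutral mass of 13’500 Da, roughly the order of magnitude of a 42mer, not a measured value. The function names are likewise invented for illustration.

```python
PROTON_MASS = 1.007276  # Da


def mz_negative(neutral_mass, z):
    """m/z of the [M - zH]^z- ion of a molecule of the given neutral mass."""
    return (neutral_mass - z * PROTON_MASS) / z


def deconvolute(mz_lower_charge, mz_higher_charge):
    """Recover (z, M) from two adjacent peaks of one charge-state series.

    mz_lower_charge carries z charges and mz_higher_charge carries z + 1,
    so mz_lower_charge is the larger of the two m/z values.
    """
    z = round((mz_higher_charge + PROTON_MASS) / (mz_lower_charge - mz_higher_charge))
    mass = z * (mz_lower_charge + PROTON_MASS)
    return z, mass


HYPOTHETICAL_MASS = 13500.0  # Da, assumption for illustration only

# Charge states 7- to 14-, the range labelled in Figure 3-4b.
series = {z: mz_negative(HYPOTHETICAL_MASS, z) for z in range(7, 15)}
for z, mz in sorted(series.items()):
    print(f"{z}-: m/z = {mz:.2f}")

z_found, mass_found = deconvolute(series[9], series[10])
print(z_found, round(mass_found, 2))
```

Sodium or potassium adducts would add further peaks to every charge state, which is why the on-flow desalting described above simplifies the interpretation so much.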

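[Figure 3-4, graphic not reproduced. Panel a) HPLC chromatogram with peaks labelled "compound-oligonucleotide conjugate", "carboxylic acid compound" and "initial oligonucleotide". Panel b) ESI mass spectrum with peaks labelled by charge state from 7- to 14-.]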

Figure 3-4: Example of oligonucleotide HPLC purification and mass spectrometry characterization. a) HPLC purification of a typical coupling reaction between an Fmoc-amino acid and a 5’-amino-oligonucleotide after Fmoc removal. The green line indicates the absorption at 260 nm, the red line at 280 nm. The chromatogram was recorded using TEAA, 100 mM, pH = 7 as buffer system on a C18 column. b) ESI-MS of a compound-oligonucleotide conjugate detected as multiply charged negative ions. The peaks corresponding to charge states between 7- and 14- are depicted.

3.1.7 Oligonucleotide concentration determination

Following HPLC purification and solvent removal under vacuum, the oligonucleotide fractions were dissolved in water. The concentration was determined by measuring the absorption at 260 nm using a NanoDrop instrument (ND-1000 UV-Vis spectrophotometer). The extinction coefficient of each oligonucleotide was calculated from its specific sequence assuming the following molar extinction coefficients per nucleotide: εT = 8400 cm⁻¹ M⁻¹; εA = 15200 cm⁻¹ M⁻¹; εC = 7050 cm⁻¹ M⁻¹; εG = 12010 cm⁻¹ M⁻¹. The ratio of absorbance at 260 nm and 280 nm was used to estimate the purity of the DNA with respect to contaminants that absorb strongly at or near 280 nm. A 260/280 ratio of ~1.8 was generally accepted as “pure” for DNA. The ratio of absorbance at 260 nm and 230 nm was used as a secondary measure of nucleic acid purity with respect to organic contaminants that absorb at or near 230 nm. Expected 260/230 values for “pure” DNA are commonly in the range of 2.0-2.2.
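The concentration calculation described in this section amounts to summing the per-nucleotide extinction coefficients and applying the Beer-Lambert law. A minimal Python sketch follows. The coefficients are those quoted above, whereas the function names, the default 1 cm path length and the example absorbance are assumptions made for illustration.

```python
# Per-nucleotide molar extinction coefficients at 260 nm (cm^-1 M^-1), from the text.
EPSILON_260 = {"T": 8400, "A": 15200, "C": 7050, "G": 12010}


def extinction_coefficient(sequence):
    """Sum of the per-nucleotide coefficients (no nearest-neighbour correction)."""
    return sum(EPSILON_260[base] for base in sequence.upper())


def concentration_molar(a260, sequence, path_cm=1.0, dilution_factor=1.0):
    """Beer-Lambert law: c = A / (epsilon * l), corrected for any dilution."""
    return a260 * dilution_factor / (extinction_coefficient(sequence) * path_cm)


# Example: a 42mer of the first set, with one code read off Figure 3-6b.
oligo = "GGAGCTTGTGAATTCTGG" + "ATCTTA" + "GGACGTGTGTGAATTGTC"
eps = extinction_coefficient(oligo)
conc = concentration_molar(0.5, oligo)   # hypothetical reading of A260 = 0.5
print(eps, f"{conc * 1e6:.2f} uM")
```

The simple sum ignores base-stacking effects, so it should be read as the approximation used here rather than as an exact extinction coefficient.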

3.1.8 Polymerase Klenow encoding

Following coupling with the 200 different carboxylic acids in the second step of library construction (see Chapter 3.1.1, Figure 3-1), an annealing step was performed with individual 44mer oligonucleotides, partially complementary to the 42mer oligonucleotides carrying the chemical modification (see Chapter 3.1.3). A subsequent Klenow fill-in DNA-polymerization step at 37 °C yielded double-stranded DNA fragments, each of which contained both identification codes (see Chapter 3.1.1, Figure 3-1). The Klenow fragment83 is a large protein fragment produced when DNA polymerase I from E. coli is enzymatically cleaved by the protease subtilisin. It exhibits optimal performance at 37 °C and retains the 5’ → 3’ polymerase activity and the 3’ → 5’ exonuclease activity for removal of precoding nucleotides and proofreading, but lacks the 5’ → 3’ exonuclease activity. The Klenow fragment is therefore very suitable for fill-in reactions of partially complementary DNA strands at mild temperature. Conversely, polymerase fill-in using conventional Taq DNA polymerase requires a higher polymerization temperature (75-80 °C), which may compromise the stability of the conjugated compounds.

The products of the Klenow polymerase fill-in encoding were analysed by gel electrophoresis on 20 % polyacrylamide trisborate-EDTA (TBE) gels and 15 % polyacrylamide trisborate-EDTA urea (TBU) gels. In all reactions the polymerization went to completion.
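The outcome of the annealing and fill-in step can be sketched in Python by treating the two oligonucleotides as strings. The constant domains come from Chapter 3.1.3. The code 1 "ATCTTA" is read off Figure 3-6b, and the code 2 "AACCCCAA" is a hypothetical value, chosen here as the reverse complement of the eight bases seen at the code 2 position of the same read. The function names are invented for illustration.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def klenow_fill_in(oligo_1, oligo_2, overlap=18):
    """Anneal two oligos through their 3' ends and extend both strands.

    Returns (top, bottom), both written 5' to 3'. Raises ValueError if the
    3'-terminal `overlap` bases are not complementary to each other.
    """
    if reverse_complement(oligo_1[-overlap:]) != oligo_2[-overlap:]:
        raise ValueError("3' hybridization domains are not complementary")
    # Each 3' end is extended by copying the 5' overhang of the other strand.
    top = oligo_1 + reverse_complement(oligo_2[:-overlap])
    bottom = oligo_2 + reverse_complement(oligo_1[:-overlap])
    return top, bottom


oligo_1 = "GGAGCTTGTGAATTCTGG" + "ATCTTA" + "GGACGTGTGTGAATTGTC"     # 42mer
oligo_2 = "GTAGTCGGATCCGACCAC" + "AACCCCAA" + "GACAATTCACACACGTCC"   # 44mer

top, bottom = klenow_fill_in(oligo_1, oligo_2)
print(len(top), len(bottom))
print(top)
```

Under these assumptions the extended top strand reproduces one of the reads shown in Figure 3-6b up to its final base, and both strands are 68 nucleotides long, consistent with the 68 bp format of Figure 3-5.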

3.1.9 Summary

The individual steps described above were used for the construction of the DEL4000 library as shown in Chapter 3.1.1, Figure 3-1. After dissolving the 20 Fmoc-protected amino acids and the specific 5’-amino-modified oligonucleotide tags, a peptide bond formation reaction was performed. Following removal of the Fmoc protecting group, the reaction products were purified by HPLC and the appropriate fractions were dried under vacuum, dissolved in water and analyzed by mass spectrometry. Typical HPLC yields for this first step were over 43% (on average 65%). In a second step, the 20 compound-oligonucleotide conjugates were mixed in equimolar amounts (4 nmol each) in order to generate a first DNA-encoded sub-library of 20 amino-tagged compounds. The pool was then split equally into 200 vessels and each vessel underwent a second peptide bond formation with a different carboxylic acid. Following precipitation of the oligonucleotides of each reaction as phosphate adducts, the modification was enzymatically encoded by Klenow-assisted polymerization using a further DNA oligonucleotide fragment. At the same time, the encoding generated the desired double-stranded DNA format of the final DEL4000 library. After purification of the DNA over ion-exchange cartridges, the 200 reaction vessels were pooled to produce the final 4000-member DNA-encoded library (DEL4000). The library was aliquoted at a total DNA concentration of 300 nM and stored frozen prior to further use. Figure 3-5 schematically represents the general structure of a typical compound in the library.
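The quantities involved in the pooling and splitting steps follow from simple arithmetic, sketched below in Python. The calculation assumes ideal behaviour (no losses during precipitation or purification and an equimolar representation of all members), which the text does not claim. The variable names are chosen for illustration.

```python
N_FIRST = 20                # amino acid-oligonucleotide conjugates
N_SECOND = 200              # carboxylic acids, one per vessel
NMOL_PER_CONJUGATE = 4.0    # amount of each conjugate pooled (from the text)
TOTAL_DNA_NM = 300.0        # total DNA concentration of the stored library

pool_nmol = N_FIRST * NMOL_PER_CONJUGATE                              # pooled sub-library
per_vessel_nmol = pool_nmol / N_SECOND                                # total DNA per vessel
per_conjugate_per_vessel_pmol = NMOL_PER_CONJUGATE / N_SECOND * 1000  # each conjugate per vessel

library_size = N_FIRST * N_SECOND
per_member_pM = TOTAL_DNA_NM / library_size * 1000                    # nM to pM

print(pool_nmol, per_vessel_nmol, per_conjugate_per_vessel_pmol, per_member_pM)
```

Under these idealized assumptions each library member is present at about 75 pM in the 300 nM stock.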

[Figure 3-5, graphic not reproduced. The pharmacophore compound, assembled from the 1st and 2nd building blocks, is attached to the 5’-end of a double-stranded DNA of 68 bp in total, composed of a 1st constant domain (18 bp), CODE1 (6 bp, corresponding to the 1st building block), a 2nd constant domain (18 bp), CODE2 (8 bp, corresponding to the 2nd building block) and a 3rd constant domain (18 bp).]

Figure 3-5: Schematic representation of the general structure of a typical compound in the DEL4000 library. Each pharmacophore compound was assembled from two different building blocks (in green and red) in a split-&-pool fashion and was encoded by two corresponding DNA domains (green X and red Y) of six and eight base pairs respectively. The coding regions are both flanked by two constant PCR priming domains of 18 base pairs and by a constant spacer of 18 base pairs that acts as spacer between the codes.

3.2 Selections using the DEL4000 library

In order to investigate the functionality of the newly synthesized DEL4000 library and to validate the reliability of the selection and of the high-throughput sequencing read-out procedure, DEL4000 was panned against three target proteins (streptavidin, matrix metalloproteinase 3 and polyclonal human IgG) immobilized on a sepharose support in three independent selection experiments. Although the concentration of an individual library member is below 1 nM, binding compounds can be recovered efficiently by selection with biotinylated target protein in solution at concentrations above the dissociation constant Kd, followed by streptavidin capture. Similarly, the selection can be performed with the protein of interest immobilized at high surface density on a solid support (e.g., CNBr-activated sepharose), in full analogy to the procedures commonly used for the selection of antibodies from phage display libraries.84 Selections were therefore performed by incubating the DEL4000 library with the target protein attached to a sepharose resin (Figure 3-6a).54 The resin, containing the retained DNA-encoded binding molecules, was washed four times with 400 µL PBS and finally resuspended in 100 µL water for a subsequent PCR amplification step followed by high-throughput sequencing (Figure 3-6a). The sequences obtained by high-throughput sequencing were analysed with an in-house program written in C++, and the frequency of each code combination corresponding to an individual pharmacophore was plotted in a 3D graph in which the xy plane represents the 4000 different sequences (compounds) of the library, while the number of sequence counts for each compound is reported on the z axis (Figure 3-6b).

[Figure 3-6, graphic not reproduced. Panel a) workflow in four steps (I to IV): DEL4000 library, target protein, PCR amplification, high-throughput sequencing. Panel b) left: list of raw sequence reads with CODE1 and CODE2 highlighted, for example GGAGCTTGTGAATTCTGGATCTTAGGACGTGTGTGAATTGTCTTGGGGTTGTGGTCGGATCCGACTA; right: 3D plot with Code 1 (1 to 20) and Code 2 (1 to 200) on the xy plane and the number of sequences on the z axis.]

Figure 3-6: Selection and high-throughput sequencing workflow. a) DEL4000 was incubated with the target protein immobilized on a sepharose resin (i). The resin, containing the retained DNA-encoded binding molecules, was washed several times (ii) and used as template in a polymerase chain reaction (PCR) amplification (iii) prior to decoding by high-throughput sequencing (iv). b) An in-house C++ program processes several thousand raw DNA sequences after high-throughput sequencing (left panel). The codes 1 and 2 present in every sequence are identified and plotted in a 3D graph (right panel). On the xy plane all 4000 possible compounds are represented as combinations of Code1+Code2, while the number of counts for a specific combination (compound) is reported on the z axis.
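The decoding step performed by the in-house C++ program is not described in detail. The Python sketch below is therefore not a reconstruction of that program but one plausible way to carry out the same task: the two codes are extracted with a regular expression built from the constant domains, and each code pair is counted. The four reads are transcribed from Figure 3-6b. Exact matching of the constant domains is a simplifying assumption, so reads with sequencing errors are simply discarded.

```python
import re
from collections import Counter

# Constant domains as they appear in a read of the top strand (see Chapter 3.1.3).
READ_PATTERN = re.compile(
    "GGAGCTTGTGAATTCTGG"      # PCR primer domain of the 42mers
    r"([ACGT]{6})"            # code 1
    "GGACGTGTGTGAATTGTC"      # hybridization domain
    r"([ACGT]{8})"            # code 2, as read on the top strand
    "GTGGTCGGATCC"            # start of the second primer domain (reverse complement)
)


def decode_reads(reads):
    """Count (code1, code2) pairs; reads that do not match exactly are skipped."""
    counts = Counter()
    for read in reads:
        match = READ_PATTERN.search(read.upper())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


reads = [
    "GGAGCTTGTGAATTCTGGATCTTAGGACGTGTGTGAATTGTCTTGGGGTTGTGGTCGGATCCGACTA",
    "GGAGCTTGTGAATTCTGGTGAAATGGACGTGTGTGAATTGTCCTGATCCCGTGGTCGGATCCGACTA",
    "GGAGCTTGTGAATTCTGGNTTAACTGGACGTGTGTGAATTGTCCTCTNTGTC",                  # ambiguous bases, truncated
    "GGAGCTTGTGAATTCTGGCAAGCTGGACGTGTGTAATTGTCGACTTCCCGTGGTCGGATCCGACTA",    # deletion in constant domain
]

counts = decode_reads(reads)
for (code_1, code_2), n in counts.items():
    print(code_1, code_2, n)
```

Mapping each code pair to a position in the 20 x 200 matrix would additionally require the code tables of Appendix 9.2. A more tolerant decoder could assign near-miss codes to their closest valid code, exploiting the minimum distances between codes described in Chapter 3.1.3.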

3.2.1 Streptavidin selection

We initially assessed the relative composition of the new library and its functionality by performing selection experiments on sepharose resin coated with streptavidin. Since a variety of streptavidin ligands were known, with dissociation constants ranging from the mM to the fM range,54 the challenge was to investigate whether binders with different affinities could be isolated from a library containing 4’000 members. D-desthiobiotin was chosen as positive control binder for streptavidin (Kd = 47 nM),54 and a D-desthiobiotin-oligonucleotide conjugate was synthesized, unambiguously encoded and added to a final concentration of 1 pM to the library of 4000 compounds (20 nM total DNA concentration). Subsequently the spiked library was added either to streptavidin-sepharose slurry or to a similar amount of sepharose slurry without streptavidin. Both resins had been preincubated with herring sperm DNA to prevent nonspecific binding. After incubation for 1 h at 25 ºC, the beads were washed 4 times with PBS buffer and used as template for PCR amplification of the selected codes.

3.2.1.1 Identification of streptavidin binding molecules

Figure 3-7 shows the results of the high-throughput sequencing analysis performed on the library before selection, after selection on unmodified sepharose resin used as negative control, and after selection on streptavidin-coated sepharose.

Figure 3-7: Plots representing the frequency (i.e., sequence counts) of the 4000 library members before selection, after selection on empty resin and after selection on streptavidin resin, as revealed by high-throughput 454 sequencing. The chemical structures of some of the most relevant streptavidin binders are indicated. The building blocks used in the two synthetic steps are indicated in green and red color respectively, together with the respective identification number. A known streptavidin binder (desthiobiotin) had been mixed with the library at low concentration prior to the selections, serving as positive control.

High-throughput sequencing of the library containing 4000 DNA-encoded compounds yielded up to 12’000 sequences per sample. The counts for individual library codes (z axis of the 20 x 200 matrices in Figure 3-7) indicate the abundance of the corresponding oligonucleotide-compound conjugate. As expected, compounds were found to be represented in comparable amounts in the library before selection. The average count and the standard deviation for the 4’000 compounds were found to be 1.72 +/- 1.42 when analyzing 7’336 individual codes from the library before selection. Similarly, no striking enrichment was observed for selections on unmodified resin. By contrast, the decoding of the streptavidin selection revealed a preferential enrichment of certain classes of structurally related compounds (Figure 3-7). In addition to desthiobiotin, a biotin analogue with nanomolar affinity to streptavidin,54 which had been spiked into the library as positive control prior to selection (see Chapter 3.2.1), we observed an enrichment of derivatives of the thioester moiety 78, of the ester moiety 49, as well as of other pharmacophores (e.g., 175). Fluorescent amide derivatives of compounds 49 and 78 had previously been found to bind to streptavidin with dissociation constants in the millimolar range, as assessed by fluorescence polarization assays,54 while others (e.g., 175) had not previously been reported as streptavidin binders.
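One way to judge whether a given count reflects enrichment rather than sampling noise is to compare it with a Poisson distribution whose mean equals the average count of the unselected library. The Python sketch below does this for the reported mean of 1.72. The Poisson model is an assumption introduced here for illustration and is not necessarily the simulation referred to in Chapter 3.2.4. The function names are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """Probability of observing exactly k counts, computed in log space."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def poisson_tail(k, mean, extra_terms=200):
    """P(X >= k): sum the upper tail directly to avoid cancellation errors."""
    return sum(poisson_pmf(i, mean) for i in range(k, k + extra_terms))


MEAN_COUNT = 1.72   # average count per compound before selection (from the text)

for threshold in (5, 10, 30):
    print(threshold, f"{poisson_tail(threshold, MEAN_COUNT):.3e}")

# Standard deviation expected for a Poisson distribution with this mean.
print(round(math.sqrt(MEAN_COUNT), 2))
```

Under this model the expected standard deviation is about 1.31, close to the reported 1.42, and the probability of a compound reaching 30 counts by chance at a mean of 1.72 is vanishingly small. The mean count of a selected sample may differ from that of the unselected library, so the thresholds shown are only indicative.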

3.2.1.2 Characterization of streptavidin binding molecules

In order to evaluate whether the extensions of the pharmacophore moieties 49 and 78 within the new 4’000-membered chemical library (02, 07, 11, 15, 16, 17, depicted in green color in Figure 3-7, Chapter 3.2.1.1) contribute to an increased affinity towards streptavidin, we measured the dissociation constants of the most enriched compounds by fluorescence polarization at 25 °C, following conjugation to fluorescein (Figure 3-8a; see also Chapter 3.2.4). Additionally, to assess the specificity of the preferentially enriched compounds, we determined the binding affinities towards two unrelated proteins serving as negative controls (bovine carbonic anhydrase II and hen egg lysozyme; Figure 3-8b,c), and we included four non-enriched compounds (15-117, 02-107, 13-40 and 15-78) in the analysis.

[Figure 3-8, graphic not reproduced. Panels: a) Streptavidin, b) Carbonic anhydrase II, c) Lysozyme. Each panel plots Fluorescence Polarization [mP] (0 to 400) against Concentration [M] (10⁻⁸ to 10⁻³). Legend, with sequence counts in brackets: 02-78 (108), 07-78 (73), 17-78 (70), 16-78 (55), 17-49 (32), 11-78 (48), 02-49 (41), 02-107 (0), 13-40 (2), 15-117 (0), 15-78 (7). The chemical structures of the DNA conjugates are drawn below the plots.]

Figure 3-8: Dissociation constants of the selected compounds determined by fluorescence polarization. Individual compounds identified in the streptavidin selection experiments were synthesized as fluorescein conjugates and incubated with different concentrations of target proteins (streptavidin, bovine carbonic anhydrase II and hen egg lysozyme). a) The top streptavidin binding molecules (various shades of blue), identified with at least 30 counts (see Chapter 3.2.4), exhibited preferential binding towards streptavidin, with Kd values ranging between 350 nM and 11 μM. By contrast, non-enriched compounds (shades of red) did not exhibit appreciable binding to streptavidin (Kd > 50 μM). b, c) Neither the streptavidin binders nor the non-enriched compounds exhibited appreciable binding to carbonic anhydrase II or to lysozyme. The structures of the 11 compounds can be found in Figure 3-7.

The dissociation constants towards streptavidin of the most enriched compounds ranged between 350 nM and 11 μM [Kd (17-49) = 350 nM; Kd (02-78) = 385 nM; Kd (17-78) = 374 nM; Kd (02-49) = 804 nM; Kd (16-78) = 1.1 μM; Kd (11-78) = 3.5 μM; Kd (07-78) = 11 μM; Figure 3-8a]. These compounds, each represented at least 30 times in the high-throughput sequencing results, were found at least ten times more frequently after selection on streptavidin than in the unselected library and than would be predicted by a random statistical distribution (for a simulation, see Chapter 3.2.4). By contrast, four randomly chosen negative-control compounds, each found no more than 7 times after sequencing, exhibited Kd values to streptavidin > 50 μM (Figure 3-8a). Importantly, none of the compounds exhibited appreciable binding affinity (Kd > 200 μM; Figure 3-8b,c) towards lysozyme and carbonic anhydrase, which served as negative control proteins, thus confirming the specificity of the streptavidin selection. Table 3-2 summarizes the dissociation constants of the tested compounds towards the different targets.

Fluorescent   Counts after         Streptavidin   Lysozyme   Carbonic anhydrase
compound      DEL4000 selection    Kd (μM)        Kd (μM)    Kd (μM)
13-40         2                    54             384        703
11-78         48                   3.5            753        781
17-49         32                   0.35           1.9e3      225
17-78         70                   0.37           834        264
16-78         55                   1.1            5.2e3      1e5
15-117        0                    99             1.3e7      1.58e8
02-49         41                   0.80           1.9e4      452
02-78         108                  0.38           3.9e3      5.4e7
07-78         73                   11             9.8e8      448
15-78         7                    79             1.9e7      6.9e6
02-107        0                    50             694        1.4e8

Table 3-2: Complete list of the dissociation constants of the selected compound-fluorescein conjugates towards the different targets, as determined by fluorescence polarization measurements.
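The fitting procedure used to obtain these dissociation constants is not detailed in this section. As an illustration only, the Python sketch below fits a simple one-site binding model, FP = FP_free + (FP_bound - FP_free) · [P] / (Kd + [P]), which assumes that the protein is in large excess over the fluorescent ligand. The data are synthetic, generated from a hypothetical Kd of 350 nM, and the grid search over log Kd is a choice made here to keep the example free of external libraries.

```python
import math


def fraction_bound(protein_conc, kd):
    """Fraction of ligand bound for a 1:1 equilibrium with protein in excess."""
    return protein_conc / (kd + protein_conc)


def fit_kd(concs, signals, log10_min=-9.0, log10_max=-2.0, steps=701):
    """Grid search over Kd; the two plateau values are fitted by linear regression."""
    n = len(concs)
    mean_y = sum(signals) / n
    best = None
    for i in range(steps):
        kd = 10 ** (log10_min + i * (log10_max - log10_min) / (steps - 1))
        f = [fraction_bound(c, kd) for c in concs]
        mean_f = sum(f) / n
        var_f = sum((x - mean_f) ** 2 for x in f)
        if var_f == 0.0:
            continue
        slope = sum((x - mean_f) * (y - mean_y) for x, y in zip(f, signals)) / var_f
        offset = mean_y - slope * mean_f
        sse = sum((offset + slope * x - y) ** 2 for x, y in zip(f, signals))
        if best is None or sse < best[0]:
            best = (sse, kd, offset, offset + slope)
    _, kd, fp_free, fp_bound = best
    return kd, fp_free, fp_bound


# Synthetic, noise-free titration (all values hypothetical).
TRUE_KD = 3.5e-7                                    # M
concs = [10 ** (-8 + 0.5 * i) for i in range(11)]   # 10 nM to 1 mM
signals = [50.0 + 300.0 * fraction_bound(c, TRUE_KD) for c in concs]

kd_fit, fp_free_fit, fp_bound_fit = fit_kd(concs, signals)
print(f"Kd = {kd_fit:.2e} M, FP_free = {fp_free_fit:.0f} mP, FP_bound = {fp_bound_fit:.0f} mP")
```

With a model of this kind, a titration that never approaches saturation leaves Kd poorly constrained, so very large fitted values are best read as "no appreciable binding in the tested range" rather than as measured constants.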

3.2.2 Polyclonal human IgG selection

Immunoglobulin G (IgG) is an immunoglobulin consisting of two γ heavy chains (H) and two light chains (L) linked to each other by disulfide bonds, with a total molecular weight of approximately 150 kDa.85 As for the other immunoglobulins, the variable portions (V domains) of the heavy and light chains (VH and VL respectively) confer on the antibody the ability to bind a specific antigen, whereas the constant domains (C domains, CH and CL respectively) determine the isotype and therefore the functional properties of the antibody.85 IgG is the most abundant immunoglobulin, with four different isotypes (IgG1, 2, 3 and 4 in humans), representing 75% of serum immunoglobulins in humans.85 IgG molecules are synthesised and secreted by plasma B cells and are predominantly involved in the secondary antibody response.85 Two antigen binding sites allow the binding of IgG to a variety of pathogens (e.g. viruses and fungi), protecting the body against them by agglutination, immobilization, complement activation, opsonization for phagocytosis and neutralization of their toxins.85 IgG plays a fundamental role in the immune defence against pathogens, and certain monoclonal antibodies can be used for pharmaceutical applications. Consequently the production and engineering of therapeutic antibodies has attracted the interest of numerous pharmaceutical companies.86 For this reason, in a second selection with the DEL4000 library we aimed to identify small organic molecules that bind to polyclonal human IgG immobilized on CNBr-activated sepharose, and which could be useful for the affinity purification of human IgG in industrial manufacturing.

3.2.2.1 Identification of polyclonal IgG binding molecules

After selection of the DEL4000 library on polyclonal human IgG-sepharose resin and a PCR amplification step, decoding by high-throughput sequencing was performed. A total of 39’092 sequence tags were identified. Figure 3-9 graphically summarizes the high-throughput sequencing results, revealing a superior enrichment after selection of the derivatives of compound 40 (927 overall combination counts) and of the thiophene moiety 69 (927 overall combination counts). For example, bromide 02-40 was identified 96 times out of a total of 39’092 identified sequence tags, while >50% of library members were detected between 1 and 10 times and approximately 10% of the compounds were identified more than 20 times (see also Chapter 3.2.4).

[Figure 3-9, graphic not reproduced. 3D plot of sequence counts (z axis) against Code 1 (1 to 20) and Code 2 (1 to 200), annotated with the chemical structures of enriched DNA conjugates, among them derivatives of building blocks 40 and 69.]

Figure 3-9: Plot representing the frequency (i.e., sequence counts) of DEL4000 library members after selection on polyclonal human IgG resin, as revealed by high-throughput 454 sequencing. The chemical structures of some of the most relevant compounds enriched are indicated. The building blocks used in the two synthetic steps are indicated in green and red colour respectively, together with an identification number.

3.2.2.2 Characterization of polyclonal IgG binding molecules by affinity chromatography resins

Using the diamino linker O-bis-(aminoethyl) ethylene glycol, compounds 02-40 and 16-40 were coupled to CNBr-activated sepharose, and the resulting resins were evaluated for their performance in the affinity capture of labelled (with Cy5 fluorescent dye or biotin) polyclonal human IgG spiked into Chinese hamster ovary (CHO) cell supernatant. After loading 100 μL (4 μM) of labelled polyclonal human IgG on 70 mg of either 02-40-sepharose resin or 16-40-sepharose resin, the affinity chromatography columns were washed with 5 mL PBS, 5 mL 500 mM NaCl, 0.5 mM EDTA and 5 mL 100 mM NaCl, 0.1% Tween 20, 0.5 mM EDTA, and eluted three times with 200 μL of 100 mM triethylamine. All the fractions were collected, concentrated back to a final volume of 100 µL by centrifugation and subsequently analyzed by gel electrophoresis. Figure 3-10 shows that IgG labelled either with the fluorophore Cy5 or with biotin could be completely and selectively captured from the supernatant, and could be eluted using 100 mM aqueous triethylamine solution.

[Figure 3-10, graphic not reproduced. Top: structure of the resin carrying compound 02-40 or 16-40. Bottom: four gel panels. For Cy5-labeled IgG: Coomassie Blue and Cy5 detection. For biotinylated IgG: Coomassie Blue and streptavidin-based blot. Lanes in each panel: M, In, W, E, (+). Marker bands: 225, 150, 102, 76, 52, 38.]

Figure 3-10: Affinity chromatography of CHO cell supernatant, spiked with human IgG labeled either with Cy5 or with biotin, on resin containing the compound 02-40 or 16-40. For antibody purifications, relevant fractions were analyzed by SDS-PAGE both with Coomassie Blue staining and with a specific detection method (Cy5 fluorescence and a streptavidin horseradish peroxidase-based blot, respectively). M = molecular weight marker; In = input fraction for the chromatographic process; W = pooled wash fractions; E = pooled eluted fractions. The lane (+) corresponds to Cy5- or biotin-labeled polyclonal human IgG. In, W and E fractions were normalized to the same volume prior to SDS-PAGE analysis.

70 3.2.3 Matrix metalloproteinase 3 (MMP3) selection General catabolism of tissue structures by tumour-cell proteases provides access to the vascular and lymphatic systems, thereby facilitating metastases and cancer dissemination.87 Proteolytic enzymes, through their capacity to degrade extracellular matrix (ECM) proteins, are important components of this process. Among protease- like proteins, the matrix metalloproteinases (MMPs) are a group of 24 zinc-dependent enzymes capable of degrading the ECM and the basement membrane and process bioactive mediators.88 For this reason, MMPs have been the focus of much anticancer research, with inhibitors investigated in clinical trials. The establishment of causal relationships between MMP overexpression and tumour progression initially encouraged the development of MMP inhibitors (MMPIs) as cancer therapeutics.89 In addition to connective-tissue-remodelling functions, MMPs are known to precisely regulate the function of bioactive molecules by proteolytic processing. For example, MMPs mediate cell-surface-receptor cleavage and release, cytokine and chemokine activation and inactivation, and the release of apoptotic ligands.90 These processes are involved in cell proliferation, adhesion and dispersion, migration, differentiation, angiogenesis, apoptosis and host defence evasion characteristic of the early stages of tumour growth, before metastasis occurs.89,90 MMPIs may therefore be potentially suitable for blocking cancer progression. At the same time, inhibitors with insufficient specificity may suppress normal tissue function or host defence processes. 
Very recently, our group used a 550-member ESAC library (for the technology see Chapter 2.1.2) in a two-step selection procedure for the identification of novel inhibitors of stromelysin-1 (MMP-3), a matrix metalloproteinase involved in both physiological and pathological tissue remodeling processes, yielding novel inhibitors with micromolar potency suitable for subsequent medicinal chemistry optimization.57 Encouraged by these promising results, we decided to perform an MMP3 selection with the larger DEL4000 library.

3.2.3.1 Identification of MMP3 binding molecules

Figure 3-11a shows the relative abundance of the individual compounds as obtained from high-throughput sequencing. A different fingerprint compared to the streptavidin and IgG selections was observed. Among the compounds which displayed the highest enrichment, four compounds were selected (02-118, 13-17, 18-96, 17-104) and tested for MMP3 binding and inhibition.

3.2.3.2 Characterization of MMP3 binding molecules

The MMP3 affinity constants of compounds 02-118, 13-17, 18-96 and 17-104 were determined by fluorescence polarization at 25 °C, following conjugation to fluorescein using the diamino linker O-bis-(aminoethyl) ethylene glycol (Figure 3-11b). Compound 02-118 exhibited the best dissociation constant (Kd of 11 μM), while the other selected compounds did not reveal appreciable binding to MMP3 (Figure 3-11b). Inhibition assays were performed by incubating MMP3 (500 nM) with a dilution series of the inhibitor (02-118, 13-17, 18-96 or 17-104), using Mca-Pro-Leu-Gly-Leu-Dpa-Ala-Arg-NH2 as fluorogenic substrate. No substantial inhibition was observed for any of the compounds tested. This observation led us to conclude that compound 02-118 likely binds at a site outside the catalytic pocket.
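As a rough illustration of what a Kd of 11 μM means in a titration of this kind, the sketch below evaluates a simple 1:1 binding isotherm. It assumes that the fluorescein-labeled compound is present at a concentration far below Kd, so that the free protein concentration approximates the total one. Both this model and the function name `fraction_bound` are assumptions made for illustration; the fitting procedure actually used for Figure 3-11b is not described here.

```python
def fraction_bound(protein_conc_molar: float, kd_molar: float) -> float:
    """Fraction of labeled ligand bound in a simple 1:1 model (ligand << Kd).

    Assumption: free protein concentration ~ total protein concentration.
    """
    return protein_conc_molar / (kd_molar + protein_conc_molar)


# Dissociation constant reported in the text for compound 02-118 (in mol/L).
KD_02_118 = 11e-6

if __name__ == "__main__":
    # Illustrative protein concentrations, not the experimental dilution series.
    for conc in (1e-7, 1e-6, 1e-5, 1e-4):
        bound = fraction_bound(conc, KD_02_118)
        print(f"[MMP3] = {conc:.0e} M -> fraction bound = {bound:.3f}")
```

In this model half of the labeled compound is bound when the protein concentration equals Kd, which is why a compound with a Kd in the micromolar range only shows a clear polarization change at micromolar protein concentrations.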

[Figure 3-11, panels a) and b): graphic not reproduced. Panel b) x-axis: [MMP3] (M).]

Figure 3-11: DEL4000 library selection with human MMP3. a) The plot represents the frequency (i.e., sequence counts) of DEL4000 library members after selection on human MMP3 resin, as revealed by high-throughput 454 sequencing. The chemical structures of some of the most relevant enriched compounds are indicated. The building blocks used in the two synthetic steps are indicated in green and red colour respectively, together with an identification number. b) Determination of MMP3 affinity constants by fluorescence polarization for the compounds 02-118, 13-17, 18-96, 17-104. Compound 02-118 exhibited the best dissociation constant (Kd of 11 μM), while the other selected compounds did not reveal appreciable binding to MMP3.

3.2.4 Computational simulation of DEL4000 selections

In order to assess whether the enrichment of a compound in high-throughput sequencing procedures is statistically significant, we simulated the stochastic distribution of sequence counts, using software written in-house (Dr. Y. Zhang) in C++. The program generated a pool of 4000 equally likely numerical codes representing the 4000 members of the DEL4000 library. A number of codes equal to the number of sequences obtained after high-throughput sequencing decoding was then picked at random by the software from this pool. The simulation was repeated 100 times. The average simulated distribution was plotted, displaying the number of codes (i.e., DNA-encoded compounds in the library) which would be observed with a given number of counts (i.e., number of sequences) in an ideal library before selection, in which all library members were equally represented. This simulated distribution was compared with the experimental distribution of the sequence counts observed for the members of the library after 454-assisted sequencing of the PCR reaction before selection (Figure 3-12a), after selection on Tris-quenched resin (Figure 3-12b) and on streptavidin-resin (Figure 3-12c), as well as on resin coated with human matrix metalloproteinase 3 (Figure 3-12d) and with polyclonal human IgG (Figure 3-12e).
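The sampling procedure described above can be sketched in a few lines of Python. This is not the in-house C++ program, only a minimal re-implementation of the idea under the stated assumptions (4000 equally likely codes, random picking with replacement, averaging over 100 repetitions); the function name, the fixed random seed and the output format are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def simulate_count_histogram(n_codes, n_reads, n_repeats=100, seed=0):
    """Average number of codes observed with each read count.

    Each repetition draws n_reads codes uniformly at random (with
    replacement) from n_codes equally represented library members.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    totals = Counter()
    for _ in range(n_repeats):
        # Number of reads obtained for every code that was drawn at least once.
        reads_per_code = Counter(rng.randrange(n_codes) for _ in range(n_reads))
        # Histogram for this repetition: read count -> number of codes.
        codes_per_count = Counter(reads_per_code.values())
        codes_per_count[0] = n_codes - len(reads_per_code)  # codes never drawn
        totals.update(codes_per_count)
    return {count: totals[count] / n_repeats for count in sorted(totals)}


if __name__ == "__main__":
    # 7336 is the number of sequences reported for the library before selection.
    histogram = simulate_count_histogram(n_codes=4000, n_reads=7336)
    for count, n_with_count in histogram.items():
        print(count, round(n_with_count, 1))
```

Plotting the returned values against the read count gives the kind of simulated curve that the experimental distributions are compared with; compounds whose counts lie far to the right of this curve are candidates for genuine enrichment.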


Figure 3-12: Simulated and experimental distribution of sequence counts observed for members of the library before selection (a) and after selection on Tris-quenched resin (b), streptavidin-resin (c), as well as resin coated with human matrix metalloproteinase 3 (d) and with polyclonal human IgG (e). The plots display the number of codes (i.e., DNA-encoded compounds in the library), which were observed with a given number of counts (i.e., number of sequences) either in the experimental 454-assisted sequence of PCR reaction (performed before or after selection), or in a computer-assisted simulation. While in the library before selection experimental findings and simulation are in excellent agreement, in selection experiments certain compounds are enriched much more compared to what would be predicted from the statistical distribution of sequence counts in an equimolar mixture of compounds. The sequences of compounds in plot (c) identified with an asterisk were found more than 30-times; these compounds were then chosen for the experimental affinity determination (Figure 3-8). The individual plots exhibit a different maximum for the simulated curve of number of codes observed with a certain number of counts, due to differences in the overall number of experimental sequences (e.g., 7’336 overall sequence counts for the library before selection; 39’032 overall sequence counts for IgG selections).

While in the library before selection experimental findings and simulation are in excellent agreement, in selection experiments certain compounds are enriched far more than would be predicted from the stochastic distribution of sequence counts in an equimolar mixture of compounds. The sequences of compounds in the plot of Figure 3-12c which were identified more than 30 times (indicated with an asterisk in Figure 3-12c) were then chosen for the experimental affinity determination (see Chapter 3.2.1.1, Figure 3-7). Notably, the individual plots exhibit a different maximum for the simulated curve of number of codes observed with a certain number of counts, due to differences in the overall number of experimental sequences (e.g., 7336 overall sequence counts for the library before selection; 39032 overall sequence counts for IgG selections).
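The shift of the simulated maximum follows directly from the average number of reads per compound, which is roughly 1.8 for 7336 sequences and roughly 9.8 for 39032 sequences spread over 4000 codes. The sketch below computes these means and, as an illustration of how unlikely high counts are by chance, the binomial probability that a single code is read at least a given number of times in an equimolar library. The function names are hypothetical, and the threshold of 30 is applied here to the IgG read total purely as an example, since the read total of the streptavidin selection is not quoted in this passage.

```python
import math


def mean_count(n_reads, n_codes):
    """Expected number of reads per code in an equimolar library."""
    return n_reads / n_codes


def binomial_tail(n_reads, n_codes, threshold):
    """P(X >= threshold) for X ~ Binomial(n_reads, 1 / n_codes)."""
    p = 1.0 / n_codes
    log_p, log_q = math.log(p), math.log1p(-p)

    def log_pmf(k):
        # Logarithm of C(n_reads, k) * p**k * (1 - p)**(n_reads - k).
        return (math.lgamma(n_reads + 1) - math.lgamma(k + 1)
                - math.lgamma(n_reads - k + 1)
                + k * log_p + (n_reads - k) * log_q)

    tail = 0.0
    for k in range(threshold, n_reads + 1):
        term = math.exp(log_pmf(k))
        tail += term
        # Past the mean the terms only shrink; stop once they are negligible.
        if k > n_reads * p and term < tail * 1e-16:
            break
    return tail


if __name__ == "__main__":
    for n_reads in (7336, 39032):
        print(n_reads, round(mean_count(n_reads, 4000), 3))
    print(binomial_tail(39032, 4000, 30))
```

Multiplying the tail probability by the 4000 library members gives the number of compounds expected to reach the threshold by chance alone, which can be compared with the number actually observed.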

3.3 General strategies for the stepwise construction of very large DNA encoded chemical libraries

The demonstration that high-quality DNA-encoded chemical libraries could be synthesized and decoded using 454 high-throughput sequencing technology encouraged us to investigate methodologies for the construction of larger DNA-encoded chemical libraries of unprecedented size (potentially comprising >10^6 compounds), featuring the stepwise addition of at least three independent sets of chemical moieties and identification oligonucleotide tags. We therefore investigated a three-round split-&-pool chemical library synthesis based on selective deprotection and reaction of di-amine carboxylic acid derivative core scaffolds, as well as three different encoding strategies, featuring the stepwise insertion of three independent oligonucleotide codes using experimental procedures based on the sticky-end ligation of DNA fragments and/or on the annealing of partially complementary oligonucleotides, followed by Klenow-assisted polymerization.

3.3.1 Selective deprotection and reaction of di-amine derivatives

The general strategy for the construction of a DNA-encoded chemical library consisting of N x M x K modules (i.e., 10 x 200 x 200 compounds) joined together by the formation of an amide bond using a split-&-pool procedure is given in Figure 3-13 (see also Chapter 2.1.1.4, Figure 2-8). Initially, a set (i.e., N = 10) of di-amino protected carboxylic acids is conjugated to distinct amino-modified synthetic oligonucleotides. Cleavage of one amino protective group (PG1) of each of the core scaffolds, followed by a split-&-pool amide bond formation reaction with selected carboxylic acids (i.e., M = 200) and subsequent enzymatic encoding, leads to a first sub-library pool of N x M members. After removal of the second amino protective group (PG2) and splitting, the N x M pool may undergo an additional amide bond formation reaction with suitable carboxylic acids (i.e., K = 200). Encoding of the last set of carboxylic acids through enzymatic elongation of the oligonucleotide tags, followed by pooling of the reactions, leads to the final library mix of N x M x K member compounds (i.e., 400’000).
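The arithmetic behind these pool sizes can be checked with a short sketch. It only multiplies the set sizes quoted above (N = 10, M = 200, K = 200); the function name is a hypothetical choice for illustration.

```python
from math import prod


def pool_sizes(set_sizes):
    """Number of distinct compounds in the pool after each split-&-pool round."""
    return [prod(set_sizes[: i + 1]) for i in range(len(set_sizes))]


if __name__ == "__main__":
    set_sizes = [10, 200, 200]  # N scaffolds, M and K carboxylic acids
    print(pool_sizes(set_sizes))  # sizes after rounds 1, 2 and 3
    print(sum(set_sizes))         # number of building blocks handled separately
```

The point of the split-&-pool design is visible in the two printed quantities: the number of compounds grows as the product of the set sizes, whereas the number of building blocks that have to be handled individually grows only as their sum.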

[Figure 3-13: scheme not reproduced. It shows N = 10 scaffolds bearing PG1- and PG2-protected amino groups carried through two cycles of split / reaction / encoding / pool, with M = 200 and K = 200 carboxylic acids.]

Figure 3-13: General strategy for the construction of a DNA-encoded chemical library consisting of N x M x K modules (i.e., 10 x 200 x 200 compounds). The amino protective groups (PG1) of a set (i.e., N = 10) of di-amino protected carboxylic acids conjugated to unique oligonucleotide tags are removed. Subsequently, an amide bond formation reaction is performed in a split-&-pool fashion with selected carboxylic acids (i.e., M = 200). After enzymatic encoding, cleavage of the second amine protective group (PG2) allows an additional split-&-pool amide bond formation reaction with carboxylic acids (i.e., K = 200). DNA encoding of the final modifications leads to the final DNA-encoded library of N x M x K compounds (i.e., 400’000).

3.3.1.1 Orthogonal protective group and selective deprotection

The choice of appropriate orthogonal protective groups and of convenient di-amino carboxylic acid core scaffolds is crucial for the construction of a DNA-encoded library as described in the previous paragraph (Figure 3-13). A list of useful protective groups for amino moieties, with removal conditions compatible with DNA, is shown in Table 3-3. The DNA-compatibility of the cleavage conditions (for Fmoc cleavage see Chapter 3.1.2) was assessed by coupling a specific N-protected cis-2-aminocyclopentanecarboxylic acid to a 5’-amino-modified oligonucleotide. Following purification by HPLC, the cleavage of the protective group from the amino moiety was performed and analyzed by HPLC and mass spectrometry (Table 3-3).

Compounds 1a-d: N-protected cis-2-aminocyclopentanecarboxylic acid

Compound | Protective group (PG) | Type | Cleavage | HPLC yield
1a | 9H-fluoren-9-methyl carbamate (Fmoc) | Base labile | Piperidine 500 mM, water/DMSO, 4 ºC, 1 h | Quantitative
1b | 4,5-dimethoxy-2-nitrobenzyl carbamate (Nvoc) | Photocleavable | 366 nm, 1 mM AcOH/AcONa pH 4.7, Pyrex, 4 ºC, 30 min | Quantitative
1c | pent-4-enamide | Iodolactonization | I2, THF/water, 1 h | 80 %
1d | 2-(biphenyl-4-yl)propan-2-yl carbamate (Bpoc) | Acid labile | AcOH/AcONa, water, pH 3-4, 35 ºC, 1 h | 90 %

Table 3-3: Protective groups for amino moieties with cleavage conditions compatible with DNA. Cis-2-aminocyclopentanecarboxylic acid was protected on the amino moiety with a selected protective group and coupled to a 5’-amino-oligonucleotide. Following conjugation, the protective group was removed and the yield of the cleavage was assessed by HPLC.

Prior to scaffold preparation, we investigated the selectivity of the orthogonal removal of a combination of two amino-protective groups in the presence of DNA. We explored the combination of Fmoc (base labile) and Nvoc (photo-cleavable) amino protective groups. After coupling of Nα-Fmoc, Nε-Nvoc lysine (2) to an amino-modified oligonucleotide, Fmoc was removed through addition of piperidine and the completeness of the reaction was assessed by HPLC (Figure 3-14). The mass of the expected N-Nvoc amino-acid oligonucleotide conjugate was confirmed by ESI-MS as the only product of the reaction.

[Figure 3-14: reaction scheme not reproduced. Conditions shown: NvocCl 1 eq., Na2CO3 2 eq., water/dioxane, 2 h; then 1) Sulfo-NHS, EDC, DMSO, 30 °C, 15 min; 2) (C12)-NH2 oligonucleotide, TEA/HCl pH = 10, 30 °C, o/n; 3) piperidine 500 mM, 4 °C, 1 h.]

Figure 3-14: Selective removal of the Fmoc protective group on a model Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid oligonucleotide conjugate. Initially, the terminal amino moiety of 2-N-Fmoc lysine was protected by means of the NvocCl reagent (i). Following coupling to a 5’-amino-oligonucleotide, piperidine was added and Fmoc removed (ii). After HPLC, ESI-MS revealed the N-Nvoc amino-acid oligonucleotide conjugate as the only product of the reaction.

3.3.1.2 Core scaffolds design and synthesis strategy

The confirmation that an Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid conjugated to an amino-modified oligonucleotide allows the selective cleavage of the Fmoc moiety, quantitatively yielding the corresponding N-Nvoc protected di-amino acid oligonucleotide conjugate, led us to prepare a variety of Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid core scaffolds for investigating the feasibility of the library synthesis pathway. Figure 3-15 depicts four convenient strategies for the preparation of Nα-Fmoc, Nε-Nvoc di-amino carboxylic acids.

[Figure 3-15: reaction schemes not reproduced. Conditions shown: b) 1. NvocCl 1 eq., DIEA, DMF/water; 2. FmocCl. c) 1. Pd(PPh3)2Cl2, K2CO3, DME/EtOH/water, μW; 2. FmocCl; X = Br, I; R = substituent with primary amino moiety. d) NvocCl 1 eq., DIEA, DMF/water.]

Figure 3-15: Scheme summarizing convenient strategies for the preparation of Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid scaffolds. a) Synthesis of Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid chiral scaffolds. All eight initial diastereomers are commercially available. The synthetic strategy allowing the preparation of the final Nα-Fmoc, Nε-Nvoc product is given in Figure 3-16. b) Preparation of an aromatic Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid symmetric scaffold. c) Synthesis of biphenyl Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid core scaffolds by means of microwave-assisted Suzuki cross-coupling, using either an amino boronic acid and suitable aromatic halides or an amino carboxylic boronic acid and suitable amino halides. d) Synthesis of short alkyl linkers as Nα-Fmoc, Nε-Nvoc di-amino carboxylic acids.

Notably, the strategy in Figure 3-15a allows, in a straightforward fashion, the synthesis of a large variety of stereoisomeric core scaffolds starting from chiral precursor compounds. Conversely, the strategies in Figure 3-15b,c,d describe convenient pathways for the preparation of aromatic, bi-phenyl and alkyl Nα-Fmoc, Nε-Nvoc carboxylic acid building blocks, respectively.

3.3.1.3 Model compounds for N-Fmoc, N’-Nvoc di-amino carboxylic acid core scaffold based library

In order to demonstrate the possibility of constructing a DNA-encoded library by means of Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid core scaffolds in an approach with three split-&-pool rounds, we initially synthesized two building blocks (Nα-Fmoc, Nε-Nvoc-lysine and (1S,3R,4R)-3-Nvoc-amino-4-Fmoc-amino-cyclopentanecarboxylic acid) according to the reaction schemes given previously (see Chapter 3.3.1.1, Figure 3-14) and in Figure 3-16a.

[Figure 3-16: reaction schemes not reproduced. a) Multi-step synthesis of the chiral Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid scaffold. b) Coupling I (biotin or 3-p-tolylpropanoic acid; Sulfo-NHS, EDC, DMSO, 30 °C, 15 min), Nvoc removal (366 nm, Pyrex, 1 mM AcOH/AcONa pH 4.7, 30 min) and coupling II (desthiobiotin; Sulfo-NHS, EDC, DMSO, 30 °C, 15 min) on the oligonucleotide conjugate.]

Figure 3-16: Preparation of DNA-encoded model compounds of the Nα-Fmoc, Nε-Nvoc protected di-amino carboxylic acid based library. a) Reaction scheme for the synthesis of an Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid chiral scaffold. The synthesis was accomplished with an overall yield of 70 %. b) The N-Nvoc di-amino acid oligonucleotide conjugates were coupled by amide bond formation reaction to biotin or to 3-p-tolylpropanoic acid. Irradiation at 366 nm at 4 °C for 30 min in 1 mM AcOH/AcONa (pH 4.7) enables Nvoc removal. In a final step, the two resulting compounds were coupled to a further carboxylic acid (desthiobiotin). HPLC and ESI-MS analysis confirmed the identity of the expected products.

Following coupling of the Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid derivative to a 5’-amino-modified oligonucleotide and Fmoc deprotection, the product was purified by HPLC. Biotin or 3-p-tolylpropanoic acid, as model carboxylic acids, were coupled by amide bond formation reaction to the N-Nvoc di-amino acid oligonucleotide conjugates. The reaction product was then purified from small organic contaminants by precipitation as sodium phosphate adduct (Figure 3-16b). Following Nvoc removal by irradiation at 366 nm at 4 °C for 30 min in 1 mM AcOH/AcONa (pH 4.7), the oligonucleotide conjugates were precipitated once more as sodium phosphate complexes (Figure 3-16b). In a final step, the two resulting compounds linked to the oligonucleotide were coupled to a further model carboxylic acid (desthiobiotin). HPLC and mass spectrometry analysis revealed an overall conversion of 35% and 38%, respectively, into the desired bi-substituted oligonucleotide conjugates.

3.3.2 Stepwise DNA-encoding

Encouraged by the preliminary results on selective deprotection and split-&-pool reaction of the di-amine derivatives (see Chapter 3.3.1), the next challenge prior to library construction was to investigate methodologies for the stepwise addition of at least three independent sets of oligonucleotide tags as identification codes for the chemical building blocks. Three experimental procedures were explored to add the oligonucleotide fragments in a stepwise fashion, either by enzymatic ligation of sticky-end oligonucleotides or through annealing of partially complementary DNA fragments followed by Klenow-assisted polymerization. The feasibility of the three alternative strategies was demonstrated by gel-electrophoretic analysis and by DNA sequencing of the final construct.

3.3.2 Encoding by ligation

Figure 3-17 features the stepwise addition of groups of chemical moieties onto an initial scaffold, followed by the sequential addition of the corresponding DNA codes by an iterative ligation procedure. This scheme (Figure 3-17a) is conceptually simple and can be implemented experimentally, but requires two double-stranded DNA fragments with sticky ends for each encoding event (i.e., 200 + 200 + 200 oligonucleotides for a library containing 100 x 100 x 100 chemical groups). Native PAGE analysis with a 20% TBE gel revealed the identity and purity of the DNA fragments used in the encoding procedure (Figure 3-17b).

[Figure 3-17: graphic not reproduced. a) Scheme of two reaction / ligation cycles adding Code B and Code C to the Code A fragment. b) Gel image with lanes M, a1, a2, a3, a4, a5.]

Figure 3-17: Stepwise encoding by ligation. a) Encoding strategy based on the sequential ligation of double-stranded DNA fragments. b) Native PAGE analysis with a 20% TBE gel revealed the identity and purity of the DNA fragments used in the encoding procedure. M: marker; a1) single-stranded 28mer DNA fragment; a2) single-stranded 32mer DNA fragment; a3) hybridization of the 28mer (a1) with the 32mer (a2) DNA fragment; a4) double-stranded 50mer product of the first ligation step; a5) double-stranded 78mer product of the second ligation step. *) The band is the hybridized oligonucleotide (of a 28mer and a 24mer) carrying Code C, which was used in excess.


3.3.2.1 Encoding by a combination of Klenow polymerase and ligation

An encoding strategy featuring the combination of the Klenow-assisted encoding strategy (see Chapter 3.1.8) and the encoding by ligation (see Chapter 3.3.2.1) is depicted in Figure 3-18a. A double-stranded DNA fragment generated by Klenow fill-in using a biotinylated template is digested with a non-palindromic cutter (i.e., BssSI), followed by streptavidin capture of the biotinylated residual fragments (Figure 3-18a). Subsequently, a ligation step with a complementary double-stranded DNA fragment carrying the third code is performed (Figure 3-18a). Denaturing PAGE analysis using a 15% TBE-Urea gel revealed the purity and identity of the DNA fragments generated in the encoding steps (Figure 3-18b).

[Figure 3-18: graphic not reproduced. a) Scheme: reaction, annealing, Klenow fill-in, reaction, BssSI digestion, streptavidin capture, ligation. b) Gel image with lanes M, a1, a2, a3, a4, a5, a6.]

Figure 3-18: Stepwise encoding by combination of Klenow polymerase and ligation. a) Encoding strategy based on the formation of a double-stranded DNA fragment by a Klenow-assisted polymerization step, followed by the ligation of a DNA fragment carrying the third code. b) Denaturing PAGE analysis using a 15% TBE-Urea gel revealed the purity and identity of the DNA fragments generated in the encoding steps. M = marker; a1) single-stranded 42mer DNA fragment; a2) 44mer partially complementary 5’-biotinylated single-stranded DNA fragment; a3) 27mer and 23mer hybridized DNA ligation fragments; a4) Klenow-assisted polymerization 68mer product; a5) BssSI digestion product (54mer); a6) full-length (81mer) DNA fragment. *) The band is an artefact resulting from incomplete denaturation. If excised, extracted and loaded on a gel, this band migrates at the expected height of a double-stranded 81-base DNA fragment.

3.3.2.2 Encoding by Klenow polymerase

The synthetic and encoding strategy depicted in Figure 3-19a represents a natural extension of the encoding strategy used in the assembly of the DEL4000 library (see Chapter 3.1.8), and would require the lowest number of oligonucleotides for library encoding (100 + 100 + 100 oligonucleotides for a library containing 100 x 100 x 100 chemical groups). The feasibility of the experimental procedure was demonstrated by denaturing 15% TBE-Urea gel-electrophoretic analysis (Figure 3-19b), which monitored the stepwise assembly of DNA fragments of suitable size.

[Figure 3-19: graphic not reproduced. a) Scheme: reaction, annealing, Klenow fill-in, streptavidin capture, reaction, second annealing and Klenow fill-in. b) Gel image with lanes M, a1, a2, a3, a4, a5.]

Figure 3-19: Stepwise encoding by Klenow polymerization. a) Encoding strategy based on the formation of a double-stranded DNA fragment by the sequential use of two Klenow-assisted polymerization steps, starting from partially complementary oligonucleotides. b) Denaturing PAGE analysis performed using a 15% TBE-Urea gel revealed the purity and identity of the DNA fragments generated in the three Klenow-mediated encoding steps. M = marker; a1) single-stranded 5’-amino-modified 42mer DNA fragment; a2) partially complementary 3’-biotinylated single-stranded DNA fragment; a3) 42mer single-stranded DNA fragment partially complementary to the first Klenow step product; a4) single-stranded 66mer DNA product following the first Klenow polymerization step and purification; a5) full-length (90mer) DNA fragment, following purification.
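The oligonucleotide requirements quoted for the ligation-based and the Klenow-based strategies can be compared with a short calculation. The sketch assumes, as the quoted figures suggest, that each code needs two oligonucleotides (both strands) when encoding by ligation and one oligonucleotide when encoding by Klenow polymerization; the function name is hypothetical.

```python
def oligos_required(codes_per_step, oligos_per_code):
    """Number of coding oligonucleotides needed in each encoding step."""
    return [n_codes * oligos_per_code for n_codes in codes_per_step]


if __name__ == "__main__":
    codes_per_step = [100, 100, 100]  # library of 100 x 100 x 100 chemical groups
    ligation = oligos_required(codes_per_step, oligos_per_code=2)  # assumption: both strands
    klenow = oligos_required(codes_per_step, oligos_per_code=1)    # assumption: one strand
    print(ligation, sum(ligation))
    print(klenow, sum(klenow))
```

Under these assumptions the Klenow-based route halves the number of synthetic oligonucleotides to be purchased, which becomes relevant as the number of building blocks per step grows.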

3.3.3 Summary

Based on the promising results achieved with the selective deprotection and reaction of di-amine derivatives (see Chapter 3.3.1 and Chapter 3.3.2), we have investigated the feasibility of a step-by-step encoding of a model library member comprising three building blocks. The N-Fmoc, N’-Nvoc protected di-amino carboxylic acid depicted in Figure 3-20a was coupled to a 5’-amino-modified oligonucleotide. Following Fmoc removal, the oligonucleotide conjugate was coupled to a further model carboxylic acid (3-p-tolylpropanoic acid) by amide bond formation reaction and precipitated as sodium phosphate adduct (Figure 3-20a). A subsequent Nvoc removal step (Chapter 3.3.1.3) allowed the modification of the resulting oligonucleotide derivative, carrying a reactive primary amino group. In order to generate a DNA fragment carrying three codes which uniquely identify the building blocks used for library construction, three experimental strategies were envisaged and experimentally demonstrated. One of the three strategies is depicted in Figure 3-20, featuring the use of a biotinylated oligonucleotide in the Klenow-assisted fill-in reaction for the introduction of the second code. A third step in the encoding procedure, featuring the ligation of a Cy3-labeled double-stranded DNA fragment, allowed the monitoring of the encoding procedure not only by EtBr DNA staining, but also by fluorescence imaging of the electrophoresis gel (Figure 3-20b). DNA sequencing confirmed the identity and the purity of the DNA constructs (Figure 3-20c).

[Figure 3-20, panels a) and b): graphics not reproduced. a) Reaction and encoding scheme (coupling, Fmoc removal, HPLC, coupling, ethanol precipitation, Nvoc removal, Klenow encoding, Cy5 coupling, BssSI digestion, streptavidin capture, ligation encoding). b) Gel images with lanes M, a1, a2, a3, a4, detected by ethidium bromide, Cy5 fluorescence and Cy3 fluorescence.]

c) Sequencing results. All eight colonies (1 to 8) carried Code A = GTTAGT, Code B = GATTACCA and Code C = TTTGCT.

5’-GGAGCTTGTGAATTCTGGGTTAGTGGACGTGTGTGAATTGTCGATTACCAGTACTCGTGAAATTTGCTAGGATCCATATTG-3’
3’-CCTCGAACACTTAAGACCCAATCACCTGCACACACTTAACAGCTAATGGTCATGAGCACTTTAAACGATCCTAGGTATAAC-5’

Positions (5’ to 3’ on the upper strand): EcoRI restriction site 10-15; Code A 19-24; Code B 43-50; BssSI restriction site 54-59; Code C 63-68; BamHI restriction site 70-75.

Figure 3-20: Step-by-step synthesis and encoding of a model library member compound of the N-Fmoc, N-Nvoc di-amino carboxylic acid based library. a) The N-Fmoc, N-Nvoc di-amino carboxylic acid compound was conjugated to a 5’-amino-modified oligonucleotide (42mer), (i). Following removal of Fmoc and HPLC (i), the coupling reaction with 3-p-tolylpropanoic acid was performed (ii). Subsequently, Nvoc was removed by irradiation at 366 nm and Klenow-assisted encoding was completed with a partially complementary 5’-biotinylated oligonucleotide (44mer) carrying a BssSI restriction site (iii). The extended DNA product was labelled with Cy5-N-hydroxysuccinimide ester reagent and restricted with BssSI enzyme (iv). Incubation with streptavidin sepharose beads allowed the removal of the small DNA restriction products (v). A Cy3-labelled oligonucleotide (23mer) carrying the third code was ligated (vi). b) Gel-electrophoretic analysis with specific detection methods (ethidium bromide, Cy5 and Cy3 fluorescence) monitored the stepwise assembly of DNA fragments of suitable size. a1) Klenow-assisted polymerization 68mer product; a2) Klenow-assisted 68mer product coupled to Cy5; a3) BssSI digestion product (54mer); a4) full-length (81mer) DNA fragment after ligation with the 23mer Cy3-labelled DNA fragment. *) The band is an artefact resulting from incomplete denaturation. If excised, extracted and loaded on a gel, this band migrates at the expected height of a double-stranded 81-base DNA fragment. c) Bacterial cloning and Sanger sequencing of eight different bacterial colonies revealed the identity of the DNA constructs.
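Decoding such a construct amounts to reading the three codes at fixed positions between the restriction sites. The sketch below does this for the 81-base upper strand shown in Figure 3-20c, using the position markers given in the figure (1-based, inclusive). The dictionary layout and the function names are hypothetical and serve only as an illustration of the read-out.

```python
# Upper strand (5' to 3') of the model construct shown in Figure 3-20c.
TOP_STRAND = (
    "GGAGCTTGTGAATTCTGGGTTAGTGGACGTGTGTGAATTGTCGATTACCAGTACTCGTGAAATTTGCTAGGATCCATATTG"
)

# 1-based inclusive positions, taken from the markers in Figure 3-20c.
FEATURES = {
    "EcoRI site": (10, 15),
    "Code A": (19, 24),
    "Code B": (43, 50),
    "BssSI site": (54, 59),
    "Code C": (63, 68),
    "BamHI site": (70, 75),
}


def extract(seq, start, end):
    """Return the subsequence between 1-based inclusive positions."""
    return seq[start - 1:end]


def complement(seq):
    """Base-by-base complement (not reversed), as drawn for the lower strand."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))


def decode(seq):
    """Read every annotated feature out of a sequenced construct."""
    return {name: extract(seq, start, end) for name, (start, end) in FEATURES.items()}


if __name__ == "__main__":
    for name, subsequence in decode(TOP_STRAND).items():
        print(f"{name}: {subsequence}")
    print(complement(TOP_STRAND))
```

In a real library each of the three extracted codes would be looked up in a table that maps code sequences to building blocks; that table is not part of this sketch.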

4. DISCUSSION

We have constructed a high-quality DNA-encoded chemical library containing 4000 compounds (DEL4000). This library was selected for the identification of novel streptavidin, MMP3 and IgG binders. High-throughput sequencing of the library before and after selection revealed the preferential enrichment of binding molecules. In the case of the newly discovered streptavidin binders, we have observed that both building blocks used for the stepwise synthesis of compounds in the library may contribute to the resulting binding affinity. For example, we observed a >100-fold difference in binding affinity between compounds 02-78 and 15-78, with Kd constants of 385 nM and 78 μM, respectively, in line with their different recovery rates after streptavidin selection (see Chapter 3.2.1.1 and 3.2.1.2, Figure 3-7 and Figure 3-8).

We have also shown that the encoding strategy followed for the construction of the DEL4000 library can be extended, for example by incorporating a third set of chemical groups and corresponding DNA-coding fragments (see Chapter 3.3.2). Recent advances in ultra high-throughput DNA sequencing with 454 technology indicate that it should be possible to obtain over one million sequence tags per sequencing run.44 Therefore, provided that two orthogonal synthetic procedures are used which feature high coupling yields and which preserve the integrity of the DNA molecule, it should be possible to construct, select and decode DNA-encoded libraries containing millions of chemical compounds.

The potential of using DNA tagging for the identification of binding compounds (e.g., in panning experiments) has long been recognized. However, in the last few years research in the field of DNA-encoded chemical libraries has been advanced by the development of novel methodologies for library construction and decoding. The recent interest in DNA-encoded chemical libraries is mainly related to the possibility of constructing libraries of unprecedented size, which can still be screened at low concentrations for protein binding, thanks to ultra-sensitive experimental procedures for DNA detection, such as the polymerase chain reaction (PCR) and high-throughput DNA sequencing. In full analogy to antibody phage libraries, DNA-encoded chemical libraries do not rely on biological assays for the identification of the binding molecules, but rather on the physical separation of binding molecules from non-binders. Therefore, affinity selection with DNA-encoded chemical libraries as shown in this work can be performed in one reaction tube with standard laboratory equipment, even with target proteins for which screening assays are not yet available.

While the work presented in this Thesis clearly illustrates the potential of DNA-encoded chemical libraries, the methodology can still be improved with respect to the synthetic procedures, the encoding strategies and the read-out methodologies (i.e., high-throughput sequencing). The relatively narrow choice of reactions for the conjugation of chemical moieties to DNA oligonucleotides still represents a limitation, which deserves to be addressed in the future.

At present, large pharmaceutical companies typically screen a few hundred thousand compounds in their high-throughput screening campaigns, facing enormous challenges for the preparation, storage and screening of very large libraries of organic molecules, not only from the synthetic point of view, but also in terms of logistics and analysis. Furthermore, the costs associated with the identification of specific binding molecules from a pool of candidates grow exponentially with the size of the chemical library to be screened. Thus, the combination of large repertoires of organic molecules and ingenious screening methodologies is recognized as an important approach for isolating desired binding molecules. For this reason, selections of DNA-encoded chemical libraries such as the one described in this Thesis may facilitate the identification of binding molecules (“hits”) for pharmaceutical applications.

Among the selections described in this work, the identification of binders to polyclonal human IgG appears to have the most direct application. At present, monoclonal antibodies for therapeutic applications represent the fastest growing sector of pharmaceutical biotechnology.86 Protein A sepharose, which is used in virtually all industrial purification procedures for monoclonal antibodies, represents the largest cost factor in the manufacture of therapeutic antibodies. In consideration of the substantial costs, these resins are typically regenerated and re-used, which complicates certain aspects of good manufacturing practice. It is conceivable that protein A-based affinity supports could be replaced with affinity purification supports based on IgG-binding molecules, like the ones described in this work.

5. Material and Methods

5.1 Reagents and general remarks

Unless otherwise denoted, chemical compounds and proteins were from Sigma-Aldrich-Fluka (Buchs, Switzerland), resin for solid phase synthesis from Novabiochem (Laufelfingen, Switzerland), enzymes from New England Biolabs (Ipswich, MA, USA) and HPLC grade lyophilized oligonucleotides from IBA GmbH (Göttingen, Germany). SpinX columns were purchased from Corning Costar Incorporated (Acton, MA, USA) and ion-exchange cartridges for DNA purification from Qiagen (Hilden, Germany; PCR purification cat.no 28104, Nucleotides removal cat.no 28306) and used according to the protocol described by the provider. NMR spectra were recorded with a Bruker 400 MHz spectrometer, with TMS as the internal standard. All reactions involving air- and water-sensitive materials were performed in flame-dried glassware under argon by standard syringe, cannula and septa techniques. Precoated Merck 60 F254 alumina silica gel sheets were used for TLC.

5.2 Synthesis of DEL4000 DNA Encoded Library

The individual organic compounds to be coupled to the 5’ amino-modified 42-mer oligonucleotides were dissolved in DMSO to give 100 mM stock solutions, occasionally with further addition of water or dilute hydrochloric acid. All HPLC purifications were performed on an XTerra Prep RP18 column (5 µm, 10x150 mm) using a linear gradient from 10% to 40% MeCN in 100 mM TEAA, pH 7. LC-ESI-MS analyses were performed on an XTerra RP18 column (5 µm, 4.6x20 mm) using a linear gradient from 0% to 50% MeOH over 1 min in 400 mM HFIP/5 mM TEA. The mass spectra were measured from m/z 900 to 2000 with a Waters Quattro Micro instrument (Waters, Milford, MA, USA). Oligonucleotide quantification was performed by measuring the absorption at 260 nm using a NanoDrop instrument (ND-1000 UV-Vis spectrophotometer).

5.2.1 Synthesis of library model compound oligonucleotide conjugates. To a reaction volume of 310 µL, containing 70% (v/v) DMSO/water, compounds were added to the respective final concentrations in the following order: Fmoc-protected amino acid (A, see Appendix) DMSO solution, 4 mM; N-hydroxysulfosuccinimide in DMSO, 10 mM; N-ethyl-N’-(3-dimethylaminopropyl)-carbodiimide in DMSO, 4 mM; aqueous triethylamine hydrochloride solution, pH 9.0, 80 mM; oligonucleotide aqueous solution, 50 µM (5’-amino-C12-GGA GCT TGT GAA TTC TGG ATC TTA GGA CGT GTG TGA ATT GTC-3’). The reaction was stirred overnight at 25 °C; residual activated species were then quenched, and the Fmoc group was simultaneously removed, by addition of piperidine (500 mM in DMSO). Following HPLC purification, the coupling yield was estimated (see Appendix) and the desired fractions were dried under reduced pressure and redissolved in 50 µL of water. The recovery was determined by measuring the absorption at 260 nm using a NanoDrop instrument, and the masses of the reacted oligonucleotides were detected by LC-ESI-MS (see Appendix). Subsequently, to a reaction volume of 310 µL, containing 70% (v/v) DMSO/water, compounds were added to the respective final concentrations in the following order: amino acid (B, see Appendix) DMSO solution, 4 mM; N-hydroxysulfosuccinimide in DMSO, 10 mM; N-ethyl-N’-(3-dimethylaminopropyl)-carbodiimide in DMSO, 4 mM; aqueous solution of the resulting compound-oligonucleotide conjugate, 15 µM; aqueous triethylamine hydrochloride solution, pH 9.0, 80 mM. After overnight stirring at 25 °C, residual activated species were quenched by addition of 50 µL Tris-Cl buffer, 500 mM, pH 9.0. The DNA was quantitatively precipitated by sequential addition of 25 µL of 1 M acetic acid, 12.5 µL of 3 M sodium acetate buffer, pH 4.7, and 500 µL ethanol, followed by 2 h incubation at -23 ºC. The DNA was centrifuged and the resulting oligonucleotide pellet was washed with ice-cold 90% (v/v) ethanol and then dissolved in 100 µL water.
Following HPLC, the coupling yield of this reaction step was determined (see Appendix). The desired fractions were dried under reduced pressure and redissolved in 50 µL of water. The recovery was determined by measuring the absorption at 260 nm using a NanoDrop instrument, and the masses of the reacted oligonucleotides were detected by LC-ESI-MS (see Appendix).
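As a worked illustration of the first coupling conditions, the following Python sketch converts the stated final concentrations into molar amounts, molar excess over the oligonucleotide, and the volume of a 100 mM stock solution needed for the carboxylic acid component. The concentrations and the 310 µL volume are taken from the text; the function and variable names are illustrative only, and "EDC" is used as shorthand for N-ethyl-N’-(3-dimethylaminopropyl)-carbodiimide.

```python
# Final concentrations (mM) in the 310 µL coupling reaction, as given in the text.
REACTION_VOLUME_UL = 310.0
FINAL_CONC_MM = {
    "Fmoc-amino acid": 4.0,
    "N-hydroxysulfosuccinimide": 10.0,
    "EDC": 4.0,  # N-ethyl-N'-(3-dimethylaminopropyl)-carbodiimide
    "triethylamine hydrochloride": 80.0,
    "oligonucleotide": 0.050,  # 50 µM
}


def amount_nmol(conc_mm, volume_ul):
    """Molar amount in nmol: (mmol/L) x (µL) = nmol."""
    return conc_mm * volume_ul


def stock_volume_ul(final_mm, stock_mm, volume_ul):
    """Volume of stock solution needed to reach final_mm in volume_ul."""
    return final_mm * volume_ul / stock_mm


amounts = {name: amount_nmol(c, REACTION_VOLUME_UL)
           for name, c in FINAL_CONC_MM.items()}
equivalents = {name: n / amounts["oligonucleotide"]
               for name, n in amounts.items()}
# The building blocks are kept as 100 mM DMSO stocks (Chapter 5.2).
acid_stock_ul = stock_volume_ul(FINAL_CONC_MM["Fmoc-amino acid"], 100.0,
                                REACTION_VOLUME_UL)

for name in FINAL_CONC_MM:
    print(f"{name}: {amounts[name]:.1f} nmol, {equivalents[name]:.0f} eq.")
print(f"100 mM acid stock required: {acid_stock_ul:.1f} µL")
```

Under these conditions the amino acid and the carbodiimide are each present in large molar excess over the oligonucleotide, which is typical for amide couplings performed on DNA in aqueous DMSO.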

5.2.2 Coupling reactions of 20 Fmoc-protected amino acids. To a reaction volume of 310 µL, containing 70% (v/v) DMSO/water, compounds were added to the respective final concentrations in the order: Fmoc-protected amino acid DMSO solution, 4 mM; N-hydroxysulfosuccinimide in DMSO, 10 mM; N-ethyl-N’-(3-dimethylaminopropyl)-carbodiimide in DMSO, 4 mM; aqueous triethylamine hydrochloride solution, pH 9.0, 80 mM; oligonucleotide aqueous solution, 50 µM, (DEL_O_1_n, 151% overall. 4.0 nmol of each DNA-compound conjugate were pooled to generate a 20-member DNA-encoded sub-library.

5.2.3 Coupling reactions of 200 carboxylic acids. To a reaction volume of 310 µL, containing 70% (v/v) DMSO/water, compounds were added to the respective final concentrations: DMSO-dissolved carboxylic acid, 4 mM; N-hydroxysulfosuccinimide in DMSO, 10 mM; N-ethyl-N’-(3-dimethylaminopropyl)-carbodiimide in DMSO, 4 mM; aqueous triethylamine hydrochloride, pH 9.0, 80 mM; DNA-oligonucleotide sub-library pool, 1.5 µM. All coupling reactions were stirred overnight at 25 °C; residual activated species were then quenched by addition of 50 µL Tris-Cl buffer, 500 mM, pH 9.0. The DNA was quantitatively precipitated by sequential addition of 25 µL of 1 M acetic acid, 12.5 µL of 3 M sodium acetate buffer, pH 4.7, and 500 µL ethanol, followed by 2 h incubation at -23 ºC. The DNA was centrifuged and the resulting oligonucleotide pellet was washed with ice-cold 90% (v/v) ethanol and then dissolved in 100 µL water. Test coupling reactions were also performed with the reaction conditions described above (see Chapter 3.1.2, Table 3-1), using model 42mer 5’-Fmoc-deprotected amino acid oligonucleotide conjugates and model carboxylic acids.

The reactions were analysed by HPLC and the masses of the reacted oligonucleotides were detected by LC-ESI-MS. Typical HPLC coupling yields and recovery on this step were always >51%.

5.2.4 Polymerase Klenow encoding of 200 carboxylic acids reactions. To a reaction volume of 50 µL, reagents were added to the respective final concentrations: aqueous solution of the pool of 20 oligonucleotide conjugates coupled with the specific carboxylic acid (see Chapter 5.2.3) 320 nM, 44mer oligonucleotide coding oligonucleotide (DEL_O_2_n, 1

5.2.5 Preparation of D-desthiobiotin oligonucleotide-conjugate (positive control). D-desthiobiotin-oligonucleotide-conjugate was synthesized (DEL_O_1_21: 5’-amino-C12-GGA GCT TGT GAA TTC TGG ATC GAG GGA CGT GTG TGA ATT GTC-3’; underlined sequence represents the coding sequence) and unambiguously encoded (DEL_O_2_201: 5'-GTA GTC GGA TCC GAC CAC TTCA CACA GAC AAT TCA CAC ACG TCC-3'; underlined sequence represents the coding sequence) as described above. ESI-MS DEL_O_1_21 D-desthiobiotin conjugate: expected: 13572; measured: 13573.
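The polymerase-based encoding step requires that the 3’ ends of the two oligonucleotides anneal to each other. The short Python sketch below checks this for the two positive-control sequences printed above by searching for the longest 3’-terminal stretch of DEL_O_2_201 that is complementary to the 3’ end of DEL_O_1_21. The function names are illustrative, and the duplex length is computed under the assumption that both strands are completely filled in.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]


def annealing_overlap(oligo_a, oligo_b):
    """Length of the longest 3' end of oligo_b that pairs with the 3' end of oligo_a."""
    rc_a = reverse_complement(oligo_a)
    for k in range(min(len(oligo_a), len(oligo_b)), 0, -1):
        if oligo_b[-k:] == rc_a[:k]:
            return k
    return 0


# Sequences as printed in Chapter 5.2.5 (spaces removed).
DEL_O_1_21 = "GGA GCT TGT GAA TTC TGG ATC GAG GGA CGT GTG TGA ATT GTC".replace(" ", "")
DEL_O_2_201 = "GTA GTC GGA TCC GAC CAC TTCA CACA GAC AAT TCA CAC ACG TCC".replace(" ", "")

overlap = annealing_overlap(DEL_O_1_21, DEL_O_2_201)
# Assumption: complete fill-in of both strands by the polymerase.
duplex_length = len(DEL_O_1_21) + len(DEL_O_2_201) - overlap

print(len(DEL_O_1_21), len(DEL_O_2_201), overlap, duplex_length)
```

For these two sequences the check finds an 18-nucleotide complementary region, so a fully extended duplex would be 68 base pairs long.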

5.3 Library DEL4000 selections

5.3.1 Streptavidin selection. The resulting library DEL4000 (total oligonucleotide conjugate concentration 300 nM) was diluted 1:15 in PBS (20 nM final concentration) and spiked with D-desthiobiotin oligonucleotide-conjugate (final concentration 1 pM). 50 µL of the 20 nM library was added either to 50 µL streptavidin-sepharose slurry (GE Healthcare, cat.no 17-5113-01) or to 50 µL sepharose slurry without streptavidin. Both resins were preincubated with PBS, 0.3 mg/mL herring sperm DNA. After incubation for 1 h at 25 ºC the mixture was transferred to a SpinX column, the supernatant was removed, and the resin was washed 4x with 400 µL PBS. After washing, the resin was resuspended in 100 µL water.
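The quantities involved in this selection can be made explicit with a few lines of arithmetic. The Python sketch below uses the concentrations and volumes stated above and assumes, for illustration only, that the 4000 library members are present in equimolar amounts; the variable names are not taken from the original work.

```python
AVOGADRO = 6.02214076e23  # molecules per mole
LIBRARY_SIZE = 4000       # number of compounds in DEL4000

stock_nM = 300.0          # total conjugate concentration before dilution
dilution_factor = 15      # 1:15 dilution in PBS
selection_volume_uL = 50.0
spike_pM = 1.0            # D-desthiobiotin conjugate (positive control)

library_nM = stock_nM / dilution_factor
# Assumption: equimolar library, so each member carries 1/4000 of the total.
per_compound_pM = library_nM * 1000.0 / LIBRARY_SIZE

# nM x µL = fmol
total_fmol = library_nM * selection_volume_uL
per_compound_fmol = total_fmol / LIBRARY_SIZE
molecules_per_compound = per_compound_fmol * 1e-15 * AVOGADRO

# pM x µL = amol
spike_amol = spike_pM * selection_volume_uL
spike_molecules = spike_amol * 1e-18 * AVOGADRO

print(f"library: {library_nM:.0f} nM total, {per_compound_pM:.1f} pM per compound")
print(f"input: {total_fmol:.0f} fmol total, {molecules_per_compound:.2e} molecules per compound")
print(f"spike: {spike_amol:.0f} amol, {spike_molecules:.2e} molecules")
```

Under the equimolar assumption, the positive control is therefore present at a lower concentration than an average library member, so its recovery after selection cannot be attributed to over-representation in the input.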

5.3.1.1 Identification of binding molecules. The codes of the oligonucleotide-compound conjugates were amplified by PCR (total volume 50 µL, 25 cycles of 1 min at 94 ºC, 1 min at 55 ºC, 40 s at 72 ºC) with either 5 µL of 100 fM DEL4000 library before selection as template, or 5 µL of each resuspended resin after selection as template. The PCR primers DEL_P1_A (5’-GCC TCC CTC GCG CCA TCA GGG AGC TTG TGA ATT CTG G-3’) and DEL_P2_B (5’-GCC TTG CCA GCC CGC TCA GGT AGT CGG ATC CGA CCA C-3’) additionally contain at one extremity a 19 bp domain (underlined) required for high-throughput sequencing with the 454 Genome Sequencer system. The PCR products were purified on ion-exchange cartridges. Subsequent high-throughput sequencing was performed on a 454 Life Sciences-Roche GS 20 Sequencer platform (Sequencing service by Eurofins MWG GmbH, Ebersberg, Germany). Analyses of the codes from high-throughput sequencing were performed by an in-house program written in C++. The frequency of each code has been assigned to each individual pharmacophore.
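The decoding step can be illustrated with a minimal Python sketch. This is not the in-house C++ program mentioned above, only an illustration of the principle. It locates the constant regions shared by the printed sequences (the two primer-binding sites and the 18-nt annealing region) and treats the whole segment between them as the identifier of each building block. The exact code boundaries are underlined in the original thesis and are not recoverable from this copy, so taking the full inter-constant segment is an assumption, and all function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


# Constant regions read off the sequences printed in Chapters 5.2.5 and 5.3.1.1.
PRIMER_SITE_A = "GGAGCTTGTGAATTCTGG"     # 3' part of primer DEL_P1_A
ANNEALING_REGION = "GGACGTGTGTGAATTGTC"  # 3' end of the DEL_O_1 oligonucleotides
PRIMER_SITE_B = "GTAGTCGGATCCGACCAC"     # 3' part of primer DEL_P2_B


def extract_codes(read):
    """Return (segment_a, segment_b) found between the constant regions, or None."""
    start = read.find(PRIMER_SITE_A)
    if start < 0:
        return None
    a_begin = start + len(PRIMER_SITE_A)
    a_end = read.find(ANNEALING_REGION, a_begin)
    if a_end < 0:
        return None
    b_begin = a_end + len(ANNEALING_REGION)
    b_end = read.find(reverse_complement(PRIMER_SITE_B), b_begin)
    if b_end < 0:
        return None
    # Report the second segment in the orientation of the DEL_O_2 oligonucleotide.
    return read[a_begin:a_end], reverse_complement(read[b_begin:b_end])


def count_codes(reads):
    """Count code pairs, accepting reads in either orientation."""
    counts = Counter()
    for read in reads:
        codes = extract_codes(read) or extract_codes(reverse_complement(read))
        if codes is not None:
            counts[codes] += 1
    return counts


# Example: amplicon expected for the positive control (DEL_O_1_21 / DEL_O_2_201).
oligo_1 = "GGAGCTTGTGAATTCTGGATCGAGGGACGTGTGTGAATTGTC"
oligo_2 = "GTAGTCGGATCCGACCACTTCACACAGACAATTCACACACGTCC"
example_read = ("GCCTCCCTCGCGCCATCAG" + oligo_1
                + reverse_complement(oligo_2)[len(ANNEALING_REGION):]
                + reverse_complement("GCCTTGCCAGCCCGCTCAG"))
counts = count_codes([example_read, reverse_complement(example_read), "ACGT"])
print(counts)
```

Exact string matching is used here for brevity; real sequencing reads contain errors, so a practical implementation would need to tolerate mismatches in the constant regions.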

5.3.1.2 Synthesis of the binding molecules as fluorescein conjugates. In a polypropylene syringe, 50 mg (46 µmol) of O-bis-(aminoethyl)ethylene glycol trityl resin (Novabiochem, cat.no 01-64-0235) was suspended in a mixture of the appropriate Fmoc-protected amino acid (100 µmol, 1 mL), HBTU (Aldrich, 200 µmol, 1 mL), and DIEA (Fluka, 400 µmol, 0.5 mL) in dry DMF. After overnight incubation at 25 °C, the resin was washed 6x with 2 mL dry DMF and the Fmoc moiety was removed by addition of 1 mL piperidine (50% in dry DMF) for 1 h at 25 °C. After washing 6x with 2 mL dry DMF, the corresponding carboxylic acid (100 µmol, 1 mL DMF) was added and a further amide bond formation reaction was performed as described above. The resulting product was cleaved by treating the resin 10x with 2 mL TFA (1% in CH2Cl2). The methylene chloride fractions were quenched in 5 mL saturated aqueous NaHCO3 and the water phase was back-extracted 2x with 5 mL CH2Cl2. The pooled organic phases were washed 3x with water, dried over Na2SO4 and concentrated in vacuo. The crude product was reacted with 2 equivalents of fluorescein isothiocyanate (800 µL of DMF) and 200 µL saturated aqueous NaHCO3 in the dark overnight at 25 °C. Following HPLC purification on an XTerra Prep RP18 column (5 µm, 10x150 mm) using a linear gradient from 10% to 100% MeCN, 0.1% TFA, the desired fractions were collected and lyophilized. 2 mg of the fluorescein conjugates were dissolved in DMSO as 5 mM stock solution. ESI-MS analysis confirmed the mass of the expected FITC conjugate products: 02-78 (C45H49BrN4O10S2) m/z expected: 949.93 measured: 951.31 [M+H+]; 07-78 (C50H59N5O12S2) m/z expected: 985.36 measured: 986.37 [M+H+]; 15-78 (C44H49N5O12S3) m/z expected: 935.25 measured: 936.25 [M+H+]; 02-107 (C47H47BrN4O9S) m/z expected: 923.87 measured: 925.12 [M+H+]; 13-40 (C46H49N5O14S) m/z expected: 927.30 measured: 928.42 [M+H+]; 11-78 (C49H58N4O13S2) m/z expected: 974.34 measured: 975.41; 17-49 (C45H49IN4O11S) m/z expected: 980.22 measured: 981.29 [M+H+]; 17-78 (C45H49IN4O10S2) m/z expected: 996.19 measured: 997.26; 16-78 (C43H48N4O10S3) m/z expected: 876.25 measured: 877.33; 15-117 (C45H45N5O12S2) m/z expected: 911.25 measured: 912.33; 02-49 (C45H49BrN4O11S) m/z expected: 932.23; measured: 933.32 [M+H+].
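The expected masses listed above can be recomputed from the molecular formulae. The following Python sketch parses a formula and sums either monoisotopic or average atomic masses; the atomic mass values are standard reference values rounded to the digits shown, and the helper names are illustrative. Most expected values in the list correspond to monoisotopic masses, whereas at least two of the bromine-containing entries (02-78 and 02-107) appear to match average molecular masses instead, which should be kept in mind when comparing expected and measured values.

```python
import re

# Standard atomic masses (u), rounded; only the elements occurring in the list above.
MONOISOTOPIC = {"C": 12.0, "H": 1.007825, "N": 14.003074, "O": 15.994915,
                "S": 31.972071, "Br": 78.918338, "I": 126.904473}
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999,
           "S": 32.06, "Br": 79.904, "I": 126.904}
PROTON = 1.007276  # mass added on going from M to [M+H]+


def parse_formula(formula):
    """Turn e.g. 'C45H49BrN4O10S2' into {'C': 45, 'H': 49, 'Br': 1, ...}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def mass(formula, table):
    """Sum of atomic masses for the given formula and mass table."""
    return sum(table[el] * n for el, n in parse_formula(formula).items())


for label, formula in [("07-78", "C50H59N5O12S2"),
                       ("17-49", "C45H49IN4O11S"),
                       ("02-78", "C45H49BrN4O10S2")]:
    mono = mass(formula, MONOISOTOPIC)
    avg = mass(formula, AVERAGE)
    print(f"{label}: monoisotopic {mono:.2f}, average {avg:.2f}, "
          f"[M+H]+ (monoisotopic) {mono + PROTON:.2f}")
```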

5.3.2 Affinity measurements. In a total volume of 60 µL, fluorescein-compound conjugates (500 nM) were incubated with increasing amounts of streptavidin (from 10 nM to 200 µM, BIOSPA, cat.no S002-6) or MMP3 (from 33 nM to 40 µM) in PBS, 5% DMSO, for 1 h at 25 ºC. The fluorescence polarization was determined with a TECAN Polarion instrument by excitation at 485 nm and measuring emission at 535 nm (ε = 72000 M⁻¹cm⁻¹). All the curves were fitted applying a formula derived as follows. With

$$[FC] = [FC]_0 - [C] \quad \text{and} \quad K_d = \frac{[FC]\,\left([P] - [C]\right)}{[C]},$$

where [P] - [C] is the concentration of free protein, substituting and solving for [C] gives

$$[C]^2 - \left([P] + [FC]_0 + K_d\right)[C] + [P]\,[FC]_0 = 0.$$

The solutions of the quadratic equation are:

$$[C] = \frac{\left([P] + [FC]_0 + K_d\right) \pm \sqrt{\left([P] + [FC]_0 + K_d\right)^2 - 4\,[P]\,[FC]_0}}{2}$$

Considering that only the minus sign gives a meaningful solution and that FP = a*[FC] + b*[C] = a*[FC]0 + (b-a)*[C], the solution of the quadratic equation can be expressed in terms of FP and used in the fitting to determine the dissociation constant:

$$FP = a\,[FC]_0 + (b - a)\,\frac{\left([P] + [FC]_0 + K_d\right) - \sqrt{\left([P] + [FC]_0 + K_d\right)^2 - 4\,[P]\,[FC]_0}}{2}$$

[FC] = molar concentration of free fluorescein-compound conjugate; [FC]0 = initial (total) molar concentration of fluorescein-compound conjugate (500 nM in the experiment); [C] = concentration of the complex; [P] = total molar concentration of protein; FP = fluorescence polarization; a, b = proportionality constants; Kd = dissociation constant.
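The binding model described in this section can be written down in a few lines of Python. The sketch below implements the physically meaningful root of the quadratic equation for [C], the resulting expression FP = a*[FC]0 + (b-a)*[C], and a simple fitting routine. The fitting strategy (a scan over a logarithmic grid of Kd values, with a and b obtained by linear least squares at each grid point) is an assumption chosen to keep the example dependency-free; the fitting procedure used in the original work is not specified. The synthetic data, parameter values and function names are purely illustrative, and concentrations are expressed in µM.

```python
import math


def complex_conc(p_total, fc0, kd):
    """Concentration of the complex [C]: the 'minus' root of the quadratic equation."""
    s = p_total + fc0 + kd
    return (s - math.sqrt(s * s - 4.0 * p_total * fc0)) / 2.0


def fp_model(p_total, fc0, kd, a, b):
    """Fluorescence polarization FP = a*[FC]0 + (b - a)*[C]."""
    return a * fc0 + (b - a) * complex_conc(p_total, fc0, kd)


def fit_kd(p_values, fp_values, fc0, kd_grid):
    """Grid search over Kd; a and b by linear least squares for each Kd."""
    best = None
    for kd in kd_grid:
        bound = [complex_conc(p, fc0, kd) for p in p_values]
        free = [fc0 - c for c in bound]
        # Normal equations for FP = a*free + b*bound.
        sff = sum(f * f for f in free)
        sfb = sum(f * c for f, c in zip(free, bound))
        sbb = sum(c * c for c in bound)
        sfy = sum(f * y for f, y in zip(free, fp_values))
        sby = sum(c * y for c, y in zip(bound, fp_values))
        det = sff * sbb - sfb * sfb
        if det == 0.0:
            continue
        a = (sfy * sbb - sfb * sby) / det
        b = (sff * sby - sfb * sfy) / det
        sse = sum((a * f + b * c - y) ** 2
                  for f, c, y in zip(free, bound, fp_values))
        if best is None or sse < best[0]:
            best = (sse, kd, a, b)
    return best[1:]


# Synthetic, noise-free example (all values hypothetical, concentrations in µM).
FC0 = 0.5                                            # 500 nM conjugate
p_values = [0.01 * 10 ** (0.3 * i) for i in range(15)]
fp_values = [fp_model(p, FC0, 2.0, 50.0, 250.0) for p in p_values]
kd_grid = [10 ** (-3 + 6 * i / 199) for i in range(200)]
kd_fit, a_fit, b_fit = fit_kd(p_values, fp_values, FC0, kd_grid)
print(f"fitted Kd = {kd_fit:.2f} µM")
```

Because the model accounts for ligand depletion, it remains valid when the protein concentration is comparable to the conjugate concentration, which a simple hyperbolic binding curve would not handle correctly.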

5.3.3 Polyclonal human IgG selection. The library DEL4000 (total oligonucleotide conjugate concentration 300 nM) was diluted 1:15 in PBS (20 nM final concentration). 50 µL of the 20 nM library was added to 50 µL IgG-sepharose slurry. The resin was preincubated with PBS, 0.3 mg/mL herring sperm DNA (Sigma). After incubation for 1 hour at 25 °C the mixture was transferred to a SpinX column (Corning Costar Incorporated), the supernatant was removed, and the resin was washed four times with 400 µL PBS. After washing, the resin was resuspended in 100 µL water.

5.3.3.1 Polyclonal human IgG coating of sepharose beads. 100 mg CNBr-activated sepharose (GE Healthcare, Piscataway, NJ) was swollen in 500 µL of 1 mM HCl, washed (10 times with 500 µL 1 mM HCl, 3 times with 500 µL 0.1 M NaHCO3aq), and mixed with 2.5 mg/mL polyclonal human IgG (Sigma-Aldrich-Fluka, Buchs, Switzerland) dissolved in 1.2 mL 0.1 M NaHCO3aq. After 4 hours of incubation at 4 °C, the slurry was repeatedly and alternately washed with 0.1 M NaHCO3aq, 0.1 M Tris-Cl, 0.5 M NaCl, pH 8.3 and 0.1 M NaOAc, 0.5 M NaCl, pH 4, then stored in 1 mL of PBS at 4 °C.

5.3.3.2 Identification of human IgG binding molecules. The codes of the oligonucleotide-compound conjugates were amplified by PCR (total volume 50 µL, 25 cycles of 1 min at 94 ºC, 1 min at 55 ºC, 40 s at 72 ºC) with either 5 µL of 100 fM DEL4000 library before selection as template, or 5 µL of each resuspended resin after selection as template. The PCR primers DEL_P1_A (5’-GCC TCC CTC GCG CCA TCA GGG AGC TTG TGA ATT CTG G-3’) and DEL_P2_B (5’-GCC TTG CCA GCC CGC TCA GGT AGT CGG ATC CGA CCA C-3’) additionally contain at one extremity a 19 bp domain (underlined) required for high-throughput sequencing with the 454 Genome Sequencer system. The PCR products were purified on ion-exchange cartridges. Subsequent high-throughput sequencing was performed on a 454 Life Sciences-Roche GS 20 Sequencer platform. Analyses of the codes from high-throughput sequencing were performed by an in-house program written in C++. The frequency of each code has been assigned to each individual pharmacophore.

5.3.3.3 Synthesis of affinity chromatography resin containing the compound 02-40 or 16-40. In a polypropylene syringe, 129 mg (120 µmol) of O-bis-(aminoethyl)ethylene glycol trityl resin (Novabiochem, cat.no 01-64-0235) was suspended in a mixture of the appropriate Fmoc-protected amino acid (180 µmol, 1 mL), HBTU (Aldrich, 360 µmol, 1 mL), and DIEA (Fluka, 720 µmol, 0.5 mL) in dry DMF. After overnight incubation at 25 °C, the resin was washed 6x with 2 mL dry DMF and the Fmoc moiety was removed by addition of 3 mL piperidine (50% in dry DMF) for 1 h at 25 °C. After washing 6x with 2 mL dry DMF, 4-(4-(1-hydroxyethyl)-2-methoxy-5-nitrophenoxy)butanoic acid (40, 54 mg, 180 µmol, 1 mL DMF) was added and a further amide bond formation reaction was performed as described above. The resulting product was cleaved by treating the resin 10x with 2 mL TFA (1% in CH2Cl2). The dichloromethane fractions were quenched in 5 mL saturated aqueous NaHCO3 and the water phase was back-extracted 2x with 5 mL CH2Cl2. The pooled organic phases were washed 3 times with water, dried over Na2SO4 and concentrated in vacuo. Following HPLC purification on an XTerra Prep RP18 column (5 µm, 10x150 mm) using a linear gradient from 10% to 100% MeCN, 0.1% TFA, the desired fractions were collected and lyophilized. ESI-MS analysis confirmed the mass of the expected products: 02-40 (C29H41BrN4O9) m/z expected: 668.21 measured: 669.37 [M+H+]; 16-40 (C27H40N4O9S) m/z expected: 596.25 measured: 597.12 [M+H+]. 200 mg CNBr-activated sepharose (GE Healthcare, Piscataway, NJ) was swollen in 1 mM HCl, washed, and mixed in separate tubes with 15 µmol of the compounds dissolved in 2 mL 0.1 M NaHCO3aq, 10% DMF. After 4 hours of incubation at 25 °C, the slurry was repeatedly and alternately washed with 0.1 M NaHCO3aq, 0.1 M Tris-Cl, 0.5 M NaCl, pH 8.3 and 0.1 M NaOAc, 0.5 M NaCl, pH 4, then stored in PBS at 4 °C.

5.3.3.4 Polyclonal human IgG Cy5 labeling. Polyclonal human IgG (Sigma-Aldrich-Fluka, Buchs, Switzerland) was labelled with Cy5 Mono-reactive kit (Amersham, cat.no PA25001) according to the protocol of the provider and purified over a PD10 column (GE Healthcare, cat.no 17-0851-01) as described by the supplier.

5.3.3.5 Biotinylated polyclonal human IgG. Polyclonal human IgG (Sigma-Aldrich-Fluka, Buchs, Switzerland) was labelled with NHS-LC-Biotin reagent (Pierce, cat.no 21336) according to the protocol of the provider and purified over a PD10 column (GE Healthcare, cat.no 17-0851-01) as described by the supplier.

5.3.3.6 Affinity chromatography of CHO cell supernatant spiked with Cy5-labeled or biotinylated human IgG on IgG-binding resin. 70 mg of the resin containing compound 02-40 or 16-40 were loaded on a chromatography cartridge (Glen Research, cat.no 20-0030-00) and washed 3 times with 1 mL PBS before loading a CHO cell supernatant (60 µL) spiked with Cy5-labeled human IgG (40 µL, 9.68 µM) or with biotinylated human IgG (30 µL, 17.2 µM). The flow-through, the washing fractions (washing 1x with 10 mL PBS; 1x with 10 mL 500 mM NaCl, 0.5 mM EDTA; 1x with 10 mL 100 mM NaCl, 0.1% Tween 20, 0.5 mM EDTA) and the eluate (elution 3 times with 200 µL aqueous triethylamine, 100 mM) were collected and concentrated back to a final volume of 100 µL by centrifugation in a Vivaspin 500 tube (Vivascience, cat.no VS0101, cut-off 10,000 MW). The samples were then analyzed by gel electrophoresis on a NuPAGE 4-12% Bis-Tris Gel (Invitrogen, cat.no NP0321) using MOPS SDS as running buffer and stained with Coomassie Blue. Cy5 activity was detected with a Diana III Chemiluminescence Detection System (Raytest) by excitation at 675 nm and measuring emission at 694 nm (ε = 250,000 M⁻¹cm⁻¹). Western Blot analysis was performed by transferring the proteins to an NC membrane (Millipore, Billerica, MA, USA) with the Xcell II blot module (Invitrogen) using standard procedures. The membrane was quickly rinsed with water before soaking it twice in methanol. The membrane was dried at room temperature for 15 min and incubated for 1 h with a 1:500 dilution, in PBS containing 4% defatted milk, of the following protein: streptavidin-horseradish peroxidase conjugate (HRP-Streptavidin, Amersham Biosciences, Little Chalfont, Buckinghamshire, UK, cat.no RPN1231V). For detection of immunoreactive bands the membrane was washed three times for 5 min with PBS, soaked in chemiluminescent reagent (ECL plus Western Blotting Detection System from Amersham Biosciences) for 5 sec and exposed to BioMax films (Kodak, Hemel, UK) in an autoradiographic cassette.

5.3.4 Human MMP3 selection. The library DEL4000 (total oligonucleotide conjugate concentration 300 nM) was diluted 1:15 in PBS (20 nM final concentration). 50 µL of the 20 nM library was added to 50 µL MMP3-sepharose slurry. The resin was preincubated with PBS, 0.3 mg/mL herring sperm DNA (Sigma). After incubation for 1 hour at 25 °C the mixture was transferred to a SpinX column (Corning Costar Incorporated), the supernatant was removed, and the resin was washed 4x with 400 µL PBS. After washing, the resin was resuspended in 100 µL water.

5.3.4.1 Human MMP3 coating of sepharose beads. 100 mg CNBr-activated sepharose (GE Healthcare, Piscataway, NJ) was swollen in 1 mM HCl, washed, and mixed in separate tubes with 1 mg/mL polyclonal human IgG (Sigma-Aldrich-Fluka, Buchs, Switzerland) dissolved in. After 4 hours of incubation at 4 °C, the slurry was repeatedly and alternately washed with 0.1 M NaHCO3aq, 0.1 M Tris-Cl, 0.5 M NaCl, pH 8.3 and 0.1 M NaOAc, 0.5 M NaCl, pH 4, then stored in PBS at 4 °C.

5.3.4.2 Identification of human MMP3 binding molecules. The codes of the oligonucleotide-compound conjugates were amplified by PCR (total volume 50 µL, 25 cycles of 1 min at 94 ºC, 1 min at 55 ºC, 40 s at 72 ºC) with either 5 µL of 100 fM DEL4000 library before selection as template, or 5 µL of each resuspended resin after selection as template. The PCR primers DEL_P1_A (5’-GCC TCC CTC GCG CCA TCA GGG AGC TTG TGA ATT CTG G-3’) and DEL_P2_B (5’-GCC TTG CCA GCC CGC TCA GGT AGT CGG ATC CGA CCA C-3’) additionally contain at one extremity a 19 bp domain (underlined) required for high-throughput sequencing with the 454 Genome Sequencer system. The PCR products were purified on ion-exchange cartridges. Subsequent high-throughput sequencing was performed on a 454 Life Sciences-Roche GS 20 Sequencer platform. Analyses of the codes from high-throughput sequencing were performed by an in-house program written in C++. The frequency of each code has been assigned to each individual pharmacophore.

5.3.4.3 Synthesis of the MMP3 binding molecules as fluorescein conjugates. In a polypropylene syringe, 50 mg (46 µmol) of O-bis-(aminoethyl)ethylene glycol trityl resin (Novabiochem, cat.no 01-64-0235) was suspended in a mixture of the appropriate Fmoc-protected amino acid (100 µmol, 1 mL), HBTU (Aldrich, 200 µmol, 1 mL), and DIEA (Fluka, 400 µmol, 0.5 mL) in dry DMF. After overnight incubation at 25 °C, the resin was washed 6x with 2 mL dry DMF and the Fmoc moiety was removed by addition of 1 mL piperidine (50% in dry DMF) for 1 h at 25 °C. After washing 6 times with 2 mL dry DMF, the corresponding carboxylic acid (100 µmol, 1 mL DMF) was added and a further amide bond formation reaction was performed as described above. The resulting product was cleaved by treating the resin 10 times with 2 mL TFA (1% in CH2Cl2). The methylene chloride fractions were quenched in 5 mL saturated aqueous NaHCO3 and the water phase was back-extracted 2 times with 5 mL CH2Cl2. The pooled organic phases were washed 3 times with water, dried over Na2SO4 and concentrated in vacuo. The crude product was reacted with 2 equivalents of fluorescein isothiocyanate (800 µL of DMF) and 200 µL saturated aqueous NaHCO3 in the dark overnight at 25 °C. Following HPLC purification on an XTerra Prep RP18 column (5 µm, 10x150 mm) using a linear gradient from 10% to 100% MeCN, 0.1% TFA, the desired fractions were collected and lyophilized. 2 mg of the fluorescein conjugates were dissolved in DMSO as 5 mM stock solution. ESI-MS analysis confirmed the mass of the expected FITC conjugate products: 02-118 (C55H63BrN4O10S) m/z expected: 1052.08 measured: 1053.30 [M+H+]; 13-17 (C49H55N5O11S) m/z expected: 921.36 measured: 922.42 [M+H+]; 15-117 (C45H45N5O12S2) m/z expected: 911.25 measured: 912.33 [M+H+]; 17-104 (C46H43I2N5O10S) m/z expected: 1111.08 measured: 1112.19 [M+H+]; 18-96 (C49H45BrN4O9S) m/z expected: 945.87 measured: 947.10 [M+H+].

5.3.5 Computational simulation. The simulated distribution of the number of codes represented by individual counts, which is related to the probability that certain counts are experimentally found in a non-biased mixture of equimolar compounds, was computed using in-house software. The simulation relies on the computer-assisted random generation of numbers corresponding to any of the 4000 compounds in the library. Repeating the simulation several times allows the computation of fractional values for the number of codes associated with a given "count" value. For example, a number-of-codes value of 0.1 corresponds to the observation of a given "count" value in only 1 out of 10 simulations, each performed with a total number of counts equal to the total number of experimental sequences in a given experiment.
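The principle of this simulation can be sketched in Python as follows. This is not the software used in this work, only a minimal re-implementation of the described idea: each sequence read is assigned at random to one of the 4000 compounds, the number of compounds observed with each count value is tabulated, and the tabulation is averaged over several repetitions. The total number of reads (10000) and the number of repetitions (10) are hypothetical example values, and the function names are illustrative.

```python
import random
from collections import Counter


def simulate_once(n_compounds, n_reads, rng):
    """One simulation: histogram {count value: number of codes with that count}."""
    reads_per_compound = Counter(rng.randrange(n_compounds) for _ in range(n_reads))
    histogram = Counter(reads_per_compound.values())
    # Compounds that were never drawn have a count of zero.
    histogram[0] = n_compounds - len(reads_per_compound)
    return histogram


def simulate(n_compounds, n_reads, n_repeats, seed=0):
    """Average the histogram over n_repeats independent simulations."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_repeats):
        totals.update(simulate_once(n_compounds, n_reads, rng))
    return {count: totals[count] / n_repeats for count in sorted(totals)}


# Hypothetical example: 4000 compounds, 10000 reads, 10 repetitions.
expected_codes = simulate(4000, 10000, 10)
for count, n_codes in expected_codes.items():
    print(f"count {count}: {n_codes:.1f} codes")
```

A count value that occurs in only one of the ten repetitions receives a number-of-codes value of 0.1, as in the example given in the text. Codes observed experimentally with counts far above the range populated by the simulation are unlikely to arise from random sampling of an unbiased library.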

5.4 Stepwise coupling by selective deprotection and reaction of diamine derivatives.

5.4.1 DNA-compatible cleavage of different amino protective groups.

5.4.1.1 Synthesis of 2-pent-4-enamido-cis-cyclopentanecarboxylic acid (1c). To 4 mL dioxane/water solution (1:1) of 0.1 mmol of cis-2-aminocyclopentanecarboxylic acid hydrochloride salt (17 mg) was added 0.1 mmol (1 eq.) of 4-pentenoic N-hydroxy succinimide ester (1e, for the synthesis see Chapter 5.4.1.4) and 0.4 mmol (34 mg) NaHCO3. The mixture was allowed to stir at room temperature for 3 h, then poured into aqueous 0.1 N HCl (20 mL) and extracted with ethyl acetate (5 times, 5 mL). The collected organic phases were washed with 10 mL of brine and dried over Na2SO4. After removal of the solvent under vacuum, the crude product was dissolved in 1 mL of dry DMSO and used as such for the coupling to the oligonucleotide. 1H NMR (400 MHz, CDCl3) δ = 1.73 (m, 2H), 1.82 (m, 1H), 2.10 (m, 3H), 2.32 (m, 2H), 2.41 (m, 2H), 3.15 (m, 1H), 4.55 (m, 1H), 5.10 (m, 2H), 5.82 (m, 1H), 6.37 (br, 1H) ppm. 13C NMR (100 MHz, CDCl3): δ = 22.1, 28.1, 29.6, 31.7, 35.7, 46.3, 52.2, 115.8, 136.7, 173.1, 178.0 ppm. ESI-MS 2-pent-4-enamido-cis-cyclopentanecarboxylic acid (C11H17NO3) m/z expected: 211.12 measured: 212.07 [M+H+].

5.4.1.2 Synthesis of N-Bpoc cis-2-aminocyclopentanecarboxylic acid (1d). Cis-2-aminocyclopentanecarboxylic acid hydrochloride salt (17 mg, 0.1 mmol) was dissolved in 1 mL of water and Triton B was added (0.2 mmol, 0.1 mL, d = 0.920 g/mL, 40% in MeOH). Following evaporation of MeOH under reduced pressure, DMF (2 mL) was added to the residue and the suspension was evaporated under high vacuum at 50 °C. This procedure was repeated 3 times and 5 mL DMF was then added to the residue, followed by 0.1 mmol of methyl 4-((2-(biphenyl-4-yl)propan-2-yloxy)carbonyloxy)benzoate (Bpoc carbonate reagent, 1 eq., 39 mg). The suspension was heated at 50 °C and stirred for 5 h, during which time the solids dissolved. Afterwards the DMF was removed at 50 °C under reduced pressure and the residue was distributed between water (10 mL) and ether (5 mL). To facilitate the phase separation, the aqueous phase was acidified with citric acid to pH 4 and then extracted 5 times with 5 mL of ether. The collected ether phases were washed with citric/citrate aqueous buffer, pH 4 (2 times, 10 mL) and with water (2 times, 5 mL), and dried (Na2SO4). After removal of the solvent under vacuum, the crude product was dissolved in 1 mL of dry DMSO and used as such for the coupling to the oligonucleotide. 1H NMR (400 MHz, MeOD) δ = 1.55-1.80 (m, 12H), 2.73 (m, 1H), 3.95 (m, 1H), 7.33 (t, J = 8 Hz, 1H), 7.44 (m, 4H), 7.58 (m, 4H) ppm. 13C NMR (100 MHz, MeOD): δ = 23.2, 24.1, 29.5 (2C), 32.3, 50.8, 54.2, 81.7, 125.9, 127.8, 127.9, 128.3, 129.8, 140.8, 142.1, 147.3, 157.2, 181.7 ppm. ESI-MS N-Bpoc-cis-2-aminocyclopentanecarboxylic acid (C22H25NO4) m/z expected: 367.18 measured: 390.04 [M+Na+].

5.4.1.3 Synthesis of N-Nvoc cis-2-aminocyclopentanecarboxylic acid (1b). To 4 mL dioxane/water solution (1:1) of 0.1 mmol of cis-2-aminocyclopentanecarboxylic acid hydrochloride salt (17 mg) was added 0.1 mmol (1 eq.) of NvocCl and 0.4 mmol (34 mg) NaHCO3. The mixture was allowed to stir at room temperature for 3 h, then poured into aqueous 0.1 N HCl (20 mL) and extracted with ethyl acetate (5 times, 5 mL). The collected organic phases were washed with 10 mL of brine and dried over Na2SO4. After removal of the solvent under vacuum, the crude product was dissolved in 1 mL of dry DMSO and used as such for the coupling to the oligonucleotide. 1H NMR (400 MHz, MeOD) δ = 1.61-1.92 (m, 6H), 2.82 (m, 1H), 3.85 (s, 3H), 3.90 (s, 3H), 3.93 (m, 1H), 5.44 (s, 2H), 7.11 (s, 1H), 7.71 (s, 1H), 7.89 (s, 1H) ppm. 13C NMR (100 MHz, MeOD): δ = 21.1, 25.3, 32.6, 46.1, 52.1, 55.8, 61.3, 113.5, 128.9, 132.2, 143.3, 145.2, 159.6, 163.1, 173.8 ppm. ESI-MS confirmed the mass of the expected product: N-Nvoc-cis-2-aminocyclopentanecarboxylic acid (C16H20N2O8) m/z expected: 368.12 measured: 369.41 [M+H+].

5.4.1.4 Synthesis of 4-pentenoic N-hydroxy succinimide ester (1e).

N-hydroxysuccinimide (14 mmol, 1.61 g) was suspended in CH2Cl2 (10 mL) with diisopropylethylamine (14 mmol). A solution of pent-4-enoyl chloride (13 mmol, d = 1.074 g/mL) in CH2Cl2 (10 mL) was added dropwise over 1 h to the suspension. The mixture, which turned into a yellowish solution during this time, was stirred for a further 4 hours and then poured onto water (80 mL), and the water phase was extracted with CH2Cl2 (5 times, 5 mL). The organic phase was washed 2 times with 10 mL water, dried (Na2SO4) and concentrated under reduced pressure. The crude product, a white solid, was used as such in the further reaction. 1H NMR (400 MHz, CDCl3) δ = 2.44 (m, 2H), 2.63 (t, J = 7.9 Hz, 2H), 2.7-2.8 (m, 2H), 5.02 (dd, J1 = 20 Hz, J2 = 2 Hz, 1H), 5.64 (dd, J1 = 24 Hz, J2 = 2 Hz, 1H), 5.77 (m, 1H) ppm. 13C NMR (100 MHz, CDCl3): δ = 25.6, 28.3, 30.3, 99.1, 116.6, 135.2, 168.0, 169.1 ppm. ESI-MS 4-pentenoic N-hydroxy succinimide ester (C9H11NO4): m/z expected: 197.07 measured: 198.06 [M+H+].

5.4.1.5 Synthesis of Nα-Fmoc-Nε-Nvoc-lysine (2).

To 4 mL of dioxane/water solution (1:1) of 0.1 mmol of Nα-Fmoc-lysine was added 0.1 mmol of NvocCl and 0.4 mmol (34 mg) NaHCO3. The mixture was allowed to stir at room temperature for 3 h, then poured into aqueous 0.1 N HCl (20 mL) and extracted with ethyl acetate (5 times, 5 mL). The collected organic phases were washed with 10 mL of brine and dried over Na2SO4. After removal of the solvent under vacuum, the crude product was dissolved in 1 mL of dry DMSO and used as such for the coupling to the oligonucleotide. 1H NMR (400 MHz, d6-DMSO) δ = 1.4-1.75 (m, 6H), 3.1 (m, 2H), 3.85 (s, 3H), 3.89 (s, 3H), 3.93 (m, 1H), 4.20-4.29 (m, 3H), 5.44 (s, 2H), 7.17 (s, 1H), 7.33 (t, J = 8.9 Hz, 2H), 7.43 (t, J = 8.9 Hz, 2H), 7.70 (s, 1H), 7.75 (d, J = 8.8 Hz, 2H), 7.88 (d, J = 8.9 Hz, 2H). 13C NMR (100 MHz, d6-DMSO): δ = 23.6, 29.8, 32.3, 37.9, 46.6, 53.2, 57.2 (2C), 64.3, 66.6, 108.3, 120.1, 125.2, 126.8, 128.1, 129.0, 139.2, 141.3, 144.3, 147.6, 154.4, 156.7, 157.3, 174.5 ppm. ESI-MS N-Fmoc-N’-Nvoc-lysine (C31H33N3O10) m/z expected: 607.22 measured: 608.34 [M+H+].

5.4.1.6 Oligonucleotide conjugation of Bpoc or Nvoc N-protected cis-2-aminocyclopentanecarboxylic acid derivatives and Nα-Fmoc Nε-Nvoc-lysine. To a reaction volume of 300 µL, containing 70% (v/v) DMSO/water, were added 5 µL of either the crude N-protected (Bpoc or Nvoc) cis-2-aminocyclopentanecarboxylic acid derivative DMSO solution or the crude Nα-Fmoc-Nε-Nvoc-lysine DMSO solution, followed, in this order, by the following compounds to the respective final concentrations: N-hydroxysulfosuccinimide in DMSO, 10 mM; N-ethyl-N’-(3-dimethylaminopropyl)-carbodiimide in DMSO, 4 mM; aqueous triethylamine hydrochloride solution, pH 9.0, 80 mM; oligonucleotide aqueous solution, 50 µM, (5’-amino-C12-GGA GCT TGT GAA TTC TGG ATC TTA GGA CGT GTG TGA ATT GTC-3’) and 1. All coupling reactions were stirred overnight at 25 °C; residual activated species were then quenched by addition of 50 µL Tris-Cl buffer, 500 mM, pH 9.0. Prior to HPLC purification, 500 µL of 100 mM TEAA, pH 7.0, was added to the reaction mixture. The reactions were then purified by HPLC, the desired fractions were dried under reduced pressure and redissolved in 100 µL of water, and an aliquot (ca. 1 nmol) was analyzed by LC-ESI-MS. ESI-MS N-Nvoc cis-2-aminocyclopentanecarboxylic oligonucleotide conjugate: expected: 13676 measured: 13678; ESI-MS N-Bpoc cis-2-aminocyclopentanecarboxylic oligonucleotide conjugate: expected: 13675 measured: 13674; ESI-MS Nα-Fmoc-Nε-Nvoc-lysine oligonucleotide conjugate: expected: 13915 measured: 13916.

5.4.1.7 Cleavage of 2-pent-4-enamido-cis-cyclopentanecarboxylic acid oligonucleotide conjugate. 150 μL of a 27 μM aqueous solution of the oligonucleotide conjugate (according to absorption measurement at 260 nm using a NanoDrop instrument) were added to 150 μL of a 200 mM I2 solution in THF. After 1 h at room temperature, the reaction was quenched with 100 μL of aqueous 1 M sodium thiosulfate and purified by HPLC (yield 80%). The desired fractions were dried under reduced pressure and analyzed by LC-ESI-MS, revealing the expected product. ESI-MS cis-2-aminocyclopentanecarboxylic oligonucleotide conjugate: expected: 13437, measured: 13437.

5.4.1.8 Cleavage of N-Bpoc cis-2-aminocyclopentanecarboxylic acid oligonucleotide conjugate. 150 μL of a 27 μM aqueous solution of the oligonucleotide conjugate (according to absorption measurement at 260 nm using a NanoDrop instrument) were added to 150 μL of aqueous AcOH/AcONa, pH 3-4, and heated at 35 ºC for 1 h. Subsequently, the mixture was directly injected onto the HPLC (yield 90%). The desired fractions were dried under reduced pressure and analyzed by LC-ESI-MS, revealing the expected product. ESI-MS cis-2-aminocyclopentanecarboxylic oligonucleotide conjugate: expected: 13437, measured: 13438.

5.4.1.9 Cleavage of N-Nvoc cis-2-aminocyclopentanecarboxylic acid and Nα-Fmoc Nε-Nvoc-lysine oligonucleotide conjugate. 150 μL of a 27 μM aqueous solution of the oligonucleotide conjugate (according to absorption measurement at 260 nm using a NanoDrop instrument) were added to 150 μL of aqueous 2 mM AcOH/AcONa, pH 4.7, in a Pyrex glass vial and irradiated at 366 nm at 4 ºC for 30 min. Subsequently, the mixture was directly injected onto the HPLC (quantitative conversion). The desired fractions were dried under reduced pressure and analyzed by LC-ESI-MS, revealing the expected product. Notably, with the Nα-Fmoc, Nε-Nvoc-lysine oligonucleotide conjugate no Fmoc cleavage was observed. ESI-MS cis-2-aminocyclopentanecarboxylic oligonucleotide conjugate: expected: 13437, measured: 13439. ESI-MS Nε-Nvoc-lysine oligonucleotide conjugate: expected: 13676, measured: 13679.
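The expected conjugate masses before and after deprotection differ by the mass of the protecting group minus one hydrogen. A rough consistency check can be sketched as follows; the elemental compositions assigned to the Nvoc and Bpoc increments are assumptions derived from the group structures (they are not stated in the text), and average atomic masses are used because the conjugates weigh roughly 13.5 kDa.

```python
# Average atomic masses (u), appropriate for large oligonucleotide conjugates.
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def average_mass(composition: dict) -> float:
    """Average mass of a composition given as {element: count}."""
    return sum(AVERAGE[element] * n for element, n in composition.items())


# Net change when the group replaces one N-H hydrogen (assumed compositions).
NVOC = {"C": 10, "H": 9, "N": 1, "O": 6}  # 6-nitroveratryloxycarbonyl
BPOC = {"C": 16, "H": 14, "O": 2}         # 2-(4-biphenylyl)isopropoxycarbonyl

FREE_AMINE = 13437  # expected mass of the deprotected conjugate (from the text)

for name, group, reported in [("Nvoc", NVOC, 13676), ("Bpoc", BPOC, 13675)]:
    predicted = FREE_AMINE + average_mass(group)
    print(f"{name}: predicted {predicted:.0f}, reported expected {reported}")
```

Under these assumptions the predicted masses of the protected conjugates agree with the expected values listed in 5.4.1.6.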

5.4.2 Synthesis of model scaffolds for Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid derivative based library.

5.4.2.1 Synthesis of (1R,3R,4R)-methyl 3-azido-4-Boc-amino- cyclopentanecarboxylate (4).

To a solution of the alcohol 3 (1 mmol, 259 mg) in CH2Cl2 (20 mL) were added triethylamine (3 mmol, 0.45 mL) and methanesulfonyl chloride (1.6 mmol). The solution was stirred for 45 min and then treated with water (100 mL). The water phase was extracted with CH2Cl2 (5 times, 25 mL) and the organic extract was washed with brine (2×25 mL), dried (Na2SO4) and concentrated under reduced pressure. The crude was dissolved in 20 mL of DMF and a solution of sodium azide (20 mL DMF) was added. The suspension was heated at 70 ºC for 8 h, then quenched in water (100 mL) and extracted with ethyl acetate (5 times, 25 mL). Subsequently, the organic extract was washed with brine (2×25 mL), dried (Na2SO4) and concentrated under reduced pressure prior to use in the next reaction. 1H NMR (400 MHz, MeOD) δ = 1.33 (s, 9H), 1.46 (m, 1H), 1.61 (m, 2H), 2.11 (m, 2H), 2.75 (m, 1H), 3.37 (m, 1H), 3.77 (s, 3H). ESI-MS C12H20N4O4 m/z expected: 284.15, measured: 306.96 [M+Na+].

5.4.2.2 Synthesis of (1S,3R,4R)-methyl 3-amino-4-Boc-amino-cyclopentanecarboxylate (5). A stirred suspension of the crude 4 (ca. 1 mmol) and Pd/C (102 mg, 10% Pd) in 5 mL MeOH was kept under an overpressure of H2 for 3 h. Subsequently, the catalyst was filtered off and the MeOH removed under reduced pressure. The crude was used as such in the next reaction. 1H NMR (400 MHz, MeOD) δ = 1.38 (s, 9H), 1.47 (m, 1H), 1.59 (m, 2H), 2.26 (m, 2H), 3.10 (m, 1H), 3.57 (m, 1H), 3.77 (s, 3H). ESI-MS C12H22N2O4 m/z expected: 258.16, measured: 259.00 [M+H+].

5.4.2.3 Synthesis of (1S,3R,4R)-methyl 3-Fmoc-amino-4-Boc-amino-cyclopentanecarboxylate (6). FmocCl (1.25 mmol, 323 mg) and diisopropylethylamine (2.5 mmol, 0.44 mL) were added to a solution of the crude 5 (ca. 1 mmol) in 25 mL DMF. After 3 h of stirring, the mixture was treated with water (100 mL), the water phase was extracted with ethyl acetate (5 times, 25 mL), and the organic extract was washed with brine (2×25 mL), dried (Na2SO4) and concentrated under reduced pressure. The crude was used directly in the next reaction. 1H NMR (400 MHz, CDCl3) δ = 1.41 (s, 9H), 1.46 (m, 1H), 1.55 (m, 2H), 2.32 (m, 2H), 3.77 (s, 3H), 3.96 (m, 1H), 4.12 (m, 1H), 4.77 (m, 1H), 5.02 (m, 2H), 7.29-7.41 (m, 4H), 7.53-7.89 (m, 4H). ESI-MS C27H32N2O6 m/z expected: 480.23, measured: 481.06 [M+H+].

5.4.2.4 Synthesis of (1S,3R,4R)-methyl 3-Fmoc-amino-4-Nvoc-amino-cyclopentanecarboxylate (8). To a suspension of the crude 6 (ca. 1 mmol) in water/dioxane 1:1 (20 mL) was added 10 mL of 6 N aqueous HCl and the mixture was stirred at 50 ºC for 5 h, during which time the solids dissolved. The solvent was then removed and the residue dissolved again in water/dioxane 1:1 (10 mL). After adjustment of the pH to 9 with Na2CO3, NvocCl (1.5 mmol, 414 mg) was added and the reaction stirred for 3 h at room temperature. The mixture was then treated with water (50 mL), the water phase was extracted with ethyl acetate (5 times, 10 mL), and the organic extract was washed with brine (2×15 mL), dried (Na2SO4) and concentrated under reduced pressure.

Preparative HPLC was performed on an XTerra Prep RP18 column (5 µm, 10 × 150 mm) using a linear gradient from 10% to 100% MeCN, 0.1% TFA; the desired fractions were collected and lyophilized (yellowish solid, 73 mg). 1H NMR (400 MHz, MeOD) δ = 1.21 (m, 1H), 1.71 (m, 2H), 2.23 (m, 2H), 2.85 (m, 1H), 2.89 (m, 1H), 3.73 (s, 3H), 3.81 (s, 3H), 3.90-3.92 (m, 2H), 4.3 (m, 1H), 5.14 (d, J = 20 Hz, 1H), 5.41 (d, J = 20 Hz, 1H), 7.13-7.63 (m, 10H). ESI-MS C31H31N3O10 m/z expected: 605.20, measured: 606.02 [M+H+].

5.5 Stepwise encoding

Gel electrophoresis was performed using either 15% Tris-Borate-EDTA-Urea denaturing polyacrylamide gels (TBE-Urea, Invitrogen, cat.no EC68852) or 20% Tris-Borate-EDTA native polyacrylamide gels (TBE, Invitrogen, cat.no EC63152), stained with SYBR Green II. Ethanol precipitation of DNA was performed by adding 1/10 volume of 3 M AcOH/AcONa buffer, pH 4.7, and 3 volumes of ethanol relative to the volume of the DNA sample. After 2 h incubation at -23 ºC, the mixture was centrifuged in a table-top centrifuge for 40 min (16,000 g) at 4 ºC, the supernatant removed and the pellet washed with 300 µL of ice-cold 90% ethanol. After a further 20 min centrifugation (16,000 g) at 4 ºC, the pellet was dried and redissolved in water.
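The precipitation recipe scales with the sample volume. As a small convenience sketch (the function name and the returned keys are illustrative assumptions, not part of the protocol), the volumes to add can be computed as:

```python
def ethanol_precipitation_volumes(sample_ul: float) -> dict:
    """Volumes in µL following the recipe above: 1/10 volume of 3 M
    AcOH/AcONa buffer (pH 4.7) and 3 volumes of ethanol, both relative
    to the volume of the DNA sample."""
    buffer_ul = sample_ul / 10
    ethanol_ul = 3 * sample_ul
    return {
        "acetate_buffer_ul": buffer_ul,
        "ethanol_ul": ethanol_ul,
        "total_ul": sample_ul + buffer_ul + ethanol_ul,
    }


print(ethanol_precipitation_volumes(100))
```

The total volume is useful for checking that the mixture still fits the tube used in the table-top centrifuge.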

5.5.1 Stepwise encoding by Ligation. Hybridization of 3 pairs (A, B, C) of oligonucleotides (A: 5’-CAT GGA ATT CGC TCA CTC CGA CTA GAG G-3’ and 5’-(Phosphate)-CGT ACC TCT AGT CGG AGT GAG CGA ATT CCA TG-3’; B: 5’-(Phosphate)-TAC GTG AGC TTG ACC TGG TGA G-3’ and 5’- (Phosphate)-GCT TCT CAC CAG GTC AAG CTC A-3’; C: 5’-(Phosphate)-AAG CAC GTT CGC TGG ATC CTC AAC TGT G-3’ and 5’-CAC AGT TGA GGA TCC AGC GAA CGT-3’; underlined sequences represent coding sequences) was carried out by mixing the oligonucleotides at a concentration of 1.25 µM per oligonucleotide in 1x ligase buffer (40 mM Tris-HCl, 10 mM MgCl2, 10 mM DTT, 0.5 mM ATP, pH 7.8) and incubating the mixtures for 10 minutes at 50 °C. Subsequently the ligations were performed mixing 10 µl of hybridized oligonucleotide pairs A and B with 10 µl of 1x ligase buffer and 1 µL of T4 ligase (Roche Applied Science, Basel, Switzerland), and incubated at 25 ºC for 2 hours. The ligation product was purified using a Qiagen Nucleotide Removal Kit, and eluted with 50 µl of 10 mM Tris-HCl pH 8.0. 18 µl of the eluate was mixed with 10 µl of hybridized oligonucleotide pair C (which was present in excess), 2 µl of 10x ligase buffer, and 1 µl of T4 ligase, and incubated for 2 hours at 25 ºC. Aliquots of the two starting oligonucleotides, and the different ligation products were subjected to electrophoresis on a 20% TBE gel.
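The ligation scheme relies on the three hybridized pairs carrying mutually compatible four-nucleotide 5' overhangs. The sketch below checks this for the sequences listed above; the four-nucleotide overhang length and the helper names are assumptions inferred from the sequences, not statements from the protocol.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Strands copied from the text (5'->3'), spaces removed.
A_TOP = "CATGGAATTCGCTCACTCCGACTAGAGG"
A_BOT = "CGTACCTCTAGTCGGAGTGAGCGAATTCCATG"
B_TOP = "TACGTGAGCTTGACCTGGTGAG"
B_BOT = "GCTTCTCACCAGGTCAAGCTCA"
C_TOP = "AAGCACGTTCGCTGGATCCTCAACTGTG"
C_BOT = "CACAGTTGAGGATCCAGCGAACGT"


def overhangs_anneal(end_1: str, end_2: str) -> bool:
    """Two 5' overhangs can pair if one is the reverse complement of the other."""
    return end_1 == revcomp(end_2)


# Junction A/B: overhang on the bottom strand of A vs. top strand of B.
ab_ok = overhangs_anneal(A_BOT[:4], B_TOP[:4])
# Junction B/C: overhang on the bottom strand of B vs. top strand of C.
bc_ok = overhangs_anneal(B_BOT[:4], C_TOP[:4])

# Fully ligated product: both strands should be perfectly complementary.
top = A_TOP + B_TOP + C_TOP
bottom = C_BOT + B_BOT + A_BOT
fully_paired = revcomp(top) == bottom

print(ab_ok, bc_ok, fully_paired, len(top))
```

A check of this kind is a quick way to catch transcription errors in oligonucleotide designs before ordering or before interpreting a gel.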

5.5.2 Stepwise encoding by a combination of Klenow polymerase and Ligation. To a reaction volume of 50 µL, reagents were added to the respective final concentrations: a 42mer 5’-amino-C12-DNA-oligonucleotide (5’-GGA GCT TGT GAA TTC TGG ATC TTA GGA CGT GTG TGA ATT GTC-3’), 2 µM, a 42mer 5’- C6-biotinylated-oligonucleotide containing the non-palindromic BssSI restriction site (in boldface type) (5‘-GTA GTC GGA CAC GAG TAC TGG TAA TCG ACA ATT CAC ACA CGT CC–3‘; underlined sequences represent coding sequences), 3 µM, klenow buffer, dNTPs (Roche, cat.no 11969064001), 0.5 mM, and Klenow Polymerase enzyme, 5 units. After incubation at 37 ºC for 1 h, the reaction mixture was purified on ion-exchange cartridge and eluted in 25 µL of water. 8 units of BssSI enzyme were added to the purified Klenow product in 50 µL of BssSI restriction

buffer. The restriction digestion was carried out at 37 ºC for 1.5 h. 50 µL of streptavidin-sepharose slurry (GE Healthcare, cat.no 17-5113-01) was added and the slurry was incubated for 30 min at 4 ºC. After SpinX column centrifugation, the supernatant was collected, purified on an ion-exchange cartridge and eluted in 25 µL of water. Subsequently, the following reagents were added to a final volume of 50 µL: a preincubated 1:1 mixture of hybridized oligonucleotides (27mer 5’-phosphate-TCG TGA AAT TTG CTA GGA TCC ATA TTG-3’ and 23mer 5’-CAA TAT GGA TCC TAG CAA ATT TC-3’), 3 µM, T4 ligase buffer (Roche Applied Science, Basel, Switzerland) and T4 ligase (Roche Applied Science, Basel, Switzerland), 4 units. The ligation was performed overnight at 16 ºC and then purified on an ion-exchange cartridge. Aliquots of the starting oligonucleotides and of the different Klenow, restriction and ligation products were analyzed on a 15% TBE-Urea gel. Sequencing of the excised band after the three stepwise encoding steps confirmed the identity of the expected product.

5.5.3 Stepwise encoding by Klenow Polymerase. To a reaction volume of 50 µL, reagents were added to the respective final concentrations: a 42mer 5’-amino-C12-DNA-oligonucleotide (5’-GGA GCT TGT GAA TTC TGG ATC TTA GGA CGT GTG TGA ATT GTC-3’), 2 µM, 42mer 3’- C6-biotinylated-oligonucleotide (5’-GTA GTC GGA TCC GAC CAC GTT CCT GAC AAT TCA CAC ACG TCC-3’; underlined sequences represent coding sequences), 3 µM, Klenow buffer, dNTPs (Roche, cat.no 11969064001), 0.5mM, Klenow Polymerase enzyme, 5 units. The Klenow polymerization reaction was incubated at 37 ºC for 1 h, purified on ion-exchange cartridge and eluted in 100 µL of 4 M urea. After incubating at 94 ºC for 2 min, 50 µl of streptavidin-sepharose slurry (GE Healthcare, cat.no 17-5113-01) were added and the slurry was incubated for 1 h at 4 ºC. The streptavidin sepharose resin and the supernatant were separated by centrifugation in a SpinX column. The DNA in the supernatant was ethanol precipitated as described above. The resulting single-stranded oligonucleotide was mixed with a 42mer unmodified DNA oligonucleotide (5’-GTC GTA TCG CCA TGG TCC AAC ATC GTA GTC GGA GAG GAC CAC-3’) and a Klenow polymerization reaction was performed as described above. Aliquots of the three starting oligonucleotides, and the different Klenow products were applied on a 15% TBE-Urea gel.
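The first Klenow step works because the 3' ends of the amino-modified and the biotinylated oligonucleotides are complementary, so that each strand serves as template for the other. The following sketch locates that overlap and builds the extended strand for the first pair of 42mers given above; the function name and the minimum overlap threshold are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def klenow_fill_in(oligo_a: str, oligo_b: str, min_overlap: int = 10):
    """Find the longest 3'-terminal overlap between two oligonucleotides and
    return (overlap length, oligo_a extended to full length on oligo_b)."""
    rc_b = revcomp(oligo_b)
    for n in range(min(len(oligo_a), len(rc_b)), min_overlap - 1, -1):
        if oligo_a[-n:] == rc_b[:n]:
            return n, oligo_a + rc_b[n:]
    raise ValueError("no 3' overlap of the required length found")


# Sequences copied from the text (5'->3'), spaces removed.
AMINO_42 = "GGAGCTTGTGAATTCTGGATCTTAGGACGTGTGTGAATTGTC"
BIOTIN_42 = "GTAGTCGGATCCGACCACGTTCCTGACAATTCACACACGTCC"

overlap, extended = klenow_fill_in(AMINO_42, BIOTIN_42)
print(overlap, len(extended))
print(extended)
```

The extended strand corresponds to the single-stranded product that remains in the supernatant after the biotinylated strand is captured on streptavidin-sepharose under denaturing conditions.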

5.5.4 Stepwise coupling and encoding of model compound for Nα-Fmoc, Nε-Nvoc di-amino carboxylic acid derivative based library. To a reaction volume of 310 µL, containing 70% (v/v) DMSO/water, compounds were added to the respective final concentrations in the following order: N-Fmoc-N’-Nvoc-lysine (2) DMSO solution, 4 mM; N-hydroxysulfosuccinimide in DMSO, 10 mM; N-ethyl-N’-(3-dimethylaminopropyl)-carbodiimide in DMSO, 4 mM; aqueous triethylamine hydrochloride solution, pH 9.0, 80 mM; oligonucleotide aqueous solution, 100 µM, (5’-amino-C12-GGA GCT TGT GAA TTC TGG GTT AGT GGA CGT GTG TGA ATT GTC-3’, underlined sequence represents coding sequence). The reaction was stirred overnight at 25 °C; residual activated species were then quenched, and the Fmoc group simultaneously removed, by addition of piperidine (500 mM in DMSO). Prior to HPLC purification, 500 µL of 100 mM TEAA, pH 7.0, was added to the reaction mixture. The reaction was then purified by HPLC and the desired fractions were dried under reduced pressure, redissolved in 100 µL of water and analyzed by LC-ESI-MS. The sample showed the expected Fmoc-deprotected product, the N’-Nvoc-lysine DNA-oligonucleotide conjugate.

Subsequently, a further amide bond forming reaction step was performed. To a final volume of 310 µL, containing 70% (v/v) DMSO/water, the following compounds were added to the respective final concentrations: 3-p-tolylpropanoic acid DMSO solution, 4 mM; N-hydroxysulfosuccinimide in DMSO, 10 mM; N-ethyl-N’-(3-dimethylaminopropyl)-carbodiimide in DMSO, 4 mM; aqueous triethylamine hydrochloride, pH 9.0, 80 mM; 50 µM N’-Nvoc-lysine DNA-oligonucleotide conjugate. The reaction was stirred overnight at 25 °C; residual activated species were then quenched by addition of 50 µL Tris-Cl buffer, 500 mM, pH 9.0. The mixture was quantitatively precipitated by sequential addition of 25 µL of 1 M acetic acid, 12.5 µL of 3 M sodium acetate buffer, pH 4.7, and 500 µL ethanol, followed by 2 h incubation at -23 ºC.
The DNA was centrifuged and the resulting oligonucleotide pellet was washed with ice-cold 90% (v/v) ethanol and then dissolved in 300 µL of aqueous 1 mM AcOH/AcONa, pH 4.7, in a Pyrex glass vial. Following irradiation at 366 nm at 4 ºC for 30 min, an aliquot of the mixture was injected onto the HPLC and the desired fractions were analyzed by LC-ESI-MS, revealing the expected product, the (3-p-tolylpropanoyl)-lysine oligonucleotide conjugate. The encoding of the 3-p-tolylpropanoyl moiety was achieved by adding the following reagents to a final volume of 50 µL to the respective

final concentrations: (3-p-tolylpropanoyl)-lysine oligonucleotide conjugate (5’-GGA GCT TGT GAA TTC TGG GTT AGT GGA CGT GTG TGA ATT GTC-3’, underlined sequence represents coding sequence), 2 µM, a 42mer 5’-C6-biotinylated-oligonucleotide containing the non-palindromic BssSI restriction site (in boldface type) (5’-GTA GTC GGA CAC GAG TAC TGG TAA TCG ACA ATT CAC ACA CGT CC-3’; underlined sequences represent coding sequences), 3 µM, Klenow buffer, dNTPs (Roche, cat.no 11969064001), 0.5 mM, and Klenow Polymerase enzyme, 5 units. After incubation at 37 ºC for 1 h, the reaction mixture was purified on an ion-exchange cartridge and eluted in 50 µL of water. Subsequently, 40 µL of the encoded (3-p-tolylpropanoyl)-lysine oligonucleotide conjugate was coupled to Cy5 by addition of Cy5-NHS ester (Amersham, cat.no PA25001) and aqueous triethylamine hydrochloride solution, pH 9.0, to final concentrations of 4 mM and 80 mM, respectively. The reaction was stirred overnight at 25 °C. The mixture was then quantitatively precipitated by sequential addition of 25 µL of 1 M acetic acid, 12.5 µL of 3 M sodium acetate buffer, pH 4.7, and 500 µL ethanol, followed by 2 h incubation at -23 ºC. The DNA was centrifuged and the resulting oligonucleotide pellet was washed with ice-cold 90% (v/v) ethanol.

Finally, the encoding of the Cy5 moiety was performed. 8 units of BssSI enzyme were added to the N’-Cy5-N-(3-p-tolylpropanoyl)-lysine oligonucleotide conjugate in 50 µL of BssSI restriction buffer. The restriction digestion was carried out at 37 ºC for 1.5 h. 50 µL of streptavidin-sepharose slurry (GE Healthcare, cat.no 17-5113-01) was added and the slurry incubated for 30 min at 4 ºC. After SpinX column centrifugation, the supernatant was collected, purified on an ion-exchange cartridge and eluted in 25 µL of water.
Subsequently, the following reagents were added to a final volume of 50 µL: a preincubated 1:1 mixture of hybridized oligonucleotides (27mer 5’-phosphate-TCG TGA AAT TTG CTA GGA TCC ATA TTG-3’ and 23mer 5’-CAA TAT GGA TCC TAG CAA ATT TC-3’, underlined sequences represent coding sequences), 3 µM, T4 ligase buffer (Roche Applied Science, Basel, Switzerland) and T4 ligase (Roche Applied Science, Basel, Switzerland), 4 units. The ligation was performed overnight at 16 ºC and then purified on an ion-exchange cartridge. Aliquots of the starting oligonucleotides and of the different Klenow, restriction and ligation products were analyzed on a 15% TBE-Urea gel. Sequencing of the excised band after the three stepwise encoding steps confirmed the identity of the expected product.

5.5.5 Bacterial cloning and sequencing. Following polyacrylamide gel electrophoresis, the band of interest was excised, extracted in aqueous 10 mM Tris-Cl and PCR-amplified using the following oligonucleotides as primers: DEL_P1 (5’-GGA GCT TGT GAA TTC TGG-3’, underlined EcoRI restriction site) and DEL_P2 (5’-GTA GTC GGA TCC GAC CAC-3’, underlined BamHI restriction site). The PCR products were purified on ion-exchange cartridges, cloned into the pUC19 vector using the EcoRI and BamHI restriction sites and electroporated into TG1 bacteria. Sequencing of the vector from a number of colonies was performed using an ABI PRISM 3130 Genetic Analyzer (Applied Biosystems).
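Because the underlining of the restriction sites is lost in plain text, the primer design can be checked programmatically. The sketch below confirms that each primer contains its restriction site (EcoRI, GAATTC; BamHI, GGATCC) and that both primers match the ends of a template; the template used here, the 66-nucleotide product of the first Klenow step in 5.5.3, is chosen purely for illustration and is an assumption.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


DEL_P1 = "GGAGCTTGTGAATTCTGG"
DEL_P2 = "GTAGTCGGATCCGACCAC"
ECORI, BAMHI = "GAATTC", "GGATCC"

# Illustrative template: amino-42mer extended on the biotinylated 42mer
# of section 5.5.3 (18-nt overlap, so 24 nt are added).
AMINO_42 = "GGAGCTTGTGAATTCTGGATCTTAGGACGTGTGTGAATTGTC"
BIOTIN_42 = "GTAGTCGGATCCGACCACGTTCCTGACAATTCACACACGTCC"
template = AMINO_42 + revcomp(BIOTIN_42[:24])

checks = {
    "P1 carries an EcoRI site": ECORI in DEL_P1,
    "P2 carries a BamHI site": BAMHI in DEL_P2,
    "P1 matches the template 5' end": template.startswith(DEL_P1),
    "P2 anneals to the template 3' end": template.endswith(revcomp(DEL_P2)),
}
for description, ok in checks.items():
    print(f"{description}: {ok}")
```

With these primers the amplicon spans the whole template, so all coding regions lie between the two cloning sites.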


6. REFERENCES

1 E.S. Lander, L.M. Linton, B. Birren, C. Nusbaum, M.C. Zody, J. Baldwin, K. Devon, K. Dewar and M. Doyle et al., Initial sequencing and analysis of the human genome, Nature 409 (2001),860–921.

2 J.C. Venter, M.D. Adams, E.W. Myers, P.W. Li, R.J. Mural, G.G. Sutton, H.O. Smith, M. Yandell and C.A. Evans et al., The sequence of the human genome, Science 291 (2001), 1304–1351.

3 S.D. Patterson and R.H. Aebersold, Proteomics: the first decade and beyond, Nat. Genet. 33 (2003) (Suppl.), 311–323.

4 Robert L. Strausberg and Stuart L. Schreiber, From Knowing to Controlling: A Path from Genomics to Drugs Using Small Molecule Probes, Science 300 (2003), 294-295.

5 Stoughton RB, Applications of DNA Microarrays in Biology, Annu Rev Biochem 74 (2005), 53-82

6 Drews J. Drug discovery: a historical perspective. Science 287 (2000), 1960-1964

7 A. Furka, F. Sebestyen, M. Asgedom and G. Dibo, General method for rapid synthesis of multicomponent peptide mixtures, Int. J. Pept. Protein Res. 37 (1991), 487–493.

8 R.A. Houghten, C. Pinilla, S.E. Blondelle, J.R. Appel, C.T. Dooley and J.H. Cuervo, Generation and use of synthetic peptide combinatorial libraries for basic research and drug discovery, Nature 354 (1991), 84–86.

9 K.S. Lam, S.E. Salmon, E.M. Hersh, V.J. Hruby, W.M. Kazmierski and R.J. Knapp, A new type of synthetic peptide library for identifying ligand-binding activity, Nature 354 (1991), 82–84.


10 R.B. Merrifield, Solid phase peptide synthesis. I. The synthesis of a tetrapeptide, J. Am. Chem. Soc. 85 (1963), 2149–2154.

11 R. Frank, W. Heikens, G. Heisterberg-Moutsis and H. Blocker, A new general approach for the simultaneous chemical synthesis of large numbers of oligonucleotides: segmental solid supports, Nucl. Acids Res. 11 (1983), 4365–4377.

12 R.A. Houghten, General method for the rapid solid-phase synthesis of large numbers of peptides: specificity of antigen-antibody interaction at the level of individual amino acids, Proc. Natl. Acad. Sci. U. S. A 82 (1985), 5131–5135.

13 G.P. Smith, Filamentous fusion phage: novel expression vectors that display cloned antigens on the virion surface, Science 228 (1985), 1315–1317.

14 T. Clackson, H.R. Hoogenboom, A.D. Griffiths and G. Winter, Making antibody fragments using phage display libraries, Nature 352 (1991), 624–628.

15 E.T. Boder and K.D. Wittrup, Yeast surface display for screening combinatorial polypeptide libraries, Nat. Biotechnol. 15 (1997), 553–557.

16 J. Hanes and A. Pluckthun, In vitro selection and evolution of functional proteins by using ribosome display, Proc. Natl. Acad. Sci. U. S. A 94 (1997), 4937–4942.

17 J. Bertschinger, D. Grabulovsky, D. Neri, Selection of single domain binding proteins by covalent DNA display, Protein Eng Des Sel 20 (2007), 57-68.

18 S. Brenner and R.A. Lerner, Encoded combinatorial chemistry, Proc. Natl. Acad. Sci. U. S. A. 89 (1992), 5381–5383.


19 J. Nielsen, S. Brenner and K.D. Janda, Synthetic methods for the implementation of encoded combinatorial chemistry, J. Am. Chem. Soc. 115 (1993), 9812–9813.

20 M.C. Needels, D.G. Jones, E.H. Tate, G.L. Heinkel, L.M. Kochersperger, W.J. Dower, R.W. Barrett and M.A. Gallop, Generation and screening of an oligonucleotide-encoded synthetic peptide library, Proc. Natl. Acad. Sci. U. S. A. 90 (1993), 10700–10704.

21 Meo T, Gramsch C, Inan R, Hollt V, Weber E, Herz A, Riethmuller G, Monoclonal antibody to the message sequence Tyr-Gly-Gly-Phe of opioid peptides exhibits the specificity requirements of mammalian opioid receptors, Proc. Natl. Acad. Sci. U. S. A. 80 (1983), 4084-4088.

22 Mukund S. Chorghade, Drug Discovery and Development - Combinatorial Chemistry in the Drug Discovery Process ISBN: 9780471398486 Ed. 2006 John Wiley & Sons, Inc., 129-167

23 K. FitzGerald, In vitro display technologies – new tools for drug discovery, Drug Discov. Today 5 (2000), 253–258.

24 Pedersen, H., Gouliaev, A.H., Sams, K.C., Slok, F.A., Freskgard, P.-O., Holtmann, A., Kampmann Olsen, E., Husemoen Gitte, N., Felding, J., et al., 2002. Methods for template-directed synthesis of and modification of polymers and screening for desired activity. WO02103008.

25 Pedersen, H., Holtmann, A., Franch, T., Gouliaev, A.H., Felding, J., 2003. Methods for template-directed synthesis of and modification of polymeric libraries and their use in screening for biological activity. WO03078625.

26 Freskgard, P.-O., Franch, T., Gouliaev, A.H., Lundorf, M.D., Felding, J., Olsen, E.K., Holtmann, A., Jakobsen, S.N., Sams, C., et al., 2004. Bifunctional substances and their use in preparation and enzyme-based encoding of combinatorial libraries. WO2004039825.


27 Morgan, B., Hale, S., Arico-Muendel, C.C., Clark, M., Wagner, R., Israel, D.I., Gefter, M.L., Benjamin, D., Hansen, N.J.V., et al., 2004. Methods and building blocks for synthesis of combinatorial libraries of mols. comprising functional moieties operatively linked to encoding oligonucleotides. WO2005058479.

28 D.R. Halpin and P.B. Harbury, DNA display I. Sequence-encoded routing of DNA populations, PLoS Biol. 2 (2004), 1015–1021.

29 D.R. Halpin and P.B. Harbury, DNA display. II. Genetic manipulation of combinatorial chemistry libraries for small-molecule evolution, PLoS Biol. 2 (2004), 1022–1030.

30 T. Meo, C. Gramsch, R. Inan, V. Hollt, E. Weber, A. Herz and G. Riethmuller, Monoclonal antibody to the message sequence Tyr-Gly-Gly-Phe of opioid peptides exhibits the specificity requirements of mammalian opioid receptors, Proc. Natl. Acad. Sci. U. S. A 80 (1983), 4084–4088.

31 D.R. Halpin, J.A. Lee, S.J. Wrenn and P.B. Harbury, DNA display III. Solid-phase organic synthesis on unprotected DNA, PLoS Biol. 2 (2004), 1031–1038.

32 C.A. Lipinski, F. Lombardo, B.W. Dominy and P.J. Feeney, Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings, Adv. Drug Deliv. Rev. 46 (2001), 3–26.

33 Z.J. Gartner and D.R. Liu, The generality of DNA-templated synthesis as a basis for evolving non-natural small molecules, J. Am. Chem. Soc. 123 (2001), 6961–6963.

34 C.T. Calderone, J.W. Puckett, Z.J. Gartner and D.R. Liu, Directing otherwise incompatible reactions in a single solution by using DNA-templated organic synthesis, Angew Chem., Int. Ed. Engl. 41 (2002), 4104–4108.


35 D. Summerer and A. Marx, DNA-templated synthesis: more versatile than expected, Angew Chem., Int. Ed. Engl. 41 (2002), 89–90.

36 X. Li and D.R. Liu, DNA-templated organic synthesis: nature's strategy for controlling chemical reactivity applied to synthetic molecules, Angew Chem., Int. Ed. Engl. 43 (2004), 4848–4870.

37 M.W. Kanan, M.M. Rozenman, K. Sakurai, T.M. Snyder and D.R. Liu, Reaction discovery enabled by DNA-templated synthesis and in vitro selection, Nature 431 (2004), 545–549.

38 Z.J. Gartner, M.W. Kanan and D.R. Liu, Multistep small-molecule synthesis programmed by DNA templates, J. Am. Chem. Soc. 124 (2002), 10304–10306.

39 T.M. Snyder and D.R. Liu, Ordered multistep synthesis in a single solution directed by DNA templates, Angew Chem., Int. Ed. Engl. 44 (2005), 7379–7382.

40 J.B. Doyon, T.M. Snyder and D.R. Liu, Highly sensitive in vitro selections for DNA-linked synthetic small molecules with protein binding affinity and specificity, J. Am. Chem. Soc. 125 (2003), 12372–12373.

41 Z.J. Gartner, B.N. Tse, R. Grubina, J.B. Doyon, T.M. Snyder and D.R. Liu, DNA-templated organic synthesis and selection of a library of macrocycles, Science 305 (2004), 1601–1605.

42 Mannocci L., Zhang Y., Scheuermann J., Leimbacher M., De Bellis G., Rizzi E., Dumelin C., Melkko S, Neri D., High-throughput sequencing allows the identification of binding molecules isolated from DNA-encoded chemical libraries, Proc. Natl. Acad. Sci. U. S. A. 105(46), (2008), 17670-17675.

43 Buller F., Mannocci L., Zhang Y., Dumelin C.E., Scheuermann J., Neri D., Design and synthesis of a novel DNA-encoded chemical library using Diels-Alder cycloadditions, Bioorg Med. Chem. Lett. 18(22), (2008), 5926-5931.


44 Margulies M., et al., Genome sequencing in microfabricated high-density picolitre reactors. Nature, 437(7057) (2005), 376-80.

45 S.C. Schuster, Nat. Methods 5 (1) (2008), 16-18.

46 K. Hoogsteen, The crystal and molecular structure of a hydrogen-bonded complex between 1-methylthymine and 9-methyladenine. Acta Crystallographica 16 (1963), 907-916.

47 S. Melkko, J. Scheuermann, C.E. Dumelin and D. Neri, Encoded self-assembling chemical libraries, Nat. Biotechnol. 22 (2004), 568–574.

48 Cheng Y.K., Pettitt B.M., Stabilities of double- and triple-strand helical nucleic acids. Prog Biophys Mol Biol. 58(3) (1992), 225-257.

49 Aich P., Ritchie S., Bonham K., Lee J.S., Thermodynamic and kinetic studies of the formation of triple helices between purine-rich deoxyribo-oligonucleotides and the promoter region of the human c-src proto-oncogene. Nucleic Acids Res. 26(18) (1998), 4173-4177.

50 S. Melkko, C.E. Dumelin, J. Scheuermann, D. Neri, On the magnitude of the chelate effect for the recognition of proteins by pharmacophores scaffolded by self-assembling oligonucleotides. Chem Biol. 13(2) (2006), 225-231.

51 M. Lovrinovic and C.M. Niemeyer, DNA microarrays as decoding tools in combinatorial chemistry and chemical biology, Angew Chem., Int. Ed. Engl. 44 (2005), 3179–3183.

52 M. Uttamchandani, D.P. Walsh, S.Q. Yao and Y.T. Chang, Small molecule microarrays: recent advances and applications, Curr. Opin. Chem. Biol. 9 (2005), 4–13.

53 S. Melkko, J. Sobek, G. Guarda, J. Scheuermann, C.E. Dumelin and D. Neri, Encoded self-assembling chemical libraries, Chimia 59 (2005), 798–802.


54 Dumelin C.E., Scheuermann J., Melkko S., Neri D., Selection of streptavidin binders from a DNA-encoded chemical library. Bioconjug Chem., 17(2) (2006), 366-70.

55 Dumelin CE, Trüssel S, Buller F, Trachsel E, Bootz F, Zhang Y, Mannocci L, Beck SC, Drumea-Mirancea M, Seeliger MW, Baltes C, Müggler T, Kranz F, Rudin M, Melkko S, Scheuermann J, Neri D. A portable albumin binder from a DNA-encoded chemical library. Angew Chem Int Ed Engl. 47(17) (2008); 3196-201.

56 Melkko S, Zhang Y, Dumelin CE, Scheuermann J, Neri D., Isolation of high-affinity trypsin inhibitors from a DNA-encoded chemical library. Angew Chem Int Ed Engl. 46(25) (2007), 4671-4674.

57 Scheuermann J, Dumelin CE, Melkko S, Zhang Y, Mannocci L, Jaggi M, Sobek J, Neri D., DNA-Encoded Chemical Libraries for the Discovery of MMP-3 Inhibitors. Bioconjug Chem. 19(3) (2008), 778-785.

58 Melkko S., Neri D., 2002 Encoded Self-Assembling Chemical Libraries, WO/2003/076943

59 Michael J. Heller, DNA-microarray Technology: Devices, Systems, and Applications. Annu. Rev. Biomed. Eng., 4 (2002). 129–153.

60 Southern, E.M., Detection of specific sequences among DNA fragments separated by gel electrophoresis. J Mol Biol., 98 (1975), 503-517.

61 Kulesh D.A., Clive D.R., Zarlenga D.S., Greene J.J., Identification of interferon-modulated proliferation-related cDNA sequences. Proc Natl Acad Sci USA, 84 (1987), 8453–8457.

62 Schena M., Shalon D., Davis R.W., Brown P.O., Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270 (1995), 467–470.


63 Lashkari D.A., DeRisi J.L., McCusker J.H., Namath A.F., Gentile C., Hwang S.Y., Brown P.O., Davis R.W., Yeast microarrays for genome wide parallel genetic and gene expression analysis. Proc Natl Acad Sci USA 94 (1997), 13057–13062.

64 Shendure, J., Mitra, R.D., Varma, C., Church G.M., Advanced sequencing technologies: methods and goals. Nat. Rev. Genet. 5 (2004), 335–344.

65 Sanger, F., Nicklen, S. & Coulson, A. R. DNA sequencing with chain-terminating inhibitors. Proc. Natl Acad. Sci. USA, 74 (1977), 5463–5467.

66 Prober, J. M. et al. A system for rapid DNA sequencing with fluorescent chain-terminating dideoxynucleotides. Science, 238 (1987), 336–341.

67 Nyren, P., Pettersson, B. & Uhlen, M. Solid phase DNA minisequencing by an enzymatic luminometric inorganic pyrophosphate detection assay. Anal. Biochem. 208 (1993), 171–175.

68 Ronaghi, M. et al. Real-time DNA sequencing using detection of pyrophosphate release. Anal. Biochem. 242 (1996), 84–89.

69 Jacobson, K. B. et al. Applications of mass spectrometry to DNA sequencing. GATA 8 (1991), 223–229.

70 Bains, W. & Smith, G. C. A novel method for nucleic acid sequence determination. J. Theor. Biol. 135 (1988), 303–307.

71 Jett, J. H. et al. High-speed DNA sequencing: an approach based upon fluorescence detection of single molecules. Biomol. Struct. Dynam. 7 (1989), 301–309.

72 M. Margulies, M. Egholm, W.E. Altman, S. Attiya, J.S. Bader, L.A. Bemben, J. Berka, M.S. Braverman and Y.J. Chen et al. Genome sequencing in microfabricated high-density picolitre reactors, Nature 437 (2005), 376–380.


73 J. Shendure, G. J. Porreca, N.B. Reppas, X. Lin, J.P. McCutcheon, A.M. Rosenbaum, M. D. Wang, K. Zhang, R.D. Mitra, G.M. Church. Accurate Multiplex Polony Sequencing of an Evolved Bacterial Genome. Science, 309(5741) (2005), 1728 – 1732.

74 http://solid.appliedbiosystems.com/ - Applied Biosystems' SOLiD technology.

75 http://www.illumina.com/

76 Braslavsky I., Hebert H., Kartalov E., Quake S.R.. Sequence information can be obtained from single DNA molecules. Proc. Natl Acad. Sci. USA, 100 (2003), 3960–3964.

77 M. Ronaghi, S. Karamohamed, B. Pettersson, M. Uhlen, P. Nyren. Real-time DNA sequencing using detection of pyrophosphate release. Anal. Biochem., 242 (1996), 84-89.

78 S. C. Macevicz, US Patent 5750341, filed 1995

79 Droege, M., The Genome Sequencer FLXTM System – Longer reads, More applications, Straight forward bioinformatics & More Complete Data Sets. J. Biotechnol. (2007), in press.

80 http://www.helicosbio.com/

81 Mitra R.D., Shendure J., Olejnik J., Olejnik E.K., Church G.M., Fluorescent in situ sequencing on polymerase colonies. Anal. Biochem., 320(1) (2003), 55-65.

82 Huber C.G., Oberacher H., Analysis of nucleic acids by on-line liquid chromatography-mass spectrometry. Mass Spectrom Rev. 20(5) (2001), 310-343.


83 Klenow H., Henningsen I., Selective Elimination of the Exonuclease Activity of the Deoxyribonucleic Acid Polymerase from Escherichia coli B by Limited Proteolysis. Proc Natl Acad Sci 65 (1970), 168–175.

84 Silacci M., Brack S., Schirru G., Mårlind J., Ettorre A., Merlo A., Viti F., Neri D., Design, construction, and characterization of a large synthetic human antibody phage display library. Proteomics, 5(9) (2005), 2340–2350.

85 Janeway, Travers, Walport, Shlomchik, Immunobiology, 6th Ed. (2005) Churchill Livingstone.

86 Walsh G., Biopharmaceutical benchmarks 2006. Nature Biotechnology, 24(7) (2006), 769-776.

87 Liotta L. A. et al. Metastatic potential correlates with enzymatic degradation of basement membrane collagen. Nature, 284 (1980), 67–68.

88 Brinckerhoff C. E., Matrisian, L. M., Matrix metalloproteinases: a tail of a frog that became a prince. Nature Rev. Mol. Cell Biol. 3 (2002), 207–214.

89 Coussens L. M., Fingleton B., Matrisian, L. M., Matrix metalloproteinase inhibitors and cancer: trials and tribulations. Science 295, 2387–2392 (2002).

90 Egeblad M., Werb Z., New functions for the matrix metalloproteinases in cancer progression. Nature Rev. Cancer. 2, (2002), 163–175.


7. Curriculum Vitae

Luca Mannocci
Wolfgang-Paulistrasse 10
ETH Zürich, HCI G398
CH-8093 Zürich
Switzerland
Tel.: +41 44 63 37 453
Fax.: +41 44 63 31 358
[email protected]

Personal Details

Name: Luca Mannocci
Date of birth: 7th of September 1979
Citizen: Zürich
Nationality: Italian
Civil state: Unmarried
Address: Kolbenacker 34, 8052 Zürich, Switzerland
Tel.: +41 43 44 39 900
Mobile: +41 76 43 76 485


Education

2005 – 2008 ETH Swiss Federal Institute of Technology Zürich, Switzerland. Ph.D. in Sciences

2004 Italian State exam for the habilitation to chemistry profession

1998 - 2004 Università degli Studi di Pisa Pisa, Italy. Degree in Chemistry (Mark: 110/110 e lode).

1998 Liceo Scientifico Statale “Ulisse Dini” (Scientific Lyceum), Pisa, Italy. High school diploma (Mark: 60/60).


Research Experience

SWISS FEDERAL INSTITUTE OF TECHNOLOGY (ETH)

2005-2008 Doctoral student, Institute of Pharmaceutical Sciences.

Ph.D. Thesis “DNA-encoded Chemical Libraries” Advisor: Prof. Dr. Dario Neri

UNIVERSITY OF PISA

2004-2005 Internship, Organic Chemistry Division of the Department of Chemistry at the University of Pisa

Silver(I)-catalysed protiodesilylation. Advisors: Prof. Adriano Carpita and Prof. Renzo Rossi

Collaboration for natural science book publication: Prof. R. Rossi, Prof. A. Carpita, Prof. F. Bellina, “Sostanze organiche naturali e loro derivati da analoghi strutturali con proprietà antineoplastiche”, (2005), Ed. Plus. (“Naturally occurring substances and structural analogues with antineoplastic properties”).


2002-2004 Training in the Organic Chemistry Division of the Department of Chemistry at the University of Pisa

Diploma Thesis “La prima sintesi totale del (-)-nitidone, una sostanza naturale con proprietà antitumorali, e del suo enantiomero” (“First total synthesis of naturally-occurring (-)-nitidon and its enantiomer”). Advisor: Prof. Adriano Carpita

Languages

Italian Native speaker

English Fluent

German Basic knowledge

Hungarian Basic knowledge


Publications and Patents

• Mannocci L., Zhang Y., Scheuermann J., Leimbacher M., De Bellis G., Rizzi E., Dumelin C.E., Melkko S., Neri D., “High-throughput sequencing allows the identification of binding molecules isolated from DNA-encoded chemical libraries”. PNAS, 2008, 105(46), 17670-17675.

• Mannocci L., Neri D., Melkko S., “DNA-Encoded Chemical Libraries”. SCREENING - Trends in Drug Discovery, 2009, 10, 16-18.

• Mannocci L., Melkko S., Neri D., DNA-encoded chemical libraries. US Patent Application 2008 No 61/008,249.

• Scheuermann J., Dumelin C.E., Melkko S., Zhang Y., Mannocci L., Jaggi M., Sobek J., Neri D., “DNA-encoded chemical libraries for the discovery of MMP-3 inhibitors”. Bioconjug Chem., 2008, 19(3), 778-785.

• Dumelin C.E., Trüssel S., Buller F., Trachsel E., Bootz F., Zhang Y., Mannocci L., Beck S.C., Drumea-Mirancea M., Seeliger M.W., Baltes C., Müggler T., Kranz F., Rudin M., Melkko S., Scheuermann J., Neri D. “A portable albumin binder from a DNA-encoded chemical library”. Angew Chem Int Ed Engl., 2008, 47(17), 3196-3201.

• Buller F., Mannocci L., Zhang Y., Dumelin C.E., Scheuermann J., Neri D., “Design and synthesis of a novel DNA-encoded chemical library using Diels-Alder cycloadditions”. Bioorg Med Chem Lett. 2008

• A. Carpita, L. Mannocci, R. Rossi, “Silver(I)-catalysed Protiodesilylation of 1-(Trimethylsilyl)-1-alkynes”. Eur. J. Org. Chem. 2005, 12, 1367-1377.

• F. Bellina, A. Carpita, L. Mannocci, R. Rossi “First total synthesis of naturally-occurring (-)-nitidon and its enantiomer”, Eur. J. Org. Chem. 2004, 12, 2610-2619.


Poster Presentations

• L. Mannocci, Y. Zhang, J. Scheuermann, M. Leimbacher, G. De Bellis, E. Rizzi, C. Dumelin, S. Melkko, D. Neri, “Novel Strategies for the Synthesis, Selection and Decoding of DNA-Encoded Chemical Libraries” - Molecular Medicine Tri-Conference, San Francisco, USA, 25th - 28th March 2008.

• A. Carpita, L. Mannocci, R. Rossi, XVII Convegno Nazionale della Divisione di Chimica Farmaceutica della Società Chimica Italiana, Pisa (PI), Italy, 6th – 10th September 2004.

• A. Carpita, L. Mannocci - poster and flash communication, XXIX Convegno Nazionale della Divisione Chimica Organica, Potenza (PZ), Italy, 31st August – 4th September 2004.

• M. Biagetti, A. Carpita, L. Mannocci, R. Rossi “Studi sulla sintesi del (-)-nitidone”, communication, VI Convegno Nazionale “Giornate di Chimica delle Sostanze Naturali”, Vietri sul Mare (SA), Italy, 29th September – 1st October 2003. Proceedings, p. 3.

8. ACKNOWLEDGMENTS

First of all, I would like to express my sincere gratitude to my PhD advisor Professor Dr. Dario Neri for giving me the privilege to pursue my doctoral studies in his laboratory. In every discussion, I perceived a brilliant intellect behind all his answers, as well as great wisdom in his questions. I especially appreciated his success-focused attitude towards research, and I was impressed by his excellent and wide-ranging scientific knowledge, an inexhaustible source of creativity and curiosity.

I would like to thank Dr. Yixin Zhang, Dr. Jörg Scheuermann and Dr. Samu Melkko. Their constant scientific and personal support inspired me in the day-to-day experiments and was absolutely essential for the accomplishment of this work. Dr. Yixin Zhang was an invaluable help in the design and setup of all the bioinformatic tools. His expertise and precious advice were often crucial for the synthesis and assembly of the library. Dr. Jörg Scheuermann introduced me to the field of DNA-encoded chemistry and to laboratory life. I constantly benefited from his priceless critical input and exciting discussions. Dr. Samu Melkko was a brilliant support for the selection procedures and a very big help with the gene cloning and the radioactive selections.

Further thanks go to the “chemistry team”: Christoph Dumelin, Fabian Buller, Jean-Paul Gapian, Sabrina Trüssel, Madalina Jaggi and Ilona Molnàr, for their priceless support and for the numerous daily, open, controversial and stimulating scientific (and non-scientific) discussions, which created the constantly and extraordinarily enjoyable atmosphere typical of room G398. Special thanks go to Markus Leimbacher for helping me with the assembly of the library during his Master’s thesis and for his heroic efforts in setting up the encoding strategies and the selection procedures.

I gratefully acknowledge my co-examiner Prof. Karl-Heinz Altmann for thoroughly proofreading my thesis and for all his valuable input. Additionally, I am most grateful to Gianluca De Bellis and Ermanno Rizzi from the Institute for Biomedical Technologies of Milan, who enabled us to use “454” high-throughput sequencing technology by providing the platform.


Finally, an enormous hug goes to all present and former members (and friends) of Professor Dario Neri’s group, who contributed to the pleasant atmosphere and gave me advice, friendship, vitality and, for many years, a home away from home. Without you the winter here would certainly have been much colder and the clouds not so bright. A smile lasts only for a while, but in memory it can last forever: you will always have a special place in the treasure chest of my heart.

I owe Prof. Adriano Carpita a debt not only of gratitude but above all of affection and esteem. His sincere friendship has accompanied and supported me throughout all these years. He taught me, along with the art that is now my profession, how a pleasant discovery is often more the fruit of tenacious, ingenious patience than of any other talent.

A thank-you and a hug certainly cannot be missing for my friend and companion in grand dreams and adventurous travels, Dr. Dario Lombardi. With wine, words and good cheer he has always helped me swallow the bitterest pills and leave my mistakes at the bottom of the tankard. I wish you with all my heart that you may soon reap all the success that you have sown over these years with your talent.

I also wish to express my most sincere gratitude to all my friends, “old” and “new”, for their support and for their closeness despite the distance. And since naming only some seems to me as unfair as naming none, I list them all, all hundred thousand: Alessio Catarsi, Luca Mantilli, Mirko Sardelli, Enrico Marsili, Roberto Scamuzzi, Sandro Orsini, Luca Reggiani, Giorgio La Corte, Francesco Attuali, Andrea Scarpellini, Silvia Anthoine Dietrich, Giulio Casi, Stefania Capone, Andrea Chicca, Cesare Borgia and all those who find no space here but certainly have it among my memories. I raise a toast to you, friends of today, friends of yesterday, friends, I hope, forever!

Special recognition goes to my Father, my Mother, my Grandparents and all my loved ones who are present today, and to all those who unfortunately cannot be here with me to celebrate this milestone. This work is dedicated to you who, throughout this story of mine, with love and patience, have supported and encouraged me day by day to pursue my dreams, to see kites where there were only clouds. There will always be a place of honour for you in the treasure chest of my heart.

Finally, I owe an immense hug and a special thank-you to Anita. She was my reserve of sunshine when the winter seemed endless, and my most powerful antidote against the most varied fumes of this laboratory. Perhaps it is true that life and dreams are pages of the same book: reading them in order is living, leafing through them at random is dreaming, but this book speaks of you all the same. You are everything I know about happiness. Thank you!

9. APPENDIX

9.1 Model compound oligonucleotide conjugates.

DEL_O_1: 5’-amino-C12-GGA GCT TGT GAA TTC TGG ATC TTA GGA CGT GTG TGA ATT GTC-3’

[Structure drawings of the three conjugated Fmoc-amino acids (A) are not reproduced.]

Conjugated Fmoc-amino acid (A) | Yield *) % | Recovery †) % | ESI-MS (Da) (in brackets the expected MS)
1 | 97 | 53 | 13474 (13473)
2 | 90 | 65 | 13439 (13437)
3 | 73 | 55 | 13457 (13459)

Table 9-1: HPLC coupling yields and recovery assessed after peptide bond formation reaction between three selected Fmoc-amino acids (A) and a model 5’-amino-oligonucleotide (DEL_O_1). *) Determined by HPLC after Fmoc deprotection of the oligonucleotide conjugated compound. †) Evaluated measuring the absorption at 260 nm using a NanoDrop instrument (ND-1000 UV-Vis spectrophotometer) following HPLC purification (see Chapter 3.1.6).


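[Structure drawings are not reproduced. A1 to A3 denote the three Fmoc-deprotected amino acid (A) oligonucleotide conjugates in the header row **), from left to right; B1 to B4 denote the four model carboxylic acids (B) in the first column, from top to bottom.]

Carboxylic acid (B) | Conjugate (A) | Yield *) % | Recovery †) % | ESI-MS ‡) (Da)
B1 | A1 | 98 | 90 | 13670 (13699)
B1 | A2 | 83 | 68 | 13663 (13663)
B1 | A3 | 65 | 60 | 13685 (13685)
B2 | A1 | 70 | 60 | 13732 (13733)
B2 | A2 | 72 | 60 | 13694 (13697)
B2 | A3 | 76 | 65 | 13721 (13719)
B3 | A1 | >70 | 70 | 13647 (13644)
B3 | A2 | >64 | 64 | 13609 (13608)
B3 | A3 | >57 | 57 | 13632 (13630)
B4 | A1 | >52 | 52 | 13670 (13670)
B4 | A2 | >55 | 55 | 136332 (13634)
B4 | A3 | >51 | 51 | 13657 (13656)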

Table 9-2: HPLC coupling yields and recovery assessed after the peptide bond formation reaction between three selected 5’-Fmoc-deprotected amino acid (A) oligonucleotide conjugates and four different model carboxylic acids (B). **) The structures of the Fmoc-deprotected amino acid (A) oligonucleotide conjugates are represented schematically in the header row, and the structures of the model carboxylic acids (B) in the first column. *) Determined by HPLC after Fmoc deprotection of the oligonucleotide-conjugated compound. †) Evaluated by measuring the absorption at 260 nm using a NanoDrop instrument (ND-1000 UV-Vis spectrophotometer) following HPLC purification (see Chapter 3.1.6). ‡) The calculated MS for the oligonucleotide-conjugated product is reported in brackets.

9.2 Library synthesis overview

List of the 20 Fmoc-amino acids and of the oligonucleotide codes used as initial building block for constructing DEL4000:

[Structure drawings are not reproduced.]

DEL_Cn | Name | MW | Coding sequence | HPLC Yield *) % | ESI-MS †) (Da)
1 | (S)-3-(((9H-fluoren-9-yl)methoxy)carbonylamino)-3-(pyridin-4-yl)propanoic acid | 388.42 | ATCTTA | 97 | 13474 (13474)
2 | 3-(((9H-fluoren-9-yl)methoxy)carbonylamino)-4-(4-bromophenyl)butanoic acid | 480.35 | GCTGCG | 70 | 13610 (13608)
3 | (1R,2S)-2-(((9H-fluoren-9-yl)methoxy)carbonylamino)cyclopentanecarboxylic acid | 351.4 | AGAACG | 86 | 13497 (13496)
4 | 3-(((9H-fluoren-9-yl)methoxy)carbonylamino)-3-(pyridin-2-yl)propanoic acid | 433.52 | GACATC | 53 | 13528 (13529)
5 | (S)-2-(((9H-fluoren-9-yl)methoxy)carbonylamino)-3-(3-fluorophenyl)propanoic acid | 405.43 | ATTACT | 64 | 13491 (13491)
6 | (1S,4R)-4-(((9H-fluoren-9-yl)methoxy)carbonylamino)cyclopent-2-enecarboxylic acid | 349.38 | ACGGCA | 72 | 13467 (13470)
7 | (R)-3-(4-((((9H-fluoren-9-yl)methoxy)carbonylamino)methyl)phenyl)-2-(tert-butoxycarbonylamino)propanoic acid | 516.58 | AGAGAA | 60 | 13586 (13685)
8 | Acetic acid, [[5-[[(9H-fluoren-9-ylmethoxy)carbonyl]amino]-10,11-dihydro-5H-dibenzo[a,d]cyclohepten-2-yl]oxy] | 505.56 | TCCAAA | 87 | 13588 (13585)
9 | (S)-2-(((9H-fluoren-9-yl)methoxy)carbonylamino)-3-(thiazol-4-yl)propanoic acid | 394.44 | TCGATC | 75 | 13482 (13481)
10 | (S)-2-(((9H-fluoren-9-yl)methoxy)carbonylamino)-3-(1-benzyl-1H-imidazol-4-yl)propanoic acid | 467.52 | TCCGGC | 60 | 13553 (13555)
11 | 5-(4-((((9H-fluoren-9-yl)methoxy)carbonylamino)methyl)-3,5-dimethoxyphenoxy)pentanoic acid | 505.56 | CGTGCA | 53 | 13618 (13617)
12 | (R)-2-(((9H-fluoren-9-yl)methoxy)carbonylamino)-3-(4-chlorophenyl)propanoic acid | 421.88 | GGGTAA | 80 | 13598 (13598)
13 | (R)-3-(((9H-fluoren-9-yl)methoxy)carbonylamino)hex-5-ynoic acid | 349.38 | CCCTCC | 98 | 13358 (13357)
14 | (S)-3-(((9H-fluoren-9-yl)methoxy)carbonylamino)-4,4-diphenylbutanoic acid | 477.55 | TCTCCA | 70 | 13521 (13524)
15 | (S)-3-(((9H-fluoren-9-yl)methoxy)carbonylamino)-2-(phenylsulfonamido)propanoic acid | 466.51 | CAAGCT | 80 | 13564 (13562)
16 | (S)-3-(((9H-fluoren-9-yl)methoxy)carbonylamino)-4-(thiophen-3-yl)butanoic acid | 407.48 | GCACTG | 43 | 13520 (13519)
17 | (S)-3-(((9H-fluoren-9-yl)methoxy)carbonylamino)-4-(4-iodophenyl)butanoic acid | 527.35 | ACGAAT | 64 | 13648 (13647)
18 | (R)-3-(((9H-fluoren-9-yl)methoxy)carbonylamino)-4-(naphthalen-2-yl)butanoic acid | 451.51 | TATCAG | 80 | 13562 (13562)
19 | (R)-3-(((9H-fluoren-9-yl)methoxy)carbonylamino)-4-(naphthalen-1-yl)butanoic acid | 451.51 | TGAAAT | 62 | 13587 (13586)
20 | (S)-2-(((9H-fluoren-9-yl)methoxy)carbonylamino)-3-(4-hydroxyphenyl)propanoic acid | 403.43 | GTTAGT | 56 | 13543 (13545)

Table 9-3: List of the 20 Fmoc-amino acids and of the oligonucleotide codes used as initial building blocks for constructing DEL4000. *) Determined by HPLC after Fmoc deprotection of the oligonucleotide-conjugated compound. †) The calculated MS expected for the Fmoc-deprotected oligonucleotide-conjugated product is reported in brackets.
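Because each six-base coding sequence in Table 9-3 identifies exactly one of the 20 initial building blocks, a code read from a sequencing result can be mapped back to its building block with a simple lookup. The following is a minimal sketch of that idea, not the decoding software used in this work: the names `FIRST_BLOCK_CODES` and `decode_first_block` are hypothetical, and the code is assumed to have already been extracted from the read.

```python
# Six-base coding sequences of the 20 Fmoc-amino acids, copied from Table 9-3.
# Keys are the codes, values are the DEL_Cn building-block numbers.
FIRST_BLOCK_CODES = {
    "ATCTTA": 1, "GCTGCG": 2, "AGAACG": 3, "GACATC": 4, "ATTACT": 5,
    "ACGGCA": 6, "AGAGAA": 7, "TCCAAA": 8, "TCGATC": 9, "TCCGGC": 10,
    "CGTGCA": 11, "GGGTAA": 12, "CCCTCC": 13, "TCTCCA": 14, "CAAGCT": 15,
    "GCACTG": 16, "ACGAAT": 17, "TATCAG": 18, "TGAAAT": 19, "GTTAGT": 20,
}


def decode_first_block(code):
    """Return the building-block number for a six-base code, or None if unknown."""
    return FIRST_BLOCK_CODES.get(code.upper())


if __name__ == "__main__":
    # Hypothetical usage: look up the code of building block 1.
    print(decode_first_block("ATCTTA"))
```

A dictionary is used here because all 20 codes are distinct, so the lookup is unambiguous; a code that is not in the table returns `None` rather than raising an error.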

List of the 200 carboxylic acids and of the oligonucleotide codes used as second building block for the construction of DEL4000:

[Structure drawings are not reproduced. A dash marks an entry that is not given.]

Num. | Formula | MW | Supplier | Coding sequence
1 | C13H8N2O4S2 | 320.34771 | SALOR | TTTTTTTT
2 | C12H9NO5S | 279.27323 | SALOR | GGGGTTTT
3 | C7H7NO3S | 185.20274 | SALOR | CCCCTTTT
4 | C10H11NO4 | 209.20347 | SALOR | AAAATTTT
5 | C6H7N3O4 | 185.14039 | SALOR | ACGTGTTT
6 | C6H6ClN3O4 | 219.58542 | SALOR | CATGGTTT
7 | C8H15NO8 | 253.21065 | SALOR | GTACGTTT
8 | C6H6O2S | 142.17752 | ALDRICH | TGCAGTTT
9 | C7H8O2S | 156.20461 | ALDRICH | GACTCTTT
10 | C8H10O2S | 170.2317 | ALDRICH | TCAGCTTT
11 | C16H29NO5 | 315.41323 | SIGMA | AGTCCTTT
12 | C13H25NO5 | 275.3479 | FLUKA | CTGACTTT
13 | C13H25NO5 | 275.3479 | FLUKA | CGATATTT
14 | C14H25NO6 | 303.35845 | FLUKA | ATCGATTT
15 | C14H19NO4 | 265.31183 | FLUKA | TAGCATTT
16 | C15H21NO4 | 279.33892 | FLUKA | GCTAATTT
17 | C16H23NO4 | 293.36601 | FLUKA | CAGTTGTT
18 | C21H25NO4 | 355.4377 | FLUKA | ACTGTGTT
19 | C15H20ClNO4 | 313.78395 | FLUKA | TGACTGTT
20 | C15H20ClNO4 | 313.78395 | FLUKA | GTCATGTT
21 | C19H23NO4 | 329.39946 | FLUKA | GGTTGGTT
22 | C16H20N2O4 | 304.3488 | FLUKA | TTGGGGTT
23 | C16H20N2O4 | 304.3488 | FLUKA | AACCGGTT
24 | C16H20F3NO4 | 347.3373 | FLUKA | CCAAGGTT
25 | C16H20F3NO4 | 347.3373 | FLUKA | ATATCGTT
26 | C16H20F3NO4 | 347.3373 | FLUKA | CGCGCGTT
27 | C16H20F3NO4 | 347.3373 | FLUKA | GCGCCGTT
28 | C14H20N2O4 | 280.3265 | FLUKA | TATACGTT
29 | C16H20N2O4 | 304.3488 | FLUKA | TCCTAGTT
30 | C21H25NO4 | 355.4377 | FLUKA | GAAGAGTT
31 | C7H16ClNO2 | 146.21107 | ALDRICH | CTTCAGTT
32 | C7H16ClNO3 | 162.21047 | FLUKA | AGGAAGTT
33 | C13H28ClNO4 | 260.354 | SIGMA | AGCTTCTT
34 | C17H36ClNO4 | 316.462 | SIGMA | CTAGTCTT
35 | C6H11NO4 | 161.15887 | ALDRICH | GATCTCTT
36 | C8H7NO5 | 197.14869 | ALDRICH | TCGATCTT
37 | C10H11NO4 | 209.20347 | FLUKA | TAATGCTT
38 | C9H8N2O5 | 224.17451 | ALDRICH | GCCGGCTT
39 | C10H10N2O5 | 238.2016 | ALDRICH | CGGCGCTT
40 | C13H17NO7 | 299.28294 | FLUKA | ATTAGCTT
41 | C4H7NaO3 | 104.10739 | FLUKA | CCTTCCTT
42 | C4H7NaO3 | 104.10739 | FLUKA | AAGGCCTT
43 | C5H10O3 | 118.13365 | FLUKA | TTCCCCTT
44 | C7H13NO4 | 175.18596 | SALOR | GGAACCTT
45 | C9H10O4 | 182.17765 | FLUKA | GTGTACTT
46 | C10H12O5 | 212.20414 | FLUKA | TGTGACTT
47 | C12H16O5 | 240.25832 | FLUKA | ACACACTT
48 | C12H20O3 | 212.2914 | ALDRICH | CACAACTT
49 | C8H14O4 | 174.19838 | ALDRICH | GCATTATT
50 | C18H22O6 | 334.37244 | SALOR | TACGTATT
51 | C11H14N4O4 | 266.25863 | SIGMA | ATGCTATT
52 | C14H11NO2 | 225.24927 | FLUKA | CGTATATT
53 | C15H13NO2 | 239.27636 | FLUKA | CTCTGATT
54 | C11H14N4O2S | 266.32383 | SALOR | AGAGGATT
55 | C18H17NO2 | 279.34169 | ALDRICH | TCTCGATT
56 | C3H7N3O2 | 117.10814 | ALDRICH | GAGAGATT
57 | C5H11N3O2 | 145.16232 | FLUKA | TGGTCATT
58 | C4H10ClNO2 | 139.5828 | ALDRICH | GTTGCATT
59 | C5H10O2 | 102.13425 | ALDRICH | CAACCATT
60 | C10H18O2 | 170.25376 | ALDRICH | ACCACATT
61 | C16H22O3 | 262.35194 | SALOR | AATTAATT
62 | C5H8O3 | 116.11771 | FLUKA | CCGGAATT
63 | C7H10O3 | 142.15595 | ALDRICH | GGCCAATT
64 | C8H15NO3 | 173.21365 | ALDRICH | TTAAAATT
65 | C5H6N2O4 | 158.11457 | ALDRICH | GGTTTTGG
66 | C13H14N2O4 | 262.26753 | ALDRICH | TTGGTTGG
67 | C13H17N5O5 | 323.31094 | SIGMA | AACCTTGG
68 | C6H6N2O2S | 170.19092 | ALDRICH | CCAATTGG
69 | C16H14O2S | 270.35278 | SALOR | CAGTGTGG
70 | C9H10O2S | 182.24285 | ALDRICH | ACTGGTGG
71 | C9H8O2S2 | 212.29091 | ALDRICH | TGACGTGG
72 | C9H10O2S | 182.24285 | ALDRICH | GTCAGTGG
73 | C9H10O2S | 182.24285 | ALDRICH | TCCTCTGG
74 | C9H10O2S | 182.24285 | ALDRICH | GAAGCTGG
75 | C8H7ClO2S | 202.66079 | ALDRICH | CTTCCTGG
76 | C12H10O2S | 218.2763 | FLUKA | AGGACTGG
77 | C3H6O2S | 106.14407 | ALDRICH | ATATATGG
78 | C8H14O3S | 190.26298 | ALDRICH | CGCGATGG
79 | C9H9FO2 | 168.16928 | ALDRICH | GCGCATGG
80 | C10H9FO3 | 196.17983 | SIGMA | TATAATGG
81 | C9H8F2O2 | 186.15971 | ALDRICH | ACGTTGGG
82 | C9H6F4O2 | 222.14057 | ALDRICH | CATGTGGG
83 | C10H9F3O2 | 218.17723 | ALDRICH | GTACTGGG
84 | C11H8F6O2 | 286.17561 | ALDRICH | TGCATGGG
85 | C11H8F6O2 | 286.17561 | ALDRICH | TTTTGGGG
86 | C9H7F3O3 | 220.14954 | ALDRICH | GGGGGGGG
87 | C8H7ClO2 | 170.59679 | ALDRICH | CCCCGGGG
88 | C9H9ClO2 | 184.62388 | ALDRICH | AAAAGGGG
89 | C10H9ClO3 | 212.63443 | ALDRICH | CGATCGGG
90 | C9H9ClO3 | 200.62328 | ALDRICH | ATCGCGGG
91 | C8H6Cl2O3 | 221.04122 | ALDRICH | TAGCCGGG
92 | C13H12Cl2O4 | 303.14419 | SIGMA | GCTACGGG
93 | C19H16ClNO4 | 357.79667 | SIGMA | GACTAGGG
94 | C12H9ClO5 | 268.65553 | ALDRICH | TCAGAGGG
95 | C8H7BrO2 | 215.04779 | FLUKA | AGTCAGGG
96 | C8H7BrO2 | 215.04779 | FLUKA | CTGAAGGG
97 | C8H7BrO2 | 215.04779 | FLUKA | CTCTTCGG
98 | C9H9BrO2 | 229.07488 | ALDRICH | AGAGTCGG
99 | C8H7BrO3 | 231.04719 | ALDRICH | TCTCTCGG
100 | C10H9BrO3 | 257.08543 | ALDRICH | GAGATCGG
101 | C8H7IO2 | 262.04819 | ALDRICH | GCATGCGG
102 | C10H11IO2 | 290.10237 | SIGMA | TACGGCGG
103 | C8H7IO3 | 278.04759 | ALDRICH | ATGCGCGG
104 | C9H8INO3 | 305.07341 | FLUKA | CGTAGCGG
105 | C6H7NO3 | 141.12759 | ALDRICH | AATTCCGG
106 | C3H4N4O2 | 128.09093 | ALDRICH | CCGGCCGG
107 | C10H12O2 | 164.20594 | ALDRICH | GGCCCCGG
108 | C11H14O2 | 178.23303 | ALDRICH | TTAACCGG
109 | C10H12O2 | 164.20594 | FLUKA | TGGTACGG
110 | C10H12O2 | 164.20594 | FLUKA | GTTGACGG
111 | C15H12O2 | 224.26169 | ALDRICH | CAACACGG
112 | C7H8ClNO2 | 173.60031 | FLUKA | ACCAACGG
113 | C10H10O3 | 178.1894 | ALDRICH | TAATTAGG
114 | C11H12O3 | 192.21649 | ALDRICH | GCCGTAGG
115 | C11H12O3 | 192.21649 | ALDRICH | CGGCTAGG
116 | C8H8O3 | 152.15116 | ALDRICH | ATTATAGG
117 | C9H10O3 | 166.17825 | ALDRICH | AGCTGAGG
118 | C18H28O3 | 292.42206 | ALDRICH | GATCGAGG
119 | C9H10O3 | 166.17825 | ALDRICH | TCGAGAGG
120 | C9H10O3 | 166.17825 | FLUKA | GTGTCAGG
121 | C9H10O3 | 166.17825 | FLUKA | TGTGCAGG
122 | C9H10O3 | 166.17825 | FLUKA | ACACCAGG
123 | C10H12O4 | 196.20474 | ALDRICH | CACACAGG
124 | C11H14O5 | 226.23123 | FLUKA | AAGGAAGG
125 | C11H14O5 | 226.23123 | FLUKA | TTCCAAGG
126 | C10H12O3 | 180.20534 | ALDRICH | GGAAAAGG
127 | C10H12O3 | 180.20534 | FLUKA | CCTTTTCC
128 | C10H12O3 | 180.20534 | FLUKA | AAGGTTCC
129 | C11H14O4 | 210.23183 | ALDRICH | TTCCTTCC
130 | C11H14O4 | 210.23183 | ALDRICH | GTGTGTCC
131 | C12H16O5 | 240.25832 | FLUKA | TGTGGTCC
132 | C16H16O4 | 272.30352 | SALOR | ACACGTCC
133 | C12H12O4 | 220.22704 | ALDRICH | CACAGTCC
134 | C12H10O5 | 234.2105 | ALDRICH | AGCTCTCC
135 | C16H14O3 | 254.28818 | ALDRICH | CTAGCTCC
136 | C9H9NO3 | 179.17698 | ALDRICH | GATCCTCC
137 | C10H11NO3 | 193.20407 | ALDRICH | TCGACTCC
138 | C10H11NO3 | 193.20407 | ALDRICH | TAATATCC
139 | C10H11NO3 | 193.20407 | ALDRICH | GCCGATCC
140 | C10H7NO4 | 205.17159 | FLUKA | CGGCATCC
141 | C9H8NNaO4 | 217.15821 | ALDRICH | ATTAATCC
142 | C10H11NO3 | 193.20407 | SIGMA | TGGTTGCC
143 | C11H13NO3 | 207.23116 | FLUKA | GTTGTGCC
144 | C12H15NO3 | 221.25825 | ALDRICH | CAACTGCC
145 | C12H15NO3 | 221.25825 | ALDRICH | ACCATGCC
146 | C16H17NO3 | 271.31879 | ALDRICH | AATTGGCC
147 | C16H17NO3 | 271.31879 | ALDRICH | CCGGGGCC
148 | C11H11NO4 | 221.21462 | ALDRICH | GGCCGGCC
149 | C20H21N3O6 | 399.40687 | SIGMA | GCATCGCC
150 | C5H7ClN2O2 | 162.57674 | FLUKA | TACGCGCC
151 | C7H8N2O4 | 184.15281 | ALDRICH | ATGCCGCC
152 | C7H7NO4 | 169.13814 | ALDRICH | CGTACGCC
153 | C9H9NO4 | 195.17638 | ALDRICH | CTCTAGCC
154 | C3H5N4NaO2S | 184.1527 | ALDRICH | AGAGAGCC
155 | C9H10O2 | 150.17885 | ALDRICH | TCTCAGCC
156 | C9H10O2 | 150.17885 | ALDRICH | GAGAAGCC
157 | C11H14O2 | 178.23303 | SALOR | GACTTCCC
158 | C9H10O2 | 150.17885 | ALDRICH | TCAGTCCC
159 | C9H10O2 | 150.17885 | ALDRICH | AGTCTCCC
160 | C10H12O2 | 164.20594 | ALDRICH | CTGATCCC
161 | C10H12O2 | 164.20594 | ALDRICH | CGATGCCC
162 | C10H10O2 | 162.19 | ALDRICH | ATCGGCCC
163 | C12H10O2 | 186.2123 | FLUKA | TAGCGCCC
164 | C12H10O2 | 186.2123 | ALDRICH | GCTAGCCC
165 | C14H12O2 | 212.25054 | ALDRICH | TTTTCCCC
166 | C15H14O2 | 226.27763 | FLUKA | GGGGCCCC
167 | - | 253.256 | - | CCCCCCCC
168 | - | 154.139 | - | AAAACCCC
169 | - | 202.208 | - | ACGTACCC
170 | - | 170.138 | - | CATGACCC
171 | - | 204.244 | - | GTACACCC
172 | - | 202.208 | - | TGCAACCC
173 | C10H8N2O3 | 204.18686 | SALOR | ATATTACC
174 | C11H10N2O4 | 234.21335 | SALOR | CGCGTACC
175 | C10H12N4O4 | 252.23154 | SALOR | GCGCTACC
176 | C12H16O2 | 192.26012 | SALOR | TATATACC
177 | C8H8O3S | 184.214 | - | TCCTGACC
178 | C16H15ClN4O4S | 394.83935 | SALOR | GAAGGACC
179 | C16H12N2O3 | 280.28564 | SALOR | CTTCGACC
180 | C6H7N3O4S | 217.20439 | SALOR | AGGAGACC
181 | C11H10N2O3 | 218.21395 | SALOR | CAGTCACC
182 | C15H14O5 | 274.27583 | SALOR | ACTGCACC
183 | C11H14O2 | 178.23303 | SALOR | TGACCACC
184 | C12H14N2O2 | 218.25758 | SALOR | GTCACACC
185 | C10H8N2O3 | 204.18686 | SALOR | GGTTAACC
186 | C9H8ClNO3 | 213.62201 | SALOR | TTGGAACC
187 | C10H8F3NO3 | 247.17536 | SALOR | AACCAACC
188 | C18H13ClO6 | 360.75371 | SALOR | CCAAAACC
189 | C11H14N4O2S | 266.32383 | SALOR | AATTTTAA
190 | C10H3Cl4NO4 | 342.95171 | SALOR | CCGGTTAA
191 | C10H9NO3S | 223.25213 | SALOR | TTAATTAA
192 | C9H8O2 | 148.16 | ALDRICH | TGGTGTAA
193 | C10H8N2O3 | 204.184 | - | GTTGGTAA
194 | C10H12O2 | 164.21 | FLUKA | CAACGTAA
195 | C11H14O2 | 178.23 | Alfa Aesar | ACCAGTAA
196 | C9H9IO2 | 276.08 | Trans World Chemicals | CTCTCTAA
197 | C8H13NO3 | 171.195 | - | AGAGCTAA
198 | C9H11NO4S | 229.255 | - | TCTCCTAA
199 | C10H13O2N | 179.2 | Aldrich | GAGACTAA
200 | C7H13NO2S2 | 207.31516 | SALOR | GCATATAA
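One property of the eight-base codes that can be checked directly from the table is how many positions separate any two of them, since codes that differ at several positions are harder to confuse after a single sequencing error. The following is a minimal sketch of such a check on the first eight codes only (entries 1 to 8); the function names are hypothetical, and nothing here is taken from the software used in this work.

```python
from itertools import combinations

# First eight 8-base codes from the list of carboxylic acids (entries 1 to 8).
SAMPLE_CODES = [
    "TTTTTTTT", "GGGGTTTT", "CCCCTTTT", "AAAATTTT",
    "ACGTGTTT", "CATGGTTT", "GTACGTTT", "TGCAGTTT",
]


def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(codes):
    """Smallest Hamming distance between any two codes in the list."""
    return min(hamming(a, b) for a, b in combinations(codes, 2))


if __name__ == "__main__":
    print(min_pairwise_distance(SAMPLE_CODES))
```

For this eight-code sample, every pair differs at four positions. The same function can be applied to the full list of 200 codes to check the whole set.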
