Hybrid-Phase Native Chemical Ligation Approaches to Overcome the Limitations of Total Synthesis

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Ruixuan Ryan Yu

Graduate Program in the Ohio State Biochemistry Program

The Ohio State University

2016

Committee:

Jennifer J. Ottesen – Advisor

Michael G. Poirier

Michael A. Freitas

Dennis Bong

Copyrighted by

Ruixuan Ryan Yu

2016

Abstract

Total protein synthesis allows the preparation of proteins with chemically diverse modifications. The numerous advantages of total synthesis are sometimes offset by some major limitations. Protein synthesis is a non-trivial task involving many chemical steps, and the number of steps increases with the size of the protein. Therefore, larger proteins are difficult to synthesize in high yield. We have developed a strategy, which we term hybrid-phase native chemical ligation (NCL), to overcome some of the limitations of size and yield.

Hybrid-phase NCL combines ligating peptides on a solid support (solid-phase NCL) and in solution (solution-phase NCL) to maximize synthetic yield. We have successfully used this method to synthesize triple-acetylated histone H4-K5ac,K12ac,K91ac and, for the first time, acetylated centromeric histone CENP-A-K124ac (CpA-K124ac).

In order to improve the yield of CENP-A total synthesis, we have incorporated a convergent ligation element in our hybrid-phase strategy. This new approach reduced the number of purification steps, leading to a synthetic yield that was almost triple that of the original approach.

Finally, we introduce the convergent solid-phase hybrid NCL approach that allows the preparation of a long peptide segment bearing a masked thioester on a solid support. Through a newly developed resin-anchoring strategy, cleavage of the product from the solid phase generated a ligation-compatible segment that could be used directly with no purification. This method has the potential to synthesize large proteins in good yield, effectively overcoming the size and yield limits of protein total synthesis.

Dedication

Alice and Owen.

Acknowledgments

I would like to thank all members of the Ottesen lab for their constant support, assistance, and guidance throughout my graduate career. I would like to thank Dr. Santosh Mahto for mentoring me during the first few years, and pushing me to continue his H4 total synthesis project. Thank you to C.J. Howard for advising me with his broad knowledge of biology.

Thank you to Michael Cotten and Mallory Alexander for helping move my project forward.

Thank you to Kurt Justus for assisting me with all the problems plaguing the histone peptides. Thank you to Dr. John Shimko for giving me his expert opinions on biochemistry, organic chemistry, and nerd culture. Thank you to Ziyong Hong for assisting me with all of the time-consuming experiments. My project would not have moved forward so efficiently without his assistance.

Most of all, I thank my advisor Dr. Jennifer Ottesen for giving me unconditional support both inside and outside the lab. Her confidence in me did not change even during the year when I was starting to drift off from my research. She encouraged me to keep going even during the times when I was ready to give up. Her support even extended to my family, especially after my son was born. In addition, she used all of her available resources to support my decision to attend medical school after graduation. She is a mentor who truly cares about the success of her students.

Vita

2002-2006 ...... Adrian Wilcox High School

2010 ...... B.A. Molecular and Cell Biology, University of California, Berkeley

2010 ...... B.A. Practice of Art, University of California, Berkeley

2010-2016 ...... Graduate Teaching and Research Associate, The Ohio State University

2013 ...... M.S. Biochemistry, The Ohio State University

Publications

Ruixuan R. Yu, Santosh K. Mahto, Kurt Justus, Mallory Alexander, Cecil J. Howard, Jennifer J. Ottesen, “Hybrid phase ligation for efficient synthesis of histone proteins”. Organic and Biomolecular Chemistry, 2016, 14:2603-2607

Cecil J. Howard, Ruixuan R. Yu, Miranda L. Gardner, John C. Shimko, Jennifer J. Ottesen, “Chemical and Biological Tools for the preparation of modified histone proteins”. Topics in Current Chemistry, 2015, 363:193-226

Fields of Study

Major Field: The Ohio State Biochemistry Program

Table of Contents

Abstract ...... ii

Dedication ...... iv

Acknowledgments ...... v

Vita ...... vi

Publications ...... vi

Fields of Study ...... vii

Table of Contents ...... viii

List of Tables ...... xvii

List of Figures ...... xviii

List of Acronyms ...... xxii

Chapter 1: Introduction ...... 1

Protein Total Synthesis ...... 1

Native Chemical Ligation ...... 2

Applications of total synthesis ...... 7

Histones...... 10

Histone Post-Translational Modification ...... 13

Methods to Prepare Histone PTMs ...... 13

Genetic mimics ...... 14

Expanded genetic code ...... 15

Dehydroalanine ...... 16

Chemical installation through ...... 17

Disulfide stapling ...... 18

Chemical Synthesis ...... 20

Goals ...... 20

Outline ...... 22

Chapter 2: Solid-Phase Ligation vs. Hybrid-Phase Ligation of Histones ...... 23

Introduction ...... 23

Solution-Phase NCL ...... 23

Solid-Phase NCL ...... 25

Experimental Methods ...... 28

Materials ...... 28

RP-HPLC ...... 29

Mass spectrometry ...... 29

Solid-Phase ...... 30

Synthesis of 3-Fmoc-Dbz-OH ...... 30

Automated Solid-Phase Peptide Synthesis ...... 30

Manual peptide synthesis ...... 32

Manual synthesis of Dbz(Alloc) resin ...... 34

Loading the first on Dbz(Alloc) ...... 35

Symmetric anhydride coupling on HMBA ...... 35

Alloc Deprotection ...... 36

Nbz Conversion in DCM ...... 36

Nbz conversion in DMF and NMP ...... 37

Peptide cleavage ...... 37

Synthesis and purification of H4 peptides ...... 38

H4-A (acSer1-Leu37)-Nbz ...... 38

H4-A-K5ac,K12ac (Ser1-Leu37)-Nbz(formyl)-Arg ...... 39

H4-B (Thz38-Gly56)-Nbz ...... 39

H4-H (Pen57-H75)-Nbz(formyl)-Arg ...... 40

H4-C-K91ac (Thz76-Gly102-HMBA-Arg-Gly)-Nbz ...... 41

H4-(76-102)-K79ac (Thz76-Gly102) for semi-synthetic H4-K79ac ...... 41

Synthesis and purification of CENP-A peptides ...... 42

CpA-1 (Gly2-Gly34)-Nbz ...... 42

CpA-2 (Thz35-Leu70)-Nbz (formyl) ...... 42

CpA-2 (Thz35-Leu70)-O-Cys(StBu) ...... 43

CpA-3 (Thz71-Ala97)-Nbz-Arg-Arg ...... 45

CpA-4 (Thz98-H115)-Nbz-R ...... 45

CpA-5-K124ac (Thz116-Gly14-HMBA-Arg-Gly)-Nbz ...... 46

SP-NCL ...... 46

Buffers ...... 46

Base resin synthesis for SP-NCL ...... 47

Quantification of product from dry PEGA resin ...... 48

Fmoc deprotection ...... 49

Thz deprotection ...... 49

Ligation ...... 50

Micro-cleavages to assess reaction progress ...... 50

On-resin desulfurization ...... 51

Cleavage from the resin at the HMBA linker ...... 51

Cleavage from the resin at the Rink linker ...... 52

SDS-PAGE ...... 52

Solution-phase NCL of Hybrid-phase ligation ...... 53

Preparation of SDS-PAGE samples by TCA precipitation ...... 53

Ziptip and MALDI-TOF MS ...... 53

Second Solution-phase NCL ...... 54

Desulfurization ...... 55

Purification of Synthetic histones ...... 55

Refolding histone tetramer ...... 56

Refolding histone octamer ...... 57

Nucleosome reconstitution ...... 57

Native PAGE ...... 58

His6-tagged CENP-A expression ...... 58

Expressed Protein Ligation of H4-K79ac ...... 59

H4(1-75)-intein-CBD Expression ...... 59

H4(1-75)-intein-CBD Purification ...... 60

Thiolysis to produce H4(1-75)-SR ...... 61

Ligation ...... 61

Desulfurization ...... 61

Purification ...... 62

Quantification of Histone using UV-Vis spectroscopy ...... 62

Results and Discussion ...... 63

SP-NCL of H4 ...... 63

Dual linker design for SP-NCL ...... 64

Synthetic peptide segments for H4 ...... 66

Synthesis of a peptide segment with an α-thioester ...... 68

Nbz Conversion of H4-A-Dbz ...... 70

Racemization of ...... 71

Synthesis of H4 peptides ...... 76

Sequential SP-NCL of ac-H4 ...... 78

SP-NCL of modified H4 ...... 80

CENP-A ...... 82

Semi-synthesis of CpA-K124ac ...... 86

SP-NCL of CpA-K124ac ...... 89

Nbz conversion of CpA-2-Dbz ...... 91

CpA-2-thioester through O to S acyl shift ...... 91

Nbz conversion using different solvents ...... 92

Synthesis of CENP-A peptides ...... 95

Sequential SP-NCL of CpA-K124ac ...... 97

Hybrid Phase Ligation of H4-K5ac, K12ac, K91ac ...... 101

SP-NCL of H4-BHC-K91ac ...... 102

Solution-phase NCL of H4-ABHC-K5ac,K12ac,K91ac ...... 105

Hybrid Phase Ligation of CpA-K124ac ...... 108

SP-NCL of CpA-345-K124ac ...... 110

Sequential solution-phase ligation of CpA-12345-K124ac ...... 111

Nucleosome reconstitution using synthetic and semi-synthetic histones ...... 115

Semi-synthesis of H4-K79ac ...... 115

Recombinant expression of His6-tagged CENP-A ...... 117

Refolding and reconstitution of recombinant and semi-synthetic histones ...... 118

Refolding and reconstitution of synthetic histones ...... 119

Conclusions ...... 121

Acknowledgements ...... 122

Chapter 3: Convergent Hybrid-Phase Native Chemical Ligation ...... 123

Introduction ...... 123

Experimental Methods ...... 125

Hydrazinolysis of peptide Nbz and peptide HMBA ...... 125

Preparation of Hydrazide resin using Wang ...... 125

Solution-phase NCL of CpA-12-Dbz ...... 126

Ligation with CpA-1-Nbz ...... 126

Ligation with CpA-1-Dbz ...... 127

Solution-phase NCL of CENP-A using CpA-12-Dbz and CpA-345 ...... 127

Ligation of CpA-12-Dbz and CpA-345 ...... 127

Desulfurization of CpA-12345 ...... 128

Glycolic acid base resin ...... 129

Synthesis of glycolic acid base resin ...... 129

Resin thioesterification ...... 129

Ligation of CpA-2-Dbz-Gly-Lys(Cys) ...... 130

Ligation of CpA-1-Dbz ...... 130

Cleavage of CpA-12-Dbz0 ...... 130

SDS-PAGE of TFA-treated resin ...... 131

Base resin sequences ...... 131

Quantification of product from dry PEGA resin ...... 131

Results and Discussion ...... 133

Hydrazide as a cryptic thioester for convergent ligation ...... 133

Synthesis of CpA-2 on hydrazide resin ...... 134

CpA-2-N2H3 by hydrazinolysis of Nbz ...... 135

CpA-12-N2H3 by hydrazinolysis of HMBA ...... 138

Using Dbz as a cryptic thioester for convergent ligation ...... 142

Convergent Hybrid-Phase NCL of CENP-A ...... 143

Solution-phase ligation of CpA-12-Dbz ...... 144

Solution-Phase NCL of CpA-12345-K124ac ...... 146

Convergent SP-Hybrid NCL of CpA ...... 149

Ligation handle for convergent SP-NCL ...... 149

SP-NCL of CpA-12-Dbz0 ...... 152

Refolding synthetic CENP-A without purification ...... 154

Conclusions ...... 156

Acknowledgements ...... 156

Chapter 4: Conclusions ...... 157

Future Work and Application ...... 159

References ...... 164

Appendix A: Standard Laboratory Solutions ...... 187

15% acrylamide gel ...... 187

Stacking gel (5% acrylamide) ...... 187

6 x SDS loading buffer ...... 187

SDS-PAGE running buffer ...... 188

Coomassie Stain ...... 188

Destain ...... 188

LB growth media ...... 188

SOC growth media ...... 188

List of Tables

Table 1: H4 peptide segments160 ...... 67

Table 2: Histidine coupling conditions on Dbz ...... 74

Table 3: CENP-A peptides160 ...... 90

Table 4: SP-NCL conditions for CENP-A ...... 97

Table 5: SP-NCL of H4-BHC-K91ac ...... 103

Table 6: H1.2 Peptides ...... 160

List of Figures

Figure 1: Mechanism of Native Chemical Ligation ...... 5

Figure 2: Expressed Protein Ligation ...... 6

Figure 3: Nucleosome Structure ...... 11

Figure 4: Post-translational modifications and their genetic mimics ...... 15

Figure 5: Chemistry of Dehydroalanine7 ...... 17

Figure 6: Chemical Modifications of Cysteine7 ...... 18

Figure 7: Introduction of Ubiquitylation through stapling ...... 19

Figure 8: Solution-Phase NCL Approaches ...... 24

Figure 9: C to N SP-NCL Scheme ...... 26

Figure 10: SP-NCL Ligation Scheme for H4160 ...... 63

Figure 11: Dual linker strategy for SP-NCL ...... 65

Figure 12: Preparation of Thioester through the Dbz ...... 69

Figure 13: Nbz Conversion of H4-A ...... 71

Figure 14: Analysis of H4-H peptide ...... 72

Figure 15: Histidine coupling conditions on Dbz ...... 75

Figure 16: Purified H4 peptides ...... 77

Figure 17: SP-NCL of ac-H4160 ...... 79

Figure 18: SP-NCL of Synthetic H4 constructs ...... 81

Figure 19: Comparison of CENP-A and H3-containing nucleosomes ...... 82

Figure 20: Centromeric Nucleosome PTMs ...... 83

Figure 21: Comparison of CENP-A and H3 nucleosomes ...... 85

Figure 22: CpA-K124ac EPL scheme ...... 87

Figure 23: Expression of CpA(1-115)-intein-CBD ...... 88

Figure 24: CENP-A SP-NCL scheme ...... 90

Figure 25: Nbz conversion of CpA-2 ...... 91

Figure 26: O to S acyl shift approach ...... 92

Figure 27: Nbz conversion of CpA-2 and H4-A peptides using the dry DMF approach ...... 93

Figure 28: through Vilsmeier-Haack ...... 94

Figure 29: Nbz conversion of CpA-2 using dry NMP ...... 94

Figure 30: Purified CENP-A peptides ...... 96

Figure 31: SP-NCL of CpA-345-K124ac160 ...... 98

Figure 32: SP-NCL of CpA-1 and CpA-2 ...... 99

Figure 33: Hybrid phase ligation of H4160 ...... 102

Figure 34: SP-NCL of H4-BHC-K91ac160 ...... 104

Figure 35: Solution-phase ligation of H4160 ...... 106

Figure 36: Desulfurization of H4-K5ac,K12ac,K91ac ...... 107

Figure 37: Purified H4-K5ac,K12ac,K91ac160 ...... 107

Figure 38: Hybrid-phase NCL scheme of CpA-K124ac160 ...... 109

Figure 39: SP-NCL of CpA-3450-K124ac160 ...... 110

Figure 40: Solution-phase ligation of CENP-A160 ...... 112

Figure 41: Desulfurization of CpA-K124ac ...... 112

Figure 42: Purified CpA-K124ac160 ...... 113

Figure 43: Nucleosome containing CENP-A ...... 114

Figure 44: H4-K79ac EPL scheme ...... 116

Figure 45: H4-K79ac EPL ...... 117

Figure 46: Refolding and reconstitution of recombinant CENP-A ...... 118

Figure 47: Refolding of synthetic histones160 ...... 120

Figure 48: Convergent ligation of CENP-A ...... 124

Figure 49: Thioester conversion from peptide hydrazide ...... 133

Figure 50: Preparation of hydrazide base resin ...... 134

Figure 51: Synthesis of CpA-2-N2H3 ...... 135

Figure 52: Convergent ligation using hydrazide: hydrazinolysis of Nbz ...... 136

Figure 53: Hydrazinolysis of CpA-2- Nbz(formyl) ...... 137

Figure 54: Hydrazinolysis of CpA-2-Nbz ...... 138

Figure 55: Convergent ligation using hydrazide: hydrazinolysis of HMBA ...... 139

Figure 56: CpA-2-HMBA-Arg-Gly-Dbz ...... 140

Figure 57: Ligation and hydrazinolysis of CpA-12 ...... 141

Figure 58: Preparation of thioester using Dbz ...... 142

Figure 59: Convergent hybrid-phase NCL of CENP-A ...... 144

Figure 60: Solution-phase ligation of CpA-12-Dbz ...... 145

Figure 61: Solution-phase ligation of CpA-12 and CpA-345 ...... 146

Figure 62: Desulfurization of CENP-A ...... 147

Figure 63: Purified CpA-K124ac ...... 148

Figure 64: CENP-A Convergent SP-hybrid NCL scheme ...... 150

Figure 65: CpA-2-Dbz-GK(C) ...... 151

Figure 66: SP-NCL of CpA-12-Dbz0 ...... 153

Figure 67: Refolding synthetic CENP-A with no purification ...... 155

Figure 68: Convergent SP-Hybrid NCL scheme of H1 ...... 161

Figure 69: Comparison of CENP-A, H3, and H4 structures ...... 162

List of Acronyms

Amino acids are referred to by the appropriate one or three letter codes.

6-Cl-HOBt 6-chloro-1-hydroxybenzotriazole

AA Amino acid

Ac Acetylated

ACN Acetonitrile

Anx ε-aminohexanoic acid

Alloc Allyloxycarbonyl

Boc t-butoxycarbonyl

CBD Chitin binding domain

CpA CENP-A

DBU 1,8-Diazabicycloundec-7-ene

Dbz 3,4-diaminobenzoic acid

DCM Dichloromethane

DIEA N,N-diisopropylethylamine

DIC N,N'-Diisopropylcarbodiimide

DMAP 4-Dimethylaminopyridine

DMF N,N-Dimethylformamide

DmThz 2,2-dimethylthiazolidine

DTT Dithiothreitol

EDT Ethanedithiol

EDTA 2,2',2'',2'''-(ethane-1,2-diyldinitrilo)tetraacetic acid

EPL Expressed protein ligation

Fmoc 9-fluorenylmethoxycarbonyl

FPLC Fast protein liquid chromatography

GuHCl Guanidinium hydrochloride

HATU 2-(7-aza-1-H-benzotriazol-1-yl)-1,1,3,3-tetramethylaminium hexafluorophosphate

HBTU 2-(1-benzotriazol-1-yl)-1,1,3,3-tetramethyluronium hexafluorophosphate

HCTU 2-(6-chloro-1-H-benzotriazol-1-yl)-1,1,3,3-tetramethyluronium hexafluorophosphate

HCCA α-Cyano-4-hydroxycinnamic acid

HMBA 4-(Hydroxymethyl)benzoic acid

HEPES 2-[4-(2-hydroxyethyl)piperazin-1-yl]ethanesulfonic acid

HO Histone octamer

MALDI-TOF Matrix assisted laser desorption ionization-time of flight

MBHA 4-methylbenzhydrylamine hydrochloride salt resin

Me Methylated

MESNa Mercaptoethylsulfonate sodium salt

MPAA 4-mercaptophenylacetic acid

MS Mass spectrometry

Nbz N-acyl-benzimidazolinone

NCL Native chemical ligation

Ni-NTA Nickel nitrilotriacetic acid resin

Nle Norleucine

NMP N-Methylpyrrolidone

NMR Nuclear magnetic resonance

NPCF 4-nitrophenyl chloroformate

Pen Penicillamine

PMSF phenylmethylsulfonyl fluoride

PTM Post-translational modification

RP-HPLC Reversed phase high performance liquid chromatography

SDS-PAGE Sodium dodecylsulfate-polyacrylamide gel electrophoresis

SPPS Solid phase peptide synthesis

TCA Trichloroacetic acid

TCEP Tris(2-carboxyethyl)phosphine

TEMED 1,2-bis(dimethylamino)ethane

TFA Trifluoroacetic acid

Thz Thiazolidine

TIS Triisopropyl silane

Tris 2-amino-2-hydroxymethyl-propane-1,3-diol

UV Ultraviolet

VA-044-US 2,2’-azobis[2-(2-imidazolin-2-yl)propane]dihydrochloride

Chapter 1: Introduction

Protein Total Synthesis

Chemical synthesis of proteins has always been of great interest to chemists.1,2 Proteins are some of the most complex molecules in an organism,3 and protein total synthesis allows the artificial preparation of these important macromolecules.4 At first glance, it may be hard to imagine why one would go through the painstaking work of artificially synthesizing a protein. Typically, recombinant techniques are used to prepare proteins since the labor of protein synthesis is often left to cellular machinery, which can prepare these complex molecules efficiently from simple building blocks.5,6 However, making alterations to the protein is relatively restricted using standard molecular biology techniques.3 Using mutagenesis, the changes we make are limited to the amino acid sidechain, which is in turn limited to the 20 natural amino acids encoded by the DNA sequence.7 It is possible to expand the selection through tRNA manipulation and chemical modification of the protein, but the scope is still severely restricted.7,8 These limitations are absent in total synthesis.9,10

Chemical synthesis gives us control over essentially every atom in the protein, allowing us to introduce any functional moieties possible within the bounds of chemistry.4,11 Both side chains and backbones of proteins can be modified.12,13 Chemical synthesis enables many unique modifications, including the insertion of site-specific isotope labels,13,14 introduction of unnatural amino acids and post-translational modifications,15,16 replacement of the peptide backbone with a non-native backbone,17 and synthesis of D-form proteins.18,19 All of these synthetic proteins have been used in studies to better understand protein function.

Native Chemical Ligation

Protein total synthesis from start to finish is a daunting task. Since every addition of an amino acid involves condensation and deprotection steps, synthesizing a 100-residue protein requires several hundred chemical steps.4 Synthesis begins with the chemical preparation of peptides. During the earlier years of peptide synthesis, product yield was extremely low because all reactions were carried out in solution.20,21 After each step, the product needed to be purified from other reagents before performing the next reaction. Each purification led to significant yield loss. In addition, the side chain protecting groups caused longer peptides to have poor solubility in organic solvents.22 The development of solid phase peptide synthesis (SPPS) was a major breakthrough in peptide chemistry.23,24 The amino acids are coupled onto a solid support,25 and removing excess reagents before subsequent reactions only requires a simple flow-wash.23 The advantages of the solid-phase over the solution-phase strategy are numerous. Simple wash steps replace the laborious intermediate purification steps that led to significant yield loss.22 Solvent can be easily changed in between chemical steps. Reactions can be carried out with excess amino acids to ensure rapid and efficient coupling. All this greatly improved the efficiency of peptide synthesis.23 Despite the advantages, the upper limit of SPPS is around 50 residues.2,26 Side products, which are often caused by the formation of secondary structures, begin to accumulate for longer peptides.22 In addition, although each coupling step typically has an efficiency of >99%,27 the overall efficiency can become very low when this is multiplied over many steps. Therefore, as powerful as SPPS is, this method by itself cannot be used to synthesize moderate-sized proteins. A ligation technique that can stitch smaller peptides into a single protein is necessary.3
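The compounding of per-step efficiency described above can be illustrated with a short calculation. This is only a sketch under a simplifying assumption that is not stated in the text: every step is taken to succeed independently and with the same efficiency. The function name and the step counts are chosen here purely for illustration.

```python
def full_length_fraction(per_step_efficiency: float, n_steps: int) -> float:
    """Fraction of chains still correct after n_steps sequential steps.

    Assumption (illustrative only): each step succeeds independently
    with the same efficiency, so the fractions simply multiply.
    """
    if not 0.0 <= per_step_efficiency <= 1.0:
        raise ValueError("per_step_efficiency must be between 0 and 1")
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    return per_step_efficiency ** n_steps


if __name__ == "__main__":
    # Hypothetical step counts; 0.99 is the ">99%" lower bound from the text.
    for n_steps in (25, 50, 100, 150):
        fraction = full_length_fraction(0.99, n_steps)
        print(f"{n_steps:>3} steps at 99% each: {fraction:.1%} full-length product")
```

Under this model, an efficiency of exactly 99% per step leaves roughly 60% correct chains after 50 steps and roughly 37% after 100 steps, which is consistent with the practical size limit of SPPS discussed above.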

Several chemical ligation techniques were developed to overcome the size limit of total synthesis.28-30 In chemical ligation, proteins are synthesized in short peptide fragments which are then condensed together to form the full-length product.28 The challenge was to find two functional groups that chemoselectively react with each other. One of the methods involved reacting a peptide with a C-terminal thiocarboxylate and a peptide with an N-terminal bromoacetyl group.28 This condenses the two peptides with a thioester linkage, and the method was used for the total synthesis of HIV-1 protease.28 Other early chemical ligation techniques result in the formation of oxime,29 thioether,31 disulfide,32 and thiazolidine33 linkages.

The first ligation method that yielded a native linkage through simple, accessible chemistry in the absence of side chain protection was termed native chemical ligation (NCL).30 This powerful approach involves the specific reaction between the C-terminal α-thioester of one peptide and the N-terminal cysteine (or any 1,2-aminothiol) of another peptide (Figure 1). In the transthioesterification step, the thiol attacks the carbonyl of the thioester. The N-terminal amine then displaces the thiol of the cysteine in a rearrangement step termed the S to N acyl shift. This leads to the formation of a native peptide bond with a cysteine at the ligation junction.30 What makes this reaction so efficient and specific is the two-step reaction mechanism. The initial transthioesterification step is reversible, and internal cysteines can also participate. However, the S to N acyl shift is only possible with the primary amine of the N-terminal cysteine. Since the formation of the amide bond is very favorable, this second step is essentially irreversible, driving the forward reaction.

Adding to the advantage of NCL, the reaction conditions are very mild, since the reaction is performed at room temperature and neutral pH. Ligation kinetics vary greatly depending on the C-terminal residue of the thioester.34 The rate typically decreases with the steric bulk of the C-terminal residue sidechain. Ligation is complete within 4 h for residues such as Gly and His, while 48 h may be required for residues such as Val and Thr.34 Therefore it is important to consider the relative kinetics of different residues when deciding the ligation site of a protein. Overall, the development of NCL extended the size limit of protein total synthesis considerably.35

Figure 1: Mechanism of Native Chemical Ligation. Steps shown: transthioesterification, then S to N acyl shift.

NCL has also been modified so that part of the protein is expressed while the other segment containing the desired PTM is synthesized. This expressed protein ligation (EPL) approach can produce the N-terminal protein segment with a C-terminal thioester.36 Preparing a protein through this method is referred to as protein semi-synthesis. When expressing the N-terminal segment, the C-terminus of the segment is fused to a modified intein, a self-splicing protein, using DNA cloning techniques. After refolding the intein, this fusion protein can be cleaved with an external thiol, which generates the expressed peptide thioester (Figure 2).36 EPL is particularly useful if the desired modification is close to the C-terminus of the protein sequence. This way the synthetic portion of the protein is short and relatively easy to synthesize in good yield.

Figure 2: Expressed Protein Ligation. Steps shown: N to S acyl shift, thiol exchange, then NCL.

NCL leaves a cysteine at the ligation site. If that site originally contained another residue, we would be effectively making a point mutation. Sometimes this Cys mutation could affect the function of the synthetic protein.16 Desulfurization removes the thiol functionality of Cys and converts the residue into alanine.37,38 Since Ala is a common amino acid in proteins,37 this allows for more flexibility when it comes to choosing split sites in the protein. If a native alanine is chosen as a split site, ligation using an N-terminal Cys followed by desulfurization results in a traceless ligation. As mentioned before, any 1,2-aminothiol moiety can participate in NCL, which allows for the possibility of other, non-Cys ligation sites. For example, ligation and subsequent desulfurization using penicillamine yields valine, which allows for the option to carry out ligation at a valine ligation site.39 Other residues that can be chosen as ligation sites include phenylalanine40 and several others.41,42 For ligations at serine and threonine, a special ligation approach using a peptide ester was introduced.42
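The choice of split sites described above can be sketched as a simple scan over a protein sequence. The following is an illustrative sketch only, not the procedure used in this work: it considers just the three junction types spelled out in the text (native Cys, Ala via Cys plus desulfurization, and Val via penicillamine plus desulfurization), it ignores ligation kinetics and segment length, and the function name, the lookup table, and the example sequence are all hypothetical.

```python
# Junction residues named in the text, mapped to the strategy that
# regenerates them after ligation (labels written for this sketch).
LIGATION_STRATEGY = {
    "C": "ligate at native Cys, no desulfurization",
    "A": "ligate with N-terminal Cys, then desulfurize to Ala",
    "V": "ligate with N-terminal penicillamine, then desulfurize to Val",
}


def candidate_ligation_sites(sequence: str):
    """List residues that could serve as the N-terminus of a segment.

    Returns (position, residue, strategy) tuples with 1-based positions.
    Position 1 is skipped because no segment precedes it.
    """
    sites = []
    for index, residue in enumerate(sequence.upper()):
        if index == 0:
            continue
        if residue in LIGATION_STRATEGY:
            sites.append((index + 1, residue, LIGATION_STRATEGY[residue]))
    return sites


if __name__ == "__main__":
    # Made-up sequence for demonstration; it is not a histone sequence.
    example = "MKTAYIVKQCLG"
    for position, residue, strategy in candidate_ligation_sites(example):
        print(f"{residue}{position}: {strategy}")
```

In practice the candidate list would be narrowed further, for instance by keeping each segment within the practical SPPS length limit and by avoiding slow-ligating residues at the C-terminus of the preceding thioester segment.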

Applications of total synthesis

The ability to carry out total synthesis of a protein has allowed researchers to introduce probes into the native sequence of a protein, and therefore to learn about numerous aspects of the protein’s activity. In one study, the active site aspartate residue of the HIV-1 protease was labeled with 13C for NMR, which revealed the enzyme’s catalytic mechanism.13 Two aspartate residues in the active site acted as general acid and base to hydrolyze the substrate. Chemical synthesis of proteins also allowed for the detailed study of redox potential in rubredoxins, where natural amino acids were replaced by nonstandard amino acids.15 This study found that aromatic residues could regulate metalloprotein reduction potentials, which could be fine-tuned by changing the residue to unnatural amino acids. The histidine residue of human secretory phospholipase A2 was replaced by isosteric thienyl alanine. Even though the difference between the two residues was small, it was enough to inactivate the protein, suggesting that the imidazole ring was important for function.34 This kind of precise change would not be possible using standard mutagenesis techniques. Replacing the same histidine with another natural amino acid might reveal whether the residue is essential for function, but various confounding factors would prevent a precise explanation for the loss of function.

Unlike mutagenesis, which is limited to the manipulation of the side chain, chemical synthesis allows for the modification of the backbone. One HIV-1 protease was synthesized by replacing the Gly-Gly in the β-turn with a constrained bicyclic ring compound. This change did not affect activity, and it led to increased thermal stability.17 In another backbone engineering study, an amide bond of a serine proteinase was replaced by an ester bond to assess the significance of backbone hydrogen bonding.12 This study found that the elimination of this hydrogen bond reduced binding of the protein to its target by 15-fold. By making peptides with an aza-Gly backbone, it was found that adding an extra nitrogen in the peptide bond increases collagen stability.43 With total synthesis, any modification compatible with the chemistry of choice can be introduced site-specifically into a protein.

One of the most profound examples of the ramifications of total protein synthesis is the ability to construct a protein entirely out of D amino acids. This was first demonstrated in the synthesis of HIV-1 protease. This enzyme was shown to only take the mirror image of the standard substrate, and its kinetics for catalysis were identical to those of the native HIV-1 protease.18 This observation suggested that the three-dimensional fold of the D-protein is the mirror image of the original. Importantly, it supported the hypothesis that D-proteins were capable of supporting lifeforms of the opposite chirality. This was further confirmed by a recent study where a DNA polymerase was synthesized from D amino acids. This polymerase successfully performed replication using an L-DNA template, and two mirrored polymerases could function together without cross-inhibition.19 Another group used mirror-image DapA to answer the interesting question of whether or not natural GroEL/ES chaperone proteins can fold D-proteins. Interestingly, the chaperone was found to be ambidextrous, and is able to fold proteins of either chirality.44 Recently, Liu and coworkers synthesized a D-polymerase. Mirror image proteins can also be used in racemic X-ray crystallography, where a racemic mixture of L- and D-proteins is used to make crystallization easier. The structures of several proteins were determined using this method.45-47

At times people turn to chemical synthesis for proteins that are difficult to express, such as membrane proteins. One of the earliest examples is the synthesis of the proton channel M2.48 Synthesis of membrane proteins is difficult because of their hydrophobicity, but methods have been introduced to optimize synthesis and handling conditions.49 In Chapter 2, we will discuss this in the context of centromeric protein A (CENP-A).

Glycoproteins are another popular target for chemical synthesis, which allows for the site-specific modification of a protein with another macromolecule. One of the first glycoproteins to be synthesized was diptericin.50 In addition, chemical synthesis or EPL can be used to introduce post-translational modifications (PTMs) common in eukaryotic cells that are not accessible through bacterial expression systems. One example of a protein with extensive PTMs is alpha-synuclein, which is associated with various neurodegenerative diseases such as Parkinson’s disease.51 Synthetic and semi-synthetic alpha-synucleins have been used to understand the various PTMs and how they are involved in the pathology. In the next section, we will discuss another group of proteins with extensive PTMs: the histone proteins.

Histones

Eukaryotic DNA is packaged and compacted into the nucleus by histone proteins.52,53 DNA and histones function as a unit called chromatin.54 There are four major core histone proteins: H2A, H2B, H3, and H4.55 Two copies of each histone form an octamer complex,56 which assembles from one (H3/H4)2 tetramer and two H2A/H2B dimers. DNA is wrapped ~1.7 times (146 bp) around the histone octamer (HO),56 forming a complex known as the nucleosome (Figure 3).

Figure 3: Nucleosome Structure (PDB: 1KX3).56 Labeled regions: DNA entry/exit region, dyad region, and LRS region. Histone colors: H2A, green; H2B, gray; H3, blue; H4, red.

The nucleosome can be roughly divided into two regions, the tail and the core region.57 Histones have unstructured N-terminal tail regions that are extensively post-translationally modified.52 The core region is the folded octamer region with a defined three-dimensional structure. Within the core region, the entry-exit region is where the DNA begins to wrap/unwrap from the HO.58 The dyad region is defined by the pseudo-plane of symmetry of the HO that roughly divides the octamer into two halves.59,60 The lateral surface region interacts with the wrapped DNA, and the solvent-accessible face can stack with another nucleosome unit to form higher-order chromatin structure.61,62 In addition, the loss of rDNA silencing (LRS) region is required for transcriptional silencing.63

Because of this close interaction between histones and DNA, histones are regulators of many DNA-dependent processes, such as replication, transcription, and DNA repair.64-66

In addition to the canonical core histones, there are several histone variants, many of which serve distinct functions in particular situations.67,68 CENP-A, for example, is an H3 variant that is present in centromeres,69 and it signals where the kinetochore should assemble during cell division.70-72

In order for DNA-dependent processes to occur, the histone-wrapped DNA is made sterically accessible by several means. Chromatin remodeling proteins can slide the HO across the DNA to reveal accessible segments, while histone chaperone proteins can disassemble histones from the DNA.73-77 Even without these factors, the interaction between histone and DNA is inherently dynamic.78,79 Dynamics can also be affected by histone PTMs, which are discussed in the following section.


Histone Post-Translational Modification

Histones have extensive PTMs, which can be found in any of the nucleosome regions discussed in the previous section.57 Most modifications observed in other proteins can also be found in histones.53 Lysines are commonly mono-, di-, or tri-methylated, acetylated,80 ubiquitinated,81 sumoylated,82 biotinylated,83 formylated,84 ADP-ribosylated,85 or crotonylated.86 Arginine can be methylated87 or converted to citrulline.88 Ser, Thr, Tyr, and His can be phosphorylated,89 and Ser and Thr can be glycosylated.90 PTMs in the tails and unstructured regions usually recruit other proteins to affect downstream pathways.91,92 These modifications are suggested to affect one another through “histone cross-talk”.93-95 The fact that countless combinations of tail PTMs act as an epigenetic code to affect various biological activities has led researchers to propose the concept of the “histone code”.52,96 The PTMs are recognized by reader, writer, and eraser proteins, which trigger downstream events such as transcription and silencing.97,98 PTMs buried in the histone core, on the other hand, can directly affect the dynamic interaction between DNA and histone99,100 as well as the structure of the nucleosome itself,101 often changing the steric or electrostatic characteristics of key residues in protein-protein or protein-DNA interfaces.16,59,102,103

Methods to Prepare Histone PTMs

While large numbers of histone PTMs have been discovered,104 and several have been correlated with biological effects, the precise effects of most of these PTMs are still unknown.105 Part of the reason for this slow progress is the difficulty in preparing a homogeneous sample of histones with desired modifications. Various methods have been developed to overcome this challenge.7,106

Genetic mimics

One of the most straightforward methods to study PTMs is mutagenesis, where the unmodified residue is mutated to another natural amino acid that shares similar characteristics with the modified residue (Figure 4). For example, a lysine can be replaced by glutamine as a mimic of acetyllysine, or by arginine as a mimic of constitutively unmodified lysine.107 Glutamate or aspartate has often been used to mimic phosphorylated Ser, Thr, or Tyr.108 An advantage of mutagenesis is that it can be used in a high-throughput manner to quickly screen for possible effects of histone PTMs.109 Since this method only involves simple genetic mutations, any laboratory equipped with recombinant protein expression tools can prepare these mimics. However, in several instances where the effects of mimics have been directly compared to the chemically precise modifications, the mimics have not replicated the effects of the exact modification. This might be expected from the structural differences. For example, acetylation of H3-K115 and K122 (H3-K115ac, K122ac) reduced the free energy of octamer binding to DNA, while the mimics H3-K115Q, K122Q did not.59 In another study, phosphorylation of H3-T118 was found to decrease DNA-histone binding free energy and increase nucleosome mobility, while H3-T118E had no effect on those properties.110 For cases like these, it is desirable to make either a more similar analog or the exact modification of interest.


Figure 4: Post-translational modifications and their genetic mimics. PTMs shown: acetyllysine, phosphoserine, phosphothreonine, and phosphotyrosine. Mimics shown: glutamine and glutamate.

Expanded genetic code

Codon suppression is a method that allows for the genetic introduction of modified amino acids. The expanded genetic code was first suggested when a strain of E. coli was found to read through the UAG (amber) stop codon.111 Later, Schultz and coworkers succeeded in introducing a at the stop codon in a species of methanogen.112 After discovering another species of methanogen that incorporated pyrrolysine through the amber codon, the

Chin group eventually developed the genetic incorporation of acetyllysine (Kac) by artificially evolving a methanogen’s pyrrolysyl-tRNA synthetase and tRNA pair to take acetyllysine as substrate. This codon suppression method was first used to prepare H3-K56ac.113 The system has since been modified to allow incorporation of methyllysines,114,115 Nε(Cys)-lysine,116,117 azidonorleucine,118 and phosphoserine.119


Dehydroalanine

Schultz and coworkers developed the genetic incorporation of several other unnatural amino acids, and one of particular interest for the introduction of PTMs was phenylselenocysteine. This residue can be converted to dehydroalanine (Dha), and Michael addition with a thiol reagent introduces the thioether analog of the desired PTM.120 Instead of genetic incorporation, the Davis group developed the chemical conversion of cysteine to Dha using 2,5-dibromohexanediamide.121 The Dha was then converted into thioether analogs of methyllysine, acetyllysine, phosphoserine, and glycosylated serine. Another group developed the genetic incorporation of Se-alkylselenocysteine, which can also be converted into Dha, with an improved expression yield over the phenylselenocysteine method.122 Figure 5 summarizes the versatility of Dha. One potential problem is that conversion of chiral cysteine or selenocysteine to planar Dha leads to a racemic mixture of the resulting modification analog. Some studies suggest that the inherent chirality of the protein will bias the product toward the L-form analog.123


Figure 5: Chemistry of Dehydroalanine7

Chemical installation through cysteine

When preparing PTM analogs, chemical modification is another popular method. The target for these chemical approaches is often cysteine, due to its reactivity and relative rarity, especially in histones. Various reactions have been developed that are specific to the sulfhydryl of cysteines. Cysteine can be introduced in a protein using site-directed mutagenesis, and chemical modification allows for site-specific installation of a PTM. One well-established method is the preparation of methyllysine analogs (MLAs) through cysteine alkylation (Figure 6A). This yields mono-, di-, or tri-MLAs depending on the aminoethyl halide used.124 A methylene in methyllysine is replaced by a sulfide in MLA, causing a slight lengthening of the sidechain by 0.28 Å and a decrease of 1.1 units in the pKa of the sidechain amine.124 The analog is sufficiently similar to be recognized by methyltransferases and methyllysine antibodies, albeit with decreased affinity.124 Cysteines can also be converted to acetyllysine analogs (Figure 6B,C)117,125 and methylarginine analogs (Figure 6D).126 These cysteine derivatives seem to mimic the effects of the corresponding modifications in some cases but not others.127,128

Figure 6: Chemical Modifications of Cysteine7 (A) Generation of MLA. (B) Generation of thio-methyl acetyllysine as acetyllysine analog. (C) Generation of acetyllysine mimic with thiol-ene chemistry. (D) Generation of methylarginine mimic.

Disulfide stapling

Cysteine has another useful chemical property, the ability to form disulfide bonds, which has been exploited to introduce modifications through a technique termed disulfide stapling (Figure 7). For example, histone ubiquitylation can be mimicked by covalently linking ubiquitin to a cysteine of the histone through a disulfide bond. In this disulfide stapling method, ubiquitin is expressed as a fusion protein with an intein, and thiolysis with a 1,2-aminothiol results in ubiquitin with a thiol terminus. This product is then incubated with histone containing a single cysteine, yielding a ubiquitylated histone mimic.129 Despite the disulfide linkage, this mimic was recognized by other enzymes, suggesting the difference in covalent linkage does not change the effect of the modification. The ability to remove the ubiquitin by reduction is both an advantage, allowing for the dynamic study of ubiquitylation by addition of reducing agent,129,130 and a disadvantage, limiting the buffer conditions under which these modifications are stable.

Figure 7: Introduction of Ubiquitylation through disulfide stapling. The scheme compares a native ubiquitylated protein with a ubiquitylated protein prepared using disulfide stapling.

The unique reactivity of cysteines allows for the introduction of various PTM mimics through diverse chemistry. Histones with cysteine mutants can be easily prepared using standard recombinant approaches, allowing for large quantities of modified histones. However, similarly to the amber codon method, the cysteine modification method is generally limited to the introduction of one type of modification for a given protein.

Chemical Synthesis

As discussed above, there are many ways to introduce PTMs. However, many of these methods can only produce PTM analogs, which are not structurally identical to the native PTM. Although codon suppression enables the introduction of the precise PTM, it is currently difficult to introduce more than one PTM in a single protein.131 These limitations are not seen with chemical synthesis of modified proteins. In terms of the level of control provided by protein total synthesis, no other methods come close.3 There is essentially no limit to the number and type of modification that can be introduced with synthetic histones.132 Since multiple PTMs can be found distributed throughout the sequence of a single histone in vivo,52,133 total synthesis is currently the only method that can replicate histones typically found in nature. Despite these advantages, total synthesis does have major limitations, and our goal, as discussed in the coming chapters, is to overcome those limitations.

Goals

Protein total synthesis has two major limitations. The first is the size limit of total synthesis. Although the development of NCL increased the upper size limit of synthetic proteins considerably, it is still not practical to synthesize proteins with more than 300 residues.30,134 A larger protein needs to be split into more peptide fragments, which means more chemical steps to produce the full-length product. This leads to a lower yield, which is the second major limitation of total synthesis.135 The yield of synthesis is low, especially compared to protein expression. Total synthesis involves multiple steps, and yield loss can occur at every step from peptide synthesis to ligation and purification. Even making milligram quantities of the protein becomes a challenge.132 Our goal is to develop techniques that can overcome these two limitations and make total synthesis more efficient and practical. We have used histone proteins as a platform to develop these techniques and to demonstrate their effectiveness.
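The link between protein size and the number of chemical steps can be illustrated with a short calculation. The following is a minimal sketch rather than a description of our synthetic planning: it assumes a hypothetical practical limit of 50 residues per SPPS segment (`max_segment_len`, an illustrative value only) and counts the fragments and ligations needed for a protein of a given length.

```python
import math

def ligation_plan(protein_len, max_segment_len=50):
    """Return (fragments, ligations) for a protein of protein_len residues.

    max_segment_len is a hypothetical per-segment limit chosen only for
    illustration; real segment boundaries depend on ligation sites.
    """
    if protein_len <= 0 or max_segment_len <= 0:
        raise ValueError("lengths must be positive")
    fragments = math.ceil(protein_len / max_segment_len)
    ligations = fragments - 1  # joining n fragments takes n - 1 ligations
    return fragments, ligations

for length in (100, 200, 300):
    fragments, ligations = ligation_plan(length)
    print(f"{length} residues: {fragments} fragments, {ligations} ligations")
```

Under this assumption the number of ligations grows linearly with protein length, and each added ligation brings its own handling and purification losses.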

Histone proteins have several properties that make them attractive targets for total synthesis. They are relatively small proteins, which makes them accessible for total synthesis. They have very few cysteine residues (H4 has none), such that chemical ligation followed by desulfurization yields a product with minimal mutations. Histone proteins are relatively challenging to synthesize due to their unique sequences. Some of the histone peptides have poor solubility, making purification challenging, while other peptides do not yield clean chemical conversions using standard reaction conditions. However, we view these challenges not as shortcomings but rather as opportunities to optimize total synthesis methods. Many of the techniques we use to handle these challenging peptides can be applied to the chemical synthesis of other proteins containing similarly challenging sequences.


Outline

In Chapter 2 we introduce hybrid-phase ligation as a new approach to the efficient synthesis of histones requiring multiple ligation steps. This method grew out of our initial studies in which solid-phase NCL (SP-NCL) proved incompatible with histones. We use the hybrid-phase ligation approach to synthesize H4 and CENP-A. A dual linker is used to optimize peptide cleavage yield and produce the native carboxy terminus. We hope to overcome the current yield limitation of total synthesis with this method.

In Chapter 3 we introduce convergent hybrid ligation as a further improvement to total histone synthesis. We assess two alternate approaches: one in which a protein segment prepared with solution-phase ligation is combined with a segment prepared by SP-NCL, and one in which both segments are prepared by SP-NCL. Key to both of these approaches is the versatile linker 3,4-diaminobenzoic acid (Dbz), used as a masked thioester.136 For the second approach, we developed a resin-anchoring strategy that maintains a C-terminal cryptic thioester, such that both segments of the protein can be used without further purification steps. This approach to chemical ligation should be applicable to the efficient total synthesis of a wide variety of larger proteins.


Chapter 2: Solid-Phase Ligation vs. Hybrid-Phase Ligation of Histones

Introduction

The synthesis of a protein from two peptide fragments is a relatively straightforward process involving only one ligation step. However, given the limits of SPPS and the size of most proteins, it is usually necessary to split even moderately sized proteins into three or more fragments. Syntheses that require more than one ligation step can lead to various complications and low yields. The various approaches to ligating multiple peptide fragments are discussed in the following sections.

Solution-Phase NCL

A simple and straightforward approach to ligating multiple peptides is sequential ligation. Peptides are ligated one by one in sequence, in either the N to C or the C to N direction (Figure 8).135 With ligation in the C to N direction, the N-terminal cysteine of the peptide thioester must be protected to prevent cyclization and oligomerization. Further, the protection must be reversible under mild conditions compatible with ligation. While several strategies have been developed, the cysteine is most often protected with acetamidomethyl (Acm)137 or as a ring-closed form of cysteine called thiazolidine (Thz).138,139 After the ligation of two segments, deprotection can reveal the cysteine. Typically, the ligation intermediate is purified before the next ligation. H3 has been successfully synthesized by C to N sequential NCL using Thz as the protected cysteine.16 Ligation in the N to C direction typically requires a thioester surrogate that is stable to NCL conditions but can be readily converted to a thioester when desired. Such surrogates include N-alkylcysteine,140 cysteinyl prolyl ester,141 bis(2-sulfanylethyl)amino (SEA) peptide,142 and peptide hydrazide.143

Figure 8: Solution-Phase NCL Approaches. Panels: N to C sequential NCL, C to N sequential NCL, and convergent NCL.

The methods described above require intermediate purification steps, which are necessary for removing excess peptides and reagents before the next round of ligation. For example, methoxylamine used for deprotection of Thz will react with the incoming peptide thioester. However, reversed-phase high-performance liquid chromatography (RP-HPLC) purification of the intermediates results in a significant yield loss. This is especially true for histones, where we see upwards of 70% yield loss from purification.132 Several methods have been developed to minimize the purification steps.35 In the one-pot ligation strategy, sequential ligation and deprotection are performed in the same vessel.139,144 This method eliminates the intermediate purification step, but it is generally limited to a three-peptide ligation. In addition, the ligation kinetics must be carefully controlled using different thioesters for the first and second ligations. Convergent ligation is a useful approach to minimize the number of purification steps when there are four or more peptide fragments.145-147 However, it can be challenging to find the optimal masked thioester required for this method. The various ligation schemes are illustrated in Figure 8.
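The cost of each purification step compounds multiplicatively, which a short calculation makes concrete. The sketch below is illustrative only: `purification_recovery=0.3` corresponds to the roughly 70% loss per RP-HPLC step noted above for histones, while `ligation_yield=0.9` and the two scenarios compared are hypothetical values chosen for the example, not measured results.

```python
def overall_yield(n_ligations, n_purifications,
                  ligation_yield=0.9, purification_recovery=0.3):
    """Fraction of material surviving all ligations and purifications.

    ligation_yield is a hypothetical per-ligation conversion;
    purification_recovery=0.3 models a 70% loss per RP-HPLC step.
    """
    if n_ligations < 0 or n_purifications < 0:
        raise ValueError("step counts must be non-negative")
    return (ligation_yield ** n_ligations) * (purification_recovery ** n_purifications)

# Four fragments require three ligations in either scenario;
# only the number of purification steps differs.
purify_every_step = overall_yield(n_ligations=3, n_purifications=3)
purify_once = overall_yield(n_ligations=3, n_purifications=1)

print(f"purify after every ligation: {purify_every_step:.1%}")
print(f"single final purification:   {purify_once:.1%}")
```

With these assumed numbers, removing two of the three purifications improves the overall yield by roughly an order of magnitude, which is the motivation for strategies that reduce intermediate purification.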

Solid-Phase NCL

Just as using a solid-phase support drastically improved the yield of peptide synthesis, solid-phase NCL (SP-NCL) is an attractive strategy to improve the yield of total synthesis (Figure 9).148,149 With SP-NCL, the first peptide is ligated to a solid support, and the subsequent peptides are ligated sequentially to the immobilized segments. Purification is not required since the soluble components can be washed away with excess buffer. Eliminating the time-consuming purification steps accelerates synthesis. Changing reaction buffers between chemical steps is also straightforward. Excess peptides can be used for ligation, and incomplete reactions can be repeated after a quick wash step.


Figure 9: C to N SP-NCL Scheme. Steps shown: SP-NCL followed by cleavage.

SP-NCL has been attempted by several labs. SP-NCL can be performed in either the N to C or the C to N direction. With either approach, protection strategies must be developed in order to avoid cyclization of the peptide dissolved in solution. Early studies include the development of N to C SP-NCL by the Kent research group using thioacids as masked thioesters. The thioacid was not completely inert during NCL, causing a small percentage of the middle peptide segment to cyclize.148 Another N to C SP-NCL strategy developed by Raibaut and coworkers used the bis(2-sulfanylethyl)amido (SEA) linker. Reduction of the linker’s disulfide bond induces a rearrangement to produce the active thioester.150 Although efficient, this method prevents the addition of reducing agents during ligation, which is sometimes necessary in order to reverse disulfide formation of the N-terminal cysteine, especially for slow ligations.

The Kent group also developed a His6-tag-assisted SP-NCL of Crambin. The C-terminal peptide was synthesized with a His6-tag, which was bound to Ni-NTA resin during the ligation steps. The overall yield of Crambin, however, was 16% compared to the 40% achieved previously with a one-pot strategy, which was stated to be due to less efficient folding.151 Ni-NTA resin is relatively inexpensive, and no specialized linker is required to anchor the peptide to the solid support. However, many metal-binding and highly charged proteins interact with nickel, so the syntheses of those proteins would not be compatible with this resin. Another approach uses the safety-catch acid-labile (SCAL) linker, which was used to synthesize an 8 kDa protein with 20% yield.152 Cleavage of the SCAL linker with trifluoroacetic acid (TFA) produces a C-terminal amide in the product instead of the native carboxyl, and the incorporation of SCAL is nontrivial. More recently, SP-NCL of H2B was performed using a C-terminal Rink amide linked to PEGA resin.153 Rink is cleaved rapidly with TFA,154 but as with the SCAL linker, cleavage produces an amide terminus.

In this work, we introduce a new SP-NCL strategy that we have designed for the total synthesis of H4 and CENP-A, with the aim of improving the current yield of synthetic histones. We begin by discussing our initial development of H4 total synthesis by sequential SP-NCL. We discuss the problems we encountered with several of the peptides and the approaches we used to resolve them. We then examine our initial attempts at synthesis of H4 and CENP-A using the sequential SP-NCL strategy. Despite the efficiency of the reactions, the overall yield was unacceptably low. This prompted the development of a new method, hybrid-phase NCL, which combines solid-phase and solution-phase components. This new strategy led to a significant improvement of yield for both H4 and CENP-A. We end this chapter by successfully incorporating the synthetic histones into nucleosomes.


Experimental Methods

Materials

Rink Amide MBHA resin LL (100-200 mesh, 0.36 mmol/g loading) was purchased from Novabiochem. PL-PEGA resin (300-500 µm, 0.2 mmol/g) was purchased from Varian. DMF C3H7NO, DCM CH2Cl2, ACN C2H3N, and diethyl ether (C2H5)2O were purchased from Fisher Scientific. NMP C5H9NO was purchased from AGTC Bioproducts. Piperidine C5H11N, NPCF ClCO2C6H4NO2, MPAA HSC6H4CH2CO2H, TCEP C9H15O6P, DIEA C8H19N, phenylsilane C6H8Si, and tetrakis(triphenylphosphine)palladium(0) (Pd(PPh3)4) were purchased from Sigma Aldrich. Fmoc-protected amino acids were purchased from AAPPTec and Novabiochem. Fmoc-6-aminohexanoic acid (Fmoc-Ahx-OH), Fmoc-L-norleucine (Fmoc-Nle-OH), and DMAP C7H10N2 were purchased from Novabiochem. HATU, HBTU, HCTU, and 6-Cl-HOBt C6H4ClN3O were purchased from AAPPTec. Acetic anhydride and MESNa C2H5NaO3S2 were purchased from Fluka Analytical. DIC C7H14N2 was purchased from Alfa Aesar. VA-044-US C12H22N6·2HCl was purchased from Wako Chemicals. Ultra-pure guanidine-HCl CH6ClN3 (GuHCl) was purchased from MP Biomedicals and Alfa Aesar. Dbz acid C37H28N2O6 was purchased from Anaspec. Allyl chloroformate C4H5ClO2 was purchased from Acros Organics. Triisopropylsilane C9H22Si (TIS) was purchased from GFS Chemicals. Boc-(R)-5,5-dimethyl-1,3-thiazolidine-4-carboxylic acid (Boc-dmThz-OH) was purchased from Chem-Impex International. 9-Fluorenylmethyl N-succinimidyl carbonate (Fmoc-OSu) C19H15NO5 was purchased from Novabiochem. HCCA matrix C10H7NO3 was purchased from Bruker Daltonics.

RP-HPLC

Analytical RP-HPLC was run on a Shimadzu or Waters instrument using an analytical column (Supelco C18 15 cm × 4.6 mm × 5 µm, flow rate 0.9 mL/min). Preparative RP-HPLC was run on a Waters instrument using a semi-preparative column (Supelco C18 25 cm × 10 mm × 10 µm, flow rate 5 mL/min), or a preparative column (Supelco C18 25 cm × 21.2 mm × 10 µm, flow rate 18 mL/min). Solvent A was 0.1% TFA in water, and Solvent B was 0.1% TFA in 1:9 water:ACN. Eluate was monitored at 218 nm and 280 nm wavelengths. Only the 218 nm absorbance trace is shown in the figures.
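Because Solvent B itself contains water, a given %B on the instrument is not the same as the percentage of ACN in the mobile phase. The short sketch below shows the conversion; it assumes ideal volume mixing and ignores the 0.1% TFA, and the function name `acn_percent` is our own illustrative choice.

```python
def acn_percent(percent_b, acn_fraction_in_b=0.9):
    """Approximate % ACN (v/v) in the mobile phase at a given %B.

    Solvent A contains no ACN; Solvent B is 1:9 water:ACN, i.e. 90% ACN.
    Assumes ideal mixing of volumes and neglects the 0.1% TFA.
    """
    if not 0 <= percent_b <= 100:
        raise ValueError("percent_b must be between 0 and 100")
    return percent_b * acn_fraction_in_b

for b in (0, 25, 50, 100):
    print(f"{b}% B corresponds to about {acn_percent(b):.1f}% ACN")
```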

Mass spectrometry

Peptide masses were confirmed by MALDI-TOF-MS (Bruker Daltonics Microflex) using flexControl 3.3 and flexAnalysis 3.3 software. α-Cyano-4-hydroxycinnamic acid (HCCA) was used for the matrix. Peptide Calibration Standard II (Bruker) was used for calibration of peptides ranging from 0-3 kDa, and Protein Calibration Standard I (Bruker) was used for calibration of proteins ranging from 5-20 kDa. HCCA solution was prepared by resuspending solid HCCA in 1:1 Solvent A:ACN. Typically a mixture of 0.5 µL RP-HPLC purified sample and 0.6 µL HCCA solution was spotted on the MALDI target plate. The expected and observed m/z are the average values. The instrument was calibrated using calibration standards before the analysis of samples.

Solid-Phase Peptide Synthesis

Synthesis of 3-Fmoc-Dbz-OH

3,4-Diaminobenzoic acid (1 g, 6.5 mmol) was resuspended in 125 mL 1:1 ACN:NaHCO3. The reaction was initiated by the dropwise addition of 9-Fluorenylmethyl N-succinimidyl carbonate (Fmoc-OSu) (2.4 g, 7.1 mmol) in 15 mL 1:1 ACN:NaHCO3 and proceeded for 2 h. HCl was added to a final pH of 1.0, and the mixture was filtered. Filtrate was dissolved in 4 mL DMSO, precipitated with acidified reaction buffer, washed extensively, and dried under vacuum to yield a light brown product. Product identity and purity were validated by NMR spectroscopy.
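The molar quantities quoted for this reaction follow directly from the masses and molecular formulas. As a check, the minimal sketch below computes molar masses from formulas using standard atomic masses rounded to three decimals. The parser handles only simple formulas without brackets, and the formula C7H8N2O2 for 3,4-diaminobenzoic acid is supplied here for the example (the Fmoc-OSu formula is the one listed under Materials).

```python
import re

# Standard atomic masses in g/mol, rounded to three decimal places.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

def molar_mass(formula):
    """Molar mass (g/mol) of a simple formula such as 'C7H8N2O2' (no brackets)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[element] * (int(count) if count else 1)
    return total

def mmol_from_grams(grams, formula):
    """Convert a mass in grams to millimoles."""
    return 1000.0 * grams / molar_mass(formula)

dbz_mmol = mmol_from_grams(1.0, "C7H8N2O2")        # 3,4-diaminobenzoic acid
fmoc_osu_mmol = mmol_from_grams(2.4, "C19H15NO5")  # Fmoc-OSu

print(f"Dbz: {dbz_mmol:.2f} mmol, Fmoc-OSu: {fmoc_osu_mmol:.2f} mmol")
print(f"Fmoc-OSu equivalents: {fmoc_osu_mmol / dbz_mmol:.2f}")
```

The computed amounts agree with the stated values to within rounding and correspond to a slight excess of Fmoc-OSu over the diamine.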

Automated Solid-Phase Peptide Synthesis

Peptides were synthesized on 100-200 mesh Rink amide MBHA resin using the AAPPTec APEX 396 automated synthesizer. A 40-well reaction vessel block was used. Five types of solutions were prepared beforehand for synthesis. 0.3 M Fmoc-AA-OH solutions in NMP were prepared and placed in the monomer rack of the synthesizer. 20% piperidine in NMP, 1 M DIEA in DMF, 0.3 M HCTU in NMP, and capping solution (300 mM 6-Cl-HOBt and 300 mM acetic anhydride in 1:9 DCM:DMF) were each prepared in separate glass bottles and set in the appropriate location in the synthesizer.


For one reaction well, 0.05 mmol of resin, calculated from the theoretical loading in mmol/g, was used. The resin was transferred to the well using DMF and swollen in DMF by shaking for 15 minutes.
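The mass of resin to weigh out follows from the synthesis scale and the resin loading. A minimal sketch of this calculation is given below, using the loadings listed under Materials; the function name is our own, and the actual loading of a given resin lot may differ from the theoretical value.

```python
def resin_mass_mg(scale_mmol, loading_mmol_per_g):
    """Mass of dry resin (mg) that provides scale_mmol of reactive sites."""
    if scale_mmol <= 0 or loading_mmol_per_g <= 0:
        raise ValueError("scale and loading must be positive")
    return 1000.0 * scale_mmol / loading_mmol_per_g

# Loadings taken from the Materials section.
for name, loading in (("Rink Amide MBHA", 0.36), ("PL-PEGA", 0.2)):
    print(f"{name}: {resin_mass_mg(0.05, loading):.0f} mg for 0.05 mmol")
```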

Fmoc deprotection was performed by shaking the resin in 20% piperidine in NMP for 5 min. The solution was drained, and the deprotection step was repeated two more times. The resin was washed by shaking in DMF for 5 min. The DMF was drained, and the wash step was repeated 4 more times. Solution was drained from the resin before the coupling step.

For coupling, 1 mL Fmoc-AA-OH, 0.9 mL HCTU, and 0.45 mL DIEA were added to the resin. When using 0.05 mmol of resin, the molar equivalents of Fmoc-AA-OH, HCTU, and DIEA were 6, 5.5, and 9, respectively. The concentrations of Fmoc-AA-OH, HCTU, and DIEA were 128 mM, 115 mM, and 191 mM, respectively. The resin was shaken for 30 min and drained. In cases where double-coupling was needed, Fmoc-AA-OH, HCTU, and DIEA were added again and the coupling step was repeated for another 30 min. The resin was then washed by shaking in DMF. DMF was drained, and the wash step was repeated two more times.
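The equivalents and final concentrations in the coupling mixture can be reproduced from the stock concentrations and dispensed volumes. The sketch below is a simple check that assumes the volumes are additive and ignores solvent retained in the swollen resin; the function name and data layout are illustrative.

```python
def coupling_summary(scale_mmol, additions):
    """Return {reagent: (equivalents, final_mM)} for a coupling mixture.

    additions maps reagent name -> (volume_mL, stock_concentration_M).
    Volumes are assumed additive; resin-bound solvent is ignored.
    """
    total_ml = sum(volume for volume, _ in additions.values())
    summary = {}
    for name, (volume_ml, stock_molar) in additions.items():
        mmol = volume_ml * stock_molar  # mL x mol/L = mmol
        summary[name] = (mmol / scale_mmol, 1000.0 * mmol / total_ml)
    return summary

automated = {
    "Fmoc-AA-OH": (1.0, 0.3),   # 0.3 M in NMP
    "HCTU": (0.9, 0.3),         # 0.3 M in NMP
    "DIEA": (0.45, 1.0),        # 1 M in DMF
}
for name, (eq, mm) in coupling_summary(0.05, automated).items():
    print(f"{name}: {eq:.1f} eq, {mm:.0f} mM")
```

Under these assumptions the calculated concentrations match the stated values, and the HCTU excess comes out at 5.4 equivalents, close to the quoted 5.5.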

If performing an acetyl capping step, 6 mL of capping solution was added to the resin, and the resin was shaken for 5 min. The solution was drained, and the step was repeated one more time. The resin was washed three times with DMF. If no capping step was performed, the resin was washed three times with DMF after coupling. DMF was drained, and the next synthesis cycle was performed, starting with Fmoc deprotection. The N-terminal residue of each peptide was coupled as the Boc-AA-OH so the Boc could be deprotected during TFA cleavage.

At the end of synthesis, the resin was transferred from the reaction vessel to a 10 mL Poly-prep chromatography column (Bio-Rad) using DMF. For transferring resin, a thick tip plastic transfer pipette (Samco Scientific) was used. The weight of the empty column before the addition of resin was recorded in order to calculate the dry weight of the resin. The resin was washed with DMF, then with DCM. The resin was partially dried over a vacuum, and the column containing the resin was lyophilized. Resin was then stored at -20 °C. When synthesizing multiple peptides in parallel, the synthesizer was programmed to pause each time one of the peptides finished synthesis, so that the resin could be transferred and lyophilized. After a 15 min swelling step in DMF, synthesis of the remaining peptides was resumed.

Manual peptide synthesis

Manual synthesis was performed using a glass peptide reactor vessel. Dry resin was measured, and the resin was transferred into the reactor using DMF. Resin was swelled for 15 min by agitating the resin using N2 flow through the bottom of the reactor. DMF was then drained using vacuum.


Fmoc deprotection was performed using 20% piperidine in NMP, mixed by N2 agitation for 3 min. Resin was drained and deprotection was repeated two more times. The resin was flow-washed with DMF. Flow-wash was performed for 30 seconds with the bottle pointed toward one side of the vessel, and 30 seconds on the other side. The 30 second flow-wash was repeated twice on each side.

For a typical manual coupling cycle, 4.4 eq Fmoc-AA-OH, 4 eq HCTU, and 8.8 eq DIEA were used. The Fmoc-AA-OH and HCTU were dissolved in 1 mL DMF. DIEA was added, and the amino acid was pre-activated by shaking for 5 minutes. When using 0.05 mmol of resin, the concentrations of Fmoc-AA-OH, HCTU, and DIEA were approximately 200 mM, 180 mM, and 400 mM, respectively. The activated amino acid solution was added to the resin, and the resin was mixed by agitation.

After 30 min, reaction completion was assessed by performing a ninhydrin test on 10-20 resin beads. The resin sample was transferred to a 0.8 mL Micro bio-spin column (Bio-Rad), and washed with DMF, then with DCM. The resin was dried over a vacuum and transferred into a glass test tube. Two drops each of Monitors 1, 2, and 3 (Anaspec) were added to the resin, and the sample was incubated at 85 °C for 2 minutes. The color was assessed by diluting the sample 10-fold with ethanol. Sufficient presence of unprotected primary amine gave rise to a blue/violet color. No detectable primary amine resulted in a clear to light yellow color.


If coupling was complete as assessed by ninhydrin, the resin was flow-washed three times with DMF for the next step. If not, the resin was allowed to couple for 30 more min. If coupling was not complete after 1 h of coupling, the resin was flow-washed three times, and the coupling reaction was repeated one more time.
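The decision procedure for manual coupling can be summarized as a small function. This is only a sketch of the logic described above; the function and argument names are hypothetical, and the case in which a repeated coupling still gives a positive ninhydrin test is not specified in the procedure and is returned as such.

```python
def next_action(ninhydrin_positive, minutes_coupled, recoupled=False):
    """Suggest the next manual-synthesis step after a ninhydrin test.

    ninhydrin_positive: True for a blue/violet result (free amine present).
    minutes_coupled: total coupling time so far in the current attempt.
    recoupled: True if the coupling reaction has already been repeated once.
    """
    if not ninhydrin_positive:
        return "flow-wash three times with DMF and proceed"
    if minutes_coupled < 60:
        return "continue coupling for 30 more min"
    if not recoupled:
        return "flow-wash three times and repeat the coupling once"
    return "not specified in the procedure"

print(next_action(ninhydrin_positive=True, minutes_coupled=30))
print(next_action(ninhydrin_positive=True, minutes_coupled=60))
print(next_action(ninhydrin_positive=False, minutes_coupled=60))
```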

Acetyl capping of unreacted amines was achieved using 15:15:70 acetic anhydride:DIEA:DMF, which was prepared immediately prior to the capping reaction. Half of the prepared capping solution was added to the resin, and the resin was agitated for 5 minutes. The solution was then drained, and the capping step was repeated using the remaining half of the capping solution. The resin was then flow-washed three times. At the end of synthesis, the resin was transferred to a 10 mL Poly-prep chromatography column.

Manual synthesis of Dbz(Alloc) resin

3-Fmoc-Dbz-OH was loaded on the Rink amide resin using 4.4 eq 3-Fmoc-Dbz-OH, 4 eq HCTU, and 8.8 eq DIEA, similarly to an Fmoc-AA-OH. When Arg tags were required, Fmoc-Arg(Pbf)-OH was loaded on the resin before the Dbz. Reaction completion was assessed using ninhydrin. The resin was washed with DMF, transferred to a Poly-prep chromatography column, and washed with DCM. The resin was dried briefly using vacuum to remove the DCM.

The column was removed from vacuum, and the bottom was plugged with a yellow cap. The cap was wrapped with parafilm to prevent leakage. Anhydrous DCM (Sigma) was added to the resin to approximately 3/4 the maximum volume of the column. Allyl chloroformate was added to a final concentration of 250 mM. 1 eq DIEA was then added to the column, and the column was shaken at room temperature for 24 h.

Loading the first amino acid on Dbz(Alloc)

Amino acid coupling directly onto 4-Alloc-Dbz resin was accomplished using 16.5 eq Fmoc-AA-OH, 15 eq HCTU, and 33 eq DIEA. The Fmoc-AA-OH, HCTU, DIEA mixture was preactivated for 5 min before adding to the resin. Coupling was allowed to proceed for 1 h, followed by acetyl capping. The remaining residues were added using standard molar excesses.

Loading Fmoc-His(Trt)-OH on Dbz with minimal racemization

Fmoc-His(Trt)-OH and 6-Cl-HOBt were dissolved in DMF to a final concentration of 330 mM Fmoc-His(Trt)-OH and 300 mM 6-Cl-HOBt, and immediately added, without pre-activation, to the unprotected Dbz resin. DIC was added to the resin to a final concentration of 300 mM, and coupling was allowed to proceed for 45 min. The resin was washed with DMF, then with DCM. Sample resin cleavage and RP-HPLC analysis were performed in order to assess the extent of coupling. Coupling was repeated if incomplete.

Symmetric anhydride coupling on HMBA

Amino acid was loaded on the HMBA linker using the symmetric anhydride method. 10 eq Fmoc-AA-OH was dissolved in DCM. Drops of DMF were added until the amino acid was completely dissolved. 5 eq DIC was added, and the mixture was incubated on ice with occasional shaking for 30 minutes. The amino acid solution was filtered in order to remove the precipitate, and the filtrate was added to the resin. 0.1 eq DMAP was added as a catalyst. The reaction was allowed to proceed for 1 hour, and the coupling was repeated one more time to ensure complete reaction.

Alloc Deprotection

DCM (typically 7 mL) was added to the resin in the Poly-prep chromatography column and the resin was incubated at room temperature for 20 min. 0.35 eq Pd(PPh3)4 and 20 eq phenylsilane were added to the resin. The column was shaken for 45 minutes. The resin was flow-washed with DCM, dried, and lyophilized. Alternatively, the resin could be taken directly to Nbz conversion.

Nbz Conversion in DCM

DCM (typically 7 mL) was added to the resin in the Poly-prep chromatography column and the resin was incubated at room temperature for 20 min. NPCF was added to a final concentration of 50 mM. The column was nutated at room temperature for 30 minutes. The resin was washed with DCM followed by DMF. 0.5 M DIEA in DMF was then added to the resin, and the column was nutated at room temperature for 15 minutes. For successful Nbz conversions, the DIEA solution typically turned bright yellow immediately upon its addition to resin. The resin was then washed with DMF, then with DCM. The resin was dried and lyophilized.
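The mass of solid NPCF needed to reach 50 mM follows from the solvent volume. The sketch below assumes that NPCF denotes 4-nitrophenyl chloroformate and uses a molecular weight of 201.56 g/mol, a value supplied here rather than taken from the text; the 7 mL volume is the typical volume quoted above, and the function name is illustrative.

```python
# Assumption: NPCF = 4-nitrophenyl chloroformate, MW 201.56 g/mol (not stated in the text).
NPCF_MW = 201.56


def solid_mass_mg(conc_mM, volume_mL, mw_g_per_mol):
    """Mass in mg of a solid needed to reach conc_mM in volume_mL of solvent."""
    mmol = conc_mM * volume_mL / 1000.0  # mM x mL gives umol; divide by 1000 for mmol
    return mmol * mw_g_per_mol           # mmol x g/mol gives mg


npcf_mg = solid_mass_mg(50, 7, NPCF_MW)
print(f"{npcf_mg:.1f} mg NPCF for 7 mL at 50 mM")
```

The calculation ignores the volume occupied by the swollen resin, so the true concentration in the liquid phase may differ slightly.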


Nbz conversion in DMF and NMP

For Nbz conversion in DMF, the resin was swollen in dry DMF. DMF was dried by adding molecular sieves that had been baked overnight at 300 °C. After the addition of the molecular sieves, DMF was allowed to dry for at least 5 h before use. The dry DMF must be used within a week of its preparation. Using dry DMF that was more than a week old resulted in incomplete Nbz conversions. Solid NPCF was added directly to the resin/dry DMF suspension to 50 mM. The reaction was allowed to proceed for 30 min. The remaining steps were identical to Nbz conversion in DCM. Un-dried DMF could be used for the DIEA treatment. For Nbz conversion in NMP, the same steps described above were performed using NMP.

Peptide cleavage

For analytical cleavages, peptides were cleaved at the Rink linker in 95:2.5:2.5 TFA:TIS:H2O for 2 hours. For sequences containing Cys, 94:2.5:2.5:1 TFA:H2O:ethanedithiol:TIS was used. TFA was eluted into a 1.7 mL centrifuge tube and concentrated with a stream of nitrogen. The peptide was precipitated by adding cold diethyl ether to the tube. For maximal precipitation of peptide, the volume of diethyl ether should be at least five times the volume of remaining TFA. The sample was centrifuged, and the ether was decanted. The ether wash was repeated two more times. The pellet was allowed to air-dry, and then resuspended in 1:1 Solvent A:ACN. The relative ratio of Solvent A and ACN varied depending on the solubility of the peptide. This sample could be immediately analyzed using RP-HPLC and MALDI-TOF. The peptide could also be flash-frozen and lyophilized before analysis, and lyophilized peptides could be stored at -80 °C.

For preparative cleavages, the procedure was almost identical. However, cleavage was allowed to proceed for at least 3 h instead of 2 h. TFA was eluted into a 50 mL Corning centrifuge tube (with plug-cap). After elution, fresh TFA was added to the resin and the column was shaken for 1 min. The TFA was eluted into the same 50 mL tube. The TFA wash was repeated two more times to ensure maximum peptide extraction.

Synthesis and purification of H4 peptides

H4-A (acSer1-Leu37)-Nbz

ac-SGRGKGGKGLGKGGAKRHRKVLRDNIQGITKPAIRRL-Nbz

H4-A peptide was synthesized with 0.05 mmol of mFmoc-Dbz(Alloc) resin. The N-terminus was acetylated to mimic the constitutive N-terminal acetylation of eukaryotic H4. Alloc deprotection was followed by Nbz conversion in DCM, which resulted in several additional products.

For purification, 25 mg of lyophilized crude peptide was dissolved in 4 mL of 15% Solvent B. The sample was centrifuged, and the supernatant was loaded on semi-preparative RP-HPLC using a gradient of 15-30 % solvent B over 40 minutes. If the total crude peptide was more than 25 mg, two or more purification runs were performed. No more than 25 mg was loaded for each semi-preparative run. Collected fractions were assessed by MALDI-TOF MS and RP-HPLC. Pure fractions were combined and lyophilized.

H4-A-K5ac,K12ac (Ser1-Leu37)-Nbz(formyl)-Arg

SGRGKacGGKGLGKacGGAKRHRKVLRDNIQGITKPAIRRL-Nbz-R

H4-A-K5ac,K12ac peptide was synthesized with 0.04 mmol of mFmoc-Dbz(Alloc)-R resin. The N-terminal Ser was added as Fmoc-Ser(tBu)-OH. After Alloc deprotection, Nbz conversion was carried out in dry DMF to generate the Nbz(formyl) derivative, and the N-terminal Fmoc was removed by treatment with 1% DBU in DMF for three minutes.

For purification, 25 mg of lyophilized crude peptide was dissolved in 4 mL of 15% Solvent B. The sample was centrifuged, and the supernatant was loaded on semi-preparative RP-HPLC using a gradient of 10-30 % solvent B over 40 minutes.

H4-B (Thz38-Gly56)-Nbz

Thz-RRGGVKRISGLIYEETRG-Nbz

H4-B peptide was synthesized with 0.05 mmol of mFmoc-Dbz(Alloc) resin.

For purification, 50 mg of lyophilized crude peptide was dissolved in 4 mL of 20% Solvent B. The sample was centrifuged, and the supernatant was loaded on preparative RP-HPLC using a gradient of 20-35 % solvent B over 40 minutes. If the crude peptide was more than 50 mg, two or more purification runs were performed. No more than 50 mg was loaded for each semi-preparative run.

H4-H (Pen57-H75)-Nbz(formyl)-Arg

dmThz-LKVFLENVIRDAVTYTEH-Nbz(formyl)-R

H4-H was synthesized with 0.05 mmol of unprotected Dbz-Arg resin. Histidine was loaded using minimal racemization conditions. Alloc deprotection was performed using dry DMF.

For purification, 50 mg lyophilized crude peptide was resuspended in 7 mL Solvent B. The sample was vortexed for 5 min, and 1 mL Solvent A was added. After vortexing for another 5 min, another 1 mL Solvent A was added. This process was repeated until most of the peptide was dissolved. Then 3 mL of Solvent A was added at a time, vortexing in between each addition, until the total volume of the sample reached 20 mL. The sample was centrifuged, and supernatant was loaded on preparative RP-HPLC using a gradient of 35-50 % solvent B over 40 minutes.

H4-C (Thz76-Gly102-HMBA-Arg-Gly)-Nbz Met84Nle

H4-C peptide was synthesized with 0.05 mmol of Gly-HMBA-Arg-Gly-Dbz(Alloc) resin. Nle was used in place of Met in order to avoid oxidation.

For purification, 50 mg lyophilized peptide was dissolved in 4 mL of 30% Solvent B. The sample was centrifuged, and supernatant was loaded on preparative RP-HPLC using a gradient of 30-45 % solvent B over 40 minutes.

H4-C-K91ac (Thz76-Gly102-HMBA-Arg-Gly)-Nbz

Thz-KRKTVTANleDVVYALKacRQGRTLYGFGG-HMBA-RG-Nbz

H4-C-K91ac peptide was synthesized with 0.05 mmol of Gly-HMBA-Arg-Gly-Dbz(Alloc) resin.

For purification, 50 mg lyophilized peptide was dissolved in 4 mL of 30% Solvent B. The sample was centrifuged, and supernatant was loaded on preparative RP-HPLC using a gradient of 30-45 % solvent B over 40 minutes.

H4-(76-102)-K79ac (Thz76-Gly102) for semi-synthetic H4-K79ac

Peptide was synthesized using 0.05 mmol of 100-200 mesh Gly-Wang155 resin (Novabiochem) using standard SPPS methods. The N-terminal residue was added as Boc-Thz-OH. Cleaved and lyophilized peptide was dissolved in 0.4 M methoxylamine in 1:1 H2O:ACN. The Thz deprotection was allowed to proceed for at least 2 h, and reaction completion was assessed by MALDI-TOF MS.

Solvent B was added to the peptide solution so that the final percentage of ACN was 27%. The sample was centrifuged, and the volume corresponding to 50 mg of crude peptide was loaded on preparative RP-HPLC using a gradient of 30-45 % solvent B over 40 minutes.

Synthesis and purification of CENP-A peptides

CpA-1 (Gly2-Gly34)-Nbz

GPRRRSRKPEAPRRRSPSPTPTPGPSRRGPSLG-Nbz

CpA-1 peptide was synthesized with 0.05 mmol of mFmoc-Dbz(Alloc) resin. The N-terminal Gly was added as Fmoc-Gly-OH. After Alloc deprotection and Nbz conversion, Fmoc was removed using 1% DBU before cleavage from the resin with TFA.

For purification, 50 mg lyophilized peptide was dissolved in 4 mL of 12% Solvent B. The sample was centrifuged, and supernatant was loaded on preparative RP-HPLC using a gradient of 12-25% solvent B over 40 minutes.

CpA-2 (Thz35-Leu70)-Nbz(formyl)

Thz-SSHQHSRRRQGWLKEIRKLQKSTHLLIRKLPFSRL-Nbz(formyl)

CpA-2 peptide was synthesized with 0.05 mmol of mFmoc-Dbz(Alloc) resin with the automated synthesizer. The N-terminal residue was added as Boc-Thz-OH. Alloc was removed, and Nbz conversion was performed using dry DMF.

For purification, 50 mg lyophilized peptide was resuspended with 1.2 mL of Solvent B. The sample was vortexed, and 1 mL of Solvent A was added. The sample was vortexed and 1 mL of Solvent A was again added. The sample was vortexed until most peptide dissolved, and the volume of the sample was brought up to 4 mL using Solvent A. The sample was centrifuged, and supernatant was loaded on preparative RP-HPLC using a gradient of 30-60% solvent B over 40 minutes.

CpA-2 (Thz35-Leu70)-O-Cys(StBu)

The O to S resin was prepared manually using 0.05 mmol PEGA resin. Fmoc Rink linker was coupled using 4.4 eq Fmoc Rink linker, 4 eq HCTU, and 7.95 eq DIEA. After Fmoc deprotection, Fmoc-Cys(StBu)-OH was coupled using the same conditions. The resin was deprotected and a ninhydrin test was performed. The ninhydrin resin sample was kept for comparison purposes.

Using a plastic transfer pipette, the resin was transferred into a 20 mL scintillation vial on ice using cold 0.5 M HCl solution. 2 mL of HCl solution was used per gram of wet PEGA measured. While stirring the mixture with a small magnetic stir bar, 1.4 M KNO2 was added dropwise over a period of 20 min. The volume ratio of HCl to KNO2 was 2:1, so that the final concentrations were 0.33 M HCl and 0.47 M KNO2.
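The final concentrations quoted for this step follow from the 2:1 volume ratio of the two stock solutions. The short sketch below reproduces that arithmetic; it assumes the volumes are additive and ignores the volume of the wet resin, and the function name is illustrative.

```python
def mixed_concentrations(stocks_M, volumes):
    """Final molar concentration of each component after combining solutions.

    stocks_M: {name: stock concentration in M}
    volumes:  {name: volume of that stock, in any consistent unit}
    """
    total = sum(volumes.values())
    return {name: stocks_M[name] * volumes[name] / total for name in stocks_M}


# 0.5 M HCl and 1.4 M KNO2 combined at a 2:1 volume ratio, as in the text.
final = mixed_concentrations({"HCl": 0.5, "KNO2": 1.4}, {"HCl": 2.0, "KNO2": 1.0})
for name, conc in final.items():
    print(f"{name}: {conc:.2f} M")
```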

After adding the KNO2 solution, the vial was lightly capped, and the sample was allowed to stir at room temperature. Reaction completion was assessed by performing a ninhydrin test, and the resulting color was compared to the ninhydrin test before conversion. The reaction was typically complete after 4 h, and an almost clear solution was obtained from the ninhydrin test.

After confirming reaction completion, the resin was transferred back into the glass peptide reactor. The resin was flow-washed three times with water. A saturated solution of sodium bicarbonate was prepared by adding sodium bicarbonate to water until no more of the solute dissolved. This saturated solution was used to flow-wash the resin three times. The resin turned red upon the addition of the solution. The resin was incubated in the sodium bicarbonate solution for 15 minutes. The resin was flow-washed again with water, and then flow-washed with DMF.

The symmetric anhydride coupling method was used to load the first amino acid. The resin was then loaded on the synthesizer to couple the remaining residues. The peptide was cleaved with 95:2.5:2.5 TFA:H2O:TIS for 3 h.

For purification, the lyophilized crude peptide was dissolved in 4 mL of 30% Solvent B. The sample was centrifuged, and supernatant was loaded on semi-preparative RP-HPLC using a gradient of 30-60% solvent B over 40 minutes.

CpA-3 (Thz71-Ala97)-Nbz-Arg-Arg

Thz-REISVKFTRGVDFNWQAQALLALQEA-Nbz-RR

CpA-3 peptide was synthesized with 0.05 mmol of mFmoc-Dbz(Alloc)-RR resin with the automated synthesizer. The two Arg tags were added for improved solubility. The N-terminal amino acid was added as Boc-Thz-OH.

For purification, 50 mg lyophilized crude peptide was resuspended in 6 mL Solvent B. The sample was vortexed for 5 min, and 1 mL Solvent A was added. After vortexing for another 5 min, another 1 mL Solvent A was added. This process was repeated until most of the peptide was dissolved. Then 3 mL of Solvent A was added at a time, vortexing in between each addition, until the total volume of the sample reached 20 mL. The sample was centrifuged, and supernatant was loaded on preparative RP-HPLC using a gradient of 30-45 % solvent B over 40 minutes.

CpA-4 (Thz98-H115)-Nbz-R

Thz-EAFLVHLFEDAYLLTLH-Nbz-R

CpA-4 peptide was synthesized with 0.05 mmol of Dbz-R resin. Fmoc-His(Trt)-OH was coupled manually using the minimal racemization method. The N-terminal amino acid was added as Boc-Thz-OH.

For purification, 50 mg lyophilized crude peptide was resuspended in 7 mL Solvent B. The sample was vortexed for 5 min, and 1 mL Solvent A was added. After vortexing for another 5 min, another 1 mL Solvent A was added. This process was repeated until most of the peptide was dissolved. Then 3 mL of Solvent A was added at a time, vortexing in between each addition, until the total volume of the sample reached 20 mL. The sample was centrifuged, and supernatant was loaded on preparative RP-HPLC using a gradient of 35-60 % solvent B over 40 minutes.

CpA-5-K124ac (Thz116-Gly140-HMBA-Arg-Gly)-Nbz

Thz-GRVTLFPKacDVQLARRIRGLEEGLG-HMBA-RG-Nbz

CpA-5 peptide was synthesized with 0.05 mmol of Gly-HMBA-Arg-Gly-Dbz(Alloc) resin. The Gly was coupled to the HMBA using the symmetric anhydride method.

For purification, 50 mg lyophilized crude peptide was dissolved in 4 mL of 30% Solvent B. The sample was centrifuged, and the supernatant was loaded on preparative RP-HPLC using a gradient of 30-55 % solvent B over 40 minutes.

SP-NCL

Buffers

SP Wash Buffer: 0.1 M Phosphate, 6 M GuHCl, pH 7

SP Deprotection Buffer: 0.1 M Phosphate, 0.4 M Methoxylamine, 6 M GuHCl, pH 4

SP Ligation Buffer: 0.1 M Phosphate, 0.05 M MPAA, 6 M GuHCl, pH 7


Synthesis note: after addition of the H4-C-HMBA-RG and CpA-5-HMBA-RG peptides, the resin should not be stored in methanol due to susceptibility of the HMBA linker to methanolysis. All ligations and deprotections, whether on solid-phase or in solution, were conducted at room temperature unless otherwise indicated.

Base resin synthesis for SP-NCL

All base resins were prepared manually using 0.2 mmol/g PL-PEGA resin. Theoretical loading in methanol was 0.01 mmol/mL. PEGA resin was transferred into a graduated Poly-prep chromatography column using methanol, and the starting bed volume of PEGA in methanol was recorded. The bed volume of the resin swollen in methanol was taken as the volume of the resin. The resin was then transferred to a glass peptide reactor using DMF and flow-washed four times with DMF. The resin was then swelled in DMF for 15 min.

In order to reduce steric crowding, the loading of the resin was reduced 10-fold to 0.02 mmol/g by coupling of the resin with a mixture of 1:9 Fmoc-Gly-OH:Boc-Gly-OH.

Standard coupling excesses were used for the other residues after the loading cut. Rink linker was loaded as Fmoc-Rink linker, and the N-terminal Thz was added as Fmoc-Thz-OH. After synthesis, the resin was transferred back into the graduated chromatography column and washed with methanol. The new bed volume in methanol was recorded. Methanol was added to cover the resin, and the resin was stored at 4 °C. The sequences of the base resins used are listed below along with their theoretical loading in methanol. The Gly where the loading cut was performed is indicated in bold.

H4 SP-NCL: Fmoc-Thz-Ala-Rink-Gly-Gly-PEGA (1.66 µmol/mL loading in methanol)

CENP-A SP-NCL: Fmoc-Thz-Ala-Gly-Gly-Rink-Gly-PEGA (1.66 µmol/mL)

H4 and CENP-A Hybrid NCL: Fmoc-Thz-Ala-Ahx-Rink-Gly-PEGA (1.33 µmol/mL)

Theoretical loading was calculated with the following equation:

$$\text{Loading per mL} = \frac{0.01\ \text{mmol/mL} \times (\text{initial volume of PEGA in methanol}) \times 0.1}{\text{final volume of PEGA in methanol}}$$
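The loading calculation can be sketched in Python. This is a minimal sketch that reads the formula as the base loading in methanol (0.01 mmol/mL) multiplied by the initial bed volume and by the 10-fold loading cut (0.1), divided by the final bed volume after synthesis. The function name and the example bed volumes are hypothetical and were not taken from the original work.

```python
def theoretical_loading_umol_per_mL(initial_bed_mL, final_bed_mL,
                                    base_loading_mmol_per_mL=0.01,
                                    loading_cut=0.1):
    """Theoretical loading of the base resin in methanol, in umol per mL of bed."""
    # Total reactive sites remaining after the 1:9 Fmoc-Gly:Boc-Gly loading cut.
    mmol_sites = base_loading_mmol_per_mL * initial_bed_mL * loading_cut
    # Spread those sites over the bed volume measured after synthesis.
    return 1000.0 * mmol_sites / final_bed_mL


# Hypothetical bed volumes: 5 mL before synthesis, 3 mL after synthesis.
loading = theoretical_loading_umol_per_mL(5.0, 3.0)
print(f"{loading:.2f} umol/mL")
```

If the bed volume does not change during synthesis, the function returns 1 µmol/mL, which is simply the base loading after the 10-fold cut.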

Quantification of product from dry PEGA resin

If the final dry weight of PEGA resin after ligation is known, theoretical starting weight of the PEGA resin can be calculated using the following equation:

$$\text{Starting weight} = \frac{\text{Final weight}}{1 + \left(0.02\ \frac{\text{mmol}}{\text{g}}\right)(MW_{\text{total}})}$$

Where MWtotal is total molecular weight of all the components added on the resin, starting from the first Gly to the full SP-NCL peptide. Once the starting weight is calculated, the theoretical yield of the cleaved peptide is calculated using the following equation:

$$\text{Theoretical yield} = (\text{Starting weight})\left(0.02\ \frac{\text{mmol}}{\text{g}}\right)(MW_{\text{peptide}})$$

Where MWpeptide is the molecular weight of the cleaved peptide product.
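The two-step calculation (starting weight from final dry weight, then theoretical yield) can be sketched as follows. The sketch assumes that molecular weights are supplied in g/mol and converts them to g/mmol so that the term 0.02 mmol/g x MW is dimensionless; this unit handling is an interpretation and is not spelled out in the text. The function names and the example numbers are hypothetical.

```python
SITE_LOADING_MMOL_PER_G = 0.02  # resin loading after the 10-fold cut, from the text


def starting_resin_weight_g(final_weight_g, mw_total_g_per_mol):
    """Back-calculate the starting dry resin weight from the final dry weight."""
    # Mass added per gram of starting resin: (mmol/g) x (g/mmol).
    added_per_g = SITE_LOADING_MMOL_PER_G * mw_total_g_per_mol / 1000.0
    return final_weight_g / (1.0 + added_per_g)


def theoretical_yield_mg(starting_weight_g, mw_peptide_g_per_mol):
    """Theoretical yield of cleaved peptide in mg."""
    mmol_sites = starting_weight_g * SITE_LOADING_MMOL_PER_G
    return mmol_sites * mw_peptide_g_per_mol  # mmol x g/mol gives mg


# Hypothetical example: 100 mg final dry resin, 12000 g/mol attached in total,
# 11300 g/mol cleaved peptide.
start_g = starting_resin_weight_g(0.100, 12000.0)
yield_mg = theoretical_yield_mg(start_g, 11300.0)
print(f"starting resin: {start_g * 1000:.1f} mg, theoretical yield: {yield_mg:.1f} mg")
```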


Fmoc deprotection

The base resin was transferred to a Bio-spin column (Bio-Rad) using methanol. The volume of the base resin in methanol was recorded in order to calculate the theoretical yield. The resin was flow-washed with DMF and swelled by incubating in DMF for 20 minutes. The DMF was drained to just above the resin bed, and one column volume of 20% piperidine in NMP was added. The resin was incubated for 5 minutes. The piperidine was drained, and the resin was flow-washed with one column volume of 20% piperidine. The piperidine was drained to just above the resin bed, and the deprotection step was repeated two more times. The resin was then washed with 5 column volumes of DMF, followed by 5 column volumes of methanol, and then 5 column volumes of water. The resin was nutated in water for 10 minutes. The resin was then flow-washed with 3 column volumes of SP Wash Buffer and nutated in SP Wash Buffer for 5 minutes. The flow-wash/nutation step was repeated 3 more times.

Thz deprotection

SP Wash Buffer was drained, and the resin was flow-washed with three column volumes of SP Deprotection Buffer. The resin was nutated in SP Deprotection Buffer. After 1 h, the buffer was drained, and SP Deprotection Buffer was added again to the resin. The buffer was replaced at least two more times, with reaction proceeding for a total of 5 hours (thiazolidine opening to Cys) or 12 hours (dimethyl-Thz opening to penicillamine). MALDI-TOF MS of micro-cleavages was used to assess reaction completion.

Ligation

The flow-wash/nutation step was performed on the resin at least three times with SP Wash Buffer. To reverse any disulfide formation, the resin was incubated with SP Wash Buffer + 10 mM TCEP for 10 minutes and then washed again with SP Ligation Buffer. The buffer was then drained from the resin. Peptide-Nbz was dissolved in SP Ligation Buffer + 20 mM TCEP, and added to the resin. The concentration of peptide was kept between 1-3 mM for optimal kinetics, and ligation was allowed to proceed for at least 6 h.
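Keeping the peptide-Nbz between 1 and 3 mM fixes the volume of ligation buffer for a given amount of peptide. The following is a minimal sketch of that conversion; the peptide mass and molecular weight in the example are hypothetical, and the function name is illustrative.

```python
def ligation_volume_uL(peptide_mg, mw_g_per_mol, target_mM):
    """Buffer volume in uL that brings peptide_mg of peptide to target_mM."""
    umol = peptide_mg / mw_g_per_mol * 1000.0  # mg / (g/mol) = mmol; x1000 gives umol
    return umol / target_mM * 1000.0           # umol / (umol/mL) = mL; x1000 gives uL


# Hypothetical peptide: 5 mg with a molecular weight of 4000 g/mol.
for conc_mM in (1, 2, 3):
    print(f"{conc_mM} mM: {ligation_volume_uL(5.0, 4000.0, conc_mM):.0f} uL")
```

The calculation treats the weighed mass as pure peptide; counter-ions and residual water in lyophilized material would lower the true concentration.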

Micro-cleavages to assess reaction progress

To assess the progress of ligation and deprotection, approximately 1% of the resin was cleaved as follows. A resin sample was taken using a P200 micro-pipette with a cut-off tip, and transferred into a 0.8 mL Micro bio-spin column. The resin was washed with SP Wash Buffer + 20 mM TCEP and then with water. TFA was added to the resin and incubated for 30 min; residual water in the resin sample acted as a scavenger. The supernatant was collected by filtration, and the TFA was concentrated using N2 flow. 7:3 H2O:ACN was added to the sample in order to dilute the TFA, and the sample was analyzed by RP-HPLC and MALDI-TOF MS. TCEP was added prior to RP-HPLC analysis. In an alternative method, an ether wash can be performed on the sample. Eluted TFA was diluted with cold diethyl ether, and the sample was spun down using a microcentrifuge. The ether was carefully decanted, and the residual ether was allowed to evaporate. The sample was then resuspended and analyzed by RP-HPLC. Ether is effective at removing TFA, preventing a high concentration of TFA from damaging the RP-HPLC column.

On-resin desulfurization

After lyophilization, the resin was resuspended in buffer containing 0.1 M Phosphate, 5 M GuHCl, 75 mM MESNa, 300 mM TCEP, pH 7.4 that had been sparged with argon for 30 minutes. VA-044-US was added to 10 mM, and the column was incubated at 42 °C. The reaction was monitored by micro-cleavages followed by MALDI-TOF MS. Desulfurization was allowed to proceed overnight.

Cleavage from the resin at the HMBA linker

The resin was washed (flow-wash/nutation cycle) 3 times with SP Wash Buffer, 1 time in SP Wash Buffer + 10 mM TCEP, followed by 4 repetitions of flow-wash/nutation with water. The resin was then lyophilized. The peptide-resin was resuspended in 0.1 M NaOH, and the reaction was allowed to proceed for 30 minutes. The eluate was collected via filtration. To neutralize the solution, an equal volume of 0.1 M HCl was added to the resin, and this eluate was combined with the first eluate. The resin was washed 3 times with TFA, and the filtrate was collected in a separate vessel. The neutralized base-treated sample and the TFA wash sample were lyophilized separately. The NaCl generated by the neutralization was removed by adding a small amount of water (no more than 50 µL). The sample was spun down, and the supernatant containing the salt was removed. The remaining pellet was resuspended in 1:1 H2O:ACN and lyophilized again to give the salt-free crude peptide.

Cleavage from the resin at the Rink linker

The resin was washed (flow-wash/nutation) 3 times with SP Wash Buffer, 1 time in SP Wash Buffer + 10 mM TCEP, followed by 4 repetitions of flow-wash/nutation with water. After lyophilization, the resin was treated with 95:2.5:2.5 TFA:H2O:TIS for 1 hour. The solution was then filtered, and the resin was washed three times with TFA. All eluate was combined and concentrated using a flow of N2. The sample was washed with ether, and the pellet was resuspended using a 7:3 H2O:ACN mixture and lyophilized.

SDS-PAGE

The formula for each of the solutions used in SDS-PAGE is listed in Appendix A. SDS loading buffer was added to the samples so that the final concentration was at least 2 x SDS loading buffer. The samples were boiled on a heat block for 5-10 min, and briefly centrifuged. The samples were then loaded in each well of the gel. For each well, a maximum of 15 µL could be loaded. The gel was run with 180 V for 40-50 min.

After the run was complete, the gel was transferred to a container containing Coomassie stain. The gel was stained for at least 2 h. The stain was then discarded, and destain was added to the gel. After 1 h, the destain was discarded, and fresh destain was again added. The gel was allowed to destain for no more than 12 h.

Solution-phase NCL of Hybrid-phase ligation

A 1.5-fold molar excess of peptide-Nbz relative to cysteinyl peptide was dissolved in SP Ligation Buffer, and TCEP was added to 20 mM. The final concentration of the cysteinyl peptide was 1 mM or more. Ligation proceeded for about 16 hours, and the reaction was monitored by SDS-PAGE and Ziptip followed by MALDI-TOF MS.

Preparation of SDS-PAGE samples by TCA precipitation

100 µL H2O and 25 µL TCA were added to 1-5 µL sample taken from ligation reactions in GuHCl. The sample was incubated at 4 °C for 10 minutes, and spun down using a microcentrifuge. The supernatant was decanted, and diethyl ether was added to wash the pellet. The sample was spun down again, and the ether was decanted. The sample was allowed to air-dry, and SDS loading buffer was added. No more than 0.5 µL 5 M NaOH was added to neutralize the sample.

Ziptip and MALDI-TOF MS

For crude MALDI-TOF MS of ligation and desulfurization samples, GuHCl, salt, and other buffer components can be removed using C18 ZipTip pipette tips (EMD Millipore). The tip was washed by pipetting up and discarding 10 µL Solvent A. 2-10 µL of sample was taken with the Ziptip, and the sample was pipetted up and down 5 times in order to assist the binding of the peptide. The tip was then washed 5 times with Solvent A. 2 µL of 30% elution buffer (7:3 Solvent A:ACN) and 2 µL of 70% elution buffer were prepared in separate tubes. The peptide was eluted by first pipetting up and down 5 times in the 30% elution buffer, dispensing back into the same tube. The Ziptip was then used to pipette up and down in the 70% elution buffer. 2 µL HCCA solution was mixed into both elution samples, and the two samples were spotted for MALDI-TOF MS.

Second Solution-phase NCL

For CENP-A, where a second ligation in solution was required, methoxylamine was added to the ligation sample to 0.4 M. The pH was adjusted to 4 in order to deprotect the Thz. The sample was then transferred to a D-tube dialyzer mini, molecular weight cut-off (MWCO) 3-6 kDa (Novagen), and dialyzed against 200 mL of SP Wash Buffer at 4 °C. A buffer change was performed after 5 h, where the dialysis tube was transferred to a fresh 200 mL of SP Wash Buffer, and the second dialysis was allowed to go overnight.

The sample was transferred into a new tube. The dialysis tube was washed with a small amount of SP Wash Buffer, and the wash was added to the dialyzed sample. The pH of the sample was increased to 10 using NaOH in order to cleave the HMBA. Cleavage was allowed to proceed for no more than 30 minutes in order to prevent epimerization of amino acid residues. The pH was brought down by the addition of MPAA to 50 mM. After adjusting the pH to 7.4, 1.5 eq CpA-1-Nbz was added to the solution. TCEP was added to the reaction to 20 mM, and the reaction was monitored by SDS-PAGE, Ziptip MALDI-TOF, and RP-HPLC. Ligation was allowed to proceed for 16 h.

Desulfurization

Before desulfurization, the ligation reaction was dialyzed against 200 mL of SP Wash Buffer at 4 °C. A buffer change was performed after 5 h, and the second dialysis was allowed to go overnight. TCEP and MESNa were added to the dialyzed sample to give the following final concentrations: 0.1 M Phosphate, 6 M GuHCl, 75 mM MESNa, 250 mM TCEP, pH 7. The sample was sparged with argon for 30 minutes. To start desulfurization, VA-044-US was added to a final concentration of 10 mM. The sample was incubated in a 42 °C water bath for 16 hours and assessed by RP-HPLC and Ziptip MALDI-TOF MS for completion. If protein precipitates during desulfurization, the sample after dialysis should be diluted at least 5-fold with desulfurization buffer before sparging.
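When several additives are brought to their final concentrations from liquid stocks, each addition dilutes the others, so the stock volumes have to be solved together. The sketch below assumes that TCEP and MESNa are both added from 1 M stock solutions; the text does not state how they were added at this step, so the stock concentrations, the sample volume, and the function name are all hypothetical.

```python
def stock_volumes(sample_mL, targets_mM, stocks_mM):
    """Volumes of stock solutions needed to hit all target concentrations at once.

    Returns (final_volume_mL, {name: stock_volume_mL}).
    """
    # Each stock occupies a fixed fraction of the final volume: target / stock.
    fraction = sum(targets_mM[name] / stocks_mM[name] for name in targets_mM)
    if fraction >= 1.0:
        raise ValueError("targets cannot be reached with these stock concentrations")
    final_mL = sample_mL / (1.0 - fraction)
    volumes = {name: final_mL * targets_mM[name] / stocks_mM[name] for name in targets_mM}
    return final_mL, volumes


# Hypothetical: 1 mL dialyzed sample, 1 M stocks of TCEP and MESNa.
final_mL, volumes = stock_volumes(
    1.0,
    targets_mM={"TCEP": 250.0, "MESNa": 75.0},
    stocks_mM={"TCEP": 1000.0, "MESNa": 1000.0},
)
print(f"final volume: {final_mL:.3f} mL")
for name, vol in volumes.items():
    print(f"{name}: {vol * 1000:.0f} uL of stock")
```

Adding stocks this way also dilutes the GuHCl and phosphate in the sample, so the stocks would need to be prepared in the same buffer to hold those concentrations constant. Adding a reagent as a solid avoids the dilution.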

Purification of Synthetic histones

ACN was added to the desulfurization sample to 25% of the total volume. The sample was centrifuged and loaded onto an analytical column using a 25-70% B gradient over 50 min for H4, and a 25-90% B gradient (25-40% over 10 min, then 40-90% over 40 min) for CENP-A. Fractions were analyzed by MALDI-TOF MS and SDS-PAGE. Purity was assessed by MALDI-TOF MS, SDS-PAGE, and RP-HPLC. Pure fractions were combined and lyophilized.
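The two gradients described above are piecewise linear, and their composition at any time point can be sketched as follows. The segment values come from the text; the function name and the tuple layout are illustrative, and the sketch holds the final composition after the last segment ends.

```python
def percent_b(t_min, segments):
    """Percent Solvent B at time t_min for a piecewise-linear gradient.

    segments: list of (duration_min, start_percent, end_percent) tuples.
    """
    elapsed = 0.0
    for duration, start, end in segments:
        if t_min <= elapsed + duration:
            return start + (end - start) * (t_min - elapsed) / duration
        elapsed += duration
    return segments[-1][2]  # hold the final composition after the last segment


H4_GRADIENT = [(50, 25, 70)]                   # 25-70% B over 50 min
CENPA_GRADIENT = [(10, 25, 40), (40, 40, 90)]  # 25-40% over 10 min, then 40-90% over 40 min

for t in (0, 10, 30, 50):
    print(t, percent_b(t, H4_GRADIENT), percent_b(t, CENPA_GRADIENT))
```

The slopes are 0.9 % B per minute for H4, and 1.5 followed by 1.25 % B per minute for CENP-A.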


Refolding histone tetramer

H4-K5ac,K12ac,K91ac was desulfurized a second time before refolding. Lyophilized protein was dissolved in 100 µL of sparged desulfurization buffer (0.1 M Phosphate, 6 M GuHCl, 75 mM MESNa, 300 mM TCEP, pH 7). VA-044-US was added to a final concentration of 10 mM, and the reaction was allowed to proceed for 4 hours at 42 °C. Protein quality was assessed by MALDI-TOF MS. This sample was used directly for refolding without purification or dialysis. Equimolar recombinant H3-C110A was added directly to this mixture, and the mixture was placed into a dialysis button. A 6-8 kDa MWCO dialysis tubing containing 50 mL of 25 mM Tris, 6 M GuHCl, 1 mM EDTA, 2 M NaCl, pH 7.5 was prepared, and the button was inserted into the tubing. The tubing was then dialyzed against 25 mM Tris, 1 mM EDTA, 2 M NaCl, pH 7.5.156 Three buffer changes were performed over the course of 3 days.

After dialysis, the sample was purified over a GE Healthcare Superdex 20/300 SEC column in 25 mM Tris, 1 mM EDTA, 2 M NaCl, pH 7.5. All fractions were collected and assessed by SDS-PAGE. Pure tetramer fractions were pooled and concentrated using Amicon Ultra centrifuge filters, 5 kDa (EMD Millipore).

Refolding histone octamer

Histone octamer refolding was carried out with standard procedures similar to that of tetramer refolding156: equimolar histones were resuspended in 25 mM Tris, 1 mM EDTA, 6 M GuHCl, pH 7.5 and double-dialyzed extensively against 25 mM Tris, 1 mM EDTA, 2 M NaCl, pH 7.5. Octamer was purified using the same protocol described for tetramer. Concentration of the octamer was determined by A280 using the total extinction coefficient of the four histones.

Nucleosome reconstitution

1.1 molar equivalent of 601 DNA157 (1:9 cy5-labeled:unlabeled DNA) was added to histone octamer in 10 mM Tris-HCl, 2 M NaCl, 1 mM EDTA, pH 7.4, and was dialyzed against 10 mM Tris-HCl, 1 mM EDTA, pH 7.4 overnight. Samples were loaded on PAGE and visualized using Typhoon imager.
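The DNA amounts for a reconstitution follow from the 1.1 molar equivalents and the 1:9 labeled:unlabeled ratio. A minimal sketch, with a hypothetical octamer amount and an illustrative function name:

```python
def dna_amounts_pmol(octamer_pmol, dna_equiv=1.1, labeled_parts=1, unlabeled_parts=9):
    """Total, labeled, and unlabeled 601 DNA (pmol) for a given amount of octamer."""
    total = octamer_pmol * dna_equiv
    labeled = total * labeled_parts / (labeled_parts + unlabeled_parts)
    return {"total": total, "labeled": labeled, "unlabeled": total - labeled}


# Hypothetical reconstitution with 100 pmol of histone octamer.
dna = dna_amounts_pmol(100.0)
for kind, pmol in dna.items():
    print(f"{kind}: {pmol:.1f} pmol")
```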

For Nap1-assisted reconstitution, octamer in refolding buffer was added to 7.5 mM Tris, pH 7.4, 0.25 mM EDTA, 0.25 mM DTT, 0.1 mg/mL bovine serum albumin (BSA). His6-tagged yNap1 was added to the solution to a final concentration of 0.7 µM and incubated at 37 °C for 15 min. 1.1 molar equivalent of 1:9 cy5-labeled:unlabeled 601 DNA was then added, and the sample was incubated at 37 °C for 45 minutes.

Native PAGE

The gel was pre-run for 1 h in 0.3 x TBE at a constant 300 V before loading. The wells were flushed immediately before loading. Ficoll was added to the reconstituted sample to 1x. Samples were loaded with the gel still running. Fluorescence was measured using the Typhoon imager.

His6-tagged CENP-A expression

His6-tagged CENP-A in pHCE vector was expressed in DH5α cells following the procedure described in Tanaka et al.158 A glycerol stock of DH5α transformed with pHCE was inoculated in 5 mL LB Amp media and grown for 24 h. The culture was then transferred to 500 mL of LB Amp media and grown for 16 h. The culture was centrifuged at 3000 rpm for 20 min at 4 °C, and the pellet was resuspended in buffer containing 50 mM Tris-HCl (pH 8) and 500 mM NaCl. The cells were lysed by sonication, and the sample was spun down at 23000 rpm for 20 min at 4 °C. The pellet was resuspended in buffer containing 50 mM Tris-HCl (pH 8), 500 mM NaCl, and 6 M urea and shaken for 2 h to solubilize the CENP-A protein. The sample was again spun down at 23000 rpm for 20 min at 4 °C. The supernatant containing CENP-A was separated from the pellet. The solubilization process was repeated on the remaining pellet. The two fractions of supernatant were added to an Econo-Pac chromatography column (Bio-Rad) containing Ni-NTA resin. The column was nutated at 4 °C for 1 h. Flowthrough was collected, and the column was washed with buffer containing 50 mM Tris-HCl (pH 8), 500 mM NaCl, 6 M urea, and 20 mM imidazole, and eluted with 300 mM imidazole. Fractions were assessed for purity by SDS-PAGE. Pooled fractions were dialyzed against water overnight, and the sample was lyophilized. The product was confirmed by MALDI-TOF MS. When recombinant CENP-A was dissolved in RP-HPLC Solvent A, a TFA adduct was observed on MALDI-TOF MS. In order to eliminate this species, the sample was instead dissolved in 0.1% formic acid in water and lyophilized.

Expressed Protein Ligation of H4-K79ac

H4(1-75)-intein-CBD Expression

5 mL LB Amp media was inoculated using a glycerol stock of BL21 (DE3) containing H4(1-75)-intein(Mxe GyrA)-CBD in pTXB1 vector (NEB). The overnight culture was grown at 37 °C for 16 h. The 5 mL culture was added to 500 mL of LB Amp media (a typical expression was composed of 5 flasks of 500 mL media for a 2.5 L expression). The cells were induced using 0.2 mM IPTG once the optical density (OD600) reached 0.4. 4 h after induction, the cells were centrifuged at 3000 rpm for 20 min at 4 °C. Media was discarded, and the cell pellet was resuspended in lysis buffer (25 mM 4-(2-hydroxyethyl)-1-piperazineethane-sulfonic acid (HEPES), pH 7.5, 1 mM ethylenediaminetetraacetic acid (EDTA), 1 M NaCl, and 1 mM phenylmethylsulfonyl fluoride (PMSF)), and stored at -80 °C.

Cells were thawed at room temperature and lysed using sonication. The lysate was centrifuged at 23000 rpm for 20 min at 4 °C. The pellet was washed with Triton wash buffer (25 mM HEPES, pH 7.5, 1 mM EDTA, 1 M NaCl, 1% Triton-X), shaken for 1 h, and centrifuged at 23000 rpm for 20 min at 4 °C. The Triton wash was repeated and the sample was again centrifuged. The pellet was washed with 25 mM HEPES, pH 7.5, 1 mM EDTA, 1 M NaCl, centrifuged, and decanted.

250 µL DMSO was added to the pellet, and the pellet was minced with a spatula. The pellet was allowed to soak in DMSO at room temperature for 30 min. 30 mL of 25 mM HEPES, pH 7.5, 6 M Urea, 1 mM EDTA, and 0.5 M NaCl was added to the pellet. The pellet was shaken for 1 h, and centrifuged at 23000 rpm for 20 min at 25 °C. Supernatant was collected, and this extraction step was repeated. The first supernatant was combined with the second supernatant.

H4(1-75)-intein-CBD Purification

The sample was split in half, and each half was purified by ion exchange over a 5 mL SP-FF column. Sample was loaded on the column using a peristaltic pump. The column was washed with 100 mM NaCl in HEPES Urea buffer (25 mM HEPES, pH 7.5, 6 M Urea, 1 mM EDTA). Product was eluted using 200 mM, 300 mM, 400 mM, and 500 mM NaCl in HEPES Urea buffer. All eluents were analyzed by SDS-PAGE. Fractions containing H4(1-75)-intein-CBD were combined.

Thiolysis to produce H4(1-75)-SR

The sample was diluted to 50 mL using refolding buffer (25 mM HEPES, pH 7.5, 6 M Urea, 1 mM EDTA, 1 M NaCl), and was dialyzed overnight against 4 L of refolding buffer at 4 °C. MESNa was added to the sample to a final concentration of 100 mM. The sample was nutated at 4 °C for 18-24 h. The amount of H4(1-75) thioester was estimated based on the A280 and SDS-PAGE quantification.132 Below are the extinction coefficients (ε) of the relevant species in M-1cm-1.

H4(1-75)-intein-CBD: 37320

Intein-CBD: 34760

H4(1-75)-SR: 2560
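
As a rough illustration of how these coefficients could enter the estimate, the Python sketch below applies the Beer-Lambert law (A = ε·c·l). It is not the procedure used in this work: the workflow (total fusion protein from A280, cleaved fraction from SDS-PAGE), the 1 cm path length, the function names, and the example readings are all assumptions made for illustration. The sketch also checks that the coefficient of the fusion protein equals the sum of the intein-CBD and H4(1-75)-SR coefficients listed above.

```python
# Extinction coefficients at 280 nm (M^-1 cm^-1), copied from the list above.
EPSILON_280 = {
    "H4(1-75)-intein-CBD": 37320,
    "Intein-CBD": 34760,
    "H4(1-75)-SR": 2560,
}


def concentration_molar(a280, epsilon, path_cm=1.0):
    """Beer-Lambert law, A = epsilon * c * l, solved for c (mol/L)."""
    return a280 / (epsilon * path_cm)


def thioester_estimate_molar(a280, fraction_cleaved, path_cm=1.0):
    """Assumed workflow (hypothetical): A280 gives the total fusion protein,
    and SDS-PAGE densitometry gives the fraction converted to thioester."""
    total = concentration_molar(
        a280, EPSILON_280["H4(1-75)-intein-CBD"], path_cm
    )
    return total * fraction_cleaved


# The fusion-protein coefficient is the sum of its two parts.
parts = EPSILON_280["Intein-CBD"] + EPSILON_280["H4(1-75)-SR"]
print("additive:", parts == EPSILON_280["H4(1-75)-intein-CBD"])

# Made-up example readings, for illustration only.
estimate = thioester_estimate_molar(a280=0.75, fraction_cleaved=0.6)
print(f"estimated thioester: {estimate * 1e6:.1f} uM")
```

Because the coefficients are additive, thiolysis itself should not change the total protein absorbance, which is why the sketch takes the cleaved fraction from a separate measurement.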

Ligation

H4(76-102)-K79ac peptide was added to the thioester solution at 10 eq relative to the calculated amount of H4(1-75)-SR. The final concentration of the K79ac peptide was 1 mM. Ligation was allowed to proceed overnight and was assessed by SDS-PAGE and by MALDI-TOF MS of the Ziptip-prepared crude sample.
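
The stoichiometry above implies a thioester concentration of about 0.1 mM (1 mM peptide at 10 eq). The short Python sketch below works through that arithmetic. The function names, the reaction volume, and the peptide molecular weight used in the example are hypothetical placeholders, not values from this work.

```python
def thioester_conc_mM(peptide_conc_mM, equivalents):
    """Thioester concentration implied by the peptide concentration and excess."""
    return peptide_conc_mM / equivalents


def peptide_mass_mg(peptide_conc_mM, volume_mL, mw_g_per_mol):
    """Mass of peptide needed for a given final concentration and volume.

    mM * mL gives micromoles; micromoles * g/mol gives micrograms.
    """
    micrograms = peptide_conc_mM * volume_mL * mw_g_per_mol
    return micrograms / 1000.0


# 1 mM peptide at 10 eq, as stated in the text.
print(thioester_conc_mM(1.0, 10), "mM thioester")

# Hypothetical reaction volume and molecular weight, for illustration only.
print(peptide_mass_mg(1.0, volume_mL=2.0, mw_g_per_mol=3000.0), "mg peptide")
```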

Desulfurization

Before desulfurization, the sample buffer was exchanged to 25 mM HEPES, pH 7.5, 6 M GuHCl, 1 mM EDTA, 1 M NaCl using a concentrator (Amicon centrifugal filter, 5 kDa). MESNa was added to 75 mM, and 1 M TCEP (pH 7.4) was added to 300 mM. The sample was sparged with argon for 30 min. 0.5 M VA-044-US was added to 10 mM, and the sample was incubated in a 42 °C water bath. Desulfurization was allowed to proceed overnight. Complete desulfurization was confirmed by RP-HPLC and by MALDI-TOF MS of the Ziptip-prepared sample.
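
The reagent additions above are stock dilutions. As a minimal sketch, the Python function below computes the volume of stock to add so that the final concentration is reached in the combined volume. The sample volumes are hypothetical, and each addition is treated independently (the small dilution of previously added components is ignored), which is a simplifying assumption.

```python
def stock_volume_to_add(sample_volume, stock_conc, final_conc):
    """Volume of stock v such that stock_conc * v / (sample_volume + v) == final_conc.

    Concentrations share one unit (for example mM); the result has the
    same unit as sample_volume.
    """
    if final_conc >= stock_conc:
        raise ValueError("final concentration must be below the stock concentration")
    return sample_volume * final_conc / (stock_conc - final_conc)


# Hypothetical 700 uL sample: 1 M TCEP stock added to a final 300 mM.
print(stock_volume_to_add(700.0, stock_conc=1000.0, final_conc=300.0), "uL TCEP stock")

# Hypothetical 490 uL sample: 0.5 M VA-044-US stock added to a final 10 mM.
print(stock_volume_to_add(490.0, stock_conc=500.0, final_conc=10.0), "uL VA-044-US stock")
```

Note that reaching 300 mM TCEP from a 1 M stock adds roughly 43% of the starting sample volume, so the protein is diluted noticeably by this step.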

Purification

Desulfurized H4-K79ac protein was purified by RP-HPLC using a 25-70 % Solvent B gradient. Fractions were analyzed using RP-HPLC, MALDI-TOF MS, and SDS-PAGE. Pure fractions were combined and lyophilized.

Quantification of Histone using UV-Vis spectroscopy

Lyophilized full-length histone was resuspended in 0.1 M phosphate and 6 M GuHCl, pH 7. Absorbance at 280 nm was measured, and the concentration was calculated using the Beer-Lambert law. The extinction coefficient ε was calculated using the following equation:

$$\varepsilon_{280} = 5690\,N_{\mathrm{Trp}} + 1280\,N_{\mathrm{Tyr}}$$

where $N_{\mathrm{Trp}}$ and $N_{\mathrm{Tyr}}$ are the numbers of Trp and Tyr residues, respectively.159 The calculated ε280 was 12660 M-1cm-1 for CENP-A and 5120 M-1cm-1 for H4.
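
The calculation can be sketched in a few lines of Python. The sequences are assembled from the peptide segments listed in Tables 1 and 3 (CENP-A residues 2-140 with the C75S mutation); the function names and the example absorbance are illustrative assumptions. With these sequences, the sketch gives 12660 M-1cm-1 for CENP-A (two Trp, one Tyr) and 5120 M-1cm-1 for H4 (four Tyr).

```python
EPS_TRP = 5690  # M^-1 cm^-1 per Trp residue
EPS_TYR = 1280  # M^-1 cm^-1 per Tyr residue

# H4 residues 1-102, concatenated from the segments in Table 1.
H4 = (
    "SGRGKGGKGLGKGGAKRHRKVLRDNIQGITKPAIRRL"
    "ARRGGVKRISGLIYEETRG"
    "VLKVFLENVIRDAVTYTEH"
    "AKRKTVTAMDVVYALKRQGRTLYGFGG"
)

# CENP-A residues 2-140 with C75S, concatenated from the segments in Table 3.
CENP_A = (
    "GPRRRSRKPEAPRRRSPSPTPTPGPSRRGPSLG"
    "ASSHQHSRRRQGWLKEIRKLQKSTHLLIRKLPFSRL"
    "AREISVKFTRGVDFNWQAQALLALQEA"
    "AEAFLVHLFEDAYLLTLH"
    "AGRVTLFPKDVQLARRIRGLEEGLG"
)


def epsilon_280(sequence):
    """Extinction coefficient from the Trp (W) and Tyr (Y) counts."""
    return EPS_TRP * sequence.count("W") + EPS_TYR * sequence.count("Y")


def concentration_micromolar(a280, epsilon, path_cm=1.0):
    """Beer-Lambert law solved for concentration, reported in micromolar."""
    return a280 / (epsilon * path_cm) * 1e6


print("CENP-A:", epsilon_280(CENP_A))
print("H4:", epsilon_280(H4))

# Made-up absorbance reading, for illustration only.
print(f"{concentration_micromolar(0.25, epsilon_280(H4)):.1f} uM H4")
```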

Results and Discussion

SP-NCL of H4

Figure 10 outlines the SP-NCL scheme of H4. After four rounds of ligation, H4-ABHC would be desulfurized on the solid-phase and cleaved at the HMBA. The native H4 would be produced with only one purification step overall.

Figure 10: SP-NCL Ligation Scheme for H4 (ref. 160). A ligation handle is anchored to the resin through a Rink linker. After deprotection, H4-C (bearing the C-terminal HMBA-Arg-Gly) is ligated to give H4-C0. Repeated deprotection and ligation steps give H4-HC0 and then ac-H4-ABHC0. Desulfurization followed by HMBA cleavage releases ac-H4.

Dual linker design for SP-NCL

In order to yield a product with a native carboxy terminus while enabling rapid cleavage and analysis of the peptide intermediates, we developed a dual linker strategy (Figure 11). The handle between the peptide and the resin contained two key linkers: 4-hydroxymethylbenzoic acid (HMBA)161 and Rink. H4-C was synthesized with a C-terminal HMBA. The ester bond between the HMBA and the C-terminal residue is relatively stable to SPPS and NCL conditions,162 but can be rapidly cleaved at pH 10 in aqueous solution, producing a carboxy terminus (Figure 11). In turn, we synthesized a short peptide containing an N-terminal Thz anchored to the resin with a Rink linker. After ligation of the H4-C peptide, the anchored peptide could be rapidly and efficiently cleaved with TFA for analysis using RP-HPLC and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS).

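Figure 11: Dual linker strategy for SP-NCL. Scheme labels: Fmoc SPPS/cleavage, piperidine, methoxylamine, and NCL of the H4-C peptide (C-terminal HMBA-Arg-Gly) to the Rink-anchored handle. Cleavage with NaOH at the HMBA gives H4-C with a free carboxy terminus; cleavage with TFA at the Rink linker gives H4-C0 as the C-terminal amide.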

The solid support has excellent swelling properties in TFA, expanding the resin pores and allowing free diffusion of peptide out of the resin. This combined with the high solubility of peptides in TFA effectively eluted most peptides. In order to denote the difference between the H4-C peptide cleaved at different linkers, H4-C will refer to the peptide cleaved at HMBA, while H4-C0 will refer to the peptide cleaved at Rink. H4-C peptide before ligation will be written as H4-C-HMBA-RG.

We chose PEGA resin as the solid support for its good swelling properties in both organic and aqueous solvents,163 allowing a reasonable pore size for the reagents and peptides to diffuse in and out of the resin. It is also stable in a broad range of pH values, which is important since Thz deprotection (pH 4), ligation (pH 7), and HMBA cleavage (pH 10) are all performed on resin. One downside is that the resin is relatively fragile, and repeated lyophilization can damage the resin, making storage and quantification of the resin relatively difficult. Proper resin handling techniques are described in the experimental methods section.

Steric crowding is known to occur as the length of the peptide increases upon ligation, preventing efficient reactions.152 In order to avoid this issue, we performed a one-tenth loading cut on the resin by coupling a 1:9 mixture of Fmoc-Gly and Boc-Gly.164 This reduced the reactive amine to 10% of the initial loading.

Synthetic peptide segments for H4

Sequences of the peptides synthesized for H4 are listed in Table 1. The initial development of H4 SP-NCL was done in collaboration with Dr. Santosh Mahto. For H3 total synthesis, in order to reduce the number of ligation steps in solution, each peptide fragment was long and challenging to synthesize.132 In contrast, none of the H4 peptides exceeded 40 residues, to ensure efficient synthesis of each peptide. Increasing the number of ligation steps without significant yield loss was part of the intended benefit of SP-NCL.

Table 1: H4 peptide segments (ref. 160)

Peptide   Residues   Peptide sequence
H4-A      1-37       SGRGKGGKGLGKGGAKRHRKVLRDNIQGITKPAIRRL
H4-B      38-56      ARRGGVKRISGLIYEETRG
H4-H      57-75      VLKVFLENVIRDAVTYTEH
H4-C      76-102     AKRKTVTAMDVVYALKRQGRTLYGFGG

Peptide        Residues   Synthesized peptide sequence
H4-A           1-37       ac-SGRGKGGKGLGKGGAKRHRKVLRDNIQGITKPAIRRL-Dbz(Alloc)
H4-B           38-56      Thz-ARRGGVKRISGLIYEETRG-Dbz(Alloc)
H4-H           57-75      dmThz-LKVFLENVIRDAVTYTEH-Dbz-R
H4-C-HMBA-RG   76-102     Thz-KRKTVTA-Nle-DVVYALKRQGRTLYGFGG-HMBA-RG-Dbz(Alloc)

We only chose alanine and valine at the split sites since Thz and dimethyl-Thz (dmThz), a precursor to Val, were both commercially available. We avoided Lys as the C-terminal residue of the peptide because the amine sidechain could react with activated MPAA thioester, forming a cyclized product.
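
The split-site criteria described above can be written as a few simple checks. The Python sketch below is illustrative only: the function name and the way the rules are encoded are assumptions, and the segment data are copied from Table 1. It checks that each residue range matches the sequence length, that segments are contiguous, that every segment after the first begins with Ala or Val, that no thioester-bearing segment ends in Lys, and that no segment exceeds 40 residues.

```python
# (name, first residue, last residue, native sequence), from Table 1.
H4_SEGMENTS = [
    ("H4-A", 1, 37, "SGRGKGGKGLGKGGAKRHRKVLRDNIQGITKPAIRRL"),
    ("H4-B", 38, 56, "ARRGGVKRISGLIYEETRG"),
    ("H4-H", 57, 75, "VLKVFLENVIRDAVTYTEH"),
    ("H4-C", 76, 102, "AKRKTVTAMDVVYALKRQGRTLYGFGG"),
]


def check_split_sites(segments, max_length=40):
    """Return a list of rule violations (empty if all checks pass)."""
    problems = []
    previous_end = None
    for index, (name, start, end, seq) in enumerate(segments):
        if len(seq) != end - start + 1:
            problems.append(f"{name}: length does not match residue range")
        if previous_end is not None and start != previous_end + 1:
            problems.append(f"{name}: not contiguous with previous segment")
        if len(seq) > max_length:
            problems.append(f"{name}: longer than {max_length} residues")
        # Ligation junctions: the N-terminal residue must be Ala or Val
        # (introduced as Thz or dmThz in the synthesized peptide).
        if index > 0 and seq[0] not in ("A", "V"):
            problems.append(f"{name}: N-terminal residue is not Ala or Val")
        # Thioester-bearing segments (all but the last) should not end in Lys.
        if index < len(segments) - 1 and seq[-1] == "K":
            problems.append(f"{name}: C-terminal Lys next to thioester")
        previous_end = end
    return problems


print(check_split_sites(H4_SEGMENTS) or "all split-site checks passed")
```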

Peptide H4-H had marginal solubility, so Arg was introduced as a solubility tag.165 The Met in H4-C was replaced by its isostere norleucine (Nle) in order to avoid Met oxidation.132 H4-B and H4-C were synthesized with N-terminal Thz, and H4-H was synthesized with N-terminal dmThz. In the initial SP-NCL, H4-A was acetylated at the N-terminus.166 All peptides were synthesized with a diaminobenzoic acid (Dbz) linker, a thioester precursor which will be discussed in more detail in the next sections.

Synthesis of a peptide segment with an α-thioester

Thioester is a key requirement of NCL, and preparing it is often a major challenge. The two most common types of SPPS are Fmoc and Boc SPPS, which refer to the N-terminal protecting groups of the amino acid. TFA is used to deprotect Boc,167 and HF is required for peptide cleavage from resin, which requires specialized equipment and experience. It is difficult to introduce some modifications, such as phosphorylation and , using Boc chemistry. On the other hand, these modifications are straight-forward with

Fmoc SPPS, and the chemicals required for Fmoc chemistry are relatively mild.168 Importantly, Fmoc SPPS is easily automated, allowing the rapid synthesis of multiple peptides in parallel.169 This is especially important for our studies, because it enables the preparation of histone peptide libraries with different PTMs. However, compared to Boc SPPS, it is harder to prepare the peptide thioester necessary for NCL using Fmoc SPPS.50 The thioester moiety is base-labile, and the piperidine treatment during the deprotection step can therefore cleave the peptide from the resin. For this reason, several methods have been introduced for the preparation of peptide thioesters compatible with Fmoc chemistry.170 These methods include the use of a milder deprotection reagent,171 a safety-catch linker,50 thioesterification of the fully protected peptide,172 O to S173 and N to S acyl shift methods,141,174 SEA ligation,142 and, more recently, peptide hydrazides.143

In our case, we prepare thioesters by synthesizing peptides on a 3,4-diaminobenzoic acid (Dbz) linker (Figure 12). The first amino acid is typically loaded on the amino group in the meta (3) position with respect to the carboxy substituent. The other amino group in the para (4) position is relatively deactivated, and its reactivity is further reduced after acylation of the meta-amino group due to steric and electronic effects.136 Once the peptide is complete, on-resin treatment with 4-nitrophenyl chloroformate (NPCF) converts the Dbz into N-acyl-benzimidazolinone (Nbz). The peptide-Nbz is then cleaved from the resin and purified by RP-HPLC. Nbz is rapidly displaced by thiols such as 4-mercaptophenylacetic acid (MPAA) to yield the desired peptide thioester.136 In order for the Nbz conversion to occur, the deactivated amino group of the Dbz must not be acylated.

Figure 12: Preparation of Thioester through the Dbz. Scheme steps: Alloc deprotection, Nbz conversion, cleavage, and thiolysis to give the peptide thioester (SR).

Despite the efficient conversion steps, coupling has been observed on the deactivated amine,175,176 particularly for Gly-rich sequences. Once an amino acid is coupled on the deactivated amine, subsequent addition of amino acids results in branched products. For several histone peptides with Gly-rich sequences, significant branched products are observed.175 To prevent this problem, in some syntheses the para-amine is protected using allyloxycarbonyl (Alloc).175,177 Alloc can be removed after synthesis before Nbz conversion (Figure 12). In addition, this orthogonal protecting group also allows for acetyl capping, which is typically performed after each coupling step to acetylate the unreacted amine species. This prevents the accumulation of truncated peptides. Acetyl capping is not compatible with unprotected Dbz since acetylation of the 4-amine prevents conversion to Nbz. Of note, a second-generation Dbz linker has recently been developed to address the problem of branched products.178 The Dbz linker allows for the straightforward preparation of thioesters for most of our peptides.

Using solid-phase peptide synthesis (SPPS), we synthesized the peptide fragments that would be ligated together to form the full-length protein. Several of the H4 peptides were challenging to synthesize, and various complications were encountered in their preparation. These challenges are addressed in the coming sections.

Nbz Conversion of H4-A-Dbz

The Dbz of the peptide was converted to Nbz using 4-nitrophenyl chloroformate (NPCF) with dichloromethane (DCM) as solvent, followed by treatment with diisopropylethylamine (DIEA) in dimethylformamide (DMF). Although this conversion was straightforward with most peptides, conversion of H4-A-Dbz gave rise to multiple side products, most of which could not be identified (Figure 13). During the initial studies of H4 SP-NCL, H4-A-Nbz was prepared using this method since it was still possible to isolate the desired product through RP-HPLC purification, albeit with an overall yield averaging only 5%. A modified method to optimize the conversion will be discussed in the CENP-A total synthesis section, where a similar problem was encountered with one of the CENP-A peptides.

Figure 13: Nbz Conversion of H4-A. RP-HPLC traces (15-30 % B, 0-30 min) of ac-H4-A-Dbz before and after treatment with 1) NPCF in DCM and 2) DIEA in DMF.

Racemization of Histidine

After the Alloc protection of the para-amine of the Dbz, the meta-amine is sterically occluded, and stringent coupling conditions (excess amino acid, strong coupling agent, extended coupling time) were required to load the first amino acid. However, when His was the first amino acid in the sequence, we observed an additional side product in RP-HPLC analysis, with identical mass to the desired product. When coupling an amino acid to unprotected Dbz, two isomers can be generated, with the two species having different retention times in RP-HPLC. However, since our Fmoc-Dbz-OH was generated as only one isomer as assessed by NMR spectroscopy, this seemed unlikely as an explanation. We therefore analyzed each of the early loading and coupling steps. While cleavage after the first coupling step to generate His-Dbz(Alloc) resulted in only one peak by RP-HPLC analysis, when we analyzed the two-residue sequence, Glu-His-Dbz(Alloc), we observed two peaks with identical mass (Figure 14A). Analysis after coupling 4 more residues generated a single peak (Figure 14B).

Figure 14: Analysis of H4-H peptide. (A) RP-HPLC of EH-Dbz(Alloc). (B) RP-HPLC of the 6-residue segment (VTYTEH) of H4-H-Dbz(Alloc). (C) RP-HPLC of the EH-Dbz(Alloc) diastereomeric mixture (Eh-Dbz(Alloc) and EH-Dbz(Alloc)) after His coupling using racemizing conditions. All traces were run with a 0-73 % B gradient.

We hypothesized that the minor peak was Glu-his-Dbz(Alloc), where “his” is D-histidine, resulting from racemization during the coupling step to the relatively deactivated Dbz amine. Histidine is more prone to racemization than other amino acids,179 so this was a reasonable hypothesis. To test this, Fmoc-His(Trt)-OH was loaded on Dbz(Alloc) with excess 4-dimethylaminopyridine (DMAP) to generate the racemic mixture.180 RP-HPLC analysis after Glu coupling revealed that the previously observed minor peak had increased to roughly the same height as the major peak (Figure 14C). This indicated that the side product was the result of His racemization, which explained our observations. A single peak was observed for His-Dbz(Alloc) because the two species are enantiomers, which have indistinguishable properties in an achiral environment. Adding Glu converts the two species into diastereomers, offering a possible explanation as to why they could be resolved using an achiral column. It is not clear if the Dbz and the Alloc contribute to the observable separation of the two species, and the actual reason may be more complex. Only a single peak was observed for the hexapeptide, suggesting that the difference in properties between the two diastereomers was too small to achieve separation by RP-HPLC.

To assess the extent of racemization, various His coupling conditions were tested. Some of the conditions of note are listed in Table 2. RP-HPLC of the diastereomeric mixture generated from the excess DMAP condition was used as a standard for directly comparing the retention times of the various species (Figure 15). When coupling on unprotected Dbz, the resin was Alloc protected after His coupling, followed by Glu coupling. Both coupling efficiency and racemization level were considered when determining the optimal condition for His loading. We found that all coupling conditions tested on Dbz(Alloc)-Arg with reasonable coupling efficiency gave rise to an unacceptable level of racemization. Although initial tests were done on Dbz(Alloc) and Dbz resins, unreacted Dbz(Alloc) and Dbz were eliminated during the ether wash, and therefore were not observed by RP-HPLC. This prevented the assessment of coupling efficiency for cases such as condition 1 in Table 2. Later tests used Dbz(Alloc)-Arg and Dbz-Arg, which were retained after the ether wash, allowing observation via RP-HPLC.

Table 2: Histidine coupling conditions on Dbz

Condition   Base resin     Pre-activation / coupling time   Reagents                                       Coupling (%)   Racemization (%)
Standard    Dbz(Alloc)     30 min / 60 min                  22:10:1 Fmoc-His(Trt)-OH:DIC:DMAP
1           Dbz(Alloc)     3 min / 60 min                   0.23 M His, 0.23 M HATU, 0.42 M DIEA           N/A            11.6
2           Dbz(Alloc)-R   None / 30 min (double)           0.28 M His, 0.25 M HATU, 0.04 M DIEA           82             4.8
3           Dbz(Alloc)-R   None / 30 min (triple)           0.28 M His, 0.25 M HATU, 0.04 M DIEA           88             6.4
4           Dbz(Alloc)-R   None / 30 min (double)           0.27 M His, 0.25 M 6-Cl-HOBt, 0.125 M DIC      25             2.3
5           Dbz-R          None / 45 min                    0.11 M His, 0.1 M HCTU, 0.16 M DIEA            98             6.5
6           Dbz-R          None / 30 min                    0.11 M His, 0.1 M 6-Cl-HOBt, 0.1 M DIC         98             <1

Figure 15: Histidine coupling conditions on Dbz. (Left) 0-20 % B RP-HPLC of Glu-His-Dbz(Alloc)-Arg for His coupling conditions corresponding to standard, 1, 3, 4, and 6 found in Table 2. (Right) MALDI-TOF MS of the five observed species a-e in RP-HPLC with structures. For each MALDI-TOF MS, the peak is labeled with the observed m/z (a: 392.7; b: 391.6; c: 522.8; d, Glu-his-Dbz(Alloc)-Arg: 658.0; e, Glu-His-Dbz(Alloc)-Arg: 659.1), which is within 1 unit of the corresponding species. Note that for condition 1, the species do not contain the Arg tag on the Dbz.

Among the conditions, coupling using diisopropylcarbodiimide (DIC) and 1-hydroxy-6-chloro-benzotriazole (6-Cl-HOBt) on unprotected Dbz was found to be optimal,181 with racemization at almost background level. After the initial His loading, the Dbz could be Alloc protected or left unprotected. Since H4-H lacks Gly residues, we found that Alloc protection was not necessary. The remaining residues were coupled using standard SPPS methods.
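
The trade-off between coupling efficiency and racemization can be made explicit with a small selection routine. The Python sketch below uses the values from Table 2; the 90% coupling threshold, the function name, and the treatment of the reported "<1" as an upper bound of 1.0 are assumptions made for illustration rather than criteria stated in this work.

```python
# Values from Table 2. "coupling" is None where it could not be assessed.
CONDITIONS = [
    {"id": 1, "resin": "Dbz(Alloc)", "coupling": None, "racemization": 11.6},
    {"id": 2, "resin": "Dbz(Alloc)-R", "coupling": 82.0, "racemization": 4.8},
    {"id": 3, "resin": "Dbz(Alloc)-R", "coupling": 88.0, "racemization": 6.4},
    {"id": 4, "resin": "Dbz(Alloc)-R", "coupling": 25.0, "racemization": 2.3},
    {"id": 5, "resin": "Dbz-R", "coupling": 98.0, "racemization": 6.5},
    # Reported as "<1"; 1.0 is used here as an upper bound (assumption).
    {"id": 6, "resin": "Dbz-R", "coupling": 98.0, "racemization": 1.0},
]


def best_condition(conditions, min_coupling=90.0):
    """Lowest-racemization condition among those meeting a coupling threshold."""
    eligible = [
        c for c in conditions
        if c["coupling"] is not None and c["coupling"] >= min_coupling
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c["racemization"])


choice = best_condition(CONDITIONS)
print("selected condition:", choice["id"], "on", choice["resin"])
```

With these numbers the routine selects condition 6 (DIC/6-Cl-HOBt on unprotected Dbz), in agreement with the choice described above; condition 4 has low racemization but is excluded by its poor coupling efficiency.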

For the Nbz conversion of H4-H-Dbz-Arg, a modified Nbz conversion was used. Full conversion was not achieved with NPCF in DCM, whereas conversion was complete in a mixture of DMF and DCM. This condition produced a mixture of H4-H-Nbz-Arg and H4-H-Nbz(formyl)-Arg. The cause of formylation will be discussed in more detail in later sections. The formylated species converted to thioester without issues.

Synthesis of H4 peptides

Synthesis and Nbz conversions of the remaining peptide segments for H4 were straightforward. RP-HPLC and MALDI-TOF MS of purified H4 peptides are in Figure 16.

Figure 16: Purified H4 peptides. RP-HPLC (left) and MALDI-TOF MS (right) of purified H4 peptides. The H4-H-Nbz-Arg panel also carries the label H-R (no Nbz).

Peptide            RP-HPLC gradient   [M + H]+ expected m/z   [M + H]+ observed m/z
H4-acA-Nbz         20-35 % B          4138                    4137
H4-B-Nbz           20-35 % B          2321                    2322
H4-H-Nbz-Arg       30-60 % B          2634                    2636
H4-C0-Nbz          25-40 % B          3520                    3517
H4-C0-K91ac-Nbz    30-40 % B          3581                    3579

Sequential SP-NCL of ac-H4

After we synthesized the required H4 peptides, Dr. Mahto performed the first sequential SP-NCL of H4. Intermediates were analyzed by RP-HPLC and MALDI-TOF MS after TFA cleavage at the Rink linker. Production of full-length H4 with an acetylated N-terminus (ac-H4) was confirmed by RP-HPLC and MALDI-TOF MS (Figure 17). After performing desulfurization on resin, the final product was cleaved at HMBA with 0.1 M NaOH and purified by RP-HPLC. Although the synthesis appeared to be efficient as assessed by the purity of the product, the yield was only about 1%.

Figure 17: SP-NCL of ac-H4 (ref. 160). RP-HPLC (25-70 % B) and MALDI-TOF MS of H4 SP-NCL ligations. Courtesy of Dr. Mahto.

Species     [M + H]+ expected m/z   [M + H]+ observed m/z
C0          3507                    3505
HC0         5768                    5764
BHC0        7899                    7895
ac-ABHC0    11848                   11853
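
For a quick comparison of the expected and observed masses reported in Figure 17, the deviation can be computed in mass units and in parts per million. The Python sketch below is illustrative only; the function name is hypothetical, and no instrument tolerance is implied.

```python
# (species, expected [M + H]+ m/z, observed [M + H]+ m/z), from Figure 17.
LIGATION_INTERMEDIATES = [
    ("C0", 3507, 3505),
    ("HC0", 5768, 5764),
    ("BHC0", 7899, 7895),
    ("ac-ABHC0", 11848, 11853),
]


def mass_error(expected, observed):
    """Return (absolute deviation, deviation in ppm) of observed vs expected m/z."""
    delta = observed - expected
    ppm = delta / expected * 1e6
    return delta, ppm


for species, expected, observed in LIGATION_INTERMEDIATES:
    delta, ppm = mass_error(expected, observed)
    print(f"{species:10s} delta = {delta:+d}  ({ppm:+.0f} ppm)")
```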

SP-NCL of modified H4

We hypothesized that the yield had been reduced by suboptimal resin handling, which might have led to increased cleavage at the HMBA linker during synthesis. As such, I repeated the synthesis of three H4 proteins in parallel: ac-H4, ac-H4-pS47, and ac-H4-K5ac,K12ac,K91ac (Figure 18). The products were cleaved at the HMBA linker using a solution of 0.1 M NaOH (pH 10). In order to achieve maximal protein extraction, the base-treated resin was further washed extensively with TFA, and the eluate was combined with the base extraction. The major side product arose from the incomplete ligation of H4-H. Aside from the side product, the major peak on the crude RP-HPLC was the desired product, and the syntheses appeared relatively clean. However, we did not see an improvement in yield, suggesting that careful resin handling procedures did not improve the synthesis.

Figure 18: SP-NCL of Synthetic H4 constructs. Crude RP-HPLC (25-70 % B, 10-30 min shown) and MALDI-TOF MS of ac-H4 (A), ac-H4-S47ph (B), and ac-H4-K5ac,K12ac,K91ac (C).

Panel   Product                   [M + H]+ expected m/z   [M + H]+ observed m/z
A       ac-H4                     11259                   11259
B       ac-H4-pS47                11339                   11341
C       ac-H4-K5ac,K12ac,K91ac    11403                   11407

These studies demonstrated that H4 SP-NCL consistently resulted in a low yield. However, the origins of these reduced yields remained unclear. While we attempted to resolve this issue, we also carried out the SP-NCL of histone variant CENP-A. In the following section, we will discuss our initial attempt to synthesize CENP-A. The combined results from H4 and CENP-A provided an explanation for our observations, which eventually led to the development of an alternative ligation strategy.

CENP-A

Centromeric protein A (CENP-A) is a histone H3 variant that is found as a core histone in the centromere (Figure 19).182,183 CENP-A acts as an epigenetic marker that distinguishes the centromere from other regions of the chromosome, and it is required for mitotic spindle attachment during cell division.184,185 Over-expression of this protein results in multiple centromeres on a single chromosome and commensurate errors in chromosome replication.186,187 Despite some controversy surrounding the exact structure of the centromeric nucleosome,188-190 growing evidence seems to suggest that the nucleosome is composed of an octameric protein core as with the canonical nucleosome, but with a slightly more open structure with less wrapped DNA.191-193

Figure 19: Comparison of CENP-A and H3-containing nucleosomes. Structure of nucleosome containing CENP-A (left, PDB: 3AN2, ref. 194) and H3 (right, PDB: 1KX5, ref. 56). CENP-A and H3 are indicated in blue.

Relatively few PTMs have been discovered on CENP-A, but the modifications that have been identified are thought to play important biological roles.195-198 Our modification of interest is K124 acetylation (CpA-K124ac), which, along with H4-K79ac, appears to be correlated with the cell cycle and is found primarily during the G1/S phase (Figure 20).199 The two modifications were initially suggested to play a role in the transition between an octameric nucleosome and a hemisome, an alternate nucleosome structure that contains only one copy each of H2A, H2B, H4, and CENP-A, because their cell-cycle timing correlated with alterations observed in AFM analysis of centromeric nucleosomes.199 However, a growing body of evidence suggests that their role may be more complex.

Figure 20: Centromeric Nucleosome PTMs. Crystal structure of the CENP-A-containing nucleosome with CpA-K124 and H4-K79 indicated in magenta. PDB: 3AN2 (ref. 194).

CpA-K124ac is found in the dyad region at the histone-DNA interface, and the analogous PTM on H3 (H3-K122ac) was found to destabilize the nucleosome and decrease the affinity of the histone for the DNA.59 It is possible that CpA-K124ac has a similar dynamic effect. Molecular dynamics simulation studies of nucleosomes with CENP-A and H3 showed that the CENP-A dimer interface is weaker than the H3 dimer interface, causing the CENP-A nucleosome to adopt a more flexible structure.200 Since CpA-K124 is found in this dimer interface, acetylation is predicted to affect the dynamics in this region. The same K124 can also be ubiquitylated, which has been suggested to be required for CENP-A deposition at the centromere.196 An alternate role proposed for CpA-K124ac is that it may act as a placeholder for ubiquitylation until the M phase.196 In either case, CpA-K124 is a key site for several different PTMs, which is further supported by a recent finding that CpA-K124 is also methylated.201

H4-K79ac is found in the DNA entry/exit region, and has been identified as part of the loss of ribosomal DNA silencing (LRS) region. Acetylation within this region of the histone-DNA interface, studied in the context of H4-K77ac,K79ac, has been found to increase DNA unwrapping,102 but it is not known what effect it has in the context of the centromeric nucleosome. It is hypothesized that H4-K79ac could increase access to the nucleosome by remodelers.199 Interestingly, some of the major sequence differences between H3 and CENP-A are concentrated near H4-K79ac (Figure 21). Because of this, it is possible that the structural and dynamic effects of H4-K79ac in the CENP-A-containing nucleosome will be different from those of the same PTM in the H3-containing nucleosome, mediated by the residue differences surrounding the H4-K79ac sidechain.

CENP-A (45-138)  GWLKEIRKLQKSTHLLIRKLPFSRLAREICVKFTRGVDFNWQAQALLALQEAAEAFLVHLFEDAYLLTLHAGRVTLFPKDVQLARRIRGLEEGLG
H3 (46-134)      VALREIRRYQKSTELLIRKLPFQRLVREIAQDFK--TDLRFQSSAVMALQEACEAYLVGLFEDTNLCAIHAKRVTIMPKDIQLARRIRGERA

Figure 21: Comparison of CENP-A and H3 nucleosomes. (Top) Sequence alignment of CENP-A and H3. Significant residue differences are indicated in red. (Bottom) Crystal structures of H3-containing nucleosome (left) and CENP-A-containing nucleosome (right). H4 is in yellow and H4-K79 is indicated in magenta. CENP-A/H3 are in blue (space-filling model) and major residue differences seen in the alignment are colored cyan. PDB: 1KX5 (ref. 56), 3AN2 (ref. 194).

Taken together, we hypothesize that CpA-K124ac and H4-K79ac together alter the stability and dynamics of the nucleosome. The PTMs may play a role in regulating access to the centromeric nucleosome at particular times during the cell cycle. Since CpA-K124ac is located near the proposed binding site for the inner kinetochore protein CENP-C,202 the modification may also regulate kinetochore assembly. By preparing CpA-K124ac and H4-K79ac, we hope to understand the effects of these modifications on the centromeric nucleosome, and thereby shed light on what roles they might play in cell division. Since multiple PTMs can be found at CpA-K124, a robust synthetic strategy for CENP-A would enable the study of not just CpA-K124ac, but of other PTMs as well.

Semi-synthesis of CpA-K124ac

Because the modification is close to the C-terminus of the protein, we initially proposed an expressed protein ligation (EPL) strategy for the production of CpA-K124ac, similar to the strategy that our laboratory developed for preparation of H3-K122ac,59 and that has been established for H2A, H2B, H3, and H4.59,102,203 We first proposed a ligation-desulfurization approach using a ligation site at position 116. Native human CENP-A contains a single cysteine at position 75. Since ligation and subsequent desulfurization would convert this native Cys into Ala, it was not compatible with our strategy. Our collaborator Dr. Yamini Dalal therefore prepared two recombinant mutants, CpA-C75A and CpA-C75S, that would be compatible with EPL and found that both mutants localized to the centromere, confirming that the Cys was not essential for function.204 We chose to use CpA-C75S for semi-synthesis since the Ser mutation should have minimal effect on the CENP-A structure. In this work, all synthetic and semi-synthetic CENP-A contain the C75S mutation unless otherwise indicated.

We intended to prepare CpA-K124ac by using expressed CpA(1-115)-intein-chitin binding domain (CBD) and synthesized CpA(116-140)-K124ac peptide (Figure 22). The fusion protein would be expressed and purified. Allowing the intein to fold into its native structure would trigger the N to S acyl shift, and cleavage with an external thiol would produce the CpA(1-115) thioester (CpA(1-115)-SR). Ligation would be performed with the addition of the synthetic peptide bearing the K124ac modification. Subsequent desulfurization and purification would produce the desired CpA-K124ac.

Figure 22: CpA-K124ac EPL scheme. Expressed CpA(1-115)-intein-CBD is cleaved with an external thiol (HS-R) to give CpA(1-115)-SR, which is joined by NCL to the CpA(116-140) peptide prepared by SPPS. Desulfurization gives the full-length product.

Although the synthesis of the K124ac peptide fragment was straightforward, the expression of the CpA(1-115)-intein-CBD fusion protein was far more challenging than we had anticipated. We tested expression with varying E. coli strains, growth temperature, growth media, inducer concentration, and induction time, but none of the conditions tested resulted in successful expression. SDS-PAGE of the whole cell lysate for the tested expression conditions is shown in Figure 23. Expression of CENP-A is known to be challenging.158 Using the Mfold web server,205 we noted that the mRNA of CENP-A had significant secondary structure that could potentially prevent efficient expression. We prepared a plasmid in which codon substitution was used to disrupt the proposed mRNA structure, but this did not improve expression.

Figure 23: Expression of CpA(1-115)-intein-CBD. Lanes are annotated with O.D. at induction, IPTG concentration, expression time (2 h, 4 h, or 6 h), and, for (C) and (D), overnight growth media (LB or SOC) with or without IPTG. (A) SDS-PAGE of expression conditions for cells grown in LB media. O.D. at induction refers to the optical density of the culture when inducer (IPTG) was added. (B) SDS-PAGE of expression conditions for cells grown in MMI media. (C) SDS-PAGE of expression conditions with cells inoculated in LB or SOC overnight (O/N) growth media. (D) SDS-PAGE of expression conditions identical to (C), but cells were expressed at 25 °C. All other expression conditions were carried out at 37 °C. For (C) and (D), cells were induced with 1 mM IPTG at O.D. 0.8.

After concluding that preparation of CpA-K124ac through EPL was not time-efficient, we turned to a total synthesis approach. We proposed an SP-NCL strategy involving the ligation of five peptides. Although H4 SP-NCL gave low yield, we hypothesized that the problem resulted not from our ligation strategy, but from the properties of H4 protein. If this were the case, CENP-A SP-NCL would be straightforward.

SP-NCL of CpA-K124ac

Figure 24 illustrates the proposed SP-NCL scheme of CENP-A. Through the total synthesis of CENP-A, we hoped to prepare a homogeneous sample of CpA-K124ac, as well as demonstrate the utility of SP-NCL. The split sites were determined based on the same criteria discussed for H4 SP-NCL, notably peptide length, commercially available reagents, and predicted fragment solubility and ligation kinetics. Five peptides were synthesized for the total synthesis of CENP-A (Table 3). Synthetic CENP-A contained the C75S mutation (in CpA-3), as in the CENP-A EPL strategy.


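[Figure 24 scheme not reproduced. Peptide 5 is attached to the base resin through a Rink linker and an HMBA-Arg-Gly ligation handle; peptides 4 through 1 are added by SP-NCL, followed by desulfurization and NaOH cleavage.]

Figure 24: CENP-A SP-NCL scheme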

Table 3: CENP-A peptides160

Name    Residues   Peptide sequence
CpA-1   2-34       GPRRRSRKPEAPRRRSPSPTPTPGPSRRGPSLG
CpA-2   35-70      ASSHQHSRRRQGWLKEIRKLQKSTHLLIRKLPFSRL
CpA-3   71-97      AREICVKFTRGVDFNWQAQALLALQEA
CpA-4   98-115     AEAFLVHLFEDAYLLTLH
CpA-5   116-140    AGRVTLFPKDVQLARRIRGLEEGLG

Name            Residues   Synthesized peptide sequence
CpA-1           2-34       GPRRRSRKPEAPRRRSPSPTPTPGPSRRGPSLG-Dbz(Alloc)
CpA-2           35-70      Thz-SSHQHSRRRQGWLKEIRKLQKSTHLLIRKLPFSRL-Dbz(Alloc)
CpA-3           71-97      Thz-REISVKFTRGVDFNWQAQALLALQEA-Dbz(Alloc)-RR
CpA-4           98-115     Thz-EAFLVHLFEDAYLLTLH-Dbz-R
CpA-5-HMBA-RG   116-140    Thz-GRVTLFPKDVQLARRIRGLEEGLG-HMBA-RG-Dbz(Alloc)
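
The segment boundaries in Table 3 can be cross-checked with a short script. The following is a minimal sketch, not part of the original workflow: the dictionary layout and the function names `check_segments` and `residue_at` are hypothetical, and the sequences are the native (unmodified) sequences copied from Table 3. It confirms that each sequence length matches its residue range, that the segments are contiguous, and that residue numbering places Cys75 in CpA-3 and Lys124 in CpA-5.

```python
# Native CENP-A segments copied from Table 3: (first residue, last residue, sequence).
SEGMENTS = {
    "CpA-1": (2, 34, "GPRRRSRKPEAPRRRSPSPTPTPGPSRRGPSLG"),
    "CpA-2": (35, 70, "ASSHQHSRRRQGWLKEIRKLQKSTHLLIRKLPFSRL"),
    "CpA-3": (71, 97, "AREICVKFTRGVDFNWQAQALLALQEA"),
    "CpA-4": (98, 115, "AEAFLVHLFEDAYLLTLH"),
    "CpA-5": (116, 140, "AGRVTLFPKDVQLARRIRGLEEGLG"),
}


def check_segments(segments):
    """Return a list of inconsistencies; an empty list means the table is self-consistent."""
    problems = []
    expected_first = None
    for name, (first, last, seq) in segments.items():
        # The sequence length must equal the size of the residue range.
        if len(seq) != last - first + 1:
            problems.append(f"{name}: length {len(seq)} does not match {first}-{last}")
        # Each segment must start right after the previous one ends.
        if expected_first is not None and first != expected_first:
            problems.append(f"{name}: starts at {first}, expected {expected_first}")
        expected_first = last + 1
    return problems


def residue_at(segments, position):
    """Return (segment name, one-letter residue) for a protein residue number."""
    for name, (first, last, seq) in segments.items():
        if first <= position <= last:
            return name, seq[position - first]
    raise ValueError(f"position {position} is outside all segments")


if __name__ == "__main__":
    print(check_segments(SEGMENTS))
    print(residue_at(SEGMENTS, 75))
    print(residue_at(SEGMENTS, 124))
```

Every segment after CpA-1 begins with Ala in the native sequence, which is consistent with ligation at a temporary Cys (introduced as Thz) that is later desulfurized to Ala.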


Nbz conversion of CpA-2-Dbz

While all peptides could be efficiently and cleanly synthesized as the Dbz, CpA-2-Dbz failed to convert efficiently to Nbz, as was also the case for the H4-A peptide (Figure 25). We therefore had to find an alternative approach to produce the CpA-2-thioester.

[Figure 25 graphic not reproduced. The scheme shows conversion of CpA-2-Dbz to CpA-2-Nbz with RP-HPLC traces (30-60 % B). Conditions: 1) Alloc deprotection; 2) NPCF in DCM; 3) DIEA in DMF.]

Figure 25: Nbz conversion of CpA-2

CpA-2-thioester through O to S acyl shift

One alternative approach to prepare peptide thioesters by Fmoc SPPS is the O to S acyl shift method (Figure 26).173 In this method, the resin was prepared by first loading Fmoc-Cys(StBu)-OH. Fmoc was removed and the N-terminal amine of Cys was converted into a hydroxyl. The first amino acid was coupled to create an ester linkage with the base resin. The resulting peptide was cleaved with the carboxy ester intact. During ligation, the disulfide was reduced, exposing the thiol and causing rearrangement through an O to S shift to generate the peptide thioester in situ. This thioester could immediately participate in NCL, or it could be displaced by an external thiol in the ligation buffer. Although it was possible to prepare CpA-2 thioester using this method, the overall yield of CpA-2-OCys(StBu) was only 5%. By assessing yields at different points during SPPS, we determined that the ester linkage was gradually cleaved with every cycle. Although the O to S method was suitable for short peptides, it did not give reasonable yields for the 36-residue CpA-2. We therefore looked for other alternatives to synthesize CpA-2 thioester in good yield.

[Figure 26 scheme not reproduced. Steps shown: 1) Fmoc-Cys(StBu)-OH loading, 2) piperidine; KNO2/HCl; SPPS; 1) cleave/purify, 2) reduce; O to S acyl shift.]

Figure 26: O to S acyl shift approach

Nbz conversion using different solvents

After the unsuccessful alternative approach, we attempted to determine the cause of the Nbz conversion side products. We hypothesized that the multiple products observed upon conversion might be due to intramolecular interactions and structure formation as described by Siman et al.206 This could be disrupted by using solvents with hydrogen bonding capabilities such as DMF or NMP. We then tested this hypothesis by performing the NPCF treatment in DMF instead of DCM on both H4-A-Dbz and CpA-2-Dbz. In initial tests, conversion was observed. However, working in collaboration with Dr. John Shimko and Kurt Justus, we determined that dry DMF must be used, possibly due to water-mediated hydrogen bonding. Therefore, DMF was dried over molecular sieves. By performing conversion in DMF, a clean conversion to a reactive N-acylurea species was finally achieved for CpA-2 (Figure 27A). In addition, we performed Nbz conversion of H4-A using the same method, and clean conversion was also observed (Figure 27B).

[Figure 27 graphic not reproduced. (A) CpA-2: RP-HPLC traces, 30-60 % B, 0 to 30 min. (B) H4-A: RP-HPLC traces, 15-30 % B, 0 to 30 min. Conditions for both: 1) NPCF in dry DMF; 2) DIEA in DMF.]

Figure 27: Nbz conversion of CpA-2 and H4-A peptides using the dry DMF approach (A) RP-HPLC of CpA-2-Dbz and CpA-2-Nbz(formyl). (B) RP-HPLC of H4-A-Dbz and H4-A-Nbz(formyl). Compare to Figure 13.

It is important to note that reaction in DMF resulted in formylated Nbz, arising from a Vilsmeier-Haack reaction (Figure 28).207 Although this did not affect the conversion to thioester, it does prevent further derivatization, for example treatment with hydrazine to generate a peptide hydrazide (Chapter 3). Clean Nbz conversion with no formylation could be achieved using NMP as solvent, since it possesses hydrogen bonding capabilities similar to those of DMF but lacks the formyl group needed to participate in the Vilsmeier-Haack reaction (Figure 29).

[Figure 28 scheme not reproduced. Labels shown: DMF, H2O, and Nbz conversion.]

Figure 28: Formylation through Vilsmeier-Haack

[Figure 29 graphic not reproduced. RP-HPLC traces, 30-60 % B, 0 to 30 min. Conditions: 1) NPCF in dry NMP; 2) DIEA in DMF. MALDI-TOF MS of CpA-2-Nbz: [M + H]+ observed m/z 4598, expected m/z 4596.]

Figure 29: Nbz conversion of CpA-2 using dry NMP RP-HPLC of CpA-2-Dbz (top) and RP-HPLC and MALDI-TOF MS of CpA-2-Nbz (bottom).


Synthesis of CENP-A peptides

The syntheses and conversions of the remaining CENP-A peptides were achieved in good yields. With CpA-4, the C-terminal His was loaded using the protocols developed to minimize racemization discussed previously. CpA-3 had relatively poor solubility properties, requiring the installation of an Arg-Arg tag. RP-HPLC and MALDI-TOF MS of purified CENP-A peptides are provided in Figure 30.


[Figure 30 graphic not reproduced. RP-HPLC gradients and MALDI-TOF MS values recovered from the figure labels:]

Peptide              Gradient     [M + H]+ observed m/z   [M + H]+ expected m/z
CpA-1-Nbz            12-30 % B    3765                    3765
CpA-2-Nbz(formyl)    30-60 % B    4625                    4623
CpA-3-Nbz-Arg-Arg    30-60 % B    3577                    3578
CpA-4-Nbz-Arg        30-60 % B    2489                    2489
CpA-50-Nbz           30-55 % B    3343                    3345

Figure 30: Purified CENP-A peptides RP-HPLC (left) and MALDI-TOF MS (right) of purified CENP-A peptides


Sequential SP-NCL of CpA-K124ac

With all peptides in hand, we carried out the first sequential SP-NCL of CENP-A. The ligation conditions are listed in Table 4. Every ligation was carefully monitored with test cleavages at the Rink linker using TFA (Figure 31). The first three cycles of deprotection and ligation were observed to generate relatively pure product in high yield.

Table 4: SP-NCL conditions for CENP-A

Ligation round   Peptide       MW (g/mol)   Peptide (mg)   Volume (mL)   Concentration (mM)   Molar equivalent   Time (h)
1                CpA-5-124ac   3345         3              0.7           1.2                  2.8                15
2                CpA-4         2462         2.5            0.7           1.2                  2.3                17
3                CpA-3         3577         2.5            0.7           1.0                  2.3                20
4                CpA-2         4628         2.5            0.5           1.1                  2.6                12
5                CpA-1         3765         3              0.7           1.6                  3.8                20
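
The concentration column in Table 4 relates peptide mass, molecular weight, and ligation volume. The following is a minimal sketch of that arithmetic, assuming the weighed mass is entirely peptide; the function name `peptide_concentration_mM` is hypothetical. It ignores counterions, residual water, and purity corrections, so it may not reproduce every tabulated value.

```python
def peptide_concentration_mM(mass_mg, mw_g_per_mol, volume_mL):
    """Concentration in mM from peptide mass (mg), molecular weight (g/mol), and volume (mL)."""
    if mw_g_per_mol <= 0 or volume_mL <= 0:
        raise ValueError("molecular weight and volume must be positive")
    # mg divided by g/mol gives mmol; multiply by 1000 to get micromoles.
    micromoles = mass_mg / mw_g_per_mol * 1000.0
    # Micromoles per mL is numerically equal to mM.
    return micromoles / volume_mL


if __name__ == "__main__":
    # Round 3 of Table 4: 2.5 mg of CpA-3 (3577 g/mol) in 0.7 mL.
    print(round(peptide_concentration_mM(2.5, 3577, 0.7), 2))
```

For the round 3 entry this gives approximately 1.0 mM, in agreement with the tabulated concentration.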


[Figure 31 graphic not reproduced. MALDI-TOF MS values recovered from the figure labels: CpA-50 [M + H]+ Exp. m/z 3473, Obs. m/z 3475; CpA-450 [M + H]+ Exp. m/z 5589, Obs. m/z 5590; CpA-3450 [M + H]+ Exp. m/z 8665, Obs. m/z 8665.]

Figure 31: SP-NCL of CpA-345-K124ac160 Crude MALDI-TOF MS of CpA-5 ligation (A). RP-HPLC of CpA-4 and CpA-3 ligations (B).

However, after ligation of CpA-2, we were unable to observe any significant product using RP-HPLC or MALDI-TOF (Figure 32). The disappearance of CpA-345 on the RP-HPLC suggested that it had fully converted to another product. We initially hypothesized that CpA-2345 had poor solubility. The last ligation with CpA-1 was then performed. If the issue was due to solubility, we anticipated that the addition of a soluble peptide would make the product observable, since recombinant CENP-A is observable on the RP-HPLC. However, no improvement was seen.

[Figure 32 graphic not reproduced. Panel A traces are labeled CpA-23450 and CpA-123450.]

Figure 32: SP-NCL of CpA-1 and CpA-2 RP-HPLC of CpA-2 and CpA-1 ligations (A). SDS-PAGE of CENP-A resin (B).


While attempting to assess these results, we considered the possibility that after TFA cleavage, the product was still interacting with the resin. In order to confirm this, we heated the TFA-treated resin in SDS loading buffer to 100 °C for 10 min, and loaded the resin on SDS-PAGE. Interestingly, we observed a band with a similar size to recombinant CENP-A but no smaller side products. This suggested that although ligation was successful, the final product could not be eluted from the resin. Attempts to elute the peptide using various buffers and solvents including GuHCl, dimethyl sulfoxide (DMSO), and ionic liquid were unsuccessful.

We hypothesized that full-length CENP-A interacted nonspecifically with the PEGA resin. Since the yield seemed to drop after the ligation intermediate CpA-345, the N-terminal peptides CpA-1 and CpA-2 could be responsible for the poor recovery of the final product. To assess sequence dependence, we ligated CpA-1 and CpA-2 directly to the base resin, cleaved at the Rink linker, and observed 30–56% recovered yield. These yields were reduced but did not replicate the extreme losses observed for the equivalent ligation in the protein context. Given that PEGA resin has a large pore size compatible with folded proteins, it seems unlikely that yield reductions are solely size-based. Together, these results suggest context-dependent interactions between the larger histone sequences and the solid phase.


Hybrid Phase Ligation of H4-K5ac, K12ac, K91ac

The dramatic yield loss observed for CENP-A provided an important insight into the poor yields of H4 SP-NCL. Re-analysis of our syntheses confirmed that yields through the first 3 rounds of ligation were acceptable, and only the last ligation step reduced yields. We speculated that this problem could be overcome by cleaving the peptide after the third ligation and performing the problem ligation in solution. We termed this hybrid-phase ligation, which combines solid-phase and solution-phase ligations in order to maximize yield.

We first attempted this approach with triple-modified H4-K5ac, K12ac, K91ac (Figure 33) using the same peptide fragments described in Table 1. The K5ac and K12ac modifications are found in newly synthesized H4,208 and their exact function is not clear. H4-K91 is found in the histone-histone interface, and its acetylation is found to destabilize the nucleosome.209 Although the three acetylations have not been observed simultaneously on H4, H4-K91 is acetylated before assembly on DNA,210 just like H4-K5,K12. Furthermore, mutations in H4 K5, K12 show hypersensitivity to replication stress and DNA-damaging agents when combined with mutations in H4-K91.211 Therefore, it has been hypothesized that the three acetylations could be working together to serve important functions in the cell.


[Figure 33 scheme not reproduced. The base resin carries a Rink linker and an HMBA-Arg-Gly ligation handle. Intermediates shown in order: H4-C0-K91ac, H4-HC0-K91ac, H4-BHC0-K91ac, HMBA cleavage to H4-BHC-K91ac, H4-ABHC-K5ac,K12ac,K91ac, and H4-K5ac,K12ac,K91ac.]

Figure 33: Hybrid phase ligation of H4160

Peptide H4-BHC would be prepared through SP-NCL. After cleaving with NaOH at the HMBA linker, the last ligation with H4-A would be performed in solution. Traceless ligation would be achieved through desulfurization. For the entire synthesis, only a single purification would be required to obtain the homogeneously modified H4 protein.

SP-NCL of H4-BHC-K91ac

SP-NCL of H4-BHC-K91ac was performed using the ligation conditions in Table 5. Ligation reactions were monitored by cleaving resin samples with TFA and analyzing them with RP-HPLC and MALDI-TOF MS (Figure 34). We did not account for the slower kinetics of the deprotection of dmThz, which led to the H4-HC side product observed in RP-HPLC. For future procedures, we plan to account for the longer deprotection time of dmThz compared to Thz, and to test cleave to confirm completion of the reaction.

Table 5: SP-NCL of H4-BHC-K91ac

Ligation round   Peptide           MW (g/mol)   Peptide (mg)   Volume (mL)   Concentration (mM)   Molar equivalent   Time (h)
1                H4-C-Nbz K91ac    3520         6.5            1.5           1.2                  2.8                14
2                H4-H-Nbz-R        2580         4              1.3           1.2                  2.3                15
                 H4-H-Nbz-R        2580         4              1.6           1.0                  2.3                18
3                H4-B-Nbz          2321         4              1.6           1.1                  2.6                10
                 H4-B-Nbz          2321         6              1.6           1.6                  3.8                20


[Figure 34 graphic not reproduced. (A) Traces labeled H4-C0, H4-HC0, and H4-BHC0 (with an H4-HC0 side product); H4-BHC0 [M + H]+ Exp m/z: 8099, Obs m/z: 8101. (B) Trace labeled H4-BHC with an H4-HC side product; H4-BHC [M + H]+ Exp m/z: 7454, Obs m/z: 7452.]

Figure 34: SP-NCL of H4-BHC-K91ac160 RP-HPLC of ligations with H4-C, H4-H, and H4-B (A). Crude RP-HPLC of H4-BHC cleaved by NaOH (B).


After three ligation steps, H4-BHC-K91ac was released from the resin by cleavage of the HMBA linker with 0.1 M NaOH. The resin was then washed with TFA for maximum peptide extraction. With the cleavage and extraction combined, we achieved 97% crude yield as assessed by lyophilized weight. The product was sufficiently pure that the solution-phase ligation was carried out with H4-BHC-K91ac without purification.

Solution-phase NCL of H4-ABHC-K5ac,K12ac,K91ac

The solution-phase ligation to generate H4-ABHC proceeded to 90% as assessed by SDS-PAGE (Figure 35A). The mixture was dialyzed extensively against thiol-free SP Wash Buffer to remove the aryl thiol MPAA, which is not compatible with desulfurization by the Danishefsky approach.38 Desulfurization was then carried out (Figure 36) prior to RP-HPLC purification to obtain the final product (Figure 37). After four rounds of ligation and desulfurization (9 chemical steps), the isolated H4-K5ac, K12ac, K91ac product was obtained in 16% yield, which is approximately commensurate with yields observed for total synthesis of H4 by other approaches.144
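
To put the 16% isolated yield over 9 chemical steps in perspective, it can be converted to an average per-step yield. The following is a minimal sketch that assumes every step contributes equally (a geometric mean), which is a simplification and not a measured quantity; the function name `average_step_yield` is hypothetical.

```python
def average_step_yield(overall_yield, n_steps):
    """Geometric-mean yield per step, given the overall fractional yield over n_steps."""
    if not 0 < overall_yield <= 1:
        raise ValueError("overall_yield must be a fraction in (0, 1]")
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    return overall_yield ** (1.0 / n_steps)


if __name__ == "__main__":
    per_step = average_step_yield(0.16, 9)
    print(f"{per_step:.3f}")
```

Under this assumption, a 16% overall yield corresponds to roughly 82% per step, which illustrates how quickly losses compound as the number of handling steps grows.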


[Figure 35 graphic not reproduced. (A) Gel lanes: 0h, 2h, 4h, 6h, and BHC; bands labeled ABHC, BHC, HC, and H4-A. (B) Mass spectrum with peaks numbered 1 to 5; [M + H]+ Exp m/z: 11457, Obs m/z: 11452. Peak assignments: 1, H4-A-K5ac,K12ac-MPAA; 2, H4-HC-K91ac; 3, H4-BHC-K91ac; 4 and 5, H4-ABHC-K5ac,K12ac,K91ac.]

Figure 35: Solution-phase ligation of H4160 (A) SDS-PAGE of solution-phase ligation of H4-A. (B) MALDI-TOF MS of crude ligation.


[Figure 36 spectra not reproduced. Before desulfurization: [M + H]+ Exp m/z: 11457, Obs m/z: 11352. After desulfurization: [M + H]+ Exp m/z: 11361, Obs m/z: 11364.]

Figure 36: Desulfurization of H4-K5ac,K12ac,K91ac MALDI-TOF MS before (top) and after (bottom) desulfurization.

[Figure 37 graphic not reproduced. RP-HPLC gradient: 25-70 % B. MALDI-TOF MS: [M + H]+ Exp m/z: 11361, Obs m/z: 11361.]

Figure 37: Purified H4-K5ac,K12ac,K91ac160 RP-HPLC (top) and MALDI-TOF MS (bottom) of purified H4


Hybrid Phase Ligation of CpA-K124ac

With the successful synthesis of H4 in good yield, we next carried out the total synthesis of CpA-K124ac using the hybrid approach (Figure 38). We again carried out the first three ligations on the solid phase. CpA-3450-K124ac was released from resin by cleavage of the Rink linker. After cleavage, two sequential solution-phase NCL steps were required, as opposed to only one for H4. It was possible to perform HMBA cleavage at a number of steps in solution. However, we found that the thiol mercaptophenylacetic acid (MPAA) in the ligation buffer significantly slowed down HMBA cleavage. Therefore, it is recommended to perform HMBA cleavage before CpA-2 ligation, before CpA-1 ligation, or before desulfurization, when no thiols are present. For this synthesis, HMBA was cleaved before the last ligation with CpA-1.


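[Figure 38 scheme not reproduced. Peptides 3, 4, and 5 are assembled by SP-NCL on a resin bearing a Rink linker and an HMBA-Arg-Gly ligation handle, then released by Rink cleavage. Peptide 2 is ligated in solution, followed by HMBA cleavage, ligation of peptide 1, and conversion to the final product.]

Figure 38: Hybrid-phase NCL scheme of CpA-K124ac160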


SP-NCL of CpA-345-K124ac

Preparation of CpA-345 through SP-NCL was efficient (Figure 39). CpA-3450-K124ac was recovered in 99% yield by lyophilized weight, and in excellent (>90%) purity as assessed by RP-HPLC (Figure 39). This peptide was used directly for the solution-phase ligations without purification.

[Figure 39 graphic not reproduced. RP-HPLC traces and gradients as labeled: CpA-50, 0-73 % B; CpA-450, 30-70 % B; CpA-3450, 30-90 % B. MALDI-TOF MS of CpA-3450: [M + H]+ Exp m/z: 8651, Obs m/z: 8651.]

Figure 39: SP-NCL of CpA-3450-K124ac160 RP-HPLC of CpA-345 SP-NCL and MALDI-TOF MS of product.


Sequential solution-phase ligation of CpA-12345-K124ac

CpA-2-Nbz was added and efficiently ligated in solution to generate CpA-23450-K124ac as assessed by SDS-PAGE (Figure 40A). Addition of methoxylamine directly to the ligation mixture deprotected the N-terminal Cys. The mixture was dialyzed to remove the methoxylamine and the pH was adjusted to 10 using NaOH, which cleaved the HMBA linker to generate CpA-2345-K124ac. The pH was returned to ligation conditions, and CpA-1-Nbz peptide was added to generate CpA-12345-K124ac (Fig. 24). The ligation mixture was then dialyzed extensively prior to desulfurization. After confirming complete desulfurization by MALDI-TOF MS (Figure 41), the protein was purified by RP-HPLC (Figure 42).


[Figure 40 graphic not reproduced. (A) Gel lanes for CpA-2 ligation (CpA-3450 alone, 0h, 2h, 4h, 6h) and CpA-1 ligation (0h, CpA-2345, O/N); bands labeled CpA-12345, CpA-23450, CpA-12, CpA-3450, CpA-1, and CpA-2. (B) RP-HPLC, 30-90 % B, with peaks labeled CpA-12, CpA-12345, and CpA-1345.]

Figure 40: Solution-phase ligation of CENP-A160 SDS-PAGE of solution-phase NCL (A). RP-HPLC of CpA-1 ligation to CpA-2345 (B).

[Figure 41 spectra not reproduced. Before desulfurization: [M + H]+ Exp m/z: 16010, Obs m/z: 16012. After desulfurization: [M + H]+ Exp m/z: 15882, Obs m/z: 15883.]

Figure 41: Desulfurization of CpA-K124ac MALDI-TOF MS before (top) and after (bottom) desulfurization.
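
The expected masses before and after desulfurization in Figures 36 and 41 differ by one sulfur atom per ligation-site Cys, since converting Cys to Ala removes exactly one S. The following is a minimal sketch of that bookkeeping; the function names are hypothetical, and the average atomic mass of sulfur (32.06 Da) is a standard value assumed here rather than taken from the text.

```python
SULFUR_AVERAGE_MASS = 32.06  # Da, standard average atomic mass (assumed constant)


def mass_after_desulfurization(mass_before, n_cys):
    """Expected average mass after converting n_cys Cys residues to Ala."""
    return mass_before - n_cys * SULFUR_AVERAGE_MASS


def cys_converted(mass_before, mass_after):
    """Estimate how many Cys residues were desulfurized from the observed mass shift."""
    return round((mass_before - mass_after) / SULFUR_AVERAGE_MASS)


if __name__ == "__main__":
    # H4 (Figure 36): four peptides, three ligation junctions.
    print(round(mass_after_desulfurization(11457, 3)))
    # CENP-A (Figure 41): five peptides, four ligation junctions.
    print(round(mass_after_desulfurization(16010, 4)))
```

The expected values in both figures are consistent with three desulfurized junctions for H4 and four for CENP-A.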

[Figure 42 graphic not reproduced. (A) RP-HPLC, 40-90 % B. (B) Gel bands labeled CpA12345 and CpA1345. (C) Mass spectrum with peaks numbered 1 to 5; peaks 1, 3, and 5 correspond to the full-length product and peaks 2 and 4 to CpA-1345.]

Figure 42: Purified CpA-K124ac160 RP-HPLC (A), SDS-PAGE (B), and MALDI-TOF MS (C) of purified CpA-K124ac. Expected and observed m/z:
1 CpA-K124ac: [M + 3H]3+ Exp. m/z 5295, Obs. m/z 5297
2 CpA-1345: [M + 2H]2+ Exp. m/z 5755, Obs. m/z 5755
3 CpA-K124ac: [M + 2H]2+ Exp. m/z 7941, Obs. m/z 7944
4 CpA-1345: [M + H]+ Exp. m/z 11509, Obs. m/z 11510
5 CpA-K124ac: [M + H]+ Exp. m/z 15882, Obs. m/z 15885
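
The multiply charged peaks listed for Figure 42 follow from the singly charged values by the usual relation m/z = (M + z × m_proton) / z. The following is a minimal sketch of that relation; the function names are hypothetical, and the proton mass constant is a standard value assumed here.

```python
PROTON_MASS = 1.007276  # Da (assumed standard value)


def neutral_mass_from_mh(mz_singly_charged):
    """Neutral mass M from the m/z of the [M + H]+ ion."""
    return mz_singly_charged - PROTON_MASS


def mz_for_charge(neutral_mass, charge):
    """m/z of the [M + zH]z+ ion for a given neutral mass and positive charge z."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


if __name__ == "__main__":
    full_length = neutral_mass_from_mh(15882)  # CpA-K124ac, expected [M + H]+
    deletion = neutral_mass_from_mh(11509)     # CpA-1345, expected [M + H]+
    print(f"{mz_for_charge(full_length, 2):.1f}")
    print(f"{mz_for_charge(full_length, 3):.1f}")
    print(f"{mz_for_charge(deletion, 2):.1f}")
```

The computed doubly and triply charged values fall within about 1 m/z unit of the expected values listed in the caption, which supports the peak assignments.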


A small amount of CpA-1345 was observed in the purified sample. However, this was acceptable because we expected this species to be eliminated in the octamer refolding process, since it lacks an essential helix for proper folding (Figure 43). We have used this same purification-through-refolding procedure to maximize yields of H359 and H4132 produced by EPL. We find that complete octamer is only formed with the full-length proteins, but that the (H3/H4)2 tetramer will sometimes incorporate a partial protein. In the case of the octamer, partial proteins were removed via size exclusion chromatography.

Figure 43: Nucleosome containing CENP-A PDB: 3AN2194 CpA-2 segment is indicated in light blue. CpA-3, CpA-4, and CpA-5 segments are indicated in red, blue, and green, respectively.

The overall isolated yield of CpA-K124ac was 7% after five rounds of ligation, two cleavage steps, desulfurization, and RP-HPLC purification. The yield was comparable to the current yield for H3 total synthesis using solution-phase NCL, which is 5-7%.132 The similar yields were expected, since the same number of solution-phase ligation steps were performed for both CENP-A and H3.

Nucleosome reconstitution using synthetic and semi-synthetic histones

This section discusses the successful incorporation of synthetic histones in nucleosomes, confirming that the products were functional as histones. We first describe nucleosome reconstitution using semi-synthetic H4-K79ac and recombinant CENP-A, in order to confirm that CENP-A nucleosome could be readily prepared without issues. We then use H4 and CENP-A synthesized through hybrid-phase NCL to produce histone octamer, tetramer, and nucleosomes.

Semi-synthesis of H4-K79ac

In order to prepare the thioester, H4(1-75)-intein-CBD in pTXB1 plasmid was expressed in BL21 strain. Expressed protein was solubilized from the inclusion bodies using a denaturant such as urea. Denaturant was gradually removed by dialysis. Once the intein folds into its native structure, it catalyzes the N to S acyl rearrangement, forming a thioester bond between H4(1-75) and intein-CBD. Addition of excess thiol such as sodium 2-sulfanylethanesulfonate (MESNa) displaces the intein-CBD, forming the H4(1-75)-SR (Figure 44). Typically, the separation of cleaved thioester and intein-CBD is achieved using a chitin column. The expressed thioester elutes from the column while the CBD-containing species remain bound. However, in the case of histones, including H3 and H4, the cleaved thioester interacts with chitin, preventing efficient elution. For this reason, we instead exploit the high pI of histone thioesters to separate the products by cation exchange chromatography. It can be difficult to separate H4(1-75)-SR and the uncleaved H4(1-75)-intein-CBD, but the latter species can be removed by RP-HPLC purification.

[Figure 44 scheme not reproduced. Steps shown: N to S acyl shift of H4(1-75)-intein-CBD; thiol exchange with HS-R to give H4(1-75)-SR; NCL with the SPPS-derived 76-102 peptide bearing an N-terminal Cys; desulfurization.]

Figure 44: H4-K79ac EPL scheme

After cleavage, the synthesized C-terminal H4 peptide with the K79ac modification was added directly to the solution of H4(1-75)-SR. Ligation was complete in 4 h. After desulfurization and RP-HPLC purification, we obtained a homogeneous sample of H4-K79ac (Figure 45).


[Figure 45 graphic not reproduced. (A) Gel lanes: 0h, 2h, 4h, O/N; bands labeled H4(1-75)-intCBD, intCBD, H4-K79ac, H4(1-75), and K79ac pep. (B) Gel band labeled H4-K79ac. (C) RP-HPLC, 0 to 30 min. (D) MALDI-TOF MS, m/z 9000 to 11000: [M + H]+ observed m/z 11277, expected m/z 11277.]

Figure 45: H4-K79ac EPL (A) SDS-PAGE of H4(1-75)-SR and K79ac peptide ligation. (B) SDS-PAGE of purified H4-K79ac. (C) RP-HPLC and (D) MALDI-TOF MS of purified H4-K79ac.

Recombinant expression of His6-tagged CENP-A

We used the semi-synthetic H4-K79ac and the expressed CENP-A constructs for incorporation in nucleosomes. Wild-type CENP-A containing a His6-tag was expressed from a codon-optimized CENP-A gene in a constitutively expressing pHCE vector using literature protocols.158 A thrombin cleavage sequence between the tag and the protein allowed for removal of the His6-tag. Using site-directed mutagenesis, we successfully prepared the mutant His6-CpA-C75S.

Refolding and reconstitution of recombinant and semi-synthetic histones

Histones could be reconstituted into nucleosomes either from refolded histone octamer cores156 or from separately refolded (H3/H4)2 tetramers and H2A/H2B dimers.212 Even with the His-tag, recombinant CENP-A successfully refolded into an octamer with H4, H2A, and H2B (Figure 46). Semi-synthetic H4-K79ac was also refolded into octamers. We then used these octamer constructs for nucleosome reconstitution on 601 DNA157 using the nucleosome assembly protein Nap1213. With these successful reconstitutions, we next conducted the reconstitutions of the synthetic histones.

[Figure 46 gel images not reproduced. Bands labeled Nuc, CpA, H2A/H2B, DNA, and H4.]

Figure 46: Refolding and reconstitution of recombinant CENP-A (A) SDS-PAGE of histone octamer with recombinant CENP-A. (B) PAGE of CENP-A nucleosomes prepared with Nap1-assisted nucleosome reconstitution.


Refolding and reconstitution of synthetic histones

To confirm that we had synthesized functional histones, we performed refolding using the synthetic H4 and CENP-A. Refolding into the relevant protein complexes was carried out with Cecil (CJ) Howard. Recombinant H3 and synthetic H4-K5ac, K12ac, K91ac were refolded into (H3/H4)2 tetramer (Figure 47A). Recombinant H2A, H2B, H3, and synthetic CpA-K124ac were refolded into octamer (Figure 47A). Of note, after careful MALDI-TOF MS analysis, we found that desulfurization of H4-K5ac, K12ac, K91ac was not complete. We therefore resuspended the lyophilized protein in desulfurization buffer. After confirming the completion of the reaction, we were able to use H4 in desulfurization conditions directly for tetramer refolding. This suggested that sufficiently pure proteins could be refolded immediately following desulfurization without the need for purification.


[Figure 47 graphic not reproduced. (A) Gel lanes 1 to 3; bands labeled CpA-K124ac, H3, H2A/H2B, H4, and H4-Kac3. (B) Gel lanes labeled DNA, H3, and CpA-K124ac; bands labeled Nuc and DNA. (C) Nucleosome structure.]

Figure 47: Refolding of synthetic histones160 (A) SDS-PAGE of CpA-K124ac histone octamer (2), and H4-K5ac,K12ac,K91ac tetramer (3). (B) Salt dialysis reconstitution of nucleosome with synthetic CpA-K124ac. (C) Nucleosome with CENP-A (PDB: 3AN2).194

As predicted, the 5% CpA-1345 deletion product was eliminated through the octamer refolding process, similar to effects observed for semi-synthetic H3 and H4.132 These protein complexes will be taken forward for further study of the effects of these modifications on nucleosome structure and dynamics.

Conclusions

In conclusion, we demonstrated a simple hybrid ligation approach that combines both solid and solution-phase ligation chemistry for optimal yields of challenging synthetic histone protein targets. We maximized product yields through resin cleavage at an external Rink linker, with subsequent cleavage at an internal HMBA linker to generate the native carboxyl terminus. We used this approach for synthesis of a triple-modified H4 histone and, notably, for the challenging target CpA-K124ac, which could not be accessed using more common expression-based approaches. We find that the key step in hybrid ligation is monitoring yields of SP-NCL to determine whether there is a turnover point at which reduced release from the resin outweighs the chemical advantage of solid-phase reactions.


Acknowledgements

Dr. Santosh Mahto contributed to the initial studies of H4 SP-NCL, including the development of the dual linker strategy and the determination of the optimal ligation sites of H4. The condition to cleanly convert H4-A-Dbz and CpA-2-Dbz to Nbz using dry DMF was developed in collaboration with Dr. John Shimko and Kurt Justus. Mallory Alexander assisted in the analysis of the required CENP-A peptides. Cecil Howard assisted with refolding the synthetic H4-K5ac,K12ac,K91ac and CpA-K124ac into tetramer and octamer, respectively. Thanks to Dr. Kurumizaka and Dr. Dalal for providing the pHCE vector containing the codon-optimized His6-tagged CENP-A.


Chapter 3: Convergent Hybrid-Phase Native Chemical Ligation

Introduction

Our prior chapter demonstrates some of the problems inherent to sequential ligation, including decreased yield due to lengthy peptides and additional purification steps. While solid-phase and hybrid-phase ligation ameliorated some of these issues, improved methods are necessary for high yield of fully synthetic proteins. Two main approaches have been suggested: one-pot and convergent schemes. In one-pot approaches, all ligation reactions are carried out in a single reaction vessel.139,144 Yields are often reduced due to the complex schemes, and careful procedures are required. Recently, a convergent ligation scheme was shown to be more efficient than one-pot ligation for the synthesis of H2B.214

Here, we develop convergent hybrid-phase NCL to further improve our synthetic yield (Figure 48). In Chapter 2, we demonstrated the synthesis of CENP-A with hybrid-phase ligation, but the method required two rounds of ligation in solution after the cleavage of CpA-345. In solution, we were required to carry out CpA-2 ligation, Thz deprotection, dialysis, HMBA cleavage, CpA-1 ligation, dialysis, desulfurization, and finally purification. With these multiple handling steps, the yield was expectedly low. We hypothesized that if only one solution-phase ligation was required, we could improve the overall yield significantly and minimize side products.

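[Figure 48 scheme not reproduced. Peptides 1 and 2 are ligated to each other, peptides 3, 4, and 5 are assembled by SP-NCL and cleaved from the resin, and the two products are joined by convergent ligation to give the full-length protein.]

Figure 48: Convergent ligation of CENP-A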

In this chapter, we propose a convergent approach where CpA-1 and CpA-2 are ligated to form CpA-12, which is then ligated to CpA-345 (Fig. 27). This simplifies the solution-phase component of hybrid-phase ligation. The convergent scheme requires the development of a cryptic thioester at the C-terminus of CpA-2. We will then introduce convergent SP-hybrid NCL, a further refinement to improve CENP-A total synthesis.


Experimental Methods

Hydrazinolysis of peptide Nbz and peptide HMBA

Peptide Nbz was dissolved in 0.1 M Phosphate, 6 M GuHCl, 1% v/v hydrazine, pH 7. Peptide HMBA was dissolved in 0.1 M Phosphate, 6 M GuHCl, 1% v/v hydrazine. Reaction was complete in 2 h, and product was confirmed with RP-HPLC and MALDI-TOF MS.

Preparation of Hydrazide resin using Wang

Wang resin155 (100-200 mesh) (Novabiochem) was swelled in 14 mL DCM and 160 µL N-methylmorpholine (final concentration was approximately 100 mM). NPCF was added to a final concentration of 100 mM at 0 °C. The mixture was brought to room temperature, and the resin was stirred overnight.
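
The stated N-methylmorpholine concentration of approximately 100 mM can be checked from the volumes given. The following is a minimal sketch; the density (0.92 g/mL) and molecular weight (101.15 g/mol) of N-methylmorpholine are literature values assumed here, not values from this procedure, and the function name is hypothetical. Volumes are treated as additive.

```python
def neat_liquid_concentration_mM(liquid_uL, density_g_per_mL, mw_g_per_mol, solvent_mL):
    """Approximate concentration (mM) of a neat liquid reagent diluted into a solvent."""
    liquid_mL = liquid_uL / 1000.0
    millimoles = liquid_mL * density_g_per_mL / mw_g_per_mol * 1000.0  # g -> mol -> mmol
    total_mL = solvent_mL + liquid_mL  # assumes additive volumes
    return millimoles / total_mL * 1000.0  # mmol/mL -> mM


if __name__ == "__main__":
    # 160 µL N-methylmorpholine in 14 mL DCM; density and MW are assumed literature values.
    print(round(neat_liquid_concentration_mM(160, 0.92, 101.15, 14.0)))
```

With these assumed constants the result is close to the approximately 100 mM quoted in the procedure.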

The resin was flow-washed with DCM three times, followed by three flow-washes in DMF. Resin turned bright yellow upon the addition of DMF. The resin was flow-washed three times with methanol, followed by three flow-washes in DCM. The resin was drained and lyophilized.


At 0 °C, 15 mL DMF, 11 mL DCM, and 50 µL hydrazine (30 mM) were added to the lyophilized resin. The resin turned yellow upon the addition of the mixture. The mixture was brought to room temperature, and the resin was nutated overnight.

The resin was flow-washed three times with DCM. The resin at this point was white. The resin was then flow-washed three times in the following sequence: DMF, methanol, DCM. The resin was drained and lyophilized. CpA-2 was synthesized on the hydrazide resin using standard SPPS protocols.

Solution-phase NCL of CpA-12-Dbz

Ligation with CpA-1-Nbz

CpA-1-Nbz (1.5 molar equivalents relative to CpA-2-Dbz) was dissolved in SP Ligation Buffer (see Experimental Methods in Chapter 2). CpA-2-Dbz was added, and 1 M TCEP pH 7.4 was added to a final concentration of 20 mM. The final concentration of CpA-2-Dbz should be at least 1 mM to allow for rapid ligation. Ligation was allowed to proceed for at least 5 h, and the reaction was monitored by RP-HPLC and MALDI-TOF MS. Ligated product was purified using RP-HPLC. Pure fractions were confirmed by RP-HPLC and MALDI-TOF MS, and the fractions were combined and lyophilized.


Ligation with CpA-1-Dbz

CpA-1-Dbz was dissolved in 0.1 M Phosphate, 6 M GuHCl, pH 3, and incubated at -15 °C for 15 minutes. A -15 °C ice bath was prepared by mixing 2 kg of ice with 10 g of NaCl. Temperature was adjusted by adding more ice or salt. 200 mM NaNO2 was prepared in water, and this solution was added to the CpA-1-Dbz solution to a final concentration of 20 mM NaNO2. The peptide solution was mixed by pipetting up and down using a micropipette, and further incubated at -15 °C for 15 minutes. The solution was then taken out of the ice bath and brought to room temperature. CpA-2-Dbz was dissolved in 0.1 M Phosphate, 6 M GuHCl, 0.2 M MPAA, pH 7.4. This solution was added to the CpA-1 solution, and the pH was adjusted to 7.4. After combining the two peptide solutions, the final concentration of MPAA was 0.1 M. The final concentration of CpA-1-Dbz should be at least 1 mM to promote rapid ligation. We found that the best kinetics were achieved at 3 mM peptide. After 1 h, TCEP was added to a final concentration of 20 mM. Ligation was allowed to proceed for at least 5 h, and reaction completion was assessed by RP-HPLC and MALDI-TOF MS. Ligated product was purified using RP-HPLC, and the pure fractions were combined and lyophilized.
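
The stock additions in this procedure (for example, 200 mM NaNO2 added to reach 20 mM, or a 0.2 M MPAA solution combined with the peptide solution to reach 0.1 M) follow from simple dilution arithmetic. The following is a minimal sketch; the function names `spike_volume` and `mixed_concentration` are hypothetical, and volumes are treated as additive.

```python
def spike_volume(sample_volume, stock_conc, final_conc):
    """Volume of stock to add to a sample so the mixture reaches final_conc.

    Solves stock_conc * v = final_conc * (sample_volume + v) for v.
    Volume units follow sample_volume; concentration units must match each other.
    """
    if final_conc >= stock_conc:
        raise ValueError("final concentration must be below the stock concentration")
    return sample_volume * final_conc / (stock_conc - final_conc)


def mixed_concentration(conc_a, volume_a, conc_b, volume_b):
    """Concentration of a solute after combining two solutions."""
    return (conc_a * volume_a + conc_b * volume_b) / (volume_a + volume_b)


if __name__ == "__main__":
    # 200 mM NaNO2 stock into 900 µL of peptide solution, targeting 20 mM.
    print(spike_volume(900.0, 200.0, 20.0))
    # Equal volumes of 0.2 M MPAA buffer and MPAA-free peptide solution.
    print(mixed_concentration(0.2, 1.0, 0.0, 1.0))
```

For a 200 mM stock and a 20 mM target, the stock volume is one ninth of the sample volume; combining equal volumes of the two peptide solutions gives the 0.1 M MPAA stated above.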

Solution-phase NCL of CENP-A using CpA-12-Dbz and CpA-345

Ligation of CpA-12-Dbz and CpA-345

Ligation conditions for CpA-12-Dbz and CpA-345 were identical to those of CpA-1-Dbz and CpA-2-Dbz. CpA-12-Dbz (1.5 molar equivalents relative to CpA-345) was dissolved in 0.1 M Phosphate, 6 M GuHCl, pH 3, and incubated at -15 °C for 15 minutes. NaNO2 was added to a final concentration of 20 mM, and the solution was further incubated at -15 °C for 15 minutes. The solution was taken out of the ice bath and brought to room temperature. CpA-2-Dbz was dissolved in 0.1 M Phosphate, 6 M GuHCl, 0.2 M MPAA, pH 7.4. This solution was added to the CpA-1 solution, and the pH was adjusted to 7.4. The final concentration of MPAA should be 0.1 M, and the final concentration of CpA-1-Dbz should be at least 1 mM. After 1 h, TCEP was added to a final concentration of 20 mM. Ligation was allowed to proceed for at least 5 h, and the reaction was monitored by SDS-PAGE, RP-HPLC, and Ziptip MALDI-TOF MS. Samples for SDS-PAGE were TCA precipitated before loading. Procedures for Ziptip and TCA precipitation are described in Experimental Methods in Chapter 2.

Desulfurization of CpA-12345

Sample was dialyzed using the D-tube dialyzer against 200 mL of SP Wash Buffer at 4 °C. Buffer change was performed after 5 h, and the second dialysis was allowed to proceed overnight. The sample was then transferred to a 2.0 mL tube, and the dialysis tube was rinsed with a small volume of SP Wash Buffer, which was added to the same 2.0 mL tube. The sample was diluted 5-fold from the original volume during ligation in order to minimize precipitation during desulfurization. To a solution of 1 M TCEP, GuHCl and MESNa were added to make 6 M GuHCl and 0.2 M MESNa, respectively. The buffer was spun down, and the supernatant was added to the CENP-A sample to make 0.25 M TCEP and 50 mM MESNa. The sample was sparged with Argon for 30 minutes. 0.5 M VA-044-US in water was prepared, and this solution was added to the sample to a final concentration of 30 mM, and the sample was incubated in a 42 °C water bath. Reaction was allowed to proceed for at least 5 h. Complete desulfurization was confirmed by RP-HPLC and Ziptip followed by crude MALDI-TOF MS.
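The reagent volumes for the desulfurization can be estimated from the stock and target concentrations. This is a sketch only: the helper name and the 300 µL sample volume are hypothetical, and only the concentrations are taken from the procedure above.

```python
def stock_volume(sample_volume, stock_conc, final_conc):
    """Volume of stock to add so that the added species reaches final_conc
    in the combined volume (sample + stock). Concentrations share one unit."""
    if not 0 < final_conc < stock_conc:
        raise ValueError("final_conc must be between 0 and stock_conc")
    return sample_volume * final_conc / (stock_conc - final_conc)


sample_ul = 300.0  # hypothetical volume of the dialyzed, diluted sample

# 1 M TCEP / 0.2 M MESNa buffer added to reach 0.25 M TCEP.
tcep_buffer_ul = stock_volume(sample_ul, 1.0, 0.25)

# MESNa concentration (M) that results from the same addition.
mesna_after = 0.2 * tcep_buffer_ul / (sample_ul + tcep_buffer_ul)

# 0.5 M VA-044-US added to reach 30 mM in the combined volume.
va044_ul = stock_volume(sample_ul + tcep_buffer_ul, 0.5, 0.030)
```

Under these assumptions the TCEP buffer is added at one third of the sample volume, which brings MESNa to the stated 50 mM at the same time. The later VA-044-US addition dilutes TCEP and MESNa slightly; the sketch ignores that effect.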

Glycolic acid base resin

Synthesis of glycolic acid base resin

Diglycolic acid –Ala-Ahx-Lys-Gly-Rink-PEGA

3 mL of PEGA resin swollen in methanol was measured, which equated to 0.03 mmol of reactive amine. A one-tenth loading cut was performed on Gly. Fmoc-Ala-OH, Fmoc-Lys(Boc)-OH, and Fmoc-Ahx-OH were coupled using standard manual synthesis conditions. Diglycolic anhydride was loaded using standard coupling conditions with HCTU and DIEA. The final volume of the resin in DMF was 2.7 mL. The estimated loading of the base resin in DMF was calculated to be 0.001 mmol/mL. The resin was stored at 4 °C in DCM.
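The estimated loading follows from the numbers given above. The sketch below assumes that the loading cut leaves exactly one tenth of the reactive amine and that every later coupling is quantitative; the variable names are illustrative.

```python
reactive_amine_mmol = 0.03   # 3 mL of PEGA resin swollen in methanol
loading_cut_fraction = 0.1   # one-tenth loading cut at Gly
final_volume_ml = 2.7        # resin volume in DMF after synthesis

# Amine remaining after the loading cut (assumed to carry through all couplings).
loaded_mmol = reactive_amine_mmol * loading_cut_fraction

# Estimated loading of the base resin in DMF, in mmol per mL of swollen resin.
loading_mmol_per_ml = loaded_mmol / final_volume_ml
```

The result is about 0.0011 mmol/mL, which rounds to the 0.001 mmol/mL (1.0 µmol/mL) reported for the glycolic resin.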

Resin thioesterification

Resin was swollen in DMF for 15 minutes. A solution containing 0.3 M Thiophenol, 0.3 M DIC, and 2 mM DMAP in DMF was added to the resin. After 1 h, test cleavage was performed to confirm the completion of the reaction. A small resin sample was washed with DMF, then with DCM. Vacuum-dried resin was incubated in TFA for 15 minutes. TFA was eluted, evaporated with N2, and diluted with water. Sample was analyzed with RP-HPLC and MALDI-TOF. If unreacted glycolic acid was observed, the resin was washed with DMF, and the thioesterification was repeated. Analysis was repeated after 1 h. The reaction was typically complete after the second reaction.

Ligation of CpA-2-Dbz-Gly-Lys(Cys)

After confirming the completion of thioesterification, the resin was immediately washed with DMF, and then with water. The resin was then washed three times with SP Wash Buffer and SP Ligation Buffer. Wash here refers to the flow-wash and nutation step discussed in Experimental Methods for Chapter 2. Buffer was drained from the resin, and CpA-2-Dbz-Gly-Lys(Cys) dissolved in SP Ligation Buffer was added to the resin. TCEP was added to make 20 mM, and the ligation was allowed to proceed for at least 5 h.

Ligation of CpA-1-Dbz

The procedure to oxidize CpA-1-Dbz is identical to that of solution-phase ligation using CpA-1-Dbz. After adding MPAA and adjusting the pH to 7.4, the solution was added to drained resin washed with SP Wash Buffer + 100 mM MPAA. After 1 h ligation, 1 M TCEP pH 7.4 was added to a final concentration of 20 mM.

Cleavage of CpA-12-Dbz0

After washing the resin 3 times with SP Wash Buffer and 3 times with water, the resin was lyophilized inside the column. Peptide was cleaved with 95:2.5:2.5 TFA:H2O:TIS for 1 h. Resin was washed 3 times with TFA. All TFA eluates were combined and concentrated using a flow of N2. Et2O was added to at least 5 times the volume of TFA. The sample was centrifuged and decanted. The pellet was dissolved in water and ACN, and lyophilized.

SDS-PAGE of TFA-treated resin

Resin was washed 3 times with water and the eluates were collected. The resin was drained or decanted, and transferred to a tube. SDS loading buffer containing 100 mM DTT was added, and the resin was incubated at 95-100 °C for 5 min. The resin was loaded on the gel along with the loading buffer using a cut-off tip.

Base resin sequences

The SP-NCL resin for the synthesis of CpA-345 used here had been modified so that the unligated base resin handle could be observed on the RP-HPLC and MALDI-TOF MS.

SP-NCL resin: Fmoc-Thz-Ala-Ahx-Lys-Gly-Rink-PEGA (1.33 µmol/mL in methanol)

Glycolic resin: Glycolic acid - Ala-Ahx-Lys-Gly-Rink-PEGA (1.0 µmol/mL in DMF)

Quantification of product from dry PEGA resin

If the final dry weight of PEGA resin after ligation is known, the theoretical starting weight of the PEGA resin can be calculated using the following equation:

$$\text{Starting weight} = \frac{\text{Final weight}}{1.06 + 0.02\,\frac{\text{mmol}}{\text{g}}\,\left(MW_{\text{total}}\right)}$$

where $MW_{\text{total}}$ is the total molecular weight of all the components added on the resin, starting from the first Gly to the full SP-NCL peptide. Once the starting weight is calculated, the theoretical yield of the cleaved peptide is calculated using the following equation:

$$\text{Theoretical yield} = \left(\text{Starting weight}\right)\left(0.02\,\frac{\text{mmol}}{\text{g}}\right)\left(MW_{\text{peptide}}\right)$$

where $MW_{\text{peptide}}$ is the molecular weight of the cleaved peptide product.
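The two quantification equations can be applied as in the following sketch. The constants 1.06 and 0.02 mmol/g are taken from the equations above; the function names and the example numbers are hypothetical. The sketch assumes molecular weights are entered in g/mmol (numerically equal to kDa) so that the denominator of the first equation is dimensionless, which is an interpretation and not something stated in the text.

```python
RESIN_LOADING_MMOL_PER_G = 0.02  # loading term from the equations above
BASE_WEIGHT_FACTOR = 1.06        # constant term from the first equation


def starting_weight(final_weight_g, mw_total_g_per_mmol):
    """Theoretical starting weight of dry PEGA resin (g) from its final dry weight."""
    return final_weight_g / (
        BASE_WEIGHT_FACTOR + RESIN_LOADING_MMOL_PER_G * mw_total_g_per_mmol
    )


def theoretical_yield(starting_weight_g, mw_peptide_g_per_mmol):
    """Theoretical yield of cleaved peptide (g) from the starting resin weight."""
    return starting_weight_g * RESIN_LOADING_MMOL_PER_G * mw_peptide_g_per_mmol


# Hypothetical example: 12.2 mg of dry resin carrying 8.0 kDa of added mass,
# all of which is released as the cleaved peptide.
start_g = starting_weight(0.0122, 8.0)
yield_g = theoretical_yield(start_g, 8.0)
```

In this example the starting resin weight is 10 mg and the theoretical yield is 1.6 mg of peptide; the crude yield would then be the recovered peptide mass divided by that value.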


Results and Discussion

Hydrazide as a cryptic thioester for convergent ligation

Convergent ligation requires protection schemes to prevent reaction of the internal thioester while initial ligation steps are carried out. In order to develop an efficient cryptic thioester for CENP-A convergent ligation, we first examined the hydrazide functionality.

Peptide hydrazides are unreactive under ligation conditions, but can be activated through oxidation to peptide azide using NaNO2. The azide can then be displaced by a thiol to form a thioester in situ (Figure 49).143 Further, it has been used successfully for the total synthesis of histones.144,146

[Scheme: peptide hydrazide is oxidized with NaNO2 to the peptide azide, which is displaced by a thiol to give the peptide thioester]

Figure 49: Thioester conversion from peptide hydrazide

Synthesis of CpA-2 on hydrazide resin

Initially, we attempted to directly synthesize CpA-2 on a resin with a hydrazide linker,215 which was prepared from Wang resin (Figure 50). This synthesis was not successful. Although the 16-residue intermediate could be observed, the expected product could not be observed by the end of the synthesis (Figure 51).

[Scheme: the resin hydroxyl is activated with PNCF and then treated with hydrazine to give the hydrazide base resin]

Figure 50: Preparation of hydrazide base resin

[RP-HPLC traces, 0-73% B, time axis 0-30 min: CpA-2-N2H3 16-mer; CpA-2-N2H3 29-mer; CpA-2-N2H3]

Figure 51: Synthesis of CpA-2-N2H3

RP-HPLC of CpA-2-N2H3 at 16-residue, 29-residue, and full-length (36 residues).

CpA-2-N2H3 by hydrazinolysis of Nbz

Given that the CpA-2 peptide itself can be synthesized on simple linkers, we hypothesized that the hydrazide linker might have posed a problem during synthesis. We therefore proposed an approach in which an internal linker was displaced by hydrazine to allow preparation of the protected derivative. Nbz can be cleaved with hydrazine to yield peptide hydrazide,146 so we attempted to convert CpA-2-Nbz(formyl) to CpA-2-N2H3 using this method. The proposed convergent ligation scheme is illustrated in Figure 52.

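[Scheme: CpA-2-Nbz is treated with hydrazine to give the CpA-2 hydrazide; ligation with CpA-1 gives the CpA-12 hydrazide; NaNO2 converts the hydrazide to the azide, which is ligated to CpA-345 in the presence of MPAA to give CpA-12345]

Figure 52: Convergent ligation using hydrazide: hydrazinolysis of Nbz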

Hydrazinolysis of CpA-2-Nbz(formyl) generated an unknown species with m/z of 4614 (Figure 53). The expected hydrazide product was also observed, but the unknown species seemed to be the major product. We hypothesized that the condensation between the formyl group and the hydrazine reagent led to a rearrangement to generate the unknown product.

We therefore performed hydrazinolysis again using CpA-2-Nbz. As discussed in Chapter 2, Nbz conversion using dry NMP generated CpA-2-Nbz with no formylation. Although CpA-2-N2H3 was observed, we found very little product in the supernatant of the peptide sample as assessed by RP-HPLC (Figure 54). The peptide sample was relatively dilute and we did not observe visible precipitation, but it appeared that CpA-2-N2H3 had marginal solubility even in 6 M GuHCl. The solubility issue of CpA-2-N2H3 would partially explain why synthesis of CpA-2 on a hydrazide resin was so poor.

[Panels: (A) RP-HPLC, 30-60% B, 0-30 min; (B) RP-HPLC, 30-60% B, 0-30 min; (C) MALDI-TOF MS, m/z 4000-5000, showing the species at m/z 4614 and the CpA-2 hydrazide, [M + H]+ observed m/z 4154, expected m/z 4152]

Figure 53: Hydrazinolysis of CpA-2-Nbz(formyl)

(A) RP-HPLC of CpA-2-Nbz(formyl). (B) RP-HPLC of CpA-2-Nbz(formyl) after the addition of hydrazine. (C) MALDI-TOF MS of the major peak in (B).

[Panels: (A) RP-HPLC, 30-60% B, 0-30 min; (B) RP-HPLC, 30-60% B, 0-30 min]

Figure 54: Hydrazinolysis of CpA-2-Nbz

(A) RP-HPLC of CpA-2-Nbz. (B) RP-HPLC of CpA-2-N2H3.

CpA-12-N2H3 by hydrazinolysis of HMBA

Since it was not possible to prepare CpA-2-N2H3 due to solubility issues, we decided to perform hydrazinolysis after ligating CpA-1 to CpA-2. CpA-1 is very soluble, so CpA-12-N2H3 should have improved solubility over CpA-2-N2H3. To do this, it was necessary to use CpA-2 with a C-terminal HMBA, since using CpA-2-Nbz would lead to cyclization and hydrolysis during ligation. HMBA, like Nbz, can be cleaved with hydrazine to yield peptide hydrazide,216 but unlike Nbz, it is stable under ligation conditions. The convergent ligation scheme using HMBA is shown in Figure 55.

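[Scheme: CpA-1 is ligated to CpA-2-HMBA to give CpA-12-HMBA; hydrazine cleaves the HMBA to give the CpA-12 hydrazide; NaNO2 converts the hydrazide to the azide, which is ligated to CpA-345 in the presence of MPAA to give CpA-12345]

Figure 55: Convergent ligation using hydrazide: hydrazinolysis of HMBA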

For this purpose we synthesized CpA-2-HMBA-Arg-Gly-Dbz (Figure 56). Performing hydrazinolysis after CpA-1 ligation improved solubility, but the presence of a side product made purification difficult. The yield of partially pure CpA-12-N2H3 was less than 15% from the starting CpA-2-HMBA-Arg-Gly-Dbz (Figure 57).

[Panels: RP-HPLC, 25-50% B, 0-30 min; MALDI-TOF MS, m/z 4000-5000, [M + H]+ observed m/z 4903, expected m/z 4906]

Figure 56: CpA-2-HMBA-Arg-Gly-Dbz

RP-HPLC (left) and MALDI-TOF MS (right) of CpA-2-HMBA-Arg-Gly-Dbz. Note the C-terminal Dbz was originally intended for ligating CpA-2 to a solid support, so that CpA-12 could be prepared in solid-phase. The SP-NCL of CpA-12 is not relevant to the study conducted here, and will be discussed in a later section.

[RP-HPLC traces, 25-45% B, 0-30 min: (A) CpA-12-HMBA-RG-Dbz; (B) CpA-12-N2H3; (C) CpA-12-N2H3, semi-pure. One peak in each trace is marked with an asterisk.]

Figure 57: Ligation and hydrazinolysis of CpA-12

RP-HPLC of CpA-1 and CpA-2-HMBA-RG-Dbz ligation (A). Hydrazinolysis of CpA-12 (B). Semipure CpA-12-N2H3 (C).

Generating hydrazide through hydrazinolysis of Nbz and HMBA was not compatible with CpA-2. We therefore needed to find another suitable functional group to serve as the masked thioester.

Using Dbz as a cryptic thioester for convergent ligation

While searching for the ideal cryptic thioester, we found that recent advances by the Dawson217 and Liu218 laboratories demonstrate that peptide-Dbz can be directly converted into a reactive thioester. Dbz is oxidized using NaNO2, and the resulting triazole functionality can be displaced by an external thiol.218 This approach completely bypasses the Nbz conversion step. As illustrated in Figure 58, Dbz is a very versatile linker that can be used for the efficient production of thioester, as well as serve as a cryptic thioester.

[Scheme: conversion of peptide-Dbz to peptide thioester; reagent labels: NPCF/DMF, NPCF/DCM, NaNO2, hydrazine, MPAA]

Figure 58: Preparation of thioester using Dbz

The ability to use Dbz as a thioester precursor had several advantages over Nbz. Elimination of a chemical step improved the yield of the peptide. Nbz is more labile than Dbz, and even under the acidic conditions of RP-HPLC, hydrolysis products were observed if the peptides were not lyophilized immediately. Peptide was much more stable as the Dbz derivative. However, it must be noted that direct activation of Dbz cannot replace Nbz entirely as the means to produce thioester. Ligation using peptide Dbz with an N-terminal Thz generated a ring-opened side product. Therefore, ligation using Dbz was only suitable for peptides lacking Thz.

Convergent Hybrid-Phase NCL of CENP-A

Figure 59 illustrates the scheme for the convergent hybrid-phase NCL approach. CpA-345 is prepared through SP-NCL as described in Chapter 2. CpA-345 is cleaved at the HMBA rather than the Rink linker in order to reduce the number of chemical steps performed in solution. CpA-12-Dbz is prepared through solution-phase NCL using CpA-2-Dbz and CpA-1-Nbz. CpA-12-Dbz is activated using NaNO2, and convergent ligation with the cleaved CpA-345 generates the full-length product. Compared to the previous hybrid-phase NCL approach, this method eliminates a solution-phase Thz deprotection step, a dialysis step, and a solution-phase NCL step. With this new strategy, we anticipated an improvement in yield of CENP-A total synthesis.

[Scheme: CpA-12-Dbz is prepared by solution-phase NCL and CpA-345 by SP-NCL followed by NaOH cleavage; CpA-12-Dbz is activated with NaNO2 and ligated to CpA-345 in the presence of MPAA to give CpA-12345]

Figure 59: Convergent hybrid-phase NCL of CENP-A

Solution-phase ligation of CpA-12-Dbz

CpA-1-Nbz and CpA-2-Dbz were dissolved in MPAA ligation buffer. CpA-1 converted to thioester while CpA-2-Dbz remained inert, and the ligation was efficient (Figure 60). The product was purified by RP-HPLC to remove excess MPAA and unreacted peptides, providing a 40% purified yield.

[Panels: (A) RP-HPLC, 15-50% B, 0-30 min, peaks labeled MPAA, CpA-12-Dbz, CpA-1, and CpA-2-Dbz; (B) RP-HPLC, 25-50% B, 0-30 min; (C) MALDI-TOF MS, m/z 7000-10000, [M + H]+ observed m/z 8144, expected m/z 8144]

Figure 60: Solution-phase ligation of CpA-12-Dbz

Crude RP-HPLC of the ligation (A). RP-HPLC (B) and MALDI-TOF MS (C) of purified CpA-12-Dbz.

Solution-Phase NCL of CpA-12345-K124ac

With both CpA-12-Dbz and CpA-345 in hand, we carried out the convergent ligation in solution-phase. CpA-12-Dbz was converted to the triazole derivative using NaNO2, while CpA-345 was dissolved in MPAA ligation buffer. The two peptides were combined, generating the CpA-12 thioester in situ. CpA-12 was efficiently ligated to produce the full-length CpA-12345 (Figure 61).

[RP-HPLC trace, 30-90% B, 0-30 min, peaks labeled CpA-12 and CpA-12345]

Figure 61: Solution-phase ligation of CpA-12 and CpA-345

RP-HPLC of convergent ligation after 16 h.

Overnight dialysis was carried out in order to remove the MPAA before the free-radical desulfurization. After confirming complete desulfurization by RP-HPLC and MALDI-TOF MS (Figure 62), the protein was purified with RP-HPLC (Figure 63).
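The mass shift reported in Figure 62 can be used as a quick consistency check on the desulfurization. Each Cys-to-Ala conversion removes one sulfur atom, about 32.06 Da on average. The sketch below uses the m/z values from Figure 62; the function name is illustrative, and reading the result as a count of converted ligation-site cysteines is an assumption.

```python
SULFUR_AVG_MASS = 32.06  # Da, average atomic mass of sulfur


def sulfur_atoms_lost(mz_before, mz_after):
    """Estimate how many sulfur atoms were removed from singly charged ions."""
    return (mz_before - mz_after) / SULFUR_AVG_MASS


# Expected and observed [M + H]+ values before and after desulfurization.
expected_sites = sulfur_atoms_lost(16008, 15881)
observed_sites = sulfur_atoms_lost(16001, 15882)
```

Both estimates round to four, which would be consistent with one converted cysteine at each of the four junctions of a five-segment assembly. The observed value deviates more from an integer, as would be expected from the mass accuracy of MALDI-TOF MS at this size.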


[Panels: (A) MALDI-TOF MS; before desulfurization, [M + H]+ observed m/z 16001, expected m/z 16008; after desulfurization, [M + H]+ observed m/z 15882, expected m/z 15881; (B) RP-HPLC, 30-90% B, 0-30 min, peaks labeled CpA-12 and CpA-12345]

Figure 62: Desulfurization of CENP-A

(A) MALDI-TOF MS before (top) and after (bottom) desulfurization of CpA-K124ac. (B) RP-HPLC of CENP-A after desulfurization.

[Panels: (A) RP-HPLC, 30-90% B, 0-30 min; (B) MALDI-TOF MS, [M + H]+ observed m/z 15882, expected m/z 15881]

Figure 63: Purified CpA-K124ac

RP-HPLC (A) and MALDI-TOF MS (B) of purified CpA-K124ac.

Two trials were performed using the convergent hybrid-phase NCL method. The yield, calculated on resin loading of the CpA-345 SP-NCL, was 18% in both trials. This was significantly higher than the 7% yield that we initially obtained from the CpA hybrid-phase NCL. In addition, no side product was observed in the purified product.

Convergent SP-Hybrid NCL of CpA

Ligating CpA-1 and CpA-2 in solution was an effective way to improve total synthesis yield. However, CpA-12 still requires purification. Importantly, total synthesis of larger proteins may require the N-terminal segment to be composed of three or more peptides, which is not compatible with this convergent approach. For larger proteins, it would be difficult to implement convergent hybrid-phase NCL since yield decreases significantly as the number of solution-phase ligations increases. We envisioned being able to synthesize the N-terminal segment using SP-NCL, enabling the ligation of multiple peptides while eliminating the need for purification. We term this approach convergent SP-hybrid NCL.
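The effect of adding solution-phase ligations can be illustrated by treating the overall yield as the product of the individual step yields. This is a simplified sketch: it assumes every ligation and purification cycle recovers the same fraction, and it borrows the 40% purified yield reported for CpA-12-Dbz only as an illustrative per-step value.

```python
from math import prod


def overall_yield(step_yields):
    """Overall fractional yield of a linear sequence of steps."""
    return prod(step_yields)


# Illustrative only: 1, 2, or 3 consecutive solution-phase ligations,
# each followed by a purification that recovers 40% of the material.
yields_by_steps = {n: overall_yield([0.40] * n) for n in (1, 2, 3)}
```

Under this assumption a second ligation reduces the recovered material to 16% and a third to about 6%, which is the motivation for moving the N-terminal ligations onto the solid support.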

Ligation handle for convergent SP-NCL

The key to the convergent SP-hybrid NCL of CENP-A was the development of a strategy to anchor CpA-2-Dbz to a solid support with the Dbz linker intact. In our proposed scheme, reverse NCL was performed between the C-terminal cysteine of CpA-2 and the resin with a terminal thioester (Figure 64). The cysteine was linked through the C-terminal Lys sidechain. Ligation with CpA-1 followed by cleavage results in CpA-12 with the Dbz intact.

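[Scheme: the glycolic acid base resin is converted to a thioester with thiophenol; CpA-2-Dbz-Gly-Lys(Cys) is anchored by SP-NCL and CpA-1 is added by a second SP-NCL; CpA-12-Dbz is cleaved with TFA and CpA-345, prepared by SP-NCL, is cleaved with NaOH; CpA-12-Dbz is activated with NaNO2 and ligated to CpA-345 in the presence of MPAA to give CpA-12345]

Figure 64: CENP-A Convergent SP-hybrid NCL scheme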

Synthesis of CpA-2-Dbz-Gly-Lys(Cys)

In order to synthesize CpA-2 with the required C-terminal handle, Lys was coupled to the Rink amide resin as Fmoc-Lys(Alloc)-OH. After Alloc deprotection, Boc-Cys(Trt)-OH was coupled to the ε-amine. Gly was added as a spacer residue before the addition of Dbz. CpA-2 was then synthesized on the resin using standard SPPS procedure (Figure 65). The Cys remained protected during synthesis until the Boc was deprotected with TFA cleavage.

[Panels: (A) RP-HPLC, 0-30 min, with one peak labeled "no Dbz"; (B) RP-HPLC, 0-30 min, and MALDI-TOF MS, m/z 4000-6000, [M + H]+ observed m/z 5852, expected m/z 5858]

Figure 65: CpA-2-Dbz-GK(C)

(A) RP-HPLC of crude CpA-2-Dbz-GK(C). (B) RP-HPLC (left) and MALDI-TOF MS (right) of CpA-2-Dbz-GK(C).

SP-NCL of CpA-12-Dbz0

Thioesterification of the glycolic acid base resin was performed using thiophenol due to the fast kinetics of aryl thioesters.145 Full conversion was confirmed by MALDI-TOF MS. The base resin was stored in the diglycolic acid form, and thioesterification was performed immediately before ligation in order to minimize the chance of thioester hydrolysis. CpA-2-Dbz-Gly-Lys(Cys) ligated successfully to the resin and no unreacted base resin was observed. After methoxylamine deprotection, the second ligation with CpA-1 was also successful (Figure 66A). The crude yield of CpA-12-Dbz0 was 20% calculated from the dry weight of the resin after cleavage, which was lower than expected. When the TFA-treated CpA-12 resin was boiled in SDS and run on SDS-PAGE, we observed a significant band of CpA-12 (Figure 66C). This was consistent with what was observed for CpA-12345 SP-NCL, suggesting that the low yield was due to the incomplete elution of the peptide from resin. We believe that the unique sequence of CpA-12 was the cause of the inefficient cleavage. Nevertheless, SP-NCL of CpA-12 still held a definite advantage over the solution-phase approach as there was no purification required. The lyophilized crude peptide could be used directly for the solution-phase ligation to CpA-345.

[Panels: (A) RP-HPLC of CpA-2-Dbz0 (0-73% B) and CpA-12-Dbz0 (15-60% B), 0-30 min; (B) MALDI-TOF MS, m/z 7000-10000, [M + H]+ observed m/z 8019, expected m/z 8021; (C) SDS-PAGE]

Figure 66: SP-NCL of CpA-12-Dbz0

RP-HPLC of CpA-2 and CpA-1 ligations (A). MALDI-TOF MS of CpA-12-Dbz0 (B). SDS-PAGE of TFA-treated CpA-12 resin (C). CpA-12-Dbz0 is indicated by the arrow.

Refolding synthetic CENP-A without purification

With CpA-K124ac synthesized from convergent hybrid-phase NCL, most of the yield loss seemed to come from RP-HPLC purification. The RP-HPLC of the desulfurized CENP-A is relatively clean. We hypothesize that the desulfurized product should be sufficiently pure to be used directly for octamer refolding. We have shown previously that we could refold synthetic H4 directly from desulfurization conditions. In the convergent hybrid-phase NCL of CpA-K124ac, we used purified CpA-12-Dbz and relatively pure CpA-345 for the solution-phase ligation. The RP-HPLC following desulfurization of CpA-12345-K124ac had only two major products: CpA-12 and CpA-K124ac. CpA-12 peptide should not be able to refold since the majority of its sequence is the unstructured tail.194 In fact, the segment corresponding to CpA-1 peptide is not visible in the crystal structure.

By performing refolding directly after desulfurization, the only purification step needed in the entire convergent hybrid-phase NCL scheme will be the purification of CpA-12-Dbz. We have shown that even this purification step can be eliminated by preparing CpA-12 using SP-NCL. Therefore, using the convergent SP-hybrid NCL approach, it should be possible to prepare CENP-A octamer without a single purification step (Figure 67). This will be an unprecedented feat in histone total synthesis.

[Scheme: CpA-12-Dbz and CpA-345 are each prepared by SP-NCL and cleaved with TFA and NaOH, respectively; NaNO2 activation and ligation in MPAA give CpA-12345 together with residual CpA-12; desulfurization and refolding follow, with CpA-12 remaining unfolded]

Figure 67: Refolding synthetic CENP-A with no purification

Segment containing CpA-1 was not visible in the crystal structure.194

Conclusions

By incorporating convergent ligation in our hybrid-phase approach, we were able to increase the yield of CENP-A total synthesis significantly. Following this success, we used another convergent approach where CpA-12 was prepared in solid-phase. CpA-12 yield from SP-NCL was lower than expected due to inefficient elution, likely arising from the unique properties of CpA-12. It is possible that this issue is not seen in other proteins.

Through CpA total synthesis, we have demonstrated the viability of convergent SP-hybrid NCL. In this approach, a protein fragment prepared through SP-NCL can be cleaved and directly converted into a thioester without the need for purification. Large proteins can be ligated from 2 to 3 fragments, each produced by SP-NCL. The convergent SP-NCL approach can potentially overcome the current size limit of protein total synthesis. We also reveal that similarly to H4 and CENP-A, H3 could not be synthesized using sequential SP-NCL. This provides us with H3 as the next target for our new convergent hybrid approach.

Acknowledgements

Thanks to Cecil (CJ) Howard and Ziyong Hong for the helpful discussions on the anchoring strategy of CpA-2 to resin used in convergent SP-hybrid NCL.


Chapter 4: Conclusions

Total synthesis offers unparalleled control when it comes to installing multiple site-specific modifications on a protein. Synthetic proteins have been used to understand the structure, function, and mechanism of action of many proteins. As powerful as protein total synthesis is, the technique has major limitations. We have developed multiple ligation techniques to overcome those limitations.

The state of the art in using sequential solution-phase ligation for the total synthesis of H3 has an overall yield of 7%. Through hybrid-phase NCL, we successfully synthesized H4, prepared from four peptides, with 16% yield. Despite the increased number of ligation steps, the yield was superior to H3 synthesis. Hybrid-phase NCL of CENP-A, with the same number of solution-phase ligation steps as H3, had 7% yield. With the convergent hybrid-phase method, the overall yield of CENP-A was further improved to 18%. This is a significant step toward overcoming the current limitations in total protein synthesis.

We have revealed the limitation of H4, CENP-A, and H3 total syntheses by sequential SP-NCL. Yield reductions were observed after one particular ligation, resulting in a low overall product yield. We therefore developed the hybrid-phase NCL approach that combined solution-phase and solid-phase ligation techniques to maximize yield. Sequential SP-NCL was performed to prepare the C-terminal segment of the protein. The segment was detached from the solid support and the ligation responsible for the yield cut was done in solution. We employed this strategy to generate H4-K5ac,K12ac,K91ac and CpA-K124ac. The synthetic histones could be refolded into histone octamers, and reconstituted into nucleosomes.

We have improved hybrid-phase NCL by incorporating convergent ligation strategies in the context of CENP-A. The N-terminal segment was prepared by solution-phase NCL, and the C-terminal segment was prepared as previously described by SP-NCL. The two segments were converged in solution and ligated to give the full-length protein. Desulfurization and purification produced CpA-K124ac with a significant yield improvement over the original hybrid-phase NCL method. The key to the convergent approach was use of the Dbz linker as a masked thioester that remained inert until activated by oxidation.

In order to eliminate the need to purify the N-terminal segment before its use in the convergent step, we developed the convergent SP-hybrid NCL strategy to produce the N-terminal segment. After ligation, the peptide was cleaved from the solid support while retaining the Dbz as the masked thioester. Cleaved segment was used directly for convergent ligation with no purification. CpA-K124ac was produced, and only one purification step was performed in the entire NCL process.

Future Work and Application

The convergent SP-hybrid NCL approach allows for the N-terminal segment to be ligated from multiple peptides. Therefore, this approach can potentially be used for the efficient synthesis of larger proteins with more than 200 residues. In order to demonstrate this potential, we are currently applying this approach to the synthesis of the 212-residue linker histone H1.2. Ziyong Hong has synthesized the required peptides (Table 6). An initial sequential SP-NCL will be performed as a test to determine where the yield cut occurs, if at all. Depending on the result, we will divide the protein into two or three parallel SP-NCL syntheses. After cleavage, the segments will be ligated in solution to produce the full-length protein (Figure 68).

Table 6: H1.2 Peptides

H1.2 residues    Peptide sequence
H1(1-23)         SETAPAAPAAAPPAEKAPVKKKA
H1(24-48)        AKKAGGTPRKASGPPVSELITKAVA
H1(49-66)        ASKERSGVSLAALKKALA
H1(67-86)        AAGYDVEKNNSRIKLGLKSL
H1(87-110)       VSKGTLVQTKGTGASGSFKLNKKA
H1(111-133)      ASGEAKPKVKKAGGTKPKKPVGA
H1(134-162)      AKKPKKAAGGATPKKSAKKTPKKAKKPAA
H1(163-188)      ATVTKKVAKSPKKAKVAKPKKAAKSA
H1(189-212)      AKAVKPKAAKPKVVKPKKAAPKKK

H1.2 residues                     Peptide sequence
H1(1-23)-Dbz                      SETAPAAPAAAPPAEKAPVKKKA-Dbz
H1(24-48)-Dbz                     Thz-KKAGGTPRKASGPPVSELITKAVA-Dbz
H1(49-66)-Dbz                     Thz-SKERSGVSLAALKKALA-Dbz
H1(67-86)-Dbz                     Thz-AGYDVEKNNSRIKLGLKSL-Dbz
H1(87-110)-Dbz(Alloc)             dmThz-SKGTLVQTKGTGASGSFKLNKKA-Dbz
H1(111-133)-Dbz                   Thz-SGEAKPKVKKAGGTKPKKPVGA-Dbz(Alloc)
H1(134-162)-Dbz                   Thz-KKPKKAAGGATPKKSAKKTPKKAKKPAA-Dbz
H1(163-188)-Dbz                   Thz-TVTKKVAKSPKKAKVAKPKKAAKSA-Dbz
H1(189-212)-HMBA-RG-Dbz(Alloc)    Thz-KAVKPKAAKPKVVKPKKAAPKKK-HMBA-RG-Dbz(Alloc)
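The segmentation in Table 6 can be checked programmatically: the residue ranges should tile positions 1 to 212 with no gaps or overlaps, and each sequence length should match its range. The sketch below uses the native segment sequences from the first part of Table 6; the function and variable names are illustrative.

```python
H1_2_SEGMENTS = [
    ((1, 23), "SETAPAAPAAAPPAEKAPVKKKA"),
    ((24, 48), "AKKAGGTPRKASGPPVSELITKAVA"),
    ((49, 66), "ASKERSGVSLAALKKALA"),
    ((67, 86), "AAGYDVEKNNSRIKLGLKSL"),
    ((87, 110), "VSKGTLVQTKGTGASGSFKLNKKA"),
    ((111, 133), "ASGEAKPKVKKAGGTKPKKPVGA"),
    ((134, 162), "AKKPKKAAGGATPKKSAKKTPKKAKKPAA"),
    ((163, 188), "ATVTKKVAKSPKKAKVAKPKKAAKSA"),
    ((189, 212), "AKAVKPKAAKPKVVKPKKAAPKKK"),
]


def check_tiling(segments, total_length):
    """Return a list of problems found in the segment table (empty if none)."""
    problems = []
    expected_start = 1
    for (start, end), seq in segments:
        if start != expected_start:
            problems.append(f"segment {start}-{end} does not start at {expected_start}")
        if len(seq) != end - start + 1:
            problems.append(f"segment {start}-{end} has {len(seq)} residues")
        expected_start = end + 1
    if expected_start - 1 != total_length:
        problems.append(f"segments end at {expected_start - 1}, not {total_length}")
    return problems


problems = check_tiling(H1_2_SEGMENTS, 212)
full_sequence = "".join(seq for _, seq in H1_2_SEGMENTS)
```

The same check could be rerun whenever the ligation sites are moved, for example when the protein is divided into two or three parallel SP-NCL segments.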


[Scheme, two-piece ligation: segment 1-23/24-48/49-66/67-86 (Dbz) and segment 87-110/111-133/134-162/163-188/189-212 are cleaved (TFA, NaOH), the Dbz is activated with NaNO2, and the segments are ligated in MPAA and desulfurized to give 1-212. Three-piece ligation: segments 1-23/24-48/49-66 and 67-86/87-110/111-133 are cleaved with TFA, activated with NaNO2, and ligated in MPAA; the product is then ligated to 134-162/163-188/189-212 after TFA or NaOH cleavage and NaNO2 activation, followed by MPAA ligation and desulfurization.]

Figure 68: Convergent SP-Hybrid NCL scheme of H1

In addition, the SP-hybrid NCL approach is also being applied for the total synthesis of H3. In a preliminary test we carried out a sequential SP-NCL of H3 using five peptide fragments. Micro-cleavages after each ligation revealed a reduction in yield after the fourth ligation, much like H4 and CENP-A. Yield loss when ligation crosses from the relatively hydrophobic α-helical core domain to the unstructured, highly charged tail may be a common trend in histone proteins (Figure 69).

[Structure panels labeled CpA, H3, and H4]

Figure 69: Comparison of CENP-A, H3, and H4 structures

Green segments, from left to right, are CpA-5, H4-C, and H3-C2. Blue segments are CpA-4, H4-H, and H3-C1. Red segments are CpA-3, H3-M2, H4-B. Light blue segments are CpA-2, H3-M1, and H4-A. Yellow green segment is H3-N.56,194

Overall, we have demonstrated the efficient total synthesis of histone proteins by developing a new ligation strategy, which combines three NCL strategies: solid-phase NCL, solution-phase NCL, and convergent NCL. In the process we have also developed solutions for the various challenges involved with the efficient preparation of individual peptide thioesters using Fmoc-SPPS. This work demonstrates significant progress in the field of efficient protein total synthesis.

References

Figures with Ref. 7 were reproduced from the indicated reference with permission from Springer (license number 3911880464769). Figures and Tables with Ref. 160 were reproduced and adapted from the indicated reference with permission from The Royal Society of Chemistry.

1. Fischer, E. Synthesis in the purine and sugar group. Nobel Lecture (1902).

2. Kent, S. et al. Through the looking glass - a new world of proteins enabled by chemical synthesis. Journal of Peptide Science 18, 428-436 (2012).

3. Kent, S.B.H. Total chemical synthesis of proteins. Chem. Soc. Rev. 38, 338-351 (2009).

4. Nilsson, B.L., Soellner, M.B. & Raines, R.T. Chemical synthesis of proteins. in Annual Review of Biophysics and Biomolecular Structure, Vol. 34 91-118 (2005).

5. Stevens, R.C. Design of high-throughput methods of protein production for structural biology. Structure 8, R177-R185 (2000).

6. Rosano, G.L. & Ceccarelli, E.A. Recombinant protein expression in Escherichia coli: advances and challenges. Frontiers in Microbiology 5, 172 (2014).

7. Howard, C.J., Yu, R.R., Gardner, M.L., Shimko, J.C. & Ottesen, J.J. Chemical and biological tools for the preparation of modified histone proteins. Topics in Current Chemistry 363, 193-226 (2015).

8. Müller, M.M. & Muir, T.W. Histones: At the Crossroads of Peptide and Protein Chemistry. Chemical Reviews 115, 2296-2349 (2015).

9. Borgia, J.A. & Fields, G.B. Chemical synthesis of proteins. Trends in Biotechnology 18, 243-251 (2000).

10. Kent, S. Total chemical synthesis of enzymes. Journal of Peptide Science 9, 574-593 (2003).

11. Dirksen, A. & Dawson, P.E. Expanding the scope of chemoselective peptide ligations in chemical biology. Current Opinion in Chemical Biology 12, 760-766 (2008).

12. Lu, W.Y., Qasim, M.A., Laskowski, M. & Kent, S.B.H. Probing intermolecular main chain hydrogen bonding in serine proteinase-protein inhibitor complexes: Chemical synthesis of backbone-engineered turkey ovomucoid third domain. Biochemistry 36, 673-679 (1997).

13. Smith, R., Brereton, I.M., Chai, R.Y. & Kent, S.B.H. Ionization states of the catalytic residues in HIV-1 protease. Nature Structural Biology 3, 946-950 (1996).

14. Ottesen, J.J., Bar-Dagan, M., Giovani, B. & Muir, T.W. An amalgamation of solid phase peptide synthesis and ribosomal peptide synthesis. Biopolymers 90, 406-414 (2008).

15. Low, D.W. & Hill, M.G. Rational fine-tuning of the redox potentials in chemically synthesized rubredoxins. Journal of the American Chemical Society 120, 11536-11537 (1998).

16. Shimko, J.C., North, J.A., Bruns, A.N., Poirier, M.G. & Ottesen, J.J. Preparation of Fully Synthetic Histone H3 Reveals That Acetyl-Lysine 56 Facilitates Protein Binding Within Nucleosomes. Journal of Molecular Biology 408, 187-204 (2011).

17. Baca, M., Alewood, P.F. & Kent, S.B.H. Structural engineering of the HIV-1 protease molecule with a beta-turn mimic of fixed geometry. Protein Science 2, 1085-1091 (1993).

18. Milton, R.C.D., Milton, S.C.F. & Kent, S.B.H. Total chemical synthesis of a D-enzyme - the enantiomers of HIV-1 protease show demonstration of reciprocal chiral substrate-specificity. Science 256, 1445-1448 (1992).

19. Wang, Z., Xu, W., Liu, L. & Zhu, T.F. A synthetic molecular system capable of mirror-image genetic replication and transcription. Nature Chemistry published online ahead of print(2016).

20. du Vigneaud, V. et al. THE SYNTHESIS OF AN OCTAPEPTIDE AMIDE WITH THE HORMONAL ACTIVITY OF OXYTOCIN. Journal of the American Chemical Society 75, 4879-4880 (1953).

21. Hirschmann, R. et al. STUDIES ON TOTAL SYNTHESIS OF AN ENZYME .V. PREPARATION OF ENZYMATICALLY ACTIVE MATERIAL. Journal of the American Chemical Society 91, 507-& (1969).

22. Kent, S.B.H. CHEMICAL SYNTHESIS OF PEPTIDES AND PROTEINS. Annual Review of Biochemistry 57, 957-989 (1988).

23. Merrifield, R.B. SOLID PHASE PEPTIDE SYNTHESIS .1. SYNTHESIS OF A TETRAPEPTIDE. Journal of the American Chemical Society 85, 2149-& (1963).

24. Mitchell, A.R. Invited Review: Bruce Merrifield and solid-phase peptide synthesis: A historical assessment. Biopolymers 90, 175-184 (2008).

25. Martin, F.G. & Albericio, F. Solid supports for the synthesis of peptides - From the first resin used to the most sophisticated in the market. Chimica Oggi-Chemistry Today 26, 29-34 (2008).

26. Albericio, F. Developments in peptide and amide synthesis. Current Opinion in Chemical Biology 8, 211-221 (2004).

27. Al-Warhi, T.I., Al-Hazimi, H.M.A. & El-Faham, A. Recent development in peptide coupling reagents. Journal of Saudi Chemical Society 16, 97-116 (2012).

28. Schnolzer, M. & Kent, S.B.H. CONSTRUCTING PROTEINS BY DOVETAILING UNPROTECTED SYNTHETIC PEPTIDES - BACKBONE-ENGINEERED HIV PROTEASE. Science 256, 221-225 (1992).

29. Rose, K. FACILE SYNTHESIS OF HOMOGENEOUS ARTIFICIAL PROTEINS. Journal of the American Chemical Society 116, 30-33 (1994).

30. Dawson, P.E., Muir, T.W., Clark-Lewis, I. & Kent, S.B.H. Synthesis of Proteins by Native Chemical Ligation. Science 266, 776-779 (1994).

31. Englebretsen, D.R., Garnham, B.G., Bergman, D.A. & Alewood, P.F. A NOVEL THIOETHER LINKER - CHEMICAL SYNTHESIS OF A HIV-1 PROTEASE ANALOG BY THIOETHER LIGATION. Tetrahedron Letters 36, 8871-8874 (1995).

32. Baca, M., Muir, T.W., Schnolzer, M. & Kent, S.B.H. CHEMICAL LIGATION OF CYSTEINE-CONTAINING PEPTIDES - SYNTHESIS OF A 22-KDA TETHERED DIMER OF HIV-1 PROTEASE. Journal of the American Chemical Society 117, 1881-1887 (1995).

33. Liu, C.F., Rao, C. & Tam, J.P. Orthogonal ligation of unprotected peptide segments through pseudoproline formation for the synthesis of HIV-1 protease. Journal of the American Chemical Society 118, 307-312 (1996).

34. Hackeng, T.M. & Dawson, P.E. Protein synthesis by native chemical ligation: Expanded scope by using straightforward methodology. Proc Natl Acad Sci U S A 96, 10069-10073 (1999).

35. Bondalapati, S., Jbara, M. & Brik, A. Expanding the chemical toolbox for the synthesis of large and uniquely modified proteins. Nature Chemistry 8, 407-418 (2016).

36. Muir, T.W., Sondhi, D. & Cole, P.A. Expressed protein ligation: A general method for protein engineering. Proceedings of the National Academy of Sciences of the United States of America 95, 6705-6710 (1998).

37. Yan, L.Z. & Dawson, P.E. Synthesis of peptides and proteins without cysteine residues by native chemical ligation combined with desulfurization. Journal of the American Chemical Society 123, 526-533 (2001).

38. Wan, Q. & Danishefsky, S.J. Free-Radical-Based, Specific Desulfurization of Cysteine: A Powerful Advance in the Synthesis of Polypeptides and Glycopolypeptides. Angewandte Chemie International Edition 46, 9248-9252 (2007).

39. Haase, C., Rohde, H. & Seitz, O. Native Chemical Ligation at Valine. Angewandte Chemie International Edition 47, 6807-6810 (2008).

40. Crich, D. & Banerjee, A. Native Chemical Ligation at Phenylalanine. Journal of the American Chemical Society 129, 10064-10065 (2007).

41. Malins, L.R. & Payne, R.J. Modern Extensions of Native Chemical Ligation for Chemical Protein Synthesis. in Protein Ligation and Total Synthesis I, Vol. 362 (ed. Liu, L.) 27-87 (Springer-Verlag Berlin, Berlin, 2015).

42. Zhang, Y., Xu, C., Lam, H.Y., Lee, C.L. & Li, X. Protein chemical synthesis by serine and threonine ligation. Proceedings of the National Academy of Sciences of the United States of America 110, 6657-6662 (2013).

43. Zhang, Y., Malamakal, R.M. & Chenoweth, D.M. Aza-Glycine Induces Collagen Hyperstability. Journal of the American Chemical Society 137, 12422-12425 (2015).

44. Weinstock, M.T., Jacobsen, M.T. & Kay, M.S. Synthesis and folding of a mirror-image enzyme reveals ambidextrous chaperone activity. Proceedings of the National Academy of Sciences of the United States of America 111, 11679-11684 (2014).

45. Zawadzke, L.E. & Berg, J.M. THE STRUCTURE OF A CENTROSYMMETRIC PROTEIN CRYSTAL. Proteins-Structure Function and Genetics 16, 301-305 (1993).

46. Pentelute, B.L. et al. X-ray structure of snow flea antifreeze protein determined by racemic crystallization of synthetic protein enantiomers. Journal of the American Chemical Society 130, 9695-9701 (2008).

47. Mandal, K. et al. Racemic crystallography of synthetic protein enantiomers used to determine the X-ray structure of plectasin by direct methods. Protein Science 18, 1146-1154 (2009).

48. Kochendoerfer, G.G. et al. Total chemical synthesis of the integral membrane protein influenza A virus M2: Role of its C-terminal domain in tetramer assembly. Biochemistry 38, 11905-11913 (1999).

49. Olschewski, D. & Becker, C.F.W. Chemical synthesis and semisynthesis of membrane proteins. Molecular Biosystems 4, 733-740 (2008).

50. Shin, Y. et al. Fmoc-based synthesis of peptide-(alpha)thioesters: Application to the total chemical synthesis of a glycoprotein by native chemical ligation. Journal of the American Chemical Society 121, 11684-11689 (1999).

51. Pratt, M.R., Abeywardana, T. & Marotta, N.P. Synthetic Proteins and Peptides for the Direct Interrogation of alpha-Synuclein Posttranslational Modifications. Biomolecules 5, 1210-1227 (2015).

52. Jenuwein, T. & Allis, C.D. Translating the histone code. Science 293, 1074-1080 (2001).

53. Strahl, B.D. & Allis, C.D. The language of covalent histone modifications. Nature 403, 41-45 (2000).

54. Weisbrod, S. ACTIVE CHROMATIN. Nature 297, 289-295 (1982).

55. Kornberg, R.D. STRUCTURE OF CHROMATIN. Annual Review of Biochemistry 46, 931-954 (1977).

56. Luger, K., Mader, A.W., Richmond, R.K., Sargent, D.F. & Richmond, T.J. Crystal structure of the nucleosome core particle at 2.8 angstrom resolution. Nature 389, 251-260 (1997).

57. Mersfelder, E.L. & Parthun, M.R. The tale beyond the tail: histone core domain modifications and the regulation of chromatin structure. Nucleic Acids Research 34, 2653-2662 (2006).

58. Brehove, M. et al. Histone Core Phosphorylation Regulates DNA Accessibility. Journal of Biological Chemistry 290, 22612-22621 (2015).

59. Manohar, M. et al. Acetylation of Histone H3 at the Nucleosome Dyad Alters DNA-Histone Binding. Journal of Biological Chemistry 284, 23312-23321 (2009).

60. North, J.A. et al. Histone H3 phosphorylation near the nucleosome dyad alters chromatin structure. Nucleic Acids Research 42, 4922-4933 (2014).

61. Riposo, J. & Mozziconacci, J. Nucleosome positioning and nucleosome stacking: two faces of the same coin. Molecular Biosystems 8, 1172-1178 (2012).

62. Scheffer, M.P., Eltsov, M., Bednar, J. & Frangakis, A.S. Nucleosomes stacked with aligned dyad axes are found in native compact chromatin in vitro. Journal of Structural Biology 178, 207-214 (2012).

63. Park, J.H., Cosgrove, M.S., Youngman, E., Wolberger, C. & Boeke, J.D. A core nucleosome surface crucial for transcriptional silencing. Nature Genetics 32, 273-279 (2002).

64. Bannister, A.J. & Kouzarides, T. Regulation of chromatin by histone modifications. Cell Research 21, 381-395 (2011).

65. Zhang, Y. & Reinberg, D. Transcription regulation by histone methylation: interplay between different covalent modifications of the core histone tails. Genes & Development 15, 2343-2360 (2001).

66. Rodriguez, Y., Hinz, J.M. & Smerdon, M.J. Accessing DNA damage in chromatin: Preparing the chromatin landscape for base excision repair. DNA Repair 32, 113-119 (2015).

67. Henikoff, S. & Ahmad, K. Assembly of variant histones into chromatin. in Annual Review of Cell and Developmental Biology, Vol. 21 133-153 (2005).

68. Vardabasso, C. et al. Histone variants: emerging players in cancer biology. Cellular and Molecular Life Sciences 71, 379-404 (2013).

69. Sullivan, K.F., Hechenberger, M. & Masri, K. HUMAN CENP-A CONTAINS A HISTONE H3 RELATED HISTONE FOLD DOMAIN THAT IS REQUIRED FOR TARGETING TO THE CENTROMERE. Journal of Cell Biology 127, 581-592 (1994).

70. Verdaasdonk, J.S. & Bloom, K. Centromeres: unique chromatin structures that drive chromosome segregation. Nature Reviews Molecular Cell Biology 12, 320-332 (2011).

71. Stellfox, M.E., Bailey, A.O. & Foltz, D.R. Putting CENP-A in its place. Cellular and Molecular Life Sciences 70, 387-406 (2013).

72. Rosic, S. & Erhardt, S. No longer a nuisance: long non-coding RNAs join CENP-A in epigenetic centromere regulation. Cellular and Molecular Life Sciences 73, 1387-1398 (2016).

73. Ito, T., Tyler, J.K., Bulger, M., Kobayashi, R. & Kadonaga, J.T. ATP-facilitated chromatin assembly with a nucleoplasmin-like protein from Drosophila melanogaster. Journal of Biological Chemistry 271, 25041-25048 (1996).

74. Loyola, A. & Almouzni, G. Histone chaperones, a supporting role in the limelight. Biochimica Et Biophysica Acta-Gene Structure and Expression 1677, 3-11 (2004).

75. Saha, A., Wittmeyer, J. & Cairns, B.R. Chromatin remodelling: the industrial revolution of DNA around histones. Nature Reviews Molecular Cell Biology 7, 437-447 (2006).

76. Wilson, B.G. & Roberts, C.W.M. SWI/SNF nucleosome remodellers and cancer. Nature Reviews Cancer 11, 481-492 (2011).

77. Javaid, S. et al. Nucleosome remodeling by hMSH2-hMSH6. Mol Cell 36, 1086-94 (2009).

78. Polach, K.J. & Widom, J. MECHANISM OF PROTEIN ACCESS TO SPECIFIC DNA-SEQUENCES IN CHROMATIN - A DYNAMIC EQUILIBRIUM-MODEL FOR GENE-REGULATION. Journal of Molecular Biology 254, 130-149 (1995).

79. Li, G., Levitus, M., Bustamante, C. & Widom, J. Rapid spontaneous accessibility of nucleosomal DNA. Nature Structural & Molecular Biology 12, 46-53 (2005).

80. Allfrey, V.G., Faulkner, R. & Mirsky, A.E. ACETYLATION + METHYLATION OF HISTONES + THEIR POSSIBLE ROLE IN REGULATION OF RNA SYNTHESIS. Proceedings of the National Academy of Sciences of the United States of America 51, 786-+ (1964).

81. Goldknopf, I.L. et al. ISOLATION AND CHARACTERIZATION OF PROTEIN-A24, A HISTONE-LIKE NON-HISTONE CHROMOSOMAL PROTEIN. Journal of Biological Chemistry 250, 7182-7187 (1975).

82. Dhall, A. et al. Sumoylated Human Histone H4 Prevents Chromatin Compaction by Inhibiting Long-range Internucleosomal Interactions. J Biol Chem 289, 33827-37 (2014).

83. Singh, M.P., Wijeratne, S.S.K. & Zempleni, J. Biotinylation of lysine 16 in histone H4 contributes toward nucleosome condensation. Archives of Biochemistry and Biophysics 529, 105-111 (2013).

84. Wisniewski, J.R., Zougman, A. & Mann, M. N(epsilon)-Formylation of lysine is a widespread post-translational modification of nuclear proteins occurring at residues involved in regulation of chromatin function. Nucleic Acids Research 36, 570-577 (2008).

85. Nishizuka, Y., Ueda, K., Honjo, T. & Hayaishi, O. ENZYMIC ADENOSINE DIPHOSPHATE RIBOSYLATION OF HISTONE AND POLY ADENOSINE DIPHOSPHATE RIBOSE SYNTHESIS IN RAT LIVER NUCLEI. Journal of Biological Chemistry 243, 3765-& (1968).

86. Tan, M. et al. Identification of 67 Histone Marks and Histone Lysine Crotonylation as a New Type of Histone Modification. Cell 146, 1016-1028 (2011).

87. Wang, H.B. et al. Methylation of histone H4 at arginine 3 facilitating transcriptional activation by nuclear hormone receptor. Science 293, 853-857 (2001).

88. Tanikawa, C. et al. Regulation of histone modification and chromatin structure by the p53-PADI4 pathway. Nature Communications 3 (2012).

89. Ord, M.G. & Stocken, L.A. PHOSPHATE AND THIOL GROUPS IN HISTONE F3 FROM RAT LIVER AND THYMUS NUCLEI. Biochemical Journal 102, 631-& (1967).

90. Sakabe, K., Wang, Z. & Hart, G.W. Beta-N-acetylglucosamine (O-GlcNAc) is part of the histone code. Proc Natl Acad Sci U S A 107, 19915-20 (2010).

91. Greenberg, R.A. Histone tails: Directing the chromatin response to DNA damage. FEBS Letters 585, 2883-2890 (2011).

92. Voigt, P. & Reinberg, D. Histone tails – ideal motifs for probing epigenetics through chemical biology approaches. ChemBioChem 12, 236-252 (2011).

93. Hung, T. et al. ING4 mediates crosstalk between histone H3 K4 trimethylation and H3 acetylation to attenuate cellular transformation. Mol Cell 33, 248-56 (2009).

94. Fischle, W., Wang, Y. & Allis, C.D. Histone and chromatin cross-talk. Current Opinion in Cell Biology 15, 172-183 (2003).

95. Schwammle, V., Aspalter, C.M., Sidoli, S. & Jensen, O.N. Large Scale Analysis of Co-existing Post-translational Modifications in Histone Tails Reveals Global Fine Structure of Cross-talk. Molecular & Cellular Proteomics 13, 1855-1865 (2014).

96. Rothbart, S.B. & Strahl, B.D. Interpreting the language of histone and DNA modifications. Biochimica Et Biophysica Acta-Gene Regulatory Mechanisms 1839, 627-643 (2014).

97. Fierz, B. Synthetic Chromatin Approaches To Probe the Writing and Erasing of Histone Modifications. ChemMedChem 9, 495-504 (2014).

98. Torres, I.O. & Fujimori, D.G. Functional coupling between writers, erasers and readers of histone and DNA methylation. Current Opinion in Structural Biology 35, 68-75 (2015).

99. Zentner, G.E. & Henikoff, S. Regulation of nucleosome dynamics by histone modifications. Nature Structural & Molecular Biology 20, 259-266 (2013).

100. Jack, A.P.M. & Hake, S.B. Getting down to the core of histone modifications. Chromosoma 123, 355-371 (2014).

101. North, J.A. et al. Histone H3 phosphorylation near the nucleosome dyad alters chromatin structure. Nucleic Acids Research 42, 4922-4933 (2014).

102. Simon, M. et al. Histone fold modifications control nucleosome unwrapping and disassembly. Proc Natl Acad Sci U S A 108, 12711-6 (2011).

103. Chatterjee, N. et al. Histone Acetylation near the Nucleosome Dyad Axis Enhances Nucleosome Disassembly by RSC and SWI/SNF. Molecular and Cellular Biology 35, 4083-4092 (2015).

104. Su, X., Ren, C. & Freitas, M.A. Mass spectrometry-based strategies for characterization of histones and their post-translational modifications. Expert Review of Proteomics 4, 211-225 (2007).

105. Zhao, Y. & Garcia, B.A. Comprehensive Catalog of Currently Documented Histone Modifications. Cold Spring Harbor Perspectives in Biology 7, a025064 (2015).

106. Pick, H., Kilic, S. & Fierz, B. Engineering chromatin states: Chemical and synthetic biology approaches to investigate histone modification function. Biochimica Et Biophysica Acta-Gene Regulatory Mechanisms 1839, 644-656 (2014).

107. Megee, P.C., Morgan, B.A., Mittman, B.A. & Smith, M.M. GENETIC-ANALYSIS OF HISTONE-H4 - ESSENTIAL ROLE OF LYSINES SUBJECT TO REVERSIBLE ACETYLATION. Science 247, 841-845 (1990).

108. Zhao, Y. et al. SITE-DIRECTED MUTAGENESIS OF PHOSPHORYLATION SITES OF THE BRANCHED-CHAIN ALPHA-KETOACID DEHYDROGENASE COMPLEX. Journal of Biological Chemistry 269, 18583-18587 (1994).

109. Matsubara, K., Sano, N., Umehara, T. & Horikoshi, M. Global analysis of functional surfaces of core histones with comprehensive point mutants. Genes Cells 12, 13-33 (2007).

110. North, J.A. et al. Phosphorylation of histone H3(T118) alters nucleosome dynamics and remodeling. Nucleic Acids Research 39, 6465-6474 (2011).

111. Normanly, J., Kleina, L.G., Masson, J.M., Abelson, J. & Miller, J.H. CONSTRUCTION OF ESCHERICHIA-COLI AMBER SUPPRESSOR TRANSFER-RNA GENES .3. DETERMINATION OF TRANSFER-RNA SPECIFICITY. Journal of Molecular Biology 213, 719-726 (1990).

112. Wang, L., Magliery, T.J., Liu, D.R. & Schultz, P.G. A new functional suppressor tRNA/aminoacyl-tRNA synthetase pair for the in vivo incorporation of unnatural amino acids into proteins. Journal of the American Chemical Society 122, 5010-5011 (2000).

113. Neumann, H. et al. A Method for Genetically Installing Site-Specific Acetylation in Recombinant Histones Defines the Effects of H3 K56 Acetylation. Molecular Cell 36, 153-163 (2009).

114. Nguyen, D.P., Alai, M.M.G., Virdee, S. & Chin, J.W. Genetically Directing ɛ-N, N-Dimethyl-l-Lysine in Recombinant Histones. Chemistry & Biology 17, 1072-1076 (2010).

115. Wang, Y.-S. et al. A genetically encoded photocaged Nε-methyl-l-lysine. Molecular BioSystems 6, 1557 (2010).

116. Yang, R., Pasunooti, K.K., Li, F., Liu, X.-W. & Liu, C.-F. Dual Native Chemical Ligation at Lysine. Journal of the American Chemical Society 131, 13592-13593 (2009).

117. Li, F. et al. A Direct Method for Site-Specific Protein Acetylation. Angewandte Chemie-International Edition 50, 9611-9614 (2011).

118. Yang, R., Bi, X., Li, F., Cao, Y. & Liu, C.-F. Native chemical ubiquitination using a genetically incorporated azidonorleucine. Chemical Communications 50, 7971 (2014).

119. Lee, S. et al. A Facile Strategy for Selective Incorporation of Phosphoserine into Histones. Angewandte Chemie International Edition 52, 5771-5775 (2013).

120. Guo, J., Wang, J., Lee, J.S. & Schultz, P.G. Site-Specific Incorporation of Methyl- and Acetyl-Lysine Analogues into Recombinant Proteins. Angewandte Chemie International Edition 47, 6399-6401 (2008).

121. Chalker, J.M., Lercher, L., Rose, N.R., Schofield, C.J. & Davis, B.G. Conversion of cysteine into dehydroalanine enables access to synthetic histones bearing diverse post-translational modifications. Angew Chem Int Ed Engl 51, 1835-9 (2012).

122. Wang, Z.U. et al. A Facile Method to Synthesize Histones with Posttranslational Modification Mimics. Biochemistry 51, 5232-5234 (2012).

123. Bernardes, G.J.L., Chalker, J.M., Errey, J.C. & Davis, B.G. Facile Conversion of Cysteine and Alkyl Cysteines to Dehydroalanine on Protein Surfaces: Versatile and Switchable Access to Functionalized Proteins. Journal of the American Chemical Society 130, 5052-5053 (2008).

124. Simon, M.D. et al. The Site-Specific Installation of Methyl-Lysine Analogs into Recombinant Histones. Cell 128, 1003-1012 (2007).

125. Huang, R. et al. Site-Specific Introduction of an Acetyl-Lysine Mimic into Peptides and Proteins by Cysteine Alkylation. Journal of the American Chemical Society 132, 9986-9987 (2010).

126. Le, D.D., Cortesi, A.T., Myers, S.A., Burlingame, A.L. & Fujimori, D.G. Site-Specific and Regiospecific Installation of Methylarginine Analogues into Recombinant Histones and Insights into Effector Protein Binding. Journal of the American Chemical Society 135, 2879-2882 (2013).

127. Jia, G. et al. A systematic evaluation of the compatibility of histones containing methyl-lysine analogues with biochemical reactions. Cell Research 19, 1217-1220 (2009).

128. Chen, Z., Grzybowski, A.T. & Ruthenburg, A.J. Traceless Semisynthesis of a Set of Histone 3 Species Bearing Specific Lysine Methylation Marks. ChemBioChem 15, 2071-2075 (2014).

129. Chatterjee, A., McGinty, R.K., Fierz, B. & Muir, T.W. Disulfide-directed histone ubiquitylation reveals plasticity in hDot1L activation. Nature Chemical Biology 6, 267-269 (2010).

130. Whitcomb, S.J. et al. Histone monoubiquitylation position determines specificity and direction of enzymatic cross-talk with histone methyltransferases Dot1L and PRC2. J Biol Chem 287, 23718-25 (2012).

131. Davis, L. & Chin, J.W. Designer proteins: applications of genetic code expansion in cell biology. Nature Reviews Molecular Cell Biology 13, 168-182 (2012).

132. Shimko, J.C., Howard, C.J., Poirier, M.G. & Ottesen, J.J. Preparing Semisynthetic and Fully Synthetic Histones H3 and H4 to Modify the Nucleosome Core. Methods in Molecular Biology 981, 177-192 (2013).

133. Wang, Z. et al. Combinatorial patterns of histone acetylations and methylations in the human genome. Nature Genetics 40, 897-903 (2008).

134. Meledin, R., Mali, S.M. & Brik, A. Pushing the Boundaries of Chemical Protein Synthesis: The Case of Ubiquitin Chains and Polyubiquitinated Peptides and Proteins. Chemical Record 16, 509-519 (2016).

135. Raibaut, L., Ollivier, N. & Melnyk, O. Sequential native peptide ligation strategies for total chemical protein synthesis. Chemical Society Reviews 41, 7001 (2012).

136. Blanco-Canosa, J.B. & Dawson, P.E. An Efficient Fmoc-SPPS Approach for the Generation of Thioester Peptide Precursors for Use in Native Chemical Ligation. Angewandte Chemie International Edition 47, 6851-6855 (2008).

137. Becker, C.F.W. et al. Total chemical synthesis of a functional interacting protein pair: The protooncogene H-Ras and the Ras-binding domain of its effector c-Raf1. Proceedings of the National Academy of Sciences of the United States of America 100, 5075-5080 (2003).

138. Villain, M., Vizzavona, J. & Rose, K. Covalent capture: a new tool for the purification of synthetic and recombinant polypeptides. Chemistry & Biology 8, 673-679 (2001).

139. Bang, D. & Kent, S.B.H. A one-pot total synthesis of crambin. Angewandte Chemie-International Edition 43, 2534-2538 (2004).

140. Hojo, H. et al. Application of a novel thioesterification reaction to the synthesis of chemokine CCL27 by the modified thioester method. Organic & Biomolecular Chemistry 6, 1808-1813 (2008).

141. Kawakami, T. & Aimoto, S. Sequential peptide ligation by using a controlled cysteinyl prolyl ester (CPE) autoactivating unit. Tetrahedron Letters 48, 1903-1905 (2007).

142. Ollivier, N., Dheur, J., Mhidia, R., Blanpain, A. & Melnyk, O. Bis(2-sulfanylethyl)amino Native Peptide Ligation. Organic Letters 12, 5238-5241 (2010).

143. Fang, G.M. et al. Protein Chemical Synthesis by Ligation of Peptide Hydrazides. Angewandte Chemie-International Edition 50, 7645-7649 (2011).

144. Li, J. et al. One-pot native chemical ligation of peptide hydrazides enables total synthesis of modified histones. Organic & Biomolecular Chemistry 12, 5435 (2014).

145. Bang, D., Pentelute, B.L. & Kent, S.B.H. Kinetically controlled ligation for the convergent chemical synthesis of proteins. Angewandte Chemie-International Edition 45, 3985-3988 (2006).

146. Siman, P., Karthikeyan, S.V., Nikolov, M., Fischle, W. & Brik, A. Convergent Chemical Synthesis of Histone H2B Protein for the Site-Specific Ubiquitination at Lys34. Angewandte Chemie-International Edition 52, 8059-8063 (2013).

147. Fang, G.M., Wang, J.X. & Liu, L. Convergent chemical synthesis of proteins by ligation of peptide hydrazides. Angew Chem Int Ed Engl 51, 10347-50 (2012).

148. Canne, L.E. et al. Chemical protein synthesis by solid phase ligation of unprotected peptide segments. Journal of the American Chemical Society 121, 8720-8727 (1999).

149. Camarero, J.A., Cotton, G.J., Adeva, A. & Muir, T.W. Chemical ligation of unprotected peptides directly from a solid support. Journal of Peptide Research 51, 303-316 (1998).

150. Raibaut, L. et al. Highly efficient solid phase synthesis of large polypeptides by iterative ligations of bis(2-sulfanylethyl)amido (SEA) peptide segments. Chemical Science 4, 4061-4066 (2013).

151. Bang, D. & Kent, S.B.H. His(6) tag-assisted chemical protein synthesis. Proceedings of the National Academy of Sciences of the United States of America 102, 5014-5019 (2005).

152. Brik, A., Keinan, E. & Dawson, P.E. Protein synthesis by solid-phase chemical ligation using a safety catch linker. Journal of Organic Chemistry 65, 3829-3835 (2000).

153. Jbara, M., Seenaiah, M. & Brik, A. Solid phase chemical ligation employing a rink amide linker for the synthesis of histone H2B protein. Chemical Communications 50, 12534-12537 (2014).

154. Rink, H. SOLID-PHASE SYNTHESIS OF PROTECTED PEPTIDE-FRAGMENTS USING A TRIALKOXY-DIPHENYL-METHYLESTER RESIN. Tetrahedron Letters 28, 3787-3790 (1987).

155. Wang, S.S. PARA-ALKOXYBENZYL ALCOHOL RESIN AND PARA-ALKOXYBENZYLOXYCARBONYLHYDRAZIDE RESIN FOR SOLID-PHASE SYNTHESIS OF PROTECTED PEPTIDE FRAGMENTS. Journal of the American Chemical Society 95, 1328-1333 (1973).

156. Luger, K., Rechsteiner, T.J. & Richmond, T.J. Preparation of Nucleosome Core Particle from Recombinant Histones. Methods in Enzymology 304, 1-19 (1999).

157. Lowary, P.T. & Widom, J. New DNA sequence rules for high affinity binding to histone octamer and sequence-directed nucleosome positioning. Journal of Molecular Biology 276, 19-42 (1998).

158. Tanaka, Y. et al. Expression and purification of recombinant human histones. Methods 33, 3-11 (2004).

159. Pace, C.N., Vajdos, F., Fee, L., Grimsley, G. & Gray, T. HOW TO MEASURE AND PREDICT THE MOLAR ABSORPTION-COEFFICIENT OF A PROTEIN. Protein Science 4, 2411-2423 (1995).

160. Yu, R.R. et al. Hybrid phase ligation for efficient synthesis of histone proteins. Organic & Biomolecular Chemistry 14, 2603-2607 (2016).

161. Choma, C.T., Robillard, G.T. & Englebretsen, D.R. Synthesis of hydrophobic peptides: An Fmoc "Solubilising Tail" method. Tetrahedron Letters 39, 2417-2420 (1998).

162. Harris, P.W.R. & Brimble, M.A. Toward the Total Chemical Synthesis of the Cancer Protein NY-ESO-1. Biopolymers 94, 542-550 (2010).

163. Meldal, M. PEGA - A FLOW STABLE POLYETHYLENE-GLYCOL DIMETHYL ACRYLAMIDE COPOLYMER FOR SOLID-PHASE SYNTHESIS. Tetrahedron Letters 33, 3077-3080 (1992).

164. Kunys, A.R., Lian, W. & Pei, D. Specificity Profiling of Protein-Binding Domains Using One-Bead-One-Compound Peptide Libraries. Current Protocols in Chemical Biology 4, 331-355 (2012).

165. Johnson, E.C.B. & Kent, S.B.H. Towards the total chemical synthesis of integral membrane proteins: a general method for the synthesis of hydrophobic peptide-(alpha)thioester building blocks. Tetrahedron Letters 48, 1795-1799 (2007).

166. Song, O.K., Wang, X.R., Waterborg, J.H. & Sternglanz, R. An N-alpha-acetyltransferase responsible for acetylation of the N-terminal residues of histones H4 and H2A. Journal of Biological Chemistry 278, 38109-38112 (2003).

167. Schnolzer, M., Alewood, P., Jones, A., Alewood, D. & Kent, S.B.H. INSITU NEUTRALIZATION IN BOC-CHEMISTRY SOLID-PHASE PEPTIDE-SYNTHESIS - RAPID, HIGH-YIELD ASSEMBLY OF DIFFICULT SEQUENCES. International Journal of Peptide and Protein Research 40, 180-193 (1992).

168. Fields, G.B. & Noble, R.L. SOLID-PHASE PEPTIDE-SYNTHESIS UTILIZING 9-FLUORENYLMETHOXYCARBONYL AMINO-ACIDS. International Journal of Peptide and Protein Research 35, 161-214 (1990).

169. Maede, V., Els-Heindl, S. & Beck-Sickinger, A.G. Automated solid-phase peptide synthesis to obtain therapeutic peptides. Beilstein Journal of Organic Chemistry 10, 1197-1212 (2014).

170. Mende, F. & Seitz, O. 9-Fluorenylmethoxycarbonyl-Based Solid-Phase Synthesis of Peptide alpha-Thioesters. Angewandte Chemie-International Edition 50, 1232-1240 (2011).

171. Li, X.Q., Kawakami, T. & Aimoto, S. Direct preparation of peptide thioesters using an Fmoc solid-phase method. Tetrahedron Letters 39, 8669-8672 (1998).

172. Mezo, A.R., Ottesen, J.J. & Imperiali, B. Discovery and characterization of a discretely folded homotrimeric beta beta alpha peptide. Journal of the American Chemical Society 123, 1002-1003 (2001).

173. Botti, P., Villain, M., Manganiello, S. & Gaertner, H. Native chemical ligation through in situ O to S acyl shift. Organic Letters 6, 4861-4864 (2004).

174. Terrier, V.P., Adihou, H., Arnould, M., Delmas, A.F. & Aucagne, V. A straightforward method for automated Fmoc-based synthesis of bio-inspired peptide crypto-thioesters. Chemical Science 7, 339-345 (2016).

175. Mahto, S.K., Howard, C.J., Shimko, J.C. & Ottesen, J.J. A Reversible Protection Strategy To Improve Fmoc-SPPS of Peptide Thioesters by the N-Acylurea Approach. ChemBioChem 12, 2488-2494 (2011).

176. White, P.D. & Behrendt, R. Practical aspects of the use of the Dbz linker for making thioesters by Fmoc SPPS. Journal of Peptide Science 16, 71-72 (2010).

177. Morley, A.D. Allyloxycarbonyl - a useful protecting group for phenolic amino acids and applications on solid support. Tetrahedron Letters 41, 7401-7404 (2000).

178. Blanco-Canosa, J.B., Nardone, B., Albericio, F. & Dawson, P.E. Chemical Protein Synthesis Using a Second-Generation N-Acylurea Linker for the Preparation of Peptide-Thioester Precursors. Journal of the American Chemical Society 137, 7197-7209 (2015).

179. Kovacs, J., Kim, S., Holleran, E. & Gorycki, P. STUDIES ON THE RACEMIZATION AND COUPLING OF N-ALPHA,NIM-PROTECTED HISTIDINE ACTIVE ESTERS. Journal of Organic Chemistry 50, 1497-1504 (1985).

180. Han, Y.X., Albericio, F. & Barany, G. Occurrence and minimization of cysteine racemization during stepwise solid-phase peptide synthesis. Journal of Organic Chemistry 62, 4307-4312 (1997).

181. Windridge, G.C. & Jorgensen, E.C. 1-HYDROXYBENZOTRIAZOLE AS A RACEMIZATION-SUPPRESSING REAGENT FOR INCORPORATION OF IM-BENZYL-L-HISTIDINE INTO PEPTIDES. Journal of the American Chemical Society 93, 6318-& (1971).

182. Palmer, D.K., Oday, K., Wener, M.H., Andrews, B.S. & Margolis, R.L. A 17-KD CENTROMERE PROTEIN (CENP-A) COPURIFIES WITH NUCLEOSOME CORE PARTICLES AND WITH HISTONES. Journal of Cell Biology 104, 805-815 (1987).

183. Yoda, K. et al. Human centromere protein A (CENP-A) can replace histone H3 in nucleosome reconstitution in vitro. Proceedings of the National Academy of Sciences of the United States of America 97, 7266-7271 (2000).

184. Earnshaw, W.C. Discovering centromere proteins: from cold white hands to the A, B, C of CENPs. Nature Reviews Molecular Cell Biology 16, 443-449 (2015).

185. McKinley, K.L. & Cheeseman, I.M. The molecular basis for centromere identity and function. Nature Reviews Molecular Cell Biology 17 (2016).

186. Tomonaga, T. et al. Overexpression and mistargeting of centromere protein-A in human primary colorectal cancer. Cancer Research 63, 3511-3516 (2003).

187. Amato, A., Schillaci, T., Lentini, L. & Di Leonardo, A. CENPA overexpression promotes genome instability in pRb-depleted human cells. Molecular Cancer 8 (2009).

188. Dimitriadis, E.K., Weber, C., Gill, R.K., Diekmann, S. & Dalal, Y. Tetrameric organization of vertebrate centromeric nucleosomes. Proceedings of the National Academy of Sciences 107, 20317-20322 (2010).

189. Henikoff, S. et al. The budding yeast Centromere DNA Element II wraps a stable Cse4 hemisome in either orientation in vivo. eLife 3, 1-23 (2014).

190. Bui, M., Walkiewicz, M.P., Dimitriadis, E.K. & Dalal, Y. The CENP-A nucleosome: A battle between Dr Jekyll and Mr Hyde. Nucleus-Austin 4, 37-42 (2013).

191. Camahort, R. et al. Cse4 Is Part of an Octameric Nucleosome in Budding Yeast. Molecular Cell 35, 794-805 (2009).

192. Padeganeh, A. et al. Octameric CENP-A Nucleosomes Are Present at Human Centromeres throughout the Cell Cycle. Current Biology 23, 764-769 (2013).

193. Hasson, D. et al. The octamer is the major form of CENP-A nucleosomes at human centromeres. Nature Structural & Molecular Biology 20, 687-+ (2013).

194. Tachiwana, H. et al. Crystal structure of the human centromeric nucleosome containing CENP-A. Nature 476, 232-235 (2011).

195. Zeitlin, S.G., Barber, C.M., Allis, C.D. & Sullivan, K.F. Differential regulation of CENP-A and histone H3 phosphorylation in G2/M. Journal of Cell Science 114, 653-661 (2000).

196. Niikura, Y. et al. CENP-A K124 Ubiquitylation Is Required for CENP-A Deposition at the Centromere. Developmental Cell 32, 589-603 (2015).

197. Bailey, A.O. et al. Posttranslational modification of CENP-A influences the conformation of centromeric chromatin. Proceedings of the National Academy of Sciences of the United States of America 110, 11827-11832 (2013).

198. Yu, Z. et al. Dynamic Phosphorylation of CENP-A at Ser68 Orchestrates Its Cell-Cycle-Dependent Deposition at Centromeres. Developmental Cell 32, 68-81 (2015).

199. Bui, M. et al. Cell-Cycle-Dependent Structural Transitions in the Human CENP-A Nucleosome In Vivo. Cell 150, 317-326 (2012).

200. Winogradoff, D., Zhao, H.Q., Dalal, Y. & Papoian, G.A. Shearing of the CENP-A dimerization interface mediates plasticity in the octameric centromeric nucleosome. Scientific Reports 5 (2015).

201. Bui, M. et al. unpublished manuscript. (2016).


202. Carroll, C.W., Milks, K.J. & Straight, A.F. Dual recognition of CENP-A nucleosomes is required for centromere assembly. Journal of Cell Biology 189, 1143-1155 (2010).

203. Fierz, B., Kilic, S., Hieb, A.R., Luger, K. & Muir, T.W. Stability of Nucleosomes Containing Homogenously Ubiquitylated H2A and H2B Prepared Using Semisynthesis. Journal of the American Chemical Society 134, 19548-19551 (2012).

204. Dalal, Y. Unpublished work.

205. Zuker, M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Research 31, 3406-3415 (2003).

206. Siman, P. et al. Chemical Synthesis and Expression of the HIV-1 Rev Protein. Chembiochem 12, 1097-1104 (2011).

207. Vilsmeier, A. & Haack, A. The effect of halogen phosphor on alkyl formanilide - A new method for the characterisation of secondary and tertiary p-alkylamino-benzaldehyde. Berichte Der Deutschen Chemischen Gesellschaft 60, 119-122 (1927).

208. Sobel, R.E., Cook, R.G., Perry, C.A., Annunziato, A.T. & Allis, C.D. Conservation of deposition-related acetylation sites in newly synthesized histones H3 and H4. Proceedings of the National Academy of Sciences of the United States of America 92, 1237-1241 (1995).

209. Ye, J.X. et al. Histone H4 lysine 91 acetylation: A core domain modification associated with chromatin assembly. Molecular Cell 18, 123-130 (2005).

210. Yang, X.H. et al. HAT4, a Golgi Apparatus-Anchored B-Type Histone Acetyltransferase, Acetylates Free Histone H4 and Facilitates Chromatin Assembly. Molecular Cell 44, 39-50 (2011).

211. Ge, Z.Q. et al. Sites of Acetylation on Newly Synthesized Histone H4 Are Required for Chromatin Assembly and DNA Damage Response Signaling. Molecular and Cellular Biology 33, 3286-3298 (2013).


212. Iwasaki, W. et al. Comprehensive Structural Analysis of Mutant Nucleosomes Containing Lysine to Glutamine (KQ) Substitutions in the H3 and H4 Histone-Fold Domains. Biochemistry 50, 7822-7832 (2011).

213. Park, Y.J., Chodaparambil, J.V., Bao, Y.H., McBryant, S.J. & Luger, K. Nucleosome assembly protein 1 exchanges histone H2A-H2B dimers and assists nucleosome sliding. Journal of Biological Chemistry 280, 1817-1825 (2005).

214. Seenaiah, M., Jbara, M., Mali, S.M. & Brik, A. Convergent Versus Sequential Protein Synthesis: The Case of Ubiquitinated and Glycosylated H2B. Angewandte Chemie-International Edition 54, 12374-12378 (2015).

215. Wang, S.S. Solid-phase synthesis of protected peptide hydrazides - Preparation and application of hydroxymethyl resin and 3-(p-benzyloxyphenyl)-1,1-dimethylpropyloxycarbonylhydrazide resin. Journal of Organic Chemistry 40, 1235-1239 (1975).

216. Bello, C., Kikul, F. & Becker, C.F.W. Efficient generation of peptide hydrazides via direct hydrazinolysis of Peptidyl-Wang-TentaGel resins. Journal of Peptide Science 21, 201-207 (2015).

217. Dawson, P.E. Personal Communication. (2015).

218. Wang, J.-X. et al. Peptide o-Aminoanilides as Crypto-Thioesters for Protein Chemical Synthesis. Angewandte Chemie-International Edition 54, 2194-2198 (2015).


Appendix A: Standard Laboratory Solutions

15% acrylamide gel

1.77 mL H2O
1.88 mL 40% Acrylamide/Bisacrylamide Solution (37.5:1)
1.25 mL 1.5 M Tris pH 8.8
50 µL 10% SDS
50 µL 10% ammonium persulfate
2 µL tetramethylethylenediamine (TEMED)

Stacking gel (5% acrylamide)

605 µL H2O
125 µL 40% Acrylamide/Bisacrylamide Solution (37.5:1)
250 µL 0.5 M Tris pH 6.8
10 µL 10% SDS
10 µL 10% ammonium persulfate
1 µL TEMED
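As an arithmetic check on the two gel recipes above, the final acrylamide percentage follows from the stock concentration and the summed component volumes. The sketch below is illustrative only: the function name `final_percent` and the variable names are assumptions introduced here, not part of the original protocol.

```python
def final_percent(stock_percent, stock_volume_ml, other_volumes_ml):
    """Return the final % of a stock solution after mixing with the other components."""
    total_ml = stock_volume_ml + sum(other_volumes_ml)
    return stock_percent * stock_volume_ml / total_ml

# Resolving gel: 1.88 mL of 40% stock plus H2O, Tris, SDS, APS, TEMED (all in mL)
resolving = final_percent(40.0, 1.88, [1.77, 1.25, 0.050, 0.050, 0.002])

# Stacking gel: 125 uL of 40% stock plus H2O, Tris, SDS, APS, TEMED (all in mL)
stacking = final_percent(40.0, 0.125, [0.605, 0.250, 0.010, 0.010, 0.001])

print(f"resolving gel: {resolving:.1f}% acrylamide")
print(f"stacking gel: {stacking:.1f}% acrylamide")
```

With the listed volumes, both recipes come out within rounding of their nominal 15% and 5% values.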

6 x SDS loading buffer

To make 10 mL of 6 x SDS loading buffer, the following was measured:
1.2 g Sodium dodecyl sulfate (SDS)
6 mg Bromophenol blue
4.7 mL Glycerol
1.2 mL 0.5 M Tris pH 6.8
2.1 mL H2O

1 M Dithiothreitol (DTT) was added to a 1 mL aliquot of 6 x SDS buffer to a final concentration of 100 mM.
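The recipe gives the target DTT concentration but not the volume of 1 M stock to add. A minimal sketch of that calculation is shown below, assuming the 100 mM final concentration refers to the total volume after the DTT has been added; the function name `stock_volume_to_add` is hypothetical and the result is a worked example, not a volume stated in the original protocol.

```python
def stock_volume_to_add(stock_mM, final_mM, aliquot_uL):
    """Volume of stock (uL) to add so that the mixture reaches final_mM.

    Solves stock_mM * v = final_mM * (aliquot_uL + v) for v.
    """
    if final_mM >= stock_mM:
        raise ValueError("final concentration must be below the stock concentration")
    return final_mM * aliquot_uL / (stock_mM - final_mM)

# 1 M DTT stock added to a 1 mL aliquot, targeting 100 mM in the final mixture
dtt_uL = stock_volume_to_add(stock_mM=1000.0, final_mM=100.0, aliquot_uL=1000.0)
print(f"add {dtt_uL:.0f} uL of 1 M DTT")
```

Under this assumption the addition also dilutes the loading buffer by about 10%, so the mixture is slightly below 6 x.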


SDS-PAGE running buffer

5 x Tris Gly Buffer was prepared using the following:
60.4 g Tris base
376 g glycine
Volume was brought up to 4 L with H2O.

1 x Tris Gly was prepared by making a 1/5 dilution of 5 x Tris Gly.

Coomassie Stain

2 g of Coomassie Brilliant Blue was dissolved in 4 L of 45% methanol, 10% glacial acetic acid, and 45% H2O.

Destain

4 L was prepared with 47.5% methanol, 10% acetic acid, and 42.5% H2O.

LB growth media

20 g LB Broth Lennox (Fisher Scientific) was resuspended in 1 L H2O and autoclaved. For LB Amp media, ampicillin was added to the autoclaved LB media to a final concentration of 100 µg/mL.

SOC growth media

0.5% yeast extract
2% Tryptone
10 mM NaCl
2.5 mM KCl
10 mM MgCl2
10 mM MgSO4
20 mM Glucose (added after autoclaving)
