
University of Nevada, Reno

Developing Novel Non-Natural Amino Acids as Spectroscopic Reporters of Structure for Peptide Systems

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Chemistry

by

Amy R. Cunningham

Dr. Matthew J. Tucker/Thesis Advisor

August 2018

University of Nevada, Reno The Graduate School

We recommend that the thesis prepared under our supervision by

AMY R. CUNNINGHAM

entitled

Developing Novel Non-Natural Amino Acids as Spectroscopic Reporters of Structure for Peptide Systems

be accepted in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Matthew J. Tucker, Ph.D., Advisor

Laina Geary, Ph.D., Committee Member

Ian Wallace, Ph.D., Graduate School Representative

David W. Zeh, Ph.D., Dean, Graduate School

August 2018

Abstract

Properly folded proteins are essential to living organisms. Misfolded proteins can lead to serious diseases such as Alzheimer’s disease, Lou Gehrig’s disease (ALS), and muscular dystrophy. The key to potentially preventing such diseases lies in understanding how and why protein misfolding occurs and in determining ways to prevent it. This thesis describes a possible method of determining how proteins fold in solution using vibrational probes, two types of infrared spectroscopy (Fourier transform infrared and two-dimensional infrared), and molecular dynamics simulations. With these three techniques, the angle, distance, and coupling constant between vibrational probes can be determined. These three quantities allow the orientation of the backbone of each peptide or protein, as well as the orientation of the probes within the peptide or protein, to be determined in solution.

These studies lay the groundwork for technical advances in monitoring in solution on the femtosecond timescale.


Table of Contents

Abstract………………………………………………………………………………….....i

Introduction………………………………………………………………………………..1

Chapter 1: Experimental and Computational Methods……………………………………7

1.1 Peptide Synthesis……………………………………………………………...7

1.2 Two-Dimensional Infrared Spectroscopy……………………………………..8

1.2.1 Theory…………………………...………………………………...8

1.2.2 Utilizing a 2D IR Spectrum to Determine Coupling Constants…..8

1.2.3 Experimental Layout and Design………………………………...11

1.2.4 Pulse Shaper Layout…………………………………………..…11

1.3 Performing 2D IR Experiments Using a Mid-IR Pulse Shaper……………...11

Chapter 2: Cyanophenylalanine and Azide Derivatized Amino Acids…………………..15

2.1 Introduction…………………………………………………………………..15

2.2 Experimental Methods……………………………………………………….16

2.2.1 FTIR Spectra……………………………………………………..16

2.2.2 Computational Methods……………………………………….....17

2.3 Experimental Results………………………………………………………...17

2.4 Transition Dipole Coupling Model and Simulations………………………...19

2.5 Discussion……………………………………………………………………19

Chapter 3: Comparing and as a Spacer in Short ……………...30

3.1 Introduction…………………………………………………………………..30

3.2 Computational Methods……………………………………………………...31

3.2.1 Calculating Peptide Length and Dihedral Angles……………….32

3.3 Results………………………………………………………………………..33

3.4 Conclusions…………………………………………………………………..35

Chapter 4: Fourier Transform Infrared Spectroscopy of Plants…………………………44

4.1 Introduction…………………………………………………………………..44

4.2 Experimental…………………………………………………………………44

4.3 Results and Discussion………………………………………………………46

4.4 Conclusion………………………………………………………………...…47

Summary…………………………………………………………………………………54

References………………………………………………………………………………..55

Appendix A: VMD and MatLab Code…………………………………………………...63

Appendix B: Supplemental Figures……………………………………………………...76


List of Tables

Table 2.1 Frequencies and bandwidths of cyanophenylalanine derivatives in water and tetrahydrofuran (THF)…………………………………………………………………...22

Table 2.2 Extinction coefficients and transition dipole strengths of cyanophenylalanine derivatives………………………………………………………………………………..23

Table 2.3 Extinction coefficients and transition dipole moment strengths of selected infrared probes…………………………………………………………………………...24


List of Figures

Figure I.1 Definition of the dihedral angles φ/ψ that are used to construct Ramachandran plots ………………………………………………………………………4

Figure I.2 Allowed and favored regions of a Ramachandran plot………………………...5

Figure I.3 Peak movement in the weak coupling regime……………………………….....6

Figure 1.1 General example of a 2D IR spectrum……………………………………….13

Figure 1.2 Determining coupling constants using frequency of IR bands……………….14

Figure 2.1 Linear infrared spectra of cyanophenylalanine derivatives………………..…25

Figure 2.2 FTIR spectra of short peptides in water……………………………………...26

Figure 2.3 FTIR spectrum of a short peptide that has been fitted with two different ratios of Gaussian curves……………………………………………………………………….27

Figure 2.4 FTIR spectrum of a long peptide that has been fitted with four different ratios of Gaussian curves…………………………………………………………………….…28

Figure 2.5 Coupling constant with respect to distance………………………………….29

Figure 3.1 Depiction of the ‘bond’ that was used to determine the length of the peptides in the MD simulations……………………………………………………………………37

Figure 3.2 Dihedral angles between the nitrile probes….……………………………….38

Figure 3.3 Distance distributions for short and long peptides…………………………..39

Figure 3.4 Distance distributions from multiple starting conformations………………..40

Figure 3.5 Distance distributions after 10 ns and 100 ns………………………………..41

Figure 3.6 Ramachandran plots………………………………………………………….42

Figure 3.7 Distance, angle, and coupling constant maps………………………………...43

Figure 4.1 Depiction of the multiple bounce behavior that occurs in the crystal of the ATR instrument………………………………………………………………………….48

Figure 4.2 ATR-FTIR spectra of Astragalus leaves (unprepared) ………………………49

Figure 4.3 ATR-FTIR spectra of Astragalus leaves (without wax coating)………….…50

Figure 4.4 ATR-FTIR spectra of methanol extraction of Astragalus leaves with wax….51

Figure 4.5 ATR-FTIR spectra of methanol extraction of Astragalus leaves w/o wax…..52

Figure 4.6 ATR-FTIR spectra of Astragalus leaves after methanol extraction with wax.53


Introduction

Many diseases, such as Alzheimer’s disease, Lou Gehrig’s disease (ALS), and muscular dystrophy, are caused by misfolded peptides and proteins. If the correctly folded structure and/or the aggregation mechanism of β-amyloid peptide, one of the main peptides that cause Alzheimer’s disease, can be elucidated, then potential treatments for this disease could be pursued. The studies that follow illustrate a combination of methods that can be used to determine structural information for larger peptide systems.

Protein structure can be divided into four categories: primary (a chain of amino acids), secondary (helices and sheets), tertiary (coordination of helices and sheets into a single structure), and quaternary (coordination of fully folded proteins with other fully folded proteins).

Peptides are shorter than proteins, and in solution they exhibit secondary structures such as α-, 3₁₀-, π-, poly-proline, and poly-glycine helices as well as β-sheets. The α-, 3₁₀-, and π-helices are all right-handed helices with backbone hydrogen bonds between residues i···i+4, i···i+3, and i···i+5, respectively.1,2 Poly-glycine and poly-proline chains form left-handed helices. Since the handedness of the helix changes when these residues are added to a sequence, proline and glycine can be called ‘helix breakers’ of the common right-handed helices.3,4 The backbone dihedral angles, defined by the atoms C1–N–Cαʹ–C1ʹ (φ) and N–Cαʹ–C1ʹ–Nʹ (ψ), form φ/ψ pairs that can be plotted in a Ramachandran plot (Figure I.1).5,6 Since each secondary structure has a characteristic set of φ/ψ angles, this type of plot indicates which secondary structures a peptide could be adopting based on its φ/ψ pairs. The favored conformational regions shown in the regular Ramachandran plot do not apply to poly-proline or poly-glycine peptides: glycine residues create a very flexible backbone that can adopt nearly any φ/ψ combination, whereas proline residues create such a rigid structure that the peptide angles are constrained to a small region of the plot (Figure I.2).7
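As a minimal sketch of how a single φ or ψ value can be computed from four backbone atom positions, the Python function below evaluates a dihedral angle with the standard atan2 formula. The function name `dihedral_deg` and the test coordinates are illustrative assumptions; they are not taken from the VMD or MatLab code in Appendix A.

```python
import math

def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Dihedral angle (degrees, -180..180) defined by four points.

    For phi, the four points would be C1, N, CA', C1'; for psi, N, CA', C1', N'.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal to the plane of p0, p1, p2
    n2 = _cross(b2, b3)          # normal to the plane of p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical test geometries (not peptide coordinates).
trans = dihedral_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
cis = dihedral_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
quarter = dihedral_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1))
```

Applying such a function to every residue of every simulation frame gives the φ/ψ pairs that populate a Ramachandran plot.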

Spectroscopic techniques, such as infrared spectroscopy, can be used to determine the various secondary structures within a peptide through the application of molecular probes in the system. Common probes are the carbonyl group (C=O), which is found in the peptide backbone and exists as a part of every residue, isotopic labels such as deuterium (C–D stretch)8,9,10 that can be substituted for hydrogen atoms on the amino acid side chains, large metal carbonyls such as tungsten hexacarbonyl (W(CO)₆),11,12 or smaller probes such as azide (R–N₃)11,13 and nitrile (C≡N).11 These probes all have transition dipole moments that can interact with each other either through space or through molecular bonds. This interaction is referred to as vibrational coupling and can be used to determine structural information about each peptide.

This vibrational coupling is evident in linear infrared spectroscopy when splitting between the coupled transitions and intensity changes are observed relative to the uncoupled transitions (Figure I.3). Unfortunately, if the coupling is very small there will not be any clear evidence of coupling in the linear spectrum, such as changes in peak intensities or peak frequency shifts. A technique that has much more sensitivity than linear infrared spectroscopy is two-dimensional infrared (2D IR) spectroscopy. This technique is analogous to two-dimensional nuclear magnetic resonance (2D NMR), which projects data on two axes.14,15,16,17 2D IR is more sensitive than linear infrared spectroscopy because the signal is proportional to the fourth power of the transition dipole moment, whereas the linear signal is proportional to the square of the transition dipole moment.18,19,20,21,22,23,24 The theory behind 2D IR is discussed in more detail in Chapter 1.

Peptide structure in solution can also be approached from a computational standpoint with the use of Nanoscale Molecular Dynamics (NAMD) and Visual Molecular Dynamics (VMD) software as well as Gaussian calculations. One of the benefits of performing molecular dynamics simulations is that a wide variety of peptides can be sampled in a relatively short timeframe without the costs associated with experimental peptide synthesis and expensive laser setups. This allows multiple theoretical peptides of interest to be screened and candidates to be selected for further experimental investigation. Software such as VMD provides a way to visualize peptide structure, and computational techniques such as NAMD allow structural changes of peptides in solution (e.g. the distance and angle between certain parts of the peptide) to be determined, which can lead to hypotheses regarding the potential solution structure of the peptide. Ramachandran plots can also be constructed from these data and used to assign structure.

In the studies that follow, it will be shown that infrared probes can be incorporated into small peptide systems, vibrational coupling between the probes can be quantified, and the distance and angle between sets of infrared probes can be determined both experimentally and computationally. These quantities are combined to form a ‘molecular ruler’ that serves as a measuring device for angle and distance determination in larger unknown systems using 2D IR.

This molecular ruler will make it possible to know the angle and distance between infrared probes by simply obtaining the coupling constant from the 2D IR spectra of the peptides and proteins of interest over many different time delays (i.e. transient 2D IR), which will in turn create a molecular ‘movie’ of the peptide or protein folding over time.


Figure I.1 Definition of the dihedral angles φ/ψ that are used to construct Ramachandran plots (image modified from http://cib.cf.ocha.ac.jp/bitool/DIHED2/)


Figure I.2 Allowed and favored regions of a Ramachandran plot

The allowed (blue) and favored (red) regions of the Ramachandran plot for typical peptides that do not contain proline or glycine (upper left), glycine-rich peptides (upper right), and proline-rich peptides (bottom) (produced from JavaScript from the Robinson Group and regions based on calculations performed by the Richardson Group)


Figure I.3 Peak movement in the weak coupling regime

Depiction of how the peaks would move in an FTIR spectrum when coupling is modeled by the weak coupling regime


Chapter 1 Experimental and Computational Methods

1.1 Peptide Synthesis

Peptides were synthesized via fluorenylmethyloxycarbonyl (FMOC) solid-phase peptide synthesis. This synthesis begins by attaching the first amino acid to a resin, which acts as the scaffold for the rest of the peptide. Next, the FMOC protecting group is removed (de-protection) using piperidine, leaving a free amine group on the amino acid attached to the scaffold. Lastly, a peptide bond is formed (i.e. coupling) between the amine group of the amino acid on the scaffold and the carboxyl group of the next amino acid in the sequence using 1-hydroxy-benzotriazole (HOBt) as the activation agent. This cycle of de-protection and coupling continues until all of the amino acids in the sequence have been added. Once the peptide is completely synthesized, it is cleaved from the resin using a cocktail of 95% trifluoroacetic acid (TFA), 2.5% water, and 2.5% triisopropylsilane (TIS). A depiction of the deprotection and coupling steps is found below.


1.2 Two-Dimensional Infrared Spectroscopy (2D IR)

1.2.1 Theory

Two-dimensional infrared (2D IR) spectroscopy is a three-pulse infrared experiment that probes not only the main 0-1 transition but also the 1-2 transition, revealing information on the shape of the potential energy curve as well as cross peaks that can indicate coupling, energy transfer, or chemical exchange.16,25,26,27,15,23,28,12,22 These three processes can be differentiated by observing the behavior of the cross peaks in the 2D IR spectrum. Cross peaks that are present immediately, before the system has been given the chance to evolve, indicate coupling,29,30 whereas cross peaks that grow in intensity over time indicate an energy transfer process,30 and cross peaks that change in intensity (one growing in intensity and the other reducing in intensity) indicate a chemical exchange process.25,26,31 The first pulse in the three-pulse interaction labels the starting frequencies of all of the dipoles in the system of study and creates a coherence between the ground state and first excited state energy levels. The second pulse ends this coherence and initiates the waiting time over which the system will evolve. The third pulse is the probe pulse, which ends the waiting time and reads out the final frequencies of all the dipoles in the system.30

A plot is created by Fourier transforming over the time between the first and second pulses (τ) and between the third pulse and the photon echo (t) (Figure 1.1).22,20,26,27,13,23

1.2.2 Utilizing a 2D IR Spectrum to Determine Coupling Constants

Many pieces of information can be gathered from these two-dimensional contour plots. For example, cross peaks can indicate coupling or energy transfer between the vibrational modes, while the peaks along the diagonal indicate the coupled transition frequencies of each probe and, depending on their shape at equilibrium and over time, also elucidate solvent and environmental dynamics around each probe.

The coupling constant, or strength of an interaction between two objects, can be determined experimentally using the 2D IR spectrum or computationally using a coupling model.

In order to determine the coupling constant experimentally, the following equation is used:

$$\Delta_{ab} = -4\,\Delta_{bb}\,\frac{\beta_{ab}^{2}}{(\varepsilon_a - \varepsilon_b)^{2}} \qquad \text{(Eq. 1)}$$

where Δab is the frequency shift between the positive and negative portions of one of the cross peaks, Δbb is the frequency shift between the positive and negative portions of one of the peaks along the diagonal, εa and εb are the uncoupled frequencies of the oscillators being studied (obtained by way of a linear infrared spectrum of each individual probe in solution), and βab is the coupling constant.
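The relation in Eq. 1 can be inverted for the magnitude of the coupling constant. The sketch below assumes the perturbative form Δab = −4Δbb·βab²/(εa − εb)², with all quantities in cm⁻¹; the function names and the numerical inputs are hypothetical and are not measured values from this work.

```python
import math

def cross_peak_shift(beta_ab, delta_bb, eps_a, eps_b):
    """Forward form of Eq. 1: cross-peak shift from a known coupling constant."""
    return -4.0 * delta_bb * beta_ab ** 2 / (eps_a - eps_b) ** 2

def coupling_from_shifts(delta_ab, delta_bb, eps_a, eps_b):
    """Invert Eq. 1 for |beta_ab| (the sign of beta is not recoverable here)."""
    ratio = -delta_ab / (4.0 * delta_bb)
    if ratio < 0:
        raise ValueError("delta_ab and delta_bb must have opposite signs in Eq. 1")
    return abs(eps_a - eps_b) * math.sqrt(ratio)

# Hypothetical inputs: 1 cm^-1 coupling, 25 cm^-1 diagonal shift,
# uncoupled frequencies 8 cm^-1 apart.
shift = cross_peak_shift(1.0, 25.0, 2220.0, 2228.0)
beta = coupling_from_shifts(shift, 25.0, 2220.0, 2228.0)
```

Because Eq. 1 depends on βab², only the magnitude of the coupling is obtained this way; the sign has to come from a model such as transition dipole coupling.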

One of the most conventional theoretical models that can be used to calculate the coupling strength is transition dipole coupling theory (TDC)32,33,34 which is defined by the following equation:

$$\beta_{ij} = \frac{1}{4\pi\varepsilon_0}\left(\frac{\vec{\mu}_i\cdot\vec{\mu}_j}{|\vec{r}_{ij}|^{3}} - \frac{3(\vec{\mu}_i\cdot\vec{r}_{ij})(\vec{\mu}_j\cdot\vec{r}_{ij})}{|\vec{r}_{ij}|^{5}}\right) \qquad \text{(Eq. 2)}$$

where μi and μj are the transition dipoles, rij is the vector connecting the dipoles (its magnitude is the distance between them), and βij is the coupling constant. The transition dipole strengths of each oscillator were determined experimentally using linear infrared spectra and the following equation:

$$|\mu_{tot}|^{2} = 9.18\times10^{-3}\int \frac{\varepsilon(\omega)}{\omega}\,d\omega \qquad \text{(Eq. 3)}$$

where ε(ω) is the extinction coefficient at each frequency along the linear IR spectrum and ω is the frequency.24 The dot product between the transition dipoles in Eq. 2 accounts for the angle between the dipoles (θ), and the dot products between each dipole and rij account for the orientation of each dipole relative to the line connecting them. The angle between the dipole moments is typically set as either 0 or 180 degrees in order to calculate the minimum and maximum amount of coupling that is to be expected in the system, respectively. A minimum and maximum coupling constant is calculated for each peptide system so that a range can be determined that accounts for any angle between the dipoles. A more precise angle between the dipole moments can be determined using molecular dynamics simulations or static Gaussian calculations. A more detailed description of the molecular dynamics process is found in Chapter 3 and a description of the static Gaussian calculations is found in Chapter 2.
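A minimal numerical sketch of the transition dipole coupling expression (Eq. 2) is given below. It assumes dipoles in Debye and separations in Å, so that the 1/(4πε0) prefactor reduces to the conversion factor 1 D²/Å³ ≈ 5034 cm⁻¹; the function name, the constant name, and the input geometry are illustrative assumptions, not values used in this work.

```python
import math

# 1 D^2/Angstrom^3 expressed in cm^-1 (assumed unit convention: Debye, Angstrom).
DEBYE2_PER_ANGSTROM3_IN_CM1 = 5034.12

def _dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))

def tdc_coupling(mu_i, mu_j, r_ij):
    """Transition dipole coupling (cm^-1) for dipole vectors in D and r_ij in Angstrom."""
    r = math.sqrt(_dot(r_ij, r_ij))
    term_dipoles = _dot(mu_i, mu_j) / r ** 3
    term_axis = 3.0 * _dot(mu_i, r_ij) * _dot(mu_j, r_ij) / r ** 5
    return DEBYE2_PER_ANGSTROM3_IN_CM1 * (term_dipoles - term_axis)

# Hypothetical geometry: two 0.3 D dipoles lying on the line that joins them,
# 10 Angstrom apart, with an angle of 0 or 180 degrees between them.
r_vec = (10.0, 0.0, 0.0)
beta_parallel = tdc_coupling((0.3, 0.0, 0.0), (0.3, 0.0, 0.0), r_vec)
beta_antiparallel = tdc_coupling((0.3, 0.0, 0.0), (-0.3, 0.0, 0.0), r_vec)
```

For this collinear arrangement the two limiting angles give couplings of equal magnitude and opposite sign, which bracket the range expected for intermediate angles.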

The coupling constant can also be predicted based on the position of the peaks in the linear infrared spectrum. This is done by using the Hamiltonian from perturbation theory:

$$\hat{H} = \begin{bmatrix} \omega_1 & \beta_{12} \\ \beta_{21} & \omega_2 \end{bmatrix} \qquad \text{(Eq. 4)}$$

where ω1 and ω2 are the uncoupled frequencies of the probes in the linear IR spectrum and β is the coupling constant. A plot of the coupled frequency vs. coupling constant is constructed by plotting the eigenvalues of the matrix (Eq. 4) with respect to a changing coupling constant (Figure 1.2). In Figure 1.2, a coupling constant can be approximated by taking the probe frequencies from the linear spectrum, matching these frequencies to those found on the y-axis, drawing two horizontal lines connecting the y-axis frequencies to the two curves on the plot, and finally drawing a vertical line down to the coupling constant (x-axis). An example of one of these frequency vs. coupling constant plots is found in Figure 1.2. In this case ω1 = 2220 cm⁻¹, ω2 = 2228 cm⁻¹, and β was calculated using Eq. 4. Transitions occurred at 2218 and 2229 cm⁻¹ in the linear spectrum of this peptide and the resulting coupling constant was predicted to be 4.5 cm⁻¹.
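The eigenvalues of the two-level Hamiltonian (Eq. 4) have a closed form, which the sketch below uses to generate the coupled frequencies for a given coupling constant and to estimate |β| from an observed splitting. It assumes a symmetric matrix (β12 = β21); the function names are hypothetical, and the inputs are the frequencies quoted in the text.

```python
import math

def coupled_frequencies(w1, w2, beta):
    """Eigenvalues (low, high) of [[w1, beta], [beta, w2]], all in cm^-1."""
    mean = 0.5 * (w1 + w2)
    half_split = math.sqrt((0.5 * (w1 - w2)) ** 2 + beta ** 2)
    return mean - half_split, mean + half_split

def coupling_from_splitting(w1, w2, obs_low, obs_high):
    """|beta| implied by the observed splitting of the coupled transitions."""
    diff = (obs_high - obs_low) ** 2 - (w1 - w2) ** 2
    if diff < 0:
        raise ValueError("observed splitting is smaller than the uncoupled splitting")
    return 0.5 * math.sqrt(diff)

# Uncoupled frequencies and graphical coupling estimate quoted in the text.
low, high = coupled_frequencies(2220.0, 2228.0, 4.5)   # about 2218.0 and 2230.0
# Estimate from the observed transitions at 2218 and 2229 cm^-1.
beta_est = coupling_from_splitting(2220.0, 2228.0, 2218.0, 2229.0)  # about 3.8
```

With β = 4.5 cm⁻¹ the lower eigenvalue falls at about 2218.0 cm⁻¹, in line with the lower observed transition; using the full observed splitting instead gives about 3.8 cm⁻¹, so the two read-outs agree to within roughly 1 cm⁻¹.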

If all of the parameters (transition dipole strengths, coupling constant, distance between the dipoles and the angle between them) are known, then it is possible to elucidate structural information about the system of study.

1.2.3 Experimental Layout and Design

1.2.4 Pulse Shaper Layout

As seen from the layout above, the pump beam enters the pulse shaper through iris 1 as a single pulse (right side), then hits mirror 1 and reflects onto grating 1, which disperses the pulse into its component frequencies. These dispersed frequencies reflect off parabola 1 and travel through the germanium crystal of the acousto-optic modulator (AOM), which shapes the single pulse into two pulses by placing an acoustic-wave mask over the dispersed frequencies. The two pulses exit the AOM and reflect off parabola 2, and their frequencies are recombined on grating 2. The recombined pulses reflect off mirror 2 and exit the pulse shaper through iris 2 (left side).

1.3 Performing 2D IR Experiments Using a Mid-IR Pulse Shaper

Structural information about molecules can be obtained from 2D IR spectra in the form of vibrational frequencies, coupling cross peaks, and lineshapes.35,36,19 Cross peaks are used to determine the structure of the molecule because the coupling depends on the distance and orientation between the two dipoles of interest.35 One of the fastest ways of collecting 2D IR spectra is to use a pulse shaper. The mid-IR pulse shaper is set up in a 4f geometry that is similar to that of the visible pulse shaper.37,38 This standard 4f geometry is composed of two gratings, two parabolic mirrors, and one acousto-optic modulator (AOM). The first grating and parabolic mirror are used to disperse the mid-IR light and collimate the dispersed pulse on the germanium window of the AOM. An acoustic wave, whose phase and amplitude are dictated by an arbitrary waveform generator, travels through the germanium crystal and acts as a programmable grating that deflects the mid-IR pulse frequencies with a specific phase and amplitude.35,36 Lastly, the second parabola and grating are used to recombine the frequencies and transform them back into the time domain.35 This technique is fast because the acoustic wave can travel the length of the germanium crystal of the AOM in 10 μs.35,37 A new waveform can therefore be generated at a 100 kHz repetition rate, much faster than the 1 kHz repetition rate of the pulsed lasers that are typically used for these experiments.35,37 This technique is also faster than using a BOXCAR setup because there are no moving mirrors in the setup.
All of the waveforms that will be used in the experiment are loaded into the memory of the arbitrary waveform generator and, when the experiment is started, each waveform is sent through the AOM one by one with each successive laser pulse.35 Scatter from the sample can be removed by a technique called phase cycling, in which the observed frequencies of the scatter are shifted to a different frequency, resulting in a cleaner spectrum of the desired signal.35,19

The pump pulses that exit the pulse shaper have a Gaussian line shape instead of the Lorentzian or etalon line shapes that were used in other methods of collecting 2D IR spectra.19 The Gaussian line shape is used because it decays to zero faster than the other two, which is advantageous because the tail end of one pulse does not interfere with the interaction of the next pulse in the pulse sequence with the sample.19


[Schematic 2D IR spectrum: axes ωτ and ωt; diagonal peaks at ωa and ωb; splittings Δaa, Δbb, and Δab labeled]

Figure 1.1 General example of a 2D IR spectrum

Positive peaks (0 – 1 transitions) are shown in red and negative peaks (1 – 2 transitions) are shown in blue. Structural information for the peptide is obtained by using the values for Δab and Δbb and inserting them into Eq. 1 to calculate the coupling constant. The value for the coupling constant can then be inserted into Eq. 2 along with the value for the transition dipole strength (calculated from Eq. 3) and the distance between the probes (obtained from a molecular dynamics simulation) in order to calculate the angle between the oscillators.


Figure 1.2 Determining coupling constants using frequency of IR bands

Plot of the frequencies that a transition will occur at based on the amount of coupling that is present in the system (black curves) as well as an approximation of a coupling constant (red lines) when the coupled frequencies of two probes in a system are used.


Chapter 2 Cyanophenylalanine and Azido-Derivatized Amino Acids

2.1 Introduction

Non-natural amino acids that can be used as structural and environmental probes are of great importance due to the number of diseases that involve the mis-folding of important peptides and proteins. One of the most widely used non-natural amino acids is p-cyanophenylalanine (PheCN). The nitrile (C≡N) probe is useful because its stretching frequency occurs in an isolated range of the infrared spectrum (2100 – 2400 cm⁻¹) and because it is small compared to some of the other common probes, which leads to less perturbation of the system of study. This small probe can also be site-specifically incorporated into any peptide system. These probes are incorporated into peptide systems by first attaching them to the amino acid of interest, for example attaching a nitrile group to phenylalanine to form cyanophenylalanine, and then synthesizing the peptide. Modified amino acids can take the place of un-modified amino acids in the sequence to probe the site of interest. One downside to using the nitrile stretch is that it has a rather small extinction coefficient (~200 M⁻¹cm⁻¹),39 and therefore requires either a large amount of sample or a long averaging time to collect the infrared spectra.

Another common probe that has been used for peptide structural studies is the azide moiety (R–N₃). One of the common amino acid side chains that this probe is attached to is alanine. Azides have an extinction coefficient that is approximately twice that of the nitrile stretch (350 – 400 M⁻¹cm⁻¹).40 Although this probe has a relatively strong extinction coefficient, it has more atoms than the nitrile probe and is therefore more prone to perturb the system. One probe that nearly eliminates the perturbation to the system of study is the C–D stretch. This isotopic substitution can be performed on any amino acid, but the extinction coefficient is so small (5 – 10 M⁻¹cm⁻¹)10 that multiple probes need to be in a system and many averages of the spectra need to be collected.

This study focused on using two different derivatives of PheCN. These derivatives are 2-PheCN and 3-PheCN, which have not been used as infrared probes.

2.2 Experimental Methods

The current study was performed using 2-cyanophenylalanine (2-PheCN), 3-cyanophenylalanine (3-PheCN), and 4-cyanophenylalanine (4-PheCN) as well as short peptides with the following sequences: 2-PheCN–3-PheCN and 2-PheCN–Pro₁₇–3-PheCN. The peptides were synthesized using standard fluorenylmethyloxycarbonyl chloride (FMOC) solid-phase synthesis with a peptide synthesizer (PS3, Protein Technologies) and purified using reverse-phase high performance liquid chromatography (HPLC, Hewlett Packard 1050 series). Peptide masses were verified using mass spectrometry (Waters LCMS) and matrix-assisted laser desorption ionization spectroscopy (Microflex, Bruker). 2-PheCN, 3-PheCN, and their FMOC derivatives were purchased from Chem-Impex Int’l Inc. 4-PheCN was purchased from BACHEM, and FMOC-4-PheCN was purchased from AnaSpec Inc. Solutions of 2-PheCN, 3-PheCN, 4-PheCN, FMOC-2-PheCN, FMOC-3-PheCN, FMOC-4-PheCN, and the peptides were made in ultrapure water (18.2 MΩ·cm, Millipore, Synergy) and tetrahydrofuran (THF, Reagent grade ACS) in concentrations ranging from 0.01 M – 0.03 M depending on the solubility of the compound. The solutions were then analyzed using Fourier transform infrared (FTIR) spectroscopy and two-dimensional infrared spectroscopy.

2.2.1 FTIR Spectra

FTIR spectra of the individual amino acids and short peptides were obtained using an infrared spectrometer (Nicolet 6700, Thermo Scientific) with a custom-designed external set-up and the following instrumental parameters: 1 cm⁻¹ resolution and a liquid nitrogen cooled MCT detector. A dual compartment FTIR injection sample cell with calcium fluoride windows and a 100 μm spacer was utilized to analyze the background and the sample under the same conditions by moving the cell from one side to the other with a translation stage. All spectra were obtained under nitrogen in a purged chamber at ambient temperature. Typical optical densities (OD) for the compounds ranged from 0.005 – 0.030 depending on solubility. All FTIR spectra were fit to a Gaussian lineshape to determine absolute peak positions, peak intensities, and peak widths.

2.2.2 Computational Methods

Theoretical calculations of nitrile vibrations in the gas phase were performed using B3LYP/6-31G* with Gaussian 09. MD simulations were performed using Nanoscale Molecular Dynamics (NAMD) and processed using Visual Molecular Dynamics (VMD, version 1.9.1).41 The two residue peptide was encased in a 30 Å cubic water box and the nineteen residue peptide was encased in a water box with dimensions of 65 x 50 x 55 Å. The NAMD simulations utilized the CHARMM22⁴² and CHARMM36⁴³,⁴⁴,⁴² force fields and were completed at 298 K and 1 atm. Simulations progressed with a timestep of 2 fs for 1 ns and 10 ns for the two residue and nineteen residue peptides, respectively. Distances between the nitrile probes were found by measuring the bond length between the carbon atoms on the phenyl rings over the length of the simulation. Histograms of these distances were constructed and fit to Gaussian curves in OriginPro8 to give the ensemble average distance.
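The distance analysis described above can be sketched in a few lines of Python. This is not the VMD/OriginPro workflow used in this work: it assumes the trajectory has already been exported as a list of frames of (x, y, z) coordinates in Å, and it reports the sample mean and standard deviation as a simple stand-in for the Gaussian fit of the histogram, which is only equivalent when the distribution is close to Gaussian. The function names, atom indices, and toy coordinates are hypothetical.

```python
import math
import statistics

def pair_distances(frames, i, j):
    """Distance (Angstrom) between atoms i and j in every frame.

    frames: list of frames, each a list of (x, y, z) tuples.
    """
    return [math.dist(frame[i], frame[j]) for frame in frames]

def ensemble_average(distances):
    """Sample mean and standard deviation of a list of distances."""
    return statistics.mean(distances), statistics.stdev(distances)

# Toy two-frame "trajectory" with two atoms per frame (hypothetical values).
frames = [
    [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 0.0, 7.0)],
]
dists = pair_distances(frames, 0, 1)
mean_d, std_d = ensemble_average(dists)
```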

2.3 Experimental Results

The linear IR spectra of 2-PheCN, 3-PheCN, and 4-PheCN in water and THF exhibit a single band in the region of the CN symmetric stretching mode (Figure 2.1). The nitrile vibrational transitions for 2-PheCN, 3-PheCN, and 4-PheCN in water are observed at 2232, 2238, and 2237 cm⁻¹, respectively. The corresponding bandwidths of these transitions in water are 11.31, 11.08, and 10.44 cm⁻¹, respectively. Likewise, the nitrile vibrational transitions for 2-PheCN, 3-PheCN, and 4-PheCN in THF occur at 2226, 2230, and 2228 cm⁻¹, respectively, with corresponding bandwidths of 7.29, 6.39, and 5.67 cm⁻¹ (Table 2.1). The bandwidth decreases from water to THF because water is a strongly hydrogen-bonding solvent that can interact with the CN moiety in many different ways, whereas THF is a non-hydrogen-bonding solvent that does not interact with the CN moiety.

The extinction coefficients of 2-PheCN and 3-PheCN in water were found to be 90.6 ± 16 M⁻¹cm⁻¹ and 136.6 ± 9 M⁻¹cm⁻¹, respectively, whereas the extinction coefficient of 4-PheCN in water was previously found to be 200 M⁻¹cm⁻¹ by Gai et al.39 Once the individual PheCN residues were characterized, two short peptides were synthesized so that coupling strengths and the two ends of a molecular ruler could be determined.

A short peptide composed of four residues (2-PheCN–Pro₂–3-PheCN) was synthesized to serve as the starting point for building out the molecular ruler. The linear infrared spectrum of the 4-mer (Figure 2.2, top) shows one transition at 2232.1 cm⁻¹ (according to a Gaussian curve fit) with a FWHM of 10.7 cm⁻¹. A second peptide composed of nineteen residues (2-PheCN–Pro₁₇–3-PheCN) was synthesized to serve as the maximum length of the molecular ruler, where no coupling would be observed between the nitrile tags. The linear infrared spectrum of the 19-mer (Figure 2.2, bottom) exhibits a single transition at 2235.3 cm⁻¹ with a FWHM of 13.85 cm⁻¹.


2.4 Transition Dipole Coupling Model and Simulations

Nanoscale Molecular Dynamics (NAMD) simulations were carried out on the 4-mer and 19-mer peptides in order to determine the ensemble average distance between the nitrile probes. The average distance between the probes for the 4-mer and the 19-mer peptides was found to be 9.3 ± 2.4 Å and 14.8 ± 1.7 Å, respectively. Transition dipole strengths were calculated for 2-PheCN, 3-PheCN, and 4-PheCN using the extinction coefficients and the cross-section of the FTIR spectrum (Eq. 1)

$$|\mu_{tot}|^{2} = 9.18\times10^{-3}\int \frac{\varepsilon(\omega)}{\omega}\,d\omega \qquad \text{(Eq. 1)}$$

where μ is the transition dipole strength and ε is the extinction coefficient at each frequency ω.32 The transition dipole strengths for 2-PheCN, 3-PheCN, and 4-PheCN in water were determined to be 0.23 ± 0.08, 0.27 ± 0.07, and 0.39 ± 0.02 D, respectively (Table 2.2). A transition dipole coupling model (Eq. 2)

$$\beta_{ij} = \frac{1}{4\pi\varepsilon_0}\left(\frac{\vec{\mu}_i\cdot\vec{\mu}_j}{|\vec{r}_{ij}|^{3}} - \frac{3(\vec{\mu}_i\cdot\vec{r}_{ij})(\vec{\mu}_j\cdot\vec{r}_{ij})}{|\vec{r}_{ij}|^{5}}\right) \qquad \text{(Eq. 2)}$$

where βij is the coupling constant, μi and μj are the transition dipoles (with magnitudes given by the transition dipole strengths), and rij is the vector between the probes (with magnitude equal to the distance between them), was used to calculate the amount of coupling between the nitrile probes in the 4-mer and 19-mer peptides. The coupling constant for the 4-mer and the 19-mer was calculated to be -1.408 and -0.348 cm⁻¹, respectively.
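The integral that defines the transition dipole strength (Eq. 1 of this chapter) can be approximated from a sampled spectrum with the trapezoid rule. The sketch below assumes ε in M⁻¹cm⁻¹ and ω in cm⁻¹, the usual convention for this prefactor, which gives |μ|² in D²; the function name and the flat test band are hypothetical and do not correspond to a measured spectrum.

```python
def transition_dipole_squared(freqs_cm1, eps):
    """Trapezoid-rule estimate of 9.18e-3 * integral of eps(w)/w dw.

    freqs_cm1: increasing frequencies (cm^-1); eps: extinction coefficients
    (M^-1 cm^-1) sampled at those frequencies. Returns |mu|^2 (assumed D^2).
    """
    if len(freqs_cm1) != len(eps):
        raise ValueError("freqs_cm1 and eps must have the same length")
    total = 0.0
    for k in range(len(freqs_cm1) - 1):
        w0, w1 = freqs_cm1[k], freqs_cm1[k + 1]
        y0, y1 = eps[k] / w0, eps[k + 1] / w1
        total += 0.5 * (y0 + y1) * (w1 - w0)
    return 9.18e-3 * total

# Hypothetical band chosen so that eps(w)/w = 1 over a 10 cm^-1 window,
# which makes the integral exactly 10.
freqs = [2000.0, 2005.0, 2010.0]
mu_sq = transition_dipole_squared(freqs, freqs)
```

Taking the square root of the result gives the transition dipole strength that enters the coupling model.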

2.5 Discussion

In FTIR, if coupling exists, one transition will move to a higher frequency while the other moves to a lower frequency. There can also be a degree of intensity borrowing between the transitions. The FTIR spectrum of the 4-mer peptide (Figure 2.2, top) exhibits a peak centered at 2232.1 cm-1. Two transitions should be present under this single peak; however, FTIR does not have high enough resolution to resolve them. To support the expectation that there should be two transitions, the spectra of the peptide and of the individual 2-PheCN and 3-PheCN amino acids were fit with Gaussian curves, and the amplitudes of the individual amino acid curves were varied until the sum of the two curves matched the Gaussian curve of the peptide. The summed curve with the best fit was composed of the 2-PheCN curve with an amplitude equal to the full height of the peptide curve and the 3-PheCN curve with an amplitude of nearly zero (Figure 2.3). This indicates that the major contribution to the peptide lineshape is from the 2-PheCN residue. This major contribution from a single species could indicate a high degree of intensity borrowing, which could result in a larger value for the coupling constant of the system. The same type of analysis was performed for the 19-residue peptide (Figure 2.4). This figure shows that the best Gaussian fit for the linear spectrum occurs when the intensities of the 2-PheCN and 3-PheCN curves are nearly equal. Nearly equal intensities imply that there is very little or no intensity sharing between the transitions, so it is likely that the coupling constant between the nitrile probes in this peptide is very small or equal to zero.
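The amplitude-matching procedure described above can be sketched as a two-parameter least-squares problem in which the band centers and widths are held fixed and only the amplitudes vary. The band parameters below are the water values from Table 2.1; the synthetic "peptide" spectrum, its amplitudes, and the function names are hypothetical and serve only to show that the procedure recovers known amplitudes. This is not the fitting routine used in the thesis.

```python
import math


def gaussian(x, center, fwhm):
    """Unit-height Gaussian band with the given center and FWHM (cm^-1)."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def fit_amplitudes(xs, ys, band_a, band_b):
    """Least-squares amplitudes (a, b) so that ys ~ a*G_a(x) + b*G_b(x).

    Solves the 2x2 normal equations directly; band_a and band_b are
    (center, fwhm) tuples that stay fixed during the fit.
    """
    ga = [gaussian(x, *band_a) for x in xs]
    gb = [gaussian(x, *band_b) for x in xs]
    saa = sum(u * u for u in ga)
    sbb = sum(v * v for v in gb)
    sab = sum(u * v for u, v in zip(ga, gb))
    say = sum(u * y for u, y in zip(ga, ys))
    sby = sum(v * y for v, y in zip(gb, ys))
    det = saa * sbb - sab * sab
    a = (say * sbb - sab * sby) / det
    b = (saa * sby - sab * say) / det
    return a, b


BAND_2PHECN = (2232.0, 11.31)  # center, FWHM in water (Table 2.1)
BAND_3PHECN = (2238.0, 11.08)  # center, FWHM in water (Table 2.1)

# Hypothetical spectrum dominated by the 2-PheCN band.
xs = [2200.0 + 0.5 * i for i in range(161)]  # 2200 to 2280 cm^-1
ys = [0.010 * gaussian(x, *BAND_2PHECN) + 0.001 * gaussian(x, *BAND_3PHECN)
      for x in xs]

amp_2, amp_3 = fit_amplitudes(xs, ys, BAND_2PHECN, BAND_3PHECN)
```

Because the two bands are only about 6 cm-1 apart and each is about 11 cm-1 wide, they overlap strongly, which is consistent with the statement that FTIR alone cannot resolve the two transitions.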

This qualitative analysis, which uses the ratios of the lineshapes to gauge the magnitude of coupling, was compared with the results of the quantitative analysis that uses the distances between the nitrile probes and the transition dipole strengths (Eq. 2). The coupling strength calculated for the four residue peptide (2-PheCN–Pro2–3-PheCN) was -1.408 cm-1, and that for the nineteen residue peptide (2-PheCN–Pro17–3-PheCN) was -0.348 cm-1. These two model peptides represent a near beginning and an end for the potential molecular ruler that will be built out. Ideally, 2D IR spectra will be collected for all of the peptides that are synthesized as part of this molecular ruler.

Coupling strengths under 1 cm-1 are difficult to observe with this technique, so a plot was created showing the coupling constant as a function of the length of the peptide (Figure 2.5). From this figure, it was determined that the maximum length of a peptide that can have observable coupling using a 2D IR technique is ~10 Å using 2-PheCN and 3-PheCN as infrared probes. Since the length of the four residue peptide (2-PheCN–Pro2–3-PheCN) was determined to be 9.3 ± 2.4 Å by MD simulation, this peptide may actually be close to the end of the ruler instead of near the beginning, and the nineteen residue peptide (2-PheCN–Pro17–3-PheCN), with a length of 14.8 ± 1.7 Å by MD simulation, is likely out of the observable range.
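The ~10 Å limit can be checked with a short calculation. The sketch below assumes that the relative orientation of the two dipoles is held fixed, so that the coupling magnitude scales only as r^-3, and it anchors that scaling to the distances and coupling constants reported above. The function name and the fixed-orientation assumption are introduced here for illustration and are not taken from the thesis.

```python
def max_observable_distance(r_ref, beta_ref, threshold=1.0):
    """Distance (Å) at which |beta| falls to `threshold` (cm^-1).

    Assumes |beta| scales as r^-3 from a reference point (r_ref, beta_ref),
    i.e. the relative orientation of the dipoles does not change.
    """
    return r_ref * (abs(beta_ref) / threshold) ** (1.0 / 3.0)


# Reference points reported in the text: (distance in Å, coupling in cm^-1).
r_max_from_4mer = max_observable_distance(9.3, -1.408)
r_max_from_19mer = max_observable_distance(14.8, -0.348)
```

Both reference points give a cutoff of roughly 10.4 Å for a 1 cm-1 threshold, in line with the ~10 Å limit read from Figure 2.5.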


| | Freq. in water (cm-1) | Freq. in THF (cm-1) | Bandwidth in water (cm-1) | Bandwidth in THF (cm-1) |
|---|---|---|---|---|
| 2-PheCN | 2232 | 2226 | 11.31 | 7.29 |
| 3-PheCN | 2238 | 2230 | 11.08 | 6.39 |
| 4-PheCN | 2237 | 2228 | 10.44 | 5.67 |

Table 2.1 Frequencies and bandwidths of cyanophenylalanine derivatives in water and tetrahydrofuran (THF)


| | Extinction Coefficient (M-1 cm-1) | Transition Dipole Strength (D) |
|---|---|---|
| 2-PheCN | 90.6 ± 16 | 0.23 ± 0.08 |
| 3-PheCN | 136.6 ± 9 | 0.27 ± 0.07 |
| 4-PheCN | 200 (ref. 39) | 0.39 ± 0.02 |

Table 2.2 Extinction coefficients and transition dipole strengths of cyanophenylalanine derivatives


| | Extinction Coefficient (M-1 cm-1) | Transition Dipole Moment (D) | References |
|---|---|---|---|
| p-cyanophenylalanine | ~220 (a) | ~0.20 | 39 |
| Azidohomoalanine | 350 – 400 (b) | ~0.40 | 40 |
| C – D | 5 – 10 | ---- | 10 |
| Cyanoindole derivatives | 70 – 140 (a) | 0.21 – 0.26 | 45 |
| 5-Cyanotryptophan | 168 ± 23 (a) | ~0.22 | 46 |
| Methyl Thiocyanate | 300 (c) | ---- | 47 |

a. Water; b. D2O; c. 2-methyltetrahydrofuran

Table 2.3 Extinction coefficients and transition dipole moment strengths of select infrared probes


Figure 2.1 Linear infrared spectra of cyanophenylalanine derivatives

FTIR spectra of 2-PheCN (blue), 3-PheCN (red), and 4-PheCN (green) in water (top) and THF (bottom), plotted as normalized OD versus wavenumbers (cm-1)


Figure 2.2 FTIR spectra of short peptides in water

Linear infrared spectra of peptides with the sequences 2-PheCN–Pro2–3-PheCN (top) and 2-PheCN–Pro17–3-PheCN (bottom) in water, plotted as optical density versus wavenumbers (cm-1) from 2200 to 2280 cm-1


Figure 2.3 FTIR spectrum of a four residue peptide that has been fitted with two different ratios of Gaussian curves

The FTIR spectrum of the four residue peptide 2-PheCN–Pro2–3-PheCN has been fit with two different ratios of the Gaussian curves for the individual 2-PheCN (red) and 3-PheCN (green) probes in order to determine which sum of the curves (black) fits the experimental spectrum (blue) the best.

Figure 2.4 FTIR spectrum of the nineteen residue peptide that has been fitted with four different ratios of Gaussian curves

The FTIR spectrum of the nineteen residue peptide 2-PheCN–Pro17–3-PheCN has been fit with four different ratios of the Gaussian curves for the individual 2-PheCN (red) and 3-PheCN (green) probes in order to determine which sum of the curves (black) fits the experimental spectrum (blue) the best.


Figure 2.5 Coupling constant with respect to distance

Plot of how the coupling constant changes with respect to the distance between the vibrational probes according to transition dipole coupling theory


Chapter 3 Comparing Glycine to Proline as a Spacer in Short Peptides

3.1 Introduction

In the previous chapter, small probes such as cyanophenylalanine were characterized using FTIR to determine experimental coupling strengths and to calculate the distance over which coupling can be observed. It was determined that cyanophenylalanine derivatives could only measure a distance of 10 Å or less, but what if different probes with stronger transition dipole moments were used? Or what if two probes were placed in a real peptide system and coupling peaks could be seen in a 2D IR spectrum, but the distance between them was not known? In either case it would be beneficial to have a map relating distance, angle, and coupling constant already available. The purpose of this study is to construct this map so that these three aspects of peptide motion can be determined from a 2D IR spectrum.

This project will be carried out by first determining peptide lengths using molecular dynamics simulations, then determining the angles associated with each peptide by calculating phi/psi angles and plotting them in Ramachandran plots, and finally calculating the coupling strengths using Transition Dipole Coupling Theory along with the previously determined peptide lengths and angles. Transition Dipole Coupling Theory can be used when there is weak coupling and short distances between the dipoles, but since it only takes into account electrostatic effects, it is not very accurate when the interaction is primarily through-bond instead of through-space.48,49,50

In this study proline and glycine chains were used in six different starting conformations (α-helix, 3₁₀-helix, π-helix, β-sheet, straight chain, and either poly-proline or poly-glycine helix), since the actual relaxed conformation of these peptides in solution is unknown and using several starting points should remove as much starting-conformation bias as possible. Proline and glycine were used in these peptides in order to determine which amino acid creates a more stable peptide conformation in water and which amino acid would be better to use in a synthesized peptide. Phenylalanine was used in this study so that the results could be compared with the experimental results from the previous chapter.

3.2 Computational Methods

Simulations were carried out using the nanoscale molecular dynamics (NAMD) software with the CHARMM22 (ref. 42) and CHARMM36 (refs. 42–44) force fields for proteins. A series of poly-proline and poly-glycine peptides were constructed with the sequences Phe–ProX–Phe and Phe–GlyX–Phe, where X = 0 – 10. Six starting conformations were investigated for each peptide: straight chain (linear), α-helix, β-sheet, 3₁₀-helix, π-helix, and either poly-proline (PII) or poly-glycine helix depending on the polypeptide. The cubic boxes contained between 426 and 2753 water molecules (TIP3P) depending on the peptide starting conformation and the cube size (25 – 45 Å). Each peptide was centered in the cube with a minimum distance of ~3 Å from the walls of the cube. The atoms of the backbone were restrained to one of the six starting conformations at the start of each simulation, but were allowed to relax over the course of the simulation. Periodic boundary conditions were used to ensure that the peptide did not leave the water box over the course of the simulation. Simulations were carried out at a constant temperature of 298 K and a constant pressure of 1 atm. Each system was equilibrated for 2 ps before trajectory data were collected in a file for further analysis. The total simulation time for each peptide in each conformation was 10 ns with time steps of 2 fs and a trajectory frame written to a file every 250 time steps, resulting in 20000 frames for each simulation. A control simulation of 100 ns was performed on the peptides with 10 proline and 10 glycine residues to ensure that the results were consistent for long time periods.
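The bookkeeping behind these numbers can be verified with a few lines of arithmetic. The sketch below uses only the values quoted in this section; the variable names are introduced here for illustration. It also counts the distinct peptides implied by X = 0 – 10 for the two spacers, on the assumption that the X = 0 case is the same Phe–Phe peptide for both.

```python
SIM_TIME_FS = 10 * 1_000_000  # 10 ns expressed in femtoseconds
TIME_STEP_FS = 2              # integration time step
SAVE_EVERY = 250              # steps between saved frames
N_START_CONFORMATIONS = 6     # starting conformations per peptide

steps = SIM_TIME_FS // TIME_STEP_FS                   # integration steps per run
frames_per_run = steps // SAVE_EVERY                  # saved frames per run
frames_per_peptide = frames_per_run * N_START_CONFORMATIONS

# Phe-(spacer)X-Phe for X = 0..10; a set removes the duplicate X = 0 peptide.
peptides = {
    ("Phe",) + (spacer,) * x + ("Phe",)
    for spacer in ("Pro", "Gly")
    for x in range(0, 11)
}
n_peptides = len(peptides)
```

The run length works out to 5,000,000 steps and 20000 saved frames per simulation, or 120000 frames per peptide across the six starting conformations, which matches the ~120000 dihedral angles per peptide and the 21 peptides quoted later in this chapter.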

3.2.1 Calculating Peptide Length and Dihedral Angles

Peptide length was determined on the basis that non-native nitrile derivatives of phenylalanine (2-PheCN, 3-PheCN, and 4-PheCN) would be used in peptide synthesis during an experiment. Hence, nine different lengths of each peptide could be determined depending on which combination of PheCN derivatives was used. The possible combinations are: 2-PheCN–2-PheCN, 2-PheCN–3-PheCN, 2-PheCN–4-PheCN, 3-PheCN–2-PheCN, 3-PheCN–3-PheCN, 3-PheCN–4-PheCN, 4-PheCN–2-PheCN, 4-PheCN–3-PheCN, and 4-PheCN–4-PheCN, where the first designation of each pair is the phenylalanine on the N-terminus of the peptide and the second designation is the phenylalanine on the C-terminus. The lengths of each peptide were determined using the visual molecular dynamics (VMD)41 software by creating a ‘bond’ between the carbon atoms where the nitrile probes would be placed in the synthesized peptides (Figure 3.1). These ‘bonds’ were then tracked over every trajectory frame of each 10 ns simulation, and length distributions were constructed (histograms of peptide length vs the number of times that length occurs over the course of the 10 ns simulation, with a bin size of 0.5 Å). The ensemble average length of each peptide was determined by plotting the length distributions in Origin8 Pro and fitting each one to a Gaussian curve. The length of each peptide was taken as the center position of the Gaussian curve ± the standard deviation of all of the frames that made up that length distribution.
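The binning and summary steps described above can be sketched as follows. The distances here are synthetic (drawn from a normal distribution with a hypothetical mean and width), and the sample mean is used as a stand-in for the fitted Gaussian center, which is an assumption; the thesis performs the Gaussian fit on the histogram in Origin. The function and variable names are likewise illustrative.

```python
import math
import random
import statistics
from collections import Counter
from itertools import product

BIN_WIDTH = 0.5  # Å, the bin size quoted in the text

# Nine ordered (N-terminus, C-terminus) probe placements.
POSITIONS = ("2-PheCN", "3-PheCN", "4-PheCN")
probe_pairs = list(product(POSITIONS, repeat=2))


def length_histogram(distances, bin_width=BIN_WIDTH):
    """Map the left edge of each bin (Å) to the number of frames in that bin."""
    counts = Counter(math.floor(d / bin_width) for d in distances)
    return {index * bin_width: counts[index] for index in sorted(counts)}


def center_and_spread(distances):
    """Sample mean and population standard deviation over all frames."""
    return statistics.fmean(distances), statistics.pstdev(distances)


# Synthetic stand-in for one 10 ns simulation (20000 frames); the mean and
# width below are hypothetical, not values from the thesis.
random.seed(0)
distances = [random.gauss(12.0, 2.0) for _ in range(20000)]
histogram = length_histogram(distances)
center, spread = center_and_spread(distances)
```

For a single-peaked distribution the sample mean and the fitted Gaussian center should nearly coincide; for the two-peaked distributions discussed in the Results they can differ, which is one reason a fit to the histogram may be preferred.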

The dihedral angles within the backbone of each of the simulated peptides were measured as phi and psi angles, where phi is the dihedral angle of C1–N–Cαʹ–C1ʹ and psi is the dihedral angle of N–Cαʹ–C1ʹ–Nʹ (Figure I.1); the dihedral angle between the nitrile probes is shown in Figure 3.2. These pairs of phi and psi angles were used to construct Ramachandran plots to elucidate peptide structure in solution.
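For readers unfamiliar with how a dihedral angle is obtained from four atomic positions, a minimal sketch is given below. It uses the standard atan2 formulation based on the three bond vectors; the function names and the test coordinates are illustrative assumptions, and this is not the routine used by VMD or in the thesis.

```python
import math


def sub(a, b):
    """Component-wise difference a - b of two 3-vectors."""
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points.

    For phi the points would be C1, N, C-alpha', C1'; for psi they would be
    N, C-alpha', C1', N'. The result lies in the range [-180, 180].
    """
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Illustrative geometries built around a central bond along z.
cis = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
quarter = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```

Applying such a function to every saved frame gives the (phi, psi) pairs that populate a Ramachandran plot.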

3.3 Results

Examples of the distance distributions that were constructed are shown in Figure 3.3. This figure shows the difference between a shorter peptide (3 proline residues between the phenylalanine residues) and a longer peptide (10 proline residues between the phenylalanine residues). In this case, both of the distributions were constructed from simulations on peptides that were started in the poly-proline helix conformation, and there is little difference in the shape of the distributions besides where the ensemble average lies. The same is true for the peptides with the poly-glycine helices. However, the distributions for some of the other starting conformations look much different. For example, the distributions constructed from these example peptides in the starting conformation have two preferred conformations, except for the peptide with the three proline residues, as evidenced by the two peaks exhibited in the distributions (Figure 3.4). The distributions for the π-helix conformations show the same pattern. Even though only three of the starting conformations are shown in Figure 3.4, the distributions for all of the starting conformations combined are also shown. It is seen that all of the distributions have more than one preferred conformation in solution.

Since the combination of the six starting conformations was expected to result in a single non-biased ending conformation for each of the peptides being studied and it did not, it is possible that the 10 ns simulation time was insufficient to sample all of the conformations in solution and settle on a single equilibrium distance. This possibility was tested by extending the simulation time from 10 ns to 100 ns for the two longest peptides, starting from the configuration that is the farthest away from equilibrium, the straight chain. It is seen from Figure 3.4 that the distance distributions constructed after 10 ns of simulation time each exhibit a main peak and a prominent shoulder or another small peak. After the simulation had been extended to 100 ns, the distribution for the poly-proline peptide exhibits a single Gaussian curve, whereas the distribution for the poly-glycine peptide still contains a prominent shoulder (Figure 3.5). From this result it can be concluded that 100 ns or less is sufficient for the poly-proline peptide to relax in solution, but it is likely not sufficient for the poly-glycine peptide.

Another possibility results from the flexibility of the peptide backbone. Since proline contains a five-membered ring that is attached to the peptide backbone, its structure is very rigid, but glycine, having no functional groups to help stabilize it, is very flexible. This flexibility, or lack thereof, can be assessed qualitatively when the dihedral angles are calculated over the course of the simulation and plotted in a Ramachandran plot. An example plot is shown for a short peptide (3 proline or glycine residues) and a longer one (10 proline or glycine residues) with the normal Ramachandran regions (left) and the poly-proline or poly-glycine specific regions (right) (Figure 3.6). It is seen from Figure 3.6 that these poly-proline and poly-glycine peptides do not have conformations that fit within the normal Ramachandran regions, but instead conform to the specific regions. It is also seen that the poly-glycine plot has six specific regions, whereas there are only three allowed regions for the poly-proline peptides. The main phi/psi angles could be used to interpret the overall structure of the peptide along the peptide backbone, but they could not be used to determine the angle between the individual probes.

In order to determine the angle between the probes in all of the peptides, the dihedral angle between H1-C1 and HX-CX was measured (where 1 is the phenylalanine residue on the N-terminus of the peptide and X is the phenylalanine on the C-terminus of the peptide), corresponding to nitrile probes placed in the para position on each phenylalanine residue. This position was used to determine the angle because the 4-PheCN amino acid is one of the more popular non-natural amino acids compared to 2-PheCN and 3-PheCN. For each of the 21 peptides that were simulated, dihedral angles from all six of the starting conformations were used, resulting in ~120000 different dihedral angles for each peptide. The ensemble average of these dihedral angles was obtained by fitting a Gaussian curve to the resulting distributions and calculating the standard deviation to be used as the upper and lower limit of the angle. Using the equilibrium length of each peptide and the preferred angles between the dipoles, the coupling constant can be calculated for each peptide. Figure 3.7 shows how the distance between the probes affects the coupling constant (top left), how the angle between the probes affects the coupling constant (top right), and the map of all three variables (bottom). It is seen from Figure 3.7 that the distance between the probes has a significantly larger effect on the magnitude of the coupling constant, even though the standard deviation in the distance is rather small (between 2 and 6 Å), whereas the angle between the dipoles has a small effect even when the standard deviation in the angle ranges from 49 – 104 degrees.

3.4 Conclusions

In this study, peptide behavior in solution was investigated by constructing a series of small peptides and modeling them using molecular dynamics simulations. This was done to determine the length of the peptides in solution, the angle between any potential probes that could be attached in an experimental system, the structure of the peptide in solution, and finally to construct a map of the distance and angle with respect to the vibrational coupling constant.

It was found that a 10 ns simulation time may not be sufficient to sample all of the possible structures in solution and that even 100 ns might not be enough if glycine is used in abundance within the peptide. There is also very little difference in the length of the peptide when the probe is placed at a different position on the phenylalanine rings. The lengths of the peptides in solution ranged from 5 – 30 Å with standard deviations from 2 – 6 Å. The angle between the probes was determined to be variable for each peptide, with standard deviations ranging from 49 – 104 degrees. The general orientation of the peptide backbones was determined using Ramachandran plots. These plots showed that the proline-rich peptides have a more consistent structure in solution, as evidenced by having only three allowed angle regions, whereas glycine-rich peptides have six. Finally, the map of angle, distance, and coupling constant was constructed for this system of 21 peptides and a single type of nitrile probe. In order to make this a useful map for reference during experiments, significantly more peptides need to be sampled using a variety of spacers and probes in order to determine possible patterns similar to those in Ramachandran plots.


Figure 3.1 Depiction of the ‘bond’ that was used to determine the length of the peptides in the MD simulations


Figure 3.2 Dihedral angles between the nitrile probes

The dihedral angles that were measured between the probes of each peptide in this study were used to determine the ensemble average angle between the dipoles


Figure 3.3 Distance distributions for short and long peptides

These distributions are shown as an example of a short (3 proline residue) peptide and a long (10 proline residue) peptide. All of the other distributions that were constructed are shown in Appendix 2.


Figure 3.4 Distance distributions from multiple starting conformations

Distance distributions for three of the six different starting conformations for an example short peptide (3 proline or glycine residues) and long peptide (10 proline or glycine residues) along with the distance distributions for all six of the starting conformations combined.


Figure 3.5 Distance distributions after 10 ns and 100 ns

Distance distributions of the two longest peptides (each with 10 proline or glycine residues) after 10 ns of simulation time and the distance distributions of the same two peptides after 100 ns of simulation time.


Figure 3.6 Ramachandran plots

Ramachandran plots of the example short peptides (3 proline or glycine residues) and the long peptides (10 proline or glycine residues) with the normal regions (left column) and the proline or glycine specific regions (right column)

Figure 3.7 Distance, angle, and coupling constant maps

A map of how the coupling constant changes with both the distance between the probes (peptide length) and the angle between the dipoles (dihedral angle) (upper left), along with individual maps of how the coupling constant changes with just the distance (upper right) or just the angle between the probes (bottom). These maps were constructed using only the 21 peptides that were simulated in the current study.


Chapter 4 Fourier Transform Infrared Spectroscopy of Plants

4.1 Introduction

In the previous chapters, infrared spectroscopy was used to characterize and analyze small molecules in peptide systems that were synthesized in a lab and suspended in an ideal solution such as water, but FTIR can also be used on other samples such as plants, paints, or powders of unknown composition.51,52 When doing research in the field, it would be ideal to have a hand-held device that analyzes samples for spectral fingerprints quickly and efficiently, without requiring any sample preparation in a laboratory and without destroying the sample in the process. One possible technique that could fulfill all of these requirements is attenuated total reflectance Fourier Transform infrared (ATR-FTIR) spectroscopy. In this study, ATR-FTIR is used to analyze plant leaves from a common desert plant of the genus Astragalus to determine whether differences in the linear spectra, such as peak position and intensity, can be detected between species using each species' unique infrared signature, and whether ATR-FTIR is an appropriate method for use in the field as a hand-held device.

4.2 Experimental

Materials and Methods

Eleven samples of Astragalus were obtained with unknown species designations in order to maintain an unbiased double-blind study using the ATR-FTIR technique. ATR-FTIR spectra were collected from 650 – 4000 cm-1 with a resolution of 4 cm-1. The aperture was 98 μm and 100 averages were collected per scan with five scans per sample. Five different sets of spectra were collected. These sets of spectra represent the various ways that a sample can be prepared in a laboratory setting so that the instrument function can be observed under a variety of conditions. The first set was the unprepared leaf of each sample. These samples were pressed against the ATR-FTIR crystal with 6 pounds of pressure using the pressure head on the apparatus. These leaves were then placed in one milliliter of methanol and allowed to extract overnight. The second set of ATR-FTIR spectra was collected using the methanol solutions. These liquid samples were air-dried onto the ATR-FTIR crystal for 20 minutes to ensure the total evaporation of the methanol on the crystal. The third set of spectra was collected using the leaves that had been extracted by the methanol. These leaves were subjected to the same 6 pounds of pressure as the unprepared leaves. Lastly, a new set of unprepared leaves was stirred in 1 mL of hexane for 10 seconds to remove the wax layer on the leaves. Pieces of these leaves were extracted with 1 mL of methanol and the rest of the leaves were left whole. The last two sets of ATR-FTIR spectra were collected on the de-waxed leaves and the resulting methanol extraction solution.

ATR Theory

In ATR spectroscopy, the infrared beam enters a trapezoidal crystal at such an angle that the beam reflects off the internal surface of the crystal at least once, but preferably more than once. At each reflection the electromagnetic wave (evanescent wave) penetrates the sample to a depth depending on the frequency of light that is used (Figure 4.1).51,52 The beam, which has been attenuated wherever the sample absorbs the evanescent wave, then exits the crystal and carries this information to the detector. There are two main types of instruments, single-bounce and multiple-bounce. In a single-bounce instrument there is only a single reflection before the light goes to the detector, whereas in a multiple-bounce instrument there are multiple reflections within the crystal (and therefore more interactions with the sample). One advantage of this technique is the ability to measure spectra in aqueous media, since the pathlength is highly reproducible, which makes it possible to subtract a water background accurately.51
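The frequency dependence of the penetration depth can be illustrated with the standard expression d_p = λ / (2π n1 sqrt(sin²θ − (n2/n1)²)), where n1 and n2 are the refractive indices of the crystal and the sample and θ is the angle of incidence. The sketch below is only an illustration: the crystal material, sample refractive index, and incidence angle are not given in the text, so the values used (2.4, 1.5, and 45 degrees) are hypothetical, as is the function name.

```python
import math


def penetration_depth_um(wavenumber_cm, n_crystal, n_sample, angle_deg):
    """Evanescent-wave penetration depth in micrometers.

    wavenumber_cm: light frequency in cm^-1 (wavelength in um = 1e4 / value).
    n_crystal, n_sample: refractive indices of the crystal and the sample.
    angle_deg: angle of incidence inside the crystal, in degrees.
    """
    wavelength_um = 1.0e4 / wavenumber_cm
    term = math.sin(math.radians(angle_deg)) ** 2 - (n_sample / n_crystal) ** 2
    if term <= 0.0:
        raise ValueError("angle is below the critical angle; no total reflection")
    return wavelength_um / (2.0 * math.pi * n_crystal * math.sqrt(term))


# Hypothetical setup: crystal n = 2.4, sample n = 1.5, 45 degree incidence.
depth_at_1000 = penetration_depth_um(1000.0, 2.4, 1.5, 45.0)
depth_at_4000 = penetration_depth_um(4000.0, 2.4, 1.5, 45.0)
```

Under these assumptions the depth is on the order of a couple of micrometers at 1000 cm-1 and four times smaller at 4000 cm-1, so bands at lower wavenumber sample more of the material than bands at higher wavenumber.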

4.3 Results and Discussion

The eleven samples of Astragalus serve as a start for piecing together a library of FTIR spectra of this genus. This library can then be used in a hand-held field device that can identify plant species. In all of the spectra collected for the unprepared leaves (Figure 4.2) it is seen that there are several key spectral features that do not vary much between the species. These features include a large peak centered between 1000 and 1100 cm-1, a small peak at ~1170 cm-1 that appears as a small notch or shoulder attached to the main peak, another small peak at ~1260 cm-1, a larger peak at ~1390 cm-1, and two more distinct peaks occurring at ~1600 and ~1730 cm-1. Many possible groups could give rise to the large peak occurring between 1000 and 1100 cm-1. Some of these are O–H and C–OH stretching (1035 cm-1)53,54 and C–O stretching (1064 cm-1)53,54 found in the polysaccharides that are present in the cell walls, and stretches from β-galactan (1073 cm-1).53 The peak around 1170 cm-1 could potentially be assigned to cellulose (1145 cm-1).53 The peak at ~1260 cm-1 could be assigned to either the C=N or N–H stretching from peptides (1235 cm-1)53 or the carboxyl group of pectin substances (1250 cm-1).53,55 The peak at ~1390 cm-1 could be assigned to the OH bending modes of the polysaccharides found in the cell walls (1415 cm-1).53 The peak at ~1600 cm-1 can be assigned to the C=O aromatic stretching from lignin (1601 cm-1),53 and the peak at ~1730 cm-1 can be assigned to the C=O stretch from phospholipids, cholesterol esters, hemicellulose, or pectin (1733 cm-1).53
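The assignments above amount to matching each observed peak against a short list of literature band positions. A minimal sketch of such a lookup is shown below, using only the band positions quoted in this section. The 30 cm-1 tolerance, the dictionary, and the function name are hypothetical choices made for illustration, and this is not the procedure used in the thesis.

```python
# Literature band positions (cm^-1) and assignments quoted in the text.
REFERENCE_BANDS = {
    1035: "O-H and C-OH stretching (cell-wall polysaccharides)",
    1064: "C-O stretching (cell-wall polysaccharides)",
    1073: "beta-galactan",
    1145: "cellulose",
    1235: "C=N or N-H stretching (peptides)",
    1250: "carboxyl group of pectin substances",
    1415: "OH bending (cell-wall polysaccharides)",
    1601: "C=O aromatic stretching (lignin)",
    1733: "C=O stretch (phospholipids, cholesterol esters, hemicellulose, pectin)",
}


def candidate_assignments(peak_cm, tolerance_cm=30.0):
    """Reference bands within `tolerance_cm` of an observed peak, nearest first.

    The default tolerance is an arbitrary illustrative value.
    """
    matches = [
        (position, label)
        for position, label in REFERENCE_BANDS.items()
        if abs(position - peak_cm) <= tolerance_cm
    ]
    return sorted(matches, key=lambda item: abs(item[0] - peak_cm))


observed_peaks = [1170, 1260, 1390, 1600, 1730]
assignments = {peak: candidate_assignments(peak) for peak in observed_peaks}
```

With this tolerance the ~1260 cm-1 peak returns both the pectin and the peptide bands, mirroring the ambiguity noted in the text, while each of the other listed peaks returns a single candidate.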

When the wax was removed from the leaves using hexane, all of the peaks still occurred in the same approximate places, but some were more defined (Figure 4.3). All of these same features are present in the methanol extractions as well as the extracted leaves (Figures 4.4 – 4.6), which possibly means that the components that are contained within the leaves (excluding components such as cell wall polysaccharides) are being fully extracted by the methanol. Another option is that the components exist in each of the species in different concentrations. In order to test this possibility, hole-punches (~2 mm in diameter) of each of the unprepared leaves were placed in one milliliter of methanol and allowed to extract overnight. The resulting methanol solutions were clear with a very faint green pigment. These solutions were not concentrated enough for their ATR-FTIR spectra (data not shown) to give peaks that were resolved from the baseline.

4.4 Conclusion

The purpose of this project was to determine whether a technique such as ATR-FTIR spectroscopy could be used to distinguish between species of a plant and so serve as a field identification tool. It was seen that there were six different peaks that can mainly be assigned to vibrations from the sugars in the cell walls of the plants. It was also seen that there was a difference in the intensities and positions of some of these peaks among the Astragalus species. Computational analysis of these Astragalus samples, as well as many others, needs to be completed to create a library of the unique spectra of this genus.


Figure 4.1 Depiction of the multiple “bounce” behavior that occurs in the crystal of an ATR instrument


Figure 4.2 ATR-FTIR spectra of Astragalus leaves (unprepared)

ATR-FTIR spectra of all eleven Astragalus sample leaves that have not been prepared in any way in the laboratory (i.e. did not undergo the de-waxing procedure with hexane)


Figure 4.3 ATR-FTIR spectra of Astragalus leaves (without wax coating)

ATR-FTIR spectra of the eleven Astragalus leaves after they have been treated with hexane in order to remove the wax layer on the surface of the leaves


Figure 4.4 ATR-FTIR spectra of the methanol extraction of the unprepared Astragalus leaves

ATR-FTIR spectra of the dried contents of the methanol solutions after the eleven Astragalus samples had been extracted into them. These leaves had not been treated with hexane and therefore still had the wax present on the surface of the leaves.


Figure 4.5 ATR-FTIR spectra of the methanol extraction of the de-waxed Astragalus leaves

ATR-FTIR spectra of the dried contents of the methanol solutions after the eleven Astragalus samples had been extracted into them. These leaves were treated with hexane and therefore had the wax removed from the surface of the leaves.


Figure 4.6 ATR-FTIR spectra of the Astragalus leaves that were extracted with methanol (unprepared)

ATR-FTIR spectra of all eleven Astragalus sample leaves after they had been extracted in the methanol. These leaves had not been treated with the hexane and therefore still had the wax on the surface of the leaves.


Summary and Future Goals

The overall goal of these studies was to develop non-natural amino acids as probes for determining peptide structure. This goal was accomplished by first characterizing nitrile derivatives of phenylalanine and placing these amino acids into small peptide systems. Then, a ‘molecular ruler’ was built by simulating a series of peptides that could establish a measure of how the angle between the dipoles and the distance between the probes could influence the coupling constant. Ultimately, by using the maps of how distance and angle affect the coupling constant, a molecular ‘movie’ can be made of any peptide in solution given its unique set of 2D IR spectra over time.


References

(1) Fodje, M. N.; Al-Karadaghi, S. Occurrence, Conformational Features and Amino Acid Propensities for the π-Helix. Protein Eng. Des. Sel. 2002, 15 (5), 353–358.

(2) Lee, K. H.; Benson, D. R.; Kuczera, K. Transitions from α to π Helix Observed in Molecular Dynamics Simulations of Synthetic Peptides. 2000, 39 (45), 13737–13747.

(3) Chou, P. Y.; Fasman, G. D. Conformational Parameters for Amino Acids in Helical, β-Sheet, and Random Coil Regions Calculated from Proteins. Biochemistry 1974, 13 (2), 211–222.

(4) Cowan, P. M.; McGavin, S.; North, A. C. T. The Polypeptide Chain Configuration of Collagen. Nature 1955, 176 (4492), 1062–1064.

(5) Ramachandran, G. N.; Ramakrishnan, C.; Sasisekharan, V. Stereochemistry of Polypeptide Chain Configurations. J. Mol. Biol. 1963, 7 (1), 95–99.

(6) Richardson, J. S.; Keedy, D. A.; Richardson, D. C. “The Plot” Thickens: More Data, More Dimensions, More Uses. In Biomolecular Forms and Functions; 2013; pp 46–61.

(7) Lovell, S. C.; Davis, I. W.; Arendall, W. B.; de Bakker, P. I. W.; Word, J. M.; Prisant, M. G.; Richardson, J. S.; Richardson, D. C. Structure Validation by C Alpha Geometry: Phi,psi and C Beta Deviation. Proteins-Structure Funct. Genet. 2003, 50 (August 2002), 437–450.

(8) Yu, W.; Dawson, P. E.; Zimmermann, J.; Romesberg, F. E. Carbon-Deuterium Bonds as Probes of Protein Thermal Unfolding. J. Phys. Chem. B 2012, 116 (22), 6397–6403.

(9) Adhikary, R.; Zimmermann, J.; Dawson, P. E.; Romesberg, F. E. IR Probes of Protein Microenvironments: Utility and Potential for Perturbation. ChemPhysChem 2014, 15 (5), 849–853.

(10) Zimmermann, J.; Thielges, M. C.; Yu, W.; Dawson, P. E.; Romesberg, F. E. Carbon-Deuterium Bonds as Site-Specific and Nonperturbative Probes for Time-Resolved Studies of Protein Dynamics and Folding. J. Phys. Chem. Lett. 2011, 2 (5), 412–416.

(11) Ma, J.; Pazos, I. M.; Zhang, W.; Culik, R. M.; Gai, F. Site-Specific Infrared Probes of Proteins. Annu. Rev. Phys. Chem. 2015, 66 (December 2014), 357–377.

(12) Baiz, C. R.; McRobbie, P. L.; Anna, J. M.; Geva, E.; Kubarych, K. J. Two-Dimensional Infrared Spectroscopy of Metal Carbonyls. Acc. Chem. Res. 2009, 42 (9), 1395–1404.

(13) Tucker, M. J.; Gai, X. S.; Fenlon, E. E.; Brewer, S. H.; Hochstrasser, R. M. 2D IR Photon Echo of Azido-Probes for Biomolecular Dynamics. Phys. Chem. Chem. Phys. 2011, 13 (6), 2237–2241.

(14) Tokmakoff, A. Two-Dimensional Line Shapes Derived from Coherent Third-Order Nonlinear Spectroscopy. J. Phys. Chem. A 2000, 104 (18), 4247–4255.

(15) Hamm, P.; Lim, M.; DeGrado, W. F.; Hochstrasser, R. M. The Two-Dimensional IR Nonlinear Spectroscopy of a Cyclic Penta-Peptide in Relation to Its Three-Dimensional Structure. Proc. Natl. Acad. Sci. U. S. A. 1999, 96 (5), 2036–2041.

(16) Zanni, M. T.; Ge, N.-H.; Kim, Y. S.; Hochstrasser, R. M. Two-Dimensional IR Spectroscopy Can Be Designed to Eliminate the Diagonal Peaks and Expose Only the Crosspeaks Needed for Structure Determination. Proc. Natl. Acad. Sci. U. S. A. 2001, 98 (20), 11265–11270.

(17) Asbury, J. B.; Steinel, T.; Kwak, K.; Corcelli, S. A.; Lawrence, C. P.; Skinner, J. L.; Fayer, M. D. Dynamics of Water Probed with Vibrational Echo Correlation Spectroscopy. J. Chem. Phys. 2004, 121 (24), 12431–12446.

(18) Dunkelberger, E. B.; Grechko, M.; Zanni, M. T. Transition Dipoles from 1D and 2D Infrared Spectroscopy Help Reveal the Secondary Structures of Proteins: Application to Amyloids. J. Phys. Chem. B 2015, 119 (44), 14065–14075.

(19) Shim, S. H.; Strasfeld, D. B.; Ling, Y. L.; Zanni, M. T. Automated 2D IR Spectroscopy Using a Mid-IR Pulse Shaper and Application of This Technology to the Human Islet Amyloid Polypeptide. Proc. Natl. Acad. Sci. U. S. A. 2007, 104 (36), 14197–14202.

(20) Middleton, C. T.; Woys, A. M.; Mukherjee, S. S.; Zanni, M. T. Residue-Specific

Structural Kinetics of Proteins through the Union of Isotope Labeling, Mid-IR Pulse

Shaping, and Coherent 2D IR Spectroscopy. Methods 2010, 52 (1), 12–22.

(21) Ghosh, A.; Hochstrasser, R. M. A Peptide’s Perspective of Water Dynamics. Chem. Phys.

2011, 390 (1), 1–13.

(22) Kim, Y. S.; Hochstrasser, R. M. Applications of 2D IR Spectroscopy to Peptides, Proteins,

and Hydrogen-Bond Dynamics. J. Phys. Chem. B 2009, 113 (24), 8231–8251.

(23) Tucker, M. J.; Kim, Y. S.; Hochstrasser, R. M. 2D IR Photon Echo Study of the

Anharmonic Coupling in the OCN Region of Phenyl Cyanate. Chem. Phys. Lett. 2009,

470 (1-3), 80–84. 58

(24) Grechko, M.; Zanni, M. T. Quantification of Transition Dipole Strengths Using 1D and

2D Spectroscopy for the Identification of Molecular Structures via Exciton Delocalization:

Application to α-Helices. J. Chem. Phys. 2012, 137 (18).

(25) Dunkelberger, E. B.; Woys, A. M.; Zanni, M. T. 2D IR Cross Peaks Reveal Hydrogen-

Deuterium Exchange with Single Residue Specificity. J. Phys. Chem. B 2013, 117 (49),

15297–15305.

(26) Kim, Y. S.; Hochstrasser, R. M. Chemical Exchange 2D IR of Hydrogen-Bond Making

and Breaking. Proc. Natl. Acad. Sci. U. S. A. 2005, 102 (32), 11185–11190.

(27) Ghosh, A.; Remorino, A.; Tucker, M. J.; Hochstrasser, R. M. 2D IR Photon Echo

Spectroscopy Reveals Hydrogen Bond Dynamics of Aromatic Nitriles. Chem. Phys. Lett.

2009, 469 (4-6), 325–330.

(28) Kim, Y. S.; Wang, J.; Hochstrasser, R. M. Two-Dimensional Infrared Spectroscopy of the

Alanine Dipeptide in Aqueous Solution. J. Phys. Chem. B 2005, 109 (15), 7511–7521.

(29) Baiz, C. R.; McRobbie, P. L.; Anna, J. M.; Geva, E.; Kubarych, K. J. Two-Dimensional

Infrared Spectroscopy of Metal Carbonyls. Acc. Chem. Res. 2009, 42 (9), 1395–1404.

(30) Hamm, P.; Zanni, M. T. Concepts and Methods of 2D Infrared SPectroscopy; Cambridge

University Press, 2011.

(31) Sun, Z.; Zhang, W.; Ji, M.; Hartsock, R.; Gaffney, K. J. Contact Ion Pair Formation

between Hard Acids and Soft Bases in Aqueous Solutions Observed with 2DIR

Spectroscopy. J. Phys. Chem. B 2013, 117 (49), 15306–15312.

(32) Zheng, M. L.; Zheng, D. C.; Wang, J. Non-Native Side Chain IR Probe in Peptides: Ab 59

Initio Computation and ID and 2D IR Spectral Simulation. J. Phys. Chem. B 2010, 114

(6), 2327–2336.

(33) Hamm, P.; Lim, M.; Hochstrasser, R. M. Structure of the Amide I Band of Peptides

Measured by Femtosecond Nonlinear-Infrared Spectroscopy. J. Phys. Chem. B 1998, 102

(98), 6123–6138.

(34) Ghosh, A.; Ostrander, J. S.; Zanni, M. T. Watching Proteins Wiggle: Mapping Structures

with Two-Dimensional Infrared Spectroscopy. Chem. Rev. 2017, acs.chemrev.6b00582.

(35) Shim, S.-H.; Zanni, M. T. How to Turn Your Pump-Probe Instrument into a

Multidimensional Spectrometer: 2D IR and Vis Spectroscopies via Pulse Shaping. Phys.

Chem. Chem. Phys. 2009, 11 (5), 748–761.

(36) Shim, S.-H.; Strasfeld, D. B.; Zanni, M. T. Generation and Characterization of Phase and

Amplitude Shaped Femtosecond Mid-IR Pulses. Opt. Express 2006, 14 (26), 13120–

13130.

(37) Middleton, C. T.; Strasfeld, D. B.; Zanni, M. T. Polarization Shaping in the Mid-IR and

Polarization-Based Balanced Heterodyne Detection with Application to 2D IR

Spectroscopy. Opt. Express 2009, 17 (17), 14526–14533.

(38) Ghosh, A.; Serrano, A. L.; Oudenhoven, T. A.; Ostrander, J. S.; Eklund, E. C.; Blair, A.

F.; Zanni, M. T. Experimental Implementations of 2D IR Spectroscopy through a

Horizontal Pulse Shaper Design and a Focal Plane Array Detector. Opt. Lett. 2016, 41 (3),

524–527.

(39) Getahun, Z.; Huang, C. Y.; Wang, T.; De Leon, B.; DeGrado, W. F.; Gai, F. Using Nitrile- 60

Derivatized Amino Acids as Infrared Probes of Local Environment. J. Am. Chem. Soc.

2003, 125 (2), 405–411.

(40) Bloem, R.; Koziol, K.; Waldauer, S. A.; Buchli, B.; Walser, R.; Samatanga, B.; Jelesarov,

I.; Hamm, P. Ligand Binding Studied by 2D IR Spectroscopy Using the

Azidohomoalanine Label. J. Phys. Chem. B 2012, 116 (46), 13705–13712.

(41) Humphrey, W.; Dalke, A.; Schulten, K. VMD: Visual Molecular Dynamics. J. Mol.

Graph. 1996, 14 (1), 33–38.

(42) MacKerell, A. D.; Bashford, D.; Bellott, M.; Dunbrack, R. L.; Evanseck, J. D.; Field, M.

J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; Joseph-McCarthy, D.; Kuchnir, L.; Kuczera, K.;

Lau, F. T. K.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D. T.; Prodhom, B.; Reiher, W.

E., III; Roux, B.; Schlenkrich, M.; Smith, J. C.; Stote, R.; Straub, J.; Watanabe, M.;

Wiorkiewicz-Kuczera, J.; Yin, D.; Karplus, M. All-Atom Empirical Potential for

Molecular Modeling and Dynamics Studies of Proteins. J. Phys. Chem. B 1998, 102 (18),

3586–3616.

(43) Best, R. B.; Zhu, X.; Shim, J.; Lopes, P. E. M.; Mittal, J.; Feig, M.; MacKerell, A. D.

Optimization of the Additive CHARMM All-Atom Protein Force Field Targeting

Improved Sampling of the Backbone φ, ψ and Side-Chain χ1and χ2 Dihedral Angles. J.

Chem. Theory Comput. 2012, 8 (9), 3257–3273.

(44) MacKerell, A. D.; Feig, M.; Brooks, C. L. Improved Treatment of the Protein Backbone in

Empirical Force Fields. J. Am. Chem. Soc. 2004, 126 (3), 698–699.

(45) Chalyavi, F.; Gilmartin, P. H.; Schmitz, A. J.; Fennie, M. W.; Tucker, M. J. Synthesis of 61

5-Cyano-Tryptophan as a Two-Dimensional Infrared Spectroscopic Reporter of Structure.

Angew. Chemie - Int. Ed. 2018, 57, 7528–7532.

(46) Waegele, M. M.; Tucker, M. J.; Gai, F. 5-Cyanotryptophan as an Infrared Probe of Local

Hydration Status of Proteins. Chem. Phys. Lett. 2010, 478 (4), 249–253.

(47) Suydam, I. T.; Boxer, S. G. Vibrational Stark Effects Calibrate the Sensitivity of

Vibrational Probes for Electric Fields in Proteins. Biochemistry 2003, 42 (41), 12050–

12055.

(48) Moran, A.; Mukamel, S. The Origin of Vibrational Mode Couplings in Various Secondary

Structural Motifs of Polypeptides. Proc. Natl. Acad. Sci. 2004, 101 (2), 506–510.

(49) Torii, H.; Tasumi, M. Ab Initio Molecular Orbital Study of the Amide I Vibrational

Interactions between the Peptide Groups in Di- and Tripeptides and Considerations on the

Conformation of the Extended Helix. J. Raman Spectrosc. 1998, 29 (1), 81–86.

(50) Torii, H.; Tasumi, M. Model Calculations on the Amide-I Infrared Bands of Globular

Proteins. J. Chem. Phys. 1992, 96 (5), 3379–3387.

(51) Hind, A. R.; Bhargava, S. K.; McKinnon, A. At the Solid/liquid Interface: FTIR/ATR -

The Tool of Choice. Adv. Colloid Interface Sci. 2001, 93 (1-3), 91–114.

(52) Chittur, K. K. FTIR/ATR for Protein Adsorption to Biomaterial Surfaces. Biomaterials

1998, 19 (4-5), 357–369.

(53) Gorgulu, S. T.; Dogan, M.; Severcan, F. The Characterization and Differentiation of

Higher Plants by Fourier Transform Infrared Spectroscopy. Appl. Spectrosc. 2007, 61 (3),

300–308. 62

(54) Yan, H.; Xie, Y.; Sun, S.; Sun, X.; Ren, F.; Shi, Q.; Wang, S.; Zhang, W.; Li, X.; Zhang,

J. Chemical Analysis of Astragalus Mongholicus Polysaccharides and Antioxidant

Activity of the Polysaccharides. Carbohydr. Polym. 2010, 82 (3), 636–640.

(55) M.P., F. Practical Infrared Spectroscopy of Pectic Substances. Food Hydrocoll. 1992, 6

(1), 115–142.


Appendix A: MatLab/VMD Codes

VMD Code

set mol [mol new "monom.psf" waitfor all]
mol addfile "monom.dcd" molid $mol waitfor all
set fp [open "phi-psi.dat" w]
set sel [atomselect $mol "alpha"]
set n [molinfo $mol get numframes]
for {set i 0} {$i < $n} {incr i} {
    $sel frame $i
    $sel update
    puts $fp "#frame:$i"
    set a [$sel num]
    for {set j 0} {$j < $a} {incr j} {
        puts $fp "[expr {$j + 1}] [lindex [$sel get {resname phi psi}] $j]"
    }
}
$sel delete
close $fp

The first two lines load the psf and dcd files into VMD; the file names shown are placeholders for the actual file names. The third line opens a text file called phi-psi.dat for writing, which can later be viewed in Notepad. The fourth line selects the alpha carbons of the peptide backbone, one per residue, and the fifth counts the number of trajectory frames contained in the dcd file. The nested for loop retrieves and writes the phi and psi angles of each residue for every frame, and the last two lines delete the atom selection and close the file. Before this code was applied systematically to all of the peptides in this project, a test peptide that should have had a poly-glycine helix formation (F7GF) was chosen to check that the code worked. Below is a screenshot of the Tk console in VMD and a piece of the Notepad file that was produced. The code worked for an eight-residue peptide and is in the process of being applied to all of the peptides in this project.
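For readers who want to post-process the output outside of VMD, the following is a minimal Python sketch of reading a file in the layout the script above appears to write: a `#frame:i` marker followed by one `index resname phi psi` line per residue. The function name `read_phi_psi` and the sample text are hypothetical and only illustrate the parsing; the real output file should be inspected before relying on this layout.

```python
from collections import defaultdict


def read_phi_psi(lines):
    """Group (phi, psi) pairs by residue index from phi-psi.dat style lines.

    Assumed layout: '#frame:N' marks a new frame, and every other
    non-empty line is 'index resname phi psi'.
    """
    angles = defaultdict(list)  # residue index -> list of (phi, psi)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and frame markers
        index, _resname, phi, psi = line.split()
        angles[int(index)].append((float(phi), float(psi)))
    return dict(angles)


# Hypothetical sample text, not data from this work
sample = """#frame:0
1 PHE 0.0 150.2
2 GLY -75.1 160.4
#frame:1
1 PHE 0.0 148.9
2 GLY -80.3 155.0
"""
angles = read_phi_psi(sample.splitlines())
print(angles[2])
```

Grouping by residue index makes it straightforward to plot one residue at a time or to pool all residues into a single Ramachandran plot.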


MatLab Codes

%Creating Ramachandran Plots from NAMD Simulations in VMD
data = load('phi-psi_F10PF_straight.txt');
phi = data(:,1); %extracts the first column from the txt file
psi = data(:,2); %extracts the second column from the txt file
data2 = load('phi-psi_F10PF_3-10.txt');
phi2 = data2(:,1);
psi2 = data2(:,2);
data3 = load('phi-psi_F10PF_alpha.txt');
phi3 = data3(:,1);
psi3 = data3(:,2);
data4 = load('phi-psi_F10PF_beta.txt');
phi4 = data4(:,1);
psi4 = data4(:,2);
data5 = load('phi-psi_F10PF_pi.txt');
phi5 = data5(:,1);
psi5 = data5(:,2);
data6 = load('phi-psi_F10PF_pp2.txt');
phi6 = data6(:,1);
psi6 = data6(:,2);
sz = 2; %specifies the size of the circle markers, smaller number = smaller circle
scatter(phi,psi,sz,'k','filled') %creates a scatter plot with filled circle markers
hold on %keeps everything that follows on the same axes
scatter(phi2,psi2,sz,'k','filled')
scatter(phi3,psi3,sz,'k','filled')
scatter(phi4,psi4,sz,'k','filled')
scatter(phi5,psi5,sz,'k','filled')
scatter(phi6,psi6,sz,'k','filled')
xlim([-180,180]) %changes the x-axis
ylim([-180,180]) %changes the y-axis
xlabel('phi')
ylabel('psi')
box on %adds a border around the plot
x1=0; x2=0; y1=180; y2=-180;
plot([x1,x2],[y1,y2],'k') %adds a vertical black line
x3=-180; x4=180; y3=0; y4=0;
plot([x3,x4],[y3,y4],'k') %adds a horizontal black line

%%%%Plots glycine regions of the Ramachandran plot
%e=2;
%Glycine_Regions_code;
%plot(X1,Y1,'r','linewidth',e)
%plot(X2,Y2,'r','linewidth',e)
%plot(X3,Y3,'r','linewidth',e)
%plot(X4,Y4,'r','linewidth',e)
%plot(X5,Y5,'r','linewidth',e)
%plot(X6,Y6,'r','linewidth',e)
%plot(X7,Y7,'b','linewidth',e)
%plot(X8,Y8,'b','linewidth',e)
%plot(X9,Y9,'b','linewidth',e)
%plot(X10,Y10,'b','linewidth',e)

%%%%Plots proline regions of the Ramachandran plot
d=2;
Proline_Regions_code;
hold on
plot(X1,Y1,'r','linewidth',d)
plot(X2,Y2,'r','linewidth',d)
plot(X3,Y3,'r','linewidth',d)
plot(X4,Y4,'r','linewidth',d)
plot(X5,Y5,'b','linewidth',d)
plot(X6,Y6,'b','linewidth',d)

%%%%Plots regular regions of the Ramachandran plot
%c=2;
%Regular_Ramachandran_Regions_code;
%plot(X1,Y1,'r','linewidth',c)
%plot(X2,Y2,'r','linewidth',c)
%plot(X3,Y3,'r','linewidth',c)
%plot(X4,Y4,'b','linewidth',c)
%plot(X5,Y5,'b','linewidth',c)
%plot(X6,Y6,'b','linewidth',c)
%plot(X7,Y7,'b','linewidth',c)
%plot(X8,Y8,'b','linewidth',c)
%plot(X9,Y9,'b','linewidth',c)
%plot(X10,Y10,'b','linewidth',c)

%Plotting the Ramachandran regions in the background (Glycine Regions)
Gly_regions = load('Glycine_Regions.txt');
%each region outline is a block of rows; column 1 is the x-coordinate, column 2 the y-coordinate
X1 = Gly_regions(1:27,1);     Y1 = Gly_regions(1:27,2);
X2 = Gly_regions(28:86,1);    Y2 = Gly_regions(28:86,2);
X3 = Gly_regions(87:121,1);   Y3 = Gly_regions(87:121,2);
X4 = Gly_regions(122:154,1);  Y4 = Gly_regions(122:154,2);
X5 = Gly_regions(155:213,1);  Y5 = Gly_regions(155:213,2);
X6 = Gly_regions(214:240,1);  Y6 = Gly_regions(214:240,2);
X7 = Gly_regions(241:308,1);  Y7 = Gly_regions(241:308,2);
X8 = Gly_regions(309:364,1);  Y8 = Gly_regions(309:364,2);
X9 = Gly_regions(365:422,1);  Y9 = Gly_regions(365:422,2);
X10 = Gly_regions(423:486,1); Y10 = Gly_regions(423:486,2);

%Plotting the Ramachandran regions in the background (Proline Regions)
Pro_regions = load('Proline_Regions.txt');
%each region outline is a block of rows; column 1 is the x-coordinate, column 2 the y-coordinate
X1 = Pro_regions(1:28,1);    Y1 = Pro_regions(1:28,2);
X2 = Pro_regions(29:36,1);   Y2 = Pro_regions(29:36,2);
X3 = Pro_regions(37:55,1);   Y3 = Pro_regions(37:55,2);
X4 = Pro_regions(56:98,1);   Y4 = Pro_regions(56:98,2);
X5 = Pro_regions(99:174,1);  Y5 = Pro_regions(99:174,2);
X6 = Pro_regions(175:192,1); Y6 = Pro_regions(175:192,2);

%Plotting the Ramachandran regions in the background (Regular Regions)
Ramachandran_regions = load('Ramachandran_Regions.txt');
%each region outline is a block of rows; column 1 is the x-coordinate, column 2 the y-coordinate
X1 = Ramachandran_regions(1:16,1);     Y1 = Ramachandran_regions(1:16,2);
X2 = Ramachandran_regions(17:40,1);    Y2 = Ramachandran_regions(17:40,2);
X3 = Ramachandran_regions(41:147,1);   Y3 = Ramachandran_regions(41:147,2);
X4 = Ramachandran_regions(148:176,1);  Y4 = Ramachandran_regions(148:176,2);
X5 = Ramachandran_regions(177:253,1);  Y5 = Ramachandran_regions(177:253,2);
X6 = Ramachandran_regions(254:320,1);  Y6 = Ramachandran_regions(254:320,2);
X7 = Ramachandran_regions(321:349,1);  Y7 = Ramachandran_regions(321:349,2);
X8 = Ramachandran_regions(350:359,1);  Y8 = Ramachandran_regions(350:359,2);
X9 = Ramachandran_regions(360:368,1);  Y9 = Ramachandran_regions(360:368,2);
X10 = Ramachandran_regions(369:377,1); Y10 = Ramachandran_regions(369:377,2);

Coupling constant prediction

clear
b=-5:0.1:5; %boundaries of the coupling constant with defined step size (cm-1)
n=numel(b); %number of iterations depending on step size and limits
mat=zeros(n,2); %preallocates one row of eigenvalues per coupling constant
for a=1:n
    H=[2232 b(a);b(a) 2238]; %Hamiltonian of the two coupled oscillators
    EH=eig(H)'; %gives the eigenvalues in a row vector
    mat(a,:)=EH;
end
plot(b,mat,'k')
xlabel('coupling constant (cm-1)')
ylabel('frequency (cm-1)')
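Because the Hamiltonian in this script is only 2 x 2, its eigenvalues also have a closed form: the mean of the two site frequencies plus or minus the square root of the squared half-gap plus the squared coupling. The Python sketch below illustrates this for the same site frequencies; the function name `exciton_frequencies` is hypothetical and the code is an illustration, not part of the analysis performed in this work.

```python
import math


def exciton_frequencies(w1, w2, beta):
    """Eigenvalues of [[w1, beta], [beta, w2]] (all in cm-1), lowest first."""
    mean = (w1 + w2) / 2.0
    half_gap = (w1 - w2) / 2.0
    split = math.sqrt(half_gap ** 2 + beta ** 2)
    return mean - split, mean + split


for beta in (-5.0, 0.0, 4.0):
    low, high = exciton_frequencies(2232.0, 2238.0, beta)
    print(f"beta = {beta:5.1f} cm-1 -> {low:.3f}, {high:.3f} cm-1")
```

The closed form shows directly that the splitting depends only on the magnitude of the coupling constant, not on its sign.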

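Coupling constant determination

clear
mu1=0.06059*3.3356e-30; %transition dipole 1, converted from debye to C*m
mu2=0.08439*3.3356e-30; %transition dipole 2, converted from debye to C*m
epsilon=1.112650056e-10; %4*pi*epsilon0
mu1mu2=mu1.*mu2;
r=14.76e-10; %distance between the dipoles in m
r3=r^3;
r5=r^5;
rmu1=r.*mu1;
rmu2=r.*mu2;
betaJ=(1/epsilon)*((mu1mu2/r3)-((3*rmu1*rmu2)/r5)); %coupling constant in J
betacm=betaJ*5.03411e22 %coupling constant converted from J to cm-1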
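The script above treats the dipoles and the distance as scalars, which corresponds to two dipoles lying along the line that joins them. As a hedged illustration of the general case, the Python sketch below evaluates the standard point-dipole (transition dipole coupling) expression with full vectors, using the same conversion factors as the script. The function names are hypothetical, and the vector orientations in the example are assumptions chosen only to reproduce the collinear case.

```python
import math

DEBYE = 3.3356e-30               # C*m per debye, as used in the script above
FOUR_PI_EPS0 = 1.112650056e-10   # 4*pi*epsilon0 in C^2 J^-1 m^-1
J_TO_WAVENUMBER = 5.03411e22     # cm-1 per joule


def dot(a, b):
    """Dot product of two 3-component sequences."""
    return sum(x * y for x, y in zip(a, b))


def dipole_coupling(mu1, mu2, r_vec):
    """Point-dipole coupling in cm-1.

    mu1, mu2: transition dipole vectors in debye.
    r_vec: vector joining the two dipoles, in metres.
    """
    m1 = [x * DEBYE for x in mu1]
    m2 = [x * DEBYE for x in mu2]
    r = math.sqrt(dot(r_vec, r_vec))
    beta_joule = (dot(m1, m2) / r ** 3
                  - 3.0 * dot(m1, r_vec) * dot(m2, r_vec) / r ** 5) / FOUR_PI_EPS0
    return beta_joule * J_TO_WAVENUMBER


# Assumed geometry: both dipoles along z, separated by 14.76 angstrom along z
beta = dipole_coupling((0.0, 0.0, 0.06059), (0.0, 0.0, 0.08439), (0.0, 0.0, 14.76e-10))
print(f"{beta:.4f} cm-1")
```

With both dipoles perpendicular to the joining vector instead, the second term vanishes and the coupling changes sign, which is why the angle has to be mapped alongside the distance.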

Determining eigenvalues from Hamiltonian

clear
format long %displays the values with 15 digits after the decimal point
B=-1.1462; %beginning coupling constant
for A=1:5
    H=[2232 B;B 2238];
    EH=eig(H)'; %gives eigenvalues of the matrix H in a row vector
    P=poly(H) %gives the coefficients of the characteristic polynomial of H
    [V,D]=eig(H,'nobalance') %gives eigenvectors and a diagonalized matrix with eigenvalues
    B=B+0.001;
    mat(A,:)=EH; %places all of the eigenvalues into a single matrix
    V1=V(:,1)'; %places the first column of V into a vector
    V2=V(:,2)'; %places the second column of V into a vector
    V12=V1.*V1; %V1^2
    V22=V2.*V2; %V2^2
    sumV12=sum(V12); %sum of squares
    sumV22=sum(V22); %sum of squares
    magV1=sqrt(sumV12) %gives the magnitude of the vector V1
    magV2=sqrt(sumV22) %gives the magnitude of the vector V2
end

Coupling constant with angle

clear
mu1=0.081675457*3.3356e-30; %transition dipole 1, converted from debye to C*m
mu2=0.069331014*3.3356e-30; %transition dipole 2, converted from debye to C*m
epsilon=1.112650056e-10; %4*pi*epsilon0
mu1mu2=mu1*mu2;
r=11.5230975e-10; %distance between the dipoles in m
r3=r^3;
r5=r^5;
rmu1=r*mu1;
rmu2=r*mu2;
theta1=0;
theta2=0;
for a=1:41
    costheta1=cosd(theta1); %cosd takes the angle in degrees, matching the axis label
    costheta2=cosd(theta2);
    betaJ=(1/epsilon)*((mu1mu2/r3)-((3*rmu1*costheta1*rmu2*costheta2)/r5));
    betacm=betaJ*5.03411e22;
    mat(:,a)=betacm;
    theta1=theta1+0.25;
    theta2=theta2+0.25;
end
angle=0:0.25:10;
plot(angle,mat)
xlabel('angle (deg)')
ylabel('coupling constant (cm-1)')

Determine eigenvectors

clear
format long
B=-1;
for A=1:100
    H=[2232 B;B 2238];
    EH=eig(H)'; %gives the eigenvalues
    mat(A,:)=EH; %places the eigenvalues into a matrix
    lambda1=mat(A,1); %chooses eigenvalue at row A, column 1
    lambda2=mat(A,2); %chooses eigenvalue at row A, column 2
    syms x y
    sol = solve((2232-lambda1)*x + B*y == 0, x);
    x=sol;
    y=subs(x,y,1); %assumes y=1 and solves for eigenvector
    vec=[1/y,1]'; %eigenvector associated with lambda1
    matV1(:,A)=vec;
    syms m n
    solu = solve((2232-lambda2)*m +B*n == 0, n);
    n=solu;
    m=subs(n,m,1); %assumes m=1 and solves for eigenvector
    vec2=[1,1/m]'; %eigenvector associated with lambda2
    matV2(:,A)=vec2;
    V1=matV1(:,A); %picks all rows of the column A as a vector in matV1
    V2=matV2(:,A); %picks all rows of the column A as a vector in matV2
    V12=V1.*V1; %V1^2
    V22=V2.*V2; %V2^2
    sumV12=sum(V12); %sum of the squares
    sumV22=sum(V22); %sum of the squares
    magV1=sqrt(sumV12); %magnitude of the vector V1
    magV2=sqrt(sumV22); %magnitude of the vector V2
    matmagV1(A,:)=double(magV1); %converts the symbolic result to a number for plotting
    matmagV2(A,:)=double(magV2);
    B=B+0.01;
end
q=A;
b=B-(q*0.01):0.01:B-0.01;
plot(b,matmagV1,b,matmagV2)
xlabel('coupling constant (cm-1)')
ylabel('magnitude of eigenvector')

Autocorrelation

clear
filename = 'FF_2F2F_distance_histograms.xlsx';
sheet = 'All';
xlrange = 'B1:B50';
y = xlsread(filename,sheet,xlrange);
N = length(y);
plot(y)
xlim([0,N])
figure
autocorr(y)
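To make explicit what an autocorrelation plot of this kind reports, the Python sketch below computes the usual sample autocorrelation, in which the lagged products of the mean-subtracted series are normalized by the lag-zero sum. This is an illustrative estimator under that assumption and is not claimed to match the toolbox function in every detail; the function name `autocorrelation` and the distance values are hypothetical.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation of a sequence for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    deviations = [y - mean for y in series]
    variance = sum(d * d for d in deviations)
    return [
        sum(deviations[t] * deviations[t + lag] for t in range(n - lag)) / variance
        for lag in range(max_lag + 1)
    ]


# Hypothetical distance trace (angstrom); not data from this work
distances = [10.1, 10.4, 10.9, 11.2, 10.8, 10.3, 9.9, 10.0, 10.6, 11.0]
acf = autocorrelation(distances, 3)
print([round(value, 3) for value in acf])
```

A value near 1 at small lags means neighbouring points in the series are strongly correlated, while values near 0 indicate that the series has lost memory of its starting value.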

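Distance from coupling

clear
mu1=0.22832*3.3356e-30; %transition dipole 1, converted from debye to C*m
mu2=0.273125*3.3356e-30; %transition dipole 2, converted from debye to C*m
epsilon=1.112650056e-10; %4*pi*epsilon0
epsilon2=1/epsilon;
beta=1.4/5.03411e22; %coupling constant converted from cm-1 to J
%coefficients of the cubic a*r^3 + b*r^2 + c*r + d = 0
a=epsilon*beta;
b=3*mu1*mu2;
c=0;
d=-(mu1*mu2);
%closed-form root of the cubic
s=(-b^3/(27*a^3))+((b*c)/(6*a^2))-(d/(2*a));
t=(c/(3*a))-(b^2/(9*a^2));
w=b/(3*a);
r=(s+(s^2+t^3)^(1/2))^(1/3)+(s-(s^2+t^3)^(1/2))^(1/3)-w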
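As a simple cross-check on a distance obtained from a coupling constant, the point-dipole expression can be inverted directly in the special case of two collinear dipoles, where the coupling reduces to -2*mu1*mu2/(4*pi*epsilon0*r^3). The Python sketch below is not the cubic solution used in the script above; it assumes the collinear geometry, and the function names are hypothetical.

```python
DEBYE = 3.3356e-30               # C*m per debye
FOUR_PI_EPS0 = 1.112650056e-10   # 4*pi*epsilon0 in C^2 J^-1 m^-1
J_TO_WAVENUMBER = 5.03411e22     # cm-1 per joule


def collinear_coupling(mu1_debye, mu2_debye, r_metres):
    """Coupling (cm-1) of two collinear point dipoles."""
    mu1mu2 = (mu1_debye * DEBYE) * (mu2_debye * DEBYE)
    return -2.0 * mu1mu2 / (FOUR_PI_EPS0 * r_metres ** 3) * J_TO_WAVENUMBER


def collinear_distance(mu1_debye, mu2_debye, beta_cm):
    """Invert collinear_coupling: distance (m) for a coupling magnitude (cm-1)."""
    mu1mu2 = (mu1_debye * DEBYE) * (mu2_debye * DEBYE)
    beta_joule = abs(beta_cm) / J_TO_WAVENUMBER
    return (2.0 * mu1mu2 / (FOUR_PI_EPS0 * beta_joule)) ** (1.0 / 3.0)


# Dipole strengths and coupling magnitude taken from the script above
r = collinear_distance(0.22832, 0.273125, 1.4)
print(f"r = {r * 1e10:.2f} angstrom")
```

Feeding the returned distance back into `collinear_coupling` should recover the original coupling magnitude, which is a quick way to confirm that the unit conversions are consistent.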

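Coupling constant vs distance

clear
mu1=0.386398*3.3356e-30; %transition dipole 1, converted from debye to C*m
mu2=0.304747*3.3356e-30; %transition dipole 2, converted from debye to C*m
epsilon=1.112650056e-10; %4*pi*epsilon0
mu1mu2=mu1.*mu2;
R=5e-10+0.01e-10*(0:1999); %distances in m, starting at 5 angstrom in 0.01 angstrom steps
B=zeros(1,numel(R)); %preallocates the coupling constants
for A=1:numel(R)
    r=R(A);
    r3=r^3;
    r5=r^5;
    rmu1=r.*mu1;
    rmu2=r.*mu2;
    betaJ=(1/epsilon)*((mu1mu2/r3)-((3*rmu1*rmu2)/r5));
    betacm=betaJ*5.03411e22; %coupling constant converted from J to cm-1
    B(:,A)=betacm;
end
plot(R*1e10,B) %converts the distance from m to angstrom for the x-axis
xlabel('distance (angstrom)')
ylabel('coupling constant (cm-1)')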


Appendix B: Supplementary Figures

Plots were then generated for all of the peptides with the sequences Phe–GlyX–Phe and Phe–ProX–Phe, where X = 0–10 (Figures 1–21). Figure 1 shows that the phi/psi angles calculated for the two-residue peptide Phe-Phe lie exclusively on the horizontal and vertical axes. This is the case for the phenylalanine residues in all of the peptides. Phi/psi angles do not occur off the axes until a single glycine or proline is introduced into the peptide.

Figures 2–11 show the Ramachandran plots for the peptides with the glycine spacers. Most of the phi/psi angles occur within the allowed and favorable regions of the glycine-specific plot, with only a few outliers outside of these regions. However, as the peptide becomes larger, the number of outliers increases. This pattern is not observed in the plots of the proline-rich peptides (Figures 12–21), which suggests that the proline-rich peptides have more stable structures than the glycine-rich peptides.
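The comparison of outliers above is qualitative. One possible way to quantify it, sketched below in Python, is to count the fraction of phi/psi points that fall outside every region outline using a ray-casting point-in-polygon test. This was not part of the analysis in this work; the function names, the rectangular region, and the sample points are all hypothetical and serve only to illustrate the idea.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the closed polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def outlier_fraction(points, regions):
    """Fraction of (phi, psi) points that fall in none of the regions."""
    outside = sum(
        1 for phi, psi in points
        if not any(point_in_polygon(phi, psi, region) for region in regions)
    )
    return outside / len(points)


# Hypothetical rectangular region and points, for illustration only
regions = [[(-100.0, 100.0), (-40.0, 100.0), (-40.0, 180.0), (-100.0, 180.0)]]
points = [(-75.0, 150.0), (-60.0, 140.0), (60.0, 45.0), (-120.0, 130.0)]
print(outlier_fraction(points, regions))
```

Applied to each peptide with the appropriate glycine-specific or proline-specific outlines, a number of this kind would allow the trend with peptide length to be compared directly.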

Figure 1. Ramachandran plot of the peptide with the sequence Phe-Phe

Figure 2. Ramachandran plots of the peptide with the sequence Phe–Gly–Phe with the regular defined regions (left) and the glycine specific regions (right)

Figure 3. Ramachandran plots of the peptide with the sequence Phe–Gly2–Phe with the regular defined regions (left) and the glycine specific regions (right)

Figure 4. Ramachandran plots of the peptide with the sequence Phe–Gly3–Phe with the regular defined regions (left) and the glycine specific regions (right)

Figure 5. Ramachandran plots of the peptide with the sequence Phe–Gly4–Phe with the regular defined regions (left) and the glycine specific regions (right)

Figure 6. Ramachandran plots of the peptide with the sequence Phe–Gly5–Phe with the regular defined regions (left) and the glycine specific regions (right)

Figure 7. Ramachandran plots of the peptide with the sequence Phe–Gly6–Phe with the regular defined regions (left) and the glycine specific regions (right)

Figure 8. Ramachandran plots of the peptide with the sequence Phe–Gly7–Phe with the regular defined regions (left) and the glycine specific regions (right)

Figure 9. Ramachandran plots of the peptide with the sequence Phe–Gly8–Phe with the regular defined regions (left) and the glycine specific regions (right)

Figure 10. Ramachandran plots of the peptide with the sequence Phe–Gly9–Phe with the regular defined regions (left) and the glycine specific regions (right)

Figure 11. Ramachandran plots of the peptide with the sequence Phe–Gly10–Phe with the regular defined regions (left) and the glycine specific regions (right)

Figure 12. Ramachandran plots of the peptide with the sequence Phe–Pro–Phe with the regular defined regions (left) and the proline specific regions (right)

Figure 13. Ramachandran plots of the peptide with the sequence Phe–Pro2–Phe with the regular defined regions (left) and the proline specific regions (right)

Figure 14. Ramachandran plots of the peptide with the sequence Phe–Pro3–Phe with the regular defined regions (left) and the proline specific regions (right)

Figure 15. Ramachandran plots of the peptide with the sequence Phe–Pro4–Phe with the regular defined regions (left) and the proline specific regions (right)

Figure 16. Ramachandran plots of the peptide with the sequence Phe–Pro5–Phe with the regular defined regions (left) and the proline specific regions (right)

Figure 17. Ramachandran plots of the peptide with the sequence Phe–Pro6–Phe with the regular defined regions (left) and the proline specific regions (right)

Figure 18. Ramachandran plots of the peptide with the sequence Phe–Pro7–Phe with the regular defined regions (left) and the proline specific regions (right)

Figure 19. Ramachandran plots of the peptide with the sequence Phe–Pro8–Phe with the regular defined regions (left) and the proline specific regions (right)

Figure 20. Ramachandran plots of the peptide with the sequence Phe–Pro9–Phe with the regular defined regions (left) and the proline specific regions (right)

Figure 21. Ramachandran plots of the peptide with the sequence Phe–Pro10–Phe with the regular defined regions (left) and the proline specific regions (right)