Structural and Biochemical Characterization of Members of the Cannabinoid Biosynthetic Pathway to Inform their Application in Synthetic Biology

A thesis submitted to the University of Manchester for the degree of Master of Philosophy in the Faculty of Science and Engineering

2017

Lewis J. Kearsey

School of Chemistry

Contents

List of tables ...... 4
List of figures ...... 4
List of abbreviations ...... 7
Abstract ...... 9
Declaration ...... 10
Copyright statement ...... 10
Acknowledgements ...... 11
1. Introduction ...... 12
1.1 Fundamentals of synthetic biology ...... 12
1.2 Uses of synthetic biology ...... 13
1.3 Microbial chassis used in synthetic biology ...... 13
1.4 Synthetic biology approaches ...... 14
2. Structural biology ...... 16
2.1 Principles of structural biology ...... 16
2.2 Crystallization ...... 16
2.3 X ray crystallography ...... 20
2.4 Macromolecule structure solving ...... 22
3. Cannabinoids ...... 25
3.1 The role of cannabinoids in plants ...... 25
3.2 Endocannabinoid system ...... 27
3.3 The therapeutic potential of cannabinoids ...... 30
4. Cannabinoid biosynthetic pathway ...... 36
4.1 Hexanoyl-CoA and malonyl-CoA ...... 37
4.2 Tetraketide synthase ...... 38
4.3 Olivetolic acid cyclase ...... 41
4.4 Aromatic prenyltransferase ...... 43
4.5 Tetrahydrocannabinolic acid synthase ...... 43
4.6 Cannabidiolic acid synthase ...... 45
4.7 THCA and CBDA decarboxylation ...... 47
5. Further work on the pathway ...... 48
6. Objectives of this study ...... 49
6.1 Expression of functional proteins ...... 49
6.2 Determining the structure of TKS ...... 49
6.3 Mutant design to improve efficiency of TKS ...... 50
7. Materials and methods ...... 51
7.1 Genes, vectors and expression strains ...... 51
7.2 Transformation and expression protocol ...... 51
7.3 Harvesting cells and protein purification ...... 52
7.4 SDS-PAGE ...... 53
7.5 Removal of protein tags ...... 53
7.6 Size exclusion chromatography ...... 53
7.7 TKS crystallization and structure determination ...... 54
7.8 Adding soluble tags to OAC ...... 55
7.9 OAC size exclusion chromatography ...... 57
7.10 Biotransformations ...... 57
7.11 Mass spectrometry of organic products ...... 58
7.12 TKS mutant design and activity testing ...... 58
8. Results and discussion ...... 61
8.1 TKS expression and purification ...... 61
8.2 OAC expression and purification ...... 66
8.3 Introduction of soluble tags to OAC ...... 68
8.4 Point mutations of TKS ...... 76
8.5 Liquid chromatography mass spectrometry of biotransformations ...... 82
8.6 TKS structure ...... 89
9. Conclusions ...... 105
10. Future work ...... 107
Bibliography ...... 108

Word count: 30,796

List of tables

Table 1 - X-ray data collection and refinement statistics…………………………………………..53

Table 2 - Primers for OAC amplification…………………………………………………………..…..54

Table 3 - HPLC conditions……………………………………………………………………..………...57

Table 4 - Primers to introduce TKS mutations……………………………………………………….58

List of figures

Figure 1.1 Sitting drop vapour diffusion well diagram……………………………………….……..20

Figure 1.2 Chemical structures of THCA and CBDA………………………………………………..25

Figure 1.3 Chemical structures of anandamide and 2-AG…………………………………………28

Figure 1.4 Endocannabinoid retrograde signaling pathway………………………………………29

Figure 1.5 Full cannabinoid biosynthetic pathway…………………………………………………..36

Figure 1.6 PKS iterative condensation reaction mechanism……………………………………...37

Figure 1.7 TKS catalyzed reaction……………………………………………………………………...40

Figure 1.8 TKS by-product reactions…………………………………………………………………..40

Figure 1.9 Ribbon representation of apo and substrate bound OAC……………………………41

Figure 1.10 Reaction mechanism of OAC…………………………………………………………….42

Figure 1.11 Prenylation reaction mediated by C. sativa APT……………………………………...43

Figure 1.12 Structure and active site of FAD bound THCAS………………………………………44

Figure 1.13 THCAS reaction mechanism……………………………………………………………...45

Figure 1.14 CBDAS and THCAS reaction mechanisms…………………………………………….46

Figure 2.1 SDS-PAGE of TKS IMAC purification…...………………………………………………..60

Figure 2.2 SDS-PAGE of TKS TEV digestion…………………………………………………………61

Figure 2.3 SDS-PAGE of TEV digested TKS reverse IMAC…………...…………………………...62

Figure 2.4 TKS size exclusion chromatography UV absorption………………………………….63

Figure 2.5 SDS-PAGE of fractions from TKS size exclusion chromatography………………...65

Figure 2.6 SDS-PAGE of BL21 cells expressed OAC IMAC purification………………………...65

Figure 2.7 SDS-PAGE of ArcticExpress expressed OAC IMAC purification……………………66

Figure 2.8 1% agarose gel of OAC gene amplification……………………………………………..67

Figure 2.9 1% agarose gel of double digested soluble tag vectors……………………………...68

Figure 2.10 SDS-PAGE of TRX tagged OAC IMAC purification…………………………………...69

Figure 2.11 SDS-PAGE of GST tagged OAC IMAC purification…………………………………...70

Figure 2.12 SDS-PAGE of NUS tagged OAC IMAC purification……………...…………………...71

Figure 2.13 SDS-PAGE of reverse IMAC of TEV digested GST tagged OAC…………………...72

Figure 2.14 OAC size exclusion chromatography UV absorption ……………………………….73

Figure 2.15 SDS-PAGE of fractions from OAC size exclusion chromatography………………75

Figure 2.16 1% agarose gel of TKS mutant PCR reactions………………………………………..75

Figure 2.17 SDS-PAGE of TKS IMAC variant purification………………………………………….79

Figure 3.1 LC/MS of TKS reaction controls…………………………………………………………..81

Figure 3.2 LC/MS of control reactions…………………………………………………………………83

Figure 3.3 LC/MS of TKS incubated with its substrates……………………………………………85

Figure 3.4 LC/MS of TKS and OAC incubated with their substrates……………………………..85

Figure 3.5 Comparison of GST tagged OAC and native GST LC/MS…………………………….86

Figure 3.6 LC/MS of TKS variants incubated with their substrates………………………………87

Figure 4.1 Crystals used in X ray crystallography of TKS…………………………………………88

Figure 4.2 Ribbon representation of apo TKS homodimer………………………………………...88

Figure 4.3 Ribbon representation of TKS superimposed onto Freesia hybrid CHS…………..89

Figure 4.4 Ribbon representation of apo TKS with catalytic residues shown…………………90

Figure 4.5 Active site of TKS with catalytic residues labelled…………………………………….90

Figure 4.6 TKS active site superimposed onto L208F benzolactone from Rheum palmatum..91

Figure 4.7 The phenylalanine gatekeepers of TKS…………………………………………………92

Figure 4.8 The gatekeepers of TKS superimposed onto CHS from Oryza sativa……………..92

Figure 4.9 The Met130 Residue of TKS………………………………………………………………..94

Figure 4.10 The Met130 of TKS superimposed onto CHS from Oryza sativa…………………..94

Figure 4.11 Substrate bound TKS monomer…………………………………………………………96

Figure 4.12 Electrostatic potential representation of TKS active site entrance……………….96

Figure 4.13 Apo TKS superimposed onto substrate bound TKS…………………………………98

Figure 4.14 Loop where movement occurs during substrate binding to TKS…………………98

Figure 4.15 Hydrogen bonds of TKS bound to malonyl-CoA……………………………………100

Figure 4.16 Hydrogen bonds of TKS with hexanoyl-CoA………………………………………...101

Figure 4.17 Active site of substrate bound TKS……………………………………………………102

Figure 4.18 Substrate bound TKS active site superimposed onto apo TKS active site…….102

Figure 4.19 The active site residues of TKS that were mutated…………………………………103

List of abbreviations

Å – Angstrom

AAE - Acyl-activating enzyme

ACCase - Acetyl-CoA carboxylase

Acyl-ACP - Acyl-acyl carrier protein

AEA - Anandamide

AIDS - Acquired immune deficiency syndrome

AKT – Protein kinase B

AMPA - 2-amino-3-(4-butyl-3-hydroxyisoxazol-5-yl)propionic acid

APT - Aromatic prenyltransferase

BCCP – Biotin carboxyl carrier protein

BC – Biotin carboxylase

C - Celsius

CB1 - Cannabinoid receptor 1

CB2 - Cannabinoid receptor 2

CBCA - Cannabichromenic acid

CBD - Cannabidiol

CBDA - Cannabidiolic acid

CBDAS - Cannabidiolic acid synthase

CBGA - Cannabigerolic acid

CHS - Chalcone synthase

CNS - Central nervous system

CoA - Coenzyme A

CT – Carboxyltransferase

EMT - Endocannabinoid membrane transporter

FAD - Flavin adenine dinucleotide

FC – Calculated data

FO – Observed data

GPP - Geranyl pyrophosphate

HIV - Human immunodeficiency virus

HTAL - Hexanoyl triacetic acid lactone

IL-1 – Interleukin-1

IFN-γ – Interferon gamma

K - Kelvin

kDa – Kilodalton

keV – Kiloelectron-volt

LC/MS – Liquid chromatography-mass spectrometry

MAPKs - Mitogen activated protein kinases

MPT - Mitochondrial permeability transition

MS - Multiple sclerosis

NMDAr - N-methyl-D-aspartate receptor

NO - Nitric oxide

OAC - Olivetolic acid cyclase

OD - Optical density

PDAL - Pentyl diacetic acid lactone

PCR - Polymerase chain reaction

PI3K - Phosphatidylinositol 3-kinase

PKS - Polyketide synthase

PNS - Peripheral nervous system

RMSD - Root mean square deviation

ROI - Reactive oxygen intermediates

THC - Tetrahydrocannabinol

THCA - Tetrahydrocannabinolic acid

THCAS - Tetrahydrocannabinolic acid synthase

TKS - Tetraketide synthase

TNF - Tumour necrosis factor

UV - Ultraviolet

VR1 - Vanilloid receptor 1

2-AG - 2-arachidonoylglycerol

Abstract

Cannabinoids are a unique group of secondary metabolites found only in the plant species Cannabis sativa. Over 70 cannabinoids have been identified, of which tetrahydrocannabinol and cannabidiol are present at the highest levels in the plant and, owing to their psychoactive and therapeutic activities, have attracted the most attention. In this study, members of the cannabinoid biosynthetic pathway are characterized using the principles of biophysics and biochemistry in order to inform the synthetic biology process of introducing the pathway into a heterologous organism.

The first enzyme of the pathway, tetraketide synthase, was expressed and purified using an E. coli expression system. The activity of the recombinant tetraketide synthase was tested by LC/MS analysis of the products from a biotransformation reaction. The product profile matched that of the native tetraketide synthase from C. sativa, confirming that E. coli can express active forms of this protein. Successful crystallization and subsequent structure determination of the previously uncharacterized tetraketide synthase was achieved. This structure identified the five layered αβαβα homodimer tertiary structure of the enzyme and the catalytic triad of cysteine, asparagine and histidine, both of which are typical features of a type III polyketide synthase. A number of other conserved type III polyketide synthase features were identified in the tetraketide synthase, including phenylalanine gatekeepers to the active site and a methionine important to dimerization. The structure also allowed the design of variants in an attempt to improve the efficiency of the enzyme, particularly by reducing the level of by-products. Alanine scanning of the residues Ser126, Met130, Asp185, Met187, Ile248, Leu257, Phe259, Leu261, His297, Asn330 and Ser332 was conducted. Mutation of the residues Met130, His297 and Asn330 confirmed their importance to the activity and stability of the protein. The other mutated residues were localized to the active site of the tetraketide synthase to test their possible impact on the protein's activity. Although none of these mutations yielded active enzyme, they can inform the future design of variants.

The second enzyme of the pathway, olivetolic acid cyclase, was also expressed and purified using an E. coli expression system. The previously reported method of attaching a GST tag to the protein was confirmed as an effective approach to overcome its insolubility. Two other soluble tags, namely TRX and NUS, were also tested but were shown to be less effective than GST. A biotransformation reaction with LC/MS analysis of the products also confirmed that the recombinant olivetolic acid cyclase, both with the GST tag present and with it removed, was active and produced the intermediate of the cannabinoid pathway, olivetolic acid.

From the information gained through these studies the first two enzymes can effectively be introduced to an E. coli host as part of a construct to allow efficient cannabinoid production utilizing a synthetic biology approach. Additionally, further attempts to improve the efficiency of these proteins can be made by using the data collected.

Declaration

No portion of the work referred to in the thesis has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.

Copyright statement

The author of this thesis (including any appendices and/or schedules to this thesis) owns certain copyright or related rights in it (the “Copyright”) and s/he has given The University of Manchester certain rights to use such Copyright, including for administrative purposes.

Copies of this thesis, either in full or in extracts and whether in hard or electronic copy, may be made only in accordance with the Copyright, Designs and Patents Act 1988 (as amended) and regulations issued under it or, where appropriate, in accordance with licensing agreements which the University has from time to time. This page must form part of any such copies made.

The ownership of certain Copyright, patents, designs, trademarks and other intellectual property (the “Intellectual Property”) and any reproductions of copyright works in the thesis, for example graphs and tables (“Reproductions”), which may be described in this thesis, may not be owned by the author and may be owned by third parties. Such Intellectual Property and Reproductions cannot and must not be made available for use without the prior written permission of the owner(s) of the relevant Intellectual Property and/or Reproductions.

Further information on the conditions under which disclosure, publication and commercialization of this thesis, the Copyright and any Intellectual Property and/or Reproductions described in it may take place is available in the University IP Policy (see http://documents.manchester.ac.uk/DocuInfo.aspx?DocID=24420), in any relevant Thesis restriction declarations deposited in the University Library, The University Library’s regulations (see http://www.library.manchester.ac.uk/about/regulations/) and in The University’s policy on Presentation of Theses

Acknowledgements

Firstly, I would like to thank my supervisor, Professor Nigel Scrutton, for the opportunity to contribute to this project and to be part of his research group. His advice and guidance in the direction of my work have been of great value to the success of this project.

I would like to give special thanks to Dr Vijaykumar Karuppiah, whose help was invaluable to my education in the principles of protein expression, purification, crystallization and structure determination. The help I received from Vijay was pivotal to the accurate determination of the protein structure that is presented in this thesis. From his tutelage I have gained the essential skills of proper laboratory protocol and approach to research, which will be crucial to any future work I partake in.

Many thanks to Cunyu Yan for teaching me the principles of mass spectrometry and his help in the LC/MS analysis that was conducted as part of this study. Thank you to Adrian Jervis, Andrew Currin and the support of the Synbio center of the MIB for all of their contributions to this project. Thank you to Nicole Prandi and her supervisor Eriko Takano, for all of their work that has been conducted in conjunction with this project.

I would like to give a big thanks to everyone in the Scrutton group for their welcome into the laboratory and for any help or advice I received from them throughout this year.

Thank you to the University of Manchester for the opportunity to study and research at their institution.

Finally, I would like to show my gratitude to my family and friends who have helped and supported me during my studies.

1. Introduction

Synthetic biology has been at the forefront of drug discovery and small molecule production for a number of years and, when used in tandem with structural biology and biochemistry, an understanding of the complex mechanisms behind the biosynthetic pathways utilized can be attained (Go et al., 2015). This allows the modification of members of these pathways in order to improve their catalytic efficiency, substrate specificity and product identity to increase their potential use in synthetic biology and drug discovery. There is significant evidence of the potential use of cannabinoids as a treatment for a number of acute and chronic disorders. These include a number of conditions that do not currently have effective treatments. This thesis studies the principles of synthetic biology and structural biology, and how they can be combined. The major focus is on the cannabinoid biosynthetic pathway and how the products of this pathway can be used as a treatment of a number of disorders. The structural and biochemical study of this pathway will allow its effective use in synthetic biology and enable modifications to be made to increase its potential as a source of cannabinoid compounds.

1.1 Fundamentals of synthetic biology

Synthetic biology is an interdisciplinary science that involves the combined efforts of biologists, engineers, physicists and computer scientists. The main aims of synthetic biology are to understand, redesign and repurpose biological systems to fulfil novel purposes that have useful applications in science, industry or medicine. Synthetic biology approaches are most often utilized for the sustainable production of biomolecules, biomaterials, or fine and bulk chemicals. This sustainable production reduces the need to extract biomolecules or biomaterials from their natural sources, and eliminates the need for large-scale cultivation of natural resources to obtain products that may not be abundant in nature. Synthetic biology can also give alternative, and less costly, routes to chemical production compared to more traditional chemistry methods (Moses et al., 2017).

The disciplines of metabolic engineering and synthetic biology are similar and have some overlap but are still two distinct fields. Metabolic engineering - the older of the two fields - is based on the directed modification of metabolic pathways in living organisms in order to synthesize desired products. The basis of this process is the assembly of pathway genes in order to yield a greater titre with an efficient high rate of production. Metabolic engineering is inherently quick to produce engineered strains but takes time to optimize these strains to a level suitable for commercial use (Stephanopoulos, 2012). The overlap between metabolic engineering and synthetic biology occurs in the engineering of pathways; unlike metabolic engineering, synthetic biology is characterized by utilizing well defined ‘parts’ that act as building blocks, which are easily combined and exchanged to build and modify pathways. Combining this with modeling and computational techniques allows the development of high throughput methodologies to generate novel gene networks and synthesize whole genomes by utilizing synthetic cells, genetic circuits and non-linear cell dynamics.

The combined strategies of metabolic engineering and synthetic biology result in improved performance of industrial organisms in vivo (Keasling, 2012; Stephanopoulos, 2012; Awan et al., 2016; Lechner et al., 2016).

1.2 Uses of synthetic biology

Synthetic biology has revolutionized drug discovery, particularly through exploiting plants for their high value compounds that have served as a source of human medicine throughout history (Frasch et al., 2013). Higher plant species accumulate diverse natural products known as secondary metabolites that are non-essential to the physiological processes of the plant but contribute to its ecology and interaction with the environment. Many secondary metabolites have been exploited for their therapeutic effects by humans and, more recently, for flavours, fragrances and colourants in the food and cosmetic industries. There are three major classes of secondary metabolites: terpenoids; alkaloids; and phenolics. These high value compounds are often only present in scarce amounts in plants, which has led to efforts to introduce the biosynthetic pathways of secondary metabolites into heterologous organisms, primarily microorganisms, to increase their production (Moses et al., 2017).

1.3 Microbial chassis used in synthetic biology

Microbial systems have many advantages including: rapid proliferation; robust tolerance to process conditions; and being readily scalable. The purification of the desired secondary metabolite is also simplified, as other similar competing compounds are not present. This process is very cost effective as it allows the conversion of inexpensive feedstock, which is sometimes waste, to high value compounds (Paddon and Keasling, 2014). Production of secondary metabolites in a heterologous organism requires balancing the expression of pathway genes with the native functions of the organism. This is due to the burden that might be placed on an organism by introducing biosynthetic pathways, which use resources of native pathways and may produce toxic intermediates (Keasling, 2012).

The two most commonly used cell factories in synthetic biology are Escherichia coli (E. coli), a prokaryotic expression system, and Saccharomyces cerevisiae (S. cerevisiae), a eukaryotic expression system. E. coli is an effective prokaryotic host due to its rapid growth rate, high yield and scaling up capability. Additionally, there is cost effective manipulation technology available for E. coli and it has a well-established and extensive gene toolkit with various expression vectors and strains available. The drawbacks of E. coli as an expression system come from its lack of intracellular membranes and its inability to perform post translational modifications. S. cerevisiae is similar in that the manipulation technology and gene toolkits are established, but the main advantages of its use as an expression system are the presence of intracellular membranes and its ability to provide post translational modifications. The disadvantages of S. cerevisiae are that it can be a challenging organism to engineer and can only express some proteins as non-functional forms (Moses et al., 2017).

1.4 Synthetic biology approaches

The engineering of biological systems for a desired and novel purpose is possible using knowledge of molecular biology and understanding the relationship between a biomolecule's structure and function, thereby permitting the design of novel nucleic acid and protein sequences (Marner, 2009). The principle behind synthetic biology driven engineering is the acquirement of standardized DNA ‘parts’ that can be readily assembled and interchanged in order to design and build multigene heterologous pathways in an engineered organism (Scaife and Smith, 2016).

The success of the engineering ultimately relies on the predictable expression of transgenes to modify metabolism. This is dependent on protein synthesis being tightly regulated and inducible in order to control product yield, while avoiding unwanted defects to the organism’s endogenous functions (Moses et al., 2017). Regulatory control of protein expression can be achieved at various levels that include translation, protein stability and enzyme activity. The simplest and most commonly employed method of regulating protein expression is at the transcription level. This approach to control has relied heavily on the discovery and characterization of promoters that can be exploited as a means of regulation (Berens et al., 2015; McKeague et al., 2016). Selection of a promoter is crucial, as promoters differ widely in the magnitude of transcription they drive. Thus, choosing the correct one for optimal expression is important. Characterized natural promoters, or engineered promoters that have modified transcription capacity, can be used (Moses et al., 2017).

Natural proteins and enzymes are often engineered so that they are capable of meeting the requirements of the industrial process for which they are being employed. This is achieved by tailoring the protein’s properties to suit that of the designed biosynthetic pathway. This can involve increasing enzymatic activity and modifying the substrate or product specificity (Foo et al., 2012; Moses et al., 2017; Li and Cirino, 2014). Understanding the biological function and mechanism of a protein by precisely determining its three-dimensional structure is essential to protein engineering. Computational techniques allow the relationship between structure and function to be understood by providing the information to model and design a novel or modified enzyme from its primary amino acid sequence (Kelchtermans et al., 2014; Kingsley and Lill, 2015; Khan et al., 2016; Wei and Zou, 2016).

Protein engineering can be achieved via directed evolution, in which a series of randomized variants is expressed and screened to identify the mutations that result in the desired characteristics of the protein. Repetitive cycles allow proteins to be redesigned and improved by mutagenesis and amplification of beneficial mutations (Cobb et al., 2013). Artificial protein scaffolding is a very versatile method to bring together multiple proteins in a designed multienzyme pathway in order to combine individual reactions and allow metabolic channeling to occur. This ensures intermediates of a pathway follow the sequential cascade of enzymes to optimize production (Jorgensen et al., 2005; Bassard et al., 2017; Singleton et al., 2014).
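The iterative cycle of mutagenesis, screening and selection that underlies directed evolution can be illustrated with a minimal in silico sketch. This is a toy model only: the function names, the mutation rate, the library size and the scoring against a hypothetical "ideal" sequence are all assumptions made for illustration, and no real screening assay or method from this study is represented.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def mutate(sequence, rng, rate=0.05):
    """Return a variant in which each position is randomly substituted with probability `rate`."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else residue
        for residue in sequence
    )


def fitness(sequence, target):
    """Toy screen (assumption): count positions that match a hypothetical ideal sequence."""
    return sum(a == b for a, b in zip(sequence, target))


def directed_evolution(parent, target, rounds=20, library_size=50, seed=0):
    """Repeat mutagenesis and screening, carrying the best variant into the next round."""
    rng = random.Random(seed)
    best = parent
    for _ in range(rounds):
        library = [mutate(best, rng) for _ in range(library_size)]
        library.append(best)  # retain the parent so the score never decreases
        best = max(library, key=lambda variant: fitness(variant, target))
    return best


if __name__ == "__main__":
    parent = "MKTAYIAKQR"   # hypothetical starting sequence
    target = "MKTWYIAKHR"   # hypothetical sequence with the desired property
    evolved = directed_evolution(parent, target)
    print(evolved, fitness(evolved, target))
```

In a laboratory setting the scoring function would be replaced by an experimental screen, which is usually the rate limiting step of each round.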

Alongside the engineering of proteins, the ability to quantify and modify cellular activity in response to metabolite production is also critical in synthetic biology. This is achieved by using biosensors. Biosensors are ubiquitous detectors of both intracellular and extracellular signals, such as small molecules, ions or physical parameters. They respond to such stimulation by modifying cellular activity at transcription, translation or protein activity level. This action can be applied by synthetic biology to regulate and optimize metabolic pathways. Natural biological regulators that act as molecular reporters in the presence of a specific ligand are also effective biosensors. They are particularly useful in the quantification of the desired metabolite in a host, whose concentration is often difficult to determine. Typically, they do so by linking the metabolite’s concentration to a colorimetric output, which allows screening for efficient strains (Moses et al., 2017; Zhang and Keasling, 2011; Paige et al., 2012; Zhang et al., 2012; Raman et al., 2014; Mehrotra, 2016; Rogers and Church, 2016; Hassani et al., 2017).

The development of the minimal genome has also been an important step in the progress of synthetic biology. The minimal genome consists of only the genes essential to maintaining cellular life (Glass et al., 2006). By using a gene deletion approach, it has been possible to create smaller and more stable bacterial genomes (Mizoguchi et al., 2007; Umenhoffer et al., 2010; Csorgo et al., 2012). This allows the removal of non-essential genes that bacterial chassis have accumulated over the course of evolution to endure environmental changes. This is important in synthetic biology as some of these genes may be harmful to the industrial production of high value compounds, for example by encoding enzymes that degrade desired products. Thus, a chassis with a minimal genome is preferred for an industrial process (Fujio, 2007).

2. Structural biology

2.1 Principles of structural biology

Structural biology is the study of the structure and dynamics of biological macromolecules, how this relates to their function and the mechanism behind this function. This is achieved by incorporating principles of molecular biology, biochemistry and biophysics. By exploiting the inherent ability of a macromolecule to form ordered crystals and the diffraction of X rays when they interact with a molecule, data can be collected to model a biological macromolecule’s three-dimensional structure. This allows elucidation of a macromolecule’s mechanism in relation to its structure, which aids synthetic biology by providing information to design modified proteins in order to improve their efficiency or alter the substrate and product specificity. When extended to systems such as viruses, structural biology can be crucial to drug discovery and design (Smyth and Martin, 2000; Shi, 2014).

2.2 Crystallization

The first protein was crystallized over 150 years ago and, for the majority of the 19th and early 20th century, crystallization was a method of purifying a protein and demonstrating its purity. In the late 1930s, crystallized proteins assumed a new role with the advent of X ray crystallography. Initially, studies could only be conducted on proteins that are abundant in nature and readily purified in significant quantities. However, the expansion of genetic engineering in the 1980s and 1990s allowed previously unobtainable, but still biologically interesting, proteins to be studied. This revolutionized the field of structural biology as there was no longer a restriction on the proteins available to study. Further advances in recombinant expression, purification and crystal growth techniques, paired with advances in X ray sources, computing programs and graphics, have led to major progress in structural biology (McPherson and Gavira, 2014; Lieberman et al., 2013; Giege, 2013; Shi, 2014).

A crystal’s macroscopic structure is built up by a repeating unit cell arranged in a periodic array. The crystal is catalogued by its space group and the dimensions of the internal lattice structure. It is the very large number of repeating units in the lattice and their predictable arrangement that gives a crystal its diffracting power (Lieberman et al., 2013). Macromolecule crystallization can be performed on proteins, nucleic acids and assemblies such as viruses and ribosomes. There is no comprehensive theory behind the phenomenon or the diverse set of variables that impact it. The lack of understanding is due, in part, to the vast range and complexity of macromolecules, with even small proteins containing thousands of atoms, bonds and degrees of movement. The process of crystallization is fundamentally trial and error screening of a wide range of independent parameters until a set is found that is capable of inducing crystallization. This set often requires subsequent optimization in order to yield crystals of sufficient quality to conduct X ray crystallography (McPherson and Gavira, 2014; Smyth and Martin, 2000).

On average, a macromolecule crystal will consist of 50% solvent and 50% macromolecule. Due to this, they are considered an ordered gel that is permeated by extensive interstitial spaces that allow solvent and small molecules to diffuse freely throughout the crystal. Macromolecules have evolved to be compatible with the aqueous chemistry of living organisms within narrow pH ranges, and severe deviations from these conditions are not tolerated. Thus, macromolecule crystals must be grown in aqueous solutions of tolerable conditions, referred to as the mother liquor (McPherson and Gavira, 2014; Lieberman et al., 2013).

Conventional crystallized molecules, such as salts or small molecules, are characterized by a highly ordered firm lattice that gives hard, brittle crystals that are easy to manipulate and can be exposed to air. They also have strong optical properties and intensely diffract X rays. This differs greatly from macromolecule crystals, which are much more limited in size, very soft, easily crushed, prone to disintegrate if dehydrated and temperature dependent. Their optical properties are also much weaker, and diffraction data from multiple crystals can be required for the structure to be defined. Prolonged exposure to X rays also causes extensive damage, which is a further reason that multiple crystals may be required. This difference is due to macromolecule crystals having fewer contacts, such as hydrogen bonds, salt bridges or hydrophobic interactions, per molecule in comparison to salt or small molecule crystals. These contacts provide the lattice that is essential to crystal formation (McPherson and Gavira, 2014; Lieberman et al., 2013).

The extent of the X ray diffraction pattern that can be collected from a crystal is directly correlated with its level of internal order; the more structurally uniform the molecules in the crystal, the greater the resolution of the pattern. The primary cause of poor diffraction from macromolecule crystals is the extensive network of solvent filled channels and cavities that permeates the crystal, which creates large spaces between adjacent molecules and leads to weak lattice forces. However, this extensive solvent permeation gives great biochemical value to macromolecule crystals, especially in the case of proteins, as the molecules remain surrounded by water, maintain their native conformations and preserve biochemical features such as ligand binding. It also means that biochemically significant compounds, such as ions, ligands, substrates, coenzymes and inhibitors, can diffuse freely throughout the crystal and interact with the macromolecule (McPherson and Gavira, 2014).

Macromolecule crystallization is a complicated process with multiple factors contributing to its complexity. A major factor is the number of distinct solid states that a macromolecule can form, such as amorphous precipitates, oils, gels and crystals. Another important factor is that crystal initiation, called nucleation, occurs at very high supersaturation whereas crystal growth occurs at lower supersaturation; both of these steps are much slower than in conventional crystals. Supersaturation occurs when the macromolecule is present at a concentration greater than its solubility limit; under specific chemical and physical conditions a solid state will develop until the solution returns to the solubility limit. If the solid state develops too quickly a precipitate forms, but if it develops more slowly, under the correct conditions, crystals will grow. To overcome the complexity of crystallization, screening and optimization of the contributing parameters must be performed. There are two approaches to this: one is systematic variation of parameters; the other, termed ‘shotgun’, uses a wide range of crystallization screening kits that contain a variety of parameters (McPherson and Gavira, 2014; Smyth and Martin, 2000).

Crystal growth occurs in two stages. The first is nucleation, which is the most difficult to explain theoretically and to study experimentally. This is because it is a first order phase transition from a wholly disordered to an ordered state; it is presumed that this occurs via a paracrystalline intermediate. The subsequent growth of crystals is better understood and proceeds by classical mechanisms known as dislocation growth and growth by two-dimensional nucleation. The occurrence, extent and kinetics of nucleation and growth ultimately depend on the level of supersaturation in the mother liquor. In order to produce a supersaturated solution from an undersaturated one, the properties of the medium must be altered to reduce its ability to solubilize the macromolecule. Alternatively, this can be achieved by altering the macromolecule’s solubility or increasing the affinity of the macromolecules for each other. Various approaches can be taken to modify the solubility limit. These include: changing the pH or adding bridge ions to alter the ionization state of surface amino acids, which alters the degree of attraction between protein molecules; and adding salts or polymers to modify the protein’s interaction with the solvent (McPherson and Gavira, 2014; Giege, 2013).

There is a very wide range of crystallization agents, which fall into three broad categories: salts; organic solvents; and long chain polymers. A sufficient salt concentration deprives the macromolecules of solvent, thus promoting interaction with one another to satisfy their electrostatic requirements. This phenomenon, known as salting out, can cause the formation of crystals or amorphous precipitate. Due to the structural complexity and polyvalency of proteins, it is difficult to predict the concentration of salt required for crystallization. Generally, it is expected to be just under the concentration that causes amorphous precipitation. Organic solvents reduce the dielectric constant of the medium, therefore increasing the macromolecules’ attraction to each other. Polymers, such as polyethylene glycols, cause volume exclusion due to their lack of a consistent structure. This leads to them occupying more space, reducing the available volume of solvent and causing the macromolecule’s segregation, aggregation and transition to the solid state. A number of other additives can be included to aid crystallization. These include biochemically or physiologically relevant small molecules ranging from substrates and inhibitors to coenzymes. These additives can help to stabilize the protein or provide cross-linking that improves crystal growth. Other factors can also impact crystal growth, such as the temperature or, in the case of membrane proteins, the addition of a detergent (McPherson and Gavira, 2014; Giege, 2013; Shi, 2014).

Seeding is a technique that can be used to induce nucleation and allow more controlled growth of crystals. It involves the introduction of homogeneous or heterogeneous solids to the supersaturated mother liquor. Homogeneous nucleation occurs when macrocrystals or microcrystals of the macromolecule are used to seed. It is important that optimal amounts of microcrystals and washed macrocrystals are used, so that too many nuclei do not form and the growth of further microcrystals is avoided. Heterogeneous nucleation uses surfaces such as fibres, mineral faces and highly concentrated polyethylene microdroplets to induce crystal growth (McPherson and Gavira, 2014; Giege, 2013).

The most important factor in crystallization can be considered to be the macromolecule itself, as the purity and characteristics of the protein greatly impact the chance of crystallization. Typically, proteins that are glycosylated, contain a flexible region or are less conformationally restrained are more difficult to crystallize. In general, the more stable the protein, such as those from extremophile organisms, the easier it is to crystallize. Membrane bound proteins are difficult to study, partly due to bottlenecks in their expression and purification, but also because they inherently have few hydrophilic or polar domains that are essential for crystal lattice formation. Very large complexes, like viruses, can crystallize relatively readily due to the large amount of symmetry that runs through their structure (Smyth and Martin, 2000; Lieberman et al., 2013). Often, if crystallization is unsuccessful, attempting to improve the homogeneity of the macromolecule is the most effective method of increasing the chance of crystallization. If further purification is not successful, then modification of the protein’s structure is a promising route to improving crystal yield. Modifications can be achieved by: single or multiple point mutations; truncation of the polypeptide; or amino acid modification by chemical reaction or exposure to a modifying enzyme. Truncated proteins are often much more easily crystallized than full length ones, and modifying amino acids can remove some of the entropic costs involved in crystallization (McPherson and Gavira, 2014; Giege, 2013).

The main early crystallization method was microbatch, which involved adding a precipitating agent to a relatively large volume of protein. The development of the versatile vapour diffusion methods, most commonly sitting drop and hanging drop, allowed smaller amounts of protein to be used in wide ranging screens of parameters. Vapour diffusion involves a small volume of concentrated protein being added to a precipitant and placed next to a reservoir containing a higher concentration of that precipitant. A diagram of a sealed sitting drop well is shown in Figure 1.1, depicting the small protein drop and the reservoir with which it equilibrates. The drop dehydrates to reach equilibrium with the reservoir, leading to nucleation and crystal growth. Commercial vapour diffusion screens that contain cocktails of various crystallization parameters are available. If only microcrystals or precipitate are obtained, then the parameters of that screen must be optimized so that large, high quality macrocrystals are obtained. Optimization involves altering: the precipitant type or concentration; the pH of the cocktail; the salt type or concentration; additional additives; the protein concentration; the drop’s geometry; and the temperature of incubation. Optimization is required to improve the lattice order and therefore the diffraction limit (Lieberman et al., 2013; Giege, 2013).
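The equilibration of a vapour diffusion drop can be illustrated with a short calculation. The sketch below is a simplified model and not part of the protocol used in this study: it assumes that the drop is made by mixing protein with reservoir solution, that only water leaves the drop, and that the reservoir is large enough for its concentration to stay constant. The function name and all parameter names are hypothetical.

```python
def equilibrated_drop(protein_conc, protein_vol, reservoir_precip_conc, precip_vol):
    """Estimate drop volume and protein concentration after equilibration.

    Simplifying assumptions (illustrative only): the precipitant is
    non-volatile, only water leaves the drop, and the reservoir
    concentration does not change.
    """
    start_vol = protein_vol + precip_vol
    # Mixing dilutes the precipitant below the reservoir concentration.
    start_precip = reservoir_precip_conc * precip_vol / start_vol
    # Water leaves until the drop precipitant matches the reservoir.
    final_vol = start_vol * start_precip / reservoir_precip_conc
    # The same mass of protein is now held in the smaller final volume.
    final_protein = protein_conc * protein_vol / final_vol
    return final_vol, final_protein


# 1 uL of 10 mg/mL protein mixed with 1 uL of 2.0 M reservoir solution
volume, protein = equilibrated_drop(10.0, 1.0, 2.0, 1.0)
print(volume, protein)
```

Under these assumptions a 1:1 drop shrinks to half of its starting volume, so both the protein and the precipitant concentrations in the drop double relative to their values just after mixing, which is what drives the drop towards supersaturation.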


Figure 1.1 Diagram of a well from a sitting drop vapour diffusion crystallization screen – Depicts a well from a VDX or Linbro style plate. The image shows the bridge where the crystallization droplet that contains the protein of interest is loaded; this droplet equilibrates with the reservoir solution to reach supersaturation. Figure is taken from (McPherson and Gavira, 2014).

2.3 X ray crystallography

The ultimate aim of X ray crystallography is to obtain a three-dimensional molecular structure of a macromolecule from its crystal. Exposure of the crystal to an X ray beam produces a diffraction pattern. Initially, information on the crystal’s packing symmetry and the size of the repeating unit is determined by processing the data from the pattern. The arrangement and intensities of spots in the diffraction pattern are then used to determine the structure factors, from which an electron density map is calculated. Optimization of the map is conducted until it is of sufficient quality to build a molecular structure using the primary sequence of the protein. The modeled structure is refined to fit the map and to correspond to the most thermodynamically favoured conformation (Smyth and Martin, 2000).

The fundamentals of X ray crystallography are based on the principles of light and the electromagnetic spectrum, including: wave and particle duality; diffraction; and the complex interaction of radiation with electrons in matter. X rays are high-energy photons with a wavelength between 0.1 Å and 100 Å, which corresponds to an energy range of approximately 0.12 keV to 120 keV. X ray diffraction follows the same principle as light diffraction, where the beam is diffracted as it passes through a slit whose width is of the same order as the wavelength. X ray crystallography uses hard X rays that have an energy of 5 keV to 10 keV and a wavelength of the order of 1 Å, which is comparable to the length of most covalent bonds in a macromolecule. The X rays are scattered when their electric field vector interacts with the electrons of matter. This scattering from a single molecule is very weak, but when the molecules are arranged in a periodic crystal lattice the signal is amplified (Lieberman et al., 2013).
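The relationship between photon energy and wavelength quoted above follows from E = hc/λ. A minimal sketch of the conversion is given below; the function names are hypothetical and the constant hc is rounded to 12.3984 keV Å.

```python
# Planck constant multiplied by the speed of light, in keV * angstrom (rounded)
HC_KEV_ANGSTROM = 12.3984


def wavelength_to_energy_kev(wavelength_angstrom):
    """Photon energy in keV for a wavelength given in angstroms."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


def energy_to_wavelength_angstrom(energy_kev):
    """Wavelength in angstroms for a photon energy given in keV."""
    return HC_KEV_ANGSTROM / energy_kev


# Copper K-alpha radiation from an in house generator is close to 1.54 angstroms
print(wavelength_to_energy_kev(1.5418))
# A 1 angstrom photon, typical of synchrotron data collection
print(wavelength_to_energy_kev(1.0))
```

The conversion shows that a wavelength of 1 Å corresponds to roughly 12.4 keV, while a copper anode source at about 1.54 Å corresponds to roughly 8 keV.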

The X ray source can be an in house laboratory generator, in which X rays are produced by electrons striking a copper anode and are then filtered to give a monochromatic beam. Alternatively, the source can be a synchrotron, which has tunable wavelengths and a higher flux, allowing high resolution data collection. A synchrotron is able to produce extremely intense X rays which, when coupled with high quality optics to detect the diffraction pattern, allow a shorter exposure time. This also results in a higher signal to noise ratio. Originally, X ray films were used to collect the diffraction images, after which there was an advance to imaging plates that were many times more sensitive. There is now a progression to charge coupled device detectors that allow data collection within seconds, rather than minutes (Lieberman et al., 2013; Smyth and Martin, 2000; Shi, 2014).

Typically, a macromolecule crystal of 50 μm to 300 μm in every dimension is suitable for structure determination. A crystal is harvested into a nylon loop and placed in liquid nitrogen for cryocooling, and remains cryocooled for the duration of data collection. Cryocooling improves the resolution and quality of the data collected by reducing the extent of radiation damage to the crystal. Additionally, it reduces the thermal vibrations and conformational changes occurring in the protein. The superposition of diffracted X rays is described by Bragg’s law. The positions and intensities of the diffracted waves are recorded on a detector and appear as spots in a diffraction pattern. The distance between the crystal and detector can be calculated and adjusted to allow the collection of a diffraction pattern that gives the greatest resolution, often between 1.5 Å and 3.0 Å. The resolution of spots increases as the diffraction angle increases, which means that the highest resolution is obtained at the edge of the detector. A complete X ray diffraction data set contains precise measurements of all the possible reflections, the number of which, at a given resolution, is defined by the space group. The data collected are simplified using a reciprocal lattice for indexing, scaling and phasing. A complete data set can be collected from a single cryocooled crystal, maintained at 100 K by liquid nitrogen, which is mounted on a goniometer centered in an X ray beam and rotated in small increments up to 360°. A diffraction image is taken at each increment until a full data set is collected; the space group determines the minimum rotation required to do so (Lieberman et al., 2013; Smyth and Martin, 2000).
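Bragg’s law, nλ = 2d sin θ, links the diffraction angle to the spacing d that a reflection reports on, and so explains why spots at the edge of the detector carry the highest resolution. The sketch below is an illustration only: it assumes a flat detector that is centred on and perpendicular to the beam, and the function and parameter names are hypothetical.

```python
import math


def bragg_d_spacing(wavelength, theta_deg, order=1):
    """d-spacing from Bragg's law, n * wavelength = 2 * d * sin(theta)."""
    return order * wavelength / (2.0 * math.sin(math.radians(theta_deg)))


def edge_resolution(wavelength, detector_distance_mm, detector_radius_mm):
    """Smallest d-spacing recorded at the edge of a flat, centred detector."""
    # A spot at radius r on the detector was scattered through the angle 2*theta.
    two_theta = math.atan(detector_radius_mm / detector_distance_mm)
    return wavelength / (2.0 * math.sin(two_theta / 2.0))


# Moving the detector closer records larger angles and finer d-spacings
for distance in (300.0, 200.0, 100.0):
    print(distance, round(edge_resolution(1.0, distance, 100.0), 2))
```

In this model, shortening the crystal to detector distance lowers the d-spacing reached at the detector edge, at the cost of packing the spots closer together.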

Once the diffraction pattern is collected, the initial step is to confirm that it is of sufficient resolution in order to determine the macromolecule’s structure to near atomic detail. This can be done visually by verifying that an ordered array of spots has been detected towards the edge of the diffraction image. Algorithms of imaging programs determine the resolution of a particular spot in the diffraction pattern; a resolution of 3Å is sufficient to identify the side chains of amino acids in an electron density map (Smyth and Martin, 2000).

Computer programs are available that select and record the intensities of each reflection. This allows identification of the space group and the unit cell dimensions, which are expressed as three lengths (a, b and c) and three angles (α, β and γ). The spot spacing on a diffraction pattern is determined by the dimensions of the unit cell; the larger the cell, the more spots per unit area of diffraction pattern. The intensities of the reflections carry the information needed to determine each atom’s unique position in the macromolecule. However, the detector can only record the intensities and not the phases of the X rays, as they cannot be refocused once diffracted. Each structure factor is composed of an amplitude, derived from the measured intensity, and a phase; Fourier synthesis of the structure factors gives an interpretable electron density map for building a model of the macromolecule. Thus, the phase problem must be overcome either by experimental or computational methods in order to solve the structure of a macromolecule (Lieberman et al., 2013; Smyth and Martin, 2000).
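The phase problem can be stated compactly using the standard textbook expressions below, which are given here for illustration and use conventional symbols that do not appear elsewhere in this thesis: F(hkl) is the structure factor of the reflection with indices h, k and l, α(hkl) is its phase, f_j is the scattering factor of atom j at fractional coordinates (x_j, y_j, z_j), V is the unit cell volume and ρ is the electron density.

```latex
% Measured intensity gives only the amplitude of the structure factor
I(hkl) \propto \lvert F(hkl) \rvert^{2}

% Structure factor as a sum over the atoms in the unit cell
F(hkl) = \lvert F(hkl) \rvert \, e^{i\alpha(hkl)}
       = \sum_{j} f_{j} \exp\!\left[ 2\pi i \,(h x_{j} + k y_{j} + l z_{j}) \right]

% Electron density as a Fourier synthesis of the structure factors
\rho(x, y, z) = \frac{1}{V} \sum_{hkl} \lvert F(hkl) \rvert \, e^{i\alpha(hkl)}
                \exp\!\left[ -2\pi i \,(h x + k y + l z) \right]
```

Because the experiment yields only the amplitude of each structure factor, the phase must be supplied from another source before the final summation can be evaluated.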

There are several factors that impact the quantity of data required and the technique of data collection that needs to be employed for structure determination. The amount of symmetry that exists within a crystal system and space group, known as crystallographic symmetry, affects the amount of rotation of the crystal in the X ray beam that is required. A highly symmetrical system only needs diffraction data to be collected through as little as 35°, while a less symmetrical system will require much more rotation. Another factor is the non-crystallographic symmetry, which is the level of symmetry within the asymmetric unit, the smallest portion of the unit cell from which the whole cell can be generated by symmetry operations. A virus made up of many identical subunits has high levels of non-crystallographic symmetry, and in this case an incomplete data set can be used to produce a high quality structure. A monomeric protein can exist with no non-crystallographic symmetry and will therefore require a more complete data set to model the structure. If there is a sufficiently similar previously solved structure, then molecular replacement can be employed, using that structure as a starting model and to fill in any gaps in the data. If such a structure is not available, then additional data sets from heavy atom derivatives must be collected. The amount of diffraction data collected is limited by the resolution, which is ultimately defined by the quality of the crystal itself (Smyth and Martin, 2000).

2.4 Macromolecule structure solving

Processing of diffraction data takes multiple steps in order to reach an electron density map that can be used to model a macromolecule’s structure. The initial step is to define the unit cell dimensions and crystal system as accurately as possible. Also, the orientation of the crystal in the X ray beam must be determined. Once these factors have been defined, the next step is to assign an index to each spot on the diffraction image. Computer programs perform auto indexing on the diffraction image spots by predicting the diffraction pattern from the unit cell’s dimensions and orientation, then attempt to fit the recorded pattern to the predicted one. Next, the intensity of each spot of the diffraction pattern is measured, which varies between each spot depending on the amplitude and phase relation of the diffracted X rays. A scale factor is allocated to the intensities of every image in a data set so that the intensities from different images can be compared (Smyth and Martin, 2000).

Several strategies can be employed to obtain the phase information needed to solve the crystal structure; the three main methods are molecular replacement, multiple isomorphous replacement and anomalous dispersion. Molecular replacement is a computational technique, whereas both multiple isomorphous replacement and anomalous dispersion are experimental and identify the real positions of a number of atoms in the lattice (Lieberman et al., 2013).

Molecular replacement involves searching for the model of a known protein crystal structure that is predicted to be similar to the protein of interest; this model is used to obtain the phase information. The major variable affecting the success of molecular replacement is the search model. Most often, structures are chosen if they come from the same family, fulfill a similar function, have a similar three-dimensional structure or share more than 30% sequence identity with the protein of interest. The main drawback of this method is the bias towards the search model. In order to reduce this bias, the loop regions and side chains of the search model are deleted. The model is then placed by rotation and translation to best match the observed data, and the agreement between the model and the data is evaluated by the correlation coefficient (Lieberman et al., 2013; Smyth and Martin, 2000).
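The source texts do not specify here how the correlation coefficient is calculated. As an illustration, the sketch below assumes a Pearson correlation between observed values and values calculated from a placed search model; the function name and the example numbers are hypothetical and are not data from this study.

```python
import math


def correlation_coefficient(observed, calculated):
    """Pearson correlation between observed and model-calculated values.

    Assumption: the inputs are matched lists of amplitudes or intensities,
    one entry per reflection.
    """
    n = len(observed)
    if n != len(calculated) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_o = sum(observed) / n
    mean_c = sum(calculated) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(observed, calculated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_c = sum((c - mean_c) ** 2 for c in calculated)
    if var_o == 0 or var_c == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_o * var_c)


# Hypothetical amplitudes: a well placed model tracks the observed values
print(correlation_coefficient([10.0, 40.0, 25.0, 5.0], [12.0, 38.0, 27.0, 4.0]))
```

A value close to 1 indicates that the calculated values rise and fall with the observed ones, so candidate orientations and positions of the search model can be ranked by this score.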

Multiple isomorphous replacement uses the ions of heavy metals such as lead, platinum or mercury, which are introduced to the crystals by soaking or cocrystallization. These ions have more electrons than the light carbon, nitrogen and oxygen atoms that are most common in macromolecules, which causes them to contribute strongly to the intensities of diffracted X rays. This technique requires two or more isomorphous crystals: one native and the other, or others, containing heavy ions, called the derivatives. Comparing the differences in the data sets caused solely by the introduced heavy metal ions allows their positions in the protein to be determined, giving a starting point for estimating the phases (Lieberman et al., 2013; Smyth and Martin, 2000).

Anomalous dispersion also identifies the coordinates of specific atoms in real space, by utilizing atoms or ions that eject a core electron when bombarded by an X ray beam of a specific energy. These are known as anomalous scatterers. A number of elements can be used in both single and multiple wavelength anomalous dispersion; these include: bromine; gold; platinum; lead; selenium; zinc; copper; iron; manganese; nickel; and sulphur. If the macromolecule being studied does not naturally contain a suitable scatterer, as a metalloprotein does, then one can be introduced by soaking, cocrystallization, recombinant expression or, in the case of RNA and DNA structures, by nucleic acid synthesis. Crystals ‘doped’ with anomalous scatterers are exposed to an X ray beam at an energy tuned to the absorption edge of the scatterer, which results in symmetrically related reflections, called Bijvoet pairs, having differing intensities. In single wavelength anomalous dispersion, the differences in the intensities of the Bijvoet pairs at peak anomalous dispersion are used to gain an estimate of the phase information for the structure. In multiple wavelength anomalous dispersion, the data are collected both just above the absorption edge and at the inflection or remote energy; comparison between the data sets solves the phase problem (Lieberman et al., 2013).

Multiple isomorphous replacement and anomalous dispersion are less biased methods than molecular replacement, and they are the only option when there is no suitable search model available. Anomalous dispersion has the advantage over multiple isomorphous replacement that it only requires one crystal, removing the issue of isomorphism. However, anomalous dispersion has the disadvantage of requiring a synchrotron as the X ray source, whereas multiple isomorphous replacement can be conducted with an in house X ray generator (Lieberman et al., 2013; Smyth and Martin, 2000).

The determination of the real space coordinates of select groups of atoms by these techniques allows the estimation of phases; when these are combined with the recorded experimental intensities, an electron density map can be produced. If the phase estimate is accurate and the recorded data are of sufficient quality, an interpretable electron density map can be calculated, and the full structure, or at least major elements, of the macromolecule can be determined. Typically the initial quality of the map is poor and a full model is difficult to build. However, repeated attempts to do so identify the positions of a number of atoms within the map. This improves the accuracy of the phases, producing a more interpretable electron density map for model building. A number of computer protocols can further improve the phases, and thus map quality, by considering solvent boundaries and averaging over non-crystallographic symmetry. During modeling two electron density maps are used, each differing in its weighting of observed data to calculated data, referred to as FO and FC respectively. The 2FO-FC map is used to guide the model building and the FO-FC map is used to identify any mistakes in the model. Repetitive computational refinement of each residue in the model further improves the phases, easing the process of model building. Once a model has been optimized, the density of the surrounding water molecules may become apparent and these can be included in the model. Larger ions, such as phosphates, may also be identified as bound to the macromolecule. This identification is more difficult, and the likelihood of bound ions must be predicted from the known components of the mother liquor and the macromolecule’s native interaction with ions (Lieberman et al., 2013).
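The two map types can be illustrated for a single reflection. The sketch below is a simplified illustration rather than the procedure of any particular refinement program: it combines the observed amplitude FO and the calculated amplitude FC with the phase calculated from the current model, and it omits the additional statistical weighting that refinement programs typically apply. The function and parameter names are hypothetical.

```python
import cmath
import math


def map_coefficients(f_obs, f_calc, phase_calc_deg):
    """Return the (2FO-FC, FO-FC) complex coefficients for one reflection.

    Assumption: both maps borrow the phase calculated from the current
    model, and no likelihood weighting is applied.
    """
    phase = cmath.exp(1j * math.radians(phase_calc_deg))
    guide_map = (2.0 * f_obs - f_calc) * phase   # used to guide model building
    difference_map = (f_obs - f_calc) * phase    # highlights errors in the model
    return guide_map, difference_map


# A reflection where the model underestimates the observed amplitude
guide, difference = map_coefficients(100.0, 80.0, 0.0)
print(guide, difference)
```

When FO and FC agree, the difference coefficient is zero and the FO-FC map is featureless at that reflection; positive or negative difference density therefore points to atoms that are missing from, or wrongly placed in, the model.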

3. Cannabinoids

Cannabinoids are a unique group of secondary metabolites, consisting of alkylresorcinol and monoterpene groups, that are only found in the species Cannabis sativa (C. sativa). Over 70 different cannabinoids have been isolated from C. sativa; the two present at the highest levels are tetrahydrocannabinolic acid (THCA) and cannabidiolic acid (CBDA). THCA is the major constituent of drug-type C. sativa and CBDA is the major constituent of fiber-type C. sativa (Shoyama et al., 2012; Taura et al., 2007). The structures of these two cannabinoids are shown in Figure 1.2.

Figure 1.2 The chemical structures of THCA and CBDA - The skeletal structures of the two major cannabinoid constituents of C. sativa THCA and CBDA. Figure adapted from Fellermeier et al., 2001.

3.1 The role of cannabinoids in plants

Multicellular organisms require systems to induce cell death so that they are capable of eliminating damaged, superfluous or ectopic cells. In higher plants this induced cell death is involved in a number of physiologically important events such as root cap elimination. There are two types of cell death, both of which are ultimately controlled by mitochondria and their membrane permeability. Apoptosis, considered the controlled method of cell death, is caused by the release of apoptotic proteins, such as cytochrome c, from mitochondria that in turn activate executing enzymes, which include caspases and nucleases. Necrosis, the less controlled method of cell death, is induced by a drop in ATP production due to severe mitochondrial dysfunction. Mitochondrial permeability transition (MPT) can be considered the most important event in both apoptosis and necrosis; it is regulated by the opening of the Ca2+ dependent channels known as MPT pores. The opening of these pores induces loss of mitochondrial membrane potential, causing mitochondrial swelling and disruption of the outer membrane. This results in the release of apoptotic proteins and mitochondrial dysfunction (Morimoto et al., 2007).

Cannabinoids are stored in the capitate-sessile trichome glands on the leaves of C. sativa. The cannabinoid resin that is secreted from the trichomes has been shown to induce cell death in leaves that are exposed to it. The cannabinoids THCA and cannabichromenic acid (CBCA) have been identified as two unique endogenous necrotic cell death mediators present in the resin, capable of inducing cell death in both the leaves and cell suspensions of C. sativa. THCA and CBCA mediate the opening of MPT pores independently of Ca2+, H2O2 and any other cytosolic factors. This leads to swelling of the mitochondria and release of the mitochondrial proteins cytochrome c and nuclease, resulting in the irreversible loss of mitochondrial membrane potential. Mitochondrial damage can be confirmed by observing decreased ATP levels in suspended cells that are treated with cannabinoids. This cell death response can also be suppressed by pretreatment with an MPT inhibitor such as cyclosporin A, further confirming the role of mitochondrial permeability. The cell death induced by cannabinoids is considered necrotic as it occurs via mitochondrial dysfunction, and the DNA degradation is performed by the released nucleases independently of the caspases of the apoptotic pathway. Additionally, the inhibition of DNA degradation only slightly blocks cell death induced by cannabinoids. The cannabinoid resin is secreted from trichome glands in the specific regions where cell death is required to occur. Typically in intact cells, cell death mediating molecules are maintained at very low or undetectable levels, and a death stimulus or signal will induce their accumulation to much higher levels via activation of their synthesis pathway. However, despite their high toxicity, both THCA and CBCA are biosynthesized at high levels in young leaves and accumulate in the trichome glands. They are then secreted to mediate cell death, which is a system that has not been reported in other organisms (Morimoto et al., 2007).

Cannabinoids are also capable of inducing cell death in insect cells, meaning that as well as controlling cell death during plant development, cannabinoids may also act as a protective measure for C. sativa, which is an activity shared by many secondary metabolites. This explains why they would be maintained at relatively high levels in intact cells and why cannabinoids are commonly produced in physically fragile young tissues, in order to protect them from predators. Hydrogen peroxide is produced as a by-product of cannabinoid synthesis and is stored in the trichomes to further contribute to the self-defence system of C. sativa. The wide range of cannabinoids with differing structures suggests many may play different physiological roles that have not been elucidated yet (Sirikantaramas et al., 2005).

3.2 Endocannabinoid system

The endocannabinoid system in mammals consists of cannabinoid receptor 1 (CB1) and cannabinoid receptor 2 (CB2), and a number of endocannabinoid ligands. Both receptors are most commonly coupled to G proteins, through which they are able to alter the activity of adenylate cyclases, mainly by inhibition, and mitogen activated protein kinases, by stimulation. CB1 receptors also inhibit voltage activated Ca2+ channels and stimulate inwardly rectifying K+ channels. CB1 is the most common G protein coupled receptor present in the central nervous system (CNS). Its highest density is in the basal ganglia, cerebellum, hippocampus and cortex. Furthermore, CB1 is present in the peripheral nervous system (PNS) and several peripheral organs. In contrast, CB2 is almost exclusively present in immune cells and tissues; it is primarily expressed by leukocytes and is thought to potentially function in haemopoietic development. CB2 alters chemical messaging such as cytokine release from immune cells, and may regulate the migration of immune cells in and out of the CNS (Di Marzo et al., 2004; Baker et al., 2003; Pertwee, 2008).

Endocannabinoids are derivatives of long chain polyunsaturated fatty acids that include amides, esters and ethers. Endocannabinoids are local mediators of both the autocrine and paracrine systems. They have been shown to contribute to the control of cell metabolism, differentiation, proliferation and death. The two most studied endocannabinoids are anandamide and 2-arachidonoylglycerol (2-AG), whose structures are shown in Figure 1.3. Both are produced by hydrolysis of precursors derived from remodeled phospholipids. Anandamide is produced by hydrolysis of the phosphodiester bond of N-arachidonoylphosphatidylethanolamine, a minor component of animal membranes, by the enzyme N-arachidonoylphosphatidylethanolamine phospholipase D. 2-AG is produced by the hydrolysis of sn-1-acyl-2-arachidonoyl-glycerols by sn-1-selective diacylglycerol lipases. Endocannabinoids are synthesized and released on demand from a cell when and where they are required. Once they are released they are rapidly deactivated and taken up by cells, which is suggested to occur via the endocannabinoid membrane transporter (EMT) (Di Marzo et al., 2004; Baker et al., 2003).


Figure 1.3 Chemical structure of anandamide and 2-AG – The skeletal structures of the two most studied endocannabinoids present in mammalian species. Figure adapted from Di Marzo et al., 2004.

CB1 receptors have seven transmembrane domains and are preferentially distributed to presynaptic neurons, where they are coupled to the inhibition of voltage gated Ca2+ channels and activation of K+ channels. Post synaptic neurons synthesise membrane bound endocannabinoid precursors, which are cleaved when cytosolic Ca2+ increases, for example after neurotransmitter binding. This releases endocannabinoids into the synaptic space, where they act as retrograde messengers by binding presynaptic CB1, inhibiting Ca2+ channels and activating K+ channels, thus reducing membrane depolarization and inhibiting neurotransmitter release. The endocannabinoids must then be deactivated and taken up, most likely by the suggested EMT. This retrograde signaling of endocannabinoids is shown in Figure 1.4 (Guzman, 2003; Baker et al., 2003; Pertwee, 2008; Grotenhermen and Muller-Vahl, 2012).

Figure 1.4 Diagram of the retrograde messaging pathway of endocannabinoids in mammalian synapses - Depicts the release of neurotransmitter (NT) into the synaptic space, which binds the post synaptic neuron inducing an increase in intracellular Ca2+, resulting in endocannabinoid precursor cleavage and release of anandamide (AEA) and 2-arachidonoylglycerol (2-AG). These endocannabinoids bind the CB1 receptor, inhibiting Ca2+ channels and activating K+ channels, resulting in reduced NT release. The endocannabinoids are taken up by the endocannabinoid membrane transporter (T) and degraded by fatty acid amide hydrolase (FAAH). Figure is taken from (Guzman, 2003).

The distribution of CB1 receptors throughout the CNS and PNS, in both excitatory and inhibitory circuits, allows endocannabinoids to have a wide range of neuromodulatory actions in both the sensory and autonomic nervous systems, including regulation of pain perception and of cardiovascular, gastrointestinal and respiratory functions. The interaction of endocannabinoids with CB1 receptors also modulates the release of hypothalamic hormones and peptides, contributing to their modulatory effect on food intake and their feedback interaction with endocrine glands. CB2 receptors have been implicated in the mediation of inflammation and chronic pain, and particularly in humoral immune responses, because their distribution is mainly restricted to blood cells and immune tissues (Di Marzo et al., 2004; Baker et al., 2003; Pertwee, 2008).

The endocannabinoid system represents a major therapeutic target for many conditions due to the wide distribution of the receptors. These therapeutic opportunities can come from agonists, antagonists and inverse agonists of the CB1 and CB2 receptors, or from inhibiting endocannabinoid biosynthesis or degradation (Di Marzo et al., 2004).

3.3 The therapeutic potential of cannabinoids

Cannabis, alongside its recreational use, has been used throughout history for its medicinal properties. There is evidence of its use in China from 2600 BC and in Europe from the 13th century, where it was often used to treat cramps and became popularized in the 19th century as an anticonvulsive, analgesic and antiemetic. The availability of alternative treatments, paired with increasing sociopolitical pressure, led to the decline in the medical use of cannabis in the 20th century. Despite this, illegal self-medication with cannabis has provided much anecdotal evidence of its benefits for a number of disorders. The acute effects of cannabis have been described as a psychoactive, mildly euphoric, relaxing intoxication that leads to changes in psychomotor and cognitive function, with a limited number of cases reporting unpleasant side effects of anxiety, panic and paranoia. The physiological effects have been described as increased heart rate, reduced blood pressure, increased appetite, dry mouth and dizziness (Baker et al., 2003; Grotenhermen and Muller-Vahl, 2012).

Tetrahydrocannabinol (THC) is the main psychoactive constituent and is considered the most important cannabinoid due to its high abundance in C. sativa and its high potency. The wide variety of biological activities of THC that contribute to its therapeutic effects are exerted by mimicking endogenous ligands of the cannabinoid receptors. Cannabidiol (CBD) is also produced in large amounts in C. sativa but does not exert such potent cannabimimetic effects, as it does not interact with the cannabinoid receptors (Guzman, 2003).

The majority of the effects that cannabis induces are due to the agonistic activity of THC on the cannabinoid receptors, though some can be attributed to its interaction with other receptors, such as its antagonistic effect on the serotonergic 5-hydroxytryptamine 3 (5-HT3) receptor, which alleviates nausea and vomiting (Grotenhermen and Muller-Vahl, 2012). Some of the activity of cannabis is due to other cannabinoids such as CBD, which can provide antiemetic, neuroprotective, anti-inflammatory and anti-anxiety activity. These effects come from the complex relationship CBD has with CB1 antagonism, stimulation of the vanilloid-1 receptor (VR1), inhibition of anandamide degradation and activation of the nuclear receptor PPAR-gamma. CBD does not interact with either the CB1 or CB2 receptors, but rather interacts with other members of the endocannabinoid system such as the fatty acid amide hydrolase, which degrades anandamide, and the anandamide membrane transporter. This increases the level of endocannabinoids that exert anti-inflammatory and neuroprotective actions. CBD also shows molecular interaction with the human VR1 receptor, whose stimulation can lead to rapid desensitization, resulting in analgesic and anti-inflammatory mediators being released. CBD has similar anticonvulsive and antiarthritic effects to capsaicin, an agonist of VR1, which suggests that it is active via a similar mechanism (Bisogno et al., 2001). The multiple actions of several different cannabinoids add to their therapeutic potential, as their anti-spastic, analgesic, antiemetic, neuroprotective and anti-inflammatory activities come from a variety of mechanisms (Grotenhermen and Muller-Vahl, 2012).

Oral administration of cannabinoids leads to variable and slow release into plasma due to their sequestration into fat. The metabolism of cannabinoids in the liver also causes variation in the circulating concentration. This has led to the development of more effective administration methods that allow controlled titration of the plasma concentration, such as inhalers. Development of non-psychotropic cannabinoids has also been a focus. This can be achieved by developing cannabinoids that do not cross the blood-brain barrier, so that they interact only with peripheral CB1 receptors, limiting their adverse psychoactive side effects while maintaining their therapeutic capabilities. Alternatively, it can be achieved by developing cannabinoids that only interact with CB2 receptors and therefore do not have psychoactive properties. Another approach is the production of cannabinoids with short half-lives, so that they can be administered peripherally and will break down before they are able to interact with CNS receptors (Baker et al., 2003; Di Marzo et al., 2004; Guzman, 2003).

The drug safety profile of cannabinoids is very favourable, as the extrapolated median lethal dose is several grams of THC per kilogram of body weight. The toxicity of CBD has not been established but can also be estimated at multiple grams per kilogram. No acute fatal cases have been directly attributed to the use of cannabinoids or cannabis. Cannabinoids are predominantly metabolized in the liver by cytochrome P450 isoenzymes; thus, there may be a risk of them interacting with other drugs that are metabolized by a similar pathway. Cannabis smoking has been shown to reduce the concentration of antipsychotic medications in plasma. However, it does not affect the antiviral or cytostatic drugs that are used in AIDS and cancer treatment respectively, giving cannabinoids a high potential for use in tandem with conventional treatments for AIDS and cancer. Cannabinoids can interact with a number of substances and enhance their effects when they share an effector system, such as increasing the sensation of tiredness when combined with alcohol (Grotenhermen and Muller-Vahl, 2012; Guzman, 2003; Mechoulam et al., 2002).

The main adverse side effects of cannabinoid treatments are associated with the ‘high’ of recreational cannabis use, the acute effects of which are a relaxing sensation, euphoria, heightened sensory perception, impaired memory, distorted perception of time and reduced psychomotor and cognitive function, which can give way to anxiety, dysphoria and panic. There are also physical side effects of tiredness, dizziness, tachycardia, orthostatic hypotension, dry mouth, reduced lacrimation, muscle relaxation and increased appetite. Research into extended use is lacking, but such use is considered a risk to adolescents and may contribute to schizophrenia in vulnerable people. Tolerance to these side effects does develop over extended usage, as repeated administration of cannabinoids reduces the density of CB1 receptors and their coupling efficiency. This widens the therapeutic window of cannabinoids, as the therapeutic effects are more resistant to tolerance. In some heavy users, withdrawal symptoms can be observed (Grotenhermen and Muller-Vahl, 2012; Pertwee, 2008).

The most common use of cannabinoids is the treatment of nausea and vomiting caused by cytostatic drugs used during chemotherapy, which occurs by interaction with the CB1 receptors present in cholinergic nerve terminals of the digestive tract. Cannabinoid activation of these CB1 receptors causes inhibition of motility in the digestive tract, reducing the sensation of nausea. CB1 receptors are also present in the region of the brainstem that controls vomiting, so cannabinoid treatment also mediates this reflex. Though THC is the predominant source of the antiemetic properties of cannabis, CBD has also shown signs of contributing to this activity and lacks the psychoactive properties (Grotenhermen and Muller-Vahl, 2012; Guzman, 2003).

A typical effect of cannabinoids is the stimulation of appetite; in HIV and AIDS patients this becomes a very effective treatment of anorexia and cachexia. Advanced cancer patients also experience a lack of appetite that results in anorexia and cachexia, considered the most troublesome symptoms in the morbidity of the disease and often the cause of mortality. THC and, to a lesser extent, other cannabinoids stimulate appetite. This can be achieved at very low doses, which avoids the psychoactive effects of the cannabinoids. Appetite stimulation is induced by CB1 receptors present in the hypothalamus, the region of the brain that regulates food intake. Experimental evidence has shown that leptin, the main appetite-suppressing hormone, causes a reduction in the hypothalamic endocannabinoid level, giving evidence that the level of CB1 activation is important in mediating anorexia (Grotenhermen and Muller-Vahl, 2012; Guzman, 2003).

Cannabinoids can inhibit pain via both the CB1 and CB2 receptors. There are high concentrations of CB1 receptors on primary afferent nociceptors, which allow interference with the nociceptive pathway and a reduction in pain sensation. There is evidence that cannabinoids inhibit pain by activating CB1 receptors in the brain, spinal cord and nerve terminals. The peripheral CB2 receptors control inflammatory pain by regulating the release of pain and inflammation mediators. Thus, cannabinoids act as analgesics via two distinct mechanisms that involve interaction with both cannabinoid receptors. Animal models show that cannabinoids inhibit acute, chronic and spontaneous pain. Cannabinoid treatment can be combined with other pain medication, such as opioids or benzodiazepines, to reduce side effects and form a very effective pain treatment that produces a stronger and longer-lasting analgesic effect. The analgesic properties of cannabinoids make them an effective treatment for chronic pain symptoms, particularly the neuropathic pain associated with HIV and multiple sclerosis. The quality of life of cancer patients is also severely diminished by the pain caused by the disease, which often increases in the advanced stages. Cannabinoids represent an effective means of managing this pain (Grotenhermen and Muller-Vahl, 2012; Guzman, 2003; Baker et al., 2003).

Cannabinoid treatment of multiple sclerosis (MS) patients has been shown to significantly reduce spasticity and the frequency of spasms, which in turn greatly improves the quality of sleep that patients experience. MS is a demyelinating disease of the CNS that leads to abnormal neurotransmission, resulting in severe symptoms, typically muscle spasms, weakness and stiffness that cause problems with mobility. So far conventional treatments have had limited success, but there are large numbers of cases of self-medication with cannabis that exhibit promising results.

The activation of CB1 limits the pathological symptoms that are a result of inappropriate neuronal signaling. The therapeutic capability of cannabinoids has been confirmed by experimental autoimmune encephalomyelitis models of MS which showed the endocannabinoid system exhibits tonic control of spasticity. The drawbacks of this treatment are that the interactions of THC and CB1 are also responsible for the psychoactive side effects, which can be removed if the interaction is specialized to only peripheral CB1 receptors (Grotenhermen and Muller-Vahl, 2012; Pryce et al., 2014).

The management of pathological symptoms of HIV, multiple sclerosis and cancer has been the main medical use of cannabinoids, either self-medicated or as prescribed drugs. Cannabinoids have been effective in treating the pain, spasticity, nausea and loss of appetite associated with these disorders. Recently research has been conducted into other areas where cannabinoids may represent a promising treatment, or an alternative therapy to less effective conventional treatments.

Cannabinoids have shown the capability to slow the progression of neurodegenerative disorders, so may represent a viable treatment of neurological disorders. Neurodegeneration is the main cause of morbidity in a number of conditions including Huntington’s, Parkinson’s, stroke, motor neurone disease and MS. The pathway to neuron death may vary between different disorders but some similarities can exist, such as glutamate induced excitotoxicity, reactive oxygen species damage and toxic ion imbalance. CB1 receptors are able to inhibit excessive glutamate production, Ca2+ influx and reactive oxygen species production as mechanisms of neuroprotection. The inherent antioxidant properties of THC and CBD also contribute to the neuroprotective qualities of these cannabinoids (Baker et al., 2003).

The selective loss of CB1 receptors in the striatum has been associated with the onset of Huntington’s disease, which occurs prior to significant axonal loss. This suggests endocannabinoid regulation is lost before the significant pathology of the disease sets in, giving the potential that activation of the remaining receptors with cannabinoids may limit the progression of the disease (Baker et al., 2003).

During ischemic episodes associated with strokes, large amounts of the excitatory neurotransmitter glutamate are released. This results in neuronal death by overstimulation of the N-methyl-D-aspartate (NMDAr), 2-amino-3-(4-butyl-3-hydroxyisoxazol-5-yl)propionic acid (AMPA) and kainate type receptors, resulting in metabolic stress and an increase in intracellular calcium to toxic levels. Antioxidants or antagonists of the NMDAr, AMPA and kainate receptors can block this neurotoxicity. THC can prevent glutamate neurotoxicity via CB1 receptors, which reduce the Ca2+ influx through voltage-gated channels. THC and CBD also protect against glutamate neurotoxicity independently of the cannabinoid receptors. Instead, they remove damaging reactive oxygen species, owing to their antioxidant characteristics. There is also a possibility that THC and CBD are antagonists of the NMDAr, AMPA and kainate receptors, reducing their overstimulation (Mechoulam et al., 2002; Hampson et al., 1998).

Cannabinoids have shown immunosuppressive and anti-inflammatory activity that makes them a promising treatment for conditions such as arthritis. Many of the immune system’s responses to microbes and tumours are also damaging to the host, as they cause inflammation that can damage cells and tissue. Prominent damaging agents produced by the immune system include: intercellular mediator cytokines such as tumour necrosis factor (TNF), which is involved in triggering cytokine cascades; reactive oxygen intermediates (ROI), produced by respiratory bursts from stimulated phagocytes as an antitumour and antimicrobial response; and nitric oxide (NO), which possesses antitumour and antibacterial properties and mediates inflammatory cascades. Thus, a major focus of immune system drug development is limiting the effects of TNF, ROI and NO production. THC can inhibit the proliferation response of T-lymphocytes, inhibit cytotoxic T cells and suppress macrophage function, thereby reducing NO production. THC and CBD remove the ROI produced, as they are potent antioxidants. CBD modifies the production of TNF, IL-1 and IFN-γ by peripheral blood mononuclear cells, providing strong anti-inflammatory activity. CBD as an anti-TNF therapy is a viable treatment option for rheumatoid arthritis and colitis. CBD administered once the symptoms of collagen-induced arthritis have manifested has shown potent antiarthritic effects via its combined immunosuppressive and anti-inflammatory qualities (Mechoulam et al., 2002; Malfait et al., 2000).

Endocannabinoids, cannabinoids and synthetic cannabinoids have all shown antiproliferative activity in a wide range of tumour cells in culture and animal models. The antitumour activity of cannabinoids can be exerted via a number of mechanisms; these include directly promoting death of transformed cells, inhibiting the growth of transformed cells or inhibiting tumour angiogenesis and metastasis. This can be achieved by directly modulating signaling pathways involved in determining a cell’s fate, such as mitogen activated protein kinases (MAPKs) that are important in mediating cell growth and differentiation (Guzman, 2003).

Activation of the phosphatidylinositol 3-kinase (PI3K)-AKT survival pathway results in phosphorylation and inhibition of nuclear translocation, preventing the expression of proapoptotic proteins. Cannabinoid receptors are negatively coupled to the (PI3K)-AKT survival pathway, therefore, when activated they inhibit this pro-survival pathway. Cannabinoids also induce sustained ceramide production by the prolonged activation of the RAF1-MEK-ERK signaling cascade and inhibition of AKT. Increasing the level of the lipid second messenger ceramide induces cell cycle arrest and cell death by apoptosis in glioma cells. CB1 activation also blocks the cell cycle at G1-S transition in breast carcinoma cells by inhibiting adenylyl cyclase and the cAMP-protein kinase A pathway, which normally phosphorylates and inhibits RAF1. Therefore, cannabinoid activation of CB1 results in RAF1 not being inhibited and leads to prolonged activation of the RAF1-MEK-ERK signaling cascade mediating the antiproliferation of tumour cells (Guzman, 2003).

Cannabinoid receptor activation has been observed to inhibit growth factor receptor signaling in multiple cancers including skin carcinoma, prostate carcinoma and pheochromocytoma, which may represent a general mechanism of cannabinoid mediated antiproliferation. Tumours require angiogenesis to receive a blood supply for nutrients, gas exchange and waste removal to grow beyond a minimal size. Cannabinoid receptor activation in vascular endothelial cells inhibits cell migration and survival by reducing the expression of vascular endothelial growth factor and other pro angiogenic cytokines, which halts a tumour’s ability to grow due to lack of blood supply (Guzman, 2003).

Cannabinoids have shown the ability to inhibit or reduce convulsions in animal experiments, where they are as effective as conventional seizure treatments and can enhance their effects when used in tandem. CBD and THC have similar effects on seizures but they act via different mechanisms. CBD has a mechanism more similar to anti-convulsants and has been shown to be effective on cortical, focal, limbic, and generalized maximal seizures. CBD has also shown some sedative and anxiolytic effects, where a high dose can improve the sleep length of insomniacs, and relieve the anxiety caused by THC. CBD can also alleviate a number of other CNS effects mediated by THC (Mechoulam et al., 2002).

It is clear that cannabinoids represent potential treatments for a myriad of disorders. Thus, a sustainable source of cannabinoids may become increasingly desired to allow drug development and mass production of pharmaceuticals. Synthetic biology represents a possible approach to fulfill this demand. Gaining an understanding of the cannabinoid biosynthetic pathway will allow its effective introduction into a microbial chassis to give a sustainable source of these secondary metabolites.

4. Cannabinoid biosynthetic pathway

Figure 1.5 shows the full biosynthetic pathway of the cannabinoids THC and CBD. This pathway consists of five proteins including: a polyketide synthase; a polyketide cyclase; an aromatic prenyltransferase; and two flavin dependent oxidative cyclases. These proteins produce the acidic forms of THC and CBD, which then undergo decarboxylation independently of any enzymes to produce the neutral forms of these cannabinoids.

Figure 1.5 The full biosynthetic pathway of the cannabinoids THC and CBD – The figure depicts the skeletal structures of all the intermediates, by-products and main products of the cannabinoid biosynthetic pathway. Each step’s mediating enzyme is also labelled. Figure is adapted from Gagne et al., 2012.

4.1 Hexanoyl-CoA and malonyl-CoA

The starting substrates of the cannabinoid biosynthetic pathway are one hexanoyl-CoA molecule and three malonyl-CoA molecules, both of which are derived from fatty acid precursors. The first committed step in fatty acid synthesis is the carboxylation of acetyl-CoA to produce malonyl-CoA; this reaction is performed by the enzyme acetyl-CoA carboxylase (ACCase) (Sasaki and Nagano, 2004).

Fatty acids cannot be transported long distances within plants; therefore, each cell must produce fatty acids when they are required. De novo synthesis of fatty acids predominantly occurs in plastids, and the fatty acids can be modified as they pass through the plastid envelope and enter the cytosol. The precursor to fatty acids, malonyl-CoA, cannot pass through the plastid envelope, so it must be synthesized in both the plastid and the cytosol to fulfill the requirements of these compartments. The malonyl-CoA that is produced in plastids is utilized to synthesize fatty acids, whereas in the cytosol it contributes to flavonoid synthesis, which protects the plant from ultraviolet (UV) radiation. This compartmentalization of ACCase activity is necessary to maintain appropriate levels of malonyl-CoA production in each compartment (Sasaki and Nagano, 2004).

The majority of fatty acids are used to synthesize membrane and storage lipids, which are primarily required during the early stages of plant growth and development; past this stage, de novo fatty acid synthesis drops to a basal level. This change in fatty acid levels can be controlled by the transcription level of ACCase, which is high during the early phase of seed development and falls in the late phase. The transcription levels of ACCase are mediated by genes from both the nuclear and plastid genomes. Post-transcriptional regulation of ACCase also occurs, as the polypeptides encoded by the nuclear genome and the plastid must be coordinated in order to form a functional complex, although the mechanism behind this is not yet fully understood. The activity of ACCase can also be controlled by environmental changes; increased ACCase activity is required during photosynthesis to increase the production of flavonoids that protect photosynthetic organelles from UV-B radiation damage. ACCase activity is coordinated with photosynthesis, as the ATP and NADPH produced are used in fatty acid synthesis. ACCase activity is also regulated by stromal pH and Mg2+ concentration: during photosynthesis the pH increases from 7 to 8 and the Mg2+ concentration increases from 1 mM to 3 mM, which activates ACCase (Sasaki and Nagano, 2004).

There are two distinct types of ACCase present in plants: the homomeric ACCase, which consists of one large polypeptide and is located in the cytosol, and the heteromeric ACCase, which has four subunits and is situated in the plastid, where de novo fatty acid synthesis is carried out by the two enzymes ACCase and fatty acid synthase. The four subunits that compose the heteromeric ACCase are a biotin carboxyl carrier protein (BCCP), biotin carboxylase (BC) and the α and β subunits of the carboxyl transferase (CT). The production of malonyl-CoA occurs via two half reactions:

BCCP + HCO3- + Mg2+-ATP → BCCP-CO2- + Mg2+-ADP + Pi

BCCP-CO2- + acetyl-CoA → BCCP + malonyl-CoA

The first reaction is performed by BC and the second by CT (Sasaki and Nagano, 2004).
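As an illustrative aside, the two half reactions can be summed to give the net ACCase reaction, in which the BCCP carrier cancels out. The short Python sketch below is a bookkeeping exercise only; the species labels and the function name net_reaction are assumptions chosen for illustration and do not come from the cited sources.

```python
# Illustrative bookkeeping only: species labels and function names are
# hypothetical and chosen for this sketch.
# Each half reaction maps species -> stoichiometric coefficient
# (negative = consumed, positive = produced).
BC_HALF_REACTION = {
    "BCCP": -1, "HCO3-": -1, "Mg2+-ATP": -1,
    "BCCP-CO2-": 1, "Mg2+-ADP": 1, "Pi": 1,
}
CT_HALF_REACTION = {
    "BCCP-CO2-": -1, "acetyl-CoA": -1,
    "BCCP": 1, "malonyl-CoA": 1,
}


def net_reaction(*half_reactions):
    """Sum the half reactions and drop species whose coefficients cancel."""
    total = {}
    for reaction in half_reactions:
        for species, coefficient in reaction.items():
            total[species] = total.get(species, 0) + coefficient
    return {s: c for s, c in total.items() if c != 0}


if __name__ == "__main__":
    net = net_reaction(BC_HALF_REACTION, CT_HALF_REACTION)
    reactants = sorted(s for s, c in net.items() if c < 0)
    products = sorted(s for s, c in net.items() if c > 0)
    print(" + ".join(reactants), "->", " + ".join(products))
```

Both BCCP and its carboxylated form cancel in the sum, which reflects the role of BCCP as a carrier that is regenerated in each catalytic cycle rather than consumed.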

The acyl-CoA thioester hexanoyl-CoA is produced by members of the acyl-activating enzyme (AAE) superfamily, which activate the carboxylic acid group through an adenylate intermediate in order to add the coenzyme A. The substrates of AAEs in plants include phenylpropanoids, jasmonate precursors and fatty acids. Hexanoyl-CoA is derived from the short-chain fatty acid hexanoate by either CsAAE1 or CsAAE3; both of these AAEs are found in the trichome transcriptome and both activate hexanoate. Cannabinoid production is a rare example of polyketide synthesis in plants that utilizes hexanoyl-CoA; because of this, the origin of the fatty acid precursor hexanoate has not yet been clarified. No acyl-acyl carrier protein (acyl-ACP) that terminates at six carbons has been identified from the transcriptome, suggesting that de novo synthesis of hexanoate is unlikely. A more likely pathway is the cleavage of an eighteen-carbon fatty acid to yield twelve- and six-carbon products, though this is not yet confirmed (Stout et al., 2012).

4.2 Tetraketide synthase

The first enzyme in the cannabinoid biosynthesis pathway is a type III polyketide synthase (PKS), referred to as tetraketide synthase (TKS), which produces the linear polyketide intermediate pentyl tetra-β-ketide CoA from one hexanoyl-CoA and three malonyl-CoA molecules. Polyketides are the largest and most diverse group of secondary metabolites and are produced by numerous organisms. They include antibiotics, mycotoxins, stilbenoids and flavonoids. Polyketides are often exploited in medicine for their antimicrobial, antiparasitic, immunosuppressive and antineoplastic activity. PKSs are a large family of enzymes that produce polyketides, and there are three types. Type I PKSs consist of one or more multifunctional proteins with a number of different active sites that are organized into modules and contain at least an acyltransferase, an acyl carrier protein and a β-ketoacyl synthase. Type II PKSs are systems of individual proteins that carry a single set of repeating activities; they contain at least an acyl carrier protein, which provides an anchor for the growing polyketide, and two ketosynthase units. Type III PKSs catalyze the iterative condensation of a starter CoA ester and extender CoA esters. There are a number of type III PKSs in nature that produce a large variety of secondary metabolite products; this variation comes from the type of starter CoA ester, the number of condensation reactions performed and the type of cyclisation that occurs. Type III PKSs often perform cyclisation via one of two reactions: C2-C7 aldol cyclisation paired with decarboxylation, or C6-C1 Claisen cyclisation. In C. sativa this cyclisation occurs post PKS, by a separate modifying enzyme (Flores-Sanchez and Verpoorte, 2009).

Plant type III PKS enzymes show a high level of similarity in their amino acid sequence, structure and mechanism, especially within families such as the chalcone synthases (CHS). Typically a PKS is a symmetric dimer in which each monomer has a five-layered αβαβα core and an independent active site. Dimerization is necessary for activity, which shows there is allosteric cooperation between the two active sites; a methionine residue helps to shape the active site of the adjoining monomer. The basic reaction catalyzed by a PKS is the sequential condensation of two-carbon units from the decarboxylated extender CoA ester, malonyl-CoA, with the starter CoA ester, mediated by the conserved triad of cysteine, asparagine and histidine residues. The active site of a PKS consists of a starter substrate binding pocket and a cyclisation pocket, and is buried in the monomer, with substrates entering via a long CoA binding tunnel. The cysteine acts as a nucleophile and initiates the reaction by attacking the thioester carbonyl of the starter CoA, while the asparagine orientates the thioester carbonyl of the malonyl-CoA towards the histidine to facilitate decarboxylation of the terminal carboxylate group. This leaves an acetyl-CoA carbanion for condensation with the enzyme-bound polyketide intermediate. The elongated polyketide intermediate is then recaptured by the cysteine for additional rounds of condensation; simultaneously the CoA is released from the active site. This iterative condensation mechanism is depicted in Figure 1.6. Two phenylalanine residues that act as gatekeepers to the active site are often conserved in type III PKSs; one of these phenylalanines also contributes to a non-polar environment in the active site to aid decarboxylation. Type III plant PKSs also contain a conserved GFGPG loop that provides a scaffold for polyketide cyclisation (Flores-Sanchez and Verpoorte, 2009; Go et al., 2015; Lussier et al., 2012).

Figure 1.6 Diagram of mechanism behind PKS iterative condensation reaction – The figure depicts the general mechanism of a plant type III PKS, the starter CoA is coloured green and the malonyl-CoA molecules are coloured green and blue. The catalytic residues are shown and labelled. Figure is adapted from Go et al., 2015.


Type III PKSs from plants are usually 41 to 44 kDa and, as a superfamily, often show 46 to 95% homology with one another; it is believed that they likely evolved from fatty acid synthases. The TKS from C. sativa is 42.5 kDa and uses the substrates hexanoyl-CoA and malonyl-CoA to produce the linear pentyl tetra-β-ketide CoA, via the reaction shown in Figure 1.7, which continues through the cannabinoid biosynthetic pathway. TKS also gives rise to three by-products: olivetol, which may form by decarboxylative aldol cyclization mediated by the TKS after liberation of the polyketide from the cysteine, and pentyl diacetic acid lactone (PDAL) and hexanoyl triacetic acid lactone (HTAL), which form by spontaneous lactonisation of the poly-β-keto triketide and poly-β-keto tetraketide, respectively, when these are prematurely released from TKS. The reactions that produce these by-products are shown in Figure 1.8 (Flores-Sanchez and Verpoorte, 2009; Taura et al., 2009; Gagne et al., 2012).
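The carbon bookkeeping behind these products can be sketched in a few lines of Python. This is a minimal illustration that assumes each malonyl-CoA extender contributes two carbons to the chain after decarboxylation and that the six-carbon hexanoyl group is the starter; the function and constant names are hypothetical and are used only for this sketch.

```python
# Illustrative carbon counting for the TKS iterative condensation.
# All names here are hypothetical and chosen for this sketch.
STARTER_CARBONS = 6        # hexanoyl group supplied by hexanoyl-CoA
CARBONS_PER_EXTENSION = 2  # malonyl (3 C) loses one carbon as CO2


def chain_carbons(n_extensions, starter_carbons=STARTER_CARBONS):
    """Carbons in the polyketide chain after n malonyl-CoA condensations."""
    if n_extensions < 0:
        raise ValueError("number of extensions cannot be negative")
    return starter_carbons + CARBONS_PER_EXTENSION * n_extensions


# Two condensations give the triketide, three give the tetraketide.
triketide_carbons = chain_carbons(2)    # released early, lactonises to PDAL
tetraketide_carbons = chain_carbons(3)  # precursor of HTAL and olivetolic acid

# Decarboxylative aldol cyclisation removes one further carbon as CO2.
olivetol_carbons = tetraketide_carbons - 1

if __name__ == "__main__":
    print("triketide:", triketide_carbons)
    print("tetraketide:", tetraketide_carbons)
    print("olivetol:", olivetol_carbons)
```

Under these assumptions the triketide carries ten carbons and the tetraketide twelve, while olivetol retains eleven after losing the carboxyl carbon.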

Figure 1.7 The reaction catalyzed by the TKS – The figure depicts a skeletal representation of the substrates utilized by the TKS, one hexanoyl-CoA and three malonyl-CoA, and of the linear polyketide intermediate they produce, which is the substrate of the polyketide cyclase from C. sativa. Figure adapted from Raharjo et al., 2004.

Figure 1.8 The reactions that produce the by-products of TKS – The figure depicts a skeletal representation of the spontaneous reactions that occur independently of TKS to produce PDAL and HTAL and the aldol cyclisation reaction mediated by TKS that produces olivetol. Figure adapted from Taura et al., 2009.

4.3 Olivetolic acid cyclase

The linear polyketide intermediate pentyl tetra-β-ketide CoA from TKS undergoes C2-C7 aldol cyclisation by the enzyme olivetolic acid cyclase (OAC) to produce olivetolic acid. OAC cannot utilize the by-products olivetol, HTAL and PDAL, as it is unable to open their rings. OAC is the only known plant polyketide cyclase and the only functionally characterized dimeric α+β barrel (DABB) protein from a plant. OAC accepts the polyketide from TKS, without any direct interaction, and subsequently performs C2-C7 aldol cyclisation, cleaving the thioester bond to allow aromatization. OAC is a 12 kDa homodimeric protein that has hydrophobic tunnel active sites in both monomers. The structure solved for OAC shows that monomer A is composed of a four-stranded antiparallel β sheet and three α helices, and monomer B consists of a four-stranded antiparallel β sheet and two α helices. The outer surfaces of the antiparallel β sheets face each other, forming the α+β barrel. Several hydrogen bonds and hydrophobic interactions are involved in dimerization, and each monomer buries part of its surface area. Each monomer contains an active site in the α+β barrel with its entrance at the centre of α2, α3 and β4, which are labelled in Figure 1.9. The structure of OAC is shown in Figure 1.9, where the similarity between the two monomers and the bound olivetolic acid can be observed (Yang et al., 2016; Yang et al., 2015; Gagne et al., 2012).

Figure 1.9 Ribbon representation of the apo homodimer of OAC and olivetolic acid bound monomer A of OAC – Monomer A is coloured yellow and monomer B is coloured blue. The full apo DABB structure is depicted with the secondary structures that form the active site entrance labelled. The catalytic residues of the active site are shown in monomer A, depicted as cylinder structures with the carbon atoms coloured green, nitrogen atoms coloured blue and oxygen atoms coloured red. The bound olivetolic acid is depicted as a purple ball and stick structure. Figure adapted from Yang et al., 2016 using PDB 5B08 and 5B09.

The active site cavity volume is estimated to be 270 Å³. Deep inside this cavity is a long, narrow hydrophobic tunnel, called the pentyl binding pocket, that accommodates the pentyl moiety of the pentyl tetra-β-ketide CoA. The tetra-β-ketide moiety is accommodated by a hydrophilic region near the tunnel entrance formed by His5, His78, Tyr27 and Tyr72. His5 and His78, together with Ile73, Gly82, Trp89, Leu92 and Ile94, form the active site entrance. The CoA moiety is not bound to the enzyme; rather, it protrudes from the active site and interacts with either the solvent or the surface of the protein. The amino acids Lys4, Lys12, Lys38, Asp45 and His75 are far from the active site but have been shown to play important roles in the correct folding of OAC or, possibly, in interaction with the CoA moiety (Yang et al., 2016).

Once the aldol cyclization has occurred the dihydroxybenzoate moiety of the olivetolic acid is accommodated near the entrance of the active site. One carboxyl oxygen, the two hydroxyl groups and part of the aromatic ring are exposed to the solvent; the other carboxyl oxygen forms hydrogen bonds with the His5 and Tyr72. The His5 also forms a hydrogen bond with Asp96 outside the active site. It has been observed that His5, Tyr27, Tyr72, His78, Asp96 and the hydrophobic pentyl binding pocket all play crucial roles in the substrate and product specificity of the enzyme (Yang et al., 2016).

OAC has the linear pentyl tetra-β-ketide CoA loaded into its active site. Nucleophilic attack by His78, which is activated by Tyr72, abstracts the proton from C2 to form an enol intermediate, shown in figure 1.10 A. This enol intermediate promotes nucleophilic attack on C7 by keto/enol tautomerisation, then abstraction of the proton from the protonated His78 by the C7 carbonyl oxygen allows C2-C7 aldol cyclization shown in figure 1.10 B. Figure 1.10 C shows the cyclized intermediate that is released from OAC, which is still linked to the CoA. This intermediate is instantly subjected to spontaneous aromatization and cleavage of the CoA non-enzymatically to form olivetolic acid shown in figure 1.10 D and E (Yang et al., 2016).

Figure 1.10 Diagram of each stage in the reaction mechanism mediated by OAC - A) The abstraction of the proton by His78. B) The abstraction of the proton from His78 to allow aldol cyclisation. C) The intermediate released from OAC. D) The spontaneous non-enzymatic aromatization and CoA cleavage. Figure adapted from Yang et al., 2016.

4.4 Aromatic prenyltransferase

The next step in the cannabinoid biosynthetic pathway is the prenylation of olivetolic acid to produce cannabigerolic acid (CBGA), which is the common intermediate to a number of cannabinoids. This prenylation reaction is shown in figure 1.11 and is performed by a membrane bound aromatic prenyltransferase (APT), which has six transmembrane domains and uses geranyl pyrophosphate (GPP) as the prenyl source. Prenylation is a ubiquitous reaction that is involved in the production of both primary and secondary metabolites in plants, fungi and bacteria. The prenylation of aromatic compounds, coupled with tailoring reactions that include cyclization, oxidation and reduction, produces a myriad of products with diverse biological activities. Typically, the introduction of a prenyl moiety increases the lipophilicity of the product, improving its ability to interact with biological membranes (Fellermeier and Zenk, 1998; Chen et al., 2017).

Figure 1.11 The reaction of olivetolic acid with GPP to produce CBGA – The figure depicts the prenylation reaction mediated by the APT of C. sativa. The CBGA produced is the common intermediate to cannabinoid synthases such as THCAS and CBDAS. Figure adapted from Fellermeier and Zenk, 1998.

4.5 Tetrahydrocannabinolic acid synthase

The CBGA produced by prenylation of olivetolic acid is the substrate of tetrahydrocannabinolic acid synthase (THCAS), which catalyzes the oxidative cyclization of the monoterpene moiety to produce THCA, the acidic form of THC. THCAS comprises 545 amino acids, of which the first 28 are a cleavable signal sequence. Thus, mature THCAS is a 517 amino acid flavin dependent protein with a bound flavin adenine dinucleotide (FAD) and has shown homology with a number of other . THCAS is a monomeric enzyme that is divided into two domains, I and II, which are separated by the FAD binding pocket. Domain I consists of eight α helices and eight β strands described by the residue positions 28 to 253 and 476 to 545, and this domain is covalently bound to FAD. Domain I is divided into two subdomains, Ia and Ib, with the adenylic acid part of the bound FAD between these two subdomains. Ia includes the region of residues 28 to 134 that form three α helices, A, B and C labelled on figure 1.12, surrounding three β strands. There is a disulphide bond between Cys37 in α helix A and Cys99 in α helix C. Subdomain Ib, which is described by the residues 135 to 253 and 476 to 545, is composed of five antiparallel β strands that surround five α helices. Domain II includes the residues 254 to 475 and consists of eight antiparallel β strands that surround six α helices. The structure of THCAS is shown in Figure 1.12 and is split into the domains and subdomains with each labelled; the bound FAD is also shown in the structure. The theoretical weight of THCAS is 58.6 kDa whereas the observed weight is 62 kDa. This can be attributed to the glycosylation of the protein, as eight possible N-glycosylation sites have been identified on the enzyme (Shoyama et al., 2005; Shoyama et al., 2012; Sirikantaramas et al., 2004; Sirikantaramas et al., 2005).
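The residue ranges quoted for the THCAS domains can be checked for internal consistency with a few lines of Python. This is a minimal sketch and not part of the original analysis: the dictionary layout and the helper name are illustrative assumptions, and only the residue numbers are taken from the text.

```python
# Residue ranges for THCAS as quoted in the text (Shoyama et al., 2012).
# The data layout and helper name are illustrative assumptions.
DOMAINS = {
    "Ia": [(28, 134)],
    "Ib": [(135, 253), (476, 545)],
    "II": [(254, 475)],
}

def covered_residues(domains):
    """Return the sorted list of residue numbers covered by all segments."""
    residues = []
    for segments in domains.values():
        for start, end in segments:
            residues.extend(range(start, end + 1))
    return sorted(residues)

residues = covered_residues(DOMAINS)
# Check that no residue is assigned twice and that the segments
# tile residues 28 to 545 without gaps.
no_overlap = len(residues) == len(set(residues))
contiguous = residues == list(range(28, 546))
# Full length minus the 28-residue signal sequence.
mature_length = 545 - 28
print(no_overlap, contiguous, mature_length)
```

The same check could be applied to any other domain annotation by editing the dictionary.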

Figure 1.12 Ribbon representation of THCAS - Left: ribbon depiction of the full apo structure of THCAS with the bound FAD shown as a ball and stick structure. The domains and subdomains Ia, Ib and II are depicted by lines over the structure and the helices A, B and C are labelled. The secondary structures of α helices are coloured red and β strands are coloured blue. Top right: FAD depicted as a ball and stick structure covalently bound to Cys176 and His114. Bottom right: FAD and Tyr484 that mediate catalytic activity. Figure adapted from Shoyama et al., 2012 using PDB 3VTE.

The FAD is crucial to the enzymatic activity of THCAS and is covalently bound at two positions, which can be seen in figure 1.12. The Nδ1 of His114 and Sγ of Cys176 form covalent bonds with the carbon atoms at positions C8M and C6 of the isoalloxazine ring of FAD, respectively. A number of hydrogen bonds also interact with the FAD: six come from the main chain nitrogens of Gly113, His114, Gly122, Tyr175, Gly180 and Gly183, two come from the side chain nitrogens of His184 and Asn483, and two come from the side chain hydroxyl groups of Tyr190 and Tyr481 (Shoyama et al., 2012).

THCAS catalyzes a unique oxidative cyclization to produce THCA. The oxidative ring formation occurs via an intermediate that is produced by the elimination of a hydride and a proton from the C3 and O6’ positions of CBGA, respectively. This results in two new bonds forming between C3 and C4, and the C8 and O6’ positions of THCA. The N5 of the isoalloxazine ring of FAD is responsible for accepting the hydride from C3 and Tyr484 acts as a base to accept the proton from O6’; both of these actions are essential to the activity of THCAS. Molecular oxygen is also required for this reaction to take place as it accepts the hydride from reduced FAD to produce hydrogen peroxide and reactivate the FAD. The carboxyl functional group of CBGA is necessary for its specificity to THCAS, as no catalytic activity is shown for cannabigerol, which lacks this group. His292 at the surface of the active site of THCAS does not exert any catalytic activity but is critical to binding the carboxyl group of CBGA to give the substrate specificity. The reaction mechanism and residues that are essential to the substrate binding and catalytic activity of THCAS are shown in Figure 1.13. Their position and orientation in the protein can be seen in figure 1.12 (Shoyama et al., 2012).

Figure 1.13 The reaction mechanism of THCAS - The amino acids that bind the substrate, the catalytic residues and bound FAD are depicted. The elimination of the proton and hydride from CBGA, the formation of the two new bonds, and the release of THCA is shown in the diagram. Figure adapted from Shoyama et al., 2012.

4.6 Cannabidiolic acid synthase

Cannabidiolic acid synthase (CBDAS) uses CBGA as its substrate to produce CBDA by a similar oxidative cyclization reaction to THCAS. The primary structure of CBDAS shows 83.9% homology with THCAS and is also a flavin dependent protein with FAD covalently bound at His114. CBDAS is comprised of 544 residues with the first 28 being a cleavable signal sequence. Thus, mature CBDAS is 516 amino acids in length with a theoretical mass of 58.8 kDa but an observed mass of 62 kDa, which is believed to be due to glycosylation as there are seven possible N-glycosylation sites on the protein. Similarly to THCAS, CBDAS initiates the reaction by abstraction of a hydride from the C3 position of CBGA, which is accepted by the N5 of the isoalloxazine ring of FAD. The difference in the two enzymes comes from the proton transfer. For CBDAS, the proton is abstracted from the terminal methyl group of CBGA, rather than the hydroxyl group for THCAS. This proton is once again accepted by a basic residue in the active site. This results in the stereoselective ring closure to form CBDA; molecular oxygen is also required by CBDAS to reactivate the FAD, which produces hydrogen peroxide as a by-product. The similarity between the reactions and mechanisms of THCAS and CBDAS is shown in figure 1.14. There have been fewer studies on the structure of CBDAS, however, due to its homology with THCAS and similar reaction catalyzed, it can be hypothesized that their structures are similar with differing residues in the active site to give product specificity (Taura et al., 2007).

Figure 1.14 The two reactions catalyzed by CBDAS and THCAS – The diagram of the reactions mediated by CBDAS and THCAS shows both requiring FAD, a basic residue and molecular oxygen. The similarity of the cyclisation reaction can be observed. This gives evidence that the structure and mechanism of THCAS and CBDAS are similar. The figure is adapted from Taura et al., 2007.

4.7 THCA and CBDA decarboxylation

THCAS and CBDAS synthesize THCA and CBDA respectively, which are the acidic forms of these cannabinoids that do not exert any activity in mammalian systems. Therefore, they have no medical application. THCA and CBDA must undergo decarboxylation to their neutral forms, THC and CBD, in order to be medically useful. Acidic cannabinoids are thermally unstable and can be readily decarboxylated by exposure to light or heat. This most often occurs during the drying, smoking or baking of cannabis and its extracts (Wang et al., 2016).

5. Further work on the pathway

Though the biosynthetic pathway of cannabinoids has attracted much interest, due to their therapeutic value, there are gaps in the understanding of the biocatalysts involved in the pathway. In particular, the structural and biochemical properties of the TKS involved in the first step of the pathway are poorly understood. Understanding the reactions and mechanisms of the TKS provides the opportunity to alter its structure in order to improve its catalytic efficiency, substrate specificity or product identity. This increases the synthetic biology potential of the protein and, therefore, the ability to introduce the whole cannabinoid biosynthetic pathway into a heterologous organism.

The ability to express active members of the pathway in a heterologous organism is also crucial to informing the process of introducing the whole pathway into different species. Ensuring that the proteins are produced in active, soluble forms that are not toxic to the heterologous organism is essential to allowing the biosynthetic pathway members to be introduced without being detrimental to the survival of the organism.

6. Objectives of this study

The overall aim of this project is the construction of the cannabinoid biosynthetic pathway in an E. coli host by using synthetic biology principles to build an efficient production platform for the valuable cannabinoids THC and CBD. The studies conducted in this thesis aimed to inform this process, specifically for the first two enzymes of the cannabinoid pathway, namely TKS and OAC. An essential aim is to confirm that an E. coli host can express active forms of these recombinant proteins from the plant species C. sativa. If complications are encountered during this process then it is necessary to design methods to overcome them. The principles of biophysics and biochemistry, particularly structural and mass spectrometry studies, were employed to characterize the proteins and test their activity; where necessary, this also allows the proteins to be redesigned to increase their efficiency. Expression of the members of the pathway in E. coli and the study of their catalytic activity is important in elucidating their possible use in such a synthetic biology system. The structural study of the members facilitates the design of variant proteins to allow the introduction of a more efficient cannabinoid biosynthetic pathway into E. coli compared to the naturally occurring one.

6.1 Expression of functional proteins

The initial stages of this study are the expression and purification of the first two enzymes of the cannabinoid biosynthetic pathway in an E. coli host strain, and subsequently the establishment of their activity in vitro. Production and purification of these active enzymes allows the study of their product profile via mass spectrometry. Confirming the ability of E. coli to express active forms of these proteins is crucial in relation to synthetic biology as it verifies the possibility of in vivo activity of these members of the pathway once they are introduced. If issues are encountered in the production of these enzymes, new approaches can be designed to improve their expression and, therefore, potential in vivo activity in a heterologous host.

6.2 Determining the structure of TKS

A major objective in this study was to determine the three-dimensional structure of the TKS from C. sativa. Solving this structure allows the hypothesized similarities between the structure and mechanism of the C. sativa TKS and other plant type III PKSs to be confirmed. In addition, determining the structures in the apo form and in complex with the substrates enables any conformational changes that occur during catalysis to be identified. The TKS structure can subsequently be used to design variants of the protein in an attempt to increase its efficiency. Such variants would then be introduced to the pathway construct to increase the total efficiency and output of cannabinoids from the system.

6.3 Mutant design to improve efficiency of TKS

The TKS structure would allow visualization of the enzyme's active site and surrounding amino acids. Mutation of these residues can then be used to design variants that increase the catalytic efficiency of the enzyme. This may be achieved by reducing the level of by-products produced by the enzyme, thereby producing a cleaner product profile. If this increase in efficiency is possible it would hold significant synthetic biology value, as it offers the opportunity of designing a cannabinoid biosynthetic pathway that would be more efficient than the naturally occurring one. A more efficient variant can be introduced to the cannabinoid pathway constructed in an E. coli host to increase the level of intermediates for use by later enzymes in the pathway. Doing so may increase the overall cannabinoid output and, therefore, the value of the system.

7. Materials and methods

7.1 Genes, vectors and expression strains

The genes for the TKS and OAC proteins of the cannabinoid biosynthesis pathway were synthesized and subcloned into the pETM-11 vector using the Gene Art commercial service. The final constructs contain a 6x-Histidine tag, TEV protease site, lactose operon, kanamycin resistance and T7 promoter. The genes were codon optimized for expression in E. coli strains.

The proteins were expressed in the competent cells of E. coli strain BL21 (DE3) (New England Biolabs), which is a T7 expression strain and resistant to T1 phage. The proteins were also expressed in the competent E. coli strain ArcticExpress DE3 (Agilent Technologies) that co-expresses the cold-adapted chaperonins Cpn10 and Cpn60 to aid protein folding and reduce inclusion body formation. ArcticExpress cells express proteins using the T7 promoter and contain an additional plasmid for the chaperonins that is resistant to the antibiotic gentamycin.

7.2 Transformation and expression protocol

All transformation and expression steps were conducted under aseptic conditions and initial expression of proteins was conducted in BL21 (DE3) cells. For both proteins 1 μl of plasmid DNA (50 ng/μl) was added to 25 μl of the competent cells and incubated for 30 minutes on ice. The cell samples were heat shocked in a 42 °C water bath for 45 seconds, allowing plasmid uptake by causing the cell membranes to become porous. Cells were then returned to ice for 2 minutes. 475 μl of SOC medium, which consisted of 2% tryptone, 0.5% yeast extract, 10 mM NaCl, 2.5 mM KCl, 10 mM MgSO4, 10 mM MgCl2 and 20 mM glucose, was added to the transformation samples, which were incubated at 37 °C for 1 hour while shaking. The introduced plasmid contains the kanamycin resistance gene; thus, 100 μl of transformed cells were plated on to an LB agar plate, which consisted of 10 g/l tryptone, 5 g/l yeast extract, 10 g/l NaCl and 15 g/l agar, containing 40 μg/ml of kanamycin and incubated overnight at 37 °C to selectively culture transformed cells.

One colony for each protein was taken to inoculate 50 ml of 2X YT broth (Formedium), which consisted of 16 g/l tryptone, 10 g/l yeast extract and 5 g/l NaCl, containing 40 μg/ml of kanamycin to ensure selectivity, and incubated overnight at 37 °C while shaking. For each protein, two flasks of 500 ml 2X YT broth containing 40 μg/ml of kanamycin were each inoculated with 20 ml of the start-up culture. The inoculated broth was incubated at 37 °C while shaking until an OD of 0.6-0.8 at 600 nm was reached, measured using a spectrophotometer (Jenway 6300). Once the required OD was reached, the incubation temperature was reduced to 16 °C, the expression of the proteins was induced by adding 0.1 mM IPTG, and the cultures were left to grow overnight.
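The induction step relies on a simple dilution calculation (C1V1 = C2V2). The sketch below illustrates it in Python; the 1 M IPTG stock concentration is a hypothetical value chosen for the example and is not stated in the protocol, and the function name is likewise illustrative.

```python
def stock_volume_ul(final_mM, culture_ml, stock_mM):
    """Volume of stock (in microlitres) needed to reach final_mM in culture_ml.

    Uses C1*V1 = C2*V2 and ignores the small volume added by the stock itself.
    """
    return final_mM * culture_ml / stock_mM * 1000.0

# 0.1 mM IPTG in a 500 ml culture, assuming a hypothetical 1 M (1000 mM) stock.
iptg_ul = stock_volume_ul(final_mM=0.1, culture_ml=500, stock_mM=1000)

# Inoculum fraction: 20 ml of start-up culture added to 500 ml of broth.
inoculum_fraction = 20 / (500 + 20)

print(f"IPTG stock: {iptg_ul:.0f} ul; inoculum: {inoculum_fraction:.1%} of final volume")
```

The same helper applies to the antibiotics if their stock concentrations are known.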

When protein solubility was an issue, transformation was conducted in ArcticExpress DE3 cells that carry additional chaperonins that improve protein folding, reduce occurrence of inclusion bodies and increase soluble protein yield. 2 μl of XL10-Gold β-mercaptoethanol (Agilent Technologies) diluted with dH2O at a ratio of 1:10 was added to 100 μl of ArcticExpress cells and incubated on ice for 10 minutes. 1 μl of the plasmid (50 ng/μl) was added to 25 μl of the cells and incubated on ice for 30 minutes. The cell samples were heat shocked in a 42 °C water bath for 20 seconds then returned to ice for 2 minutes. 475 μl of SOC medium was added to the transformation samples, which were incubated at 37 °C for 1 hour while shaking. 100 μl of the transformed cells were plated on to LB agar containing 20 μg/ml of gentamycin and 40 μg/ml of kanamycin, and incubated overnight at 37 °C to selectively culture transformed cells. 50 ml of 2X YT broth containing 20 μg/ml of gentamycin and 40 μg/ml of kanamycin were inoculated with one colony, and incubated overnight at 37 °C while shaking. Two flasks of 500 ml 2X YT broth containing 20 μg/ml of gentamycin and 40 μg/ml of kanamycin were inoculated with 20 ml of the cultured broth. The inoculated broth was incubated at 37 °C while shaking until an OD of 0.6-0.8 at 600 nm was reached. Once this OD was reached, the incubation temperature was reduced to 16 °C, protein expression was induced by adding 0.1 mM IPTG, and the cultures were left to incubate overnight.

7.3 Harvesting cells and protein purification

The cells were harvested by centrifuging at 8000 g for 10 minutes at 4 °C; the supernatant was discarded and the cell pellet collected. The cells were resuspended in lysis buffer (25 mM Tris pH 8, 150 mM NaCl, 5% glycerol; this buffer was used for the remainder of the procedures), using 20 ml of buffer for every 10 grams of dry weight cells. One tablet of cOmplete EDTA-free Protease Inhibitor Cocktail (Roche) per 50 ml of culture and 10 ng/ml DNAse were added. Cells were lysed using a sonicator (Bandelin Sonopuls), pulsed on for 10 seconds and off for 20 seconds over a period of 20 minutes at 40% amplitude. The sonicated samples were centrifuged at 40000 g for 30 minutes at 4 °C, and the supernatant was removed and passed through a 0.45 micron filter (Sartorius Stedim Biotech).
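The buffer ratio and the pulsed sonication programme described above translate into two small calculations, sketched here in Python. The 15 g pellet mass is a hypothetical example value and the function names are illustrative; the buffer ratio and pulse settings are those given in the text.

```python
def lysis_buffer_ml(cell_mass_g, ml_per_10_g=20.0):
    """Lysis buffer volume, using the 20 ml per 10 g of cells stated in the text."""
    return cell_mass_g * ml_per_10_g / 10.0

def sonication_on_time_s(total_min, on_s, off_s):
    """Total time the probe is actually pulsing during a pulsed programme."""
    cycle_s = on_s + off_s
    full_cycles = int(total_min * 60 // cycle_s)
    return full_cycles * on_s

# Hypothetical 15 g pellet; pulse settings are those given in the text.
buffer_ml = lysis_buffer_ml(15.0)
on_time = sonication_on_time_s(total_min=20, on_s=10, off_s=20)

print(f"{buffer_ml:.0f} ml buffer; {on_time} s of active sonication")
```

With a 10 s on / 20 s off cycle, only one third of the programme time is spent sonicating, which limits heating of the sample.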

The 6x-Histidine tags expressed on the proteins allow their purification using an immobilized metal affinity chromatography (IMAC) column. Thus, the filtered samples were passed through a Ni2+ containing His Trap FF column (GE Healthcare). Undesired and non-specifically bound proteins were washed from the column using a series of two column volume washes of lysis buffer with 0 mM, 10 mM, 20 mM, 30 mM, 40 mM and 50 mM of imidazole pH 8.0. The desired tagged proteins were eluted using four elution fractions, each containing three column volumes of lysis buffer with two containing imidazole at 200 mM and two containing imidazole at 500 mM.

7.4 SDS-PAGE

The lysate, pellet, supernatant, flow through, washes and elutions from the purification were electrophoresed on an SDS-PAGE gel to identify the samples that contained the desired protein and to establish whether over-expression had produced soluble protein. The SDS-PAGE protocol used throughout the experiments used 12% Mini-PROTEAN TGX Stain-Free Precast Gels (Bio-Rad) with 10 or 15 wells depending on the number of samples. 10 μl of 2X SDS sample buffer were added to 10 μl of the samples, which were then heated for 5 minutes at 97 °C using an AccuBlock digital dry bath (Labnet). The 2X SDS sample buffer contained 62.5 mM Tris-HCl pH 6.2, 2% SDS, 25% glycerol, 0.01% bromophenol blue and 2.5% β-mercaptoethanol. 10X SDS running buffer contained 288 g glycine, 60 g Tris-HCl and 20 g SDS made up to 2 l with distilled water, and was diluted to 1X with distilled water for use in SDS-PAGE. The samples were electrophoresed at 300 volts for 20 minutes and then visualized using a Bio-Rad imaging system with the stain-free gel protocol.

7.5 Removal of protein tags

The imidazole used in the elutions was removed from the protein samples by passing them through a CentriPure P100 desalting column (emp Biotech). TEV protease (1:500 w/w) was added to the proteins with 1 mM DTT, to ensure the active cysteine of TEV is reduced, and incubated overnight at 0 °C to allow cleavage of the 6x-Histidine tags. An SDS-PAGE gel of the cleaved samples was run to confirm the cleavage was successful. A reverse IMAC column was run to remove the cleaved 6x-Histidine tags and TEV protease from the sample solutions. The cleaved samples were passed through a His Trap FF column and the desired protein was collected in the flow through and washed with two column volumes of lysis buffer. The column bound 6x-Histidine tags and TEV protease were eluted with three column volumes of lysis buffer containing 30 mM imidazole and six column volumes of lysis buffer containing 500 mM imidazole. An SDS-PAGE gel of the flow through, lysis buffer wash and imidazole washes was run to confirm the reverse IMAC was successful. The desired tag cleaved protein in the flow through and lysis buffer wash fractions was pooled.

7.6 Size exclusion chromatography

As TKS was to be used in crystal trials, further purification was required to increase the likelihood of crystals forming. For further purification of TKS, size exclusion chromatography was used. TKS was concentrated using a vivaspin 20 (Sartorius) with a molecular weight cut-off limit of 30 kDa and centrifuged at 8000 g, and further purified via gel filtration. The concentrated TKS was injected onto a HiLoad Superdex 75 (26/60) column connected to an AKTA Pure system (GE Healthcare) and run at 1.2 ml/min. The column was pre-equilibrated with the lysis buffer before the run. The samples were collected from the column as fractions (1.75 ml) and their purity was analyzed by SDS-PAGE. Pure fractions were pooled and concentrated to approximately 15 mg/ml.
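For planning a gel filtration run, the bed volume and run time can be estimated from the column geometry and flow rate. The following Python sketch assumes that "26/60" denotes a 26 mm internal diameter and a 60 cm bed height, and treats the bed as a simple cylinder; the function name is illustrative.

```python
import math

def column_volume_ml(diameter_mm, length_cm):
    """Geometric bed volume of a cylindrical column in millilitres."""
    radius_cm = diameter_mm / 10.0 / 2.0
    return math.pi * radius_cm ** 2 * length_cm

# "26/60" is read here as 26 mm internal diameter and 60 cm bed height.
cv_ml = column_volume_ml(26, 60)
flow_ml_min = 1.2    # flow rate from the text
fraction_ml = 1.75   # fraction size from the text

minutes_per_cv = cv_ml / flow_ml_min
minutes_per_fraction = fraction_ml / flow_ml_min
fractions_per_cv = cv_ml / fraction_ml

print(f"CV ~ {cv_ml:.0f} ml, {minutes_per_cv:.0f} min per CV, "
      f"{fractions_per_cv:.0f} fractions per CV")
```

The geometric estimate is only a guide; the manufacturer's stated bed volume should take precedence where available.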

7.7 TKS crystallization and structure determination

Crystallization trials were conducted for TKS. The trials were set up using the Mosquito robot (TTP Labtech) with a drop volume of 200 nl protein and 200 nl of crystallization solution. 4-6 different crystallization screens (JCSG+, PACT Premier, Morpheus, Morpheus II and SG1; Molecular Dimensions) were used. Three forms of TKS were set up for trials: the apoprotein, TKS with 2 mM hexanoyl-CoA and TKS with 2 mM malonyl-CoA. Rod-like crystals appeared in 2 days for all three forms and were fully grown within a week. Crystals were formed when the reservoir solution contained 0.2 M potassium nitrate, 20% w/v PEG 3350. Crystals were cryoprotected by incubating with reservoir solution supplemented with 20% glycerol and cryo-cooled using liquid nitrogen. X-ray data were collected at the Diamond Light Source (DLS). Data were processed using xia2 (Winter et al., 2013) implementing XDS (Kabsch, 2010) and XSCALE. The apoprotein and substrate bound structures of TKS were solved by molecular replacement using the chalcone synthase structure (PDB 1BI5) as the search model in Phaser (McCoy et al., 2007). The models were built using the AutoBuild wizard in Phenix (Adams et al., 2010). The models were completed using iterative rounds of model building using Coot (Emsley et al., 2010) and refinement using phenix.refine (Afonine et al., 2012). The structures were analyzed using the PDB_REDO server (Joosten et al., 2014) and validated using MolProbity (Chen et al., 2010). The X-ray data collection and refinement statistics are in Table 1.

Table 1 - X-ray data collection and refinement statistics

                          TKS (Apo)               TKS (Hexanoyl-CoA)      TKS (Malonyl-CoA)
Data collection
Space group               P 1 21 1                P 1 21 1                P 1 21 1
Unit cell a, b, c (Å)     50.69, 123.33, 57.67    71.56, 123.29, 87.94    71.61, 123.49, 88.0
Unit cell α, β, γ (°)     90, 112.58, 90          90, 109.55, 90          90, 109.62, 90
X-ray source              DLS I04-1               DLS I04-1               DLS I04-1
Wavelength (Å)            0.92819                 0.92819                 0.92819
Resolution range (Å)      61.67-1.19 (1.21-1.19)  82.87-1.39 (1.42-1.39)  68.82-1.52 (1.55-1.52)
Completeness (%)          98.7 (97.5)             98.4 (96.4)             99.1 (99.9)
Multiplicity              3.2 (2.6)               3.3 (2.9)               3.4 (3.4)
I/σI                      12.3 (1.5)              13.7 (2.1)              10.4 (1.9)
Rmerge                    0.041 (0.642)           0.043 (0.420)           0.071 (0.632)
Rmeas                     0.048 (0.814)           0.051 (0.521)           0.084 (0.751)
Rpim                      0.026 (0.493)           0.027 (0.303)           0.045 (0.401)
CC1/2                     0.999 (0.593)           0.999 (0.778)           0.996 (0.612)
Total observations        664758 (26521)          938425 (39561)          734657 (37136)
Total unique              205738 (10155)          280655 (13838)          218588 (10980)

Refinement
R-work                    0.159                   0.162                   0.163
R-free                    0.179                   0.184                   0.184
RMS (bonds)               0.007                   0.007                   0.01
RMS (angles)              0.84                    0.85                    0.92
Average B-factor (Å2)     19.9                    21                      22
Ramachandran plot
  Favoured                97.77                   98.29                   98.41
  Allowed                 2.23                    1.71                    1.59
  Outliers                0                       0                       0
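As an illustration of how the unit cell parameters in Table 1 can be used, the sketch below computes the unit cell volume for each dataset. It relies on the standard formula for a monoclinic cell (α = γ = 90°), V = a·b·c·sin β; the function and dictionary names are illustrative and the calculation is not part of the original processing pipeline.

```python
import math

def monoclinic_cell_volume(a, b, c, beta_deg):
    """Unit cell volume (cubic angstroms) for a monoclinic cell: V = a*b*c*sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))

# Cell parameters copied from Table 1 (a, b, c in angstroms; beta in degrees).
CELLS = {
    "apo": (50.69, 123.33, 57.67, 112.58),
    "hexanoyl-CoA": (71.56, 123.29, 87.94, 109.55),
    "malonyl-CoA": (71.61, 123.49, 88.0, 109.62),
}

volumes = {name: monoclinic_cell_volume(*cell) for name, cell in CELLS.items()}
for name, volume in volumes.items():
    print(f"{name}: {volume:.0f} cubic angstroms")
```

Combined with the molecular weight of the protein, such volumes are the starting point for estimating solvent content and the number of molecules per asymmetric unit.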

7.8 Adding soluble tags to OAC

Purification of OAC revealed that most of the protein was found in the insoluble fraction, indicating formation of inclusion bodies. To overcome this problem, soluble tags were added. OAC was cloned into three different expression vectors, pET32a, pET42a and pET50b, which contained the soluble tags TRX, GST and NUS respectively. PCR amplification of the OAC gene from the OAC-pETM-11 construct was conducted. The amplified gene was then introduced to the double digested pET32a, pET42a and pET50b vectors using an In-Fusion HD cloning kit (Clontech Laboratories). The vectors pET32a, pET42a and pET50b were digested using the restriction enzymes MscI, SacII and XhoI (New England Biolabs). The primers used for amplification of OAC for each vector are shown in table 2.

Table 2 – Primers for OAC amplification

Expression vector   Forward primer (5’-3’)                    Reverse primer (5’-3’)
pET32a              CGGTTCTGGTTCTGGCATGAAACATCACCATCACCATC    GATGGTGATGGTGATGTTTCATGCCAGAACCAGAACCG
pET42a              CATCACCATCACTCCATGAGCGATTACGACATCCC       CATCACCATCACTCCATGAGCGATTACGACATCCC
pET50b              CATCACCATCACTCCATGAGCGATTACGACATCCC       CATCACCATCACTCCATGAGCGATTACGACATCCC
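Primer sequences such as those in Table 2 can be sanity-checked programmatically before ordering or use. The Python sketch below computes the reverse complement and GC fraction of a primer and tests whether a pair is mutually reverse-complementary; the helper names are illustrative, and the sequences are the pET32a pair from Table 2 with the line-wrapped halves rejoined.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'-3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

# pET32a primer pair from Table 2, with the line-wrapped halves rejoined.
forward = "CGGTTCTGGTTCTGGCATGAAACATCACCATCACCATC"
reverse = "GATGGTGATGGTGATGTTTCATGCCAGAACCAGAACCG"

is_revcomp_pair = reverse_complement(forward) == reverse
print(len(forward), round(gc_fraction(forward), 2), is_revcomp_pair)
```

The same functions can be run over the pET42a and pET50b entries to compare the printed forward and reverse sequences.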

PCR reactions for OAC amplification were set up as follows:

12.5 μl 2X CloneAmp HiFi PCR Premix
0.2 μl forward primer (10 pmol)
0.2 μl reverse primer (10 pmol)
1 μl OAC-pETM-11 template plasmid (100 ng/ml)
11.1 μl sterilized water
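The recipe above can be checked and scaled with a short script. This Python sketch confirms the total reaction volume and scales the single-reaction recipe to a master mix; the number of reactions (3) and the 10% pipetting excess are hypothetical example values, and the function name is illustrative.

```python
# Component volumes (microlitres) for one OAC amplification reaction, from the text.
PCR_MIX_UL = {
    "2X CloneAmp HiFi PCR Premix": 12.5,
    "forward primer": 0.2,
    "reverse primer": 0.2,
    "OAC-pETM-11 template": 1.0,
    "sterilized water": 11.1,
}

def scale_mix(mix, n_reactions, excess=0.1):
    """Scale a single-reaction recipe to a master mix with a pipetting excess."""
    factor = n_reactions * (1.0 + excess)
    return {name: round(vol * factor, 2) for name, vol in mix.items()}

total_ul = round(sum(PCR_MIX_UL.values()), 2)
# The 2X premix should make up half of the final volume to reach 1X.
premix_fraction = PCR_MIX_UL["2X CloneAmp HiFi PCR Premix"] / total_ul
# Hypothetical example: three reactions with a 10% excess.
master_mix = scale_mix(PCR_MIX_UL, n_reactions=3)

print(total_ul, premix_fraction)
for name, vol in master_mix.items():
    print(f"{vol:>6} ul  {name}")
```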

These PCR reactions were run using a Biometra TRIO - Triple Powered PCR thermal cycler (Analytik Jena AG) with the following thermal conditions:

Step 1: 98 °C for 3 minutes 10 seconds
Step 2: 35 cycles of:
    98 °C for 10 seconds
    50 °C for 15 seconds
    72 °C for 3 minutes 7 seconds
Step 3: 72 °C for 2 minutes
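The total programmed time of this thermal profile can be estimated by summing the hold times, as in the Python sketch below. The helper name is illustrative, and the estimate excludes the time the cycler spends ramping between temperatures.

```python
def to_seconds(minutes=0, seconds=0):
    """Convert a minutes/seconds pair to seconds."""
    return minutes * 60 + seconds

# Thermal profile from the text.
initial_denaturation = to_seconds(3, 10)
cycle_steps = [
    to_seconds(seconds=10),  # 98 C denaturation
    to_seconds(seconds=15),  # 50 C annealing
    to_seconds(3, 7),        # 72 C extension
]
n_cycles = 35
final_extension = to_seconds(2)

total_s = initial_denaturation + n_cycles * sum(cycle_steps) + final_extension
print(f"Programmed hold time: {total_s} s (~{total_s / 60:.0f} min), excluding ramping")
```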

The PCR products were electrophoresed on a 1% agarose gel with SafeView nucleic acid stain (NBS Biologicals Ltd) and extracted using a QIAquick gel extraction kit (Qiagen) to purify them. A 1% gel consists of 1 g agarose (Fisher BioReagents) in 100 ml 1X TAE, from a 50X TAE stock of 242 g tris, 57.1 ml acetic acid and 100 ml 0.5 M EDTA pH 8 in 1 L of water.

The double digest reactions for vectors were set up as follows:

pET32a double digest reaction:
17.5 μl pET32a plasmid (57 ng/µl)
5 μl 10X CutSmart buffer (New England Biolabs)
2 μl MscI restriction enzyme (5000 units/ml)
1 μl XhoI restriction enzyme (20000 units/ml)
24.5 μl sterilized water

pET42a double digest reaction:
14.1 μl pET42a plasmid (79 ng/ml)
5 μl 10X CutSmart buffer
1 μl SacII restriction enzyme (20000 units/ml)
1 μl XhoI restriction enzyme (20000 units/ml)
28.9 μl sterilized water

pET50b double digest reaction:
11.1 μl pET50b plasmid (90 ng/ml)
5 μl 10X CutSmart buffer
1 μl SacII restriction enzyme (20000 units/ml)
1 μl XhoI restriction enzyme (20000 units/ml)
31.9 μl sterilized water
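The digest recipes can be checked in the same way as the PCR mix. The Python sketch below confirms that each reaction totals 50 μl, so that the 10X buffer is diluted to 1X, and converts the enzyme volumes into units; the dictionary layout and function name are illustrative assumptions.

```python
def enzyme_units(volume_ul, stock_units_per_ml):
    """Units of enzyme delivered by a given volume of stock."""
    return volume_ul / 1000.0 * stock_units_per_ml

# Volumes (microlitres) for each digest, as listed in the text.
DIGESTS = {
    "pET32a": {"plasmid": 17.5, "buffer": 5, "MscI": 2, "XhoI": 1, "water": 24.5},
    "pET42a": {"plasmid": 14.1, "buffer": 5, "SacII": 1, "XhoI": 1, "water": 28.9},
    "pET50b": {"plasmid": 11.1, "buffer": 5, "SacII": 1, "XhoI": 1, "water": 31.9},
}

totals = {name: round(sum(parts.values()), 1) for name, parts in DIGESTS.items()}
msci_units = enzyme_units(2, 5000)    # 2 ul of a 5000 units/ml stock
xhoi_units = enzyme_units(1, 20000)   # 1 ul of a 20000 units/ml stock

print(totals, msci_units, xhoi_units)
```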

All three double digestion reactions were incubated for 4 hours at 37 °C. The linearized vector plasmids were purified by electrophoresis on a 1% agarose gel and extracted using a QIAquick gel extraction kit (Qiagen).

The in-fusion reactions of the OAC PCR product with the linearized vectors were set up as follows:

10 to 200 ng purified PCR product
50 to 200 ng linearized vector
2 μl 5X In-Fusion HD enzyme premix
Made up to 10 μl with sterilized water

These in-fusion reactions were incubated at 50 °C for 15 minutes. The in-fusion reactions were used to transform DNA into Stellar competent cells (Clontech Laboratories). 2.5 μl of the in-fusion reactions were added to 50 μl of Stellar cells and incubated on ice for 30 minutes. The cells were then heat shocked for 45 seconds in a 42 °C water bath, then returned to ice for 2 minutes. 450 μl of SOC medium was added to the transformed cells, which were incubated for 1 hour at 37 °C while shaking. 100 μl of the transformed cells were plated on to an LB agar plate containing 40 μg/ml of kanamycin for pET42a and pET50b, and 100 μg/ml of ampicillin for pET32a, then incubated overnight at 37 °C to selectively culture the transformed cells. 5 colonies from each in-fusion reaction were selected to inoculate 5 ml of LB medium containing the appropriate antibiotic and incubated overnight at 37 °C while shaking. The incubated cells were harvested by centrifuging at 6800 g and the plasmids were extracted using a QIAprep spin miniprep kit (Qiagen).

The soluble tagged OACs were expressed and purified by the same protocol as previously stated, except for pET32a, which required 100 μg/ml of ampicillin rather than kanamycin. The soluble tags were cleaved by the same TEV protease protocol, and the protein was concentrated using a Vivaspin 20 with a molecular weight cut-off of 10 kDa.

7.9 OAC size exclusion chromatography

Due to significant contaminants being present in the OAC sample, gel filtration was conducted after reverse IMAC using the previous protocol in order to improve the sample purity and therefore the accuracy of biochemical studies. The purity of the fractions collected from gel filtration was analyzed by SDS-PAGE. Pure fractions were pooled and concentrated to approximately 10 mg/ml.

7.10 Biotransformations

TKS and OAC were incubated with their substrates in order to test their activity and identify their product profile. TKS at 10 μM, 20 μM and 40 μM was incubated overnight at 25 °C with 10 μM hexanoyl-CoA and 30 μM malonyl-CoA in 250 μl reaction volumes. The organic products were extracted by vigorously vortexing the reaction volume with 100 μl of 100% ethyl acetate. This mixture was then centrifuged at 17900 g for 5 minutes and the organic layer was removed. The organic layer was dried and then resuspended in 50% ethanol for testing by mass spectrometry. This protocol was repeated with both OAC and OAC tagged with GST (OAC-GST) at varying concentrations. The reactions included: 10 μM TKS and 20 μM OAC, 20 μM TKS and 20 μM OAC, 10 μM TKS and 10 μM OAC-GST, and 20 μM TKS and 10 μM OAC-GST, while the hexanoyl-CoA and malonyl-CoA were maintained at 10 μM and 30 μM, respectively. Negative controls of TKS and OAC without substrates, substrates without the proteins, and substrates with OAC alone were also incubated, and samples were extracted and analysed by mass spectrometry.
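As a rough check on the quantities involved, the substrate concentrations above can be converted into absolute amounts per 250 μl reaction. The sketch below does only that unit conversion; the function name is illustrative. The resulting 1:3 ratio of hexanoyl-CoA to malonyl-CoA is consistent with a tetraketide being assembled from one starter unit and three extender units.

```python
def amount_nmol(conc_um, volume_ul):
    """Convert a concentration (uM) and volume (ul) into an amount in nmol.

    uM x ul gives pmol, so dividing by 1000 gives nmol.
    """
    return conc_um * volume_ul / 1000.0


REACTION_VOLUME_UL = 250.0  # reaction volume stated in the protocol

hexanoyl_coa_nmol = amount_nmol(10.0, REACTION_VOLUME_UL)  # 2.5 nmol
malonyl_coa_nmol = amount_nmol(30.0, REACTION_VOLUME_UL)   # 7.5 nmol

# Molar ratio of extender (malonyl-CoA) to starter (hexanoyl-CoA).
extender_to_starter = malonyl_coa_nmol / hexanoyl_coa_nmol
```

These amounts set an upper bound on product yield: at most 2.5 nmol of any hexanoyl-derived product can form per reaction.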

7.11 Mass spectrometry of organic products

The extracted products were analysed by HPLC and mass spectrometry. A 1290 Infinity II UHPLC (Agilent) was used with an ACQUITY UPLC BEH C18 column, 130 Å, 1.7 μm, 2.1 mm x 50 mm (Waters). The column temperature was maintained at 50 °C. The mobile phase consisted of solvent A, containing water and 0.05% formic acid, and solvent B, containing acetonitrile and 0.05% formic acid. The solvent ratios, running time and flow rate used are shown in table 3.

Table 3 - HPLC conditions

Time (min)   Solvent A (%)   Solvent B (%)   Flow rate (ml/min)
0            95              5               0.6
2            95              5               0.6
10           5               95              0.6
12           5               95              0.6
13           95              5               0.6
15           95              5               0.6
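The time points in table 3 define the gradient only at its breakpoints. The sketch below estimates the solvent composition at intermediate times, assuming the instrument ramps linearly between consecutive time points (the usual behaviour for a programmed gradient, but not stated explicitly in the method). The function names are illustrative.

```python
# (time in min, % solvent B) breakpoints taken from table 3.
GRADIENT = [(0, 5), (2, 5), (10, 95), (12, 95), (13, 5), (15, 5)]


def percent_b(t_min, gradient=GRADIENT):
    """Linearly interpolate % solvent B at t_min (assumes linear ramps)."""
    if not gradient[0][0] <= t_min <= gradient[-1][0]:
        raise ValueError("time is outside the programmed run")
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def percent_a(t_min):
    """Solvent A makes up the remainder of the mobile phase."""
    return 100.0 - percent_b(t_min)
```

For example, `percent_b(6)` falls halfway along the 2 to 10 minute ramp and evaluates to 50% solvent B under this assumption.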

The mass spectrometry was conducted using a 6560 Ion mobility QTOF LC/MS (Agilent) set to negative ion polarity and a mass range of 100 m/z to 1200 m/z. The gas temperature was set to 325 °C, the drying gas to 12 L/min and the nebulizer to 40 psig; the sheath gas temperature was set to 400 °C with a flow rate of 12 L/min. The data were analyzed using MassHunter workstation software version B.08.00 (Agilent).
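In negative ion mode a small organic acid or phenol is typically observed as the deprotonated ion [M-H]-. The sketch below estimates the expected m/z for olivetol and checks that it falls within the scanned mass range. The molecular formula of olivetol (C12H18O2) and the monoisotopic masses are standard reference values rather than values taken from this study, and the assumption that the singly deprotonated ion dominates is illustrative.

```python
# Standard monoisotopic masses (Da); reference values, not from this study.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
PROTON = 1.00727646  # mass of a proton (Da)


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses for a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def mz_negative_mode(formula):
    """m/z of the singly deprotonated ion [M-H]- (charge of 1 assumed)."""
    return monoisotopic_mass(formula) - PROTON


OLIVETOL = {"C": 12, "H": 18, "O": 2}  # C12H18O2, assumed standard formula

mz = mz_negative_mode(OLIVETOL)
in_scan_range = 100.0 <= mz <= 1200.0  # scan range stated in the method
```

Under these assumptions olivetol would be expected at approximately m/z 193.12, inside the 100 to 1200 m/z window.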

7.12 TKS mutant design and activity testing

Using the determined TKS crystal structure, a number of single amino acid mutants were designed in an attempt to increase the efficiency of the protein by reducing the levels of by-products. The mutants designed were S126A, M130A, D185A, M187A, I248A, L257A, F259A, L261A, H297A, N330A and S332A. All of these mutations were located in the active site of TKS in an effort to alter its shape and therefore its product specificity. The cysteine that is crucial to polyketide activity was not altered. However, the previously reported histidine and asparagine - His297 and Asn330 - that are important to the enzyme's activity were mutated to test the effect of their loss. The methionine that has been reported to be important in dimerization, Met130, was also mutated to test the effect of the loss of this residue. Primers were designed and CloneAmp HiFi PCR premix was used for the reactions. The primers used for each mutation are shown in table 4.

Table 4 – Primers to introduce TKS mutations

Mutation   Forward primer (5’-3’)                         Reverse primer (5’-3’)
S126A      CACTTAATTTTTACCTCAGCGGCGACTACCGATATGCCTGGTG    CACCAGGCATATCGGTAGTCGCCGCTGAGGTAAAAATTAAGTG
M130A      CCTCAGCGTCGACTACCGATGCGCCTGGTGCCGACTATCATTG    CAATGATAGTCGGCACCAGGCGCATCGGTAGTCGACGCTGAGG
D185A      GTGTTCTGGCCGTTTGCTGTGCTATCATGGCATGCCTGTTTCG    CGAAACAGGCATGCCATGATAGCACAGCAAACGGCCAGAACAC
M187A      GCCGTTTGCTGTGATATCGCGGCATGCCTGTTTCGTGG         CCACGAAACAGGCATGCCGCGATATCACAGCAAACGGC
I248A      CAAATAGCGAAGGCACTGCCGGGGGCCACATCCGCG           CGCGGATGTGGCCCCCGGCAGTGCCTTCGCTATTTG
L257A      GCCACATCCGCGAAGCTGGAGCGATTTTTGACCTGCATAAAG     CTTTATGCAGGTCAAAAATCGCTCCAGCTTCGCGGATGTGGC
F259A      CCGCGAAGCTGGACTGATTGCTGACCTGCATAAAGATGTC       GACATCTTTATGCAGGTCAGCAATCAGTCCAGCTTCGCGG
L261A      GCTGGACTGATTTTTGACGCGCATAAAGATGTCCCGATG        CATCGGGACATCTTTATGCGCGTCAAAAATCAGTCCAGC
H297A      CAGCATTTTCTGGATAACGGCTCCGGGCGGCAAAGCTATC       GATAGCTTTGCCGCCCGGAGCCGTTATCCAGAAAATGCTG
N330A      CGTGTTGAGCGAACATGGTGCTATGAGCTCCTCAACAGTC       GACTGTTGAGGAGCTCATAGCACCATGTTCGCTCAACACG
S332A      GAGCGAACATGGTAATATGGCCTCCTCAACAGTCCTATTC       GAATAGGACTGTTGAGGAGGCCATATTACCATGTTCGCTC
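When each primer in table 4 is read as one continuous sequence, the forward and reverse primers for a mutation appear to be exact reverse complements of each other, as is common in whole-plasmid site-directed mutagenesis. The sketch below shows one way to check this for the S126A pair; the function names are illustrative, and the complementary-pair design is an observation about the listed sequences rather than a stated design rule.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_complementary_pair(forward, reverse):
    """True if the reverse primer is the exact reverse complement of the forward."""
    return reverse_complement(forward) == reverse.upper()


# S126A primer pair from table 4, each read as a single continuous sequence.
S126A_FWD = "CACTTAATTTTTACCTCAGCGGCGACTACCGATATGCCTGGTG"
S126A_REV = "CACCAGGCATATCGGTAGTCGCCGCTGAGGTAAAAATTAAGTG"

s126a_ok = is_complementary_pair(S126A_FWD, S126A_REV)
```

The same check can be applied to the other pairs to catch transcription errors before ordering primers.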

The PCR reactions contained:
1 μl forward primer (10 pmol)
1 μl reverse primer (10 pmol)
1 μl template TKS plasmid (100 ng/ml)
12.5 μl 2X CloneAmp HiFi PCR premix
9.5 μl sterilized water

These PCR reactions were run using a Biometra TRIO thermal cycler (Triple Powered PCR thermal cycler) with the following thermal cycling conditions:
Step 1: 98 °C for 3 minutes 10 seconds
Step 2: 35 cycles of:
    98 °C for 30 seconds
    55 °C for 40 seconds
    72 °C for 4 minutes
Step 3: 72 °C for 9 minutes 40 seconds
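The programmed hold times above give a lower bound on the run time of this PCR. The sketch below adds them up; it deliberately ignores the ramp time between temperatures, which depends on the instrument and is not given here, so the real run would be somewhat longer.

```python
def to_seconds(minutes=0, seconds=0):
    """Convert a minutes/seconds hold time into seconds."""
    return minutes * 60 + seconds


INITIAL_DENATURATION = to_seconds(3, 10)   # step 1
CYCLE_STEPS = [
    to_seconds(seconds=30),                # 98 °C denaturation
    to_seconds(seconds=40),                # 55 °C annealing
    to_seconds(minutes=4),                 # 72 °C extension
]
CYCLES = 35                                # step 2 repeat count
FINAL_EXTENSION = to_seconds(9, 40)        # step 3

# Total programmed hold time, excluding temperature ramps (assumption).
total_s = INITIAL_DENATURATION + CYCLES * sum(CYCLE_STEPS) + FINAL_EXTENSION
hours, remainder = divmod(total_s, 3600)
minutes, seconds = divmod(remainder, 60)
```

With these values the programmed holds total 3 hours 13 minutes 40 seconds.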

These PCR products were electrophoresed on a 1% agarose gel to confirm that amplification was successful. 1 μl of DpnI restriction enzyme (20000 units/ml) (Agilent) was added to each reaction, which was then incubated at 37 °C for 1 hour to digest the plasmid template. These reaction products were used to transform Stellar competent cells using the same transformation protocol mentioned before. Four colonies from each mutation reaction were selected to inoculate 5 ml of LB medium containing 40 μg/ml of kanamycin and incubated overnight at 37 °C while shaking. The cells were harvested by centrifuging at 6800 g and the plasmids were extracted using a QIAprep spin miniprep kit (Qiagen). The plasmids were sequenced to verify the mutations.

The TKS mutants were then expressed in ArcticExpress cells and purified using the previous IMAC protocol. The activity of the soluble mutants was tested by the previous mass spectrometry protocol, both individually and also in tandem with OAC.

8. Results and discussion

8.1 TKS expression and purification

Section 8.1 set out to produce a pure soluble sample of the TKS protein from the cannabinoid biosynthetic pathway using histidine tagged recombinant protein and an IMAC column. This was necessary to obtain a sample of TKS which could be used in the biochemical and structural characterisation studies.

Figure 2.1 SDS-PAGE of the IMAC column purification of TKS - Histidine tagged TKS expressed in BL21 (DE3) cells was purified using a 5 ml His Trap FF column and analyzed by SDS-PAGE. The samples were electrophoresed on a 15 well 12% Mini-PROTEAN TGX Stain-Free Precast Gel. Lane 1 contains the protein ladder with masses labelled to the left. Lane 2 contains a lysed cells sample, lane 3 contains a sample of the supernatant from centrifuged lysed cells and lane 4 contains a sample of the flow through from the IMAC column. Lanes 5 to 10 contain samples of the column washes with increasing imidazole concentrations from 0 mM to 50 mM. Lanes 11 to 14 contain samples of the column elutions; lanes 11 and 12 are the 200 mM imidazole elutions, 13 and 14 are the 500 mM imidazole elutions. The sample for lane 12 was lost during loading.

The SDS-PAGE of the IMAC purification of TKS in figure 2.1 shows that over-expression was successful in BL21 (DE3) cells and produced a large quantity of soluble TKS. The elution fractions contain large amounts of a protein at 43 kDa, which is the estimated mass of TKS with a 6x-Histidine tag. This shows that large amounts of soluble TKS can be expressed in an E. coli host, which suggests that this protein could be introduced into this heterologous host and be active in vivo. The IMAC purification was relatively effective as the elution fractions contained low levels of contaminants. The volume and quality of the TKS obtained were sufficient for use in the subsequent biochemical studies of the protein. For crystal trials of TKS the removal of the 6x-Histidine tag and further purification were required, and these were conducted by TEV protease digestion and size exclusion chromatography, respectively.

Figure 2.2 SDS-PAGE of purified histidine-tagged TKS digestion by TEV protease - The purified TKS was incubated overnight at 0 °C with TEV protease to cleave the 6x-Histidine tag. Cleaved and uncleaved samples were electrophoresed on a 10 well 12% Mini-PROTEAN TGX Stain-Free Precast Gel. Lane 1 contains the protein ladder with the masses labelled to the left, lane 2 contains the desalting flow through, and lanes 3 and 4 contain uncleaved TKS with sample volumes of 5 μl and 10 μl respectively. Lanes 5 to 10 contain the cleaved TKS with alternating sample volumes of 5 μl and 10 μl.

The SDS-PAGE of the TEV digestion in figure 2.2 shows that it was successful as there is a reduction in molecular weight of 1 kDa between the cleaved and uncleaved samples, caused by the removal of the 6x-Histidine tag from the TKS. The desalting flow through showed no sign of the TKS protein, which means that none was lost during the process of removing the imidazole from the samples. The cleaved tags and TEV protease must subsequently be removed from the protein sample; this was achieved by reverse IMAC.

Figure 2.3 SDS-PAGE of the reverse IMAC of TEV digested TKS - The TEV digested TKS was passed through a reverse IMAC using a 5 ml His Trap FF column to remove the 6x-Histidine tag and TEV protease from the protein samples. The flow through samples and washes of the column were electrophoresed on a 10 well 12% Mini-PROTEAN TGX Stain-Free Precast Gel. Lane 1 contains the protein ladder with the masses labelled to the left, lanes 2 to 4 contain the flow through samples, and lanes 5 to 8 contain the washes of the column with increasing imidazole concentrations from 0 mM to 500 mM.

The SDS-PAGE of the reverse IMAC of the TEV digested TKS in figure 2.3 shows that it was successful in removing the 6x-Histidine tag, TEV protease and the majority of contaminants that were still present from the initial IMAC purification. The three flow through samples and the 0 mM imidazole wash contained large quantities of relatively pure TKS which, with further purification, would be of sufficient purity and homogeneity for crystallography trials. The uncleaved TKS and contaminants can be seen in the 30 mM and both 500 mM imidazole washes, verifying that they had been removed from the TKS sample. The flow through and 0 mM wash samples could now undergo size exclusion chromatography in order to remove any contaminants that were not removed during the previous steps.

[Chromatogram: TKS size exclusion chromatography. x-axis: fraction volume (ml); y-axis: UV absorption at 280 nm (mAU).]

Figure 2.4 UV absorption of the fraction volumes collected from the size exclusion chromatography of TKS - The flow through samples and 0 mM imidazole wash from the reverse IMAC of digested TKS were further purified by gel filtration using a HiLoad Superdex 75 (26/60) column. The peaks in UV absorption at the wavelength 280 nm show which fraction volumes collected from the size exclusion chromatography contained the protein of interest. UV absorption is given in milli-absorbance units (mAU); the volumes of the fractions collected from the column are shown in ml.

The UV absorption profile shown in figure 2.4 identifies which fractions collected from the size exclusion chromatography contain the desired protein. The UV peak of 860 mAU between the fraction volumes 300 ml and 370 ml identifies these fractions as containing the TKS. From this, the fractions labelled C2 to D7 were chosen to have their purity analyzed by SDS-PAGE, in order to confirm they were of suitable purity to be pooled and used in subsequent biochemical and crystallization studies. The small peak at 800 ml may represent a small quantity of contaminants that have been removed from the TKS sample. However, this peak may also have been caused by a low level of TKS protein aggregates.

Figure 2.5 SDS-PAGE of the size exclusion chromatography purification of TKS – The fractions labelled C2 to D7 collected from the HiLoad Superdex 75 (26/60) column that showed signs of containing the desired protein were analyzed by SDS-PAGE to test their purity. The fractions collected from the size exclusion chromatography were electrophoresed on a 10 well 12% Mini-PROTEAN TGX Stain-Free Precast Gel. Lane 1 contains the protein ladder with the masses labelled to the left, lanes 2 to 10 contain the fractions that contain TKS collected from the gel filtration.

The SDS-PAGE of the TKS fractions from the size exclusion chromatography in figure 2.5 confirms that the protein was successfully purified to a level that can be used in crystallography trials as no contaminants can be observed in the fractions. The gel also shows that significant levels of the protein had not been lost during the multiple purification steps. The gel showed the highest quality fractions were C4 to D9, as these contained the highest quantity of protein with no signs of contaminants. These fractions were pooled and concentrated for use in crystallography trials.

8.2 OAC expression and purification

Section 8.2 set out to produce a pure soluble sample of the protein OAC from the cannabinoid biosynthetic pathway by using histidine tagged recombinant protein and an IMAC column. This was necessary to confirm that an E. coli host can produce soluble active protein for biochemical analysis.

Figure 2.6 SDS-PAGE of the IMAC column purification of OAC from BL21 (DE3) - Histidine tagged OAC was expressed in BL21 (DE3) cells and purified using a 5 ml His Trap FF column. SDS-PAGE of the samples was performed using a 15 well 12% Mini-PROTEAN TGX Stain-Free Precast Gel. Lane 1 contains the protein ladder with the masses labelled to the left. Lane 2 contains the lysed cells sample, lane 3 contains the supernatant sample of the centrifuged lysed cells and lane 4 contains the flow through sample of the IMAC column. Lanes 5 to 10 contain the column washes with increasing imidazole concentrations from 0 mM to 50 mM. Lanes 11 to 14 contain the elutions of the column, lanes 11 and 12 are the 200 mM imidazole elutions, 13 and 14 are the 500 mM imidazole elutions.

The SDS-PAGE of the IMAC column purification of OAC expressed in BL21 (DE3) cells in figure 2.6 shows a large quantity of protein in the lysed cells sample at 13 kDa - the mass of OAC with a 6x-Histidine tag - and low levels in the elution fractions. This shows that the majority of the overexpressed protein is forming inclusion bodies and is restricted to the insoluble fraction. This suggests that OAC is a relatively insoluble protein which will require another approach to produce sufficient amounts of soluble protein for biochemical analysis. The new approach may also aid the ability of a heterologous host to produce the active protein in vivo. In order to achieve this, OAC was expressed in ArcticExpress (DE3) competent cells, which contain additional chaperonins that should aid the folding of the protein and reduce its tendency to form inclusion bodies.

Figure 2.7 SDS-PAGE of the IMAC column purification of OAC from ArcticExpress (DE3) - Histidine tagged OAC that was expressed in ArcticExpress (DE3) cells and purified using a 5 ml His Trap FF column was analyzed by SDS-PAGE using a 15 well 12% Mini-PROTEAN TGX Stain-Free Precast Gel. Lane 1 contains the protein ladder with the masses labelled to the left. Lane 2 contains the lysed cells sample, lane 3 the supernatant sample of centrifuged lysed cells and lane 4 the flow through sample of the IMAC column. Lanes 5 to 10 contain the column washes with increasing imidazole concentrations from 0 mM to 50 mM. Lanes 11 to 14 contain the elutions of the column, lanes 11 and 12 are the 200 mM imidazole elutions, 13 and 14 are the 500 mM imidazole elutions.

The SDS-PAGE of the IMAC column purification of histidine tagged OAC expressed in ArcticExpress (DE3) cells in figure 2.7, similarly to the BL21 (DE3) expression, shows a large quantity of protein in the lysed cells sample at 13 kDa. However, no protein is present in the wash or elution fractions. This further suggests that OAC has a high tendency to form insoluble inclusion bodies, which represents a major bottleneck in the biochemical study of this protein. The formation of insoluble inclusion bodies may also be toxic to an E. coli host, representing a potential obstacle to the protein's use in synthetic biology. As changing the strain of E. coli was ineffective, a new technique to produce soluble protein was required. The new approach was the addition of soluble tags to OAC and expression in ArcticExpress (DE3) cells. The three soluble tags added were TRX, GST and NUS. This was achieved by cloning the OAC gene into three new expression vectors, pET32a, pET42a and pET50b respectively.

8.3 Introduction of soluble tags to OAC

Work was done to introduce soluble tags to the OAC protein to overcome the formation of inclusion bodies and produce active soluble forms of the enzyme to be characterized in biochemical studies. This was required as the native OAC proved to be highly insoluble with a tendency to form inclusion bodies. Previous studies by Yang et al., 2015 have shown the soluble tag GST is capable of keeping OAC from forming inclusion bodies during expression. Therefore, the approach of adding this tag was taken in this study. The capability of the two soluble tags TRX and NUS at keeping OAC soluble during overexpression was also tested.

Figure 2.8 1% agarose gel of OAC PCR amplification products - The PCR amplification products of the OAC gene for each expression vector were electrophoresed on a 1% agarose gel with safe view nucleic acid stain. Lane 1 contains the 1 kb DNA ladder, lane 2 contains OAC for pET32a, lane 3 contains OAC for pET42a and lane 4 contains OAC for pET50b.

The 1% agarose gel of the OAC PCR products in figure 2.8 confirms that the gene was successfully amplified for each of the soluble tag expression vectors as bands at 300 base pairs can be observed. These genes were excised and purified from the gel using a QIAquick gel extraction kit for an in-fusion reaction with double digested pET32a, pET42a and pET50b.

Figure 2.9 1% agarose gel of pET32a, pET42a and pET50b double digest products - The double digested pET32a, pET42a and pET50b vectors were electrophoresed on a 1% agarose gel with safe view nucleic acid stain. Lane 1 contains the 1 kb DNA ladder, lane 2 contains digested pET32a, lane 3 contains digested pET42a and lane 4 contains digested pET50b.

The 1% agarose gel in figure 2.9 shows that all three vectors were successfully double digested as the three bands can be observed; they were then excised and purified using a QIAquick gel extraction kit. The amplified genes and double digested vectors were combined via an in-fusion reaction and transformed into Stellar cells. The DNA was extracted from these cultured cells using a QIAprep spin miniprep kit and sequenced to ensure the cloning was successful.

Figure 2.10 SDS-PAGE of the IMAC column purification of TRX tagged OAC - OAC tagged with TRX was expressed in ArcticExpress DE3 cells and purified using a 5 ml His Trap FF column. SDS-PAGE analysis of the samples was performed using a 15 well 12% Mini-PROTEAN TGX Stain-Free Precast Gel. Lane 1 contains the protein ladder with the masses labelled to the left. Lane 2 contains the lysed cells sample, lane 3 the supernatant sample of centrifuged lysed cells and lane 4 the flow through sample of the IMAC column. Lanes 5 to 10 contain the column washes with increasing imidazole concentrations from 0 mM to 50 mM. Lanes 11 to 14 contain the elutions of the column; lanes 11 and 12 are the 200 mM imidazole elutions, 13 and 14 are the 500 mM imidazole elutions.

The SDS-PAGE of the IMAC column purification of TRX tagged OAC in figure 2.10 shows a relatively large quantity of protein at 24 kDa - the combined mass of TRX and OAC – in the lysed cells sample, but not in any other fractions. This suggests that the TRX tag is not capable of keeping OAC soluble during protein over-expression and purification, though a more effective soluble tag may be able to. This shows that OAC is a highly insoluble protein that has a high propensity to form inclusion bodies.

Figure 2.11 SDS-PAGE of the IMAC column purification of GST tagged OAC - OAC tagged with GST was expressed in ArcticExpress DE3 cells and purified using a 5 ml His Trap FF column. SDS-PAGE was used to analyze the samples from the purification. The samples were electrophoresed on a 15 well 12% Mini-PROTEAN TGX Stain-Free Precast Gel. Lane 1 contains the protein ladder with the masses labelled to the left. Lane 2 contains the lysed cells sample, lane 3 the supernatant sample of centrifuged lysed cells and lane 4 the flow through sample of the IMAC column. Lanes 5 to 10 contain the column washes with increasing imidazole concentrations from 0 mM to 50 mM. Lanes 11 to 14 contain the elutions of the column; lanes 11 and 12 are the 200 mM imidazole elutions, 13 and 14 are the 500 mM imidazole elutions.

The SDS-PAGE of the IMAC purification of GST tagged OAC in figure 2.11 shows that a large amount of protein with a mass of 38 kDa - the combined mass of OAC and GST – is present in the 200 mM and 500 mM imidazole elution fractions. This shows that GST was capable of stopping OAC from forming inclusion bodies and produced soluble protein in a sufficient quantity to be used in further studies. If the protein is still active with the GST tag attached, this technique represents a viable approach to stopping inclusion bodies from forming in vivo, thereby increasing the amount of active OAC.

Figure 2.12 SDS-PAGE of the IMAC column purification of NUS tagged OAC - OAC tagged with NUS was expressed in ArcticExpress DE3 cells and purified using 5 ml His Trap FF column. SDS-PAGE was used to analyze the samples using a 15 well 12% Mini-PROTEAN TGX Stain-Free Precast Gel. Lane 1 contains the protein ladder with the masses labelled to the left. Lane 2 contains the lysed cells sample, lane 3 the supernatant sample of centrifuged lysed cells and lane 4 the flow through sample of the IMAC column. Lanes 5 to 10 contain the column washes with increasing imidazole concentrations from 0 mM to 50 mM. Lanes 11 to 14 contain the elutions of the column; lanes 11 and 12 are the 200 mM imidazole elutions, 13 and 14 are the 500 mM imidazole elutions.

The SDS-PAGE of the IMAC column purification of the NUS tagged OAC in figure 2.12 shows a large quantity of protein at around 67 kDa - the combined mass of NUS and OAC – in the lysed cells sample, while a much smaller amount is found in one of the 200 mM and one of the 500 mM imidazole elutions. This suggests that the NUS tag is only partially capable of keeping OAC soluble during overexpression and purification. Again this shows that OAC is a highly insoluble protein whose solubility is not rescued by every soluble tag.

The GST tag was shown to be the most successful tag at improving the solubility of OAC. Therefore, the GST tagged OAC was used in the further studies of the protein. The GST tag required cleavage in order to test whether OAC is able to remain soluble once the tag is removed. The GST tag was removed by TEV protease digestion, and a sample of tagged OAC was left uncleaved in order to test the relative activity of native and GST tagged OAC. If the tagged OAC retains its activity in vitro, this further validates tagging as a technique that can be used in vivo to increase the solubility of the protein.

Figure 2.13 SDS-PAGE of the reverse IMAC of TEV digested GST tagged OAC - The TEV digested GST tagged OAC was passed through a reverse IMAC column using a 5 ml His Trap FF column to remove the GST tag, 6x-Histidine tag and TEV protease from the protein samples. The flow through samples and washes of the column were electrophoresed on a 10 well 12% Mini-PROTEAN TGX Stain-Free Precast Gel. Lane 1 contains the protein ladder with the masses labelled to the left, lanes 2 to 4 contain the flow through samples, and lanes 5 to 8 contain the washes of the column with increasing imidazole concentrations from 0 mM to 500 mM.

The SDS-PAGE of the reverse IMAC of the TEV digested OAC tagged with GST in figure 2.13 shows that it was successful in removing the majority of the GST and 6x-Histidine tags; however, significant levels of contaminants remained. There was also a significant level of OAC still bound to the GST tag, particularly in the third flow through. Due to this, gel filtration was conducted on the flow through and 0 mM imidazole wash to ensure a sufficient purity to conduct biochemical studies.

[Chromatogram: OAC size exclusion chromatography. x-axis: fraction volume (ml); y-axis: UV absorption at 280 nm (mAU).]

Figure 2.14 UV absorption of the fraction volumes collected from the size exclusion chromatography of OAC - The peaks in UV absorption at the wavelength 280 nm show where the desired protein is present in the fraction volumes collected from the HiLoad Superdex 75 (26/60) column. UV absorption is given in milli-absorbance units (mAU); the elution volumes of the fractions collected from the column are shown in ml.

The UV absorption profile shown in figure 2.14 enables the identification of which fraction volumes collected from the size exclusion chromatography contain the desired protein. The UV peak at 190 mAU between the fraction volumes of 170 ml and 210 ml identifies these fractions as containing the OAC. From this the fractions labelled E1 to G2 were chosen to be analyzed by SDS-PAGE to confirm they were of suitable purity for pooling and use in biochemical studies. There are a number of other smaller peaks at 120 ml, 150 ml, 300 ml and 350 ml. These peaks represent the numerous contaminants that had previously been observed on the SDS-PAGE of the reverse IMAC column. These peaks suggest that the majority of contaminants had been removed from the sample, and an SDS-PAGE would confirm this. One of these peaks may represent a low level of OAC aggregates.

Figure 2.15 SDS-PAGE of the size exclusion chromatography purification of OAC - The gel filtration of the flow through samples and 0 mM imidazole wash from the reverse IMAC of digested OAC using a HiLoad Superdex 75 (26/60) column was conducted to remove the contaminants. The fractions collected from the size exclusion chromatography were electrophoresed on a 15 well 12% Mini-PROTEAN TGX Stain-Free Precast Gel SDS-PAGE. Lane 1 contains the protein ladder with the masses labelled to the left, lanes 2 to 10 contain the fractions that contained OAC.

The SDS-PAGE of the OAC size exclusion chromatography fractions in figure 2.15 confirms that the contaminants previously observed in the OAC sample had been successfully removed from the chosen fractions. The fractions E3 to G1 were pooled for use in biochemical studies. These samples were of suitable purity to test the catalytic activity of OAC without the risk of contaminant proteins interfering or altering the reactions.

8.4 Point mutations of TKS

Section 8.4 set out to introduce point mutations to residues in the active site of TKS in the attempt to produce variants that exhibit a greater efficiency in production of the desired linear polyketide intermediate than the wild-type protein. If more efficient variants were designed then they could be introduced into the pathway in place of the wild-type to increase the pathway’s synthetic biology potential.

Figure 2.16 TKS mutations PCR products – The TKS mutation PCR products were electrophoresed on a 1% agarose gel with safe view nucleic acid stain. For the gel on the left, lane 1 contains the 1 kb DNA ladder, lane 2 contains the control TKS plasmid, lane 3 contains TKS S126A, lane 4 contains TKS M130A, lane 5 contains TKS D185A and lane 6 contains TKS M187A. For the gel on the right, lane 1 contains the 1 kb DNA ladder, lane 2 contains TKS I248A, lane 3 contains TKS L257A, lane 4 contains TKS F259A, lane 5 contains TKS L261A, lane 6 contains TKS H297A, lane 7 contains TKS N330A and lane 8 contains TKS S332A.

The agarose gels in figure 2.16 show that the PCR reactions to introduce the mutations to the TKS plasmid were successful. Distinct bands can be seen at the same length as the control TKS plasmid; for TKS S126A and TKS L257A the bands are very faint but still present. These successful products were transformed into Stellar cells and the DNA was extracted using a QIAprep spin miniprep kit. The extracted DNA was sequenced to confirm that the mutations were present. These mutated plasmids were then used to transform ArcticExpress (DE3) cells to express and purify the variant proteins.

[Gel images, one panel per variant: TKS S126A, TKS M130A, TKS D185A, TKS M187A, TKS I248A, TKS L257A, TKS F259A, TKS L261A, TKS H297A, TKS N330A and TKS S332A.]

Figure 2.17 SDS-PAGE of the IMAC column purification of the TKS variants - The TKS variants were all expressed in ArcticExpress (DE3) cells and purified using a 5 ml His Trap FF column. The solubility of the TKS variants was tested by SDS-PAGE. The mutants S126A, M130A, D185A, M187A, H297A, N330A and S332A were electrophoresed on a 15 well 12% Mini-PROTEAN TGX Stain-Free Precast Gel. Lane 1 contains the protein ladder with the masses labelled to the left. Lane 2 contains the lysed cells sample, lane 3 the supernatant sample of centrifuged lysed cells and lane 4 the flow through sample of the IMAC column. Lanes 5 to 10 contain the column washes with increasing imidazole concentrations from 0 mM to 50 mM. Lanes 11 to 14 contain the elutions of the column; lanes 11 and 12 are the 200 mM imidazole elutions, 13 and 14 are the 500 mM imidazole elutions. The mutants I248A, L257A, F259A and L261A were electrophoresed on multiple 10 well 12% Mini-PROTEAN TGX Stain-Free Precast Gels, each has a protein ladder in lane 1, then follows the same sample order left to right as the previous stated TKS variant gels with an additional lane containing a protein ladder. For L257A and L261A the 10 mM imidazole washes could not be loaded.

From figure 2.17 it can be observed that the S126A, M130A, I248A, L257A and S332A variants of TKS do not show any soluble protein in the supernatants, flow through, washes or elution fractions of the IMAC purifications. This suggests that these mutations substantially reduce the solubility of the protein. In the case of Met130, this supports the notion that this residue is important in dimerization and shaping the adjacent active site. Because these mutations were detrimental to the protein’s solubility, the variants could not be tested for any biochemical activity. The D185A, M187A, F259A, L261A, H297A and N330A TKS variants showed some soluble protein being produced. D185A, M187A and L261A show the highest level of soluble protein produced; however, it is still significantly lower than for the purified wild-type TKS, which suggests these mutations cause a reduction in the solubility of the protein. During storage, the variants D185A and M187A precipitated out of solution, again suggesting that these mutations severely reduced the stability of TKS. The F259A, H297A and N330A variants showed minimal amounts of soluble protein in the washes or elution fractions, giving further indication that the mutation of residues in the active site of TKS reduces its ability to fold correctly and therefore its solubility. Despite the low yield of soluble protein, the variants F259A, L261A, H297A and N330A had their activities tested by incubation with their substrates – hexanoyl-CoA and malonyl-CoA - and their products were analysed by liquid chromatography–mass spectrometry (LC/MS). The TKS variants were tested individually and in tandem with OAC.

8.5 Liquid chromatography mass spectrometry of biotransformations

Section 8.5 set out to confirm that the recombinant proteins TKS and OAC exhibit the same activity as the native proteins that have previously been extracted from the plant species C. sativa and characterized. LC/MS was conducted to identify the product profile of the recombinant proteins and characterize their catalytic activity. Confirming the proteins’ activity was important to establish that an E. coli host is able to express active forms of the enzymes, which is significant for the introduction of the pathway into the microbial chassis for use in synthetic biology.

Figure 3.1 LC/MS of olivetol, HTAL and PDAL - The available standards of the products from the TKS mediated reaction were analysed by a 1290 Infinity II UHPLC (Agilent) coupled with a 6560 Ion mobility QTOF LC/MS (Agilent). The x-axis represents the HPLC retention time and the y-axis represents ion intensity on the mass spectrometry detector.

The LC/MS analysis of the controls olivetol, PDAL and HTAL in figure 3.1 allows their accurate identification during the characterization of the product profile of TKS. The dark green peak of olivetol shows an acquisition time of 5.5 minutes, the blue peak of PDAL shows an acquisition time of 5 minutes, and the light green peak of HTAL shows an acquisition time of 4.8 minutes. When a molecule in a sample has the same mass and the same acquisition time as one of these controls, it can be identified as that product.
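The identification rule described above, a match in both mass and acquisition time, can be illustrated with a minimal sketch. The acquisition times are those quoted for figure 3.1. The masses are monoisotopic values calculated from assumed molecular formulae (C11H16O2 for olivetol, C10H14O3 for PDAL and C12H16O4 for HTAL) rather than values reported in this study, and the tolerances `mass_tol` and `rt_tol` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    mass: float           # neutral monoisotopic mass in Da
    retention_min: float  # acquisition (retention) time in minutes


# Acquisition times are taken from the text for figure 3.1. The masses are
# ASSUMED values calculated from molecular formulae, not measured data.
STANDARDS = {
    "olivetol": Peak(180.1150, 5.5),
    "PDAL": Peak(182.0943, 5.0),
    "HTAL": Peak(224.1049, 4.8),
}


def identify(peak, standards=STANDARDS, mass_tol=0.01, rt_tol=0.1):
    """Return the standards that match the peak in BOTH mass and time.

    mass_tol (Da) and rt_tol (minutes) are hypothetical tolerances.
    """
    return [
        name
        for name, std in standards.items()
        if abs(peak.mass - std.mass) <= mass_tol
        and abs(peak.retention_min - std.retention_min) <= rt_tol
    ]
```

Under these assumptions, `identify(Peak(224.1049, 4.8))` returns `["HTAL"]`, whereas a peak of the same mass at 6.2 minutes returns an empty list, because a mass match alone is not sufficient for identification.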

Figure 3.2 LC/MS of controls - The organic layers of the negative controls (hexanoyl-CoA and malonyl-CoA without either protein; TKS and OAC without hexanoyl-CoA and malonyl-CoA; and OAC with hexanoyl-CoA and malonyl-CoA) were analysed by a 1290 Infinity II UHPLC coupled with a 6560 Ion mobility QTOF LC/MS. The x-axis represents the HPLC retention time and the y-axis represents ion intensity on the mass spectrometry detector.

The LC/MS analysis of the organic layers of the negative controls (hexanoyl-CoA and malonyl-CoA without protein; TKS and OAC without hexanoyl-CoA and malonyl-CoA; and OAC with hexanoyl-CoA and malonyl-CoA) in figure 3.2 showed no significant peaks; therefore, no intermediates or products of the pathway were present. This confirms that any products identified in the biotransformation reactions are due to the enzymes’ activity.

Figure 3.3 LC/MS of TKS incubated with hexanoyl-CoA and malonyl-CoA - The organic layer was extracted from the TKS incubated with hexanoyl-CoA and malonyl-CoA and analysed by a 1290 Infinity II UHPLC coupled with a 6560 Ion mobility QTOF LC/MS. The x-axis represents the HPLC retention time and the y-axis represents ion intensity on the mass spectrometry detector. The labelled peaks are PDAL, HTAL, olivetol, a PDAL isomer and an HTAL isomer.

The LC/MS analysis of the organic layer from the reaction of TKS incubated with hexanoyl-CoA and malonyl-CoA in figure 3.3 produced three distinct peaks that represent the three by-products of the reaction. The green peak representing olivetol had an acquisition time of 5.5 minutes, the blue peak representing PDAL had an acquisition time of 5 minutes and the purple peak representing HTAL had an acquisition time of 4.8 minutes. These products have the same masses and acquisition times as the controls, which confirms their identity. This shows that the TKS is active and is able to produce all three by-products; this product profile has previously been shown by Gagne et al., 2012. The CoA linked linear polyketide intermediate could not be identified due to the lack of a protocol to identify CoA linked molecules, although it is also likely that the majority of this intermediate is channeled into the by-products when OAC is not present in the reaction. These results confirm that an E. coli host can produce active TKS, which is important to consider when introducing the cannabinoid biosynthetic pathway into a heterologous host.

Figure 3.4 LC/MS of TKS incubated with OAC, hexanoyl-CoA and malonyl-CoA - The organic layer extracted from the TKS incubated with OAC, hexanoyl-CoA and malonyl-CoA was analysed by a 1290 Infinity II UHPLC coupled with a 6560 Ion mobility QTOF LC/MS. The x-axis represents the HPLC retention time and the y-axis represents ion intensity on the mass spectrometry detector. The labelled peaks are PDAL, HTAL, a product with the mass of olivetolic acid, and HTAL and PDAL isomers.

The LC/MS analysis of the organic layer from the reaction of TKS incubated with OAC, hexanoyl-CoA and malonyl-CoA in figure 3.4 produced three distinct peaks that represent two of the by-products and potentially the desired product of the reaction, olivetolic acid. The red peak with an acquisition time of 5 minutes represents PDAL and the green peak with an acquisition time of 4.8 minutes represents HTAL. These two peaks confirm that these two by-products are present. The other green peak represents a product that has the same mass as the desired olivetolic acid; however, no authentic standard was available against which to compare its acquisition time. It can therefore be assumed that both the TKS and OAC are active, though the possibility that the product is an isomer of olivetolic acid must be considered. The peaks for HTAL and olivetolic acid are the same colour because they have the same mass; however, there is a significant difference in their acquisition times due to the difference in the molecules’ structures. A significant peak is no longer seen for olivetol, suggesting that olivetol and olivetolic acid may derive from the same intermediate. This product profile, especially the drop in olivetol production, has previously been shown by Gagne et al., 2012.
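The observation that HTAL and olivetolic acid share a mass can be checked with a short calculation. The sketch below computes monoisotopic masses from molecular formulae. The formulae used (C12H16O4 for both HTAL and olivetolic acid, C11H16O2 for olivetol) are assumptions drawn from general chemical knowledge and are not stated in this thesis, and the function name `monoisotopic_mass` is illustrative only.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}


def monoisotopic_mass(formula):
    """Sum isotope masses for a simple formula such as 'C12H16O4'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


# ASSUMED formulae, not taken from the thesis.
FORMULAE = {
    "olivetol": "C11H16O2",
    "HTAL": "C12H16O4",
    "olivetolic acid": "C12H16O4",
}

masses = {name: monoisotopic_mass(f) for name, f in FORMULAE.items()}
```

Because the two compounds are assumed to have identical formulae, the calculation gives them identical masses, so the mass detector alone cannot separate them; only the acquisition time distinguishes the two peaks.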

Figure 3.5 Comparison of the LC/MS analysis of OAC and OAC tagged with GST - The organic layers were extracted from TKS incubated with OAC, hexanoyl-CoA and malonyl-CoA, and from TKS incubated with OAC tagged with GST, hexanoyl-CoA and malonyl-CoA. Products were analysed by a 1290 Infinity II UHPLC coupled with a 6560 Ion mobility QTOF LC/MS. The x-axis represents the HPLC retention time and the y-axis represents ion intensity on the mass spectrometry detector. The two panels show native OAC with TKS and GST tagged OAC with TKS; the labelled peaks in each are PDAL, HTAL and a product with the mass of olivetolic acid.

Both the GST tagged OAC and the native OAC show the same activity when compared by LC/MS, as shown in figure 3.5. The tagged OAC retains the peaks at acquisition times of 5.68 minutes, 5.85 minutes and 6.15 minutes, which represent HTAL, PDAL and olivetolic acid, respectively. The acquisition times differ from those of the standards because these samples were analyzed during a separate run on the LC/MS; however, the similarity between the two traces is sufficient to confirm that the tagged and native OAC maintained the same activity. That GST tagged OAC maintains the same product profile as native OAC suggests that the tag does not significantly interfere with or remove the enzyme’s activity. This has significant value when considering the use of the cannabinoid biosynthetic pathway in synthetic biology, as the addition of a GST tag provides a means of producing soluble and active OAC in an E. coli host that can work as part of the pathway in vivo.


Figure 3.6 Overlay of LC/MS traces of the soluble TKS variants incubated with hexanoyl-CoA, malonyl-CoA and OAC - The TKS variants that yielded soluble protein had their activities tested. These variants were F259A, L261A, H297A and N330A. The organic layers extracted from the TKS variants incubated with hexanoyl-CoA and malonyl-CoA, and from the TKS variants incubated with OAC, hexanoyl-CoA and malonyl-CoA, were analysed by a 1290 Infinity II UHPLC coupled with a 6560 Ion mobility QTOF LC/MS. The x-axis represents the HPLC retention time and the y-axis represents ion intensity on the mass spectrometry detector.

The overlay of the traces from the LC/MS analysis of the natural product phase of the soluble TKS variants (F259A, L261A, H297A and N330A) incubated with hexanoyl-CoA and malonyl-CoA, and of the TKS variants incubated with OAC, hexanoyl-CoA and malonyl-CoA, in figure 3.6 showed no significant peaks. This suggests that all of the mutations that were tested removed the catalytic ability of TKS, both in producing the desired linear polyketide intermediate utilized by OAC and in producing the by-products. For the residues His297 and Asn330 this confirms that they are essential to the catalytic activity of the TKS. Though the other residues mutated were not those directly involved in the reaction, they may have been critical to the stereochemistry of the active site or may have affected the dimerization that is necessary for activity. Thus, the variants were not suitable for introduction to the biosynthetic pathway, because the mutations were detrimental to the structure of the active site or of the overall protein, and therefore to substrate binding and catalytic activity.

8.6 TKS structure

The studies in section 8.6 set out to crystallize the protein TKS to allow determination of its apo structure by X ray crystallography. Cocrystallization of TKS with hexanoyl-CoA and malonyl-CoA to determine the structure of substrate bound TKS was also an objective of this section. The determination of the structure of TKS allows the mechanism behind its activity to be characterized and enables the design of variants of this protein in an attempt to improve its efficiency.

Figure 4.1 Images of the crystals that were used for X ray diffraction studies - Images of the crystals used for X ray crystallography and structural determination are shown, indicating the apo form and the proteins co-crystallised with malonyl-CoA and hexanoyl-CoA substrates. The panels are labelled Apo, Malonyl-CoA and Hexanoyl-CoA. The scale bar shown represents 100 x 100 µm.

The crystals shown in figure 4.1 are those that were used to solve the apo, malonyl-CoA bound and hexanoyl-CoA bound structures of TKS.

Figure 4.2 Ribbon representation of the apo structure of the homodimer TKS determined to a resolution of 1.2Å - Monomer A is shown in blue and monomer B is shown in orange. The entrance to the active site, the N-terminus and the C-terminus are labelled for each monomer.

Figure 4.3 Ribbon representation of the structure of the homodimer of TKS superimposed on the homodimer chalcone synthase (CHS) from Freesia hybrida - The superimposed structures show a RMSD of 0.69Å, when 389 residues are superimposed. TKS is blue and the CHS is purple. The CHS from Freesia hybrida was determined by Sun et al., 2015, figure adapted from PDB 4WUM.

The structure in figure 4.2 shows that TKS is present as a homodimer; the entrance to the buried active site, the N-terminus and the C-terminus are labelled. TKS shows a very high level of structural similarity with the CHS from Freesia hybrida (F. hybrida), as shown by the low root mean square deviation (RMSD) of 0.69Å when 389 residues are superimposed. These overlaid structures are shown in figure 4.3. In both structures each monomer shows a five layered αβαβα structure, which is typical of a plant type III PKS. Both structures show the same positions for the active site entrance, N-terminus and C-terminus. However, in the CHS from F. hybrida an extra α helix is seen at the N-terminus. This difference may simply be due to the TKS structure not currently being solved to the first residue. The additional α helix has been observed in a number of other plant type III PKSs that show structural similarities to TKS; therefore, it can be hypothesized that TKS may also contain this additional α helix. The CHS from F. hybrida has been shown to contain the conserved active site triad of cysteine, asparagine and histidine, and exhibits the catalytic activity of iterative condensation of a starter CoA with an acetyl-CoA produced from the decarboxylation of malonyl-CoA. The CHS from F. hybrida also produces two derailment products alongside its major one. Due to the similarity in structure, sequence identity and tendency to produce derailment by-products shown between the CHS and TKS, it can be hypothesized that their catalytic activity is due to a similar reaction mechanism (Sun et al., 2015).
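For reference, the RMSD quoted above is the square root of the mean squared distance between matched atom positions once the two structures have been superimposed. A minimal sketch of that final calculation is given below. It assumes the coordinates have already been paired residue by residue and optimally superimposed by the structure comparison software, a step that is not shown, and the function name `rmsd` is illustrative only.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD in Å between two equal-length lists of (x, y, z) coordinates.

    Assumes the two structures are already superimposed and that the
    i-th atom in coords_a is matched to the i-th atom in coords_b.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```

In this sketch, an RMSD of 0.69Å over 389 residues would correspond to passing 389 matched atom positions (for example one Cα atom per residue, which is an assumption about how the residues were compared) from each structure.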


Figure 4.4 Ribbon representation of the apo structure of the homodimer TKS determined to a resolution of 1.2Å - Monomer A is shown in blue and monomer B is shown in orange. The buried active site catalytic residues Cys157, Asn330 and His297 are shown as cylinder structure and coloured by atom type; red for oxygen, green for carbon, blue for nitrogen and yellow for sulphur.

Figure 4.5 Ribbon representation of TKS active site - The catalytic Cys157, Asn330 and His297 shown as cylinder structure and coloured by atom type; red for oxygen, green for carbon, blue for nitrogen and yellow for sulphur.


Figure 4.6 Ribbon representation of the active site of TKS overlaid on the active site of the benzalacetone synthase L208F variant from Rheum palmatum - The superimposed structures show a RMSD of 0.70Å when 376 residues are superimposed. TKS is blue and the benzalacetone synthase is green with the catalytic residues represented by cylinder structure. The benzalacetone synthase L208F variant from Rheum palmatum was reported in Morita et al., 2010, figure adapted from PDB 3A5S.

The position of the buried active site of each monomer of TKS is shown in figure 4.4. The catalytic residues are shown as cylinder structures and coloured by atom type; red for oxygen, green for carbon, blue for nitrogen and yellow for sulphur. The active site of TKS in figure 4.5 shows the conserved catalytic triad of cysteine, asparagine and histidine at the residue positions 157, 330 and 297, respectively. The presence of these residues suggests that the activity of TKS is due to the previously described plant type III PKS mechanism. The superimposed active sites of TKS and the benzalacetone synthase L208F mutant in figure 4.6 show that the residues cysteine, asparagine and histidine are present in the same position in both type III PKSs; they are also conserved in families such as the CHSs. The wild-type benzalacetone synthase has been crystallized with its monoketide intermediate covalently bound to the sulphur of its Cys157, which gave direct evidence that the activity of plant type III PKSs is due to the cysteine acting as a nucleophile to attack the starter CoA and subsequently acting as the attachment point of the polyketide intermediate. As the benzalacetone synthase acts by this method and the active site is conserved within the TKS, it can be confirmed that its catalytic activity is due to the triad of residues, in particular the cysteine. The detrimental effects of the mutations of Asn330 and His297 on the activity of TKS further support this conclusion. The wild-type form of benzalacetone synthase only performs a one-step decarboxylative condensation, rather than the multiple reactions that are normally observed in plant type III PKSs. In the L208F mutant of benzalacetone synthase the typical iterative condensation activity of a PKS is restored. This would suggest that the two phenylalanine gatekeepers to the active site, particularly the one at residue position 208, play a role in controlling the repetition of the condensation reaction (Morita et al., 2010).


Figure 4.7 Ribbon representation of TKS with phenylalanine gatekeepers to active site shown - The gatekeepers to the active site Phe208 and Phe259 are shown as cylinder structures and coloured by atom type; red for oxygen, green for carbon and blue for nitrogen. Their position in each monomer is shown on the left and their orientation is shown on the right.

Figure 4.8 Ribbon representation of the active site of TKS overlaid on the active site of chalcone synthase (CHS) from Oryza sativa - The superimposed structures show a RMSD of 0.71Å, when 381 residues are superimposed. TKS is blue and the CHS is pink with the phenylalanine gatekeeper residues represented by cylinder structure; Phe268 in the CHS is present in a dual conformation. CHS from Oryza sativa was determined in Go et al., 2015. The figure is adapted from PDB 4YJY.


The phenylalanine gatekeeper residues to the active site are often conserved in Type III plant PKSs, particularly in the CHS family. The residues Phe208 and Phe259 fulfill this role in TKS and their position and orientation are shown in figure 4.7. The position of these residues is shown to be conserved by the overlay with the CHS from Oryza sativa (O. sativa) in figure 4.8. The conservation of these residues suggests that they are important to the function of the protein, particularly, as mentioned before, in the mediation of the iterative condensation reactions. The function of the Phe208 and Phe259 in the CHS from O. sativa has previously been investigated by mutating these residues (Go et al., 2015). The results of these variants showed no alteration to the product profile of the enzyme, suggesting these residues do not play an essential role in the catalytic activity of plant Type III PKSs, but in specific examples they can impact the number of repetitive condensation reactions that occur.


Figure 4.9 Ribbon representation of the side of TKS that interacts with adjoining monomer - The methionine that interacts with the adjoining monomer is shown as cylinder structure and coloured by atom type; red for oxygen, green for carbon and blue for nitrogen. Their position in each monomer is shown on the left and their orientation shown on the right.

Figure 4.10 Ribbon representation of the active site of TKS overlaid with the active site of chalcone synthase (CHS) from Oryza sativa - The superimposed structures show a RMSD of 0.71Å when 376 residues are superimposed. TKS is blue and the CHS is pink with the methionine at residue positions 130 and 140, represented by cylinder structure. The CHS structure from Oryza sativa was determined in Go et al., 2015. Figure was adapted from PDB 4YJY.

The methionine residues are shown to be conserved on the loop that interacts between the two monomers of the PKSs. The position of the Met130 in each monomer of TKS and their orientation is shown in figure 4.9. The conserved position and orientation of these methionine residues is shown in the overlay of TKS with the CHS from O. sativa in figure 4.10. This, coupled with the detrimental effect of the Met130 mutation, further supports the notion that this residue is important in the TKS for dimerization and for shaping the active site of the adjoining monomer.


Figure 4.11 Ribbon representation of TKS monomer A bound to malonyl-CoA and hexanoyl-CoA - TKS with bound malonyl-CoA is shown in the blue diagram and TKS with bound hexanoyl-CoA is shown in the red diagram. The malonyl-CoA and hexanoyl-CoA are shown as ball and stick structures with atoms coloured; carbons are green, nitrogens are blue, oxygens are red and phosphorus is in pink.

Figure 4.12 The electrostatic potential representation of the TKS active site entrance with bound malonyl-CoA and hexanoyl-CoA - Neutral residues are shown in grey, negative residues are shown in red and positive residues are shown in blue. The bound malonyl-CoA and hexanoyl-CoA are shown as ball and stick structures with atoms coloured; carbons are green, nitrogens are blue, oxygens are red and phosphorus is in pink. The malonyl-CoA is shown on the left and hexanoyl-CoA is shown on the right.

Figure 4.11 confirms that malonyl-CoA and hexanoyl-CoA enter the TKS through the previously shown active site entrance to reach the buried active site. The CoA portion of these substrates protrudes from the protein and interacts with residues close to the surface. The electrostatic potential representations shown in figure 4.12 confirm the presence of a CoA binding tunnel in TKS that the malonyl-CoA and hexanoyl-CoA move through in order to reach the active site residues. Again this is a typical feature of a plant type III PKS, which is further evidence that the TKS from C. sativa is a typical member of this family and acts through the conserved mechanism.

Figure 4.13 Ribbon representation of the apo structure of TKS superimposed onto the malonyl-CoA bound and hexanoyl-CoA bound TKS structures - The apo TKS structure is ice blue, the malonyl-CoA bound TKS is bright blue and the hexanoyl-CoA bound structure is red. The malonyl-CoA bound TKS showed a RMSD of 0.39Å and the hexanoyl-CoA bound form showed a RMSD of 0.34Å.

Figure 4.14 Ribbon representation overlay of the loop where most significant movement occurs during TKS substrate binding – The apo TKS is coloured ice blue, malonyl-CoA is coloured blue and hexanoyl-CoA is coloured red. The structures of both malonyl-CoA and hexanoyl-CoA bound TKS are superimposed onto the structure of apo TKS. The substrates malonyl-CoA and hexanoyl-CoA are shown as ball and stick structures with their atoms coloured; carbons are green, oxygens are red, nitrogens are blue and phosphorus is in pink.


The small movement that occurs during substrate binding can be observed in figure 4.13. The superimposed structures of malonyl-CoA and hexanoyl-CoA bound TKS onto apo TKS exhibited a RMSD of 0.39Å and 0.34Å, respectively. The most significant movement occurs at the loop position shown in figure 4.14. The level of movement can be observed in this figure as the apo and substrate bound TKS structures are superimposed. Although this is the most significant movement that occurs upon substrate binding, it is still relatively minor. The loop that undergoes this movement is positioned close to the entrance of the active site, suggesting that this conformational change may be attributed to the binding of the substrates to surface residues.


Figure 4.15 Ribbon structure of the active site of TKS bound to malonyl-CoA - The residues Lys301, Lys55, Leu261, Lys263, Ala302 and Gly299 that interact with the malonyl-CoA molecule are labelled and the solvent molecules that are present in the active site are shown as red spheres. Hydrogen bonds are represented by dashed lines between atoms. The oxygen, sulphur, and nitrogen atoms of the amino acid residues are coloured red, yellow and blue, respectively. The malonyl-CoA is shown as ball and stick structure with atoms coloured; carbons are green, nitrogens are blue, oxygens are red and phosphorus is in pink.

Figure 4.15 shows that two phosphate groups of the diphosphate region of the malonyl-CoA form hydrogen bonds with the residues Lys301 and Lys55 at the entrance to the CoA binding tunnel. The residues Lys263 and Leu261 that are present on the surface of TKS also form hydrogen bonds with the phosphoadenosine region of the malonyl-CoA. Lys263 forms a hydrogen bond with the hydroxyl group and Leu261 forms a hydrogen bond with the amine group. Moving into the entrance of the CoA binding tunnel, the residue Ala302 forms a hydrogen bond with the oxygen of the pantoic acid region of the malonyl-CoA. Within the CoA binding tunnel Gly299 forms a hydrogen bond with the amine group of the cysteamine region of the malonyl-CoA. A number of solvent molecules that are present in the active site are also shown to form hydrogen bonds with the substrate. Previously determined plant PKS structures show a similar pattern of hydrogen bonds, formed with lysine residues at the entrance to the active site and with an alanine residue further into the CoA tunnel. The majority of other interactions have been attributed to van der Waals contacts (Ferrer et al., 1999).


Figure 4.16 Ribbon structure of the active site of TKS bound to hexanoyl-CoA - The residues Lys301, Leu261 and Gly299 that interact with the hexanoyl-CoA molecule are labelled and the solvent molecules that are present in the active site are shown as red spheres. Hydrogen bonds are represented by dashed lines between atoms. The oxygen, sulphur, and nitrogen atoms of the amino acid residues are coloured red, yellow and blue, respectively. The hexanoyl-CoA is shown as ball and stick structure with atoms coloured; carbons are green, nitrogens are blue, oxygens are red and phosphorus is in pink.

As seen in figure 4.16 the hydrogen bonds that TKS forms with hexanoyl-CoA and malonyl-CoA are very similar, although there is a difference in the number of bonds made with each substrate. On the surface of TKS, both Lys301 and Leu261 form hydrogen bonds at the same atom positions with hexanoyl-CoA as they do with malonyl-CoA. The Lys55 and Lys263, which form hydrogen bonds with malonyl-CoA, do not form bonds with hexanoyl-CoA. Moving towards the active site, only Gly299 forms a hydrogen bond with hexanoyl-CoA, whereas malonyl-CoA forms the extra bond with Ala302. Similarly to malonyl-CoA, hexanoyl-CoA forms hydrogen bonds with solvent molecules that are present in the active site.
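The comparison of the two substrate complexes can be summarized as a simple set comparison. The sketch below uses only the hydrogen bonding residues listed in the captions of figures 4.15 and 4.16; the variable and function names are illustrative and solvent mediated bonds are not included.

```python
# Residues reported to hydrogen bond with each substrate
# (from the captions of figures 4.15 and 4.16).
MALONYL_COA_CONTACTS = frozenset(
    {"Lys55", "Leu261", "Lys263", "Gly299", "Lys301", "Ala302"}
)
HEXANOYL_COA_CONTACTS = frozenset({"Leu261", "Gly299", "Lys301"})


def residue_number(name):
    """Extract the sequence position from a label such as 'Lys301'."""
    return int(name[3:])


# Residues that bond to both substrates, and those unique to each.
shared = sorted(MALONYL_COA_CONTACTS & HEXANOYL_COA_CONTACTS, key=residue_number)
malonyl_only = sorted(MALONYL_COA_CONTACTS - HEXANOYL_COA_CONTACTS, key=residue_number)
hexanoyl_only = sorted(HEXANOYL_COA_CONTACTS - MALONYL_COA_CONTACTS, key=residue_number)
```

With these lists, every residue that bonds to hexanoyl-CoA also bonds to malonyl-CoA, while Lys55, Lys263 and Ala302 are specific to the malonyl-CoA complex.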


Figure 4.17 Active site of TKS with bound malonyl-CoA and hexanoyl-CoA - The active site of TKS with bound malonyl-CoA is shown in blue and TKS with bound hexanoyl-CoA is shown in red. The oxygen, sulphur and nitrogen atoms of the catalytic residues are coloured red, yellow and blue respectively. The malonyl-CoA and hexanoyl-CoA are shown as ball and stick structures with atoms coloured; carbons are green, nitrogens are blue, oxygens are red and phosphorus is in pink. The density of malonyl-CoA and hexanoyl-CoA is only continuous until their respective sulphur atoms, thus, they have only been modeled up to this point.

Figure 4.18 The superimposed active site of apo TKS with malonyl-CoA and hexanoyl-CoA - The apo TKS is coloured ice blue, TKS with bound malonyl-CoA is coloured blue and TKS with bound hexanoyl-CoA is coloured red. The catalytic residues and phenylalanine gatekeeper residues are present as cylinder structures.

Figure 4.17 shows that both malonyl-CoA and hexanoyl-CoA enter the enzyme via the previously mentioned entrance and CoA binding tunnel to reach the active site, in order to interact with the catalytic residues. The density of both substrates is not continuous after their respective sulphur atoms; therefore, they have only been modeled to this point. There is evidence of density for the carboxyl groups of each substrate in the vicinity of the catalytic residues; however, this has not yet been confirmed and was therefore not included in the model. This break in the continuous density of the substrates suggests that these proteins are still active while in their crystalline form and may be performing the decarboxylation and nucleophilic attack on malonyl-CoA and hexanoyl-CoA, respectively. This gives further evidence of the TKS being active via the typical PKS reaction mechanism. Figure 4.18 compares the orientation of the catalytic residues and phenylalanine gatekeepers in apo TKS and substrate bound TKS. This figure shows that despite there being a small level of movement in the full protein, there is almost none exhibited by the catalytic triad or the gatekeeper residues; thus, no significant conformational change in the active site can be attributed to malonyl-CoA or hexanoyl-CoA binding.

Figure 4.19 Mutated residues of TKS - The mutated residues based around the active site of TKS are shown as red cylinder structures on a ribbon representation of the apo structure of TKS.

Figure 4.19 depicts the residues that were mutated during this study. These residues were chosen as they are the larger amino acids that surround the active site of TKS. This meant there was the potential that mutating these residues would alter the shape of the active site and therefore impact the product profile of the enzyme. As previously stated, these mutations were detrimental to the enzyme’s solubility and catalytic activity, which may be due to the residues playing a role in the correct folding of the protein.

9. Conclusions

A number of conclusions can be made from the work conducted in this study in relation to both synthetic and structural biology. From the initial work conducted, it was confirmed that the first enzyme of the cannabinoid biosynthetic pathway - TKS - can be expressed in an E. coli host, and that the protein produced is soluble. The activity of this recombinant protein was confirmed by LC/MS analysis of the products from biotransformation reactions of TKS incubated with the substrates hexanoyl-CoA and malonyl-CoA. The data from the mass spectrometry matched those previously collected by Gagne et al., 2012, confirming that the recombinant TKS expressed in an E. coli host is active and can be included in a synthetic biology cannabinoid pathway construct.

A pivotal outcome from this study was the first successful crystallization and determination of the three-dimensional structure of the TKS from C. sativa in both its apo and substrate-bound forms. This has major relevance in the biochemical study of this protein and enabled its comparison to previously studied plant type III PKSs. By doing so, the macromolecular structure was confirmed as a homodimer consisting of two five layered αβαβα monomers, which is a tertiary structure that has been described in a number of type III PKSs, especially within the CHS family. The mechanism of the TKS can also be attributed to the conserved cysteine, asparagine and histidine residues, which have previously been reported to catalyze the iterative reaction of PKSs. The substrate-bound structure of TKS further supported this proposed mechanism. Both hexanoyl-CoA and malonyl-CoA can be observed entering the active site of the TKS, and both are cleaved at their respective sulphur atoms in the vicinity of the triad of catalytic residues. This supports the proposed reaction of nucleophilic attack by the cysteine residue at the sulphur atom of the substrates.

The prevalent role of other residues can also be confirmed in the TKS due to their conservation in numerous PKSs from a variety of species. The phenylalanine residues that act as gatekeepers to the active site have been identified in other type III PKSs and are important to the iterative character of the reaction catalyzed by this class of enzymes. The methionine residue that has been identified as essential to dimerization in type III PKSs was also shown to be present in the TKS in the same position and orientation. This is further evidence that TKS is a type III PKS and that the methionine is important to substrate binding by the enzyme. The structure of TKS also allowed the design of mutants based around the active site in an attempt to alter its activity and to inform the design of a more efficient variant.

The mutations of the TKS aimed to alter and ultimately improve the efficiency of the enzyme, therefore increasing its potential use in synthetic biology. All of the mutations performed changed the selected amino acids to alanine residues to test their importance and impact on the protein’s stability and activity. The mutation of the residues Asn330 and His297 gave further evidence of these residues being crucial to the activity of the TKS, and therefore polyketide synthases in general, as these mutations caused complete loss of catalytic activity. Mutation of the residue Met130 gave evidence that this residue is crucial to the protein’s stability, therefore supporting the conclusion that this residue is involved in dimerization. The alanine scanning study of the residues Ser126, Asp185, Met187, Ile248, Leu257, Phe259, Leu261 and Ser332 aimed to alter the activity of the enzyme in order to inform the design of variants that exhibit a higher efficiency than the natural TKS. All of these mutations decreased either the solubility or the catalytic activity of the protein, showing that they are detrimental to the folding of the protein or at least removed its ability to utilize hexanoyl-CoA and malonyl-CoA. Although these results have not aided the improvement of the protein’s efficiency, they inform future attempts to design mutants that retain their activity and solubility. In particular, they showed that a different approach to protein engineering is required to produce stable and active variants. A potential method would be to mutate active site residues to ‘similar’ amino acids, such as aspartic acid to asparagine and tyrosine to phenylalanine. This approach allows alteration of the functional groups while not drastically changing the residue’s structure, and may be more effective at altering the activity of the enzyme while not completely inhibiting it.

The second member of the pathway, OAC, was shown to be a highly insoluble protein with a strong tendency to form inclusion bodies when expressed in an E. coli host. This is a drawback for the use of this enzyme in synthetic biology, as it may not be possible to express active forms in vivo. Despite this issue, the addition of the soluble GST tag to OAC, previously described by Yang et al., 2015, was confirmed by this study as a viable approach to producing soluble and active forms of this protein in an E. coli host. GST was also shown to be the most effective soluble tag for OAC; the TRX and NUS tags were less effective at reducing inclusion body formation.

Both TKS and OAC expressed in a heterologous host were shown to exhibit the previously described activity and product profile of the native enzymes from the plant species C. sativa. This was shown by LC/MS analysis of a biotransformation reaction in which TKS and OAC were incubated with the substrates hexanoyl-CoA and malonyl-CoA. It was confirmed that this reaction produced olivetolic acid, which is an essential intermediate of the cannabinoid biosynthetic pathway. This validates that the first two stages of cannabinoid production can be introduced into a heterologous host, which is an essential step towards introducing the entire pathway in order to allow controlled production of specific cannabinoids.
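As a worked illustration of how olivetolic acid can be recognized in mass spectrometry data, the sketch below calculates its monoisotopic mass from the molecular formula C12H16O4 and the m/z values expected for the singly deprotonated and singly protonated ions. The ionization mode is not restated in this chapter, so both are shown; the function names are hypothetical and the calculation is not taken from the methods of this study.

```python
# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
PROTON = 1.00727647  # mass of a proton (Da)

# Molecular formula of olivetolic acid, C12H16O4.
OLIVETOLIC_ACID = {"C": 12, "H": 16, "O": 4}


def monoisotopic_mass(formula):
    """Sum the isotope masses for a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count
               for element, count in formula.items())


def mz_deprotonated(mass):
    """m/z of the singly charged [M-H]- ion."""
    return mass - PROTON


def mz_protonated(mass):
    """m/z of the singly charged [M+H]+ ion."""
    return mass + PROTON


if __name__ == "__main__":
    m = monoisotopic_mass(OLIVETOLIC_ACID)
    print(f"M      = {m:.4f}")
    print(f"[M-H]- = {mz_deprotonated(m):.4f}")
    print(f"[M+H]+ = {mz_protonated(m):.4f}")
```

The neutral monoisotopic mass comes to approximately 224.105 Da, so a peak close to m/z 223.098 in negative mode or 225.112 in positive mode would be consistent with olivetolic acid.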

Though the studies presented in this thesis have begun the design of a cannabinoid biosynthetic pathway in an E. coli host, there is still much to build upon. The structural characterization of TKS allowed the initial stages of protein engineering, aimed at designing a variant with higher efficiency, to be carried out. However, further work in this area is still required. Finer control of the mutations made to TKS may be necessary to produce an active variant that can then be included in the cannabinoid biosynthetic pathway. Furthermore, work on the later members of the pathway will be required in order to build a construct in an E. coli strain that can produce cannabinoids, specifically THC and CBD.

10. Future work

A thorough understanding of the cannabinoid biosynthetic pathway is required in order to introduce it effectively into a heterologous host. The next member of the pathway, the aromatic prenyltransferase (APT), is a membrane-bound protein that may represent a bottleneck in the ability to insert the pathway into a heterologous host. This is because membrane-bound proteins can have a high tendency to be toxic to cells. Study of the biophysical and biochemical characteristics of the APT may be required to overcome such toxicity, possibly by introducing tags that improve its solubility (Schlegel et al., 2010) or by inserting a homolog of this protein that exhibits the same activity but does not cause toxicity. The two final enzymes, tetrahydrocannabinolic acid synthase and cannabidiolic acid synthase, may also represent a bottleneck in the synthetic biology application of this pathway in E. coli. While E. coli is often the most desirable host for protein expression, these enzymes display high levels of N-glycosylation, and this post-translational modification will not occur in E. coli cells unless the relevant glycosylation machinery can be expressed in the bacterial cells. Again, a structural and biochemical study may be necessary to modify these proteins in order to allow their production in a heterologous organism. This may be achieved either by identifying the N-glycosylation sites to facilitate their removal via mutation or by identifying homologs of these proteins. The final decarboxylation step of the cannabinoid biosynthetic pathway, which normally occurs non-enzymatically, will also require the identification of a decarboxylase that can be included in the construct to ensure that the medically relevant cannabinoids THC and CBD are efficiently produced from an E. coli factory system.
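The identification of candidate N-glycosylation sites suggested above can be sketched computationally. N-linked glycosylation generally occurs at the sequon Asn-X-Ser/Thr, where X is any residue except proline, so a simple sequence scan gives a first list of candidates. The sketch below is a minimal illustration under that assumption: the sequence is a made-up toy string and not the sequence of either synthase, the function names are hypothetical, and the substitution of Asn by Gln is shown only as one commonly used way of removing a sequon. A sequon in the sequence does not guarantee that the site is glycosylated in the native protein, so structural or experimental confirmation would still be needed.

```python
import re

# N-X-S/T sequon, X != P. The lookahead keeps the match to the Asn only,
# so overlapping sequons are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of Asn residues in N-X-S/T sequons."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


def remove_sequons(sequence, replacement="Q"):
    """Replace each sequon Asn with another residue (Gln by default)."""
    return SEQUON.sub(replacement, sequence.upper())


if __name__ == "__main__":
    toy = "MNGTANPSKNLS"  # hypothetical toy sequence, not a real enzyme
    print(find_sequons(toy))
    print(remove_sequons(toy))
```

In the toy sequence, the Asn followed by Pro is skipped because proline at the X position is excluded by the pattern.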

For almost every member of this pathway, the study of their characteristics and how these can be modified has been shown to be critical to their synthetic biology potential. The analysis of the native proteins, and the design of approaches to improve their properties, demonstrates the importance of synthetic biology, biophysics and biochemistry working together in order to obtain the most efficient outcomes.

Bibliography

ADAMS, P. D., AFONINE, P. V., BUNKOCZI, G., CHEN, V. B., DAVIS, I. W., ECHOLS, N., HEADD, J. J., HUNG, L. W., KAPRAL, G. J., GROSSE-KUNSTLEVE, R. W., MCCOY, A. J., MORIARTY, N. W., OEFFNER, R., READ, R. J., RICHARDSON, D. C., RICHARDSON, J. S., TERWILLIGER, T. C. & ZWART, P. H. 2010. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr D Biol Crystallogr, 66, 213-21.

AFONINE, P. V., GROSSE-KUNSTLEVE, R. W., ECHOLS, N., HEADD, J. J., MORIARTY, N. W., MUSTYAKIMOV, M., TERWILLIGER, T. C., URZHUMTSEV, A., ZWART, P. H. & ADAMS, P. D. 2012. Towards automated crystallographic structure refinement with phenix.refine. Acta Crystallogr D Biol Crystallogr, 68, 352-67.

AWAN, A. R., SHAW, W. M. & ELLIS, T. 2016. Biosynthesis of therapeutic natural products using synthetic biology. Adv Drug Deliv Rev, 105, 96-106.

BAKER, D., PRYCE, G., GIOVANNONI, G. & THOMPSON, A. J. 2003. The therapeutic potential of cannabis. Lancet Neurol, 2, 291-8.

BASSARD, J. E., MOLLER, B. L. & LAURSEN, T. 2017. Assembly of Dynamic P450-Mediated Metabolons-Order Versus Chaos. Curr Mol Biol Rep, 3, 37-51.

BERENS, C., GROHER, F. & SUESS, B. 2015. RNA aptamers as genetic control devices: the potential of riboswitches as synthetic elements for regulating gene expression. Biotechnol J, 10, 246-57.

BISOGNO, T., HANUS, L., DE PETROCELLIS, L., TCHILIBON, S., PONDE, D. E., BRANDI, I., MORIELLO, A. S., DAVIS, J. B., MECHOULAM, R. & DI MARZO, V. 2001. Molecular targets for cannabidiol and its synthetic analogues: effect on vanilloid VR1 receptors and on the cellular uptake and enzymatic hydrolysis of anandamide. Br J Pharmacol, 134, 845-52.

CHEN, V. B., ARENDALL, W. B., 3RD, HEADD, J. J., KEEDY, D. A., IMMORMINO, R. M., KAPRAL, G. J., MURRAY, L. W., RICHARDSON, J. S. & RICHARDSON, D. C. 2010. MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr D Biol Crystallogr, 66, 12-21.

CHEN, R., GAO, B., LIU, X., RUAN, F., ZHANG, Y., LOU, J., FENG, K., WUNSCH, C., LI, S. M., DAI, J. & SUN, F. 2017. Molecular insights into the of an aromatic prenyltransferase. Nat Chem Biol, 13, 226-234.

COBB, R. E., SUN, N. & ZHAO, H. 2013. Directed evolution as a powerful synthetic biology tool. Methods, 60, 81-90.

CSORGO, B., FEHER, T., TIMAR, E., BLATTNER, F. R. & POSFAI, G. 2012. Low-mutation-rate, reduced-genome Escherichia coli: an improved host for faithful maintenance of engineered genetic constructs. Microb Cell Fact, 11, 11.

DI MARZO, V., BIFULCO, M. & DE PETROCELLIS, L. 2004. The endocannabinoid system and its therapeutic exploitation. Nat Rev Drug Discov, 3, 771-84.

EMSLEY, P., LOHKAMP, B., SCOTT, W. G. & COWTAN, K. 2010. Features and development of Coot. Acta Crystallogr D Biol Crystallogr, 66, 486-501.

FELLERMEIER, M. & ZENK, M. H. 1998. Prenylation of olivetolate by a hemp transferase yields cannabigerolic acid, the precursor of tetrahydrocannabinol. FEBS Lett, 427, 283-5.

FELLERMEIER, M., EISENREICH, W., BACHER, A. & ZENK, M. H. 2001. Biosynthesis of cannabinoids. Incorporation experiments with (13)C-labeled glucoses. Eur J Biochem, 268, 1596-604.

FERRER, J. L., JEZ, J. M., BOWMAN, M. E., DIXON, R. A. & NOEL, J. P. 1999. Structure of chalcone synthase and the molecular basis of plant polyketide biosynthesis. Nat Struct Biol, 6, 775-84.

FLORES-SANCHEZ, I. J. & VERPOORTE, R. 2009. Plant polyketide synthases: a fascinating group of enzymes. Plant Physiol Biochem, 47, 167-74.

FOO, J. L., CHING, C. B., CHANG, M. W. & LEONG, S. S. 2012. The imminent role of protein engineering in synthetic biology. Biotechnol Adv, 30, 541-9.

FRASCH, H. J., MEDEMA, M. H., TAKANO, E. & BREITLING, R. 2013. Design-based re-engineering of biosynthetic gene clusters: plug-and-play in practice. Curr Opin Biotechnol, 24, 1144-50.

FUJIO, T. 2007. Minimum genome factory: innovation in bioprocesses through genome science. Biotechnol Appl Biochem, 46, 145-6.

GAGNE, S. J., STOUT, J. M., LIU, E., BOUBAKIR, Z., CLARK, S. M. & PAGE, J. E. 2012. Identification of olivetolic acid cyclase from Cannabis sativa reveals a unique catalytic route to plant polyketides. Proc Natl Acad Sci U S A, 109, 12811-6.

GIEGE, R. 2013. A historical perspective on protein crystallization from 1840 to the present day. FEBS J, 280, 6456-97.

GLASS, J. I., ASSAD-GARCIA, N., ALPEROVICH, N., YOOSEPH, S., LEWIS, M. R., MARUF, M., HUTCHISON, C. A., 3RD, SMITH, H. O. & VENTER, J. C. 2006. Essential genes of a minimal bacterium. Proc Natl Acad Sci U S A, 103, 425-30.

GO, M. K., WONGSANTICHON, J., CHEUNG, V. W. N., CHOW, J. Y., ROBINSON, R. C. & YEW, W. S. 2015. Synthetic Polyketide Enzymology: Platform for Biosynthesis of Antimicrobial Polyketides. ACS Catalysis, 5, 4033-42.

GROTENHERMEN, F. & MULLER-VAHL, K. 2012. The therapeutic potential of cannabis and cannabinoids. Dtsch Arztebl Int, 109, 495-501.

GUZMAN, M. 2003. Cannabinoids: potential anticancer agents. Nat Rev Cancer, 3, 745-55.

HAMPSON, A. J., GRIMALDI, M., AXELROD, J. & WINK, D. 1998. Cannabidiol and (-)Delta9-tetrahydrocannabinol are neuroprotective antioxidants. Proc Natl Acad Sci U S A, 95, 8268-73.

HASSANI, S., MOMTAZ, S., VAKHSHITEH, F., MAGHSOUDI, A. S., GANJALI, M. R., NOROUZI, P. & ABDOLLAHI, M. 2017. Biosensors and their applications in detection of organophosphorus pesticides in the environment. Arch Toxicol, 91, 109-30.

JOOSTEN, R. P., LONG, F., MURSHUDOV, G. N. & PERRAKIS, A. 2014. The PDB_REDO server for macromolecular structure model optimization. International Union Crystallography Journal, 1, 213-20.

JORGENSEN, K., RASMUSSEN, A. V., MORANT, M., NIELSEN, A. H., BJARNHOLT, N., ZAGROBELNY, M., BAK, S. & MOLLER, B. L. 2005. Metabolon formation and metabolic channeling in the biosynthesis of plant natural products. Curr Opin Plant Biol, 8, 280-91.

KABSCH, W. 2010. Xds. Acta Crystallogr D Biol Crystallogr, 66, 125-32.

KEASLING, J. D. 2012. Synthetic biology and the development of tools for metabolic engineering. Metab Eng, 14, 189-95.

KELCHTERMANS, P., BITTREMIEUX, W., DE GRAVE, K., DEGROEVE, S., RAMON, J., LAUKENS, K., VALKENBORG, D., BARSNES, H. & MARTENS, L. 2014. Machine learning applications in proteomics research: how the past can boost the future. Proteomics, 14, 353-66.

KHAN, F. I., WEI, D. Q., GU, K. R., HASSAN, M. I. & TABREZ, S. 2016. Current updates on computer aided protein modeling and designing. Int J Biol Macromol, 85, 48-62.

KINGSLEY, L. J. & LILL, M. A. 2015. Substrate tunnels in enzymes: structure-function relationships and computational methodology. Proteins, 83, 599-611.

LECHNER, A., BRUNK, E. & KEASLING, J. D. 2016. The Need for Integrated Approaches in Metabolic Engineering. Cold Spring Harb Perspect Biol, 8.

LI, Y. & CIRINO, P. C. 2014. Recent advances in engineering proteins for biocatalysis. Biotechnol Bioeng, 111, 1273-87.

LIEBERMAN, R. L., PEEK, M. E. & WATKINS, J. D. 2013. Determination of soluble and membrane protein structures by X-ray crystallography. Methods Mol Biol, 955, 475-93.

LUSSIER, F. X., COLATRIANO, D., WILTSHIRE, Z., PAGE, J. E. & MARTIN, V. J. 2012. Engineering microbes for plant polyketide biosynthesis. Comput Struct Biotechnol J, 3, e201210020.

MALFAIT, A. M., GALLILY, R., SUMARIWALLA, P. F., MALIK, A. S., ANDREAKOS, E., MECHOULAM, R. & FELDMANN, M. 2000. The nonpsychoactive cannabis constituent cannabidiol is an oral anti-arthritic therapeutic in murine collagen-induced arthritis. Proc Natl Acad Sci U S A, 97, 9561-6.

MARNER, W. D., 2ND 2009. Practical application of synthetic biology principles. Biotechnol J, 4, 1406-19.

MCCOY, A. J., GROSSE-KUNSTLEVE, R. W., ADAMS, P. D., WINN, M. D., STORONI, L. C. & READ, R. J. 2007. Phaser crystallographic software. J Appl Crystallogr, 40, 658-74.

MCKEAGUE, M., WONG, R. S. & SMOLKE, C. D. 2016. Opportunities in the design and application of RNA for gene expression control. Nucleic Acids Res, 44, 2987-99.

MCPHERSON, A. & GAVIRA, J. A. 2014. Introduction to protein crystallization. Acta Crystallogr F Struct Biol Commun, 70, 2-20.

MECHOULAM, R., PARKER, L. A. & GALLILY, R. 2002. Cannabidiol: an overview of some pharmacological aspects. J Clin Pharmacol, 42, 11S-19S.

MEHROTRA, P. 2016. Biosensors and their applications - A review. J Oral Biol Craniofac Res, 6, 153-9.

MIZOGUCHI, H., MORI, H. & FUJIO, T. 2007. Escherichia coli minimum genome factory. Biotechnol Appl Biochem, 46, 157-67.

MORIMOTO, S., TANAKA, Y., SASAKI, K., TANAKA, H., FUKAMIZU, T., SHOYAMA, Y. & TAURA, F. 2007. Identification and characterization of cannabinoids that induce cell death through mitochondrial permeability transition in Cannabis leaf cells. J Biol Chem, 282, 20739-51.

MORITA, H., SHIMOKAWA, Y., TANIO, M., KATO, R., NOGUCHI, H., SUGIO, S., KOHNO, T. & ABE, I. 2010. A structure-based mechanism for benzalacetone synthase from Rheum palmatum. Proc Natl Acad Sci U S A, 107, 669-73.

MOSES, T., MEHRSHAHI, P., SMITH, A. G. & GOOSSENS, A. 2017. Synthetic biology approaches for the production of plant metabolites in unicellular organisms. J Exp Bot, 68, 4057-4074.

PADDON, C. J. & KEASLING, J. D. 2014. Semi-synthetic artemisinin: a model for the use of synthetic biology in pharmaceutical development. Nat Rev Microbiol, 12, 355-67.

PAIGE, J. S., NGUYEN-DUC, T., SONG, W. & JAFFREY, S. R. 2012. Fluorescence imaging of cellular metabolites with RNA. Science, 335, 1194.

PERTWEE, R. G. 2008. The diverse CB1 and CB2 receptor pharmacology of three plant cannabinoids: delta9-tetrahydrocannabinol, cannabidiol and delta9-. Br J Pharmacol, 153, 199-215.

PRYCE, G., VISINTIN, C., RAMAGOPALAN, S. V., AL-IZKI, S., DE FAVERI, L. E., NUAMAH, R. A., MEIN, C. A., MONTPETIT, A., HARDCASTLE, A. J., KOOIJ, G., DE VRIES, H. E., AMOR, S., THOMAS, S. A., LEDENT, C., MARSICANO, G., LUTZ, B., THOMPSON, A. J., SELWOOD, D. L., GIOVANNONI, G. & BAKER, D. 2014. Control of spasticity in a multiple sclerosis model using central nervous system-excluded CB1 cannabinoid receptor agonists. FASEB J, 28, 117-30.

RAHARJO, T. J., CHANG, W.-T., CHOI, Y. H., PELTENBURG-LOOMAN, A. M. G. & VERPOORTE, R. 2004. Olivetol as product of a polyketide synthase in Cannabis sativa L. Plant Science, 166, 381-5.

RAMAN, S., ROGERS, J. K., TAYLOR, N. D. & CHURCH, G. M. 2014. Evolution-guided optimization of biosynthetic pathways. Proc Natl Acad Sci U S A, 111, 17803-8.

ROGERS, J. K. & CHURCH, G. M. 2016. Genetically encoded sensors enable real-time observation of metabolite production. Proc Natl Acad Sci U S A, 113, 2388-93.

SASAKI, Y. & NAGANO, Y. 2004. Plant acetyl-CoA carboxylase: structure, biosynthesis, regulation, and gene manipulation for plant breeding. Biosci Biotechnol Biochem, 68, 1175-84.

SCAIFE, M. A. & SMITH, A. G. 2016. Towards developing algal synthetic biology. Biochem Soc Trans, 44, 716-22.

SCHLEGEL, S., KLEPSCH, M., GIALAMA, D., WICKSTROM, D., SLOTBOOM, D. J. & DE GIER, J. W. 2010. Revolutionizing membrane protein overexpression in bacteria. Microb Biotechnol, 3, 403-11.

SHI, Y. 2014. A glimpse of structural biology through X-ray crystallography. Cell, 159, 995-1014.

SHOYAMA, Y., TAKEUCHI, A., TAURA, F., TAMADA, T., ADACHI, M., KUROKI, R. & MORIMOTO, S. 2005. Crystallization of Delta1-tetrahydrocannabinolic acid (THCA) synthase from Cannabis sativa. Acta Crystallogr Sect F Struct Biol Cryst Commun, 61, 799-801.

SHOYAMA, Y., TAMADA, T., KURIHARA, K., TAKEUCHI, A., TAURA, F., ARAI, S., BLABER, M., MORIMOTO, S. & KUROKI, R. 2012. Structure and function of Delta1-tetrahydrocannabinolic acid (THCA) synthase, the enzyme controlling the psychoactivity of Cannabis sativa. J Mol Biol, 423, 96-105.

SINGLETON, C., HOWARD, T. P. & SMIRNOFF, N. 2014. Synthetic metabolons for metabolic engineering. J Exp Bot, 65, 1947-54.

SIRIKANTARAMAS, S., MORIMOTO, S., SHOYAMA, Y., ISHIKAWA, Y., WADA, Y. & TAURA, F. 2004. The gene controlling marijuana psychoactivity: molecular cloning and heterologous expression of Delta1-tetrahydrocannabinolic acid synthase from Cannabis sativa L. J Biol Chem, 279, 39767-74.

SIRIKANTARAMAS, S., TAURA, F., TANAKA, Y., ISHIKAWA, Y., MORIMOTO, S. & SHOYAMA, Y. 2005. Tetrahydrocannabinolic acid synthase, the enzyme controlling marijuana psychoactivity, is secreted into the storage cavity of the glandular trichomes. Plant Cell Physiol, 46, 1578-82.

SMYTH, M. S. & MARTIN, J. H. 2000. x ray crystallography. Mol Pathol, 53, 8-14.

STEPHANOPOULOS, G. 2012. Synthetic biology and metabolic engineering. ACS Synth Biol, 1, 514-25.

STOUT, J. M., BOUBAKIR, Z., AMBROSE, S. J., PURVES, R. W. & PAGE, J. E. 2012. The hexanoyl-CoA precursor for cannabinoid biosynthesis is formed by an acyl-activating enzyme in Cannabis sativa trichomes. Plant J, 71, 353-65.

SUN, W., MENG, X., LIANG, L., JIANG, W., HUANG, Y., HE, J., HU, H., ALMQVIST, J., GAO, X. & WANG, L. 2015. Molecular and Biochemical Analysis of Chalcone Synthase from Freesia hybrid in flavonoid biosynthetic pathway. PLoS One, 10, e0119054.

TAURA, F., SIRIKANTARAMAS, S., SHOYAMA, Y., YOSHIKAI, K. & MORIMOTO, S. 2007. Cannabidiolic-acid synthase, the chemotype-determining enzyme in the fiber-type Cannabis sativa. FEBS Lett, 581, 2929-34.

TAURA, F., TANAKA, S., TAGUCHI, C., FUKAMIZU, T., TANAKA, H., SHOYAMA, Y. & MORIMOTO, S. 2009. Characterization of olivetol synthase, a polyketide synthase putatively involved in cannabinoid biosynthetic pathway. FEBS Lett, 583, 2061-6.

UMENHOFFER, K., FEHER, T., BALIKO, G., AYAYDIN, F., POSFAI, J., BLATTNER, F. R. & POSFAI, G. 2010. Reduced evolvability of Escherichia coli MDS42, an IS-less cellular chassis for molecular and synthetic biology applications. Microb Cell Fact, 9, 38.

WANG, M., WANG, Y. H., AVULA, B., RADWAN, M. M., WANAS, A. S., VAN ANTWERP, J., PARCHER, J. F., ELSOHLY, M. A. & KHAN, I. A. 2016. Decarboxylation Study of Acidic Cannabinoids: A Novel Approach Using Ultra-High-Performance Supercritical Fluid Chromatography/Photodiode Array-Mass Spectrometry. Cannabis Cannabinoid Res, 1, 262-71.

WEI, L. & ZOU, Q. 2016. Recent Progress in Machine Learning-Based Methods for Protein Fold Recognition. Int J Mol Sci, 17(12). pii: E21

WINTER, G., LOBLEY, C. M. & PRINCE, S. M. 2013. Decision making in xia2. Acta Crystallogr D Biol Crystallogr, 69, 1260-73.

YANG, X., MATSUI, T., KODAMA, T., MORI, T., ZHOU, X., TAURA, F., NOGUCHI, H., ABE, I. & MORITA, H. 2016. Structural basis for olivetolic acid formation by a polyketide cyclase from Cannabis sativa. FEBS J, 283, 1088-106.

YANG, X., MATSUI, T., MORI, T., TAURA, F., NOGUCHI, H., ABE, I. & MORITA, H. 2015. Expression, purification and crystallization of a plant polyketide cyclase from Cannabis sativa. Acta Crystallogr F Struct Biol Commun, 71, 1470-4.

ZHANG, F., CAROTHERS, J. M. & KEASLING, J. D. 2012. Design of a dynamic sensor-regulator system for production of chemicals and fuels derived from fatty acids. Nat Biotechnol, 30, 354-9.

ZHANG, F. & KEASLING, J. 2011. Biosensors and their applications in microbial metabolic engineering. Trends Microbiol, 19, 323-9.
