Wesleyan ♦ University

Cloning and Characterization of New Protocatechuate (PCAD) Superfamily Members

By Ling Xie Faculty Advisor: Prof. Erika Anne Taylor

A dissertation submitted to the faculty of Wesleyan University in partial fulfillment of the requirements for the degree of Master of Arts in Chemistry

Middletown, Connecticut May 2013

Abstract

Genome sequence data has been rapidly accumulated with the development of new generations of sequencing techniques. A vast gap exists between the fast growth of sequence data and the comparatively slow progress on functional characterization of sequenced genes.

Due to this vast gap, gene functional annotation mostly relies on extrapolations based upon functionally characterized genes with high sequence homology to newly sequenced open reading frames (ORFs). The understanding of the relationship between protein sequence, structure and function, which allows robust classification of protein superfamilies, will ultimately yield more accurate gene annotation. A new superfamily of extradiol , which includes aromatic-ring-cleaving dioxygenases involved in degradation of lignin derived aromatic compounds, has been identified. This superfamily, the procatechuate dioxygenase (PCAD) superfamily is largely uncharacterized with few structures, substrate specificities, or catalytic mechanism having been determined. Genes with less than 20 % pair-wise identity to the first structurally characterized in the superfamily, protocatechuate 4,5-dioxygenase from Sphingomonas paucimobilis SYK-6, also named as

LigAB, have been selected through a BLAST search in order to identify potential dioxygenases in other bacterial strains and explore the boundaries of the PCAD superfamily.

Among the resulting genes, YgiD in Escherichia coli K12 has been cloned, expressed, and

i purified. YgiD has been shown to bind tightly to Fe(II), and to lack the ability to react with

PCA. In the search for the native substrate of YgiD, a differential analysis of 2D NMR spectra was being exploited to investigate what compounds accumulate in a YgiD knockout strain of E. coli. The study of YgiD will help in understanding the functional diversity of this superfamily. Six other putative PCAD superfamily members from a diversity of organisms have also been cloned, of them three have been expressed and purified. Further examination of these to the PCAD superfamily members will likely reveal insights into the linkages between structure, sequence and function in this superfamily.

ii

Acknowledgement

I would like to take this opportunity to acknowledge all the people who helped me complete my Master’s Thesis work. First, I would like to thank my advisor Professor Taylor for your mentorship and guidance in the past years. I have become a better scientist than before through your brilliant suggestions and guidance for my professional knowledge, experimental skills, and presentation ability. Thank you.

I would next like to thank my committee members Professor Pratt and Professor

Novick for your professional feedback and guidiance received from each committee meeting.

Especially Professor Pratt, my knowledge of kinetics increased greatly from the two courses with you. It has been very enjoyable to TA with you, too. I also appreciate all of the other faculty and staff who have helped me in my time here.

I am most grateful to my labmates especially Kevin and Dan. You both helped me a lot in almost every aspect of school life, from course assignments, learning new experimental techniques, troubleshooting failed experiments and encouragement through tough times. I’d also like to thank Jagadesh, Julie, Abraham, Noreen, Shu, Ann-Marie and other labmates for their support and entertainment from chatting.

To my family, much love and thank you very much. Your constant encouragement and immense mental support enabled me to finish my studies abroad. Without your love and blessings, this would have not been possible. Thank you to my uncle’s family. You helped me

iii to settle down when I had just arrived in the USA, and provided me family warmness at

Christmas.

I also want to thank my friends especially Yayan who kept me company, helped me indulge in a break, entertained me with funny conversations, and gave me useful suggestions on both study and life. Thanks to everyone for supporting me in this journey.

iv

Contents

Abstract ...... i Acknowledgement ...... iii Contents ...... v Table of Abbreviation ...... viii Table of Figures ...... xi Chapter 1 Introduction ...... 2 “Complete” Gene Sequence Understanding ...... 2 From Sequence to Function ...... 2 Protein Evolutionary Classification...... 3 Superfamilies of Dioxygenases...... 5 Dioxygenase Evolutionary Classification...... 5 VOC Superfamily...... 10 Cupin Superfamily...... 14 PCA superfamily ...... 19 Chapter 2 Characterization of a putative extradiol dioxygenase YgiD in Escherichia coli...... 22 Materials and Methods ...... 25 Cloning of YgiD from E. coli...... 28 Expression and purification of putative YgiD dioxygenase...... 28 Isothermal Titration Calorimetry experiment ...... 30 UV-Vis Spectrophotometric Assays of YgiD ...... 31 Knock out ygiD from E. coli...... 31 ADP-heptose extraction from WBBO6...... 33 Differential analysis of knockout and wild type extractions...... 34 Results and Discussion ...... 34 Expression and purification of putative YgiD dioxygenase ...... 35 Calorimetric Titration Curves...... 36

v

UV-Vis Spectroscopy Characterization...... 39 Knockout Experiment of YgiD ...... 41 Differential analysis of knockout and wild type extractions...... 42 Conclusions and Future Directions...... 43 Chapter 3 Expression and Purification of 3-methoxy gallic acid 3,4- dioxygenase (DesZ) from Sphingomonas paucimobilis SYK-6 ...... 45 Materials and Methods ...... 50 Cloning of DesZ...... 51 Soluble protein expression screening...... 52 Expression and Purification of DesZ...... 52 UV-Vis spectroscopy...... 55 Results and Discussion ...... 55 Soluble protein expression screening...... 50 Purification of non-tagged version of DesZ...... 51 UV-Vis spectroscopy characterization...... 52 Purification of His-tagged version of DesZ...... 53 Conclusions and Future Directions...... 61 Chapter 4 Other putative homologs in the PCAD superfamily ...... 62 Materials and Methods ...... 66 Pseudomonas syringae DC3000 (PSPT_1776) ...... 69 Cloning of putative PSPT_1776 dioxygenase...... 69 Soluble protein expression screening...... 70 Expression and purification of putative PSPT_1776 dioxygnase...... 64 Bacillus cereus ATCC 10987(BCE_1944) ...... 73 Cloning of putative BCE_1944 dioxygenase...... 73 Chromobacterium violaceum ATCC 12472 (CV3550) ...... 74 Cloning of putative CV3550 dioxygenase...... 74 Soluble protein expression screening...... 76 Expression and purification of putative CV3550 dioxygenas...... 69

vi

Sulfolobus solfataricus P2 (SSO0066)...... 78 Cloning of putative SSO0066 dioxygenase...... 78 DesB ...... 79 Cloning of DesB...... 79 Soluble protein expression screening...... 79

Site-directed mutagenesiss of DesB-pET21b to generate C-terminal His6 tagged proteins ...... 80 Results and Discussion ...... 81 Pseudomonas syringae DC3000 (PSPT_1776) ...... 81 Bacillus cereus ATCC 10987(BCE_1944) ...... 84 Chromobacterium violaceum ATCC 12472 (CV3550) ...... 85 Sulfolobus solfataricus P2 (SSO0066)...... 88 DesB protein expression screening and plasmid mutagenesis ...... 89 Conclusions and Future Directions ...... 90 Chapter 5 Conclusion ...... 91 Appendix1 Sequences of proteins studied in this thesis...... 93 Appendix 2 Mass spectra ...... 96 Appendix 3 Progress for all the proteins studies in this thesis………………… 100 References ...... 1001

vii

Table of Abbreviation

PCAD, procatechuate dioxygenase ORFs, open reading frames PCA, procatechuate or 3,4-dihydroxybenzoic acid 3MGA, 3-O-methylgallate EC, Enzyme Commission SCOP, Structure Classification of Protein BLAST, Basic Search Alignment Tool DHBD, 2,3-dihydroxybiphenyl 1,2-dioxygenase MhpB, 2,3-dihydroxypheylpropionate 1,2-dioxygenase VOC, Vicinal oxygen chelate 3,4-PCD, protocatechuate 3,4-dioxygenase C12O, catechol 1,2-dioxygenase HQ12O, hydroxyquinol 1, 2-dioxygenase C23O, catechol 2,3-dioxygenase HPCD, homoprotocatechuate 2,3-dioxygenase CHQO, chlorohydroquinone dioxygenase DHPPD, 2, 3-dihydroxyphenylpropionate 1,2-dioxygenase AkbC, methylcatechol 2,3-dioxygenase 4,5-PCD (LigAB), protocatechuate 4,5-dioxygenase APD, 2-aminophenol 1,6-dioxygenase GDO, gentisate dioxygenase PCB, polychlorinated biphenyls PcpA, 2,6-dichlorohydroquinone 1,2-dioxygenase HAD,3-hydroxyanthranilate 3,4-dioxygenase AHD, 4-amino-3-hydroxybenzoate 2,3-dioxygenase HGDO, homogentisate dioxygenase HNDO, 1-hydroxy2-naphthoate 1,2-dioxygenase CDO, QDO, quercetin dioxygenase ADO, acireductione dioxygenase SDO, salicylate 1,2-dioxygenase DesZ, 3-methoxy gallic acid 3,4- dioxygenase LigD, NAD-dependent -aryl ether dehydrogenase LigE and LigF, -etherase LigG, glutathionelyase

viii

LigX, 5,5’-dehydrodivanillate O-demethylase LigZ, 2,2’-trihydroxy-3’-methoxy-5,5’-dicarboxybiphenyl diooxygenase LigY, C-C LigW/LigW2, carboxyvanillate decarboxylase FerA, feruloyl-CoA synthetase FerB/FerB2, feruloyl-CoA hydratase/

LigM, tetrahydrofolate (H4folate)-dependent O-demethylase LigC, 4-carboxy-2-hydroxymuconate-6-semialdehyde dehydrogenase LigJ, 4-oxalomesaconate hydratase DesB, LigK, 4-carboxy-4-hydroxy-2-oxoadipate aldolase/oxaloacetate decarboxylase DesA, syringate O-demethylase DDVA, 5,5’-dehydrodivanillate OH-DDVA, 2,2’-trihydroxy-3’-methoxy-5,5’-dicarboxybiphenyl 5CVA, 5-carboxyvanillate CHMS,4-carboxy-2-hydroxymuconate-6-semialdehyde CHA, 4-carboxy-4-hydroxy-2-oxoadipate OMA, 4-oxalomesaconate PDC, 2-pyrone-4,6-dicarboxylate CHMOD, 4-carboxy-2-hydroxy-6-methoxy-6-oxohexa-2,4-dienoate Rmsd, root mean square deviation IPTG, Isopropyl β-D-1-thiogalactopyranoside Tris-HCl, tris-(hydroxymethyl)aminomethane hydrochloride EDTA, Ethylenediaminetetraacetic acid SDS, sodium dodecyl sulfate APS, ammonium persulfate TEMED, N, N, N, N’-tetra-methylethylenediamine SDS-PAGE, sodium dodecyl sulfate polyacrylamide gel electrophoresis LB, Luria-Bertani Amp, ampicillin SOC, Super Optimal broth with catabolite repression TAE, Tris-acetate-EDTA ADP-heptose, ADP-L-glycerol-D-manno-heptose FPLC, fast protein liquid chromatography PCR, polymerase chain reaction ITC, isothermal titration calorimetry NMR, nuclear magnetic resonance

ix

MALDI, matrix-assisted laser desorption ionization DTT, dithiothreitol DQCOSY, double-quantum filtered correlation spectroscopy ESI-MS, electrospray ionization mass spectrometry SAP, Shrimp Alkaline Phosphatase TSB, Tryptic Soy Broth

x

Table of Figures

Chapter 1 Introduction

Figure 1. Hypothetical model for domain structures of three classes of extradiol catechol dioxygenases ...... 7 Figure. 2. Families of ring-cleaving based on structural folds ...... 9 Figure 3. Ribbon structures of A. HPCD, B. GDO and C. LigAB ...... 10 Figure 4. Ribbon structures of HPCD and DHBD ...... 12 Figure 7. Structures of A. GDO, B. QDO, and C. HAD ...... 16 Figure 8. Reactions catalyzed by ring-cleavage dioxygenases in the Cupin superfamily ...... 18 Figure 9. A. Crystal structure of LigAB; B view of LigAB...... 19

Chapter 2 Characterization of a putative extradiol dioxygenase YgiD in

Escherichia coli

Figure 1. A. Structure of the YgiD dimer; B. Superposition of YgiD and LigB ... 22 Figure 2. A. View of the Zn coordination residues in YgiD; B. superposition of active sites of YgiD monomer and LigB; C. active site view of LigAB; D. active site view of YgiD dimer...... 24 Figure 3. A. Batch purification result of YgiD through a Sepharose Fast-Flow column charged with Ni2+ by step wash ...... 36 Figure 4. Calorimetric titration profiles of Fe and Zn titrated into YgiD ...... 37 Figure 5. Interactions between bound Zn ions and YgiD residues ...... 38 Figure 6. A. Time dependent UV-Vis spectrum of PCA with YgiD ...... 40 Figure 7. Gradient PCR result of Kan cassette from pKD13 on a 1% Agarose gel.41

Chapter 3 Expression and Purification of 3-methoxy gallic acid 3,4- dioxygenase

DesZ from Sphingomonas paucimobilis SYK-6

Figure 1. Proposed catabolic pathway for the degradation of lignin-derived aromatic compounds in S. paucimobilis SYK-6 ...... 49 Figure 2. DesZ in pET21b vector expression screening ...... 56 Figure 3. The chromatograph of non-tagged version of DesZ from DEAE column57 Figure 4. The SDS-PAGE gels of the non-tagged version of DesZ from DEAE column...... 58

xi

Figure 5. A. Time dependent UV-Vis spectrum of 3MGA with DesZ...... 58 Figure 6. The chromatograph of C terminal tagged version of DesZ from the DEAE column...... 60 Figure 7. The SDS-PAGE gels of the C-terminal tagged version of DesZ from the DEAE column ...... 60 Figure 8. The Western-blot of the C-terminal tagged version of DesZ off the DEAD column...... 61 Chapter 4 Other putative homologs in the PCAD superfamily Figure 1. Pair-wise sequence identity determined for comparison of PCA dioxygenase superfamily members ...... 62 Figure 2. Sequence alignment with selected LigB homologs and two structurally characterized enzymes LigB and YgiD …………………………57 Figure 3. Gradient PCR result of PSPT_1776 on a 1% Agarose gel ...... 82 Figure 4. Colony PCR product of PSPT_1776...... 82 Figure 5. ps-pET15b expression screening...... 83 Figure 6. The SDS-PAGE gels of ps-pET15b purified using a Sepharose Fast-Flow column...... 83 Figure 7. A. Size exclusion column chromatography of purified ps-pET15b from Sepharose Fast-Flow column ...... 84 Figure 8. Gradient PCR result of BCE_1944 on agarose gel ...... 84 Figure 9. Colony PCR product of BCE_1944...... 85 Figure 10. Gradient PCR result of CV3550 on agarose gel...... 86 Figure 11. Colony PCR result of CV3550 ...... 86 Figure 12. Protein expression of cv-pTOM construct in BL-21 AI cells ...... 87 Figure 13. The SDS-PAGE gels of cv-pTOM in BL-21 AI cells from a Sepharose Fast-Flow column ...... 88 Figure 14. Gradient PCR result of SSO0066 on agarose gel ...... 88 Figure 15. DesB in pET21b vector expression screening ...... 89

xii

Chapter 1 Introduction

“Complete” Gene Sequence Understanding

From Sequence to Function. With the development of new generations of sequencing techniques dramatically lowering the cost of genome sequencing, genome sequence data has been rapidly accumulating, thus allowing us to address fundamental biological questions that could not have been asked in the not so distant past. According to a review by Galperin, “by the end of 2009, 1052 genomes representing 720 individual species

(636 bacteria, 61 archaea, and 23 eukaryotes) were completely sequenced, deposited in the public nucleotide sequence databases (GenBank\EMBL\DDBJ) and made freely available over the internet.”(1)

However, a vast gap exists between the fast growth of sequence data and the comparatively slow progress on functional characterization of sequenced genes. For example,

Escherichia coli K12 MG1655 strain is the most well studied organism, since its genome was sequenced in 1997 by Blattner and his coworkers.(2) However, only about 54 % of the protein coding gene products had been experimentally assigned a biological function by

2005.(3) Predicted functions have been assigned to an additional 26 % of coding regions, based on (a) the sequence similarity to proteins with known functions, (b) the operon location, and (c) the phenotypes of mutants.(4) Over 10 % of the gene coding regions have no putative assignments of function at all. Considering all the other sequenced but far less well studied organisms, filling the gap described above with experimental indication of protein functions

2 is both urgent and crucial.

Due to the vast gap between the numbers of sequenced genes and experimentally verified protein functions, gene functional annotation mostly relies on extrapolations of functionally characterized genes with high sequence homology, based on the assumption that if the two proteins are similar in sequence, their functions are also similar. However, when the sequence identity is below 50 %, which is a common situation, the two genes do not necessarily perform similar functions, even though they are in the same .

With the rapid growth of sequence and structure data, it is found that notable functional promiscuity exists in many protein superfamilies: one structural fold may exhibit multiple functions, and one function may correspond with multiple structures. These complexities necessitate the understanding of the relationship between protein sequence, structure and function, which is reliant on robust classification of protein superfamilies, which will ultimately yield more accurate gene annotation.(5)

Protein Evolutionary Classification. An enzyme superfamily is a basic and popular term in biology and enzymology for protein classification, attributed to a group of enzymes hypothesized to have a common ancestor. The term “superfamily” has experienced a series of changes, and has evolved to have a clearer and more explicit definition as a deeper understanding of protein structure and function has developed.(6-10)

The first person to use superfamily and family to systematically classify proteins is

3

Margaret O. Dayhoff for “Atlas of Protein Sequence and Structure” in 1965. Without the invention of techniques to obtain protein structures, proteins are categorized only based upon sequence identities. Proteins with more than 50 % sequence identity were organized into one family. Proteins in a sub-family usually have more than 80 % sequence identity.

Superfamilies were groups of families set up by statistical methods and search procedures.(11)

Proteins were subsequently reorganized on the basis of either function or structure, along with the help of sequence similarities. This restructuring leads to the formation of

Enzyme Commission (EC), the best developed and mostly widely used protein functional classification system. The EC organizes enzymes into six classes of , , , , , and based on their catalyzed chemical reactions.(12) Meanwhile, the Structure Classification of Protein (SCOP, http://scop.mrc-lmb.cam.ac.uk/scop/) database exclusively relies on visual comparison of structures evolved from a common progenitor to construct the protein classification system.(13) After more extensive exploration of evolutionary relationships of superfamily members hidden beyond the sequence level was performed, people found that the above protein classifications were not enough to describe the global relationships between protein sequence, structure and function. By considering structure, function, and sequence together, it was discovered that: (a) enzymes with chemistry-based homology may be partitioned into

4 different superfamilies in SCOP due to differencies in structures, (b) enzymes which catalyze different categories of overall reactions in EC system can be closely related based on their similar structures.

This necessitates the development of a novel definition of superfamily which better describes the divergent evolution of enzyme functions.(14) Gerlt and Babbit clarified the definition for a superfamily in 2001, suggesting that it describes a group of homologous enzymes that catalyze either (a) the same chemical reaction with differing substrate specificities or (b) different overall reactions that share a common mechanistic attribute

(partial reaction, intermediate, or transition state) enabled by conserved active site residues that perform the same functions in all members of the superfamily.(15) As a result of this definition, members of a superfamily typically share less than 50 % sequence similarity, and often share less than 20 % sequence identity. To the best of my knowledge, no better definition of superfamily has been proposed to systematically consider the protein structure, function, and sequence together, other than this one. With more complete understanding of superfamily members, the definition of superfamily may further evolve in the future.

Superfamilies of Dioxygenases

Dioxygenase Evolutionary Classification. Dioxygenase enzymes are significant participants in microbial pathways for degrading aromatic molecules.(16) The degradation of

5 aromatic compounds by dioxygenases includes two significant reactions: ring dihydroxylation and ring cleavage (Scheme 1). Catecholic ring cleavage is generally catalyzed with one of two regiochemical ring cleavage patterns: (a) intradiol dioxygenases which cleave between two adjacent hydroxyl substituents, typically involving a non-heme Fe

(III); and (b) extradiol dioxygenases that cleave the bond adjacent to the hydroxyl substituents typically involving a non-heme Fe(II).(17) Intradiol dioxygenases investigated to date utilize substrates possessing vicinal hydroxyl groups with mildly electron-withdrawing substituents, while extradiol enzymes have demonstrated greater versatility, cleaving a wider variety of substrates with either electron withdrawing or electron-donating substituents.(18)

R2

HOOC

R1 OH Intra COOH R1 O 2 R1 Hydroxylating dioxygenses OH R1

HOOC R2 R2 Extra O Ring-cleavaging dioxygenases HO R2 Scheme 1. Typical reactions catalyzed by ring hydroxylating and cleaving dioxygenases. Intradiol and extradiol cleavage patterns are indicated by blue and magenta respectively.

Sequence and structural homology indicates that all the intradiol dioxygenases belong to a single evolutionarily related superfamily,(19) while the evolutionary relationships within extradiol dioxygenase superfamilies are less well defined. Limited by the numbers of extradiol dioxygenases found and the finite understanding of superfamily at the time, several

6 wrong classifications for extradiol dioxygenases have been proposed before. In 1989

Harayama and Rekik divided the extradiol dioxygenases into two families based on sequence alignment and substrate specificity: those showing a preference for bicyclic substrates and those showing a preference for monocyclic substrates.(19) Later in 1993 Bugg suggested an expanded classification system for extradiol dioxygenases using sequence alignment (Figure

1).(20) Class I are single-domain enzymes typified by the small 2,3-dihydroxybiphenyl

1,2-dioxygenase (DHBD) III in Rhodococcus globerrulus P6;(21) Class II, which most extradiol enzymes are belonged to, are two-domain enzymes in which the catalytically active

C-terminal domain binds Fe(II), as the DHBD from Pseudomonas cepacia LB400 exemplifies.(21) Class III are two-domain enzymes where the N-terminal domain contains conserved histidine residues, typified by the 2,3-dihydroxypheylpropionate 1,2-dioxygenase

(MhpB) in E. coli.(22)

Class I (DHBD III in … AHV … HHR … HHT … QHG… SLY … VEL R. globerrulus P6 )

Class II (DHBD in Domain 1 … GYM … AWR … AFA … ASL … LIT … LE I P. cepacia LB400) Domain 2 GHF … HHT … HHF … RHT … SFY … VEY * * * * * * * 10 53 115 Class III (MhpB in Domain 1 … SHS … DHY … GDF … DHG… PLE … LDK E. coli) Domain 2 SHQ … ERE … EAQ … LSA … KST … IKT 179 Figure 1. Hypothetical model for domain structures of three classes of extradiol catechol dioxygenases (adapted from Ref. (20)). The Fe(II) ligands identified in the dioxygenase DHBD by X-ray crystallography (His-146, His-210, and Glu-260) are underlined.(21) The positions marked by asterisks correspond to seven amino acid residues as active residues in DHBD from Pseudomonas cepacia LB400, namely, the three (II) ligands and His-195, His-241, Ser-248, and Tyr-250.(21) Residues which are identically conserved within each dioxygenase enzyme class are in bold face.

7

Recent studies have shown that the relationships of enzymes within a protein superfamily can be understood not only in terms of their sequence similarities and three-dimensional structures, but also by chemical threads that relate their functional attributes. Based on this, Eltis gave a revised and renamed classification system for dioxygenases.(18) Extradiol dioxygenases are divided into three superfamilies on the basis of this classification system (Figure 2): type I, Vicinal oxygen chelate (VOC) superfamily which includes two-domain and one-domain enzymes;(23) type II, an unknown superfamily which we have named the PCAD superfamily includes enzymes consisting of one or two different subunits; and type III, the Cupin superfamily which also are one or two domain dioxygenases.(24) Each type of dioxygenases has their own structural, which will be elucidated below characteristics (Figure 3).

8

3,4-PCDB-10 ()12 Intradiol C12OADP1 2 HQ12O3E 2 C23Omt2 4

HPCDBfu 4 CHQO  Vicinal oxygen x DHBD  chelate P6-III 2 L-DOPA

dioxygenaselincolnensis2

Dioxygenase DHPPDEco 4 Gallate

dioxygenaseKT2440 3 Extradiol Unknown HPCDEcoC x (PCAD) C23OJMP222-I x

4,5-PCDSYK6(LigAB) 22

4,5-PCDT2 44

APDJS45

22 GDO 4 Cupin 1-hydroxy-2- naphthoate

dioxygenaseKP7 6

Figure. 2. Families of ring-cleaving enzymes based on structural folds (Adapted from ref.(18)).

3,4-PCDB-10, protocatechuate 3,4-dioxygenase from P. putida B-10;(25) C12OADP1, catechol

1,2-dioxygenase from Acinetobacter sp. ADP1; (26) DHBDLB400, 2,3-dihydroxybiphenyl

1,2-dioxygenase from Pseudomonas cepacia LB400;(21) HQ12O3E, hydroxyquinol 1, 2-dioxygenase from Nocardioides simplex 3E;(27) C23Omt2, catechol 2,3-dioxygenase from P. putida mt-2;(28)

HPCDBfu, homoprotocatechuate 2,3-dioxygenase from B. fuscum;(29) CHQO, chlorohydroquinone dioxygenase; DHBDP6-III, 2,3-dihydroxybiphenyl 1,2-dioxygenase III from R. globerulus P6;(30) L-DOPA dioxygenase (LmbB1) from Streptomyces lincolnensis;(31) gallate dioxygenase from P. putida KT2440;(32) DHPPDEco, 2, 3-dihydroxyphenylpropionate 1,2-dioxygenase from E. coli;(33)

HPCDEcoC, homoprotocatechuate 2,3-dioxygenase from E. coli C;(34) 4,5-PCDSYK6, protocatechuate

4,5-dioxygenase from Sphingomonas paucimobilis SYK-6;(35) 4,5-PCDT2, protocatechuate

4,5-dioxygenase from Comomonas testosteroni T-2;(36) C23OJMP222-I, catechol 2,3-dioxygenase I from

Alcaligenes eutrophus JMP222;(37) APDJS45, 2-aminophenol 1,6-dioxygenase from P. pseudoalcaligenes JS45(38); 1-hydroxy-2-naphthoate dioxygenase from Nocardioides sp. KP7.(39) GDO, gentisate dioxygenase from P. testosterone.;(40) An “x” indicates an unknown oligomeric state.

9

A B C

Figure 3. Ribbon structures of A. HPCDBfu (2qjt.pdb, VOC superfamily), B. GDO (3bu7.pdb, cupin superfamily), and C. LigAB (1b4u.pdb, unknown superfamily). Metal ions are shown as orange spheres.

VOC Superfamily. The VOC superfamily was named from the fact that the first identified three types of enzymes in this superfamily (glyoxalase I, extradiol dioxygenase, and fosfomycin resistance proteins) all have substrates or intermediates that coordinate to the metal center via vicinal oxygens.(41) Sequence alignments, three-dimensional structures, and functional analysis reveal that there are at least seven functionally distinct subclasses of this superfamily: type I extradiol dioxygenases, 4-hydroxyphenylpuruvate dioxygenase,(42) glyoxalase I,(43) fosfomycin resistance proteins,(44) methylmalonyl-CoA epimerases,(45) bleomycin resistance proteins,(46) and aminocarboxycyclopropane forming enzyme.(47)

Among the VOC superfamily members, type I extradiol dioxygenases are most often found as part of metabolic pathways. Catechol 2,3-dioxygenases (C23O) are present in pathways which degrade a variety of substituted aromatic compounds like toluene, nitrobenzene and aniline.(48-50) 2,3-dihydroxybiphenyl 1,2-dioxygenases (DHBD) are in

10 pathways degrading polychlorinated biphenyls (PCB).(21, 30) Homoprotocatechuate

2,3-dioxygenase (HPCD) is found in tyrosine catabolism.(51, 52)

All type I extradiol dioxygenases identified to date comprise one single subunit with divergence into one- and two-domain enzymes. The majority of type I enzymes are

two-domain enzymes (Figure 2, DHBDLB400, C23Omt2, and HPCDBfu) within a range of

oligomeric states, while the one-domain enzymes (DHBDP6-III and L-DOPA

dioxygenaselincolnensis) are usually homodimers. Phylogenetic analysis based on sequence alignment indicates that the ancestral type I dioxygenase was a one-domain enzyme, and two duplication events were involved in the divergence of type I dioxygenases into one- and two-domain enzymes.(17) The majority of type I enzymes are two-domain, suggesting that the catalytically inactive N-domain may bring some sort of advantage. No one-domain type I extradiol dioxygenases have been structurally studied. Biochemical and crystallographic characterization of VOC superfamily dioxygenases has been concentrated on the larger two-domain enzymes, such as DHBD and HPCD.

Although sequence alignment indicates that the identity between HPCDBfu and DHBD from Pseudomonas sp. strain KKS102 is no more than 15 %,(17) their tertiary structures are very similar (Figure 4). Both of monomers have their active site at the C-terminus. Each monomer contains four conserved  motifs, which is conserved in all VOC superfamily members.(53) In the active site, the observed ligands for the metal ion are two histidines and

11 one glutamic acid residues. This 2-His-1-carboxylate facial triad is strictly conserved in all the type I dioxygenases.(54) The glutamate coordinates with the iron monodentately.

Sequence alignments of 23 mostly type I extradiol dioxygenases by Eltis show that there are another six conserved residues besides the 2-His-1-caboxylate facial triad: two histidines and one tyrosine in the active site pocket whose function are yet unknown; one glycine, one leucine, and one proline at the N- and C-terminal domain interface which are proposed to play a role in protein structure or folding.(17)

C C

N

N

Figure 4. Ribbon structures of HPCD from Brevibacterium fuscum (left, 2qjt.pdb) and DHBD from Pseudomonas sp. strain KKS102 (right, 1eir.pdb). The  helices involving in the motif  are colored in red. Fe (II) is highlighted in orange. Ligands coordinated with Fe are colored in cyan. The N- and C-terminal residues of each chain are labeled with N and C.

Some typical reactions catalyzed by type I dioxygenases and their metal contents are displayed in Figure 5. The substrates of most type I dioxygenases are catecholic compounds, except 2,6-dichlorohydroquinone 1,2-dioxygenase (PcpA), which catalyzes the substrate consisting of a hydroxyl-substituted aromatic ring with a hydroxyl at the para instead of

12

Reaction Metal Center

HO OH OH HOOC Fe(II), Mn(II) A DHBD O

CH CH3 3 H C H3C OH AkbC 3 O Fe(II) B COOH

OH OH

OH CHO C23O Fe(II) C COOH

OH OH OH

Cl Cl Cl COOH PcpA COOH Fe(II) D

COOH O OH COOH OH HPCD CHO Fe(II), Mn(II), COOH E Co(II), Mg(II) OH OH COOH NH2 HOOC NH2

OH L-DOPA CHO dioxygenase COOH Fe(II) F OH OH Figure 5. Reactions catalyzed by some type I dioxygenases. A. 2,3-dihydroxybiphenyl 1,2-dioxygenase (DHBD);(55, 56) B. methylcatechol 2,3-dioxygenase (AkbC);(57) C. catechol 2,3-dioxygenase (C23O);(58) D. 2,6-dichlorohydroquinone 1,2-dioxygenase (PcpA);(59) E. homoprotocatechuate 2,3-dioxygenase (HPCD);(60-62) F. L-DOPA dioxygenase.(63) ortho position. The majority of the dioxygenases in VOC superfamily use Fe(II) as an active metal, only HPCD exhibited interesting “metal promiscuity”. HPCD from

Brevibacterium fuscum showed highest catalytic activity under the presence of Co(II), and comparable activity with Mn(II) and Fe(II). HPCD from Klebsiella pneumonia is

13 the only one in type I dioxygenases which uses Mg(II) as an active metal, to the best of my knowledge. Further investigation is required to probe the cause of substrate diversity and metal promiscuity, and the effect to the catalytic mechanism.

Cupin Superfamily. The members of the cupin superfamily all share a six -strands barrel core structure, from which the term cupin (derived from the Latin word cupa for barrel) comes. The cupin superfamily was discovered from an amino acid sequence analysis in which the wheat protein germin shared high level of similarity with fungal spherulins.(64)

The cupin superfamily has members from all three domains of life including archaea, bacteria and eukaryotes, and is among the most functionally diverse superfamilies of proteins known.(65) A review by Dunwell in 2001 estimated that there were a minimum 18 different functional subclasses, including both enzymatic and non-enzymatic members, such as dioxygenases, decarboxylases, epimerases, isomerese, epoxidases, auxin-binding proteins, transcription factors, and seed storage globulins.(24) As more cupin members have been identified, the number of functional subclasses has continued to grow.

As with the other cupins, the dioxygenases in the cupin superfamily have two conserved motifs based on sequence alignment (Figure 6). Each motif,

G(X)5HXH(X)3,4E(X)6G (Motif 1) and G(X)5PXG(X)2H(X)3N (Motif 2), contains two

-strands. The two motifs together form one metal , and are separated by another two -strands with an intervening variable length loop (Figure 7). The cupin superfamily

14

Motif 1 IMR Motif 2 -strand C D G H

HADsce G . . . P NE R T GYH I N P T P EW F Y QKKG 23 G D S YL L . . P GN V P H S P VR HADhsa G . . . P NT R KDYH I E E G E E V T Y Q L E G 19 G E I F L L . . P A R V P H S P Q R HGDOhsa . . . . F . R P P YYH R NCM S E F M G L I RG 11 G G GS L H . . S T M R P H G P D A GDOspo_AG . . . E RAGA HRHAA S A L R F I M E GS G 12 GA ND F VL T P N G T W H E HGI GDOspo_BG . . . E HT KA HRH T GN V I Y N V A KGQG 13 H D I F C V . . P AW T W H E HC N GDOhsp G . . . E VA P C HRH T . . Q CA L R F I L E G 16 F D L VL T . . P G GQW H DHGN CDOmmu G . . . H GS S I HDH T D S . H C F L K L LQG 27 N Q CAY I ND S I G . L H R VE N QDOaja HSDA L GVL P H I HQ KH Y E . N F Y C NKG 20 G D YGS V . . P R N V T H T F Q I ADOkpn L . R E K F L N E HT H . . G E D E V R F F VE G 19 N D L I S V . . P A H T P HW F D M Consensus G H H E G G D P G H N Figure 6. Multiple sequence alignment of selected dioxygenases in cupin superfamily (adapted from Ref.(65)) (some of structures are shown in Figure 6), showing the two diagnostic conserved domains (boxed), the four -strands (shaded), the numbers of amino acids in the inter-motif region (IMR), the consensus sequence, the four conserved active-site residues (in red), and the conserved structurally important residues (in blue). The four -strands, C, D, G, and H, are labeled using the convention based on the structure of phaseolin.(66) The A or B suffix following the protein abbreviations designates either the N- or C-terminal domain of two-domain bicupins respectively. The protein abbreviations are as follows: HADsce, 3-hydroxyanthranilate dioxygenase from Saccharomyces cerevisiae;(67) HADhsa, 3-hydroxyanthranilate 3,4-dioxygenase from Homo sapiens;(68) HGDOhsa, homogentisate dioxygenase form Homo sapiens;(69) GDOspo, gentisate 1,2-dioxygenase from Silicibacter pomeroyi;(70) GDOhsp, gentisate 1,2-dioxygenase from Haloferax sp. D1227;(71) CDOmmu, cystein dioxygenase in Mus musculus;(72) QDOaja, quercetin dioxygenase in Aspergillus japonicas;(73) ADOkpn, acireductione dioxygenase from Klebsiella pneumonia.(74) ring-cleaving dioxygenases can be subdivided into monocupins and bicupins (Figure 8), based upon whether the monomer contains a single cupin domain such as

3-hydroxyanthranilate dioxygenase (HAD) and 4-amino-3-hydroxybenzoate 2,3-dioxygenase

(AHD), or two cupin domains like homogesitate 1,2-dixoygenase (HGDO), gentisate

1,2-dioxygenase (GDO), and qercetin dioxygenase (QDO). The evolution of bicupins could be from one or more gene duplication events originating from a monocupin ancestor.(75)

Structures representing the known scaffolds of cupin ring-cleaving dioxygenases are shown in Figure 7. Even though the various structures displayed below demonstrate a low

15

A

B C

Figure 7. Structures of A. GDO in Silicibacter pomeroyi (bicupin, 3BU7.pdb), B. QDO in Aspergillus japonicas (bicupin, 1GQG.pdb), and C. HAD from Saccharomyces cerevisiae (monocupin, 1ZVF.pdb). Motif 1 and 2 are colored in magenta and orange respectively. Ni(II), Fe(II), and Cu(II) are colored in green, red, and deep red respectively. Residues coordinating with metal ions are in cyan. level of overall structural identity, each has a -barrel structure with the metal binding site at the center. Instead of using the conserved 2-His-1-carboxylate facial triad at the active site, some type III dioxygenases substitute the glutamate with a third histidine. This substitution has been discussed to possibly represent an adaption for binding of both the aromatic

16 carboxylate and O2 to the metal.(76) Even still keeping the 2-His-1-carboxylate facial triad like HADsce, the glutamate residue binds in a bidentate manner, contrasting with the monodentate manner seen in the VOC superfamily.

All the reactions catalyzed by ring-cleaving dioxygenases found in the Cupin superfamily to date, as well as their metal contents, are shown in Figure 8. Instead of mainly using catecholic substrates as their VOC superfamily counterparts, type III dioxygenases utilize non-catecholic hydroxyl-substituted aromatic carboxylic acids, mostly monohydroxylated substrates, such as getisate, salicylate, 1-hydroxy 2-naphthoate, or aminohydroxybenzoates. Most type III dioxygenases use Fe(II) for catalysis, in rare cases other metals are used like Ni(II) in HAD from Saccharomyces cerevisiae, and Cu(II) in QDO from Aspergillus japonicas. Interestingly, QDO from some organisms display important

“metal promiscuity”. They exhibit different metal selectivity, or are cambialistic, which means that they are able to display comparatively significant activities with either of the metals bound at the same active site. For example, the activities of QDO from Bacillus subtilis under the presence of different divalent metals are ordered in following sequence:

Co(II), Cu(II)>Mn(II)>Ni(II). Further investigation is required to probe the effect of the the

3-His metal coordination. Investigating the substrate differences between the type I and III dioxygenases besides the primary structural differences and the reasons to cause these differences could increase our understanding about the whole extradiol dioxygenase group.

17

Metal Domain, Reaction Content Strcture

OH OH

O2 A Fe(II) Bicupin, homotetramer GDO COOH O COOH COOH OH OH COOH COOH COOH O O2 Fe(II) Bicupin, B HNDO homohexamer

O2

SDO Bicupin, C COOH O Fe(II) COOH COOH homotetramer OH OH OH

O2 Bicupin, D Fe(II) HGDO homotetramer O COOH OH COOH COOH

COOH COOH NH NH2 O2 2 Fe(II), Ni(II) Monocupin, E HAD COOH homodimer OH CHO

Spontaneous H2O COOH

N COOH COOH COOH O 2 Monocupin, F CHO Fe(II) AHD homodimer COOH OH

NH2 NH2 OH OH OH OH HO O Bicupin, O HO O Cu(II), Mn(II),Co(II) G 2 homodimer Fe(II), Ni(II) OH QDO O CO COOH OH O OH Figure 8. Reactions catalyzed by ring-cleavage dioxygenases in cupin superfamily (adapted from Ref. (76)). A. gensitate 1,2-dioxygenase (GDO);(70, 77) B. 1-hydroxy2-naphthoate 1,2-dioxygenase (HNDO);(39) C. salicylate 1,2-dioxygenase (SDO);(78, 79) D. homogesitate 1,2-dixoygenase (HGDO);(69) E. 3-hydroxyanthranilate 3,4-dioxygenase (HAD);(67, 80) F. 4-amino-3-hydroxybenzoate 2,3-dioxygenase (AHD);(81) G. qercetin dioxygenase (QDO).(73, 82, 83)

18

PCA superfamily. The evolution of extraldiol dixoxygenase classification has resulted in the identification of an as yet unnamed superfamily in the literature which our lab temporarily calls the PCAD superfamily, because PCA is the substrate of LigAB, the first structurally characterized member. This new superfamily was first proposed to be part of the class III two-domain dioxygenases with the conserved histidine residues at the N-terminal domain by Bugg in 1996 (Figure 1).(20) However, a 35 dioxygenases protein sequence alignment conducted by Eltis showed no sequence similarities between the unknown superfamily members and their counterparts in the other classes.(17) No details of the catalytic mechanism were available because of a lack of any structural information for these dioxygenases, until Senda solved the first structure in this superfamily for protocatechuate

4,5-dioxygenase (LigAB) from Sphingomonas paucimobilis SYK-6 (Figure 9).(35)

A B

Figure 9. A. Crystal structure of LigAB (1B4U.pdb). The  subunit is colored in aqua, and the  subunit is colored in blue, with the Fe at the bottom of the cleft between two subunits (in green). B. active site view with the substrate protocatechuate (PCA, in magenta) bound.  and  after the numbers designate the subunit from where the corresponding residue comes.

19

The determination of the LigAB structure validated the prediction that there are three distinct superfamilies containing extradiol dioxygenases from a structural perspective.

LigAB is a heterodimer, with the active site at the interface of the two protein subunits

(Figure 9A). Neither of the two subunits contains a  domain as the VOC superfamily, nor has a six -strands core barrel structure, as in the cupin superfamily. The LigB subunit is a single entity with nine -strands forming a central core which is sandwiched by six  helices. This seems to be a novel type of folding in dioxygenases, because this folding pattern has not been observed through searches using SCOP. At the active site, LigAB keeps the

2-His-1-carboxylate facial triad and uses Fe(II) as the active metal (Figure 9B). Similar to counterparts in the VOC superfamily, the glutamic acid binds to the metal in a monodentate manner and the substrate PCA binds bidentately. Interestingly, some common features have

been found around the active sites of LigAB and DHBDKKS102 in the VOC superfamily, such as the iron coordination spheres, their substrate stabilization environments, and a putative catalytic acid/base residue (His195 in LigAB, His 194 in DHBD). This suggests that they may possess the same catalytic mechanism despite their completely different tertiary and quaternary structures.

Based on the above information, Eltis and his coworkers proposed that the unknown superfamily includes LigAB and other dioxygenases whose sequences do not align well with their VOC and cupin counterparts (Figure 2). Since this superfamily is newly

20 assembled, it is far less well studied than the other two dioxygenase superfamilies. No other common features have been confirmed except that the sequences of the unknown superfamily members do not align well with those of VOC or cupin superfamily members. More enzymes must be characterized by crystallography, substrate specificity, gene context, and catalytic mechanism in order to better understand this superfamily.

This thesis describes the exploration of remote members in this unknown PCAD superfamily. The evolutionary diversity of this enzyme superfamily was assessed starting with the characterization of YgiD in E. coli and DesZ in Sphingomonas paucimobilis SYK-6, followed by the investigation of other homologues. Identification of new superfamily members, especially the ones at the edge of superfamily, will likely reveal homologues with diverse function and will allow further understanding of the overall characteristics of this superfamily.

21

Chapter 2 Characterization of a putative extradiol dioxygenase YgiD in

Escherichia coli

In the search for new members of the PCA dioxygenase superfamily, our research led us to YgiD, a protein of unknown function in E. coli, which has been annotated to be a dioxygenase in all the sequence databases. Using the YgiD protein sequence as a BLAST entry, the most related sequence is annotated as the LigB subunit of LigAB, even though the pairwise identity between them is only 22.4 %. Based on the crystal structure characterized by the Southeast Collaboratory for Structure Genomics in 2007, YgiD crystallized as a dimer with Zn as the bound metal instead of Fe (Figure 1A). While there are two monomers in the asymmetric unit, the interface of the dimer might be a crystallographic artifact. Based on the analysis from PDBePISA (http://www.ebi.ac.uk/msd-srv/prot_int/pistart.htmL, which is a sophisticated service that analyses the interfaces between macromolecules in their crystal

A B

Figure 1. A. Structure of the YgiD dimer (2PW6.pdb). The two subunits are colored in magenta and red separately, with Zn metal ions in green. B. Superposition of YgiD in orange and LigB (B subunit of LigAB, 1B4U.pdb) in gray, with the Fe metal ion in gray and Zn metal ions in green.

22 structures. The score for YgiD is 10 %, which means it’s only 10 % chance to be a real dimer for YgiD, while the score for LigAB is 65.2 %. The subunit Rossman-like fold with has eight parallel -strands located at the central core, surrounded by seven  helices. The root mean square deviation (rmsd) of YgiD superposed on LigB is only 2.223 Å (Figure 1B), showing that the structural folds are very similar.

Examining the active site of YgiD, the Zn ion is directly coordinated by both oxygens of the carboxylate group of Asp86, a nitrogen from His22, a nitrogen from His57, and a nitrogen from His234 (Figure 2A). In addition, a water molecule can be assigned at the distance of 2.692 Å from the zinc center. The coordination sphere could be described as a distorted octahedral geometry (Figure 2A), with the His234 and one of the oxygens from

Asp86’s carboxylate in the axial positions and the other four proteinaceous ligands in equatorial positions.

Generally, the active site of the YgiD monomer superimposes well with that of LigB, especially the metal coordinating residues (Figure 2B). This suggests that YgiD may catalyze a similar type of reaction as LigAB, or different type of reactions but with the same mechanism. Nevertheless, there are some differences between the two active sites (Figure 2C and 2D). The Asp86 from the the sond YgiD subunit occupies the space where PCA is located in LigAB, which blocks the YgiD substrate from accessing the active site from the same direction as PCA in LigAB. This may have several important implications for the

23

A B

C D

Figure 2. A. View of the Zn coordination residues in YgiD. Distance between the Zn and the coordinating atoms from residues are colored in purple. B. superposition of active sites of YgiD monomer (orange) and LigB (gray). L and Y after residue numbers stand for LigB and YgiD respectively. C. Active site view of LigAB. D. Active site view of YgiD. Residues are labeled as shown in the pictures.  and  after the residue numbers denote the residues subunit of oringin. function of the protein. His121 in YgiD adopts a vastly different conformation from that of the His127 in LigAB, possibly creating an empty cavity neighboring the Zn metal of YgiD for substrate binding. It’s also possible that the Asp86 may act as a catalytic cavity lid. Upon the binding of the substrate with YgiD, conformation changes may occur, leading Asp86 to be replaced by the substrate. Whether this assumption is true or not will be determined by a

24 crystal structure characterization of the YgiD-substrate complex. If the YgiD dimer interface is artificial, then that allows for a binding site.

Based upon gene context analysis, investigation carried out by Klemm and Hancock revealed that ygiD gene was highly up-regulated in biofilm formation of E. coli 83972 and

VR50.(84) Compared with the wild type, knockouts of ygiD in these two strains caused reduced biofilm formation in urine by 26-43 %. However, the function of YgiD either as an enzyme or a regulatory protein related to biofilm formation has yet to be fully explored. The full characterization of YgiD may unveil new functions within the PCA dioxygenase superfamily, and thereby increase our knowledge of this new dioxygenase containing superfamily.

Materials and Methods

All primers (Table 1), AccuPrime Pfx SuperMix, DH5, and BL-21(DE3) cells were purchased from Invitrogen (Carlsbad, CA). Plasmid pKD13 and strain BW25113 for generating a ygiD deletion were kindly donated by Dr. John Gerlt’s Lab in Department of

Biochemistry in University of Illinois at Urbana-Champaign. E. coli K12 ygiD knockout strain JW3007 was purchased from the Coli Genetic Stock Center in Yale University (New

Haven, CT). The E. coli heptosyltransferase-I knockout strain WBBO6 was kindly donated by Dr. Christian Raetz’s Lab at the Duke University School of Medicine. The E. coli K12 mb1760 strain, used to clone ygiD, was obtained from ATCC (ATCC19215, Manassas, VA).

25

DpnI was obtained from New England Biolabs (Ipswich, MA). Isopropyl

β-D-1-thiogalactopyranoside (IPTG) was bought from Gold Bio Technologies (St. Louis,

MO). B-PER II Bacterial Protein Extraction Reagent was purchased from Thermo Scientific

(Rockford, IL). Ampicillin (Amp), ferrous ammonium sulfate hexahydrate, sodium hydroxide, catechol, triethylamine, tetracycline, 3,4-dihydroxybenzoic acid (PCA), L-cysteine, tris-(hydroxymethyl)aminomethane hydrochloride (Tris-HCl), EDTA, sodium dodecyl sulfate (SDS), glycine, nickel sulfate hexahydrate, zinc sulfate, and 3,3-dimethylglutarate,

2-amino-2-methyl-1,3-propanediol were obtained from Sigma (St. Louis, MO). Imidazole, kanamycin, sodium chloride, sodium biphosphate, Bacto tryptone, Bacto yeast extract, and

Amicon 3 k concentrators were purchased from Fisher Scientific (Pittsburgh, PA).

Ammonium persulfate (APS), 30 % acrylamide, Coomassie Brilliant Blue R-250, and N, N,

N, N’-tetra-methylethylenediamine (TEMED) were obtained from Bio-Rad (Hercules, CA).

The Qiaquick Gel Extraction Kit and Qiaquick PCR Purification Kit were purchased from

Qiagen (Valencia, CA). The pTOM vector is a modified version of Novagen pET-15b vector in which the two Ser codons both before and after the His-tag were mutated to His codons by

Toomas Haller in John Gerlt’s lab. The modified pTOM vector has an NdeI/XhoI/BamHI cloning site and encodes Amp resistance, an N-terminal 10-His-tag and a thrombin protease cleavage site for the cleavage of the His-tag.

Luria-Bertani (LB) media was prepared from 5 g/L sodium chloride, 10 g/L tryptone,

26

5 g/L yeast extract, and 1 mM sodium hydroxide. SOC media was made using 20 g/L Bacto

Tryptone, 5 g/L Bacto Yeast Extract, 10 mM NaCl, 2.5 mM KCl, 10 mM MgCl2, 10 mM

MgSO4, and 20 mM glucose. The concentration of ampicillin (Amp) in all growth media was

100 μg/mL, while the concentration of kanamycin in all growth media was 50 μg/mL. All liquid growths were performed at 37 ˚C, with 200 rpm shaking speed unless otherwise indicated. 1 % agarose gels for DNA electrophoresis were prepared using 10 mg/mL agarose, and 50 mL TAE electrophoresis buffer (40 mM tris, 20 mM glacial acetic acid, and 1 mM

EDTA (pH 8.0), and 0.1 mg/mL ethidium bromide). 7.5 % polyacrylamide protein gels were prepared using 30% acrylamide, 0.01 mM EDTA, separating buffer (1.5 M Tris-HCl, pH 8.8), stacking buffer (0.5 M Tris-HCl, pH 6.8), APS and TEMED. Protein running buffer for sodium dodecylsulfide-polylacrylamide gel electrophoresis (SDS-PAGE) was prepared using

0.1 % w/w SDS, 192 mM glycine, and 25 mM Tris (pH 8.2). All DNA agarose and

SDS-PAGE protein gels were run at 100 V using a PowerPac Basic power supply from

Bio-Rad (Hercules, CA). Protein gels were stained using Coomassie Brilliant Blue R-250 for

22 min and destained for 30 min.

An AKTA FPLC system (GE Healthcare, Piscataway, NJ) was used for

ADP-L-glycerol-D-manno-heptose (ADP-heptose) purification. A batch benchtop method was adopted for purification of proteins. Extinction coefficients and predicted masses of all proteins were calculated based on the amino acid sequence using the web application

27

ProtParam (http://www.expasy.ch/tools/protparam.htmL). PCR reactions were accomplished utilizing a S1000 Thermal Cycler (Bio-rad, Hercules, CA). Cells were lysed using a French press cell (Aminco, Urbana, Ill.). The presence of ADP-heptose was confirmed by a Finnigal

LCQ Advantage MAX mass spectrophotometer (Thermo, Newington, NH). The isothermal titration calorimetry (ITC) experiments were accomplished using a VP-ITC apparatus

(MicroCal, Northhampton, MA). Electroporation was carried out using a BTX electro cell manipulator 600 (BTX inc, San Diego, CA). Spectrophotometric assays were taken using a

Varian Cary 100 Bio UV-Visible Spectrophotometer (Agilent Technologies, Santa Clara, CA).

NMR was performed using Varian Unity Inova 400 MHz NMR spectrometer (Agilent

Technologies, Santa Clara, CA). NMR data was processed using either VARIAN’s VNMR software or ACD/NMR processor (ACD/Labs, Toronto, Ontario, Canada).

Cloning of YgiD from E. coli. The YgiD gene was PCR amplified from E. coli

K12 mb1760 and subcloned into pTOM vector in the NdeI and BamHI cloning sites by Sam

Berman.

Expression and purification of putative YgiD dioxygenase (32.2 kDa). The

YgiD-pTOM construct was transformed into BL-21 cells using the heat shock method, and stored in a 40 % glycerol solution as a frozen stock in -80 °C freezer. 1 mL overnight LB-amp growth of transformed BL-21 cells was inoculated into 1 L LB-amp media. After the

absorbance of the cell growth at a wavelength of 600 nm (OD600) reached 0.6, 1 mL of 1 M

28

IPTG was added into the growth medium to make a final concentration of 1 mM to induce the cells. Cells were harvested by centrifugation after 24 h growth from the time of induction at

5,000 rpm for 15 min. Resulting cell pellets were lysed in FE buffer (consisting of 50 mM

Tris-HCl and 2 mM L-cysteine (pH 7.0)). The cells were passed three times through a French

Press cell and followed by centrifugation at 11,000 rpm for 30 min at 4 ºC to remove the cell debris. The resulting supernatant was loaded onto a IMAC Sepharose Fast-Flow column

(16  50 mm) charged with Ni2+, and purified by step wash: first with 80 mL binding buffer

(consisting of 50 mM NaH2PO4 (pH 7.42), 300 mM NaCl, and 10 mM imidazole), then with

80 mL washing buffer (containing 50 mM NaH2PO4 (pH 7.45), 300 mM NaCl, and 50 mM

imidazole), eluted by 40 mL eluting buffer (consisting of 250 mM imidazol, 50 mM NaH2PO4

(pH 8.0), and 300 mM NaCl). The Ni2+ and all residual proteins were stripped from the column using 80 mL strip buffer (containing 200 nM EDTA (pH 8.0), 500 mM NaCl). The fractions containing YgiD were identified by SDS-PAGE. The identified fractions were combined, and subjected to MALDI mass analysis (32073.1 g/mol compared with expected mass 32266.8 g/mol).

To determine the oligomeric state of the purified YgiD, a native gel and size exclusion chromatography methods were exploited. All preparations for running native gels were the same with SDS-PAGE except that (1) SDS was excluded from running buffer, (2) gels were run at 100 V running voltage at 4 °C, (3) protein samples were not boiled, and (4) a 5X native

29 gel loading dye without dithiothreitol (DTT) was used. The 5X native gel loading dye was prepared with 0.313 M Tris-HCl (pH 6.8), 0.05 % m/w bromophenol blue, and 50 % m/w glycerol. The size exclusion chromatography was accomplished with a HiLoad 16/60

Superdex 200 prepgrad column (16  600 mm, 120 mL) in Dr. Rich Olson’s lab at Wesleyan

University. Before loading the protein samples, the column was washed with 40 mL degassed deionized water at 1 mL/min, 45 mL degassed 20 % ethanol at 1.5 mL/min, and then equilibrated with 200 mL degassed equilibration buffer (containing 20 mM Tris (pH 7.4), and

150 mM NaCl). 1 mL of 8 mg/mL purified YgiD was spun at 15,000 rpm for 10 min to remove aggregates, filtered, and loaded onto the column with a superloop. The column was isocraticly eluted with 30 mL equilibration buffer.

Isothermal Titration Calorimetry experiment. 3 mL purified 4.8 mg/mL YgiD was dialyzed in 1 L of 20 mM Tris-HCl (pH 7.0) with 1 mM EDTA three times to remove trace metals which may bind to the isolated protein, and then dialyzed into 1L of 20 mM Tris-HCl

(pH 7.0) three times to remove EDTA. Each dialysis was performed for approximately 4 h.

The ITC experiments were accomplished using a VP-ITC apparatus. In each experiment, 48 injections (the volume of each injection was 5 μL, but the first injection is 3 μL) of a solution

containing 250 mM ZnSO4 or Fe(NH4)2(SO4)2 were titrated into 2.5 mM YgiD (cell volume =

1.43 mL, stirring speed 374 rpm) using a computer controlled 310 μL microsyringe. The reference cell was filled with distilled water. Before each experiment, the protein and metal

30 solution were degassed for 2-5 min to eliminate air bubbles, using the ThermoVac accessory of the microcalorimeter. The first addition started after baseline stability was achieved. To allow the system to reach equilibrium, a spacing of 350 s between each injection was applied.

Data analysis was performed using the Origin software supplied by MicroCal.

UV-Vis Spectrophotometric Assays of YgiD. Spectrophotometric Assays of YgiD were carried using a Varian Cary 300 Bio UV-visible Spectrophotometer at room temperature.

The reaction was recorded with 21 scans with the range of 200-800 nm at 600 nm/min, with a

30 min interval between scans. Each 2 mL assay mixture contained 50 mM GTA buffer

(consisting of 50 mM 3,3-dimethylglutarate, 50 mM 2-amino-2-methyl-1,3-propanediol, and

50 mM Tris ,pH 8.5), purified YgiD (about 20 μg), 100 μM Fe(NH4)2(SO4)2, and 100 μM

substrate (PCA or catechol). YgiD was pre-incubated with Fe(NH4)2(SO4)2 for 10 min before the assay began, and wasn’t added into the mixture consisting of buffer and substrate after the first scan.

Knock out ygiD from E. coli. Generating the knockout construction of the ygiD gene in E. coli was attempted based on experiments performed by Wanner (85). The

Kanamycin-resistance cassette was generated by PCR from pKD13 using specifically designed primers (Table 1). PCR reactions (25 μL) were prepared using 22.5 μL AccuPrime

SuperMix, 1.5 μL pKD13 (116 ng), 0.5 μL primer (50 ng, each). The reaction mixture was incubated at 95 °C for 5 min to denature the DNA, followed by 45 cycles consisting of 95 °C

31

Table 1 Primers used in this chapter

Oligo name Sequence (5’-3’)

GTTCTTACTCCGCCGAACAAGCGTATTGTG

AGGATATCAAAATGACACCTTTAGTTAAGG Forward ATTCCGGGGATCCGTCGACC pKD13 GCCACGATACCGGATGGCACTCGCCATCCG Reverse GTAATTGTTAGCCTATCTGCACCGACAGCT

GTAGGCTGGAGCTGCTTCG

ygiD A1 GGGTTTTTGCCCGAGAACGC knockout C1 GCAGCGGAAGGTCCGTATGG verification for 15 s, 45-65 °C for 30 s for annealing, and 68 °C for 1.5 min for elongation. The PCR products were stored at 4 °C and subsequently purified from agarose gel-electrophoresis by following Qiaquick Gel Extraction Kit procedures. The purified PCR products were digested with restriction enzyme DpnI at 37 °C overnight using the following conditions: 1 μg PCR product, 2 μL DpnI (20 U), 3 μL 10X NEB buffer 4 for a total volume of 30 μL. The digested products were purified utilizing Qiaquick PCR Purification Kit.

To make electroporation competent BW25113 cells, transformants BW25113 were

grown in 100 mL LB-Amp media with 2 mM L-arabinose at 30 °C to an OD600 of 0.6,

32 incubated in ice-water bath for 15 min, and then centrifuged at 6 k rpm for 15 min at 4 °C.

The resulting cell pellets were resuspended in 18.75 mL ice-cold ddH2O, and spun down at

1500 rpm for 10 min at 4 °C. The cell pellets were washed three times with 11.25 mL, 5 mL, and 1 mL ice-cold 10 % glycerol separately, resuspended in 600 μL ice-cold 10 % glycerol, and 50 μL aliquots were dispensed into sterile microcentrifuge effendorf tubes.

Electroporation was performed by using 50 μL competent BW25113 cells and 400 ng digested PCR products at 2.5 kV with 25 μF and 200 ohm. 1 mL SOC and 1 mM L-arabinose was added into shocked cells, and shake at 200 rpm for 2 h at 37 °C. 400 μL incubated solution was spread onto LB-Kan Agar plates to select successful transformation. Viable colonies found on the LB-Kan plates were checked for successful knock out of ygiD via PCR.

ADP-heptose extraction from WBBO6. 0.5 mL overnight growth of WBBO6

E. coli Hep I and II knockout cells was inoculated into 1 L LB media containing 12 μg/mL of

tetracycline, and grown at 37 °C until OD600 reached 1.0. Cells were harvested at 6 k rpm for

15 min, and stirred in 20 mL 50 % ethanol for 1 h on ice. The mixture was then centrifuged at

5 k rpm for 10 min, and the resulting supernatant was placed in a Speedvac to remove the ethanol and reduce the sample volume to half. The remaining undissolved cell debris and high molecular weight compounds were removed by using Amicon 3 kDa cut off concentrators. To obtain purer ADP-heptose as a standard, the filtered aqueous solution was further purified through a HiPrep DEAE Sepharose Fast Flow column (16  320 mm, 64 mL).

33

The column was equilibrated with 2 CV of H2O. The filtered solution was loaded onto the

column at 3 mL/min after the equilibration, followed by a 1.5 CV of H2O wash. The

ADP-heptose was eluted by 5 CV of a linear gradient from 0 % to 52 % of 0.97 M triethylammonium bicarbonate (pH 8.0). The remaining compounds were washed off by

2 CV of 50 % 0.97 M triethylamine. 2 CV of H2O was used to wash off the triethylamine.

150 μL of each fraction was lyophilized twice to remove the triethylamine, and subsequently dissolved in 50 % acetonitrile in water to confirm the presence of ADP-heptose by ESI-MS.

Fractions containing ADP-heptose were combined, lyophilized and stored at -80 °C.

Differential analysis of knockout and wild type extractions. The E. coli wild type extraction of E. coli K12 mb1760 was performed using the same procedures as described for the extraction of ADP-heptose from WBB06. Both the ADP-heptose and E. coli wild type

extraction were lyophilized and dissolved in D2O. DQCOSY spectra were acquired using a

VARIAN INOVA 400 MHz NMR spectrometer with following parameters: 512 complex increments (ni) in F1, and 16 scans per increment (nt). Spectra were processed using either

VARIAN’s VNMR software or ACD/NMR Processor. Subtraction of E. coli wild type

DQCOSY spectrum from that of ADP-heptose extraction was prepared via ACD/NMR processor.

Results and Discussion

Annotated to be a dioxygenase, the actual function of YgiD is still veiled. The

34 similar structural folds of YgiD and LigAB suggest YgiD may catalyze a similar type of reaction as LigAB, or different type of reactions but with the same mechanism. However, the differences in their active sites and preliminary gene context analysis both advise YgiD may have a very different function with LigAB. Thus different assays have been applied to explore the substrate of YgiD and its metal content.

Expression and purification of putative YgiD dioxygenase. The gene ygiD was subcloned directly from E. coli K12 MB1760 into pTOM vector using NdeI and BamHI digestion sites for the expression of His-tagged enzymes. Successful purification by immobilized nickel affinity chromatography afforded about 200 mg purified proteins per liter

of culture. The protein was eluted using the elution buffer (consisting of 50 mM NaH2PO4

(pH 8.0), 300 mM NaCl, and 250 mM imidazole). The purity of the enzyme was confirmed by SDS-PAGE gels (Figure 3A). A single protein band was detected on the denaturing gel corresponding to about 32 kDa. The size exclusion chromatography, in which the molecules in solution were separated by their sizes, was used to investigate the oligomeric state of YgiD.

The smaller the protein, the faster it passes through the column. Different oligomeric states of protein will be displayed as separated peaks on the chromatogram because of their variable sizes. Both size exclusion column chromatography and the native gel experiments clearly indicated the YgiD was a monomer (Figure 3B). If YgiD was a dimer then it would elute around the position of BSA on the size exclusion chromatography.

35

A

37 25

B C Lysozyme 14.7 kDa std YgiD

66 45 31

BSA 66 kDA YgiD 32 kDA

Figure 3. A. Batch purification result of YgiD through a Sepharose Fast-Flow column charged with Ni2+ by step wash: protein loading flow through (F), binding step (B), washing step (W), eluting step (E), and stripping step (S). Numbers besides lane under standard (std) are indicating corresponding molecular weight (kDa). YgiD was eluted and shown in red box. Fractions are indicated above each well. B. Size exclusion column chromatography of 1 mL 8 mg/mL YgiD from Sepharose Fast-Flow column. Traces of standards BSA and Lysozyme were indicated with labels besides them. C. Native gel of YgiD eluted from size exclusion column.

Calorimetric Titration Curves. ITC experiments were exploited to determine the

metal binding information of YgiD. Upon the injection of ligand aliquots into the ITC sample

cell, heat was either taken up or evolved depending on the nature of the reaction.

Measurements consist of the time-dependent input of power required to maintain equal

36 temperatures between the sample and reference cells, causing signals return to baseline before the next injection. Then observations are plotted as the power needed to maintain the reference and sample cell at an identical temperature against the time.(86, 87) Two divalent metals Fe and Zn were chosen, because most extradiol dioxygenases contain iron and the crystal structure of YgD shows zinc. The binding of the divalent metal cations to

YgiD is exothermic, the downward peaks indicating heat released on injection of cations. The binding profile of Fe2+ binding to YgiD was shown in Figure 4A. It displayed a primary exothermic binding event, following by a sond weaker exothermic binding event. A two sites binding mode was applied to fit the curve. The exothermic association parameters were

A Time (min) B Time (min) 0 50 100 150 200 250 300 0 50 100 150 200 250 0.50 0.20 0.10

0.00 0.00 -0.10 -0.20 -0.50

-0.30

al/sec al/sec

礳 -0.40 -1.00 -0.50 -0.60 -1.50 -0.70 0.00 -6.00

-20.00 -8.00

-40.00 -10.00

-60.00

-12.00 KCal/MoleInjectant of

-80.00 KCal/Moleof Injectant -14.00 0.0 0.5 1.0 1.5 2.0 0.0 0.5 1.0 1.5 2.0 Molar Ratio Molar Ratio Figure 4. Calorimetric titration profiles of aliquots of 250 mM metal ions into 2.5 mM treated YgiD in 20 mM Tris, pH 7.0. A two sites binding mode carried out by Origin software supplied by MicroCal..

A. FeNH4SO4; B. ZnSO4

37

-1 summarized in Table 2. The first association constant K1 (2.03E8 M ) suggests a very tight

-1 binding. While the sond association constant K2 (4.79E6 M ) revealed a weaker but still

relatively strong binding. The two enthalpies -83.8 kcal/mol and -31.3 kcal/mol were higher

than the enthalpies for other protein metal binding events, which are generally range from

2-25 kcal/mol.(88-90) The entropy changes of -243 and -76 cal/mol/deg are comparable with

entropies from other protein metal binding.(91, 92) According to the crystal structure, there

are three metal binding sites at the YgiD monomer (Figure 5). It’s assumed that the stronger

binding event correlates with the interaction between the metal cation and the active site (the

middle metal in Figure 5), and the weaker exothermic event perhaps relates to the interactions

His22 Zn

Glu233 Zn

His57 His177 Zn

His177 His234 His234 His234

Figure 5. Interactions between bound Zn ions and YgiD residues, including integrated and enlarged graph.

38 between metal cations and one of the other two binding sites (the first, and third metal in

Figure 5).

The binding of Zn2+ to YgiD was found to be weaker than Fe2+. Generally the trends were similar, but more noise was observed from two exothermic binding events (Fig 4B). The enthalpy and entropy are located in normal range, (89, 91, 92) but the rmsd are in ridiculers

2+ 2+ large (Table 2). The main association constant K1 of Zn is much smaller than that of Fe , suggesting Zn2+ is not preferred by YgiD. This is consistent with the observation that dioxygenases tend to contain Fe2+ rather than Zn2+.(61, 93)

N1 K1 ΔH ΔS N2 K2 ΔH ΔS -1 -1 (M ) (cal/ (cal/mol (M ) (cal/ (cal/mol mol) /deg) mol) /deg)

Fe 0.676 2.03± -8.38± -243 0.631 4.79± -3.14± -74.6 ±0.01 0.65E8 0.97E4 ±0.01 1.53E6 0.21E4

Zn 3.46± 1.31± -9188± -12.0 0.56± 4.39± -1.13± 15.3 2.42 47E4 9886 24.6 87.5E11 983E4

Table 2. Exothermic association parameters of Fe2+ and Zn2+ with YgiD obtained from two sites binding mode.

UV-Vis Spectroscopy Characterization. UV-Vis spectroscopy is exploited to determine whether YgiD can react with PCA derivatives, since LigAB, an extradiol dioxygenase that is related to YgiD, uses PCA as the native substrate. PCA has a UV spectrum with two peaks at 256 and 292 nm (Figure 6). There was no peak shift after PCA was incubated with YgiD for about 10 h. The intensity change could not be considered as persuasive proof of reaction because PCA itself also increases intensity due to autooxidation.

39

These scans look different from PCA-LigAB incubation (data not shown), thus implying that

PCA is not the substrate of YgiD. Catechol has also been investigated in the same way, and no apparent spectral changes were observed (data not shown). Since YgiD failed to demonstrate any ability to utilize PCA or catechol, we hypothesize that either (1) the enzyme needs to be purified differently in order to maintain activity, or (2) that YgiD is not an extradiol dioxygenase, like LigAB.

A

B

Figure 6. A. Time dependent UV-Vis spectrum of PCA with YgiD. The curves were determined using 50 mM GTA buffer (pH 8.5) consisting of 50 mM 3,3-dimethylglutarate, 50 mM Tris, 50 mM

2-amino-2-methyl-1,3-propanediol, purified YgiD (about 20 μg), 100 μM Fe(NH4)2(SO4)2, and 100

μM PCA. YgiD was pre-incubated with Fe(NH4)2(SO4)2 for 10 min before the assay began, and wasn’t added into the mixture consisting of buffer and substrate until the first scan accomplished. The reaction was recorded with 21 scans with the range of 200-600 nm at 600 nm/min, with a 30 min interval between scans. B. UV-Vis spectrum of PCA autooxidation. The assay concentrations and parameters of

B are exactly the same with A, except ddH2O was used to substitute the enzyme.

40

Knockout Experiment of YgiD. Since PCA and catechol are not the substrate of YgiD, in an attempt to identify its natural substrate, a NMR difference experiment was performed using 2D DQCOSY NMR spectroscopy. If YgiD is an enzyme, a strain of E. coli that is deficient inYgiD should show differences with their counterparts from the wild type strain by

2D DQCOSY NMR spectroscopy because of the accumulation of substrate. (94) Attempts to generate a YgiD knockout strain were made according to Wanner’s paper,(23) in which the target gene has been replaced by a Kanamycin-resistance cassette. Gradient PCR was applied to obtain Kanamycin-resistance cassette from the vector pKD13 (Figure 7) However, generation of a YgiD knockout failed over multiple attempts. A YgiD knockout strain was ultimately purchased from the Coli Genetic Stock Center at Yale University.

Figure 7. Gradient PCR result of Kan cassette from pKD13 on a 1% Agarose gel. The object bands are restricted by red box.

41

Differential analysis of knockout and wild type extractions. As a control to ensure the ability to perform the experiment, a heptosyltransferase I (Hep I) deficient strain of E. coli

(WBB06) was used. A cell extract solution containing the natural substrate of Hep I,

ADP-D-glycero-D-manno-heptose, was prepared and analyzed by 2D DQCOSY NMR spectroscopy. The wild type strain E. coli mb 1760 strain was also been processed through the same procedures. Figure 8 was obtained by subtracting the DQCOSY spectrum of wild type extraction from that of WBB06 extraction. The yellow dots from the WBB06 extraction

Figure 8. Subtraction of DQCOSY spectrum of E. coli K12 mb1760 extraction from WBB06 extraction. The red dots are overlapping part. The yellow points correspond to the difference from WBB06 extraction.

42 spectrum are the ones which are not shown on the wild type extraction spectrum. However, there were more yellow points shown than expected from ADP-heptose alone. They may come from the background noise from the WBB06 extraction spectrum, or extra substance from the extraction process.

Conclusions and Future Directions.

While the protein sequence similarity suggests that it is most closely related to enzymes that are extradiol dioxygenases, like LigAB, this project failed to demonstrate any catalytic activity for YgiD. Interestingly, Fe was demonstrated to bind more tightly with YgiD than Zn which was observed to co-crystalize with the enzyme in its crystal structure.

Additional experiments need to be performed to demonstrate success with the control reaction for the NMR difference experiment, and once successful attempts could be made to perform the same experiment with wild-type E. coli and the YgiD knockout strain. Consider the possibility of YgiD as a regulatory protein, but regulatory proteins can be divided to two groups, gene regulatory proteins which bind with DNA and non-genetic regulatory proteins which generally bind with proteins. An experiment using microscopy could be exploited to determine whether YgiD is a genetic regulatory protein or not. Label the YgiD with the GFP tag, transfer the labeled protein back to the YgiD knockout cell whose chromosome was labeled with DAPI, and visualize the whole complex under the microscope. If YgiD binds with DNA, an overlapping of the two colors resulted from the two fluorophors could be

43 observed. If YgiD binds is a non-genetic regulatory protein, a protein complex immunoprecipitation experiment using YgiD antibody could be adopted.(95, 96)

44

Chapter 3 Expression and Purification of 3-methoxy gallic acid 3,4- dioxygenase

(DesZ) from Sphingomonas paucimobilis SYK-6

Lignin, which comprises 10-35 % of the dry weight in plant cell walls, is a low cost waste from the pulp and paper industry.(97, 98) Lignin degradation is an important area of research due to its role in nature as well as for commercial application.(99) Methods to optimize lignin degradation could provide access to lignin based products or allow enhancement of cellulose and hemicelluloses accessibility.(100) Currently, lignin brings enormous obstacles to the utilization of lignocelluloses for biofuels, and other commercial chemical feedstock production, due to its high molecular weight and the presence of various relatively stable carbon-to-carbon and ether links.(101) As a potentially valuable source of renewable aromatic chemicals, biodegradation of lignin to biofuel molecules could aid the biofuel production, provide a solution for the “food vs. fuel debate” risen from the first generation biofuel production technologies, and ensure overall sustainability and cost effectiveness of the biomass to biofuel production process.(102)

To microbiologically convert lignin to biofuels, three key steps are required: a

scalable process for lignin depolymerization, microbiological pathways that can efficiently

degrade the aromatic components obtained from the depolymerization, and the development

of the microorganisms that are capable of producing fuel-like molecules from available

carbon sources. The first and third steps of this process have been the focus of research in

both industrial and academic labs; the second step is one focus of the Taylor Lab.

45

The first step, the depolymerization of lignin, is being extensively pursued through either chemical or microbial processes.(103-105) Much progress has been achieved in the

chemical processes including the kraft process using NaOH and Na2S, the sulfite process utilizing acidic sulfite, the soda process using NaOH, the quatam process employing tetramethylammoniun hydroxide as digestion solutions, and several others.(106) The microbial degradation of lignin has been most extensively studied in white-rot (notably

Phanerochaete chrysosporium) and brown-rot fungi.(107) Several types of extracellular oxidative enzymes found in both types of fungi, such as laccases, lignin peroxidases, peroxidases, and versatile peroxidases, are involved in lignin degradation in plant cell walls.(108, 109) Besides the basidiomycetes, a number of bacteria have been reported to break down lignin.(110, 111) The extensive studies of lignin depolymerization provide abundant choices and possibilities for the further utilization of lignin to more valuable molecules.

For step three, the biosynthesis of fuel-like molecules from available carbon sources, research is ongoing toward the development of bacteria which efficiently convert sugar molecules into both ethanol and other higher molecular weight potential biofuels like butanol.(112-116) This includes the first generation of biofuel production technologies, such as biodiesel production from vegetable oils, bio-ethanol production from energy crops, and biogas produced by anaerobic digestion of biomass.(117-119)

46

Studies on microbial pathways which are capable of degrading lignin derived aromatic compounds (the second step mentioned above), and thus bridging the gap between the research areas of step one and step three are still rare.(120-123) Among them, a catabolic pathway in S. paucimobilis SYK-6 has been identified for funneling several lignin derived monomeric and dimeric components into the citric acid cycle compounds.(124) This pathway was determined on the basis of knock-out complementation studies and the mass spectrometric analysis of reaction products. Most of enzymes involved in the pathway are not fully characterized enzymologically or are just putatively assigned and completely uncharacterized. In the pathway (Figure 1),(124) compounds with a syringyl

(4-dydroxy-3,5-dimethoxyphenyl) moiety and guaiacyl (4-hydroxy-3-methoxyphenyl) moiety are degraded to syringate and vanillate respectively. Subsequently, syringate and vanillate are degraded into 3-O-methylgallate (3MGA) and protocatechuate (PCA), which are further converted to puruvate and oxaloacelate which are part of the citrid acid cycle.(125)

Masai and his colleagues investigated the role of DesZ in the pathway through analyzing the ability of desZ mutant strain of S. paucimobilis SYK-6 to grown on syringate.(125) They found that mutation in desZ alone didn’t affect the syringate degradation much, however, the desB desZ ligB triple disruption mutant and the desB ligB double disruption mutant completely lost the ability to grow on syringate. Meanwhile desB desZ disruption mutant has a similar growth rate on syringate as desB mutant. It is hypothesized

47 that the desB ligB double disruption mutant couldn’t grow on syringate, because of the accumulation of gallate which completely inhibits the growth of the SYK-6 strain on syringate completely. To confirm this hypothesis and with the functions of LigM and LigI already determined,(126, 127) the authors built a desB ligB ligM triple inactivation mutant which was able to grow on syringate. To further validate the role of DesZ, a ligM disruption but ligI insertion strain was built and able to grow on syringate, because this mutant closed the route from 3MGA to gallate and activated the route from 3MGA to 4-oxalomesaconate

(OMA) through 2-pyrone-4,6-dicarboxylate (PDC), thus allowing the mutant to grow. All of these factors indicated that there was a pathway converting 3MGA through

4-carboxy-2-hydroxy-6-methoxy-6-oxohexa-2,4-dienoate (CHMOD) to OMA catalyzed by

DesZ and a hydrolase. The reaction products are validated through GC-MS. The Km and Vmax values of DesZ for 3MGA were determined to be 210 ± 24 μM and 3.6 ± 0.2 U/mg, respectively.(128) Complete characterization of DesZ would allow for a full understanding of this pathway and deeper knowledge of the unknown dioxygenase superfamily.

This work is a continuation of the preliminary research by Ann-Marie Illsley toward obtaining DesZ for kinetic and crystallographic investigations.

48

COOH COSCoA COOH COOH COOH CHO COOH LigW2 FerA FerB2 LigY FerB LigW H3CO COOH H3CO O OH COOH OCH OCH OCH3 OCH3 3 3 OH OH OH OH OH OH 5CVA Ferulate Vanillin Vanillate LigZ LigM COOH COOH CH OH COOH 2 COOH CH OH CHO 2 O O H3CO OH O H CO OCH 3 3 OH OH OCH LigEFG OH 3 OH OH Syringate OH-DDVA OCH3 PCA LigX OCH3 OH OH LigAB DesA COOH COOH LigD COOH LigD2 COOH CHO CH2OH H3CO OCH3 HO H3CO OH CHO OH OH O HOOC OH OH 3MGA DDVA OCH3 CHMS LigAB LigM LigC DesZ OCH3 DesZ COOH OH COOH COOH spontaneous b-aryl ether HO OH O O COOH H3COOOC HO COOH OH PDC CHMOD Gallate

hydrolase LigAB LigI DesB

COOH COOH COOH

HOOC HOOC HOOC HO COOH O COOH HOOC OH enol form enol form keto form OMA

LigJ

HO COOH

HOOC O COOH CHA LigK

Puruvate+Oxaloacetate Figure 1. Proposed catabolic pathway for the degradation of lignin-derived aromatic compounds in S. paucimobilis SYK-6 (adapted from Ref. 22). Enzymes abbreviations: LigD, NAD-dependent -aryl ether dehydrogenase; LigE and LigF, -etherase; LigG, glutathionelyase; LigX, DDVA O-demethylase; LigZ, OH-DDVA diooxygenase. LigY, C-C hydrolase; LigW/LigW2, carboxyvanillate decarboxylase; LigC, 4-carboxy-2-hydroxymuconate-6-semialdehyde dehydrogenase; FerA, feruloyl CoA synthetase. FerB/FerB2, feruloyl CoA hydratase/lyase; LigAB, PCA 4,5-dioxygenase; LigM, tetrahydrofolate

(H4folate)-dependent O-demethylase; LigI, PDC hydrolase; LigJ, OMA hydratase; DesB, gallate dioxygenase.; LigK, 4-carboxy-4-hydroxy-2-oxoadipate aldolase/oxaloacetate decarboxylase; DesA, syringate O-demethylase; DesZ, 3MGA 3,4-dioxygenase. Abbreviations: 5CVA, 5-carboxyvanillate DDVA, 5,5’-dehydrodivanillate; CHMS,4-carboxy-2-hydroxymuconate-6-semialdehyde; CHA, 4-carboxy-4-hydroxy-2-oxoadipate; OH-DDVA, 2,2’-trihydroxy-3’-methoxy-5,5’-dicarboxybiphenyl;;

49

Materials and Methods

BL-21 AI cells were purchased from Invitrogen (Carlsbad, CA). pET-21b vector was obtained from Novagen (EMD4 Biosciences, San Diego, CA). Isopropyl

β-D-1-thiogalactopyranoside (IPTG) was bought from Gold Bio Technologies (St. Louis,

MO). B-PER II Bacterial Protein Extraction Reagent was purchased from Thermo scientific

(Rockford, IL). Ampicilliam (Amp), lysozyme, ferrous ammonium sulfate, sodium hydroxide,

EDTA, L-cysteine, tris-(hydroxymethyl)aminomethane hydrochloride (Tris-HCl), sodium dodecyl sulfate (SDS), 3-methoxygallic acid, glycine, Triton X, and manganese sulfate monohydrate were obtained from Sigma (St. Louis, MO). Tris-base and sodium chloride were purchased from Fisher Scientific (Pittsburgh, PA). Glacial acetic acid and isopropanol were purchased from Phamaco-AAPER (Brookfield, CT). 30 % acrylamide, ammonium persulfate

(APS), Coomassie Brilliant Blue R-250, and N, N, N’ N’-tetra-methylethylenediamine

(TEMED) were obtained from Bio-Rad (Hercules, CA). Bacto tryptone and Bacto yeast extract were obtained from Difco Laboratories (Lawrence, Ks). pTOM vector is a modified version of Novagen pET-15b vector in which the two Ser codons both before and after the

His-tag were mutated to His codons by Toomas Haller in John Gerlt’s lab, which encodes

Amp resistance, an N-terminal 10-His-tag and a thrombin protease cleavage site for the cleavage of the His-tag.

The Luria-Bertani (LB) media was prepared from 5 g/L sodium chloride, 10 g/L

50 tryptone, 5 g/L yeast extract, and 0.001 M sodium hydroxide. The concentration of Amp in all growth media was 100 μg/mL. All liquid growths were performed at 37 ˚C, with 200 rpm shaking speed unless otherwise indicated. The polyacrylamide protein gels were prepared using 30 % acrylamide, separating buffer (1.5 M Tris-HCl, pH 8.8), stacking buffer (0.5 M

Tris-HCl, pH 6.8), 0.01 mM EDTA, APS and TEMED. The protein running buffer was prepared using 0.1 % SDS, 192 mM glycine, and 25 mM Tris (pH 8.2). All SDS-PAGE protein gels were run at 100 V using a PowerPac Basic power supply from Bio-Rad. Then the protein gels were stained using Coomassie Brilliant Blue R-250 for 22 min and destained for

30 min.

An AKTA FPLC system (GE Healthcare, Piscataway, NJ) was used for protein purification. The extinction coefficients and predicted masses of all proteins were calculated based on the amino acid sequence using the web application ProtParam

(http://www.expasy.ch/tools/protparam.htmL). The cells were burst open using a French press cell (Aminco, Urbana, Ill.). Spectrophotometric assays were taken at a Varian Cary 100 Bio

UV-visible Spectrophotometer (Agilent Technologies, Santa Clara, CA).

Cloning of DesZ. The desZ DNA sequence was synthesized by DNA2.0 (Menlo Park,

CA), and provided in vector pJ201. It was cut out of pJ201 and ligated into pTOM vector with the NdeI and BamHI cloning sites by Julie Huang. Later the desZ was removed out of pTOM, subcloned into pET-21b in the NdeI and BamHI cloning sites, mutated out the stop

51 codons at the C-terminal, and transferred into BL-21 AI cells by Ann-Marie Illsley.

Soluble protein expression screening. The DesZ-pET21b with stop codons version

(shorten as DesZ-pET21b) was transferred into BL-21 AI cells and stored as frozen stock.

100 µL overnight growth media was inoculated into 10 mL LB-Amp media. After the OD600 reached 0.6, different concentration combinations of IPTG (1 mM and 0.1 mM) and

L-arabinose (0.2 %, 0.02 %, 0.002 %, and 0.0002 %) were added into the growth media to induce the protein production. The incubation temperature was changed to 30 °C. 1 mL aliquot of growth media was removed out at 0 h, 5 h, and 24 h intervals after induction. The aliquots were centrifuged at 11,000 rpm for 10 min at room temperature to save the cell pellets and stored at -20 °C. To extract the soluble proteins, the thawed cell pellets were lysed utilizing 150 µL B-PER II Bacterial Protein Extraction Reagent. After centrifugation at

11,000 rpm for 5 min, supernatants were saved as the soluble fractions. The remaining cell pellets were re-suspended in 150 µL B-PER II Bacterial Protein Extraction Reagent as insoluble fractions. Both the soluble and insoluble proteins were visualized on SDS-PAGE gels and compared. To determine the best growth condition, the soluble and insoluble proteins obtained from the exact same growth conditions except changing the incubation

temperature after induction from 30 °C to 37 °C and 16 °C, or adding 0.1 mM Fe(NH4)2SO4

or MnSO4 to induce were compared with each other.

Expression and Purification of DesZ (DesZ-pET21b 36.5 kDa, DesZ-C-pET21b

52

39.0 kDa) 16 mL overnight LB-amp growth of transformed DesZ-pET21b BL-21 AI cells was inoculated into 4 L LB-amp media. After the absorbance of the cell growth at 600 nm reached 0.6, 0.4 mL 1 M IPTG and 20 % L-arabinose were added into the growth media to make the final concentration of IPTG and L-arabinose to 0.1 mM and 0.002 % respectively to induce the cells. After 24 h incubation at 30 °C from induction, the cells were harvested at

6,000 rpm for 15 min. Resulting cell pellets were lysed in buffer consisting of 20 mM Tris,

0.1 mM Fe(NH4)2(SO4)2 and 10 mM L-cysteine (pH 8.0). The cells were passed three times through a French Press cell, followed by centrifugation at 11,000 rpm for 30 min at 4 ºC to remove the cell debris. The resulting supernatant was purified on an IEX DEAE Sepharose

Fast-Flow column (26  530 mm, 244 mL). The column was first equilibrated with 3 CV of lysis buffer at 1 mL/min. After the proteins were loaded onto the column, 1.5 CV lysis buffer was used to remove the unbound proteins. The column was eluted with a linear gradient of

0-50 % of eluting buffer (containing 20 mM Tris base, 0.1 mM Fe(NH4)2(SO4)2, 1 M NaCl, and 10 mM L-cysteine (pH 8.0)) over 6.5 CV, followed by 1.5 CV of 50 % eluting buffer to wash off all the remaining proteins. After the elution, the column was washed by 2 CV of

2+ H2O. 3 CV of 0.5 % Triton-X and 0.1 M acetic acid were utilized to wash off the Fe ion at

0.5 mL/min. 3 CV of 30 % isopropanol was adopted to wash off the acid. The column was

regenerated with 1 CV of 1 M NaCl, and the extra salt was washed off with 2 CV of H2O. All the buffers used during this purification were degassed. The fractions containing DesZ were

53 identified using SDS-PAGE, and combined.

The purification for DesZ C-terminal His-tag version (shorten as DesZ-C-pET21b) was described as below. 8 mL overnight LB-amp growth of DesZ-C-pET21b BL-21 AI cells was inoculated into 3 L LB-amp media. After the absorbance of the cell growth reached 0.6,

0.3 mL 1 M IPTG and 30 mL 20 % L-arabinose were added into the growth media to make the final concentration of IPTG and L-arabinose to 0.1 mM and 0.2 %. 0.1 mM

Fe(NH4)2(SO4)2 was also added to the induced cells. After 24 h incubation at 30 °C with shaking at 145 rpm, the cells were harvested at 6,000 rpm for 15 min. 3 L of non-His-tagged version DesZ-pET21b BL-21 AI cells were combined with the cell pellets of DesZ-C-pET21b, and the combined cell pellets were lysed in the buffer consisting of 20 mM Tris, 0.1 mM

Fe(NH4)2(SO4)2 and 10 mM L-cysteine (pH 7.0) with a small spatula tip worth of lysozyme.

The cells were passed three times through a French Press cell and followed by centrifugation at 11,000 rpm for 30 min at 4 ºC to remove the cell debris. The resulting supernatant was purified through an IEX DEAE Sepharose Fast-Flow column (26  530 mm, 244 mL). The column was first equilibrated with 3 CV of lysis buffer at 1 mL/min. After the proteins were loaded onto the column, 1.5 CV lysis buffer was used to remove the unbound proteins. The column was eluted with a linear gradient of 0-50 % of 6.5 CV eluting buffer (containing

20 mM Tris base, 0.1 mM Fe(NH4)2(SO4)2, 1 M NaCl, and 10 mM L-cysteine (pH 7.0)), followed by 1 CV of 50 % eluting buffer to wash off all the remaining proteins. After elution,

54 the column was washed by 2 CV of H2O. The fractions containing DesZ were identified using

SDS-PAGE based on size, confirmed the presence of DesZ by anti-His tag antibody through

Western-blot, combined and sent for Mass spectroscopy (36246 g/mol compared with expected mass 36530 g/mol).

UV-Vis spectroscopy. Spectrophotometric assays were taken using programs

“kinetics” and ‘”scanning kinetics” by a Varian Cary 100 Bio UV-visible Spectrophotometer at room temperature. Right before use, DesZ was buffer exchanged into freshly made buffer

(consisting 20 mM Tris base, 0.1 mM Fe(NH4)2(SO4)2, and 10 mM L-cysteine (pH 8.0)).

Each 1 mL kinetics assays was prepared with 2.24 mM 3-methoxygallic acid, 80 μL DesZ

(320 μg), DA buffer (50 mM Tris, 50 mM 2-amino-2-methyl-1,3-propanediol, and 50 mM

3,3-dimethylglutarate (pH 8.0)). DesZ was not added into the cuvette until a flat base line was obtained. For scanning kinetics assays, each 2 mL assay mixture contained purified DesZ

(about 250 μg), and 50 μM 3-methoxy gallic acid and DA buffer. DesZ wasn’t added into the mixture consisting of buffer and substrate after the first. The spectra were recorded with 21 scans with the range of 200-800 nm at 600 nm/min, and 30 min intervals between each scan.

Results and Discussion

As an important member in a catabolic pathway in S. paucimobilis SYK-6 for funneling several lignin derived monomeric and dimeric components into the citric acid cycle compounds, this study of DesZ is a continuation of the preliminary research by Ann-Marie

55

Illsley toward obtaining DesZ for kinetic and crystallographic investigations.

Soluble protein expression screening. After the DesZ gene was cloned into the pTOM vector by Julia Huang, it was found that little soluble protein was produced during protein expression screens (data not shown), and what little protein was expressed solubly failed to stick to a nickel affinity column to allow protein purification. Suspecting that the

N-terminal His6 tag might be causing the protein to misfold leading to the solubility problems, the DesZ gene was then subcloned into the pET21b vector, in which the stop codon prevents the His tag production, for new expression screening (Figure 2). Different combinations of temperatures and concentrations of inducing agents (IPTG and arabinose) and cell lines were

tried to get the optimal expression conditions. 0.1 mM Fe(NH4)2(SO4)2 or MnSO4 were also

0h std 5h 24h 0h e 5h 24h 0h 5h 24h 0h 5h 24h

Soluble

0.2 % 0.02 % 0.002 % 0.0002 %

0h 5h 24h std 0h 5h 24h 0h 5h 24h 0h 5h 24h

Insoluble

Figure 2. DesZ in pET21b vector expression screening. BL-21 AI cells with DesZ-pET21b were grown at 30 °C and induced by 0.1 mM IPTG and four different concentrations of arabinose (numbers under the red lines) separately. Aliquots were taken out at different time point as shown on top of each lane to check the protein solubility. The band corresponding to the DesZ molecular weight and showing the most soluble portion was circled in red.

56 added at induction to check whether metal ions would help soluble protein expression. It was shown that DesZ was produced most solubly when the cells were grown at 30 °C for 24 h after being induced by 0.1 mM IPTG and 0.002 % arabinose.

Purification of non-tagged Version of DesZ. Since there is no His tag attached to DesZ from DesZ-pET21b, the protein was purified through a DEAE Sepharose Fast-Flow column by salt gradient wash (Figure 3). The complicated chromatograph hardly gives hints to suggest the location of DesZ. SDS-PAGE gel analysis of column fractions did not immediately reveal the position of DesZ (Figure 4). Based on the predicted molecular weight of 36.5 kDa for DesZ, there were two possible locations of DesZ, and addition tests were needed to identify where DesZ elutes.

Figure 3. The chromatograph of non-tagged version of DesZ from DEAE column. Numbers besides the vertical lines represent the fractions. The UV 280 nm reading is the Y axis, and being denoted as the blue curve. The red curve shows the UV 254 nm reading, and the green line displays the supposed salt concentration. The actual salt conductivity is demonstrated as brown line. The column was eluted with a linear gradient of 0-50 % of 6.5 CV eluting buffer containing 20 mM Tris base, 0.1 mM

Fe(NH4)2SO4, 1 M NaCl, and 10 mM L-cysteine (pH 7.0), followed by 1 CV of 50 % eluting buffer to wash off all the left proteins.

57

L C P 6 7 8 9 10 11 12 13 L 14 15 16 18 19 20 21 22 23 24 L 25 26 27 28 29 3031

37 25

32 33 34 L 35 36 37 38 39 40 41 42 43 44 L 45 46 47 48 49 L L 50 51 52 53 54 55 56 57

Figure 4. The SDS-PAGE gels of the non-tagged version of DesZ from DEAE column. The fractions are labeled on top of each lane. L represents standards, and numbers besides the lane “L” in the third gel on the top indicate the corresponding molecular weights. The possible DesZ are circled in red.

UV-Vis Spectroscopy Characterization. The purer fractions from the above gels were used to check whether it can react with DesZ’s natural substrate 3MGA, as monitored by

UV-Vis spectroscopy (Figure 5). Compared with 3MGA autoxidation, there is an obvious

A B

Figure 5. A. Time dependent UV-Vis spectrum of 3MGA with DesZ. The curves were determined using 50 mM GTA buffer (pH 8.5) consisting of 50 mM 3,3-dimethylglutarate, 50 mM Tris, 50 mM

2-amino-2-methyl-1,3-propanediol, DesZ (about 20 μg), 100 μM Fe(NH4)2(SO4)2, and 100 μM 3MGA.

DesZ was pre-incubated with Fe(NH4)2(SO4)2 for 10 min before the assay began, and wasn’t added into the mixture consisting of buffer and substrate until the first scan completed. The reaction was recorded with 21 scans with the range of 200-600 nm at 600 nm/min, with a 30 min interval between scans. B. UV-Vis spectrum of 3MGA autoxidation. The assay concentrations and parameters of B are exactly the same with A, except ddH2O was used to substitute the enzyme.

58 intensity increment at 278 nm for the DesZ and 3MGA mixture, which may be the result of protein addition. The increasing differences at 340 nm for both pictures are almost the same.

No more other differences suggesting reaction happening could be observed. Therefore,

UV-Vis spectra are not evident enough to support the fact that the purified factions could react with 3MGA.

Purification of His-tagged version of DesZ. Due to the fact that activity screens of fractions didn’t allow confirmation of the production and location of non-tagged version of

DesZ off the column, a new purification method using C-terminal His-tagged DesZ was proposed (N-terminal His-tagged DesZ was not soluble). The stop codons between the DesZ gene and the His tag in DesZ-pET21b were mutated to two serines for the continuing production of the His tag at the C-terminal. The C terminal tagged version of DesZ was over-expressed together with the same of the non-tagged version. The chromatograph was very similar with the one of non-tagged version (Figure 6), giving complicated gel bands

(Figure 7). The western-blot using His tag antibody confirmed the appearance and accurate position of the C terminal His tagged DesZ (Figure 8).

59

Figure 6. The chromatograph of C terminal tagged version of DesZ from the DEAE column. Numbers besides the vertical lines represent the fractions. The UV 280 nm reading is the Y axis, and being denoted as the blue curve. The red curve shows the UV 254 nm reading, and the green line displays the supposed salt concentration. The actual salt conductivity is demonstrated as brown line. The column was eluted with a linear gradient of 0-50 % of 6.5 CV eluting buffer containing 20 mM Tris base, 0.1 mM Fe(NH4)2SO4, 1 M NaCl, and 10 mM L-cysteine (pH 7.0), followed by 1 CV of 50 % eluting buffer to wash off all the left proteins. L C P 6 12 13 1415 17 18 21 L 23 24 2526 27 28 29 30 3132 L 3334 35 3637 38 39

40 41 42 L 43 44 45 46 47 48 4950 51 52 L 53 54 55 56 68

Figure 7. The SDS-PAGE gels of the C-terminal tagged version of DesZ from the DEAE column. The fractions are labeled on top of each lane. L represents standards. The possible DesZ bands are circled in red box.

60

L C P 6 31 32 33 34 35 36 37 L 38 39 40 41 42 43 44 45

Figure 8. The western-blot of the C-terminal tagged version of DesZ off the DEAD column. The fractions are labeled on top of each lane. L represents standards.

Conclusions and Future Directions.

As an important member in the lignin-derived aromatic compounds degradation pathway in S. paucimobilis SYK-6, further efforts to obtain DesZ will have to be pursued.

The N-terminal His tag causes solubility issues, while the non-tagged version of DesZ brought purification verification problems. Finally, western-blot noticeably helps confirm the existing of C-terminal His tagged DesZ. We believe that the anaerobic purification of DesZ will significantly increase the activity of DesZ, thus shed light on the fully characterization of

DesZ.

61

Chapter 4 Other putative homologs in the PCAD superfamily

To identify other possible members in the PCAD dioxygenase superfamily, a psi-BLAST search using the LigB protein sequence from Sphingomonas paucimobilis SYK-6 as the query sequence has been conducted. LigB homologs have been identified in the genomes of about 1274 different organisms, and some of them have multiple hits in a single organism. From among these thousands of homologs, eight proteins (those not highlighted in yellow in Figure 1) have been selected, and cloned. Three of them have been expressed, and purified (colored in magenta). The selected proteins only have less than 15 % pair-wise identity to any previously identified members of the PCAD superfamily. Identification of new superfamily members, especially the ones at the edge of superfamily, will likely reveal members with diverse substrate specificity and will allow further understanding of the overall characteristics of this PCAD superfamily.

1 2 3 4 5 6 7 8 9 10 11 B. bronchiseptica 1 100 48.8 43.2 34.8 35.1 13.3 16.1 13.7 10.5 9.8 10.9 P. syringae 2 48.8 100 47 39 38.2 12.8 14.9 16 12.6 14.3 14.6 C. violaceum 3 43.2 47 100 38.6 37.1 14.1 14.8 12.7 9.9 10 9.5 B. cereus 4 34.8 39 38.6 100 37.8 11.1 13.7 12.4 11.8 11.5 12.7 E. coli K12 YgiD 5 35.1 38.2 37.1 37.8 100 9.6 13.9 12.6 10.7 12 11.9 DesZ 6 13.3 12.8 14.1 11.1 9.6 100 22.7 22.4 11.3 12.1 10.5 DesB 7 16.1 14.9 14.8 13.7 13.9 22.7 100 38.3 11.8 9.6 8.7 LigB 8 13.7 16 12.7 12.4 12.6 22.4 38.3 100 11.6 9.4 11.6 S. solfataricus 9 10.5 12.6 9.9 11.8 10.7 11.3 11.8 11.6 100 70.2 68.2 M. sedula 10 9.8 14.3 10 11.5 12 12.1 9.6 9.4 70.2 100 66.2 S. acidocaldarius 11 10.9 14.6 9.5 12.7 11.9 10.5 8.7 11.6 68.2 66.2 100 Figure 1. Pair-wise sequence identity determined for comparison of PCA dioxygenase superfamily members. Previously identified enzymes (indicated by yellow) are included for comparison to new putative superfamily members with no known functions. Enzymes which are cloned are highlighted in green; enzymes which have been purified are highlighted in magenta. The pair-wise identity of proteins in this project to the three previously identified PCA dioxygenases superfamily members is indicated by the black box.

62

An amino acid sequence alignment was performed using CLUSTALW, to explore sequence similarity of these putative dioxygenases (Figure 2). Instead of the consensus

2-His-1-carboxylate facial triad for metal binding, a three histidine metal coordination sphere is found in the active site of the selected enzymes, similar to the motif found in some Cupin

B. cereus ------M M P S L F L A H G S P M L A I Q D T / K P K A I V I YgiD ------M T P L V K D I I M S S T R M P A L F L G H G S P M N V L E D N / R P Q A I V V P. syringae ------M F P S L F I S H G S P T L A L E P G / R P R A I V M B. bronchicepticaM G F I A G L K I P T I S H P Y L P E V A M R L P T L F V S H G S P M L A V E P G / R P R A V L V C. violaceum ------M T T R Q P S L F I S H G A P T L P I E D S / R P R A I V I M. sedula - - M K R R P A V A G S F Y E D D S A Q L R K R I E W A F H H P I G P G G I P S V / H A G Y I Y S S. solfataricus - - M R R L P A V A G S F Y E S D P K K L K M Q I E W S F R H N I G P R D I P K Q / H A G Y I Y S S. acidocaldarius- - M I R I P A V A G S F Y E A D P V R L R K Q I E W S F L H D L G P K S L P S V / H A G Y M Y S LigB ------M A R V T T G I T S S H I P A L G A A I Q T / P I R D W I K

B. cereus F T ------A H W E S / I Y D F G G F P P E L Y E I K Y R A K - G S S S I A S M L YgiD V S ------A H W F T / I H D F G G F P Q A L Y D T H Y P A P - G S P A L A Q R L P. syringae V S ------A H W E S / W H D F G G F A A E L F A M Q Y P A P - G L P E L T R T I B. bronchicepticaV S ------P H W M G / W H D F G G F P A D L Y Q L Q Y P A E - G S P R L A Q Q V C. violaceum V S ------A H W E A / M Y D F G G F P P V L R T L R Y P A A - T D A A L A E R V M. sedula G P V A ------A H S Y Y / G P N H T G Y G S Q V S I W P G G D W E T P L G S A Q V N S. solfataricus G P V A ------A H S Y Y / G P N H T G L G S Y V S A W P K G E W E T P L G S V K V D S. acidocaldariusG P V A ------A H A Y Y / G P N H T G L G S Y V S I W H K G K W K T P L G E V S V D LigB Q P G N M P D V V I L V Y N D H A S A / A E T F K P A D E G W G P R P V P D V K G H P D L A W H I

B. cereus E / N - M T R G L D ------H G S W T L L H R M Y P E A N I P V V / F L S A - - - K E YgiD V / D K E A W G F D ------H G S W G V L I K M Y P D A D I P M V / S K P A - - - A W P. syringae V / D - S R R P F D ------H G V W V P L S L M Y P Q A D I P V V / R Q G P - - - A L B. bronchicepticaV / D - A R R P L D ------H G A W V P L R Y L Y P H A D L P V V / E R D A - - - R G C. violaceum I / T - E E R P L D ------H G M W T P L M L M F P Q A D I P L V / Q A G V - - - A E M. sedula A / E V V D I D E K - - - - - A H L Y E H S I E V Q L P F L Q Y F F D N - L S / L M Q T P E I A E S. solfataricus E / E V I D L E E K - - - - - S H L Y E H S I E V Q L P F L Q Y F F D D N F K / M M Q T P E I A E S. acidocaldariusD / E I I D I D E R - - - - - A H L Y E H S I E V Q I P F L Q Y L F G Q N F K / M M Q T P D V A E LigB A / - - I L D E F D M T I M N Q M D V D H G C T V P L S M I F G E - - - P E E / P F P V N V V T Y

B. cereus Q F K I G E A L K G L ------G Q E D I L V I G S G V T V H N L R A L K W N / W A YgiD H F E M G R K L A A L ------R D E G I M L V A S G N V V H N L R T V K W H / W A P. syringae Q T Q V G R A L A G L ------R E Q G V L V I G S G S I T H N L Y D L D W S / W A B. bronchicepticaQ Y A L G Q A L A P L ------R D D G V L V I G S G S L T H N L R D V R M P / Y V C. violaceum H Y A L G Q A L K P L ------R D D N V L V I G S G S Y T H N L W E L S - P / W A M. sedula F V A E G I W R F I Q R H ------S D K D I V V L A S S D L N H Y D P H D V T M / K I S. solfataricus F L A D A I Y K V I Q K Y ------S D K D I V V L A S S D M N H Y D P H E I T M / K I S. acidocaldariusS L A E G I Y K L V S - S ------G K K D I V V L A S S D L N H Y E P H D K T I / E I LigB P P P S G K R C F A L G D S I R A A V E S F P E D L N V H V W G T G G M S H Q L Q G P R A G / N F

B. cereus I E / P H A Q L A V P R A E H F V P L F I A M G S - - G E N S G E V I H R S Y E L G T L S Y L C / YgiD T S / G - G T L S N P T P E H Y L P L L Y V L G A W D G Q E P I T I P V E G I E M G S L S M L S / P. syringae A E / P H A V R A H P S D E H L L P L Y F A R G A G G - - A F S - I A Y Q G F T M G A L G M D I / B. bronchicepticaA P / P G A A R A H P H D D H L M P L Y V A L G A G G - - M P A R R L N D E V A Y G A L A M D A / C. violaceum A D / P Y P R E N H P S D E H F Q P L L V A L G A S D D D A V P V K L H D D W R M G S L S M A C / M. sedula Q D / K K M H K K P Y I L K H - - - A T S G ------D T S G D K S S V V G Y L A T R F / S. solfataricus Q Q / K K L G K K A Y I L K H - - - A T S G ------D T S G P K D S V V G Y L A A R F / S. acidocaldariusQ K / K K L G K K P Y V L R H - - - A T S G ------D T S G D K S S V V G Y L S V R F / LigB I D / S E G V ------E L V M W L I M R G A L P E K V R D L Y T F Y H I P A S N T A L G A /

Figure 2. Sequence alignment with selected LigB homologs and two structurally characterized enzymes---LigB and YgiD. Residues highlighted in red are ligands coordinated with metal ions (for both LigB and YgiD), in green are other conserved residues in the active site of LigB and YgiD, in blue are 60-80 % conserved in LigB superfamily provided by Pfam,(129) in yellow are 90 % conserved in LigB superfamily,(130) and in magenta are highly conserved in this alignment.

63 superfamily members. Also two histidines and one leucine (green highlighted in Figure 2) which are conserved residues in the active sites of LigB and YgiD may help in substrate binding. A consensus sequence GXPXLA (yellow highlighted) has been found, whose function is yet unknown, but be believed to help in new members identification in sequence alignment. Overall, sequences of orthologous from M. sedula, S. acidocaldarius, and S. solfataricus are more similar to each other than counterparts from other strains, likely due to the fact that they are all Archaea.

The rest of the selected organisms are bacteria, none of these gene are in a context surrounded by genes related to lignin degradation. It’s reasonable to deduce that these PCAD superfamily members may have more diverse functions. Bordetella bronchiseptica RB50 (B. bronchiseptica) is a small Gram-negative bacterium that causes chronic respiratory infections in a wide range of animals like pigs, and rabbits, but rarely infects humans.(131) Current research involving B. bronchiseptica are primarily focused on vaccine development.(132)

Pseudomonas syringae pv tomato DC3000 (P. syringae) is a Gram-negative plant pathogen that causes bacterial speck on tomato, and is an important model in molecular plant pathology.(133) Bioavailable iron for plants was shown to be strongly associated with a variety of pathways in this bacterium. Many previously uncharacterized genes were believed to be related with iron regulation.(134)

64

Bacillus cereus ATCC 10987 (B. cereus) is a non-lethal dairy Gram-positive bacterium which was isolated from a study on cheese spoilage.(135) Some unique metabolic mechanism capabilities like urease and xylose have been identified, but the bacterium can’t utilize nitrite and nitrate as other B. cereus substrains.

Chromobacterium violaceum ATCC 12472 (C. violaceum) is a gram-negative bacterium that grows in soil and water.(136) It’s found that human and animal infection caused by C. violaceum is rare, but it is associated with a high mortality rate when it happens.

It characteristically produces violacein, a water-insoluble purple pigment with antibacterial activity. It’s also has potential for use in the production of other compounds of biotechnological and medical interest such as cyanide and chitinase.(137)

Sulfolobus solfataricus (S. solfataricus), Sulfolobus acidocaldarius (S. acidocaldarius), and Metallosphaera sedula (M. sedula) are hyperthermophilic microorganisms belonging to the Archaea.(138) Hyperthermophilic Archaea are worthy of attention because of their capacity to survive and reproduce at temperatures near the boiling point of water, and are of extreme biotechnological interest not only because of the exceptional stability of their proteins, but also because of their unique enzymatic activities.

Many genes found within this domain are unique to Archaea.(139)

Based on above information, all of the selected bacteria are sources of disease caused in human, animals or plants. None of the organisms, including the three Archaea, are

65 obviously related to lignin degradation. The putative dioxygenases from these bacteria are likely to utilize various substrates, or catalyze different types of reactions. Therefore, the characterization of these remote homologs may reveal various functional and structural diversities, and gain more insight into the structural and mechanism consensus of the whole

PCAD superfamily, thus clarifying the boundaries.

Materials and Methods

All the genomic DNA and bacterial cells lines were purchased from ATCC (Manassas,

VA). All the primers (Table 1), E. coli strains DH5α, BL-21, and BL-21 AI and AccuPrime pfx SuperMix polymerase were purchased from Invitrogen (Carlsbad, CA). NdeI, BamHI,

XhoI, DpnI, Taq DNA Polymerase with Standard Taq Buffer and Quick Ligation Kit were obtained from New England Biolabs (Ipswich, MA). Isopropyl β-D-1-thiogalactopyranoside

(IPTG) was bought from Gold Bio Technologies (St. Louis, MO). DNAzol was obtained from

MRC Inc. (Cincinnati, OH). B-PER II Bacterial Protein Extraction Reagent was purchased from Thermo scientific (Rockford, IL). Ampicillin (Amp), L-arabinose, tris-(hydroxymethyl)aminomethane hydrochloride (Tris-HCl), ferrous ammonium sulfate, sodium hydroxide, EDTA, ethidium bromide, and sodium dodecyl sulfate (SDS) were obtained from Sigma (St. Louis, MO). Sodium biphosphate, imidizol, and sodium chloride are purchased from Fisher Scientific (Pittsburgh, PA). Agarose and Shrimp Alkaline

Phosphatase (SAP) were obtained from Promega (Madison, WI). 30% acrylamide,

66 ammonium persulfate (APS), Coomassie Brilliant Blue R-250, and N, N, N,

N’-tetra-methylethylenediamine (TEMED) were obtained from Bio-Rad (Hercules, CA).

Bacto tryptone, tryptic soy broth, nutrient broth and Bacto yeast extract were obtained from

Difco Laboratories (Lawrence, Ks). Qiaquickk Gel Extraction kit and QIAprepSpin Mini-prep

Kit were obtained from Qiagen (Valencia, CA). QuikChange Lightning Site-Directed

Mutagenesis Kit was purchased from Agilent technology (Santa Clara, CA). All DNA sequencing results were obtained from ATCG (Wheeling, IL).

pET-15b and pET21b vectors were obtained from Novagen (Madison, WI). pTOM vector is a modified version of Novagen pET-15b vector in which the two Ser codons both before and after the His-tag were mutated to His codons by Toomas Haller in John Gerlt’s lab in Biochemistry Department at University of Illinois at Urbana-Champaign. The modified pTOM vector has an NdeI/XhoI/BamHI cloning site and encodes Amp resistance, an

N-terminal 10-His-tag and a thrombin protease cleavage site for the cleavage of the His-tag. pET-15b and pTOM vector are both digested with the same endonucleases with corresponding inserts’ restriction sites, following a Shrimp Alkaline Phosphatase (SAP) at

37 ˚C for 1 h.

The Luria-Bertani (LB) media was prepared from 5 g/L sodium chloride, 10 g/L tryptone, 5 g/L yeast extract, and 0.001 M solidum hydroxide. The super broth media was prepared using 5 g/L sodium chloride, 32 g/L tryptone, 20 g/L yeast extract, and 0.001 M

67 sodium hydroxide. The SOC media is made of 20 g/L Bacto Tryptone, 5 g/L Bacto Yeast

Extract, 10 mM NaCl, 2.5 mM KCl, 10 mM MgCl2, 10 mM MgSO4, 20 mM glucose. The concentration of Amp in all growth media was 100 μg/mL. All liquid growths were performed at 37 ˚C, with 200 rpm shaking speed unless otherwise indicated. The 1 % agarose gel for electrophoresis was prepared using 10 mg/mL agarose, 50 mL TAE electrophoresis buffer (including 40 mM Tris, 20 mM glacial acetic acid, and 1 mM EDTA (pH 8.0)), and

0.1 mg/mL ethidium bromide. The polyacrylamide protein gels were prepared using 30 % acrylamide, separating buffer (1.5 M Tris-HCl, pH 8.8), stacking buffer (0.5 M Tris-HCl, pH 6.8), 0.01 mM EDTA, APS and TEMED. The protein running buffer was prepared using

0.1 % SDS solution. All DNA agarose gels and SDS-PAGE protein gels were run at 100 V using a PowerPac Basic power supply from Bio-Rad. Then the protein gels were stained using Coomassie Brilliant Blue R-250 for 22 min and destained for 1 h.

The gradient PCR reactions were performed using a MJ Reserch PTC-200 Peltier

Thermal Cycler and (Bio-rad, Hercules, CA). The colony PCR reactions were utilized AB

Applied Biosystems Gene Amp PCR system 9700 (Life Technologies, grand Island, NY) or a

S1000 Thermal Cycler. The extinction coefficients and predicted masses of all proteins were calculated based on the amino acid sequence using the web application ProtParam

(http://web.expasy.org/protparam/). The cells were lysed using a French press cell (Aminco,

Urbana, Ill.). The proteins from Chromobacterium violaceum ATCC 12472 and Pseudomonas

68 syringae DC3000 were purified using an AKTA FPLC system.

Pseudomonas syringae DC3000 (PSPT_1776)

Cloning of putative PSPT_1776 dioxygenase (705 b.p.). The genomic DNA of

P. syringae was purchased from ATCC, dissolved in 10 µL H2O, and stored in a 40 % glycerol solution at -80 °C. A gradient PCR using 22.5 μL AccuPrime SuperMix, 1 μL genomic DNA (38.4 ng), 0.75 μL primer (750 ng each, Table 1) was adopted to isolate the putative dioxygenase. The reaction mixture was incubated at 95 °C for 5 min to denature the

DNA, followed by 35 cycles consisting of 95 °C for 15 s, 45-65 °C for 30 s for annealing, and 68 °C for 1 min for elongation, finally 68 °C for 5 min after the final cycle. The PCR products at the right size in the 1 % Agarose gel were isolated to extract the DNA by following the Qiaquickk Gel Extraction kit protocol. The gel extraction product was incubated overnight with NdeI endonuclease at 37 °C with the following recipe: 1 μg PCR product, 2 μL NdeI (40 U), and 3 μL 10X NEB buffer 3 for a total volume of 30 μL. The NdeI digested products were followed by 2 h incubation with 2 µL BamHI (40 U), 0.6 µL 10X

NEB buffer 3, and 0.36 µL BSA to make a total volume of 36 µL. The digested products were

checked by 1 % Agarose gel and purified utilizing Qiaquick PCR Purification Kit. The endonuclease digestion product was ligated successfully with pET-15b and pTOM vectors via

Quick Ligation Kit with a 3 to 1 insert to vector concentration ratio, and then transformed into DH5α cells.

The LB-Amp plates were inoculated with the transformation products and incubated

69 overnight at 37 ˚C. A single colony from the plates was used to inoculate 20 mL LB-Amp media. After overnight growth, 4 µL culture was used for colony PCR using 5 µL 10X standard Taq reaction buffer, 1 µL 10 mM dNTPs, 1 µL each T7 promotor and terminator

(40 ng), 1 µL growth media, 0.25 µL Taq DNA polymerase to make a total volume of 50 µL to check the insert’s size. The remaining cells were harvested by centrifugation at 4000 rpm for 20 min. The plasmid DNA was extracted via QIAprepSpin Mini-prep Kit. Mini-prep protocol. 10 μL products were sent to sequencing. Another 5 μL pET-15b and pTOM-gene constructs (ps-pET15b and ps-pTOM respectively) were transformed into BL-21 AI cells.

The transformation product was plated onto LB-Amp plates and incubated overnight at 37 °C.

A single colony from the plates was used to inoculate in 20 mL LB-Amp media to make frozen stock.

Soluble protein expression screening. 200 μL LB-Amp overnight growth media of ps-pET15b was used to inoculate 20 mL Super Broth-Amp media. After incubation for about

4 h at 37 ˚C, the absorbance of the new growth at OD600 reached 0.4375. Different concentration combinations of IPTG (1 mM and 10 mM) and L-arabinose (0.2 %, 0.02 %,

0.002 %, and 0.0002 %) were added into the growth media to induce the protein production.

Then 1 mL aliquots were removed at 0, 5, and 24 h after induction. The aliquots were centrifuged down at 11,000 rpm for 10 min at room temperature and the resulting cell pellets were stored at -20 °C. To extract the soluble proteins, the thawed cell pellets were lysed

70 utilizing 150 µL B-PER II Bacterial Protein Extraction Reagents. After centrifuged at 11 k rpm for 5 min, supernatants were removed from the cell pellets. The cell pellets were suspended in 150 µL B-PER II Bacterial Protein Extraction Reagents to allow analysis of the insoluble fraction. Both the soluble and insoluble protein fractions were visualized on

SDS-PAGE gels and compared. To determine the best growth condition, the soluble and insoluble protein fractions were analyzed from growth with 37 °C, 30 °C or 16 °C incubation after induction.

Expression and purification of putative PSPT_1776 dioxygnase (29.9 kDa). 4 mL overnight superbroth-amp growth of ps-pET15b BL-21 AI cells frozen stock was inoculated

into 4 L superbroth-amp medium with 0.1 mM Fe(NH4)2(SO4)2. After the absorbance of the cell growth reached 0.6, 40 mL of 1 M IPTG and 0.4 mL of 20 % L-arabinose were added into the growth media to make the final concentrations of IPTG and L-arabinose to 10 mM and 0.002 % to induce the cells. The cells were harvested by centrifugation at 5,000 rpm for

15 min after 24 h growth at 16 °C from induction. Resulting cell pellets were lysed in FE

buffer (consisting of 50 mM Tris-HCl, 0.1 mM Fe(NH4)2SO4, 10 % glycerol and 2 mM

L-cysteine (pH 8.0)). The cells were passed three times through a French Press cell followed by centrifugation at 11,000 rpm for 30 min at 4 ºC to remove the cell debris. The resulting supernatant was loaded onto an IMAC Sepharose Fast-Flow column (26  60 mm, 44 mL) charged with Ni2+. The column was first equilibrated with 2 CV of Fe buffer at 2 mL/min.

71

After the proteins were loaded onto the column, 2 CV of binding buffer (containing 20 mM

Hepes, 0.5 M NaCl, and 20 mM imidazole) was used to remove the unbound stuff. The column was eluted with a linear gradient of 0-50 % of 25 CV of eluting buffer (containing

20 mM Hepes, 0.5 M NaCl, and 0.5 M imidazole (pH 8.0)), followed by 6 CV of 100 % eluting buffer to wash off all the remaining proteins. After elution, the column was washed by

6 CV of stripping buffer (consisting of 0.2 mM EDTA and 0.5 M NaCl), followed by 6 CV of

H2O. The column was consequently washed with 2 CV of 2 M Urea, 1 CV of H2O at

1 mL/min, and 6 CV of H2O at 2 mL/min. 2 CV of 0.5 % Triton-X and 0.1 M acetic acid were utilized to wash off the Fe2+ ion. 4 CV of 80 % ethanol was used to wash off the acid at

1 mL/min. Then the column was washed with 2 M NaCl at 0.5 mL/min, followed by 1 CV

H2O at 1 mL/min, and 6 CV of H2O at 2 mL/min. 2 CV of 30 % isopropanol was adopted to

clean the column, followed by 1 CV of H2O at 1 mL/min, and 6 CV of H2O at 2 mL/min. The

column was regenerated with 3 CV of 0.2 M NiSO4, and washed off the extra metal ion with

6 CV of binding buffer. The fractions containing the target protein were identified using

SDS-PAGE, and combined.

The size exclusion chromatography was performed using a HiLoad 16/60 Superdex

200 prepgrad column (16  600 mm, 120 mL) in Prof. Rich Olson’s lab at Wesleyan

University. Before loading the protein samples, the column was washed with 40 mL degassed deionized water at 1 mL/min, 45 mL degassed 20 % ethanol at 1.5 mL/min, and equilibrated

72 with 200 mL degassed equilibration buffer (containing 20 mM Tris (pH 7.4), and 150 mM

NaCl). 1 mL of about 2 mg/mL Ni affinity column purified PSPT_1776 dioxygnase was spun at 15,000 rpm for 10 min to remove aggregates, filtered, and loaded onto the column with a superloop. The column was eluted with 30 mL equilibration buffer. The identified fractions were combined, and subjected to MALDI mass analysis (29260 g/mol compare with the expected mass 29911.8 g/mol).

Bacillus cereus ATCC 10987(BCE_1944)

Cloning of putative BCE_1944 dioxygenase (757 b.p.) After rehydration with 1

μL Tryptic Soy Broth (TSB) medium made from dissolving 30 g TSB powder into 1 L H2O, the B. cereus cells were grown on a TSB plate at 30 °C for 24 h, followed by liquid overnight growth in 20 mL TSB, stored in a 40 % glycerol solution in -80 °C, and then inoculated into

500 mL TSB for 24 h. Cells were harvested by centrifugation at 4,000 rpm for 20 min. The genomic DNA was extracted by following DNAZol protocol. Then a phenol-chloroform extraction (phenol: chloroform: isoamylalcohol= 25:24:1) was adopted to purify 200 μL of genomic DNA. Gradient PCR was conducted for gene amplification using using 22.5 μL

AccuPrime SuperMix, 1 μL genomic DNA (212 ng), 0.75 μL primer (750 ng each, Table 1).

The reaction mixture was incubated at 95 °C for 5 min to denature the DNA, followed by 35 cycles consisting of 95 °C for 15 s, 45-65 °C for 30 s for annealing, and 68 °C for 1 min for elongation, finally 68 °C for 5 min after the final cycle. The PCR products at the right size in

73 the 1 % Agarose gel were isolated to extract the DNA by following the Qiaquick Gel

Extraction kit protocol.

Table 1 Sequences of the olignucleotides used in the study. Restriction sites are underlined. The mutation were indicated in red.

Oligo name Sequence (5’-3’)

P. syringae Forward GGAGCCCCATATGTTTCCCAGC

Reverse CTGGATCCTCAGTCGAACCGATAG

B. cereus Forward GGAGGTTCTCGAGATGATGCCATCAC

Reverse CATCCTTGGATCCTTAAAATTGGAGACAAAG

ATAAC

C. violaceum Forward GCAGGCAACAGCATATGAC CACCC

Reverse GCGGATCCTCAGTCGAAGCGCCAG

S. solfataricus Forward GAGATGAGGCTTGTCATATGAGGAGATTACCAG

Reverse GAAACGGATCCTTAACTTCCAAACCGCGCTGC

DesB CGGTAGCCGGCAAACAGTCCTCTGATCCGAATT

QuikChange Forward CGAGCTCCGT CGACAAGC

GCTTGTCGACGGAGCTCGAATTCGGATCAGAGG

Reverse ACTGTTTGCCGGCTACCG

T7 promoter TAATACGACTCACTATAGG

T7 terminator GCTAGTTATTGCTCAGCGG

Chromobacterium violaceum ATCC 12472 (CV3550)

Cloning of putative CV3550 dioxygenase (767 b.p.). After the rehydration with 1 μL

Nutrient broth media made from dissolving 8 g nutrient broth powder in 1 L H2O, the

CV3550 cells were grown on a Nutrient broth plate at 26 °C for 24 h, followed by liquid

74 overnight growth in 20 mL nutrient broth, stored in a 40 % glycerol solution in -80 °C, and then inoculated into 500 mL nutrient broth media for 24 h. Cells were harvested by centrifugation at 4000 rpm for 20 min. The genomic DNA was extracted by following

DNAZol protocol. Then a phenol-chloroform extraction (phenol: chloroform: isoamylalcohol= 25:24:1) was adopted to purify 200 μL of genomic DNA. Gradient PCR was conducted for gene amplification using 22.5 μL AccuPrime SuperMix, 1 μL genomic DNA

(106 ng), 0.75 μL primer (750 ng each, Table 1). The reaction mixture was incubated at 95 °C for 5 min to denature the DNA, followed by 35 cycles consisting of 95 °C for 15 s, 45-65 °C for 30 s for annealing, and 68 °C for 1 min for elongation, finally 68 °C for 5 min after cycles end. The PCR products at the right size in the 1 % Agarose gel were isolated to extract the

DNA by following the Qiaquick Gel Extraction kit protocol. The gel extraction product was incubated overnight with XhoI endonuclease at 37 °C with the following recipe: 1 μg PCR product, 2 μL XhoI (40 U), 3 μL 10X NEB buffer 4, and 0.3 µL BSA for a total volume of

30 μL. The XhoI digested products were checked by 1 % Agarose gel and purified utilizing

Qiaquickk Gel Extraction Kit. The extracted DNA was digested with BamHI by 2 h incubation at

37 °C with exact the same recipe with XhoI except substituting 10X NEB buffer 4 with buffer

3. The endonuclease digestion product was ligated successfully into pTOM vector via Quick

Ligation Kit with a 3 to 1 insert to vector concentration ratio, and then transformed into

DH5α cells.

75

The LB-Amp plates were inoculated with the transformation products and incubated overnight at 37 ˚C. A single colony from the plates was used to inoculate 20 mL LB-Amp media. After overnight incubation, 4 µL culture total was used for colony PCR using 5 µL

10X standard Taq reaction buffer, 1 µL 10 mM dNTPs, 1 µL each T7 promotor and terminator (40 ng), 1 µL growth media, and 0.25 µL Taq DNA polymerase to make a total volume of 50 µL to check the insert’s size. The remaining cells were harvested by centrifugation at 4000 rpm for 20 min. The plasmid DNA was extracted via QIAprepSpin

Mini-prep Kit. 10 µL Mini-prep products were sent to sequencing. Another 5 μL pTOM-gene constructs (shorten as cv-pTOM) were transformed into BL-21 AI cells. The transformation product was plated onto LB-Amp plates and incubated overnight at 37 °C. A single colony from the plates was used to inoculate in 20 mL LB-Amp media to make frozen stock.

Soluble protein expression screening. 200 μL LB-Amp overnight growth medium of cv-pTOM in BL-21 cells was used to inoculate 20 mL LB-Amp media. After incubation for

about 4 h at 37 ˚C, the absorbance of the new growth at OD600 reached 0.4153, at which point

IPTG was added to a final concentration of 1 mM or 10 mM to induce the protein production.

Then 1 mL aliquots were removed at 0, 5, and 24 h after induction. The aliquots were centrifuged down at 11,000 rpm for 10 min at room temperature and the resulting cell pellets were stored at -20 °C. To extract the soluble proteins, the thawed cell pellets were lysed utilizing 150 µL B-PER II Bacterial Protein Extraction Reagents. After centrifugation at

76

11,000 rpm for 5 min, supernatants were removed from the cell pellets. The cell pellets were suspended in 150 µL B-PER II Bacterial Protein Extraction Reagents to allow analysis of the insoluble fraction. Both the soluble and insoluble protein fractions were visualized on

SDS-PAGE gels and compared. To determine the best growth condition, the soluble and insoluble protein fractions were analyzed from growth at 30 °C or 16 °C in LB-Amp meida,

16 ˚C in Super Broth-Amp medium, and 16 ˚C in Super Broth-Amp media with 0.5 mM

Fe(NH4)2(SO4)2 incubation after induction. To improve the yield of soluble protein, the cv-pTOM construct was also transformed into BL-21 AI cells and the same protein screening process was repeated with different concentrations combinations of IPTG (1mM and 10 mM) and L-arabinose (0.2, 0.02, 0.002, and 0.0002 % by molecular weight).

Expression and purification of putative CV3550 dioxygenase (31.4 kDa). 1 mL overnight superbroth-amp growth of cv-pTOM BL-21 AI cells frozen stock was inoculated

into 1 L superbroth-amp media with 0.5 mM Fe(NH4)2(SO4)2. After the absorbance of the cell growth reached 0.6, 1 mL of 1 M IPTG and 20 % L-arabinose were added into the growth media to make the final concentrations of IPTG and L-arabinose to 1 mM and 0.02 % to induce the cells. The cells were harvested by centrifugation at 5 k rpm for 15 min after 24 h growth at 16 °C from induction. The resulting supernatant was loaded onto an IMAC

Sepharose Fast-Flow column (26  60 mm, 44 mL) charged with Ni2+. The column was first equilibrated with 2 CV of Fe buffer at 2 mL/min. After the proteins were loaded onto the

77 column, 2 CV of binding buffer (containing 20 mM Hepes, 0.5 M NaCl, and 20 mM imidazole) was used to remove the unbound stuff. The column was eluted with a linear gradient of 0-50 % of 25 CV eluting buffer (containing 0.5 M imidazole (pH 8.0), 20 mM

Hepes, and 0.5 M NaCl), followed by another 6 CV of 100 % eluting buffer to wash off all the remaining proteins. After elution, the column was washed by 6 CV of stripping buffer

(consisting of 0.2 mM EDTA and 0.5 M NaCl), followed by6 CV of H2O. The column was

consequentively washed with 2 CV of 2 M Urea, 1 CV of H2O at 1 mL/min, and 6 CV of H2O at 2 mL/min. 2 CV of 0.5 % Triton-X and 0.1 M acetic acid were utilized to wash off the Fe2+ ion. 4 CV of 80 % ethanol was adopted to wash off the acid at 1 mL/min. Then the column

was washed with 2 M NaCl at 0.5 mL/min, followed by 1 CV H2O at 1 mL/min, and 6 CV of

H2O at 2 mL/min. 2 CV of 30 % isopropanol was adopted to clean the column, followed by 1

CV of H2O at 1 mL/min and 6 CV of H2O at 2 mL/min. The column was regenerated with 3

CV of 0.2 M NiSO4, and washed off the extra metal ion with 6 CV of binding buffer. The fractions containing the target protein were identified using SDS-PAGE, combined, and subjected to MALDI mass analysis (31175.9 g/mol compared with the expected mass

31404.5 g/mol).

Sulfolobus solfataricus P2 (SSO0066)

Cloning of putative SSO0066 dioxygenase (850 b.p.). The genomic DNA of

S. solfataricus was purchased from ATCC, dissolved in 10 µL H2O, and stored in a 40 %

78 glycerol solution in -80 °C. A gradient polymerase chain reaction (gradient PCR) using

22.5 μL AccuPrime SuperMix, 1 μL genomic DNA (38.4 ng), 0.75 μL primer (750 ng each,

Table 1) was adopted to isolate the putative dioxygenase. The reaction mixture was incubated at 95 °C for 5 min to denature the DNA, followed by 35 cycles consisting of 95 °C for 15 s,

45-65 °C for 30 s for annealing, and 68 °C for 1 min for elongation, finally 68 °C for 5 min after cycles end. The PCR products at the right size in the 1 % Agarose gel were isolated to extract the DNA by following the k Gel Extraction kit protocol.

DesB

Cloning of DesB. The desB DNA sequence was synthesized by DNA2.0 (Menlo Park,

CA), and provided in vector pJ201. It was cut out of pJ201 and ligated into pTOM vector with the NdeI and BamHI cloning sites by Julie Huang. Later the DesB was removed out of pTOM, subcloned into pET-21b in the NdeI and BamHI cloning sites, and transferred into

BL-21 AI cells by Ann-Marie Illsley.

Soluble protein expression screening. The DesB-pET21b with stop codons version

(shorten as DesB-pET21b) was transferred into BL-21 AI cells and stored in a 40 % glycerol solution in -80 °C. 100 µL overnight growth media was inoculated into 10 mL LB-Amp

media. After the OD600 reaches 0.6, different concentration combinations of IPTG (1 mM and

0.1 mM) and L-arabinose (0.2 %, 0.02 %, 0.002 %, and 0.0002 %) were added into the growth media to induce the protein production. The incubation temperature was changed to

79

30 °C. Then 1 mL aliquots were removed at 0, 5, and 24 h after induction. The aliquots were centrifuged down at 11,000 rpm for 10 min at room temperature and the resulting cell pellets were stored at -20 °C. To extract the soluble proteins, the thawed cell pellets were lysed utilizing 150 µL B-PER II Bacterial Protein Extraction Reagents. After centrifugation at

11,000 rpm for 5 min, supernatants were removed from the cell pellets. The cell pellets were suspended in 150 µL B-PER II Bacterial Protein Extraction Reagents to allow analysis of the insoluble fraction. Both the soluble and insoluble protein fractions were visualized on

SDS-PAGE gels and compared. To determine the best growth condition, the soluble and insoluble protein fractions were analyzed from growth at 30 °C or 16 °C, additionally adding

0.1 mM Fe(NH4)2(SO4)2 or MnSO4 after induction were compared with each other.

Site-directed mutagenesiss of DesB-pET21b to generate C-terminal His6 tagged proteins. A 50 μL of PCR reaction was prepared utilizing 5 μL 10X reaction buffer, 125 ng of each primer, 1 μL of dNTP mix, 1.5 μL of QuikSolution reagent, 100 ng DesB-pET21b, and

1μL of QuikChange Lightning Enzyme. The PCR reaction mixture was incubated at 95 °C for

2 min, followed by 18 cycles containing 20 s at 95 °C, 10 s at 60 °C for annealing and

3.5 min at 68 °C for elongation, finally at 68 °C for 5 min after the final cycle. Annealing temperature at 55 and 65 °C had been tried with other same conditions. A control reaction using pWhitescript 4.5-kb control plasmid also performed according to the manual. 2 μL of

DpnI was directly added to the PCR reaction mixtures, and incubated at 37 °C for 1 h. 10 μL

80 of digestion products was transferred into DH5 cells or XL 10-Gold cells, and plated on

LB-Amp plates. Four colonies were selected for overnight growth at 37 °C in 15 mL

LB-Amp medium. Cell pellets were harvested at 4,000 rpm for 15 min. Plasmids were extracted from cell pellets using QIAprepSpin Mini-prep Kit, and sent for sequencing.

Results and Discussion

To identify possible members in the PCA superfamily, a PSI-BLAST search using the LigB protein sequence has been conducted. Five proteins have been selected for cloning to obtaining proteins for further kinetic and crystallographic investigation.

Pseudomonas syringae DC3000 (PSPT_1776)

The gradient PCR was used to clone the designed target from the genomic DNA using the forward and reverse primers shown in Table 1. DNA gel analysis revealed that the molecular mass of this product was similar to the predicted molecular mass of the putative dioxygenase

PSPT_1776 (705 b.p.) (Figure 3). The gene was subcloned into pET-15b and pTOM vectors.

The successful ligations were confirmed by the bands around 1000 b.p. in the 1 % agarose gel after colony PCR (Figure 4), due to the 200 b.p.between T7 promoter and terminator regions.

All cloned gene constructs were confirmed through sequencing.

81

Figure 3. Gradient PCR result of PSPT_1776 on a 1% Agarose gel. The left column is a DNA ladder with the corresponding size label. The estimated size of PSPT_1776 is 705 b. p.

Figure 4. Colony PCR product of PSPT_1776. The left column is DNA ladder besides with corresponding size label.

After the PSPT_1776 gene was cloned into the pET15b vector, and transformed into

BL-21 cells, different combinations of temperatures and IPTG concentrations and were screened to reveal the optimal expression conditions. Unfortunately, it was found that little soluble protein was produced during protein expression screens (data not shown). So the ps-pET15b was re-transformed into a BL-21 AI cells. Different combinations of IPTG and arabinose concentrations were tried for induction, in addition to different temperatures

(Figure 5). It was shown that the most soluble protein was produced when the cells were grown at 30 °C for 24 h after induced by 0.1 mM IPTG and 0.002 % arabinose.

82

0h 5h std 24h 0h 5h 0h 5h 24h 0h std 5h 24h

Soluble

0.02% 0.0002% 0.2% 0.002% 0h 5h 24h std 0h 5h 24h 0h 5h 24h 0h 5h 24h std

Insoluble

Figure 5. ps-pET15b expression screening. BL-21 AI cells with ps-pET15b were grown at 16 °C in Super Broth-Amp media, and induced by 10 mM IPTG and four different concentrations (numbers above the lines) of arabinose separately. Aliquots were taken out at different time point as shown on top of each. The band corresponding to the ps-PET15b molecular weight was circled in red. The most soluble protein growth condition was in red, too.

Successful purification by immobilized nickel affinity chromatography afforded about

18 mg purified proteins per liter of culture. The enzyme was confirmed by SDS-PAGE gels

corresponding to about 30 kDa (Figure 6). The size exclusion column chromatography

displayed several peaks, suggesting that either the desired protein was not pure, or multiple

oligomeric states existing (Figure 7). Std 3 7 11 13 16 19 22 25 28 31 std 34 37 40 43 47 50 53 56

37 g 25

Figure 6. The SDS-PAGE gels of ps-pET15b purified using a Sepharose Fast-Flow column charged with Ni2+. The fractions are labeled on top of each lane. Std represents standards, and numbers besides the standard lane indicate the corresponding molecular weights. The expected proteins are in red box.

83

A B Lysozyme 14.7 kDa BSA l 66 kDa ps

Figure 7. A. Size exclusion column chromatography of purified ps-pET15b from Sepharose Fast-Flow column. B. Overlap of ps-pET15b with standards BSA and Lysozyme, which were indicated with labels besides them.

Bacillus cereus ATCC 10987(BCE_1944)

The gradient PCR was exploited to clone the expected gene from genomic DNA using the forward and reverse primers shown in Table 1. DNA gel analysis revealed single bands at 0h 5h 24h the predicted DNA size of the putative dioxygenase BCE_1944std (757 b.p.) (Figure 8). The gene was subcloned into the pTOM vector. The ligations were confirmed by the bands around

1000 b.p. in the 1 % agarose gel after colony PCR (Figure 9). However, the sequencing result Insolu ble showed that the cloned insert has less than 30 % similarity to the predicted sequence (data not

Figure 8. Gradient PCR result of BCE_1944 on agarose gel. The left column is DNA ladder besides with corresponding size label. The estimated size of BCE_1944 is 757 b. p.

84

Figure 9. Colony PCR product of BCE_1944. The left column is DNA ladder besides with corresponding size label. shown). This mistake occurred probably due to mis-cut during endonuclease digestion or wrong primer binding during gradient PCR. So primers will be redesigned and the whole process will be repeated until obtaining correctly sequenced insert.

Chromobacterium violaceum ATCC 12472 (CV3550)

The gradient PCR was used to clone the expected enzyme DNA from genomic DNA using the forward and reverse primers shown in Table 1. DNA gel analysis revealed that the molecular mass of this product was similar to the predicted molecular mass of the putative dioxygenase CV3550 (767 b.p.) (Figure 10). The gene was subcloned into pET-15b and pTOM vectors.

The successful ligations were confirmed by the bands around 1000 b.p. in the agarose gel after colony PCR (Figure 11). All cloned gene constructs were sequenced. The successful cloning was confirmed by the sequencing results.

85

Figure 10. Gradient PCR result of CV3550 on agarose gel. The left column is DNA ladder besides with corresponding size label. The estimated size of BCE_1944 is 767 b. p.. The isolated gene is circled orange rectangle.

Figure 11. Colony PCR result of CV3550. The left column is DNA ladder besides with corresponding size label.

To determine the best conditions to obtain soluble protein, all the parameters related to protein expression, such as vectors, temperature, media, and time were screened. Soluble and insoluble protein expression was compared for cv-pET-15b and cv-pTOM after transformation into BL-21 cells in LB-Amp media at 37 ˚C, 30 ˚C or 16 ˚C with induction with 1 mM or 10 mM IPTG, in addition with Super broth and Fe2+. Unfortunately the

SDS-PAGE gels showed that cv-pTOM dioxygenase produced very little soluble protein of the correct size under each growth condition (data not shown). Therefore, the cv-pTOM was transformed into BL-21 AI cells, and repeated the screening process (Figure 12). Under combinations of different concentrations of IPTG (1mM and 10 mM) and L-arabinose (0.2,

86

Soluble

0.0002%

Figure 12. Protein expression of cv-pTOM construct in BL-21 AI cells with superbroth-Amp media with 0.05 mM Fe2. BL-21 AI cells were grown at 30 °C and induced by 0.1 mM IPTG and four different concentrations (numbers on the middle of the gel) of arabinose separately. Aliquots were taken out at different time point as shown on top of each lane to check the protein solubility. The bands corresponding to the cv-pTOM molecular weight were in red rectangle, while the arabinose concentration at which cells produced the most soluble protein are circled in red oval.

0.02, 0.002, and 0.0002 % by weight), SDS-PAGE protein gel conformed super broth-Amp media with 0.05 mM ferrous ammonium sulfate at 16 ˚C for 24 h, induced with 1 mM IPTG and 0.02 % L-arabinose produced the highest level of soluble protein.

Successful purification of cv-pTOM was accomplished using immobilized nickel affinity chromatography. The protein was eluted using a lineal gradient of 0-50 % eluting buffer (containing 20 mM hepes, 0.5 mM NaCl, and 0.5 M imidazole (pH8.0)). The purity of the enzyme was confirmed by SDS-PAGE gels (Fig. 13). A single protein band was detected on the denaturing gel corresponding to about 31.3 kDa.

87

Figure 13. The SDS-PAGE gels of cv-pTOM in BL-21 AI cells from a Sepharose Fast-Flow column. The fractions are labeled on top of each lane. std represents standards, and numbers besides the standard lane indicate the corresponding molecular weights. The possible cv-pTOM are circled in red box.

Sulfolobus solfataricus P2 (SSO0066)

Gradient PCR was used to clone the expected enzyme gene from genomic DNA using the forward and reverse primers shown in Table 1. DNA gel analysis revealed single bands at the predicted DNA size of the putative dioxygenase SSO0066 (850 b.p.) (Figure 14).

However, the attempt to ligate the SSO0066 into pET-15b and pTOM vectors failed, and new attempts will be performed in the future.

Figure 14. Gradient PCR result of SSO0066 on agarose gel. The left column is DNA ladder besides with corresponding size label. The estimated size of of SSO0066 is 850 b.p.

88

DesB protein expression screening and plasmid mutagenesis

After the DesB gene was cloned into the pTOM vector by Julie Huang, it was found that little soluble protein was produced during protein expression screens (data not shown).

Suspecting that the N-terminal His6 tag might be causing the protein to misfold leading to the solubility problems, the DesZ gene was re-subcloned into the pET21b vector in which the stop codon prevents the His tag production. Different combinations of temperatures and concentrations of inducing agents (IPTG and arabinose) including adding Fe2+ and Mn2+ were tried to get the optimal expression. However, none of these conditions provided enough soluble proteins, compared with the bulky bands in insoluble fractions as shown in Figure 15.

0h std 5h 24h 0h 5h 24h 0h 5h 24h std 0h 5h 24h

Soluble

0.2 % 0.02 % 0.002 % 0.0002 % 0h 5h 24h std 0h 5h 24h 0h 5h 24h std 0h 5h 24h

Insoluble

Figure 15. DesB in pET21b vector expression screening. BL-21 AI cells with DesZ-pET21b were grown at 30 °C and induced by 0.1 mM IPTG and four different concentrations of arabinose (numbers under the red lines) separately. Aliquots were taken out at different time point as shown on top of each lane to check the protein solubility. The bands corresponding to the DesB molecular weight in the insoluble gels were very obvious and circled in red.

89

Since the expression screening for DesB in pET21b without His tag was not optimal, using C-terminal His tag was considered to improve the solubility, thus site-directed mutagenesis was adopted to remove the stop codon in the DesB-pET21b construct. Many means have been tried including designing new primers, changing annealing temperature from 60 °C to 55 or 65 °C, increasing elongation time for 30 s, using better competent cells

XL-10 gold. All of these attempts failed, giving a random sequence shown by sequencing. It was suspect that the batch of DesB-pET21b construct may have problems.

Conclusions and Future Directions

Among the selected five LigB divergent homologs, C. violaceum (CV3550) and

Pseudomonas syringae DC3000 (PSPT_1776) were cloned and purified, DesB has solubility difficulty, and the other two homologs were still waiting to be inserted into a vector. Upon obtaining the proteins, the substrates need to be identified for further kinetics and other characteristic assays. The mutation of stop codon for the DesB-pET21b construct will be continued with other parameter changes, or an inclusion body purification may be applied.

The cloning of the remaining two proteins will be focused on the ligation step, since the PCR amplification conditions have already been determined.

90

Chapter 5 Conclusion

Considering all the sequenced but far less well studied organisms, it is urgent and crucial to fill the vast gap between the fast growth of sequence data resulting from the development of sequencing techniques and the comparatively slow progress of functional characterization of sequenced genes. However, accurate gene functional annotation which mostly depends on extrapolations of functionally characterized genes with high sequence homology has been hindered, because notable functional promiscuity exists in many protein superfamilies. Thus, understanding of the relationship between protein sequence, structure and function, which is reliant on robust classification of protein superfamilies, will ultimately yield more accurate gene annotation.

In the family of aromatic-ring-cleaving dioxgenases, extradiol dioxygenases are involved in enzyme pathways that funnel lignin derived aromatic compounds into the TCA cycle. Enzymes belonging to the PCA superfamily of extradiol dioxygenases need additional characterization to reveal important aspects of structures, substrates specificities and catalytic mechanism. Genes with less than 20 % pair-wise identity to the first structurally characterized enzyme in the superfamily, protocatechuate 4,5-dioxygenase in Sphingomonas paucimobilis

SYK-6, LigAB, are under investigation to reveal the boundaries of the PCAD superfamily.

Identification of potential diverged PCAD superfamily members will likely reveal enzymes with diverse substrate specificity. Further characterization of these and other related enzymes, including analysis of substrate specificity, mechanism and physiological structure,

91 should not only provide the foundation for understanding the specificity determinants for lignin-derived aromatic compound degradation, but also reveal more clearly the basis for differences between this new superfamily. This will also allow comparison to the VOC and cupin superfamilies.

The exploration and solidification of the new extradiol dioxygenase containing superfamily will facilitate discovery of new functions and help define the scope of possible reactions that can be catalyzed by this new structural fold, thus increasing our understanding between protein sequence, structure and function. This accumulative understanding will help filling the gap between sequenced genome data and experimentally verified functions, and benefit more accurate genome annotation.

92

Appendix1 Sequences of proteins studied in this thesis.

Species of origin and putative function attributed by genome annotation are listed

Escherichia coli K12

Predicted dioxygenase YgiD >F - unk MTPLVKDIIMSSTRMPALFLGHGSPMNVLEDNLYTRSWQKLGMTLPRPQAIVVVSAH WFTRGTGVTAMETPPTIHDFGGFPQALYDTHYPAPGSPALAQRLVELLAPIPVTLDKE AWGFDHGSWGVLIKMYPDADIPMVQLSIDSSKPAAWHFEMGRKLAALRDEGIMLVA SGNVVHNLRTVKWHGDSSPYPWATSFNEYVKANLTWQGPVEQHPLVNYLDHEGGT LSNPTPEHYLPLLYVLGAWDGQEPITIPVEGIEMGSLSMLSVQIG

Sphingomonas paucimobilis SYK-6

3-methoxy gallic acid 3,4-dioxygenase

MAEIVLGIGTSHGPMLVTQTEQWRSRLAFDQSVNHAWRGGSWSYDQLVAERADQNF AAQITPEAMTAHNARCQASLDQLAEIFSEAKIDVAVILGNDQMEIFDERLVPAFSVFYG DTITNYEFPPERMAALPPGINLSVAGYIPSGGAEYAGQPELARSIIAQAMADEFDVAA MKALPKPETPHAFGFVYRRIMRDNPVPSVPVLVNTFYPPNQPTVRRCYEFGKSVLRGI QAWESDARVAVLASGGLTHFVIDEEIDRLFFQAMEDRDIARLADLGEAIFQDGTSELK NWIPLAGMMAELGLDHEILDYVPCYRSEAGTGNAMGFVCWRMGHHHHHHHHHHG LVPRGSH

Bordetella bronchiseptica RB50 (ATCC# BAA-588 BSL 2)

>Hypothetical_protein_BB1843_[Bordetella_bronchiseptica_RB50]

MGFIAGLKIPTISHPYLPEVAMRLPTLFVSHGSPMLAVEPGRTGPALAAWSDGLPERPR AVLVVSPHWMGQGLAVSTRDRQVAWHDFGGFPADLYQLQYPAEGSPRLAQQVVDV LARAGIQAGNDARRPLDHGAWVPLRYLYPHADLPVVQLSLDMERDARGQYALGQA LAPLRDDGVLVIGSGSLTHNLRDVRMPHGAPPADYVAPFQQWYAEHLAAGDVEALL DWQARAPGAARAHPHDDHLMPLYVALGAGGMPARRLNDEVAYGALAMDAYQFGA

93

Pseudomonas syringae pv. tomato str. DC3000 (ATCC# BAA-871 BSL 1)

>hypothetical_protein_[Pseudomonas_syringae_pv._tomato_str._DC3000]_PSPT

MFPSLFISHGSPTLALEPGESGPALAQLAASLPRPRAIVMVSAHWESHELIVNGNPQPE TWHDFGGFAAELFAMQYPAPGLPELTRTIVELLTANGLPARIDSRRPFDHGVWVPLSL MYPQADIPVVQVSLPSRQGPALQTQVGRALAGLREQGVLVIGSGSITHNLYDLDWSA APDRVEPWAAEFRDWIIDKLQSNDEAALHHYRTQAPHAVRAHPSDEHLLPLYFARGA GGAFSIAYQGFTMGALGMDIYRFD

Chromobacterium violaceum ATCC 12472 (ATCC# 12472 BSL 2)

>Hypothetical_protein_CV_3550_[Chromobacterium_violaceum_ATCC 12472]

MTTRQPSLFISHGAPTLPIEDSPTRHFLEQLGRDWPRPRAIVIVSAHWEARALTVNLQP RPETMYDFGGFPPVLRTLRYPAATDAALAERVIGLLDAAGLPVAATEERPLDHGMWT PLMLMFPQADIPLVMISLRHQAGVAEHYALGQALKPLRDDNVLVIGSGSYTHNLWEL SPEGSEPAPWAADFAGWMDQALLDNRRDDIVHWQQRAPYPRENHPSDEHFQPLLVA LGASDDDAVPVKLHDDWRMGSLSMACWRFD

Bacillus cereus ATCC_10987 (ATCC# 10987 BSL 1)

>oxidoreductase_[Bacillus_cereus_ATCC_10987]

MMPSLFLAHGSPMLAIQDTDYTSFLKILGETYKPKAIVIFTAHWESEVLTISSSDNEYE TIYDFGGFPPELYEIKYRAKGSSSIASMLETKFKNKGIPVHHNMTRGLDHGSWTLLHR MYPEANIPVVQISVNPFLSAKEQFKIGEALKGLGQEDILVIGSGVTVHNLRALKWNQT TPEQWAIEFDDWIIKHMQAADKDALFNWEKNAPHAQLAVPRAEHFVPLFIAMGSGE NSGEVIHYELGTLSYLCLQF

Sulfolobus solfataricus P2 (ATCC # 35092 BSL 1)

Predicted dioxygenase

>Sulfolobus_solfataricus_ P2- Hypothetical protein

MRRLPAVAGSFYESDPKKLKMQIEWSFRHNIGPRDIPKQSYEKKKRDNLFFIVPHAGY

94

IYSGPVAAHSYYYLASEGKPDVVIILGPNHTGLGSYVSAWPKGEWETPLGSVKVDEE VLMQLVMESEVIDLEEKSHLYEHSIEVQLPFLQYFFDDNFKIVPIVIMMQTPEIAEFLA DAIYKVIQKYSDKDIVVLASSDMNHYDPHEITMKKDEEAIEKIQQLDYRGLYEVVEG KDVTLCGYGPIMVSLILAKKLGKKAYILKHATSGDTSGPKDSVVGYLAARFGS

Sphingomonas paucimobilis SYK-6

Gallate dioxygenase MAKIIGGFAVSHTPTIAFAHDANKYDDPVWAPIFQGFEPVKQWLAEQKPDVTFYVYN DHMTSFFEHYSHFALGVGEEYSPADEGGGQRDLPPIKGDPELAKHIAECLVADEFDLA YWQGMGLDHGAFSPLSVLLPHEHGWPCRIVPLQCGVLQHPIPKARRFWNFGRSLRR AIQSYPRDIKVAIAGTGGLSHQVHGERAGFNNTEWDMEFMERLANDPESLLGATVTD LAKKGGWEGAEVVMWLLMRGALSPEVKTLHQSYFLPSMTAIATMLFEDQGDAAPP AESDEALRARAKRELAGVEEIEGTYPFTIDRAVKGFRINHFLHRLIEPDFRKRFVEDPE GLFAESDLTEEEKSLIRNRDWIGMIHYGVIFFMLEKMAAVLGIGNIDVYAAFRGLSVP EFQKTRNAAITYSVAGKQ

95

Appendix 2 Mass spectra

YgiD-pTOM

4700 Linear Spec #1 MC[BP = 31880.1, 13227]

3 1 8 8 4 .4 9 2 2 1 0 0 1 .3 E+ 4

90

80

70

3 2 0 7 3 .0 7 4 2 60

50 % Intensity

40 1 5 9 3 8 .6 7 8 7

30 1 0 6 0 6 .6 9 6 3

3 2 2 9 6 .8 9 8 4 1 6 0 3 8 .4 1 1 1 20 6 3 8 4 0 .5 5 4 7

1 1 0 1 1 .5 3 0 3 10 1 0 8 1 4 .9 3 9 5 2 1 2 8 4 .8 1 6 4 4 7 9 3 6 .4 9 2 2

0 1 0 0 0 2 0 8 0 0 4 0 6 0 0 6 0 4 0 0 8 0 2 0 0 1 0 0 0 0 0 M a s s (m /z )

96

Ps-pET15b

97 cv-pTOM

4700 Linear Spec #1 MC[BP = 31000.0, 36904]

3 0 9 9 4 .3 7 5 0 1 0 0 3 .7 E+ 4

90

80

70 3 1 1 7 5 .8 5 7 4

60

50 % Intensity

40 1 5 4 8 9 .4 4 0 4

30 6 2 0 9 8 .0 3 1 3 1 5 5 8 4 .1 6 8 0 3 1 3 9 7 .4 9 8 0 20

10 4 6 6 1 5 .7 7 3 4 9 3 1 7 7 .6 4 0 6 1 0 3 8 8 .3 5 5 5 2 0 7 3 5 .4 4 7 3

0 1 0 0 0 0 2 8 0 0 0 4 6 0 0 0 6 4 0 0 0 8 2 0 0 0 1 0 0 0 0 0 M a s s (m /z )

98

DesZ

99

Appendix 3 Progress for all the proteins studies in this thesis

Frozen Genome Extracted Ligation into Soluble Protein Protein Bacteria Strain Grown DH5a BL21 BL21-AI Sequencing Stock extracted Insert Vector Insoluble Purified Assayed

Pseudomonas syringae pv. pTOM DNA x x x x x x x tomato str. DC3000 G1 Chromobacterium pET-15b violaceum ATCC x x x x x x x x x x pTOM 12472 Sulfolobus DNA x solfataricus P2 Bacillus cereus x x x x ATCC_10987 pET-15b DesZ DNA pTOM x x x x x x x

pET21b pET-15b DesB DNA pTOM x x x x x

pET21b YgiD DNA pTOM x x x x x x

100

References

1. Galperin MY & Koonin EV (2010) From complete genome sequence to ‘complete’ understanding? Trends in Biotechnology 28(8):398-406. 2. Blattner FR, et al. (1997) The Complete Genome Sequence of Escherichia coli K-12. Science 277(5331):1453-1462. 3. Riley M, et al. (2006) Escherichia coli K-12: a cooperatively developed annotation snapshot--2005. (Translated from eng) Nucleic Acids Research 34(1):1-9 (in eng). 4. Serres M, et al. (2001) A functional update of the Escherichia coli K-12 genome. Genome Biology 2(9):research0035.0031-research0035.0037. 5. Todd AE, Orengo CA, & Thornton JM (2001) Evolution of function in protein superfamilies, from a structural perspective. Journal OF MOLECULAR BIOLOGY 307(4):1113-1143. 6. Wilson D, et al. (2009) SUPERFAMILY--sophisticated comparative genomics, data mining, visualization and phylogeny. Nucleic Acids Research 37(suppl_1):D380-386. 7. Ammelburg M, Frickey T, & Lupas AN (2006) Classification of AAA+ proteins. Journal of Structural Biology 156(1):2-11. 8. Matsuda K, Nishioka T, Kengo K, Kawabata T, & Go N (2003) Finding evolutionary relations beyond superfamilies: Fold-based superfamilies. Protein Science 12(10):2239-2251. 9. Babbitt PCG, John A (2001) New functions from old scaffolds: how nature reengineers enzymes for new functions. Advances in Protein Chemistry 55:1-28. 10. Kinoshita KK, Akinori; Go, Nobuhiro (1999) Symmetry in protein folds: implication in evolution and folding. International Congress Series 1194:249-257. 11. Dayhoff MO (1976) The origin and evolution of protein superfamilies. Federation Proceedings 35(10):2132-2138. 12. Webb EC ed (1992) Enzyme Nomenclature 1992 (Academic Press, San Deigo). 13. Alexey G. Murzin SEB, Tim Hubbard and Cyrus Chothia (1995) SCOP: A Structural Classification of Proteins Database for the Investigation of Sequences and Structures. Journal of Molecular Biology 247:536-540. 14. Margaret E Glasner JAGaPCB (2006) Evolution of enzyme superfamilies. Current Opinion in Chemical Biology 10:492-497. 15. Babbitt JAGaPC (2001) DIVERGENT EVOLUTION OF ENZYMATIC FUNCTION: Mechanistically Diverse Superfamilies and Functionally Distinct Suprafamilies. Annu. Rev. Biochem. 70:209. 16. Lipscomb JD (2008) Mechanism of extradiol aromatic ring-cleaving dioxygenases. Current Opinion in Structural Biology 18(6):644-649. 17. LINDSAY D. ELTIS JTB (1996) Evolutionary Relationships among Extradiol

101

Dioxygenases. JOURNAL OF BACTERIOLOGY 178:5930–5937. 18. Frederic H. Vaillancourt JTB, Lindsay D. Eltis (2006) The Ins and Outs of Ring-cleaving Dioxygenases. Critical Reviews in Biochemistry and Molecular Biology 41:241-267. 19. Harayama S & Rekik M (1989) Bacterial aromatic ring-cleavage enzymes are classified into two different gene families. J. Biol. Chem. 264(26):15328-15333. 20. Spence EL, Kawamukai, M., Sanvoisin, J., Braven, H. and Bugg,T.D.H. (1996) Catechol Dioxygenases from Escherichia coli (MhpB) and Alcaligenes eutrophus (MpcI): Sequence Analysis and Biochemical Properties of a Third Family of Extradiol Dioxygenases. Journal of Bacteriology 178(17):5249-5256. 21. Han S, Eltis LD, Timmis KN, Muchmore SW, & Bolin JT (1995) Crystal structure of the biphenyl-cleaving extradiol dioxygenase from a PCB-degrading pseudomonad. Science v270(n5238):p976(975). 22. EMMA L. SPENCE MK, JONATHAN SANVOISIN, HELEN BRAVEN, & BUGG ATDH (1996) Catechol Dioxygenases from Escherichia coli (MhpB) and Alcaligenes eutrophus (MpcI): Sequence Analysis and Biochemical Properties of a Third Family of Extradiol Dioxygenases. JOURNAL OF BACTERIOLOGY, 178(17):5249-5256. 23. Armstrong RN (2000) Mechanistic Diversity in a Metalloenzyme Superfamily Biochemistry 39(45):13625-13632. 24. Dunwell JM, Culham A, Carter CE, Sosa-Aguirre CR, & Goodenough PW (2001) Evolution of functional diversity in the cupin superfamily. Trends in Biochemical Sciences 26(12):740-746. 25. Ohlendorf DH, Lipscomb, J.D., and Weber, P.C. (1988) Structure and assembly of protocatechuate 3,4-dioxygenase. Nature 336. 26. Vetting MW, D'Argenio DA, Ornston LN, & Ohlendorf DH (2000) Structure of Acinetobacter Strain ADP1 Protocatechuate 3,4-Dioxygenase at 2.2 A Resolution: Implications for the Mechanism of an Intradiol Dioxygenase†Biochemistry 39(27):7943-7955. 27. Ferraroni M, et al. (2005) Crystal Structure of the Hydroxyquinol 1,2-Dioxygenase from Nocardioides simplex 3E, a Key Enzyme Involved in Polychlorinated Aromatics Biodegradation. Journal of Biological Chemistry 280(22):21144-21154. 28. Kita A, et al. (1999) An archetypical extradiol-cleaving catecholic dioxygenase: the crystal structure of catechol 2,3-dioxygenase (metapyrocatechase) from Pseudomonas putida mt-2. Structure 7(1):25-34. 29. Vetting MW, Wackett LP, Que L, Jr., Lipscomb JD, & Ohlendorf DH (2004) Crystallographic Comparison of Manganese- and Iron-Dependent Homoprotocatechuate 2,3-Dioxygenases. The Journal of Bacteriology 186(7):1945-1958. 30. Asturias JA, Eltis LD, Prucha M, & Timmis KN (1994) Analysis of three

102

2,3-dihydroxybiphenyl 1,2-dioxygenases found in Rhodococcus globerulus P6. Identification of a new family of extradiol dioxygenases. J. Biol. Chem. 269(10):7807-7815. 31. Jitka Novotná AHPBJKJíJJSE (2004) l-3,4-Dihydroxyphenyl alanine-extradiol cleavage is followed by intramolecular cyclization in lincomycin biosynthesis. European Journal of Biochemistry 271(18):3678-3683. 32. Nogales J, Canales A, Jimenez-Barbero J, Garcia JL, & Diaz E (2005) Molecular Characterization of the Gallate Dioxygenase from Pseudomonas putida KT2440: THE PROTOTYPE OF A NEW SUBGROUP OF EXTRADIOL DIOXYGENASES. J. Biol. Chem. 280(42):35382-35390. 33. Bugg TDH (1993) Overproduction, purification and properties of 2,3-dihydroxyphenylpropionate 1,2-dioxygenase from Escherichia coli. Biochimica et Biophysica Acta 1202(2):258-264. 34. Roper DI & Cooper RA (1990) Subcloning and nucleotide sequence of the 3,4-dihydroxyphenylacetate (homoprotocatechuate) 2,3-dioxygenase gene from Escherichia coli C. FEBS Letters 275(1-2):53-57. 35. Sugimoto K, et al. (1999) Crystal structure of an aromatic ring opening dioxygenase LigAB, a protocatechuate 4,5-dioxygenase, under aerobic conditions. Structure 7(8):953-965. 36. Mampel J, Providenti MA, & Cook AM (2005) Protocatechuate 4,5-dioxygenase from Comamonas testosteroni T-2: biochemical and molecular properties of a new subgroup within class III of extradiol dioxygenases. Archives of Microbiology 183(2):130-139. 37. Kabisch MaF, P. (1990) Nucleotide sequence of metapyrocatechase I (catechol 2,3- I) gene mpcI from Alcaligenes eutrophus JMP222. Nucleic Acids Res. 18(11):3405-3406. 38. Davis JK, He Z, Somerville CC, & Spain JC (1999) Genetic and biochemical comparison of 2-aminophenol 1,6-dioxygenase of Pseudomonas pseudoalcaligenes JS45 to meta-cleavage dioxygenases: divergent evolution of 2-aminophenol meta-cleavage pathway. Archives of Microbiology 172(5):330-339. 39. Iwabuchi T & Harayama S (1998) Biochemical and Molecular Characterization of 1-Hydroxy-2-naphthoate Dioxygenase from Nocardioides sp. KP7. Journal of Biological Chemistry 273(14):8332-8336. 40. Harpel MR & Lipscomb JD (1990) Gentisate 1,2-dioxygenase from Pseudomonas. Substrate coordination to active site Fe2+ and mechanism of turnover. Journal of Biological Chemistry 265(36):22187-22196. 41. Babbitt PC & Gerlt JA (1997) Understanding Enzyme Superfamilies. Journal of Biological Chemistry 272(49):30591-30594. 42. Serre L, et al. (1999) Crystal structure of Pseudomonas fluorescens

103

4-hydroxyphenylpyruvate dioxygenase: an enzyme involved in the tyrosine degradation pathway. Structure 7(8):977-988. 43. Cameron AD, Olin B, Ridderstrom M, Mannervik B, & Jones TA (1997) Crystal structure of human glyoxalase I[mdash]evidence for gene duplication and 3D domain swapping. EMBO J 16(12):3386-3395. 44. Bernat BA, Laughlin LT, & Armstrong RN (1997) Fosfomycin Resistance Protein (FosA) Is a Manganese Metalloglutathione Related to Glyoxalase I and the Extradiol Dioxygenases†. Biochemistry 36(11):3050-3055. 45. McCarthy AA, Baker HM, Shewry SC, Patchett ML, & Baker EN (2001) Crystal Structure of Methylmalonyl-Coenzyme A Epimerase from P. shermanii: a Novel Enzymatic Function on an Ancient Metal Binding Scaffold. Structure 9(7):637-646. 46. Dumas P, Bergdoll M, Cagnon C, & Masson JM (1994) Crystal structure and site-directed mutagenesis of a bleomycin resistance protein and their significance for drug sequestering. (Translated from eng) The EMBO journal 13(11):2483-2492 (in eng). 47. Vaillancourt FH, Yeh E, Vosburg DA, O'Connor SE, & Walsh CT (2005) Cryptic chlorination by a non-haem iron enzyme during cyclopropyl amino acid biosynthesis. Nature 436(7054):1191-1194. 48. Kukor JJ & Olsen RH (1991) Genetic organization and regulation of a meta cleavage pathway for catechols produced from catabolism of toluene, benzene, phenol, and cresols by Pseudomonas pickettii PK01. Journal of Bacteriology 173(15):4587-4584. 49. Parales RE, Ontl TA, & Gibson DT (1997) Cloning and sequence analysis of a catechol 2,3-dioxygenase gene from the nitrobenzene-degrading strain <i>Comamonas</i> sp JS765. Journal of Industrial Microbiology & Biotechnology 19(5):385-391. 50. Takeo M, Nishimura M, Shirai M, Takahashi H, & Negoro S (2007) Purification and Characterization of Catechol 2,3-Dioxygenase from the Aniline Degradation Pathway of Acinetobacter sp. YAA and Its Mutant Enzyme, Which Resists Substrate Inhibition. Bioscience, Biotechnology, and Biochemistry 71(7):1668-1675. 51. Vetting MW, Wackett LP, Que L, Lipscomb JD, & Ohlendorf DH (2004) Crystallographic Comparison of Manganese- and Iron-Dependent Homoprotocatechuate 2,3-Dioxygenases. Journal of Bacteriology 186(7):1945-1958. 52. Sparnins VL & Chapman PJ (1976) Catabolism of L tyrosine by the homoprotocatechuate pathway in gram positive bacteria. Journal of Bacteriology 127(1):362-366. 53. Gerlt JA & Babbitt PC (1998) Mechanistically diverse enzyme superfamilies: the importance of chemistry in the evolution of catalysis. Current Opinion in Chemical Biology 2(5):607-612. 54. Hegg EL & Jr LQ (1997) The 2-His-1-Carboxylate Facial Triad — An Emerging

104

Structural Motif in Mononuclear Non-Heme Iron(II) Enzymes. European Journal of Biochemistry 250(3):625-629. 55. Senda T, et al. (1996) Three-dimensional Structures of Free Form and Two Substrate Complexes of an Extradiol Ring-cleavage Type Dioxygenase, the BphC Enzyme fromPseudomonassp. Strain KKS102. Journal of Molecular Biology 255(5):735-752. 56. Hatta T, Mukerjee-Dhar G, Damborsky J, Kiyohara H, & Kimbara K (2003) Characterization of a Novel Thermostable Mn(II)-dependent 2,3-Dihydroxybiphenyl 1,2-Dioxygenase from a Polychlorinated Biphenyl- and Naphthalene-degrading Bacillus sp. JF8. Journal of Biological Chemistry 278(24):21483-21492. 57. Cho HJ, et al. (2010) Substrate Binding Mechanism of a Type I Extradiol Dioxygenase. Journal of Biological Chemistry 285(45):34643-34652. 58. Cerdan P, Wasserfallen A, Rekik M, Timmis KN, & Harayama S (1994) Substrate specificity of catechol 2,3-dioxygenase encoded by TOL plasmid pWW0 of Pseudomonas putida and its relationship to cell growth. Journal of Bacteriology 176(19):6074-6081. 59. Machonkin TE & Doerner AE (2011) Substrate Specificity of Sphingobium chlorophenolicum 2,6-Dichlorohydroquinone 1,2-Dioxygenase. Biochemistry 50(41):8899-8913. 60. Boldt YR, Sadowsky MJ, Ellis LB, Que L, & Wackett LP (1995) A manganese-dependent dioxygenase from Arthrobacter globiformis CM-2 belongs to the major extradiol dioxygenase family. Journal of Bacteriology 177(5):1225-1232. 61. Fielding A, Kovaleva E, Farquhar E, Lipscomb J, & Que L (2011) A hyperactive cobalt-substituted extradiol-cleaving . Journal of Biological Inorganic Chemistry 16(2):341-355-355. 62. Gibello A, Ferrer E, Martin M, & Garrido-Pertierra A (1994) 3,4-Dihydroxyphenylacetate 2,3-dioxygenase from Klebsiella pneumoniae, a Mg2+-containing dioxygenase involved in aromatic catabolism. Biochemical Journal 301(Pt1):145-150. 63. Colabroy KL, et al. (2008) Biochemical characterization of l-DOPA 2,3-dioxygenase, a single-domain type I extradiol dioxygenase from lincomycin biosynthesis. Archives of Biochemistry and Biophysics 479(2):131-138. 64. Lane BG, et al. (1991) Homologies between members of the germin gene family in hexaploid wheat and similarities between these wheat germins and certain Physarum spherulins. Journal of Biological Chemistry 266(16):10461-10469. 65. Dunwell JM, Purvis A, & Khuri S (2004) Cupins: the most functionally diverse protein superfamily? Phytochemistry 65(1):7-17. 66. Lawrence MC, Izard T, Beuchat M, Blagrove RJ, & Colman PM (1994) Structure of Phaseolin at 2·2 Å Resolution: Implications for a Common Vicilin/Legumin Structure and the Genetic Engineering of Seed Storage Proteins. Journal of Molecular Biology

105

238(5):748-776. 67. Kucharczyk R, Zagulski M, Rytka J, & Herbert CJ (1998) The yeast gene YJR025c encodes a 3-hydroxyanthranilic acid dioxygenase and is involved in nicotinic acid biosynthesis. FEBS Letters 424(3):127-130. 68. Calderone V, Trabucco M, Menin V, Negro A, & Zanotti G (2002) Cloning of human 3-hydroxyanthranilic acid dioxygenase in Escherichia coli: characterisation of the purified enzyme and its in vitro inhibition by Zn2+. Biochimica et Biophysica Acta (BBA) - Protein Structure and Molecular Enzymology 1596(2):283-292. 69. Titus GP, et al. (2000) Crystal structure of human homogentisate dioxygenase. Nature Structural Molecular Biology 7(7):542-546. 70. Chen J, et al. (2008) Crystal structure and mutagenic analysis of GDOsp, a gentisate 1,2-dioxygenase from Silicibacter Pomeroyi. Protein Science 17(8):1362-1373. 71. Fu W & Oriel P (1998) Gentisate 1,2-dioxygenase from Haloferax sp. D1227. Extremophiles 2(4):439-446. 72. McCoy JG, et al. (2006) Structure and mechanism of mouse cysteine dioxygenase. Proceedings of the National Academy of Sciences of the United States of America 103(9):3084-3089. 73. Steiner RA, Kooter IM, & Dijkstra BW (2002) Functional Analysis of the Copper-Dependent Quercetin 2,3-Dioxygenase. 1. Ligand-Induced Coordination Changes Probed by X-ray Crystallography: Inhibition, Ordering Effect, and Mechanistic Insights. Biochemistry 41(25):7955-7962. 74. Pochapsky T, Pochapsky S, Ju T, Hoefler C, & Liang J (2006) A Refined Model for the Structure of Acireductone Dioxygenase from Klebsiella ATCC 8724 Incorporating Residual Dipolar Couplings. Journal of Biomolecular NMR 34(2):117-127. 75. Khuri S, Bakker FT, & Dunwell JM (2001) Phylogeny, Function, and Evolution of the Cupins, a Structurally Conserved, Functionally Diverse Superfamily of Proteins. Molecular Biology and Evolution 18(4):593-605. 76. Fetzner S (2012) Ring-Cleaving Dioxygenases with a Cupin Fold. Applied and Environmental Microbiology 78(8):2505-2514. 77. Adams MA, Singh VK, Keller BO, & Jia Z (2006) Structural and biochemical characterization of gentisate 1,2-dioxygenase from Escherichia coli O157:H7. Molecular Microbiology 61(6):1469-1484. 78. Ferraroni M, et al. (2012) Crystal structures of salicylate 1,2-dioxygenase-substrates adducts: A step towards the comprehension of the structural basis for substrate selection in class III ring cleaving dioxygenases. Journal of Structural Biology 177(2):431-438. 79. Matera I, et al. (2008) Salicylate 1,2-Dioxygenase from Pseudaminobacter salicylatoxidans: Crystal Structure of a Peculiar Ring-cleaving Dioxygenase. Journal

106

of Molecular Biology 380(5):856-868. 80. Zhang Y, Colabroy KL, Begley TP, & Ealick SE (2005) Structural Studies on 3-Hydroxyanthranilate-3,4-dioxygenase: The Catalytic Mechanism of a Complex Oxidation Involved in NAD Biosynthesis. Biochemistry 44(21):7632-7643. 81. Murakami S, Sawami Y, Takenaka S, & Aoki K (2004) Cloning of a gene encoding 4-amino-3-hydroxybenzoate 2,3-dioxygenase from Bordetella sp. 10d. Biochemical and Biophysical Research Communications 314(2):489-494. 82. Schaab MR, Barney BM, & Francisco WA (2005) Kinetic and Spectroscopic Studies on the Quercetin 2,3-Dioxygenase from Bacillus subtilis. Biochemistry 45(3):1009-1016. 83. Gopal B, Madan LL, Betz SF, & Kossiakoff AA (2004) The Crystal Structure of a Quercetin 2,3-Dioxygenase from Bacillus subtilis Suggests Modulation of Enzyme Activity by a Change in the Metal Ion at the Active Site(s)†. Biochemistry 44(1):193-201. 84. Hancock V & Klemm P (2007) Global Gene Expression Profiling of Asymptomatic Bacteriuria Escherichia coli during Biofilm Growth in Human Urine. Infection and Immunity 75(2):966-976. 85. Datsenko KA & Wanner BL (2000) One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proceedings of the National Academy of Sciences 97(12):6640-6645. 86. Velázquez-Campoy A, Ohtaka H, Nezami A, Muzammil S, & Freire E (2001) Isothermal Titration Calorimetry. Current Protocols in Cell Biology, (John Wiley & Sons, Inc.). 87. Pochetti G & Montanari R (2012) Isothermal titration calorimetry to determine the association constants for a ligand bound simultaneously to two specific protein binding sites with different affinities. 88. Chao Y & Fu D (2004) Thermodynamic Studies of the Mechanism of Metal Binding to the Escherichia coli Zinc Transporter YiiP. Journal of Biological Chemistry 279(17):17173-17180. 89. Grossoehme NE, Mulrooney SB, Hausinger RP, & Wilcox DE (2007) Thermodynamics of Ni2+, Cu2+, and Zn2+ Binding to the Urease Metallochaperone UreE†. Biochemistry 46(37):10506-10516. 90. Song H, Wilson DL, Farquhar ER, Lewis EA, & Emerson JP (2012) Revisiting Zinc Coordination in Human Carbonic Anhydrase II. Inorganic Chemistry 51(20):11098-11105. 91. Gopal B, et al. (1997) Thermodynamics of Metal Ion Binding and Denaturation of a Calcium Binding Protein from Entamoeba histolytica†. Biochemistry 36(36):10910-10916. 92. Canabady-Rochelle L-S, Sanchez C, Mellema M, & Banon S (2009) Study of

107

Calcium−Soy Protein Interactions by Isothermal Titration Calorimetry and pH Cycle. Journal of Agricultural and Food Chemistry 57(13):5939-5947. 93. Emerson JP, Kovaleva EG, Farquhar ER, Lipscomb JD, & Que L (2008) Swapping metals in Fe- and Mn-dependent dioxygenases: Evidence for oxygen activation without a change in metal redox state. Proceedings of the National Academy of Sciences 105(21):7347-7352. 94. Pungaliya C, et al. (2009) A shortcut to identifying small molecule signals that regulate behavior and development in Caenorhabditis elegans. Proceedings of the National Academy of Sciences 106(19):7708-7713. 95. Phizicky EM & Fields S (1995) Protein-protein interactions: methods for detection and analysis. Microbiological Reviews 59(1):94-123. 96. Bonifacino JS, Dell'Angelica EC, & Springer TA (2001) Immunoprecipitation. Current Protocols in Molecular Biology, (John Wiley & Sons, Inc.). 97. Gosselink RJA, de Jong E, Guran B, & Abächerli A (2004) Co-ordination network for lignin—standardisation, production and applications adapted to market requirements (EUROLIGNIN). Industrial Crops and Products 20(2):121-129. 98. Adler E (1977) Lignin chemistry—past, present and future. (Translated from English) Wood Sci. Technol. 11(3):169-218 (in English). 99. Harris EE (1940) Utilization of Waste Lignin Current Chemical Research. Industrial & Engineering Chemistry 32(8):1049-1052. 100. Tien M & Kirk TK (1983) Lignin-Degrading Enzyme from the Hymenomycete Phanerochaete chrysosporium Burds. Science 221(4611):661-663. 101. Bugg TDH, Ahmad M, Hardiman EM, & Rahmanpour R (2011) Pathways for degradation of lignin in bacteria and fungi. Natural Product Reports 28(12):1883-1896. 102. Society TR (2008) Sustainable biofuels prospects and challenges. (Cardiff, UK). 103. Anderson W & Akin D (2008) Structural and chemical properties of grass lignocelluloses related to conversion for biofuels. (Translated from English) J Ind Microbiol Biotechnol 35(5):355-366 (in English). 104. Geib SM, et al. (2008) Lignin degradation in wood-feeding insects. Proceedings of the National Academy of Sciences 105(35):12932-12937. 105. Ko J-J, et al. (2009) Biodegradation of high molecular weight lignin under sulfate reducing conditions: Lignin degradability and degradation by-products. Bioresource Technology 100(4):1622-1627. 106. Javor T, Buchberger W, & Tanzcos I (2000) Determination of lignin degradation compounds in different wood digestion solutions by capillary electrophoresis. Lenzinger Berichte 79:50-55. 107. Kirk TK & Farrell RL (1987) Enzymatic "Combustion": The Microbial Degradation of Lignin. Annual Review of Microbiology 41(1):465-501.

108

108. Martínez ÁT, et al. (2005) Biodegradation of Lignocellulosics: Microbial, Chemical, and Enzymatic Aspects of the Fungal Attack of Lignin International Microbiology 8(3):195-204. 109. Hatakka A (2001) Biodegradation of Lignin (Wiley-VDH, Weinheim) p 51. 110. Vicuña R (1988) Bacterial Degradation of Lignin. Enzyme and Microbial Technology 10(11):646-655. 111. Zimmermann W (1990) Degradation of Lignin by Bacteria. Journal of Biotechnology 13(2–3):119-130. 112. Lee J (1997) Biological conversion of lignocellulosic biomass to ethanol. Journal of Biotechnology 56(1):1-24. 113. Wang L & Chen H (2011) Increased fermentability of enzymatically hydrolyzed steam-exploded corn stover for butanol production by removal of fermentation inhibitors. Process Biochemistry 46(2):604-607. 114. Dwidar M, Lee S, & Mitchell RJ (2012) The production of biofuels from carbonated beverages. Applied Energy (0). 115. Kaur G, Srivastava AK, & Chand S (2012) Advances in Biotechnological Production of 1,3-propanediol. Biochemical Engineering Journal (0). 116. Zhang F, Rodriguez S, & Keasling JD (2011) Metabolic engineering of microbial pathways for advanced biofuels production. Current Opinion in Biotechnology 22(6):775-783. 117. Chang C-C & Wan S-W (1947) China's Motor Fuels from Tung Oil. Industrial & Engineering Chemistry 39(12):1543-1548. 118. Srivastava A & Prasad R (2000) Triglycerides-based diesel fuels. Renewable and Sustainable Energy Reviews 4(2):111-133. 119. Weisz PB, Haag WO, & Rodewald PG (1979) Catalytic Production of High-Grade Fuel (Gasoline) from Biomass Compounds by Shape-Selective Catalysis. Science 206(4414):57-58. 120. Kishi K, Habu N, Samejima M, & Yshimoto T (1991) Purification and Some Properties of the Enzyme Catalyzing the Cγ-Elimination of a Diarylpropane-type Lignin Model from Pseudomonas paucimobilis TMY1009. Agricultural and Biological Chemistry 55(5):1319-1323. 121. Ohta M, Higuchi T, & Iwahara S (1979) Microbial degradation of dehydrodiconiferyl alcohol, a lignin substructure model. Archives of Microbiology 121(1):23-28. 122. Nakatsubo F, Kirk TK, Shimada M, & Higuchi T (1981) Metabolism of a phenylcoumaran substructure lignin model compound in ligninolytic cultures of <i>Phanerochaete chrysosporium</i>. Archives of Microbiology 128(4):416-420. 123. Kamaya Y, Nakatsubo F, Higuchi T, & Iwahara S (1981) Degradation of d,l-syringaresinol, a β-β′ linked lignin model compound, by Fusarium solani

109

M-13-1. Archives of Microbiology 129(4):305-309. 124. Eiji Masai YKMF (2007) Genitic and Biochemical Investigations on Bacterial Catabolic pathways for Lignin-Derived Aromatic Compounds. Biosci. Biotechnol. Biochem 71(1):1-15. 125. Daisuke Kasai EM, Keisuke Miyauchi, Yoshihiro Katayama, and Masao Fukuda (2005) Characterization of the Gallate Dioxygenase Gene: Three Distinct Ring Cleavage Dioxygenases Are Involved in Syringate Degradation by Sphingomonas paucimobilis SYK-6. Journal of Bacteriology 187(15):5067-5074. 126. Abe T, Masai E, Miyauchi K, Katayama Y, & Fukuda M (2005) A Tetrahydrofolate-Dependent O-Demethylase, LigM, Is Crucial for Catabolism of Vanillate and Syringate in Sphingomonas paucimobilis SYK-6. JOURNAL OF BACTERIOLOGY 187(6):2030-2037. 127. Masai E, et al. (1999) Genetic and Biochemical Characterization of a 2-Pyrone-4,6-Dicarboxylic Acid Hydrolase Involved in the Protocatechuate 4,5-Cleavage Pathway of Sphingomonas paucimobilis SYK-6. JOURNAL OF BACTERIOLOGY 181(1):55-62. 128. Kasai D, Masai E, Miyauchi K, Katayama Y, & Fukuda M (2004) Characterization of the 3-O-Methylgallate Dioxygenase Gene and Evidence of Multiple 3-O-Methylgallate Catabolic Pathways in Sphingomonas paucimobilis SYK-6. JOURNAL OF BACTERIOLOGY 186(15):4951-4959. 129. Punta M, et al. (2012) The Pfam protein families database. (Wellcome Trust Sanger Institute). 130. R.D. Finn JT, J. Mistry, P.C. Coggill, J.S. Sammut, H.R. Hotz, G. Ceric, K. Forslund, S.R. Eddy, E.L. Sonnhammer and A. Bateman (2008) The Pfam protein families database. Nucleic Acids Research (Database Issue 36):D281-D288. 131. Parkhill J, et al. (2003) Comparative analysis of the genome sequences of Bordetella pertussis, Bordetella parapertussis and Bordetella bronchiseptica. Nat Genet 35(1):32-40. 132. Horst Finger CHWvK ed (1996) Bordetella. In: Barron's Medical Microbiology (Univ of Texas Medical Branch, Galveston, TX), Vol 4th edition. 133. Mansfield JW (2009) From bacterial avirulence genes to effector functions via the hrp delivery system: an overview of 25 years of progress in our understanding of plant innate immunity. Molecular Plant Pathology 10(6):721-734. 134. Bronstein P, et al. (2008) Global transcriptional responses of Pseudomonas syringae DC3000 to changes in iron bioavailability in vitro. BMC Microbiology 8(1):209. 135. W.M. H (1930) Rancidity in cheddar cheese. Master (Queen's University, Kinston). 136. Kodach LL, et al. (2006) Violacein synergistically increases 5-fluorouracil cytotoxicity, induces apoptosis and inhibits Akt-mediated signal transduction in human colorectal cancer cells. Carcinogenesis 27(3):508-516.

110

137. Chernin LS, et al. (1998) Chitinolytic Activity in Chromobacterium violaceum: Substrate Analysis and Regulation by Quorum Sensing. JOURNAL OF BACTERIOLOGY 180(17):4435-4441. 138. Brock T, Brock K, Belly R, & Weiss R (1972) Sulfolobus: A new genus of sulfur-oxidizing bacteria living at low pH and high temperature. (Translated from English) Archiv. Mikrobiol. 84(1):54-68 (in English). 139. Prangishvili D, Stedman K, & Zillig W (2001) Viruses of the extremely thermophilic archaeon Sulfolobus. Trends in Microbiology 9(1):39-43.

111