UPDATE ON THE DISTRIBUTION AND EVOLUTION OF THE ALDEHYDE

DEHYDROGENASE SUPERFAMILY IN VERTEBRATES AND BIOCHEMICAL

AND POLYMORPHIC CHARACTERIZATION OF HUMAN ALDH1B1

by

BRIAN CHRISTOPHER JACKSON

B.S., University of California, Riverside 2004

M.S., University of Texas at Tyler, 2007

A thesis submitted to the

Faculty of the Graduate School of the

University of Colorado in partial fulfillment

of the requirements for the degree of

Doctor of Philosophy

Toxicology Program

2015

This thesis for the Doctor of Philosophy degree by

Brian Christopher Jackson

has been approved for the

Toxicology Program

by

Dennis Petersen, Chair

Vasilis Vasiliou, Advisor

David Bain

David Orlicky

David Thompson

Date 5/07/2015

Jackson, Brian Christopher (Ph.D., Toxicology)

Update on the Distribution and Evolution of the Aldehyde Dehydrogenase Superfamily in

Vertebrates and Biochemical and Polymorphic Characterization of Human ALDH1B1

Thesis directed by Professor Vasilis Vasiliou

ABSTRACT

The aldehyde dehydrogenase (ALDH) superfamily is a group of enzymes that catalyze the NAD(P)+-dependent oxidation of a wide variety of endogenous and exogenous aldehydes to their corresponding carboxylic acids. This family is present in all taxonomic lineages studied, including archaea, bacteria, and eukaryotes. As a torrent of new genomic data has become available over the past decade, there is now a need to organize and examine the evolution of this critical superfamily. To create a reference of the distribution and number of ALDHs in vertebrates, 11 representative species with completed genomes were examined and the full number of ALDHs was manually studied (Chapter II). One recently investigated ALDH, ALDH1B1, appeared to have a limited distribution and high similarity to ALDH2. This gene has received increased attention recently as a mediator of metabolism, growth and development, and as a biomarker and possibly key mediator of colon cancer. The complete known distribution of ALDH1B1 was investigated, as well as its evolutionary origins as a retrotransposition of ALDH2. In addition, it was shown that although ALDH1B1 has a pattern of expression and substrate specificity distinct from that of ALDH2, the two retain enough similarity that heterotetramerization (and possibly cross-regulation) may be feasible (Chapter IV). When the distributions of ALDH1B1 and ALDH16A1 were determined across phylogenies, frogs appeared in both cases to have unusual patterns of ALDH distribution. Since there was no frog representative in previous work, the full number of frog ALDHs was determined and full gene trees were created to examine the phylogenetic distribution of frog ALDHs. This also allowed deeper examination of the distribution and evolution of ALDHs. From this analysis it was determined that, rather than frogs being unusual, ALDH1B1 likely arose in the early vertebrate lineage and was subsequently lost in species other than frogs and mammals, and that a unique non-catalytic version of ALDH16A1 likely arose in fish and was transferred to an early amniote ancestor (Chapter III). This and other examples of evolution of ‘dead-enzymes’ within families led to the search for additional examples of non-catalytic ALDHs. In total, 182 examples were found across all three kingdoms, which were divided into 19 groups based on sequence, with a large number of newly discovered records coming from bacteria and fungi (Chapter V). Finally, the substrate specificity and effect of human polymorphisms of ALDH1B1 were investigated in depth, and it was found that ALDH1B1 likely plays a role in growth and development via retinaldehyde metabolism, and that this function may be disrupted by mutations prevalent in human populations, especially via the ALDH1B1*2 (A86V) mutation (Chapter VI). Together, this work enhances our understanding of the distribution and evolutionary origins of the ALDH superfamily as a whole, and increases the understanding of the mechanisms of action of ALDH1B1 in particular.

The form and content of this abstract are approved. I recommend its publication.

Approved: Vasilis Vasiliou

ACKNOWLEDGEMENTS

I would like to thank the current and former members of the Vasiliou lab for their help and support over the years. First, I appreciate the efforts of Elizabeth Donald and Bettina Miller for keeping the lab organized, stocked and compliant. In addition, many lab members helped, advised, or worked with me during the past years, including Ying Chen, Chad Brocker, Guarav Mehta, Surendra Singh, Vindhya Koppaka, Akiko Matsumoto, Monica Sandoval, and Claire Heit. I would also like to thank the labs in which I did research and training rotations, including the labs of Richard Radcliffe and David Ross, especially for the training from Chao Yan, David Siegel, and J. ‘Gigi’ Kepa. In addition, I would like to acknowledge the guidance and assistance from my advisor, Vasilis Vasiliou, and my committee, Dennis Petersen, David Thompson, David Orlicky, and David Bain.

Many thanks go out to my first mentor Blake Bextine and the people I worked with both at UC Riverside and UT Tyler, for all of the work and guidance that they gave to get me to where I am today. I appreciate the support and love of my family including my Mom and Dad, brothers and sister, in-laws, nephews, nieces, and all of the extended group that I consider home. I appreciate the patience and support of my wife Natalie Vitovsky, and my ever-constant companions Rupert and Stella.

I would also like to acknowledge NRSA fellowship support from the NIAAA (F31 AA020728).

TABLE OF CONTENTS

CHAPTER

I. INTRODUCTION………………………………………………………………...1

The ALDH Superfamily…………………………………………………..1

Distribution of ALDH Genes in Vertebrates…………………………….10

Frog ALDHs and Phylogenies of Vertebrate ALDHs…………………...12

Evolution and Structural Similarities between ALDH1B1 and ALDH2...14

ALDH ‘Dead Enzymes’………………………………………………….17

Substrate Specificity and Human Mutations of ALDH1B1……………..26

II. UPDATE ON THE ALDEHYDE DEHYDROGENASE GENE (ALDH) SUPERFAMILY IN VERTEBRATES………………………………………….31

Summary…………………………………………………………………31

Introduction………………………………………………………………32

Methods………………………………………………………………..…35

Results…………………………………………………………………....37

Discussion………………………………………………………………..58

III. UPDATE ON THE ALDEHYDE DEHYDROGENASE GENE (ALDH) SUPERFAMILY IN FROG (XENOPUS TROPICALIS) – AN EXAMPLE OF POSSIBLE HORIZONTAL GENE TRANSFER……………………………….64

Summary………………………………………………………………....64

Introduction……………………………………………………………....65

Methods…………………………………………………………………..67

Results…………………………………………………………………....68

Discussion………………………………………………………………..77

IV. COMPARATIVE GENOMICS, MOLECULAR EVOLUTION AND COMPUTATIONAL MODELING OF ALDH1B1 AND ALDH2……………..80

Summary…………………………………………………………………80

Introduction……………………………………………………………....81

Methods………………………………………………………………..…84

Results……………………………………………………………………86

Discussion………………………………………………………………104

V. ROLE OF DEAD ENZYMES OF THE ALDEHYDE DEHYDROGENASE FAMILY IN AND TOXICOLOGY…………………106

Summary……………………………………………………………..…106

Introduction…………………………………………………………..…107

Discovering new ALDH dead-enzymes………………………………..109

Discussion………………………………………………………………111

VI. HUMAN ALDH1B1 POLYMORPHISMS MAY AFFECT THE METABOLISM OF ACETALDEHYDE AND ALL-TRANS RETINALDEHYDE – IN VITRO STUDIES AND COMPUTATIONAL MODELING……………………………………...…123

Summary………………………………………………………………..123

Introduction……………………………………………………………..124

Methods…………………………………………………………………128

Results…………………………………………………………………..136

Discussion………………………………………………………………152

VII. SUMMARY AND FUTURE DIRECTIONS…………………………………..161

REFERENCES………………………………………………………………………....165

LIST OF TABLES

TABLE

2.1 Total number of aldehyde dehydrogenase (ALDH) NCBI gene records identified within each species’ genome…………………………………………………….39

2.2 ALDH genes and duplicated genes across species with respective chromosomal (Chr) locations…………………………………………………………………...43

2.3 List of the Gene ID (GI), chromosome location, presence of introns, gene type and recommended gene name of all ALDH genes in this study that show evidence of gene duplication, compared with that in the human genome……….45

2.4 Tabulation of all ALDH genes in this study that show evidence of gene duplication, compared with that in the human genome………………………….47

2.5 Known copy number variations in humans……………………………………...55

3.1 Frog ALDH genes………………………………………………………………..70

3.2 Exons present in ALDH16A1 by species………………………………………..77

4.1 ALDH1B1 and ALDH1A genes and enzymes in selected vertebrate species…...88

4.2 Aldehyde dehydrogenase (ALDH2) genes and enzymes in selected vertebrate species……………………………………………………………………………89

4.3 Comparative docking interaction energies and protein stabilities for ALDH2 and ALDH1B1 subunits…………………………………………………………….100

4.4 Specific interactions made by ALDH homo- and heterotetramers……………..103

5.1 Non-enzymatic functions of ALDHs…………………………………………...109

5.2 Summary of groups of ALDH dead enzyme records…………………………...114

5.3 Full list of ALDH dead enzyme records………………………………………..115

5.4 Summary of mutations of key residues for ALDH dead enzyme groups………121

6.1 Computational modeling of interactions between ALDH isozymes and substrates………………………………………………………………………..138

6.2 Kinetic values for the metabolism of select substrates by ALDH isozymes…...141

6.3 Polymorphisms of human ALDH1B1, and variant frequency by race…………143

6.4 Summary of docking poses for NAD+ binding to ALDH isozymes……………147

6.5 Root mean square (RMSD) distances between ALDH1B1 variants and wild-type……………………………………………………………………..………150

6.6 Homology modeling metrics for ALDH1B1 and variants……………………...150

LIST OF FIGURES

FIGURE

2.1 Neighbor-joining dendrogram (with branch lengths representing relative protein sequence similarity) of ALDH3B sequences in human, rat and mouse, indicating the likely homology and identity of the genes assigned ‘Aldh3b3’ in rat and mouse…………………………………………………………………………….50

2.2 Alignment of ALDH3B2 genes in human, rat and mouse created by ClustalW...51

2.3 Comparison of ALDH4A1 from human and rat…………………………………53

3.1 Protein phylogenetic trees of frog ALDH1A1 (left), ALDH1A2 (center) and ALDH1A3 (right)……………………………………………………………..…71

3.2 Protein phylogenetic trees of frog ALDH1B1 (left) and ALDH2 (right)..………71

3.3 Protein phylogenetic trees of frog ALDH1L1 (left) and ALDH1L2 (right)……..72

3.4 Protein phylogenetic trees of frog ALDH3A2 (left) and ALDH3Bs (right).……72

3.5 Protein phylogenetic trees of frog ALDH4A1 (left) and ALDH5A1 (right)…….74

3.6 Protein phylogenetic trees of frog ALDH6A1 (left) and ALDH7A1 (right)…….74

3.7 Protein phylogenetic trees of frog ALDH8A1 (left) and ALDH9A1 (right)…….75

3.8 Protein phylogenetic trees of frog ALDH16A1 (left) and ALDH18A1 (right)….75

3.9 Protein phylogenetic trees of the standard model of vertebrate evolution (left) and frog ALDH16A1 (right).…………………………………………………………76

4.1 Phylogenetic tree for vertebrate ALDH2 and ALDH1B1 protein sequences……90

4.2 Comparative structures for human and mouse ALDH2 and ALDH1B1 genes….92

4.3 Amino acid sequence alignments for human, mouse and frog ALDH2 and ALDH1B1 sequences…………………………………………………………….94

4.4 Comparative amino acid sequence alignments for ALDH2 and ALDH1B1 subunit binding domains……………………………………………………………….…96

4.5 A proposal for the evolutionary appearance of the mammalian and frog ALDH1B1 genes by retroviral integration of ancestral ALDH2 cDNA sequences………………………………………………………………………...97

4.6 ALDH2 and ALDH1B1 subunit-subunit docking results…...………………….101

5.1 The phylogenetic distribution of ALDH dead enzyme records found showing the name of major groups and the number of records in that group………………..112

5.2 Representative alignment showing the key residues required for enzymatic activity in Group 19 (Omega )………………………………………113

6.1 Alignment of ALDH1B1 and ALDH2…………………………………………130

6.2 Representative docking poses for substrates of ALDH1B1……………………139

6.3 Representative docking poses of aldehyde substrates to ALDH1B1...…………140

6.4 Metabolism of by recombinant ALDH2 and ALDH1B1………...142

6.5 Location of polymorphisms of ALDH1B1……………………………………..144

6.6 Comparison of substrate-binding domain in ALDH1B1*1 and ALDH1B1*3...146

6.7 Docking poses for NAD+ bound to ALDH1B1 and human variants…………...149

6.8 Expression and activity of ALDH1B1 variants………………………………...151

ABBREVIATIONS

1,2 DNG 1,2 dinitroglycerin

1,3 DNG 1,3 dinitroglycerin

4-HNE 4-hydroxy nonenal

AA amino acid

AFR African

ALDH aldehyde dehydrogenase

AMP adenosine monophosphate

ASN Asian

ATP adenosine triphosphate

BHMT betaine-homocysteine S-methyltransferase

BMP bone morphogenic protein

bp base pairs

CASK calcium/calmodulin-dependent serine protein kinase

Chr chromosome

CNV copy number variant

COP clathrin-ordered protein

ER endoplasmic reticulum

ERK extracellular signal-regulated kinase

EUR European

FPLC fast protein liquid chromatography

G3PDH glyceraldehyde 3-phosphate dehydrogenase

GBAS glioblastoma amplified sequences

HGT horizontal gene transfer

HMM hidden Markov model

HPRT hypoxanthine phosphoribosyltransferase

HSC human hematopoietic stem cell

HSP heat shock protein

IND Indian

InDel insertions and deletions

iRHOM rhomboid family pseudoproteases

LPO lipid peroxidation

MAPK mitogen activated protein kinase

MEK mitogen activated protein kinase kinase

MEX Mexican

MYA million years ago

NAD nicotinamide adenine dinucleotide

NADH nicotinamide adenine dinucleotide, reduced form

NADP nicotinamide adenine dinucleotide phosphate

NADPH nicotinamide adenine dinucleotide phosphate, reduced form

NCBI National Center for Biotechnology Information

NTG nitroglycerin

PAAF proteasomal ATPase-associated factor

PCR polymerase chain reaction

PGK phosphoglycerate kinase

PI3K phosphoinositol 3 kinase

PRKAG protein kinase, AMP-activated, gamma 2 non-catalytic subunit

RMSD root mean square distance

ROS reactive oxygen species

PAGE polyacrylamide gel electrophoresis

SKIP S-phase kinase-associated protein

SLC solute carrier family 2-facilitated glucose transporter

SNP single nucleotide polymorphism

TACE TNFα-converting enzyme

TEAD TEA domain protein

TNF tumor necrosis factor

TRIB tribbles protein

TRIM tripartite motif family protein

UPLC ultra performance liquid chromatography

USP ubiquitin specific peptidase

UVR ultra violet radiation

xCTBP xenopus cytosolic thyroid hormone-binding protein

YAP yes-associated protein

CHAPTER I

INTRODUCTION

The ALDH Superfamily

The aldehyde dehydrogenase (ALDH) superfamily catalyzes the irreversible NAD(P)+-dependent oxidation of aldehydes to carboxylic acids. Aldehydes are reactive molecules that can damage cellular macromolecules; ALDH enzymes protect cells by keeping aldehyde levels low. In addition, aldehydes can serve a physiological function, in that many metabolic pathways involve aldehyde intermediates that require oxidation by ALDH enzymes. For example, ALDHs are required in the synthesis of retinoic acid (a key mediator of growth and differentiation), betaine (an important organic osmolyte), and tetrahydrofolate (a regulator of proliferation). ALDH proteins are typically ~500 amino acids (AA) in length, although some members have multiple ALDH domains (e.g., ALDH16A1 has one full and one partial ALDH domain) or an ALDH domain and another enzymatic domain (e.g., ALDH1L1 has an ALDH domain and a formyl transferase domain), which may increase the length dramatically (Jackson, Brocker et al. 2011). ALDH monomers contain three domains – a substrate-binding (catalytic) domain, a cofactor (NAD(P)+) binding domain, and a dimerization / tetramerization domain (Steinmetz, Xie et al. 1997), and ALDH proteins are typically found as either homodimers or homotetramers.

Humans have 19 ALDHs, which are grouped into 12 families. There are a number of additional families found exclusively in non-vertebrates, which will not be considered here. ALDH genes are named in a manner similar to the convention set out for the Cytochrome P450 (CYP) gene family, i.e., with the superfamily name (ALDH) followed by a number indicating family (e.g., ALDH1), a letter indicating subfamily (e.g., ALDH1A), and a number indicating member number (e.g., ALDH1A1) (Vasiliou, Bairoch et al. 1999). Typically, family members share greater than ~40% AA identity and subfamily members share greater than ~60% AA identity, although homology should also be taken into account when creating families and subfamilies. Each ALDH family is discussed in turn, although emphasis is primarily on vertebrate members.
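The naming convention described above can be illustrated with a short sketch. The Python below is an illustrative aid and not part of the analyses in this thesis; the names `AldhSymbol` and `parse_aldh_symbol` are hypothetical. It splits a symbol such as ALDH1A1 into its family number, subfamily letter, and member number, and it assumes that every symbol follows that three-part pattern, so a historical exception such as ALDH2 is rejected rather than guessed at.

```python
import re
from typing import NamedTuple


class AldhSymbol(NamedTuple):
    """Parts of an ALDH gene symbol (hypothetical helper type)."""
    family: int      # e.g. 1 in ALDH1A1
    subfamily: str   # e.g. "A" in ALDH1A1
    member: int      # e.g. the trailing 1 in ALDH1A1


# Superfamily root, family number, subfamily letter, member number.
# Case-insensitive so rodent-style symbols such as Aldh3b3 also parse.
_SYMBOL = re.compile(r"^ALDH(\d+)([A-Z])(\d+)$", re.IGNORECASE)


def parse_aldh_symbol(symbol: str) -> AldhSymbol:
    """Split a symbol into family, subfamily and member.

    Raises ValueError for symbols that do not follow the three-part
    pattern (for example the historical name ALDH2).
    """
    match = _SYMBOL.match(symbol.strip())
    if match is None:
        raise ValueError(f"not a family/subfamily/member symbol: {symbol!r}")
    family, subfamily, member = match.groups()
    return AldhSymbol(int(family), subfamily.upper(), int(member))
```

For example, `parse_aldh_symbol("ALDH16A1")` yields family 16, subfamily A, member 1, whereas `parse_aldh_symbol("ALDH2")` raises an error because the symbol carries no subfamily letter.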

The ALDH1 family: The named ALDH1 family consists of ALDH1A1, ALDH1A2, and ALDH1A3, as well as ALDH1B1. By homology, and by AA sequence identity, ALDH2 is considered part of the ALDH1 family. However, it retains the ALDH2 designation for historical reasons, which include the length of time that it has had that name and the number of publications referring to it as such. By phylogenetic analysis and by AA sequence identity, ALDH2 would likely belong to the ALDH1B subfamily. ALDH1L1 and ALDH1L2 are not part of the ALDH1 family, which represents an additional historical exception. The three ALDH1A genes, ALDH1A1, ALDH1A2 and ALDH1A3, are all highly conserved and participate in the metabolism of retinaldehyde to retinoic acid, a key regulator of differentiation and development (Zhao, McCaffery et al. 1996; Niederreither, Fraulob et al. 2002; Rhinn and Dolle 2012). ALDH1A2 knockout mice are embryonic lethal, while ALDH1A1 and ALDH1A3 knockout mice are viable but with various developmental abnormalities (Niederreither, Subbarayan et al. 1999; Dupe, Matt et al. 2003; Molotkov and Duester 2003). These enzymes are also essential for the metabolism of lipid peroxidation-derived aldehydes, including 4-HNE, MDA, hexanal, octanal and decanal (Wang, Penzes et al. 1996; King and Holmes 1997; Graham, Brocklehurst et al. 2006). All three ALDH1A proteins are cytosolic, with ALDH1A1 and ALDH1A2 forming homotetramers and ALDH1A3 forming a homodimer (King and Holmes 1997; Hsu, Chang et al. 2000; Graham, Brocklehurst et al. 2006).

The ALDH1B group includes ALDH1B1 and ALDH2. Both are mitochondrially expressed as homotetramers. ALDH2 is the enzyme primarily responsible for the metabolism of acetaldehyde, the toxic byproduct of ethanol consumption (Klyosov, Rashkovetsky et al. 1996), and of nitroglycerin, an anti-anginal drug (Daiber, Wenzel et al. 2009). Rapid inhibition of ALDH2 by nitroglycerin leads to nitrate tolerance, as well as increasing oxidative stress (Daiber, Oelze et al. 2009). In the brain, ALDH2 is responsible for metabolism of 3,4-dihydroxyphenylacetaldehyde (DOPAL), a metabolite of the neurotransmitter dopamine (Marchitti, Deitrich et al. 2007). In addition, ALDH2 plays a key role in the metabolism of lipid peroxidation-derived aldehydes, especially 4-hydroxynonenal (4-HNE) and malondialdehyde (MDA) (Reichard, Vasiliou et al. 2000), in tissues where ALDH2 is expressed. ALDH2 enzyme activity can be disrupted in human populations by inactivating mutations (e.g., the ALDH2*2 mutation in 40-50% of east Asians) (Peng and Yin 2009), by pharmacological inhibition, and by environmental exposures (e.g., pesticides) (Koppaka, Thompson et al. 2012). Such disruption has consequences related to each substrate, including increased incidence of Parkinson’s disease due to DOPAL toxicity (Kamino, Nagasaka et al. 2000), higher rates of myocardial infarction due to lipid peroxidation-derived aldehyde damage (Jo, Kim et al. 2007), and hepatotoxicity and gastrointestinal tract cancers due to acetaldehyde toxicity (Matsuda, Yabushita et al. 2006). In alcoholics especially, saturation of ALDH2 by acetaldehyde competes with the substrates discussed above and may lead to each of these ALDH2 deficiency-related disorders (MacKerell, Blatter et al. 1986). Despite other similarities with ALDH1A enzymes, ALDH2 does not metabolize retinaldehyde to retinoic acid (Sobreira, Marletaz et al. 2011).

ALDH1B1 was first characterized in 1995 and showed a preference for short chain aldehydes (Stewart, Malek et al. 1995). Recently the Vasiliou lab characterized the substrate specificity of ALDH1B1 more broadly and showed that ALDH1B1 is a mitochondrial homotetramer that has specificity for short through long chain saturated aldehydes, but has poor activity towards the lipid peroxidation products MDA and 4-HNE (Stagos, Chen et al. 2010). Epidemiological studies have linked ALDH1B1 mutants with endpoints consistent with increased acetaldehyde toxicity upon drinking, suggesting that ALDH1B1 participates in acetaldehyde metabolism from ethanol ingestion (Husemoen, Fenger et al. 2008; Linneberg, Gonzalez-Quintela et al. 2010). A number of studies have linked ALDH1B1 to growth and development in human hematopoietic stem cells (Luo, Wang et al. 2007), and in the pancreas (Ioannou, Serafimidis et al. 2013). However, the substrate that likely mediates these effects was unknown prior to this work. Also, unpublished work by the Vasiliou lab has shown that in an ALDH1B1 knockout mouse model, glucose control and acetaldehyde clearance are impaired. Clearly, although there are hints regarding ALDH1B1’s substrates and functions in human physiology, much work remains to be done with this key enzyme.

The ALDH1L family: ALDH1L family members were described as ALDH1-like. The ‘-L’ designation for ‘-like’ is nomenclature that was once common in naming enzyme families, but is now discouraged. The ALDH portion of the bifunctional ALDH1L members is indeed nearest to ALDH1 members, but with a unique structure, function and percent AA identity (the ALDH domain of ALDH1L1 shares 50% AA identity with ALDH1A1, but the two proteins share only 27% AA identity overall), the ALDH1L enzymes qualify as a different family. The ALDH1L family consists of two members, ALDH1L1 and ALDH1L2. Both are homotetramers consisting of two domains, an amino-terminal formyl transferase domain and carboxy-terminal ALDH domain (Krupenko, Horstman et al. 1995). They catalyze the conversion of 10-formyltetrahydrofolate (10-fTHF) to tetrahydrofolate (THF) and CO2 (Garcia-Martinez and Appling 1993). 10-fTHF is a key metabolite involved in purine biosynthesis (and, therefore, has a role in DNA replication and repair). ALDH1L1 is cytosolic and likely acts to regulate cell growth and proliferation by limiting the available pool of 10-fTHF for purine biosynthesis (Oleinik, Krupenko et al. 2006; Strickland, Krupenko et al. 2013). ALDH1L2, by contrast, is mitochondrial and it has been suggested that in this context, the same reaction (10-fTHF → THF) is used to supply one-carbon groups (i.e., formate) for cytosolic one-carbon metabolism (Appling 1991; Strickland, Krupenko et al. 2011). Evidence for these opposing roles (i.e., ALDH1L1 is growth-suppressing and ALDH1L2 is growth-promoting) is found in the observation that ALDH1L1 is silenced in human cancers, allowing unlimited proliferation (Krupenko and Oleinik 2002; Rodriguez, Giannini et al. 2008; Chen, He et al. 2012), while ALDH1L2 remains expressed at normal levels (Krupenko, Dubard et al. 2010).
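The contrast between domain-level and whole-protein AA identity noted above for ALDH1L1 and ALDH1A1 can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions and is not the method used in this thesis: the function name `percent_identity` is hypothetical, the input is assumed to be a pair of pre-aligned sequences with `-` marking gaps, and a residue facing a gap is counted as a mismatch (other conventions for the denominator exist and give different values). The sequences in the tests are toy strings, not real ALDH sequences.

```python
from typing import Optional


def percent_identity(aligned_a: str, aligned_b: str,
                     start: int = 0, end: Optional[int] = None) -> float:
    """Percent identity over alignment columns [start, end).

    Assumed convention: a column with a residue opposite a gap counts
    as a mismatch; columns where both sequences have gaps are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(aligned_a.upper(), aligned_b.upper()))[start:end]
    # Drop columns that carry no residue in either sequence.
    columns = [(a, b) for a, b in columns if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)
```

With a toy alignment in which one sequence carries an extra amino-terminal region, the identity over the shared region is higher than the identity over the full alignment, which mirrors the way a bifunctional protein can resemble another family within one domain while remaining distinct overall.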

The ALDH3 family: The ALDH3 family consists of four members in humans, viz. ALDH3A1, ALDH3A2, ALDH3B1 and ALDH3B2. ALDH3A1 is a cytosolic homodimer which mediates oxidative stress (Estey, Piatigorsky et al. 2007). It catalyzes the oxidation of various lipid peroxidation-derived aldehydes, including 4-HNE (Pappa, Estey et al. 2003). In general, this has been shown to maintain cell proliferation by protecting against the anti-proliferative effects of lipid peroxidation-derived damage (Canuto, Muzio et al. 1999; Muzio, Trombetta et al. 2003). However, ALDH3A1 has also been shown to be expressed in the nucleus, where it may slow proliferation via down-regulation of various cell cycle and cell regulatory proteins (Pappa, Brown et al. 2005). It has been suggested that this may allow for DNA damage repair (Lassen, Pappa et al. 2006). ALDH3A1 is overexpressed in a number of cancers, including non-small-cell lung cancer and hepatoma (Canuto, Ferro et al. 1994; Huang, Hu et al. 2000). ALDH3A2 is a microsomal homodimer that participates in the oxidation of fatty alcohols to fatty acids (Ichihara, Kusunose et al. 1986; Chang and Yoshida 1997). Major substrates are thought to include metabolites of fatty alcohols, phytanic acid, leukotriene B4, and ether glycerolipids (Marchitti, Brocker et al. 2008). Various mutations in ALDH3A2 lead to Sjögren-Larsson syndrome (SLS), caused by abnormal accumulation of lipids in the membranes of cells of the skin and brain, aldehyde adducts to cellular macromolecules, and defective eicosanoid metabolism (Rizzo 2007). Gene therapy with active, wild-type human ALDH3A2 has proven promising in patients with SLS (Haug and Braun-Falco 2006).

Less is known about ALDH3B1 and ALDH3B2. ALDH3B1 metabolizes the lipid-peroxidation product 4-HNE as well as aliphatic aldehydes of carbon chain length six and up, including unsaturated and trans-2 aldehydes. It has poor activity against shorter aldehydes such as acetaldehyde and MDA (Marchitti, Orlicky et al. 2007). Little is known about the function of ALDH3B1, but it is proposed to protect the brain against oxidative stress, although not via a mechanism involving the metabolism of DOPAL (Marchitti, Orlicky et al. 2007). Mouse ALDH3B2 has only recently been characterized and been shown to have a broadly similar substrate specificity to ALDH3B1 (Kitamura, Takagi et al. 2015). ALDH3B2 localizes to lipid droplets, distinguishing it from ALDH3B1, which associates with the plasma membrane (Kitamura, Takagi et al. 2015). The oligomerization status of ALDH3B1 and ALDH3B2 is currently unknown.

ALDH4A1: In vertebrates, the ALDH4 – ALDH18 families consist of a single member each. ALDH4A1 is a homodimer which is expressed mitochondrially (Hu, Lin et al. 1996). The primary function of ALDH4A1 is the metabolism of pyrroline-5-carboxylate (P5C) to glutamate, a neurotransmitter (Forte-McRobbie and Pietruszko 1986; Haslett, Pink et al. 2004). Inactivating mutations of ALDH4A1 lead to the accumulation of P5C, which causes type II hyperprolinemia (Geraghty, Vaughn et al. 1998). There is also some evidence that ALDH4A1 participates in the mitigation of oxidative stress via the metabolism of short- and medium-chain lipid peroxidation-derived aldehydes (Farres, Julia et al. 1988).

ALDH5A1: ALDH5A1, expressed as a homotetramer in the mitochondria (Kang, Park et al. 2005), converts succinic semialdehyde (SSA) to succinate. This process represents the last step of γ-aminobutyric acid (GABA) catabolism. Inactivating mutations of ALDH5A1 lead to γ-hydroxybutyric aciduria due to the accumulation of GABA, γ-hydroxybutyric acid (GHB), and SSA in the body (Akaboshi, Hogema et al. 2003). ALDH5A1 probably does not participate in oxidative stress mitigation and is inhibited by the lipid peroxidation-derived aldehydes, 4-HNE and acrolein (Nguyen and Picklo 2003).

ALDH6A1: ALDH6A1 is a homotetramer expressed in the mitochondria (Kedishvili, Popov et al. 1992). It is CoA dependent and catalyzes the conversion of methylmalonate semialdehyde (MMS) to propionyl-CoA and malonate semialdehyde to acetyl-CoA (Goodwin, Rougraff et al. 1989). Inactivating mutations lead to conditions characterized by psychomotor delays and involve the accumulation of 3-aminoisobutyric acid, 3-hydroxyisobutyric acid, β-alanine, and 3-hydroxypropionic acid (Chambliss, Gray et al. 2000).

ALDH7A1: Expressed as a homotetramer, ALDH7A1 can be found in the nucleus, cytosol and mitochondria (Brocker, Lassen et al. 2010; Brocker, Cantore et al. 2011). Its best known function is the metabolism of alpha-aminoadipic semialdehyde (AASA) to aminoadipate, which is a key reaction in the pipecolic acid pathway of lysine catabolism (Chang, Ghosh et al. 1990). Additional substrates include medium- and long-chain aldehydes, betaine aldehyde, and 4-HNE. Inactivating mutations in this enzyme cause the accumulation of AASA, leading to pyridoxine-dependent epilepsy (PDE), which can be mitigated with high daily doses of pyridoxine (vitamin B6) (Mills, Struys et al. 2006). In plants, ALDH7A1 plays a role in osmotic regulation (Lee, Kuhl et al. 1994); it plays a similar role in humans by metabolizing toxic aldehydes and generating osmolytes, including betaine (Brocker, Lassen et al. 2010).

ALDH8A1: Relatively little is known about ALDH8A1. Expressed in the cytosol (Lin and Napoli 2000), ALDH8A1 has specificity for medium- to long-chain aldehydes, SSA, and glutaraldehyde. Interestingly, ALDH8A1 metabolizes retinaldehyde, specifically 9-cis retinaldehyde (Lin and Napoli 2000), and is the only ALDH to show preference for 9-cis retinaldehyde over all-trans retinaldehyde (Lin and Napoli 2000).

ALDH9A1: ALDH9A1 encodes a homotetramer found in the cytosol (Lin, Chen et al. 1996). This enzyme participates in the metabolism of γ-aminobutyraldehyde (involved in GABA biosynthesis), DOPAL (dopamine metabolism), betaine aldehyde (an osmolyte), γ-trimethylaminobutyraldehyde (carnitine biosynthesis), and acetaldehyde (Marchitti, Brocker et al. 2008).

ALDH16A1: This protein is unusual in that it contains two ALDH family domains – one is full length and the other is truncated, missing most of the catalytic domain (Vasiliou, Sandoval et al. 2013). ALDH16A1 is found in two major forms: a non-catalytic form found in mammals, fish and reptiles, and a catalytic form found in amphibians, lower animals, lower eukaryotes and bacteria. The non-catalytic human form has been shown to interact with a large number of proteins, and such protein-protein interactions likely underlie ALDH16A1’s function in its non-catalytic form. Studies have identified an ALDH16A1 mutation as a risk factor for hyperuricemia and gout, possibly due to interactions with HPRT1 (Sulem, Gudbjartsson et al. 2011; Vasiliou, Sandoval et al. 2013). To date, the function and substrate specificity of the catalytic form of ALDH16A1 remain unknown.

ALDH18A1: ALDH18A1 is an inner mitochondrial membrane protein (Hu, Lin et al. 1999) which consists of two domains, an amino-terminal amino acid kinase domain and a carboxy-terminal aldehyde dehydrogenase domain. It catalyzes the conversion of glutamate to delta-pyrroline-5-carboxylate, a part of the de novo synthesis pathway for proline and arginine (Hu, Lin et al. 1999). Inactivating mutations of ALDH18A1 lead to a metabolic syndrome with diverse abnormalities due to both accumulations of precursors and deficiency of products in this pathway. Such abnormalities include hypoprolinemia, hypoornithinemia, hypocitrullinemia, hypoargininemia and hyperammonemia with cataract formation, and connective tissue anomalies (Marchitti, Brocker et al. 2008).

Distribution of ALDH Genes in Vertebrates

ALDH enzymes are found in all three taxonomic domains (Archaea, Eubacteria, and Eukarya), suggesting an ancient origin and vital role in the physiology of organisms. Since the full sequencing of the human genome and gradual revision and annotation of protein-coding genes, the number of known ALDH enzymes in humans has stabilized at 19 members. This process has begun and continued in recent years for numerous other species, including key model organisms, especially mice. The use of model organisms, especially vertebrates, to represent human physiology has a long and scientifically useful history. However, the valid use of model organisms requires understanding of the similarities and differences between the number and functions of the genes involved in key pathways in each species. In addition, gene duplication events, leading to multiple functional and/or non-functional genetic copies in the genome, can significantly complicate polymerase chain reaction (PCR)-based genotyping assays. Transgenic animal models have permitted the exploration of the functions of ALDHs under in vivo physiological and pathophysiological conditions (Marchitti, Brocker et al. 2008).

The Vasiliou lab has a long history of annotating and updating the ALDH gene family as new genomic data has become available. In 1999, a standardized nomenclature system was proposed for the ALDH superfamily, described above, similar to the nomenclature system used in the cytochrome P450 superfamily (Vasiliou, Bairoch et al. 1999). This included all 86 eukaryotic ALDH sequences known at the time. Sixteen of the 19 human ALDH enzymes were known by 2000 (Vasiliou and Pappa 2000). These nomenclature reports were periodically updated, with 331 records known in 2001 and 555 records in 2003 (Sophos, Pappa et al. 2001; Sophos and Vasiliou 2003). By 2005, all 19 human ALDH genes had been discovered and named (Vasiliou and Nebert 2005). In recent years, the availability of next-gen (second generation high-throughput) sequencing has increased the availability of genomic data exponentially. A download from the Pfam database in 2011 (build version 24.0) includes 16,765 ALDH entries (listed as aldedh in the Pfam database) (Finn, Mistry et al. 2010). By 2013, this database listed almost 40,000 ALDH entries. Many of these are from recently completed genomes with automatic annotation. Although identification is often quite accurate due to well-established protein family signatures, in many cases genes annotated this way are assigned to the wrong families or subfamilies. In addition, generous algorithms list pseudogenes as protein-coding genes, or may not annotate gene duplication events correctly. In Chapter II, the full ALDH number is reviewed and annotated in 11 representative vertebrate species in which the full genome has been sequenced: five primates, the cow, two rodents, two birds and one fish. Where possible, pseudogenes are identified and gene-duplication events are annotated. In this study, it is shown that the 19 human ALDHs are typical of the number of ALDHs in each vertebrate examined, with notable duplications or deletions in certain lineages.


Frog ALDHs and Phylogenies of Vertebrate ALDHs

This chapter continues the work introduced in Chapter II, which described the general distribution of ALDHs across vertebrates, based on a few individual species per group. However, this work could be expanded in several key ways. ALDH1B1 has now been shown to have arisen as a gene duplication event from ALDH2 (Jackson, Holmes et al. 2013), but the distribution of ALDH1B1 is limited to mammals and one frog species, as opposed to ALDH2, which is distributed widely throughout animals. The Vasiliou lab has recently completed an initial characterization of ALDH16A1, which is present in two major forms: a non-catalytic form present in most vertebrates examined, including mammals and fish, and a catalytic form found in lower animals (Vasiliou, Sandoval et al. 2013). However, it was discovered that frogs possess a putatively catalytic ALDH16A1, which was phylogenetically similar to that of lower animals. The observations of unusual distributions of frog ALDH1B1 and ALDH16A1 suggest that frog ALDHs may be unusual, but it is difficult to quantify this exactly given current information. First, no amphibian or reptile was included in earlier gene nomenclature work, since they were only sequenced quite recently. Therefore, the first step is to determine the number and type of ALDH genes present in amphibians. Second, this chapter aims to determine if these apparent anomalies in the distribution of ALDHs in frogs represent horizontal gene transfer (HGT) event(s).

Gene transfer between non-mating species is called horizontal gene transfer (HGT). The more typical form of gene transfer is vertical gene transfer, where genes are passed from a parent species to divergent offspring whose genes are separated either by speciation events or by gene duplications. Vertebrate-to-vertebrate HGT is poorly understood, although gene transfer between bacteria has been well documented (Arber 2014). Further, transfer of prokaryotic genes to eukaryotes, including primates, flies, and nematodes, has been shown (Crisp, Boschetti et al. 2015). Evidence for HGT may come in several forms, including the presence of genes in a species or lineage that are highly similar to those of an evolutionarily distant species, but not to those of more closely related species. The best evidence comes from the construction of gene phylogenies. Numerous gene phylogenies have led to the conclusion that eukaryote-to-eukaryote HGT has occurred (Keeling and Palmer 2008). Although a gene may be introduced into a genome directly, adding to the number of existing genes, in many instances the acquired gene replaces an existing homologue rather than introducing a new gene (Keeling and Palmer 2008). Although detection of HGT in vertebrates has gained interest in recent years, little is known about the mechanisms by which genes move and the factors that encourage or discourage this movement (Keeling and Palmer 2008).

In Chapter III, to investigate these questions, all frog ALDH genes found across several databases were aligned to create a unique list of the ALDH genes in frog. Next, AA sequences for each of these genes were used to seed large gene trees (300 nearest neighboring homologous genes) to determine the origin and extent of each gene. By doing this, 1) each frog gene was placed within a broad phylogenetic context to be able to judge evidence for unusual gene distributions, and 2) these gene trees allow further examination of the phylogenetic distribution of these genes, especially within this lineage, since frog ALDHs are well representative of vertebrate ALDHs.


Evolution and Structural Similarities Between ALDH1B1 and ALDH2

One pair of genes, ALDH2 and ALDH1B1, is receiving increased attention due to their high degree of similarity. Based on phylogenetic clustering, AA similarity and structural similarity, and using the typical ALDH superfamily naming conventions, ALDH2 should be folded into the ALDH1 family. However, for historical reasons (see above), ALDH2 retains the ‘2’ family designation. As described above, the ALDH2 gene resides on human chromosome 12 and encodes a mitochondrial tetramer that is active in a number of endogenous and exogenous detoxifications (Hsu, Bendel et al. 1988).

Importantly, ALDH2 is the enzyme that plays the greatest role in metabolism of acetaldehyde, the primary toxic product of ethanol consumption (Wang, Nakajima et al.

2002; Ohta, Ohsawa et al. 2004; Marchitti, Brocker et al. 2008). This primacy is illustrated best in East Asian human subjects possessing a dominant inactive genetic variant, ALDH2*2 (E487K). This variant results in alcohol ‘flushing’ and a lowered acetaldehyde clearance capacity, arising from lowered coenzyme (NAD+) binding affinity and is generally considered inactive (i.e., greater than 90% loss of enzyme activity)

(Yoshida, Ikawa et al. 1985; Goedde and Agarwal 1987; Higuchi, Muramatsu et al. 1992;

Cook, Luczak et al. 2005; Chen, Peng et al. 2009). ALDH2 forms homotetramers in vitro, but it has been demonstrated that ALDH2*2 can form heterotetramers with ALDH2, dominantly inactivating wild-type monomers in the final protein (Wang, Sheikh et al.

1996). ALDH1B1 is similarly mitochondrially located.

Several lines of evidence suggest that ALDH1B1 also participates in the clearance of acetaldehyde on drinking. ALDH1B1 shares 72 percent AA identity with ALDH2 (ALDH2’s closest relative within the ALDH1 family), and like ALDH2, forms homotetramers and is capable of metabolizing acetaldehyde in the body (Km of 55 µM), in addition to numerous other aldehydes (Stagos, Chen et al. 2010). ALDH1B1 is present in tissues where it is likely to encounter ethanol-derived acetaldehyde: its strong expression in the stomach and small intestine makes it likely to metabolize acetaldehyde derived from the metabolism of ethanol by bacterial alcohol dehydrogenases, thereby playing a role in first-pass metabolism. Immunohistochemical staining of ALDH1B1 in human liver shows a strong prevalence of ALDH1B1 (Chen, Orlicky et al. 2011), although its levels are likely lower than ALDH2 levels based on reported relative transcript levels (Stagos, Chen et al. 2010). The aversive effects of acetaldehyde typically begin around 40-60 µM, tested in wild-type individuals (ALDH2*1/*1) (Johnsen, Stowell et al. 1992). When alcoholic individuals were given 0.5 g/kg of ethanol (equivalent to 3-4 drinks in an average weight man), wild-type individuals (ALDH2*1/*1) had a measured blood acetaldehyde concentration of 1.8 µM (Chen, Peng et al. 2009). Heterozygous individuals (ALDH2*1/*2) had an average blood acetaldehyde concentration of 57.5 µM and homozygous mutant (ALDH2*2/*2) individuals had 108.7 µM blood acetaldehyde concentration. Interestingly, these values are strong evidence for incomplete dominance in heterozygotes, and for the effects that even a small amount of active ALDH2 can have. Similar values were seen in non-alcoholic populations of ALDH2*1/*1 and ALDH2*1/*2 individuals, but since the ethanol dose was so poorly tolerated by ALDH2*2/*2 individuals, data could not be collected in that group. At these elevated acetaldehyde concentrations, ALDH1B1 (Km = 55 µM) would be well suited to participate in the metabolism of acetaldehyde.
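The relationship between the ALDH1B1 Km and the blood acetaldehyde concentrations quoted above can be illustrated with the Michaelis-Menten relation, v/Vmax = [S]/(Km + [S]). The following is a minimal sketch only. It assumes simple single-substrate Michaelis-Menten kinetics with saturating cofactor, and it treats the blood concentration as a proxy for the concentration seen by the enzyme; neither assumption is made in the studies cited. The function and variable names are illustrative.

```python
def fractional_velocity(substrate_um: float, km_um: float) -> float:
    """Return v/Vmax from the Michaelis-Menten equation (both inputs in uM)."""
    if substrate_um < 0 or km_um <= 0:
        raise ValueError("substrate must be >= 0 and Km must be > 0")
    return substrate_um / (km_um + substrate_um)


# Km of ALDH1B1 for acetaldehyde, as quoted in the text.
ALDH1B1_KM_UM = 55.0

# Blood acetaldehyde concentrations (uM) by ALDH2 genotype, as quoted in the text.
BLOOD_ACETALDEHYDE_UM = {
    "ALDH2*1/*1": 1.8,
    "ALDH2*1/*2": 57.5,
    "ALDH2*2/*2": 108.7,
}


def saturation_by_genotype() -> dict:
    """Fraction of Vmax that ALDH1B1 would reach at each quoted concentration."""
    return {
        genotype: fractional_velocity(conc, ALDH1B1_KM_UM)
        for genotype, conc in BLOOD_ACETALDEHYDE_UM.items()
    }


if __name__ == "__main__":
    for genotype, fraction in saturation_by_genotype().items():
        print(f"{genotype}: {fraction:.0%} of Vmax")
```

Under these assumptions, ALDH1B1 would operate at about 3% of Vmax at the wild-type concentration, but at about 51% and 66% of Vmax at the concentrations measured in heterozygous and homozygous ALDH2*2 individuals, respectively.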


Additional evidence for the participation of ALDH1B1 in acetaldehyde metabolism in vivo comes from epidemiological studies that have shown that mutations in ALDH1B1 result in physiological effects which are consistent with increased acetaldehyde toxicity (i.e., ethanol aversion and ethanol hypersensitivity reactions) (Husemoen, Fenger et al. 2008; Linneberg, Gonzalez-Quintela et al. 2010). Finally, in an ALDH1B1 mouse knockout model developed in the Vasiliou Lab, ALDH1B1-deficient mice are less tolerant of ethanol feeding than wild-type mice, achieving higher blood acetaldehyde levels [Manuscript in preparation]. I hypothesize that, given these data, ALDH1B1 should compensate for the lack of ALDH2 activity in ALDH2*2 individuals to some extent, especially at blood acetaldehyde concentrations exceeding 50 µM. Given the acetaldehyde-metabolizing capacity of ALDH1B1, it is surprising that ALDH2*2 individuals are so susceptible to acetaldehyde toxicity, in that their reduced metabolic activity is not compensated for by ALDH1B1 activity to a greater extent. One possible reason is that ALDH2*2 forms heterotetramers with ALDH1B1, thereby exerting a negative effect on ALDH1B1 activity similar to that seen for ALDH2. This possibility was first suggested by Vasiliou in 2009 (Endo, Sano et al. 2009).

In Chapter IV, the phylogenetic and sequence similarities between ALDH1B1 and

ALDH2 are investigated. Understanding the evolutionary origins of ALDH1B1 will allow better understanding of ALDH1B1, helping to distinguish between a duplication of

ALDH2 that would serve as a backup to ALDH2’s detoxification functions, and a new enzyme that plays a unique physiological function. The predicted sequences, structures and phylogeny of vertebrate ALDH2 and ALDH1B1 genes and enzymes are identified and

these results are compared with those previously reported for human (Homo sapiens) and mouse (Mus musculus) ALDH2 and ALDH1B1. Phylogenetic analyses describe the relationships and potential origins of ALDH2 and ALDH1B1 genes during mammalian and vertebrate evolution. Given their structural similarity and the mechanism by which ALDH2*2 monomers lower the activity of ALDH2, the possibility of a similar heterotetramerization-based suppression of ALDH1B1 warrants examination. Molecular modeling studies of protein-protein interactions between ALDH1B1 and ALDH2 subunits are used to examine whether dominant negative heteromerization of ALDH2*2 with ALDH1B1 may contribute to a lack of compensation by ALDH1B1 in ALDH2*2 individuals. Specifically, conservation of the monomer-monomer and dimer-dimer interfaces and interactions is predicted and compared.

ALDH ‘Dead Enzymes’

The existence of catalytically inactive homologues of enzymes has been known for more than half a century (Brew, Vanaman et al. 1967), and these proteins have been called “inactive enzyme-homologues,” “nonenzymes,” “pseudoenzymes” or “dead enzymes” (Leslie

2013; Vasiliou, Sandoval et al. 2013). These can be defined as homologues of enzymes which are predicted to retain protein expression, subcellular localization and typical protein folding, but which have lost key residues required for catalytic activity. These are in contrast to pseudogenes, which have no protein product. Over the intervening years, other examples of enzymes losing catalytic ability and recruitment to non-catalytic functions have been documented, but this has always been assumed to be a relatively rare occurrence. Whole genome sequencing has altered this contention dramatically. When

the entire human genome was combed for protein kinases in 2002, nearly 10% (50 of 518) of all human kinases were found to lack at least one of the three conserved catalytic residues required for activity (Manning, Whyte et al. 2002). Further, rather than being unique defects, 28 of these inactive kinases had inactivations that were conserved in human, fly, worm and yeast (Manning, Whyte et al. 2002). These enzymes were shown to have taken on new, evolutionarily conserved, non-catalytic roles. It is now understood that such “dead enzymes” are present in a wide variety of enzyme families and play diverse roles in physiology and pathophysiology.

Dead enzymes have now been found in most enzyme families and a number of commonalities have been found. It has been postulated that the conservation of protein expression (a physiologically expensive process) suggests that dead enzymes are likely to serve some physiological function, although such function is yet to be defined in many newly discovered dead enzyme families. Indeed, many dead enzymes are conserved across taxonomic groups (e.g., among vertebrates, or insect species). This provides evidence of functional significance and conservative selective pressure – otherwise, many of these proteins would have been lost over evolutionary time (Pils and Schultz 2004).

Dead enzymes often have regulatory functions (Pils and Schultz 2004), and frequently, the processes regulated are those in which their active counterparts participate (Adrain and Freeman 2012). The mechanism of action typically involves protein-protein interaction; the dead enzyme may modulate the activity of an active enzyme, or interact with a separate protein substrate acting as an allosteric modulator. In some cases, dead enzymes act on their natural substrates directly, sequestering them either as inhibitors of

other enzymes, or anchoring them in a particular subcellular space (Reiterer, Eyers et al. 2014). Examples of some of these mechanisms are provided below.

The best studied dead enzymes belong to the kinase and phosphatase families (sometimes referred to as pseudokinases and pseudophosphatases). The Trib family of pseudokinases has three members, all of which have been implicated as modulators of tumorigenesis. TRIB1 allosterically interacts with MEK1, leading to greater ERK phosphorylation; this has been found to be a key factor in myeloid leukemias (Yokoyama, Kanno et al. 2010). TRIB2, a downstream target of Wnt in liver cancer, stabilizes YAP by interacting with two E3 ligases: binding of TRIB2 to βTrCP blocks its targeting of YAP for proteasomal degradation, and TRIB2 promotes degradation of C/EBPα, an inhibitor of YAP/TEAD transcriptional activation, likely due to interaction with the E3 ligases COP1 or TRIM21 (Wang, Park et al. 2013). TRIB3 (also called TRB3) is the best studied member of this family and has been found to interact with a range of partners including transcription factors, ubiquitin ligases, BMP type II receptor, and members of the MAPK and PI3K signaling pathways, thereby affecting a wide range of physiological processes including energy homeostasis, apoptosis, differentiation, and stress response (Hua, Mu et al. 2011). Knockdown of TRIB3 inhibits migration and invasion of tumor cells, and modulates proteins regulating epithelial-to-mesenchymal transition (EMT) (Hua, Mu et al. 2011). ErbB3 is a pseudokinase that interacts as part of a heterodimer with its active enzyme counterpart to modulate its activity. ErbB receptors typically hetero- or homodimerize and undergo trans-phosphorylation (Reiterer, Eyers et al. 2014). However, despite ErbB3’s inability to bind ATP and lack of tyrosine kinase activity, ErbB2-ErbB3 dimers are considered the most potent pairing of ErbB dimers with regard to mitogenic effects. Expression of one or both of these partners has been seen in a number of cancers (Baselga and Swain 2009). It is also important to note that the apparent loss of key catalytic residues may in fact reflect atypical mechanisms of action, rather than inactivating mutations. Both WNK1 and CASK are kinases that were originally thought to be inactive due to missing key residues or motifs, but have since been shown to be catalytically active using non-canonical modes of action (Adrain and Freeman 2012). A list of known pseudokinases and pseudophosphatases and the effects of knockdown / knockout in mice is presented by Reiterer and colleagues (Reiterer, Eyers et al. 2014).

Another well-studied group of dead enzymes is the pseudoenzyme rhomboid proteases, dubbed the iRhoms. Active rhomboid proteases are transmembrane proteins whose active site lies within the transmembrane domain. They bind and cut Type I transmembrane proteins and release them into the luminal or extracellular space. This action is especially critical in extracellular signaling. iRhoms are catalytically inactive, but retain the transmembrane localization and binding to transmembrane proteins (Adrain and Freeman 2012). iRhom2 is located in the endoplasmic reticulum (ER) and interacts with TACE (TNFα-converting enzyme), allowing its exit from the ER to cleave and release the cytokine tumor necrosis factor (TNF) from the cell surface (Adrain, Zettl et al. 2012). Together, these represent some of the well-known modes of action of dead enzymes.

ALDH proteins usually comprise ≈ 500 AA, although some multifunctional or multi-domain members may be larger (Jackson, Brocker et al. 2011). They are typically dimers or tetramers and consist of three domains: a substrate-binding (catalytic) domain, a cofactor (NAD(P)+) binding domain, and a dimerization/tetramerization domain


(Steinmetz, Xie et al. 1997). Several AA residues have been shown to be highly conserved and are required for activity. These include CYS302, ASN169, and GLU285 in the catalytic domain, GLY262 and GLY267 in the , and LYS209,

GLU416, PHE418 in the cofactor binding domain (Steinmetz, Xie et al. 1997).
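The conserved residues listed above suggest a simple first-pass screen for candidate dead enzymes: after a sequence has been aligned to the reference numbering, check whether each conserved position still carries the expected residue. The sketch below is illustrative only. It assumes the alignment step has already been done and that the caller supplies the residue observed at each reference position; the function names and the example substitution are hypothetical. As the WNK1 and CASK examples above show, a missing residue is a prediction of inactivity, not proof.

```python
from typing import Dict, Tuple

# Conserved residues named in the text, keyed by reference position
# and written as one-letter amino acid codes.
CONSERVED_RESIDUES: Dict[int, str] = {
    169: "N",  # ASN169, catalytic domain
    285: "E",  # GLU285, catalytic domain
    302: "C",  # CYS302, catalytic domain
    262: "G",  # GLY262
    267: "G",  # GLY267
    209: "K",  # LYS209, cofactor binding domain
    416: "E",  # GLU416, cofactor binding domain
    418: "F",  # PHE418, cofactor binding domain
}


def lost_residues(observed: Dict[int, str]) -> Dict[int, Tuple[str, str]]:
    """Map each conserved position that differs to (expected, observed).

    Positions absent from `observed` are reported with "-" (alignment gap).
    """
    lost = {}
    for position, expected in CONSERVED_RESIDUES.items():
        found = observed.get(position, "-").upper()
        if found != expected:
            lost[position] = (expected, found)
    return lost


def predicted_dead(observed: Dict[int, str]) -> bool:
    """True if at least one conserved residue is missing or substituted."""
    return bool(lost_residues(observed))


if __name__ == "__main__":
    # Hypothetical homologue carrying a cysteine-to-serine change at position 302.
    candidate = dict(CONSERVED_RESIDUES)
    candidate[302] = "S"
    print(lost_residues(candidate))
    print(predicted_dead(candidate))
```

A screen of this kind only flags candidates; confirming that a homologue is a dead enzyme still requires evidence of expression and of lost catalytic activity.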

In addition to their catalytic roles, ALDHs have been shown to possess non-enzymatic roles. Only two ALDH dead enzyme groups have been studied in any great detail, but in the course of the present study, several more were identified. The non-enzymatic roles of ALDH proteins, in both catalytically-active and -inactive proteins, are described here. This provides a starting point for understanding the possible roles of novel catalytically-inactive ALDH proteins.

Small molecule binding and adduction by catalytically-active ALDHs: ALDH3A1 protects cellular proteins by the enzymatic detoxification of lipid peroxidation-derived aldehydes (Pappa, Estey et al. 2003), but also by directly scavenging hydroxyl radicals via CYS sulfhydryl groups in a manner reminiscent of GSH-mediated quenching (Uma, Hariharan et al. 1996). Thus, overexpression of these proteins can be protective regardless of catalytic function. Other ALDHs, such as ALDH1A1 and ALDH2, have also been shown to be targets of adduction by acetaminophen, often resulting in partial enzyme inactivation (Landin, Cohen et al. 1996; Lee, Liao et al. 2013). In non-small cell lung carcinoma cell lines, the synthetic flavone flavopiridol, a cytotoxic drug, binds tightly to ALDH1A1 without inhibiting its ALDH catalytic activity or undergoing metabolism (Schnier, Kaur et al. 1999). Xenopus ALDH1A1, first described as Cytosolic Thyroid Hormone-binding Protein (xCTBP), shows a high affinity for binding triiodothyronine (T3), which is disrupted by NAD(H)+ but not NADP(H)+ (Yamauchi and Tata 1994; Yamauchi, Nakajima et al. 1999). This is consistent with reports that the T3 binding site is estimated to be in residues 93-114, i.e., in the cofactor-binding domain, as ALDH1A1 requires NAD+ (and cannot use NADP+) as a cofactor for catalytic oxidation of aldehydes (Yamauchi, Nakajima et al. 1999). Similarly, ALDH1A1 has been shown to bind daunorubicin, a chemotherapeutic agent, in a manner that is competitive with NAD binding (Banfi, Lanzi et al. 1994). In genital skin fibroblasts, ALDH1A1 has also been shown to bind androgen (i.e., dihydrotestosterone bromoacetate) (Pereira, Rosenmann et al. 1991).

Ocular ALDHs: The cornea and lens of the eye have the unique physiological requirements of maintaining high clarity (low light diffraction) while being resistant to incident UV radiation (UVR). The presence of specialized proteins, crystallins, helps these tissues accomplish these important tasks. Corneal and lens crystallins may be a single protein or set of proteins that compose up to 90% of the total water-soluble proteins within these tissues (Eguchi 1966; Chen, Thompson et al. 2013). Examples of crystallins include ALDH1A1, α-enolase, glutathione-S-transferase, lactic dehydrogenase, glyceraldehyde-3-phosphate dehydrogenase (G3PDH), and argininosuccinate lyase (Jester 2008). They typically share several properties including i) being recruited from diverse, pre-existing cytoplasmic stress-response enzymes, ii) having a highly taxon-specific nature (different genes may serve the same function in different groups), and iii) accumulating to high levels in transparent tissues (Chen, Thompson et al. 2013).

Members of the ALDH1, ALDH2, and ALDH3 families serve as lens and corneal crystallins (Chen, Thompson et al. 2013). ALDH3A1 is the predominant corneal crystallin in most mammals (Verhagen, Hoekzema et al. 1991), but ALDH1A1 serves this purpose in human, rabbit, pig, chicken, and fish (Holmes, Cheung et al. 1989). ALDH2 is also expressed in the cornea in rabbit and fish (Pappa, Sophos et al. 2001). In the lens, ALDH1 family members serve as crystallins as follows: ALDH1A1 in most mammals, ALDH1A8 (η-crystallin) in the elephant shrew, ALDH1A9 (Ω-crystallin) in scallops and ALDH1C1/2 (Ω-crystallins) in cephalopods (Graham, Hodin et al. 1996).

The catalytic activity of the ALDHs allows for the enzymatic detoxification of ultraviolet radiation (UVR)-induced, ROS-mediated LPO products. However, some important functions of the crystallins manifest independently of their catalytic properties. For example, high concentrations of these proteins provide short-range order within the cytoplasm of the lens fibers to reduce light scattering (Jester 2008). Another structural role involves direct absorption of UVB irradiation. The effectiveness of this process is illustrated in the bovine cornea in which ALDH3A1 accounts for 17% of the total cellular protein, but accounts for ≈ 50% of the total UVB absorptive capacity of the cornea

(Abedinia, Pain et al. 1990).

The Ω-crystallins (ALDH1A9 and ALDH1C1/2) are the only ALDH crystallins known to date that do not have catalytic activity (Zinovieva, Tomarev et al. 1993), and arise from a single gene homologue. Ω-crystallin represents 14% of the total soluble lens protein in octopus eye, but is present in minor, non-crystallin amounts in squid eye lens

(Tomarev, Chung et al. 1995). However, this protein represents ≈70% of the total soluble protein of the photophore (light organ) lens in the squid (Tomarev, Chung et al. 1995).

ALDH16A1: ALDH16A1 is structurally unusual in that it contains two ALDH family domains (Pfam: aldedh) (Finn, Bateman et al. 2014), as opposed to the single

domain typical of other ALDH family members. One of these domains is full length and the other is truncated, i.e., missing most of the catalytic domain (Vasiliou, Sandoval et al.

2013). This protein has widely distributed homologues in bacteria, protists, fish, amphibians and mammals, but not in archaea, fungi, and plants (Vasiliou, Sandoval et al.

2013). However, in all vertebrates (with the unusual exception of the frog), ALDH16A1 is predicted to be catalytically-inactive, missing a number of key catalytic and cofactor binding residues (Vasiliou, Sandoval et al. 2013). This makes ALDH16A1 a unique case among known ALDH family members in that it has a form that is predicted to be catalytically-inactive and yet conserved among a large phylogenetic group. Of interest,

ALDH16A1 protein expression has been detected in human immortalized cell lines by immunoblot, demonstrating that a protein product of this gene is produced (Vasiliou,

Sandoval et al. 2013).

ALDH16A1 interacts with a number of proteins, and it is likely that its physiological role lies in these interactions (Vasiliou, Sandoval et al. 2013). A nucleotide insertion resulting in the premature truncation of the protein maspardin underlies Mast syndrome, a form of spastic paraplegia (Simpson, Cross et al. 2003). This protein has been shown to colocalize and interact with ALDH16A1 in neuronal cells, although it is not yet known what function this interaction serves (Hanna and Blackstone 2009). Other proteins that have been found to interact with ALDH16A1 include S-phase kinase-associated protein 1 (SKIP-1) (Foster, Rudich et al. 2006), proteasomal ATPase-associated factor 1 (PAAF1) (Ewing, Chu et al. 2007), ubiquitin specific peptidase 1 (USP1) (Sowa, Bennett et al. 2009), and protein kinase, AMP-activated, gamma 2 non-catalytic subunit (PRKAG2) (Behrends, Sowa et al. 2010). In addition, databases of interaction data either curated or automatically extracted from PubMed predict that ALDH16A1 interacts with albumin (ALB), solute carrier family 2-facilitated glucose transporter (SLC2A4), heat shock protein 90 kDa alpha class B member 1 (cytosolic; HSP90AB1), hypoxanthine phosphoribosyltransferase 1 (HPRT1), betaine–homocysteine S-methyltransferase (BHMT), and glioblastoma amplified sequence (GBAS) (Niu, Otasek et al. 2010; Kerrien, Aranda et al. 2012).

An epidemiological study in an Icelandic population identified a single nucleotide polymorphism (SNP) of ALDH16A1 (ALDH16A1*2) as a risk factor for hyperuricemia and gout (Sulem, Gudbjartsson et al. 2011). Computational analyses suggest that this is likely due to altered interactions with hypoxanthine phosphoribosyltransferase 1 (HPRT1) (Vasiliou, Sandoval et al. 2013). Homology modeling and docking experiments with frog ALDH16A1 (resembling the ALDH16A1 of lower animals) predict the enzyme to be catalytically active and to bind aldehyde substrates normally, although substrate specificity has not been extensively investigated. This suggests that at some point ALDH16A1 was recruited from an enzymatically-active role in lower animals to a non-catalytic role which has been preserved in vertebrates.

Negative regulation of ALDH catalytic activity by ALDH mutants: The E487K mutant of ALDH2 (ALDH2*2) is inactive, showing a 150-fold increase in Km for cofactor (NAD+) and a 2-10-fold decrease in Vmax (Xiao, Weiner et al. 1995). Structural studies have shown that this mutation, located in the oligomerization domain, results in a large disordered region at the dimer interface, which includes much of the coenzyme-binding cleft, as well as part of the catalytic cleft (Larson, Weiner et al. 2005). Further, it has been demonstrated that ALDH2*2 can negatively regulate ALDH2*1 via dominant negative heterotetramerization (Xiao, Weiner et al. 1995). Estimates of the reduction in activity in heterozygotes are ~85% (Ferencz-Biro and Pietruszko 1984; Enomoto, Takase et al. 1991). Although it has not been shown in vitro, heterodimerization between the products of closely related ALDH genes has been predicted as well, based on colocalization and conserved interactions in dimer and tetramer interfaces (Jackson, Holmes et al. 2013). It stands to reason that conservation of dimerization and tetramerization domains may allow crosstalk between active enzymes and closely related dead enzymes, and this possibility should be explored in newly discovered ALDH dead enzyme groups.
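The scale of a dominant negative effect in a tetrameric enzyme can be illustrated with a simple combinatorial sketch. The model below is a toy calculation, not a result from the studies cited. It assumes that a heterozygote produces wild-type and mutant subunits in equal amounts, that subunits assemble into tetramers at random, and that a tetramer is either fully active or fully inactive depending on how many mutant subunits it contains. All function names and parameters are illustrative.

```python
from math import comb


def tetramer_distribution(p_mutant: float = 0.5, subunits: int = 4) -> list:
    """Binomial probability of a tetramer containing k = 0..4 mutant subunits."""
    if not 0.0 <= p_mutant <= 1.0:
        raise ValueError("p_mutant must lie between 0 and 1")
    return [
        comb(subunits, k) * p_mutant**k * (1.0 - p_mutant) ** (subunits - k)
        for k in range(subunits + 1)
    ]


def active_fraction(max_mutant_tolerated: int = 0, p_mutant: float = 0.5) -> float:
    """Fraction of tetramers assumed active under the all-or-none toy model.

    A tetramer counts as active if it holds at most `max_mutant_tolerated`
    mutant subunits.
    """
    distribution = tetramer_distribution(p_mutant)
    return sum(distribution[: max_mutant_tolerated + 1])


if __name__ == "__main__":
    for tolerated in (0, 1):
        remaining = active_fraction(tolerated)
        print(f"tolerating {tolerated} mutant subunit(s): "
              f"{remaining:.2%} active, {1 - remaining:.2%} lost")
```

If a single mutant subunit inactivates the whole tetramer, only 1 in 16 tetramers (6.25%) remains active in this model, a loss of about 94%. If one mutant subunit is tolerated, 31.25% remain active, a loss of about 69%. The ~85% reduction estimated for heterozygotes falls between these two cases, which is at least compatible with a strong but not absolute dominant negative effect.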

Substrate Specificity and Human Mutations of ALDH1B1

Enzymes with high AA similarity often share similar substrate specificities.

ALDH1B1 shares 72 percent AA identity with ALDH2, and 64 percent AA identity with

ALDH1A1. ALDH2 has been shown to possess three types of catalytic activity, namely aldehyde dehydrogenase, esterase, and nitroglycerin reductase (Daiber, Wenzel et al.

2009). ALDH1B1 has previously been reported to possess two of these activities, i.e., aldehyde dehydrogenase and esterase (Stagos, Chen et al. 2010). It is not known whether

ALDH1B1 has nitroglycerin reductase activity. As noted, initial reports of ALDH1B1 substrate specificity indicated a preference for short chain aldehydes, including acetaldehyde and propionaldehyde (Stewart, Malek et al. 1995). More recently, Stagos and colleagues described a broader range of substrates for ALDH1B1, including acetaldehyde (Km = 55 µM), benzaldehyde (Km = 50 µM), and p-nitrophenyl acetate

(Km = 288 µM). Unlike ALDH2, ALDH1B1 metabolizes 4-HNE very poorly (Km =

3,383 µM) but had some activity towards MDA (Km = 466 µM) (Stagos, Chen et al.


2010), making it unlikely that ALDH1B1 plays a large role in detoxifying these products of lipid peroxidation (LPO). Retinoic acid signaling plays a role in the development and homeostasis of many human tissues (Theodosiou, Laudet et al. 2010). The oxidation of retinaldehyde to the biologically-active retinoic acid represents another important ALDH family function and there is some evidence that ALDH1B1 may play a role in retinoic acid signaling. For example, ALDH1B1 activity may be downregulated by retinoic acid and retinaldehyde (Crabb, Stewart et al. 1995), and a role for ALDH1B1 in granulocytic development of human hematopoietic stem cells has been proposed through a mechanism involving retinoic acid signaling (Luo, Wang et al. 2007). In addition, ALDH1B1 has been shown to be a stem cell / progenitor marker in the development of the pancreas

(Ioannou, Serafimidis et al. 2013). Finally, the AA sequence similarity between

ALDH1B1 and traditional retinaldehyde-metabolizing enzymes, such as the ALDH1A subfamily, lends further support to the possibility that ALDH1B1 may play a role in these pathways.
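Identity figures such as the 72 percent quoted above for ALDH1B1 and ALDH2 are derived from a pairwise alignment of the two AA sequences. The sketch below shows one common way to compute the figure, counting identical residues over the aligned columns in which neither sequence has a gap. This choice of denominator is an assumption; other conventions (for example, dividing by the length of the shorter sequence) give slightly different values, and the text does not state which was used. The example sequences are short made-up strings, not ALDH sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Both inputs must come from the same pairwise alignment, so they must
    have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    identical = 0
    for residue_a, residue_b in zip(aligned_a.upper(), aligned_b.upper()):
        if residue_a == gap or residue_b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if residue_a == residue_b:
            identical += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy alignment: 5 ungapped columns, 4 of them identical.
    print(percent_identity("MKV-LA", "MRVQLA"))
```

In practice the alignment itself would be produced by a dedicated alignment tool, and the identity value depends on the alignment parameters as well as on the counting convention.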

ALDH2 plays an important role in the metabolic activation of nitroglycerin

(Chen, Foster et al. 2005), an anti-anginal drug that has been used for more than a century. Individuals lacking ALDH2 activity (e.g., those possessing the ALDH2*2 polymorphism) retain some responsiveness to nitroglycerin, suggesting the existence of alternate, ALDH2-independent pathways of activation (Zhang, Chen et al. 2007).

Nitroglycerin acts as a potent inhibitor of ALDH2. Through such an action, nitroglycerin inhibits its own bioactivation (leading to its diminished efficacy with continued administration, a process called tolerance) and can cause ALDH2 dysfunction (Beretta,

Sottler et al. 2008).


Given the range of substrates metabolized by ALDH1B1, it is important to understand mutations that could affect its activity. Early studies found ALDH1B1 to be polymorphic (Hsu and Chang 1991; Sherman, Dave et al. 1993). A search of current databases revealed three polymorphisms which are non-synonymous and present at a frequency of at least 1% - ALDH1B1*2 (Ala86Val), ALDH1B1*3 (Leu107Arg), and

ALDH1B1*5 (Met253Val). ALDH1B1*4 has been named, but is a silent (synonymous) mutation. ALDH polymorphisms can have significant pathophysiological sequelae. For example, a polymorphism of ALDH2, ALDH2*2, causes marked reductions in acetaldehyde metabolism and consequent flushing syndrome and ethanol avoidance in hetero- and homozygotes. Early studies in a limited number of subjects found no significant association between the ALDH1B1 genotypes and or alcohol aversion (Sherman, Dave et al. 1993; Sherman, Ward et al. 1994). A more recent, larger study examined associations between polymorphisms in several ALDHs (including

ALDH1B1) and alcoholism and cardiovascular risk factors including weekly alcohol intake, cholesterol, triglycerides, and systolic and diastolic blood pressure. Individuals with ALDH1B1*2 exhibited increased non-drinking behaviors (average of less than one drink per week), as well as increased systolic blood pressure (Husemoen, Fenger et al.

2008). No associations were found in ALDH1B1*3 individuals. The same research group performed a follow-up study that expanded the population used by Husemoen et al.

(2008) to include an additional 6,784 adults (Linneberg, Gonzalez-Quintela et al.). It examined associations between ALDH genotypes and a variety of behavioral and physiological factors including ethanol consumption behaviors and ethanol hypersensitivity reactions, such as itchy runny nose, sneezing, shortness of breath, rash,

itching or swelling. An increase in the number of alcohol hypersensitivity reactions was observed in ALDH1B1*2 individuals, suggesting increased acetaldehyde toxicity, which is consistent with poorer metabolism of acetaldehyde by ALDH1B1. As was the case in the previous study, ALDH1B1*3 polymorphisms did not correlate with any change in epidemiological parameters.

Given the number of known and proposed roles for ALDH1B1 in vitro and in vivo, and the effects that have been shown in population association studies, it is important to understand the substrate specificity of ALDH1B1 and the impact polymorphisms have on the function of ALDH1B1. Computational modeling of the binding of substrates to ALDH1B1 can be used to predict the substrate specificity of

ALDH1B1 and the impact mutations may have. This modeling is facilitated and underpinned by an understanding of the well-studied and highly conserved catalytic mechanism of the aldehyde dehydrogenase activity of the ALDH superfamily. A catalytic cysteine (CYS319 in ALDH1B1) makes a nucleophilic attack on the carbonyl carbon of the aldehyde, removing a hydride ion which reduces NAD+ to NADH. A glutamate

(GLU285 in ALDH1B1) serves as a general base (or activates a water molecule to do so), attacking the carbonyl carbon with as a leaving group. The side chain amide nitrogen of an asparagine (ASN186 in ALDH1B1) and the peptide nitrogen of the catalytic cysteine stabilize the oxyanion in the thiochemical transition state and orient the thiohemiacetal for hydride transfer to NAD+ (Liu, Sun et al. 1997; Steinmetz, Xie et al.

1997).

In Chapter VI, computational methods are used to investigate the binding of known and previously unreported substrates to ALDH1B1. Using recombinant human


ALDH1B1, the enzyme kinetics of two additional substrates of ALDH1B1 are characterized, specifically nitroglycerin and all-trans retinaldehyde. Based on the results of the computational docking, the enzyme kinetics of another substrate, 4-HNE, were revisited. In addition, computational models of the polymorphic variants of ALDH1B1 were created, and the same substrates used in this study were docked against them in order to predict differences in substrate specificities that might arise from polymorphic variants. These polymorphic variants were expressed in vitro in order to confirm the results of in silico docking studies. Finally, the computational models of ALDH1B1 variants were probed in detail to provide a mechanism for the results seen in vitro.


CHAPTER II

UPDATE ON THE ALDEHYDE DEHYDROGENASE (ALDH) GENE

SUPERFAMILY IN VERTEBRATES

Summary

Members of the aldehyde dehydrogenase gene (ALDH) superfamily play an important role in the enzymatic detoxification of endogenous and exogenous aldehydes and in the formation of molecules that are important in cellular processes, like retinoic acid, betaine and γ-aminobutyric acid. ALDHs exhibit additional, non-enzymatic functions, including the capacity to bind to some hormones and other small molecules and to diminish the effects of ultraviolet irradiation in the cornea. Mutations in ALDH genes leading to defective aldehyde metabolism are the molecular basis of several diseases, including γ-hydroxybutyric aciduria, pyridoxine-dependent seizures,

Sjögren–Larsson syndrome and type II hyperprolinaemia. Interestingly, several ALDH enzymes appear to be markers for normal and cancer stem cells. The superfamily is evolutionarily ancient and is represented within Archaea, Eubacteria and Eukarya taxa.

Recent improvements in DNA and protein sequencing have led to the identification of many new ALDH family members. To date, the human genome contains 19 known

ALDH genes, as well as many pseudogenes. Whole-genome sequencing allows for comparison of the entire number of ALDH family members among organisms. This chapter provides an update of ALDH genes in several recently sequenced vertebrates and aims to clarify the associated records found in the National Center for Biotechnology


Information (NCBI) gene database. It also highlights where and when likely gene-duplication and gene-loss events have occurred. This information should be useful for future studies that might wish to compare the role of ALDH members among species and how the gene superfamily as a whole has changed throughout evolution.

Introduction

The aldehyde dehydrogenase gene (ALDH) superfamily is represented in all three taxonomic domains (Archaea, Eubacteria and Eukarya), suggesting a vital role throughout evolutionary history. Our understanding of the biological roles of this superfamily continues to expand in ways that are often unexpected and, perhaps, unprecedented for an enzyme family. As implied by their name, members of this superfamily serve to metabolize both physiologically- and pathophysiologically-relevant aldehydes. This capacity prevents the accumulation of toxic aldehydes derived from endogenous production and/or exogenous exposures, which, if left unchecked, adversely affect cellular homeostasis and organismal functions (Sophos, Pappa et al. 2001). ALDH activity is also required for the synthesis of vital biomolecules through the metabolism of aldehyde intermediates, such as retinoic acid, folate and betaine, to name a few (Vasiliou,

Pappa et al. 2000; Marchitti, Brocker et al. 2008; Sobreira, Marletaz et al. 2011). Whereas the ability of the ALDH family members to metabolize reactive aldehydes represents a major underlying cytoprotective mechanism, it is important to recognize that ALDHs demonstrate functions that extend beyond detoxification. Accumulating evidence supports roles for ALDHs in the modulation of cell proliferation, differentiation and survival, especially through participation in retinoic acid synthesis (Marchitti, Brocker et

al. 2008). Members of this superfamily also exhibit functions that appear to be independent of their enzyme activity, including absorption of ultraviolet (UV) irradiation in the cornea by acting as a crystallin and binding to hormones and other small molecules, including androgens, cholesterol, thyroid hormone and acetaminophen (Estey,

Cantore et al. 2007; Estey, Piatigorsky et al. 2007; Marchitti, Brocker et al. 2008).

Sequencing of the human genome and subsequent identification of mutations in

ALDH genes associated with loss of ALDH enzyme activity have led to the identification of many disease associations, such as cataracts (ALDH1A1, ALDH3A1, ALDH18A1), seizures (ALDH7A1), hyperprolinaemia (ALDH4A1), heart disease (ALDH2), alcohol sensitivity (ALDH1A1, ALDH1B1, ALDH2), certain cancers (ALDH2) and a broad array of other metabolic and developmental abnormalities (Marchitti, Brocker et al.

2008). Recently, a role for ALDHs in normal and cancer stem cells has also been identified. For example, ALDH1A1 is differentially expressed in human haematopoietic stem cells (HSCs) and can be used as a stem cell marker for multiple cancers (Marchitti,

Brocker et al. 2008). Similarly, ALDH1B1 is primarily expressed in stem cells in the normal colon and is strongly upregulated in human colonic adenocarcinomas (Stagos,

Chen et al. 2010; Chen, Orlicky et al. 2011). As described by Nelson and colleagues

(Nelson, Zeldin et al. 2004), genomic gene artifact identification becomes very important when using genotyping techniques to identify disease-causing alleles. Gene-duplication events, leading to multiple functional and/or non-functional genetic copies in the genome, can significantly complicate polymerase chain reaction (PCR)-based genotyping assays.

Transgenic animal models have permitted the exploration of the functions of ALDHs under in vivo physiological and pathophysiological conditions (Marchitti, Brocker et al.


2008). These invaluable studies are heavily dependent upon our understanding of the mouse and human genomes. In addition to mutations in ALDH genes within populations, there is a large variation in the number of ALDH genes between species.

During the past decade, the availability of gene and protein information has grown rapidly, primarily due to advances in gene-sequencing technologies. In the 2002 update of ALDH superfamily members (Sophos and Vasiliou 2003), 555 ALDH genes were listed, including 32 from Archaea, 351 from Eubacteria and 172 from Eukarya.

Characteristic ALDH motifs were searched in 74 genomes: 16 in Archaea, 51 in

Eubacteria and 7 in Eukarya. A recent download from the current Pfam database (build version 24.0) includes 16,765 ALDH entries (listed as aldedh in the Pfam database)

(Finn, Mistry et al. 2010). This update focuses on 11 representative vertebrate species in which the full genome has been sequenced: five primates, the cow, two rodents, two birds and one fish. Many of these genomes have been annotated automatically; generous algorithms list pseudogenes as protein-coding genes. This update attempts to describe the

ALDH number within these organisms and identify pseudogenes and gene-duplication events, when possible.

This work allows several hypotheses to be tested: 1) ALDH genes are broadly similar throughout vertebrates, and no new gene families will be discovered; 2) some genes will be ‘universal’, indicating critical and broadly applicable metabolic functions present since ancient life; and 3) other genes will have more ‘lineage-specific’ functions, indicated by patterns of duplication and loss across different lineages and species.


Methods

Fully sequenced genomes from 11 representative species: primates (human, Homo sapiens; common chimpanzee, Pan troglodytes; common marmoset, Callithrix jacchus;

Sumatran orangutan, Pongo abelii; Rhesus macaque, Macaca mulatta), the cow (Bos taurus), rodents (mouse, Mus musculus; rat, Rattus norvegicus), birds (zebra finch,

Taeniopygia guttata; domestic chicken, Gallus gallus) and one fish (zebrafish, Danio rerio) were analyzed.

ALDH genes were retrieved from Entrez Gene (Sayers, Barrett et al. 2010) using the terms ‘ALDH’ or ‘aldehyde dehydrogenase’. Peptide sequences for each ALDH gene were retrieved from Entrez Protein (Sayers, Barrett et al. 2010) and aligned against a reference list of ALDH family members, including known human ALDHs and sequences from the NCBI’s HomoloGene (Sayers, Barrett et al. 2010) using ClustalW (Larkin,

Blackshields et al. 2007). To be included for description, a gene record was required to meet three criteria: 1) the protein product of the gene must be ‘full-length’ (i.e. excludes known fragments and partial records); 2) the gene must have a known unique chromosomal location on the annotated genome; and 3) the gene must be listed as protein-coding (i.e. excludes known pseudogenes).
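The three inclusion criteria above can be sketched as a simple filter. This is only an illustration under stated assumptions: the `GeneRecord` class, its field names and the example records (other than the cow ALDH1A3 entry on Chr 21) are hypothetical and do not reflect the NCBI record schema or the actual curation workflow, which was performed manually.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeneRecord:
    """Hypothetical, simplified view of one gene record (not an NCBI schema)."""
    symbol: str
    is_full_length: bool        # criterion 1: excludes fragments and partial records
    chromosome: Optional[str]   # criterion 2: None if no unique genome location
    gene_type: str              # criterion 3: e.g. "protein-coding" or "pseudo"


def meets_inclusion_criteria(record: GeneRecord) -> bool:
    """Return True only if all three criteria described in the text hold."""
    return (
        record.is_full_length
        and record.chromosome is not None
        and record.gene_type == "protein-coding"
    )


# Illustrative records; only the first would be retained.
records = [
    GeneRecord("ALDH1A3", True, "21", "protein-coding"),
    GeneRecord("example-fragment", False, "28", "protein-coding"),
    GeneRecord("example-unplaced", True, None, "protein-coding"),
    GeneRecord("example-pseudogene", True, "14", "pseudo"),
]
included = [r.symbol for r in records if meets_inclusion_criteria(r)]
print(included)
```

Each criterion is evaluated independently, so a record failing any single test is excluded regardless of the others.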

Parent genes were designated based on highest homology to the known human protein. Identified gene duplications were sequentially named according to nomenclature guidelines, based on decreasing sequence identity to the parent gene. Duplicated genes were further analyzed to determine if they represented potentially new protein-coding genes or non-functional pseudogenes. Pseudogenes were identified according to criteria outlined previously (Nelson, Zeldin et al. 2004) and assigned to the following categories:

detritus pseudogenes (those which are fragments missing exons) and reverse-transcriptase events (those which resemble mRNA sequences and lack introns). If data suggested that a duplicated gene was protein coding, it was considered to be a new gene family member and named according to the previously established ALDH nomenclature system

(Vasiliou, Bairoch et al. 1999). Zebrafish aldh genes were named according to the guidelines set out by the zebrafish nomenclature committee (http://www.zfin.org)

(Mullins 1995). Pseudogenes in rodent (or fish) and non-rodent/non-fish genomes were appended with the suffix ‘p’ or ‘P’, respectively, and followed by a number designating multiple pseudogenes for a given gene family within each individual species.
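The pseudogene suffix rule just described can be illustrated with a short sketch. The function name and its parameters are hypothetical helpers introduced for illustration; the only rule encoded is the one stated above (lower-case ‘p’ for rodent or fish genomes, upper-case ‘P’ otherwise, followed by a sequential number).

```python
def pseudogene_symbol(parent_symbol: str, is_rodent_or_fish: bool, index: int) -> str:
    """Append the pseudogene suffix and a sequential number to a parent symbol.

    parent_symbol is used as given; species-specific capitalisation of the
    parent name is assumed to have been applied already.
    """
    if index < 1:
        raise ValueError("pseudogene numbering starts at 1")
    suffix = "p" if is_rodent_or_fish else "P"
    return f"{parent_symbol}{suffix}{index}"


# Symbols of this form appear in this chapter.
print(pseudogene_symbol("ALDH7A1", is_rodent_or_fish=False, index=5))
print(pseudogene_symbol("aldh3a2", is_rodent_or_fish=True, index=1))
```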

It is, again, important to underscore that this initial analysis should be considered preliminary and subject to change as experimental evidence sheds light on actual protein function. Alignment and clustering of protein sequences were used as a basis for assigning homology. Sequences were aligned, and dendrograms based on neighbor-joining distances were created using a ClustalW webserver at http://align.genome.jp.

Percentage amino acid (AA) identities were determined using the Needle webserver at

(http://www.ebi.ac.uk/Tools/emboss/align/) (Needleman and Wunsch 1970).
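As a rough illustration of how a percentage identity is derived from a global (Needleman–Wunsch) alignment, the sketch below aligns two short peptide strings and counts identical columns. It is a deliberately simplified stand-in and not the Needle webserver's method: it assumes a flat match/mismatch score and a linear gap penalty, whereas Needle uses a substitution matrix and affine gap penalties. The function names, scoring values and example sequences are all assumptions made for illustration.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Globally align two sequences; returns the two gapped strings."""
    n, m = len(a), len(b)
    # Dynamic-programming score matrix with gap-initialised borders.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical columns as a percentage of the full alignment length."""
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * identical / len(aligned_a)


aligned = needleman_wunsch("ACDEFG", "ACDFG")   # toy peptides, not real ALDH sequences
print(aligned, round(percent_identity(*aligned), 1))
```

Because gap columns remain in the denominator, a truncated copy of a protein scores well below 100 per cent identity even when every paired residue matches.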

To assess whether protein sequences were actively transcribed, several methods were employed. Numerous promoter-prediction programs were used, but none was sufficiently consistent across species or discriminatory to be useful in the prediction of pseudogenes. The ratio of nonsynonymous to synonymous nucleotide substitution rates

(Ka/Ks) was used as a measure of selective pressure on each individual gene. Rates were calculated using homologous genes for all species in the current analysis, in order to determine ancestral states using the Bergen Center Ka/Ks Calculation Tool


(http://services.cbu.uib.no/tools/kaks/) and default values, with the exception that the tree method was set to maximum likelihood (Liberles 2001).

Copy number variants (CNV; defined here as gains and losses of DNA sequences >1 kilobase [kb]), insertions and deletions (InDels; gains and losses of DNA sequences of 100–999 base pairs [bp]), and inversions in human ALDH genes were retrieved from the Database of Genomic Variants (Zhang, Feuk et al. 2006).
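The size-based definitions above can be expressed as a small classifier. This is a minimal sketch: the function name and the "unclassified" label are assumptions, and the thresholds are applied literally as written (so a change of exactly 1,000 bp, or one shorter than 100 bp, falls into neither category). Inversions are defined by orientation rather than size and are therefore not handled here.

```python
def classify_gain_or_loss(length_bp: int) -> str:
    """Classify a gain or loss of DNA sequence by its length in base pairs."""
    if length_bp <= 0:
        raise ValueError("length must be a positive number of base pairs")
    if length_bp > 1000:            # >1 kb
        return "CNV"
    if 100 <= length_bp <= 999:     # 100-999 bp
        return "InDel"
    return "unclassified"           # outside both definitions as stated


for size in (250, 999, 1000, 5000):
    print(size, classify_gain_or_loss(size))
```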

Results

Records for ALDH genes were retrieved and sorted for all 11 species analyzed

(Table 2.1). The number of records that met the above-mentioned criteria is provided (i.e. the number of genes excluding nonfunctional pseudogenes). The number of ALDH genes per species varied from 14 in chicken to 25 in zebrafish. There are currently 207 distinct genes present within the database for these 11 species; this is a greater than four-fold increase over the 51 annotated in 2002 (Sophos and Vasiliou 2003). This allows for a much more comprehensive comparison of ALDH superfamily members throughout vertebrate evolution during the past 450 million years. It is important to keep in mind that, for many species, some genes have yet to be identified. Further, many annotated genes may reflect gene-duplication events that represent non-functional pseudogenes.

These situations will be explored in greater depth below.
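The totals quoted above can be checked directly against the per-species counts in Table 2.1. The sketch below simply transcribes those counts; the variable names are illustrative.

```python
# ALDH gene counts per species, transcribed from Table 2.1.
aldh_gene_counts = {
    "Homo sapiens": 19,
    "Pan troglodytes": 18,
    "Callithrix jacchus": 16,
    "Pongo abelii": 18,
    "Macaca mulatta": 20,
    "Bos taurus": 20,
    "Rattus norvegicus": 21,
    "Mus musculus": 21,
    "Taeniopygia guttata": 15,
    "Gallus gallus": 14,
    "Danio rerio": 25,
}
GENES_ANNOTATED_2002 = 51  # figure quoted in the text (Sophos and Vasiliou 2003)

total = sum(aldh_gene_counts.values())
fold_increase = total / GENES_ANNOTATED_2002
fewest = min(aldh_gene_counts, key=aldh_gene_counts.get)
most = max(aldh_gene_counts, key=aldh_gene_counts.get)

print(f"total = {total}, fold increase = {fold_increase:.2f}")
print(f"fewest: {fewest} ({aldh_gene_counts[fewest]}); most: {most} ({aldh_gene_counts[most]})")
```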

The total number of human annotations has remained unchanged since 2005, with

19 functional protein-coding genes (Vasiliou and Nebert 2005). The chimpanzee and the orangutan genomes diverged from humans ~5 and ~14 million years ago (MYA), respectively (Goodman, Porter et al. 1998; Hedges 2002). Both the chimpanzee and

orangutan genomes contain 18 ALDH genes, each corresponding to a known human orthologue. The macaque and common marmoset genomes are more distantly related.

They diverged ~25 and 35–40 MYA (Goodman 1999) and contain 20 and 16 ALDH members, respectively. Orthologues for all 19 human genes were identified in mouse and rat. In addition, rodent genomes contain an Aldh1a1 paralogue (Aldh1a7) and an Aldh3b2 gene duplication, resulting in a total of 21 Aldh genes. The most recent common ancestor of humans and rodents lived 75–90 MYA.

The cow genome, which diverged from that of the human 80–100 MYA, has 20 annotated ALDH entries which, again, closely parallel human members. Variations include two gene duplications and one possible deletion. Both avian genomes currently lack orthologous entries for ALDH1B1, ALDH1L1, ALDH3A1, ALDH3B2 and ALDH16A1. Moreover, the zebra finch genome is also missing annotated sequences for ALDH18A1 and includes two apparent gene duplications.

Table 2.2 summarizes these ALDH orthologues, their chromosomal locations and the associated NCBI Entrez gene identification (ID) number for each of the 11 species.

For zebrafish, Entrez gene ID 100334142 was listed as ‘aldehyde dehydrogenase 1A1-like [D. rerio]’. This gene record appears to be derived from an unplaced chromosomal fragment, however, because no genome location could be determined. In addition, alignment of the peptide sequence for this gene ID to other mammalian ALDH1A1 protein sequences was poor. Specifically, sequence homology with human, mouse and rat

ALDH1A1 was only 26.2 per cent, 26.4 per cent and 26.8 per cent, respectively. NCBI

BlastP analysis indicated that it most closely resembles bacterial ALDH proteins.

Together, this evidence suggests that this record may represent bacterial contamination, rather than a true zebrafish gene; thus, this gene was not included. This also makes the zebrafish the only species among the 11 analyzed that lacks a record for ALDH1A1.

Table 2.1. Total number of aldehyde dehydrogenase (ALDH) NCBI gene records identified within each species’ genome.

Latin name             Common name           # ALDH genes
Homo sapiens           Human                 19
Pan troglodytes        Common Chimpanzee     18
Callithrix jacchus     Common Marmoset       16
Pongo abelii           Sumatran Orangutan    18
Macaca mulatta         Rhesus Monkey         20
Bos taurus             Cow                   20
Rattus norvegicus      Norway rat            21
Mus musculus           House mouse           21
Taeniopygia guttata    Zebra finch           15
Gallus gallus          Chicken               14
Danio rerio            Zebrafish             25

Interestingly, a protein blast (blastp) search using human ALDH1A1 and limiting results to fish species only (NCBI taxid: 7898) revealed ALDH1A2 homologues in multiple species (including salmon, pufferfish, ricefish and bichir), but no records for ALDH1A1 in any fish species. This is consistent with previous findings that indicate that ALDH1A1 is not present in the teleost lineage (Pittlik, Domingues et al. 2008).

Evidence for several gene duplications was found. Table 2.3 lists all genes that show duplications, compared with genes in the human genome. This table provides a summary of existing information available within the NCBI gene entries, as well as recommended gene names based on our analyses and current nomenclature guidelines.

Table 2.4 lists additional information related to peptide sequences and calculated sequence identities. Additional genes (increase in gene number, compared with humans)

show peptide divergence of as little as 0.4 per cent (zebrafish aldh2.2 and aldh2.3) and as much as 35.1 per cent (zebrafish aldh3a2.1 and aldh3a2.2, which share 64.9 per cent AA identity). In most cases, gene duplications have similar sizes, are often nearby on the same chromosome (Chr) and show some degree of divergence (i.e., 70–95 per cent AA identity). Genes that have portions of the gene copied with no AA divergence include: cow ALDH1A3P1 (127 of

537 AAs), zebrafish aldh5a1.2 (404 of 514 AAs) and zebrafish aldh18a1.2 (782 of 782

AAs). Zebrafish aldh3a2.3 (169 of 514 AAs) represents a shortened copy which shows minor divergence (98.4 per cent identity). Ka/Ks ratios were calculated for all gene duplications. A value of <1.0 indicates selective pressure to conserve the gene and suggests that it plays a functional role. All duplications were found to have a score of <1.0, except macaque ALDH7A1P5 (to be discussed below).

ALDH1: ALDH1A1 is present in all species except zebrafish, confirming earlier studies (Pittlik, Domingues et al. 2008). In cow, there are two distinct records for

ALDH1A3: the gene found on Chr 21 is full length (537 AAs) and represents the putatively functional parent gene (ALDH1A3), whereas the second is a detritus pseudogene on Chr 28 which appears to be the product of a partial gene duplication event

(ALDH1A3P1). The shorter genomic sequence would translate a peptide sharing 100 per cent sequence identity to only the 127 carboxy-terminal AAs of the full-length parent protein. Several gene duplications appear to have been conserved in rodents. One such gene is ALDH1A7, found in rats and mice. In both cases, the ALDH1A7 gene is present on the same chromosome and in close proximity to ALDH1A1. Mouse ALDH1A7 shares

92 per cent AA identity with mouse ALDH1A1, and studies have confirmed that the gene encodes inducible tissue-specific mRNA (Alnouti and Klaassen 2008). ALDH1B1 is

present in mammals but missing from birds and fish. ALDH1L1 is missing from both bird species (zebra finch and chicken) but present in other species examined and thus may represent a deletion in the avian lineage.

ALDH2: ALDH2 appears to be one of many genes duplicated in zebrafish. It has been suggested that an entire genome duplication event may have occurred after the divergence of teleosts and mammals (Woods, Kelly et al. 2000); this may explain the increased ALDH gene number in zebrafish. A second gene-duplication event appears to have occurred, giving rise to three zebrafish aldh2 gene records (aldh2.1-3). The aldh2.1 gene is believed to be the parent, based on homology with orthologous ALDH2 protein sequences. Both aldh2.2 and aldh2.3 potentially encode full-length peptides. Aldh2.2 is

95.2 per cent similar to Aldh2.1 and aldh2.3 may represent a more evolutionarily recent duplication of aldh2.2, as evidenced by 99.6 per cent AA identity between the Aldh2.2 and Aldh2.3 proteins.

ALDH3: The ALDH3 genes show the most variation in gene number of any

ALDH family among the organisms studied. ALDH3A1 is missing from birds and fish but is present in every mammalian genome analyzed in this study. The zebra finch has a duplicate ALDH3A2 (ALDH3A3) entry which encodes a full-length peptide that shares

84.1 per cent identity with the parent protein. Four ALDH3A2 homologues were identified within the zebrafish genome. The aldh3a2.1 is considered the parent gene. The aldh3a2.2 and aldh3a2.3 full-length gene products, respectively, share 64.9 per cent and

70.9 per cent sequence identity with that of Aldh3a2.1 and 64.9 per cent identity with each other. Zebrafish aldh3a2p1 represents a partial gene duplication; the resulting 169

AA peptide would most likely undergo proteolytic degradation if translated.


ALDH3B1 is duplicated in cow and zebra finch, as well as in zebrafish, on the proviso that D. rerio aldh3d1 is also considered an ALDH3B1 homologue. Zebrafish

Aldh3d1 shares 44 per cent AA identity with Aldh3b1 and is listed in NCBI

HomoloGene as a homologue of ALDH3B1 (HomoloGene, data not shown) (Sayers,

Barrett et al. 2010). Zebra finch ALDH3B5 encodes a 341 AA peptide that shares 100 per cent sequence identity with the 228 amino-terminal AAs of the parent gene’s protein.

Cow and zebra finch ALDH3B4 and ALDH3B5 proteins share 80.9 per cent and 53.2 per cent sequence identity with their respective parent genes, and 39.7 per cent with one another, indicating that none of the genes is an orthologue. Zebra finch ALDH3B5 is shorter than ALDH3B1 (341 versus 450 AAs) and, without this sequence gap, they share

93.1 per cent AA identity; it is unknown whether this smaller gene product is functional.

ALDH3B2 is present as a single distinct gene in human, chimpanzee and macaque, whereas two copies occur in mouse and rat. ALDH3B2 is absent from common marmoset, cow, zebra finch, chicken and zebrafish. Mouse and rat ALDH3B3 share 86.4 per cent and 76.9 per cent AA identity, respectively, with the corresponding parent

ALDH3B2 proteins and 83.4 per cent identity with each other. The two ALDH3B3 genes are found on corresponding syntenic within their respective genomes.

Presently, the protein product of Entrez Gene ID 688778 (R. norvegicus) is annotated as

‘ALDH3B1 (predicted)’. Based on a phylogenetic clustering of ALDH3B1 and

ALDH3B2 protein sequences (Figure 2.1), however, I believe it is better to name this protein ALDH3B3. The clustering shows that both mouse and rat ALDH3B3 proteins are in the ALDH3B2 clade and are more similar to each other than to rodent or human ALDH3B2 proteins. The alignment used for phylogenetic clustering can be seen in Figure 2.2.


Table 2.2. ALDH genes and duplicated genes across species with respective chromosome (Chr) locations. Numbers in parentheses indicate NCBI Entrez gene ID (GI). Records in bold text denote duplications compared with the human genome. Z, sex Chr in birds (ZW system); cM, centiMorgans. Letter designations in mouse gene locations indicate chromosomal regions. Genes in bold indicate newly discovered duplicated genes.

Table 2.3. List of the Entrez Gene ID (GI), chromosome location, presence of introns, gene type and recommended gene name of all ALDH genes in this study that show evidence of gene duplication, compared with that in the human genome. *Zebrafish genes are named according to separate nomenclature conventions set by the zfin committee.

Table 2.4. Tabulation of all ALDH genes in this study that show evidence of gene duplication, compared with that in the human genome. Included are recommended protein names, protein RefSeq IDs, protein lengths (in number of amino acids [AAs]) and Ka/Ks values. ‘% AA identity’ denotes the absolute number of identical AAs relative to the absolute number of AA locations. ‘% AA unaligned’ indicates the percentage of AA locations that are represented by a gap in the alignment of either sequence, or by an overhang if one sequence is longer than the other. ‘% AA identity (unaligned excluded)’ indicates the percentage of identical AAs when unaligned AA locations are excluded from the total number of AA locations. For example, a 127-AA fragment of a 537-AA protein, which is identical except for the truncation, would have 127/537 = 23.6 per cent identity, with 410/537 = 76.4 per cent of locations represented by unaligned residues (AAs in the longer sequence that have no correlation with the shorter sequence) but, excluding those residues, 127/127 = 100 per cent of paired AAs are identical. The final column indicates which sequences are being compared for percentage identity, percentage gaps and percentage identity (excluding gaps).
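The three identity measures defined in the Table 2.4 legend can be reproduced with a few lines of arithmetic. The sketch below works through the legend's own example of a 127-AA fragment of a 537-AA protein; the function and argument names are illustrative assumptions rather than part of any tool used in this study.

```python
def identity_metrics(identical: int, paired: int, total_locations: int) -> dict:
    """Compute the three percentages described in the Table 2.4 legend.

    identical       -- number of AA locations where both sequences match
    paired          -- number of AA locations where both sequences have a residue
    total_locations -- paired locations plus gap/overhang (unaligned) locations
    """
    unaligned = total_locations - paired
    return {
        "% AA identity": 100.0 * identical / total_locations,
        "% AA unaligned": 100.0 * unaligned / total_locations,
        "% AA identity (unaligned excluded)": 100.0 * identical / paired,
    }


# Worked example from the legend: a 127-AA fragment identical to part of a 537-AA protein.
metrics = identity_metrics(identical=127, paired=127, total_locations=537)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}")
```

Reporting the unaligned fraction separately distinguishes a truncated but otherwise identical copy from a full-length copy that has genuinely diverged.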

ALDH4: ALDH4A1 is missing from chimpanzee and common marmoset but is present in all others. Previously, rat ALDH4A1 had been conspicuously absent from the major databanks but it was recently added. During a BLAST search of the rat genome using various individual exon segments from mouse Aldh4a1, significant hits for

Aldh4a1 in the rat genome were identified on Chr 5q36 and it was determined to be a part of the fusion gene LRRP Ba1-651 (Tizzano and Sbarbati 2007). Figure 2.3 shows an assembled structure of this fusion gene with the Aldh4a1 exons highlighted in red.

Although it appears that these exons are transcribed and contain the conserved ALDH catalytic domain, it is not clear whether the gene product retains aldehyde dehydrogenase activity.

ALDH5 and beyond: ALDH5A1 is missing in marmoset and duplicated in zebrafish. The zebrafish duplication, aldh5a1.2, encodes a slightly truncated peptide (404 versus 514 AAs) which shares 100 per cent AA identity with the first 426 AAs and resides on the same Chr as aldh5a1.1.

ALDH7A1 is duplicated in the macaque. The ALDH7A1P5 duplication is located on Chr 14 and contains the complete ALDH7A1 coding sequence; however, the sequence lacks any intronic regions, suggesting a reverse transcriptase-mediated duplication event.

Furthermore, this gene has a Ka/Ks score of 1.289, indicating a lack of selective pressure to conserve this gene. This provides further evidence that ALDH7A1P5 does not code for a functional protein.
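The interpretation of a Ka/Ks ratio can be summarised in a short sketch. Ratios well below 1 are conventionally read as purifying selection (the sequence is being conserved), whereas ratios near or above 1 suggest relaxed constraint or positive selection. The function names, the wording of the labels and the tolerance band around 1 are assumptions chosen for illustration; they are not parameters of the Ka/Ks calculation tool used here, and the Ka and Ks inputs in the example are hypothetical.

```python
def ka_ks_ratio(ka: float, ks: float) -> float:
    """Ratio of nonsynonymous (Ka) to synonymous (Ks) substitution rates."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    return ka / ks


def interpret_ka_ks(ratio: float, tolerance: float = 0.05) -> str:
    """Coarse reading of a Ka/Ks ratio; the tolerance band is an arbitrary choice."""
    if ratio < 1 - tolerance:
        return "purifying selection (sequence conserved)"
    if ratio > 1 + tolerance:
        return "relaxed constraint or positive selection"
    return "approximately neutral"


# 1.289 is the score reported above for macaque ALDH7A1P5.
print(interpret_ka_ks(1.289))
# Hypothetical Ka and Ks values for a conserved duplicate.
print(interpret_ka_ks(ka_ks_ratio(ka=0.02, ks=0.20)))
```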

Figure 2.1. Neighbor-joining dendrogram (with branch lengths representing relative protein sequence similarity) of ALDH3B sequences in human, rat and mouse, indicating the likely homology and identity of the genes assigned ‘Aldh3b3’ in rat and mouse.

Figure 2.2. Alignment of ALDH3B2 genes in human, rat and mouse created by ClustalW. Dashes (–) represent sequence gaps, asterisks (*) represent identical amino acids (AAs), colons (:) represent very similar AAs, periods (.) represent less similar AAs, whereas spaces ( ) represent dissimilar AAs.

In zebrafish, aldh9a1 has three copies. The parent gene aldh9a1.1 and aldh9a1.3 reside on Chr 8; aldh9a1.2 is found on Chr 2. Both aldh9a1.2 and aldh9a1.3 encode putative full-length proteins which respectively share 71.2 per cent and 94.9 per cent AA identity with Aldh9a1.1 and 70.3 per cent sequence identity with each other. Zebrafish also contains a duplication of aldh18a1. The aldh18a1.2 is found on the same chromosome and encodes a protein that is 100 per cent identical with that of the parent gene.

The naming of zebrafish genes required further genomic analyses in order to determine whether duplications originated from the ray-finned lineage whole-genome duplication event. Many of the duplicated genes reside within close proximity on the same chromosome, suggesting that they are segmental duplications that resulted from misguided recombination processes during meiosis and not a product of the whole genome duplication that took place within the ray-fin lineage (Nelson 2009). These include the aldh2, aldh5a1 and aldh18a1 paralogues, which are located in close proximity on Chr 5, 16 and 12, respectively. It also includes aldh3a2.1 and aldh3a2.2, located on Chr 15, as well as aldh9a1.1 and aldh9a1.3, found on Chr 8. The gene architecture surrounding aldh3a2.3 on Chr 21 does not support a duplicated chromosome, in that the region lacks other duplicated genes from Chr 15. Furthermore, studies looking at zebrafish gene duplications found that a high frequency of genes found on Chr 21 are duplicated on Chr 5 and none were identified on Chr 15, suggesting that Chr 5, rather than Chr 15, is the paralogous chromosome (Taylor, Braasch et al. 2003; Woods, Wilson et al. 2005). A similar situation was identified with respect to aldh9a1.2 on Chr 2.

Uridine–cytidine kinase-2 homologues (uck2a and uck2b) are found upstream of both aldh9a1.1 and aldh9a1.2, supporting a tandem gene duplication event; however, other genes in close proximity to this duplication do not show any homology between chromosomes 2 and 8.


Figure 2.3. Comparison of ALDH4A1 from human and rat. Rat Aldh4a1 is part of the larger fusion gene LRRP Ba1-651. The exons representing the Aldh4a1 portion of this gene with homology to mouse and human are highlighted in red.


Alternatively spliced transcriptional variants and CNVs of human ALDH genes:

In addition to the increase in ALDH identification through genomic sequencing, other sources of complexity in the ALDH superfamily are being studied. Transcript sequencing has revealed that many ALDH genes encode multiple mRNA splice variants (for a review of human ALDH splice variants, see Black, Stagos et al. 2009). Besides splice variants, CNVs have been reported for human ALDH genes. A query of the Database of Genomic Variants detected 35 CNVs, 28 InDels and one inversion in the ALDH family, although these records are usually representative of one or several individuals (Table 2.5). Of these 64 events, 33 were confined entirely to intronic regions and may be silent. Others are likely to cause loss of function of the enzyme involved, including loss of the whole gene (11 events; occurred in ALDHs 1A3, 1B1, 3A1, 3B1, 5A1 and 16A1) or duplication, loss or inversion of exons within the coding sequence (16 events; occurred in ALDHs 1A3, 1L1, 1L2, 3A2, 3B2, 6A1 and 9A1). Finally, in a few cases, a region containing the entire gene and surrounding region was duplicated (four events; occurred in ALDH3B1 and ALDH3B2).
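The event counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below only re-adds the figures given in the text; the dictionary names and category labels are illustrative and are not part of the original analysis.

```python
# Event counts by variant type, as quoted in the text.
by_type = {"CNV": 35, "InDel": 28, "inversion": 1}

# The same events grouped by the region of the gene affected, as quoted in the text.
by_site = {
    "intronic only": 33,
    "whole-gene loss": 11,
    "exon duplication, loss or inversion": 16,
    "whole-gene duplication": 4,
}

total_by_type = sum(by_type.values())
total_by_site = sum(by_site.values())

# Both groupings should describe the same set of events.
print(total_by_type, total_by_site, total_by_type == total_by_site)
```

Both groupings sum to the 64 events reported, so the two breakdowns are mutually consistent.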

Discussion:

The ALDH superfamily shows considerable diversity among vertebrate genomes, with species in the current study showing between 14 and 25 putatively protein-encoding genes. Many of the gene duplications discussed here probably encode functional proteins.

There are also a number of duplication events that give rise to non-functional pseudogenes. Names were assigned to the ‘new genes’ and ‘pseudogenes’ (Table 2.3) according to the ALDH nomenclature system established in 1999 (Vasiliou, Bairoch et al. 1999). The species-specific nomenclature system was used for zebrafish genes (Mullins 1995). Pseudogenes were also named according to the standardized protocol (Hedges 2002).

Table 2.5. Known copy number variations in humans. Included are the variation ID from the Database of Genomic Variants, ALDH family member, type (CNV: copy number variation with changes > 1 kb; InDel: insertions and deletions with changes of 100–999 bp; Inv: inversions with changes that invert the nucleotide sequence), whether the change was a loss or gain, site (Intron: change only affects an intronic region; Part: change affects one or more exons; Whole: change affects the entire gene), sample size (variant / controls) and chromosomal location.

Variation ID  ALDH  Type   Gain/Loss  Site    Sample size  Chr
26310         16A1  InDel  Gain       Intron  1/1          19q13.33
26311         16A1  InDel  Gain       Intron  1/1          19q13.33
26312         16A1  InDel  Gain       Intron  1/1          19q13.33
26313         16A1  InDel  Loss       Intron  1/1          19q13.33
109892        1A1   InDel  Gain       Intron  1/1          9q21.13
102109        1A2   CNV    Loss       Intron  1/1          15q22.1
25534         1A2   InDel  Loss       Intron  1/1          15q22.1
40101         1A2   InDel  Loss       Intron  1/1          15q22.1
41386         1A2   InDel  Loss       Intron  1/1          15q22.1
45349         1A2   InDel  Loss       Intron  1/1          15q22.1
45350         1A2   InDel  Loss       Intron  1/1          15q22.1
102186        1A3   CNV    Loss       Intron  1/1          15q26.3
11819         1A3   InDel  Loss       Intron  1/36         15q26.3
25599         1A3   InDel  Loss       Intron  1/1          15q26.3
25600         1A3   InDel  Loss       Intron  1/1          15q26.3
25601         1A3   InDel  Loss       Intron  1/1          15q26.3
40124         1A3   InDel  Loss       Intron  1/2          15q26.3
42429         1A3   InDel  Loss       Intron  1/1          15q26.3
42898         1A3   InDel  Loss       Intron  1/1          15q26.3
45395         1A3   InDel  Loss       Intron  1/1          15q26.3
61482         1A3   InDel  Loss       Intron  1/1          15q26.3
68446         1L1   InDel  Loss       Intron  1/39         3q21.2
106822        1L2   CNV    Gain       Intron  1/1          12q23.3
42760         3A2   InDel  Loss       Intron  1/1          17p11.2
24787         3B2   InDel  Loss       Intron  1/1          11q13.2
44926         3B2   InDel  Loss       Intron  1/1          11q13.2
81276         5A1   InDel  Gain       Intron  1/90         6p22.2
93550         5A1   CNV    Loss       Intron  2/90         6p22.2
99466         5A1   CNV    Loss       Intron  1/1          6p22.2
33982         7A1   InDel  Gain       Intron  1/1          5q23.2
97538         9A1   InDel  Gain       Intron  1/1          1q24.1
23991         9A1   InDel  Gain       Intron  1/1          1q24.1
11004         9A1   InDel  Loss       Intron  15/50        1q24.1
35661         16A1  CNV    Gain       Part    1/1          19q13.33
114045        1A3   CNV    Gain       Part    1/30         15q26.3
72379         1A3   CNV    Loss       Part    1/39         15q26.3
4352          1L1   CNV    2G 1L      Part    3/95         3q21.2
59786         1L1   Inv    Inversion  Part    1/1          3q21.2
68445         1L1   CNV    Loss       Part    1/39         3q21.2
107014        1L2   CNV    Loss       Part    1/1          12q23.3
88379         3A2   CNV    Loss       Part    1/90         17p11.2
88381         3A2   CNV    Loss       Part    1/90         17p11.2
3140          3A2   CNV    Loss       Part    4/270        17p11.2
65982         3B2   CNV    Gain       Part    2/450        11q13.2
85827         3B2   CNV    Loss       Part    2/90         11q13.2
53128         3B2   CNV    Loss       Part    2/1064       11q13.2
3055          6A1   CNV    Gain       Part    1/270        14q24.3
66668         6A1   CNV    Loss       Part    2/450        14q24.3
6793          9A1   CNV    Loss       Part    2/50         1q24.1
3856          3B1   CNV    Gain/loss  Whole   3/270        11q13.2
113072        3B1   CNV    Gain       Whole   1/30         11q13.2
30558         3B1   CNV    Gain       Whole   1/1          11q13.2
5275          3B2   CNV    Gain       Whole   1/272        11q13.1–11q13.2
5111          16A1  CNV    Loss       Whole   25/95        19q13.33
32261         16A1  CNV    Loss       Whole   18/30        19q13.32–19q13.33
5110          16A1  CNV    Loss       Whole   4/95         19q13.33
2201          1A3   CNV    Loss       Whole   3/269        15q26.3
47939         1B1   CNV    Loss       Whole   6/2906       9p13.1
30022         3A1   CNV    Loss       Whole   2/485        17p11.2
53160         3B1   CNV    Loss       Whole   2/1064       11q13.2
2931          3B1   CNV    Loss       Whole   8/270        11q13.2
29913         3B1   CNV    Loss       Whole   1/485        11q13.2
29914         3B1   CNV    Loss       Whole   1/485        11q13.2
47969         5A1   CNV    Loss       Whole   9/2906       6p22.2

In the cow genome, ALDH1A3P1 resembles the product of a partial gene duplication event. The coding region would be translated into a peptide sharing 100 per cent sequence identity with the 127 carboxy-terminal AAs of the full-length parent gene. Such a high degree of sequence identity is suggestive of a relatively recent evolutionary duplication. Even if the truncated gene does encode the 127 AA peptide, however, that peptide lacks many highly conserved residues required for ALDH activity. Thus, the truncated peptide would probably be targeted for rapid degradation. As such, this gene represents a nonfunctional pseudogene and has been named accordingly.

ALDH1B1 is present in mammals but missing from birds and fish. The high degree of AA sequence conservation between ALDH2 and ALDH1B1 suggests that the latter may be the product of a gene duplication event that occurred sometime after the avian–land animal split around 310 MYA. Future analyses should consider other species, including amphibians and reptiles, in order to verify and more accurately pinpoint this evolutionary event.

Analysis of the aldh2 gene duplications in zebrafish indicates that these represent protein-coding genes and not pseudogenes. As mentioned above, translation of either gene would result in a full-length peptide. The aldh2.2 gene would encode a product 95.2 per cent identical to that of the parent gene aldh2.1. At 95.2 per cent AA identity, aldh2.2 represents a new gene. The aldh2.3 homologue may represent a more evolutionarily recent duplication of aldh2.2, as evidenced by the ~99.6 per cent sequence identity noted. Therefore, aldh2.3 is likely to be the product of a duplication of aldh2.2. All three protein products include the conserved ALDH motifs and residues required for enzyme activity.

The ALDH3 family showed the greatest variability among species. ALDH3A1 facilitates cell cycle regulation and scavenging of reactive oxygen species, and acts as a corneal crystallin by filtering UV irradiation in the eye. ALDH3A1 is missing from birds and fish but is present in every mammalian genome analyzed in this study, suggesting that the gene evolved sometime after 310 MYA. ALDH3A1 is conserved among mammals and shows no apparent duplications. In some species, such as rabbit, it appears that ALDH1A1 is expressed as a corneal crystallin instead of ALDH3A1 (Stagos, Chen et al. 2010). Interestingly, zebrafish is the only species in this study that apparently lacks both ALDH3A1 and ALDH1A1. Studies have suggested that zebrafish use scinla (cytosolic gelsolin) as a corneal crystallin instead (Xu, Kantorow et al. 2000; Jia, Omelchenko et al. 2007; Greiling and Clark 2008).

Zebra finch ALDH3A3 encodes a full-length peptide that shares 84.1 per cent similarity with the ALDH3A2 parent gene. Zebrafish has three aldh3a2 duplications, which include two full-length genes (aldh3a2.2 and aldh3a2.3) and a significantly truncated partial duplication (aldh3a2p1). The degree of sequence identity that Aldh3a2.2 and Aldh3a2.3 share with the parent peptide (64.9 per cent and 70.9 per cent, respectively) suggests that they diverged sufficiently long ago to be considered new ALDH3A family members. They also share 64.9 per cent identity with each other and less than 60 per cent identity with zebra finch ALDH3A3, suggesting that all three genes are paralogues rather than orthologues. Zebra finch ALDH3A5 should also be considered a new functional ALDH family member. In addition, the zebrafish pseudogene aldh3a2p1, if translated, would share the highest degree of sequence identity with aldh3a2.3. Thus, the pseudogene most likely reflects a more recent partial duplication of this gene.

ALDH3B1 is duplicated in both cow and zebra finch. The cow ALDH3B4-encoded protein would be full length and share 85.4 per cent identity with ALDH3B1, suggesting that it is a new ALDH3B family member. Zebra finch ALDH3B5 shares an extremely high degree of homology with the amino-terminus of ALDH3B1. However, it lacks ~150 AAs that comprise the carboxy-terminus needed for enzyme oligomerisation. The truncated protein would still contain the conserved motifs required for ALDH activity. Until more experimental evidence becomes available, the ALDH3B5 gene should be considered as putatively functional.

The mouse and rat Aldh3b3 genes appear to represent new orthologous ALDH family members; the genes reside in syntenic chromosomal regions and share a high degree (83.4 per cent) of sequence identity with one another. The two proteins are more divergent than the rodent ALDH3B2 orthologues, which share 89.9 per cent sequence identity.

Aldh5a1 is another duplicated ALDH gene within the zebrafish genome. The duplication aldh5a1.2 resides on the same chromosome as the aldh5a1.1 parent gene, and the two share 100 per cent sequence identity. Aldh5a1.2 encodes a peptide containing an additional 22 amino-terminal and 88 carboxyterminal residues. It also shares greater sequence identity with the human ALDH5A1 orthologue than Aldh5a1.1 (65.5 per cent versus 51.4 per cent). This suggests that aldh5a1.2 might actually be the parent gene and aldh5a1.1 a slightly truncated version formed as the result of gene duplication.

As mentioned above, the macaque ALDH7A1P5 genomic sequence lacks intronic regions, suggesting that a reverse transcriptase-mediated event gave rise to this pseudogene (i.e. having no adjacent promoter or other regulatory sequences). Four additional ALDH7A1 pseudogenes have been identified on chromosomes 5q14 (ALDH7A1P1), 2q31 (ALDH7A1P2), 7q36 (ALDH7A1P3) and 10q21 (ALDH7A1P4) [ref 19]. Macaque ALDH7A1P5 is located on Chr 14, which is not syntenic with human Chr 11 and does not share common origins with any of the human pseudogenes. Therefore, the event that gave rise to ALDH7A1P5 must have taken place within the last 25 million years.

Three full-length ALDH9A1 homologues were identified in zebrafish. The Aldh9a1.2 peptide shares 71.2 per cent and 70.3 per cent identity with Aldh9a1.1 and Aldh9a1.3, respectively. Aldh9a1.3 is 94.9 per cent identical to the parent Aldh9a1.1 peptide, suggesting that this duplication was a relatively recent event when compared with the duplication that gave rise to Aldh9a1.2. Hence, aldh9a1.1, aldh9a1.2 and aldh9a1.3 represent three distinct protein-coding ALDH9 family members. The zebrafish genome also contains two copies of aldh18a1, which are found in very close proximity on Chr 12. Both genes are considered protein coding and would give rise to peptides of the same length which share 100 per cent sequence identity, suggesting a relatively recent duplication event.

ALDH gene-naming conventions dictate that (i) ALDH superfamily members sharing more than ~40 per cent AA identity belong to the same family (e.g., ALDH1A, ALDH1B, etc.), and (ii) ALDH family members that share greater than ~60 per cent AA identity belong to the same subfamily (e.g., ALDH1A1, ALDH1A2, etc.). This provides a convenient and systematic naming system for an entire superfamily. Identity-based naming does not always reflect homology correctly, however; the equivalent rules in the cytochrome P450 (CYP) gene superfamily are known to break down when evolutionarily distantly related animals are included (Nelson 2009). For example, whereas zebrafish Aldh3d1 and Aldh3b1 share only 50 per cent AA identity, HomoloGene evidence and alignments suggest that aldh3d1 is probably a duplication of aldh3b1 (data not shown). Although aldh3d1 has diverged considerably, it is likely to be more closely related to aldh3b1 than the naming convention would suggest.
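The two identity thresholds in the naming convention can be expressed as a simple rule. The following is a minimal sketch, assuming percent identity is computed over aligned columns in which neither sequence has a gap (other definitions exist and give somewhat different values); the function names, the default cutoffs written as exact numbers, and the toy sequences are illustrative only and do not reproduce the procedure used in this study.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def naming_level(identity: float, family_cutoff: float = 40.0,
                 subfamily_cutoff: float = 60.0) -> str:
    """Apply the approximate ~40 / ~60 per cent AA identity naming thresholds."""
    if identity > subfamily_cutoff:
        return "same subfamily"
    if identity > family_cutoff:
        return "same family"
    return "different family"


# Toy aligned fragments (made up, not real ALDH sequences).
print(percent_identity("MKT-A", "MRTQA"))

# Identity values quoted in the text.
print(naming_level(50.0))   # zebrafish Aldh3d1 vs Aldh3b1
print(naming_level(95.2))   # zebrafish aldh2.2 vs aldh2.1
```

Because the rule looks only at pairwise identity, a diverged duplicate such as aldh3d1 can fall below the subfamily cutoff even when other evidence points to a closer relationship, which is the limitation noted above.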

Many of these proteins have been defined based on genomic or dbEST data and have not been studied extensively. Many records remain in databases that are listed as ‘protein-coding’ but which instead may represent pseudogenes of various types. Furthermore, although the genes here do not have internal stop codons, without functional analysis, it is difficult to determine whether the genes might have other inactivating mutations or if they experience selective pressure. Although automated prediction and naming of ALDH proteins from completely sequenced genomes have yielded a great deal of information in a short amount of time, the alignment, curation and naming of these genes remains an important task. The fact that no new human ALDH genes have been identified over the past six years and that most other vertebrates seem to have settled close to this number suggests that identification of ALDH superfamily members in vertebrates is nearing completion. Determining the function and biological importance of each family member still requires additional work, however. As more information becomes available, the web database resource at www.aldh.org (the aldehyde dehydrogenase gene superfamily resource center) (Black and Vasiliou 2009) will be updated to reflect our current understanding of this diverse and essential gene superfamily.

For the first time, the complete number of ALDHs across a large number of vertebrates has been described. As predicted, it was found that vertebrates have a largely similar number and type of ALDHs, with fish having more than average and birds having fewer than average. No new families of ALDHs were discovered in vertebrates that are not present in humans. Future updates should focus on expanding the number of vertebrates considered, especially including amphibians and reptiles as genomes become available or are completed. Further, as the genetic/genomic picture becomes clearer, expression studies must be carried out to definitively show protein expression from these genomic locations. This will clarify the total number and expression of ALDHs that are enzymatically active in each species as well as their similar and divergent substrate specificities. Finally, the copy number variation revealed here indicates additional complexity in the human ALDH superfamily. Many of the records found were gene losses, which would likely have effects comparable to the many described inactivating mutations in ALDHs. Less well understood are the effects of partial losses and partial or full gene duplications. Deeper sequencing of more individuals will allow for the better characterization of these effects and the physiological consequences of these events.

CHAPTER III

UPDATE ON THE ALDEHYDE DEHYDROGENASE (ALDH) GENE

SUPERFAMILY IN FROG (XENOPUS TROPICALIS) – AN EXAMPLE OF

POSSIBLE HORIZONTAL GENE TRANSFER

Summary

Previously, aldehyde dehydrogenase (ALDH) superfamily nomenclature has been analyzed for many vertebrates but, to date, there has not been a study of the number of ALDH enzymes in frogs. Further, smaller studies have shown unusual placement of frog ALDH1B1 and frog ALDH16A1 in gene trees, prompting further questions. This study analyzes the number of frog (Xenopus tropicalis) ALDH genes and places them in context with their closest 300 ALDH protein neighbors. These gene trees provide context for the evolution of frog ALDHs as well as wider context of each ALDH gene family found in vertebrates. The possibility of horizontal gene transfer (HGT) is analyzed. First, it is found that ALDH1B1 likely arose in early vertebrates but is only retained in mammals and amphibians, although more amphibian genome data are needed to confirm this. Second, in considering a full ALDH16A1 gene tree in evolutionary context, it is likely that the non-catalytic form of ALDH16A1 arose in fish and was transferred to an early amniote ancestor.

Introduction

The aldehyde dehydrogenase (ALDH) superfamily represents a group of enzymes that catalyze the NAD(P)+-dependent oxidation of a wide variety of aldehydes to their corresponding carboxylic acids (Marchitti, Brocker et al. 2008). This group is widely distributed and found in all kingdoms from archaea to mammals including humans. Their known substrates include diverse aldehydes involved in growth and development, differentiation, oxidative stress, osmoregulation, neurotransmission, and detoxification of dietary and environmental aldehydes (Marchitti, Brocker et al. 2008; Brocker, Vasiliou et al. 2013). ALDH proteins usually comprise ≈ 500 amino acids, although some multifunctional or multi-domain members may be larger (Jackson, Brocker et al. 2011). They are typically dimers or tetramers and consist of three domains: a substrate-binding (catalytic) domain, a cofactor (NAD(P)+) binding domain, and a dimerization/tetramerization domain (Steinmetz, Xie et al. 1997).

The last decade has seen a flood of whole genome sequence data for a number of organisms. Many of these are annotated automatically, identifying putatively protein-coding sequences and assigning them to protein families. However, it can be difficult to assess duplicate records and pseudogenes. In addition, while assignment based on protein family signatures (e.g. HMM signatures in Pfam) is often quite accurate, assignment to families within that group is more difficult (e.g. ALDH2 vs. ALDH1B1). Recent efforts have been made to annotate the ALDH gene number in a number of groups including vertebrates (Jackson, Brocker et al. 2011), plants (Brocker, Vasiliou et al. 2013), and others. These provide invaluable resources, including stable and unique nomenclature for ALDH genes. Other efforts have focused on describing the distribution of an individual gene throughout evolution. Recently, it was shown that ALDH1B1 likely arose as a gene duplication event from ALDH2 (Jackson, Holmes et al. 2013). However, unlike ALDH2, which is distributed widely throughout animals, ALDH1B1 is found only in mammals, and also in one frog species. Another ALDH protein that has received attention recently is ALDH16A1, which is present in two major forms – a non-catalytic form present in most vertebrates examined, including mammals and fish, and a catalytic form found in lower animals. Unexpectedly, frogs possess a putatively catalytic ALDH16A1 similar to that of lower animals. These observations led to two major questions: 1) given that no amphibian (or reptile) was included in the previous ALDH update in vertebrates, what is the typical number of ALDH genes in this group? 2) Is there any evidence that these apparent anomalies in the distribution of ALDHs in frogs represent horizontal gene transfer (HGT) event(s)?

Horizontal gene transfer (HGT) may be defined as gene transfer between non-mating species. This is in contrast to vertical gene transfer, where genes in different species are related by homology, either by speciation events or by gene duplications. HGT is well documented in bacteria (Arber 2014), and numerous studies have shown transfer of prokaryotic genes to eukaryotes, including primates, flies, and nematodes (Crisp, Boschetti et al. 2015), although much less is known about the genome-wide incidence of eukaryote-to-eukaryote HGT, let alone in vertebrates. However, numerous gene phylogenies have led to the conclusion that eukaryote-to-eukaryote HGT has occurred (Keeling and Palmer 2008). In many instances, the acquired gene replaced an existing homologue, rather than introducing a new gene (Keeling and Palmer 2008). Although detection of HGT in vertebrates has gained interest in recent years, little is known about the mechanisms by which genes move and the factors which encourage or discourage this movement (Keeling and Palmer 2008).

To investigate these questions, large gene trees (300 nearest neighboring homologous genes) were assembled for each known frog ALDH protein coding sequence. This serves two main functions: 1) placing the frog gene within a broad phylogenetic context to be able to judge evidence for unusual gene distributions, and 2) since frog ALDHs are well representative of vertebrate ALDHs, allowing further examination of the phylogenetic distribution of these genes, especially within the metazoan lineage. I hypothesize that 1) frogs have an unusual set of ALDH genes (especially ALDH1B1 and ALDH16) due to horizontal gene transfer with various species, 2) large scale gene trees will be able to identify the evolutionary origins of horizontal gene transfer events in frogs, and 3) large scale gene trees will identify the distribution of each ALDH gene family, extending the work from Chapter II.

Methods

Frog ALDH protein sequences were surveyed from multiple databases including NCBI protein, UniProt, and Xenbase. All unique proteins with a unique chromosomal location were collected and compared to known vertebrate ALDHs to identify and name them. Final chromosomal locations were drawn from the Xenbase genome version 7.1.

To build phylogenetic protein trees, the nearest 300 protein sequences to each frog ALDH protein sequence were retrieved via the HMMER webserver at http://hmmer.janelia.org/search/phmmer (Finn, Clements et al. 2011). These were aligned using T-Coffee with default settings (Notredame, Higgins et al. 2000). Phylogenetic trees were calculated from 100 bootstrap replicates of neighbor-joining trees using Phylip (http://evolution.genetics.washington.edu/phylip.html).
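The bootstrap step can be illustrated independently of any particular software. The sketch below is not the Phylip workflow used here (the specific Phylip programs and options are not stated in the text); it only shows, under simple assumptions, what one bootstrap replicate of an alignment is: the alignment columns resampled with replacement. The p-distance function stands in for the protein distance model that a neighbor-joining program would apply, and the sequences and names are made up.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: alignment columns resampled with replacement."""
    n_cols = len(next(iter(alignment.values())))
    picked = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in picked) for name, seq in alignment.items()}


def p_distance(seq_a, seq_b):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


# Toy alignment with made-up sequences (not real ALDH data).
toy_alignment = {
    "seq_frog": "MKTAYIAK",
    "seq_human": "MKTAYLAK",
    "seq_fish": "MRTSYLGK",
}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(100)]

# Each replicate would next be converted to a distance matrix and a
# neighbor-joining tree; support for a branch is the share of replicate
# trees in which that branch appears.
first = replicates[0]
print(p_distance(first["seq_frog"], first["seq_human"]))
```

Resampling whole columns, rather than individual residues, keeps each site's pattern across species intact, which is what allows the replicate trees to measure how strongly the data support each branch.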

ALDH16A1 exons were counted by searching the ALDH16A1 protein sequences from representative species against their respective genomes using BLAT in the UCSC genome browser (Karolchik, Bejerano et al. 2007). Syntenic analysis was carried out using the SynMap tool in CoGe (Lyons and Freeling 2008). Only syntenic regions containing ALDH16A1 were recorded.

Results

Frogs were found to have 18 ALDH genes with unique chromosomal locations (Table 3.1), a number comparable to that found in other vertebrates (i.e. 14-21 genes per organism) (Jackson, Brocker et al. 2011). Although multiple databases were surveyed for frog ALDHs, where multiple records were found, values in Table 3.1 correspond to those found in the Xenbase genome version 7.1 (Karpinka, Fortriede et al. 2015). The ALDHs found were typical of ALDH proteins, at about 500 amino acids, except for ALDH1L1, ALDH1L2, ALDH16A1 and ALDH18A1, which are longer and contain multiple domains. Evidence for protein existence as given by the UniProt database has 5 levels, from 1) experimental evidence at protein level to 5) protein uncertain. All frog ALDH genes discovered had protein existence evidence levels (PE) of 2) experimental evidence at transcript level, or 3) protein inferred from homology (2015).
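For reference, the five UniProt protein existence levels can be written as a small lookup table. The sketch below uses the standard UniProt level names; the record list is hypothetical and serves only to show how PE values such as those in Table 3.1 might be summarized.

```python
from collections import Counter

# UniProt protein existence (PE) levels.
UNIPROT_PE = {
    1: "Evidence at protein level",
    2: "Evidence at transcript level",
    3: "Inferred from homology",
    4: "Predicted",
    5: "Uncertain",
}

# Hypothetical (gene, PE level) pairs for illustration only; not values from Table 3.1.
example_records = [("gene_a", 2), ("gene_b", 3), ("gene_c", 3)]


def summarize_pe(records):
    """Count records per PE level, keyed by the level description."""
    counts = Counter(level for _, level in records)
    return {UNIPROT_PE[level]: n for level, n in sorted(counts.items())}


print(summarize_pe(example_records))
```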

Protein phylogenetic trees for each frog ALDH gene were generated to assess the possibility of horizontal gene transfer in frog. In addition, since the nearest 300 protein records were retrieved and aligned, these trees show the overall distribution of these genes across evolution. ALDH1A1, ALDH1A2, and ALDH1A3 are distributed widely across vertebrates, and are present in frogs as well (Figure 3.1). ALDH1B1 is present exclusively in mammals with the exception of Xenopus tropicalis (no ALDH1B1 gene sequence is available for Xenopus laevis) (Figure 3.2). ALDH2, by contrast, from which ALDH1B1 is likely derived (Jackson, Brocker et al. 2011), is widely distributed across animals from humans to lower animals such as tunicates (Figure 3.2). In wider context, Figure 3.2 also shows that ALDH1B1 can be placed early in the vertebrate lineage, compared to ALDH2. Frogs have both ALDH1L1, present only in vertebrates (fish, amphibians, and mammals, but not birds/reptiles), and ALDH1L2, which is much more widely distributed (across all animal groups from mammals to lower animals) (Figure 3.3). Frogs have ALDH3A2, which is widely distributed across vertebrates, but not ALDH3A1, which is only found in mammals (Figure 3.4). Phylogenetic clustering of ALDH3B1 was indistinct – e.g. mammal ALDH3B1 and ALDH3B2 clustered together, separated from bird/reptile ALDH3B1 and ALDH3B2. Whereas typically genes will cluster by gene first and then phylogenetic group (e.g. as in ALDH1A1, ALDH1A2), often these genes clustered by phylogenetic group first and then by genes. Thus, where identification of ALDH3B1 and ALDH3B2 genes was not clear, clusters were labeled as ALDH3B (Figure 3.4). Frogs had one copy each of ALDH3B1 (which clustered with fish ALDH3B genes), and ALDH3B2, which formed a group between fish and birds/reptiles.

ALDH families 1, 2, and 3 are relatively recent and have limited distribution (Animalia or less), and so the 300 records included all known records and extended into other gene families (e.g. a search of ALDH2 genes found all ALDH2 proteins, ALDH1B1 proteins, and some ALDH1A proteins). By contrast, other ALDH families discussed here are far more widely distributed across kingdoms (and presumably more ancient). Thus, while gene trees represent a wide survey of these genes, they may not include all examples from species evolutionarily distant from the reference organism (Xenopus tropicalis). Although no protein tree can be expected to perfectly reflect evolution, most of the higher ALDH gene trees reflect evolutionary order without any major deviations. The genes and distributions are as follows: ALDH4A1 (animals, bacteria, archaea, lower eukaryotes), ALDH5A1 (animals, plants, bacteria, fungi), ALDH6A1 (animals, plants, bacteria, lower eukaryotes), ALDH7A1 (animals, plants, bacteria, fungi, lower eukaryotes), ALDH8A1 (animals, bacteria, fungi, lower eukaryotes), ALDH9A1 (animals, bacteria), ALDH16A1 (animals, lower eukaryotes, bacteria), and ALDH18A1 (animals, plants, lower eukaryotes). These protein trees are presented in Figures 3.5-3.8. The one exception is that ALDH16A1 forms three major groups, comprising 1) a group including mammals, fish, and reptiles which has lost the residues required for catalytic activity, 2) a group including amphibians, the coelacanth, lower animals and lower eukaryotes, and 3) a group comprising bacteria. Both of the latter groups retain residues required for catalytic activity. Figure 3.9 shows a protein tree of ALDH16A1 compared to the standard evolutionary model in vertebrates (adapted from Amemiya, Alfoldi et al. 2013).

Table 3.1. Frog ALDH genes. Shown are the recommended gene name, Uniprot ID, NCBI gene ID, AA length, Chromosome (or contig), strand (STR), nucleotide start and end, and Uniprot protein evidence level (PE).

Figure 3.1. Protein phylogenetic trees of frog ALDH1A1 (left), ALDH1A2 (center) and ALDH1A3 (right).

Figure 3.2. Protein phylogenetic trees of frog ALDH1B1 (left) and ALDH2 (right).

Figure 3.3. Protein phylogenetic trees of frog ALDH1L1 (left) and ALDH1L2 (right).

Figure 3.4. Protein phylogenetic trees of frog ALDH3A2 (left) and ALDH3Bs (right).

Based on these results, only ALDH16A1 was selected for further syntenic analysis. Two possible hypotheses were considered: 1) the original hypothesis that non-catalytic ALDH16A1 developed in the early vertebrate lineage and that frogs acquired a ‘lower animal-like’ catalytic ALDH16A1, or 2) the hypothesis suggested by Figure 3.8, that the non-catalytic ALDH16A1 developed in the ancestors of mammals, birds, and reptiles, and that fish acquired this protein (or vice versa, having developed in fish and being acquired by an amniote common ancestor). One marker of gene structure is intron/exon organization, which was investigated between the groups via the UCSC genome browser BLAT tool (Karolchik, Bejerano et al. 2007). Table 3.2 shows the number of exons for representative members, which was similar across groups (14-19 exons). In addition, intron/exon structure was broadly similar (data not shown).
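The exon counts in Table 3.2 can be summarized per group with a short script. In the sketch below the counts are taken from Table 3.2, while the assignment of species to the putatively catalytic group (frog, lancelet and sea squirt) follows the grouping described in the text and is an assumption of this example rather than a separate result.

```python
# Exon counts for ALDH16A1, from Table 3.2.
exon_counts = {
    "Bos taurus": 16,
    "Sus scrofa": 17,
    "Ailuropoda melanoleuca": 17,
    "Homo sapiens": 17,
    "Mus musculus": 17,
    "Ornithorhynchus anatinus": 14,
    "Danio rerio": 16,
    "Oreochromis niloticus": 17,
    "Anolis carolinensis": 15,
    "Xenopus tropicalis": 17,
    "Branchiostoma floridae": 17,
    "Ciona intestinalis": 19,
}

# Assumed grouping: amphibian and lower-animal sequences retain catalytic residues.
catalytic_like = {"Xenopus tropicalis", "Branchiostoma floridae", "Ciona intestinalis"}
non_catalytic = set(exon_counts) - catalytic_like


def exon_range(species):
    """Return (min, max) exon count for the given species names."""
    counts = [exon_counts[name] for name in species]
    return min(counts), max(counts)


print("all species:", exon_range(exon_counts))
print("non-catalytic group:", exon_range(non_catalytic))
print("catalytic-like group:", exon_range(catalytic_like))
```

The overlapping ranges illustrate why exon number alone does not separate the two groups.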

Syntenic analysis can be used to find groups of homologous genes that are present in contiguous regions on chromosomes of two different species. The genomes of both Danio rerio and Xenopus tropicalis were compared against a list of representative genomes across both animal groups of ALDH16A1 genes (Table 3.3). Few overall matches were found, compared to the number of species investigated, even between some closely-related species.

Figure 3.5. Protein phylogenetic trees of frog ALDH4A1 (left) and ALDH5A1 (right).

Figure 3.6. Protein phylogenetic trees of frog ALDH6A1 (left) and ALDH7A1 (right).

Figure 3.7. Protein phylogenetic trees of frog ALDH8A1 (left) and ALDH9A1 (right).

Figure 3.8. Protein phylogenetic trees of frog ALDH16A1 (left) and ALDH18A1 (right).

Figure 3.9. Protein phylogenetic trees of the standard model of vertebrate evolution (left) and frog ALDH16A1 (right).

Table 3.2. Exons present in ALDH16A1 by species.

Species (Common name)                         Exons
Bos taurus (Bovine)                           16
Sus scrofa (Pig)                              17
Ailuropoda melanoleuca (Giant panda)          17
Homo sapiens (Human)                          17
Mus musculus (Mouse)                          17
Ornithorhynchus anatinus (Duckbill platypus)  14
Danio rerio (Zebrafish)                       16
Oreochromis niloticus (Nile tilapia)          17
Anolis carolinensis (Green anole)             15
Xenopus tropicalis (Western clawed frog)      17
Branchiostoma floridae (Florida lancelet)     17
Ciona intestinalis (Transparent sea squirt)   19

Discussion

Frog ALDHs were investigated for two reasons: 1) there were no amphibians represented in the last update of vertebrate ALDHs (Jackson, Brocker et al. 2011), and 2) to investigate apparent oddities in ALDH distribution in frogs. Regarding the first point, 18 ALDH genes were found in frogs. In fact, frogs have the same number of ALDH genes as humans with the exception of ALDH3A1, which is found only in mammals. Frogs have two ALDH3B genes, but although ALDH3B genes are named sequentially in vertebrates (i.e. ALDH3B1, ALDH3B2), it is not clear that these represent orthologous genes. Evidence suggests multiple duplications in this gene family in many groups, and the ALDH3B gene family in vertebrates needs to be analyzed more closely in the future.

Regarding oddities in the ALDH distribution in frogs, no evidence was found in this report of horizontal gene transfer for ALDH1B1 in frogs. As seen in Figure 3.2, ALDH1B1 is present in frogs in a position consistent with known evolutionary relationships – it would not be unusual for a bird/reptile ancestor to lose an ALDH gene. Further, looking at a broad ALDH2 gene tree, the ALDH1B1 branch is placed at the base of vertebrate evolution, suggesting that the duplication may have occurred even earlier, with the gene subsequently lost in other species. However, it remains unusual that ALDH1B1 is found in Xenopus tropicalis, but not Xenopus laevis. This may reflect further progress in assembling the X. tropicalis genome compared to X. laevis. The understanding of the distribution of ALDH1B1 within amphibians will be helped greatly by the sequencing of additional complete amphibian genomes.

Similarly, when a more complete distribution of ALDH16A1 in vertebrates was assembled, it became clear that frog ALDH16A1 formed an evolutionarily contiguous group with coelacanth, lancelet, and lower animals, containing a putatively enzymatically active ALDH16A1. Fish, on the other hand were clearly unusual in having a non- catalytic ALDH16A1, along with mammals and reptiles (birds do not possess

ALDH16A1). Neither exon structure nor syntenic analysis was able to shed any further light on the relationships between the groups. It is likely that finding of contiguous syntenic regions was disrupted by a number of factors, especially the incomplete nature of many lower animal and non- genomes, and that more complete chromosome representation will allow better discovery of homologous regions. Given the

evolutionary history of these groups, the most plausible explanation is that non-catalytic

ALDH16A1 developed in fish (after future-tetrapods had left that genetic lineage) and this gene was acquired by an early amniote ancestor. Protein similarity is the primary data that supports this conclusion. Given the likely time of this transfer, it is not surprising that the two groups have diverged significantly since that time, although

ALDH16A1 structure has been conserved. As more species have genomes sequenced and deeper sequencing / chromosome organization of current species occurs, deeper synteny analyses will be possible, allowing further resolution of this question.

This report significantly expands understanding of the origins and distribution of vertebrate ALDH genes. It is shown that the ALDH1, ALDH2, ALDH1L, and ALDH3 gene families likely originated in vertebrates and are exclusive to that lineage. The other ALDH gene families (ALDH4, ALDH5, ALDH6, ALDH7, ALDH8, ALDH9, ALDH16, and ALDH18) are likely evolutionarily ancient and extend far back into evolutionary history. Discovering their full extent and distribution will be the subject of future work. In addition, the ALDH distribution of lower animals, fungi, protists, bacteria and archaea is poorly understood at this time and will be the subject of future phylogenetic and nomenclature work.

CHAPTER IV

COMPARATIVE GENOMICS, MOLECULAR EVOLUTION AND

COMPUTATIONAL MODELING OF ALDH1B1 AND ALDH2

Summary

Vertebrate ALDH2 genes encode mitochondrial enzymes capable of metabolizing acetaldehyde and other biological aldehydes in the body. Mammalian ALDH1B1, another mitochondrial enzyme sharing 72% identity with ALDH2, is also capable of metabolizing acetaldehyde but has a tissue distribution and pattern of activity distinct from that of

ALDH2. Bioinformatic analyses of several vertebrate genomes were undertaken using known ALDH2 and ALDH1B1 amino acid sequences. Phylogenetic analysis of many representative vertebrate species (including fish, amphibians, birds and mammals) indicated the presence of ALDH1B1 in many mammalian species and in frogs (Xenopus tropicalis); no evidence was found for ALDH1B1 in the genomes of birds, reptiles or fish.

Predicted vertebrate ALDH2 and ALDH1B1 subunit sequences and structures were highly conserved, including residues previously shown to be involved in catalysis and coenzyme binding for human ALDH2. Studies of ALDH1B1 sequences supported the hypothesis that the ALDH1B1 gene originated in early vertebrates from a retrotransposition of the vertebrate ALDH2 gene. Given the high degree of similarity between ALDH2 and ALDH1B1, it is surprising that individuals with an inactivating mutation in ALDH2 (ALDH2*2) do not exhibit a compensatory increase in ALDH1B1 activity. I hypothesized that the similarity between the two ALDHs would allow for

dominant negative heterotetramerization between the inactive ALDH2 mutants and

ALDH1B1. Computational-based molecular modeling studies examining predicted protein-protein interactions indicated that heterotetramerization between ALDH2 and

ALDH1B1 subunits was highly probable and may partially explain a lack of compensation by ALDH1B1 in ALDH2*2 individuals.

Introduction

The aldehyde dehydrogenase (ALDH; EC 1.2.1.3) gene superfamily

(http://aldh.org/superfamily.php) encodes ALDHs which participate in metabolic pathways involving aldehydes in the body, examples of which include alcohol, retinoids, neurotransmitters, lipids, amino acids, drugs and xenobiotics (Marchitti, Brocker et al.

2008). The human ALDH2 gene is localized on chromosome 12 (Hsu, Bendel et al. 1988) and encodes liver mitochondrial ALDH2 which functions in acetaldehyde and peroxidic aldehyde metabolism (Wang, Nakajima et al. 2002; Ohta, Ohsawa et al. 2004; Marchitti,

Brocker et al. 2008). The physiological importance of ALDH2 is illustrated best in East

Asian human subjects possessing a dominant inactive genetic variant, ALDH2*2

(E487K). This variant results in alcohol ‘flushing’ and a lowered acetaldehyde clearance capacity, arising from lowered coenzyme (NAD+) binding affinity and is generally considered inactive (greater than 90% loss of enzyme activity) (Yoshida, Ikawa et al.

1985; Goedde and Agarwal 1987; Higuchi, Muramatsu et al. 1992; Cook, Luczak et al.

2005; Chen, Peng et al. 2009). Typically, ALDH2 forms homotetramers in vitro, but the dominant negative effect of ALDH2*2 on ALDH2 activity has been shown to be due to the formation of ALDH2/ALDH2*2 heterotetramers which have poor activity in vitro

(Wang, Sheikh et al. 1996). ALDH1B1 is another mitochondrial ALDH and was previously designated as ALDHx or ALDH5. It is encoded by an intronless coding region gene (ALDH1B1) on (Hsu and Chang 1991). ALDH1B1 shares 72 percent amino acid identity with ALDH2, and like ALDH2, it forms homotetramers and is capable of metabolizing acetaldehyde in the body, in addition to other short chain aldehydes (Stagos, Chen et al. 2010). ALDH1B1 is present in the liver, as well as in the stomach and small intestines, where it likely plays a role in first-pass metabolism of alcohol-derived acetaldehyde. Immunohistochemical staining of ALDH1B1 in human liver shows a strong prevalence of ALDH1B1 (Chen, Orlicky et al. 2011), although these are likely lower than ALDH2 levels based on reported relative transcript levels (Stagos,

Chen et al. 2010). With a Km of 3.2 µM for acetaldehyde, ALDH2 plays a major role in acetaldehyde metabolism in “normal” ALDH2*1/*1 individuals who have ingested moderate amounts of ethanol (Klyosov, Rashkovetsky et al. 1996). In subjects in whom

ALDH2 has been pharmacologically inhibited, the aversive effects of acetaldehyde typically begin around 40-60 µM (Johnsen, Stowell et al. 1992). In an alcoholic population, consumption of a 0.5 g/kg dose of ethanol (equivalent to 3-4 drinks in an average-weight man) by individuals with a normal ALDH2 (ALDH2*1/*1) resulted in a blood acetaldehyde concentration of 1.8 µM (Chen, Peng et al. 2009). However, under the same conditions, alcoholic ALDH2 heterozygous (ALDH2*1/*2) and mutant

(ALDH2*2/*2) individuals achieved a blood acetaldehyde concentration of 57.5 µM and

108.7 µM, respectively. Values were similar for non-alcoholic populations of

ALDH2*1/*1 and ALDH2*1/*2 individuals, but the ethanol dose was poorly tolerated by

ALDH2*2/*2 individuals so no data are available for that group. At these elevated

concentrations, ALDH1B1 (Km = 55 µM) would be expected to participate in acetaldehyde metabolism. There is also some evidence that ALDH1B1 participates in the ethanol detoxification pathway. Epidemiological data have revealed that mutations in

ALDH1B1 result in physiological effects which are consistent with increased acetaldehyde toxicity (i.e. ethanol aversion and ethanol hypersensitivity reactions)

(Husemoen, Fenger et al. 2008; Linneberg, Gonzalez-Quintela et al. 2010). Finally, in a newly developed ALDH1B1 mouse knockout model, preliminary data suggest that acetaldehyde levels are higher in ALDH1B1-deficient mice than in wild-type mice upon ethanol feeding [Manuscript in preparation]. Given the acetaldehyde-metabolizing capacity of ALDH1B1, it is surprising that ALDH2*2 individuals are so susceptible to acetaldehyde toxicity, i.e., that their reduced metabolic activity is not compensated for to a greater extent by ALDH1B1 activity. I hypothesize that this lack of compensation may be due in part to ALDH2*2 forming heterotetramers with ALDH1B1 and thereby exerting a negative effect on ALDH1B1 activity similar to that seen for ALDH2.

This possibility was first suggested by the authors (Vasiliou) in 2009 (Endo, Sano et al.

2009).
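The Km values and blood acetaldehyde concentrations quoted above can be put in perspective with a simple saturation calculation. The following is a minimal sketch, assuming simple Michaelis-Menten kinetics (v/Vmax = [S] / (Km + [S])) and assuming that blood acetaldehyde concentrations approximate the concentration seen by each enzyme. It ignores differences in Vmax and in enzyme abundance between ALDH2 and ALDH1B1, and the function and variable names are illustrative only.

```python
# Fractional saturation of ALDH2 and ALDH1B1 at the acetaldehyde
# concentrations cited in the text (assumption: Michaelis-Menten kinetics).

def fractional_saturation(substrate_um: float, km_um: float) -> float:
    """Return v/Vmax for a substrate concentration and Km, both in µM."""
    if substrate_um < 0 or km_um <= 0:
        raise ValueError("substrate must be >= 0 and Km must be > 0")
    return substrate_um / (km_um + substrate_um)

# Km values for acetaldehyde quoted in the text (µM).
KM_UM = {"ALDH2": 3.2, "ALDH1B1": 55.0}

# Blood acetaldehyde concentrations quoted in the text (µM), by genotype.
BLOOD_ACETALDEHYDE_UM = {
    "ALDH2*1/*1": 1.8,
    "ALDH2*1/*2": 57.5,
    "ALDH2*2/*2": 108.7,
}

def saturation_table() -> dict:
    """Map genotype -> enzyme -> fractional saturation (rounded to 2 places)."""
    return {
        genotype: {
            enzyme: round(fractional_saturation(conc, km), 2)
            for enzyme, km in KM_UM.items()
        }
        for genotype, conc in BLOOD_ACETALDEHYDE_UM.items()
    }

if __name__ == "__main__":
    for genotype, row in saturation_table().items():
        print(genotype, row)
```

Under these assumptions, ALDH1B1 would operate at roughly 3 per cent of its maximal rate at 1.8 µM acetaldehyde and at about half of its maximal rate at 57.5 µM, which is consistent with the expectation that it contributes mainly at elevated acetaldehyde concentrations.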

Given the similarities between ALDH1B1 and ALDH2, it is important to understand their similarities and differences at a sequence and phylogenetic level to better identify what unique roles ALDH1B1 may play in physiology and pathophysiology.

Furthermore, given their structural similarity and the mechanism by which ALDH2*2 monomers lower the activity of ALDH2, the possibility of a similar heterotetramerization-based suppression of ALDH1B1 warrants examination. This study identifies and describes the predicted sequences, structures and phylogeny of vertebrate

ALDH2 and ALDH1B1 genes and enzymes and compares these results with those previously reported for human (Homo sapiens) and mouse (Mus musculus) ALDH2 and

ALDH1B1. Phylogenetic analyses describe the relationships and potential origins of

ALDH2 and ALDH1B1 genes during mammalian and vertebrate evolution. This work also describes molecular modeling studies of protein-protein interactions between

ALDH1B1 and ALDH2 subunits to examine whether dominant negative heteromerization of ALDH2*2 with ALDH1B1 may contribute to a lack of compensation by ALDH1B1 in ALDH2*2 individuals.

Methods

Vertebrate ALDH2 and ALDH1B1 gene and enzyme identification and phylogenetic analysis: ALDH1B1 and ALDH2 sequences for multiple representative vertebrate species were retrieved from major databases (UniProt, NCBI) by single and iterative HMMER profile searches (phmmer and jackhmmer) seeded with human ALDH1B1 and ALDH2 (http://hmmer.janelia.org/search/phmmer). Confirmation of the presence or absence of ALDH2 and ALDH1B1 genes in vertebrate genomes was based on several methods, including HMMER profile searches of completed proteomes

(as listed by UniProt), and protein BLAT searches (UCSC Genome Browser - http://genome.ucsc.edu) (Karolchik, Bejerano et al. 2007). Human ALDH1A1,

ALDH1A2 and ALDH1A3 were used as the outgroup as these sequences are the most closely related to ALDH1B1 and ALDH2. In addition, all sequences that were considered

ALDH2 or ALDH1B1 by the automated analysis of NCBI homologene were retrieved

(http://www.ncbi.nlm.nih.gov/homologene). All sequences were aligned using the most

accurate version of T-Coffee (http://tcoffee.crg.cat/) and phylogenetic trees were constructed using the neighbor-joining method with 1,000 bootstrap replicates in PHYLIP

3.69 (http://evolution.genetics.washington.edu/phylip.html). Sequences were identified as members of either the ALDH1B1 or ALDH2 group. Gene locations, predicted gene structures, and protein sequences were observed for each ALDH examined. Comparative structures for human and mouse ALDH2 and ALDH1B1 genes were derived from the

AceView website (http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/) (Thierry-Mieg and Thierry-Mieg 2006); the major isoform was used in each case, with capped 5’- and

3’- ends for the predicted mRNA sequences including introns and coding exons. For analysis of secondary structure and oligomerization residues, alignments of human

ALDH2 (Hempel, Kaiser et al. 1985), ALDH1B1 (Stewart, Malek et al. 1996; Stagos,

Chen et al. 2010), mouse ALDH2 (Chang and Yoshida 1994) and ALDH1B1 (Stagos,

Chen et al. 2010) and of other predicted vertebrate ALDH2 and ALDH1B1 sequences were calculated using a ClustalW-technique (http://www.ebi.ac.uk/clustalw/) (Thompson,

Higgins et al. 1994).
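Two of the quantities used in this section, pairwise amino acid identity and bootstrap resampling of an alignment, can be illustrated in a few lines. The sketch below is not the T-Coffee, ClustalW or PHYLIP implementation. It assumes that identity is computed over alignment columns in which neither sequence has a gap (other denominators are in common use), and it shows only the column-resampling step of the bootstrap, not tree building. The toy sequences are hypothetical.

```python
import random

GAP = "-"

def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == GAP or b == GAP:
            continue  # assumption: gapped columns are excluded from the count
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared

def bootstrap_columns(alignment: dict, rng: random.Random) -> dict:
    """Build one bootstrap replicate by sampling columns with replacement.

    The same sampled columns are applied to every sequence, so each
    replicate column is an intact column of the original alignment.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have equal length")
    length = lengths.pop()
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }

if __name__ == "__main__":
    # Hypothetical toy alignment, not real ALDH sequence data.
    toy = {"seq1": "MLRA-AGLK", "seq2": "MLRAVAGIK"}
    print(percent_identity(toy["seq1"], toy["seq2"]))
    print(bootstrap_columns(toy, random.Random(0)))
```

In a full analysis each replicate alignment would be passed to the tree-building step, and the fraction of replicates recovering a given clade would be reported as its bootstrap support.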

Subunit docking studies and comparative binding energies and stabilities for vertebrate ALDH2 and ALDH1B1 subunits: Web tools were used to predict secondary structures and mitochondrial leader sequences for vertebrate ALDH2 and ALDH1B1 subunits (http://swissmodel.expasy.org/workspace). The crystal structure for human

ALDH2 (PDB ID: 1O01) (Steinmetz, Xie et al. 1997), ALDH2*2 (PDB ID: 2ONM)

(Larson, Zhou et al. 2007), and sheep ALDH1A1 (PDB ID: 1BXS) (Moore, Baker et al.

1998) were downloaded from the RCSB Protein Data Bank (http://www.rcsb.org/). A homology model was created for human ALDH1B1 from ALDH2 and for human

ALDH1A1 from sheep ALDH1A1. As controls to evaluate the homology modeling and docking studies, a human ALDH2 homology model was created from the human

ALDH1A1 structure and a human ALDH1A1 homology model was created from the

ALDH2 structure. Computational studies were conducted using Discovery Studio 3.1

(Accelrys, San Diego; http://accelrys.com). All proteins were prepared using the prepare protein function with a CHARMM force field (Brooks, Bruccoleri et al. 1983). For each docking study, a single subunit of ALDH1B1, ALDH2, or ALDH2*2 was minimized and then docked against a homotrimer consisting of three subunits of ALDH1B1 or ALDH2.

Minimization was performed using a steepest descent method followed by the conjugate gradient method (using 10,000 steps and an RMS gradient of 0.01) utilizing the Generalized

Born implicit solvent model (Feig and Brooks 2004) to approximate the shape of a formed tetramer and remove steric overlaps. Docking was completed using the ZDOCK

(Chen and Weng 2002) and ZRANK (Pierce and Weng 2007) algorithms to obtain the most likely pose and to calculate the interaction energy, i.e., the energy of interaction between the monomer and trimer. The best pose was minimized (as noted above) to calculate the protein stability parameter, i.e., the total stability of the final protein tetramer in solution. Individual weak interactions between amino acids were measured by calculating hydrogen bonds (cutoff of 3.5Å).
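The hydrogen bond tally described above can be approximated by a distance screen. The following is a minimal sketch, assuming that a candidate hydrogen bond is any donor/acceptor heavy-atom pair separated by no more than 3.5 Å. The criteria actually applied by the modeling software may include additional geometric tests (such as angles), which are not reproduced here. The coordinates are hypothetical.

```python
import math

HBOND_CUTOFF_ANGSTROM = 3.5  # distance cutoff quoted in the text

def count_hbond_candidates(donors, acceptors, cutoff=HBOND_CUTOFF_ANGSTROM):
    """Count donor/acceptor pairs whose separation is within the cutoff.

    donors and acceptors are sequences of (x, y, z) coordinates in Å.
    Assumption: distance is the only criterion (no angle test).
    """
    return sum(
        1
        for donor in donors
        for acceptor in acceptors
        if math.dist(donor, acceptor) <= cutoff
    )

if __name__ == "__main__":
    # Hypothetical coordinates for atoms on a monomer (donors)
    # and on a trimer (acceptors).
    monomer_donors = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    trimer_acceptors = [(3.0, 0.0, 0.0), (0.0, 4.0, 0.0), (10.0, 0.0, 3.5)]
    print(count_hbond_candidates(monomer_donors, trimer_acceptors))
```

Counting in both directions (monomer donors to trimer acceptors, and trimer donors to monomer acceptors) would be needed to capture every interaction across an interface.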

Results

Phylogeny and evolution of vertebrate ALDH2 and ALDH1B1: The predicted locations, sizes and number of coding exons for vertebrate ALDH1B1 and ALDH2 genes examined are presented in Tables 4.1 and 4.2, respectively. The ALDH1B1 sequences

were found in frogs (Xenopus tropicalis) and mammals, including marsupials - opossum

(Monodelphis domestica) and Tasmanian devil (Sarcophilus harrisii) - and a monotreme species - platypus (Ornithorhynchus anatinus), but were notably absent from birds (Gallus gallus, Taeniopygia guttata) and lower species, such as zebrafish (Danio rerio) (Table 4.1).

In contrast, ALDH2 sequences were identified in all mammalian, bird, lizard, frog and fish genomes examined, although there were three such ALDH2 genes identified in the zebrafish (Danio rerio) genome (Table 4.2). The phylogram demonstrates the separation of the sequences into three distinct groups during vertebrate evolution, namely

ALDH1B1, ALDH2 and the ‘outgroup’ ALDH1A-like sequences (Figure 4.1), and suggests that ALDH1B1 genes have been derived from an ancestral vertebrate ALDH2 gene. The phylogenetic distribution of ALDH1B1 genes has not been described previously in this detail, and the gap distribution (namely the lack of ALDH1B1 in birds) suggests that either multiple duplication events occurred or ALDH1B1 was lost early in the avian lineage.

Predicted gene locations and exonic structures for vertebrate ALDH2 and

ALDH1B1 genes: Each vertebrate ALDH2 gene examined had 13 coding exons (Figure

4.2, Table 4.2), whereas the vertebrate ALDH1B1 genes consistently contained only a single coding exon with the single exception that marmoset ALDH1B1 has two coding exons (Figure 4.2, Table 4.1). In contrast to all of the other vertebrate genomes, the zebrafish (Danio rerio) genome contained three predicted ALDH2 genes, designated as

ALDH2a, ALDH2b and ALDH2c, which are closely localized on chromosome 5 (Jackson,

Brocker et al. 2011). This is indicative of an origin of these multiple genes from successive unequal crossover events in the ancestral fish ALDH2 gene. These gene

Table 4.1. ALDH1B1 and ALDH1A1 genes and enzymes in selected vertebrate species. [Table body not recoverable from the source.] Notes: RefSeq refers to the NCBI reference sequence; predicted NCBI sequence; gorilla sequence was derived from BLAT using the UCSC web browser; na, not available; ^ gene scaffold ID; * contig ID; bps, base pairs of nucleotide sequence; inc, an incomplete genome sequence was available for analysis; +ve, positive strand; -ve, negative strand.

Table 4.2. ALDH1B1 and ALDH1A1 genes and enzymes in selected vertebrate species. [Table body not recoverable from the source.] Notes: RefSeq refers to the NCBI reference sequence; predicted NCBI sequence; gorilla sequence was derived from BLAT using the UCSC web browser; na, not available; ^ gene scaffold ID; * contig ID; bps, base pairs of nucleotide sequence; inc, an incomplete genome sequence was available for analysis; +ve, positive strand; -ve, negative strand.

Figure 4.1. Phylogenetic tree for vertebrate ALDH2 and ALDH1B1 protein sequences. The tree is labeled with the protein name and the species name of the vertebrate. Note the major clusters for the vertebrate ALDH1B1, ALDH2 and human ALDH1A sequences. The tree is ‘rooted’ with human ALDH1A1, ALDH1A2 and ALDH1A3 sequences. See Tables 4.1 and 4.2 for details of sequences and gene locations. Note the absence of ALDH1B1 sequences from birds, reptiles and fish, whereas ALDH2 sequences were observed for all vertebrate genomes examined, including three sequences for the zebrafish genome.

duplication events may have arisen from rapid, lineage-specific expansion of the zebrafish genome mediated by recent tandem duplications, as reported for other genes of this teleost species (Lu, Peatman et al. 2012).

Alignments of human, mouse and frog ALDH2 and ALDH1B1 amino acid sequences: Amino acid alignments for human (Homo sapiens) (Hempel, Kaiser et al.

1985), mouse (Mus musculus) (Chang and Yoshida 1994) and frog (Xenopus tropicalis)

ALDH2 and ALDH1B1 sequences (Stagos, Chen et al. 2010) are shown in Figure 4.3.

Comparisons of the ALDH2 and ALDH1B1 sequences with the human ALDH2 sequence, for which the tertiary structure has been described (PDB ID: 1CW3A)

(Steinmetz, Xie et al. 1997; Larson, Zhou et al. 2007), enabled identification of key residues contributing to catalysis, structure and function. Mitochondrial ALDH2 N-terminal leader sequences (human ALDH2 residues 1-24), which enable ALDH2 uptake into the mitochondria (Zhou, Bai et al. 1995), were predicted for each of the other vertebrate ALDH2 and ALDH1B1 sequences examined. These sequences were highly divergent and varied in length from 12 residues (opossum and marmoset ALDH1B1) to

32 residues (lizard ALDH2) (Table 4.1). Active site residues (human ALDH2 numbers used) binding the substrate (Glu285; Cys319) or stabilizing the transition state for the catalyzed reaction (Asn186) were conserved in all ALDH2 and ALDH1B1 sequences examined (Figure 4.3). Within the NAD+ binding domain, a dinucleotide-binding motif near the N-terminal end of the αG helix (262Gly-Ser-Thr-Glu-Val-267Gly) (Steinmetz,

Xie et al. 1997) was predominantly conserved in the vertebrate ALDH2 and ALDH1B1 sequences examined with the exception of a 266Ile/Val substitution in the human

ALDH2 sequence (Figure 4.3). Glu487 was also conserved among all vertebrate ALDH2

Figure 4.2. Comparative structures for human and mouse ALDH2 and ALDH1B1 genes. Data are derived from the AceView website (http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/). The major isoform for each gene is shown, with capped 5’- and 3’- ends for the predicted mRNA sequences. Introns and coding exons are shown. Note the differences in scale for the ALDH2 (44 or 28 kbs) and ALDH1B1 (both 6 kbs) genes. Coding exons are depicted as pink (or shaded) bars and untranslated 5’ and 3’ regions are shown as white bars. CpG islands located in the 5’- promoter regions are labeled. The locations of predicted binding sites for the transcription factors Myb, Myc and PPARG in the promoter regions of the human genes are also presented. Comparative expression levels are shown as (x) times the average expression level; NM refers to the NCBI RefSeq sequence. Flags depict the 3’UTR polyadenylation sites. Black flags indicate the typical AATAAA signal, and blue flags indicate a SNP variant of this signal. Filled flags correspond to events with many supporting cDNAs, and empty flags indicate fewer supporting cDNAs.

and ALDH1B1 sequences examined. This residue is subject to genetic variation

(ALDH2*2) in East Asian human populations. Specifically, the substitution with Lys487 lowers coenzyme (NAD+) binding affinity and results in greater than a 90% loss of enzyme activity. The consequent reduced catalytic activity of ALDH2*2 lowers acetaldehyde clearance capacity, rendering the subject vulnerable to alcohol ‘flushing’ and other unpleasant physiological symptoms attending acetaldehyde accumulation following alcohol consumption (Yoshida, Ikawa et al. 1985; Goedde and Agarwal 1987;

Higuchi, Muramatsu et al. 1992; Cook, Luczak et al. 2005; Chen, Peng et al. 2009).

Predicted secondary structures for mouse and frog ALDH2 subunits and human, mouse and frog ALDH1B1 subunits were compared with the previously reported structure for human ALDH2 (Steinmetz, Xie et al. 1997) (Figure 4.3). Alpha-helix and β-sheet locations (consistent with experimental three-dimensional structures for human

ALDH2) were observed for each of the ALDH2 and ALDH1B1 sequences examined, suggesting strong conservation of amino acid sequences and secondary structure

(Steinmetz, Xie et al. 1997). Three distinct domains for each subunit of tetrameric

ALDH2 have been previously identified: (i) the coenzyme (NAD+) binding domain (β strands 1-4 and 7-11; helices A to G and helix N), (ii) the catalytic domain (β strands 12-18; helices H to M), and (iii) the oligomerization domain (β strands 5, 6 and 19). Major changes in structure have been reported for the ALDH2*2 tetramer, specifically disruptions in a disordered region surrounding the Lys487 substitution (which is located at the dimer interface involving αG, β10 and β11) and an alteration in the loop consisting of residues 463-478, all of which are involved in NAD+ binding (Larson, Zhou et al.

2007). These changes in structure cause a 3Å shift of the αG helix towards the active site

Figure 4.3. Amino acid sequence alignments for human, mouse and frog ALDH2 and ALDH1B1 sequences. See Tables 4.1 and 4.2 for sources of ALDH1B1 and ALDH2 sequences, respectively. Symbols below all of the sequences describe the similarities in the amino acids across the sequences at each position and denote them as being identical (*), highly similar (:) or less similar (.). Residues with no symbols have no similarity. Key functional residues include N-signal peptide residues (red text or bold) and the active site triad residues Asn, Glu and Cys (pink shading or AS). Predicted secondary structural regions, shown as α-helix (yellow shading or black shading) and β-sheet (grey shading), are based on (Steinmetz, Xie et al. 1997; Larson, Zhou et al. 2007) for human ALDH2. Known or predicted exon junctions are identified by underlined bold font, with exon numbers referring to the human ALDH2 gene. Other important enzyme regions, including the NAD+ binding domain (green shading or shaded ****:*) and the ALDH2*2 variant position (blue shading or #), are shown.

cleft which may significantly contribute to the dramatically reduced affinity of ALDH2*2 towards NAD+, and thereby reduce ALDH2 activity.

Proposed retroviral origin for frog and mammalian ALDH1B1 genes: The evolutionary origin of the vertebrate ALDH1B1 gene may have occurred in a two stage process, namely the retroviral integration of an ALDH2 cDNA segment into an ancestral amphibian chromosome, and a second retroviral integration of an ALDH2 cDNA segment into an ancestral mammalian chromosome, which was retained throughout subsequent monotreme, marsupial and eutherian mammalian evolution. This proposed origin for the vertebrate ALDH1B1 genes is supported by the following evidence: (i) the high levels of sequence identities observed for vertebrate ALDH2 and ALDH1B1 amino acid sequences

(Figures 4.3 and 4.4; Tables 4.1 and 4.2), (ii) the phylogenetic clustering observed for vertebrate ALDH2 and ALDH1B1 sequences (Figure 4.1), and (iii) the presence of identical or similar CpG islands within the genomic structures for human and mouse

ALDH2 and ALDH1B1 genes (Figure 4.2). Moreover, the generation of single exon coding genes from a retroviral integration process derived from a cDNA has been described for many vertebrate genes, which serves as a precedent for such a mechanism.

More than 400 retrogenes have been reported to contribute to the appearance of novel coding sequences during mammalian evolution where the mRNA transcript is reverse-transcribed and integrated into genomic DNA (Yu, Morais et al. 2007; Cordaux and

Batzer 2009). These include several retrotransposon-encoding enzymes differentially expressed in the body, examples of which are phosphoglycerate kinase 2 (PGK2) which encodes a sperm-specific enzyme functioning in glycolysis (McCarrey, Kumari et al.

1996), ribosomal protein genes that have been retrotransposed from an X-chromosome

Figure 4.4. Comparative amino acid sequence alignments for ALDH2 and ALDH1B1 subunit binding domains. See Tables 4.1 and 4.2 for sources of ALDH1B1 and ALDH2 sequences, respectively. Three regions for vertebrate ALDH2 and ALDH1B1 sequences are shown, including dimer-dimer (β5); NAD+ binding site, monomer-monomer αG and β18, and monomer-monomer (β19). Amino acid residues are color coded: yellow for P (proline); green for hydrophilic amino acids, S (serine), Q (glutamine), N (asparagine), and T (threonine); brown for glycine (G); light blue for hydrophobic amino acids, L (leucine), I (isoleucine), V (valine), M (methionine), W (tryptophan); dark blue for amino acids, Y (tyrosine) and H (histidine); purple for acidic amino acids, E (glutamate) and D (aspartate); and red for basic amino acids, K (lysine) and R (arginine). The site of the ALDH2*2 variant (E487K) is identified.

Figure 4.5. A proposal for the evolutionary appearance of the mammalian and frog ALDH1B1 genes by retroviral integration of ancestral ALDH2 cDNA sequences. Proposed evolutionary appearance for the single coding exon ALDH1B1 gene derived from a retrovirally-induced integration of ALDH2 cDNAs and the subsequent integration into another chromosome within an ancestral vertebrate genome; ALDH2 and ALDH1B1 gene structures were derived from AceView.

gene (Uechi, Maeda et al. 2002), and a testis-specific form of the human E1 alpha subunit (Dahl, Brown et al. 1990). It appears likely that a retroviral origin for the frog and mammalian ALDH1B1 genes is consistent with these and other examples for such events during vertebrate evolution (Figure 4.5).

Human ALDH2 and ALDH1B1 subunit docking studies: Figure 4.4 compares aligned AA sequences for the oligomer-forming residues (previously identified for human ALDH2 (Chang and Yoshida 1994; Steinmetz, Xie et al. 1997)) in human, mouse, opossum, frog and fruit fly ALDH2 sequences and in human, mouse, opossum, and frog

ALDH1B1 sequences. Monomer-monomer ALDH2 and ALDH1B1 binding amino acid residues located in the αG, β18 and β19 secondary structures for these enzymes were predominantly identical in sequence, whereas the dimer-dimer ALDH2 and ALDH1B1 binding residues located in the β5 secondary structure were identical for all of the vertebrate ALDH2 and ALDH1B1 sequences examined. This demonstrates a high degree of conservation for the oligomer-forming residues for vertebrate ALDH2 and ALDH1B1 subunits and suggests that heterotetramers between these subunits may be formed, particularly given their colocalization within mitochondria.

To evaluate variations due to the homology modeling process, a homology model of ALDH2 was created based on the structure of ALDH1A1, and a homology model of ALDH1A1 was created based on the structure of ALDH2 (66 per cent amino acid identity). The ALDH2 homology model was docked against a trimer of ALDH2, and the

ALDH1A1 homology model was docked against a trimer of ALDH2. Interaction energy and protein stability were calculated for each (Table 4.3), and an RMSD (root-mean-square deviation) was calculated between the original and the homology model, e.g., ALDH1A1 vs. ALDH1A1 homology model. The difference in interaction energy between the original homotetramer and homology model for ALDH1A1 and ALDH2 was

9.3 per cent and 22.5 per cent, respectively. The difference between protein stability for

ALDH1A1 and ALDH2 was 0.26 per cent and 0.25 per cent respectively. The RMSD between the control and homology model for ALDH1A1 and ALDH2 was 0.98Å and

1.49Å, respectively. This indicates that the models are broadly comparable, especially with respect to protein stability and RMSD (in most cases < 2.0Å is considered highly successful), but care must be taken when comparing binding energy differences under approximately 20 per cent.
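The two comparison metrics used in this control analysis can be written out explicitly. The sketch below is illustrative only: it assumes that the two structures have already been superimposed and that atoms are listed in matching order (no fitting step is performed), and it assumes that the percentage difference is taken relative to the value for the original structure. How these quantities were computed within the modeling software is not specified here. All numerical inputs are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched coordinate sets (Å).

    Assumption: structures are pre-superimposed and atoms are paired by index.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate sets must be non-empty and equal in size")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

def percent_difference(reference, model):
    """Absolute difference between model and reference, as a percentage
    of the reference value (assumed definition)."""
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return 100.0 * abs(model - reference) / abs(reference)

if __name__ == "__main__":
    # Hypothetical coordinates: the model is shifted 1 Å along z.
    original = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    model = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(original, model))
    # Hypothetical energies in arbitrary units.
    print(percent_difference(-200.0, -150.0))
```

With a definition of this kind, an energy difference between two assemblies is only informative when it clearly exceeds the differences observed between each original structure and its own homology model.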

Computational protein-protein docking studies were undertaken for human

ALDH2 and ALDH1B1 subunits to investigate whether these subunits were capable of forming heterotetramers in silico. In these studies, a single monomer of protein

(ALDH1B1, ALDH2*2 or ALDH2) was bound to a homotrimer of either ALDH1B1 or

ALDH2. Comparison of the most energetically favorable conformation of each tetramer suggested that all tetramers use a similar binding modality to known ALDH2 crystal structures (Figure 4.6). Analysis of the interaction energy and protein stability (Table 4.3) suggests that all of the complexes (hetero- and homo-tetramers) are energetically favorable and may be assembled in nature. The binding of either ALDH2 or ALDH2*2 to an ALDH2 trimer was used as a positive control. Previous in vitro studies have indicated that ALDH2*2 has very little catalytic activity and that heteromerization between

ALDH2 and ALDH2*2 accounts for the lack of sufficient active ALDH2 in heterozygous individuals (Wang, Sheikh et al. 1996). The in silico docking studies indicated that the

binding of ALDH2*2 to ALDH2 is less favorable than binding of the wild-type ALDH2 to an ALDH2 trimer by 11 per cent, although it should be noted that this occurs with a high enough frequency in vivo to strongly reduce total ALDH2 activity in heterozygotes.

In the present study, all calculated protein stability values were broadly similar between homo- and heterotetramers of the same trimer, with variations from 0.25 to 1.35 per cent.

This indicates that no gross deformations are caused by heterotetramer formation.

Interestingly, the binding of either ALDH2 or ALDH2*2 to the ALDH1B1 trimer is more energetically favorable (44 per cent and 19 per cent, respectively) than is ALDH1B1 binding to an ALDH1B1 trimer.

Table 4.3. Comparative docking interaction energies and protein stabilities for ALDH2 and ALDH1B1 subunits.a

a Energies for the binding of an ALDH monomer to an ALDH trimer were calculated in silico. The interaction energy parameter for each indicates the energy of interaction between the monomer and the trimer, whereas the protein stability parameter represents the calculated total stability of the final protein tetramer in solution.

Figure 4.6. ALDH2 and ALDH1B1 subunit-subunit docking results. Predicted three-dimensional ALDH2 and ALDH1B1 subunit-containing tetrameric structures. Trimer (ALDH2 and ALDH1B1) subunits are shown in yellow, red and purple. The monomer (ALDH2, ALDH2*2 or ALDH1B1), which completes the tetramer in the docking studies, is shown in green. The top-right and bottom-left subunits form a dimer (green and yellow), as do the top-left and bottom-right subunits.

Given that amino acids in the oligomerization domains were not completely conserved, hydrogen bonds made between the docked and minimized ALDH monomers and trimers were calculated and recorded (Table 4.4). Overall, there were 102 to 115 hydrogen bond interactions made between the monomer and trimers upon binding. More than half of these occurred in the monomer-monomer interface, with the remainder concentrated in the dimer-dimer interface (specifically between the two subunits which make β5:β5 interactions), and a few interactions were made between the monomer and the subunit diagonal from it (which does not participate in any canonical monomer-monomer or dimer-dimer binding). To determine which amino acid residues participate in subunit interactions, hydrogen bonds made by each residue in the monomer-monomer interface (αG:αG, β18:β19) and the dimer-dimer interface (β5:β5) were tallied. These were counted if they were made by either the monomer or its homologous residue on the trimer. Typically interactions were symmetrical (each interaction was made with the same donor / acceptor residues from the monomer to the trimer and from the trimer to the monomer), but this was not always the case. These interactions were summed between the monomer and trimer to show trends, but individual variation in binding patterns (data not shown) likely contributes to differences seen in binding energies between different tetramer assemblies. Additional analyses were conducted to determine if the residues participating in interactions were identical between ALDH1B1 and ALDH2. Of the residues making interactions between subunits, those on beta sheets were fully conserved

(three residues in β5, four residues in β18, and three residues in β19). The interactions made were also highly conserved, with the exception that ALDH2 or ALDH2*2 monomers did not make a Lys142 interaction when binding to an ALDH1B1 trimer.


Further, an Asn454 interaction was only seen when ALDH2 was bound to an ALDH1B1 trimer. Although the interactions between the two αG helices were less conserved than the interactions between the β sheets, about half (4/7) of the residues were conserved (same amino acid) and about half (4/7) of the positions were conserved (interaction at the homologous amino acid position). However, these small differences in helix-helix binding do not appear to disrupt overall binding: these bonds represent a small fraction of the total weak bonds measured across the monomer-monomer and dimer-dimer interfaces, which is measured in aggregate by the interaction energy parameter. None of the individual substitutions observed appears to cause large disruptions in electrostatic forces.

Table 4.4. Specific interactions made by ALDH homo- and heterotetramers.a

a Hydrogen bonds made between the ALDH monomers and trimers were calculated and recorded. Specific interactions between the monomer-monomer interface (αG:αG, β18:β19) and the dimer-dimer interface (β5:β5) were compared between the different assemblies. If a specific residue made an H-bond interaction between the correct secondary structure interface (either on the monomer or the trimer), then it is counted as present (P). The identity of the amino acid residue in ALDH2 is listed and whether it is conserved between ALDH2 and ALDH1B1 (Y) or not (N). Total H-bond interactions by subunit are also recorded (interaction cutoff of 3.5 Å). The monomer is designated the A subunit, its dimer partner is designated the B subunit, the subunit opposite A in the dimer-dimer interface that participates in β5:β5 interactions is the D subunit, and the diagonal subunit which participates in no canonical interactions is the C subunit.


Given the favorable binding energies, similar overall protein stabilities and the similar weak interactions between the monomer and trimers, the present results suggest that either ALDH2 or ALDH2*2 may be included in an ALDH1B1 heterotetramer.

Because of the effect of ALDH2*2 subunits on the catalytic activity of ALDH2 tetramers, it is reasonable to expect that the binding of ALDH2*2 to an ALDH1B1 trimer would similarly repress ALDH1B1 catalytic activity. This may explain why the presence of ALDH1B1 does not compensate for a lack of ALDH2 activity in ALDH2*2 individuals, despite favorable enzyme kinetics for substrates such as ethanol-derived acetaldehyde. It will be important that these in silico observations be verified under in vitro and in vivo conditions.

Discussion

BLAST and BLAT analyses of several vertebrate genome databases were undertaken using the amino acid sequences reported for human ALDH2 and ALDH1B1. Predicted amino acid sequences and structures for the vertebrate ALDH2 and ALDH1B1 subunits showed a high degree of similarity with human ALDH2, which served as a reference structure for these ALDHs. Three ALDH2 genes were observed for the zebrafish (Danio rerio) genome and were closely located on chromosome 5. Amino acid sequences for vertebrate ALDH2 and ALDH1B1 subunit-subunit binding regions showed a high degree of sequence identity. The present results support a model of vertebrate ALDH2 and ALDH1B1 evolution in which the mammalian and frog ALDH1B1 genes were generated by retrotranspositional duplication events of ALDH2 genes at two distinct stages of vertebrate evolution: in an ancestral genome leading to the evolutionary appearance of frogs and in an ancestral genome leading to the evolutionary appearance of monotreme, marsupial and eutherian mammals.

This is the first study to describe evidence that suggests that human ALDH2*2 and ALDH1B1 subunits may be capable of forming heterotetramers. Accordingly, through a mechanism similar to that documented for human mitochondrial ALDH2, the ALDH2*2 genetic variant may reduce the catalytic activity of ALDH1B1. This novel hypothesis may explain why the presence of ALDH1B1 does not compensate for a lack of ALDH2 activity in ALDH2*2 individuals. Further, it opens the possibility that reduced ALDH1B1 activity could be involved in the pathogenesis of ALDH2*2-related disorders.


CHAPTER V

ROLE OF DEAD ENZYMES OF THE ALDH DEHYDROGENASE FAMILY IN

DRUG METABOLISM AND TOXICOLOGY

Summary

Dead enzymes are homologues of enzymes which are predicted to retain protein expression, subcellular localization and typical protein folding, but which have lost key residues required for catalytic activity. These are in contrast to pseudogenes, which have no protein product. In the pre-genome era, these were thought to be a rare occurrence, but have now been shown to be upwards of 10% of the total enzyme count in many species.

Evolutionarily, the retention of highly conserved protein-coding genes throughout taxonomic groups suggests conservative selection pressure and thus physiologic relevance for these genes. Known dead enzymes have a wide range of actions, typically involving protein-protein interaction: the dead enzyme may modulate the activity of an active enzyme, or interact with a separate protein substrate, acting as an allosteric modulator. In some cases dead enzymes act on their natural substrates directly, sequestering them either as inhibitors of other enzymes, or to anchor them in a particular subcellular space. The non-enzymatic activities of both catalytically-active and -inactive ALDHs are reviewed, focusing on the known and postulated roles of dead enzymes in this physiologically and pathologically relevant enzyme superfamily. Finally, the result of a computational search for new ALDH dead enzymes is presented: 182 protein records, divided into 19 groups, were found that meet the criteria for dead enzymes, namely being putatively protein coding but lacking the catalytic cysteine residue required for ALDH activity.

Introduction

In 1967, the first ‘dead enzyme’ was described – a catalytically-inactive homologue of an enzyme family (Brew, Vanaman et al. 1967). These have alternately been called “inactive enzyme-homologues”, “nonenzymes,” “pseudoenzymes” or “dead enzymes” (Leslie 2013; Vasiliou, Sandoval et al. 2013). Dead enzymes are catalytically inactive, but retain protein expression and native-like folding, in contrast to pseudogenes, which have no protein product. When the human genome was first fully sequenced, researchers were surprised to find that nearly 10% of all human kinases met this definition (Manning, Whyte et al. 2002). Since then, many of these enzymes have been shown to have taken on new roles, and this conversion of members of enzyme families to non-catalytic members with divergent functions is now recognized to be a common feature of many enzyme families. A number of commonalities have been found among dead enzymes. Dead enzymes disproportionately have regulatory functions (Pils and Schultz 2004), and frequently, the processes regulated are those in which their active counterparts participate (Adrain and Freeman 2012). The mechanism of action typically involves protein-protein interaction; the dead enzyme may modulate the activity of an active enzyme, or interact with a separate protein substrate, acting as an allosteric modulator. In some cases dead enzymes act on their natural substrates directly, sequestering them either as inhibitors of other enzymes, or to anchor them in a particular subcellular space (Reiterer, Eyers et al. 2014).


The aldehyde dehydrogenase (ALDH) superfamily metabolizes a diverse range of endogenous and exogenous aldehydes to their corresponding carboxylic acids (Marchitti 2008). This group is widely distributed and found in all kingdoms from archaea to mammals including humans. Their known substrates include aldehydes involved in growth and development, differentiation, oxidative stress, osmoregulation, neurotransmission, and detoxification of dietary and environmental aldehydes (Marchitti, Brocker et al. 2008; Brocker, Vasiliou et al. 2013). ALDH proteins usually comprise ≈500 amino acids, although some multifunctional or multi-domain members may be larger (Jackson, Brocker et al. 2011). They are typically dimers or tetramers and consist of three domains: a substrate-binding (catalytic) domain, a cofactor (NAD(P)+) binding domain, and a dimerization/tetramerization domain (Steinmetz, Xie et al. 1997). In addition to their catalytic roles, ALDHs have been shown to possess non-enzymatic roles. The non-enzymatic functions of ALDHs include scavenging of hydroxyl radicals by cysteine sulfhydryl groups (e.g. ALDH3A1), small molecule binding (e.g. binding of T3, daunorubicin, and androgens by ALDH1A1), function as lens crystallins (e.g. various ALDH1, 2, and 3 family members), and protein-protein interactions (e.g. ALDH16A1). Further, in a mechanism reminiscent of regulation of active enzymes by dead enzymes, ALDH2*2 dominantly inactivates wild-type ALDH2 subunits in hetero-tetramers (Table 5.1).

However, other than ALDH16A1 in mammals, reptiles and fish, and certain ALDH lens crystallins, no other ALDH dead enzymes have been described. I hypothesize that the ALDH superfamily has numerous other non-catalytic members that have not yet been discovered. To investigate this, a full protein database (UniProt) was mined for dead ALDHs.

Table 5.1. Non-enzymatic functions of ALDHs

ALDH                    Function
ALDH3A1                 Scavenging of hydroxyl radicals by CYS sulfhydryl groups
ALDH1A1                 Small molecule binding (e.g. T3, daunorubicin, androgen)
ALDH1, 2, 3 families    Lens crystallins
ALDH16A1                Protein-protein interaction
ALDH2*2                 Regulation of active ALDH subunit by inactive ALDH2 subunit

Discovering New ALDH Dead Enzymes

There are a number of residues in both the catalytic and cofactor binding sites that have been shown to be absolutely required for ALDH activity. One of these residues is the primary catalytic cysteine (CYS319 in ALDH2). Other residues that are either invariant or highly conserved include GLU285 and ASN186 (catalysis), GLY262 and GLY267 (Rossmann fold), and LYS209, GLU416 and PHE418 (cofactor binding).

Although there are countless ways in which an ALDH enzyme might become catalytically inactive, for screening purposes, catalytically inactive is defined here as lacking a homologous catalytic cysteine in an alignment with a model ALDH (human ALDH2). To find records with an ALDH domain, every record in the UniProt database (2015) was scanned against an ALDH profile (http://pfam.sanger.ac.uk/family/aldedh) from the Pfam 27.0 build (Finn, Bateman et al. 2014) using the profile hidden Markov model tool HMMER (Finn, Clements et al. 2011). Positive ALDH sequences were aligned individually using T-COFFEE (Notredame, Higgins et al. 2000) against a model ALDH (human ALDH2, UniProt ID: P05091) and classified as catalytically active or inactive at that site. Multiple records of the same gene and multiple isoforms were combined.
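The screening criterion above can be illustrated with a minimal Python sketch. It assumes that each candidate has already been aligned pairwise against human ALDH2 and that both rows of the alignment are available as gapped strings; the function names and the toy sequences in the usage note are hypothetical and are not taken from the original analysis pipeline.

```python
def residue_at_reference_position(ref_aln: str, query_aln: str, ref_pos: int) -> str:
    """Return the query residue aligned to 1-based position ref_pos of the reference.

    ref_aln and query_aln are the two rows of one pairwise alignment:
    equal length, with '-' marking gaps.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    seen = 0  # number of ungapped reference residues passed so far
    for ref_char, query_char in zip(ref_aln, query_aln):
        if ref_char != "-":
            seen += 1
            if seen == ref_pos:
                return query_char
    raise ValueError("reference sequence is shorter than ref_pos")


def is_dead_candidate(ref_aln: str, query_aln: str, cys_pos: int = 319) -> bool:
    """True if the query lacks a cysteine at the reference catalytic position.

    A gap ('-') at that position counts as a deletion and therefore as inactive.
    """
    return residue_at_reference_position(ref_aln, query_aln, cys_pos) != "C"
```

With a toy alignment, `is_dead_candidate("MKCLT", "MK-LT", cys_pos=3)` flags a deletion of the cysteine, whereas `is_dead_candidate("MK-CLT", "MKACLT", cys_pos=3)` does not flag the record, because the insertion in the query shifts the column but not the reference numbering.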

To combine records into groups of homologous genes, this final list of non-catalytic ALDH records was aligned using T-COFFEE, followed by neighbor-joining phylogenetic analysis with 8000 bootstrap replicates in PHYLIP (http://evolution.genetics.washington.edu/phylip.html). Gene identification was approximated by comparing the record against the full UniProt database using HMMER and retrieving the closest named ALDH relative. In addition, to assist with classification and identification of genes, members of known ALDH families were included in the analysis as positive controls. Phylogenetic classification for each record was obtained through UniProt. Groups were combined by several factors including 1) consensus-tree bootstrap values, 2) contiguously or similarly named groups, and 3) phylogenetically contiguous groups.

A summary of ALDH dead enzyme groups can be found in Table 5.2, and a full list of records found is presented as Table 5.3. One hundred and eighty-two unique records were found which met the conditions specified above. These were divided into 19 groups. Both known families of ALDH dead enzymes are represented: ALDH16A1 (Group 1) and Ω-crystallins (Group 19). ALDH dead enzyme records were found in Archaea, Bacteria, and Eukaryota (Figure 5.1). Newly discovered groups were especially prevalent in bacteria and fungi, and many of these records represent groups whose function is unstudied or poorly understood. Records were also discovered in vertebrates, but a majority of these are ALDH16A1 (Group 1), which has been described previously. Several dead enzyme records each were found from other groups including archaea (5), alveolata (2), oomycetes (2), and viridiplantae (7).

Figure 5.2 presents a representative AA alignment (Group 19), showing the location of key residues required for ALDH catalysis. The distribution of the most common inactivating mutations in these residues for each group is given in Table 5.3. In 6 of the groups, CYS319 was primarily deleted, and in another 8 of the groups various mutations were found (Table 5.4). In only 5 of the groups was a single amino acid replacement dominant. Deletions were far less common in the rest of the residues surveyed. The residues of the Rossmann fold were highly conserved, with 19/19 groups retaining GLY262 and 14/19 groups retaining GLY267. The residues of the NAD binding pocket were moderately conserved, with 15/19 groups retaining LYS209, 13/19 groups retaining GLU416 and 11/19 groups retaining PHE418. The remaining catalytic residues were the least well conserved, with 9/19 groups retaining ASN186 and 10/19 groups retaining GLU285.
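The per-group labels used in this summary (a dominant replacement, a deletion, or various mutations) can be sketched as a simple tally. The following is an illustrative Python sketch only: it assumes that "majority" means more than 50% of the records in a group, which the text does not state explicitly, and the function name and input encoding ('-' for a deletion) are hypothetical.

```python
from collections import Counter


def summarize_position(residues):
    """Summarize one key alignment position across the records of one group.

    residues: one-letter amino acid codes observed at the position,
              with '-' standing for a deletion.
    Returns (label, percent), where label is 'del' for a majority deletion,
    the one-letter code for a majority residue, or 'var' when no single
    state exceeds 50% (assumed threshold). percent refers to the most
    common state in every case.
    """
    if not residues:
        raise ValueError("group has no records")
    state, n = Counter(residues).most_common(1)[0]
    percent = 100.0 * n / len(residues)
    if percent <= 50.0:
        return "var", percent
    return ("del" if state == "-" else state), percent
```

For example, `summarize_position(list("---S"))` gives `("del", 75.0)`, while four records carrying four different states are labeled `"var"`. For positions other than the catalytic cysteine, a returned label equal to the ALDH2 residue means the group retains that residue.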

Discussion

Three examples of the non-catalytic roles have been shown to exist in the ALDH superfamily: 1) small-molecule binding, 2) structural, optical, and oxidative stress sinks, and 3) protein-protein interactions. Each one of these may play a role in dead enzymes.

Dead enzymes often retain their cellular localization and substrate / cofactor binding potential, even if they lose catalytic ability, so they are well positioned for regulatory roles (Leslie 2013). Even lacking catalytic activity, many enzymes may retain their substrate and/or cofactor binding regions. This may allow them to sequester substrates or bind alternate molecules, as in the case of Xenopus ALDH1A1 binding T3. Further, they may act as structural or optical elements, as in the case of ALDH crystallins. They may simply act as sinks for oxidative stress (as seen for ALDH crystallins). Finally, they may take on completely new roles in protein-protein binding. ALDH16A1’s conserved yet catalytically-inactive structure is likely to act via interactions with other proteins. In addition, the possibility of regulation of one ALDH subunit by other related ALDH members in heterodimers and heterotetramers is explored. This work represents the novel discovery of a number of groups of genes that were previously unknown to contain non-catalytic members, and the functions have not been described for many of these members. However, the presence of multiple, closely-related records and groups representing contiguous phylogenetic groups is highly suggestive of conservative pressure to keep these dead enzymes and thus suggestive of a physiological role for these proteins. Given the increased information and interest in the role of dead enzymes, there is no doubt that their role in this key enzyme superfamily will grow as well.

Future studies of these genes should focus first on validation of these dead enzymes by sequencing genes in these individuals and other, closely-related individuals to confirm these results and determine the distribution of the non-catalytic form of these enzymes. Second, protein expression experiments should be carried out to 1) confirm protein expression in vivo and 2) determine the functions of these new, noncatalytic enzyme groups.

Archaea 5
Bacteria 79
    Acidobacteria 1
    Actinobacteria 47
    Bacteroidetes 3
    Cyanobacteria 3
    Proteobacteria 23
    Tenericutes 2
Eukaryota 98
    Alveolata 2
    Fungi 29
        Ascomycota 23
        Basidiomycota 6
    Metazoa 58
        Chordata 45
        Arthropoda 11
        Mollusca 2
    Oomycetes 2
    Viridiplantae 7
        Chlorophyta 3
        Streptophyta 4

Figure 5.1. The phylogenetic distribution of ALDH dead enzyme records found showing the name of major groups and the number of records in that group. On the left are more inclusive groups, in the center are intermediately inclusive groups and on the right are less inclusive groups. Groups that are connected left to right indicate that they are subgroups of the group on the left.

Figure 5.2. Representative alignment showing the key residues required for enzymatic activity in Group 19 (Ω-Crystallins). Included are the most closely related human catalytic ALDHs for reference (i.e., ALDH1A1, 1A2, 1A3, and 2).

Table 5.2. Summary of groups of ALDH dead enzyme records.

Table 5.3. Full list of ALDH dead enzyme records.

Table 5.4. Summary of mutations of key residues for ALDH dead enzyme groups. The top labels are the residue / AA number in ALDH2 (e.g., ASN186) and the percent (%) that is either mutated to a new residue (e.g., THR) or deleted (del). If no majority is found then various (var) is indicated.


CHAPTER VI

HUMAN ALDH1B1 POLYMORPHISMS MAY AFFECT THE METABOLISM

OF ACETALDEHYDE AND ALL-TRANS RETINALDEHYDE – IN VITRO

STUDIES AND COMPUTATIONAL MODELING

Summary

Additional substrate specificities of ALDH1B1 are elucidated and the effect that human ALDH1B1 polymorphisms have on substrate specificity is determined. Computational-based molecular modeling was used to predict the binding of the substrates propionaldehyde, 4-hydroxynonenal, nitroglycerin, and all-trans retinaldehyde to ALDH1B1. Based on positive in silico results, the capacity of purified human recombinant ALDH1B1 to metabolize nitroglycerin and all-trans retinaldehyde was explored. Additionally, metabolism of 4-HNE by ALDH1B1 was revisited. Databases queried to find human polymorphisms of ALDH1B1 identified three major variants: ALDH1B1*2 (A86V), ALDH1B1*3 (L107R), and ALDH1B1*5 (M253V). Computational modeling was used to predict the binding of substrates and of cofactor (NAD+) to the variants. These human polymorphisms were created and expressed in a bacterial system and specific activity was determined. It was found that ALDH1B1 metabolizes (and appears to be inhibited by) nitroglycerin and has favorable kinetics for the metabolism of all-trans retinaldehyde. ALDH1B1 metabolizes 4-HNE with higher apparent affinity than previously described, but with low turnover. Recombinant ALDH1B1*2 is catalytically inactive, whereas both ALDH1B1*3 and ALDH1B1*5 are catalytically active. Modeling indicated that the lack of activity in ALDH1B1*2 is likely due to poor NAD+ binding. Modeling also suggests that ALDH1B1*3 may be less able to metabolize all-trans retinaldehyde and that ALDH1B1*5 may bind NAD+ poorly. We found that ALDH1B1 metabolizes nitroglycerin and all-trans retinaldehyde. One of the three human polymorphisms, ALDH1B1*2, is catalytically inactive, likely due to poor NAD+ binding. Expression of this variant may affect ALDH1B1-dependent metabolic functions in stem cells and ethanol metabolism.

Introduction

The aldehyde dehydrogenase (ALDH) superfamily is a group of enzymes responsible for the metabolism of a diversity of exogenous and endogenous aldehydes, ranging from the developmentally crucial retinaldehyde to acetaldehyde, a major toxic byproduct of ethanol consumption (Marchitti, Brocker et al. 2008). ALDH1B1 is expressed at high levels in the liver, intestinal tract, testes and, to a lesser extent, heart and lung (Hsu and Chang 1991; Stagos, Chen et al. 2010; Stagos, Chen et al. 2010). ALDH1B1 shares 72 percent amino acid identity with ALDH2, and 64 percent amino acid identity with ALDH1A1. ALDH2 has been shown to possess three types of catalytic activity, namely aldehyde dehydrogenase, esterase, and nitroglycerin reductase (Daiber, Wenzel et al. 2009). ALDH1B1 has previously been reported to catalyze two of these activities: aldehyde dehydrogenase and esterase (Stagos, Chen et al. 2010). It is not known whether ALDH1B1 has nitroglycerin reductase activity. The Vasiliou lab has characterized the substrate preference of ALDH1B1, which includes acetaldehyde, benzaldehyde, and p-nitrophenyl acetate, as well as longer saturated aldehydes (Stagos, Chen et al. 2010). Unlike ALDH2, ALDH1B1 metabolizes both MDA and 4-HNE very poorly (Stagos, Chen et al. 2010), making it unlikely that ALDH1B1 plays a large role in detoxifying these products of LPO.

Retinoic acid signaling plays a role in the development and homeostasis of many human tissues (Theodosiou, Laudet et al. 2010). The oxidation of retinaldehyde to the biologically active retinoic acid represents another important ALDH1 family function. There is some evidence that ALDH1B1 plays a role in growth and development and the most likely mechanism for this is via retinaldehyde metabolism. A role for ALDH1B1 in granulocytic development of human hematopoietic stem cells has been proposed through a mechanism involving retinoic acid signaling (Luo, Wang et al. 2007). In addition, ALDH1B1 has been shown to be a stem cell or progenitor marker in the development of the pancreas (Ioannou, Serafimidis et al. 2013).

ALDH2 plays an important role in the metabolic activation of nitroglycerin (Chen, Foster et al. 2005), an anti-anginal drug that has been used for more than a century. Individuals lacking ALDH2 activity (e.g., those possessing the ALDH2*2 polymorphism) retain some responsiveness to nitroglycerin, suggesting the existence of alternate, ALDH2-independent pathways of activation (Zhang, Chen et al. 2007). Other ALDHs including ALDH1A1 and ALDH3A1 metabolize nitroglycerin to some extent as well (Tsou, Page et al. 2011; Lin, Page et al. 2013). Nitroglycerin acts as a potent inhibitor of the enzymes that metabolize it, and through such action, nitroglycerin inhibits its own bioactivation (leading to its diminished efficacy with continued administration, a process called tolerance) and can cause ALDH2 dysfunction (Beretta, Sottler et al. 2008). It is therefore worth asking whether ALDH1B1 metabolizes nitroglycerin (and is thus inhibited by it as well).

Given the range of substrates metabolized by ALDH1B1, it is important to understand mutations that could affect its activity. Early studies found ALDH1B1 to be polymorphic (Hsu and Chang 1991; Sherman, Dave et al. 1993). A search of current databases revealed three polymorphisms which are non-synonymous and present at a frequency of at least 1%: ALDH1B1*2 (Ala86Val), ALDH1B1*3 (Leu107Arg), and ALDH1B1*5 (Met253Val) (Table 6.1). ALDH1B1*4 has been named, but is a silent (synonymous) mutation. ALDH polymorphisms have previously been shown to cause significant pathophysiological effects. For example, a polymorphism of ALDH2, ALDH2*2, causes marked reductions in acetaldehyde metabolism and consequent flushing syndrome and ethanol avoidance in hetero- and homozygotes. Two large epidemiological studies have examined associations between polymorphisms in several ALDHs (including ALDH1B1) and factors relating to alcohol metabolism, including weekly alcohol intake and ethanol hypersensitivity reactions (e.g. itchy runny nose, sneezing, shortness of breath, rash, itching or swelling). Individuals with ALDH1B1*2 exhibited increased non-drinking behaviors (average of less than one drink per week), as well as an increase in the number of ethanol hypersensitivity reactions (Husemoen, Fenger et al. 2008; Linneberg, Gonzalez-Quintela et al.). Both of these responses are consistent with increased acetaldehyde toxicity, which in turn is consistent with poorer metabolism of acetaldehyde by ALDH1B1. ALDH1B1*3 polymorphisms did not correlate with any change in epidemiological parameters.


Given the number of known and proposed roles for ALDH1B1 in vitro and in vivo, and the effects that have been shown in population association studies, it is important to understand the substrate specificity of ALDH1B1 and the impact polymorphisms have on the function of ALDH1B1. Computational modeling of the binding of substrates to ALDH1B1 can be used to predict the substrate specificity of ALDH1B1 and the impact mutations may have. This modeling is facilitated and underpinned by an understanding of the well-studied and highly conserved catalytic mechanism of the aldehyde dehydrogenase activity of the ALDH superfamily, which involves an initial binding state where the carbonyl carbon of the substrate is stabilized by bonds to a catalytic cysteine (CYS319 in ALDH1B1) and the side chain amide nitrogen of an asparagine (ASN186 in ALDH1B1) (Liu, Sun et al. 1997; Steinmetz, Xie et al. 1997).

I hypothesize that 1) ALDH1B1 has additional substrates that define divergent roles from ALDH2, especially as it relates to growth and development, and that 2) ALDH1B1 has human mutants that may affect metabolism of ALDH substrates. To investigate these hypotheses, computational methods were first used to investigate the binding of known and previously unreported substrates to ALDH1B1. Using recombinant human ALDH1B1, the enzyme kinetics of two additional substrates of ALDH1B1 were characterized: nitroglycerin and all-trans retinaldehyde. Based on the results of the computational docking, the enzyme kinetics of another substrate, 4-HNE, were revisited. In addition, computational models of ALDH1B1 and its polymorphic variants were created, and known substrates were docked against them in order to: 1) provide a physicochemical basis for observed epidemiological differences, and 2) predict differences in substrate specificities that might arise from polymorphic variants. ALDH1B1 and its polymorphic variants were expressed in vitro in order to confirm the results of in silico docking studies. Finally, the computational models of ALDH1B1 variants were investigated to provide a mechanism for the results seen in vitro.

Methods

Computational modeling: Modeling was performed for a total of six proteins: ALDH1A1, ALDH2, ALDH1B1 and the variants ALDH1B1*2, ALDH1B1*3 and ALDH1B1*5. Wild-type ALDH1B1 is sometimes referred to as ALDH1B1*1 where it might otherwise be confused with one of its variants. ALDH1A1 and ALDH2 were included in experiments as positive controls since they are established metabolizers of all-trans retinaldehyde (ALDH1A1), acetaldehyde (ALDH2), 4-HNE (ALDH2) and nitroglycerin (ALDH2). Crystal structures were downloaded from the Protein Data Bank (Berman, Westbrook et al. 2000). The B subunit of human ALDH2 (PDB ID: 1O01; Perez-Miller and Hurley 2003) was used directly for docking. Homology models were created for human ALDH1B1 (from ALDH2, PDB ID: 1O01) and for human ALDH1A1 (from sheep ALDH1A1, PDB ID: 1BXS; Moore, Baker et al. 1998). Once mitochondrial leader sequences are removed, ALDH1B1 is the same length as ALDH2 (i.e., 517 AA), and these sequences align with no gaps, making ALDH2 an ideal template for creating a homology model of ALDH1B1. An alignment of ALDH1B1 and ALDH2 is provided in Figure 6.1. ALDH1B1*2, ALDH1B1*3 and ALDH1B1*5 were created as mutations of the ALDH1B1 homology model. Homology models were created in MODELLER 9.12 (Webb and Sali 2014). One hundred models were created using random seeds and the best model was picked by DOPE score. The best model was minimized using NAMD 2.9 (Phillips, Braun et al. 2005). Briefly, the protein was solvated with explicit water (TIP3P) molecules (30 Å minimum padding in each direction) and 20 mM MgCl2 was added as a buffer. All molecules were typed with the CHARMM force field (CHARMM22 for proteins and CGenFF for small molecules). Energy minimization was calculated using periodic boundary conditions until the average step size was less than 0.001 kcal (approximately 50,000 steps). Minimization was performed twice, first with the protein held rigid to minimize the solvent only, and then with the entire system allowed to move.

Figure 6.1. Alignment of ALDH1B1 and ALDH2. An alignment was generated by T-COFFEE [PMID: 10964570]. Amino acids are classified as identical (*), highly similar (:), somewhat similar (.), or dissimilar ( ). Secondary structures are labeled by homology with ALDH2 with beta pleated sheets tinted blue and alpha helices tinted green. ALDH1B1 polymorphic variants are indicated by arrows and labeled as *2, *3, and *5.

Substrates were prepared using MGLTools (v1.5.6) and then docked into homology models using AutoDock Vina (v1.1.2) (Trott and Olson 2010) 100 times using random seeds. Ligands were treated as flexible, but isomerization was not allowed, e.g. from all-trans retinaldehyde to 9- or 13-cis retinaldehyde. Hydrogen bond lengths and angles were measured between substrates and critical amino acids (with a cutoff of 3.5 Å) using the BINANA Python script (Durrant and McCammon 2011), with minor modifications to output values that were calculated internally but not reported in the published script. Among the multiple poses found for each protein-substrate interaction, the best pose was selected from those poses that made two critical interactions: hydrogen-bonding of the side-chain amide nitrogen of asparagine (ASN186) and the peptide nitrogen of the catalytic cysteine (CYS319) to the carbonyl oxygen of the substrate. Where multiple poses were found that met this requirement, the pose that had the minimum hydrogen bond distances and maximum AHD and HAY bond angles for these interactions was chosen. The best poses (≈10) for each substrate/protein interaction were chosen and subjected to energy minimization as described above. The highest ranked final pose was selected from the minimized poses using the same criteria described above.
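The pose-selection rule described above can be sketched in Python as a filter followed by a ranking. This is a minimal illustration under stated assumptions, not the script used in this work: the `Pose` record, its field names, and the way distances and angles are combined into one ranking (sum of the two distances first, then sum of the angles as a tie-breaker, with a single angle per hydrogen bond) are all hypothetical, since the text does not specify how the criteria were weighted.

```python
from dataclasses import dataclass
from typing import List, Optional

CUTOFF = 3.5  # hydrogen-bond distance cutoff in angstroms, as given in the text


@dataclass
class Pose:
    name: str
    asn_dist: Optional[float]  # ASN186 side-chain N to carbonyl O; None if no H-bond
    cys_dist: Optional[float]  # CYS319 peptide N to carbonyl O; None if no H-bond
    asn_angle: float = 0.0     # H-bond angle in degrees (simplified to one per bond)
    cys_angle: float = 0.0


def makes_both_interactions(pose: Pose) -> bool:
    """A pose qualifies only if both critical H-bonds are within the cutoff."""
    return all(d is not None and d <= CUTOFF for d in (pose.asn_dist, pose.cys_dist))


def best_pose(poses: List[Pose]) -> Optional[Pose]:
    """Pick the qualifying pose with the shortest H-bonds, then the widest angles."""
    candidates = [p for p in poses if makes_both_interactions(p)]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda p: (p.asn_dist + p.cys_dist, -(p.asn_angle + p.cys_angle)),
    )
```

A pose that makes only one of the two interactions is never returned, however short that one hydrogen bond is; this mirrors the requirement that both contacts to the carbonyl oxygen be present.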

Interaction energy (the sum of pairwise Van der Waals and electrostatic energy between the ligand and protein) was calculated for each final minimized docking pose using NAMD.
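As an illustration of what this quantity sums, the sketch below computes a pairwise van der Waals plus electrostatic energy in Python using the CHARMM functional forms (Lennard-Jones with Rmin and well depth, Coulomb with the conversion constant 332.0636 kcal·Å/(mol·e²)). It is a simplified sketch and not the NAMD calculation itself: it ignores cutoffs, switching functions, and periodic images, whose settings are not stated in the text, and the atom dictionary layout is a hypothetical convenience.

```python
import math

COULOMB = 332.0636  # kcal*angstrom/(mol*e^2), CHARMM electrostatic conversion factor


def pair_energy(a, b):
    """Return (vdw, elec) in kcal/mol for two atoms.

    Each atom is a dict with keys:
      xyz       -- (x, y, z) coordinates in angstroms
      charge    -- partial charge in units of e
      eps       -- Lennard-Jones well depth in kcal/mol, given as a positive number
      rmin_half -- Rmin/2 in angstroms
    """
    r = math.dist(a["xyz"], b["xyz"])
    eps = math.sqrt(a["eps"] * b["eps"])   # geometric mean of the well depths
    rmin = a["rmin_half"] + b["rmin_half"]  # arithmetic combination of radii
    vdw = eps * ((rmin / r) ** 12 - 2.0 * (rmin / r) ** 6)
    elec = COULOMB * a["charge"] * b["charge"] / r
    return vdw, elec


def interaction_energy(ligand_atoms, protein_atoms):
    """Sum vdW and electrostatic energy over all ligand-protein atom pairs."""
    total = 0.0
    for a in ligand_atoms:
        for b in protein_atoms:
            vdw, elec = pair_energy(a, b)
            total += vdw + elec
    return total
```

More negative totals indicate more favorable ligand-protein contacts. Because only pairs that span the ligand and the protein are counted, internal strain of either partner does not enter the value.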

The cofactor NAD+ was also docked into ALDH1B1 and variant homology models as described above. Hydrogen bonds between cofactor (NAD+) and ALDH protein were measured up to 3.5 Å in length. The best pose was selected based on interactions and similarity to positioning reported in the literature for other ALDHs (Steinmetz, Xie et al. 1997). As a crude measure of the position of NAD+ relative to the substrate, the distance between the carbonyl oxygen of the docked pose of propionaldehyde and the center of the nicotinamide ring of the docked pose for NAD+ was measured. As a measure of hydrogen bond conservation, the number of amino acids making hydrogen bonds to NAD+ was counted as described previously for ALDH2 by Steinmetz and colleagues (Steinmetz, Xie et al. 1997).
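The substrate-to-cofactor distance described above reduces to a centroid calculation. The following Python sketch assumes that the "center" of the nicotinamide ring is the unweighted mean of its ring-atom coordinates, which is an assumption rather than something the text specifies; the function names are hypothetical.

```python
import math


def centroid(points):
    """Unweighted geometric center of a list of (x, y, z) coordinates."""
    n = len(points)
    if n == 0:
        raise ValueError("no coordinates given")
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def ring_to_atom_distance(ring_atoms, atom):
    """Distance in angstroms from the ring centroid to a single atom.

    ring_atoms -- coordinates of the six nicotinamide ring atoms
    atom       -- coordinates of the substrate carbonyl oxygen
    """
    return math.dist(centroid(ring_atoms), atom)
```

Comparing this distance between wild-type and variant models gives a single number per model, which is why it is only a crude indicator of cofactor positioning.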

The root mean square distance (RMSD) between the α-carbon of amino acid residues of wild-type vs. variant proteins was measured for individual amino acids, for each element of secondary structure, and for the entire protein using MODELLER 9.12.

Overlays of wild-type vs. variant proteins were created using structural alignments performed by Discovery Studio Visualizer (Accelrys, San Diego, CA).
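For reference, the α-carbon RMSD used here can be written in a few lines of Python. This sketch assumes the two structures are already superposed and that residues are paired one-to-one by index, which holds for point variants of the same length; it does not reproduce MODELLER's own superposition step, and the function name is hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root mean square distance between paired alpha-carbon coordinates.

    coords_a, coords_b -- equal-length lists of (x, y, z) tuples in angstroms,
                          one entry per residue, in the same residue order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))
```

The same function covers all three levels of comparison mentioned above: a single pair of coordinates gives the per-residue distance, a slice covering one helix or strand gives the value for that secondary structure element, and the full lists give the whole-protein value.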

ALDH substrate metabolism in vitro: Recombinant human wild-type ALDH1A1, ALDH1B1, and ALDH2 proteins were expressed in SF9 cells and purified by FPLC as described previously (Stagos, Chen et al. 2010). ALDH1B1 and ALDH2 were activated by incubation in 50 mM β-mercaptoethanol for 1 h in quantities to yield 1 mM β-mercaptoethanol in the final reaction. ALDH1A1 does not require pre-activation, so β-mercaptoethanol was added with reaction buffers to achieve a final concentration of 1 mM.

To monitor the metabolism of nitroglycerin over time, 20 μM nitroglycerin (5 mg/mL, ethanol 30% v/v, propylene glycol 30% v/v; American Reagent Inc., Shirley, NY) was added to 25 µg ALDH protein in a buffer containing 1 mM NAD+, 1 mM glutathione and 1 mM dithiothreitol. Reactions were performed in triplicate. Aliquots of reaction mixtures were taken at 0, 1, 2, 5, 10, 30, 60, and 120 min. Reactions in aliquots were quenched by adding ice-cold 50:50 acetonitrile / water, centrifuged at 15,000 RPM for 15 min in a microcentrifuge, and then analyzed by ultra performance liquid chromatography (UPLC). Negative controls (buffered system with no ALDH protein) were included. UPLC analysis (Acquity UPLC, Waters, Milford, MA) of nitroglycerin metabolites was performed using a BEH C18 column (1.7 µm). The reverse-phase UPLC method consisted of a linear gradient of 0% to 95% B using the solvents A: 100% acetonitrile and B: 95% water / 5% acetonitrile. Quantitation was performed by comparing peak areas against standard curves of nitroglycerin, 1,2-DNG and 1,3-DNG.

To determine kinetic parameters of all-trans retinaldehyde metabolism, 2.2 µg of activated ALDH protein (or no-enzyme controls) was added to a solution of all-trans retinaldehyde (3.9 - 62.5 µM) in a sodium-pyrophosphate buffer containing 1 mM NAD+ and 1 mM pyrazole (N=7). These experiments were performed under minimal light conditions. After incubation for 30 min, reactions were quenched by adding ice-cold 50/50 acetonitrile / 1-butanol containing the internal standard retinyl acetate, and then extracted and analyzed as described previously (McClean, Ruddel et al. 1982). Samples were analyzed by UPLC using a mobile phase of 79.5% acetonitrile, 0.5% acetic acid, and 20% water. Quantitation was performed by comparing peak areas against standard curves of retinoic acid, retinaldehyde and retinyl acetate. Kinetic parameters were calculated using SigmaPlot 12 (Systat Software, Inc., San Jose, CA).
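For readers without access to SigmaPlot, the estimation of Km and Vmax can be sketched as follows. This is not the procedure used in this study (the regression settings are not stated); it assumes simple Michaelis-Menten kinetics, v = Vmax·[S] / (Km + [S]), and uses the Hanes-Woolf linearization ([S]/v plotted against [S]) with ordinary least squares. The substrate concentrations and the "true" parameters below are hypothetical and serve only to show that the fit recovers them from noise-free data.

```python
def michaelis_menten(s, vmax, km):
    """Reaction rate predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def hanes_woolf_fit(substrate, rate):
    """Estimate (Km, Vmax) by regressing [S]/v on [S].

    Hanes-Woolf form: [S]/v = [S]/Vmax + Km/Vmax,
    so slope = 1/Vmax and intercept = Km/Vmax.
    """
    x = list(substrate)
    y = [s / v for s, v in zip(substrate, rate)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return km, vmax


# Hypothetical concentrations (µM) and parameters, for illustration only
concentrations = [3.9, 7.8, 15.6, 31.25, 62.5]
true_km, true_vmax = 25.0, 20.0
rates = [michaelis_menten(s, true_vmax, true_km) for s in concentrations]

km_fit, vmax_fit = hanes_woolf_fit(concentrations, rates)
print(f"Km = {km_fit:.1f} µM, Vmax = {vmax_fit:.1f} nmol/min/mg protein")
```

With real, noisy measurements a direct nonlinear least-squares fit is generally preferred, because linearizations weight the errors unevenly.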

To determine the kinetic properties of 4-HNE metabolism, 10 µg activated ALDH protein was added to a solution of 4-HNE (2-128 µM) in a sodium-pyrophosphate buffer containing 2 mM NAD+ and 1 mM pyrazole (N=4). Production of NADH, used to measure ALDH catalytic activity, was monitored spectrophotofluorometrically at 450 nm (excitation 340 nm; SpectraMax Gemini EM) and normalized to an NADH standard curve. The increase in NADH was linear from 5 min to 30 min after substrate addition.
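The conversion from a fluorescence trace to a rate can be sketched as below. The sketch assumes a linear NADH standard curve (a constant fluorescence signal per nmol NADH) and takes the slope of the linear 5-30 min window; the readings, the standard-curve slope, and the function names are hypothetical, and only the 10 µg protein amount comes from the text.

```python
def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def specific_activity(times_min, fluorescence, rfu_per_nmol, protein_mg):
    """Rate in nmol NADH/min/mg protein from a fluorescence time course."""
    rfu_per_min = least_squares_slope(times_min, fluorescence)
    nmol_per_min = rfu_per_min / rfu_per_nmol  # apply the standard curve
    return nmol_per_min / protein_mg


# Hypothetical readings within the linear window (5 to 30 min)
times = [5, 10, 15, 20, 25, 30]
signal = [110.0, 120.0, 130.0, 140.0, 150.0, 160.0]  # relative fluorescence units

activity = specific_activity(times, signal, rfu_per_nmol=20.0, protein_mg=0.010)
print(f"{activity:.1f} nmol/min/mg protein")
```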

ALDH1B1 polymorphisms: ALDH1B1 polymorphic variants were retrieved from listings in the Uniprot, NCBI’s refSNP, and GeneCards databases (Sherry, Ward et al. 2001; Safran, Dalah et al. 2010; 2015). Three variants were found that were non-synonymous and had a population frequency exceeding 1%. Sequences were downloaded from the Uniprot database. In this study, amino acids are numbered from the beginning of the translated protein sequence, so the numbering includes the 17-amino-acid mitochondrial leader sequence. In some databases, amino acids are numbered without the leader sequence, e.g., ALDH1B1*2 A86V may be listed in some references as A69V. Frequency data were obtained from NCBI’s refSNP database. Variant frequency data by race were obtained from the same database by combining HapMap3 data into the major racial groups African (AFR), Asian (ASN), European (EUR), Indian (IND), and Mexican (MEX).
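Because both numbering schemes appear in the literature, a small helper for converting between them may be useful. The sketch below assumes only what is stated above (a 17-residue leader and single-letter substitution labels such as A86V); the function names are hypothetical.

```python
import re

LEADER_LENGTH = 17  # mitochondrial leader sequence length stated in the text

_VARIANT = re.compile(r"([A-Z])(\d+)([A-Z])")


def to_mature_numbering(variant, leader=LEADER_LENGTH):
    """Convert full-length numbering (e.g. A86V) to leaderless numbering."""
    match = _VARIANT.fullmatch(variant)
    if match is None:
        raise ValueError(f"unrecognized variant label: {variant!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if pos <= leader:
        raise ValueError("position lies within the leader sequence")
    return f"{ref}{pos - leader}{alt}"


def to_full_length_numbering(variant, leader=LEADER_LENGTH):
    """Convert leaderless numbering (e.g. A69V) back to full-length numbering."""
    match = _VARIANT.fullmatch(variant)
    if match is None:
        raise ValueError(f"unrecognized variant label: {variant!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    return f"{ref}{pos + leader}{alt}"


print(to_mature_numbering("A86V"))
print(to_full_length_numbering("A69V"))
```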

Generation of ALDH1B1 variant proteins in vitro: Human ALDH1B1 cDNA (NM_000692.3) was purchased from Origene (Rockville, MD) (Stagos, Chen et al. 2010). To remove the mitochondrial leader sequence from ALDH1B1, Y19 was mutated to MET, creating an NdeI restriction site. The modified cDNA sequence was then cloned into the pET-15b vector using NdeI and BamHI restriction sites. The expressed protein is HIS-tagged and has the modified sequence MGSSHHHHHHSSGLVPRGSHMSSA… where the underlined MET replaces Y19 and begins the native human ALDH1B1 sequence. This modified plasmid was created and generously provided by the laboratory of Dr. Tom Hurley (Indiana University, Indianapolis, IN). The pET-15b hALDH1B1 plasmid was transformed into E. coli BL21-DE3 Tuner cells.

Cells were grown in 6 l batches. Fifteen ml of LB broth (all media were supplemented with 100 µg/ml carbenicillin) was seeded with culture from glycerol stocks and grown overnight at 37° C (all growth periods were performed with shaking). This 15 ml culture was centrifuged to remove media, then resuspended in 90 ml fresh LB broth and grown for 3 h at 37° C. Fifteen ml of this culture was added to each of 4 flasks containing 1.5 l media and then grown at 37° C to an optical density at 600 nm (OD600) of 0.8. Flasks were cooled to 16° C, induced with 0.1 mM IPTG, and grown at 16° C for 24 h.

Cells were harvested by centrifugation at 5,000 RPM for 15 min and frozen overnight at -80° C. Pellets were thawed and resuspended in a lysis buffer (20 mM Hepes pH 8.0, 0.5 M NaCl, 2 mM β-mercaptoethanol) containing protease inhibitors (cOmplete protease inhibitor cocktail; Roche, Indianapolis, IN) and 1 mg/ml lysozyme (from chicken egg white; Sigma-Aldrich, St. Louis, MO) by gentle shaking at room temperature for 30 min. The cell suspension was subjected to 4 freeze-thaw cycles, i.e., complete freezing in liquid nitrogen followed by thawing in a shaking water bath at 37° C. The suspension was drawn through an 18G needle 10x to shear genomic DNA, followed by brief sonication to complete shearing. This solution was ultracentrifuged at 35,000 RPM for 1 h. The cleared lysate was then purified using a Ni-NTA column by applying the lysate to the column and washing with 10 column volumes of lysis buffer containing 10 mM imidazole followed by 5 column volumes of lysis buffer containing 60 mM imidazole. Protein was eluted with 5 column volumes of lysis buffer containing 250 mM imidazole and concentrated / desalted using centrifugal filter units (Amicon Ultra; Sigma-Aldrich, St. Louis, MO) in 10 mM Tris buffer (pH 7.8). Human ALDH1B1 was verified by denaturing gel electrophoresis (SDS-PAGE) followed by either Coomassie Blue staining or immunoblotting with human ALDH1B1 antibodies (data not shown).

Variant plasmids for ALDH1B1*2, ALDH1B1*3 and ALDH1B1*5 were created by Custom DNA Constructs (University Heights, OH) via site-directed mutagenesis and verified by nucleotide sequencing. Variant proteins were expressed and purified as above.

To determine the specific activity of bacterially expressed ALDH1B1 and variants, ≈ 20 µg activated ALDH protein was added to a solution of propionaldehyde (10 mM) in a sodium-pyrophosphate buffer containing 2 mM NAD+ and 1 mM pyrazole (N=4). Production of NADH, used to measure ALDH catalytic activity, was monitored spectrophotofluorometrically at 450 nm (excitation 340 nm; SpectraMax Gemini EM) and normalized to an NADH standard curve. The increase in NADH was linear from 5 min to 30 min after substrate addition.


Results

Molecular modeling of ALDH1B1 substrate binding: As noted, ALDH2 was used as a positive control because it is known to efficiently metabolize propionaldehyde (Km 2.4 µM (Lassen, Estey et al. 2005)), 4-HNE (Km 0.9 µM (Yoval-Sanchez and Rodriguez-Zavala 2012)) and nitroglycerin (Km 11.3 µM (Li, Zhang et al. 2006)). ALDH1A1 was used as a positive control for all-trans retinaldehyde, but is also known to metabolize propionaldehyde (Km 21.0 µM (Wang, Han et al. 2009)), 4-HNE (Km 1.7 µM (Yoval-Sanchez and Rodriguez-Zavala 2012)) and nitroglycerin. ALDH1B1 has previously been shown to metabolize propionaldehyde efficiently (Km 14.0 µM (Stagos, Chen et al. 2010)). Poses containing the two critical hydrogen bonds (ASN, CYS, described above) were identified. Poses without these interactions were scored as non-interacting. Table 6.1 lists the hydrogen bond lengths found (up to 3.5 Å), as well as the calculated interaction energies between the substrate and protein. Apparent binding affinities (Km) are also listed where they have been experimentally determined. Although there is no direct correlation between interaction energies or poses and Km, each of the positive control substrates had Km values in the lower micromolar range, indicative of relatively strong binding. Each of the four substrates bound to ALDH1A1 with appropriate docking poses. Similarly, for ALDH2, poses were found for propionaldehyde, 4-HNE, and nitroglycerin. No appropriate docking pose was found for all-trans retinaldehyde with ALDH2, which was expected since ALDH2 has a much narrower substrate binding pocket than ALDH1A1, making it less likely to accommodate larger substrates. Each of the four substrates correctly bound to ALDH1B1 (ALDH1B1*1). Figure 6.2 shows two-dimensional representations of the binding poses for each substrate with ALDH1B1. Additional three-dimensional representations of these poses are presented in Figure 6.3. Multiple hydrophobic interactions were found for all-trans retinaldehyde, and to a somewhat lesser extent 4-HNE and propionaldehyde. ALDH1B1 has been previously shown to metabolize propionaldehyde, but was reported to have a poor affinity for 4-HNE, which was inconsistent with the appropriate docking poses that were consistently found.

Additionally, good docking poses were found for the untested substrates all-trans retinaldehyde and nitroglycerin. To verify the functional implications of these in silico results, the metabolism of 4-HNE, all-trans retinaldehyde and nitroglycerin by ALDH1B1 was examined in vitro.

Metabolism of all-trans retinaldehyde by ALDH1B1 in vitro: All-trans retinaldehyde is an established substrate for ALDH1A1. In the present study, ALDH1A1 metabolized all-trans retinaldehyde with a Km of 26.8 ± 7.1 μM and a Vmax of 74.2 ± 23.6 nmol/min/mg protein. ALDH1B1 had a similar Km of 24.9 ± 10.7 μM and a lower Vmax of 20.0 ± 7.6 nmol/min/mg protein for all-trans retinaldehyde (Table 6.2).

Metabolism of 4-HNE by ALDH1B1 in vitro: In this study, ALDH1B1 metabolized 4-HNE with a Km of 18.5 ± 4.1 µM and a Vmax of 10.3 ± 0.4 nmol/min/mg protein (Table 6.2).

Metabolism of nitroglycerin by ALDH1B1 in vitro: At a 20 μM substrate concentration, ALDH1B1 metabolized nitroglycerin to 1,2-DNG and 1,3-DNG at rates comparable to ALDH2 (Figure 6.4). Rates of 1,2-DNG production by both enzymes declined sharply after 10 min without depletion of nitroglycerin (data not shown), which is consistent with inhibition of these enzymes by nitroglycerin. Initial rates of catalysis were calculated for both enzymes at 10 minutes. ALDH2 produced 0.20 ± 0.02 nmol 1,2-DNG/min/µg protein and 0.09 ± 0.02 nmol 1,3-DNG/min/µg protein. ALDH1B1 produced 0.16 ± 0.06 nmol 1,2-DNG/min/µg protein and 0.12 ± 0.03 nmol 1,3-DNG/min/µg protein. For both enzymes, rates of 1,2-DNG production were higher than rates of 1,3-DNG production (ratios of 1,2-DNG/1,3-DNG for ALDH2 and ALDH1B1 were 2.3 and 1.4, respectively).

Table 6.1. Computational modeling of interactions between ALDH isozymes and substrates.


Figure 6.2. Representative docking poses for substrates of ALDH1B1. Amino acids of ALDH1B1 that make hydrogen bonds to the substrate are displayed and labeled in green (hydrogen atoms and bond order are not shown). Amino acids that make hydrophobic interactions with the substrate are labeled in black. Key: Carbon – black, Oxygen – red, Nitrogen – blue, Sulfur – yellow. This figure was created in LigPlot+ (v1.4.5).


Figure 6.3. Representative docking poses of aldehyde substrates (shown in stick figures) to ALDH1B1 (ribbon diagram). Key amino acids are shown in ball and stick representation. Hydrogen bonds are indicated by green dashed lines.


Table 6.2. Kinetic values for the metabolism of select substrates by ALDH isozymes.

Substrate                 Protein    Km (µM)        Vmax (nmol/min/mg protein)   Vmax/Km
All-trans retinaldehyde   ALDH1A1    26.8 ± 7.1     74.2 ± 23.6                  2.8
All-trans retinaldehyde   ALDH1B1    24.9 ± 10.7    20.0 ± 7.6                   0.8
4-HNE                     ALDH1B1    18.5 ± 4.1     10.3 ± 0.4                   0.6

Human polymorphisms of ALDH1B1: Three polymorphic variants of ALDH1B1 were found in sequence databases that caused amino acid changes and were present at frequencies of greater than 1%. These include ALDH1B1*2 (A86V – dbSNP: rs2228093), ALDH1B1*3 (L107R – dbSNP: rs2073478), and ALDH1B1*5 (M253V – dbSNP: rs4878199) (Table 6.3). Population frequencies found in the 1000 genomes project and the HapMap3 project, and frequency by race, are provided in Table 6.3. The frequency of mutations varied between the races such that the ALDH1B1*2 variant was the most common in Asian and Mexican populations and the ALDH1B1*3 variant was the most common in African, European and Indian populations. The ALDH1B1*5 variant was the least frequent in the Asian, European, Indian and Mexican populations (Table 6.3). The mutations in all three polymorphic variants are present in the NAD+ binding domain of ALDH1B1 (Figure 6.5a, Figure 6.1). By homology to ALDH2 (Steinmetz, Xie et al. 1997), the ALDH1B1*2 substitution is located in the αA helix, and the amino acid side chain faces inward toward the core of the protein (Figure 6.5b, 6.5c, Figure 6.1). The ALDH1B1*3 substitution is located on the αB helix and faces outward at the surface of the protein (Figure 6.5b, 6.5c, Figure 6.1). The ALDH1B1*5 substitution is located on the loop between the αF helix and β10 sheet and faces outward at the surface of the protein (Figure 6.5b, 6.5c, Figure 6.1). None of these substitutions are at positions that are involved in the monomer-monomer (dimer-forming) or dimer-dimer (tetramer-forming) interfaces. It should be noted that while the ALDH1B1*3 substitution (L107R) does not participate in the interfaces directly, the neighboring ASN106 forms a hydrogen bond across the dimer axis. This residue is part of an α-helix, and thus the residue at position 107 is rotated away from this interface and faces outwards in a tetramer homology model (data not shown); it is therefore unlikely to interact directly across this interface.

Figure 6.4. Metabolism of nitroglycerin by recombinant ALDH2 and ALDH1B1. ALDH2 (closed circles), ALDH1B1 (open circles) or buffer solution containing no ALDH protein (closed triangles) were incubated with 20 µM nitroglycerin in a buffer containing NAD+, glutathione and DTT. The rate of production of 1,2 dinitroglycerin (1,2 DNG) (upper panel) and 1,3 dinitroglycerin (1,3 DNG) (lower panel) after nitroglycerin addition was studied by UPLC analysis and normalized to the amount of ALDH protein. For consistency, the rate of production of each product in the negative controls was divided by 25 µg protein, the same as each experimental group.

Table 6.3. Polymorphisms of human ALDH1B1 (top), and variant frequency by race (bottom).

Computational modeling of ALDH1B1 polymorphisms: The substrates propionaldehyde, 4-HNE, all-trans retinaldehyde, and nitroglycerin were docked into homology models of ALDH1B1*2, ALDH1B1*3, and ALDH1B1*5 as described above. Poses similar to ALDH1B1*1 (wild-type) were found for all polymorphism / substrate combinations, with the single exception that no appropriate docking pose was found for ALDH1B1*3 with all-trans retinaldehyde (Table 6.1). Figure 6.6 shows the homology model of ALDH1B1*3 superimposed upon the docked pose of all-trans retinaldehyde in ALDH1B1*1. The likely reason that no docking poses for all-trans retinaldehyde were found with ALDH1B1*3 is that, in the homology model of ALDH1B1*3, a loop comprising amino acids 472-478 was shifted 2.4 Å towards the substrate binding pocket compared to wild-type, leaving insufficient room for docking of the bulky substrate. However, it should be noted that this loop is part of the dimer-forming interface. In the current study ALDH1B1 is modeled as a monomer, and this allows extra flexibility in loops such as this that would normally be stabilized by interactions between subunits. Thus, without further information, it is likely that this shift represents an artifact of modeling rather than a consequence of the polymorphic variant.

Figure 6.5. Location of polymorphisms of ALDH1B1. The locations of the polymorphic amino acids are shown for ALDH1B1*2 (A86V; light green), ALDH1B1*3 (L107R; light blue) and ALDH1B1*5 (M253V; yellow). a) Homology model of an ALDH1B1 monomer. The protein structure is colored by domain as follows: substrate binding domain (green), cofactor (NAD+) binding domain (blue) and polymerization domain (grey). The asterisk (*) shows the substrate binding tunnel (as shown) which connects to the cofactor binding cavity (if viewed from behind the page). b) Space filling model of the ALDH1B1 protein showing the predicted relative exposure or burial of the mutated amino acids (shown are wild-type residues). c) The position of the polymorphic amino acids relative to secondary structures on the cofactor binding domain. Domains are colored as in (a). Figures were created in Discovery Studio Visualizer.

Given that the human polymorphisms were all located in the cofactor binding domain, NAD+ was also docked against ALDH2, ALDH1B1, and each variant of ALDH1B1. ALDH2 was used as a positive control since it has a known crystal structure with cofactor (NAD+) bound. The binding poses for ALDH1B1 and its polymorphic variants were compared with hydrogen bond interactions reported by Steinmetz and colleagues (Steinmetz, Xie et al. 1997) (Table 6.4). No individual docking experiment in silico was able to reproduce the exact binding pose reported for the crystal structure of ALDH2. However, each of the docking experiments reproduced five of the seven known hydrogen bonding interactions, with the exception of ALDH1B1*2, which reproduced only four. Notably, ALDH1B1*2 was the only protein that did not reproduce any of the three hydrogen bonding interactions nearest the substrate, i.e., LEU286, GLU416 and TRP185.

As a crude measure of the position of NAD+ relative to the substrate, the distance between the carbonyl oxygen of propionaldehyde and the center of the nicotinamide ring of NAD+ was measured. For reference, in the literature, the measured distance for the hydride transfer pose of NAD+ in ALDH2 is 4.7 Å, and the hydrolysis pose of NAD+ has a distance of 7.9 Å (PDB ID: 1O00B and A, respectively, (Perez-Miller and Hurley 2003)). The distances calculated for ALDH1B1 and variants are presented in Table 6.4. The docked poses of ALDH1B1 and ALDH1B1*3 had calculated distances similar to that of the known hydride transfer pose. However, ALDH1B1*2 and ALDH1B1*5 both had distances more than double the distance of ALDH1B1, indicating that the docked pose placed the nicotinamide ring far away from the substrate. Qualitative representations of each cofactor binding pose are provided in Figure 6.7. Most docking poses placed the backbone of NAD+ in the correct orientation (also shown in the hydrogen bond data in Table 6.4), but were differentiated by their placement of the nicotinamide ring and the adenine base. The pose that most resembled known interactions was ALDH1B1*3, which correctly oriented the nicotinamide ring towards the substrate and the adenine base in the cleft between the αF and αG helices (Figure 6.7). ALDH1B1 was positioned similarly but had the adenine base projecting out of the binding cleft towards the exterior of the protein. For ALDH1B1*5, both the nicotinamide ring and the adenine base projected outwards away from their binding clefts. Finally, the binding pose for ALDH1B1*2 was completely unsuitable, and reversed in overall orientation.

The RMSD between Cα atoms was calculated for each residue and overall for each variant protein compared to the wild-type protein (Table 6.5). The overall RMSD for each protein was between 0.72 and 0.97 Å, indicating that major deformations of the proteins due to the mutations are unlikely. This is also supported by similar overall minimization energies for the proteins (Table 6.6). The RMSD for individual secondary structures was also low, similar to that seen between whole proteins (data not shown). When comparing the individual amino acids involved in binding NAD+, ALDH1B1*3 is most similar to ALDH1B1*1, with only one amino acid (SER263) with an RMSD greater than 1 Å, which is also shifted in each of the other variants. Both ALDH1B1*2 and ALDH1B1*5 have shifts in the LEU286 residue compared to wild-type. In both cases, the shift is away from the cofactor, making interaction less likely. ALDH1B1*2 also has a shift in GLU416 towards the cofactor, likely disrupting the binding pocket further. In terms of these binding metrics, an overall binding suitability of ALDH1B1*3 > ALDH1B1*1 >> ALDH1B1*5 > ALDH1B1*2 is proposed. However, it is important to recognize that, due to the multitude of possible interactions and the flexibility of NAD+, multiple configurations are likely to exist in vivo.

Figure 6.6. Comparison of substrate-binding domain in ALDH1B1*1 and ALDH1B1*3. An overlay of ALDH1B1*3 (yellow) over ALDH1B1 wild type (light-blue) with all-trans retinaldehyde bound (dark blue stick representation). The loop comprising amino acids 472-478 was shifted 2.4 Å towards the substrate, blocking that binding position in ALDH1B1*3. This figure was created in Discovery Studio.

Table 6.4. Summary of docking poses for NAD+ binding to ALDH isozymes.

                                 ALDH2     ALDH1B1*1   ALDH1B1*2   ALDH1B1*3   ALDH1B1*5
Interaction Energy (kcal / mol)
  Total                          -201.0    -189.9      -182.5      -210.7      -234.2
  Electrical                     -156.1    -132.5      -129.7      -160.2      -189.8
  Van der Waals                  -44.9     -57.4       -52.8       -50.5       -44.4
Dist to substrate a (Å)          5.4       4.4         16.9        4.1         12.0
H-bonds b                        5(8)      5(5)        4(4)        5(6)        5(6)

H-bonded residues (row entries, in source order):
  LEU286: LEU286, LEU286
  GLU416: GLU416 x3, GLU416, GLU416, GLU416
  TRP185: TRP185, TRP185, TRP185
  SER263: SER263 x2, SER263, SER263 x2, SER263
  GLU212: GLU212, GLU212, GLU212, GLU212, GLU212 x2
  LYS209: LYS209, LYS209, LYS209
  ILE183: ILE183, ILE183, ILE183

a Distance between the carbonyl oxygen of the docked propionaldehyde and the nicotinamide ring of NAD+
b Number of unique hydrogen bonds to each amino acid with total hydrogen bonds in parentheses. Only interactions described for ALDH2 in Steinmetz et al. 1997 are shown. Where multiple interactions to the same amino acid were measured, x2 or x3 is indicated.
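The summary values in Table 6.4 lend themselves to a simple consistency check, sketched below in Python. The numbers are transcribed from the table; the dictionary layout and function name are hypothetical. The sketch confirms that each total interaction energy equals the sum of its electrical and van der Waals components, and expresses each docked NAD+ distance relative to wild-type ALDH1B1.

```python
# Values transcribed from Table 6.4 (energies in kcal/mol, distances in Å)
table_6_4 = {
    "ALDH2":     {"total": -201.0, "electrical": -156.1, "vdw": -44.9, "dist": 5.4},
    "ALDH1B1*1": {"total": -189.9, "electrical": -132.5, "vdw": -57.4, "dist": 4.4},
    "ALDH1B1*2": {"total": -182.5, "electrical": -129.7, "vdw": -52.8, "dist": 16.9},
    "ALDH1B1*3": {"total": -210.7, "electrical": -160.2, "vdw": -50.5, "dist": 4.1},
    "ALDH1B1*5": {"total": -234.2, "electrical": -189.8, "vdw": -44.4, "dist": 12.0},
}


def energy_residual(row):
    """Difference between the summed components and the reported total."""
    return row["electrical"] + row["vdw"] - row["total"]


residuals = {name: energy_residual(row) for name, row in table_6_4.items()}

# Distance of each docked nicotinamide ring relative to wild-type ALDH1B1
wild_type_dist = table_6_4["ALDH1B1*1"]["dist"]
dist_ratio = {name: row["dist"] / wild_type_dist
              for name, row in table_6_4.items()}

for name in table_6_4:
    print(f"{name}: residual {residuals[name]:+.2f} kcal/mol, "
          f"distance ratio {dist_ratio[name]:.1f}")
```

The ratios for ALDH1B1*2 and ALDH1B1*5 both exceed 2, in line with the statement that their distances were more than double that of ALDH1B1.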


Figure 6.7. Docking poses for NAD+ bound to ALDH1B1 and human variants. The elements comprising the cofactor binding cleft are colored blue and other elements of the protein are colored grey. The cofactor is shown in stick representation with the nicotinamide ring of NAD+ highlighted in yellow and the adenine base highlighted green.


Table 6.5. Root mean square deviations (RMSD) between ALDH1B1 variants and wild-type.

Table 6.6. Homology modeling metrics for ALDH1B1 and variants.

Protein              DOPE score   Z score    Minimization Energy
ALDH1B1*2 (A86V)     -58566       -13.5      -393390
ALDH1B1*3 (L107R)    -58137       -13.536    -395944
ALDH1B1*5 (M253V)    -58404       -13.67     -406335
ALDH1B1*1 (WT)       -58968       -13.518    -395741

Recombinant Expression of Human ALDH1B1 and Variants: The yields of ALDH1B1 and its variants were similar, at approximately 75 µg protein/l culture. Attempts to enhance the yield were unsuccessful, as increased protein expression inevitably increased the insoluble fraction of the protein. This is consistent with the suggestion that expression of similar ALDH isozymes in this family is chaperone-dependent (Lee, Kim et al. 2002). ALDH1B1 and all variant proteins appeared as a double band between 55-58 kDa on SDS-PAGE (Figure 6.8). This has been previously observed when ALDH1B1 is expressed in a eukaryotic system (Stagos, Chen et al. 2010). Immunoblotting of ALDH1B1*1 protein using antibodies against human ALDH1B1 was successful and specific (data not shown). The specific activity of ALDH1B1 using propionaldehyde as a substrate and NAD+ as a cofactor under saturating conditions was 1,004 ± 2 nmol/min/mg protein. The specific activities for the variants were 0 nmol/min/mg protein for ALDH1B1*2, 1,048 ± 39 nmol/min/mg protein for ALDH1B1*3, and 962 ± 32 nmol/min/mg protein for ALDH1B1*5. There was no significant difference between the specific activities of ALDH1B1*1, ALDH1B1*3 and ALDH1B1*5 (Figure 6.8).

Figure 6.8. Expression and activity of ALDH1B1 variants. (top) The specific activity of ALDH1B1 polymorphic variants was estimated by measuring NADH production from NAD+ using propionaldehyde as a substrate. No significant difference (P > 0.05, ANOVA) was found between the wild-type (ALDH1B1*1) and ALDH1B1*3 or ALDH1B1*5. Data represent the mean + SE from 3 experiments. (bottom) Coomassie-stained SDS-PAGE of recombinant ALDH1B1 proteins.

Discussion

ALDH1B1 substrate specificity: Computational modeling was used to investigate the substrate specificity of ALDH1B1. In these studies, previously examined (propionaldehyde and 4-HNE) and untested (nitroglycerin and all-trans retinaldehyde) substrates were all found to have favorable docking poses for ALDH1B1 in silico. Based on these results, additional enzyme kinetics studies were performed in vitro to verify the predicted metabolism of all-trans retinaldehyde and nitroglycerin. Moreover, 4-HNE, which was included in the in silico studies as a poor binder based on the previously reported apparent Km of 3,383 µM (Stagos, Chen et al. 2010), made favorable docking interactions in silico, so the kinetics of 4-HNE were revisited as well.


These studies revealed ALDH1B1 to be capable of metabolizing two previously untested substrates, nitroglycerin and all-trans retinaldehyde. Nitroglycerin is metabolized to 1,2-DNG and 1,3-DNG by both ALDH2 and ALDH1B1. A sharp decline in DNG formation occurred after the first 10 minutes for both ALDH isozymes, suggesting that, like ALDH2, ALDH1B1 is subject to rapid inhibition by nitroglycerin (Sydow, Daiber et al. 2004). This has potential therapeutic implications. First, nitroglycerin is bioactivated through metabolism by ALDHs and 1,2-DNG is thought to be the pharmacologically active metabolite (Chen and Stamler 2006). Inactivation of ALDH2 is thought to underlie the diminishing vasodilator activity of nitroglycerin observed with maintained nitroglycerin exposure or therapy (Sydow, Daiber et al. 2004). The present results suggest that this may also apply to ALDH1B1. Second, by inhibiting ALDH2 and ALDH1B1, nitroglycerin treatment may adversely affect other physiological processes reliant upon the catalytic activity of these enzymes, such as the development and differentiation of cells due to retinoic acid signaling and the detoxification of exogenous and endogenous aldehydes. In the present study, 1,2-DNG was formed preferentially compared to 1,3-DNG by both ALDH1B1 and ALDH2, but not to a very large extent. This may indicate that, under the experimental conditions utilized, the enzymes were saturated, which has been shown to reduce product specificity in ALDH2. Although the present results are valuable in providing the first demonstration of the capacity of ALDH1B1 to metabolize nitroglycerin, limitations in the sensitivity of the UPLC methods utilized in the present study prevented the determination of kinetic properties (i.e., Km and Vmax) for the metabolism of nitroglycerin by ALDH1B1. Future studies using LC/MS should be performed to better define the kinetic properties and the ratio of metabolites created by ALDH1B1 at lower, sub-saturating nitroglycerin concentrations. The inactivating ALDH2 polymorphism, ALDH2*2, has been shown to have 7-10 fold lower activity against nitroglycerin than the wild-type ALDH2, similar to the reduced activity seen for this isozyme for aldehyde substrates (Li, Zhang et al. 2006; Beretta, Gorren et al. 2010). Nevertheless, a study in individuals with ALDH2*2 genotypes found that sublingual nitroglycerin retained efficacy in 36.1% of individuals (compared with 81.1% in wild-type ALDH2 individuals) (Zhang, Chen et al. 2007). This suggests the presence of other enzymes capable of catalytically activating nitroglycerin. The results of the present study support the hypothesis that ALDH1B1 may be one such enzyme and may serve as an important contributor to the efficacy of nitroglycerin in vivo.

All-trans retinaldehyde is known to be metabolized by ALDH1A1. The present study found ALDH1A1 to metabolize all-trans retinaldehyde with a Km of 26.8 μM and a Vmax of 74.2 nmol/min/mg protein, a result somewhat higher than, but similar to, previous reports (e.g. Km = 8.1 µM (Bchini, Vasiliou et al. 2013)). Consistent with the computational data, ALDH1B1 showed favorable kinetics for retinaldehyde metabolism in vitro. ALDH1B1 was found to have a similar Km of 24.9 μM but a lower Vmax of 20.0 nmol/min/mg protein than ALDH1A1 for all-trans retinaldehyde. Given the role of retinoic acid signaling in cell development and differentiation, these results, when combined with ALDH1B1 associations with development of hematopoietic stem cells (Luo, Wang et al. 2007) and recent reports that ALDH1B1 may be a marker (Chen, Orlicky et al. 2011), suggest that ALDH1B1 may play a role in development and differentiation. Should this indeed be the case, disruption of such a function by inactivating mutations would be predicted to have physiological and pathophysiological consequences.

As noted, computational analyses in the present study showed 4-HNE to have a favorable docking pose with ALDH1B1. This seems inconsistent with the previous observation that 4-HNE is a poor substrate for ALDH1B1 with an apparent Km of 3,383 µM and a Vmax of 2,043 nmol/min/mg protein (Stagos, Chen et al. 2010). In the present study, ALDH1B1 was shown to metabolize 4-HNE with higher affinity (Km = 18.5 µM) but lower turnover (Vmax = 10.3 nmol/min/mg protein) than in the previous study. In spite of these differing parameter values, our observed Vmax/Km of 0.56 is very similar to that previously reported (0.60) by Stagos and colleagues. The discrepancy is likely due to the longer kinetic runs and the more sensitive measurements (fluorescence vs. visible absorbance spectrometry) used here, which were better able to precisely measure the low activity in vitro. Thus, while the previously reported conclusion that 4-HNE is a poor substrate for ALDH1B1 remains valid, this new information is of particular relevance to studies which attempt to model or generalize the binding of substrates to ALDH1B1.
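The arithmetic behind this comparison can be reproduced with a few lines of Python. The Km and Vmax values are those quoted above; the variable and function names are hypothetical. The sketch shows that Km and Vmax each differ by roughly two orders of magnitude between the two studies, while their ratio is nearly unchanged.

```python
def catalytic_efficiency(vmax, km):
    """Vmax/Km, in (nmol/min/mg protein) per µM."""
    return vmax / km


# 4-HNE with ALDH1B1: Km in µM, Vmax in nmol/min/mg protein
present = {"km": 18.5, "vmax": 10.3}       # present study
previous = {"km": 3383.0, "vmax": 2043.0}  # Stagos, Chen et al. 2010

eff_present = catalytic_efficiency(present["vmax"], present["km"])
eff_previous = catalytic_efficiency(previous["vmax"], previous["km"])

km_fold = previous["km"] / present["km"]        # fold change in Km
vmax_fold = previous["vmax"] / present["vmax"]  # fold change in Vmax

print(f"Vmax/Km: present {eff_present:.2f}, previous {eff_previous:.2f}")
print(f"fold difference: Km {km_fold:.0f}x, Vmax {vmax_fold:.0f}x")
```

Because the two fold differences are of similar size, they largely cancel in the ratio, which is why the catalytic efficiency appears consistent despite the very different individual parameters.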

ALDH1B1 polymorphisms: Three human ALDH1B1 variants were discovered that met the criteria of polymorphism (i.e., > 1% frequency and non-synonymous) at the time of database query. There is currently an explosion of sequence data becoming available as sequencing shifts from “the human genome” to projects like HapMap and the 1000 human genomes project. As more data become available, the “frequency in humans” will change, even assuming even coverage of human populations. It has long been known that genetic polymorphisms often have strong racial biases, and this was evident in ALDH variants as well (Table 6.3). ALDH1B1*2, the inactive variant, is especially prevalent (≈ 40%) in Mexican and Asian populations, and has significant representation (11-27%) in the other racial populations sampled. This is intermediate between ALDH1B1*3, which appears to be widely distributed and even dominant in some populations, possibly because it has little to no effect on enzyme activity, and ALDH1B1*5, which is almost exclusively found in African populations. Recently, another human polymorphism of ALDH1B1 has been reported (V176I - rs113083991), which is also found in the coenzyme binding domain (Way 2014). The prediction software PolyPhen-2, which classifies how likely polymorphisms are to affect protein function, correctly assigns ALDH1B1*2 as probably damaging and ALDH1B1*3 and ALDH1B1*5 as benign (Adzhubei, Jordan et al. 2013). This software also classifies the mutation V176I as benign. Given the large number of mutations being discovered, predictive software and computational modeling will continue to play an important role in screening mutations and prioritizing experimental work, especially in cases where the recombinant protein is either difficult or time-consuming to obtain.

Computational-based molecular modeling of ALDH1B1 and its polymorphic variants: In all cases except one, docking analyses in the present study suggested that ALDH1B1 polymorphic variants would be able to metabolize the same substrates as the wild-type enzyme. The one exception was that no docking pose was found for all-trans retinaldehyde binding to ALDH1B1*3. Computational modeling indicated that, in silico, this was due to a shift in a loop that narrowed the substrate binding pocket. Because all-trans retinaldehyde is the bulkiest substrate in this study, and one likely to be physiologically important, it may be a good substrate with which to test mutations that may narrow the substrate binding cavity. Additional experiments should be carried out to test whether all-trans retinaldehyde metabolism by ALDH1B1*3 is affected in vitro.

Altered cofactor binding plays a role in the changes in catalytic activity of many enzyme variants. An example of this is the ALDH2 polymorphism, ALDH2*2, in which the change in NAD+ binding renders the enzyme catalytically inactive (Larson, Weiner et al. 2005; Larson, Zhou et al. 2007). In the present study, ALDH2 and ALDH1B1 were shown to have conserved cofactor binding modalities, with many shared hydrogen bond interactions, which placed NAD+ in similar positions relative to the substrate, i.e., ≈ 5 Å distance. ALDH1B1*3 had the best NAD+ binding profile, with the most conserved interactions and a location near the substrate, followed closely by ALDH1B1*1. Both of these proteins were also fully active in vitro. ALDH1B1*2 had a poor binding profile, characterized by few conserved interactions and a location far from the substrate. Lack of cofactor binding is the most likely explanation for the complete lack of enzyme activity seen for ALDH1B1*2 in vitro in the present study. ALDH1B1*5 had a relatively poor binding profile: despite a number of favorable conserved hydrogen bond interactions, the best docking pose showed a nicotinamide ring that was not appropriately seated in the binding cleft, leaving it far from the substrate. As in ALDH1B1*2, LEU286 was shifted away from the binding site, decreasing the likelihood of necessary interactions. However, this enzyme was fully active in vitro. There are several possibilities that could explain these apparently disparate findings. First, it is possible that ALDH1B1*5 does, indeed, bind NAD+ more poorly than the wild-type enzyme, but that this had no functional impact on the in vitro experiments because weak binding was overcome by high concentrations of cofactor. Second, the in silico results may simply reflect an artifact or error in the homology model or docking process that would not occur in vivo. Although protein expression was low, making it difficult to perform extensive kinetic studies for each substrate/cofactor, the apparent binding affinity of NAD+ for ALDH1B1*5 should be measured in the future to establish which of these possibilities is occurring.
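The first possibility, that weak cofactor binding is masked by a high cofactor concentration, follows from simple Michaelis-Menten behavior. The sketch below assumes that simple model applies to the cofactor. The Km values and concentrations are hypothetical numbers chosen only to show the shape of the effect; none of them were measured in this study.

```python
def fractional_velocity(cofactor_conc, km):
    """v/Vmax for simple Michaelis-Menten kinetics: [S] / (Km + [S])."""
    if cofactor_conc < 0 or km <= 0:
        raise ValueError("concentration must be >= 0 and Km must be > 0")
    return cofactor_conc / (km + cofactor_conc)


# Hypothetical values in arbitrary but consistent concentration units.
tight_km = 10.0   # stands in for an enzyme that binds cofactor well
weak_km = 100.0   # stands in for a ten-fold weaker apparent affinity

for nad in (10.0, 100.0, 10000.0):
    tight = fractional_velocity(nad, tight_km)
    weak = fractional_velocity(nad, weak_km)
    print(f"[NAD+] = {nad:>8}: tight = {tight:.3f}, weak = {weak:.3f}")
```

At the lowest concentration the two hypothetical enzymes differ several-fold in velocity, whereas at the highest concentration both run at about 99% of Vmax or more. An assay performed only at saturating NAD+ would therefore not distinguish them, which is why a direct measurement of the apparent affinity is proposed.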

The pathophysiological implications of ALDH1B1 mutations remain to be established. Known mutations in other ALDH family members have been shown to play a role in a number of disease states (Marchitti, Brocker et al. 2008). These include: increased risk for certain cancers and myocardial infarction with polymorphisms of ALDH2 (Muto, Hitomi et al. 2000; Yokoyama, Muramatsu et al. 2001; Jo, Kim et al. 2007; Oze, Matsuo et al. 2010); increased risk of spina bifida with polymorphisms of ALDH1A2 (Deak, Dickerson et al. 2005); γ-hydroxybutyric aciduria with polymorphisms of ALDH5A1 (Akaboshi, Hogema et al. 2003); developmental and metabolic abnormalities with polymorphisms of ALDH6A1 (Chambliss, Gray et al. 2000); and Sjögren-Larsson syndrome with polymorphisms of ALDH3A2 (Rizzo and Carney 2005). Finally, a linkage analysis of Finnish families identified two chromosomal regions associated with bipolar disorder: 7q31 and 9p13.1, the latter of which contains ALDH1B1 among other candidate enzymes (Palo, Soronen et al. 2010). Given the proposed roles for ALDH1B1, it would not be surprising if polymorphisms with diminished catalytic activity had significant pathophysiological consequences.

Present in the liver and intestinal tract and possessing a favorable Km, ALDH1B1 is likely to contribute to both first-pass and systemic acetaldehyde detoxification. Preliminary observations show that ALDH1B1 knockout mice clear acetaldehyde more slowly than wild-type mice, providing additional evidence for a role of ALDH1B1 in ethanol metabolism (Singh and Vasiliou, manuscript in preparation). ALDH1B1 has previously been shown to metabolize acetaldehyde, and all ALDH1B1 variants are predicted to be capable of binding acetaldehyde, as reflected in appropriate docking poses (data not shown). However, poor cofactor binding in ALDH1B1*2 may prevent this variant from being catalytically active. This would help explain why ALDH1B1*2 was associated with changes in acetaldehyde toxicity in population association studies and ALDH1B1*3 was not (Husemoen, Fenger et al. 2008; Linneberg, Gonzalez-Quintela et al. 2010).

In addition to the factors discussed here, other interactions that may affect the metabolic activity of ALDH1B1 variants warrant future experimental consideration. As one example, it will be important to know whether inactivating mutations are dominant or recessive. Similar to ALDH2, ALDH1B1 likely forms homotetramers. The ALDH2*2 variant is dominant negative, meaning that ALDH2*1/*2 heterotetramers are inactive and degraded (Crabb, Stewart et al. 1995; Xiao, Weiner et al. 1996). Work by Linneberg and colleagues suggests that this may not be the case for ALDH1B1*2, because the prevalence of ethanol hypersensitivity reactions increased in a trend-wise fashion (15% in ALDH1B1*1/*1, 19% in ALDH1B1*1/*2, and 31% in ALDH1B1*2/*2) (Linneberg, Gonzalez-Quintela et al. 2010). Although genotypes were not statistically tested individually in that study, this single result is not consistent with a dominant negative interaction. Additionally, in the present study, substrate/cofactor interactions have been modeled in monomers. This is a logical initial approach because (i) modeling a full tetrameric protein greatly increases computational cost, and (ii) none of the mutations appears to reside in protein-protein interfaces. While state-of-the-art molecular dynamics software was used, the analyses in this study can be performed on a single modern PC over the course of days to weeks. Modeling of larger systems, such as a tetramer, requires many months of computation and may require more advanced computational resources. These considerations notwithstanding, future modeling should examine the effect that tetramers may have on either (i) restraining the shifts caused by the mutations or (ii) propagating amino acid shifts to dimer or tetramer partners.
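To illustrate why a dominant negative subunit would have a large effect in heterozygotes, the sketch below compares two simplified models of tetramer activity. It assumes equal expression of both alleles and random assortment of subunits into tetramers; these are illustrative assumptions and were not tested in this study, and both function names are hypothetical.

```python
def active_fraction_dominant_negative(wild_type_fraction, subunits=4):
    """Fraction of tetramers that are active if a single variant subunit
    inactivates the whole tetramer (only all-wild-type tetramers work).
    Assumes random assortment of subunits."""
    return wild_type_fraction ** subunits


def active_fraction_independent(wild_type_fraction):
    """Fraction of activity retained if each subunit acts independently,
    so activity is proportional to the wild-type subunit fraction."""
    return wild_type_fraction


# Heterozygote, assuming both alleles are expressed equally.
heterozygote = 0.5
print(active_fraction_dominant_negative(heterozygote))  # 0.0625
print(active_fraction_independent(heterozygote))        # 0.5
```

Under the dominant negative model a heterozygote would retain only about 6% of active tetramers and would be expected to resemble the homozygous variant. The hypersensitivity prevalence reported by Linneberg and colleagues for ALDH1B1*1/*2 (19%) lies much closer to that of ALDH1B1*1/*1 (15%) than to that of ALDH1B1*2/*2 (31%), which is the pattern the text describes as inconsistent with a dominant negative interaction.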

Conclusions: Computational-based molecular modeling studies allow prediction of enzyme catalytic activities and may provide a mechanistic explanation of experimental data. The results of the present study offer a possible physicochemical explanation for the differences in ethanol sensitivity between ALDH1B1*2 and ALDH1B1*3. This work demonstrates that ALDH1B1 metabolizes nitroglycerin and all-trans retinaldehyde. Computational modeling predicts that some ALDH1B1 polymorphic variants will be catalytically inactive due to poor substrate and/or cofactor binding. Clearly, the diminished catalytic activity of the variants may adversely impact physiological processes in which ALDH1B1 has a functional role. As the in vivo functions of ALDH1B1 become more clearly defined, it will be important for investigators to consider the impact polymorphic variants may have in the manifestation of diseases or in variations in the efficacy of therapeutic interventions.

CHAPTER VII

CONCLUSIONS AND FUTURE DIRECTIONS

With ever-increasing sequencing capacity, an unprecedented amount of genetic information is publicly available for analysis. While automated algorithms can usually assign a putative gene to the correct superfamily, careful curation work is often required to assign specific family and group names and numbers correctly. This thesis has greatly expanded that work in the ALDH superfamily, especially in vertebrates. Both ‘bottom-up’ (manual annotation of each of the individual genes in an organism, looking at genetic, protein and percent identity between groups) and ‘top-down’ (placing the gene in broad phylogenetic context to see extent, phylogenetic origins and homology) approaches were used here and proved extremely valuable in understanding the ALDH gene family. First, manual annotation of the full set of ALDHs in a number of individual vertebrate genomes gave a starting point, but raised a number of questions regarding the phylogenetic origin of two key ALDH members, ALDH1B1 and ALDH16A1. Placing the genes in broad phylogenetic context allowed the determination of ALDH1B1 as a retrotransposition of ALDH2, which likely occurred early in vertebrates but has been retained only in mammals and one amphibian. As more amphibians are sequenced, it will be interesting to see the role ALDH1B1 plays in that lineage. Second, this type of analysis was able to identify the non-catalytic form of ALDH16A1 as likely originating in fish but being transferred to an early amniote ancestor. This allows future researchers of ALDH16A1 to focus on the common and divergent roles of ALDH16A1 in these unique and divergent lineages.

This important nomenclature work is much less developed in groups other than plants and animals. For example, bacteria especially lack a unified ALDH nomenclature system, a task that is complicated by high diversity and a number of multidomain / multifunctional proteins. This should be addressed in future studies. In particular, it has become clear that many ALDHs are homologous between even humans and bacteria. It is likely in this case that strict amino acid percent identity requirements (e.g. 60% between subfamilies and 40% between families) should be relaxed to allow naming by homology. It would be much more useful to users of the ALDH nomenclature system if homologs were consistently named across all domains of life, rather than being assigned new family numbers, as it is becoming easier to detect such relationships across broad groups than it was previously.
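The percent identity thresholds mentioned above can be expressed as a simple rule. The sketch below is illustrative only: the function names, the treatment of gapped columns, the handling of values exactly at a threshold, and the toy sequences are assumptions, and real assignments depend on curated alignments and phylogenetic context rather than a single pairwise number.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence
    has a gap ('-'). Inputs must be pre-aligned and of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns (an assumption of this sketch)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def nomenclature_level(identity, family_cutoff=40.0, subfamily_cutoff=60.0):
    """Classify a pairwise identity using the 40% / 60% thresholds
    quoted in the text."""
    if identity >= subfamily_cutoff:
        return "same subfamily"
    if identity >= family_cutoff:
        return "same family"
    return "different family"


# Toy aligned fragments, not real ALDH sequences.
identity = percent_identity("MKTAY-LV", "MKSAYQLV")
print(round(identity, 1), nomenclature_level(identity))
```

A strict rule of this kind would assign a distant bacterial homolog to a new family whenever its identity falls below the family cutoff, which is the behavior the text argues should be relaxed in favor of naming by homology.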

The available genetic information, together with a careful understanding of the catalytic mechanisms of ALDH action, also made possible the search for non-catalytic members of the ALDH superfamily. This work has shown the use of bioinformatics and the mining of large databases to find new functional groups of enzymes. As before, due to the prevalence of sequencing errors in large databases, having multiple records across multiple species and placing these records into phylogenetic context is key to reducing false positives. New groups of dead enzymes were found, especially in bacteria and fungi, groups that require work on ALDH organization and nomenclature. Some records appear to represent known ALDH enzymes that are non-catalytic in a specific lineage (similar to that found in ALDH16A1), but many records belong to groups that have completely unknown functions at this time (i.e., Groups 2, 8, 13, 17, and 18).

There is much work to do in the area of dead enzymes in general and in the ALDH superfamily. The Vasiliou lab has begun work on characterizing and determining the functional role of ALDH16A1. Previous work has shown the utility of computational modeling to predict the active site structure of ALDH16A1, and this may be used to begin work categorizing and investigating newly discovered groups of non-catalytic ALDHs. Further investigation of these newly discovered groups of ALDH dead enzymes should follow several steps: 1) determining the number and phylogenetic extent of ALDHs in these groups, 2) characterizing the function of the closely related enzymatic ALDHs from which these dead enzymes derived, and 3) characterizing the function of the new non-catalytic members. As discussed previously, genetic databases have been doubling and tripling every few years, and the exact study described here can be repeated every several years to provide new insights and to resolve questions about the nature and extent of ALDH dead enzymes.

In the present work, it was shown for the first time that ALDH1B1 metabolizes retinaldehyde, a substrate that likely accounts for the role of ALDH1B1 in development and differentiation. Common human ALDH1B1 mutants were also characterized, including one that was both completely inactive and relatively common. Again, this work demonstrates the use of computational modeling to predict and sort mutations to prioritize in vitro resources, especially in cases where heterologous protein expression is difficult. In vitro work in this study, and work by others, suggests that these mutations may affect detoxification pathways (especially ethanol via acetaldehyde), growth and development (via retinaldehyde), and cancer progression (via retinaldehyde). Previous longitudinal studies have implicated the inactivating ALDH1B1*2 mutation in increased alcohol sensitivity. A possible interaction between ALDH1B1 and ALDH2*2 mutants was also described. This interaction still needs to be tested in vitro, but if it is shown that ALDH1B1 is inactivated or downregulated by ALDH2*2, then ALDH2*2 can be considered a partially or fully inactivating mutation of ALDH1B1 in future studies.

This work raises many important questions for the future. In published and unpublished work by the Vasiliou lab, ALDH1B1 has been shown to be 1) a highly expressed marker in colon cancer and 2) required for colon cancer tumor spheroid development, at least in vitro. To extend the conclusions reached from these in vitro and in vivo data, epidemiological surveys of ALDH1B1 mutational status should be carried out in colon cancer tumor samples and compared with human population frequencies to show whether a lack of ALDH1B1 might indeed be protective against colon cancer. If this were the case, ALDH1B1 may be an attractive target for pharmacological inhibition or knockdown via gene therapy.

REFERENCES

(2015). "UniProt: a hub for protein information." Nucleic Acids Res 43(Database issue): D204-212.

Abedinia, M., T. Pain, et al. (1990). "Bovine corneal aldehyde dehydrogenase: the major soluble corneal protein with a possible dual protective role for the eye." Exp Eye Res 51(4): 419-426.

Adrain, C. and M. Freeman (2012). "New lives for old: evolution of pseudoenzyme function illustrated by iRhoms." Nat Rev Mol Cell Biol 13(8): 489-498.

Adrain, C., M. Zettl, et al. (2012). "Tumor necrosis factor signaling requires iRhom2 to promote trafficking and activation of TACE." Science 335(6065): 225-228.

Adzhubei, I., D. M. Jordan, et al. (2013). "Predicting functional effect of human missense mutations using PolyPhen-2." Curr Protoc Hum Genet Chapter 7: Unit7 20.

Akaboshi, S., B. M. Hogema, et al. (2003). "Mutational spectrum of the succinate semialdehyde dehydrogenase (ALDH5A1) gene and functional analysis of 27 novel disease-causing mutations in patients with SSADH deficiency." Hum Mutat 22(6): 442-450.

Alnouti, Y. and C. D. Klaassen (2008). "Tissue distribution, ontogeny, and regulation of aldehyde dehydrogenase (Aldh) enzymes mRNA by prototypical microsomal enzyme inducers in mice." Toxicol Sci 101(1): 51-64.

Amemiya, C. T., J. Alfoldi, et al. (2013). "The African coelacanth genome provides insights into tetrapod evolution." Nature 496(7445): 311-316.

Appling, D. R. (1991). "Compartmentation of folate-mediated one-carbon metabolism in eukaryotes." FASEB J 5(12): 2645-2651.

Arber, W. (2014). "Horizontal Gene Transfer among Bacteria and Its Role in Biological Evolution." Life (Basel) 4(2): 217-224.

Banfi, P., C. Lanzi, et al. (1994). "The daunorubicin-binding protein of Mr 54,000 is an aldehyde dehydrogenase and is down-regulated in mouse liver tumors and in tumor cell lines." Mol Pharmacol 46(5): 896-900.

Baselga, J. and S. M. Swain (2009). "Novel anticancer targets: revisiting ERBB2 and discovering ERBB3." Nat Rev Cancer 9(7): 463-475.

Bchini, R., V. Vasiliou, et al. (2013). "Retinoic acid biosynthesis catalyzed by retinal dehydrogenases relies on a rate-limiting conformational transition associated with substrate recognition." Chem Biol Interact 202(1-3): 78-84.

Behrends, C., M. E. Sowa, et al. (2010). "Network organization of the human autophagy system." Nature 466(7302): 68-76.

Beretta, M., A. C. Gorren, et al. (2010). "Characterization of the East Asian variant of aldehyde dehydrogenase-2: bioactivation of nitroglycerin and effects of Alda-1." J Biol Chem 285(2): 943-952.

Beretta, M., A. Sottler, et al. (2008). "Partially irreversible inactivation of mitochondrial aldehyde dehydrogenase by nitroglycerin." J Biol Chem 283(45): 30735-30744.

Berman, H. M., J. Westbrook, et al. (2000). "The Protein Data Bank." Nucleic Acids Res 28(1): 235-242.

Black, W. and V. Vasiliou (2009). "The aldehyde dehydrogenase gene superfamily resource center." Hum Genomics 4(2): 136-142.

Black, W. J., D. Stagos, et al. (2009). "Human aldehyde dehydrogenase genes: alternatively spliced transcriptional variants and their suggested nomenclature." Pharmacogenet Genomics 19(11): 893-902.

Brew, K., T. C. Vanaman, et al. (1967). "Comparison of the amino acid sequence of bovine alpha-lactalbumin and hens egg white lysozyme." J Biol Chem 242(16): 3747-3749.

Brocker, C., M. Cantore, et al. (2011). "Aldehyde dehydrogenase 7A1 (ALDH7A1) attenuates reactive aldehyde and oxidative stress induced cytotoxicity." Chem Biol Interact 191(1-3): 269-277.

Brocker, C., N. Lassen, et al. (2010). "Aldehyde dehydrogenase 7A1 (ALDH7A1) is a novel enzyme involved in cellular defense against hyperosmotic stress." J Biol Chem 285(24): 18452-18463.

Brocker, C., M. Vasiliou, et al. (2013). "Aldehyde dehydrogenase (ALDH) superfamily in plants: gene nomenclature and comparative genomics." Planta 237(1): 189-210.

Brooks, B. R., R. E. Bruccoleri, et al. (1983). "Charmm - a Program for Macromolecular Energy, Minimization, and Dynamics Calculations." Journal of Computational Chemistry 4(2): 187-217.

Canuto, R. A., M. Ferro, et al. (1994). "Role of aldehyde metabolizing enzymes in mediating effects of aldehyde products of lipid peroxidation in liver cells." Carcinogenesis 15(7): 1359-1364.

Canuto, R. A., G. Muzio, et al. (1999). "Inhibition of class-3 aldehyde dehydrogenase and cell growth by restored lipid peroxidation in hepatoma cell lines." Free Radic Biol Med 26(3-4): 333-340.

Chambliss, K. L., R. G. Gray, et al. (2000). "Molecular characterization of methylmalonate semialdehyde dehydrogenase deficiency." J Inherit Metab Dis 23(5): 497-504.

Chang, C. and A. Yoshida (1994). "Cloning and characterization of the gene encoding mouse mitochondrial aldehyde dehydrogenase." Gene 148(2): 331-336.

Chang, C. and A. Yoshida (1997). "Human fatty aldehyde dehydrogenase gene (ALDH10): organization and tissue-dependent expression." Genomics 40(1): 80-85.

Chang, Y. F., P. Ghosh, et al. (1990). "L-pipecolic acid metabolism in human liver: L-alpha-aminoadipate delta-semialdehyde oxidoreductase." Biochim Biophys Acta 1038(3): 300-305.

Chen, R. and Z. Weng (2002). "Docking unbound proteins using shape complementarity, desolvation, and electrostatics." Proteins 47(3): 281-294.

Chen, X. Q., J. R. He, et al. (2012). "Decreased expression of ALDH1L1 is associated with a poor prognosis in hepatocellular carcinoma." Med Oncol 29(3): 1843-1849.

Chen, Y., D. J. Orlicky, et al. (2011). "Aldehyde dehydrogenase 1B1 (ALDH1B1) is a potential biomarker for human colon cancer." Biochem Biophys Res Commun 405(2): 173-179.

Chen, Y., D. C. Thompson, et al. (2013). "Ocular aldehyde dehydrogenases: protection against ultraviolet damage and maintenance of transparency for vision." Prog Retin Eye Res 33: 28-39.

Chen, Y. C., G. S. Peng, et al. (2009). "Pharmacokinetic and pharmacodynamic basis for overcoming acetaldehyde-induced adverse reaction in Asian alcoholics, heterozygous for the variant ALDH2*2 gene allele." Pharmacogenet Genomics 19(8): 588-599.

Chen, Z., M. W. Foster, et al. (2005). "An essential role for mitochondrial aldehyde dehydrogenase in nitroglycerin bioactivation." Proc Natl Acad Sci U S A 102(34): 12159-12164.

Chen, Z. and J. S. Stamler (2006). "Bioactivation of nitroglycerin by the mitochondrial aldehyde dehydrogenase." Trends Cardiovasc Med 16(8): 259-265.

Cook, T. A., S. E. Luczak, et al. (2005). "Associations of ALDH2 and ADH1B genotypes with response to alcohol in Asian Americans." J Stud Alcohol 66(2): 196-204.

Cordaux, R. and M. A. Batzer (2009). "The impact of retrotransposons on human genome evolution." Nat Rev Genet 10(10): 691-703.

Crabb, D. W., M. J. Stewart, et al. (1995). "Hormonal and chemical influences on the expression of class 2 aldehyde dehydrogenases in rat H4IIEC3 and human HuH7 hepatoma cells." Alcohol Clin Exp Res 19(6): 1414-1419.

Crisp, A., C. Boschetti, et al. (2015). "Expression of multiple horizontally acquired genes is a hallmark of both vertebrate and invertebrate genomes." Genome Biol 16(1): 50.

Dahl, H. H., R. M. Brown, et al. (1990). "A testis-specific form of the human pyruvate dehydrogenase E1 alpha subunit is coded for by an intronless gene on chromosome 4." Genomics 8(2): 225-232.

Daiber, A., M. Oelze, et al. (2009). "Nitrate tolerance as a model of vascular dysfunction: roles for mitochondrial aldehyde dehydrogenase and mitochondrial oxidative stress." Pharmacol Rep 61(1): 33-48.

Daiber, A., P. Wenzel, et al. (2009). "Mitochondrial aldehyde dehydrogenase (ALDH-2)--maker of and marker for nitrate tolerance in response to nitroglycerin treatment." Chem Biol Interact 178(1-3): 40-47.

Deak, K. L., M. E. Dickerson, et al. (2005). "Analysis of ALDH1A2, CYP26A1, CYP26B1, CRABP1, and CRABP2 in human neural tube defects suggests a possible association with alleles in ALDH1A2." Birth Defects Res A Clin Mol Teratol 73(11): 868-875.

Dupe, V., N. Matt, et al. (2003). "A newborn lethal defect due to inactivation of retinaldehyde dehydrogenase type 3 is prevented by maternal retinoic acid treatment." Proc Natl Acad Sci U S A 100(24): 14036-14041.

Durrant, J. D. and J. A. McCammon (2011). "BINANA: a novel algorithm for ligand-binding characterization." J Mol Graph Model 29(6): 888-893.

Eguchi, G. (1966). "[Crystalline lens]." Tanpakushitsu Kakusan Koso 11(11): 1083-1084.

Endo, J., M. Sano, et al. (2009). "Metabolic remodeling induced by mitochondrial aldehyde stress stimulates tolerance to oxidative stress in the heart." Circ Res 105(11): 1118-1127.

Enomoto, N., S. Takase, et al. (1991). "Acetaldehyde metabolism in different aldehyde dehydrogenase-2 genotypes." Alcohol Clin Exp Res 15(1): 141-144.

Estey, T., M. Cantore, et al. (2007). "Mechanisms involved in the protection of UV-induced protein inactivation by the corneal crystallin ALDH3A1." J Biol Chem 282(7): 4382-4392.

Estey, T., J. Piatigorsky, et al. (2007). "ALDH3A1: a corneal crystallin with diverse functions." Exp Eye Res 84(1): 3-12.

Ewing, R. M., P. Chu, et al. (2007). "Large-scale mapping of human protein-protein interactions by mass spectrometry." Mol Syst Biol 3: 89.

Farres, J., P. Julia, et al. (1988). "Aldehyde oxidation in human placenta. Purification and properties of 1-pyrroline-5-carboxylate dehydrogenase." Biochem J 256(2): 461-467.

Feig, M. and C. L. Brooks, 3rd (2004). "Recent advances in the development and application of implicit solvent models in biomolecule simulations." Curr Opin Struct Biol 14(2): 217-224.

Ferencz-Biro, K. and R. Pietruszko (1984). "Human aldehyde dehydrogenase: catalytic activity in oriental liver." Biochem Biophys Res Commun 118(1): 97-102.

Finn, R. D., A. Bateman, et al. (2014). "Pfam: the protein families database." Nucleic Acids Res 42(Database issue): D222-230.

Finn, R. D., J. Clements, et al. (2011). "HMMER web server: interactive sequence similarity searching." Nucleic Acids Res 39(Web Server issue): W29-37.

Finn, R. D., J. Mistry, et al. (2010). "The Pfam protein families database." Nucleic Acids Res 38(Database issue): D211-222.

Forte-McRobbie, C. M. and R. Pietruszko (1986). "Purification and characterization of human liver "high Km" aldehyde dehydrogenase and its identification as glutamic gamma-semialdehyde dehydrogenase." J Biol Chem 261(5): 2154-2163.

Foster, L. J., A. Rudich, et al. (2006). "Insulin-dependent interactions of proteins with GLUT4 revealed through stable isotope labeling by amino acids in cell culture (SILAC)." J Proteome Res 5(1): 64-75.

Garcia-Martinez, L. F. and D. R. Appling (1993). "Characterization of the folate-dependent mitochondrial oxidation of carbon 3 of serine." Biochemistry 32(17): 4671-4676.

Geraghty, M. T., D. Vaughn, et al. (1998). "Mutations in the Delta1-pyrroline 5-carboxylate dehydrogenase gene cause type II hyperprolinemia." Hum Mol Genet 7(9): 1411-1415.

Goedde, H. W. and D. P. Agarwal (1987). "Polymorphism of aldehyde dehydrogenase and alcohol sensitivity." Enzyme 37(1-2): 29-44.

Goodman, M. (1999). "The genomic record of Humankind's evolutionary roots." Am J Hum Genet 64(1): 31-39.

Goodman, M., C. A. Porter, et al. (1998). "Toward a phylogenetic classification of Primates based on DNA evidence complemented by fossil evidence." Mol Phylogenet Evol 9(3): 585-598.

Goodwin, G. W., P. M. Rougraff, et al. (1989). "Purification and characterization of methylmalonate-semialdehyde dehydrogenase from rat liver. Identity to malonate-semialdehyde dehydrogenase." J Biol Chem 264(25): 14965-14971.

Graham, C., J. Hodin, et al. (1996). "A retinaldehyde dehydrogenase as a structural protein in a mammalian eye lens. Gene recruitment of eta-crystallin." J Biol Chem 271(26): 15623-15628.

Graham, C. E., K. Brocklehurst, et al. (2006). "Characterization of retinaldehyde dehydrogenase 3." Biochem J 394(Pt 1): 67-75.

Greiling, T. M. and J. I. Clark (2008). "The transparent lens and cornea in the mouse and zebra fish eye." Semin Cell Dev Biol 19(2): 94-99.

Hanna, M. C. and C. Blackstone (2009). "Interaction of the SPG21 protein ACP33/maspardin with the aldehyde dehydrogenase ALDH16A1." Neurogenetics 10(3): 217-228.

Haslett, M. R., D. Pink, et al. (2004). "Assay and subcellular localization of pyrroline-5-carboxylate dehydrogenase in rat liver." Biochim Biophys Acta 1675(1-3): 81-86.

Haug, S. and M. Braun-Falco (2006). "Restoration of fatty aldehyde dehydrogenase deficiency in Sjogren-Larsson syndrome." Gene Ther 13(13): 1021-1026.

Hedges, S. B. (2002). "The origin and evolution of model organisms." Nat Rev Genet 3(11): 838-849.

Hempel, J., R. Kaiser, et al. (1985). "Mitochondrial aldehyde dehydrogenase from human liver. Primary structure, differences in relation to the cytosolic enzyme, and functional correlations." Eur J Biochem 153(1): 13-28.

Higuchi, S., T. Muramatsu, et al. (1992). "The relationship between low Km aldehyde dehydrogenase phenotype and drinking behavior in Japanese." J Stud Alcohol 53(2): 170-175.

Holmes, R. S., B. Cheung, et al. (1989). "Isoelectric focusing studies of aldehyde dehydrogenases, alcohol dehydrogenases and oxidases from mammalian anterior eye tissues." Comp Biochem Physiol B 93(2): 271-277.

Hsu, L. C., R. E. Bendel, et al. (1988). "Genomic structure of the human mitochondrial aldehyde dehydrogenase gene." Genomics 2(1): 57-65.

Hsu, L. C. and W. C. Chang (1991). "Cloning and characterization of a new functional human aldehyde dehydrogenase gene." J Biol Chem 266(19): 12257-12265.

Hsu, L. C., W. C. Chang, et al. (2000). "Mouse type-2 retinaldehyde dehydrogenase (RALDH2): genomic organization, tissue-dependent expression, chromosome assignment and comparison to other types." Biochim Biophys Acta 1492(1): 289-293.

Hu, C. A., W. W. Lin, et al. (1999). "Molecular enzymology of mammalian Delta1-pyrroline-5-carboxylate synthase. Alternative splice donor utilization generates isoforms with different sensitivity to ornithine inhibition." J Biol Chem 274(10): 6754-6762.

Hu, C. A., W. W. Lin, et al. (1996). "Cloning, characterization, and expression of cDNAs encoding human delta 1-pyrroline-5-carboxylate dehydrogenase." J Biol Chem 271(16): 9795-9800.

Hua, F., R. Mu, et al. (2011). "TRB3 interacts with SMAD3 promoting tumor cell migration and invasion." J Cell Sci 124(Pt 19): 3235-3246.

Huang, J., N. Hu, et al. (2000). "High frequency allelic loss on chromosome 17p13.3-p11.1 in esophageal squamous cell carcinomas from a high incidence area in northern China." Carcinogenesis 21(11): 2019-2026.

Husemoen, L. L., M. Fenger, et al. (2008). "The association of ADH and ALDH gene variants with alcohol drinking habits and cardiovascular disease risk factors." Alcohol Clin Exp Res 32(11): 1984-1991.

Ichihara, K., E. Kusunose, et al. (1986). "Some properties of the fatty alcohol oxidation system and reconstitution of microsomal oxidation activity in intestinal mucosa." Biochim Biophys Acta 878(3): 412-418.

Ioannou, M., I. Serafimidis, et al. (2013). "ALDH1B1 is a potential stem/progenitor marker for multiple pancreas progenitor pools." Dev Biol 374(1): 153-163.

Jackson, B., C. Brocker, et al. (2011). "Update on the aldehyde dehydrogenase gene (ALDH) superfamily." Hum Genomics 5(4): 283-303.

Jackson, B. C., R. S. Holmes, et al. (2013). "Comparative genomics, molecular evolution and computational modeling of ALDH1B1 and ALDH2." Chem Biol Interact 202(1-3): 11-21.

Jester, J. V. (2008). "Corneal crystallins and the development of cellular transparency." Semin Cell Dev Biol 19(2): 82-93.

Jia, S., M. Omelchenko, et al. (2007). "Duplicated gelsolin family genes in zebrafish: a novel scinderin-like gene (scinla) encodes the major corneal crystallin." FASEB J 21(12): 3318-3328.

Jo, S. A., E. K. Kim, et al. (2007). "A Glu487Lys polymorphism in the gene for mitochondrial aldehyde dehydrogenase 2 is associated with myocardial infarction in elderly Korean men." Clin Chim Acta 382(1-2): 43-47.

Johnsen, J., A. Stowell, et al. (1992). "Clinical responses in relation to blood acetaldehyde levels." Pharmacol Toxicol 70(1): 41-45.

Kamino, K., K. Nagasaka, et al. (2000). "Deficiency in mitochondrial aldehyde dehydrogenase increases the risk for late-onset Alzheimer's disease in the Japanese population." Biochem Biophys Res Commun 273(1): 192-196.

Kang, J. H., Y. B. Park, et al. (2005). "High-level expression and characterization of the recombinant enzyme, and tissue distribution of human succinic semialdehyde dehydrogenase." Protein Expr Purif 44(1): 16-22.

Karolchik, D., G. Bejerano, et al. (2007). "Comparative genomic analysis using the UCSC genome browser." Methods Mol Biol 395: 17-34.

Karpinka, J. B., J. D. Fortriede, et al. (2015). "Xenbase, the Xenopus model organism database; new virtualized system, data types and genomes." Nucleic Acids Res 43(Database issue): D756-763.

Kedishvili, N. Y., K. M. Popov, et al. (1992). "CoA-dependent methylmalonate-semialdehyde dehydrogenase, a unique member of the aldehyde dehydrogenase superfamily. cDNA cloning, evolutionary relationships, and tissue distribution." J Biol Chem 267(27): 19724-19729.

Keeling, P. J. and J. D. Palmer (2008). "Horizontal gene transfer in eukaryotic evolution." Nat Rev Genet 9(8): 605-618.

Kerrien, S., B. Aranda, et al. (2012). "The IntAct molecular interaction database in 2012." Nucleic Acids Res 40(Database issue): D841-846.

King, G. and R. Holmes (1997). "Human corneal and lens aldehyde dehydrogenases. Purification and properties of human lens ALDH1 and differential expression as major soluble proteins in human lens (ALDH1) and cornea (ALDH3)." Adv Exp Med Biol 414: 19-27.

Kitamura, T., S. Takagi, et al. (2015). "Mouse aldehyde dehydrogenase ALDH3B2 is localized to lipid droplets via two C-terminal tryptophan residues and lipid modification." Biochem J 465(1): 79-87.

Klyosov, A. A., L. G. Rashkovetsky, et al. (1996). "Possible role of liver cytosolic and mitochondrial aldehyde dehydrogenases in acetaldehyde metabolism." Biochemistry 35(14): 4445-4456.

Koppaka, V., D. C. Thompson, et al. (2012). "Aldehyde dehydrogenase inhibitors: a comprehensive review of the pharmacology, mechanism of action, substrate specificity, and clinical application." Pharmacol Rev 64(3): 520-539.

Krupenko, N. I., M. E. Dubard, et al. (2010). "ALDH1L2 is the mitochondrial homolog of 10-formyltetrahydrofolate dehydrogenase." J Biol Chem 285(30): 23056-23063.

Krupenko, S. A., D. A. Horstman, et al. (1995). "Baculovirus expression and purification of rat 10-formyltetrahydrofolate dehydrogenase." Protein Expr Purif 6(4): 457-464.

Krupenko, S. A. and N. V. Oleinik (2002). "10-formyltetrahydrofolate dehydrogenase, one of the major folate enzymes, is down-regulated in tumor tissues and possesses suppressor effects on cancer cells." Cell Growth Differ 13(5): 227-236.

Landin, J. S., S. D. Cohen, et al. (1996). "Identification of a 54-kDa mitochondrial acetaminophen-binding protein as aldehyde dehydrogenase." Toxicol Appl Pharmacol 141(1): 299-307.

Larkin, M. A., G. Blackshields, et al. (2007). "Clustal W and Clustal X version 2.0." Bioinformatics 23(21): 2947-2948.

Larson, H. N., H. Weiner, et al. (2005). "Disruption of the coenzyme binding site and dimer interface revealed in the crystal structure of mitochondrial aldehyde dehydrogenase "Asian" variant." J Biol Chem 280(34): 30550-30556.

Larson, H. N., J. Zhou, et al. (2007). "Structural and functional consequences of coenzyme binding to the inactive asian variant of mitochondrial aldehyde dehydrogenase: roles of residues 475 and 487." J Biol Chem 282(17): 12940-12950.

Lassen, N., T. Estey, et al. (2005). "Molecular cloning, baculovirus expression, and tissue distribution of the zebrafish aldehyde dehydrogenase 2." Drug Metab Dispos 33(5): 649-656.

Lassen, N., A. Pappa, et al. (2006). "Antioxidant function of corneal ALDH3A1 in cultured stromal fibroblasts." Free Radic Biol Med 41(9): 1459-1469.

Lee, K. H., H. S. Kim, et al. (2002). "Chaperonin GroESL mediates the protein folding of human liver mitochondrial aldehyde dehydrogenase in Escherichia coli." Biochem Biophys Res Commun 298(2): 216-224.

Lee, P., W. Kuhl, et al. (1994). "Homology between a human protein and a protein of the green garden pea." Genomics 21(2): 371-378.

Lee, Y. P., J. T. Liao, et al. (2013). "Inhibition of human alcohol and aldehyde dehydrogenases by acetaminophen: Assessment of the effects on first-pass metabolism of ethanol." Alcohol 47(7): 559-565.

Leslie, M. (2013). "Molecular biology. 'Dead' enzymes show signs of life." Science 340(6128): 25-27.

Li, Y., D. Zhang, et al. (2006). "Mitochondrial aldehyde dehydrogenase-2 (ALDH2) Glu504Lys polymorphism contributes to the variation in efficacy of sublingual nitroglycerin." J Clin Invest 116(2): 506-511.

Liberles, D. A. (2001). "Evaluation of methods for determination of a reconstructed history of gene sequence evolution." Mol Biol Evol 18(11): 2040-2047.

Lin, M. and J. L. Napoli (2000). "cDNA cloning and expression of a human aldehyde dehydrogenase (ALDH) active with 9-cis-retinal and identification of a rat ortholog, ALDH12." J Biol Chem 275(51): 40106-40112.

Lin, S., N. A. Page, et al. (2013). "In vitro organic nitrate bioactivation to nitric oxide by recombinant aldehyde dehydrogenase 3A1." Nitric Oxide 35: 137-143.

Lin, S. W., J. C. Chen, et al. (1996). "Human gamma-aminobutyraldehyde dehydrogenase (ALDH9): cDNA sequence, genomic organization, polymorphism, chromosomal localization, and tissue expression." Genomics 34(3): 376-380.

Linneberg, A., A. Gonzalez-Quintela, et al. (2010). "Genetic determinants of both ethanol and acetaldehyde metabolism influence alcohol hypersensitivity and drinking behaviour among Scandinavians." Clin Exp Allergy 40(1): 123-130.

Liu, Z. J., Y. J. Sun, et al. (1997). "The first structure of an aldehyde dehydrogenase reveals novel interactions between NAD and the Rossmann fold." Nat Struct Biol 4(4): 317-326.

Lu, J., E. Peatman, et al. (2012). "Profiling of gene duplication patterns of sequenced teleost genomes: evidence for rapid lineage-specific genome expansion mediated by recent tandem duplications." BMC Genomics 13: 246.

Luo, P., A. Wang, et al. (2007). "Intrinsic retinoic acid receptor alpha-cyclin-dependent kinase-activating kinase signaling involves coordination of the restricted proliferation and granulocytic differentiation of human hematopoietic stem cells." Stem Cells 25(10): 2628-2637.

Lyons, E. and M. Freeling (2008). "How to usefully compare homologous plant genes and chromosomes as DNA sequences." Plant J 53(4): 661-673.

MacKerell, A. D., Jr., E. E. Blatter, et al. (1986). "Human aldehyde dehydrogenase: kinetic identification of the isozyme for which biogenic aldehydes and acetaldehyde compete." Alcohol Clin Exp Res 10(3): 266-270.

Manning, G., D. B. Whyte, et al. (2002). "The protein kinase complement of the human genome." Science 298(5600): 1912-1934.

Marchitti, S. A., C. Brocker, et al. (2008). "Non-P450 aldehyde oxidizing enzymes: the aldehyde dehydrogenase superfamily." Expert Opin Drug Metab Toxicol 4(6): 697-720.

Marchitti, S. A., R. A. Deitrich, et al. (2007). "Neurotoxicity and metabolism of the catecholamine-derived 3,4-dihydroxyphenylacetaldehyde and 3,4-dihydroxyphenylglycolaldehyde: the role of aldehyde dehydrogenase." Pharmacol Rev 59(2): 125-150.

Marchitti, S. A., D. J. Orlicky, et al. (2007). "Expression and initial characterization of human ALDH3B1." Biochem Biophys Res Commun 356(3): 792-798.

Matsuda, T., H. Yabushita, et al. (2006). "Increased DNA damage in ALDH2-deficient alcoholics." Chem Res Toxicol 19(10): 1374-1378.

McCarrey, J. R., M. Kumari, et al. (1996). "Analysis of the cDNA and encoded protein of the human testis-specific PGK-2 gene." Dev Genet 19(4): 321-332.

McClean, S. W., M. E. Ruddel, et al. (1982). "Liquid-chromatographic assay for retinol (vitamin A) and retinol analogs in therapeutic trials." Clin Chem 28(4 Pt 1): 693-696.

Mills, P. B., E. Struys, et al. (2006). "Mutations in antiquitin in individuals with pyridoxine-dependent seizures." Nat Med 12(3): 307-309.

Molotkov, A. and G. Duester (2003). "Genetic evidence that retinaldehyde dehydrogenase Raldh1 (Aldh1a1) functions downstream of Adh1 in metabolism of retinol to retinoic acid." J Biol Chem 278(38): 36085-36090.

Moore, S. A., H. M. Baker, et al. (1998). "Sheep liver cytosolic aldehyde dehydrogenase: the structure reveals the basis for the retinal specificity of class 1 aldehyde dehydrogenases." Structure 6(12): 1541-1551.

Mullins, M. (1995). "Genetic nomenclature guide. Zebrafish." Trends Genet: 31-32.

Muto, M., Y. Hitomi, et al. (2000). "Association of aldehyde dehydrogenase 2 gene polymorphism with multiple oesophageal dysplasia in head and neck cancer patients." Gut 47(2): 256-261.

Muzio, G., A. Trombetta, et al. (2003). "Antisense oligonucleotides against aldehyde dehydrogenase 3 inhibit hepatoma cell proliferation by affecting MAP kinases." Chem Biol Interact 143-144: 37-43.

Needleman, S. B. and C. D. Wunsch (1970). "A general method applicable to the search for similarities in the amino acid sequence of two proteins." J Mol Biol 48(3): 443-453.

Nelson, D. R. (2009). "The cytochrome p450 homepage." Hum Genomics 4(1): 59-65.

Nelson, D. R., D. C. Zeldin, et al. (2004). "Comparison of cytochrome P450 (CYP) genes from the mouse and human genomes, including nomenclature recommendations for genes, pseudogenes and alternative-splice variants." Pharmacogenetics 14(1): 1-18.

Nguyen, E. and M. J. Picklo, Sr. (2003). "Inhibition of succinic semialdehyde dehydrogenase activity by alkenal products of lipid peroxidation." Biochim Biophys Acta 1637(1): 107-112.

Niederreither, K., V. Fraulob, et al. (2002). "Differential expression of retinoic acid-synthesizing (RALDH) enzymes during fetal development and organ differentiation in the mouse." Mech Dev 110(1-2): 165-171.

Niederreither, K., V. Subbarayan, et al. (1999). "Embryonic retinoic acid synthesis is essential for early mouse post-implantation development." Nat Genet 21(4): 444-448.

Niu, Y., D. Otasek, et al. (2010). "Evaluation of linguistic features useful in extraction of interactions from PubMed; application to annotating known, high-throughput and predicted interactions in I2D." Bioinformatics 26(1): 111-119.

Notredame, C., D. G. Higgins, et al. (2000). "T-Coffee: A novel method for fast and accurate multiple sequence alignment." J Mol Biol 302(1): 205-217.

Ohta, S., I. Ohsawa, et al. (2004). "Mitochondrial ALDH2 deficiency as an oxidative stress." Ann N Y Acad Sci 1011: 36-44.

Oleinik, N. V., N. I. Krupenko, et al. (2006). "Leucovorin-induced resistance against FDH growth suppressor effects occurs through DHFR up-regulation." Biochem Pharmacol 72(2): 256-266.

Oze, I., K. Matsuo, et al. (2010). "Comparison between self-reported facial flushing after alcohol consumption and ALDH2 Glu504Lys polymorphism for risk of upper aerodigestive tract cancer in a Japanese population." Cancer Sci 101(8): 1875-1880.

Palo, O. M., P. Soronen, et al. (2010). "Identification of susceptibility loci at 7q31 and 9p13 for bipolar disorder in an isolated population." Am J Med Genet B Neuropsychiatr Genet 153B(3): 723-735.

Pappa, A., D. Brown, et al. (2005). "Human aldehyde dehydrogenase 3A1 inhibits proliferation and promotes survival of human corneal epithelial cells." J Biol Chem 280(30): 27998-28006.

Pappa, A., T. Estey, et al. (2003). "Human aldehyde dehydrogenase 3A1 (ALDH3A1): biochemical characterization and immunohistochemical localization in the cornea." Biochem J 376(Pt 3): 615-623.

Pappa, A., N. A. Sophos, et al. (2001). "Corneal and stomach expression of aldehyde dehydrogenases: from fish to mammals." Chem Biol Interact 130-132(1-3): 181-191.

Peng, G. S. and S. J. Yin (2009). "Effect of the allelic variants of aldehyde dehydrogenase ALDH2*2 and alcohol dehydrogenase ADH1B*2 on blood acetaldehyde concentrations." Hum Genomics 3(2): 121-127.

Pereira, F., E. Rosenmann, et al. (1991). "The 56 kDa androgen binding protein is an aldehyde dehydrogenase." Biochem Biophys Res Commun 175(3): 831-838.

Perez-Miller, S. J. and T. D. Hurley (2003). "Coenzyme isomerization is integral to catalysis in aldehyde dehydrogenase." Biochemistry 42(23): 7100-7109.

Phillips, J. C., R. Braun, et al. (2005). "Scalable molecular dynamics with NAMD." Journal of Computational Chemistry 26(16): 1781-1802.

Pierce, B. and Z. Weng (2007). "ZRANK: reranking protein docking predictions with an optimized energy function." Proteins 67(4): 1078-1086.

Pils, B. and J. Schultz (2004). "Inactive enzyme-homologues find new function in regulatory processes." J Mol Biol 340(3): 399-404.

Pittlik, S., S. Domingues, et al. (2008). "Expression of zebrafish aldh1a3 (raldh3) and absence of aldh1a1 in teleosts." Gene Expr Patterns 8(3): 141-147.

Reichard, J. F., V. Vasiliou, et al. (2000). "Characterization of 4-hydroxy-2-nonenal metabolism in stellate cell lines derived from normal and cirrhotic rat liver." Biochim Biophys Acta 1487(2-3): 222-232.

Reiterer, V., P. A. Eyers, et al. (2014). "Day of the dead: pseudokinases and pseudophosphatases in physiology and disease." Trends Cell Biol 24(9): 489-505.

Rhinn, M. and P. Dolle (2012). "Retinoic acid signalling during development." Development 139(5): 843-858.

Rizzo, W. B. (2007). "Sjogren-Larsson syndrome: molecular genetics and biochemical pathogenesis of fatty aldehyde dehydrogenase deficiency." Mol Genet Metab 90(1): 1-9.

Rizzo, W. B. and G. Carney (2005). "Sjogren-Larsson syndrome: diversity of mutations and polymorphisms in the fatty aldehyde dehydrogenase gene (ALDH3A2)." Hum Mutat 26(1): 1-10.

Rodriguez, F. J., C. Giannini, et al. (2008). "Gene expression profiling of NF-1-associated and sporadic pilocytic astrocytoma identifies aldehyde dehydrogenase 1 family member L1 (ALDH1L1) as an underexpressed candidate biomarker in aggressive subtypes." J Neuropathol Exp Neurol 67(12): 1194-1204.

Safran, M., I. Dalah, et al. (2010). "GeneCards Version 3: the human gene integrator." Database (Oxford) 2010: baq020.

Sayers, E. W., T. Barrett, et al. (2010). "Database resources of the National Center for Biotechnology Information." Nucleic Acids Res 38(Database issue): D5-16.

Schnier, J. B., G. Kaur, et al. (1999). "Identification of cytosolic aldehyde dehydrogenase 1 from non-small cell lung carcinomas as a flavopiridol-binding protein." FEBS Lett 454(1-2): 100-104.

Sherman, D., V. Dave, et al. (1993). "Diverse polymorphism within a short coding region of the human aldehyde dehydrogenase-5 (ALDH5) gene." Hum Genet 92(5): 477-480.

Sherman, D. I., R. J. Ward, et al. (1994). "Alcohol and acetaldehyde dehydrogenase gene polymorphism and alcoholism." EXS 71: 291-300.

Sherry, S. T., M. H. Ward, et al. (2001). "dbSNP: the NCBI database of genetic variation." Nucleic Acids Res 29(1): 308-311.

Simpson, M. A., H. Cross, et al. (2003). "Maspardin is mutated in mast syndrome, a complicated form of hereditary spastic paraplegia associated with dementia." Am J Hum Genet 73(5): 1147-1156.

Sobreira, T. J., F. Marletaz, et al. (2011). "Structural shifts of aldehyde dehydrogenase enzymes were instrumental for the early evolution of retinoid-dependent axial patterning in metazoans." Proc Natl Acad Sci U S A 108(1): 226-231.

Sophos, N. A., A. Pappa, et al. (2001). "Aldehyde dehydrogenase gene superfamily: the 2000 update." Chem Biol Interact 130-132(1-3): 323-337.

Sophos, N. A. and V. Vasiliou (2003). "Aldehyde dehydrogenase gene superfamily: the 2002 update." Chem Biol Interact 143-144: 5-22.

Sowa, M. E., E. J. Bennett, et al. (2009). "Defining the human deubiquitinating enzyme interaction landscape." Cell 138(2): 389-403.

Stagos, D., Y. Chen, et al. (2010). "Aldehyde dehydrogenase 1B1: molecular cloning and characterization of a novel mitochondrial acetaldehyde-metabolizing enzyme." Drug Metab Dispos 38(10): 1679-1687.

Stagos, D., Y. Chen, et al. (2010). "Corneal aldehyde dehydrogenases: multiple functions and novel nuclear localization." Brain Res Bull 81(2-3): 211-218.

Steinmetz, C. G., P. Xie, et al. (1997). "Structure of mitochondrial aldehyde dehydrogenase: the genetic component of ethanol aversion." Structure 5(5): 701-711.

Stewart, M. J., K. Malek, et al. (1996). "Distribution of messenger RNAs for aldehyde dehydrogenase 1, aldehyde dehydrogenase 2, and aldehyde dehydrogenase 5 in human tissues." J Investig Med 44(2): 42-46.

Stewart, M. J., K. Malek, et al. (1995). "The novel aldehyde dehydrogenase gene, ALDH5, encodes an active aldehyde dehydrogenase enzyme." Biochem Biophys Res Commun 211(1): 144-151.

Strickland, K. C., N. I. Krupenko, et al. (2011). "Enzymatic properties of ALDH1L2, a mitochondrial 10-formyltetrahydrofolate dehydrogenase." Chem Biol Interact 191(1-3): 129-136.

Strickland, K. C., N. I. Krupenko, et al. (2013). "Molecular mechanisms underlying the potentially adverse effects of folate." Clin Chem Lab Med 51(3): 607-616.

Sulem, P., D. F. Gudbjartsson, et al. (2011). "Identification of low-frequency variants associated with gout and serum uric acid levels." Nat Genet 43(11): 1127-1130.

Sydow, K., A. Daiber, et al. (2004). "Central role of mitochondrial aldehyde dehydrogenase and reactive oxygen species in nitroglycerin tolerance and cross-tolerance." J Clin Invest 113(3): 482-489.

Taylor, J. S., I. Braasch, et al. (2003). "Genome duplication, a trait shared by 22000 species of ray-finned fish." Genome Res 13(3): 382-390.

Theodosiou, M., V. Laudet, et al. (2010). "From carrot to clinic: an overview of the retinoic acid signaling pathway." Cell Mol Life Sci 67(9): 1423-1445.

Thierry-Mieg, D. and J. Thierry-Mieg (2006). "AceView: a comprehensive cDNA-supported gene and transcripts annotation." Genome Biol 7 Suppl 1: S12 11-14.

Thompson, J. D., D. G. Higgins, et al. (1994). "CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice." Nucleic Acids Res 22(22): 4673-4680.

Tizzano, M. and A. Sbarbati (2007). "Is rat LRRP Ba1-651 a Delta-1-pyrroline-5-carboxylate dehydrogenase activated by changes in the concentration of sweet molecules?" Med Hypotheses 68(4): 864-867.

Tomarev, S. I., S. Chung, et al. (1995). "Glutathione S-transferase and S-crystallins of cephalopods: evolution from active enzyme to lens-refractive proteins." J Mol Evol 41(6): 1048-1056.

Trott, O. and A. J. Olson (2010). "AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading." Journal of Computational Chemistry 31(2): 455-461.

Tsou, P. S., N. A. Page, et al. (2011). "Differential metabolism of organic nitrates by aldehyde dehydrogenase 1a1 and 2: substrate selectivity, enzyme inactivation, and active cysteine sites." AAPS J 13(4): 548-555.

Uechi, T., N. Maeda, et al. (2002). "Functional second genes generated by retrotransposition of the X-linked ribosomal protein genes." Nucleic Acids Res 30(24): 5369-5375.

Uma, L., J. Hariharan, et al. (1996). "Corneal aldehyde dehydrogenase displays antioxidant properties." Exp Eye Res 63(1): 117-120.

Vasiliou, V., A. Bairoch, et al. (1999). "Eukaryotic aldehyde dehydrogenase (ALDH) genes: human polymorphisms, and recommended nomenclature based on divergent evolution and chromosomal mapping." Pharmacogenetics 9(4): 421-434.

Vasiliou, V. and D. W. Nebert (2005). "Analysis and update of the human aldehyde dehydrogenase (ALDH) gene family." Hum Genomics 2(2): 138-143.

Vasiliou, V. and A. Pappa (2000). "Polymorphisms of human aldehyde dehydrogenases. Consequences for drug metabolism and disease." Pharmacology 61(3): 192-198.

Vasiliou, V., A. Pappa, et al. (2000). "Role of aldehyde dehydrogenases in endogenous and xenobiotic metabolism." Chem Biol Interact 129(1-2): 1-19.

Vasiliou, V., M. Sandoval, et al. (2013). "ALDH16A1 is a novel non-catalytic enzyme that may be involved in the etiology of gout via protein-protein interactions with HPRT1." Chem Biol Interact 202(1-3): 22-31.

Verhagen, C., R. Hoekzema, et al. (1991). "Identification of bovine corneal protein 54 (BCP 54) as an aldehyde dehydrogenase." Exp Eye Res 53(2): 283-284.

Wang, J., J. S. Park, et al. (2013). "TRIB2 acts downstream of Wnt/TCF in liver cancer cells to regulate YAP and C/EBPalpha function." Mol Cell 51(2): 211-225.

Wang, M. F., C. L. Han, et al. (2009). "Substrate specificity of human and yeast aldehyde dehydrogenases." Chem Biol Interact 178(1-3): 36-39.

Wang, R. S., T. Nakajima, et al. (2002). "Effects of aldehyde dehydrogenase-2 genetic polymorphisms on metabolism of structurally different aldehydes in human liver." Drug Metab Dispos 30(1): 69-73.

Wang, X., P. Penzes, et al. (1996). "Cloning of a cDNA encoding an aldehyde dehydrogenase and its expression in Escherichia coli. Recognition of retinal as substrate." J Biol Chem 271(27): 16288-16293.

Wang, X., S. Sheikh, et al. (1996). "Heterotetramers of human liver mitochondrial (class 2) aldehyde dehydrogenase expressed in Escherichia coli. A model to study the heterotetramers expected to be found in Oriental people." J Biol Chem 271(49): 31172-31178.

Way, M. J. (2014). "Computational modelling of ALDH1B1 tetramer formation and the effect of coding variants." Chem Biol Interact 207: 23.

Webb, B. and A. Sali (2014). "Protein structure modeling with MODELLER." Methods Mol Biol 1137: 1-15.

Woods, I. G., P. D. Kelly, et al. (2000). "A comparative map of the zebrafish genome." Genome Res 10(12): 1903-1914.

Woods, I. G., C. Wilson, et al. (2005). "The zebrafish gene map defines ancestral vertebrate chromosomes." Genome Res 15(9): 1307-1314.

Xiao, Q., H. Weiner, et al. (1996). "The mutation in the mitochondrial aldehyde dehydrogenase (ALDH2) gene responsible for alcohol-induced flushing increases turnover of the enzyme tetramers in a dominant fashion." J Clin Invest 98(9): 2027-2032.

Xiao, Q., H. Weiner, et al. (1995). "The aldehyde dehydrogenase ALDH2*2 allele exhibits dominance over ALDH2*1 in transduced HeLa cells." J Clin Invest 96(5): 2180-2186.

Xu, Y. S., M. Kantorow, et al. (2000). "Evidence for gelsolin as a corneal crystallin in zebrafish." J Biol Chem 275(32): 24645-24652.

Yamauchi, K., J. Nakajima, et al. (1999). "Xenopus cytosolic thyroid hormone-binding protein (xCTBP) is aldehyde dehydrogenase catalyzing the formation of retinoic acid." J Biol Chem 274(13): 8460-8469.

Yamauchi, K. and J. R. Tata (1994). "Purification and characterization of a cytosolic thyroid-hormone-binding protein (CTBP) in Xenopus liver." Eur J Biochem 225(3): 1105-1112.

Yokoyama, A., T. Muramatsu, et al. (2001). "Alcohol and aldehyde dehydrogenase gene polymorphisms and oropharyngolaryngeal, esophageal and stomach cancers in Japanese alcoholics." Carcinogenesis 22(3): 433-439.

Yokoyama, T., Y. Kanno, et al. (2010). "Trib1 links the MEK1/ERK pathway in myeloid leukemogenesis." Blood 116(15): 2768-2775.

Yoshida, A., M. Ikawa, et al. (1985). "Molecular abnormality and cDNA cloning of human aldehyde dehydrogenases." Alcohol 2(1): 103-106.

Yoval-Sanchez, B. and J. S. Rodriguez-Zavala (2012). "Differences in susceptibility to inactivation of human aldehyde dehydrogenases by lipid peroxidation byproducts." Chem Res Toxicol 25(3): 722-729.

Yu, Z., D. Morais, et al. (2007). "Analysis of the role of retrotransposition in gene evolution in vertebrates." BMC Bioinformatics 8: 308.

Zhang, H., Y. G. Chen, et al. (2007). "[The relationship between aldehyde dehydrogenase-2 gene polymorphisms and efficacy of nitroglycerin]." Zhonghua Nei Ke Za Zhi 46(8): 629-632.

Zhang, J., L. Feuk, et al. (2006). "Development of bioinformatics resources for display and analysis of copy number and other structural variants in the human genome." Cytogenet Genome Res 115(3-4): 205-214.

Zhao, D., P. McCaffery, et al. (1996). "Molecular identification of a major retinoic-acid-synthesizing enzyme, a retinaldehyde-specific dehydrogenase." Eur J Biochem 240(1): 15-22.

Zhou, J., Y. Bai, et al. (1995). "Proteolysis prevents in vivo chimeric fusion protein import into yeast mitochondria. Cytosolic cleavage and subcellular distribution." J Biol Chem 270(28): 16689-16693.

Zinovieva, R. D., S. I. Tomarev, et al. (1993). "Aldehyde dehydrogenase-derived omega-crystallins of squid and octopus. Specialization for lens expression." J Biol Chem 268(15): 11449-11455.
