The Pharmacogenomics Journal (2007) 7, 133–143
© 2007 Nature Publishing Group All rights reserved 1470-269X/07 $30.00
www.nature.com/tpj
ORIGINAL ARTICLE

Genetic diversity and function in the human cytosolic sulfotransferases

MAT Hildebrandt1, DP Carrington2, BA Thomae1, BW Eckloff3, DJ Schaid2, VC Yee4, RM Weinshilboum1 and ED Wieben3

1Division of Clinical Pharmacology, Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Mayo Foundation, Rochester, MN, USA; 2Department of Health Sciences Research, Mayo Clinic College of Medicine, Mayo Foundation, Rochester, MN, USA; 3Department of Biochemistry and Molecular Biology, Mayo Clinic College of Medicine, Mayo Foundation, Rochester, MN, USA; 4Department of Biochemistry, Case Western Reserve University School of Medicine, Cleveland, OH, USA

Correspondence: Dr ED Wieben, Department of Biochemistry and Molecular Biology, Mayo Clinic College of Medicine, Mayo Foundation, Rochester, MN 55985, USA. E-mail: [email protected]

Amino-acid substitutions, which result from common nonsynonymous (NS) polymorphisms, may dramatically alter the function of the encoded protein. Gaining insight into how these substitutions alter function is a step toward acquiring predictability. In this study, we incorporated resequencing, functional genomics, amino-acid characterization and crystal structure analysis for the cytosolic sulfotransferases (SULTs) to attempt to gain predictability regarding the function of variant allozymes. Previously, four SULT genes were resequenced in 118 DNA samples. With additional resequencing of the remaining eight SULT family members in the same DNA samples, a total of 217 polymorphisms were revealed. Of 64 polymorphisms identified within 8785 bp of coding regions from the SULT genes examined, 25 were synonymous and 39 were NS. Overall, the proportion of synonymous changes was greater than expected from a random distribution of mutations, suggesting the presence of a selective pressure against amino-acid substitutions. Functional data for common variants of five SULT genes have been previously published. These data, together with the SULT1A1 variant allozyme data presented in this paper, showed that the major mechanism by which amino-acid changes altered function in a transient expression system was through decreases in immunoreactive protein rather than changes in kinetics. Additional insight with regard to mechanisms by which NS single nucleotide polymorphisms alter function was sought by analysis of evolutionary conservation, physicochemical properties of the amino-acid substitutions and crystal structure analysis. Neither individual amino-acid characteristics nor structural models were able to accurately and reliably predict the function of variant allozymes. These results suggest that common amino-acid substitutions may not dramatically alter the protein structure, but affect interactions with the cellular environment that are currently not well understood.

The Pharmacogenomics Journal (2007) 7, 133–143. doi:10.1038/sj.tpj.6500404; published online 27 June 2006

Keywords: amino-acid substitutions; functional genomics; single nucleotide polymorphisms

Introduction

The human genome contains an estimated 11 million common single nucleotide changes with allele frequencies greater than 1%.1 A vast majority of these sequence variations are believed to be functionally neutral, yet a subset alter the structure of a gene product. Such structural changes are often detrimental to function, but some variant proteins have normal, or even increased, function.

Received 5 January 2006; revised 28 April 2006; accepted 24 May 2006; published online 27 June 2006

Predicting the function of amino-acid substitutions
MAT Hildebrandt et al

With the goals of better understanding how molecular evolution influences the function of variant allozymes and how specific substitutions affect protein function, we have incorporated resequencing data, functional genomics, amino-acid characterization and crystal structure analysis to analyze structure–function relationships within the human cytosolic sulfotransferase (SULT) gene family.

The members of the SULT gene family play an important role in the biotransformation of a variety of substrates, including hormones, neurotransmitters, drugs and other xenobiotics.2–4 To date, 12 human SULT genes have been identified.5 The family members share a high degree of sequence identity and are found in clusters on chromosomes.5 Because of the importance of sulfation in the metabolism of a wide variety of substrates, much work has been carried out in characterizing the genetic variation and functional consequences of amino-acid substitutions in the SULT family6–13 and crystal structures have been solved for nine of the 12 members.14–22 Previously, four SULT genes – SULT1A3, 1A4, 1E1 and 2A1 – were resequenced from 59 Caucasian-American (CA) and 59 African-American (AA) DNA samples.6,8,11,12 To determine the nature and degree of sequence variation within the entire gene family for the analyses reported here, the remaining members of the SULT gene family – SULT1A1, 1A2, 1B1, 1C2, 1C4, 2B1, 4A1 and 6B1 – were resequenced in the same DNA samples (Table 1). In total, over 8.5 million base pairs of sequence were analyzed from regions on six chromosomes.

Functional genomic studies have been completed for 14 nonsynonymous (NS) single nucleotide polymorphisms (SNPs) as a step toward understanding how variation in nucleotide sequence translates into variation in enzyme function in the SULT gene family.6–9,11,12 These previous studies showed that alterations in amino-acid sequence can cause decreased enzyme activity owing to decreased levels of immunoreactive protein. In this study, additional functional genomic studies were performed for three variant allozymes of SULT1A1 to determine the functional consequences of the amino-acid substitutions in a recombinant expression system. Previous genotype–phenotype correlation studies in both human liver and platelet samples showed that one common variant allele, SULT1A1*2 (Arg213His), was associated with low enzyme activity.9,10 The present work extends this analysis to additional SULT1A1 variants and presents new data on levels of immunoreactive proteins for all SULT1A1 variants.

The SULT gene family is an ideal candidate for studying the relationship between sequence variation, structure and function by combining functional genomic data with amino-acid characterization and structural modeling. Understanding of the consequences of amino-acid changes may aid in characterizing the mechanisms responsible for decreases in enzyme function and ultimately gaining predictability regarding these effects. Our results suggest that commonly used methods for predicting the effect of amino-acid substitutions on protein function have only limited utility when applied to SULT allozymes and better ‘tools’ are needed in order to predict how variation at the nucleotide level translates into altered function of variant allozymes within the cellular environment.

Table 1 Summary of SULT gene sequence variation

                      Total           Frequency           ------- Polymorphisms -------
Gene      Total bp    polymorphisms   (polymorphism/bp)   Coding   NS   S    Non-coding   θ       π       Tajima's D   Reference
SULT1A1   2430        21              1/115.7             10       3    7    11           14.31   18.60    0.794       Current study
SULT1A2   4010        36              1/111.4             10       7    3    26           14.87   18.07    0.613       Current study
SULT1B1   3608        17              1/212.2             2        2    0    15           7.80    6.88    -0.303       Current study
SULT1C2   4423        29              1/152.5             7        5    2    22           10.86   6.33    -1.16        Current study
SULT1C4   3734        14              1/266.7             1        1    0    13           6.21    5.67    -0.214       Current study
SULT1E1   3648        22              1/165.8             5        3    2    17           9.99    5.59    -1.18        Adjei et al.6
SULT2A1   2417        15              1/161.1             6        3    3    9            10.28   4.82    -1.33        Thomae et al.11
SULT2B1   3133        22              1/142.4             12       5    7    10           11.10   6.68    -1.06        Current study
SULT4A1   3558        5               1/711.6             0        0    0    5            2.33    2.40     0.060       Current study
SULT6B1   5298        36              1/147.2             11       10   1    25           11.25   11.08   -0.043       Current study

NS, nonsynonymous; S, synonymous; SULT, sulfotransferase.

Results

Genetic variation in the SULT gene family
All human SULT genes were resequenced in DNA samples from AA and CA populations (236 alleles total) to catalog the genetic diversity within this gene family. A total of 36 259 bp were amplified from the human SULT genes in each DNA sample, including 8785 bp (24%) of coding and 27 474 bp (76%) of non-coding sequences (Table 2). Sequence analysis identified 212 SNPs and five insertions and deletions with an overall minor allele frequency (MAF) of 0.102. Of the 217 polymorphisms identified, 64 (29.5%) were located in coding regions and 153 (70.5%) were non-coding sequence variants (Table 2). On average, one polymorphism was identified every 167 bp of total sequence. The most polymorphic SULT gene was SULT1A2, with an average of one nucleotide change per 111 bp screened. In contrast to SULT1A2, sequence analysis of 3558 bp corresponding to SULT4A1 detected only five polymorphisms – an average of one polymorphism every 712 bp. All of the SULT4A1 polymorphisms are located in intronic regions (Table 1). The extent of nucleotide variation found in members of the SULT gene family was similar to that observed for other groups of human genes.23–27

Genetic diversity across gene regions
Neutral theory predicts that nucleotide changes occur randomly throughout the genome with no selective pressure as to location.28,29 In general, this appeared to hold true for the regions of the SULT genes analyzed, with the proportion of base pairs sequenced roughly equaling the proportion of polymorphisms identified for the total population (Figure 1) (χ² = 9.13, df = 5, P = 0.104). The notable deviation from this pattern was for nucleotide changes within the coding region of the genes. Although 80% of the coding region nucleotides were NS sites where nucleotide changes have the potential to change the protein sequence, only 61% of the observed polymorphisms in the coding region actually did lead to a change in the encoded amino acid. This excess of synonymous (S) changes suggested that some selective pressure is working against amino-acid substitution in the SULT proteins in both

Table 2 Summary of SULT gene polymorphisms by population and gene region

Sequence section     bp         Polymorphisms   θ             π              Tajima's D

Total population     36 259     217             9.9 ± 2.2     8.6 ± 4.2      -0.40
  Coding             8785       64              12.1 ± 2.9    8.1 ± 4.2      -0.98
    Nonsynonymous    7011.66    39              9.2 ± 2.4     4.9 ± 2.8      -1.34
    Synonymous       1773.34    25              23.3 ± 6.7    20.8 ± 11.7    -0.30
  Non-coding         27 474     153             9.2 ± 2.0     8.8 ± 4.3      -0.14
    Intron           20 084     104             8.5 ± 1.9     7.6 ± 3.8      -0.32
    5′-UTR           2312       15              10.7 ± 3.5    12.5 ± 7.3      0.41
    3′-UTR           1630       12              12.2 ± 4.3    14.1 ± 8.6      0.37
    5′-FR            2746       14              8.4 ± 2.8     7.8 ± 4.8      -0.19
    3′-FR            702        8               18.9 ± 7.6    20.8 ± 14.0     0.22

AA population        36 259     186             9.6 ± 2.3     8.4 ± 4.1      -0.38
  Coding             8785       51              10.9 ± 2.9    8.0 ± 4.2      -0.82
    Nonsynonymous    7011.66    30              8.0 ± 2.3     5.2 ± 2.9      -1.05
    Synonymous       1773.34    21              22.2 ± 7.0    19.2 ± 11.0    -0.38
  Non-coding         27 474     135             9.1 ± 2.3     8.6 ± 4.2      -0.20
    Intron           20 084     91              8.4 ± 2.1     7.2 ± 3.6      -0.45
    5′-UTR           2312       13              10.5 ± 3.8    12.7 ± 7.4      0.54
    3′-UTR           1630       10              11.5 ± 4.4    14.7 ± 8.9      0.71
    5′-FR            2746       13              8.9 ± 3.2     8.6 ± 5.2      -0.07
    3′-FR            702        8               21.3 ± 8.9    19.5 ± 13.4    -0.20

CA population        36 259     141             7.2 ± 1.8     8.3 ± 4.1       0.50
  Coding             8785       36              7.7 ± 2.2     7.8 ± 4.1       0.05
    Nonsynonymous    7011.66    21              5.6 ± 1.8     4.3 ± 2.5      -0.68
    Synonymous       1773.34    15              15.8 ± 5.4    21.8 ± 12.2     1.02
  Non-coding         27 474     105             7.1 ± 1.8     8.5 ± 4.2       0.64
    Intron           20 084     72              6.7 ± 1.7     7.5 ± 3.7       0.37
    5′-UTR           2312       12              9.7 ± 3.5     12.1 ± 7.1      0.63
    3′-UTR           1630       8               9.2 ± 3.8     12.8 ± 8.0      0.93
    5′-FR            2746       7               4.8 ± 2.1     6.8 ± 4.4       0.99
    3′-FR            702        6               13.3 ± 6.6    21.6 ± 14.5     1.29

Abbreviations: AA, African-American; CA, Caucasian-American; FR, flanking region; SULT, sulfotransferase; UTR, untranslated region.
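To illustrate how the θ and Tajima's D columns of Table 2 relate to the raw counts, the following sketch recomputes both for the total population using the standard Watterson and Tajima formulas. It is not the analysis code used in this study. It assumes that θ and π in Table 2 are per-site values in units of 10^-4 (the table does not state a unit) and that each of the 217 polymorphisms counts as one segregating site; the function names are hypothetical.

```python
import math

def harmonic_sums(n_chrom):
    """Return a1 = sum(1/i) and a2 = sum(1/i^2) for i = 1 .. n_chrom - 1."""
    a1 = sum(1.0 / i for i in range(1, n_chrom))
    a2 = sum(1.0 / i ** 2 for i in range(1, n_chrom))
    return a1, a2

def watterson_theta_per_site(seg_sites, n_chrom, length_bp):
    """Watterson's estimator of theta, expressed per base pair."""
    a1, _ = harmonic_sums(n_chrom)
    return seg_sites / (a1 * length_bp)

def tajimas_d(pi_per_site, seg_sites, n_chrom, length_bp):
    """Tajima's D from per-site pi, segregating sites, sample size and length."""
    n = n_chrom
    a1, a2 = harmonic_sums(n)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimators of theta, on a whole-sequence scale.
    diff = pi_per_site * length_bp - seg_sites / a1
    return diff / math.sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))

# Total population row of Table 2: 236 alleles, 36 259 bp, 217 polymorphisms.
# ASSUMPTION: the tabulated pi of 8.6 is in units of 1e-4 per site.
theta = watterson_theta_per_site(217, 236, 36259)
d_stat = tajimas_d(8.6e-4, 217, 236, 36259)
print(f"theta = {theta * 1e4:.2f} x 10^-4, Tajima's D = {d_stat:.2f}")
```

Under these assumptions the recomputed values fall close to the tabulated θ of 9.9 and D of -0.40; small differences are expected because π is taken from the rounded table entry.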


Figure 1 SULT polymorphism distribution. (a) Proportion of base pairs sequenced corresponding to each gene region (coding, non-coding, NS, S, intron, 5′-UTR, 3′-UTR, 5′-FR and 3′-FR). (b) Proportion of polymorphisms identified in the combined AA and CA population in each gene region. (c) Proportion of polymorphisms identified in AA DNA samples in each gene region. (d) Proportion of polymorphisms identified in the CA population in each gene region.
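The comparison between the proportion of base pairs sequenced and the proportion of polymorphisms found in each gene region can be expressed as a chi-square goodness-of-fit statistic. The sketch below uses the total-population counts from Table 2. That the reported test used exactly these six categories (coding, intron, 5′-UTR, 3′-UTR, 5′-FR, 3′-FR) is an assumption suggested by the reported five degrees of freedom, not something the text states.

```python
# (bp sequenced, polymorphisms observed) per gene region, total population (Table 2).
regions = {
    "coding": (8785, 64),
    "intron": (20084, 104),
    "5'-UTR": (2312, 15),
    "3'-UTR": (1630, 12),
    "5'-FR": (2746, 14),
    "3'-FR": (702, 8),
}

total_bp = sum(bp for bp, _ in regions.values())
total_polys = sum(count for _, count in regions.values())

chi2 = 0.0
for name, (bp, observed) in regions.items():
    # Under a uniform distribution, polymorphisms fall in proportion to bp sequenced.
    expected = total_polys * bp / total_bp
    chi2 += (observed - expected) ** 2 / expected

df = len(regions) - 1
print(f"chi-square = {chi2:.2f}, df = {df}")
```

With these inputs the statistic comes out at roughly 9.1 on 5 degrees of freedom, in line with the reported value of 9.13; the small gap may reflect rounding or a slightly different treatment of the counts.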

populations (χ² = 64.5, df = 1, P < 0.0001). Furthermore, the average MAF for S polymorphisms was 60% greater than that observed for NS changes (9.85 and 6.04% for S and NS changes, respectively). Inspection of the MAF distribution also indicated a shift toward an increase in the presence of low (<0.01) to intermediate (0.01–0.10) allele frequency NS alleles compared to S polymorphisms for both populations.

The difference in the number of NS and S polymorphisms identified may be a direct result of the length of sequence screened. Therefore, the relationships between S and NS polymorphisms and base pairs sequenced were analyzed by regression analysis. For S polymorphisms, over 61% of the variation could be explained by length (R² = 0.614, F(1,9) = 12.75, P = 0.0073). In comparison, length was not a significant factor in explaining the variation present at NS sites (R² = 0.0067, F(1,9) = 0.054, P = 0.8216), further suggesting that some selective pressure against amino-acid substitutions is present in the SULT genes.

Population-genetic analysis of diversity in the SULT gene family
To gain insight into the molecular evolution of the SULT gene family, nucleotide diversity (π), the neutral parameter (θ) and Tajima's D statistic were calculated.30 The estimated π values for the SULT family were almost identical for the combined population, AA population and the CA population, with values of 8.6, 8.4 and 8.3, respectively (Table 2). The 95% CI for Tajima's D values included zero for all calculations, suggesting that the overall evolution of the gene family was neutral. Yet there was a marked difference between the nucleotide diversity observed for the NS sites compared to the S sites. The value of π was more than fourfold higher for S sites than NS sites, whereas θ values were 2.5-fold higher for S sites compared to NS sites for the total population. This indicated that S sites were able to support a greater degree of sequence variation compared to NS sites (Table 2).

SULT family functional genomics
Amino-acid substitutions have the potential to dramatically affect protein function. To determine the functional consequences of SULT NS SNPs, previous studies characterized 14 amino-acid changes. Seven of the variant allozymes displayed significant reductions in enzyme function (Table 3).6–8,11,12 In the present study, we have also assayed the three SULT1A1 variant allozymes present in the CA and AA populations studied here for levels of enzyme activity and immunoreactive protein, using the same expression system as used for the previous studies of other SULT variants. Figure 2 shows the results of the enzyme activity and Western blot analysis.

Taking all the SULT variants that have been studied using this assay as a group, a significant correlation existed between levels of enzyme activity and immunoreactive protein for all 17 NS SULT SNPs (Figure 3, RP = 0.692, P = 0.0015). These results indicated that the major – although not the only – mechanism responsible for the decrease in enzyme activity was a corresponding decrease in enzyme protein.

Structure–function predictions based on characterization of amino-acid substitutions
Several techniques have been developed that characterize amino-acid substitutions as a way to understand how substitutions might affect function. We applied four commonly used methods based on evolutionary conservation and/or physicochemical properties to 17 of the substitutions found in the human SULTs. First, amino-acid alignments within the SULT subfamilies for human, chimpanzee, rat


Table 3 SULT variant allozyme functional parameters

Allozyme       Amino-acid change   dbSNP        AA allele frequency   CA allele frequency   % WT activity(a)   % WT protein   Reference

SULT1A1*2      Arg213His           rs9282861    0.271                 0.339                 67                 54             -
SULT1A1*3      Met223Val           rs1801030    0.169                 0.000                 109                64             -
SULT1A1*5      Phe247Leu           rs28374453   0.042                 0.000                 109                74             -
SULT1A3/4*2    Lys234Asn           N/A          0.042                 0.000                 28                 54             Thomae et al.12
SULT1A3/4*3    Pro101Leu           N/A          0.025                 0.000                 13                 42             Hildebrandt et al.8
SULT1A3/4*4    Pro101His           N/A          0.004                 0.000                 59                 86             Hildebrandt et al.8
SULT1A3/4*5    Arg144Cys           N/A          0.025                 0.000                 110                120            Hildebrandt et al.8
SULT1C2*2      Ser255Ala           rs17036104   0.025                 0.068                 99                 87             Freimuth et al.7
SULT1C2*3      Asp60Ala            N/A          0.000                 0.011                 14                 58             Freimuth et al.7
SULT1C2*4      Arg73Gln            rs17036058   0.000                 0.008                 15                 99             Freimuth et al.7
SULT1C2*5      Ser111Phe           N/A          0.000                 0.006                 0                  0              Freimuth et al.7
SULT1E1*2      Asp22Tyr            rs11569705   0.008                 0.000                 7                  13             Adjei et al.6
SULT1E1*3      Ala32Val            rs11569703   0.000                 0.008                 54                 54             Adjei et al.6
SULT1E1*4      Pro253His           rs11569712   0.000                 0.008                 108                101            Adjei et al.6
SULT2A1*2      Ala261Thr           rs11569679   0.136                 0.000                 93                 79             Thomae et al.11
SULT2A1*3      Ala63Pro            rs11569681   0.051                 0.000                 57                 27             Thomae et al.11
SULT2A1*4      Lys227Glu           rs11569680   0.008                 0.000                 15                 2              Thomae et al.11

Abbreviations: AA, African-American; CA, Caucasian-American; N/A, not applicable; dbSNP, SNP database; SULT, sulfotransferase; WT, wild type.
(a) Each variant allozyme was assayed using an appropriate substrate at optimal conditions for the SULT isoform (see references).

Figure 3 SULT variant allozyme activity and protein level correlation. Correlation between level of enzyme activity and immunoreactive protein for 17 SULT variant allozymes compared to the WT allozyme. RP is the Pearson product-moment coefficient for the correlation analysis.
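The correlation summarized in Figure 3 can be approximated from the rounded percentages listed in Table 3. The sketch below computes a Pearson product-moment coefficient for the 17 allozymes; it is an illustration rather than the original analysis, and because it relies on rounded table entries the result may differ somewhat from the reported RP of 0.692. The helper name pearson_r is hypothetical.

```python
import math

# % WT activity and % WT protein for the 17 allozymes, in the row order of Table 3.
activity = [67, 109, 109, 28, 13, 59, 110, 99, 14, 15, 0, 7, 54, 108, 93, 57, 15]
protein = [54, 64, 74, 54, 42, 86, 120, 87, 58, 99, 0, 13, 54, 101, 79, 27, 2]

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(activity, protein)
print(f"Pearson r = {r:.3f} for n = {len(activity)} allozymes")
```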

Figure 2 SULT1A1 functional genomics. Average levels of enzyme activity (a) and immunoreactive protein (b) for human SULT1A1 allozymes expressed in COS-1 cells (mean ± s.e.m., n = 6). Levels are expressed relative to WT allozyme and corrected for transfection efficiency.

and mouse were performed to determine which sites were conserved through mammalian evolution. Of the 17 locations of NS SNPs, only three were conserved in all four mammalian species – SULT1A1*2, SULT1C2*4 and SULT1E1*4 (Table 4). Of these, only the SULT1C2*4 (Arg73Gln) variant exhibited a decrease in enzyme activity of greater than 50% relative to wild type (WT). Notably, position 73 of SULT1C2 was the only site among the SULT variants we studied that was conserved in every species examined to date, including fish (Danio rerio and Fugu rubripes) and birds (Gallus gallus).

Among the 14 substitutions that we studied, which occur at residues that have not been conserved through mammalian evolution, eight had enzyme activities that were at least


Table 4 SULT variant amino-acid characteristics

Allozyme       Amino-acid change   % WT activity   EC/EU   Grantham   BLOSUM62   SIFT

SULT1A1*2      Arg213His           67              EC      29          0         0.00
SULT1A1*3      Met223Val           109             EU      21          1         0.26
SULT1A1*5      Phe247Leu           109             EU      103         0         0.64
SULT1A3/4*2    Lys234Asn           28              EU      94          0         0.19
SULT1A3/4*3    Pro101Leu           13              EU      98         -3         0.03
SULT1A3/4*4    Pro101His           59              EU      77         -2         0.01
SULT1A3/4*5    Arg144Cys           110             EU      180        -3         0.02
SULT1C2*2      Ser255Ala           99              EU      99          1         0.07
SULT1C2*3      Asp60Ala            14              EU      126        -2         0.03
SULT1C2*4      Arg73Gln            15              EC      43          1         0.00
SULT1C2*5      Ser111Phe           0               EU      155        -2         0.00
SULT1E1*2      Asp22Tyr            7               EU      160        -3         0.09
SULT1E1*3      Ala32Val            54              EU      86          0         0.31
SULT1E1*4      Pro253His           108             EC      77         -2         0.00
SULT2A1*2      Ala261Thr           93              EU      58          0         0.37
SULT2A1*3      Ala63Pro            57              EU      27         -1         1.00
SULT2A1*4      Lys227Glu           15              EU      56          1         0.17

Abbreviations: EC, evolutionarily conserved; EU, evolutionarily unconserved; SIFT, sorting intolerant from tolerant; SULT, sulfotransferase; WT, wild type. Bold values indicate SIFT predictions that were made with low confidence.
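One way to see how a physicochemical score performs as a predictor is to cross-tabulate it against measured activity. The sketch below does this for the Grantham column of Table 4, treating a Grantham number above 100 as a predicted 'radical' change and activity below 50% of WT as reduced function. Both cut-offs follow the thresholds discussed in the text; the variable names are hypothetical and the tabulation is an illustration, not part of the original analysis.

```python
# (allozyme, Grantham number, % WT activity) taken from Table 4.
variants = [
    ("SULT1A1*2", 29, 67), ("SULT1A1*3", 21, 109), ("SULT1A1*5", 103, 109),
    ("SULT1A3/4*2", 94, 28), ("SULT1A3/4*3", 98, 13), ("SULT1A3/4*4", 77, 59),
    ("SULT1A3/4*5", 180, 110), ("SULT1C2*2", 99, 99), ("SULT1C2*3", 126, 14),
    ("SULT1C2*4", 43, 15), ("SULT1C2*5", 155, 0), ("SULT1E1*2", 160, 7),
    ("SULT1E1*3", 86, 54), ("SULT1E1*4", 77, 108), ("SULT2A1*2", 58, 93),
    ("SULT2A1*3", 27, 57), ("SULT2A1*4", 56, 15),
]

GRANTHAM_CUTOFF = 100   # above this, the substitution is called 'radical'
ACTIVITY_CUTOFF = 50    # below this % of WT, function is called reduced

tp = fp = fn = tn = 0
for name, grantham, activity in variants:
    predicted_damaging = grantham > GRANTHAM_CUTOFF
    reduced = activity < ACTIVITY_CUTOFF
    if predicted_damaging and reduced:
        tp += 1   # radical change, reduced activity
    elif predicted_damaging:
        fp += 1   # radical change, activity retained
    elif reduced:
        fn += 1   # conservative change, reduced activity
    else:
        tn += 1   # conservative change, activity retained

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
```

The two false positives and four false negatives correspond to the discordant allozymes described in the text for this method.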

50% of that produced by the WT residue, but the remaining six allozymes had activities ranging from 0 to 28% of that observed with the WT allele. Thus, a lack of evolutionary conservation clearly does not predict the absence of consequences for SULT enzyme function.

Grantham numbers are determined based on differences in physicochemical properties between two amino acids.31 A high Grantham number (>100) indicates that the substitution would be a ‘radical’ change and was expected to alter the function of the protein. Five of the 17 amino-acid substitutions had Grantham numbers >100, but two of these allozymes (SULT1A1*5 and SULT1A3/4*5) had activity comparable to WT (Table 4). Of the 12 substitutions with Grantham values less than 100 (suggesting less impact on function), four had enzyme activities less than 50% of WT. Thus, using the Grantham number was also not a reliable method for accurate prediction of variant allozyme function.

Several studies have used BLOSUM62 values to aid in predicting the effect of NS SNPs on protein function based on physicochemical properties of amino acids.32 For the SULT NS SNPs, almost half of the amino-acid substitutions were considered to be nonconservative based on a negative BLOSUM62 value. These designations only moderately correlated with the functional activity and level of immunoreactive protein of the variant allozymes, again demonstrating that analyzing the physicochemical properties of the substitutions was not sufficient to accurately and reliably predict the consequence of NS SNPs.

Finally, SIFT (Sorting Intolerant From Tolerant) was used to characterize the SULT amino-acid substitutions. SIFT utilizes amino-acid physicochemical properties together with alignment of similar sequences for predicting the functional effects of NS SNPs.33 Other studies using SIFT have had success in predicting the functional consequences of disease-associated changes in distinguishing neutral substitutions from non-neutral, protein-altering changes.34 For the SULT polymorphisms, 59% of the NS SNPs (10 of 17) had SIFT scores of <0.01 and were predicted to be poorly tolerated and alter protein function. A similar percentage (41%, seven out of 17) of the variant allozymes did display a significant decrease in enzyme activity compared to WT, but the success rate for SIFT scores in predicting the decrease was only 50% (five out of 10). Interestingly, three of the five discordant instances were for allozymes with close to or greater activity than the WT. Care should be taken in the interpretation of these results, as eight of the 17 SIFT predictions were made with low confidence (as indicated in bold in Table 4) as a result of restricted diversity within the comparison set. All eight low confidence predictions were for substitutions predicted to affect protein function.

In their analysis of the molecular evolution of human membrane transporter genes, Shu et al.35 and Leabman et al.23 noted that the most common variants of the membrane transporter genes studied exhibited normal function, and that nonfunctional alleles tended to be rare in the populations they studied. Similar trends were evident for the SULTs we studied. None of the more common variants (allele frequencies >5%) yielded enzyme activities less than 50% of the WT allele in functional assays. Conversely, all of the NS changes that reduced enzyme activities below 20% of the WT allele had MAFs below 2.5%. Nevertheless, the correlations between allele frequency and enzyme function were not perfect. Three rare allozymes in our populations (SULT1A3/4*4, SULT1E1*3 and SULT1E1*4) had enzyme activities more than 50% of WT, and SULT1A3/4*2 (which had an allele frequency of over 4% in our AA population) resulted in an enzyme with only about 25% of normal function in our assays.
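The relationship between allele frequency and function described above can be checked directly against Table 3. The sketch below is illustrative only. It assumes that a variant counts as 'common' when its frequency exceeds 5% in either population, and it compares low-activity variants against the larger of the two population frequencies, which is a conservative stand-in for the combined-sample frequency used in the text.

```python
# (allozyme, AA allele frequency, CA allele frequency, % WT activity) from Table 3.
allozymes = [
    ("SULT1A1*2", 0.271, 0.339, 67), ("SULT1A1*3", 0.169, 0.000, 109),
    ("SULT1A1*5", 0.042, 0.000, 109), ("SULT1A3/4*2", 0.042, 0.000, 28),
    ("SULT1A3/4*3", 0.025, 0.000, 13), ("SULT1A3/4*4", 0.004, 0.000, 59),
    ("SULT1A3/4*5", 0.025, 0.000, 110), ("SULT1C2*2", 0.025, 0.068, 99),
    ("SULT1C2*3", 0.000, 0.011, 14), ("SULT1C2*4", 0.000, 0.008, 15),
    ("SULT1C2*5", 0.000, 0.006, 0), ("SULT1E1*2", 0.008, 0.000, 7),
    ("SULT1E1*3", 0.000, 0.008, 54), ("SULT1E1*4", 0.000, 0.008, 108),
    ("SULT2A1*2", 0.136, 0.000, 93), ("SULT2A1*3", 0.051, 0.000, 57),
    ("SULT2A1*4", 0.008, 0.000, 15),
]

# ASSUMPTION: 'common' means a frequency above 5% in at least one population.
common = [(name, act) for name, aa, ca, act in allozymes if max(aa, ca) > 0.05]
# Variants whose activity falls below 20% of WT, with their highest population frequency.
very_low = [(name, max(aa, ca)) for name, aa, ca, act in allozymes if act < 20]

min_common_activity = min(act for _, act in common)
max_low_activity_freq = max(freq for _, freq in very_low)

print("common variants:", [name for name, _ in common])
print("lowest activity among common variants:", min_common_activity)
print("highest frequency among <20% activity variants:", max_low_activity_freq)
```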


Crystal structure analysis
The PolyPhen program, a web-based tool that utilizes crystal structure data for predicting the functional consequences of amino-acid substitutions, predicted that five of the 17 amino-acid substitutions would be ‘possibly’ or ‘probably damaging’. Of these, only one variant allozyme – SULT1A3/4*3 – showed a significant reduction in enzyme activity (Table 5). Conversely, six allozymes that were predicted to be ‘benign’ had less than 50% of WT activity. Furthermore, the PolyPhen program fared poorly in obtaining the correct WT crystal structure from the PDB for modeling purposes. Although three-dimensional (3-D) structural mapping is not the only parameter used by the PolyPhen algorithm, the predictions might have been more accurate if the appropriate crystal structure had been selected.

Structural modeling of the local chemical and steric environment of each amino-acid substitution using the corresponding WT SULT crystal structure was also unable to predict accurately the function of the variant allozymes (Table 5). Modeling showed that a majority of the substitution sites were located on the surface of the proteins and, as a result, the encoded residues were exposed to the cellular environment (Figure 4). Of the 17 amino-acid substitution locations, only four were near the 3′-phosphoadenosine 5′-phosphosulfate binding site; two of these had undiminished activity, the third (SULT1E1*2) had 7 and 13% of WT activity and protein, respectively, and the fourth (SULT1C2*5) had no activity owing to the absence of any detectable protein. Although alterations in enzyme kinetics were not the major mechanism by which an amino-acid substitution altered function, minor variations in substrate kinetics were observed for two of these four variant allozymes – SULT1E1*2 and SULT1E1*4. Only two substitutions were speculated to be incompatible with WT structure: the large Ser111Phe side chain of SULT1C2*5 and the main-chain conformational constraints for the Ala63Pro of SULT2A1*3 can be accommodated only with some significant local conformational change, which may be incompatible with protein folding and/or stability. SULT1C2*5 has no detectable protein or activity, whereas SULT2A1*3 has 57 and 27% of WT activity and protein, respectively.

Figure 4 Distribution of variant substitution sites in the SULT protein fold. The crystal structure of human SULT1A1 (PDB ID: 1LS6) is shown in a gray ribbon (left) or space-filling sphere (right) representation. Residues altered in SULT variants are shown as Cα on the left, and colored by protein in both panels (cyan for SULT1A1, magenta for 1A3/4, yellow for 1C2, blue for 1E1, orange for 2A1 and green for both 1C2 and 1E1). For reference, the bound PAP (3′-phosphoadenosine 5′-phosphate) is shown in pink.

Discussion

The combination of gene resequencing, functional genomics, amino-acid characterization and crystal structure-based modeling allowed for our complete analysis of the

Table 5 PolyPhen and structural predictions

Allozyme       Amino-acid change   % WT activity   PolyPhen structure   PolyPhen prediction   Structural prediction

SULT1A1*2      Arg213His           67              1E1                  Possibly damaging     Can be accommodated
SULT1A1*3      Met223Val           109             None                 Benign                Can be accommodated
SULT1A1*5      Phe247Leu           109             1A1                  Benign                Can be accommodated, near BS
SULT1A3/4*2    Lys234Asn           28              1A1                  Benign                In unobserved disordered region
SULT1A3/4*3    Pro101Leu           13              None                 Possibly damaging     Can be accommodated
SULT1A3/4*4    Pro101His           59              None                 Possibly damaging     Can be accommodated
SULT1A3/4*5    Arg144Cys           110             None                 Probably damaging     Can be accommodated
SULT1C2*2      Ser255Ala           99              None                 Benign                In unobserved disordered region
SULT1C2*3      Asp60Ala            14              1A1                  Benign                Can be accommodated
SULT1C2*4      Arg73Gln            15              1A1                  Benign                In unobserved disordered region
SULT1C2*5      Ser111Phe           0               None                 Benign                Will alter structure, near BS residue
SULT1E1*2      Asp22Tyr            7               1E1                  Benign                Can be accommodated, near BS
SULT1E1*3      Ala32Val            54              1E1                  Benign                Can be accommodated
SULT1E1*4      Pro253His           108             1E1 and 1A1          Probably damaging     Can be accommodated, near BS residue
SULT2A1*2      Ala261Thr           93              2A1                  Benign                Can be accommodated
SULT2A1*3      Ala63Pro            57              2A1                  Benign                Will alter local structure
SULT2A1*4      Lys227Glu           15              2A1                  Benign                Can be accommodated

Abbreviations: BS, binding site; WT, wild type.
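The PolyPhen calls in Table 5 can be summarized as sensitivity and specificity for detecting allozymes with less than 50% of WT activity. The sketch below is a simple illustration built on the table values. Grouping ‘possibly’ and ‘probably damaging’ into a single positive class is an assumption made here for the tally, and the variable names are hypothetical.

```python
# (allozyme, PolyPhen prediction, % WT activity) from Table 5.
calls = [
    ("SULT1A1*2", "possibly damaging", 67), ("SULT1A1*3", "benign", 109),
    ("SULT1A1*5", "benign", 109), ("SULT1A3/4*2", "benign", 28),
    ("SULT1A3/4*3", "possibly damaging", 13), ("SULT1A3/4*4", "possibly damaging", 59),
    ("SULT1A3/4*5", "probably damaging", 110), ("SULT1C2*2", "benign", 99),
    ("SULT1C2*3", "benign", 14), ("SULT1C2*4", "benign", 15),
    ("SULT1C2*5", "benign", 0), ("SULT1E1*2", "benign", 7),
    ("SULT1E1*3", "benign", 54), ("SULT1E1*4", "probably damaging", 108),
    ("SULT2A1*2", "benign", 93), ("SULT2A1*3", "benign", 57),
    ("SULT2A1*4", "benign", 15),
]

# ASSUMPTION: any non-benign call is treated as a positive ('damaging') prediction.
damaging = [name for name, call, _ in calls if call != "benign"]
damaging_reduced = [name for name, call, act in calls if call != "benign" and act < 50]
benign_reduced = [name for name, call, act in calls if call == "benign" and act < 50]
benign_retained = [name for name, call, act in calls if call == "benign" and act >= 50]

reduced_total = len(damaging_reduced) + len(benign_reduced)
retained_total = len(calls) - reduced_total

sensitivity = len(damaging_reduced) / reduced_total   # reduced allozymes flagged as damaging
specificity = len(benign_retained) / retained_total   # retained allozymes called benign

print(f"damaging calls: {len(damaging)}, of which reduced: {damaging_reduced}")
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```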


SULT gene family at several levels. Our goal was to determine how variation in the coding sequences of SULT genes translates into variation in function and whether alterations in function could be predicted based on evolutionary conservation, physicochemical properties and/or 3-D structure.

In general, evolution of the SULT gene family appeared to be consistent with neutral theory, indicating that most regions of these genes were not under significant selective pressure. However, there was a suggestion of selection against diversity at NS sites compared to S sites within the coding regions of the SULT genes. This shift toward increased diversity at S sites was present for all SULT genes with the exception of SULT6B1. There is a surprising amount of heterogeneity in the genetic diversity of the different SULT genes. It is interesting that SULT4A1 is the most highly conserved through evolution, with over 97.5% sequence identity between mouse, rat, chimp and human SULT4A1 coding sequences. Currently, there is no known substrate for SULT4A1, so the function of this enzyme is not clear, although it is highly expressed in the human brain.36 The most polymorphic member of the family is SULT1A2, with 36 total polymorphisms found in 4010 bp examined. Given the structural and functional redundancy of four members of the SULT1A family, a tolerance for higher divergence in the SULT1A genes is not surprising. Overall, the extent and nature of genetic variation within the SULT gene family was comparable to the sequence variation identified in other gene resequencing studies.23–26

Many of these previous resequencing studies have not focused on gene families, but Leabman et al.23 resequenced 24 genes from a family of membrane transporters. Their data suggested that specific gene locations had differing levels of nucleotide diversity owing to evolutionary constraints resulting from structural requirements for membrane association. The SULTs are cytosolic enzymes and do not have the structural constraints of membrane-bound proteins. This may partly explain why fewer SULT residues have been conserved through mammalian evolution than is observed for the membrane transporters.23,35 Only three of the 17 SULT amino-acid substitutions were alterations of conserved residues in human, chimpanzee, rat and mouse, and two resulted in enzymes that had specific activities at least as high as the WT enzyme. However, the WT Arg at position 73 of SULT1C2*4 has been conserved in fish and birds and it is notable that Gln at this position leads to a dramatic decrease in enzyme activity. The reduced function of SULT1C2*4 is consistent with the hypothesis that amino-acid substitutions at the most evolutionarily conserved (EC) sites are likely to be detrimental to function. However, the lack of an extended correlation between evolutionary conservation and enzyme function in this SULT data set is a departure from the observations of Leabman et al.23 with regard to membrane transporters, and emphasizes that different protein families may be subject to different selection pressures.

The reduction of enzyme activity as a result of substitution at the highly conserved residue 73 in SULT1C2*4 occurs without a corresponding reduction in enzyme protein. In that respect, it is an exception to the general rule for these SULT variants. As shown in Figure 3, the major mechanism by which common, naturally occurring polymorphisms reduced enzyme activity in the SULTs – at least in a transient expression system – was by reducing steady-state levels of SULT proteins. This observation reinforces the conclusions of previous mechanistic studies of how polymorphisms in Phase II drug metabolizing enzymes affect enzyme activity.37–39

Functional genomic studies performed with recombinant protein revealed that seven of 17 variants had enzyme activity decreases of greater than 50%. Our analysis suggested that predictions based on individual amino-acid substitutions had limited value in predicting the functional consequences of the variant allozymes we studied. Each of the approaches tested in this study yielded both false positive and false negative predictions.

We also tested the hypothesis that the use of 3-D structural data might yield additional insight into the functional consequences of polymorphic substitutions. Structural modeling of amino-acid substitutions showed that nearly all of the residues altered by NS SNPs were located on the surfaces of the proteins and were compatible with WT structure (Figure 4). Work on several other pharmacogenetically important enzymes, including thiopurine N-methyltransferase (TPMT), phenylethanolamine N-methyltransferase and SULT1A3/4, has shown that amino-acid substitutions can result in variant allozymes that are rapidly degraded through the proteasome.12,39–41 This process can involve several cellular proteins including molecular chaperones.39 Furthermore, variant allozymes can form aggresomes in the cell as demonstrated by TPMT*3A.42 Aggresomes are complexes of misfolded proteins as well as chaperones, microtubular proteins and histone deacetylase 6. It may be that many amino-acid substitutions alter interactions within the cellular environment, and it is these complex interactions that are responsible for variation in protein level and thus enzyme activity – not alterations in protein structure.

From a practical pharmacogenomics perspective, the most important issue for the SULTs would be the ability to identify low activity alleles that may impair normal drug and hormone metabolism. Allele frequencies for the SULTs do not correlate perfectly with function, but variant alleles with frequencies greater than 5% were uniformly associated with enzyme activities of at least 50% of WT. More importantly, all the allozymes with enzyme activity less than half of the WT allele had allele frequencies of less than 2.5% in our combined sample.

In conclusion, this study has incorporated nucleotide polymorphism, functional genomic data, amino-acid characterizations and structural modeling for an important human gene family as a step toward predictability regarding the function of variant allozymes. The results showed that although there appears to be pressure against amino-acid substitutions in the SULT family, no single predictive method – including crystal structure modeling – is able to

The Pharmacogenomics Journal Predicting the function of amino-acid substitutions MAT Hildebrandt et al 141

accurately and reliably predict the functional consequences of amino-acid substitutions as measured in these genotype-to-phenotype studies of recombinant proteins. Additional understanding will be required to develop better predictors of ways in which variation at the nucleotide level translates into variation in function for the SULT gene family. Further work will also be needed to understand how accurately the individual differences in and function measured here can predict drug responses in the context of the complex biological processes that exist in individual patients.

Materials and methods

DNA samples
DNA samples were obtained from the Coriell Cell Repository (Camden, NJ, USA). Specifically, 59 samples each from two Human Variation Panels, HD100CAU and HD100AA, were used. Resequencing data for individual Coriell samples are available through the PharmGKB web site (see below). All subjects had provided written informed consent for the use of their DNA for research purposes, and the present studies were reviewed and approved by the Mayo Clinic Institutional Review Board.

SULT gene resequencing
PCR was used to amplify each of the SULT genes. The amplifications included all exons and splice junctions as well as portions of introns and flanking regions. Amplicons were sequenced using dye primer sequencing chemistry. All polymorphisms observed in only a single DNA sample were verified by re-amplification and sequencing to rule out the presence of PCR-induced artifacts. At each location, the WT allele was designated as the allele common in the AA population.

Resequencing data reported in this paper have been deposited in the PharmGKB database (www.PharmGKB.org) with the following accession IDs: SULT1A1 – PA343,

Table 6 Accession numbers and probe substrates

Subfamily  Gene       Species  Ref. seq.           Probe substrate (concentration)  PDB ID

SULT1A     SULT1A1    Human    NP_001046           4-nitrophenol (4 µM)             1LS6
                      Chimp    ENSPTRP00000013543
                      Mouse    NP_598431
                      Rat      NP_114022
           SULT1A2    Human    NP_001045
                      Chimp    ENSPTRP00000013540
           SULT1A3/4  Human    NP_003157           Dopamine (40 µM)                 1CJM
                      Chimp    ENSPTRP00000013656
SULT1C     SULT1C1    Mouse    NP_061221
                      Rat      NP_113920
           SULT1C2    Human    NP_001047           4-nitrophenol (10 µM)            1ZHE
                      Mouse    NP_081211
                      Rat      CAB41460
           SULT1C3    Rat      CAB41461
                      Chimp    ENSPTRP00000021107
           SULT1C4    Human    NP_006579
                      Chimp    ENSPTRP00000021109
SULT1E     SULT1E1    Human    NP_005411           17β-estradiol (50 nM)            1G3M, 1HY3
                      Chimp    ENSPTRP00000027719
                      Mouse    NP_075624
           ESTr       Rat      AAA41128
           rEST-3     Rat      AAB33441
           Ste-1      Rat      AAB07680
           Ste-2      Rat      AAB07681
           SULT1E2    Rat      AAB33442
SULT2A     SULT2A1    Human    NP_003158           (5 µM)                           1OV4, 1J99
                      Mouse    P52843
           ST-20      Rat      AAA41356
           ST-21a     Rat      BAA03632
           ST21b      Rat      BAA03633
           SULT2A2    Mouse    NP_03312
           ST-40      Rat      AAA42183
           ST-41      Rat      CAA45007
           SULT2A3    Rat      AAB57741
           SULT2A4    Rat      BAA03634
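
The reference sequences in Table 6 feed the EC/EU classification described under Amino-acid characterization, in which a residue is EC when it is identical in the subfamily alignment for all species. A minimal Python sketch of that rule follows, assuming the sequences have already been aligned to equal length. The function name, the toy sequences and the choice to treat alignment gaps as EU are illustrative assumptions and are not taken from the study.

```python
def classify_position(alignment, column):
    """Label one alignment column as 'EC' (conserved) or 'EU' (unconserved).

    alignment -- dict mapping species name to an aligned sequence string
    column    -- zero-based column index in the alignment
    """
    residues = {seq[column] for seq in alignment.values()}
    # Assumption: a column containing a gap ('-') is never called conserved.
    if len(residues) == 1 and "-" not in residues:
        return "EC"
    return "EU"


# Hypothetical toy alignment; these are NOT real SULT sequences.
toy_alignment = {
    "human": "MRLK",
    "chimp": "MRLK",
    "mouse": "MRIK",
    "rat":   "MR-K",
}

labels = [classify_position(toy_alignment, i) for i in range(4)]
print(labels)
```

In this toy case only the third column differs between species, so it is the only one labeled EU.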


SULT1A2 – PA341, SULT1A3/1A4 – PA344, SULT1B1 – PA415, SULT1C2 (previously SULT1C1, ref. 5) – PA345, SULT1C4 (previously SULT1C2, ref. 5) – PA414, SULT1E1 – PA340, SULT2A1 – PA346, SULT2B1 – PA36249, SULT4A1 – PA412, SULT6B1 – PA128505849.

SULT1A1 functional genomics
Expression constructs for three variant SULT1A1 allozymes (Arg213His, Met223Val and Phe247Leu) were used to transfect COS-1 cells, which were co-transfected with a β-galactosidase construct (Promega, Madison, WI, USA). COS-1 cell lysates from these transfections were used to assay levels of enzyme activity, using 4 µM 4-nitrophenol as the substrate, and of immunoreactive protein as described previously.9

Data analysis
Because SULT1A3 and SULT1A4 are 99.99% identical in sequence,8 both copies were amplified and sequenced simultaneously, so we were unable to assign the SNPs in these two genes to a specific gene. Therefore, polymorphism data from these two genes were removed from statistical analysis.

NS and S sites were calculated based on the number of fourfold, twofold and nondegenerate sites described by Hartl and Clark.43 Nucleotide diversity (π), the neutral parameter (θ, corrected for length sequenced) and Tajima's D were calculated as described by Tajima30 for each gene and gene region. All π and θ values are expressed as parameter estimates × 10⁻⁴/bp ± s.e.

Amino-acid characterization
SULT sequence alignments were created using the AliBee tool from the GeneBee website (www.genebee.msu.su/genebee.html). A list of reference sequences used for the analysis is presented in Table 6. These alignments were used to classify amino-acid residues as EC or EU. EC residues were designated as locations that were identical in the subfamily alignment for all species. The multiple sequence alignments were also used to query the SIFT database (http://blocks.fhcrc.org/sift/SIFT.html).33 SIFT scores, Grantham numbers and BLOSUM62 values were calculated for each SULT NS SNP identified during gene resequencing.31,32

Crystal structure analysis
The web-based program PolyPhen (http://www.bork.embl-heidelberg.de/PolyPhen/) was used to predict the possible consequences of amino-acid substitutions on protein structure and function based on queries of the PDB.44 Additional crystal structure modeling of the local chemical and steric environment of each NS SNP was performed using the interactive crystallographic graphics software program 'O'.45 The modeling was performed by computational mutagenesis: the appropriate WT SULT crystal structure deposited in the PDB was used as a scaffold, the side chain was substituted to correspond to the NS SNP, and the chemical and steric compatibility of the local structural environment with the new side chain was analyzed.

Duality of interest
None.

Abbreviations
AA    African-American
CA    Caucasian-American
SULT  sulfotransferase
SNP   single nucleotide polymorphism
NS    nonsynonymous
S     synonymous
WT    wild type
MAF   minor allele frequency
EC    evolutionarily conserved
EU    evolutionarily unconserved

Acknowledgments
We thank Luanne Wussow for her assistance with the manuscript. This work was supported in part by the National Institutes of Health (NIH) Grants RO1 GM35720 (MATH, BAT and RMW) and UO1 GM61388, The Pharmacogenetics Research Network (MATH, DPC, BAT, BWE, DJS, VCY, RMW and EDW).

References
1 Kruglyak L, Nickerson DA. Variation is the spice of life. Nat Genet 2001; 27: 234–236.
2 Falany CN. Enzymology of human cytosolic sulfotransferases. FASEB J 1997; 11: 206–216.
3 Coughtrie MW. Sulfation through the looking glass – recent advances in sulfotransferase research for the curious. Pharmacogenomics J 2002; 2: 297–308.
4 Weinshilboum RM, Otterness DM, Aksoy IA, Wood TC, Her C, Raftogianis RB. Sulfation and sulfotransferases 1: sulfotransferase molecular biology: cDNAs and genes. FASEB J 1997; 11: 3–14.
5 Blanchard RL, Freimuth RR, Buck J, Weinshilboum RM, Coughtrie MW. A proposed nomenclature system for the cytosolic sulfotransferase (SULT) superfamily. Pharmacogenetics 2004; 14: 199–211.
6 Adjei AA, Thomae BA, Prondzinski JL, Eckloff BW, Wieben ED, Weinshilboum RM. Human estrogen sulfotransferase (SULT1E1) pharmacogenomics: gene resequencing and functional genomics. Br J Pharmacol 2003; 139: 1373–1382.
7 Freimuth RR, Eckloff B, Wieben ED, Weinshilboum RM. Human sulfotransferase SULT1C1 pharmacogenetics: gene resequencing and functional genomic studies. Pharmacogenetics 2001; 11: 747–756.
8 Hildebrandt MA, Salavaggione OE, Martin YN, Flynn HC, Jalal S, Wieben ED et al. Human SULT1A3 pharmacogenetics: gene duplication and functional genomic studies. Biochem Biophys Res Commun 2004; 321: 870–878.
9 Raftogianis RB, Wood TC, Weinshilboum RM. Human phenol sulfotransferases SULT1A2 and SULT1A1: genetic polymorphisms, allozyme properties, and human liver genotype–phenotype correlations. Biochem Pharmacol 1999; 58: 605–616.
10 Raftogianis RB, Wood TC, Otterness DM, Van Loon JA, Weinshilboum RM. Phenol sulfotransferase pharmacogenetics in humans: association of common SULT1A1 alleles with TS PST phenotype. Biochem Biophys Res Commun 1997; 239: 298–304.
11 Thomae BA, Eckloff BW, Freimuth RR, Wieben ED, Weinshilboum RM. Human sulfotransferase SULT2A1 pharmacogenetics: genotype-to-phenotype studies. Pharmacogenomics J 2002; 2: 48–56.
12 Thomae BA, Rifki OF, Theobald MA, Eckloff BW, Wieben ED, Weinshilboum RM. Human catecholamine sulfotransferase (SULT1A3) pharmacogenetics: functional genetic polymorphism. J Neurochem 2003; 87: 809–819.


13 Wood TC, Her C, Aksoy I, Otterness DM, Weinshilboum RM. Human dehydroepiandrosterone sulfotransferase pharmacogenetics: quantitative Western analysis and gene sequence polymorphisms. J Steroid Biochem Mol Biol 1996; 59: 467–478.
14 Bidwell LM, McManus ME, Gaedigk A, Kakuta Y, Negishi M, Pedersen L et al. Crystal structure of human catecholamine sulfotransferase. J Mol Biol 1999; 293: 521–530.
15 Dajani R, Cleasby A, Neu M, Wonacott AJ, Jhoti H, Hood AM et al. X-ray crystal structure of human dopamine sulfotransferase, SULT1A3. Molecular modeling and quantitative structure–activity relationship analysis demonstrate a molecular basis for sulfotransferase substrate specificity. J Biol Chem 1999; 274: 37862–37868.
16 Gamage NU, Duggleby RG, Barnett AC, Tresillian M, Latham CF, Liyou NE et al. Structure of a human carcinogen-converting enzyme, SULT1A1. Structural and kinetic implications of substrate inhibition. J Biol Chem 2003; 278: 7655–7662.
17 Lee KA, Fuda H, Lee YC, Negishi M, Strott CA, Pedersen LC. Crystal structure of human cholesterol sulfotransferase (SULT2B1b) in the presence of pregnenolone and 3′-phosphoadenosine 5′-phosphate. Rationale for specificity differences between prototypical SULT2A1 and the SULT2B1 isoforms. J Biol Chem 2003; 278: 44593–44599.
18 Pedersen LC, Petrotchenko E, Shevtsov S, Negishi M. Crystal structure of the human estrogen sulfotransferase–PAPS complex: evidence for catalytic role of Ser137 in the sulfuryl transfer reaction. J Biol Chem 2002; 277: 17928–17932.
19 Pedersen LC, Petrotchenko EV, Negishi M. Crystal structure of SULT2A3, human hydroxysteroid sulfotransferase. FEBS Lett 2000; 475: 61–64.
20 Shevtsov S, Petrotchenko EV, Pedersen LC, Negishi M. Crystallographic analysis of a hydroxylated polychlorinated biphenyl (OH-PCB) bound to the catalytic estrogen binding site of human estrogen sulfotransferase. Environ Health Perspect 2003; 111: 884–888.
21 Rehse PH, Zhou M, Lin SX. Crystal structure of human dehydroepiandrosterone sulphotransferase in complex with substrate. [erratum appears in Biochem J 2002 Jun 15;364(Pt 3):888]. Biochem J 2002; 364(Part 1): 165–171.
22 Chang HJ, Shi R, Rehse P, Lin SX. Identifying androsterone (ADT) as a cognate substrate for human dehydroepiandrosterone sulfotransferase (DHEA-ST) important for steroid homeostasis: structure of the enzyme–ADT complex. J Biol Chem 2004; 279: 2689–2696.
23 Leabman MK, Huang CC, DeYoung J, Carlson EJ, Taylor TR, de la Cruz M et al. Natural variation in human membrane transporter genes reveals evolutionary and functional constraints. Proc Natl Acad Sci USA 2003; 100: 5896–5901.
24 Halushka MK, Fan JB, Bentley K, Hsie L, Shen N, Weder A et al. Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis. Nat Genet 1999; 22: 239–247.
25 Cargill M, Altshuler D, Ireland J, Sklar P, Ardlie K, Patil N et al. Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat Genet 1999; 22: 231–238.
26 Stephens JC, Schneider JA, Tanguay DA, Choi J, Acharya T, Stanley SE et al. Haplotype variation and linkage disequilibrium in 313 human genes. Science 2001; 293: 489–493.
27 Freudenberg-Hua Y, Freudenberg J, Kluck N, Cichon S, Propping P, Nothen MM. Single nucleotide variation analysis in 65 candidate genes for CNS disorders in a representative sample of the European population. Genome Res 2003; 13: 2271–2276.
28 King JL, Jukes TH. Non-Darwinian evolution. Science 1969; 164: 788–798.
29 Kimura M. Evolutionary rate at the molecular level. Nature 1968; 217: 624–626.
30 Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 1989; 123: 585–595.
31 Grantham R. Amino acid difference formula to help explain protein evolution. Science 1974; 185: 862–864.
32 Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci USA 1992; 89: 10915–10919.
33 Ng PC, Henikoff S. Predicting deleterious amino acid substitutions. Genome Res 2001; 11: 863–874.
34 Ng PC, Henikoff S. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res 2003; 31: 3812–3814.
35 Shu Y, Leabman MK, Feng B, Mangravite LM, Huang CC, Stryke D et al. Evolutionary conservation predicts function of variants of the human organic cation transporter, OCT1. Proc Natl Acad Sci USA 2003; 100: 5902–5907.
36 Liyou NE, Buller KM, Tresillian MJ, Elvin CM, Scott HL, Dodd PR et al. Localization of a brain sulfotransferase, SULT4A1, in the human and rat brain: an immunohistochemical study. J Histochem Cytochem 2003; 51: 1655–1664.
37 Weinshilboum R, Adjei AA. Sulfate conjugation: pharmacogenetics and pharmacogenomics. In: Pacifici GM, Coughtrie MW (eds). Human Cytosolic Sulfotransferases. Taylor and Francis: Boca Raton, FL, 2005 pp 61–78.
38 Weinshilboum R, Wang L. Pharmacogenetics: inherited variation in amino acid sequence and altered protein quantity. Clin Pharmacol Ther 2004; 75: 253–258.
39 Wang L, Sullivan W, Toft D, Weinshilboum R. Thiopurine S-methyltransferase pharmacogenetics: chaperone protein association and allozyme degradation. Pharmacogenetics 2003; 13: 555–564.
40 Salavaggione OE, Wang L, Wiepert M, Yee VC, Weinshilboum RM. Thiopurine S-methyltransferase pharmacogenetics: variant allele functional and comparative genomics. Pharmacogenet Genomics 2005; 15: 801–815.
41 Ji Y, Salavaggione OE, Wang L, Adjei AA, Eckloff B, Wieben ED et al. Human phenylethanolamine N-methyltransferase pharmacogenomics: gene re-sequencing and functional genomics. J Neurochem 2005; 95: 1766–1776.
42 Wang L, Nguyen TV, McLaughlin RW, Sikkink LA, Ramirez-Alvarado M, Weinshilboum RM. Human thiopurine S-methyltransferase pharmacogenetics: variant allozyme misfolding and aggresome formation. Proc Natl Acad Sci USA 2005; 102: 9394–9399.
43 Hartl DL, Clark AG. Principles of Population Genetics, 3rd edn. Sinauer Associates Inc.: Sunderland, MA, 2000.
44 Ramensky V, Bork P, Sunyaev S. Human non-synonymous SNPs: server and survey. Nucleic Acids Res 2002; 30: 3894–3900.
45 Jones TA, Zou JY, Cowan SW, Kjeldgaard M. Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallogr A 1991; 47(Part 2): 110–119.
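
As an illustration of the statistics named under Data analysis, nucleotide diversity (π), the neutral parameter (θ) and Tajima's D can be computed from the number of chromosomes sampled and the minor-allele count at each segregating site, using the formulas of Tajima.30 The Python sketch below assumes biallelic sites and reports π and θ per base pair. The function name and the toy counts are hypothetical and are not data or code from this study.

```python
import math


def tajima_summary(n, minor_counts, length_bp):
    """Return (pi per bp, theta per bp, Tajima's D).

    n            -- number of chromosomes sampled (must be >= 4)
    minor_counts -- minor-allele count at each biallelic segregating site
    length_bp    -- number of base pairs surveyed
    """
    s = len(minor_counts)  # number of segregating sites
    if n < 4 or s == 0:
        raise ValueError("need n >= 4 and at least one segregating site")

    # Mean number of pairwise differences, summed over sites.
    k_hat = sum(2.0 * j * (n - j) / (n * (n - 1)) for j in minor_counts)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    theta_w = s / a1  # Watterson-type estimate for the whole region
    d = (k_hat - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return k_hat / length_bp, theta_w / length_bp, d


# Hypothetical example: 4 chromosomes, two segregating sites in 1000 bp.
pi, theta, tajima_d = tajima_summary(4, [1, 2], 1000)
print(pi, theta, tajima_d)
```

Multiplying the returned π and θ by 10⁴ gives values on the scale used for reporting in the text. A negative D reflects an excess of rare alleles and a positive D an excess of intermediate-frequency alleles.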
