Androgen Receptor Binding Sites Identified by a GREF_GATA Model
doi:10.1016/j.jmb.2005.09.009
J. Mol. Biol. (2005) 353, 763–771

COMMUNICATION

Katsuaki Masuda1, Thomas Werner2, Shilpi Maheshwari1, Matthias Frisch2, Soyon Oh1, Gyorgy Petrovics1, Klaus May2, Vasantha Srikantan1, Shiv Srivastava1 and Albert Dobi1*

1Center for Prostate Disease Research, Department of Surgery, Uniformed Services University, Rockville, MD 20852, USA
2Genomatix Software GmbH, D-80339 Munich, Germany

Changes in transcriptional regulation can be permissive for tumor progression by allowing for selective growth advantage of tumor cells. Tumor suppressors can effectively inhibit this process. The PMEPA1 gene, a potent inhibitor of prostate cancer cell growth, is an androgen-regulated gene. We addressed the question of whether or not androgen receptor can directly bind to specific PMEPA1 promoter upstream sequences. To test this hypothesis we extended in silico prediction of androgen receptor binding sites by a modeling approach and verified the actual binding by in vivo chromatin immunoprecipitation assay. Promoter upstream sequences of highly androgen-inducible genes were examined from microarray data of prostate cancer cells for transcription factor binding sites (TFBSs). Results were analyzed to formulate a model for the description of specific androgen receptor binding site context in these sequences. In silico analysis and subsequent experimental verification of the selected sequences suggested that a model that combined a GREF and a GATA TFBS was sufficient for predicting a class of functional androgen receptor binding sites. The GREF matrix family represents androgen receptor, glucocorticoid receptor and progesterone receptor binding sites and the GATA matrix family represents GATA binding protein 1–6 binding sites. We assessed the regulatory sequences of the PMEPA1 gene by comparing our model-based GREF_GATA predictions to weight matrix-based predictions.
Androgen receptor binding to predicted promoter upstream sequences of the PMEPA1 gene was confirmed by chromatin immunoprecipitation assay. Our results suggested that androgen receptor binding to cognate elements was consistent with the GREF_GATA model. In contrast, using only single GREF weight matrices resulted in additional matches, apparently false positives. Our findings indicate that complex models based on datasets selected by biological function can be superior predictors as they recognize TFBSs in their functional context.

© 2005 Elsevier Ltd. All rights reserved.

Keywords: model; androgen receptor; prostate cancer; promoter database; GATA factor

*Corresponding author. E-mail address of the corresponding author: [email protected]

Abbreviations used: TFBS, transcription factor binding site; AR, androgen receptor; ChIP, chromatin immunoprecipitation; GREF, glucocorticoid responsive element matrix family; ARE, androgen responsive element.

Alterations in transcription regulation are hallmarks of cancer progression. Decreases in the expression of specific genes that control cell proliferation may significantly contribute to malignant transformation. The PMEPA1 gene has been recognized as a potent cell growth inhibitor of prostate cancer cells,1 with decreased expression of PMEPA1 associated with higher pathologic stage in prostate cancer specimens. Function for PMEPA1 in the ubiquitin-proteasome pathway is suggested by shared structural similarities between PMEPA1 and Nedd4-binding proteins. Interestingly, PMEPA1 gene expression was shown to be regulated by androgens in prostate cancer cells.2–4 In order to assess androgen receptor (AR) binding to regulatory elements of the PMEPA1 gene we integrated computational biology with wet-lab experimentation.

Computational molecular biology of gene regulation is an emerging frontier of bioinformatics.5 In the past, conclusive identification of functionally active transcription factor binding sites (TFBSs) relied mainly on laborious wet-lab experimentation. However, recently chromatin immunoprecipitation (ChIP)-based assays such as "ChIP-chip" and ChIP Display were shown to be robust techniques to assess binding sites experimentally genome-wide.6,7 In support of these efforts, robust platforms are currently available for in vitro selection of single TFBSs to assess optimal cognate sites for various elements.8–10 Although bioinformatic tools are available for the prediction of TFBSs,11 we preferred a weight matrix approach. However, the application of any weight matrix approach to long sequences is hampered by a large number of false positive predictions. Although examining the evolutionary conservation of transcription factor binding can enhance the specificity of in silico predictions,12,13 this technique can be confounded by species-specific genes. Notably, many human cancer-related genes, as illustrated by prostate-specific antigen (PSA, KLK3), barely resemble their orthologs from other mammalian species.14 Among mammals the pathogenesis of prostate cancer (CaP) is unique to man. The only other mammal known to develop CaP in a similar fashion is the canine.15

Fortunately, biological context can significantly enhance the quality of TFBS models.16,17 Biologically linked sets of transcription factors observed in the combination of specific TFBSs can significantly improve prediction quality.5 Consistent with this notion, we selected a training set of genes based on gene expression kinetics and dose–response of prostate cancer cells to androgen hormone stimulus.4 Furthermore, we reasoned that gene regulation by the combination of closely situated binding sites is an essential feature of functional transcription regulatory regions.18,19 Therefore, we formulated an androgen receptor-binding model from the combination of adjacent matrices. We experimentally verified this model and went on to assess the utility of the model by testing upstream sequences of the androgen-inducible gene, PMEPA1.2

Defining a model for androgen receptor binding in the context of prostate-specific hormone induction

Upstream sequences of highly androgen-inducible genes were selected as a training set from our previous oligonucleotide array experiments in order to formulate a model that can predict functional androgen receptor binding sites in the context of androgen-induced prostate cancer cells (Table 1). Four genes were selected that stood out from the list of androgen-inducible genes both by the robustness of gene expression response (KLK3 = 58× and NDRG1 = 13×) and by kinetic considerations (IHPK1 = 9× in 1 h and NKX3.1 = 5× in 6 h).4

Table 1. Androgen-inducible gene training set

Gene         Accession   Locus      Contig      Absolute positions (strand)
KLK3 (PSA)   HT2351      19q13.41   NM_145864   23,624,361–23,626,361 (+)
NDRG1        D87953      8q24.3     NT_008046   47,529,674–47,527,674 (−)
NKX3.1       U80669      8p21       NT_023666   1,916,763–1,914,763 (−)
IHPK1        D87452      3p21.31    NT_022517   49,711,601–49,709,601 (−)

Gene symbols, accession numbers and chromosomal locations (Homo sapiens NCBI Build 35, http://www.ncbi.nlm.nih.gov) are shown.

Extraction of 5′ upstream sequences relative to the transcription start sites of training set genes between positions −2000 and +1 was accomplished by the ElDorado system (Genomatix, Munich)† that is based on the genome assemblies from NCBI (Homo sapiens NCBI Build 35‡). Sequences were verified by database searches in the human genome. In this process upstream sequences of KLK3 and KLK2 genes were found to show high degrees of identity (approx. 80% within −1000 to +1). Although we previously found that KLK2 was a highly androgen-inducible gene, KLK2 was excluded from the training set to avoid sequence redundancy that may introduce bias into the model prediction. The training set was analyzed with MatInspector20 to identify potential androgen receptor binding sites. Androgen receptor can recognize DNA sequences represented by the glucocorticoid responsive element matrix family (GREF) that includes matrices of androgen receptor, glucocorticoid receptor and progesterone receptor binding sites. Two GREFs were identified in KLK3, one in NDRG1, two in NKX3.1 and one within the IHPK1 upstream sequences. In addition to the GREF family the program found 27 other matrix families to match within the sequences. The position of each predicted site was assessed relative to the GREF positions. The distance criterion for the assessment was 20–33 base-pairs between the centers of two adjacent sites (within 2–3 DNA helical turns). Only the GATA matrix family matches showed distance correlation to the GREF matrix family matches (Figure 1). Indeed, the identification of potential GATA binding

Launcher13 and was successfully checked for

† http://www.genomatix.de
‡ http://www.ncbi.nlm.nih.gov

Figure 1. Matrices within the training set sequences. (a) Position of matches of GREF (encircled filled arrows) and GATA (open arrowheads) matrix families within the training set of sequences. Arrowheads indicate the strand orientation. Adjacent GREF and GATA matrices (min. 20 and max. 33 bp interval between the centers of matrices) and the corresponding matrix anchor positions are boxed. (b) List of all matches of matrix families identified within the training set sequences. Numbers indicate how often a given matrix matched within the pre-defined range of a GREF matrix match (20–33 bp). Only GATA matrix family matches fulfilled this condition in all four sequences.
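The adjacency criterion described above (a match of another matrix family lying 20–33 bp, center to center, from a GREF match, and recurring in every training sequence) can be illustrated with a minimal Python sketch. This is not the MatInspector implementation. The names `Match`, `count_partners` and `families_in_all`, the family label "OTHER" and all positions in the toy input are hypothetical and invented for illustration. The sketch also assumes inclusive distance bounds and ignores strand orientation.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Match:
    family: str  # matrix family label, e.g. "GREF" or "GATA"
    center: int  # center of the match, in bp along the sequence


def count_partners(matches: Iterable[Match], anchor: str = "GREF",
                   min_dist: int = 20, max_dist: int = 33) -> Dict[str, int]:
    """Count, per non-anchor family, the matches whose center lies
    min_dist..max_dist bp (inclusive, an assumption) from any anchor match."""
    matches = list(matches)
    anchors = [m.center for m in matches if m.family == anchor]
    counts: Dict[str, int] = {}
    for m in matches:
        if m.family == anchor:
            continue
        if any(min_dist <= abs(m.center - a) <= max_dist for a in anchors):
            counts[m.family] = counts.get(m.family, 0) + 1
    return counts


def families_in_all(sequences: Dict[str, List[Match]],
                    anchor: str = "GREF") -> Set[str]:
    """Families with at least one in-range partner of the anchor
    family in every sequence of the set."""
    per_seq = [set(count_partners(ms, anchor)) for ms in sequences.values()]
    return set.intersection(*per_seq) if per_seq else set()


# Hypothetical toy input: positions are invented, not taken from the paper.
toy = {
    "seq1": [Match("GREF", 100), Match("GATA", 125), Match("OTHER", 300)],
    "seq2": [Match("GREF", 500), Match("GATA", 470), Match("OTHER", 520)],
    "seq3": [Match("GREF", 50), Match("GATA", 83), Match("GATA", 200)],
    "seq4": [Match("GREF", 900), Match("GATA", 880)],
}

print(families_in_all(toy))  # prints {'GATA'}
```

In this toy input, "OTHER" falls within range of a GREF match in only one sequence and is therefore discarded, whereas "GATA" satisfies the criterion in all four, mirroring the kind of filtering summarized in Figure 1(b).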