Rapid Publication
Functional Genomics of the Endocrine Pancreas
The Pancreas Clone Set and PancChip, New Resources for Diabetes Research
L. Marie Scearce,1 John E. Brestelli,1 Shannon K. McWeeney,2 Catherine S. Lee,1 Joan Mazzarelli,2 Deborah F. Pinney,2 Angel Pizarro,2 Christian J. Stoeckert, Jr.,2 Sandra W. Clifton,3 M. Alan Permutt,4 Juliana Brown,5 Douglas A. Melton,5 and Klaus H. Kaestner1

Over the past 5 years, microarrays have greatly facilitated large-scale analysis of gene expression levels. Although these arrays were not specifically geared to represent tissues and pathways known to be affected by diabetes, they have been used in both type 1 and type 2 diabetes research. To prepare a tool that is particularly useful in the study of type 1 diabetes, we have assembled a nonredundant set of 3,400 clones representing genes expressed in the mouse pancreas or pathways known to be affected by diabetes. We have demonstrated the usefulness of this clone set by preparing a cDNA glass microarray, the PancChip, and using it to analyze pancreatic gene expression from embryonic day 14.5 through adulthood in mice. The clone set and corresponding array are useful resources for diabetes research. Diabetes 51:1997–2004, 2002

Microarray analysis has been used in studies of both type 1 (1,2) and type 2 diabetes (3–8). These studies made use of commercial, high-density oligonucleotide arrays as well as cDNA glass microarrays and filter arrays. Because array design is often directed by clone availability without consideration of the tissues relevant to diabetes research, most of the elements did not show any differential expression in diabetes studies. To overcome these limitations, and to provide a cost-effective resource to the diabetes research community, we have developed a 3.4K clone set that contains 3,139 clones representing mRNAs expressed in the pancreas, 231 clones representing pathways affected by diabetes, and 30 clones representing housekeeping genes. The core pancreas clone set of 3,139 cDNAs was assembled using both experimental and computational approaches to select clones that were expressed in islets and pancreas. We developed a glass cDNA microarray, the “PancChip,” based on the 3.4K clone set. Subsequently, we used the PancChip to study gene expression in the developing mouse pancreas between embryonic day (E) 14.5 and adulthood. The 3.4K clone set used to prepare the PancChip is available to the diabetes research community through the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK)-funded biotechnology centers.

From the 1Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania; the 2Center for Bioinformatics, University of Pennsylvania, Philadelphia, Pennsylvania; the 3Genome Sequencing Center, Washington University, St. Louis, Missouri; the 4Department of Internal Medicine, Washington University, St. Louis, Missouri; and the 5Department of Molecular and Cellular Biology, Harvard University, Boston, Massachusetts.
Address correspondence and reprint requests to Klaus H. Kaestner, Department of Genetics, University of Pennsylvania, 415 Curie Blvd., Philadelphia, PA 19104. E-mail: [email protected].
Received for publication 2 January 2002 and accepted in revised form 6 May 2002. Posted on the World Wide Web at http://diabetes.diabetesjournals.org/rapidpubs.shtml on 7 June 2002.
DoTS, Database of Transcribed Sequences; EST, expressed sequence tag; GO, Gene Ontology; GUS, Genomics Unified Schema; NIDDK, National Institute of Diabetes and Digestive and Kidney Diseases.

RESEARCH DESIGN AND METHODS

Glass microarray preparation. The bacterial clones selected for the array were grown to confluence overnight in flat bottom 96-well square block plates with a volume of 1 ml LB-Amp medium per well. Plasmid DNA was prepared for each clone using a Qiagen 3000 robot, and the purified DNA was used as template to amplify the inserts with PCR. The clones were assigned a PCR score of “pass,” indicating a single strong band, or “fail” if there were no bands or multiple bands. For the subgroup of genes named “select array,” primer pairs were synthesized for each gene and the cDNA amplified from a mixture of intestine, liver, and pancreas RNA using RT-PCR. All PCR products were purified using Millipore Multiscreen PCR Cleanup 96-well plates, resuspended in 50 µl deionized sterile water, diluted with an equal volume of DMSO (Sigma), and printed on poly-L-lysine–coated slides with an Affymetrix 417 arrayer. There are 3,840 elements on the PancChip (Table 1), including pancreas-specific clones, clones representing signal transduction pathways, controls, and blanks (9). Information about the PancChip (Version 2.0) is available at http://www.cbil.upenn.edu/EPConDB/pancChip.html.

Preparation of RNA. Six adult CD1 female mice were killed, and the pancreas and heart were immediately homogenized in 10 ml denaturing solution (4 mol/l guanidinium thiocyanate, 0.1 mol/l Tris-Cl pH 7.5, 1% β-mercaptoethanol) per organ. Total RNA was extracted using the acid-phenol extraction method (10). Mouse islets, as well as embryonic and newborn pancreas and liver tissues, were immediately homogenized in 1 ml TRIzol Reagent (Gibco). RNA was purified following the manufacturer’s protocols with the exception that 20 µg glycogen (Roche) was added to each sample. Subsequently, RNA pellets were washed with 75% ethanol and resuspended in 300 µl TES (10 mmol/l Tris pH 7.5, 1 mmol/l EDTA, 0.1% SDS). The RNA was re-extracted with 600 µl phenol:chloroform:isoamyl alcohol (25:24:1), precipitated with 1/10 volume 3 mol/l sodium acetate and 3 volumes ethanol, and stored at −80°C until use. For the developmental time course, a common control was prepared by pooling samples. The common control pool consisted of 2.5 µg of each of the E16.5 and E18.5 samples for a total of 15 µg from each of these time points, as well as 5.3 µg of each of the newborn, P7, and adult samples for a total of 32 µg from each of these later time points. For our initial
expression survey with Incyte mouse GEM 1 (1.12 for newborn pancreas) and

DIABETES, VOL. 51, JULY 2002 1997 FUNCTIONAL GENOMICS OF THE ENDOCRINE PANCREAS

human GEM arrays, labeling, hybridization, and signal quantification were performed by the manufacturer using 200 µg total RNA supplied by us. For the experiments performed using the PancChip, these steps were performed as described below.

TABLE 1
The clone sets used to make the PancChip

Clone set            Clones   Source              % PCR success   Successes   Total reactions
Pancreas clone set   3139     Incyte Genomics     81              2546        3139
Housekeeping         30       Research Genetics   90              27          30
Pathways             231      Incyte Genomics     68              157         231
Yeast controls       16       Incyte Genomics     63              10          16
In-house             108      In-house            49              53          108
Select               153      In-house            79              119         150
Blanks               153
Anchors              10
Total                3840                         79              2912        3674

RNA labeling. cDNAs were labeled with the 3DNA Submicro Array kit (Genisphere) according to the manufacturer’s protocol and recommendations. Total RNA (2.0 µg) and 2 pmoles Cy3 capture sequence primer or Cy5 capture sequence primer were brought to 10 µl with diethyl pyrocarbonate (DEPC)-treated water and incubated for 10 min at 80°C. The RNA mixture was then cooled to 42°C. An equal volume of reaction mix (2× first-strand buffer [Invitrogen], 1 mmol/l dATP, 1 mmol/l dGTP, 1 mmol/l dCTP, 1 mmol/l dTTP, 20 mmol/l dithiothreitol [DTT], 40 units RNasin [Promega], and 200 units Superscript II reverse transcriptase [Invitrogen]) was added, and the reaction was incubated for 2 h at 42°C. The reaction was terminated by bringing it to 0.074 N NaOH and 7.4 mmol/l EDTA and incubating at 65°C for 10 min. Finally, the reaction was neutralized by adjusting it to 0.175 mol/l Tris-Cl, pH 7.5. The Cy3 and Cy5 reactions were combined and precipitated with 20 µg linear polyacrylamide (Ambion), 1 volume 7.5 mol/l ammonium acetate, and 9 volumes ethanol at −20°C for 30 min. Following precipitation, the pellet was air dried.

Prehybridization. Prehybridization was performed for all arrays (9). A Coplin jar containing 50 ml prehybridization buffer (5× SSC, 0.1% SDS, and 1% BSA) was brought to 45°C. The arrays were incubated for 45 min at 45°C, rinsed five times in deionized water at room temperature, rinsed once in isopropanol, and then placed into a 50-ml conical tube and centrifuged 1 min at 1,000 rpm. The prehybridization was done no more than 1 h before hybridization.

Hybridization. In preparation for hybridization, the cDNA pellet was resuspended in 2 µl sterile deionized water; 3.0 µl oligo dT blocker (0.25 µg/µl), 2.5 µl of the Cy3 dendrimer, 2.5 µl of the Cy5 dendrimer, and 1 µl high-end differential enhancer (all from Genisphere) were then added to the cDNA. Mouse Cot1 DNA, 2.5 µg (1 mg/ml, Gibco-BRL), and 1 µl anti-fade reagent (Genisphere) were added to 100 µl hybridization buffer (40% formamide, 4× SSC, 1% SDS; Genisphere), and the hybridization buffer was warmed to 45°C. The prepared hybridization buffer (19 µl) was added to the cDNA/dendrimer mix and incubated at 45°C for 15 min. This hybridization mix was added to a prehybridized glass microarray, covered with a glass coverslip (22 × 40 mm), and incubated in a Corning hybridization chamber overnight at 45°C. The labeled arrays were washed three times for 10 min each (once at 55°C in 2× SSC, 0.2% SDS; once in 2× SSC at room temperature; and once in 0.2× SSC at room temperature) and then dried by centrifugation in a slide rack for 3 min at 1,000 rpm in a Sorvall SH-3000 rotor.

Scanning and image analysis. All slides were scanned immediately following hybridization and washing using an Affymetrix 418 scanner. The laser power was set to 100%, and the gain of the photomultiplier tube was varied to avoid signal saturation in any spots. The image analysis was performed with ArrayVision 6.0 (Imaging Research). Signal and background pixel classification was determined by segmentation limited to 75–125% of the set spot diameter. Signal and background intensities were determined by the median pixel values. All of the array data are available through http://www.cbil.upenn.edu/EPConDB/query.html.

Data preprocessing and normalization. In the arrays used for the developmental time course, 2,914 genes had a passing PCR score (see “Glass Microarray Preparation”) and were considered for further analysis. Genes with high variation among the six replicates (coefficient of variation [CV] > 0.7) were examined for outliers. The data were normalized using the statistical software package R (11). For the comparisons across time points, scaled print-tip group lowess normalization was used to remove spatial effects and other systematic variation (12). The local background intensities as calculated by ArrayVision were not subtracted from the data intensities. The subtraction of local background significantly increases data variance (E. Manduchi, L.M.S., J.E. Brestelli, G.R. Grant, K.H.K., C.J.S. Jr., Physiological Genomics, In Press) and is not necessarily a good control for nonspecific binding of the labeled nucleic acids.

Vector projection. The normalized data were analyzed for trends over time using vector projection. Vector projection is a method that allows identification of these trends in gene expression data and incorporates all of the information from all time points (Terry Speed, Department of Statistics, University of California at Berkeley, and Genetics and Bioinformatics, Walter and Eliza Hall Institute, Australia; and Ingrid Lönnstedt, Department of Mathematics, Uppsala University, personal communication). Specifically, vector projection facilitates quick identification of genes that match the predetermined patterns of interest shown in Fig. 3 (e.g., “late” gene expression defined as peak expression during the adult time point). Each gene has a vector of its normalized expression values across time. These values are projected onto the space spanned by the pattern of interest (a vector of coefficients or weightings for each time point). Examination of the normal QQ-plot of all projection values allows identification of the extreme projection scores. Genes with strong patterns will have the largest (or smallest) values of the inner product.

RESULTS

The pancreas clone set. To produce a custom cDNA microarray, the PancChip, we first needed to identify a set of clones that represented genes expressed in the pancreas. Experimental and bioinformatics approaches were used to identify these clones. First, expression studies were performed with Incyte GEM arrays using RNA from newborn and adult mouse pancreas, newborn mouse liver, a mouse insulinoma cell line, and human islets. In each case, we identified the clones in the top 15% expression level as measured by fluorescence intensity, which, given the random collection of cDNAs on the Incyte GEM array, represents genes expressed at low, moderate, and high levels. Second, we identified additional clones for the PancChip from dbEST cDNA libraries nos. 185 (13), 422 (14), 1,144, 1,880, and 2,712 (15). These libraries were chosen because they were all prepared from RNA isolated from pancreatic tissue. Library 1,870 was prepared with C57BL/6J adult mouse pancreas. All others were prepared from human pancreatic islets; no. 2,712 also included RNA from human total pancreas.

Genomics Unified Schema. To identify and select a nonredundant set of clones, the expressed sequence tags (ESTs) and clones identified above were mapped to IMAGE clones using the Genomics Unified Schema (GUS) data system (16) accessible through AllGenes (http://www.allgenes.org). RNA entries contained within GUS are organized as Database of Transcribed Sequences (DoTS) assemblies; each entry or assembly represents a consensus of overlapping (confirmed and putative) transcribed sequences. The mouse clones identified by expression

analysis described above could be directly mapped to DoTS assemblies. For human cDNA libraries, EST sequences over 100 bp were first retrieved and trailing polyA and leading polyT regions removed. Mouse orthologs for these ESTs and for the genes identified by the expression analysis of human islets were identified through stringent BLASTX similarity against the nonredundant protein database (NRDB) at the National Center for Biotechnology Information (NCBI) using a cutoff P value of 1 × 10⁻⁵⁰. A set of IMAGE clones was chosen from combined groups of nonoverlapping mouse DoTS assemblies. Finally, one IMAGE clone was chosen to represent each of the DoTS assemblies identified, with preference given to clones containing the 3′ end of the assembly.

Using this combination of expression analysis and database mining, we obtained a nonredundant core clone set of 3,139 mouse IMAGE clones, each representing a unique assembly (“Pancreas clone set,” Table 1). Most of the genes represented in this set were identified by several paradigms; for example, hundreds were found in an expression array with mouse insulinoma RNA and were also present in an EST library derived from human pancreatic islets. Overall, ~60% of the clones in the set, or 1,900 elements, were defined to be present in pancreatic islets or insulinoma cells. Of this core set, 2,369 clones showed >95% identity to known protein sequences, and 310 showed “no nonredundant (NR) protein similarities” and represent unique protein coding regions or untranslated regions. In addition, 1,898 of the clones were sequence verified at the Genome Sequencing Center at Washington University. The remaining 1,241 clones were not verified, as they did not match with a parent sequence. These clones are not necessarily incorrectly identified, because many parent clones had previously only been sequenced from one end. Of the non-sequence-verified clones, 830 or 69% were shown to be expressed during one of the stages of pancreatic development contained in this study, and, therefore, these clones were maintained as part of the 3.4K clone set. The corresponding PancChip information, including annotation, for the 3.4K clone set and the verification status of all clones is available at http://www.cbil.upenn.edu/EPConDB/pancChip.html.

Gene ontology functions. The cDNA clones in the core pancreas clone set were categorized using a directed acyclic graphical (DAG) classification system defined by the Gene Ontology (GO) Consortium (17,18) (http://www.geneontology.org). The assignments of GO functions to the proteins represented by the cDNA clones in the core pancreas clone set (Fig. 1) were made computationally using an algorithm associating the translated protein domains with GO functions (25). Briefly, to assign GO function(s) to the translated sequences, it is assumed that the sequence of a previously characterized functional domain always functions as characterized. For example, a translated sequence having a domain with a BLAST similarity meeting a P value threshold with a previously characterized DNA binding domain is assigned the GO function “DNA binding.” Not every translated sequence will contain a domain meeting the criteria to be assigned a GO function. Thus, not every clone was assigned a GO function, and some sequences were assigned more than one top-level function. The distributions of the top-level GO function assignments for the core pancreas clone set are shown in Fig. 1 and Table 2.

Additional clones. To further increase the usefulness of the information obtained from the core pancreas clone set, we identified representative genes from various signal transduction pathways relevant to pancreatic development, which resulted in the identification of an additional 231 IMAGE clones (“Pathways” in Table 1). A group of 108 clones relevant to pancreatic development was added to the collection from laboratory stocks (“In-house” in Table 1). Using an alternative approach to obtain PCR products to spot onto microarrays, we designed primer pairs for a complementary group of 153 genes of importance to pancreatic development and amplified PCR products from a cDNA pool derived from intestinal, liver, and pancreas RNA (“Select” in Table 1). To control for the possibility of residual cDNA being contained in the purified PCR products, we performed PCR reactions without primers (cDNA controls). Finally, all PCR products were purified and spotted on the array. Additional controls included 30 housekeeping genes (19) and 8 yeast intergenic sequences in duplicate (Incyte Genomics).

The cDNA inserts of the clone sets were amplified by PCR and analyzed individually by agarose gel electrophoresis. As can be seen in Table 1, the overall rate of PCR success as defined by the presence of a single band on a gel was ~80%. An additional 15% of the clones were amplified but contained more than one band. Therefore, of the 3,674 clones in the collection, 2,912 are represented as single-band products on the array. To demonstrate the basic characteristics of the PancChip, we labeled the same total pancreas RNA with both Cy3 and Cy5 using the dendrimer method (20) and hybridized both cDNAs on the same array. Idealized “same versus same” hybridization would show all the points falling perfectly on a line, with the deviation of the slope from one reflecting the channel-to-channel differences. As can be seen in a scatter plot of the median intensities of this array (Fig. 2A), the intensities show excellent correlation, with a linear regression R² = 0.9809. Second, we compared two very different RNA samples on the array. In this experiment, islet RNA was labeled with Cy3 and heart RNA was labeled with Cy5. As we expected, the scatter plot of the median intensities clearly showed two populations (Fig. 2B). The larger population, with 92% of the differentially expressed spots, was labeled primarily with the islet RNA (Cy3 channel). The other population, of only 8% of the differentially expressed spots, showed hybridization primarily with the heart RNA (Cy5 channel), indicating that our clone set is indeed enriched for cDNAs expressed in pancreatic islets.

Developmental time course of pancreatic gene expression. To demonstrate the value of the 3.4K clone set and the PancChip, we used them to investigate gene expression patterns during pancreatic development in mice. We dissected the pancreata from mice representing developmental stages from E14.5 through adulthood, isolated total RNA from the pancreata, and labeled the RNA for hybridization using a dendrimer labeling system (Genisphere; E. Manduchi, L.M.S., J.E.B., G.R. Grant, K.H.K., C.J.S. Jr., Physiological Genomics, In Press). For E16.5 through adulthood, each sample represents a single individual. Six replicates (each with RNA from a distinct


individual) were performed for each time point from E16.5 through adulthood. Due to the low amount of RNA from each pancreatic primordium in E14.5 samples, we pooled several pancreata from individual embryos to prepare one RNA sample. Four replicates of this time point were done (each with pools from different groups of individuals). The experimental samples were all labeled with Cy3. We used a pool of all the samples in the study (except samples from E14.5) as the reference RNA and labeled it with Cy5. The labeled RNA was hybridized to the arrays and scanned. The data were analyzed as the ratios of the intensities of the individual time points for each gene over the intensities of the common control for the same gene, as has been used for other two-channel microarray experiments with multiple comparisons (11,21).

FIG. 1. The distribution of GO functions of proteins represented by the 3,139 core pancreas IMAGE cDNA clone set selected for the PancChip. The GO functions are classifications of cellular functions and were computationally assigned according to the guidelines of the GO Consortium.

Time series analysis. In the developmental time course, only the 2,914 genes with a passing PCR score were considered for further analysis. Gene expression levels were normalized, and the normalized data were analyzed for trends over time as described in RESEARCH DESIGN AND METHODS. Three predetermined patterns were of interest: “late” expression (with peak expression at adult), “early”


expression (peak is at E14.5), and genes that have peak expression at birth (Fig. 3). Genes with the most extreme vector projections for each trend were identified (see RESEARCH DESIGN AND METHODS). These genes are also listed in supplemental materials (http://www.cbil.upenn.edu/EPConDB/pancChip.html). We used the GO functions, discussed earlier, to categorize the genes with the top vector projection scores for each trend. There is a shift in the distribution of GO functions across the developmental stages examined (Fig. 4). During the early expression period, the predominant group of genes being expressed falls into the “Nucleic acid binding” category. At birth, the predominant group of genes being expressed has “No NR protein similarities.” Finally, in the late period of expression, the predominant group is “Enzyme,” as we would expect since our samples included the exocrine pancreas, whose primary function in the adult is the secretion of digestive enzymes.

TABLE 2
GO function assignments of the core pancreas clone set

GO function                  Number of clones   Percent of total
Enzyme                       815                26.9
Ligand binding or carrier    526                17.3
Nucleic acid binding         463                15.3
Signal transducer            367                12.1
Cell adhesion molecule       199                6.6
Transporter                  197                6.5
Structural protein           194                6.4
Motor                        86                 2.8
Chaperone                    53                 1.8
Microtubule binding          33                 1.1
Enzyme activator             21                 0.7
Cell cycle regulator         20                 0.6
Enzyme inhibitor             17                 0.6
Defense/immunity protein     16                 0.5
Apoptosis regulator          13                 0.4
Antioxidant                  5                  0.2
Cytoskeletal regulator       2                  0.1
Protein tagging              2                  0.1
Total                        3,029              100

FIG. 2. Scatter plots of the median intensities of total RNA hybridized to the PancChip. A: For a same-versus-same comparison, 2.5 µg of mouse pancreas total RNA labeled with Cy5 (red) and 2.5 µg of mouse pancreas total RNA labeled with Cy3 (green) were hybridized to the PancChip. B: To compare total RNA from two different tissues, 2.5 µg heart RNA labeled with Cy5 and 2.5 µg islet RNA labeled with Cy3 were hybridized to the PancChip. The values for blank spots were removed and the median intensities plotted as measured by ArrayVision.

When we examined the range of intensities across time for a given trend, we found dramatic shifts in the intensities for a gene across the different time points (Fig. 5). The y-axis is the log2 ratio of the average median intensities of the given developmental time point versus the average median intensities of the pooled common control. Due to the log nature of the scale, the changes seen in the early period (panel A) relate to an eightfold scale in the expression of the identified genes when compared with the common control. Included among the early expression genes for nucleic acid binding proteins are 1) a clone with high identity to heterogeneous nuclear ribonucleoprotein A3 (hnrnpA3) and 2) nucleophosmin, as well as transcription factors. Both hnrnpA3 and nucleophosmin are thought to play roles in cell growth and proliferation. In addition to the nucleic acid binding proteins, genes active in membrane transport functions, including thyroid receptor activator molecule (TRAM1) and lysosomal-associated protein transmembrane 4α (LAPTM4A), are expressed in early pancreatic development.

The scale of the graph for the expression-at-birth panel (Fig. 5B) shows an approximate eightfold difference in gene expression between birth and E14.5 and an approximate twofold difference between birth and adulthood. Fifty percent of the genes identified as highly expressed during the perinatal period are genes with no GO function (genes with no known protein similarities and clones with similarity to other uncharacterized ESTs). Other genes identified as highly expressed during the perinatal period include BAT2 (a voltage-sensitive calcium channel) and proline oxidase 1.

The late expression proteins show up to a 20-fold change in expression between our earliest time point (E14.5) and adulthood (Fig. 5C). As can be seen on the graph, many of these genes are digestive enzymes one would expect to be expressed in the adult pancreas, including elastase-2 and amylase. Trefoil factor 2 (TFF2), which has previously been shown to be expressed in the pancreas (22–24), also increases expression in the adult versus the earlier stages.


FIG. 3. Models of the three predicted gene expression patterns used to identify genes of interest with vector projections. The “early expression” pattern is defined as a clone with the greatest expression at E14.5, the earliest time point in this study. The “expression at birth” pattern is defined as a clone whose expression peaks in the neonate, with lower expression in the adult and the embryo. The “late expression” pattern is defined as any clone with the greatest expression in the adult.
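The projection score described under “Vector projection” in RESEARCH DESIGN AND METHODS can be sketched in a few lines. This is a minimal illustration, assuming that each pattern in Fig. 3 is encoded as a vector of per-time-point weights and that the score is the inner product of that vector with a gene's normalized log ratios. The pattern weights, the centering and unit-length scaling, and the example profiles are assumptions made for illustration; they are not the coefficients or data used in this study, and the QQ-plot step for choosing extreme scores is not shown.

```python
import math

TIME_POINTS = ["E14.5", "E16.5", "E18.5", "newborn", "P7", "adult"]

# Hypothetical pattern weights, one per time point (not the study's coefficients).
PATTERNS = {
    "early": [1, 0, 0, 0, 0, 0],
    "birth": [0, 0, 0, 1, 0, 0],
    "late": [0, 0, 0, 0, 0, 1],
}


def unit_contrast(weights):
    """Center the weights to sum to zero and scale them to unit length."""
    mean = sum(weights) / len(weights)
    centered = [w - mean for w in weights]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def projection_score(expression, weights):
    """Inner product of a gene's expression vector with the scaled pattern."""
    pattern = unit_contrast(weights)
    return sum(x * p for x, p in zip(expression, pattern))


def rank_genes(profiles, pattern_name):
    """Return (gene, score) pairs sorted from highest to lowest score."""
    weights = PATTERNS[pattern_name]
    scores = {gene: projection_score(values, weights)
              for gene, values in profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical normalized log2 ratios, ordered as in TIME_POINTS.
example_profiles = {
    "gene_late": [-2.0, -1.5, -1.0, 0.0, 1.0, 2.5],
    "gene_early": [3.0, 1.0, 0.0, -0.5, -1.0, -1.5],
    "gene_flat": [0.1, 0.0, -0.1, 0.1, 0.0, -0.1],
}

if __name__ == "__main__":
    for name in PATTERNS:
        print(name, rank_genes(example_profiles, name))
```

Centering the weights makes the score insensitive to a gene's overall expression level, so a uniformly high but flat profile does not score as “late.” Other encodings of the patterns are possible.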

DISCUSSION

We have assembled a set of clones that represent genes primarily expressed in the pancreas and have used this clone set to construct a glass cDNA microarray as a resource to the diabetes research community. Our core pancreas clone set has been extensively annotated and represents a varied spectrum of cell functions. All cDNAs of the core pancreas clone set have been resequenced, and the majority of the clones were sequence-verified. Most of the nonconfirmed clones are expressed at some point during the developmental stages studied. These clones will, however, be replaced with new cDNAs from Endocrine Pancreas Consortium libraries as they become available.

We have demonstrated the usefulness of the clone set by preparing a glass cDNA microarray, the PancChip, and by profiling pancreatic gene expression patterns from midgestation through adulthood with the PancChip. At the earliest stage in this study (E14.5), nucleic acid binding proteins were the largest group of differentially expressed

FIG. 4. The median intensity data from six individuals for each time point (four for E14.5) were analyzed. The genes showing the greatest expression during the “early” period (E14.5), expression at “birth,” and late expression (at adult) were identified using vector projection. The functions of the proteins represented by differentially expressed genes in each stage were characterized using GO functions. The categories for “nucleic acid binding,” “no GO function,” “transporter,” “enzyme,” “ligand binding or carrier,” “structural proteins/molecular chaperones,” and “no NR protein similarities” are shown.
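The replicate structure summarized in this caption is also what the outlier screen in RESEARCH DESIGN AND METHODS operates on: genes whose replicates had a coefficient of variation (CV) above 0.7 were examined for outliers. A minimal sketch of such a screen is shown below. It assumes the CV is the sample standard deviation divided by the mean of the replicate intensities; the exact estimator is not stated in the text, and the gene names and intensity values are hypothetical.

```python
import statistics

CV_THRESHOLD = 0.7  # threshold quoted in the methods


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed definition)."""
    return statistics.stdev(values) / statistics.mean(values)


def flag_high_variation(replicates_by_gene, threshold=CV_THRESHOLD):
    """Return the genes whose replicate CV exceeds the threshold."""
    return [gene for gene, values in replicates_by_gene.items()
            if coefficient_of_variation(values) > threshold]


# Hypothetical median intensities from six replicate arrays.
example_replicates = {
    "gene_stable": [1000, 1100, 950, 1050, 980, 1020],
    "gene_outlier": [1000, 1100, 950, 1050, 980, 9000],
}

if __name__ == "__main__":
    print(flag_high_variation(example_replicates))
```

Flagged genes would then be inspected by hand rather than dropped automatically, in keeping with the wording “examined for outliers.”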


FIG. 5. The log2 ratio (intensity of Cy3 channel/intensity of Cy5 channel) of the mean of the median intensity data from six individuals for each time point (four for E14.5) for the genes identified by vector projection as having greatest expression at E14.5, the “early” period (A), at “birth” (B), and during “adulthood” (C) is plotted against time. The developmental time points include E14.5, E16.5, E18.5, newborn, P7, and adult. The time courses of expression for specific genes are labeled and identified by the arrows. A: Heterogeneous nuclear ribonucleoprotein A3 (hnrnp A3), lysosomal-associated protein transmembrane 4α (LAPTM4A), thyroid receptor activator molecule (TRAM1), and nucleophosmin (Npm1). B: BAT2 (a voltage-sensitive calcium channel) and proline oxidase 1. C: Trefoil factor 2, elastase, and amylase.

genes as assigned by GO functions. The nucleic acid binding proteins of the GO function classification include transcription factors and other proteins involved in cell fate decisions. This result fits with expectations of gene expression during embryonic development. During the adult stage of pancreatic development, the largest group of genes being expressed encodes enzymes. Again, this is an expected result that confirms the validity of the PancChip, as we had used total pancreas for our time course, which in the adult consists largely of exocrine tissue whose main function is the secretion of digestive enzymes.

Previous gene profiling experiments made use of commercial arrays to study both type 1 and type 2 diabetes (1,2,5,6,8). None of these arrays were prepared with the intention to focus on diabetes or the pancreas. Because the pancreas clone set is enriched for genes that are expressed in the pancreas or were present in cDNA libraries prepared from pancreatic tissue, it offers the

diabetes research community the opportunity to investigate genes of interest at a higher density. Ultimately, it will be desirable to utilize a complete genome-wide cDNA array representing all 20,000–40,000 mammalian genes for all expression profiling experiments, including those related to diabetes. However, until such a resource is available at a low cost, the PancChip will provide a valuable and affordable resource for the diabetes research community. The 3.4K pancreas clone set used to make the PancChip will be distributed through the NIDDK-funded biotechnology centers. Current efforts of the Endocrine Pancreas Consortium are centered on the generation and sequencing of pancreas-specific cDNA libraries from various stages of development from both mouse and human. These new libraries will allow us to dramatically increase the number of clones represented in the clone set and on the PancChip in the future.

ACKNOWLEDGMENTS

We gratefully acknowledge support in the form of NIDDK Grant 56947 to K.H.K. and NIDDK 56954 to A.M.P. D.M. acknowledges support from the Juvenile Diabetes Research Foundation.
We thank Phillip Phuc Le for computer support and for the design of the web interfaces and Jian Wang for initial clone selection. We also thank Elisabetta Manduchi and members of the Computational Biology and Informatics Laboratory (CBIL) in the Center for BioInformatics for helpful discussions.

REFERENCES

1. Cardozo AK, Heimberg H, Heremans Y, Leeman R, Kutlu B, Kruhoffer M, Orntoft T, Eizirik DL: A comprehensive analysis of cytokine-induced and nuclear factor-kappa B-dependent genes in primary rat pancreatic beta-cells. J Biol Chem 276:48879–48886, 2001
2. Zimmer Y, Milo-Landesman D, Svetlanov A, Efrat S: Genes induced by growth arrest in a pancreatic β cell line: identification by analysis of cDNA arrays. FEBS Lett 457:65–70, 1999
3. Joussen AM, Huang S: Möglichkeiten einer breitspektrumanalyse von genexpressionmustern mittels cDNA-arrays: gene expression profiling using cDNA microarrays. Der Ophthalmologe 98:568–573, 2001
4. Boel E, Albrektsen T, Fleckner J, Selmer J: Modulation of metabolism through transcriptional control has created new treatment opportunities for type 2 diabetes. Curr Pharm Biotechnol 1:63–71, 2000
5. Tobe K, Suzuki R, Aoyama M, Yamauchi T, Kamon J, Kubota N, Terauchi Y, Matsui J, Akanuma Y, Kimura S, Tanaka J, Abe M, Ohsumi J, Nagai R, Kadowaki T: Increased expression of the sterol regulatory element-binding
7. Nadler ST, Stoehr JP, Schuler KL, Tanimoto G, Yandell BS, Attie AD: The expression of adipogenic genes is decreased in obesity and diabetes mellitus. Proc Natl Acad Sci USA 97:11371–11376, 2000
8. Nadler ST, Attie AD: Please pass the chips: genomic insights into obesity and diabetes. J Nutr 131:2078–2081, 2001
9. Hegde P, Qi R, Abernathy K, Cheryl G, Dharap S, Gaspard R, Earle-Hughes J, Snesrud E, Lee N, Quackenbush J: A concise guide to cDNA microarray analysis. Biotechniques 29:548–550, 2000
10. Chomczynski P, Sacchi N: Single-step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction. Anal Biochem 162:156–159, 1987
11. Dudoit S, Yang YH, Callow MJ, Speed TP: Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments [article online]. Available from http://www.stat.berkeley.edu/tech-reports/index.html. Accessed September 2000
12. Yang YH, Dudoit S, Luu P, Speed T: Normalization of cDNA microarray data. Proceedings of SPIE, Microarrays: Optical Technologies and Informatics. Bittner ML, Chen Y, Dorsel AN, Dougherty ER, Eds. 4266:141–152, 2001
13. Permutt M, Koranyi L, Keller K, Lacy P, Scharp D, Mueckler M: Cloning and functional expression of a human pancreatic islet glucose-transporter cDNA. Proc Natl Acad Sci USA 86:8688–8692, 1989
14. Takeda J, Yano H, Eng S, Zeng Y, Bell G: A molecular inventory of human pancreatic islets: sequence analysis of 1000 cDNA clones. Hum Mol Genet 2:1793–1798, 1993
15. Ferrer J, Wasson J, Schoor K, Mueckler M, Donis-Keller H, Permutt M: Mapping novel pancreatic islet genes to human chromosomes. Diabetes 46:386–392, 1997
16. Davidson SB, Crabtree J, Brunk BP, Schug J, Tannen V, Overton GC, Stoeckert CJ Jr: K2/Kleisli and GUS: experiments in integrated access to genomic data sources. IBM Systems Journal 40:512–531, 2001
17. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. Nat Genet 25:25, 2000
18. Ashburner M, Ball CA, Blake JA, Butler H, Cherry JM, Corradi J, Dolinski K, Eppig JT, Harris M, Hill DP, Lewis S, Marshall B, Mungall C, Reiser L, Rhee S, Richardson JE, Richter J, Ringwald M, Rubin GM, Sherlock G, Yoon J: Creating the gene ontology resource: design and implementation. Genome Res 11:1425–1433, 2001
19. Warrington J, Nair A, Mahadevappa M, Tsyganskaya M: Comparison of human adult and fetal expression and identification of 535 housekeeping/maintenance genes. Physiol Genomics 2:143–147, 2000
20. Stears RL, Getts RC, Gullans SR: A novel, sensitive detection system for high-density microarrays using dendrimer technology. Physiol Genomics 3:93–99, 2000
21. Fellenberg K, Hauser NC, Brors B, Neutzner A, Hoheisel JD, Vingron M: Correspondence analysis applied to microarray data. Proc Natl Acad Sci USA 98:10781–10786, 2001
22. Tomasetto C, Rio M, Gautier C, Wolf C, Hareuveni M, Chambon P, Lathe R: hSP, the domain-duplicated homolog of pS2 protein, is co-expressed with pS2 in stomach but not in breast carcinoma. EMBO J 9:407–414, 1990
23.
Lefe`bvre O, Wolf C, Kedinger M, Chenard M, Tomasetto C, Chambon P, Rio protein-1 gene in insulin receptor substrate-2-/- mouse liver. J Biol Chem MC: The mouse one P-domain (pS2) and two domain (mSP) genes exhibit 276:38337–38340, 2001 distinct patterns of expression. J Cell Biol 122:191–198, 1993 6. Aitman T, Glazier A, Wallace C, Cooper L, Norsworthy P, Wahid F, 24. Ribieras S, Lefe`bvre O, Tomasetto C, Rio MC: Mouse trefoil factor genes: Al-Majali K, Trembling P, Mann C, Shoulders C, Graf D, St Lezin E, Kurtz genomic organization, sequences and methylation analyses. Gene 266: T, Kren V, Pravenec M, Ibrahimi A, Abumrad N, Stanton L, Scott J: 67–75, 2001 Identification of Cd36 (Fat) as an insulin-resistance gene causing defective 25. Schug J, Diskins S, Mazzarelli J, Bunk BP, Stoeckert CJ: Predicting gene fatty acid and glucose metabolism in hypertensive rats. Nat Genet 21:76– ontology functions from ProDom and CDD protein domains. Genome Res 83, 1999 12.
