Supporting Information

Whittle et al. 10.1073/pnas.0812894106 (www.pnas.org/cgi/content/short/0812894106)

SI Text

In Vitro Genomic Selection. Recombinant NFI-1-GST fusion protein was immobilized on Glutathione Sepharose 4B (Amersham) and C. elegans genomic DNA was used for in vitro genomic selection. The fusion protein contained the NFI-1 DNA-binding domain and a short downstream region. For NFI-1-GST Sepharose preparation, 400 ml of induced DH5α cells expressing NFI-1-GST (plasmid pGEXNFI-1) was extracted in buffer L (25 mM Hepes pH 7.5, 10% sucrose, 0.35 M NaCl, 5 mM DTT, 1 mM PMSF, 0.1% Nonidet P-40 and 2 mg/ml of lysozyme). The extract (35 ml) was flash-frozen in 5-ml aliquots and stored at −70 °C. To immobilize NFI-1-GST protein on Glutathione Sepharose 4B, 5 ml of cell extract was incubated with 65 µl (bed volume) of Sepharose for 30 min at 4 °C, washed 3 times with buffer L minus lysozyme and sucrose, and 2 times with buffer B (25 mM Hepes, pH 7.5, 5 mM MgCl2, 4 mM DTT, 150 mM NaCl and 0.05% Nonidet P-40). Approximately 10 µg was bound as estimated by SDS PAGE.

In vitro genomic selection was performed as described (1), with some modifications to optimize binding conditions. Briefly, 5 µg of genomic DNA from a mixed-age population of N2 worms was digested with Sau3A I and incubated with NFI-1-GST Sepharose in 200 µl of buffer B at 4 °C, followed by 3 washes. DNA was eluted with 10 mM Glutathione in buffer B, purified using phenol-chloroform and ethanol precipitation, and dissolved in 12 µl of H2O. DNA was ligated to linkers, PCR amplified, and subjected to the second and third rounds of selection. After the last round of selection and amplification, DNA was TOPO-cloned (Invitrogen) and all clones were sequenced.

DNA-Binding Assays. Gel mobility-shift assays and competition analysis to assess enrichment for NFI binding sites during in vitro selection were performed as described previously (2). Worm extracts were prepared from dounce-homogenized mixed-age worms using the Nonidet P-40-based extraction buffer described previously (2). NFI-1-GST protein was purified on Glutathione Sepharose 4B from extracts of Escherichia coli as described above for genomic selection. Microfiltration on Millipore 5000 was used to remove Glutathione after elution and to concentrate protein. Next, 100 ng of recombinant GST-NFI-1 protein and the labeled 26-mer oligonucleotides containing a wild-type NFI binding site (wt) 5′AGGTCTGGCTTTGGGCCAAGAGCCGC or a site with a single point mutation (mut) 5′AGGTCTcGCTTTGGGCCAAGAGCCGC, shown previously to abolish the binding of vertebrate NFI (3), were used. A 100-fold molar excess of unlabeled PCR-amplified DNA from each round of selection was added to the indicated samples.

Chromatin Immunoprecipitation. Rabbit polyclonal antiserum was raised against the recombinant NFI-1-GST fusion protein described above. Antibody recognition of native NFI-1 protein bound to DNA was verified by gel mobility-shift assays (Fig. S9). ChIP was performed on a mixed-age population of N2 worms. Worm culture was initiated with 20 young adult worms on 10-cm NGM/DH10B plates (10–14 plates for one experiment). Approximately 0.5 ml of worm pellet was collected for each ChIP sample. Cross-linking conditions were as previously described (4). Cross-linked pellets (120–150 mg) were resuspended in 1 ml of ChIP lysis buffer (Upstate) and sonicated using a Branson Sonifier 250 (output 30 and DutyCycle 30% setting) with 15 sets of 10 pulses (1 sec each) on ice with 1-min intervals between each set for cooling. The sonicated fragments were in the ≈200- to 1,300-bp range based on analysis on 1.5% agarose gels after reversal of cross-linking. DNA-protein complexes were precipitated with anti-NFI immune serum (5 µl) or preimmune serum (5 µl) as a control, and 5% of the input sample was set aside. Samples were processed using the ChIP assay kit (Upstate). Following reversal of the cross-links and RNase A and Proteinase K treatments, the immunoprecipitated DNA was purified using phenol-chloroform, precipitated with ethanol, and dissolved in 50 µl of DNase- and RNase-free water. For ChIP-chip, samples were amplified by either ligation-mediated PCR (5) or a modified Whole Genome Amplification protocol [Sigma; (6)] as previously described.

Microarrays and Data Extraction. DNA microarrays (Agilent Technologies) covering the entire C. elegans genome with 185,000 probes at an average start-to-start spacing of 600 bp were used for ChIP-chip (GEO accession no. GPL7776). Four NFI-1 ChIP biological replicates and 2 NFI-1 Preimmune Mock ChIP-chip experiments were performed. Raw intensities for each ChIP were normalized by median centering the log2(ChIP/Input). Normalized log2 ratios from each experiment were converted to standard Z-scores and combined by taking the median across experiments. Probes corresponding to known repetitive elements were spatially sequestered and removed from subsequent analysis. Raw and processed data can be accessed at NCBI GEO accession number GSE13918. Significant binding peaks were derived using a Perl implementation of ChIPOTle (7) with a window size of 1,800 bp and a step size of 600 bp, at a Bonferroni-corrected P-value of 1 × 10⁻¹². For each of the 55 discovered binding peaks, the maximum probe within a peak was extracted and annotated to the nearest gene using a C. elegans implementation of Cis-element annotation software (8) and hand-checked for accuracy (Wormbase release ws170).

ChIP-chip Data Analysis. For motif analysis, a 1,500-bp window centered on the peak maximum probe was used. Extracted sequences (genome release ws170/ce4) were masked for repetitive elements using RepeatMasker (9). ChIP sequences were ranked by maximum probe Z-score and MDscan (10) was used for motif discovery. Matrixscan (5) was used to find motifs using the MDscan-generated position weight matrix for the top-scoring motif with a word size of 15 bp. Distance to nearest TSS mappings, random window generation, and perfect-match motif finding were performed using custom Perl and Ruby scripts (available upon request). R was used for statistical analysis and plotting. Genome browser visualizations were obtained using the UCSC genome browser (http://genome.ucsc.edu), genome release ws170/ce4. Modeling of nucleosome occupancy and micrococcal nuclease mapping of nucleosome occupancy and position (Adjusted Nucleosome Stringency) were derived from published datasets (11, 12) and extracted via the UCSC genome browser (http://genome.ucsc.edu). Raw expression data were obtained from the Stanford Microarray Database (http://smd.stanford.edu/) for a previously published time-course of the C. elegans lifecycle (13). Raw intensities for each expression microarray channel (mixed RNA reference or single stage) were percentile-ranked as a measure of relative RNA abundance.

Precomputed blastp hits derived from wormbase release ws170 were used to find orthologs of the C. elegans NFI-1 targets. The protein with the lowest e-value was chosen and 3 kb (C. briggsae) or 5 kb (mouse/human) upstream of the TSS was examined using Matrixscan (5) for motifs using the C. elegans derived position weight matrix. C. briggsae sequences and annotations were obtained for wormbase genome release ws190; mouse (Ensembl50/NCBI m37) and human (Ensembl50/NCBI36) sequences and annotations were obtained via Ensembl (http://www.ensembl.org/) and the UCSC genome browser.

qPCR Analysis of ChIP-chip Data. qPCR was used to determine the relative amount of specific loci in IP, Input, and Mock (Preimmune) samples. Amplicons of 100–200 bp were designed using Macvector software for each locus to ensure uniform assay performance under the cycling conditions: 50 °C for 2 min, 95 °C for 8 min 30 sec, and 40 cycles of 95 °C for 15 sec and 60 °C for 1 min, followed by melt-curve data collection. qPCR was performed using iQ SYBR Green Supermix (Bio-Rad) on a Bio-Rad iCycler and 1 µl of DNA precipitated with anti-NFI immune or preimmune serum. The Input DNA sample was diluted ≈1,000-fold to achieve a Ct value within the same range. All reactions were in duplicate. Primers designed within the coding region of ama-1, a locus negative for NFI-1 binding, were used as an internal control to normalize quantification in qPCR reactions. qPCR primers are available on request. Data are expressed as IP/Input where ΔΔCt = (Ct_IP_locusX − Ct_IP_ama-1) − (Ct_Input_locusX − Ct_Input_ama-1). As a negative control, Mock/Input was analyzed in parallel. Average Ct values for ama-1 and input were used in calculations. Bars on the graphs represent corresponding ΔΔCt values and their range.
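The ΔΔCt normalization against ama-1 and Input described in the qPCR section can be illustrated with a minimal Python sketch. The function names and Ct values are hypothetical. The conversion to fold enrichment as 2^(−ΔΔCt) is an assumption that presumes roughly a doubling of product per cycle; it is not stated in the text.

```python
from statistics import mean

def ddct(ct_ip_locus, ct_ip_ama1, ct_input_locus, ct_input_ama1):
    """ddCt = (Ct_IP_locusX - Ct_IP_ama-1) - (Ct_Input_locusX - Ct_Input_ama-1)."""
    return (ct_ip_locus - ct_ip_ama1) - (ct_input_locus - ct_input_ama1)

def fold_enrichment(dd):
    """Assumed conversion: about 2-fold amplification per PCR cycle."""
    return 2.0 ** (-dd)

# Hypothetical duplicate Ct readings, averaged before use.
ct_ip_locus = mean([24.1, 23.9])      # locus X in the NFI-1 ChIP sample
ct_ip_ama1 = mean([26.0, 26.0])       # ama-1 control in the ChIP sample
ct_input_locus = mean([27.0, 27.0])   # locus X in diluted Input
ct_input_ama1 = mean([26.0, 26.0])    # ama-1 control in diluted Input

value = ddct(ct_ip_locus, ct_ip_ama1, ct_input_locus, ct_input_ama1)
print(value, fold_enrichment(value))
```

A negative ΔΔCt means the locus amplified earlier in the IP than expected from Input, which is why enrichment plots often show −ΔΔCt.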

1. Shostak Y, Van Gilst MR, Antebi A, Yamamoto KR (2004) Identification of C. elegans DAF-12-binding sites, response elements, and target genes. Genes Dev 18:2529–2544.
2. Lazakovitch E, et al. (2005) nfi-1 affects behavior and life-span in C. elegans but is not essential for DNA replication or survival. BMC Dev Biol 5:24.
3. Goyal N, Knox J, Gronostajski R (1990) Analysis of multiple forms of nuclear factor I in human and murine cell lines. Mol Cell Biol 10:1041–1048.
4. Oh SW, et al. (2006) Identification of direct DAF-16 targets controlling longevity, metabolism and diapause by chromatin immunoprecipitation. Nat Genet 38:251–257.
5. Ercan S, et al. (2007) X chromosome repression by localization of the C. elegans dosage compensation machinery to sites of transcription initiation. Nat Genet 39:403–408.
6. O’Geen H, Nicolet CM, Blahnik K, Green R, Farnham PJ (2006) Comparison of sample preparation methods for ChIP-chip assays. Biotechniques 41:577–580.
7. Buck MJ, Nobel AB, Lieb JD (2005) ChIPOTle: a user-friendly tool for the analysis of ChIP-chip data. Genome Biol 6:R97.
8. Ji X, Li W, Song J, Wei L, Liu XS (2006) CEAS: cis-regulatory element annotation system. Nucleic Acids Res 34(Web Server issue):W551–W554.
9. Chen N (2004) Current Protocols in Bioinformatics (Wiley, Hoboken, NJ), Chapter 4: Unit 4–10, 4.10.1–4.10.14.
10. Liu XS, Brutlag DL, Liu JS (2002) An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments. Nat Biotechnol 20:835–839.
11. Valouev A, et al. (2008) A high-resolution, nucleosome position map of C. elegans reveals a lack of universal sequence-dictated positioning. Genome Res 18:1051–1063.
12. Kaplan N, et al. (2008) The DNA-encoded nucleosome organization of a eukaryotic genome. Nature 458:362–366.
13. Jiang M, et al. (2001) Genome-wide analysis of developmental and sex-regulated gene expression profiles in Caenorhabditis elegans. Proc Natl Acad Sci USA 98:218–223.

[Fig. S1 image, panels A–C.
(A) Flow diagram: Sau3A genomic DNA digest; selection with NFI-1::GST; elute DNA, ligate linkers, PCR amplify; repeat selection (×2); TOPO clone and sequence.
(B) Gel, lanes 1–8:
  Lane            1    2    3    4    5     6       7       8
  Competitor      -    -    -    -    gDNA  R1 DNA  R2 DNA  R3 DNA
  P32 oligo       wt   mut  wt   mut  wt    wt      wt      wt
  NFI-1 protein   -    -    +    +    +     +       +       +
(C) Bar chart: y-axis, % binding inhibition (0–100); x-axis, competitor (Genomic DNA, Round 1 DNA, Round 2 DNA, Round 3 DNA).]

Fig. S1. Analysis of NFI-1 in vitro binding. (A) Diagram of the in vitro genomic selection strategy (SI Materials and Methods). (B) Analysis of enrichment for NFI binding sites at different rounds of in vitro selection with DNA binding assays. Purified recombinant GST-NFI-1 protein (100 ng) was incubated with P32-labeled oligo containing an NFI binding motif (wt) in the absence (Lanes 3 and 4) or presence of a 100-fold molar excess of unlabeled Sau3AI-digested genomic DNA (Lane 5) and PCR-amplified DNA from each round of selection (Lanes 6–8). (C) Quantification of protein-DNA-bound complexes in DNA binding assays. The percent-binding was assessed as the ratio of P32-labeled 2.6 oligo bound to GST-NFI-1 protein in the presence and absence of competitor quantified using a PhosphorImager (Molecular Dynamics).

[Fig. S2 image, panels A and B.
(A) Chromosomal distribution of NFI-1 sites. Series: NFI-1 consensus [TTGGC(N5)GCCAA] and in vivo binding sites; x-axis, chromosome (I, II, III, IV, V, X); y-axis, percent of class (0.0–0.4).
(B) Distribution of NFI-1 motifs around transcription starts. Box plots for in vivo (median −273), in vitro (median −712), and in silico (median +470) targets; x-axis, distance from nearest TSS (bp), −20,000 to 30,000, with an inset spanning −4,000 to 4,000.]

Fig. S2. The genomic distribution of NFI-1 motifs containing a N5 or A(N)3T spacer. (A) The chromosomal distribution of 877 NFI-1 (N)5 spacer [TTGGC(N)5GCCAA] consensus motifs in comparison to the in vivo binding sites of NFI-1. (B) Box plots showing the distance (bp) of the motif start relative to the nearest annotated TSS for the in vivo targets (orange), in vitro targets (blue), or in silico targets (gray). In silico motif positions were annotated using the in vivo-derived position weight matrix (pwm) and Matrixscan (5). The box represents the interquartile range; whiskers denote the fifth and ninety-fifth percentiles, respectively; open circles indicate outliers. Median distance to TSS is noted in parentheses. The inlaid plot shows a 10-kb window around the motif, indicated by the red box on the main plot.
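Perfect-match motif finding and the distance of each motif start to the nearest TSS, as plotted in this figure, can be sketched in Python as follows. This is an illustration only, not the authors' Perl/Ruby scripts. The function names, the toy sequence, and the sign convention (negative = upstream of the TSS in the gene's own orientation) are assumptions.

```python
import re

# Perfect-match consensus TTGGC(N)5GCCAA. The pattern is its own reverse
# complement, so scanning a single strand finds sites on both strands.
# The lookahead lets overlapping matches be reported.
NFI_N5 = re.compile(r"(?=(TTGGC[ACGT]{5}GCCAA))")

def find_motif_starts(seq):
    """Return 0-based start positions of every consensus match in seq."""
    return [m.start() for m in NFI_N5.finditer(seq.upper())]

def signed_distance_to_nearest_tss(pos, tss_list):
    """tss_list holds (coordinate, strand) pairs.

    Returns the signed distance to the closest TSS; negative values mean
    the position lies upstream of that TSS (assumed convention).
    """
    best = None
    for coord, strand in tss_list:
        d = (pos - coord) if strand == "+" else (coord - pos)
        if best is None or abs(d) < abs(best):
            best = d
    return best

# Toy example with one consensus site starting at index 3.
toy_seq = "AAATTGGCTTTGGGCCAAGG"
starts = find_motif_starts(toy_seq)
# Hypothetical TSS annotations: one plus-strand and one minus-strand gene.
distances = [signed_distance_to_nearest_tss(s, [(103, "+"), (0, "-")])
             for s in starts]
```

Taking the median of such signed distances over all sites in a class gives the kind of summary value shown in parentheses on the box plots.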

[Fig. S3 image, panels A–C. Bar charts of motif counts per spacer class; x-axis, spacer (N5, AT, TA, CG, GC); y-axis, sites (count).
(A) C. briggsae genome. Bar labels as extracted: 544, 211, 27, 4, 3.
(B) Mouse genome. Bar labels as extracted: 1566, 159, 192, 139, 1.
(C) Bar labels as extracted: 1911, 194, 211, 179, 0.]

Fig. S3. NFI-1 consensus motif representation in C. briggsae, mouse, and human. The C. briggsae (ws190), mouse (NCBI37), and human (NCBI36) genomes were searched for NFI N5 consensus or alternative spacers containing AT, TA, CG, or GC in the sixth and tenth positions, respectively. Counts of motif occurrence for each spacer are plotted for the (A) C. briggsae, (B) mouse, or (C) human genome.
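Counting sites by spacer class, as summarized in this figure, can be sketched in Python under two assumptions: that the "N5" count is the total of all TTGGC(N)5GCCAA matches, and that a class such as "AT" means A at motif position 6 and T at position 10 (the first and last spacer nucleotides). The function name and test sequence are hypothetical.

```python
import re
from collections import Counter

# Capture the 5-nt spacer (motif positions 6-10) of each consensus match.
SITE = re.compile(r"(?=TTGGC([ACGT]{5})GCCAA)")
SPACER_CLASSES = {"AT", "TA", "CG", "GC"}

def count_spacer_classes(seq):
    """Count all N5 matches and the subset in each alternative spacer class."""
    counts = Counter()
    for match in SITE.finditer(seq.upper()):
        spacer = match.group(1)
        counts["N5"] += 1                 # assumed: N5 is the total
        key = spacer[0] + spacer[-1]      # motif positions 6 and 10
        if key in SPACER_CLASSES:
            counts[key] += 1
    return counts

# Two synthetic sites: spacer AGGGT (class AT) and spacer CAAAG (class CG).
example = count_spacer_classes("TTGGCAGGGTGCCAA" + "NN" + "TTGGCCAAAGGCCAA")
```

Running the same function over each chromosome of a genome and summing the counters would give genome-wide totals per class.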

[Fig. S4 image, panels A and B.
(A) qPCR validation of NFI-1 targets. Bar chart; y-axis, −ΔΔCt (−6 to 10); series, NFI-1 ChIP/input and Mock/input; groups, array positive with NFI sites, array positive with no NFI sites, and array negative. Loci as extracted: htz-1, eft-2, gei-4, arf-6, mdt-8, rpl-11.2, F25H2.6, K01G5.4, C48B6.2, C54D1.6, F17C11.9, T06A10.3, W01C8.5, Y17G7B.2, C17D12.7, T25C12.3, Y48E1B.11.
(B) qPCR validation of loci negative for NFI-1 binding by ChIP-chip, but that contain an NFI-1 motif. Bar chart; y-axis, fold enrichment (ChIP/Input), 0.0–18.0; loci: F54C8.5, C31C9.2, myo-2, cki-1, Y57A10A, ZK643.]

Fig. S4. Real-time PCR validation of NFI-1 in vivo targets. (A) Nine positive ChIP-chip loci with an NFI-1 motif, 4 positive loci without an NFI-1 motif, and 4 randomly chosen negative loci were chosen for validation by qPCR. Data are derived from analysis of independent ChIP biological replicates. The ama-1 locus was used as an internal control to normalize quantification in qPCR reactions. Bars on the graph represent corresponding ΔΔCt values and their range from 3 independent ChIP biological replicates. Mock ChIP samples (gray) were analyzed as negative controls. (B) Six loci negative for NFI-1 binding according to ChIP-chip, but that contained consensus NFI-1 motifs, were selected and examined by qPCR. Fold enrichment of NFI-1 ChIP/Input is shown. The ama-1 locus was used as an internal control to normalize quantification in qPCR reactions. Error bars show ± 1 SD.

[Fig. S5 image, panels A and B.
(A) NFI-1 target genes are highly expressed. Bar chart; x-axis, RNA abundance quartile (1–4); y-axis, proportion of class (0.0–0.5).
(B) Promoters with NFI-1 motifs; bound vs. unbound. Scatter plot; x-axis, embryo expression percentile rank (0.0–1.0); y-axis, young adult expression percentile rank (0.0–1.0); legend, bound promoters and unbound promoters, with R² = .80 and R² = .69 noted.]

Fig. S5. NFI-1 targets are expressed throughout development. Expression data were derived from a previously published dataset of staged animals over the lifecycle (13). Single-channel raw intensities for mixed-stage RNA were percentile-ranked as an unbiased measure of relative RNA abundance. (A) Percent of NFI-1 target genes (bound; red) versus unbound genes (gray), binned by quartile of relative expression percentile rank. (B) A scatter plot of embryo-expression percentile rank versus young adult-expression percentile rank is shown for all NFI-1 bound (red) versus unbound (gray) genes, with Spearman rank correlation noted.
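Percentile-ranking raw intensities and binning the ranks into quartiles can be sketched in Python as below. The ranking convention (fraction of values less than or equal to each value) and the function names are assumptions; the text does not specify how ties or bin edges were handled.

```python
import math
from bisect import bisect_right

def percentile_ranks(values):
    """Return, for each value, the fraction of values <= it (0 < rank <= 1)."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, v) / n for v in values]

def quartile(rank):
    """Map a percentile rank in (0, 1] to an abundance quartile 1-4."""
    return max(1, math.ceil(rank * 4))

# Hypothetical raw single-channel intensities for four genes.
ranks = percentile_ranks([10.0, 40.0, 20.0, 30.0])
bins = [quartile(r) for r in ranks]
```

Because only the ordering of intensities is used, the ranks are unaffected by any monotonic rescaling of the raw values, which is what makes them comparable across arrays.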

[Fig. S6 image, panels A and B.
(A) Bar chart; x-axis, Bonferroni corrected p-value (10⁻³, 10⁻⁶, 10⁻⁹, 10⁻¹²); y-axis, peak count (0–350); series, all peaks and peaks with an NFI-1 motif. Bar labels as extracted: 297, 133, 119, 74, 59, 57, 55, 48.
(B) Venn diagrams of in vivo versus in vitro sites at each cutoff. Labels (in vivo, overlap, in vitro): p = 1 × 10⁻³: 297, 1, 13; p = 1 × 10⁻⁶: 133, 0, 14; p = 1 × 10⁻⁹: 74, 0, 14; p = 1 × 10⁻¹²: 55, 0, 14.]

Fig. S6. NFI-1 in vivo peak finding and overlap with in vitro targets. Peaks were discovered using a Perl implementation of ChIPOTle (7) at varying Bonferroni-corrected P-value cutoffs. (A) Total peak count at each P-value cutoff is shown in blue and the count of peaks with ≥1 NFI-1 motif is shown in green. With decreasing stringency, a smaller proportion of peaks contains an NFI-1 motif. (B) Venn diagrams representing little or no overlap between in vitro binding sites and in vivo binding sites at each ChIPOTle P-value cutoff.

Fig. S7. Nucleosome occupancy surrounding individual NFI-1 binding sites. Nucleosome occupancy (adjusted nucleosome stringency) derived from a published dataset (11) is plotted for a 1-kb window centered on individual in vivo NFI-1 binding sites that are (A) <500 bp from a TSS ("core") or (B) >500 bp from a TSS ("distal"). The vertical line indicates the motif center. Black overhead brackets indicate >1 motif upstream of the same TSS; gray brackets indicate a promoter with motifs in both the "core" and "distal" categories. The plots are sorted by absolute distance of the 300-bp centered local nucleosome minima from the NFI-1 binding site; sites within the same promoter are grouped together.


A. Motif occurrence at the mouse and human orthologs of C. elegans NFI-1 targets.

C. elegans gene   Mouse gene   BLAST E-value   Motif count   Human gene   BLAST E-value   Motif count
F25H2.5           Nme2         7.6E-55         2             NME2         4.3E-55         1
nhl-2             Trim2        5.8E-25         1             TRIM2        3.5E-25         1
pph-4.1           Ppp4c        7E-141          1             PPP4C        7.1E-141        1
rpt-6             Psmc5        6E-179          1             PSMC5        6.2E-179        1
Y38F2AR.2         Ssr3         2.9E-34                       SSR3         2.3E-34         3
pbs-1             Psmb6        8.2E-53                       PSMB6        1.5E-51         2
rheb-1            Rheb         1.2E-37                       RHEB         1.2E-37         2
rsp-7             Sfrs12       5.5E-47                       SFRS12       5E-62           2
tag-309           Chmp4b       7E-67                         CHMP4B       5.5E-67         2
F17C11.10         Wdhd1        3E-54                         WDHD1        2.2E-59         1
car-1             Lsm14a       1.9E-48                       LSM14B       6.9E-49         1
F01G4.6           Slc25a3      1E-129                        SLC25A3      3.2E-131        1
eft-2             nd           nd              nd            EFTUD2       4.8E-165        1
his-72            nd           nd              nd            H3F3B        5E-66           1
F36A2.9           nd           nd              nd            PLB1         3.4E-28         2
Y54G11A.2         Ccdc25       5.4E-57         4             CCDC25       4.2E-57
snr-3             Snrpd1       2.7E-43         2             SNRPD1       2.7E-43
elc-1             Tceb1        9.8E-43         2             TCEB1        9.8E-43
M28.5             Nhp2l1       8.7E-50         2             NHP2L1       8.7E-50
F25H5.5           Clspn        8.9E-23         1             CLSPN        1.4E-25
htz-1             H2afz        8.5E-52         1             H2AFV        3.5E-52
pfd-1             Pfdn1        2.9E-14         1             PFDN1        2.3E-14
F17C11.9          Eef1g        5.2E-91         1             EEF1G        5.8E-92
rab-18            Rab18        2.4E-59         2             nd           nd              nd
cdc-14            Cdc14a       9.3E-96         1             nd           nd              nd
rack-1            Gnb2l1       1E-127          1             nd           nd              nd

B. Motif occurrence in promoters of orthologs of C. elegans NFI-1 targets.

No. = number with a motif; % = percent with a motif.

                                              p = 10⁻⁴        p = 10⁻⁵        p = 10⁻⁶
                                              No.     %       No.     %       No.     %
C. elegans
  Targets with a C. briggsae ortholog (78)    70      89.7    63      80.8    54      69.2
  All elegans-briggsae orthologs (13861)      4417    31.9    1099    7.9     342     2.5
C. briggsae
  Target Homologs (78)                        65      83.3    59      75.6    52      66.7
  All elegans-briggsae orthologs (13861)      4422    31.9    1002    7.2     373     2.7

Fig. S8. Motif occurrence in the promoters of orthologs of C. elegans NFI-1 targets. (A) A table summarizing the mouse and human orthologs and motif occurrence across both species. The C. elegans NFI-1 target gene is shown with its corresponding mouse and human ortholog gene common name. Blastp e-values are derived from the precomputed results extracted from wormbase release ws170 (http://www.wormbase.org); motif count is the number of motifs found at a Matrixscan cutoff of P = 0.0001; "nd" signifies no Blastp data were available from wormbase. (B) A table summarizing motif finding using Matrixscan in the NFI-1 in vivo targets and their orthologs in C. briggsae. Comparisons were made against all C. elegans–C. briggsae 1-to-1 orthologs. Promoters were defined as within 3 kb upstream of the transcription start site. Motif counts and percent of searched motifs are shown for 3 Matrixscan P-value stringencies (P = 10⁻⁴, 10⁻⁵, and 10⁻⁶).
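Choosing the ortholog as the hit with the lowest e-value from a set of precomputed blastp results can be sketched in Python. The tuple layout, the function name, and the second hit in the example are hypothetical; only the htz-1/H2AFV pair and its e-value come from the table.

```python
def best_hits(hits):
    """Keep, for each query gene, the subject with the lowest e-value.

    hits is an iterable of (query_gene, subject_gene, e_value) tuples.
    """
    best = {}
    for query, subject, e_value in hits:
        if query not in best or e_value < best[query][1]:
            best[query] = (subject, e_value)
    return best

# The second entry is an invented weaker hit, included only to show selection.
example_hits = [
    ("htz-1", "H2AFV", 3.5e-52),
    ("htz-1", "hypothetical_weaker_hit", 1.0e-40),
]
orthologs = best_hits(example_hits)
```

A query with no hits simply has no key in the result, which corresponds to the "nd" entries in the table.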

[Fig. S9 image. Gel, lanes 1–8; bands labeled "Anti-NFI-1/NFI-1/DNA complex" and "NFI-1/DNA complex".
  Lane            1    2    3    4    5    6    7    8
  NFI-1 extract   -    -    +    +    +    +    +    +
  Oligo           wt   mut  wt   mut  wt   wt   mut  mut
  Pre-immune      -    -    -    -    +    -    +    -
  Anti-NFI-1      -    -    -    -    -    +    -    +]

Fig. S9. NFI-1 antiserum recognizes NFI-1 in complex with DNA. DNA binding assays were performed as described previously (2). Arrows denote NFI-1-DNA and Antiserum-NFI-1-DNA complexes, as labeled. Serum from multiple immunized rabbits was screened (not shown) and the serum with the highest titer (this figure) was used in ChIP assays. The positions of some lanes were moved from their positions in the original gel for ease of presentation.
