Supporting Information

Markmiller et al. 10.1073/pnas.1305536111 (www.pnas.org/cgi/content/short/1305536111)

SI Methods

Primer and Probe Sequences. Primer sequences are as follows (5′–3′): Full-length rnpc3 [RNA-binding region (RNP1, RRM) containing 3] coding sequence: forward ATGGCGGAGACGGACGAG and reverse CACTTCAGTGTTTTCTCCCTCCTTTC; clbns846 genotyping: forward AGCAAAGGATACCACCACCA and reverse AAGGCGAGCTCTTTATGAACG; clbnZM genotyping: forward GACGCAGGCGCATAAAATCAGTC and reverse AGGCGTTCGTAATGTTCAGG; whole-mount in situ hybridization (WISH) probe 1 (599 bp): forward TAATACGACTCACTATAGGGTACGAACCCGGAGAACCAAC and reverse AATTAACCCTCACTAAAGGGAACCAGGGCAGTCTGCAATAA (boldface letters correspond to T7 and T3 binding sequences, respectively); WISH probe 2 (790 bp): forward AATTAACCCTCACTAAAGGGCTGCCGTAGCCGAAA and reverse TAATACGACTCACTATAGGGGACCTGAGCGCTTG (boldface letters correspond to T3 and T7 binding sequences, respectively); sms U12in6_7: forward GCAGCGGCAAAGAGCACTATGCTG and reverse AATCCTTACCTCGTAACAATCTCC; mapk3 U12in2_3: forward AGACCTACTGCCAGCGCACCCTG and reverse AGCCCTCGCAGGATCTGATACAG; mapk12 U12in8_9: forward GACATCTGGTCAGTCGGGTGCATC and reverse TCTTCAGACTGTAGCTTGGCTGTG; mapk12 U2in3_4: forward GTTATCGGACTTGTGGATGTGTTC and reverse CATCTGATAGACCAGATACTGCAC; snrpe U12in1_2: forward CGTCATTCTGTTTCCGGATT and reverse CAACCCTCTATCCGCATGTT; ppp2r2a U12in7_8: forward CCTATGGATCTCATGGTGGAG and reverse TGCTTGTCACAAAGTGCTGA; fkbp3 U12in2_3: forward AGGGTCTGCTGACGATGAGT and reverse CAGGGTTTGTCTTATTTATTGTCG; U11 snRNA northern: forward GCATCTGCTGTGAATAGCGTA and reverse GAGGCACCAAGATAACAGATCA; U12 snRNA northern: forward TGCCTTAAACTGATGAGTAAGGAAAA and reverse CGCGGCATCTCGCTAAAGTA; U5 snRNA northern: forward TGTTTCTCTTCATATCGAATAAGTC and reverse AAAATTAGTAAATACTCAAGGTGTTCC; U6atac snRNA northern: forward CTGTTGTTTGAGAGGAGAGAAGGT and reverse AAACCACCCCGATCATGG. U11 snRNA quantitative RT-PCR (RT-qPCR): forward GCTGTGGAAGGGATTCTCTGA and reverse TGGGGCGCCAAGACCAAC; U12 snRNA RT-qPCR: forward TGCCTTAAACTGATGAGTAAGGAAAA and reverse CGCGGCATCTCGCTAAAGTA; U2 snRNA RT-qPCR: forward CTCGGCCTTTTGGCTAAGAT and reverse TACTGCAATACCGGGTCGAT; U4atac snRNA RT-qPCR: forward CTTCCTTGTCTGGGGGTGGTT and reverse GGTGTTAGCAGGGATGTTCTCAGTTA; elongation factor α [elfa]: forward CTTCTCAGGCTGACTGTGC and reverse CCGCTAGCATTACCCTCC; cyclin G1: forward GTGCGGAGACGTTTTCTTT and reverse AAGACAGATGCTTGGGCTGA; p53: forward TCCACTCTCCCACCAACATC and reverse GGGAACCTGAGCCTAAATCC; Δ113p53: forward ATATCCTGGCGAACATTTGG and reverse ACGTCCACCACCATTTGAAC.

Zebrafish Husbandry and Strains. Animals were maintained at 28 °C on a 12-h light/dark cycle under the standard zebrafish husbandry procedures of the Ludwig Institute for Cancer Research/Walter and Eliza Hall Institute of Medical Research Zebrafish Facility. A commercial strain harboring an independent mutation in rnpc3 (clbnZM) was purchased from Znomics (this company terminated operations in 2009). The following WT strains of zebrafish were used: Tübingen/Tübingen long fin (Tü/TL), used to propagate the clbns846 mutant strain and for positional cloning; WIK, used in positional cloning as a polymorphic reference strain to Tü/TL. The following transgenic lines of zebrafish were used: Tg(gutGFP)s854, also known as Tg(Xla.Eef1a1:GFP)s854, expressing GFP in the endoderm-derived tissues of the developing gut, liver, and pancreas. Tg(fabp10:dsRed, ela3l:GFP)gz12, also known as 2CLIP (2-color liver pancreas), expressing dsRed in the liver and GFP in the exocrine pancreas.

Microinjections of Embryos. Fertilized one- to two-cell stage embryos were injected under a Leica MZ6 stereomicroscope with ∼2 nL of injection solution at the indicated concentrations using a microinjection apparatus (Narishige) and a micromanipulator (Narishige). For mRNA and minigene injections, the needle was positioned in the area of embryonic streaming from the yolk cell to the fertilized blastocyst. Trace amounts of Phenol red were added to injection solutions as a visible tracer and rhodamine dextran was added to nonfluorescent injection solutions as a fluorescent tracer.

Genotyping. Genomic DNA from single zebrafish embryos or adult fin clips was extracted by incubation at 95 °C for 20 min in 100 μL of 50 mM sodium hydroxide (NaOH) using a thermocycler, followed by neutralization with 10 μL of 1 M Tris·HCl (pH 8.0). For clbnZM genotyping, 35 PCR cycles of 30 s at 95 °C, 30 s at 58 °C, and 1 min at 72 °C were performed. For clbns846, allele-specific PCR was performed using the same conditions as previously described (1).

Histology. Larvae for histology or immunohistochemistry were fixed in Bouin’s fixative overnight at 4 °C. After fixation, larvae were rinsed three times with PBS containing 0.1% (vol/vol) Tween 20 (PBST) to remove residual fixative, and stored at 4 °C for no more than 48 h. Larvae were aligned and embedded in disposable plastic Tissue-Tek cryomolds (ProSciTech) in 1% low melting temperature agarose (SeaKem LE Agarose, Lonza) and stored in 70% (vol/vol) ethanol (EtOH). The blocks of larvae were dehydrated through an ethanol series to 100%, embedded in paraffin and cut into 5-μm-thin sections, which were then stained with H&E.

Microscopy. Brightfield and epifluorescence images were captured using a Leica MZ FLIII fluorescence microscope equipped with an Olympus DP70 camera and Olympus DP controller software. Live or fixed larvae were placed in 2% methyl cellulose to prevent them from moving during image acquisition. Images were imported into ImageJ (National Institutes of Health) for image processing and CorelDRAW for figure preparation. For two-photon microscopy, WT and clbns846 mutant larvae on the Tg(gutGFP)s854 background were fixed briefly in 2% (wt/vol) PFA/PBST to preserve endogenous GFP fluorescence. Whole larvae were embedded in 2% (wt/vol) low-melting temperature agarose for imaging. Imaging was performed on an Olympus FV1000 Confocal Microscope (BX61W upright) using a 20× XLUMPlanFL NA0.95 objective and Olympus FV-10ASW v1.7c software. Multiphoton excitation was provided by a Spectra Physics MaiTai DeepSee LD Ti:Sapphire Laser. Excitation wavelengths used were 920 nm for GFP and 750 nm for autofluorescence of internal structures of zebrafish larvae, both of which were detected with a 490- to 540-nm band-pass filter (Olympus Model FV10MPMG/R).

WISH. Zebrafish embryos and larvae were processed for WISH, as described previously (2). To generate rnpc3 riboprobes, a cDNA template was amplified by RT-PCR. For primer sequences see
Primer and Probe Sequences above. Antisense and sense probes were then transcribed using the digoxigenin DNA Labeling Kit (Roche Diagnostics) according to the manufacturer’s instructions. Hybridized riboprobes were detected using an anti-DIG antibody conjugated to alkaline phosphatase according to the manufacturer’s instructions (Roche Diagnostics). Larvae were imaged on a Nikon SMZ 1500 microscope.

Minigene Assay. A minigene spanning the U2-type intron 13, the U12-type intron 14, and flanking exons, was amplified from WT and clbns846 genomic DNA using primers carrying 5′ tails (23 bp) to allow specific amplification of the exogenous transgene after injection into zebrafish embryos. WT and mutant minigenes were confirmed by sequencing to differ only in the T→A mutation observed in clbns846.

Preparation of Larval Extracts for snRNP Analyses. Mutants were sorted based on phenotype at 150 h postfertilization (hpf) and ∼100 larvae each of WT and clbnZM were used per glycerol gradient experiment. Larvae were killed in benzocaine, rinsed several times in ice cold buffer G, and transferred into 1.5-mL microcentrifuge tubes. Tubes were centrifuged for 1 min at 16,000 × g at 4 °C and buffer G removed completely, leaving a larval pellet. Larvae were then resuspended in 4 volumes (microliter per larva) of buffer G containing 8% (vol/vol) glycerol, 1 mM DTT, 0.5 mM PMSF, and 0.02% Nonidet P-40. Larvae were homogenized using a 1-mL Dounce homogenizer and extracts were transferred into an ultracentrifuge tube and centrifuged for 30 min at 100,000 rpm in an Optima Max ultracentrifuge (Beckman Coulter, Brea, CA) in a Beckman TLA100.2 rotor at 4 °C. The supernatant was layered onto 10–30% (vol/vol) glycerol gradients or used for native gel electrophoresis.

Native Gel Electrophoresis. Whole larval extract was prepared as described above and separated on 4% (wt/vol) polyacrylamide (80:1), 1× TGM (50 mM Tris base, 50 mM glycine, 2 mM MgCl2) gels at 160 V for 6 h at 4 °C with 1× TGM buffer (3). Samples treated with heparin were incubated with 5 mg/mL heparin on ice for 10 min before loading on the gel. Pairs of WT and clbn extracts were loaded into adjacent wells of the gel to permit multiple analyses of the same extract. The resolved components were transferred to Hybond-N (GE Healthcare), and the membrane was cut into strips before Northern blotting for U5, U6atac, U11, and U12 snRNAs (see below).

Synthesis of Radioactive Probes. For Northern analysis of gradients, full-length snRNA probe templates for U11 and U12 were amplified from genomic DNA using the primers listed above. Probes were radiolabeled with α-32P-CTP (Perkin-Elmer) using the Megaprime DNA labeling kit (GE Healthcare). After incubation for 10 min, unincorporated nucleotides were removed using Illustra ProbeQuant G-50 Micro Columns (GE Healthcare). For analysis of native gels, U5, U6atac, and U11 snRNAs were cloned, sequenced, and linearized before synthesis of cRNA probes labeled with 32P-UTP (Perkin-Elmer).

Northern Analysis. Membranes were prehybridized in Ultrahyb hybridization buffer (Ambion) for 1–4 h at 42 °C. cDNA probes were heated for 5 min at 95 °C in the presence of salmon sperm DNA (200 μL per 10 mL hybridization buffer; Sigma) and chilled on ice. Probes were added to prewarmed hybridization buffer, and membranes were incubated in the presence of the probe for 12–16 h at 42 °C. For gradients, membranes were washed four times in 0.1× SSC, 0.1% SDS at 65 °C; for native gels, they were given two 5-min washes in 2× SSC, 0.1% SDS at 42 °C followed by two 15-min washes in 0.2× SSC, 0.1% SDS at 42 °C. Membranes were exposed to Phosphor Storage screens (GE Healthcare) before image acquisition using ImageQuant TL software (GE Healthcare) on a Storm PhosphorImager 820 (GE Healthcare). ImageQuant TL software was used to quantify full-length signals for U11 or U12. Background was subtracted manually for each lane, and in those lanes with >2× signal, background was expressed as a percentage of the lane with the greatest signal intensity in a single gradient. Normalized intensities of bands across 24 fractions (fraction nos. 4–27) were plotted in Excel vs. fraction number.

Experimental Design for Microarray and RNAseq. Gene-expression data were obtained from two independent platforms, using DNA microarrays as well as RNA sequencing of poly(A)-enriched RNA from clbns846 whole larvae. This approach combines advantages of both technologies (i.e., well-established experimental and analysis procedures for microarrays and the independence of RNAseq from a priori knowledge of the zebrafish transcriptome) but takes into account the cost of RNAseq experiments as well as limitations on microarray content imposed by incomplete annotation of the zebrafish genome. For microarrays, three independent biological replicates consisting of ∼20 pooled genotyped clbns846/s846 and clbn+/+ larvae each were analyzed on the Agilent Zebrafish V2 Gene Expression platform. Biological replicates were then pooled and one single sequencing library each was constructed for mutant and WT samples, respectively, allowing them to be sequenced on a single slide in the same sequencing run. Although the lack of technical or biological replicates in RNAseq data limits the statistical power of the analysis, the experimental design still allows extensive interpretation of the data. The coefficient of variation for differential gene expression in RNAseq experiments is dominated primarily by biological variability and we minimized the already relatively small contribution of technical variability further by sequencing all samples on a single slide in the same run. Both samples represent pools of ∼60 whole zebrafish larvae derived from three different heterozygous adult pairs, substantially reducing the biological variance seen in experiments on samples from individual animals or human patients. Furthermore, the use of independent biological replicates for microarrays allows the estimation of biological variance between the triplicates and hierarchical clustering suggests very little variation between replicates. A χ2 statistic was calculated for RNAseq gene expression data to apply a ranking to differentially expressed genes (DEGs).

Sample Preparation for Microarrays and RNAseq. Embryos were obtained from three individual mating pairs of clbns846 heterozygous adult fish. At 108 hpf they were killed in benzocaine and placed on ice. Individual larvae were transferred into wells of a 96-well plate filled with 60 μL of solution D [4 M guanidinium thiocyanate, 25 mM sodium citrate pH 7.0, 0.5% (wt/vol) N-lauroylsarcosine (Sarkosyl), 0.1 M 2-mercaptoethanol] on ice. After all larvae from all three biological replicates were transferred into 96-well plates, they were homogenized until no solid tissue pieces could be observed in the solution. After homogenization, plates were spun briefly to collect samples in the wells and immediately processed further. For extraction of genomic DNA and RNA, 6 μL of 2 M NaOAc (pH 4.0), 60 μL of Phenol (pH 4.3), and 12 μL of chloroform:isoamyl alcohol (49:1) were added to each well. After addition of all reagents, wells were mixed by vigorously pipetting up and down 8–12 times. Plates were incubated on ice for 15 min and then centrifuged at full speed in a tabletop centrifuge at 4 °C for 20 min. After centrifugation, 50 μL of the organic (lower) phase containing DNA and protein were transferred into a new 96-well plate filled with 40 μL of absolute EtOH per well. At this stage, the plates containing the aqueous phases were sealed and stored at −70 °C.

Extraction of Genomic DNA from Single Embryos. For extraction of genomic DNA from single embryos, EtOH and organic phases were mixed by pipetting and incubated at room temperature for 2–3 min before plates were centrifuged at 2,000 × g in a bench-top
centrifuge at 4 °C for 5 min. Supernatants were carefully discarded and 125 μL of 0.1 M sodium citrate solution were added to each well to wash the pellet. Plates were sealed and incubated at room temperature for 30 min with intermittent mixing about two to three times. After incubation, plates were again centrifuged at 2,000 × g in a bench-top centrifuge at 4 °C for 5 min and supernatants removed. Wash steps were repeated once with sodium citrate, once with 150 μL of ice-cold 75% (vol/vol) EtOH, and once with 150 μL of ice-cold 95% EtOH. After centrifugation and discarding of supernatants, pellets were air dried for 15–30 min and resuspended in 20–50 μL of freshly prepared 8 mM NaOH. Plates were incubated on ice or at 4 °C for at least 1 h and up to overnight to solubilize pellets and one-tenth volume of 1 M Tris·HCl (pH 8.0) was added to neutralize the DNA solution. Genotyping of single embryos was performed as described above using 1 μL of genomic DNA.

Extraction of Total RNA from Pooled WT and Mutant Larvae. For extraction of total RNA, aqueous phases from all wells identified as homozygous mutant or WT larvae, respectively, were pooled in a 15-mL falcon tube. Isopropanol precipitation, washes, and resuspension of pellets were carried out as per standard procedures. Concentration and purity of total RNA samples were assessed spectrophotometrically using a Nanodrop. Integrity of total RNA was examined on the Agilent Bioanalyzer using an RNA 6000 Nano chip (Agilent). RNA for microarray analysis had to exhibit 260 nm:230 nm and 260 nm:280 nm absorption ratios of 1.8–2.0 and an RNA integrity number of at least 9.0 out of 10. Two micrograms of total RNA for each sample were sent to the Australian Zebrafish Phenomics Facility at the University of Queensland for microarray analysis. For RNAseq, the three biological replicates for each sample were pooled and stored at −70 °C until poly(A) enrichment (see below).

RNA Amplification and Target Labeling for Microarrays. Microarrays were performed as one-color experiments. Total RNA was linearly amplified from 1 μg of total RNA per sample using the messageAMP amplified RNA (aRNA) kit (Ambion), yielding a minimum of 20 μg of amino-allyl–labeled antisense aRNA. Quantity and integrity of aRNAs were compared by running each sample on a bioanalyzer RNA microfluidic chip (Agilent) before labeling. Five micrograms of each aRNA sample were then labeled by covalent linking of Cy3-labeled UTP (Amersham). Finally, the labeled material was hydrolyzed and used for hybridization to Agilent Zebrafish V2 Gene Expression arrays. Hybridization to each array was performed for 16 h at 45 °C.

Image Acquisition, Normalization, and Analysis. Hybridized microarrays were scanned on a 600B Scanner (Agilent). The images were analyzed using Imagene 5.6.0 (BioDiscovery) to determine mean foreground and background signal intensities. Data were extracted using Agilent Feature Extraction software and analyzed using the Linear Models for Microarray Data (LIMMA) software package via the R Project for Statistical Computing (www.r-project.org). Data were background-corrected and normalized both within and between arrays (4). Differential expression was defined using a robust statistical method rather than simple fold-change. Genes were ranked using the B-statistic method, where both the fold-change and variance of signals in replicates are used to determine the likelihood that expression of a gene is truly differential (5). A linear model of the data was fitted to the experimental design matrix and used to calculate Bayesian statistics (B statistics; posterior log odds) adjusted for multiple testing using Benjamini–Hochberg analysis (5). A threshold in the B statistic of 0.0 was adopted.

Poly(A) Enrichment of RNA for RNAseq. Total RNA from genotyped homozygous WT and clbns846 mutant samples was enriched for poly(A) RNA by performing two rounds of purification using the MicroPoly(A)Purist Kit (Ambion) according to the manufacturer’s protocol. After precipitation with EtOH overnight, resuspension in 10.5 μL of nuclease-free water and heating to 55 °C for 5 min to redissolve the poly(A) enriched RNA, samples were stored at −70 °C until further use.

Library Construction for RNAseq. Libraries for deep sequencing were generated using the SOLiD Whole Transcriptome Analysis Kit (Applied Biosystems/Life Technologies) with protocol Part Number 4409491 Rev F, 08/2009. Eight microliters of poly(A) RNA were fragmented by incubating with RNase III for 10 min at 37 °C and immediately purified on Qiagen RNeasy columns. Samples were concentrated to a volume of ∼5 μL by centrifuging under vacuum at 40 °C and analyzed on the Agilent Bioanalyzer (Agilent) using an RNA Nano 6000 chip (Agilent) to assess quantity and size distribution of the fragmented RNA. The fragmented RNA was hybridized and ligated to Adaptor Mix A to generate templates for SOLiD System sequencing from the 5′ end of the sense strand. After overnight ligation, samples were reverse-transcribed for 30 min at 42 °C and the obtained cDNA purified using Qiagen MinElute PCR Purification Kit. cDNA was diluted in 2× urea sample buffer [45 mM Tris base, 45 mM Boric acid, 1 mM EDTA (free acid), 6% (wt/vol) Ficoll Type 400, 3.5 M urea, 0.005% Bromophenol blue, 0.025% xylene cyanol] and separated on 6% (wt/vol) polyacrylamide (19:1), 8 M urea-TBE gels for 25 min at 180 V. When the Bromophenol blue marker had migrated halfway, gels were stained in SYBR gold nucleic acid stain in TBE for 10 min. The region between 100 and 200 bp was excised and used as templates for 18 cycles of cDNA amplification. After purification using PureLink PCR Micro columns (Invitrogen), cDNA samples were assessed for concentration and size distribution. The majority of cDNAs in the library were between 150 and 250 bp in size and a smear analysis through the Bioanalyzer software confirmed that only between 2% and 8% of cDNAs fell in the 25- to 150-bp size range, in accordance with the protocol requirements. cDNA libraries were used in the next step for attachment to beads by emulsion PCR. Emulsion PCRs and sequencing on a SOLiD3 sequencer were performed by Ivonne Petermann (employee of Life Technologies) according to the manufacturer’s recommendations.

SOLiD3 Data Acquisition and Read Mapping. Sequencing reactions were performed on a SOLiD3 sequencer in the laboratory of Melissa Southey at the Department of Pathology, University of Melbourne, VIC, Australia. Raw read files were transferred to servers at the Queensland Centre for Medical Genomics, at the Institute for Molecular Bioscience in Brisbane. The genome sequence of the zebrafish Zv9 genome assembly was obtained from the University of California at Santa Cruz (UCSC) table browser. RNAseq reads were aligned to the genome using the X-MATE recursive mapping pipeline (6), using the following parameters: (i) strand-specific mapping, where tags are expected to align on the sense strand; (ii) raw tag length of 50 nt; (iii) genome and junction mapping performed with the ISAS mapping engine; (iv) recursive mapping N,M, where N is the length of the tag, and M is the number of allowable mismatches: 50,5 then 45,5 then 40,5 then 35,3 then 25,2; (v) tags aligning to multiple genomic locations were discarded; (vi) exhaustive alignments were enabled (filtering was disabled); (vii) quality filtering was disabled; and (viii) mapping to junctions was enabled. During the mapping to junctions, 10 nucleotides were required to overlap the exon–exon boundary for tag lengths ≥40, and 5 nucleotides were required to overlap for tag lengths ≤35. Exon–exon junction libraries were constructed using Refseq Zv9 annotations and the scripts distributed with X-MATE, and were filtered to include intron lengths of more than 9 nt. Raw data are available through the Gene Expression Omnibus (accession no. GSE53935). BED and WIG files are available from the following URLs:
Wild-type.
http://grimmond.imb.uq.edu.au/files/zebrafishU12/108wt.start.negative.gz
http://grimmond.imb.uq.edu.au/files/zebrafishU12/108wt.start.positive.gz
http://grimmond.imb.uq.edu.au/files/zebrafishU12/108wt.wiggle.negative.gz
http://grimmond.imb.uq.edu.au/files/zebrafishU12/108wt.wiggle.negative.bw
http://grimmond.imb.uq.edu.au/files/zebrafishU12/108wt.wiggle.positive.gz
http://grimmond.imb.uq.edu.au/files/zebrafishU12/108wt.wiggle.positive.bw
http://grimmond.imb.uq.edu.au/files/zebrafishU12/108wt.expect.junc.BED.gz
http://grimmond.imb.uq.edu.au/files/zebrafishU12/108wt.unexpect.junc.BED.gz

caliban.
http://grimmond.imb.uq.edu.au/files/zebrafishU12/108ko.start.negative.gz
http://grimmond.imb.uq.edu.au/files/zebrafishU12/108ko.start.positive.gz
http://grimmond.imb.uq.edu.au/files/zebrafishU12/108ko.wiggle.negative.gz
http://grimmond.imb.uq.edu.au/files/zebrafishU12/108ko.wiggle.negative.bw
http://grimmond.imb.uq.edu.au/files/zebrafishU12/108ko.wiggle.positive.gz
http://grimmond.imb.uq.edu.au/files/zebrafishU12/108ko.wiggle.positive.bw
http://grimmond.imb.uq.edu.au/files/zebrafishU12/108ko.expect.junc.BED.gz
http://grimmond.imb.uq.edu.au/files/zebrafishU12/108ko.unexpect.junc.BED.gz

URLs for Upload of Sequencing Data as Custom Tracks into UCSC Genome Browser.
track type=bigWig name="108mut negative" description="cal 108hpf mutant negative strand" bigDataUrl=http://grimmond.imb.uq.edu.au/files/zebrafishU12/108ko.wiggle.negative.bw
track type=bigWig name="108mut positive" description="cal 108hpf mutant positive strand" bigDataUrl=http://grimmond.imb.uq.edu.au/files/zebrafishU12/108ko.wiggle.positive.bw
track type=bigWig name="108wt negative" description="cal 108hpf wildtype negative strand" bigDataUrl=http://grimmond.imb.uq.edu.au/files/zebrafishU12/108wt.wiggle.negative.bw
track type=bigWig name="108wt positive" description="cal 108hpf wildtype positive strand" bigDataUrl=http://grimmond.imb.uq.edu.au/files/zebrafishU12/108wt.wiggle.positive.bw

Definition of a Validated Set of Zebrafish U12-Type Introns. We obtained an updated dataset of U12 introns from the U12 database (7) (http://genome.crg.es/cgi-bin/u12db/u12db3.cgi, courtesy of Tyler Alioto, Centre Nacional d’Anàlisi Genòmica, Parc Científic de Barcelona, Barcelona), which identified 733 introns in the zebrafish Zv8 genome assembly, compared with 702 introns in the human hg18 genome assembly. With the release of the Zv9 assembly, the coordinates of Zv8 U12 introns were converted to the corresponding genomic locations in Zv9 using the liftover function of the UCSC genome browser. All resulting coordinates were manually inspected on a genome browser level to verify that they fulfilled the following criteria: (i) contained either the AT-AC or the GT-AG combination of 5′ and 3′ splice site dinucleotides as well as sequences at the 5′ splice site and the branch point sequence that satisfy the consensus sequence motifs for these intron features (8); and (ii) were sufficiently supported by mRNA, EST, known protein-coding gene or zebrafish RNAseq evidence available from Ensembl release 61. This process yielded 606 bona fide U12-type introns, of which 461 were found to have an associated Refseq Transcript ID.

Definition of a Reference Set of Exons and Introns. We defined a reference exon/intron set from the zebrafish Zv9 genome assembly, by downloading all Refseq-annotated genes from the UCSC table browser onto the publicly accessible Galaxy graphical user interface to the high performance computing cluster at Penn State University (http://main.g2.bx.psu.edu/) (9). We excluded all Refseq IDs that mapped to multiple locations or did not map within the assembled chromosomes 1–25, yielding a total of 13,854 genes with 126,875 exons and 113,021 introns. Introns of less than 25 bp were excluded, leaving a set of 112,196 introns for analysis (Fig. S3).

Construction of Combinatorial Libraries to Search for Cryptic Splicing. Combinatorial exon–exon junction libraries were constructed from the set of zebrafish U12-type intron coordinates defined above using a custom perl script (available on request). Each potential donor site (within a 100-nt window centered on the canonical donor site) was paired with every potential acceptor site (within a 100-nt window centered on the canonical acceptor site). Using this script on 634 canonical U12 introns, we generated 6,467,434 potential exon–exon junctions (101 candidate donor positions × 101 candidate acceptor positions per intron). Junction libraries for use during mapping were then constructed using scripts distributed with X-MATE. Mapping was then performed as described above with the following changes: no recursive mapping was performed, and tags were aligned first to the genome at 50,5 and then to the junction libraries, requiring a 10-nt overlap.

Zebrafish Uniqueome. Uniqueome files for zebrafish Zv8 and Zv9 genome assemblies were generated as previously described (10) and are available for download from: http://grimmond.imb.uq.edu.au/uniqueome/downloads.

Calculation of Reads per Unique Kilobase per Million Reads Values. BED files containing the chromosomal positions of all unique read start sites were uploaded onto the publicly accessible Galaxy graphical user interface to the high-performance computing cluster at Penn State University (9) at http://main.g2.bx.psu.edu/ for further analysis. To determine a normalized quantitative value of intron retention for every intron in the reference dataset, a Galaxy workflow was implemented to determine the number of reads per unique kilobase per million reads (RPuKM) for each intron. To enable the assignment of stranded sequencing data to introns, both Refseq and U12 intron datasets were split in two separate datasets corresponding to the positive and negative strand using the “Filter” function in Galaxy to sort on the strand column. In the first step, all tags with start sites within the coordinates of an intron in the reference sequence were selected by the “Join” function within the “Operate on Genomic Intervals” menu. The information from both strands was then concatenated into a single file using the “Concatenate” function in the “Operate on Genomic Intervals” menu. Next, the “Group” function in the “Join, Subtract and Group” menu was used to calculate total read counts per intron by grouping on the column containing the intron identifier and adding up the number of tags mapping to the same intron. The resulting file was joined side by side with the file containing the length of each intron using the “Join two queries” in the “Join, Subtract and Group” menu. In the final steps, using the “Compute” function in the “Text Manipulation” menu, RPuKM values were calculated by dividing the total number of reads per intron by the unique intron length and multiplying by 1,000. The resulting number was then divided by the number of total mapped reads for the respective sample and
multiplied by 1,000,000 to obtain final RPuKM. To simplify the resulting output file, only the columns containing the intron identifier and the final RPuKM value were selected using the “Cut” function from the “Text Manipulation” menu. The analysis procedure for U12-type introns was essentially identical with the only difference being in intron identifiers. To assess RPuKM values for all U12-introns, unique identifiers were constructed by concatenating chromosome number, intron start position and intron end position into a single field. To analyze whole gene expression, start sites were counted for individual exons as described above and read numbers for all exons of a particular gene were summed up for calculation of gene RPuKM values.

Calculation of Intron Retention Coefficients. Intron retention coefficients (IRCs) were calculated in Microsoft Excel from tab-delimited Galaxy output files. Only introns with >1 total read or intron RPuKM > 0.05 in at least one genetic condition were included. Introns where no corresponding RNAseq gene expression value was available were also excluded from the analysis. IRCs were calculated by dividing intron RPuKM values by the corresponding gene RPuKM values. IRC values for U12- and U2-type introns were compared and IRC values for U12-type introns were plotted against U12 intron subtype, intron length and gene expression values using Prism5/Graphpad software.
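
The RPuKM and IRC arithmetic described above can be sketched in Python as follows. This is an illustrative sketch only, not the Galaxy workflow or the Excel calculation used in the study: the function names, the treatment of a missing or zero gene RPuKM value as "excluded", and the example numbers are all assumptions made for illustration. The inclusion filter is read here as "more than one read, or RPuKM above 0.05, in at least one genetic condition".

```python
from typing import Optional, Sequence


def rpukm(read_count: int, unique_length_nt: int, total_mapped_reads: int) -> float:
    """Reads per unique kilobase per million mapped reads."""
    if unique_length_nt <= 0 or total_mapped_reads <= 0:
        raise ValueError("unique length and total mapped reads must be positive")
    # Step 1: reads per unique kilobase of the feature.
    reads_per_kb = read_count / unique_length_nt * 1_000
    # Step 2: scale by sequencing depth (per million mapped reads).
    return reads_per_kb / total_mapped_reads * 1_000_000


def passes_filter(read_counts: Sequence[int], rpukm_values: Sequence[float]) -> bool:
    """Keep an intron if any condition has >1 read or RPuKM > 0.05 (assumed reading)."""
    return any(n > 1 or v > 0.05 for n, v in zip(read_counts, rpukm_values))


def intron_retention_coefficient(intron_rpukm: float,
                                 gene_rpukm: Optional[float]) -> Optional[float]:
    """IRC = intron RPuKM / gene RPuKM; None when no usable gene value exists."""
    # Assumption: a missing or zero gene value means the intron is excluded.
    if gene_rpukm is None or gene_rpukm <= 0:
        return None
    return intron_rpukm / gene_rpukm


# Hypothetical numbers, for illustration only (not data from the study).
intron_value = rpukm(read_count=12, unique_length_nt=400, total_mapped_reads=20_000_000)
irc = intron_retention_coefficient(intron_value, gene_rpukm=6.0)
print(round(intron_value, 3), round(irc, 3))
```

With these made-up inputs, 12 reads over 400 unique nucleotides correspond to 30 reads per unique kilobase; at 20 million mapped reads that is an RPuKM of 1.5, and against a gene RPuKM of 6.0 the IRC is 0.25. Gene RPuKM values would be obtained with the same `rpukm` function after summing read counts and unique lengths over all exons of the gene.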

1. de Jong-Curtain TA, et al. (2009) Abnormal nuclear pore formation triggers apoptosis in the intestinal epithelium of elys-deficient zebrafish. Gastroenterology 136(3):902–911.
2. Christie EL, Parslow AC, Heath JK (2008) Determination of mRNA and protein expression patterns in zebrafish. Methods Mol Biol 469:253–272.
3. Raghunathan PL, Guthrie C (1998) A spliceosomal recycling factor that reanneals U4 and U6 small nuclear ribonucleoprotein particles. Science 279(5352):857–860.
4. Smyth GK, Yang YH, Speed T (2003) Statistical issues in cDNA microarray data analysis. Methods Mol Biol 224:111–136.
5. Smyth GK (2004) Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 3:e3.
6. Wood DL, Xu Q, Pearson JV, Cloonan N, Grimmond SM (2011) X-MATE: A flexible system for mapping short read data. Bioinformatics 27(4):580–581.
7. Alioto TS (2007) U12DB: A database of orthologous U12-type spliceosomal introns. Nucleic Acids Res 35(Database issue):D110–D115.
8. Brent MR, Guigó R (2004) Recent advances in gene structure prediction. Curr Opin Struct Biol 14(3):264–272.
9. Taylor J, Schenck I, Blankenberg D, Nekrutenko A (2007) Using galaxy to perform large-scale interactive data analyses. Curr Protoc Bioinformatics Chapter 10:5.
10. Koehler R, Issac H, Cloonan N, Grimmond SM (2011) The uniqueome: A mappability resource for short-tag sequencing. Bioinformatics 27(2):272–274.

Fig. S1. rnpc3 is the mutated locus in clbn. (A) Simple schematic representation of the genetic interval containing the mutated locus in clbns846 based on the zebrafish genome assembly (Zv9). The genetic region of interest was defined by determining linkage of several simple sequence length polymorphism (SSLP) markers to the mutation (1). A total of up to 2,980 meioses from a defined series of crosses between the original ethylnitrosourea-mutagenized strain and a polymorphic mapping strain were analyzed, narrowing down the genetic interval containing the mutation to a distance of 0.31 cM between the closest linked markers. Blue boxes represent bacterial artificial chromosome (BAC) clones encompassing the region. Diagonal line indicates the presence of additional intervening sequence. Black arrows and gene names shown in italics represent annotated genes. (B) Aberrant transcripts extracted from clbns846 larvae encode truncated translation products. RT-PCR of rnpc3 RNA extracted from clbns846 larvae produces four faint bands (see Fig. 2A), corresponding to the aberrant transcripts (a–d) shown schematically in Fig. 2B. Product (i) is encoded by transcripts retaining intron 13 (a and c in Fig. 2B). A frame-shift in the coding sequence results in a premature stop codon instead of residue 440. Product (ii) is encoded by transcripts (b and d in Fig. 2B) using a de novo 3′ splice site (ss) in intron 13 that results in the addition of 10 intronic nucleotides to the coding sequence. This also introduces a frame-shift, and a premature stop codon is encountered instead of residue 445. Full-length WT product is shown below. (C) In vivo minigene assay to characterize the splicing defect in clbns846. A minigene spanning the U2-type intron 13, the U12-type intron 14, and flanking exons was amplified from WT and clbns846 genomic DNA using primers carrying 5′ tails (23 bp; shown in red) to allow specific amplification of the exogenous transgene after injection into zebrafish embryos. No correctly spliced RT-PCR product is detected upon injection of mutant minigene plasmid DNA into clbns846 and WT embryos (lanes 1 and 3, respectively). Sequencing of the lower bands confirms the use of the same de novo 3′ ss detected endogenously in clbns846. The retention of 10 intronic nucleotides is indicated by the yellow region in the schematic diagram. Mutant larvae display significant retention of the U12-type intron 14 upon injection of both the WT and mutant minigene (lanes 1 and 2, asterisks). (D) Full-length rnpc3 cDNA is RT-PCR amplified from RNA extracted from one- to two-cell stage embryos (lane 1), indicating maternal deposits of WT rnpc3. The level of full-length rnpc3 is greatly reduced in clbnZM mutants by 24 hpf and undetectable by 48 hpf. Lanes contain RT-PCR products of RNA extracted from genotyped homozygous WT (lanes 2 and 4) and clbnZM mutant embryos (lanes 3 and 5) at 24 hpf (lanes 2 and 3) and 48 hpf (lanes 4 and 5). (Lower) elfa control. (E–H) Brightfield microscopy images of left lateral views of (E) WT siblings from a clbnZM × clbns846 cross, (F) homozygous mutant clbns846 larvae, and (G) homozygous mutant clbnZM larvae. (H) Twenty-five percent of offspring from clbnZM × clbns846 crosses show a phenotype indistinguishable from that of either clbns846 or clbnZM homozygous mutants. The images in panels E–H were taken at the same magnification.

1. Zhou Y, Zon LI (2011) The zon laboratory guide to positional cloning in zebrafish. Methods Cell Biol 104:287–309.

Fig. S2. Widespread rnpc3 mRNA expression in early zebrafish embryos becomes more restricted during development. WISH analysis was carried out on clutches of embryos derived from an in-cross of clbnZM heterozygotes using a 599-bp antisense RNA probe designed to hybridize to zebrafish rnpc3 mRNA. (A and B) Widespread rnpc3 mRNA expression in WT embryos at 10 and 24 hpf (arrows). (C and D) From 48 to 72 hpf, rnpc3 mRNA expression becomes progressively restricted to proliferating tissues, including the lens (arrow), pharyngeal region (white arrow), liver and pancreas anlage (black and white arrowheads, respectively), and intestine. (E and F) At 96–120 hpf, rnpc3 mRNA expression is prominent in the digestive organs. Right lateral views show strong expression in the developing pancreas (white arrowhead) and intestine (arrow). (G) Sense probe at 96 hpf produces very weak, nonspecific staining. (H) At 120 hpf, clbn larvae show markedly reduced rnpc3 expression, compared with WT (F). B–D are dorsal views; E–H are right lateral views. Asterisk in E denotes residual pigment after PTU treatment. The same patterns of expression were obtained with a 790-bp probe designed to hybridize to a nonoverlapping region of rnpc3 mRNA. All embryos/larvae shown were genotyped and all images were taken at the same magnification.

Fig. S3. Aberrant U12-type snRNPs are present in caliban (clbn). (A and B) Additional glycerol gradients followed by Northern analysis of U11 spliceosomal snRNPs from WT and clbnZM extracts demonstrate that the abnormal sedimentation of U11 snRNA-containing snRNPs in clbnZM extracts (see Fig. 3D) is reproducible. Gradients shown in B were centrifuged for 17 h instead of 19 h. (C) Northern analysis of WT, clbns846, and clbnZM extracts resolved on native gels and probed for U12 snRNA shows a similarly retarded band (arrowhead) in clbns846 and clbnZM extracts compared with WT, indicating the functional equivalence of the two clbn alleles.

Fig. S4. Characteristics of intron retention. (A) Schematic diagram showing the pipeline used to define a reference set of introns and exons. The current annotated Refseq geneset was downloaded from the UCSC table browser (http://genome.ucsc.edu/cgi-bin/hgTables) into Galaxy (http://main.g2.bx.psu.edu/). IDs that did not map to the assembled chromosomes 1–25 or that mapped to multiple locations in the genome were excluded and the remaining set of 13,854 Refseq IDs was split into introns and exons using the Gene BED To Exon/Intron/Codon BED expander function in Galaxy. This yielded 126,875 exons and 113,021 introns, 825 of which were excluded because they were shorter than 25 bp. Both reference sets are distributed evenly across the positive and negative strand. (B–D) The degree of intron retention for any given transcript does not correlate with its overall expression level (B), intron length (C), or U12-type intron subtype (D). The data in C show that the retention of U12-type introns in clbn is higher than in WT and is independent of intron length. Although this graphical representation may give the impression that U12-type introns (represented by ∼300 red dots) are generally much shorter than Refseq (U2-type) introns, there is no significant difference between the average length of U2-type and U12-type introns. Indeed, the vast majority of Refseq introns (represented by >55,000 blue dots on the graph) are also very short (in the graph the shorter Refseq introns are on top of each other; thus, the longer ones are individually more conspicuous). This graph also shows that there is no significant relationship between the degree of intron retention of affected Refseq introns in clbn and their length. (E) RT-PCR of three different U12-type intron-containing transcripts provides evidence of slight retention of U12-type introns in the snrpe and ppp2r2a genes in clbns846 compared with WT. Sequencing of the fastest migrating (spliced) bands in the clbn lanes provided no evidence of cryptic splicing for any of the three genes. (F) Schematic representation of the custom junction approach to detecting novel cryptic splice sites.
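The exon/intron splitting step in panel A can be illustrated with a short Python sketch. It is not the Galaxy tool itself: it assumes the gene models are in the standard 12-column BED layout (0-based, half-open coordinates, with block sizes and block starts in columns 11 and 12), and the function names and the example record are hypothetical. The 25-bp cutoff is the one stated in the legend.

```python
from typing import List, Tuple

MIN_INTRON_LENGTH = 25  # introns shorter than 25 bp were excluded in the source

# (chrom, start, end, strand), 0-based half-open as in BED
Interval = Tuple[str, int, int, str]


def split_bed12(line: str) -> Tuple[List[Interval], List[Interval]]:
    """Split one tab-delimited BED12 record into exon and intron intervals."""
    fields = line.rstrip("\n").split("\t")
    chrom, chrom_start, strand = fields[0], int(fields[1]), fields[5]
    # blockSizes and blockStarts may carry a trailing comma, hence the "if s".
    sizes = [int(s) for s in fields[10].split(",") if s]
    starts = [int(s) for s in fields[11].split(",") if s]

    exons = [
        (chrom, chrom_start + st, chrom_start + st + sz, strand)
        for st, sz in zip(starts, sizes)
    ]
    # Each intron spans the gap between two consecutive exons.
    introns = [
        (chrom, left[2], right[1], strand)
        for left, right in zip(exons, exons[1:])
    ]
    return exons, introns


def drop_short_introns(
    introns: List[Interval], min_length: int = MIN_INTRON_LENGTH
) -> List[Interval]:
    """Remove introns shorter than min_length bp."""
    return [i for i in introns if i[2] - i[1] >= min_length]


# Hypothetical three-exon gene model, not taken from the Refseq set.
record = "\t".join(
    ["chr1", "100", "500", "geneX", "0", "+", "100", "500", "0",
     "3", "50,40,60,", "0,60,340,"]
)
exons, introns = split_bed12(record)
kept = drop_short_introns(introns)
print(len(exons), "exons;", len(introns), "introns;", len(kept), "kept")
```

In this made-up record the first intron is 10 bp long and is dropped, while the second (240 bp) is retained.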

Fig. S5. Differential gene expression in clbn. (A) Fold-changes obtained by microarray and RNAseq for statistically significant differentially expressed genes (DEGs) in clbn vs. WT are strongly correlated. (B) Venn diagram showing significant overlap between U12-type intron-containing genes and high-confidence DEGs. (C) Analysis of the relative expression of U12 intron-containing genes and their IRCs in clbn reveals widespread aberrations in key steps of mRNA processing. *IRC > 0.10, **IRC > 0.25, ***IRC > 0.50. Red arrows denote up-regulated gene expression; green arrows denote down-regulated gene expression. (D) Cell-cycle repressor genes, including prohibitin (phb), cyclin G1 (ccng1), e2f5, and p53, are up-regulated in clbn. (E) RT-qPCR reveals that the overall up-regulation of p53 transcript levels is due to a strong increase in the Δ113p53 isoform. (F) Use of the alternative transcription start site corresponding to Δ113p53 is clearly visible as an additional read peak in RNAseq data from clbn mutants. See also Datasets S3–S5.
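The asterisk notation used in panel C maps an IRC value to a retention class. A minimal Python sketch of that mapping follows; the function name is hypothetical, and the example IRC values are the ones listed for these genes in Table S2.

```python
def irc_stars(irc: float) -> str:
    """Return the asterisk label for an IRC, using the thresholds in the legend:
    * for IRC > 0.10, ** for IRC > 0.25, *** for IRC > 0.50."""
    if irc > 0.50:
        return "***"
    if irc > 0.25:
        return "**"
    if irc > 0.10:
        return "*"
    return ""


# IRC values as listed in Table S2.
for gene, value in [("rnpc3", 0.68), ("polr3h", 0.33), ("lsm5", 0.12), ("snrpe", 0.04)]:
    print(gene, value, irc_stars(value))
```

The thresholds are strict inequalities, so an IRC of exactly 0.25 would receive one asterisk rather than two.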

Table S1. Read mapping statistics

Sample        50 nt       45 nt       40 nt       35 nt       25 nt       Total GB  Total uniquely mapping start sites
WT 108 hpf    18,199,851  3,393,511   3,484,923   1,209,052   6,330,656   1.40      28,309,779
clbn 108 hpf  16,457,138  3,164,151   3,161,809   1,086,117   5,814,016   1.28      25,718,058

Summary of results from mapping of RNAseq raw tags. Total uniquely mapping reads (last column) were used for calculation of RPuKM values. All other columns contain multimapping tags.

Table S2. U12 intron genes in mRNA processing pathways

                            Microarray   RNAseq
                            fold-change  fold-change         IRC fold-change
Gene symbol  Refseq ID      clbn/WT      clbn/WT      IRC    clbn/WT
polr3h       NM_001002476   NS           NS           0.33   6.34
polr3c       NM_200232      NS           NS           0.11   3.31
polr2k       NM_001199106   NS           NS           0.11   9.10
polr1e       NM_001020704   NS           NS           ND     ND
rnpc3        NM_001039930   −2.28        −3.67        0.68   20.00
snrpe        NM_201004      1.51         1.55         0.04   ND (no reads in WT)
srr35        NM_200533      NS           NS           ND     ND
dcp2         NM_200152      −1.36*       −2.12        ND     ND
edc4         NM_001045085   1.21         1.94         0.36   109.15
exosc1       NM_001145567   −1.58*       −1.48        0.13   6.50
exosc2       NM_001017572   −1.18*       +1.58        0.51   ND (no reads in WT)
exosc5       NM_001045352   +1.93        +6.14        0.27   3.22
exosc9       NM_001006077   −1.56        −1.46        0      NS
lsm5         NM_001100438   NS           NS           0.12   1.05
lsm8         XM_689282      ND           ND           ND     ND
ipo4         XM_679071      NS           ND           ND     ND
ipo9         NM_213539      NS           NS           0.41   ND (no reads in WT)
ipo11        NM_001159661   NS           NS           0.23   ND (no reads in WT)
xpo4         NM_212674      NS           NS           0.51   ND (no reads in WT)
xpo5         XM_001921387   ND           ND           ND     ND
xpo7         NM_001128230   NS           1.57         0.28   ND (no reads in WT)
elys/ahctf1  XM_692635      NS           ND           ND     ND
nup107       NM_001030167   NS           NS           ND     ND
nup155       NM_200156      NS           NS           0.02   1.48
nup160       NM_200662      NS           NS           0.25   3.86
nup205       NM_001003859   NS           NS           ND     ND
nup210       N/A            ND           ND           ND     ND
nup210l      XM_002667560   NS           ND           ND     ND
upf1         NM_213474      NS           NS           0.04   ND (no reads in WT)
rpl4         NM_213107      NS           NS           0.02   3.35

U12 intron-containing genes involved in key steps of mRNA processing, with associated intron retention coefficients and gene-expression fold-changes from microarrays and RNAseq. Fold-changes marked with an asterisk (*) were not ranked as statistically significant (B < 0) in microarray analysis but are significantly differentially expressed (χ2 > 6.64, i.e., P < 0.01) by RNA sequencing. ND, not determined; NS, not significantly different.
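The correspondence between the χ2 cutoff of 6.64 and P < 0.01 can be checked numerically. The sketch below assumes a χ2 test with one degree of freedom, which the legend does not state explicitly; under that assumption the upper-tail probability equals erfc(sqrt(x/2)), which is available in the Python standard library. The function name is hypothetical.

```python
import math


def chi2_pvalue_1df(stat: float) -> float:
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom.

    With 1 df, chi-square is the square of a standard normal variable, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))


# The cutoff quoted in the table legend.
print(chi2_pvalue_1df(6.64))
```

Under the one-degree-of-freedom assumption, the printed value falls just below 0.01, consistent with the threshold given in the legend.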

Dataset S1. Curated set of U12 introns in zebrafish

Chromosomal coordinates, length, and PositionID of 606 U12 introns identified in the zebrafish Zv9 genome assembly.

Dataset S2. Tags mapping to custom cryptic splice junction library

List of hypothetical novel cryptic junctions with RNAseq reads mapping. None of the junctions with significantly different read numbers in clbn and WT correspond to possible bona fide splice junctions.

Dataset S3. High confidence differentially expressed genes

List of 203 genes with statistically significant differential expression according to microarray and RNAseq analyses.

Dataset S4. Complete RNAseq analysis of gene expression in RPuKM

List of all Refseq-annotated genes with detectable expression by RNAseq in WT or clbn.

Dataset S5. All differentially expressed microarray probes (B > 0)

List of all probes with significantly different expression values between WT and clbn (B > 0).

Markmiller et al. www.pnas.org/cgi/content/short/1305536111 12 of 12