Supplementary Figures and Tables

Figure S1. Validation of miR-4423 expression in the human airway via qRT-PCR.

The mature 3p and 5p forms of miR-4423 are detected in human bronchial epithelium

(n=9). The average Ct values were 26.6 for the 3p form and 30.1 for the 5p form.

Expression of RNU44 was measured as a positive control (average Ct value = 24.3).

Figure S2. MiR-4423 overexpression. A) Stable and transient overexpression of a plasmid containing the miR-4423 precursor plus 200 bp of flanking region results in the production of processed 3p and 5p forms. An empty vector was used as a control.

Figure S3. Validation of miR-4423 biogenesis. A) Transfection with a siRNA targeting

Dicer decreases Dicer mRNA levels in H1299 cells stably overexpressing miR-4423

(P=0.0001; n=3). B) Levels of processed miR-4423-5p, miR-4423-3p, miR-10b, miR-21 and miR-26 are significantly decreased in Dicer knocked-down cells compared to control (P=0.0014, P=0.0013, P=0.04, P=0.01 and P=0.019, respectively). Error bars indicate standard error, and P values were determined using Student’s t test.


Figure S4. RNA-Seq profile of the miR-4423 locus using reads that associated with the AGO complex. Using a small RNA sequencing dataset from Hafner et al. (1), one read that associated with TNRC6C and two reads that associated with AGO3 aligned uniquely to the 5p form, while one read that associated with TNRC6A aligned uniquely to the 3p form. Grey background indicates the genomic location of the 5p and 3p forms of miR-4423. Red text indicates a mismatch at that position in the read compared to the genome.

Figure S5. In situ hybridization of U6 and scramble controls. In situ hybridization in tissue sections from the mainstem bronchus shows strong expression of the positive control U6 and no expression of the negative scramble control.


Figure S6. The expression of miR-4423 is highly correlated with that of pri-miR-4423 and WDR63. A) Expression of miR-4423 (3p and 5p), pri-miR-4423 and WDR63 was measured in the human tissue panel (n=24). The expression of pri-miR-4423 and WDR63 is strongly correlated across these tissues (R=0.97). In addition, the expression of miR-4423-3p and -5p is positively correlated with that of WDR63 (R=0.71 and R=0.71, respectively). The red diamonds correspond to tissues in which pri-miR-4423, WDR63 and both mature forms of miR-4423 are highly expressed (i.e. bronchial epithelium, lung, trachea, nasal epithelium, fallopian tube epithelium and ovary). The black diamonds correspond to tissues in which pri-miR-4423 and WDR63 are highly expressed but the mature forms of miR-4423 are not highly expressed (i.e. testis, placenta, kidney and brain). The blue diamonds correspond to the remaining 14 tissues that do not express any of the transcripts. B) Expression of miR-4423 (3p and 5p), pri-miR-4423 and WDR63 was measured at different time points during differentiation of the airway epithelium at an ALI (days 0, 2, 4, 6, 9, 11 and 13). Expression of all three transcripts was highly correlated during this process (WDR63 vs. pri-miR-4423, R=0.97; WDR63 vs. miR-4423-3p, R=0.99; WDR63 vs. miR-4423-5p, R=0.98).

Figure S7. MiR-4423 expression is highly correlated to FOXJ1. A) NHBE cells isolated from three different lung donors were differentiated at an ALI for 13 days.

Expression of the 3p and 5p forms of miR-4423 is first detected at day 6 and increases through day 13. Expression of the ciliated cell markers FOXJ1 and WDR63 is highly correlated with both miR-4423-3p (R=0.92, R=0.94) and miR-4423-5p (R=0.9, R=0.89). B) The expression of miR-4423 (3p and 5p) and that of the airway epithelial cell type markers FOXJ1, MUC5AC, MUC5B, CC10 and ASCL1 were measured during differentiation of airway epithelium at an ALI. Pairwise Spearman correlations between all markers, WDR63 and miR-4423 were clustered, showing that the pattern of miR-4423 expression during differentiation is more similar to the expression pattern of the ciliated markers FOXJ1 and WDR63 than to those of the other cell type markers. Red indicates a more positive correlation coefficient.

Figure S8. MiR-4423 overexpression results in an increase in the number of FOXJ1- expressing cells. Expression of FOXJ1 was assayed via qRT-PCR in NHBE cells overexpressing miR-4423 or control at different time points of differentiation into mucociliary epithelium at an ALI (Days 1,5,7,9,11,13, 15 and 17). Using a linear model that had FOXJ1 expression as the response and time point and treatment as predictors we found that the expression of FOXJ1 is significantly increased in cells overexpressing miR-4423 compared to controls (p=0.025). Expression of FOXJ1 was normalized using both eGFP and GAPDH.


Figure S9. MiR-4423 knockdown results in a modest decrease in the number of cells expressing ciliated cell markers. A) NHBE cells were transduced with either miRZip™ anti-microRNA expression lentivectors to inhibit both mature miR-4423-3p and -5p or a scramble control and were differentiated into mucociliary epithelium at an ALI. miR-4423 knocked-down cells show a modest reduction in FOXJ1 and β-tubulin staining (top) compared to control (bottom). Representative images shown were taken at days 13 (FOXJ1) and 16 (β-tubulin). Arrows point to regions of positive staining. B) Expression of FOXJ1 was assayed via qRT-PCR in miR-4423 knocked-down NHBE cells or control at different time points of differentiation into mucociliary epithelium at an ALI (days 3, 7, 13 and 17). FOXJ1 showed a modest trend toward downregulation.


Figure S10. MiR-4423 expression is associated with lung cancer. A) The expression of miR-4423-3p was measured in matched tumor and adjacent normal tissue derived from squamous cell carcinoma (SCC) (n=15), adenocarcinoma (ADC) from non-smokers (n=10) and ADC from current and former smokers (n=10). The log2 ratio of tumor/adjacent normal tissue was calculated for each matched sample. miR-4423-3p expression decreases by at least 2-fold in 80% of SCC, 70% of ADC from current and former smokers and 70% of ADC from non-smokers. B) Expression of WDR63 (n=10), pri-miR-4423 (n=10), miR-4423-3p (n=10) and miR-4423-5p (n=8) was measured via qRT-PCR in lung tumors and matched adjacent normal tissues. There is a strong positive correlation between WDR63 and pri-miR-4423 (R=0.86), between pri-miR-4423 and miR-4423-3p (R=0.86) and between WDR63 and miR-4423-3p (R=0.78). Similar results were obtained when calculating the correlations between WDR63 and miR-4423-5p (R=0.74) and between pri-miR-4423 and miR-4423-5p (R=0.80). The blue diamonds correspond to the adjacent normal tissues and the red diamonds to the tumors. C) We examined WDR63 expression in a larger set of tumor and matched adjacent normal tissue pairs using RNA-seq data generated by the TCGA (2). Similar to the previous findings, WDR63 was downregulated in 81.4% of SCC (n=43), 71% of ADC from current and former smokers (n=7) and 73% of ADC from non-smokers (n=42).
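The log2 tumor/adjacent normal ratio and the 2-fold threshold used in panel A can be illustrated with a minimal sketch. The function names and the example expression values are hypothetical and are not data from this study; a decrease of at least 2-fold corresponds to a log2 ratio of -1 or lower.

```python
import math

def log2_ratio(tumor: float, normal: float) -> float:
    """log2 of tumor expression over matched adjacent-normal expression."""
    return math.log2(tumor / normal)

def fraction_decreased(pairs, fold: float = 2.0) -> float:
    """Fraction of matched pairs whose expression drops by at least `fold`."""
    threshold = -math.log2(fold)  # a 2-fold decrease is a log2 ratio of -1
    ratios = [log2_ratio(t, n) for t, n in pairs]
    return sum(r <= threshold for r in ratios) / len(ratios)

# Hypothetical (tumor, normal) relative expression values, for illustration only.
example_pairs = [(1.0, 4.0), (2.0, 4.0), (3.0, 4.0), (5.0, 4.0)]
print(fraction_decreased(example_pairs))
```

In this toy input, two of the four pairs reach the 2-fold threshold.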


Figure S11. MiR-4423 expression decreases in squamous metaplasia. The expression of miR-4423-3p and -5p was measured in laser captured normal airway epithelium (n=2), squamous metaplasia (n=1) and SCC (n=2) from resected lung tissue of a single individual with lung cancer. MiR-4423 expression decreases in squamous metaplasia and even further in SCC compared to the normal airway epithelium.


Figure S12. Copy number variation (CNV) for the region of miR-4423 and WDR63 in lung tumors. A) In the TCGA(2), CNVs were measured in ADC and adjacent normal pairs (n=129) and in a larger number of ADC with corresponding RNA-seq (n=196). B)

CNVs were also measured in SCC and adjacent normal tissue (n=133) and in a larger number of SCC with corresponding RNA-seq (n=172). CNVs were measured using the

Affymetrix Genome-Wide Human SNP Array 6.0. A positive segmentation mean indicates amplification while a negative segmentation mean indicates deletion. Expression was measured by Fragments per Kilobase per Million (FPKM) using the Illumina HiSeq 2000. We did not observe a significant CNV in the region overlapping with miR-4423 and WDR63 compared to adjacent normal tissue, or a significant correlation between a CNV in that region and WDR63 expression, in either ADC or SCC tumors.

These results suggest that a CNV in the region of miR-4423 is not responsible for the loss of miR-4423 expression in lung tumors.


Figure S13. Methylation for the region of miR-4423 and WDR63 in lung ADC. In the

TCGA (2), matched methylation and expression was measured in lung ADC

(n=325). Methylation was measured using the Infinium HumanMethylation450 BeadChip

Kit. Ten probes on this platform overlapped with the region of miR-4423 and WDR63.

Expression was measured by Fragments per Kilobase per Million (FPKM) using the Illumina HiSeq 2000. The x-axis indicates the level of methylation at that CpG dinucleotide (Beta value) with zero corresponding to no methylation and one corresponding to complete methylation. The y-axis shows log2 transformed FPKM values for WDR63 after adding a pseudocount of one. Eight of the ten locations exhibited a significant negative correlation between WDR63 expression and methylation (P < 0.01; Pearson), suggesting that loss of WDR63 and miR-4423 is due to increased methylation in that region in a subset of lung ADC.
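The per-probe comparison described above (Beta value against log2(FPKM + 1)) can be sketched as follows. This is a minimal illustration with made-up numbers, not TCGA data, and it computes only the Pearson coefficient; the significance test reported in the caption is not reproduced here.

```python
import math

def log2_fpkm(fpkm_values):
    """log2-transform FPKM values after adding a pseudocount of one."""
    return [math.log2(v + 1.0) for v in fpkm_values]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical Beta values for one probe and WDR63 FPKM values in five samples.
beta = [0.1, 0.3, 0.5, 0.7, 0.9]
fpkm = [15.0, 7.0, 3.0, 1.0, 0.0]
r = pearson_r(beta, log2_fpkm(fpkm))
print(round(r, 3))
```

A coefficient near -1 means that samples with higher methylation at the probe show lower expression.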


Figure S14. Methylation for the region of miR-4423 and WDR63 in lung SCC. In the

TCGA(2), matched methylation and gene expression was measured in lung SCC (n=218).

Methylation was measured using the Infinium HumanMethylation450 BeadChip Kit. Ten probes on this platform overlapped with the region of miR-4423 and WDR63. Expression was measured by Fragments per Kilobase per Million (FPKM) using the Illumina HiSeq 2000. The x-axis indicates the level of methylation at that CpG dinucleotide (Beta value) with zero corresponding to no methylation and one corresponding to complete methylation. The y-axis shows log2 transformed FPKM values for WDR63 after adding a pseudocount of one. None of the ten locations exhibited a significant negative correlation between WDR63 expression and methylation (P > 0.05; Pearson), suggesting that loss of WDR63 and miR-4423 in lung SCC is not due to increased methylation in that region.

Figure S15. Comparison of lung cancer cell lines sensitive and resistant to miR-4423 overexpression in soft agar. We observed that some lung cancer cell lines showed a significant decrease in anchorage independent growth with miR-4423 overexpression

(sensitive) while other lung cancer cell lines did not (resistant) as measured by colony formation in soft agar. In order to better understand the differences in miR-4423 sensitive and resistant cells, we profiled gene expression in two miR-4423 sensitive cell lines

(SW900, Calu-6) and one miR-4423 resistant cell line (H2170). We observed that 2,073 genes were significantly up-regulated and 2,761 genes were significantly down-regulated in both sensitive cell lines compared to the resistant cell line (FDR < 0.01). Using DAVID, we found that genes up-regulated in miR-4423 sensitive cells were enriched in developmental genes important for cell differentiation (including NOTCH1, NOTCH2, and members of the HOXA, HOXB, and HOXC clusters), genes in focal adhesion pathways important for cell-to-cell contact and migration (including ITGA1, ITGA5, ITGA11, ITGA7, ITGAV, ITGB1, ITGB3, PIK3CA, FN1, and ACTN1), and cell cycle genes (including AURKA, AURKB, CDKN2A, CDKN2C, and CCNA2). Genes down-regulated in miR-4423 sensitive cells were enriched for those involved in apoptosis (including BID, NFKB1, TRADD, FAS, CASP3, CASP6, CASP7, CASP8, and CASP10).

Figure S16. miR-4423 overexpression suppresses tumor growth in vivo. Calu-6 and H1299 cells stably expressing miR-4423 or control (5×10^6) were injected subcutaneously into NSG mice (8 mice/group). We observed a modest decrease in tumor size in miR-4423-overexpressing cells when compared to control (P=0.008, P=5.19×10^-9). Values represent tumor volume (mm^3) ± standard error.


Figure S17. MiR-4423 overexpression promotes cellular organization in lung tumors. H&E staining was performed in miR-4423-overexpressing tumors and controls (H1703). MiR-4423-overexpressing tumors exhibit foci of structured cellular organization, such as the presence of clusters and linear alignments. These foci were not observed in the control tumors.


Figure S18. Comparison of miR-4423, miR-449, and miR-34 predicted targets. A) The seed region (underlined) of miR-4423-3p partially overlapped with seeds from other microRNAs involved in airway epithelial differentiation (3, 4) (overlapping bases are highlighted in red). B) Targets for miR-449a, miR-449b-5p, miR-449c-5p, miR-34a-5p, miR-34b-5p, and miR-34c-5p were predicted using the same procedure used to identify miR-4423 targets. The predicted targets for each of these microRNAs significantly overlapped with the predicted targets for miR-4423-3p (p < 0.001; Fisher Exact Test). However, miR-4423-3p was predicted to target additional sets of distinct genes not targeted by any of the microRNAs in the miR-449/miR-34 family. These results suggest that the function of miR-4423 may be partially redundant with microRNAs in the miR-449/miR-34 family, but miR-4423 may also have distinct functions in airway epithelium.

Figure S19. Differential expression of genes in response to miR-4423 overexpression.

A) 1,231 genes were differentially expressed when miR-4423 was transiently overexpressed in H1299 cells. The black bars indicate genes with a predicted miR-4423 target site in their 3’ UTR. B) The distribution of t-statistics for genes with predicted target sites for miR-4423 (red) was significantly more negative compared to the distribution of genes without predicted target sites (black), indicating that genes with predicted target sites were more downregulated than genes without target sites (P = 4.1e-07; KS-test). The t-statistic for each gene was derived from a Student’s t test.
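The comparison in panel B rests on the two-sample Kolmogorov-Smirnov statistic, the largest gap between the empirical cumulative distributions of the two sets of t-statistics. A minimal sketch follows, assuming plain lists of t-statistics as input; the function name and the example values are hypothetical, and the P value calculation is omitted.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b) -> float:
    """Two-sample KS statistic: maximum distance between the two ECDFs."""
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    # The ECDFs only change at observed values, so checking those is enough.
    for value in a_sorted + b_sorted:
        ecdf_a = bisect_right(a_sorted, value) / len(a_sorted)
        ecdf_b = bisect_right(b_sorted, value) / len(b_sorted)
        largest_gap = max(largest_gap, abs(ecdf_a - ecdf_b))
    return largest_gap

# Hypothetical t-statistics: genes with target sites shifted toward negative values.
with_site = [-3.1, -2.4, -1.8, -0.5, 0.2]
without_site = [-1.0, -0.2, 0.4, 1.1, 2.3]
print(ks_statistic(with_site, without_site))
```

A statistic close to 1 indicates almost no overlap between the two distributions, while 0 indicates identical empirical distributions.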

Figure S20. MiR-4423 modulates gene expression changes associated with the differentiation of bronchial epithelial cells at an ALI. Genes upregulated and downregulated with miR-4423 overexpression (left) are concordantly enriched (right) among genes upregulated and downregulated in NHBEs differentiated at an ALI in a microarray dataset from Ross et al. (5) (GSEA; FDR < 0.05). The genes in the heatmap are a union of the leading edge genes from GSEA of the miR-4423 induced and repressed gene sets. Each gene is represented in the same row across heatmaps. The dashed line indicates the timepoint (between days 4-8) where miR-4423 expression becomes detectable in our ALI experiments.


Figure S21. Genes changing in expression upon overexpression of miR-4423 are enriched among genes changing in the opposite direction in lung cancer. Genes differentially expressed upon overexpression of miR-4423 are enriched among genes changing in the opposite direction in (A) lung ADC compared to adjacent normal tissue from two previously published gene expression studies (6, 7) and (B) SCC compared to adjacent normal tissue from two published gene expression studies (6, 8) (FDR < 0.05; GSEA). Unlike the genes down-regulated with miR-4423 overexpression, the genes up-regulated with miR-4423 overexpression were not enriched in genes changing between tumor and adjacent normal in the study from Wachi et al. (8). This may be due to the decreased power to detect differential expression from a smaller sample size compared to the other lung cancer studies. The genes in each heatmap are a union of leading edge genes from all enrichments of that particular dataset.

Supplementary Tables

Table S1. Demographics for individuals recruited for bronchoscopy. RNA from these samples was used for sequencing.

                        Smoking                      Lung Cancer
                        Never         Current        No Cancer        Cancer
n                       3             3              3                3
Sex                     3F            2M, 1F         1M, 2F           2M, 1F
Age                     29.7 (4.9)    34.3 (11.6)    49.0 (7.8)       64.7 (16.3)
Pack years              -             14.8 (13.3)    49.5 (6.5)       75.7 (22.7)
Smoking status          3N            3C             2Fmr, 1C         2Fmr, 1C
Months since quitting   -             -              23 (18)          174 (127)
Histology               -             -              Pne, Sar, Inf    1Sq, 2Ad
Staging                 -             -              -                T4N1M1, T2N0M0, T1N0M0

The mean and standard deviation (in parentheses) are reported.
N: Never smoker; Fmr: Former smoker; C: Current smoker
Pne: organizing pneumonitis; Sar: sarcoidosis; Inf: chronic inflammation due to foreign body material
Sq: squamous cell carcinoma (SCC); Ad: adenocarcinoma (ADC)
Staging: Tumor size (T), Node (N), Metastasis (M)

Table S2. Sequence alignment statistics for sequencing data.

                                         Never Smoker   Smoker      No Cancer   Cancer
Total number of reads                    52502713       66428492    93681388    55447184
Aligned reads                            31288411       34923660    40454554    25998462
Reads aligning to a microRNA precursor   13821764       11751896    8669508     5067733

Table S3: Predicted novel microRNAs by miRDeep.

Location                     miRDeep score   Average RPM
chr1:85599525-85599545       20098.6         216.896
chr15:83424800-83424820      4753.5          27.816
chr6:133137944-133137963     4174.6          108.406
chr19:49994432-49994453      268.3           6.089
chr1:109643159-109643183     139.6           4.046

Table S4. MiR-4423 expression in lung cancer cell lines and undifferentiated primary bronchial epithelial cells.

Sample                                  miR-4423 (CT)   RNU44 (CT)

Human Bronchial Epithelial Cell lines
HBEC1                                   38.47           25.38
HBEC2                                   60              25.71
HBEC13                                  60              24.81
HBEC14                                  60              24.17
NHBE                                    40              19.21

Primary Bronchial Epithelial Cells
8005-010                                38.19           24.86
8006-013                                39.47           24.92
8008-018                                38.01           24.31

Lung Cancer Cell lines
Calu6                                   51.08           24.76
H460                                    53.26           24.72
H23                                     60              25.39
HCC827                                  50.78           24.63
A549                                    48.99           25.49
H1435                                   60              24.86
H1299                                   40              18.25
H1703                                   36.9            23.89
H2170                                   38.42           23.17
RH2                                     36.29           23.23
SW900                                   40              23.83
H520                                    40              21.87
Calu1                                   40              21.57

MiR-4423 was measured by qRT-PCR in immortalized human bronchial epithelial cell lines, cultured primary bronchial epithelial cells, and lung cancer-derived cell lines. MiR-4423 expression is not detected in any of the cells screened (Ct > 35). RNU44 expression was measured as a positive control.

Table S5. Pri-miR-4423 expression in lung cancer cell lines

Pri-miR-4423 was measured by qRT-PCR in lung cancer cell lines. Pri-miR-4423 expression is not detected in any of the cells screened (Ct>35). GAPDH expression was measured as a positive control.

Table S6. Demographics for patients undergoing diagnostic bronchoscopy on whom miR-4423 was measured by qRT-PCR.

Gender   Age   Cancer   Smoking Status   Staging   Diagnosis
M        53    No       F                NA        Organizing pneumonitis
M        59    No       F                NA        Granuloma
F        71    No       N                NA        Sarcoid
F        49    No       F                NA        Normal
M        72    Yes      F                T4N1M1    Adenocarcinoma
M        76    Yes      F                T2N0M0    Squamous cell carcinoma
F        51    Yes      C                T1N0M0    Carcinoid
F        87    Yes      F                IA        Adenocarcinoma
M        81    Yes      F                IIB       Squamous cell carcinoma

Gender: Male (M); Female (F)
N: Never smoker; Fmr: Former smoker; C: Current smoker
Staging: Tumor size (T), Node (N), Metastasis (M)

Table S7. Genes downregulated upon overexpression of miR-4423 are enriched in genes containing putative binding sites for the microRNA in their 3’UTRs.

Sample                              Downregulated   Upregulated
Predicted 3’UTR Binding Site        97              55
No Predicted 3’UTR Binding Site     538             541
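The enrichment summarized in Table S7 can be made concrete with a short calculation on the 2x2 counts, which sum to the 1,231 differentially expressed genes reported in Figure S19. The sketch below computes the odds ratio and a one-sided hypergeometric tail probability (equivalent to a one-sided Fisher exact test). The choice of test is an assumption made for illustration, since the table caption does not state which test was used.

```python
from math import comb

def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

def hypergeom_upper_tail(k: int, successes: int, draws: int, population: int) -> float:
    """P(X >= k) when `draws` items are taken from `population` items,
    of which `successes` carry the feature of interest."""
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, min(successes, draws) + 1)
    )
    return tail / total

# Counts from Table S7.
site_down, site_up = 97, 55
no_site_down, no_site_up = 538, 541

n_genes = site_down + site_up + no_site_down + no_site_up
ratio = odds_ratio(site_down, site_up, no_site_down, no_site_up)
p_value = hypergeom_upper_tail(
    k=site_down,
    successes=site_down + site_up,     # genes with a predicted site
    draws=site_down + no_site_down,    # downregulated genes
    population=n_genes,
)
print(n_genes, round(ratio, 2), p_value)
```

An odds ratio above 1 means that genes with a predicted binding site are over-represented among the downregulated genes.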

SI Materials and Methods

Patient population and sample collection

In this manuscript, we collected airway and lung tissue from 4 distinct cohorts:

For the airway miRNA-seq studies (Figure 1), we recruited never (n=3) and current (n=3) smokers who underwent volunteer fiberoptic bronchoscopy at Boston University Medical

Center, as well as current and former smokers undergoing bronchoscopy in the operating room during resection of a suspicious lung nodule (see Table S1 for demographics).

Patients were classified as having cancer (n=3) or not having lung cancer (n=3) based on the pathology of the resected tissue. Patients without lung cancer were diagnosed with alternative benign diseases of the chest, such as organizing pneumonitis, sarcoidosis and chronic inflammation due to foreign body material (Table S1).

For the airway miR-4423 qRT-PCR studies (Figure 4A), we recruited smokers with lung cancer (n=5) and smokers with benign disease of the chest (n=4) who underwent bronchoscopy for suspected lung cancer (see Table S6 for patient demographics).

For the lung tissue miR-4423 qRT-PCR studies (Figures 4B and Figure S9A,B), snap frozen tumor and matched adjacent normal lung tissue were collected from patients undergoing resection of their primary lung cancer (ADC from current and former smokers (n=10), ADC from non-smokers (n=10) and SCC (n=15)). These samples were collected at the Mayo Clinic in Rochester, Minnesota.

For the qRT-PCR studies in matched normal, premalignant and SCC tissue (Figure S10), lung tissue specimens collected at UCLA Medical Center were prepared for laser capture microdissection by sectioning frozen lung tissues at 7 µm thickness onto thermolabile membrane slides. Frozen sections were then stained with hematoxylin and eosin (H&E) and all cytological areas for laser capture microdissection were pathologically identified. Cells were captured with the Leica AS-LMD system. Captured cells were stored in QIAzol (Qiagen, Valencia, CA) solution at -80 °C prior to RNA extraction using the Qiagen (Valencia, CA) miRNeasy Mini Kit and RNeasy MinElute Cleanup Kit. RNA was then amplified with the NuGEN (San Carlos, CA) Pico Kit.

Small RNA-sequencing and sequencing data analysis

Small RNA-sequencing. For small RNA sequencing library preparation, 300 ng of small molecular weight fraction (<200 bp) RNA from the airway of each individual was pooled within each phenotype (never smoker, current smoker, no cancer and cancer), for a total of 4 pools each consisting of 900 ng RNA. Each pool was size-fractionated using the Ambion flashPAGE fractionator to obtain small RNAs of 10-40 nt. Small RNA sequencing libraries were created using the SOLiD Small RNA Expression Kit (Ambion) starting with 200 ng of the size-fractionated material following the manufacturer’s instructions. Libraries were sequenced on an Applied Biosystems SOLiD System to obtain 36 bp reads.

Sequence alignment and novel miRNA prediction. Following generation of microRNA sequencing data on pooled airway samples from healthy current and never smokers and smokers with and without lung cancer (n=4), the sequence for the adapter used in library preparation was aligned to the 3’ end of each read in color space using the “trimLRPatterns” function from the “Biostrings” package v2.12.10 in Bioconductor v2.4 (9). Increasing numbers of mismatches were allowed depending on how much of the adapter aligned to the 3’ end of the read: no mismatches were allowed if only the first 3 colors of the adapter aligned to the 3’ end of the read, 1 mismatch was allowed if the first 5 colors aligned, 2 mismatches if the first 7 colors aligned, 3 mismatches if the first 9 colors aligned, 4 mismatches if the first 12 colors aligned, 5 mismatches if the first 15 colors aligned, and 6 mismatches if all 18 colors of the adapter aligned to the 3’ end of the read. No trimming was performed if only the first two colors of the adapter aligned to the 3’ end of the read. In addition, the last color representing the transition from the last nucleotide of the biological RNA molecule to the adapter was trimmed off for reads where the adapter was present.
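The mismatch schedule described above can be sketched as a small lookup plus a suffix-matching trimmer. This is a simplified illustration, not the Biostrings `trimLRPatterns` implementation: it compares plain strings rather than handling SOLiD color space in full, it only looks for the adapter prefix at the 3’ end of the read, and it assumes that overlap lengths falling between the listed values (for example 4 or 6 colors) inherit the allowance of the nearest lower value. The function names and the example adapter are hypothetical.

```python
# (minimum overlap in colors, mismatches allowed), longest overlap first.
MISMATCH_SCHEDULE = [(18, 6), (15, 5), (12, 4), (9, 3), (7, 2), (5, 1), (3, 0)]

def allowed_mismatches(overlap: int):
    """Mismatches tolerated for a given adapter overlap; None means no trimming."""
    for min_overlap, mismatches in MISMATCH_SCHEDULE:
        if overlap >= min_overlap:
            return mismatches
    return None  # fewer than 3 colors aligned

def trim_adapter(read: str, adapter: str) -> str:
    """Trim an adapter prefix found at the 3' end of the read, plus one extra
    color for the RNA-to-adapter transition. Longest overlaps are tried first."""
    for overlap in range(min(len(read), len(adapter)), 2, -1):
        suffix = read[-overlap:]
        mismatches = sum(r != a for r, a in zip(suffix, adapter[:overlap]))
        if mismatches <= allowed_mismatches(overlap):
            cut = len(read) - overlap
            return read[:max(cut - 1, 0)]  # also drop the transition color
    return read  # adapter not detected

# Hypothetical 18-color adapter and a read ending in its first five colors.
example_adapter = "123123123123123123"
example_read = "0" * 10 + "12312"
print(trim_adapter(example_read, example_adapter))
```

For the example read, the five trailing colors match the adapter exactly, so they are removed together with the preceding transition color, leaving nine colors.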

After adapter trimming, reads were aligned to the human genome build 19 (hg19) using Bowtie version 0.12.3 (10), allowing up to two mismatches and limiting the number of locations a read can map to in the genome to four or fewer. Reads that aligned with zero mismatches and that were between 19 and 25 nucleotides long were used as input for miRDeep (11) to predict novel microRNA loci. The number of reads that overlap with each known microRNA from miRBase v16 (12) or that overlap with each novel microRNA from miRDeep was counted using BEDTools v2.9.0 (13). For each sample, normalized expression values were derived by dividing each microRNA count by the total number of reads sequenced, in millions (i.e. Reads Per Million reads sequenced; RPM).
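The read filter applied before miRDeep and the RPM normalization can be written out as a minimal sketch. The function names are hypothetical, and the example count is made up; only the length window, the zero-mismatch requirement and the per-million scaling come from the text.

```python
def passes_mirdeep_filter(read_length: int, mismatches: int) -> bool:
    """Keep reads aligned with zero mismatches that are 19 to 25 nt long."""
    return mismatches == 0 and 19 <= read_length <= 25

def reads_per_million(count: int, total_reads: int) -> float:
    """Scale a raw microRNA read count to reads per million reads sequenced."""
    return count * 1_000_000 / total_reads

# Hypothetical example: 50 reads for one microRNA out of 2 million sequenced reads.
print(reads_per_million(50, 2_000_000))
```

Scaling by the number of sequenced reads makes counts comparable across pools that were sequenced to different depths.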

Analysis of AGO complex immunoprecipitation dataset (GSE21918). To determine whether miR-4423 associates with the AGO complex, we analyzed the sequencing data from Hafner et al., in which small RNA associated with the Argonaute complex in HEK293 cells was immunoprecipitated using antibodies to different components of the complex and sequenced (1). Sequences obtained from the material precipitated with antibodies to AGO1 (deposited as SRR048973), AGO2 (SRR048974, SRR048975), AGO3 (SRR048976, SRR048977), AGO4 (SRR048978, SRR048979), TNRC6A (SRR048982, SRR048983), TNRC6B (SRR048984, SRR048985), or TNRC6C (SRR048986, SRR048987) were obtained from the Short Read Archive (SRA) and converted to FASTQ format. The 3’ adapter was trimmed using the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/). Reads between 18 and 25 nucleotides long were selected and aligned to hg19 using Bowtie, allowing up to one mismatch and up to five locations in the genome.

qRT-PCR analysis

To measure the expression of miR-4423-3p and -5p, 10 ng of total RNA was used in a custom Taqman assay (Life Technologies, ordering IDs = CCBI832; CSFAQ5N) as per manufacturer’s protocol and the results were normalized to the expression of RNU44 (Life Technologies). To determine the tissue-specificity of miR-4423-3p and -5p, their expression was measured in the FirstChoice human total RNA survey panel (Ambion), and in bronchial, nasal and fallopian tube epithelium RNA collected at Boston University and UCLA. To measure the expression of pri-miR-4423, 20 ng of total RNA was used in a Taqman assay (Life Technologies) and the results were normalized to GAPDH. To measure the expression of DICER, FOXJ1, MUC5AC, MUC5B, CC10, ASCL1, WDR63 and DNAJB1, total RNA was reverse transcribed using random hexamers (Life Technologies) and Superscript™ II reverse transcriptase (Invitrogen). 20 ng of starting cDNA product was added to SYBR® Green PCR master mix (Life Technologies). For all qRT-PCR reactions, forty cycles of amplification were used and data acquisition was carried out with the StepOne Real Time PCR System (Life Technologies). The data were analyzed using the comparative CT method (14).

qRT-PCR Primer sequences:

WDR63:

For: 5’-GCT GGG AAT AAG CCA CAC AT-3’

Rev: 5’-GAG AGT CCC CCT CTC TGC TT-3’

FOXJ1:

For: 5’-GGA GGG GAC GTA AAT CCC TA-3’

Rev: 5’-CCA AAC TTC CAG CTG CTC TC-3’

MUC5AC:

For: 5’-CTG GGA GTC CAG GTC ATG TT-3’

Rev: 5’-TGT TGG CAA ACT TGC TGA AG-3’

MUC5B:

For: 5’-GAC CTG CAA GAA CGA AGA CC-3’

Rev: 5’-AAA GCA CAC GCA CGT TGT AG-3’

CC10:

For: 5’-GCT CAG CTG AAG AAG CTG GT-3’

Rev: 5’-TGA TGC TTT CTC TGG GCT TT-3’

ASCL1:

For: 5’-GAG CAA CTG GGA CCT GAG TC-3’

Rev: 5’-CTT TTG CAC ACA AGC TGC AT-3’
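The comparative CT method cited in the qRT-PCR analysis can be illustrated with a short sketch. It assumes ideal amplification, that is, a doubling of product at every cycle, which is the usual simplification behind the 2^-ΔΔCT formula; the function names are hypothetical. The example uses the average Ct values reported in Figure S1 (26.6 for miR-4423-3p, 24.3 for RNU44).

```python
def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Expression of the target relative to the reference gene: 2^-(delta Ct)."""
    return 2.0 ** -(ct_target - ct_reference)

def fold_change(ct_target_sample: float, ct_reference_sample: float,
                ct_target_control: float, ct_reference_control: float) -> float:
    """Fold change of a sample versus a control: 2^-(delta delta Ct)."""
    delta_sample = ct_target_sample - ct_reference_sample
    delta_control = ct_target_control - ct_reference_control
    return 2.0 ** -(delta_sample - delta_control)

# miR-4423-3p relative to RNU44, using the average Ct values from Figure S1.
print(round(relative_expression(26.6, 24.3), 3))
```

A higher Ct means later detection and therefore lower abundance, which is why the exponent carries a minus sign.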

In vitro Assays

miR-4423 overexpression. The DNA sequence encoding the miR-4423 precursor and approximately 200 bp of flanking region was amplified by PCR using the following primers: Forward (5’-CGCGGATCCACACAGCTCACACCATCAGG-3’) and Reverse (5’-CGCGGATCCCATGAGACTCTCCCCTGCTC-3’). The amplified fragment was then ligated into the BamHI restriction site of the plasmid pHAGE-CMV-eGFP-W (15). This vector was used to overexpress miR-4423 both transiently and stably (see map of the overexpression vector below).
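Both cloning primers carry a BamHI recognition site (GGATCC) near their 5’ end, which is what allows ligation into the BamHI site of the vector. The sketch below splits each primer into the bases preceding the site, the site itself, and the remaining bases. The function name is hypothetical, and describing the trailing bases as the locus-specific portion is an interpretation rather than something stated in the text.

```python
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence

def split_primer(primer: str, site: str = BAMHI_SITE):
    """Return (bases before the site, site, bases after the site)."""
    start = primer.find(site)
    if start < 0:
        raise ValueError("restriction site not found in primer")
    return primer[:start], site, primer[start + len(site):]

forward_primer = "CGCGGATCCACACAGCTCACACCATCAGG"
reverse_primer = "CGCGGATCCCATGAGACTCTCCCCTGCTC"

for name, primer in (("Forward", forward_primer), ("Reverse", reverse_primer)):
    leading, site, remainder = split_primer(primer)
    print(name, leading, site, remainder, len(remainder))
```

Each primer splits into three leading bases, the six-base BamHI site and a 20-base remainder.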

For the transient transfection, H1299 cells were cultured in RPMI 1640 growth medium (ATCC) and plated at 50% confluence in 60 mm plates 24 h before transfection. Plasmids containing miR-4423 or empty vector as a control were transfected using Lipofectamine 2000 (Invitrogen) as per manufacturer’s protocol. To create stable cell lines expressing miR-4423, 293T cells were transfected with the vectors described above together with 4 expression vectors encoding the packaging proteins Gag-Pol, Rev, Tat, and VSV-G (16). Viral particles in the supernatants were collected and concentrated by ultracentrifugation. Calu-6, SW900, H1703, RH2, H1299, H2170, Skmes-1, H520, Calu-1 and A549 cells were grown in 24-well plates and lentiviral infection was performed when a confluence of 40-50% was reached. To generate NHBEs stably overexpressing miR-4423 or an empty vector as a control, first passage primary human bronchial epithelial cells (MatTek) were cultured on collagen-coated dishes in serum-free BEGM medium (Lonza, Walkersville, MD). Cells were infected with an MOI of 20 at 50-60% confluence. Virus particles were diluted in antibiotic-free BEGM media with 4 µg/ml SureENTRY reagent (Qiagen, Valencia, CA) and applied to cells for 24 hrs. Similarly, miR-4423 knocked-down NHBEs were generated by co-infecting first passage primary cells with both miR-4423-3p and miR-4423-5p miRZip™ anti-microRNA expression lentivectors or a scramble control (Systems Biolabs) (MOI=20 for each lentivector).

Soft agar assay. Calu-6, SW900, H1703, RH2, H1299, H2170, Skmes-1, H520, Calu-1 and A549 cells stably overexpressing miR-4423 or control (10,000 cells/well) were incubated for 8 days in a semisolid agar medium to allow for colony formation using the Fluorimetric CytoSelect 96-well Cell Transformation Assay (Cell Biolabs). Cells were then solubilized, lysed and detected in a fluorescence microplate reader using a 485/520 nm filter set as per manufacturer’s protocol.

Dicer knockdown. 100 pmoles of siRNA targeting Dicer or siRNA negative control (ON-TARGETplus, Dharmacon) were transfected into H1299 cells stably overexpressing miR-4423 using the Lipofectamine 2000 system (Invitrogen) according to manufacturer’s protocol. The media was changed and siRNA transfection was repeated 24 hours later. Cells were harvested 72 hours after the first transfection and total RNA was isolated using the miRNeasy mini kit (Qiagen).

In situ hybridization and Immunohistochemistry

miRNA in situ hybridization on paraffin-embedded sections. We performed in situ hybridization on paraffin-embedded sections of the trachea, mainstem bronchus and second generation bronchus using double-DIG-labeled LNA probes (Exiqon) according to Obernosterer’s protocol (17). Briefly, tissue slides were first dewaxed and rehydrated using xylene and EtOH. Slides were treated with Proteinase K (10 µg/ml) and washed with PBS, followed by acetylation in triethanolamine buffer plus acetic anhydride, and were then permeabilized in PBST (PBS plus 0.1% Triton X-100 in DEPC-treated water). After pre-hybridization at RT, hybridization was carried out at 47 °C overnight in hybridization buffer (50% formamide, 5x SSC, 1x Denhardt’s solution, 500 µg/ml salmon sperm DNA, 5% dextran sulfate) containing the miR-4423 LNA probe (25 µM) at a dilution of 1:100, the U6 positive control probe (25 µM) at a dilution of 1:1000 and a probe with a scrambled sequence (25 µM) at a dilution of 1:100. Then, slides were sequentially washed with 0.2x SSC at 47 °C and with 0.2x SSC and PBS at RT. Slides were then incubated in blocking solution (TTBS, 0.05 M Tris, pH 7.5, 0.15 M NaCl, 0.1% Tween-20, plus 5% sheep serum) and incubated with anti-digoxigenin-AP antibody (1:2500, Roche) for 6 hours at RT. After washing in TTBS, signals were developed using BM purple (Roche).

Immunohistochemistry of ALI cultures. EpiAirway tissues (AIR-100, MatTek) were fixed in formalin overnight at room temperature, embedded in paraffin and sectioned. Sections were deparaffinized in 3 changes of xylene followed by graded alcohol washes to rehydrate. Heat-mediated antigen retrieval was then performed in 0.01 M sodium citrate buffer, pH 6.0, for 45 minutes at 100 °C. Slides were blocked for 1 hr in 10% normal goat serum + 1% BSA in PBS, followed by incubation with primary antibody (anti-FOXJ1, Abcam, ab40869, 1:4000 dilution) diluted in 1% BSA in PBS at 4 °C overnight. Slides were washed followed by incubation in Alexa Fluor conjugated secondary antibody (Life Technologies, 1:400 dilution) for 1 hr at RT. Sections were imaged using an Olympus

FluoView FV1000 confocal microscope. Image analysis and quantification of staining was performed using the Nikon NIS Elements software. β-Tubulin staining was performed using Vectastain Elite ABC kit (Vector Laboratories, Burlingame, CA) according to the manufacturer’s protocol. Briefly, endogenous peroxidase activity was blocked by incubating sections for 30 min in 3% H2O2 followed by blocking for 20 min in diluted normal horse serum. Sections were then incubated in mouse anti-β-Tubulin

(BioGenex, San Ramon, CA) diluted 1:1,500 in 1% BSA in PBS for 1 hr at RT. Slides were washed and incubated in diluted biotinylated secondary antibody for 30 min at RT followed by incubation for 30 min with ABC reagent. Staining was visualized using the

Vector DAB substrate kit. Solution was prepared according to the manufacturer’s protocol and sections were incubated for 5-10 minutes at RT. Slides were washed with dH2O, counterstained with Mayer’s Hematoxylin, and coverslipped. Sections were imaged using a Nikon Eclipse Ti inverted microscope. Two independent tissues per experimental group (GFP control or CMV-MiR-4423) were analyzed for each marker.

E-cadherin antibody staining on miR-4423/control-overexpressing tumors. Paraffin-embedded miR-4423-overexpressing H1703 lung tumors and controls, harvested from the backs of NOD SCID gamma (NSG) mice, were examined for the presence of E-cadherin. Antigen retrieval before immunolabeling was performed with Antigen Unmasking Solution (Vector Laboratories) in a microwave oven. Sections were incubated with primary antibody (Takara M108, dilution 1:250) at 4°C overnight. Antibody binding was detected using ABC reagent (Vectastain kit) with diaminobenzidine as the chromogenic substrate. Sections were counterstained with hematoxylin and photographed using ImagePro software.

Microarray sample processing, data analysis and connection to other datasets.

Microarray sample processing and data analysis. Total RNA was isolated from H1299, Calu-6, SW900 and H2170 cells overexpressing miR-4423 (n=3) or empty vector controls (n=3) using the RNeasy Plus Mini Kit (Qiagen) as per the manufacturer’s instructions. Isolated RNA was processed and hybridized onto Human Gene 1.0 ST arrays (Affymetrix Inc.) according to the manufacturer’s protocol. Transcript-level gene expression estimates were generated using RMA with the gene CDF v11 (18). A t-test followed by a False Discovery Rate (FDR) correction was used to identify genes differentially expressed between empty vector control and treated samples or between cell lines sensitive and resistant to miR-4423 in soft agar. The microarray data from these studies have been deposited in the Gene Expression Omnibus under the accession GSE48798.
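The text does not name the specific FDR procedure. As an illustration only, the sketch below assumes the Benjamini-Hochberg step-up procedure, one common choice, and applies it to made-up per-gene P values; the function name and the example values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original input order.

    Assumption: BH is used here for illustration; the source only says
    "False Discovery Rate (FDR) correction".
    """
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up P values for four hypothetical genes.
pvals = [0.001, 0.04, 0.03, 0.20]
qvals = benjamini_hochberg(pvals)
print(qvals)
```

Genes would then be called differentially expressed when the adjusted value falls below a chosen FDR threshold, which the text does not specify.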

Comparison of miR-4423 overexpression data with other microarray datasets. To determine the potential role of miR-4423 in the differentiation of the airway epithelium and lung carcinogenesis, we compared the gene expression changes induced upon miR-4423 overexpression with other publicly available datasets.

For the study of gene expression during airway epithelial differentiation (5) (GSE5264), CEL files were obtained from GEO and normalized with RMA using the custom Entrez Gene CDF v11.0.1. Linear models were used to examine the relationship between gene expression and differentiation. A mixed-effect linear model was used with gene expression as the response, the average Z score (19) as a fixed effect to control for sample quality, time in days of differentiation as a fixed effect, and cell donor as a random effect. Genes were ranked by the t-statistic of the time variable coefficient. Relationships between this ranked list of genes and genes that change in response to overexpression of miR-4423 were examined using GSEA.

For the study of gene expression differences in cytologically normal airway epithelium associated with lung cancer status (19) (GSE4115), raw CEL files were normalized with RMA using the custom Entrez Gene CDF v11.0.1. A linear model was used with gene expression as the response and average Z score, pack years, age, and cancer status as predictors. Genes were ranked by the t-statistic of the cancer status variable coefficient. Relationships between this ranked list of genes and the set of genes that change in response to overexpression of miR-4423 were examined using GSEA.

For the studies of gene expression differences between tumors and adjacent normal tissue (6-8) (GSE18842, GSE3268 and GSE12236), raw CEL files were obtained from GEO and separately normalized with RMA using the custom Entrez Gene CDF v11.0.1, with the exception of GSE3268, for which the preprocessed data deposited in GEO were used since the CEL files were not available. Genes were ranked by the t-statistic from a paired t-test between tumors and controls from matched adjacent normal tissue. Relationships between this ranked list of genes and the set of genes that change in response to overexpression of miR-4423 were examined using GSEA.
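The ranking step for the tumor versus adjacent normal comparison can be sketched as follows: a paired t-statistic is computed per gene from matched samples, and genes are then sorted by that statistic to give the ranked list passed to GSEA. The gene names and expression values are made up for illustration, and the helper name is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(tumor, normal):
    """Paired t-statistic: mean of the within-pair differences over its standard error."""
    diffs = [t - n for t, n in zip(tumor, normal)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical log2 expression values: gene -> (tumor, matched adjacent normal),
# with one value per patient in the same patient order.
expression = {
    "GENE_A": ([8.0, 9.0, 10.0], [6.0, 6.5, 7.0]),
    "GENE_B": ([5.0, 5.5, 4.0], [7.0, 8.0, 7.5]),
    "GENE_C": ([6.0, 7.0, 6.5], [6.1, 6.8, 6.6]),
}

# Rank genes from most tumor-up to most tumor-down.
t_stats = {gene: paired_t_statistic(t, n) for gene, (t, n) in expression.items()}
ranked = sorted(t_stats, key=t_stats.get, reverse=True)
print(ranked)
```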

Air Liquid Interface Cultures

Airway tissue donated for scientific research was obtained from the National Disease Research Interchange (Philadelphia, PA) following a protocol approved by their IRB. The donors were two 23-year-old non-smoking Caucasian males and a 13-year-old non-smoking Caucasian male. NHBE from these donors were seeded onto collagen-coated tissue culture inserts (Millipore Corp., New Bedford, MA) and cultured under submerged conditions in BEGM (Lonza, Walkerville, MD) until a confluent monolayer was attained. The cultures were then raised to the ALI by removing the culture medium from the apical surface. Thereafter, cultures were fed basolaterally every other day for up to 21 days to attain differentiated morphology. ALI medium (MatTek Corporation, Ashland, MA) contained insulin (5.0 mg/ml), transferrin (5 mg/ml), hydrocortisone (0.5 nM), bovine pituitary extract (26 mg/ml), EGF (0.6 ng/ml), epinephrine (0.5 ng/ml), all-trans retinoic acid (50.0 nM), 100 U/ml penicillin, and 100 mg/ml streptomycin. Culture morphology was evaluated by light microscopy of H&E-stained paraffin sections. Barrier function was evaluated by measurement of transepithelial electrical resistance (TEER). Cultures generated TEER > 300 Ω·cm² by day 13 of ALI conditions. Cultures were fixed in RNALater (Ambion) at various times of ALI culture and stored at 4°C until isolation of RNA. Total RNA was isolated using the miRNeasy mini kit (QIAGEN).

Xenograft Tumor Growth

Pathogen-free NOD.Cg-Prkdcscid Il2rgtm1Wjl/SzJ (strain #005557) mice, commonly known as NOD SCID gamma (NSG) mice, were obtained from The Jackson Laboratory and maintained in the University of California at Los Angeles (UCLA) biocontainment facility. All experimental procedures were performed in accordance with the standards of the Institutional Animal Care and Use Committee under a protocol approved by the Chancellor’s Animal Research Committee (ARC) at UCLA. Control and miR-4423-overexpressing Calu-6 (5 × 10⁶), H1299 (5 × 10⁶), and H1703 (1 × 10⁶) human cancer cells in a 200 µl volume of isotonic saline were injected subcutaneously (s.c.) into the dorsal surface of the right lower quadrant of female mice aged 6 to 12 weeks (7-10 mice/group).

Upon initial tumor palpation, tumor growth was assessed every 2-3 days. Two bisecting diameters of each tumor were measured with calipers, and the volume was calculated using the formula 0.4 × a × b², where a represents the longer diameter and b the shorter perpendicular diameter. All mice in the matched control and experimental groups were euthanized via CO2 asphyxia when the largest tumor reached the maximum allowable tumor volume (~2,000 mm³). Representative s.c. primary tumors and matched lungs were harvested for microscopic and molecular analyses; tissues for H&E staining were fixed in 10% neutral-buffered formalin (NBF) for 24 hr prior to dehydration and paraffin embedding. To compare tumor weights, P values were generated using Student's t test. To compare in vivo tumor growth curves between matched control and experimental groups, P values were calculated for each cell line by comparing two linear mixed-effect models with a likelihood ratio test. The first model included fixed effects for day and treatment with mouse as the random effect. The second model included an additional fixed-effect term for the interaction between day and treatment.
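The tumor volume formula and the model comparison can be sketched as follows. The volume function follows the stated formula directly. The likelihood ratio test assumes that the two nested models differ by a single parameter (the day-by-treatment interaction with two treatment groups and day as a numeric covariate) and that both were fit by maximum likelihood; the log-likelihood values are placeholders and would in practice come from a mixed-model fitting routine. Function names are illustrative.

```python
import math


def tumor_volume(a_mm, b_mm):
    """Tumor volume (mm^3) as 0.4 * a * b^2, with a the longer and b the shorter diameter."""
    return 0.4 * a_mm * b_mm ** 2


def lrt_pvalue_1df(loglik_reduced, loglik_full):
    """Likelihood ratio test for nested models that differ by one parameter.

    The statistic 2 * (loglik_full - loglik_reduced) is compared with a
    chi-square distribution with 1 degree of freedom, whose upper tail
    probability is erfc(sqrt(stat / 2)).
    """
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Caliper measurements for one hypothetical tumor (mm).
volume = tumor_volume(20, 15)
print(f"volume: {volume:.0f} mm^3")

# Placeholder log-likelihoods for the model without and with the
# day x treatment interaction (hypothetical values).
stat, p = lrt_pvalue_1df(loglik_reduced=-100.0, loglik_full=-98.07925)
print(f"LRT statistic: {stat:.4f}, P: {p:.4f}")
```

A small P value from this comparison indicates that growth over time differs between the control and miR-4423 groups, which is the quantity of interest for the growth curves.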

TCGA data analysis

RNA-seq, DNA methylation and CNV data as well as all clinical data for lung squamous cell carcinoma (LUSC), ADC (LUAD) and adjacent normal tissue were downloaded using the TCGA data portal on May 31, 2013. All data were Level 3 (i.e. preprocessed) and included the following folders: UNC__IlluminaHiSeq_RNASeqV2 (RNA-seq), JHU_USC__HumanMethylation450 (DNA methylation) and BI__Genome_Wide_SNP_6 (CNV). FPKM values from RNA-seq were normalized by adding a pseudocount of one and applying a log2 transformation. For the CNV data, the segment overlapping with the WDR63 locus was used for comparisons between tumor and adjacent normal tissue and for comparisons with WDR63 expression. If more than one segment overlapped with WDR63, the segment with the most marks per base was used for further analysis.
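The two preprocessing rules described for the TCGA data can be sketched as follows: the log2(FPKM + 1) transformation, and the choice of a single copy-number segment when several overlap the gene. The coordinates, field names and segment values are hypothetical, and "marks per base" is interpreted here as the number of marks in a segment divided by its length in bases.

```python
import math


def log2_fpkm(fpkm):
    """Add a pseudocount of one, then apply a log2 transformation."""
    return math.log2(fpkm + 1.0)


def pick_segment(segments, gene_start, gene_end):
    """Return the segment overlapping the gene with the most marks per base.

    Assumption: each segment is a dict with 'start', 'end' (inclusive) and
    'num_mark' keys; returns None when no segment overlaps the gene.
    """
    overlapping = [
        s for s in segments
        if s["start"] <= gene_end and s["end"] >= gene_start
    ]
    if not overlapping:
        return None
    return max(
        overlapping,
        key=lambda s: s["num_mark"] / (s["end"] - s["start"] + 1),
    )


# Hypothetical segments on one chromosome and a hypothetical gene interval.
segments = [
    {"start": 100, "end": 1099, "num_mark": 10, "seg_mean": -0.3},
    {"start": 1100, "end": 1299, "num_mark": 8, "seg_mean": 0.1},
    {"start": 5000, "end": 6000, "num_mark": 50, "seg_mean": 0.5},
]
chosen = pick_segment(segments, gene_start=1000, gene_end=1200)
print(chosen)
print(log2_fpkm(3.0))
```

In this example two segments overlap the gene interval, and the second is kept because its mark density is higher.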

References

1. Hafner M, et al. (2010) Transcriptome-wide identification of RNA-binding and microRNA target sites by PAR-CLIP. Cell 141(1):129-141.
2. Deus HF, et al. (2010) Exposing the cancer genome atlas as a SPARQL endpoint. J Biomed Inform 43(6):998-1008.
3. Marcet B, et al. (2011) Control of vertebrate multiciliogenesis by miR-449 through direct repression of the Delta/Notch pathway. Nat Cell Biol 13(6):693-699.
4. Wang L, et al. (2013) miR-34b regulates multiciliogenesis during organ formation in zebrafish. Development 140(13):2755-2764.
5. Ross AJ, Dailey LA, Brighton LE, Devlin RB (2007) Transcriptional profiling of mucociliary differentiation in human airway epithelial cells. Am J Respir Cell Mol Biol 37(2):169-185.
6. Sanchez-Palencia A, et al. (2011) Gene expression profiling reveals novel biomarkers in nonsmall cell lung cancer. Int J Cancer 129(2):355-364.
7. Xi L, et al. (2008) Whole genome exon arrays identify differential expression of alternatively spliced, cancer-related genes in lung cancer. Nucleic Acids Res 36(20):6535-6547.
8. Wachi S, Yoneda K, Wu R (2005) Interactome-transcriptome analysis reveals the high centrality of genes differentially expressed in lung cancer tissues. Bioinformatics 21(23):4205-4208.
9. Gentleman RC, et al. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5(10):R80.
10. Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10(3):R25.
11. Friedlander MR, et al. (2008) Discovering microRNAs from deep sequencing data using miRDeep. Nat Biotechnol 26(4):407-415.
12. Kozomara A, Griffiths-Jones S (2010) miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res 39:D152-157.
13. Quinlan AR, Hall IM (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26(6):841-842.
14. Schmittgen TD, Livak KJ (2008) Analyzing real-time PCR data by the comparative C(T) method. Nat Protoc 3(6):1101-1108.
15. Mostoslavsky G, Fabian AJ, Rooney S, Alt FW, Mulligan RC (2006) Complete correction of murine Artemis immunodeficiency by lentiviral vector-mediated gene transfer. Proc Natl Acad Sci U S A 103(44):16406-16411.
16. Murphy GJ, Mostoslavsky G, Kotton DN, Mulligan RC (2006) Exogenous control of mammalian gene expression via modulation of translational termination. Nat Med 12(9):1093-1099.
17. Obernosterer G, Martinez J, Alenius M (2007) Locked nucleic acid-based in situ detection of microRNAs in mouse tissue sections. Nat Protoc 2(6):1508-1514.
18. Dai M, et al. (2005) Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res 33(20):e175.
19. Spira A, et al. (2007) Airway epithelial gene expression in the diagnostic evaluation of smokers with suspect lung cancer. Nat Med 13(3):361-366.
