The human mitochondrial transcriptome

Tim R. Mercer,1 Shane Neph,2 Marcel E. Dinger,1 Joanna Crawford,1 Martin A. Smith,1 Anne‐Marie J. Shearwood,3 Oliver Rackham,3 Eric Haugen,2 Cameron P. Bracken4, John A. Stamatoyannopoulos,2,5 Aleksandra Filipovska,3,5 and John S. Mattick1,5.

1Institute for Molecular Bioscience, The University of Queensland, Brisbane QLD 4072, Australia

2Department of Genome Sciences and Medicine, University of Washington School of Medicine, Seattle, WA 98195, United States

3Western Australian Institute for Medical Research and Centre for Medical Research, The University of Western Australia, Perth, WA 6000, Australia

4Division of Human Immunology, Centre for Cancer Biology, SA Pathology, Adelaide, SA, 5000

5To whom correspondence should be addressed.

Email: [email protected] Tel: +61 7 3346 2079; Fax: +61 7 3346 2111;

Email: [email protected] Tel: +61 8 9224 0330; Fax: +61 8 9224 0322;

Email: [email protected] Tel: +01 206 685 2672 Fax: +01 206 267 1094

1 SUMMARY

The human mitochondrial genome comprises a distinct genetic system that is transcribed as precursor polycistronic transcripts that are subsequently cleaved to generate individual mRNAs, tRNAs and rRNAs. Here we provide a comprehensive analysis of the human mitochondrial transcriptome. Using directional deep sequencing and parallel analysis of RNA ends we demonstrate wide variation in mitochondrial transcript abundance, precisely resolve transcript processing and maturation events, identify novel transcripts including small RNAs, and observe the enrichment of several nuclear RNAs in mitochondria. Using high‐throughput in vivo DNaseI footprinting we also establish the global profile of DNA‐binding occupancy across the mitochondrial genome at single nucleotide resolution, revealing novel regulatory features at mitochondrial transcription initiation sites and functional insights into disease‐associated variants. Together, this integrated analysis of the mitochondrial transcriptome reveals unexpected complexity in the regulation, expression and processing of mitochondrial RNA, and provides a resource for the future study of mitochondrial function that can be accessed at mitochondria.matticklab.com.

2 INTRODUCTION Following its endosymbiosis from an ‐proteobacterial ancestor, the mitochondrial genome has been streamlined into a small, high copy, bioenergetically specialized genetic system that allows individual mitochondria to respond, by expression, to changes in membrane potential and maintain oxidative phosphorylation (Lane and Martin, 2010). Given this dedicated role, it is not surprising that the mitochondrial genome is regulated and expressed in a unique manner.

Human mitochondria contain a compact circular genome that is 16,569 bp in length (Anderson et al., 1981). Replication and transcription of mitochondrial DNA (mtDNA) is initiated from a small noncoding region, the D‐loop, and is regulated by nuclear‐encoded that are post‐translationally imported into mitochondria. Mitochondrial RNAs are transcribed as long polycistronic precursor transcripts from both strands (historically termed ‘heavy’ and ‘light’) that are processed according to the ‘tRNA punctuation model’ whereby 22 interspersed tRNAs are excised to concomitantly release individual rRNAs and mRNAs (Ojala et al., 1981). The liberated RNAs then undergo maturation that involves polyadenylation of the 3' ends of mRNAs and rRNAs, and specific nucleotide modifications and addition of CCA trinucleotides to the 3' ends of tRNAs (Nagaike et al., 2005). Together, this comprises a unique genetic system that is able to translate the mitochondria‐encoded into 13 protein subunits of the electron transport chain.

Little is known about the fine features of the mitochondrial transcriptome, in particular about the regulation of transcript abundance, sites of RNA processing and modification, and the possible presence of noncoding RNAs. The recent advent of deep sequencing has provided a global profile of the nuclear transcriptome, revealing unforeseen transcriptional complexity that includes prevalent post‐transcriptional processing and abundant noncoding RNAs (Jacquier, 2009). We have taken a similar approach to investigate the mitochondrial transcriptome. Although the atypical features of mitochondrial gene expression require special considerations in sequencing and analysis, the small size of the genome affords a depth of coverage that provides unprecedented

3 opportunities to sample the total RNA population and characterize even rare and transient events.

Here we provide the first comprehensive map of the human mitochondrial transcriptome by near‐exhaustive deep sequencing of long and short RNA fractions from purified mitochondria. Despite their common polycistronic origin, we observe wide variation between individual tRNAs, mRNAs and rRNA amounts, attesting to the importance of post‐transcriptional cleavage and processing mechanisms in the regulation of mitochondrial gene expression. By parallel analysis of RNA ends (PARE) sequencing, we provide a global profile that precisely resolves these cleavage processes, as well as indicating further unexpected non‐canonical cleavage events. During this analysis we were also able to discern the contribution of nuclear RNAs to the mitochondrial transcriptome, accounting for co‐purifying contaminants by a two‐phase sequencing approach.

Finally, we have analyzed in vivo DNaseI protection patterns by massively parallel sequencing to generate a profile of protein‐DNA interactions across the entire mitochondrial genome at single nucleotide resolution. Analysis of these sites of DNA‐ protein interaction, in combination with transcriptional profiling, provides novel regulatory insight into mitochondrial genome dynamics. The integration of these maps reveals unexpected complexity of the regulation, expression and processing of the mitochondrial transcriptome, and comprises an important resource for the future study of mitochondrial function and disease. This resource comprises 10 new datasets that, combined with meta‐analyses of 20 publicly available sets (Table S1), are accessible within the mitochondria‐specific genome browser hosted at mitochondria.matticklab.com.

RESULTS

Mitochondrial gene expression

To capture the distinct features of the mitochondrial transcriptome we first performed RNA sequencing (RNAseq) on purified human 143B cell mitochondria. Mitochondrial preparations from human 143B cells were initially pre‐treated with RNaseA to reduce

4 contaminating RNA, and a clean separation of mitochondria from cytoplasmic and nuclear contaminants was confirmed by immunoblotting and qRT‐PCR assessment of known compartment‐specific proteins and RNAs (Figure S1A‐C). We then employed a strand‐ specific RNA sequencing approach with random hexamer priming, no rRNA or tRNA depletion, and smaller cDNA fragmentation size to provide an expression profile that encompassed all annotated mRNAs, tRNAs and rRNAs (Figure 1). In total, we aligned 1.4 million sequenced reads uniquely to the human mitochondrial genome, finding almost the entire mitochondrial genome to be transcribed (99.9% heavy and 97.6% light strands; Figure S1C). Furthermore, less than 3.8% of the mitochondrial genome sequence was represented singly within the sequenced libraries, indicating the near‐exhaustive depth of this sequencing.

Mitochondrial RNAs are derived from precursor transcripts that traverse almost the entire heavy and light mtDNA strands. These precursor transcripts are subsequently processed into individual mRNAs that exhibit considerable variation in steady‐state expression levels (Figure 2B,C; Table S2A; Figure S1D,E). A similarly wide variation in the abundance of different mitochondrial tRNAs was also observed (Figure 1; Table S1B). Given their common source, the wide variation in steady‐state levels of mature RNAs indicates the importance of post‐transcriptional mechanisms, including those affecting transcript stability, in regulating tRNA and mRNA abundance (Piechota et al., 2006).

The strand‐specific sequencing of the mitochondrial transcriptome also provided an opportunity to detect stable antisense transcripts. We distinguished several highly expressed, stable antisense RNAs against the background of unprocessed polycistronic transcripts (Table S1; Figure 2C). Like their mRNA counterparts, these antisense RNAs exhibit considerable variation in gene expression. A notable example is the antisense transcript associated with the ND5 and ND6 genes, which are organized tail‐to‐tail, where strand‐specific sequencing, confirmed by qRT‐PCR (Figure S1F), showed a highly expressed RNA antisense to the length of the ND5 gene, mirroring the ND5 3’UTR that is antisense to the ND6 gene (Figure 2D).

5 We next examined mitochondrial gene expression across 16 human tissues sequenced as part of the Illumina Body Tissue Atlas (Supplementary Methods). We performed hierarchical clustering to identify tissue‐specific differences (Figure 2E; Table S2A), noting some differences in mitochondrial transcript abundance in samples from human tissue and 143B cells due to the alternative use of polyadenylated or total RNA during library preparation (see below). We found that the variation in mitochondrial gene expression was consistent between tissues, with the analysis segregating two distinct gene clusters, and that mitochondrial transcript abundance is higher in tissues with high‐energy demands, such as heart and muscle. Because RNA libraries are derived from whole tissues, we could also determine the proportion of the tissue mRNA content derived from mitochondria. Since both mitochondrial genome copy number and transcriptional activity contribute to this content, this analysis provides an indirect metric of the demand each tissue places upon the mitochondria (Figure 2F). In the heart, mitochondrial transcripts comprise almost 30% of total mRNA, whereas mitochondria contribute a lower bound of ~5% to the total mRNA of tissues with lower energy demands (adrenal, ovary, thyroid, prostate, testes, lung, lymph and white blood cells).

To determine if tissue‐specific differences in mitochondrial expression are coordinated with changes in nuclear gene expression, we added 1,013 nuclear‐encoded genes implicated in mitochondrial function to our gene expression analysis (Pagliarini et al., 2008) (Table S2C). Hierarchical clustering of nuclear gene expression showed a significant correlation to mitochondrial gene expression (r2 = 0.78, p‐value < 0.01), resolving two distinct gene sets that correspond to the pair of mitochondrial gene clusters (Figure S1G) that segregate according to tissue energy demand, reflecting a close coordination between nuclear and mitochondrial genomes in relation to each tissues energy requirements.

Nuclear‐encoded RNAs in mitochondria

The sequencing of purified mitochondrial RNA returned a surprisingly large number of reads that uniquely aligned to the nuclear genome (Figure 3A). Although it has been

6 shown that nuclear RNAs may be imported into human mitochondria (Entelis et al., 2001), the strong enrichment for genes associated with protein translation (p‐value < 10‐78) suggests that cytosolic ribosomes closely associated with the mitochondrial outer membrane co‐purify with mitochondria and confound efforts to identify bona‐fide imported transcripts (Sylvestre et al., 2003). Therefore, to distinguish RNA within mitochondria from co‐purifying contaminants, we employed a subtractive approach that compared the RNA content of whole mitochondria to that of mitochondrial preparations stripped of their outer membrane (mitoplasts). This two‐phase approach is analogous to that successfully employed to characterize the mitochondrial proteome (Pagliarini et al., 2008) and is based on the observation that RNAs within the mitochondrial matrix are enriched, whereas RNAs that co‐purify with the outer membrane are depleted, during mitoplast preparation.

We performed matched sequencing of RNA isolated from mitoplasts and aligned a total of 4.7 million sequenced reads to the genome, representing a 5.9‐fold enrichment relative to whole mitochondrial preparations (Figure 3A, Figure S2A,B). While the expression of mitochondrial genes was closely correlated between whole mitochondrial and mitoplast libraries (r2 = 0.98, p‐value < 0.001; Figure S2C), there was a selective depletion of nuclear‐ encoded mRNAs and rRNAs, indicating that these transcripts are associated with the mitochondrial outer membrane (Figure 3C). Furthermore, there was no enrichment of cytoplasmic mRNAs encoding mitochondrial proteins in mitoplasts, consistent with their proteins being post‐translationally imported into mitochondria. We confirmed by qRT‐PCR the strong depletion in mitoplasts for 6 mRNAs and 3 miRNAs that showed the greatest enrichment in mitochondrial preparations (Figure 3D,E).

We next compared matched mitochondrial and mitoplast libraries to determine whether other nuclear transcripts are enriched in mitoplasts (Supplementary Methods). Firstly, to validate the two‐phase sequencing approach, we considered three noncoding RNAs (ncRNAs), 5S rRNA, MRP and RNase P, that have been previously shown to be present in the mitochondrial matrix (Wang et al., 2010), finding all transcripts, though lowly

7 expressed, enriched in mitoplasts (Figure S2D). We also investigated whether other nuclear‐encoded ncRNAs, including snoRNAs, scRNAs and novel ncRNAs predicted by Evofold (Pedersen et al., 2006) or RNAz (Washietl et al., 2005), were present within mitoplasts. Although the vast majority of such ncRNAs are either absent or depleted from mitoplasts, several novel ncRNAs were enriched in mitoplasts, although the relative amount of these ncRNAs was a fraction of that observed for mitochondria‐encoded RNAs (Figure S2E‐G).

Lower eukaryotes, such as trypanosomes and yeast, require the import of tRNAs into mitochondria (Entelis et al., 1998; Hancock et al., 1992) and tRNA‐Gln was recently shown to be naturally imported into human mitochondria (Rubio et al., 2008). We compared the expression of nuclear tRNAs between mitochondria and mitoplasts to identify additional candidate tRNAs present within mitochondria (Supplementary Methods). We found several nuclear‐encoded tRNA species to be enriched within mitoplasts relative to mitochondrial preparations, including the previously detected tRNA‐GluUUG, an enrichment that was confirmed by qRT‐PCR (Figure 3F‐G; Figure S2F). Furthermore, we noted a prevalence of sequencing mismatches at known modified nucleotide positions indicating that these tRNAs were present in a mature processed form.

Processing of primary mitochondrial polycistronic transcripts

Next we investigated the processing of polycistronic precursor transcripts into individual functional RNAs (Temperley et al., 2010b). To profile the opening stage of polycistronic processing, the RNase P‐dependent cleavage of tRNA 5’ ends (Rossmanith and Holzmann, 2009), we performed parallel analysis of RNA ends (PARE) (German et al., 2008) on purified human 143B mitochondria to capture the exposed 5’‐monophosphate residue of cleaved transcripts. Despite this methodology being originally developed for identification of miRNA‐directed cleavage sites in mRNAs, sequenced PARE reads exhibit a distinct enrichment at the 5’ end of tRNAs, confirming the ability of this technique to detect polycistronic processing sites (Figure 4A,B). In total, using PARE, we identified 80 high‐ confidence cleavage sites across the mitochondrial transcriptome, of which 51.6%

8 corresponded to 5’ tRNA cleavage events (Table S3A). To provide a complementary profile of 3’ tRNA cleavage events, we ‘rescued’ sequenced reads from our directional sequencing of purified 143B mitochondria that contain non‐templated 3’‐CCA nucleotides added during tRNA maturation. The majority of rescued reads (77.8%) were associated with tRNA 3’ termini (Figure 4C), providing a quantitative indication of tRNA 3’ cleavage frequency, which exhibited high correlation with total tRNA expression (r2 = 0.94, p‐value < 0.001) and 5’ processing (r2 = 0.74, p‐value < 0.001). In combination, these two techniques provided a global profile of tRNA processing from precursor transcripts (Figure 1). We did, however, note a number of cases, such as the adjacent tRNA‐Ala and –Asn loci, where 5’ and 3’ processing do not correlate, suggesting incomplete or aberrant processing of tRNAs and the existence of stable conjoined tRNA transcripts (Figure S3A,B).

Next, we examined the cleavage of mitochondrial mRNAs. We supplemented our analysis with publicly available PARE libraries generated from human Hek, HeLa cells and brain (Karginov et al., 2010; Shin et al., 2010) that, despite being generated from whole cell preparations, provided better coverage of mitochondrial mRNAs as a result of greater sequencing depth and the use of polyA+ RNA to generate these libraries. For example, in PARE libraries from Hek cells, 15.4 million (34% of total) PARE sequenced reads aligned to the mitochondrial genome. We found cleavage occurred at a single dominant 5’ nucleotide in the majority of mRNAs (Figure 4D) and, in some cases, we noted a divergence from the predicted cleavage sites in current gene annotations (Table S4A). Moreover, we did not observe a cleavage site associated with the CO3 gene, suggesting the absence of a 5’ terminal mono‐phosphate residue. In addition to canonical cleavage sites, we also found evidence of intra‐exonic cleavage sites within mRNAs. We focused on events that, similar to canonical sites, are characterised by a distinct single nucleotide peak that is positionally conserved in all three samples (Figure S3C,D). However, these events occur at a lower frequency (161‐fold lower than canonical cleavage events associated with the 5’ ends of mRNAs) indicating that, though present, non‐canonical cleavage events are relatively rare. Similar truncated mRNAs have been detected in mitochondria with aberrant RNA surveillance pathways (Szczesny et al., 2010).

9 A concluding stage in precursor processing involves 3’ polyadenylation of mRNAs and rRNAs. We identified mitochondrial polyadenylation events by rescuing unmapped sequenced reads from our directional sequencing of 143B mitochondria after removing non‐templated 3’ adenine nucleotides and realigning the reads to the mitochondrial genome (Figure S3E). To increase the sensitivity of this approach, we also undertook a complementary analysis of directional read sequenced as part of the Illumina Body Tissue Atlas that, due to greater sequencing depth and polyA priming, allowed us to detect even rare polyadenylation sites. The majority (64.5%) of sites corresponded to the annotated 3’ ends of mRNAs, rRNAs and, surprisingly, some tRNAs (Figure 4E; Figure S3E‐G; Table S4B), with the exception of ND6 mRNA, for which we observed no polyadenylation, as previously reported (Temperley et al., 2003). To identify the 3’ terminus of the ND6 transcript, we used 3’ rapid amplification of cDNA ends (RACE), and found a 33/34 nt untranslated region following the AGG stop codon (Figure 4F) that would completely encompass the stable secondary structure predicted to promote ‐1 frameshifting in mitochondrial ribosomes (Temperley et al., 2010a). In addition to ND6, the variation between RNA polyadenylation frequencies that did not correlate with gene expression (r2 = ‐0.14, p‐value = 0.64) suggests other mitochondrial RNAs are also present in a non‐ polyadenylated form (Slomovic et al., 2005). To identify non‐polyadenylated mRNAs, we compared gene expression between matched polyA+ and polyA‐ RNAseq libraries (Morin et al., 2008). We found the relative abundance of mRNAs varies between polyA+ and polyA‐ libraries suggesting the differential existence of non‐polyadenylated isoforms (Figure S3H,I). For example, some mRNAs, such as CO1, are enriched in the polyA‐ fraction, whereas mRNAs that require polyadenylation to complete an in‐frame stop codon (Ojala et al., 1981), such as ND4L/4, are diminished in the polyA‐ fraction (1.7‐fold, p‐value = 0.060). The differential polyadenylation frequencies also resolved differences between gene expression observed earlier in polyA and total RNA sequencing libraries (see above).

The previous identification of non‐canonical transcriptional units has implied the existence of cleavage events not accounted for by the tRNA punctuation model (Guan et al., 1998;

10 Ojala et al., 1981; Sbisa et al., 1992). At every stage of polycistronic precursor transcript cleavage, our analysis also uncovered non‐canonical processing events. Given canonical processing requires the precursor transcript to locally fold into a tRNA structure, we considered whether non‐canonical events identified within our analysis also associate with novel RNA secondary structures. We surveyed the mitochondrial transcriptome for evidence of evolutionary selection for RNA secondary structures using SISSIz (Gesell and

Washietl, 2008), predicting 173 RNA secondary structures (Z‐score > 2, P95) (Figure 1; Table S4C) of which 85 correspond to tRNAs and 82 to rRNA substructures, which include all previously annotated RNA structures in the mitochondrial genome. We identified 14 new structural clusters within the ND1, ND2, CO3 and ND5 genes, the majority of which were associated with cleavage sites, albeit at lower frequency (Figure S4A‐F).

Small RNAs in mitochondria

A diversity of small RNAs (sRNAs) that regulate a range of cellular processes have been characterized in viral, bacterial and nuclear genomes (Zhang, 2009). To determine whether sRNAs are encoded within the human mitochondrial genome, we sequenced sRNAs (size range 15‐30 nt) isolated from purified 143B mitochondria. We obtained 575,553 sRNA reads that mapped uniquely to the mitochondrial genome (Figure 5A). Of these, only 0.6% were represented as single reads (compared to 14.1% of sRNAs from 143B whole cell preparations being singletons) indicating that our sequencing of mitochondrial sRNAs is almost exhaustive (Mayr and Bartel, 2009). In total, this mitochondrial sRNA population comprised 3.1% of the whole cell sRNA population.

The sRNA sequencing yielded an unexpectedly large number of sRNAs that mapped to the mitochondrial genome. To distinguish sRNAs from potential fragmented intermediates of RNA degradation, we considered only sRNAs that were expressed at greater than 1,048 reads per million, the median level of miRNA expression within 143B cells. This filtering step omitted most sRNAs between 15‐18 nt in size, and revealed two distinct 21 nt and 26 nt sRNA size classes (Figure 5B). We then compared the expression of sRNA sequences between long and small RNA sequencing libraries to determine whether sRNA expression

11 is a function of overlapping gene expression, as might be expected from transitory intermediates of RNA degradation. We found highly expressed sRNA expression was not significantly correlated (r2 = 0.2, p = 0.243) with overlapping gene expression, in contrast to the omitted sRNAs (r2 = 0.61, p = 0.003). Furthermore, we observed a large and dynamic range of expression of the highly expressed mitochondrial sRNAs within whole cell sRNA libraries derived from eight different cell types (Figure S5A‐F). Together, this process annotated 31 novel sRNAs expressed from 17 distinct loci (Table S5A).

When mapped to the mitochondrial genome, we found that the majority (84%) of these abundant sRNAs were derived from tRNA genes (Figure 5A). It was recently shown that nuclear tRNAs are processed by both Dicer‐ and RNase Z‐dependent pathways into a range of small RNA species that are subsequently incorporated into RNA silencing pathways (Haussecker et al., 2010; Lee et al., 2009). Like nuclear tRNAs, we observed two distinct species of sRNAs associated with mitochondrial tRNA genes, one immediately downstream of the 5’ cleavage site and a second less abundant species immediately downstream of the 3’ cleavage site (Figure 5C; Figure S5A‐F). However, unlike nuclear tRNAs, we did not observe a Dicer‐dependent sRNA species cleaved from the 3’ stem loop of nuclear tRNAs (Pagliarini et al., 2008) (Figure S5H,I). Furthermore, we did not observe a significant correlation in the expression of the two mt‐tRNA sRNA species. To investigate whether 5’ associated sRNAs are derived from mature tRNAs, we looked for sequencing mismatches as an indication of host tRNA nucleotide modifications. We found overwhelming enrichment of sequencing mismatches at known modified nucleotide positions, such as the methylated N1‐A9 residue, indicating that the 5’ sRNA species was derived from the tRNA stem (Figure S5G).

Although, during this analysis, we filtered out lowly expressed sRNAs, we expect this omitted fraction to contain additional sRNAs encoded in the mitochondrial genome. For example, we observed lowly expressed sRNAs whose 5’ termini align with the origin of light strand replication (Ol) and 3’ termini map to downstream sites where RNA to DNA transition occurs, likely reflecting primers utilized by POLG during replication initiation

12 (Fuste et al., 2010) (Figure S5J). Therefore, we anticipate the mitochondrial transcriptome includes additional sRNAs other than those annotated within this study.

Identification of DNA‐binding protein occupancy by analysis of in vivo DNaseI protection patterns

Transcription initiation and termination are mediated by a suite of proteins that bind directly to sites within the mitochondrial genome (Scarpulla, 2008). Deep analysis of in vivo DNaseI cleavage sites by massively parallel sequencing enables systematic identification of DNaseI footprints and hence regulatory factor occupancy on genomic DNA (Hesselberth et al., 2009). We performed a DNaseI protection assay, whereby DNaseI‐cleaved fragments are sequenced to sufficient depth to allow identification of sites protected by proteins. We sequenced DNaseI‐cleaved fragments of total DNA from seven human cell lines, following which, given the abundant and exposed nature of mtDNA, we were able to align an average ~71 million reads to the mitochondrial genome to achieve greater than 4,000‐fold coverage (Figure 6A).

We observed considerable variation in DNaseI sensitivity across the genome that included numerous short stretches of protected nucleotides ('footprints') that marked putative sites of protein‐DNA interactions. The DNaseI cleavage patterns were highly reproducible between two biological replicates (r2 = 0.86, p < 0.0001). We identified DNaseI footprints systematically by requiring 8 or more contiguous nucleotides to exhibit a depressed cleavage frequency relative to flanking regions. The contrast between internal to flanking DNaseI cleavage frequency was used to ascribe each footprint a sensitivity score (F‐score). By these criteria, we identified an average of 159 putative footprints across the mitochondria genome of each cell line that collectively encompass ~8.4% of the mitochondrial genome (Table S6A). As expected, we observed the highest density of footprints in the regulatory D‐loop region where a number of footprints corresponded to sites previously identified by traditional Southern‐based DNaseI footprinting assays (Fisher et al., 1992; Ghivizzani et al., 1994), confirming the validity of the footprints gleaned from the deep sequencing data.

13 We next performed hierarchical clustering of footprints according to DNaseI sensitivity (Figure 7A). This approach identified a core set of footprints present in all cell lines, including sites overlapping the origin of light strand replication (Figure S6A), termination sites at the D‐loop, and previously described mitochondrial transcription factor A (TFAM) and mitochondrial transcription termination factor 1 (MTERF1) binding sites (Figure 7B). This approach also revealed several novel cell‐specific sites (Figure 7C; Figure S6B).

To gain insight into the identity of the regulatory factors occupying mitochondrial DNaseI footprints, we performed de novo motif discovery on protected regions, and aligned the results with known transcription factor recognition sequences from public databases (Table S6B). This analysis returned motifs matching the consensus recognition sequences for CREB, STAT3 and T3R transcription factors, all of which have previously been identified within mitochondria, as well as numerous other transcription factors (Cannino et al., 2007) (Figure S6C; Table S6A,B). This analysis also identified several novel motifs that recur prevalently within mitochondrial DNaseI footprints, suggesting the existence of additional unidentified proteins that regulate the mitochondrial genome and its expression (Figure 7C).

Protein‐DNA interactions at single‐nucleotide resolution

In addition to providing a global map of DNaseI footprints, DNaseI sensitivity mapping has the potential to reveal fine scale DNA‐protein interactions within DNaseI footprints at single‐nucleotide resolution (Hesselberth et al., 2009). To verify the reproducibility of the nucleotide‐resolution DNaseI cleavage patterns within identified footprints, we compared two HeLa cell biological replicates, finding 86% of internal footprint patterns to be significantly correlated (mean r2=0.88, p < 0.05). To illustrate the single nucleotide resolution of DNaseI cleavage patterns relative to protein structural features, we considered the binding of MTERF1 to the tRNA‐LeuUUR termination DNA sequence, the structure of which was recently solved by X‐ray crystallography (Yakubovskaya et al., 2010). Aligning the nucleotide‐level DNaseI cleavage profile with this resolved structure showed sharply delineated DNaseI protection corresponding to the MTERF1 DNA

14 recognition region and additional internal low frequency cleavage events corresponding to phosphodiester bonds selectively occluded by the MTERF1 DNA‐protein interface (Figure 7B; Figure S6D).

Transcription termination within the D‐loop

The D‐loop, where mitochondrial replication and transcription initiates, is the major site of transcriptional regulation (Scarpulla, 2008), reflected in the complex DNaseI footprint profiles observed within this region (Figure 6B). These complex profiles include footprints associated with TFAM binding sites (Fisher et al., 1992), light strand transcription initiation, transcription termination and sites of RNA to DNA transitions. In HeLa cells, we integrated the DNaseI cleavage data with transcriptomic maps to provide insight into regulation within the D‐loop. Firstly, using PARE sequencing, we identified reads from HeLa cells that aligned to the light (4.0 x 105 reads) and minor heavy strand (1.4 x 105 reads) promoters. The similarly high number of PARE reads aligning to both the light and heavy strand promoters supports the bidirectional nature of mitochondrial transcription initiation (Fisher et al., 1987) but contrasts with the dominant expression of the heavy strand relative to the light strand as indicated by RNAseq. Given that transcription termination is the key mechanism that determines the ratio of mitochondrial rRNA to mRNA expression (Scarpulla, 2008), we next considered whether termination also modulates light strand expression stoichiometry. We found the largest component of light strand polyadenylation events (35.5% of total) occurs just ~160‐185 nt downstream of the light‐strand promoter, coincident with the abortion of transcription observed in RNAseq (Figure S6E) (Suissa et al., 2009). The strong DNaseI footprints at this site, encompassing several recurrent novel motifs, suggest the involvement of DNA‐binding proteins in this termination event that, like MTERF1, contribute to the ratio of asymmetric polycistronic precursor expression (Figure 6B).

Identification of functional sequence variants within DNaseI footprints

15 The accurate alignment of sequence reads from DNaseI mapping experiments required us to firstly genotype the mitochondrial genome of each cell line and identify sequence variation. This allowed us to consider whether the identified sequence variation occurred within footprints and could potentially interfere with protein‐DNA interactions. For this analysis, we extended the sample set to include DNaseI hypersensitivity mapping data from an additional 8 less deeply sampled ENCODE cell types, thereby increasing our sample size to 15 cell types derived from different individuals (Consortium, 2011). In total we identified 215 mtDNA variants relative to the hg18 reference human sequence (Table S6C), 50 of which occurred within DNaseI footprints. Of these, 4 have previously been associated with disease (Crimi et al., 2003; Hauswirth and Clayton, 1985; Khogali et al., 2001; Lu et al., 2010). Of note, an additional 49 disease‐associated mitochondrial variants that were not present in the cell lines examined also localize within DNaseI footprint regions.

We next analyzed the consequence of sequence variation on protein‐DNA interactions by quantifying DNaseI protection at the variant position in affected cell types and comparing this with cell types containing the reference sequence. By this approach we identified a T16189C sequence variation that lies adjacent to a previously defined control element (Suissa et al., 2009) and associates with significant differences in DNaseI footprint occupancy (Figure 7E). We found the DNaseI cleavage profile over this variant position to be significantly distorted (p < 0.001) in the 3 cell lines containing the C allele compared with all 12 cell lines harboring the T allele (Figure 7D,E). Previously, the T16189C variant has been associated with an increased risk of dilated cardiomyopathy and type 2 diabetes (Khogali et al., 2001; Bhat et al., 2007). These results suggest a mechanistic basis for the contribution of the T16189C variant to human disease, and demonstrate the potential for mitochondrial DNaseI footprinting to simultaneously identify sequence variation and its functional consequences.

Higher‐order mitochondrial genome organisation

Little is known about the higher order structure of the mitochondrial genome in vivo.

16 Periodic trends in DNaseI cleavage patterns sites provide information on higher order DNA structure, such as the 10bp periodicity of minor groove DNaseI cleavages characteristic of DNA apposed to nucleosomes or other surfaces (Noll, 1974). We therefore considered whether periodic trends within DNaseI cleavage profiles might similarly provide insight into the higher order organization of the mitochondrial genome. We firstly identified periodicity of DNaseI cleavage by WOSA (Welch's Overlapped Segment Averaging) estimators. This revealed reproducible peaks at ~10 nt phasing that deviate from random expectations (Figure S7A‐G) (Percival D.B., 1993), suggesting that mitochondrial DNA is tightly apposed to a surface in vivo (Noll, 1974). The existence of additional higher order organization of mtDNA is also suggested by an enrichment for DNase cleavage at ~80 nt periodicity, although the significance of this pattern within the structural model of the mitochondrial nucleoid (Holt et al., 2007) is unknown.

DISCUSSION

The application of DNA and RNA sequencing technologies has transformed our understanding of the nuclear genome and revealed many novel features to a highly complex human transcriptome. In comparison to the nuclear genome, the mitochondrial genome has evolved its own unique solutions to regulate gene expression. The streamlined mitochondrial genome is initially transcribed in its entirety before a variety of post‐transcriptional mechanisms act to liberate individual gene products and regulate their expression. We reasoned that these same approaches would be similarly well suited to profile the mitochondrial transcriptome at single‐nucleotide resolution and at genome‐ wide scale.

Sequencing of the mitochondrial transcriptome revealed wide variation in gene expression which, given their common source, indicates the existence of complex post‐transcriptional regulatory mechanisms. Analysis of the data also identified novel mitochondrial small RNAs, including small RNAs derived from tRNAs, as well as previously unknown antisense RNAs. The existence of these novel RNAs is demonstrated by PARE sequencing, which, in

17 combination with other signatures of RNA processing, resolves the precise sites and frequencies by which individual gene products are fashioned from a common substrate to assemble a complete genetic system. This process is fundamental to the coordination of mitochondrial individual gene expression, and underscores the complexity of the mitochondrial transcriptome. Our analysis also revealed numerous non‐canonical processing sites, including transient or rare un‐ or mis‐processed intermediate events. This technique, in combination with the reference maps provided within this study, provides a powerful tool to investigate the hierarchy of cleaved intermediates and the proteins responsible for these processes (Temperley et al., 2010b).

Our analysis also adds to our growing understanding of the regulatory protein factors that bind to the mitochondrial genome. Recent chIP‐seq studies have revealed that regulatory factors bind at many sites across the nuclear genome, often remotely to gene promoters. The small size of the mitochondrial genome permitted us to conduct a DNaseI protection analysis at unprecedented depth and resolution, and thereby identify regions of the genome protected by proteins. This revealed many novel binding motifs, for which the majority of interacting proteins remain to be discovered. Strong protein footprints are also associated with a major site of transcript termination and cleavage (as shown by RNaseA and PARE data) immediately downstream of the light strand promoter, which likely controls the asymmetric strand expression of the mitochondrial genome.

A broad spectrum of degenerative diseases has been associated with mtDNA mutations. Mitochondrial mutations can result in defective oxidative phosphorylation and energy metabolism, and may affect numerous downstream pathological effects, such as changes in apoptosis and DNA repair (Taylor and Turnbull, 2005). The maps collated within this resource, particularly the protein‐binding profiles, provide a ready context in which to interpret the functional consequence of such genetic variation. Whether mediated at the level of protein‐DNA interactions, post‐transcriptional processing or gene expression, our data will provide a richer framework to understand the mechanisms by which these mutations mediate their phenotypic effects.

18 During our survey of the mitochondrial transcriptome, we were greeted at all levels with unanticipated novelty and complexity. We anticipate the continued interrogation of the datasets presented herein to reveal many previously unknown features of mitochondrial gene expression, function and regulation. As previously for the nuclear transcriptome, we expect that the application of sequencing approaches will transform our understanding of the distinct genetic system contained within human mitochondria.

ACKNOWLEDGMENTS

This work was supported by grants and fellowships from the Australian Research Council, the National Health and Medical Research Council of Australia, the Queensland Government Department of Employment, Economic Development and Innovation, the Australian Stem Cell Centre, and NIH Grants U54HG004592 and U54HG004592‐S1. We thank Rob King from GeneWorks (Adelaide, SA) for advice with RNA sequencing; Andreas Gruber from the Institute for Theoretical Chemistry in Vienna for providing his software for merging and fragmenting alignment files; Robert Thurman, Alex Reynolds and Dave Shechner for help with DNaseI hypersensitivity analysis; and Illumina (Hayward, CA) for providing early‐access to the human ‘Body Map’ RNA sequencing atlas.

REFERENCES

Anderson, S., Bankier, A.T., Barrell, B.G., de Bruijn, M.H., Coulson, A.R., Drouin, J., Eperon, I.C., Nierlich, D.P., Roe, B.A., Sanger, F., et al. (1981). Sequence and organization of the human mitochondrial genome. Nature 290, 457‐465.

Bhat, A., Koul, A., Sharma, S., Rai, E., Bukhari, S.I., Dhar, M.K., and Bamezai, R.N. (2007). The possible role of 10398A and 16189C mtDNA variants in providing susceptibility to T2DM in two North Indian populations: a replicative study. Hum Genet 120, 821‐826.

Cannino, G., Di Liegro, C.M., and Rinaldi, A.M. (2007). Nuclear‐mitochondrial interaction. 7, 359‐366.

Consortium, T.E.P. (2011). A User's Guide to the Encyclopedia of DNA Elements (ENCODE). PLoS Biol 9, e1001046.

Crimi, M., Del Bo, R., Galbiati, S., Sciacco, M., Bordoni, A., Bresolin, N., and Comi, G.P. (2003). Mitochondrial A12308G polymorphism affects clinical features in patients with single mtDNA macrodeletion. Eur J Hum Genet 11, 896‐898.

19 Entelis, N.S., Kieffer, S., Kolesnikova, O.A., Martin, R.P., and Tarassov, I.A. (1998). Structural requirements of tRNALys for its import into yeast mitochondria. Proc Natl Acad Sci U S A 95, 2838‐2843.

Entelis, N.S., Kolesnikova, O.A., Martin, R.P., and Tarassov, I.A. (2001). RNA delivery into mitochondria. Adv Drug Deliv Rev 49, 199‐215.

Fisher, R.P., Lisowsky, T., Parisi, M.A., and Clayton, D.A. (1992). DNA wrapping and bending by a mitochondrial high mobility group‐like transcriptional activator protein. J Biol Chem 267, 3358‐3367.

Fisher, R.P., Topper, J.N., and Clayton, D.A. (1987). Promoter selection in human mitochondria involves binding of a transcription factor to orientation‐independent upstream regulatory elements. Cell 50, 247‐258.

Fuste, J.M., Wanrooij, S., Jemt, E., Granycome, C.E., Cluett, T.J., Shi, Y., Atanassova, N., Holt, I.J., Gustafsson, C.M., and Falkenberg, M. (2010). Mitochondrial RNA polymerase is needed for activation of the origin of light‐strand DNA replication. Mol Cell 37, 67‐78.

German, M.A., Pillay, M., Jeong, D.H., Hetawal, A., Luo, S., Janardhanan, P., Kannan, V., Rymarquis, L.A., Nobuta, K., German, R., et al. (2008). Global identification of microRNA‐target RNA pairs by parallel analysis of RNA ends. Nat Biotechnol 26, 941‐946.

Gesell, T., and Washietl, S. (2008). Dinucleotide controlled null models for comparative RNA gene prediction. BMC Bioinformatics 9, 248.

Ghivizzani, S.C., Madsen, C.S., Nelen, M.R., Ammini, C.V., and Hauswirth, W.W. (1994). In organello footprint analysis of human mitochondrial DNA: human mitochondrial transcription factor A interactions at the origin of replication. Mol Cell Biol 14, 7717‐7730.

Guan, M.X., Enriquez, J.A., Fischel‐Ghodsian, N., Puranam, R.S., Lin, C.P., Maw, M.A., and Attardi, G. (1998). The deafness‐associated mitochondrial DNA mutation at position 7445, which affects tRNASer(UCN) precursor processing, has long‐range effects on NADH dehydrogenase subunit ND6 gene expression. Mol Cell Biol 18, 5868‐5879.

Hancock, K., LeBlanc, A.J., Donze, D., and Hajduk, S.L. (1992). Identification of nuclear encoded precursor tRNAs within the mitochondrion of Trypanosoma brucei. J Biol Chem 267, 23963‐23971.

Haussecker, D., Huang, Y., Lau, A., Parameswaran, P., Fire, A.Z., and Kay, M.A. (2010). Human tRNA‐derived small RNAs in the global regulation of RNA silencing. RNA 16, 673‐695.

Hauswirth, W.W., and Clayton, D.A. (1985). Length heterogeneity of a conserved displacement‐loop sequence in human mitochondrial DNA. Nucleic Acids Res 13, 8093‐8104.

Hesselberth, J.R., Chen, X., Zhang, Z., Sabo, P.J., Sandstrom, R., Reynolds, A.P., Thurman, R.E., Neph, S., Kuehn, M.S., Noble, W.S., et al. (2009). Global mapping of protein‐DNA interactions in vivo by digital genomic footprinting. Nat Methods 6, 283‐289.

Holt, I.J., He, J., Mao, C.C., Boyd‐Kirkup, J.D., Martinsson, P., Sembongi, H., Reyes, A., and Spelbrink, J.N. (2007). Mammalian mitochondrial nucleoids: organizing an independently minded genome. Mitochondrion 7, 311‐321.

Jacquier, A. (2009). The complex eukaryotic transcriptome: unexpected pervasive transcription and novel small RNAs. Nat Rev Genet 10, 833‐844.

20 Karginov, F.V., Cheloufi, S., Chong, M.M., Stark, A., Smith, A.D., and Hannon, G.J. (2010). Diverse endonucleolytic cleavage sites in the mammalian transcriptome depend upon microRNAs, Drosha, and additional nucleases. Mol Cell 38, 781‐788.

Khogali, S.S., Mayosi, B.M., Beattie, J.M., McKenna, W.J., Watkins, H., and Poulton, J. (2001). A common mitochondrial DNA variant associated with susceptibility to dilated cardiomyopathy in two different populations. Lancet 357, 1265‐1267.

Lane, N., and Martin, W. (2010). The energetics of genome complexity. Nature 467, 929‐934.

Lee, Y.S., Shibata, Y., Malhotra, A., and Dutta, A. (2009). A novel class of small RNAs: tRNA‐derived RNA fragments (tRFs). Genes Dev 23, 2639‐2649.

Lu, J., Qian, Y., Li, Z., Yang, A., Zhu, Y., Li, R., Yang, L., Tang, X., Chen, B., Ding, Y., et al. (2010). Mitochondrial haplotypes may modulate the phenotypic manifestation of the deafness‐associated 12S rRNA 1555A>G mutation. Mitochondrion 10, 69‐81.

Mayr, C., and Bartel, D.P. (2009). Widespread shortening of 3'UTRs by alternative cleavage and polyadenylation activates oncogenes in cancer cells. Cell 138, 673‐684.

Morin, R., Bainbridge, M., Fejes, A., Hirst, M., Krzywinski, M., Pugh, T., McDonald, H., Varhol, R., Jones, S., and Marra, M. (2008). Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short‐read sequencing. Biotechniques 45, 81‐94.

Nagaike, T., Suzuki, T., Katoh, T., and Ueda, T. (2005). Human mitochondrial mRNAs are stabilized with polyadenylation regulated by mitochondria‐specific poly(A) polymerase and polynucleotide phosphorylase. J Biol Chem 280, 19721‐19727.

Noll, M. (1974). Internal structure of the chromatin subunit. Nucleic Acids Res 1, 1573‐1578.

Ojala, D., Montoya, J., and Attardi, G. (1981). tRNA punctuation model of RNA processing in human mitochondria. Nature 290, 470‐474.

Pagliarini, D.J., Calvo, S.E., Chang, B., Sheth, S.A., Vafai, S.B., Ong, S.E., Walford, G.A., Sugiana, C., Boneh, A., Chen, W.K., et al. (2008). A mitochondrial protein compendium elucidates complex I disease biology. Cell 134, 112‐123.

Pedersen, J.S., Bejerano, G., Siepel, A., Rosenbloom, K., Lindblad‐Toh, K., Lander, E.S., Kent, J., Miller, W., and Haussler, D. (2006). Identification and classification of conserved RNA secondary structures in the . PLoS Comput Biol 2, e33.

Percival D.B., A.T.W. (1993). Spectral Analysis for Physical Applications: Multitaper and Conventional Univariate Techniques. (Cambridge, Cambridge University Press).

Piechota, J., Tomecki, R., Gewartowski, K., Szczesny, R., Dmochowska, A., Kudla, M., Dybczynska, L., Stepien, P.P., and Bartnik, E. (2006). Differential stability of mitochondrial mRNA in HeLa cells. Acta Biochim Pol 53, 157‐168.

Rossmanith, W., and Holzmann, J. (2009). Processing mitochondrial (t)RNAs: new enzyme, old job. Cell Cycle 8, 1650‐1653.

Rubio, M.A., Rinehart, J.J., Krett, B., Duvezin‐Caubet, S., Reichert, A.S., Soll, D., and Alfonzo, J.D. (2008). Mammalian mitochondria have the innate ability to import tRNAs by a mechanism distinct from protein import. Proc Natl Acad Sci U S A 105, 9186‐9191.

21 Sbisa, E., Tullo, A., Nardelli, M., Tanzariello, F., and Saccone, C. (1992). Transcription mapping of the Ori L region reveals novel precursors of mature RNA species and antisense RNAs in rat mitochondrial genome. FEBS Lett 296, 311‐316.

Scarpulla, R.C. (2008). Transcriptional paradigms in mammalian mitochondrial biogenesis and function. Physiol Rev 88, 611‐638.

Shin, C., Nam, J.W., Farh, K.K., Chiang, H.R., Shkumatava, A., and Bartel, D.P. (2010). Expanding the microRNA targeting code: functional sites with centered pairing. Mol Cell 38, 789‐802.

Slomovic, S., Laufer, D., Geiger, D., and Schuster, G. (2005). Polyadenylation and degradation of human mitochondrial RNA: the prokaryotic past leaves its mark. Mol Cell Biol 25, 6427‐6435.

Suissa, S., Wang, Z., Poole, J., Wittkopp, S., Feder, J., Shutt, T.E., Wallace, D.C., Shadel, G.S., and Mishmar, D. (2009). Ancient mtDNA genetic variants modulate mtDNA transcription and replication. PLoS Genet 5, e1000474.

Sylvestre, J., Vialette, S., Corral Debrinski, M., and Jacq, C. (2003). Long mRNAs coding for yeast mitochondrial proteins of prokaryotic origin preferentially localize to the vicinity of mitochondria. Genome Biol 4, R44.

Szczesny, R.J., Borowski, L.S., Brzezniak, L.K., Dmochowska, A., Gewartowski, K., Bartnik, E., and Stepien, P.P. (2010). Human mitochondrial RNA turnover caught in flagranti: involvement of hSuv3p helicase in RNA surveillance. Nucleic Acids Res 38, 279‐298.

Taylor, R.W., and Turnbull, D.M. (2005). Mitochondrial DNA mutations in human disease. Nat Rev Genet 6, 389‐402.

Temperley, R., Richter, R., Dennerlein, S., Lightowlers, R.N., and Chrzanowska‐Lightowlers, Z.M. (2010a). Hungry codons promote frameshifting in human mitochondrial ribosomes. Science 327, 301.

Temperley, R.J., Seneca, S.H., Tonska, K., Bartnik, E., Bindoff, L.A., Lightowlers, R.N., and Chrzanowska‐ Lightowlers, Z.M. (2003). Investigation of a pathogenic mtDNA microdeletion reveals a translation‐ dependent deadenylation decay pathway in human mitochondria. Hum Mol Genet 12, 2341‐2348.

Temperley, R.J., Wydro, M., Lightowlers, R.N., and Chrzanowska‐Lightowlers, Z.M. (2010b). Human mitochondrial mRNAs‐‐like members of all families, similar but different. Biochim Biophys Acta 1797, 1081‐ 1085.

Wang, G., Chen, H.W., Oktay, Y., Zhang, J., Allen, E.L., Smith, G.M., Fan, K.C., Hong, J.S., French, S.W., McCaffery, J.M., et al. (2010). PNPASE regulates RNA import into mitochondria. Cell 142, 456‐467.

Washietl, S., Hofacker, I.L., Lukasser, M., Huttenhofer, A., and Stadler, P.F. (2005). Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome. Nat Biotechnol 23, 1383‐1390.

Yakubovskaya, E., Mejia, E., Byrnes, J., Hambardjieva, E., and Garcia‐Diaz, M. (2010). Helix unwinding and base flipping enable human MTERF1 to terminate mitochondrial transcription. Cell 141, 982‐993.

Zhang, C. (2009). Novel functions for small RNA molecules. Curr Opin Mol Ther 11, 641‐651.

22 FIGURE LEGENDS

Figure 1 Map of the human mitochondrial genome indicating the following features (from centre with outer tracks corresponding to heavy strand and inner track corresponding to light strand); DNase sensitivity profile (central black track, opacity indicates strength according to central key), RNA secondary structures predicted by SISSIz (green track), annotated gene features (blue/red gradient indicates fraction of total mitochondrial gene expression according to central key), RNA sequencing (black histogram on grey background, scale 0‐900000[log10]), 5’ RNA cleavage sites determined by PARE (black histogram on pink background, scale 0‐8000), 3’ cleavage sites determined by non‐ template polyadenylation RNAseq reads (black histogram on purple background, scale 0‐ 8000) and small RNA profile (black histogram on blue background, scale 0‐2500).

Figure 2. (A) Mitochondrial gene expression. Proportion of RNAseq reads aligning to three primary polycistronic transcripts transcribed for the heavy major (HSP; light blue), heavy minor (HSP; dark blue) and light strand promoters (LSP; grey). (B) Proportion of RNAseq reads mapping to rRNA, tRNA, mRNA and intergenic or antisense regions of the mitochondrial genome. (C) Mitochondrial gene expression (blue) determined by RNAseq in human 143B mitochondria (mitoplast) preparations (error bars indicate 95%CI). Antisense transcription (green) to each gene is also shown with notable antisense transcription to ND5 indicated (asterisk). (D) Genome browser illustration of heavy and light strand showing ND5 and ND6 mRNAs (blue) and complementary antisense regions. Black histogram indicates RNAseq read distribution across browser view, including extended region of high transcription antisense to ND5 mRNA, PARE (red) and poly‐A (blue) histogram indicate transcript 5’ and 3’ ends, respectively. (E) Hierarchical clustering of mitochondrial gene expression across 16 human tissues. (Pearson’s correlation [r2] indicated. Rpbm; sequenced reads per base per million) (F) Histogram indicates the proportion of total polyA+ RNA mapping uniquely and exactly to the mitochondrial genome across 16 human tissues.

23 Figure 3. (A,B) Relative enrichment of nuclear‐encoded RNAs in mitochondria or mitoplast preparations. Proportion of sequenced reads aligning to nuclear (red) and mitochondrial (blue) genomes in libraries derived from mitochondria (A) and mitoplast (B) preparations. Comparison between libraries indicates the depletion of co‐purifying contaminant transcripts in mitoplast preparations. (C) Box whisker plot indicates enrichment for mitochondrial encoded (blue) mRNAs, rRNA and tRNA in mitoplast relative to whole mitochondrial RNA preparations. While the majority of nuclear‐encoded (red) mRNAs and rRNAs are depleted in mitoplast preparations, nuclear tRNAs are enriched in mitoplasts. (D,E) qRT‐PCR confirmation for mitoplast depletion of mRNAs (D) and miRNAs (E) for which we observed the highest expression in whole mitochondrial preparations. (F) Scatter‐plot indicating the normalised expression for nuclear (red) and mitochondrial

(blue) tRNAs, showing the enrichment of several nuclear tRNAs, including tRNA‐LeuUAA (p‐ value<0.001) and tRNA‐GlnUUG (p‐value=0.003) in mitoplasts. (G) qRT‐PCR shows the depletion of nuclear tRNA‐AlaAGC and tRNA‐LysUUU in mitoplasts, and enrichment of tRNA‐

GluUUG (p‐value=0.025) and presence of tRNA‐LeuUAA within mitoplasts. 12S rRNA included as positive control (error bars indicate SEM).

Figure 4. (A) Mitochondrial position of 5’ cleavage sites. Cleavage sites determined by PARE with total (green) and polyA+ (blue) human RNA and 3’ cleavage sites determined by RNAseq reads containing non‐template –CCA (purple) and polyA additions (orange) are indicated. (B‐E). Frequency distribution of (B) PARE reads from total RNA across window centred on annotated tRNA 5’ end, (C) non‐template –CCA sites across window centred on annotated tRNA 3’ end. (D) Frequency distribution of PARE reads from poly‐A+ RNA across window centred on annotated mRNA 5’ end, (E) non‐template poly‐A sites across window centred on annotated mRNA 3’ end. (F) Annotation of ND6 3’UTR by 3’ RACE. The 3’UTR sequence with stop codon (black) and ‐1 shifted codon (red) is indicated (top panel). Gel electrophoresis of product retrieved from ND6 3’RACE shown in lower panel.

Figure 5. (A) Mitochondrial positions of small RNAs (red). Internal pie‐graph indicates majority of highly expressed small RNAs primarily associated with tRNAs (dark blue). (B)

24 Size distribution for small RNAs derived from human 143B whole cells (green), mitochondria (blue) and highly expressed (>1,048 reads; red). (C) Small RNA frequency distribution associated with mitochondria tRNA genes indicates two distinct species immediately downstream of 5’ and 3’ processing sites.

Figure 6. (A) DNaseI footprinting across the mitochondrial genome provides global profile of protein‐DNA interactions at single nucleotide resolution. Cleavage tracks indicate normalised percentile DNaseI cleavage frequency relative to the mtDNA (0‐75%). Annotated genomic features including mRNA (light blue), tRNA (dark blue) and rRNA (green) with D‐loop and MTERF1 binding site are indicated. (B) Detailed view of D‐loop region. Top panel histogram showing PARE (red) and polyA (blue) read frequency distribution, indicating transcription initiation and termination events. Previously identified conserved elements (CS; green bars) and replication primer site (blue bars; retrieved from www.mitomap.org) and novel motif (#59; grey bar) also indicated. DNaseI cleavage profile indicates the normalised percentile DNaseI cleavage frequency (0‐25%).

Figure 7. (A) Hierarchical cluster diagram indicating the sensitivity of top 100 indentified DNaseI footprints. Footprints corresponding to TFAM and MTERF1 biding sites are indicated. (B) DNaseI footprint at mTERF binding site. Cleavage events associated with transcription termination are indicated by PARE (blue histogram) and non‐template polyadenylation RNAseq reads (red histogram). DNaseI cleavage frequency in Hela cells indicated by black histogram and frequency in other cell lines indicated by density plot. Underlined green nucleotides indicate MTERF1 binding. (C) Commonly recurring novel sequence motif #59 indicated (top panel) and example of novel cell‐specific footprint (bottom panel) (D) Association of T16189C SNP variant with distorted footprint. An extended footprint (blue bar) is present in 3 cell lines with C variant (green) but absent in cell lines with T variant (blue). (E) Histogram indicates significant increase in DNaseI cleavage frequency in cell lines containing variant 16189T allele as compared to 16189C allele (error bars indicate SEM).

25 Figure Click here to download Figure: Figure1.pdf

Pro Thr heavy strand

CytoB Phe

12SrRNA

Glu Val

ND6

16SrRNA

ND5 light strand Leu

Human Mitochondria Genome

0% 0% ND1 small RNA 3’ cleavage 5’ cleavage DNase

Leu cleavage Ile Ser fraction of mt RNAseq Gln

His 75% expression (log) 50% Met predicted RNA secondary structure

ND2

ND4

Trp Ala Asn Cys Tyr

ND4L Arg

ND3 CO1 Gly

CO3 Ser Asp

CO2

Lys

ATPase6

ATPase8 Click heretodownloadFigure:Figure2.pdf Figure E C A

4

0.29 expression (rpmk x10 ) 20.0 40.0 0.0 1.5 2.0 0.5 1.0 r 2 0.85 1.00 9 r 2 12S 11 adipose 16S ND1 brain 80 colon ND2 80 lymph CO1 prostate CO2 testes ATPase8/6 heart CO3 antisense sense breast ND3 HSP (minor) HSP (major) muscle ND4L/4 LSP liver ND5 * kidney ND6 D lung CYTB

whiteblood mtDNA: 1x10 1x10 poly-A PARE

adrenal poly-A PARE thyroid B 6 5 ovary ND6 ND5 CO1 ND3 ND4L/4 CYTB CO3 CO2 ND1 ND2 ATPase8/6 RNAseq RNAseq 8,000 log 0.0 13 10 rpkmx10 8 1 5.0 78 F 3 10,000 mitochondrial contribution to cellular polyA+ RNA 0.30 0.35 0.00 0.05 0.10 0.15 0.20 0.25 antisense intergenic/ heart mRNA tRNA rRNA

colon antisense transcription muscle

kidney ND5 adipose breast 12000 liver brain testes thyroid

adrenal heavy strand prostate light strand lymph ND6

ovary 14,000 whiteblood lung Figure Click here to download Figure: Figure3.pdf

mitochondrial 500 3 preparations mitochondria A D mitoplasts E

14% Genome: nuclear 2 150 86% mt 100 1 mitoplast fold enrichment B preparations 50 fold enrichment (relative to total RNA) (relative to total RNA) 18% 0 0 T1 18S R CO1 12S B2M NUC

82% EEF2 miR-16 HP S100A6 GAPDH miR-103 GNB2L1 miR-146a 10 mt nuclear mt mp C F 4 G 8 genome genome 10 0.20 0.4 4 15 800

) 6 2 , tpm)

4 y 3 c 10 LeuUAA 0.15 0.3 3 600 2 10

0 2 10 0.10 0.2 2 400 -2 GlnUUG

-4 5 1 mitochondria -6 10 fold enrichment 0.05 0.1 1 200 fold enrichment (log tRNA mitoplast/miitochondria -8 mitoplast (read frequen Lys nuclear tRNA 0 UUU -10 10 0 0 1 2 3 4 0.00 0.0 0 0

10 10 10 10 10 (relative to whole cell preparation) Ala Lys Gln Leu 12S mitochondria (read frequency, tpm) AGC UUU UUG UAA tRNA tRNA rRNA rRNA mRNA mRNA Click heretodownloadFigure:Figure4.pdf Figure F A bp 200 300 400 75

ladder

ND6 Human Mitochondria 3’ RACE 3’-CCA additions PARE (polyA+RNA) 3’-polyA+ additions PARE (totalRNA) rRNAs mRNAs tRNAs UAG stopcodon T A AGG stopcodon GG TT A T G T G A TT A GG 3' UTR A G T A GGG TT A GG D B A

T PARE (poly A) read

G PARE (total) read 6 3 A frequency (x10 ) frequency (x10 ) G 1.2 1.4 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 cleavage T GG -15 -15 sites mRNA 5’termini(nt) distance relativeto R tRNA 5’termini(nt) distance relativeto +1 +1 +15 +15 C E poly-A nontemplate RNAseq -CCA nontemplate RNAseq read frequency (x104) read frequency (x105) 2.0 4.0 6.0 8.0 0.0 1.2 1.4 0.0 0.2 0.4 0.6 0.8 1.0 -15 -15 mRNA 3’termini(nt) tRNA 3’termini(nt) distance relativeto distance relativeto +1 +1 +15 +15 +30nt 3’ end

tRNA tRNA (100 deciles) 5’ end -30nt

1.5 1.0 0.5 0.0 3.5 3.0 2.5 2.0 raw count) raw ; 10 (x

4 small RNA frequency RNA small C 30 28 26 24 size (nt) 22 20 18 16 143B whole cell 143B mitochondria (total) 143B mitochdondria (>1,048 reads)

0.25 0.20 0.15 0.10 0.05 0.00 proportion of sRNA total sRNA of proportion B tRNAs mRNAs rRNAs small RNAs

12%

86% small RNAs proportion of A

Click here to download Figure: Figure5.pdf Figure: to download here Click Figure Figure Click here to download Figure: Figure6.pdf

A.

D-loop

DNase I cleavage profile

mTERF1 binding 0% site (from center) tRNAs 1. BJ 2. Hela mRNAs 3. hES rRNAs 4. K562 5. Skmc

noncoding read frequency 6. Skncsh

percentile normalised 7. Th1 75%

B. LSP 8x104 8x106

poly-A PARE

CS1 #59 CS2 Rep.primer CS3 25% Hela 0% Skmc BJ hES K562 Th1 Sknshra chrM: 200 250 300 350 400 DNase I normalised percentile 0% 25% Figure Click here to download Figure: Figure7.pdf

DNase I normalised percentile 0% 25% A mean DNase I normalised B C Novel motif #59 percentile per nt 33,000 2 Z = 88.9 0.0 0.1 0.2 0.3 0.4 PARE 250 1

BJ Hela K562 hES Skmc Th1 bits poly A+ T 100% 0 CAGAAGAATCTAT 1 2 3 4 5 6 7 8 DNase I A hES Hela Hela BJ Th1 TFAM hES Skmc binding K562 K562 Bjtert site Skmc Th1 Sknshra TGTTAAGATGGCAGAGCCCGGTAATCGCATAA CCAATACCAAACGCCCCTCTTCGTCT 6435 footprint 6460 chrM:3,230 MTERF1 DNA binding site 3,261

D footprint E Bjtert 0 600 Caco2 *** Gm12878 mTERF1 H1es Hgf binding Hrcepic 400 site Huvec Jurkat 16189T K562 Mcf7 Nhek 200 frequency (rpm)

Saec DNase cleavage Sknshra Hre Panc1 Hepg2 0

16189C CACATCAAACCCCCCTCCCCCATGCTTAC T C DNase I normalised percentile 0% 25% (n=13) (n=3) Inventory of Supplemental Information

Supplementary Inventory:

6 Supplemental Figures:

Figure S1 (related to Figure 2) shows experimental controls indicating purity of mitochondria preparations, including qRT‐PCR validation, and expands on Figure 2B with hierarchal‐clustering of nuclear‐gene expression associated with mitochondrial function.

Figure S2 (related to Figure 3) demonstrates the correlation between RNA libraries prepared from mitochondria and mitoplast preparations and also provides broad profile of ncRNAs enriched within mitochondria/mitoplast preparations.

Figure S3 (related to Figure 4) shows supporting evidence and illustrated examples, noted in the main text, of aberrant tRNA processing events and non‐polyadenylated RNA expression.

Figure S4 (related to Figure 4) shows supporting evidence and illustrated examples, noted in the main text, of novel RNA secondary structures predicted within mitochondria transcriptome that associate with cleavage events.

Figure S5 (related to Figure 5) shows illustrated examples of mitochdrial small RNA loci associated with tRNAs and provides evidence for their biogenesis. Additional examples of small RNA loci, noted in the main text, at the light strand replication origin are also shown.

Figure S6 (related to Figure 7) provides examples of DNase footprints noted in main text, over‐ represented sequence motifs identified within DNase I footprints and illustration of the DNase I cleaved residues in relation to bound MTERF1 structure.

Figure S7 (related to Figure 7) displays white‐noise and WOSA plots indicating periodic trends within DNase I cleavage profiles across seven analysed cell‐lines.

6 Supplemental Tables:

Table S1 (related to Figure 1) provides summary of sequencing libraries (both provided within this library and publicly available).

Table S2 (related to Figure 2) indicates expression of annotated mitochondrial mRNA, rRNA and tRNA and expression of mitochondria‐ and nuclear‐encoded genes across 16 human tissues.

Table S3 (related to Figure 4) indicates positions of 5’ and 3’ RNA processing resolved by RNA sequencing.

Table S4 (related to Figure 4) compares RNA cleavage sites with current gene annotations and indicates sites of predicted RNA secondary structures.

Table S5 (related to Figure 5) indicates small RNAs within the mitochondrial genome.

Table S6 (related to Figure 7) indicates position of DNase I footprints and SNPs as determined from DNase I sequencing.

Supplemental Experimental Procedures

Includes all detailed experimental procedures employed during the study that are omitted from main text due to space constraints.

Supplemental References

Includes references cited within supplemental information. Supplemental Information (compiled Word file)

SUPPLEMENTAL FIGURES

Figure S1, Related to Figure 2, Mitochondrial purification. (A) Western Blot using antibodies to mitochondrial‐ localised protein (‐MnSOD) and nuclear localised proteins (‐PSP1) performed on cytoplasm and mitochondrial lysate indicates relative purity of mitochondrial preparations (B) Ratio of mitochondrial RNAs (CO1 mRNA and 12S rRNA) to nuclear (NEAT1) and cytoplasmic (GAPDH, 18sRNA) RNA expression as determined by qRT‐PCR in mitochondria preparations relative to whole cell preparation indicates relative purity of mitochondrial preparations (error bars indicate SEM) (D) Gene expression profiles for mitochondrial genes (ND5, ND6, CYTB and 12S rRNA) as determined by RNA sequencing (blue with left scale, 95%CI indicated) or qRT‐PCR (red with left scale, error bars indicate SEM). (E) qRT‐PCR shows expression of transcripts antisense (green) relative to sense ND5, CO1 and 12S rRNA genes (blue, error bars indicate SEM) (F) Nuclear gene expression associated with mitochondrial function. Hierarchical clustering detail identifies a subset of nuclear‐encoded genes (black) associated with mitochondrial biogenesis and function (Pagliarini et al., 2008) that exhibit correlated expression profile with mitochondrial genes (green).

1

Figure S2, Related to Figure 3, Validation of mitochondria/mitoplast preparation purity (A) qRT‐PCR indicates ratio of mitochondrial RNAs (CO1 mRNA and 12S rRNA) to nuclear (NEAT1) and cytoplasmic (GAPDH, 18sRNA) RNA expression in mitochondria (blue) and mitoplast (green) preparations relative to whole cell preparation, thereby indicating relative purity (error bars indicate SEM). (B) Cumulative frequency with which transcribed nucleotides occur in mitochondria and mitoplast sequenced libraries. Less than 1.5% of transcribed nucleotides are singly represented in sequenced mitoplast libraries. (C) Scatter‐plot showing the relative normalised expression of mitochondrial genes between whole mitochondria and mitoplast preparations. The close correlation (r2 = 0.98, p < 0.01) suggests the expression dynamics are preserved between the two preparations. (D) Relative enrichment of nuclear‐encoded RNA transcripts 5S rRNA, MRP and RNase P in mitoplast relative to mitochondria preparations. (E‐ G) Presence of novel ncRNA within mitochondria. Scatter‐plots show the relative expression of known ncRNAs (E), Evofold predicted ncRNAs (F) and RNAz predicted RNAs (G) between mitoplasts and mitochondria. The majority of ncRNAs are depleted in mitochondria indicating they represent co‐purifying contaminants.

2

Figure S3, Related to Figure 4, (A,B) PARE and 3’‐CCA non‐template reads indicate aberrant tRNA 3’ and 5’ processing. (A) The comparison of relative 5’ (red) and 3’ (blue) processing of tRNA determined by PARE and non‐ template 3’ CCA additions indicates wide variation between these processes and suggest the existence of partially processed tRNA transcripts (asterisk). (B) Frequency distribution of PARE reads (top panel, red) and non‐template 3’ CCA additions (bottom panel, blue) across tRNA‐Asn and ‐Ala genes (blue boxes) suggests existence of partially processed intermediate transcripts. (C,D) PARE sequencing reveal existence of non‐canonical intra‐exonic cleavage events. (C) Relative proportion of intra‐exonic PARE reads from Hek and Hela cell that align with intra‐ exonic Brain PARE reads indicate the positional conservation of non‐canonical cleavage sites. (D) Frequency distribution of PARE reads within ND4 mRNA (blue) illustrates an example of the positional conservation of non‐ canonical cleavage. (E) Frequency distribution of non‐template poly‐adenylation RNAseq reads derived from human 143b cells across window centered on mRNA 3’ end. (F,G) Frequency distribution of non‐template poly‐adenylated RNAseq reads from mixed human tissues (F) or human 143b cells (G) centered on tRNA 3’ end. (H) Histogram indicates differential gene polyadenylation frequency as determined by RNAseq. (I) Relative expression of mitochondrial genes in Hela polyA+ (green) and Hela polyA‐ (blue) RNAseq libraries indicates differential expression of non‐polyadenylated isoforms.

3

Figure S4, Related to Figure 4, (A) Box‐whisker plot indicating the frequency of cleavage sites (as determined by PARE sequencing, Hek cells) corresponding to rRNA (red), tRNA (blue) and novel (orange) secondary structures. (B‐ D) Genome browser view showing frequency distribution of PARE reads (black histogram) associated with RNA secondary structure (grey bar) in CO1 (B), ND2 (C) and CO3 (D) genes. (E,F) Novel predicted RNA secondary structures gene show Watson‐Crick base interactions (blue bars), co‐varying bases (red bases) and conserved interactions (red lines).

4

Figure S5, Related to Figure 5, (A‐F) small RNAs associated with tRNA‐SerAGY (A‐C) and tRNA‐LeuUUR (D‐F). (A,D) Frequency distribution of small RNAs (blue shades) associated with 5’ end of tRNA (green). (B,E) Relative expression (reads per million) of tRNA‐associated small RNAs in eight different human cell lines showing differential expression. (C,F) Relative expression of small RNA isoforms (light, medium and dark blue shades) shows differential isoform specific expression in eight different human cell lines. (G; Above) Schematic diagram of mitochondria tRNA‐Trp mature structure indicates modified nucleotides (methylated adenine and non‐template –CCA nucleotide additions [red]). (G;Below) Sequenced small RNA reads that include sequencing errors at methylated adenine (red) indicate small RNAs are derived from mature tRNA (blue sequence). (H) Frequency distribution of small RNA 5’ ends indicates differences between nuclear (red) and mitochondrial (blue) tRNA loci that includes an absence of dicer‐ dependent (Haussecker et al., 2010) small RNA class in mitochondria. (I) Small RNA associated with mitochondrial tRNA loci occur in all organisms considered, including D.melanogastor. We considered D.melanogastor mutant small RNA libraries to determine the contribution of microRNA biogenesis proteins to mt‐tRNA small RNA biogenesis. The Box‐whisker plot shows no fold‐depletion of small RNAs mapping to mitochondrial tRNA loci in mutant D.melanogastor libraries (Czech et al., 2009), showing no contribution of dcr‐1 and dcr‐2 to biogenesis of mt‐tRNA associated small RNAs. (J) Frequency distribution indicating sRNA alignments at the origin of light strand replication and sites of RNA to DNA transition.

5

Figure S6, Related to Figure 7, Examples of DNaseI footprints. (A) DNaseI footprint at origin of light replication show protection at transcription initiation site. DNaseI cleavage profile indicates the normalised percentile DNAse I cleavage frequency (0‐25%). (B) DNaseI cleavage profile at novel cell‐specific footprint. (C) De novo identification of motifs within DNaseI footprints. Examples of de novo motifs (top panel) with significant homology (q‐score indicated) to transcription factors (bottom panel) involved in regulating mitochondrial gene expression (T3R, ER, STAT3 and CREB) or nuclear gene expression associated with mitochondrial function (MEF2 and NF1) (D) mTERF1 bound to termination DNA sequence (grey) with residues subject to DNaseI cleavage indicated (blue) (Yakubovskaya et al., 2010). (E) Prevalent light strand promoter (LSP) initiated transcription abortion in D‐loop. Genome browser view of D‐loop regions showing PARE read frequency distribution (top red panels), RNAseq (black panel) and polyA (blue panel) tracks indicate transcription initiation and termination events. Previously identified regulatory elements (retrieved from www.mitomap.org) are indicated (green bars).

6

Figure S7, Related to Figure 7, Periodicity within DNAseI cleavage profiles across mitochondrial genome of analysed cells (A‐G) White noise test (left) across phased nucleotide periods indicates deviation from random for each cell type analysed. Significance indicated by paired lines. WOSA plots (right) indicate periodic enrichments within DNaseI cleavage profiles at ~10nt and ~80nt.

7 SUPPLEMENTAL TABLE LEGENDS

Table S1, Relating to Figure 1, (Sheet A) Summary of deep‐sequencing datasets provided within study or analysed from publicly available sources.

Table S2, Relating to Figure 2, (Sheet A) Expression of annotated mRNA, tRNA and rRNA in 143B cells whole mitochondria and mitoplast preparations determined by RNA sequencing (reads per million per kb), and expression levels relative to total mitochondrial gene expression, and ratio of sense/antisense transcription relative to each gene indicated. (Sheet B) Relative expression of mitochondrial genes in 16 human tissues (indicated in reads per million per kb). (Sheet C) Gene expression of nuclear‐encoded genes involved in mitochondria biogenesis and function in 16 human tissues.

Table S3. Relating to Figure 4, (Sheet A) Transcript processing sites determined by PARE sequencing in human 143B cells. (Sheet B) Sites of non‐template 3’CCA additions determined by RNAseq sequencing in human 143B cells (mitoplasts). Genome coordinates (hg18) for identified nucleotide sites are indicated with raw RNAseq read frequency.

Table S4, Relating to Figure 4, (Sheet A) Position of mRNA 5’ ends previously annotated (from www.mitomap.org) and determined by PARE sequencing of RNA from Hek cells, Hela cells and human whole brain polyA+ RNA. (nucleotide corresponding to the local PARE read distribution peak and frequency is indicated). (Sheet B) Position of mRNA 3’ polyadenylation sites previously annotated (from www.mitomap.org) and as determined by RNAseq from human tissue and mitoplasts isolated from 143B cells. (nucleotide corresponding to the local PARE read distribution peak and frequency is indicated). (Sheet C) Predicted RNA secondary structures in the mitochondrial transcriptome. The regions of mitochondrial genome predicted to form secondary structures (Z‐score > 2, P95) and annotated transcripts of overlapping regions are indicated.

Table S5. Relating to Figure 5, (Sheet A) Small RNAs encoded within the mitochondrial genome. The position of 32 unique small RNA sequences expressed from 17 loci within the mitochondrial genome, their frequency and unique sequence from the 17 distinct loci are indicated.

Table S6, Relating to Figure 7, (Sheet A) Mitochondrial genome coordinates for identified putative DNaseI footprints are provided. Footprints containing significant homology to TRANSFAC motifs are shown, otherwise a novel identifier (UW) is assigned. F‐score for each motif is indicated. (Sheet B) De novo identification of nucleotide motifs within footprints with accompanying Z score statistic. Best motif matches within TRANSFAC and Jaspar are shown. (Sheet C) SNPs identified within the sequenced mitochondrial genome of 15 different cell lines. The position and variant nucleotide present within each cell line is indicated.

8 SUPPLEMENTAL EXPERIMENTAL PROCEDURES

Cell culture

143B osteosarcoma cells were cultured at 37˚C under humidified 95% air/5% CO2 in Dulbecco's modified Eagle's medium (DMEM, Invitrogen) containing glucose (4.5 g.l‐1), 1 mM pyruvate, 2 mM glutamine, penicillin (100 U.ml‐1), streptomycin sulfate (100 µg.ml‐1) and 10 % fetal bovine serum (FBS).

Mitochondrial Isolation

Mitochondria were prepared from 2 x 107 143B cells grown overnight in 15 cm2 dishes. Cells in PBS were sedimented (150 g for 5 min at 4 °C), resuspended in 4 ml of ice‐cold 10 mM NaCl, 1.5 mM MgCl2 and 10 mM Tris‐ HCl, pH 7.5. Cells were allowed to swell for 5 min on ice and briefly homogenised with a 7 ml glass homogenizer. The sucrose concentration was then adjusted to 250 mM by adding 2 M sucrose in 10 mM Tris‐HCl and 1 mM

EDTA, pH 7.6 (T10E20 buffer). The nuclei from this suspension were sedimented (1,300 g for 3 min at 4 °C), and the centrifugation step was repeated for the supernatant once more. Mitochondria were sedimented from the supernatant by centrifugation (15,000 g for 15 min at 4 °C). The mitochondrial suspension was layered on a discontinuous sucrose gradient (1.0 and 1.7 M) in T10E20 buffer and centrifuged at 70,000 g for 40 min at 4˚C. The mitochondrial fraction was recovered from the interface between the two sucrose cushions, diluted with 400 µl of

250 mM sucrose in T10E20 buffer and washed once in the same solution. The final mitochondrial pellet was resuspended in 100 µl of 250 mM sucrose in T10E20 buffer and protein concentration as determined by the bicinchoninic acid (BCA) assay using bovine serum albumin (BSA) as a standard.

RNase A treatment of purified mitochondria and mitoplasts

Mitochondria or mitoplast preparations were treated with RNAse A to remove contaminating cytoplasmic RNA.

The mitochondrial pellet was washed once with 250 mM sucrose in T10E20 buffer and resuspended in the same buffer. Mitochondria were treated with 100 µg.ml‐1 RNase A for 30 min on ice. The efficacy of the RNase A treatment was compared to a control without RNase A treatment where both samples were spiked with in vitro transcribed EYFP RNA (125 ng) before the RNase A treatment. Quantitative RT‐PCR analysis showed complete loss of the EYFP RNA upon RNase A treatment.

Immunoblotting

Immunoblotting using the nuclear paraspecle protein 1 (PSP1) and the mitochondrial manganese superoxide dismutase protein (MnSOD) as markers confirmed the separation of mitochondria from nuclear contaminants (Figure S1A). Specific proteins were detected using a mouse MnSOD monoclonal antibody (diluted 1:2000, Transduction Laboratories) and a mouse PSP1 monoclonal antibody (diluted 1:5000, gift from Dr Archa Fox) in 5% skim milk powder in phosphate‐buffered saline (PBS). Immunoblots of the mitochondrial fraction against the antibodies showed separation of the mitochondria from the nuclei.

9 RNA harvesting from purified mitochondria

Total RNA was harvested from isolated mitochondria using the miRNeasy kit according to manufacturers instructions (Qiagen). RNA purity and integrity was confirmed by BioAnalyser (Agilent, CA). To provide an additional indication of mitochondria purity we used qRT‐PCR to show enrichment of mitochondrial RNAs (CO1 mRNA and 12S rRNA) relative to nuclear (NEAT1) and cytoplasmic genes (GAPDH; Figure S1B).

Quantitative RT‐PCR

The transcript abundance of mitochondrial, cytoplasmic and nuclear genes was measured on RNA isolated from whole cells and mitochondria or mitoplasts from 143B cells using the miRNeasy RNA extraction kit (Qiagen). cDNA was prepared by reverse transcription of RNA using random hexamers and ThermoScript reverse transcriptase (Invitrogen) and used as a template in the subsequent PCR that was performed using a Corbett Rotorgene 3000 using Platinum UDG SYBR Green Mastermix (Invitrogen).

RNA sequencing

Deep sequencing of the isolated mitochondrial and mitoplast RNA was performed by GeneWorks (Adelaide, Australia) on the Illumina GAII platform (Illumina, CA) according to the Illumina Directional RNA‐Seq protocol. Notably, to capture tRNA and rRNA transcripts that are usually omitted from standard RNA sequencing approaches we employed random hexamer primers for cDNA library generation, and omitted depletion steps to preserve rRNA expression and size selection steps to preserve tRNA expression.

Human tissue RNA libraries were sequenced as part of the Body Map initiative and were provided by Illumina. For tissue specific libraries, polyA+ RNA from 16 human tissue sources (Ambion) was prepared using the Illumina mRNA‐Seq kit with random priming and subject to single‐end 75nt sequencing on the Illumina HiSeq 2000 Platform. For mixed‐tissue library, total RNA 16 human tissues were combined in equal amounts and polyA+ RNA was prepared from the mixture using the Illumina Directional RNA‐Seq protocol and subjected to single‐end 100nt sequencing on HiSeq 2000 Platform. Human tissue RNA libraries are available for download from: http://www.broadinstitute.org/igvdata/BodyMap/IlluminaHiSeq2000_BodySites/

Mapping sequenced reads to human genome

Sequenced reads were initially trimmed of 3’adaptor sequence (5'‐ATCTCGTATGCCGTCTTCTGCTTG‐3') prior to mapping to reference human genome (hg18). For sequenced reads from mitochondria or Illumina Human Body Map, sequenced reads were initially aligned uniquely and exactly to the human genome (hg18) with ZOOM (Lin et al., 2008). This afforded the accurate comparison of nuclear and mitochondrial genome contributions to sequenced reads. When solely considering mitochondrial gene expression, reads were aligned to the mitochondrial genome with up to 2 mismatches, permitting sequenced reads to overlap SNPs or modified nucleotides. Read mapping were converted into .bed and .wig format for visualisation on the UCSC genome browser (genome.ucsc.edu).

10 Annotations employed within this study were sourced as follows: Mitochondrial gene and control element annotations were retrieved from Mitomap (www.mitomap.org); Nuclear gene annotations were retrieved from the RefSeq table within the UCSC genome browser (genome.ucsc.edu); mitochondria associated genes were obtained from Mitocarta (http://www.broadinstitute.org/pubs/MitoCarta/index.html); Nuclear tRNA sequences were retrieved from the Genomic tRNA sequence database (lowelab.ucsc.edu/GtRNAdb). We uniquely aligned reads to a combined nuclear and mitochondrial tRNA sequence file, before multiple tRNA sequences were collapsed to provide a single sum gene expression for each tRNA species. One mismatch at modified nucleotides was permitted during mapping to account for the propensity of sequencing errors at modified nucleotides within tRNA. analysis of cop‐purifying contaminants was performed using FatiGO, Babelomics 4 suite (Medina et al., 2010) requiring an adjusted p‐value < 0.05 (two tailed, fisher exact test).

We employed CuffDiff (Trapnell et al., 2010) to calculate the tissue‐specific expression of mitochondrial genes, as annotated according to mitomap (www.mitomap.org), from resultant SAM files. Gene tracking files were then subject to hierarchical clustering of tissue‐specific expression using Cluster3 (de Hoon et al., 2004) and visualised by TreeView (Saldanha, 2004) with average‐linkage Pearson’s correlation indicated. Pearson’s correlation and ascribed significance between nuclear and mitochondria tissue order was calculated using Prism 5 Statistical package (GraphPad, La Jolla CA).

PARE sequencing analysis

RNA was extracted from 143B mitochondria using the miRNeasy kit (Qiagen) and PARE libraries prepared as described previously (German et al. 2009). RNA was polyadenylated with E.coli poly(A) polymerase, ethanol precipitated and cytoplasmic rRNA was depleted using the RiboMinus kit (Invitrogen). RNA adaptor (5’‐ GUUCAGAGUUCUACAGUCCGAC‐3’) was ligated using T4 RNA ligase (Ambion) and RNA was extracted with phenol / chloroform, ethanol precipitated and purified using an Oligotex kit (Qiagen). RNA was then reverse transcribed (SuperscriptII, Invitrogen) using the primer (5’‐CGAGCACAGAATTAATACGACT(18)V‐3’) and amplified by PCR (Phusion DNA Polymerase, Finnzymes) using the primers (5’‐GTTCAGAGTTCTACAGTCCGAC‐3’ and 5’‐ CGAGCACAGAATTAATACGAC‐3’). PCR conditions were 5 cycles of 98 °C for 20 s, 60 °C for 30 s and 72 °C for 3 min, followed by a 7 min incubation at 72˚C. Products were gel purified, cleaved with MmeI (New England Biolabs) and dephosphorylated (FastAP, Fermentas). Samples were run on a 12% polyacrylamide gel and a 42 nt band excised. DNA was eluted from the gel overnight with 0.3 M NaCl, filtered through a Millex 0.45 µM column and ethanol precipitated. Products were then ligated using T4 DNA ligase (Fermentas) to one of 6 double‐stranded DNA adaptors (top 5’‐P‐TCGTATGCCGTCTTCTGCTTG‐3’, bottom NN: 3’‐NNAGCATACGGCAGAAGACGAAC‐5’) that varied in the composition of an additional first 6 nucleotides (not in the given sequence) to enable barcoding of the separate tissue samples. Another 12% polyacrylamide gel was run and a 92 nt band excised and purified as above followed by PCR amplification using the following conditions : 21 cycles of 98 °C for 20 s, 60 °C for 30 s and 72 °C for 15s, followed by an incubation at 72˚C for 3 min. The product was run on a polyacrylamide gel and purified prior to

11 high‐throughput sequencing using the Illumina GA platform. Publicly available PARE sequencing reads for Hek (GSM546438‐GSM546444), Hela and Brain samples (GSM548630‐GSM548637) were retrieved from the NCBI Short Read Archive and aligned to the human genome (hg18) and the mitochondria genome (chrM) with ZOOM (Lin et al., 2008) requiring exact and unique matches as above. Correlation values and significance (two‐tailed, paired t‐ test) were determined using Prism 5 Statistical package (GraphPad, La Jolla CA).

Identification of sites with non‐template polyA or 3’CCA nucleotide additions

Mitochondria and mitoplast directional RNA sequenced reads and Illumina HumanBody Map 100nt mixed‐tissue directional reads were used to identify sites of non‐template poly‐A or ‐CCA nucleotide additions to the mitochondrial transcriptome using the following approach. Sequenced reads were aligned to the human genome requiring exact matching. Those reads that did not map exactly to the genome were then trimmed of either ‐CCA nucleotides or three or more ‐A nucleotides from the 3’ end. Trimmed reads longer than 15nt were then realigned to the human genome again with those reads that mapped uniquely and exactly following 3’ polyA or CCA trimming being considered indicative of putative cleavage sites and/or polyadenylations sites. Nucleotide sites were additionally required to have evidence of 10 or more aligned reads from for consideration in analysis.

PolyA+ and polyA‐ RNA preparations from Hela cells were employed to determine the relative expression of polyadenylated and non‐polyadenylated mitochondrial mRNA isoforms. Sequenced reads were retrieved and mapped to the human genome as above. Hela polyA+ and polyA‐ sequenced libraries are available at: ftp://ftp03.bcgsc.ca/public/HeLa/

Small RNA sequencing

Small RNAs were harvested from isolated mitochondria using the miRNeasy RNA extraction kit (Qiagen). RNA purity and integrity was confirmed by BioAnalyser (Agilent, CA). Deep sequencing of the mitochondrial small RNA was performed by GeneWorks (Adelaide, Australia) on the Illumina GAII (Illumina, CA) according to the manufacturer’s instructions with one modification to sample isolation from the PAGE gel with after adaptor ligation was performed with a modified set of size markers to facilitate sequencing of small RNAs between 15‐30nt. Small RNA sequenced reads were aligned to the human genome (hg18) with ZOOM requiring exact and unique matches (Lin et al., 2008). Comparison between small RNAs and mitochondria gene expression and significance (two‐tailed, paired t‐test) were determined using Prism 5 Statistical package (GraphPad, La Jolla CA).

Additional small RNA datasets for H.sapiens (Mayr and Bartel, 2009) (GSM416733, GSM416753, GSM416756, GSM416759, GSM416755, GSM416758, GSM416757, GSM416760, GSM416732, GSM416754, GSM416761) were obtained from the NCBI Gene Expression Omnibus (NCBI GEO) and aligned to the human genome (hg18) as for mitochondrial small RNAs requiring exact and unique matches (Lin et al., 2008).

12 Conserved RNA secondary structure analysis

The multiz 44‐way mitochondrial genome sequence was downloaded from the UCSC browser for the hg18 genome assembly. Alignment blocks were merged together and subsequently fragmented into 827 overlapping windows of size 120 nt with a 40 nt step encompassing both strands. The alignment‐based consensus secondary structure prediction algorithm SISSIz, which boasts lower false‐positive rates than RNAz (Gesell and Washietl, 2008), was then used to make predictions with parameters "‐d ‐t ‐n 100 ‐p 0.02". A randomized simulated alignment for each native alignment window was generated by SISSIz with parameters "‐s ‐d ‐t ‐p 0.02", hence conserving mean pairwise identity and dinucleotide composition. Simulated alignments were then used to estimate the false positive rate by submitting them to the same analysis as native alignments and by counting all resulting scores displaying a Z‐score greater or equal to 2.

Mitoplast preparation

Mitochondria were isolated from ten 15 cm2 dishes as described above. Mitochondria were washed and suspended in STE containing 0.1% (w/v) fat‐free BSA (STE/BSA) in the last step. The final pellet was suspended in a minimal amount of STE/BSA and the protein concentration was measured by the BCA assay. The mitochondria were diluted with STE/BSA to a final concentration of 50 mg.ml‐1 and stirred on ice. 0.5 ml of digitonin (6.25 mg.ml‐1) was added dropwise over 2 minutes and the solution was kept on ice for a further 13 minutes. 3 volumes of STE/BSA were added and the mixture was homogenised 4 times with a tight homogenizer. The mitoplasts were collected by centrifugation for 10 minutes at 10,000 g and the concentration was measured using the BCA assay. Immunoblotting was carried out to confirm the absence of the outer membrane from the mitoplasts. RNA was harvested, sequenced and aligned to the reference genome as for whole mitochondria preparations (see above). Correlation and significance (two‐tailed, paired t‐test) calculations between mitoplast and mitochondrial gene expression performed using Prism 5 Statistical package (GraphPad, La Jolla CA).

Reverse transcription and qPCR

Quantitative PCR was performed for hsa‐miR‐146a, has‐miR‐103 and has‐miR‐16 using TaqMan MicroRNA Assays (Life Technologies) in the 7900 Real‐Time PCR System (Applied Biosystems) according to the manufacturer’s instructions. Gene specific reverse transcription was conducted with 50 ng RNA in each instance, with a final 1 in 15 dilution of cDNA used for qPCR.

DNaseI hypersensitivity mapping

DNaseI hypersensitivity mapping was performed as described previously (John et al., 2011; Sekimata et al., 2009). Briefly, DNaseI digestion from human cell lines was performed, followed by deproteination and RNA removal. The resulting purified DNA samples were size fractionated by sucrose gradient centrifugation to obtain fragments that were cleaved on both ends by DNaseI as described (Sabo et al., 2006). Sequencing libraries were constructed according to Illumina’s genomic preparation kit protocol with minor modifications, and sequenced on an Illumina

13 Genome Analyzer by the University of Washington High‐Throughput Genomics Unit. Sequence reads were aligned to the reference mitochondrial genome using the ELAND aligner. The 5’ termini of the sequenced reads, corresponding to the cleavage sites, was used to assign each nucleotide within the mitochondrial genome an integer score corresponding to DNaseI cleavage frequency. DNaseI cleavage profiles were subsequently corrected for intrinsic DNaseI dinucleotide cleavage biases as previously described (Hesselberth et al., 2009).

In addition, sequence reads generated as part of the ENCODE Consortium were employed during analysis of sequence variation within DNaseI footprints. All ENCODE data were used in accordance with the ENCODE Consortium Data Release Policy and are available from the ENCODE Data Coordination Center at UCSC (http://genome.ucsc.edu/ENCODE/).

Detection of single nucleotide variants within DNaseI‐released fragment sequences

To identify single nucleotide variants in the mitochondrial sequence using DNaseI mapping‐derived reads, we employed the Genome Analysis ToolKit (McKenna et al., 2010). Firstly DNaseI sequencing reads were uniquely mapped to the mitochondrial genome with Bowtie (Langmead et al., 2009) using default settings before indels and variants called with the Unified Genotyper. Nucleotide variants were then modeled in reference to the mitochondrial genome to generate a cell‐specific mitochondrial genome against which DNaseI sequencing reads were then realigned requiring unique and exact matches. Aligned reads were then trimmed to the single 5’‐ nucleotide that correspond to the site of DNaseI cleavage and used to generate a .wig format histogram that visualizes a DNaseI cleavage frequency profile. Disease‐associated mitochondrial variants were retrieved from MitoMap (www.mitomap.org).

Identification of mitochondrial DNaseI footprints

Footprints were defined in DNaseI cleavage profiles as follows. A footprint is comprised of 3 components: a central component with a flanking component to each side. The central, or core, component of a footprint corresponding to the ‘shadow’ of bound protein while its flanking regions show active cutting by the DNaseI enzyme. The contrast between a central component's score and its flanking scores provide evidence of a bound protein to DNA and is quantified according to: fp‐score = (C+1)/L + (C+1)/R, where

C = the average number of tags in the central component, L = the average number of tags in the left flanking component, and R = the average number of tags in the right flanking component.

The footprint detection algorithm searches for footprints with central components between 8 and 40 nt in length and flanking components with lengths of 3 to 10 nt, inclusively. The output of the algorithm is the set of footprints with optimal fp‐scores, subject to the criteria that L and R must both be greater than C, and all central components

14 must be disjoint. As defined, a lower fp‐score is more significant than a higher one. When two or more potential footprints have overlapping central components, we select the one with the lowest fp‐score, or the one genomically most left on ties, for output. The final output is then filtered to achieve an expected 1% false discovery rate relative to random expectation.

De novo motif discovery

Initially, the frequency of target regions within which a subsequence pattern occurs was determined, considering every 8nt permutation over the 4‐letter DNA nucleotide alphabet, with up to 8 intervening IUPAC 'N' degenerate symbols. For background estimates, nucleotide labels within every target region were randomly shuffled; thereby maintaining local nucleotide label compositions, and the frequency of each shuffled pattern was determined. The entire shuffling procedure was repeated 250 times, allowing sample mean and variance calculations for every pattern's frequency. A Z‐score was computed for each observed subsequence pattern by subtracting the mean frequency estimate from the observed frequency and then dividing by the estimated standard deviation. This returned a list of subsequence patterns in descending Z‐score order that was clustered and filtered to remove redundant motif as follows. The characters between two patterns were sequentially compared. When two patterns are the same length and their 'N' placeholders align, a character difference in one position was permitted to declare similarity; otherwise, two differences are permitted. Initially, the highest Z‐score pattern to the output list with each subsequent pattern being compared to every entry in the output list. If a similar entry was found, the pattern was discarded; otherwise, the pattern was added to the bottom of the output list. The reverse character sequence of every pattern also undergoes same filtering. The retuned motif list was then subject to a second stage filter that includes all alignment possibilities and reverse complement combinations. For remaining patterns a positional weight matrix (PWM) was then constructed by normalizing all target sequences for a pattern's instances into a frequency matrix. A list of non‐redundant motif models whose instances exist within footprints was determined. Motifs were ordered by Z‐score, and sorted and appended, keeping only those with Z‐scores at least 30. Next, the proportion of a pattern's instances that genomically overlap instances from another pattern were determined with all pairwise combinations between patterns being considered. Each pattern’s instance was scanned twice, firstly for only those instances that do not deviate from the pattern, and secondly for instances that have up to 1 mismatch. Scanning was performed over all padded footprints, merged across all cell types with instances with footprints overlapping by more than 0.2 being discarded in favour of the higher Z‐score. Remaining patterns comprised the final non‐redundant motif models.

Analysis of periodicity in DNaseI cleavage profiles

The potential periodicity in DNA structure across the mitochondrial genome was investigated by using DNaseI cleavage activity as a proxy for structure. Initially, cleavage frequencies were converted to percentile rankings and then subtracted from resulting sample mean to mean‐center the data. The null hypothesis that the data were generated from a white noise process was tested using a normalized cumulative periodogram test as described in

15 (Percival D.B., 1993) and rejected at 0.01 significance for each cell analyzed. The periodogram estimates showed no indication of broadband bias but had large variance characteristics. We applied the Welch's Overlapped Segment Averaging (WOSA) spectral estimate technique with block sizes of 256 measurements, 50% overlap percentages and Hanning data tapers to reduce the variance (Percival D.B., 1993).

Data Accession

All sequencing data generated during this study can be retrieved from the NCBI Short Read Archive (SRA) with the following accession identifiers; mitochondria long RNA (>50nt) sequencing (pending); mitochondria small RNA (15‐ 30nt) sequencing (pending); mitoplast long RNA (>50nt) sequencing (pending); PARE sequencing (pending); DNaseI cleavage sequencing (pending). Processed data is also available for download and analysis from mitochondria.matticklab.com.

SUPPLEMENTAL REFERENCES

Czech, B., Zhou, R., Erlich, Y., Brennecke, J., Binari, R., Villalta, C., Gordon, A., Perrimon, N., and Hannon, G.J. (2009). Hierarchical rules for Argonaute loading in Drosophila. Mol Cell 36, 445‐456. de Hoon, M.J., Imoto, S., Nolan, J., and Miyano, S. (2004). Open source clustering software. Bioinformatics 20, 1453‐1454.

Gesell, T., and Washietl, S. (2008). Dinucleotide controlled null models for comparative RNA gene prediction. BMC Bioinformatics 9, 248.

Haussecker, D., Huang, Y., Lau, A., Parameswaran, P., Fire, A.Z., and Kay, M.A. (2010). Human tRNA‐derived small RNAs in the global regulation of RNA silencing. RNA 16, 673‐695.

Hesselberth, J.R., Chen, X., Zhang, Z., Sabo, P.J., Sandstrom, R., Reynolds, A.P., Thurman, R.E., Neph, S., Kuehn, M.S., Noble, W.S., et al. (2009). Global mapping of protein‐DNA interactions in vivo by digital genomic footprinting. Nat Methods 6, 283‐289.

John, S., Sabo, P.J., Thurman, R.E., Sung, M.H., Biddie, S.C., Johnson, T.A., Hager, G.L., and Stamatoyannopoulos, J.A. (2011). Chromatin accessibility pre‐determines glucocorticoid receptor binding patterns. Nat Genet 43, 264‐268.

Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory‐efficient alignment of short DNA sequences to the human genome. Genome Biol 10, R25.

Lin, H., Zhang, Z., Zhang, M.Q., Ma, B., and Li, M. (2008). ZOOM! Zillions of oligos mapped. Bioinformatics 24, 2431‐ 2437.

Mayr, C., and Bartel, D.P. (2009). Widespread shortening of 3'UTRs by alternative cleavage and polyadenylation activates oncogenes in cancer cells. Cell 138, 673‐684.

McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M., et al. (2010). The Genome Analysis Toolkit: A MapReduce framework for analyzing next‐generation DNA sequencing data. Genome Res.

16 Medina, I., Carbonell, J., Pulido, L., Madeira, S.C., Goetz, S., Conesa, A., Tarraga, J., Pascual‐Montano, A., Nogales‐ Cadenas, R., Santoyo, J., et al. (2010). Babelomics: an integrative platform for the analysis of transcriptomics, proteomics and genomic data with advanced functional profiling. Nucleic Acids Res 38, W210‐213.

Pagliarini, D.J., Calvo, S.E., Chang, B., Sheth, S.A., Vafai, S.B., Ong, S.E., Walford, G.A., Sugiana, C., Boneh, A., Chen, W.K., et al. (2008). A mitochondrial protein compendium elucidates complex I disease biology. Cell 134, 112‐123.

Percival D.B., A.T.W. (1993). Spectral Analysis for Physical Applications: Multitaper and Conventional Univariate Techniques. (Cambridge, Cambridge University Press).

Sabo, P.J., Kuehn, M.S., Thurman, R., Johnson, B.E., Johnson, E.M., Cao, H., Yu, M., Rosenzweig, E., Goldy, J., Haydock, A., et al. (2006). Genome‐scale mapping of DNase I sensitivity in vivo using tiling DNA microarrays. Nat Methods 3, 511‐518.

Saldanha, A.J. (2004). Java Treeview‐‐extensible visualization of microarray data. Bioinformatics 20, 3246‐3248.

Sekimata, M., Perez‐Melgosa, M., Miller, S.A., Weinmann, A.S., Sabo, P.J., Sandstrom, R., Dorschner, M.O., Stamatoyannopoulos, J.A., and Wilson, C.B. (2009). CCCTC‐binding factor and the transcription factor T‐bet orchestrate T helper 1 cell‐specific structure and function at the interferon‐gamma locus. Immunity 31, 551‐564.

Trapnell, C., Williams, B.A., Pertea, G., Mortazavi, A., Kwan, G., van Baren, M.J., Salzberg, S.L., Wold, B.J., and Pachter, L. (2010). Transcript assembly and quantification by RNA‐Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 28, 511‐515.

Yakubovskaya, E., Mejia, E., Byrnes, J., Hambardjieva, E., and Garcia‐Diaz, M. (2010). Helix unwinding and base flipping enable human MTERF1 to terminate mitochondrial transcription. Cell 141, 982‐993.

17 Click heretodownloadSupplementalFigure:FigureS1.pdf Supplemental Figure D A F

0.19 Mitochondria Cell lysate RNAseq abundance (RPKM x 103) 222.0 225.0 0.0 1.0 2.0 3.0 4.0 r α-MnSOD 2 1.00 0.81 r 2 12S liver heart

muscle CYTB adipose α-PSP1 breast

kidney ND5 thyroid qRT-PCR RNAseq adrenal B

lymph ND6 fold enrichment colon (relative to whole cell preparation) 10 10 10 10 10

prostate 0 1 2 3 4 1.5 2.0 2.5 115.0 120.0 0.0 0.5 1.0 ovary CO1/ Neat1 qRT-PCR lung qRT-PCR abundance CO1/GAPDH (transcript/ug RNA; x 105) brain CO1/18S testes 12S/Neat1 E whiteblood 12S/GAPDH HADH HSDL2 ETFB ACOT2 ACADS ACACB ECH1 MAOA VDAC1 PDK4 NDUFV3 ND2 ND1 ND3 CO3 CO2 ATPase8/6 CYTB ND4L/4 ECHDC3 ACSL1 GPD1 ND6 ND5 CO1 12S/18S 5 Abundance (transcript/ug RNA; x 10 ) C 120.0 115.0 1.5 2.0 2.5 0.0 0.5 1.0 log

0.0 Reads aligned to mtDNA light strand (per million) 10 10 10 10 10 10 10 nuclear genes genes mitochondria rpkmx10 12S 0 1 2 3 4 5 mitochondrial genome(100deciles) LSP CYTB 4.0 3 antisense sense ND5 rRNA Click heretodownloadSupplementalFigure:FigureS2.pdf Supplemental Figure D A

mitoplast/mitochondria fold enrichment (log ) fold enrichment

-2 -1 2

0 1 2 3 (relative to whole cell preparation) 10 10 10 10 10 10 10

5S rRNA 0 1 2 3 4 5 6 Neat1 CO1/

MRP mitoplasts mitochondria

RNase P GAPDH CO1/ E CO1/

mitoplast (reads; rpkm) 18S 10 10 10 10 10 10 10 10

-2 -1 1 2 3 4 5 0

10 mitochondria (reads;tpm) -2

Annotated RNAs

Neat1

12S/

10

-1

10 0

GAPDH

10 12S/ 1

10

2

10

3 12S/ 18S

10

4

10 5 B F

mitoplast (reads; rpkm) cumulative fraction 10 10 0 0 0 0 0 1 10 10 10 10 10 ......

0 2 4 6 8 0

-2 -1 0 1 2 3 4 10

10

mitochondria (reads;tpm) 0 -2

RNAz predictedncRNAs

10

10 1

-1

total readcount

10

10 2

0

10

10 3 1

mitoplast mitochondria

10

10

4 2

10

10

5 3

10

10

6 4 C G

mitoplast (reads; rpkm) mitoplast (reads; rpkm) 10 10 10 10 10 10 10 10 10 10 10 10 10 10

2 3 4 5 6 7

-2 -1

0 1 2 3 4 5

10 10

2 mitochondria (reads;tpm) -2 mitochondria (reads;tpm) Mitochondria transcripts Evofold predictedncRNAs

10

10 -1

3

10

0

10

4 10

1

10

10

2 5

10

3 r

10 2

6 =0.98 10

4

10 10

5 7 Supplemental Figure Click here to download Supplemental Figure: FigureS3.pdf

1 5’ processing (PARE) A B 35 C Brain 3’ processing Hek ) 1 Hela 10 * (-CCA addition) 0.1 PARE 0.8

0.6 * 300 0.01 0.4 relative proportion relative proportion (log 3 ’ -CCA

additions 0.2 read frequency (raw count)

0.001

l 0 r s p a y r tRNA-Asn y Ile tRNA-Ala -6 +1 V T Ala

T +6 His L Gln Pro Glu Arg Gly Thr Asn Cys Asp Met Phe mtDNA::5700 5650 5600 distance relative to internal cleavage site (brain centric, nt) Ser(AGY) Ser(UCN) Leu(CUN) Leu (UUR)

D tRNA E 3000 143B cells F 4 mixed human tissue 500 Brain 2500 ) 3 )

PARE 3 2000 500 Hela 1500 2 PARE

3000 Hek 1000 tag frequency (x 10 1 polyA non-template RNAseq

read frequency (raw count) 500 PARE read frequency (raw count poly-A nontemplate RNAseq 0 0 ND4 MET residues -15 +1 +15 -15 +1 +15 mtDNA:11,250 11,400 distance relative to distance relative to mRNA 3’ termini (nt) tRNA 3’ termini (nt) 0.4 G 250 143b cells H 1.0x105 I polyA+ * polyA- 4 200 1.0x10 0.3

150 1.0x103 0.2 * 100 1.0x102

polyA non-template 0.1 relative proportion human mixed tissue RNAseq tag frequency 50 10 polyA frequency (raw count)

0 1 0.0 -15 +1 +15 ND1 C01 C01 CO3 ND1 ND2 ND3 ND5 ND6 CO2 ND2 ND5 ND6 distance relative to CO2 CO3 CytoB tRNA 3’ termini (nt) CytoB ND4L/4 ND4L/4 TPase8/6 TPase8/6 A A Supplemental Figure Click here to download Supplemental Figure: FigureS4.pdf

A 1x106 B 2,400 C 570 1x105

1x104

1x103 PARE reads PARE PARE reads PARE PARE reads PARE

2 (Hek cells; raw count) 1x10 (Hek cells; raw count) (Hek cells; raw count)

1x101 structure structure

CO1 ND2 tRNA

novel mtDNA:6,400 6,650 mtDNA:5,100 5,400 rRNA

330 structure A 60 structure B G U U A 50 U C G C D E A G A F G C G U G G U G C C G A G U G G G C C G U G 70 G U 40 G U G A A G

G G G G 80 U G U U U U A A C A U G U U U 80 U U A A G U U A 70 G C U U G 30 A U U G U G G A G G U 50 A 90 G U A U U U A U G G 90 60 A A G C C

PARE reads PARE C C C A U G U U C G A U C C C G G G C C U A G G G U U G U G 20 G C C G U G G G C G G G U U U U A A U G U U C G G U

(Hek cells; raw count) C A 40 30 U G U C G C G 20 U U G A A G 100 G 100 10 C G A U A U U A G A U C G U A G U G C G U U A U G A G U G 10 A G U G structure A structure B A U U A G C 110 C G U A

G U ATPase 8/6 A U CO3 mtDNA:9,150 9,500 Supplemental Figure Click here to download Supplemental Figure: FigureS5.pdf

A B C 3.6x105 250 100 90 200 80 70 150 60 50 (raw count)

sRNA frequency sRNA 100 40

(per million) (per million) 30 sRNA frequency sRNA frequency sRNA 19mer 50 20 22mer 10 chrM:12,180 26mer 12290 0 0

143B A549 Hela Hela DLD2 H520 U2OS 143B A549DLD2 H520 U2OS tRNA-His tRNA-Ser tRNA-Leu SW480 SW480 AGY CUN MDA231 MDA231 D E F 3.6x105 350 100 90 300 80 250 70 200 60 50 (raw count) 150 sRNA frequency sRNA 40 (per million) (per million)

sRNA frequency sRNA 100 frequency sRNA 30 22mer 20 50 27mer 10 chrM:3220 29mer 3330 0 0

Hela Hela 143B A549DLD2 H520 U2OS 143B A549DLD2 H520 U2OS SW480 SW480 16S rRNA tRNA-LeuUUR ND1 MDA231 MDA231

A G C H I C 25000 tRNA-Trp G 3 A T nuclear tRNAs G C mitochondria tRNAs A T 20000 ype A T t A T 2 T A

15000 o wild- T A t equency T A e r A A T T C A T A v A A T T G G m1 ti a A A G T T t in mutant small RNA T el n C (per million) 10000 G A C C T G dicer-dependent r A A sRNA f 1 C A C G small RNAs

A T A ichme G C 5000 r A C libraires G C mt-tRNA associated small RNA old en 0 C G 0 f C A 10 20 30 40 50 60 70 dcr1 -/- dcr2 -/- loq -/- r2d2 -/- T A tRNA loci (nt) D. melanogastor T C A genotype AGAAATTTAGGTTAAATACAGACCAAGAGCCTTCAAAGCCCTCAGTAAGTTGCAATACTTAATTTCTG AGAAATTTGGGTTAAAT n=390 AGAAATTTTGGTTAAAT n=481 small RNA sequences AGAAATTTCGGTTAAAT n=18 J 600 700 500

400

300 (raw count)

sRNA frequency sRNA 200

100

0 CCGCCGCCGGGAAAAAAGGCGGGAGAAGCCCCGGCAGGTTTGAAGCTGCTT chrM: 5737 5787 RNA initation DNA transition Supplemental Figure Click here to download Supplemental Figure: FigureS6.pdf

DNase I normalised percentile DNase I normalised percentile 0% 25% A B 0% 25% Bjtert hES Hela Hela hES Th1 K562 Skmc Skmc K562 Sknshra Bjtert Th1 Sknshra mtDNA: 5725 5785 mtDNA: 6435 6460 CGCCGCCGGGAAAAAAGGCGGGAGAAGCCCCGGCAGGTTTGAAGCT CCAATACCAAACGCCCCTCTTCGTCT transcription start site RNA/DNA transitions footprint

C 2 2 2

bits 1 bits 1 bits 1

0 0 0

T R, q=0.16 ER, q = 1.7 x 10-3 STAT3, q = 4.8 x 10-3 2 3 2 2 1 1 1 bits bits bits 0 0 0

2 2 2

bits 1 bits 1

bits 1 0 0 0

CREB, q = 1.9 x 10-4 MEF2, q = 2.3 x 10-3 NF1, q = 5.3 x 10-5 2 2 2 1 1 1 bits bits bits 0 0 0

6 LSP D E 2x10 light strand mTERF1 bound PARE HSP to termination 8x10 5 minor sequence heavy strand PARE 6x10 5 RNAseq 1x10 4 poly-A

CSB: I II III HSP major

BJ Hela K562 DNase 1 hES Skmc cleavage Sknsh sites Th1 mtDNA: 100 200 300 400 500 600 700 Supplemental Figure Click here to download Supplemental Figure: FigureS7.pdf

A 1.0 B 1.0 32 White Noise Test 35 WOSA White Noise Test WOSA 0.8 0.8 0.6 0.6 dB dB Pk Pk 0.4 0.4 0.2 0.2 0.0 Skmc 26 Skmc 0.0 BJ 25 BJ 80 20 10 5 4 80 20 10 5 80 20 10 5 4 80 20 10 5 f(1/bp) f(1/bp) base pair C D 1.0 31 1.0 White Noise Test WOSA White Noise Test 37 WOSA 0.8 0.8 0.6 0.6 Pk dB Pk dB 0.4 0.4 0.2 0.2 0.0 K562 26 K562 0.0 SknSH 25 SknSH 80 20 10 5 4 80 20 10 5 80 20 10 5 4 80 20 10 5 base pair f(1/bp) f(1/bp) base pair E F 1.0 36 1.0 32 White Noise Test WOSA White Noise Test WOSA 0.8 0.8 0.6 0.6 dB dB Pk Pk 0.4 0.4 0.2 0.2 0.0 Th1 24 Th1 0.0 Hela 25 Hela 80 20 10 5 4 80 20 10 5 80 20 10 5 4 80 20 10 5 f(1/bp) base pair base pair G f(1/bp) 1.0 33 White Noise Test WOSA 0.8 0.6 dB Pk 0.4 0.2 0.0 hES 25 hES 80 20 10 5 4 80 20 10 5 f(1/bp) base pair