Human brain transcriptomics: towards understanding multiple system atrophy

James Dominic Mills

A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy

School of Biotechnology and Biomolecular Sciences Faculty of Science The University of New South Wales Sydney, Australia

October 28, 2015

THE UNIVERSITY OF NEW SOUTH WALES Thesis/Dissertation Sheet

Surname or Family name: Mills

First name: James Other name/s: Dominic

Abbreviation for degree as given in the University calendar: BIOC8608

School: The School of Biotechnology and Biomolecular Sciences Faculty: Science

Title: Human brain transcriptomics: towards understanding multiple system atrophy

Abstract (350 words maximum):

The human brain is a remarkably complex organ. It is a heterogeneous collection of billions of neurons and glial cells that are interconnected to form a finely tuned network capable of higher cognition. It is thought that the transcriptome may hold the key to understanding the complexity of the human brain. Next-generation sequencing allows the brain's transcriptome to be probed at an unmatched resolution. This has uncovered a myriad of RNA elements, including a number of long intervening non-coding RNAs (lincRNAs) that appear to be vital to the development and function of the brain. The complexity of the human brain also makes it prone to a variety of different neurodegenerative diseases. One such disorder is multiple system atrophy (MSA). MSA is a sporadic, rapidly progressing neurodegenerative disease. Currently no treatment exists and very little is known about the molecular basis of MSA.

In an attempt to understand the complexity of the human brain, transcriptome profiling of grey matter (GM) and white matter (WM) from the prefrontal cortex was performed. This revealed high expression of numerous lincRNAs, including the oligodendrocyte maturation-associated lincRNA (OLMALINC) in WM. To further establish the role of OLMALINC in the human brain, the transcript was knocked down in neurons and oligodendrocytes. This revealed that OLMALINC plays a role in oligodendrocyte maturation. Next, transcriptome profiling of MSA brain tissue was carried out. Genes that were differentially expressed between the healthy and MSA brain included α1-hemoglobin (HBA1), α2-hemoglobin (HBA2) and β-hemoglobin (HBB). This suggests that perturbation of iron metabolism may be involved in MSA pathology. A number of lincRNAs differentially expressed between MSA GM and MSA WM were also identified. One of these lincRNAs, linc00320, was further investigated to establish whether it plays a role in MSA pathology.

Together, using next-generation sequencing and bioinformatic tools, this thesis provides a comprehensive transcriptional analysis of the human brain, followed by the most detailed transcriptome analysis of MSA. The transcriptome-wide profiling analyses are also coupled with comprehensive analyses of OLMALINC and linc00320, detailing their importance in human brain physiology.

Declaration relating to disposition of project thesis/dissertation

I hereby grant to the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or in part in the University libraries in all forms of media, now or hereafter known, subject to the provisions of the Copyright Act 1968. I retain all property rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.

I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstracts International (this is applicable to doctoral theses only).

Signature Witness Date

The University recognises that there may be exceptional circumstances requiring restrictions on copying or conditions on use. Requests for restriction for a period of up to 2 years must be made in writing. Requests for a longer period of restriction may be considered in exceptional circumstances and require the approval of the Dean of Graduate Research.

FOR OFFICE USE ONLY Date of completion of requirements for Award:

THIS SHEET IS TO BE GLUED TO THE INSIDE FRONT COVER OF THE THESIS

Copyright Statement

'I hereby grant the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or part in the University libraries in all forms of media, now or hereafter known, subject to the provisions of the Copyright Act 1968. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.

I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstracts International (this is applicable to doctoral theses only).

I have either used no substantial portions of copyright material in my thesis or I have obtained permission to use copyright material; where permission has not been granted I have applied/will apply for a partial restriction of the digital copy of my thesis or dissertation.'

Signed

Date

Authenticity Statement

'I certify that the Library deposit digital copy is a direct equivalent of the final officially approved version of my thesis. No emendation of content has occurred and if there are any minor variations in formatting, they are the result of the conversion to digital format.'

Signed

Date

Originality Statement

'I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.'

Signed

Date

Contents

Acknowledgements

Abstract

List of figures

Abbreviations

1 Introduction

1.1 The prefrontal cortex of the human brain

1.2 The transcriptome and organism complexity

1.3 Non-coding RNAs

1.4 Conservation and tissue-specificity of long non-coding RNAs

1.5 RNA-sequencing

1.6 Multiple system atrophy

1.7 Multiple system atrophy and transcriptomics

1.8 Aims of this study

1.9 Thesis synopsis

1.10 Review article: Strand-specific RNA-Seq provides greater resolution of transcriptome profiling

2 Transcriptome profiling of the healthy human brain using RNA-Seq

2.1 Primary research article: Unique transcriptome patterns of the white and grey matter corroborate structural and functional heterogeneity in the human frontal lobe

3 The role of long intervening non-coding RNAs in healthy brain development and function

3.1 Primary research article: High expression of long intervening non-coding RNA OLMALINC in the human cortical white matter is associated with regulation of oligodendrocyte maturation

4 Transcriptome profiling of multiple system atrophy brain tissue using RNA-Seq

4.1 Primary research article: Transcriptome analysis of grey and white matter cortical tissue in multiple system atrophy

5 Investigation of a long intervening non-coding RNA that may be involved in the pathology of multiple system atrophy

5.1 Primary research article: Long intervening non-coding RNA 00320 is human brain-specific and highly expressed in the cortical white matter

6 Conclusions and future directions

6.1 Summary

6.2 The use of post-mortem brain tissue in transcriptome profiling

6.3 Selection of the superior frontal gyrus

6.4 The complexity of the transcriptome of the human brain

6.5 Brain lincRNAs show low levels of sequence and expression conservation

6.6 Insights into the pathology of multiple system atrophy

6.7 Future directions

References

Appendices

A Publications with non-first authorship related to this thesis

A.1 The role of transcriptional control in multiple system atrophy

A.2 Pathway analysis of the human brain transcriptome in disease

A.3 Conservation and tissue-specific transcription patterns of long non-coding RNAs

Acknowledgements

First and foremost I offer my sincerest gratitude to my supervisor Dr. Michael Janitz. His enthusiasm, commitment and support made the completion of this thesis possible. I could not have asked for a better mentor throughout my PhD. You have had a profound influence on my career path.

Next, I would like to thank all of my co-authors who contributed to the papers presented in this thesis, including Prof. Glenda Halliday, Dr. Paul Waters, Dr. Scott Kim, Dr. Yoshihiro Kawahara, Prof. Eleonora Aronica, Bei Jun Chen, Tomas Kavanagh, Jieqiong Chen and Avanita Prabowo.

To all of my friends and family throughout Australia and the world, thanks for your support.

Last but by no means least, I would like to thank the donors who made these studies possible. I implore everyone who reads this to donate their body to science.

Abstract

The human brain is a remarkably complex organ. It is a heterogeneous collection of billions of neurons and glial cells that are interconnected to form a finely tuned network capable of higher cognition. It is thought that the transcriptome may hold the key to understanding the complexity seen in the human brain. Next-generation sequencing allows the brain's transcriptome to be probed at an unmatched resolution. This has uncovered a myriad of RNA elements, including RNA that does not code for protein, known as non-coding RNA (ncRNA). Originally thought to be transcriptional noise, ncRNAs are now appreciated to have numerous functional properties, including the ability to interact with DNA and other RNA molecules and to act in different cellular compartments. It is thought that an increase in the number of ncRNAs expressed throughout the brain is a major driver of the increased intellectual capacity seen in humans and other primates. The increased complexity of the human brain also makes it prone to a number of different neurodegenerative and psychiatric diseases. These diseases are set to have dramatic economic and social impacts by the middle of the 21st century. To avert this looming epidemic, an adequate understanding of the human brain is needed so that diagnostic tools and treatment targets can be developed. One such disorder is multiple system atrophy (MSA). MSA is a sporadic, rapidly progressing neurodegenerative disease. Currently no treatment exists and very little is known about the molecular basis of MSA. Before an understanding of the diseased brain can be reached, an understanding of the healthy brain is necessary.

Here, the transcriptome of grey matter (GM) and white matter (WM) from the superior frontal gyrus (SFG) of the healthy prefrontal cortex (PFC) was analysed. This revealed pervasive transcription and highlighted the differences in the transcriptome profiles of distinct cortical structures throughout the brain. A number of protein-coding genes were expressed exclusively in GM or WM, including gamma-aminobutyric acid A receptor, beta 2 (GABRB2) and P21 Protein (Cdc42/Rac)-Activated Kinase 2 (PAK2), respectively. Further, an interesting phenomenon known as isoform switching was detected in genes such as G protein-coupled receptor 123 (GPR123). It was also revealed that in the healthy frontal cortex, long intervening non-coding RNAs (lincRNAs), a subclass of long non-coding RNAs (lncRNAs), appear to be important drivers of tissue differentiation.

To further establish the role of lincRNAs in the healthy human brain, a comprehensive analysis of the oligodendrocyte maturation-associated lincRNA (OLMALINC) was carried out. It was found that OLMALINC is a recently evolved lincRNA with its highest expression levels in the human brain. OLMALINC was knocked down in human neurons and oligodendrocytes. Depletion of OLMALINC revealed that it plays a role in oligodendrocyte maturation. This study was one of the first functional characterisations of a lincRNA expressed in the human brain, and thus demonstrated the importance of the non-coding transcriptome throughout the human brain.

Next, the transcriptome of MSA brain tissue was profiled. This identified a handful of new genes that may be involved in the progression of MSA. Genes that were differentially expressed between healthy and MSA tissue from the SFG included α1-hemoglobin (HBA1), α2-hemoglobin (HBA2), β-hemoglobin (HBB) and transthyretin (TTR). A number of lincRNAs differentially expressed between MSA GM and MSA WM were also identified. Very little is currently known about the genetic basis of MSA, and this research article provides the most in-depth transcriptomic investigation of MSA to date.

Due to its significant up-regulation in MSA WM, linc00320 was selected for further analysis. While no conclusive link between linc00320 expression patterns and MSA was found, it was established that linc00320 is both human- and brain-specific. Interesting results concerning the biology of lincRNAs were also uncovered. While highly expressed in the human brain, linc00320 was not detected in other human peripheral tissues. Further, no expression of linc00320 was detected in the following human cell lines: neurons, oligodendrocytes, adult astrocytes and fetal astrocytes. The inability to detect linc00320 expression in these homogeneous cell lines could be linked to the lack of cell-to-cell communication and tissue context in cell lines. Analysis of linc00320 splice variants demonstrated that alternative splicing can result in lincRNAs with different functional properties.

Together, using cutting-edge sequencing techniques and bioinformatic tools, this thesis provides a comprehensive analysis of the unique transcriptome of the human brain, followed by the most detailed transcriptome analysis of the neurodegenerative disease MSA. The transcriptome-wide profiling analyses are also coupled with comprehensive analyses of OLMALINC and linc00320, detailing their importance in human brain physiology.

List of figures

1.1 Classification of lncRNAs based on genomic location

1.2 RNA-Seq work-flow

1.3 Proposed pathogenesis of multiple system atrophy

Abbreviations

circRNAs: circular RNAs
COQ2: coenzyme Q2 4-hydroxybenzoate polyprenyltransferase
EIF4EBP1: eukaryotic translation initiation factor 4E binding protein 1
ENCODE: The Encyclopedia of DNA Elements
fpkm: fragments per kilobase of transcript per million mapped reads
GABRB2: gamma-aminobutyric acid A receptor, beta 2
GBA: acid beta-glucosidase
GCIs: glial cytoplasmic inclusions
GM: grey matter
GPR123: G protein-coupled receptor 123
GTEx: Genotype-Tissue Expression
HBA1: α1-hemoglobin
HBA2: α2-hemoglobin
HBB: β-hemoglobin
HGNC: Human Genome Organisation Gene Nomenclature Committee
HOTAIR: HOX transcript antisense RNA
lincRNAs: long intervening non-coding RNAs
lncRNAs: long non-coding RNAs
LRRK2: leucine-rich repeat kinase 2
MALAT1: metastasis associated lung adenocarcinoma transcript 1
miRNAs: microRNAs
MSA: multiple system atrophy
MSA-C: MSA, cerebellar ataxia subtype
MSA-P: MSA, parkinsonian subtype
ncRNA: non-coding RNA
poly(A)-: non-polyadenylated
OLMALINC: oligodendrocyte maturation-associated lincRNA
PAK2: P21 Protein (Cdc42/Rac)-Activated Kinase 2
PD: Parkinson's disease
PFC: prefrontal cortex
piRNAs: PIWI-interacting RNAs
PMI: post-mortem interval
poly(A)+: polyadenylated
RIN: RNA integrity number
RNA-Seq: RNA-Sequencing
rRNA: ribosomal RNA
SFG: superior frontal gyrus
SHC2: Src Homology 2 Domain Containing transforming protein 2
siRNAs: small interfering RNAs
SLC1A4: solute carrier family 1 (glutamate/neutral amino acid transporter), Member 4
sncRNAs: short non-coding RNAs
SNPs: single nucleotide polymorphisms
TTR: transthyretin
WM: white matter

Chapter 1

Introduction

1.1 The prefrontal cortex of the human brain

The human brain is an organ of immense complexity: a heterogeneous cellular mass composed of billions of neurons and glial cells. The fossil record suggests that brain size has increased approximately 3-fold over the past 2.5 million years of evolution in the human lineage [1]. Today the human brain is much larger than the largest non-human primate brain. It might therefore be expected that every structure in the human brain would be larger than the analogous region in the non-human primate brain. While this generally holds true, there are exceptions; for example, the olfactory bulbs are smaller in humans than in great apes [2]. The region of the human brain that has undergone the greatest increase in size compared with non-human primates is the prefrontal cortex (PFC) [2]. The PFC of the cerebrum is considered the most highly developed region of the human brain. The PFC is responsible for higher-order tasks and behaviours such as planning [3], working memory [4], aspects of language [5] and social information processing [6]. These key roles in a variety of higher-level tasks suggest that the PFC may be involved in the unique higher cognitive abilities seen in humans.

The cerebrum, including the PFC, can be divided into two distinct tissue types: (i) grey matter (GM), which contains neurons, their dendrites and axons, as well as glial cells; and (ii) white matter (WM), which is made up predominantly of oligodendrocytes and myelinated and non-myelinated axons [7, 8]. WM is responsible for providing connectivity between neurons in different cortical regions of the brain [9]. As brain size increased across the primate lineage, axons had to become longer, thicker and more heavily myelinated to maintain optimal connectivity between neurons. Interestingly, it appears that in mammals, as brain size increases, WM increases in volume disproportionately faster than GM [10]. Further, when the human brain is compared with those of other primates, it appears that WM is the major contributor to the relative brain enlargement [9, 11]. Long considered a passive tissue, WM is now accepted to play an important role in intelligence and to be vital for the acquisition of new skills [7, 12]. This suggests that increases in the volume of WM may be directly related to human-specific intelligence.

While it is known that distinct structural and functional regions exist in the human brain, particularly in the PFC [13], little is known about the molecular basis of these differences. For ethical and practical reasons, it is not currently possible to study gene expression in different regions of the human brain in vivo. Model organisms such as mice or Drosophila melanogaster can provide useful information concerning the molecular basis of the brain; however, because of the added layers of complexity of the human brain, results from these model organisms are not always applicable or relevant. The best resource for research on the human brain is post-mortem brain tissue [14]. Post-mortem brain tissue that has been handled and stored properly should retain gene expression patterns similar to those of in vivo brain tissue, making it ideal for research into the molecular differences between distinct brain regions.


1.2 The transcriptome and organism complexity

No universal definition of organism complexity exists. It can be considered as an increase in cell number, accompanied by cellular differentiation and an associated increase in cell types, each with a specialised function and the expression of cell-specific genes [15]. Using this definition, a human can be seen as a more complex organism than a mouse, and within the human body the brain can be considered a more complex organ than the liver. Intuitively, one might expect the number of genes or the size of the genome to correlate with organism complexity. This is in fact not the case [16].

Estimates place the number of genes in the human genome at approximately 23,000. These genes are expressed in a myriad of different ways across approximately 10¹⁴ cells, including 10¹⁰ neurons in the brain alone [17]. Conversely, the relatively simple nematode Caenorhabditis elegans has upwards of 20,000 protein-coding genes, expressed across approximately 1000 relatively homogeneous cells [18, 19]. Counter-intuitively, this suggests that a similar set of genes used to produce a simple organism with a small number of differentiated cells could also produce humans, with their diverse range of specialised cells. Further, the protein-coding sequences of the human and chimpanzee genomes share 98-99% sequence identity [20]; when the overall DNA sequence, including non-coding regions, is compared between humans and chimpanzees, the similarity is closer to 95% [21, 22]. Even with these similarities there exist marked phenotypic differences, including extensive differences in brain development, size and rate of growth. This suggests that organism complexity is not simply derived at the genomic level.

The transcriptome refers to the set of RNA molecules present in a cell. Unlike the genome, which remains relatively fixed throughout the lifetime of a cell, the transcriptome is dynamic, changing depending on environmental factors, disease state and developmental stage. The transcriptome of different cells is made up of a varied collection of RNA transcripts. Transcription can produce protein-coding or non-coding RNAs (ncRNAs) that in turn can be alternatively spliced or edited to produce functionally distinct RNA molecules [23, 24]. This means that the relationship between a DNA locus and a functional unit (RNA or protein) is not one-to-one. Gene expression, and thus the transcriptome, can further be controlled through epigenetic mechanisms [25]; for example, DNA methylation suppresses the expression of genes throughout the genome [26]. It appears that RNA, and the transcriptome as a whole, play a vital role in human development and cognition [27].

It has been hypothesised that it is in fact the transcriptome that drives organism complexity. The transcriptome adds an extra level of regulation, whereby gene expression can be precisely controlled and modulated across different cell types to achieve higher levels of complexity [28]. Cells within the human body demonstrate this: although liver cells and neurons contain identical genomic material, they are distinct both morphologically and functionally. These differences are controlled by which genes are expressed and to what level [29].

1.3 Non-coding RNAs

ncRNAs are RNA molecules that are transcribed but not translated into proteins. It was originally thought that the majority of these untranslated RNA molecules were simply transcriptional noise. While there is a long history of sporadic evidence that some RNA molecules are indeed functional (ribosomal RNAs (rRNAs), for example, play an important role in protein synthesis [30]), functionality of RNA was generally thought to be the exception rather than the rule. Over the past decade this paradigm has shifted dramatically, and the non-coding aspect of the transcriptome has been generating high levels of interest and excitement amongst research laboratories. The ENCODE project showed that up to 75% of the human genome is transcribed in a cell type-specific manner [31] and that much of this transcriptional output is ncRNA [32]. Further, it appears that ncRNAs can interact with DNA, other RNAs and proteins, often acting as regulatory molecules [33]. It is thought that ncRNAs are a major contributor to organism complexity, and indeed there is a correlation between organism complexity and the proportion of ncRNA in the genome [28]. In particular, it appears that ncRNAs may be involved in brain evolution, development, synaptic plasticity and disease [34]. This points to the non-coding transcriptome being related to the complexity and the high intellectual capacity of the human brain.

ncRNAs can be split into three categories based on the length or shape of the transcripts: circular RNAs (circRNAs), short non-coding RNAs (sncRNAs; <200 nucleotides in length) and long non-coding RNAs (lncRNAs; >200 nucleotides in length). The most recent ncRNA class to emerge is circRNAs. These arise when the 5’ end of one exon is joined to the 3’ end of another exon, creating a circular structure with no free 5’ or 3’ end [35]. Interestingly, it appears that circRNAs are enriched in synapses [36]. Of all classes of ncRNAs, the sncRNAs are the best understood. Examples of sncRNAs include small interfering RNAs (siRNAs), microRNAs (miRNAs) and PIWI-interacting RNAs (piRNAs) [37]. lncRNAs are RNA transcripts greater than 200 nucleotides in length that lack protein-coding potential [38]. Other than length, there do not appear to be any common sequence or structural features shared between lncRNAs. lncRNAs can be further categorised as antisense, intronic, bidirectional, intervening or overlapping, based on their orientation and relationship to surrounding genes (Figure 1.1) [38].

There is a growing body of evidence for the functionality of lncRNAs, with examples including HOX transcript antisense RNA (HOTAIR) [39] and metastasis associated lung adenocarcinoma transcript 1 (MALAT1) [40]. The identification of a handful of functional lncRNAs justifies the search for functional lncRNAs on a transcriptome-wide scale. While it is known that many lncRNAs are transcribed throughout the human genome, a large gap remains in their functional annotation [32]. Nevertheless, it appears that lncRNAs play an essential role in development, cellular maintenance and function, and they have been linked to a number of neurodegenerative diseases [41].

One particular subclass of lncRNAs is known as long intervening non-coding RNAs (lincRNAs). These are transcribed from regions between genes of known annotation [42]. Evidence suggests that lincRNAs may play a role in mouse brain development and are involved in cancer progression [43, 44]. Further, 93% of disease-associated single nucleotide polymorphisms (SNPs) are found in regions that do not code for proteins [45]; it is therefore possible that some of these regions harbour lincRNAs. However, there have been very few studies of the role of lincRNAs in diseases of the human brain.


Figure 1.1: Classification of lncRNAs based on genomic location. Upper panel: Intronic lncRNA: the lncRNA is transcribed from an intronic region of a protein-coding gene. Antisense lncRNA: the lncRNA is transcribed from the strand opposite to a protein-coding gene, with partial or complete overlap of any intronic or exonic regions. Intervening lncRNA (lincRNA): the lncRNA is transcribed from a region located between other genes, with no overlap with any protein-coding gene. Overlapping lncRNA: an intron of the lncRNA encompasses a protein-coding gene. Bidirectional lncRNA: the lncRNA shares its transcription start site with a protein-coding gene on the opposite strand. Arrows indicate orientation of transcription. Lower panel: lncRNAs can be alternatively spliced to produce numerous splice variants. Here, a lincRNA is spliced to produce two variants, each with a unique secondary structure that can determine the function of that lncRNA isoform. Figure adapted from Ward, McEwan, Mills et al, 2015 [46].
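The location-based categories of Figure 1.1 can be expressed as a small set of interval rules. The Python sketch below is illustrative only: the `Transcript` type, the 0-based half-open coordinates, the 1,000 bp window used to call a shared transcription start site and the order in which the rules are tested are all hypothetical choices, not the criteria used by GENCODE or in the cited work [46]. It compares one lncRNA against one protein-coding gene; in practice a transcript would only be called a lincRNA if it overlaps no protein-coding gene at all.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


@dataclass
class Transcript:
    chrom: str
    strand: str            # "+" or "-"
    exons: List[Interval]  # sorted, non-overlapping exon intervals

    @property
    def span(self) -> Interval:
        return (self.exons[0][0], self.exons[-1][1])

    @property
    def introns(self) -> List[Interval]:
        # Introns are the gaps between consecutive exons.
        return [(a[1], b[0]) for a, b in zip(self.exons, self.exons[1:])]

    @property
    def tss(self) -> int:
        # Transcription start site: the 5' end on the transcribed strand.
        return self.span[0] if self.strand == "+" else self.span[1] - 1


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def contains(outer: Interval, inner: Interval) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def classify_lncrna(lnc: Transcript, gene: Transcript, tss_window: int = 1000) -> str:
    """Classify an lncRNA relative to one protein-coding gene (hypothetical rules)."""
    same_chrom = lnc.chrom == gene.chrom
    opposite = lnc.strand != gene.strand
    # Bidirectional: opposite strand, TSSs within a hypothetical window of each other.
    if same_chrom and opposite and abs(lnc.tss - gene.tss) <= tss_window:
        return "bidirectional"
    # Intervening: no overlap at all with the gene.
    if not same_chrom or not overlaps(lnc.span, gene.span):
        return "intervening"
    # Overlapping: the whole gene sits inside one lncRNA intron.
    if any(contains(intron, gene.span) for intron in lnc.introns):
        return "overlapping"
    # Antisense: opposite strand with partial or complete overlap.
    if opposite:
        return "antisense"
    # Intronic: same strand, lncRNA lies entirely within one gene intron.
    if any(contains(intron, lnc.span) for intron in gene.introns):
        return "intronic"
    return "sense-overlapping"  # shares exonic sequence; not a category in Figure 1.1


gene = Transcript("chr1", "+", [(1000, 1200), (2000, 2200), (3000, 3200)])
print(classify_lncrna(Transcript("chr1", "+", [(1300, 1400), (1500, 1600)]), gene))  # intronic
print(classify_lncrna(Transcript("chr1", "-", [(200, 400), (600, 900)]), gene))      # bidirectional
```

Testing "bidirectional" first means a head-to-head pair with a small overlap is not reported as antisense; a different rule order would be an equally defensible design choice.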


1.4 Conservation and tissue-specificity of long non-coding RNAs

One very interesting feature of lncRNAs is that the loci from which they are transcribed show lower levels of sequence conservation than protein-coding loci [47, 48, 49]. While young lncRNAs (younger than 25 million years) show lower levels of exonic sequence conservation than random intergenic regions, older lncRNAs (older than 90 million years) in fact show higher levels of conservation than untranslated regions [48]. This complicates the use of cross-species sequence conservation as evidence of lncRNA function. A study of 14,880 lncRNAs by the GENCODE group found that lncRNAs are less conserved than protein-coding genes, with approximately 30% of lncRNA transcripts conserved amongst primates and 1% appearing specific to the human lineage. By comparison, 98% of protein-coding genes are conserved across all primates [32, 48]. Outside of primates, the conservation of lncRNAs falls away rapidly. Within the brain specifically, there appear to be higher levels of species-specific transcription of lncRNAs, and of lincRNAs in particular. A recent comparative transcriptome study of human and macaque PFC found that while 13,722 of the 14,745 protein-coding genes expressed in human were also expressed in macaque, only 514 of 1061 lincRNAs were expressed in macaque [50]. While it is probable that some of the lincRNAs not expressed in the macaque PFC would be expressed in other primates, such as chimpanzees and orangutans, this does highlight that lincRNAs are expressed in a much more species-specific manner than protein-coding RNAs.
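Expressed as proportions, the human and macaque PFC counts quoted above [50] make the contrast explicit. The short Python calculation below only re-derives percentages from the numbers given in the text; `percent_shared` is a hypothetical helper name.

```python
def percent_shared(shared: int, total: int) -> float:
    """Percentage of human-expressed genes that were also expressed in macaque."""
    return 100 * shared / total


# Counts quoted in the text for the human vs macaque PFC comparison [50]
counts = {
    "protein-coding genes": (13_722, 14_745),
    "lincRNAs": (514, 1_061),
}

for label, (shared, total) in counts.items():
    print(f"{label}: {shared:,}/{total:,} = {percent_shared(shared, total):.1f}% shared")
# protein-coding genes: 13,722/14,745 = 93.1% shared
# lincRNAs: 514/1,061 = 48.4% shared
```

In other words, roughly 93% of the protein-coding genes but fewer than half of the lincRNAs expressed in the human PFC were detected in macaque.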

Along with low levels of conservation, lncRNAs also display highly tissue-specific expression patterns. The GENCODE consortium analysed the expression levels of protein-coding RNAs and lncRNAs across 16 tissue types [32]. It was found that 65% of protein-coding RNAs were expressed at some level across all tissue types. In contrast, only 11% of lncRNAs were expressed across all tissue types, with 11% detected in only one tissue type and 21% not detected in any of the 16 chosen tissues. The lncRNAs also had overall lower levels of expression, but higher levels of expression variability. For heterogeneous tissues such as the brain, the expression patterns of lncRNAs may vary within the organ. This is demonstrated by a study of lncRNAs within the mouse brain, which showed that the majority of lncRNAs were expressed in specific neuroanatomical regions, cell types and subcellular compartments [51]. Another study revealed that a set of lncRNAs expressed in the human PFC showed significant changes in expression levels with age, suggesting involvement in brain ageing and development [50]. These studies provide strong evidence that the non-coding transcriptome is context-dependent; that is, it is influenced by the developmental stage of the organism, the tissue type, and the developmental stage and type of the surrounding cells.
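Breadth-of-expression figures such as the GENCODE percentages above come from counting, for each gene, the number of tissues in which it is detected. The Python sketch below shows that tabulation on a toy expression table; the detection threshold (0.1, in arbitrary expression units), the category labels and the example values are hypothetical and are not taken from the GENCODE analysis [32].

```python
from typing import Dict, List


def breadth_of_expression(expr: Dict[str, List[float]], threshold: float = 0.1) -> Dict[str, float]:
    """Percentage of genes detected in all, some, exactly one, or none of the tissues.

    expr maps each gene to its expression values, one per tissue (same tissue order
    for every gene). A gene counts as detected in a tissue if its value exceeds threshold.
    """
    counts = {"all tissues": 0, "some tissues": 0, "one tissue": 0, "no tissue": 0}
    for values in expr.values():
        detected = sum(1 for v in values if v > threshold)
        if detected == len(values):
            counts["all tissues"] += 1
        elif detected == 1:
            counts["one tissue"] += 1
        elif detected == 0:
            counts["no tissue"] += 1
        else:
            counts["some tissues"] += 1
    n_genes = len(expr)
    return {category: 100 * n / n_genes for category, n in counts.items()}


# Toy example: four hypothetical genes measured across four tissues
toy = {
    "geneA": [12.0, 8.5, 3.1, 20.4],  # detected everywhere
    "geneB": [0.0, 0.0, 7.2, 0.0],    # detected in one tissue only
    "geneC": [0.0, 0.05, 0.0, 0.0],   # below threshold everywhere
    "geneD": [1.4, 0.0, 2.2, 0.0],    # detected in two tissues
}
print(breadth_of_expression(toy))
```

Because the categories depend on the threshold, the same data can yield noticeably different percentages; lowering it to 0.01 here moves geneC from "no tissue" to "one tissue".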


1.5 RNA-sequencing

RNA-sequencing (RNA-Seq) is a sequencing technique that makes use of high-throughput next-generation DNA sequencing. RNA-Seq is capable of providing an in-depth picture of the dynamic transcriptome, detecting gene expression levels and alternative splicing patterns [52]. RNA-Seq has advantages over other widely used transcriptome profiling tools such as microarrays: it produces fewer false positives, has lower levels of background noise, offers single-base resolution and can detect a wider dynamic range of gene expression levels [53]. Perhaps the most important aspect of RNA-Seq is that it can assemble transcriptomes de novo, allowing for the discovery of un-annotated transcripts and the identification of novel splicing events [53]. This is particularly important for the non-coding fraction of the transcriptome, as it is mostly transcribed in a tissue-specific manner. It has been suggested that RNA-Seq may hold the key to uncovering the pathogenesis of numerous complex neurodegenerative diseases [14, 53, 54].

While a variety of RNA-Seq platforms exist, each making use of different sequencing chemistries, the overall work-flows are conceptually similar (Figure 1.2). The first step in RNA-Seq involves the selection of an RNA fraction from total RNA for sequencing. There are two main ways in which this can be performed: (i) selection of transcripts that have been polyadenylated (poly(A)+) and (ii) selection through ribosomal RNA (rRNA) depletion. Selection of RNA is followed by fragmentation, reverse transcription and adapter ligation, creating a double-stranded cDNA library. cDNA libraries are then subjected to high-throughput sequencing, producing millions of short reads.
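Why rRNA must be removed before sequencing becomes clear from simple arithmetic. The review reproduced in section 1.10 quotes two figures: roughly 90% of total RNA is rRNA, and rRNA depletion can remove up to 99.9% of large rRNA molecules. The Python sketch below combines these figures under two simplifying assumptions: that all rRNA is depleted at that efficiency, and that no non-ribosomal RNA is lost.

```python
def residual_rrna_fraction(rrna_fraction: float, depletion_efficiency: float) -> float:
    """Share of the post-depletion RNA pool that is still rRNA.

    rrna_fraction: share of total RNA that is rRNA before depletion (0-1).
    depletion_efficiency: share of rRNA removed by depletion (0-1).
    Simplifying assumption: no non-ribosomal RNA is lost during depletion.
    """
    remaining_rrna = rrna_fraction * (1.0 - depletion_efficiency)
    remaining_other = 1.0 - rrna_fraction
    return remaining_rrna / (remaining_rrna + remaining_other)


# Figures quoted in the review: ~90% rRNA, up to 99.9% depletion.
before = residual_rrna_fraction(0.90, 0.0)
after = residual_rrna_fraction(0.90, 0.999)
print(f"rRNA share before depletion: {before:.1%}")
print(f"rRNA share after depletion:  {after:.2%}")
```

Under these assumptions, rRNA falls from about 90% of the input to under 1%. Almost all sequencing reads can then be spent on the remaining transcriptome.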

The raw data produced from an RNA-Seq experiment are then analysed to quantify gene and transcript expression levels, detect alternative splicing events and identify any un-annotated genes and transcripts. The generalised data analysis work-flow is as follows. First, the reads are mapped to a reference genome or assembled de novo. Based on how the reads are aligned and how many reads are aligned, the transcripts are assembled and assigned expression values. Finally, differential expression testing can be carried out to identify any genes or transcripts that are differentially expressed between two biological conditions. A common work-flow used by researchers around the world is known as the Tuxedo suite [55]. The Tuxedo suite contains the following software modules: TopHat, Cufflinks and Cuffdiff. This set of software packages is well suited to the identification of novel splicing events and un-annotated transcripts, including lncRNAs. Each RNA-Seq experiment produces vast amounts of data, and the storage, analysis and transfer of these data are currently the main bottleneck in transcriptome profiling. These issues are likely to ease as cloud computing and high-performance computing clusters become more widespread.
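Cufflinks reports expression as FPKM (fragments per kilobase of transcript per million mapped fragments). FPKM corrects raw counts for both transcript length and sequencing depth. The Python sketch below is a deliberately simplified version of that normalisation. It uses the sum of gene counts as the library size and omits Cufflinks' effective-length and multi-mapping corrections. The gene names and counts are hypothetical. The fold-change helper only describes the size of a change; it is not a substitute for the statistical testing performed by Cuffdiff.

```python
import math
from typing import Dict


def fpkm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Simplified FPKM: fragments per kilobase of transcript per million mapped fragments.

    counts: fragments assigned to each gene; their sum is used as the
    library size. lengths_bp: transcript length in base pairs.
    Unlike Cufflinks, no effective-length or multi-mapping correction is made.
    """
    total = sum(counts.values())
    return {g: counts[g] * 1e9 / (lengths_bp[g] * total) for g in counts}


def log2_fold_change(a: float, b: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of condition b over condition a; the pseudocount avoids log(0)."""
    return math.log2((b + pseudocount) / (a + pseudocount))


# Hypothetical counts: two genes with equal read counts but different lengths.
counts = {"GENE1": 500, "GENE2": 500}
lengths = {"GENE1": 1_000, "GENE2": 2_000}
values = fpkm(counts, lengths)
print(values)                        # the longer gene receives half the FPKM
print(log2_fold_change(3.0, 15.0))   # (15 + 1) / (3 + 1) = 4, so log2 = 2.0
```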


Figure 1.2: RNA-Seq work-flow. First, the RNA fraction is selected for sequencing. This can be done via poly(A)+ selection or rRNA depletion. The RNA is then fragmented and reverse transcribed into cDNA. Sequencing adapters are then ligated to each cDNA fragment. The cDNA library is subjected to high-throughput next-generation sequencing. The output from the sequencing machine is millions of short sequence reads. These reads are mapped to a reference genome. Gene expression levels are calculated by determining the number of reads mapped to each exon. Further, by interrogating reads spanning exon/exon boundaries, the splicing patterns of each gene can be reconstructed. Figure adapted from Mills and Janitz, 2012 [56].
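Spliced aligners such as TopHat record a read that spans an exon/exon boundary in the SAM/BAM CIGAR string. The gap between the two exonic parts of such a read is encoded with the `N` (skipped reference region) operation. The Python sketch below recovers intron coordinates from such alignments and counts how many reads support each junction. It assumes alignments are given as (1-based POS, CIGAR) pairs on a single chromosome, ignores strand, and parses CIGAR strings directly rather than using a BAM library. The read data are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference (SAM spec)


def introns_from_cigar(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) intron coordinates implied by 'N' operations.

    pos: 1-based leftmost reference position of the alignment (SAM POS field).
    """
    introns = []
    ref = pos
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "N":
            introns.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return introns


def junction_support(alignments: Iterable[Tuple[int, str]]) -> Counter:
    """Count how many spliced reads support each intron."""
    support: Counter = Counter()
    for pos, cigar in alignments:
        support.update(introns_from_cigar(pos, cigar))
    return support


# Hypothetical alignments on one chromosome: (POS, CIGAR).
reads = [(100, "50M200N50M"), (120, "30M200N70M"), (90, "10S60M")]
print(junction_support(reads))
```

Both spliced reads imply the same intron, so that junction is supported twice. The unspliced, soft-clipped read contributes no junction.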


1.6 Multiple system atrophy

Multiple system atrophy (MSA) is a rare, rapidly progressing, adult-onset sporadic neurodegenerative disease. MSA has a prevalence of 3.4-4.9 cases per 100,000 population, increasing to 7.8 per 100,000 amongst persons older than 40 years of age [57]. Disease onset is thought to involve a complex interaction between environmental and genetic factors. While no environmental factors are conclusively known to increase the risk of developing MSA, nicotine and alcohol use are more common among MSA sufferers than among control groups [58]. MSA presents a major diagnostic challenge and is often mistaken for Parkinson's disease (PD) until the more rapidly progressing MSA has advanced clinically [58]. As in PD, many cases of MSA (20-75%) have a prodromal pre-motor phase. This phase may include sexual dysfunction, urinary incontinence or retention, orthostatic hypotension, inspiratory stridor and rapid-eye-movement sleep behaviour disorder [59]. These symptoms can occur months to years before the first motor symptoms present, and they increase in severity as the disease progresses. In terms of motor symptoms, MSA is characterised by varying combinations of parkinsonism and cerebellar ataxia; the severity of the symptoms depends on the MSA subtype [60, 61]. There are two subtypes of MSA: the parkinsonian subtype (MSA-P) and the cerebellar ataxia subtype (MSA-C) [62]. MSA-P is characterised by slowness of movement, rigidity and a tendency to fall. MSA-C patients, by contrast, present with a wide-based gait, uncoordinated limb movements, action tremor and nystagmus [63]. MSA-P is more common than MSA-C throughout most of the world, with the exception of Japan [64].

As with PD, MSA is classified as an α-synucleinopathy, a disease whose common pathological marker is the aggregation of insoluble α-synuclein throughout the brain [65]. However, α-synuclein accumulates in a distinct fashion in each disease. In PD, α-synuclein aggregates are found predominantly in dopaminergic neurons. In MSA, α-synuclein accumulates initially in the oligodendrocytes of the WM, in aggregates known as glial cytoplasmic inclusions (GCIs) [66]. While the exact pathogenic mechanisms underlying MSA remain unclear, it is thought to be primarily an oligodendrogliopathy (Figure 1.3) [67]. Initially, the p25α protein, an important stabilizer of myelin integrity, is relocated into the oligodendrocyte soma [68]. This is followed by swelling of oligodendrocytes and abnormal uptake or over-expression of α-synuclein [69, 70]. The interaction between p25α and α-synuclein leads to the development of GCIs, which compromises neuronal support and activates microglia and astroglia. As the GCIs further compromise the function of oligodendrocytes, misfolded α-synuclein is released extracellularly and may be taken up by nearby neurons. The resulting neuroinflammation, loss of neuronal support and neuronal cytoplasmic inclusions ultimately lead to neuronal death [58, 67, 71, 72, 73, 74, 75]. It is also possible that extracellular α-synuclein spreads in a prion-like fashion to other connected brain areas, resulting in neuronal death throughout the brain [76].


Figure 1.3: Proposed pathogenesis of multiple system atrophy. A combination of genetic and environmental factors may contribute to the initiation of MSA pathological events. Early events include re-localization of p25α into the oligodendrocyte soma and oligodendrocyte swelling. This is followed by α-synuclein build-up in oligodendrocytes, either through over-expression or abnormal uptake. α-synuclein and p25α interact to form glial cytoplasmic inclusions (GCIs). GCIs hinder neuronal support, and microglia and astroglia become activated. Neighbouring neurons may take up misfolded α-synuclein released by the oligodendrocytes. A combination of inflammation, loss of neuronal support and uptake of misfolded α-synuclein by neurons could contribute to neuronal death. There may also be prion-like propagation of α-synuclein to connected areas of the brain. The regions of the brain that experience neuronal death determine the nature of the clinical symptoms. Figure adapted from Longo et al., 2015 [58].


1.7 Multiple system atrophy and transcriptomics

While genomic studies have uncovered genetic links in various neurodegenerative diseases, no conclusive genetic link has been established for MSA. Two single-nucleotide polymorphisms at the SNCA locus are associated with increased risk of MSA in European populations [77, 78]. Causal mutations and polymorphisms in the gene coenzyme Q2 4-hydroxybenzoate polyprenyltransferase (COQ2) have been linked to MSA in Japanese populations [79]. Variations in several other genes have also been associated with MSA [80, 81, 82]: leucine-rich repeat kinase 2 (LRRK2), acid beta-glucosidase (GBA), solute carrier family 1 (glutamate/neutral amino acid transporter) member 4 (SLC1A4), sequestosome 1 (SQSTM1) and eukaryotic translation initiation factor 4E binding protein 1 (EIF4EBP1). Loss of copy number of Src homology 2 domain containing transforming protein 2 (SHC2) was found in Japanese sufferers of MSA, but not in patients from the United States of America [83, 84]. Nevertheless, no conclusive mechanistic involvement of these genes in MSA pathology has been established.

It is thought that transcriptional studies may hold the key to understanding the molecular pathology of MSA and other complex neurodegenerative and psychiatric diseases [14, 54]. RNA-Seq has been used to profile the transcriptomes of sufferers of Alzheimer's disease and autism [85, 86, 87]. Further, many alternatively spliced transcripts have been linked to various neurodegenerative diseases [56]. To date, there has been only one comprehensive transcriptome profiling analysis of MSA brain tissue. This study used microarrays to assess the rostral pons of MSA sufferers [88], and its analysis was limited to the protein-coding fraction of the transcriptome in a single brain region. MSA has not yet been investigated using RNA-Seq.


1.8 Aims of this study

The research presented here first seeks to understand the transcriptome profile of the healthy human brain, more specifically the transcriptome profiles of GM and WM from the superior frontal gyrus (SFG) located in the prefrontal cortex (PFC). This is followed by a comprehensive analysis of the oligodendrocyte maturation-associated long intergenic non-coding RNA (OLMALINC ), to establish its functional role in the PFC of the human brain.

The first two chapters of the thesis focus on understanding the transcriptome of the healthy brain. The following two chapters focus on the brain afflicted by MSA. First, transcriptome profiling of the SFG of MSA sufferers was carried out. This was followed by an in-depth analysis of linc00320 to establish whether it is involved in the molecular pathology of MSA.

Therefore, the aims of this PhD thesis can be broken down into four components:

1. Perform a comparative transcriptome analysis of GM and WM from the SFG of the healthy human brain.

2. Transcriptionally characterise OLMALINC and investigate its functional role in the PFC of the human brain.

3. Perform a comparative transcriptome analysis between MSA brain tissue and healthy brain tissue from the SFG of the human brain.

4. Transcriptionally characterise linc00320 and investigate its role in MSA.


1.9 Thesis synopsis

This doctoral thesis is being submitted as a series of publications and presents five published articles. The first article is a review that discusses recent technological advances in RNA-Seq [89]. Sequencing technology is currently advancing at an unprecedented rate, making it more important than ever to keep abreast of the rapidly growing literature. This review article demonstrates an in-depth knowledge of the latest technological advances and an understanding of how antisense transcripts may regulate the genome. As this publication is a review article, it has been included in the introduction (section 1.10). The other four articles included in this thesis are original primary research articles [90, 91, 92, 93], each covering a different aim and component of the larger thesis topic. The appendices of this thesis contain three further review articles that were co-authored during the PhD candidature [46, 54, 94].


1.10 Review article: Strand-specific RNA-Seq provides greater resolution of transcriptome profiling

Reference Mills, J. D., Kawahara, Y. and Janitz, M. (2013). “Strand-Specific RNA-Seq Provides Greater Resolution of Transcriptome Profiling” Current Genomics 14(3): 173-181.

Contribution I researched and wrote the entirety of this review article. MJ and YK edited and provided feedback throughout the writing process.

Synopsis This review article describes an advance in RNA-Seq known as strand-specific RNA-Seq. Strand-specific RNA-Seq allows an RNA molecule to be traced back to the DNA strand from which it was transcribed, enabling the identification of antisense transcripts and antisense transcription. In this article, different strand-specific RNA-Seq techniques are compared. The importance of using strand-specific RNA-Seq to analyse complex transcriptomes, such as that of the human brain, is also highlighted. When this review article was written, strand-specific RNA-Seq was a recent development in next-generation sequencing. It has now become the standard for transcriptome profiling of human brain tissue.
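In practice, tracing a read back to its strand of origin depends on the library chemistry. The Python sketch below assumes paired-end reads with known alignment strands. Library naming follows the TopHat/Cufflinks `--library-type` values: in a first-strand library (for example, dUTP), read 1 is reverse-complementary to the original RNA, while in a second-strand library (for example, directional ligation), read 1 matches it. The sketch reports genomic `+`/`-` strands rather than the sense/antisense terminology used in the review. It is an illustration only; it does not describe the protocols used in this thesis.

```python
from enum import Enum


class LibraryType(Enum):
    # Names mirror the TopHat/Cufflinks --library-type options.
    FR_FIRSTSTRAND = "fr-firststrand"    # e.g. dUTP: read 1 is antisense to the RNA
    FR_SECONDSTRAND = "fr-secondstrand"  # e.g. directional ligation: read 1 matches the RNA
    FR_UNSTRANDED = "fr-unstranded"      # strand of origin is not recoverable


def transcript_strand(read_strand: str, is_read1: bool, library: LibraryType) -> str:
    """Infer the genomic strand ('+' or '-') of the RNA that produced a read.

    read_strand: strand the read aligned to ('+' or '-').
    Returns '?' for unstranded libraries, where this information is lost.
    """
    if read_strand not in ("+", "-"):
        raise ValueError("read_strand must be '+' or '-'")
    if library is LibraryType.FR_UNSTRANDED:
        return "?"
    flipped = "-" if read_strand == "+" else "+"
    # Read 1 of a first-strand library, and read 2 of a second-strand library,
    # are reverse-complementary to the original RNA, so their strand is flipped.
    antisense_read = is_read1 == (library is LibraryType.FR_FIRSTSTRAND)
    return flipped if antisense_read else read_strand


# A read 1 aligned to the minus strand in a dUTP-style library
# points to an RNA transcribed from the plus strand.
print(transcript_strand("-", True, LibraryType.FR_FIRSTSTRAND))
```

The unstranded branch shows what conventional RNA-Seq loses: the same alignment is compatible with a transcript from either strand.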


Declaration

I certify that this publication was a direct result of my research towards this PhD, and that reproduction in this thesis does not breach copyright regulations.

Dominic Mills [PhD Candidate]

Current Genomics, 2013, 14, 173-181

Strand-Specific RNA-Seq Provides Greater Resolution of Transcriptome Profiling

James Dominic Mills¹, Yoshihiro Kawahara² and Michael Janitz¹,*

¹School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia; ²National Institute of Agrobiological Sciences, Agrogenomics Research Center, Bioinformatics Research Unit, 2-1-2 Kannondai, Tsukuba, Ibaraki 305-8602, Japan

Abstract: RNA-Seq is a recently developed sequencing technology that, through the analysis of cDNA, allows for unique insights into the transcriptome of a cell. The data generated by RNA-Seq provide information on gene expression, alternative splicing events and the presence of non-coding RNAs. It has been realised that non-coding RNAs are more than just artefacts of erroneous transcription and play vital regulatory roles at the genomic, transcriptional and translational levels. Transcription of the DNA sense strand produces antisense transcripts. This is known as antisense transcription and often results in the production of non-coding RNAs that are complementary to their associated sense transcripts. Antisense transcription has been identified in bacteria, fungi, protozoa, plants, invertebrates and mammals. Antisense transcriptional 'hot spots' appear to be located around nucleosome-free regions, such as those associated with promoters, indicating that antisense transcripts are likely to carry out important regulatory functions. This underlines the importance of identifying the presence and understanding the function of these antisense non-coding RNAs. The information concerning strand origin is often lost during conventional RNA-Seq; capturing this information would substantially increase the value of any RNA-Seq experiment. By manipulating the input cDNA during the template preparation stage, it is possible to retain this vital information. This forms the basis of strand-specific RNA-Seq. With its ability to unlock immense portions of new information about the transcriptome, this cutting-edge technology may hold the key to developing a greater understanding of the transcriptome.

Received on: December 13, 2012; Revised on: January 31, 2013; Accepted on: February 25, 2013

Keywords: Antisense RNA, Next-generation sequencing, Non-coding RNA, Transcriptome, Pervasive transcription.

1. INTRODUCTION

As sequencing techniques become more sophisticated and our understanding of molecular biology increases, it has become apparent that the pathway from gene to protein is an intricate and multifaceted process. Organism complexity highlights this; there is no correlation between genome size or number of genes and the complexity of an organism [1]. Intuitively, it would be thought that increased organism complexity would require a larger number of protein coding genes; this is not the case. A human genome contains approximately 30,000 protein coding genes, the fruit fly Drosophila melanogaster has approximately half this amount, while the salamander has twenty times this number of genes [2-4]. One trend that does exist amongst higher eukaryotes, and indeed humans, is an increase in alternative splicing events and the addition of a variety of non-coding RNAs (ncRNAs) [5]. Up to 98% of the transcriptional output of the genome is made up of ncRNAs. It was initially thought that this extra transcriptional output was the result of misdirected transcription. However, it seems unlikely that such an energy-consuming and inefficient system would become prevalent. It is now thought that these non-coding RNAs play major regulatory roles. They are involved in chromatin remodeling, in RNA-DNA, RNA-RNA and RNA-protein interactions, and in other unknown forms of regulation [5-8]. This complex network of regulatory elements is thought to play an important role in the development of organism complexity. This highlights the importance of research in the field of transcriptomics and, more specifically, ncRNAs.

One component of the transcriptome of particular interest is transcription from the DNA sense (or plus) strand. Transcription from this strand produces transcripts known as antisense transcripts or antisense RNAs (asRNAs) [9]. Currently there is a very limited body of research in this area. Antisense transcription can produce both protein-coding and non-coding transcripts, with the latter being the most common product. As antisense transcription is a pervasive feature of the mammalian transcriptome, it is likely that these transcripts play important regulatory roles [9, 10]. Furthermore, the act of antisense transcription itself can have a regulatory function. Importantly, a number of antisense transcripts are transcribed from genes that are related to various human disorders [10]. It has also been suggested that asRNAs may play an important role during development and under changing environmental conditions by altering expression patterns [11-13]. A full elucidation of the antisense transcriptome will open up new levels of understanding that may provide insight into the complex workings of the transcriptome.

RNA-Seq is a next-generation sequencing technique that allows for an in-depth look into the transcriptome [14]. It is perhaps the most exciting next-generation sequencing application. It has many advantages over other methods of transcriptome analysis, such as microarrays, and is adept at identifying alternative splicing events and ncRNAs [14, 15]. While the use of RNA-Seq is becoming more common throughout molecular biology, the standard RNA-Seq protocol has one significant shortcoming: it loses the strand-of-origin information for each transcript. This is of particular concern due to the possible regulatory role carried out by antisense transcripts. It is possible to retain the strand-of-origin information by modifying the standard RNA-Seq protocol; this is known as strand-specific RNA-Seq [16]. Perhaps due to the additional time and knowledge required, or a lack of awareness, this method is severely underutilized in research. As a result, vital elements of the transcriptome could be overlooked. Through the use of strand-specific RNA-Seq, a more complete understanding of the transcriptome could be achieved, with the potential to identify new levels of regulation of gene expression.

*Address correspondence to this author at the School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia; Tel: +612 938 58608; Fax: +612 938 51483

1875-5488/13 $58.00+.00 ©2013 Bentham Science Publishers

2. ANTISENSE TRANSCRIPTS

2.1. Antisense Transcription

DNA exists as a double-stranded molecule; one strand is known as the sense (or plus) strand and the other as the antisense (or minus) strand. The antisense strand contains all of the pertinent information for the formation of proteins; it was originally thought that this was the only strand that underwent transcription. Transcription from the sense strand is less common and produces what are known as antisense transcripts or asRNAs. This form of transcription was previously thought to be an aberration of the norm. However, antisense transcription has been identified at higher than expected levels in prokaryotes and eukaryotes, including humans and other mammals [9, 17-21]. Some antisense transcripts are polyadenylated and receive a 5' cap, and these then proceed to the ribosomes for translation. However, the most common form of antisense transcription in the mammalian genome is the production of non-coding antisense transcripts that have a protein coding sense partner [9]. Furthermore, antisense transcripts have been documented that partner with active promoter sites or that lie in close proximity to transcription start sites [17, 22, 23]. Antisense transcripts occur at lower abundances than their sense transcripts. Nevertheless, the available evidence points to non-coding antisense transcripts playing a pivotal role in regulation of the transcriptome [19].

2.2. Antisense Transcripts As Regulators of the Transcriptome

Antisense transcripts can act as regulatory elements through a variety of pathways. These pathways can be divided into three broad categories: transcription modulation, hybridization of sense-antisense RNA partners and chromatin modification. The field of non-coding RNA is still relatively young and the field of asRNA comparatively unexplored. It is therefore likely that more mechanisms of regulation by these elements will be elucidated in the future.

2.2.1. Transcription Modulation

The act of antisense transcription, rather than the asRNA molecule itself, can modulate gene expression levels. During transcription, RNA polymerase binds to the promoter region of the gene and proceeds along the strand. If transcription occurs on the DNA sense and antisense strands simultaneously, the RNA polymerases can collide. This is known as the transcriptional collision model [10]. It is not exactly known how colliding RNA polymerases interact. The most likely outcome is either complete termination of transcription or termination of transcription on one strand [24]. Transcriptional collision has been observed in Saccharomyces cerevisiae, and bioinformatic studies have suggested that it also occurs in mice and humans [25, 26]. The study of mice and humans indicates that an inverse relationship exists between antisense and sense transcripts. Furthermore, this relationship was stronger when the antisense-sense pair shared a longer section of DNA [25]. This is expected, as the longer the common region between an antisense-sense pair, the more likely transcriptional collision is to occur.

2.2.2. Hybridization of Sense-Antisense RNA Partners

Antisense transcripts are complementary to their sense partners. This means that there is always a chance of hybridization and the formation of an RNA duplex. Duplex formation can exert regulatory functions in diverse ways. Several factors affect the regulatory implications of the RNA duplex: the length of hybridization, the location of hybridization along the RNA transcript, and whether the duplex forms in the nucleus or the cytoplasm. The formation of RNA duplexes is probably the main way in which antisense transcripts exert regulatory functions.

Antisense transcripts have the potential to mask many regulatory components of sense RNA transcripts. Alternative splicing is a process by which exons and introns of primary transcripts can be included, excluded or skipped to form unique mRNAs. Splicing is controlled by exonic splicing enhancers/silencers and intronic splicing enhancers/silencers, and the ratios of these elements affect the splicing pattern [27]. These elements contain motifs that recruit splicing machinery to the site. If sections of the transcript containing these elements are masked by hybridization with an antisense transcript, the splicing patterns of the sense transcript will change [10]. For example, the splicing patterns of the thyroid hormone receptor-α gene (THRA) can be altered by the presence of its antisense transcript partner [28]. It can be postulated that a similar mechanism could also block the binding of miRNA to mRNA and hence alter the regulation of the transcripts [10]. The number and pattern of splicing elements and miRNA binding sites that are masked would depend on the length and positioning of the asRNA. Changing the number of available sites could dramatically alter the functionality of the RNA and its protein counterpart. Furthermore, antisense transcription has been detected at elevated levels near promoters and transcription start sites; hence a similar mechanism could also alter the function of these elements.

RNA duplex formation in the cytoplasm may alter the ability of a transcript to be translated. Duplex formation may block the ability of the transcript to associate with the ribosome, hence altering the efficiency of the translation machinery. One example is the gene spleen focus forming virus proviral integration oncogene spi1 (SPI1 or PU.1), whose non-coding antisense transcript partner stalls translation between the initiation and elongation steps [29]. Further, cytoplasmic RNA duplex formation can also alter the stability of the mRNA. The gene β-site amyloid precursor protein (APP)-cleaving enzyme 1 (BACE1) has been associated with Alzheimer's disease (AD) [30]. It plays a pivotal role in the cleavage of APP, leading to a build-up of amyloid-β peptides. It has been shown that the antisense transcript of BACE1 is up-regulated in AD brains [31]. It has been reported that the BACE1 antisense transcript and BACE1 mRNA form a duplex, which increases the stability of the BACE1 transcript through the formation of secondary and tertiary structures. These structures protect the transcript from degradation while still allowing it to be translated into the functional cleaving enzyme. Furthermore, as amyloid-β peptide levels increase, the levels of BACE1 antisense transcripts are also elevated, creating a positive feedback loop [31].

2.2.3. Chromatin Modification

Genomic DNA in eukaryotic cells must be assembled appropriately to fit within the nucleus of the cell. In its packaged state, genomic DNA generally exists as 146 bp segments tightly wrapped around a histone protein octamer; these units are known as nucleosomes [32]. Nucleosomes are connected by short DNA linker fragments, forming a structure known as chromatin. The chromatin interacts with itself and condenses even further to form chromosomes [33, 34]. Evidence has emerged that the structure of nucleosomes can vary, and that this structure affects transcription, replication and DNA repair [34, 35].

It is thought that long non-coding RNAs are involved in the regulation of chromatin, possessing the ability to remodel the nucleosome complex [24, 36]. Histone modifying enzymes lack the specific DNA binding domains that are seen in other transcription factors [37]. It has been suggested that long ncRNAs, such as those produced by antisense transcription, may interact with histone modifying enzymes via the formation of specific RNA secondary structures [36]. In this case, long non-coding antisense transcripts may act as recruitment vessels for various histone remodeling enzymes. This complex would then alter chromatin structure by adding, removing or replacing various chromatin modifications [24, 36].

Perhaps the best example of an antisense transcript involved in chromatin remodeling is X-inactivation [38]. In mammals, X-inactivation is achieved through regulation of the chromosome by the gene X-inactive specific transcript (XIST) [39]. The highly expressed XIST coats one of the X chromosomes, and the coated X goes on to become the inactive X [40]. The asRNA partner of XIST, known as TSIX, is a 40 kb RNA located 15 kb downstream of XIST [38]. TSIX is suspected to play a role in X-inactivation by keeping XIST from coating both X chromosomes. It is suspected that TSIX silences XIST by modifying chromatin structure through the recruitment of protein complexes involved in heterochromatinization [41]. If XIST is repressed by TSIX, it follows that the X chromosome will remain active. Strand-specific FISH probes have demonstrated that TSIX expression is associated with the active X, while no TSIX is found in cells that have entered the X-inactivation pathway [38].

3. RNA-SEQ ANALYSIS

RNA-Seq is a recently developed next-generation sequencing technology that, through the analysis of cDNA, allows for unique insights into the transcriptome of a cell. The data generated by RNA-Seq provide information on gene expression, alternative splicing events, locations of binding sites and the presence of non-coding RNAs. It has been realised that non-coding RNAs are more than just artefacts of erroneous transcription and play vital regulatory roles at the genomic, transcriptional and translational levels. These additional levels of regulation are thought to be the greatest contributors to the complexity seen in higher eukaryotic organisms such as humans [5, 42]. This makes the identification, annotation and cataloguing of mRNA transcripts, non-coding RNAs, microRNAs (miRNAs) and their associated binding sites one of the major aims of the field of transcriptomics [43]. RNA-Seq plays a key role in achieving this goal. A variety of high-throughput next-generation sequencing platforms can be used for RNA-Seq, including systems from Roche, Illumina and Applied Biosystems [44].

The data generated by RNA-Seq can then be analyzed using free, open source bioinformatics software tools such as TopHat and Cufflinks [45]. TopHat, a popular splice junction-aware mapping tool, aligns the RNA-Seq reads to a genome of choice. Cufflinks then assembles transcripts and estimates expression levels using the alignments and splice junction information. The bioinformatics analysis can be extended with Cuffdiff, which uses statistical models to highlight the genes that are differentially expressed between two or more samples [45]. These tools can be applied to both standard and strand-specific RNA-Seq data by specifying appropriate options. For the most up-to-date description of the RNA-Seq bioinformatic analysis workflow, the reader is directed to the protocol paper by Trapnell et al., 2012 [45]. Open, web-based platforms such as Galaxy (http://galaxyproject.org/) make these programs and various other, more complex tools user friendly and accessible to all wet-lab research scientists, even those with little or no bioinformatics training [46-48].

RNA-Seq has several advantages over other gene expression quantification platforms such as exon arrays and reverse transcription quantitative real-time PCR. RNA-Seq is a high-throughput method that requires low levels of initial input RNA. It offers high resolution in determining transcript structure and quantification, has low levels of background noise and produces fewer false positive results [43]. Furthermore, RNA-Seq can detect novel, unannotated transcripts, reveal the exact location of transcript boundaries and determine single nucleotide polymorphisms in transcribed regions [49]. There is also a growing body of evidence demonstrating the high accuracy and reproducibility of RNA-Seq [50]. The major challenges faced by RNA-Seq concern the storage, retrieval and processing of large amounts of data. However, with the advent of cloud computing these issues are becoming increasingly inconsequential [51].

4. STANDARD RNA-SEQ PROTOCOL

Despite differences in sequencing reaction chemistries and base calling, all RNA-Seq platforms share the same major steps in template preparation [43]. The construction of a cDNA library is the first step in any RNA-Seq workflow. To construct a cDNA library, an appropriate RNA fraction must be selected from the total RNA preparation. The total RNA of an organism contains approximately 90% ribosomal RNA (rRNA) [52], so for downstream RNA-Seq analysis it is important to remove this potential contaminant from the sample. This can be done in two ways: selection of mRNA transcripts using oligo(dT) primers, or rRNA depletion [52]. It is important to realize that oligo(dT) primers select only for polyadenylated mRNA. The transcriptome contains numerous non-polyadenylated RNA species, including preprocessed RNA, tRNA, numerous regulatory RNA molecules and other RNA molecules of unknown function. This method could therefore produce a rather simplified picture of the transcriptome [6, 14, 52]. The second method, rRNA depletion, can remove up to 99.9% of all large rRNA molecules while leaving all mRNA and non-polyadenylated transcripts intact. Illumina (San Diego, California) employs a Ribo-Zero kit for the removal of rRNA during RNA sample preparation (http://www.illumina.com/products/truseq_rna_sample_prep_kit_v2.ilmn). This method uses a bead capture procedure in which the beads selectively bind rRNA molecules, and the RNA-bound beads are then removed using a magnet. This leaves the diverse RNA species representing an intact transcriptome.

After the appropriate RNA fraction has been selected, the molecules must be fragmented into smaller pieces of 200-500 bp, depending on the sequencing platform being used [43]. Fragmentation can be achieved in two ways: fragmentation of double-stranded (ds) cDNA or fragmentation of RNA. Both methods produce the same end product, a double-stranded cDNA library in which each fragment has an adapter attached [14].

For the direct fragmentation of ds cDNA, the RNA must first be reverse transcribed. This can be done using either random hexamers or oligo(dT) primers. Again, oligo(dT) primers will fail to pick up any RNAs that are not polyadenylated.

Diego, California) RNA-Seq protocol utilizes direct fragmentation of RNA followed by reverse transcription by random hexamers (http://www.illumina.com/products/truseq_rna_sample_prep_kit_v2.ilmn).

5. STRAND-SPECIFIC RNA-SEQ

The standard method of RNA-Seq library generation does not preserve information about which DNA strand was the original template during transcription and subsequent synthesis of the mRNA transcript. Antisense transcripts are likely to have regulatory roles that are distinctly different from those of their protein coding complement. This loss of strand information therefore results in an incomplete understanding of the transcriptome [16, 17, 53]. To address this, the strand origin of transcripts can be retained using a method known as strand-specific RNA-Seq. Strand-specific RNA-Seq is more technically challenging and more time consuming than standard RNA-Seq [16], but the extra information it yields cannot be overlooked. It allows sense and antisense transcript structures to be predicted, overlapping regions of transcription to be identified exactly, and expression levels of sense and antisense genes to be estimated more accurately.

Methods for the construction of strand-specific RNA-Seq libraries can be split into two categories: (i) the use of strand-specific adapters of known orientation, and (ii) the chemical modification of strands [16]. The main methods discussed in this paper are summarized and compared in Table 1 and Figure 1.

5.1. Adapter Methods for Strand-Specific RNA-Seq

Adapter methods for strand-specific RNA-Seq are conceptually difficult and require more consideration and planning than standard RNA-Seq. An array of different protocols has been described. These methods use the known sequences at the 5' or 3' end and the relative orientations of the RNA transcript to derive strand information.

In the strand-specific 3'-end RNA-Seq method, anchored oligo(dT) primers are first used to select for mRNA, which results in the production of double-stranded cDNA molecules [52]. Adapters for paired-end sequencing are then ligated to each end of the cDNA molecule. The fragments are then sequenced, generating paired-end reads that are aligned to a reference genome. Any aligned read that contains a stretch of adenines at the end of the transcript must come from a transcript that originated from the DNA antisense strand. Conversely, any read that aligns with a stretch of thymines at the front must come from a transcript from the DNA sense strand (Fig. 1A) [54]. This protocol, while reasonable for identifying those antisense transcripts that are capped and polyadeny-
The ds cDNA is then fragmented and primer lated, will miss out on the diverse repertoire of non- adapters are ligated [14]. Alternatively, direct fragmenting of processed RNAs. Due to the use of oligo(dT) primers for the RNA allows for a much more in-depth transcriptome analy- selection of RNA. This constitutes a significant shortcoming sis. First the RNA is fragmented, this is followed by two of this protocol as the main driving force behind strand- rounds of cDNA synthesis (first strand and second strand) specific RNA-Seq is to identify non-coding RNAs, which using random hexamers, and adapters are then ligated to the may or may not be capped and polyadenylated [55]. It is also ds cDNA [14]. Once the ds cDNA libraries are constructed foreseeable that alignment and identification of strand origin they are sequenced producing short reads. The sequencing may be difficult when sequencing organisms with A-T rich can be either single-end or paired-end. The Illumina (San transcriptomes. Strand-Specific RNA-Seq Current Genomics, 2013, Vol. 14, No. 3 177
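The rRNA figures quoted in section 4 (total RNA roughly 90% rRNA, with depletion removing up to 99.9% of large rRNA) can be turned into an estimate of how much of a library is still ribosomal after depletion. The sketch below works through that arithmetic. It rests on simplifying assumptions of our own rather than the protocol's: all rRNA is treated as depletable, every non-rRNA molecule is assumed to be retained, and read counts are assumed to be proportional to RNA mass.

```python
def residual_rrna_fraction(rrna_fraction: float, depletion_efficiency: float) -> float:
    """Share of the post-depletion RNA pool that is still rRNA.

    Simplifying assumptions (ours, not the protocol's): depletion removes the
    stated proportion of all rRNA, every non-rRNA molecule is retained, and
    read counts are proportional to RNA mass.
    """
    if not (0.0 <= rrna_fraction <= 1.0 and 0.0 <= depletion_efficiency <= 1.0):
        raise ValueError("fractions must lie between 0 and 1")
    rrna_left = rrna_fraction * (1.0 - depletion_efficiency)
    other_rna = 1.0 - rrna_fraction
    remaining = rrna_left + other_rna
    if remaining == 0.0:
        raise ValueError("no RNA left after depletion")
    return rrna_left / remaining


if __name__ == "__main__":
    before = 0.90  # approximate rRNA share of total RNA quoted in the text
    after = residual_rrna_fraction(before, 0.999)  # "up to 99.9%" removal
    print(f"rRNA share: {before:.1%} before depletion, {after:.2%} after")
```

Under these assumptions the rRNA share falls from about 90% to below 1% (0.0009 / 0.1009). Roughly ten times more of the sequencing effort therefore goes to non-ribosomal molecules.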
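The read-classification rule of the strand-specific 3'-end method (Fig. 1A) can be sketched as follows. This is a simplified illustration and not the published pipeline [54]. It assumes each read is written in reference (plus-strand) orientation after alignment, and the minimum homopolymer length of 8 is an arbitrary, hypothetical threshold. Reads with no tail, or with homopolymers at both ends, are left unassigned, which is also where A-T rich sequence makes the call unreliable.

```python
import re
from typing import Optional


def infer_strand_from_tail(read: str, min_run: int = 8) -> Optional[str]:
    """Assign a transcript strand from a poly(A) or poly(T) run.

    `read` is assumed to be in reference (plus-strand) orientation;
    `min_run` is a hypothetical threshold, not a published one.
    Returns '+' (poly(A) at the 3' end: transcript on the plus strand),
    '-' (poly(T) at the 5' end: transcript on the minus strand) or None.
    """
    seq = read.upper()
    has_polya_end = re.search(f"A{{{min_run},}}$", seq) is not None
    has_polyt_start = re.match(f"T{{{min_run},}}", seq) is not None
    if has_polya_end and not has_polyt_start:
        return "+"
    if has_polyt_start and not has_polya_end:
        return "-"
    return None  # no tail, or both ends homopolymeric (e.g. A/T-rich sequence)


if __name__ == "__main__":
    for r in ["GATTACAGCT" + "A" * 12, "T" * 12 + "AGCTGTAATC", "ACGTACGTAC"]:
        print(r, infer_strand_from_tail(r))
```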

Table 1. Summary of Strand-Specific RNA-Seq Methods.

| Method | Advantages | Disadvantages | References |
|---|---|---|---|
| Strand-specific 3'-end RNA-Seq | Technically simple. Follows standard RNA-Seq protocol. | Only polyadenylated mRNA selected. Alignment process may be laborious and difficult. | [54] |
| Single-stranded adapter ligation | No need for second-strand cDNA synthesis. Simplified RNA-Seq protocol. No chemical modification of transcripts. | T4 DNA ligase inefficiently ligates adapters to fragments. | [56] |
| Flowcell reverse transcription sequencing | No PCR bias. Compatible with paired-end sequencing. | High initial RNA input. Only polyadenylated mRNA selected. | [58, 59] |
| Bisulfite treatment | Small modification to standard RNA-Seq protocol. | Sequence alignment process can be difficult. | [16, 19, 61] |
| dUTP second strand | Small modification to standard RNA-Seq protocol. Rated as the most comprehensive strand-specific sequencing protocol. | Complex and time-consuming template preparation method. | [16, 62-66] |
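Whichever protocol in Table 1 is chosen, the benefit is the same: reads that fall where a sense gene and its antisense partner overlap can be assigned unambiguously. The sketch below contrasts unstranded and stranded assignment for a hypothetical overlapping gene pair. The gene names, coordinates and reads are invented for illustration, and real pipelines use annotation-aware read counters rather than this toy interval test.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # 1-based, inclusive
    end: int     # inclusive
    strand: str  # '+' or '-'


def candidate_genes(pos: int, transcript_strand: str, genes: List[Gene],
                    stranded: bool) -> List[str]:
    """Genes a read covering `pos` could have come from.

    `transcript_strand` is the strand of the originating RNA, which is only
    known when the library is strand-specific; unstranded counting ignores it.
    """
    return [g.name for g in genes
            if g.start <= pos <= g.end
            and (not stranded or g.strand == transcript_strand)]


# Invented sense/antisense pair that overlaps between positions 800 and 1000.
GENES = [Gene("SENSE_GENE", 100, 1000, "+"),
         Gene("ANTISENSE_GENE", 800, 1500, "-")]

if __name__ == "__main__":
    for pos, strand in [(150, "+"), (900, "+"), (950, "-"), (1200, "-")]:
        print(pos, strand,
              "unstranded:", candidate_genes(pos, strand, GENES, stranded=False),
              "stranded:", candidate_genes(pos, strand, GENES, stranded=True))
```

Without strand information, the reads at positions 900 and 950 match both genes. With strand information, each resolves to a single gene. This is what allows sense and antisense expression to be estimated separately.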

Another strand-specific RNA-Seq protocol makes use of single-stranded (ss) cDNA and Illumina adapters (Fig. 1B) [56]. The standard RNA-Seq protocol requires ds cDNA as input, and the strand-specific information is lost when this ds cDNA library is generated. T4 DNA ligase can link 3' and 5' adapters to ssDNA [57], and these single-stranded constructs can then be sequenced. Because the second strand is never synthesized and never proceeds to sequencing, strand information is retained. This can also be seen as a simplification of the standard RNA-Seq method, as second-strand cDNA synthesis is not needed. An analysis of the single-stranded adapter ligation RNA-Seq protocol showed that its results are comparable to those of the standard RNA-Seq protocol [56].

Flowcell reverse transcription sequencing (FRT-Seq) is another category of RNA-Seq that maintains strand information. Of all the methods discussed, it diverges most from the standard RNA-Seq protocol, but it can be loosely classified as an adapter method (Fig. 1C) [58, 59]. FRT-Seq involves the ligation of specially designed adapters to either end of fragmented and purified polyadenylated mRNA. Each adapter consists of two regions: a region to which the sequencing primers anneal and a region that is complementary to the oligonucleotides present on the flowcell [59]. This complementary region allows the mRNA fragment to hybridize to the flowcell. The mRNA fragments are then reverse transcribed on the flowcell surface, after which cluster amplification and sequencing proceed as normal. This method avoids library amplification and all of its possible biases, including unequal amplification efficiencies of different RNA species and the introduction of amplification artifacts. However, the protocol requires a much larger initial RNA input to work properly [59].

There are other adapter methods for strand-specific RNA-Seq. These include direct strand-specific sequencing (DSSS), developed by Vivancos et al., 2010 [60], which is based on the Illumina small RNA sample preparation protocol (http://www.illumina.com/products/truseq_small_rna_sample_prep_kit.ilmn). Another is the Life Technologies (California, USA) SOLiD® Total RNA-Seq Kit, which preserves strand specificity by adding adapters in a directional manner (http://products.invitrogen.com/ivgn/product/4445374).

5.2. Strand-Specific RNA-Seq by Chemical Modification

Another way of carrying out strand-specific RNA-Seq is to mark the RNA strand through chemical modification, either on the RNA itself or during second-strand cDNA synthesis [16]. These are the most commonly used methods for strand-specific RNA-Seq, and they involve only a slight deviation from the standard RNA-Seq protocol.

The first method of chemical modification marks the original RNA template through bisulfite treatment (Fig. 1D). Adding bisulfite mix converts all cytidine residues of the RNA strand to uridine residues [19, 61]. This creates a purified RNA strand that contains an artificially high number of uridine residues. After cDNA synthesis and sequencing, the resulting reads will contain a high proportion of deoxythymidine residues. These reads are then aligned to converted plus- and minus-strand reference genomes, in which all cytosine residues have been converted to thymine residues. Reads from sense transcripts will align with the converted plus strand, but not with the converted minus strand or either of the unconverted strands. Likewise, reads from antisense transcripts will align with the converted minus strand, but not with the converted plus strand or either of the unconverted strands [19]. In this way the transcript's strand of origin is retained.

The second method of chemical modification, known as dUTP second strand, involves a modified second-strand synthesis step. During this step dUTPs are incorporated in place of dTTPs. The result is ds cDNA in which the first strand contains deoxythymidine residues while the complementary strand contains deoxyuridine residues. Uracil-DNA-Glycosylase (UDG) is then used to degrade the deoxyuridine-containing strand, which leaves the first strand intact (Fig. 1E) [62-66].
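The in-silico side of the bisulfite method can be sketched as below, using the example sense sequence from Fig. 1D. This is a toy illustration under our own assumptions. Matching is exact substring search rather than a real aligner that tolerates mismatches. Reads are taken to be written 5' to 3' in the orientation of the RNA they came from, as drawn in the figure. Conversion is assumed to be complete. A read from a region with no cytosines matches converted and unconverted references alike and is reported as ambiguous.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Example sense (plus) strand taken from Fig. 1D.
FIG1D_PLUS = "ACAACCAGGGGCTGGCCCTGACAATGG"


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def bisulfite_convert(seq: str) -> str:
    """In-silico conversion assuming every C was converted (read as T)."""
    return seq.upper().replace("C", "T")


def assign_bisulfite_read(read: str, plus_strand: str) -> str:
    """'sense', 'antisense' or 'ambiguous' by exact matching to four references."""
    references = {
        "converted plus": bisulfite_convert(plus_strand),
        "converted minus": bisulfite_convert(revcomp(plus_strand)),
        "unconverted plus": plus_strand.upper(),
        "unconverted minus": revcomp(plus_strand),
    }
    hits = {name for name, ref in references.items() if read.upper() in ref}
    if hits == {"converted plus"}:
        return "sense"
    if hits == {"converted minus"}:
        return "antisense"
    return "ambiguous"  # no match, or matches more than one reference


if __name__ == "__main__":
    for read in ["ATAATTAGGGGTTGGTTTTGATAATGG", "TTATTGTTAGGGTTAGTTTTTGGTTGT"]:
        print(read, assign_bisulfite_read(read, FIG1D_PLUS))
```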
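A dUTP library only pays off downstream if the aligner and quantifier are told which read matches the RNA. Because the first-strand cDNA is the surviving strand, read 1 in dUTP libraries is the reverse complement of the original RNA. TopHat and Cufflinks [45] call this orientation `fr-firststrand`. The hypothetical helper below applies that rule to the standard SAM FLAG bits (0x1 paired, 0x10 reverse-complemented, 0x40 first in pair). It is a sketch for illustration and is not part of any published tool.

```python
PAIRED = 0x1          # SAM FLAG: template has multiple segments
REVERSE = 0x10        # SAM FLAG: SEQ is reverse complemented
FIRST_IN_PAIR = 0x40  # SAM FLAG: first segment in the template


def transcript_strand(flag: int, library: str = "fr-firststrand") -> str:
    """Strand of the originating RNA for one aligned read (hypothetical helper).

    'fr-firststrand' (dUTP): read 1 is the reverse complement of the RNA.
    'fr-secondstrand': read 1 has the same sequence as the RNA.
    Single-end reads are treated like read 1.
    """
    if library not in ("fr-firststrand", "fr-secondstrand"):
        raise ValueError(f"unsupported library type: {library}")
    read_on_minus = bool(flag & REVERSE)
    is_read1 = not (flag & PAIRED) or bool(flag & FIRST_IN_PAIR)
    read_matches_rna = is_read1 if library == "fr-secondstrand" else not is_read1
    if read_matches_rna:
        return "-" if read_on_minus else "+"
    return "+" if read_on_minus else "-"


if __name__ == "__main__":
    # 99/147 are the two mates of a typical properly paired fragment.
    for flag in (0, 16, 99, 147):
        print(flag, transcript_strand(flag))
```

Both mates of a properly paired fragment (flags 99 and 147) give the same transcript strand, which is a quick consistency check on the rule.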

[Fig. (1) artwork, panels A to E: (A) strand-specific 3'-end RNA-Seq; (B) single-stranded ligation of Illumina adapters; (C) FRT-Seq; (D) bisulfite treatment; (E) dUTP second strand.]

Fig. (1). Different methods of template preparation for strand-specific RNA-Seq. A. Strand-specific 3'-end RNA-Seq. Reads that align with a stretch of adenines at the end of the transcript are sense transcripts originating from the DNA antisense strand. Reads that align with a stretch of thymines at the front of the transcript are antisense transcripts originating from the DNA sense strand [54]. B. Single-stranded adapter ligation. Adapters are ligated directly to ss cDNA using T4 DNA ligase. As there is only one strand, strand information is retained [56].

C. FRT-Seq. Poly(A) RNA is selected and fragmented. Adapters are ligated to the 3' and 5' ends. The adapters consist of two regions: for the 3' adapter, the light purple region hybridizes to the flowcell surface and the sequencing primers anneal to the dark purple region. Similarly, for the 5' adapter, the dark blue region hybridizes to the flowcell and the sequencing primers anneal to the light blue region. The fragments undergo reverse transcription on the flowcell surface and then proceed to sequencing [59]. D. Bisulfite Treatment. Applying bisulfite mix to the RNA strand converts all cytidine residues to uridine. The strand of origin can then be identified by aligning reads to converted sense and antisense strands. Reads from sense transcripts will align with the converted DNA sense (plus) strand, but not with the converted DNA antisense (minus) strand or either of the unconverted strands. Reads from antisense transcripts will align with the converted DNA antisense (minus) strand, but not with the converted DNA sense (plus) strand or either of the unconverted strands [19]. E. dUTP second strand. During second-strand synthesis, dUTPs are added to the mix instead of dTTPs. These residues are removed by the addition of UNG (also known as UDG), destroying the strand. Only one strand proceeds to sequencing [63].

When sequencing proceeds, the polarity information from the original RNA molecules is therefore maintained. A recent protocol by Zhang et al., 2012 [66] further optimizes this technique by using a Ribo-Zero kit for RNA selection. Illumina's TruSeq Stranded Total RNA Sample Prep Kit (http://www.illumina.com/products/truseq_stranded_total_rna_sample_prep_kit.ilmn) uses a similar method to maintain strand information. In the Illumina protocol, second-strand synthesis also involves the incorporation of dUTPs. However, instead of using UDG to degrade the marked strand, the protocol uses a special polymerase during amplification that will not incorporate past these nucleotides, so the dUTP-marked second strand is not amplified. In a comprehensive comparison of strand-specific RNA-Seq protocols using RNA extracted from Saccharomyces cerevisiae, the dUTP second strand method was identified as the leading protocol [16]. It is also the most prevalent strand-specific RNA-Seq protocol in the scientific literature [62-66].

6. CONCLUDING REMARKS

Antisense transcription and the expression of antisense transcripts add another layer of complexity to the transcriptome and to gene regulation. The modes of regulation by antisense transcripts are diverse, and more are expected to be revealed as research in this area grows. Strand-specific RNA-Seq holds the key to fully understanding the antisense transcriptome.

When using RNA-Seq to annotate the transcriptome of higher eukaryotic organisms, the following guidelines might be suggested. Ribosomal RNA depletion should be used to select the RNA fraction to be sequenced, random hexamers should be used to generate double-stranded cDNA libraries, and strand-specific RNA-Seq should be used for sequencing. By following these guidelines, the entire transcriptome can be annotated, including vital non-processed RNAs such as ncRNAs, and strand-of-origin information will be retained. These recommendations might help to avoid a limited snapshot of the transcriptome consisting almost entirely of protein-coding mRNAs and processed RNAs, which does not fully demonstrate the complexity of the transcriptome.

CONFLICT OF INTEREST

The authors certify that there are no conflicts of interest with any financial organization regarding any of the material discussed in this manuscript.

ACKNOWLEDGEMENTS

The authors would like to thank Dr. Caroline Janitz for her expert advice and critical comments during manuscript preparation.

REFERENCES

[1] Meinke D.W.; Cherry J.M.; Dean C.; Rounsley S.D.; Koornneef M. Arabidopsis thaliana: a model plant for genome analysis. Science, 1998, 282, 662, 79-82.
[2] Orgel L.E.; Crick F.H. Selfish DNA: the ultimate parasite. Nature, 1980, 284, 604-607.
[3] Rubin G.M.; Yandell M.D.; Wortman J.R.; Gabor Miklos G.L.; Nelson C.R.; Hariharan I.K.; Fortini M.E.; Li P.W.; Apweiler R.; Fleischmann W.; Cherry J.M.; Henikoff S.; Skupski M.P.; Misra S.; Ashburner M.; Birney E.; Boguski M.S.; Brody T.; Brokstein P.; Celniker S.E.; Chervitz S.A.; Coates D.; Cravchik A.; Gabrielian A.; Galle R.F.; Gelbart W.M.; George R.A.; Goldstein L.S.; Gong F.; Guan P.; Harris N.L.; Hay B.A.; Hoskins R.A.; Li J.; Li Z.; Hynes R.O.; Jones S.J.; Kuehl P.M.; Lemaitre B.; Littleton J.T.; Morrison D.K.; Mungall C.; O'Farrell P.H.; Pickeral O.K.; Shue C.; Vosshall L.B.; Zhang J.; Zhao Q.; Zheng X.H.; Lewis S. Comparative genomics of the eukaryotes. Science, 2000, 287, 2204-2215.
[4] Venter J.C.; Adams M.D.; Myers E.W.; Li P.W.; Mural R.J.; Sutton G.G.; H.O.; Yandell M.; Evans C.A.; Holt R.A.; Gocayne J.D.; Amanatides P.; Ballew R.M.; Huson D.H.; Wortman J.R.; Zhang Q.; Kodira C.D.; Zheng X.H.; Chen L.; Skupski M.; Subramanian G.; Thomas P.D.; Zhang J.; Gabor Miklos G.L.; Nelson C.; Broder S.; Clark A.G.; Nadeau J.; McKusick V.A.; Zinder N.; Levine A.J.; Roberts R.J.; Simon M.; Slayman C.; Hunkapiller M.; Bolanos R.; Delcher A.; Dew I.; Fasulo D.; Flanigan M.; Florea L.; Halpern A.; Hannenhalli S.; Kravitz S.; Levy S.; Mobarry C.; Reinert K.; Remington K.; Abu-Threideh J.; Beasley E.; Biddick K.; Bonazzi V.; Brandon R.; Cargill M.; Chandramouliswaran I.; Charlab R.; Chaturvedi K.; Deng Z.; Di Francesco V.; Dunn P.; Eilbeck K.; Evangelista C.; Gabrielian A.E.; Gan W.; Ge W.; Gong F.; Gu Z.; Guan P.; Heiman T.J.; Higgins M.E.; Ji R.R.; Ke Z.; Ketchum K.A.; Lai Z.; Lei Y.; Li Z.; Li J.; Liang Y.; Lin X.; Lu F.; Merkulov G.V.; Milshina N.; Moore H.M.; Naik A.K.; Narayan V.A.; Neelam B.; Nusskern D.; Rusch D.B.; Salzberg S.; Shao W.; Shue B.; Sun J.; Wang Z.; Wang A.; Wang X.; Wang J.; Wei M.; Wides R.; Xiao C.; Yan C.; Yao A.; Ye J.; Zhan M.; Zhang W.; Zhang H.; Zhao Q.; Zheng L.; Zhong F.; Zhong W.; Zhu S.; Zhao S.; Gilbert D.; Baumhueter S.; Spier G.; Carter C.; Cravchik A.; Woodage T.; Ali F.; An H.; Awe A.; Baldwin D.; Baden H.; Barnstead M.; Barrow I.; Beeson K.; Busam D.; Carver A.; Center A.; Cheng M.L.; Curry L.; Danaher S.; Davenport L.; Desilets R.; Dietz S.; Dodson K.; Doup L.; Ferriera S.; Garg N.; Gluecksmann A.; Hart B.; Haynes J.; Haynes C.; Heiner C.; Hladun S.; Hostin D.; Houck J.; Howland T.; Ibegwam C.; Johnson J.; Kalush F.; Kline L.; Koduru S.; Love A.; Mann F.; May D.; McCawley S.; McIntosh T.; McMullen I.; Moy M.; Moy L.; Murphy B.; Nelson K.; Pfannkoch C.; Pratts E.; Puri V.; Qureshi H.; Reardon M.; Rodriguez R.; Rogers Y.H.; Romblad D.; Ruhfel B.; Scott R.; Sitter C.; Smallwood M.; Stewart E.; Strong R.; Suh E.; Thomas R.; Tint N.N.; Tse S.; Vech C.; Wang G.; Wetter J.; Williams S.; Williams M.; Windsor S.; Winn-Deen E.; Wolfe K.; Zaveri J.; Zaveri K.; Abril J.F.; Guigo R.; Campbell M.J.; Sjolander K.V.; Karlak B.; Kejariwal A.; Mi H.; Lazareva B.; Hatton T.; Narechania A.; Diemer K.; Muruganujan A.; Guo N.; Sato S.; Bafna V.; Istrail S.; Lippert R.; Schwartz R.; Walenz B.; Yooseph S.; Allen D.; Basu A.; Baxendale J.; Blick L.; Caminha M.; Carnes-Stine J.; Caulk P.; Chiang Y.H.; Coyne M.; Dahlke C.; Mays A.; Dombroski M.; Donnelly M.; Ely D.; Esparham S.; Fosler C.; Gire H.; Glanowski S.; Glasser K.; Glodek A.; Gorokhov M.; Graham K.; Gropman B.; Harris M.; Heil J.; Henderson S.; Hoover J.; Jennings D.; Jordan C.; Jordan J.; Kasha J.; Kagan L.; Kraft C.; Levitsky A.; Lewis M.; Liu X.; Lopez J.; Ma D.; Majoros W.; McDaniel J.; Murphy S.; Newman M.; Nguyen T.; Nguyen N.; Nodell M.; Pan S.; Peck J.; Peterson M.; Rowe W.; Sanders R.; Scott J.; Simpson M.; Smith T.; Sprague A.; Stockwell T.; Turner R.; Venter E.; Wang M.; Wen M.; Wu D.; Wu M.; Xia A.; Zandieh A.; Zhu X. The sequence of the human genome. Science, 2001, 291, 1304-1351.
[5] Mattick J.S. Non-coding RNAs: the architects of eukaryotic complexity. EMBO Rep, 2001, 2, 986-991.
[6] Mattick J.S. The genetic signatures of noncoding RNAs. PLoS Genet, 2009, 5, e1000459.
[7] Mattick J.S.; Gagen M.J. The evolution of controlled multitasked gene networks: the role of introns and other noncoding RNAs in the development of complex organisms. Mol. Biol. Evol, 2001, 18, 1611-1630.
[8] Mattick J.S.; Makunin I.V. Non-coding RNA. Hum. Mol. Genet, 2006, 15 Spec No 1, R17-29.
[9] Katayama S.; Tomaru Y.; Kasukawa T.; Waki K.; Nakanishi M.; Nakamura M.; Nishida H.; Yap C.C.; Suzuki M.; Kawai J.; Suzuki H.; Carninci P.; Hayashizaki Y.; Wells C.; Frith M.; Ravasi T.; Pang K.C.; Hallinan J.; Mattick J.; Hume D.A.; Lipovich L.; Batalov S.; Engstrom P.G.; Mizuno Y.; Faghihi M.A.; Sandelin A.; Chalk A.M.; Mottagui-Tabar S.; Liang Z.; Lenhard B.; Wahlestedt C. Antisense transcription in the mammalian transcriptome. Science, 2005, 309, 1564-1566.
[10] Faghihi M.A.; Wahlestedt C. Regulatory roles of natural antisense transcripts. Nat. Rev. Mol. Cell. Biol, 2009, 10, 637-643.
[11] Lopez-Barragan M.J.; Lemieux J.; Quinones M.; Williamson K.C.; Molina-Cruz A.; Cui K.; Barillas-Mury C.; Zhao K.; Su X.Z. Directional gene expression and antisense transcripts in sexual and asexual stages of Plasmodium falciparum. BMC Genomics, 2011, 12, 587.
[12] Lu T.; Zhu C.; Lu G.; Guo Y.; Zhou Y.; Zhang Z.; Zhao Y.; Li W.; Lu Y.; Tang W.; Feng Q.; Han B. Strand-specific RNA-seq reveals widespread occurrence of novel cis-natural antisense transcripts in rice. BMC Genomics, 2012, 13, 721.
[13] Passalacqua K.D.; Varadarajan A.; Weist C.; Ondov B.D.; Byrd B.; Read T.D.; Bergman N.H. Strand-specific RNA-seq reveals ordered patterns of sense and antisense transcription in Bacillus anthracis. PloS One, 2012, 7, e43350.
[14] Costa V.; Angelini C.; De Feis I.; Ciccodicola A. Uncovering the complexity of transcriptomes with RNA-Seq. J. Biomed. Biotechnol, 2010, 2010, 853916.
[15] Mills J.D.; Janitz M. Alternative splicing of mRNA in the molecular pathology of neurodegenerative diseases. Neurobiol. Aging, 2012, 33, 1012 e11-24.
[16] Levin J.Z.; Yassour M.; Adiconis X.; Nusbaum C.; Thompson D.A.; Friedman N.; Gnirke A.; Regev A. Comprehensive comparative analysis of strand-specific RNA sequencing methods. Nat. Methods, 2010, 7, 709-715.
[17] Core L.J.; Waterfall J.J.; Lis J.T. Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science, 2008, 322, 1845-1848.
[18] Dornenburg J.E.; Devita A.M.; Palumbo M.J.; Wade J.T. Widespread antisense transcription in Escherichia coli. mBio, 2010, 1.
[19] He Y.; Vogelstein B.; Velculescu V.E.; Papadopoulos N.; Kinzler K.W. The antisense transcriptomes of human cells. Science, 2008, 322, 1855-1857.
[20] Lindberg J.; Lundeberg J. The plasticity of the mammalian transcriptome. Genomics, 2010, 95, 1-6.
[21] Wagner E.G.; Simons R.W. Antisense RNA control in bacteria, phages, and plasmids. Annu. Rev. Microbiol, 1994, 48, 713-742.
[22] Preker P.; Nielsen J.; Kammler S.; Lykke-Andersen S.; Christensen M.S.; Mapendano C.K.; Schierup M.H.; Jensen T.H. RNA exosome depletion reveals transcription upstream of active human promoters. Science, 2008, 322, 1851-1854.
[23] Seila A.C.; Calabrese J.M.; Levine S.S.; Yeo G.W.; Rahl P.B.; Flynn R.A.; Young R.A.; Sharp P.A. Divergent transcription from active promoters. Science, 2008, 322, 1849-1851.
[24] Lapidot M.; Pilpel Y. Genome-wide natural antisense transcription: coupling its regulation to its different regulatory mechanisms. EMBO Rep, 2006, 7, 1216-1222.
[25] Osato N.; Suzuki Y.; Ikeo K.; Gojobori T. Transcriptional interferences in cis natural antisense transcripts of humans and mice. Genetics, 2007, 176, 1299-1306.
[26] Prescott E.M.; Proudfoot N.J. Transcriptional collision between convergent genes in budding yeast. Proc. Natl. Acad. Sci. U. S. A, 2002, 99, 8796-8801.
[27] Matlin A.J.; Clark F.; Smith C.W. Understanding alternative splicing: towards a cellular code. Nat. Rev. Mol. Cell. Biol, 2005, 6, 386-398.
[28] Hastings M.L.; Milcarek C.; Martincic K.; Peterson M.L.; Munroe S.H. Expression of the thyroid hormone receptor gene, erbAalpha, in B lymphocytes: alternative mRNA processing is independent of differentiation but correlates with antisense RNA levels. Nucleic Acids Res, 1997, 25, 4296-4300.
[29] Ebralidze A.K.; Guibal F.C.; Steidl U.; Zhang P.; Lee S.; Bartholdy B.; Jorda M.A.; Petkova V.; Rosenbauer F.; Huang G.; Dayaram T.; Klupp J.; O'Brien K.B.; Will B.; Hoogenkamp M.; Borden K.L.; Bonifer C.; Tenen D.G. PU.1 expression is modulated by the balance of functional sense and antisense RNAs regulated by a shared cis-regulatory element. Genes Dev, 2008, 22, 2085-2092.
[30] Tanahashi H.; Tabira T. Three novel alternatively spliced isoforms of the human beta-site amyloid precursor protein cleaving enzyme (BACE) and their effect on amyloid beta-peptide production. Neurosci. Lett, 2001, 307, 9-12.
[31] Faghihi M.A.; Modarresi F.; Khalil A.M.; Wood D.E.; Sahagan B.G.; Morgan T.E.; Finch C.E.; St Laurent G., 3rd; Kenny P.J.; Wahlestedt C. Expression of a noncoding RNA is elevated in Alzheimer's disease and drives rapid feed-forward regulation of beta-secretase. Nat. Med, 2008, 14, 723-730.
[32] Luger K.; Mader A.W.; Richmond R.K.; Sargent D.F.; Richmond T.J. Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature, 1997, 389, 251-260.
[33] Tremethick D.J. Higher-order structures of chromatin: the elusive 30 nm fiber. Cell, 2007, 128, 651-654.
[34] Luger K.; Dechassa M.L.; Tremethick D.J. New insights into nucleosome and chromatin structure: an ordered state or a disordered affair? Nat. Rev. Mol. Cell. Biol, 2012, 13, 436-447.
[35] Filion G.J.; van Bemmel J.G.; Braunschweig U.; Talhout W.; Kind J.; Ward L.D.; Brugman W.; de Castro I.J.; Kerkhoven R.M.; Bussemaker H.J.; van Steensel B. Systematic protein location mapping reveals five principal chromatin types in Drosophila cells. Cell, 2010, 143, 212-224.
[36] Magistri M.; Faghihi M.A.; St Laurent G., 3rd; Wahlestedt C. Regulation of chromatin structure by long noncoding RNAs: focus on natural antisense transcripts. Trends Genet, 2012, 28, 389-396.
[37] Bernstein E.; Allis C.D. RNA meets chromatin. Genes Dev, 2005, 19, 1635-1655.
[38] Lee J.T.; Davidow L.S.; Warshawsky D. Tsix, a gene antisense to Xist at the X-inactivation centre. Nat. Genet, 1999, 21, 400-404.
[39] Brown C.J.; Hendrich B.D.; Rupert J.L.; Lafreniere R.G.; Xing Y.; Lawrence J.; Willard H.F. The human XIST gene: analysis of a 17 kb inactive X-specific RNA that contains conserved repeats and is highly localized within the nucleus. Cell, 1992, 71, 527-542.
[40] Clemson C.M.; McNeil J.A.; Willard H.F.; Lawrence J.B. XIST RNA paints the inactive X chromosome at interphase: evidence for a novel RNA involved in nuclear/chromosome structure. J. Cell. Biol, 1996, 132, 259-275.
[41] Ohhata T.; Hoki Y.; Sasaki H.; Sado T. Crucial role of antisense transcription across the Xist promoter in Tsix-mediated Xist chromatin modification. Development, 2008, 135, 227-235.
[42] Mercer T.R.; Dinger M.E.; Mattick J.S. Long non-coding RNAs: insights into functions. Nat. Rev. Genet, 2009, 10, 155-159.
[43] Wang Z.; Gerstein M.; Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet, 2009, 10, 57-63.
[44] Janitz M. Next-Generation Genome Sequencing-Towards Personalised Medicine. Wiley-VCH, Weinheim, Germany, 2008.
[45] Trapnell C.; Roberts A.; Goff L.; Pertea G.; Kim D.; Kelley D.R.; Pimentel H.; Salzberg S.L.; Rinn J.L.; Pachter L. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc, 2012, 7, 562-578.
[46] Blankenberg D.; Von Kuster G.; Coraor N.; Ananda G.; Lazarus R.; Mangan M.; Nekrutenko A.; Taylor J. Galaxy: a web-based genome analysis tool for experimentalists. Current protocols in molecular biology / edited by Frederick M Ausubel, 2010, Chapter 19, Unit 19.10.1-21.
[47] Giardine B.; Riemer C.; Hardison R.C.; Burhans R.; Elnitski L.; Shah P.; Zhang Y.; Blankenberg D.; Albert I.; Taylor J.; Miller W.; Kent W.J.; Nekrutenko A. Galaxy: a platform for interactive large-scale genome analysis. Genome Res, 2005, 15, 1451-1455.
[48] Goecks J.; Nekrutenko A.; Taylor J. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol, 2010, 11, R86.
[49] Morin R.; Bainbridge M.; Fejes A.; Hirst M.; Krzywinski M.; Pugh T.; McDonald H.; Varhol R.; Jones S.; Marra M. Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing. Biotechniques, 2008, 45, 81-94.
[50] Marioni J.C.; Mason C.E.; Mane S.M.; Stephens M.; Gilad Y. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res, 2008, 18, 1509-1517.
[51] Schadt E.E.; Linderman M.D.; Sorenson J.; Lee L.; Nolan G.P. Computational solutions to large-scale data management and analysis. Nat. Rev. Genet, 2010, 11, 647-657.
[52] Chen Z.; Duan X. Ribosomal RNA depletion for massively parallel bacterial RNA-sequencing applications. Methods Mol. Biol, 2011, 733, 93-103.
[53] Chatterjee A.; Johnson C.M.; Shu C.C.; Kaznessis Y.N.; Ramkrishna D.; Dunny G.M.; Hu W.S. Convergent transcription confers a bistable switch in Enterococcus faecalis conjugation. Proc. Natl. Acad. Sci. U. S. A, 2011, 108, 9721-9726.
[54] Yoon O.K.; Brem R.B. Noncanonical transcript forms in yeast and their regulation during environmental stress. RNA, 2010, 16, 1256-1267.
[55] Tupy J.L.; Bailey A.M.; Dailey G.; Evans-Holm M.; Siebel C.W.; Misra S.; Celniker S.E.; Rubin G.M. Identification of putative noncoding polyadenylated transcripts in Drosophila melanogaster. Proc. Natl. Acad. Sci. U. S. A, 2005, 102, 5495-5500.
[56] Croucher N.J.; Fookes M.C.; Perkins T.T.; Turner D.J.; Marguerat S.B.; Keane T.; Quail M.A.; He M.; Assefa S.; Bahler J.; Kingsley R.A.; Parkhill J.; Bentley S.D.; Dougan G.; Thomson N.R. A simple method for directional transcriptome sequencing using Illumina technology. Nucleic Acids Res, 2009, 37, e148.
[57] Kuhn H.; Frank-Kamenetskii M.D. Template-independent ligation of single-stranded DNA by T4 DNA ligase. FEBS J, 2005, 272, 5991-6000.
[58] Mamanova L.; Andrews R.M.; James K.D.; Sheridan E.M.; Ellis P.D.; Langford C.F.; Ost T.W.; Collins J.E.; Turner D.J. FRT-seq: amplification-free, strand-specific transcriptome sequencing. Nat. Methods, 2010, 7, 130-132.
[59] Mamanova L.; Turner D.J. Low-bias, strand-specific transcriptome Illumina sequencing by on-flowcell reverse transcription (FRT-seq). Nat. Protoc, 2011, 6, 1736-1747.
[60] Vivancos A.P.; Guell M.; Dohm J.C.; Serrano L.; Himmelbauer H. Strand-specific deep sequencing of the transcriptome. Genome Res, 2010, 20, 989-999.
[61] Schaefer M.; Pollex T.; Hanna K.; Lyko F. RNA cytosine methylation analysis by bisulfite sequencing. Nucleic Acids Res, 2009, 37, e12.
[62] Borodina T.; Adjaye J.; Sultan M. A strand-specific library preparation protocol for RNA sequencing. Methods Enzymol, 2011, 500, 79-98.
[63] Parkhomchuk D.; Borodina T.; Amstislavskiy V.; Banaru M.; Hallen L.; Krobitsch S.; Lehrach H.; Soldatov A. Transcriptome analysis by strand-specific sequencing of complementary DNA. Nucleic Acids Res, 2009, 37, e123.
[64] Sultan M.; Dokel S.; Amstislavskiy V.; Wuttig D.; Sultmann H.; Lehrach H.; Yaspo M.L. A simple strand-specific RNA-Seq library preparation protocol combining the Illumina TruSeq RNA and the dUTP methods. Biochem. Biophys. Res. Commun, 2012, 422, 643-646.
[65] Wang L.; Si Y.; Dedow L.K.; Shao Y.; Liu P.; Brutnell T.P. A low-cost library construction protocol and data analysis pipeline for Illumina-based strand-specific multiplex RNA-seq. PloS One, 2011, 6, e26426.
[66] Zhang Z.; Theurkauf W.E.; Weng Z.; Zamore P.D. Strand-specific libraries for high throughput RNA sequencing (RNA-Seq) prepared without poly(A) selection. Silence, 2012, 3, 9.

Chapter 2

Transcriptome profiling of the healthy human brain using RNA-Seq


2.1 Primary research article: Unique transcriptome patterns of the white and grey matter corroborate structural and functional heterogeneity in the human frontal lobe

Reference Mills, J. D., T. Kavanagh, W. S. Kim, B. J. Chen, Y. Kawahara, G. M. Halliday and M. Janitz (2013). “Unique transcriptome patterns of the white and grey matter corroborate structural and functional heterogeneity in the human frontal lobe.” PLoS One 8(10): e78480.

Contribution I wrote this article, prepared all of the figures, isolated the RNA for sequencing and carried out all of the bioinformatic analyses. MJ, WSK and YK provided feedback throughout the writing process. TK and BJC helped with the initial enrichment analysis. MJ and GMH conceived the experiments.

Synopsis This article is the first research component of this PhD. A comparative analysis of GM and WM isolated from the superior frontal gyrus (SFG) of the human brain was performed. The transcriptomes of GM and WM from the prefrontal cortex (PFC) displayed a high degree of alternative splicing of both protein-coding and non-coding genes and contained numerous novel transcripts and lincRNAs. LincRNAs appeared to be an important factor in cellular differentiation in the human brain. Genes such as the G protein-coupled receptor 123 (GPR123) displayed a phenomenon known as isoform switching, in which the dominant splice variant of a gene changes depending on the tissue in which it is expressed. Gene ontology enrichment analysis of differentially expressed genes revealed over-representation of genes involved in synaptic processes in GM and in myelination regulation and axonogenesis in WM, accurately reflecting the distinct roles these tissue types play in brain physiology.

This research article advanced knowledge of the healthy brain transcriptome, provided a reference set against which diseased transcriptomes could be compared, and confirmed the presence of numerous lincRNAs throughout the brain. When this article was published (2013), it was the first time that RNA-Seq had been used to profile the transcriptome of distinct brain regions.

Declaration I certify that this publication was a direct result of my research towards this PhD, and that reproduction in this thesis does not breach copyright regulations.

Dominic Mills [PhD Candidate]

Unique Transcriptome Patterns of the White and Grey Matter Corroborate Structural and Functional Heterogeneity in the Human Frontal Lobe

James D. Mills1, Tomas Kavanagh1, Woojin S. Kim2,3, Bei Jun Chen1, Yoshihiro Kawahara4, Glenda M. Halliday2,3, Michael Janitz1*

1 School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales, Australia, 2 Neuroscience Research Australia, Sydney, New South Wales, Australia, 3 School of Medical Sciences, University of New South Wales, Sydney, New South Wales, Australia, 4 National Institute of Agrobiological Sciences, Agrogenomics Research Center, Bioinformatics Research Unit, Tsukuba, Ibaraki, Japan

Abstract

The human frontal lobe has undergone accelerated evolution, leading to the development of unique human features such as language and self-reflection. Cortical grey matter and underlying white matter reflect distinct cellular compositions in the frontal lobe. Surprisingly little is known about the transcriptomic landscape of these distinct regions. Here, for the first time, we report a detailed transcriptomic profile of the frontal grey (GM) and white matter (WM) with resolution to alternatively spliced isoforms, obtained using the RNA-Seq approach. We observed more vigorous transcriptome activity in GM compared to WM, presumably because of the presence of neuronal cell bodies in the GM and RNA associated with the nucleus and perinuclear space. Among the top differentially expressed genes, we also identified a number of long intergenic non-coding RNAs (lincRNAs) specifically expressed in white matter, such as LINC00162. Furthermore, along with confirmation of expression of known markers for neurons and oligodendrocytes, we identified a number of genes and splicing isoforms that are exclusively expressed in GM or WM, for example GABRB2 and PAK2 transcripts, respectively. Pathway analysis identified distinct physiological and biochemical processes specific to grey and white matter samples, with a prevalence of synaptic processes in GM and myelination regulation and axonogenesis in WM. Our study also revealed that the expression of many genes, for example GPR123, is characterized by isoform switching, depending on the structure in which the gene is expressed. Our report shows that GM and WM have surprisingly divergent transcriptome profiles, reflecting their distinct roles in brain physiology. Further, this study provides the first reference data set for a normal human frontal lobe, which will be useful in comparative transcriptome studies of cerebral disorders, in particular, neurodegenerative diseases.

Citation: Mills JD, Kavanagh T, Kim WS, Chen BJ, Kawahara Y, et al. (2013) Unique Transcriptome Patterns of the White and Grey Matter Corroborate Structural and Functional Heterogeneity in the Human Frontal Lobe. PLoS ONE 8(10): e78480. doi:10.1371/journal.pone.0078480

Editor: Thomas Preiss, The John Curtin School of Medical Research, Australia

Received July 3, 2013; Accepted September 13, 2013; Published October 23, 2013

Copyright: © 2013 Mills et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by a National Health and Medical Research Council of Australia (NHMRC) project grant (#1022325). GMH is a NHMRC Senior Principal Research Fellow (#630434). Tissues were received from the Sydney Brain Bank at Neuroscience Research Australia and the New South Wales Tissue Resource Centre at the University of Sydney which are supported by the National Health and Medical Research Council of Australia (NHMRC), University of New South Wales, Neuroscience Research Australia, Schizophrenia Research Institute and National Institute of Alcohol Abuse and Alcoholism (NIH (NIAAA) R24AA012725). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

* E-mail: [email protected]

Introduction

The human cerebrum is extraordinarily complex and is composed of billions of neurons and trillions of synaptic connections. Neurons are organized into circuit assemblies that are modulated by specific interneurons and non-neuronal cells. The frontal lobe is often considered the most highly developed and most characteristically human brain region. As the prefrontal and frontal cortices exert executive-type control over other structures, they are expected to show significant connectivity to other brain regions. This part of the brain manages the most complex thought, decision making, planning, conceptualization, attention control, and working memory [1,2].

Broadly speaking, the human cortex can be divided into the phylogenetically older allocortex and the newer neocortex. The neocortex has expanded the most in humans and is composed of six superimposed layers with distinct cellular compositions; the thickness of each layer differs in each cortical lobe [3]. The most prominent distinction of the cerebrum is its division into outer cortical grey matter (GM) and inner white matter (WM). GM consists of neural cell bodies, their dendrites, and parts of their axons, as well as glial cells, mainly astrocytes. In contrast, WM consists mainly of aggregations of myelinated and non-myelinated axons linking different cortical and subcortical regions [4]. Long neglected, WM has recently become the subject of intensive study due to its involvement in the development of working memory capacity and reading ability [5]. Recently, the frontal WM has been proposed as a major contributor to human brain enlargement and higher structural connectivity, as compared to other primates [6,7].

Despite several RNA-Seq-based studies investigating whole transcriptome profiles of cerebral tissue in pathological conditions such as Alzheimer’s disease (AD) [8,9] or autism [10], there has been no systematic attempt to investigate the transcriptomic landscape of normal brain tissue at the resolution of RNA sequencing. Hawrylycz and colleagues recently published a microarray-based transcriptome profile of distinct brain regions in two individuals [11]. They provided important insights into the spatial distribution of expression across well-defined neuroanatomical regions. The critical conclusion of this study was the importance of local gene expression patterns for the maintenance of physiological uniqueness within these regions. Because this analysis was performed using microarray technology, it does not provide further information on posttranscriptional control in the brain, namely, alternative splicing.

Here, we report, for the first time, detailed transcriptome profiles of GM and WM of the human frontal lobe using RNA-Seq. Comparative analysis of gene and isoform expression, combined with pathway analysis, revealed surprisingly distinct transcriptome patterns reflecting the contribution of different glial cell types and neuronal structures to WM and GM, respectively. Moreover, we observed elevated expression of lincRNAs in WM, as well as isoform switching between WM and GM for genes encoding DNA binding proteins and proteins involved in signal transduction.

PLOS ONE | www.plosone.org 1 October 2013 | Volume 8 | Issue 10 | e78480 RNA-Seq of the Human White and Grey Matter

Materials and Methods

Human brain tissue
Human brain tissues were obtained from the Sydney Brain Bank and NSW Tissue Resource Centre, part of The Australian Brain Bank Network funded by the National Health and Medical Research Council of Australia. Ethics approval was obtained from the University of New South Wales Human Research Ethics Committee. Frozen brain tissue samples of superior frontal gyrus (SFG) GM and WM were collected from three individuals aged 79, 94 and 98 years. The post-mortem interval (PMI) of the samples ranged from 8 to 24 h and the pH from 5.77 to 6.65. All three brains were neuropathologically examined and were free of any pathology or neurodegeneration.

RNA isolation, library preparation and sequencing
Total RNA was isolated using the RNeasy Lipid Tissue Midi Kit (Qiagen) followed by RNase-free DNase treatment to remove traces of genomic DNA. The quality of the total RNA was assessed using the Agilent 2100 Bioanalyzer RNA Nano Chip, and the RNA integrity number (RIN) values ranged between 6.0 and 7.0. This RIN range was previously shown to have little effect on relative gene expression ratios [12] (and our unpublished observations). Six RNA samples (three WM and three GM) were prepared for sequencing according to the Illumina TruSeq RNA sample preparation guide and subjected to 100 bp paired-end sequencing using an Illumina HiSeq 1000. The sequence data have been submitted to the NCBI Short Read Archive with accession number SRA091951.

Mapping of RNA-Seq reads using TopHat
Bioinformatic analysis was carried out using Galaxy, an open-access web-based platform that provides a variety of next-generation sequencing analysis tools, including TopHat and the Cufflinks package [13-15]. The Galaxy server was hosted at the Garvan Institute, Sydney, Australia. Using TopHat, the reads were processed and aligned to the H. sapiens reference genome (build hg19). TopHat uses the ultra-high-throughput short-read aligner Bowtie to align the RNA-Seq reads; the alignments are then analyzed to identify splice junctions between exons [16]. The default parameters for TopHat were used. Subsequently, the aligned reads from each sample were analyzed for 5’-3’ end bias using RSeQC [17].

Transcript assembly with Cufflinks
The aligned reads were processed with Cufflinks, which assembles the RNA-Seq reads into individual transcripts, inferring the splicing structure of the genes [18]. Cufflinks assembles the data parsimoniously, giving the minimal set of transcripts that fits the data, and normalizes the RNA-Seq fragment counts to estimate the abundance of each transcript. Abundance was measured in units of fragments per kilobase of exon per million fragments mapped (FPKM). For this analysis, a GTF annotation file (iGenomes UCSC hg19 gene annotation) was used to guide the assembly.

Differential analysis with Cuffmerge and Cuffdiff
The alignment files produced by TopHat were merged for Cuffdiff processing so that combinatorial pairwise sample comparison could be performed. The output GTF files from each Cufflinks analysis and the GTF annotation file were sent to Cuffmerge [18], which amalgamates them into a single unified transcript catalogue and filters out any transcribed fragments that may be artifacts. The inclusion of the reference annotation allows gene names and other details, such as transcript ID, exon number, transcription start site ID and coding sequence ID, to be added to the merged transcript catalogue. It also allows genes and transcripts to be classified as known or novel. The merged GTF file was then fed to Cuffdiff along with the original alignment files produced by TopHat. Cuffdiff takes the replicates from each condition and looks for statistically significant changes in gene expression, transcript expression, splicing and promoter use. Cuffdiff uses a corrected p-value, known as the q-value, to determine whether the differences between the two groups are significant (q-value<0.05).
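Two quantities defined in these Methods recur throughout the Results: FPKM and the q-value. The Python sketch below illustrates both under simplifying assumptions and is not the pipeline used in this study. The hypothetical `fpkm` helper uses the plain count-based formula (fragments × 10^9 / (transcript length in bp × total mapped fragments)); Cufflinks itself estimates abundances by maximum likelihood with effective transcript lengths and bias correction, so its values will differ. The hypothetical `bh_qvalues` helper assumes that Cuffdiff's q-values are Benjamini-Hochberg false-discovery-rate adjustments of the raw p-values. The p-values in the example are made up.

```python
"""Simplified FPKM and Benjamini-Hochberg q-values (illustrative sketch)."""


def fpkm(fragments: int, length_bp: int, total_mapped: int) -> float:
    """Fragments per kilobase of exon per million mapped fragments.

    Count-based definition only: fragments / (length in kb) / (mapped in millions).
    Cufflinks' likelihood-based estimate with effective lengths is not reproduced.
    """
    return fragments * 1e9 / (length_bp * total_mapped)


def bh_qvalues(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


if __name__ == "__main__":
    # 500 fragments on a 2,000 bp transcript with 20 million mapped fragments.
    print(fpkm(500, 2000, 20_000_000))  # 12.5
    pvals = [0.001, 0.01, 0.02, 0.045, 0.3]  # made-up p-values
    qvals = bh_qvalues(pvals)
    print([p for p, q in zip(pvals, qvals) if q < 0.05])  # [0.001, 0.01, 0.02]
```

The running minimum guarantees that q-values never decrease as the raw p-values increase, which is what makes a simple q-value<0.05 cut-off equivalent to the Benjamini-Hochberg step-up rule.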


Visualization with CummeRbund and Integrative Genomics Viewer
The resultant Cuffdiff output files were fed into CummeRbund, an R package designed to simplify the analysis of Cuffdiff outputs. CummeRbund is user friendly and allows for easy data exploration and figure generation [19]. The Broad Institute’s Integrative Genomics Viewer (IGV) (http://www.broadinstitute.org/igv/) was used to visualize the Cufflinks GTF outputs, which allowed the structures of known genes to be compared with those of novel transcripts identified by Cufflinks [20,21].

Gene-set enrichment analysis with DAVID
The list of differentially expressed genes was split into two groups: those up-regulated in GM and those up-regulated in WM. Because enrichment tools can only use annotated genes, all novel and ambiguously annotated genes were removed. Each list was submitted to the Database for Annotation, Visualization and Integrated Discovery (DAVID) (http://david.abcc.ncifcrf.gov/) [22], which tested gene ontology (GO) terms for over-representation in each gene list. The GO term lists produced by DAVID were processed using the Enrichment Map plug-in for Cytoscape (http://www.cytoscape.org/) [23], which produces a visual representation of the text-based GO term lists.

In situ hybridization validation
Our RNA-Seq expression data were compared with in situ hybridization (ISH) data from the Allen Brain Atlas database (http://www.brain-map.org/), which uses RNA probes to measure gene expression in normal human dorsolateral frontal cortex.

Results

Total transcription in GM and WM
An expression catalogue containing 32,740 genes was created by the Cufflinks package. Of this catalogue, 18,362 genes were considered expressed in the analyzed brain samples. This number includes 3,615 unannotated genes, 44 small nucleolar RNAs (snoRNAs), 67 microRNAs (miRNAs), 40 lincRNAs and 52 RNAs of uncharacterized function (locRNAs). The short non-coding RNA sequences were presumably derived from their polyadenylated long non-coding RNA parent transcripts, or might be carryovers captured during library preparation. The cumulative coverage of the expressed transcripts in GM and WM was approximately 37% of the entire human genome; this figure includes transcribed introns that were spliced out. There was no significant difference between GM and WM in total human genome coverage.

The isoform catalogue assembled by Cufflinks contained over 85,000 distinct isoforms (splice variants). Of these, 36,306 were identified as expressed in the selected brain regions, including 3,844 unannotated isoforms, which equates to approximately two isoforms per gene. The actual distribution of isoforms per gene reveals a different picture: 9,427 genes had one isoform, 8,653 genes had between two and seven isoforms, and only 107 genes had eight or more isoforms. This suggests that almost 50% of genes undergo alternative splicing.

Analysis using RSeQC showed that the aligned reads from each RNA sample had a similar degree of bias towards the 3’ end of each transcript (Figure S1). This was expected given the poly(A) selection of RNA for sequencing. Of note, the Cufflinks package used in this study is designed to correct for sequence bias introduced during template preparation [24].

Differentially expressed genes and isoforms
Within the set of 18,362 expressed genes, a total of 1,652 were identified as differentially expressed between the two conditions (q-value<0.05) (Figure 1). These included 1,218 genes up-regulated in GM and 434 up-regulated in WM (Table S1). Overall, there was a transcription bias toward GM. To preclude possible discordant results, the gene lists were filtered so that any gene with an FPKM<1 in both conditions was excluded. An FPKM<1 means that fewer than one fragment per million mapped fragments maps onto a 1-kb exon; this can be considered background noise arising from sequencing errors or statistical errors during mapping. This filter reduced the overall list of differentially expressed genes to 1,591, of which 1,162 were up-regulated in GM and 429 were up-regulated in WM. The top 10 up-regulated genes in GM and WM, ranked by fold-change, are shown in Table 1.

Figure 1. Volcano plot of gene expression in GM and WM. The fold-change of the genes was relative to their expression in WM. Those genes with a negative fold-change were up-regulated in GM (down-regulated in WM) and those genes with a positive fold-change were up-regulated in WM (down-regulated in GM). Genes that were statistically significant (q-value<0.05) are shown in red and are listed in Table S1. This figure demonstrates that a larger number of genes were significantly up-regulated in GM (1218) than in WM (434). Overall, there was a greater spread of data for the genes that are up-regulated in GM. doi: 10.1371/journal.pone.0078480.g001

Of the top 10 up-regulated genes in GM, two were unannotated, and among the top 10 up-regulated genes in WM, three were unannotated. On further inspection of their size and location, these genes were found to fit the criteria for lincRNAs. In addition, there was one annotated lincRNA (LINC00162) among the top 10 up-regulated genes in WM. Across the entire lists, there were more lincRNAs in WM than in GM (3 to 1). Although there were more unannotated up-regulated genes in GM (80) than in WM (57), unannotated genes made up a greater proportion of the WM up-regulated genes than of the GM up-regulated genes. A chi-squared test was performed to determine whether the proportions of unannotated genes differed significantly between the two data sets. The test showed that the difference in proportions was greater than would be expected by chance alone (p<0.00005), indicating a significantly higher proportion of unannotated genes among the genes up-regulated in WM.

Cuffdiff identified 36,306 isoforms in GM and WM, including 3,844 unannotated isoforms (Figure 2). Of this set, 882 isoforms were significantly differentially expressed between GM and WM (q-value<0.05), including 681 up-regulated in GM and 201 up-regulated in WM (Table S2). When the criterion of an FPKM>1 in at least one condition was applied, this list was reduced to 856 isoforms, with 657 up-regulated in GM and 199 up-regulated in WM. The top ten differentially expressed isoforms in GM and WM, sorted by fold-change, are listed in Table 2. This list shares almost no commonality with Table 1; only C1QL3 (up-regulated in GM) appears in both tables. All of the top 10 up-regulated isoforms in WM are unique to WM and have no expression in GM (Table 2). In fact, the top 24 up-regulated isoforms in WM were not detected at all in GM; in contrast, there was only one unique up-regulated isoform present in GM (Table S2).

G protein-coupled receptor 123
The G protein-coupled receptor 123 (GPR123) gene had one such isoform that was uniquely expressed in WM. However, when analyzed as an entire gene, GPR123 was found to be 5-fold up-regulated in GM, with overall expression levels of 10 FPKM in GM and 2 FPKM in WM. Further investigation revealed unique splicing patterns that underline the importance of analyzing genes at the isoform level. Overall, four splice variants were identified across GM and WM: two previously identified splice variants, GPR123-002 (ENST00000392607) and GPR123-003 (ENST0000039606), and two novel splice variants, GPR123-004 and GPR123-005 (Figure 3). The two previously identified isoforms (GPR123-002 and GPR123-003) are both protein coding; GPR123-003, with its extended N-terminus, can be considered a full-length, fully functional protein. By comparison, GPR123-002 is missing the first four exons, and its first exon is an untranslated alternative exon. Translation of GPR123-002 results in a protein with a 97-amino-acid N-terminal truncation. The exon structure of the novel isoform GPR123-004 was most similar to that of GPR123-003, but it lacks the 5th exon. The lack of this exon is predicted to result in a 199-amino-acid truncation at the C-terminal end of the GPR123-004 protein compared with GPR123-003. GPR123-005 was most similar to GPR123-002, albeit with a slight modification to the 5’ region, and its translation was predicted to result in a protein similar to that of GPR123-002. Among the four GPR123 isoforms, three different transcription start sites (TSS) were identified: GPR123-003 and GPR123-004 shared the same TSS, while the two N-terminally truncated isoforms (GPR123-002 and GPR123-005) each used a unique TSS.

In GM, the dominant isoform was the novel truncated isoform GPR123-005, which contributed approximately 70% of the total GPR123 expression in GM; the full-length protein-coding isoform GPR123-003 and the truncated protein-coding isoform GPR123-002 contributed approximately 20% and 10%, respectively (Figure 4). These three isoforms were all expressed at higher levels in GM than in WM, and the 13-fold up-regulation of GPR123-005 was statistically significant. The TSS of GPR123-005 was also up-regulated in GM. GPR123-004 was not expressed at all in GM. In WM, the three GM-expressed isoforms were expressed at very low levels (<0.6 FPKM), and GPR123-004, the isoform not expressed in GM, became the dominant isoform, contributing 70% of all GPR123 expression in WM. In this case, the full-length protein-coding isoform GPR123-003 contributed only 1.5% of the total GPR123 expression.
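Two simple calculations underlie the results above: the share of a gene's total FPKM contributed by each isoform (used to call GPR123-005 dominant in GM and GPR123-004 dominant in WM), and a chi-squared test comparing the proportion of unannotated genes among the GM and WM up-regulated genes. The stdlib-only Python sketch below illustrates both; it is not the authors' code. The isoform FPKMs are hypothetical values chosen only to mirror the approximate percentages reported (about 70/20/10% in GM, 70% for GPR123-004 and 1.5% for GPR123-003 in WM). The chi-squared example assumes the unfiltered lists (1,218 GM and 434 WM genes) as denominators and an uncorrected Pearson test; the text states neither. Under these assumptions the test gives p ≈ 2 × 10^-5, which is consistent with the reported p<0.00005.

```python
"""Isoform shares and a 2x2 chi-squared test (illustrative sketch, stdlib only)."""
import math


def isoform_shares(fpkms: dict[str, float]) -> dict[str, float]:
    """Fraction of a gene's total FPKM contributed by each isoform."""
    total = sum(fpkms.values())
    return {name: value / total for name, value in fpkms.items()}


def dominant_isoform(fpkms: dict[str, float]) -> str:
    """Name of the most highly expressed isoform."""
    return max(fpkms, key=lambda name: fpkms[name])


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared test (no Yates correction) for [[a, b], [c, d]].

    Returns (statistic, p-value). With 1 degree of freedom the upper-tail
    p-value equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    # Hypothetical per-isoform FPKMs mirroring the reported GPR123 shares;
    # these are NOT the measured values.
    gm = {"GPR123-005": 7.0, "GPR123-003": 2.0, "GPR123-002": 1.0, "GPR123-004": 0.0}
    wm = {"GPR123-004": 1.4, "GPR123-005": 0.54, "GPR123-002": 0.03, "GPR123-003": 0.03}
    print(dominant_isoform(gm), dominant_isoform(wm))  # GPR123-005 GPR123-004
    print(round(isoform_shares(wm)["GPR123-004"], 2))  # 0.7

    # Unannotated vs annotated up-regulated genes, assuming the unfiltered
    # GM (1,218) and WM (434) lists are the denominators.
    stat, p = chi2_2x2(80, 1218 - 80, 57, 434 - 57)
    print(f"chi-squared = {stat:.2f}, p = {p:.1e}")
```

The isoform-share view explains why a gene can be up-regulated overall in one tissue while a single isoform is confined to the other: the dominant isoform is determined by relative share within each tissue, not by the gene-level total.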


Table 1. Top 10 up-regulated genes in GM and WM.

Up-regulated in GM
Gene | Description | Chrom. | FPKM GM | FPKM WM | Fold change | q-value | Ensembl ID
CBLN4 | cerebellin 4 precursor | chr20 | 9.98418 | 0.0288865 | 345.6348121 | 0.00290815 | ENSG00000054803
- | intergenic, 1 exon, 5391 bps | chr10 | 1.04349 | 0.00475891 | 219.2707994 | 0.0473629 | not annotated
KCNS2 | potassium voltage-gated channel, delayed-rectifier, subfamily S, member 2 | chr8 | 1.56421 | 0.00718485 | 217.7094859 | 4.20E-06 | ENSG00000156486
SDR16C5 | short chain dehydrogenase/reductase family 16C, member 5 | chr8 | 1.46311 | 0.00753222 | 194.2468489 | 0.0200461 | ENSG00000170786
KIAA1239 | KIAA1239 | chr4 | 1.12395 | 0.00660563 | 170.1503112 | 0.000270041 | ENSG00000174145
GLRA3 | glycine receptor, alpha 3 | chr4 | 1.31853 | 0.00902946 | 146.0253437 | 0.000379934 | ENSG00000145451
C1QL3 | complement component 1, q subcomponent-like 3 | chr10 | 6.69439 | 0.0476218 | 140.5740648 | 0.00597669 | ENSG00000165985
- | intergenic, 2 splice variants (1st: 1 exon, 3048 bps; 2nd: 2 exons, 2469 bps) | chr17 | 2.88395 | 0.0245143 | 117.6435795 | 0.00298648 | not annotated
FLJ41278 | uncharacterized LOC400046 | chr12 | 1.74655 | 0.0150581 | 115.9874088 | 0.000318564 | ENSG00000255693
SLC17A6 | solute carrier family 17 (sodium-dependent inorganic phosphate cotransporter), member 6 | chr11 | 2.44851 | 0.0226146 | 108.2712053 | 0.00269117 | ENSG00000091664

Up-regulated in WM
Gene | Description | Chrom. | FPKM GM | FPKM WM | Fold change | q-value | Ensembl ID
SLC47A2 | solute carrier family 47, member 2 | chr17 | 0.0592195 | 4.7187 | 79.68152382 | 0.00109939 | ENSG00000180638
- | intergenic, 1 exon, 6039 bps | chr7 | 0.0201173 | 1.58368 | 78.72229375 | 0.04031 | not annotated
CCDC19 | coiled-coil domain containing 19 | chr1 | 0.0212159 | 1.35457 | 63.84692613 | 0.0248006 | ENSG00000213085
LINC00162 | long intergenic non-protein coding RNA 162 | chr21 | 0.0968483 | 5.6667 | 58.51109415 | 0.00926851 | ENSG00000224930
SERPINA5 | serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 5 | chr14 | 0.0486903 | 1.87777 | 38.56558699 | 0.00467642 | ENSG00000188488
- | intergenic, 2 splice variants, both 2 exons (1st: 3294 bps; 2nd: 356 bps) | chr5 | 0.0833172 | 2.88574 | 34.63558545 | 0.0163335 | not annotated
DAO | D-amino-acid oxidase | chr12 | 0.180005 | 6.11662 | 33.98027833 | 0.000196888 | ENSG00000110887
- | intergenic, 2 exons, 6042 bps | chr17 | 0.0364359 | 1.18023 | 32.39195409 | 0.0345805 | not annotated
FFAR3 | free fatty acid receptor 3 | chr19 | 0.0459175 | 1.13566 | 24.73261828 | 0.0496471 | ENSG00000185897
LTF | lactotransferrin | chr3 | 0.520221 | 12.5738 | 24.17011232 | 4.14E-06 | ENSG00000012223

doi: 10.1371/journal.pone.0078480.t001
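Tables 1 and 2 report fold change as the ratio of FPKM in the up-regulated tissue to FPKM in the other tissue; for example, 9.98418 / 0.0288865 ≈ 345.63 for CBLN4. The Python sketch below reproduces that ratio, returns infinity for the "Unique to GM"/"Unique to WM" rows where the other tissue has zero FPKM, and encodes the marker thresholds described later in this section (>20-fold, FPKM>5 in GM and FPKM<1 in WM for GM markers; >4-fold, FPKM>5 in WM and FPKM<5 in GM for WM markers). The function names are hypothetical. The sketch assumes that fold change was computed directly from the tabulated FPKM values and that genes have already passed the q-value<0.05 filter, which is not re-applied here.

```python
"""Fold change as reported in Tables 1 and 2, plus the marker thresholds (sketch)."""
import math


def fold_change(fpkm_up: float, fpkm_other: float) -> float:
    """FPKM in the up-regulated tissue divided by FPKM in the other tissue.

    Returns math.inf when the other tissue has zero FPKM, which the tables
    report as "Unique to GM" or "Unique to WM".
    """
    if fpkm_other == 0:
        return math.inf
    return fpkm_up / fpkm_other


def is_gm_marker(fpkm_gm: float, fpkm_wm: float) -> bool:
    """Strict GM thresholds: >20-fold up in GM, FPKM>5 in GM, FPKM<1 in WM.

    Assumes the gene has already passed the q-value<0.05 filter.
    """
    return fold_change(fpkm_gm, fpkm_wm) > 20 and fpkm_gm > 5 and fpkm_wm < 1


def is_wm_marker(fpkm_gm: float, fpkm_wm: float) -> bool:
    """Relaxed WM thresholds: >4-fold up in WM, FPKM>5 in WM, FPKM<5 in GM."""
    return fold_change(fpkm_wm, fpkm_gm) > 4 and fpkm_wm > 5 and fpkm_gm < 5


if __name__ == "__main__":
    # (GM, WM) FPKM values copied from Tables 1 and 2.
    print(round(fold_change(9.98418, 0.0288865), 2))  # CBLN4
    print(fold_change(2.81439, 0))  # KCNK12 isoform, unique to GM: inf
    print(is_gm_marker(9.98418, 0.0288865))  # CBLN4: True
    print(is_wm_marker(0.520221, 12.5738))  # LTF: True
```

Representing "unique" expression as an infinite fold change keeps those rows sortable alongside finite ratios, which matches how they head the WM list in Table 2.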

Between GM and WM, there was a switch in the dominant GPR123 isoform, from GPR123-005 to GPR123-004. Interestingly, although total GPR123 was expressed at a lower level in WM than in GM, the dominant isoform contributed the same percentage (approximately 70%) of overall expression in each tissue.

The isoform expression patterns of the genes carrying the top three annotated differentially expressed isoforms in GM and the top three in WM were also analyzed (Figures S2-S7). These showed that the total expression of many genes results from the dominant expression of one isoform, whereas the remaining splice variants are only marginally present. It is therefore important to examine gene expression at the resolution of individual transcript isoforms.

Figure 2. Volcano plot of isoform expression in GM and WM. The fold-change of the isoforms was relative to their expression in WM. Those isoforms with a negative fold-change were up-regulated in GM (down-regulated in WM) and those isoforms with a positive fold-change were up-regulated in WM (down-regulated in GM). Isoforms that were statistically significant (q-value<0.05) are shown in red and are listed in Table S2. This figure demonstrates that a larger number of isoforms were significantly up-regulated in GM (681) than in WM (201). Overall, there was a greater spread of data for the isoforms that were up-regulated in GM, both in terms of fold-change and p-values. doi: 10.1371/journal.pone.0078480.g002

Expression of well-known cell type markers
While few studies have explored the transcriptome profiles of GM and WM, there are a number of well-characterized cell type markers [25]. As the cellular make-up of GM and WM is considerably different, cell type markers were used to reflect the composition of the respective regions: established neuronal markers were used as surrogate markers for GM, and established oligodendrocyte markers as surrogate markers for WM (Table 3). While oligodendrocytes are not entirely specific to WM, their markers would be expected to show higher expression in WM than in GM. Seven neuronal markers were chosen, all of which were expressed at higher levels in GM than in WM. For five of these genes (NEFL, GABRA1, SYT1, SLC12A5, SV2B), the up-regulation in GM was statistically significant (q-value<0.05) (Figure 5). The neuronal markers clearly correlate with GM, showing limited to no expression in WM. Ten oligodendrocyte markers were chosen; all were expressed at higher levels in WM than in GM. Seven of these genes (SOX10, GJC2, MOG, MAG, MAL, GAL3ST1, and UGT8) were considered differentially expressed, being up-regulated in WM (q-value<0.05) (Figure 6). These results demonstrate that the transcriptome sequencing results correlate with the known cellular composition of GM and WM.

Validation with in situ hybridization
Six genes from the RNA-Seq dataset were selected for further validation using the Allen Brain Atlas ISH database. The first two were neurofilament heavy polypeptide (NEFH) and myelin oligodendrocyte glycoprotein (MOG). NEFH encodes the neurofilament heavy polypeptide, a subunit of neurofilaments, which are integral components of the neuronal cytoskeleton, making it a relevant gene for the neuron-rich GM [26,27]. NEFH had an FPKM of 57.76 in GM and 0.95 in WM. MOG is myelin-specific in the central nervous system, and its expression level parallels the myelination of axons, suggesting that it plays an integral role in WM [28]. MOG had an FPKM of 8.33 in GM and 49.08 in WM. The ISH results (Figure 7) correlate strongly with the RNA-Seq data. As expected, the NEFH ISH slide shows a higher level of expression in GM, with almost no expression in WM. Conversely, the MOG ISH slides confirm that MOG is expressed at its highest level in WM. The other four genes selected for ISH validation (RGS4, CAMK2A, SLC17A7, NEFM) also correlated well with the RNA-Seq data (Figure S8). Again, these results demonstrate the accuracy of the RNA-Seq results.

Novel gene markers for grey matter and white matter
Validation of the RNA-Seq results via in situ hybridization and correlation with well-known cell type markers demonstrated that RNA-Seq is biologically accurate and thus a useful tool for transcriptome profiling of the human brain. Strict selection criteria were applied to all genes up-regulated in GM to define a normal healthy GM transcriptome: genes had to be >20-fold up-regulated in GM, have an FPKM>5 in GM, and have an FPKM<1 in WM. The aim of this stringency was to find genes expressed at significantly high levels in GM while showing essentially no expression in WM. These criteria reduced the number of genes differentially expressed in GM from 1,218 to 145 (Table S3). As there is less transcriptional activity in WM, the selection criteria for WM were relaxed to >4-fold up-regulation in WM, an FPKM>5 in WM, and an FPKM<5 in GM. These criteria reduced the list of genes up-regulated in WM from 434 to 76 (Table S3). The top 40 genes from both GM and WM are shown in the form of a heat map (Figure 8), which shows a marked contrast between the refined expression profiles of the two tissue types. For the genes selected as GM markers, the differences between the two tissue types were more distinct than those seen for the WM markers. As WM is less transcriptionally active than GM, WM may experience a flow of RNA from the more transcriptionally active GM. These two gene lists set a baseline for the expression profiles of healthy GM and WM.

Enrichment map of pathway analysis
The lists of differentially expressed genes from GM and WM were submitted to DAVID, which sorts genes by gene ontology (GO) terms. GO terms loosely define the functional relevance of a gene, and a gene may belong to numerous ontologies. DAVID then collates the GO terms and determines which ontologies are enriched in the gene list. For GM, a total of 516 GO terms were considered enriched (Table S4). The top 10 GO terms in GM sorted by p-value are listed in

PLOS ONE | www.plosone.org 6 October 2013 | Volume 8 | Issue 10 | e78480 RNA-Seq of the Human White and Grey Matter

Table 2. Top 10 up-regulated isoforms in GM and WM.

Up-regulated in GM
Gene | Description | Chrom. | FPKM GM | FPKM WM | Fold change | q-value | Ensembl ID
KCNK12 | potassium channel, subfamily K, member 12 | chr2 | 2.81439 | 0 | Unique to GM | 0.0441582 | ENSG00000184261
OLFM3 | olfactomedin 3 | chr1 | 2.6659 | 0.00538715 | 494.8627753 | 0.0477668 | ENSG00000118733
KCNC1 | potassium voltage-gated channel, Shaw-related subfamily, member 1 | chr11 | 12.8201 | 0.0341285 | 375.6420587 | 0.0473807 | ENSG00000129159
RIMS3 | regulating synaptic membrane exocytosis 3 | chr1 | 28.2837 | 0.14296 | 197.8434527 | 0.0423304 | ENSG00000117016
C1QL3 | complement component 1, q subcomponent-like 3 | chr10 | 6.69439 | 0.0433715 | 154.3499764 | 0.00357408 | ENSG00000165985
GABRB2 | gamma-aminobutyric acid (GABA) A receptor, beta 2 | chr5 | 17.313 | 0.112452 | 153.9590225 | 8.69E-08 | ENSG00000145864
RAB27B | RAB27B, member RAS oncogene family | chr18 | 3.83441 | 0.0286122 | 134.0131133 | 0.0100352 | ENSG00000041353
GALNT9 | UDP-N-acetyl-alpha-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase 9 (GalNAc-T9) | chr12 | 15.68 | 0.123142 | 127.3326728 | 0.0454933 | ENSG00000182870
CAMK2A | calcium/calmodulin-dependent protein kinase II alpha | chr5 | 79.8284 | 0.664104 | 120.2046667 | 0.018091 | ENSG00000070808
RGS4 | regulator of G-protein signaling 4 | chr1 | 56.2238 | 0.476735 | 117.9351212 | 2.53E-05 | ENSG00000117152

Up-regulated in WM
Gene | Description | Chrom. | FPKM GM | FPKM WM | Fold change | q-value | Ensembl ID
EIF3F | eukaryotic translation initiation factor 3, subunit F | chr11 | 0 | 16.3903 | Unique to WM | 0.00528201 | ENSG00000175390
SUPT5H | suppressor of Ty 5 homolog (S. cerevisiae) | chr19 | 0 | 13.427 | Unique to WM | 0.0474906 | ENSG00000196235
KIFAP3 | kinesin-associated protein 3 | chr1 | 0 | 8.74404 | Unique to WM | 0.000345075 | ENSG00000075945
ATP6V1H | ATPase, H+ transporting, lysosomal 50/57kDa, V1 subunit H | chr8 | 0 | 8.14233 | Unique to WM | 0.00382367 | ENSG00000047249
FLNB | filamin B, beta | chr3 | 0 | 7.70063 | Unique to WM | 0.00648399 | ENSG00000136068
PAK2 | p21 protein (Cdc42/Rac)-activated kinase 2 | chr3 | 0 | 6.18142 | Unique to WM | 0.00233423 | ENSG00000180370
ZDHHC3 | zinc finger, DHHC-type containing 3 | chr3 | 0 | 6.10182 | Unique to WM | 0.0064209 | ENSG00000163812
SEZ6L2 | seizure related 6 homolog (mouse)-like 2 | chr16 | 0 | 5.73869 | Unique to WM | 0.0315557 | ENSG00000174938
DCAF7 | DDB1 and CUL4 associated factor 7 | chr17 | 0 | 4.83543 | Unique to WM | 0.0155276 | ENSG00000136485
MECP2 | methyl CpG binding protein 2 (Rett syndrome) | chrX | 0 | 4.18897 | Unique to WM | 0.00357408 | ENSG00000169057
doi: 10.1371/journal.pone.0078480.t002

Figure 3. Splice variants of GPR123. The intron/exon structure of the four GPR123 isoforms. GPR123-002 and GPR123-003 have been identified previously. GPR123-003 is considered the full-length isoform. GPR123-004 and GPR123-005 were previously unannotated. doi: 10.1371/journal.pone.0078480.g003


Figure 4. Expression levels of the GPR123 isoforms. GPR123-004 was not expressed at all in GM, but became the dominant isoform in WM. GPR123-002, GPR123-003 and GPR123-005 all had reduced expression in WM. The TSS for GPR123-005 was also up-regulated in GM. The changes in expression of GPR123-003 and GPR123-003 were statistically significant (q-value<0.05). Error bars are ± standard error. doi: 10.1371/journal.pone.0078480.g004

Table 4. For GM, the top 10 GO terms related predominantly to synapses and various transport activities. For WM, 284 GO terms were identified as enriched (Table S4). The top 10 GO terms in WM sorted by p-value are listed in Table 4. Of note in this list are the 8th and 9th GO terms, which relate to the ensheathment of neurons and axons.

The lists of GO terms from GM and WM were then fed into Cytoscape and used to modify the total gene list, creating an enrichment map (Figure 9). The enrichment map revealed several large and distinct clusters, including vesicles and membrane (ion-gated channels, transporters and receptors), neuron morphogenesis, ensheathment and myelination, neuron projection, synaptosomes, transmission of nerve impulse, and plasticity and axon/dendrite projection. The enrichment map shows that more GO terms were enriched in GM. There are also more connections between the GM GO terms, which reinforces the observed trend of higher transcriptional activity in GM than in WM. The major GM clusters, related to vesicles and membranes, are ion-gated channels, transporters and receptors. These groups are related to the transport of substances such as ions throughout cells. Clusters of GO terms related to neuron morphogenesis and neuron projection were enriched by genes from both WM and GM. Neuron morphogenesis refers to changes in the underlying neuronal cytoskeleton and its interaction with the plasma membrane [29]. It can involve processes pertaining to axon initiation, growth, guidance and branching; dendritic growth, guidance and branching; and synapse formation and stability. Similarly, neuron projection relates to any process involved in the initiation of neurite protrusion and its subsequent elongation, often involving axons and dendrites [30]. Neuron morphogenesis and neuron projection are related to improving communication between neurons and hence will involve a complex interaction between GM and WM. Only one cluster was clearly dominated by WM: ensheathment and myelination, a process performed by the glia enriched in WM.
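DAVID reports enrichment using the EASE score, a modified and more conservative form of Fisher's exact test. The minimal sketch below illustrates only the unmodified idea behind it: a one-sided hypergeometric (Fisher exact) p-value for a single GO term, computed from how many background genes and how many submitted genes carry that annotation. The function names and the counts in the usage example are hypothetical. The sketch omits both the EASE adjustment and any multiple-testing correction, so it is not DAVID's implementation.

```python
from math import comb


def enrichment_p_value(n_background: int, n_term_background: int,
                       n_list: int, n_term_list: int) -> float:
    """One-sided hypergeometric p-value P(X >= n_term_list).

    n_background      -- annotated genes in the background population
    n_term_background -- background genes annotated with the GO term
    n_list            -- genes in the submitted (e.g. up-regulated) list
    n_term_list       -- genes in the list annotated with the GO term
    """
    total = comb(n_background, n_list)
    max_hits = min(n_term_background, n_list)
    # Sum the probabilities of drawing n_term_list or more annotated genes
    # when n_list genes are sampled at random from the background.
    tail = sum(
        comb(n_term_background, k)
        * comb(n_background - n_term_background, n_list - k)
        for k in range(n_term_list, max_hits + 1)
    )
    return tail / total


def fold_enrichment(n_background: int, n_term_background: int,
                    n_list: int, n_term_list: int) -> float:
    """Annotated fraction in the list relative to the annotated fraction in the background."""
    return (n_term_list / n_list) / (n_term_background / n_background)


# Hypothetical counts: 20,000 background genes, 400 carry the term,
# and 60 of 1,000 listed genes carry it (about 20 expected by chance).
p = enrichment_p_value(20_000, 400, 1_000, 60)
print(f"fold enrichment = {fold_enrichment(20_000, 400, 1_000, 60):.1f}, p = {p:.2e}")
```

Python's integer binomial coefficients keep the tail sum exact, so the only rounding happens in the final division.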


Table 3. Cell type markers.

Neuronal markers
Gene | Description | Chrom. | FPKM GM | FPKM WM | Fold change | q-value | Significant | Ensembl ID
NEFL | neurofilament, light polypeptide | chr8 | 204.57 | 2.89839 | 70.58056369 | 1.00E-06 | yes | ENSG00000104725
GABRA1 | gamma-aminobutyric acid (GABA) A receptor, alpha 1 | chr5 | 18.8275 | 0.358014 | 52.58872558 | 0.00155552 | yes | ENSG00000022355
SYT1 | synaptotagmin I | chr12 | 87.9017 | 1.49415 | 58.83057257 | 0.000873923 | yes | ENSG00000067715
SLC12A5 | solute carrier family 12 (potassium/chloride transporter), member 5 | chr20 | 31.4802 | 0.808493 | 38.93688628 | 1.17E-08 | yes | ENSG00000124140
SV2B | synaptic vesicle glycoprotein 2B | chr15 | 16.0663 | 1.40407 | 11.44266312 | 0.000361367 | yes | ENSG00000185518
SNAP25 | synaptosomal-associated protein, 25kDa | chr20 | 974.198 | 34.271 | 28.42630796 | 0.173004 | no | ENSG00000132639
KCNQ2 | potassium voltage-gated channel, KQT-like subfamily, member 2 | chr20 | 1.31699 | 0 | Unique to GM | 0.780598 | no | ENSG00000075043

Oligodendrocyte markers
Gene | Description | Chrom. | FPKM GM | FPKM WM | Fold change | q-value | Significant | Ensembl ID
SOX10 | SRY (sex determining region Y)-box 10 | chr22 | 20.4556 | 133.049 | 6.504282446 | 0.0450965 | yes | ENSG00000100146
GJC2 | gap junction protein, gamma 2, 47kDa | chr1 | 7.88551 | 57.9386 | 7.347476574 | 4.56E-07 | yes | ENSG00000198835
MOG | myelin oligodendrocyte glycoprotein | chr6 | 8.33271 | 49.0835 | 5.890460606 | 0.000313549 | yes | ENSG00000204655
MAG | myelin associated glycoprotein | chr19 | 62.8597 | 530.705 | 8.442690627 | 1.60E-05 | yes | ENSG00000105695
MAL | mal, T-cell differentiation protein | chr2 | 68.0608 | 546.616 | 8.03128967 | 0.0139824 | yes | ENSG00000172005
GAL3ST1 | galactose-3-O-sulfotransferase 1 | chr22 | 6.15286 | 41.0321 | 6.668784923 | 0.000107928 | yes | ENSG00000128242
UGT8 | UDP glycosyltransferase 8 | chr4 | 23.6732 | 88.8345 | 3.752534512 | 0.00966086 | yes | ENSG00000174607
CSPG4 | chondroitin sulfate proteoglycan 4 | chr15 | 2.95805 | 4.12994 | 1.396169774 | 0.732599 | no | ENSG00000173546
PDGFRA | platelet-derived growth factor receptor, alpha polypeptide | chr4 | 4.67575 | 4.8763 | 1.042891515 | 0.992145 | no | ENSG00000134853
MOBP | myelin-associated oligodendrocyte basic protein | chr3 | 160.105 | 851.149 | 5.316192499 | 0.374118 | no | ENSG00000168314
doi: 10.1371/journal.pone.0078480.t003
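The fold changes in Tables 2 and 3 are ratios of FPKM values, taken as the higher-expressing tissue over the lower one. For example, 204.57 / 2.89839 ≈ 70.58 for NEFL. When the other tissue has an FPKM of 0, the tables report "Unique to GM" or "Unique to WM" instead of a ratio. The "Significant" column follows the q-value<0.05 cut-off. The sketch below reproduces this bookkeeping for four rows of Table 3, assuming those conventions. The q-values come from the differential expression test and are simply copied here; the class and function names are illustrative.

```python
from dataclasses import dataclass

Q_CUTOFF = 0.05  # q-value threshold behind the "Significant" column


@dataclass
class MarkerRow:
    gene: str
    fpkm_gm: float
    fpkm_wm: float
    q_value: float


def fold_change_label(row: MarkerRow) -> str:
    """Higher-expressing tissue over lower-expressing tissue, as in Tables 2 and 3."""
    if row.fpkm_gm == 0 and row.fpkm_wm == 0:
        return "not expressed"
    if row.fpkm_wm == 0:
        return "Unique to GM"
    if row.fpkm_gm == 0:
        return "Unique to WM"
    if row.fpkm_gm >= row.fpkm_wm:
        return f"{row.fpkm_gm / row.fpkm_wm:.2f} (up in GM)"
    return f"{row.fpkm_wm / row.fpkm_gm:.2f} (up in WM)"


def is_significant(row: MarkerRow) -> bool:
    return row.q_value < Q_CUTOFF


# FPKM and q-values copied from Table 3.
rows = [
    MarkerRow("NEFL", 204.57, 2.89839, 1.00e-06),
    MarkerRow("SNAP25", 974.198, 34.271, 0.173004),
    MarkerRow("KCNQ2", 1.31699, 0.0, 0.780598),
    MarkerRow("MOG", 8.33271, 49.0835, 0.000313549),
]

for row in rows:
    verdict = "yes" if is_significant(row) else "no"
    print(f"{row.gene:7s} {fold_change_label(row):18s} significant: {verdict}")
```

For these four genes the printed flags match the Significant column of Table 3. SNAP25 shows that a large fold change (about 28-fold) does not by itself make a gene significant.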

Figure 5. Heatmap of gene expression of neuronal cell markers. Shows the expression profile of neuronal cell markers in GM and WM. There was an expression bias towards GM, with the up-regulation of NEFL, GABRA1, SYT1, SLC12A5 and SV2B being statistically significant (q-value<0.05). doi: 10.1371/journal.pone.0078480.g005

Figure 6. Heatmap of gene expression of oligodendrocyte cell markers. Shows the expression profile of oligodendrocyte cell markers in GM and WM. There was an expression bias towards WM, with the up-regulation of SOX10, GJC2, MOG, MAG, MAL, GAL3ST1 and UGT8 being statistically significant (q-value<0.05). doi: 10.1371/journal.pone.0078480.g006

Discussion

This study is the first comparative transcriptome analysis of GM and WM from the human brain using RNA-Seq. It has shown that, overall, there are high levels of transcription in the human brain, with the identified transcripts covering approximately 37% of the genome. This number is close to the figure described by the ENCODE consortium [31]; the ENCODE study reported that, among 15 different human cell lines, the mean coverage of the human genome by primary transcripts was 39%. The ENCODE study included non-polyadenylated RNAs, while the current study used poly-T oligo-attached magnetic beads to select the polyadenylated RNA fraction for analysis. If the RNA selection for this study had also included non-polyadenylated RNAs, the coverage of the genome would be expected to be much higher than the 39% reported by ENCODE. This would suggest that transcription is pervasive in the human brain. Furthermore, this study has shown that a number of differentially expressed genes and transcripts exist across GM and WM from the same anatomical region of the human brain, namely, the superior frontal gyrus. These results highlight the complexity and variability of the transcriptome and the presence of numerous non-coding elements. Importantly, the transcriptome profiles for each region and the GO enrichment map correlated well with what is known about the function and composition of WM and GM. This previous knowledge has been established predominantly through other techniques, such as histopathology and in situ hybridization.

The human brain is an extremely complex organ. This complexity derives not only from the sheer number and variety of cells present in the brain, but also from its heterogeneity, with cellular composition and density varying over short distances. The tissue samples used for the RNA-Seq analysis were two different tissue types (GM and WM) taken from adjacent regions of the superior frontal gyrus. Although the samples were taken from regions in close proximity, their transcriptome profiles were distinct, suggesting that the two tissue types have separate functions and further highlighting the heterogeneity of brain tissue across small distances. This heterogeneity becomes an issue when attempting to gauge differences in gene expression between case and control samples. If the brain tissue is not selected and matched properly, differences in the transcriptome profiles may reflect variation in the composition and function of different brain regions rather than disease-related changes in the transcriptome. While the heterogeneity of brain tissue can make it difficult to study, the problem can be overcome through appropriate experimental design: the brain regions being compared must have the same cellular composition and the same functional capabilities. Laser capture of specific cell populations could be used to create homogeneous samples for RNA-Seq analysis [32].

Several previous attempts were made to establish specific gene expression patterns as markers for GM and WM [33,34]. These studies, performed using microarrays, provided only partially overlapping results in terms of GM- and WM-specific genes. These discrepancies are due to the different cortical regions analyzed between studies; thus, genes specifically expressed in neurons and oligodendrocytes contributed variably to the calculated GM/WM ratio [3]. Another difficulty stems from the microarray technique itself, which has a limited dynamic range for quantifying gene expression [35]. In the present study, we provide over 40 new gene marker candidates based on their unequivocal abundance in GM or WM. Further, we propose that cumulative gene expression signatures, as now commonly used in cancer research [36], rather than individual genes, should be used as markers for GM and WM. The utility of such marker signatures should, however, be further evaluated using a larger cohort of samples.

Within the last few years, RNA-Seq has become widely used in transcriptome research, progressively replacing microarray techniques. Surprisingly, little has been done in the field of brain diseases using this sequencing approach. This may be partly due to the lack of comprehensive reference data sets that could be used for comparative transcriptome analysis. Wu and colleagues recently reported a comparative analysis of transcriptome profiles derived from the superior temporal gyrus (STG) [37]. This study was, however, limited to the GM of the STG and was performed in the context of schizophrenia. Our study provides the first well-defined data sets that can be used to explore transcriptome aberrations affecting the frontal lobe and, in particular, frontal WM, as in the case of multiple system atrophy and bipolar disorder [38,39]. By combining differential gene and isoform expression analysis with pathway analysis, we were able to demonstrate that biological processes specific to WM and GM, such as myelination and ion transport, respectively, are reflected at the transcriptional level. Thus, transcriptome profiling may be used effectively to investigate cellular physiology, while a more global view of genomic expression may be used to study phenotypic features of complex structures such as the human cortex.

This study identified high levels of various non-coding RNA classes, including snoRNAs, miRNAs and lincRNAs; there were also high levels of locRNAs, novel genes and novel isoforms produced from splicing events. It is possible that these last three classes could further add to the number of non-coding RNAs found in the human brain. Interestingly, while higher levels of transcriptional activity were found in GM, WM was identified as having higher levels of lincRNAs and a higher proportion of unannotated isoforms. Non-coding RNAs (ncRNAs) have previously been suggested to be widely expressed across the brain, where they often have cell-specific regulatory functions [40]. These ncRNAs may play a role in regulating the myelination patterns of axons in WM, a dynamic process that can be altered by experience and can continue for decades in the human brain [41]. The functionality of the unannotated RNAs will need to be explored further in the future. It should also be underlined that a possible influence of age on the transcriptional patterns presented here cannot be excluded; addressing it will require a larger number of samples to enable regression analysis of covariates.

GPR123 is a member of the human G protein-coupled receptor (GPCR) family. GPCRs play important roles in a variety of sensory systems and help modulate blood pressure, food intake, immune responses, and development. It has been suggested that GPR123 is expressed specifically in the central nervous system (CNS); it also shows high levels of conservation across the vertebrate lineage [42]. These factors suggest that GPR123 expression may be a fundamental component of the vertebrate CNS.

The expression patterns of GPR123 in GM and WM offer an interesting insight into the complexity of the transcriptome. At the gene level, GPR123 was up-regulated 5-fold in GM, suggesting that GPR123 is of greater functional relevance in GM. However, analysis at the isoform level showed that the full-length isoform GPR123-003 was not the major contributor to the GPR123 overexpression. Instead, the dominant isoform was the C-terminal truncated GPR123-004, which contributed approximately 70% of total GPR123 expression in GM. The level of GPR123-004 expression (FPKM) in GM and the use of an alternate TSS suggest that it may be relevant to functional GM. In contrast, the dominant isoform in WM was the N-terminal truncated isoform GPR123-005, which contributed approximately 70% of the total GPR123 expression in WM. GPR123-004 and GPR123-005 are known to differ in domain structure, which would lead to distinct functions for the proteins encoded by each isoform. This points to the possibility that GPR123 proteins carry out distinct tissue-specific functions while being transcribed and translated from the same genomic locus.

While more research needs to be directed at elucidating the role of the differing GPR123 isoforms in the human brain, this gene demonstrates the complexity of the transcriptome and shows how greatly splicing events increase the repertoire of transcripts. It also illustrates that total gene expression levels are not the only thing that matters: which isoforms contribute to those levels is also important. Unfortunately, current pathway analysis bioinformatic tools, such as DAVID, have not yet been set up to delineate between different splice variants.
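The GPR123 example depends on what share of a gene's total expression each isoform accounts for. The short sketch below computes those shares and identifies the dominant isoform in each tissue. It assumes that gene-level FPKM equals the sum of the gene's isoform FPKMs. The isoform labels and FPKM values are hypothetical placeholders chosen only to mimic the pattern described above; they are not the measured GPR123 values.

```python
from __future__ import annotations


def isoform_shares(fpkm_by_isoform: dict[str, float]) -> dict[str, float]:
    """Fraction of the gene's summed isoform FPKM contributed by each isoform."""
    total = sum(fpkm_by_isoform.values())
    if total == 0:
        return {name: 0.0 for name in fpkm_by_isoform}
    return {name: fpkm / total for name, fpkm in fpkm_by_isoform.items()}


def dominant_isoform(fpkm_by_isoform: dict[str, float]) -> tuple[str, float]:
    """Isoform with the largest share of expression, together with that share."""
    shares = isoform_shares(fpkm_by_isoform)
    name = max(shares, key=lambda iso: shares[iso])
    return name, shares[name]


# Hypothetical FPKMs for a four-isoform gene; these are not measured values.
gm = {"iso-A": 1.0, "iso-B": 0.5, "iso-C": 7.0, "iso-D": 1.5}
wm = {"iso-A": 0.25, "iso-B": 0.25, "iso-C": 0.0, "iso-D": 1.5}

gene_fold = sum(gm.values()) / sum(wm.values())
print(f"gene level: {gene_fold:.1f}-fold higher in GM")
for tissue, fpkms in (("GM", gm), ("WM", wm)):
    name, share = dominant_isoform(fpkms)
    print(f"{tissue}: dominant isoform {name} ({share:.0%} of gene FPKM)")
```

In this toy case the gene-level result is a 5-fold increase in GM. The isoform breakdown shows that a different isoform dominates in each tissue, which a gene-level pathway analysis would not distinguish.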


Conclusions

This study identified a large number of differentially expressed genes between GM and WM that agreed well with previous studies and the known biology of the human cerebrum, underlining the accuracy and advantages of using RNA-Seq for the analysis of complex transcriptomes. This set of results forms a baseline expression database for healthy GM and WM. It can be used as a resource to help detect abnormal gene expression in GM and WM tissues from patients with neurodegenerative diseases or psychiatric disorders. This is particularly relevant to WM, which may play a major role in Alzheimer's disease (AD) and multiple system atrophy [43,44].

Figure 7. Allen Human Brain Atlas in situ hybridisation for NEFH and MOG genes. A. NEFH: Neurofilament, Heavy Polypeptide. GM FPKM: 57.76; WM FPKM: 0.95. Slide from the dorsolateral cortex of a healthy 20-year-old male. The slide shows high levels of expression in GM. B. MOG: Myelin Oligodendrocyte Glycoprotein. GM FPKM: 8.33; WM FPKM: 49.08. Slide from the dorsolateral cortex of a healthy 20-year-old male. The slide shows high levels of expression in WM. Source: Allen Human Brain Atlas (Hawrylycz et al. 2012 and ©2012 Allen Institute for Brain Science. Allen Human Brain Atlas [Internet]. Available from: http://human.brain-map.org/). doi: 10.1371/journal.pone.0078480.g007


Figure 8. Heatmap of highly abundant genes specifically expressed in GM and WM. The following cut-offs were applied to the WM differentially expressed gene list: >4-fold up-regulation in WM, an FPKM>5 in WM and an FPKM<5 in GM. The following cut-offs were applied to the GM differentially expressed gene list: >20-fold up-regulation in GM, an FPKM>5 in GM and an FPKM<1 in WM. This heatmap shows the top 40 genes from each list. The top half of the heatmap shows genes deemed WM-specific, while the lower half shows genes expressed specifically in GM. doi: 10.1371/journal.pone.0078480.g008
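The two marker filters amount to three threshold checks each, with strict inequalities as stated in the text. The minimal sketch below applies them to a pair of FPKM values. A ratio with a zero denominator is treated as infinite, so a gene expressed in only one tissue can pass the fold-change check. The function names and the third example gene are hypothetical. The sketch also assumes the filters are applied only to genes already called significantly differentially expressed, and it does not test significance itself.

```python
import math


def ratio(numerator: float, denominator: float) -> float:
    """FPKM ratio; infinite when only the numerator tissue expresses the gene."""
    if denominator == 0:
        return math.inf if numerator > 0 else 0.0
    return numerator / denominator


def is_gm_marker(fpkm_gm: float, fpkm_wm: float) -> bool:
    """Strict GM criteria: >20-fold up in GM, FPKM > 5 in GM, FPKM < 1 in WM."""
    return ratio(fpkm_gm, fpkm_wm) > 20 and fpkm_gm > 5 and fpkm_wm < 1


def is_wm_marker(fpkm_gm: float, fpkm_wm: float) -> bool:
    """Relaxed WM criteria: >4-fold up in WM, FPKM > 5 in WM, FPKM < 5 in GM."""
    return ratio(fpkm_wm, fpkm_gm) > 4 and fpkm_wm > 5 and fpkm_gm < 5


# (GM FPKM, WM FPKM): NEFH and MOG from Figure 7, plus one hypothetical gene.
candidates = {
    "NEFH": (57.76, 0.95),
    "MOG": (8.33, 49.08),
    "hypothetical WM-only gene": (0.0, 12.0),
}
for gene, (gm, wm) in candidates.items():
    print(f"{gene}: GM marker={is_gm_marker(gm, wm)}, WM marker={is_wm_marker(gm, wm)}")
```

If the WM criteria are applied literally to the FPKMs quoted in Figure 7, they would exclude MOG despite its 5.9-fold WM bias, because its GM FPKM (8.33) is above 5. This shows how tightly the "FPKM<5 in GM" condition restricts the WM list.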


Table 4. Top 10 GO terms in GM and WM.

Top 10 GO terms in GM
Rank | GO term | p-value
1 | Synapse | 9.62E-45
2 | Synapse part | 5.98E-40
3 | Synaptic transmission | 1.10E-39
4 | Transmission of nerve impulse | 1.63E-39
5 | Plasma membrane part | 2.97E-33
6 | Gated channel activity | 2.38E-30
7 | Plasma membrane | 3.86E-29
8 | Neuron projection | 8.53E-28
9 | Ion channel complex | 1.08E-27
10 | Ion channel activity | 4.01E-27

Top 10 GO terms in WM
Rank | GO term | p-value
1 | Plasma membrane | 4.78E-06
2 | Negative regulation of axonogenesis | 7.22E-06
3 | Regulation of cell projection organization | 9.78E-06
4 | Regulation of action potential in neuron | 9.83E-06
5 | Regulation of cell morphogenesis | 1.08E-05
6 | Negative regulation of cell projection organization | 1.46E-05
7 | Regulation of axonogenesis | 1.48E-05
8 | Ensheathment of neurons | 1.61E-05
9 | Axon ensheathment | 1.61E-05
10 | Sterol metabolic process | 3.00E-05
doi: 10.1371/journal.pone.0078480.t004


Figure 9. Gene Ontology terms enrichment map for GM and WM transcriptomes. Each node represents a different GO term, and the size of the node relates to the level of enrichment of that term. Red in the centre of a node represents up-regulation in GM; red on the edge of a node represents up-regulation in WM. The connections between GO terms are either green or blue: a green connection between two nodes means that both GO terms are in the GM list, while a blue connection means that both GO terms appear in the WM list. The more closely related two GO terms are, the closer they appear on the enrichment map, and a large number of closely related GO terms forms a cluster. Each cluster has been labelled with a general term that captures all of its GO terms. doi: 10.1371/journal.pone.0078480.g009
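In an enrichment map, GO terms are joined when their gene sets overlap substantially, so closely related terms group into clusters. The caption does not state which similarity measure or cut-off was used. The sketch below assumes the overlap coefficient, |A∩B| / min(|A|, |B|), with a hypothetical cut-off of 0.5, and then groups connected terms into clusters. The gene sets (g1 to g8) are toy placeholders, not real GO annotations.

```python
from __future__ import annotations

from itertools import combinations


def overlap_coefficient(a: set[str], b: set[str]) -> float:
    """|A & B| / min(|A|, |B|); equals 1.0 when one gene set contains the other."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def build_edges(terms: dict[str, set[str]],
                cutoff: float = 0.5) -> list[tuple[str, str, float]]:
    """Connect every pair of GO terms whose gene-set similarity reaches the cut-off."""
    edges = []
    for (name_a, genes_a), (name_b, genes_b) in combinations(terms.items(), 2):
        score = overlap_coefficient(genes_a, genes_b)
        if score >= cutoff:
            edges.append((name_a, name_b, score))
    return edges


def clusters(names: list[str], edges: list[tuple[str, str, float]]) -> list[set[str]]:
    """Connected components of the term graph, found by depth-first search."""
    neighbours: dict[str, set[str]] = {name: set() for name in names}
    for a, b, _ in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen: set[str] = set()
    groups: list[set[str]] = []
    for start in names:
        if start in seen:
            continue
        group: set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Toy gene sets (g1..g8 are placeholders, not real GO annotations).
terms = {
    "axon ensheathment": {"g1", "g2", "g3", "g4"},
    "ensheathment of neurons": {"g1", "g2", "g3", "g4"},
    "myelination": {"g1", "g2", "g5"},
    "synapse": {"g6", "g7", "g8"},
}
edges = build_edges(terms)
for group in clusters(list(terms), edges):
    print(sorted(group))
```

The three ensheathment and myelination terms form one cluster, and "synapse" stays isolated. The overlap coefficient is chosen here because it fully links a small term nested inside a larger one.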


Supporting Information

Table S1. Full list of significant DEGs between GM and WM.
(XLSX)

Table S2. Full list of significant DEIs between GM and WM.
(XLSX)

Table S3. Novel gene markers for GM and WM.
(XLSX)

Table S4. Gene Ontology terms enrichment analysis for DEGs between GM and WM.
(XLSX)

Figure S1. Gene body coverage for GM and WM RNA samples used in this study. A-C: GM samples. D-F: WM samples. Each figure shows the number of reads that map to particular portions of the gene body. The x-axis starts at the 5' end of the transcript and moves towards the 3' end (left to right). The y-axis represents the average wigsum. The wigsum is a normalised 'total read count', where a wigsum of 100,000,000 is equal to the coverage achieved by 1 million 100-base reads or 2 million 50-base reads. All figures are skewed to the 3' end of the transcripts, showing a 3' bias caused by poly-A selection of the RNA fraction.
(TIFF)

Figure S2. Expression levels of visinin-like 1 (VSNL1) isoforms. There were two VSNL1 splice variants expressed across GM and WM. VSNL1-001 was up-regulated 34x in GM when compared to WM. The second splice variant, VSNL1-007, was novel and was expressed at low levels across both GM and WM. Error bars are ± standard error.
(TIFF)

Figure S3. Expression levels of synuclein, beta (SNCB) isoforms. There were two SNCB splice variants expressed across GM and WM. SNCB-002 was the dominant isoform and was up-regulated 29x in GM when compared to WM. The second splice variant, SNCB-001, was expressed at low levels across both GM and WM. Error bars are ± standard error.
(TIFF)

Figure S4. Expression levels of reticulon 1 (RTN1) isoforms. There were four RTN1 splice variants expressed across GM and WM. RTN1-202 was the dominant isoform and was up-regulated 10x in GM when compared to WM. The splice variant RTN1-201 was expressed at approximately 20 FPKM in both conditions. RTN1-203 and RTN1-204 were expressed at low levels in both conditions. All four identified splice variants were novel. Error bars are ± standard error.
(TIFF)

Figure S5. Expression levels of transferrin (TF) isoforms. There were two TF splice variants expressed across GM and WM. Both splice variants contributed almost equal levels of expression to both GM and WM. TF-001 was up-regulated 7x in WM when compared to GM. TF-202 was a novel splice variant; it was expressed at higher levels in WM than in GM, but the change in expression was not considered to be statistically significant. Error bars are ± standard error.
(TIFF)

Figure S6. Expression levels of myelin associated glycoprotein (MAG) isoforms. There were four MAG splice variants expressed across GM and WM. MAG-001 was the dominant isoform and was up-regulated 8x in WM when compared to GM. The three other splice variants (MAG-002, MAG-003, MAG-009) were expressed at low levels in both GM and WM. MAG-009 was a novel splice variant. Error bars are ± standard error.
(TIFF)

Figure S7. Expression levels of MARCKS-like 1 (MARCKSL1) isoforms. There were two MARCKSL1 splice variants expressed across GM and WM. Both splice variants contributed high levels of expression to both GM and WM. MARCKSL1-001 was up-regulated 7x in WM when compared to GM. MARCKSL1-002 was a novel splice variant; it was also expressed at higher levels in WM than in GM, but the change in expression was not considered to be statistically significant. Error bars are ± standard error.
(TIFF)

Figure S8. Allen Human Brain Atlas in situ hybridisation for the RGS4, CAMK2A, SLC17A7 and NEFM genes. A. RGS4: Regulator of G-protein signalling 4. GM FPKM: 100.75; WM FPKM: 1.52. Slide from the dorsolateral cortex of a healthy 20-year-old male. The slide shows high levels of expression in GM. B. CAMK2A: Calcium/calmodulin-dependent protein kinase II alpha. GM FPKM: 233.16; WM FPKM: 5.27. Slide from the dorsolateral cortex of a healthy 20-year-old male. The slide shows high levels of expression in GM. C. SLC17A7: Solute carrier family 17 (sodium-dependent inorganic phosphate cotransporter), member 7. GM FPKM: 270.79; WM FPKM: 7.39. Slide from the dorsolateral cortex of a healthy 20-year-old male. The slide shows high levels of expression in GM. D. NEFM: Neurofilament, medium polypeptide. GM FPKM: 224.01; WM FPKM: 4.09. Slide from the dorsolateral cortex of a healthy 20-year-old male. The slide shows high levels of expression in GM. Source: Allen Human Brain Atlas (Hawrylycz et al. 2012 and ©2012 Allen Institute for Brain Science. Allen Human Brain Atlas [Internet]. Available from: http://human.brain-map.org/).
(TIFF)

Acknowledgements

The authors would like to thank Dr. Caroline Janitz for her expert advice and critical comments during manuscript preparation and Adam J. Mills for his computing expertise. They also acknowledge the Garvan Institute for providing the Galaxy server (http://galaxy.garvan.unsw.edu.au).


Author Contributions

Conceived and designed the experiments: JDM WSK GMH MJ. Performed the experiments: JDM. Analyzed the data: JDM TK BJC YK. Wrote the manuscript: JDM WSK GMH MJ.

References

1. Kimberg DY, Farah MJ (1993) A unified account of cognitive impairments following frontal lobe damage: the role of working memory in complex, organized behavior. J Exp Psychol Gen 122: 411-428. doi:10.1037/0096-3445.122.4.411. PubMed: 8263463.
2. Miller EK, Freedman DJ, Wallis JD (2002) The prefrontal cortex: categories, concepts and cognition. Philos Trans R Soc Lond B Biol Sci 357: 1123-1136. doi:10.1098/rstb.2002.1099. PubMed: 12217179.
3. Zilles K, Amunts K (2011) Architecture of cerebral cortex. In: Mai JK, Paxinos G, editors. The Human Nervous System. 3rd ed. Academic Press. pp. 836-895.
4. Lui JH, Hansen DV, Kriegstein AR (2011) Development and evolution of the human neocortex. Cell 146: 18-36. doi:10.1016/j.cell.2011.06.030. PubMed: 21729779.
5. Nagy Z, Westerberg H, Klingberg T (2004) Maturation of white matter is associated with the development of cognitive functions during childhood. J Cogn Neurosci 16: 1227-1233. doi:10.1162/0898929041920441. PubMed: 15453975.
6. Schoenemann PT, Sheehan MJ, Glotzer LD (2005) Prefrontal white matter volume is disproportionately larger in humans than in other primates. Nat Neurosci 8: 242-252. doi:10.1038/nn1394. PubMed: 15665874.
7. Smaers JB, Schleicher A, Zilles K, Vinicius L (2010) Frontal white matter volume is associated with brain enlargement and higher structural connectivity in anthropoid primates. PLOS ONE 5: e9123. doi:10.1371/journal.pone.0009123. PubMed: 20161758.
8. Mills JD, Nalpathamkalam T, Jacobs HI, Janitz C, Merico D et al. (2013) RNA-Seq analysis of the parietal cortex in Alzheimer's disease reveals alternatively spliced isoforms related to lipid metabolism. Neurosci Lett 536: 90-95. doi:10.1016/j.neulet.2012.12.042. PubMed: 23305720.
9. Twine NA, Janitz K, Wilkins MR, Janitz M (2011) Whole transcriptome sequencing reveals gene expression and splicing differences in brain regions affected by Alzheimer's disease. PLOS ONE 6: e16266. doi:10.1371/journal.pone.0016266. PubMed: 21283692.
10. Voineagu I, Wang X, Johnston P, Lowe JK, Tian Y et al. (2011) Transcriptomic analysis of autistic brain reveals convergent molecular pathology. Nature 474: 380-384. doi:10.1038/nature10110. PubMed: 21614001.
11. Hawrylycz MJ, Lein ES, Guillozet-Bongaarts AL, Shen EH, Ng L et al. (2012) An anatomically comprehensive atlas of the adult human brain transcriptome. Nature 489: 391-399. doi:10.1038/nature11405. PubMed: 22996553.
12. Ho-Pun-Cheung A, Bascoul-Mollevi C, Assenat E, Boissière-Michot F, Bibeau F et al. (2009) Reverse transcription-quantitative polymerase chain reaction: description of a RIN-based algorithm for accurate data normalization. BMC Mol Biol 10: 31. doi:10.1186/1471-2199-10-31. PubMed: 19368728.
13. Blankenberg D, Von Kuster G, Coraor N, Ananda G, Lazarus R et al. (2010) Galaxy: a web-based genome analysis tool for experimentalists. Curr Protoc Mol Biol Chapter 19: Unit 19.10.1-21. PubMed: 20069535.
14. Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L et al. (2005) Galaxy: a platform for interactive large-scale genome analysis. Genome Res 15: 1451-1455. doi:10.1101/gr.4086505. PubMed: 16169926.
15. Goecks J, Nekrutenko A, Taylor J (2010) Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol 11: R86. doi:10.1186/gb-2010-11-8-r86. PubMed: 20738864.
16. Trapnell C, Pachter L, Salzberg SL (2009) TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25: 1105-1111. doi:10.1093/bioinformatics/btp120. PubMed: 19289445.
17. Wang L, Wang S, Li W (2012) RSeQC: quality control of RNA-seq experiments. Bioinformatics 28: 2184-2185. doi:10.1093/bioinformatics/bts356. PubMed: 22743226.
18. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G et al. (2010) Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 28: 511-515. doi:10.1038/nbt.1621. PubMed: 20436464.
19. Trapnell C, Roberts A, Goff L, Pertea G, Kim D et al. (2012) Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 7: 562-578. doi:10.1038/nprot.2012.016. PubMed: 22383036.
20. Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES et al. (2011) Integrative genomics viewer. Nat Biotechnol 29: 24-26. doi:10.1038/nbt.1754. PubMed: 21221095.
21. Thorvaldsdóttir H, Robinson JT, Mesirov JP (2012) Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform 14: 178-192. PubMed: 22517427.
22. Huang DW, Sherman BT, Lempicki RA (2009) Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 4: 44-57. PubMed: 19131956.
23. Cline MS, Smoot M, Cerami E, Kuchinsky A, Landys N et al. (2007) Integration of biological networks and gene expression data using Cytoscape. Nat Protoc 2: 2366-2382. doi:10.1038/nprot.2007.324. PubMed: 17947979.
24. Roberts A, Trapnell C, Donaghey J, Rinn JL, Pachter L (2011) Improving RNA-Seq expression estimates by correcting for fragment bias. Genome Biol 12: R22. doi:10.1186/gb-2011-12-3-r22. PubMed: 21410973.
25. Cahoy JD, Emery B, Kaushal A, Foo LC, Zamanian JL et al. (2008) A transcriptome database for astrocytes, neurons, and oligodendrocytes: a new resource for understanding brain development and function. J Neurosci 28: 264-278. doi:10.1523/JNEUROSCI.4178-07.2008. PubMed: 18171944.
26. Kim MS, Chang X, LeBron C, Nagpal JK, Lee J et al. (2010) Neurofilament heavy polypeptide regulates the Akt-beta-catenin pathway in human esophageal squamous cell carcinoma. PLOS ONE 5: e9003. doi:10.1371/journal.pone.0009003. PubMed: 20140245.
27. Lee MK, Cleveland DW (1996) Neuronal intermediate filaments. Annu Rev Neurosci 19: 187-217. doi:10.1146/annurev.ne.19.030196.001155. PubMed: 8833441.
28. Johns TG, Bernard CC (1999) The structure and function of myelin oligodendrocyte glycoprotein. J Neurochem 72: 1-9. doi:10.1111/j.1471-4159.1999.tb11579.x. PubMed: 9886048.
29. Luo L (2002) Actin cytoskeleton regulation in neuronal morphogenesis and structural plasticity. Annu Rev Cell Dev Biol 18: 601-635. doi:10.1146/annurev.cellbio.18.031802.150501. PubMed: 12142283.
30. Khodosevich K, Monyer H (2010) Signaling involved in neurite outgrowth of postnatally born subventricular zone neurons in vitro. BMC Neurosci 11: 18. doi:10.1186/1471-2202-11-18. PubMed: 20146799.
31. Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T et al. (2012) Landscape of transcription in human cells. Nature 489: 101-108. doi:10.1038/nature11233. PubMed: 22955620.
32. Emmert-Buck MR, Bonner RF, Smith PD, Chuaqui RF, Zhuang Z et al. (1996) Laser capture microdissection. Science 274: 998-1001. doi:10.1126/science.274.5289.998. PubMed: 8875945.
33. Rosell A, Vilalta A, García-Berrocoso T, Fernández-Cadenas I, Domingues-Montanari S et al. (2011) Brain perihematoma genomic profile following spontaneous human intracerebral hemorrhage. PLOS ONE 6: e16750. doi:10.1371/journal.pone.0016750. PubMed: 21311749.
34. Sibille E, Arango V, Joeyen-Waldorf J, Wang Y, Leman S et al. (2008) Large-scale estimates of cellular origins of mRNAs: enhancing the yield of transcriptome analyses. J Neurosci Methods 167: 198-206. doi:10.1016/j.jneumeth.2007.08.009. PubMed: 17889939.
35. Courtney E, Kornfeld S, Janitz K, Janitz M (2010) Transcriptome profiling in neurodegenerative disease. J Neurosci Methods 193: 189-202. doi:10.1016/j.jneumeth.2010.08.018. PubMed: 20800617.
36. Costa V, Aprile M, Esposito R, Ciccodicola A (2013) RNA-Seq and human complex diseases: recent accomplishments and future perspectives. Eur J Hum Genet 21: 134-142. doi:10.1038/ejhg.2012.129. PubMed: 22739340.
37. Wu JQ, Wang X, Beveridge NJ, Tooney PA, Scott RJ et al. (2012) Transcriptome sequencing revealed significant alteration of cortical

PLOS ONE | www.plosone.org 17 October 2013 | Volume 8 | Issue 10 | e78480 RNA-Seq of the Human White and Grey Matter

promoter usage and splicing in schizophrenia. PLOS ONE 7: e36351. doi:10.1371/journal.pone.0036351. PubMed: 22558445.
38. Borgwardt S, Fusar-Poli P (2012) White matter pathology--an endophenotype for bipolar disorder? BMC Psychiatry 12: 138. doi:10.1186/1471-244X-12-138. PubMed: 22970986.
39. Halliday GM, Holton JL, Revesz T, Dickson DW (2011) Neuropathology underlying clinical variability in patients with synucleinopathies. Acta Neuropathol 122: 187-204. doi:10.1007/s00401-011-0852-9. PubMed: 21720849.
40. Mercer TR, Qureshi IA, Gokhan S, Dinger ME, Li G et al. (2010) Long noncoding RNAs in neuronal-glial fate specification and oligodendrocyte lineage maturation. BMC Neurosci 11: 14. doi:10.1186/1471-2202-11-S1-P14. PubMed: 20137068.
41. Fields RD (2008) White matter in learning, cognition and psychiatric disorders. Trends Neurosci 31: 361-370. doi:10.1016/j.tins.2008.04.001. PubMed: 18538868.
42. Lagerström MC, Rabe N, Haitina T, Kalnina I, Hellström AR et al. (2007) The evolutionary history and tissue mapping of GPR123: specific CNS expression pattern predominantly in thalamic nuclei and regions containing large pyramidal cells. J Neurochem 100: 1129-1142. doi:10.1111/j.1471-4159.2006.04281.x. PubMed: 17212699.
43. Sachdev PS, Zhuang L, Braidy N, Wen W (2013) Is Alzheimer's a disease of the white matter? Curr Opin Psychiatry 26: 244-251. doi:10.1097/YCO.0b013e32835ed6e8. PubMed: 23493128.
44. Wakabayashi K, Ikeuchi T, Ishikawa A, Takahashi H (1998) Multiple system atrophy with severe involvement of the motor cortical areas and cerebral white matter. J Neurol Sci 156: 114-117. doi:10.1016/S0022-510X(98)00018-5. PubMed: 9559999.

Chapter 3

The role of long intervening non-coding RNAs in healthy brain development and function


3.1 Primary research article: High expression of long intervening non-coding RNA OLMALINC in the human cortical white matter is associated with regulation of oligodendrocyte maturation

Reference Mills, J. D., Kavanagh, T., Kim, W. S., Chen, B. J., Waters, P. D., Halliday, G. M., and Janitz, M. (2015). "High expression of long intervening non-coding RNA OLMALINC in the human cortical white matter is associated with regulation of oligodendrocyte maturation." Molecular Brain 8(1):2.

Contribution I conceived the experiments, wrote the article, prepared the figures, isolated the RNA for sequencing, performed RT-qPCR of OLMALINC in neurons and oligodendrocytes and carried out all of the bioinformatic analyses. PDW contributed to calculating sequence and expression conservation of OLMALINC. TK carried out RT-qPCR of OLMALINC in GM and WM under my supervision. WSK carried out the RNAi of the neurons and oligodendrocytes. BJC and GMH provided feedback on the manuscript. MJ conceived experiments and provided feedback on the manuscript.


Synopsis

One of the major findings from the transcriptome profiling of GM and WM from the human frontal lobe was that lincRNAs are important players in tissue differentiation in the human brain. Linc00263 was up-regulated 4.4-fold in WM when compared to GM. This up-regulation puts the expression level of linc00263 within the top 10% of all genes expressed in WM. While high expression is not in itself proof of functionality, it is an indication of it; hence linc00263 was chosen for further analysis.
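The fold change quoted above is a simple ratio of fpkm values (71.5 fpkm in WM against 16.2 fpkm in GM, as reported in the article that follows). The minimal Python sketch below illustrates that arithmetic, together with the usual definition of fpkm and a percentile-rank check of the kind behind the "top 10%" statement. The function names and the toy expression vector are hypothetical; the published values were produced by the RNA-Seq pipeline described in the article, not by this code.

```python
from typing import Sequence


def fpkm(fragments: int, exon_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of exon per million mapped fragments."""
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


def fold_change(numerator: float, denominator: float) -> float:
    """Plain ratio of two expression values, e.g. WM fpkm / GM fpkm."""
    if denominator <= 0:
        raise ValueError("denominator expression must be positive")
    return numerator / denominator


def percentile_rank(value: float, all_values: Sequence[float]) -> float:
    """Percentage of values in `all_values` that lie strictly below `value`."""
    below = sum(1 for v in all_values if v < value)
    return 100.0 * below / len(all_values)


# Values reported for linc00263 (OLMALINC) in GM and WM.
gm, wm = 16.2, 71.5
print(f"WM/GM fold change: {fold_change(wm, gm):.1f}")

# Hypothetical WM expression vector: a gene whose percentile rank is >= 90
# falls within the top 10% of expressed genes.
toy_wm = [0.5, 1.2, 2.0, 3.3, 5.1, 8.0, 12.4, 20.7, 35.0, 71.5]
print(percentile_rank(71.5, toy_wm) >= 90)
```

The same ratio applied to the isoform-level values (62.5 and 14 fpkm) gives the 4.5-fold figure reported for OLMALINC-002.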

Investigation of the linc00263 locus identified two splice variants that are possibly regulated by an antisense RNA. Further, it was demonstrated that linc00263 has its highest levels of expression in the human brain and that it is primate-specific.

Linc00263 expression was knocked down in human neuron and oligodendrocyte cell lines, altering the expression of a number of genes critical to oligodendrocyte maturation in the latter. This article was one of the first to confirm the functionality of a lincRNA in the human brain. Hence, these research findings facilitated a name change by the Human Genome Organisation Gene Nomenclature Committee (HGNC) from linc00263 to the more descriptive oligodendrocyte maturation-associated long intergenic non-coding RNA (OLMALINC).

Declaration

I certify that this publication was a direct result of my research towards this PhD, and that reproduction in this thesis does not breach copyright regulations.

James Dominic Mills [PhD Candidate]

Mills et al. Molecular Brain (2015) 8:2 DOI 10.1186/s13041-014-0091-9

RESEARCH Open Access High expression of long intervening non-coding RNA OLMALINC in the human cortical white matter is associated with regulation of oligodendrocyte maturation James D Mills1, Tomas Kavanagh1,4, Woojin S Kim2,3, Bei Jun Chen1, Paul D Waters1, Glenda M Halliday2,3 and Michael Janitz1*

Abstract
Background: Long intervening non-coding RNAs (lincRNAs) are a recently discovered subclass of non-coding RNAs. LincRNAs are expressed across the mammalian genome and contribute to the pervasive transcription phenomenon. They display a tissue-specific and species-specific mode of expression and are abundant in the brain.
Results: Here, we report the expression patterns of oligodendrocyte maturation-associated long intervening non-coding RNA (OLMALINC), which is highly expressed in the white matter (WM) of the human frontal cortex compared to the grey matter (GM) and peripheral tissues. Moreover, we identified a novel isoform of OLMALINC that was also up-regulated in the WM. RNA-interference (RNAi) knockdown of OLMALINC in oligodendrocytes, which are the major cell type in the WM, caused significant changes in the expression of genes regulating cytostructure, cell activation and membrane signaling. Gene ontology enrichment analysis revealed that over 10% of the top 25 up- and down-regulated genes were involved in oligodendrocyte maturation. RNAi experiments in neuronal cells resulted in the perturbation of genes controlling cell proliferation. Furthermore, we identified a novel cis-natural antisense non-coding RNA, which we named OLMALINC-AS, which maps to the first exon of the dominant isoform of OLMALINC.
Conclusions: Our study has demonstrated for the first time that a primate-specific lincRNA regulates the expression of genes critical to human oligodendrocyte maturation, which in turn might be regulated by an antisense counterpart.
Keywords: Long intervening non-coding RNA, OLMALINC, Human brain, Frontal cortex, White and grey matter, Antisense RNA

* Correspondence: [email protected]
1School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
Full list of author information is available at the end of the article

© 2015 Mills et al.; licensee BioMed Central. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background
There is a growing recognition that the majority of the transcribed genome does not code for proteins but comprises a variety of non-coding RNAs of different properties, such as length, as well as functionality in transcriptional and epigenetic control [1]. In particular, long non-coding RNAs (lncRNAs) seem to be a very recent evolutionary development, and this is supported by the observation that primates, and humans in particular, express most of the currently characterized lncRNAs. A subcategory of lncRNAs, termed long intervening non-coding RNAs (lincRNAs), shows a tissue-specific mode of expression and substantially contributes to the pervasive transcription that is observed in the mammalian genome. LincRNAs are defined as intervening (relative to the current gene annotations) transcripts that are longer than 200 nucleotides and lack protein-coding capacity [2,3]. The role of lincRNAs is thought to be in gene regulation, and in the brain it has been proposed that non-coding RNAs may play a role in regulating axon myelination in WM and glial cell differentiation [4]. It is thought that lincRNAs act as scaffolds that allow epigenetic changes to occur within the genome, such as increasing or decreasing mRNA levels through sequestration or stabilization, thus regulating the transcription of target genes. Such a process is of the utmost importance in the development of the complicated array of cells that is observed within the brain.

There is a growing body of evidence that the human brain, as the most complex organ in the body, is the richest source of lncRNAs, reflecting the demand for highly complex control mechanisms for the differentiation and development of numerous cell types, including neurons and oligodendroglia [4]. Indeed, our recent RNA sequencing (RNA-Seq) study revealed vast differences between the transcriptomes of the GM and WM in the human superior frontal cortex [5]. While not unexpected, given the differences in cell type populations within these two structures, this study demonstrated that a significant number of annotated and unidentified lincRNAs are expressed at high levels and in disparate quantities between GM and WM.

One of the most noticeable examples of differentially expressed lincRNAs between WM and GM was linc00263 [5]. Compared to most lincRNAs, which remain at low expression levels, i.e. < 2 fragments per kilobase of exon per million fragments mapped (fpkm), linc00263 was expressed at 16.2 and 71.5 fpkm in GM and WM, respectively [5]. The high levels of expression in WM suggest that linc00263 may be functionally important in oligodendrocytes, which constitute the majority of cells in the WM. Here, we validated the differential expression of linc00263 in GM and WM using RT-qPCR. Furthermore, we performed a comparative analysis of linc00263 using vertebrate genomes and found that this gene is specific to primates. Finally, RNAi knockdown of linc00263 in human neuronal and oligodendrocyte cell lines followed by transcriptome sequencing demonstrated that silencing of linc00263, which we renamed OLMALINC (oligodendrocyte maturation-associated long intervening non-coding RNA), results in coordinated changes in the expression of genes controlling the cytoskeleton and maturation of glial cells.

Results
OLMALINC is highly expressed in the white matter of the human frontal cortex
In our recently published global analysis of the transcriptome of the WM and GM of the human frontal cortex, we detected 4.4-fold overexpression of OLMALINC in WM versus GM [5]. The overall expression of OLMALINC was 16.2 fpkm in GM tissue and 71.5 fpkm in WM (Additional file 1: Figure S1A). RNA-Seq analysis revealed that OLMALINC is expressed as two isoforms (Figure 1). The alignment pattern of RNA-Seq reads in the WM samples shows equal amounts of reads aligning to each exon of OLMALINC-002 (Figure 1). This read distribution excludes the possibility that the long terminal repeat (LTR) element covering exon 3 artificially inflates the calculated expression level of OLMALINC. Neither of the isoforms represented OLMALINC-001, which is annotated in Ensembl [Ensembl ID: ENSG00000235823]. The most highly expressed transcript in our comparative analysis represented the NCBI annotated isoform [NCBI ID: NR_026762.1], which, for clarity, we called OLMALINC-002. This isoform is 988 nucleotides (nts) long, consists of three exons (Figure 1), and is overexpressed 4.5-fold in WM (62.5 fpkm) compared to GM (14 fpkm) (Additional file 1: Figure S1B). Thus, OLMALINC-002 is a major contributor to the observed overexpression of the OLMALINC gene in WM. The second alternatively spliced isoform was not annotated in any of the reference databases. We named this novel isoform OLMALINC-003. It is composed of two exons that form a 1517-nt-long transcript (Figure 1). This novel transcript does not encode any protein, as determined by an open reading frame (ORF) finder. OLMALINC-003 is expressed at residual levels in GM (1.6 fpkm) and at 10 fpkm in WM (Additional file 1: Figure S1B).

Figure 1 Genomic context, splice variants and genomic features of OLMALINC. Track 1 represents the chromosomal positioning of OLMALINC. Track 2 shows OLMALINC in genomic context. OLMALINC is located downstream of the gene stearoyl-CoA desaturase (delta-9-desaturase) (SCD) and upstream of wingless-type MMTV integration site family, member 8B (WNT8B). Track 3 is a schematic representation of the exon/intron structure of the OLMALINC-001, -002 and -003 isoforms as well as OLMALINC-AS. Track 4 is a schematic of the repeat elements that appear in this region of the genome, including long terminal repeats LTR2, LTR7B and HERVH-int. An LTR7B element overlaps with a large portion of the final exon of OLMALINC-002 and OLMALINC-003. Track 5 shows the read alignment in the WM samples carried out by TopHat, demonstrating that the repeat elements have not artificially inflated the RNA-Seq fpkm values.

To confirm the specificity of the short sequence read alignment and assembly of the data, both OLMALINC isoforms were reverse transcribed and amplified using isoform-specific primers, and the PCR products were Sanger sequenced. Sequence alignment of the annotated isoform and the sequence extracted from the RNA-Seq data for the novel isoform to the reference genome confirmed the sequence specificity of the RT-PCR results (Additional file 2: Figure S2A and B).

Next, we validated the differential expression patterns of OLMALINC using RT-qPCR of independent sets of samples representing the GM and WM of the frontal cortex. The high expression of OLMALINC, including both isoforms, was confirmed (Figure 2), with a 12.05-fold higher expression level in WM (p-value < 0.05). The OLMALINC primers used for RT-qPCR spanned exons 2 and 3 of OLMALINC-002 and exons 1 and 2 of OLMALINC-003. This primer design ruled out the possibility that the high FPKM value attributed to OLMALINC by RNA-Seq resulted from additional alignment of reads to the LTR element that covers a large portion of the final exon in both OLMALINC isoforms (Figure 1).

Figure 2 RT-qPCR validation of the OLMALINC gene expression patterns in GM and WM samples. This boxplot shows OLMALINC was up-regulated 12.05-fold in WM when compared to GM (p-value < 0.05).

Comparative sequence analysis and expression of OLMALINC in mammals
The sequence conservation of the OLMALINC locus was analyzed using blastn searches, DNA dot plots (https://www.sanger.ac.uk/resources/software/seqtools/) and chain blastz alignments on the UCSC genome browser (https://genome.ucsc.edu/). In primates, homology was detected to all human exons in chimpanzee, gorilla, orangutan and rhesus monkey. In bushbaby, marmoset, mouse, elephant, opossum and platypus, homology was only observed with human exon 1 (Figure 3), and in each case in the same genomic context (downstream of the SCD gene). The UCSC phyloP base-wise conservation across 100 vertebrates shows a peak in conservation 5′ of OLMALINC-001 exon 1, suggesting a possible conserved promoter region (Additional file 3: Figure S3).

Analysis of published RNA-Seq data [6,7] revealed expression of all exons in chimpanzee and orangutan brain, although relative expression of exon 3 was lower than in human (Additional file 4: Figure S4). In gorilla brain, only expression of exon 1 and neighboring regions (perhaps the antisense transcript) was detected. RNA-Seq reads did not map to expected exons in any other species examined (Additional file 4: Figure S4), suggesting that OLMALINC-001 is not expressed outside of great apes. This information is summarized in Figure 3.

OLMALINC expression profiles in non-brain tissues
To determine whether OLMALINC was likely to be brain-specific or commonly expressed in all tissues, a meta-analysis was performed. This analysis utilized publicly available RNA-Seq datasets from the Illumina Transcriptome BodyMap 2 project, which included samples from 16 human tissues, including brain, from individuals of varying ages and sex. The 15 non-brain tissues in the Illumina datasets were compared with the brain transcriptome data. The non-brain tissues comprised liver, kidney, heart, lung, skeletal muscle, breast, adrenal, thyroid, prostate, ovary, testes, adipose, colon, lymph node, and white blood cells. The analysis was carried out using the Tuxedo protocol [8] and showed that the brain expression level of OLMALINC was higher than the expression level in any other tissue type (Figure 4 and Additional file 5: Figure S5). Liver, ovary and breast tissue had the next highest levels of OLMALINC expression; these levels were approximately half of the OLMALINC expression seen in the brain. On average, OLMALINC was expressed 7.5-fold higher in the brain when compared to other sources of tissue.

To validate the expression levels from the Illumina Transcriptome BodyMap 2 project, the expression level of OLMALINC was analysed in another publicly available RNA-Seq dataset produced by The Human Protein Atlas project. The tissue types assessed were brain, liver, kidney, skin and bone marrow (Additional file 5: Figure S5). OLMALINC was expressed at a 3-fold higher level in the brain when compared to liver and on average 20-fold higher than in the other tissue types. Use of this independent set of tissue-specific transcriptome sequence data thus corroborates the pattern seen in the Illumina dataset.

Figure 4 Comparative analysis of the OLMALINC expression levels in brain and 15 other human tissues. The bar graph shows the difference in expression levels of OLMALINC in varying tissues compared to the brain. The RNA-Seq datasets used for this analysis were taken from Illumina's BodyMap2 project. Expression of OLMALINC in the brain is at least 1.9-fold higher than in any other tissue. The y-axis is expression in fpkm.
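The conclusion that OLMALINC-003 does not encode a protein rests on an open reading frame (ORF) search. The article does not specify the tool's settings, so the sketch below only illustrates the idea: it scans all six reading frames for the longest ATG-to-stop ORF and flags a transcript as likely non-coding when no ORF reaches a length threshold. The 300-nt (100-codon) cut-off and the toy sequences are assumptions, not the criteria used in the study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (upper-case A/C/G/T/N)."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq: str) -> int:
    """Length in nt (stop codon included) of the longest ATG...stop ORF
    in any of the six reading frames; ORFs lacking a stop are ignored."""
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, i + 3 - start)
                    start = None  # look for the next ORF in this frame
    return best


def looks_non_coding(seq: str, min_orf_nt: int = 300) -> bool:
    """Crude flag: True when no ORF reaches `min_orf_nt` nucleotides."""
    return longest_orf(seq) < min_orf_nt


toy_transcript = "CCATGAAATTTTAGCC"  # hypothetical: one 4-codon ORF
print(longest_orf(toy_transcript), looks_non_coding(toy_transcript))
```

Dedicated ORF finders add options such as alternative start codons and minimum-length settings; a length threshold alone is only a first-pass indicator of non-coding status.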

Figure 3 Summary of homology and expression of OLMALINC (UCSC uc001kqz.4) exons in representative vertebrate genomes. Homology and expression of all three exons is only detected in human and chimpanzee. Exons 2 and 3 are not detected outside of old world monkeys. There is low expression of exon 3 in orangutan. Outside of great apes there is no expression of any exon. ND = no data. U = homology was only detected in the UCSC genome browser and not by any other method (blastn/blat/dotplots). - = no expression detected. + = expression detected. ? = low levels of expression detected.

The OLMALINC locus co-expresses cis-natural antisense RNA
Recently, we performed strand-specific RNA-Seq analysis of combined RNA samples derived from the GM and WM tissue of the frontal cortex (Mills et al., unpublished data). Bioinformatics analysis of this dataset revealed the presence of an antisense RNA that mapped to the first exon of the OLMALINC-002 isoform (Figure 1). This antisense transcript is 488 nts long, consists of one exon and is expressed at 18 fpkm in the whole brain tissue. We named this RNA OLMALINC-AS to emphasize its cis-natural feature in regard to its overlap with the OLMALINC locus. To validate the sensitivity of the strand-specific RNA-Seq analysis, we amplified and Sanger sequenced a fragment that was unique to the OLMALINC-AS transcript. Alignment to the reference genome confirmed expression of OLMALINC-AS in the human frontal cortex (Additional file 2: Figure S2C).

To further explore the pattern of OLMALINC-AS expression in specific histological structures of the cortex, we quantified the levels of this transcript in GM and WM samples using RT-qPCR. Figure 5 shows 11.7-fold up-regulation (p-value = 0.07) of OLMALINC-AS in WM. This fold-change did not reach the p-value cut-off level for statistical significance.

Figure 5 Quantification of OLMALINC-AS in GM and WM of the frontal cortex. This boxplot shows OLMALINC-AS was up-regulated 11.7-fold in WM (p-value = 0.07).

OLMALINC knockdown in oligodendrocytes and neurons
To provide insight into the influence of OLMALINC on gene expression and regulation of transcription in cortical tissue, we performed RNAi-driven silencing of OLMALINC in the human MO3.13 oligodendrocytes and SK-N-SH neurons. The efficiency of the down-regulation of OLMALINC expression in the two cell lines was estimated by RT-qPCR. We were able to specifically reduce OLMALINC levels in oligodendrocytes and neurons by 4.5- and 3.5-fold, respectively (p-values < 0.05) (Additional file 6: Figure S6).

Next, to assess the impact of OLMALINC depletion at the level of the whole transcriptome, we performed RNA-Seq analysis on the OLMALINC-silenced cells. In MO3.13 oligodendrocytes, 81 genes were up-regulated and 41 were down-regulated (Additional file 7: Table S1). Notably, RNAi-driven reduction of OLMALINC expression led to twice as many genes being up-regulated as down-regulated in these cells. In SK-N-SH neurons, 29 genes showed a significant increase in expression after knockdown of both OLMALINC isoforms, whereas 17 genes were down-regulated (Additional file 8: Table S2). Again, amongst the genes affected by OLMALINC knockdown, nearly twice as many genes showed increased rather than decreased expression in neurons.

OLMALINC knockdown affects genes regulating cell activation and membrane signaling in oligodendrocytes
Gene ontology analysis of differentially expressed genes (DEGs) and differentially expressed isoforms (DEIs) (Additional file 9: Table S3) in oligodendrocytes revealed two superclusters of genes and isoforms that were involved in the regulation and maintenance of cytostructure and cellular adhesion. Moreover, there were four individual clusters related to heart development, cell activation, cell surface receptor-linked signal transduction and positive regulation of cell adhesion (Figure 6). Overall, 56 DEGs and DEIs were clustered in the pathway analysis, comprising 34% of the 162 unique DEGs and DEIs that underwent analysis. Interestingly, amongst the top 25 genes that were up- and down-regulated as a result of OLMALINC silencing in oligodendrocytes (Additional file 7: Table S1), 12 genes contributed to the pathway clusters that were linked to the regulation of cell activation and cell surface receptor-linked signal transduction. In contrast, only 2 of the top 25 DEGs contributed to the cytostructure supercluster, and none contributed to the cell adhesion supercluster (Table 1).

Several genes of the subset shown in Table 1 caught our attention due to their involvement in the physiology of oligodendrocytes. The expression profiles for these selected genes are shown in Figure 7A.

Histone deacetylase 9 (HDAC9) was up-regulated 4.1-fold after knockdown of OLMALINC. The protein product of this gene belongs to class II Hdacs and is expressed mainly in post-mitotic, mature neurons in the murine cerebral cortex [9,10]. Studies of HDAC9 have revealed its association with medulloblastoma [11] and schizophrenia [9].

SRY (Sex-Determining Region Y)-Box 4 (SOX4) was up-regulated 2.8-fold in OLMALINC-depleted oligodendrocytes. SRY (Sex-Determining Region Y)-box (Sox) proteins of group C are strongly expressed in the developing nervous system and have been associated with the maturation of neurons and glia. Prolonged SOX4 expression in cells of the oligodendrocyte lineage is incompatible with the acquisition of a fully mature phenotype, which indicates that the presence of SOX4, and possibly SRY (Sex-Determining Region Y)-Box 11 (SOX11), in oligodendrocyte precursors may normally prevent premature differentiation. SOX4 transgenic mice develop the full spectrum of phenotypic traits that are associated with severe hypomyelination during the first postnatal weeks. In these mice, myelin gene expression is severely reduced, and myelin is dramatically thinned in several central nervous system (CNS) regions [12]. SOX4 expression counteracts the differentiation of radial glia and must be down-regulated before full maturation can occur [13].
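The RT-qPCR figures in this section (for example the 4.5- and 3.5-fold knockdown of OLMALINC and the 11.7-fold WM enrichment of OLMALINC-AS) are relative quantities. The article does not state how they were calculated; one common approach is the comparative Ct (2^-ΔΔCt) method, sketched below under the assumptions of near-100% amplification efficiency and a single reference gene. The function name and all Ct values are hypothetical.

```python
from statistics import mean


def delta_delta_ct_fold(target_test, ref_test, target_ctrl, ref_ctrl):
    """Relative expression (test vs control) by the 2^-ddCt method.

    Each argument is a list of Ct values (replicates) for the target or
    reference gene in the test or control condition. Assumes ~100%
    amplification efficiency for both genes.
    """
    d_ct_test = mean(target_test) - mean(ref_test)
    d_ct_ctrl = mean(target_ctrl) - mean(ref_ctrl)
    dd_ct = d_ct_test - d_ct_ctrl
    return 2 ** (-dd_ct)


# Hypothetical Ct values: target lincRNA in siRNA-treated vs control cells.
fold = delta_delta_ct_fold(
    target_test=[26.2, 26.4, 26.3],
    ref_test=[18.1, 18.0, 18.2],
    target_ctrl=[24.1, 24.0, 24.2],
    ref_ctrl=[18.1, 18.2, 18.0],
)
knockdown = 1 / fold  # expressed as an "x-fold reduction"
print(f"relative expression {fold:.2f}, i.e. ~{knockdown:.1f}-fold knockdown")
```

Reporting the reciprocal of a value below 1, as done here, is how a relative expression of about 0.22 becomes a "4.6-fold reduction"; if primer efficiencies differ markedly from 100%, an efficiency-corrected model would be more appropriate.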

Figure 6 Enrichment map of the Gene Ontology clusters derived from the DEGs and DEIs in oligodendrocytes following silencing of OLMALINC. The size of each node relates to the number of genes in the term. Lack of branching in the bottom clusters indicates that the genes contributing to these clusters are not present in other clusters and thus are unique to a particular pathway.
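In an enrichment map, GO terms are drawn as nodes and edges connect terms whose gene sets overlap, which is why clusters built from genes found nowhere else remain unbranched. The sketch below shows one way to derive such edges using the overlap coefficient |A ∩ B| / min(|A|, |B|). The 0.5 cut-off is an illustrative assumption, and the term memberships are restricted to the genes listed in Table 1 rather than the full gene sets behind Figure 6.

```python
from itertools import combinations


def overlap_coefficient(a: set, b: set) -> float:
    """|A & B| / min(|A|, |B|); 0.0 if either set is empty."""
    return len(a & b) / min(len(a), len(b)) if a and b else 0.0


def enrichment_map_edges(terms: dict, cutoff: float = 0.5) -> list:
    """Return (term1, term2, score) for every term pair at or above `cutoff`."""
    edges = []
    for (t1, g1), (t2, g2) in combinations(terms.items(), 2):
        score = overlap_coefficient(g1, g2)
        if score >= cutoff:
            edges.append((t1, t2, score))
    return edges


# Illustrative memberships: only the Table 1 genes assigned to each cluster.
terms = {
    "cell activation": {"SAA1", "IL7R", "COL3A1", "SOX4", "EGR1", "HDAC9"},
    "cell surface receptor linked signal transduction": {
        "IL7R", "APLN", "GPR126", "CXCL5", "AXL", "ANXA1", "COL3A1", "SOX4", "WNT11",
    },
    "cytostructure supercluster": {"WIPF3", "TRIM63"},
}
node_size = {term: len(genes) for term, genes in terms.items()}  # node size
for t1, t2, score in enrichment_map_edges(terms):
    print(f"{t1} -- {t2}: {score:.2f}")
```

With these memberships the two signalling-related clusters share three genes and are joined, whereas the cytostructure genes share none and stay unconnected, mirroring the unbranched clusters described in the caption.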

G protein-coupled receptor 126 (GPR126) was down-regulated 3.1-fold in RNAi-treated MO3.13 oligodendrocytes. In Schwann cells, GPR126 controls proper development and myelination [14]. A mutation in GPR126 causes severe congenital hypomyelinating peripheral neuropathy in mice and affects the expression of differentiated Schwann cell markers [15].

AXL receptor tyrosine kinase (AXL) is a member of the Tyro3-Axl-Mer (TAM) receptor tyrosine kinase subfamily and was down-regulated 2.5-fold in our RNAi experiments. The TAM receptors are widely expressed in the nervous system, including oligodendrocytes [16]. It has been shown that growth arrest-specific 6 (Gas6), a ligand for the TAM receptors, can affect the severity of demyelination in mice, and a loss of signaling via Gas6 leads to decreased oligodendrocyte survival and increased microglial activation during cuprizone-induced demyelination [17]. Gas6 significantly increased myelination in a dose-dependent manner, suggesting that TAM receptor signaling could be directly involved in myelination by oligodendrocytes [18].

Early growth response 1 (EGR1) is a zinc-finger transcription factor that was 3-fold up-regulated in OLMALINC-silenced oligodendrocytes. EGR1 down-regulation is critical for oligodendrocyte progenitor cell differentiation, and the gene was recently included in a de-repression model of oligodendrocyte lineage progression that relies on the concurrent down-regulation of several inhibitors of differentiation [19]. The expression pattern of the selected genes described above was confirmed in an independent OLMALINC silencing experiment using individual siRNAs followed by RT-qPCR analysis (Additional file 10: Figure S7).

Table 1 Expression levels for selected DEGs enriched in gene ontology analysis following OLMALINC silencing in oligodendrocytes

| Gene | Chr. | Description | GO clusters | FPKM control | FPKM RNAi | Fold change | q-value |
|---|---|---|---|---|---|---|---|
| SAA1 | chr11 | Serum amyloid A1 | Cell activation | 12.64 | 2.20 | −5.73 | 0.0014 |
| IL7R | chr5 | Interleukin 7 receptor | Cell activation, cell surface receptor linked signal transduction | 1.60 | 0.42 | −3.83 | 0.0194 |
| APLN | chrX | Apelin | Cell surface receptor linked signal transduction | 17.51 | 5.38 | −3.25 | 2.99E-07 |
| GPR126 | chr6 | G protein-coupled receptor 126 | Cell surface receptor linked signal transduction | 2.67 | 0.88 | −3.04 | 0.0116 |
| CXCL5 | chr4 | Chemokine (C-X-C motif) ligand 5 | Cell surface receptor linked signal transduction | 8.84 | 2.97 | −2.98 | 4.01E-04 |
| AXL | chr19 | AXL receptor tyrosine kinase | Cell surface receptor linked signal transduction | 4.23 | 1.75 | −2.42 | 0.0114 |
| ANXA1 | chr5 | Annexin A1 | Cell surface receptor linked signal transduction | 4.80 | 2.06 | −2.33 | 4.41E-04 |
| COL3A1 | chr2 | Collagen, type III, alpha 1 | Cell activation, cell surface receptor linked signal transduction | 380.21 | 191.13 | −1.99 | 0.0396 |
| SOX4 | chr6 | SRY (sex determining region Y)-box 4 | Cell activation, cell surface receptor linked signal transduction | 11.85 | 33.56 | 2.83 | 6.13E-11 |
| EGR1 | chr5 | Early growth response 1 | Cell activation | 1.81 | 5.41 | 2.99 | 3.92E-06 |
| HDAC9 | chr7 | Histone deacetylase 9 | Cell activation | 0.83 | 3.33 | 4.02 | 0.0486 |
| WIPF3 | chr7 | WAS/WASL interacting protein family, member 3 | Cytostructure supercluster | 0.79 | 3.44 | 4.37 | 4.09E-08 |
| TRIM63 | chr1 | Tripartite motif containing 63, E3 ubiquitin protein ligase | Cytostructure supercluster | 0.35 | 1.59 | 4.62 | 0.0128 |
| WNT11 | chr11 | Wingless-type MMTV integration site family, member 11 | Cell surface receptor linked signal transduction | 0.13 | 0.93 | 7.38 | 0.0439 |

Chr. - chromosome.

The cell proliferation pathway is affected by knockdown of OLMALINC in neurons
GO enrichment analysis of genes whose expression had been affected by OLMALINC knockdown in neurons grouped GO terms for cell proliferation into a single cluster. This cluster was composed of 11 genes that comprised 24% of all DEGs resulting from OLMALINC knockdown in neurons (Additional file 11: Table S4). Several genes within this subset have been previously described to be involved in neuronal biology.

SRY (Sex-Determining Region Y)-box 2 (SOX2) was 1.8-fold up-regulated following OLMALINC knockdown. Sox2 plays a role in the maturation and survival of embryonic and adult neurons, and Sox2 expression is high in undifferentiated neurons but declines upon differentiation [20]. Sox2 deficiency results in decreased precursor cell proliferation and decreased generation of new neurons in adult mouse neurogenic regions [21]. Caspase-3 (CASP3) was up-regulated 1.6-fold following knockdown of OLMALINC. CASP3 is a key mediator of apoptosis in neuronal cells. The non-apoptotic functions of caspase-3 in neuronal cells include synaptic plasticity, dendrite pruning, and learning and memory processes [22]. Finally, SOX4, which was up-regulated 1.8-fold in neurons, promotes neuronal differentiation and has a central regulatory role during neuronal maturation. It mechanistically separates cell cycle withdrawal from the establishment of neuronal properties [23]. Figure 7B presents differences in the expression levels of the SOX2, CASP3 and SOX4 genes as a result of OLMALINC silencing.

Figure 7 Expression levels of selected DEGs in OLMALINC-depleted oligodendrocytes (A) and neurons (B). In oligodendrocytes, EGR1, HDAC9 and SOX4 were all up-regulated after RNAi of OLMALINC, whereas AXL and GPR126 were down-regulated. In neurons, CASP3, SOX2 and SOX4 were up-regulated after RNAi of OLMALINC. The expression levels are in fpkm as calculated by RNA-Seq. Level of significance: *q-value < 0.05, **q-value < 0.02, ***q-value < 0.01.

Discussion
This study shows that OLMALINC is overexpressed in the human brain compared to peripheral tissue and that WM is a major source of this overexpression. Moreover, silencing of OLMALINC in oligodendrocytes and neurons affects the expression of functionally related gene sets.
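The fold changes in Table 1 appear to follow a signed-ratio convention: RNAi/control for increases and -(control/RNAi) for decreases. The sketch below assumes that convention and recomputes it from the tabulated fpkm values for four rows that it reproduces to two decimal places; small deviations in some other rows (for example SAA1 or WNT11) would be expected if the published ratios were computed from unrounded fpkm values. The q < 0.05 filter mirrors the lowest significance level used in Figure 7. The function name is hypothetical.

```python
def signed_fold_change(control: float, treated: float) -> float:
    """treated/control for increases; -(control/treated) for decreases."""
    if control <= 0 or treated <= 0:
        raise ValueError("fpkm values must be positive")
    ratio = treated / control
    return ratio if ratio >= 1 else -1 / ratio


# (gene, fpkm control, fpkm RNAi, q-value) copied from four rows of Table 1.
rows = [
    ("APLN", 17.51, 5.38, 2.99e-07),
    ("AXL", 4.23, 1.75, 0.0114),
    ("SOX4", 11.85, 33.56, 6.13e-11),
    ("EGR1", 1.81, 5.41, 3.92e-06),
]

for gene, ctrl, rnai, q in rows:
    if q < 0.05:  # keep only significant changes
        print(gene, round(signed_fold_change(ctrl, rnai), 2))
```

A signed ratio keeps up- and down-regulation on the same magnitude scale (a halving reads as -2 rather than 0.5), which is why the down-regulated genes in Table 1 carry negative values.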

Furthermore, this lincRNA is co-expressed with an antisense counterpart, OLMALINC-AS, which shows a similar expression pattern, i.e., up-regulation in WM compared to GM. Such co-expression of sense and antisense RNAs with overlapping loci has previously been observed for the BACE1 mRNA and its natural antisense non-coding partner BACE1-AS [24]. Comparative analysis of the OLMALINC sequence demonstrates its unique expression in primates.

Recent genome-wide surveys of lincRNAs allow us to draw some general observations about this particular RNA species [25]. First, it has been suggested that most lincRNAs are expressed at low levels, usually below 2 fpkm. In contrast, OLMALINC and its antisense counterpart, OLMALINC-AS, are expressed at levels of at least 16.2 fpkm and reach 71.5 fpkm in WM. Moreover, our meta-analysis of OLMALINC expression in peripheral tissues revealed at least 1.9-fold higher expression of this transcript in brain than in 15 other cell types of the body. Indeed, lincRNAs have also been shown to exhibit tissue-specific expression patterns in model organisms such as zebrafish [26]. In particular, the overexpression of OLMALINC in WM corroborates the outcome of OLMALINC silencing, which affected a number of genes involved in oligodendrocyte maturation (see below).

Our comparative analysis of the OLMALINC sequence with vertebrate genomes revealed a remarkably high degree of conservation in great ape and old world monkey genomes (Figure 3). In all other mammalian species examined, only homology with exon 1 was observed. There was also evidence for conservation of the region 5′ of exon 1, which was detected as far back as opossum and even platypus (Figure 3). Although all exons are present in old world monkeys, detectable expression of OLMALINC (and its antisense) was only observed in great apes, with exon 3 under-expressed outside of humans (Additional file 4: Figure S4). Loss of expression of exons 2 and 3 in gorilla must have occurred after divergence from the human/chimp lineage. These findings suggest that the building blocks for OLMALINC exon 1 are ancient (>200 mya, being common to all mammals), whereas exons 2 and 3 only arose in the ancestor of old world monkeys (~30-45 mya) [27]. The evolution of exons 2 and 3 was followed by the expression of OLMALINC in at least the Hominidae ancestor (>17 mya), with further specialization (high expression of exon 3) only in humans (<6 mya). Although the original building blocks of OLMALINC appear ancient, its recently evolved expression suggests that it is a young gene. Indeed, less than 6% of zebrafish lincRNAs have detectable sequence conservation with mouse or human lincRNAs [28], and merely 12% of human and mouse lincRNAs are conserved in other species [29,30].

It has been previously suggested that lincRNAs tend to act in cis, thus affecting genes located within 10-300 kb of the particular lincRNA locus [31,32]. Interestingly, we did not observe elevated expression of the protein-coding genes located within 300 kb of the OLMALINC locus (data not shown), which is in contrast to one study [33] but corroborates observations by others showing that only 3% of human lincRNAs have expression profiles correlated with those of their neighboring genes [34]. Comparative analysis of the OLMALINC sequence corroborated earlier observations of its fast evolution [29,30]. Indeed, OLMALINC is only conserved within primate genomes, and its homology abruptly drops for rodents and other non-primate vertebrates. The presence of the human-specific exon 3 in the OLMALINC-002 isoform further supports the thesis that lincRNAs belong to rapidly evolving segments of the primate genome [35].

Along with metastasis associated lung adenocarcinoma transcript 1 (MALAT-1) [36] and polyadenylated nuclear RNA (PAN) [37], OLMALINC belongs to the exceptionally abundantly expressed lincRNAs. High expression levels of some lincRNAs may facilitate their trans function as a decoy that titrates proteins away from their potential targets, as reported for the growth arrest-specific 5 (Gas5) and P21 associated ncRNA DNA damage activated (PANDA) lincRNAs [38,39]. Hence, maintaining a titration mechanism would require high numbers of lincRNA molecules interacting with numerous proteins [28].

RNAi silencing of OLMALINC was less effective in neurons than in the oligodendrocytic cell line. First, this could result from cell type-specific differences in siRNA uptake capacity between neurons and oligodendrocytes [40]. Second, the initial level of OLMALINC in neurons was much lower than its expression level in oligodendrocytes, which further diminished the silencing effect in neurons.

Silencing of OLMALINC expression in oligodendrocytes affected the expression of genes, such as SOX4, GPR126 and EGR1, which are involved in the differentiation of these cells from the neuronal stem lineage and in the control of myelination. Several non-coding RNAs (ncRNAs) have been shown to exhibit dynamic expression patterns during neuronal and oligodendrocyte lineage specification, neuronal-glial fate transitions, and myelination; these include ncRNAs associated with differentiation-specific nuclear subdomains, such as Gomafu and Neat1, ncRNAs associated with developmental enhancers, and genes encoding important transcription factors and homeotic proteins [41,42]. Up-regulation of the histone deacetylase 5 and 9 genes (Additional file 7: Table S1) upon depletion of OLMALINC further underpins previous observations of the involvement of lincRNAs in chromatin remodeling that controls cellular differentiation, as has been shown in lineage-specific
gene expression programs in mouse embryonic stem cells [43]. The number of genes whose expression was affected by OLMALINC silencing is similar to the average of 175 protein-coding genes whose expression changed in each of 137 individual lincRNA knockdowns in the study by Guttman et al. (2011). Moreover, the expression changes included both up- and down-regulation, suggesting that OLMALINC can either activate or repress transcription. This contrasts with previous suggestions that lincRNAs act mainly to repress gene expression [44].

Conclusions
Our evidence indicates that the recently evolved OLMALINC is a primate-specific transcript that is a major contributor to the maintenance of oligodendrocyte maturation. This conclusion is in line with previous observations that lincRNAs are involved in development of the nervous system [45]. However, it should be emphasized that further characterization of OLMALINC function will require systematic studies, including defining all protein complexes with which the lincRNA possibly interacts, determining where these protein interactions assemble on the RNA, and ascertaining whether they bind simultaneously or alternatively. Moreover, understanding how OLMALINC–protein or –DNA interactions give rise to specific patterns of gene expression will require determination of the functional contribution of each interaction and possible localization of the complex to its genomic targets.

Methods
Brain tissue and RNA extraction
Samples representing the GM and WM of the human superior frontal gyrus (SFG) were obtained from the Sydney Brain Bank following ethical approval from the Human Research Ethics Committee of the University of New South Wales. The GM and WM tissues were obtained from three individuals (aged 79, 94 and 98) without significant neuropathology. The post mortem interval (PMI) of the samples ranged from 8–24 h, and the pH ranged from 5.77–6.65. For RT-qPCR experiments, another two SFG samples were used that were matched to the previous samples regarding age, PMI and pH.

Total RNA was extracted from approximately 100 mg of tissue for each case using Qiagen's RNeasy Lipid Tissue RNA Extraction Kit. RNA integrity numbers (RINs) were determined using an Agilent 2100 BioAnalyzer RNA Nano Chip. The RIN values ranged from 4.9–7.2, a range previously shown to have little effect on relative gene expression ratios [46].

Cell lines
MO3.13 oligodendrocytes and SK-N-SH neurons were obtained from the ATCC (Manassas, VA) and were cultured in 12-well plates in Dulbecco's modified Eagle's medium containing 10% fetal calf serum, 2 mM glutamine, 100 IU/ml penicillin, and 100 μg/ml streptomycin at 37°C in humidified air containing 5% CO2. Cell culture media and additives were obtained from Invitrogen (Melbourne, Australia) unless stated otherwise.

Reverse transcription and PCR
Using total RNA samples previously used for the RNA-Seq analysis [5], the OLMALINC-002, −003 and -AS transcripts were reverse transcribed and amplified using the Qiagen One-Step RT-PCR kit. The forward and reverse primers used to amplify OLMALINC-002 were TGTGGTACTAAGCTTGACAGC and TCATAGGTGGATCTCCTCACG; for OLMALINC-003, TAGACCTTGCTAACCAGGACG and TGGTATCAGTTAGCGTGGGGC; and for OLMALINC-AS, CCCGAGATTCTTTGTGGGCT and CTCTCCCACCACACACCAC. Standard RT-PCR conditions, as recommended by Qiagen, were used. PCR products of 560, 1100 and 280 bp were purified and Sanger sequenced.

Quantitative PCR
Primers were designed for the OLMALINC, OLMALINC-AS and proteasome (prosome, macropain) subunit beta type 4 (PSMB4) genes. For quantification of EGR1, HDAC9, SOX4, GPR126 and AXL expression in MO3.13 oligodendrocytes, primers were purchased from Qiagen. We used PSMB4 as a housekeeping gene, as described previously [47]. The PSMB4 forward and reverse primers were ordered from Qiagen (HS_PSMB4_1_SG QuantiTect primer assay). The sequence of the OLMALINC forward primer was GACTCCTTTGGGAGACCAGTG, and that of the reverse primer was AGGTCACAGGGGATTTGATGG. The OLMALINC primers spanned the fragment of the transcript that is common to the OLMALINC-002 and −003 isoforms. The sequence of the OLMALINC-AS forward primer was GTCACTGGGGAGAACGTGAC, and that of the reverse primer was CTCTCCCACCACACACCAC. The OLMALINC-AS primers were unique to the -AS transcript. All of the primers used for RT-qPCR had an efficiency of 90–110%. The RNA samples were reverse-transcribed using the Qiagen QuantiTect Reverse Transcription (RT) Kit, and gene expression was quantified using the QuantiTect SYBR Green PCR master mix (Qiagen). The PCR reaction was performed on a Rotor-Gene 6000 (Qiagen) using three independent GM and WM samples and MO3.13 oligodendrocytes, and each reaction was performed in triplicate. PSMB4 was used to normalize the results from each RT-qPCR run to reduce batch effects and correct any variation in template input [48]. Expression levels were calculated using 2^−ΔCt
and fold-change was calculated using the 2^−ΔΔCt method [49]. Statistically significant changes in gene expression were assessed using the R project for statistical computing (www.r-project.org).

RNA interference
RNAi knockdown of OLMALINC was carried out using four commercially prepared siRNA oligonucleotides (Qiagen) specific to different fragments of the OLMALINC transcript. Briefly, MO3.13 and SK-N-SH cells were seeded at 40% confluence in 6-well plates in antibiotic-free media, and an equimolar mix of the four oligonucleotides was transfected using Lipofectamine 2000 and Opti-MEM I (Invitrogen) following the manufacturer's protocol. For validation experiments, MO3.13 oligodendrocytes were transfected with two individual siRNAs from the four-siRNA mix. Transfection efficiency was assessed by substituting the siRNA with fluorescently labelled BLOCK-iT reagent (Invitrogen) in the same transfection procedure; fluorescence microscopy confirmed >95% transfection efficiency.

RNA isolation, library preparation and sequencing
After 48 h of culture, total RNA was isolated using the RNeasy Mini Kit (Qiagen), followed by RNase-free DNase treatment to remove traces of genomic DNA. RNA quality was assessed using the Agilent 2100 BioAnalyzer RNA Nano Chip, and RIN values ranged between 8.0 and 9.0. Six RNA samples (three MO3.13 and three SK-N-SH replicates) were prepared for sequencing according to the Illumina TruSeq RNA sample preparation guide and subjected to 100 bp paired-end sequencing on an Illumina HiSeq 1000.

Meta-analysis of Illumina expression data of non-brain tissues
Meta-analysis of Illumina BodyMap2 transcriptome files was carried out to determine the expression distribution of OLMALINC across tissues at the gene level. The BodyMap2 project was carried out by Illumina to provide a sample RNA-Seq dataset of 16 individual and mixed tissues. A second analysis was performed on five independent RNA-Seq datasets produced by The Human Protein Atlas (http://www.proteinatlas.org/tissue). The meta-analysis was carried out on Galaxy using the Tuxedo protocol [13], and files from Illumina's BodyMap2 project and the Human Protein Atlas project were imported into Galaxy from the NCBI Sequence Read Archive (SRA) (project accession number: ERP000546, http://www.ncbi.nlm.nih.gov/sra/?term=ERP000546; project accession number: ERP003613, http://www.ncbi.nlm.nih.gov/sra/?term=ERP003613). The selected SRA reads were aligned with TopHat, which uses Bowtie to map short sequence reads to the H. sapiens reference genome (build hg19). The default settings for TopHat were used. Cufflinks was then used to assemble the aligned reads into individual transcripts by inferring splicing structure, producing a minimal number of predicted transcripts through parsimonious assembly. Cufflinks also normalized the read count of each input file to allow calculation of the relative abundance of each transcript in fpkm. To guide the assembly process, the iGenomes UCSC hg19 full annotation GTF file was used. These steps were performed for all 16 tissue types that were assessed. Subsequently, the results for each tissue type were merged with the files for the brain transcriptome and the iGenomes reference annotation via Cuffmerge and then passed through Cuffdiff.

Comparative analysis with vertebrate genomes
The UCSC Genome Browser website (http://genome.ucsc.edu/index.html) contains the reference sequences and working draft assemblies of a large collection of genomes from a variety of organisms. Using the UCSC Genome Browser, the level of sequence conservation of OLMALINC across numerous species was analyzed.

Gene ontology enrichment analysis
The lists of DEGs and DEIs were entered into the Database for Annotation, Visualization and Integrated Discovery (DAVID) (http://david.abcc.ncifcrf.gov/) [50]. DAVID can only utilize annotated genes/isoforms; thus, all un-annotated and ambiguously annotated genes/isoforms were removed from the lists. DAVID tests GO terms for over-representation in each of the DEG and DEI lists. The GO term lists produced by DAVID were processed using the 'Enrichment Map' plug-in for 'Cytoscape' (http://www.cytoscape.org/) [51], which produces a visual output of the text-based GO term lists.

Data access
The sequence data have been submitted to the NCBI Short Read Archive with accession number SRA602249.

Additional files
Additional file 1: Figure S1. Differential expression of the OLMALINC gene and its isoforms as revealed by RNA-Seq analysis of GM and WM samples from the human frontal cortex. (A) Expression levels of the OLMALINC gene in GM and WM. (B) Expression levels of the OLMALINC-002 and −003 isoforms in GM and WM; bars represent SD. Level of significance: **q-value < 0.02, ***q-value < 0.01.

Additional file 2: Figure S2. Sequence alignment of Sanger-sequenced RT-PCR OLMALINC products with sequences derived from RNA-Seq data. The query sequence is derived from Sanger sequencing and the subject sequence from RNA-Seq data. (A) Sequence alignment of OLMALINC-002. (B) Sequence alignment of OLMALINC-003. (C) Sequence alignment of OLMALINC-AS. The letter N marks nucleotides where sequencing failed to develop a consensus sequence.

Additional file 3: Figure S3. Homology to OLMALINC detected in the chimpanzee, gorilla, orangutan, rhesus monkey, marmoset and opossum genomes with BLAT searches on the UCSC Genome Browser. All homologies are displayed relative to the SCD gene. In the opossum genome (reverse orientation), homology to exon 1 was not detected with BLAT searches but was detected in the Multiz alignments track. In rhesus monkey, exon 3 is not detected in the correct position with BLAT searches but is detected with blastz. Exon 1 homologies are boxed in red, exon 2 in green, and exon 3 in blue. Basewise conservation calculated by PhyloP across 100 vertebrates shows a peak in conservation upstream of OLMALINC exon 1, indicating a possible promoter region.

Additional file 4: Figure S4. Read coverage from male brain RNA-seq data across the OLMALINC homologous regions in human, chimpanzee, gorilla, orangutan, rhesus monkey and opossum. Outside of great apes there is no evidence for expression of any exon. Exon 1 homologies are boxed in red, exon 2 in green, and exon 3 in blue. Read depth bedGraph files were available for human, gorilla, rhesus monkey and opossum [11]. For chimpanzee and orangutan, RNA-seq reads [12] were mapped with TopHat v2.0.9 using the command line options: tophat -p 4 -a 8 -i 40 -m 1 -I 1000000 --coverage-search --microexon-search. Read depth coverage bedGraph files were generated with the samtools depth utility.

Additional file 5: Figure S5. Comparative analysis of OLMALINC expression across five tissue samples. The RNA-Seq datasets were taken from the Human Protein Atlas project (http://www.proteinatlas.org/). Again, OLMALINC is expressed at its highest level in brain tissue, 3-fold higher than in liver, the tissue with the next highest level of expression. This independent dataset confirms the results from Figure 4. The y-axis is expression in fpkm.

Additional file 6: Figure S6. Quantification of the OLMALINC transcript following its knockdown in oligodendrocytes and neurons using RT-qPCR. OLMALINC levels in oligodendrocytes and neurons were reduced by 4.5- and 3.5-fold, respectively (p-values < 0.05). N: neurons; O: oligodendrocytes.

Additional file 7: Table S1. Full list of differentially expressed genes and isoforms between the oligodendrocyte cell controls and the oligodendrocyte cells that underwent RNAi.

Additional file 8: Table S2. Full list of differentially expressed genes and isoforms between the neuronal cell controls and the neuronal cells that underwent RNAi.

Additional file 9: Table S3. Enriched gene ontology terms for genes and isoforms differentially expressed in the oligodendrocyte cell line.

Additional file 10: Figure S7. RT-qPCR validation of the EGR1, HDAC9, SOX4, AXL and GPR126 gene expression pattern in MO3.13 oligodendrocytes silenced with individual OLMALINC siRNAs. EGR1, HDAC9 and SOX4 were up-regulated 4-, 4.4- and 3.6-fold, respectively, in RNAi-treated oligodendrocytes compared to control (p-value < 0.05). The GPR126 and AXL genes were down-regulated 3.5- and 7.5-fold, respectively. Con: control.

Additional file 11: Table S4. Enriched gene ontology terms for genes and isoforms differentially expressed in the neuronal cell line.

Abbreviations
AXL: AXL receptor tyrosine kinase; BACE1: Beta-site APP-cleaving enzyme 1; BACE1-AS: Beta-site APP-cleaving enzyme 1 antisense RNA; CASP3: Caspase-3; CNS: Central nervous system; EGR1: Early growth response 1; Fpkm: Fragments per kilobase of exon per million fragments mapped; Gas5: Growth arrest-specific 5 (non-protein coding); Gas6: Growth arrest-specific 6; GM: Grey matter; Gomafu: Myocardial infarction associated transcript (non-protein coding); GPR126: G protein-coupled receptor 126; Hdac: Histone deacetylase; HDAC9: Histone deacetylase 9; HIF-1alpha: Hypoxia inducible factor 1, alpha subunit (basic helix-loop-helix transcription factor); ING4: Inhibitor of growth family, member 4; OLMALINC: Oligodendrocyte maturation-associated long intervening non-coding RNA; OLMALINC-AS: Oligodendrocyte maturation-associated long intervening non-coding RNA, antisense; lincRNAs: Long intervening non-coding RNAs; lncRNAs: Long non-coding RNAs; LTR: Long terminal repeat; Malat1: Metastasis associated lung adenocarcinoma transcript 1; Neat1: Nuclear paraspeckle assembly transcript 1 (non-protein coding); ncRNA: Non-coding RNA; nts: Nucleotides; ORF: Open reading frame; PAN: Polyadenylated nuclear RNA; PANDA: P21 associated ncRNA DNA damage activated; PMI: Post mortem interval; PSMB4: Proteasome (prosome, macropain) subunit, beta type, 4; RIN: RNA integrity number; RNAi: RNA interference; RNA-Seq: RNA sequencing; RT-qPCR: Reverse transcription quantitative polymerase chain reaction; SCD: Stearoyl-CoA desaturase (delta-9-desaturase); SFG: Superior frontal gyrus; Sox: SRY (sex-determining region Y); SOX2: SRY (sex-determining region Y)-box 2; SOX4: SRY (sex-determining region Y)-box 4; SOX11: SRY (sex-determining region Y)-box 11; TAM: Tyro3/Axl/Mer; TFs: Transcription factors; WM: White matter.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
JDM, WSK, PDW, GMH and MJ conceived and designed the experiments. JDM, TK and WSK performed the experiments. JDM, TK, BJC and PDW analyzed the data. JDM, WSK, PDW, GMH and MJ wrote the manuscript. All authors read and approved the final manuscript.

Acknowledgements
Tissues were received from the Sydney Brain Bank at Neuroscience Research Australia and the New South Wales Tissue Resource Centre at the University of Sydney, which are supported by the National Health and Medical Research Council of Australia (NHMRC), University of New South Wales, Neuroscience Research Australia, Schizophrenia Research Institute and National Institute of Alcohol Abuse and Alcoholism (NIH (NIAAA) R24AA012725). This research was supported by the National Health & Medical Research Council of Australia (Project grant #1022325 to WSK and Fellowship #630434 to GMH) and Brain Foundation Australia (to MJ). The authors would like to thank Caroline Janitz for her expert advice regarding Illumina sequencing.

Author details
¹School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia. ²Neuroscience Research Australia, Sydney, NSW 2031, Australia. ³School of Medical Sciences, University of New South Wales, Sydney, NSW 2052, Australia. ⁴Present address: Garvan Institute of Medical Research, Sydney, NSW 2010, Australia.

Received: 13 October 2014 Accepted: 15 December 2014

References
1. Mattick JS. The central role of RNA in human development and cognition. FEBS Lett. 2011;585:1600–16.
2. Batista PJ, Chang HY. Long noncoding RNAs: cellular address codes in development and disease. Cell. 2013;152:1298–307.
3. Lee JT. Epigenetic regulation by long noncoding RNAs. Science. 2012;338:1435–9.
4. Mercer TR, Qureshi IA, Gokhan S, Dinger ME, Li G, Mattick JS, et al. Long noncoding RNAs in neuronal-glial fate specification and oligodendrocyte lineage maturation. BMC Neurosci. 2010;11:14.
5. Mills JD, Kavanagh T, Kim WS, Chen BJ, Kawahara Y, Halliday GM, et al. Unique transcriptome patterns of the white and grey matter corroborate structural and functional heterogeneity in the human frontal lobe. PLoS One. 2013;8:e78480.
6. Necsulea A, Soumillon M, Warnefors M, Liechti A, Daish T, Zeller U, et al. The evolution of lncRNA repertoires and expression patterns in tetrapods. Nature. 2014;505:635–40.
7. Brawand D, Soumillon M, Necsulea A, Julien P, Csardi G, Harrigan P, et al. The evolution of gene expression levels in mammalian organs. Nature. 2011;478:343–8.
8. Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc. 2012;7:562–78.
9. Lang B, Alrahbeni TM, Clair DS, Blackwood DH, International Schizophrenia C, McCaig CD, et al. HDAC9 is implicated in schizophrenia and expressed specifically in post-mitotic neurons but not in adult neural stem cells. Am J Stem Cells. 2012;1:31–41.
10. Broide RS, Redwine JM, Aftahi N, Young W, Bloom FE, Winrow CJ. Distribution of histone deacetylases 1–11 in the rat brain. J Mol Neurosci. 2007;31:47–58.
11. Milde T, Oehme I, Korshunov A, Kopp-Schneider A, Remke M, Northcott P, et al. HDAC5 and HDAC9 in medulloblastoma: novel markers for risk stratification and role in tumor cell growth. Clin Cancer Res. 2010;16:3240–52.
12. Potzner MR, Griffel C, Lutjen-Drecoll E, Bosl MR, Wegner M, Sock E. Prolonged Sox4 expression in oligodendrocytes interferes with normal myelination in the central nervous system. Mol Cell Biol. 2007;27:5316–26.
13. Hoser M, Baader SL, Bosl MR, Ihmer A, Wegner M, Sock E. Prolonged glial expression of Sox4 in the CNS leads to architectural cerebellar defects and ataxia. J Neurosci. 2007;27:5495–505.
14. Mogha A, Benesh AE, Patra C, Engel FB, Schoneberg T, Liebscher I, et al. Gpr126 functions in Schwann cells to control differentiation and myelination via G-protein activation. J Neurosci. 2013;33:17976–85.
15. Monk KR, Oshima K, Jors S, Heller S, Talbot WS. Gpr126 is essential for peripheral nerve development and myelination in mammals. Development. 2011;138:2673–80.
16. Binder MD, Cate HS, Prieto AL, Kemper D, Butzkueven H, Gresle MM, et al. Gas6 deficiency increases oligodendrocyte loss and microglial activation in response to cuprizone-induced demyelination. J Neurosci. 2008;28:5195–206.
17. Tsiperson V, Li X, Schwartz GJ, Raine CS, Shafit-Zagardo B. GAS6 enhances repair following cuprizone-induced demyelination. PLoS One. 2010;5:e15748.
18. Binder MD, Xiao J, Kemper D, Ma GZ, Murray SS, Kilpatrick TJ. Gas6 increases myelination by oligodendrocytes and its deficiency delays recovery following cuprizone-induced demyelination. PLoS One. 2011;6:e17727.
19. Swiss VA, Nguyen T, Dugas J, Ibrahim A, Barres B, Androulakis IP, et al. Identification of a gene regulatory network necessary for the initiation of oligodendrocyte differentiation. PLoS One. 2011;6:e18088.
20. Cavallaro M, Mariani J, Lancini C, Latorre E, Caccia R, Gullo F, et al. Impaired generation of mature neurons by neural stem cells from hypomorphic Sox2 mutants. Development. 2008;135:541–57.
21. Ferri AL, Cavallaro M, Braida D, Di Cristofano A, Canta A, Vezzani A, et al. Sox2 deficiency causes neurodegeneration and impaired neurogenesis in the adult mouse brain. Development. 2004;131:3805–19.
22. D'Amelio M, Cavallucci V, Cecconi F. Neuronal caspase-3 signaling: not only cell death. Cell Death Differ. 2010;17:1104–14.
23. Bergsland M, Werme M, Malewicz M, Perlmann T, Muhr J. The establishment of neuronal properties is controlled by Sox4 and Sox11. Genes Dev. 2006;20:3475–86.
24. Faghihi MA, Modarresi F, Khalil AM, Wood DE, Sahagan BG, Morgan TE, et al. Expression of a noncoding RNA is elevated in Alzheimer's disease and drives rapid feed-forward regulation of beta-secretase. Nat Med. 2008;14:723–30.
25. Hangauer MJ, Vaughn IW, McManus MT. Pervasive transcription of the human genome produces thousands of previously unidentified long intergenic noncoding RNAs. PLoS Genet. 2013;9:e1003569.
26. Kaushik K, Leonard VE, Kv S, Lalwani MK, Jalali S, Patowary A, et al. Dynamic expression of long non-coding RNAs (lncRNAs) in adult zebrafish. PLoS One. 2013;8:e83616.
27. Perelman P, Johnson WE, Roos C, Seuanez HN, Horvath JE, Moreira MA, et al. A molecular phylogeny of living primates. PLoS Genet. 2011;7:e1001342.
28. Ulitsky I, Shkumatava A, Jan CH, Sive H, Bartel DP. Conserved function of lincRNAs in vertebrate embryonic development despite rapid sequence evolution. Cell. 2011;147:1537–50.
29. Cabili MN, Trapnell C, Goff L, Koziol M, Tazon-Vega B, Regev A, et al. Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 2011;25:1915–27.
30. Church DM, Goodstadt L, Hillier LW, Zody MC, Goldstein S, She X, et al. Lineage-specific biology revealed by a finished genome assembly of the mouse. PLoS Biol. 2009;7:e1000112.
31. Orom UA, Derrien T, Beringer M, Gumireddy K, Gardini A, Bussotti G, et al. Long noncoding RNAs with enhancer-like function in human cells. Cell. 2010;143:46–58.
32. Ponjavic J, Oliver PL, Lunter G, Ponting CP. Genomic and transcriptional co-localization of protein-coding and long non-coding RNA pairs in the developing brain. PLoS Genet. 2009;5:e1000617.
33. Kutter C, Watt S, Stefflova K, Wilson MD, Goncalves A, Ponting CP, et al. Rapid turnover of long noncoding RNAs and the evolution of gene expression. PLoS Genet. 2012;8:e1002841.
34. Derrien T, Johnson R, Bussotti G, Tanzer A, Djebali S, Tilgner H, et al. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res. 2012;22:1775–89.
35. Khalil AM, Guttman M, Huarte M, Garber M, Raj A, Rivea Morales D, et al. Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc Natl Acad Sci U S A. 2009;106:11667–72.
36. Ji P, Diederichs S, Wang W, Boing S, Metzger R, Schneider PM, et al. MALAT-1, a novel noncoding RNA, and thymosin beta4 predict metastasis and survival in early-stage non-small cell lung cancer. Oncogene. 2003;22:8031–41.
37. Sun R, Lin SF, Gradoville L, Miller G. Polyadenylylated nuclear RNA encoded by Kaposi sarcoma-associated herpesvirus. Proc Natl Acad Sci U S A. 1996;93:11883–8.
38. Kino T, Hurt DE, Ichijo T, Nader N, Chrousos GP. Noncoding RNA gas5 is a growth arrest- and starvation-associated repressor of the glucocorticoid receptor. Sci Signal. 2010;3:ra8.
39. Hung T, Wang Y, Lin MF, Koegel AK, Kotake Y, Grant GD, et al. Extensive and coordinated transcription of noncoding RNAs within cell-cycle promoters. Nat Genet. 2011;43:621–9.
40. Kim JB, Choi JS, Nam K, Lee M, Park JS, Lee JK. Enhanced transfection of primary cortical cultures using arginine-grafted PAMAM dendrimer, PAMAM-Arg. J Control Release. 2006;114:110–7.
41. Sone M, Hayashi T, Tarui H, Agata K, Takeichi M, Nakagawa S. The mRNA-like noncoding RNA Gomafu constitutes a novel nuclear domain in a subset of neurons. J Cell Sci. 2007;120:2498–506.
42. Clemson CM, Hutchinson JN, Sara SA, Ensminger AW, Fox AH, Chess A, et al. An architectural role for a nuclear noncoding RNA: NEAT1 RNA is essential for the structure of paraspeckles. Mol Cell. 2009;33:717–26.
43. Guttman M, Donaghey J, Carey BW, Garber M, Grenier JK, Munson G, et al. lincRNAs act in the circuitry controlling pluripotency and differentiation. Nature. 2011;477:295–300.
44. Huarte M, Guttman M, Feldser D, Garber M, Koziol MJ, Kenzelmann-Broz D, et al. A large intergenic noncoding RNA induced by p53 mediates global gene repression in the p53 response. Cell. 2010;142:409–19.
45. Young RS, Marques AC, Tibbit C, Haerty W, Bassett AR, Liu JL, et al. Identification and properties of 1,119 candidate lincRNA loci in the Drosophila melanogaster genome. Genome Biol Evol. 2012;4:427–42.
46. Ho-Pun-Cheung A, Bascoul-Mollevi C, Assenat E, Boissiere-Michot F, Bibeau F, Cellier D, et al. Reverse transcription-quantitative polymerase chain reaction: description of a RIN-based algorithm for accurate data normalization. BMC Mol Biol. 2009;10:31.
47. Eisenberg E, Levanon EY. Human housekeeping genes are compact. Trends Genet. 2003;19:362–5.
48. Forlenza M, Kaiser T, Savelkoul HF, Wiegertjes GF. The use of real-time quantitative PCR for the analysis of cytokine mRNA levels. Methods Mol Biol. 2012;820:7–23.
49. Schmittgen TD, Livak KJ. Analyzing real-time PCR data by the comparative C(T) method. Nat Protoc. 2008;3:1101–8.
50. Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009;4:44–57.
51. Cline MS, Smoot M, Cerami E, Kuchinsky A, Landys N, Workman C, et al. Integration of biological networks and gene expression data using Cytoscape. Nat Protoc. 2007;2:2366–82.

Chapter 4

Transcriptome profiling of multiple system atrophy brain tissue using RNA-Seq


4.1 Primary research article: Transcriptome analysis of grey and white matter cortical tissue in multiple system atrophy

Reference: Mills, J. D., Kim, W. S., Halliday, G. M., and Janitz, M., (2015). “Transcriptome analysis of grey and white matter cortical tissue in multiple system atrophy.” Neurogenetics 16(2):107-122.

Contribution: I conceived the experiments, wrote the article, prepared the figures, isolated the RNA for sequencing, and carried out all of the bioinformatic analyses and all of the associated wet-lab work. WSK provided feedback on the manuscript. GMH and MJ provided feedback on the manuscript and conceived the experiments.

Synopsis: The previous two primary research articles focused on analysing the transcriptome of the healthy brain. The next step of the project was to look at the human brain in a disease context, namely multiple system atrophy (MSA). MSA was chosen because it is a neurodegenerative disease in which few genetic mutations have been implicated in the establishment and progression of the disease. RNA-Seq was employed to profile GM and WM in both healthy and MSA brain tissue from the superior frontal gyrus (SFG) of the human brain. This published article identified a handful of candidate genes that may be involved in MSA pathology, including α1-hemoglobin (HBA1), α2-hemoglobin (HBA2), β-hemoglobin (HBB) and transthyretin (TTR). These results suggest that MSA pathology may be related to increased iron levels in MSA WM. Further, a number of lincRNAs, including linc00320, were identified as differentially expressed between MSA GM and MSA WM. As MSA pathology first develops in WM, this differential expression could indicate involvement of these lincRNAs in MSA disease pathology. This paper provides the most comprehensive transcriptional analysis of MSA disease pathology to date.

Declaration

I certify that this publication was a direct result of my research towards this PhD, and that reproduction in this thesis does not breach copyright regulations.

James Dominic Mills [PhD Candidate]

Neurogenetics (2015) 16:107–122 DOI 10.1007/s10048-014-0430-0

ORIGINAL ARTICLE

Transcriptome analysis of grey and white matter cortical tissue in multiple system atrophy

James D. Mills & Woojin S. Kim & Glenda M. Halliday & Michael Janitz

Received: 24 July 2014 / Accepted: 20 October 2014 / Published online: 6 November 2014 © Springer-Verlag Berlin Heidelberg 2014

Abstract

Multiple system atrophy (MSA) is a distinct member of a group of neurodegenerative diseases known as α-synucleinopathies, which are characterized by the presence of aggregated α-synuclein in the brain. MSA is unique in that the principal site of α-synuclein deposition is oligodendrocytes rather than neurons. The cause of MSA is unknown, and its pathogenesis is still largely speculative. Brain transcriptome perturbations during the onset and progression of MSA are mostly unknown. Using RNA sequencing, we performed a comparative transcriptome profiling analysis of the grey matter (GM) and white matter (WM) of the frontal cortex of MSA and control brains. The transcriptome sequencing revealed increased expression of the alpha and beta haemoglobin genes in MSA WM, decreased expression of the transthyretin (TTR) gene in MSA GM and numerous region-specific long intervening non-coding RNAs (lincRNAs). In contrast, we observed only moderate changes in the expression patterns of the α-synuclein (SNCA) gene, which confirmed previous observations by other research groups. Our study suggests that, at the transcriptional level, MSA pathology may be related to increased iron levels in WM and to perturbations of the non-coding fraction of the transcriptome.

Keywords

Multiple system atrophy · Transcriptome sequencing · White and grey matter · Haemoglobin genes · Long intervening non-coding RNA

Electronic supplementary material: The online version of this article (doi:10.1007/s10048-014-0430-0) contains supplementary material, which is available to authorized users.

J. D. Mills · M. Janitz (*)
School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia

W. S. Kim · G. M. Halliday
Neuroscience Research Australia, Sydney, NSW 2031, Australia

W. S. Kim · G. M. Halliday
School of Medical Sciences, University of New South Wales, Sydney, NSW 2052, Australia

Introduction

Multiple system atrophy (MSA) is a sporadic and rapidly progressive neurodegenerative disorder. MSA presents clinically with autonomic dysfunction accompanied by various combinations and degrees of parkinsonism, cerebellar ataxia and pyramidal features [1]. MSA has also been categorized as an α-synucleinopathy, an umbrella term that covers neurodegenerative disorders in which the aggregation of α-synuclein is thought to be a key event in pathogenesis [2]. Other α-synucleinopathies include Parkinson’s disease (PD), dementia with Lewy bodies (DLB) and a number of less-well-characterized neuroaxonal dystrophies [2]. MSA can be clinically similar to PD, and early cases of MSA are often misdiagnosed as PD. Although MSA and PD have common biochemical and clinical characteristics, there are clear differences in how α-synuclein is implicated in these diseases. In PD, α-synuclein aggregates, referred to as Lewy bodies and Lewy neurites, are found in neurons [2]. In MSA, α-synuclein is deposited primarily in the form of glial cytoplasmic inclusions (GCIs) located in oligodendrocytes [2]. The sequence of pathological events in the pathways affected by MSA is now recognized to be abnormal aggregation of α-synuclein in the form of GCIs, followed by demyelination and then by the loss of neurons [3–8].

Little is known about the genetic basis of MSA. The recent identification of causal mutations and polymorphisms in the 4-hydroxybenzoate polyprenyltransferase gene (COQ2) [9], a gene encoding a biosynthetic enzyme for the production of the lipid-soluble electron carrier coenzyme Q10

(ubiquinone), suggests that membrane transporters are involved in pathogenesis in some MSA cases [3]. However, the extent to which such transporters are involved in the early myelin degeneration observed in MSA remains unclear [3]. Unlike PD, in which mutations in the α-synuclein (SNCA) gene have been implicated in the disease process, analyses of SNCA in MSA have failed to identify causal mutations; alteration of the SNCA sequence is therefore unlikely to provide a pathogenic mechanism for MSA [10]. This does not eliminate a role for SNCA in the pathogenesis of MSA, because SNCA could still be modified at the transcriptional, post-transcriptional or post-translational level. Nevertheless, previous studies of SNCA gene expression in MSA and sporadic PD brains remain inconclusive, with most reports showing non-significant changes in SNCA transcript levels [11–14].

Over the last few years, an increasing number of genes involved in neurodegenerative disorders have been shown to differentially express more than one splice variant, including presenilin 1 (PSEN1) and 2 (PSEN2), apolipoprotein E (APOE) and microtubule-associated protein tau (MAPT) [15]. Currently, RNA sequencing (RNA-Seq) is the most powerful tool available for transcriptome-wide investigations. RNA-Seq is characterized by low levels of variability because it directly sequences the mRNA targets, rather than indirectly detecting hybridization events between probes and complementary transcripts, as microarrays do. RNA-Seq has been used to investigate the brain transcriptome in Alzheimer’s disease and autism [16–18]. Gene expression changes have been assessed in the rostral pons of MSA patients using microarrays [19]; however, the analytical power of that study was limited by its coverage of only known protein-coding genes, which comprise up to 3 % of the expressed genome, overlooking the possible effects of long intervening non-coding RNAs (lincRNAs). Furthermore, to the best of our knowledge, there have been no RNA-Seq studies of MSA brain tissue.

In this report, we provide the first global RNA-Seq survey of the transcriptome landscape in MSA brains. This is also the first time that distinct cortical structures affected by MSA-specific neurodegeneration, namely white matter (WM) and grey matter (GM), have been comparatively analyzed using RNA-Seq. The elevated expression of the haemoglobin genes in MSA WM and the decreased expression of transthyretin (TTR) in MSA GM identified in this study provide new insights into the molecular pathology of MSA. Moreover, we discovered a number of lincRNAs that are specifically expressed in MSA GM or MSA WM, which open new avenues of investigation towards the elucidation of MSA pathogenesis.

Materials and methods

Human brain tissue

Human brain tissue was obtained from longitudinally followed brain donors by the Sydney Brain Bank and NSW Tissue Resource Centre as part of the Australian Brain Bank Network, funded by the National Health and Medical Research Council of Australia. Ethics approval was obtained from the University of New South Wales Human Research Ethics Committee (Ref. No HC11221). Frozen brain tissue was obtained from six MSA cases and six controls, all Caucasian, aged 62–98. Control cases had no clinical or pathological evidence of neurodegenerative disease. MSA brains were clinically and pathologically diagnosed using international diagnostic criteria [20]; two of the MSA cases had mild cognitive impairment and a Clinical Dementia Rating of 0.5 [21]. Approximately 50 mg of brain tissue from frozen slices of frontal cortex GM and WM was isolated for total RNA extraction. The superior frontal gyrus (SFG) of the frontal cortex was selected because it is moderately affected by the MSA-specific neurodegenerative process [2]. The motivation behind selecting this region was to capture early, or at most moderately advanced, changes in the brain tissue, so that any observed aberration of gene expression could be directly related to the molecular pathology occurring in MSA before the secondary effects of the cell loss that occurs with disease progression. The number of samples subjected to transcriptome sequencing met the number of biological replicates recommended for RNA-Seq experiments (http://encodeproject.org/ENCODE/protocols/dataStandards/RNA_standards_v1_2011_May.pdf) [22].

Total RNA was isolated using the RNeasy Lipid Tissue Mini Kit (Qiagen, Hilden, Germany), followed by RNase-free DNase treatment to remove traces of genomic DNA. The quality of the total RNA was assessed using the Agilent 2100 Bioanalyzer RNA Nano Chip. The RNA integrity number (RIN) values ranged between 6.0 and 7.0, a range previously reported to provide statistically robust transcriptome representation [23].

Library preparation and sequencing

RNA samples were prepared for sequencing according to the Illumina TruSeq RNA sample preparation guide and subjected to 100-bp paired-end sequencing on an Illumina HiSeq 1000. The data sets supporting the results of this article are available in the NCBI Short Read Archive under accession numbers SRA512485 and SRA091951.

Transcript mapping and quantification

Bioinformatics analysis, which also included six RNA-Seq data sets from healthy GM and WM (SRA091951), was performed using Galaxy, an open-access web-based environment that contains a variety of next-generation sequencing analysis tools, including TopHat and the Cufflinks package [24–26]. The Galaxy server was based at the Garvan Institute of Medical Research, Sydney, Australia. As schematically presented in Suppl. Fig. 1, the reads were processed with TopHat and aligned to the Homo sapiens reference genome (build hg19) [27]. The aligned reads were processed with Cufflinks, which assembles the RNA-Seq reads into individual transcripts, inferring the splicing structure of the genes [28]. Cufflinks normalizes the RNA-Seq fragment counts to estimate the abundance of each transcript. Abundance was measured as fragments per kilobase of exon per million fragments mapped (FPKM). For this analysis, a Gene Transfer Format (GTF) annotation file (iGenomes UCSC hg19 gene annotation) was used to guide the assembly. The output GTF files from each of the Cufflinks analyses and the GTF annotation file were sent to Cuffmerge [28]. Cuffmerge amalgamates the files into a single unified transcript catalog and filters out any transcribed fragments that may be mapping or assembly artefacts. Including the reference annotation allows gene names and other details, such as transcript ID, exon number, transcription start site ID and coding sequence ID, to be added to the merged transcript catalog, and allows genes and transcripts to be classified as annotated or un-annotated. The merged GTF file was then passed to Cuffdiff along with the original alignment files produced by TopHat. Cuffdiff takes the replicates from each condition and looks for statistically significant changes in gene expression, transcript expression, splicing and promoter use. Cuffdiff uses a corrected p value, known as the q value, to determine whether the differences between two groups are significant (q value <0.05). Genes were considered to be expressed if they had an FPKM value greater than 1 in at least one condition. Isoforms were considered to be expressed if they had an FPKM value greater than 0.5 in at least one condition. Genes and transcripts were excluded if they had an FPKM value greater than 5,000 in both conditions or greater than 10,000 in one condition.

Gene and isoform enrichment analysis

Differentially expressed genes (DEGs) and differentially expressed isoforms (DEIs) were entered into the Database for Annotation, Visualization and Integrated Discovery (DAVID) (http://david.abcc.ncifcrf.gov/) [29]. DAVID can only use annotated genes/isoforms; thus, all un-annotated and ambiguously annotated genes/isoforms were removed from the lists. A Gene Ontology (GO) enrichment test was performed on the lists of DEGs and DEIs. To generate the GO heatmaps, the top 30 enriched GO terms (ranked by p value) from each tissue type were selected. The GO terms were compiled into one list of 60 terms, and any redundant terms were removed. The fold enrichment of each term in the list was then compared across conditions.

Quantitative real-time PCR

RT-qPCR was performed on RNA from GM and WM of MSA and control brain tissue samples using commercial primers specific for TTR, HBA1, HBA2 and HBB transcripts (Qiagen), and the signal was quantified as described previously [16].

Results

We performed the comparative transcriptome analysis in two dimensions. First, we considered the role of oligodendrocytes in MSA pathology and analyzed differences between the GM and WM transcriptomes within the MSA sample set. Second, we performed a pairwise analysis of GM and WM between MSA and control samples to capture fundamental changes in gene expression patterns resulting from neurodegenerative processes in each tissue type. Our results included the identification of a number of un-annotated and cortex region-specific lincRNAs. Principal component analysis (PCA) showed that the transcriptome profiles clustered by tissue type (GM and WM) rather than by disease state (Fig. 1). This highlights the propensity of the brain’s transcriptome to vary greatly across tissue types, and indicates that the MSA disease process causes only moderate perturbations to the transcriptome in the regions analyzed.

Transcriptome analysis of MSA GM and control GM

Cufflinks identified a total of 15,261 expressed genes across MSA GM and control GM. These genes cover approximately 33 % of the human genome. Analysis of control GM and MSA GM revealed five DEGs between the two conditions (Table 1). Genes that were up-regulated in healthy brain included major histocompatibility complex, class I, A (HLA-A), major histocompatibility complex, class I, B (HLA-B), major histocompatibility complex, class I, C (HLA-C), TTR and the uncharacterized transcript LOC389831. LOC389831 was up-regulated 8-fold in control GM. The full-length, four-exon LOC389831 has been validated as an uncharacterized human protein-coding gene; the only LOC389831 transcript expressed in this study was a shorter variant containing three exons (Ensembl ID: ENST00000400890). TTR was expressed at 1.57 FPKM in MSA GM and at 25.71 FPKM in control GM (Fig. 2), a 16.36-fold down-regulation in MSA GM (up-regulation in control GM). There was no difference in the expression levels of TTR between MSA GM

Fig. 1 Principal component analysis at the gene level. Dimensionality reduction results in the formation of two distinct clusters: MSA and control WM cluster together, and MSA and control GM cluster together. This demonstrates that the data cluster by tissue type rather than by disease state.

and MSA WM or between MSA WM and control WM. The TTR transcript detected was the dominant, four-exon, protein-coding isoform (Ensembl ID: ENST00000237014); no additional TTR splice variants were expressed in any of the other tissue types. The down-regulation of TTR expression was confirmed by RT-qPCR quantification (Suppl. Fig. 2a).

Cufflinks identified 35,117 distinct isoforms in MSA and healthy GM, which equates to approximately 2.3 splice variants per gene. Further analysis revealed that 8,752 genes were alternatively spliced; 8,660 genes produced 2 to 7 splice variants, and 91 genes produced 8 to 12 splice variants. The list of DEIs completely overlapped with the list of DEGs and did not reveal any significant changes in splicing patterns.

Transcriptome analysis of MSA WM and control WM

Cufflinks identified a total of 15,106 genes expressed across MSA WM and control WM, with a cumulative coverage of approximately 28 % of the human genome. A total of seven genes were considered to be DEGs (Table 1). Genes that were up-regulated in control WM compared with MSA WM included HLA-A, HLA-B and HLA-C, which may suggest a down-regulation of inflammatory processes in MSA WM. Genes that were up-regulated in MSA WM compared with control WM included haemoglobin alpha 1 (HBA1), haemoglobin alpha 2 (HBA2), haemoglobin beta (HBB) and interleukin 1 receptor-like 1 (IL1RL1). The three haemoglobin genes displayed statistically significant changes in expression in two of the transcriptome comparisons. When compared with control WM, HBA1, HBA2 and HBB were all up-regulated approximately 3.75-fold in MSA WM (Fig. 3a). All three genes were also up-regulated in MSA WM compared with MSA GM: HBA1 3.27-fold, HBA2 3.47-fold and HBB 3.65-fold (Fig. 3b). Although there were no statistically significant differences in the expression of HBA1, HBA2 and HBB between MSA GM and healthy GM, their expression was reduced overall in MSA GM (Fig. 3c). This argues against contamination from peripheral blood, a possible source of haemoglobin-related transcripts, and points to the up-regulation of haemoglobin genes being specific to MSA WM. Furthermore, additional quantification of the haemoglobin genes by RT-qPCR on an independent set of brain tissue samples corroborated the RNA-Seq results (Suppl. Fig. 2b).

In MSA and control WM, Cufflinks identified 26,876 distinct isoforms, which equates to approximately 1.78 splice variants per gene. There were 6,445 alternatively spliced genes; 6,376 genes produced 2 to 7 splice variants, and 69 genes produced 8 to 14 splice variants. The list of DEIs completely overlapped with the list of DEGs and did not reveal any changes in splicing patterns.

α-Synuclein expression patterns

SNCA was not identified as differentially expressed in any of the transcriptome comparisons. Overall, there was a 2.2-fold up-regulation of SNCA in MSA GM compared with MSA WM.

Table 1 Differentially expressed genes from comparison of MSA and healthy brain tissue

Differentially expressed genes in GM
Gene | Description | Chrom. | FPKM MSA | FPKM control | Fold change | q value | Ensembl ID
TTR | Transthyretin | chr18 | 1.5719 | 25.7088 | 16.3557 | 0.03988 | ENSG00000118271
LOC389831 | Uncharacterized transcript LOC389831 | chr7 | 1.2571 | 10.0836 | 8.0214 | 0.01546 | ENSG00000215781
HLA-A | Major histocompatibility complex, class I, A | chr6 | 0.2338 | 8.7909 | 37.6050 | 0.00228 | ENSG00000206503
HLA-B | Major histocompatibility complex, class I, B | chr6 | 1.9616 | 19.7728 | 10.0802 | 0.00027 | ENSG00000234745
HLA-C | Major histocompatibility complex, class I, C | chr6 | 6.0681 | 11.6432 | 1.9187 | 0.00046 | ENSG00000204525

Differentially expressed genes in WM
Gene | Description | Chrom. | FPKM MSA | FPKM control | Fold change | q value | Ensembl ID
IL1RL1 | Interleukin 1 receptor-like 1 | chr2 | 5.5376 | 0.4399 | −12.5897 | 0.04324 | ENSG00000115602
HBA1 | Haemoglobin, alpha 1 | chr16 | 570.2840 | 152.0060 | −3.7517 | 0.02408 | ENSG00000206172
HBA2 | Haemoglobin, alpha 2 | chr16 | 209.3750 | 55.7293 | −3.7570 | 0.02866 | ENSG00000188536
HBB | Haemoglobin, beta | chr11 | 359.3090 | 94.1778 | −3.8152 | 0.01067 | ENSG00000244734
HLA-A | Major histocompatibility complex, class I, A | chr6 | 0.4108 | 13.1112 | 31.9189 | 0.01293 | ENSG00000206503
HLA-B | Major histocompatibility complex, class I, B | chr6 | 2.7718 | 20.3784 | 7.3520 | 0.01420 | ENSG00000234745
HLA-C | Major histocompatibility complex, class I, C | chr6 | 7.4809 | 16.4211 | 2.1951 | 0.02384 | ENSG00000204525

Fold change: positive values indicate higher expression in control tissue; negative values indicate higher expression in MSA tissue.
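The fold changes in Table 1 appear to be the ratio of the larger FPKM to the smaller FPKM, signed by the direction of change. The sketch below recomputes several tabulated values from the tabulated FPKMs under that assumed convention. Small discrepancies are expected, because the table shows rounded FPKMs while Cuffdiff works from unrounded estimates. The function name is hypothetical.

```python
from typing import Optional


def signed_fold_change(first: float, second: float) -> Optional[float]:
    """Larger-over-smaller FPKM ratio, signed by direction of change.

    Assumed convention, read off Tables 1 and 2: positive when the second
    FPKM column is larger, negative when the first is larger. Returns None
    when either value is zero (reported as "Unique to ..." in the tables).
    """
    if first == 0 or second == 0:
        return None
    if second >= first:
        return second / first
    return -(first / second)


# (first FPKM, second FPKM, reported fold change) copied from the tables.
examples = {
    "TTR, Table 1 GM (MSA vs control)": (1.5719, 25.7088, 16.3557),
    "IL1RL1, Table 1 WM (MSA vs control)": (5.5376, 0.4399, -12.5897),
    "HBB, Table 1 WM (MSA vs control)": (359.3090, 94.1778, -3.8152),
    "CBLN4, Table 2 (GM vs WM)": (14.2528, 0.0851, -167.5062),
}

for label, (first, second, reported) in examples.items():
    computed = signed_fold_change(first, second)
    # Small differences are expected because the tabulated FPKMs are rounded.
    print(f"{label}: computed {computed:.4f}, reported {reported}")
```

Under this reading, Table 2 (GM vs WM) follows the same rule: a negative value means the first column (GM) is higher.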

SNCA expression was 1.4-fold higher in MSA GM than in control GM, and 1.7-fold higher in MSA WM than in control WM. SNCA expression was therefore higher in MSA-affected tissue than in its control counterpart for both tissue types, with MSA GM having the highest overall expression of SNCA. Similarly, we did not observe significantly altered expression of the MAPK and COQ2-10 genes (Suppl. Table 1).

Analysis of SNCA alternative splicing revealed the presence of three transcriptional isoforms (NCBI IDs: NM_001146055, NM_000345 and NM_001146054). All three transcripts had the same number of exons; however, they varied in the position and length of the first exon, including the 5′ untranslated region (UTR). NM_000345 and NM_001146054 were up-regulated 1.8-fold and 1.4-fold, respectively, in both MSA GM and MSA WM compared with their control counterparts. NM_001146055 was expressed at a low level in both WM conditions and was down-regulated 1.6-fold in MSA GM compared with control GM. All three transcripts were less abundant in MSA WM than in MSA GM. According to the NCBI database, all three transcripts would be translated into the SNCA protein isoform NACP140; however, the differing 5′ UTRs could further alter the expression levels of NACP140. Analysis of the 5′ UTRs of the three transcripts revealed a GY-box motif in NM_001146054 that was not observed in the other two transcripts. It has previously been suggested that the GY-box motif could be a target for miRNAs, thereby altering the translation of the transcript [30].

Comparative transcriptome analysis of MSA GM and MSA WM

Cufflinks identified 16,930 expressed genes across MSA GM and MSA WM. The cumulative transcriptional coverage of the human genome across MSA GM and MSA WM was approximately 34 %; this figure includes transcribed introns that were spliced out. The 16,930 genes included 2,318 un-annotated genes and 33 previously annotated lincRNAs. Of the genes expressed across MSA GM and MSA WM, 1,910 were identified as differentially expressed genes (DEGs) (q value <0.05) (Suppl. Table 1). These comprised 981 genes up-regulated in GM and 929 genes up-regulated in WM (Fig. 4).

Among the 1,910 DEGs, 195 were un-annotated. Of these un-annotated genes, 160 were longer than 200 nucleotides (nts), were transcribed from intervening regions of the genome and showed little to no protein-coding potential; thus, they may qualify as lincRNAs. The majority of these putative lincRNAs did not undergo alternative splicing and tended to contain only one exon. More putative lincRNAs were up-regulated in WM than in GM (133 versus 27), and more annotated lincRNAs were up-regulated in WM than in GM (5 to 1 ratio) (Fig. 5). The average FPKM of the differentially expressed lincRNAs was 18.46 in GM and 63.16 in WM. Overall, WM contained more, and more highly expressed, lincRNAs than GM. The top 10 up-regulated genes in GM and in WM are shown in Table 2. The top 10 up-regulated genes in GM included 4 un-annotated genes that satisfied the criteria for lincRNAs.

Fig. 2 Gene expression of TTR across control GM and MSA GM. TTR was down-regulated 16.36-fold in MSA GM compared with control GM (q value <0.04). There was no change in the expression levels of TTR in any of the other tissue comparisons. Error bars are ±standard deviation.
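The PCA reported in the Results (Fig. 1) separated samples by tissue type rather than by disease state. The minimal pure-Python sketch below shows how scores on the first principal component can be obtained. The log2(FPKM + 1) transform, the four genes and all FPKM values are hypothetical. The article does not state which PCA implementation or transform was used.

```python
import math
from typing import List


def pc1_scores(matrix: List[List[float]], iterations: int = 500) -> List[float]:
    """Sample scores on the first principal component.

    Rows are samples and columns are genes. Columns are mean-centred, the
    leading eigenvector of the sample Gram matrix X X^T is found by power
    iteration, and scores are that eigenvector scaled by sqrt(eigenvalue).
    """
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    x = [[row[j] - means[j] for j in range(p)] for row in matrix]
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)]
            for i in range(n)]
    # Non-uniform start: the all-ones vector lies in the null space of the
    # Gram matrix of centred data, so a uniform start would collapse to zero.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            return [0.0] * n
        v = [c / norm for c in w]
    eigenvalue = sum(v[i] * sum(gram[i][k] * v[k] for k in range(n))
                     for i in range(n))
    return [c * math.sqrt(max(eigenvalue, 0.0)) for c in v]


# Hypothetical FPKMs for four genes (two GM-enriched, two WM-enriched).
samples = {
    "MSA_GM": [200.0, 150.0, 5.0, 3.0],
    "CTRL_GM": [190.0, 160.0, 6.0, 2.0],
    "MSA_WM": [10.0, 8.0, 120.0, 90.0],
    "CTRL_WM": [12.0, 7.0, 110.0, 100.0],
}
# Assumed variance-stabilizing transform, not stated in the article.
logged = [[math.log2(value + 1.0) for value in row] for row in samples.values()]
scores = pc1_scores(logged)
for name, score in zip(samples, scores):
    print(f"{name}: PC1 = {score:+.2f}")
```

The sign of PC1 is arbitrary. Only the grouping matters: the GM samples share one sign and the WM samples the other, which is the tissue-driven pattern described for Fig. 1.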

Fig. 3 Gene expression levels of HBA1, HBA2 and HBB in different transcriptome analyses. a Gene expression levels in control WM and MSA WM. b Gene expression levels in MSA GM and MSA WM. c Gene expression levels in control GM and MSA GM. All error bars are ±standard deviation, and an asterisk denotes a q value <0.02.

Fig. 4 Volcano plot of gene expression in MSA GM relative to MSA WM. The fold change of each gene is relative to its expression in WM: genes with a negative fold change were up-regulated in GM (down-regulated in WM), and genes with a positive fold change were up-regulated in WM (down-regulated in GM). Genes that were statistically significant (q value <0.05) are shown in red. A similar number of genes were up-regulated in MSA GM (981) and MSA WM (929).
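In the volcano plots (Figs. 4 and 6), significance is defined by q value <0.05 and direction by the sign of the fold change. The sketch below assumes the q value is a Benjamini-Hochberg false-discovery-rate adjustment of per-gene p values. It does not reproduce Cuffdiff's own test statistic. It then applies the sign convention from the Fig. 4 caption. The p values and fold changes are hypothetical.

```python
from typing import List


def bh_qvalues(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def classify(fold_change: float, q: float, alpha: float = 0.05) -> str:
    """Fig. 4 convention: negative = up in GM, positive = up in WM."""
    if q >= alpha:
        return "not significant"
    return "up in GM" if fold_change < 0 else "up in WM"


# Hypothetical p-values and fold changes for four genes.
pvals = [0.001, 0.004, 0.03, 0.5]
folds = [-167.5, 110.5, -2.0, 1.3]
for fc, q in zip(folds, bh_qvalues(pvals)):
    print(f"fold change {fc:+.1f}, q = {q:.4f}: {classify(fc, q)}")
```

Running the minimum from the largest p value downwards keeps q values monotone in the p-value ranking. This is why a gene can receive a smaller q than its raw rank-adjusted p value would give.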

The top 10 up-regulated genes in WM included 9 un-annotated genes that fulfilled the criteria for lincRNAs.

Differential isoform expression

Cufflinks identified 36,054 distinct isoforms in MSA GM and MSA WM, or approximately 2.13 splice variants (isoforms) per gene. The distribution of isoforms per gene revealed that there were 8,669 genes with 2 to 7 isoforms and 102 genes with 8 to 14 isoforms. Of the 36,054 expressed isoforms, 959 were considered to be differentially expressed isoforms (DEIs) between MSA GM and MSA WM (Suppl. Table 1); 418 were up-regulated in MSA GM, and 541 were up-regulated in MSA WM (Fig. 6). There were 16 un-annotated isoforms from the MSA GM list and 95 un-annotated isoforms from the MSA WM list that were greater

Fig. 5 Heatmap of differentially expressed putative lincRNAs. The heatmap shows the FPKM values of all 185 differentially expressed putative lincRNAs. Region 1 of the heatmap shows the lincRNAs that are up-regulated in MSA WM, and region 2 shows the lincRNAs that are up-regulated in MSA GM. Overall, there is a transcriptional bias of lincRNAs towards MSA WM.

Table 2 Top 10 up-regulated genes from comparison of MSA GM and MSA WM

Up-regulated in GM
Gene | Description | Chrom. | FPKM GM | FPKM WM | Fold change | q value | Ensembl ID
n/a | Putative lincRNA, two exons (first exon 185 bp, second exon 2,255 bp) | chr2 | 1.2862 | 0.0000 | Unique to GM | 0.03721 | Not annotated
n/a | Putative lincRNA, one exon, 2,036 bp | chr8 | 1.1308 | 0.0000 | Unique to GM | 0.04016 | Not annotated
n/a | Putative lincRNA, one exon, 3,090 bp | chr5 | 1.1164 | 0.0000 | Unique to GM | 0.03310 | Not annotated
KCNB2 | Potassium voltage-gated channel, Shab-related subfamily, member 2 | chr8 | 1.3147 | 0.0000 | Unique to GM | 0.02436 | ENSG00000182674
CBLN4 | Cerebellin 4 precursor | chr20 | 14.2528 | 0.0851 | −167.5062 | 0.00002 | ENSG00000054803
OPRK1 | Opioid receptor, kappa 1 | chr8 | 1.5062 | 0.0135 | −111.9057 | 0.03811 | ENSG00000082556
OPRD1 | Opioid receptor, delta 1 | chr1 | 1.8432 | 0.0191 | −96.7509 | 0.01004 | ENSG00000116329
SYT1 | Synaptotagmin I | chr12 | 191.0900 | 2.0143 | −94.8691 | 0.00007 | ENSG00000067715
MARCH4 | Membrane-associated ring finger (C3HC4) 4, E3 ubiquitin protein ligase | chr2 | 3.1859 | 0.0339 | −94.0479 | 0.00748 | ENSG00000144583
n/a | Putative lincRNA, one exon, 6,291 bp | chr18 | 2.3955 | 0.0278 | −86.1325 | 0.02222 | Not annotated

Up-regulated in WM
Gene | Description | Chrom. | FPKM GM | FPKM WM | Fold change | q value | Ensembl ID
n/a | Putative lincRNA, one exon, 1,403 bp | chr7 | 0.0177 | 1.9526 | 110.4519 | 0.01895 | Not annotated
n/a | Putative lincRNA, one exon, 1,307 bp | chr2 | 0.0371 | 1.9098 | 51.4365 | 0.00916 | Not annotated
n/a | Putative lincRNA, one exon, 1,120 bp | chr14 | 0.0469 | 1.9890 | 42.4112 | 0.01054 | Not annotated
n/a | Putative lincRNA, one exon, 753 bp | chr15 | 0.1095 | 3.9356 | 35.9306 | 0.00923 | Not annotated
n/a | Putative lincRNA, one exon, 1,065 bp | chr2 | 0.1094 | 3.2436 | 29.6420 | 0.01909 | Not annotated
n/a | Putative lincRNA, one exon, 1,045 bp | chr9 | 0.0756 | 2.2274 | 29.4469 | 0.00714 | Not annotated
CHI3L2 | Chitinase 3-like 2 | chr1 | 2.4072 | 68.1419 | 28.3073 | 0.00009 | ENSG00000064886
n/a | Putative lincRNA, one exon, 1,173 bp | chr14 | 0.1820 | 4.9905 | 27.4179 | 0.00127 | Not annotated
n/a | Putative lincRNA, one exon, 2,957 bp | chr7 | 0.0667 | 1.8069 | 27.0887 | 0.00022 | Not annotated
n/a | Putative lincRNA, one exon, 915 bp | chr13 | 0.2391 | 6.3786 | 26.6775 | 0.00030 | Not annotated

Fold change: negative values indicate higher expression in GM; positive values indicate higher expression in WM.

than 200 nts in length and were transcribed from intervening regions of the genome; these isoforms thus qualify as possible lincRNAs. The putative lincRNAs tended to contain one exon and displayed no protein-coding potential. The putative

Fig. 6 Volcano plot of isoform expression in MSA GM relative to MSA WM. The fold change of each isoform is relative to its expression in WM: isoforms with a negative fold change were up-regulated in GM (down-regulated in WM), and isoforms with a positive fold change were up-regulated in WM (down-regulated in GM). Isoforms that were statistically significant (q value <0.05) are shown in red. There were 418 isoforms up-regulated in MSA GM and 541 up-regulated in MSA WM.

Table 3 Top 10 up-regulated isoforms from comparison of MSA GM and MSA WM

Up-regulated in GM
Isoform | Description | Chrom. | FPKM GM | FPKM WM | Fold change | q value | Ensembl ID
MGAT5B | Mannosyl (alpha-1,6-)-glycoprotein beta-1,6-N-acetyl-glucosaminyltransferase, isozyme B | chr17 | 21.8920 | 0 | Unique to GM | 0.04453 | ENST00000301618
FAM81A | Family with sequence similarity 81, member A | chr15 | 19.0847 | 0 | Unique to GM | 0.00649 | Novel isoform
ERRFI1 | ERBB receptor feedback inhibitor 1 | chr1 | 6.5719 | 0 | Unique to GM | 0.01653 | ENST00000377482
CHRDL1 | Chordin-like 1 | chrX | 5.0222 | 0 | Unique to GM | 0.01076 | Novel isoform
CACNA2D2 | Calcium channel, voltage-dependent, alpha 2/delta subunit 2 | chr3 | 3.0060 | 0 | Unique to GM | 0.02507 | Novel isoform
EPN3 | Epsin 3 | chr17 | 2.9916 | 0 | Unique to GM | 0.01862 | ENST00000537145
FAM65B | Family with sequence similarity 65, member B | chr6 | 2.7512 | 0 | Unique to GM | 0.03491 | Novel isoform
FAM135A | Family with sequence similarity 135, member A | chr6 | 1.8649 | 0 | Unique to GM | 0.04920 | Novel isoform
ZNF398 | Zinc finger protein 398 | chr7 | 1.7945 | 0 | Unique to GM | 0.04214 | ENST00000420008
EPHA7 | EPH receptor A7 | chr6 | 1.6580 | 0 | Unique to GM | 0.04631 | Novel isoform

Up-regulated in WM
Isoform | Description | Chrom. | FPKM GM | FPKM WM | Fold change | q value | Ensembl ID
AAK1 | AP2-associated kinase 1 | chr2 | 0 | 3.0779 | Unique to WM | 0.04178 | Novel isoform
EWSR1 | EWS RNA-binding protein 1 | chr22 | 0 | 39.7081 | Unique to WM | 0.00947 | Novel isoform
PKM2 | Pyruvate kinase isozymes M1/M2 | chr15 | 0 | 86.2297 | Unique to WM | 0.01373 | ENST00000335181
PRCD | Progressive rod-cone degeneration | chr17 | 0 | 16.4962 | Unique to WM | 0.01898 | Novel isoform
n/a | Putative lincRNA, one exon, 1,307 bases | chr2 | 0.0371 | 1.9098 | 51.4365 | 0.02682 | Gene not annotated
SGK2 | Serum/glucocorticoid regulated kinase 2 | chr20 | 0.2478 | 12.3214 | 49.7278 | 0.00774 | Novel isoform
n/a | Putative lincRNA, one exon, 1,120 bases | chr14 | 0.0469 | 1.9890 | 42.4112 | 0.03066 | Gene not annotated
HOXD-AS1 | HOXD cluster antisense RNA 1 | chr2 | 0.2084 | 8.7530 | 42.0106 | 0.04159 | ENST00000417086
n/a | Putative lincRNA, one exon, 753 bases | chr15 | 0.1095 | 3.9356 | 35.9306 | 0.02702 | Gene not annotated
USH1C | Usher syndrome 1C | chr11 | 1.8665 | 56.1677 | 30.0920 | 0.00006 | Novel isoform

Fold change: negative values indicate higher expression in GM; positive values indicate higher expression in WM.

lincRNAs were generally expressed at higher levels in WM. were selected. The levels of enrichment of each GO term in The top 10 up-regulated isoforms in MSA GM and MSA WM each tissue type were visualized using a heatmap (Fig. 8). The are listed in Table 3. None of the top 10 up-regulated isoforms GO terms that had higher levels of enrichment in MSA GM in MSA GM or MSA WM appeared in the analogous top 10 related to synaptic processes and various transport- and ion- gene lists (Table 2). Of the top 10 up-regulated isoforms in related processes. For MSA WM GO terms relating to MSA WM, three fulfilled the criteria for lincRNAs. myelination, oligodendrocyte differentiation and ensheathment of axons and neurons had high levels of enrichment. General GO analysis of MSA GM and MSA WM GO terms relating to the cytoskeleton, cellular processes and the plasma membrane were enriched in both GM and WM. The lists of annotated DEGs from MSA GM and MSA WM The lists of DEIs from MSA GM and MSA WM were were entered into the DAVID. DAVID assigned GO terms to entered into DAVID. Of the 400 annotated isoforms up- each gene and identified any enriched GO terms in the genes regulated in GM, 293 different GO terms were identified as lists. For the 924 annotated genes up-regulated in GM, 499 enriched (Suppl. Table 1). For the 416 annotated isoforms up- different GO terms were identified as enriched (Suppl. regulated in WM, 147 different GO terms were identified as Table 1). For the 725 annotated genes up-regulated in WM, enriched (Suppl. Table 1). The GO enrichment profiles of the 255 different GO terms were considered enriched (Suppl. DEIs also diverged by tissue type; only 45 were common to Table 1). As expected, pathway analysis of GM and WM both tissue types (Fig. 7b). Again, the top 30 GO terms (by p tissues revealed divergent GO enrichment profiles; only 66 value) from MSA GM and MSA WM were selected, and the GO terms were common to each data set (Fig. 7a). 
To further enrichment level of the GO terms was compared (Fig. 9). In visualize the differences in the GO enrichment profiles, the GM, the highest levels of enrichment were again seen in top 30 GO terms (by p value) from MSA GM and MSA WM synapses and various transport- and ion-related processes. 116 Neurogenetics (2015) 16:107–122
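The GO comparisons described here rest on two simple operations: ranking enriched terms by p value to pick the top 30 for the heatmaps, and intersecting the GM and WM term sets to count the shared terms shown in the Venn diagrams. The following is a minimal Python sketch of both steps, assuming each DAVID result has been exported as a mapping from GO term to p value; the function names `top_terms` and `venn_counts`, and the toy terms and p values, are hypothetical and are not the study's data or DAVID's own output format.

```python
from typing import Dict, List, Set


def top_terms(pvalues: Dict[str, float], n: int = 30) -> List[str]:
    """Return the n GO terms with the smallest p values (ties broken alphabetically)."""
    ranked = sorted(pvalues.items(), key=lambda item: (item[1], item[0]))
    return [term for term, _ in ranked[:n]]


def venn_counts(a: Set[str], b: Set[str]) -> Dict[str, int]:
    """Count GO terms found only in list a, in both lists, and only in list b."""
    return {"only_a": len(a - b), "shared": len(a & b), "only_b": len(b - a)}


# Toy p values (hypothetical, not the study's data)
gm = {
    "synaptic transmission": 1e-12,
    "ion transport": 3e-9,
    "cytoskeleton": 2e-6,
    "plasma membrane": 5e-5,
}
wm = {
    "myelination": 1e-10,
    "axon ensheathment": 4e-9,
    "cytoskeleton": 1e-7,
    "plasma membrane": 2e-4,
}

print(top_terms(gm, n=2))             # ['synaptic transmission', 'ion transport']
print(venn_counts(set(gm), set(wm)))  # {'only_a': 2, 'shared': 2, 'only_b': 2}
```

Applied to the DEG-based lists (499 GM terms and 255 WM terms), the `shared` count would correspond to the 66 common terms in Fig. 7a. Ties in p value are broken alphabetically here only to make the selection deterministic; how ties were handled in the original analysis is not stated.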

For the DEI lists, terms with higher levels of enrichment in WM related predominantly to ensheathment processes, and general terms relating to the cytoskeleton and membrane organization were enriched in both tissue sets.

Finally, the enriched GO terms generated from the DEGs and the DEIs were compared to determine whether there were any differences between the two sets. The DEG and DEI lists from GM shared 234 common GO terms (Fig. 7c), while the DEG and DEI lists from WM shared 114 common GO terms (Fig. 7d). For both tissues, the majority of the GO terms generated from the list of DEIs were also present in the list generated from the DEGs.

Fig. 7 Venn diagrams of enriched GO terms for MSA GM and MSA WM. a GO terms enriched in MSA GM (purple) and MSA WM (tan) using DEG lists. The lists shared 66 terms. b GO terms enriched in MSA GM (green) and MSA WM (blue) using DEI lists. The lists shared 45 terms. c GO terms enriched in MSA GM genes (purple) and MSA GM isoforms (green). The lists shared 234 terms. d GO terms enriched in MSA WM genes (tan) and MSA WM isoforms (blue). The lists shared 114 terms.

Comparison of control GM and control WM transcriptomes

Previously, Mills et al. identified 1,652 genes that were differentially expressed between control GM and control WM [31]. This included 1,217 genes up-regulated in GM and 434 genes up-regulated in WM. Our current study detected a modest decrease in the number of genes considered up-regulated in MSA GM compared with control GM (981 versus 1,217). However, there was a drastic increase in the number of up-regulated genes in MSA WM compared with control WM (929 versus 434). Of the 947 annotated genes on the MSA GM list, 718 also appeared in the control GM list, and of the 768 annotated genes on the MSA WM list, 304 were common to the control WM list. Further, a total of 133 putative lincRNAs were up-regulated in MSA WM, compared with only 52 putative lincRNAs in control WM. None of the lincRNAs were common to the two data sets. At the isoform level, the control study reported 681 isoforms up-regulated in GM and 201 isoforms up-regulated in WM. In our current study, there was a comparable number of isoforms up-regulated in MSA GM (418 versus 681) and a large increase in the number of isoforms up-regulated in MSA WM (541 versus 201). The up-regulated isoforms from the MSA GM and control GM lists had 59 isoforms in common, and the up-regulated isoforms from the MSA WM and control WM lists had 71 isoforms in common. No common putative lincRNAs were observed in either comparison. Comparing these two transcriptome analyses, there appear to be more differentially expressed lincRNAs in MSA brain tissue than in control brain tissue, with a particularly large number in MSA WM.

Discussion

Pervasive transcription

Our study revealed high levels of transcription in the human brain: in WM, the identified transcripts covered approximately 28 % of the genome; in GM, they covered approximately 33 % of the genome; and the union of both sets covered approximately 34 % of the genome. Because the union (34 %) exceeds the GM value (33 %), part of the transcribed genome was detected only in WM. This finding suggests that while there were higher levels of transcription in GM, unique and tissue-specific transcription occurred in each tissue type. The amount of transcription reported is close to the figure described by the Encyclopedia of DNA Elements (ENCODE) consortium, which reported that among 15 human cell lines, the average coverage of the human genome by primary transcripts was 39 % [32]. The ENCODE study included non-polyadenylated RNAs, while the current study used poly-T oligo-attached magnetic beads to select the polyadenylated RNA fraction for analysis. If the RNA selection for this study had also included non-polyadenylated RNAs, the coverage of the genome would be expected to be equal to or higher than the 39 % reported by ENCODE. This would suggest that transcription in the human brain is both pervasive and tissue specific.

Characteristics of transcriptome profiles in MSA brain

The difference, in terms of the number of DEGs and DEIs, between MSA GM and MSA WM is greater than the difference between MSA and control brain within their respective regions. This finding may reflect a strong cell-type specificity of the transcriptome, which was also evident in our previous study comparing samples from normal GM and WM [31]. Indeed, Mills et al. previously identified 1,652 DEGs when comparing healthy GM and WM, which is similar to the 1,910 DEGs identified in the present study when comparing MSA GM and WM. Therefore, it could be suggested that the majority of DEGs and DEIs in MSA GM and WM reflect the cellular and functional distinctiveness of these two brain structures to a greater extent than any contribution from neurodegenerative processes. This dominant cell type/region-specific transcriptional profile, outweighing any MSA-linked pattern, can also be observed in the GO enrichment analysis. The majority of GO term clusters identified as GM-specific were associated with synaptic physiology and neuronal membrane potentiation, whereas WM-specific biological processes were linked to ensheathment and myelination (see "Results" for details). Conversely, the small number of DEGs and DEIs identified when comparing MSA and control brain tissue may be interpreted from two perspectives. First, we selected the SFG from the frontal cortex, which is moderately affected by the MSA-specific neurodegenerative process [2]. The motivation behind selecting this region was to capture early, or at least only moderately advanced, changes in the brain tissue so that any observed aberration of gene expression could be directly related to the molecular pathology occurring in MSA prior to the secondary effects of the cell loss that occurs with disease progression. The distinct overexpression of all haemoglobin chains in the WM and down-regulation of TTR in GM appear to be MSA disease specific (see below for details; Fig. 2). Second, we applied very stringent statistical criteria to select DEGs and transcripts, which also led to the identification of fewer transcripts. However, we believe that this strategy delivers more confident results for future functional studies of individual candidate genes.

MSA brain tissue has moderate alterations in SNCA gene expression

As determined by RNA-Seq, SNCA gene expression was only moderately elevated in the MSA tissues. This result confirms earlier qPCR-based observations that, despite an accumulation of α-synuclein aggregates, SNCA transcript levels remain largely unchanged in MSA [12–14]. These unchanged expression levels do not rule out possible post-transcriptional or post-translational modifications to α-synuclein that may make it more prone to aggregation. Moreover, future studies should aim to analyze SNCA gene expression profiles in brain tissue fractions enriched in particular cellular components of the human cortex, such as oligodendrocytes, in order to provide more detailed information on the cell type specificity of α-synuclein transcription.

Change in TTR expression

Over the past two decades, a number of publications have suggested that TTR could interact with amyloid-β (Aβ) peptides, which aggregate to form one of the hallmark pathological signatures of Alzheimer's disease [33]. It is thought that TTR may suppress the Alzheimer's disease phenotype by binding to Aβ and inhibiting its toxicity [34]. Other studies have demonstrated that TTR may be neuroprotective [35], and decreased levels of TTR have been associated with memory impairment in mice [36]. These studies suggest that decreased levels or the absence of TTR may accelerate the cognitive dysfunction observed with aging and make the elderly more susceptible to neurodegeneration.

In the current study, there was a dramatic down-regulation of TTR (16.36-fold) in the superior frontal GM of MSA patients compared with controls, and two of the three MSA cases had mild cognitive impairment. Recent studies have emphasized that single-domain executive dysfunction is common in patients with MSA [37–40] and that there are no obvious pathological variables that account for cognitive deficits in MSA [13]. Perhaps the reduced TTR expression observed relates to this phenomenon, particularly as such dysfunction has been shown to be associated with prefrontal involvement [39]. It is of interest that such changes appear to be independent of the amount of α-synuclein aggregates [13].

WM-specific haemoglobin overexpression could be related to the unique targeting of the white matter in MSA

Our results demonstrated an increase in the expression of all three members (HBA1, HBA2, HBB) of the haemoglobin protein complex; this increase in expression was specific to MSA WM (Fig. 3). Haemoglobin is the largest source of peripheral iron in the human body, and it may play a role in iron homeostasis throughout the brain [41, 42]. Disrupted iron homeostasis has long been associated with various neurodegenerative disorders. It has been demonstrated that high levels of iron correlate with regions of neurodegeneration in both MSA and PD [43, 44]. High iron levels may also promote oxidative stress, alter myelin synthesis, increase the aggregation of α-synuclein and cause neuronal death [44]. However, the debate on whether iron accumulation promotes neurodegeneration or is a by-product of neuronal cell loss has not been settled [41]. In PD, evidence suggests that high levels of haemoglobin are associated with an increased risk of disease development [44]. The selection of frontal WM with limited MSA-specific damage in our study further supports the notion that such increases in haemoglobin levels may occur prior to neuronal loss. The correlation among iron, haemoglobin and neurodegeneration is further supported by the finding that high levels of α-synuclein are present in blood, specifically in red blood cells [45]. Further investigations of the role of haemoglobin and its interaction with iron and α-synuclein in the early stages of MSA are required.

Fig. 8 Heatmap showing the levels of enrichment of selected GO terms enriched in the MSA GM and MSA WM DEG lists (columns: GM, WM). The top 30 enriched GO terms (selected by p value) from each list were chosen, and all redundant terms were removed. The darker the colour, the higher the level of enrichment of the GO term.

It is of interest that HLA-A, HLA-B and HLA-C, which relate to immune function, were all expressed at higher levels in control WM than in MSA WM. These genes encode class I major histocompatibility complex cell surface proteins that are constitutively expressed on microglia and endothelial cells in the brain and can be up-regulated in all brain cells in response to an immune challenge [46]. Such reduced expression of HLA cell surface proteins in MSA WM may suggest a reduction in WM-specific immune function that could relate to the documented reduced involvement of inflammatory microglia in the progression of MSA degeneration [47].

Fig. 9 Heatmap showing the levels of enrichment of selected GO terms enriched in the MSA GM and MSA WM DEI lists. The top 30 enriched GO terms (selected by p value) from each list were chosen, and all redundant terms were removed. The darker the colour, the higher the level of enrichment of the GO term.

Differential expression of lincRNAs suggests that a strong regulatory component is involved in MSA-specific neurodegeneration

The largest category of DEGs and DEIs within MSA GM and MSA WM was un-annotated transcripts, many of which were putative lincRNAs. Many of the putative lincRNAs have distinct tissue expression patterns. Unfortunately, GO enrichment analysis cannot be performed on un-annotated lincRNAs; therefore, it is difficult to determine whether the lincRNAs are related to tissue- or disease-specific processes. The transcriptome comparisons between MSA tissue and control tissue did not highlight any differentially expressed lincRNAs, suggesting that the lincRNAs are involved in tissue-specific processes. The specific expression of lincRNAs across different tissue types in the human brain agrees with previous transcriptomic reports [32, 48]. When the lists of DEGs and DEIs from MSA GM and MSA WM were compared with the control GM and WM lists generated by Mills et al. [31], the un-annotated lincRNAs diverged more than did the annotated genes and isoforms. Consequently, it is not possible to conclude that lincRNAs are not involved in the establishment and progression of MSA. This study confirms the heterogeneity of the human brain through the identification of divergent transcriptomic profiles across adjacent regions. Further, much of this divergence is due to the differing expression patterns of putative lincRNAs. It is important that the functions of these putative lincRNAs be identified so that more complete GO enrichment analyses can be performed.

Conclusions

We would like to conclude with a note on the accuracy of GO enrichment analysis when using gene versus isoform expression sets. The use of gene lists is a standard approach, and one that was also used in this study. In the context of RNA-Seq analysis, it is important to note that the gene lists generated by any differential expression software package, e.g. Cuffdiff, only indirectly reflect the shape of the transcriptome, because the level of gene expression is the sum of the expression levels of its alternatively spliced isoforms. This is of particular importance when analysing human transcriptomes, in which alternative splicing of genes is the norm [49]. Therefore, the elevated expression of one isoform can be offset by reduced expression of other isoforms when the cumulative expression level for the gene, as a transcriptionally active DNA locus, is estimated. In contrast, the expression level of an individual transcript is a true primary signal generated by RNA-Seq and should preferentially be used for downstream analysis steps such as GO enrichment analysis. As with genes, the sequencing approach makes annotation information available for each transcript, so that it can be assigned to its respective GO terms. From the perspective of our own analysis, GO enrichment analysis based on transcriptional isoforms yields more specific GO term clusters, e.g. with fewer general terms such as metabolism or cell division.

Finally, this study was the first in-depth analysis of the MSA transcriptome. This study also investigated the expression patterns in two distinct tissue types, GM and WM, in an attempt to provide new insights into MSA molecular pathology. We demonstrated that different tissue types within the brain have distinct expression profiles, primarily due to the expression of novel lincRNAs. Further, we identified several new candidate genes for future investigations in MSA, including HBA1, HBA2 and HBB. Because the SFG, a region only moderately affected in MSA, was selected for study, these genes hold great potential as biomarkers for the early detection or confirmation of MSA.

Acknowledgments The authors wish to thank Bei Jun Chen for her contribution to GO term enrichment analysis and generation of Suppl. Fig. 1 and Caroline Janitz for her expert advice regarding Illumina sequencing. Tissues were received from the Sydney Brain Bank at Neuroscience Research Australia and the New South Wales Tissue Resource Centre at the University of Sydney, which are supported by the National Health and Medical Research Council of Australia (NHMRC), University of New South Wales, Neuroscience Research Australia, Schizophrenia Research Institute and National Institute of Alcohol Abuse and Alcoholism (NIH (NIAAA) R24AA012725). This research was supported by the National Health and Medical Research Council of Australia (Project grant #1022325 to WSK and Fellowship #630434 to GMH) and Brain Foundation Australia (to MJ).

References

1. Wenning GK, Geser F, Krismer F, Seppi K, Duerr S, Boesch S, Kollensperger M, Goebel G, Pfeiffer KP, Barone P, Pellecchia MT, Quinn NP, Koukouni V, Fowler CJ, Schrag A, Mathias CJ, Giladi N, Gurevich T, Dupont E, Ostergaard K, Nilsson CF, Widner H, Oertel W, Eggert KM, Albanese A, del Sorbo F, Tolosa E, Cardozo A, Deuschl G, Hellriegel H, Klockgether T, Dodel R, Sampaio C, Coelho M, Djaldetti R, Melamed E, Gasser T, Kamm C, Meco G, Colosimo C, Rascol O, Meissner WG, Tison F, Poewe W, European Multiple System Atrophy Study G (2013) The natural history of multiple system atrophy: a prospective European cohort study. Lancet Neurol 12(3):264–274
2. Halliday GM, Holton JL, Revesz T, Dickson DW (2011) Neuropathology underlying clinical variability in patients with synucleinopathies. Acta Neuropathol 122(2):187–204. doi:10.1007/s00401-011-0852-9
3. Bleasel JM, Wong JH, Halliday GM, Kim WS (2014) Lipid dysfunction and pathogenesis of multiple system atrophy. Acta Neuropathol Commun 2(1):15. doi:10.1186/2051-5960-2-15
4. Fellner L, Jellinger KA, Wenning GK, Stefanova N (2011) Glial dysfunction in the pathogenesis of alpha-synucleinopathies: emerging concepts. Acta Neuropathol 121(6):675–693. doi:10.1007/s00401-011-0833-z
5. Fellner L, Stefanova N (2013) The role of glia in alpha-synucleinopathies. Mol Neurobiol 47(2):575–586. doi:10.1007/s12035-012-8340-3
6. Wakabayashi K, Takahashi H (2006) Cellular pathology in multiple system atrophy. Neuropathology 26(4):338–345
7. Wenning GK, Stefanova N, Jellinger KA, Poewe W, Schlossmacher MG (2008) Multiple system atrophy: a primary oligodendrogliopathy. Ann Neurol 64(3):239–246. doi:10.1002/ana.21465
8. Huang Y, Song YJ, Murphy K, Holton JL, Lashley T, Revesz T, Gai WP, Halliday GM (2008) LRRK2 and parkin immunoreactivity in multiple system atrophy inclusions. Acta Neuropathol 116(6):639–646. doi:10.1007/s00401-008-0446-3
9. Multiple-System Atrophy Research C (2013) Mutations in COQ2 in familial and sporadic multiple-system atrophy. N Engl J Med 369(3):233–244. doi:10.1056/NEJMoa1212115
10. Ozawa T, Healy DG, Abou-Sleiman PM, Ahmadi KR, Quinn N, Lees AJ, Shaw K, Wullner U, Berciano J, Moller JC, Kamm C, Burk K, Josephs KA, Barone P, Tolosa E, Goldstein DB, Wenning G, Geser F, Holton JL, Gasser T, Revesz T, Wood NW, European MSA (2006) The alpha-synuclein gene in multiple system atrophy. J Neurol Neurosurg Psychiatry 77(4):464–467. doi:10.1136/jnnp.2005.073528
11. Kingsbury AE, Daniel SE, Sangha H, Eisen S, Lees AJ, Foster OJ (2004) Alteration in alpha-synuclein mRNA expression in Parkinson's disease. Mov Disord 19(2):162–170. doi:10.1002/mds.10683
12. Jin H, Ishikawa K, Tsunemi T, Ishiguro T, Amino T, Mizusawa H (2008) Analyses of copy number and mRNA expression level of the alpha-synuclein gene in multiple system atrophy. J Med Dent Sci 55(1):145–153
13. Asi YT, Simpson JE, Heath PR, Wharton SB, Lees AJ, Revesz T, Houlden H, Holton JL (2014) Alpha-synuclein mRNA expression in oligodendrocytes in MSA. Glia 62(6):964–970. doi:10.1002/glia.22653
14. Ozawa T, Okuizumi K, Ikeuchi T, Wakabayashi K, Takahashi H, Tsuji S (2001) Analysis of the expression level of alpha-synuclein mRNA using postmortem brain samples from pathologically confirmed cases of multiple system atrophy. Acta Neuropathol 102(2):188–190

15. Mills JD, Janitz M (2012) Alternative splicing of mRNA in the molecular pathology of neurodegenerative diseases. Neurobiol Aging 33(5):1012 e1011–1024. doi:10.1016/j.neurobiolaging.2011.10.030
16. Mills JD, Nalpathamkalam T, Jacobs HI, Janitz C, Merico D, Hu P, Janitz M (2013) RNA-Seq analysis of the parietal cortex in Alzheimer's disease reveals alternatively spliced isoforms related to lipid metabolism. Neurosci Lett 536:90–95. doi:10.1016/j.neulet.2012.12.042
17. Twine NA, Janitz C, Wilkins MR, Janitz M (2013) Sequencing of hippocampal and cerebellar transcriptomes provides new insights into the complexity of gene regulation in the human brain. Neurosci Lett 541:263–268. doi:10.1016/j.neulet.2013.02.034
18. Voineagu I, Wang X, Johnston P, Lowe JK, Tian Y, Horvath S, Mill J, Cantor RM, Blencowe BJ, Geschwind DH (2011) Transcriptomic analysis of autistic brain reveals convergent molecular pathology. Nature 474(7351):380–384. doi:10.1038/nature10110
19. Langerveld AJ, Mihalko D, DeLong C, Walburn J, Ide CF (2007) Gene expression changes in postmortem tissue from the rostral pons of multiple system atrophy patients. Mov Disord 22(6):766–777. doi:10.1002/mds.21259
20. Wenning GK, Tison F, Seppi K, Sampaio C, Diem A, Yekhlef F, Ghorayeb I, Ory F, Galitzky M, Scaravilli T, Bozi M, Colosimo C, Gilman S, Shults CW, Quinn NP, Rascol O, Poewe W, Multiple System Atrophy Study G (2004) Development and validation of the Unified Multiple System Atrophy Rating Scale (UMSARS). Mov Disord 19(12):1391–1402. doi:10.1002/mds.20255
21. Morris JC (1993) The Clinical Dementia Rating (CDR): current version and scoring rules. Neurology 43(11):2412–2414
22. Rapaport F, Khanin R, Liang Y, Pirun M, Krek A, Zumbo P, Mason CE, Socci ND, Betel D (2013) Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data. Genome Biol 14(9):R95. doi:10.1186/gb-2013-14-9-r95
23. Gallego Romero I, Pai AA, Tung J, Gilad Y (2014) RNA-seq: impact of RNA degradation on transcript quantification. BMC Biol 12(1):42. doi:10.1186/1741-7007-12-42
24. Blankenberg D, Von Kuster G, Coraor N, Ananda G, Lazarus R, Mangan M, Nekrutenko A, Taylor J (2010) Galaxy: a web-based genome analysis tool for experimentalists. Curr Protoc Mol Biol. doi:10.1002/0471142727.mb1910s89, Chapter 19:Unit 19 10 11–21
25. Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I, Taylor J, Miller W, Kent WJ, Nekrutenko A (2005) Galaxy: a platform for interactive large-scale genome analysis. Genome Res 15(10):1451–1455. doi:10.1101/gr.4086505
26. Goecks J, Nekrutenko A, Taylor J (2010) Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol 11(8):R86. doi:10.1186/gb-2010-11-8-r86
27. Trapnell C, Pachter L, Salzberg SL (2009) TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25(9):1105–1111. doi:10.1093/bioinformatics/btp120
28. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L (2010) Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 28(5):511–515. doi:10.1038/nbt.1621
29. Huang DW, Sherman BT, Lempicki RA (2009) Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 4(1):44–57. doi:10.1038/Nprot.2008.211
30. Lai EC, Tam B, Rubin GM (2005) Pervasive regulation of Drosophila Notch target genes by GY-box-, Brd-box-, and K-box-class microRNAs. Genes Dev 19(9):1067–1080. doi:10.1101/gad.1291905
31. Mills JD, Kavanagh T, Kim WS, Chen BJ, Kawahara Y, Halliday GM, Janitz M (2013) Unique transcriptome patterns of the white and grey matter corroborate structural and functional heterogeneity in the human frontal lobe. PLoS One 8(10):e78480. doi:10.1371/journal.pone.0078480
32. Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A, Tanzer A, Lagarde J, Lin W, Schlesinger F, Xue CH, Marinov GK, Khatun J, Williams BA, Zaleski C, Rozowsky J, Roder M, Kokocinski F, Abdelhamid RF, Alioto T, Antoshechkin I, Baer MT, Bar NS, Batut P, Bell K, Bell I, Chakrabortty S, Chen X, Chrast J, Curado J, Derrien T, Drenkow J, Dumais E, Dumais J, Duttagupta R, Falconnet E, Fastuca M, Fejes-Toth K, P, Foissac S, Fullwood MJ, Gao H, Gonzalez D, Gordon A, Gunawardena H, Howald C, Jha S, Johnson R, Kapranov P, King B, Kingswood C, Luo OJ, Park E, Persaud K, Preall JB, Ribeca P, Risk B, Robyr D, Sammeth M, Schaffer L, See LH, Shahab A, Skancke J, Suzuki AM, Takahashi H, Tilgner H, Trout D, Walters N, Wang H, Wrobel J, Yu YB, Ruan XA, Hayashizaki Y, Harrow J, Gerstein M, Hubbard T, Reymond A, Antonarakis SE, Hannon G, Giddings MC, Ruan YJ, Wold B, Carninci P, Guigo R, Gingeras TR (2012) Landscape of transcription in human cells. Nature 489(7414):101–108. doi:10.1038/Nature11233
33. Li X, Buxbaum JN (2011) Transthyretin and the brain re-visited: is neuronal synthesis of transthyretin protective in Alzheimer's disease? Mol Neurodegener 6:79. doi:10.1186/1750-1326-6-79
34. Schwarzman AL, Goldgaber D (1996) Interaction of transthyretin with amyloid beta-protein: binding and inhibition of amyloid formation. CIBA Found Symp 199:146–160, discussion 160–144
35. Santos SD, Lambertsen KL, Clausen BH, Akinc A, Alvarez R, Finsen B, Saraiva MJ (2010) CSF transthyretin neuroprotection in a mouse model of brain ischemia. J Neurochem 115(6):1434–1444. doi:10.1111/j.1471-4159.2010.07047.x
36. Brouillette J, Quirion R (2008) Transthyretin: a key gene involved in the maintenance of memory capacities during aging. Neurobiol Aging 29(11):1721–1732. doi:10.1016/j.neurobiolaging.2007.04.007
37. Celebi O, Temucin CM, Elibol B, Saka E (2014) Cognitive profiling in relation to short latency afferent inhibition of frontal cortex in multiple system atrophy. Parkinsonism Relat Disord. doi:10.1016/j.parkreldis.2014.03.012
38. Kao AW, Racine CA, Quitania LC, Kramer JH, Christine CW, Miller BL (2009) Cognitive and neuropsychiatric profile of the synucleinopathies: Parkinson disease, dementia with Lewy bodies, and multiple system atrophy. Alzheimer Dis Assoc Disord 23(4):365–370. doi:10.1097/WAD.0b013e3181b5065d
39. Kawai Y, Suenaga M, Takeda A, Ito M, Watanabe H, Tanaka F, Kato K, Fukatsu H, Naganawa S, Kato T, Ito K, Sobue G (2008) Cognitive impairments in multiple system atrophy: MSA-C vs MSA-P. Neurology 70(16 Pt 2):1390–1396. doi:10.1212/01.wnl.0000310413.04462.6a
40. Stankovic I, Krismer F, Jesic A, Antonini A, Benke T, Brown RG, Burn DJ, Holton JL, Kaufmann H, Kostic VS, Ling H, Meissner WG, Poewe W, Semnic M, Seppi K, Takeda A, Weintraub D, Wenning GK, On behalf of the Movement Disorders Society MSASG (2014) Cognitive impairment in multiple system atrophy: a position statement by the neuropsychology task force of the MDS multiple system atrophy (MODIMSA) study group. Mov Disord. doi:10.1002/mds.25880
41. Kaur D, Andersen J (2004) Does cellular iron dysregulation play a causative role in Parkinson's disease? Ageing Res Rev 3(3):327–343. doi:10.1016/j.arr.2004.01.003
42. Biagioli M, Pinto M, Cesselli D, Zaninello M, Lazarevic D, Roncaglia P, Simone R, Vlachouli C, Plessy C, Bertin N, Beltrami A, Kobayashi K, Gallo V, Santoro C, I, Rivella S, Beltrami CA, Carninci P, Raviola E, Gustincich S (2009) Unexpected expression of alpha- and beta-globin in mesencephalic dopaminergic neurons and glial cells. Proc Natl Acad Sci U S A 106(36):15454–15459. doi:10.1073/pnas.0813216106
43. Dexter DT, Jenner P, Schapira AH, Marsden CD (1992) Alterations in levels of iron, ferritin, and other trace metals in neurodegenerative diseases affecting the basal ganglia. The Royal Kings and Queens Parkinson's Disease Research Group. Ann Neurol 32(Suppl):S94–S100
44. Abbott RD, Ross GW, Tanner CM, Andersen JK, Masaki KH, Rodriguez BL, White LR, Petrovitch H (2012) Late-life hemoglobin and the incidence of Parkinson's disease. Neurobiol Aging 33(5):914–920. doi:10.1016/j.neurobiolaging.2010.06.023
45. Barbour R, Kling K, Anderson JP, Banducci K, Cole T, Diep L, Fox M, Goldstein JM, Soriano F, Seubert P, Chilcote TJ (2008) Red blood cells are the major source of alpha-synuclein in blood. Neurodegener Dis 5(2):55–59. doi:10.1159/000112832
46. Hoftberger R, Aboul-Enein F, Brueck W, Lucchinetti C, Rodriguez M, Schmidbauer M, Jellinger K, Lassmann H (2004) Expression of major histocompatibility complex class I molecules on the different cell types in multiple sclerosis lesions. Brain Pathol 14(1):43–50
47. Ishizawa K, Komori T, Arai N, Mizutani T, Hirose T (2008) Glial cytoplasmic inclusions and tissue injury in multiple system atrophy: a quantitative study in white matter (olivopontocerebellar system) and gray matter (nigrostriatal system). Neuropathology 28(3):249–257. doi:10.1111/j.1440-1789.2007.00855.x
48. Mercer TR, Dinger ME, Sunkin SM, Mehler MF, Mattick JS (2008) Specific expression of long noncoding RNAs in the mouse brain. Proc Natl Acad Sci U S A 105(2):716–721. doi:10.1073/Pnas.0706729105
49. Faustino NA, Cooper TA (2003) Pre-mRNA splicing and human disease. Genes Dev 17(4):419–437. doi:10.1101/gad.1048803
50. Schmittgen TD, Livak KJ (2008) Analyzing real-time PCR data by the comparative C(T) method. Nat Protoc 3(6):1101–1108

Chapter 5

Investigation of a long intervening non-coding RNA that may be involved in the pathology of multiple system atrophy


5.1 Primary research article: Long intervening non-coding RNA 00320 is human brain-specific and highly expressed in the cortical white matter

Reference Mills, J. D., Chen, J., Kim, W. S., Waters, P. D., Prabowo, A. S., Aronica, E., Halliday, G. M., and Janitz, M., (2015). “Long intervening non-coding RNA 00320 is human brain-specific and highly expressed in the cortical white matter.” Neurogenetics 16(3):201-213.

Contribution I conceived the experiments, wrote the article, prepared the figures, isolated the RNA for sequencing and carried out the bioinformatic analyses. PDW contributed to calculating sequence and expression conservation of linc00320. JC carried out RT-qPCR of linc00320 in brain tissue under my supervision. ASP and EA completed RT-qPCR of astrocytes. WSK and GMH provided feedback on the manuscript. MJ conceived the experiments and provided feedback on the manuscript.

Synopsis Linc00320 was identified as highly expressed throughout the frontal cortex of the human brain and up-regulated in MSA WM when compared to MSA GM. This expression pattern suggested that linc00320 may be involved in the molecular pathology of MSA, and provided the motivation for performing a comprehensive analysis of the linc00320 locus. The results in this article show that linc00320 is more likely to be involved in tissue differentiation than in MSA pathology. However, further interesting aspects of linc00320 expression patterns were revealed. Linc00320 appeared to be human- and brain-specific; no expression of the full-length transcript was detected in non-human primates or in non-brain human tissues. Within brain tissue, the highest expression levels of linc00320 were seen in WM of the SFG. Further, no expression was detected in human neuronal and oligodendrocyte cell lines, and only basal levels were detected in cultured astrocytes. Since oligodendrocytes and astrocytes are the major constituents of WM, high expression levels in these cells were expected. It was reasoned that expression was not found in these cells because they are homogeneous cell populations removed from the cellular context of the brain tissue, so that linc00320 expression may depend on interactions present only in intact tissue. This suggests that linc00320 may be involved in the inter-neuronal connectivity required for higher human cognition. Finally, this paper set a standard for the use of bioinformatic tools as a means to analyse the functionality of lincRNAs.

Declaration

I certify that this publication was a direct result of my research towards this PhD, and that reproduction in this thesis does not breach copyright regulations.

James Dominic Mills [PhD Candidate]

Neurogenetics (2015) 16:201–213 DOI 10.1007/s10048-015-0445-1

ORIGINAL ARTICLE

Long intervening non-coding RNA 00320 is human brain-specific and highly expressed in the cortical white matter

James D. Mills (1) & Jieqiong Chen (1) & Woojin S. Kim (2, 3) & Paul D. Waters (1) & Avanita S. Prabowo (4, 5) & Eleonora Aronica (4, 5) & Glenda M. Halliday (2, 3) & Michael Janitz (1)

Received: 9 February 2015 / Accepted: 14 March 2015 / Published online: 29 March 2015 © Springer-Verlag Berlin Heidelberg 2015

Abstract Pervasive transcription of the genome produces a diverse array of functional non-coding RNAs (ncRNAs). One particular class of ncRNAs, long intervening non-coding RNAs (lincRNAs), are thought to play a role in regulating gene expression and may be a major contributor to organism and tissue complexity. The human brain, with its heterogeneous cellular make-up, is a rich source of lincRNAs; however, the functions of the majority of lincRNAs are unknown. Recently, by completing RNA sequencing (RNA-Seq) of the human frontal cortex, we identified linc00320 as being highly expressed in the white matter compared to grey matter in the multiple system atrophy (MSA) brain. Here, we further investigate the expression patterns of linc00320 and conclude that it is involved in specific brain regions rather than in the MSA disease process. We also show that the full-length linc00320 is only expressed in human brain tissue and not in other primates, suggesting that it may be involved in improved functional connectivity for higher human brain cognition.

Keywords Human brain · Long intervening non-coding RNAs · Linc00320 · RNA-Seq · Grey matter · White matter · Multiple system atrophy

Electronic supplementary material The online version of this article (doi:10.1007/s10048-015-0445-1) contains supplementary material, which is available to authorized users.

* Michael Janitz

1 School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
2 Neuroscience Research Australia, Sydney, NSW 2031, Australia
3 School of Medical Sciences, University of New South Wales, Sydney, NSW 2052, Australia
4 Department of (Neuro) Pathology, Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
5 Swammerdam Institute for Life Sciences, Center for Neuroscience, University of Amsterdam, Amsterdam, The Netherlands

Introduction

While no strict definition of organism complexity exists, it can be seen generally as an increase in differentiated cells, accompanied by specialized cell function and the appearance of tissue-specific gene expression [1]. The human brain, and in particular the frontal lobe, can be classed as a complex tissue. It contains a number of different cell types that make up two distinct tissue types: grey matter (GM) and white matter (WM). GM is predominantly made up of neurons, dendrites, synapses, as well as glial cells [2–4], while WM consists primarily of axons, myelinating oligodendrocytes, and other glial cells [2, 3, 5]. There is a dramatic increase in the volume of WM in the human brain when compared with the rodent and non-human primate brain. It is theorized that it is this increase in WM mass, along with the associated WM cell types, that allows humans to carry out higher cognitive functions [6–8]. GM and WM from the human frontal lobe have distinct gene expression profiles, which reflect the differing cell types and functional capabilities within these two structures [9].

Recently, there has been a shift away from the simplified protein-centric view of molecular biology [10]. It is now thought that the formation of organism phenotype involves multifaceted regulatory networks that dynamically alter levels of gene expression and rates of translation depending on developmental stage, environmental factors, and disease state. While it was initially thought that regulatory proteins were responsible for this regulation, and while this may be mostly true for bacteria, eukaryotic regulation of gene expression seems to require heterogeneous non-coding RNAs acting in distinct fashion on DNA, other RNA molecules, and proteins [10]. It is thought that it is this non-protein-coding fraction of the genome that contributes to organism complexity [11]. In particular, non-coding RNAs are believed to play an important role in human brain development, plasticity, and disease [12]. Moreover, there is evidence that perturbation of non-coding RNA expression may play an important role in the establishment and progression of neurodegenerative diseases [13].

Long intervening non-coding RNAs (lincRNAs) are a subclass of long non-coding RNAs. LincRNAs are defined as intervening transcripts, relative to the current gene annotations, that are longer than 200 nucleotides and lack protein-coding capacity [14]. LincRNAs have been reported to be involved in X-inactivation, vertebrate embryonic development, and establishment of cell identity [15–18]. Previous transcriptome-wide comparative studies of different tissue types from the human brain have suggested that expression of lincRNAs may be the biggest driver of tissue differentiation and that they may be involved in neurodegenerative diseases [9, 19]. Of particular interest are lincRNAs that are expressed at high levels. For example, it has recently been shown that the highly expressed OLMALINC may be involved in the maturation of oligodendrocytes [20].

Multiple system atrophy (MSA), a sporadic neurodegenerative disorder characterized by varying combinations of parkinsonism, cerebellar ataxia, and autonomic failure [21, 22], is a disease in which conventional genetic studies have made little headway. Linc00320 is a multi-exon lincRNA that is transcribed from chromosome 21; it also undergoes alternative splicing to produce distinct splice variants. Linc00320 was previously identified as being up-regulated 5.4-fold in MSA WM when compared to MSA GM [19]. Furthermore, a smaller, not statistically significant up-regulation of 2.8-fold was seen when healthy WM was compared to healthy GM [9]. In all conditions, linc00320 was expressed at a high level. The high levels of brain region-specific expression provide evidence that linc00320 may have a functional role in the human brain. Furthermore, the up-regulation in WM tissue from the MSA brain suggests that linc00320 may be involved in MSA pathology [19].

Toward further categorizing the possible function of linc00320, here we further analyze linc00320 RNA-Seq data, validate the linc00320 human brain expression values, assess sequence and transcription conservation in vertebrates, determine tissue-specific expression levels, and provide the putative secondary structure along with predicted RNA-binding protein interaction sites. We find that linc00320 is more likely to be involved in region-specific processes in WM than in MSA disease etiology. We also show that linc00320 is specifically expressed in the human brain even though the gene is conserved among primates, thus suggesting that linc00320 may be involved in improved functional connectivity for higher cognition.

Materials and methods

RNA-Seq data sets and analysis

RNA-Seq datasets were downloaded from the NCBI's short read archive (http://www.ncbi.nlm.nih.gov/sra; accession numbers SRA091951, SRA512485, and SRA602249). For the meta-analysis across 16 human tissues, RNA-Seq data sets were taken from Illumina's BodyMap2 project (accession number ERP000546).

The RNA-Seq reads were mapped with TopHat, which utilizes Bowtie to align sequence reads to the Homo sapiens reference genome (build hg19). The default settings for TopHat were used. Cufflinks was then used to assemble the aligned reads into individual transcripts by inferring splicing structure and providing a minimal number of predicted transcripts through parsimonious assembly. Cufflinks also normalized the read count of each input file to allow for calculation of the relative abundance of each transcript in fragments per kilobase of exon per million fragments mapped (fpkm). To guide the assembly process, the iGenomes UCSC hg19 full annotation GTF file was used. The TopHat outputs along with merged Cufflinks files were passed on to Cuffdiff. Cuffdiff takes the replicates from each condition and looks for statistically significant changes in gene expression, transcript expression, splicing, and promoter use. Cuffdiff uses a corrected p value, known as the q value, to determine whether the differences between the two groups are significant (q value <0.05) [23].

For the two-way ANOVA of linc00320 expression levels in the RNA-Seq data, the fpkm value of linc00320 in each replicate was first normalized across library sizes using Cuffnorm. The normalized fpkm values were then passed to R (http://www.r-project.org/) for statistical testing.

Cell cultures

Astrocyte-enriched human cell cultures were prepared from fetal brain tissue (15–16 weeks of gestation; obtained from spontaneous abortions with appropriate maternal written consent) or from histologically normal temporal neocortex of an adult patient (male; 32 years) undergoing extensive surgical resection of the mesial structures for the treatment of medically intractable epilepsy. Tissue was obtained in accordance with the Declaration of Helsinki and the AMC Research Code provided by the Medical Ethics Committee of the AMC. Tissue samples were collected in Dulbecco's modified Eagle's medium (DMEM)/HAM F10 (1:1) medium (Gibco, Life Technologies), supplemented with 50 units/ml penicillin, 50 μg/ml streptomycin, and 10 % fetal calf serum (FCS). Cell isolation was performed as previously described [24].

Brain tissue samples

Brain tissue was taken from the superior frontal gyrus (SFG) of three MSA and three healthy control donors aged between 62 and 98 years. GM and WM samples were taken from each individual, giving a total of 12 samples. The postmortem interval (PMI) of the samples ranged from 8 to 45 h. Control cases had no clinical or pathological evidence of neurodegenerative disease. MSA brains were clinically and pathologically diagnosed using international diagnostic criteria [25]. Brain tissue samples were obtained from the Sydney Brain Bank following ethical approval from the Human Research Ethics Committee of the University of New South Wales (Ref No. HC11221). Additional cases were obtained from the archives of the Department of Neuropathology of the Academic Medical Center (AMC, University of Amsterdam). Tissue was obtained and used in accordance with the Declaration of Helsinki and the AMC Research Code provided by the Medical Ethics Committee. WM samples (corpus callosum) were obtained at autopsy from nine adult control patients (age range 34–94 years; F/M: 2/7) and cortical tissue (temporal cortex) from ten control patients (age range 34–94 years; F/M: 3/7). All autopsies were performed within 24 h after death. Control cases had no clinical or pathological evidence of neurological diseases.

Reverse transcription quantitative real-time polymerase chain reaction

For RNA isolation, frozen material or cell culture material was homogenized in Qiazol Lysis Reagent (QIAGEN Benelux, Venlo, The Netherlands). Total RNA was isolated using the miRNeasy mini kit or RNeasy Lipid Tissue mini kit (QIAGEN Benelux, Venlo, The Netherlands) according to the manufacturer's instructions. The concentration and purity of RNA were determined at 260/280 nm using a Nanodrop spectrophotometer (Ocean Optics, Dunedin, FL, USA).

For RT-qPCR of MSA and control brain samples, an Express One-Step SuperScript RT-qPCR kit from Invitrogen (Carlsbad, CA, USA) was used in accordance with the manufacturer's guidelines. Briefly, each reaction contained 10 μL Express SYBR® GreenER™ qPCR SuperMix Universal solution, 0.4 μL forward primer (10 μM), 0.4 μL reverse primer (10 μM), 0.5 μL Express SuperScript® Mix for One-Step GreenER™ solution, 150 ng RNA template, and H2O to make up a total volume of 20 μL. RT-qPCR was carried out using the Rotor-Gene 6000 with an initial hold at 50 °C for 5 min and a second hold at 95 °C for 2 min, followed by 40 cycles of denaturation for 15 s at 94 °C and annealing and extension for 60 s at 60 °C; a fluorescence acquisition step was carried out at the end of this step. This was followed by a final melt run from 60 to 95 °C. Each sample was run in triplicate. PSMB4 was used as a control to normalize results from each RT-qPCR run. Specificity of each reaction was confirmed through analysis of melt curves and gel electrophoresis of PCR products.

For RT-qPCR of fetal and adult astrocytes, cerebral cortex tissue, and WM, five micrograms of total RNA were first reverse-transcribed into cDNA using oligo dT primers. Five nanomoles of oligo dT primers were annealed to 5 μg total RNA in a total volume of 25 μL by incubation at 72 °C for 10 min, and cooled to 4 °C. Reverse transcription was performed by the addition of 25 μL RT-mix containing First Strand Buffer (Invitrogen–Life Technologies), 2 mM dNTPs (Pharmacia, Germany), 30 U RNAse inhibitor (Roche Applied Science, Indianapolis, IN, USA) and 400 U M-MLV reverse transcriptase (Invitrogen–Life Technologies, The Netherlands). The total reaction mix (50 μL) was incubated at 37 °C for 60 min, heated to 95 °C for 10 min, and stored at −20 °C until use. For qPCR, a mastermix was prepared on ice, containing per sample 1 μL cDNA, 2.5 μL of FastStart Reaction Mix SYBR Green I (Roche Applied Science, Indianapolis, IN, USA), and 0.4 μM of both reverse and forward primers. The final volume was adjusted to 5 μL with H2O (PCR grade). The LightCycler® 480 Real-Time PCR System (Roche Applied Science) was used with a 384-multiwell plate format. The cycling conditions were as follows: initial denaturation at 95 °C for 5 min, followed by 45 cycles of denaturation at 95 °C for 15 s, annealing at 55–60 °C for 5 s, and extension at 72 °C for 10 s. The fluorescent product was measured by a single acquisition mode at 72 °C after each cycle. To distinguish specific from non-specific products and primer dimers, a melting curve was obtained after amplification by holding the temperature at 65 °C for 15 s followed by a gradual increase in temperature to 95 °C at a rate of 2.5 °C s−1, with the signal acquisition mode set to continuous [26].

For both RT-qPCR experiments, the sequence of the forward primer used to amplify linc00320 was GACTCCTTTGGGAGACCAGTG, whereas the sequence of the reverse primer was AGGTCACAGGGGATTTGATGG. The forward and reverse primers used to amplify PSMB4 were TCAGTCCTCGGCGTTAAGTTC and CTGATCATGTGGGCAATATCC, respectively. The 2^−ΔΔCt method was used to calculate fold-change [27], and the data were passed to R (http://www.r-project.org/) for statistical testing.

Sequence conservation and expression conservation analysis

The sequence conservation of all linc00320 isoforms was investigated using the UCSC Genome Browser (http://genome.ucsc.edu). The full display mode of the Multiz Alignment of 100 vertebrates was chosen to visualize linc00320 conservation. BLASTN (http://www.ensembl.org/) searches, using normal and distant homologies settings, were carried out using linc00320-1 (ENST00000416768) as a query in Pan troglodytes (chimpanzee), Gorilla gorilla (gorilla), Pongo pygmaeus abelii (orangutan), Macaca mulatta (rhesus macaque), Otolemur garnettii (bushbaby), Mus musculus (mouse), Loxodonta africana (African savanna elephant), and Monodelphis domestica (gray short-tailed opossum).

Wiggle plots of read coverage were created using SAMtools [28] for brain RNA-Seq data [29] and uploaded as a custom track to examine the expression, by looking at read depth, of linc00320 in representative genomes.

Prediction of secondary structure and interactions with RNA-binding proteins

The RNAfold tool from the ViennaRNA Web Suite (http://rna.tbi.univie.ac.at/) was used, with default options, for prediction of linc00320 secondary structures [30, 31]. RNAfold predicts the minimum free energy (MFE) secondary structure and calculates the centroid structure [32].

For the prediction of protein–RNA interactions, the catRAPID web-based software was used (http://service.tartaglialab.com/page/catrapid_group) [33, 34]. First, catRAPID omics was used to predict interactions between linc00320 and the nucleotide-binding proteome of H. sapiens. Next, the interaction profile of each of the potential RNA-binding proteins with linc00320 was investigated individually using catRAPID fragments.

Results

Expression patterns of linc00320

Linc00320 was expressed at 59.4 fpkm in MSA WM, 49.14 fpkm in control WM, 11.9 fpkm in MSA GM, and 14.15 fpkm in control GM. The RNA-Seq expression values for linc00320 were in the top 10 % of expressed genes in MSA WM, the top 11 % in control WM, and the top 38 and 33 % in MSA GM and control GM, respectively. This ranking included protein-coding genes, other lincRNAs, and unannotated genes, indicating a relatively high expression level of linc00320. For comparison, the highly expressed long non-coding RNA metastasis associated lung adenocarcinoma transcript 1 (MALAT1) was in the top 5 % of genes expressed in MSA and control WM and in the top 3 and 2 % of genes expressed in MSA and control GM, respectively.

A two-way ANOVA was carried out on the RNA-Seq data to ascertain whether there was any difference in linc00320 expression between GM and WM and whether the MSA disease process was involved in any changes in expression. There was a statistically significant fourfold up-regulation of linc00320 in WM when compared to GM (p value <0.003) (Fig. 1a). There was no difference between linc00320 levels when MSA tissue was compared to healthy tissue (p value >0.65). There was no interaction effect between disease state and tissue type (p value >0.5). We were able to validate these results using RT-qPCR; there was a ninefold up-regulation of linc00320 in WM when compared to GM (p value <0.01) (Fig. 1c). Again, there was no difference in linc00320 expression levels when MSA tissue was compared to healthy tissue (ignoring tissue type) (p value >0.75). There was no interaction between disease state and tissue type (p value >0.9). Both the RNA-Seq and RT-qPCR results showed a strong up-regulation of linc00320 in WM compared to GM, with limited interaction between disease state and tissue type.

Next, we performed a statistical analysis to establish whether there was any difference in linc00320 expression among the individual groups (MSA WM, MSA GM, control WM, control GM). Using the RNA-Seq data, there was a statistically significant fivefold up-regulation of linc00320 in MSA WM compared to MSA GM (p value <0.04). The 3.5-fold up-regulation seen in control WM when compared to control GM was not statistically significant (Fig. 1b). There was no difference in expression when MSA GM was compared to control GM, or when MSA WM was compared to control WM. This is also in line with previous RNA-Seq findings [19]. For the RT-qPCR, neither the 11.9-fold up-regulation of linc00320 in control WM compared to control GM nor the 7.3-fold up-regulation in MSA WM compared to MSA GM was statistically significant (Fig. 1d). While the general trend remains, the RT-qPCR results are in conflict with the RNA-Seq data, which showed a statistically significant up-regulation in MSA WM when compared to MSA GM. While we cannot rule out that changes in the expression of linc00320 are involved in MSA pathology, both the RNA-Seq and RT-qPCR results strongly point to linc00320 expression being brain tissue region-specific in the human brain rather than being involved in the MSA disease process.

To assess whether linc00320 was involved in any cis regulation of adjacent protein-coding loci, the expression levels of genes 300 kb upstream and downstream of the linc00320 locus were analyzed using data derived from the previous RNA-Seq study [9]. The neural cell adhesion molecule 2 (NCAM2) was the only protein-coding gene within this region, and there were no changes in the expression patterns of NCAM2 across GM and WM.

Meta-analysis of linc00320 across other human tissues

To determine whether linc00320 was expressed specifically in brain tissue or across other tissues in the human body, a meta-analysis was performed using the publicly available RNA-Seq datasets from the Illumina Transcriptome BodyMap 2 project. The expression levels of linc00320 were investigated across 16 tissue types of varying ages and sex, including liver, brain, prostate, testes, lymph node, ovary, kidney, heart, lung, breast, adipose, adrenal, skeletal muscle, thyroid, colon, and white blood cells. The level of linc00320 in the brain was 5.64 fpkm. In contrast, only residual levels of linc00320 expression were detected in prostate (0.01 fpkm) and testes (0.008 fpkm), and there was no expression in other tissues (Fig. 2). Depending on the total amount of RNA in a cell, one transcript copy per cell corresponds to 0.5–5 fpkm [35]. It might therefore be concluded that there was no expression of linc00320 outside of brain tissue.

Fig. 1 a RNA-Seq expression values for linc00320 in GM and WM. Linc00320 was up-regulated fourfold in WM when compared to GM (p value <0.003). There was no difference in expression between MSA and control tissue. Error bars are one standard deviation. b RNA-Seq expression levels grouped by tissue type and disease state. There was a statistically significant fivefold up-regulation of linc00320 in MSA WM when compared to MSA GM (p value <0.04). The 3.5-fold up-regulation seen in control WM when compared to control GM was not statistically significant. c RT-qPCR expression values for linc00320 in GM and WM. There was a ninefold up-regulation of linc00320 in WM when compared to GM (p value <0.01). There was no difference in expression between MSA and control tissue. PSMB4 was used as a reference gene to normalize the expression levels of linc00320 before comparisons were made. d RT-qPCR expression levels grouped by tissue type and disease state. Linc00320 was up-regulated 11.9-fold in control WM when compared to control GM and 7.4-fold when MSA WM was compared to MSA GM. PSMB4 was used as a reference gene to normalize the expression levels of linc00320 before comparisons were made. For all RT-qPCR panels, expression levels were calculated using 2^−ΔCt, and fold-change was calculated using the 2^−ΔΔCt method. All error bars are one standard deviation
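The RT-qPCR values in Fig. 1 and Fig. 3 are expressed as 2^−ΔCt relative to the PSMB4 reference gene, and fold-changes use the 2^−ΔΔCt method [27]. The arithmetic can be sketched in Python as below. This is a minimal illustration, not the R analysis used in the study. It assumes amplification efficiencies close to 100 % (the standing assumption of the comparative Ct method) and that technical triplicates are simply averaged. The Ct values are invented for illustration and are not data from this study.

```python
from statistics import mean


def delta_ct(target_ct: float, reference_ct: float) -> float:
    """dCt: target Ct minus reference-gene Ct (PSMB4 in this study)."""
    return target_ct - reference_ct


def relative_expression(target_ct: float, reference_ct: float) -> float:
    """Expression relative to the reference gene, 2^-dCt."""
    return 2.0 ** -delta_ct(target_ct, reference_ct)


def fold_change(sample: tuple[float, float], calibrator: tuple[float, float]) -> float:
    """2^-ddCt, with ddCt = dCt(sample) - dCt(calibrator).

    Each argument is a (target Ct, reference Ct) pair.
    """
    ddct = delta_ct(*sample) - delta_ct(*calibrator)
    return 2.0 ** -ddct


# Hypothetical technical triplicates (not data from this study):
# each pair is (mean linc00320 Ct, mean PSMB4 Ct).
wm = (mean([24.1, 24.0, 24.2]), mean([20.0, 20.1, 19.9]))
gm = (mean([27.2, 27.1, 27.3]), mean([20.1, 20.0, 20.2]))

print(f"WM relative expression: {relative_expression(*wm):.4f}")
print(f"GM relative expression: {relative_expression(*gm):.4f}")
print(f"WM vs GM fold-change:   {fold_change(wm, gm):.2f}")
```

With these hypothetical values, the WM sample reaches threshold three cycles earlier than the GM sample after reference-gene correction, which the method reports as an eightfold up-regulation. Because each cycle is treated as an exact doubling, any departure from 100 % efficiency biases the fold-change.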

[Figure: bar chart of linc00320 expression (FPKM, y-axis) by tissue (x-axis): liver, lung, brain, heart, colon, ovary, testes, breast, kidney, thyroid, adrenal, adipose, prostate, lymph node, skeletal muscle, white blood cells]

Fig. 2 Comparative analysis of the linc00320 expression levels in brain and 15 other human tissues. Linc00320 had an expression of 5.64 fpkm in total brain tissue. No significant expression was seen in any other tissue type. The RNA-Seq datasets used for this analysis were taken from Illumina's BodyMap2 project
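The fpkm unit and the estimate that one transcript copy per cell corresponds to 0.5–5 fpkm [35] can be made concrete with a short calculation. The sketch below computes fpkm with the simple textbook formula, fragments × 10^9 / (transcript length in nucleotides × total mapped fragments). It then checks the linc00320 tissue values reported for the BodyMap2 meta-analysis against the lower bound of that range. This is only an illustration under stated assumptions. Cufflinks assigns fragments to isoforms with a statistical model and normalizes by effective transcript length, so its fpkm values will not match this naive formula exactly. The fragment count, transcript length and library size in the example are hypothetical.

```python
def fpkm(fragments: int, length_nt: int, total_mapped: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_nt * total_mapped)


# Lower end of the 0.5-5 fpkm range that corresponds to roughly one
# transcript copy per cell; the exact value depends on cellular RNA content.
ONE_COPY_MIN_FPKM = 0.5


def below_one_copy(value: float, threshold: float = ONE_COPY_MIN_FPKM) -> bool:
    """True if an fpkm value is below the lower bound for one copy per cell."""
    return value < threshold


# linc00320 fpkm values reported for the BodyMap2 meta-analysis.
tissue_fpkm = {"brain": 5.64, "prostate": 0.01, "testes": 0.008}

# Hypothetical library: 300 fragments on a 2,000-nt transcript,
# 30 million mapped fragments in total.
print(f"example: {fpkm(300, 2_000, 30_000_000):.2f} fpkm")

for tissue, value in tissue_fpkm.items():
    label = "below ~1 copy/cell" if below_one_copy(value) else "at or above ~1 copy/cell"
    print(f"{tissue:>8}: {value:>6} fpkm -> {label}")
```

Read this way, the prostate and testes values sit more than an order of magnitude below even the most permissive one-copy bound. This is the basis for treating them as absence of expression, whereas the brain value lies within the one-copy range.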

To further narrow down the brain cell type expression of linc00320, previously published RNA-Seq data from human oligodendrocyte (MO3.13) and neuronal (SK-N-SH) cell lines were analyzed [20]. Oligodendrocytes were used as a cell line model for WM, and neurons were used as a cell line model for GM. As the expression of linc00320 was much higher in WM than in GM, it was expected that higher levels of expression would be identified in oligodendrocytes than in neurons. However, no expression of linc00320 was seen in either cell type. Using RT-qPCR, we also performed quantitative analysis of linc00320 in fetal and adult astrocytes. Again, the expression levels of the linc00320 transcripts remained at basal levels, in contrast to moderate levels of the transcript in cortical tissue and high levels in WM tissue (Fig. 3).

Fig. 3 RT-qPCR quantification of linc00320 expression in fetal and adult astrocytes, cerebral cortex, and WM. The highest level of expression was again seen in WM, followed by the cerebral cortex. Only basal levels of expression were detected in fetal and adult astrocytes. Linc00320 was expressed threefold higher in WM when compared to the cerebral cortex (p value <0.001). The differences in expression levels between WM and fetal astrocytes (p value <0.001) and adult astrocytes (p value <0.005) were statistically significant. Expression levels were calculated using 2^−ΔCt and fold-change was calculated using the 2^−ΔΔCt method

Alternative splicing of linc00320

Three splice variants of linc00320, with expression levels of at least 5 fpkm in one condition, were identified by RNA-Seq. One of the splice variants resembled linc00320-002 annotated in the Ensembl database. The other two splice variants were not annotated and were considered novel. To remain in accordance with Ensembl annotation, the novel transcripts will henceforth be referred to as linc00320-006 and linc00320-007 (Fig. 4a). The annotated linc00320-002 was the dominant isoform, with the highest expression level in each of the conditions, peaking at 30.58 fpkm in MSA WM (Fig. 4b). Linc00320-002 was up-regulated when WM was compared to GM (p value <0.0001), and further, when the tissue types were broken down by disease state, it was up-regulated in both MSA WM (p value <0.01) and control WM (p value <0.05) when compared to their MSA GM and control GM counterparts. Linc00320-002 showed minimal signs of being involved in the disease process. Neither linc00320-006 nor linc00320-007 was differentially expressed across any of the conditions. Hence, we propose that linc00320-002 is the major contributor to tissue differentiation.

Each linc00320 splice variant retained the same 5′ and 3′ exon. Furthermore, the 3′ exon contains two polyadenylation sites, suggesting that all of the linc00320 isoforms are polyadenylated. Linc00320-002 retained a unique second exon. Two different transcription start sites (TSS) were identified among the splice variants: linc00320-002 and -007 utilized the same TSS, whereas -006 had a unique TSS. Each of the isoforms was scanned for the presence of open reading frames (ORFs). The longest ORF identified on any of the isoforms was 54 amino acids in length, which remained well below the 100-amino-acid cut-off used as the selection criterion for lincRNAs [36], thus confirming that all of the isoforms are lincRNAs.

Fig. 4 a Chromosomal positioning, genomic context, and splice variants of linc00320. Track 1 shows the coordinates of linc00320 on chromosome 21. Track 2 shows linc00320 in genomic context. Linc00320 is located between an uncharacterized lncRNA (AL109763.2) and neural cell adhesion molecule 2 (NCAM2). Track 3 is a schematic representation of the exon/intron structure of linc00320-002, -006, and -007. The exons were numbered and highlighted in green. b RNA-Seq expression of linc00320 isoforms across all four tissue types. Linc00320-002 is the dominant isoform, having the highest level of expression in every sample. Linc00320 was significantly up-regulated in MSA WM when compared to MSA GM (p value <0.01) and in control WM when compared to control GM (p value <0.02). Every isoform has its highest level of expression in MSA WM. None of the changes in expression were considered statistically significant. Error bars are one standard deviation

Conservation and expression of linc00320 in vertebrates

Sequence conservation of all linc00320 transcripts was assessed on the UCSC genome browser using the Multiz alignment of 100 vertebrates track, and with BLASTN searches in chimpanzee, gorilla, orangutan, rhesus macaque, bushbaby, mouse, elephant, and opossum. All exons were detected in chimpanzee and orangutan. Exons 3–6 were not detected in the gorilla genome, and in rhesus macaque exon 3 was missing (Suppl. Fig. 1). In bushbaby, mouse, and elephant, only the largest exon was detected in the correct genomic context (i.e., upstream of Ncam2). BLASTN searches did not return any significant hits in the opossum genome.

Expression of linc00320 (Suppl. Fig. 1) was assessed with published brain RNA-Seq data from chimpanzee, gorilla, orangutan, rhesus macaque, and mouse. There was little to no read coverage across the linc00320 locus in chimpanzee brain. There was low (relative to human) expression in gorilla and orangutan that was not restricted to the exons (Suppl. Fig. 1); however, there was no evidence that the largest exon was expressed. In rhesus and mouse, there was no read coverage across linc00320.

Secondary structure of linc00320 isoforms

It is thought that lincRNAs may play vital roles in regulatory functions throughout a cell [37]. To elucidate a possible regulatory role for linc00320, we performed a prediction of the RNA secondary structure coupled with estimation of protein binding capacity. To establish whether the varied exon/intron structure affected the functional capabilities, each linc00320 isoform was analyzed individually. Each of the isoforms had a slightly different secondary structure; however, since all of the isoforms shared the last exon, which is also the largest exon in all transcripts, all of the isoforms feature a large common structural region (Suppl. Fig. 2). Interestingly, the last exon also showed the highest level of sequence conservation across vertebrates.

Prediction of linc00320 interaction with RNA-binding proteins

The protein binding potential of each isoform was assessed based on the presence of sequence motifs and favorable RNA secondary structure. Initially, 17 different RNA-binding proteins were identified as having a high likelihood of interacting with at least one of the linc00320 transcripts. The gene ontology terms for each protein were then assessed, and RNA-binding proteins involved in fundamental RNA-processing functions, such as polyadenylation and RNA transport, were disregarded for further analysis. This reduced the number of distinct proteins to seven: CUGBP Elav-like family member 1 (Celf1), ELAV-like protein 1 (Elav1), Nucleolysin TIA-1 isoform p40 (Tia1), Nucleolysin TIAR (Tiar), Splicing factor, proline- and glutamine-rich (Sfpq), Serine/arginine-rich splicing factor 1 (Srsf1), and RNA-binding protein FUS (Fus).

Five RNA-binding proteins were predicted to interact with linc00320-002: Celf1, Elav1, Tia1, Tiar, and Sfpq. Interestingly, the protein binding motifs for all five proteins were located within a structurally well-defined region of exon 3 of the linc00320-002 transcript. The protein interaction region for linc00320-002 is shown in Fig. 5; for the other transcripts, the interaction profiles are shown in Supplementary Figure 3. Linc00320-006 potentially interacted with six different RNA-binding proteins: Celf1, Fus, Elav1, Tia1, Tiar, and Sfpq. Again, the predicted binding region was similar for all RNA-binding proteins, corresponding to exon three of linc00320-006. Six proteins, namely Celf1, Elav1, Sfpq, Srsf1, Tia1, and Tiar, were predicted to interact with linc00320-007. The binding motifs are clustered within a 140-nucleotide region encompassing exon three. All of the motifs localize within the structurally conserved motif in the RNA secondary structure (Suppl. Fig. 2): an open finger containing between four and six loops, depending on the linc00320 isoform. Of note, while the intron/exon structure of the linc00320 transcripts varies, this particular structural element remains conserved. Moreover, the corresponding exon in each linc00320 transcript is also conserved in chimpanzee and orangutan. In contrast, no RNA-binding proteins were found to interact with the more highly conserved exon at the 3′ end of linc00320.

All of the RNA-binding proteins that may interact with the linc00320 isoforms are involved in developmental or brain-related processes (Table 1). It is thought that Celf1 may play a role in myotonic dystrophy and in embryonic development [40, 51]. It has been shown that Elav proteins are essential for proper development of the Drosophila nervous system and that Elav proteins are differentially expressed in human neuroblastoma [43, 44]. Tia1 knockout mice show dysregulation of genes involved in lipid homeostasis in the brain [45]. Tiar is a protein related to apoptotic death of T lymphocytes and retinal pigment epithelial cells, and its expression is up-regulated in neurons after ischemic cerebral injury [46]. Sfpq regulates the dyslexia susceptibility 1 candidate (DYX1C1) gene, which is involved in neuronal migration during the development of the cerebral cortex [48]. The splicing factor Srsf1 is thought to regulate apoptosis and proliferation to promote epithelial cell

Fig. 5 Interaction profile of the linc00320-002 transcript and the RNA- experimentally validated predictions, the interaction between the Fmr1 binding proteins, Celf1, Tia1, Elav1, Tiar, and Sfpq. The x-axis indicates protein and its mRNA and the interaction between Srsf1 and the long the nucleotide position along the transcript. The interaction score on the non-coding RNA Xist give interaction scores of close to 4 [38, 39]. The y-axis indicates how likely it is for the RNA-binding protein to interact prediction is based on structural properties; further analysis also showed with this region of the transcript. The peak of 3.5 between approximately that sequence motifs are present for each of the RNA-binding proteins 150 and 300 indicates that this is a likely area of interaction. Two transformation [49]. Mutations in FUS have been associ- Seq data, through the use of RT-qPCR, concluding that ated with familial amyotrophic lateral sclerosis [50]. linc00320 expression is specific for WM as a brain region structure rather than involved in specific MSA pathophysiol- ogy. In particular, these differences may be driven by the Discussion linc00320-002 isoform. The expression levels of linc00320, as determined by RNA-Seq, exceeded the expression levels Initial RNA-Seq data suggested that linc00320 was expressed for many protein-coding genes. According to RNA-Seq, at high levels in the human brain with a significant up- linc00320 expression in control WM was in the top 11 % of regulation in frontal WM when compared to frontal GM in genes, and its expression in MSA WM was in the top 10 % of MSA brain tissue. We investigated the validity of the RNA- genes. Based on previous observations in vertebrates, it 210 Neurogenetics (2015) 16:201–213

Table 1 Properties of RNA-binding proteins predicted to interact with linc00320 isoforms

RNA-binding protein Symbol Linc00320 transcript Subcellular location Function References (exon)

CUGBP ELAV-like Celf1 −002 (3) Nucleus, cytoplasm Involved in regulating gene expression [40, 41] family member 1 during embryonic development of vertebrates −006 (3) Inactivation in mice causes growth, [42] −007 (3) viability, and spermatogenesis defects ELAV-like protein 1 Elav1 −002 (3) Nucleus, cytoplasm Elav-like proteins are essential for [43] Drosophila nervous system development −006 (3) Elav-like proteins may play a role in [44] −007 (3) neuronal differentiation Nucleolysin TIA-1 Tia1 −002 (3) Cytoplasmic granule, Mice knockouts have shown [45] isoform p40 −006 (3) nucleus deregulation of genes encoding −007 (3) for lipid homeostasis in the brain Nucleolysin TIAR Tiar −002 (3) Cytoplasmic granule, Up-regulated in neurons after ischemic [46] nucleus, cytoplasm cerebral in jury −006 (3) Essential for primordial germ cell [47] −007 (3) development Splicing factor, Sfpq −002 (3) Nucleus matrix, Regulates expression of dyslexia [48] proline-, and −006 (3) cytoplasm susceptibility 1 candidate (DYX1C1), glutamine-rich −007 (3) a gene that is involved in neuronal migration during development of the cerebral cortex Serine/arginine-rich Srsf1 −007 (3) Cytoplasm, nucleus Regulates apoptosis and proliferation to [49] splicing factor 1 speckle promote epithelial cell transformation FUS RNA-binding Fus −006 (3) Nucleus Mutations have been associated with [50] protein familial amyotrophic lateral sclerosis generally appears that lincRNAs are expressed at lower levels very different from the transcriptome in vivo [56]. If when compared to protein-coding genes [14, 16, 52]. Previous linc00320 is involved in any function that involves interaction studies have shown a threefold difference of median expres- of multiple cell types, in vivo expression patterns may be lost sion levels between lincRNA and protein-coding RNA [14, in cell culture. Alternatively, linc00320 may be expressed in 52, 53]. 
In contrast, linc00320 surpasses these expression other cell types within the brain. The main cells in the brain are levels, particularly in human WM, thus suggesting neurons, and the four different groups of glial cells, such as functionality. oligodendrocytes, astrocytes, microglia, and NG2 cells, that The expression level of linc00320 in total brain was 5.64 occur in varying numbers throughout GM and WM [57]. Con- fpkm (Fig. 2), with enrichment to 34.5 fpkm in control WM sidering that linc00320 is expressed both in WM and GM but (Fig. 1). Since long non-coding RNAs have a highly specific not neurons, oligodendrocytes, or astrocytes, it is possible that expression pattern at both regional and subcellular levels [54], linc00320 is expressed in microglial or NG2 cells. It is known it would be expected that the expression levels for linc00320 that microglia are affected by MSA disease pathology [58, would be even higher if cell type-specific RNA-Seq would be 59], whereas NG2 cells are not. Our results showed that performed. We attempted to establish the cell type expression linc00320 is more likely to be involved in tissue differentia- of linc00320 by performing RNA-Seq of oligodendrocytes tion, for example in controlling cell to cell interactions, rather and neurons, and RT-qPCR in fetal and adult astrocytes; how- than the disease pathology. ever, low levels of expression were identified. This may have In old world monkeys, linc00320 appeared well conserved, been due to the fact that these assays were carried out on cell with all exons detected in most representative species (Suppl. lines, which may have different transcriptome profiles com- Fig. 1). Remarkably, the last exon of this lincRNAwas detect- pared to mature cells in a specific region of the brain. 
Cell ed in the correct genomic context (up stream of Ncam2) in all lines lack the cellular context seen in primary tissue, which eutherian genomes that were examined, suggesting that it can abolish cell-cell interactions, secretion, and other func- arose in the eutherian ancestor and is an ancient region of tions that are typical in a cellular context [55]. Cell lines also the human genome. Interestingly, the expression of all suffer from phenotypic and genotypic drift as they continue to linc00320 exons was only detected in the human brain and divide, meaning that the transcriptome seen in vitro can be no other great ape. However, there appeared to be low Neurogenetics (2015) 16:201–213 211 expression within the gene body of the gorilla and orangutan References linc00320 locus that might represent a proto-linc00320 tran- script. Incorporation of the largest exon (human exon 7) into 1. Vinogradov AE, Anatskaya OV (2007) Organismal complexity, the linc00320 transcript must have been a human-specific cell differentiation and gene expression: human over mouse. event. The mature linc00320 was then recruited into (or en- Nucleic Acids Res 35(19):6350–6356. doi:10.1093/nar/gkm723 2. Nolte J, Sundsten J (2002) The human brain: an introduction to its hanced a recently established) human brain function. functional anatomy, 5th edn. Mosby, China While linc00320 is alternatively spliced and each transcript 3. Saladin K (2010) Anatomy and physiology: the unity of form and variant has a distinct exon/intron structure, RNA secondary function, 5th edn. McGraw-Hill, New York structure analysis revealed that all transcripts contained a 4. Martin J, Howard J, Leonard M (2004) Neuroanatomy: text and atlas, vol., 3rd edn. McGraw-Hill, USA number of similar structural motifs. One such motif appeared 5. Underwood J, Cross S (2009) General and systematic pathology, to be the major structural element involved in interaction with 5th edn. Elsevier Limited, China RNA-binding proteins. 
This structural similarity has resulted 6. Schoenemann PT, Sheehan MJ, Glotzer LD (2005) Prefrontal white in prediction of interaction with the RNA-binding proteins matter volume is disproportionately larger in humans than in other primates. Nat Neurosci 8(2):242–252. doi:10.1038/nn1394 Celf1, Elav1, Tia1, Sfpq, and Tiar across all linc00320 tran- 7. Smaers JB, Schleicher A, Zilles K, Vinicius L (2010) Frontal white scripts. As such, we suggest that each of the linc00320 iso- matter volume is associated with brain enlargement and higher forms is involved in a similar main regulatory function, and structural connectivity in anthropoid primates. PLoS One 5(2): linc00320 splicing plays a cell- or cellular compartment- e9123. doi:10.1371/journal.pone.0009123 8. Zhang K, Sejnowski TJ (2000) A universal scaling law between specific distribution role within the cortical tissue. The cluster gray matter and white matter of cerebral cortex. Proc Natl Acad Sci of protein binding motifs was located on exons that were U S A 97(10):5621–5626. doi:10.1073/pnas.090504197 conserved, but not expressed, in chimpanzee and orangutans, 9. Mills JD, Kavanagh T, Kim WS, Chen BJ, Kawahara Y, Halliday suggesting that linc00320 has a human-specific function. We GM, Janitz M (2013) Unique transcriptome patterns of the white and grey matter corroborate structural and functional heterogeneity might therefore speculate that linc00320 has multiple human in the human frontal lobe. PLoS One 8(10):e78480. doi:10.1371/ brain-specific regulatory roles that may depend on cellular journal.pone.0078480 context and developmental stage. This hypothesis will need 10. Mattick JS (2011) The central role of RNA in human development to be examined with in situ hybridization studies and experi- and cognition. FEBS Lett 585(11):1600–1616. doi:10.1016/j. mental validation of RNA-protein interactions capacity of febslet.2011.05.001 11. Mattick JS (2001) Non-coding RNAs: the architects of eukaryotic linc00320. 
complexity. EMBO Rep 2(11):986–991. doi:10.1093/embo- In summary, linc00320 is highly expressed in specific reports/kve230 regions of the brain, and may be involved in cell- 12. Qureshi IA, Mehler MF (2012) Emerging roles of non-coding specific functions. The linc00320 exons were observed RNAs in brain evolution, development, plasticity and disease. Nat Rev Neurosci 13(8):528–541. doi:10.1038/nrn3234 in all primates, but only expressed in humans. So al- 13. Qureshi IA, Mattick JS, Mehler MF (2010) Long non-coding RNAs though the largest exon is common to all eutherian in nervous system function and disease. Brain Res 1338:20–35. doi: mammal, and the whole gene appears to be assembled 10.1016/j.brainres.2010.03.110 in old world monkeys, the mature human transcript is 14. Cabili MN, Trapnell C, Goff L, Koziol M, Tazon-Vega B, Regev A, Rinn JL (2011) Integrative annotation of human large intergenic recently evolved and may play a role in improved func- noncoding RNAs reveals global properties and specific subclasses. tional connectivity for higher human brain function. Genes Dev 25(18):1915–1927. doi:10.1101/gad.17446611 While future functional studies are required to elucidate 15. Duret L, Chureau C, Samain S, Weissenbach J, Avner P (2006) The the role of linc00320,wesuggestthatlinc00320 is an Xist RNA gene evolved in eutherians by pseudogenization of a protein-coding gene. Science 312(5780):1653–1655. doi:10.1126/ interesting candidate for further understanding of how science.1126316 the human brain functions and what sets humans apart 16. Ulitsky I, Shkumatava A, Jan CH, Sive H, Bartel DP (2011) from other primates. Conserved function of lincRNAs in vertebrate embryonic develop- ment despite rapid sequence evolution. Cell 147(7):1537–1550. doi:10.1016/j.cell.2011.11.055 17. 
Sone M, Hayashi T, Tarui H, Agata K, Takeichi M, Nakagawa S Acknowledgments Tissues were received from the Sydney Brain Bank (2007) The mRNA-like noncoding RNA Gomafu constitutes a nov- at Neuroscience Research Australia and the New South Wales Tissue el nuclear domain in a subset of neurons. J Cell Sci 120(Pt 15): Resource Centre at the University of Sydney which are supported by 2498–2506. doi:10.1242/jcs.009357 the National Health and Medical Research Council of Australia 18. Rapicavoli NA, Poth EM, Blackshaw S (2010) The long noncoding (NHMRC), University of New South Wales, Neuroscience Research RNA RNCR2 directs mouse retinal cell specification. BMC Dev Australia, Schizophrenia Research Institute, and National Institute of Al- Biol 10:49. doi:10.1186/1471-213X-10-49 cohol Abuse and Alcoholism (NIH (NIAAA) R24AA012725). This re- 19. Mills JD, Kim WS, Halliday GM, Janitz M (2014) Transcriptome search was supported by the National Health & Medical Research Coun- analysis of grey and white matter cortical tissue in multiple system cil of Australia (Project grant #1022325 to WSK and Fellowship #630434 atrophy. Neurogenetics. doi:10.1007/s10048-014-0430-0 to GMH) and Brain Foundation Australia (to MJ). EA is supported by the 20. Mills JD, Kavanagh T, Kim WS, Chen B, Waters PD, Halliday GM, Framework Programme FP7/2007-2013 under the project EPISTOP Janitz M (2015) High expression of long intervening non-coding (grant agreement n°: 602391) and (NeuroGeM grant 733051052). RNA OLMALINC in the human cortical white matter is associated 212 Neurogenetics (2015) 16:201–213

with regulation of oligodendrocyte maturation. Mol Brain 8(1):2. functional DNA elements in the human genome. Proc Natl Acad doi:10.1186/s13041-014-0091-9 Sci U S A 111(17):6131–6138 21. Gilman S, Wenning G, Low P, Brooks D, Mathias C, Trojanowski 36. Hangauer MJ, Vaughn IW, McManus MT (2013) Pervasive tran- J, Wood NW, Colosimo C, Dürr A, Fowler C, Kaufmann H, scription of the human genome produces thousands of previously Klockgether T, Lees A, Poewe W, Quinn N, Revesz T, Robertson unidentified long intergenic noncoding RNAs. PLoS Genet 9(6): D, Sandroni P, Seppi K, Vidailhet M (2008) Second consensus e1003569. doi:10.1371/journal.pgen.1003569 statement on the diagnosis of multiple system atrophy. Neurology 37. Mercer TR, Mattick JS (2013) Structure and function of long non- 71(9):670–676 coding RNAs in epigenetic regulation. Nat Struct Mol Biol 20(3): 22. Lu CF, Soong BW, Wu HM, Teng S, Wang PS, Wu YT (2013) 300–307. doi:10.1038/nsmb.2480 Disrupted cerebellar connectivity reduces whole‐brain network ef- 38. Schaeffer C, Bardoni B, Mandel JL, Ehresmann B, Ehresmann C, ficiency in multiple system atrophy. Mov Disord 28(3):362–369 Moine H (2001) The fragile X mental retardation protein binds 23. Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, specifically to its mRNA via a purine quartet motif. EMBO J Pimentel H, Salzberg SL, Rinn JL, Pachter L (2012) Differential 20(17):4803–4813. doi:10.1093/emboj/20.17.4803 gene and transcript expression analysis of RNA-seq experiments 39. Royce-Tolland ME, Andersen AA, Koyfman HR, Talbot DJ, Wutz with TopHat and Cufflinks. Nat Protoc 7(3):562–578. doi:10.1038/ A, Tonks ID, Kay GF, Panning B (2010) The A-repeat links ASF/ nprot.2012.016 SF2-dependent Xist RNA processing with random choice during X 24. Iyer A, Zurolo E, Prabowo A, Fluiter K, Spliet WG, van Rijen PC, inactivation. Nat Struct Mol Biol 17(8):948–954. 
doi:10.1038/ Gorter JA, Aronica E (2012) MicroRNA-146a: a key regulator of nsmb.1877 astrocyte-mediated inflammatory response. PLoS One 7(9):e44789. 40. Gautier-Courteille C, Le Clainche C, Barreau C, Audic Y, doi:10.1371/journal.pone.0044789 Graindorge A, Maniey D, Osborne HB, Paillard L (2004) EDEN- 25. Wenning GK, Tison F, Seppi K, Sampaio C, Diem A, Yekhlef F, BP-dependent post-transcriptional regulation of gene expression in Ghorayeb I, Ory F, Galitzky M, Scaravilli T, Bozi M, Colosimo C, Xenopus somitic segmentation. Development 131(24):6107–6117. Gilman S, Shults CW, Quinn NP, Rascol O, Poewe W, Multiple doi:10.1242/dev.01528 System Atrophy Study G (2004) Development and validation of the 41. Blech-Hermoni Y, Stillwagon SJ, Ladd AN (2013) Diversity and Unified Multiple System Atrophy Rating Scale (UMSARS). Mov conservation of CELF1 and CELF2 RNA and protein expression Disord 19(12):1391–1402. doi:10.1002/mds.20255 patterns during embryonic development. Dev Dyn 242(6):767– 26. Aronica E, Boer K, Becker A, Redeker S, Spliet WG, van Rijen PC, 777. doi:10.1002/dvdy.23959 Wittink F, Breit T, Wadman WJ, Lopes da Silva FH, Troost D, 42. Kress C, Gautier-Courteille C, Osborne HB, Babinet C, Paillard L Gorter JA (2008) Gene expression profile analysis of epilepsy- (2007) Inactivation of CUG-BP1/CELF1 causes growth, viability, associated gangliogliomas. Neuroscience 151(1):272–292. doi:10. and spermatogenesis defects in mice. Mol Cell Biol 27(3):1146– 1016/j.neuroscience.2007.10.036 1157. doi:10.1128/MCB. 01009-06 27. Livak KJ, Schmittgen TD (2001) Analysis of relative gene expres- 43. Campos AR, Grossman D, White K (1985) Mutant alleles at the sion data using real-time quantitative PCR and the 2(−Delta Delta locus elav in Drosophila melanogaster lead to nervous system de- C(T)) Method. Methods 25(4):402–408. doi:10.1006/meth.2001. fects. A developmental-genetic analysis. J Neurogenet 2(3):197– 1262 218 28. 
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth 44. Chagnovich D, Fayos BE, Cohn SL (1996) Differential activity of G, Abecasis G, Durbin R, Genome Project Data Processing S ELAV-like RNA-binding proteins in human neuroblastoma. J Biol (2009) The sequence alignment/map format and SAMtools. Chem 271(52):33587–33591 Bioinformatics 25(16):2078–2079. doi:10.1093/bioinformatics/ 45. Heck MV,Azizov M, Stehning T, Walter M, Kedersha N, Auburger btp352 G (2014) Dysregulated expression of lipid storage and membrane 29. Brawand D, Soumillon M, Necsulea A, Julien P, Csardi G, Harrigan dynamics factors in Tia1 knockout mouse nervous tissue. P, Weier M, Liechti A, Aximu-Petri A, Kircher M, Albert FW, Neurogenetics 15(2):135–144. doi:10.1007/s10048-014-0397-x Zeller U, Khaitovich P, Grutzner F, Bergmann S, Nielsen R, 46. Jin K, Li W, Nagayama T, He X, Sinor AD, Chang J, Mao X, Paabo S, Kaessmann H (2011) The evolution of gene expression Graham SH, Simon RP, Greenberg DA (2000) Expression of the levels in mammalian organs. Nature 478(7369):343–348. doi:10. RNA-binding protein TIAR is increased in neurons after ischemic 1038/nature10532 cerebral injury. J Neurosci Res 59(6):767–774 30. Gruber AR, Lorenz R, Bernhart SH, Neubock R, Hofacker IL 47. Beck AR, Miller IJ, Anderson P, Streuli M (1998) RNA-binding (2008) The Vienna RNA websuite. Nucleic Acids Res 36(Web protein TIAR is essential for primordial germ cell development. Server issue):W70–W74. doi:10.1093/nar/gkn188 Proc Natl Acad Sci U S A 95(5):2331–2336 31. Lorenz R, Bernhart SH, Honer Zu Siederdissen C, Tafer H, Flamm 48. Tapia-Paez I, Tammimies K, Massinen S, Roy AL, Kere J (2008) C, Stadler PF, Hofacker IL (2011) ViennaRNA package 2.0. The complex of TFII-I, PARP1, and SFPQ proteins regulates the Algorithms Mol Biol 6:26. doi:10.1186/1748-7188-6-26 DYX1C1 gene implicated in neuronal migration and dyslexia. 32. Hamada M, Kiryu H, Sato K, Mituyama T, Asai K (2009) FASEB J 22(8):3001–3009. 
doi:10.1096/fj.07-104455 Prediction of RNA secondary structure using generalized centroid 49. Anczukow O, Rosenberg AZ, Akerman M, Das S, Zhan L, Karni R, estimators. Bioinformatics 25(4):465–473. doi:10.1093/ Muthuswamy SK, Krainer AR (2012) The splicing factor SRSF1 bioinformatics/btn601 regulates apoptosis and proliferation to promote mammary epithe- 33. Agostini F, Zanzoni A, Klus P, Marchese D, Cirillo D, Tartaglia GG lial cell transformation. Nat Struct Mol Biol 19(2):220–228. doi:10. (2013) catRAPID omics: a web server for large-scale prediction of 1038/nsmb.2207 protein-RNA interactions. Bioinformatics 29(22):2928–2930. doi: 50. Vance C, Rogelj B, Hortobagyi T, De Vos KJ, Nishimura AL, 10.1093/bioinformatics/btt495 Sreedharan J, Hu X, Smith B, Ruddy D, Wright P, Ganesalingam 34. Bellucci M, Agostini F, Masin M, Tartaglia GG (2011) Predicting J, Williams KL, Tripathi V, Al-Saraj S, Al-Chalabi A, Leigh PN, protein associations with long noncoding RNAs. Nat Methods 8(6): Blair IP, Nicholson G, de Belleroche J, Gallo JM, Miller CC, Shaw 444–445. doi:10.1038/nmeth.1611 CE (2009) Mutations in FUS, an RNA processing protein, cause 35. Kellis M, Wold B, Snyder MP, Bernstein BE, Kundaje A, Marinov familial amyotrophic lateral sclerosis type 6. Science 323(5918): GK, Ward LD, Birney E, Crawford GE, Dekker J (2014) Defining 1208–1211. doi:10.1126/science.1165942 Neurogenetics (2015) 16:201–213 213

51. Timchenko NA, Cai ZJ, Welm AL, Reddy S, Ashizawa T, Timchenko LT (2001) RNA CUG repeats sequester CUGBP1 and alter protein levels and activity of CUGBP1. J Biol Chem 276(11):7820–7826. doi:10.1074/jbc.M005960200
52. Guttman M, Garber M, Levin JZ, Donaghey J, Robinson J, Adiconis X, Fan L, Koziol MJ, Gnirke A, Nusbaum C (2010) Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat Biotechnol 28(5):503–510
53. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L (2010) Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 28(5):511–515
54. Mercer TR, Dinger ME, Sunkin SM, Mehler MF, Mattick JS (2008) Specific expression of long noncoding RNAs in the mouse brain. Proc Natl Acad Sci U S A 105(2):716–721. doi:10.1073/pnas.0706729105
55. Pan C, Kumar C, Bohl S, Klingmueller U, Mann M (2009) Comparative proteomic phenotyping of cell lines and primary cells to assess preservation of cell type-specific functions. Mol Cell Proteomics 8(3):443–450. doi:10.1074/mcp.M800258-MCP200
56. Burdall SE, Hanby AM, Lansdown MR, Speirs V (2003) Breast cancer cell lines: friend or foe? Breast Cancer Res 5(2):89–95
57. Peters A (2004) A fourth type of neuroglial cell in the adult central nervous system. J Neurocytol 33(3):345–357. doi:10.1023/B:NEUR.0000044195.64009.27
58. Ishizawa K, Komori T, Sasaki S, Arai N, Mizutani T, Hirose T (2004) Microglial activation parallels system degeneration in multiple system atrophy. J Neuropathol Exp Neurol 63(1):43–52
59. Stefanova N, Reindl M, Neumann M, Kahle PJ, Poewe W, Wenning GK (2007) Microglial activation mediates neurodegeneration related to oligodendroglial alpha-synucleinopathy: implications for multiple system atrophy. Mov Disord 22(15):2196–2203. doi:10.1002/mds.21671

Chapter 6

Conclusions and future directions

6.1 Summary

This PhD thesis presents four original primary research articles, each addressing a different aspect of the overall topic “Human brain transcriptomics: towards understanding multiple system atrophy”. Together these articles satisfy the aims outlined in chapter one of this PhD thesis.

The first original research article in this thesis, titled “Unique transcriptome patterns of the white and grey matter corroborate structural and functional heterogeneity in the human frontal lobe” [90], was the first comprehensive global transcriptome profile of distinct regions of the human brain. The tissue samples used in this study were adjacent grey matter (GM) and white matter (WM) taken from the superior frontal gyrus (SFG) of the prefrontal cortex (PFC). This article demonstrated that transcription in the brain is pervasive and tissue-specific. Further, the transcriptome of the SFG is complex, containing numerous alternative splicing events and novel putative long non-coding RNAs (lncRNAs), in particular members of the subclass of long intervening non-coding RNAs (lincRNAs). Much of the region specificity of the brain is driven by the presence of lincRNAs, and it is possible that lincRNAs are involved in the development of tissue-specific cell types. The results presented in chapter 2 of this thesis also highlighted the accuracy of RNA-Seq as a tool for investigating the transcriptome: the RNA-Seq expression profile matched the expression of known gene markers for GM and WM. When this article was published, the use of RNA-Seq was not widespread, and there had been no comparative analyses of distinct tissue types from the human brain. Many RNA-Seq analyses have since been carried out on numerous regions of the human brain; however, this study remains the only RNA-Seq-based transcriptome analysis of GM and WM from the human brain.

A number of lincRNAs were differentially expressed between healthy WM and healthy GM. Amongst these, OLMALINC was the most highly expressed. OLMALINC was up-regulated 4.4-fold in healthy WM, reaching an expression level of 71.5 fragments per kilobase of transcript per million mapped reads (fpkm), which placed it amongst the top 10% of all expressed genes in WM. While high expression is indicative of functionality, it does not confirm it. The high expression of OLMALINC therefore motivated a further analysis of its role in healthy brain function. These results were published in the paper “High expression of long intervening non-coding RNA OLMALINC in the human cortical white matter is associated with regulation of oligodendrocyte maturation” and make up chapter 3 of this thesis [91]. Comparative studies suggest that OLMALINC has evolved recently and is only expressed in human and chimpanzee. Further, amongst human tissues its highest expression levels are seen in the brain. RNAi of OLMALINC in a human oligodendrocyte cell line revealed a possible role of the lincRNA in oligodendrocyte maturation. The observation that OLMALINC may be involved in development of the nervous system was in line with previous hypotheses surrounding lincRNAs [95]. Most importantly, this study was one of the first functional characterisations of a lincRNA expressed in the human brain, and thus demonstrated the importance of the non-coding transcriptome throughout the human brain.
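The fpkm unit, the fold change and the "top 10%" statement above can be made concrete with a short calculation. The Python sketch below is illustrative only: it assumes the standard fpkm definition (fragments mapped to a transcript, divided by transcript length in kilobases and by the total number of mapped fragments in millions), and the fragment count, transcript length, library size and gene list are hypothetical numbers chosen for the example, not data from this thesis.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    kilobases = transcript_length_bp / 1_000
    millions_mapped = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions_mapped)


def fold_change(condition: float, reference: float) -> float:
    """Ratio of expression in one condition over a reference condition."""
    return condition / reference


def percentile_rank(value: float, all_values: list[float]) -> float:
    """Percentage of genes whose expression is strictly lower than `value`."""
    lower = sum(1 for v in all_values if v < value)
    return 100.0 * lower / len(all_values)


# Hypothetical example: 715 fragments on a 1,000 bp transcript,
# from a library of 10 million mapped fragments.
wm = fpkm(715, 1_000, 10_000_000)  # 71.5 fpkm
gm = wm / 4.4                      # level implied by a 4.4-fold up-regulation

# Hypothetical expression values (fpkm) for ten genes in one WM sample.
sample = [0.5, 1.2, 3.0, 4.8, 7.5, 9.9, 15.0, 22.0, 40.0, wm]
rank = percentile_rank(wm, sample)
print(f"WM: {wm:.1f} fpkm, GM: {gm:.2f} fpkm, fold change: {fold_change(wm, gm):.1f}")
print(f"percentile rank: {rank:.0f} (top 10% if >= 90)")
```

In a real analysis the percentile would be computed over all expressed genes in a sample rather than over ten values, and fold changes are often reported on a log2 scale.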

Following the comprehensive analysis of the healthy brain transcriptome, transcriptome profiling of MSA brain tissue from the SFG was carried out. This study is presented in chapter 4 of this thesis in the paper “Transcriptome analysis of grey and white matter cortical tissue in multiple system atrophy”. This primary research article provided new insights into MSA disease pathology [92]. A handful of new genes that may be involved in the molecular pathology of MSA were identified, including α1-hemoglobin (HBA1), α2-hemoglobin (HBA2), β-hemoglobin (HBB) and transthyretin (TTR). The up-regulation of the hemoglobin genes HBA1, HBA2 and HBB suggests a possible involvement of iron accumulation in the pathogenesis of MSA. The transcriptome profiles of MSA GM and MSA WM were also compared, and a large number of putative lincRNAs were found to be differentially expressed between MSA WM and MSA GM. Very little is known about the genetic basis of MSA, and this research article provides the most in-depth transcriptomic investigation of MSA to date.

Amongst the differentially expressed genes between MSA GM and MSA WM was the highly expressed linc00320, which was up-regulated 5.4-fold in MSA WM compared to MSA GM. A much smaller, non-significant up-regulation of linc00320 was found in healthy WM compared to healthy GM. As linc00320 was identified as having a possible involvement in the MSA disease process, it warranted a more comprehensive analysis. The analysis of linc00320 is discussed in chapter 5 of this thesis in the article “Long intervening non-coding RNA 00320 is human brain-specific and highly expressed in the cortical white matter” [93]. Although this primary research article was unable to confirm whether linc00320 plays a role in the MSA disease process, it did uncover interesting results concerning the biology of lincRNAs. linc00320 was expressed at high levels in the GM and WM of the SFG, yet exploration of its expression in cell lines of human neurons, oligodendrocytes, and adult and fetal astrocytes produced negative results. Since these three cell types are unambiguously present in the GM and WM of the SFG, the inability to detect linc00320 expression could be linked to the lack of cell-to-cell communication and tissue context in these cell lines. Further, linc00320 was analysed at the isoform level, suggesting that different splice variants of a lincRNA can have different functional properties.
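Isoform-level analysis starts from per-isoform expression estimates; in chapter 5, for example, linc00320-002 was reported as the dominant isoform in every sample. As a hedged illustration of that kind of summary, the Python sketch below assumes, for simplicity, that gene-level fpkm is the sum of the fpkm values of a gene's isoforms, and reports for each tissue the dominant isoform and its share of gene-level expression. The isoform names and all values are hypothetical.

```python
from typing import Dict, Tuple


def dominant_isoform(isoform_fpkm: Dict[str, float]) -> Tuple[str, float]:
    """Return the most highly expressed isoform and its share of gene-level fpkm.

    Assumption for this sketch: gene-level fpkm is the sum of isoform fpkm values.
    """
    if not isoform_fpkm:
        raise ValueError("no isoforms given")
    gene_total = sum(isoform_fpkm.values())
    top = max(isoform_fpkm, key=lambda name: isoform_fpkm[name])
    share = isoform_fpkm[top] / gene_total if gene_total > 0 else 0.0
    return top, share


# Hypothetical isoform fpkm values for one lincRNA in two tissue types.
samples = {
    "GM": {"isoform_a": 4.0, "isoform_b": 1.0},
    "WM": {"isoform_a": 15.0, "isoform_b": 3.0, "isoform_c": 2.0},
}

for tissue, isoforms in samples.items():
    top, share = dominant_isoform(isoforms)
    print(f"{tissue}: {top} carries {share:.0%} of gene-level fpkm")
```

Reporting the share alongside the name makes it easy to see whether a change in gene-level expression is carried by a single isoform or spread across several.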

6.2 The use of post-mortem brain tissue in transcriptome profiling

In general, there are three main sources of RNA that can be used for comparative transcriptome studies: cell lines, animal models and post-mortem brain tissue. Each of these sources has both advantages and disadvantages for transcriptome profiling experiments. The tissue- and species-specificity of the non-coding transcriptome raises two major issues that must be considered when designing experiments and sourcing RNA: (i) if the transcriptome is context-driven, will homogeneous cell lines retain tissue- and organ-specific transcriptome patterns? and (ii) since the transcriptome is highly species-specific, are animal models sufficient for elucidating complex diseases in humans?

There are two types of cell lines: established cell lines and primary cell lines. Established cell lines are usually derived from tumour cells or from normal cells that have been immortalized, while primary cell lines are extracted directly from the source tissue. In general, cell lines can be maintained in a stable environment, are easy to handle and allow for the extraction of high-quality RNA for downstream experiments. However, established cell lines are devoid of any tissue context; this can abolish cell-cell interactions, secretion and other functions based on cellular context [96]. Established cell lines also experience changes at the phenotypic and genotypic level as they continue to divide over many generations in an in vitro environment [97]. This phenomenon is known as phenotypic or genotypic drift. While primary cell lines are considered closer to cells in vivo, they still suffer from phenotypic and genotypic drift. Further, some primary cells, such as neurons, do not undergo further division, which makes them difficult to work with and gives them a short lifespan in culture.

A comparative transcriptome profiling analysis of established hepatoma cell lines (HepG2 and Huh7), primary human hepatocyte cultures and human liver tissue was carried out to quantify the amount of genotypic drift that occurs between these different RNA sources [98]. Using microarrays, it was found that 77% of probe sets remained unchanged between the primary cells and human tissue, while on average 51.5% of probe sets differed between human tissue and the established cell lines [98]. This comparison used liver, an organ with relatively homogeneous gene expression compared with brain tissue. This indicates that even in homogeneous tissue types there are high levels of gene expression drift between the original tissue and cell lines. Further, the microarrays used in this study only measured the expression of genes annotated prior to 2007, when knowledge of the non-coding transcriptome was relatively limited. No similar comparison has been carried out with cell lines derived from the brain. Hence, a comparison of the transcriptomes of brain tissue, primary cell lines and established cell lines should be carried out using RNA-Seq, which would capture the non-coding RNAs and novel transcripts that are prevalent throughout the human brain. As described in chapter 2, the transcriptome of the brain can vary greatly between adjacent tissue types [90]. It would therefore be important to match the brain region of the tissue with the site of origin of the cell lines selected for comparative studies.
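The probe-set comparison above can be made concrete with a short sketch. It assumes a simple criterion, an absolute log2 fold change of at least 1 (two-fold), for counting a probe set as "changed"; the cut-off, the pseudocount and the toy expression values are hypothetical, and the statistical criteria actually used in [98] are not reproduced here.

```python
import math


def fraction_changed(reference, other, log2_cutoff=1.0, pseudocount=1.0):
    """Fraction of shared probe sets whose |log2 fold change| >= log2_cutoff.

    reference, other: dicts mapping probe-set ID -> non-negative expression.
    A pseudocount is added to both values so that zeros can be compared.
    """
    shared = reference.keys() & other.keys()
    if not shared:
        raise ValueError("the two profiles share no probe sets")
    changed = 0
    for probe in shared:
        ratio = (other[probe] + pseudocount) / (reference[probe] + pseudocount)
        if abs(math.log2(ratio)) >= log2_cutoff:
            changed += 1
    return changed / len(shared)


# Hypothetical toy profiles (invented values, not data from [98]).
tissue = {"p1": 100.0, "p2": 50.0, "p3": 10.0, "p4": 0.0}
primary = {"p1": 110.0, "p2": 45.0, "p3": 30.0, "p4": 0.0}
cell_line = {"p1": 400.0, "p2": 5.0, "p3": 12.0, "p4": 20.0}

print(fraction_changed(tissue, primary))    # 0.25
print(fraction_changed(tissue, cell_line))  # 0.75
```

In a real comparison a fold-change cut-off would normally be combined with a significance test across replicates; a single threshold is used here only to keep the arithmetic visible.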

The evolutionary acceleration of the non-coding transcriptome observed in higher vertebrates, particularly in primates, results in a number of species-specific lncRNAs [31]. If it is indeed these species-specific lncRNAs that are involved in complex disorders that specifically afflict higher mammals, then the use of animal models that lack these lncRNAs may not be appropriate for studying these diseases [31]. Mouse disease models are often used to study complex neurodegenerative diseases; however, it has been shown that 30% of all known lncRNAs are primate-specific, and the majority of human lncRNAs have no orthologs in mice. Nevertheless, there are many advantages associated with using animal models, at least for selected disorders: (i) they allow for the extraction of high-quality RNA from numerous biological replicates, (ii) it is easier to match cases and controls, and (iii) the cellular in vivo context is retained. Many animal models also have relatively short generation times and can be more easily manipulated genetically. When using animal models it is, however, important to make sure that the correct biological question is being asked, while considering the limitations of the model being used.

Post-mortem brain tissue preserves the context of the tissue being studied and hence should be considered the gold standard for gene expression studies [14]. Within post-mortem brain tissue, cells maintain their tissue context and any species-specific lncRNAs are expressed. However, post-mortem tissue can be prone to degradation. Two major groups of factors contribute to RNA degradation: post-mortem factors and agonal factors. Post-mortem factors refer to the treatment of the brain after death, primarily the time taken from death to the brain being frozen, known as the post-mortem interval (PMI). Agonal factors refer to events occurring prior to death, including coma, hypoxia, pyrexia, seizures, dehydration, hypoglycemia, multiple organ failure, head injury, and ingestion of neurotoxic substances at the time of death [99]. Agonal factors are much harder to control than post-mortem factors. While high-quality RNA should ideally be used in transcriptome profiling experiments, moderate levels of RNA degradation do not seem to impact the reliability of experiments [99, 100].

The major goals of this thesis were to understand the complexity of the human brain, with a particular focus on the non-coding transcriptome, and then to investigate the complex neurodegenerative disease MSA. To achieve these goals, post-mortem brain tissue was the ideal source of RNA. Further, all RNA extracted from post-mortem brain tissue in these studies was of moderate to good quality, with an RNA integrity number (RIN) between 6.0 and 7.0, thus providing a statistically robust representation of the transcriptome [101]. Using post-mortem brain tissue in comparative transcriptome analyses appears to be the most reliable way of detecting RNA elements of interest.
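A sample-level quality gate of the kind implied by the RIN range above might be sketched as follows. The `Sample` fields, sample IDs and the optional PMI cut-off are hypothetical; the default minimum RIN of 6.0 simply mirrors the lowest RIN reported for the samples in these studies and is not presented as the protocol actually used. Agonal factors are deliberately left out, since they are rarely available as a single number.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Sample:
    sample_id: str    # hypothetical identifier
    region: str       # e.g. "SFG GM" or "SFG WM"
    rin: float        # RNA integrity number: 1 (degraded) to 10 (intact)
    pmi_hours: float  # post-mortem interval in hours


def passes_qc(sample: Sample, min_rin: float = 6.0,
              max_pmi_hours: Optional[float] = None) -> bool:
    """Return True if the sample meets the RIN (and optional PMI) criteria."""
    if sample.rin < min_rin:
        return False
    if max_pmi_hours is not None and sample.pmi_hours > max_pmi_hours:
        return False
    return True


def split_by_qc(samples: List[Sample], min_rin: float = 6.0,
                max_pmi_hours: Optional[float] = None
                ) -> Tuple[List[str], List[str]]:
    """Partition sample IDs into (kept, rejected) lists, preserving order."""
    kept, rejected = [], []
    for s in samples:
        if passes_qc(s, min_rin, max_pmi_hours):
            kept.append(s.sample_id)
        else:
            rejected.append(s.sample_id)
    return kept, rejected


# Hypothetical samples (invented values).
samples = [
    Sample("S1", "SFG GM", 6.8, 20.0),
    Sample("S2", "SFG WM", 5.2, 15.0),
    Sample("S3", "SFG WM", 7.0, 60.0),
]
print(split_by_qc(samples))                      # (['S1', 'S3'], ['S2'])
print(split_by_qc(samples, max_pmi_hours=48.0))  # (['S1'], ['S2', 'S3'])
```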

6.3 Selection of the superior frontal gyrus

Throughout this thesis, all of the RNA extracted from post-mortem brain tissue for RNA-Seq and the subsequent RT-qPCR validation was taken from the SFG of the human brain. The SFG is a sub-region of the PFC and has been implicated in higher cognitive functions, including working memory [102] and self-awareness [103]. This involvement in human-specific higher thought processes should be reflected in human-specific patterns of transcriptome expression. By contrast, if a comparative transcriptome analysis were carried out between human and chimpanzee liver, large differences in gene expression profiles would not be expected, as the function of the liver is similar in both organisms. Transcriptome profiling of the SFG allows for an in-depth view of a physiologically advanced region of the brain and may hold the key to understanding how the transcriptome drives the complexity of the human brain.

As well as playing a role in human-specific intelligence, the SFG is also a good target for capturing MSA disease pathology. While the SFG is affected by MSA, it is one of the last regions of the brain to undergo cellular loss and GCI accumulation during the progression of MSA [66]. The motivation behind selecting the SFG was to capture early, or at least only moderately advanced, changes involved in MSA pathology. Therefore, any observed aberration of gene expression could be directly related to the molecular pathology of MSA rather than to its advanced pathogenic features. Conversely, the cerebellum and globus pallidus are two regions of the brain that are affected early in MSA, experiencing extensive cellular loss and GCI accumulation [66]. If the cerebellum or globus pallidus were used in a comparative transcriptome analysis, it would be difficult to distinguish gene expression changes that are a causative factor of MSA from those caused by cellular loss or the secondary effects of MSA progression. While selecting the SFG for analysis may deliver a smaller number of differentially expressed genes and transcripts, the identified genes are more likely to be involved in the underlying progression of MSA.

6.4 The complexity of the transcriptome of the human brain

The human brain, and in particular the SFG of the PFC, shows high levels of overall transcription, with GM showing a higher level of transcriptional activity than WM. Interestingly, while there appeared to be less transcriptional activity within the WM, a significantly higher proportion of lincRNAs and un-annotated isoforms was expressed in this structure of the brain. A large number of the un-annotated transcripts appeared to lack protein-coding potential and are presumably lincRNAs. It is known that lncRNAs, including lincRNAs, are widely expressed across the brain, where they often carry out cell-specific regulatory functions, possibly via chromatin modification [104]. This suggests that multiple levels of control of chromatin modification, and thus more intensive regulation of gene expression, may be needed in WM. Myelination of axons in the WM is most significant during childhood and adolescence but can continue into adulthood [105]. Further, there appears to be an increase in the water fraction between the hydrophobic bilayers of the myelin sheath between the ages of 20 and 55, suggesting that myelin is indeed remodelled throughout life [106]. WM may require a higher proportion of lincRNAs so that myelination of axons can be carried out throughout life as new skills are acquired.

Chapter 2 highlighted an increase in the number and diversity of transcripts in the human brain through alternative splicing [92]. Almost 50% of transcripts underwent alternative splicing; this included 8,653 genes that produced between two and seven isoforms and 107 genes with eight or more isoforms. The importance of analysing genes at the isoform level was further emphasised by the differential expression of G protein-coupled receptor 123 (GPR123). At the gene level, GPR123 was up-regulated 5-fold in GM. When GPR123 was analysed at the isoform level, a different picture emerged. Four splice variants were transcribed from the GPR123 locus; while three of the four GPR123 isoforms were up-regulated in GM, one isoform was uniquely expressed in WM, where it became the dominant isoform. This phenomenon, in which the dominant isoform changes between conditions, was termed isoform switching.
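The logic of isoform switching, and why a gene-level summary hides it, can be illustrated with a minimal sketch. It assumes the simplest possible definition, the dominant isoform being the one with the highest expression in each condition; the isoform IDs and expression values are invented and only loosely shaped like the GPR123 pattern, and a published analysis would additionally test whether the change in isoform proportions is statistically significant.

```python
from typing import Dict, NamedTuple


class SwitchResult(NamedTuple):
    gene_level_a: float
    gene_level_b: float
    dominant_a: str
    dominant_b: str
    switched: bool


def dominant_isoform(expr: Dict[str, float]) -> str:
    """Isoform with the highest expression; ties go to the smallest ID."""
    return max(sorted(expr), key=lambda iso: expr[iso])


def detect_isoform_switch(cond_a: Dict[str, float],
                          cond_b: Dict[str, float]) -> SwitchResult:
    """Compare one gene's isoforms between two conditions.

    Isoforms missing from a condition are treated as unexpressed (0.0).
    The gene-level value is the sum over isoforms, as in a gene-level count.
    """
    isoforms = cond_a.keys() | cond_b.keys()
    a = {iso: cond_a.get(iso, 0.0) for iso in isoforms}
    b = {iso: cond_b.get(iso, 0.0) for iso in isoforms}
    dom_a, dom_b = dominant_isoform(a), dominant_isoform(b)
    return SwitchResult(sum(a.values()), sum(b.values()), dom_a, dom_b, dom_a != dom_b)


# Hypothetical gene with four isoforms (invented values); iso4 is WM-only.
gm = {"iso1": 40.0, "iso2": 30.0, "iso3": 20.0}
wm = {"iso1": 4.0, "iso2": 3.0, "iso3": 2.0, "iso4": 9.0}

result = detect_isoform_switch(gm, wm)
print(result.gene_level_a / result.gene_level_b)              # 5.0 (up in GM)
print(result.dominant_a, result.dominant_b, result.switched)  # iso1 iso4 True
```

A gene-level analysis sees only the 5-fold difference in the summed values; the change of dominant isoform is visible only once the isoforms are compared individually.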

Isoform switching was not only identified at the GPR123 locus; it was identified numerous times throughout the transcriptome of the SFG, as described in chapter 2 [92]. If the transcriptome is only analysed at the gene level and not at the isoform level, this particular splicing pattern would be missed. This has important biological relevance, as each splice variant can be functionally distinct at either the RNA or protein level. There are no examples in the literature that quantify the levels of isoform switching in other peripheral human tissues or in different regions of the brain. It is possible that isoform switching increases with tissue complexity. Indeed, alternative splicing appears to be correlated with organism complexity: it is more common in humans [107, 108] than in Drosophila melanogaster [109] and Caenorhabditis elegans [110]. Isoform switching, such as that identified at the GPR123 locus, provides an example of the molecular machinery producing more than one functional molecule from a single locus.

Performing analyses at the isoform level is currently hampered by the lack of functional information, e.g. gene ontology terms, for individual transcripts. Current databases only outline function at the gene level, and this function can normally only be applied to the dominant isoform. Accordingly, a splice variant that is functionally divergent from the dominant isoform will not be properly accounted for in a pathway analysis or included in a gene ontology enrichment analysis. Thus, there is a need for the development of more comprehensive functional catalogues of individual isoforms. This will allow information on isoform expression levels to be properly utilised.

6.5 Brain lincRNAs show low levels of sequence and expression conservation

LincRNAs appear to be under different conservation constraints from protein-coding genes. Both OLMALINC and linc00320 have low levels of genomic sequence and expression conservation. The genomic sequence of OLMALINC was conserved across rhesus macaque, orangutan, gorilla and chimpanzee. Outside of humans, expression of the entire transcript was only detected in chimpanzee [91]. The sequence and expression conservation patterns of linc00320 were even more restricted. Complete sequence conservation was only detected in orangutan and chimpanzee, and no expression of linc00320 was seen outside humans [93]. The lower levels of genomic sequence and expression conservation of linc00320 suggest that linc00320 is evolutionarily younger than OLMALINC. These results are in line with a recent study of lincRNAs in human and macaque PFC that demonstrated that lincRNAs are less conserved than protein-coding genes [50].

Amongst human tissues, OLMALINC had its highest level of expression within brain tissue, and more specifically within the WM of the SFG [91]. Expression of OLMALINC was also detected in human oligodendrocyte and neuronal cell lines. Linc00320 was only expressed in post-mortem brain tissue, with no expression detected in human peripheral tissues. Further, no expression of linc00320 was detected in human brain-related cell lines such as neurons, oligodendrocytes, fetal astrocytes and adult astrocytes [93]. All of these cell types are present in the post-mortem SFG tissue used to originally identify the expression of linc00320. This suggests that a heterogeneous mix of cells is required to detect expression of linc00320. The more brain-specific expression of the evolutionarily younger linc00320 may suggest that younger lincRNAs are involved in more human-specific functions of the brain.

6.6 Insights into the pathology of multiple system atrophy

Currently, there is limited knowledge of the mechanistic nature of MSA. Perhaps one of the most significant findings of this thesis, outlined in chapter 4, was the up-regulation of all three members (HBA1, HBA2, HBB) of the hemoglobin protein complex specifically in MSA WM from the SFG [92]. Interestingly, hemoglobin is the largest source of peripheral iron in the human body [111]. It is conceivable that hemoglobin may also play a role in iron homeostasis throughout the brain [112]. Disrupted iron homeostasis has long been associated with various neurodegenerative disorders. For example, increased iron levels throughout the brain have been correlated with PD and MSA [113]. However, the debate on whether iron accumulation promotes neurodegeneration or is simply a by-product of neuronal loss has not been settled [114]. It is possible that increased iron levels promote neurodegeneration through oxidative stress, promotion of α-synuclein aggregation and alteration of myelin synthesis [111, 114]. These pathogenic events fit well with what is known about MSA disease pathology. The relationship between hemoglobin, iron and neurodegeneration is further supported by the findings that high levels of α-synuclein are present in red blood cells [115] and that, in PD, high levels of hemoglobin are associated with an increased risk of disease development [111]. For the study presented in chapter 4 of this thesis, the frontal WM was selected from the SFG [92], which has limited MSA-specific damage [66]. This suggests that the increase in hemoglobin gene expression may precede the neurodegenerative process and neuronal loss. This could potentially open up the use of hemoglobin as a biomarker for MSA.

Another insight into MSA disease pathology relates to the large number of putative lincRNAs that were found to be up-regulated in MSA WM when compared to MSA GM [92]. A total of 133 putative lincRNAs were up-regulated in MSA WM, compared with only 52 putative lincRNAs up-regulated in healthy WM relative to healthy GM [92]. None of these lincRNAs were common to both data sets. While it is probable that some subset of the putative lincRNAs up-regulated in MSA WM play a role in tissue differentiation, each differentially expressed lincRNA may present a new target for further investigation into a possible role in MSA disease pathology.
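The comparison behind these counts is essentially a set operation on the two lists of up-regulated transcripts. A minimal sketch, with hypothetical transcript IDs standing in for the actual putative lincRNAs:

```python
from typing import Dict, Iterable, Set


def compare_up_regulated(disease_up: Iterable[str],
                         control_up: Iterable[str]) -> Dict[str, Set[str]]:
    """Split two lists of up-regulated transcript IDs into
    disease-only, control-only and shared sets."""
    disease, control = set(disease_up), set(control_up)
    return {
        "disease_only": disease - control,
        "control_only": control - disease,
        "shared": disease & control,
    }


# Hypothetical IDs; in the thesis the lists held 133 (MSA) and 52 (healthy) lincRNAs.
msa_wm_up = ["lnc_01", "lnc_02", "lnc_03"]
healthy_wm_up = ["lnc_10", "lnc_11"]

groups = compare_up_regulated(msa_wm_up, healthy_wm_up)
print(sorted(groups["disease_only"]))  # ['lnc_01', 'lnc_02', 'lnc_03']
print(groups["shared"])                # set()
```

When the shared set is empty, as reported above, every lincRNA up-regulated in MSA WM falls into the disease-only set, which forms the candidate list for follow-up.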

6.7 Future directions

Transcriptomics is one of the fastest growing fields in molecular biology. The transcriptome is made up of an array of interesting elements, including circRNAs, sncRNAs and lncRNAs. As research into the transcriptome progresses, it is likely that an even greater number of elements will be identified. Currently, functional studies of these various RNA molecules lag behind the identification of their expression. In the future, comprehensive functional studies of each identified RNA transcript will be needed.

This thesis presents in-depth studies of two lincRNAs, OLMALINC and linc00320, demonstrating that both transcripts have specific expression patterns and differing levels of evolutionary conservation [91, 93]. In humans there exist thousands of lincRNAs and other lncRNAs, each with its own unique tissue-based expression pattern and level of evolutionary conservation, that have not been functionally classified [32]. This number can be expected to grow as more RNA-Seq investigations are carried out across different tissues. Systematic research needs to be directed towards classifying the functional role of each lincRNA. This would allow for the development of lincRNA functional catalogues, facilitating the use of gene ontology and pathway enrichment analyses. Furthermore, it would allow for a better understanding of the role of lincRNAs in the molecular pathology of complex diseases such as neurodegenerative diseases.

The first step towards categorising lincRNAs would be to map the expression of each individual transcript across an array of different human tissue types. This has been achieved to an extent by the Genotype-Tissue Expression (GTEx) project [116]; however, as the human reference genome and reference annotations are improved, this will need constant updating. Following expression profiling across human tissues, the genomic sequence conservation and expression of each lincRNA amongst vertebrates need to be determined. This can be followed by RNAi of lincRNAs in various cell and tissue types to establish possible target genes or proteins that the lincRNAs may interact with. Protein binding capacity is another important functional trait of lincRNAs. It could be assessed by first completing a pull-down of proteins bound to lincRNA-specific oligonucleotides. These proteins could then be characterised using mass spectrometry.


As previously noted, the expression of lincRNAs appears to be a function of their cellular context, and many lincRNAs are species-specific. This highlights a potential quandary for functional studies of lincRNAs. Many traditional functional genomics approaches that make use of cell lines and mouse models may prove ineffective when studying lincRNAs. For lincRNAs that are expressed in cell lines, systematic knock-downs across a variety of cell lines could be used to classify specific functions. For lincRNAs with high levels of evolutionary conservation, mouse models may also be appropriate. For lincRNAs that are expressed only in human cells, it may be possible to gain insight into their function using transgenic mice. It would be of interest to see whether over-expressing human-specific lincRNAs in mice results in any observable changes in phenotype.

One of the most interesting elements of the diverse RNA landscape is the presence of antisense transcripts throughout the genome. Antisense transcripts are units of RNA transcribed from the strand opposite a sense locus [117]. This produces an RNA transcript that partially or completely overlaps the exonic or intronic regions of the opposing transcript, creating a sense-antisense pair. The majority of antisense transcripts appear to be ncRNAs, in particular lncRNAs. Recent studies have estimated that 20–30% of human transcripts have an antisense partner [118, 119]. Similarly to lncRNAs, antisense transcripts appear to be transcribed in a cell-specific manner [120]. Standard RNA-Seq approaches are not able to resolve the strand of origin of each sequence read. This, coupled with the fact that much of the transcriptional output of the cell is not polyadenylated (poly(A)-) [121], highlights the need for strand-specific RNA-Seq combined with ribosomal depletion of the total RNA fraction. Ribosomal depletion retains both poly(A)+ and poly(A)- transcripts. This will allow a more complete and in-depth picture of the transcriptome to be formed, which is of particular importance when studying complex transcriptomes such as that of the human brain.
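Identifying sense-antisense pairs from an annotation reduces to finding transcripts on opposite strands of the same chromosome whose genomic spans overlap. The sketch below assumes half-open, 0-based coordinates and compares whole transcript spans (exons plus introns), matching the definition above; the transcript names and coordinates are hypothetical. The all-pairs loop is quadratic, so genome-scale work would normally use an interval-overlap tool such as bedtools intersect with its -S (opposite-strand) option instead.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Transcript:
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive (half-open interval)
    strand: str  # "+" or "-"


def overlap_length(a: Transcript, b: Transcript) -> int:
    """Number of bases shared by two half-open intervals (0 if disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def sense_antisense_pairs(transcripts: List[Transcript]) -> List[Tuple[str, str, int]]:
    """Return (name_a, name_b, overlap) for every overlapping opposite-strand pair."""
    pairs = []
    for i, a in enumerate(transcripts):
        for b in transcripts[i + 1:]:
            if a.chrom != b.chrom or a.strand == b.strand:
                continue
            shared = overlap_length(a, b)
            if shared > 0:
                pairs.append((a.name, b.name, shared))
    return pairs


# Hypothetical annotation (invented coordinates).
annotation = [
    Transcript("geneA", "chr1", 1000, 5000, "+"),
    Transcript("asA", "chr1", 4000, 6000, "-"),
    Transcript("geneB", "chr1", 4500, 4800, "+"),
    Transcript("geneC", "chr2", 4000, 6000, "-"),
]
print(sense_antisense_pairs(annotation))
# [('geneA', 'asA', 1000), ('asA', 'geneB', 300)]
```

With unstranded RNA-Seq, a read falling between positions 4000 and 5000 on chr1 could not be assigned to geneA or asA, which is why strand-specific libraries matter for quantifying such pairs.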


The research outlined in chapter 4 of this thesis has opened new avenues in the study of MSA disease pathology [92]. Firstly, the large number of putative lincRNAs differentially expressed between MSA GM and MSA WM that did not appear differentially expressed between healthy GM and healthy WM [92] presents potential new targets for investigation into MSA pathology. This could be facilitated by classifying the expression profiles and levels of conservation, and by investigating the function, of each of the putative lincRNAs. The protein-coding genes HBA1, HBA2 and HBB also warrant further investigation. The suggested relationship between increased hemoglobin levels and iron needs to be confirmed. This could be done using histochemical techniques such as the Meguro iron stain to establish whether there is indeed a build-up of iron in MSA WM [122]. It is also possible that strand-specific RNA-Seq coupled with ribosomal depletion of different brain regions could offer further insights into MSA pathology.

Finally, one of the great advantages of the technological revolution currently sweeping biology is the free availability of numerous RNA-Seq data sets produced by laboratories around the world. The answers to many biological questions can be found through meta-analysis of data sets that were originally produced to answer a different question, or through the reanalysis of older data sets using new analytical pipelines, reference genomes and reference annotations. Maintaining the public accessibility of RNA-Seq data sets is essential and will allow the field of transcriptomics to progress at a remarkable pace.

References

[1] R. L. Holloway, C. C. Sherwood, P. R. Hof, and J. K. Rilling, “Evolution of the brain in humans–paleoneurology,” in Encyclopedia of Neuroscience, pp. 1326–1334, Springer, 2009.

[2] H. Stephan, H. Frahm, and G. Baron, “New and revised data on volumes of brain structures in insectivores and primates,” Folia Primatologica, vol. 35, no. 1, pp. 1–29, 1981.

[3] A. R. Damasio and S. W. Anderson, “The frontal lobes,” Clinical Neuropsychology, vol. 4, pp. 404–6, 1993.

[4] P. S. Goldman-Rakic, A. Cools, and K. Srivastava, “The prefrontal landscape: implications of functional architecture for understanding human mentation and the central executive [and discussion],” Philosophical Transactions of the Royal Society B: Biological Sciences, vol. 351, no. 1346, pp. 1445–1453, 1996.

[5] J. D. Gabrieli, R. A. Poldrack, and J. E. Desmond, “The role of left prefrontal cortex in language and memory,” Proceedings of the National Academy of Sciences, vol. 95, no. 3, pp. 906–913, 1998.

[6] P. Vendrell, C. Junqué, J. Pujol, M. A. Jurado, J. Molet, and J. Grafman, “The role of prefrontal regions in the Stroop task,” Neuropsychologia, vol. 33, no. 3, pp. 341–352, 1995.

[7] J. H. Lui, D. V. Hansen, and A. R. Kriegstein, “Development and evolution of the human neocortex,” Cell, vol. 146, no. 1, pp. 18–36, 2011.

[8] R. C. Gur, B. I. Turetsky, M. Matsui, M. Yan, W. Bilker, P. Hughett, and R. E. Gur, “Sex differences in brain gray and white matter in healthy young adults: correlations with cognitive performance,” The Journal of Neuroscience, vol. 19, no. 10, pp. 4065–4072, 1999.

[9] J. B. Smaers, A. Schleicher, K. Zilles, and L. Vinicius, “Frontal white matter volume is associated with brain enlargement and higher structural connectivity in anthropoid primates,” PLoS One, vol. 5, no. 2, p. e9123, 2010.

[10] K. Zhang and T. J. Sejnowski, “A universal scaling law between gray matter and white matter of cerebral cortex,” Proceedings of the National Academy of Sciences, vol. 97, no. 10, pp. 5621–5626, 2000.

[11] P. T. Schoenemann, M. J. Sheehan, and L. D. Glotzer, “Prefrontal white matter volume is disproportionately larger in humans than in other primates,” Nature Neuroscience, vol. 8, no. 2, pp. 242–252, 2005.

[12] R. D. Fields, “White matter matters,” Scientific American, vol. 298, no. 3, pp. 54–61, 2008.

[13] J. Duncan and A. M. Owen, “Common regions of the human frontal lobe recruited by diverse cognitive demands,” Trends in Neurosciences, vol. 23, no. 10, pp. 475–483, 2000.

[14] G. T. Sutherland, M. Janitz, and J. J. Kril, “Understanding the pathogenesis of Alzheimer’s disease: will RNA-Seq realize the promise of transcriptomics?,” Journal of Neurochemistry, vol. 116, no. 6, pp. 937–946, 2011.

[15] A. E. Vinogradov and O. V. Anatskaya, “Organismal complexity, cell differentiation and gene expression: human over mouse,” Nucleic Acids Research, vol. 35, no. 19, pp. 6350–6356, 2007.

[16] D. W. Meinke, J. M. Cherry, C. Dean, S. D. Rounsley, and M. Koornneef, “Arabidopsis thaliana: a model plant for genome analysis,” Science, vol. 282, no. 5389, pp. 662–682, 1998.

[17] F. A. Azevedo, L. R. Carvalho, L. T. Grinberg, J. M. Farfel, R. E. Ferretti, R. E. Leite, R. Lent, S. Herculano-Houzel, et al., “Equal numbers of neuronal and nonneuronal cells make the human brain an isometrically scaled-up primate brain,” Journal of Comparative Neurology, vol. 513, no. 5, pp. 532–541, 2009.

[18] D. H. Hall, Z. F. Altun, et al., C. elegans atlas. Cold Spring Harbor Laboratory Press, 2007.

[19] L. W. Hillier, A. Coulson, J. I. Murray, Z. Bao, J. E. Sulston, and R. H. Waterston, “Genomics in C. elegans: so many genes, such a little worm,” Genome Research, vol. 15, no. 12, pp. 1651–1660, 2005.

[20] M.-C. King and A. C. Wilson, Evolution at two levels in humans and chimpanzees. na, 1975.

[21] R. J. Britten, “Divergence between samples of chimpanzee and human DNA sequences is 5%, counting indels,” Proceedings of the National Academy of Sciences, vol. 99, no. 21, pp. 13633–13635, 2002.

[22] A. Varki and T. K. Altheide, “Comparing the human and chimpanzee genomes: searching for needles in a haystack,” Genome Research, vol. 15, no. 12, pp. 1746–1758, 2005.

[23] B. J. Blencowe, “Alternative splicing: new insights from global analyses,” Cell, vol. 126, no. 1, pp. 37–47, 2006.

[24] Z. Peng, Y. Cheng, B. C.-M. Tan, L. Kang, Z. Tian, Y. Zhu, W. Zhang, Y. Liang, X. Hu, X. Tan, et al., “Comprehensive analysis of RNA-Seq data reveals extensive RNA editing in a human transcriptome,” Nature Biotechnology, vol. 30, no. 3, pp. 253–260, 2012.

[25] R. Holliday, “Epigenetics: a historical overview,” Epigenetics, vol. 1, no. 2, pp. 76–80, 2006.

[26] I. Keshet, J. Yisraeli, and H. Cedar, “Effect of regional DNA methylation on gene expression,” Proceedings of the National Academy of Sciences, vol. 82, no. 9, pp. 2560–2564, 1985.

[27] J. S. Mattick, “The central role of RNA in human development and cognition,” FEBS Letters, vol. 585, no. 11, pp. 1600–1616, 2011.

[28] J. S. Mattick, “RNA regulation: a new genetics?,” Nature Reviews Genetics, vol. 5, no. 4, pp. 316–323, 2004.

[29] M. Melé, P. G. Ferreira, F. Reverter, D. S. DeLuca, J. Monlong, M. Sammeth, T. R. Young, J. M. Goldmann, D. D. Pervouchine, T. J. Sullivan, et al., “The human transcriptome across tissues and individuals,” Science, vol. 348, no. 6235, pp. 660–665, 2015.

[30] A. E. Dahlberg, “The functional role of ribosomal RNA in protein synthesis,” Cell, vol. 57, no. 4, pp. 525–529, 1989.

[31] S. Djebali, C. A. Davis, A. Merkel, A. Dobin, T. Lassmann, A. Mortazavi, A. Tanzer, J. Lagarde, W. Lin, F. Schlesinger, et al., “Landscape of transcription in human cells,” Nature, vol. 489, no. 7414, pp. 101–108, 2012.

[32] T. Derrien, R. Johnson, G. Bussotti, A. Tanzer, S. Djebali, H. Tilgner, G. Guernec, D. Martin, A. Merkel, D. G. Knowles, et al., “The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression,” Genome Research, vol. 22, no. 9, pp. 1775–1789, 2012.

[33] J. S. Mattick and I. V. Makunin, “Non-coding RNA,” Human Molecular Genetics, vol. 15, no. suppl 1, pp. R17–R29, 2006.

[34] I. A. Qureshi and M. F. Mehler, “Emerging roles of non-coding RNAs in brain evolution, development, plasticity and disease,” Nature Reviews Neuroscience, vol. 13, no. 8, pp. 528–541, 2012.

[35] W. R. Jeck and N. E. Sharpless, “Detecting and characterizing circular RNAs,” Nature Biotechnology, vol. 32, no. 5, pp. 453–461, 2014.

[36] X. You, I. Vlatkovic, A. Babic, T. Will, I. Epstein, G. Tushev, G. Akbalik, M. Wang, C. Glock, C. Quedenau, et al., “Neural circular RNAs are derived from synaptic genes and regulated by development and plasticity,” Nature Neuroscience, vol. 18, no. 4, pp. 603–610, 2015.

[37] D. Moazed, “Small RNAs in transcriptional gene silencing and genome defence,” Nature, vol. 457, no. 7228, pp. 413–420, 2009.

[38] J. S. Mattick and J. L. Rinn, “Discovery and annotation of long noncoding RNAs,” Nature Structural & Molecular Biology, vol. 22, no. 1, pp. 5–7, 2015.

[39] R. A. Gupta, N. Shah, K. C. Wang, J. Kim, H. M. Horlings, D. J. Wong, M.-C. Tsai, T. Hung, P. Argani, J. L. Rinn, et al., “Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis,” Nature, vol. 464, no. 7291, pp. 1071–1076, 2010.

[40] C. Xu, M. Yang, J. Tian, X. Wang, and Z. Li, “MALAT-1: a long non-coding RNA and its important 3’ end functional motif in colorectal cancer metastasis,” International Journal of Oncology, vol. 39, no. 1, pp. 169–175, 2011.

[41] P. Wu, X. Zuo, H. Deng, X. Liu, L. Liu, and A. Ji, “Roles of long noncoding RNAs in brain development, functional diversification and neurodegenerative diseases,” Brain Research Bulletin, vol. 97, pp. 69–80, 2013.

[42] M. N. Cabili, C. Trapnell, L. Goff, M. Koziol, B. Tazon-Vega, A. Regev, and J. L. Rinn, “Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses,” Genes & Development, vol. 25, no. 18, pp. 1915–1927, 2011.

[43] M. Sauvageau, L. A. Goff, S. Lodato, B. Bonev, A. F. Groff, C. Gerhardinger, D. B. Sanchez-Gomez, E. Hacisuleyman, E. Li, M. Spence, et al., “Multiple knockout mouse models reveal lincRNAs are required for life and brain development,” eLife, vol. 2, p. e01749, 2013.

[44] M.-C. Tsai, R. C. Spitale, and H. Y. Chang, “Long intergenic noncoding RNAs: new links in cancer progression,” Cancer Research, vol. 71, no. 1, pp. 3–7, 2011.

[45] L. A. Hindorff, P. Sethupathy, H. A. Junkins, E. M. Ramos, J. P. Mehta, F. S. Collins, and T. A. Manolio, “Potential etiologic and functional implications of genome-wide association loci for human diseases and traits,” Proceedings of the National Academy of Sciences, vol. 106, no. 23, pp. 9362–9367, 2009.

[46] M. Ward, C. McEwan, J. D. Mills, and M. Janitz, “Conservation and tissue-specific transcription patterns of long noncoding RNAs,” Journal of Human Transcriptome, 2015.

[47] I. Ulitsky, A. Shkumatava, C. H. Jan, H. Sive, and D. P. Bartel, “Conserved function of lincRNAs in vertebrate embryonic development despite rapid sequence evolution,” Cell, vol. 147, no. 7, pp. 1537–1550, 2011.

[48] A. Necsulea, M. Soumillon, M. Warnefors, A. Liechti, T. Daish, U. Zeller, J. C. Baker, F. Grützner, and H. Kaessmann, “The evolution of lncRNA repertoires and expression patterns in tetrapods,” Nature, vol. 505, no. 7485, pp. 635–640, 2014.

[49] C. Kutter, S. Watt, K. Stefflova, M. D. Wilson, A. Goncalves, C. P. Ponting, D. T. Odom, and A. C. Marques, “Rapid turnover of long noncoding RNAs and the evolution of gene expression,” PLoS Genetics, vol. 8, no. 7, p. e1002841, 2012.

[50] Z. He, H. Bammann, D. Han, G. Xie, and P. Khaitovich, “Conserved expression of lincRNA during human and macaque prefrontal cortex development and maturation,” RNA, vol. 20, no. 7, pp. 1103–1111, 2014.

[51] T. R. Mercer, M. E. Dinger, S. M. Sunkin, M. F. Mehler, and J. S. Mattick, “Specific expression of long noncoding RNAs in the mouse brain,” Proceedings of the National Academy of Sciences, vol. 105, no. 2, pp. 716–721, 2008.

[52] V. Costa, C. Angelini, I. De Feis, and A. Ciccodicola, “Uncovering the complexity of transcriptomes with RNA-Seq,” BioMed Research International, vol. 2010, 2010.

[53] E. Courtney, S. Kornfeld, K. Janitz, and M. Janitz, “Transcriptome profiling in neurodegenerative disease,” Journal of Neuroscience Methods, vol. 193, no. 2, pp. 189–202, 2010.

[54] J. Chen, J. D. Mills, G. M. Halliday, and M. Janitz, “The role of transcriptional control in multiple system atrophy,” Neurobiology of Aging, vol. 36, no. 1, pp. 394–400, 2015.

[55] C. Trapnell, A. Roberts, L. Goff, G. Pertea, D. Kim, D. R. Kelley, H. Pimentel, S. L. Salzberg, J. L. Rinn, and L. Pachter, “Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks,” Nature Protocols, vol. 7, no. 3, pp. 562–578, 2012.

[56] J. D. Mills and M. Janitz, “Alternative splicing of mRNA in the molecular pathology of neurodegenerative diseases,” Neurobiology of Aging, vol. 33, no. 5, pp. 1012–e11, 2012.

[57] A. Schrag, Y. Ben-Shlomo, and N. Quinn, “Prevalence of progressive supranuclear palsy and multiple system atrophy: a cross-sectional study,” The Lancet, vol. 354, no. 9192, pp. 1771–1775, 1999.

[58] D. L. Longo, A. Fanciulli, and G. K. Wenning, “Multiple-system atrophy,” New England Journal of Medicine, vol. 372, no. 3, pp. 249–263, 2015.

[59] M. Jecmenica-Lukic, W. Poewe, E. Tolosa, and G. K. Wenning, “Premotor signs and symptoms of multiple system atrophy,” The Lancet Neurology, vol. 11, no. 4, pp. 361–368, 2012.

[60] S. Gilman, G. Wenning, P. A. Low, D. Brooks, C. Mathias, J. Trojanowski, N. W. Wood, C. Colosimo, A. Dürr, C. Fowler, et al., “Second consensus statement on the diagnosis of multiple system atrophy,” Neurology, vol. 71, no. 9, pp. 670–676, 2008.

[61] C.-F. Lu, B.-W. Soong, H.-M. Wu, S. Teng, P.-S. Wang, and Y.-T. Wu, “Disrupted cerebellar connectivity reduces whole-brain network efficiency in multiple system atrophy,” Movement Disorders, vol. 28, no. 3, pp. 362–369, 2013.

[62] S. Gilman, P. Low, N. Quinn, A. Albanese, Y. Ben-Shlomo, C. Fowler, H. Kaufmann, T. Klockgether, A. Lang, P. Lantos, et al., “Consensus statement on the diagnosis of multiple system atrophy,” Journal of the Neurological Sciences, vol. 163, no. 1, pp. 94–98, 1999.

[63] M. Köllensperger, F. Geser, J.-P. Ndayisaba, S. Boesch, K. Seppi, K. Ostergaard, E. Dupont, A. Cardozo, E. Tolosa, M. Abele, et al., “Presentation, diagnosis, and management of multiple system atrophy in Europe: final analysis of the European multiple system atrophy registry,” Movement Disorders, vol. 25, no. 15, pp. 2604–2612, 2010.

[64] H. Watanabe, Y. Saito, S. Terao, T. Ando, T. Kachi, E. Mukai, I. Aiba, Y. Abe, A. Tamakoshi, M. Doyu, et al., “Progression and prognosis in multiple system atrophy,” Brain, vol. 125, no. 5, pp. 1070–1083, 2002.

[65] M. J. Martí, E. Tolosa, and J. Campdelacreu, “Clinical overview of the synucleinopathies,” Movement Disorders, vol. 18, no. S6, pp. 21–27, 2003.

[66] G. M. Halliday, J. L. Holton, T. Revesz, and D. W. Dickson, “Neuropathology underlying clinical variability in patients with synucleinopathies,” Acta Neuropathologica, vol. 122, no. 2, pp. 187–204, 2011.

[67] G. K. Wenning, N. Stefanova, K. A. Jellinger, W. Poewe, and M. G. Schlossmacher, “Multiple system atrophy: a primary oligodendrogliopathy,” Annals of Neurology, vol. 64, no. 3, pp. 239–246, 2008.

[68] Y. J. C. Song, D. M. Lundvig, Y. Huang, W. P. Gai, P. C. Blumbergs, P. Højrup, D. Otzen, G. M. Halliday, and P. H. Jensen, “p25α relocalizes in oligodendroglia from myelin to cytoplasmic inclusions in multiple system atrophy,” The American Journal of Pathology, vol. 171, no. 4, pp. 1291–1303, 2007.

[69] Y. T. Asi, J. E. Simpson, P. R. Heath, S. B. Wharton, A. J. Lees, T. Revesz, H. Houlden, and J. L. Holton, “Alpha-synuclein mRNA expression in oligodendrocytes in MSA,” Glia, vol. 62, no. 6, pp. 964–970, 2014.

[70] J. F. Reyes, N. L. Rey, L. Bousset, R. Melki, P. Brundin, and E. Angot, “Alpha-synuclein transfers from neurons to oligodendrocytes,” Glia, vol. 62, no. 3, pp. 387–398, 2014.

[71] J. M. Bleasel, J. H. Wong, G. M. Halliday, and W. S. Kim, “Lipid dysfunction and pathogenesis of multiple system atrophy,” Acta Neuropathologica Communications, vol. 2, no. 1, p. 15, 2014.

[72] L. Fellner, K. A. Jellinger, G. K. Wenning, and N. Stefanova, “Glial dysfunction in the pathogenesis of α-synucleinopathies: emerging concepts,” Acta Neuropathologica, vol. 121, no. 6, pp. 675–693, 2011.

[73] L. Fellner and N. Stefanova, “The role of glia in alpha-synucleinopathies,” Molecular Neurobiology, vol. 47, no. 2, pp. 575–586, 2013.

[74] Y. Huang, Y. J. C. Song, K. Murphy, J. L. Holton, T. Lashley, T. Revesz, W.-P. Gai, and G. M. Halliday, “LRRK2 and parkin immunoreactivity in multiple system atrophy inclusions,” Acta Neuropathologica, vol. 116, no. 6, pp. 639–646, 2008.

[75] K. Wakabayashi and H. Takahashi, “Cellular pathology in multiple system atrophy,” Neuropathology, vol. 26, no. 4, pp. 338–345, 2006.

[76] J. C. Watts, K. Giles, A. Oehler, L. Middleton, D. T. Dexter, S. M. Gentleman, S. J. DeArmond, and S. B. Prusiner, “Transmission of multiple system atrophy prions to transgenic mice,” Proceedings of the National Academy of Sciences, vol. 110, no. 48, pp. 19555–19560, 2013.

[77] S. W. Scholz, H. Houlden, C. Schulte, M. Sharma, A. Li, D. Berg, A. Melchers, R. Paudel, J. R. Gibbs, J. Simon-Sanchez, et al., “SNCA variants are associated with increased risk for multiple system atrophy,” Annals of Neurology, vol. 65, no. 5, pp. 610–614, 2009.

[78] A. Al-Chalabi, A. Durr, N. W. Wood, M. H. Parkinson, A. Camuzat, J.-S. Hulot, K. E. Morrison, A. Renton, S. D. Sussmuth, B. G. Landwehrmeyer, et al., “Genetic variants of the alpha-synuclein gene SNCA are associated with multiple system atrophy,” PLoS One, vol. 4, no. 9, p. e7114, 2009.

[79] Multiple-System Atrophy Research Collaboration, “Mutations in COQ2 in familial and sporadic multiple-system atrophy,” The New England Journal of Medicine, vol. 369, no. 3, p. 233, 2013.

[80] H. Soma, I. Yabe, A. Takei, N. Fujiki, T. Yanagihara, and H. Sasaki, “Associations between multiple system atrophy and polymorphisms of SLC1A4, SQSTM1, and EIF4EBP1 genes,” Movement Disorders, vol. 23, no. 8, pp. 1161–1167, 2008.

[81] O. A. Ross, A. I. Soto-Ortolaza, M. G. Heckman, J. O. Aasly, N. Abahuni, G. Annesi, J. A. Bacon, S. Bardien, M. Bozi, A. Brice, et al., “Association of LRRK2 exonic variants with susceptibility to Parkinson’s disease: a case–control study,” The Lancet Neurology, vol. 10, no. 10, pp. 898–908, 2011.

[82] J. Mitsui, T. Matsukawa, H. Sasaki, I. Yabe, M. Matsushima, A. Dürr, A. Brice, H. Takashima, A. Kikuchi, M. Aoki, et al., “Variants associated with Gaucher disease in multiple system atrophy,” Annals of Clinical and Translational Neurology, vol. 2, no. 4, pp. 417–426, 2015.

[83] M. C. Ferguson, E. M. Garland, L. Hedges, B. Womack-Nunley, R. Hamid, J. A. Phillips III, C. A. Shibao, S. R. Raj, I. Biaggioni, and D. Robertson, “SHC2 gene copy number in multiple system atrophy (MSA),” Clinical Autonomic Research, vol. 24, no. 1, pp. 25–30, 2014.

[84] H. Sasaki, M. Emi, H. Iijima, N. Ito, H. Sato, I. Yabe, T. Kato, J. Utsumi, and K. Matsubara, “Copy number loss of (Src homology 2 domain containing)-transforming protein 2 (SHC2) gene: discordant loss in monozygotic twins and frequent loss in patients with multiple system atrophy,” Molecular Brain, vol. 4, p. 24, 2011.

[85] J. D. Mills, T. Nalpathamkalam, H. I. Jacobs, C. Janitz, D. Merico, P. Hu, and M. Janitz, “RNA-seq analysis of the parietal cortex in Alzheimer’s disease reveals alternatively spliced isoforms related to lipid metabolism,” Neuroscience Letters, vol. 536, pp. 90–95, 2013.

[86] N. A. Twine, C. Janitz, M. R. Wilkins, and M. Janitz, “Sequencing of hippocampal and cerebellar transcriptomes provides new insights into the complexity of gene regulation in the human brain,” Neuroscience Letters, vol. 541, pp. 263–268, 2013.

[87] I. Voineagu, X. Wang, P. Johnston, J. K. Lowe, Y. Tian, S. Horvath, J. Mill, R. M. Cantor, B. J. Blencowe, and D. H. Geschwind, “Transcriptomic analysis of autistic brain reveals convergent molecular pathology,” Nature, vol. 474, no. 7351, pp. 380–384, 2011.

[88] A. J. Langerveld, D. Mihalko, C. DeLong, J. Walburn, and C. F. Ide, “Gene expression changes in postmortem tissue from the rostral pons of multiple system atrophy patients,” Movement Disorders, vol. 22, no. 6, pp. 766–777, 2007.

[89] J. D. Mills, Y. Kawahara, and M. Janitz, “Strand-specific RNA-seq provides greater resolution of transcriptome profiling,” Current Genomics, vol. 14, no. 3, p. 173, 2013.

[90] J. D. Mills, T. Kavanagh, W. S. Kim, B. J. Chen, Y. Kawahara, G. M. Halliday, and M. Janitz, “Unique transcriptome patterns of the white and grey matter corroborate structural and functional heterogeneity in the human frontal lobe,” PLoS One, vol. 8, no. 10, p. e78480, 2013.

[91] J. D. Mills, T. Kavanagh, W. S. Kim, B. J. Chen, P. D. Waters, G. M. Halliday, and M. Janitz, “High expression of long intervening non-coding RNA OLMALINC in the human cortical white matter is associated with regulation of oligodendrocyte maturation,” Molecular Brain, vol. 8, no. 1, p. 2, 2015.

[92] J. D. Mills, W. S. Kim, G. M. Halliday, and M. Janitz, “Transcriptome analysis of grey and white matter cortical tissue in multiple system atrophy,” Neurogenetics, vol. 16, no. 2, pp. 107–122, 2014.

[93] J. D. Mills, J. Chen, W. S. Kim, P. D. Waters, A. S. Prabowo, E. Aronica, G. M. Halliday, and M. Janitz, “Long intervening non-coding RNA 00320 is human brain-specific and highly expressed in the cortical white matter,” Neurogenetics, pp. 1–13, 2015.

[94] T. Kavanagh, J. D. Mills, W. S. Kim, G. M. Halliday, and M. Janitz, “Pathway analysis of the human brain transcriptome in disease,” Journal of Molecular Neuroscience, vol. 51, no. 1, pp. 28–36, 2013.

[95] R. S. Young, A. C. Marques, C. Tibbit, W. Haerty, A. R. Bassett, J.-L. Liu, and C. P. Ponting, “Identification and properties of 1,119 candidate lincRNA loci in the Drosophila melanogaster genome,” Genome Biology and Evolution, vol. 4, no. 4, pp. 427–442, 2012.

[96] C. Pan, C. Kumar, S. Bohl, U. Klingmueller, and M. Mann, “Comparative proteomic phenotyping of cell lines and primary cells to assess preservation of cell type-specific functions,” Molecular & Cellular Proteomics, vol. 8, no. 3, pp. 443–450, 2009.

[97] S. E. Burdall, A. M. Hanby, M. Lansdown, and V. Speirs, “Breast cancer cell lines: friend or foe?,” Breast Cancer Research, vol. 5, no. 2, p. 89, 2003.

[98] K. M. Olsavsky, J. L. Page, M. C. Johnson, H. Zarbl, S. C. Strom, and C. J. Omiecinski, “Gene expression profiling and differentiation assessment in primary human hepatocyte cultures, established hepatoma cell lines, and human liver tissues,” Toxicology and Applied Pharmacology, vol. 222, no. 1, pp. 42–56, 2007.

[99] A. Barton, R. Pearson, A. Najlerahim, and P. Harrison, “Pre- and postmortem influences on brain RNA,” Journal of Neurochemistry, vol. 61, no. 1, pp. 1–11, 1993.

[100] O. Schoor, T. Weinschenk, J. Hennenlotter, S. Corvin, A. Stenzl, H.-G. Rammensee, and S. Stevanović, “Moderate degradation does not preclude microarray analysis of small amounts of RNA,” Biotechniques, vol. 35, no. 6, pp. 1192–1196, 2003.

[101] I. G. Romero, A. A. Pai, J. Tung, and Y. Gilad, “RNA-seq: impact of RNA degradation on transcript quantification,” BMC Biology, vol. 12, no. 1, p. 42, 2014.

[102] F. du Boisgueheneuc, R. Levy, E. Volle, M. Seassau, H. Duffau, S. Kinkingnehun, Y. Samson, S. Zhang, and B. Dubois, “Functions of the left superior frontal gyrus in humans: a lesion study,” Brain, vol. 129, no. 12, pp. 3315–3328, 2006.

[103] I. I. Goldberg, M. Harel, and R. Malach, “When the brain loses its self: prefrontal inactivation during sensorimotor processing,” Neuron, vol. 50, no. 2, pp. 329–339, 2006.

[104] T. R. Mercer, I. A. Qureshi, S. Gokhan, M. E. Dinger, G. Li, J. S. Mattick, and M. F. Mehler, “Long noncoding RNAs in neuronal-glial fate specification and oligodendrocyte lineage maturation,” BMC Neuroscience, vol. 11, no. 1, p. 14, 2010.

[105] F. M. Benes, M. Turtle, Y. Khan, and P. Farol, “Myelination of a key relay zone in the hippocampal formation occurs in the human brain during childhood, adolescence, and adulthood,” Archives of General Psychiatry, vol. 51, no. 6, pp. 477–484, 1994.

[106] S. Flynn, D. Lang, A. Mackay, V. Goghari, I. Vavasour, K. Whittall, G. Smith, V. Arango, J. Mann, A. Dwork, et al., “Abnormalities of myelination in schizophrenia detected in vivo with MRI, and post-mortem with analysis of oligodendrocyte proteins,” Molecular Psychiatry, vol. 8, no. 9, pp. 811–820, 2003.

[107] Q. Pan, O. Shai, L. J. Lee, B. J. Frey, and B. J. Blencowe, “Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing,” Nature Genetics, vol. 40, no. 12, pp. 1413–1415, 2008.

[108] E. T. Wang, R. Sandberg, S. Luo, I. Khrebtukova, L. Zhang, C. Mayr, S. F. Kingsmore, G. P. Schroth, and C. B. Burge, “Alternative isoform regulation in human tissue transcriptomes,” Nature, vol. 456, no. 7221, pp. 470–476, 2008.

[109] V. Stolc, Z. Gauhar, C. Mason, G. Halasz, M. F. van Batenburg, S. A. Rifkin, S. Hua, T. Herreman, W. Tongprasit, P. E. Barbano, et al., “A gene expression map for the euchromatic genome of Drosophila melanogaster,” Science, vol. 306, no. 5696, pp. 655–660, 2004.

[110] L. W. Hillier, V. Reinke, P. Green, M. Hirst, M. A. Marra, and R. H. Waterston, “Massively parallel sequencing of the polyadenylated transcriptome of C. elegans,” Genome Research, vol. 19, no. 4, pp. 657–666, 2009.

[111] R. D. Abbott, G. W. Ross, C. M. Tanner, J. K. Andersen, K. H. Masaki, B. L. Rodriguez, L. R. White, and H. Petrovitch, “Late-life hemoglobin and the incidence of Parkinson’s disease,” Neurobiology of Aging, vol. 33, no. 5, pp. 914–920, 2012.

[112] M. Biagioli, M. Pinto, D. Cesselli, M. Zaninello, D. Lazarevic, P. Roncaglia, R. Simone, C. Vlachouli, C. Plessy, N. Bertin, et al., “Unexpected expression of α- and β-globin in mesencephalic dopaminergic neurons and glial cells,” Proceedings of the National Academy of Sciences, vol. 106, no. 36, pp. 15454–15459, 2009.

[113] Y. Wang, S. R. Butros, X. Shuai, Y. Dai, C. Chen, M. Liu, E. Haacke, J. Hu, and H. Xu, “Different iron-deposition patterns of multiple system atrophy with predominant parkinsonism and idiopathetic Parkinson diseases demonstrated by phase-corrected susceptibility-weighted imaging,” American Journal of Neuroradiology, vol. 33, no. 2, pp. 266–273, 2012.

[114] D. Kaur and J. Andersen, “Does cellular iron dysregulation play a causative role in Parkinson’s disease?,” Ageing Research Reviews, vol. 3, no. 3, pp. 327–343, 2004.

[115] R. Barbour, K. Kling, J. P. Anderson, K. Banducci, T. Cole, L. Diep, M. Fox, J. M. Goldstein, F. Soriano, P. Seubert, et al., “Red blood cells are the major source of alpha-synuclein in blood,” Neurodegenerative Diseases, vol. 5, no. 2, pp. 55–59, 2008.

[116] J. Lonsdale, J. Thomas, M. Salvatore, R. Phillips, E. Lo, S. Shad, R. Hasz, G. Walters, F. Garcia, N. Young, et al., “The Genotype-Tissue Expression (GTEx) project,” Nature Genetics, vol. 45, no. 6, pp. 580–585, 2013.

[117] V. Pelechano and L. M. Steinmetz, “Gene regulation by antisense transcription,” Nature Reviews Genetics, vol. 14, no. 12, pp. 880–893, 2013.

[118] J. Chen, M. Sun, W. J. Kent, X. Huang, H. Xie, W. Wang, G. Zhou, R. Z. Shi, and J. D. Rowley, “Over 20% of human transcripts might form sense–antisense pairs,” Nucleic Acids Research, vol. 32, no. 16, pp. 4812–4820, 2004.

[119] F. Ozsolak, P. Kapranov, S. Foissac, S. W. Kim, E. Fishilevich, A. P. Monaghan, B. John, and P. M. Milos, “Comprehensive polyadenylation site maps in yeast and human reveal pervasive alternative polyadenylation,” Cell, vol. 143, no. 6, pp. 1018–1029, 2010.

[120] Y. He, B. Vogelstein, V. E. Velculescu, N. Papadopoulos, and K. W. Kinzler, “The antisense transcriptomes of human cells,” Science, vol. 322, no. 5909, pp. 1855–1857, 2008.

[121] Z. Chen and X. Duan, “Ribosomal RNA depletion for massively parallel bacterial RNA-sequencing applications,” in High-Throughput Next Generation Sequencing, pp. 93–103, Springer, 2011.

[122] S. van Duijn, R. J. Nabuurs, S. G. van Duinen, and R. Natté, “Comparison of histological techniques to visualize iron in paraffin-embedded brain tissue of patients with Alzheimer’s disease,” Journal of Histochemistry & Cytochemistry, p. 0022155413501325, 2013.

Appendix A

Publications with non-first authorship related to this thesis

A.1 The role of transcriptional control in multiple system atrophy

Chen, J., Mills, J. D., Halliday, G. M., and Janitz, M. (2015). “The role of transcriptional control in multiple system atrophy.” Neurobiology of Aging. 36:394–400.

Neurobiology of Aging 36 (2015) 394–400

Contents lists available at ScienceDirect

Neurobiology of Aging

journal homepage: www.elsevier.com/locate/neuaging

The role of transcriptional control in multiple system atrophy

Jieqiong Chen a, James D. Mills a, Glenda M. Halliday b,c, Michael Janitz a,*

a School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales, Australia
b Neuroscience Research Australia, Sydney, New South Wales, Australia
c School of Medical Sciences, University of New South Wales, Sydney, New South Wales, Australia

Article history: Received 7 May 2014; received in revised form 29 July 2014; accepted 12 August 2014; available online 19 August 2014.

Keywords: Multiple system atrophy; Transcriptome; Gene regulation; α-Synuclein; Neurodegenerative diseases

Abstract

Multiple system atrophy (MSA) is an α-synucleinopathy that is clinically characterized by varying degrees of parkinsonian, autonomic, and cerebellar features. Unlike other α-synucleinopathies such as Parkinson’s disease, MSA is unique in that the principal α-synuclein lesions, called glial cytoplasmic inclusions, occur in oligodendroglia rather than neurons, with significantly more α-synuclein accumulating in MSA brain compared with Parkinson’s disease. Although well defined clinically, the molecular pathophysiology of MSA has barely been investigated. In particular, there have been no systematic studies of the perturbation of the brain transcriptome during the onset and progression of this disease. Interestingly, measurements of α-synuclein gene (SNCA) expression in MSA brain tissue have not revealed overexpression of this gene in oligodendroglia or neurons. It has therefore become clear that other genes and gene networks, both directly as noncoding RNAs or through protein products, contribute to the accumulation of the α-synuclein protein in the brain. This review provides a summary of current developments in the investigation of the transcriptional causes of MSA and outlines perspectives for future research toward the elucidation of the molecular pathology of MSA-specific neurodegeneration. © 2015 Elsevier Inc. All rights reserved.

1. Introduction as well as the phosphorylation of serine at the 129 position are responsible for the development of PD (Kiely et al., 2013; Kosaka, a-Synucleinopathies comprise a group of neurodegenerative 1978; Ma et al., 2013; Zarranz et al., 2004). Importantly, multiplica- disorders, including Parkinson’s disease (PD), dementia with Lewy tions of the whole gene are also pathogenic for PD (Lesage and Brice, bodies (DLB), and multiple system atrophy (MSA) (Fellner and 2009). The expression of a-synuclein 98 is upregulated in PD and DLB Stefanova, 2013; Ferrer, 2001; Halliday et al., 2011; Jellinger, patients (Beyer et al., 2008), whereas high expression of a-synuclein 2003). The major pathologic hallmark of a-synucleinopathies is 112 is observed in DLB patients, suggesting that a-synuclein 112 the aggregation of the protein a-synuclein throughout the brain; in might play a role in the pathogenesis of DLB (Beyer, 2006). In contrast PD and DLB these aggregates occur primarily in the neurons and are to PD, SNCA gene multiplications do not occur in MSA, and there is no known as Lewy bodies and Lewy neurites, in MSA a-synuclein ag- evidence of increased messenger RNA (mRNA) for the major species gregation occurs in oligodendroglia and are known as glial cyto- a-synuclein 140 (Asi et al., 2014; Jin et al., 2008; Miller et al., 2005; plasmic inclusions (GCIs). Ozawa et al., 2001). This suggests different pathogenic mechanisms a-Synuclein aggregation is thought to be a key event in the between these a-synucleinopathies. pathogenesis of a-synucleinopathies; however, the exact molecular mechanisms of pathogenesis have not been elucidated (Wakabayashi 2. The role of glia in a-synucleinopathies and Takahashi, 2006). 
The gene encoding human a-synuclein (SNCA) is located on chromosome 4q21.3-q22 (Campion et al., 1995), and 4 Glia is an umbrella term for a number of different cell types pre- fi protein isoforms have been identi ed: a-synuclein 140, a-synuclein sent in the brain including astrocyglial (or astrocytes), oligoden- 126, a-synuclein 112, and a-synuclein 98 (Beyer and Ariza, 2013; Ma droglial (or oligodendrocytes), and microglial (Fellner and Stefanova, et al., 2013; McLean et al., 2000). Studies have shown that point 2013; Halliday and Stevens, 2011). In a healthy brain, glial cells play an mutations at the A30P, E46K, G51D, and A53T sites of a-synuclein 140 important support role throughout the brain providing an insulating myelin sheath around axons and metabolic and structural support * Corresponding author at: School of Biotechnology and Biomolecular Sciences, for neurons. Evidence is emerging, suggesting that glial cells play University of New South Wales, Sydney, New South Wales 2052, Australia. Tel.: 612 938 58608; fax: 612 938 51483. a crucial role in a variety of a-synucleinopathies (Fellner et al., 2011; þ þ E-mail address: [email protected] (M. Janitz). Halliday and Stevens, 2011). Initial injury or infection causes

0197-4580/$ e see front matter Ó 2015 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.neurobiolaging.2014.08.015 J. Chen et al. / Neurobiology of Aging 36 (2015) 394e400 395 microglial and astroglial cells to be activated (Nimmerjahn et al., 2005; predominantly with a cerebellar syndrome are classified as MSA with Wilhelmsson et al., 2006). Activated microglia and astroglia undergo cerebellar signs (MSA-C) (Gilman et al., 2008; Kawai et al., 2008; morphologic changes, releasing trophic and inflammatory factors; Nicoletti et al., 2006). The primary pathologic differences between additionally, activated microglial cells remove dead or apoptotic cells the 2 subtypes are based on the sites of neuronal loss and GCI accu- processes that are vital for neuron survival (Nimmerjahn et al., 2005; mulation (Tong et al., 2010). In MSA-C patients, the loss of neurons Wilhelmsson et al., 2006). However, in chronic diseases, astroglia, and occurs primarily in the basis pontis, inferior olives, and cerebellar microglia can become over activated leading to neurotoxicity and cortex, this is coupled with white matter (WM) degeneration of the increased tissue damage after the release of proinflammatory cyto- middle cerebellar peduncle (Lu et al., 2013; Minnerop et al., 2010). The kines, reactive oxygen species, and nitric oxide (Dean et al., 2010; main differentiating pathology in MSA-P is the progressive degen- Deshpande et al., 2005). eration of putaminal neurons (Sato et al., 2007), significantly affecting In neuronal a-synucleinopathies such as PD, a-synuclein ag- basal ganglia circuits involved in the regulation of motor pathways. gregation occurs primarily in neurons but also features a-synuclein Further, MSA-P is characterized by progressive akinesia and rigidity accumulation in astroglial cells (Braak et al., 2007; Wakabayashi and a poor response to levodopa therapy (Wenning et al., 2004). The et al., 2000). 
It is speculated that astroglia play a major role in PD prevalence of MSA-P and MSA-C varies in different parts of the world: by releasing inflammatory agents at the site of injury and recruiting subtype C has been reported to be more prevalent than subtype P in microglial cells (Fellner and Stefanova, 2013). The microglial cells the Japanese population (65%e67% vs. 33%e35%), whereas subtype P then become over activated leading to the production of proin- is more prevalent in Europe (63% vs. 34%) and North America (60% vs. flammatory agents, followed by oxidative stress, hence accelerating 13%, with 27% of cases unclassified) (Multiple-System Atrophy neuronal cell death (Zhang et al., 2005). It is not thought that Research Collaboration, 2013). oligodendroglia play a role in neuronal a-synucleinopathies. In the oligodendroglial a-synucleinopathy MSA, the initial site of a-syn- 4. a-Synuclein and MSA uclein accumulation is in the oligodendroglia cells. This results in the compromised function of these cells resulting in demyelination As mentioned previously, the etiology of MSA is an increase in of axons and overall a lack of neuronal support. Activated astroglia the brain level and accumulation of a-synuclein, a small, natively at the site attract microglia through the release of proinflammotry unfolded protein that accounts for as much as 1% of the total protein cytokines, and oxidative stress promotes neuronal death (Fellner in the soluble cytosolic brain fraction (Bisaglia et al., 2009; Uversky, and Stefanova, 2013). 2007). Monomeric a-synuclein exists in the presynaptic termini in These models are not without controversy. Different studies equilibrium between free and membrane-bound states (McLean have reported conflicting results on the functions of glial cells in a- et al., 2000). 
Approximately 15% of a-synuclein is membrane synucleinopathies (Stefanis, 2012), suggesting that glial cells might bound (Lee et al., 2002), suggesting that the protein may regulate participate in more than one pathogenic process and that their vesicular release and/or turnover and other synaptic functions in dominant function corresponds to their local environmental the central nervous system (Clayton and George, 1999; Lavedan, changes, such as neuroinflammation. Furthermore, other research 1998; Ueda et al., 1993). Mutation and/or environmental changes groups have suggested that astroglia are not activated during some may reduce the capability of a-synuclein to recognize proper 3 2 a-synucleinopathies (Mirza et al., 2000; Vila et al., 2001). This is binding partners. For instance, the presence of Al þ and Cu þ may further confounded by glial cells well-documented positive effects induce structural perturbation (Paik et al., 1997, 1999), and point on compromised neuronal cells. The complex role of glia in mutations at A30P, E49K, and A53T were shown to reduce protein a-synucleinopathies warrants further investigation in neurode- hydrophobicity (Li et al., 2001), thus leading to the formation of generative diseases. nonfunctional and deadly aggregates (Uversky, 2007). In addition, the interaction between a-synuclein and b-III tubulin in the 3. Multiple system atrophy transgenic mouse model and the nitration of a-synuclein tyrosine residues are critical for the formation of insoluble protein com- MSA is a sporadic neurodegenerative disorder characterized by plexes that progressively accumulate in neurons (Ischiropoulos and varying combinations of parkinsonism, cerebellar ataxia, and Beckman, 2003; Nakayama et al., 2012; Souza et al., 2000). These autonomic failure (Gilman et al., 2008; Lu et al., 2013). Prevalence accumulated a-synuclein complexes lead to neuronal dysfunction rates range from 1.9 to 4.6/100,000, and the mean age at disease (Nakayama et al., 2009). 
However, according to studies GCIs usually onset is approximately 55 years (O’Sullivan et al., 2008). The inci- contain full-length and C-terminally truncated a-synuclein; the dence rate is approximately 0.6 cases per 100,000 persons latter, truncated a-synuclein, is more prone to fibrillate than the (Vanacore et al., 2001). The mean survival time after diagnosis of full-length protein in vitro (Gai et al., 1998; Murray et al., 2003; MSA is 9 years (Donadio et al., 2010). It is a rapidly progressing Serpell et al., 2000; Uversky, 2007). Research has also shown that disease, and patients may be confined to a wheelchair after only a phosphorylation of a-synuclein at serine 129 promotes fibril for- few years from diagnosis (Watanabe et al., 2002). Studies have mation in vitro, possibly by altering the confirmation of the C ter- shown that males are more susceptible to this disease than females, minus of a-synuclein (Fujiwara et al., 2002). with ratios ranging from between 1.4: 1 to 1.9: 1 (Wenning et al., Studies using animal models have not only demonstrated that 2004). Two significant pathologic features that characterize MSA the expression of lipid and membrane transport genes are associ- are GCIs and neuronal intranuclear inclusions, both of which are ated with a-synuclein expression but have also stated that changes composed of a-synuclein (Nakayama et al., 2012). However, the in membrane fluidity and in cellular fatty acid uptake and meta- precise function and regulation of a-synuclein in MSA is yet to be bolism are consequences of either the overexpression or homozy- determined. Further, extensive myelin (forming a myelin sheath gous deletion of SNCA in a neuronal cell line (Castagnet et al., 2005; around axons in white matter) damage is present in the brains of Golovko et al., 2005; Scherzer et al., 2003; Sharon et al., 2003). MSA patients (Matsuo et al., 1998; Probst-Cousin et al., 1998). 
Alternative splicing is a versatile and widespread post- MSA is further categorized into 2 subtypes based on clinical transcriptional mechanism for the generation of multiple mRNAs phenotypes (Gawel et al., 2012; Umoto et al., 2012). Patients who from a single transcript (Beyer, 2006; McLean et al., 2000; Mills and present predominantly with parkinsonian symptoms are classified Janitz, 2012). The a-synuclein-encoding gene SNCA undergoes com- as MSA with parkinsonism (MSA-P), and patients who present plex splicing events, including in-frame deletion (the deletion of one 396 J. Chen et al. / Neurobiology of Aging 36 (2015) 394e400

Fig. 1. Gene ontology terms for genes differentially expressed in MSA brain tissue. (A) Gene ontology terms for genes downregulated in MSA brain tissue. Each segment indicates the percentage of genes that were downregulated that contribute to each term. (B) Gene ontology terms for genes upregulated in MSA brain tissue. Each segment indicates the percentage of genes that were upregulated that contribute to each term. Based on data from Langerveld et al. (2007). Abbreviation: MSA, multiple system atrophy.

or more exons) and variation in the 5′UTR exon content (Beyer and Ariza, 2013; Campion et al., 1995). Thus far, 15 SNCA transcript variants have been identified, and they are translated into 4 distinct protein isoforms of α-synuclein. The 4 protein isoforms are α-synuclein 140, α-synuclein 126, α-synuclein 112, and α-synuclein 98 (Beyer and Ariza, 2013; Ma et al., 2013; McLean et al., 2000). α-Synuclein 140 retains the entire structure (6 exons) and all posttranslational modification sites of the SNCA gene (Beyer, 2006; Cho et al., 2009). It has been suggested that α-synuclein 126 might be an aggregation-preventing isoform, because its expression is diminished in DLB and Alzheimer's disease (AD) (Beyer et al., 2008). Murray et al. (2003) showed that α-synuclein 112 is more prone to aggregation than full-length α-synuclein 140 because it lacks the C-terminus (C-terminal truncation). Overexpression of this isoform in a human dopaminergic cell line inhibited proteasomal function and induced cell death (Kalivendi et al., 2010). α-Synuclein 98, together with α-synuclein 112, might participate in the initial pathologic seeding of α-synuclein, as both isoforms enhance the destabilization of α-synuclein tetramers when their concentrations increase (Bartels et al., 2011; Beyer and Ariza, 2013). As sequencing technology improves, more splice variants and associated protein isoforms are expected to be identified.

Although no direct relationship has yet been identified between SNCA modification at the transcriptional or posttranscriptional level and MSA pathology, a recent study suggests that the aggregation of α-synuclein interferes with oligodendrogenesis, preventing the formation of mature oligodendrocytes (May et al., 2014). This novel pathway warrants further investigation.

5. Genomics of MSA

5.1. Mutations in familial and sporadic MSA

MSA is generally considered to be a nongenetic disorder (Dickson et al., 1999; Shiga et al., 2005); however, family members of MSA patients might be more susceptible to MSA. Some studies have found a higher frequency of parkinsonism among the first-degree relatives of MSA patients (Vidal et al., 2010); approximately 13% of MSA patients had at least 1 first-degree or second-degree relative with parkinsonism (Wenning et al., 1993). In addition, neurologic disease occurs at higher frequencies among the first-degree relatives of MSA patients (Nee et al., 1991).

A recent collaborative study identified the coenzyme Q2 4-hydroxybenzoate polyprenyltransferase (COQ2) gene as the first susceptibility gene for MSA (Multiple-System Atrophy Research Collaboration, 2013). COQ2 is essential for the biosynthesis of coenzyme Q10, a component of the electron transport chain that participates in the aerobic cellular respiration generating adenosine triphosphate (Stefanova et al., 2005). The collaboration examined the number of COQ2 mutations and the allele frequencies of variants in individuals from Japan, Europe, and North America. Thirteen COQ2 variants were found in MSA patients from these 3 regions. One variant (R69H) and 3 single-nucleotide variants were found in the North American and Japanese control groups, whereas no variant was found in the European control group. In addition, the allele frequencies of variants in patients' family members were significantly higher than in the corresponding controls. The same research group also measured intracellular coenzyme Q10 levels in lymphoblastoid cell lines. Coenzyme Q10 levels in MSA patients carrying 2 variant alleles were substantially lower than in patients who were heterozygous carriers (1 variant allele), and the controls (without variants) had the highest coenzyme Q10 levels. Because patients' family members have high variant allele frequencies, they are more likely to have impaired COQ2 activity and are thus more susceptible to MSA. These findings suggest that impaired COQ2 activity, which would be predicted to impair the mitochondrial respiratory chain and increase vulnerability to oxidative stress (Ferrante et al., 2005; Huntington Study Group Pre2CARE Investigators et al., 2010), confers susceptibility to MSA. COQ2 dysfunction also points toward a potential role for abnormalities in lipid membranes and adenosine triphosphate-dependent membrane transporters in MSA (Bleasel et al., 2014).

5.2. Alterations in gene expression in the MSA brain

Using microarrays, Langerveld et al. (2007) reported that a total of 254 genes were differentially expressed between MSA patients and healthy controls, with 180 genes downregulated and the remaining 74 genes upregulated in MSA brain tissue. The largest functional groups among the downregulated genes were associated with mitochondrial function (21 genes), protein modification (21 genes), metabolism and/or glycolysis (21 genes), signal transduction (15 genes), ion transport (12 genes), and proteasome structure (9 genes). The largest groups among the upregulated genes were involved in transcription and/or RNA modification (16 genes), signal transduction (9 genes), and inflammation and/or response to stress (9 genes) (Langerveld et al., 2007). Fig. 1 graphically summarizes these findings.

Langerveld et al. (2007) also found that the expression of all electron transport chain complex-encoding genes was altered (predominantly downregulated) in MSA brain samples, ultimately leading to a loss of mitochondrial function. According to recent research, mitochondrial dysfunction is a major contributing factor in many α-synucleinopathies (Beal, 2005; Multiple-System Atrophy Research Collaboration, 2013). Another group of downregulated genes was related to the structure of the proteasome, which is responsible for protein degradation; abnormalities in proteasome pathways are linked to the accumulation of the aggregated, ubiquitinated proteins found in several neurodegenerative diseases, including MSA (Keller et al., 2000; McNaught and Jenner, 2001). In the MSA brain, substantial tissue loss occurs (Minnerop et al., 2010), compromising brain function, and genes responding to inflammation, injury, and stress are upregulated. These observations are in line with the elevated levels of activated glial cells, which respond to degeneration in brain injury and/or infection and release proinflammatory cytokines and complement proteins, in the brains of patients with MSA (Ide et al., 1996; Ishizawa et al., 2004).

In addition to these differences, the accumulation of α-synuclein in brain tissue is associated with the expression of several genes, including sparc/osteonectin, cwcv, and kazal-like domains proteoglycan (SPOCK1), microtubule-associated protein 1A (MAP1A), laminin alpha 4 (LAMA4), and tumor necrosis factor alpha induced protein 6 (TNFAIP6). SPOCK1 and MAP1A are downregulated in MSA patients (Langerveld et al., 2007). SPOCK1 encodes the protein core of a seminal plasma proteoglycan containing chondroitin- and heparan-sulfate chains and has a putative function as a protease inhibitor, with possible involvement in cell-cell interactions. The MAP1A gene product is involved in microtubule assembly, which is an essential step in neurogenesis. Upregulation of SPOCK1 was also identified in palmitoyl protein thioesterase 1 (ppt1) knockout mouse neurons, a model for infantile neuronal ceroid lipofuscinosis, a severe neurodegenerative disorder of children (Ahtiainen et al., 2007). In the same study, Langerveld et al. (2007) identified LAMA4 and TNFAIP6 as upregulated in association with increased aggregation of α-synuclein in MSA brains. LAMA4 is a major component of the basement membrane and is implicated in cell adhesion and neurite outgrowth. TNFAIP6 is a member of the hyaluronan-binding protein family and is involved in extracellular matrix stability and cell migration. Future studies should focus on elucidating the mechanistic role of these differentially expressed genes in MSA-specific α-synuclein aggregation.

6. Oligodendroglia-specific transcriptome

Over the past decade, there has been a shift away from the protein-centric view of molecular biology. It is now suspected that the vast, dynamic network of RNA molecules, mainly products of alternative splicing and pervasive transcription, together with long noncoding RNAs and microRNAs (miRNAs), contributes to organism and tissue complexity such as that seen in the human brain (Mattick, 2011). The human brain contains numerous different cell types; here, we discuss the oligodendroglia-specific transcriptome.

J. Chen et al. / Neurobiology of Aging 36 (2015) 394–400 397

Table 1 Gene ontology terms enriched in astrocytes, neurons, and oligodendrocytes

Astrocytes (gene ontology term: contributing genes)
Symporter activity: Slc1a2, Slc1a3, Slc15a2, Slc4a4, Slc25a18
Response to wounding: Slc1a2, Slc1a3, F3, Pla2g7, Tlr3, Bmpr1b, Papss2
Anion:cation symporter activity: Slc1a2, Slc1a3, Slc4a4
Organic anion transport: Slc1a2, Slc1a3, Slc4a4
Glutamate biosynthetic process: Slc1a3, Prodh

Neurons (gene ontology term: contributing genes)
Chloride transport: Gabrg2, Gabra1, Glra2, Slc12a5, Gabra5
Inorganic anion transport: Gabrg2, Gabra1, Glra2, Slc12a5, Gabra5
Synapse: Gabrg2, Syt1, Slc17a6, Gabra1, Glra2, Gabra5, Snap25
Neurotransmitter receptor activity: Gabrg2, Sstr2, Gabra1, Glra2, Gabra5
Neurotransmitter binding: Gabrg2, Sstr2, Gabra1, Glra2, Gabra5

Oligodendrocytes (gene ontology term: contributing genes)
Axon ensheathment: Gm98, Plp1, Ugt8a, Cldn11, Gal3st1, Mbp
Ensheathment of neurons: Gm98, Plp1, Ugt8a, Cldn11, Gal3st1, Mbp
Regulation of action potential in neuron: Gm98, Plp1, Ugt8a, Cldn11, Gal3st1, Mbp
Regulation of action potential: Gm98, Plp1, Ugt8a, Cldn11, Gal3st1, Mbp
Myelination: Gm98, Plp1, Ugt8a, Gal3st1, Mbp

Based on data from Cahoy et al. (2008).
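The contributing genes listed in Table 1 can be cross-checked with a few lines of code. The Python sketch below is an illustration only: it transcribes the genes printed in Table 1 (five gene ontology terms per cell type, not the full Cahoy et al. (2008) data set) and asks two questions: whether any listed gene is shared between cell types, and which genes contribute to every listed term of a cell type. The names `TABLE1`, `genes_by_cell_type`, `shared_genes` and `core_genes` are hypothetical helpers introduced here; they are not part of the original analysis.

```python
"""Cross-check the contributing genes listed in Table 1 (Cahoy et al., 2008 data).

Only the genes printed in Table 1 (five GO terms per cell type) are encoded,
so the results describe the table excerpt, not the full underlying data set.
"""

# Gene ontology term -> contributing genes, transcribed from Table 1.
TABLE1 = {
    "astrocytes": {
        "Symporter activity": ["Slc1a2", "Slc1a3", "Slc15a2", "Slc4a4", "Slc25a18"],
        "Response to wounding": ["Slc1a2", "Slc1a3", "F3", "Pla2g7", "Tlr3", "Bmpr1b", "Papss2"],
        "Anion:cation symporter activity": ["Slc1a2", "Slc1a3", "Slc4a4"],
        "Organic anion transport": ["Slc1a2", "Slc1a3", "Slc4a4"],
        "Glutamate biosynthetic process": ["Slc1a3", "Prodh"],
    },
    "neurons": {
        "Chloride transport": ["Gabrg2", "Gabra1", "Glra2", "Slc12a5", "Gabra5"],
        "Inorganic anion transport": ["Gabrg2", "Gabra1", "Glra2", "Slc12a5", "Gabra5"],
        "Synapse": ["Gabrg2", "Syt1", "Slc17a6", "Gabra1", "Glra2", "Gabra5", "Snap25"],
        "Neurotransmitter receptor activity": ["Gabrg2", "Sstr2", "Gabra1", "Glra2", "Gabra5"],
        "Neurotransmitter binding": ["Gabrg2", "Sstr2", "Gabra1", "Glra2", "Gabra5"],
    },
    "oligodendrocytes": {
        "Axon ensheathment": ["Gm98", "Plp1", "Ugt8a", "Cldn11", "Gal3st1", "Mbp"],
        "Ensheathment of neurons": ["Gm98", "Plp1", "Ugt8a", "Cldn11", "Gal3st1", "Mbp"],
        "Regulation of action potential in neuron": ["Gm98", "Plp1", "Ugt8a", "Cldn11", "Gal3st1", "Mbp"],
        "Regulation of action potential": ["Gm98", "Plp1", "Ugt8a", "Cldn11", "Gal3st1", "Mbp"],
        "Myelination": ["Gm98", "Plp1", "Ugt8a", "Gal3st1", "Mbp"],
    },
}


def genes_by_cell_type(table):
    """Union of the contributing genes over all listed GO terms, per cell type."""
    return {cell: set().union(*terms.values()) for cell, terms in table.items()}


def shared_genes(table):
    """Pairwise overlap between the gene sets of different cell types."""
    genes = genes_by_cell_type(table)
    cells = sorted(genes)
    return {
        (a, b): genes[a] & genes[b]
        for i, a in enumerate(cells)
        for b in cells[i + 1:]
    }


def core_genes(table):
    """Genes that contribute to every listed GO term of a cell type."""
    return {
        cell: set.intersection(*(set(g) for g in terms.values()))
        for cell, terms in table.items()
    }


if __name__ == "__main__":
    for cell, genes in genes_by_cell_type(TABLE1).items():
        print(f"{cell}: {len(genes)} distinct genes")
    for (a, b), overlap in shared_genes(TABLE1).items():
        print(f"{a} vs {b}: {sorted(overlap) or 'no shared genes'}")
    for cell, core in core_genes(TABLE1).items():
        print(f"{cell} core: {sorted(core)}")
```

On the table as printed, no listed gene is shared between any two cell types, while within each cell type the terms draw on a common core; for oligodendrocytes, Gm98, Plp1, Ugt8a, Gal3st1 and Mbp contribute to all five terms. This suggests that the enriched terms for a given cell type largely reflect the same small gene set rather than independent signals.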

Using microarrays, Cahoy et al. (2008) analyzed the transcriptome profiles of isolated mouse astrocytes, oligodendrocytes, and neurons and identified a number of new molecular markers specific to these cell types. Moreover, pathway analysis confirmed a phagocytic phenotype for astrocytes and underlined differences in transcriptome profiles between astrocytes and oligodendroglia (Table 1).

Comparative analysis of the transcriptome profiles of the WM and gray matter (GM) of the human frontal cortex, generated using RNA-Seq, revealed distinct differences not only in gene expression profiles but also in alternative splicing patterns between these 2 cortical structures (Mills et al., 2013). The WM transcriptome corroborated specific functions of oligodendroglia related to myelination, axon growth, and lipid metabolism. Gene ontology enrichment analysis of the GM samples revealed gene ontology clusters specific to neuronal physiology, such as ion-gated channels and synaptic transmission. Interestingly, this study also revealed specific upregulation of long intergenic noncoding RNAs (lincRNAs), such as linc00263, in the WM relative to the GM (Mills et al., 2013). The study further showed that the transcriptome profiles of different cell types from adjacent regions of the human brain are distinct. This highlights the heterogeneity of the brain, which must be taken into account in any gene expression study.

Similarly to long noncoding RNAs (lncRNAs), miRNAs are increasingly recognized as important regulatory factors in oligodendroglial differentiation and axon myelination. Widespread miRNA dysregulation has been identified in a variety of neurodegenerative disorders; further comparison across other neurodegenerative diseases indicated an MSA-specific alteration in miR-96 expression levels (Ubhi et al., 2014). Although dysregulation of miR-96 was not conclusively shown to be related to oligodendroglia, its disease-specific upregulation warrants further investigation in oligodendroglial cell lines. A further example of a miRNA acting in the oligodendroglial transcriptome is microRNA-23 (miR-23), which has been found to suppress lamin B1 and enhance oligodendroglial differentiation in vitro (Lin and Fu, 2009). The former effect is particularly important for the pathogenesis of leukodystrophy, in which overexpression of lamin B1 leads to demyelination in the central nervous system (Kohler, 2010). More recently, miR-23 has been shown to modulate phosphatase and tensin homolog (PTEN), a key modulator of the AKT-mTOR signaling pathway (Lin et al., 2013). Moreover, the lncRNA 2700046G09Rik was identified as another miR-23a target; this lncRNA regulates PTEN in a miR-23a-dependent manner (Lin et al., 2013).

7. Long noncoding RNAs and neurodegenerative diseases

In recent years, it has become increasingly apparent that the non-protein-coding portion of the genome is of crucial functional importance for normal development and disease pathophysiology (Mercer et al., 2009). Qureshi et al. (2010) have suggested that the precise temporal and spatial expression of noncoding RNAs is exceptionally important for mediating central nervous system form and function. lncRNAs are a heterogeneous group of noncoding transcripts more than 200 nucleotides long (Esteller, 2011), and the expression of certain lncRNAs might influence the pathogenesis of neurodegenerative disorders. For example, the antisense RNA of β-site amyloid precursor protein-cleaving enzyme 1 (BACE1), BACE1-AS, regulates the expression of BACE1 in AD patients and in a mouse model by stabilizing the BACE1 mRNA, which in turn promotes the production of amyloid β and contributes to the pathogenesis of AD (Faghihi et al., 2008; Frisardi et al., 2010; Qureshi et al., 2010). The noncoding natural antisense RNA transcribed from the phosphatase and tensin homolog-induced putative kinase 1 (PINK1) gene, naPINK1, functions to stabilize and upregulate the expression of svPINK1 (a splice variant of PINK1), which ultimately leads to the disruption of the mitochondrial respiratory chain, increasing the sensitivity of cells to apoptosis in PD (Chiba et al., 2009; Morais et al., 2009; Sai et al., 2012; Scheele et al., 2007).

8. Conclusion

MSA is an α-synucleinopathy whose molecular pathogenesis is poorly understood. It is a unique neurodegenerative disorder in that the pathologic lesion is found in oligodendroglia rather than neurons. Expression studies have identified changes in mRNAs important for transcription and/or RNA modification itself, as well as for regulating mitochondrial function, metabolism and/or glycolysis, proteasome structure and protein modification, signal transduction and ion transport, and inflammation and/or response to stress. Future research using transcriptome sequencing techniques is likely to provide further insight into the genome-wide changes in mRNA expression that underlie the markedly increased deposition of α-synuclein in the MSA brain. In particular, more of the oligodendroglia-specific and noncoding parts of the transcribed genome need to be explored to identify the mechanisms underlying the molecular pathology of MSA.

Disclosure statement

The authors declare that no conflicts of interest exist.

Acknowledgements

Glenda M Halliday is a Senior Principal Research Fellow of the National Health and Medical Research Council Australia (#630434). This study was supported by Brain Foundation Australia (to Michael Janitz).

References

Ahtiainen, L., Kolikova, J., Mutka, A.L., Luiro, K., Gentile, M., Ikonen, E., Khiroug, L., Jalanko, A., Kopra, O., 2007. Palmitoyl protein thioesterase 1 (Ppt1)-deficient mouse neurons show alterations in cholesterol metabolism and calcium homeostasis prior to synaptic dysfunction. Neurobiol. Dis. 28, 52–64.
Asi, Y.T., Simpson, J.E., Heath, P.R., Wharton, S.B., Lees, A.J., Revesz, T., Houlden, H., Holton, J.L., 2014. Alpha-synuclein mRNA expression in oligodendrocytes in MSA. Glia 62, 964–970.
Bartels, T., Choi, J.G., Selkoe, D.J., 2011. alpha-Synuclein occurs physiologically as a helically folded tetramer that resists aggregation. Nature 477, 107–110.
Beal, M.F., 2005. Mitochondria take center stage in aging and neurodegeneration. Ann. Neurol. 58, 495–505.
Beyer, K., 2006. Alpha-synuclein structure, posttranslational modification and alternative splicing as aggregation enhancers. Acta Neuropathol. 112, 237–251.
Beyer, K., Ariza, A., 2013. alpha-Synuclein posttranslational modification and alternative splicing as a trigger for neurodegeneration. Mol. Neurobiol. 47, 509–524.
Beyer, K., Domingo-Sabat, M., Humbert, J., Carrato, C., Ferrer, I., Ariza, A., 2008. Differential expression of alpha-synuclein, parkin, and synphilin-1 isoforms in Lewy body disease. Neurogenetics 9, 163–172.
Bisaglia, M., Mammi, S., Bubacco, L., 2009. Structural insights on physiological functions and pathological effects of alpha-synuclein. FASEB J. 23, 329–340.
Bleasel, J.M., Wong, J.H., Halliday, G.M., Kim, W.S., 2014. Lipid dysfunction and pathogenesis of multiple system atrophy. Acta Neuropathol. Commun. 2, 15.
Braak, H., Sastre, M., Del Tredici, K., 2007. Development of alpha-synuclein immunoreactive astrocytes in the forebrain parallels stages of intraneuronal pathology in sporadic Parkinson's disease. Acta Neuropathol. 114, 231–241.
Cahoy, J.D., Emery, B., Kaushal, A., Foo, L.C., Zamanian, J.L., Christopherson, K.S., Xing, Y., Lubischer, J.L., Krieg, P.A., Krupenko, S.A., Thompson, W.J., Barres, B.A., 2008. A transcriptome database for astrocytes, neurons, and oligodendrocytes: a new resource for understanding brain development and function. J. Neurosci. 28, 264–278.
Campion, D., Martin, C., Heilig, R., Charbonnier, F., Moreau, V., Flaman, J.M., Petit, J.L., Hannequin, D., Brice, A., Frebourg, T., 1995. The NACP/synuclein gene: chromosomal assignment and screening for alterations in Alzheimer disease. Genomics 26, 254–257.

Castagnet, P.I., Golovko, M.Y., Barcelo-Coblijn, G.C., Nussbaum, R.L., Murphy, E.J., 2005. Fatty acid incorporation is decreased in astrocytes cultured from alpha-synuclein gene-ablated mice. J. Neurochem. 94, 839–849.
Chiba, M., Kiyosawa, H., Hiraiwa, N., Ohkohchi, N., Yasue, H., 2009. Existence of Pink1 antisense RNAs in mouse and their localization. Cytogenet. Genome Res. 126, 259–270.
Cho, M.K., Nodet, G., Kim, H.Y., Jensen, M.R., Bernado, P., Fernandez, C.O., Becker, S., Blackledge, M., Zweckstetter, M., 2009. Structural characterization of alpha-synuclein in an aggregation prone state. Protein Sci. 18, 1840–1846.
Clayton, D.F., George, J.M., 1999. Synucleins in synaptic plasticity and neurodegenerative disorders. J. Neurosci. Res. 58, 120–129.
Dean, J.M., Wang, X., Kaindl, A.M., Gressens, P., Fleiss, B., Hagberg, H., Mallard, C., 2010. Microglial MyD88 signaling regulates acute neuronal toxicity of LPS-stimulated microglia in vitro. Brain Behav. Immun. 24, 776–783.
Deshpande, M., Zheng, J., Borgmann, K., Persidsky, R., Wu, L., Schellpeper, C., Ghorpade, A., 2005. Role of activated astrocytes in neuronal damage: potential links to HIV-1-associated dementia. Neurotox. Res. 7, 183–192.
Dickson, D.W., Lin, W., Liu, W.K., Yen, S.H., 1999. Multiple system atrophy: a sporadic synucleinopathy. Brain Pathol. 9, 721–732.
Donadio, V., Cortelli, P., Elam, M., Di Stasi, V., Montagna, P., Holmberg, B., Giannoccaro, M.P., Bugiardini, E., Avoni, P., Baruzzi, A., Liguori, R., 2010. Autonomic innervation in multiple system atrophy and pure autonomic failure. J. Neurol. Neurosurg. Psychiatry 81, 1327–1335.
Esteller, M., 2011. Non-coding RNAs in human disease. Nat. Rev. Genet. 12, 861–874.
Faghihi, M.A., Modarresi, F., Khalil, A.M., Wood, D.E., Sahagan, B.G., Morgan, T.E., Finch, C.E., St Laurent 3rd, G., Kenny, P.J., Wahlestedt, C., 2008. Expression of a noncoding RNA is elevated in Alzheimer's disease and drives rapid feed-forward regulation of beta-secretase. Nat. Med. 14, 723–730.
Fellner, L., Jellinger, K.A., Wenning, G.K., Stefanova, N., 2011. Glial dysfunction in the pathogenesis of alpha-synucleinopathies: emerging concepts. Acta Neuropathol. 121, 675–693.
Fellner, L., Stefanova, N., 2013. The role of glia in alpha-synucleinopathies. Mol. Neurobiol. 47, 575–586.
Ferrante, K.L., Shefner, J., Zhang, H., Betensky, R., O'Brien, M., Yu, H., Fantasia, M., Taft, J., Beal, M.F., Traynor, B., Newhall, K., Donofrio, P., Caress, J., Ashburn, C., Freiberg, B., O'Neill, C., Paladenech, C., Walker, T., Pestronk, A., Abrams, B., Florence, J., Renna, R., Schierbecker, J., Malkus, B., Cudkowicz, M., 2005. Tolerance of high-dose (3,000 mg/day) coenzyme Q10 in ALS. Neurology 65, 1834–1836.
Ferrer, I., 2001. [Alpha-synucleinopathies]. Neurologia 16, 163–170.
Frisardi, V., Solfrizzi, V., Imbimbo, P.B., Capurso, C., D'Introno, A., Colacicco, A.M., Vendemiale, G., Seripa, D., Pilotto, A., Capurso, A., Panza, F., 2010. Towards disease-modifying treatment of Alzheimer's disease: drugs targeting beta-amyloid. Curr. Alzheimer Res. 7, 40–55.
Fujiwara, H., Hasegawa, M., Dohmae, N., Kawashima, A., Masliah, E., Goldberg, M.S., Shen, J., Takio, K., Iwatsubo, T., 2002. alpha-Synuclein is phosphorylated in synucleinopathy lesions. Nat. Cell. Biol. 4, 160–164.
Gai, W.P., Power, J.H., Blumbergs, P.C., Blessing, W.W., 1998. Multiple-system atrophy: a new alpha-synuclein disease? Lancet 352, 547–548.
Gawel, M., Jamrozik, Z., Szmidt-Salkowska, E., Slawek, J., Rowinska-Marcinska, K., 2012. Is peripheral neuron degeneration involved in multiple system atrophy? A clinical and electrophysiological study. J. Neurol. Sci. 319, 81–85.
Gilman, S., Wenning, G.K., Low, P.A., Brooks, D.J., Mathias, C.J., Trojanowski, J.Q., Wood, N.W., Colosimo, C., Durr, A., Fowler, C.J., Kaufmann, H., Klockgether, T., Lees, A., Poewe, W., Quinn, N., Revesz, T., Robertson, D., Sandroni, P., Seppi, K., Vidailhet, M., 2008. Second consensus statement on the diagnosis of multiple system atrophy. Neurology 71, 670–676.
Golovko, M.Y., Faergeman, N.J., Cole, N.B., Castagnet, P.I., Nussbaum, R.L., Murphy, E.J., 2005. Alpha-synuclein gene deletion decreases brain palmitate uptake and alters the palmitate metabolism in the absence of alpha-synuclein palmitate binding. Biochemistry 44, 8251–8259.
Halliday, G.M., Holton, J.L., Revesz, T., Dickson, D.W., 2011. Neuropathology underlying clinical variability in patients with synucleinopathies. Acta Neuropathol. 122, 187–204.
Halliday, G.M., Stevens, C.H., 2011. Glia: initiators and progressors of pathology in Parkinson's disease. Mov. Disord. 26, 6–17.
Huntington Study Group Pre2CARE Investigators, Hyson, H.C., Kieburtz, K., Shoulson, I., McDermott, M., Ravina, B., de Blieck, E.A., Cudkowicz, M.E., Ferrante, R.J., Como, P., Frank, S., Zimmerman, C., Cudkowicz, M.E., Ferrante, K., Newhall, K., Jennings, D., Kelsey, T., Walker, F., Hunt, V., Daigneault, S., Goldstein, M., Weber, J., Watts, A., Beal, M.F., Browne, S.E., Metakis, L.J., 2010. Safety and tolerability of high-dosage coenzyme Q10 in Huntington's disease and healthy subjects. Mov. Disord. 25, 1924–1928.
Ide, C.F., Scripter, J.L., Coltman, B.W., Dotson, R.S., Snyder, D.C., Jelaso, A., 1996. Cellular and molecular correlates to plasticity during recovery from injury in the developing mammalian brain. Prog. Brain Res. 108, 365–377.
Ischiropoulos, H., Beckman, J.S., 2003. Oxidative stress and nitration in neurodegeneration: cause, effect, or association? J. Clin. Invest. 111, 163–169.
Ishizawa, K., Komori, T., Sasaki, S., Arai, N., Mizutani, T., Hirose, T., 2004. Microglial activation parallels system degeneration in multiple system atrophy. J. Neuropathol. Exp. Neurol. 63, 43–52.
Jellinger, K.A., 2003. Neuropathological spectrum of synucleinopathies. Mov. Disord. 18 (Suppl 6), S2–S12.
Jin, H., Ishikawa, K., Tsunemi, T., Ishiguro, T., Amino, T., Mizusawa, H., 2008. Analyses of copy number and mRNA expression level of the alpha-synuclein gene in multiple system atrophy. J. Med. Dent. Sci. 55, 145–153.
Kalivendi, S.V., Yedlapudi, D., Hillard, C.J., Kalyanaraman, B., 2010. Oxidants induce alternative splicing of alpha-synuclein: Implications for Parkinson's disease. Free Radic. Biol. Med. 48, 377–383.
Kawai, Y., Suenaga, M., Takeda, A., Ito, M., Watanabe, H., Tanaka, F., Kato, K., Fukatsu, H., Naganawa, S., Kato, T., Ito, K., Sobue, G., 2008. Cognitive impairments in multiple system atrophy: MSA-C vs MSA-P. Neurology 70 (16 Pt 2), 1390–1396.
Keller, J.N., Hanni, K.B., Markesbery, W.R., 2000. Impaired proteasome function in Alzheimer's disease. J. Neurochem. 75, 436–439.
Kiely, A.P., Asi, Y.T., Kara, E., Limousin, P., Ling, H., Lewis, P., Proukakis, C., Quinn, N., Lees, A.J., Hardy, J., Revesz, T., Houlden, H., Holton, J.L., 2013. alpha-Synucleinopathy associated with G51D SNCA mutation: a link between Parkinson's disease and multiple system atrophy? Acta Neuropathol. 125, 753–769.
Kohler, W., 2010. Leukodystrophies with late disease onset: an update. Curr. Opin. Neurol. 23, 234–241.
Kosaka, K., 1978. Lewy bodies in cerebral cortex, report of three cases. Acta Neuropathol. 42, 127–134.
Langerveld, A.J., Mihalko, D., DeLong, C., Walburn, J., Ide, C.F., 2007. Gene expression changes in postmortem tissue from the rostral pons of multiple system atrophy patients. Mov. Disord. 22, 766–777.
Lavedan, C., 1998. The synuclein family. Genome Res. 8, 871–880.
Lee, H.J., Choi, C., Lee, S.J., 2002. Membrane-bound alpha-synuclein has a high aggregation propensity and the ability to seed the aggregation of the cytosolic form. J. Biol. Chem. 277, 671–678.
Lesage, S., Brice, A., 2009. Parkinson's disease: from monogenic forms to genetic susceptibility factors. Hum. Mol. Genet. 18 (R1), R48–R59.
Li, J., Uversky, V.N., Fink, A.L., 2001. Effect of familial Parkinson's disease point mutations A30P and A53T on the structural properties, aggregation, and fibrillation of human alpha-synuclein. Biochemistry 40, 11604–11613.
Lin, S.T., Fu, Y.H., 2009. miR-23 regulation of lamin B1 is crucial for oligodendrocyte development and myelination. Dis. Model Mech. 2, 178–188.
Lin, S.T., Huang, Y., Zhang, L., Heng, M.Y., Ptacek, L.J., Fu, Y.H., 2013. MicroRNA-23a promotes myelination in the central nervous system. Proc. Natl. Acad. Sci. U.S.A 110, 17468–17473.
Lu, C.F., Soong, B.W., Wu, H.M., Teng, S., Wang, P.S., Wu, Y.T., 2013. Disrupted cerebellar connectivity reduces whole-brain network efficiency in multiple system atrophy. Mov. Disord. 28, 362–369.
Ma, K.L., Song, L.K., Long, W.A., Yuan, Y.H., Zhang, Y., Song, X.Y., Niu, F., Han, N., Chen, N.H., 2013. Deletion in exon 5 of the SNCA gene and exposure to rotenone leads to oligomerization of alpha-synuclein and toxicity to PC12 cells. Brain Res. Bull. 90, 127–131.
Matsuo, A., Akiguchi, I., Lee, G.C., McGeer, E.G., McGeer, P.L., Kimura, J., 1998. Myelin degeneration in multiple system atrophy detected by unique antibodies. Am. J. Pathol. 153, 735–744.
Mattick, J.S., 2011. The central role of RNA in human development and cognition. FEBS Lett. 585, 1600–1616.
May, V.E., Ettle, B., Poehler, A.M., Nuber, S., Ubhi, K., Rockenstein, E., Winner, B., Wegner, M., Masliah, E., Winkler, J., 2014. alpha-Synuclein impairs oligodendrocyte progenitor maturation in multiple system atrophy. Neurobiol. Aging 35, 2357–2368.
McLean, P.J., Kawamata, H., Ribich, S., Hyman, B.T., 2000. Membrane association and protein conformation of alpha-synuclein in intact neurons. Effect of Parkinson's disease-linked mutations. J. Biol. Chem. 275, 8812–8816.
McNaught, K.S., Jenner, P., 2001. Proteasomal function is impaired in substantia nigra in Parkinson's disease. Neurosci. Lett. 297, 191–194.
Mercer, T.R., Dinger, M.E., Mattick, J.S., 2009. Long non-coding RNAs: insights into functions. Nat. Rev. Genet. 10, 155–159.
Miller, D.W., Johnson, J.M., Solano, S.M., Hollingsworth, Z.R., Standaert, D.G., Young, A.B., 2005. Absence of alpha-synuclein mRNA expression in normal and multiple system atrophy oligodendroglia. J. Neural Transm. 112, 1613–1624.
Mills, J.D., Janitz, M., 2012. Alternative splicing of mRNA in the molecular pathology of neurodegenerative diseases. Neurobiol. Aging 33, 1012.e11–1012.e24.
Mills, J.D., Kavanagh, T., Kim, W.S., Chen, B.J., Kawahara, Y., Halliday, G.M., Janitz, M., 2013. Unique transcriptome patterns of the white and grey matter corroborate structural and functional heterogeneity in the human frontal lobe. PLoS One 8, e78480.
Minnerop, M., Luders, E., Specht, K., Ruhlmann, J., Schimke, N., Thompson, P.M., Chou, Y.Y., Toga, A.W., Abele, M., Wullner, U., Klockgether, T., 2010. Callosal tissue loss in multiple system atrophy - a one-year follow-up study. Mov. Disord. 25, 2613–2620.
Mirza, B., Hadberg, H., Thomsen, P., Moos, T., 2000. The absence of reactive astrocytosis is indicative of a unique inflammatory process in Parkinson's disease. Neuroscience 95, 425–432.
Morais, V.A., Verstreken, P., Roethig, A., , J., Snellinx, A., Vanbrabant, M., , D., Frezza, C., Mandemakers, W., Vogt-Weisenhorn, D., Van Coster, R., Wurst, W., Scorrano, L., De Strooper, B., 2009. Parkinson's disease mutations in PINK1 result in decreased complex I activity and deficient synaptic function. EMBO Mol. Med. 1, 99–111.
Multiple-System Atrophy Research Collaboration, 2013. Mutations in COQ2 in familial and sporadic multiple-system atrophy. N. Engl. J. Med. 369, 233–244.
Murray, I.V., Giasson, B.I., Quinn, S.M., Koppaka, V., Axelsen, P.H., Ischiropoulos, H., Trojanowski, J.Q., Lee, V.M., 2003. Role of alpha-synuclein carboxy-terminus on fibril formation in vitro. Biochemistry 42, 8530–8540.
Nakayama, K., Suzuki, Y., Yazawa, I., 2009. Microtubule depolymerization suppresses alpha-synuclein accumulation in a mouse model of multiple system atrophy. Am. J. Pathol. 174, 1471–1480.

Nakayama, K., Suzuki, Y., Yazawa, I., 2012. Binding of neuronal alpha-synuclein to beta-III tubulin and accumulation in a model of multiple system atrophy. Biochem. Biophys. Res. Commun. 417, 1170–1175.
Nee, L.E., Gomez, M.R., Dambrosia, J., Bale, S., Eldridge, R., Polinsky, R.J., 1991. Environmental-occupational risk factors and familial associations in multiple system atrophy: a preliminary investigation. Clin. Auton. Res. 1, 9–13.
Nicoletti, G., Lodi, R., Condino, F., Tonon, C., Fera, F., Malucelli, E., Manners, D., Zappia, M., Morgante, L., Barone, P., Barbiroli, B., Quattrone, A., 2006. Apparent diffusion coefficient measurements of the middle cerebellar peduncle differentiate the Parkinson variant of MSA from Parkinson's disease and progressive supranuclear palsy. Brain 129 (Pt 10), 2679–2687.
Nimmerjahn, A., Kirchhoff, F., Helmchen, F., 2005. Resting microglial cells are highly dynamic surveillants of brain parenchyma in vivo. Science 308, 1314–1318.
O'Sullivan, S.S., Massey, L.A., Williams, D.R., Silveira-Moriyama, L., Kempster, P.A., Holton, J.L., Revesz, T., Lees, A.J., 2008. Clinical outcomes of progressive supranuclear palsy and multiple system atrophy. Brain 131 (Pt 5), 1362–1372.
Ozawa, T., Okuizumi, K., Ikeuchi, T., Wakabayashi, K., Takahashi, H., Tsuji, S., 2001. Analysis of the expression level of alpha-synuclein mRNA using postmortem brain samples from pathologically confirmed cases of multiple system atrophy. Acta Neuropathol. 102, 188–190.
Paik, S.R., Lee, J.H., Kim, D.H., Chang, C.S., Kim, J., 1997. Aluminum-induced structural alterations of the precursor of the non-A beta component of Alzheimer's disease amyloid. Arch. Biochem. Biophys. 344, 325–334.
Paik, S.R., Shin, H.J., Lee, J.H., Chang, C.S., Kim, J., 1999. Copper(II)-induced self-oligomerization of alpha-synuclein. Biochem. J. 340 (Pt 3), 821–828.
Probst-Cousin, S., Rickert, C.H., , K.W., Gullotta, F., 1998. Cell death mechanisms in multiple system atrophy. J. Neuropathol. Exp. Neurol. 57 (9), 814–821.
Qureshi, I.A., Mattick, J.S., Mehler, M.F., 2010. Long non-coding RNAs in nervous system function and disease. Brain Res. 1338, 20–35.
Sai, Y., Zou, Z., Peng, K., Dong, Z., 2012. The Parkinson's disease-related genes act in mitochondrial homeostasis. Neurosci. Biobehav. Rev. 36, 2034–2043.
Sato, K., Kaji, R., Matsumoto, S., Nagahiro, S., Goto, S., 2007. Compartmental loss of striatal medium spiny neurons in multiple system atrophy of parkinsonian type. Mov. Disord. 22, 2365–2370.
Scheele, C., Petrovic, N., Faghihi, M.A., Lassmann, T., Fredriksson, K., Rooyackers, O., Wahlestedt, C., Good, L., Timmons, J.A., 2007. The human PINK1 locus is regulated in vivo by a non-coding natural antisense RNA during modulation of mitochondrial function. BMC Genomics 8, 74.
Scherzer, C.R., Jensen, R.V., Gullans, S.R., Feany, M.B., 2003. Gene expression changes presage neurodegeneration in a drosophila model of Parkinson's disease. Hum. Mol. Genet. 12, 2457–2466.
Serpell, L.C., Berriman, J., Jakes, R., Goedert, M., Crowther, R.A., 2000. Fiber diffraction of synthetic alpha-synuclein filaments shows amyloid-like cross-beta conformation. Proc. Natl. Acad. Sci. U.S.A 97, 4897–4902.
Sharon, R., Bar-Joseph, I., Frosch, M.P., Walsh, D.M., Hamilton, J.A., Selkoe, D.J., 2003. The formation of highly soluble oligomers of alpha-synuclein is regulated by fatty acids and enhanced in Parkinson's disease. Neuron 37, 583–595.
Shiga, K., Yamada, K., Yoshikawa, K., Mizuno, T., Nishimura, T., Nakagawa, M., 2005. Local tissue anisotropy decreases in cerebellopetal fibers and pyramidal tract in multiple system atrophy. J. Neurol. 252, 589–596.
Souza, J.M., Giasson, B.I., Lee, V.M., Ischiropoulos, H., 2000. Chaperone-like activity of synucleins. FEBS Lett. 474, 116–119.
Stefanis, L., 2012. alpha-Synuclein in Parkinson's disease. Cold Spring Harb. Perspect. Med. 2, a009399.
Stefanova, N., Reindl, M., Neumann, M., Haass, C., Poewe, W., Kahle, P.J., Wenning, G.K., 2005. Oxidative stress in transgenic mice with oligodendroglial alpha-synuclein overexpression replicates the characteristic neuropathology of multiple system atrophy. Am. J. Pathol. 166, 869–876.
Tong, J., Wong, H., Guttman, M., Ang, L.C., Forno, L.S., Shimadzu, M., Rajput, A.H., Muenter, M.D., Kish, S.J., Hornykiewicz, O., Furukawa, Y., 2010. Brain alpha-synuclein accumulation in multiple system atrophy, Parkinson's disease and progressive supranuclear palsy: a comparative investigation. Brain 133 (Pt 1), 172–188.
Ubhi, K., Rockenstein, E., Kragh, C., Inglis, C., Spencer, B., Michael, S., Mante, M., Adame, A., Galasko, D., Masliah, E., 2014. Widespread microRNA dysregulation in multiple system atrophy - disease-related alteration in miR-96. Eur. J. Neurosci. 39, 1026–1041.
Ueda, K., Fukushima, H., Masliah, E., Xia, Y., Iwai, A., Yoshimoto, M., Otero, D.A., Kondo, J., Ihara, Y., Saitoh, T., 1993. Molecular cloning of cDNA encoding an unrecognized component of amyloid in Alzheimer disease. Proc. Natl. Acad. Sci. U.S.A 90, 11282–11286.
Umoto, M., Miwa, H., Ando, R., Kajimoto, Y., Kondo, T., 2012. White matter hyperintensities in patients with multiple system atrophy. Parkinsonism Relat. Disord. 18, 17–20.
Uversky, V.N., 2007. Neuropathology, biochemistry, and biophysics of alpha-synuclein aggregation. J. Neurochem. 103, 17–37.
Vanacore, N., Bonifati, V., Fabbrini, G., Colosimo, C., De Michele, G., Marconi, R., Nicholl, D., Locuratolo, N., Talarico, G., Romano, S., Stocchi, F., Bonuccelli, U., De Mari, M., Vieregge, P., Meco, G., European Study Group on Atypical, P, 2001. Epidemiology of multiple system atrophy. ESGAP consortium. European study group on atypical parkinsonisms. Neurol. Sci. 22, 97–99.
Vidal, J.S., Vidailhet, M., Derkinderen, P., Tzourio, C., Alperovitch, A., 2010. Familial aggregation in atypical Parkinson's disease: a case control study in multiple system atrophy and progressive supranuclear palsy. J. Neurol. 257, 1388–1393.
Vila, M., Jackson-Lewis, V., Guegan, C., Wu, D.C., Teismann, P., Choi, D.K., Tieu, K., Przedborski, S., 2001. The role of glial cells in Parkinson's disease. Curr. Opin. Neurol. 14, 483–489.
Wakabayashi, K., Hayashi, S., Yoshimoto, M., Kudo, H., Takahashi, H., 2000. NACP/alpha-synuclein-positive filamentous inclusions in astrocytes and oligodendrocytes of Parkinson's disease brains. Acta Neuropathol. 99, 14–20.
Wakabayashi, K., Takahashi, H., 2006. Cellular pathology in multiple system atrophy. Neuropathology 26, 338–345.
Watanabe, H., Saito, Y., Terao, S., Ando, T., Kachi, T., Mukai, E., Aiba, I., Abe, Y., Tamakoshi, A., Doyu, M., Hirayama, M., Sobue, G., 2002. Progression and prognosis in multiple system atrophy: an analysis of 230 Japanese patients. Brain 125 (Pt 5), 1070–1083.
Wenning, G.K., Colosimo, C., Geser, F., Poewe, W., 2004. Multiple system atrophy. Lancet Neurol. 3, 93–103.
Wenning, G.K., Wagner, S., Daniel, S., Quinn, N.P., 1993. Multiple system atrophy: sporadic or familial? Lancet 342, 681.
Wilhelmsson, U., Bushong, E.A., Price, D.L., Smarr, B.L., Phung, V., Terada, M., Ellisman, M.H., Pekny, M., 2006. Redefining the concept of reactive astrocytes as cells that remain within their unique domains upon reaction to injury. Proc. Natl. Acad. Sci. U.S.A 103, 17513–17518.
Zarranz, J.J., Alegre, J., Gomez-Esteban, J.C., Lezcano, E., Ros, R., Ampuero, I., Vidal, L., Hoenicka, J., Rodriguez, O., Atares, B., Llorens, V., Gomez Tortosa, E., del Ser, T., Munoz, D.G., de Yebenes, J.G., 2004. The new mutation, E46K, of alpha-synuclein causes Parkinson and Lewy body dementia. Ann. Neurol. 55, 164–173.
Zhang, W., Wang, T., Pei, Z., Miller, D.S., Wu, X., Block, M.L., Wilson, B., Zhang, W., Zhou, Y., Hong, J.S., Zhang, J., 2005. Aggregated alpha-synuclein activates microglia: a process leading to disease progression in Parkinson's disease. FASEB J. 19, 533–542.

A.2 Pathway analysis of the human brain tran-

scriptome in disease

Kavanagh, T., Mills, J. D., Kim, W. S., Halliday, G. M., and Janitz, M. (2013). “Pathway Analysis of the Human Brain Transcriptome in Disease.” Journal of Molecular Neuroscience. 51:28–36.

J Mol Neurosci (2013) 51:28–36 DOI 10.1007/s12031-012-9940-0

Pathway Analysis of the Human Brain Transcriptome in Disease

Tomas Kavanagh & James D. Mills & Woojin S. Kim & Glenda M. Halliday & Michael Janitz

Received: 3 November 2012 / Accepted: 10 December 2012 / Published online: 22 December 2012 © Springer Science+Business Media New York 2012

Abstract Pathway analysis is a powerful method for discerning differentially regulated genes and elucidating their biological importance. It allows for the identification of perturbed or aberrantly expressed genes within a biological context from extensive data sets and offers a simple approach for interrogating such data sets. With the growing use of microarrays and RNA-Seq, data for genome-wide studies are growing at an alarming rate, and the use of deep sequencing is revealing elements of the genome previously uncharacterised. Through the employment of pathway analysis, mechanisms in complex diseases may be explored and novel causative genes found, primarily through differentially regulated genes. Further, with the implementation of next-generation sequencing, a deeper resolution may be attained, particularly in identification of isoform diversity and SNPs. Here, we give a broad overview of pathway analysis in the human brain transcriptome and its relevance in teasing out underlying causes of complex diseases. We will outline processes in data gathering and analysis of particular diseases in which these approaches have been successful.

Keywords Transcriptome · Pathway analysis · Human brain · Brain disorders · Gene expression · RNA-Seq

T. Kavanagh · J. D. Mills · M. Janitz (*) School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales 2052, Australia

W. S. Kim · G. M. Halliday Neuroscience Research Australia, Sydney, New South Wales 2031, Australia

W. S. Kim · G. M. Halliday School of Medical Sciences, University of New South Wales, Sydney, New South Wales 2052, Australia

Introduction

In the complex transcriptional environment of the brain, identification of differentially regulated genes is a significant challenge. This is especially true in cases of neurodegenerative diseases in which ageing itself is causative of many of the changes seen (Lu et al. 2004). To complicate the identification of causative genes in neurodegenerative diseases further, the brain has an enormous range of transcript isoforms, and many differentially expressed isoforms have been linked to neurodegenerative diseases (Courtney et al. 2010; Faustino and Cooper 2003; Mills and Janitz 2012; Tollervey et al. 2011; Twine et al. 2011). These isoforms may be differentially expressed across tissues or in disease, or alternative splicing and erroneous splicing may have an impact on disease (Mills and Janitz 2012; Tollervey et al. 2011; Twine et al. 2011). It is imperative that a deep understanding of the brain transcriptome be developed so that diseases of the brain might be better treated.

In identifying key differences in complex disease states, systems biology looks to identify differentially regulated genes. Whilst there are many potential methods to study the transcriptome, the use of microarrays is likely the most prevalent. However, microarray analysis is not without faults, and the recent improvement in next-generation sequencing (NGS) and the development of RNA-Seq, which has greater reproducibility, resolution and cost effectiveness when analysing an entire transcriptome, look set to overtake microarrays (Shendure 2008).

As the wealth of data generated by the aforementioned techniques grows, the interrogation of the datasets for biologically relevant information becomes more complex. Traditionally these approaches provide a list of genes identified as differentially expressed. Pathway analysis has gained favour for its ability to reduce these lists to a series of “pathways” in biological systems. These pathways are formed of gene and protein products within a defined context of biology, e.g. sphingolipid metabolism in humans.
Examples of such pathways include those formed from Gene Ontology annotated genes drawn from public databases (Emmert-Streib and Glazko 2011; Khatri et al. 2012). The analysis may be implemented in any number of programs or databases, including online resources such as the Database for Annotation, Visualization and Integrated Discovery (DAVID) (Huang et al. 2009) or MetaCore (www.genego.com).

The use of DAVID requires little technical expertise, yet it can determine the presence of affected pathways and allows for simplified, targeted identification and analysis of aberrant genes. To complete a DAVID analysis, it is necessary to have a sufficiently extensive gene list, ranging from hundreds to thousands of genes, and each gene needs to have been selected by statistical analysis (e.g. t test) and to show a significant fold change in expression (Huang et al. 2009). DAVID’s output of affected gene pathways may then be used in an exploratory fashion to find differentially expressed genes of biological relevance to the state being questioned.

Pathway analysis has been used extensively in microarray and RNA-Seq studies. These experiments have ranged from aiming to generate a generalised understanding of gene expression changes or differences during development or ageing and between tissue types (Chen et al. 2011; Lerch et al. 2012) to studies of disease (Sutherland et al. 2011). Such studies have encompassed Alzheimer’s disease (AD), schizophrenia (SCZ) and autism spectrum disease (ASD), which will be discussed here, amongst others.

Pathway analysis offers a powerful analytical tool for studying the transcriptome within the human brain. It has the ability to readily identify differentially expressed genes and also to isolate them into biological groupings based on functional relation, structural relation or position in a cell, amongst other groupings. In doing so, pathway analysis reduces the complexity of tasks involving large lists of differentially regulated genes and hence has strong applications in studies of the transcriptome using either RNA-Seq or microarray for data collection.

Data Collection

While pathway analysis is applicable to many data sets, including metabolomics, proteomics and gene expression (Emmert-Streib and Glazko 2011), its use in interpreting transcriptome data generated by microarray or RNA-Seq is of particular interest here.

Microarray

Microarrays have become a popular means for elucidating gene expression patterns in disease states. Microarrays use islands of oligonucleotide probes fixed to a solid surface such as glass to hybridise to complementary DNA (cDNA) samples generated from RNA (Brown and Botstein 1999). By setting up these probes in an array, relative expression of the library of genes being screened can be visualised when the labelled sample is washed across the array for hybridisation. Appropriate use and placement of probes within genes increases the specificity of this process and can provide sequence information about the expression of transcripts within the sample being analysed. Whilst microarrays improve with increasing knowledge of genomes, they still rely on a reference genomic sequence and as such are incapable of de novo sequence identification (Kapur et al. 2008; Shendure 2008).

Standard cDNA arrays allow absolute and differential expression to be measured, often only for the most common isoforms. Microarrays produce extensive gene lists showing differential regulation of those genes between the states being investigated. However, hybridisation of probe to cDNA strand is by no means flawless. As a result, microarray data need to be corrected for hybridisation background. For specialised study of isoforms in transcriptomics, specialised microarrays, such as exon and tiling arrays, have been developed. Exon arrays are rich in probes specific for exons of transcripts based on libraries of annotated, partially supported or estimated exons. Each probe set targets one exon and shows expression levels of each exon type, providing an accurate representation of gene expression and suggesting the likely composition of transcripts (Kapur et al. 2008).

Many computing technologies have been developed for analysis of microarray data. These include GeneBASE, a tool for estimating expression levels of genes (www.stanford.edu/group/wonglab/GeneBASE/), and microarray analysis of differential splicing, an improved program for transcript isoform construction from exon tiling arrays (Xing et al. 2008). These, and other programs, allow for more accurate processing of microarray data by removing hybridisation background, more accurately quantifying gene expression and assembling RNA isoforms. This in turn allows for deeper analysis of the transcriptome via microarray.

Despite the usefulness of microarrays in identifying deregulated genes and even transcripts, significant flaws in microarray technology are becoming apparent. Non-specific signals and cross-hybridisation are significant and can cause loss of reproducibility among microarray experiments (Homer et al. 2008). The reliance on hybridisation, analogue signatures and the availability of the reference genome sequence therefore make it difficult to detect rare, low-abundance or structurally similar isoforms (Chen et al. 1997; Homer et al. 2008). This makes high-resolution studies of the transcriptome problematic.
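The DAVID workflow described in this section takes a statistically selected gene list and reports which annotated pathways are over-represented in it. A minimal sketch of that idea follows, assuming a one-sided hypergeometric test against a user-supplied background of all measured genes. DAVID itself reports a modified Fisher exact p value (the EASE score), so its numbers would differ somewhat; the pathway and gene names here are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K in pathway, n selected)."""
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total


def over_representation(selected, pathways, background):
    """Rank pathways by one-sided hypergeometric p value.

    selected: genes passing the differential expression filter
    pathways: pathway name -> member genes
    background: every gene that was measured (the "population")
    """
    background = set(background)
    hits = set(selected) & background
    N, n = len(background), len(hits)
    results = []
    for name, members in pathways.items():
        members = set(members) & background   # only count measurable genes
        k = len(hits & members)                # selected genes in this pathway
        if k == 0:
            continue
        p = hypergeom_upper_tail(k, N, len(members), n)
        results.append((name, k, len(members), p))
    # Most over-represented (smallest p) first.
    return sorted(results, key=lambda row: row[3])


# Hypothetical background of 20 measured genes and two annotated pathways.
BACKGROUND = [f"G{i}" for i in range(1, 21)]
PATHWAYS = {
    "PATHWAY_X": ["G1", "G2", "G3", "G4", "G5"],
    "PATHWAY_Y": [f"G{i}" for i in range(6, 16)],
}
SELECTED = ["G1", "G2", "G3", "G4", "G6"]

ranking = over_representation(SELECTED, PATHWAYS, BACKGROUND)
for name, k, size, p in ranking:
    print(f"{name}: {k}/{size} selected genes, p = {p:.4g}")
```

Restricting pathway members to the measured background matters: testing against the whole genome instead of the genes actually assayed tends to inflate apparent enrichment. In practice the p values across many pathways would also be adjusted, for example with the Benjamini-Hochberg procedure.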

RNA-Seq

RNA-Seq is a recent technology that utilises NGS platforms to sequence short cDNA fragments of around 100 bp (Costa et al. 2010; Huang et al. 2011; Wang et al. 2009) and thus gain a snapshot of the human transcriptome at a particular point in time. This provides a view of not only what genes are being expressed in a particular tissue and to what degree but also the SNP genotype within transcribed parts of the genome. Moreover, alternatively spliced isoform type, abundance, diversity and regulation may be determined (Fang and Cui 2011; Lerch et al. 2012; Trapnell et al. 2009). It is also possible to detect RNA editing, coding and non-coding RNAs and potential sites of action for non-coding RNA, revealing the complex nature of transcriptome dynamics (Peng et al. 2012). RNA-Seq overcomes the shortcomings of microarrays, which, for many organisms, rely on reference genomes that are incomplete for the entire transcriptome and thus fail to identify any new or unknown transcript isoforms (Ameur et al. 2011; Martin and Wang 2011; Wang et al. 2009).

RNA-Seq offers base-pair resolution and strong coverage of the transcriptome. This allows for the determination of transcript structure even as the result of alternative splicing (AS) or mis-splicing. The ability to determine expression levels of individual exons is, perhaps, RNA-Seq’s most useful feature. Using this technology, it has been estimated that approximately 95 % of multi-exon genes undergo AS, giving potentially a hundred thousand splicing events occurring in major human tissues (Pan et al. 2008). However, the data quantity produced by RNA-Seq is overwhelming, since the transcriptomic analysis of one human individual can result in ~70 Gbp of sequence reads (Peng et al. 2012).

In designing RNA-Seq experiments, issues such as read coverage or sample size are key considerations. Most transcripts are expressed at relatively low levels, and the detection of their expression will require generation of at least 40 million reads per sample (Toung et al. 2011). Further, it is essential to include a sufficient number of biological and technical replicates to ensure reproducibility and to facilitate quantification of individual genes of interest using quantitative PCR (qPCR) (Fang and Cui 2011). For example, the current version of Cufflinks, a transcript assembler, requires sequence input from at least three biological replicates to provide sufficient statistical power for calculation of variability in gene expression between samples.

With the growing popularity of RNA-Seq in genome-wide association studies, many new computing and analysis methods have been developed to deal with the influx of data. For sequence alignment, one such program is TopHat, which, when extended to Cufflinks, is not only useful in aligning sequences but can also reconstruct isoform structure and align these to the genome (Trapnell et al. 2012). This allows for deeper analysis of differentially expressed genes, with not only the absolute expression of a gene seen but also the expression of its isoforms. This enhanced resolution increases RNA-Seq’s potential to elucidate mechanisms of poorly understood complex disease processes.

Pathway Analysis

When using pathway analysis to explore RNA-Seq or microarray data for differentially expressed genes, there are a number of options available. Over-representation analysis was the first type of approach developed to take advantage of microarray and Gene Ontology. This method selects genes from the total list of genes generated by RNA-Seq or microarray, often by a fold change cut-off (commonly a twofold change) with a p value of 0.05 (Emmert-Streib and Glazko 2011; Khatri et al. 2012). These genes are used as input in analysis packages such as DAVID or MetaCore, which align these genes to database knowledge (Huang et al. 2009). Pathways will be ordered according to over-representation, i.e. the number of differentially expressed genes relative to the number of genes defined by the pathway. Each pathway may then be explored for differentially expressed genes within the biological context; this represents a knowledge-based identification system of differentially expressed gene candidates for further study.

Flaws in the over-representation method include a loss of data, since selection criteria, such as the fold change and p value cut-off for reducing gene lists generated by RNA-Seq or microarray, are often arbitrary. They may ignore genes that fall just shy of these criteria or are not sufficiently affected by the state under investigation, such as disease, when small disturbances may lead to crucial loss of pathway homeostasis further downstream (Subramanian et al. 2005).

To compensate for the weaknesses of over-representation analysis, two other major methods were developed. Functional class scoring uses a multitude of statistics applied to gene-level changes recorded in experiments. These results are carried through to the pathway level to aid in assessing the significance of an affected pathway (Ackermann and Strimmer 2009). The disadvantages of this method lie in its treatment of pathways, as in over-representation analysis, as individual pathways when many have overlapping genes and functions. Further, ranking at the pathway level can lead to genes with a high degree of differential expression being considered equal to those with a lower degree of differential expression (Pavlidis et al. 2004).

The final method, pathway topology analysis, utilises further information from specialised databases (e.g. KEGG), such as metabolite production or protein sub-cellular location, to assess how much one pathway may be affected by differentially expressed genes (Kanehisa and Goto 2000). However, pathway topology changes from cell to cell, and as such, poor or no characterisation and annotation is available for some cell lines, which presents a major drawback in this method (Green and Karp 2006; Bauer-Mehren et al. 2009). Furthermore, the method does not fully compensate for the interdependencies between similar pathways.

Whilst a myriad of software packages exists for performing pathway analysis on large gene sets, including DAVID, MetaCore, GenGen, EASE and many others, MetaCore, a professional software package, and DAVID, an open source package, are the main focus here. Whilst MetaCore and DAVID both perform over-representation analyses, they utilise different databases. Where MetaCore relies on its own manually curated database, DAVID utilises the BioCarta and KEGG databases amongst many others (Huang et al. 2009; david.abcc.ncifcrf.gov/). This gives DAVID a unique flexibility, allowing the user to view gene lists in varying biological perspectives. However, MetaCore offers many integrated functions, including algorithms to enable pathway topology analysis, pathway construction with integrated information on protein–protein interactions, mass spectrometry data, nuclear magnetic resonance data and metabolomics databases, and strong drug design and analysis, amongst many other functions (www.genego.com).

Both programs have simple set-up and analysis wizards to provide guided usage and provide access to many reference databases such as Ensembl and the UCSC Genome Browser. MetaCore features a nicer design of pathways and integrates experimental values. This, combined with MetaCore’s simple storage and activation of multiple gene lists, allows the user to directly compare experiments with visual plotting of expression differences. DAVID features a gene ID converter to ensure the majority of genes from a list are incorporated into the analysis but can only accept 3,000 genes in a single job (Huang et al. 2009; david.abcc.ncifcrf.gov/). Each pathway analysis program offers its own unique features and failings. As such, due consideration should be given to each before a decision is made as to which program to favour, with researchers often choosing multiple programs to increase the reproducibility and robustness of their results.

Choosing the appropriate method of analysis is key to getting the best data out of the gene lists (Breitling et al. 2004). This, however, may be affected by data collection and database choices for analysis, depending on the previous knowledge of pathways in a cell line or phenotype of interest (Green and Karp 2006). It is also dependent on the sort of control implemented, whether this is a self-contained comparison of differential expression between two phenotypes or a competitive comparison of differential expression of genes against a control set in the same list.

Application of Pathway Analysis in Disease Studies

Diseases of the brain are often complex, with gradual progressive changes. Many neurodegenerative diseases such as AD have greater risk with age, and once the process is initiated, it advances with age. It is vital that the key mechanisms of complex diseases in the brain be understood, particularly at early stages, so that preventative measures might be taken to halt disease progression.

Alzheimer’s Disease

AD is the most prevalent form of dementia (~60 % of dementia) and is set to grow rapidly in the western world as populations age (Lobo et al. 2000; Fratiglioni et al. 1999). AD is a complex disease, and whilst it has been associated with a build-up of amyloid beta (Aβ) and Tau proteins, the pathogenesis of the disease is still largely unknown (Bertram et al. 2010). The use of transcriptomics and pathway analysis of gene expression within human brain tissue offers a powerful method of elucidating aberrant expression of genes that may be causing or compounding the disease.

Increasingly, it is being suggested that microRNAs (miRNAs) and transcriptional regulation play a key role in the development of spontaneous AD. As these miRNAs are theorised to control expression of genes and direct splicing and regulation of mRNA transcripts, they play a key role in the development of neurodegenerative diseases (Satoh 2012). It has been seen in many studies that miRNAs are increasingly mis-regulated. This includes the mis-regulation of miR-146a, which is controlled by NF-κB and is associated with the sustained inflammation seen in AD brains (Li et al. 2011). Further groups have shown that miRNAs targeting the BACE1-regulated formation of Aβ are significantly reduced in AD frontal lobes (Kellett and Hooper 2009).

Studies into alternative splicing of genes previously correlated with AD pathology have shown differential expression of transcript isoforms, a potential mediator of disease. RNA-Seq data of AD brains show differential expression of apolipoprotein E (APOE) isoforms, with APOE-001 and APOE-002 showing decreased expression in AD brains compared to normal brains and APOE-005 showing significant up-regulation in AD brains compared to normal brains (Twine et al. 2011). Many other cases of aberrant splicing have been linked to AD pathology, including cryptic and deletion splicing of presenilin 1 (PSEN1) leading to loss or cryptic inclusion of exon 4. Altered splicing has been suggested to affect amyloid precursor protein (APP) distribution in the AD brain, and mutations in other proteins such as progranulin (GRN; an anti-inflammatory protein) prevent appropriate splicing and may lead to a loss of neuro-protection (Golde et al. 1990; Cruts et al. 2006).

Meta-analysis of late-onset AD cases has shown a number of variants in the bridging integrator 1 (BIN1), complement component (3b/4b) receptor 1 (CR1) and clusterin (CLU) genes to be involved in AD pathogenesis (Lambert et al. 2009; Harold et al. 2009). However, the CLU association has not been replicated in some studies utilising GenGen (Genetic Genomics Analysis of Complex Data) in meta-analysis of Alzheimer’s expression data (Hu et al. 2011). The same authors also identified phosphatidylinositol-binding clathrin assembly protein (PICALM) variants, yet the presence of PICALM was seen as insignificant when corrected for APOE variant presence, which is recognised as the most significant genetic marker for AD. PICALM is involved in clathrin-mediated endocytosis of APP fragments before cleavage by γ-secretase to yield Aβ, and it has been suggested that dysfunctional PICALM may hinder Aβ clearance (Bertram et al. 2010). The discovery of the CR1 locus in this study shows a relationship to inflammation, which the authors suggest may compound environmental factors, such as diabetes and high blood pressure, that appear to affect the development of dementia.

ATP-binding cassette transporter A7 (ABCA7) has also been recently identified as a strong candidate gene for late-onset AD (Hollingworth et al. 2011). ABCA7 is a member of the ABCA subfamily, which is characterised by the ability to transport lipids across membranes (Kim et al. 2008). ABCA7 is expressed in the brain and has been shown to suppress the production of the neurotoxic amyloid-beta peptide, a key pathological process in AD (Chan et al. 2008). Two isoforms of ABCA7, arising from alternative splicing, have been identified in the human brain (Ikeda et al. 2003). The type I ABCA7 splice variant has been detected on the cell surface and in intracellular compartments, whereas type II appears to remain in the endoplasmic reticulum (Ikeda et al. 2003). The different variants of this protein possibly indicate different biological functions. Apart from the lipid transport function, two groups have identified a role for ABCA7 in phagocytosis (Iwamoto et al. 2006; Jehle et al. 2006). They have demonstrated that macrophage ABCA7 plays a key role in phagocytosis of apoptotic debris. Since ABCA7 is strongly expressed in human microglia (Kim et al. 2006), and microglia actively phagocytose apoptotic debris in the CNS (Stolzing and Grune 2004), it is plausible that microglial ABCA7 could be involved in phagocytosis of apoptotic debris arising from neurodegenerative disease.

Correlation of microarray data using Expression Analysis Systematic Explorer (EASE) has shown that incipient AD brains have a curiously high tumour suppressor transcription factor expression ratio (Blalock et al. 2004). These tumour suppressors, such as those from the TGF-β family and cyclin-dependent kinase 2-associated protein 1 (CDK2AP1), were shown to be involved in apoptosis and cell cycle amongst other functions, whilst down-regulated tumour suppressor genes were those related to growth and proliferation, such as v-myc myelocytomatosis viral oncogene homolog (MYC) and transcription factor Dp-1 (DP1). Whilst many of these tumour suppressors are involved in apoptotic pathways, many have alternative roles, such as neuron senescence and extension or re-entry into the cell cycle (Blalock et al. 2004). Such differential expression of tumour suppressor genes may have complex effects on cells within the AD brain, including cell cycle re-entry, increased myelination and cessation of protein biosynthesis, and not just apoptosis. This varies with cell type and position within the AD brain.

Analysis performed within DAVID on synaptically localised mRNA showed an increase in neuroplasticity genes such as glutamate receptor, ionotropic, AMPA 2 (GRIA2) and solute carrier family 1, member 2 (SLC1A2). This suggests activation of inhibitory pathways, as these proteins would hold synapses in a low conductance state. Further, an increase in 3′UTR control sequences that regulate translation in the synapse was observed in many of these up-regulated neuroplasticity genes (Williams et al. 2009).

A study on micro-vessels in the AD brain showed more than 2,000 genes that are differentially expressed compared to normal brains, with a large number of these genes mapping to inflammatory and immune responses, signal transduction and neural development in the GeneSpring GX program (Wang et al. 2012). Whilst these processes may not play pivotal roles in AD progression, they may compound the effects of the disease. In particular, over-expression of cytokines and their receptors such as interleukin 1 receptor, type II (IL1R2) may significantly affect cell activity and be related to the decrease in gene expression of signal transduction-related genes.

A major problem with studying the brain is the significant difference in architecture and functionality across regions. As such, studies need to decide whether to take whole brain homogenates, single cell samples or specifically grey or white matter samples. Analysis of dissected grey and white matter samples through DAVID allowed for the identification of AD processes, such as down-regulated stabilisation of ryanodine-sensitive Ca2+, that may have been masked in previous studies by white matter expression profiles (Blalock et al. 2011).

In order to effectively treat AD, early disease detection and neuroprotective treatments are essential. To accomplish this, diagnostic indicators need to be differentiated from processes that occur with ageing, as many of these processes overlap (Blalock et al. 2004). As such, it is useful to have animal and single-cell models to investigate incipient AD. Mice (5XFAD) have been used to show that early-stage AD shows particular patterns of gene expression. This included up-regulation of solute carrier family 18 member 3 (SLC18A3) and cholinergic receptor, muscarinic 1 (CHRM1) and the down-regulation of genes such as caveolin 3 (CAV3) and Purkinje cell protein 2 (PCP2), as revealed by the combined use of EASE, Ingenuity Pathway Analysis (IPA) and GeneSpring GX packages (Kim et al. 2012). Such studies may aid in finding markers for incipient AD that will allow early detection, so that neuro-protective treatments might be implemented.

Pathway analysis of AD brain tissue has indicated thousands of differentially expressed genes to be involved in the disease progression. Whilst the number of genes is daunting, these studies have significantly improved our knowledge of AD pathology and increased the potential for treatments to be developed. Whilst the complex nature of the disease may prevent implementation of a single therapeutic drug, a system of treatments could be developed with sufficient knowledge of pathway disruption in AD.

Autism Spectrum Disease

ASD is a heterogeneous collection of clinical conditions that are highly heritable, with recurrence in siblings tens of times higher than in the general population (Ziats and Rennert 2011; Voineagu et al. 2011). Attempts to unravel a genetic cause for the disease have largely been unsuccessful. As the disease is highly complex, its broad clinical presentation is likely affected by a series of genetic, epigenetic and environmental factors. However, recent implementation of RNA-Seq and pathway analysis has begun to elucidate previously unknown key changes in the autistic brain.

A study on induced pluripotent stem cells derived from a number of patients and control groups showed that a series of genes was dysregulated during neuronal differentiation. Further, it was seen that pseudogenes, lincRNAs and other regulatory RNAs were dysregulated during neurogenesis in ASD. These changes, revealed by DAVID, were seen to affect regulation of some gene candidates for ASD and included chromatin remodelers, cell adhesion molecules and regulatory RNA species for HOX boxes and other developmental gene sets (Lin et al. 2011). Changes in these genes likely contribute to the lack of definition observed between compartments in ASD brains.

It has been observed that the transcriptome of the ASD brain lacks definition when compared to the normal brain. This has been investigated in the temporal and frontal lobes of ASD and normal patients; it was shown that the frontal and temporal lobes of ASD brains lack the ultrastructural organisation observed in normal brains (Voineagu et al. 2011). Ataxin 2-binding protein 1 (A2BP1), a splicing factor specific to neural and muscle cells, has been highlighted as differentiating ASD brains. Through the use of RNA-Seq and analysis in DAVID, it has been shown that down-regulation of A2BP1 leads to an increase in differential splicing and mis-splicing of several genes in ASD brains when compared to control brains. Such targets included neuronal cell adhesion molecule (NRCAM) and glutamate receptor, ionotropic, N-methyl D-aspartate 1 (GRIN1), involved in synapse formation, amongst other genes (Voineagu et al. 2011).

Pathway analysis of autism-affected brains has revealed that ASD perturbations may affect central cytokine signalling pathways. This is seen in network hubs, from the IPA pathway tool, highly enriched for Tnf, NF-κB, TgfB1, Myc, Jnk and Mapk (Ziats and Rennert 2011). The over-representation of many of these hubs will have significant downstream effects in many pathways, and the effects of these need to be explored in the disease context. The patterning of these network hubs and cell type-specific expression led the authors to suggest that glial cells as well as neurons are also affected in ASD brains. Knowledge of the genetic background of ASD was previously limited, but with the implementation of high-throughput systems biology approaches, our understanding of the disease may rapidly progress.

Schizophrenia

The molecular aberrations leading to SCZ and bipolar disorder (BP) are largely unknown, though recent pathway analyses of data generated by high-throughput techniques such as proteomics, microarray and RNA-Seq aim to address this. Whilst SCZ and BP are clinically distinct disorders, they share a number of genetic risk factors (Lin et al. 2011). A wide range of differentially expressed genes have been observed in common in these two diseases through statistical analysis and use of the EASE, DAVID, GenMAPP and MAPPFinder programs. These include genes such as neurexin 3 (NRXN3), glutamate decarboxylase 1 (GAD1) and claudin 11 (CLDN11), amongst many others (Konradi et al. 2004; Hashimoto et al. 2008; Dracheva et al. 2006; Chu et al. 2009). These genes are primarily related to synapse formation and myelin synthesis, which play key roles in brain development and diseases of the brain.

SCZ data mining in the Stanley Medical Research Institute database and qPCR have identified an increase in apoptotic signals, particularly tumour necrosis factor superfamily member 6 (FAS) receptor and tumour necrosis factor (ligand) superfamily member 13 (TNFSF13) (Catts and Weickert 2012). However, it is noted that SCZ brains often have increased neural density with decreased size, yet a loss of dendritic field is observed. This suggests that these apoptotic factors in SCZ do not lead to cell death but likely affect dendrite regression, amongst other subtle effects on cellular components, leading to interneuronal signalling deficits (Catts and Weickert 2012). In contrast, increases in TNFSF13 and FAS receptor RNAs were not observed in BP patients.

Chan SL, Kim WS, Kwok JB, Hill AF, Cappai R, Rye KA, Garner B (2008) ATP-binding cassette transporter A7 regulates processing of amyloid precursor protein in vitro. J Neurochem 106:793–804
As expression pro- Chen G, Yin K, Shi L, Fang Y, Qi Y, Li P, Luo J, He B, Liu M, Shi T files are explored between these two diseases, a number (2011) Comparative analysis of human protein-coding and non- of discriminating diagnostic markers may be identified, coding RNAs between brain and 10 mixed cell lines by RNA-Seq. allowing for rapid determination of disease type without PLoS One 6:e28318 Chen Y, Dougherty ER, Bittner ML (1997) Ratio-based decisions and the need of psychoanalysis. the quantitative analysis of cDNA microarray images. J Biomed Opt 2:364–374 Chu TT, Liu Y, Kemether E (2009) Thalamic transcriptome screening Concluding Remarks in three psychiatric states. J Hum Genet 54:665–675 Costa V, Angelini C, De Feis I, Ciccodicola A (2010) Uncovering the complexity of transcriptomes with RNA-Seq. J Biomed Biotech- Pathway analysis of high-throughput techniques, such as nol 2010:853916 microarray and RNA-Seq, offers the chance to unravel mo- Courtney E, Kornfeld S, Janitz K, Janitz M (2010) Transcriptome lecular causes of complex diseases as discussed above. With profiling in neurodegenerative disease. J Neurosci Methods 193:189–202 increasing knowledge of complex diseases, the probability of Cruts M, Gijselinck I, van der Zee J, Engelborghs S, Wils H, Pirici D, early detection, prevention or treatment development for com- Rademakers R, Vandenberghe R, Dermaut B, Martin JJ, van plex diseases, such as AD, increases. Whilst many challenges Duijn C, Peeters K, Sciot R, Santens P, De Pooter T, Mattheijssens still oppose progress in the field, such as data handling and M, Van den Broeck M, Cuijt I, Vennekens K, De Deyn PP, Kumar-Singh S, Van Broeckhoven C (2006) Null mutations in storage, analysis of complex and lengthy gene lists and inte- progranulin cause ubiquitin-positive frontotemporal dementia gration of this knowledge into complex diseases, the improve- linked to chromosome 17q21. 
Nature 442:920–924 ment and application of these technologies will significantly Dracheva S, Davis KL, Chin B, Woo DA, Schmeidler J, Haroutunian increase our understanding of complex polygenic diseases and V (2006) Myelin-associated mRNA and protein expression defi- cits in the anterior cingulate cortex and hippocampus in elderly provide novel targets for further analysis. schizophrenia patients. Neurobiol Dis 21:531–540 Emmert-Streib F, Glazko GV (2011) Pathway analysis of expression Acknowledgments This work was supported by the National Health data: deciphering functional building blocks of complex diseases. and Medical Research Council of Australia (1022325 to WSK, 630434 PLoS Comput Biol 7:e1002053 to GMH). Fang Z, Cui X (2011) Design and validation issues in RNA-Seq experiments. Brief Bioinform 12:280–287 Faustino NA, Cooper TA (2003) Pre-mRNA splicing and human disease. Genes Dev 17:419–437 References Fratiglioni L, De Ronchi D, Aguero-Torres H (1999) Worldwide prev- alence and incidence of dementia. Drugs Aging 15:365–375 Golde TE, Estus S, Usiak M, Younkin LH, Younkin SG (1990) Ex- Ackermann M, Strimmer K (2009) A general modular framework for pression of beta amyloid protein precursor mRNAs: recognition gene set enrichment analysis. BMC Bioinforma 10:47 of a novel alternatively spliced form and quantitation in Alzheimer's Ameur A, Zaghlool A, Halvardson J, Wetterbom A, Gyllensten U, disease using PCR. Neuron 4:253–267 Cavelier L, Feuk L (2011) Total RNA sequencing reveals nascent Green ML, Karp PD (2006) The outcomes of pathway database com- transcription and widespread co-transcriptional splicing in the putations depend on pathway ontology. Nucleic Acids Res 34: human brain. 
Nat Struct Mol Biol 18:1435–1440 3687–3697 Bauer-Mehren A, Furlong LI, Sanz F (2009) Pathway databases and Harold D, Abraham R, Hollingworth P, Sims R, Gerrish A, Hamshere tools for their exploitation: benefits, current limitations and chal- ML, Pahwa JS, Moskvina V, Dowzell K, Williams A, Jones N, lenges. Mol Syst Biol 5:290 Thomas C, Stretton A, Morgan AR, Lovestone S, Powell J, Proitsi Bertram L, Lill CM, Tanzi RE (2010) The genetics of Alzheimer P, Lupton MK, Brayne C, Rubinsztein DC, Gill M, Lawlor B, disease: back to the future. Neuron 68:270–281 Lynch A, Morgan K, Brown KS, Passmore PA, Craig D, Blalock EM, Buechel HM, Popovic J, Geddes JW, Landfield PW McGuinness B, Todd S, Holmes C, Mann D, Smith AD, Love (2011) Microarray analyses of laser-captured hippocampus reveal S, Kehoe PG, Hardy J, Mead S, Fox N, Rossor M, Collinge J, distinct gray and white matter signatures associated with incipient Maier W, Jessen F, Schurmann B, van den Bussche H, Heuser I, Alzheimer’s disease. J Chem Neuroanat 42:118–126 Kornhuber J, Wiltfang J, Dichgans M, Frolich L, Hampel H, Hull Blalock EM, Geddes JW, Chen KC, Porter NM, Markesbery WR, M, Rujescu D, Goate AM, Kauwe JS, Cruchaga C, Nowotny P, Landfield PW (2004) Incipient Alzheimer's disease: microarray Morris JC, Mayo K, Sleegers K, Bettens K, Engelborghs S, De correlation analyses reveal major transcriptional and tumor sup- Deyn PP, Van Broeckhoven C, Livingston G, Bass NJ, Gurling H, pressor responses. Proc Natl Acad Sci USA 101:2173–2178 McQuillin A, Gwilliam R, Deloukas P, Al-Chalabi A, Shaw CE, Breitling R, Amtmann A, Herzyk P (2004) Iterative Group Analysis Tsolaki M, Singleton AB, Guerreiro R, Muhleisen TW, Nothen (iGA): a simple tool to enhance sensitivity and facilitate interpre- MM, Moebus S, Jockel KH, Klopp N, Wichmann HE, Carrasquillo tation of microarray experiments. 
BMC Bioinforma 5:34 MM, Pankratz VS, Younkin SG, Holmans PA, O'Donovan M, Brown PO, Botstein D (1999) Exploring the new world of the genome Owen MJ, Williams J (2009) Genome-wide association study iden- with DNA microarrays. Nat Genet 21:33–37 tifies variants at CLU and PICALM associated with Alzheimer's Catts VS, Weickert CS (2012) Gene expression analysis implicates a disease. Nat Genet 41:1088–1093 death receptor pathway in schizophrenia pathology. PLoS One 7: Hashimoto T, Arion D, Unger T, Maldonado-Aviles JG, Morris HM, e35511 Volk DW, Mirnics K, Lewis DA (2008) Alterations in GABA- J Mol Neurosci (2013) 51:28–36 35

related transcriptome in the dorsolateral prefrontal cortex of sub- Jehle AW, Gardai SJ, Li S, Linsel-Nitschke P, Marimoto K, Janssen jects with schizophrenia. Mol Psychiatry 13:147–161 WJ, Vandivier RW, Wang N, Greenberg S, Dale BM, Qin C, Hollingworth P, Harold D, Sims R, Gerrish A, Lambert JC, Carrasquillo Henson PM, Tall AR (2006) ATP-binding cassette transporter MM, Abraham R, Hamshere ML, Pahwa JS, Moskvina V, Dowzell A7 enhances phagocytosis of apoptotic cells and associated K, Jones N, Stretton A, Thomas C, Richards A, Ivanov D, ERK signaling in macrophages. J Cell Biol 174:547–556 Widdowson C, Chapman J, Lovestone S, Powell J, Proitsi P, Lupton Kanehisa M, Goto S (2000) KEGG: kyoto encyclopedia of genes and MK, Brayne C, Rubinsztein DC, Gill M, Lawlor B, Lynch A, Brown genomes. Nucleic Acids Res 28:27–30 KS, Passmore PA, Craig D, McGuinness B, Todd S, Holmes C, Kapur K, Jiang H, Xing Y, Wong WH (2008) Cross-hybridization Mann D, Smith AD, Beaumont H, Warden D, Wilcock G, Love S, modeling on Affymetrix exon arrays. Bioinformatics 24:2887– Kehoe PG, Hooper NM, Vardy ER, Hardy J, Mead S, Fox NC, 2893 Rossor M, Collinge J, Maier W, Jessen F, Rüther E, Schürmann B, Kellett KA, Hooper NM (2009) Prion protein and Alzheimer disease. Heun R, Kölsch H, van den Bussche H, Heuser I, Kornhuber J, Prion 3:190–194 Wiltfang J, Dichgans M, Frölich L, Hampel H, Gallacher J, Hüll M, Kim WS, Guillemin GJ, Glaros EN, Lim CK, Garner B (2006) Quan- Rujescu D, Giegling I, Goate AM, Kauwe JS, Cruchaga C, titation of ATP-binding cassette subfamily-A transporter gene Nowotny P, Morris JC, Mayo K, Sleegers K, Bettens K, expression in primary human brain cells. Neuroreport 17:891–896 Engelborghs S, De Deyn PP, Van Broeckhoven C, Livingston G, Kim WS, Weickert CS, Garner B (2008) Role of ATP-binding cassette Bass NJ, Gurling H, McQuillin A, Gwilliam R, Deloukas P, transporters in brain lipid transport and neurological disease. 
J Al-Chalabi A, Shaw CE, Tsolaki M, Singleton AB, Guerreiro R, Neurochem 104:1145–1166 Mühleisen TW, Nöthen MM, Moebus S, Jöckel KH, Klopp N, Khatri P, Sirota M, Butte AJ (2012) Ten years of pathway analysis: Wichmann HE, Pankratz VS, Sando SB, Aasly JO, Barcikowska current approaches and outstanding challenges. PLoS Comput M, Wszolek ZK, Dickson DW, Graff-Radford NR, Petersen RC, Biol 8:e1002375 Alzheimer's Disease Neuroimaging Initiative, van Duijn CM, Kim KH, Moon M, Yu SB, Mook-Jung I, Kim JI (2012) RNA-Seq Breteler MM, Ikram MA, DeStefano AL, Fitzpatrick AL, Lopez analysis of frontal cortex and cerebellum from 5XFAD mice at O, Launer LJ, Seshadri S, CHARGE consortium, Berr C, Campion early stage of disease pathology. J Alzheimers Dis 29:793–808 D, Epelbaum J, Dartigues JF, Tzourio C, Alpérovitch A, Lathrop M, Konradi C, Eaton M, MacDonald ML, Walsh J, Benes FM, Heckers S EADI1 consortium, Feulner TM, Friedrich P, Riehle C, Krawczak (2004) Molecular evidence for mitochondrial dysfunction in bi- M, Schreiber S, Mayhaus M, Nicolhaus S, Wagenpfeil S, Steinberg polar disorder. 
Arch Gen Psychiatry 61:300–308 S, Stefansson H, Stefansson K, Snaedal J, Björnsson S, Jonsson PV, Lambert JC, Heath S, Even G, Campion D, Sleegers K, Hiltunen M, Chouraki V,Genier-Boley B, Hiltunen M, Soininen H, Combarros Combarros O, Zelenika D, Bullido MJ, Tavernier B, Letenneur L, O, Zelenika D, Delepine M, Bullido MJ, Pasquier F, Mateo I, Bettens K, Berr C, Pasquier F, Fievet N, Barberger-Gateau P, Frank-Garcia A, Porcellini E, Hanon O, Coto E, Alvarez V, Engelborghs S, De Deyn P, Mateo I, Franck A, Helisalmi S, Bosco P, Siciliano G, Mancuso M, Panza F, Solfrizzi V, Nacmias B, Porcellini E, Hanon O, de Pancorbo MM, Lendon C, Dufouil C, Sorbi S, Bossù P, Piccardi P, Arosio B, Annoni G, Seripa D, Pilotto Jaillard C, Leveillard T, Alvarez V, Bosco P, Mancuso M, Panza F, A, Scarpini E, Galimberti D, Brice A, Hannequin D, Licastro F, Nacmias B, Bossu P, Piccardi P, Annoni G, Seripa D, Galimberti Jones L, Holmans PA, Jonsson T, Riemenschneider M, Morgan K, D, Hannequin D, Licastro F, Soininen H, Ritchie K, Blanche H, Younkin SG, Owen MJ, O'Donovan M, Amouyel P, Williams J Dartigues JF, Tzourio C, Gut I, Van Broeckhoven C, Alperovitch (2011) Common variants at ABCA7, MS4A6A/MS4A4E, EPHA1, A, Lathrop M, Amouyel P (2009) Genome-wide association study CD33 and CD2AP are associated with Alzheimer’s disease. Nat identifies variants at CLU and CR1 associated with Alzheimer's Genet 43:429–435 disease. Nat Genet 41:1094–1099 Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Lerch JK, Kuo F, Motti D, Morris R, Bixby JL, Lemmon VP (2012) Pearson JV, Stephan DA, Nelson SF, Craig DW (2008) Resolving Isoform diversity and regulation in peripheral and central neurons individuals contributing trace amounts of DNA to highly complex revealed through RNA-Seq. PLoS One 7:e30417 mixtures using high-density SNP genotyping microarrays. 
PLoS Li YY, Cui JG, Hill JM, Bhattacharjee S, Zhao Y, Lukiw WJ (2011) Genet 4:e1000167 Increased expression of miRNA-146a in Alzheimer's disease Hu X, Pickering E, Liu YC, Hall S, Fournier H, Katz E, Dechairo B, transgenic mouse models. Neurosci Lett 487:94–98 John S, Van Eerdewegh P, Soares H (2011) Meta-analysis for Lin M, Pedrosa E, Shah A, Hrabovsky A, Maqbool S, Zheng D, genome-wide association study identifies multiple variants at the Lachman HM (2011) RNA-Seq of human neurons derived from BIN1 locus associated with late-onset Alzheimer's disease. PLoS iPS cells reveals candidate long non-coding RNAs involved in One 6:e16616 neurogenesis and neuropsychiatric disorders. PLoS One 6:e23356 Huang DW, Sherman BT, Lempicki RA (2009) Systematic and inte- Lobo A, Launer LJ, Fratiglioni L, Andersen K, Di Carlo A, Breteler grative analysis of large gene lists using DAVID bioinformatics MM, Copeland JR, Dartigues JF, Jagger C, Martinez-Lage J, resources. Nat Protoc 4:44–57 Soininen H, Hofman A (2000) Prevalence of dementia and major Huang R, Jaritz M, Guenzl P, Vlatkovic I, Sommer A, Tamir IM, subtypes in Europe: a collaborative study of population-based Marks H, Klampfl T, Kralovics R, Stunnenberg HG, Barlow DP, cohorts. Neurologic Diseases in the Elderly Research Group. Pauler FM (2011) An RNA-Seq strategy to detect the complete Neurology 54:S4–S9 coding and non-coding transcriptome including full-length Lu T, Pan Y, Kao SY, Li C, Kohane I, Chan J, Yankner BA (2004) imprinted macro ncRNAs. PLoS One 6:e27288 Gene regulation and DNA damage in the ageing human brain. Ikeda Y, Abe-Dohmae S, Munehira Y, Aoki R, Kawamoto S, Furuya Nature 429:883–891 A, Shitara K, Amachi T, Kioka N, Matsuo M, Yokohama S, Ueda Martin JA, Wang Z (2011) Next-generation transcriptome assembly. K (2003) Posttranscriptional regulation of human ABCA7 and its Nat Rev Genet 12:671–682 function for the apoA-I-dependent lipid release. 
Biochem Biophys Mills JD, Janitz M (2012) Alternative splicing of mRNA in the mo- Res Commun 311:313–318 lecular pathology of neurodegenerative diseases. Neurobiol Aging Iwamoto N, Abe-Dohmae S, Sato R, Yokoyama S (2006) ABCA7 33:1012.e11–1012.e24 expression is regulated by cellular cholesterol through the Pan Q, Shai O, Lee LJ, Frey BJ, Blencowe BJ (2008) Deep surveying SREBP2 pathway and associated with phagocytosis. J Lipid Res of alternative splicing complexity in the human transcriptome by 47:1915–1927 high-throughput sequencing. Nat Genet 40:1413–1415 36 J Mol Neurosci (2013) 51:28–36

Pavlidis P, Qin J, Arango V, Mann JJ, Sibille E (2004) Using the gene ontology for microarray data mining: a comparison of methods and application to age effects in human prefrontal cortex. Neurochem Res 29:1213–1222
Peng Z, Cheng Y, Tan BC, Kang L, Tian Z, Zhu Y, Zhang W, Liang Y, Hu X, Tan X, Guo J, Dong Z, Liang Y, Bao L, Wang J (2012) Comprehensive analysis of RNA-Seq data reveals extensive RNA editing in a human transcriptome. Nat Biotechnol 30:253–260
Satoh J (2012) Molecular network of microRNA targets in Alzheimer's disease brains. Exp Neurol 235:436–446
Shendure J (2008) The beginning of the end for microarrays? Nat Methods 5:585–587
Stolzing A, Grune T (2004) Neuronal apoptotic bodies: phagocytosis and degradation by primary microglial cells. FASEB J 18:743–745
Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 102:15545–15550
Sutherland GT, Janitz M, Kril JJ (2011) Understanding the pathogenesis of Alzheimer's disease: will RNA-Seq realize the promise of transcriptomics? J Neurochem 116:937–946
Tollervey JR, Wang Z, Hortobagyi T, Witten JT, Zarnack K, Kayikci M, Clark TA, Schweitzer AC, Rot G, Curk T, Zupan B, Rogelj B, Shaw CE, Ule J (2011) Analysis of alternative splicing associated with aging and neurodegeneration in the human brain. Genome Res 21:1572–1582
Toung JM, Morley M, Li M, Cheung VG (2011) RNA-sequence analysis of human B-cells. Genome Res 21:991–998
Trapnell C, Pachter L, Salzberg SL (2009) TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25:1105–1111
Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L (2012) Differential gene and transcript expression analysis of RNA-Seq experiments with TopHat and Cufflinks. Nat Protoc 7:562–578
Twine NA, Janitz K, Wilkins MR, Janitz M (2011) Whole transcriptome sequencing reveals gene expression and splicing differences in brain regions affected by Alzheimer's disease. PLoS One 6:e16266
Voineagu I, Wang X, Johnston P, Lowe JK, Tian Y, Horvath S, Mill J, Cantor RM, Blencowe BJ, Geschwind DH (2011) Transcriptomic analysis of autistic brain reveals convergent molecular pathology. Nature 474:380–384
Wang S, Qaisar U, Yin X, Grammas P (2012) Gene expression profiling in Alzheimer's disease brain microvessels. J Alzheimers Dis 31:193–205
Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10:57–63
Williams C, Mehrian Shai R, Wu Y, Hsu YH, Sitzer T, Spann B, McCleary C, Mo Y, Miller CA (2009) Transcriptome analysis of synaptoneurosomes identifies neuroplasticity genes overexpressed in incipient Alzheimer's disease. PLoS One 4:e4936
Xing Y, Stoilov P, Kapur K, Han A, Jiang H, Shen S, Black DL, Wong WH (2008) MADS: a new and improved method for analysis of differential alternative splicing by exon-tiling microarrays. RNA 14:1470–1479
Ziats MN, Rennert OM (2011) Expression profiling of autism candidate genes during human brain development implicates central immune signaling pathways. PLoS One 6:e24691

A.3 Conservation and tissue-specific transcription patterns of long noncoding RNAs

Ward, M., McEwan, C., Mills, J. D., and Janitz, M. (2015). "Conservation and tissue-specific transcription patterns of long noncoding RNAs." Journal of Human Transcriptome. Published online.

152 http://informahealthcare.com/jht ISSN: 2332-4015 (electronic)

J Hum Transcriptome, Early Online: 1–8 DOI: 10.3109/23324015.2015.1077591

REVIEW Conservation and tissue-specific transcription patterns of long noncoding RNAs

Melanie Ward*, Callum McEwan*, James D Mills and Michael Janitz

School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia

Abstract

Over the past decade, the focus of molecular biology has shifted from being predominantly DNA- and protein-centric to having a greater appreciation of RNA. It is now accepted that the genome is pervasively transcribed in a tissue- and cell-specific manner, producing not only protein-coding RNAs but also an array of noncoding RNAs (ncRNAs). Many of these ncRNAs have been found to interact with DNA, protein and other RNA molecules, where they exert regulatory functions. Long ncRNAs (lncRNAs) are a subclass of ncRNAs that are particularly interesting due to their cell-specific and species-specific expression patterns and unique conservation patterns. Currently, individual lncRNAs have been classified functionally; however, for the vast majority the functional relevance is unknown. To better categorize lncRNAs, an understanding of their specific expression patterns and evolutionary constraints is needed.

Key Words: lncRNAs, comparative genomics, gene regulation, transcriptome, RNA-Seq

History: Received 14 May 2015; Accepted 15 July 2015

Introduction

Recent developments in RNA sequencing (RNA-Seq) technology have given scientists an in-depth view of the human transcriptome [1]. It is apparent that traditional views of RNA as merely an intermediary molecule between DNA and protein understate the complexity of the human genome and ignore the pivotal role of noncoding RNA (ncRNA) as a regulatory molecule in essential life processes [2]. Despite merely a twofold increase in the number of protein-coding genes between the human genome and that of the common fruit fly, Drosophila melanogaster, these species exhibit dramatically differing levels of phenotypic complexity. To account for this disparity, there must exist a multi-level regulatory mechanism enabling such drastic diversity from a similar number of protein-coding genes.

There is a direct correlation between the proportion of ncRNAs in an organism's genome and its developmental complexity [3]. The largest subclass of ncRNAs is long noncoding RNAs (lncRNAs). These are mRNA-like transcripts arbitrarily defined as being greater than 200 nucleotides long, with no protein-coding capacity, which nevertheless undergo alternative splicing and post-transcriptional processing [4]. Initially dismissed as 'junk DNA', where any transcription was interpreted as an artifact of transcriptional noise, it has recently been shown that far more of the genome is pervasively transcribed than first hypothesized [5]. While they do not code for a protein, lncRNAs have been strongly associated with the regulation of epigenetic processes and the expression of protein-coding genes. lncRNAs can be classified as intergenic/intervening, antisense, intronic, overlapping and bidirectional, according to their localization relative to protein-coding genomic loci (Figure 1) [6]. There is now a growing wealth of data to suggest that lncRNAs possess biological function [7-9].

The dysregulation of lncRNA expression has been implicated in a number of diseases across different tissue types. Merely 7% of disease-associated single nucleotide polymorphisms (SNPs) are located within protein-coding regions, compared to the 93% of SNPs that are found in noncoding regions [10]. Despite this asymmetry in SNP distribution, determining the role of lncRNAs in disease pathogenesis remains difficult due to a lack of functional information, which prevents the domain and functional prediction that is possible with protein-coding genes.

lncRNAs have been shown to be expressed in a distinct pattern across a number of tissue types. A number of lncRNAs have also been shown to be expressed in discrete cell types and within distinct subcellular structures [11,12]. These findings coincide with notions of lncRNAs as regulators of gene expression in specific cell types. Thus, the

Correspondence: Michael Janitz, PhD MD, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia. Tel: +61 2 938 58608. Fax: +61 2 938 51483.
*These authors contributed equally to this work.

© 2015 The Author(s). Published by Taylor & Francis. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Figure 1. Genomic localization-based classes of lncRNAs. Upper panel: Intronic lncRNA: the lncRNA is transcribed from an intronic region of a protein-coding gene. Antisense lncRNA: the lncRNA is transcribed from the strand opposite to a protein-coding gene, with partial or complete overlap of any intronic or exonic regions. Intergenic/intervening lncRNA: the lncRNA is transcribed from a region located between other genes, with no overlap with any protein-coding gene. Overlapping lncRNA: the intron of the lncRNA encompasses a protein-coding gene. Bidirectional lncRNA: the lncRNA shares its transcription start site with a protein-coding gene on the opposite strand. Arrows indicate orientation of transcription. Lower panel: lncRNAs can be alternatively spliced to produce numerous splice variants. Here, the intervening lncRNA is spliced to produce two variants, each of which produces an RNA with a unique secondary structure. This unique secondary structure can determine the function of the lncRNA isoform.

identification and characterization of human lncRNAs with tissue-specific expression become essential in order to determine their relevant functions.

Another interesting property of lncRNAs is their rapid evolution across species. Previously, the conservation of sequence was thought to be evidence of functionality, but lncRNAs have proved that this is not always the case. The tissue-specific expression patterns of lncRNAs, coupled with their distinctive conservation patterns, make lncRNAs a unique transcriptional element that warrants further investigation.

Tissue-specific expression of lncRNAs

LncRNAs exhibit a notably higher degree of tissue specificity than protein-coding genes [13,14]. The apparent specificity of lncRNAs throughout various tissue and cell types has been repeatedly highlighted and is indicative of specific regulatory roles within essential cellular processes [11,15,16]. Indeed, if lncRNAs were merely the result of transcriptional noise, we would expect little variation in expression levels between tissues [2]. A comparative study investigating tissue specificity of lncRNAs across 11 tissue types found that the majority of lncRNA expression was restricted to discrete tissue types, with 67% of lncRNAs demonstrating a tissue-specific expression pattern and 29% found to be expressed in only one discrete tissue type [17]. This widespread tissue specificity of lncRNA expression is suggestive of specific biological functions within the individual tissues. Despite this, little work has been done to date in characterizing the expression profiles of tissue-specific lncRNAs beyond possible roles in disease pathogenesis, and in particular cancer [18].

Brain

The brain is the most complex tissue in the human body. Beyond its billions of neurons, the brain comprises a number of other cell types, such as oligodendrocytes, astrocytes and microglia, with heterogeneous distribution across anatomical subregions. Due to this complexity in terms of both structure and function, the brain requires a similarly complex regulatory system and as a result is the richest source of lncRNAs in the body [2]. LncRNAs play an essential role in the brain in terms of development, neuronal maintenance and function, and have been linked to a number of neurodegenerative diseases [19]. When addressing the brain physiology of humans, the lncRNA repertoire is the greatest point of differentiation from other primates and other vertebrate species entirely due

to the increased developmental complexity of the human brain [20]. Despite a high level of sequence similarity of protein-coding genes between humans and other primates, we see far less conformity in the noncoding portion of the genome that is transcribed. Indeed, the number of lncRNAs, particularly brain-specific ones, has been shown to increase in direct correlation with developmental complexity, even as the number of protein-coding genes remains relatively unchanged [2].

There is a growing amount of data on the highly tissue-specific lncRNA transcripts located between protein-coding gene loci, known as long intervening ncRNAs (lincRNAs), and their role in the regulation of fundamental cellular processes [14]. LincRNAs are generally expressed at lower levels than protein-coding genes (~10-fold lower) [14]; however, the brain transcriptome contains many unique lincRNA transcripts that are expressed at significantly higher levels than many protein-coding genes, such as the oligodendrocyte maturation-associated lincRNA (OLMALINC) [15]. OLMALINC is a primate-specific transcript that has been shown to play an essential role in the regulation of genes responsible for human oligodendrocyte maturation [15]. OLMALINC is highly expressed in the white matter of the human frontal cortex, with an expression level of 71.5 fragments per kilobase of exon per million fragments mapped (fpkm) as determined by RNA-Seq. Such a high level of expression indicates a strong regulatory role in oligodendrocytes, which comprise the majority of white matter. The differential expression of OLMALINC in gray and white matter (16.2 and 71.5 fpkm, respectively) demonstrates the dynamic nature of the brain transcriptome and the tissue specificity of lincRNAs.

A recent profiling of the transcriptome patterns of gray and white matter highlighted the tissue-specific nature of lincRNAs in a healthy brain [21]. The expression of lincRNAs differs significantly between gray and white matter, and this is believed to be largely due to the differences in cell populations between the tissues. Thus each tissue type has a divergent transcriptome profile indicative of discrete roles in brain function, which provides evidence that lincRNAs function in a cell type-specific manner [21].

There is a growing need for the development of more comprehensive expression profiles of lncRNAs for all regions of the human brain [2]. Recent transcriptome analyses of the hippocampus and pre-frontal cortex of the adult mouse brain found highly specific lncRNA expression signatures within subregions of the brain and distinct neuronal populations [22]. A total of 2759 lncRNAs were found to be expressed in the hippocampus and 2561 in the pre-frontal cortex; of these, 2390 lncRNAs were expressed in both regions, while 24 were differentially regulated. The expression levels of the six most highly differentially expressed lncRNAs were then analyzed in the cerebellum and striatum and compared to those of the hippocampus and the pre-frontal cortex. The majority of these lncRNAs were found to be differentially regulated across all of the brain subregions. A further study using the Allen Brain Atlas showed lncRNAs to be expressed not only in specific subregions of the mouse brain but also in specific cell types and subcellular compartments [23]. The specific localization of lncRNAs supports the premise of their functionality.

The purported functional relationship between lncRNAs, as cis-regulatory elements, and adjacent protein-coding genes has also been observed in the human brain [9,24]. It has been found that many of these adjacent protein-coding genes have neurodevelopmental functions and that the expression levels of adjacent lncRNAs consistently impact the transcription of the protein-coding genes. Despite this, no consistent pattern has emerged linking the transcription of lncRNAs and adjacent protein-coding genes [25]. These lncRNAs have also been indicated to have an integral role in the regulation of cellular differentiation of neuronal and glial cells, particularly during development [25].

Despite an incomplete annotation of the long noncoding transcriptome and a general lack of functional information, the dysregulation of tissue-specific lncRNAs has been strongly associated with a number of diseases [19]. The differential expression of lncRNAs in healthy and diseased states is shown through comparisons of transcriptome profiles, which differ significantly in neurodegenerative diseases such as multiple system atrophy (MSA) [26] and Parkinson's disease [27]. Despite a consistent association between the dysregulation of brain-specific lncRNAs and neurological disorders [19,28], further research is required to individually categorize and ascertain the functions and molecular mechanisms of action of the dysregulated lncRNAs in order to determine their role in disease progression.

Testis

The testis is a rich source of many unique lncRNA transcripts; however, very little is known about lncRNAs expressed solely in this organ. In-depth analyses of the testis transcriptome using RNA-Seq data have shown a widespread and diverse transcription of both protein-coding RNAs and ncRNAs [29]. The testis has two key functions: the secretion of sex hormones and spermatogenesis. The production of spermatozoa is a complex biological process involving multiple stages controlled by epigenetic and molecular mechanisms at both transcriptional and post-transcriptional levels [30]. The need for such regulation has been suggested as a reason for the diversity of the testis transcriptome, with specific lncRNAs predicted to play key regulatory roles [14].

A comparative study investigating the five most common cell types involved in spermatogenesis found that, in addition to expressing a greater palette of lncRNA transcripts than cells of the brain or liver, the cells of the testis differed significantly in their expression of unique lncRNAs, producing highly specific expression patterns [29]. This was particularly pronounced in spermatids and spermatocytes, which exhibited the highest levels of lncRNA transcription [29].

Currently there are limited studies of human testis-specific lncRNA expression, and as a result we must rely on animal models. A recent study produced lncRNA expression profiles for the testis of a neo-natal and an adult mouse [31]. This study identified over 3000 differentially expressed lncRNAs between the neo-natal and adult mice [31]. These dramatic differences in lncRNA expression could indicate a

significant biological role for lncRNA during the testis post- development and differentiation. Six3 opposite strand tran- natal development. Furthermore, lncRNAs were found to script (Six3OS) is promoter-associated lncRNA found to exhibit a greater spatial and temporal specificity than protein- play a role in the regulation of retinal cell differentiation coding genes consistent with previous studies and supportive through knockdown and overexpression studies [36]. Six3OS of a cell type-specific regulatory role. was also shown to modulate the expression of associated protein-coding genes through the recruitment of histone Liver modification enzymes. Six3OS acts as a molecular scaffold The role and function of lncRNAs in the liver is largely that leads to the recruitment of histone modification enzymes. unknown but the dysregulation of specific transcripts has A retina-specific lncRNA ventral anterior homeobox 2, been associated with liver diseases such as hepatocellular car- opposite strand (Vax2os) was also shown to regulate the cell cinoma [32] and nonalcoholic steatohepatitis [33]. Liver- cycle during mammalian retina development [37]. specific lncRNAs have also been implicated in the regulation Overexpression of this transcript during the early stages of of processes such as lipid metabolism. Liver-specific triglyc- development was associated with a reduced rate of retinal eride regulator (lncLSTR) was found to regulate the clearance cell proliferation. Vax2os is so far the only example of a cell of triglyceride and help maintain systemic lipid homeostasis type-specific lncRNA regulating the cell cycle during through a novel lncRNA signaling pathway. Its apparently mammalian development. key role in this crucial metabolic process highlights the potential physiological importance of lncRNAs in the liver. Rapid evolution of lncRNAs LncRNAs show very little conservation in their sequence and Heart they evolve rapidly [38-40]. 
The predicted amount of shared Little is known about the role of lncRNAs in the heart; functional sequence decreases dramatically as the divergence however, a heart-specific lncRNAs has been found to be between mammalian species increases, suggesting a very involved in cardiac development. FOXF1 adjacent noncoding high rate of sequence turnover [41]. The rate of nucleotide developmental regulatory RNA (Fendrr) is a lateral substitution in protein-coding sequences is ~ 10%, whereas mesoderm-specific lncRNA that is essential for the develop- noncoding sequences have a substitution rate of 90%. ment of the heart wall in mouse and was shown to have an The rapid evolution of lncRNAs originally led to the orthologous transcript in humans [34]. Fendrr was found to assumption that they were nonfunctional. Nonfunctional modulate chromatin signatures that define gene activity by sequences tend to display a similar rate of sequence change binding directly to the histone-modifying complexes Poly- when compared to evolutionarily neutral sequences [42]. comb repressive complex 2 (PRC2) and histone–lysine N- However, lncRNAs have demonstrated more constraint than

For personal use only. methyltransferase 2A (KMT2A), which play a central role in random intergenic regions [43]. Ancient lncRNAs (minimum the activation of genes responsible for cell differentiation and of 90 Myr) show higher levels of long-term exonic sequence lineage commitment. PRC2 and KMT2A act as a repressor conservation than untranslated regions, with the oldest pre- and activator of cellular proliferation, respectively, in the heart senting similar levels of constraint with protein-coding exons. during embryonic development. The knockdown of Fendrr in In comparison, young lincRNAs (under 25 Myr) show lower mice was shown to be lethal to the embryos due to heart wall levels of exonic sequence conservation than random inter- deficits and significantly impaired heart function demonstrat- genic regions [39]. This may be due to the fact that young ing its importance for normal heart function. genes demonstrate rapid evolution [44]. Young genes are more susceptible to variable selection pressures than well- established genes [45]. Interestingly, lncRNAs with multiple Skeletal muscle exons appear to demonstrate greater evolutionary constraints Long intergenic ncRNA, muscle differentiation 1 (Linc-MD1) within exons [46]. J Hum Transcriptome Downloaded from informahealthcare.com by University New South Wales on 08/11/15 has been identified to have a significant role in myogenesis through its control of muscle differentiation [35]. Linc-MD1 Conservation beyond the primary sequence expression is temporally dynamic in order to control the pro- gression through the stages of muscle differentiation where it The sequence of RNAs can differ whilst their secondary functions as a competing endogenous RNA for the binding structure can be conserved [47,48]. Many lncRNAs showed a of the microRNAs (miRNAs) miR-133 and miR-135. 
The number of correlated positions that could be the result of con- two miRNAs regulate the binding of the transcription factors served secondary structures (Derrien et al. 2012). One of the that promote muscle differentiation. Hence, Linc-MD1 plays well-characterized lncRNAs, HOX transcript antisense RNA a crucial role in the regulation of muscle terminal differentia- (HOTAIR), is believed to have conserved structures but diver- tion through its action as part of a network of regulatory gent sequences across species [44]. RNAs can form a variety interactions. of structures such as tetraloops [49], GU base pair motifs [50], adenosine platforms, helixes and tandem repeats [51]. These motifs have demonstrated sequence conservation, for Retina example the hairpin loop and the tRNA-like structure in the Several retina-specific lncRNAs in mice have been identified lncRNA metastasis-associated lung adenocarcinoma transcript and determined to be of functional importance in retinal cell 1(Malat1) [52]. The majority of the helixes appear to be DOI: 10.3109/23324015.2015.1077591 Conservation and tissue-specific transcription patterns of long noncoding RNAs 5

conserved across a variety of species, in comparison to the been evolutionarily selected for increased stability [65]. It is base paired regions, which are not so well conserved [53]. believed that A/T to C/G substitutions led to a more stable This theory is supported by the fact that many lncRNAs secondary structure in HAR1 [62]. Forkhead box protein with differing sequences are able to bind to the same P2 (FOXP2) and abnormal spindle-like microcephaly protein [54,55]. associated protein (ASPM), which are involved in speech pro- The functional role of the lncRNA may also be conserved. duction and brain size respectively, have undergone the same One established lncRNA is X-inactive-specific transcript kind of evolutionary change [66,67]. (Xist), which is involved in X-chromosome inactivation. The function of Xist is conserved across mammals, even though Methods of detecting lncRNAs the sequence is evolving at a high rate [56]. In addition mouse and zebra fish lncRNAs, involved in embryonic devel- RNA-Seq is a high-throughput next-generation sequencing opment, did not have conserved sequences, whereas the func- technique that is capable of measuring RNA expression lev- tion appears to be conserved [57]. If the functional roles of els and providing an accurate picture of the transcriptome lncRNAs are conserved across species, then it is most likely [68]. RNA-Seq has numerous advantages over other that their loci will also be conserved [38]. Indeed, studies transcriptome profiling techniques such as microarrays. have found that lncRNAs have conserved synteny across a RNA-Seq has a higher resolution, lower levels of background range of species [39,58]. noise, lower requirement of input RNA and can detect a greater range of expression levels [69]. 
The most important aspect of RNA-Seq is that it can be used to assemble LncRNA evolution in primates transcriptomes de novo; this allows for the discovery of King and Wilson first proposed that the major biological dif- un-annotated transcripts and novel splicing events [69]. This ferences between humans and chimpanzees are due to gene ability makes RNA-Seq an ideal tool for the identification regulation, not differences in sequence [59]. There are proba- species- and tissue-specific lncRNAs, many of which have bly too few changes in the amino acid sequence of proteins not been previously annotated. to result in the phenotypical differences between humans and More recently, slight modifications of the template prepa- chimpanzees [20]. In fact, a larger number of protein-coding ration stage of RNA-Seq have allowed for the strand of ori- genes are conserved for primates when compared to gin from which an RNA molecule is transcribed from to be lncRNAs; 92% of human intergenic lncRNAs are expressed tracked, thus allowing for the identification of antisense tran- in chimpanzee or bonobo and ~ 72% are expressed in the scription. These techniques are known as strand-specific macaque. In comparison > 98% of protein-coding genes is RNA-Seq [70,71]. While a multitude of different strand- conserved for all primates [39]. specific RNA-Seq exist, currently the most widely used is the

For personal use only. It is believed that human brain evolution has occurred dUTP second-strand marking method [70]. Strand-specific through changes in noncoding parts of the genome [60]. RNA-Seq techniques allow for the identification of antisense The human brain is in fact a rich source of lincRNAs, transcripts and this feature is particularly relevant to further supporting this theory [21,26]. The majority of lncRNAs. Examples of antisense lncRNAs include TSIX gene expression differences between the brains of humans transcript, XIST antisense RNA (TSIX) [72] and the beta-site and nonhuman primates involved upregulation of gene APP-cleaving enzyme 1 antisense RNA (BACE1-AS) [73]. It expression in humans [61]. While this may be due to is estimated that between 20–30% of human transcripts have higher levels of neuronal activity, it has been found that an antisense partner [74,75]. Further, the amount of antisense genes critical for neural development are upregulated transcription will vary from cell type to cell type [76]. across mammals [62]. Another important technical advance concerning RNA-Seq is Brain growth patterns vary across primate species [63] and the use of ribosomal depletion to select the RNA fraction for humans show a unique pattern of expression [61]. The sequencing rather than selecting only those transcripts that

J Hum Transcriptome Downloaded from informahealthcare.com by University New South Wales on 08/11/15 expression pattern of genes in the chimpanzee brain cortex is are polyadenylated. Ribosomal depletion removes ribosomal more similar to gene expression patterns in macaques than RNA from the samples, allowing for the selection of both humans [64]. This indicates an increase in the rate of evolu- polyadenylated positive (poly(A)+) and polyadenylated tion in gene regulation in the human lineage [64]. The negative (poly(A) )fractionsforsequencing[77].Thisis expression of human-specific genes was greater in the frontal important as largeÀ amounts of transcriptional output in lobe in comparison to the hippocampus and caudate. This eukaryotic cells is poly(A) [78]. As ribosomal depleted suggests that the majority of evolutionary change in the strand-specific RNA-Seq becomesÀ the standard for all human brain was focused in the frontal lobe [20]. Genes in transcriptome-profiling experiments, it is expected that more the frontal lobe that are associated with neuron projections, lncRNAs will be found throughout different tissue types in neurotransmitter transport, synapses, axons and dendrites, as the human body. well as genes implicated in schizophrenia showed increased Raw RNA-Seq data needs to be processed and analyzed to connectivity in the human brain when compared to chimpan- answer all sorts of bioinformatics enquires, including investi- zees and macaques [20]. gation of gene/transcript expression levels, detection of alter- One example of a noncoding gene that is thought to have native splicing events and identification of unannotated evolved a unique function in humans is the human acceler- genes/transcripts. In brief to analyze RNA-Seq data, first the ated region 1 (HAR1). It has been suggested that HAR1 has reads must be mapped to the reference genome, next 6 M. Ward et al. J Hum Transcriptome, 2015; Early Online:1–8

transcripts are assembled followed by a differential expres- The lncRNAdb (http://www.lncrnadb.org/) provides a sion analysis. A common workflow currently used by summary of known eukaryotic lncRNAs. lncRNAdb differs researchers is known as the Tuxedo suite, which utilizes the from many other databases as entries must be supported by software packages Tophat, Cufflinks and Cuffdiff [79]. This literature and they do not pull their data from unconfirmed workflow is ideal for the identification of lncRNAs as it has sources to ensure validity [82]. Thus, lncRNAdb serves as a the ability to identify novel splicing events and un-annotated reliable resource for exploration of eukaryotic lncRNAs; transcripts down to the resolution of a single base. It also however it represents only a small fraction of currently anno- takes advantage of data generated by ribosomal strand- tated lncRNAs. The database currently contains 287 eukary- specific RNA-Seq to locate antisense transcripts. otic lncRNAs that have been manually curated and described A typical RNA-Seq experiment will produce vast amounts independent of scientific literature [83]. It provides informa- of data. Generally it is not feasible to analyze data on a per- tion on lncRNA function, sequences, expression data and sonal computer due to limitations in storage size and raw relevant supportive literature. Of these, 100 lncRNAs have processing power. These problems can be overcome through had function determined through direct in vitro and/or in vivo the use of high-performance computing (HPC) clusters. experiments. A HPC cluster consists of multiple nodes, with each node GermlncRNA (http://germlncrna.cbiit.cuhk.edu.hk/) is a containing one or more central processing units (CPUs), each web-based lncRNA catalog containing annotations of male with numerous cores. HPC clusters are normally a resource germ-cell specific lncRNAs [84]. 
This catalog currently con- shared across a major institute such as a university or hospi- tains 110476 annotated lncRNAs and 2790 novel lncRNAs, tal. Another alternative is to take advantage of cloud comput- the latter classified as novel as they were unannotated in any ing services such as Amazon Web Services (AWS) (http:// of the public genomic databases. The database was created aws.amazon.com/). AWS allows researchers to dynamically through the integration of male germ transcriptome profiles adjust the computing power and storage requirements based from microarray, RNA-Seq and GermSAGE studies [84]. on current requirements and has potential computing power A tissue-specific focus allows for more comprehensive gene much larger than any HPC cluster. coverage, especially important for the testes, which are a rich source of lncRNAs. LNCipedia (http://www.lncipedia.org/) is a comprehensive Concluding remarks and future directions database for annotated human lncRNAs generated through Only recently has technology been able to identify lncRNAs the incorporation of data obtained from a number of different using high-throughput methods such as RNA-seq. Questions sources. This allowed for a rapid increase of the gene entries still remain as to how many of the proposed lncRNAs are from 21488 annotated lncRNAs in LNCipedia v.1.0 to functional, what that function is and the role that they have 111685 annotated lncRNAs in LNCipedia v.3.1. [85]. Along

For personal use only. played in evolution. More knockdown and overexpression with sequence/transcript information, secondary structure and studies are necessary to explore the diverse roles that protein-coding potential are explored in detail for many of lncRNAs possess. For example, the overexpression of the the cataloged lncRNAs [86]. A strategy to detect lncRNAs 3’UTR region of the phosphatase and tensin homolog pseu- with protein-coding potential has been integrated within the dogene 1 (PTENP1), through retroviral vectors, revealed its database, which reanalyzes the mass spectrometry data pub- role in the regulation of the phosphatase and tensin homolog licly available from the PRIDE database. The wide scale of (PTEN) [80]. RNAi of OLMALINC in human oligodendro- LNCipedia allows for incorporation of its content into large cytes [15] revealed the perturbation of the expression of genomic projects, including development of customized genes involved in the maturation and myelination of oligo- microarrays allowing genome-wide surveys of lncRNA dendrocytes. A systematic approach is needed to attempt to expression. elucidate the function of various lncRNAs, which could These databases were created through the integration of prove difficult due to the species and tissue specificity of pre-existing public resources. While this allows for large

J Hum Transcriptome Downloaded from informahealthcare.com by University New South Wales on 08/11/15 many lncRNAs. amounts of information to be shared and combined, it also In order to better determine lncRNA role as part of a regu- led to lncRNA predictions that greatly vary between latory network it is essential to produce comprehensive, func- individual repositories. This is due to differences in tional annotations for lncRNAs similar to those that exist for methodology, classification and assembly algorithms, protein-coding genes. This is especially relevant for those which result in many lncRNAs to be missed or improperly novel lncRNAs associated with human diseases. As a result of categorized [81]. Constant verification is required to ensure advances into ncRNA research, there are several public data- the validity of the database, which is particularly difficult bases of annotated lncRNAs; however complete functional to achieve in large lncRNA databases. This remains an characterization of all lncRNAs is needed beyond merely issue with the number of lncRNAs being annotated con- basic sequence and transcript information [81]. A number of stantly increasing but experimental functional characteriza- lncRNA databases currently exist, each with different focuses tion lagging behind. Before function can be determined which determines their utility. This includes LNCipedia with for all annotated lncRNAs, for example utilizing knock- broad coverage of a high number of lncRNAs, lncRNA data- down and overexpression approaches, a complete and base (lncRNAdb) providing in-depth annotation of a variety comprehensive catalog of evolutionary conservation and of different lncRNAs and GermlncRNA with a tissue-specific tissue-specific expression for these transcripts must firstly catalogue of lncRNAs. be produced. DOI: 10.3109/23324015.2015.1077591 Conservation and tissue-specific transcription patterns of long noncoding RNAs 7
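The fpkm values quoted for OLMALINC (16.2 in gray matter and 71.5 in white matter) normalize fragment counts for both transcript length and sequencing depth. The minimal sketch below shows that normalization in its simplest count-based form. It assumes every fragment is assigned to a single transcript and uses the full exon length, whereas Cufflinks estimates abundances and effective lengths with a statistical model [79]. The transcript length, fragment counts and library size are hypothetical values, chosen only so that the output reproduces the quoted figures.

```python
def fpkm(fragments: int, exon_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of exon per million fragments mapped.

    Simplified count-based definition; tools such as Cufflinks instead
    estimate abundances (and effective lengths) with a likelihood model.
    """
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    kilobases = exon_length_bp / 1_000
    millions_mapped = total_mapped_fragments / 1_000_000
    return fragments / kilobases / millions_mapped


# Hypothetical inputs: a 2 kb transcript, 30 million mapped fragments per library.
LENGTH_BP = 2_000
LIBRARY_SIZE = 30_000_000

white_matter = fpkm(4_290, LENGTH_BP, LIBRARY_SIZE)  # 4290 / 2 / 30
gray_matter = fpkm(972, LENGTH_BP, LIBRARY_SIZE)     # 972 / 2 / 30

print(f"WM: {white_matter:.1f} fpkm, GM: {gray_matter:.1f} fpkm")
print(f"WM/GM ratio: {white_matter / gray_matter:.2f}")
```

With these inputs the ratio is about 4.4-fold. In practice the two libraries would differ in depth, and the per-million term is what keeps such cross-sample comparisons meaningful.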
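Strand-specific RNA-Seq assigns each read to the strand that was transcribed, which is what exposes sense-antisense pairs such as XIST/TSIX. The sketch below illustrates that logic under two assumptions. First, it assumes a paired-end dUTP library, in which read 1 derives from the first-strand cDNA and therefore aligns opposite to the RNA, while read 2 aligns to the RNA's own strand. This is the orientation that TopHat and Cufflinks call fr-firststrand. Second, it assumes a list of already-assembled transcripts with half-open coordinates. The transcript names and coordinates are hypothetical. The pairwise scan is for illustration only; genome-scale data would call for an interval index.

```python
from dataclasses import dataclass

FLIP = {"+": "-", "-": "+"}


def rna_strand_dutp(alignment_strand: str, is_read1: bool) -> str:
    """Infer the strand of the source RNA for a paired-end dUTP library.

    Read 1 is sequenced from the first-strand cDNA, so it aligns antisense
    to the RNA; read 2 aligns in the same orientation as the RNA.
    """
    return FLIP[alignment_strand] if is_read1 else alignment_strand


@dataclass(frozen=True)
class Transcript:
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def sense_antisense_pairs(transcripts: list[Transcript]) -> list[tuple[str, str]]:
    """Return pairs of transcripts that overlap on opposite strands."""
    pairs = []
    for i, a in enumerate(transcripts):
        for b in transcripts[i + 1:]:
            overlaps = a.start < b.end and b.start < a.end
            if a.chrom == b.chrom and a.strand != b.strand and overlaps:
                pairs.append((a.name, b.name))
    return pairs


# Hypothetical transcripts (names and coordinates are illustrative only).
transcripts = [
    Transcript("tx1", "chr1", 1_000, 5_000, "+"),
    Transcript("tx1_as", "chr1", 4_000, 9_000, "-"),
    Transcript("tx2", "chr1", 8_000, 12_000, "+"),
    Transcript("tx3", "chr2", 1_000, 5_000, "-"),
]

print(rna_strand_dutp("-", is_read1=True))  # read 1 on minus strand: RNA from plus strand
print(sense_antisense_pairs(transcripts))
```

Without strand information, tx1 and tx1_as would collapse into a single locus. Once each read's strand is known, they resolve as a sense-antisense pair.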
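The point that RNA sequences can diverge while their secondary structure is conserved [47,48] can be made concrete with compensatory base-pair changes. The sketch below is a toy illustration and not the covariation measures used in the cited studies. It takes two hypothetical, gap-free aligned sequences and an assumed shared dot-bracket structure. It counts how many structural pairs stay canonical (Watson-Crick or G·U wobble) in both sequences, and how many of those changed on both sides of the pair.

```python
# Canonical RNA base pairs: Watson-Crick plus G-U wobble.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairs_from_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Parse a dot-bracket string into sorted (i, j) base-pair indices."""
    stack, pairs = [], []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure")
    return sorted(pairs)


def structure_support(seq_a: str, seq_b: str, structure: str) -> tuple[int, int, float]:
    """Count pairs canonical in both sequences, and how many of them covary.

    Returns (conserved_pairs, covarying_pairs, sequence_identity).
    """
    if not len(seq_a) == len(seq_b) == len(structure):
        raise ValueError("sequences and structure must have equal length")
    conserved = covarying = 0
    for i, j in pairs_from_dot_bracket(structure):
        if (seq_a[i], seq_a[j]) in CANONICAL and (seq_b[i], seq_b[j]) in CANONICAL:
            conserved += 1
            # Both partners changed, yet pairing is kept: a compensatory change.
            if seq_a[i] != seq_b[i] and seq_a[j] != seq_b[j]:
                covarying += 1
    identity = sum(x == y for x, y in zip(seq_a, seq_b)) / len(seq_a)
    return conserved, covarying, identity


# Hypothetical hairpin in two species: only 25% identity, yet all four pairs preserved.
print(structure_support("GGACUUCGGUCC", "CGGCGAAAGCUG", "((((....))))"))
```

The two example loops use the UUCG and GAAA tetraloop sequences. The stem keeps all four base pairs, and two of them covary, even though only three of twelve positions are identical. A sequence-only comparison would miss this kind of conservation.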

Acknowledgements

The authors would like to thank Cathy and Travis Hore and their family and friends for generous donations to the UNSW MSA Research Fund.

Declarations of interest

The authors report no declarations of interest.

References

[1] Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 2009;10:57–63.
[2] Mattick JS. The central role of RNA in human development and cognition. FEBS Lett 2011;585:1600–16.
[3] Taft RJ, Pheasant M, Mattick JS. The relationship between non-protein-coding DNA and eukaryotic complexity. Bioessays 2007;29:288–99.
[4] Rinn JL, Chang HY. Genome regulation by long noncoding RNAs. Annu Rev Biochem 2012;81:145–66.
[5] Carninci P, Kasukawa T, Katayama S, et al. The transcriptional landscape of the mammalian genome. Science 2005;309:1559–63.
[6] Mattick JS, Rinn JL. Discovery and annotation of long noncoding RNAs. Nat Struct Mol Biol 2015;22:5–7.
[7] Pang KC, Frith MC, Mattick JS. Rapid evolution of noncoding RNAs: lack of conservation does not mean lack of function. Trends Genet 2006;22:1–5.
[8] Guttman M, Garber M, Levin JZ, et al. Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat Biotechnol 2010;28:503–10.
[9] Ponjavic J, Oliver PL, Lunter G, Ponting CP. Genomic and transcriptional co-localization of protein-coding and long non-coding RNA pairs in the developing brain. PLoS Genet 2009;5:e1000617.
[10] Hindorff LA, Sethupathy P, Junkins HA, et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci USA 2009;106:9362–7.
[11] Sone M, Hayashi T, Tarui H, et al. The mRNA-like noncoding RNA Gomafu constitutes a novel nuclear domain in a subset of neurons. J Cell Sci 2007;120:2498–506.
[12] Bond CS, Fox AH. Paraspeckles: nuclear bodies built on long noncoding RNA. J Cell Biol 2009;186:637–44.
[13] Djebali S, Davis CA, Merkel A, et al. Landscape of transcription in human cells. Nature 2012;489:101–8.
[14] Cabili MN, Trapnell C, Goff L, et al. Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev 2011;25:1915–27.
[15] Mills JD, Kavanagh T, Kim WS, et al. High expression of long intervening non-coding RNA OLMALINC in the human cortical white matter is associated with regulation of oligodendrocyte maturation. Mol Brain 2015;8:2.
[16] Wu SC, Kallin EM, Zhang Y. Role of H3K27 methylation in the regulation of lncRNA expression. Cell Res 2010;20:1109–16.
[17] Sasaki YT, Sano M, Ideue T, et al. Identification and characterization of human non-coding RNAs with tissue-specific expression. Biochem Biophys Res Commun 2007;357:991–6.
[18] Quagliata L, Terracciano LM. Liver diseases and long non-coding RNAs: new insight and perspective. Front Med 2014;1:35.
[19] Wu P, Zuo X, Deng H, et al. Roles of long noncoding RNAs in brain development, functional diversification and neurodegenerative diseases. Brain Res Bull 2013;97:69–80.
[20] Konopka G, Friedrich T, Davis-Turak J, et al. Human-specific transcriptional networks in the brain. Neuron 2012;75:601–17.
[21] Mills JD, Kavanagh T, Kim WS, et al. Unique transcriptome patterns of the white and grey matter corroborate structural and functional heterogeneity in the human frontal lobe. PLoS One 2013;8:e78480.
[22] Kadakkuzha BM, Liu XA, McCrate J, et al. Transcriptome analyses of adult mouse brain reveal enrichment of lncRNAs in specific brain regions and neuronal populations. Front Cell Neurosci 2015;9:63.
[23] Mercer TR, Dinger ME, Sunkin SM, et al. Specific expression of long noncoding RNAs in the mouse brain. Proc Natl Acad Sci USA 2008;105:716–21.
[24] Orom UA, Derrien T, Beringer M, et al. Long noncoding RNAs with enhancer-like function in human cells. Cell 2010;143:46–58.
[25] Mercer TR, Qureshi IA, Gokhan S, et al. Long noncoding RNAs in neuronal-glial fate specification and oligodendrocyte lineage maturation. BMC Neurosci 2010;11:14.
[26] Mills JD, Kim WS, Halliday GM, Janitz M. Transcriptome analysis of grey and white matter cortical tissue in multiple system atrophy. Neurogenetics 2015;16:107–22.
[27] Soreq L, Guffanti A, Salomonis N, et al. Long non-coding RNA and alternative splicing modulations in Parkinson's leukocytes identified by RNA sequencing. PLoS Comput Biol 2014;10:e1003517.
[28] Barry G, Briggs JA, Vanichkina DP, et al. The long non-coding RNA Gomafu is acutely regulated in response to neuronal activation and involved in schizophrenia-associated alternative splicing. Mol Psychiatry 2014;19:486–94.
[29] Soumillon M, Necsulea A, Weier M, et al. Cellular source and mechanisms of high transcriptome complexity in the mammalian testis. Cell Reports 2013;3:2179–90.
[30] Bao J, Wu J, Schuster AS, et al. Expression profiling reveals developmentally regulated lncRNA repertoire in the mouse male germline. Biol Reprod 2013;89:107.
[31] Sun J, Lin Y, Wu J. Long non-coding RNA expression profiling of mouse testis during postnatal development. PLoS One 2013;8:e75750.
[32] Huang JF, Guo YJ, Zhao CX, et al. Hepatitis B virus X protein (HBx)-related long noncoding RNA (lncRNA) down-regulated expression by HBx (Dreh) inhibits hepatocellular carcinoma metastasis by targeting the intermediate filament protein vimentin. Hepatology 2013;57:1882–92.
[33] Takahashi K, Yan I, Haga H, Patel T. Long noncoding RNA in liver diseases. Hepatology 2014;60:744–53.
[34] Grote P, Wittler L, Hendrix D, et al. The tissue-specific lncRNA Fendrr is an essential regulator of heart and body wall development in the mouse. Dev Cell 2013;24:206–14.
[35] Cesana M, Cacchiarelli D, Legnini I, et al. A long noncoding RNA controls muscle differentiation by functioning as a competing endogenous RNA. Cell 2011;147:358–69.
[36] Rapicavoli NA, Poth EM, Zhu H, Blackshaw S. The long noncoding RNA Six3OS acts in trans to regulate retinal development by modulating Six3 activity. Neural Dev 2011;6:32.
[37] Meola N, Pizzo M, Alfano G, et al. The long noncoding RNA Vax2os1 controls the cell cycle progression of photoreceptor progenitors in the mouse retina. RNA 2012;18:111–23.
[38] Kutter C, Watt S, Stefflova K, et al. Rapid turnover of long noncoding RNAs and the evolution of gene expression. PLoS Genet 2012;8:e1002841.
[39] Necsulea A, Soumillon M, Warnefors M, et al. The evolution of lncRNA repertoires and expression patterns in tetrapods. Nature 2014;505:635–40.
[40] Ulitsky I, Shkumatava A, Jan CH, et al. Conserved function of lincRNAs in vertebrate embryonic development despite rapid sequence evolution. Cell 2011;147:1537–50.
[41] Meader S, Ponting CP, Lunter G. Massive turnover of functional sequence in human and other mammalian genomes. Genome Res 2010;20:1335–43.
[42] Ponting CP, Oliver PL, Reik W. Evolution and functions of long noncoding RNAs. Cell 2009;136:629–41.
[43] Guttman M, Amit I, Garber M, et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 2009;458:223–7.
[44] He S, Liu SP, Zhu H. The sequence, structure and evolutionary features of HOTAIR in mammals. BMC Evol Biol 2011;11:14.
[45] Vishnoi A, Kryazhimskiy S, Bazykin GA, et al. Young proteins experience more variable selection pressures than old proteins. Genome Res 2010;20:1574–81.
[46] Chodroff RA, Goodstadt L, Sirey TM, et al. Long noncoding RNA genes: conservation of sequence and brain expression among diverse amniotes. Genome Biol 2010;11:R72.
[47] Lindgreen S, Gardner PP, Krogh A. Measuring covariation in RNA alignments: physical realism improves information measures. Bioinformatics 2006;22:2988–95.
[48] Torarinsson E, Sawera M, Havgaard JH, et al. Thousands of corresponding human and mouse genomic regions unalignable in primary sequence contain common RNA structure. Genome Res 2006;16:885–9.
[49] Woese CR, Winker S, Gutell RR. Architecture of ribosomal RNA: constraints on the sequence of "tetra-loops". Proc Natl Acad Sci USA 1990;87:8467–71.
[50] Gautheret D, Konings D, Gutell RR. G.U base pairing motifs in ribosomal RNA. RNA 1995;1:807–14.
[51] Gautheret D, Konings D, Gutell RR. A major family of motifs involving G·A mismatches in ribosomal RNA. J Mol Biol 1994;242:1–8.
[52] Smith MA, Gesell T, Stadler PF, Mattick JS. Widespread purifying selection on RNA structure in mammals. Nucleic Acids Res 2013;41:8220–36.
[53] Novikova IV, Hennelly SP, Sanbonmatsu KY. Structural architecture of the human long non-coding RNA, steroid receptor RNA activator. Nucleic Acids Res 2012;40:5034–51.
[54] Guttman M, Donaghey J, Carey BW, et al. LincRNAs act in the circuitry controlling pluripotency and differentiation. Nature 2011;477:295–U60.
[55] Khalil AM, Guttman M, Huarte M, et al. Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc Natl Acad Sci USA 2009;106:11667–72.
[56] Nesterova TB, Slobodyanyuk SY, Elisaphenko EA, et al. Characterization of the genomic Xist locus in rodents reveals conservation of overall gene structure and tandem repeats but rapid evolution of unique sequence. Genome Res 2001;11:833–49.
[57] Ulitsky I, Bartel DP. LincRNAs: genomics, evolution, and mechanisms. Cell 2013;154:26–46.
[58] Consortium TF, Carninci P, Kasukawa T, et al. The transcriptional landscape of the mammalian genome. Science 2005;309:1559–63.
[59] King M, Wilson A. Evolution at two levels in humans and chimpanzees. Science 1975;188:107–16.
[60] Haygood R, Babbitt CC, Fedrigo O, Wray GA. Contrasts between adaptive coding and noncoding changes during human evolution. Proc Natl Acad Sci USA 2010;107:7853–7.
[61] Caceres M, Lachuer J, Zapala MA, et al. Elevated gene expression levels distinguish human from non-human primate brains. Proc Natl Acad Sci USA 2003;100:13030–5.
[62] Lambert N, Lambot MA, Bilheu A, et al. Genes expressed in specific areas of the human fetal cerebral cortex display distinct patterns of evolution. PLoS One 2011;6:e17753.
[63] Leigh SR. Brain growth, life history, and cognition in primate and human evolution. Am J Primatol 2004;62:139–64.
[64] Enard W, Khaitovich P, Klose J, et al. Intra- and interspecific variation in primate gene expression patterns. Science 2002;296:340–3.
[65] Pollard KS, Salama SR, Lambert N, et al. An RNA gene expressed during cortical development evolved rapidly in humans. Nature 2006;443:167–72.
[66] Enard W, Przeworski M, Fisher SE, et al. Molecular evolution of FOXP2, a gene involved in speech and language. Nature 2002;418:869–72.
[67] Evans PD, Anderson JR, Vallender EJ, et al. Adaptive evolution of ASPM, a major determinant of cerebral cortical size in humans. Hum Mol Genet 2004;13:489–94.
[68] Costa V, Angelini C, De Feis I, Ciccodicola A. Uncovering the complexity of transcriptomes with RNA-Seq. J Biomed Biotechnol 2010;2010:853916.
[69] Courtney E, Kornfeld S, Janitz K, Janitz M. Transcriptome profiling in neurodegenerative disease. J Neurosci Methods 2010;193:189–202.
[70] Levin JZ, Yassour M, Adiconis X, et al. Comprehensive comparative analysis of strand-specific RNA sequencing methods. Nat Methods 2010;7:709–15.
[71] Mills JD, Kawahara Y, Janitz M. Strand-specific RNA-Seq provides greater resolution of transcriptome profiling. Curr Genomics 2013;14:173–81.
[72] Lee JT, Davidow LS, Warshawsky D. Tsix, a gene antisense to Xist at the X-inactivation centre. Nat Genet 1999;21:400–4.
[73] Engstrom PG, Suzuki H, Ninomiya N, et al. Complex loci in human and mouse genomes. PLoS Genet 2006;2:e47.
[74] Chen J, Sun M, Kent WJ, et al. Over 20% of human transcripts might form sense-antisense pairs. Nucleic Acids Res 2004;32:4812–20.
[75] Ozsolak F, Kapranov P, Foissac S, et al. Comprehensive polyadenylation site maps in yeast and human reveal pervasive alternative polyadenylation. Cell 2010;143:1018–29.
[76] He Y, Vogelstein B, Velculescu VE, et al. The antisense transcriptomes of human cells. Science 2008;322:1855–7.
[77] Chen Z, Duan X. Ribosomal RNA depletion for massively parallel bacterial RNA-sequencing applications. Methods Mol Biol 2011;733:93–103.
[78] Cheng J, Kapranov P, Drenkow J, et al. Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 2005;308:1149–54.
[79] Trapnell C, Roberts A, Goff L, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 2012;7:562–78.
[80] Poliseno L, Salmena L, Zhang J, et al. A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature 2010;465:1033–8.
[81] Fritah S, Niclou SP, Azuaje F. Databases for lncRNAs: a comparative evaluation of emerging tools. RNA 2014;20:1655–65.
[82] Amaral PP, Clark MB, Gascoigne DK, et al. lncRNAdb: a reference database for long noncoding RNAs. Nucleic Acids Res 2011;39:D146–51.
[83] Quek XC, Thomson DW, Maag JL, et al. lncRNAdb v2.0: expanding the reference database for functional long noncoding RNAs. Nucleic Acids Res 2015;43:D168–73.
[84] Luk AC, Gao H, Xiao S, et al. GermlncRNA: a unique catalogue of long non-coding RNAs and associated regulations in male germ cell development. Database (Oxford) 2015;2015:bav044.
[85] Volders PJ, Verheggen K, Menschaert G, et al. An update on LNCipedia: a database for annotated human lncRNA sequences. Nucleic Acids Res 2015;43:D174–80.
[86] Volders PJ, Helsens K, Wang X, et al. LNCipedia: a database for annotated human lncRNA transcript sequences and structures. Nucleic Acids Res 2013;41:D246–51.