Bioinformatics Analysis of Long Noncoding RNAs Differentially Expressed in Autism: Identification of Potential Involvement in Staufen-mediated mRNA Decay

by Ye Su

B.S. in Bioengineering, June 2014, Xiamen University

A Thesis submitted to

The Faculty of The Columbian College of Arts and Sciences of The George Washington University in partial fulfillment of the requirements for the degree of Master of Science

January 19, 2018

Thesis directed by

Valerie W. Hu Professor of Biochemistry and Molecular Medicine

Nagarajan Pattabiraman Research Associate of Biochemistry and Molecular Medicine

Acknowledgements

I wish to thank, first and foremost, my supervisor, Dr. Valerie W. Hu, who encouraged and supported me from the initial to the final stage and helped me develop an understanding of the project over the past year and a half. This thesis would not have been possible without her patience, care and earnestness. I am also heartily grateful to my co-mentor, Dr. Nagarajan Pattabiraman. The knowledge and skills I learned from him have broadened my horizons and will profoundly influence my future. I would also like to gratefully acknowledge Dr. Anelia Horvath for taking the time to read and revise this thesis. Moreover, I would like to sincerely thank Dr. Vanderhoek for advising and supporting me during the completion of my master’s degree in the Bioinformatics and Molecular Biochemistry program. Lastly, I offer my regards and blessings to all of those who supported me in any respect throughout the completion of the project.

Abstract of Thesis

Bioinformatics Analysis of Long Noncoding RNAs Differentially Expressed in Autism: Identification of Potential Involvement in Staufen-mediated mRNA Decay

Autism spectrum disorders (ASD) are neurodevelopmental disorders characterized by abnormal language development, deficits in social interaction, repetitive behaviors and restricted interests. Despite the core symptoms that define ASD, there is considerable heterogeneity in the manifestations and severity of behaviors associated with ASD, thus presenting a major challenge to ‘omics studies directed towards understanding its underlying biology. To reduce heterogeneity in ASD for transcriptomic analyses, our laboratory used multivariate cluster analyses of severity scores on a “gold standard” behavior-based diagnostic instrument to divide individuals with ASD into 4 phenotypically distinct subgroups. Expression profiling of lymphoblastoid cells from three of the four subgroups revealed both distinct as well as overlapping differentially expressed genes relative to controls, of which only 20 transcripts were shared among all three subtypes of ASD. Interestingly, all 20 transcripts were identified as novel, long noncoding RNAs (lncRNAs) of unknown functions. The goal of this study was to employ bioinformatics analyses to identify potential functions of these lncRNAs.

An analysis of the sequences of these noncoding transcripts showed that 5 of the 20 lncRNAs contain similar Alu elements, which are the most common repetitive sequences in the genome. Published studies have shown that lncRNAs with Alu elements can facilitate the decay of selected mRNAs by forming dsRNA structures with 3’-UTR sequences of the mRNAs, leading to the recruitment of Staufen1 (STAU1), a dsRNA binding protein. Subsequently, STAU1 recruits the nonsense-mediated mRNA decay factor UPF1 to cause degradation of the dsRNA helix. Therefore, we postulate that these 5 lncRNAs containing Alu elements may play an important role in autism by initiating Staufen-mediated decay (SMD) of selected mRNAs containing 3’-UTR sequences complementary to the Alu elements in the lncRNAs.

Here, we show that: 1) some lncRNAs differentially expressed in ASD contain repetitive Alu elements; 2) the Alu-containing regions of the differentially expressed lncRNAs can form dsRNA duplexes with the 3’-UTR regions of a number of different mRNAs; and 3) computer modeling of the interaction of STAU1 and dsRNA duplexes containing the Alu elements of the lncRNAs and 3’-UTRs of several target mRNAs supports potential binding interactions.

Table of Contents

Acknowledgements

Abstract of Thesis

List of Figures

List of Tables

List of Abbreviations

Introduction

Materials and Methods

Results

Discussion

Conclusion

References

List of Figures

Figure 1 Sequence alignment of selected Alu subfamilies using AliView

Figure 2 Model for STAU1-mediated mRNA decay as presented by Gong and Maquat, 2011

Figure 3 Workflow of this study

Figure 4 NCBI BLAST search result for lncRNA T65857

Figure 5 RepeatMasker search result for lncRNA T65857

Figure 6 Predicted base-pairing between lncRNA T65857 and 3’-UTR of mRNA UGT1A1

Figure 7 Comparison of Alu-containing 3’-UTR sequences of potential target mRNAs of lncRNA T65857

Figure 8 Predicted base-pairing between lncRNA R11217 and 3’-UTR of mRNA CHM

Figure 9 Comparison of Alu-containing 3’-UTR sequences of potential target mRNAs of lncRNA R11217

Figure 10 Predicted base-pairing between lncRNA H85885 and 3’-UTR of mRNA RCAN3

Figure 11 Comparison of Alu-containing 3’-UTR sequences of potential target mRNAs of lncRNA H85885

Figure 12 Predicted base-pairing between lncRNA N73227 and 3’-UTR of mRNA FOXK1

Figure 13 Comparison of Alu-containing 3’-UTR sequences of potential target mRNAs of lncRNA N73227

Figure 14 Predicted base-pairing between lncRNA N47010 and 3’-UTR of mRNA NAP1L1

Figure 15 Comparison of STAU1 dsRBD3 and dsRBD4 sequences from STAU1 homologues of Drosophila melanogaster (Dm), Drosophila virilis (Dv), Homo sapiens (Hs), Mus musculus (Mm), and Musca domestica (Md)

Figure 16 Model for how human STAU1 dimers recruit UPF1 to the 3’-UTR of an mRNA that is targeted for SMD and promote its decay

Figure 17 Comparison of amino acid residues that interact with dsRNA from different dsRNA binding protein models and prediction of interacting residues of human STAU1 dsRBD3 and dsRBD4

Figure 18 Models of STAU1 binding to dsRNA formed by lncRNA T65857 and UGT1A1

Figure 19 Models of STAU1 binding to dsRNA formed by lncRNA R11217 and CHM

Figure 20 Models of STAU1 binding to dsRNA formed by lncRNA H85885 and RCAN3

Figure 21 Models of STAU1 binding to dsRNA formed by lncRNA N73227 and FOXK1

Figure 22 Models of STAU1 binding to dsRNA formed by lncRNA N47010 and NAP1L1

Figure 23 Interacting residues in model of human STAU1 binding domains 3 and 4 to dsRNA formed by lncRNA T65857 and mRNA UGT1A1

Figure 24 Interacting residues in model of human STAU1 binding domains 3 and 4 binding to dsRNA formed by lncRNA R11217 and mRNA CHM

Figure 25 Interacting residues in model of human STAU1 binding domains 3 and 4 binding to dsRNA formed by lncRNA H85885 and mRNA RCAN3

Figure 26 Interacting residues in model of human STAU1 binding domains 3 and 4 binding to dsRNA formed by lncRNA N73227 and mRNA FOXK1

Figure 27 Interacting residues in model of human STAU1 binding domains 3 and 4 binding to dsRNA formed by lncRNA N47010 and mRNA NAP1L1

Figure 28 Comparison of predicted interacting amino acid residues in models of human STAU1 binding through dsRBD3 and 4 to dsRNA formed by the five lncRNAs containing Alu-elements in their 3’-UTR regions and their respective target mRNAs

List of Tables

Table 1 Differentially expressed noncoding transcripts across all three ASD subgroups analyzed

Table 2 Potential target mRNAs for 5 ASD-associated lncRNAs

List of Abbreviations

ASD Autism spectrum disorders

dsRBD Double-strand RNA binding domain

dsRBP Double-strand RNA binding protein

dsRNA Double-strand RNA

lincRNAs Long intergenic non-coding RNAs

LINEs Long interspersed elements

lncRNAs Long non-coding RNAs

ncRNAs Non-coding RNAs

IP Immunoprecipitation

SBS STAU1 binding site

SINEs Short interspersed elements

SMD STAU1-mediated mRNA decay

SSM STAU1-swapping motif

TEs Transposable elements

Introduction

Autism spectrum disorders (ASD) are neurodevelopmental disorders characterized by abnormal language development, deficits in social interaction, repetitive behaviors and restricted interests. Heterogeneity in the phenotypic presentation of ASD has been a challenge to identifying specific genes involved in autism, and the pathobiology of ASD remains unknown. To reduce heterogeneity in autism, severity scores of ASD behaviors from an ASD diagnostic instrument were used to divide individuals with ASD into 4 different phenotypic subgroups (Hu and Steinberg, 2009). Gene expression analysis of three of the four ASD subgroups revealed both unique and overlapping genes that are differentially expressed in ASD. Interestingly, only 20 differentially expressed long noncoding RNAs (lncRNAs) were shared by all three subgroups, and all of these were unannotated. The overall goal of this study is to probe the functions of these 20 lncRNAs in ASD using bioinformatics approaches.

Long non-coding RNAs and repetitive elements

In contrast to the small proportion (1-2%) of the mammalian genome that is transcribed into mRNAs, the large majority of the genome is transcribed into non-coding RNAs (ncRNAs) that do not encode information about proteins (Ma, Bajic & Zhang, 2013). Non-coding RNAs longer than 200 nucleotides are commonly defined as long non-coding RNAs (lncRNAs), while the remainder are regarded as small ncRNAs, which include microRNAs, siRNAs, piRNAs, snoRNAs, snRNAs, exRNAs and scaRNAs, each of which is associated with specific functions (Ma, Bajic & Zhang, 2013). Much less is known about the functions of lncRNAs, the focus of this study.

It has been suggested that intergenic lncRNAs and intronic lncRNAs are most likely regulated via different transcription activation mechanisms (Ma, Bajic & Zhang, 2013). Long intergenic non-coding RNAs (lincRNAs) are able to regulate gene expression at the level of chromatin modification, transcription and post-transcriptional processing (Mercer, Dinger & Mattick, 2009), while intronic lncRNAs are less stable than intergenic lncRNAs and may be involved in transcriptional and post-transcriptional regulation (Ma, Bajic & Zhang, 2013). Repetitive sequences, which are mostly transposable elements (TEs), have been estimated to comprise approximately 50% of the human genome and also occur within lncRNAs. Repetitive elements fall into different classes depending on their mode of multiplication and/or structure. Characterized by size, long interspersed elements (LINEs) are more than 5000 bp, while short interspersed elements (SINEs) are typically less than 500 bp. LINEs and SINEs are capable of affecting gene expression at different levels through different mechanisms, such as epigenetic silencing. A typical approach for identifying repetitive elements within ncRNAs is to search for sequence similarity to a single consensus sequence for each repeat type.

Alu elements

Among the most notable SINEs, Alu elements are repetitive sequences that are commonly found in introns, 3′-UTR regions of genes, and between genes (Batzer & Deininger, 2002). Full-length Alu sequences are about 280 bp long, and together Alu elements comprise approximately 10% of the human genome (Price, Eskin & Pevzner, 2004). For a long time, Alu elements were believed to have no biological function (Price, Eskin & Pevzner, 2004). Nonetheless, Alu insertions and unequal Alu recombination are believed to be responsible for a great proportion of human genetic diseases, such as hemophilia and neurofibromatosis (Deininger & Batzer, 1999). One study suggested that 33 cases of germ-line genetic diseases and 16 cases of cancer were caused by unequal homologous Alu recombination (Deininger & Batzer, 1999). Since Alu elements have arisen from replication of an RNA polymerase III transcript, the evolution of Alu elements has given rise to Alu subfamilies (Price, Eskin & Pevzner, 2004). Closely related Alu subfamilies are defined by consensus sequences (Fig. 1). In all, 213 Alu subfamilies have been identified and 31 of them have been reported to Repbase Update, a database of repetitive sequences. The characterization and classification of Alu subfamilies not only reveal the evolutionary history of the Alu family, but also play an important role in phylogenetic analysis. For instance, by building the phylogenetic relationships of one Alu subfamily, AluYe5, within the hominid lineage, scientists have provided strong evidence that the chimpanzee is the closest living relative of humans (Salem et al., 2003).

Double-strand RNA binding domain

The dsRNA binding domain (dsRBD) was first identified in Drosophila STAU1, which contains 5 dsRBDs (St Johnston et al., 1992). The dsRNA binding protein (dsRBP) family is found across multiple species and shares a common, evolutionarily conserved motif that specifically promotes interaction with dsRNA. The motif, a sequence of approximately 70 amino acids, usually forms an α−β−β−β−α structure that contains some residues specifically involved in dsRNA binding. However, the number of dsRBDs differs among dsRBPs. Some dsRBPs contain up to five dsRBDs, like Drosophila Staufen1, while other dsRBPs, for instance RNaseIII, contain only one (Fierro & Mathews, 2000). Since the dsRBDs of some proteins, such as dsRNA-activated protein kinase (PKR), have been shown to play a role in protein dimerization, it is likely that multiple dsRBDs represent a feature that stabilizes homo- or heterotypic protein–protein interactions (Cosentino et al., 1999).

Alu-directed STAU1-mediated decay of mRNA

Some Alu elements are involved in Staufen (STAU1)-mediated decay of mRNA (Gong & Maquat, 2011). Staufen is a double-strand RNA binding protein that was first found in Drosophila (St Johnston et al., 1992). STAU1-mediated decay (SMD) is an mRNA degradation process that is mediated by the binding of STAU1 to a STAU1-binding site (SBS) within the 3'-untranslated region (3'-UTR) of a target mRNA (Fig. 2). As a double-strand RNA (dsRNA) binding protein, STAU1 binds to RNA helix structures that are formed either by intramolecular base-pairing of 3’-UTR sequences within an mRNA molecule or by intermolecular base-pairing between 3’-UTR sequences of a target mRNA and a long noncoding RNA (lncRNA) via partially complementary Alu elements. The STAU1-dsRNA complex then recruits an mRNA decay factor, UPF1, which leads to the degradation of the bound dsRNA. Interestingly, UPF1 was also found to be differentially expressed in ASD (Hu et al., 2009) and has been identified as a potential biomarker for ASD (Hu and Lai, 2013).

Hypothesis

In this study, we test the hypothesis that at least some of the lncRNAs differentially expressed in ASD contain functional repetitive elements. Based on an initial survey of the lncRNA sequences, which revealed Alu elements in at least five of the 20 lncRNAs, we further propose that these lncRNAs with Alu elements may be involved in STAU1-mediated decay of specific target mRNAs, some of which may be relevant to autism.

Specific Aims

The specific aims of this study are:

1) To identify which of the lncRNAs contain Alu repetitive sequences,

2) To identify potential mRNA binding targets for these Alu-containing lncRNAs, and

3) To develop a model for the interaction of STAU1 with the dsRNA complexes formed between Alu elements in the lncRNA and the target mRNA.

Figure 1. Sequence alignment of selected Alu subfamilies using AliView. The consensus sequence of the whole Alu family is shown on the top line, with base positions labeled from 1 to 293. Shown are the sequences of ten representative Alu subfamilies out of the 31 that are currently reported in Repbase Update: AluJo, AluJb, AluJr, AluSx, AluSx1, AluSg, AluSz, AluY, FLAM and FRAM. Deletions are shown as dashes, and mutations are shown as different colored boxes.

Figure 2. Model for STAU1-mediated mRNA decay as presented by Gong and Maquat, 2011. Base-pairing between a polyadenylated Alu-containing mRNA (blue) and a lncRNA (red) through partially complementary Alu elements (each of which is a half Staufen binding site, ½ SBS) forms a functional STAU1-binding site (SBS). The binding of STAU1 to the SBS triggers STAU1-mediated mRNA decay (SMD) through a UPF1-dependent mechanism. Translating ribosomes (blue ovals) do not remove bound STAU1 since translation terminates sufficiently upstream of the SBS. The RNA helix is not destroyed in the process. C, cytoplasm; N, nucleus; Ter, termination codon (which is generally, but not necessarily, a normal termination codon) (Gong & Maquat, 2011).

Materials and methods

The schematic in Figure 3 describes the workflow for this entire study.

Figure 3. Workflow of this study

Analysis of sequences of 20 lncRNAs for potential functional elements using NCBI EST database.

The NCBI EST database was used to search for functional elements in the sequences of the 20 lncRNAs. The search revealed that five of the 20 lncRNAs contained Alu elements.

Identification of Alu repeating elements in 5 lncRNAs using NCBI BLAST.

The Alu repeating sequences of the lncRNAs with GenBank accession numbers T65857, R11217, H85885, N47010 and N73227 were confirmed with NCBI BLAST, which finds regions of similarity between biological sequences (Johnson et al., 2008), using the sequences of the five lncRNAs as queries. The Alu repeating sequences in the 5 lncRNAs were then characterized further with the program RepeatMasker (http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker) (Tempel, 2012) to obtain their family types and their positions in the sequence of the whole Alu family.

Searching for potential target mRNA with UCSC BLAT.

UCSC BLAT (GRCh38/hg38) (Kent, 2002) was used to search, with the reverse complementary sequence of each Alu element from the five lncRNAs as the query, for mRNAs containing the same or a similar Alu repeating sequence in their respective 3’-UTRs. Reverse Complement (http://www.bioinformatics.org/sms/index.html) (Stothard, 2000) was used to generate the reverse complementary sequences of the Alu repetitive sequences of the lncRNAs. Compared to NCBI BLAST, BLAT searches for longer, more similar sequences at higher speed (Kent, 2002). Gene annotations and coordinates were obtained from the UCSC database using the ‘RefGene’ table.
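The reverse-complement step described above can be illustrated with a minimal Python sketch. This is not the code of the Reverse Complement tool cited above; the function name and the example fragment are hypothetical and serve only to show the operation applied to each Alu sequence before the BLAT search.

```python
# Illustrative sketch only; `reverse_complement` is a hypothetical helper,
# not part of the Sequence Manipulation Suite used in this study.

# Map each base to its complement; U is read as T so RNA input is tolerated.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence (DNA alphabet)."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    fragment = "GGCCGGGCGC"  # made-up fragment, for illustration only
    print(reverse_complement(fragment))
```

Applying the function twice returns the original DNA sequence, which is a quick sanity check on any query prepared this way.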

Predicting alignment between mRNAs and lncRNA with Clustal Omega.

The sequences of Alu repeats in the lncRNAs and the 3’-UTRs of potential target mRNAs were aligned using Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo). Clustal Omega is a multiple sequence alignment tool for DNA, RNA and protein that accepts input sequences in FASTA format (McWilliam et al., 2013).

Comparing Staufen1 dsRNA binding domain with AliView.

AliView, an alignment viewer and editor (Larsson, 2014) (http://ormbunkar.se/aliview), was used to align and display, in Nexus format, the sequences of Staufen1 dsRNA binding domains 3 and 4 from 5 species: Drosophila melanogaster (Dm), Drosophila virilis (Dv), Musca domestica (Md), Mus musculus (Mm), and Homo sapiens (Hs). The sequences of all dsRNA binding domains were obtained from UniProt.

Modeling the binding of RNA duplexes and Staufen1.

An RNA helix model was generated with CHIMERA (UCSF, http://www.cgl.ucsf.edu/chimera/index.html) (Pettersen et al., 2004), and Molecular Operating Environment (MOE) software (https://www.chemcomp.com) was used for homology modeling of human STAU1 binding to dsRNA based on the 3D structure of RNaseIII (PDB: 2NUG). After CHIMERA constructed a rough model of the dsRNA, MOE was used to refine it by energy minimization. Then, using CHIMERA, the dsRNA structure in the RNaseIII model (PDB: 2NUG) was replaced with the MOE-optimized dsRNA model representing the respective Alu-containing sequences of the lncRNA and the 3’-UTR region of the putative mRNA target.

Results

Several noncoding transcripts differentially expressed in 3 subphenotypes of ASD contain Alu elements.

Among the 20 noncoding transcripts that have been shown to be differentially expressed in autism (shown in Table 1), five lncRNAs (T65857, N47010, N73227, R11217 and H85885) were found to contain Alu elements by NCBI BLAST search. Figure 4 presents the search results for T65857 as one example. The sequences shown in lower-case letters in the BLAST analyses are repetitive sequences. To further confirm the Alu elements, RepeatMasker was used; it revealed that the Alu repetitive sequences within the 5 lncRNAs belong to different Alu subfamilies. T65857 (Fig. 5) and N47010 both contain part of an AluJo sequence; N73227, R11217 and H85885 contain repeating sequences that have been identified as AluJb, FLAM and AluJr, respectively. Using the Alu sequences of the ASD lncRNAs, we next searched for potential target mRNAs with complementary Alu sequences in their respective 3’-UTRs.

Table 1. Differentially expressed noncoding transcripts across all three ASD subgroups analyzed (Hu et al., 2009). Yellow: lncRNAs containing Alu elements.

Figure 4. NCBI BLAST search result for lncRNA T65857. Query 1-566: sequence of lncRNA T65857; Sbjct 90058901-90059466: sequence in the human genome from Chr12 90058901-90059466. "-" indicates an insertion/deletion; “|” indicates a match; a gap represents a mismatch; lower-case letters indicate repetitive sequences.

Figure 5. RepeatMasker search result for lncRNA T65857. 1-87: sequence of lncRNA T65857 1-87; 90-2: reverse-complementary sequence of Alu subfamily AluJo 2-90; "-" indicates an insertion/deletion; "i" means a transition (G<->A, C<->T) and "v" represents a transversion (all other substitutions).
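The "i" and "v" symbols in the RepeatMasker alignment follow a simple rule that can be sketched in Python. The sketch below is not RepeatMasker code; the function names and the toy sequences are hypothetical, and only the four bases A, C, G and T are handled (any other mismatching symbol would be labeled "v").

```python
# Illustrative sketch only; not RepeatMasker's implementation.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(a: str, b: str) -> str:
    """Label one aligned column: ' ' match, '-' indel, 'i' transition, 'v' transversion."""
    a, b = a.upper(), b.upper()
    if a == "-" or b == "-":
        return "-"
    if a == b:
        return " "
    # A transition stays within purines (A<->G) or within pyrimidines (C<->T).
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "i" if same_class else "v"


def annotate(query: str, subject: str) -> str:
    """Return the annotation line for two aligned sequences of equal length."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    return "".join(classify_substitution(q, s) for q, s in zip(query, subject))


if __name__ == "__main__":
    # Toy alignment, not taken from Figure 5.
    print(repr(annotate("ACGT-A", "GCTTAA")))
```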

Identifying potential target mRNAs of lncRNAs

To form a dsRNA structure, an lncRNA with an Alu element must form a complex with an mRNA containing a complementary Alu element in its 3’-UTR region. Using the Alu sequences within the 5 ASD-associated lncRNAs, a UCSC BLAT search returned hundreds of potential target mRNAs whose 3’-UTRs contain Alu elements complementary to the Alu sequence of each ASD lncRNA. The list of target mRNAs was filtered by aligning the reverse complementary Alu sequences of the lncRNAs to the 3’-UTRs of the targets with Clustal Omega (Table 2). Among the potential target genes (Table 2), FOXK1, HOOK3 and NAP1L1 were found to have increased expression in the severely language-impaired subgroup of ASD in an earlier study (Hu et al., 2009).

Table 2. Potential target mRNAs for 5 ASD-associated lncRNAs. Yellow: Genes that were found to be differentially expressed in ASD (Hu et al., 2009).

Potential STAU1 binding site in target mRNAs

STAU1 binds to dsRNA with no sequence specificity, and the RNA signature for STAU1 binding remains unclear. Based on the results of some studies (Ramos et al., 2000), STAU1 binding requires 10-16 bp of A-form dsRNA with no gap. Since DNA does not contain a 2’-OH group, it can form both A-form (C3’-endo) and B-form (C2’-endo) structures, while an RNA helix, which contains 2’-OH groups, adopts only the A-form conformation, which makes dsDNA less stable than dsRNA. Using the base-pairing between lncRNA T65857 and one of its potential target mRNAs, UGT1A1, as an example, we found 3 potential STAU1 binding sites that meet the above-mentioned requirement (Fig. 6). In actual RNA duplex structures, base-pairing between T and G as well as between A and C is also allowed. Using these criteria for RNA duplex formation between Alu elements, 3 putative STAU1 binding sites (SBSs) were identified on each of 8 potential target mRNAs of T65857 (Fig. 7). In the same manner, 1 SBS was found on 5 potential target mRNAs of R11217 (Fig. 8 and Fig. 9), 1 to 2 SBSs on 8 potential target mRNAs of H85885 (Fig. 10 and Fig. 11), 2 SBSs on 13 potential target mRNAs of N73227 (Fig. 12 and Fig. 13), and more than 3 SBSs on 11 potential target mRNAs of N47010, for which Figure 14 shows an example.
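The criterion used above (an uninterrupted stretch of at least 10 paired positions, with T-G and A-C pairs optionally counted) can be expressed as a short Python sketch. This is an illustration under stated assumptions rather than the procedure actually used, which relied on inspection of the Clustal Omega alignments: the function name is hypothetical, the two strands are assumed to be already aligned column by column (top strand 5’ to 3’, bottom strand 3’ to 5’), and the toy sequences are made up. The sketch reports every run of at least `min_len` pairs and does not enforce the 16 bp upper end of the range.

```python
# Illustrative sketch only; names and defaults are assumptions.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
# Additional pairs treated as allowed in the text (T-G and A-C).
EXTRA_PAIRS = {("G", "T"), ("T", "G"), ("A", "C"), ("C", "A")}


def paired_runs(top: str, bottom: str, min_len: int = 10, allow_extra: bool = True):
    """Return (start, end) column ranges, end exclusive, of gap-free paired stretches."""
    if len(top) != len(bottom):
        raise ValueError("aligned strands must have the same length")
    allowed = (WATSON_CRICK | EXTRA_PAIRS) if allow_extra else WATSON_CRICK
    # Work in the DNA alphabet so that U and T are treated alike.
    top = top.upper().replace("U", "T")
    bottom = bottom.upper().replace("U", "T")
    runs, start = [], None
    for i, pair in enumerate(zip(top, bottom)):
        if pair in allowed:
            if start is None:
                start = i
        else:
            # A mismatch or alignment gap ends the current stretch.
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    if start is not None and len(top) - start >= min_len:
        runs.append((start, len(top)))
    return runs


if __name__ == "__main__":
    # Toy strands, not taken from Figure 6.
    print(paired_runs("GGGGGCCCCCAA", "CCCCCGGGGGAA"))
```

Setting `allow_extra=False` restricts the search to Watson-Crick pairs, which shows how much a candidate site depends on the additional pairings.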

Figure 6. Predicted base-pairing between lncRNA T65857 and 3’-UTR of mRNA UGT1A1. Sequence of potential target 3’-UTR of mRNA UGT1A1 (top). Sequence of Alu element in lncRNA T65857 (bottom). Red characters represent potential STAU1 binding sites, with the length of each sequence indicated in the blue box above. Vertical lines indicate base-pairing; a gap means no base-pairing; a dot means possible base-pairing in the RNA duplex structure.

Figure 7. Comparison of Alu-containing 3’-UTR sequences of potential target mRNAs of lncRNA T65857. The Alu-containing 3’-UTR sequences of each potential mRNA target of lncRNA T65857 are shown along with 3 predicted STAU1 binding sites. The high sequence similarity among the targets suggests that each is capable of forming dsRNA structures with T65857 at three distinct SBSs. A dash means less likely base-pairing at that site.

Figure 8. Predicted base-pairing between lncRNA R11217 and 3’-UTR of mRNA CHM. Sequence of potential target 3’-UTR of mRNA CHM (top). Sequence of Alu element in lncRNA R11217 (bottom). Red characters represent potential STAU1 binding sites, with the length of each sequence indicated in the blue box above. Vertical lines indicate base-pairing; a gap means no base-pairing; a dot means possible base-pairing in the RNA duplex structure.

Figure 9. Comparison of Alu-containing 3’-UTR sequences of potential target mRNAs of lncRNA R11217. The Alu-containing 3’-UTR sequences of each potential mRNA target of lncRNA R11217 are shown in the context of 1 predicted STAU1 binding site. A dash means less likely base-pairing at that site.

Figure 10. Predicted base-pairing between lncRNA H85885 and 3’-UTR of mRNA RCAN3. Sequence of potential target 3’-UTR of mRNA RCAN3 (top). Sequence of Alu element in lncRNA H85885 (bottom). Red characters represent potential STAU1 binding sites, with the length of each sequence indicated in the blue box above. Vertical lines indicate base-pairing; a gap means no base-pairing; a dot means possible base-pairing in the RNA duplex structure.

Figure 11. Comparison of Alu-containing 3’-UTR sequences of potential target mRNAs of lncRNA H85885. The Alu-containing 3’-UTR sequences of each potential mRNA target of lncRNA H85885 are shown in the context of one predicted STAU1 binding site and a few shorter SBSs. A dash means less likely base-pairing at that site.

Figure 12. Predicted base-pairing between lncRNA N73227 and 3’-UTR of mRNA FOXK1. Sequence of potential target 3’-UTR of mRNA FOXK1 (top). Sequence of Alu element in lncRNA N73227 (bottom). Red characters represent potential STAU1 binding sites, with the length of each sequence indicated in the blue box above. Vertical lines indicate base-pairing; a gap means no base-pairing; a dot means possible base-pairing in the RNA duplex structure.

Figure 13. Comparison of Alu-containing 3’-UTR sequences of potential target mRNAs of lncRNA N73227. The Alu-containing 3’-UTR sequences of each potential mRNA target of lncRNA N73227 are shown along with 2 predicted STAU1 binding sites. A dash means less likely base-pairing at that site.

Figure 14. Predicted base-pairing between lncRNA N47010 and 3’-UTR of mRNA NAP1L1. Sequence of potential target 3’-UTR of mRNA NAP1L1 (top). Sequence of Alu element in lncRNA N47010 (bottom). Red characters represent potential STAU1 binding sites, with the length of each sequence indicated in the blue box above. Vertical lines indicate base-pairing; a gap means no base-pairing; a dot means possible base-pairing in the RNA duplex structure.

Comparison of dsRNA binding domains of STAU1 with those of other families of dsRNA binding proteins

Human STAU1 is predicted to contain 4 dsRBDs, while UniProt only presents three. However, UniProt does provide a new structure (PDB: 4DKK) that contains the last domain of human STAU1, which is less likely to directly interact with dsRNA (Gleghorn et al. 2013). As the two most efficient dsRBDs in STAU1, dsRBD3 and dsRBD4 have residues that are highly conserved across species (Fig. 15). These common residues may play an important role in recognizing and interacting with dsRNAs.

(a) STAU1 dsRBD3

(b) STAU1 dsRBD4

Figure 15. Comparison of STAU1 dsRBD3 and dsRBD4 sequences from STAU1 homologues of Drosophila melanogaster (Dm), Drosophila virilis (Dv), Homo sapiens (Hs), Mus musculus (Mm), and Musca domestica (Md) (Ramos et al. 2000). Consensus amino acids (aa) are listed at the bottom, with dashes representing sites where the sequences cannot be matched across the species.
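The consensus line described in the caption of Figure 15 can be approximated with a strict column-wise rule, sketched below in Python. This is an assumption made for illustration: AliView's own consensus rule may differ, the function name is hypothetical, and the short peptide strings are made up rather than taken from the STAU1 alignment.

```python
# Illustrative sketch only; a strict consensus, which may differ from AliView's rule.
def strict_consensus(aligned: list[str]) -> str:
    """Keep a residue where every aligned sequence agrees; write '-' elsewhere."""
    if not aligned:
        raise ValueError("at least one sequence is required")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have the same length")
    consensus = []
    for column in zip(*aligned):
        # A column is conserved only if it holds one distinct residue that is not a gap.
        conserved = len(set(column)) == 1 and column[0] != "-"
        consensus.append(column[0] if conserved else "-")
    return "".join(consensus)


if __name__ == "__main__":
    # Made-up peptides, for illustration only.
    print(strict_consensus(["PISQVHE", "PISLVHE", "PMSQVHE"]))
```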

Model of STAU1 recruiting UPF1 via dimerization

Human STAU1 contains four dsRNA-binding domains, but only dsRBD3 and dsRBD4 bind to dsRNA (Gleghorn et al. 2013). Even though dsRBD5 does not bind to dsRNA, it plays a significant role in the dimerization that promotes UPF1 binding (Fig. 16) (Gleghorn et al. 2013).

Ideally, dsRBD3 and dsRBD4 from two hSTAU1 proteins bind to two SBSs of an RNA helix, and the STAU1-swapping motif (SSM) of one hSTAU1 molecule interacts with dsRBD5 of another hSTAU1 to form a dimer, which recruits UPF1, as shown in Figure 16 (Gleghorn et al. 2013). Other studies suggest that dsRBD2 may also facilitate protein-protein interactions between hSTAU1 molecules (Martel et al., 2010).

Figure 16. Model for how human STAU1 dimers recruit UPF1 to the 3’-UTR of an mRNA that is targeted for SMD and promote its decay (from Gleghorn et al., 2013)

The dimerization of human STAU1 domain 5 promotes UPF1 binding. Binding of human STAU1 domain 3 and domain 4 to targeted dsRNA activates mRNA decay. Shown here is an intermolecular SBS formed by base pairing of an mRNA 3’-UTR (top strand) with a long noncoding RNA (bottom). RBD, RNA binding domain; SSM: STAU1-swapping motif; SBS: STAU1 binding site; 7mG, 7-methyl guanosine cap; AUG, translation initiation codon; Ter, normal termination codon; An, poly(A) tail (Gleghorn et al. 2013).

STAU1 binds to dsRNA with specific residues

Among the four dsRNA-binding domains in human STAU1, dsRBD3 and dsRBD4 play the most significant role in binding to dsRNA. Also, dsRBD3 binds dsRNA with higher affinity than dsRBD4 (Gleghorn et al. 2013).

As mentioned before, the number of dsRBDs differs among dsRNA binding protein families. However, what matters is not the number of dsRBDs in a dsRBP, but the number of dsRBDs that actually participate in the interaction. For instance, RNaseIII is a type of ribonuclease that binds to and cleaves dsRNA. RNaseIII therefore has two kinds of domains, a dsRNA binding domain and a ribonuclease domain. During the interaction with dsRNA, two RNaseIII molecules with four domains in total sometimes work together. The model in Figure 16 implies that, under ideal circumstances, two human STAU1 molecules with 4 binding domains in total are involved in the interaction. Consequently, the binding of human STAU1 to dsRNA may be similar to the binding of RNaseIII to dsRNA. Therefore, we can use the published model of RNaseIII binding to dsRNA (PDB: 2NUG) to simulate the binding between STAU1 and its target mRNA. Since STAU1 does not contain a domain that cleaves RNA, it recruits the nonsense-mediated mRNA decay factor UPF1 to promote dsRNA degradation (Gong & Maquat, 2011).

Inasmuch as there is no reported structure of human STAU1 binding to dsRNA, we compared the interacting residues of 13 dsRBP binding models from 3 dsRNA binding proteins. From this comparison, we predict that residues S4, E8, P29, H30, K54 and K55 on human STAU1 dsRBD3, and S4, Q8, P29, R30, K54 and K55 on dsRBD4, are more likely to interact with dsRNA duplexes formed between ASD-related lncRNAs and the 3’-UTRs of the putative targets (Fig. 17). Except for P29, these residues of dsRBD3 and dsRBD4 all belong to the evolutionarily conserved residues displayed in Figure 15.

Figure 17. Comparison of amino acid residues that interact with dsRNA from different dsRNA binding protein models and prediction of interacting residues of human STAU1 dsRBD3 and dsRBD4. AaRNaseIII: Aquifex aeolicus RNaseIII; hRNA: human RNA helicase A (RHA); DmStaufen: Drosophila melanogaster Staufen; hSTAU1: human STAU1; D3: dsRBD3; D4: dsRBD4; Red: predicted dsRNA interacting residues of human STAU1 dsRBD3 and dsRBD4 with a high probability. Purple: predicted dsRNA interacting residues of human STAU1 dsRBD3 and 4 with a lower probability.

Modeling the binding of STAU1 domains 3 and 4 to dsRNA formed by ASD lncRNA and the 3’-UTR regions of their target mRNAs

Using the aligned sequences of the Alu-containing 3’-UTR region of UGT1A1 and lncRNA T65857 (Fig. 6), we generated a dsRNA structure and created a model for the binding of human STAU1 to the dsRNA via STAU1 dsRBD3 and dsRBD4 (Fig. 18). In the same manner, we modeled hSTAU1 dsRBD3 and dsRBD4 binding to the dsRNA formed from lncRNA R11217 and CHM mRNA (Fig. 19), lncRNA H85885 and RCAN3 mRNA (Fig. 20), lncRNA N73227 and FOXK1 mRNA (Fig. 21), and lncRNA N47010 and NAP1L1 mRNA (Fig. 22). For each of these binding models, we also identified the interacting residues by assessing the hydrogen bonding potential between amino acid residues and the RNA (Figs. 23-27).
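Identifying residues with hydrogen bonding potential can be approximated, in its simplest form, by a distance test between polar atoms: a residue is flagged when one of its nitrogen or oxygen atoms lies within a cutoff distance of a nitrogen or oxygen atom of the RNA. The sketch below shows only this simplified criterion and is not the procedure implemented in MOE or Chimera, which also consider geometry. The 3.5 Å cutoff, the atom dictionaries, the function names, and all coordinates are assumptions for illustration.

```python
import math

HBOND_CUTOFF = 3.5  # angstroms; an assumed donor-acceptor distance cutoff

def is_polar(atom):
    """Treat nitrogen and oxygen atoms as potential hydrogen-bond donors or acceptors."""
    return atom["element"] in ("N", "O")

def find_contact_residues(protein_atoms, rna_atoms, cutoff=HBOND_CUTOFF):
    """Return sorted labels of residues with a polar atom within `cutoff` of a polar RNA atom."""
    rna_polar = [a for a in rna_atoms if is_polar(a)]
    contacts = set()
    for atom in protein_atoms:
        if not is_polar(atom):
            continue
        if any(math.dist(atom["xyz"], r["xyz"]) <= cutoff for r in rna_polar):
            contacts.add(atom["residue"])
    return sorted(contacts)

# Hypothetical toy coordinates (not taken from any of the models in this study).
toy_protein = [
    {"residue": "K54", "element": "N", "xyz": (0.0, 0.0, 0.0)},
    {"residue": "S4", "element": "O", "xyz": (10.0, 0.0, 0.0)},
    {"residue": "A10", "element": "C", "xyz": (3.0, 0.0, 1.0)},    # carbon: ignored
    {"residue": "E8", "element": "O", "xyz": (20.0, 20.0, 20.0)},  # too far from polar RNA atoms
]
toy_rna = [
    {"element": "O", "xyz": (3.0, 0.0, 0.0)},
    {"element": "O", "xyz": (10.0, 2.0, 2.0)},
    {"element": "P", "xyz": (20.0, 20.0, 21.0)},  # phosphorus: ignored
]

contacts = find_contact_residues(toy_protein, toy_rna)
print(contacts)
```

A distance-only test tends to overestimate contacts because it ignores donor-hydrogen-acceptor angles, so its output is best read as a list of candidates rather than confirmed hydrogen bonds.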

The interacting residues of the five models (Fig. 28) are largely consistent with the predictions (Fig. 17). However, in the models of STAU1 binding to the duplexes formed by lncRNA H85885 and lncRNA N73227, dsRBD3 has fewer interacting residues than dsRBD4 and thus appears to bind the RNA duplex with lower affinity (Fig. 28), which was not expected.

Figure 18. Models of STAU1 binding to dsRNA formed by lncRNA T65857 and UGT1A1. Models were constructed using MOE and CHIMERA. Red: α helix; Yellow: β sheet; White: dsRNA ribbon; hSTAU1 d3: human STAU1 domain 3; hSTAU1 d4: human STAU1 domain 4; SBS2: STAU1-binding site 2. The sequence of dsRNA shown below was from Figure 6:

Figure 19. Models of STAU1 binding to dsRNA formed by lncRNA R11217 and CHM. Models were constructed using MOE and CHIMERA. Red: α helix; Yellow: β sheet; White: dsRNA ribbon; hSTAU1 d3: human STAU1 domain 3; hSTAU1 d4: human STAU1 domain 4; SBS1: STAU1-binding site 1. The sequence of dsRNA shown below was from Figure 8:

Figure 20. Models of STAU1 binding to dsRNA formed by lncRNA H85885 and RCAN3. Models were constructed using MOE and CHIMERA. Red: α helix; Yellow: β sheet; White: dsRNA ribbon; hSTAU1 d3: human STAU1 domain 3; hSTAU1 d4: human STAU1 domain 4; SBS2: STAU1-binding site 2. The sequence of dsRNA shown below was from Figure 10:

Figure 21. Models of STAU1 binding to dsRNA formed by lncRNA N73227 and FOXK1. Models were constructed using MOE and CHIMERA. Red: α helix; Yellow: β sheet; White: dsRNA ribbon; hSTAU1 d3: human STAU1 domain 3; hSTAU1 d4: human STAU1 domain 4; SBS2: STAU1-binding site 2. The sequence of dsRNA shown below was from Figure 12:

Figure 22. Models of STAU1 binding to dsRNA formed by lncRNA N47010 and NAP1L1. Models were constructed using MOE and CHIMERA. Red: α helix; Yellow: β sheet; White: dsRNA ribbon; hSTAU1 d3: human STAU1 domain 3; hSTAU1 d4: human STAU1 domain 4; SBS2: STAU1-binding site 2. The sequence of dsRNA shown below was from Figure 14:

Figure 23. Interacting residues in model of human STAU1 binding domains 3 and 4 binding to dsRNA formed by lncRNA T65857 and mRNA UGT1A1.

Cyan: interacting amino acids; Gray: dsRNA ribbon; Red: oxygens; Blue: nitrogens; The sequence of dsRNA shown below was from Figure 6:

Figure 24. Interacting residues in model of human STAU1 binding domains 3 and 4 binding to dsRNA formed by lncRNA R11217 and mRNA CHM.

Cyan: interacting amino acids; Gray: dsRNA ribbon; Red: oxygens; Blue: nitrogens; The sequence of dsRNA shown below was from Figure 8:

Figure 25. Interacting residues in model of human STAU1 binding domains 3 and 4 binding to dsRNA formed by lncRNA H85885 and mRNA RCAN3.

Cyan: interacting amino acids; Gray: dsRNA ribbon; Red: oxygens; Blue: nitrogens; The sequence of dsRNA shown below was from Figure 10:

Figure 26. Interacting residues in model of human STAU1 binding domains 3 and 4 binding to dsRNA formed by lncRNA N73227 and mRNA FOXK1.

Cyan: interacting amino acids; Gray: dsRNA ribbon; Red: oxygens; Blue: nitrogens; The sequence of dsRNA shown below was from Figure 12:

Figure 27. Interacting residues in model of human STAU1 binding domains 3 and 4 binding to dsRNA formed by lncRNA N47010 and mRNA NAP1L1.

Cyan: interacting amino acids; Gray: dsRNA ribbon; Red: oxygens; Blue: nitrogens; The sequence of dsRNA shown below was from Figure 14:

Figure 28. Comparison of predicted interacting amino acid residues in models of human STAU1 binding through dsRBD3 and 4 to dsRNA formed by the five lncRNAs containing Alu-elements in their 3’-UTR regions and their respective target mRNAs. Red: predicted dsRNA interacting residues of human STAU1 dsRBD3 and dsRBD4 with a high probability. Purple: predicted dsRNA interacting residues of human STAU1 dsRBD3 and 4 with a lower probability; Blue: interacting residues of human STAU1 dsRBD3 and dsRBD4 in models.

Discussion

LncRNAs are highly expressed in human neural progenitor cells as well as in brain tissues, which suggests that lncRNAs are likely to be involved in brain cellular processes and pathologies (Hecht et al., 2015). Another study suggested that aberrantly expressed lncRNAs in autistic brain lead to dysregulation of protein-coding genes in autism (Ziats & Rennert, 2013). Beyond gene mutations, the discovery of lncRNAs antisense to ASD-related genes provided new evidence for epigenetic deregulation of genes in ASD (Wang et al., 2015). Given the large gaps in our understanding of the roles of lncRNAs in autism, this study proposes one possible and novel mechanism through which lncRNAs may regulate the expression of genes related to autism, specifically via STAU1-mediated decay of mRNA.

Based on our discovery of Alu elements in 5 ASD-associated lncRNAs, we hypothesized that these lncRNAs may be involved in STAU1-mediated decay of their target mRNAs. Here, we not only show that these five lncRNAs can potentially form dsRNA with the 3’-UTR regions of a number of mRNAs, but also identify putative STAU1 binding sites on each of the dsRNA duplexes resulting from lncRNA-target mRNA binding, and model the binding interactions between the dsRBDs of STAU1 and the dsRNAs. Since all of the lncRNAs in question showed decreased expression in subjects with ASD in comparison to control subjects (Hu et al., 2009), we predict that there would be less STAU1-mediated decay of the respective target mRNAs, resulting in relatively higher expression of the target mRNAs in ASD. Indeed, predicted target mRNAs FOXK1, NAP1L1, and HOOK3 of lncRNAs N73227, N47010, and T65857, respectively, each showed increased expression in lymphoblastoid cells from ASD subjects relative to controls in an earlier study from our laboratory (Hu et al., 2009), further supporting our hypothesis. With respect to biological functions, accelerated demethylation of the FOXK1 promoter in sperm has been correlated with paternal age, a risk factor for autism (Atsem et al., 2016). Moreover, the same study showed that FOXK1 promoter demethylation, which is associated with increased expression, was also elevated in blood samples from children with ASD in comparison to age-matched controls. Thus, increased FOXK1 expression as a result of reduced SMD due to decreased regulatory lncRNA might also contribute to the pathobiology of ASD. Dysregulated expression of HOOK3, another mRNA that is potentially regulated by lncRNA-directed SMD, may also contribute to ASD. HOOK3 is a microtubule cargo adaptor protein that is involved in centrosome function as well as in maintaining the balance between neurogenesis and neural progenitor proliferation (Ge et al., 2010). Thus, disturbance of this balance is likely to impact brain development during embryogenesis. In fact, abnormal neurogenesis and proliferation of neural progenitor cells have been associated with ASD through studies on induced pluripotent stem cells derived from individuals with ASD and controls (Marchetto et al., 2017). The involvement of other potential target mRNAs of the ASD-associated lncRNAs (Table 1) in ASD requires further study.

Limitations of this study and future studies

To identify the potential target mRNAs of 5 ASD-associated lncRNAs, we used the Alu repetitive sequences of each lncRNA to search for mRNAs that contain complementary Alu elements in their 3’-UTRs. However, it is unlikely that this approach identified all possible target mRNAs. Future studies could focus on developing a more efficient and accurate computational approach to search for potential target mRNAs with 3’-UTRs complementary to ASD lncRNAs. In addition, the 3D structural models generated here may not be robust enough to prove convincingly that the ASD lncRNAs can actually form RNA duplexes with their target mRNAs. Experimental methods such as immunoprecipitation (IP) of STAU1-bound dsRNA duplexes may help demonstrate the existence of dsRNAs formed by ASD lncRNAs and potential target mRNAs. On the other hand, it is possible that lncRNAs without Alu elements or other repetitive sequences may also form RNA duplexes with mRNAs capable of inducing STAU1-mediated decay, thus complicating the analyses by increasing the number of genes potentially regulated by SMD.
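The search for 3’-UTRs complementary to a lncRNA segment can be illustrated with a deliberately simple scan: take the reverse complement of the lncRNA segment and slide it along the 3’-UTR, reporting windows that match with at most a few mismatches. The Python sketch below is an ungapped toy version of that idea, not the search used in this study and not a substitute for an alignment tool. The function names, the mismatch tolerance, and both sequences are invented for illustration, and G-U wobble pairs and gaps are ignored.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]

def find_duplex_sites(lnc_segment, utr, max_mismatches=1):
    """Return (start, mismatches) for each UTR window able to pair with lnc_segment.

    Pairing is ungapped Watson-Crick only; wobble pairs count as mismatches.
    """
    target = reverse_complement(lnc_segment)
    k = len(target)
    hits = []
    for start in range(len(utr) - k + 1):
        window = utr[start:start + k]
        mismatches = sum(a != b for a, b in zip(window, target))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits

# Hypothetical toy sequences (not real lncRNA or 3'-UTR sequences).
lnc_segment = "GGCUCACG"
utr = "AAACGUGAGCCUUUCGUGAGACAA"

hits = find_duplex_sites(lnc_segment, utr)
print(hits)
```

Real Alu-mediated duplexes are much longer and imperfect, so a practical approach would need gapped alignment and some estimate of duplex stability in place of this fixed-length mismatch count.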

Based on previous dsRBP binding models from other dsRBP families and species, there are multiple possibilities for the selection of binding sites and the combinations of dsRBDs. This study presents only the one binding model that we believe to be the most suitable for hSTAU1 (Fig. 16). As for the recognition of SBS, it is possible that certain RNA bases are more likely to interact with STAU1, a topic to be explored in future studies.

This study tested only one possible function of Alu elements in lncRNAs that were differentially expressed in autism. Alu elements have other functions as well. For example, Alu retrotransposition-mediated deletions (Callinan et al., 2005) may cause genomic instability leading to many human genetic disorders. Other studies have shown that Alu elements can function as gene regulators via alternative splicing, RNA editing, and translation regulation (Häsler & Strub, 2006). It is likely that these functions of Alu elements also play an important role in autism, which may open new directions for future research.

Conclusion

In this study, we tested the hypothesis that ASD-associated differentially expressed lncRNAs with Alu elements may be involved in STAU1-mediated decay of specific target mRNAs relevant to autism. Using bioinformatics analyses, we provide support for this hypothesis by identifying a list of potential target mRNAs containing Alu elements complementary to those of the ASD-associated lncRNAs, locating potential STAU1 binding sites on the RNA duplexes formed by lncRNAs and their target mRNAs, and generating plausible models of hSTAU1 binding to target dsRNAs. Based on these results, we conclude that some of the differentially expressed lncRNAs with Alu elements may regulate gene expression in ASD through STAU1-mediated decay of specific target mRNAs. Experimental techniques such as immunoprecipitation may provide further evidence that STAU1 actually binds to these RNA duplexes.

References

Atsem, S., Reichenbach, J., Potabattula, R., Dittrich, M., Nava, C., Depienne, C., ... & Haaf, T. (2016). Paternal age effects on sperm FOXK1 and KCNA7 methylation and transmission into the next generation. Human molecular genetics, 25(22), 4996-5005.

Batzer, M. A., & Deininger, P. L. (2002). Alu repeats and human genomic diversity. Nature reviews genetics, 3(5), 370-379.

Callinan, P. A., Wang, J., Herke, S. W., Garber, R. K., Liang, P., & Batzer, M. A. (2005). Alu retrotransposition-mediated deletion. Journal of molecular biology, 348(4), 791-800.

Cosentino, G. P., Venkatesan, S., Serluca, F. C., Green, S. R., Mathews, M. B., & Sonenberg, N. (1995). Double-stranded-RNA-dependent protein kinase and TAR RNA-binding protein form homo- and heterodimers in vivo. Proceedings of the National Academy of Sciences, 92(21), 9445-9449.

De Lucas, S., Oliveros, J. C., Chagoyen, M., & Ortín, J. (2014). Functional signature for the recognition of specific target mRNAs by human Staufen1 protein. Nucleic acids research, 42(7), 4516-4526.

Deininger, P. L., & Batzer, M. A. (1999). Alu repeats and human disease. Molecular genetics and metabolism, 67(3), 183-193.

Fierro-Monti, I., & Mathews, M. B. (2000). Proteins binding to duplexed RNA: one motif, multiple functions. Trends in biochemical sciences, 25(5), 241-246.

Ge, X., Frank, C. L., de Anda, F. C., & Tsai, L. H. (2010). Hook3 interacts with PCM1 to regulate pericentriolar material assembly and the timing of neurogenesis. Neuron, 65(2), 191-203.

Gleghorn, M. L., Gong, C., Kielkopf, C. L., & Maquat, L. E. (2013). Staufen1 dimerizes through a conserved motif and a degenerate dsRNA-binding domain to promote mRNA decay. Nature structural & molecular biology, 20(4), 515-524.

Gong, C., & Maquat, L. E. (2011). lncRNAs transactivate STAU1-mediated mRNA decay by duplexing with 3’-UTRs via Alu elements. Nature, 470(7333), 284-288.

Häsler, J., & Strub, K. (2006). Alu elements as regulators of gene expression. Nucleic Acids Research, 34(19), 5491-5497.

Hecht, P. M., Ballesteros-Yanez, I., Grepo, N., Knowles, J. A., & Campbell, D. B. (2015). Noncoding RNA in the transcriptional landscape of human neural progenitor cell differentiation. Frontiers in neuroscience, 9.

Hu, V. W., & Lai, Y. (2013). Developing a predictive gene classifier for autism spectrum disorders based upon differential gene expression profiles of phenotypic subgroups. North American Journal of Medicine and Science, 6(3), 107-116.

Hu, V. W., & Steinberg, M. E. (2009). Novel clustering of items from the Autism Diagnostic Interview‐Revised to define phenotypes within autism spectrum disorders. Autism Research, 2(2), 67-77.

Hu, V. W., Sarachana, T., Kim, K. S., Nguyen, A., Kulkarni, S., Steinberg, M. E., ... & Lee, N. H. (2009). Gene expression profiling differentiates autism case–controls and phenotypic variants of autism spectrum disorders: Evidence for circadian rhythm dysfunction in severe autism. Autism research, 2(2), 78-97.

Johnson, M., Zaretskaya, I., Raytselis, Y., Merezhuk, Y., McGinnis, S., & Madden, T. L. (2008). NCBI BLAST: a better web interface. Nucleic acids research, 36(suppl_2), W5-W9.

Kent, W. J. (2002). BLAT—the BLAST-like alignment tool. Genome research, 12(4), 656-664.

Larsson, A. (2014). AliView: a fast and lightweight alignment viewer and editor for large datasets. Bioinformatics, 30(22), 3276-3278.

Ma, L., Bajic, V. B., & Zhang, Z. (2013). On the classification of long non-coding RNAs. RNA biology, 10(6), 924-933.

Marchetto, M. C., Belinson, H., Tian, Y., Freitas, B. C., Fu, C., Vadodaria, K. C., ... & Nunez, Y. (2017). Altered proliferation and networks in neural cells derived from idiopathic autistic individuals. Molecular psychiatry, 22(6), 820.

Martel, C., Dugré-Brisson, S., Boulay, K., Breton, B., Lapointe, G., Armando, S., Trépanier, V., Duchaîne, T., Bouvier, M., & Desgroseillers, L. (2010). Multimerization of Staufen1 in live cells. RNA, 16(3), 585-597.

Mercer, T. R., Dinger, M. E., & Mattick, J. S. (2009). Long non-coding RNAs: insights into functions. Nature Reviews Genetics, 10(3), 155-159.

McWilliam, H., Li, W., Uludag, M., Squizzato, S., Park, Y. M., Buso, N., ... & Lopez, R. (2013). Analysis tool web services from the EMBL-EBI. Nucleic acids research, 41(W1), W597-W600.

Park, E., & Maquat, L. E. (2013). Staufen‐mediated mRNA decay. Wiley Interdisciplinary Reviews: RNA, 4(4), 423-435.

Pettersen, E. F., Goddard, T. D., Huang, C. C., Couch, G. S., Greenblatt, D. M., Meng, E. C., & Ferrin, T. E. (2004). UCSF Chimera—a visualization system for exploratory research and analysis. Journal of computational chemistry, 25(13), 1605- 1612.

Price, A. L., Eskin, E., & Pevzner, P. A. (2004). Whole-genome analysis of Alu repeat elements reveals complex evolutionary history. Genome research, 14(11), 2245-2252.

Ramos, A., Grünert, S., Adams, J., Micklem, D. R., Proctor, M. R., Freund, S., ... & Varani, G. (2000). RNA recognition by a Staufen double‐stranded RNA‐binding domain. The EMBO Journal, 19(5), 997-1009.

Salem, A. H., Ray, D. A., Xing, J., Callinan, P. A., Myers, J. S., Hedges, D. J., ... & Batzer, M. A. (2003). Alu elements and hominid phylogenetics. Proceedings of the National Academy of Sciences, 100(22), 12787-12791.

Stothard, P. (2000). The sequence manipulation suite: JavaScript programs for analyzing and formatting protein and DNA sequences.

St. Johnston, D., Brown, N. H., Gall, J. G., & Jantsch, M. (1992). A conserved double-stranded RNA-binding domain. Proceedings of the National Academy of Sciences, 89, 10979-10983.

Tempel, S. (2012). Using and understanding RepeatMasker. Mobile Genetic Elements: Protocols and Genomic Applications, 29-51.

Wang, Y., Zhao, X., Ju, W., Flory, M., Zhong, J., Jiang, S., ... & Shen, C. (2015). Genome-wide differential expression of synaptic long noncoding RNAs in autism spectrum disorder. Translational psychiatry, 5(10), e660.

Ziats, M. N., & Rennert, O. M. (2013). Aberrant expression of long noncoding RNAs in autistic brain. Journal of Molecular Neuroscience, 49(3), 589-593.