
Comparative sequence analyses of genome and transcriptome reveal novel transcripts and variants in the Asian elephant Elephas maximus

PULI CHANDRAMOULI REDDY1,†, ISHANI SINHA2,†, ASHWIN KELKAR1,†, FARHAT HABIB1,†, SAURABH J. PRADHAN1, RAMAN SUKUMAR2,‡ and SANJEEV GALANDE1,*,‡

1Centre of Excellence in Epigenetics, Indian Institute of Science Education and Research, Pashan, Pune 411 008, India
2Centre for Ecological Sciences, Indian Institute of Science, Bangalore 560 012, India

*Corresponding author (Fax, 91-20-25865086; Email, [email protected])

†These authors contributed equally to the work. ‡Shared senior authorship.

The Asian elephant Elephas maximus and the African elephant Loxodonta africana, which diverged 5–7 million years ago, exhibit differences in their physiology, behaviour and morphology. A comparative genomics approach would be useful and necessary for evolutionary and functional genetic studies of elephants. We sequenced the E. maximus genome and mapped it to the L. africana reference at ~15X coverage. Through comparative sequence analyses, we have identified Asian elephant specific homozygous, non-synonymous single nucleotide variants (SNVs) that map to 1514 protein coding genes, many of which are involved in olfaction. We also present the first report of a high-coverage transcriptome sequence in E. maximus from peripheral blood lymphocytes. We have identified 103 novel protein coding transcripts and 66 long non-coding RNAs (lncRNAs). We also report the presence of 181 protein domains unique to elephants when compared to other Afrotheria species. Each of these findings can be further investigated to gain a better understanding of functional differences unique to elephant species, as well as those unique to elephantids in comparison with other mammals. This work therefore provides a valuable resource to explore the immense research potential of comparative analyses of transcriptome and genome sequences in the Asian elephant.

[Reddy PC, Sinha I, Kelkar A, Habib F, Pradhan SJ, Sukumar R and Galande S 2015 Comparative sequence analyses of genome and transcriptome reveal novel transcripts and variants in the Asian elephant Elephas maximus. J. Biosci. 40 891–907] DOI 10.1007/s12038-015-9580-y

Keywords. Asian elephant; comparative genomics; gene prediction; transcriptome

Supplementary materials pertaining to this article are available on the Journal of Biosciences Website at http://www.ias.ac.in/jbiosci/dec2015/supp/Reddy.pdf

http://www.ias.ac.in/jbiosci J. Biosci. 40(5), December 2015, 891–907, © Indian Academy of Sciences

Published online: 4 December 2015

1. Introduction

Elephants are the largest living land mammals in the world, belonging to the family Elephantidae and order Proboscidea, which evolved 60 million years ago (Shoshani and Tassy 1996; Sukumar 2003). Two elephant species that are well known are the Asian elephant, Elephas maximus, and the African elephant, Loxodonta africana. Morphological as well as molecular evidence suggests that the African elephant comprises two distinct species, the savannah elephant L. africana and the forest elephant L. cyclotis (Roca et al. 2001; Rohland et al. 2007, 2010). Together, they comprise the only living representatives of the order Proboscidea, which includes 175 species (Shoshani and Tassy 2005). Proboscideans in general and members of the family Elephantidae in particular, including the recently extinct mammoths (Mammuthus spp.), shared several unique features common to the extant elephants, such as large body size, probosces that formed long muscular trunks and unique dental characteristics (Osborn and Percy 1936). One of the distinctive features of present-day elephants, the tusks, was also present during the evolution of early Proboscideans (Osborn and Percy 1936). Elephants are also known for their exceptional olfactory sense and chemical communication (Rasmussen and Krishnamurthy 2000; Shoshani et al. 2006), cognition (Bates et al. 2008; Byrne et al. 2009; Foerder et al. 2011), unique vocal communication (Payne 2003; Nair et al. 2009; Stoeger and de Silva 2014) and social organization (Moss 1988; de Silva and Wittemyer 2012).

The closest living relatives of elephants are the hyraxes (order Hyracoidea) and sea cows (Sirenia), followed by aardvarks (Tubulidentata), elephant shrews (Macroscelidea), tenrecs and golden moles (Afrosoricida), which together comprise the superorder Afrotheria (Springer et al. 1997). Despite the radical differences in body size and visible morphological features, members of this group share several developmental characteristics such as a primitive placenta (Carter et al. 2006), increased number of thoracolumbar vertebrae (Sánchez-Villagra et al. 2007), late eruption of permanent teeth (Asher and Lehmann 2008), non-descended male gonads (Werdelin and Nilsonne 1999), etc. Molecular and phylogenetic analyses have conclusively proved the monophyletic origin of this group (reviewed in Tabuce et al. 2008; Svartman and Stanyon 2012).

A number of investigations on elephants thus far have been focused on their ecology, behaviour, phylogeny, population genetics and conservation. However, since the early 1990s, molecular analyses for resolving phylogenies and population genetics in elephants have been carried out by sequencing a few mitochondrial regions or nuclear genes and microsatellite loci. With the advent of high-throughput sequencing technologies, sequence data from hundreds of loci were used to resolve fine-scale phylogenetic relationships between the African elephants, the Asian elephant and the woolly mammoth (Rohland et al. 2010). This study indicated that the Asian elephant was the closest relative of the woolly mammoth, and that the forest and savannah African elephants were at least as diverged from each other as the Asian elephant from the woolly mammoth. Another study used shotgun and RAD sequencing to identify de novo markers for a genetically isolated population of elephants in Borneo with extremely low levels of genetic diversity, which could be used in population genetic studies (Sharma et al. 2012).

The first version of the African elephant genome assembly was made available in May 2005, and the third version used for all analysis here was released in June 2009 (https://www.broadinstitute.org/scientific-community/science/projects/mammals-models/elephant/elephant-genome-project). Subsequently, the whole genome of the Asian elephant was sequenced at low coverage (Dastjerdi et al. 2014). Recently, high coverage genome sequences of three Asian elephants along with two mammoth genomes have been published (Lynch et al. 2015). High-coverage whole genome sequences are also available for several other Afrotherians, but remain poorly analysed.

A few studies on comparative genomics have also been carried out in elephants. One of the first comparative genomic studies compared the placental transcriptome of the African elephant with that of other eutherian mammals (Hou et al. 2012). This study identified thousands of commonly expressed genes, as well as lineage-specific genes that tracked the evolution of placental characteristics across the mammalian phylogenetic tree. The fibroblast transcriptome of African elephants was also sequenced and used in a study to trace the evolution of the Y-chromosome in mammals (Cortez et al. 2014). A phylogeny-based comparison of olfactory genes across 13 mammals revealed an unusually large repertoire of functional olfactory receptor genes among the African elephants when compared to other mammals (Niimura et al. 2014) and substantiated the well-documented observations that elephants have an exceptional sense of smell. In another study, the molecular bases of cold adaptation in mammoths were investigated by comparing whole genome sequences of Asian elephants and mammoths against the African elephant reference genome and by identifying and analysing mammoth specific SNVs that fall within protein coding regions (Lynch et al. 2015). Two recent studies have elucidated a possible mechanism for the reduced occurrence of cancer in Asian and African elephants (Abegglen et al. 2015; Sulak et al. 2015) compared to other mammals including humans. Comparative genome and phylogenetic analyses showed that the elephantid lineage had evolved multiple copies of TP53, a tumour suppressor gene that is present as a single copy in other mammals. Some of the extra copies of TP53 were shown to be expressed, and elephant cells showed a higher sensitivity to DNA-damaging radiation, and an increased induction of apoptosis when compared to human cells.

In this study, we have carried out whole genome and transcriptome next generation sequencing and performed a comparative genomics analysis with other elephantid sequences available from public databases. We were able to uncover several interesting leads that can be investigated further and will significantly contribute to building genomic resources for functional genetic research in elephants in the future. We identified Asian elephant specific homozygous single nucleotide variants (SNVs), many of which mapped within protein coding regions and resulted in non-synonymous changes. We used our transcriptome data to improve the functional annotation of predicted genes in the L. africana reference genome. We identified potentially novel transcripts from expressed transcript fragments in our dataset that did not map to any known or predicted genes in

the L. africana genome. We also analysed the unmapped transcript fragments to identify lncRNAs which might be involved in tissue and cell-type-specific functions. In addition, we compared the domain architectures of elephant proteome sequences with those of three other Afrotheria species and identified domain architectures that are overrepresented in elephants.

2. Methods

2.1 Sample collection, extraction of nucleic acids and next-generation sequencing

Peripheral blood was collected from the ear vein of a captive male Asian elephant Jayaprakash from the Karnataka Forest Department’s elephant camp at Bandipur National Park, Karnataka, India (permit details provided in Acknowledgments). Venous blood was collected in PAXgene blood RNA Tubes (Cat no. 762165 from PreAnalytiX) and stored at −20°C. White blood cells (WBCs) were extracted from the blood by lysing the red blood cells (RBCs) with 0.7 M KCl. Genomic DNA from total lymphocytes was isolated from peripheral blood (WBCs) using PureLink genomic DNA isolation kit. DNA integrity was assessed by electrophoresis on 0.8% agarose gel. Total RNA from lymphocytes was isolated from peripheral blood (WBCs) using Trizol and the quality of RNA was analysed using Bioanalyzer 2100 RNA 6000 nano Chip (Agilent). Five μg of high quality genomic DNA and 1 μg RNA samples with RIN value of 7.2 were used for library preparation and sequencing.

For whole genome sequencing, a paired end library with 400–500 bp insert size and a mate-pair library with 2–3 kb insert size were prepared. The paired end (PE) library was prepared using Illumina TruSeq library kit and 2×100 bp sequencing reads were obtained using Illumina HiSeq 2500, for approximately 15X coverage. The mate pair library was prepared using the Illumina Nextera Mate-Pair library kit and sequenced to obtain 2×250 bp reads for 5X coverage. Bioanalyzer plots were used at every step to assess fragmentation sizes and final library sizes. Prior to loading on the sequencer, the quantity of the library was estimated using picogreen assay (Thermo Fisher) and quantitative PCR (qPCR).

For the transcriptome sequencing, the RNA-seq library was prepared using TruSeq RNA Sample Prep Kits (Illumina). Briefly, poly-A containing mRNA molecules were enriched using poly-T oligo-attached magnetic beads. mRNA was fragmented to obtain desired size using divalent cations under elevated temperature. The cleaved RNA fragments were used to synthesize first strand cDNA using reverse transcriptase and random primers followed by second strand cDNA synthesis using DNA Polymerase I and RNase H. These cDNA fragments were end repaired, and adapters were ligated. The library was further enriched with PCR to create the final cDNA library and sequenced using 2×100 bp PE sequencing with Illumina HiSeq 2500. Bioanalyzer plots were used at every step to assess mRNA quality, enrichment success, fragmentation sizes, and final library sizes. Prior to loading on the sequencer, the quantity of the library was estimated using Qubit fluorimeter (Thermo Fisher) and qPCR.

2.2 Genome sequence mapping

Raw FASTQ reads were checked using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Reads had excellent overall quality for both the paired-end and mate pair runs. Read mapping was performed using the Burrows-Wheeler Aligner (BWA) (Li and Durbin 2009), with the L. africana (loxAfr3) genome sequence as reference. The bwa mem algorithm was used with default parameters for the paired-end reads and with the -L 80,80 option for mate-pair reads to suppress clipped alignments. The loxAfr3 genome was obtained from the Broad Institute. Both the mate-pair and paired end runs achieved a greater than 98% alignment rate, though ~9% of the reads were clipped for paired end alignments and 82% of the reads were clipped for the mate-pair sequences. The high rate of clipping for mate-pair sequences indicates potential issues in downstream analysis; hence we focused on the paired end run for most of the further analysis. The raw genomic DNA sequence data files were deposited at NCBI SRA under BioProject ID PRJNA301482, BioSample ID SAMN04260428 and run ID SRR2912975.

2.3 Genome rearrangement analyses

To visualize genome rearrangements using Circos (Krzywinski et al. 2009) we used uniquely aligning paired end reads whose ends aligned onto different chromosomes (samtools view -F 1294 -q 1). We then binned the reads into 1 kb bins and, to weed out chimeric reads or misalignments, considered as reliable links only those bins where more than 25 reads were aligning into bins in either end. This was performed using shell and awk scripts. To visualize the E. maximus genome, all scaffolds were laid end to end in order of decreasing length with no gaps, and links identified were connected via arcs.

2.4 Variant calling and identification of Asian elephant specific SNVs

We carried out variant calling of our Asian elephant genome sequence using African elephant genome as reference. The Samtools/vcftools variant calling pipeline was used with default parameters except for -D250 parameter (ignoring

regions with alignment depth greater than 250, as was followed by Lynch et al. 2015) and combining both the paired-end and mate-pair alignments.

We also followed a modified Galaxy pipeline (https://usegalaxy.org/r/woolly-mammoth) and a dataset of Asian elephant and mammoth variants, ‘mammoth SNPs’ from Lynch et al. (2015) to filter the subset of homozygous variants present in all the three Asian elephant sequences but not in the two mammoth sequences. We compared the homozygous variants from our data with the subset of Asian elephant specific homozygous variants obtained from the Lynch et al. (2015) dataset to determine the extent of overlap. We then joined the subset of Asian elephant specific SNVs with the ‘mammoth coding SNPs’ dataset (also from Lynch et al. 2015) to identify the Asian elephant specific variants that mapped to protein coding regions and resulted in non-synonymous mutations. We retrieved the FASTA protein sequences of all the genes in this dataset from the L. africana genome and carried out a BLAST search using default parameters to identify the top hits in other mammals. All the unique Uniprot IDs for the top mammalian homologs were analysed for associations with known gene ontology using DAVID (da Huang et al. 2009; Huang et al. 2009) and the BiNGO app in Cytoscape (Shannon et al. 2003; Maere et al. 2005). The list of biological process GO terms from the DAVID output was analysed for semantic similarity with REVIGO (Supek et al. 2011) and the overlap between the gene lists of significant GO terms was determined using the UpSetR package (Gehlenborg 2015) in RStudio (R Core Team 2014).

2.5 Transcriptome analysis

We performed reference based transcriptome assembly using the Tuxedo pipeline (Trapnell et al. 2012) using the L. africana genome sequence as reference with annotations from NCBI (http://www.ncbi.nlm.nih.gov/genome/224?genome_assembly_id=28577). We first executed TopHat with transcriptome reads and obtained alignments for all reads. We downloaded the transcriptome sequences from two male African elephant fibroblasts available in GEO dataset (GSM1227965 and GSM1278046) (Cortez et al. 2014) and performed similar analyses for comparison with our transcriptome. Counts of reads aligned to mRNA features were calculated using the MultiCov option in BEDtools (Quinlan and Hall 2010) using L. africana GFF as reference. Next, the read count data of all the three transcriptomes were modified by adding 1 to all counts to include genes not expressed in one of the samples and to avoid artifacts in log transformation. The data were then log transformed and filtered to include data with a minimum of 5 counts per mRNA feature, and only genes with a minimum difference of 1 log transformed value among the samples were selected. Subsequently, the genes were centered based on log values and hierarchical clustering was carried out based on calculated Pearson correlation and displayed based on centroid of the clusters using Cluster3 (de Hoon et al. 2004). A cluster of highly expressed genes from the WBC transcriptome of this study compared to the two fibroblast transcriptomes was further analysed for associated functions. In addition to the cluster analysis, we performed comparative expression analysis using Cuffdiff (Trapnell et al. 2010) and visualized the data by drawing a volcano plot with CummeRbund in R programming language (Goff et al. 2012). The raw RNA sequence data files were deposited at NCBI SRA under BioProject ID PRJNA301482, BioSample ID SAMN04247652 and run ID SRR2911072.

Functional enrichment analysis of the selected cluster of genes was carried out by BiNGO app in Cytoscape (Shannon et al. 2003; Maere et al. 2005). A hypergeometric test was performed to calculate the overrepresented GO terms with Bonferroni Family-Wise Error Rate (FWER) correction. Ontology terms belonging to Biological Process with less than 0.05 p-value were selected and mapped hierarchically based on parent and child relation.

2.6 Annotation of predicted protein models

Functional annotation used a hybrid GFF (General Feature Format) file generated by combining the L. africana reference available at NCBI with the novel transcripts identified in the current study. Here, the InterProScan-5 pipeline (Jones et al. 2014) was used according to the software manual against the ProDom, ProSiteProfiles, PRINTS, Pfam and TIGRFAM databases. Associated GO (gene ontology) terms were extracted and a custom GO Annotation File (GAF) was built and used for further analysis.

2.7 Identification of novel transcripts

We used the Tuxedo pipeline to identify novel transcripts (Trapnell et al. 2012) that were not annotated in L. africana reference genome. First, we used Cufflinks to generate transcript fragments (transfrags) and a gene transfer format (GTF) file of the same using TopHat aligned reads using L. africana reference GTF from NCBI. The output GTF file from Cufflinks was used for comparing reference GTF and Cufflinks generated GTF using Cuffcompare. From the Cuffcompare output file, transfrags aligned uniquely to regions marked as intergenic in the L. africana reference genome were considered as putatively novel transcripts. These were further filtered by TransDecoder for identifying potential novel transcripts coding for peptides. These coding

peptides were annotated using InterProScan-5 to determine the associated function.

Gene ontology analysis of novel genes identified in this study was done using REVIGO. Here, the list of GO terms associated with novel transcripts was used with 0.9 allowed similarity using the SimRel option for semantic calculation against the UniProt database. Results of REVIGO are represented as a treemap.

2.8 Identification of lncRNAs

We applied TransDecoder (Haas et al. 2013) and Coding Potential Calculator (CPC) (Kong et al. 2007), which classify RNAs based on presence or absence of protein coding domains and codon usage frequencies, to the 3904 unannotated transcripts.

The transcripts that were predicted to be ncRNAs by both the tools were then further analysed using the RFam database to identify any possible homology to known ncRNAs. The annotations were based on a Covariance model and/or Hidden Markov Model that denote a specific signature representing a specific lncRNA. These models are generated using all the known transcripts for each reported and annotated ncRNA. These predictive models were used to compare against our sequences of interest, for identification of any significant match with a known ncRNA.

2.9 Comparative domain composition analysis

A comparative analysis of protein domain architectures was performed by mapping domain HMMs (hidden Markov models) onto the reference proteomes of three Afrotheria species: tenrec (Echinops telfairi, http://www.ncbi.nlm.nih.gov/genome/234), manatee (Trichechus manatus latirostris, http://www.ncbi.nlm.nih.gov/genome/24630), and aardvark (Orycteropus afer, http://www.ncbi.nlm.nih.gov/genome/11840) with the L. africana refseq proteome build from the current analysis and refseq available in NCBI database (http://www.ncbi.nlm.nih.gov/genome/224). Here, initially hmmscan available in the HMMER3 suite (Johnson et al. 2010) was performed with default parameters with all the proteomes and proteins. Unique domain sets were extracted from each hmmscan outfile and all lists were compared using the VENNY tool (Oliveros 2007). Based on this analysis, domains unique to the L. africana sequence were selected for further characterization. Different levels of domain architectures based on their numbers were calculated for understanding the group specific changes within the selected species.

Generic commands and scripts used in this study are given in supplementary file 2.

3. Results

3.1 Genome sequence analysis

A total of 472,003,946 reads were obtained from the paired end sequencing and in 95.63% of the reads, both reads in the pair were mapped, amounting to a mean coverage of 13.9X with a mean mapping quality of 28.89. A total of 88,920,076 reads were sequenced from the mate-pair library and 99.14% of the reads had both reads in the pair mapped, amounting to a mean coverage of 4.15X with a mean mapping quality of 25.45 (supplementary figure 1A and B). BWA was able to find alignments for >99% of the mate-pair reads and >95% of the paired end reads, though multi-part alignments were obtained for many reads as is possible in the presence of structural variations, gene fusion or reference mis-assembly. This is expected as the reference genome is from a different species.

3.2 Genome rearrangement analyses

A Circos plot of paired end reads from E. maximus that aligned on different scaffolds of the L. africana genome indicates putative genomic rearrangements or scaffold interconnections in the Asian elephant as compared to the African elephant genome (figure 1). To simplify the figure and show only reliable rearrangement sites, we isolated uniquely aligning reads that aligned on different scaffolds. The reads were then binned in 1 kb bins, and only regions where more than 25 reads aligned within the same bin on both ends were connected with an arc.

3.3 Variant analysis

We identified a total of 28.29 million SNVs in our genome sequence when compared to the L. africana genome, which included indels, heterozygous as well as homozygous variants. Of these, about 21.7 million SNVs were homozygous. The size of the African elephant genome is 3.1 Gb (http://www.ncbi.nlm.nih.gov/genome/224?genome_assembly_id=28577) and the Asian elephant genome is assumed to be of similar size (Dastjerdi et al. 2014). The average SNV density was 8.85 SNVs per kb, with most of the regions containing about 20 SNVs per kb (supplementary figure 2). In parallel, we identified ~1.9 million homozygous Asian elephant specific SNVs, defined as variants (compared to the African elephant reference genome) that were homozygous across all the three Asian elephant sequences in the Lynch et al. (2015) dataset but absent from both the mammoth sequences, and found a 95% overlap between our dataset and the Lynch et al. Asian elephant specific homozygous SNVs. Of the Asian elephant specific SNVs, 7155 SNVs mapped to protein coding regions of the L. africana


Figure 1. Circos plot depicting paired end reads from E. maximus that aligned on different scaffolds of the L. africana genome. The L. africana genome (loxAfr3) is laid out clockwise starting from the 12 o'clock position with scaffolds in order of decreasing length. As there are over 2300 scaffolds in loxAfr3, gaps between the scaffolds are not shown. The numbers on the circumference refer to gigabases, with the elephant genome being ~3.1 Gb. This figure shows clusters of paired end reads where the two ends of the pairs align on different scaffolds. A large number of these occur in the smaller scaffolds, which could potentially be connected into larger scaffolds, providing a more complete genome for E. maximus using the L. africana genome as a reference. The links have been coloured by scaffold position to make differentiating them easier.
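The link filter behind this figure (1 kb bins, more than 25 supporting read pairs) can be sketched in a few lines. This is a minimal Python illustration and not the shell and awk scripts used in the study; the function name `reliable_links`, the tuple layout of the input and the choice to count read pairs per pair of bins are assumptions made for the example.

```python
from collections import Counter

def reliable_links(pairs, bin_size=1000, min_reads=25):
    """Return {(end_a, end_b): count} for bin pairs supported by more than min_reads read pairs.

    pairs: iterable of (scaffold1, pos1, scaffold2, pos2), one tuple per read pair
    whose two ends align uniquely to different scaffolds (hypothetical layout).
    """
    counts = Counter()
    for scaf1, pos1, scaf2, pos2 in pairs:
        end_a = (scaf1, pos1 // bin_size)  # bin index of the first end
        end_b = (scaf2, pos2 // bin_size)  # bin index of the second end
        # Sort the two ends so that A->B and B->A pairs support the same link.
        key = tuple(sorted((end_a, end_b)))
        counts[key] += 1
    # Keep only links with strictly more than min_reads supporting pairs.
    return {link: n for link, n in counts.items() if n > min_reads}

# Toy data with made-up scaffold names: 26 pairs support one link, 10 another.
pairs = [("scaffold_1", 100 + i, "scaffold_7", 2500 + i) for i in range(26)]
pairs += [("scaffold_2", 50 + i, "scaffold_9", 10 + i) for i in range(10)]
links = reliable_links(pairs)
```

With these toy pairs only the link between scaffold_1 (bin 0) and scaffold_7 (bin 2) survives, since the second group has too few supporting pairs to pass the threshold.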

genome, and 1859 of them resulted in non-synonymous changes in the coding sequences. The list of all Asian elephant specific non-synonymous mutations is provided in supplementary table 1. The 1859 non-synonymous variants mapped to 1514 unique protein coding genes, of which 1506 were mapped to genomes of mammals by BLAST search, while 8 returned no significant hits. FASTA sequences of the 1506 genes were used to retrieve 1367 unique Uniprot IDs in higher mammals, 1259 of which were mapped to GO terms in the DAVID annotation tool. We focused on 81 biological process GO terms (supplementary table 2) and semantic analyses with REVIGO indicated three terms with highly significant log10 p-values: sensory perception of smell, cell surface receptor protein signalling pathway and G-protein coupled receptor protein signalling pathway (figure 2A). Applying a stringent Bonferroni corrected cutoff of 0.05 yielded 7 significant biological processes: GO:0007608 (sensory perception of smell), GO:0007606 (sensory perception of chemical stimulus), GO:0007600 (sensory perception), GO:0050890 (cognition), GO:0050877 (neurological system process), GO:0007186 (G-protein coupled receptor protein signalling pathway), GO:0007166 (cell surface receptor linked signal transduction) (table 1).

Table 1. List of the top ten biological process GO terms

GO ID | Term | Count | % | p-Value | Fold Enrichment | Bonferroni
GO:0007608 | sensory perception of smell | 113 | 8.98 | 4.21784E−48 | 4.95 | 9.36782E−45
GO:0007606 | sensory perception of chemical stimulus | 114 | 9.05 | 4.28044E−44 | 4.51 | 9.50687E−41
GO:0007600 | sensory perception | 136 | 10.80 | 6.0228E−35 | 3.17 | 1.33766E−31
GO:0050890 | cognition | 139 | 11.04 | 2.61192E−31 | 2.89 | 5.80108E−28
GO:0050877 | neurological system process | 153 | 12.15 | 2.06571E−25 | 2.39 | 4.58795E−22
GO:0007186 | G-protein coupled receptor protein signalling pathway | 141 | 11.20 | 5.25912E−23 | 2.37 | 1.16805E−19
GO:0007166 | cell surface receptor linked signal transduction | 180 | 14.30 | 3.77473E−17 | 1.83 | 8.38367E−14
GO:0000279 | M phase | 38 | 3.02 | 1.16E−05 | 2.18 | 0.025339209
GO:0007018 | Microtubule-based movement | 19 | 1.51 | 2.49E−05 | 3.18 | 0.053890643
GO:0022403 | cell cycle phase | 42 | 3.34 | 7.92E−05 | 1.92 | 0.161278645

Only the top 7 GO terms (the first seven rows, marked in green) were used for further analyses.

However, there was a significant overlap between these categories. The lists of genes categorized in the GO terms sensory perception of smell (GO:0007608) and sensory perception of chemical stimulus (GO:0007606) almost completely overlapped with each other, and were subsets of the GO term sensory perception (GO:0007600), which in turn was found to overlap significantly with cognition (GO:0050890). Moreover, sensory perception and cognition were found to be subsets of neurological system process (GO:0050877), which overlapped significantly with the two remaining GO terms, G-protein coupled receptor protein signalling pathway (GO:0007186) and cell surface receptor linked signal transduction (GO:0007166) (figure 2B). Significantly, a set of 113 olfactory receptor genes (supplementary table 3) was common to all the seven GO categories, while 38 genes were unique to cell surface receptor linked signal transduction and 11 were unique to neurological system process. The GO categories under cellular processes and molecular function indicated similar trends with enrichment of functions related to olfactory receptor activity (GO:0004984), plasma membrane (GO:0005886) and integral to membrane (GO:0016021). An independent analysis of functional enrichment using BiNGO in Cytoscape also yielded similar results with an enrichment of olfactory receptor activity (supplementary figure 3).
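The kind of overlap analysis described above, where each gene is assigned to the exact combination of GO categories it belongs to, can be illustrated with a short sketch. The study used the UpSetR package in R; the Python function below, its name `exclusive_intersections` and the toy gene identifiers are assumptions for illustration only.

```python
from collections import defaultdict

def exclusive_intersections(gene_sets):
    """Group genes by the exact combination of named sets that contain them."""
    membership = defaultdict(set)
    for name, genes in gene_sets.items():
        for gene in genes:
            membership[gene].add(name)
    groups = defaultdict(set)
    for gene, names in membership.items():
        groups[frozenset(names)].add(gene)
    return dict(groups)

# Toy, made-up gene sets nested the way the GO categories above are nested.
gene_sets = {
    "smell": {"OR1", "OR2"},
    "sensory perception": {"OR1", "OR2", "G3"},
    "signal transduction": {"OR1", "OR2", "G3", "G4", "G5"},
}
groups = exclusive_intersections(gene_sets)
# Size of each exclusive intersection, keyed by the sorted category names.
sizes = {tuple(sorted(names)): len(genes) for names, genes in groups.items()}
```

In this toy case two genes are shared by all three categories and two are unique to the broadest one, which mirrors on a small scale the pattern of a common olfactory receptor core plus category-specific genes.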


Figure 2. Gene ontology terms (biological processes) enriched in the Asian elephant specific non-synonymous SNVs. (A) Semantic analyses of all biological process GO terms identified in our dataset in REVIGO. Colour scale is the log10 of p-value for each GO term. Three terms: sensory perception of smell, G-protein-coupled receptor signalling pathway and cell surface receptor signalling pathway are significantly enriched. (B) Analysis of overlap between the 7 GO term gene sets with significant Bonferroni corrected p values using UpSetR. Sensory perception of smell (GO:0007608), Sensory perception of chemical stimulus (GO:0007606), Sensory perception (GO:0007600), Cognition (GO:0050890), G-protein coupled receptor protein signalling pathway (GO:0007186), Neurological system process (GO:0050877) and Cell surface receptor linked signal transduction (GO:0007166).
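For readers unfamiliar with how an enrichment p-value, a fold enrichment and a Bonferroni correction relate to one another, the following is a minimal, generic sketch. It is not the DAVID or BiNGO implementation (DAVID in particular uses its own modified Fisher exact statistic); the function names and the toy numbers are assumptions chosen so the arithmetic can be checked by hand.

```python
from math import comb

def hypergeom_upper_tail(k, population, annotated, sample):
    """P(X >= k): chance of seeing at least k annotated genes in a random sample.

    population: genes in the background; annotated: background genes carrying
    the GO term; sample: genes in the list being tested.
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(k, upper + 1)
    )
    return favourable / total

def bonferroni(p_value, n_tests):
    """Bonferroni family-wise correction, capped at 1."""
    return min(1.0, p_value * n_tests)

def fold_enrichment(k, population, annotated, sample):
    """Observed fraction of annotated genes divided by the background fraction."""
    return (k / sample) / (annotated / population)

# Toy numbers: 20 background genes, 4 carry the term, all 3 sampled genes carry it.
p = hypergeom_upper_tail(3, 20, 4, 3)      # equals 4 / 1140
p_adj = bonferroni(p, 7)                   # assuming 7 terms were tested
fold = fold_enrichment(3, 20, 4, 3)        # (3/3) / (4/20)
```

The correction simply multiplies each raw p-value by the number of terms tested, which is why a term with a raw p-value in the 1E−05 range can still land close to the 0.05 cutoff once many terms are tested together.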


3.4 Transcriptome analysis

Out of 35,052,348 reads from the RNA sequencing data, 74.1% aligned to the L. africana reference genome, suggesting good coverage of the total transcriptome. Next, to evaluate the correlation in expression levels among different samples and to identify the genes unique to WBCs, we performed a cluster analysis together with the two transcriptome sequences of African elephant male fibroblasts. We focused on genes whose log transformed expression values differed by a minimum of 1 between the two fibroblast datasets and our WBC dataset. As expected, the two fibroblast transcriptomes clustered together, while our WBC transcriptome occupied a separate node. We could identify 2 distinct clusters of genes, one that was up-regulated and another that was down-regulated in the WBCs when compared to the two fibroblast transcriptomes (figure 3A). Upon annotation using NCBI RefSeq for L. africana containing 29,784 proteins, we could annotate 28,638 protein coding genes using InterProScan. We obtained 3,904 novel transcript fragments (supplementary file 1) that did not map to the L. africana reference genome annotation file in our transcriptome analysis. A custom annotation file was created using our annotations for functional enrichment analysis. A more detailed analysis showed a sub-cluster of 458 genes (supplementary table 4) that were highly expressed only in our WBC transcriptome, but not in the fibroblasts. Functional enrichment using our custom annotation files indicated functions associated with the immune system (figure 3B). This suggests good representation of the WBC-specific transcriptome in our data. An additional differential expression analysis was performed using the Cuffdiff tool (part of the Tuxedo suite). In this analysis 9,019 genes showed significant differential expression in comparison with the fibroblast transcriptome (figure 3C). In addition, 321 of the 458 genes in the WBC-specific sub-cluster from the cluster analysis were also found to be significantly upregulated in WBCs by the Cuffdiff analysis.

We identified 3,904 transcripts aligning to intergenic regions that do not have any gene annotation associated with them. Out of these, 1,458 could potentially code for peptides. We were able to annotate 103 of these 1,458 predicted peptide models using InterProScan. Most of these 103 novel proteins (supplementary table 5) were found to be associated with immune-related functions (figure 4).

3.5 Identification of lncRNAs

Since we used gene prediction models based on alternative splicing patterns to align the transcripts in our dataset to the L. africana assembly, and a majority of lncRNAs have a maximum of 2 exons and do not undergo alternative splicing, it was possible that at least some of the 3,904 un-annotated transcripts were lncRNAs that could not be mapped to predicted gene models. Thus an independent analysis of these 3,904 transcripts was performed to explore the possibility of them being lncRNAs. Since lncRNAs have unique sequence features that separate them from protein coding genes, any combination of the following 3 parameters could be used to identify potential lncRNAs (Dinger et al. 2008): (1) nucleotide frequencies of protein coding regions are dictated by non-random codon usage; (2) protein coding regions typically contain known protein domains, so sequence analysis against databases like Pfam can be used to identify coding regions; (3) protein coding regions tend to have sequence similarities with protein sequence databases, as conservation of amino acid sequence exerts a much higher selection constraint on protein coding regions than on non-coding genes. TransDecoder predicted 1,503 transcripts as potential ncRNAs while Coding Potential Calculator (CPC) predicted 2,904. Of these, 66 ncRNAs were identified based on homology to known lncRNAs that are expressed in other mammals, with an e-value cutoff of 0.01 (supplementary table 6). Our dataset includes some extremely well studied lncRNAs (summarized in table 2), such as PVT1_3, DLEU2_1 and ZNRD1-AS1_1, that were identified with high significance. PVT1_3 is an oncogene located at a translocation breakpoint in Burkitt’s lymphoma, DLEU2_1 is a potential tumour suppressor implicated in acute B cell lymphocytic leukaemia, and ZNRD1-AS1_1 is expressed specifically from the MHC I locus. Our dataset also contains the well-characterized lncRNAs MALAT1 and NEAT1, which are universally expressed and highly conserved in mammals with respect to expression pattern and function. The highly significant matches to these lncRNAs lend credibility to our dataset as a good representation of the transcribed lncRNAs in E. maximus. In addition to these ‘benchmark’ lncRNAs, we also detected other transcripts that are reported to be involved in diverse functions, ranging from circadian rhythms to cellular apoptosis (table 2).

3.6 Domain architecture

We compared the domain architectures of the proteome of L. africana with those of three other species of Afrotheria: the tenrec, manatee and aardvark. The total number of predicted proteins in these species ranges from ~23,000 to ~29,000, of which 5,936 proteins shared similar domain architectures. We observed a significant gain in the number of proteins in most of the architectures in the African elephant. Specifically, proteins with 11 to 20 domains have doubled in number in the African elephant in comparison to tenrecs (E. telfairi) (figure 5A) (supplementary table 7). Moreover, 181 domains were found to be unique to the elephant


Figure 3. Gene expression analysis of the Asian elephant WBC transcriptome. (A) Hierarchical clustering of genes based on their expression in two African elephant fibroblast transcriptomes and the Asian elephant WBC transcriptome from the present study. Clustering was performed with Cluster3 after log transforming the counts of reads aligned to transcripts. The scale bar represents values normalized to the maximum read count; red transcripts are upregulated, green transcripts are downregulated and black transcripts show no change across the samples. (B) Functional enrichment analysis of the selected cluster of genes upregulated in the WBC transcriptome. Enrichment analysis of the highlighted cluster of transcripts was carried out with the BiNGO tool in Cytoscape using a custom GO Annotation File (GAF) made from the InterProScan-5 output file. The scale bar shows the p-value score. The size of each circle is proportional to the number of genes with the corresponding GO term. (C) Volcano plot displaying differentially expressed genes between the African elephant fibroblast transcriptome (q1) and the Asian elephant WBC transcriptome (q2). Differential expression analysis was carried out with cuffdiff (part of the Tuxedo suite) and the results were plotted using the cummeRbund package in R. Each dot represents a gene; red dots denote significantly differentially regulated genes.
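The gene selection step behind panel (A), keeping genes whose log transformed expression differs by at least 1 between the WBC dataset and both fibroblast datasets, can be illustrated with a minimal sketch. The log base (2), the pseudocount, the requirement that both differences point in the same direction, and all gene names and counts below are assumptions made for illustration; they are not taken from the study's pipeline.

```python
import math

def log_expr(count, pseudocount=1.0):
    """Log2-transform a read count; the pseudocount (an assumption) avoids log(0)."""
    return math.log2(count + pseudocount)

def select_variable_genes(counts, min_diff=1.0):
    """Label genes whose WBC log expression differs from BOTH fibroblast samples.

    counts maps gene -> (fibroblast_1, fibroblast_2, wbc) raw read counts.
    A gene is kept when both differences are >= min_diff in absolute value
    and have the same sign (same-sign rule is an assumption of this sketch).
    """
    selected = {}
    for gene, (fib1, fib2, wbc) in counts.items():
        d1 = log_expr(wbc) - log_expr(fib1)
        d2 = log_expr(wbc) - log_expr(fib2)
        if abs(d1) >= min_diff and abs(d2) >= min_diff and (d1 > 0) == (d2 > 0):
            selected[gene] = "up" if d1 > 0 else "down"
    return selected

# Toy counts (hypothetical gene names and values, not data from the study)
toy = {
    "geneA": (10, 12, 400),   # much higher in WBC
    "geneB": (300, 280, 5),   # much lower in WBC
    "geneC": (50, 55, 60),    # roughly unchanged
}
result = select_variable_genes(toy)
print(result)
```

Genes passing such a filter would then be handed to a clustering tool; the sketch covers only the filtering criterion, not the hierarchical clustering itself.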


Figure 4. Gene Ontology treemap of novel transcripts identified in the current study. Poly A enriched transcripts from the WBC transcriptome of the Asian elephant were assembled using the L. africana genome sequence as reference and compared with its gene annotation file. Transcripts aligned to intergenic regions with no predicted gene models were considered novel. All novel genes with open reading frames (ORFs) longer than 100 nucleotides were annotated with InterProScan-5, and the associated gene ontology (GO) terms were analysed with REVIGO. Box size indicates the log10 p-value of the GO enrichment. Boxes of the same colour represent GO terms with semantic similarity greater than 0.9, calculated by the SimRel method in REVIGO. Parent GO terms representing similar gene ontologies are shown in a bigger font on a grey background.
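The ORF length criterion mentioned in this legend (ORFs longer than 100 nucleotides) can be sketched as a six-frame scan. This is only an illustration of the criterion, not the TransDecoder or CPC procedure used in the study; the ORF definition (ATG through the first in-frame stop codon, stop included in the length) and the function names are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Reverse complement; any base other than A/C/G/T is mapped to N."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq.upper()))

def longest_orf_length(seq):
    """Length in nt of the longest ATG-to-stop ORF over all six reading frames.

    ORFs without an in-frame stop codon are not counted (a simplification).
    """
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                        # first start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, i + 3 - start)  # include the stop codon
                    start = None
    return best

def has_candidate_orf(seq, min_len=100):
    """True when the transcript carries an ORF longer than min_len nucleotides."""
    return longest_orf_length(seq) > min_len

# Hypothetical transcript: ATG + 40 alanine codons + stop = 126 nt
coding = "ATG" + "GCT" * 40 + "TAA"
print(longest_orf_length(coding), has_candidate_orf(coding))
```

Transcripts passing such a length filter would still need domain or homology evidence, as described in the text, before being treated as protein coding.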

(figure 5B) and only 33 of these domains had related GO terms (supplementary table 8).

4. Discussion

Functional genetic studies among ‘non-model’ or wild animals, such as elephants, are often severely restricted because of limitations such as access to samples, problems with establishing proper controls and treatments, biological variability and the inability to carry out genetic manipulations, to name a few. Many such ‘non-model’ organisms may hold the key to answering pertinent questions with respect to the evolution of specific adaptive traits. While it is not possible to create laboratory models that would cover the entire spectrum of biological diversity, the recent availability of whole genome and transcriptome sequences of several organisms has allowed the use of comparative analyses to provide valuable insights into the evolution of phylogenetic groups as well as the evolutionary history of specific traits (Mikkelsen et al. 2007; Ellegren 2008; Warren et al. 2008; Lindblad-Toh et al. 2011; Parker et al. 2013; Keane et al. 2015). In particular, comparative transcriptomics not only helps to identify novel transcripts and novel gene expression patterns, but also aids in the discovery and characterization of non-coding RNAs that are widely expressed and perform many important functions (Trapnell et al. 2010; Xie et al. 2014). Of particular interest are the long non-coding (lnc)RNAs, which are longer than 200 nt, often multi-exonic and engaged in a variety of functions like embryonic development (Ulitsky et al. 2011), dosage compensation (Chan et al. 2011) and chromatin remodelling (Rinn et al. 2007). They may even act as scaffolds for chromatin remodelling complexes like PRC2 (involved in X-inactivation) (Jeon and Lee 2011) and influence gene regulation of miRNA target genes (Salmena et al. 2011). Comparative genomics may also yield insights into genomic rearrangements, including events such as duplications, inversions or translocations that change the structure of the genome. It has been hypothesized that genomic rearrangements could promote speciation by reducing gene flow by suppressing recombination and extending the effects of linked isolation genes (Rieseberg 2001).

Elephants are extraordinary mammals that have evolved numerous characteristics unique to them. Identifying the genetic bases of evolutionary changes is an extremely challenging task. Comparative sequence analyses overcome many of these limitations, but the availability of high quality genome and transcriptome sequence is critical to carry out such analyses. Despite the availability of the whole genome


Table 2. List of well-characterized lncRNAs that were identified in the Asian elephant transcriptome with relevance to the function of the immune system

Cuff ID for ncRNA   Putative lncRNA   e-value    Description
CUFF.27.1           Pinc              0.01       Highly conserved mammalian ncRNA, involved in cell survival and cell cycle progression, discovered in mammary glands
CUFF.4121.1         bxd_6             0.00075    Involved in regulation of Ultrabithorax via transcriptional interference
CUFF.5432.1         FMR1-AS1          0.0015     X linked FMR gene antisense RNA, reported to have anti-apoptotic function
CUFF.7140.1         PVT1_3            0.0026     ncRNA located at translocation breakpoint in Burkitt’s lymphoma. PVT1 is an oncogene that generates a non-coding RNA
CUFF.10590.1        DLEU2_1           1.2E−24    Lymphocytic specific lncRNA, potential tumor suppressor gene implicated in acute B cell lymphocytic leukaemia
CUFF.13499.1        ZNRD1-AS1_1       0.0037     lncRNA encoded from the MHC-1 locus
CUFF.17087.1        Mico1             0.0051     Imprinted ncRNA involved in circadian rhythms
CUFF.18583.1        MALAT1            6.9E−19    Highly conserved mammalian lncRNA ubiquitously expressed in all tissues. Regulates expression of metastasis related genes and cell motility
CUFF.18578.1        NEAT1_3           9.2E−35    Constitutively expressed in all tissues, structural component of nuclear paraspeckles. Also reported to be a virus induced ncRNA (VINC)
CUFF.15862.1        KCNQ1OT1          0.0043     Paternally imprinted expression. The main function of this RNA is to maintain silencing of the paternally silenced KCNQ1 gene by recruiting PRC2 repressive complex to the enhancer in cis

lncRNA names and descriptions are from the RFam database. Cuff IDs are generated from the transcriptome assembly reported here.

sequence of L. africana, most of the research on elephants in the past, as indicated by published literature, has been in the areas of behaviour, ecology, reproductive biology and breeding, infectious disease, life history traits and welfare of wild and captive populations, in addition to population genetics and demography, , conflicts with people and other conservation issues.

At the time this project was conceived, no whole genome sequences were available for the Asian elephant. Given the split between Asian and African elephants that occurred approximately 7 million years ago (Rohland et al. 2010) and their different evolutionary histories, we expected to find genetic differences between the two species. While Loxodonta remained confined to the African continent, Elephas migrated into Eurasia with a distinct phylogeography shaped by the glacial-interglacial cycles of the (Vidya et al. 2009). We therefore aimed to develop a resource for genome-wide sequence analyses of the Asian elephant and comparative analyses with the genome sequences of the African elephant, other members of Afrotheria and mammals in general. We carried out whole genome and transcriptome sequencing of an Asian elephant from India. Earlier this year, high coverage sequences of three Asian elephants were reported (Lynch et al. 2015). However, the current study is the first report of high coverage transcriptome sequencing from the Asian elephant.

From the comparative analysis of genome sequences, we report a large number of Asian elephant specific, non-synonymous SNVs when compared to the African elephant and mammoth genomes. One of the most significant groups of genes in this dataset was the olfactory receptor (OR) genes. The African elephant genome has previously been shown to encode an expanded repertoire of OR genes when compared to other mammals (Niimura et al. 2014). The OR gene family is known to be rapidly evolving, with frequent gains (due to gene duplication) and losses (due to the conversion of functional copies into pseudogenes) (reviewed in Jiang and Matsunami 2015). The number of OR genes present in a genome has also been found to vary between different mammalian species, on the basis of adaptation to their respective environments (Hayden et al. 2010; Niimura 2012). For example, primates (including humans) have far fewer functional OR genes as well as pseudogenes, presumably because primates depend more on vision, which is highly developed (Matsui et al. 2010). It is therefore not surprising to find species specific differences in more than a hundred OR genes between the Asian and the African


Figure 5. Comparison of protein domain architectures across Afrotheria. (A) Bar plot of protein sets classified by the number of domains per protein, drawn for each category. The protein RefSeq databases of the selected Afrotherians, with the L. africana RefSeq set merged with predicted peptides from the novel transcripts identified in the present study, were scanned against the HMM profiles of domains available in the Pfam database using hmmscan of HMMER3. (B) Venn diagram of the distribution of Pfam domains in four Afrotherian species, plotted with VENNY. Unique and shared domains among Afrotherians are shown in different colours along with their numbers. [Tenrec (Echinops telfairi), Elephant (L. africana), Manatee (Trichechus manatus latirostris) and Aardvark (Orycteropus afer)].
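The two summaries shown in this figure, proteins binned by number of domains (A) and domains unique to one species (B), reduce to counting and set operations once domain hits have been parsed. The sketch below assumes the hits are already available as simple Python mappings; the bin boundaries other than the 11 to 20 class, the function names and the placeholder domain identifiers are all hypothetical, and repeated occurrences of a domain in one protein are counted separately.

```python
from collections import Counter

def bin_label(n_domains):
    """Assign a domain count to a bin; only the 11-20 bin is named in the text."""
    if n_domains == 1:
        return "1"
    if n_domains <= 5:
        return "2-5"
    if n_domains <= 10:
        return "6-10"
    if n_domains <= 20:
        return "11-20"
    return ">20"

def domain_count_histogram(protein_domains):
    """Count proteins per bin; proteins without any domain hit are skipped."""
    return Counter(bin_label(len(hits)) for hits in protein_domains.values() if hits)

def unique_domains(species_domains, species):
    """Domains found in `species` and in none of the other species."""
    others = set().union(
        *(doms for name, doms in species_domains.items() if name != species)
    )
    return species_domains[species] - others

# Hypothetical inputs: protein -> list of domain hits, species -> set of domains
toy_proteins = {
    "p1": ["PF_A"],
    "p2": ["PF_A", "PF_B", "PF_C"],
    "p3": ["PF_A"] * 12,
    "p4": [],
}
toy_species = {
    "elephant": {"PF_A", "PF_B", "PF_X"},
    "manatee": {"PF_A", "PF_B"},
    "tenrec": {"PF_A"},
    "aardvark": {"PF_A", "PF_C"},
}
hist = domain_count_histogram(toy_proteins)
elephant_only = unique_domains(toy_species, "elephant")
print(dict(hist), elephant_only)
```

Set difference against the union of the other species is the same operation a four-way Venn diagram displays in its outermost regions.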

elephant. Further investigation is necessary to determine whether the SNVs identified in this study result in functional differences in perception of olfactory cues between the two species. However, it is not difficult to imagine that as the ancestral Elephas migrated into the Asian land mass, they would have encountered significantly different environments, climate and vegetation patterns, glaciation events (Vidya et al. 2009) and other conditions that were different from those present in Africa. The differences in OR gene sequences between the Asian and African elephants

therefore might reflect a functional adaptation to the different volatile chemical signals they encountered. For instance, the pheromone (Z)-7-dodecen-1-yl acetate, released in urine by female Asian elephants to signal estrous status, is different from that of the African elephant (Rasmussen and Schulte 1998). In the future, it would also be interesting to closely examine the number and overlap of OR genes between the three elephantids (mammoths, Asian and African), as well as between elephantids and other members of Afrotheria, which comprise species that have distinct diets and occupy diverse habitats.

We explored genomic rearrangements in the E. maximus genome by examining discordant alignments of E. maximus paired-end reads on the loxAfr3 assembly. The number of rearrangements is far greater in the smaller scaffolds (toward the 10–12 o’clock region in the figure), which indicates that the smaller scaffolds are sites where a better assembly than the current one may be obtained.

Since we had generated transcriptome data from a single individual at a single time point, we cross validated our data by comparing our WBC transcriptome with the fibroblast transcriptomes of the African elephant. As expected, our dataset showed a gene clustering pattern that was distinct from the two fibroblast transcriptomes, and an enrichment of immune system related genes. We then used our transcriptome sequence to perform transcriptome based annotation of the L. africana genome. Many genomes sequenced as part of large sequencing projects are annotated using computational pipelines based on homology to human and mouse genes, which may not be able to identify species or lineage specific features. Based on our transcriptome data, we were able to identify more than a hundred novel expressed transcripts with protein coding potential and homology to known protein domains, most of them with homology to immune-system related genes or domains. These genes may either be present and/or expressed only in the Asian elephant, or they may be found in the African elephant as well, but were not annotated previously. Validation of these novel transcripts would reveal whether elephants (in general, or Asian elephants in particular) might have evolved novel immune system related functions. Additional comparative analyses of tissue specific and developmental stage specific transcriptomes would further improve the annotation of the elephant genomes and help identify many more species specific genes, as well as gene expression patterns. A large number of expressed transcripts could not be identified or annotated based on sequence and protein domain comparisons. One possibility is that such transcripts represent non-coding RNAs, which do not have protein coding potential.

We therefore investigated the possibility of expression of non-coding RNAs, and particularly lncRNAs, in the transcriptome. Our study is the first to report the expression of lncRNAs in the Asian elephant. Evidence that lncRNAs perform biologically relevant roles in organisms has accrued from studies that demonstrate the dysregulation of lncRNAs in disease conditions (Gibb et al. 2011; Mitra et al. 2012). Mutations in the lncRNA sequence and alterations to DNA methylation at lncRNA promoters have been demonstrated to be linked to cancer (Du et al. 2013; Zhi et al. 2014). Taken together, these findings indicate that lncRNAs have important yet largely unexplored functions. The occurrence of highly conserved lncRNAs in the Asian elephant transcriptome provides confidence in the quality of our dataset in representing a comprehensive transcriptome of the PBMCs in E. maximus. These lncRNAs need to be functionally validated in an in vivo scenario by functional screens. cDNA cloning of these sequences into mammalian expression vectors followed by transfection into other mammalian cells might be one way of performing such analysis.

A significant number of lncRNAs could not be identified with the RFam database searches, and the reason for this could be two-fold. First, the RFam database may not be comprehensive enough to harbor entries for all possible lncRNAs, for reasons explained above (see sections 2.8 and 3.5). Second, some of the transcripts that have been tagged as putative lncRNAs may be novel and thus have not been reported earlier. An in-depth analysis with functional validation is the only way in which this could be resolved.

Comparative analyses of domain architectures between the available proteomes of L. africana and three other Afrotherian species showed an increased number of proteins in almost every domain category in the African elephant, despite its evolutionary proximity to these species. As expected, we observed that the African elephant protein domain architectures overlapped most with those of the manatee, which is evolutionarily closer to elephants than other Afrotheria species. We also identified 181 proteins with domain architectures that were unique to the African elephant. Further analyses of these proteins and their functions might yield interesting insights into elephant specific traits. It would also be interesting to perform a similar comparison between the different elephantids when the assembly and proteome of the Asian elephant and woolly mammoth become available.

These findings have broader implications in furthering the understanding of a wide range of genetic and evolutionary processes among elephants in particular, and Afrotheria and mammals in general. Our data serve as a primary resource that can be analysed by researchers from various fields to extract specific information relevant to questions of their interest. Further analyses and comparison of whole genome and transcriptome sequences would help unravel the evolutionary history of many more unique traits that may have conferred an adaptive advantage

to the elephantid lineage and enhance our understanding of how these magnificent creatures evolved and diverged. For example, studies focused on key genes of the immune system can be carried out to understand the susceptibility or adaptations of Asian elephants to various types of pathogens, irritants, allergens and toxins. Genetic predispositions to diseases and physiological susceptibility to various environmental and anthropogenic factors can also be studied. This report also provides a stepping-stone for evolutionary biologists to study the evolution of elephant specific traits such as large body size, their metabolic machinery and skeletal structure, tusks and aspects related to olfaction.

Acknowledgements

This work was supported by a grant from the Centre of Excellence in Epigenetics program of the Department of Biotechnology, India, and from IISER Pune to SG. PCR is supported by a Postdoctoral Fellowship from the Department of Biotechnology, India. AK is supported by an Early Career Fellowship from the Wellcome Trust-DBT India Alliance. FH is supported by the Centre of Excellence in Epigenetics Program. SJP is supported by a Senior Research Fellowship from the Council of Scientific and Industrial Research, India. IS is supported by the Centre for Ecological Sciences through a grant-in-aid from the Ministry of Environment, Forest and Climate Change. RS is a JC Bose National Fellow. The elephant blood sample was obtained with the permission and support of the Karnataka Forest Department (vide permit No. B1/WL/CR-25/2013-14).

References

Abegglen LM, Caulin AF, Chan A, Lee K, Robinson R, Campbell MS, et al. 2015 Potential mechanisms for cancer resistance in elephants and comparative cellular response to DNA damage in humans. JAMA 314 1850–1860
Asher RJ and Lehmann T 2008 Dental eruption in afrotherian mammals. BMC Biol. 6 14
Bates LA, Poole JH and Byrne RW 2008 Elephant cognition. Curr. Biol. 18 R544–R546
Byrne RW, Bates LA and Moss CJ 2009 Elephant cognition in perspective. Cogn. Behav. Rev. 4 65–79
Carter AM, Blankenship TN, Enders AC and Vogel P 2006 The fetal membranes of the otter shrews and a synapomorphy for afrotheria. Placenta 27 258–268
Chan KM, Zhang H, Malureanu L, van Deursen J and Zhang Z 2011 Diverse factors are involved in maintaining X chromosome inactivation. Proc. Natl. Acad. Sci. USA 108 16699–16704
Cortez D, Marin R, Toledo-Flores D, Froidevaux L, Liechti A, Waters PD, et al. 2014 Origins and functional evolution of Y chromosomes across mammals. Nature 508 488–493
da Huang W, Sherman BT and Lempicki RA 2009 Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4 44–57
Dastjerdi A, Robert C and Watson M 2014 Low coverage sequencing of two Asian elephant Elephas maximus genomes. Gigascience 3 12
de Hoon MJL, Imoto S, Nolan J and Miyano S 2004 Open source clustering software. Bioinformatics 20 1453–1454
de Silva S and Wittemyer G 2012 A comparison of social organization in Asian elephants and African savannah elephants. Int. J. Primatol. 33 1125–1141
Dinger ME, Pang KC, Mercer TR and Mattick JS 2008 Differentiating protein-coding and noncoding RNA: challenges and ambiguities. PLoS Comput. Biol. 4 e1000176
Du Z, Fei T, Verhaak RGW, Su Z, Zhang Y, Brown M, et al. 2013 Integrative genomic analyses reveal clinically relevant long non-coding RNA in human cancer. Nat. Struct. Mol. Biol. 20 908–913
Ellegren H 2008 Comparative genomics and the study of evolution by natural selection. Mol. Ecol. 17 4586–4596
Foerder P, Galloway M, Barthel T, Moore DE 3rd and Reiss D 2011 Insightful problem solving in an Asian elephant. PLoS One 6 e23251
Gehlenborg N 2015 UpSetR: a more scalable alternative to Venn and Euler diagrams for visualizing intersecting sets. R package version 1.0.0 (http://www.CRAN.R-project.org/package=UpSetR)
Gibb EA, Brown CJ and Lam WL 2011 The functional role of long non-coding RNA in human carcinomas. Mol. Cancer 10 38
Goff L, Trapnell C, Kelley D, et al. 2012 CummeRbund: analysis, exploration, manipulation, and visualization of Cufflinks high-throughput sequencing data. R package version 2no1
Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, et al. 2013 De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8 1494–1512
Hayden S, Bekaert M, Crider TA, Mariani S, Murphy WJ and Teeling EC 2010 Ecological adaptation determines functional mammalian olfactory subgenomes. Genome Res. 20 1–9
Hou Z-C, Sterner KN, Romero R, Than NG, Gonzalez JM, Weckle A, et al. 2012 Elephant transcriptome provides insights into the evolution of Eutherian placentation. Genome Biol. Evol. 45 713–725
Huang DW, Sherman BT and Lempicki RA 2009 Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37 1–13
Jeon Y and Lee JT 2011 YY1 tethers Xist RNA to the inactive X nucleation center. Cell 146 119–133
Jiang Y and Matsunami H 2015 Mammalian odorant receptors: functional evolution and variation. Curr. Opin. Neurobiol. 34 54–60
Johnson LS, Eddy SR and Portugaly E 2010 Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinforma. 11 431
Jones P, Binns D, Chang HY, Fraser M, Li W, McAnulla C, et al. 2014 InterProScan 5: genome-scale protein function classification. Bioinformatics 30 1236–1240


Keane M, Semeiks J, Webb AE, Li YI, Quesada V, Craig T, et al. 2015 Insights into the evolution of longevity from the bowhead genome. Cell Rep. 10 112–122
Kong L, Zhang Y, Ye ZQ, Liu XQ, Zhao SQ, Wei L and Gao G 2007 CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 35 (Web Server issue) W345–W349
Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, et al. 2009 Circos: an information aesthetic for comparative genomics. Genome Res. 19 1639–1645
Li H and Durbin R 2009 Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25 1754–1760
Lindblad-Toh K, Garber M, Zuk O, Lin MF, Parker BJ, Washietl S, et al. 2011 A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478 476–482
Lynch VJ, Bedoya-Reina OC, Ratan A, Sulak M, Drautz-Moses DI, Perry GH, et al. 2015 Elephantid genomes reveal the molecular bases of Woolly Mammoth adaptations to the arctic. Cell Rep. 12 217–228
Maere S, Heymans K and Kuiper M 2005 BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics 21 3448–3449
Matsui A, Go Y and Niimura Y 2010 Degeneration of olfactory receptor gene repertories in primates: no direct link to full trichromatic vision. Mol. Biol. Evol. 27 1192–1200
Mikkelsen TS, Wakefield MJ, Aken B, Amemiya CT, Chang JL, Duke S, et al. 2007 Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences. Nature 447 167–177
Mitra SA, Mitra AP and Triche TJ 2012 A central role for long non-coding RNA in cancer. Front. Genet. 3 17
Moss C 1988 Elephant : thirteen years in the life of an elephant family (New York: W. Morrow)
Nair S, Balakrishnan R, Seelamantula CS and Sukumar R 2009 Vocalizations of wild Asian elephants (Elephas maximus): structural classification and social context. J. Acoust. Soc. Am. 126 2768–2778
Niimura Y 2012 Olfactory receptor multigene family in vertebrates: from the viewpoint of evolutionary genomics. Curr. Genomics 13 103–114
Niimura Y, Matsui A and Touhara K 2014 Extreme expansion of the olfactory receptor gene repertoire in African elephants and evolutionary dynamics of orthologous gene groups in 13 placental mammals. Genome Res. 24 1485–1496
Oliveros J 2007 VENNY: an interactive tool for comparing lists with Venn diagrams
Osborn HF, Percy MR 1936 Proboscidea; a monograph of the discovery, evolution, migrations and of the mastodonts and elephants of the world (New York: American Museum Press)
Parker J, Tsagkogeorga G, Cotton JA, Liu Y, Provero P, Stupka E and Rossiter SJ 2013 Genome-wide signatures of convergent evolution in echolocating mammals. Nature 502 228–231
Payne KB, Langbauer WB Jr and Thomas EM 1986 Infrasonic calls of the Asian elephant (Elephas maximus). Behav. Ecol. Sociobiol. 18 297–301
Quinlan AR and Hall IM 2010 BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26 841–842
R Core Team 2014 R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (http://www.R-project.org)
Rasmussen LEL and Schulte BA 1998 Chemical signals in the reproduction of Asian (Elephas maximus) and African (Loxodonta africana) elephants. Anim. Reprod. Sci. 53 19–34
Rasmussen LEL and Krishnamurthy V 2000 How chemical signals integrate Asian elephant society: the known and the unknown. Zoo Biol. 19 405–423
Rieseberg LH 2001 Chromosomal rearrangements and speciation. Trends Ecol. Evol. 16 351–358
Rinn JL, Kertesz M, Wang JK, Squazzo SL, Xu X, Brugmann SA, et al. 2007 Functional demarcation of active and silent chromatin domains in human HOX loci by non-coding RNAs. Cell 129 1311–1323
Roca AL, Georgiadis N, Pecon-Slattery J and O'Brien SJ 2001 Genetic evidence for two species of elephant in Africa. Science 293 1473–1477
Rohland N, Malaspinas AS, Pollack JL, Slatkin M, Matheus P and Hofreiter M 2007 Proboscidean mitogenomics: chronology and mode of elephant evolution using as outgroup. PLoS Biol. 5 e207
Rohland N, Reich D, Mallick S, Meyer M, Green RE, Georgiadis NJ, et al. 2010 Genomic DNA sequences from mastodon and woolly mammoth reveal deep speciation of forest and savanna elephants. PLoS Biol. 8 e1000564
Salmena L, Poliseno L, Tay Y, Kats L and Pandolfi PP 2011 A ceRNA hypothesis: the Rosetta stone of a hidden RNA language? Cell 146 353–358
Sanchez-Villagra MR, Marcelo R, Narita Y and Kuratani S 2007 Thoracolumbar vertebral number: the first skeletal synapomorphy for afrotherian mammals. Syst. Biodivers. 5 1–7
Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. 2003 Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13 2498–2504
Sharma R, Goossens B, Kun-Rodrigues C, Teixeira T, Othman N, Boone JQ, et al. 2012 Two different high throughput sequencing approaches identify thousands of de novo genomic markers for the genetically depleted Bornean elephant. PLoS One 7 e49533
Shoshani J and Tassy P 1996 The Proboscidea: Evolution and Palaeoecology of Elephants and Their Relatives (Oxford University Press)
Shoshani J and Tassy P 2005 Advances in proboscidean and classification, anatomy and physiology, and ecology and behavior. Quat. Int. 126–128 5–20
Shoshani J, Kupsky WJ and Marchant GH 2006 Elephant brain - Part 1: Gross morphology, functions, comparative anatomy, and evolution. Brain Res. Bull. 70 124–157
Springer MS, Cleven GC, Madsen O, de Jong WW, Waddell VG, Amrine HM and Stanhope MJ 1997 Endemic African mammals shake the phylogenetic tree. Nature 388 61–64
Stoeger AS and de Silva S 2014 African and Asian Elephant Vocal Communication: A Cross-Species Comparison; in Biocommunication of (ed) G Witzany (Netherlands: Springer) pp 21–39

J. Biosci. 40(5), December 2015 Novel transcripts and variants in the Asian elephant 907


MS received 03 November 2015; accepted 16 November 2015

Corresponding editor: PARTHA P MAJUMDER
