Mol Genet Genomics DOI 10.1007/s00438-016-1233-9

ORIGINAL ARTICLE

Transcriptome sequencing and de novo characterization of Korean endemic , Koreanohadra kurodana for functional transcripts and SSR markers

Se Won Kang1 · Bharat Bhusan Patnaik1,2 · Hee‑Ju Hwang1 · So Young Park1 · Jong Min Chung1 · Dae Kwon Song1 · Hongray Howrelia Patnaik1 · Jae Bong Lee3 · Changmu Kim4 · Soonok Kim4 · Hong Seog Park5 · Yeon Soo Han6 · Jun Sang Lee7 · Yong Seok Lee1

Received: 10 February 2016 / Accepted: 25 July 2016 © Springer-Verlag Berlin Heidelberg 2016

Abstract The Korean endemic land snail Koreanohadra the public databases. The direction of the unigenes to func- kurodana (: Bradybaenidae) found in humid tional categories was determined using COG, GO, KEGG, areas of broadleaf forests and shrubs have been consid- and InterProScan protein domain search. The GO analysis ered vulnerable as the number of individuals are declining search resulted in 22,967 unigenes (12.02 %) being cate- in recent years. The species is poorly characterized at the gorized into 40 functional groups. The KEGG annotation genomic level that limits the understanding of functions at revealed that metabolism pathway genes were enriched. the molecular and genetics level. In the present study, we The most prominent protein motifs include the zinc finger, performed de novo transcriptome sequencing to produce a ribonuclease H, reverse transcriptase, and ankyrin repeat comprehensive transcript dataset of visceral mass tissue of domains. The simple sequence repeats (SSRs) identified K. kurodana by the Illumina paired-end sequencing tech- from >1 kb length of unigenes show a dominancy of dinu- nology. Over 234 million quality reads were assembled cleotide repeat motifs followed with tri- and tetranucleotide to a total of 315,924 contigs and 191,071 unigenes, with motifs. A number of unigenes were putatively assessed to an average and N50 length of 585.6 and 715 bp and 678 belong to adaptation and defense mechanisms including and 927 bp, respectively. Overall, 36.32 % of the unigenes heat shock proteins 70, Toll-like receptor 4, AMP-activated found matches to known protein/nucleotide sequences in protein kinase, aquaporin-2, etc. Our data provide a rich source for the identification and functional characterization of new genes and candidate polymorphic SSR markers in Communicated by S. Hohmann. K. kurodana. The availability of transcriptome informa- tion (http://bioinfo.sch.ac.kr/submission/) would promote S. W. Kang and B. B. Patnaik contributed equally to this work. the utilization of the resources for phylogenetics study and Electronic supplementary material The online version of this genetic diversity assessment. article (doi:10.1007/s00438-016-1233-9) contains supplementary material, which is available to authorized users.

* Yong Seok Lee 4 National Institute of Biological Resources, 42, [email protected] Hwangyeong‑ro, Seo‑gu, Incheon 22689, Korea 5 1 Department of Life Science and Biotechnology, College Research Institute, GnC BIO Co., LTD., 621‑6 of Natural Sciences, Soonchunhyang University, Banseok‑dong, Yuseong‑gu, Daejeon 34069, Korea 22 Soonchunhyangro, Shinchang‑myeon, Asan, 6 College of Agriculture and Life Science, Chonnam National Chungcheongnam‑do 31538, Korea University, 77 Yongbong‑ro, Buk‑gu, Gwangju 61186, Korea 2 Trident School of Biotech Sciences, Trident Academy 7 Institute of Environmental Research, Kangwon National of Creative Technology (TACT), Chandaka Industrial Estate, University, 1 Kangwondaehak‑gil, Chuncheon‑si, Chandrasekharpur, Bhubaneswar, Odisha 751024, India Gangwon‑do 243341, Korea 3 Korea Zoonosis Research Institute (KOZRI), Chonbuk National University, 820‑120 Hana‑ro, Iksan, Jeollabuk‑do 54528, Korea

1 3 Mol Genet Genomics

Keywords Koreanohadra kurodana · Transcriptomics · De threatened for extinction with lack of regional conservation novo assembly · Functional annotation · Simple sequence measures. With genomic information lacking for the spe- repeats cies, an evaluation of population genetic structure, genetic diversity, and possible evolutionary responses to environ- mental perturbations and climate change are not feasi- Introduction ble. The National Centre for Biotechnology Information (NCBI) database only shows the sequence of mitochondrial Gastropods represent one of the most successful classes protein cytochrome c oxidase I (CO-1) under the synonym of living molluscs having greatly reduced shells or a com- name Fruticicola koreana. plete absence of shells. From an evolutionary standpoint it Next-generation sequencing (NGS) such as Illumina is perplexing, though on account of parallel development sequencing by synthesis (SBS) allows for massive char- of such an adaptation trait, it relates to a strong advantage acterization of the transcriptome by generation of mil- for colonization on land. The Gastropod class of molluscs lions of short reads. The short reads produced provide a has invaded the land with more than 62,000 described liv- more sensitive application for the detection of differential ing species, comprising about 80 % of living molluscs. The expression compared to conventional gene arrays (Marioni land snails are the most numerous with close to 35,000 et al. 2008; Collins et al. 2008). The Illumina platform has described species, mostly inhabiting the tropics. Still, many become increasingly cost-effective and been successfully species remain undescribed mainly due to their minute used to characterize the transcriptome of non-model spe- sizes and under-exploration from extreme terrestrial eco- cies with greater precision (Wang et al. 2010, 2012; Meng zones. The land snail family Bradybaenidae, identified on et al. 2013; Yue et al. 2015; Patnaik et al. 2015; Hwang the basis of the presence of dart sac and dart-related organs, et al. 2016). The Illumina NGS technology has also greatly are a relatively younger taxon, to the closely related Cama- benefitted with the development of algorithms and experi- enidae family (Hirano et al. 2014). The bradybaenids are ments for de novo assembly (Martin and Wang 2011; Ekb- extremely specious with distribution in Northwest Amer- lom and Wolf 2014). The Illumina sequencing technology ica and Europe, Southeast and East Asia including Japan, has been utilized for the transcriptome characterization of Korea, Taiwan, Indonesia, Philippines, and China (Chen molluscs and annotation of genes suggested to participate and Zhang 2004; Wu et al. 2007; Kuznik-Kowalska et al. in various metabolic, cellular and immune pathways (Cas- 2013; Hirano et al. 2014; Huang et al. 2014). A study by tellanos-Martinez et al. 2014; Deng et al. 2014; Patnaik Kwon et al. (1993) classified the Korean Bradybaenidae et al. 2016a). Furthermore, molecular markers such as sin- snails into 24 species. Among the land snail fauna of Jeju gle nucleotide polymorphism (SNP) and simple sequence Island, species belonging to family Bradybaenidae are one repeats (SSR) have been detected from de novo assembled of the largest, with eight reported species (Noseworthy sequences of Illumina-based transcriptome of molluscs et al. 2007). Under bradybaenid snails, Koreanohadra has (Franchini et al. 2011; Patnaik et al. 2016b). In this study, been classified as a genus of its own by Min et al. (2004) we have used Illumina HiSeq 2500 sequencing to gener- and accorded synonyms of Fruticicola and Karaftohelix by ate a transcriptome of K. kurodana. We have conducted Schileyko (2004) and Sysoev and Schileyko (2009), respec- de novo assembly of the transcriptome and used bioinfor- tively. Koreanohadra kurodana is a bradybaenid land snail matics analysis to annotate the functional direction of the species endemic to Korea. The species is found at humid assembled sequences. In doing so, we have suggested the areas in broadleaf forests with shrubs and known from predicted functions based on database classification struc- Gangwon-do and Gyeonggi-do provinces of Korea. Chro- ture. This is the first large-scale transcriptome information mosome numbers and karyotype of K. kurodana reported for K. kurodana that could be useful in providing valuable by Lee and Kwon (1993) suggests that it possesses 11 pairs resources towards functional gene discovery and devel- of metacentric chromosomes, 17 sub-metacentric and one opment of molecular markers for the conservation of the telocentric chromosome. Such an analysis was referred species. while reporting the 29 chromosome pairs in another species from the same genus, Koreanohadra koreana (Park 2011). Koreanohadra kurodana has been assessed as vulner- Materials and methods able ( 10 species occupied within <2000 km2) by Korean ≤ Red List of Threatened Species, 2014. A continual decline Species collection and RNA extraction has been observed in the area of occupancy of the species due to predation by natural enemies, indiscriminate collec- Specimens of K. kurodana were collected from the moun- tion for ornamental use, change and loss of natural habi- tain near to Maha-ri, Mitan-myeon, Pyeongchang-gun, tats caused by forest development. The species has been Gangwon-do, Republic of Korea in August 2014. No

1 3 Mol Genet Genomics prior permission and ethical clearance was required as the The reads, thus, obtained were assembled using Trinity species is endemic to Korea and enlisted under ‘vulner- software with default options (minimum allowed length able’ category of Korean Red List KORED assessments. of 200 bp) (Grabherr et al. 2011). A K-mer length of 25 As per the regulations of National Institute of Biological was chosen that seems appropriate as smaller K-mers lead Resources (NIBR), Korea, only the collection of ‘endan- to complex de Bruijin graphs. Trinity de novo program gered’ species class I or class II needs a permission certifi- used extensively as an assembly algorithm compiles the cate. The collected specimens were housed in the labora- sequence data into a number of de Bruijin graphs that ulti- tory within purpose-built enclosures at room temperature, mately reports transcripts in its final form. The TIGR Gene and provided with water and carrot diet ad libitum. Nec- Indices Clustering Tool (TGICL) (Pertea et al. 2003) was essary care was taken during the period of laboratory used to assemble unigene sequences (without Ns that could acclimatization and dissection of visceral mass tissue fol- not be extended on either end). lowed the International Guiding Principles of Biomedi- cal Research Involving (1985) (http://www.ncbi. Annotation of unigene sequences nlm.nih.gov/books/NBK25438/). Visceral mass tissue was dissected from three living snails, and was stored in RNA- The non-redundant unigenes were annotated to public pro- later stabilization reagent (Qiagen) that preserves the integ- tein sequence repositories using BLASTX at an E value rity of RNA in tissue samples. Total RNA was extracted cutoff of 1E 5. The databases for annotation include Pro- − with TRIzol® reagent (Invitrogen, Carlsbad, CA, USA) tostome (PANM-DB), Clusters of Orthologous Groups using manufacturer protocols. Residual DNA (if any) was (COG) (http://www.ncbi.nlm.nih.gov/COG/), Swiss-Prot, removed using RNase-free DNase I (Qiagen) by incubation Gene Ontology (GO), and Kyoto Encyclopedia of Genes at 37 °C for 30 min. Following extraction, the RNA quality and Genomes (KEGG) (Kanehisa et al. 2010). The Uni- was assessed on a 1.2 % formaldehyde agarose gel electro- gene DB resource was also utilized for the annotation of phoresis and quantified using NanoDrop Spectrophotom- assembled sequences using BLASTN. PANM-DB regis- eter (Thermo Scientific, Wilmington, DE) and Agilent 2100 ters the protein sequence information of species belong- Bioanalyzer (Agilent Technologies). ing to , Arthropoda, and Nematoda phylum in a multi-FASTA format. PANM-DB is readily accessible in a Illumina HiSeq 2500 sequencing and de novo web-based interface of the Malacological Society of Korea transcriptome analysis (http://malacol.or.kr/blast/aminoacid.html) (Kang et al. 2015). The Venn diagram plot constructed using Venny 2.0 Paired-end cDNA library was prepared using the mRNA- (http://bioinfogp.cnb.csic.es/tools/venny/) was used for the seq sample preparation kit. For the same, poly (A) mRNA representation of sequence similarity using PANM, Uni- was isolated using oligo (dT) beads and fragmented using gene, and COG DB. The BLAST2GO software (Ashburner the fragmentation buffer. The cleaved mRNA was used to et al. 2000; Conesa et al. 2005) was utilized under default synthesize first strand cDNA using random-hexamer prim- parameters for InterProScan protein domain analysis, GO ers and reverse transcriptase (Invitrogen). RNase H (Inv- functional and KEGG pathway mapping. itrogen) and DNA polymerase I (New England BioLabs, Ipswich, MA, USA) were used to prime the second-strand Discovery of SSR markers cDNA. The double-stranded cDNA, thus formed was end- repaired using T4 DNA polymerase, the Klenow fragment, We screened unigenes of length >1000 bp for SSR type and T4 polynucleotide kinase (New England BioLabs). markers using the MISA (MicroSAtellite; http://pgrc.ipk- Illumina paired-end adapters were ligated to the end- gatersleben.de/misa, version 1.0) software. We selected for repaired cDNA and were sequenced at GnC Bio Company, di-, tri-, tetra-, penta-, and hexanucleotide repeats with six, Daejeon, Korea, using an Illumina HiSeq 2500 sequencing five, four, four, and four minimum iterations, respectively. platform (2 100 bp paired-ends). The sequencing reads Other higher repeat motifs were not considered due to their × were pre-processed for adapter removal using Cutadapt restricted occurrence within the selection criteria. Mononu- program (http://code.google.com/p/cutadapt/) with default cleotides were not considered due to frequent homopolymer parameters. After filtering the adapter-only sequences, generation in Illumina sequencing platform. Subsequently, low-quality reads ( 20) were removed using Phred (http:// we designed a set of PCR primers using the BatchPrimer 3 ≥ www.phrap.com/). GC content bias in the first 13 bases of program (You et al. 2008) that flanks the SSR microsatel- Illumina reads, possibly introduced through random-hex- lites. Considering the utilization of the same in polymor- amer priming were also removed to get high quality reads. phism studies we used the following criteria for primer

1 3 Mol Genet Genomics

Table 1 Sequence assembly statistics after transcriptome sequencing using Illumina Next-Generation sequencing platform Description Total number Total nucleotides (Mb) Mean length (bp) N50 length (bp) GC%

Raw reads 238,755,412 30,083.18 126 126 47.07 Clean reads 234,382,719 28,988.89 123.7 126 47.06 Contigs 315,924 185 585.6 715 42.95 Unigenes 191,071 129.55 678 927 42.84 Kmer 25 High-quality reads 98.17 % (sequences), 96.36 % (nucleotides) Largest contig (bp) 18,009 No. of large contigs ( 500 bp) 98,300 ≥ Range of unigenes length (bp) 107–25,186 identification: dinucleotides with ten or more iterations and 31.12 % of contigs had sizes of 500 bp with the larg- ≥ tri-/tetranucleotide repeats with seven or more iterations. est contig of length 18,009 bp. These contigs were fur- ther assembled into 191,071 unigenes comprising of 129.55 Mb of sequences. While the length of unigenes Results ranged from 107 to 25,186 bp, the mean length, N50 length and GC% were 678 bp, 927 bp, and 42.84 %, respectively. High throughput sequencing of the visceral mass The de novo assembly statistics for this study are given in transcriptome Table 1. Considering the length distribution of the assem- bled contig sequences, almost 68.98 % of contigs were The Illumina sequencing of visceral mass samples from found shorter than 500 bp and only 11.81 % of contigs the bradybaenid snail, K. kurodana, generated 238,755,412 were longer than 1000 bp. Within the final assembled uni- paired-end reads with 30,083.18 Mb of sequence data. The genes, 61.14 % were of length 500 bp, 22.75 % were no ≤ average length of raw reads were 126 bp with a GC% of more than 1000 bp, and 16.11 % were of length 1000 bp. ≥ 47.07. The raw read sequence data in FASTQ format was The size distribution of assembled contigs and unigenes for deposited in the National Centre for Biotechnology Infor- this study is shown in Fig. 1. mation (NCBI) Sequence Read Archive (SRA) database under the accession number SRP065002. Further, the Sequence similarity of assembled unigenes assembled contigs can be downloaded from the server on or after 16-10-2016 (1 year after the initial date of submission). The assembled unigenes were annotated against the public The pre-processing of the reads included adapter trim- protein (PANM, COG, Swiss-Prot, GO, and KEGG data- ming wherein 30,023.45 Mb were filtered with 99.80 % bases) and nucleotide sequence database (Unigene) using read efficiency. The average read length after trimming was BLASTX and BLASTN at an E value of 1E 5. Out of ≤ − 125.8. The adapter trimming statistics is summarized in S1 191,071 assembled unigenes, 36.32 % of unigenes found Table. Further processing, qualified 234,382,719 reads with a match to any one of the databases. A total of 62,758 uni- 28,988.89 Mb as clean reads. This suggests that 1.83 % of genes (32.85 %) found a match to PANM database, while sequences did not pass the quality control check and hence 40,041 (20.96 %), 22,504 (11.78 %), 27,894 (14.6 %), were eliminated before the assembly process. The clean 22,967 (12.02 %), and 2104 (1.1 %) unigenes showed reads showed a mean and N50 length of 123.7 and 126 bp, homology to available sequences in Swiss-Prot, Unigene, respectively, with a GC% of 47.06. COG, GO, and KEGG databases, respectively. A greater proportion of the annotated unigene sequences were having De novo assembly of the visceral mass transcriptome lengths of 1000 bp. In total, 33.3 % of all database anno- ≥ tated sequences were having lengths 1000 bp and 17.62 % ≥ The Trinity de novo program was used to assemble the of sequences had lengths in between 300 and 1000 bp. The clean reads to contiguous overlapping sequences defined sequence similarity of the assembled unigenes against as contigs. A total of 315,924 contigs were obtained com- public databases in this study is given in Table 2. A total prising of 185 Mb of sequences. The mean length, N50 of 15,868 unigene sequences show homology to protein length (i.e., 50 % of total assembled sequences were hav- entries in both PANM and COG database while 5124 and 2 ing this length or longer contigs), and GC% of contigs unigenes show homology to sequences in both PANM and were 585.6 bp, 715 bp, and 42.95 %, respectively. About Unigene and Unigene and COG databases, respectively.

1 3 Mol Genet Genomics

Fig. 1 Size distribution of de novo assembled contigs (a) and unigenes (b) from the visceral mass transcriptome of K. kurodana

Table 2 Annotation hits of K. kurodana unigenes from transcriptome assembly to public databases Databases All 300 bp 300–1000 bp 1000 bp ≤ ≥ PANM-DB 62,758 10,735 29,897 22,126 Unigene 22,504 3596 10,123 8785 COG 27,894 2622 9964 15,308 GO 22,967 2863 9028 11,076 KEGG 2104 140 699 1,265 InterProScan 24,136 2520 9032 12,584 SwissProt 40,041 5240 16,708 18,093 ALL 69,393 12,361 34,064 22,968

A total of 12,018 unigene sequences showed matches to sequences in all three databases. The three-way Venn dia- gram depicting the unigene annotation to PANM, Unigene, and COG databases in this study is shown in Fig. 2. The top-hit species distribution of the unigenes against PANM database using BLASTX program is shown in Fig. 3. In total, 46.07 % of K. kurodana unigene sequences matched Aplysia californica protein sequences. This was signifi- cant as the other top-hit species accounted for a very small Fig. 2 A Venn diagram showing the annotation profile of K. proportion of the unigene sequences. Matches against Lot- kurodana unigenes against PANM-DB (Protostomia database), Uni- tia gigantea and Crassostrea gigas were limited to about gene DB, and COG DB 9.3 % and 7.11 %, respectively. Continuing with the BLASTX annotation of the assem- (Fig. 4b). The similarity distribution showed 65.24 % of bled unigenes of K. kurodana against PANM database at a unigenes to have a similarity of more than 80 % with the cutoff of 1E 5, we obtained the homology match statistics matched database sequences (Fig. 4c). The number of uni- − as shown in Fig. 4. The E value distribution indicated that genes hit to matched sequence in the database increased a major proportion of unigenes (67.11 %) had an E value proportionately with the length of the unigenes. Sequences between 1E 50 and 1E 5. Only 12.75 and 10.93 % of with length less than 500 bp showed a maximum unigene − − unigenes had E value of 1E 100 to 1E 50 and 0, respec- hit of approximately 25 %, while sequences of length − − tively (Fig. 4a). Only 30.68 % of unigenes showed an iden- above 2001 bp showed unigene hit of approximately 90 %. tity greater than 60 % to the matched database sequences This suggests that a greater proportion of longer unigenes

1 3 Mol Genet Genomics

Fig. 3 Top-hit species distribution of K. kurodana unigenes against PANM-DB using BLASTX

Fig. 4 Homology search of K. kurodana unigenes using BLASTX against PANM-DB. a BLAST E value distribution analysis, b BLAST iden- tity distribution, c BLAST similarity distribution, and d unigene hit and non-hit ratio

1 3 Mol Genet Genomics

Fig. 5 COG classification of K. kurodana unigenes to functional categories would find matches to a homologous sequence in the data- under ‘function unknown (6.8 %)’, ‘posttranslational modi- base. The hit and non-hit ratio of K. kurodana unigenes to fication, protein turnover and chaperones (5.5 %)’, and PANM database is shown in Fig. 4d. ‘transcription (4.2 %)’ functional groups. The COGs with lowest number of unigenes were ‘defense mechanisms Functional classification of assembled unigenes (0.8 %)’, ‘chromatin structure and dynamics (0.6 %)’, ‘coenzyme transport and metabolism (0.5 %)’, and ‘cell The assembled unigenes of K. kurodana were ascribed pre- motility (0.2 %)’. dicted functions based on COG classifications, GO terms, The assembled unigenes from this study were subjected and KEGG orthology mappings. Furthermore, the unigenes to GO analysis that identifies the transcribed region and were searched for conserved protein domains using the suggests the grouping of a gene to known (or predicted) InterPro protein domain search. COG database includes function under biological process, cellular component, orthologous proteins and classifies the gene products into and molecular functions. This is an international standard- broad functional categories including a class of general ized gene functional classification that only suggests the function prediction and uncharacterized COGs (Tatusov putative products in an organism. Out of the 22,967 uni- et al. 2001). A phylogenetic classification of the coding genes annotated to GO database, 39.61 % unigenes were sequences of assembled unigenes within COG database for specifically classified under molecular function, 3.61 % K. kurodana unigenes is shown in Fig. 5. We find that the under biological process, and 3.13 % under cellular com- assembled unigenes were classified into 25 functional COG ponent category. Many unigenes classified under more categories. The ‘multi’ category represents those unigene than one GO functional category showing the overlap- sequences that belong to more than one of the functional ping structure in the Venn diagram (Fig. 6a). A total of COG categories. Among the representative COG catego- 28.95 % unigenes were grouped to biological process and ries, ‘general function prediction’ and ‘signal transduction molecular function, 3.65 % to biological process and cel- mechanisms’ were prominent with 21.1 and 10.9 % of uni- lular component, and 2.69 % to cellular component and genes classified, respectively. Following the larger COG molecular function. About 18.35 % of unigenes got clas- categories, the unigenes were also predicted to classify sified under all the three GO functional categories. A total

1 3 Mol Genet Genomics

Fig. 6 Annotation profile ofK. kurodana unigenes against the GO molecular function. b An analysis of GO annotated unigenes based database. a Venn diagram showing the predicted classification of uni- on the number of GO terms genes to GO categories of biological process, cellular component and of 7026 assembled unigene sequences annotated against and vitamins (964 unigenes with 32 enzymes), carbohy- GO database were suggested to classify with a single GO drate metabolism (402 unigenes with 48 enzymes), and term. The remaining annotated sequences got ascribed to xenobiotics biodegradation and metabolism (356 unigenes more than one GO term. The classification of GO anno- with 165 enzymes). All the unigene sequences predicted tated unigene sequences to number of GO terms has been to genetic information processing, environmental infor- shown in Fig. 6b. The annotations of unigenes under GO mation processing and organismal systems were found to terms at level 2 for this study have been shown in Fig. 7. fall under translation (aminoacyl-tRNA biosynthesis), sig- The most predominant GO terms under biological process nal transduction (mTOR and phosphatidylinositol signal- category were cellular process (GO: 0009987), metabolic ing system), and immune system (T-cell receptor signal- process (GO: 0008152), and single-organism process (GO: ing pathway), respectively. The KEGG pathway analysis 0044699). Many unigenes were also predicted to function obtained in this study is shown in Fig. 8. Furthermore, as under response to stimulus (GO: 0050896) and signaling shown in S2 Table, most unigenes under metabolism were (GO: 0023052) (Fig. 7a). Under the cellular component classified to purine metabolism, most under metabolism category, the most prominent GO terms included mem- of cofactors and vitamins to thiamine metabolism, and brane (GO: 0016020), cell junction (GO: 0005623), orga- most under xenobiotics biodegradation and metabolism to nelle (GO: 0043226), and macromolecular complex (GO: aminobenzoate degradation. We made an attempt to iden- 0032991) (Fig. 7b). In the category of molecular function, tify unigenes suggested to participate in the adaptation and the term binding (GO: 0005488) showed the highest pro- defense mechanisms of the snail species (Table 3). Among portion of annotations, followed by catalytic activity (GO: the candidate gene families, AMP-activated protein kinase 0003824) (Fig. 7c). (AMPK), heat shock proteins 70, insulin receptor sub- The KEGG pathway database supports the study of the strate-1, Toll-like receptor 4 (TLR4), and aquaporin-2 were biological functions of genes by elucidating the regula- represented by the unigenes. tory pathway specific to particular organisms (Altermann The BLAST2GO annotation, directly performed on the and Klaenhammer 2005). The mapping of identified uni- unigene sequences, was useful in the identification of Inter- genes of K. kurodana in the KEGG database revealed 4725 Pro conserved domains in 12.63 % of the sequences. The sequences with known enzymes (661) assigned to intracel- 25 top-hit InterPro domains has been listed in Table 4. The lular metabolic pathways, organismal systems, environ- IPR015880 (Zinc finger, C2H2-like domain) was the most mental information processing, and genetic information represented (1739 unigene sequences), followed up with processing pathways. Within the predominant metabolism IPR012337 (Ribonuclease H-like domain) and IPR000477 group, the unigenes sequences were predicted to func- (reverse transcriptase domain). Other notable top-hit Inter- tion largely under nucleotide metabolism (1516 unigenes Pro domains identified in this study includes IPR002110 with 9 enzymes) followed with metabolism of cofactors (ankyrin repeat), IPR001680 (WD40 repeat), IPR013783

1 3 Mol Genet Genomics

Fig. 7 GO term classification. The annotated unigenes against GO database were ascribed to GO terms under biological process (a), cellular component (b), and molecular function (c) category

(Immunoglobulin-like fold domain), IPR000504 (RNA rec- (1764, 25.48 %), AG/CT (17.35 %), and AT/AT (5.68 %) ognition motif domain), IPR001304 (C-type lectin domain) formed the predominant dinucleotide motif types. ATC/ and IPR002126 (cadherin domain). ATG (1007, 14.54 %) and ACAG/CTGT (1.98 %) formed the predominant trinucleotide and tetranucleotide motif SSR detection in the assembled unigenes type, respectively. The frequency of repeat motif types obtained in this study is shown in Fig. 9b. For future stud- For SSR identification, we examined 30,827 assem- ies on the polymorphic assessment of the species diver- bled unigene sequences ( 1000 bp) with a total base sity, we have specified the primer sequences flanking the ≥ size of 61,674,216 bp. We identified 6924 SSRs in 4939 di- (minimum 10 repeat numbers), tri-, and tetranucleotides sequences, with 1290 sequences containing more than one (minimum 7 repeat numbers) (S3 Table). Many of the uni- SSR. Additionally, we found 1269 SSRs in the compound genes with SSR primer sequences suggests putative func- form. Among the identified SSRs, dinucleotide (3406, tional status such as T-cell receptor, TRAF3 interacting 49.19 %) repeats were the predominant followed by tri- protein, RAB2, beclin 1-like isoform, acetylcholine recep- nucleotide (2628, 37.95) and tetranucleotide (11.57 %) tor subunit, and fibrinogen-related protein 1 that are essen- repeats. While we selected for a minimum of six repeat tial components of innate immunity system and defense numbers for dinucleotide repeats, we screened for trinu- mechanisms. cleotide repeats and other nucleotide repeats (tetra-, penta-, and hexanucleotides) with a minimum repeat number of five and four, respectively. As shown in Fig. 9a, the pre- Discussion dominant repeat numbers were six, five, and four for di-, tri-, and all other nucleotide repeats, respectively. Consider- Next Generation Sequencing technologies are in the fore- ing the repeat motif types identified in this study, AC/GT front of revolutionizing the field of genetics and genomics.

1 3 Mol Genet Genomics

Fig. 8 Kyoto Encyclopedia of Genes and Genomes (KEGG) path- within each pathway is shown in the inner circle. Each pathway is way analysis. The KEGG annotated sequences to different ontolo- represented in a different color gies are shown (outer circle). The total number of enzyme sequences

The mRNA sequencing of non-model invertebrates have sequences (Ren et al. 2012; Zhang et al. 2014). Similarly, been gaining popularity among researchers as it promises N50 is suggested to provide a measure of continuity but not to establish the species at the interface of ecology and evo- accuracy of contigs (Salzberg et al. 2012). Some reports lution. The transcriptomes of economically important, rare have even suggested the coverage of mitochondrial genes and endangered species of molluscs have been investigated as a parameter for the completeness of assembly process in separate studies using Next Generation platforms such (Coppe et al. 2010; Zhang et al. 2014; Patnaik et al. 2016a), as 454 and Illumina (Werner et al. 2012; Clark and Thorne but this study is impossible in organisms without the mito- 2015). In this study, we targeted the transcriptome of K. chondrial transcriptome. Hence, to our understanding no kurodana, a vulnerable gastropod species endemic to South general criterion satisfies the quality of de novo transcrip- Korea. It was hypothesized that, in order to conserve the tome assembly. species, a phenotypic screen of favorable traits are neces- The derived information on the unigene sequences, sary that could manifest towards manipulation of adaptive obtained after clustering of the contig assembly were con- features. The visceral mass transcriptome clarified a total of sidered for the discovery of known genes. The query uni- 234,382,719 high quality reads assembling to overlapping gene sequences were matched with the subject homolo- contiguous sequences. The average and N50 contig length gous sequences from the public databases. The maximum thus obtained were greater in this study than that obtained matches of unigenes to PANM database in the present with Illumina sequencing and de novo contig assembly of study is justified as it is a refined protostome group data- molluscs such as Radix balthica (Feldmeyer et al. 2011), base used to improve the efficiency of the annotation in a Haliotis midae (Franchini et al. 2011), Chlamys farreri (Shi reduced time when compared with NCBI non-redundant et al. 2013), Echinolittorina malaccana (Wang et al. 2014), database (Kang et al. 2015). About 64.05 % of unigenes and Anadara trapezia (Prentis and Pavasovic 2014). Con- remain unannotated and we assume that these would be sidering the evaluation of de novo transcriptome assem- short sequences with the absence of a conserved protein blies wherein reference sequences are unavailable, median domain. It is also expected that the unannotated sequences contig length, N50 contig length and number of contigs may be non-coding genes, untranslated regions or just ran- are common indicators of quality, although higher total dom transcriptional noise. The unigenes showing greater contig lengths may also suggest a greater redundancy of match to A. californica protein sequences is possibly due

1 3 Mol Genet Genomics

Table 3 List of adaptation-related unigenes identified from K. kurodana transcriptome Candidate genes family Number of unigene Full name Symbol Also known as

Angiotensin I converting enzyme ACE DCP; ICH; ACE1; DCP1; CD143; MVCD3 545 Adenylate cyclase activating polypeptide 1 ADCYAP1 PACAP 376 Angiotensinogen AGT ANHU; SERPINA8 219 AMP-activated protein kinase AMPK AK; AmpKalpha; CG3051; dAMPK; Gprk4; SNF1A 6026 Aquaporin 2 AQP2 AQP-CD; WCH-CD 2481 Basic helix–loop–helix family, member e40 BHLHE40 DEC1; HLHB2; BHLHB2; STRA13; Stra14; 36 SHARP-2 Basic helix–loop–helix family, member e41 BHLHE41 DEC2; hDEC2; BHLHB3; SHARP1 80 Chromosome 9 open reading frame 3 C9ORF3 APO; AP-O; AOPEP; ONPEP; C90RF3 5 Collagen type I alpha 1 COL1A1 OI1; OI2; OI3; OI4; EDSC 847 Deiodinase, iodothyronine DIO 5DI; TXDI1 199 Endothelial PAS domain protein 1 EPAS1 HLF; MOP2; ECYT4; HIF2A; PASD2; bHLHe73 168 Glutamate ionotropic receptor delta type subunit 1 GRID1 GluD1; GluRdelta1 142 Glutamate ionotropic receptor NMDA type subunit GRIN2B MRD6; NR2B; hNR3; EIEE27; GluN2B; 470 2B NMDAR2B Heat shock proteins 70 HSP70 HSPA6; HSP70B 4160 Insulin receptor substrate 1 IRS1 G972R; IRS-1 3066 Mitogen-activated protein kinase kinase kinase 15 MAP3K15 ASK3; bA723P2.3 51 Phospholipase A2 group XIIA PLA2G12A GXII; ROSSY; PLA2G12 4 Regulator of cell cycle RGCC RGC32; RGC-32; C13orf15; bA157L14.2 92 Somatolactin SL 74 Solute carrier family 14 member 2 SLC14A2 UT2; UTA; UTR; HUT2; UT-A2; hUT-A6 1 Stimulated by retinoic acid 6 STRA6 MCOPS9; MCOPCB8; PP14296 585 T-box 5 TBX5 HOS 903 Toll-like receptors4 TLR4 TIL4; CD282 2907 to extensive genomic information for the species (Broad including molluscs (Lv et al. 2014; Patnaik et al. 2016a, b). Institute, A. californica genome project). Additionally, the GO analysis was conducted on annotated transcripts using neuronal transcriptome (Moroz et al. 2006), developmen- BLAST2GO software. GO classification is not an evidence tal transcriptome (Heyland et al. 2011), and early life his- of functionality as all GO terms do not show equal validity tory stages transcriptome (Fiedler et al. 2010) of A. califor- (Rhee et al. 2008; Rogers and Ben-Hur 2009). Here, con- nica also have been reported suggesting its importance as a sidering the GO terms, we emphasize the point that each model system for cellular and molecular studies involving GO term has an associated evidence code. In this study, as learning and behavior (Leonard and Edstrom 2004; Choi in many other species, the majority of terms are assigned et al. 2010). Similarly, genomic resources for the freshwa- the code ‘IEA’ (inferred from electronic annotation); these ter snail, L. gigantea (Simakov et al. 2013) and the oyster, are terms that have no experimental verification and hence C. gigas (Zhang et al. 2012) are abundant that could reflect would be of questionable validity. Additionally, GO terms a greater homology of K. kurodana unigenes. The unigene also contain low quality terms with evidence codes such as annotation results in this study seems appropriate espe- ‘NAS’ (non-traceable author statement) or ‘ND’ (no bio- cially when related to the minor BLAST hits to mollusca logical data available). Hence, the GO interpretation of uni- sequences reported for other molluscan transcriptomes genes in this study is only predicted. (Sadamoto et al. 2012). The BLAST annotation procedure We also identified the unigenes responsible for a regu- involving databases has been considered as the best utiliz- latory function in defense mechanism and adaptation. able model in ascertaining the unigene direction in most AMPK is called the “fuel gauge” of the cell as it regu- non-model species (Rao et al. 2015; Chen et al. 2015; Pat- lates the catabolic ATP-generating and anabolic ATP con- naik et al. 2015). suming processes. AMPK helps in the adaptation of land The COG classification of unigenes in this study are snails as it responds to hypoxia, glucose deprivation, 2 in consistent with studies in other non-model organisms various cytokines, elevated intracellular Ca + and tumor

1 3 Mol Genet Genomics

Table 4 Top-hit 25 Domain Description Unigene InterProScan protein domains annotated within the unigene IPR015880 Zinc finger, C2H2-like domain 1739 sequences of K. kurodana IPR012337 Ribonuclease H-like domain 432 IPR000477 Reverse transcriptase domain 396 IPR002110 Ankyrin repeat 364 IPR013087 Zinc finger C2H2-type/integrase DNA-binding domain 323 IPR027417 P-loop containing nucleoside triphosphate hydrolase domain 261 IPR002290 Serine/threonine/dual specificity protein kinase, catalytic domain 226 IPR003591 Leucine-rich repeat, typical subtype repeat 195 IPR001680 WD40 repeat 184 IPR005135 Endonuclease/exonuclease/phosphatase domain 183 IPR013783 Immunoglobulin-like fold domain 176 IPR000504 RNA recognition motif domain 173 IPR000276 G protein-coupled receptor, rhodopsin-like family 172 IPR011701 Major facilitator superfamily 156 IPR001304 C-type lectin domain 153 IPR002126 Cadherin domain 145 IPR002048 EF-hand domain 138 IPR000742 EGF-like domain 132 IPR011989 Armadillo-like helical domain 126 IPR001841 Zinc finger, RING-type domain 122 IPR012336 Thioredoxin-like fold domain 119 IPR001888 Transposase, type 1 family 107 IPR001478 PDZ domain 103 IPR001245 Serine-threonine/tyrosine-protein kinase catalytic domain 101 IPR000008 C2 domain 98

Fig. 9 SSR repeats and types identified from the assembled unigene sequences. a The frequency of SSR repeat types in relation to the repeat numbers. b The diverse repeat motif types and their frequency of occurrence in the unigene sequences suppressors (Hardie 2007; Storey and Storey 2010). Heat 2015). The discovery of unigenes predicted to biochemical shock protein 70 family of proteins are known stress mark- pathways related to metabolism, immunity and defense will ers and have been implicated in the defense mechanisms be useful in advancing the understanding of K. kurodana and adaptation of the land snail species (Wang et al. 2013). functions under biotic and abiotic stressors. TLRs with high identity to TLR4 have also been identi- With the InterPro domain analysis, significant attrib- fied in the transcriptome of freshwater snail, Oncomelania utes of the unigenes could be suggested. The most com- hupensis, responsible for Toll signaling process (Zhao et al. mon zinc finger domain proteins can interact with the

1 3 Mol Genet Genomics nucleotides (DNA, RNA), other proteins and lipids and simple repeated sequences in the functional transcripts. contribute to regulation of transcription, processing and The transcriptome-derived microsatellites are more ame- stability of mRNA, and protein turnover (Ravasi et al. nable to studies related to conservation genomics as these 2003). They are most often the top-hit abundant domains correlate to known (or predicted) genes and show better in transcriptomes of non-model organisms including mol- transferability among different species (Kashi et al. 1997; luscs (Firmino et al. 2013; Zhao et al. 2014; Patnaik et al. Wang and Guo 2007). Lately, many transcriptome studies 2015, 2016b). The ribonuclease H-like domain is included on non-model species characterized microsatellites and with many such unique domain sequences under Ribonu- their direct application towards marker-assisted selection clease H-like (RNHL) superfamily involved to contribute (MAS), growth and reproductive performance, and quanti- in many biological processes including RNA interference fication of genetic diversity within and among the popula- (Majorek et al. 2014). The ankyrin repeat motif (33-resi- tions of rare and endangered species (Mikheyev et al. 2010; due) is also a frequently found domain in many proteins Zheng et al. 2013; Ma et al. 2014; Yue et al. 2015; Barat that serves protein–protein interactions (Mosavi et al. et al. 2016; Patnaik et al. 2016b). In this study, we screened 2004). The functions of the proteins in which ankyrin SSR microsatellite sequences and analyzed the di- to hexa- repeats are populated includes regulation of transcrip- nucleotide repeats both in terms of the number of repeats tion, embryogenesis (Notch protein in Drosophila), tumor and repeat motif types. Furthermore, we recorded the puta- suppressor, and transcription factor (Swi4 and Swi6 in tive functions of the SSR derived unigenes and proposed a yeasts) (Voronin and Kiseleva 2008). WD-40 repeats, also set of primer sequences flanking the SSR sites for polymor- a motif contributing to protein–protein interactions and phism identification in the species and studies on popula- involved in cell death and transcriptional regulation has tion diversity. been found in the transcriptome of Crassostrea virginica Our results were in contrast with the SSRs detected from (Zhang et al. 2014) and Cristaria plicata (Patnaik et al. the gonad transcriptome of Pinctada margaritifera, where 2016b). The immunoglobulin-like fold domain is the most tetranucleotide repeats formed more than half of the SSRs abundant of immunoglobulin-related domains pivoting (Teaniniuraitemoana et al. 2014). In case of C. virginica, essential functions in vertebrate immune responses and dinucleotide repeats were found more frequently after the has been reported from the transcriptomes of non-model abundant mononucleotide repeats (Zhang et al. 2014). species (Pallavicini et al. 2013; Zhao et al. 2014; Patnaik We have ignored the screening of mononucleotide repeats et al. 2015). The C-type lectin domain proteins are primar- in the present study due to the possibility of accessing ily carbohydrate-binding proteins called lectins, but there homopolymers from Illumina sequencing errors. The most are many members with this domain those are not lectins. abundant dinucleotide repeat motif types in the oyster C. Basically, these proteins contribute to innate immunity plicata were also found to be similar, but the predominant in invertebrates including molluscs (Joubert et al. 2010; tri- (AAT/ATT) and tetranucleotide repeat (ACAT/ATGT) Zhang et al. 2014; Patnaik et al. 2016b). The cadherin motif types were different (Patnaik et al. 2016b). The SSRs domain has also been abundantly noticed in transcriptome identified in this study will support the study of popula- analysis of molluscs (Zhang et al. 2012; McDowell et al. tion structure, phylogenetics, and phylogeography of the 2014; Patnaik et al. 2016b). species. An important utility of high-throughput transcriptome Here we report the first transcriptome survey of genetic characterization is to identify and validate microsatellite resources from Koreanohadra kurodana, a non-model markers within the putative transcribed regions. Microsat- species endemic to Korea facing extinction. The 191,071 ellite identification including the development of SSRs and unigenes identified and annotated using bioinformatics SNPs provide for a convenient platform to assess popula- resources enable functional classification to cellular, bio- tion diversity at the genetic level and serves to conserve logical and molecular processes. Many biochemical path- the gene pool for rare and endangered species. The limi- ways in the species were revealed due to the association tations to the development of efficient genetic markers in of the transcripts including enzymes to them. In addition, non-model organisms included the time and cost proto- identification of suitable SSRs provide viable targets for cols as many of these were screened from whole-genome polymorphism studies across Koreanohadra kurodana pop- libraries. But lately, the cDNA based microsatellite iden- ulations and exploring genetic identity or diversity within tification has accelerated the screening, development, related species. and application of markers for the benefit of non-model organisms (Zalapa et al. 2012; Lopez-Uribe et al. 2012). Acknowledgments This work was supported by the grant entitled Advances in NGS technologies, bioinformatics platforms “The Genetic and Genomic Evaluation of Indigenous Biological Resources” funded by the National Institute of Biological Resources and algorithms have helped for an efficient detection of (NIBR201503202) and Soonchunhyang University Research Fund.

1 3 Mol Genet Genomics

Compliance with ethical standards non-model snail species transcriptome (Radix balthica, Basom- matophora, ), and a comparison of assembler perfor- All applicable international, national, and/or institutional guidelines mance. BMC Genom 12:317 for the care and use of animals were followed. Fiedler TJ, Hudder A, McKay SJ, Shivkumar S, Capo TR, Schmale MC, Walsh PJ (2010) The transcriptome of early life history Conflict of interest The authors declare that they have no conflict of stages of the California Sea Hare Aplysia californica. Comp Bio- interest. chem Physiol Part D Genomics Proteomics 5:165–170 Firmino AAP, Fonseca FC, de Macedo LLP, Coelho RR, de Souza Jr JDA, Togawa RC, Silva-Junior OB, Pappas-Jr GJ, Mattar da References Silva MC, Engler G, Grossi-de-Sa MF (2013) Transcriptome analysis in cotton ball weevil (Anthonomus grandis) and RNA interference in insect pests. PLoS One 8:e85079 Altermann E, Klaenhammer TR (2005) Pathway voyager: pathway Franchini P, van der Merwe M, Roodt-Wilding R (2011) Transcrip- mapping using the Kyoto Encyclopaedia of Genes and Genomes tome characterization of the South African abalone Haliotis (KEGG) database. BMC Genom 6:60. doi:10.1186/1471.2164-6-60 midae using sequencing-by-synthesis. BMC Res Notes 4:59 Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill Adiconis X, Fan L, Raychowdhury R, Zeng Q, Chen Z, Mauceli DP, Issel-Tarver L, Kasarkis A, Lewis S, Matese JC, Richardson E, Hacohen N, Gnirke A, Rhind N, di Palma F, Birren BW, Nus- JE, Ringwald M, Rubin GM, Sherlock G (2000) Gene Ontology: baum C, Lindblad-Toh K, Friedman N, Regev A (2011) Full- tool for the unification of biology. The Gene Ontology Consor- length transcriptome assembly from RNAseq data without a ref- tium. Nat Genet 25:25–29 erence genome. Nat Biotechnol 29:644–652 Barat A, Kumar R, Goel C, Singh AK, Sahoo PK (2016) De novo Hardie DG (2007) AMP-activated/SNF-1 kinase: conserved guardians assembly and characterization of tissue-specific transcriptome of cellular energy. Nat Rev Mol Cell Biol 8:774–785 in the endangered golden mahseer, Tor putitora. Meta Gene Heyland A, Vue Z, Voolstra CR, Medina M, Moroz LL (2011) Devel- 7:28–33 opmental transcriptome of Aplysia californica. J Exp Zool B Castellanos-Martinez S, Arteta D, Catarino S, Gestal C (2014) De Mol Dev Evol 316B:113–134 novo transcriptome sequencing of the Octopus vulgaris hemo- Hirano T, Kameda Y, Chiba S (2014) Phylogeny of the land snails cytes using Illumina RNASeq technology: response to the infec- bradybaena and phaeohelix (Pulmonata: Bradybaenidae) in tion by the gastrointestinal parasite Aggregata octopiana. PLoS Japan. J Mollusc Studies, pp 1–7 One 9:e107873 Huang C-W, Lee Y-C, Lin S-M, Wu W-L (2014) Taxonomic revision Chen DN, Zhang GQ (2004) Fauna Sinica (Mollusca: Gastropoda: of Aegista subchinensis (Mollendorf, 1884) (, Stylommatophora: Bradybaenidae). Science Press, Beijing, pp Bradybaenidae) and a description of new species of Aegista from 140–142 eastern Taiwan based on multilocus phylogeny and comparative Chen X, Yang Q, Li H, Li H, Hong Y, Pan L, Chen N, Zhu F, Chi X, morphology. Zookeys 445:31–55 Zhu W, Chen M, Liu H, Yang Z, Zhang E, Wang T, Zhong N, Hwang H-J, Patnaik BB, Kang SW, Park SY, Wang TH, Park EB, Chung Wang M, Liu H, Wen S, Li X, Zhou G, Li S, Wu H, Varshney R, JM, Song DK, Patnaik HH, Kim C, Kim S, Lee JB, Jeong HC, Park Liang X, Yu S (2015) Transcriptome-wide sequencing provides HS, Han YS, Lee YS (2016) RNA sequencing, de novo assembly and insights into geocarpy in peanut (Arachis hypogea L.). Plant Bio- functional annotation of a Nymphalid butterfly, Fabriciana nerippe technol. doi:10.1111/pbi.12487 Felder, 1862. Entomol Res. doi:10.1111/1748-5967.12164 Choi SL, Lee Y-S, Rim Y-S, Kim T-H, Moroz LL, Kandel ER, Bhak Joubert C, Piquemal D, Marie B, Manchon L, Pieratt F, Zanella-Cleon J, Kaang B-K (2010) Differential Evolutionary rates of neuronal I, Cochennec-Laureau N, Gueguen Y, Montagnani C (2010) transcriptome in Aplysia kurodai and Aplysia californica as a Transcriptome and Proteome analysis of Pinctada margaritifera tool for gene mining. J Neurogenet 24:75–82 calcifying mantle and shell: focus on biomineralization. BMC Clark MS, Thorne MA (2015) Transcriptome of the Antarctic brood- Genom 11:613 ing gastropod mollusc Margarella antartica. Mar Genomics Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M (2010) 24:231–232 KEGG for representation and analysis of molecular networks Collins LJ, Biggs PJ, Voelckel C, Joly S (2008) An approach to tran- involving diseases and drugs. Nucleic Acids Res 38:355–360 scriptome analysis of non-model organisms using short-read Kang SW, Park SY, Patnaik BB, Hwang HJ, Kim C, Kim S, Lee JS, sequences. Genome Inform 21:3–14 Han YS, Lee YS (2015) Construction of PANM Database (Proto- Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M stome DB) for rapid annotation of NGS data in mollusks. Korean (2005) Blast2GO: a universal tool for visualization and analysis J Malacol 31:243–247 in functional genomics research. Bioinformatics 21:3674–3676 Kashi Y, King D, Soller M (1997) Simple sequence repeats as a Coppe A, Pujolar JM, Maes GE, Larsen PF, Hansen MM, Bernatchez source of quantitative genetic variation. Trends Genet 13:74–78 L, Zane L, Bortoluzzi S (2010) Sequencing, de novo annotation Kuznik-Kowalska E, Lewandowska M, Pokroyszko BM, Prockow and analysis of the first Anguilla anguilla transcriptome: Eeel- M (2013) Reproduction, growth and circadian activity of the Base opens new perspectives for the study of critically endan- snail Bradybaena fruticum (O.F. Muller, 1774) (Gastropoda: gered European eel. BMC Genom 11:635 Pulmonata: Bradybaenidae) in the laboratory. Cent Eur J Biol Deng Y, Lei Q, Tian Q, Xie S, Du X, Li J, Wang L, Xiong Y (2014) 8:693–700 De novo assembly, gene annotation, and simple sequence repeat Kwon OK, Park KM, Lee JS (1993) Colored shells of Korea. Aca- marker development using Illumina paired-end transcriptome demic Publishing Co., Seoul sequences in the pearl oyster Pinctada maxima. Biosci Biotech- Lee JS, Kwon OK (1993) Chromosomal studies of eight species of nol Biochem 78:1685–1692 Bradybaenidae in Korea. Korean J Malacol 9:30–43 Ekblom R, Wolf JBW (2014) A field guide to whole-genome sequenc- Leonard JL, Edstrom JP (2004) Parallel processing in an identified ing, assembly and annotation. Evol Appl 7:1026–1042 neural circuit: the Aplysia californica gill-withdrawal response Feldmeyer B, Wheat CW, Krezdorn N, Rotter B, Pfenninger M model system. Biol Rev Camb Philos Soc 79:1–59 (2011) Short read Illumina data for the de novo assembly of a

1 3 Mol Genet Genomics

Lopez-Uribe CK, Santiago SM, Bogdanowicz BND (2012) Discov- Giant Hornet (Vespa mandarinia) using Illumina HiSeq 4000 ery and characterization of microsatellites for the solitary bee sequencing: De novo assembly, functional annotation, and dis- Colletes inaequalis using Sanger and 454 pyrosequencing. Apid- covery of SSR markers. Int J Genomics, Article ID 4169587. ologie 44:163–172 doi:10.1155/2016/4169587 Lv J, Liu P, Gao B, Wang Y, Wang Z, Chen P, Li J (2014) Transcrip- Patnaik BB, Wang TH, Kang SW, Hwang H-J, Park SY, Park EB, tome analysis of Portunus trituberculatus: de novo assembly, Chung JM, Song DK, Kim C, Kim S, Lee JS, Han YS, Park HS, growth-related gene identification and marker discovery. PLoS Lee YS (2016b) Sequencing, de novo assembly, and annotation One 9:e94055 of the transcriptome of the endangered fresh water pearl bivalve Ma J-Q, Yao M-Z, Ma C-L, Wang X-C, Jin J-Q, Wang X-M, Chen L Cristaria plicata provides novel insights into functional genes (2014) Construction of a SSR based genetic map and identifica- and marker discovery. PLoS One 11:e0148622 tion of QTLs for catechins content in Tea plants (Camellia sinen- Pertea G, Huang X, Liang F, Antonescu V, Sultana R, Karamycheva sis). PLoS One 9:e93131 S, Lee Y, White J, Cheung F, Parvizi B, Tsai J, Quackenbush J Majorek KA, Dunin- Horkawicz S, Steczkiewicz K, Muszewska (2003) TIGR Gene Indices Clustering Tool (TGICL): a software A, Nowotny M, Ginalski K, Bujnicki JM (2014) The RNase H system for fast clustering of large EST datasets. Bioinformatics superfamily: new members, comparative structural analysis and 19:651–652 evolutionary classification. Nucleic Acids Res 42:4160–4179 Prentis PJ, Pavasovic A (2014) The Anadara trapezia transcriptome: a Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y (2008) RNA- resource for molluscan physiological genomics. Mar Genomics seq: an assessment of technical reproducibility and comparison of Pt B 18:113–115 gene expression arrays. Genome Res 18:1509–1517 Rao R, Zhu YB, Alinejad T, Tiruvayipati S, Thong KL, Wang J, Martin JA, Wang Z (2011) Next-generation transcriptome assembly. Bhassu S (2015) RNA-seq analysis of Macrobrachium rosenber- Nat Rev Genet 12:671–682 gii hepatopancreas in response to Vibrio parahaemolyticus infec- McDowell IC, Nikapitiya C, Aguiar D, Lane CE, Istrail S, Gomez- tion. Gut Pathogens 7:6 Chiarri M (2014) Transcriptome of American oysters, Cras- Ravasi T, Huber T, Zavolan M, Forrest A, Gaasterland T, Grimmond sostrea virginica, in response to bacterial challenge: insights S, RIKEN GER Group, GSL Members, Hume DA (2003) Sys- into potential mechanisms of disease resistance. PLoS One tematic characterization of the zinc-finger-containing proteins in 9:e105097 the mouse transcriptome. Genome Res 13:1430–1442 Meng J, Zhu Q, Zhang L, Li C, Li L, She Z, Huang B, Zhang G Ren X, Liu T, Dong J, Sun L, Yang J, Zhu Y, Jin Q (2012) Evaluating (2013) Genome and Transcriptome analyses provide insight de Bruijin graph assemblers on 454 transcriptomic data. PLoS into the euryhaline adaptation mechanism of Crassostrea gigas. One 7:e51188 PLoS One 8:e58563 Rhee SY, Wood V, Dolinski K, Draghici S (2008) Use and misuse of Mikheyev AS, Vo T, Wee B, Singer MC, Parmesan C (2010) Rapid the gene ontology annotations. Nat Rev Genet 9:509–515 microsatellite isolation from a Butterfly by De novo transcrip- Rogers MF, Ben-Hur A (2009) The use of gene ontology evidence tome sequencing: performance and a comparison with AFLP- codes in preventing classifier assessment bias. Bioinformatics derived distances. PLoS One 5:e11212 25:1173–1177 Min DK, Lee JS, Koh DB, Je JK (2004) Molluscs in Korea. Hangeul Sadamoto H, Takahashi H, Okada T, Kenmoku H, Toyota M, Asakawa Publishing Company, Min Molluscan Research Institute, Korea, Y (2012) De novo sequencing and transcriptome analysis of the p 566 central nervous system of mollusc Lymnaea stagnalis by deep Moroz LL, Edwards JR, Puthanveettil SV, Kohn AB, Ha T, Heyland RNA sequencing. PLoS One 7:e42546 A, Knudsen B, Sahni A, Yu F, Liu L, Jezzini S, Lovell P, Ian- Salzberg SL, Phillippy AM, Zimin A, Puiu D, Magoc T, Koren S, nucculli W, Chen M, Nguyen T, Sheng H, Shaw R, Kalachikov Treangen TJ, Schatz MC, Delcher AL, Roberts M, Marcais G, S, Panchin YV, Farmerie W, Russo JJ, Ju J, Kandel ER (2006) Pop M, Yorke JA (2012) GAGE: a critical evaluation of genome Neuronal transcriptome of Aplysia: neuronal compartments and assemblies assembly algorithms. Genome Res 22:557–567 circuitary. Cell 127:1453–1467 Schileyko AA (2004) Treatise on recent terrestrial pulmonate mol- Mosavi LK, Cammett TJ, Desrosiers DC, Peng Z (2004) The ankyrin luscs. Pt. 12. Bradybaenidae, Monodeniidae, Xanthonychidae, repeat as molecular architecture for protein recognition. Protein Epiphragmophoridae, Helminthoglyptidae, Elonidae, Humbold- Sci 13:1435–1448 tianidae, Sphincterochilidae, Cochlicellidae. Ruthenica Suppl Noseworthy RG, Lim N-R, Choi K-S (2007) A catalogue of the 2:1627–1763 mollusks of the Jeju Island, South Korea. Korean J Malacol Shi X, Gupta S, Lindquist IE, Cameron CT, Mudge J, Rashotte AM 23:65–104 (2013) Transcriptome analysis of cytokinin response in tomato Pallavicini A, Canapa A, Barucca M, Alfoldi J, Biscotti MA, Buono- leaves. PLoS One 8:e55090 core F, De Moro G, Di Palma F, Fausto AM, Forconi M, Ger- Simakov O, Larsson TA, Arendt D (2013) Linking macro- and micro- dol M, Makapedua DM, Turner-Meier J, Olmo E, Scapigliati G evolution at the cell type level: a view from the lophotrochozoan (2013) Analysis of the transcriptome of the Indonesian coela- Platynereis dumerilii. Brief Funct Genomics 12:430–439 canth Latimeria menadoensis. BMC Genom 14:538 Storey KB, Storey JM (2010) Metabolic regulation and gene expres- Park G-B (2011) Karyotypes of Korean endemic land snail, Koreano- sion during aestivation. In: Navas CA, Carvalho JE (eds) Aes- hadra kurodana (Gastropoda: Bradybaenidae). Korean J Malacol tivation: molecular and physiological aspects, progress in 27:87–90 molecular and subcellular biology, vol 49. Springer, Berlin. Patnaik BB, Hwang H-J, Kang SW, Park SY, Wang TH, Park EB, doi:10.1007/978-3-642-02421-4_2 Chung JM, Song DK, Kim C, Kim S, Lee JB, Jeong HC, Park Sysoev A, Schileyko A (2009) Land snails and slugs of Russia and HS, Han YS, Lee YS (2015) Transcriptome characterization for adjacent countries. Pensoft, Sofia non-model endangered lycaenids, Protantigius superans and Tatusov RL, Natale DA, Garkavtsev IV, Tatusova TA, Shankavaram Spindasis takanosis, using Illumina HiSeq 2500 sequencing. Int UT, Rao BS, Kiryutin B, Galperin MY, Fedorova ND, Koonin J Mol Sci 16:29948–29970 EV (2001) The COG database: new developments in phyloge- Patnaik BB, Park SY, Kang SW, Hwang H-J, Wang TH, Park EB, netic classification of proteins from complete genomes. Nucleic Chung JM, Song DK, Kim C, Kim S, Lee JB, Jeong HC, Park Acids Res 29:22–28 HS, Han YS, Lee YS (2016a) Transcriptome profile of the Asian

1 3 Mol Genet Genomics

Teaniniuraitemoana V, Huvet A, Levy P, Klopp C, Lhuillier E, Gaert- You FM, Huo N, Gu YQ, Luo M-C, Ma Y, Hane D, Lazo GR, Dvorak ner-Mazouni N, Gueguen Y, Le Moullac G (2014) Gonad tran- J, Anderson OD (2008) BatchPrimer3: a high throughput web scriptome analysis of pearl oyster Pinctada margaritifera: iden- application for PCR and sequencing primer design. BMC Bio- tification of potential sex differentiation and sex determining inform 9:253 genes. BMC Genom 15:491 Yue H, Li C, Du H, Zhang S, Wei Q (2015) Sequencing and De Novo Voronin DA, Kiseleva EV (2008) Functional role of proteins contain- assembly of the Gonadal transcriptome of the endangered Chi- ing ankyrin repeats. Cell Tissue Biol 2:1–12 nese Sturgeon (Acipenser sinensis). PLoS One 10:e0127332 Wang Y, Guo X (2007) Development and characterization of EST- Zalapa JE, Cuevas H, Zhu H, Steffan S, Senalik D, Zeldin E, McCown SSR markers in the eastern oyster Crassostrea virginica. Mar B, Harbut R, Simon P (2012) Using next-generation sequencing Biotechnol 9:500–511 approaches to isolate simple sequence repeats (SSR) loci in the Wang Z, Fang B, Chen J, Zhang X, Luo Z, Huang L, Chen X, Li Y plant sciences. Am J Bot 99:1–16 (2010) De novo assembly and characterization of root transcrip- Zhang J, Liang S, Duan J, Wang J, Chen S, Cheng Z, Zhang Q, Liang tome using Illumina paired-end sequencing and development of X, Li Y (2012) De novo assembly and characterization of the cSSR markers in sweet potato (Ipomoea batatus). BMC Genom transcriptome during seed development, and generation of genic- 11:726 SSR markers in Peanut (Arachis hypogea L.). BMC Genom Wang S, Wang X, He Q, Liu X, Xu W, Li L, Gao J, Wang F (2012) 13:90 Transcriptome analysis of the roots at early and late seedling Zhang L, Li L, Zhu Y, Zhang G, Guo X (2014) Transcriptome analy- stages using Illumina paired-end sequencing and development of sis reveals a rich gene set related to innate immunity in the East- EST-SSR markers in radish. Plant Cell Rep 31:1437–1447 ern Oyster (Crassostrea virginica). Mar Biotechnol 16:17–33 Wang L, Yang C, Song L (2013) The molluscan HSP70s and their Zhao S, Fung-Leung W-P, Bittner A, Ngo K, Liu X (2014) Compari- expression in hemocytes. ISJ 10:77–83 son of RNA-Seq and microarray in transcriptome profiling of Wang W, Hui JH, Chan TF, Chu KH (2014) De novo transcriptome activated T cells. PLoS One 9:e78644 sequencing of the snail Echinolittorina malaccana: identification Zhao QP, Xiong T, Xu XJ, Jiang MS, Dong HF (2015) De novo tran- of genes responsive to thermal stress and development of genetic scriptome analysis of Oncomelania hupensis after Molluscicide markers for population studies. Mar Biotechnol 16:547–559 treatment by Next-Generation Sequencing: implications for Werner GDA, Gemmell P, Grosser S, Hamer R, Shimeld SM (2012) Biology and future snail interventions. PLoS One 10:e0118673 Analysis of a deep transcriptome from the mantle tissue of Zheng X, Pan C, Diao Y, You Y, Yang C, Hu Z (2013) Development of Patella vulgata Linnaeus (Mollusca: Gastropoda: Patellidae) microsatellite markers by transcriptome sequencing in two spe- reveals biomineralising genes. Mar Biotechnol. doi:10.1007/ cies of Amorphophallus (Araceae). BMC Genom 14:490 s10126-012-9481-0 Wu S-P, Hwang C-C, Huang H-M, Chang H-W, Lin Y-S, Lee P-F (2007) Land molluscan fauna of the Dongsha Island with twenty two recorded species. Taiwania 52:145–151

1 3