Marine Genomics 3 (2010) 179–191


Marine Genomics

journal homepage: www.elsevier.com/locate/margen

Gilthead sea bream (Sparus auratus) and European sea bass (Dicentrarchus labrax) expressed sequence tags: Characterization, tissue-specific expression and gene markers

Bruno Louro a,j, Ana Lúcia S. Passos a, Erika L. Souche b,1, Costas Tsigenopoulos c, Alfred Beck d, Jacques Lagnel c, François Bonhomme e, Leonor Cancela a, Joan Cerdà f, Melody S. Clark g, Esther Lubzens h, Antonis Magoulas c, Josep V. Planas i, Filip A.M. Volckaert b, Richard Reinhardt d, Adelino V.M. Canario a,⁎

a Centre of Marine Sciences, University of Algarve, Building 7, Gambelas, 8000-139 Faro, Portugal
b Laboratory of Animal Diversity and Systematics, Katholieke Universiteit Leuven, Charles Deberiotstraat 32, B-3000 Leuven, Belgium
c Hellenic Centre for Marine Research, Institute of Marine Biology and Genetics, Thalassocosmos, Ex-US base at Gournes, P.O. Box 2214, Gournes Pediados, 715 00 Heraklion, Crete, Greece
d MPI Molecular Genetics, Ihnestrasse 63-73, D-14195 Berlin-Dahlem, Germany
e Département Biologie Intégrative, Institut des Sciences de l'Evolution, UMR 5554 Université de Montpellier 2, cc 63 - Pl. E Bataillon, F34095 Montpellier Cedex 5, France
f Laboratory of Institut de Recerca i Tecnologia Agroalimentaries (IRTA)-Institut de Ciencies del Mar (Consejo Superior de Investigaciones Científicas, CSIC), Passeig Marítim 37-49, 08003-Barcelona, Spain
g British Antarctic Survey, Natural Environment Research Council, High Cross, Madingley Road, Cambridge CB3 0ET, UK
h National Institute of Oceanography, Israel Oceanographic & Limnological Research, P.O. Box 8030, Haifa 31080, Israel
i Departament de Fisiologia, Facultat de Biologia, Universitat de Barcelona, Av. Diagonal 645, 08028 Barcelona, Spain
j Division of Genetics and Genomics, Roslin Institute and Royal (Dick) School of Veterinary Sciences, University of Edinburgh, Roslin, Midlothian, EH25 9PS, United Kingdom

Article info

Article history: Received 16 May 2010; Received in revised form 17 September 2010; Accepted 21 September 2010

Keywords: Expressed Sequence Tag; Microsatellite; Single nucleotide polymorphism; Annotation; Aquaculture; Teleost fish

Abstract

The gilthead sea bream, Sparus auratus, and the European sea bass, Dicentrarchus labrax, are two of the most important marine species cultivated in Southern Europe. This study aimed at increasing genomic resources for the two species and produced and annotated two sets of 30,000 expressed sequence tags (EST) each from 14 normalized tissue-specific cDNA libraries from sea bream and sea bass. Clustering and assembly of the ESTs formed 5268 contigs and 12,928 singletons for sea bream and 4573 contigs and 13,143 singletons for sea bass, representing 18,196 and 17,716 putative unigenes, respectively. Assuming a similar number of genes in the sea bass, sea bream and model fish Gasterosteus aculeatus genomes, it was estimated that approximately two thirds of the sea bream and the sea bass transcriptomes were covered by the unigene collections. BLAST sequence similarity searches (using a cut-off of e-value < 10⁻⁵) against the fully curated SwissProt (and TrEMBL) databases produced matches for 28% (37%) and 43% (53%) of the sea bream and sea bass unigene datasets respectively, allowing some putative designation of function. A comparative approach is described using human Ensembl peptide ID homologs for functional annotation, which increased the number of unigenes with GO terms assigned and resulted in more GO terms assigned per unigene. This allowed the identification of tissue-specific genes using enrichment analysis for GO pathways and protein domains. The comparative annotation approach represents a good strategy for transferring more relevant biological information from highly studied species to species with poorer genomic resources.
It was possible to confirm by interspecies mRNA-to-genomic alignments 25 and 21 alternative splice events in sea bream and sea bass genes, respectively. Even using normalized cDNA from relatively few pooled individuals it was possible to identify 1145 SNPs and 1748 microsatellite loci for genetic marker development. The EST data are being applied to a range of projects, including the development of microarrays, genetic and radiation hybrid maps and QTL genome scans. This highlights the important role of ESTs for generating genetic and genomic resources of aquaculture species. © 2010 Elsevier B.V. All rights reserved.

⁎ Corresponding author. E-mail addresses: [email protected] (B. Louro), [email protected] (A.L.S. Passos), [email protected] (E.L. Souche), [email protected] (C. Tsigenopoulos), [email protected] (A. Beck), [email protected] (J. Lagnel), [email protected] (F. Bonhomme), [email protected] (L. Cancela), [email protected] (J. Cerdà), [email protected] (M.S. Clark), [email protected] (E. Lubzens), [email protected] (A. Magoulas), [email protected] (J.V. Planas), [email protected] (F.A.M. Volckaert), [email protected] (R. Reinhardt), [email protected] (A.V.M. Canario). 1 Current address: Institut Pasteur, Plate-Forme Intégration et Analyse Génomiques, 28 Rue du Docteur Roux, F-75724 Paris Cedex 15, France.

1874-7787/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.margen.2010.09.005

1. Introduction

The gilthead sea bream, Sparus auratus, and the European sea bass, Dicentrarchus labrax, are two of the main marine fish species cultured in Southern Europe. Both species are economically valuable and, therefore, over the last two decades efforts have been made to increase the knowledge of their physiology, reproduction, immunology, ecology, quantitative genetics and population genetics (Power et al., 2001; Scapigliati et al., 2002; Piferrer et al., 2005; Pawson et al., 2007; Lemaire et al., 2005; Laiz-Carrion et al., 2005; Arends et al., 1999; Senger et al., 2006; Meiri et al., 2002). The fact that these are not designated model fish species has contributed to the shortage of genomic resources. However, recently there has been growing awareness of the value of genomic information in the efficient implementation of strategies for selective breeding, disease prevention, treatment and quality control in cultured fish species. Therefore, the EU Network of Excellence Marine Genomics Europe (MGE; http://www.marine-genomics-europe.org/) set out to develop EST (expressed sequence tags; single-pass sequencing of cDNA libraries) projects on aquacultured marine fish species, focusing on the sea bream and the sea bass. These EST projects represent an important contribution to the genomic resources of these two species.

The generation and analysis of ESTs is a valuable approach for the identification and characterization of new genes and provides a platform for developing functional genomics methods (Gong, 1999; Douglas et al., 1999; 2007; Gonzalez et al., 2007; Cerdà et al., 2008; Canario et al., 2008; Koop et al., 2008).

Representation of transcript abundance and diversity in EST collections is dependent on the method of cDNA library construction. If the objective is to obtain the widest possible representation of cDNAs, then it is a common procedure to normalize the cDNA libraries prior to sequencing in order to increase the presence of rare transcripts; however, the gene expression profiles are lost (Bonaldo et al., 1996; Coblentz et al., 2006; Govoroun et al., 2006). In contrast, non-normalized cDNA libraries may be deficient in rare transcripts, unless deep sequencing coverage is achieved, something that is now more accessible with Next Generation pyrosequencing (Morozova and Marra, 2008).

Fish EST projects started in the late 90s with the model fishes zebrafish, Danio rerio (Gong, 1999; Gong et al., 1997) and medaka, Oryzias latipes (Hirono and Aoki, 1997). These are still the species with the largest collections of published ESTs. These were followed by another two model fishes, the Japanese puffer fish, Takifugu rubripes (Clark et al., 2003) and the mummichog killifish, Fundulus heteroclitus (Paschall et al., 2004). Since then, large scale EST projects have been developed for the cultured salmonids, Atlantic salmon, Salmo salar (Adzhubei et al., 2007; Hagen-Larsen et al., 2005) and rainbow trout, Oncorhynchus mykiss (Govoroun et al., 2006), and for other cultured fish, such as the catfish, Ictalurus spp (Li et al., 2007; Wang et al., 2010), the common carp, Cyprinus carpio (Gonzalez et al., 2007), and Nile tilapia, Oreochromis niloticus (Lee et al., 2010). Currently, there are more than a dozen teleost EST collections comprising more than 50,000 ESTs, of which eight are cultured fish species. Prior to the study described in this article, there were fewer than 2,500 transcript sequences published for each of sea bream and sea bass, which included a random mix from directed gene-based studies and small scale EST collections (Chini et al., 2006).

The objective of this work was to increase the knowledge base of sea bream and sea bass genomics through the generation and analysis of two medium-scale EST collections. This study has produced approximately 30,000 ESTs using Sanger sequencing from 14 tissue-specific normalized cDNA libraries for each species. These now represent 44% of sea bream and 54% of sea bass entries in the NCBI GenBank dbEST (accessed 24 April 2010). These EST collections have already had a significant impact in the development of additional genomics tools and applications such as microarrays (Ferraresso et al., 2008; Calduch-Giner et al., 2010; Ferraresso et al., 2010), microsatellite markers and linkage maps (Vogiatzi et al., submitted for publication), SNP development and population studies (Souche et al., 2007), assembly and annotation of draft genome sequence (Kuhl et al., 2010), sea bass radiation hybrid map and QTL mapping (Massault et al., 2010).

2. Methods

2.1. cDNA libraries construction

The sea bream and sea bass were obtained from local fish farms and maintained at the Ramalhete marine station (University of Algarve, Faro, Portugal). The sea bream cDNA libraries were made from tissue pools of 9 animals, ranging in weight between 0.3 and 4 kg and including males (2), females (4) and hermaphrodites (3). The sea bass cDNA libraries were made from tissue pools from 5 male individuals of approximately 450 g, except for the ovarian tissue, which came from a single individual from a separate stock. Animals were sacrificed with an excess of 2-phenoxyethanol, tissues were dissected and stored in liquid nitrogen at −80°C. Total RNA was extracted using the acid guanidinium thiocyanate-phenol-chloroform method (Chomczynski and Sacchi, 1987).

Fourteen normalized tissue-specific cDNA libraries were constructed from liver; ovary; testis; bone/cartilage; brain/pituitary; heart/vessels; adipose; head/kidney; trunk/kidney; gill; intestine; spleen; muscle; and skin of each species. cDNA library construction and sequencing were carried out at the Max Planck Institute of Molecular Genetics (Berlin, Germany). mRNA purification and concentration were performed using Dynal oligo-dT magnetic beads (Invitrogen Dynal AS, Oslo, Norway). cDNA libraries were constructed using the cDNA SMART-kit (Clontech, Palo Alto, CA, USA) (Zhu et al., 2001) and normalized using thermostable duplex-specific nuclease (Zhulidov et al., 2004).

2.2. Sequencing: editing and cluster assembly

Sequencing was performed using Capillary Sequencer systems (ABI 3730 XL and GE Healthcare MegaBace 4500) at the Max Planck Institute of Molecular Genetics (Berlin, Germany). For both sequencing systems, the sequencing kit used was the ABI BigDye Terminator v.3.1 (Applied Biosystems, Foster City, CA, USA). For each library between 2304 and 2688 ESTs were sequenced from the 5´ ends for both species. The resulting ESTs were quality-trimmed (>q20) and vector-clipped using Phred (Ewing et al., 1998; Ewing and Green, 1998; http://www.phrap.org) and LUCY (Chou and Holmes, 2001; http://www.tigr.org), and all sequences shorter than 100 bp were discarded. Cross-match (http://www.phrap.org) was then used to screen the sequences for contaminant vector sequences. Repeats were masked by RepeatMasker (Smit et al., 1996–2010) before the ESTs were clustered and assembled, according to the default parameters, using TGICL (Pertea et al., 2003). Consensus sequences originating from clusters of ESTs representing the same transcripts were termed contigs and unique sequences were classified as singletons. All sequences were deposited in EMBL with accession numbers AM950553 to AM980446 (S. auratus) and FM178562 to FM178778 (D. labrax).

2.3. Unigene identification using sequence similarity searching

The BLASTX algorithm (Altschul et al., 1990; 1997) was used to query for sequence similarity of the unigenes against the Swissprot and Trembl databases using a local blast engine, blastall (ftp://ftp.ncbi.nih.gov/blast/executables/). The fully manually curated and annotated Swissprot database (ftp://ftp.expasy.org/databases/uniprot/knowledgebase/uniprot_sprot.fasta.gz, version 10/04/2008) was queried in order to obtain putative functional identifications for the unigenes. BLASTX was also used to query the Trembl database (ftp://ftp.expasy.org/databases/uniprot/knowledgebase/uniprot_trembl.fasta.gz, version 10/04/2008) to improve the unigene identifications.
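The similarity searches in this section rely on an e-value cut-off of 1e−5 and, for the BLASTN search against dbEST, an additional bit-score threshold of 40. The following is a minimal sketch of how such cut-offs could be applied to BLAST results, not the authors' pipeline. It assumes the hits are available as 12-column tab-delimited text with the e-value in column 11 and the bit score in column 12; the names `Hit`, `parse_tabular` and `best_hits` are hypothetical, and taking the highest-scoring surviving hit as the "first match" is also an assumption.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional


class Hit(NamedTuple):
    query: str       # unigene identifier
    subject: str     # database entry identifier
    evalue: float
    bitscore: float


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse 12-column tab-delimited BLAST output (assumed layout)."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        # In the assumed layout, columns 11 and 12 are e-value and bit score.
        hits.append(Hit(fields[0], fields[1], float(fields[10]), float(fields[11])))
    return hits


def best_hits(
    hits: Iterable[Hit],
    max_evalue: float = 1e-5,
    min_bitscore: Optional[float] = None,
) -> Dict[str, Hit]:
    """Keep, per query, the highest-scoring hit that passes the cut-offs."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.evalue >= max_evalue:
            continue  # e-value must be strictly below the cut-off
        if min_bitscore is not None and hit.bitscore <= min_bitscore:
            continue  # bit score must be strictly above the threshold
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return best
```

Under these assumptions, `best_hits(hits)` would mirror the BLASTX cut-off, and `best_hits(hits, min_bitscore=40)` the stricter BLASTN search against dbEST.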

Both BLASTX queries were performed using the blosum62 matrix with 1e−5 e-value, 10 open gap penalty, 1 extended gap penalty and 3 word size as cut-off values.

In order to identify novel transcripts, Genbank dbEST was also queried using BLASTN (with cut-off values of E-value < 1e−5 and bit score > 40). The sequence similarity searches were performed both on the unigenes resulting from all tissues clustered together into a species-specific EST database and also on each tissue-specific EST database (data not shown). Spreadsheet tables were produced with the BLAST results, cluster consensus sequences and links to the EMBL UniProt (Supplementary files 1 and 2).

2.4. Unigene GO annotation

Two methods were used to obtain Gene Ontology (GO) terms (Ashburner et al., 2000). The first approach was to directly retrieve the GO terms, if available, from the first match of the BLASTX query against Trembl. The second approach was to obtain the GO terms for sea bream and sea bass unigenes separately, via a comparative approach to the stickleback, Gasterosteus aculeatus, Ensembl transcripts (http://www.ensembl.org/Gasterosteus_aculeatus) and human, Homo sapiens, Ensembl peptides (http://www.ensembl.org/Homo_sapiens) (Fig. 1). The unigenes were queried against the human Ensembl peptides (Fig. 1 step 1) (Homo_sapiens.NCBI36.47.pep.all.fa, 26/10/2007) and stickleback Ensembl transcripts (Fig. 1 step 2) (Gasterosteus_aculeatus.BROADS1.46.cdna.all.fa, 26/10/2007) via local blast using BLASTX and BLASTN, respectively. Blast parameters were as described previously. BLAST results against the human and stickleback were merged and overlapping clones were eliminated. For this purpose stickleback Ensembl transcript identifiers retrieved from the blastn output were used as input filters via biomart martview (http://www.biomart.org/martview) to retrieve human peptide orthologs (Fig. 1 step 3) as output attributes. Because several homology relationships exist (one-to-one, one-to-many, many-to-many, and many-to-one), a Microsoft Office Access 2003 database was created and queried to filter maximum homology human peptides (Fig. 1 step 4). Ensembl human peptide identifiers retrieved via blastp (Fig. 1 step 1) and unique Ensembl human peptide identifiers retrieved via stickleback comparison were merged and overlapping identifiers for sea bream or sea bass unigenes were eliminated. In order to do this the Ensembl human identifier data were uploaded to the Microsoft Office Access 2003 database and cross-queried using sea bream or sea bass unigenes as primary key link to filter for redundancy (Fig. 1 step 5). A unique file with unigenes and their respective human peptide orthologous Ensembl IDs was obtained for sea bass and sea bream unigenes separately. The procedure was carried out separately for each tissue library and also for all libraries combined. The output files were used to retrieve all the human GO annotations by querying the “Ensembl 50 genes (Sanger UK), Homo sapiens gene (NCBI36)” datasets with biomart martview (Fig. 1 step 6). The retrieved GO terms were submitted to the GO term classification counter tool from the Gene Ontology Consortium (Zhi-Liang et al., 2008) to produce a representative GO set (GOA-slim set plus the direct child terms of binding and cellular processes) of the two ancestral GO terms (Biological process and Molecular function) separately. All GO classification data output was integrated in spreadsheet tables with all of the BLAST results using Microsoft Office Access 2003, with sea bream or sea bass unigenes defined as primary key to link all incremental data in one file (Supplementary files 1 and 2).

2.5. Tissue specific gene-enrichment analysis

To obtain specific biological annotations for each sea bass and sea bream tissue library, a gene-enrichment analysis was carried out using the DAVID functional annotation tool (http://www.david.abcc.ncifcrf.gov/summary.jsp). Human Ensembl gene IDs obtained via biomart, corresponding to human Ensembl peptide ID homologs of the sea bream and sea bass unigenes present only in one given specific tissue, were uploaded and submitted to the DAVID functional annotation tool (http://www.david.abcc.ncifcrf.gov/summary.jsp). Batch annotations for all tissue-specific biological clusters containing

Fig. 1. Flowchart of the Gene Ontology annotation methodology carried out for the two species and for each cDNA library. Unigenes were queried against the human Ensembl peptides (step 1) and stickleback Ensembl transcripts (step 2) using BLASTX and BLASTN, respectively. The two BLAST results were merged and overlapping clones eliminated. For this, transcript identifiers retrieved from the blastn output were used as input filters via biomart martview to retrieve human peptide orthologs (step 3) as output attributes. Maximum homology human peptides were filtered in step 4. Ensembl human peptide identifiers retrieved via blastp (step 1) and unique human peptide identifiers retrieved via the stickleback comparison were merged and overlapping identifiers for sea bream or sea bass unigenes eliminated (step 5). The output files were used to retrieve all the human GO annotations by querying the “Ensembl 50 genes (Sanger UK), Homo sapiens gene (NCBI36)” datasets with biomart martview (step 6). See also Section 2.4 for details.

molecular function and biological process GO terms, Interpro protein functional domains, KEGG and Biocarta pathways were retrieved using a false discovery rate of 10% as function of the EASE Score and a modified Fisher Exact P-value as a cut-off score. The same procedure was repeated for all the unigenes in the different tissue-specific libraries irrespective of their coverage within the libraries.

2.6. Alternative splice form analysis

In order to identify sequence insertions/gaps or partially dissimilar sequences, potentially indicative of alternative splicing events, “supracontigs” were created using a more relaxed clustering gap penalty parameter and aligned using ClustalX (Thompson et al., 1997). Contigs containing inserts or partially dissimilar sequences were used as queries against the green spotted pufferfish (Tetraodon nigroviridis) genome (http://www.genoscope.cns.fr/externe/tetranew/) using Blat (Kent, 2002) in order to retrieve the respective gene loci necessary for the mRNA-to-genomic alignments. To confirm the genomic organization and identify the type of alternative splicing present in each gene, the Spidey software (Wheelan et al., 2001) was used, selecting the divergent sequences option necessary for the interspecies alignment.

2.7. SNP detection

ESTs are single-pass reads and therefore they contain base calling errors on average every 100 bp (Edwards, 2007) and are prone to transcriptional errors and misalignments (Garg et al., 1999). In order to limit the number of false SNP candidates, only mismatches appearing twice in an alignment (redundant mismatches) are usually considered SNP candidates (Garg et al., 1999; Batley et al., 2003; Le Dantec et al., 2004). The software MiraEST (version 2.4; Chevreux et al., 2004) was considered to be the most efficient (data not shown) for mining sea bass ESTs for SNPs when redundant SNP candidates were selected. Default parameters and tissue information were used. Insertions/deletions (indels) of several base pairs, initially detected as several indels of one base pair, were considered single indels of several base pairs.

2.8. Microsatellite detection

Both sea bass and sea bream unigenes were used to identify and characterize SSRs (Simple Sequence Repeats) using a Perl script based on the algorithm of the MISA script (http://www.pgrc.ipk-gatersleben.de/misa). Perfect tandem repeats were defined in sequences with a minimum number of repeats of 9 for dinucleotide, 6 for trinucleotide, 5 for tetranucleotide, and 4 for penta- and hexanucleotide repeats. SSR-ESTs were analyzed for redundancy by BLASTN and/or Cap3 software. SSR results were retrieved in a tabular form using a pipeline of SQL queries and Perl scripts.

3. Results and discussion

3.1. cDNA library construction, sequencing and clustering

The aim of this project was to obtain a comprehensive overview of the sea bream and sea bass transcriptomes. Hence, 14 normalized tissue-specific cDNA libraries were constructed for each species and approximately 2000 clones from each library were single-pass sequenced from their 5´ ends. The normalization procedure resulted in an average of 10% and 12% redundancy per tissue for the sea bream and sea bass, respectively, with the exception of the sea bass skin library (60%; Table 1). Normalization success was confirmed by the fact that most contigs contained only two ESTs.

A total of 29,895 and 29,260 high quality ESTs were obtained from the sea bream and sea bass, respectively (Table 1). The average length

of high quality unassembled ESTs was 676 bp for sea bream and 636 bp for sea bass, with, respectively, 86% and 80% of sequences longer than 500 bp after quality trimming.

Clustering of all tissues for each species yielded 5,268 contigs and 12,928 singletons for the sea bream and 4,573 contigs and 13,143 singletons for the sea bass (Table 1). The contigs plus the singletons putatively corresponded to different transcripts and were designated unigenes.

Stickleback is, among the genome sequenced fish species, the one with the highest representation of Ensembl predicted transcripts (27,629; http://www.ensembl.org/Gasterosteus_aculeatus/Info/StatsTable) and at the same time is the most closely related phylogenetically to sea bream and sea bass. Taking the number of stickleback transcripts as a near representation of the whole transcriptome, the ratio of sea bream and sea bass unigenes to stickleback predicted transcripts indicates that 66% and 64% of the transcriptomes were covered by the unigene collections, respectively. The average length of the sea bream unigenes was 776 bp, with a median of 750 bp and a mode of 782 bp. The sea bass unigenes had an average length of 698 bp, a median of 699 bp and a mode of 770 bp. The redundancy of the unigenes for the whole species datasets was 39% for both sea bream and sea bass. This was higher than the within tissue average redundancy due to two factors: redundancy is expected to increase with a higher transcriptome representation, and normalization procedures were carried out separately for each tissue library and not for the pool of tissues.

Table 1
Number of total sequences per tissue and their respective contigs, singletons and redundancy obtained by clustering in sea bream and sea bass.

Sea bream

| cDNA library | Total ESTs | Average length, ESTs (bp) | Average length, contigs (bp) | Average length, singletons (bp) | Unigenes (contigs+singletons) | Redundancy (%) |
|---|---|---|---|---|---|---|
| Liver | 2181 | 733 | 923 | 735 | 1799 (251+1548) | 18 |
| Ovary | 2107 | 670 | 845 | 670 | 1913 (155+1758) | 9 |
| Testis | 2141 | 718 | 913 | 717 | 1999 (128+1871) | 7 |
| Bone/cartilage | 2163 | 656 | 936 | 654 | 2012 (141+1862) | 7 |
| Brain/pituitary | 2104 | 618 | 889 | 616 | 1957 (127+1830) | 7 |
| Heart/vessels | 2210 | 726 | 992 | 722 | 1921 (217+1704) | 13 |
| Adipose | 2125 | 648 | 799 | 644 | 1864 (218+1646) | 12 |
| Head/kidney | 2161 | 679 | 937 | 675 | 1900 (206+1694) | 12 |
| Trunk/kidney | 2060 | 686 | 931 | 683 | 1907 (108+1799) | 7 |
| Gill | 2096 | 591 | 729 | 591 | 1871 (197+1674) | 11 |
| Intestine | 2088 | 671 | 930 | 671 | 1881 (175+1706) | 10 |
| Spleen | 2185 | 725 | 1021 | 721 | 2014 (141+1873) | 8 |
| Muscle | 2168 | 741 | 1050 | 736 | 1858 (225+1633) | 14 |
| Skin | 2105 | 601 | 823 | 601 | 1959 (131+1828) | 7 |
| All libraries | 29895 | 677 | 1049 | 665 | 18196 (5268+12928) | 39 |

Sea bass

| cDNA library | Total ESTs | Average length, ESTs (bp) | Average length, contigs (bp) | Average length, singletons (bp) | Unigenes (contigs+singletons) | Redundancy (%) |
|---|---|---|---|---|---|---|
| Liver | 2139 | 699 | 909 | 697 | 1858 (213+1645) | 13 |
| Ovary | 2389 | 526 | 659 | 525 | 2069 (264+1805) | 13 |
| Testis | 2127 | 697 | 881 | 692 | 1940 (153+1787) | 9 |
| Bone/cartilage | 2045 | 645 | 839 | 645 | 1842 (96+1746) | 10 |
| Brain/pituitary | 2118 | 579 | 739 | 578 | 1997 (95+1902) | 6 |
| Heart/vessels | 2296 | 541 | 715 | 536 | 2082 (157+1925) | 9 |
| Adipose | 2295 | 631 | 818 | 629 | 2127 (123+2004) | 7 |
| Head/kidney | 1935 | 535 | 638 | 536 | 1687 (116+1571) | 13 |
| Trunk/kidney | 2203 | 680 | 890 | 672 | 1917 (161+1756) | 13 |
| Gill | 2101 | 701 | 873 | 696 | 1877 (119+1758) | 11 |
| Intestine | 1989 | 743 | 868 | 738 | 1628 (97+1531) | 18 |
| Spleen | 2277 | 608 | 817 | 607 | 2074 (136+1938) | 9 |
| Muscle | 1776 | 694 | 828 | 697 | 1249 (102+1147) | 30 |
| Skin | 1570 | 668 | 711 | 695 | 632 (62+570) | 60 |
| All libraries | 29260 | 636 | 928 | 619 | 17716 (4573+13143) | 39 |

Table 2
Summary of BLAST query results for sea bream and sea bass unigenes.

| Database | Sea bream unigene hits | Sea bass unigene hits |
|---|---|---|
| Swissprot | 5166 (28%) | 7640 (43%) |
| Trembl | 6716 (37%) | 9320 (53%) |
| GenBank dbEST | 10470 (57%) | 12104 (68%) |
| Human Ensembl peptides | 5126 (28%) | 7595 (43%) |
| Stickleback Ensembl transcripts | 6056 (33%) | 8544 (48%) |
| Stickleback Ensembl peptides | 5811 (32%) | 8519 (48%) |

Table 3
Summary of GO annotation and method comparison. GO TrEMBL refers to annotation with GO terms retrieved directly from the first match of the Blastx query against the TrEMBL database. GO biomart refers to annotation with human GO terms retrieved via an integrated and comparative approach (see Fig. 1).

| Unigenes | GO TrEMBL (A) | GO biomart (B) | B/A |
|---|---|---|---|
| Sea bream | | | |
| With GO annotation (C) | 3272 (18%) | 5203 (29%) | 1.6× |
| Total GO terms (D) | 9408 | 29,834 | 3.2× |
| Average GO terms (D/C) | 2.9 | 5.7 | 2× |
| Sea bass | | | |
| With GO annotation (E) | 4658 (26%) | 7528 (43%) | 1.6× |
| Total GO terms (F) | 13,435 | 41,511 | 3.1× |
| Average GO terms (F/E) | 2.9 | 5.5 | 1.9× |
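The redundancy and transcriptome coverage figures quoted in Section 3.1 and Table 1 can be re-derived from the reported counts. The sketch below is illustrative only: the paper does not state the redundancy formula, so the definition used here (the share of ESTs that did not yield a new unigene) is an assumption that happens to reproduce the tabulated values. The function names are hypothetical.

```python
def redundancy_pct(total_ests: int, unigenes: int) -> float:
    """Assumed definition: percentage of ESTs not contributing a new unigene."""
    return 100.0 * (1.0 - unigenes / total_ests)


def coverage_pct(unigenes: int, reference_transcripts: int) -> float:
    """Unigenes as a percentage of a reference transcript count."""
    return 100.0 * unigenes / reference_transcripts


# Stickleback Ensembl predicted transcripts, as quoted in the text.
STICKLEBACK_TRANSCRIPTS = 27629

# (total ESTs, unigenes) for the "All libraries" rows of Table 1.
datasets = {
    "sea bream": (29895, 18196),
    "sea bass": (29260, 17716),
}

for species, (ests, unigenes) in datasets.items():
    print(
        f"{species}: redundancy {redundancy_pct(ests, unigenes):.0f}%, "
        f"coverage {coverage_pct(unigenes, STICKLEBACK_TRANSCRIPTS):.0f}%"
    )
```

With the values above this gives 39% redundancy for both species and coverage of 66% (sea bream) and 64% (sea bass), matching the text. The same assumed formula also matches individual libraries, for example 18% for sea bream liver (2181 ESTs, 1799 unigenes).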

Fig. 2. Gene Ontology annotation of the sea bream total ESTs according to Biological Process and Molecular Function. Direct child GO terms of cellular process and binding are represented in their respective stacked bars.

3.2. Unigene identification using sequence similarity

For the sea bream unigenes, 28% and 37% positive matches were obtained in the Swissprot and Trembl databases, respectively (Table 2). The corresponding values for sea bass were 43% and 53% (Table 2). Although a smaller percentage of positive matches was obtained by querying the fully curated Swissprot database, these matches produced more reliable sequence similarity identification with added biological annotation. The Trembl BLASTX query outputs produced more matches, with almost 80% of the most significant matches to the proteins of fish species. However, the biological annotation of these matches was very limited. For both the sea bream and sea bass Trembl BLASTX outputs, approximately half of the most significant matches were to the green spotted pufferfish genome predicted proteins and about one fifth were to the zebrafish genome automatically predicted, non-curated, proteins. The sea bass Trembl BLASTX queries had a much higher percentage of positive matches compared with those of the sea bream (Table 2). This, together with the higher number of matches against Genbank dbEST using BLASTN, indicated that the sea bass unigenes were more representative of coding region sequences (CDS) in comparison with sea bream. Additional evidence in support of this conclusion was the higher sea bass unigene GC content (data not shown) and higher percentage of trinucleotide microsatellites (see below), which are more abundant in exons in all taxa (Tóth et al., 2000).

3.3. EST GO annotations

The sea bream and sea bass unigene associated GO terms were retrieved from two BLASTX IDs, either directly from the first BLAST hit Trembl ID, or from the merged stickleback transcripts and human peptide Ensembl IDs (described in Fig. 1 and Section 2.4). The stickleback was chosen because it is the teleost fish most closely related phylogenetically to the species under study for which genome data and annotation are available. Human was used as the final anchor species, since it is the species with the most GO term annotated genes. BLASTN queries against the stickleback transcript database enabled identification of the ESTs consisting mainly of UTR regions, which had produced no significant results on the BLASTX queries to the Swissprot, Trembl and human peptide databases.

A higher than threefold increase in total GO terms retrieved for the two species was obtained using human peptide IDs in comparison to using Trembl IDs (Table 3). This was because of the greater depth of

Fig. 3. Gene Ontology annotation of the sea bass total ESTs according to Biological Process and Molecular Function. Direct child GO terms of cellular process and binding are represented in their respective stacked bars.

Table 4 Sea bream and sea bass integrated unigenes tissue-specific gene-enrichment analysis. Biological annotation clusters containing molecular function and biological process GO terms, Interpro protein functional domains, KEGG and Biocarta pathways.

Category Term % P value Fold

Spleen Interpro IPR011261:RNA , dimerisation 1.0 5.8E−06 19.6 Interpro IPR011262:RNA polymerase, insert 0.7 1.1E−04 32.7 Interpro IPR011263:RNA polymerase, RpoA/D/Rpb3-type 0.7 1.1E−04 32.7 GO bio p GO:0043285~biopolymer catabolic process 3.9 7.6E−04 2.2 GO mol f GO:0016502~nucleotide receptor activity 1.1 7.0E−04 6.3

Skin GO bio p GO:0007517~muscle development 3.9 1.5E−04 4.1 GO bio p GO:0048747~muscle fiber development 2.3 1.5E−04 8.6 GO bio p GO:0048741~skeletal muscle fiber development 2.3 1.5E−04 8.5 Interpro IPR000315:Zinc finger, B-box 2.3 6.7E−04 6.6 Interpro IPR006574:SPRY-associated 1.9 9.5E−04 7.9

Muscle GO bio p GO:0007517~muscle development 4.1 4.0E−07 4.3 GO bio p GO:0044267~cellular protein metabolic process 28.7 2.0E−08 1.5 GO bio p GO:0006413~translational initiation 1.9 5.5E−04 4.8

Intestine GO bio p GO:0044255~cellular lipid metabolic process 7.4 3.9E−07 2.3 GO bio p GO:0019752~carboxylic acid metabolic process 6.9 6.5E−07 2.3 GO bio p GO:0032787~monocarboxylic acid metabolic process 3.8 1.1E−05 3.0 GO bio p GO:0006631~fatty acid metabolic process 3.0 3.6E−05 3.3 Interpro IPR000265:Prostanoid EP3 receptor 0.8 4.4E−06 32.4 Interpro IPR001481:Prostanoid EP3 receptor, type 2 0.8 4.4E−06 32.4 Interpro IPR001244:Prostaglandin DP receptor 0.8 2.9E−05 23.1 Interpro IPR008365:Prostanoid receptor 0.8 5.1E−04 12.5 GO bio p GO:0015031~protein transport 7.4 7.3E−06 2.1 GO bio p GO:0045184~establishment of protein localization 7.8 8.0E−06 2.0 GO bio p GO:0051649~establishment of cellular localization 7.9 2.3E−04 1.7 GO bio p GO:0046907~intracellular transport 6.8 2.6E−04 1.8 KEGG p hsa00071:Fatty acid metabolism 1.7 1.1E−04 5.0 GO mol f GO:0050660~FAD binding 1.7 3.9E−04 4.4 GO bio p IPR007905:Emopamil−binding 0.8 1.3E−05 27.0 GO bio p GO:0008202~steroid metabolic process 2.8 2.7E−04 2.9 GO mol f GO:0047750~cholestenol delta- activity 0.7 3.4E−04 24.1 Interpro IPR006574:SPRY-associated 1.3 6.3E−04 5.4 Interpro IPR008144:Guanylate 1.3 3.5E−05 8.4 Interpro IPR008145:/L-type calcium channel region 1.3 6.6E−05 7.6 GO bio p GO:0006732~coenzyme metabolic process 3.0 1.6E−04 2.9

Heart and vessels
GO bio p | GO:0000074~regulation of progression through cell cycle | 6.3 | 3.6E−05 | 2.3
GO bio p | GO:0051726~regulation of cell cycle | 6.3 | 4.0E−05 | 2.2
GO bio p | GO:0045786~negative regulation of progression through cell cycle | 3.1 | 5.4E−04 | 2.8
GO bio p | GO:0055086~nucleobase, nucleoside and nucleotide metabolic process | 3.7 | 1.6E−04 | 2.8
GO bio p | GO:0009117~nucleotide metabolic process | 3.5 | 3.0E−04 | 2.8
GO bio p | GO:0006412~translation | 6.3 | 8.5E−04 | 1.9
GO mol f | GO:0004749~ribose phosphate diphosphokinase activity | 0.8 | 1.4E−04 | 32.6
Interpro | IPR005946:Ribose-phosphate pyrophosphokinase | 0.8 | 1.6E−04 | 31.2
GO mol f | GO:0016778~diphosphotransferase activity | 0.8 | 4.6E−04 | 23.5
GO mol f | GO:0016818~hydrolase activity, acting on acid anhydrides, in phosphorus-containing anhydrides | 6.5 | 1.8E−04 | 2.0
GO mol f | GO:0016462~pyrophosphatase activity | 6.1 | 8.1E−04 | 1.9
GO bio p | GO:0044267~cellular protein metabolic process | 24.9 | 8.5E−05 | 1.4

Head/trunk kidney
GO mol f | GO:0032555~purine ribonucleotide binding | 14.6 | 3.1E−07 | 1.5
GO mol f | GO:0032559~adenyl ribonucleotide binding | 11.9 | 3.2E−06 | 1.5
GO mol f | GO:0030554~adenyl nucleotide binding | 12.2 | 7.5E−06 | 1.4
GO bio p | GO:0006461~protein complex assembly | 3.3 | 1.7E−05 | 2.2
GO bio p | GO:0065003~macromolecular complex assembly | 4.9 | 1.7E−04 | 1.7
Interpro | IPR000265:Prostanoid EP3 receptor | 0.5 | 4.1E−05 | 18.4
Interpro | IPR001481:Prostanoid EP3 receptor, type 2 | 0.5 | 4.1E−05 | 18.4
Interpro | IPR001244:Prostaglandin DP receptor | 0.5 | 2.6E−04 | 13.2
GO bio p | GO:0007599~hemostasis | 1.7 | 5.3E−05 | 3.1
GO bio p | GO:0007596~blood coagulation | 1.5 | 3.4E−04 | 2.9
GO bio p | GO:0042060~wound healing | 1.7 | 7.5E−04 | 2.5
GO bio p | GO:0000074~regulation of progression through cell cycle | 4.5 | 4.3E−04 | 1.7
GO bio p | GO:0051726~regulation of cell cycle | 4.5 | 5.0E−04 | 1.7
GO bio p | GO:0015031~protein transport | 5.8 | 1.4E−04 | 1.6
GO bio p | GO:0045184~establishment of protein localization | 6.0 | 2.8E−04 | 1.6
GO bio p | GO:0046907~intracellular transport | 5.8 | 4.5E−04 | 1.6


Table 4 (continued)
Category | Term | % | P value | Fold

Head/trunk kidney
GO bio p | GO:0006886~intracellular protein transport | 3.7 | 5.3E−04 | 1.8
GO bio p | GO:0051649~establishment of cellular localization | 6.7 | 8.1E−04 | 1.5
GO bio p | GO:0007067~mitosis | 2.4 | 8.0E−04 | 2.1
GO bio p | GO:0000087~M phase of mitotic cell cycle | 2.4 | 9.1E−04 | 2.0
GO bio p | GO:0044267~cellular protein metabolic process | 22.2 | 5.9E−05 | 1.2
GO bio p | GO:0016192~vesicle-mediated transport | 4.4 | 7.1E−04 | 1.7
Interpro | IPR013128:Peptidase C1A, papain | 0.7 | 2.6E−04 | 7.2
GO bio p | GO:0044265~cellular macromolecule catabolic process | 3.7 | 2.7E−05 | 2.1
GO bio p | GO:0044275~cellular carbohydrate catabolic process | 1.6 | 4.0E−04 | 2.8
GO bio p | GO:0016052~carbohydrate catabolic process | 1.6 | 7.1E−04 | 2.6
Interpro | IPR013833:Cytochrome c oxidase, subunit III, 4-helical bundle | 0.5 | 4.1E−05 | 18.4
Interpro | IPR000298:Cytochrome c oxidase, subunit III | 0.5 | 4.1E−05 | 18.4
Interpro | IPR014014:RNA helicase, DEAD-box type, Q motif | 0.8 | 8.3E−04 | 4.4
GO bio p | GO:0008219~cell death | 6.5 | 6.7E−04 | 1.5
GO bio p | GO:0012501~programmed cell death | 6.1 | 8.4E−04 | 1.5
GO bio p | GO:0006915~apoptosis | 6.1 | 9.0E−04 | 1.5
Interpro | IPR000568:ATPase, F0 complex, subunit A | 0.6 | 2.7E−06 | 18.4
GO bio p | GO:0008632~apoptotic program | 1.3 | 6.0E−04 | 3.0

Gill
Interpro | IPR014745:MHC class II, alpha/beta chain, N-terminal | 9.8 | 2.0E−43 | 11.0
Interpro | IPR000353:MHC class II, beta chain, N-terminal | 9.6 | 2.6E−43 | 11.3
Biocarta | h_mhcPathway:Antigen Processing and Presentation | 5.6 | 1.3E−20 | 6.7
Biocarta | h_eosinophilsPathway:The Role of Eosinophils in the Chemokine Network of Allergy | 5.4 | 1.8E−20 | 6.9
Biocarta | h_bbcellPathway:Bystander B Cell Activation | 5.4 | 2.9E−20 | 6.8
Biocarta | h_il5Pathway:IL 5 Signaling Pathway | 5.4 | 4.7E−20 | 6.8
Biocarta | h_asbcellPathway:Antigen Dependent B Cell Activation | 5.4 | 1.9E−19 | 6.5
Biocarta | h_blymphocytePathway:B Lymphocyte Cell Surface Molecules | 5.4 | 1.9E−19 | 6.5
Biocarta | h_tcraPathway:Lck and Fyn tyrosine kinases in initiation of TCR Activation | 5.4 | 2.4E−18 | 6.0
Biocarta | H_th1th2Pathway:Th1/Th2 Differentiation | 5.4 | 5.3E−18 | 5.9
Biocarta | h_CSKPathway:Activation of Csk by cAMP-dependent Protein Kinase Inhibits Signaling through the T Cell Receptor | 5.4 | 3.5E−17 | 5.6
Biocarta | h_inflamPathway:Cytokines and Inflammatory Response | 5.4 | 7.2E−17 | 5.5
Biocarta | h_ctla4Pathway:The Co-Stimulatory Signal During T-cell Activation | 5.4 | 1.0E−16 | 5.4
GO mol f | GO:0032555~purine ribonucleotide binding | 13.9 | 8.8E−05 | 1.5
GO mol f | GO:0032555~purine ribonucleotide binding | 13.9 | 8.8E−05 | 1.5
Interpro | IPR003006:Immunoglobulin/major histocompatibility complex motif | 3.2 | 1.3E−05 | 3.4
Interpro | IPR003597:Immunoglobulin C1-set | 3.1 | 2.0E−05 | 3.4
GO bio p | GO:0009889~regulation of biosynthetic process | 2.9 | 2.5E−04 | 2.9
Interpro | IPR011769:Adenylate/cytidine kinase, N-terminal | 0.9 | 1.5E−04 | 16.6
GO bio p | GO:0045184~establishment of protein localization | 6.4 | 5.6E−04 | 1.8
GO mol f | GO:0019001~guanyl nucleotide binding | 4.4 | 2.7E−04 | 2.2
GO mol f | GO:0032561~guanyl ribonucleotide binding | 4.4 | 3.1E−04 | 2.2
Interpro | IPR001404:Heat shock protein Hsp90 | 1.0 | 1.5E−04 | 11.1

Bone and cartilage
GO mol f | IPR002048:Calcium-binding EF-hand | 4.2 | 4.1E−07 | 3.2
GO mol f | GO:0005509~calcium ion binding | 9.1 | 2.0E−05 | 1.8
GO mol f | IPR011992:EF-Hand type | 3.5 | 5.3E−05 | 2.8
GO bio p | GO:0006941~striated muscle contraction | 1.7 | 2.9E−07 | 8.7
Interpro | IPR001978:Troponin | 0.9 | 7.9E−07 | 25.8
GO bio p | GO:0006936~muscle contraction | 3.3 | 9.0E−07 | 3.7
GO bio p | GO:0006937~regulation of muscle contraction | 1.6 | 2.8E−06 | 8.0
Interpro | IPR000169:Peptidase, cysteine peptidase | 1.4 | 3.5E−05 | 6.9
GO bio p | GO:0006767~water-soluble vitamin metabolic process | 1.4 | 1.6E−04 | 5.7
Interpro | IPR001101:Plectin repeat | 0.8 | 3.9E−05 | 21.5
Interpro | IPR002017:Spectrin repeat | 1.1 | 7.9E−04 | 6.2
GO bio p | GO:0051129~negative regulation of cellular component organization and biogenesis | 1.4 | 1.3E−04 | 5.8
GO bio p | GO:0051128~regulation of cellular component organization and biogenesis | 1.7 | 8.2E−04 | 3.7
GO bio p | GO:0006732~coenzyme metabolic process | 3.0 | 1.2E−04 | 2.8
GO bio p | GO:0006767~water-soluble vitamin metabolic process | 1.4 | 1.6E−04 | 5.7
Interpro | IPR008937:Ras guanine nucleotide exchange factor | 1.3 | 5.6E−05 | 7.8
Interpro | IPR002396:Selectin (CD62E/L/P antigen) | 0.8 | 1.3E−04 | 16.7

Adipose
Interpro | IPR011261:RNA polymerase, dimerisation | 1.1 | 2.5E−07 | 21.0
Interpro | IPR014001:DEAD-like helicase, N-terminal | 2.0 | 2.6E−04 | 3.6
Interpro | IPR014021:Helicase, superfamily 1 and 2, ATP-binding | 2.0 | 3.3E−04 | 3.5
Interpro | IPR001650:DNA/RNA helicase, C-terminal | 2.0 | 3.9E−04 | 3.4
GO mol f | GO:0003899~DNA-directed RNA polymerase activity | 1.5 | 2.2E−05 | 6.3
GO bio p | GO:0001570~vasculogenesis | 0.9 | 7.9E−04 | 7.9

Ovary and testis
KEGG p | hsa00500:Starch and sucrose metabolism | 1.5 | 5.1E−06 | 3.0
KEGG p | hsa00860:Porphyrin and chlorophyll metabolism | 1.0 | 6.4E−06 | 4.1
Biocarta | h_arenrf2Pathway:Oxidative Stress Induced Gene Expression Via Nrf2 | 0.8 | 2.4E−05 | 4.4
KEGG p | hsa00040:Pentose and glucuronate interconversions | 0.8 | 2.9E−05 | 4.9

Table 4 (continued)
Category | Term | % | P value | Fold

Interpro | IPR002213:UDP-glucuronosyl/UDP-glucosyltransferase | 0.6 | 6.0E−05 | 6.0
KEGG p | Hsa00150:Androgen and estrogen metabolism | 1.0 | 8.7E−04 | 2.8
GO mol f | GO:0015020~glucuronosyltransferase activity | 0.6 | 9.9E−04 | 4.2
GO bio p | GO:0046907~intracellular transport | 5.9 | 1.2E−05 | 1.6
GO bio p | GO:0015031~protein transport | 5.5 | 8.1E−05 | 1.5
GO bio p | GO:0000279~M phase | 2.7 | 2.2E−04 | 1.9
GO bio p | GO:0006399~tRNA metabolic process | 1.8 | 1.2E−06 | 2.9
Biocarta | GO:0004177~aminopeptidase activity | 0.8 | 8.5E−05 | 4.2
GO bio p | GO:0006100~tricarboxylic acid cycle intermediate metabolic process | 0.6 | 7.9E−04 | 4.3
Interpro | IPR009072:Histone-fold | 0.8 | 5.8E−04 | 3.4
Biocarta | h_setPathway:Granzyme A mediated Apoptosis Pathway | 0.6 | 5.7E−04 | 4.9

Liver
Interpro | IPR002213:UDP-glucuronosyl/UDP-glucosyltransferase | 1.7 | 5.2E−10 | 15.3
GO mol f | GO:0015020~glucuronosyltransferase activity | 1.8 | 5.5E−09 | 10.6
KEGG p | hsa00040:Pentose and glucuronate interconversions | 1.8 | 7.6E−08 | 7.8
KEGG p | hsa00980:Metabolism of xenobiotics by cytochrome P450 | 2.8 | 3.8E−07 | 4.3
KEGG p | hsa00860:Porphyrin and chlorophyll metabolism | 2.1 | 4.7E−07 | 5.6
KEGG p | hsa00500:Starch and sucrose metabolism | 2.8 | 5.2E−06 | 3.6
Biocarta | h_arenrf2Pathway:Oxidative Stress Induced Gene Expression Via Nrf2 | 1.5 | 6.8E−06 | 6.6
KEGG p | Hsa00150:Androgen and estrogen metabolism | 2.1 | 1.8E−05 | 4.1
GO bio p | GO:0002541~activation of plasma proteins during acute inflammatory response | 1.5 | 2.3E−06 | 8.1
KEGG p | hsa04610:Complement and coagulation cascades | 2.6 | 2.5E−06 | 4.0
GO bio p | GO:0006958~complement activation, classical pathway | 1.4 | 2.7E−06 | 9.5
GO bio p | GO:0002526~acute inflammatory response | 2.0 | 1.1E−05 | 4.9
GO bio p | h_compPathway:Complement Pathway | 1.1 | 1.0E−04 | 8.1
GO bio p | GO:0002253~activation of immune response | 1.7 | 1.2E−04 | 4.6
GO bio p | GO:0019724~B cell mediated immunity | 1.5 | 1.3E−04 | 5.1
GO bio p | GO:0050776~regulation of immune response | 1.8 | 6.9E−04 | 3.4
GO bio p | GO:0050778~positive regulation of immune response | 1.7 | 6.9E−04 | 3.7
GO bio p | GO:0002684~positive regulation of immune system process | 1.7 | 7.6E−04 | 3.7
Interpro | IPR002401:Cytochrome P450, E-class, group I | 1.5 | 8.2E−05 | 5.4
Interpro | IPR001128:Cytochrome P450 | 1.7 | 2.1E−04 | 4.3
KEGG p | hsa04610:Complement and coagulation cascades | 2.6 | 2.5E−06 | 4.0
GO bio p | GO:0008202~steroid metabolic process | 3.4 | 2.5E−06 | 3.3
GO bio p | GO:0016125~sterol metabolic process | 1.8 | 2.2E−04 | 3.9
GO bio p | GO:0008203~cholesterol metabolic process | 1.7 | 3.1E−04 | 4.1
KEGG p | hsa00190:Oxidative phosphorylation | 3.1 | 2.4E−04 | 2.5
GO bio p | GO:0009117~nucleotide metabolic process | 3.5 | 3.4E−05 | 2.7
KEGG p | hsa00190:Oxidative phosphorylation | 3.1 | 2.4E−04 | 2.5
GO bio p | GO:0009142~nucleoside triphosphate biosynthetic process | 1.7 | 2.5E−04 | 4.2
GO bio p | GO:0006754~ATP biosynthetic process | 1.4 | 3.7E−04 | 5.0
GO bio p | GO:0009260~ribonucleotide biosynthetic process | 1.8 | 4.4E−04 | 3.6
GO bio p | GO:0006164~purine nucleotide biosynthetic process | 1.8 | 4.4E−04 | 3.6
GO bio p | GO:0009108~coenzyme biosynthetic process | 2.0 | 7.6E−04 | 3.2
Biocarta | h_compPathway:Complement Pathway | 1.1 | 1.0E−04 | 8.1

child GO term annotation associated with the human genes. Retrieving GO terms from the most significant Trembl database BLASTX matches produced a more stringent annotation, as the protein matches were more similar (and more likely to come from other fish species) and hence this increased the probability of conservation of biological function. However, this could result in a bias towards the most studied proteins since the fish GO annotation is less uniform and extensive compared to that of the human database.

Annotation of the sea bream and sea bass (Figs. 2 and 3) was further achieved via the biological process and molecular function GOA-slim sets (ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/goslim) with the addition of their respective binding and cellular process child terms. Although more sea bass unigenes had GO terms assigned (43%) compared with the sea bream (29%), annotation of both species produced similar GO profiles, indicating that the lower level of sea bream GO term coverage was a good representation of the GO term set used.

3.4. Tissue-specific gene-enrichment analysis

Tissue-specific gene-enrichment analysis was carried out separately for the sea bream and the sea bass (Supplementary files 1 and 2) and the resulting biological annotations were then merged using the tissue-specific human Ensembl gene IDs (Table 4). The results show a coherent biological annotation for most tissues. Gene clusters expressed in specific tissues have functions and are involved in pathways known to be consistent with that given tissue. A particular tissue of interest is the gill since it is not a tissue present in humans. An antigen processing and presentation pathway (Biocarta h_mhcPathway) is highly enriched with the presence of major histocompatibility complex (MHC) glycoproteins. This is in agreement with previous observations and the fact that the gill is rich in lymphoid tissue (Lukacs et al., 2010; Haugarvoll et al., 2008). Another example was the intestine annotation, enriched with a biological process cluster containing GO:0006631 “fatty acid metabolic process” necessary for the digestion and uptake of long fatty acids (Windler and Greten, 1989). Also enriched in the intestine was the protein domain cluster for prostaglandin-related receptors (IPR000265, IPR001481, IPR001244, IPR008365) which mediate stimulation of intestinal epithelial secretion (Smith et al., 1987). This same protein domain was also enriched in the trunk and head kidney where prostanoid receptors are involved in kidney vascular responsiveness and osmolarity regulation (Breyer and Breyer, 2001). Another functional cluster highly enriched in two different tissues (liver and testis/ovary) comprised pathways in which UDP-glucuronosyl/UDP-glucosyltransferase proteins are involved, such as KEGG pentose and glucuronate interconversions (hsa00040), androgen and estrogen metabolism (hsa00150) and metabolism of xenobiotics by cytochrome P450 (hsa00980).
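To illustrate what an enrichment fold and an over-representation P value of the kind listed in Table 4 express, the following minimal Python sketch may help. It assumes that "Fold" is the ratio of a term's frequency in a tissue gene list to its frequency in a background set, and it uses a plain hypergeometric upper-tail test; the enrichment tool used in the study may apply a modified statistic. The function names and the numbers in the usage note are hypothetical.

```python
from math import comb


def fold_enrichment(hits_in_list, list_size, hits_in_background, background_size):
    """Ratio of the term frequency in the gene list to that in the background."""
    return (hits_in_list / list_size) / (hits_in_background / background_size)


def hypergeom_upper_tail(hits_in_list, list_size, hits_in_background, background_size):
    """P(X >= hits_in_list) when list_size genes are drawn without replacement
    from a background holding hits_in_background annotated genes."""
    total = comb(background_size, list_size)
    max_hits = min(list_size, hits_in_background)
    tail = sum(
        comb(hits_in_background, k)
        * comb(background_size - hits_in_background, list_size - k)
        for k in range(hits_in_list, max_hits + 1)
    )
    return tail / total
```

For a hypothetical list of 100 genes with 10 carrying a term that annotates 50 of 1000 background genes, `fold_enrichment(10, 100, 50, 1000)` gives a two-fold enrichment, and `hypergeom_upper_tail` with the same arguments gives the probability of seeing at least 10 such genes by chance.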

Fish are increasingly recognised as important model species for bone disease related studies, hence it is notable that in the bone/cartilage tissue results the GO molecular function cluster “calcium ion binding” was highly enriched. The parent GO term “ion binding” is represented in Figs. 2 and 3. Proteins tagged with this GO term interact selectively and non-covalently with calcium ions (Ca2+), with obvious roles in the chemical dynamics of bone mineralization, e.g., the secreted “Cartilage Acidic protein 1” that contains an EGF-like domain and is detected in human cartilage, bone, cultured chondrocytes and lung (Benz et al., 2002). Another example is “matrix gla 1” which associates with the organic matrix of bone and cartilage, in principle acting as an inhibitor of cartilage calcification (Luo et al., 1997).

3.5. Alternative splice form analysis

Single alignments of the 311 and 193 sea bream and sea bass supracontigs (see Section 2.6), containing 767 and 400 contigs respectively, identified 89 and 54 inserts, or partially different sequences. This indicated alternative splice events in 79 and 46 sea bream and sea bass genes respectively. It was possible to confirm by interspecies mRNA-to-genomic alignments that 25 and 21 alternative splice events were present in sea bream (Supplemental file 3, Fig. 4) and sea bass (Supplemental file 4, Fig. 4) genes, respectively. No alternative selection of promoters was identified in either species (Fig. 4) due to the methodology used rather than because they are not present in the transcriptomes. This latter type of alternative transcript isoform leads to variation of the first exon present in the transcript containing most if not all the 5' UTR region, and these tend not to align well in mRNA-to-genomic alignments between different species (Wheelan et al., 2001) and hence are difficult to detect in comparative analyses.

For the remaining supracontigs it was not possible to confirm the genomic organization and the respective alternative splice event, either because the homologous green spotted pufferfish gene locus was not retrieved, because there was no alignment at the target splice site region, or because there was no alignment at all.

Several alternative transcripts were identified in specific tissues, although the possibility of overlap of expression cannot be ruled out. Sea bass ornithine decarboxylase antizyme isoform a (SeabassC-CL86Contig1) was present in the trunk kidney library while isoform b (SeabassC-CL86Contig2) was present in muscle and brain/pituitary libraries. Similarly, sea bream Prefoldin subunit 3 isoform a (SeabassC-CL86Contig1) was present in the adipose, liver, head kidney, ovary, gill, and trunk kidney libraries and isoform b (SeabassC-CL86Contig2) in the ovary library.

Highly similar gene products within the sea bream and sea bass unigenes, numbering 57 and 33 respectively, were also identified (data not shown) using multiple contig alignment. Future detailed phylogenetic analysis will be needed to confirm these putative paralogous genes.

3.6. SNP detection

All contigs with a minimum depth of four overlapping sequences were selected for SNP mining. For both sea bass and sea bream a single contig containing more than 50 ESTs was present and removed from the data set to minimize discovery of false positive candidate SNPs (Le Dantec et al., 2004). Both contigs (2,006 ESTs and 1026 bp long in sea bass and 111 ESTs and 913 bp long in sea bream) mapped to the human aldolase A, fructose-bisphosphate (ALDOA) transcript variant 2, mRNA. A total of 974 (21.3%) sea bass contigs of 4 to 37 ESTs and 1136 (21.6%) sea bream contigs of 4 to 22 ESTs were mined for SNPs. In sea bass, 570 SNP candidates were discovered in 477,224 bp, representing one SNP candidate every 837 bp. In sea bream, 575 SNP candidates were discovered in 583,397 bp, representing one SNP candidate every 1014 bp. Although more contigs (and more ESTs) were screened for SNPs in the sea bream, the proportion of polymorphic contigs was lower in sea bream (31.3%) than in sea bass (36.4%). This could be due to fish originating from different aquaculture facilities with different degrees of kinship and/or different geographical origins. Fewer transitions (49%) and transversions (25%) were detected in the sea bream than the sea bass (60% of transitions and 30% of transversions). The proportion of detected indels was more than double in sea bream (26%) compared with the sea bass (10%), possibly as a result of the fact that sea bream contigs include fewer protein-coding regions (data not shown). Most of the indels (59%

Fig. 4. Alternative splice events identified in sea bream and sea bass gene products.
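The SNP densities quoted in Section 3.6 follow directly from the reported counts, and the split of candidates into transitions, transversions and indels follows the standard purine/pyrimidine rule. The Python sketch below is only an illustration: the function names are hypothetical, a "-" gap symbol for indels is an assumed convention, and floor division is assumed because it reproduces the reported values.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def bp_per_snp(total_bp, snp_count):
    """Number of screened base pairs per SNP candidate, rounded down."""
    return total_bp // snp_count


def classify_variant(ref, alt):
    """Label a two-allele candidate as indel, transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == "-" or alt == "-":
        return "indel"
    # A transition stays within purines or within pyrimidines
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"
```

With the figures from the text, `bp_per_snp(477224, 570)` corresponds to the sea bass density and `bp_per_snp(583397, 575)` to the sea bream density.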

Table 5. Detection of microsatellites (SSRs) found in sea bream and sea bass ESTs (SSR-ESTs). Number of repeats considered is 9, 6, 5, 4 and 4 for di-, tri-, tetra-, penta- and hexanucleotides, respectively. Data for other fish are from Ju et al. (2005). Motif columns give the number of SSRs with the percentage of SSRs in parentheses.

Species | Unigenes | SSR-ESTs (n) | SSR-ESTs (%) | Di- | Tri- | Tetra- | Penta- | Hexa- | SSRs total (n)
Sea bream | 18,196 | 899 | 4.94 | 474 (47.6) | 325 (32.6) | 115 (11.5) | 66 (6.6) | 17 (1.7) | 997
Sea bass | 17,716 | 795 | 3.97 | 267 (37.6) | 312 (41.5) | 109 (14.5) | 47 (6.3) | 16 (2.1) | 751
Zebrafish | 24,003 | 1,749 | 7.3 | 1,497 (64) | 579 (25) | 202 (8.6) | 55 (2.4) | – | 2,333
Medaka | 8,158 | 209 | 2.6 | 105 (47) | 81 (36) | 27 (12) | 10 (4.5) | – | 223
Killifish | 16,726 | 369 | 2.2 | 237 (52) | 155 (34) | 43 (9.5) | 18 (4.0) | – | 453
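The repeat thresholds given in the Table 5 caption can be turned into a simple search for perfect SSRs. The Python sketch below is a hedged illustration rather than the tool used in the study: it assumes the caption values are minimum repeat counts, considers only uninterrupted repeats of A, C, G and T, and uses hypothetical names (`MIN_REPEATS`, `is_primitive`, `find_ssrs`).

```python
import re

# Assumed minimum repeat counts per motif length (values from the Table 5 caption)
MIN_REPEATS = {2: 9, 3: 6, 4: 5, 5: 4, 6: 4}


def is_primitive(unit):
    """True if the unit is not itself a repetition of a shorter motif."""
    n = len(unit)
    return not any(
        n % d == 0 and unit == unit[:d] * (n // d) for d in range(1, n)
    )


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) further copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            unit = match.group(1)
            # Skip units such as "AA" or "ACAC" that reduce to a shorter motif
            if is_primitive(unit):
                repeats = len(match.group(0)) // unit_len
                hits.append((match.start(), unit, repeats))
    return sorted(hits)
```

Calling `find_ssrs` on each unigene sequence and counting the unigenes with at least one hit would give an EST-SSR frequency comparable in spirit to the percentages in Table 5, although compound or imperfect repeats would need extra handling.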

for sea bass and 72% for sea bream) were only one nucleotide long. In the sea bass, 25% and 8% of the indels were two and three nucleotides long, while in the sea bream, they were 21% and 6%, respectively. The impact of these polymorphisms varies depending on their position in the genome. For instance, SNPs located in the promoter region of a gene may influence gene expression while non-synonymous SNPs may alter gene function (Barnes, 2009). Some polymorphic microsatellites can be detected as indels by SNP discovery tools and this could explain the high number of indels longer than one nucleotide detected in the sea bream since this species contains more microsatellites than sea bass. The proportion of transitions, transversions and indels detected in the sea bass was consistent with the proportion found while sequencing 80 sea bass loci (Souche et al., unpublished). Sequencing of a few sea bream loci would also clarify if the proportion of detected transitions, transversions and indels (and the high number of one nucleotide long indels) is correct.

3.7. Microsatellite detection

A total of 899 and 795 sea bream and sea bass unigenes were identified that harboured 997 and 751 SSRs respectively (Table 5). The frequency of EST-SSRs was 3.97% for the sea bass and 4.94% for the sea bream and is in the range reported for model fish species such as medaka (2.6%), mummichog killifish (2.2%), zebrafish (7.3%) and 11.2% in channel catfish, Ictalurus punctatus (Ju et al., 2005; Serapion et al., 2004). The dinucleotide repeat motifs were the most abundant SSRs in sea bream, accounting for 47.6% of the SSRs, as observed in other species (47%, 52%, and 64% for medaka, mummichog killifish, and zebrafish, respectively), followed by the tri-, tetra-, penta- and hexanucleotide SSRs (Table 5). In contrast, the most abundant repeats in the sea bass were the trinucleotides (Table 5). In the sea bass, the greater abundance of trinucleotides together with the lower EST-SSR frequency might reflect the fact that sea bass unigenes in comparison to sea bream unigenes represented fewer UTR and more coding regions (data not shown) where trinucleotides are more abundant (Tóth et al., 2000). Among the dinucleotide motifs, AC/TG was the most abundant type in both species. This was also observed in the other species except mummichog killifish in which AT/TA was the most abundant motif (Ju et al., 2005). No CG/GC motif was found in the sea bass and sea bream ESTs. Ten trinucleotide SSR motifs were identified in sea bass and sea bream; the AAC and AAG motifs were the most frequent in sea bass and sea bream, respectively, whereas the other 8 trinucleotide motifs (AAT, ACC, ACG, ACT, AGC, AGG, ATC, and CCG) were approximately equally frequent. The length and location of an SSR motif are important factors in determining its usefulness as a marker: the longer the repeats, the higher the probability a marker will be polymorphic. For subsequent marker development for linkage mapping and population genetics studies it is advisable to prioritize screening of EST-SSRs with relatively high repeat number for each type, e.g. dinucleotides with more than 20 repeats (11.7% of dinucleotides), tri- and tetranucleotides with more than 15 repeats (5.7% and 6.5%, respectively), and penta- and hexanucleotides with more than 6 repeats (11.3% and 7.1%, respectively).

4. Conclusions

We reported the characterisation and analysis of approximately 30,000 ESTs from each of the sea bream and the sea bass. These were generated from a range of tissues. The comparative annotation approach described here shows the utility of retrieving useful biological annotation from less-related model species and in particular, from human, as this increases the amount of biological annotation. The increasing democratization of genomics through the use of new, cheaper and higher throughput sequencing technologies (such as Next Gen sequencing) will produce a large increase in the number of transcriptomes from new species. These will lack phylogenetically closely related well-annotated species for assignment of putative functionality to genes and therefore the comparative annotation approach described here is useful and easy to adapt. The primary goal of this project was production of the EST datasets for the aquaculture community, with the aim of these acting as a cornerstone resource for the development of new genomic tools with broodstock and culturing applications. As a result, the sea bass and the sea bream have become part of the small group of genomic resource rich aquaculture species.

Supplementary materials related to this article can be found online at doi:10.1016/j.margen.2010.09.005.

Acknowledgements

The authors acknowledge funding by the European Commission of the European Union through the Network of Excellence Marine Genomics Europe (contract GOCE-CT-2004-505403). BL benefited from the Portuguese National Science Foundation (SFRH/BD/29171/2006) and SABRETRAIN Marie Curie EST fellowships. The authors thank Dr. Juan Fuentes and Carla Viegas (Centre of Marine Sciences) for the tissue collection and RNA extraction, respectively, and Sven Klages (MPI Molecular Genetics) for the bioinformatic support.

References

Adzhubei, A.A., Vlasova, A.V., Hagen-Larsen, H., Ruden, T.A., Laerdahl, J.K., Hoyheim, B., 2007. Annotated Expressed Sequence Tags (ESTs) from pre-smolt atlantic salmon (Salmo salar) in a searchable data resource. BMC Genomics 8, 209.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic local alignment search tool. J. Mol. Biol. 215, 403–410.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J., 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
Arends, R., Mancera, J., Munoz, J., Wendelaar Bonga, S., Flick, G., 1999. The stress response of gilthead sea bream (Sparus aurata L) to air exposure and confinement. J. Endocrinol. 143, 23–31.
Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., Harris, M.A., Hill, D.P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J.C., Richardson, J.E., Ringwald, M., Rubin, G.M., Sherlock, G., 2000. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29.
Barnes, M.R. Genetic variation analysis for biomedical researchers: A primer, in, 2009, pp. 1-20.
Batley, J., Barker, G., O'Sullivan, H., Edwards, K.J., Edwards, D., 2003. Mining for single nucleotide polymorphisms and insertions/deletions in maize expressed sequence tag data. Plant Physiol. 132, 84–91.

Benz, K., Breit, S., Lukoschek, M., Mau, H., Richter, W., 2002. Molecular analysis of expansion, differentiation, and growth factor treatment of human chondrocytes identifies differentiation markers and growth-related genes. Biochem. Biophys. Res. Commun. 293, 284–292.
Bonaldo, M.F., Lennon, G., Soares, M.B., 1996. Normalization and subtraction: two approaches to facilitate gene discovery. Genome Res. 6, 791–806.
Breyer, M.D., Breyer, R.M., 2001. G protein-coupled prostanoid receptors and the kidney. Annu. Rev. Physiol. 63, 579–605.
Calduch-Giner, J., Davey, G., Saera-Vila, A., Houeix, B., Talbot, A., Prunet, P., Cairns, M., Perez-Sanchez, J., 2010. Use of microarray technology to assess the time course of liver stress response after confinement exposure in gilthead sea bream (Sparus aurata L.). BMC Genomics 11, 193.
Canario, A.V.M., Bargelloni, L., Volckaert, F., Houston, R.D., Massault, C., Guiguen, Y., 2008. Genomics toolbox for farmed fish. Rev. Fish. Sci. 16 (S1), 1–13.
Cerdà, J., Mercade, J., Lozano, J., Manchado, M., Tingaud-Sequeira, A., Astola, A., Infante, C., Halm, S., Vinas, J., Castellana, B., Asensio, E., Canabate, P., Martinez-Rodriguez, G., Piferrer, F., Planas, J., Prat, F., Yufera, M., Durany, O., Subirada, F., Rosell, E., Maes, T., 2008. Genomic resources for a commercial flatfish, the Senegalese sole (Solea senegalensis): EST sequencing, oligo microarray design, and development of the Soleamold bioinformatics platform. BMC Genomics 9, 508.
Chevreux, B., Pfisterer, T., Drescher, B., Driesel, A.J., Muller, W.E., Wetter, T., Suhai, S., 2004. Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs. Genome Res. 14, 1147–1159.
Chini, V., Rimoldi, S., Terova, G., Saroglia, M., Rossi, F., Bernardini, G., Gornati, R., 2006. EST-based identification of genes expressed in the liver of adult seabass (Dicentrarchus labrax, L.). Gene 376, 102–106.
Chomczynski, P., Sacchi, N., 1987. Single-step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction. Anal. Biochem. 162, 156–159.
Chou, H.H., Holmes, M.H., 2001. DNA sequence quality trimming and vector removal. Bioinformatics 17, 1093–1104.
Clark, M.S., Edwards, Y.J., Peterson, D., Clifton, S.W., Thompson, A.J., Sasaki, M., Suzuki, Y., Kikuchi, K., Watabe, S., Kawakami, K., Sugano, S., Elgar, G., Johnson, S.L., 2003. Fugu ESTs: new resources for transcription analysis and genome annotation. Genome Res. 13, 2747–2753.
Coblentz, F.E., Towle, D.W., Shafer, T.H., 2006. Expressed sequence tags from normalized cDNA libraries prepared from gill and hypodermal tissues of the blue crab, Callinectes sapidus. Comp. Biochem. Physiol. Part D Genomics Proteomics 1, 200–208.
Douglas, S., Gallant, J., Bullerwell, C., Wolff, C., Munholland, J., Reith, M., 1999. Winter flounder expressed sequence tags: establishment of an EST database and identification of novel fish genes. Mar. Biotechnol. 1, 458–464.
Douglas, S.E., Knickle, L.C., Kimball, J., Reith, M.E., 2007. Comprehensive EST analysis of Atlantic halibut (Hippoglossus hippoglossus), a commercially relevant aquaculture species. BMC Genomics 8, 144.
Edwards, N.J., 2007. Novel peptide identification from tandem mass spectra using ESTs and sequence database compression. Mol. Syst. Biol. 3, 102.
Ewing, B., Green, P., 1998. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194.
Ewing, B., Hillier, L., Wendl, M.C., Green, P., 1998. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185.
Ferraresso, S., Vitulo, N., Mininni, A., Romualdi, C., Cardazzo, B., Negrisolo, E., Reinhardt, R., Canario, A., Patarnello, T., Bargelloni, L., 2008. Development and validation of a gene expression oligo microarray for the gilthead sea bream (Sparus aurata). BMC Genomics 9, 580.
Ferraresso, S., Milan, M., Pellizzari, C., Vitulo, N., Reinhardt, R., Canario, A.V.M., Patarnello, T., Bargelloni, L., 2010. Development of an oligo DNA microarray for the European sea bass and its application to expression profiling of jaw deformity. BMC Genomics 11, 354.
Garg, K., Green, P., Nickerson, D.A., 1999. Identification of candidate coding region single nucleotide polymorphisms in 165 human genes using assembled expressed sequence tags. Genome Res. 9, 1087–1092.
Gong, Z., 1999. Zebrafish expressed sequence tags and their applications. Methods Cell Biol. 60, 213–233.
Gong, Z., Yan, T., Liao, J., Lee, S., He, J., Hew, C., 1997. Rapid identification and isolation of zebrafish cDNA clones. Gene 201, 87–98.
Gonzalez, S., Chatziandreou, N., Nielsen, M., Li, W., Rogers, J., Taylor, R., Santos, Y., Cossins, A., 2007. Cutaneous immune responses in the common carp detected using transcript analysis. Mol. Immunol. 44, 1675–1690.
Govoroun, M., Le Gac, F., Guiguen, Y., 2006. Generation of a large scale repertoire of Expressed Sequence Tags (ESTs) from normalised rainbow trout cDNA libraries. BMC Genomics 7, 196.
Hagen-Larsen, H., Laerdahl, J.K., Panitz, F., Adzhubei, A., Høyheim, B., 2005. An EST-based approach for identifying genes expressed in the intestine and gills of pre-smolt atlantic salmon (Salmo salar). BMC Genomics 6, 171.
Haugarvoll, E., Bjerkas, I., Nowak, B.F., Hordvik, I., Koppang, E.O., 2008. Identification and characterization of a novel intraepithelial lymphoid tissue in the gills of Atlantic salmon. J. Anat. 213, 202–209.
Hirono, I., Aoki, T., 1997. Expressed sequence tags of medaka (Oryzias latipes) liver mRNA. Mol. Mar. Biol. Biotechnol. 6, 345–350.
Koop, B., von Schalburg, K., Leong, J., Walker, N., Lieph, R., Cooper, G., Robb, A., Beetz-Sargent, M., Holt, R., Moore, R., Brahmbhatt, S., Rosner, J., Rexroad, C., McGowan, C., Davidson, W., 2008. A salmonid EST genomic study: genes, duplications, phylogeny and microarrays. BMC Genomics 9, 545.
Kuhl, H., Beck, A., Wozniak, G., Canario, A.V.M., Volckaert, F.A.M., Reinhardt, R., 2010. The European sea bass Dicentrarchus labrax genome puzzle: comparative BAC-mapping and low coverage shotgun sequencing. BMC Genomics 11, 68.
Laiz-Carrion, R., Guerreiro, P.M., Fuentes, J., Canario, A.V.M., Del Rio, M.P.M., Mancera, J.M., 2005. Branchial osmoregulatory response to salinity in the gilthead sea bream, Sparus auratus. J. Exp. Zool. Comp. Exp. Biol. 303A, 563–576.
Le Dantec, L., Chagné, D., Pot, D., Cantin, O., Garnier-Géré, P., Bedon, F., Frigerio, J.-M., Chaumeil, P., Léger, P., Garcia, V., Laigret, F., de Daruvar, A., Plomion, C., 2004. Automated SNP detection in expressed sequence tags: statistical considerations and application to maritime pine sequences. Plant Mol. Biol. 54, 461–470.
Lee, B.Y., Howe, A.E., Conte, M.A., D'Cotta, H., Pepey, E., Baroiller, J.F., di Palma, F., Carleton, K.L., Kocher, T.D., 2010. An EST resource for tilapia based on 17 normalized libraries and assembly of 116,899 sequence tags. BMC Genomics 11, 278.
Lemaire, C., Versini, J.J., Bonhomme, F., 2005. Maintenance of genetic differentiation across a transition zone in the sea: discordance between nuclear and cytoplasmic markers. J. Evolution Biol. 18, 70–80.
Li, P., Peatman, E., Wang, S., Feng, J., He, C., Baoprasertkul, P., Xu, P., Kucuktas, H., Nandi, S., Somridhivej, B., Serapion, J., Simmons, M., Turan, C., Liu, L., Muir, W., Dunham, R., Brady, Y., Grizzle, J., Liu, Z., 2007. Towards the ictalurid catfish transcriptome: generation and analysis of 31,215 catfish ESTs. BMC Genomics 8, 177.
Lukacs, M.F., Harstad, H., Bakke, H.G., Beetz-Sargent, M., McKinnel, L., Lubieniecki, K.P., Koop, B.F., Grimholt, U., 2010. Comprehensive analysis of MHC class I genes from the U-, S-, and Z-lineages in Atlantic salmon. BMC Genomics 11, 154.
Luo, G., Ducy, P., McKee, M.D., Pinero, G.J., Loyer, E., Behringer, R.R., Karsenty, G., 1997. Spontaneous calcification of arteries and cartilage in mice lacking matrix GLA protein. Nature 386, 78–81.
Massault, C., Hellemans, B., Louro, B., Batargias, C., Houdt, J.K.J.V., Canario, A., Volckaert, F.A.M., Bovenhuis, H., Haley, C., Koning, D.J.d., 2010. QTL for body weight, morphometric traits and stress response in European sea bass Dicentrarchus labrax. Anim. Genet. 41, 337–345.
Meiri, I., Gothilf, Y., Zohar, Y., Elizur, A., 2002. Physiological changes in the spawning gilthead seabream, Sparus aurata, succeeding the removal of males. J. Exp. Zool. 292, 555–564.
Morozova, O., Marra, M.A., 2008. Applications of next-generation sequencing technologies in functional genomics. Genomics 92, 255–264.
Paschall, J., Oleksiak, M., VanWye, J., Roach, J., Whitehead, J., Wyckoff, G., Kolell, K., DL, C., 2004. FunnyBase: a systems level functional annotation of Fundulus ESTs for the analysis of gene expression. BMC Genomics 5, 96.
Pawson, M.G., Pickett, G.D., Leballeur, J., Brown, M., Fritsch, M., 2007. Migrations, fishery interactions, and management units of sea bass (Dicentrarchus labrax) in Northwest Europe. ICES J. Mar. Sci. 64, 332–345.
Pertea, G., Huang, X., Liang, F., Antonescu, V., Sultana, R., Karamycheva, S., Lee, Y., White, J., Cheung, F., Parvizi, B., Tsai, J., Quackenbush, J., 2003. TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets. Bioinformatics 19, 651–652.
Piferrer, F., Blazquez, M., Navarro, L., Gonzalez, A., 2005. Genetic, endocrine, and environmental components of sex determination and differentiation in the European sea bass (Dicentrarchus labrax L.). Gen. Comp. Endocrinol. 142, 102–110.
Power, D., Llewellyn, L., Faustino, M., Nowell, M., Bjornsson, B., Einarsdottir, I., Canario, A., Sweeney, G., 2001. Thyroid hormones in growth and development of fish. Comp. Biochem. Physiol. C Toxicol. Pharmacol. 130, 447–459.
Scapigliati, G., Romano, N., Buonocore, F., Picchietti, S., Baldassini, M., Prugnoli, D., Galice, A., Meloni, S., Secombes, C., Mazzini, M., Abelli, L., 2002. The immune system of sea bass, Dicentrarchus labrax, reared in aquaculture. Dev. Comp. Immunol. 26, 151–160.
Senger, F., Priat, C., Hitte, C., Sarropoulou, E., Franch, R., Geisler, R., Bargelloni, L., Power, D., Galibert, F., 2006. The first radiation hybrid map of a perch-like fish: the gilthead seabream (Sparus aurata L). Genomics 87, 793–800.
Serapion, J., Kucuktas, H., Feng, J., Liu, Z., 2004. Bioinformatic mining of type I microsatellites from expressed sequence tags of channel catfish (Ictalurus punctatus). Mar. Biotechnol. 6, 364–377.
Smit, A.F.A., Hubley, R., Green, P. RepeatMasker Open-3.0, 1996-2010 http://www.repeatmasker.org.
Smith, G., Warhurst, G., Lees, M., Turnberg, L., 1987. Evidence that PGE2 stimulates intestinal epithelial cell adenylate cyclase by a receptor-mediated mechanism. Dig. Dis. Sci. 32, 71–75.
Souche, E.L., Hellemans, B., Houdt, J.K.J.V., Canario, A., Klages, S., Reinhardt, R., Volckaert, F.A.M., 2007. Mining for single nucleotide polymorphisms in expressed sequence tags of European sea bass. J. Integr. Bioinform. 4, 73.
Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., Higgins, D.G., 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25, 4876–4882.
Tóth, G., Gáspári, Z., Jurka, J., 2000. Microsatellites in different eukaryotic genomes: survey and analysis. Genome Res. 10, 967–981.
Vogiatzi, E., Lagnel, J., Pakaki, V., Louro, B., Canario, A.V.M., Reinhardt, R., Kotoulas, G., Magoulas, A., Tsigenopoulos, C.S. In silico mining and characterization of simple sequence repeats from gilthead sea bream (Sparus aurata) expressed sequence tags
Ju, Z., Wells, M.C., Martinez, A., Hazlewood, L., Walter, R.B., 2005.
An in silico mining for (EST-SSRs); PCR amplification, polymorphism evaluation and multiplexing and simple sequence repeats from expressed sequence tags of zebrafish, medaka, cross-species assays, Marine Genomics, (submitted). Fundulus, and Xiphophorus. In Silico Biol. 5, 439–463. Wang, S., Peatman, E., Abernathy, J., Waldbieser, G., Lindquist, E., Richardson, P., Lucas, Kent, W.J., 2002. BLAT- the BLAST-like alignment tool. Genome Res. 12, 656–664. S., Wang, M., Li, P., Thimmapuram, J., Liu, L., Vullaganti, D., Kucuktas, H., Murdock, B. Louro et al. / Marine Genomics 3 (2010) 179–191 191

C., Small, B.C., Wilson, M., Liu, H., Jiang, Y., Lee, Y., Chen, F., Lu, J., Wang, W., Xu, P., Zhi-Liang, H., Bao, J., Reecy, J.M., 2008. CateGOrizer: A web-based program to batch Somridhivej, B., Baoprasertkul, P., Quilang, J., Sha, Z., Bao, B., Wang, Y., Wang, Q., gene ontology classification categories, in. Online J. Bioinformatics 5. Takano, T., Nandi, S., Liu, S., Wong, L., Kaltenboeck, L., Quiniou, S., Bengten, E., Miller, Zhu, Y.Y., Machleder, E.M., Chenchik, A., Li, R., Siebert, P.D., 2001. N., Trant, J., Rokhsar, D., Liu, Z., 2010. Assembly of 500,000 inter-specific catfish template switching: a SMART approach for full-length cDNA library construction. expressed sequence tags and large scale gene-associated marker development for Biotechniques 30, 892–897. whole genome association studies. Genome Biol. 11, R8. Zhulidov, P.A., Bogdanova, E.A., Shcheglov, A.S., Vagner, L.L., Khaspekov, G.L., Wheelan, S.J., Church, D.M., Ostell, J.M., 2001. Spidey: a tool for mRNA-to-genomic Kozhemyako, V.B., Matz, M.V., Meleshkevitch, E., Moroz, L.L., Lukyanov, S.A., alignments. Genome Res. 11, 1952–1957. Shagin, D.A., 2004. Simple cDNA normalization using kamchatka crab duplex- Windler, E., Greten, H., 1989. Intestinal lipid and lipoprotein metabolism, in. W. specific nuclease. Nucleic Acids Res. 32, e37. Zuckschwerdt Verlag, München.