A Comprehensive Suite for Functional Analysis in Plant Genomics

Hindawi Publishing Corporation International Journal of Plant Genomics Volume 2008, Article ID 619832, 12 pages doi:10.1155/2008/619832

Resource Review Blast2GO: A Comprehensive Suite for Functional Analysis in Plant Genomics

Ana Conesa and Stefan Gotz¨

Bioinformatics Department, Centro de Investigacion´ Pr´ıncipe Felipe, 4012 Valencia, Spain

Correspondence should be addressed to Ana Conesa, [email protected]

Received 5 October 2007; Accepted 26 November 2007

Recommended by Chunguang Du

Functional annotation of novel sequence data is a primary requirement for the utilization of functional genomics approaches in plant research. In this paper, we describe the Blast2GO suite as a comprehensive bioinformatics tool for functional annotation of sequences and data mining on the resulting annotations, primarily based on the gene ontology (GO) vocabulary. Blast2GO optimizes function transfer from homologous sequences through an elaborate algorithm that considers similarity, the extension of the homology, the database of choice, the GO hierarchy, and the quality of the original annotations. The tool includes numerous functions for the visualization, management, and statistical analysis of annotation results, including gene set enrichment analysis. The application supports InterPro, enzyme codes, KEGG pathways, GO direct acyclic graphs (DAGs), and GOSlim. Blast2GO is a suitable tool for plant genomics research because of its versatility, easy installation, and friendly use.

Copyright © 2008 A. Conesa and S. Gotz.¨ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION The use of controlled vocabularies greatly facilitates the ex- change of biological knowledge and the benefit from com- Functional genomics research has expanded enormously in putational resources that manage this knowledge. The gene the last decade and particularly the plant biology research ontology (GO, http://www.geneontology.org)[3]isproba- community has extensively included functional genomics bly the most extensive scheme today for the description of approaches in their recent research proposals. The num- gene product functions but also other systems such as en- ber of Affymetrix plant GeneChips, for example, has dou- zyme codes [4], KEGG pathways [5], FunCat [6], or COG bled in the last two years [1] and extensive international ge- [7] are widely used within molecular databases. Many bioin- nomics consortia exist for major crops (see last PAG Con- formatics tools and methods have been developed to assist ference reports for an updated impression on current plant in the assignment of functional terms to gene products (re- genomics, http://www.intl-pag.org). Not less importantly, viewed in [8]). Fewer resources, however, are available when many middle-sized research groups are also setting up plant it comes to the large-scale functional annotation of novel se- EST projects and producing custom microarray platforms quence data of nonmodel species, as would be specifically [2]. This massive generation of plant sequence data and rapid required in many plant functional genomics projects. Web- spread of functional genomics technologies among plant re- based tools for the functional annotation of new sequences search labs has created a strong demand for bioinformat- include AutoFact [9], GOanna/AgBase [10], GOAnno [11], ics resources adapted to vegetative species. Functional an- Goblet [12], GoFigure + GoDel [13], GoPET [14], Gotcha notation of novel plant DNA sequences is probably one of [15], HT-GO-FAT (liru.ars.usda.gov/ht-go-fat.htm), Inter- the top requirements in plant functional genomics as this ProScan [16], JAFA [17], OntoBlast [18], and PFP [19]. Ad- holds, to a great extent, the key to the biological inter- ditionally, functional annotation capabilities are usually in- pretation of experimental results. Controlled vocabularies corporated in EST analysis pipelines. A few relevant exam- have imposed along the way as the strategy of choice for ples are ESTExplorer, ESTIMA, ESTree. or JUICE (see [2]for the effective annotation of the function of gene products. a survey in EST analysis). These resources are valuable tools 2 International Journal of Plant Genomics for the assignment of functional terms to uncharacterized se- in functional genomics projects where thousands of frag- quences but usually lack high-throughput and data mining ments need to be characterized. In principle, B2G accepts capabilities, in the first case, or provide automatic solutions any amount of records within the memory resources of the without much user interactivity, in the second. In this pa- user’s work station. Typical data files of 20 to 30 thousand per, we describe the Blast2GO (B2G, www.blast2go.org)ap- sequences can be easily annotated on a 2 Giga RAM PC plication for the functional annotation, management, and (larger projects may use the graphical interface free version data mining of novel sequence data through the use of com- of Blast2GO). During the annotation process, intermediate mon controlled vocabulary schemas. The philosophy be- results can be accessed and modified by the user if desired. hind B2G development was the creation of an extensive, Flexible annotation. Functional annotation in Blast2GO is user-friendly, and research-oriented framework for large- based on homology transfer. Within this framework, the ac- scale function assignments. The main application domain tual annotation procedure is configurable and permits the of the tool is the functional genomics of nonmodel organ- design of different annotation strategies. Blast2GO annota- isms and it is primarily intended to support research in ex- tion parameters include the choice of search database, the perimental labs where bioinformatics support may not be strength and number of blast results, the extension of the strong. Since its release in September 2005 [20], more than query-hit match, the quality of the transferred annotations, 100 labs worldwide have become B2G users and the appli- and the inclusion of motif annotation. Vocabularies sup- cation has been referenced in over thirty peer-reviewed pub- ported by B2G are gene ontology terms, enzyme codes (EC), lications (www.blast2go.org/citations). Although B2G has a InterPro IDs, and KEGG pathways. broad species application scope, the project originated in a crop genomics research environment and there is quite some Data mining on annotation results. Blast2GO is not a mere accumulated experience in the use of B2G in plants, which generator of functional annotations. The application in- includes maize, tobacco, citrus, Soybean, grape, or tomato. cludes a wide range of statistical and graphical functions for Projects range from functional assignments of ESTs [21–24] the evaluation of the annotation procedure and the final re- to GO term annotation of custom or commercial plant mi- sults. Especially, (relative) abundance of functional terms can croarrays [25, 26], functional profiling studies [27–29], and be easily assessed and visualized. functional characterization of specific plant gene families The first release of B2G covered basic application func- [30, 31]. tionalities: high-throughput blast against NCBI or local In the following sections we will explain more extensively databases, mapping, annotation, and gene set enrichment the concepts behind Blast2GO. We will describe in detail analysis; scalar vector graphics (SVG) combined graphs and main functionalities of the application and show a use case basic distributions charts. Enhanced modules for massive that illustrates the applicability of B2G to plant functional blast, modification of annotation intensity, curation, addi- genomics research. tional vocabularies, high-performing customizable graphs and pathway charts, data mining and sequence handling, as 2. BLAST2GO HIGHLIGHTS well as a wide array of input and output formats have been incorporated into the Blast2GO suite. Four main driving concepts form the foundation of the Blast2GO software: biology orientation, high-throughput, 3. THE BLAST2GO APPLICATION annotation flexibility,and data-mining capability. Biology orientation. The target users of Blast2GO are biol- Figure 1 shows the basic components of the Blast2GO suite. ogy researchers working on functional genomics projects in Functional assignments proceed through an elaborate anno- labs where strong bioinformatics support is not necessar- tation procedure that comprises a central strategy plus re- ily present. Therefore, the application has been conceived to finement functions. Next, visualization and data mining en- be easy to install, to have minimal setup and maintenance gines permit exploiting the annotation results to gain func- requirements, and to offer an intuitive user interface. B2G tional knowledge. has been implemented as a multiplatform Java desktop application made accessible by Java Webstart technology. This 3.1. The annotation procedure solution employs the higher versatility of a locally running application while assuring automatic updates provided that The Blast2GO annotation procedure consists of three main an internet connection is available. This implementation has steps: blast to find homologous sequences, mapping to col- proven to work very efficiently in the fast transfer to users of lect GO terms associated to blast hits, and annotation to as- new functionalities and for bug fixes. Furthermore, access to sign trustworthy information to query sequences. Once GO data in B2G is reinforced by graphical parameters that on one terms have been gathered, additional functionalities enable hand allow the easy identification and selection of sequences processing and modification of annotation results. at various stages of the annotation process and, on the other Blast step. The first step in B2G is to find sequences similar hand, permit the joint visualization of annotation results and to a query set by blast [32]. B2G accepts nucleotide and pro- highlighting of most relevant features. tein sequences in FASTA format and supports the four ba- High-throughput while interactive. Blast2GO strives to be the sic blast programs (blastx, blastp, blastn, and tblastx). Ho- application of choice for the annotation of novel sequences mology searches can be launched against public databases A. Conesa and S. Gotz¨ 3

NCBI Local Annex WS Blast GOSlim EBI

Manual IPS Mapping

Annotation GO InterPro EC KEGG

Stats charts Sequence color Pathways Enrichment High- GO DAG lighting

Filtering

Figure 1: Schematic representation of Blast2GO application. GO annotations are generated through a 3-step process: blast, mapping, annotation. InterPro terms are obtained from InterProScan at EBI, converted and merged to GOs. GO annotation can be modulated from Annex, GOSlim web services and manual editing. EC and KEGG annotations are generated from GO. Visual tools include sequence color code, KEGG pathways, and GO graphs with node highlighting and ﬁltering options. Additional annotation data-mining tools include statistical charts and gene set enrichment analysis functions.

Mapping step. Mapping is the process of retrieving GO terms = × DT max similarity ECweight associated to the hits obtained after a blast search. B2G AT = (#GO − 1) × GO ff weight performs three di erent mappings as follows. (1) Blast re- AR : lowest.node AS(DT + AT) ≥ threshold sult accessions are used to retrieve gene names (symbols) making use of two mapping files provided by NCBI (gene- Figure 2: Blast2GO annotation rule. info, gene2accession). Identified gene names are searched in the species-specific entries of the gene product table of the GO database. (2) Blast result GI identifiers are used to retrieve UniProt IDs making use of a mapping file from PIR (Non-redundant Reference Protein database) including PSD, such as (the) NCBI nr using a query-friendly version of blast UniProt, Swiss-Prot, TrEMBL, RefSeq, GenPept, and PDB. (QBlast). This is the default option and in this case, no ad- (3) Blast result accessions are searched directly in the DBXRef ditional installations are needed. Alternatively, blast can be Table of the GO database. run locally against a proprietary FASTA-formatted database, which requires a working www-blast installation. The Make Annotation step. This is the process of assigning functional Filtered Blast-GO-BD function in the Tools menu allows terms to query sequences from the pool of GO terms gath- the creation of customized databases containing only GO- ered in the mapping step. Function assignment is based on annotated entries, which can be used in combination with the gene ontology vocabulary. Mapping from GO terms to the local blast option. Other configurable parameters at the enzyme codes permits the subsequent recovery of enzyme blast step are the expectation value (e-value) threshold, the codes and KEGG pathway annotations. The B2G annota- number of retrieved hits, and the minimal alignment length tion algorithm takes into consideration the similarity be- (hsp length) which permits the exclusion of hits with short, tween query and hit sequences, the quality of the source of low e-value matches from the sources of functional terms. GO assignments, and the structure of the GO DAG. For each Annotation, however, will ultimately be based on sequence query sequence and each candidate GO term, an annotation similarity levels as similarity percentages are independent score (AS) is computed (see Figure 2).TheASiscomposed of database size and more intuitive than e-values. Blast2GO of two terms. The first, direct term (DT), represents the high- parses blast results and presents the information for each se- est similarity value among the hit sequences bearing this GO quence in table format. Query sequence descriptions are ob- term, weighted by a factor corresponding to its evidence code tained by applying a language processing algorithm to hit de- (EC). A GO term EC is present for every annotation in the scriptions, which extracts informative names and avoids low- GO database to indicate the procedure of functional assign- content terms such as “hypothetical protein” or “expressed ment. ECs vary from experimental evidence, such as inferred protein”. by direct assay (IDA) to unsupervised assignments such as 4 International Journal of Plant Genomics inferred by electronic annotation (IEA). The second term database is a collection of manually curated univocal rela- (AT) of the annotation rule introduces the possibility of ab- tionships between GO terms from the different GO cate- straction into the annotation algorithm. Abstraction is de- gories that permits the inference of biological process and fined as the annotation to a parent node when several child cellular component terms from molecular function annota- nodes are present in the GO candidate pool. This term mul- tions. Up to 15% of annotation increase and around 30% tiplies the number of total GOs unified at the node by a user- of GO term confirmations are obtained through the Annex defined factor or GO weight (GOw) that controls the possi- dataset [20]. Secondly, annotation results can be summarized bility and strength of abstraction. When all ECw’s are set to through GOSlim mapping. GOSlim consists of a subset of 1 (no EC control) and the GOw is set to 0 (no abstraction is the gene ontology vocabulary encompassing key ontological possible), the annotation score of a given GO term equals the terms and a mapping function between the full GO and the highest similarity value among the blast hits annotated with GOSlim. Different GOSlim mappings are available, adapted that term. If the ECw is smaller than one, the DT decreases to specific biological domains. At present, GOSlim mappings and higher query-hit similarities are required to surpass the for plant, yeast, from GOA and Tair, as well as a generic one annotation threshold. If the GOw is not equal to zero, the AT are available from the GO through Blast2GO. Thirdly, the becomes contributing and the annotation of a parent node is manual curation function means that the user has the possi- possible if multiple child nodes coexist that do not reach the bility of editing annotation results and manually modifying annotation cutoff. Default values of B2G annotation param- GO terms and sequence descriptors. eters were chosen to optimize the ratio between annotation coverage and annotation accuracy [20]. Finally, the AR se- 3.3. Visualization and data mining lects the lowest terms per branch that exceed a user-defined threshold. One aspect of the uniqueness of the Blast2GO software is the The annotation step in B2G can be further adjusted by availability of a wide array of functions to monitor, evaluate, setting additional filters to the hit sequences considered as and visualize the annotation process and results. The pur- annotation source. A lower limit can be set at the e-value pose of these functions is to help understand how functional parameter to ensure a minimum confidence at the level of annotation proceeds and to optimize performance. homology. Similarly, % “hit” filter has been implemented to Statistical charts. Summary statistics charts are generated af- assure that a given percentage of the hit sequence is actu- ter each of the annotation steps. Distribution plots for e- ally spanned by the query. This parameter is of importance value and similarity within blast results give an idea of the de- to prevent potential function transfer from nonmatching se- gree of homology that query sequences have in the searched quence regions of modular proteins. Additionally, the mini- database. Once mapping has been completed, the user can mal hsp length required at the blast step permits control of check the distribution of evidence codes in the recovered the length of the matching region. GO terms and the original database sources of annotations. These charts give an indication of suitable values for B2G an- 3.2. Modulation of annotation notation parameters. For example, when a good overall level of sequence similarity is obtained for the dataset, the default Blast2GO includes different functionalities to complete and annotation cutoff value could be raised to improve annota- modify the annotations obtained through the above-defined tion accuracy. Similarly, if evidence code charts indicate a procedure. low representation of experimentally derived GOs, the user Additional vocabularies. Enzyme codes and KEGG path- might choose to increase the weight given to electronic an- way annotations are generated from the direct mapping of notations. After the final annotation step, new charts show GO terms to their enzyme code equivalents. Additionally, the distribution of annotated sequences, the number of GOs Blast2GO offers InterPro searches directly from the B2G in- per sequence, the number of sequences per GO, and the dis- terface. The user, identified by his/her email address, has the tribution of annotations per GO level, which jointly provide possibility of selecting different databases available at the In- a general overview of the performance of the annotation pro- terProEBI web server [33]. B2G launches sequence queries cedure. in batch, and recovers, parses, and uploads InterPro results. Sequence coloring. The visual approach of B2G is further rep- Furthermore, InterPro IDs can be mapped to GO terms and resented by the color code given to annotated sequences. merged with blast-derived GO annotations to provide one During the annotation process, the background color of ac- integrated annotation result. In this process, B2G ensures tive sequences changes according to their analysis status. that only the lowest term per branch remains in the final Nonblasted sequences are displayed in white and change to annotation set, removing possible parent-child relationships light red once a positive blast result is obtained. If the result originating from the merging action. was negative, they will stay dark red. Mapped sequences are Annotation fine-tuning. Blast2GO incorporates three addi- depicted in green while annotated sequences become blue. tional functionalities for the refinement of annotation re- Finally, manually curated sequences can be labeled and col- sults. Firstly, the Annex function allows annotation aug- ored purple (see Figure 3(A)). Sequence coloring is a simple mentation through the Second Layer concept developed and effective way of identifying sequences that have reached by The Norwegian University of Science and Technol- differential stages during the annotation process. Further- ogy (http://www.goat.no,[34]). Basically, the Second Layer more, sequences can be selected by their color. This is a very A. Conesa and S. Gotz¨ 5

Figure 3: Blast2GO user interface. (A) Main sequence table showing sequence color codes. (B) Graphical tab showing a combined graph with score highlighting.

Similarity distribution includes several parameters to make these combined graphs 9 easy to analyze and navigate. Firstly, the ZWF format [35], a 8 powerful scalable vector graphics engine, has been adopted 7 6 to make zooming and browsing through the DAG fast and 5 light. Secondly, annotation-rich areas of the generated DAG

Hits 4 can be readily spotted by a node-coloring function. B2G col- 3 2 ors nodes either by the number of sequences gathered at that 1 term (additive function) or by a node information score (ex- 0 · dist 0 102030405060708090100 ponential function, GOsseq α ) that considers the places Similarity (%) of direct annotation. This B2G score takes into account the amount of sequences collected at a given term but penalizes Figure 4: Similarity distribution of Soybean GeneChip. Similarity by the distance to the node of actual annotation [20]. The is computed of each query-hot pair as the sum of similarity values B2G score has shown to be a useful parameter for the identifi- for all matching hsps. cation of “hot” terms within a specific DAG (see Figure 3(B); Conesa, unpublished). Thirdly, the extension and density of the plotted DAG can be modulated by a node filter function. When the number of sequences involved in the combined useful function for the interactive use of the application. For graph is large, the resulting DAG can be too big to be prac- example, sequences that stayed dark red after blast (no posi- tical. B2G permits filtering out of low informative terms by tiveresult)canbeselectedtobelaunchedtoInterProScan. imposing a threshold on the number of annotated sequences Sequences that remained green (mapping code) after the or B2G scores for a node to be displayed. In this case, the annotation step can be selected and reannotated with more number of omitted nodes is given for each branch, which is permissive parameters. an indication of the level of local compression applied to that branch. Combined graph. A core functionality of Blast2GO is the joint visualization of groups of GO terms within the structure Enrichment analysis. A typical data mining approach ap- of the GO DAG. The combined graph function is typically plied in functional genomics research is the identification of used to study the collective biological meaning of a set of functional classes that statistically differ between two lists of sequences. Combined graphs are a good alternative to en- terms. For example, one might want to know the functional richment analysis (see below) where no reference set is to be categories that are over- or underrepresented in the set of considered or the number of involved sequences is low. B2G differentially expressed genes of a microarray experiment, or 6 International Journal of Plant Genomics

Blast hits 0 102030405060708090100

Arabidopsis thaliana Oryza sativa Medicago truncatula Unknown Glycine max Nicotiana tabacum Mus musculus Solanum tuberosum Zea mays Homo sapiens Ostreococcus tauri Danio rerio Pan troglodytes Pisum sativum Lycopersicon esculentum

Species Canis familiaris Macaca mulatta Bos taurus Rattus norvegicus Triticum aestivum Synechococcus sp. Strongylocentrotus purpuratus Hordeum vulgare Dictyostelium discoideum Xenopus laevis Gossypium hirsutum Gallus gallus Anopheles gambiae Drosophila melanogaster Others

Figure 5: Species distribution chart of Soybean GeneChip after blastx to NCBI nr.

Sequences GO level distribution 0 1020304050607080 12

IEA 10 RCA ND IDA 7.5 IMP ISS TAS 5 IEP NAS EC code IPI 2.5 IGI IC Number of annotations 0 NR 0 2 4 6 8101214161820 = = Figure 6: Evidence code distribution chart of Soybean GeneChip GO Level (total annotations 67865, mean level 4.84, = after mapping to B2G database. standard deviation 1.49) Biological process Molecular funcion Data distribution Cellular component

35 Figure 8: GO level distribution chart for Soybean Aﬀymetrix 30 GeneChip. Most sequences have between 3 and 6 GO terms anno- 25 tated. 20 15 10 with nodes colored by their signiﬁcance value. Also in this

Number of sequences 5 case, graph pruning and summarizing functions are avail- 0 able. No No No No Annotation Total blast result blast hit mapping annotation 3.4. Other functionalities Figure 7: Annotation process results for Soybean Affymetrix GeneChip. Next to the annotation and data mining functions, Blast2GO comprises a number of additional functionalities to handle data. In this section, we briefly comment on some of them. it could be of interest to find which functions are distinctly Import and export. B2G provides different formats for the ex- represented between different libraries of an EST collection. change of data. Typically, B2G inputs are FASTA-formatted Blast2GO has integrated the Gossip [36] package for statisti- sequences and returns a tab-delimited file with GO anno- cal assessment of differences in GO term abundance between tations. Other supported output formats are GOstats and two sets of sequences. This package employs the Fisher’s ex- GOSpring. Furthermore, B2G also accepts blast results in act test and corrects for multiple testing. For this analysis, the xml format. This option permits skipping the first step of involved sequences with their annotations must be loaded in the B2G annotation procedure when a blast result is al- the application. B2G returns the GO terms under- or over- ready present. Similarly, when accession IDs or gene sym- represented at a specified significance value. Results are given bols are known for the query sequences, these can be directly as a plain table and graphically as a bar chart and as a DAG uploaded in B2G and the application will query the B2G A. Conesa and S. Gotz¨ 7

Merge Interpro annotation results sive in the annotation coverage. For a great deal of sequences with a positive blast result, functional information is avail- 70 able in the GO database and the final annotation success is 60 50 related to the length and quality of the query sequence and 40 the strictness of annotation parameters. Typically and using 30 default parameters, around 50–60% of annotation success is 20 common for EST datasets and slightly higher values are ob- 10 tained for full-length proteins (Table 1). Number of annotations 0 On average, between 3 and 6 GO terms are assigned per Before After Confirmed Too general sequenceatameanGOlevelverycloseto5.InterPro,An- Figure 9: InterPro merging statistics. Red column: total number of nex, and GOw annotation parameters significantly increase GO annotations before adding InterPro-based GO terms. Blue: total annotation intensity—around 15%—and validate annota- number of GO annotations after adding InterPro-based GO terms. tion results. Furthermore, default annotation options tend to Green: number of blast-based GO terms confirmed after InterPro provide coherent results and resemble the functional assign- merging. Yellow: too general terms removed after InterPro merging. ment obtained by a human computational reviewed analysis [37].

3.6. Use case database for their annotations. Moreover, the Main Sequence table (see Figure 3) can be saved to a file at any moment In this section, we present a typical use case of Blast2GO to store intermediate results. Finally, graph and enrichment to illustrate the major application features described in the analysis results are presented both graphically and as text previous sections. We will address the functional annota- ff files. tion of the Soybean A ymetrix GeneChip. The GeneChip Soybean Genome Array targets over 37,500 Soybean tran- Validation. The “true path rule” defined by the gene ontology scripts (www.affymetrix.com). The array also contains tran- consortium for the GO DAG assures that all the terms in the scripts for studying two pathogens important for Soybean pathway from a term up to the root must always be true for research. Sequence data and a detailed annotation sheet for a given gene product. The B2G annotation validation func- the Soybean Genome Array are provided at the Blast2GO site tion applies this property to annotation results by removing (http://blast2go.bioinfo.cipf.es/b2gdata/soybean). any parent term that has a child within the sequence annotation set. B2G always executes validation after any modification has been made to the existing annotation, for example, Blast after InterPro merging, Annex augmentation, or manual curation. Sequence data in FASTA format were uploaded into the application from the menu File → Open File. After selecting ComparisonoftwosetsofGOterms. Given two annotation re- the Blast menu, a dialog opens where we can indicate the sults, Blast2GO can compare their implicit DAG structures. parameters for the blast step. In our case, the easiest op- B2G computes the number of identical nodes, more general tion is to select the nr protein database and perform blast re- and more specific terms within the same branch, and terms motely on the NCBI server through Qblast. Additional blast located to different branches or different GO main categories. parametersarekeptatdefaultvalues:e-value threshold of Comparison is directional; this means that the active anno- 1e-3 and a recovery of 20 hits per sequence. These permis- tation file is contrasted to a reference or external one. Each sive values are chosen to retrieve a large amount of infor- GO term is compared to all terms in the reference set and the mation at this first time-consuming step. Annotation strin- best matching comparison result is recorded. Once a term is gency will be decided later in the annotation procedure. Fur- matched, it is removed from the query set. thermore, we set the hsp filter to 33 to avoid hits where the length of the matching region is smaller than 100 nu- cleotides. After launching, blast sequences turn red as results 3.5. Some performance figures arrive, up to a total of 22,788. Once blast is completed, we can visualize different charts (similarity, e-value, and species The annotation accuracy of Blast2GO has been evaluated by distributions, see supplementary material available online at comparing B2G GO annotation results to the existing an- doi:10.1155/2008/619832) to get an impression of the qual- notation in a set of manually annotated Arabidopsis pro- ity of the query sequences and the blast procedure. For ex- teins that had been previously removed from the nr database. ample, Statistics → Blast statistics → Similarity distribution This evaluation indicated that using B2G default parame- chart (see Figure 4) shows that most sequences have blast ters, nearly 70% of identical branch recovery was achievable, similarity values of 50–60% or higher. This information is which is at the top end of the methods that are based on ho- useful for choosing the annotation cutoff parameter at the mology search [20]. More recent evaluations have shown that annotation step, and suggests that taking a value of 60 would Blast2GO annotation behavior is consistent across species be adequate. Furthermore, the Species distribution chart (see and datasets. In general, the blast step has shown to be deci- Figure 5) shows a great majority of Arabidopsis sequences 8 International Journal of Plant Genomics

Table 1: Blast2GO performance ﬁgures of seven cDNA datasets. FE: percentage of sequences with some functional evidence (Mapping or InterProScan positive). BA: percentage of blast-based annotated sequences, #GO: number of GOs per sequence. GO L: mean GO level. IP: percentage of annotation increase by InterProScan. Ann: percentage of annotation increase by Annex. TA: total percentage of annotated sequences (including blast and InterPro). Datasets are described in [37].

DataSet FE BA no. GOs GO L IP Ann TA C. clementina 70.2 58.2 4.4 5.10 7.9 11.8 62.3 M. incog 70.7 55.7 5.6 4.95 11.8 9.9 63.9 T. harzianum 61.1 47.7 3.6 5.27 14.4 16.2 53.4 G. max 61.8 51.1 4.3 5.11 6.1 11.8 53.5 P. ﬂesus 50.1 34.4 5.2 5.07 21.9 10.6 45.1 A. phagocytophilum 56.6 42.5 3.0 4.91 35.4 20.9 49.1 Whale metagenome 69.5 50.7 3.0 4.45 17.6 18 58.8

Molecular function Score: 6379 is a is a is a is a is a is a is a

is a Molecular Binding Structural molecule Enzyme regulator Motor activity Catalytic activity transducer activity activity activity Transporter activity Score: 3559 Score: 4538 Score: 107.2 Score: 54 Score: 837 is a Score: 375 Score: 184

Transcription Translation is a is a regulator activity regulator activity is a is a is a is a is a is a is a is a Score: 418 Score: 46.19

Hydrolase activity Transferase activity Nucleotide binding Chromatin binding Nucleic acid binding Lipid binding Oxygen binding Carbohydrate binding Protin binding Signal transducer activity Score: 1560 Score: 1196 Score: 1378 Score: 24 Score: 1046 Score: 98 Score: 11 Score: 74 Score: 1178 Score: 178.8 is a is a is a is a is a is a is a is a is a Transferase activity, Hydrolase activity, Translation factor RNA binding Receptor activity transferring phosphorus- activity, nucleic acid binding DNA binding Receptor binding acting on ester bonds Score: 328 Score: 138 containing groups Score: 77 Score: 762 Score: 61 Score: 68.39 Score: 447

is a is a is a Nuclease activity Kinase activity Transcription factor Score: 114 Score: 745 activity Score: 340

Molecular function-combined graph

Figure 10: Molecular function combined graph of GOSlim annotation of the Soybean Aﬀymetrix GeneChip. Nodes are colored by score value. within the blast hits, followed by Cotton, Medicago, Glycine, source annotation will be maintained at the annotation and Nicotiana. step.

Annotation Mapping Taking into consideration the charts generated by the previ- Mapping is a nonconfigurable option launched from the ous steps, we have chosen an annotation configuration with menu Mapping → Make Mapping. GO terms could be an e-value filter of 1e-6, default gradual EC weights, a GO found for 21,079 sequences (56%). Mapping charts (menu weight of 15, and an annotation cutoff of 60. This implies Statistics → Mapping Statistics) permit the evaluation of that only sequences with a blast e-value lower than 1e-6 will mapping results. The evidence code distribution chart (see be considered in the annotation formula, that the query- Figure 6) shows an overrepresentation of electronic annota- hit similarity value adjusted by the EC weight of the GO tions, although other nonautomatic codes, such as review term should be at least 60, and that abstraction is strongly by computational analysis (RCA), inferred by mutant phe- promoted. This annotation configuration resulted in 17,778 notype (IMP), or inferred by direct assay (IDA) are also successfully GO annotated sequences with a total of 70,035 well represented. This suggests that an annotation strategy GO terms at a mean GO level (distance of the GO term to that promotes nonelectronic ECs would be meaningful as the ontology root term) of 4.72. Furthermore, 6,345 enzyme it would benefit from the high-quality GO terms without codes were mapped to a total of 5,390 sequences. Once an- totally excluding electronic annotations. Therefore, the de- notation has been completed, we can visualize the results fault EC weights (menu Annotation → Set Evidence Code at each step of the annotation process (see Figure 7). Re- Weights) that adjust proportionally to the reliability of the annotation is possible by selecting green or red sequences A. Conesa and S. Gotz¨ 9

Diﬀerential GO term distribution Sequences (%) 0 10 20 30 40 50 60 70 80 90 100 110 120 130

Organelle membrane Transport ATPase activity,coupled to movement of substances ATPase activity,coupled to transmembrane movement... Intrinsic to membrane Organelle envelope Localization Cation transport Integral to membrane Plastid envelope Nucleoside-triphosphatase activity ATPase activity,coupled Macromolecule transmembrane transporter activity Protein transmembrane transporter activity ATPase activity,coupled to transmembrane movement... Transporter activity Transmembrane transporter activity Ion transport Outermembrane Hydrolase activity, acting on acid anhydrides, catalyzi... Ion transmembrane transporter activity Protein transporter activity Cation transmembrane transporter activity Substrate-specific transmembrane transporter activity Substrate-specific transporter activity ATPase activity,coupled to transmembrane movement... P-P-bond-hydrolysis-driven transmembrane transport... ATPase activity Active transmembrane transporter activity Envelope Pyrophosphatase activity Establishment of localization Primary active transmembrane transporter activity Chloroplast envelope Membrane part Hydrogen transport Proton transport Membrane Passive transmembrane transporter activity Channel activity Hydrolase activity, acting on acid anhydrides,... Hydrolase activity, acting on acid anhydrides Inorganic cation transmembrane transporter activity Hydrogen-exporting ATPase activity, phosphorylati. . . Wide pore channel activity Porin activity Cell part GO terms Cell Protein targeting Monovalent inorganic cation transport ATPase activity, uncoupled Protein targeting to mitochondrion Establishment of protein localization Chloroplast part Protein localization Plastidpart Macromolecule localization Organelle innermembrane Thylakoidpart NAD(P)H oxidase activity Oxidoreductase activity, acting on NADH or NADPH, wi. . . Photosynthesis, light reaction Photosystem Intracellularprotein transport Protein transport Thylakoid

Test set Reference set Figure 11: Bar chart for functional category enrichment analysis of Soybean membrane proteins. The Y-axis shows signiﬁcantly enriched GO terms and the X-axis give the relative frequency of the term. Red bars correspond to test set (membrane) and blue bars correspond to the whole Soybean genome array.

(Tools → Select Sequences by Color) and rerunning blast, tation needs to be changed. For example, the target of mapping, and annotation with different, more permissive GmaAffx.69219.1.S1 at probe was found to be the UDP- parameters. In this way, we obtain a trustworthy annotation glycosyltransferase. The automatic procedure assigned GO for most sequences and behave more permissively only for terms metabolic process (GO:0008152) and transferase ac- those sequences which are hard to annotate. Other charts tivity, transferring hexosyl groups (GO:0016758) to this se- available at the Annotation Statistics menu show the distri- quence. However, as we are aware of the ER localization of bution of GO levels (see Figure 8), the length of annotated this enzyme and its involvement in protein maturation, we sequences, and the histogram of GO term abundance. would like to add this information to the existing annotation. The manual curation function is available at the Sequence Annotation augmentation Menu which is displayed by mouse right button click on the selected sequence. From this Menu, the blast and annotation Blast-based GO annotations can be increased by means of the results for this particular sequence can be visualized. Selec- integrated InterProScan function available under Annotation tion of Change Annotations and Description edits the an- → Run InterProScan. The user must provide his/her email notation record of GmaAffx.69219.1.S1 at. We can now type address and select the motif databases of interest. An Inter- in the Annotations box the terms GO:0005783 (endoplasmic ProScan search against all EBI databases resulted in the re- reticulum) and GO:0006464 (protein modification process) covery of motif functional information for 11,347 sequences and mark the manual annotation box. The new annotations and a total of 8,046 GO terms. Once merged to the al- are then added and the sequence turns purple (manual an- ready existing annotation (Annotation → Add InterProScan notation color code). GOs to Annotation), 1,189 additional sequences were annotated (see Figure 9). Once Blast plus InterProScan annota- GOSlim tions have been gathered, a useful step is to complete implicit annotations through the Annex function (Annotation As the number of sequences and different GO terms in the → Augment Annotation by Annex). After this step, it is rec- Soybean array is quite large, we are interested in a sim- ommended to run the function to remove first-level anno- pler representation of the functional content of the data. An tations (under Annotation menu). In our use case, the An- appropriate option is to map annotations into a GOSlim. nex function resulted in the addition of 8,125 new GO terms At Annotation → Change to GOSlim View, we can se- and a confirmation of 3,892 annotations, which is an average lect an appropriate GOSlim (generic plant) for this dataset. contribution of the Annex function [37]. Upon completion of slimming sequences acquire the yellow GOSlim coding. The original annotations are stored and can Manual curation be recovered at any moment. GOSlim mapping generated a set of 105 different annotating GO terms on 18,820 se- The manual annotation tool is a useful functionality quences with a mean GO level of 3.41. This means around 40 when information on the automatically generated anno- times less functional diversity than in the original annotation 10 International Journal of Plant Genomics

Biological process GO: 0008150

is a is a is a

Cellular process Localization GO: 0051179 Metabolic process FDR: 2.38201E − 9FWER:1.68971E − 8 GO: 0009987 p-Value: 0 GO: 0008152 is a is a is a is a is a Macromolecule localization Cellular component GO: 0033036 Photosynthesis Generation of precursor organization and biogenesis is a − − metabolites and energy FDR: 4.80702E 8FWER:6.85E 7 GO: 00159 GO: 0016043 p-Value: 2.94146E − 9 GO: 0006091 is a is a is a

Establishment of localization Protein localization photosynthesis, light reaction Cellular localization GO: 0051234 GO: 0008104 − − GO: 0019684 GO: 0051641 FDR: 2.38201E − 9FWER:2.24787E − 8 FDR: 1.90847E 8FWER:2.62414E 7 − − − FDR: 2.1184E 7 p-Value: 3.08712E 11 p-Value: 1.67034E 9 FWER: 3.28351E − 6 p-Value: 1.58308E − 8 is a is a is a

Establishment of protein localization Establishment of cellular Transport GO: 0006810 GO: 0045184 − − localization FDR: 1.34568E − 8FWER:1.78303E − 7 FDR: 2.38201E 9FWER:1.68971E 8 GO: 005164 p-Value: 8.90903E − 10 p-Value: 0 is a is a is a is a is a Protein transport Ion transport Intracellular transport GO: 0015031 GO: 0006811 GO: 0046907 FDR: 5.18606E − 7FWER:8.4273E − 6 FDR: 2.38201E − 9FWER:2.03743E − 8 p-Value: 3.41157E − 8 p-Value: 1.2074E − 11

Hydrogen transport is a GO: 0006818 is aFDR: 2.38201E − 9FWER:2.26849E − 8 is a is a p-Value: 3.55845E − 11

Intracellular protein transport Cation transport GO: 0006886 GO: 0006812 FDR: 3.8086E − 7FWER:6.09374E − 6 FDR: 2.38201E − 9FWER:1.68971E − 8 p-Value: 3.1865E − 8 p-Value: 0

Mitochondrial transport GO: 0006839 is a is a is a

Protein targeting Monovalent inorganic cation transport GO: 0006605 GO: 0015672 FDR: 3.03575E − 9FWER:3.71879E − 8 − − − FDR: 7.02969E 9FWER:8.78711E 8 p-Value: 1.34834E 10 p-Value: 4.17795E − 10 is a is a is a Protein targeting to mitochondrion Proton transport GO: 0006626 GO: 0015992 FDR: 1.26156E − 8FWER:1.64003E − 7 FDR: 2.38201E − 9FWER:2.31925E − 8 p-Value: 7.77203E − 10 p-Value: 3.70634E − 11

Figure 12: Enriched graph (biological process) of the Soybean membrane subset of sequences. Node ﬁlter has been set at FDR < 1e-6. Nodes are colored accordingly to their FDR value in the Fisher exact’s test against the whole Soybean genome array.

(4533 different terms) and an increase of almost 2 levels of for the Molecular Function Category. The two most inten- the mean annotation depth. sively colored terms at the second GO level indicate the two most abundant functional categories in the Soybean Chip: Combined graph catalytic activity and binding. Highlighting at lower levels re- veals other, most informative, highly represented functional Once the slimmed annotation is obtained, we can visualize terms, such as hydrolase activity (level 3), kinase activity the functional information of the Soybean Genome Array (level 4), transcription factor activity (level 3), protein bind- on the GO DAG. This functionality is available under Anal- ing (level 3), nucleotide binding (level 3), and transporter ysis → Combined Graph. At the Dialog we must indicate activity (level 2). The reader is referred to the annotation the GO category to display (e.g., biological process). To ob- sheet URL (http://blast2go.bioinfo.cipf.es/b2gdata/soybean) tain a compact representation of the information, two fil- for figure navigation. ters can be applied. For example, by setting the sequence filter to 20, only those nodes with at least 20 sequence as- Enrichment analysis signments will be displayed. By setting the score filter to 20, additionally, parent nodes that do not annotate more The enrichment analysis function in B2G executes a sta- sequences than their children terms will be omitted from tistical assessment of differences in functional classes be- the graph. Node coloring by score value highlights the ar- tween two groups of sequences. To illustrate this function, eas in the resulting DAG where sequence annotations are we have selected all sequences in the Soybean chip which most concentrated. Figure 10 shows the Combined Graph contain the word “membrane” within their description—132 A. Conesa and S. Gotz¨ 11 sequences—and compared their annotations to the whole 4. CONCLUSIONS chip. We go to Analysis → Enrichment Analysis → Make Fisher’s Exact Test and browse for a text file containing the Functional annotation of novel sequence data is a key re- test set with the names of membrane sequences. As the com- quirement for the successful generation of functional ge- parison is made against the complete microarray dataset nomics in biological research. The Blast2GO suite has been loaded into the application, no file needs to be selected as developed to be a useful support to these approaches, espe- Reference. We uncheck the two-tail box to perform only pos- cially (but not exclusively) in nonmodel species. This bioin- itive enrichment analysis. Upon completion a table with test formatics tool is ideal for plant functional genomics re- statistical results is presented in the Statistics tab. This table search because of the following: (1) it is suitable for any contains significant GO terms which are ranked according species but can be also customized for specific needs, (2) to their significance. Three different significance parame- it combines high throughput with interactivity and cura- ters are given for false-positive control: false discovery rate tion, and (3) it is user-friendly and requires low bioinfor- ff (FDR), family-wise error rate (FWER), and single test P- matics e orts to get it running. In our opinion, the major value (Fisher P-value) (see [36] for details). By taking a FDR B2G strength is the combination of functional annotation significance threshold of 0.05, we obtain those functionali- and data mining on annotation results, which means that, ties that are strongly significant for membrane proteins in within one tool, researchers can generate functional annota- the Soybean Chip. These refer to processes related to trans- tion and assess the functional meaning of their experimental port, protein targeting, and photosynthesis as might be ex- results. Further developments of Blast2GO will reinforce this pected for a plant species. Graphical representations of these second aspect thought the integration of the tool with the results can be generated at Analysis → Enrichment Anal- Babelomics (www.babelomics.org,[38]) and GEPAS suites ysis → Bar Chart and Analysis → Enrichment Analysis → (www.gepas.org,[39]) for the statistical analysis of functional Make Enriched Graph. The Bar Chart shows, for each sig- profiling data. nificant GO term, frequency differences between the membrane and the whole chip datasets (see Figure 11). The En- ACKNOWLEDGMENTS riched Graph shows the DAG of significant terms with a node-coloring proportional to the significance value. This This work has been funded by the Spanish Ramon y Cajal representation helps in understanding the biological con- Program and the National Institute of Bioinformatics (a plat- text of functional differences and to find pseudoredundan- form of Genoma Espana). cies in the results—parent-child relationships within significant terms—(see Figure 12). REFERENCES [1] A. Conesa, J. Forment, J. Gadea, and J. van Dijk, “Microarray technology in agricultural research,” in Microarray Technology Through Applications, F. Falciani, Ed., pp. 173–209, Taylor & Export results Francis, New York, NY, USA, 2007. [2] S. H. Nagaraj, R. B. Gasser, and S. Ranganathan, “A hitch- hiker’s guide to expressed sequence tag (EST) analysis,” Brief- Once different analyses have been completed the data can ings in Bioinformatics, vol. 8, no. 1, pp. 6–21, 2007. be exported in many different ways. The annotation format [3] M. Ashburner, C. A. Ball, J. A. Blake, et al., “Gene Ontology: (menu File → Export → Export Annotations) is the default tool for the unification of biology. The Gene Ontology Con- format for export/import in B2G and simply consists of a sortium,” Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000. tab-delimited file with two columns, one for sequences and [4] I. Schomburg, A. Chang, C. Ebeling, et al., “BRENDA, the enzyme database: updates and major new developments,” Nu- other for annotation IDs. Another useful export format is cleic Acids Research, vol. 32, Database issue, pp. D431–D433, GeneSpring, for communication with this interesting appli- 2004. cation, which consists of one row per sequence and three dif- [5] H. Ogata, S. Goto, K. Sato, W. Fujibuchi, H. Bono, and ferent columns showing the descriptions of the GO terms at M. Kanehisa, “KEGG: Kyoto Encyclopedia of Genes and the three main GO categories. Graphs can be saved in png Genomes,” Nucleic Acids Research, vol. 27, no. 1, pp. 29–34, format. Additionally, all information contained in the Com- 1999. bined Graph can be generated as table (including sequences, [6] A. Ruepp, A. Zollner, D. Maier, et al., “The FunCat, a func- GO IDs, levels, and scores) and exported (Analysis → Export tional annotation scheme for systematic classification of pro- Graph Information). teins from whole genomes,” Nucleic Acids Research, vol. 32, The analysis presented in this use case took about 15 days no. 18, pp. 5539–5545, 2004. to complete. Four days were necessary to obtain the totality [7]R.L.Tatusov,N.D.Fedorova,J.D.Jackson,etal.,“The of 37,500 blast results from the NCBI while twelve days were COG database: an updated version includes eukaryotes,” BMC Bioinformatics, vol. 4, p. 41, 2003. required for the InterProScan at the EBI web server. Mapping [8] S. Kumar and J. Dudley, “Bioinformatics software for biolo- and Annotation were ready within a few hours and one day gists in the genomics era,” Bioinformatics, vol. 23, no. 14, pp. was necessary to collect and evaluate charts. This shows that 1713–1717, 2007. with the adequate tools and some training, functional anno- [9] L. B. Koski, M. W. Gray, B. F. Lang, and G. Burger, “Auto- tation of a plant genome-wide sequence collection is in reach FACT: an automatic functional annotation and classification within a couple of weeks. tool,” BMC Bioinformatics, vol. 6, p. 151, 2005. 12 International Journal of Plant Genomics

[10] F. M. McCarthy, S. M. Bridges, N. Wang, et al., “AgBase: a genomes is primarily limited to plastid functions,” Current Bi- unified resource for functional analysis in agriculture,” Nucleic ology, vol. 16, no. 23, pp. 2320–2325, 2006. Acids Research, vol. 35, Database issue, pp. D599–D603, 2007. [28] M. J. Nueda, A. Conesa, J. A. Westerhuis, et al., “Discover- [11] F. Chalmel, A. Lardenois, J. D. Thompson, et al., “GOAnno: ing gene expression patterns in time course microarray exper- GO annotation based on multiple alignment,” Bioinformatics, iments by ANOVA-SCA,” Bioinformatics, vol. 23, no. 14, pp. vol. 21, no. 9, pp. 2095–2096, 2005. 1792–1800, 2007. [12] D. Groth, H. Lehrach, and S. Hennig, “GOblet: a platform for [29] T. D. Williams, A. M. Diab, S. G. George, V. Sabine, and J. Gene Ontology annotation of anonymous sequence data,” Nu- K. Chipman, “Gene expression responses of European floun- cleic Acids Research, vol. 32, pp. W313–W317, 2004. der (Platichthys flesus) to 17-β estradiol,” Toxicology Letters, [13] S. Khan, G. Situ, K. Decker, and C. J. Schmidt, “GoFigure: au- vol. 168, no. 3, pp. 236–248, 2007. tomated Gene Ontology annotation,” Bioinformatics, vol. 19, [30] J. Ma, D. J. Morrow, J. Fernandes, and V. Walbot, “Compara- no. 18, pp. 2484–2485, 2003. tive profiling of the sense and antisense transcriptome of maize [14] A. Vinayagam, C. del Val, F. Schubert, et al., “GOPET: a tool lines,” Genome Biology, vol. 7, no. 3, p. R22, 2006. for automated predictions of Gene Ontology terms,” BMC [31] R. T. Nelson and R. Shoemaker, “Identification and analysis Bioinformatics, vol. 7, p. 161, 2006. of gene families from the duplicated genome of soybean using [15] D. M. A. Martin, M. Berriman, and G. J. Barton, “GOtcha: a EST sequences,” BMC Genomics, vol. 7, p. 204, 2006. new method for prediction of protein function assessed by the [32] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lip- annotation of seven genomes,” BMC Bioinformatics, vol. 5, p. man, “Basic local alignment search tool,” Journal of Molecular 178, 2004. Biology, vol. 215, no. 3, pp. 403–410, 1990. [16] E. M. Zdobnov and R. Apweiler, “InterProScan—an integra- [33] A. Labarga, F. Valentin, M. Anderson, and R. Lopez, “Web ser- tion platform for the signature-recognition methods in Inter- vices at the European bioinformatics institute,” Nucleic Acids Pro,” Bioinformatics, vol. 17, no. 9, pp. 847–848, 2001. Research, vol. 35, Web Server issue, no. supplement 2, pp. W6– [17] I. Friedberg, T. Harder, and A. Godzik, “JAFA: a protein func- W11, 2007. tion annotation meta-server,” Nucleic Acids Research, vol. 34, [34] S. Myhre, H. Tveit, T. Mollestad, and A. Lægreid, “Additional Web Server issue, pp. W379–W381, 2006. Gene Ontology structure for improved biological reasoning,” [18] G. Zehetner, “OntoBlast function: from sequence similari- Bioinformatics, vol. 22, no. 16, pp. 2020–2027, 2006. ties directly to potential functional annotations by ontology [35] E. Pietriga, “A toolkit for addressing HCI issues in visual lan- terms,” Nucleic Acids Research, vol. 31, no. 13, pp. 3799–3803, guage environments,” in Proceedings of the IEEE Symposium 2003. on Visual Languages and Human-Centric Computing (VL/HCC [19] T. Hawkins, S. Luban, and D. Kihara, “Enhanced automated ’05), pp. 145–152, Dallas, Tex, USA, September 2005. function prediction using distantly related sequences and con- [36] N. Bluthgen,¨ K. Brand, B. Cajavec, M. Swat, H. Herzel, and textual association by PFP,” Protein Science, vol. 15, no. 6, pp. D. Beule, “Biological profiling of gene groups utilizing Gene 1550–1556, 2006. Ontology,” Genome Informatics, vol. 16, no. 1, pp. 106–115, [20] A. Conesa, S. Gotz,¨ J. M. Garc´ıa-Gomez,´ J. Terol, M. Talon,´ and 2005. M. Robles, “Blast2GO: a universal tool for annotation, visual- [37] S. Goetz, J. M. Garc´ıa-Gomez,´ J. Terol, et al., “High through- ization and analysis in functional genomics research,” Bioin- put functional annotation and data mining with the Blast2GO formatics, vol. 21, no. 18, pp. 3674–3676, 2005. suite,” submitted. [21] J. Terol, A. Conesa, J. M. Colmenero, et al., “Analysis of 13000 [38] F. Al-Shahrour, P. Minguez, J. Tarraga,´ et al., “BABELOMICS: unique Citrus clusters associated with fruit quality, production a systems biology perspective in the functional annotation of and salinity tolerance,” BMC Genomics, vol. 8, p. 31, 2007. genome-scale experiments,” Nucleic Acids Research, vol. 34, [22] J. A. Vizca´ıno,F.J.Gonzalez,´ M. B. Suarez,´ et al., “Generation, Web Server issue, pp. W472–W476, 2006. annotation and analysis of ESTs from Trichoderma harzianum [39]D.Montaner,J.Tarraga,´ J. Huerta-Cepas, et al., “Next station CECT 2413,” BMC Genomics, vol. 7, p. 193, 2006. in microarray data analysis: GEPAS,” Nucleic Acids Research, [23] A. C. Faria-Campos, F. S. Moratelli, I. K. Mendes, et al., “Pro- vol. 34, Web Server issue, pp. W486–W491, 2006. duction of full-length cDNA sequences by sequencing and analysis of expressed sequence tags from Schistosoma man- soni,” Memorias do Instituto Oswaldo Cruz, vol. 101, supplement 1, pp. 161–165, 2006. [24] D. S. Durica, D. Kupfer, F. Najar, et al., “EST library sequencing of genes expressed during early limb regeneration in the fiddler crab and transcriptional responses to ecdysteroid ex- posure in limb bud explants,” Integrative and Comparative Bi- ology, vol. 46, no. 6, pp. 948–964, 2006. [25] T. D. Williams, A. M. Diab, S. G. George, et al., “Develop- ment of the GENIPOL European flounder (Platichthys flesus) microarray and determination of temporal transcriptional responses to cadmium at low dose,” Environmental Science and Technology, vol. 40, no. 20, pp. 6479–6488, 2006. [26] M. Gand´ıa, A. Conesa, G. Ancillo, et al., “Transcriptional re- sponse of Citrus aurantifolia to infection by Citrus tristeza virus,” Virology, vol. 367, no. 2, pp. 298–306, 2007. [27] A.Reyes-Prieto,J.D.Hackett,M.B.Soares,M.F.Bonaldo,and D. Bhattacharya, “Cyanobacterial contribution to algal nuclear International Journal of Peptides

Advances in BioMed Stem Cells International Journal of Research International International Genomics Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 Virolog y http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014

Journal of Nucleic Acids

Zoology International Journal of Hindawi Publishing Corporation Hindawi Publishing Corporation http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014

Submit your manuscripts at http://www.hindawi.com

Journal of The Scientific Signal Transduction World Journal Hindawi Publishing Corporation Hindawi Publishing Corporation http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014

Genetics Anatomy International Journal of Biochemistry Advances in Research International Research International Microbiology Research International Bioinformatics Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014

Enzyme International Journal of Molecular Biology Journal of Archaea Research Evolutionary Biology International Marine Biology Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014