Applications of Bioinformatics to Plant Biotechnology

Diego F. Gomez-Casati1*, María V. Busi1, Julieta Barchiesi1, Diego A. Peralta1, Nicolás Hedin1 and Vijai Bhadauria2

1Centro de Estudios Fotosintéticos y Bioquímicos (CEFOBI-CONICET), Universidad Nacional de Rosario, Suipacha, Rosario, Argentina.
2Crop Development Centre and Department of Plant Sciences, University of Saskatchewan, Saskatoon, SK, Canada.
*Correspondence: [email protected]
https://doi.org/10.21775/cimb.027.089

Abstract

Bioinformatics encompasses many tools and techniques that today are essential for all areas of research in the biological sciences. New databases with a wealth of information about […], proteins, metabolites, and metabolic pathways appear almost daily. Particularly for those who carry out research in plant […], the amount of information has multiplied exponentially due to the large number of databases available for many individual plant species. In this sense, bioinformatics, together with next-generation sequencing and ‘omics’ approaches, can provide tools for plant breeding and the […] of plants. In addition, these technologies enable a better understanding of the processes and mechanisms that can lead to plants with increased tolerance to different abiotic stress conditions and resistance to pathogen attack, as well as the development of crop varieties with improved nutritional quality of seeds and fruits.

Introduction

In recent years, bioinformatics has emerged as a new, interdisciplinary field of science that can be described as the structuring of biological information to enable the understanding of biological phenomena (Edwards and Batley, 2004; Rhee et al., 2006). Bioinformatics is essential in many areas of biology such as […] and plant biotechnology. Bioinformatics also includes the collection of biological […], storage and management of […], and the development of software for analysis of large data sets.

In order to deal with this deluge of available information, scientists need the ability to collect accurate […], store it in databases, and organize it in a systematic way. In addition, computer programs and mathematical tools such as […] are necessary for the analysis of the data.

Recent advances in genomic technologies have led to the accumulation of large amounts of data, resulting in a significant increase in the amount of biological information available for plant research and many other areas of biology, particularly the […]. The applications of bioinformatics in plant biotechnology have expanded with the emergence of next-generation sequencing (NGS) techniques and the various omics technologies, providing […] across the various omics platforms to reveal information about the genome, transcriptome, proteome, metabolome, and metabolic pathways of different plant species (Edwards and Batley, 2004; Rhee et al., 2006).

In this review we cover some aspects of the applications of bioinformatics to problems in plant biotechnology, such as the improvement of fruit quality, the use of bioinformatics tools for plant

Curr. Issues Mol. Biol. Vol. 27 90 | Gomez-Casati et al.

breeding, and the improvement of crop species in their responses to various types of stress and pathogen attack. We also describe the databases available for the study of […] and metabolites, especially those containing information on different plant species.

Gene expression and other omics databases

Transcriptomic analysis comprises the study of changes in gene expression profiles due to changes in the normal condition of a […]. In addition, transcriptomic studies can be used to assess how these changes in gene expression affect the development, growth, or phenotype of a particular […].

Microarrays are currently the most widely used technology for monitoring gene expression in plants. This technique allows the simultaneous determination of transcript abundance for many thousands of genes (Rhee et al., 2006). These data may be directly integrated into functional genomic approaches aimed at both assigning function to identified genes and studying the organization and control of genetic pathways that act concurrently to create a functional organism. The rationale behind this approach is that genes with similar expression patterns may be functionally related and subject to the same or similar genetic control mechanisms. Therefore, a common strategy in early microarray studies was to analyse data by clustering genes into groups based on their expression profiles scored over multiple […] (Aharoni et al., 2001; Brown and Botstein, 1999). Recently, RNA sequencing (RNA-seq) has been developed to directly measure mRNA profiles in a biological system using NGS technology (Morin et al., 2008; Weber, 2015). This technique has the advantage that it can generate relative measures of mRNA and also of individual exon abundance while simultaneously profiling the prevalence of both annotated and novel exons and exon-splicing events. In addition, it is possible to identify known single nucleotide polymorphisms (SNPs) as well as novel single-base variants (Morin et al., 2008). Microarray and RNA-seq techniques have enabled not only the identification of genes and the networks involved in different developmental processes, but have also helped to identify the downstream genes involved in such metabolic processes (Agarwal et al., 2014). Fortunately, expression data collected under different metabolic and environmental conditions are deposited in publicly accessible databases (Table 5.1). These data provide valuable information that could potentially be used to produce improved plant varieties.

Bioinformatics is also being applied to the study and analysis of the metabolome. The omics approach known as metabolomics involves the comprehensive, high-throughput analysis of complex metabolite mixtures that, ideally, allows the identification and quantification of all the individual metabolites that are the end products of cellular regulatory processes (Gomez-Casati et al., 2013). Metabolomic studies, in conjunction with […], can reflect changes in the phenotype and […] of a particular tissue or […]. As with gene expression data, there are several databases for plant species (some of which also include useful information for biomedical researchers).

Some of these databases also provide users with a powerful tool that allows them to generate an unambiguous […] to identify structures. This ‘tag’ is known as the International Chemical Identifier (InChI), developed by the International Union of Pure and Applied Chemistry (IUPAC) and the National Institute of Standards and Technology (NIST). The InChI identifier is based on the chemical structure, and it makes database searches easier because the user can translate from the InChI to the structure, or back-translate. ChEBI (www.ebi.ac.uk/chebi/) is one of the databases that allow users to generate InChI identifiers, and PubChem (http://pubchem.ncbi.nlm.nih.gov/) has a structure drawing tool to generate the InChI identifier. In addition, the Fiehn Lab Chemical Translation Service (http://cts.fiehnlab.ucdavis.edu/) is an online tool that can generate InChI identifiers.

Some of the best known of the available metabolomics databases are the Golm Metabolome Database (http://gmd.mpimp-golm.mpg.de/), which contains information about the mass spectra of biologically active metabolites, and the Human Metabolome Database (www.hmdb.ca/), which contains information on metabolites found in humans, including biochemical, clinical, and chemical data linking known metabolites to several genes and […]. The Madison Metabolomics

Curr. Issues Mol. Biol. Vol. 27 Applications of Bioinformatics to Plant Biotechnology | 91

Table 5.1 Plant-specific gene expression databases

Plant species | Database name | URL
Arabidopsis | NASC Arrays | http://affymetrix.arabidopsis.info/
Arabidopsis | Genevestigator | www.genevestigator.ethz.ch/
Arabidopsis | AREX LITE | www.arexdb.org/
Arabidopsis | AraNet | www.inetbio.org/aranet/
Barley | Barleybase | www.barleybase.org
Citrus | CitrusPLEX | www.plexdb.org/plex.php?database=Citrus
Cotton | CottonPLEX | www.plexdb.org/plex.php?database=Cotton
Generic | GEO (Gene Expression Omnibus) | www.ncbi.nlm.nih.gov/projects/geo
Generic | ArrayExpress | www.ebi.ac.uk/arrayexpress
Generic | Plant Co-expression database | http://planex.plantbioinformatics.org/
Generic | Expression Atlas EMBL-EBI | www.ebi.ac.uk/gxa/home
Generic | BAR | http://bar.utoronto.ca/
Generic | PLANET | http://aranet.mpimp-golm.mpg.de/
Generic | PathoPlant | www.pathoplant.de/
Generic | Plant Promoter Database | http://linux1.softberry.com/berry.phtml?topic=plantprom&group=data&subgroup=plantprom
Generic | PLEXdb | www.barleybase.org/plexdb/html/index.php
Grape | GrapePLEX | www.plexdb.org/plex.php?database=Grape
Maize | Maize genomics and genetics database | www.maizegdb.org/
Maize | Zeamage | www.maizearray.org
Maize | MaizePLEX | www.plexdb.org/plex.php?database=Corn
Medicago | MedicagoPLEX | www.plexdb.org/plex.php?database=Medicago
Poplar | PoplarPLEX | www.plexdb.org/plex.php?database=Poplar
Rice | Rice Expression Database | http://tenor.dna.affrc.go.jp/
Rice | NSF Rice Oligo Array Project Database | www.ricearray.org
Rice | Rice expression Profile Database | http://ricexpro.dna.affrc.go.jp/
Rice | RicePLEX | www.plexdb.org/plex.php?database=Rice
Solanaceae | Solanaceae Gene Expression Database | www.tigr.org/tdb/potato
Solanaceae | SolGenomics Network | https://solgenomics.net/gem/experimental_design.pl?id=2
Soybean | Soybean Genome and Microarray Database | https://omictools.com/soybean-genome-and-microarray-database-tool
Soybean | SoyPLEX | www.plexdb.org/plex.php?database=Soybean
Sugarcane | SugarPLEX | www.plexdb.org/plex.php?database=Sugarcane
Tobacco | TOBFAC | http://infogizmo.com/tobfac_tobacco_transcription_factors.html
Tomato | Tomato Functional Genomics Database | http://ted.bti.cornell.edu
Tomato | TomPLEX | www.plexdb.org/plex.php?database=Tomato
Vitis vinifera (grape) | VvGDB | www.plantgdb.org/VvGDB/
Wheat | WheatPLEX | www.plexdb.org/plex.php?database=Wheat
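The clustering strategy described for early microarray studies, in which genes are grouped by the similarity of their expression profiles, can be illustrated with a minimal sketch. The gene names and expression values below are hypothetical, and the choice of Pearson correlation with a single-linkage rule and a 0.9 threshold is an assumption made for illustration; the cited studies may have used different distance measures and clustering algorithms.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles.

    Assumes neither profile is constant (otherwise the variance is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def cluster_by_coexpression(profiles, threshold=0.9):
    """Single-linkage grouping: genes whose profiles correlate at or above
    `threshold` share a cluster, either directly or through a chain of genes."""
    parent = {gene: gene for gene in profiles}

    def find(gene):
        # Follow parent links to the cluster representative.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for g1, g2 in combinations(profiles, 2):
        if pearson(profiles[g1], profiles[g2]) >= threshold:
            parent[find(g1)] = find(g2)

    clusters = {}
    for gene in profiles:
        clusters.setdefault(find(gene), []).append(gene)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical expression values over five conditions (not real data).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 4.0, 6.2, 8.1, 9.9],
    "geneC": [5.0, 4.1, 3.0, 1.9, 1.0],
    "geneD": [9.8, 8.0, 6.1, 4.2, 2.0],
}
clusters = cluster_by_coexpression(profiles)
print(clusters)
```

In this toy data set the two rising profiles fall into one group and the two falling profiles into another, which mirrors the rationale that co-expressed genes may share regulatory control.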


Database (http://mmcd.nmrfam.wisc.edu/) contains information about metabolites determined using NMR and MS. Another integrative cross-species and cross-technique database is Metabolights (www.ebi.ac.uk/metabolights/), which contains information about metabolite structures, spectra, and their biological roles. Metabolomics at Rothamsted (MeT-RO) is a project that contains several resources that can be applied to plant and microbial metabolomics (www.metabolomics.bbsrc.ac.uk/MeT-RO.htm). The Metlin Metabolite Database contains information on about 240,000 metabolites and nearly 72,000 high-resolution MS/MS spectra and tandem MS experiments (http://metlin.scripps.edu/). PRIMe, a platform for RIKEN metabolomics, is a database that integrates genomic and metabolomics data. PRIMe contains information on metabolites obtained from GC-MS, LC-MS, CE-MS and NMR spectroscopy (http://prime.psc.riken.jp/). ReSpect for Phytochemicals (http://spectra.psc.riken.jp/menta.cgi/index) is a collection of more than 9000 literature and in-house MSn spectra records for research on plant metabolomics. MassBank (www.massbank.jp) is a shared public repository of mass spectral data that, as of this writing (September 2016), contained 41,092 spectra records. These data are useful for chemical identification of compounds detected by mass spectrometry. The MS-MS Fragment Viewer (http://webs2.kazusa.or.jp/msmsfragmentviewer/) is a metabolomics database that contains FT-, IT-MS/MS and FT-MS spectral data with predicted structures of ion fragments observed in LC-FT/ICR-MS analysis. Metabolome Express (www.metabolome-express.org) is a public resource where GC/MS metabolomics datasets can be processed, interpreted, and shared. It houses both uncurated repositories and quality-controlled databases.

Databases specific for individual plant species have also been developed; an example is the Metabolome Tomato Database (MoTo DB) (www.ab.wur.nl/moto/), which is based on metabolites from Solanum lycopersicum obtained by LC-MS. Plant Metabolomics (http://plantmetabolomics.vrac.iastate.edu/ver2/) is a metabolomic and functional genomics tool for determining the roles of Arabidopsis thaliana genes. Terpmed (www.terpmed.eu/databases.html) is another plant-related database that contains information about plant terpenoids, particularly sesquiterpene lactones and phenolic diterpenes important for use as therapeutic drugs.

There are also several resources that integrate metabolic pathways with metabolite data, such as MetaCyc (http://metacyc.org/), which contains information on ~2450 pathways from more than 2780 organisms, and BioCyc (http://biocyc.org), a collection of about 7600 pathway and genome databases. PlantCyc (http://plantcyc.org) is a comprehensive multispecies database that collects biochemical information from many plant pathways. Other related databases are RiceCyc (http://pathway.gramene.org/gramene/ricecyc.shtml), the Sol Genomics Network (http://solgenomics.net/tools/solcyc/index.pl), AraCyc (www.arabidopsis.org/biocyc/), HumanCyc (http://humancyc.org/), Mapman (http://mapman.gabipd.org/web/guest/mapman), ChemSpider (www.chemspider.com/), Reactome (www.reactome.org), the KEGG Pathway database (www.genome.jp/kegg/pathway.html), KappaView (http://kpv.kazusa.or.jp/kpv4/), and KNApSAcK (http://kanaya.naist.jp/KNApSAcK/). These databases are summarized in Table 5.2.

Applications of bioinformatics to enhance fruit development and ripening

The increasing number of NGS technologies and new bioinformatics algorithms developed since the publication of the Arabidopsis genome sequence in 2000 (The Arabidopsis Genome Initiative, 2000) have enabled a deeper understanding of plant biology in both model species and crops (Gapper et al., 2014).

Current NGS technologies enable whole-genome shotgun DNA sequencing, and have become established as the preferred genome sequencing technology, due to longer read lengths, drastically reduced costs, and improved assembly algorithms. These advantages are reflected in the fact that the genomes sequenced in the last 10 years are mostly from crops or non-model plant species, allowing an increase in the number of applied studies in crops and fruits of interest (Feuillet et al., 2011; Gapper et al., 2014; Hamilton and Buell, 2012).


Table 5.2 Metabolomics resources databases

Name | URL | Information/species
ChEBI | www.ebi.ac.uk/chebi/ | Dictionary of small chemical […]
PubChem | http://pubchem.ncbi.nlm.nih.gov/ | General dictionary of chemicals
Fiehn Lab Chemical Translation service | http://cts.fiehnlab.ucdavis.edu/ | Chemical structures, names, synonyms and database identifiers
Golm metabolome database | http://gmd.mpimp-golm.mpg.de/ | GC-MS
Human metabolome database | www.hmdb.ca/ | Chemical and biological data of human metabolites
Madison metabolomics Database | http://mmcd.nmrfam.wisc.edu/ | NMR and MS
Metabolights | www.ebi.ac.uk/metabolights/ | Metabolite structures, spectra, function/cross-species
Metabolomics at Rothamsted MeT-RO | www.metabolomics.bbsrc.ac.uk/MeT-RO.htm | Plant and microbial metabolites
Metlin metabolite database | http://metlin.scripps.edu/ | High-resolution MS/MS spectra and tandem MS experiments
PRIMe | http://prime.psc.riken.jp/ | Genomic and metabolomics data, NMR spectroscopy, GC-MS, LC-MS and CE-MS
ReSpect for Phytochemicals | http://spectra.psc.riken.jp/menta.cgi/index | MSn spectra data for research on plant metabolomics
MassBank | www.massbank.jp | Database of mass spectral data
MS-MS Fragment Viewer | http://webs2.kazusa.or.jp/msmsfragmentviewer/ | FT-MS, IT- and FT-MS/MS spectral data
MetabolomeExpress | http://metabolome-express.org | GC/MS datasets
Metabolome Tomato Database MoTo DB | www.ab.wur.nl/moto/ | Metabolites identified by LC-MS
Plantmetabolomics | http://plantmetabolomics.vrac.iastate.edu/ver2/ | Arabidopsis and other plant species
Terpmed | www.terpmed.eu/databases.html | Plant terpenoids, natural products, secondary metabolites, therapeutic drugs
MetaCyc | http://metacyc.org/ | Integration of metabolite data with metabolic pathways/2000 organisms
BioCyc | http://biocyc.org | Collection of pathway/genome database
PlantCyc | http://plantcyc.org | Multispecies plant database
RiceCyc | http://pathway.gramene.org/gramene/ricecyc.shtml | Metabolic pathways, […], metabolites
Sol Genomics Network | http://solgenomics.net/tools/solcyc/index.pl | Pathway genome databases/Solanaceae species
AraCyc | www.arabidopsis.org/biocyc/ | Metabolic pathways, compounds, Arabidopsis
HumanCyc | http://humancyc.org/ | Metabolic pathways, genome/human
Mapman | http://mapman.gabipd.org/web/guest/mapman | Datasets (e.g. gene expression data, metabolic pathways)
ChemSpider | www.chemspider.com/ | Chemical structure database
Reactome | www.reactome.org | Free curated and peer reviewed pathway database


Table 5.2 Continued

Name | URL | Information/species
KEGG Pathway database | www.genome.jp/kegg/pathway.html | Pathways, […], genetic information, cellular processes, human diseases
KappaView | http://kpv.kazusa.or.jp/kpv4/ | Metabolic pathway database aimed at metabolic regulation
KNApSAcK | http://kanaya.naist.jp/KNApSAcK/ | Species–metabolite relationship database
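Several of the resources listed in Table 5.2 exchange structures as InChI identifiers, which are layered text strings derived from the chemical structure. The sketch below splits a standard InChI into its version, formula, and prefixed layers. It is a simplified illustration and not a full parser: it assumes a formula layer is present and that each layer prefix occurs only once, which does not hold for every InChI. The function name `parse_inchi` is hypothetical.

```python
def parse_inchi(inchi):
    """Split an InChI string into version, formula and prefixed layers.

    Simplified: assumes a formula layer exists and that no layer prefix
    is repeated within the identifier.
    """
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    version, formula, *layers = inchi[len("InChI="):].split("/")
    # Each remaining layer starts with a one-letter prefix, for example
    # 'c' for atom connectivity and 'h' for hydrogen atoms.
    return {
        "version": version,
        "formula": formula,
        "layers": {layer[0]: layer[1:] for layer in layers},
    }


# Standard InChI for ethanol.
ethanol = parse_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
print(ethanol)
```

Because the identifier is computed from the structure itself, two databases that hold the same compound under different names can be cross-referenced by comparing these strings.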

The genome sequences of 27 fruit and nut crops have been published to date; these include grape (Jaillon et al., 2007), papaya (Ming et al., 2008), cucumber (Huang et al., 2009), apple (Velasco et al., 2010), strawberry (Shulaev et al., 2011), tomato (Tomato Genome Consortium, 2012), melon (Garcia-Mas et al., 2012), banana (D’Hont et al., 2012), Chinese plum (Zhang et al., 2012), pear (Wu et al., 2013), watermelon (Guo et al., 2013), peach (Verde et al., 2013), kiwifruit (Huang et al., 2013), […] (Xu et al., 2013a), mandarin (Wang et al., 2015), jujube (Liu et al., 2014), mulberry (He et al., 2013), pineapple (Zhang et al., 2014), blueberry (Gupta et al., 2015), raspberry (VanBuren et al., 2016), mango (Azim et al., 2014), macadamia nut (Nock et al., 2014), cocoa (Motamayor et al., 2013), coffee (Denoeud et al., 2014), hazelnut (Hampson et al., 1996), aubergine (Hirakawa et al., 2014) and pepper (Qin et al., 2014).

Nevertheless, further development of physical and functional annotation and improved genome assembly is required. For instance, the assembly of the tomato genome is the most complete of all fruit species sequenced, but functional annotation of its genes is still incomplete (Gapper et al., 2014).

The new genomics approaches enable the identification of genes responsible for mutant phenotypes much faster than ever before. One example is the cloning of a GOLDEN-LIKE2 Myb superfamily transcription factor responsible for the uniform ripening mutant in tomato. This has been bred into many commercial cultivars, resulting in more uniform ripening (Nguyen et al., 2014; Powell et al., 2012).

The availability of a collection of mutants with ripening inhibition phenotypes has been essential to clarifying the transcriptional control of fruit ripening (Gapper et al., 2014). Tomato has one of the best characterized mutant collections available (http://tgrc.ucdavis.edu/) (Gapper et al., 2014; Seymour et al., 2013). Similar collections are the European Prunus Database (www.bordeaux.inra.fr/eupeachdb/) and the Citrus Genome Database (www.citrusgenomedb.org/).

Gene expression during fruit ripening is also regulated by epigenetic modifications, which can consist of genome structure, packaging, and non-sequence-based changes such as DNA methylation, acetylation, and histone modifications (Gallusci et al., 2016; Gapper et al., 2014). The entire set of epigenetic modifications, known as the epigenome, can be elucidated using NGS approaches. Recently, using bisulfite sequencing of the tomato genome, Liu and colleagues demonstrated for the first time that active DNA demethylation is an absolute requirement for fruit ripening to occur, and showed a direct cause and effect relationship between hypermethylation at specific promoters and repression of gene expression (Liu et al., 2015). However, even in tomato, the extent and role of the epigenetic regulation of fruit ripening is still relatively poorly understood (Gallusci et al., 2016).

Several transcriptomic studies have identified new regulators of fruit development and ripening. Zhu and colleagues, using RNA sequencing and functional analysis, demonstrated the regulatory role of long non-coding RNAs in tomato fruit ripening (Zhu et al., 2015). In another example, Asif and collaborators examined the global changes that occurred during fruit ripening in banana, sequencing cDNA libraries from unripe and ripe banana fruit pulp using the 454-GS platform (Asif et al., 2014).

Frequently used proteomics techniques are now being adapted to high-throughput screening methods that allow for the increased use of annotated genome sequences and robust bioinformatics tools.


Mass spectrometry-based methods have gained importance compared with antibody-based techniques owing to their higher specificity, precision and reproducibility, and the capability to quickly analyse numerous peptide transitions in a single test (Findeisen and Neumaier, 2009; Gapper et al., 2014).

Using high-throughput iTRAQ (isobaric tags for relative and absolute quantitation of tryptic peptides) and high-resolution […], Li and colleagues performed comparative analyses of the proteome and transcriptome during pear fruit development and identified 35 differentially expressed proteins related to fruit quality, including proteins related to sugar formation, aroma synthesis, and the formation of lignin (Li et al., 2015). Studies of this type can provide resources for improvement of fruit quality in pear.

Fruits are an extremely rich source of primary and secondary metabolites. Metabolomics studies on fruit ripening have been undertaken in several species including peach (Lombardo et al., 2011), melon (Moing et al., 2011), tomato (Oms-Oliu et al., 2011), apple (Rudell et al., 2009), pear (Pedreschi et al., 2009), avocado (Pedreschi et al., 2014) and pepper (Osorio et al., 2012). These studies have provided new insights into conserved and unique metabolic pathways associated with fruit maturation (Gapper et al., 2014).

One interesting recent study was a comparative investigation to identify common and/or species-specific modes of regulation in sugar accumulation. For that purpose, Dai and collaborators designed a process-based mathematical framework to compare soluble sugar accumulation in three fruits: grape, tomato, and peach (Dai et al., 2016). These authors demonstrated that at maturity, grape presented the highest soluble sugar concentrations, followed by peach and tomato. They also showed that the higher soluble sugar concentration in grape compared to tomato is a result of higher sugar importation, whereas the higher soluble sugar concentration in grape compared to peach is due to lower water dilution. On the other hand, they observed that carbon utilization for synthesis of non-soluble sugar compounds was conserved among the three fruit species. These results increase our understanding of the origins of differences in soluble sugar concentration between fleshy fruits.

Biotechnology and bioinformatics for plant breeding

At present, there exists a continuous challenge in plant breeding to improve crops that produce fodder, fuel, food, or other products through the application of crop genetics (Ray and Satya, 2014). These challenges are also related to the fact that some crop genomes are not yet fully sequenced and annotated, either because these crops have been under-researched, or due to the structural complexity of some very large genomes (Sikhakhane et al., 2016).

The accessibility of whole genome sequences allows for the identification of sequence-based single nucleotide polymorphisms (SNPs) as DNA markers instead of DNA fragment-based polymorphism identification, which has substantially increased the number of informative markers that can be developed (Stapley et al., 2010). In this way, although there are various strategies for plant breeding, genomics-assisted breeding is an effective and economic strategy that is becoming widely used (Sikhakhane et al., 2016). Moreover, the genetics of particular traits in species with large and complex genomes may now be studied through comparative genomics in related plants with smaller genomes that share conserved regions. This will potentially identify genes or quantitative trait loci (QTL) and putative SNP markers to be annotated for genome-wide association mapping. The application of NGS techniques in several crops is revolutionizing and accelerating the pace of plant breeding (Sikhakhane et al., 2016). Fig. 5.1 shows a scheme for plant breeding that incorporates NGS.

Since the introduction of the Sanger sequencing technique nearly 40 years ago (Sanger et al., 1977), several significant advances have been made that address the limitations of this early technology. These advances complement the development of more sophisticated DNA sequencing methodologies that allow de novo genome sequencing by rapidly producing huge amounts of sequence information at very low cost. Table 5.3 shows a summary of the advances made in DNA sequencing technology, from dideoxy chain termination sequencing to modern NGS technologies including 454 sequencing (Roche), Illumina


Figure 5.1 Plant breeding assisted by NGS.
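The use of sequence-based SNPs as DNA markers, discussed in this section, rests on finding positions where aligned sequences from different lines carry different bases. The following is a minimal sketch of that idea only. The cultivar names and sequences are hypothetical, the input is assumed to be already aligned, and real pipelines additionally weigh read depth and base quality before calling a SNP.

```python
def find_snp_sites(aligned):
    """Return (position, alleles) for each alignment column that shows
    more than one base. Positions are 1-based; gaps ('-') and unknown
    bases ('N') are ignored."""
    names = sorted(aligned)
    length = len(aligned[names[0]])
    if any(len(aligned[name]) != length for name in names):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for pos in range(length):
        alleles = {aligned[name][pos] for name in names} - {"-", "N"}
        if len(alleles) > 1:
            sites.append((pos + 1, "".join(sorted(alleles))))
    return sites


# Hypothetical aligned sequences from three lines (not real data).
aligned = {
    "cultivar_1": "ATGCCGTAAC",
    "cultivar_2": "ATGCTGTAAC",
    "cultivar_3": "ATGCCGTGAC",
}
snps = find_snp_sites(aligned)
print(snps)
```

Each reported site is a candidate marker; whether it is informative for breeding depends on its association with a trait or QTL, which this sketch does not address.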

Table 5.3 NGS timeline

Year introduced | Method/Platform | Read length | Performance (data yield)
1977 | Sanger sequencing | Up to 1000 bp | Up to 84 kb per 3 hours
2004 | Roche/454 | Up to 1000 bp | 700 Mb per day
2006 | Illumina | Up to 300 bp | 130 Gb per 30 hours
2006 | ABI SOLiD | Up to 50 bp | 20 Gb per 72 hours
2009 | Helicos | 35 bp (average) | 35 Gb per 8 days
2010 | Ion Torrent/Pacific Biosciences | 35 to 400 bp (average) | 2 Gb per 5 hours
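The yields in Table 5.3 are quoted over different run times, so they are easier to compare once converted to a common rate. The sketch below does that arithmetic, assuming that kb, Mb and Gb denote 10^3, 10^6 and 10^9 bases and that one day is 24 hours. The figures are transcribed from the table (the Sanger value is an upper bound), and the resulting rates are rough indications rather than benchmark results.

```python
# Assumed unit sizes, in bases.
UNIT = {"kb": 1e3, "Mb": 1e6, "Gb": 1e9}

# (platform, yield, unit, run time in hours), transcribed from Table 5.3.
RUNS = [
    ("Sanger sequencing", 84, "kb", 3),
    ("Roche/454", 700, "Mb", 24),
    ("Illumina", 130, "Gb", 30),
    ("ABI SOLiD", 20, "Gb", 72),
    ("Helicos", 35, "Gb", 8 * 24),
    ("Ion Torrent/Pacific Biosciences", 2, "Gb", 5),
]


def bases_per_hour(amount, unit, hours):
    """Convert a yield quoted over a run time into bases per hour."""
    return amount * UNIT[unit] / hours


rates = {name: bases_per_hour(a, u, h) for name, a, u, h in RUNS}
sanger = rates["Sanger sequencing"]

# Print platforms from slowest to fastest, with the fold change over Sanger.
for name, rate in sorted(rates.items(), key=lambda item: item[1]):
    print(f"{name:32s} {rate:12.3e} bases/h  ({rate / sanger:,.0f}x Sanger)")
```

On these figures the Sanger rate is 28,000 bases per hour, while the Illumina entry corresponds to roughly 4.3 × 10^9 bases per hour, a difference of about five orders of magnitude.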

sequencing-by-synthesis, SOLiD sequencing by ligation and detection, single molecule sequencing (Helicos Biosciences and Pacific Biosciences), and Ion Torrent sequencing. Several modifications are available for each of these technologies, and protocols are continuously being developed to overcome some of the current limitations.

Despite the beneficial prospects of the various NGS platforms, many limitations must be addressed to maximize their potential. These limitations are related to bioinformatics data […] and the storage and integration of the large amount of data generated by these technologies. This constant challenge is also related to the advent of new techniques and protocols intended to make data treatment and analysis easier. Therefore, high-throughput bioinformatics equipment, computational investments, and human resources combined with several NGS technologies will permit the data generated from different sources to be used universally.

Genotyping-by-sequencing (GBS), a widely used SNP discovery method, is feasible for species with large genomes and species with high levels of nucleotide diversity due to advances in genomic technologies that have decreased the cost of high-throughput DNA sequencing. GBS is based on NGS technology and takes advantage of reducing genomic complexity to enable high-throughput genotyping of large numbers of samples at a large number of SNP marker loci (Glaubitz et al., 2014). Repetitive regions of the genome may be avoided and lower copy regions targeted with high efficiency by using restriction enzymes (REs). This approach greatly simplifies bioinformatic alignment issues in species that possess a high level of genetic diversity (Elshire et al., 2011). This robust, cost-effective, and straightforward method is widely used for genotyping and linkage mapping in many diverse plant species. Although the methods used to analyse raw GBS sequence data are constantly being improved, there are two main commonly used methods at present: genome-wide association studies (GWAS) (Cantor et al., 2010; Visscher et al., 2012) and TASSEL-GBS (Glaubitz et al., 2014). These methods are designed for the efficient processing of raw GBS sequence data

for SNP genotyping. GWAS and TASSEL-GBS procedures largely satisfy the following design criteria: (1) the ability to run on modest computing resources, including desktop or laptop machines without a lot of RAM, (2) scalability from small to large studies (large breeding programs), and (3) applicability in an accelerated breeding context. Wide adoption of these approaches across different methods will speed up the production of useful databases and datasets for effective plant breeding (Sikhakhane et al., 2016).

Bioinformatics for studying stress resistance in plants

The molecular regulatory networks involved in stress resistance and […] in plants can be decoded using a combination of omics studies (Unamba et al., 2015). NGS technologies and potent computational pipelines have reduced the cost of whole genome and transcriptomic sequencing, allowing their application to model and non-model plants. This generates a large number of […] resources, which allows us to increase our understanding of the molecular mechanisms underlying plant responses to stress conditions (Xu et al., 2013b).

Plants constantly adjust their transcriptome profile in response to different stresses (Debnath et al., 2011); thus, techniques such as microarrays or RNA-seq can provide a vast amount of information about differential gene expression. One example is the transcriptomic analysis of salt and extreme drought tolerance in Ipomoea imperati, a sweet potato relative, that was performed using 454 pyrosequencing by Solis and collaborators. This study identified several genes related to salt tolerance in this species. The information generated in this study could be useful in marker-assisted breeding to improve salt tolerance in sweet potato (Solis et al., 2016).

In another report, Bhardwaj and colleagues performed the first comprehensive transcriptome study of B. juncea (the oilseed crop Indian mustard) under conditions of high temperature and drought stress (Bhardwaj et al., 2015). They constructed three transcriptome libraries, sequenced them on the Illumina GA IIx platform, and assembled the high-quality reads obtained using the SOAPdenovo assembler. Bhardwaj et al. identified a subset of about 19,000 transcripts that are differentially regulated by high temperature and/or drought stress. Most of the up-regulated transcription factor genes belonged to the heat shock factor and dehydration-responsive element-binding gene families. This information will enhance the understanding of the molecular mechanisms behind plant responses to abiotic stresses, and will enable efficient strategies to improve tolerance to high temperature and drought stress.

Considerable progress has been made towards engineering transgenic plants that express abiotic stress tolerance. Analyses of the large quantity of abiotic stress-related transcriptomic data generated in several plant species have revealed the importance of osmoprotectants in stress resistance. These osmoprotectants help protect plants against the damaging effects of osmotic and ionic stress. For example, several drought- and salt-resistant transgenic potato plants developed using this strategy have been reported. The tubers were successfully engineered to express an osmotin-like protein (Evers et al., 1999), glyceraldehyde-3-phosphate dehydrogenase (Jeong et al., 2001), and nucleoside diphosphate kinase 2 (Tang et al., 2008).

The many efforts to improve abiotic stress tolerance in plants have resulted in important achievements (Khan et al., 2015). However, due to the complexity of stress tolerance, the success of these strategies has been limited because most of the resulting transgenic plants have been tested only under controlled laboratory conditions. In such conditions, transgenic plants are evaluated at the early growth stage and are exposed to the stress condition for a short period of time. Short exposure experiments at early growth stages performed in laboratories may not predict the response of the plant under true field conditions. In addition, transgenic plants may be exposed to multiple stresses in the field, which may suppress the protective effects of the introduced transgene.

Future investigations into improving plant stress tolerance should put more emphasis on combining different approaches, such as multigenic strategies to simultaneously incorporate more than one gene in transgenic plants. For example, the genes involved in osmoprotectant synthesis could be co-expressed with other stress resistance-related genes such as ion transporters and transcription factors (Khan et al., 2015).
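The stress transcriptome studies summarized in this section report transcripts as differentially regulated between conditions. A common first-pass measure for such comparisons is the log2 fold change between stress and control read counts. The sketch below illustrates that calculation only: the gene labels, counts, pseudocount of 1 and two-fold cutoff are all hypothetical, and it is not the method used by the cited authors. Published analyses also rely on replicates, normalization and statistical testing, none of which is shown here.

```python
from math import log2


def log2_fold_change(control, stress, pseudocount=1.0):
    """log2 ratio of stress to control counts; the pseudocount avoids
    division by zero for transcripts with no reads."""
    return log2((stress + pseudocount) / (control + pseudocount))


def classify(counts, cutoff=1.0):
    """Label each transcript 'up', 'down' or 'unchanged' using a
    symmetric log2 fold-change cutoff (1.0 means a two-fold change)."""
    calls = {}
    for gene, (control, stress) in counts.items():
        lfc = log2_fold_change(control, stress)
        if lfc >= cutoff:
            calls[gene] = "up"
        elif lfc <= -cutoff:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


# Hypothetical (control, stress) read counts, not real data.
counts = {
    "hsf_like": (15, 255),
    "dreb_like": (31, 127),
    "housekeeping": (100, 110),
    "repressed": (63, 7),
}
calls = classify(counts)
print(calls)
```

With these invented counts the two stress-responsive labels are called as up-regulated, the housekeeping label as unchanged, and the last one as down-regulated.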

Curr. Issues Mol. Biol. Vol. 27 98 | Gomez-Casati et al.

Bioinformatics approaches to study resistance to plant pathogens

Plants have evolved diverse strategies to counteract pathogen attacks that involve the modification of gene expression, activation of several metabolic pathways, and post-translational modification of proteins (Gomez-Casati et al., 2016).

The new DNA sequencing technologies, combined with sophisticated bioinformatics, are having a major influence on research in the field of plant pathology. These technologies are gaining importance in studies of the genomics, proteomics, metabolomics, and transcriptomics of both the host plant and the pathogen, and are also extremely useful in diagnostics, especially with respect to viruses (Studholme et al., 2011).

Whole genome sequences are available for numerous plants and a diverse group of microbial phytopathogens that includes bacteria, viruses, fungi, and oomycetes. Many of these microbial genome sequences were obtained using the traditional Sanger sequencing method. The first plant pathogen genome sequenced using second-generation sequencing technologies was P. syringae pathovar oryzae, a bacterium that causes halo blight on rice (Reinhardt et al., 2009; Studholme et al., 2011). Reinhardt and colleagues sequenced the P. syringae pv. oryzae genome by combining data generated from the Roche 454 and Illumina sequencing platforms and using a new bioinformatics pipeline based on their VCAKEv1.5 assembler (Jeck et al., 2007). Since that time, several other phytopathogen genomes have been sequenced (Harrison et al., 2016; Studholme et al., 2011; Wasukira et al., 2014). These draft genome sequences are expected to contain errors and be incomplete. However, they can provide valuable information about the molecular basis for infection of plant hosts, including the sequences of potential novel virulence factors, like T3SS effectors (Studholme et al., 2009).

For disease diagnostics, NGS allows the direct identification of bacteria or viruses in an unbiased way without requiring previous knowledge of the pathogen DNA/RNA sequence or antibodies (Quan et al., 2008). Finding a pathogen nucleic acid sequence within host tissue does not necessarily prove that the pathogen is causing the plant disease symptoms (Studholme et al., 2011). However, several published cases present concrete evidence for a relationship between the presence of disease and the pathogen found. For example, Adams and colleagues found that the presence of Carrot yellow leaf virus (CYLV) is strongly related to the occurrence of internal necrosis of carrots by using a metagenomic approach, which consisted of constructing an RNA-seq library and sequencing it on the Illumina MiSeq instrument (Adams et al., 2014).

At present, most plant RNA viruses identified by NGS have been obtained from either total nucleic acid or total double-stranded RNA extracted from infected plant tissue (Barba et al., 2014). In another strategy, host nucleic acid was eliminated by hybridization to nucleic acid isolated from healthy plants in order to enrich for virus sequences prior to sequencing. Plant viruses can also be identified indirectly by detecting specific RNA molecules, called short interfering RNAs (siRNAs), that the host plant generates in response to infection by RNA viruses (Mlotshwa et al., 2008). NGS of siRNAs is a good approach to identify viruses infecting plants, even when they are present at very low titres and in symptomless infections.

Microbial metagenomic studies can be performed on environmental samples or individual plants (Dutta et al., 2014; Roossinck et al., 2015). In general, most viruses identified in wild plants cannot be associated with disease, although some of them can be pathogenic when infecting domesticated plants. Thus, metagenomics data provide advance knowledge of viruses that could become threats if they jump from wild species into domestic hosts.

High-throughput transcriptomic technologies, such as microarrays or RNA-seq, make it possible to obtain gene expression profiles from the host plants and to detect gene expression changes in the host-associated pathogen. On the other hand, different proteomics approaches allow for the identification of many resistance proteins that plants express in the presence of specific pathogen effectors that trigger the plant's defence mechanisms. Using this methodology, several antimicrobial proteins expressed during phytopathogenic interactions have been identified, mainly by high-resolution two-dimensional polyacrylamide gel electrophoresis coupled with mass spectrometry (Gomez-Casati et al., 2016; Mehta et al., 2008).
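The enrichment strategy described in this section removes host nucleic acid in the laboratory, by hybridization, before sequencing. A computational analogue, which is a different technique and not the method of the cited studies, is to discard sequencing reads that resemble a host reference after sequencing. The sketch below does this with a simple shared k-mer test. The function names, the k-mer length, the cut-off and the toy sequences are all hypothetical; production workflows typically align reads to a full host genome with a dedicated read mapper instead.

```python
def kmers(sequence, k):
    """Return the set of all substrings of length k in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def subtract_host_reads(reads, host_sequences, k=8, max_shared_fraction=0.5):
    """Keep reads that share few k-mers with the host reference.

    Reads shorter than k carry no k-mers and are dropped, since they
    cannot be assessed by this test.
    """
    host_index = set()
    for host in host_sequences:
        host_index |= kmers(host.upper(), k)

    kept = []
    for read in reads:
        read_kmers = kmers(read.upper(), k)
        if not read_kmers:
            continue
        shared = len(read_kmers & host_index) / len(read_kmers)
        if shared <= max_shared_fraction:
            kept.append(read)
    return kept


# Toy data: one host-derived read, one unrelated read, one read that is too short.
host = "ATGGCGTACGTTAGCCTAGGATCCGATTACG"
reads = [
    host[3:19],                     # taken directly from the host sequence
    "TTTTCCCCAAAAGGGGTTTTCCCC",     # shares no 8-mer with the host
    "ACGT",                         # shorter than k
]

candidate_reads = subtract_host_reads(reads, [host])
print(candidate_reads)  # ['TTTTCCCCAAAAGGGGTTTTCCCC']
```

The reads that survive this filter are only candidates for non-host origin; assigning them to a particular virus or bacterium would still require comparison against reference databases.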

All of the omics technologies discussed here can provide a wealth of information about the pathogen life cycle, and enable the discovery of novel virulence factors in pathogens as well as their host targets (Gomez-Casati et al., 2016). Both approaches are important for developing strategies to improve disease resistance in plants.

Traditionally, the control of pathogen invasion and spread has been achieved through breeding programmes that focus on increasing host resistance to particular pathogens, or by the application of chemical pesticides and fungicides. A relatively straightforward approach to generate pathogen-resistant plants is through introduction of genes from other organisms to develop new cellular metabolic pathways.

Several genes involved in specific immune responses to different phytopathogens were identified using RNA-seq technology; these include genes that encode a tyrosine protein kinase (Epk1) (Pombo et al., 2014), a cell wall-associated kinase (SlWAK1) (Rosli et al., 2013), and a nucleotide binding leucine-rich repeat receptor (NB-LRR) (Bernoux et al., 2011).

Whereas some attempts to engineer disease resistance in economically important crop plants have failed, several have been successful. Cao and colleagues reported transgenic maize lines that constitutively express a mutant E. coli dsRNA-specific endoribonuclease gene and show increased resistance to the dsRNA virus that causes maize rough dwarf disease (Cao et al., 2013). Mysore and collaborators showed that the overexpression of a serine/threonine kinase gene (Pto) in tomato confers protection against P. syringae (Mysore et al., 2003). In addition, the overexpression of NPR1, a gene involved in systemic acquired resistance in plants, resulted in increased disease resistance in apple, Malus domestica, to two important fungal pathogens, Venturia inaequalis and Gymnosporangium juniperi-virginianae (Gomez-Casati et al., 2016; Malnoy et al., 2007).

On the other hand, the most common way of obtaining virus-resistant plants is by expressing in them a viral gene encoding the coat protein. The plant thus produces this viral protein before the virus attacks, and when the virus tries to infect the plant, the host gene silencing pathway is already activated and degrades the virus RNA early in the infection process. All genetically modified virus-resistant plants existing on the market (e.g. squash and papayas) have this resistance mechanism (Ratcliff et al., 1999; Voinnet, 2001).

Conclusions

The application of bioinformatics to the study and characterization of biological phenomena represents a fundamental shift in the way that scientists study living organisms. As mentioned above, the many genomic, transcriptomic, proteomic, and metabolite databases available have the potential to accelerate the rate of functional discoveries in plant biology. Although these databases provide access to gene information, gene expression data, and metabolite profiles, as well as the experimental conditions used to generate them, additional measures are needed to complement genome annotation efforts effectively and to increase discovery of gene function and regulation. One possible course of action would be the generation of reference expression datasets for plant cells and/or tissues from specific stages of plant development. In addition, it would be highly desirable to develop databases that contain increasingly integrated genomic, transcriptomic, metabolomic, proteomic and phenomic data for each plant species.

Acknowledgements

This work was supported by grants from ANPCyT (PICT 2013-2188 and 2014-2184).

References

Adams, I.P., Skelton, A., Macarthur, R., Hodges, T., Hinds, H., Flint, L., Nath, P.D., Boonham, N., and Fox, A. (2014). Carrot yellow leaf virus is associated with carrot internal necrosis. PLOS ONE 9, e109125. http://dx.doi.org/10.1371/journal.pone.0109125
Agarwal, P., Parida, S.K., Mahto, A., Das, S., Mathew, I.E., Malik, N., and Tyagi, A.K. (2014). Expanding frontiers in plant transcriptomics in aid of functional genomics and molecular breeding. Biotechnol. J. 9, 1480–1492. http://dx.doi.org/10.1002/biot.201400063
Aharoni, A., De Vos, C.H., Wein, M., Sun, Z., Greco, R., Kroon, A., Mol, J.N., and O’Connell, A.P. (2001). The strawberry FaMYB1 transcription factor suppresses anthocyanin and flavonol accumulation in transgenic tobacco. Plant J. 28, 319–332. http://dx.doi.org/10.1046/j.1365-313X.2001.01154.x
Asif, M.H., Lakhwani, D., Pathak, S., Gupta, P., Bag, S.K., Nath, P., and Trivedi, P.K. (2014). Transcriptome analysis of ripe and unripe fruit tissue of banana identifies major metabolic networks involved in fruit
ripening process. BMC Plant Biol. 14, 316. http://dx.doi.org/10.1186/s12870-014-0316-1
Azim, M.K., Khan, I.A., and Zhang, Y. (2014). Characterization of mango (Mangifera indica L.) transcriptome and chloroplast genome. Plant Mol. Biol. 85, 193–208. http://dx.doi.org/10.1007/s11103-014-0179-8
Barba, M., Czosnek, H., and Hadidi, A. (2014). Historical perspective, development and applications of next-generation sequencing in plant virology. Viruses 6, 106–136. http://dx.doi.org/10.1016/j.pbi.2011.05.005
Bernoux, M., Ellis, J.G., and Dodds, P.N. (2011). New insights in plant immunity signaling activation. Curr. Opin. Plant Biol. 14, 512–518. http://dx.doi.org/10.1016/j.pbi.2011.05.005
Bhardwaj, A.R., Joshi, G., Kukreja, B., Malik, V., Arora, P., Pandey, R., Shukla, R.N., Bankar, K.G., Katiyar-Agarwal, S., Goel, S., et al. (2015). Global insights into high temperature and drought stress regulated genes by RNA-Seq in economically important oilseed crop Brassica juncea. BMC Plant Biol. 15, 9. http://dx.doi.org/10.1186/s12870-014-0405-1
Brown, P.O., and Botstein, D. (1999). Exploring the new world of the genome with DNA microarrays. Nat. Genet. 21 (Suppl. 1), 33–37. http://dx.doi.org/10.1038/4462
Cantor, R.M., Lange, K., and Sinsheimer, J.S. (2010). Prioritizing GWAS results: A review of statistical methods and recommendations for their application. Am. J. Hum. Genet. 86, 6–22. http://dx.doi.org/10.1016/j.ajhg.2009.11.017
Cao, X., Lu, Y., Di, D., Zhang, Z., Liu, H., Tian, L., Zhang, A., Zhang, Y., Shi, L., Guo, B., et al. (2013). Enhanced virus resistance in transgenic maize expressing a dsRNA-specific endoribonuclease gene from E. coli. PLOS ONE 8, e60829. http://dx.doi.org/10.1371/journal.pone.0060829
D’Hont, A., Denoeud, F., Aury, J.M., Baurens, F.C., Carreel, F., Garsmeur, O., Noel, B., Bocs, S., Droc, G., Rouard, M., et al. (2012). The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature 488, 213–217. http://dx.doi.org/10.1038/nature11241
Dai, Z., Wu, H., Baldazzi, V., van Leeuwen, C., Bertin, N., Gautier, H., Wu, B., Duchêne, E., Gomès, E., Delrot, S., et al. (2016). Inter-species comparative analysis of components of soluble sugar concentration in fleshy fruits. Front. Plant Sci. 7, 649. http://dx.doi.org/10.3389/fpls.2016.00649
Debnath, M., Pandey, M., and Bisen, P.S. (2011). An omics approach to understand the plant abiotic stress. OMICS 15, 739–762. http://dx.doi.org/10.1089/omi.2010.0146
Denoeud, F., Carretero-Paulet, L., Dereeper, A., Droc, G., Guyot, R., Pietrella, M., Zheng, C., Alberti, A., Anthony, F., Aprea, G., et al. (2014). The coffee genome provides insight into the convergent evolution of caffeine biosynthesis. Science 345, 1181–1184. http://dx.doi.org/10.1126/science.1255274
Dutta, M., Sokhandan Bashir, N., Palmer, M.W., and Melcher, U. (2014). Genomic characterization of Ambrosia asymptomatic virus 1 and evidence of other Tymovirales members in the Oklahoma tallgrass prairie revealed by sequence analysis. Arch. Virol. 159, 1755–1764. http://dx.doi.org/10.1016/j.tibtech.2004.03.002
Edwards, D., and Batley, J. (2004). Plant bioinformatics: from genome to phenome. Trends Biotechnol. 22, 232–237. http://dx.doi.org/10.1016/j.tibtech.2004.03.002
Elshire, R.J., Glaubitz, J.C., Sun, Q., Poland, J.A., Kawamoto, K., Buckler, E.S., and Mitchell, S.E. (2011). A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLOS ONE 6, e19379. http://dx.doi.org/10.1371/journal.pone.0019379
Evers, D., Overney, S., Simon, P., Greppin, H., and Hausman, J.F. (1999). Salt tolerance of Solanum tuberosum L. overexpressing an heterologous osmotin-like protein. Biol. Plant. 42, 105–112. http://dx.doi.org/10.1023/a:1002131812340
Feuillet, C., Leach, J.E., Rogers, J., Schnable, P.S., and Eversole, K. (2011). Crop genome sequencing: lessons and rationales. Trends Plant Sci. 16, 77–88. http://dx.doi.org/10.1016/j.tplants.2010.10.005
Findeisen, P., and Neumaier, M. (2009). Mass spectrometry-based clinical proteomics profiling: current status and future directions. Expert Rev. Proteomics 6, 457–459. http://dx.doi.org/10.1586/epr.09.67
Gallusci, P., Hodgman, C., Teyssier, E., and Seymour, G.B. (2016). DNA Methylation and Chromatin Regulation during Fleshy Fruit Development and Ripening. Front. Plant Sci. 7, 807. http://dx.doi.org/10.3389/fpls.2016.00807
Gapper, N.E., Giovannoni, J.J., and Watkins, C.B. (2014). Understanding development and ripening of fruit crops in an ‘omics’ era. Hortic. Res. 1, 14034. http://dx.doi.org/10.1038/hortres.2014.34
Garcia-Mas, J., Benjak, A., Sanseverino, W., Bourgeois, M., Mir, G., González, V.M., Hénaff, E., Câmara, F., Cozzuto, L., Lowy, E., et al. (2012). The genome of melon (Cucumis melo L.). Proc. Natl. Acad. Sci. U.S.A. 109, 11872–11877. http://dx.doi.org/10.1073/pnas.1205415109
Glaubitz, J.C., Casstevens, T.M., Lu, F., Harriman, J., Elshire, R.J., Sun, Q., and Buckler, E.S. (2014). TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline. PLOS ONE 9, e90346. http://dx.doi.org/10.1371/journal.pone.0090346
Gomez-Casati, D.F., Pagani, M.A., Busi, M.V., and Bhadauria, V. (2016). Omics approaches for the engineering of pathogen resistant plants. Curr. Issues Mol. Biol. 19, 89–98.
Gomez-Casati, D.F., Zanor, M.I., and Busi, M.V. (2013). Metabolomics in plants and humans: applications in the prevention and diagnosis of diseases. Biomed Res. Int. 2013, 792527. http://dx.doi.org/10.1155/2013/792527
Guo, S., Zhang, J., Sun, H., Salse, J., Lucas, W.J., Zhang, H., Zheng, Y., Mao, L., Ren, Y., Wang, Z., et al. (2013). The draft genome of watermelon (Citrullus lanatus) and resequencing of 20 diverse accessions. Nat. Genet. 45, 51–58. http://dx.doi.org/10.1038/ng.2470
Gupta, V., Estrada, A.D., Blakley, I., Reid, R., Patel, K., Meyer, M.D., Andersen, S.U., Brown, A.F., Lila, M.A., and Loraine, A.E. (2015). RNA-Seq analysis and annotation of a draft blueberry genome assembly identifies candidate genes involved in fruit ripening, biosynthesis of bioactive compounds, and stage-specific
alternative splicing. Gigascience 4, 5. http://dx.doi.org/10.1186/s13742-015-0046-9
Hamilton, J.P., and Buell, C.R. (2012). Advances in plant genome sequencing. Plant J. 70, 177–190. http://dx.doi.org/10.1111/j.1365-313X.2012.04894.x
Hampson, C.R., Coleman, G.D., and Azarenko, A.N. (1996). Does the genome of Corylus avellana L. contain sequences homologous to the self-incompatibility gene of Brassica? Theor. Appl. Genet. 93, 759–764. http://dx.doi.org/10.1007/BF00224073
Harrison, J., Grant, M.R., and Studholme, D.J. (2016). Draft Genome Sequences of Two Strains of Xanthomonas arboricola pv. celebensis Isolated from Banana Plants. Genome Announc. 4, e01705–15. http://dx.doi.org/10.1128/genomeA.01705-15
He, N., Zhang, C., Qi, X., Zhao, S., Tao, Y., Yang, G., Lee, T.H., Wang, X., Cai, Q., Li, D., et al. (2013). Draft genome sequence of the mulberry tree Morus notabilis. Nat. Commun. 4, 2445. http://dx.doi.org/10.1038/ncomms3445
Hirakawa, H., Shirasawa, K., Miyatake, K., Nunome, T., Negoro, S., Ohyama, A., Yamaguchi, H., Sato, S., Isobe, S., Tabata, S., et al. (2014). Draft genome sequence of eggplant (Solanum melongena L.): the representative solanum species indigenous to the old world. DNA Res. 21, 649–660. http://dx.doi.org/10.1093/dnares/dsu027
Huang, S., Ding, J., Deng, D., Tang, W., Sun, H., Liu, D., Zhang, L., Niu, X., Zhang, X., Meng, M., et al. (2013). Draft genome of the kiwifruit Actinidia chinensis. Nat. Commun. 4, 2640. http://dx.doi.org/10.1038/ncomms3640
Huang, S., Li, R., Zhang, Z., Li, L., Gu, X., Fan, W., Lucas, W.J., Wang, X., Xie, B., Ni, P., et al. (2009). The genome of the cucumber, Cucumis sativus L. Nat. Genet. 41, 1275–1281. http://dx.doi.org/10.1038/ng.475
Jaillon, O., Aury, J.M., Noel, B., Policriti, A., Clepet, C., Casagrande, A., Choisne, N., Aubourg, S., Vitulo, N., Jubin, C., et al. (2007). The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449, 463–467.
Jeck, W.R., Reinhardt, J.A., Baltrus, D.A., Hickenbotham, M.T., Magrini, V., Mardis, E.R., Dangl, J.L., and Jones, C.D. (2007). Extending assembly of short DNA sequences to handle error. Bioinformatics 23, 2942–2944.
Jeong, M.J., Park, S.C., and Byun, M.O. (2001). Improvement of salt tolerance in transgenic potato plants by glyceraldehyde-3 phosphate dehydrogenase gene transfer. Mol. Cells 12, 185–189.
Khan, M.S., Ahmad, D., and Khan, M.A. (2015). Utilization of genes encoding osmoprotectants in transgenic plants for enhanced abiotic stress tolerance. Electronic J. Biotechnol. 18, 257–266. http://dx.doi.org/10.1016/j.ejbt.2015.04.002
Li, J.M., Huang, X.S., Li, L.T., Zheng, D.M., Xue, C., Zhang, S.L., and Wu, J. (2015). Proteome analysis of pear reveals key genes associated with fruit development and quality. Planta 241, 1363–1379. http://dx.doi.org/10.1007/s00425-015-2263-y
Liu, M.J., Zhao, J., Cai, Q.L., Liu, G.C., Wang, J.R., Zhao, Z.H., Liu, P., Dai, L., Yan, G., Wang, W.J., et al. (2014). The complex jujube genome provides insights into fruit tree biology. Nat. Commun. 5, 5315. http://dx.doi.org/10.1038/ncomms6315
Liu, R., How-Kit, A., Stammiti, L., Teyssier, E., Rolin, D., Mortain-Bertrand, A., Halle, S., Liu, M., Kong, J., Wu, C., et al. (2015). A DEMETER-like DNA demethylase governs tomato fruit ripening. Proc. Natl. Acad. Sci. U.S.A. 112, 10804–10809. http://dx.doi.org/10.1038/ncomms6315
Lombardo, V.A., Osorio, S., Borsani, J., Lauxmann, M.A., Bustamante, C.A., Budde, C.O., Andreo, C.S., Lara, M.V., Fernie, A.R., and Drincovich, M.F. (2011). Metabolic profiling during peach fruit development and ripening reveals the metabolic networks that underpin each developmental stage. Plant Physiol. 157, 1696–1710. http://dx.doi.org/10.1104/pp.111.186064
Malnoy, M., Jin, Q., Borejsza-Wysocka, E.E., He, S.Y., and Aldwinckle, H.S. (2007). Overexpression of the apple MpNPR1 gene confers increased disease resistance in Malus x domestica. Mol. Plant Microbe Interact. 20, 1568–1580. http://dx.doi.org/10.1094/MPMI-20-12-1568
Mehta, A., Brasileiro, A.C., Souza, D.S., Romano, E., Campos, M.A., Grossi-de-Sá, M.F., Silva, M.S., Franco, O.L., Fragoso, R.R., Bevitori, R., et al. (2008). Plant-pathogen interactions: what is proteomics telling us? FEBS J. 275, 3731–3746. http://dx.doi.org/10.1111/j.1742-4658.2008.06528.x
Ming, R., Hou, S., Feng, Y., Yu, Q., Dionne-Laporte, A., Saw, J.H., Senin, P., Wang, W., Ly, B.V., Lewis, K.L., et al. (2008). The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus). Nature 452, 991–996. http://dx.doi.org/10.1038/nature06856
Mlotshwa, S., Pruss, G.J., and Vance, V. (2008). Small RNAs in viral infection and host defense. Trends Plant Sci. 13, 375–382. http://dx.doi.org/10.1016/j.tplants.2008.04.009
Moing, A., Aharoni, A., Biais, B., Rogachev, I., Meir, S., Brodsky, L., Allwood, J.W., Erban, A., Dunn, W.B., Kay, L., et al. (2011). Extensive metabolic cross-talk in melon fruit revealed by spatial and developmental combinatorial metabolomics. New Phytol. 190, 683–696. http://dx.doi.org/10.1111/j.1469-8137.2010.03626.x
Morin, R., Bainbridge, M., Fejes, A., Hirst, M., Krzywinski, M., Pugh, T., McDonald, H., Varhol, R., Jones, S., and Marra, M. (2008). Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing. BioTechniques 45, 81–94. http://dx.doi.org/10.2144/000112900
Motamayor, J.C., Mockaitis, K., Schmutz, J., Haiminen, N., Livingstone, D., Cornejo, O., Findley, S.D., Zheng, P., Utro, F., Royaert, S., et al. (2013). The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color. Genome Biol. 14, r53. http://dx.doi.org/10.1186/gb-2013-14-6-r53
Mysore, K.S., D’Ascenzo, M.D., He, X., and Martin, G.B. (2003). Overexpression of the disease resistance gene Pto in tomato induces gene expression changes similar to immune responses in human and fruitfly. Plant Physiol. 132, 1901–1912. http://dx.doi.org/10.1104/pp.103.022731
Nguyen, C.V., Vrebalov, J.T., Gapper, N.E., Zheng, Y., Zhong, S., Fei, Z., and Giovannoni, J.J. (2014). Tomato
GOLDEN2-LIKE transcription factors reveal molecular gradients that function during fruit development and ripening. Plant Cell 26, 585–601. http://dx.doi.org/10.1105/tpc.113.118794
Nock, C.J., Baten, A., and King, G.J. (2014). Complete chloroplast genome of Macadamia integrifolia confirms the position of the Gondwanan early-diverging eudicot family Proteaceae. BMC Genomics 15 Suppl 9, S13. http://dx.doi.org/10.1186/1471-2164-15-S9-S13
Oms-Oliu, G., Hertog, M.L.A.T.M., Van de Poel, B., Ampofo-Asiama, J., Geeraerd, A.H., and Nicolaï, B.M. (2011). Metabolic characterization of tomato fruit during preharvest development, ripening, and postharvest shelf-life. Postharv. Biol. Technol. 62, 7–16. http://dx.doi.org/10.1016/j.postharvbio.2011.04.010
Osorio, S., Alba, R., Nikoloski, Z., Kochevenko, A., Fernie, A.R., and Giovannoni, J.J. (2012). Integrative comparative analyses of transcript and metabolite profiles from pepper and tomato ripening and development stages uncovers species-specific patterns of network regulatory behavior. Plant Physiol. 159, 1713–1729. http://dx.doi.org/10.1104/pp.112.199711
Pedreschi, R., Hertog, M., Robben, J., Lilley, K.S., Karp, N.A., Baggerman, G., Vanderleyden, J., and Nicolaï, B. (2009). Gel-based proteomics approach to the study of metabolic changes in pear tissue during storage. J. Agric. Food Chem. 57, 6997–7004. http://dx.doi.org/10.1021/jf901432h
Pedreschi, R., Muñoz, P., Robledo, P., Becerra, C., Defilippi, B.G., van Eekelen, H., Mumm, R., Westra, E., and de Vos, R.C.H. (2014). Metabolomics analysis of postharvest ripening heterogeneity of ‘Hass’ avocadoes. Postharv. Biol. Technol. 92, 172–179. http://dx.doi.org/10.1021/jf901432h
Pombo, M.A., Zheng, Y., Fernandez-Pozo, N., Dunham, D.M., Fei, Z., and Martin, G.B. (2014). Transcriptomic analysis reveals tomato genes whose expression is induced specifically during effector-triggered immunity and identifies the Epk1 protein kinase which is required for the host response to three bacterial effector proteins. Genome Biol. 15, 492. http://dx.doi.org/10.1186/s13059-014-0492-1
Powell, A.L., Nguyen, C.V., Hill, T., Cheng, K.L., Figueroa-Balderas, R., Aktas, H., Ashraf, H., Pons, C., Fernandez-Munoz, R., Vicente, A., et al. (2012). Uniform ripening encodes a Golden 2-like transcription factor regulating tomato fruit chloroplast development. Science 336, 1711–1715. http://dx.doi.org/10.1126/science.1222218
Qin, C., Yu, C., Shen, Y., Fang, X., Chen, L., Min, J., Cheng, J., Zhao, S., Xu, M., Luo, Y., et al. (2014). Whole-genome sequencing of cultivated and wild peppers provides insights into Capsicum domestication and specialization. Proc. Natl. Acad. Sci. U.S.A. 111, 5135–5140. http://dx.doi.org/10.1073/pnas.1400975111
Quan, P.L., Briese, T., Palacios, G., and Lipkin, W.I. (2008). Rapid sequence-based diagnosis of viral infection. Antiviral Res. 79, 1–5. http://dx.doi.org/10.1016/j.antiviral.2008.02.002
Ratcliff, F.G., MacFarlane, S.A., and Baulcombe, D.C. (1999). Gene silencing without DNA: RNA-mediated cross-protection between viruses. Plant Cell 11, 1207–1216.
Ray, S., and Satya, P. (2014). Next generation sequencing technologies for next generation plant breeding. Front. Plant Sci. 5, 367. http://dx.doi.org/10.3389/fpls.2014.00367
Reinhardt, J.A., Baltrus, D.A., Nishimura, M.T., Jeck, W.R., Jones, C.D., and Dangl, J.L. (2009). De novo assembly using low-coverage short read sequence data from the rice pathogen Pseudomonas syringae pv. oryzae. Genome Res. 19, 294–305. http://dx.doi.org/10.1101/gr.083311.108
Rhee, S.Y., Dickerson, J., and Xu, D. (2006). Bioinformatics and its applications in plant biology. Annu. Rev. Plant Biol. 57, 335–360. http://dx.doi.org/10.1146/annurev.arplant.56.032604.144103
Roossinck, M.J., Martin, D.P., and Roumagnac, P. (2015). Plant virus metagenomics: advances in virus discovery. Phytopathology 105, 716–727. http://dx.doi.org/10.1094/PHYTO-12-14-0356-RVW
Rosli, H.G., Zheng, Y., Pombo, M.A., Zhong, S., Bombarely, A., Fei, Z., Collmer, A., and Martin, G.B. (2013). Transcriptomics-based screen for genes induced by flagellin and repressed by pathogen effectors identifies a cell wall-associated kinase involved in plant immunity. Genome Biol. 14, R139. http://dx.doi.org/10.1186/gb-2013-14-12-r139
Rudell, D.R., Matheis, J.P., and Hertog, M.L. (2009). Metabolomic change precedes apple superficial scald symptoms. J. Agric. Food Chem. 57, 8459–8466. http://dx.doi.org/10.1021/jf901571g
Sanger, F., Nicklen, S., and Coulson, A.R. (1977). DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. U.S.A. 74, 5463–5467.
Seymour, G.B., Østergaard, L., Chapman, N.H., Knapp, S., and Martin, C. (2013). Fruit development and ripening. Annu. Rev. Plant Biol. 64, 219–241. http://dx.doi.org/10.1146/annurev-arplant-050312-120057
Shulaev, V., Sargent, D.J., Crowhurst, R.N., Mockler, T.C., Folkerts, O., Delcher, A.L., Jaiswal, P., Mockaitis, K., Liston, A., Mane, S.P., et al. (2011). The genome of woodland strawberry (Fragaria vesca). Nat. Genet. 43, 109–116. http://dx.doi.org/10.1038/ng.740
Solis, J., Baisakh, N., Brandt, S.R., Villordon, A., and La Bonte, D. (2016). Transcriptome profiling of beach morning glory (Ipomoea imperati) under salinity and its comparative analysis with sweetpotato. PLOS ONE 11, e0147398. http://dx.doi.org/10.1371/journal.pone.0147398
Stapley, J., Reger, J., Feulner, P.G., Smadja, C., Galindo, J., Ekblom, R., Bennison, C., Ball, A.D., Beckerman, A.P., and Slate, J. (2010). Adaptation genomics: the next generation. Trends Ecol. Evol. 25, 705–712. http://dx.doi.org/10.1016/j.tree.2010.09.002
Studholme, D.J., Glover, R.H., and Boonham, N. (2011). Application of high-throughput DNA sequencing in phytopathology. Annu. Rev. Phytopathol. 49, 87–105. http://dx.doi.org/10.1146/annurev-phyto-072910-095408
Studholme, D.J., Ibanez, S.G., MacLean, D., Dangl, J.L., Chang, J.H., and Rathjen, J.P. (2009). A draft genome sequence and functional screen reveals the repertoire of type III secreted proteins of Pseudomonas syringae pathovar tabaci 11528. BMC Genomics 10, 395. http://dx.doi.org/10.1146/annurev-phyto-072910-095408
Tang, L., Kim, M.D., Yang, K.S., Kwon, S.Y., Kim, S.H., Kim, J.S., Yun, D.J., Kwak, S.S., and Lee, H.S. (2008). Enhanced tolerance of transgenic potato plants overexpressing nucleoside diphosphate kinase 2 against multiple environmental stresses. Transgenic Res. 17, 705–715. http://dx.doi.org/10.1007/s11248-007-9155-2
Sikhakhane, T.N., Figlan, S., Mwadzingeni, L., Ortiz, R., and Tsilo, T.J. (2016). Integration of Next-generation Sequencing Technologies with Comparative Genomics in Cereals. In Agricultural and Biological Sciences ‘Plant Genomics’, InTech, ed. http://dx.doi.org/10.5772/61763
The Arabidopsis Genome Initiative (2000). Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815. http://dx.doi.org/10.1038/35048692
Tomato Genome Consortium. (2012). The tomato genome sequence provides insights into fleshy fruit evolution. Nature 485, 635–641. http://dx.doi.org/10.1038/nature11119
Unamba, C.I., Nag, A., and Sharma, R.K. (2015). Next generation sequencing technologies: the doorway to the unexplored genomics of non-model plants. Front. Plant Sci. 6, 1074. http://dx.doi.org/10.3389/fpls.2015.01074
VanBuren, R., Bryant, D., Bushakra, J.M., Vining, K.J., Edger, P.P., Rowley, E.R., Priest, H.D., Michael, T.P., Lyons, E., Filichkin, S.A., et al. (2016). The genome of black raspberry (Rubus occidentalis). Plant J. 87, 535–547. http://dx.doi.org/10.1111/tpj.13215
Velasco, R., Zharkikh, A., Affourtit, J., Dhingra, A., Cestaro, A., Kalyanaraman, A., Fontana, P., Bhatnagar, S.K., Troggio, M., Pruss, D., et al. (2010). The genome of the domesticated apple (Malus × domestica Borkh.). Nat. Genet. 42, 833–839. http://dx.doi.org/10.1038/ng.654
Verde, I., Abbott, A.G., Scalabrin, S., Jung, S., Shu, S., Marroni, F., Zhebentyayeva, T., Dettori, M.T., Grimwood, J., Cattonaro, F., et al. (2013). The high-quality draft genome of peach (Prunus persica) identifies unique patterns of genetic diversity, domestication and genome evolution. Nat. Genet. 45, 487–494. http://dx.doi.org/10.1038/ng.2586
Visscher, P.M., Brown, M.A., McCarthy, M.I., and Yang, J. (2012). Five years of GWAS discovery. Am. J. Hum. Genet. 90, 7–24. http://dx.doi.org/10.1016/j.ajhg.2011.11.029
Voinnet, O. (2001). RNA silencing as a plant immune system against viruses. Trends Genet. 17, 449–459.
Wang, Y., Zhou, L., Li, D., Dai, L., Lawton-Rauh, A., Srimani, P.K., Duan, Y., and Luo, F. (2015). Genome-wide comparative analysis reveals similar types of NBS genes in hybrid Citrus sinensis genome and original Citrus clementine genome and provides new insights into non-TIR NBS genes. PLOS ONE 10, e0121893. http://dx.doi.org/10.1371/journal.pone.0121893
Wasukira, A., Coulter, M., Al-Sowayeh, N., Thwaites, R., Paszkiewicz, K., Kubiriba, J., Smith, J., Grant, M., and Studholme, D.J. (2014). Genome sequencing of Xanthomonas vasicola Pathovar vasculorum reveals variation in plasmids and genes encoding lipopolysaccharide synthesis, Type-IV Pilus and Type-III effectors. Pathogens 3, 211–237. http://dx.doi.org/10.3390/pathogens3010211
Weber, A.P. (2015). Discovering new biology through sequencing of RNA. Plant Physiol. 169, 1524–1531. http://dx.doi.org/10.1104/pp.15.01081
Wu, J., Wang, Z., Shi, Z., Zhang, S., Ming, R., Zhu, S., Khan, M.A., Tao, S., Korban, S.S., Wang, H., et al. (2013). The genome of the pear (Pyrus bretschneideri Rehd.). Genome Res. 23, 396–408. http://dx.doi.org/10.1101/gr.144311.112
Xu, Q., Chen, L.L., Ruan, X., Chen, D., Zhu, A., Chen, C., Bertrand, D., Jiao, W.B., Hao, B.H., Lyon, M.P., et al. (2013a). The draft genome of sweet orange (Citrus sinensis). Nat. Genet. 45, 59–66. http://dx.doi.org/10.1038/ng.2472
Xu, Y., Gao, S., Yang, Y., Huang, M., Cheng, L., Wei, Q., Fei, Z., Gao, J., and Hong, B. (2013b). Transcriptome sequencing and whole genome expression profiling of chrysanthemum under dehydration stress. BMC Genomics 14, 662. http://dx.doi.org/10.1186/1471-2164-14-662
Zhang, J., Liu, J., and Ming, R. (2014). Genomic analyses of the CAM plant pineapple. J. Exp. Bot. 65, 3395–3404. http://dx.doi.org/10.1093/jxb/eru101
Zhang, Q., Chen, W., Sun, L., Zhao, F., Huang, B., Yang, W., Tao, Y., Wang, J., Yuan, Z., Fan, G., et al. (2012). The genome of Prunus mume. Nat. Commun. 3, 1318. http://dx.doi.org/10.1038/ncomms2290
Zhu, B., Yang, Y., Li, R., Fu, D., Wen, L., Luo, Y., and Zhu, H. (2015). RNA sequencing and functional analysis implicate the regulatory role of long non-coding RNAs in tomato fruit ripening. J. Exp. Bot. 66, 4483–4495. http://dx.doi.org/10.1093/jxb/erv203