
Central International Journal of Plant Biology & Research

Review Article

Unraveling the Impact of Bioinformatics and Omics in Agriculture

Rahul Agarwal1 and Jitendra Narayan2*
1Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences (NMBU), Germany
2Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, UK

*Corresponding author: Jitendra Narayan, Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, UK, Tel: +91-7 835 999 528 / +44 0 742 477 477 0

Submitted: 24 January 2015; Accepted: 20 April 2015; Published: 24 April 2015
ISSN: 2333-6668
Copyright © 2015 Narayan et al.
OPEN ACCESS

Keywords: Agriculture; Crops; Bioinformatics; Omics; Sequencing; Breeding

Abstract

The existence of diverse communities of plant species, including crops, is crucial for maintaining the ecological balance between humankind and the environment. Their existence also ensures a continuous food supply for humans and animals. It is therefore critical to use modern biotechnological techniques in breeding schemes to increase the productivity of economically important crops and plants, so that we can continue to feed the billions of organisms on this planet. Bioinformatics, genomics and sequencing have progressed tremendously in recent years, and with them the potential to improve economic and agronomic traits in various plants. Agricultural bioinformatics uses the latest genomic advances, together with bioinformatics tools and databases, to give the scientific community the genetic knowledge needed to produce drought-, disease- and insect-resistant crops and other plant species. With current sequencing technologies, it is possible to sequence thousands of plant species together and then assemble each of them individually using de novo genome assemblers. The availability of genomic information makes it possible to trace candidate genes and mutations associated with a particular complex trait. Methylation and microRNA data make it possible to uncover the epigenetic regulation of candidate genes. This article reviews how agriculture-related research has been restructured under the influence of the tremendous growth in bioinformatics and omics technologies.

Cite this article: Agarwal R, Narayan J (2015) Unraveling the Impact of Bioinformatics and Omics in Agriculture. Int J Plant Biol Res 3(2): 1039.

INTRODUCTION

Agriculture is not only an important occupation but also a way of life, culture and custom, and two-thirds of the working population depend on it for their livelihood [1-3]. Rice, wheat, barley, corn, sorghum, millet, sugar cane: ever since the Neolithic revolution, cereals have been considered staple foods in human populations across different continents [4,5]. For thousands of years, humans have used breeding and selection to create domestic varieties of these crops with desired traits [5-8]. Considerable progress has been accomplished in taste, nutritional value and productivity, notably during the "Green Revolution", which took place between 1960 and 1970 [9,10]. However, the Green Revolution is also known for its failures, and we can no longer survive on a few "high yield" varieties [11,12]. We now need more advanced biotechnology methods in agronomy in order to provide nutritious food to a continuously increasing world population under three major limitations: less arable land, depletion of energy resources and unpredictable climate change. In other words, we need to increase the pace of research so that we can secure enough food for future generations.

The last decade has witnessed the dawn of a new era of bioinformatics and computational biology, which has increased the pace of scientific discovery in the life sciences [13,14]. The involvement of computer science in plant biology has changed the way plant research was done in previous decades. The rapid, ground-breaking evolution of sequencing technology over the past few years has made it so cost-effective that it is now usual for any experimental lab to employ sequencing methods to study a genome of interest [15,16]. Bringing modern biotechnology advances into agriculture will surely reap huge dividends for the bioenergy sector, agro-based industries, the utilization of agricultural by-products, plant improvement and better management of the environment [17,18]. The latest genome and transcriptome sequencing makes it possible to reveal the genetic architecture of numerous plant species, the differences among thousands of individuals within and between populations, and the genes and mutations essential for improving specific desired complex traits [19-22].

Recent technological advances in omics-related disciplines have led to an exponential growth of plant genomics, transcriptomics, epigenomics, proteomics and metabolomics data [23-25]. Bioinformatics scientists therefore carry an immense responsibility to generate hypotheses and tools to analyze these huge piles of data. Data mining is a research area that aims to provide analysts with novel and efficient computational tools to overcome the obstacles and constraints posed by traditional statistical methods [26]. The current mission of bioinformatics is to provide tools that not only give sound solutions but also deliver them in a short span of time. Huge data volumes also demand efficient storage mechanisms, so that much more information can be kept in less space [27]. Visualization and integration of different kinds of omics data are two further major challenges in bioinformatics [28,29].

Therefore, we need to use the genomic resources that are available for many model and non-model plant species as a result of rapid technological advances in omics and bioinformatics, which finally led us to recognize a new translational area of plant science known as 'Plant Genomics'. Within the scope of plant genomics, we will be able to carry out the following activities:

i) Sequencing and de novo assembly of non-model plant species.
ii) Create an exhaustive inventory of genes with their functional annotation and ontology.
iii) Discovery of a large quantity of SNP/InDel markers to assist in fine mapping and selection of superior breeds.
iv) Identify "candidate genes/mutations/alleles" associated with desired traits after demarcating the underlying QTLs from the markers generated in iii) using QTL mapping methods, e.g. GWAS.
v) Create a "MarkerChip Panel" for the purpose of genotyping and selection. This panel can also be used for other, similar varieties of breeds.
vi) Study the evolutionary pattern of genomes within and among populations (population genomics).

However, bioinformatics discoveries always need to be integrated with experimental ones for functional validation, because computational approaches are subject to statistical bias and suffer from machine and systematic errors. There are also numerous challenges that need to be overcome at the computational level:

i) The experimental protocols used to sequence complex plant species need to improve in order to avoid creating erroneous reference genomes.
ii) Sequencing data need to be correctly annotated and better visualized.
iii) Efficient storage and compression-based algorithms are needed.
iv) More efficient and faster algorithms are needed to deal with progressively more complex data and, at the same time, to integrate all omics data so as to make better sense of the problem under investigation.

SEQUENCING AND OMICS RELATED DEVELOPMENT IN PLANT GENOMICS

Recently, many fast, high-throughput and affordable sequencers from different biotechnology companies have become available on the market, which has already pushed research activity in genomics to a previously unthinkable pace [30-33]. Most present-day sequencers come from the Illumina platform; for example, the HiSeq2000 can produce ∼600 Gb of sequence data as 100 bp × 2 paired-end reads per run. Most Illumina sequencers are based on the shotgun principle, in which pieces of DNA are sheared randomly, cloned and sequenced in parallel [34]. Sequencing an organism typically produces thousands of millions of paired-end reads, which need to be assembled into contigs and then into scaffolds; the scaffolds may later be assigned to chromosomes if a physical map of the organism is available [35,36].

At the forefront of plant genomics is the model dicotyledonous plant, Arabidopsis. Starting with the Arabidopsis EST project in the early 1990s [37,38] and culminating with the first complete plant genome in 2000 [39], Arabidopsis has led the plant community in capitalizing on the genomic era.

EST SEQUENCING

Arabidopsis ESTs were sequenced with traditional Sanger sequencing for the purposes of gene discovery and annotation, expression studies, comparative analysis and the discovery of gene-specific molecular markers throughout the whole genome [40]. Table 1 lists some agriculturally important plant species with the number of EST sequences (size) stored in NCBI dbEST, which gives access to the EST data of all sequenced organisms. If an EST matches a DNA sequence that codes for a known gene with a known function, that gene's name and function are placed on the EST record. Annotating EST records allows users to use dbEST as an avenue for gene discovery. By using a database search tool, such as NCBI's BLAST [41], any interested party can conduct sequence similarity searches against dbEST. The current dbEST record of plant species is available at http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html

Table 1: dbEST record of some plant species.

| No. | Scientific Name | Common Name | Size* |
|---|---|---|---|
| 1 | Arabidopsis thaliana | Thale cress | 1,529,700 |
| 2 | Oryza sativa | Rice | 1,253,557 |
| 3 | Zea mays | Maize | 2,019,137 |
| 4 | Triticum aestivum | Wheat | 1,286,372 |
| 5 | Brassica napus | Oilseed rape | 643,881 |
| 6 | Hordeum vulgare + subsp. vulgare | Barley | 501,838 |
| 7 | Glycine max | Soybean | 1,461,722 |
| 8 | Pinus taeda | Loblolly pine | 328,662 |
| 9 | Solanum lycopersicum | Tomato | 297,142 |
| 10 | Malus x domestica | Apple tree | 325,020 |
| 11 | Medicago truncatula | Barrel medic | 269,501 |
| 12 | Solanum tuberosum | Potato | 249,761 |
| 13 | Sorghum bicolor | Sorghum | 209,835 |
| 14 | Nicotiana tabacum | Tobacco | 334,808 |
| 15 | Brassica rapa subsp. pekinensis | Chinese cabbage | 168,703 |
| 16 | Ricinus communis | Castor bean | 62,592 |
| 17 | Raphanus sativus | Radish | 110,006 |

* denotes the number of EST sequences

PLANT GENOME SEQUENCING

De novo assembly of complex plant genomes cannot be achieved with reads from a single type of sequencing machine; because of the higher ploidy and complexity of plant genomes, reads from different sequencers are needed to obtain a complete sequence [42-45]. Sequencers from companies other than Illumina, which employ different chemistries to sequence DNA, can be used to fill the gaps left by Illumina reads. Numerous tools are now available for sequence assembly, such as Phred/Phrap/Consed (http://www.phrap.org), the tools at http://www.broad.mit.edu/wga/, and GAP4 (http://staden.sourceforge.net/overview.html). A modular, open-source package called AMOS (http://www.tigr.org/software/AMOS/) from TIGR can be used for comparative genome assembly. In addition, many other de novo assembly tools that can run on a high-performance cluster are available, such as SOAPdenovo, Newbler, Velvet and Oases (http://bioinformaticsonline.com/pages/view/11457/commercial-and-public-next-gen-seq-ngs-software).

Detection of genome-wide markers from whole-genome re-sequencing makes it possible to trace candidate mutations and alleles associated with particular traits [21,46,47]. The genome-wide association study (GWAS) is a widely used statistical method that uses a genome-scale array of markers to define a list of intervals (or QTLs) significantly associated with a specific complex trait in a plant genome [48]. The Arabidopsis 1001 Genomes Project, one of the best-known large projects in plant genomics, was initiated in 2008 with the aim of cataloguing whole-genome sequence variation in 1,001 Arabidopsis strains (http://1001genomes.org) [49]. By the end of the first phase of the project, whole-genome paired-end re-sequencing of 80 Arabidopsis thaliana accessions from 8 regions across Eurasia had revealed almost 5 million SNPs, close to 1 million small InDels, and other structural variants: InDels, inversions and highly diverged regions [49]. More up-to-date information is available at http://1001genomes.org/data/MPI/MPICao2010/releases/current/ and http://polymorph.weigelworld.org/cgi-bin/webapp.cgi

Restriction site-associated DNA sequencing (RAD-Seq) is becoming a more widely accepted method for obtaining genome-wide markers across many plant genomes [50-52]. RAD-Seq has an advantage over whole-genome re-sequencing in that it reduces the complexity of the genome by focusing sequencing only on specific sites demarcated by restriction enzymes. With RAD-Seq, it is possible to distinguish and score all markers at once, and hence to know the source location of each marker [53]. RAD-Seq is therefore not limited to genotyping and SNP discovery but is also applicable to quantitative genetic and phylogeographic work [54,55].

Among plant scientists, the most actively pursued area today is generating mRNA expression profiles under different experimental conditions, for instance to list the genes that are differentially expressed between a fungus-infected plant and a non-infected plant. RNA sequencing is popular among plant scientists because it does not require a reference genome and because, along with the expression profile, it allows the simultaneous determination of assembled novel transcripts, splice sites and polymorphisms; it has therefore become a popular route to the rapid development of genetic resources in plant species [56]. Besides mRNA, the role of small RNAs (microRNA, small interfering RNA and piwi-interacting RNA) in developmental processes can be investigated using small RNA sequencing reads. Small RNAs are typically 15-30 nt long, are derived from total RNA and are abundant in eukaryotic species. The role of microRNAs in plants is still not fully clear; however, previous studies have suggested roles in gene regulation at the transcriptional and post-transcriptional levels, in genome stability and in defense mechanisms [57,58]. Many tools and databases are available for searching small RNAs and their targets in crops and other plant species, as listed in Table 2.

Epigenetics has also gained a place in plant science, where the cytosine methylome (methylC-seq) and the small RNA transcriptome (small RNA-seq) are used to reveal methylated and non-methylated patterns throughout the genome [59]; for instance, Arabidopsis inflorescences disclosed a direct relationship between the location of small RNAs and DNA methylation [60]. A genome-wide pattern of methylated cytosines in Arabidopsis was obtained using bisulfite sequencing [61]. Chromatin immunoprecipitation (ChIP) combined with sequencing technology or tiling arrays (ChIP-seq or ChIP-chip) is used to map regions of histone modification across the entire genome. In Arabidopsis, eight histone modifications (H3K4me2, H3K4me3, H3K27me1, H3K27me2, H3K36me3, H3K56ac, H4K20me1 and H2Bub) were recently reported [62]. The Epigenetics of Plants International Consortium web site (https://www.plant-epigenome.org/) gives further detail about current epigenetic efforts in different plant species [63]. The Sequence Read Archive (SRA) at NCBI (http://www.ncbi.nlm.nih.gov/sra), the European Nucleotide Archive (http://www.ebi.ac.uk/ena/home) and the DDBJ Sequence Read Archive (http://trace.ddbj.nig.ac.jp/dra/index_e.shtml) are three major databases that store raw sequencing data from NGS platforms; users can see detailed information about how the sequencing was done and download the raw data for academic purposes [64].

METABOLOMICS AND INTERACTOMICS

Metabolomics is a fast-emerging field in the world of omics and is generally used to scan all the metabolites present in a sample using LC-MS, GC-MS and NMR instruments. In humans, for example, it is used to determine all the metabolites that directly or indirectly indicate the food habits of the individuals whose urine samples were collected, analyzed on one of these instruments and then processed computationally [65]. In a similar way, in plants we can identify metabolites in relation to numerous types of biological conditions such as treatments, tissues and genotypes [66]. Metabolomic data have been obtained for Arabidopsis, and the details of the metabolites obtained are stored in AtMetExpress [67].
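The complexity reduction behind RAD-Seq, described earlier in this section, can be illustrated with an in silico digest: only the sequence flanking each restriction site is sampled, and each tag keeps the coordinate of its site. The following is a minimal Python sketch, not a published pipeline. The EcoRI recognition site (GAATTC), the toy sequence, the flank length and the function name `rad_tags` are all assumptions made for illustration.

```python
def rad_tags(sequence, site="GAATTC", flank=10):
    """Return (position, upstream, downstream) for every occurrence of `site`.

    `site` defaults to the EcoRI recognition sequence; `flank` is the number
    of bases kept on each side of the site (an illustrative value).
    """
    sequence = sequence.upper()
    tags = []
    start = sequence.find(site)
    while start != -1:
        upstream = sequence[max(0, start - flank):start]
        site_end = start + len(site)
        downstream = sequence[site_end:site_end + flank]
        tags.append((start, upstream, downstream))
        # Continue searching one base further so overlapping sites are found.
        start = sequence.find(site, start + 1)
    return tags


# Toy sequence (hypothetical) containing two EcoRI sites.
toy_genome = "ATGCGAATTCTTAGGCCATAGAATTCGGCTA"
tags = rad_tags(toy_genome, flank=5)

# Fraction of the toy genome covered by the sampled flanks.
sampled_bases = sum(len(up) + len(down) for _, up, down in tags)
sampled_fraction = sampled_bases / len(toy_genome)

for position, up, down in tags:
    print(position, up, down)
print(f"sampled fraction: {sampled_fraction:.2f}")
```

Because every tag carries the position of its restriction site, markers found in the flanks can be traced back to their source location, which is the property the text attributes to RAD-Seq.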


Table 2: List of small RNA databases and tools in plants.

| Database and Tools | URL | Purpose |
|---|---|---|
| Arabidopsis small RNA Project | http://asrp.danforthcenter.org/ | Provides entry to small RNA data and resources from the Carrington laboratory |
| Arabidopsis small RNA database | http://mpss.udel.edu/at_sRNA/ | Provides sequencing-by-synthesis based small RNA data |
| Cereal small RNAs Database | http://sundarlab.ucdavis.edu/smrnas/ | Platform providing cereal small RNA data and tools for finding small RNAs and their targets |
| Plant Small RNA Target Analysis Server | http://plantgrn.noble.org/psRNATarget/ | Performs high-throughput analysis of next-generation data to give a putative list of miRNAs and their target pairs |
| Plant microRNA database (PMRD) | http://bioinformatics.cau.edu.cn/PMRD/ | Integrates the large amount of available plant microRNA data, consisting of microRNA sequences and their target genes, secondary structure, expression profiling, genome browser, etc. |
| UEA sRNA toolkit | http://srna-tools.cmp.uea.ac.uk/plant/cgi-bin/srna-tools.cgi | Provides links to tools for the analysis of high-throughput small RNA data |
| miRNA Precursor Candidates for Arabidopsis thaliana | http://sundarlab.ucdavis.edu/mirna/ | Collection of miRNA and precursor candidates for the Arabidopsis genome predicted by the 'findMiRNA' method |
| psRobot: Plant Small RNA Analysis Toolbox | http://omicslab.genetics.ac.cn/psRobot/ | Identifies stem-loop shaped smRNAs and their targets |
| PLncDB: Plant Long noncoding RNA Database | http://chualab.rockefeller.edu/gbrowse2/homepage.html | Repository of Arabidopsis long noncoding RNAs |
| MiSolRNA | http://www.misolrna.org/about | Provides tomato miRNA data |
| Phytophthora smallRNA Database | http://phytophthora-smallrna-db.cgrb.oregonstate.edu/ | Provides access to small RNA data of Phytophthora infestans and P. sojae, plant pathogens responsible for multibillion-dollar damage to crops, ornamental plants, and natural environments |
| miRNAtools 2.0 | https://sites.google.com/site/mirnatools2/ | Provides a list of tools to investigate miRNAs and their regulatory actions |
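The text describes small RNAs as roughly 15-30 nt long. Before reads are passed to tools such as those in Table 2, a common first step is to keep only reads inside that length window and to look at their length distribution. The sketch below is a minimal Python illustration under stated assumptions: the reads are made-up strings with adapters already trimmed, and the function names are hypothetical rather than taken from any of the listed tools.

```python
from collections import Counter


def filter_small_rna(reads, min_len=15, max_len=30):
    """Keep reads whose length falls in the small RNA window (inclusive)."""
    return [read for read in reads if min_len <= len(read) <= max_len]


def length_distribution(reads):
    """Count reads per length, returned as a dict sorted by length."""
    counts = Counter(len(read) for read in reads)
    return dict(sorted(counts.items()))


# Hypothetical adapter-trimmed reads of 12, 21, 24 and 40 nt.
reads = [
    "ACGTACGTACGT",
    "ACGTACGTACGTACGTACGTA",
    "A" * 24,
    "ACGT" * 10,
]

kept = filter_small_rna(reads)
distribution = length_distribution(kept)
print(distribution)
```

The 15-30 nt bounds are exposed as parameters so that a narrower window can be tried for a specific small RNA class.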

Profiling of metabolites has also been done in various crop species [68-73].

Interactomics (the study of protein–protein interactions) in plants is mostly restricted to Arabidopsis and rice, for which large-scale interactome maps were prepared using yeast two-hybrid (Y2H) based methods. The interactome is the complete set of all protein–protein interactions, and it helps us to comprehend the molecular networks governing cellular systems [74,75]. The interaction map of Arabidopsis revealed thousands of highly reliable interactions between proteins (Arabidopsis Interactome Mapping Consortium 2011) [76].

PLANT GENOMES

Major sequencing efforts started with the model crop species Arabidopsis, rice and maize and then extended to other, non-model species. Model species act as a platform for passing functional information to their closely related species. Plant genomes are known for their many complexities, such as large genome size, higher ploidy level, higher heterozygosity and abundant repetitive regions, including various kinds of duplicated regions [77,78]. Most current assembly algorithms struggle to deal with so many complexities, so we need to remove or alleviate them by preparing large-insert sequencing libraries, using mate-paired reads, sequencing DNA samples from homozygous lines, or a combination of these. Another way to deal with the problem is to combine different sequencing technologies, as suggested in many previous works [79,80].

The size of sequenced plant genomes ranges from as small as 63.6 Mb (Genlisea aurea) to as large as 22.18 Gb (Pinus taeda), which is about seven times larger than the human genome; almost 82 percent of the Pinus taeda genome is occupied by duplicated regions, compared with only 25 percent in human [81,82]. Another large genome sequenced recently is that of Norway spruce (Picea abies), at about 20 Gb. Remarkably, the number of genes in this gymnosperm is almost the same as in Arabidopsis and there is a negligible amount of recent duplication; the conifer has reached its huge size through a steady gain of different long-terminal repeat transposable elements [83].

Reportedly, more than three dozen plant species have been sequenced completely so far. Up-to-date genome information is available in Plant Ensembl (http://plants.ensembl.org/info/website/species.html), Plant NCBI (http://www.ncbi.nlm.nih.gov/genomes/PLANTS/PlantList.html#C_SEQ), CoGePedia (http://genomevolution.org/wiki/index.php/Sequenced_plant_genomes), the PGSB plant databases (http://mips.helmholtz-muenchen.de/plant/genomes.jsp) and the Phytozome database (http://www.phytozome.net), which gives access to 41 sequenced green plant genomes that have been clustered into gene families at twenty evolutionarily significant nodes. Details of some of these plant species are given in Table 3.
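The genome-size comparison quoted in this section can be checked with simple arithmetic. The short Python sketch below uses the two sizes given in the text (63.6 Mb for Genlisea aurea, 22.18 Gb for Pinus taeda) and the 82 percent duplicated fraction. The human genome size of about 3,100 Mb is an assumption brought in for the comparison and is not stated in the article.

```python
# Genome sizes in megabases (Mb). The two plant values are from the text;
# the human value is an assumed approximate figure, not from the article.
GENLISEA_AUREA_MB = 63.6
PINUS_TAEDA_MB = 22_180.0
HUMAN_MB_ASSUMED = 3_100.0

# How many times larger the loblolly pine genome is than the human genome.
pine_vs_human = PINUS_TAEDA_MB / HUMAN_MB_ASSUMED

# Spread between the smallest and largest sequenced plant genomes cited.
pine_vs_genlisea = PINUS_TAEDA_MB / GENLISEA_AUREA_MB

# Amount of the pine genome in duplicated regions, at the quoted 82 percent.
pine_duplicated_mb = 0.82 * PINUS_TAEDA_MB

print(f"Pinus taeda vs human: {pine_vs_human:.1f}x")
print(f"Pinus taeda vs Genlisea aurea: {pine_vs_genlisea:.0f}x")
print(f"Duplicated sequence in Pinus taeda: {pine_duplicated_mb:.0f} Mb")
```

Under the assumed human genome size, the ratio comes out close to seven, which agrees with the "about seven times larger" statement in the text.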


Table 3: List of some published plant genomes.

| Species Name | Size (~Mb)* | # of Chr** |
|---|---|---|
| Arabidopsis thaliana (mouse ear cress) | 115 | 5 |
| Brachypodium distachyon | 355 | 5 |
| Brassica rapa (Chinese cabbage) | 284 | 10 |
| Cajanus cajan (pigeonpea) | 883 | 11 |
| Carica papaya (papaya) | 372 | 9 |
| Cucumis sativus (cucumber) | 203 | - |
| Fragaria vesca (woodland strawberry) | 240 | 7 |
| Glycine max (soybean) | 975 | 20 |
| Medicago truncatula (barrel medic) | 241 | 8 |
| Malus × domestica (apple) | 881.3 | - |
| Oryza sativa (rice, japonica) | 372 | 12 |
| Panicum virgatum (switchgrass) | 1,230 | - |
| Populus trichocarpa (poplar) | 422.9 | 19 |
| Ricinus communis (castor bean) | 400 | - |
| Pinus taeda (loblolly pine) | 22,180 | 27 |
| Solanum tuberosum (potato) | 800 | 12 |
| Sorghum bicolor (sorghum) | 730 | 10 |
| Theobroma cacao (cacao) | 346 | - |
| Vitis vinifera (grapevine) | 487 | 19 |
| Zea mays (maize) | 3,233 | 10 |

* The Size column is the haploid genome size (in megabases [Mb]) of the sequenced organism. It may be either the calculated size, based on sequence, or the estimated size, based on the literature or other resources.
** # Chr (Number of Chromosomes) denotes the haploid number of chromosomes of the sequenced organism.

Moreover, the availability of DNA sequence and the knowledge encoded in it does not tell us directly how this genetic information leads to observable traits and behaviors (phenotypes). This can be inferred with comparative genomics, a field of biological research in which the genome sequences of different species and a wide variety of organisms, from bacteria to human, are compared to understand evolutionary mechanisms and forces at the molecular level. It also provides a powerful tool for studying evolutionary changes among organisms, helping to identify genes that are conserved or common among species, as well as genes that give each organism its unique characteristics [84,85]. In addition, these comparative techniques provide an overview of natural plant variation in time and space and help us to understand fundamental questions about the origin, structure and evolution of genetic diversity [86-88]. In comparative genomics, the preserved order of genes on the chromosomes of related species that descend from a common ancestor is termed synteny; synteny regions are defined with the various tools listed at http://bioinformaticsonline.com/blog/view/4574/tools-to-detect-synteny-blocks-regions-among-multiple-genomes. This comparative approach of defining synteny blocks is an invaluable technique for identifying similarities and differences between species; it enables the transfer of information from one species to another and assists in the reconstruction of ancestral genomes [89-92]. In recent years, the orthologous and syntenic regions between maize and rice chromosomes were constructed, which defined 6 collinear blocks between maize chromosomes 2 and 4 and a single rice chromosome. Moreover, the fine-scale analysis of collinearity between rice chromosomes 1 and 5 and maize chromosomes 3, 6 and 8 shows the presence of internal rearrangements within collinear regions and five new duplication events in rice [93].

MODEL PLANT SPECIES

Organisms that are suitable for use in experimental research are termed model species. They have a number of properties that make them ideal for research purposes: short life spans, rapid reproduction, low cost and ease of manipulation at the genetic level [94]. Arabidopsis thaliana and Oryza sativa are the first two major plant species whose genomes were completely sequenced through large international effort and collaboration [95,96]. Their genomes are small in comparison to the maize genome [95,97].

Arabidopsis thaliana is widely used by plant researchers as a model organism to study plant developmental processes [95]. Many genes are shared among most plants, and the study of genes in a model organism like A. thaliana facilitates our understanding of gene expression and function in most plants [98]. Furthermore, since animals and plants are both eukaryotes, many of the genes found in A. thaliana have homologs in animals. Arabidopsis has one of the smallest genomes among flowering plants, which is the main reason it was selected as a model organism for genome sequencing. The whole genome of Arabidopsis is made up of about 135 million bases, which are anchored to five chromosomes. The complete sequencing of Arabidopsis thaliana was done with a BAC-by-BAC strategy, and chromosomes were assigned using many different kinds of genetic and physical maps [39]. However, gaps still exist in the Arabidopsis genome (ftp://ftp.arabidopsis.org/home/tair/Sequences/whole_chromosomes/tair9_Assembly_gaps.gff). Scientists have used Arabidopsis for the past 40 years as the model for finding a gene and as a guide to finding equivalent genes in other plants, such as rice, corn, potatoes or tomatoes [99].

Oryza sativa is a cereal that was sequenced on a priority basis because it is a staple food in many developing and poor countries and because scientists believed that a complete rice genome sequence would improve the nutritional quality of rice cultivars; it has gained the status of "model organism" for cereal biology because it is easier to modify genetically [100,101]. It has the smallest genome among the major cereals, 430 Mb arranged on 12 chromosomes, and it can serve as a model genome for one of the two main groups of flowering plants, the monocotyledons [102]. Because it has been the subject of studies on yield, hybrid vigor, genetic resistance to disease and adaptive responses, scientists have taken advantage of the existence of a multitude of varieties that have adapted to a very wide range of environmental conditions, from dry soil in temperate regions to flooded cultures in tropical regions [103]. Essential biological information from the rice genome will undoubtedly improve our understanding of the basic genomics and genetics of other related and economically significant crops, not only wheat, corn, sorghum and members of the grass family, but also dicot crops such as soybean and cotton [101]. The finished genomes will assist in

Table 4: Details of plants databases including agriculturally important crops. Database Name Description Database URL In the fall of 1999, the ACWW (AGI, CSHL, WashU, Univeristy of Wisconsin) consortium was awarded a grant from the USDA -CSREES/NSF/DOE rice ACWW Rice Genome genome initiative to sequence and annotate the short arms of chromosomes http://www.genome.arizona.edu/shotgun/ Sequencing Project: 3 and 10. Together with The Institute for Genomic Research (TIGR) and rice/ Chromosome 3 & 10 chromosomes as part of the International Rice Genome Sequencing Project. databasethe Plant Genomeof safety Initiativeinformation at Rutgers includes (PGIR), not only we plants plan to produced finish both using recombinant DNA technologies (e.g., genetically engineered or transgenic CERA's database of http://www.ceragmc.org/?action=gm_crop_ plants), but also plants with novel traits that may have been produced safety information database using more traditional methods, such as accelerated mutagenesis or plant breeding. These latter plants are only regulated in Canada. This database has (1) Agro-ecological zoning system (AEZ) methodology and software use several databases, models and decision support tools for better planning, management and monitoring of land resources. http://www.sdnbd.org/sdi/issues/ Agricultural Database Ordering details. (2) A computer program for irrigation planning and agriculture/database/index.htm management(CROPWAT) helps irrigation engineers/irrigation agronomists carry out standard calculations for design and management of irrigation schemes. (3) Agricultural Statistics of Bangladesh The Arabidopsis membrane protein library is a collection of polytopic AMPL membrane protein sequences (containing two or more predicted membrane http://wardlab.cbs.umn.edu/arabidopsis/ spanning domains) from the model plant Arabidopsis thaliana. 
Arabidopsis ABC http://www.arabidopsisabc.net/ transporters Arabidopsis at PlaCe Site with info on P450, glucosyltransferases, etc. http://www.p450.kvl.dk/ Arabidopsis Swiss-Prot Links to A.thaliana WWW sites and to Swiss-Prot entries http://www.uniprot.org/docs/arath list

BeanGenes | BeanGenes is a plant genome database which currently contains information relevant to Phaseolus and Vigna species. | http://beangenes.cws.ndsu.nodak.edu/
BeanRef | BeanRef is a collection of external links and references from literature to different aspects of research on beans (Phaseolus and Vigna). | http://www.nenno.it/Beanref/?p=Beanref
CottonDB | CottonDB is a database that contains genomic, genetic and taxonomic information for cotton (Gossypium spp.). | http://cottondb.org/
DFCI LeGI | DFCI Tomato Gene Index | http://compbio.dfci.harvard.edu/tgi/cgibin/tgi/gimain.pl?gudb=tomato
Dendrome | Forest trees genome database | http://dendrome.ucdavis.edu/
ePIC (electronic Plant Information Centre) | ePIC is a major project to bring together all of Kew's digitised information about plants and make it easier to search. You can use it to pinpoint information of interest in our varied collections, bibliographies, nomenclators and checklists, publications and taxonomic works, as well as links to information resources provided by external organisations. Where further information from Kew is available online | http://www.kew.org/epic/index.htm
GBIF | Global Biodiversity Information Facility | http://www.gbif.org/
GreenPhylDB | This database is created for comparative and functional genomics in plants and contains a record of gene families, and broad taxonomy of green plants. | http://www.greenphyl.org/cgi-bin/index.cgi
GrainGenes | GrainGenes is a genetic database for Triticeae, oats, and sugarcane, being assembled as part of the United States Department of Agriculture, National Agricultural Library's Plant Genome Program, initiated by Jerome P. Miksche and currently directed by Henry L. Shands. Additional support is provided by ITMI, the International Triticeae Mapping Initiative, through a grant from the USDA/DOE/NSF Joint Program on Collaborative Research in Plant Biology. | http://wheat.pw.usda.gov/index.shtml
Gramene | A comparative mapping resource for grains | http://www.gramene.org/


ITIS (Integrated Taxonomic Information System) | ITIS, the Integrated Taxonomic Information System! Here you will find authoritative taxonomic information on plants, animals, fungi, and microbes of North America and the world. | http://www.itis.gov/
KOMUGI (Japan) | KOMUGI is an integrated database of wheat created by Japanese researchers in wheat sciences. | http://www.shigen.nig.ac.jp/wheat/komugi/top/top.jsp
LIS | The Legume Information System (LIS) is a publicly accessible legume resource that integrates genetic and molecular data from multiple legume species and enables cross-species genomic, transcript and map comparisons. The intent of the LIS is to help researchers leverage data-rich model plants to fill knowledge gaps across crop plant species and provide the ability to traverse between interrelated data types. LIS, a component of the Model Plant Initiative (MPI), is being developed as part of a cooperative research agreement between the National Center for Genome Resources (NCGR) and the USDA Agricultural Research Service (ARS). | http://comparative-legumes.org/
MaizeDB/MaizeGDB | MaizeGDB is the community database for biological information about the crop plant Zea mays ssp. mays. Genetic, genomic, sequence, gene product, functional characterization, literature reference, and person/organization contact information are among the datatypes accessible through this site. | http://www.maizegdb.org/
MAIZEWALL | Database and Developmental Gene Expression Profiling of Cell Wall Biosynthesis and Assembly in Maize | http://www.polebio.scsv.upstlse.fr/MAIZEWALL/
MAtDB | MIPS Arabidopsis thaliana database | http://mips.gsf.de/proj/plant/jsf/athal/index.jsp
Oryzabase | Japanese rice genome database | http://www.shigen.nig.ac.jp/rice/oryzabase/top/top.jsp
PLAZA 2.5 | Structural and functional annotation of 25 species. Contains 909,850 genes, of which 85.8% are protein coding. These protein-coding genes are clustered into 32,294 multi-gene families, which yielded 18,547 phylogenetic trees. Also includes tools for performing comparative genome analysis. | http://bioinformatics.psb.ugent.be/plaza/ and https://bioinformatics.psb.ugent.be/knowledge/wiki-plaza/tools
RAD | Rice Annotation Database (RAD) is a contig-oriented database for high-quality manual annotation of RGP, which can present non-redundant contig analyses by merging the accumulated PAC/BAC clones. | http://golgi.gs.dna.affrc.go.jp/SY-1102/rad/
RGP | Rice Genome Research Program | http://rgp.dna.affrc.go.jp/
RiceGAAS | Rice Genome Automated Annotation System: a rice genome automated annotation system; integrates programs for prediction and analysis of protein-coding gene structure. | http://ricegaas.dna.affrc.go.jp/rgadb/
RMPL | The rice membrane protein library is a collection of polytopic membrane protein sequences (containing two or more predicted membrane spanning domains) from Oryza sativa. | http://wardlab.cbs.umn.edu/rice/
SGMD | The soybean genomics and microarray database (SGMD) was established in 1999 to serve as a sequence and microarray database for the Soybean Genomics and Improvement Laboratory (SGIL), Beltsville Agricultural Research Center (BARC) and collaborators. It serves both as a sequence repository, holding DNA sequences for numerous ESTs, and also as a microarray experiment database. It allows scientists to explore the expression levels of the EST clones stored in the database and to correlate expression levels with function. | http://bldg6.arsusda.gov/benlab/sgmd_public.htm
Solanaceae Genomics Network | The SOL Genomics Network (SGN) is a database and website dedicated to the genomic information of the nightshade family, which includes species such as tomato, potato, pepper, petunia and eggplant. | http://www.sgn.cornell.edu/
Soybase | Soybase contains Soybean Tools and Genetic Information | http://soybase.agron.iastate.edu/
Systematic Botany and Mycology Laboratory | Site of the U.S. Agricultural Research Service; includes databases on taxonomy of vascular plants and fungi. | http://nt.ars-grin.gov/SBMLWeb/
TAIR | The Arabidopsis Information Resource (TAIR) maintains a database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana. Data available from TAIR includes the complete genome sequence along with gene structure, gene product information, metabolism, gene expression, DNA and seed stocks, genome maps, genetic and physical markers, publications, and information about the Arabidopsis research community. | http://www.arabidopsis.org/
Maize Genetics and Genome Database | Goals include discovering maize genes using EST sequencing and engineered Mu (RescueMu) tagging, and developing new tools to facilitate gene mapping and phenotypic analysis. | http://www.maizegdb.org/

plant breeding, as breeders will be able to determine whether a seed contains a particular sequence of interest that may favor a desirable trait. If a gene is known to contribute to a trait of interest, variants of this gene can be examined in other varieties. Tools and resources are being developed to maximally interpret the rice genome sequence [90,104]. These include improvement of gene prediction programs, expansion of the rice EST and cDNA resources, and identification of molecular resources for mapping. As the data from rice genomics can be leveraged rapidly to other grass species, it is imperative that resources be developed to exploit rice in this manner. Current efforts to extend the rice sequence data to other genomes include the identification of putative orthologs, alignment of rice BAC/PAC sequences with plant gene indices, and integration of rice sequence data into comparative genetic maps. All of these efforts will continue to be refined as more sequence information is collected. As evidenced by the accomplishments of the Arabidopsis Genome Initiative, knowledge of rice and its close relatives in the grass family will increase exponentially in the next few years [105,106].

Zea mays is one of the most popular plant model species, used regularly for in-depth research in plant domestication, genetics, cytogenetics, and comparative genomics in relevant species. The presence of large mutant stocks, huge heterochromatic chromosomes, higher nucleotide diversity, transposition mutagenesis, a simple mechanism of controlled pollination and strict gene colinearity with other closely related plant species has made maize a significant model species for plant scientists [107]. Analysis of the complete maize genome offers the opportunity to know the complete gene tool box of maize and to learn about the systems biology of maize (http://mips.gsf.de/proj/plant/jsf/maize/index.jsp).

AGRICULTURALLY IMPORTANT BIOLOGICAL DATABASE

At the beginning of the "genomic revolution", the main task of bioinformatics was to create and maintain databases to store biological information, such as nucleotide and amino acid sequences. A biological database is a large, organized body of persistent data, usually associated with computerized software designed to update, query, and retrieve components of the data stored within the system. A simple database might be a single file containing many records, each of which includes the same set of information. For example, a record associated with a nucleotide sequence database typically contains information such as contact name; the input sequence with a description of the type of molecule; the scientific name of the source organism from which it was isolated; and, often, literature citations associated with the sequence. Database development involves not only designing the database and storing data but also developing user-friendly GUIs so that researchers can both access existing data and submit new or revised data, e.g., NCBI, Ensembl [108].

Nowadays, plant research revolves primarily around genomics and transcriptomics aimed at improving beneficial traits. As more plant genomes are scheduled for completion over the next few years, and many additional genome and transcriptomics projects are being initiated, there is a huge need to expand the storage capacity of existing plant genome databases. There are many useful databases through which we can get relevant information about particular plant species. Some of them are as follows:

PlantTribes2.0: The PlantTribes database (http://fgp.bio.psu.edu/tribedb/10_genomes/index.pl) is a plant gene family database based on the inferred proteomes of five sequenced plant species: Arabidopsis thaliana, Carica papaya, Medicago truncatula, Oryza sativa and Populus trichocarpa. It uses the graph-based clustering algorithm MCL [109] to classify all of these species' protein-coding genes into putative gene families, called tribes, using three clustering stringencies (low, medium and high). For all tribes, it generates protein and DNA alignments and maximum-likelihood phylogenetic trees. A parallel database of microarray experimental results is linked to the genes, which lets researchers identify groups of related genes and their expression patterns. Unified nomenclatures were developed, and tribes can be related to traditional gene families and conserved domain identifiers. SuperTribes, constructed through a second iteration of MCL clustering, connect distant but potentially related gene clusters. The global classification of nearly 200 000 plant proteins was used as a scaffold for sorting ~4 million additional cDNA sequences from over 200 plant species. All data and analyses are accessible through a flexible interface allowing users to explore the classification, to place query sequences within the classification, and to download results for further study [110]. The latest version introduces an additional fine-scale classification, based on the OrthoMCL algorithm [111], for identifying orthologous genes.

TropGENE-DB: TropGENE-DB is a crop information system created to store genetic, molecular and phenotypic data of the numerous yet poorly documented tropical crop species. The most common data stored in TropGENE-DB are information on genetic resources (agro-morphological data, parentages, and allelic diversity), molecular markers, genetic maps, results of quantitative trait loci (QTL) analyses, data from physical mapping, sequences, genes, as well as the corresponding references. TropGENE-DB is organized on a crop basis with currently three running modules (sugarcane, cocoa and banana), with plans to create additional modules for rice, cotton, oil palm, coconut, rubber tree, pineapple, taro, yam and sorghum. The TropGENE-DB information system is accessible for consultation via the internet at http://tropgenedb.cirad.fr/tropgene/JSP/index.jsp [112].

The FlagDB database (http://urgv.evry.inra.fr/projects/FLAGdb++/HTML/index.shtml) is a large integrative collection of structural and functional annotations and ESTs from six different plant species. Additionally, it provides information about novel gene predictions, mutant tags, gene families, protein motifs, transcriptome data, repeat sequences, primers and tags for genomic approaches, subcellular targeting, secondary structures, tertiary models, curated annotations and mutant phenotypes [113,114].

The CATMA database (Complete Arabidopsis Transcriptome Micro-Array, http://www.catma.org) was initiated in Génoplante and is now a European consortium. CATMA offers a complete structural annotation of the Arabidopsis genome, obtained via new gene prediction software, Eugène, also developed partly in Génoplante. For each predicted gene, a gene-specific sequence tag (GST) was computed, and primers were designed whenever possible (21120 GSTs). These two steps were carried out by ad hoc software developed by Génoplante: SPADS [115].

The Plant Genome Database: PlantGDB (http://www.plantgdb.org/) is a catalogue of genomic sequences of all plant species, intended for performing comparative genomics. This database also assembles EST sequences into contigs that may represent unique genes [116].

USDA Plants Database: An impressive Department of Agriculture site providing taxonomic names, checklists, and distributional, phylogenetic, and other data for vascular plants, mosses, liverworts, hornworts, and lichens of the U.S. and its territories (http://plants.usda.gov/) [117].

WAICENT (The World Agricultural Information Centre): WAICENT is FAO's strategic programme on information management and dissemination. Using Internet technologies, WAICENT provides access to FAO's data and information to millions of users per month from around the world. WAICENT also provides specialized information systems on topics of global relevance, such as Desertification, Gender and Sustainable Development, Food Standards, Animal Genetic Resources, Post-Harvest Operations, Agro-Biodiversity and Food Systems in Urban Centers.

International Plant Names Index: IPNI (http://www.ipni.org/) is a database of the names and associated basic bibliographical details of all seed plants, ferns and fern allies. Its goal is to eliminate the need for repeated reference to primary sources for basic bibliographic information about plant names. The data are freely available and are gradually being standardized and checked. IPNI will be a dynamic resource, depending on direct contributions by all members of the botanical community. IPNI is the product of a collaboration between The Royal Botanic Gardens, Kew, The Harvard University Herbaria, and the Australian National Herbarium [118].

Global Plant Checklist (International Organization for Plant Information): IOPI GPC (http://www.biologie.uni-hamburg.de/b-online/ibc99/iopi/default.htm) is a prototype site for a comprehensive database that will ultimately encompass 300,000 vascular-plant species and more than one million plant names, as well as some non-vascular species [119].

International Legume Database & Information Service: ILDIS (http://www.ildis.org/) is an international project which maintains a database of plants in the family Fabaceae (Leguminosae) and provides services to scientists and other people interested in these plants, including this web-site for access to the database.

Pests and Diseases Image Library: PaDIL (http://www.padil.gov.au/aboutOverview.aspx) is a Commonwealth Government initiative, developed and built by Museum Victoria's Online Publishing Team, with support provided by DAFF (Department of Agriculture, Fisheries and Forestry) and PHA (Plant Health Australia), a non-profit public company. PaDIL's primary aims are: production of high quality images showing primarily exotic targeted organisms of plant health concern to Australia; assisting with plant health diagnostics in all areas, from initial to high level; capacity building for diagnostics in plant health, including linkage developments between training and research organizations; creating and using educational tools for training undergraduates/postgraduates; and engendering public awareness about plant health concerns in Australia.

Tree fruit Genome Database Resources (tfGDR): This database is a repository of numerous completed fruit tree genomes, accompanied by bioinformatics resources and software tools to help scientists address specific problems. Scientists can use it in their effort to produce fruit varieties with durable disease and pest resistance. The database is available at http://www.tfgdr.org/.

Some other agriculturally important databases, along with their descriptions and URLs, are given in Table 4 and also at http://www.hsls.pitt.edu/obrc/index.php?page=plant.

APPLICATIONS OF AGRICULTURAL BIOINFORMATICS

Collection and storage of plant genetic resources can be used to produce stronger, more drought-, disease- and insect-resistant crops and to improve the quality of livestock, making them healthier, more disease resistant and more productive.

Crops

Comparative genetics of model and non-model plant species can reveal the organization of their genes with respect to each other, which can then be used to transfer information from model crop systems to other food crops. Arabidopsis thaliana (thale cress) and Oryza sativa (rice) are examples of available complete plant genomes [90].

Renewable Energy

Plant-based biomass is one of the resources for obtaining energy through conversion into biofuels such as ethanol, which could be used to drive vehicles and fly planes. Biomass crop species such as maize (corn) and switchgrass, and lignocellulosic materials like bagasse and straw, are widely used for biofuel production. We could detect sequence variants in biomass crop species to maximize biomass production and minimize recalcitrance. Recently, the genome of Eucalyptus grandis, which is also one of the major sources of biomass components, has been released, and all the genes that take part in the conversion of sugars into biomass components have already been deciphered. This provides great insight into the mechanisms and pathways responsible for this conversion, so that in future the production of biomass components can be enhanced in eucalyptus and other relevant plants [120]. Hence, the use of genomics and bioinformatics in combination with breeding would likely increase the capability to breed crop species for use as biofuel feedstock and consequently keep increasing the use of renewable energy in modern society [121,122].

Insect resistance

Genes from Bacillus thuringiensis that can control a number of serious pests have been successfully transferred to cotton, maize and potatoes. This new ability of the plants to resist insect outbreaks may reduce the amount of insecticide being used and hence improve the nutritional quality of the crops [123].
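As a small illustration of the variant detection mentioned above for breeding and biomass crops, the following Python sketch compares a reference allele with the same gene from another variety and reports the positions at which they differ. It is only a toy example under stated assumptions: the two sequences are taken to be already aligned and of equal length, and the function name `find_variants` and both sequences are hypothetical and invented for illustration. Real studies would rely on dedicated alignment and variant-calling software.

```python
from typing import List, Tuple


def find_variants(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumption: both sequences are already aligned and have equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    # Walk both sequences in parallel and record every differing position.
    for pos, (ref_base, alt_base) in enumerate(
        zip(reference.upper(), sample.upper()), start=1
    ):
        if ref_base != alt_base:
            variants.append((pos, ref_base, alt_base))
    return variants


# Hypothetical, made-up sequences used purely for illustration.
reference_allele = "ATGGCCTTAGACCA"
variety_allele = "ATGGCTTTAGACGA"

for pos, ref_base, alt_base in find_variants(reference_allele, variety_allele):
    print(f"position {pos}: {ref_base} -> {alt_base}")
```

Each reported position is a candidate single-nucleotide difference between the two varieties; whether it has any effect on the trait of interest would still have to be established experimentally.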


Improve nutritional quality

Scientists have recently succeeded in transferring genes into rice to increase levels of Vitamin A, iron and other micronutrients. This work could have a profound impact in reducing occurrences of blindness and anemia caused by deficiencies in Vitamin A and iron, respectively [124]. Scientists have also inserted a gene from yeast into the tomato, and the result is a plant whose fruit stays longer on the vine [125].

Grow in poorer soils and drought resistant

Progress has been made in developing cereal varieties that have a greater tolerance for soil alkalinity, free aluminium and iron toxicities. These varieties will allow agriculture to succeed in poorer soil areas, thus adding more land to the global production base. Research is also in progress to produce crop varieties capable of tolerating reduced water conditions [126,127].

Plant breeding

The goal of plant genomics is to understand the genetic and molecular basis of all biological processes in plants that are relevant to the species. This understanding is fundamental to allow efficient exploitation of plants as biological resources in the development of new cultivars with improved quality and reduced economic and environmental costs. Traits considered of primary interest are pathogen and abiotic stress resistance, quality traits for the plant, and reproductive traits determining yield [128,129]. Omics data can now be envisioned as a highly important tool for plant improvement. Additionally, the ability to examine gene expression will allow us to understand how plants respond to and interact with the internal and external milieu. These data may become a crucial tool of future breeding decision management systems [130].

Agriculturally Important Microorganism

With the help of bioinformatics, we can understand the genetic architecture of microorganisms and pathogens and, using metagenomics and transcriptomics approaches, check how these microbes affect the host plant, so that we can generate pathogen-resistant crops and, in addition, identify those microbes which are beneficial for the host [131-133].

Future perspective

With the increase of sequencing projects, bioinformatics continues to make considerable progress in biology by providing scientists with access to genomic information. With cloud-based services on the internet, scientists are now able to freely access volumes of such biological information, which enables the advancement of scientific discoveries in agriculture. The field of biology has undergone several rounds of transformation in the approaches taken, ranging from theoretical to experimental perturbation to discovering molecular components. In the decades to come, it is believed that we will take another giant leap in the bioinformatics field, where computational models of systems-wide properties could serve as the basis for experimentation and discovery. The ramifications of this will be not only the precise understanding of how plant species are built, but also the ability to engineer them to exhibit specified traits, to discover the causality of diseases, and to predict their responses to changes in the environment. This could lead to prevention and targeted treatment of diseases, improved food production, and preservation of the environment [18].

Bioinformatics is now playing a significant role in the development of the agricultural sector, agro-based industries, agricultural by-products utilization and better management of the environment [134]. Genomics, including sequencing of model plant and plant pathogen genomes, has progressed rapidly and opened several opportunities for genetic improvement of crop plants. The high degree of synteny among diverse plant species, commonality in traits, and the availability of expression and function information of sequences have enabled the discovery of many useful traits for crop improvement [135]. Comparative genomics is based on the fact that gene order and content among related plant species are largely conserved over millions of years of evolution [89]. Genome sequencing of several important plant species has enabled researchers to identify 'chromosome' and 'difference' factors in sequences. This in turn has been used to identify valuable traits for crop improvement. For instance, the barley stem rust resistance gene has been identified from rice-barley comparisons and the sugarcane rust resistance gene based on maize-sorghum comparisons [136]. Comparative genomics could help in achieving improvement of yields in rice, maize, and other related grass crops such as barley, rye, sugarcane and wheat. The ability to represent high resolution physical and genetic maps of plants has been one of the great applications of informatics tools. With several display formats available, it has been possible to look for specific positions on the chromosome for large plant genomes.

Some of the areas in agricultural bioinformatics that need focus are data curation and the use of restricted vocabularies. Editing scientific data is important for dissemination of information, and therefore highly curated datasets need to be continuously developed through analysis by experts in the field and the results provided for public use. There is a need for research communities to collaborate and share controlled vocabularies. The efforts of the plant ontology (PO) and gene ontology (GO) consortia would help in a uniform implementation of restricted vocabulary databases. Efforts are being carried out internationally to link existing related databases around the whole world [137]. This will enable instantaneous transfer of knowledge and information on agriculturally related matters and practices. The linking of agricultural information resources would help the whole scientific community around the world to pace the effort in pursuit of cultivating the best varieties of crops and plants.

CONCLUSION

In this overview, current developments in the world of plant bioinformatics/agri-informatics have been discussed, with focus primarily on sequencing, omics, tools, databases and applications. The website URLs mentioned in this paper may change afterwards. Although many tools have been developed for plant-specific studies, there are still many areas we need to focus on in future, such as integration of different kinds of relevant data stored in various databases and data generated from distinct omics technologies, increased use of automated machine learning algorithms in inference of desired results, creation of more informative and user-friendly visualization tools, and encouragement of using mathematical models in new bioinformatics tools. Bioinformatics is an indispensable part of agriculture/plant research; hence we need to encourage more plant researchers to utilize bioinformatics tools to give more statistical meaning to their experimental data, which may lead them to novel and unanticipated outcomes.

ACKNOWLEDGEMENT

Both authors contributed equally to writing the manuscript, and the authors declare that they have no competing interests.

REFERENCES

1. Boserup E. The conditions of agricultural growth: The economics of agrarian change under population pressure. Transaction Publishers. 2005.
2. Lewis WA. Theory of economic growth. Vol. 7. Routledge. 2013.
3. Yang DT, Zhu X. Modernization of agriculture and long-term growth. Journal of Monetary Economics. 2013; 60: 367-382.
4. Taiz L. Agriculture, plant physiology, and human population growth: past, present, and future. Theoretical and Experimental Plant Physiology. 2013; 25: 167-181.
5. Zeder MA. 13 Agricultural origins in the ancient world. Anthropology Explored: The Best of Smithsonian AnthroNotes. 2013.
6. Graham RD, Welch RM. Breeding for staple food crops with high micronutrient density. Intl Food Policy Res Inst. 1996.
7. Nestel P, Bouis HE, Meenakshi JV, Pfeiffer W. Biofortification of staple food crops. J Nutr. 2006; 136: 1064-1067.
8. Svizzero S, Tisdell C. The Neolithic Revolution and human societies: diverse origins and development paths. School of Economics. University of Queensland. 2014.
9. Randhawa MS. Green Revolution. John Wiley and Sons. 1974.
10. Conway GR, Barbier EB. After the green revolution: sustainable agriculture for development. Routledge. 2013.
11. Evenson RE, Gollin D. Assessing the impact of the green revolution, 1960 to 2000. Science. 2003; 300: 758-762.
12. Pingali PL. Green revolution: impacts, limits, and the path ahead. Proc Natl Acad Sci U S A. 2012; 109: 12302-12308.
13. Wishart DS. Current progress in computational metabolomics. Brief Bioinform. 2007; 8: 279-293.
14. Ouzounis CA. Rise and demise of bioinformatics? Promise and progress. PLoS Comput Biol. 2012; 8: e1002487.
15. Mardis ER. A decade's perspective on DNA sequencing technology. Nature. 2011; 470: 198-203.
16. Pareek CS, Smoczynski R, Tretyn A. Sequencing technologies and genome sequencing. J Appl Genet. 2011; 52: 413-435.
17. Bayat A. Science, medicine, and the future: Bioinformatics. BMJ. 2002; 324: 1018-1022.
18. Rhee SY, Dickerson J, Xu D. Bioinformatics and its applications in plant biology. Annu Rev Plant Biol. 2006; 57: 335-360.
19. Thompson GA, Goggin FL. Transcriptomics and functional genomics of plant defence induction by phloem-feeding insects. J Exp Bot. 2006; 57: 755-766.
20. Grattapaglia D, Plomion C, Kirst M, Sederoff RR. Genomics of growth traits in forest trees. Curr Opin Plant Biol. 2009; 12: 148-156.
21. Edwards D, Batley J. Plant genome sequencing: applications for crop improvement. Plant Biotechnol J. 2010; 8: 2-9.
22. Tuberosa R, Salvi S. Genomics-based approaches to improve drought tolerance of crops. Trends Plant Sci. 2006; 11: 405-412.
23. Edwards D, Batley J. Plant bioinformatics: from genome to phenome. Trends Biotechnol. 2004; 22: 232-237.
24. Bender J. Plant epigenetics. Curr Biol. 2002; 12: R412-414.
25. Rapp RA, Wendel JF. Epigenetics and plant evolution. New Phytol. 2005; 168: 81-91.
26. Chen L, Lu L, Feng K, Li W, Song J, Zheng L, et al. Multiple classifier integration for the prediction of protein structural classes. J Comput Chem. 2009; 30: 2248-2254.
27. Hey AJ, Trefethen AE. The data deluge: An e-science perspective. 2003.
28. Gehlenborg N, O'Donoghue SI, Baliga NS, Goesmann A, Hibbs MA, Kitano H, et al. Visualization of omics data for systems biology. Nat Methods. 2010; 7: S56-68.
29. Berger B, Peng J, Singh M. Computational solutions for omics data. Nat Rev Genet. 2013; 14: 333-346.
30. Kircher M, Kelso J. High-throughput DNA sequencing--concepts and limitations. Bioessays. 2010; 32: 524-536.
31. Huang X, Feng Q, Qian Q, Zhao Q, Wang L, Wang A, et al. High-throughput genotyping by whole-genome resequencing. Genome Res. 2009; 19: 1068-1076.
32. Glenn TC. Field guide to next-generation DNA sequencers. Mol Ecol Resour. 2011; 11: 759-769.
33. Alquezar-Planas DE, Fordyce SL. Roche genome sequencer FLX based high-throughput sequencing of ancient DNA. Methods Mol Biol. 2012; 888: 109-118.
34. Ansorge WJ. Next-generation DNA sequencing techniques. N Biotechnol. 2009; 25: 195-203.
35. Grabherr MG, Mauceli E, Ma LJ. Genome sequencing and assembly. Methods Mol Biol. 2011; 722: 1-9.
36. Grath A. Genome sequencing and assembly. Perspectives in Bioanalysis. 2007; 2: 27-355.
37. Newman T, de Bruijn FJ, Green P, Keegstra K, Kende H, McIntosh L, et al. Genes galore: a summary of methods for accessing results from large-scale partial sequencing of anonymous Arabidopsis cDNA clones. Plant Physiology. 1994; 106: 1241-1255.
38. Höfte H, Desprez T, Amselem J, Chiapello H, Rouzé P, Caboche M, et al. An inventory of 1152 expressed sequence tags obtained by partial sequencing of cDNAs from Arabidopsis thaliana. Plant J. 1993; 4: 1051-1061.
39. Initiative AG. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000; 408: 796-815.
40. Richmond T, Somerville S. Chasing the dream: plant EST microarrays. Curr Opin Plant Biol. 2000; 3: 108-116.
41. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990; 215: 403-410.
42. Shendure J, Ji H. Next-generation DNA sequencing. Nat Biotechnol. 2008; 26: 1135-1145.
43. Imelfort M, Edwards D. De novo sequencing of plant genomes using second-generation technologies. Brief Bioinform. 2009; 10: 609-618.



Cite this article Agarwal R, Narayan J (2015) Unraveling the Impact of Bioinformatics and Omics in Agriculture. Int J Plant Biol Res 3(2): 1039.
