Int.J.Curr.Microbiol.App.Sci (2018) 7(4): 3154-3166

International Journal of Current Microbiology and Applied Sciences ISSN: 2319-7706 Volume 7 Number 04 (2018) Journal homepage: http://www.ijcmas.com

Original Research Article https://doi.org/10.20546/ijcmas.2018.704.358

Microbial Ontology for Agriculturally Important Microorganisms (AMO) Coupled with Sequence Alignment Reinforcement Options

Chandan Kumar Deb1*, Saket Kumar Karn1, Madhurima Das2 and Sudeep Marwaha1

1Indian Agricultural Statistics Research Institute, New Delhi-110012, India
2Indian Agricultural Research Institute, New Delhi-110012, India

*Corresponding author

ABSTRACT

Ontology is a knowledge representation technique devised for web-based systems to provide the capability to deal with the semantics of concepts in a specific knowledge domain. A taxonomy, in contrast, describes real-world concepts in a well-defined hierarchy and exists in standard form for various domains in science. The present study dealt with the taxonomy of microorganisms. The Three Domain System is the most widely adopted taxonomy in this domain. It covers the Bacteria, Archaea and Eukarya domains. In this research work, a web-based application has been developed using an N-tier architecture, which extended the previously developed Microbial Ontology to cover the Archaea domain up to the species level. The developed application identified new microorganisms by matching their characteristics. Domain experts can insert, delete and edit information about the microbial taxonomy. The web interface also provided a search facility for finding information about the concepts and 16S rRNA sequences of various Archaea species. The software also facilitated name-based search for microbial taxonomic terms. A sequence alignment tool was also developed in the system for aligning a query sequence with the existing sequences in the ontology. The use of ontologies to represent the taxonomic information, and the ability of this software to provide this knowledge to other applications, increases the utility of this work.

Keywords: Semantic web, Ontology, Bacteria, Archaea, N-tier Architecture

Article Info: Accepted: 26 March 2018; Available Online: 10 April 2018

Introduction

Microbes are indispensable for agriculture and crop productivity, apart from the catastrophic damage they can cause. Proper utilization of microbes is achievable only through explicit knowledge of their domains and the capability of drawing inferences from that knowledge. This is feasible only through an efficient knowledge representation technique: ontology. Ontology is used in agriculture in various ways, as the following examples show.

Gene Ontology (GO): GO was developed by the Gene Ontology Consortium (Ashburner et al., 2000). AmiGO is an HTML-based browser with which one can browse and search GO. GO covers three domains: Molecular Function, Biological Process and Cellular Component.

Plant Ontology (PO): PO was developed by the Plant Ontology Consortium (2002). It deals with plant genome databases and plant systematics to describe phenotype and expression patterns of plant genes.

Designing Ontologies from Traditional Taxonomies: Bedi and Marwaha (2004) proposed a methodology for the conversion of taxonomies to ontologies. The proposed methodology was tested and implemented for a pilot soil ontology using the W3C standard Web Ontology Language (OWL) and the Protégé 2.1 OWL plugin.

Ontology-based intelligent retrieval system for soil knowledge (Ming et al., 2009): This system searches documents related to soils by using a soil domain ontology. Basic classification information in the soil domain ontology is displayed in a tree structure from the navigation database.

Building and Querying Soil Ontology for Agriculture (Das et al., 2012): This work deals with various aspects of the development of web-based software for information regarding USDA Soil Taxonomy. The system describes only the seven soil orders (Alfisols, Aridisols, Entisols, Inceptisols, Mollisols, Ultisols and Vertisols) seen in India. One can classify a newly found soil according to the USDA Soil Taxonomic Classification system up to the Subgroup level. Deb et al. (2015) enhanced the work done by Das (2010): the ontology was extended up to the soil series level for the existing seven soil orders, and five more soil orders were added. It also provides a query interface for adding, deleting and updating information in the soil ontology. Ontology also facilitates sustainable agriculture techniques.

Building and Querying Microbial Ontology (Biswas et al., 2013): This work deals with various aspects of developing web-based software for information regarding the Three Domain System classification of microbial taxonomy for microbes important in agriculture. The system contains information mainly about the microorganisms (Bergey et al., 1989) that are important in agriculture, specifically from Domain to Genus level.

In this work, an attempt has been made to conceptualize and develop an ontology for agriculturally important microorganisms (Madigan et al., 2006). Microbial taxonomy mainly comprises three parts: Classification, Nomenclature and Identification. Taxonomy can be defined as the science of classification, consisting of two parts: identification and nomenclature. 16S rRNA sequence data is an identifying characteristic of Archaea. The Microbial Ontology contains various classes, properties, restrictions and individuals related to basic characteristics, ecology, cell structure, mode of respiration, type of nutrition, shape, Gram staining, etc. In this work the ontology is extended for Archaea from Domain to Species level.

The present study extends the work carried out by Biswas (2012) for the Archaea domain. The extended system also aims to store Archaea microorganisms up to the Species level and to establish the relationship between each of them and its 16S rRNA sequence.

This research work has three objectives: first, to perform requirement analysis for strengthening and enhancing the microbial ontology; second, to develop and populate the microbial ontology; and third, to develop a query interface for querying the ontology.

Materials and Methods

Software development

Tools and techniques used to develop the microbial taxonomy ontology

Microbial Taxonomy Ontology is web-based software that follows the N-tier architecture.


Figure 1 shows the block diagram of the software. The client side interface layer (CSIL) is the front layer, made to communicate with the user, i.e. to take the user query and respond to it. The CSIL is made up of HTML, CSS and JavaScript. The server side application layer (SSAL) is made up of Java Server Pages (JSP) and built on the J2EE platform. The SSAL handles the user query and processes it to get information from the back end of the software. The back end is made up of the database layer (DBL) and the knowledgebase layer (KBL). The DBL is built on the RDBMS (Relational Database Management System) SQL Server 2008, while the KBL is built with Protégé, which follows the standard OWL (Web Ontology Language). The KBL is also able to deal with OWL Lite, OWL DL and OWL Full. The KBL and the semantic web framework layer (SWFL) make the system semantically enabled, so that it can handle complex semantic queries and decision making. The SWFL consists of Jena, a programming framework for handling the Resource Description Framework (RDF), Resource Description Framework Schema (RDFS) and Web Ontology Language (OWL). It contains an implementation of the SPARQL specifications. SPARQL (Clark, 2008) is a query language that obtains information from an RDF graph. Jena is used to store and retrieve information from the ontology. Additionally, this layer uses OWL Protégé, OWL syntax, etc. The Java API is used to edit the ontology through Java. Sequence alignment in this software is done by integrating BioJava into the system.

Sequence diagram of microbial taxonomy ontology

In developing software, designing and visualizing the output is a very important aspect. The most important tool for visualizing the output is the sequence diagram. We designed a sequence diagram to visualize the step-by-step output and the interaction with the software (Fig. 2).

Results and Discussion

The results of our study can be divided into two parts. First, we developed the back end of our web-based software; second, we developed a front end to extract, manipulate and process the information stored in the back end. In the ontology development process, we used the Protégé OWL editor to build the ontology from Domain to Species level, and a query interface was developed to support detailed study of the classification of microorganisms, i.e. microbial taxonomy. In this research work, we enhanced the Microbial Ontology developed by Biswas et al. (2013). The existing ontology was populated with information on bacteria up to the genus level. The Microbial Ontology has been extended to the species level for bacteria, and information on Archaea has been added up to the species level (Domain → Phylum → Class → Order → Family → Genus → Species).

Creating classes, individuals and their properties

The building blocks of ontology development are the classes, individuals and properties of the domain. Figure 3 shows some snapshots of the ontology classes developed in the Microbial Taxonomy Ontology.

In the hierarchy, the class Microbial Taxonomy is the topmost class; it is therefore created as a subclass of the class owl:Thing. The class Microbial Taxonomy has three subclasses: Archaea, Bacteria and Eukarya. Two Phylum classes, Crenarchaeota and Euryarchaeota (as given in Bergey's Manual of Systematic Bacteriology), were created as child classes of the Archaea class. Then the child classes of both phylum classes were created. Likewise, all the hierarchical data were incorporated into the ontology (e.g. Phylum Crenarchaeota has one subclass, Class Thermoprotei, and Class Thermoprotei has three subclasses, i.e. Orders Desulfurococcales, Sulfolobales and Thermoproteales). After creating the class hierarchy of both phyla, the classes representing the properties of the microbes, such as Basic_Characteristics, Nutrition_type, Other_Characteristics and Shape, were created.

After creating the class hierarchy of the Microbial Taxonomy, the next step is to populate the classes with their respective individuals. Individuals are instances of classes. In Protégé, individuals are created in the INDIVIDUAL EDITOR. Individuals of all the subclasses of Microbial Taxonomy carry the same name as their respective classes, written in lowercase letters, because in an ontology two resources cannot exist with the same name.

After the creation of classes and their individuals, the next step is the creation of properties in the ontology (Figures 5 and 6). In the Protégé OWL plug-in, the Property Browser is used for creating properties. Properties are of two types: Object and Datatype. For each property, the Domain and Range have to be specific and clearly defined. The Domain is the class to which the property applies, and the Range is the class from which the property values are taken; e.g. the property has16SrRNAsequence is an object property whose Domain is Archaea and whose Range is Basic_Characteristics.

Querying the microbial ontology

For retrieving knowledge from the ontology, Protégé provides a query interface known as Open SPARQL, a query panel where one can write a query to find particular knowledge in the ontology.

A query interface for querying the Microbial Ontology has been developed. This interface is used to extract information from the OWL ontology layer. The framework layer is implemented using Jena. The system authorizes three types of users, viz. Normal User, Domain Expert (a user with detailed knowledge of microbial taxonomy) and Administrator. Domain Experts are users who can insert, delete and update knowledge in the knowledge base. The Administrator is the owner of the system and has the privilege to add, delete and modify the rights of the various users. After sign-in is verified, all tabs (options) are available to normal users except “Edit Ontology”, which is available only to Domain Experts and the Administrator. The home page of the software is shown in Figure 7. Figure 8 depicts the steps involved in the detailed study of microbial taxonomy with the help of the “Taxonomy” tab, which is available to every user after login. Using this tab, the user can study in detail the microbial taxonomy of Bacteria and Archaea up to the Species level, for ten Phyla, twenty-two Classes, thirty-six Orders, fifty-five Families, one hundred thirty-five Genera and sixty-six Species, as given in “Brock Biology of Microorganisms” and “Bergey's Manual of Systematic Bacteriology”. It shows basic characteristics, cell structure, ecology, shape, nutrition type, respiration mode, 16S rRNA sequence, etc.

This software also facilitates name-based search for all microbial taxonomic terms, as shown in Figure 9, which gives the result for the search term thermoprotus. The software returns the whole hierarchy, from the domain down to the genus, in which the term thermoprotus resides. Every red ellipse is clickable and navigates to the corresponding page of the term.
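The name-based search can be pictured as a lookup that walks from a matched term back up to the topmost class. The following is a minimal sketch in Python under stated assumptions, not the system's own Jena and SPARQL implementation: the `PARENT` table is a small hand-written excerpt of the hierarchy, and the function name `lineage` is hypothetical.

```python
# Child -> parent links for a small excerpt of the hierarchy.
# Illustrative data only; the real ontology stores far more taxa.
PARENT = {
    "Archaea": "Microbial_Taxonomy",
    "Crenarchaeota": "Archaea",
    "Euryarchaeota": "Archaea",
    "Thermoprotei": "Crenarchaeota",
    "Thermoproteales": "Thermoprotei",
    "Thermoproteaceae": "Thermoproteales",
    "Thermoproteus": "Thermoproteaceae",
}


def lineage(term):
    """Return the path from the topmost class down to `term`, or [] if unknown."""
    # Match the search term case-insensitively against every known name.
    known = {name.lower(): name for name in list(PARENT) + list(PARENT.values())}
    name = known.get(term.strip().lower())
    if name is None:
        return []
    path = [name]
    # Climb parent links until the topmost class (which has no parent) is reached.
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


print(" -> ".join(lineage("thermoproteus")))
```

Each name in the returned path corresponds to one clickable node in the result view; a term that is not in the table yields an empty path rather than an error.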


Fig.1 Block diagram representation of the software


Fig.2 Representation of the Sequence diagram of Microbial Taxonomy Ontology


Fig.3 Representation of the entire class hierarchy of Microbial Taxonomy

Fig.4 Representation of the individuals of a class


Fig.5 Representation of the list of all the properties of microbial ontology

Fig.6 Representation of neutrophilus class with its individuals and their properties
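The Domain and Range of a property, as used for has16SrRNAsequence, can be illustrated with a small check. This is a simplified sketch in Python, not Protégé or OWL API code: it treats Domain and Range as a validity test, whereas OWL reasoners use them for inference. The class `ObjectProperty`, the `SUPERCLASS` table and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectProperty:
    name: str
    domain_class: str  # class the property applies to
    range_class: str   # class the property values are taken from


# Subclass -> superclass links (excerpt, illustrative only).
SUPERCLASS = {
    "Crenarchaeota": "Archaea",
    "Euryarchaeota": "Archaea",
    "Archaea": "Microbial_Taxonomy",
    "Bacteria": "Microbial_Taxonomy",
}


def is_a(cls, ancestor):
    """True if `cls` equals `ancestor` or is one of its (transitive) subclasses."""
    current = cls
    while current is not None:
        if current == ancestor:
            return True
        current = SUPERCLASS.get(current)
    return False


def assertion_fits(prop, subject_class, value_class):
    """Check that subject and value fall inside the property's Domain and Range."""
    return is_a(subject_class, prop.domain_class) and is_a(value_class, prop.range_class)


has_seq = ObjectProperty("has16SrRNAsequence", "Archaea", "Basic_Characteristics")
print(assertion_fits(has_seq, "Crenarchaeota", "Basic_Characteristics"))
print(assertion_fits(has_seq, "Bacteria", "Basic_Characteristics"))
```

Because Crenarchaeota is a subclass of Archaea, it falls inside the Domain of the property, while Bacteria does not.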


Fig.7 Representation of the home page of the software

Fig.8 Representation of a detailed study of microbial taxonomy


Fig.9 Representation of the results of name based search

Fig.10 Representation of an advanced search module


Fig.11 Representation of sequence search module

Fig.12 Representation of alignment of the query sequence with the existing sequence and the identification of the species as Archaea
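The two-step sequence lookup (exact match first, alignment for partial sequences) might be sketched as follows. This is an illustration in Python under stated assumptions, not the BioJava-based code of the system: the paper does not say which alignment algorithm or scoring scheme is used, so a Smith-Waterman local alignment score with arbitrary match, mismatch and gap values is swapped in, and the stored sequences are made-up toy strings, not real 16S rRNA data.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def identify(query, stored):
    """Return (species, method): exact match if possible, else best local alignment."""
    q = query.strip().upper()
    for species, seq in stored.items():
        if seq == q:
            return species, "exact"
    best_species = max(stored, key=lambda s: local_alignment_score(q, stored[s]))
    return best_species, "aligned"


# Made-up toy sequences for illustration, NOT real 16S rRNA data.
STORED = {
    "species_A": "ATTCCGGTTGATCCTGCCGGA",
    "species_B": "GGCTACCACATCCAAGGAAGG",
}

print(identify("ATTCCGGTTGATCCTGCCGGA", STORED))  # full sequence known
print(identify("ggttgatcctg", STORED))            # partial sequence
```

A production system would also report the alignment itself and a significance measure, since the best-scoring stored sequence is only the probable species, not a confirmed one.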

Table 1. Some of the query results for the Domain

Sl No. | Functional Attributes | Cell Structure Attributes | Results
1. | Nitrogen Fixation; Organic Matter Decomposition | Cell nucleus not present; Histone protein present; Ribosome sedimentation value is 70S | Archaea
2. | Nitrogen Fixation; Organic Matter Decomposition | Cell nucleus not present; Histone protein not present; Ribosome sedimentation value is 70S | Bacteria
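The property-based search behind Table 1 can be read as matching a set of selected attributes against stored attribute profiles. A minimal sketch in Python follows; the attribute strings are transcribed from Table 1, while the matching rule (a taxon is probable when its profile contains every selected attribute) and the names `DOMAIN_PROFILES` and `probable_taxa` are assumptions made for illustration, since the paper does not specify how the match is computed.

```python
# Attribute profiles per domain, transcribed from Table 1.
SHARED = {
    "Nitrogen Fixation",
    "Organic Matter Decomposition",
    "Cell nucleus not present",
    "Ribosome sedimentation value is 70S",
}
DOMAIN_PROFILES = {
    "Archaea": SHARED | {"Histone protein present"},
    "Bacteria": SHARED | {"Histone protein not present"},
}


def probable_taxa(selected, profiles):
    """Return the taxa whose profile contains every selected attribute."""
    wanted = set(selected)
    return sorted(name for name, attrs in profiles.items() if wanted <= attrs)


print(probable_taxa(["Cell nucleus not present", "Histone protein present"], DOMAIN_PROFILES))
print(probable_taxa(["Cell nucleus not present"], DOMAIN_PROFILES))
```

In this sketch, selecting only attributes shared by both domains returns both of them, which shows why a distinguishing attribute such as histone protein presence is needed to narrow the result to one domain.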


Table 2. Some of the query results for the Phylum

Sl No. | Functional Attributes | Shape | Other Characteristics | Results
1. | Ammonia Oxidizing Archaea; Photosynthesis | Coccus, Rod, Filamentous | Anaerobic thermophilic and fermentative; Gram negative | Crenarchaeota
2. | Halophilic archaea; Thermophilic archaea | Bacillus, Coccus, Disc, Filamentous, Rod | All major nutritional types; These are mainly halophiles and methanogens | Euryarchaeota
3. | Bio remediation; Acetic acid bacteria | Bacillus | Low G C DNA composition; Gram negative | Proteobacteria

Table 3. Some of the query results for the Family

Sl No. | Functional Attributes | Shape | Other Characteristics | Results
1. | Ammonia Oxidizing Archaea; Nitrogen fixation | Rod | Cells are gram positive; Nutrition type Chemolithotrophic | Thermoproteaceae
2. | Nitrogen fixation; Chemolithotrophic | Spiral | Nitrite oxidizing bacteria; Important for healthy marine ecosystems | Nitrospiraceae

Table 4. Some of the query results for the Genus

Sl No. | Functional Attributes | Shape | Other Characteristics | Results
1. | Sulfur reduction; Thermophilic archaea | Rod | Chemoorganotrophs; Gram negative cells; Rod shaped | Thermoproteus
2. | Photosynthesis; Thermophilic archaea | Coccus | Anaerobic respiration; Gram negative cells; Photosynthetically helpful | Pyrococcus
3. | Chemoorganotrophic; Thermophilic archaea | Filamentous, Spherical | Aerobic respiration; Gram negative cells; Unicellular organism | Thermoplasma
4. | Green sulfur bacteria; Photosynthesis | Spherical | Anaerobic respiration; Cell division by fission; Gram negative cells | Chlorobium


Table 5. Some of the query results for the Species

Sl No. | Functional Attributes | Shape | Other Characteristics | Results
1. | Chemolithotrophic; Nitrogen fixation | Rod | Binary fission; Gram negative cell; Nonmotile and lack flagella | Thermoproteus neutrophilus
2. | Chemolithotrophic; Thermophilic archaea | Rod | Gram negative cell; Nonmotile and lack flagella; Mode of respiration Anaerobic | Thermoproteus tenax
3. | Chemoorganotrophic; Halophilic archaea | Disc | Gram positive cell; Mode of respiration Aerobic; Nonmotile and lack flagella | Haloferax volcanii

Property based search or advanced search

The Advanced Search module is shown in Figure 10. It is an advanced version of the name-based search described in the previous section and is found under the “Advanced Search” tab. This search covers every level of the microbial taxonomy hierarchy, from domain to species. Every level of the hierarchy has some typical attributes that separate one taxon from another, and we tried to capture those characteristics of the microbial taxonomy.

Figure 10 shows the probable Domain, which is Archaea. The user can study the Domain Archaea in detail by clicking More Information. Similarly, one can search for Phylum, Family, Genus and Species with the Advanced Search module (Tables 1, 2, 3, 4 and 5).

Sequence search and alignment

The microbial ontology contains the 16S rRNA sequences of Archaea. This information is not only displayed; we take it one step further. If users have an unknown sequence, they can use the Sequence Search tab to check whether it exactly matches one of the existing sequences (Figure 11). Otherwise, if users have partial sequence data, they can align their sequence with the existing sequences and identify the probable Archaea species (Figure 12).

Provision for editing knowledge base (Ontology) by domain experts

The Microbial Taxonomy Ontology knowledge base can be edited by domain experts if there is a wrong entry by the system developer or if new information becomes available about particular microorganisms. This is done using the Edit Ontology tab. On clicking this tab, an interface guides the domain expert through the editing. After proper review of the changes made by domain experts, the final change may be committed to the ontology.

Microbial Taxonomy Ontology is a rich repository of information on agriculturally important microorganisms, Bacteria and Archaea. This system will be beneficial for the community of microbiologists and agriculturalists worldwide. The taxonomic description will help in the detailed study of agriculturally important Bacteria and Archaea. Apart from the sequential study of the taxonomy, the system enables the user to search for any term related to the microbial taxonomy, which is called the name-based search. Because the term-based or name-based search is not sufficient on its own, Microbial Taxonomy Ontology also provides the advanced search module, or property-based search module. This module lets the user select the special characteristics of a particular level of the hierarchy (e.g. Domain, Phylum, etc.). On the basis of the property combination, it gives the probable taxon that matches the particular set of characteristics. As discussed earlier, the system provides both sequence search and sequence alignment. Both types of sequence search can be of utmost importance to microbiologists as well as to experts in the field of bioinformatics. The system also has a secure login facility to maintain the user privileges.

Ontology is applied in several research areas, including database design and integration, information retrieval and extraction, software engineering and natural language processing. The knowledge base of this software can be enriched with information on all microorganisms based on the Taxonomic Classification, so as to classify any recognized microorganism. There is scope for extending the knowledge base up to the strain level, making it a tool for other areas of microbiology, such as industrial microbiology, marine microbiology, medical microbiology, etc.

Acknowledgments

We gratefully acknowledge the INSPIRE Fellowship provided by Department of Science and Technology, New Delhi and ICAR-JRF Fellowship provided by Indian Council of Agricultural Research.

References

Ashburner M, Ball C A, Blake J A, Botstein D, Butler H, Cherry J M, and Harris M A., 2000. Gene Ontology: tool for the unification of biology. Nature genetics 25: 25.
Bedi P, Marwaha S., 2004. Designing Ontologies from Traditional Taxonomies. Proceedings of International Conference on Cognitive Science 324-329.
Bergey D H, Harrison FC, Breed RS, Hammer BW, Huntoon FM., 1989. Bergey’s Manual of Systematic Bacteriology 3.
Biswas S, Marwaha S, Malhotra P K, Wahi S D, Dhar D W, and Singh R., 2013. Building and querying microbial ontology. Procedia Technology, 10:13-19.
Clark K., 2008. SPARQL protocol for RDF, W3C Recommendation, http://www.w3.org/TR/rdf-sparql-protocol/.
Das M, Malhotra PK, Marwaha S, Pandey RN., 2012. Building and Querying Soil Ontology. Journal of the Indian Society of Agricultural Statistics. 66(3): 459-464.
Deb C K, Marwaha S, Malhotra P K, Wahi S D, Pandey R N., 2015. Strengthening soil taxonomy ontology software for description and classification of USDA soil taxonomy up to soil series. Proceedings of International Conference on Computing for Sustainable Global Development (INDIACom) IEEE. 1180-1184.
Madigan MT, Martinko JM, Parker J., 2006. Brock Biology of Microorganisms Eleventh Edition. USA: Pearson Prentice Hall.
Ming Z, Qingling Z, Dong T, Ping Q, Xiaoshuan Z., 2009. Ontology-based intelligent retrieval system for soil knowledge. Wseas transactions on information science and applications 6: 7.
Plant Ontology Consortium. 2002. The Plant Ontology consortium and plant Ontologies. International Journal Plant Genomics. 3: 137-142.

How to cite this article:

Chandan Kumar Deb, Saket Kumar Karn, Madhurima Das and Sudeep Marwaha. 2018. Microbial Taxonomy Ontology for Agriculturally Important Microorganisms (AMO) Coupled with Sequence Alignment Reinforcement Options. Int.J.Curr.Microbiol.App.Sci. 7(04): 3154-3166. doi: https://doi.org/10.20546/ijcmas.2018.704.358
