Development of Front End Tools for Semantic Grid Services


TECHNICAL PROGRESS REPORT (CDAC-MIT-Garuda-TR-3-MAR- 2006) (April, 2005 – March, 2006)

Submitted to

Centre for Development of Advanced Computing, Bangalore

By

Dr S. THAMARAI SELVI, CHIEF INVESTIGATOR

Department of Information Technology M.I.T. Campus of Anna University Chromepet, Chennai – 600 044.

Technical Report (April, 2005 – March, 2006) 1 IT@MIT Development of Front End tools for Semantic Grid Services

Abstract This project proposes several approaches to implementing the knowledge layer of the semantic grid architecture. One of our approaches enables service providers to create their own service ontology and to perform matchmaking of requested services against advertised ones. Using this approach, we propose a Semantic Grid architecture that uses a Protégé-enabled Globus Toolkit to create the service ontology of a grid service. The Parameter Matchmaking Algorithm implemented in the discovery module of the architecture matches requested descriptions against advertised descriptions and computes the degree of similarity between them. In the other approach, we implement a semantic description module that describes grid resources semantically using an ontology template, and a resource discovery module that uses an inference engine to discover the closest-matching grid resource. The semantic grid architecture based on this approach uses the Gridbus broker architecture at its lower layer for job execution. Here, the knowledge base of grid resources will be created using the Protégé-OWL APIs, the grid resource information is obtained from the GIIS of the Globus Toolkit, and the discovery module uses the Algernon inference engine to find the resource that best matches the user’s request. Finally, we also outline a knowledge layer that can be implemented over the grid architecture using the proposed approaches. We propose to develop GridWSDL2OWL-S, which converts the WSDL files of WSRF services into OWL-S descriptions that can then be used to match grid services against requested ones. Ontology clustering is identified as a technique for improving the performance of matchmaking in the proposed system. Keywords Grid Computing, Globus Toolkit, Semantic Web, Ontology, Inference engine, Grid Resource Broker.


1. Introduction Until recently, application developers could often assume a target environment that was homogeneous, reliable and centrally managed. Increasingly, however, computing is concerned with collaboration, data sharing and other new modes of interaction that involve distributed resources. The result is an increased focus on the interconnection of systems both within and across enterprises, whether in the form of intelligent networks, switching devices, caching services, appliance servers, storage systems, or storage area network management systems. Moreover, the continuing decentralization and distribution of software, hardware and human resources make it essential to achieve the desired quality of service on resources assembled dynamically from enterprise systems, service provider systems and customer systems. We require new abstractions and concepts that allow applications to access and share resources and services across distributed, wide-area networks. Grid technologies address precisely these problems and are seeing widespread and successful adoption for scientific and technical computing. Grid technologies support the sharing and coordinated use of diverse resources in dynamic virtual organizations (VOs), that is, the creation, from geographically and organizationally distributed components, of virtual computing systems that are sufficiently integrated to deliver the desired QoS [2]. Reference [1] defines the essential properties of Grids and introduces the key requirements for protocols and services, distinguishing among connectivity protocols concerned with communication and authentication, resource protocols concerned with negotiating access to individual resources, and collective protocols and services concerned with the coordinated use of multiple resources. Grid concepts and technologies were first developed to enable resource sharing within far-flung scientific collaborations.
Applications include collaborative visualization and analysis of large scientific datasets (pooling of computer power and storage), and coupling of scientific instruments with remote computers and archives (increasing functionality as well as availability) [1]. We can expect similar applications to become important in commercial settings, initially for scientific and technical computing applications and then for commercial distributed computing applications, including enterprise application integration and business-to-business partner collaboration over the Internet.

Fig 1.1: A typical Virtual Organization

Grid concepts are critically important for commercial computing, not primarily as a means of enhancing capability, but rather as a solution to new challenges relating to the construction of reliable, scalable and secure distributed systems. A Grid instantiation is a grid system prototype built with one or more grid middleware technologies such as Globus, Condor or UNICORE. Most current grid instantiations focus on computational services for end users; they lack the ability to provide domain problem-solving services and knowledge-related services. In addition, fundamental research on the semantic web has allowed the grid community to move from the current data-centric view of the Grid towards a semantic grid with a set of domain-specific problem-solving services and knowledge services. The semantic Grid is a service-oriented architecture in which entities provide services to one another under various forms of contract. It is characterized by an open system with a high degree of automation, which supports flexible collaboration and computation on a global scale. In such an environment it is essential that information relating to the needs of users and their applications, and to resource providers and their networking, storage and computational resources, is well defined and exposed through easily discoverable interfaces, so that higher-level services can use it to exploit the grid effectively. Similar to the semantic web, the semantic grid is described as an “extension of the current Grid in which information and services are given well defined meaning, better enabling computers and people to work in cooperation”. However, in a heterogeneous multi-institutional environment such as the Grid, it is difficult to enforce the syntax and semantics of resource and service descriptions. Hence, in such distributed computing environments, where resources come and go dynamically, there is a demand for a framework that supports the semantic description and semantic discovery of services and resources. In this project, we propose two different approaches to implementing the knowledge layer of the semantic grid architecture. The first approach addresses the semantic description and discovery of services. Using this approach, we propose a semantic grid architecture using PEG (Protégé Enabled Globus Toolkit), in which the Protégé editor is integrated with the Globus Toolkit for the semantic description of services. The Parameter Matchmaking Algorithm introduced in the knowledge layer performs semantic matching of advertised services against requested ones and computes their degree of closeness. The second approach addresses the semantic description of the computing resources present in the grid using a grid resource ontology template. With this approach, we implement a five-layered semantic grid architecture that uses the Gridbus broker at the lower layer for job submission and execution. This architecture also addresses the semantic discovery of grid resources using the Algernon inference engine.
Since it is built on the Gridbus broker architecture, it supports most grid middleware, including Globus and Alchemi. Finally, we also propose a functional model of the knowledge layer for the semantic description of computing resources and services. The model will enable QoS-based matchmaking of requested services against the available ones. It also exploits ontology clustering techniques to improve the efficiency of the matchmaking algorithm. The rest of the document is organized as follows. Section 2 provides the background needed for semantic grid services and their evolution. Section 3 briefly describes semantic web services and their architecture, together with the motivation for semantic grid services. Section 4 introduces Semantic Grid Services and their conceptual layer, and discusses the research issue. Section 5 discusses related technology, including the concept of ontology, in detail. The issues and difficulties in implementation are described in Section 6. Section 7 focuses on the approach proposed in our research work and discusses the motivation behind it. Section 8 discusses one of our research works, which proposes a semantic grid architecture using the Protégé Enabled Globus Toolkit. Section 9 proposes the implementation of the knowledge layer using an ontology template in the semantic grid architecture. Section 10 discusses a case study in the WSMX environment. The further scope of this research work is detailed in Section 11. Section 12 concludes the document by highlighting the overall objective and advantages of the proposed approach.

2. Background The establishment, management, and exploitation of dynamic, cross-organizational VO sharing relationships require new technology and architecture. A typical Grid application will usually consist of several different components. For example, a typical grid application could have:
• VO Management Service: to manage which nodes and users are part of each Virtual Organization.
• Resource Discovery and Management Service: so that applications on the grid can discover resources that suit their needs and then manage them.
• Job Management Service: so that users can submit tasks, in the form of jobs, to the grid.
• Various other services, such as security and data management.
Furthermore, all these services interact constantly. For example, the job management service might consult the resource discovery service to find computational resources that match the job’s requirements. With so many services and so many interactions between them, there is potential for chaos. What if every vendor decided to implement a Job Management Service in a completely different way, exposing not only different functionality but also different interfaces? It would be very difficult to get all the different pieces of software to work together. The solution is standardization: define a common interface for each type of service. The Open Grid Services Architecture (OGSA), developed by the Global Grid Forum, aims to define a common, standard, and open architecture for grid-based applications. The goal of OGSA is to standardize practically all the services one commonly finds in a grid application (job management services, resource management services, security services, etc.) by specifying a set of standard interfaces for these services. The Grid Architecture identifies fundamental system components, specifies the purpose and functions of these components, and indicates how these components interact with one another. Fig 2.1 shows the grid architecture defined in [2].

Fig 2.1: The layered Grid Architecture (Grid protocol architecture layers, top to bottom: Application, Collective, Resource, Connectivity, Fabric)

Fabric Layer The Grid Fabric layer provides the resources to which shared access is mediated by Grid protocols: for example, computational resources, storage systems, catalogs, network resources and sensors. A “resource” may be a logical entity, such as a distributed file system, computer cluster, or distributed computer pool. Fabric components implement the local, resource-specific operations that occur on specific resources as a result of sharing operations at higher levels. There is thus a tight interdependence between the functions implemented at the fabric level, on the one hand, and the sharing operations supported, on the other. Connectivity Layer The Connectivity layer defines the core communication and authentication protocols required for Grid-specific network transactions. Communication protocols enable the exchange of data between Fabric layer resources. Authentication protocols build on communication services to provide cryptographically secure mechanisms for verifying the identity of users and resources. Resource Layer The Resource layer builds on Connectivity layer protocols to define protocols (including APIs and SDKs) for the secure negotiation, initiation, monitoring, control, accounting and payment of sharing operations on individual resources. Resource layer implementations of these protocols call Fabric layer functions to access and control local resources. Resource layer protocols are concerned entirely with individual resources and hence ignore issues of global state and atomic actions across distributed collections; such issues are the concern of the Collective layer. Collective Layer The Collective layer contains protocols (including APIs and SDKs) and services that are not associated with any one specific resource but rather are global in nature and capture interactions across collections of resources. Because Collective components build on the narrow Resource and Connectivity layer “neck” in the protocol hourglass, they can implement a wide variety of sharing behaviors without placing new requirements on the resources being shared. Application Layer The final layer in the architecture comprises the user applications that operate within a VO environment. Applications are constructed in terms of, and by calling upon, services defined at any layer. While creating OGSA, its designers realized that they needed to choose some sort of distributed middleware on which to base the architecture.
In other words, if OGSA, for example, defines that the JobSubmissionInterface has a submitJob method, there has to be a common and standard way to invoke that method if we want the architecture to be adopted as an industry-wide standard. In theory, this base for the architecture could be any distributed middleware; Web services were chosen as the underlying technology. However, although the Web Services architecture was certainly the best option, it still did not meet one of OGSA’s most important requirements: the underlying middleware had to be stateful. Unfortunately, although Web services can in theory be either stateless or stateful, they are usually stateless, and there is no standard way of making them stateful. To overcome this limitation, the Web Services Resource Framework (WSRF), a specification standardized at OASIS, specifies how Web services can be made stateful, along with adding many other features. It is important to note that WSRF is a joint effort by the Grid and Web Services communities, so it fits naturally inside the overall Web Services Architecture. The relationship between OGSA and WSRF follows directly: WSRF provides the stateful services that OGSA needs. In other words, while OGSA is the architecture, WSRF is the infrastructure on which that architecture is built. Before taking a closer look at WSRF, we need some background on Globus Toolkit 4.0 and Web Services.
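To make the standardization argument above concrete, the following Python sketch shows two hypothetical vendor implementations of the JobSubmissionInterface mentioned in the text, both exposing the same submitJob method, so that client code is written once against the interface. The vendor classes, the argument list and the job-identifier formats are illustrative assumptions, not part of OGSA, and the sketch covers only the shared interface, not the common invocation mechanism (Web services) that the text goes on to discuss.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class JobSubmissionInterface(ABC):
    """Hypothetical standardized interface: every vendor must provide submitJob."""

    @abstractmethod
    def submitJob(self, executable: str, args: Sequence[str]) -> str:
        """Submit a job and return a job identifier."""


class VendorAJobService(JobSubmissionInterface):
    """One vendor's implementation: numbers jobs with a counter."""

    def __init__(self) -> None:
        self._count = 0

    def submitJob(self, executable: str, args: Sequence[str]) -> str:
        self._count += 1
        return f"A-{self._count}"


class VendorBJobService(JobSubmissionInterface):
    """Another vendor's implementation: keeps a queue of submitted jobs."""

    def __init__(self) -> None:
        self._queue: List[tuple] = []

    def submitJob(self, executable: str, args: Sequence[str]) -> str:
        self._queue.append((executable, list(args)))
        return f"B-job-{len(self._queue)}"


def submit_everywhere(services: List[JobSubmissionInterface],
                      executable: str, args: Sequence[str]) -> List[str]:
    """Client code written once against the interface works with any vendor."""
    return [service.submitJob(executable, args) for service in services]


ids = submit_everywhere([VendorAJobService(), VendorBJobService()], "/bin/hostname", [])
print(ids)  # ['A-1', 'B-job-1']
```

The camelCase method name is kept only to match the interface named in the text; internally, each vendor is free to do something different, which is exactly what a standard interface permits.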

Fig 2.2: Relationship between OGSA, WSRF and Web Services (OGSA requires stateful Web services; WSRF specifies stateful Web services, which extend Web services)


2.1 Web Services The term Web services describes an important emerging distributed computing paradigm that differs from other approaches such as DCE, CORBA, and Java RMI in its focus on simple, Internet-based standards (e.g., eXtensible Markup Language: XML [4, 5]) to address heterogeneous distributed computing. Web services define a technique for describing software components to be accessed, methods for accessing these components, and discovery methods that enable the identification of relevant service providers [3]. Web services are neutral with respect to programming language, programming model, and system software. Web services standards are being defined within the W3C and other standards bodies and form the basis for major new industry initiatives such as Microsoft (.NET), IBM (Dynamic e-Business), and Sun (Sun ONE). Three standards are of main concern here: SOAP, WSDL, and WS-Inspection.
• The Simple Object Access Protocol (SOAP) [6] provides a means of messaging between a service provider and a service requestor. SOAP is a simple enveloping mechanism for XML payloads that defines a remote procedure call (RPC) convention and a messaging convention. SOAP is also independent of the underlying transport protocol: HTTP, FTP, Java Messaging Service (JMS), and the like can all carry SOAP messages. We emphasize that Web services can describe multiple access mechanisms to the underlying software component; SOAP is just one means of formatting a Web service invocation.
• The Web Services Description Language (WSDL) [7] is an XML format for describing Web services as a set of endpoints operating on messages containing either document-oriented (messaging) or RPC payloads. Service interfaces are defined abstractly in terms of message structures and sequences of simple message exchanges (or operations, in WSDL terminology) and then bound to a concrete network protocol and data-encoding format to define an endpoint. Related concrete endpoints are bundled to define abstract endpoints (services). WSDL is extensible to allow description of endpoints and the concrete representation of their messages for a variety of different message formats and network protocols. Several standardized binding conventions are defined describing how to use WSDL in conjunction with SOAP 1.1, HTTP GET/POST and MIME.
• WS-Inspection [8] comprises a simple XML language and related conventions for locating service descriptions published by a service provider. A WS-Inspection language (WSIL) document can contain a collection of service descriptions and links to other sources of service descriptions. A service description is usually a URL to a WSDL document; occasionally, it can be a reference to an entry within a Universal Description, Discovery, and Integration (UDDI) [9] registry. A link is usually a URL to another WS-Inspection document; occasionally, a link is a reference to a UDDI entry. With WS-Inspection, a service provider creates a WSIL document and makes it network accessible. Service requestors use standard Web-based access mechanisms (e.g., HTTP GET) to retrieve this document and discover what services the service provider advertises. WSIL documents can also be organized in different forms of index.
The Web services framework has two advantages for Grid architecture. First, the need to support the dynamic discovery and composition of services in heterogeneous environments necessitates mechanisms for registering and discovering interface definitions and endpoint implementation descriptions, and for dynamically generating proxies based on (potentially multiple) bindings for specific interfaces. WSDL supports this requirement by providing a standard mechanism for defining interface definitions separately from their embodiment within a particular binding (transport protocol and data encoding format).
Second, the widespread adoption of web services mechanisms means that a framework based on Web services can exploit numerous tools and extant services, such as WSDL processors that can generate language bindings for a variety of languages (e.g., Web Services Invocation Framework: WSIF), workflow systems that sit on top of WSDL, and hosting environments for Web services (e.g., Microsoft .NET and Apache Axis). In [4], it is emphasized that the use of Web services does not imply the use of SOAP for all communications. If needed, alternative transports can be used, for example to achieve higher performance or to run over specialized network protocols.
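The separation WSDL makes between an abstract interface (a portType and its operations) and a concrete binding can be seen by reading a WSDL 1.1 document with Python's standard xml.etree.ElementTree module. The WSDL below is a hypothetical, heavily trimmed description of an accumulator-style service (a real SOAP binding would also list each operation with its encoding details); only the two namespace URIs, those of WSDL 1.1 and of its SOAP binding extension, are standard. Note that a program reading it recovers only names and message references, not what the operation means.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
SOAP_NS = "http://schemas.xmlsoap.org/wsdl/soap/"
NS = {"wsdl": WSDL_NS, "soap": SOAP_NS}

# Hypothetical, trimmed WSDL 1.1 description of an accumulator-like service.
WSDL_DOC = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns:tns="http://example.org/accumulator"
    targetNamespace="http://example.org/accumulator">
  <portType name="AccumulatorPortType">
    <operation name="add">
      <input message="tns:AddRequest"/>
      <output message="tns:AddResponse"/>
    </operation>
  </portType>
  <binding name="AccumulatorSoapBinding" type="tns:AccumulatorPortType">
    <soap:binding style="document"
                  transport="http://schemas.xmlsoap.org/soap/http"/>
  </binding>
</definitions>"""


def abstract_operations(root):
    """Map each portType operation to its (input, output) message names."""
    ops = {}
    for port_type in root.findall("wsdl:portType", NS):
        for op in port_type.findall("wsdl:operation", NS):
            ops[op.get("name")] = (
                op.find("wsdl:input", NS).get("message"),
                op.find("wsdl:output", NS).get("message"),
            )
    return ops


def concrete_bindings(root):
    """Map each binding to the portType it implements and its transport URI."""
    result = {}
    for binding in root.findall("wsdl:binding", NS):
        soap_binding = binding.find("soap:binding", NS)
        transport = soap_binding.get("transport") if soap_binding is not None else None
        result[binding.get("name")] = (binding.get("type"), transport)
    return result


root = ET.fromstring(WSDL_DOC)
print(abstract_operations(root))
print(concrete_bindings(root))
```

Because the interface and the binding are separate elements, a second binding (for example over another transport) could be added for the same portType without touching the operation definitions.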


2.2 WSRF Plain Web services are usually stateless (even though, in theory, there is nothing in the Web Services Architecture that says they can't be stateful). This means that the Web service can't "remember" information, or keep state, from one invocation to another. For example, imagine we want to program a very simple Web service that acts as an integer accumulator. This accumulator is initialized to zero, and we want to be able to add (accumulate) values to it. Suppose we have an add operation that receives the value to add and returns the current value of the accumulator. The first invocation of this operation might seem to work (we request that 5 be added, and we receive 5 in return). However, since the Web service is stateless, subsequent invocations have no idea of what was done in previous invocations. So, if the second call requests that 6 be added, we get back 6 instead of 11 (which would be the expected value if the Web service were able to keep state). Fig 2.2.1 shows the same behavior for a sequence of three requests.

Fig 2.2.1: A stateless invocation (a Web client sends the requests Add 6, Add 8 and Add 12 to the Web service and receives the responses 6, 8 and 12)

The fact that Web services don’t keep state information is not necessarily a bad thing. There are plenty of applications that have no need whatsoever for statefulness. However, Grid applications generally do require statefulness, so we would ideally like the Web service to keep state information somehow.
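The stateless behavior of Fig 2.2.1 can be mimicked with a few lines of Python. The class below is a hypothetical stand-in for a Web service operation, not Globus code: because add keeps nothing between calls, every response simply echoes the value that was sent.

```python
class StatelessAccumulator:
    """Stand-in for a stateless Web service: nothing survives between calls."""

    def add(self, value: int) -> int:
        total = 0  # state is recreated from scratch on every invocation
        total += value
        return total


service = StatelessAccumulator()
responses = [service.add(v) for v in (6, 8, 12)]
print(responses)  # [6, 8, 12], matching Fig 2.2.1
```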


Fig 2.2.2: A stateful invocation (the same requests Add 6, Add 8 and Add 12 update a stored state of 6, 14 and 26, which is returned in each response)

Giving Web services the ability to keep state information while still keeping them stateless seems like a complex problem. Fortunately, it has a very simple solution: keep the Web service and the state information completely separate. Instead of putting the state in the Web service (thus making it stateful, which is generally regarded as a bad thing), it is kept in a separate entity called a resource, which stores all the state information. Each resource has a unique key, so whenever we want a stateful interaction with a Web service, we simply instruct the Web service to use a particular resource. This pairing of a Web service with a resource is called a WS-Resource (Fig 2.2.3).
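The separation described above can be sketched in Python as follows, assuming a plain dictionary plays the role of the store that holds resources (GT4 implements this in Java with its own classes; every name here is hypothetical). The add operation holds no accumulator value of its own: it looks up the resource named by the key it is given, so two clients using different keys do not interfere.

```python
import uuid
from dataclasses import dataclass
from typing import Dict


@dataclass
class AccumulatorResource:
    """The resource: holds the state (here, a single integer)."""
    value: int = 0


class AccumulatorService:
    """Service logic with no accumulator value of its own.

    All state lives in resources, each identified by a unique key.
    """

    def __init__(self) -> None:
        self._resources: Dict[str, AccumulatorResource] = {}  # key -> resource

    def create_resource(self) -> str:
        key = uuid.uuid4().hex  # unique key identifying the new resource
        self._resources[key] = AccumulatorResource()
        return key

    def add(self, key: str, value: int) -> int:
        resource = self._resources[key]  # the key selects which state to use
        resource.value += value
        return resource.value


service = AccumulatorService()
key_a = service.create_resource()
key_b = service.create_resource()
responses = [service.add(key_a, v) for v in (6, 8, 12)]
print(responses)              # [6, 14, 26], matching Fig 2.2.2
print(service.add(key_b, 5))  # 5: a different resource keeps its own state
```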

Fig 2.2.3: WS-Resource (a Web service with several resources, each identified by a unique key: resource 0xF56433FA with Filename “tutorial.zip”, Size 750, Descriptors {“Globus”, “Tutorial”}; resource 0x09EB23FA with Filename “mynotes.doc”, Size 250, Descriptors {“notes”, “Globus”}; resource 0x106E43FA with Filename “pacman.exe”, Size 150, Descriptors {“game”}. The Web service together with resource 0x09EB23FA forms a WS-Resource.)


The address of a particular WS-Resource is called an endpoint reference. The difficulty in this approach is how to specify which resource must be used. A URI might be enough to address the Web service, but something more is needed to address the resource. The WS-Addressing specification overcomes this difficulty by providing a more versatile way of addressing Web services.
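An endpoint reference can be pictured as the service address plus the resource key. The Python sketch below builds and reads such a reference with xml.etree.ElementTree. It assumes the WS-Addressing 2004/03 namespace and a ReferenceProperties element carrying the key, as in the WS-Addressing drafts of that period; the service URL, the key namespace and the AccumulatorKey element name are hypothetical, and the exact namespace and element names should be checked against the toolkit version in use.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Assumed WS-Addressing namespace (2004/03 version); verify for your toolkit.
WSA = "http://schemas.xmlsoap.org/ws/2004/03/addressing"
# Hypothetical namespace for the resource key element.
KEY_NS = "http://example.org/accumulator"
ET.register_namespace("wsa", WSA)


def make_epr(address: str, key: str) -> ET.Element:
    """Build an endpoint reference: service address + resource key."""
    epr = ET.Element(f"{{{WSA}}}EndpointReference")
    ET.SubElement(epr, f"{{{WSA}}}Address").text = address
    props = ET.SubElement(epr, f"{{{WSA}}}ReferenceProperties")
    ET.SubElement(props, f"{{{KEY_NS}}}AccumulatorKey").text = key
    return epr


def resolve(epr: ET.Element) -> Tuple[Optional[str], Optional[str]]:
    """Recover which service to call and which resource it should use."""
    address = epr.findtext(f"{{{WSA}}}Address")
    key = epr.findtext(f"{{{WSA}}}ReferenceProperties/{{{KEY_NS}}}AccumulatorKey")
    return address, key


# Hypothetical service URL; the key reuses a resource key from Fig 2.2.3.
epr = make_epr("http://localhost:8080/wsrf/services/AccumulatorService", "0x09EB23FA")
print(ET.tostring(epr, encoding="unicode"))
print(resolve(epr))
```

The URI alone would identify only the service; the extra ReferenceProperties child is what lets the same service be paired with a specific resource.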

2.3 The WSRF specification WSRF is a collection of four different specifications, all of which relate to the management of WS-Resources. WS-ResourceProperties A resource is composed of zero or more resource properties. For example, as shown in Fig 2.2.3, each resource has three resource properties: Filename, Size, and Descriptors. WS-ResourceProperties specifies how resource properties are defined and accessed. The resource properties are defined in the Web service's WSDL interface description. WS-ResourceLifetime Resources have non-trivial lifecycles. In other words, they are not static entities that are created when our server starts and destroyed when our server stops; resources can be created and destroyed at any time. WS-ResourceLifetime supplies some basic mechanisms to manage the lifecycle of our resources. WS-ServiceGroup We will often be interested in managing groups of Web services or groups of WS-Resources, and performing operations such as 'add new service to group', 'remove this service from group', and (more importantly) 'find a service in the group that meets condition FOOBAR'. WS-ServiceGroup specifies how exactly we should go about grouping services or WS-Resources together. Although the functionality provided by this specification is very basic, it is nonetheless the base of more powerful discovery services (such as GT4's IndexService), which allow us to group different services together and access them through a single point of entry (the service group).


WS-BaseFaults Finally, this specification aims to provide a standard way of reporting faults when something goes wrong during a WS-Service invocation. Related specifications WS-Notification WS-Notification is another collection of specifications that, although not part of WSRF, is closely related to it. This specification allows a Web service to be configured as a notification producer, and certain clients to be notification consumers (or subscribers). This means that if a change occurs in the Web service (or, more specifically, in one of the WS-Resources), that change is notified to all the subscribers (not all changes are notified, only the ones the Web services programmer wants). WS-Addressing As mentioned before, the WS-Addressing specification provides a mechanism to address Web services that is much more versatile than plain URIs. In particular, we can use WS-Addressing to address a Web service + resource pair (a WS-Resource).
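The lifetime and grouping ideas above can be illustrated with a Python sketch built around the three resources of Fig 2.2.3. It is not an implementation of the WSRF specifications (there is no WSDL, SOAP or fault handling): ResourceHome, ServiceGroup, their methods and the termination time used below are hypothetical names chosen to mirror WS-ResourceLifetime (immediate and scheduled destruction) and WS-ServiceGroup (add, remove, and find members that meet a condition).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass
class FileResource:
    """A resource whose resource properties mirror Fig 2.2.3."""
    key: str
    filename: str
    size: int
    descriptors: Set[str]
    termination_time: Optional[float] = None  # scheduled destruction time, if any


class ResourceHome:
    """Creates and destroys resources (lifetime management)."""

    def __init__(self) -> None:
        self.resources: Dict[str, FileResource] = {}

    def create(self, resource: FileResource) -> str:
        self.resources[resource.key] = resource
        return resource.key

    def destroy(self, key: str) -> None:
        """Immediate destruction."""
        self.resources.pop(key, None)

    def sweep(self, now: float) -> List[str]:
        """Scheduled destruction: remove resources whose termination time has passed."""
        expired = [key for key, r in self.resources.items()
                   if r.termination_time is not None and r.termination_time <= now]
        for key in expired:
            self.destroy(key)
        return expired


class ServiceGroup:
    """Groups resources and finds the members that meet a condition."""

    def __init__(self, home: ResourceHome) -> None:
        self.home = home
        self.member_keys: List[str] = []

    def add(self, key: str) -> None:
        self.member_keys.append(key)

    def remove(self, key: str) -> None:
        self.member_keys.remove(key)

    def find(self, condition: Callable[[FileResource], bool]) -> List[str]:
        # Members whose resource has already been destroyed are skipped.
        return [key for key in self.member_keys
                if key in self.home.resources and condition(self.home.resources[key])]


def about_globus(r: FileResource) -> bool:
    """Example condition: the resource is described as related to Globus."""
    return "Globus" in r.descriptors


home = ResourceHome()
group = ServiceGroup(home)
for resource in (
    FileResource("0xF56433FA", "tutorial.zip", 750, {"Globus", "Tutorial"}),
    FileResource("0x09EB23FA", "mynotes.doc", 250, {"notes", "Globus"}, termination_time=100.0),
    FileResource("0x106E43FA", "pacman.exe", 150, {"game"}),
):
    group.add(home.create(resource))

before = group.find(about_globus)  # ['0xF56433FA', '0x09EB23FA']
expired = home.sweep(now=120.0)    # ['0x09EB23FA']
after = group.find(about_globus)   # ['0xF56433FA']
print(before, expired, after)
```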

2.4 Globus Toolkit The Globus Toolkit (GT) [1, 2] is a community-based, open-architecture, open-source set of services and software libraries that support Grids and Grid applications. The toolkit addresses issues of security, information discovery, resource management, data management, communication, fault detection, and portability. Globus Toolkit mechanisms are in use at hundreds of sites and by dozens of major Grid projects worldwide. First and foremost, the toolkit includes quite a few high-level services that we can use to build Grid applications. These services meet most of the abstract requirements set forth in OGSA. In other words, the Globus Toolkit includes a resource monitoring and discovery service, a job submission infrastructure, a security infrastructure, and data management services (to name a few). Since the working groups at GGF are still defining standard interfaces for these types of services, it is not possible to say (at this point) that GT4 (GT version 4.0) is an implementation of OGSA (although GT4 does implement some security specifications defined by GGF). However, it is a realization of the OGSA requirements and a sort of de facto standard for the Grid community while GGF works on standardizing all the different services. Most of these services are implemented on top of WSRF (the toolkit also includes some services that are not implemented on top of WSRF, called the non-WS components). GT4 includes a complete implementation of the WSRF specification. This part of the toolkit (the WSRF implementation) is very important, since nearly everything else is built on top of it.

2.4.1 GT4 Architecture The Globus Toolkit 4 is composed of several software components. As shown in Fig 2.4.1.1, these components are divided into five categories: Security, Data Management, Execution Management, Information Services, and the Common Runtime. Notice that, although GT4 focuses on Web services, the toolkit also includes components that are not implemented on top of Web services. For example, the GridFTP component uses a non-WS protocol that started as an ad hoc Globus protocol but later became a GGF specification. Common Runtime The Common Runtime components provide a set of fundamental libraries and tools that are needed to build both WS and non-WS services. Security Using the Security components, based on the Grid Security Infrastructure (GSI), we can make sure that our communications are secure. Data management These components allow us to manage large sets of data in our virtual organization.

Fig 2.4.1.1: GT4 Architecture

Information services The Information Services, more commonly referred to as the Monitoring and Discovery Services (MDS), include a set of components to discover and monitor resources in a virtual organization. Note that GT4 also includes a non-WS version of MDS (MDS2) for legacy purposes. This component is deprecated and is expected to disappear in future releases of the toolkit. Execution management Execution Management components deal with the initiation, monitoring, management, scheduling and coordination of executable programs, usually called jobs, in a Grid.

3. Semantic Web Service Formulated by the creator of the World Wide Web, Tim Berners-Lee, the Semantic Web is about “bringing the web to its full potential” [10]. The web currently contains around 3 billion static documents, which are accessed by over 500 million users. While these numbers are increasing at a staggering rate, the task of dealing with the information is getting harder [11]. The Semantic Web is an effort to develop technologies so that the value of information can scale with the increase of information, thus bringing the web to its full potential. The Semantic Web’s approach is to make information “understandable” by computers; hence information must be described in such a way that computers can interpret it and derive its meaning. This will enable computers to work more intelligently with the information, for example by assisting humans in classifying, filtering and searching information. Computer-understandable information is information annotated with semantics. Annotations can therefore be thought of as metadata that describes the meaning (the semantics) of the information. The annotations themselves have to be defined so that computers can interpret and reason with them. A collection of annotations whose meaning is described is called an ontology, which will be discussed in detail in the next section. For ontologies to represent knowledge of a domain, they need to be expressed in a language that can convey the necessary complexity of the domain. Description Logic (DL) is a knowledge formalism that describes the abstract world with concepts and relations. These basic constructs can be used to build up advanced hierarchies and graphs with restrictions on various levels. Knowledge has to be represented so that logic can be applied. In particular, it is important to be able to compare annotations and derive similarities between them, and DL has advanced capabilities for this. One powerful feature of DL is subsumption, which checks whether or not one concept contains another concept. The advantages of annotations are closely related to what can be derived from the representation. Using a language with advanced capabilities for reasoning is therefore of great importance.
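Subsumption can be illustrated with a small Python sketch over a hypothetical grid-resource taxonomy. This is a simplification: a DL reasoner derives subsumption from concept definitions and restrictions, whereas the sketch only follows explicitly stated is-a links. The match_degree function shows one widely used way of turning subsumption into a degree of match for an output parameter (exact, plug-in, subsumes, fail); it is given for illustration only and is not the Parameter Matchmaking Algorithm proposed in this report.

```python
from typing import Dict, Set

# Hypothetical taxonomy: concept -> its directly stated superconcepts.
TAXONOMY: Dict[str, Set[str]] = {
    "GridResource": set(),
    "ComputeResource": {"GridResource"},
    "StorageResource": {"GridResource"},
    "Cluster": {"ComputeResource"},
    "LinuxCluster": {"Cluster"},
}


def superconcepts(concept: str) -> Set[str]:
    """All ancestors of a concept (transitive closure of the stated links)."""
    found: Set[str] = set()
    stack = [concept]
    while stack:
        for parent in TAXONOMY.get(stack.pop(), set()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def subsumes(general: str, specific: str) -> bool:
    """True if every instance of `specific` is also an instance of `general`."""
    return general == specific or general in superconcepts(specific)


def match_degree(advertised: str, requested: str) -> str:
    """Degree of match between an advertised and a requested output concept."""
    if advertised == requested:
        return "exact"
    if subsumes(advertised, requested):
        return "plug-in"
    if subsumes(requested, advertised):
        return "subsumes"
    return "fail"


print(subsumes("ComputeResource", "LinuxCluster"))  # True
print(subsumes("StorageResource", "LinuxCluster"))  # False
print(match_degree("Cluster", "LinuxCluster"))      # plug-in
print(match_degree("LinuxCluster", "Cluster"))      # subsumes
```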
Applications interpreting the semantics of a document need access to the ontologies that define those semantics. When a document is annotated with semantics, it includes information about where the annotations are described. The ontologies describing the annotations must therefore be available and readable so that applications can derive their meaning. In the context of the Semantic Web, which is a distributed system, the ontologies have to be network accessible. They are therefore identified by URIs, which are unique network identifiers.

For ontologies to be used in distributed systems and across systems, there have to be agreements on how knowledge is represented and reasoned with. These standards should be general and, at the same time, advanced enough to capture the wide needs of different interest groups. They should also specify the syntax and formats for representation. Recently developed open standards for knowledge representation are RDF and OWL [12] [13].

3.1 Motivation

Conventionally, the capabilities of Web Services are described with the Web Services Description Language (WSDL) [7] by defining the operations and parameters that the service supports. This is done by naming operations and parameters and then associating the parameters with abstract types, which can be thought of as data types. The problem with this service description is that agents cannot derive what the service does. An agent interpreting the parameters of the service cannot derive their meaning simply by looking at them: to the agent they are only parameter names, named variables used to hold information. Agents cannot derive the content of these parameters because they cannot read the parameter names as humans can. All that agents can infer from a service description is the input and output parameters and their data types.

Agents can only interpret information that conforms to a syntactical structure. They are programmed to derive meaning from formats like WSDL, which is an agreement on how service descriptions are interpreted. The service description can thus be seen as a structured syntactical description. A human developer who wants to use a service in a client program needs to read the service description, interpret it, and then write a client program conforming to the syntax of the service. Agents cannot read text as humans do; they can understand the structure of service descriptions but not their content [14].

Semantic Web Services are described so that agents can interpret their capabilities. Autonomous agents should be able to look into the service description to determine whether the service provides the desired capabilities and whether the agent is able to use the service. The service description must therefore be extended with computer-interpretable information called semantics. The parameters or the service names must be described in such a way that agents can find out what they mean. This is achieved by defining vocabularies, organized in ontologies, that are used to annotate service capabilities. To interpret service descriptions, agents have to use these ontologies, which are shared conceptualizations of domains [11]. Research is currently being done on how the state of services can be represented as preconditions and effects [15]. Preconditions represent what is necessary for using the service, and effects represent the consequences of using it. Together, preconditions and effects specify the state transformation of the service [16].
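Reading preconditions and effects as a state transformation can be made concrete with a STRIPS-style encoding: a state is a set of facts, a service is usable when its preconditions are a subset of the state, and invoking it removes some facts and adds others. This encoding is an assumption chosen for illustration; it is not the representation proposed in [15] or [16], and the service and fact names are hypothetical.

```python
from dataclasses import dataclass


# STRIPS-style reading of preconditions and effects (illustrative assumption).
@dataclass(frozen=True)
class ServiceAction:
    name: str
    preconditions: frozenset[str]
    add_effects: frozenset[str]
    delete_effects: frozenset[str] = frozenset()

    def applicable(self, state: frozenset[str]) -> bool:
        """The service can be used only if all its preconditions hold in the state."""
        return self.preconditions <= state

    def apply(self, state: frozenset[str]) -> frozenset[str]:
        """Return the state after invocation: remove deleted facts, then add new ones."""
        if not self.applicable(state):
            raise ValueError(f"{self.name}: preconditions not satisfied")
        return (state - self.delete_effects) | self.add_effects


# Hypothetical grid service: needs a credential and staged input, yields output.
run_job = ServiceAction(
    "RunJob",
    preconditions=frozenset({"ValidProxyCredential", "InputStaged"}),
    add_effects=frozenset({"OutputAvailable"}),
    delete_effects=frozenset({"InputStaged"}),  # assume the staging area is cleaned up
)
before = frozenset({"ValidProxyCredential", "InputStaged"})
print(sorted(run_job.apply(before)))  # ['OutputAvailable', 'ValidProxyCredential']
```

An agent could use `applicable` to decide whether it is able to use a service at all, which is exactly the question the semantic description is meant to answer.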

3.1.1 Increasing Number of Web Services

There are important trends in distributed systems that affect the importance of service discovery. The most noticeable trend is that the number of services is increasing rapidly. This is due to the popularity of Web Services and the fact that the service-oriented paradigm encourages loosely coupled, independent services. A consequence of the latter is that services are looked up on demand, when they are needed, and may therefore be looked up more often. Service discovery is thus becoming increasingly important in distributed systems.

3.1.2 Advantages of Using Semantics in Service Discovery

There are several advantages of using semantics in service discovery. First, semantics improve the accuracy of service discovery. Semantics provide the expressiveness needed to make detailed descriptions of advertised service capabilities and capability requests. Matchmaking using semantics is more accurate than a keyword-based search, because similarities are derived by logical inference, while keyword-based searches can be vague and inaccurate. Second, service discovery based on semantics enables a more dynamic coupling between clients and services. The exact services need not be known in advance, and the client can more easily change the services it uses. This is an improvement in line with the service-oriented paradigm, in which services are loosely coupled.

Dynamic coupling of services based on semantics will furthermore enable new technologies and applications to emerge. More resilient and flexible systems can, for example, be built with smart service mediators that use semantics to provide functionality transparency. This is useful in dynamic service environments or for systems that need high operability. Functionality transparency means that a client using a set of services through a service mediator does not know which services are used or where they are located; the required functionality is simply provided to the client. Failing or disappearing services can therefore be replaced without the client's knowledge. Semantics can also improve the administration of service registries provided for manual browsing. Services can be automatically categorized and classified based on their semantics, making administration easier and registries more up to date.
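Functionality transparency can be sketched as a mediator that is asked for a capability rather than for a specific service. In the sketch below, which is a hypothetical design and not any particular product, providers are registered under a capability name (standing in for an ontology concept that a semantic matchmaker would select), and a provider that raises `ConnectionError` is silently skipped, so the client never learns that a service disappeared.

```python
from typing import Callable


class ServiceMediator:
    """Routes requests by capability and hides which provider answers (hypothetical)."""

    def __init__(self) -> None:
        self._providers: dict[str, list[Callable[[float], float]]] = {}

    def register(self, capability: str, provider: Callable[[float], float]) -> None:
        self._providers.setdefault(capability, []).append(provider)

    def invoke(self, capability: str, request: float) -> float:
        failures = 0
        for provider in self._providers.get(capability, []):
            try:
                return provider(request)
            except ConnectionError:
                failures += 1  # provider unreachable: transparently try the next one
        raise LookupError(f"no working provider for {capability!r} ({failures} failed)")


def vanished_service(celsius: float) -> float:
    raise ConnectionError("service disappeared")


def backup_service(celsius: float) -> float:
    return celsius * 9 / 5 + 32


mediator = ServiceMediator()
mediator.register("CelsiusToFahrenheit", vanished_service)
mediator.register("CelsiusToFahrenheit", backup_service)
print(mediator.invoke("CelsiusToFahrenheit", 100.0))  # 212.0
```

In a semantic setting, `register` would be driven by matchmaking against the capability ontology rather than by exact capability names.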

4. Semantic Grid Services

The grid computing infrastructure defined in [1, 2] is only part of a much larger picture that also includes information handling and support for knowledge processing within the distributed scientific process. This broader view is adopted by the semantic grid, which can be described as an extension of the current Grid in which information and services are given well-defined meaning, better enabling computers and people to work in cooperation [8, 9]. The Semantic Grid is a service-oriented architecture in which entities provide services to one another under various forms of contract. The Semantic Grid is thus characterized as an open system in which users, software components and computational resources come and go on a continual basis. Hence there should be a high degree of automation that supports flexible collaborations and computation on a global scale. Moreover, this environment should be personalized to the individual participants and should offer seamless interactions with both software components and other relevant users.

4.1 Conceptual Layers of Semantic Grid


Given the above view of the scope of the semantic grid, it has become popular to characterize the computing infrastructure as consisting of three conceptual layers [27].

• Data/computation: This layer deals with the way that computational resources are allocated, scheduled and executed and the way in which data is shipped between the various processing resources. It is characterized as being able to deal with large volumes of data, providing fast networks and presenting diverse resources as a single metacomputer. The data/computation layer builds on the physical ‘grid fabric’, i.e. the underlying network and computer infrastructure, which may also interconnect scientific equipment. Here data is understood as uninterpreted bits and bytes.

• Information: This layer deals with the way that information is represented, stored, accessed, shared and maintained. Here information is understood as data equipped with meaning, for example the characterization of an integer as representing the temperature of a reaction process, or the recognition that a string is the name of an individual.
• Knowledge: This layer is concerned with the way that knowledge is acquired, used, retrieved, published and maintained to assist e-Scientists in achieving their particular goals and objectives. Here knowledge is understood as information applied to achieve a goal, solve a problem or enact a decision. In the Business Intelligence literature, knowledge is often defined as actionable information, for example the recognition by a plant operator that, in the current context, a reaction temperature demands shutdown of the process.
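The three layers can be illustrated with the document's own reaction-temperature example. The sketch below assumes, hypothetically, that a sensor delivers a big-endian 16-bit integer in tenths of a degree Celsius and that 40 °C is the shutdown limit; both are invented for illustration. The raw bytes are data, the typed `Reading` is information, and the rule that maps a reading to an action plays the role of knowledge.

```python
import struct
from dataclasses import dataclass

SHUTDOWN_LIMIT_C = 40.0  # hypothetical plant limit, for illustration only


@dataclass(frozen=True)
class Reading:
    """Information: a value equipped with meaning (what it measures, in which unit)."""
    quantity: str
    unit: str
    value: float


def interpret(raw: bytes) -> Reading:
    """Data -> information, assuming a big-endian int16 in tenths of a degree Celsius."""
    (tenths,) = struct.unpack(">h", raw)
    return Reading(quantity="ReactionTemperature", unit="degC", value=tenths / 10)


def decide(reading: Reading) -> str:
    """Information -> knowledge: apply the reading to a goal (keeping the process safe)."""
    if reading.quantity == "ReactionTemperature" and reading.value > SHUTDOWN_LIMIT_C:
        return "shut down process"
    return "continue"


raw = struct.pack(">h", 412)   # data: uninterpreted bytes b'\x01\x9c'
print(decide(interpret(raw)))  # shut down process
```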

Fig 4.1.1: A Layered Architecture of Semantic Grid (layers, from top to bottom: Knowledge Services Layer, Information Services Layer, Data Services Layer, Computational Services Layer, resting on Distributed Resources)

Although this view is widely accepted, to date most research and development work in this area has concentrated on the data/computation layer and on the information layer. While there are still many open problems concerned with managing massively distributed computations in an efficient manner and in accessing and sharing information from heterogeneous sources, the full potential of grid computing can only be realized by fully exploiting the functionality and capabilities provided by knowledge layer services. This is because it is at this layer that the reasoning necessary for seamlessly automating a significant range of the actions and interactions takes place. Thus this is the area we focus on most in this research project.

4.2 Knowledge Layer

The aim of the knowledge layer is to act as an infrastructure that supports the management and application of scientific knowledge to achieve particular types of goals and objectives. To achieve this, it builds upon the services offered by the data/computation and information layers. The first point to reiterate with respect to this layer is the sheer scale of the content we are dealing with: the amount of data that the data grid manages is likely to be huge. Once information destined for a particular purpose is delivered, we are in the realm of the knowledge grid. Thus, at this level we are fundamentally concerned with abstracted and annotated content and with the management of scientific knowledge. The effort of acquiring knowledge was one bottleneck recognized early, but so too are modeling, retrieval, publication and maintenance.

Knowledge acquisition sets the challenge of getting hold of the information that is around and turning it into knowledge by making it usable. This might involve, for instance, making tacit knowledge explicit, identifying gaps in the knowledge already held, acquiring and integrating knowledge from multiple sources (e.g. different experts, or distributed sources on the Web), or acquiring knowledge from unstructured media (e.g. natural language or diagrams). However, explicit knowledge acquisition from human experts remains a costly and resource-intensive exercise. Hence there is increasing interest in methods that can (semi-)automatically elicit and acquire knowledge that is often implicit or else distributed on the Web. A variety of information extraction tools and methods are being applied to the huge body of textual documents that are now available.

Knowledge modeling bridges the gap between the acquisition of knowledge and its use. Knowledge models must be able both to act as straightforward placeholders for the acquired knowledge and to represent the knowledge so that it can be used for problem solving. Knowledge representation technologies have a long history in Artificial Intelligence. There exist numerous languages and approaches that cater for different knowledge types: structural forms of knowledge, procedurally oriented representations, rule-based characterizations, methods to model uncertainty, and probabilistic representations.

Once knowledge has been acquired and modeled, it needs to be stored or hosted somewhere so that it can be retrieved efficiently. In this context, there are two related problems to do with knowledge retrieval. First, there is the issue of finding knowledge again once it has been stored.
Second, there is the problem of retrieving the subset of content that is relevant to a particular problem. This poses particular difficulties for a knowledge retrieval system in which content changes rapidly and regularly. Technologies for information retrieval exist in many forms. They include methods that attempt to encode structural representations of the content to be retrieved, such as explicit attributes and values. A variety of matching algorithms can be applied to retrieve cases that are similar to an example, or to a partial set of attributes presented to the system.

Once knowledge has been acquired, modeled and stored, the issue arises of how to get it to the people who subsequently need it. The challenge of knowledge publishing or dissemination can be described as getting the right knowledge, in the right form, to the right person or system, at the right time. Different users and systems will require knowledge to be presented and visualized in different ways. The quality of such presentation is not merely a matter of preference; it may radically affect the utility of the knowledge. Getting presentation right involves understanding the different perspectives of people with different agendas and of systems with different requirements. An understanding of knowledge content will help to ensure that important related pieces of knowledge get published at the appropriate time.

Finally, having acquired and modeled the knowledge, and having managed to retrieve and disseminate it appropriately, the last challenge is to keep the knowledge content current: knowledge maintenance. This may involve the regular updating of content as knowledge changes. Some content has considerable longevity, while other knowledge dates quickly. If knowledge is to remain useful over a period of time, it is essential to know which parts of the knowledge base must be updated or discarded, and when. Other problems involved in maintenance include verifying and validating the content and certifying its safety.
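The attribute-based retrieval described above, in which cases are retrieved that match a partial set of attributes, can be sketched as a simple nearest-match ranking. The similarity measure (fraction of query attributes matched exactly) and the stored cases are hypothetical choices; practical systems typically weight attributes and use graded similarity for numeric values.

```python
# Attribute-based case retrieval: rank stored cases by the fraction of query
# attributes they match exactly. The case data below are hypothetical.
CASES = [
    {"id": "run-1", "solver": "CFD", "os": "Solaris", "nodes": 16},
    {"id": "run-2", "solver": "CFD", "os": "Linux", "nodes": 32},
    {"id": "run-3", "solver": "FEM", "os": "Linux", "nodes": 8},
]


def similarity(query: dict, case: dict) -> float:
    """Fraction of query attributes whose values the case matches exactly."""
    if not query:
        return 0.0
    hits = sum(1 for key, value in query.items() if case.get(key) == value)
    return hits / len(query)


def retrieve(query: dict, cases: list[dict], k: int = 2) -> list[dict]:
    """Return the k most similar cases; sorted() is stable, so ties keep stored order."""
    return sorted(cases, key=lambda case: similarity(query, case), reverse=True)[:k]


# A partial query: only two of the stored attributes are specified.
print([case["id"] for case in retrieve({"solver": "CFD", "os": "Linux"}, CASES)])
# ['run-2', 'run-1']
```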

4.3 Research Issues in Semantic Grid

The following are some of the key research issues that remain for exploiting knowledge services in the Semantic Grid. In many cases there are already small-scale exemplars for most of these services; consequently, many of the issues relate to the problems of scale and distribution [17].

• Tools and infrastructure are needed to automate the creation of ontologies of the knowledge present in the Grid environment.
• Matchmaking of semantic web services can be done in OWL-S, but it poses problems when applied to the Grid.
• Knowledge capture tools are needed that can be added as plugins to a wide variety of applications and that draw on ontology services. This will include a clearer understanding of how to profile individual and group e-Science perspectives and interests.
• Dynamic linking, visualization, navigation and browsing of content from many perspectives over large content sets.
• Provision of knowledge discovery services with standard input/output APIs to ontologically mapped data.

5. Related Technology

The concept of an ontology is necessary to capture the expressive power needed for modeling and reasoning with knowledge. Generally speaking, an ontology determines the extension of terms and the relationships between them. However, in the context of knowledge and web engineering, an ontology is simply a published, more or less agreed, conceptualization of an area of content. The ontology may describe objects, processes, resources, capabilities or anything else. Ontologies are used to capture knowledge about some domain of interest: an ontology describes the concepts in the domain and the relationships that hold between those concepts. Different ontology languages provide different facilities. Recently, a number of languages have appeared that attempt to take concepts from the knowledge representation languages of AI and extend the expressive capability of Web languages (e.g., RDF and RDF Schema). There has also been an attempt to integrate the best features of these languages in a hybrid called DAML+OIL. As well as incorporating constructs to help model ontologies, DAML+OIL is being equipped with a logical language to express rule-based generalizations. The most recent development in standard ontology languages is the Web Ontology Language (OWL) from the World Wide Web Consortium. It is based on a different logical model, which makes it possible for concepts to be defined as well as described. Complex concepts can therefore be built up in definitions out of simpler concepts.


According to the Karlsruhe Ontology model [18], an ontology with datatypes is a structure

$O := (C, T, \le_C, R, A, \sigma_R, \sigma_A, \le_R, \le_A, I, V, t_C, t_T, t_R, t_A)$

consisting of:

• six disjoint sets $C$, $T$, $R$, $A$, $I$ and $V$, called concepts, datatypes, relations, attributes, instances and data values, respectively;
• partial orders $\le_C$ on $C$, called the concept hierarchy or taxonomy, and $\le_T$ on $T$, called the type hierarchy;
• functions $\sigma_R: R \to C^2$, called the relation signature, and $\sigma_A: A \to C \times T$, called the attribute signature;
• partial orders $\le_R$ on $R$, called the relation hierarchy, and $\le_A$ on $A$, called the attribute hierarchy, respectively;
• a function $t_C: C \to 2^{I}$, called concept instantiation;
• a function $t_T: T \to 2^{V}$, called datatype instantiation;
• a function $t_R: R \to 2^{I \times I}$, called relation instantiation;
• a function $t_A: A \to 2^{I \times V}$, called attribute instantiation.

Protégé is an integrated software tool used by system developers and domain experts to develop knowledge-based systems. The Protégé editor is developed by Stanford University and is widely used for developing semantic web services [15]. Applications developed with Protégé 2000 are used for problem solving and decision making in a particular domain, and the tool is used to create ontologies in many applications. Furthermore, the underlying logical model allows the use of a reasoner, which can check whether all of the statements and definitions in the ontology are mutually consistent and can also recognize which concepts fall under which definitions. The reasoner can therefore help to maintain the hierarchy correctly.

Algernon is an inference engine that facilitates direct interaction with Protégé knowledge bases (KBs) and supports access to multiple concurrent KBs. Algernon commands not only retrieve and store slot values but can also modify the ontology. The Protégé editor includes the Algernon inference engine as a plug-in, which makes it easy to retrieve knowledge from an ontology created with the editor. Algernon provides simple queries with which one can interact with a Protégé knowledge base and retrieve knowledge from it.
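To make the Karlsruhe structure more tangible, the sketch below encodes a subset of it (concepts, the concept hierarchy ≤C, relations with their signature σR, and the instantiation functions tC and tR) as plain Python data, omitting datatypes and attributes. It is a hypothetical illustration of the kind of retrieval that a knowledge base queried through an inference engine supports (for example, asking for all instances of a concept, including instances of its subconcepts); it is not the Protégé-OWL or Algernon API, and the grid concepts are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy subset of the Karlsruhe structure (hypothetical encoding)."""
    concepts: set[str] = field(default_factory=set)                                # C
    told_subconcepts: set[tuple[str, str]] = field(default_factory=set)            # (c1, c2): c1 <=_C c2
    relation_signature: dict[str, tuple[str, str]] = field(default_factory=dict)   # sigma_R: R -> C x C
    concept_instances: dict[str, set[str]] = field(default_factory=dict)           # t_C: C -> 2^I
    relation_instances: dict[str, set[tuple[str, str]]] = field(default_factory=dict)  # t_R

    def leq(self, c1: str, c2: str) -> bool:
        """Reflexive-transitive closure of the told concept hierarchy."""
        seen, frontier = {c1}, [c1]
        while frontier:
            current = frontier.pop()
            if current == c2:
                return True
            for sub, sup in self.told_subconcepts:
                if sub == current and sup not in seen:
                    seen.add(sup)
                    frontier.append(sup)
        return False

    def instances_of(self, concept: str) -> set[str]:
        """Instances asserted for the concept or any of its subconcepts."""
        return set().union(*(insts for c, insts in self.concept_instances.items()
                             if self.leq(c, concept)))

    def add_relation_instance(self, relation: str, subject: str, obj: str) -> None:
        """Record (subject, obj) in t_R after checking it against sigma_R."""
        domain, range_ = self.relation_signature[relation]
        if subject not in self.instances_of(domain) or obj not in self.instances_of(range_):
            raise ValueError(f"{relation}({subject}, {obj}) violates its signature")
        self.relation_instances.setdefault(relation, set()).add((subject, obj))


onto = Ontology(
    concepts={"GridResource", "Cluster", "Node"},
    told_subconcepts={("Cluster", "GridResource"), ("Node", "GridResource")},
    relation_signature={"hasNode": ("Cluster", "Node")},
    concept_instances={"Cluster": {"cluster1"}, "Node": {"node1", "node2"}},
)
onto.add_relation_instance("hasNode", "cluster1", "node1")
print(sorted(onto.instances_of("GridResource")))  # ['cluster1', 'node1', 'node2']
```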


5.1 OWL-S

Although OWL increases the level of expressiveness with a richer vocabulary while retaining decidability, it is primarily used to describe content. The next logical step is to describe the semantics of services in order to improve their platform- and organization-independent interoperability over the Internet. For this purpose, an upper ontology for the description of web services, named OWL-Services (OWL-S), has been introduced. OWL-S is an ontology of service concepts: an OWL-based Web Service Ontology that supplies a core set of markup language constructs for describing the properties and capabilities of Web Services in an unambiguous, computer-interpretable form [15]. Using OWL-S to describe Web services can increase the ability of computer systems to find eligible services autonomously. This is important in open environments, where services can appear and disappear dynamically. OWL-S thus provides the concepts and relations needed to describe the general capabilities of web services. This includes the representation of information transformation through Inputs and Outputs, and of state transformation through Preconditions and Effects; together these are usually referred to as IOPE.

[Figure: a Resource provides a Service; the Service presents a Profile, is describedBy a ServiceModel, and supports a ServiceGrounding.]

Fig 5.1.1: Top Level Ontology of OWL-S

OWL-S moreover provides a high-level description of web services, as shown in Fig 5.1.1. This top-level ontology describes how a Resource is related to a Service, and how the Service is in turn related to the Profile, the Service Model and the Service Grounding. In short, the Profile contains information about what the service does, the Service Model describes how the service works, and the Grounding describes how the service is accessed.
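The top-level structure of OWL-S can be mirrored as plain data types, which makes the division of responsibilities explicit: the Profile says what the service does (including IOPE and a service category), the ServiceModel how it works, and the Grounding how to access it. This is a hypothetical Python rendering for illustration, not the OWL-S ontology itself; the class and field names, the example service and its URL are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """What the service does (Service presents Profile)."""
    category: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]
    preconditions: tuple[str, ...] = ()
    effects: tuple[str, ...] = ()


@dataclass(frozen=True)
class ServiceModel:
    """How the service works (Service describedBy ServiceModel)."""
    process_name: str


@dataclass(frozen=True)
class Grounding:
    """How the service is accessed (Service supports ServiceGrounding)."""
    wsdl_location: str
    operation: str


@dataclass(frozen=True)
class Service:
    name: str
    presents: Profile
    described_by: ServiceModel
    supports: Grounding


@dataclass(frozen=True)
class Resource:
    """A Resource provides one or more Services."""
    name: str
    provides: tuple[Service, ...]


# Hypothetical example; the URL and concept names are placeholders.
forecast = Service(
    name="WeatherForecast",
    presents=Profile(category="Forecasting", inputs=("CityName",), outputs=("Temperature",)),
    described_by=ServiceModel(process_name="GetForecastProcess"),
    supports=Grounding(wsdl_location="http://example.org/forecast?wsdl", operation="getForecast"),
)
node = Resource(name="grid-node-1", provides=(forecast,))
print(node.provides[0].presents.outputs)  # ('Temperature',)
```

Discovery only needs `presents`; invocation needs `supports`, which is why the parts can be published and reused separately.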


A process model describes how a service performs its tasks. It includes information about inputs, outputs, preconditions and results. The process model differentiates between composite, atomic and simple processes. For a composite process, the process model shows how it breaks down into simpler component processes and the flow of control and data between them. A profile provides a general description of a Web service, intended to be published and shared to facilitate service discovery. Profiles can include both functional and non-functional properties. The functional properties are derived from the process model, but it is not necessary to include all the functional properties of the process model in a profile. A simplified view can be provided for service discovery, on the assumption that the service consumer will eventually look at the process model to achieve a full understanding of how the service works. A grounding specifies how a service is invoked, by detailing how the atomic processes in a service's process model map onto a concrete messaging protocol. OWL-S allows different types of groundings to be used, but the only type developed to date is the WSDL grounding, which allows any Web service with a WSDL definition to be marked up as a Semantic Web Service using OWL-S. A service simply binds the other parts together into a unit that can be published and invoked. It is important to understand that the different parts of a service can be reused and connected in various ways. Together, these three concepts are designed to give a complete picture of the capabilities of a service.
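The relationship between a composite process and the simplified profile view can be sketched for the simplest control construct, a sequence. Assuming (hypothetically) that parameters are matched by name between steps, the externally visible inputs are those not produced by an earlier step, and the exposed outputs are those not consumed internally. Real OWL-S process models support further control constructs (such as split and choice) and explicit data-flow bindings, which this sketch ignores; the process names are invented.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class AtomicProcess:
    name: str
    inputs: frozenset[str]
    outputs: frozenset[str]


@dataclass(frozen=True)
class SequenceProcess:
    """Composite process whose steps run one after another."""
    name: str
    steps: tuple[Process, ...]


Process = Union[AtomicProcess, SequenceProcess]


def flatten(process: Process) -> list[AtomicProcess]:
    """Break a composite process down into its atomic steps, in execution order."""
    if isinstance(process, AtomicProcess):
        return [process]
    atoms: list[AtomicProcess] = []
    for step in process.steps:
        atoms.extend(flatten(step))
    return atoms


def profile_view(process: Process) -> tuple[set[str], set[str]]:
    """Derive (external inputs, exposed outputs) for a simplified profile."""
    produced: set[str] = set()
    consumed_internally: set[str] = set()
    external_inputs: set[str] = set()
    for atom in flatten(process):
        external_inputs |= atom.inputs - produced      # must come from the client
        consumed_internally |= atom.inputs & produced  # fed by an earlier step
        produced |= atom.outputs
    return external_inputs, produced - consumed_internally


stage = AtomicProcess("StageInput", frozenset({"InputFile"}), frozenset({"StagedInput"}))
run = AtomicProcess("RunJob", frozenset({"StagedInput", "JobSpec"}), frozenset({"JobResult"}))
print(profile_view(SequenceProcess("StageAndRun", (stage, run))))
```

The call returns `{'InputFile', 'JobSpec'}` as inputs and `{'JobResult'}` as outputs (set display order may vary); the intermediate `StagedInput` is hidden from the profile.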

5.2 OWL-S Matchmaking

Basically, a service provider describes its advertised services in an OWL-S compliant ontology, and a service requester queries for services with an OWL-S ontology expressing its requirements. In this scenario, matching the service descriptions of advertisements against requirements serves to select a suitable service among a set of available ones. Compared with known matching approaches that return only match or mismatch, a graded selection process has some potential benefits. Matchmaking, in this context, refers to capability matching, which means comparing the requested service description with the advertised service descriptions. The goal of this comparison is to obtain information on how similar they are. This degree of similarity is used to determine whether the advertised service satisfies the requested capabilities.

Comparing the requested service description with the advertised service descriptions takes all the inputs and outputs into account: the inputs (I) and outputs (O) of the requested service description are compared to those of the advertised service descriptions. In most matchmaking algorithms only I and O are considered, because preconditions (P) and effects (E) are not sufficiently standardized to be used in matchmaking. The profile defines a set of additional properties that are used to describe the features of the service. From these properties, in addition to IOPE, we can also use the service category [19], which classifies the service with respect to some ontology or taxonomy of services. Optionally, other properties can also be taken into account, such as the QualityRating element, or custom-defined properties such as the duration of execution.
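The degree of similarity between requested and advertised parameters is often expressed with the discrete degrees exact, plug-in, subsumes and fail (the scheme of Paolucci et al.). The sketch below shows one simplified version over a hypothetical taxonomy: each requested output is matched against the best advertised output, each advertised input against the best input the requester can supply, and the overall result is the weakest of these matches. It considers only I and O, as described above, and is not the document's own algorithm.

```python
from enum import IntEnum


class Degree(IntEnum):
    FAIL = 0
    SUBSUMES = 1
    PLUGIN = 2
    EXACT = 3


# Hypothetical single-inheritance taxonomy: child -> parent.
PARENT = {
    "CelsiusTemperature": "Temperature",
    "Temperature": "PhysicalQuantity",
    "CityName": "LocationName",
}


def is_subclass(sub: str, sup: str) -> bool:
    """Reflexive, transitive subclass test by walking up the parent chain."""
    current = sub
    while current is not None:
        if current == sup:
            return True
        current = PARENT.get(current)
    return False


def degree(advertised: str, requested: str) -> Degree:
    """Degree of match between one advertised and one requested concept."""
    if advertised == requested:
        return Degree.EXACT
    if is_subclass(requested, advertised):  # advertised concept is more general
        return Degree.PLUGIN
    if is_subclass(advertised, requested):  # advertised concept is more specific
        return Degree.SUBSUMES
    return Degree.FAIL


def match(req_inputs, req_outputs, adv_inputs, adv_outputs) -> Degree:
    """Overall degree = weakest of all required parameter matches."""
    # Every requested output must be delivered by some advertised output.
    scores = [max((degree(a, r) for a in adv_outputs), default=Degree.FAIL) for r in req_outputs]
    # Every advertised input must be fed by something the requester supplies.
    scores += [max((degree(a, r) for r in req_inputs), default=Degree.FAIL) for a in adv_inputs]
    return min(scores, default=Degree.EXACT)


print(match(["CityName"], ["Temperature"], ["LocationName"], ["PhysicalQuantity"]).name)  # PLUGIN
print(match(["CityName"], ["Temperature"], ["CityName"], ["CelsiusTemperature"]).name)    # SUBSUMES
```

Taking the minimum means a single failed parameter rejects the advertisement, while the remaining degrees still let the requester rank the surviving candidates.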

6. Issues and Difficulties in Existing Approach

The most prominent technology for web service discovery is Universal Description, Discovery and Integration (UDDI). This technology is expected to become the Internet standard for service discovery owing to its strong industry backing [6, 7]. UDDI provides an API to search for services using keywords, which are matched against the text used in the service descriptions. It is very hard for agents to work with keywords, because doing so involves some degree of language understanding; keyword-based search therefore does not enable autonomous discovery.

Similar to UDDI, in the Grid environment the Globus Toolkit's Monitoring and Discovery System (MDS) defines and implements mechanisms for resource discovery and monitoring in distributed environments such as the Grid [20, 21, 22]. MDS has been specifically designed to address the needs of grid nodes to publish and discover services or resources that are used by multiple people across multiple administrative domains. It supports traditional service matching, which is based on symmetric, attribute-based matching [23]. In these systems, the values of attributes advertised by nodes are compared with those required by jobs. For the comparison to be meaningful and effective, node providers and consumers have to agree upon attribute names and values. This requirement for exact matching and coordination between providers and consumers makes such systems inflexible and difficult to extend to new characteristics or concepts. Moreover, in a heterogeneous multi-institutional environment such as the Grid, it is difficult to enforce the syntax and semantics of node descriptions. Therefore, MDS neither offers expressive description facilities nor provides sophisticated matchmaking capabilities. Hence, in such distributed computing environments, where resources come and go dynamically, there is a demand for a framework that supports semantic description and semantic discovery of services and resources.

Currently, there is no tool available for converting the WSDL description of a Grid Service into an OWL file. The WSDL2OWL-S tool that comes with the OWL-S package can convert WSDL into an OWL-S file [15], but it cannot convert Grid WSDL into OWL. This is because the WSDL written for a Grid Service is WSRF compliant and contains WSRF-specific elements that the tool does not recognize. Moreover, there are no standards for specifying ResourceProperties in the WSDL file or in the service ontology. Currently there is no component in the Globus Toolkit that enables the Service Provider to describe a service semantically. Since the MDS registry is built using UDDI v.2, it supports conventional keyword matching of services, and semantic searching and retrieval cannot be expected from MDS4. Hence, to describe services semantically, we need to create an ontology of the grid service, but MDS4 currently does not include any tool to create a service ontology.
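A converter for WSRF-compliant WSDL would at least have to notice the WSRF-specific markup that WSDL2OWL-S does not recognize. As a hedged illustration, assuming GT4-style WSDL in which a `portType` carries a namespaced `ResourceProperties` attribute, the sketch below extracts that attribute with Python's standard `xml.etree.ElementTree` module. The sample document and its names are hypothetical, the function matches on the attribute's local name so it does not depend on the exact WS-ResourceProperties namespace URI, and none of this is part of WSDL2OWL-S or the Globus Toolkit.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"

# Hypothetical GT4-style WSDL fragment with a WS-ResourceProperties draft namespace.
SAMPLE_WSDL = """<definitions name="MathService"
    xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:wsrp="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.xsd"
    xmlns:tns="http://example.org/MathService">
  <portType name="MathPortType" wsrp:ResourceProperties="tns:MathResourceProperties">
    <operation name="add"/>
  </portType>
</definitions>"""


def resource_properties(wsdl_text: str) -> dict[str, str]:
    """Map each WSDL portType name to the QName in its ResourceProperties attribute."""
    root = ET.fromstring(wsdl_text)
    found: dict[str, str] = {}
    for port_type in root.iter(f"{{{WSDL_NS}}}portType"):
        for attr_name, value in port_type.attrib.items():
            # Namespaced attributes appear as "{namespace}localName"; compare local names only.
            if attr_name.rsplit("}", 1)[-1] == "ResourceProperties":
                found[port_type.get("name")] = value
    return found


print(resource_properties(SAMPLE_WSDL))  # {'MathPortType': 'tns:MathResourceProperties'}
```

The returned QName is left unresolved; mapping it to the corresponding XML Schema element and then to ontology concepts is the part for which, as noted above, no standard exists.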


7. Proposed Approach

With the motivation of addressing the issues discussed in the last section, this project introduces several approaches to implementing the knowledge layer and proposes a semantic grid architecture using those approaches.

• Section 8 proposes and implements a semantic grid architecture by integrating the Protégé editor with the Globus Toolkit, and implements the Parameter Matchmaking Algorithm for semantic discovery of services. However, in this approach service providers need expertise in the Protégé editor to create ontologies of their services.
• In Section 9, we propose a five-layered semantic grid architecture using the Gridbus broker that addresses the need for a semantic component in the grid environment to describe and discover grid resources semantically.
• It is decided to devise a knowledge layer for the semantic description of resources and their semantic retrieval, and for the semantic description of services and the matchmaking of advertised grid services against requested ones.


8. Semantic Grid Services using PEG

8.1 Introduction

The main objective of this research work is to extend the capability of the Globus Toolkit (GT) to support semantic description and discovery of Grid Services. We have integrated GT with the Protégé editor to support Globus users in describing Grid services semantically. This Protégé Enabled Globus toolkit (PEG) is used for the semantic description of services by creating service ontologies, and the Algernon inference engine is used to interact with the created ontologies. We have also proposed a new algorithm, called the Parameter Matchmaking Algorithm, that computes various degrees of match between advertised service descriptions and requested ones based on the Input, Output and Functionality (IOF) parameters. In contrast to algorithms that return only success or failure, the ranked degrees of match obtained from the proposed algorithm provide better precision in selecting a service from a large set of services. A separate Grid Portal, developed using the GridSphere framework, enables the service requester to submit a query and performs the matchmaking of the requested service against the advertised ones. The proposed algorithm has been tested successfully in the PEG toolkit for the semantic discovery of grid services.
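As a hedged illustration of how ranked degrees can order candidate services, the sketch below scores each advertisement over Input, Output and Functionality concepts using a hypothetical taxonomy and a simple sum of per-parameter degrees, discarding any service with a failed parameter. This is not the Parameter Matchmaking Algorithm, whose details are not given in this section; the degree values, the summation and all service and concept names are assumptions.

```python
# Hypothetical IOF ranking sketch; NOT the report's Parameter Matchmaking Algorithm.
PARENT = {
    "DenseMatrix": "Matrix",
    "MatrixMultiply": "LinearAlgebra",
    "LinearAlgebra": "Computation",
}


def is_subclass(sub, sup):
    """Reflexive, transitive subclass test over the toy taxonomy."""
    current = sub
    while current is not None:
        if current == sup:
            return True
        current = PARENT.get(current)
    return False


def degree(advertised, requested):
    """3 = exact, 2 = advertised more general, 1 = advertised more specific, 0 = fail."""
    if advertised == requested:
        return 3
    if is_subclass(requested, advertised):
        return 2
    if is_subclass(advertised, requested):
        return 1
    return 0


def score(request, advert):
    """Sum of per-parameter degrees over I, O and F; None if any parameter fails."""
    parts = [max((degree(a, r) for a in advert["outputs"]), default=0) for r in request["outputs"]]
    parts += [max((degree(a, r) for r in request["inputs"]), default=0) for a in advert["inputs"]]
    parts.append(degree(advert["function"], request["function"]))
    return None if 0 in parts else sum(parts)


def rank(request, adverts):
    """Return (service, score) pairs, best first, dropping services that fail."""
    scored = [(name, score(request, advert)) for name, advert in adverts.items()]
    kept = [(name, s) for name, s in scored if s is not None]
    kept.sort(key=lambda item: item[1], reverse=True)  # stable: ties keep advert order
    return kept


request = {"inputs": ["DenseMatrix"], "outputs": ["Matrix"], "function": "MatrixMultiply"}
adverts = {
    "svcA": {"inputs": ["Matrix"], "outputs": ["Matrix"], "function": "MatrixMultiply"},
    "svcB": {"inputs": ["DenseMatrix"], "outputs": ["DenseMatrix"], "function": "LinearAlgebra"},
    "svcC": {"inputs": ["DenseMatrix"], "outputs": ["Matrix"], "function": "Sorting"},
}
print(rank(request, adverts))  # [('svcA', 8), ('svcB', 6)]
```

A plain sum treats I, O and F as equally important; weighting functionality more heavily would be an equally plausible design choice.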

8.2 Layered Architecture

In PEG, the Protégé editor is integrated with GT to meet the demand for a single toolkit, extending GT's capability to support the semantic description and representation of services through service ontologies. We propose a five-layered architecture, shown in Fig 8.2.1, that uses PEG as middleware for the semantic description and discovery of services. We omit the discussion of the Fabric layer to avoid redundant explanation.


Fig 8.2.1: A layered architecture for Semantic Grid Services using PEG (layers, from top to bottom: Application Layer with Grid Information Portlet, Semantic Discovery Portlet and Application Portlet; Semantic Knowledge Layer with Tokenizer Component and Service Ontology; High-Level Grid Services with Information Services (MDS), Computational Grid (GRAM), Data Management/File Services (GridFTP) and Protégé_3.1; Grid Middleware Services with GT4 middleware and GSI for authentication and authorization; Fabric Layer with resources R1 to R4)

Grid Middleware Services This layer incorporates the Grid middleware; in this research work we use PEG as the Grid middleware. It also contains the protocols required for authentication and authorization, which are implemented using the Grid Security Infrastructure (GSI) provided by PEG.

High Level Grid Services This layer uses communication protocols to initiate, control and monitor the shared use of individual resources, and to handle accounting and payment for that use. It is responsible for the management of individual resources as well as for global resource management and interaction with collections of resources. The other high level services in this layer are the information service (MDS), the job management service (GRAM) and the data management service (GridFTP). This layer also allows the grid service provider to give semantic meaning to the services advertised in the MDS registry using Protégé in PEG.

Knowledge Services Layer Running on top of the high level grid services layer, the knowledge services layer can provide knowledge discovery from large amounts of data. This layer is domain oriented and usually consists of a service ontology built using the Protégé editor. The semantic component in this layer enables the service provider to describe services semantically using the Protégé editor, and it implements the proposed Parameter Matchmaking Algorithm, which computes the degree of match between the advertised service ontology and the requested services on the basis of their IOF parameters.

Application Layer The application layer enables the use of resources in a grid environment through various collaboration and resource access protocols. The semantic portlet in this layer enables the service provider to register a service in the MDS registry and prompts the provider to describe the service semantically using the Protégé editor. The portlet also enables the service requester to submit a query and to retrieve information semantically from the service ontology using the proposed matchmaking algorithm. In addition, this layer may contain various application portlets for using grid resources.


8.3 Parameter Matchmaking Algorithm Matchmaking refers to capability matching, that is, comparing the requested service description with the advertised service descriptions. The goal of this comparison is to determine how similar they are [24], and this degree of similarity is used to determine the degree of match between the advertised services and the requested capabilities. Comparing the requested service requirements with the advertised service descriptions takes all the inputs and outputs into account [19]. In this research work, the proposed algorithm computes the matching degree between a service advertisement (A) and a request (R) by successively applying different filters. The comparison is based on three parameters of the service, namely its Inputs, Outputs and Functionalities (IOF). The service ontology that describes the IOF of the service is created using the Protégé editor of PEG to enable effective matchmaking of services. The algorithm compares the IOF of the requested service with that of the advertised ones and computes the degrees of match listed below.

Exact Match Here the advertised IOF of the service exactly match those of the requested service.

$A(IOF) \equiv R(IOF) \rightarrow \{ A(I) \equiv R(I) \cap A(O) \equiv R(O) \cap A(F) \equiv R(F) \}$

Plug-in match

This filter guarantees that the advertised service A requires less input than has been specified in the request R. In addition, service A is expected to return more specific output data whose semantics is exactly the same as, or very close to, what the user requested.

$A(IOF) \geq R(IOF) \rightarrow \{ A(I) \geq R(I) \cup A(O) \geq R(O) \cup A(F) \geq R(F) \}$


Subsumes match This filter is roughly the reverse of the plug-in filter, and it is weaker than the plug-in match with respect to the extent to which the returned IOF is more specific than what the user requested.

$A(IOF) \leq R(IOF) \rightarrow \{ A(I) \leq R(I) \cup A(O) \leq R(O) \cup A(F) \leq R(F) \}$

Intersection This filter indicates that only some of the requested capabilities match the advertised capabilities.

$A(IOF) \cap R(IOF) \neq \emptyset$

Disjoint The requested service R does not match the described service A according to any of the above filters.

$A(IOF) \neq R(IOF) \rightarrow \{ A(I) \neq R(I) \cup A(O) \neq R(O) \cup A(F) \neq R(F) \}$
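The five filters can be illustrated with a minimal Java sketch for a single parameter. It assumes that each of I, O and F is a flat set of ontology concept names and uses set containment as a stand-in for the ontology reasoning the report performs with Algernon; the class and method names are hypothetical. The containment direction follows the functionality comparison in the experiments (an advertisement offering more operations than requested is a plug-in match). For inputs, the text defines plug-in the other way round (A requires fewer inputs), so a full implementation would flip the containment test for I.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Sketch of the five matching filters for one parameter (I, O or F).
 * Assumption: each parameter is a flat set of concept names, and set
 * containment stands in for ontology subsumption.
 */
class MatchFilters {

    /** Degrees of match, ordered from strongest to weakest. */
    enum Degree { EXACT, PLUGIN, SUBSUMES, INTERSECTION, DISJOINT }

    static Degree classify(Set<String> advertised, Set<String> requested) {
        if (advertised.equals(requested)) {
            return Degree.EXACT;                 // A and R describe the same concepts
        }
        if (advertised.containsAll(requested)) {
            return Degree.PLUGIN;                // A offers everything R asks for, and more
        }
        if (requested.containsAll(advertised)) {
            return Degree.SUBSUMES;              // R asks for more than A offers
        }
        Set<String> common = new HashSet<>(advertised);
        common.retainAll(requested);             // concepts shared by A and R
        return common.isEmpty() ? Degree.DISJOINT : Degree.INTERSECTION;
    }

    public static void main(String[] args) {
        Set<String> mathService = Set.of("Addition", "Subtraction", "Multiplication", "Division");
        System.out.println(classify(mathService, Set.of("Addition", "Subtraction")));
        System.out.println(classify(mathService, Set.of("Addition", "Subtraction", "ReversalOfString")));
    }
}
```

With the four MathService operations as the advertisement, the two sample requests are classified as a plug-in match and an intersection match respectively.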

The algorithm starts by extracting the IOF from the advertised service. Since the ontology knowledge base (KB) has been created using OWL, a reasoner can be used to retrieve information from the KB. Here, we use the Algernon inference engine to interact with the KB and execute queries that retrieve and store the IOF of the advertised service. The requested IOF is then compared with the IOF of the advertised service, and the degree of match is obtained.


Algorithm: Parameter Matchmaking Algorithm
Input:  Advertised_Ontology A, Requester_query R
Output: Degree_of_Match M
Rank:   input_rank, output_rank, functionality_rank

parse A into A(I1,I2,..Im), A(O1,O2,..On) and A(F1,F2,..Op)
parse R into R(I1,I2,..Ir), R(O1,O2,..Os) and R(F1,F2,..Ot)
c1=0, c2=0, c3=0
for each parsed A(I1,I2,..Im), A(O1,O2,..On), A(F1,F2,..Fp)
do if(i=cc ...
input_rank=compute_intermediaterank(m,c1,r)
output_rank=compute_intermediaterank(n,c2,s)
functionality_rank=compute_intermediaterank(p,c3,t)
M=leastof(input_rank, output_rank, functionality_rank)

Rank compute_intermediaterank(i,c,j)
{
    if(i==c==j) then R=1;
    if(i>c=j), then R = 0.75;
    ...
}

Fig 8.3.1: Parameter Matchmaking Algorithm
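Because the pseudocode in Fig 8.3.1 is only partly recoverable, the following is a minimal Java sketch of its aggregation step, M = leastof(input_rank, output_rank, functionality_rank), together with the access rule described for the matchmaking module. The rank values 1 (exact) and 0.75 (plug-in) come from the figure; the values for subsumes, intersection and disjoint are assumptions chosen only to preserve the ordering, and the class name RankAggregator is hypothetical.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/**
 * Sketch of combining the three intermediate ranks into the final degree.
 * EXACT = 1.0 and PLUGIN = 0.75 follow Fig 8.3.1; the lower values are assumed.
 */
class RankAggregator {

    enum Degree {
        EXACT(1.0), PLUGIN(0.75), SUBSUMES(0.5), INTERSECTION(0.25), DISJOINT(0.0);

        final double rank;

        Degree(double rank) { this.rank = rank; }
    }

    /** leastof(input_rank, output_rank, functionality_rank): the weakest rank wins. */
    static Degree leastOf(Degree inputRank, Degree outputRank, Degree functionalityRank) {
        return Collections.min(List.of(inputRank, outputRank, functionalityRank),
                Comparator.comparingDouble((Degree d) -> d.rank));
    }

    /** Access is allowed unless the final degree is intersection or disjoint. */
    static boolean mayInvoke(Degree finalDegree) {
        return finalDegree != Degree.INTERSECTION && finalDegree != Degree.DISJOINT;
    }
}
```

Taking the minimum makes the final degree conservative: a single weak parameter (for example, disjoint outputs) is enough to block invocation even when inputs and functionality match exactly.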

8.4 Implementation For testing and demonstration purposes, the proposed architecture has been implemented, as discussed in this section. The section also discusses the modules involved in this project: creation of the Grid service, service ontology creation, the service provider module and the service requester module.

8.4.1 Developing Grid Service A Grid service can be developed using any available middleware that complements the Grid architecture. In this research work we use PEG, which is based on the Web Services Resource Framework (WSRF) specification. A WSRF-compliant Web Services Description Language (WSDL) file is written that defines the interface for four methods, namely add, sub, multiply and divide. The request and response messages, as well as the input and output messages, have been clearly defined. We also defined one interface that receives the inputs and another that returns the result. The service is implemented in Java. When invoked, the methods add, sub, multiply and divide perform the respective operation on the two input values and return the result to the client module.
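The arithmetic behind the test service can be sketched in plain Java. This is an illustrative assumption, not the report's code: the class name MathServiceImpl and the use of double operands are hypothetical, the WSRF plumbing (WSDL bindings, resource properties, deployment to the Globus container) is omitted, and rejecting a zero divisor is a design choice the report does not state.

```java
/**
 * Plain-Java sketch of the four operations exposed by the test grid service.
 * Hypothetical class; WSRF-specific code is deliberately left out.
 */
class MathServiceImpl {

    double add(double a, double b)      { return a + b; }

    double sub(double a, double b)      { return a - b; }

    double multiply(double a, double b) { return a * b; }

    double divide(double a, double b) {
        if (b == 0.0) {
            // Reject a zero divisor instead of returning Infinity or NaN to the client.
            throw new IllegalArgumentException("divisor must be non-zero");
        }
        return a / b;
    }
}
```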


8.4.2 Creation of Service Ontology The developed Grid service is deployed in the Globus container and registered in MDS. Because MDS does not give semantic meaning to the service, we need to create a service ontology to enable semantic discovery of services. We use the Protégé editor of PEG to describe this service semantically and create the service ontology shown in the Fig 8.4.2.1 series. The ontology describes the methods implemented in the service as well as the service parameters. It may also include non-functional properties such as a contact person. Fig 8.4.2.1a shows the class-instance hierarchy of the grid service and the general category to which the service belongs. Fig 8.4.2.1b shows the object properties that relate the classes. Fig 8.4.2.1c shows the various datatype properties of the described grid service. Fig 8.4.2.1d shows the class hierarchy described for operations and their instances. Fig 8.4.2.1e shows the class hierarchy described for parameters and their instances.

Fig: 8.4.2.1a – Class-Instance Hierarchy of Service Ontology – TreeView


Fig: 8.4.2.1b – Nested View of Service Ontology with Domain-range of object property (object properties shown: methods_implemented (Domain: GridService, Range: Operations); isImplementedBy (Domain: Operations, Range: GridService); isUsedBy (Domain: Parameter, Range: GridService); hasInput (Domain: GridService, Range: Input); hasOutput (Domain: GridService, Range: Output))

Fig 8.4.2.1.c: GridService Class and its Instance with properties


Fig 8.4.2.1.d: Operations Class and its Instances with properties

Fig 8.4.2.1e: Parameter Class and its Instances with properties

8.4.3. Matchmaking Module The Parameter Matchmaking Algorithm is implemented in Java in the knowledge layer of the proposed architecture. The Java implementation of Algernon is used to query the ontology knowledge base; its Java APIs allow various queries to be executed. The algorithm starts by extracting the IOF of the advertised service, executing appropriate Algernon queries over the service ontology described in PEG. The tokenizer, implemented in Java in the semantic component, receives the service requester's query, which is free-form text rather than a structured request, removes unwanted information from it and identifies the requested IOF. The algorithm then goes through the four stages shown in Fig 8.4.3.1 to compute the degree of match. The matchmaking module compares the IOF of the requested service R(IOF) with that of the advertised service A(IOF) individually, in three stages, and computes three intermediate ranks, namely Ir, Or and Fr. In the fourth stage, the aggregate module combines the intermediate ranks, and the least of them is taken as the final rank. This final rank gives the degree of match, and the requester is allowed to access the service if the ranked degree of match is neither intersection nor disjoint [25].

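Fig 8.4.3.1: Stages of Parameter Matchmaking Algorithm (Input Matching compares R(I) with A(I) and yields Ir; Output Matching compares R(O) with A(O) and yields Or; Functionality Matching compares R(F) with A(F) and yields Fr; the Aggregate Module combines Ir, Or and Fr into the Ranked Degree of Match)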
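The tokenizer stage can be sketched as a vocabulary lookup over the words of the query. This is a minimal Java sketch under stated assumptions: the synonym table mapping words such as "multiply" to the ontology concept Multiplication is hypothetical, and any word not in the table is treated as unwanted information. Only the functionality part R(F) is shown.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of extracting requested functionalities R(F) from free-form text.
 * The vocabulary is a hypothetical example, not the report's tokenizer rules.
 */
class QueryTokenizer {

    private static final Map<String, String> VOCABULARY = Map.of(
            "add", "Addition", "addition", "Addition",
            "sub", "Subtraction", "subtraction", "Subtraction",
            "multiply", "Multiplication", "multiplication", "Multiplication",
            "divide", "Division", "division", "Division");

    static Set<String> functionalities(String query) {
        Set<String> concepts = new LinkedHashSet<>();
        // Split on any run of non-letters; words missing from the vocabulary
        // ("and", "service", ...) are discarded as unwanted information.
        for (String token : query.toLowerCase().split("[^a-z]+")) {
            String concept = VOCABULARY.get(token);
            if (concept != null) {
                concepts.add(concept);
            }
        }
        return concepts;
    }
}
```

With this vocabulary, the request "Multiply, add, divide" yields {Multiplication, Addition, Division}, a subset of the MathService operations, consistent with the plug-in result reported for that query.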

8.4.4. Semantic Grid Portal We developed a Grid portal consisting of several portlets that provide the user interface required for semantic description and discovery of services. It provides an interface for service providers to register their grid services and describe them semantically, and an interface for service requesters to submit their queries and perform matchmaking of services. The Service Oriented Architecture model of the proposed architecture for semantic grid services is shown in Fig 8.4.4.1. The sequence diagrams for the service provider and the service requester are shown in Fig 8.4.4.2.


Fig 8.4.4.1: Service Oriented Architecture of Semantic Grid using PEG

Fig 8.4.4.2a: Sequence diagram of service provider
Fig 8.4.4.2b: Sequence diagram of service requester

8.5. Experimental Results The user interface for semantic discovery of services and for matchmaking of advertised and requested services is developed using the Gridsphere framework [26]. The service implemented for testing provides four arithmetic operations, namely addition, subtraction, multiplication and division. Here, the functionalities play a vital role in matching services. The corresponding degrees of match obtained using the Parameter Matchmaking Algorithm are listed in Table 8.5.1, which also indicates whether the service can be invoked for each degree of match. Fig 8.5.1 is a snapshot that shows the requester's query and the corresponding degree of match obtained.

Fig 8.5.1: Snapshot for plugin match

Sl.No | Requested Capability | Ranked Degree of Match | Possibility of Service Invocation
1. | Addition and Subtraction | Plugin | True
2. | Addition, Subtraction, Multiplication and Division | Exact | True
3. | Addition, Subtraction and Reversal of String | Intersection | False
4. | Squaring and Temperature service | Disjoint | False
5. | Addition, Subtraction, Multiplication and Division, Temperature Service | Subsume | True
6. | Multiply, add, divide | Plugin | True
7. | Square service | Disjoint | False

Table 8.5.1: Experimental results for various inputs

8.6 Observations In this research work, we extended the capability of Globus Toolkit 4.0 by integrating the Protégé ontology editor into it. This feature enables Grid service providers to describe their services semantically, and the semantic description of services enables their semantic discovery. A semantic matchmaking algorithm is proposed that performs matchmaking of services on the basis of IOF parameters. The user interface for semantic description and retrieval is developed as a portal, enabling users to interact easily with the grid environment. A MathService is implemented and described semantically using PEG, and the proposed algorithm has been applied to semantic matchmaking of the implemented mathematical services. The proposed architecture using the Parameter Matchmaking Algorithm can also be applied to other specific applications, enabling users to access the grid conveniently.


9. Semantic Grid Architecture using Gridbus Broker 9.1 Introduction This work addresses the need for a semantic component in the grid environment to describe and discover grid resources semantically. We propose a semantic component that enables semantic description of grid resources with the help of an ontology template. Further, we propose a semantic grid architecture by adding a knowledge layer on top of the Gridbus broker architecture, thereby enabling the broker to discover resources semantically. The ontology template has been created by considering all possible types of computing resources in the grid environment, and the Protégé-OWL API is used to update the template. The Algernon inference engine is used to interact with the ontology template to discover suitable resources.

9.2 Motivation In a grid environment, it is essential to make it easy for users to discover the available resources. A resource on a grid can be any entity, ranging from compute servers to databases, scientific instruments and applications. In a grid-like environment, where resources are generally owned by different people, communities or organizations with varied administration policies and capabilities, obtaining and managing these resources is not a simple task. Resource brokers simplify this process by providing an abstraction layer to users who just want to get their work done. In the field of grids and distributed systems, resource brokers are software components that let users access heterogeneous resources transparently [27]. The Gridbus broker is a resource broker designed to support both computational and data grid applications. However, the resource discovery module implemented in the Gridbus broker supports only conventional keyword matching for discovering suitable resources; the broker neither offers expressive description facilities nor provides sophisticated matching capabilities. Resource discovery using semantics is generally more accurate than keyword-based search, because similarities are found using inference logic rather than literal keyword comparison. In this research work, we propose a knowledge layer on top of the Gridbus broker architecture for semantic description and discovery of resources. The Gridbus broker then allows the resource requester to submit a job and execute it on the resource identified by the knowledge layer.

9.3 Resource Brokers Grid platforms support sharing, exchange, discovery, selection and aggregation of geographically distributed, Internet-wide heterogeneous resources such as computers, databases, visualization devices and scientific instruments. However, harnessing the complete power of grids remains challenging for users because of the complexity involved in creating and composing applications and deploying them on distributed resources. Resource brokers hide the complexity of grids by transforming user requirements into a set of jobs that are scheduled on the appropriate resources, managing those jobs and collecting the results when they finish. A resource broker in a data grid must be able to locate and retrieve the required data from multiple data sources and to redirect the output to storage where it can be retrieved by downstream processes. It must also be able to select the best data repositories from multiple sites based on the availability of files and the quality of data transfer. The Gridbus broker, developed by the University of Melbourne, Australia, is one such broker. The user's job and quality of service (QoS) requirements are submitted to the grid resource broker. The broker performs resource discovery based on user-defined characteristics, including price, using the Grid Information Service and the Grid Market Directory. It identifies the list of data sources or replicas and selects the optimal ones. It also identifies the list of computational resources that provide the required application services using the Application Service Provider (ASP) catalogue, and it ensures that the user has the necessary credit or authorized share to use the resources. The broker scheduler maps and deploys data analysis jobs on resources that meet the user's QoS requirements. The broker agent on a resource executes the job and returns the results, which the broker collects and passes to the user. The complete architecture of the Gridbus broker is available at http://www.gridbus.org/broker. We exploit these features of the Gridbus broker to devise a semantic grid architecture. The semantic component implemented on top of the Gridbus broker enables semantic description of grid resources using an adaptive ontology template. It also implements a semantic discovery module that uses the Algernon inference engine to interact with the ontology knowledge base and discover closely matching resources.

9.4 Layered Architecture We propose a five-layered architecture, shown in Fig 9.4.1, that implements a knowledge layer on top of the Gridbus broker architecture and can be used to build a semantic grid infrastructure. The discussion of the Fabric layer is omitted to avoid redundancy.

Core Middleware Layer This layer incorporates the Grid middleware; currently the broker supports Globus, Alchemi, Nimrod-G and Unicore. This layer implements the protocols required for job scheduling, resource allocation and management services.

High Level Middleware Layer/Gridbus Broker This layer uses the services offered by the Gridbus broker. The Gridbus broker follows a service-oriented architecture and is designed on object-oriented principles, with a focus on simplicity, modularity, reusability, extensibility and flexibility. The inputs to the broker are the tasks and their associated parameters with their values. A task is a sequence of commands that describes the user's requirements, and the task requirements drive the discovery of resources such as computational nodes and data resources. The resource discovery module gathers information about the availability of compute resources from remote information services such as the Grid Market Directory or the Grid Index Information Service (GIIS). Optionally, the user can provide the broker with the list of available compute resources. The broker also interacts with the information service on each computational node to obtain its properties. The task description, i.e., the task along with its associated parameters, is resolved or “decomposed” into jobs. A job is an instantiation of the task with a unique combination of parameter values; it is also the unit of work that is sent to a grid node. The set of jobs, along with the set of service nodes, forms the input to the scheduler. The scheduler matches the job requirements with the services and dispatches the jobs to the remote nodes through the actuator component. The actuator submits each job to a remote node using the functionality provided by the middleware running on it. The actuator has been designed to operate with different grid middleware frameworks and toolkits, such as Globus, which primarily runs on Unix-class machines, and Alchemi, a .NET-based grid computing platform for Microsoft Windows computers. Hence it is possible to create a cross-platform grid implementation using the Gridbus broker. On completion of execution, the agent returns any results to the broker and provides debugging information. The monitoring component updates the status of the jobs, which is fed back to the scheduler to update its estimates of the rate of execution and of the performance of the compute resources.

Knowledge Layer Running on top of the high level middleware layer, the knowledge layer provides knowledge discovery from large amounts of data. This layer is domain oriented and usually consists of a resource ontology built using the Protégé editor. The semantic component implemented in this layer enables semantic description of the resources present in the grid environment, and it also provides a framework for semantic discovery of resources across the grid.
A suitable inference engine is used to interact with the resource ontology and obtain the resources the user requests for executing the job. The Protégé-OWL APIs are used to develop and modify the resource ontology of the grid.
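The decomposition of a task into jobs described for the High Level Middleware Layer, where each job is one unique combination of parameter values, amounts to a Cartesian product over the parameter value lists. The Java sketch below is illustrative only; TaskDecomposer and its map-based representation are hypothetical and are not the Gridbus broker's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of parameter-sweep decomposition: every job is one assignment of a
 * value to each task parameter. Hypothetical types, not Gridbus classes.
 */
class TaskDecomposer {

    static List<Map<String, String>> decompose(Map<String, List<String>> parameters) {
        List<Map<String, String>> jobs = new ArrayList<>();
        jobs.add(new LinkedHashMap<>());                  // start from one empty assignment
        for (Map.Entry<String, List<String>> p : parameters.entrySet()) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> partial : jobs) {
                for (String value : p.getValue()) {
                    Map<String, String> job = new LinkedHashMap<>(partial);
                    job.put(p.getKey(), value);           // extend the partial job by one parameter
                    next.add(job);
                }
            }
            jobs = next;
        }
        return jobs;
    }
}
```

A task with parameters x in {1, 2, 3} and y in {a, b} therefore yields six jobs, each of which the scheduler can dispatch to a different node.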


Fig 9.4.1: The Semantic Grid Architecture (layers from top to bottom: Application Layer with portlets, the Semantic Portlet and the resource requester; Knowledge Layer with Semantic Reasoner, Tokeniser, Interpreter, Reasoner Query, Job Descriptor, Job File, App Desc File and Resource Desc File; High Level Middleware layer (Gridbus Broker) with Scheduler, Book Keeper, Jobs, Actuator, Job Monitor and Agent; Core Middleware layer with Globus, Unicore, Alchemi and SRB; Fabric Layer with resources such as storage devices, supercomputers, clusters and desktop machines)

Application Layer The application layer enables the use of resources in a grid environment through various collaboration and resource access protocols. The semantic portlet in this layer enables the resource provider to register a resource in the grid environment and to describe it semantically. The portlet also enables the resource requester to submit a query and to retrieve suitable resources semantically from the resource ontology using a suitable reasoner. This layer may also contain various application portlets for using grid resources and accessing the grid environment.


9.5 Semantic Description of Resources The semantic component implemented in the knowledge layer follows the Semantic Web approach of making information understandable by computers. Information must therefore be described in such a way that computers can interpret it and derive its meaning, which enables them to work more intelligently with that information. Computer-understandable information is information annotated with semantics that describe its meaning. The annotations themselves have to be defined so that computers can interpret and reason with them. A collection of annotations whose meaning is described is called an ontology, and it plays the central role in the knowledge layer proposed in the architecture. Ontologies are used to capture knowledge about some domain of interest: an ontology describes the concepts in the domain and the relationships that hold between those concepts [13]. Different ontology languages provide different facilities. The most recent development in standard ontology languages is the Web Ontology Language (OWL) from the World Wide Web Consortium. OWL is based on a logical model that makes it possible to define concepts formally, so that complex concepts can be built up in definitions out of simpler concepts [13, 28]. In the Grid environment, users and software agents should be able to discover, invoke, compose and monitor grid nodes that offer particular services and have particular properties. An important goal of the semantic grid is therefore to provide a framework for describing resources semantically.

9.6 Resource Ontology Template To date, much ontology creation has been a manual process. In [30], common sense knowledge was extracted manually from different sources and expressed using ontologies. This is inevitably a very labor-intensive process, and there is a need to at least partially automate ontology creation and knowledge extraction. Hence, we can imagine a predefined ontology of classes and relationships, plus a knowledge base of instances, being extended by automated learning [29, 34]. We create an ontology of possible resources using the Protégé editor for semantic description of resources. The structure of our ontology of nodes is motivated by the need to provide semantic information about a resource. The resource ontology proposed in this project takes all possible types of computing resources into account. We propose the following definitions to explain the motivation behind the creation of the ontology template and how it can be used for semantic description.

Definition 1 An ontology template is a domain-specific ontology that provides a hierarchy of classes and the properties that define their characteristics.

Definition 2 Any resource can be modeled as an instance of a specific class, provided that the resource can be described using the properties defined in that class.

Once the ontology template containing classes and properties is created, we can build the knowledge base, which contains the instances and the specific property instantiations. Together, the ontology and the knowledge base make up a semantic repository. Whether the two parts are stored separately, e.g., in two distinct relational databases, depends on the practicalities of the implementation [29]. When a resource is registered in the grid, its information, obtained through the resource monitoring tool of the grid middleware (e.g., MDS of the Globus Toolkit), can be described semantically in the ontology template. The resource information is added as an instance of the respective class. The Protégé-OWL APIs can be used to dynamically create an instance of a particular class and to update its properties. With these features, the resource information of the grid environment can be described semantically, which in turn enables semantic discovery of grid resources.
We also develop a semantic discovery module that uses the Algernon inference engine to query the knowledge base of the grid and retrieve the resource that most closely matches the user's request.
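Definition 2 and the instance-creation step can be sketched with an in-memory stand-in for the ontology template and knowledge base. In this Java sketch, which is an assumption rather than the report's Protégé-OWL code, each class lists the properties that may describe it, and registering a resource creates a numbered instance such as C1024_0 (the naming follows the Karlsruhe-model example in Section 9.8.1). The class ResourceKnowledgeBase and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * In-memory sketch of an ontology template (classes and their properties)
 * plus a knowledge base of instances. Hypothetical stand-in for Protégé-OWL.
 */
class ResourceKnowledgeBase {

    record Instance(String name, String cls, Map<String, Object> properties) {}

    private final Map<String, Set<String>> template = new HashMap<>();
    private final List<Instance> instances = new ArrayList<>();

    void defineClass(String cls, Set<String> properties) {
        template.put(cls, properties);
    }

    Instance addInstance(String cls, Map<String, Object> values) {
        Set<String> allowed = template.get(cls);
        if (allowed == null || !allowed.containsAll(values.keySet())) {
            // Definition 2: only a resource describable by the class's properties may join it.
            throw new IllegalArgumentException("resource cannot be described by class " + cls);
        }
        long n = instances.stream().filter(i -> i.cls().equals(cls)).count();
        Instance inst = new Instance(cls + "_" + n, cls, Map.copyOf(values));
        instances.add(inst);
        return inst;
    }

    List<Instance> instancesOf(String cls) {
        return instances.stream().filter(i -> i.cls().equals(cls)).toList();
    }
}
```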


9.7 Resource Discovery Module This module enables the resource requester to submit information about the resource required to execute the job. The query generator generates different types of Algernon queries depending on the requirements specified by the requester. The resource discovery module executes these queries over the ontology knowledge base to obtain the resources that most closely match the request. Once a suitable resource is obtained, the resource discovery module submits the resource information to the job descriptor. Meanwhile, the requester submits to the job descriptor the application to be executed on the resource. With this information, the descriptor creates an application description file and a resource description file, both of which the broker requires to run the application on the specified resource.
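The discovery step can be sketched as filtering the described resources against the requester's constraints and ranking the remaining candidates. The Java sketch assumes a single numeric attribute (free memory, as with hasFreeMB) plus an operating-system label, and takes "closest match" to mean the qualifying node with the most free memory; the report's actual Algernon queries and ranking criteria are not given, so these choices and the names ResourceDiscovery and Node are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/**
 * Sketch of resource selection: keep nodes that satisfy the request,
 * then pick the best one. Matching and ranking rules are assumptions.
 */
class ResourceDiscovery {

    record Node(String host, String os, int freeMB) {}

    static Optional<Node> discover(List<Node> nodes, String requiredOs, int minFreeMB) {
        return nodes.stream()
                .filter(n -> n.os().equalsIgnoreCase(requiredOs))   // hard constraint: operating system
                .filter(n -> n.freeMB() >= minFreeMB)               // hard constraint: free memory
                .max(Comparator.comparingInt(Node::freeMB));        // rank survivors by free memory
    }
}
```

An empty result signals that no described resource satisfies the request, in which case no job description files would be produced.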

Fig 9.7.1: Service Oriented Model of Semantic Component

Fig 9.7.1 identifies the modules implemented in the semantic component, which provides a framework for both semantic description and semantic discovery of resources. The resource description module consists of a resource ontology template, created using the Protégé editor, that provides the concepts and properties with which a resource is described. With this approach, the service provider needs no knowledge of either the Protégé editor or OWL. The ontology template is domain specific; here the domain is the set of possible computing resources in the grid environment. The resulting OWL file can then be queried by any inference engine to interact with the knowledge base. Here, we use the Algernon inference engine to query the ontology knowledge base and retrieve information semantically.

9.8 Design and Implementation For experimentation and demonstration purposes, Globus Toolkit 4.0 has been installed on six machines, all running the Fedora Core operating system. All the components of the Globus Toolkit have been configured successfully. The MDS component has also been tested so that the grid-info-search tool gathers the resource information of the local host and stores it in the LDAP server. All prerequisite libraries have been installed and tested for proper functioning.

9.8.1 Creation of Ontology Template
Protégé has been installed on one of the machines, and the ontology template has been created by considering all possible computing resources in the grid. The concepts for these resources have been defined using relations and properties, so that any resource can be characterized by its properties. The ontology template can be explained in terms of the Karlsruhe Ontology Model. For illustration, we define the concept "RAM" and, for simplicity, describe only its sub-concept "C1024" in detail:

$$
\begin{aligned}
C &= \{\mathrm{owl{:}Thing},\ \mathrm{RAM},\ \mathrm{C1024},\ \mathrm{C128},\ \mathrm{C256},\ \mathrm{C512}\}\\
T &= \{\mathrm{Integer}\}\\
\le_C &= \{(\mathrm{owl{:}Thing},\mathrm{RAM}),\ (\mathrm{RAM},\mathrm{C1024}),\ (\mathrm{RAM},\mathrm{C128}),\ (\mathrm{RAM},\mathrm{C256}),\ (\mathrm{RAM},\mathrm{C512})\}\\
A &= \{\mathrm{hasFreeMB}\}\\
R &= \{\mathrm{presentInComputer}\}\\
\sigma_A &= \{(\mathrm{hasFreeMB},\ (\mathrm{Parameter},\ \mathrm{Integer}))\}\\
\sigma_R &= \{(\mathrm{presentInComputer},\ (\mathrm{RAM},\ \mathrm{WorkStation}))\}\\
I &= \{\mathrm{C1024\_0}\}\\
V &= \{192\}\\
t_C &= \{(\mathrm{C1024},\ \{\mathrm{C1024\_0}\})\}\\
t_T &= \{(\mathrm{Integer},\ \{192\})\}\\
t_R &= \{(\mathrm{presentInComputer},\ \{(\mathrm{C1024},\ \mathrm{g06.grid})\})\}\\
t_A &= \{(\mathrm{hasFreeMB},\ (\mathrm{Integer},\ \{192\}))\}
\end{aligned}
$$
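To make these sets concrete, here is a minimal sketch that encodes the RAM fragment (C, the hierarchy ≤_C, the instantiation t_C and the hasFreeMB value) with plain Java collections rather than the Protégé-OWL API. It also shows the kind of subsumption reasoning an inference engine performs: asking for the instances of RAM returns C1024_0, which is asserted only as an instance of C1024. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical encoding of the RAM fragment of the ontology template using
// plain Java collections (not the Protégé-OWL API). PARENT holds the direct
// pairs of the concept hierarchy, INSTANCES holds t_C and HAS_FREE_MB holds
// the hasFreeMB value of each instance.
class RamOntologyFragment {
    static final Set<String> CONCEPTS =
        Set.of("owl:Thing", "RAM", "C1024", "C128", "C256", "C512");

    // Direct sub-concept -> super-concept
    static final Map<String, String> PARENT = Map.of(
        "RAM", "owl:Thing",
        "C1024", "RAM",
        "C128", "RAM",
        "C256", "RAM",
        "C512", "RAM");

    static final Map<String, Set<String>> INSTANCES =
        Map.of("C1024", Set.of("C1024_0"));

    static final Map<String, Integer> HAS_FREE_MB = Map.of("C1024_0", 192);

    // Reflexive-transitive closure of the hierarchy: walk up the parents.
    static boolean isSubConceptOf(String sub, String sup) {
        for (String c = sub; c != null; c = PARENT.get(c)) {
            if (c.equals(sup)) {
                return true;
            }
        }
        return false;
    }

    // Instances of a concept, including the instances of all its sub-concepts.
    static Set<String> instancesOf(String concept) {
        Set<String> result = new TreeSet<>();
        for (String c : CONCEPTS) {
            if (isSubConceptOf(c, concept)) {
                result.addAll(INSTANCES.getOrDefault(c, Set.of()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(isSubConceptOf("C1024", "owl:Thing")); // true
        System.out.println(instancesOf("RAM"));                   // [C1024_0]
        System.out.println(HAS_FREE_MB.get("C1024_0"));           // 192
    }
}
```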

The values of the properties used to define the concepts can be obtained from MDS. The Fig 9.8.1.1 shows the class hierarchy of the ontology template, and the Fig 9.8.1.2 shows the object properties that relate the classes to one another.

Fig 9.8.1.1: The ontology template shows classes and properties.


Fig 9.8.1.2: Domain and Range of Properties of ontology template

9.8.2 Creation of Knowledge Base
The MDS component offers tools that publish the resource information of a host so that it can be accessed from remote machines. The grid-info-search tool aggregates the properties of the node and stores them in the LDAP server, from which they can be retrieved with a suitable LDAP query. The resource description module has been developed in Java. The module contacts the grid nodes periodically, retrieves resource information by executing suitable LDAP queries on those nodes and updates the ontology template with it. Protégé offers a versatile library, the Protégé-OWL API, with which one can manage an ontology and perform operations on it, such as creating and deleting instances of concepts and assigning values to properties. For every class of information retrieved from a grid node, we create an instance of the appropriate concept in the ontology template, and the retrieved values are assigned to the respective properties of that concept. At this point, the ontology template, with its concepts and properties together with the corresponding instances and property values, constitutes the knowledge base of the grid resources. This semantic description of resources allows an inference engine to interact with the knowledge base and retrieve information semantically. The Java module is executed periodically, so that the removal and addition of resources are reflected in the knowledge base. The Fig 9.8.2.1 shows the knowledge base consisting of resource information described semantically in the ontology template.

The following code segment creates an instance of the WorkStation concept:

OWLNamedClass computerC = owlModel.getOWLNamedClass("WorkStation");
RDFIndividual computerI = computerC.createRDFIndividual(hostName); // hostName: node name read from MDS

The following code obtains a datatype property and object properties and shows how values are assigned to them:

OWLDatatypeProperty hasIP = owlModel.getOWLDatatypeProperty("hasIP");
computerI.addPropertyValue(hasIP, ipAddress); // ipAddress: value read from MDS
// cpuI and cVendorI are individuals of the CPU-related concepts, created in the same way
cpuI.addPropertyValue(owlModel.getOWLObjectProperty("hasCPUVendor"), cVendorI);
computerI.addPropertyValue(owlModel.getOWLObjectProperty("hasCPU"), cpuI);
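The periodic refresh, which keeps the knowledge base in step with resources joining and leaving the grid, can be sketched as a diff between the current knowledge base and the latest snapshot. This is a hypothetical illustration: the knowledge base is reduced to a map from hostname to free RAM, a fixed snapshot stands in for the LDAP query against MDS, and the 60-second interval is an assumption, since the report does not state one.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the periodic refresh. The knowledge base is modelled
// as a map from hostname to free RAM (MB); the real module creates and
// deletes ontology individuals through the Protégé-OWL API instead.
class KnowledgeBaseSync {
    private final Map<String, Integer> kb = new TreeMap<>();

    // Applies one snapshot of the grid and reports which hosts were added
    // and which disappeared since the previous snapshot.
    synchronized Map<String, Set<String>> sync(Map<String, Integer> snapshot) {
        Set<String> added = new TreeSet<>(snapshot.keySet());
        added.removeAll(kb.keySet());
        Set<String> removed = new TreeSet<>(kb.keySet());
        removed.removeAll(snapshot.keySet());
        kb.keySet().removeAll(removed); // delete entries of vanished resources
        kb.putAll(snapshot);            // add new entries, refresh existing values
        return Map.of("added", added, "removed", removed);
    }

    synchronized Map<String, Integer> view() {
        return new TreeMap<>(kb);
    }

    public static void main(String[] args) throws InterruptedException {
        KnowledgeBaseSync sync = new KnowledgeBaseSync();
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // A fixed snapshot stands in for the LDAP query against each node's MDS.
        timer.scheduleAtFixedRate(
            () -> System.out.println(sync.sync(Map.of("g06.grid", 192))),
            0, 60, TimeUnit.SECONDS);
        Thread.sleep(500);
        timer.shutdownNow();
    }
}
```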


Fig 9.8.2.1: Knowledge base of the Grid.

9.8.3 Semantic Discovery Module
The discovery module relies on the Algernon inference engine. To convert the user's query into an Algernon query and to provide a flexible querying mechanism, we propose query tags, inspired by the labelling mechanism of GMAIL. In GMAIL, all mails with a particular label can be retrieved with the query "label:label_name". Similarly, we use a property of the resource as the label and the requested value as label_name. However, we implement a modified version of the GMAIL query tag in our discovery mechanism: operators may be included in a query tag for more flexible querying. For example, if the user wants to search for machines with more than 200 MB of free RAM, the query is RAM:>200. Currently, the querying system supports the >, < and = operators as well as NOT. The query mechanism is also designed to query a resource with multiple resource constraints. For example, if the user wants a machine with 200 MB of free RAM and 10000 MB of free hard disk space, the query "freeRAM:200 freeHDD:10000" retrieves the resources with exactly 200 MB of free RAM and 10000 MB of free hard disk space. The query generator module parses the user query using a regular expression, stores the left tag and right tag in a vector and converts them into a suitable Algernon query. Currently, the system supports the following queries (Java string templates in which rightTag holds the requested value; in the comparison forms, rightTag.charAt(0) is the operator and rightTag.substring(1) the value):

"((:instance RAM ?inst)(hasFreeMB ?inst ?val)(:TEST (:LISP (= ?val "+rightTag+")))(presentInComputer ?inst ?instanceComputer))";

"((:instance RAM ?inst)(hasFreeMB ?inst ?val)(:TEST (:LISP ("+rightTag.charAt(0)+" ?val "+rightTag.substring(1)+")))(presentInComputer ?inst ?instanceComputer))";

"((:instance CPU ?inst)(hasL2Cache ?inst ?val)(:TEST (:LISP (= ?val "+rightTag+" ) ) )(:instance WorkStation ?in- stanceComputer)(hasCPU ?instanceComputer ?inst))";

"((:instance CPU ?inst)(hasL2Cache ?inst ?val)(:TEST (:LISP ("+rightTag.charAt(0)+" ?val "+rightTag.substring(1)+" ) ) )(:instance WorkStation ?instanceComputer)(hasCPU ?instanceComputer ?inst))";

"((:instance CPU ?inst)(hasCPUSpeed ?inst ?val)(:TEST (:LISP (= ?val "+rightTag+" ) ) )(:instance WorkStation ?instanceComputer)(hasCPU ?instanceComputer ?inst))";

"((:instance CPU ?inst)(hasCPUSpeed ?inst ?val)(:TEST (:LISP ("+rightTag.charAt(0)+" ?val "+rightTag.substring(1)+" ) ) )(:instance WorkStation ?instanceComputer)(hasCPU ?instanceComputer ?inst))";


"((:instance FileSystem ?inst)(hasFreeSpace ?inst ?val)(:TEST (:LISP (= ?val "+rightTag+" ) ) )(:instance WorkStation ?instanceComputer)(hasFileSystem ?instanceComputer ?inst))";

"((:instance FileSystem ?inst)(hasFreeSpace ?inst ?val)(:TEST (:LISP ("+rightTag.charAt(0)+" ?val "+rightTag.substring(1)+" ) ) )(:instance WorkStation ?instanceComputer)(hasFileSystem ?instanceComputer ?inst))";
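The parsing and template-filling steps of the query generator can be sketched together in one class. This is a minimal sketch under stated assumptions: the regular expression, the class name and the tag names RAM, L2Cache, CPUSpeed and HDD are hypothetical, NOT is not handled, and each term yields its own Algernon query. How the original system combines the results of a multi-constraint query is not stated; keeping only the hosts returned for every term would be one option.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the query generator: each whitespace-separated term
// "leftTag:[op]value" is split with a regular expression and the matching
// Algernon template is filled in. Tag names are assumptions; NOT is omitted.
class AlgernonQueryGenerator {
    record Term(String leftTag, String op, String value) {}

    // left tag, optional operator, integer value
    private static final Pattern TERM = Pattern.compile("(\\w+):([<>=]?)(\\d+)");

    static List<Term> parse(String userQuery) {
        List<Term> terms = new ArrayList<>();
        for (String token : userQuery.trim().split("\\s+")) {
            Matcher m = TERM.matcher(token);
            if (!m.matches()) {
                throw new IllegalArgumentException("Bad query term: " + token);
            }
            // A term without an operator is an equality test.
            String op = m.group(2).isEmpty() ? "=" : m.group(2);
            terms.add(new Term(m.group(1), op, m.group(3)));
        }
        return terms;
    }

    static String toAlgernon(Term t) {
        String test = "(:TEST (:LISP (" + t.op() + " ?val " + t.value() + ")))";
        switch (t.leftTag()) {
            case "RAM":
                return "((:instance RAM ?inst)(hasFreeMB ?inst ?val)" + test
                    + "(presentInComputer ?inst ?instanceComputer))";
            case "L2Cache":
                return viaWorkStation("CPU", "hasL2Cache", "hasCPU", test);
            case "CPUSpeed":
                return viaWorkStation("CPU", "hasCPUSpeed", "hasCPU", test);
            case "HDD":
                return viaWorkStation("FileSystem", "hasFreeSpace", "hasFileSystem", test);
            default:
                throw new IllegalArgumentException("Unknown tag: " + t.leftTag());
        }
    }

    // Templates whose resource hangs off a WorkStation through an object property.
    private static String viaWorkStation(String concept, String property, String link, String test) {
        return "((:instance " + concept + " ?inst)(" + property + " ?inst ?val)" + test
            + "(:instance WorkStation ?instanceComputer)(" + link + " ?instanceComputer ?inst))";
    }

    public static void main(String[] args) {
        for (Term t : parse("RAM:>200 HDD:10000")) {
            System.out.println(toAlgernon(t));
        }
    }
}
```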

The module then executes the queries over the knowledge base of the grid and obtains the resources that match the user's request. The module also enables the requester to submit a job and execute it on the resource obtained from the discovery module. The job submitter generates the application description file and the resource description file from the information given by the user and the resource discovered. It then submits both files to the gridbus broker and obtains the results, which are delivered to the user.

9.8.4 Job Descriptor
The resource discovery module interacts with the knowledge base and obtains the best possible resource for the user's requirements. The user is then prompted to submit the job to the gridbus broker, which executes the user's job on the resource discovered by the discovery module. The user is prompted to load the executable, followed by the command that executes the job. With this information, the job descriptor creates two XPML files, the Application Description File and the Resource Description File, which the broker needs to locate the resource and execute the job. The Application Description File is an XML file with special elements as defined in the XML schema that comes with the broker. XPML supports the description of the parameter sweep execution model, in which the same application is run for different values of input parameters, often expressed as ranges. An XPML application description consists of three sections: Parameters, Tasks and Requirements.
- Parameters normally have a name, a type, a domain and any additional attributes. Parameters can be of various types, including integer, string and gridfile, and belong to a “domain” such as single, range or file.
- A task consists of “commands” such as copy, execute and substitute. The copy command specifies a copy operation to be performed; each copy command has a source and a destination file. The execute command is where the actual execution happens: it specifies an executable to be run on the remote node and any arguments to be passed to it on the command line. The substitute command specifies a string substitution inside a text file; it is used to substitute the names of user-defined variables.
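The parameter sweep model can be illustrated by expanding parameter domains into one set of bindings per job. This is a hypothetical sketch: each parameter is given either a single value or an integer range, and the broker is assumed to run one job per combination of values. The class and method names are illustrative; they are not XPML elements or broker APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the parameter-sweep model: every parameter has a
// name and a domain (a single value or an integer range), and one job is
// generated per combination of values. Names are illustrative only.
class ParameterSweep {
    record Parameter(String name, List<String> values) {
        static Parameter single(String name, String value) {
            return new Parameter(name, List.of(value));
        }

        static Parameter range(String name, int from, int to, int step) {
            if (step <= 0) {
                throw new IllegalArgumentException("step must be positive");
            }
            List<String> values = new ArrayList<>();
            for (int v = from; v <= to; v += step) {
                values.add(Integer.toString(v));
            }
            return new Parameter(name, values);
        }
    }

    // Cartesian product of all parameter domains; each map is one job's bindings.
    static List<Map<String, String>> expand(List<Parameter> params) {
        List<Map<String, String>> jobs = new ArrayList<>();
        jobs.add(new LinkedHashMap<>());
        for (Parameter p : params) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> partial : jobs) {
                for (String v : p.values()) {
                    Map<String, String> job = new LinkedHashMap<>(partial);
                    job.put(p.name(), v);
                    next.add(job);
                }
            }
            jobs = next;
        }
        return jobs;
    }

    public static void main(String[] args) {
        List<Map<String, String>> jobs = expand(List.of(
            Parameter.range("x", 1, 3, 1),
            Parameter.single("mode", "fast")));
        // Prints {x=1, mode=fast}, {x=2, mode=fast} and {x=3, mode=fast}, one per line
        jobs.forEach(System.out::println);
    }
}
```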
The resource description file is an XML file describing the resources that can be used by the broker and their properties, as defined in the resource description schema that comes with the broker. The resource description can describe two types of entities: resources, and the credentials used to access them. A resource can be of three types: compute resources, storage resources and services. Compute resources are servers to which the user's jobs can be submitted for execution. Storage resources are used to store the results of execution and can hence be considered data sinks. Service resources provide generic services that can be used by the broker. A compute resource is associated with a “domain”, which can take two values, “local” and “remote”. A local resource can be the local computer or a cluster on which the broker is running. Remote compute resources represent nodes on the grid that have a job-submission interface accessible over a network, so resources running grid middleware such as Globus, Unicore and Alchemi are described here.

A storage resource is a data sink where the user can opt to store the results of execution of a grid application; currently, this feature is not fully supported by the broker. A service resource can be of two types: “information” services and “application” services. Information services are typically entities that provide information about other resources or services; currently supported service types include the SRB MCAT and the replica catalog. Application services provide applications hosted on nodes that can be accessed as a service. A “credentials” entry describes the user's authentication information used to access the services provided by a grid resource. Credentials can be of the following types: X.509-based proxy certificates, simple username/password pairs, MyProxy saved proxies or key stores.

For demonstration, we wrote a simple application that multiplies two numbers without taking any external arguments. This job must be executed on a resource identified by the semantic discovery module. The application has been compiled and the corresponding class file created. The user searches for resources through the semantic component, and the resource discovery module discovers a suitable resource and provides the job descriptor with its hostname. Meanwhile, the user's job and the Unix command needed to execute it are submitted to the descriptor. Hence, the class file, the command to execute the class file and the hostname of the resource are the inputs to the job descriptor. The job descriptor is implemented in Java and automatically creates the application description file and the resource description file from this input. These files are submitted to the broker, which initiates the scheduling of jobs. Once the execution is over, the results are collected and presented to the user.
The following is the resource description file used when the node discovered by the resource discovery module is the same node on which the broker is running. In that case, the compute resource is a local resource.


xsi:noNamespaceSchemaLocation="../xml/ResourceDescriptionSchema.xsd">

The following code segment identifies the credential needed to access the local resource.

The following code segment identifies the local compute resource. In this case, “g06.grid” is the hostname of the local compute resource.

The following code segment is from the resource description file used when the job is to be executed on a remote node.

The following code segment identifies the credential needed to access the remote resource. Here a password has been specified, which is used for authentication. The following code segment identifies the remote compute resource. In this case, “g03.grid” is the hostname of the remote compute resource.

Like the resource description file, the application description file is created automatically; its content depends purely on the job's requirements, and its structure changes with the nature of the job and the location of the resource. The following code segment of the application description file is used when the job is to be executed on the local compute resource.

Since the execution location is local, it is assumed that the class file is available in a directory that the broker can locate. “java cals” is the command that executes the job, and the results are stored in the broker directory. The following code segment of the application description file is used when the job is to be executed on the remote compute resource. In this case, the job file must be transferred from the local machine on which the broker is running to the remote node. The execution takes place on the remote node, and the results are transferred from the remote node back to the local machine.


The following code segment copies the class file specified in the source element to the location on the remote node specified in the destination element.

The following code segment executes the job, which is now present on the remote node.

Once the execution is over, the results are collected and transferred back from the remote node to the local machine.
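The local and remote cases differ in which task commands the application description needs. The decision the job descriptor makes can be sketched as follows, assuming that "local" simply means the discovered hostname equals the broker's hostname. The command strings are illustrative and are not XPML syntax, and the file names in the usage example ("cals.class", "cals.out") are assumed.

```java
import java.util.List;

// Hypothetical sketch of the job descriptor's decision: a job runs on a
// "local" compute resource when the discovered host is the broker's own
// host, and otherwise needs copy-in / execute / copy-out tasks. The command
// strings are illustrative and are not XPML syntax.
class JobDescriptorPlan {
    record Plan(String domain, List<String> commands) {}

    static Plan plan(String classFile, String command, String resultFile,
                     String discoveredHost, String brokerHost) {
        if (discoveredHost.equalsIgnoreCase(brokerHost)) {
            // Local: the class file is already in a directory the broker can
            // locate, and the results stay in the broker directory.
            return new Plan("local", List.of("execute " + command));
        }
        // Remote: stage the class file in, run it, then stage the results back.
        return new Plan("remote", List.of(
            "copy " + brokerHost + ":" + classFile + " -> " + discoveredHost + ":" + classFile,
            "execute " + command,
            "copy " + discoveredHost + ":" + resultFile + " -> " + brokerHost + ":" + resultFile));
    }

    public static void main(String[] args) {
        // "cals.class" and "cals.out" are assumed names for the demo job's files.
        System.out.println(plan("cals.class", "java cals", "cals.out", "g06.grid", "g06.grid"));
        System.out.println(plan("cals.class", "java cals", "cals.out", "g03.grid", "g06.grid"));
    }
}
```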

Once the description files are created, the broker is invoked and scheduling is initiated. The following code segment invokes the broker with the description files as input.

import org.gridbus.broker.farming.common.GridbusFarmingEngine;
import org.gridbus.broker.farming.common.BrokerProperties;

public class SimpleBrokerAPIExample {

    public static void main(String[] args) throws Exception {
        try {
            // Create a new "Farming Engine".
            // 'properties' is assumed to be a BrokerProperties object prepared
            // beforehand; its creation is not shown in the original listing.
            GridbusFarmingEngine fe = new GridbusFarmingEngine(properties);

            // Set the application description file
            fe.setAppDescriptionFile("/home/adf.xml");

            // Set the resource description file
            fe.setResourceDescriptionFile("/home/rdf.xml");

            // Call the initialise method
            fe.init();

            // Start scheduling
            fe.schedule();

            /*
             * The schedule method returns immediately after starting the
             * scheduling. To wait for results / monitor jobs,
             * use the following loop:
             */
            while (!fe.isSchedulingFinished() && !fe.isSchedulingFailed()) {
                Thread.sleep(1000); // poll once per second instead of busy-waiting
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

This code segment waits until scheduling has finished or failed, and the results are collected once the execution is over. The discovery module has so far been tested with the Globus middleware, but since it uses the gridbus broker, it can also support other middleware such as Alchemi.

9.9 Experimental Results


The following are some screenshots of the implementation. A separate GridSphere portlet has been developed to let the user interact with the semantic discovery component and submit jobs to the broker, and the portlet has been deployed in the Tomcat container. Since the knowledge layer has been implemented as a web service identified by a unique URI, the semantic component can be accessed from a remote machine to discover a resource and to submit a job to the broker. The Fig 9.9.1 shows the front page of the discovery module, which presents a user interface for entering a query. For new users, the interface also provides a link describing the query format and other information needed to use the semantic component. The query is converted into a suitable Algernon query and executed over the knowledge base, and the hostname of the closest matching resource is listed, as shown in Fig 9.9.2. An optional “Detail” button provides more details about the host. The portlet then lets the user proceed to the broker information page, shown in Fig 9.9.3, through which the job can be submitted to the broker and executed on the resource obtained from the discovery module.

Fig 9.9.1: User Interface to enter query


Fig 9.9.2: Semantic Retrieval of Resource Information

Fig 9.9.3: Job Submission interface with Gridbus Broker


9.10 Observations
The knowledge layer implemented in the proposed architecture enables the gridbus broker to describe and discover resources semantically. The resource description module has been designed so that any addition or removal of resources is reflected in the ontology template, which makes the system more flexible. With this ontology template, the service provider no longer needs to know the Protégé ontology editor. However, the ontology template developed in this project depends on the MDS component, and hence it may not support middleware other than Globus.


10. A Case Study of the WSMX Environment
In this section, we describe an emerging technology for developing semantic web services, the Web Service Execution Environment (WSMX). It is an execution environment for the dynamic discovery, mediation and invocation of web services based on the Web Service Modeling Ontology (WSMO). We give a brief introduction to various aspects of WSMX and describe our experience with WSMX while developing a semantic web service and invoking it dynamically.

10.1 Web Service Modeling eXecution Framework
The Web Services Modeling Execution Environment (WSMX) [32] is an execution environment for the dynamic discovery, mediation and invocation of web services. WSMX is based on the Web Services Modeling Ontology (WSMO), an ontology for describing various aspects of Semantic Web Services. So far, web services have mostly been hard-wired into the provider's software, and their abilities are not very sophisticated. Their main drawback is the lack of a semantic description, which makes automatic or semi-automatic service retrieval difficult. WSMO aims to change this situation and to fully enable web service functionality by adding semantic descriptions, which is expected to lead to a wide proliferation of web services. WSMX is a reference implementation and test bed environment for WSMO. Web service capability descriptions are stored in the WSMX repository and are expressed as logical expressions, so that inferences can be drawn from them. When a service request (a Goal) is sent to WSMX, the WSMX components start to perform their actions. The Fig 10.1.1 shows the architecture of WSMX; its components are briefly explained as follows.

Fig 10.1.1: The Architecture of WSMX

Message Adapters: These components are in fact external to the WSMX architecture, but as long as back-end applications do not offer a WSMO API, they are essential for connecting any kind of back-end application to the WSMX server. The Message Adapters transform any message format (e.g. RosettaNet, UBL, EDIFACT, ANSI X12, xCBL) into the expected WSML message format.

User Interface (WSML Editor): This component is used by the service provider to create the descriptions of Web Services, Ontologies, Mediators and Goals.

Communication Manager: The Communication Manager has a twofold purpose. On the one hand, it provides an interface to the Adapters to accept and send WSML messages; on the other hand, it is responsible for translating a message into whatever data representation is required by the external entity to be invoked. This translation is often referred to as ‘lowering’. To do so, the Communication Manager interprets the interface part of the WSMO service description to determine which binding is necessary for the service. Where it is not fixed, the transport protocol and associated security required for communicating with the external entity may also be selected.
After invocation, the Communication Manager is responsible for translating the message received from an external entity, in whatever data representation it uses, into WSML. This translation is often called ‘lifting’.

WSMX Manager: This component is the coordinator within the architecture. Although it is not essential for describing the use cases, it is mentioned here because it reflects the Service Oriented Architecture (SOA) approach of WSMX. All data handled inside WSMX is internally represented as a notification with a type and a state. The WSMX manager manages the processing of all notifications by passing them to other components as appropriate.

Ontology Repository: WSMX will offer management of capability descriptions stored in a repository. This repository could be centralized or decentralized, although this use case and the current architecture only cover a decentralized repository in each WSMX. These repositories are designed to store, search, retrieve and manage WSMO descriptions of Semantic Web Services. In this document, the name Capability Repository is used synonymously with Ontology Repository.

Matchmaker: WSMX will offer a set of usable Semantic Web Services by matching the capabilities stored in the repository against the goal provided by the user. In subsequent versions, WSMX will even be capable of fulfilling goals by composing the capabilities of several Semantic Web Services. In both cases, the result of the Matchmaker can be zero, one or many Web Services.

Mediator: WSMX will offer mediation of communicated data. The mediation component tries to determine a Mediator for a request when one is needed. This mediation can be between two or more ontologies in the matchmaking process and, in the opposite direction after invocation, between the instance data of a known ontology provided by the executed Web Service and the data required by the invoking application. Another application of Mediators is mapping the data provided in the input of the goal to the input actually required by the Web Service.

Choreography Engine: The choreography of a Web Service defines its communication pattern, that is, the way a requester can interact with it. The requester of the service has its own communication pattern, and only if the two match precisely can direct communication between the requester and the provider of a service take place.
Since the client's communication pattern is in general different from the one used by the Web Service, the two will not be able to communicate directly, even if they understand the same data formats. The role of the Choreography Engine is to mediate between the requester's and the provider's communication patterns. This means providing the means for a runtime analysis of two given choreography instances and using Mediators to compensate for possible mismatches, for instance by generating dummy acknowledgement messages, grouping several messages into a single one, changing their order or even removing some of the messages, in order to facilitate communication between the two parties.
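The coordinating role of the WSMX Manager, in which every piece of data is a notification with a type and a state that is handed from component to component, can be sketched as a small state-driven dispatcher. This is a hypothetical illustration: the states, the component lambdas and all names are stand-ins chosen to mirror the components described above (Matchmaker, Mediator, lowering and lifting in the Communication Manager); they are not the WSMX API.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the WSMX manager's coordinating role: all data is a
// notification with a type and a state, and the manager hands each
// notification to the component registered for its current state until it
// reaches DONE. States and components are illustrative, not WSMX's API.
class WsmxManagerSketch {
    enum State { RECEIVED, DISCOVERED, MEDIATED, INVOKED, DONE }

    record Notification(String type, State state, String payload) {
        Notification advance(State next, String note) {
            return new Notification(type, next, payload + " | " + note);
        }
    }

    private static final int MAX_STEPS = 100; // guard against components that loop
    private final Map<State, UnaryOperator<Notification>> components = new EnumMap<>(State.class);

    void register(State state, UnaryOperator<Notification> component) {
        components.put(state, component);
    }

    Notification process(Notification n) {
        for (int step = 0; n.state() != State.DONE; step++) {
            if (step >= MAX_STEPS) {
                throw new IllegalStateException("Too many steps for " + n.type());
            }
            UnaryOperator<Notification> component = components.get(n.state());
            if (component == null) {
                throw new IllegalStateException("No component for state " + n.state());
            }
            n = component.apply(n);
        }
        return n;
    }

    public static void main(String[] args) {
        WsmxManagerSketch manager = new WsmxManagerSketch();
        // Stand-ins for the Matchmaker, Mediator, lowering and lifting steps.
        manager.register(State.RECEIVED, n -> n.advance(State.DISCOVERED, "matched"));
        manager.register(State.DISCOVERED, n -> n.advance(State.MEDIATED, "mediated"));
        manager.register(State.MEDIATED, n -> n.advance(State.INVOKED, "lowered and sent"));
        manager.register(State.INVOKED, n -> n.advance(State.DONE, "response lifted"));
        Notification result = manager.process(new Notification("goal", State.RECEIVED, "adder goal"));
        System.out.println(result.payload()); // adder goal | matched | mediated | lowered and sent | response lifted
    }
}
```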

10.2 Web Service Modeling Ontology
The Web Service Modeling Ontology [WSMO], along with its related efforts in the WSML [WSML Working Group] and WSMX [WSMX Working Group] working groups, presents a complete framework for Semantic Web Services, combining Semantic Web and Web Service technologies [31]. The Web Service Modeling Language (WSML) is a formalization of the WSMO ontology, providing a language in which the properties of Semantic Web Services can be described. The objectives of WSMO are to:
- Apply WSMO technologies for Semantic Web Services.
- Specify Semantic Web Services with WSMO.
- Correctly assess technologies, products and developments within Semantic Web and Web Service technologies.

WSMO is an ontology for describing Semantic Web Services. It is based on the Web Service Modeling Framework (WSMF), which consists of four main elements for describing semantic Web services:
- Ontologies provide the terminology used by the other WSMO elements.
- Goals represent user desires, whose fulfillment could be sought by executing a Web service. Ontologies can be used for the domain terminology to describe the relevant aspects.
- Web services describe the computational entity providing access to services that provide some value in a domain. These descriptions comprise the capabilities, interface and internal working of the Web service.
- Mediators resolve interoperability problems between different WSMO elements. Mediators are the core concept for resolving incompatibilities at the data, process and protocol levels, i.e. mismatches between different terminologies (data level), in how Web services communicate (protocol level) and in combining Web services.

There are four different mediators, listed below:
- OOMediators import the target ontology into the source ontology, resolving all the representation mismatches between the source and the target.
- GGMediators connect goals that are in a refinement relation and resolve mismatches between them.
- WGMediators link Web services to goals and resolve mismatches.
- WWMediators connect several Web services for collaboration.

The aim of WSMO is to solve the integration problem by describing Web Services semantically and by removing ambiguity about the capabilities of a Web Service and the problems it solves.
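The four mediator kinds differ only in which kinds of WSMO elements they connect, which a small lookup table makes explicit. This is a hypothetical sketch: the enum names are illustrative, not WSMO4J types, and the lookup deliberately ignores direction, although an OOMediator, for example, does have a source and a target.

```java
import java.util.Optional;

// Hypothetical sketch: the four WSMO mediator kinds and the kinds of
// elements each one connects, with a lookup telling which mediator kind
// relates two elements. Direction is ignored in the lookup.
class WsmoMediators {
    enum Element { ONTOLOGY, GOAL, WEB_SERVICE }

    enum Mediator {
        OO(Element.ONTOLOGY, Element.ONTOLOGY),        // import target ontology into source
        GG(Element.GOAL, Element.GOAL),                // goals related by refinement
        WG(Element.WEB_SERVICE, Element.GOAL),         // link web services to goals
        WW(Element.WEB_SERVICE, Element.WEB_SERVICE);  // web services collaborating

        final Element source;
        final Element target;

        Mediator(Element source, Element target) {
            this.source = source;
            this.target = target;
        }
    }

    static Optional<Mediator> between(Element a, Element b) {
        for (Mediator m : Mediator.values()) {
            if ((m.source == a && m.target == b) || (m.source == b && m.target == a)) {
                return Optional.of(m);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(between(Element.GOAL, Element.WEB_SERVICE)); // Optional[WG]
        System.out.println(between(Element.ONTOLOGY, Element.GOAL));    // Optional.empty
    }
}
```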

10.3 Web Services Modeling Toolkit
The Web Services Modeling Toolkit (WSMT) is a framework for the rapid creation and deployment of homogeneous tools for Semantic Web Services [33]. A homogeneous toolkit improves the user's experience, as the tools have a common look and feel. Usability is also improved, as the user does not need to relearn how to use the application when switching between tools. The WSMT lets tool developers focus on the tool's functionality and provides the framework within which the tools can be deployed and executed. The WSMT is implemented in the Java programming language to benefit from its multi-platform support and from the existing Java libraries for Semantic Web technologies, for example WSMO4J. Using the WSMT framework does not require the user to learn any new technologies. This can be a problem with other frameworks such as Eclipse, which uses a different graphical library (IBM's SWT, the Standard Widget Toolkit) from the one most Java users are accustomed to (Sun's Java Swing library).


Fig 10.3.1: Architecture of the WSMT

The Fig 10.3.1 shows the architecture of the Web Services Modeling Toolkit, which consists of three tiers: the first tier contains the compact launcher, the second contains the core and the third contains the individual plug-ins. Each tool is implemented as a plug-in to the WSMT framework. Deploying a tool into the framework is just a matter of compiling the plug-in, which implements a number of interfaces, into a jar file and placing that jar file, along with any third-party jars it uses, into the lib folder of the WSMT installation. This means that new tools can be deployed without recompiling the application. Building the classpath dynamically is a major issue when developing applications for which it is not known in advance which jar files will be on the classpath. In a Unix environment, scripting can be used to build this classpath; however, this is only possible on operating systems that support scripting. The job of the launcher is therefore to build the dynamic classpath. It does this by locating all jar files in the lib folder of the WSMT installation and building a dynamic classloader, which is then used to launch the WSMT core. The WSMT core is responsible for supplying the 'glue' code to the plug-ins (tools), providing the main application frame, the menu bar and the configurations for multi-language localization. The core loads all the available tools by searching for plug-in description files in the lib folder of the WSMT installation. Each description specifies a unique identifier and the class that the core should instantiate in order to load the plug-in. This class must implement the Plugin interface, which allows access to the plug-in itself. The WSMT is wrapped in a full installation system, which allows the end user to choose the tools that are installed during the installation process. A fully private Java 1.5 run-time environment is also installed, so the user does not need to install any third-party software. A third-party tool provider can choose to supply their own tool for inclusion in the WSMT installation or to provide a separate installation for the tool.
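The launcher and core behaviour described above (collect every jar in the lib folder into a classloader, then instantiate a plug-in class by name and check that it implements the plug-in interface) can be sketched with the standard URLClassLoader and reflection APIs. This is a hypothetical sketch: the Plugin interface, its id() method and HelloPlugin are stand-ins, not the WSMT's actual types, and reading the plug-in description files is left out.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of the launcher and core: collect every jar in the lib
// folder into a URLClassLoader, then instantiate a plug-in class by name and
// check that it implements a Plugin interface. Plugin and HelloPlugin are
// stand-ins, not the WSMT's actual types.
class LauncherSketch {
    interface Plugin {
        String id();
    }

    static class HelloPlugin implements Plugin {
        public String id() {
            return "hello";
        }
    }

    // All *.jar files directly inside libDir, as URLs, in a stable order.
    static List<URL> jarUrls(Path libDir) throws IOException {
        List<Path> jars;
        try (Stream<Path> files = Files.list(libDir)) {
            jars = files.filter(f -> f.getFileName().toString().endsWith(".jar"))
                        .sorted()
                        .collect(Collectors.toList());
        }
        List<URL> urls = new ArrayList<>();
        for (Path jar : jars) {
            urls.add(jar.toUri().toURL());
        }
        return urls;
    }

    // The dynamic classpath: the launcher's own loader plus every jar found.
    static URLClassLoader dynamicLoader(Path libDir) throws IOException {
        return new URLClassLoader(jarUrls(libDir).toArray(new URL[0]),
                                  LauncherSketch.class.getClassLoader());
    }

    // Load the named class through the dynamic loader and instantiate it,
    // provided it implements Plugin.
    static Plugin loadPlugin(ClassLoader loader, String className)
            throws ReflectiveOperationException {
        Class<?> c = Class.forName(className, true, loader);
        if (!Plugin.class.isAssignableFrom(c)) {
            throw new IllegalArgumentException(className + " does not implement Plugin");
        }
        return (Plugin) c.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        Path lib = Files.createTempDirectory("wsmt-lib");
        try (URLClassLoader loader = dynamicLoader(lib)) {
            Plugin p = loadPlugin(loader, HelloPlugin.class.getName());
            System.out.println(p.id()); // hello
        }
    }
}
```

Because the launcher's own classloader is the parent, classes already on the launcher's classpath are still resolved first, and the jars in the lib folder only add to what can be loaded.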

10.4 Implementation Issues with WSMX
With an understanding of the WSMX environment, we developed a MathService and described it semantically using WSMO. The WSML editor is used to create the ontology, and the concepts are described as shown in the Fig 10.4.1. The service implements an adder method that adds two integers and returns an integer as output.

Fig 10.4.1: Service Ontology using WSMO


The WSMX environment provides an interface named WSMX invoker with which one can invoke and access the service.

Fig 10.4.2.a: WSMX invoker providing input-one

Fig 10.4.2.b: WSMX invoker providing input-two

Fig 10.4.3: WSMX invoker returning output

Fig 10.4.3 shows the WSMX invoker returning the output to the requester. The WSMX environment communicates with the service through messages, and the figure shows the response message received from the service. So far there is no agreement on a stable reasoner interface. Once the WSMO reasoner implementation is finalized, its interface is expected to be standardized through the WSMX infomodel. The WSMO website also states that a forthcoming version will provide reasoning support based on FLORA-2.
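
The request/response exchange between the WSMX invoker and the MathService adder can be sketched as follows. This is a plain Python illustration under stated assumptions, not WSMX code. The Message class, the operation names and the "input-one"/"input-two" keys are hypothetical, and real WSMX messages carry WSML content rather than Python dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical message passed between the invoker and the service."""
    operation: str
    payload: dict = field(default_factory=dict)


def math_service(request: Message) -> Message:
    """Stand-in for the MathService: 'adder' adds two integers and returns an integer."""
    if request.operation != "adder":
        raise ValueError(f"unsupported operation: {request.operation}")
    a = request.payload["input-one"]
    b = request.payload["input-two"]
    if not (isinstance(a, int) and isinstance(b, int)):
        raise TypeError("adder expects two integers")
    return Message("adderResponse", {"output": a + b})


def invoke(service, input_one: int, input_two: int) -> int:
    """Invoker side: send a request message and read the result from the response message."""
    request = Message("adder", {"input-one": input_one, "input-two": input_two})
    response = service(request)
    return response.payload["output"]


print(invoke(math_service, 3, 4))  # prints 7
```

The invoker never calls the adder directly; it sees only messages. This is the property that lets WSMX sit between requester and service.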

10.5 FLORA-2

FLORA-2 is a sophisticated object-oriented knowledge base language and application development platform. It is implemented as a set of run-time libraries and a compiler that translates a unified language of F-logic, HiLog and Transaction Logic into tabled Prolog code. Applications of FLORA-2 include intelligent agents, the Semantic Web, ontology management and information integration, among others. The programming language supported by FLORA-2 is a dialect of F-logic with numerous extensions. These include a natural way of meta-programming in the style of HiLog and logical updates in the style of Transaction Logic. FLORA-2 was designed with extensibility and flexibility in mind, and it provides strong support for modular software design through its unique feature of dynamic modules. FLORA-2 is distributed in two ways. First, it is part of the official distribution of XSB and is thus installed together with XSB. Second, a more up-to-date version of the system is available at http://flora.sourceforge.net.

A simple FLORA-2 program is shown below.

// Load this into module
john[name->'John'].
john[age->33].
john[salary->111111].

The file containing this program has the extension .flr. The query used to interact with the program, and its output, are shown below; the query retrieves John's name and binds it to the variable Y.

flora2 ?- john[name->Y].
Y = John
1 solution(s) in 0.0000 seconds on g03.grid
Yes
flora2 ?-

10.6 Issues and Difficulties

The main difficulty we faced while working with the WSMX environment is that its latest development releases do not support Linux. The latest stable release, WSMX 0.1, is based on the Eclipse framework and does not support the Linux operating system. Moreover, even in the latest version, the inference engine is not integrated with the WSMX environment. We contacted Mike Kerrigan, the pioneer of the WSMX environment. He assured us that their developer group will concentrate on a new stable release with full support for the Linux operating system. Given these limitations, we found that it is not possible to work with the current release of WSMX on Linux, and the work has been temporarily suspended.

11. Further Scope

The architectures discussed in the preceding sections each have their own knowledge layer, with its own advantages and limitations. The Semantic Grid Architecture using PEG, described in section 8, allows service providers to describe their services semantically using the Protégé editor integrated with the Globus Toolkit. In this case, the service provider needs to be familiar with the Protégé editor to create the service ontology. Since the semantic description of a grid service cannot be generated automatically, manual intervention in creating the grid service ontology is unavoidable. The Parameter Matchmaking Algorithm currently works with OWL files, but with the necessary modifications to its source code it can also be made to work with OWL-S descriptions. The matchmaking algorithm has so far been tested with only a small number of OWL files. In an actual grid environment, a large number of service providers publish their service ontologies. In that case, the request must be compared with every OWL file to obtain the closely matching services, which increases overhead and response time. This can be avoided using clustering, where OWL files are grouped by some common property such as service type. The request then needs to be directed only to the appropriate cluster, and only the OWL files in that cluster need to be compared. This is expected to greatly improve performance and lead to efficient searching of services.

The semantic grid architecture using the Gridbus broker, described in section 9, enables semantic description of grid resources using an ontology template. This ontology template approach was proposed after a thorough survey of related work. The definitions proposed in that architecture are the result of a thorough analysis of the ontology literature. Although the ontology template considers all possible computing resources in the grid, a new class of resource cannot be inserted automatically. This is because the semantic description module does not currently implement any mechanism to detect the entry of an unknown type of resource into the grid. Also, the ontology template does not currently impose any restrictions on concepts; this feature can be added to improve the efficiency of resource discovery.
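
The clustering idea proposed for the Parameter Matchmaking Algorithm can be sketched as below. The sketch assumes each published OWL file has already been summarised as a small record with a service type. The record fields, file names and function names are hypothetical. Grouping by an exact property value is the simplest possible clustering; it is not the ontology clustering technique the project intends to adopt.

```python
from collections import defaultdict


def cluster_by(descriptions: list[dict], key: str) -> dict[str, list[dict]]:
    """Group published service descriptions by a shared property such as service type."""
    clusters: dict[str, list[dict]] = defaultdict(list)
    for desc in descriptions:
        clusters[desc[key]].append(desc)
    return dict(clusters)


def candidates(request: dict, clusters: dict[str, list[dict]], key: str) -> list[dict]:
    """Return only the descriptions in the request's cluster; all other files are skipped."""
    return clusters.get(request[key], [])


published = [
    {"file": "adder.owl", "serviceType": "Arithmetic"},
    {"file": "multiplier.owl", "serviceType": "Arithmetic"},
    {"file": "weather.owl", "serviceType": "Forecast"},
    {"file": "storage.owl", "serviceType": "DataStorage"},
]
clusters = cluster_by(published, "serviceType")
to_compare = candidates({"serviceType": "Arithmetic"}, clusters, "serviceType")
print([d["file"] for d in to_compare])  # ['adder.owl', 'multiplier.owl']
```

Only two of the four files reach the matchmaker in this example. The saving grows with the number of published ontologies, because clustering is done once at publication time rather than per request.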

The semantic discovery module implemented in the architecture relies on the Algernon inference engine. Currently, the discovery module implements about ten types of queries; this number can be increased considerably. Algernon is a rule-based inference engine supporting both backward and forward chaining rules. Suitable rules are currently being tested and will be added in future to speed up resource discovery. We also still need to exploit the constraints and axioms imposed on ontology concepts when querying the Algernon inference engine; this feature will also be added and tested.

In the proposed discovery mechanism, we present a user interface that accepts queries in a predefined format. The user queries for a resource by specifying the constraints it must meet, for example a freeRAM of 400 MB. These constraints are treated as properties of the resource, with the associated values present in the ontology knowledge base. The Algernon inference engine then queries for a resource using these property values. Hence, the user has to express the query using the property names used in the ontology. This is not always practical, since many words can express the same meaning; for example, freeRAM may also be written as hasfreeRAM or RAM. Currently, we implement a hash table that maps such similar words to the same property. We could also take inspiration from Google's approach of training the system to learn which words have similar meanings. In this approach, a user interface would let the people using this discovery mechanism add new words, making semantic search more effective.

With these observations and inferences, we have designed a full-fledged knowledge layer that enables semantic description of resources and services using the ontology template and the GridWSDL2OWL-S tool. The layer also proposes a semantic discovery engine that performs resource discovery as well as service matchmaking based on QoS parameters. Ontology clustering proposed in this layer will improve the efficiency of matchmaking. Fig 11.1 shows the functional model of the knowledge layer.
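
A minimal sketch of the synonym hash table and the constraint-style resource query from the discovery discussion follows. It is plain Python standing in for the Algernon query, not Algernon itself. It assumes each constraint is a minimum value, so "a freeRAM of 400 MB" means at least 400 MB. The knowledge-base layout, host names and function names are hypothetical.

```python
# Hypothetical synonym table: user-facing word (lower case) -> property name used in the ontology.
SYNONYMS = {"freeram": "freeRAM", "hasfreeram": "freeRAM", "ram": "freeRAM"}


def add_synonym(word: str, ontology_property: str) -> None:
    """Let users register a new word for an existing ontology property."""
    SYNONYMS[word.lower()] = ontology_property


def normalise(term: str) -> str:
    """Map a user's term to the ontology property name; unknown terms pass through unchanged."""
    return SYNONYMS.get(term.lower(), term)


def discover(kb: dict[str, dict[str, float]], constraints: dict[str, float]) -> list[str]:
    """Return resources whose value for every constrained property is at least the requested one."""
    wanted = {normalise(term): minimum for term, minimum in constraints.items()}
    return sorted(
        name
        for name, props in kb.items()
        if all(prop in props and props[prop] >= minimum for prop, minimum in wanted.items())
    )


kb = {
    "node1.grid": {"freeRAM": 512, "cpuCount": 4},
    "node2.grid": {"freeRAM": 256, "cpuCount": 8},
}
print(discover(kb, {"RAM": 400}))  # ['node1.grid']
add_synonym("memory", "freeRAM")
print(discover(kb, {"memory": 200}))  # ['node1.grid', 'node2.grid']
```

The add_synonym call corresponds to the suggested interface through which users contribute new words. A resource that lacks a constrained property is simply excluded rather than treated as zero.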

Fig 11.1: Functional Model of the Knowledge Layer

The architecture uses the semantic grid architecture described in section 9 for the semantic description and discovery of resources. In addition, it identifies the components needed for the semantic description of services and for matchmaking service advertisements against requests. The resource provider registers the resource in the grid environment. The semantic description module queries the MDS registry and describes the resource semantically using the ontology template. The ontology template and the resource instances together constitute the semantic repository, also called the knowledge base (KB). The semantic discovery module queries the knowledge base and retrieves a suitable resource that meets the user's requirements.
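
The relationship between the ontology template, the resource instances and the knowledge base can be illustrated with a small sketch. The class hierarchy below is a hypothetical fragment, not the project's template. The record format is an assumed simplification of what the semantic description module would extract from MDS, which actually returns XML.

```python
# Hypothetical fragment of a resource ontology template: class -> parent class (None for the root).
TEMPLATE = {
    "GridResource": None,
    "ComputingResource": "GridResource",
    "Cluster": "ComputingResource",
    "Workstation": "ComputingResource",
    "StorageResource": "GridResource",
}


def is_a(cls: str, ancestor: str) -> bool:
    """True if cls equals ancestor or is one of its subclasses in the template."""
    current = cls
    while current is not None:
        if current == ancestor:
            return True
        current = TEMPLATE[current]
    return False


def describe(kb: dict, record: dict) -> None:
    """Add a registry record to the knowledge base as an instance of a template class."""
    cls = record["type"]
    if cls not in TEMPLATE:
        # New kinds of resource are not added to the template automatically.
        raise KeyError(f"{cls} is not a class in the ontology template")
    kb[record["name"]] = {"class": cls, **record["properties"]}


def instances_of(kb: dict, cls: str) -> list[str]:
    """Names of all instances of cls, including instances of its subclasses."""
    return sorted(name for name, inst in kb.items() if is_a(inst["class"], cls))


kb: dict = {}
describe(kb, {"name": "hpc1", "type": "Cluster", "properties": {"cpuCount": 64}})
describe(kb, {"name": "ws1", "type": "Workstation", "properties": {"freeRAM": 512}})
describe(kb, {"name": "disk1", "type": "StorageResource", "properties": {"capacityGB": 500}})
print(instances_of(kb, "ComputingResource"))  # ['hpc1', 'ws1']
```

Rejecting an unknown type mirrors the limitation noted under Further Scope: a new class of resource has to be added to the template by hand before its instances can enter the knowledge base.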

The service provider registers the grid service in the MDS registry. Meanwhile, the GridWSDL2OWL-S tool converts its WSDL file into OWL-S descriptions. We are also surveying the literature on WSDL-S to see whether it can be used as an alternative to OWL-S. However, since OWL-S is currently widely used and accepted as a standard for describing services, we propose OWL-S as the mainstream component of the architecture. The clustering module clusters the service descriptions by exploiting the common properties of the OWL-S descriptions and stores them in the UDDI registry. Tools have recently appeared for storing OWL-S descriptions in a UDDI registry and retrieving them from it [35]. The matchmaking module receives request descriptions from the user containing inputs, outputs, functionality and other optional parameters, including QoS parameters. The module identifies the appropriate cluster and compares the request only with the OWL-S descriptions in that cluster. It retrieves the inputs, outputs, functionality and QoS parameters of each OWL-S description and determines how closely they match the requested ones. The module uses an appropriate domain ontology to determine the degree of match. Degrees are ranked in decreasing order as exact, plugin, subsume and disjoint, ranging from an identical match to one that differs in all respects. With this architecture, it will be possible both to describe services and resources semantically and to retrieve them semantically. Further, we have decided to include a workflow engine in this knowledge layer; a literature survey is yet to be started to implement the workflow engine for the proposed architecture.
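
The ranking by degree of match can be sketched as follows, for outputs only. The sketch follows a convention common in OWL-S matchmaking. A match is plugin when the advertised concept is more general than the requested one, and subsume when the advertised concept is more specific. Several elements are assumptions for illustration: the domain ontology, the service names, and the rule that a service scores by its weakest output. Inputs, functionality and QoS parameters are not modelled, and this is not presented as the project's matchmaking module.

```python
from enum import IntEnum


class Degree(IntEnum):
    """Degrees of match, ordered so that a larger value means a closer match."""
    DISJOINT = 0
    SUBSUME = 1
    PLUGIN = 2
    EXACT = 3


# Hypothetical domain ontology: concept -> parent concept (None for the root).
DOMAIN = {"Thing": None, "Number": "Thing", "Integer": "Number", "Float": "Number", "Text": "Thing"}


def ancestors(concept: str) -> set[str]:
    """All strict superclasses of concept in the domain ontology."""
    result = set()
    parent = DOMAIN[concept]
    while parent is not None:
        result.add(parent)
        parent = DOMAIN[parent]
    return result


def degree(requested: str, advertised: str) -> Degree:
    """Degree of match between a requested output concept and an advertised one."""
    if requested == advertised:
        return Degree.EXACT
    if advertised in ancestors(requested):  # advertisement is more general than the request
        return Degree.PLUGIN
    if requested in ancestors(advertised):  # request is more general than the advertisement
        return Degree.SUBSUME
    return Degree.DISJOINT


def rank(requested_outputs: list[str], services: dict[str, list[str]]) -> list[tuple[str, Degree]]:
    """Score each service by its weakest requested output, each taking its best advertised match."""
    scored = []
    for name, advertised in services.items():
        # Assumes every service advertises at least one output.
        score = min(max(degree(r, a) for a in advertised) for r in requested_outputs)
        scored.append((name, score))
    # Best degree first; ties broken alphabetically by service name.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


services = {"adder": ["Integer"], "calculator": ["Number"], "echo": ["Text"], "rounder": ["Integer"]}
print([(name, d.name) for name, d in rank(["Integer"], services)])
# [('adder', 'EXACT'), ('rounder', 'EXACT'), ('calculator', 'PLUGIN'), ('echo', 'DISJOINT')]
```

Using an IntEnum makes the four degrees directly comparable, so the weakest-link rule reduces to ordinary min and max.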

12. Conclusion

The semantic grid architecture using PEG enables service providers to describe their grid services semantically. The architecture using the Gridbus broker provides semantic descriptions of grid resources using a grid resource ontology template. We also carried out a wide literature survey of ontology clustering, which can improve the performance of ontology matchmaking. Based on these observations, we propose a versatile knowledge layer that can be implemented in the grid architecture. It provides semantic description of grid resources, conversion of WSDL descriptions of WSRF services into OWL-S descriptions, discovery of suitable grid resources, ontology clustering and a QoS-based matchmaking algorithm. Implementing these features in the architecture will result in a versatile front end for semantic grid services.

References

1. Foster, I. and Kesselman, C. (eds), "The Grid: Blueprint for a New Computing Infrastructure", Morgan Kaufmann, 1999, pp. 259-278.
2. Foster, I., Kesselman, C. and Tuecke, S., "The Anatomy of the Grid: Enabling Virtual Organizations", International Journal of High Performance Computing Applications, 15(3), pp. 200-222, 2001.
3. Foster, I., Kesselman, C., Nick, J. M. and Tuecke, S., "The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration", draft document, version 6/22/2002.
4. Bray, T., Paoli, J. and Sperberg-McQueen, C. M., "The Extensible Markup Language (XML) 1.0", 1998.
5. Fallside, D. C., "XML Schema Part 0: Primer", W3C Recommendation, 2001, http://www.w3.org/TR/xmlschema-0/
6. "Simple Object Access Protocol (SOAP) 1.1", W3C Note, 8 May 2000.
7. Christensen, E., Curbera, F., Meredith, G. and Weerawarana, S., "Web Services Description Language (WSDL) 1.1", W3C Note, 15 March 2001, www.w3.org/TR/wsdl.
8. Brittenham, P., "An Overview of the Web Services Inspection Language", 2001, www.ibm.com/developerworks/webservices/library/ws-wsilover.
9. "UDDI: Universal Description, Discovery and Integration", www.uddi.org.
10. Daconta, Obrst and Smith, "The Semantic Web: A Guide to the Future of XML, Web Services, and Knowledge Management", Wiley Publishing, Inc., 2003.
11. Antoniou, G. and van Harmelen, F., "A Semantic Web Primer", The MIT Press, 2004.
12. "RDF Primer", W3C Recommendation, 10 February 2004.
13. "OWL Web Ontology Language Overview", W3C Recommendation, 10 February 2004.
14. Paolucci, M., Sycara, K., Nishimura, T. and Srinivasan, N., "Toward a Semantic Web e-Commerce", to appear in Proceedings of BIS2003.

15. Dean, M. (ed.), "OWL-S: Semantic Markup for Web Services", Version 1.1 Beta, 2004.
16. Sycara, K., Paolucci, M., Ankolekar, A. and Srinivasan, N., "Automated Discovery, Interaction and Composition of Semantic Web Services", Journal of Web Semantics, Vol. 1, No. 1, September 2003, pp. 27-46.
17. De Roure, D., Jennings, N. R. and Shadbolt, N. R., "The Semantic Grid: A Future e-Science Infrastructure", in Grid Computing: Making the Global Infrastructure a Reality, John Wiley & Sons, Ltd, 2003.
18. Stumme, G., Ehrig, M., Handschuh, S., Hotho, A., Maedche, A., Motik, B., Oberle, D., Schmitz, C., Staab, S., Stojanovic, L., Stojanovic, N., Studer, R., Sure, Y., Volz, R. and Zacharias, V., "The Karlsruhe View on Ontologies", Technical Report, University of Karlsruhe, Institute AIFB, 2003.
19. Jaeger, M. C., Rojec-Goldmann, G., Liebetruth, C. and Geihs, K., "Ranked Matching for Service Descriptions using OWL-S", 2004.
20. www.globus.org/toolkit/mds
21. Foster, I., "A Globus Primer: Describing Globus Toolkit Version 4", August 2005.
22. www-128.ibm.com/developerworks/grid/library/gr-mdsgt4
23. Zhang, Y. and Song, W., "Semantic Description and Matching of Grid Services Capabilities".
24. Li, L. and Horrocks, I., "A Software Framework for Matchmaking Based on Semantic Web Technology", WWW2003, May 20-24, 2003.
25. Zhou, C., Chia, L.-T. and Lee, B.-S., "Service Discovery and Measurement based on DAML-QoS Ontology", WWW2005, May 10-14, 2005.
26. JSR 168: Portlet Specification (final release), 2003.
27. http://www.gridbus.org/broker
28. Berners-Lee, T., Hendler, J. and Lassila, O., "The Semantic Web", Scientific American, May 2001.
29. Davies, J., Studer, R., Sure, Y. and Warren, P. W., "Next Generation Knowledge Management", BT Technology Journal, Vol. 23, No. 3, July 2005.

30. Lenat, D. B. and Guha, R. V., "Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project", Addison-Wesley, 1990.
31. Roman, D., Lausen, H. and Keller, U. (eds.), "Web Service Modeling Ontology (WSMO)", http://www.wsmo.org/TR/d2/
32. Zaremba, M. and Moran, M., "WSMX Architecture", http://www.wsmo.org/TR/d13/d13.4/v0.2/
33. Kerrigan, M., "Web Services Modeling Toolkit (WSMT)", http://www.wsmo.org/TR/d9/d9.1/v0.1/
34. Xing, W., Dikaiakos, M. D. and Sakellariou, R., "Design and Development of Core Grid Ontology", GGF16 Semantic Grid Workshop, February 2006.
35. Pahlevi, S. M. and Kojima, I., "S-MDS: A Semantic Information Service for Advanced Resource Discovery and Monitoring in WS-Resource Framework", GGF16 Semantic Grid Workshop, February 2006.
