A Pattern-Based Approach to Ontology-Supported Semantic Data Integration
Total Page:16
File Type:pdf, Size:1020Kb
A pattern-based approach to ontology-supported semantic data integration Andreas A. Jordan Wolfgang Mayer Georg Grossmann Markus Stumptner Advanced Computing Research Centre University of South Australia, Mawson Lakes Boulevard, Mawson Lakes, South Australia 5095, Email: [email protected] Email: (wolfgang.mayer|georg.grossmann)@unisa.edu.au, [email protected] Abstract RSM2, ISO 159263 and Gellish4 standards are all used to represent information about engineering assets and Interoperability between information systems is associated maintenance information. However, there rapidly becoming an increasingly difficult issue, in exists considerable variation in the scope and level of particular where data is to be exchanged with exter- detail present in each standard, and where the infor- nal entities. Although standard naming conventions mation content covered by different standards over- and forms of representation have been established, laps, their vocabulary and representation used vary significant discrepancies remain between different im- considerably. Furthermore, standards like ISO 15926 plementations of the same standards and between dif- are large yet offer little guidance on how to represent ferent standards within the same domain. Further- particular asset information in its generic data struc- more, current approaches for bridging such hetero- tures. Heterogeneities arising from continued evolu- geneities are incomplete or do not scale well to real- tion of standards have further exacerbated the prob- world problems. lem. As a consequence, tool vendors formally support We present a pragmatic approach to interoperabil- the standard, yet their implementations do not inter- ity, where ontologies are leveraged to annotate proto- operate due to different use of the same standard or typical information fragments in order to facilitate varying assumptions about the required and optional automated transformation between fragments in dif- information held in each system. Not all standards ferent standards and representations. We discuss our are supported by formal ontologies that would allow approach based on an example drawn from the oil and one to overcome these heterogeneities automatically. gas industry. Where formal models are present, their focus is typ- ically on describing individual entities and relation- 1 Introduction ships, whereas dealing with issues arising from their use in larger representation structures, and scalabil- A growing number of organisations are interested in ity problems stemming from application of inference sharing information between their internal informa- mechanisms, are often left aside. tion systems and externally with partners or the pub- The aim of our work is to build a bridge that lic sector. A core issue facing these organisations ties together the heterogeneous standards and pro- is the information management systems they employ vides automated translation between their informa- have been built without supporting a common stan- tion models and concrete implementations. Our hy- dard for the underlying data models used for storing pothesis is that this goal can be achieved only if the information and their interfaces. Consequently, creat- information representation fragments used by differ- ing custom-built software that translates between dif- ent tools and standards are captured such that pos- ferent interfaces and data models is a predominantly sible translations between equivalent fragments can manual task, and current solutions for bridging het- be inferred automatically. To resolve the interoper- erogeneities between information systems do not scale ability problem not only with respect to terminologi- well to real-world problems. cal heterogeneities but also with respect to heteroge- Although standardisation of domain models, in- neous usage patterns by different organisations, exist- terfaces, and information representation can help to ing standards must be complemented with a formal reduce this burden, standardisation alone has been in- model of the elementary \information fragments" and sufficient to overcome the interoperability challenge. possible use of these fragments within each standard This is predominantly due to the fact that the stan- and organisation. Whereas approaches for translating dards themselves are heterogeneous and sometimes a semi-formal model like Gellish into natural language suffer from ambiguity. In the Engineering domain exist, we aim to extend these ideas to formal com- alone, there exist several competing standards and plex structures. This requires a formal model of the de-facto standards set by large tool vendors. For ex- building blocks, henceforth \information fragments", 1 in the source and target representation and mech- ample, in the oil and gas industry, the MIMOSA , anisms to translate between \similar" fragments in Copyright c 2011, Australian Computer Society, Inc. This pa- different standards. per appeared at the Seventh Australasian Ontology Workshop For this purpose, we intend to construct an on- AOW 2011, Perth, Western Australia. Conferences in Research tology that reconciles the concepts and relationships and Practice in Information Technology (CRPIT), Vol. 48, in each standard into a common model and formally Vladimir Estivill-Castro and Gillian Dobbie, Ed. Reproduc- express the information fragments found in each stan- tion for academic, not-for-profit purposes permitted provided this text is included. 3http://15926.org/home/tiki-index.php 1 http://www.mimosa.org/ 4http://www.gellish.net/ dard in terms of the common model. Our focus will be on representing information fragments and their con- Upper Ontology crete representation in different standards such that A A equivalent representations in different standards can Information MIMOSA Fragment ISO be constructed automatically. Data Model B Ontology B Data Model We seek to investigate formal models and infer- 3 MIMOSA MIMOSA Fragment ISO ISO ence procedures to determine which are adequate RDL Templates 2 instances 4 Templates RDL for capturing the intent and composition of concrete 1 5 data structures on each end of the translation pro- MIMOSA Data ISO Data cess while reusing as much as possible existing on- tologies capturing domain-specific entities and their relations. Whereas (domain-specific) ontologies pro- Figure 1: Schematic of the overall translation process vided by several standards exhaustively capture do- on the example of translation between the MIMOSA main concepts and relations, we intend to develop and ISO 15926 standards. mechanisms for reasoning about translations between different representations. The resulting ontology will help to build adequate unifying ontology. Each fragment can be seen as a translation systems by providing developers of in- parametric statement that, when instantiated with tegration tools a catalogue of possible translations suitable values and objects for its parameters, ex- between different systems, either at run-time or at presses a certain meaning in the concrete representa- design time, when translations for information frag- tion corresponding ontology. By combining fragment ments in the source representation must be formed. instances representing template instances in the tar- Our ontology will also be able to verify completeness get representation, suitable translation for a given set of a given translation by identifying fragments and of template instances in the source representation can constraints in the source model that cannot be trans- be devised. lated precisely into the target model, and fragments that have ambiguous or multiple conflicting transla- Ontology of Information Fragments. Central tions. to our approach is the unifying ontology of informa- Our research will be predominantly driven by a tion fragments and translation operators depicted in case study within the oil and gas industry, where in- the centre of figure 1. It extends a suitable upper on- formation models held in MIMOSA, ISO 15926, and tology in order to foster reuse and to ensure its over- RSM are to be synchronised among a number of sys- all organisation adheres to well-established knowledge tems. representation principles. Each fragment in our on- In this paper we make the following contributions: tology is linked with its corresponding template(s) in • we outline our overall approach to automated one or more source or target ontologies. Note that not fragment-based information integration, every fragment will have corresponding templates in each data source. Therefore, it is necessary to find • present our approach to linking existing tax- combinations of fragments that are available in the onomies and ontologies to our model, target ontology that correspond to fragments found in the information source. Furthermore, it may be • show how to capture information fragments and necessary to query information from the target in or- associated translation operations in an ontology, der to obtain information (such as object identifiers of and existing objects) while instantiating target fragments. We intend to construct an ontology that captures • show how the resulting ontology can be applied the properties and constraints of fragments in a for- to infer automated translation between source mal manner such that equivalent instances of frag- and target representations. ments can be identified and synthesised. In order to The paper is organised as follows. We present our reason about transformations, it is necessary to cap- overall approach in Section 2 and introduce