978-3-642-23300-5 6 Chapter.Pd
Total Page:16
File Type:pdf, Size:1020Kb
The Problem of Conceptual Incompatibility Simon Mcginnes To cite this version: Simon Mcginnes. The Problem of Conceptual Incompatibility. 1st Availability, Reliability and Security (CD-ARES), Aug 2011, Vienna, Austria. pp.69-81, 10.1007/978-3-642-23300-5_6. hal-01590394 HAL Id: hal-01590394 https://hal.inria.fr/hal-01590394 Submitted on 19 Sep 2017 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Distributed under a Creative Commons Attribution| 4.0 International License The Problem of Conceptual Incompatibility Exploring the Potential of Conceptual Data Independence to Ease Data Integration Simon McGinnes Trinity College Dublin, Dublin 2, Ireland [email protected] Abstract. Application interoperability and data exchange are desirable goals, but conventional system design practices make these goals difficult to achieve, since they create heterogeneous, incompatible conceptual structures. This conceptual incompatibility increases system development, maintenance and integration workloads unacceptably. Conceptual data independence (CDI) is proposed as a way of overcoming these problems. Under CDI, data is stored and exchanged in a form which is invariant with respect to conceptual structures; data corresponding to multiple schemas can co-exist within the same application without loss of integrity. The use of CDI to create domain- independent applications could reduce development and maintenance workloads and has potential implications for data exchange. Datasets can be merged without effort if stored in a conceptually-independent manner, provided that each implements common concepts. A suitable set of shared basic-level archetypal categories is outlined which can be implemented in domain- independent applications, avoiding the need for agreement about, and implementation of, complex ontologies. Keywords. Data integration, domain-independent design, conceptual data independence, archetypal categories. 1 Introduction: The Problem of Conceptual Incompatibility The present massive proliferation of databases, web pages and other information resources presents a data integration problem. There is a need to use data in a joined- up way, but mechanisms are lacking that allow easy data integration in the general case; it is often hard to combine data resources. A prime reason for this is that different datasets typically have incompatible conceptual structures. Common practice in information systems (IS) design leads each organisation or software vendor to create its own idiosyncratic data structures that are incompatible with those created by others; commercial pressures can have the same effect. Standard conceptual structures are normally used only in limited circumstances, when imposed by enterprise software platforms, legislative requirements or other external constraints. In the rush 2 Simon McGinnes to create ever-more comprehensive and powerful IS, the increasing problem of heterogeneous, incompatible conceptual structures has been left for future technology to solve. 1.1 Why Do Current Methods of Integration Not Solve the Problem? Developers have historically faced two issues with regard to integration of systems that have distinct conceptual data structures: physical incompatibility and conceptual incompatibility. Thankfully, many technologies now exist to resolve the first issue, by physically interconnecting heterogeneous platforms; these include RPC, CORBA and web services. Programs can also be linked simply by exchanging files using a common format such as XML. However, progress on physical compatibility has exposed the deeper second issue of conceptual or semantic compatibility: the problem of reconciling implicit conceptual models. If we wish to use several data resources in an integrated way, they must share both a common vocabulary and a common conceptual framework. This fundamental and unavoidable principle of semiotics [1] may be understood by analogy to human communication: if two people wish to exchange information effectively they must speak the same language, but they must also possess shared concepts. Conceptual compatibility thus runs deeper than mere language; for people to communicate they must interpret words identically, or nearly so, and there is no guarantee that this will be the case. Meaning is essentially personal and subjective, affected by context, culture, and so on. Getting two programs to exchange data involves a similar problem. A common, recognisable vocabulary must be used by both sides, and the two programs must also have been programmed with common concepts, so that they can act on the data appropriately. Computers cannot yet understand data in the sense that a human does, but they can be programmed to deal sensibly with specific items of data provided that the data is of a known type; this is what we mean when we say that a program ―understands‖ a particular concept. In practice, however, most IS share neither vocabulary nor concepts. It would be surprising if they did, given they ways they are developed and the rarity with which standard conceptual structures are applied. For this reason, linking real-world IS that have heterogeneous conceptual schemas is rarely a simple matter. In trivial cases it can seem straightforward to map the conceptual models of distinct systems to one another. For example, two programs which manage data about customers might well use similar data structures and common terms such as name, address and phone number. But semantic complexity lurks even in apparently straightforward situations. Is a customer a person or an organisation? Are business prospects regarded as customers, or must we already be trading with someone for them to be considered a customer? What about ex-customers? Many such questions can be asked, highlighting the inconvenient truth that most concepts are more complex than they seem, when one scratches the surface, and certainly far more complicated and esoteric than the trivial example quoted above. Uniting separately- developed conceptual structures can be challenging even for expert developers The Problem of Conceptual Incompatibility 3 working with systems in closely-related domains [2]; it can be difficult to discern what data structures are intended to signify and what unstated assumptions have been made. Another approach to data integration involves the use of automated schema matching, and tools for this purpose have been developed with some success [3]. But there is an inherent limit to the ability of automated matching strategies to operate reliably in the general case. Software cannot easily call upon context, domain expertise and general knowledge to understand and disambiguate the meaning of conceptual structures [4]. Again, the analogy of human understanding is relevant. When conversing with others, we draw upon our prior knowledge to understand what is meant. A person without prior knowledge has no hope of understanding what somebody else says. This analogy suggests that automated schema matching strategies must first overcome the grand challenge of accumulating and applying general knowledge before they can be expected to extract the semantics in arbitrary schemas with sufficient reliability [5]. In summary, semantic issues make it difficult, as a rule, to match conceptual models between IS—especially since most IS have idiosyncratic designs and complex conceptual structures that are based on unstated assumptions [6]. Conceptual incompatibility therefore presents a major barrier when we attempt to link IS. And this ignores the scalability problem, that integrating systems typically requires a good deal of interface code which must be crafted, onerously, by hand. Conceptual incompatibility is also a problem for end users [7]. It means that we must adopt a different mental model of reality each time we use a different program. For example, consider how the concept person is treated in different software products from the same vendor. In Microsoft Word, people are represented merely as ―users‖. In Microsoft Project, people are considered from a management perspective as ―resources‖. In Microsoft Outlook people are considered as ―contacts‖. Although these different treatments refer to the same underlying entity (a person), they are in fact three quite distinct mental concepts, each with its own meaning and implications. The same applies to most software applications: each application takes a unique perspective on reality to suit its own purpose. The user is left to mentally reconcile the various perspectives. This is at best confusing, since the concepts may be overlapping or orthogonal, and applications rarely spell out precisely what they mean by any given term. It can also be a problem for developers, who often lack understanding of the domain concepts in applications [8]. 2 Standardisation of Conceptual Structures The reliance on post-hoc system integration implicitly facilitates the trend towards growth in conceptual incompatibility. By allowing heterogeneous applications to proliferate, we are effectively supporting the development