(Pdf) of Metadata Standards for Semantic Interoperability In
Total Page:16
File Type:pdf, Size:1020Kb
Metadata Standards for Semantic Interoperability in Electronic Government Jim Davies, Steve Harris, Charles Crichton, Aadya Shukla, and Jeremy Gibbons Software Engineering Programme, University of Oxford Wolfson Building, Parks Road, Oxford OX1 3QD, UK [email protected] ABSTRACT a greater degree of ownership or control over the develop- Effective data sharing, across government agencies and other ment. Standards are the means by which electronic govern- organisations, relies upon agreed meanings and representa- ment can achieve interoperability across departments and tions. A key, technological challenge in electronic gover- agencies, improve their management of supplier contracts, nance is to ensure that the meaning of data items is accu- and ensure that key data remains accessible over time. rately recorded, and accessible in an economical—effectively, Standardisation activity in software was originally focussed automatic—fashion. In response, a variety of data and meta- upon language and protocol design: upon the intended in- data standards have been put forward: from government terpretation of programming statements, and upon the con- departments, from industry groups, and from organisations crete representation of data and commands. Since then, such as the ISO and W3C. there has been a pronounced shift in focus towards metadata This paper shows how the leading standard for metadata standards: descriptions of intended functionality and mean- registration—ISO 11179—can be deployed without the need ing that can be associated with particular items of data, in for a single, monolithic conceptualisation of the domain, and order to ensure a consistent treatment and interpretation. hence without the need for universal agreement upon a par- Initial work in this area was motivated by the concerns of ticular model of electronic governance. The advantages of document management: the widely-used Dublin Core stan- this approach are discussed with regard to the UK eGovern- dard, for example, is a collection of metadata items address- ment Interoperability Framework (eGIF) and the UK Inte- ing issues such as the authorship, format, intended audience, grated Public Sector Vocabulary (IPSV). and availability of information resources. Subsequent work has been focussed upon the data within documents, most notably, within the messages sent between different systems 1. INTRODUCTION in business [5] and healthcare [8]. Standards can have an important role to play in software de- The importance of metadata standards at the level of indi- velopment: by adhering to a published standard, we might vidual data elements has already become apparent in these hope that our applications or data would prove immediately and other domains: for example, compatible with those produced by others: for example, if − The NASA Mars Climate Orbiter [11] was lost after we use tags only as allowed by an earlier HTML standard, messages between two different systems were we might be confident that our web pages would render suc- misinterpreted: the first was sending values in US cessfully in a wide range of browsers. Customary units (lbf-s), the second assumed that the In industry, there may be a competitive advantage in not values arriving were measured in SI units (Ns); the following a standard: the benefits of compliance can be eval- result was an initial orbit 170km lower than uated in purely commercial terms, and it may well be that planned—23km below survivable height. they do not justify the costs. Suppliers will often add new features, adopt different interpretations, or omit certain as- − US Government compliance data on water quality pects of a standard to reduce costs, to improve performance, near industrial waste sites had to be re-evaluated or to make it more difficult for customers to use products when it was discovered that ‘rate of flow’ had been from other suppliers. taken to mean the rate of flow of waste by some In electronic government, the situation is reversed: it is teams, while others had taken it to mean the rate of often more important to work to agreed standards than to flow of the waterway [7]. reduce costs, improve performance, or to seek to establish Although in either case, misunderstandings could have been discovered earlier—and remedied—through improved com- munications, review and management activities, this kind of expensive mistake is hard to avoid when data is processed Permission to make digital or hard copies of all or part of this work for and integrated automatically, and its semantic consistency— personal or classroom use is granted without fee provided that copies are the compatibility of units, and of intended interpretation—is not made or distributed for profit or commercial advantage and that copies checked only manually. bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Copyright 200X ACM X-XXXXX-XX-X/XX/XX ...$5.00. The need to record and promote—and where possible, The approach to data semantics taken in the standard is automate—the re-use of standard metadata elements across characterised by the notion of data element, characterised enterprises and initiatives has led to the establishment of by a combination of: metadata registries, for example: − a data element concept, corresponding to a property − the Intelligent Transport Systems metadata registry, of an object class; a repository of data definitions and models for − a value domain, corresponding to a set of values that transport initiatives, currently being used by the UK may be assigned to this concept, described in terms Highways Agency to improve the quality of standards of a conceptual domain. under development, such as the European DATEX II specification for travel information [15]; The notions of property and object class are only loosely defined: it is left to the implementer to decide upon their − the National Information Exchange Model (NIEM) precise interpretation. maintained by the US Departments of Justice and Homeland Security is designed to “develop, disseminate and support enterprise-wide information object class property exchange standards and processes. to effectively share critical information in emergency situations, as 0..1 0..1 well as support the day-to-day operations of agencies * * throughout the nation” [6]; * 1 data element concept conceptual domain − the METadata Online Registry (METeOR) maintained by the Australian Institute of Health and 1 1 Welfare, part of the Australian Government, is a repository of national data standards on health, * * 1 * housing and community services [12]. data element value domain Each of the registries mentioned above is designed accord- ing to the model provided by ISO 11179 [9], an international standard for metadata registration. It is this standard, and Figure 1: Basic entities in ISO 11179 its interpretation within the domain of electronic govern- ment, that is the subject of this paper. The six different entities mentioned above are related ac- The paper begins with an introduction to ISO 11179, to- cording to the class diagram shown in Figure 1, based upon gether with an explanation of its relationship to existing an entity-relationship diagram given in Part 3 of the stan- electronic government interoperability frameworks. In Sec- dard. Note that a data element is uniquely characterised tion 3, we identify a number of problems that may arise in by the combination of a data element concept and a value the implementation of the standard and related initiatives— domain. particularly in regard to the implied adoption of a single, Many data elements may share the same data element conceptual domain. concept. This allows for different kinds of measurement of In Section 4, we explain how these problems can be solved the same property: for example, the data element concept with a specific interpretation of the ISO 11179 standard, to- of a person’s weight with two different value domains to gether with an extension of the metadata registry approach produce two different data elements: a person’s weight in to include the registration of multiple models of definition, kilograms and a person’s weight in pounds. Similarly, many classification, and usage. The result is an approach to data data element concepts may share the same conceptual do- and metadata standards that supports multiple perspectives main, indicating that they share the same dimensionality: on information and its meaning, an essential prerequisite for for example, that they are all measurements of mass. interoperability across different departments, agencies, and It would seem logical for the association between data el- cultures. The paper ends with a brief discussion of the value ement concepts and conceptual domains to correspond to of the interpretation and extension, particularly with respect the composition of three other associations: that is, a data to initiatives such as the UK electronic Government Inter- element concept is associated with a conceptual domain pre- operability Framework. cisely when this is the conceptual domain for one of the value domains of an associated data element. However, the stan- dard does not explicitly require that this is the case. 2. ISO11179 All of the above entities are maintained as administered The ISO/IEC 11179 standard Metadata registries (MDR) metadata items within a registry. Each item is associated addresses “the semantics of data, the representation of data, with a collection of administrative metadata, whose function and the registration of the descriptions of that data”. Its is to support the processes of registration, maintenance, and purpose is to promote: standard descriptions, common un- re-use. For example, each item has an administration record, derstanding, harmonisation, and re-use of data in different recording status, change information, and dates of creation, contexts; and the management and re-use of the “compo- change, and effect. Guidelines for the registration and main- nents of data” [9]. It exists in six parts: the first of these tenance of administered items are set out in Part 6 of the describes the overall approach; the others address the spe- standard.