Using Dublin Core to Build a Common Data Architecture

Proc. Int. Conf. on Dublin Core and Metadata for e-Communities 2002: 139-146 © Firenze University Press
DC-2002, October 13-17, Florence, Italy

Sandra Fricker Hostetter
Rohm and Haas Company, Knowledge Center
[email protected]

Abstract

The corporate world is drowning in disparate data. Data elements, field names, column names, row names, labels, metatags, etc. seem to reproduce at whim. Librarians have been battling data disparity for over a century with tools like controlled vocabularies and classification schemes. Data Administrators have been waging their own war using data dictionaries and naming conventions. Both camps have had limited success. A common data architecture bridges the gap between the worlds of tabular (structured) and non-tabular (unstructured) data to provide a total solution and clear understanding of all data. Using the Dublin Core Metadata Element Set Version 1.1 and its Information Resource concept as building blocks, the Rohm and Haas Company Knowledge Center has created a common data architecture for use in the implementation of an electronic document management system (EDMS). This platform-independent framework, when fully implemented, will provide the ability to create specific subsets of enterprise data on demand, enable interoperability with other internal or external systems, and reduce cycle time when migrating to the next generation tool.

Keywords: common data architecture, CDA, document management, platform independent framework, data resource management, metadata, Dublin Core, controlled vocabularies

1. A new hybrid

Organizing information has become a core competency for corporations. Moving from a paper-based world to an electronic-based one is a difficult and lengthy transformation. Paper forced us to behave in certain ways because of physical limitations associated with its tangibility. However, paper also had inherent strengths in its universality, and this is something we have taken for granted.

Blending the features of paper and electronic formats is an enormous challenge. We must create something new. The plant world provides us with a helpful analogy. A hybrid plant is the combination of two separate entities into something completely new and unique, yet it shares the attributes of both parent plants. This does not happen by accident. Two different species of plants will not merge to create a new one without purposeful human intervention, management, and care. And therein lie both the problem and the opportunity.

In the past, tabular and non-tabular data have been managed and accessed in very different ways. However, the ever-demanding user population wants to see all the available data integrated together and presented in a manner individually tailored to their specific needs. It has become impossible to manage non-tabular data and tabular data separately. This demands that we address seemingly mutually exclusive issues in a way that satisfies all parties. The creation of a common data architecture is the most effective way to bridge the gap between all types of data.

2. Metadata management in a document managed world

Controlling the metadata used to describe items deposited in a document management system is critical to effective search and retrieval, in partnership with the dueling aspects of a full-text environment: instant gratification and lack of discrimination. At the Rohm and Haas Company, Dublin Core was a good starting point and became the basis for the document class and document properties structure “dictated” by the EDMS. From the beginning, our goal was to create a platform-independent framework that would meet the following needs: (1) enable the creation of specific subsets of enterprise data on demand; (2) provide future interoperability with other internal and external systems; (3) reduce cycle time when migrating from “today’s tool” to the next generation of document management software without excessive rework.

The Dublin Core data elements as implemented in the EDMS at the Rohm and Haas Company function as the common metadata. All document classes have these properties, although populating them is not mandatory. Eventually, three of these Dublin Core based properties (DC.Title, DC.Date.issued, DC.Publisher) will be required, and DC.Publisher will have a Rohm and Haas specific controlled scheme to reflect the company’s business unit structure.

3. The common data architecture approach

A common data architecture (CDA) “is a formal, comprehensive, data architecture that provides a common context within which ALL DATA are understood and integrated”. A CDA has the following basic components: data subjects, data characteristics, and data characteristic variations. A data subject is “a person, place, thing, concept, or event that is of interest to the organization and about which data are captured and maintained”. A data characteristic is “an individual characteristic that describes a data subject”. A data characteristic variation “represents a difference in the format, content, or meaning of a specific data characteristic” (Brackett, 1994, p. 31, p. 39).

At first glance, a standard like the Dublin Core Metadata Element Set Version 1.1 looks like it might be a common data architecture. Under closer scrutiny, however, its deficiencies become obvious. Dublin Core violates a core principle of data management by mixing different facts within a single field. DC.Creator, for example, can represent a person or an organization. The ideal data management equation is 1 Fact = 1 Field. In Dublin Core’s well-intended effort to be simple yet fully extensible, it is also very non-specific. This leads us down the tempting path to the never-ending crosswalk. Crosswalking happens only at the physical level, requires an excessive amount of work, and yields minimal understanding. Instead, if we move beyond the traditional physical level analysis and cross-reference to a common data architecture created at the logical level, we gain a true common context for understanding all data.

4. How to build a common data architecture

Building a common data architecture involves five major steps. It is an iterative process that may take several months to become an accurate reflection of the organizational situation and will require occasional readjustments over time. Since a common data architecture represents a living, breathing organization that grows and changes, it too must be refreshed as needed.

4.1 Defining the “pivotal” data subject

The first step is to identify, formally name, and define the pivotal data subject. The pivotal data subject is the most central business concept. All related concepts will be organized around this data subject. The pivotal data subject for the EDMS was the software-defined object “Document Class”. We adopted the Dublin Core terminology for “Information Resource” and broadened the definition as follows:

Information Resource
An Information Resource is a set of data in context, recorded in any medium of expression (text, audio, video, graphic, digital) that is meaningful, relevant, and understandable to one or more people at a point in time or for a period of time. Traditionally, an Information Resource is recorded on some medium, such as a document, a web page, a diagram, and so on. In the broad sense, however, an Information Resource could be a person or a team of people.

An Information Resource in this data architecture represents a version of an Information Resource when there is more than one version produced. The Information Resource. System Identifier changes for each version. The Information Resource Document. Number that is assigned as an Information Property Item through Information Resource Property remains the same across versions and identifies the Information Resource, and the Information Resource. Version Identifier uniquely identifies the version of that Information Resource.

Note that the system identifier as defined in this data architecture is the system identifier of the home system where data about information resources are stored. Any other foreign identifiers from other systems where data about information resources are stored are assigned as an Information Property Item through Information Resource Property.

Note that there are non-EDMS versions of an Information Resource, such as web page versions, that may not have a date, version identifier, URL change, and so on. There is no way to know or distinguish versions of this type.

4.2 Defining the data characteristics

The second step is to identify, formally name, and define the data characteristics of the pivotal data subject. Examples include:

Information Resource. Title
The official title of the Information Resource, such as “The Importance of Adding Property Data to a Panagon Document.” This is the name by which the Information Resource is formally known.

Information Resource. System Identifier
The system-assigned identifier in the home system that uniquely identifies an Information Resource. This is not the same as the system identifier that identifies an Information Resource in an EDMS or any other foreign system documenting Information Resources. The Information Resource. System Identifier changes for each version of an Information Resource. The Information Resource. Version Identifier identifies the version of the Information Resource.

Information Resource. Version Identifier
The version number of the Information Resource.

... from a set of reference items commonly held by an Information Resource. Each Information Property Item belongs to an Information Property Group. Information Resource Property assigns the Information Property Items to Information Resources.

Information Property Item Alias
An Information Property Item can have different names in different systems or standards. There is no uniform name that transcends all systems and standards. Information Property Item Alias documents all of the alias names for a foreign Information Property Items in various systems
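
The relationship among Information Resource, Information Property Item, Information Resource Property, and Information Property Item Alias can be illustrated with a small Python sketch. It is not the Rohm and Haas implementation; it only shows one way a logical-level name could be cross-referenced to the names used by individual systems. The class layout, the AliasRegistry helper, the group name "Publication", the EDMS alias "BusinessUnit", the identifier "SYS-0001", and the value "Knowledge Center" are all assumptions made for illustration. Only DC.Publisher and the example title come from the text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InformationPropertyItem:
    """A reference item that an Information Resource can hold."""
    name: str   # logical name in the common data architecture
    group: str  # the Information Property Group the item belongs to


@dataclass(frozen=True)
class InformationPropertyItemAlias:
    """The name that one system or standard uses for a property item."""
    system: str      # e.g. "Dublin Core" or "EDMS"
    alias_name: str  # the name used inside that system
    item: InformationPropertyItem


@dataclass
class InformationResource:
    """One version of an Information Resource in the home system."""
    system_identifier: str   # changes for each version
    version_identifier: str
    title: str
    # Information Resource Property: property item -> assigned value
    properties: dict = field(default_factory=dict)


class AliasRegistry:
    """Hypothetical helper that cross-references aliases to logical items."""

    def __init__(self):
        self._by_key = {}

    def register(self, alias):
        self._by_key[(alias.system, alias.alias_name)] = alias.item

    def resolve(self, system, alias_name):
        """Return the logical item behind a system-specific name."""
        return self._by_key[(system, alias_name)]

    def aliases_for(self, item):
        """List every (system, alias_name) pair recorded for an item."""
        return [key for key, it in self._by_key.items() if it == item]


# Illustrative data: the group, EDMS alias, identifier, and value are assumed.
publisher = InformationPropertyItem(name="Publisher", group="Publication")

registry = AliasRegistry()
registry.register(
    InformationPropertyItemAlias("Dublin Core", "DC.Publisher", publisher))
registry.register(
    InformationPropertyItemAlias("EDMS", "BusinessUnit", publisher))

doc_v1 = InformationResource(
    system_identifier="SYS-0001",
    version_identifier="1",
    title="The Importance of Adding Property Data to a Panagon Document",
)

# A value written under the EDMS name is readable under the Dublin Core name,
# because both aliases resolve to the same logical property item.
doc_v1.properties[registry.resolve("EDMS", "BusinessUnit")] = "Knowledge Center"
print(doc_v1.properties[registry.resolve("Dublin Core", "DC.Publisher")])
```

Because each system name is mapped once to a logical item, adding another system under this sketch means registering its aliases rather than building a new pairwise crosswalk against every existing system.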
