Open Government Vocabularies Registration Model
Open Government Vocabularies – Registration Model Open Government Vocabularies Working Group
Introduction: This document contains a general description of registration for Open Government Vocabularies, comprising a set of registration attributes and a registry model. It is one of a related set of documents on Open Government Vocabularies, titled as follows:
- Data Architecture Subcommittee – Open Government Vocabularies – Overview
- Open Government Vocabularies – Content Model
- Open Government Vocabularies – Registration Model
- Open Government Vocabularies – Registration Procedure
One of the original drivers for the creation of the Open Government Vocabularies Working Group (OGV-WG) was the realization within the US Government Data.Gov community that the need exists for a catalog of government vocabularies. This document is one of a series designed to realize that vision.
The OGV-WG unanimously agrees that a catalog and a registry are synonymous ideas within the context of this working group’s mission. Thus, we use the words interchangeably in this document.
This document is divided into two sections: Registration attributes and Registry model.
The content of this document is based on these international standards and other specifications:
- ISO 704:2000 – Terminology – Principles and methods
- ISO 1087-1:2000 – Terminology – Part 1: General vocabulary
- ISO/IEC 11179-3:2003 – Metadata registries – Part 3: Metamodel and basic attributes
- ISO/IEC 11179-6:2005 – Metadata registries – Part 6: Registration
- ISO/IEC 11404:2007 – General purpose datatypes
- ISO/IEC 19773:2011 – Metadata Modules
- Open Government Vocabularies – Content Model
- Open Government Vocabularies – Registration Procedure

ISO/IEC 11404, ISO/IEC 11179 (all parts), and ISO/IEC 19773 are freely available on the web at http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html.

The modeling diagrams throughout this document use a Unified Modeling Language (UML) notational format, with underlying assumptions limited to those appearing elsewhere in this document. We offer the following semantic interpretation of what these models represent:
- Boxes (which look like UML classes) represent concepts; instances are objects in the extension of that concept (defined below).
- The lines representing relationships are instances of the relations defined in ISO 1087-1:
  - Hierarchical (super-ordinate / sub-ordinate) relations
    - Partitive (whole / part): illustrated with a line headed by a triangle; example: car (whole) – body, engine, wheels (parts)
    - Generic (type / sub-type): illustrated with a line headed by a diamond; example: car (type) – Ford, Fiat, Ferrari (sub-type)
  - Associations: illustrated with a line
  - Association classes: illustrated by a dotted line from the relationship to the class of attributes
Assumptions: This document is written for an international audience and does not specify any US-only designs or formats. This document is not self-contained: several other documents are required for understanding it, and they are listed where their reference is necessary.
Registration attributes: This section contains a set of attributes for describing: a vocabulary as a set; a controlled list of possible vocabulary types; and a list of high level vocabulary subject headings. The attributes specified describe the vocabulary as a container but do not describe the contents. The description of the contents is detailed in the document titled “Open Government Vocabularies – Content Model”. These attributes, which are similar to those specified in the Dublin Core, should be used for cataloging vocabularies. These attributes are intended to be available to support searches and the high level discovery of vocabularies.
The attributes and their definitions are as follows:
Table 1: Vocabulary Catalog Metadata Attributes [1]
This is a list of attributes that must be filled out for submitting a vocabulary to a registry. Fields: URI*, Element, Occurrence [min,max], Element Description, Example. No URI* value is given for any element below.

[1] The sets of attributes, vocabulary types, and vocabulary subject headings are based on input from Russ Cole of the USDA.

Title [1,1]
  Description: Official name of the vocabulary.
  Example: Standard Occupational Classification

Abbreviated Title [1,1]
  Description: Short title or abbreviation by which this vocabulary is popularly known.
  Example: SOC

Version [1,1]
  Description: For vocabularies that are updated periodically, the number for this version.
  Example: 2010

Description [1,1]
  Description: Detailed explanation of the nature and purpose of the vocabulary.
  Example: See http://www.bls.gov/opub/mlr/2010/08/art3full.pdf

Submitter Contact Information [1,1]
  Description: Submitter is the entity responsible for presenting a vocabulary description for registration. Relevant attributes for specifying contact information are found in ISO/IEC 19773:2011 Clause 19 Entity-Person-Group Contact Data.
  Example: John Doe; Any Federal Agency; Department of Redundancy Department; 5555 Main St, NW; Washington, DC 20000; Tel 202-555-1212; FAX 202-555-0000; Email – john.doe@drd.afa.gov

Steward Contact Information [1,1]
  Description: Steward is the entity responsible for the subject matter content of a Vocabulary. Relevant attributes for specifying contact information are found in ISO/IEC 19773:2011 Clause 19 Entity-Person-Group Contact Data.
  Example: Sally Public; Any Federal Agency; Department of Redundancy Department; 5555 Main St, NW; Washington, DC 20000; Tel 202-555-2121; FAX 202-555-1111; Email – [email protected]

Vocabulary Type [1,1]
  Description: From Vocabulary Type list – Table 2 in this document.
  Example: Multi-Level Vocabulary: Taxonomy

Subject Area [1,1]
  Description: Pertinent subject area – from Subject Area Headings list – Table 3 in this document.
  Example: Labor Force, Employment, and Earnings

Keywords [1,n]
  Description: Comma separated list of keywords; both technical and non-technical are appropriate. Intended purpose: Aid search and discovery.
  Example: Occupation, employment, statistics, major groups, minor groups, broad occupations, detailed occupations

Release Date [1,1]
  Description: Date when the vocabulary was first made available to the public.
  Example: Jan 2010

Modify Date [1,1]
  Description: Date of last change to vocabulary. Note that this, by default, is the same as Release Date if no changes have been made.
  Example: Jan 2010

Catalog Date [1,1]
  Description: Date when vocabulary was entered into catalog.
  Example: Oct 2011

Vocabulary Uniform Resource Identifier (URI) [1,1]
  Description: URI to the vocabulary.
  Example: http://www.bls.gov/soc/

Reference for Technical Documentation [0,n]
  Description: URI or bibliographic citation for the technical documentation for this vocabulary.
  Example: http://www.bls.gov/soc/#materials

Bibliographic citation for vocabulary [0,1]
  Description: APA Publication Manual, 6th Ed compliant citation.
  Example: Bureau of Labor Statistics (2010). Standard Occupational Classification. Retrieved month day, year from http://www.bls.gov/soc/

Content Metadata [0,1]
  Description: URI to content metadata for the vocabulary. See Open Government Vocabularies – Content Model for details.
  Example: N/A

Vocabulary usage requires a license agreement [1,1]
  Description: Does this vocabulary require a license agreement for use?
  Example: No, this vocabulary is freely and publicly available.

Vocabulary license agreement URI, Conditional [0,1]
  Description: URI to the license agreement, if applicable.
  Example: N/A

Vocabulary Cross-walk [0,n]
  Description: URI to cross-walk between this vocabulary and related ones.
  Example: http://www.bls.gov/soc/ - Scroll to 2010 SOC, Downloadable Materials

* We do not specify what those URIs might look like.
A specific identifier was left off the list of attributes, as assigning an identifier is the responsibility of the registration authority.
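The occurrence bounds in Table 1 lend themselves to a mechanical check before a submission is accepted. The following is a minimal sketch in Python, not part of the registration model itself: the element names are shortened labels chosen here for illustration, and the record layout (a mapping from element name to a list of values) is an assumption. Only the [min,max] bounds are taken from Table 1.

```python
# Occurrence bounds [min, max] copied from Table 1; None stands for "n" (unbounded).
# The shortened element names are hypothetical labels, not identifiers from the document.
OCCURRENCE = {
    "Title": (1, 1),
    "Abbreviated Title": (1, 1),
    "Version": (1, 1),
    "Description": (1, 1),
    "Submitter Contact Information": (1, 1),
    "Steward Contact Information": (1, 1),
    "Vocabulary Type": (1, 1),
    "Subject Area": (1, 1),
    "Keywords": (1, None),
    "Release Date": (1, 1),
    "Modify Date": (1, 1),
    "Catalog Date": (1, 1),
    "Vocabulary URI": (1, 1),
    "Reference for Technical Documentation": (0, None),
    "Bibliographic Citation": (0, 1),
    "Content Metadata": (0, 1),
    "License Required": (1, 1),
    "License Agreement URI": (0, 1),
    "Vocabulary Cross-walk": (0, None),
}


def check_occurrences(record):
    """Return a list of problems; an empty list means every bound is satisfied.

    `record` maps an element name to a list of values (assumed layout).
    """
    problems = []
    for name, (low, high) in OCCURRENCE.items():
        count = len(record.get(name, []))
        if count < low:
            problems.append(f"{name}: expected at least {low}, found {count}")
        elif high is not None and count > high:
            problems.append(f"{name}: expected at most {high}, found {count}")
    for name in record:
        if name not in OCCURRENCE:
            problems.append(f"{name}: not a Table 1 element")
    return problems
```

A registrar-side tool could call check_occurrences on each submission and return the resulting list to the submitter. The check covers only how many values are present; it does not validate the values themselves against Table 2 or Table 3.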
Table 2: Vocabulary Types
This is a vocabulary of types of vocabularies. It is itself a code list (see the Flat Vocabulary: Code List entry below). The codes are URIs*.

Flat Vocabulary: Authority File
  Description: Terms
Flat Vocabulary: Glossary
  Description: Terms with Definitions
Flat Vocabulary: Dictionary
  Description: Terms, definitions, additional information
Flat Vocabulary: Gazetteer
  Description: Location, coordinate
Flat Vocabulary: Code List
  Description: Code, definition
Multi-Level Vocabulary: Taxonomy
  Description: Terms classified into subject-specific categories
Multi-Level Vocabulary: Subject Heading
  Description: Terms classified into broad categories
Relational Vocabulary: Thesaurus
  Description: Terms with relationships
Relational Vocabulary: Semantic Network
  Description: Terms with relationships (includes additional types of relationships)
Relational Vocabulary: Ontology
  Description: Terms in categories, with relationships and rules/axioms

* We do not specify what those URIs might look like.

Table 3: Vocabulary Subject Headings
This list exemplifies a vocabulary of subjects for vocabularies. It may be modified. It is a code list where the codes are URIs*.

Subject Headings:
- Population
- Births, Deaths, Marriages, and Divorces
- Health and Nutrition
- Education
- Law Enforcement, Courts, and Prisons
- Geography and Environment
- Elections
- State and Local Government Finances and Employment
- Federal Government Finances and Employment
- National Security and Veterans Affairs
- Social Insurance and Human Services
- Labor Force, Employment, and Earnings
- Income, Expenditures, Poverty, and Wealth
- Prices
- Business Enterprise
- Science and Technology
- Agriculture
- Natural Resources
- Energy and Utilities
- Construction and Housing
- Manufactures
- Wholesale and Retail Trade
- Transportation
- Information and Communications
- Banking, Finance, and Insurance
- Arts, Recreation, and Travel
- Accommodation, Food Services, and Other Services
- Foreign Commerce and Aid
- Puerto Rico and the Island Areas
- International Statistics
- Other
* We do not specify what those URIs might look like.

Registry Model: Here is a model of how a vocabulary registry may be organized.
[Figure: UML class diagram of the classes Object, Registered Item, Contact, Registration, Registry, Registration Authority, and Vocabulary Description.
Registered Item attributes: Title : Multi-text; Title-short : Multi-text; Version : Multi-text; Description : Multi-text; Type : Type-list; Subject : Subject-list; Keywords : Multi-text; Catalog-date : Date; Modify-date : Date; Release-date : Date; Citation : String; License : Yes-No.
Registration attributes: Administrative-status : Administrative-codes; Identification-status : Identification-codes; Metadata-status : Metadata-codes; Content-Quality-status : Content-Quality-codes; Vocabulary-Quality-status : Vocabulary-Quality-codes; Registration-status : Registration-codes.
Registry attributes: Name : String. Registration Authority attributes: Name : String.
Relationship labels: technical documentation, vocabulary, license, cross-walk, steward, content metadata, registration, submitter, registrar, management.]

Figure 1: Registration Model
The following is a description of each of the classes, attributes, relationships, and datatypes in the model. Attributes are listed under the class they characterize. Relationships are listed as if they were attributes, and each appears under the classes at both of its ends. Designations for relationships are written in lower case. Commonly known datatypes such as Boolean, date, integer, and string are not defined. The specialized datatypes are defined at the end, since a datatype is often used to specify several attributes.

Class Contact
Definition: Details for contacting some entity via postal or electronic (voice, text, or otherwise) means
Attributes: All attributes specified in ISO/IEC 19773:2011 – Metadata Modules, Clause 19 (Module 19: Data structure for entity-person-group (EPG) contact data)
Relationships:
- Registrar
  Definition: The entity responsible for managing the Registries within a Registration Authority
  Relationship: A Contact may be registrar for one or more Registration Authorities
- Steward
  Definition: The entity responsible for the subject matter for a Vocabulary
  Relationship: A Contact may steward one or more Registered Items
- Submitter
  Definition: The entity responsible for presenting a Vocabulary for registration to a Registration Authority
  Relationship: A Contact may submit one or more Registrations
Class Object
Definition: Anything perceivable or conceivable (from ISO 1087-1)
Attributes: (none)
Relationships:
- Cross-Walk
  Definition: A table showing the nature of the similarity of the meaning of pairs of concepts between Vocabularies
  Relationship: An Object may be a cross-walk for one or more Registered Items
- License
  Definition: The rules for fair or lawful use of a Registered Item
  Relationship: An Object may be the license for one or more Registered Items
- Technical Documentation
  Definition: The technical descriptions necessary for using or understanding a Vocabulary
  Relationship: An Object may be the technical documentation for one or more Registered Items
- Vocabulary
  Definition: The information object (e.g., a file) containing a Vocabulary
  Relationship: An Object may be the vocabulary for one or more Registered Items
Class Registered Item
Definition: A class of objects, each of which is cataloged in a Registry
Attributes:
- Catalog Date
  Definition: Date the Registered Item is submitted for Registration
- Citation
  Definition: Bibliographic citation for the Registered Item (APA edition 6)
- Description
  Definition: Short description of Registered Item
- Keywords
  Definition: Terms signifying content of Registered Item
- License
  Definition: Flag indicating if there is a license for the use of the Registered Item
- Modify Date
  Definition: Date the Registered Item was modified
- Release Date
  Definition: Date the Registered Item was released for use
- Subject
  Definition: Main subject field associated with this Registered Item
- Title
  Definition: Title for the Registered Item
- Title-short
  Definition: Short title or abbreviation for the Registered Item
- Type
  Definition: Kind of structure for this Registered Item
- Version
  Definition: Version for this Registered Item
Relationships:
- Content Metadata
  Definition: Detailed formal description of the Registered Item
  Relationship: A Registered Item must have content metadata for one Vocabulary Description
- Cross-Walk
  Definition: A table showing the nature of the similarity of the meaning of pairs of concepts between Vocabularies
  Relationship: A Registered Item may have a cross-walk for one or more Objects
- License
  Definition: The rules for fair or lawful use of a Vocabulary
  Relationship: A Registered Item may have a license described in one Object
- Registration
  Definition: The association of a Registered Item with a registry
  Relationship: A Registered Item must be registered in one or more Registries
- Steward
  Definition: The entity responsible for the subject matter for a Registered Item
  Relationship: A Registered Item must be stewarded by one or more Contacts
- Technical Documentation
  Definition: The technical descriptions necessary for using or understanding a Vocabulary
  Relationship: A Registered Item may have its technical documentation in one Object
- Vocabulary
  Definition: The information object (e.g., a file) containing a Vocabulary
  Relationship: A Registered Item may actualize a vocabulary in one or more Objects
Class Registration
Definition: Association of a Registered Item with a Registry
Attributes:
- Administrative status
  Definition: Non-Null indicator as to whether the Vocabulary is under review or open for use and reference
- Identification status
  Definition: Code to designate which aspects of a Vocabulary have identifiers assigned, with allowable values and meanings specified
- Metadata status
  Definition: Non-Null indicator as to which kinds of metadata (registration or content) have been created, with default specified, with allowable values and meanings specified
- Content Quality status
  Definition: Code to designate the assessment of the Content metadata received with respect to quality guidelines, with allowable values and meanings specified
- Vocabulary Quality status
  Definition: Code to designate the assessment of the Vocabulary metadata received with respect to quality guidelines, with allowable values and meanings specified
- Registration status
  Definition: Code to designate the level at which comparability has been assessed, with allowable values and meanings specified
Relationships:
- Submitter
  Definition: The entity responsible for presenting a Vocabulary for registration to a Registration Authority
  Relationship: A Registration must be submitted by one Contact
Class Registration Authority
Definition: Entity responsible for managing registrars
Attributes:
- Name
  Definition: Label – see Open Government Vocabularies – Content Model document
Relationships:
- Management
  Definition: Registries are managed by a Registration Authority
  Relationship: A Registration Authority may manage one or more Registries
- Registrar
  Definition: The entity responsible for managing the Registries within a Registration Authority
  Relationship: A Registration Authority must have one registrar Contact
Class Registry
Definition: Collection of data stores
Attributes:
- Name
  Definition: Label for Registry
Relationships:
- Management
  Definition: Registries are managed by a Registration Authority
  Relationship: A Registration Authority may manage one or more Registries
- Registration
  Definition: The process and record of managing a description for some object
  Relationship: A Registry may register one or more Registered Items
Class Vocabulary Description
Definition: Detailed description of a vocabulary
Attributes: (none)
Relationships:
- Content Metadata
  Definition: Detailed formal description of a vocabulary
  Relationship: A Vocabulary Description may be content metadata for one Registered Item

Datatypes*
- Multi-text: Set of textual values that all have the same meaning, but may have different presentations and different datatypes that are dependent upon the context of use – see ISO/IEC 19773 – Metadata modules, sub-clause 12.4.3
- Administrative-codes: State datatype with value space consisting of Administrative-statuses defined in Open Government Vocabularies – Registration Procedure
- Content-Quality-codes: State datatype with value space consisting of Content-Quality-statuses defined in Open Government Vocabularies – Registration Procedure
- Identification-codes: State datatype with value space consisting of Identification-statuses defined in Open Government Vocabularies – Registration Procedure
- Metadata-codes: State datatype with value space consisting of Metadata-statuses defined in Open Government Vocabularies – Registration Procedure
- Registration-codes: State datatype with value space consisting of Registration-statuses defined in Open Government Vocabularies – Registration Procedure
- Subject-list: State datatype with value space consisting of values in Table 3 in this document
- Type-list: State datatype with value space consisting of values in Table 2 in this document
- Vocabulary-Quality-codes: State datatype with value space consisting of Vocabulary-Quality-statuses defined in Open Government Vocabularies – Registration Procedure
- Yes-No: State datatype with value space consisting of values ‘Yes’ and ‘No’

*NOTE: Each datatype in the State family of datatypes consists of a finite number of distinguished and unordered values – see ISO/IEC 11404:2007 – General purpose datatypes, sub-clause 8.1.2
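To make the class descriptions above more concrete, here is a minimal sketch of part of the model as Python dataclasses. It is an illustration under stated assumptions, not a prescribed implementation: the field names, the single `name` attribute standing in for the ISO/IEC 19773 contact data, the plain string used for Administrative status, and the `register` helper are all hypothetical. Only the "must" constraints stated in the relationships (at least one steward per Registered Item, one submitter per Registration, one registrar per Registration Authority) are taken from the text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class YesNo(Enum):
    """Yes-No State datatype: a finite set of distinguished, unordered values."""
    YES = "Yes"
    NO = "No"


@dataclass
class Contact:
    # Placeholder for the ISO/IEC 19773 Clause 19 contact attributes (assumption).
    name: str


@dataclass
class RegisteredItem:
    title: str
    stewards: List[Contact]                       # must be one or more Contacts
    license: YesNo = YesNo.NO
    vocabulary_uris: List[str] = field(default_factory=list)   # links to Objects
    cross_walk_uris: List[str] = field(default_factory=list)
    content_metadata_uri: Optional[str] = None

    def __post_init__(self):
        if not self.stewards:
            raise ValueError("a Registered Item must be stewarded by one or more Contacts")


@dataclass
class Registration:
    item: RegisteredItem
    submitter: Contact             # a Registration is submitted by one Contact
    administrative_status: str     # Non-Null; real codes come from the Registration Procedure

    def __post_init__(self):
        if not self.administrative_status:
            raise ValueError("Administrative status must be non-null")


@dataclass
class Registry:
    name: str
    registrations: List[Registration] = field(default_factory=list)

    def register(self, item, submitter, administrative_status):
        """Hypothetical helper: record a Registration of `item` in this Registry."""
        registration = Registration(item, submitter, administrative_status)
        self.registrations.append(registration)
        return registration


@dataclass
class RegistrationAuthority:
    name: str
    registrar: Contact             # must have one registrar Contact
    registries: List[Registry] = field(default_factory=list)
```

Modeling Registration as its own class, rather than as a list of items on the Registry, mirrors the text: the status attributes belong to the association between a Registered Item and a Registry, so the same item can carry different statuses in different registries.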
The attributes described in Table 1 are listed in the Registered Item class in Figure 1 above. Some attributes from Table 1 do not appear as attributes in Figure 1; these are captured in the relationships. The cross-walk, vocabulary, license, technical documentation, content metadata, and registration relationships from Figure 1 are included in Table 1 as attributes. Administrative-status, Identification-status, Metadata-status, Content-Quality-status, Vocabulary-Quality-status, and Registration-status are attributes created by the registry. These attributes are used to record the status of a vocabulary with respect to the registration process.

The registrar, submitter, and steward relationships are contact information for three roles:
- the registrar, the entity responsible for running the registration authority (the activities of a registry);
- the submitter, the entity responsible for presenting a Vocabulary to the registration authority for registration; and
- the steward, the entity responsible for the subject matter in a Vocabulary.

The remaining relationships are links:
- The cross-walk relationship is a link to a cross-walk between this Vocabulary and a related one. There may be many cross-walks.
- The content-metadata relationship is a link to the content metadata, described in the document Open Government Vocabularies – Content Model.
- The license relationship is a link to a license agreement for use of the Vocabulary, if necessary.
- The vocabulary relationship is a link to an information artifact containing the Vocabulary.
- The technical documentation relationship is a link to any technical document describing the Vocabulary or its use.
- The registration relationship is a link describing facts about the registration of that Vocabulary.
Registered Item data are collected for any kind of item the registry handles. In this case, the only item is a Vocabulary; however, this structure allows an expansion of the role of the registry to handle additional kinds of items.
The roles of submitter, steward, and registrar are described in Open Government Vocabularies – Registration Procedure. Principles for the formation of URIs are contained in Open Government Vocabularies – URI Principles.
An actual Vocabulary will not be stored in the registry, only a URI that points to it. This allows the agency responsible for the Vocabulary to maintain complete control of its contents.
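Because the registry holds only a pointer, one plausible safeguard is to reject any value in the vocabulary field that is not an absolute URI. The sketch below uses Python's standard urllib.parse module; the function names and the acceptance rule (a scheme followed by a host or a path) are assumptions made for illustration, since this document does not specify what the URIs look like.

```python
from urllib.parse import urlparse


def is_absolute_uri(value):
    """True when `value` has a scheme followed by a host or a path."""
    parts = urlparse(value)
    return bool(parts.scheme) and bool(parts.netloc or parts.path)


def vocabulary_pointer(value):
    """Return the URI string to store in the registry, or raise ValueError.

    Only the pointer is kept; the Vocabulary itself stays with its steward.
    """
    if not is_absolute_uri(value):
        raise ValueError(f"not an absolute URI: {value!r}")
    return value
```

With this rule, the Table 1 example value http://www.bls.gov/soc/ is accepted, while a placeholder such as N/A is rejected. The check is purely syntactic and does not confirm that the URI resolves.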