Meta Data Management


Project: Metadata Management Title: Metadata Definitions Working Group: Emerging Technologies

Version: 1.0 Date: May 21, 2014

PhUse Emerging Technology Working Group Metadata Definitions


1 INTRODUCTION: purpose of this document

This document provides agreed definitions within the PhUse CSS working group around metadata management and related aspects across the industry. It is expected that these definitions will be re-used in FDA guidelines as cross-industry definitions. To be of operational value, the document contains not only definitions but also a short description and an example of usage for each term. Whenever possible, the definitions are built from existing definitions in FDA guidances, the CDISC glossary, and cross-industry sources (e.g. Gartner). A reference to the source definition is provided either directly with the definition or in the reference section. This document is not intended to be exhaustive. It is intended to clarify the most commonly used (and misused!) definitions in our industry around metadata and master data management.

2 SCOPE

The following topic areas are in scope of this document:
• Metadata management
• Controlled terminology, code system, value set
• Master data management
• Interoperability, semantic interoperability
• Data pooling, data integration, data aggregation

Definitions are provided per topic area to ease reading and structure of this document.


3 DEFINITIONS

3.1 Metadata management

There are 2 types of metadata: structural and descriptive.

• Structural metadata describes the instance data that are collected and derived during clinical research across different processes and systems. It includes data domains, data elements (semantics, data type, related value sets), data mappings and transformations, and data derivations. As such, it facilitates clinical software re-use and thus improves business process efficiency. Structural metadata is defined, maintained, and governed at the level of an organisation or enterprise (e.g. pharma company, CRO, CDISC, etc.) across all projects, with subsets per therapeutic area or drug compound. At the study level, the study data standards are composed of structural metadata extracted from the therapeutic area/drug level structural metadata.
• Descriptive metadata describes process or domain-specific information about instance data collected and derived during clinical research. It provides conceptual, contextual, and processing information for instance data, and as such descriptive metadata is a key enabler in deriving business value from instance data. It can also provide greater depth and more insight about the "container" of the data, whether it is a file, document, or representation. Process descriptive metadata typically includes information on the “how”, “where”, “who”, and “when” for the instance data; a specific example of this is an audit trail. Semantic descriptive metadata includes additional information on the instance data, for instance the patient population of a study or the indication of a drug.

Structural metadata is typically stored and managed in a Metadata repository. Descriptive metadata is typically stored either with the instance data in the source applications or in a specific master data management repository.

3.1.1 Metadata

Synonym: (none)

Recommended definition: Descriptive data about an object, further differentiated into structural and descriptive metadata.

Definition & source:
• Wikipedia. The term Metadata refers to "data about data". The term is ambiguous, as it is used for two fundamentally different concepts (types).
  o Structural metadata is about the design and specification of data structures and is more properly called "data about the containers of data".
  o Descriptive metadata, on the other hand, is about individual instances of application data, the data content. In this case, a useful description would be "data about data content" or "content about content".
• ISO 11179. “Descriptive data about an object [ISO/IEC 20944-1]”. Thus, Metadata is a kind of data.
• Adrienne Tannenbaum, Metadata Solutions:
  o "Metadata: the detailed description of the instance data; the format and characteristics of populated instance data; instances and values depending on the role of the Metadata recipient." and "Instance data: That which is input into a receiving tool, application, database, or simple processing engine".
  o Meta metadata: “The descriptive details of metadata; metadata qualities and locations that allow tool-based processing and access; the basic attributes of metadata solutions.”

Description:
Metadata describe instance data.
• Instance data are data stored in a computer as the result of data entry by a person or data processing by an application.
• Metadata can itself become instance data, described in turn by level 2 metadata (or meta metadata).
  o Each CDISC standard, or defined instance of a standard, could be considered an object. That object has properties that describe the operations that can be performed on it and by whom. For example, Global SDTM objects (standard template definitions for SDTM standard domains for each version of the standard) can be copied and a few properties adjusted, i.e. instantiated at a compound level or study level to force the inclusion of PERM variables and to define some of them, or some EXP variables, as Mandatory. The available "Copy" operation, the available "properties that can be changed", and the associated "values permitted to change (from x to y)" are metadata elements to be used by the corresponding MDR processing tool to instantiate that object.
  o The relationships among standards can be considered meta-metadata, so that "conversion" or "visualization" tools can relate data elements as they move from one instance of data to another (mapping).

There are 2 types of metadata (see below for more detailed descriptions and examples):
• Structural metadata
• Descriptive metadata

Example: See Structural metadata and Descriptive metadata in sections 3.1.2 and 3.1.3.

3.1.2 Structural metadata

Synonym: Standard metadata

Recommended definition: In pharmaceutical research, Structural metadata describes the instance data that are collected and derived during clinical research across different processes and systems. As such, it facilitates clinical software re-use and thus improves business process efficiency. Structural metadata is defined, maintained, and governed at the level of an organisation (e.g. pharma company, CRO, CDISC, etc.) across all projects; at the study level, it is the study instance metadata, extracted from the study level structural metadata, whichever is applicable.

Definition & source:
• http://en.wikipedia.org/wiki/Metadata The design and specification of data structures (e.g. format, semantics...) cannot be “data about data”, because at design time the application contains no data. In this case the correct description would be "data/information about the containers of data".
• [FDA] Structural metadata is structured information that describes, explains, or otherwise makes it easier to retrieve, use, or manage data.

Description:
Structural metadata is what most people mean by metadata. Structural metadata is said to “give meaning to data” or to put data “in context.” Key components of Structural metadata include data domains, data elements, terminology, data mappings and transformations, and data derivations. The successful usage of Structural metadata requires data standards governance that should include:
• Workflows to address the creation and/or revision of Structural metadata.
• Version control of Structural metadata and Study instance metadata (see definition below).
• Access control, by user role.

Standard metadata is the source from which the study instance metadata (see below) is built. A data model, describing the classes, attributes, relationships and hierarchies, constitutes the Structural metadata of the underlying database.

Example:
The number 120 itself is meaningless without structural metadata such as:
• The name of the variable (e.g. Systolic Blood Pressure) with its definition
• The unit related to this physical quantity (e.g. Systolic Blood Pressure Unit = mmHg)

CDISC SDTM is the data standard approved across the industry for clinical data to be transferred to the FDA.
• For instance, the variable “Sex” is described by a set of structural metadata such as the label, data type (char), associated value sets (male, female, ...), role in SDTM, etc.
• The metadata for the AE (Adverse Event) SDTM domain that is compliant with the CDISC SDTM Implementation Guide (version 3.2) consists of attributes such as Variable Name, Variable Label, Type, Controlled Terms, Role, etc.
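The "120 is meaningless on its own" example can be illustrated with a minimal Python sketch. The class name, field names, and the wording of the definition string are assumptions made for illustration; they are not taken from CDISC or from any MDR product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructuralMetadata:
    """Structural metadata for one variable (field names are illustrative)."""
    name: str        # variable name, e.g. "Systolic Blood Pressure"
    definition: str  # human-readable definition of the variable
    data_type: str   # storage type of the value
    unit: str        # unit of the physical quantity


def describe(value, meta: StructuralMetadata) -> str:
    """Render a raw value together with the metadata that gives it meaning."""
    return f"{meta.name} = {value} {meta.unit}"


sysbp = StructuralMetadata(
    name="Systolic Blood Pressure",
    definition="Blood pressure measured while the heart contracts",  # illustrative wording
    data_type="integer",
    unit="mmHg",
)

print(describe(120, sysbp))  # Systolic Blood Pressure = 120 mmHg
```

The raw value is passed around separately from its metadata on purpose: the same `StructuralMetadata` object describes every systolic blood pressure value, which mirrors the idea that structural metadata is defined once and re-used.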


3.1.3 Descriptive metadata

Synonym: Process metadata (subset of descriptive metadata); Semantic metadata (subset of descriptive metadata)

Recommended definition: In pharmaceutical research, Descriptive metadata describes process or domain-specific information about instance data collected and derived during clinical research. It provides conceptual, contextual, and processing information for instance data, and as such Descriptive metadata is a key enabler in deriving business value from instance data. It can also provide greater depth and more insight about the "container" of the data, whether it is a file, document, or representation. Descriptive metadata should be stored as structural data elements in the Metadata Repository (MDR). It is generated by systems or people.

Definition & source:
• http://en.wikipedia.org/wiki/Metadata The individual instances of application data, the data content. In this case, a useful description would be "data about data content" or "content about content".
• Ralph Kimball: "Process metadata describes the results of various operations in a data warehouse."

Description:
It is used in different contexts:
• Data operations and statistical analysis (Semantic metadata): additional content on the data that supports further analysis of the data. For instance, the patient population in the context of a clinical trial is descriptive metadata.
• Software implementation (Process metadata): describes the results of various operations happening in an application, be it a data warehouse or any other application. This includes:
  o Processes used to reformat (convert) or transcode content.
  o All information needed to support data lineage and traceability.
  o Details of origin and usage (including start and end times for creation, updates and access).

Descriptive metadata is often a key enabler in deriving business value from data through both direct and indirect relationships between instance data. In effect, it captures the “how”, “where”, “who”, and “when” for the instance data.
• “How”: how the instance data is used within the information flow.
• “Where”: source of the instance data.
• “Who”: who created, modified and approved the instance data.
• “When”: versioning information of the instance data.

Example:
• Data operations and statistical analysis (Semantic metadata): patient population, indication, therapeutic area
• Software implementation (Process metadata):
  o Metadata needed for the effective management of version control for structural metadata: UserID of who executed the last modification, date of the last modification, UserID of who approved the last modification.
  o Metadata needed for the effective management of instance data:
    - What is the source of the data, and in which system(s) is it authored
    - Which transformation happened to the data, how, when, and by whom
  o Metadata needed for managing access control: the different roles for accessing information and which actions they can perform (create, read, update, delete)
  o Audit trail: who accessed which information, and when
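As a rough illustration of process metadata, the sketch below attaches an audit trail to an instance value so that each change records the "who", "how", "where", and "when". All class names, field names, the user ID, and the system name are hypothetical; a real system would also record reads, approvals, and prior values.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    """One process-metadata record describing a single action on a value."""
    user_id: str         # "who" performed the action
    action: str          # "how": create, read, update or delete
    source: str          # "where": system in which the data was authored
    timestamp: datetime  # "when" the action happened


@dataclass
class TrackedValue:
    """An instance value plus the audit trail that describes its history."""
    value: object
    trail: list = field(default_factory=list)

    def update(self, new_value, user_id: str, source: str) -> None:
        """Change the value and append an audit entry for the change."""
        self.value = new_value
        self.trail.append(
            AuditEntry(user_id, "update", source, datetime.now(timezone.utc))
        )


sbp = TrackedValue(value=120)
sbp.update(122, user_id="jdoe", source="EDC")  # hypothetical user and system
last = sbp.trail[-1]
print(last.user_id, last.action, last.source)
```

The instance data (`value`) and the descriptive metadata (`trail`) live side by side here, which corresponds to the option of storing descriptive metadata with the instance data in the source application.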

3.1.4 Study Instance Metadata

Synonym: Study Specific metadata

Recommended definition: An instance of a higher level (enterprise/organization, compound/indication) standard metadata or a subset of it.

Definition & source: (no source found)
• Study Instance metadata is a defined grouping of metadata that serves as the most complete representation of the metadata that defines an individual study.
• It is commonly thought of as the set of metadata that is actually consumed by the clinical technology platform to facilitate processes that are more automated and consistent.

Description: Study Instance Metadata consists of Structural metadata and some Descriptive metadata to support the management of the Study Instance Metadata. The Study Instance Structural Metadata is extracted from the Structural metadata maintained at the enterprise/organisation level; it is therefore a subset of the enterprise Structural metadata. The Study Instance Metadata is exported to and consumed by the clinical data platform to ensure maximal automation and consistency of the processes for trial design, execution, storage, analysis, and submission.

Example:
• Example of Study Instance Structural metadata: the subset of SDTM data domains and variables needed to collect and derive instance data for a specific study.
• Example of Study Instance Descriptive metadata: for a Statistical Computing Environment (SCE) that is leveraging metadata to automate the production of TLFs, the Study Instance Descriptive metadata could include study-specific selections that help the SCE process the metadata, such as the selection of BY variables to determine appropriate breaks for a table in that particular study.
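The "subset of the enterprise Structural metadata" idea can be sketched as a simple extraction step. The dictionary layout, the function name, and the selection of domains and variables below are assumptions for illustration only; they do not represent how any particular MDR stores or exports its content.

```python
# Enterprise-level structural metadata: domain -> variables (illustrative subset).
ENTERPRISE_STANDARD = {
    "DM": ["STUDYID", "USUBJID", "SEX", "BRTHDTC"],
    "VS": ["STUDYID", "USUBJID", "VSTESTCD", "VSORRES", "VSORRESU"],
    "AE": ["STUDYID", "USUBJID", "AETERM"],
}


def extract_study_metadata(standard, selection):
    """Return the subset of `standard` named in `selection`.

    Raises ValueError if the study asks for a domain or variable that the
    standard lacks, so study metadata stays a true subset of the standard.
    """
    study = {}
    for domain, variables in selection.items():
        if domain not in standard:
            raise ValueError(f"unknown domain: {domain}")
        missing = [v for v in variables if v not in standard[domain]]
        if missing:
            raise ValueError(f"{domain}: not in standard: {missing}")
        study[domain] = list(variables)
    return study


# Hypothetical study that needs only part of DM and VS, and no AE domain.
study_001 = extract_study_metadata(
    ENTERPRISE_STANDARD,
    {"DM": ["STUDYID", "USUBJID", "SEX"], "VS": ["USUBJID", "VSTESTCD", "VSORRES"]},
)
print(sorted(study_001))
```

Rejecting unknown names, rather than silently adding them, is one possible design choice: it forces new variables to go through the enterprise standard first.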

3.1.5 Metadata repository

Synonym: Metadata registry

Recommended definition: A Metadata repository (MDR) is a centralized repository of metadata, with information about instance data such as semantics (meaning), relationships to other data, origin, usage, and format.

When the emphasis is put on control of new metadata, through a specific registration process with a well-identified administration/registration authority, the Metadata repository is often called a Metadata registry.

The recommendation is to use the terms:
• Metadata registry when the software has a strong registration process
• Metadata repository when the software is more of a library with less emphasis on registration

Definition & source:
http://datadictionary.blogspot.com/2008/03/metadata-repositories-vs-metadata.html
Definitions from the Data Dictionary site: a place, room, or container where something is deposited or stored. Note that there is nothing in this definition about the quality of the things being stored or the process to check whether new incoming items are duplicates of things already in the repository. If I have 100 users, they could each define "Customer" as they see fit and put their own definition into the metadata repository. No problems.

http://en.wikipedia.org/wiki/Metadata_repository
“A Metadata repository is a database created to gather, store, and distribute contextual information about business data, when documented it is known as metadata. This contextual information of business data include meaning and content, policies that govern, technical attributes, specifications that transform, and programs that manipulate. The Metadata repository is responsible for physically storing and cataloguing metadata. The metadata that is stored should be generic, integrated, current, and historical. Generic for a Metadata repository means that the meta model should store the metadata by generic terms instead of storing it by an applications-specific defined way, so that if your data base standard changes from one product to another the physical meta model of the metadata repository would not need to change. Integration of the metadata repository allows all entities of the enterprise business to view all metadata subject areas. The Metadata repository should also be designed so that current and historical metadata both can be accessed. Metadata repositories used to be referred to as a data dictionary.”

http://en.wikipedia.org/wiki/Data_dictionary
A data dictionary, or Metadata repository, as defined in the IBM Dictionary of Computing, is a "centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format." The term may have one of several closely related meanings pertaining to databases and database management systems (DBMS):
• A document describing a database or collection of databases
• An integral component of a DBMS that is required to determine its structure
• A piece of middleware that extends or supplants the native data dictionary of a DBMS.

http://www.springerreference.com/docs/html/chapterdbid/63927.html
http://www.uspto.gov/web/patents/patog/week13/OG/html/1388- 4/US08407194 20130326.html (link does not work)
http://www.bls.gov/ore/pdf/st000010.pdf

Description: Data store for metadata, defined within an organization.

Example: CDISC SHARE, NCI caDSR

3.1.6 Metadata Registry

Synonym: Metadata repository

Recommended definition: See metadata repository recommended definition.

Definition & source:
http://en.wikipedia.org/wiki/Metadata_registry
A metadata registry is a central source location in an organization where metadata definitions are stored and maintained in a controlled method. A Metadata registry typically has the following characteristics:
• Protected environment where only authorized individuals may make changes
• Stores data elements that include both semantics and representations
• Semantic areas of a metadata registry contain the meaning of a data element with precise definitions
• Representational areas of a metadata registry define how the data is represented in a specific format, such as in a database or a structured file format (e.g., XML)

http://datadictionary.blogspot.com/2008/03/metadata-repositories-vs-metadata.html
Definitions from the Dr. Data Dictionary site: A Registry has the connotation of more than just a shared dumping ground. Registries have the additional capability to create workflow processes to check that new metadata is not a duplicate (for a given namespace). One of the definitions from Webster is an official record book. Note the word official.

ISO/IEC 11179-3, Third edition, 2013-02-15
3.2.113 Registry: information system for registration (3.2.108)
3.2.78 Metadata registry: information system for registering metadata (3.2.74)
• The structure of a metadata registry is specified in the form of a conceptual data model. The metadata registry is used to keep information about data elements and associated concepts, such as “data element concepts”, “conceptual domains” and “value domains”.

Description: See above.

Example: See above.
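The repository/registry distinction drawn in sections 3.1.5 and 3.1.6 can be sketched in a few lines of Python. The class names, the `registrars` parameter, and the sample namespace and definitions are hypothetical and are not drawn from ISO/IEC 11179 or from any product; the sketch only shows the two extra behaviours attributed to a registry: restricted write access and a duplicate check per namespace.

```python
class MetadataRepository:
    """Library-style store: accepts any definition, with no checks."""

    def __init__(self):
        self.entries = []  # list of (namespace, name, definition) tuples

    def add(self, namespace, name, definition):
        self.entries.append((namespace, name, definition))


class MetadataRegistry(MetadataRepository):
    """Registry: only authorised registrars may add, and duplicates are rejected."""

    def __init__(self, registrars):
        super().__init__()
        self.registrars = set(registrars)

    def add(self, namespace, name, definition, user=None):
        if user not in self.registrars:
            raise PermissionError(f"{user!r} is not a registration authority")
        if any(ns == namespace and n == name for ns, n, _ in self.entries):
            raise ValueError(f"{name!r} already registered in {namespace!r}")
        super().add(namespace, name, definition)


repo = MetadataRepository()
repo.add("sales", "Customer", "a person who bought something")
repo.add("sales", "Customer", "any account holder")  # accepted: no duplicate check

registry = MetadataRegistry(registrars={"steward"})
registry.add("sales", "Customer", "a person who bought something", user="steward")
try:
    registry.add("sales", "Customer", "any account holder", user="steward")
except ValueError as err:
    print("rejected:", err)
```

The repository ends up holding two competing definitions of "Customer", while the registry keeps exactly one, which is the practical difference the recommended terminology is meant to signal.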

3.1.7 Data element

Synonym: Variable (Note: the term “attribute” is also used interchangeably for Data Element when “attribute” is a synonym of a variable or of the property of a class)

Recommended definition: A Data Element is the most elementary unit of data that cannot be further subdivided from a semantic point of view, as it is linked with a precise meaning. The definition, identification, representation and permissible values of a data element are specified by means of a set of properties.

Definition & source:
[http://www.fda.gov/downloads/Drugs/Guidances/UCM292334.pdf] A Data element is the smallest (or atomic) piece of information that is useful for analysis (e.g., a systolic blood pressure measurement, a lab test result, a response to a question on a questionnaire). A Data element is an atomic unit of data that has precise meaning or precise semantics.

[CDISC]
1. For XML, an item of data provided in a mark-up mode to allow machine processing. [FDA - GL/IEEE]
2. Smallest unit of information in a transaction. [Center for Advancement of Clinical Research]
3. A structured item characterized by a stem and response options together with a history of usage that can be standardized for research purposes across studies conducted by and for NIH. [NCI, caBIG]
NOTE: The mark-up or tagging facilitates document indexing, search and retrieval, and provides standard conventions for insertion of codes.

[ISO/IEC 11179-4:2004, 3.4] Unit of data for which the definition, identification, representation and permissible values are specified by means of a set of attributes. The Data element is the foundational concept in an ISO/IEC 11179 metadata registry. The purpose of the registry is to maintain a semantically precise structure of Data elements. Each Data element in an ISO/IEC 11179 metadata registry:
• Should be registered according to the Registration guidelines (11179-6).
• Will be uniquely identified within the register (11179-5).
• Should be named according to Naming and Identification Principles (11179-5).
• Should be defined by the Formulation of Data Definitions rules (11179-4).
• May be classified in a Classification Scheme (11179-2).

Description: A Data Element is the most elementary unit of data that cannot be further subdivided from a semantic point of view, as it is linked with a precise meaning. A Data element has different properties:
• An identification such as a data element name.
• A clear definition / semantic description.
• A data type.
• Optional enumerated permissible values (value sets).
• One or more representation terms (synonyms).
• An author and registration authority who takes responsibility for the definition of the data element.

Example: Birth Date is a Data Element. It is described by a set of properties:
• DE name: Birthdate.
• Definition/description: date and time on which the subject is born.
• Data type: date (mm/dd/yyyy - hh/mm/ss - time zone).
• Value sets: not applicable.
• Synonyms: BRTHDTC in CDISC SDTM, birthdate in BRIDG

If Variable in SDTM is provided as a synonym of Data Element, then Data Element would have a similar association to ItemDef as Variable has to ItemDef in Define-XML.
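The list of Data Element properties above maps naturally onto a small record type. The following Python sketch is illustrative only: the class and field names are assumptions, it is not an ISO/IEC 11179 or Define-XML implementation, and the "Sex" value set and the registration authority string are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class DataElement:
    """A data element described by the properties listed in this section."""
    name: str                                     # identification
    definition: str                               # semantic description
    data_type: str                                # storage data type
    value_set: Optional[tuple] = None             # None means "not applicable"
    synonyms: dict = field(default_factory=dict)  # standard -> name in that standard
    registration_authority: str = ""              # who is responsible for the definition

    def is_permissible(self, value) -> bool:
        """True if no value set is enumerated or the value belongs to it."""
        return self.value_set is None or value in self.value_set


birth_date = DataElement(
    name="Birthdate",
    definition="date and time on which the subject is born",
    data_type="date",
    synonyms={"CDISC SDTM": "BRTHDTC", "BRIDG": "birthdate"},
)

# Second element with an enumerated value set (values are illustrative).
sex = DataElement(
    name="Sex",
    definition="sex of the subject",
    data_type="char",
    value_set=("male", "female"),
    registration_authority="hypothetical standards team",
)

print(birth_date.synonyms["CDISC SDTM"], sex.is_permissible("male"))
```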

3.1.8 Attribute

Synonym: Property (Note: the term “Data element” is also used interchangeably for attribute, but it is a different concept)

Recommended definition: Properties of an object or class in a conceptual or logical data model.

Definition & source:
http://en.wikipedia.org/wiki/Attribute_(computing)
In computing, an attribute is a specification that defines a property of an object, element, or file. An Attribute of an object usually consists of a name and a value; of an element, a type or class name; of a file, a name and extension.


[Source: Understanding HL7 version 3: Andrew Hinchley] Attributes are abstractions of the data captured about classes.

[Source: ISO 1087] Attribute is short for attribute type and attribute value. Attribute type: category of attribute values used as a criterion for the establishment of a concept system.

[Source: “Medical Data Management”, Florian Leiner et al.] Attribute value: Value of an attribute type as observed for a particular object.

[Source: ISO 21090] Characteristic of an object that is assigned a name and a type. NOTE the value of an attribute can change during the lifetime of the object.

Description: A prerequisite for correct and proper use and interpretation of data is that both users and owners of data have a common understanding of the meaning and representation of the data. To facilitate this common understanding, a number of attributes of the data have to be defined. Such attributes include: the element’s name, data type, caption presented to users, detailed description, and basic validation information such as range checks.

An attribute describes a characteristic of an object/class in a logical model. If an attribute represents the most elementary unit of data that cannot be further subdivided from a semantic point of view, it can be considered a Data Element.

Attribute is an overloaded term. It is sometimes used as a synonym of Data Element or as a synonym of a property of a Data Element. While the first usage may be correct in many cases (see footnote 1), we suggest avoiding the second practice and using the term “property” instead.


Example: In BRIDG:
• RaceCode is an attribute of class Person (i.e. Person.raceCode).
• Value is an attribute of DefinedObservationResult.

Footnote 1: In an information model like BRIDG, an attribute may have a data type like “ADDRESS”, which is a class. Such an attribute does not qualify as a Data Element.

3.1.9 Class

Synonym: Object

Recommended definition: Description of a set of objects that share the same attributes, operations, methods, relationships, and semantics. A Class has:
• An identifier such as a class name.
• A clear object definition / semantic description.
• One or more representation terms/words.
• A list of Data Elements (also known as attributes).
• A list of related classes and a description of the relationship type(s).
• Any description, in addition to Data Elements, that allows the object to be mapped within an application.

Definition & source:
http://en.wikipedia.org/wiki/Class_(computer_programming)
In object-oriented programming, a Class is a construct that is used to define a distinct type. The Class is instantiated into instances of itself, referred to as class instances, class objects, instance objects or simply objects. … A Class usually represents a noun, such as a person, place or thing, or something nominalized. For example, a "Banana" class would represent the properties and functionality of bananas in general. A single, particular banana would be an instance of the "Banana" class, an object of the type "Banana".

[Source: ISO 21090] Class: Descriptor for a set of objects with similar structure, behavior and relationships.

Description: Description of a set of objects that share the same attributes, operations, methods, relationships, and semantics.

Example:
• StudySite Class in the BRIDG model.
• ManufacturedMaterial class in HL7 RIM: An Entity or combination of Entities transformed for a particular purpose by a manufacturing process.

3.1.10 Data type

Synonym: Storage format

Recommended definition: Data types define the format that can be included in a specific Data Element (or variable or attribute). There are two categories of Data type:
• Simple / primitive types such as Boolean, Integer, Character, defined in ISO 11404.
• Abstract Data types such as Address, PQ (Physical Quantity), defined in ISO 21090 and using the terminology, notations and Data types defined in ISO/IEC 11404.

Definition & source:
[Source: ISO 11404] A Data type is a classification identifying one of various types of data, such as real-value, integer or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of that type can be stored.

[Source: ISO 21090] Set of distinct values, characterized by properties of those values, and by operations on those values.

[Source: http://msdn.microsoft.com/] Objects that contain data have an associated data type that defines the kind of data (for example, character, integer, or binary) the object can contain. The following objects have Data types:
• Columns in tables and views.
• Parameters in stored procedures.
• Variables.
• Transact-SQL functions that return one or more data values of a specific data type.
• Stored procedures that have a return code, which always has an integer data type.

Description: Storage format in a database, not the display format in the user interface. Data types define the kind of data, or the format, that can be included in a field (Data Element, Attribute or Variable). There are two categories of data type:
• Simple / primitive data types such as Boolean, Integer, Character, defined in ISO 11404.
• Abstract data types, defined in ISO 21090, which define basic concepts that are commonly encountered in healthcare in support of information exchange. Abstract data types use the terminology, notations and data types defined in ISO/IEC 11404, thus extending the set of data types defined in that standard.

Example:
• Primitive data types (ISO 11404): boolean, enumerated, character, time, integer, real, ...
• Abstract data types (ISO 21090): Address, PQ (Physical Quantity), II (Instance Identifier), CD (Concept Descriptor), Range (low, high), Period (start, end).
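To make the primitive versus abstract distinction concrete, the sketch below builds two abstract types out of primitive ones. It borrows only the names PQ and Range from the examples above; the fields, the same-unit rule, and the `contains` method are simplifying assumptions and should not be read as the ISO 21090 definitions. The numeric bounds are arbitrary illustration values, not clinical reference ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PQ:
    """Physical Quantity: a primitive real value plus a primitive string unit."""
    value: float
    unit: str


@dataclass(frozen=True)
class Range:
    """Interval bounded by two physical quantities expressed in the same unit."""
    low: PQ
    high: PQ

    def __post_init__(self):
        # Simplifying assumption: no unit conversion, so units must match.
        if self.low.unit != self.high.unit:
            raise ValueError("low and high must use the same unit")

    def contains(self, q: PQ) -> bool:
        """True if q uses the range's unit and lies between low and high."""
        return q.unit == self.low.unit and self.low.value <= q.value <= self.high.value


example_range = Range(PQ(90, "mmHg"), PQ(120, "mmHg"))  # arbitrary bounds
print(example_range.contains(PQ(110, "mmHg")))
```

An abstract type such as `Range` carries operations and constraints of its own, which is what separates it from a bare pair of numbers stored in primitive fields.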

3.1.11 Value level metadata (VLM) Synonym Row level metadata, value list metadata Recommen VLM is the mechanism (implementation approach) used in the Define-XML standard to ded express semantic dependencies between variables defined independently within the definition CDISC standards. For instance a vital sign test is defined by a test code, a value and a unit. These are independent variables in the CDISC Standards. The value level metadata allow expressing the dependencies i.e. the value and the unit will be different based on the test code.

0515a2da0c326a925f858d2e7d575d27.docx Page 19 of 53 Project: Metadata Management Title: Metadata Definitions Working Group: Emerging Technologies

Version: 1.0 Date: May 21, 2014

Definition & source
[CDISC Define-XML Specification Version 2.0, http://www.cdisc.org/define-xml]
Value Level Metadata is metadata defined based on the value of other variable(s) to support data review and analysis in cases where variable metadata is not sufficient. The normalized data structure used by datasets based on the SDTM, SEND and ADaM models (generally one record per subject per topic variable (test code or parameter code) per visit or observation) provides an efficient method for transmitting information. However, there are cases where the dataset variable metadata does not provide sufficient detail to support data review and analysis. In these cases Value Level Metadata should be provided in the Define-XML document.
Value Level Metadata enables the specification of the metadata of a variable under conditions involving one or more other dataset variables. The definition of a variable for a specific condition is known as Value Level Metadata.

Note: The Define-XML team is working on an Implementation Guide covering the different use cases of VLM and the associated requirements. The Define-XML Implementation Guide is expected to be made available to the public gradually.
Description
• Variable level metadata = structural metadata on a variable.
  o E.g. variable VSORRESU is a coded concept with the structural metadata type = text, length = 30, and a value set further specified through VLM.
• Value-level metadata is a specific term used in the CDISC Define-XML standard because CDISC standards define datasets in a generic way (vertical structure), which does not allow explicit semantic dependencies between variables to be captured. For instance, VS contains the following variables:

Variable | Label | Type | Length | EX1 | EX2
VSTESTCD | Vital Signs Test Short Name | text | 20 | SYSBP | HGTH
VSTEST | Vital Signs Test Name | text | 24 | Systolic BP | Height
VSORRES | Result or Finding in Original Units | text | 30 | Integer | Float
VSORRESU | Original Units | text | 20 | CMHG, MMHG | INCH, CM, M

It is clear that if VSTEST = Systolic Blood Pressure, the result in VSORRES and the unit code in VSORRESU will be different than if VSTEST = Height. This is further specified through VLM as displayed below:

Variable | Where | Type | Length | Controlled Terms or Format
VSORRES | VSTESTCD EQ HEIGHT (Height) | float | 5.1 |
VSORRES | VSTESTCD EQ SYSBP (Systolic Blood Pressure) | integer | 3 |
VSORRESU | VSTESTCD EQ HEIGHT (Height) | text | 5 | ["cm" = "Centimeter"] ["IN" = "Inch"]

Example(s)
Example 1.
• Data values are often stored in variables that are dedicated to a single kind of measurement; for example, height values are stored in a variable named “height” and weight values are stored in a variable named “weight”.
• But sometimes data values for different measurements are stored in a single shared variable. Height and weight values can both be stored in a variable named “result_value”. So how can you know which values are heights and which are weights?
• A second variable can name the measurement whose values are stored in “result_value”. This second variable could be named “result_name”. The data set then contains the variable “result_name” with values like “height” and “weight”, and the variable “result_value” with values like “185” and “75”. This data design is good for software, which likes consistency in the data variable names. But the metadata describing the attributes of the shared variable must be able to describe these attributes separately for each value of result_name.
• Value Level Metadata is the metadata design that enables metadata descriptions of “result_value” for each “result_name”. VLM assigns a different set of variable characteristics to result_value for each value of result_name, stating that the attributes of values in result_value are different when result_name is “height” than when result_name is “weight”.
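The lookup described in Example 1 can be sketched in a few lines of Python. This is only an illustration: the attribute names (type, length, unit) and their values are assumptions chosen for the example, not taken from Define-XML or any CDISC standard.

```python
# Hypothetical value level metadata for the shared variable "result_value",
# keyed by the value of "result_name". All attribute values are illustrative.
VALUE_LEVEL_METADATA = {
    "height": {"type": "integer", "length": 3, "unit": "cm"},
    "weight": {"type": "integer", "length": 3, "unit": "kg"},
}


def describe_result_value(record):
    """Return the metadata that applies to result_value in this record."""
    return VALUE_LEVEL_METADATA[record["result_name"]]


records = [
    {"result_name": "height", "result_value": "185"},
    {"result_name": "weight", "result_value": "75"},
]

# The same variable carries different metadata depending on result_name.
units = [describe_result_value(r)["unit"] for r in records]
```

The point of the sketch is that the metadata is selected by the value of a second variable, not by the name of the variable that holds the result.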

Example 2.
This example of value level metadata is related to the example provided in section 3.2.7 “Value set” with Family Pet.

Variable | Where | Type | Length | Controlled Terms or Format
FamilyPet | - | Text | 20 | Animals
Breed | FamilyPet EQ “Dog” | Text | 20 | Breed of Dogs
Breed | FamilyPet Different “Dog” | Text | 20 |

A set of data about a group of families that contains a variable “Family pet” may also contain a separate variable “Breed” (considered a variable qualifier of “Family pet”) that is conditioned upon the value of the data element “Family pet”:
• The variable “Family pet” is bound to the value set “Animals”.
• The data element “Breed” is bound to the value set “Breeds of Dog” when “Family pet” = “Dog”.
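As a rough illustration of how such a conditional binding could be checked, the sketch below validates a record against the Family Pet example. The function name check_record and the sample members of the two value sets are assumptions made for this sketch; only the where clause and the length of 20 come from the table above.

```python
# Sample members only; the real value sets would be defined elsewhere.
ANIMALS = {"Dog", "guinea pig", "rabbit", "hamster"}
BREEDS_OF_DOG = {"Poodle", "Alsatian", "Jack Russell"}
MAX_LENGTH = 20


def check_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    pet = record.get("FamilyPet", "")
    breed = record.get("Breed", "")
    if pet not in ANIMALS:
        problems.append("FamilyPet is not in the value set Animals")
    if len(breed) > MAX_LENGTH:
        problems.append("Breed is longer than 20 characters")
    # Where clause: "Breeds of Dog" applies only when FamilyPet EQ "Dog".
    if pet == "Dog" and breed and breed not in BREEDS_OF_DOG:
        problems.append("Breed is not in the value set Breeds of Dog")
    return problems
```

For a record with FamilyPet = “rabbit”, Breed is treated as free text of up to 20 characters, because no value set is bound to it under that condition.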


3.2 Controlled Terminology, code systems & value sets
In this section we limit the definitions to the terms most often used in clinical research operations, in order to clear up the confusion between terms like “code lists”, “controlled terminology” and “dictionary” (as used for MedDRA).

[Figure: The components of controlled vocabularies, with example from CDISC Terminology]

3.2.1 Controlled Terminology/controlled vocabulary
Synonym
Controlled vocabulary
Recommended definition
• See Description below.
Definition & source
[CDISC]
CDISC Controlled Terminology is a set of standard value lists that are used throughout the clinical research process from data collection through analysis and submission.
History of alignment of CDISC terminology:
• NCI EVS (Enterprise Vocabulary Services): original terminology applicable to SDTMIG (2005).
• HL7 EHR Clinical research functional profile linking HL7 standards with CDISC CDASH (data collection standards).
• HITSP (replaced by HITSC).
• ISO: in progress.
• JIC: future intention to align with JIC?

[http://en.wikipedia.org/wiki/Controlled_vocabulary]
Controlled vocabularies provide a way to organize knowledge for subsequent retrieval. Controlled vocabulary schemes mandate the use of predefined, authorised terms that have been preselected by the designer of the vocabulary.

[Source: Mapping from a Clinical Terminology to a Classification: AHIMA]
Controlled means that the content of the terminology is validated with careful quality assurance procedures in place to ensure that the terminology is structurally sound, biomedically accurate and consistent with current practice.

Controlled terminology in the context of Controlled Vocabulary:
• [Amy Warner, A Taxonomy Primer] Controlled vocabularies are organized lists of words and phrases, or notation systems, that are used to initially tag content, and then to find it through navigation or search.
• [Source: ISO Standard 1087] and [Medical Informatics: Computer Applications in Healthcare and Biomedicine] The terms terminology, vocabulary and nomenclature are often used interchangeably by creators of coding systems and by authors discussing the subjects. ISO Standard 1087 (Terminology, Vocabulary) lists the various definitions for these terms.
  o Terminology: Set of terms representing the system of concepts of a particular subject field.
  o Nomenclature: System of terms that is elaborated according to pre-established naming rules.
  o Dictionary: Structured collection of lexical units, with linguistic information about each of them.
  o Vocabulary: Dictionary containing the terminology of a subject field.
Description
Controlled Terminology is a synonym of Controlled Vocabulary. It is a set of standardized words and phrases (designations) used to refer to concepts.
• It has a defined scope or describes a specific domain.
• It may support categorization, indexing, and retrieval of information (optional).


• A good terminology typically includes preferred terms and synonyms while promoting consistency in preferred terms and in the assignment of the same terms to similar content.

A Controlled terminology, or code system, can be used for coding, i.e. assigning a code together with a verbatim.
Example
ICD-9 CM, SNOMED CT, LOINC and MedDRA are all controlled terminologies AND code systems.
CDISC CT is a controlled terminology but not a true code system because:
• There is no OID to represent the whole of CDISC CT as a unique, well-identified set.
• Governance: the organisation that publishes/manages it (NCI), with OID and designation, is not the same as the one responsible for it (CDISC).
• It can be extended by the sponsor.

3.2.2 Code system
Synonym
Controlled Terminologies, Controlled Vocabularies, Coding schemes, [Dictionary is sometimes used incorrectly], (and sometimes also code lists, e.g. ISO country code).
Recommended definition
A Code system, as a controlled terminology, is described as “a collection of uniquely identifiable concepts with associated representations, designations, associations, and meanings”. Each concept in a code system is unique. A code system has strict governance rules to manage its content, while a controlled terminology does not have strict governance.
Definition & source
[Source: ISO 21090]
Managed collection of concept identifiers, usually codes, but sometimes more complex sets of rules and references.
NOTE They are often described as collections of uniquely identifiable concepts with associated representations, designations, associations and meanings.
EXAMPLES ICD-9, LOINC and SNOMED-CT.
Description
A Code System is a more strictly “regulated” controlled terminology:
• A Code system may be described as “a collection of uniquely identifiable concepts with associated representations, designations, associations, and meanings” (B for Blue, Y for Yellow), while a controlled terminology could be just a list of words (Blue, Yellow ...).
• A Concept should be unique in a given Code System and should have a unique identifier (e.g. CUI, concept unique identifier), following the governance rules of the Code System.
• A Code system should have:
  o An identifier (e.g. OID) that uniquely identifies the Code System.
  o A description consisting of prose that describes the Code System, and may include the Code System uses, maintenance strategy, intent and other information of interest.
  o Administrative information proper to the Code System, such as ownership, source URL, and copyright information.
  o A code system version, as the code system can evolve over time (sometimes with changes in the underlying concepts).

A Controlled terminology, or Code system, can be used for coding, i.e. assigning a code together with a verbatim.
Example
ICD-9 CM, SNOMED CT, LOINC, MedDRA, NCIT (NCI Thesaurus), ISO 3166 for country codes.
Note: CDISC CT is not a code system as it does not have strict version control and governance (see above).
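The attributes listed for a code system (identifier, description, administrative information, version, unique concepts) can be pictured as a small data structure. The sketch below is a minimal illustration: the class and field names are assumptions, and the OID "1.2.3.4.5" is a placeholder, not a registered identifier.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Concept:
    code: str          # machine-readable identifier
    designation: str   # human-readable label


@dataclass
class CodeSystem:
    oid: str                 # uniquely identifies the code system
    name: str
    version: str             # the content may evolve over time
    description: str = ""
    owner: str = ""          # administrative information
    concepts: Dict[str, Concept] = field(default_factory=dict)

    def add(self, concept: Concept) -> None:
        """Add a concept, refusing codes that already exist."""
        if concept.code in self.concepts:
            raise ValueError(f"code {concept.code!r} already exists in {self.name}")
        self.concepts[concept.code] = concept


# Placeholder OID and the B/Y colour example from the description above.
colours = CodeSystem(oid="1.2.3.4.5", name="Colours", version="1.0",
                     description="Illustrative colour codes")
colours.add(Concept("B", "Blue"))
colours.add(Concept("Y", "Yellow"))
```

A plain list of words (Blue, Yellow) would correspond to a controlled terminology without these governance attributes.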

3.2.3 Dictionary
Synonym
Controlled Terminology/Controlled vocabulary
Recommended definition
Do not use this term!
Definition & source
-
Description
Often used in clinical data management for MedDRA, this is an overloaded term with different meanings in different contexts. We therefore suggest avoiding it and using the proper wording, i.e. controlled terminology or code system.
Example
MedDRA, WHODRUG

3.2.4 Concept
Synonym
Recommended definition
A Concept is a unitary mental representation of a real or abstract thing, an atomic unit of thought; a Concept can be labelled with a code and/or a designation.
Definition & source
[Source: ISO 21090]
Unitary mental representation of a real or abstract thing; an atomic unit of thought.
NOTE 1 It should be unique in a given code system.
NOTE 2 A Concept can have synonyms in terms of representation and it can be a primitive or compositional term.
Description
• A Concept is a unitary mental representation of a real or abstract thing, an atomic unit of thought, within a specific context.
• The purpose of defining a concept is to share meaning in information exchange.
• Concepts constitute the smallest semantic entities with which models are built. The authors and the readers of a model use concepts and their relationships to build and understand the models; these are what matter to the human user of models. A Concept can be labelled with a code (machine readable) and/or a designation (human readable); a collection of codes constitutes a Code system.
• Concepts and real world objects are defined at different levels (an object is an actual thing that exists, while a concept is a mental thing).
Example
Real “unit of thought”: apple, pomme (when a more refined definition is needed, such as green or red apple, the concept can be refined).
Abstract “unit of thought”: love.

3.2.5 Code
Synonym
Permissible value
Recommended definition
Meaningless identifier of a concept, which should ideally be linked with a designation (or decode) that is human readable/meaningful.
Definition & source
[Source: ISO 21090]
Concept representation published by the author of a code system as part of the code system, being an entity of that Code system.
Description
• A Code is a machine-processable Concept Representation published by the author of a Code System as part of the Code System.
• It is the preferred unique (unambiguous) identifier for that concept in that Code System for the purpose of communication (preferred machine-readable identifier), and is used in the 'code' property of an ISO 21090 CD data type.
• Codes are sometimes meaningless identifiers, and sometimes mnemonics that suggest the represented concept to a human reader.

Note:
• A Concept representation has a code and one or more designations. If there is more than one designation of the same concept, these are synonyms of each other.
  o In a Code system that has synonyms, it is useful to have a “primary designation” assigned by the code system provider.
  o This is helpful in maintenance, because if a change is needed it can be made without retiring and re-authoring the whole concept; whereas if there is no primary designation, it is difficult to decide whether a change to “one of the synonyms” means retiring and re-authoring the whole concept.
• A decode is generally used as the (primary) designation of a concept.


Example
• MedDRA code, a meaningless identifier: “10040589” (Shoplifting).
• ISO (2 letter) country codes, a mnemonic: GB = Great Britain.
• In CDISC Controlled Terminology:
  o C16576 is the code for Female in CDISC Vocab CT.
  o F is the designation for Female.
  o Female might be another designation (it is a synonym of F, and should ideally be the primary designation as it is more human readable).

3.2.6 Code list
Synonym
Value set, Code system (e.g. ISO country code)
Recommended definition
Do not use: the term is not precise enough. Use either code system or value set as appropriate.
Definition & source
Description
Code lists within a database are implementations of a CT. The coded value is operational and not necessarily part of the CT. For example, a code list 1=Male, 2=Female is the sponsor application of the CDISC terminology for SEX containing the value list (Male, Female).
Example
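The distinction between the operational codes of a sponsor code list and the terminology it implements can be sketched as follows. The names used here (SEX_VALUE_LIST, SPONSOR_SEX_CODE_LIST, decode, code_list_conforms) are illustrative assumptions; only the 1=Male, 2=Female mapping comes from the description above.

```python
# Value list of the terminology for SEX, as quoted in the description.
SEX_VALUE_LIST = {"Male", "Female"}

# Sponsor code list: the codes 1 and 2 are operational values chosen by
# the sponsor's database and are not part of the terminology itself.
SPONSOR_SEX_CODE_LIST = {1: "Male", 2: "Female"}


def decode(code):
    """Translate an operational code into its terminology value."""
    return SPONSOR_SEX_CODE_LIST[code]


def code_list_conforms(code_list, value_list):
    """True if every decoded value belongs to the terminology value list."""
    return set(code_list.values()) <= value_list
```

A code list whose decodes fall outside the value list (for instance a locally added "Other") would fail the conformance check, which is the situation the governance notes under value sets are meant to control.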

3.2.7 Value set
Synonym
Code list
Recommended definition
• A Value Set represents a uniquely identifiable set of valid concepts in context, i.e. bound to a specific data element.
• It is not recommended to extend a value set; if there is a real need, this should be done under a well-defined governance process.
Definition & source
[Source: ISO 21090]
That which represents a uniquely identifiable set of valid concept representations, where any concept representation can be tested to determine whether or not it is a member of the value set.

Description
• A Value Set represents a uniquely identifiable set of valid concepts in context, i.e. bound to a specific data element.
• A value set draws from one or more code systems (see examples below).

The figure below shows the relationship between value set and code system.

• Not all dataElements are coded with a value set; therefore the cardinality between a dataElement and a valueSetBinding is 0, 1.
• A valueSetBinding will usually be to a single value set only, but a valueSet can be bound to more than one dataElement if necessary; therefore the cardinality between valueSetBinding and valueSet is 1, n.
• A value set must always have a definition, so the cardinality between valueSet and valueSetDefinition is 1, 1.
• A valueSet may relate to one or more codeSystems, so that cardinality should be 1, n.
• A valueSet contains 1, n coded concepts when it is expanded from its definition, and these can be sourced from one or more codeSystems. But each coded concept comes from only one codeSystem, so that cardinality should be 1, 1.
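The cardinalities listed above can be made concrete with a small data model. This is a simplified sketch under stated assumptions: the class names are invented for illustration, the valueSetBinding is collapsed into an optional attribute on the data element, and the sample concepts are borrowed from the family pets example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class CodedConcept:
    code: str
    code_system: str   # each coded concept comes from exactly one code system (1, 1)


@dataclass(frozen=True)
class ValueSet:
    identifier: str
    definition: str                     # always required (1, 1)
    concepts: FrozenSet[CodedConcept]   # one or more coded concepts (1, n)

    def __post_init__(self):
        if not self.definition:
            raise ValueError("a value set must have a definition")
        if not self.concepts:
            raise ValueError("a value set must contain at least one coded concept")

    def code_systems(self):
        """Code systems the value set draws from (1, n)."""
        return {c.code_system for c in self.concepts}


@dataclass(frozen=True)
class DataElement:
    name: str
    value_set: Optional[ValueSet] = None   # not every data element is coded (0, 1)


family_pets = ValueSet(
    "family-pets",
    "Pets kept by a family",
    frozenset({CodedConcept("rabbit", "Animals"),
               CodedConcept("Poodle", "Breeds of Dog")}),
)
pet = DataElement("family pets", family_pets)
comment = DataElement("comment")   # an uncoded data element
```

Because the same ValueSet object can be referenced by several DataElement instances, the sketch also covers the case of one value set bound to more than one data element.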

Examples
• Example 1. A value set is needed to instantiate the data element “family pets”.
  o codeSystem 1 = “Animals” (including “guinea pig”, “rabbit”, “hamster” etc.)
  o codeSystem 2 = “Breeds of Dog” (“Poodle”, “Alsatian”, “Jack Russell” etc.)
  o valueSet = “family pets” draws concepts from two code systems: “Animals” and “Breeds of Dog”.
• Example 2. In SDTM, LBTESTCD is a value set that can be extended. There are a number of LabTest concepts defined using the NCI Thesaurus code system. But if you need a lab test that is not in NCI, you can add it using any other lab-related terminology concept, such as LOINC or SNOMED; so here again the value set is drawn from more than one code system.
• Example 3. Most SDTM value sets can be extended with sponsor-defined concepts (which need to be defined as part of the sponsor code system).
• Example 4. In SDTM, AESEV cannot be extended.

Notes
• The Unique Meaning rule is important when a value set contains concepts from more than one code system. Its aim is to ensure that the value set does not contain identical concepts from two different code systems and that every concept has a single globally unique identifier. So a value set should not contain both the concept “C103812 CD19 Cell to Lymphocyte Ratio Measurement” from the NCI and the concept “8117-4 Cells.CD19/100 Cells” from LOINC, as they both represent the same real world thing.
• Inclusion of concepts in a value set must be properly governed, i.e. the added concepts must be defined and managed in a code system. Any organization, and certainly a pharmaceutical company, needs to have properly governed code systems.
  o An organization can build a code system by taking CDISC CT and potentially adding new concepts through a well-documented process.
  o However, by adding new concepts, an organization diverges from the industry standards. It is therefore recommended:
    - To ask the SDO at the source of the code system whether the concept exists.
    - If it does not, to request that the code be added.
    - If it exists and is not clear, to request an update.
  o To implement a well-documented governance process if the organisation wants to add its own concepts.
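The Unique Meaning rule in the notes above could be checked along the following lines. This is a minimal sketch that assumes a table of known equivalences between code systems is available; such a table (here EQUIVALENT_MEANING, a hypothetical name) would have to be curated, and the check can only find duplicates that the table knows about.

```python
# Hypothetical table mapping (code system, code) pairs that denote the same
# real world thing to a shared meaning key. The two entries are the pair
# quoted in the note above.
EQUIVALENT_MEANING = {
    ("NCI", "C103812"): "CD19 cell to lymphocyte ratio",
    ("LOINC", "8117-4"): "CD19 cell to lymphocyte ratio",
}


def unique_meaning_violations(value_set):
    """Return pairs of members that represent the same meaning."""
    seen = {}
    violations = []
    for member in value_set:
        # Members without a known equivalence stand for themselves.
        meaning = EQUIVALENT_MEANING.get(member, member)
        if meaning in seen:
            violations.append((seen[meaning], member))
        else:
            seen[meaning] = member
    return violations


mixed = [("NCI", "C103812"), ("LOINC", "8117-4")]
clean = [("NCI", "C103812"), ("LOINC", "718-7")]
```

Here mixed breaks the rule because both members share one meaning key, while clean does not; the second LOINC code in clean is just a sample value with no entry in the equivalence table.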

Extensibility: if a value set includes “others”, does it allow for extensibility? “Others” should therefore NOT be accepted as a concept in a code system/value set.
Example
• The value set for countries is the complete ISO 3166. The value set for LATAM countries is the subset of ISO 3166 for the Southern American countries.
• The value set for the variable SEX in CDISC is identified by C66731 and is composed of F (for Female), M (for Male), U (for Unknown), and UN (for Undifferentiated).

3.3 Master data management

3.3.1 Master Data
Synonym
Master Reference Data
Recommended definition
Master Data is a single source of basic business data used across multiple systems, applications, and/or processes. It is an object with several attributes supporting unique identification and use across multiple systems. Master Data are categorized by dimensions and persisted in a specific repository.
Definition & source
[http://en.wikipedia.org/wiki/Master_data]
Master Data is a single source of basic business data used across multiple systems, applications, and/or processes. Master data is information that is key to the operation of a business. … Can include reference data. This key business information may include data about customers, products, employees, materials, suppliers, and the like. ... Because master data may not be stored and referenced centrally, but is often used by several functional groups and stored in different data systems across an organization, master data may be duplicated and inconsistent (and if so, inaccurate). Thus Master Data is that persistent, non-transactional data that defines a business entity for which there is, or should be, an agreed-upon view across the organization.

[Gartner, Magic Quadrant for Master Data Management of Customer Data Solution] http://www.gartner.com/technology/reprints.do?id=1-1CK9UDO&ct=121019&st=sb
Master data is the consistent and uniform set of identifiers and extended attributes that describes the core entities of the enterprise, such as customers, prospects, citizens, suppliers, sites, hierarchies and chart of accounts.


Description
• Master Data are objects that must be manipulated across different systems and therefore need a consistent meaning and definition to ensure they can be uniquely identified across these systems.
  o Master data is produced within a transactional system (the “master system”) as part of a transaction and is used for reference and validation in transactions within other systems.
  o Master data are defined by a set of attributes (see example below) that support unique identification of the object and/or provide additional information for use across different systems.
• Master Reference Data = Master Data + Reference Data (consumed in the same way); see below for the definition of reference data.
• Master Data, like any other data, are defined with structural metadata.
• Master Data are categorized in dimensions, each referred to with a unique identifier.
  o In marketing, a typical master data dimension is customer.
  o In clinical research, the following are considered master data dimensions: drug product, device, study, site, investigator, staff, and sponsor.
  o Visit and Subject are not master data dimensions because they would not be persisted in an independent repository, but there should be an agreement on how to uniquely identify them within a specific trial.
• Master Data are persisted in a specific repository, either centralized or virtual, integrating data from different systems. Master data repositories are generally implemented per dimension; so there would be a study master data repository, an investigator DB, a product registry…


Dimension (with key identifier in SDTM) | Identifying attributes (recommendation, not normative)
Drug (Investigational and Comparator) | Product ID (IMP_ID ISO11615 or MPD_ID); Product name; (set of) Active Ingredient; Dose Form; Strength; Administration device
Device | Device name; Unique Device Identifier (in CDISC: UDEVID); Type; Manufacturer; Model; Batch Identifier; Lot Identifier; Serial
Study (STUDYID) | Sponsor; Study Name; Protocol ID; StudyID if different than ProtocolID; Protocol Title; Product (DrugProductID or DeviceID); Registered trial id (CT.gov or EUDRACT); (Protocol Short Title)
Site (SITEID) | Name (centre); SiteID; Phone, fax; Complete Postal Address (country, zip code, town, street.. see ISO 21090 address); Site Type (hospital, clinic, pharmacy, …)
Investigator (INVID) (member of a clinical organisation which is a key staff member and treated separately and stored in an Investigator DB) | InvestigatorID; Name; Phone, fax; Email; Complete Postal Address (ISO 21090 address)
Staff (internal to a sponsor or a CRO or a hospital) | Name; DateOfBirth; Phone, fax; Email; Initials/username; Complete Postal Address (ISO 21090 address)
Sponsor (sponsorID) | SponsorID (like Dun&Bradstreet unique number or OID); Name; Phone, fax; Postal Address

Example
• Site identification information such as: Site ID, Site Name, Site Address …
• Investigator identification attributes.
• The picture below gives an example of investigator master data in different health care systems: how they differ and how they need to be integrated within a centralized repository (line at the bottom) to ensure that the SAME investigator, described DIFFERENTLY in different systems, is referred to and used in the same way across systems.
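The integration of investigator records from several systems can be pictured as a cross-reference between local identifiers and one master identifier. The sketch below is purely illustrative: the system names, the sample records, the "MINV-" identifier format and, above all, the choice of matching on a normalized e-mail address are assumptions. Real master data matching usually combines several identifying attributes and manual stewardship.

```python
# Sample records describing investigators differently in three systems.
records = [
    {"system": "CTMS", "local_id": "INV-001", "name": "Dr. A. Smith",
     "email": "a.smith@example.org"},
    {"system": "EDC", "local_id": "77", "name": "Smith, Anna",
     "email": "A.Smith@example.org "},
    {"system": "Safety", "local_id": "S-9", "name": "B. Jones",
     "email": "b.jones@example.org"},
]


def build_cross_reference(records):
    """Map each (system, local_id) pair to a shared master identifier."""
    master_ids = {}   # matching key -> master identifier
    xref = {}
    for record in records:
        # Assumption: two records with the same e-mail are the same person.
        key = record["email"].strip().lower()
        if key not in master_ids:
            master_ids[key] = f"MINV-{len(master_ids) + 1:04d}"
        xref[(record["system"], record["local_id"])] = master_ids[key]
    return xref


xref = build_cross_reference(records)
```

With these sample records, the CTMS and EDC entries resolve to the same master identifier, so every system can refer to that investigator in the same way.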


3.3.2 (Master) Reference Data
Synonym
Code System (for frequently used concepts such as country code), Reference Data
Recommended definition
It is recommended to use “Master Reference Data” and not “Reference Data”.
In the context of Master Data Management, Master Reference Data is the set of codes, from a widely accepted and used code system, to be used within data fields ACROSS different applications.


Definition & source
[http://en.wikipedia.org/wiki/Master_data]
Reference Data is the set of permissible values to be used by other (master or transaction) data fields. Reference data normally changes slowly, reflecting changes in the modes of operation of the business, rather than changing in the normal course of business.
[http://en.wikipedia.org/wiki/Reference_data]
Reference data are data from outside the organization (often from standards organizations) which is, apart from occasional revisions, static. This non-dynamic data is sometimes also known as "standing data". Examples would be currency codes, countries (in this case covered by a global standard, ISO 3166-1) etc. Reference data should be distinguished from "Master Data", which is also relatively static data but originates from within the organization, e.g. products, departments, even customers.
[http://www.information-management.com/issues/20060401/1051002-1.html#Login]
Reference data is any kind of data that is used solely to categorize other data found in a database, or solely for relating data in a database to information beyond the boundaries of the enterprise.
Specific differences between reference and master data: identification is a major difference between reference and master data.
• In master data, the same entity instance, such as a product or customer, can be known by different names or IDs. For example, a product typically follows a lifecycle from a concept to a laboratory project to a prototype to a production run to a phase. In each of these phases, the name of the product may change, and so may its product identifier. Beyond products, we are all aware that customers can change their names, or have identical names ...
• By contrast, reference data typically has much less of a problem with identification. This is partly because reference data changes more slowly. Existing issues tend to revolve around the use of acronyms as codes.
• Reference data, such as product line, gender, country or customer type, often consists of a code, a description and little else. The code is usually an acronym, which is actually very useful, because acronyms can be used in system outputs, even views of data, and still be recognizable to users.
Description
• Master Reference Data is a Code system (see definition above) that is widely used across many different systems and needs to be used consistently to ensure data integration. For instance, COUNTRY code is used across many applications. This is Master Reference Data.


• Master reference data are managed like any other code system, with strict governance. They therefore do not change as often as Master data, which are generated by a transactional system.
• The term “reference data” is widely used in different contexts and it is therefore suggested to use it in its full form, i.e. “Master Reference Data”. Examples of other uses of the term “reference data”:
  o Clinical reference data: in SDTM this means data that is not subject-level specific, for instance Trial Summary domain data.
  o Reference range for laboratory values.
Example
Country Code

3.3.3 Master Data Management
Synonym
Reference Data Management; MDM
Recommended definition
Set of processes and tools that consistently define and manage the master data and master reference data within an organization.
Definition & source
[Gartner, Magic Quadrant for Master Data Management of Customer Data Solution] http://www.gartner.com/technology/reprints.do?id=1-1CK9UDO&ct=121019&st=sb
MDM is a technology-enabled discipline in which business and IT work together to ensure the uniformity, accuracy, stewardship, semantic consistency and accountability of the enterprise’s official, shared master data assets.
[Source: Master Data Management]
Master Data Management (MDM) is the collective application of governance, business processes, policies, standards and tools to facilitate consistency in data definition.

[Source: SAS White Paper on Supporting Your Information Strategy with a Phased Approach to Master Data Management]
The idea of Master Data focuses on providing unobstructed access to a consistent representation of shared information.
Description
Master Data Management (MDM) comprises a set of processes and tools that consistently define and manage the master data and master reference data of an enterprise, which are fundamental to the company’s business operations. MDM has the objective of providing processes and tools for collecting, aggregating, matching, consolidating, quality-assuring, persisting and distributing such data throughout an organization to ensure consistency and control in the ongoing maintenance and application use of this information.

There are different models for master data management. The two main extremes are:
• Centralized model: all data are managed within a central data store and pushed to the different applications within an organization.
• Decentralized model (registry): the master data are managed within each application and then reconciled through a registry system that federates them.

Example: Specific products from vendors such as INFORMATICA, IBM, Software AG…
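The two models above can be contrasted in a short Python sketch. This is only an illustration under stated assumptions: the class names `CentralHub` and `Registry`, the application names and the site records are all hypothetical and do not describe any vendor product.

```python
from dataclasses import dataclass, field


@dataclass
class CentralHub:
    """Centralized model: one store owns the records and pushes them out."""
    records: dict = field(default_factory=dict)
    subscribers: list = field(default_factory=list)  # one dict per application

    def upsert(self, key, record):
        self.records[key] = record
        for app_store in self.subscribers:
            app_store[key] = dict(record)  # push a copy to each application


@dataclass
class Registry:
    """Registry model: applications keep their own records; the registry
    only stores cross-references from a global id to each local id."""
    links: dict = field(default_factory=dict)

    def link(self, global_id, app_name, local_id):
        self.links.setdefault(global_id, {})[app_name] = local_id

    def resolve(self, global_id, apps):
        # Federate: read each application's own copy on demand.
        return {name: apps[name][local_id]
                for name, local_id in self.links[global_id].items()}


# Centralized: both (hypothetical) applications receive the same record.
ctms, edc = {}, {}
hub = CentralHub(subscribers=[ctms, edc])
hub.upsert("SITE-001", {"name": "Site A", "country": "BE"})

# Registry: each application keeps its own record under its own key.
apps = {"ctms": {"17": {"name": "Site A"}}, "edc": {"S-9": {"name": "SITE A"}}}
registry = Registry()
registry.link("SITE-001", "ctms", "17")
registry.link("SITE-001", "edc", "S-9")
federated = registry.resolve("SITE-001", apps)
```

In the centralized sketch the applications hold identical copies, whereas in the registry sketch the local records may still differ (here in letter case) and are only linked through the global identifier.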


3.4 Interoperability

Categorization of Interoperability (by HL7)

Synonym: Interworking; to be interoperable; interoperate

Recommended definition: Ability of two or more systems or components to exchange information and to use the information that has been exchanged.

Definition & source:
• ISO 11179: Interoperability concerning the creation, meaning, computation, use, transfer, and exchange of data [ISO/IEC 20944-1].
• ISO 11179: Capability to communicate, execute programs, or transfer data among various functional units in a manner that requires the user to have little or no knowledge of the unique characteristics of those units [ISO/IEC 2382-1].
• IEEE: Ability of two or more systems or components to exchange information and to use the information that has been exchanged. (Source: http://www.ieee.org/education_careers/education/standards/standards_glossary.html)
• Interoperability describes the extent to which systems and devices can exchange data, and interpret that shared data. For two systems to be interoperable, they must be able to exchange data and subsequently present that data such that it can be understood by a user. (Source: http://www.himss.org/library/interoperability-standards/what-is)

Description: Interoperability provides the means to share information between disparate information systems in such a way that the information can be used in a meaningful way.

Example: Interoperability between healthcare and clinical research.

3.4.1 Technical interoperability (“machine interoperability”)

Synonym: Machine Interoperability; Syntactic Interoperability; Functional Interoperability

Recommended definition: Technical interoperability is about exchanging information between systems without explicit guarantee of shared meaning.

Definition & source: Technical Interoperability: The focus of Technical interoperability is on the conveyance of data, not on its meaning. Technical interoperability encompasses the transmission and reception of information that can be used by a person but which cannot be further processed into semantic equivalents by software. Note that mathematical operations can be, and frequently are, performed at the level of Technical interoperability. A good example is the use of a “check digit” to determine the integrity of a specific unit of transmitted or keyed-in data. The same mathematical formula is performed at each end of a transaction and the results compared to assure that the data was successfully transmitted. Technical interoperability moves data from system A to system B. (Source: Coming to Terms: Scoping Interoperability for Health Care, HL7 EHR Interoperability WG)

Description: Technical Interoperability is usually associated with hardware/software components, systems and platforms that enable machine-to-machine communication to take place. This kind of interoperability is often centered on (communication) protocols and the infrastructure needed for those protocols to operate. Technical/syntactical interoperability is usually associated with data formats. Certainly, the messages transferred by communication protocols need to have a well-defined syntax and encoding, even if it is only in the form of bit-tables.

Example: TCP/IP, XML, HTTPS, S/MIME, Web services
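The “check digit” mentioned in the definition above can be illustrated with a minimal Python sketch. The source does not name a formula; the Luhn algorithm is used here purely as an assumption, and the `send` and `receive` helpers are hypothetical names for the two ends of the transaction.

```python
def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left, doubling every second digit starting with the rightmost.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def send(payload: str) -> str:
    """Sender side: append the check digit to the payload."""
    return payload + str(luhn_check_digit(payload))


def receive(message: str) -> bool:
    """Receiver side: recompute the digit and compare it with the one received."""
    payload, received_digit = message[:-1], int(message[-1])
    return luhn_check_digit(payload) == received_digit


message = send("7992739871")          # "79927398713"
intact = receive(message)             # True
corrupted = receive("79927398703")    # one digit changed in transit: False
```

Both ends apply the same formula without knowing what the digits mean, which is why this kind of check sits at the technical rather than the semantic level.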

3.4.2 Semantic interoperability

Synonym:

Recommended definition: Semantic Interoperability is about ensuring that exchange between systems is understood and appropriately used.

Definition & source: Semantic Interoperability: To maximize the usefulness of shared information and to apply applications like intelligent decision support systems, a higher level of interoperability is required. This is called semantic interoperability, which has been defined as the ability of information shared by systems to be understood… so that non-numeric data can be processed by the receiving system. Semantic interoperability is a multi-level concept with the degree of semantic interoperability dependent on the level of agreement on data content terminology and the content of archetypes and templates used by the sending and receiving systems. Semantic Interoperability ensures that system A and system B understand the data in the same way. (Source: Coming to Terms: Scoping Interoperability for Health Care, HL7 EHR Interoperability WG)

Description: Semantic Interoperability is associated with the meaning of content and the machine interpretation of it. Thus, interoperability on this level means that there is a common understanding between people and machines of the meaning of the content (information) being exchanged. To achieve semantic interoperability across computer systems, we need proper definition of data through metadata, controlled terminologies and master data.

Example:
1. CDISC SDTM with full compliance with future SDTM implementation guidelines would support the goal of semantic interoperability (SI).
2. CDISC SHARE is used to improve the definitions within CDISC SDTM.
3. Physical implementation of BRIDG (ex: Janus CTR).
4. CDISC ADaM datasets, which are well defined and approved, but for which there is currently no officially defined SI (ex: linking ADaM datasets with output; SI is achieved by creating layers that are sponsor-defined).
5. Note: add a counter example of lack of SI: CDISC ADaM (For Tim to add an example).
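The role of controlled terminology in semantic interoperability can be sketched as follows. This is a minimal illustration, assuming two hypothetical systems that record the same concept with different local codes; the code lists and the `to_shared` helper are invented for this example and are not taken from any CDISC or HL7 artefact.

```python
# Hypothetical local code lists for the same concept in two systems.
SYSTEM_A_SEX = {"1": "M", "2": "F"}           # system A stores numeric codes
SYSTEM_B_SEX = {"male": "M", "female": "F"}   # system B stores words

# Hypothetical agreed terminology shared by both systems.
SHARED_SEX_TERMS = {"M": "Male", "F": "Female"}


def to_shared(value, local_map):
    """Translate a local value into the shared code, or fail loudly."""
    code = local_map.get(str(value).strip().lower())
    if code not in SHARED_SEX_TERMS:
        raise ValueError(f"no shared term for local value {value!r}")
    return code


from_a = to_shared("1", SYSTEM_A_SEX)      # "M"
from_b = to_shared("Male", SYSTEM_B_SEX)   # "M"
```

Both values can be moved between the systems at the technical level, but only the mapping to the agreed terminology lets the receiving system recognize that they mean the same thing.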

3.4.3 Process Interoperability

Synonym: Organizational Interoperability

Recommended definition: The mechanisms by which the integrity of workflow processes can be maintained between systems.

Definition & source:
Process Interoperability: Process interoperability is an emerging concept that has been identified as a requirement for successful system implementation into actual work settings [1]. Process interoperability coordinates work processes, enabling the business processes at the organizations that house system A and system B to work together. Process interoperability is achieved when human beings share a common understanding, so that business systems interoperate and work processes are coordinated [2].

Organizational Interoperability: the ability of organizations to effectively communicate and transfer (meaningful) data (information) even though they may be using a variety of different information systems over widely different infrastructures, possibly across different geographic regions and cultures [3].

Sources:
1. Coming to Terms: Scoping Interoperability for Health Care, HL7 EHR Interoperability WG.
2. Principles of Health Interoperability HL7 and SNOMED (Health Information Technology Standards), author: Tim Benson, April 2012.
3. EU Interoperability Framework (EIF).

Description: Process interoperability deals primarily with methods for the optimal integration of computer systems into actual work settings and includes the following:
• Explicit user role specification.
• Useful, friendly, and efficient human-machine interface.
• Data presentation/flow supports work setting.
• Engineered work design.
• Proven effectiveness in actual use.

Example:
• Getting married would alter taxation data. The process of adjusting the marriage status will trigger a process of adjusting the required taxation items via Technical and Semantic interoperability.
• Healthcare providers must standardize business rules to ensure that health information is recorded in a uniform and timely manner such that the transfer of information between systems is consistent and complete.
• ICH Good Clinical Practice (GCP), an ethical and scientific quality standard for designing, conducting, recording, and reporting trials that involve the participation of human subjects.
• Maintaining/conveying information such as user roles between systems.

3.4.4 Data Exchange standards / Study Data Exchange Standard

Synonym:

Recommended definition:

Definition & source:

Description:

Example:

3.4.4.1


3.5 Data aggregation, integration, pooling

Data integration (= Data pooling + transformation)

• Pooling, i.e. pulling together different kinds of data from different sources to give a holistic representation of what was observed. Different data sources are combined into one central (virtual or physical) location in the format in which they were originally collected. This is Data pooling.
• Transformations, i.e. mappings that restructure the data into a common standardized format but leave the data itself unchanged. This is often needed because the format in which the data is collected differs across systems.
  - This is Data integration.
  - Pooling and integration/transformation may happen together (i.e. data are transformed while they are combined), which leads to confusion.
• Derivations, i.e. use of mathematical or logical algorithms to change or to create new data values or flags. Derivations also include imputations for missing data to facilitate statistical analysis and inference. This is Data aggregation.

3.5.1 Data pooling

Synonym: Data integration (= data pooling + transformation)

Recommended definition: Data pooling is pulling together data from different sources and combining them into one central (virtual or physical) location without transformation.

Definition & source:
http://english.stackexchange.com/questions/44643/meaning-of-data-pooling
Data Pooling: In the more general case, we pool our resources so that collectively we make better use of them. In the computing sense, data pool can be slightly misleading, because it often just means a centralised database. Strictly speaking, it ought to mean an arrangement whereby multiple distributed data servers store "their own" data locally but provide access to that data across the entire network. In practice, it's a buzzword that's often used loosely.

http://en.wikipedia.org/wiki/Data_pool
A data pool is a centralized database, where all necessary information to perform business transactions between trading partners is stored in a standardized way.

Description:

Example:
• Data pooling from different data collection instruments (EDC, LAB, ECG, MI...) before generation of the SDTM data sets.
• Pooling clinical trial data in order to identify rare and uncommon safety signals.
• Pooling (for Integrated Safety reports) consists of adding the numbers of events observed in a given treatment group across the trials and dividing the result by the total number of patients included in this group.


3.5.2 Data integration

Synonym:

Recommended definition: Data integration is the result of transforming data into a common format within a central (virtual or physical) location, maintaining integrity and non-redundancy.

Definition & source:
http://en.wikipedia.org/wiki/Data_integration
Data integration involves combining data residing in different sources and providing users with a unified view of these data.

IBM: http://www-01.ibm.com/software/data/integration/
Data integration is the combination of technical and business processes used to combine data from disparate sources into meaningful and valuable information. A complete data integration solution encompasses discovery, cleansing, monitoring, transforming and delivery of data from a variety of sources.

Data integration involves combining data residing in different sources and providing users with a unified view of these data. (Source: Data Integration: A Theoretical Perspective)

Others:
Condition of an information system in which each data item needs to be recorded, changed, deleted, or otherwise edited just once, even if it is used in several application systems. (Source: Medical Data Management, A Practical Guide)

Data integration means anything from two systems passing data back and forth (loosely coupled) to a shared data environment in which all data elements are unique and non-redundant and are reused by multiple applications (tightly coupled). (Source: Data Strategy)


Description: Data integration is the act of transforming data, i.e. mapping it, toward a common standardized format. It allows users to have a unified view of data that come from different applications. Data integration can be the result of:
• Data pooling and then transformation
• Data transformation and then pooling

Example:
1. The SDTM data sets for a clinical trial are an “integrated data set” resulting from the pooling and transformation of data from different source systems.
2. A Web application integrating data from various sources.
3. Integration of clinical research data and metadata.
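A minimal Python sketch of the "transformation and then pooling" route is given below. The source record layouts, field names and the target format are all hypothetical; they stand in for whatever the collecting systems and the agreed standard actually define.

```python
from datetime import datetime

# Hypothetical records as collected by two different source systems.
edc_rows = [{"SUBJ": "001", "SEX": "Male", "VISIT_DT": "21/05/2014"}]
lab_rows = [{"subject_id": "1", "gender": "M", "collected": "2014-05-22"}]


def from_edc(row):
    """Map an EDC record to the common format (values are kept, only restructured)."""
    return {
        "subject": row["SUBJ"].zfill(3),
        "sex": row["SEX"][0].upper(),
        "date": datetime.strptime(row["VISIT_DT"], "%d/%m/%Y").date().isoformat(),
        "source": "EDC",
    }


def from_lab(row):
    """Map a LAB record to the same common format."""
    return {
        "subject": row["subject_id"].zfill(3),
        "sex": row["gender"].upper(),
        "date": row["collected"],
        "source": "LAB",
    }


def integrate(edc, lab):
    """Transform each source, then pool the results into one list."""
    return [from_edc(r) for r in edc] + [from_lab(r) for r in lab]


integrated = integrate(edc_rows, lab_rows)
```

After integration both records share the same field names, subject identifier format and ISO 8601 date format, so they can be queried as one data set.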

3.5.3 Data aggregation

Synonym:

Recommended definition: Data aggregation is any process in which information is gathered and expressed in a summary form, for purposes such as statistical analysis.

Definition & source:
http://en.wikipedia.org/wiki/Aggregate_data
In statistics, aggregate data describes data combined from several measurements. When data are aggregated, groups of observations are replaced with summary statistics based on those observations.

http://searchsqlserver.techtarget.com/definition/data-aggregation
Data aggregation is any process in which information is gathered and expressed in a summary form, for purposes such as statistical analysis. A common aggregation purpose is to get more information about particular groups based on specific variables such as age, profession, or income. The information about such groups can then be used for Web site [...] sells music CDs might advertise certain CDs based on the age of the user and the data aggregate for their age group. Online analytic processing (OLAP) is a simple type of data aggregation in which the marketer uses an online reporting mechanism to process the information.

Description: The difference between data integration and data aggregation is:

Data integration: Transforming data into a common format within a central (virtual or physical) location, maintaining integrity and non-redundancy.

Data aggregation: A summary of the integrated data (ex: by age group, race …).

Example:
• Data sets prepared for ISS and ISE, based on agreed standards, as long as the data sets are comparable.
• NIH Biomedical Translational Research Information System (BTRIS): A clinical research data repository for aggregation and re-use of data collected at the NIH.


4 Appendices

4.1 CDISC glossary and abbreviations


4.2 Related Documents

Related Documents (Reference No., Document Name, Filename)

[FDA1] Guidance for Industry. Providing Regulatory Submissions in Electronic Format — Standardized Study Data - DRAFT GUIDANCE. February 2012. http://www.fda.gov/downloads/Drugs/Guidances/UCM292334.pdf
[CDISC1] CDISC Glossary – 2009. http://www.cdisc.org/stuff/contentmgr/files/0/08a36984bc61034baed3b019f3a87139/misc/act1211_011_043_gr_glossary.pdf
[ISO1] ISO/IEC 11179 Metadata Registry (MDR) standard. Accessible on ISO site.
[ISO2] ISO 21090 Healthcare Data Type Standard. Accessible on ISO site (draft version available on Internet).

4.3 Working group members

Isabelle de Zegher (co-chair)
Mitra Rocca (co-chair)
Marcelina Hungria (co-chair)
Yun Oldshue
Kenneth Stoltzfus
Julie James
Tim Church


Gregory Steffens
Praveen Garg
John Leveille
Aimee Basile
Sam Hume
