international journal of medical informatics 82 (2013) 408–417

journal homepage: www.ijmijournal.com

Quality metrics for detailed clinical models

SunJu Ahn a,b, Stanley M. Huff c,d, Yoon Kim a,e,∗, Dipak Kalra f,g

a Department of Health Policy and Management, College of Medicine, Seoul National University, 28 Yeongeon-dong, Jongno-gu, Seoul, Republic of Korea
b Graduate School, Interdisciplinary Program in Biomedical Engineering, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul, Republic of Korea
c Intermountain Healthcare, Salt Lake City, UT, USA
d Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
e Institute of Health Policy and Management, Seoul National University Medical Research Center (SNUMRC), 28 Yeongeon-dong, Jongno-gu, Seoul, Republic of Korea
f Department of Health Informatics, University College London, London, United Kingdom
g Centre for Health Informatics and Multiprofessional Education, University College London, United Kingdom

a r t i c l e   i n f o

Article history:
Received 30 June 2012
Received in revised form 18 September 2012
Accepted 22 September 2012

Keywords:
Electronic health record
Detailed clinical models
Quality metrics

a b s t r a c t

Objective: To develop quality metrics for detailed clinical models (DCMs) and test their validity.

Methods: Based on existing quality criteria which did not include formal metrics, we developed quality metrics by applying the ISO/IEC 9126 software quality evaluation model. The face and content validity of the initial quality metrics were assessed by 9 international experts. Content validity was defined as agreement by over 70% of the panelists. For eliciting opinions and achieving consensus of the panelists, a two-round Delphi survey was conducted. Valid quality metrics were considered reliable if agreement between two evaluators' assessments of two example DCMs was over 0.60 in terms of the kappa coefficient. After reliability and validity were tested, the final DCM quality metrics were selected.

Results: According to the results of the reliability test, the degree of agreement was high (a kappa coefficient of 0.73). Based on the results of the reliability test, 8 quality evaluation domains and 29 quality metrics were finalized as DCM quality metrics.

Conclusion: Quality metrics were validated by a panel of international DCM experts. Therefore, we expect that the metrics, which constitute essential qualitative and quantitative quality requirements for DCMs, can be used to support rational decision-making by DCM developers and clinical users.

© 2012 Elsevier Ireland Ltd. All rights reserved.

∗ Corresponding author at: Department of Health Policy and Management, College of Medicine, Seoul National University, 28 Yeongeon-dong, Jongno-gu, Seoul, Republic of Korea. Tel.: +82 10 5024 6146; fax: +82 2 743 2009.
E-mail address: [email protected] (Y. Kim).
1386-5056/$ – see front matter © 2012 Elsevier Ireland Ltd. All rights reserved. http://dx.doi.org/10.1016/j.ijmedinf.2012.09.006

1. Introduction

To exchange and interpret clinical information consistently between electronic health record (EHR) systems, the data structures used to represent distinct items of information need to be standardized [1–5]. These data structure definitions, known as detailed clinical models (DCMs), specify how a particular kind of entry within an EHR has to be organized: for example, which clinical terms apply, what value range is valid, which measurement units may be used, and which data items within the structure require a value [6,7]. The clinical


element model by Intermountain Healthcare [8,9], templates defined by HL7 International [10,11], archetypes defined by the openEHR Foundation [12,13] and ISO 13606, the Clinical Information Model in the Netherlands [14], and the Clinical Content Model by the Center for Interoperable EHR in South Korea [15] are examples of published DCM formalisms that can be used to organize the clinical content of an EHR interoperability message and an EHR repository, share decision logic, and build a data capture form of a clinical application. Each instance of a DCM dictates how a corresponding generic EHR representation is to be used to represent particular types of clinical information. Examples of DCM instances might include representations for documenting a pain symptom, heart sounds, liver function tests, a prescribed drug, or a chest X-ray report.

Whereas DCMs have the potential to improve clinical decision support and clinical documentation in EHR systems, the critical challenge is to identify the qualitative and quantitative requirements of DCMs. A few studies have suggested some quality requirements for DCMs (Table 1). For the flexibility and scalability of DCMs, general requirements for the system in which clinical data models are implemented demand the following [16,17]: (1) the addition of elements and attributes to the clinical model without the necessity of changing the underlying software or database schema; (2) use of an existing formalism/syntax for the representation of the model; (3) tight binding of model attributes to standard terminology systems; and (4) the existence of a mechanism for stating 'negation'. General principles of good modeling include: (1) adoption of standard terminologies for use in the models; (2) representing the models in standard modeling languages; (3) sharing and approving the DCMs with a community of clinical experts; and (4) defining decision modules that reference the models. DCM quality criteria [18] have also been proposed where the following qualities were identified as being important requirements of a good DCM: usefulness, desirability, the degree of use/acceptance in clinical services, reusability, the quality of clinical content, the degree of clinician introduction/validation, the use of vocabulary, mapping to information models, applicability, application to other technologies, and maintenance.

Principles for the development of DCMs [19] can be classified as principles pertaining to the structure of the DCM, principles for creating the DCM content, and principles for the DCM development process. The principles that pertain to the structure of DCMs contain information about the language formalism, description of binding of attributes to standard terminologies, a strategy for supporting semantic links among DCM instances, the definition of standard data types, and the description of standard units of measure. The principles for DCM content creation emphasize the granularity, reusability, correctness, and comprehensiveness of the models. Principles for the DCM development process emphasize evidence-based model development, the need for proper use cases, the use of metadata to track changes, and compliance to the syntax of the modeling language.

Archetype representation requirements [20] published in ISO 13606-2 are focused more on the technological aspects of models. They are divided into requirements for archetype definition, archetype node constraints, and data value constraints. EuroRec, an organization that certifies EHRs in the European Union, led an EC-funded research project to develop criteria for the quality classification of EHR systems [21]. The research resulted in a set of archetype quality criteria, which covered administrative, clinical, technological, information management, and repository operation requirements. Furthermore, they emphasized the use of standard terms and modeling language, the construction of repositories for DCM sharing, and the importance of metadata. Among the clinical requirements, the requirement for clinical use suggests the listing of accurate use patterns of clinical concepts, specification of whether the corresponding archetype is used in a specific workflow, description of subject population groups, as well as expert groups using an archetype, etc.

These published requirements and criteria are valuable sources as a starting place to define the good quality characteristics of DCMs (Table 1). However, these existing quality criteria have different levels of detail. Furthermore, these criteria have not been specified to the level of precision that is needed to undertake formal and objective evaluation of the quality of DCMs. Current DCM criteria cannot be used to compare the quality of DCMs due to the absence of validated and reliable measurable characteristics. There is, therefore, still a need to develop quality metrics that can be used to objectively quantify the quality criteria of DCMs.

2. Objective

This study was conducted to develop objective, reliable, and reproducible quality metrics for DCMs drawing on the published quality criteria referenced above, and to test their validity.

3. Materials and methods

The study was conducted in four phases. Phase 1 developed quality metrics; phase 2 tested the validity of the quality metrics; phase 3 tested the reliability of the quality metrics; and phase 4 finalized the quality metrics. Fig. 1 shows the applied methods of the study.

3.1. Phase 1: developing quality metrics

3.1.1. Structure definition
DCM quality domains were developed based on the domains of the Appraisal of Guidelines for Research and Evaluation (AGREE) instrument [22] to systematically define the domains of DCM quality metrics. Among the 6 domains used in AGREE, we included 4 domains and excluded 2 domains. We added 4 new quality domains so that a total of 8 quality domains were defined. See Section 4.1.1 for an explanation of how the 8 domains were chosen.

3.1.2. Classification and specification of the quality criteria into quality domains
Existing quality criteria were categorized under the 8 defined DCM quality domains to develop DCM quality metrics (Fig. 2). From amongst these published requirements and criteria, the EuroRec archetype quality criteria were found to be the most suitable starting point for defining metrics since they define

Table 1 – Comparison of published quality criteria and requirements for DCMs. Sources compared: general requirements published by Coyle et al. [16]; requirements for DCM modeling [17]; DCM quality criteria [18]; EuroRec archetype quality criteria published by Kalra et al. [21]; archetype representation requirements published in ISO 13606-2 [20]; and requirements for clinical content representation [19]. Criteria compared across the sources: scope and purpose, accuracy, comprehensiveness, vocabulary/terminologies (use of and mapping to coded terminology systems such as SNOMED CT, LOINC, UCUM), formalism (representation in a formal syntax such as ADL, XML, UML, OWL, RDF), declaration of reference standards, decision support, and compliance.

quality requirements more specifically and are the most comprehensive. To specify the quality metrics in a measurable way, the wording of each quality criterion was re-expressed as a formal test statement using the principles of ISO/IEC 9126 [23–25]. The ISO/IEC 9126 software quality standard provides a process for the specification and evaluation of the quality of software.

Fig. 1 – Applied methodology of the study.

Fig. 2 – Development process of the initial DCM quality metrics: classification of the quality criteria into 8 quality domains; refinement of each classified quality criterion to a concrete level (1 to many); and decomposition of the concrete-level quality criteria into multiple measurable quality metrics.

3.2. Phase 2: validity

3.2.1. Expert panel organization
We organized an international expert panel to test the validity and reliability of the quality metrics. First, a keyword search was performed from 1990 to 2010 in Medline using the keywords 'DCM, archetype, Clinical Model, Quality Criteria,' and the authors of these papers were selected for the expert panel. Second, a list of experts was obtained from DCM-related working groups in ISO, CEN, and HL7 International. The lists from the two sources were condensed into one list. Among the 13 experts identified, four were unable to participate in the panel for personal reasons; hence, the panel was finalized with nine experts. The panel was divided into two expert groups to encourage high participation, since it was considered too demanding for one panel to test both the validity and reliability of all the quality metrics. Expert Group 1 included those who had indicated that they could contribute more time than those in Group 2. Expert Group 1 consisted of 2 members and tested the face validity of the criteria. Expert Group 2, which included 7 members, tested content validity by applying the Delphi technique.

3.2.2. Face validity
The objective of the first face validity test was to formalize the first version of the quality metrics. The draft metrics were tested by Expert Group 1 (two members). The face validity test includes three questions: "appropriateness of a selected metric as a quality evaluation metric," "appropriateness as a quality improvement metric," and "applicability as a quality evaluation metric for DCMs other than archetypes". This third question was necessary since the original source quality criteria were developed specifically for archetypes, and it was important to confirm that the metrics derived from them are more widely applicable.

3.2.3. Content validity
We conducted two rounds of email-based Delphi surveys. If a structured questionnaire can be prepared based on an extensive literature review, a revised Delphi first round is possible and permissible [26–28]. Such an approach was possible here, using the metrics resulting from the face validity tests described above. The Delphi panel was provided a questionnaire with questions that were the same as those used for the first face validity test. In the first round, each panel member was asked to score each question on a 4-point scale (4: 'Strongly agree,' 3: 'Agree,' 2: 'Disagree,' 1: 'Strongly disagree'). In the second round, feedback was provided to the panel participants from the initial round of scorings (Table 2). In this study, the base level for agreement among experts was defined as 70%. That is, if those who either replied '1' or '2,' or alternatively '3' or '4,' for each metric comprised over 70% of the panel, it was taken to mean that an agreement had been reached [29].

3.3. Phase 3: reliability
The reliability of the quality metrics was tested based on the agreement of assessments between the two evaluators in Expert Group 1. In this study, the threshold kappa coefficient was set to 0.60, a value that falls within the "Good agreement level" according to the guidelines for interpreting kappa coefficients [30]. We selected archetypes as a type of DCM for our reliability testing because archetypes are in wide use and they are based on an international standard. Among archetypes, we selected the Apgar score and blood pressure models for reliability testing because they have all the components and properties necessary for quality assessment [31].
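The statistic used for this reliability threshold, Cohen's kappa, corrects raw agreement for the agreement expected by chance from each evaluator's marginal score frequencies. A minimal sketch in Python, using hypothetical 0/1 ("not satisfied"/"satisfied") metric scores rather than the study's actual data:

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal frequencies.
    expected = sum((freq_a[label] / n) * (freq_b[label] / n)
                   for label in set(freq_a) | set(freq_b))
    return (observed - expected) / (1 - expected)

# Hypothetical binary scores from two evaluators over ten metrics.
evaluator_1 = [1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
evaluator_2 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 1]
kappa = cohen_kappa(evaluator_1, evaluator_2)
print(f"kappa = {kappa:.2f}, meets 0.60 threshold: {kappa > 0.60}")
```

Kappa is lower than raw percent agreement (here 9/10) because agreement expected by chance is subtracted, which is why a kappa threshold of 0.60 is stricter than a 60% raw-agreement threshold.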

Table 2 – Example of the questionnaire designed for the 2nd round of the Delphi survey. For each metric under evaluation (e.g., "(1) The clinician(s) who designed the DCM have the qualifications and expertise of a clinician" and "(2) The DCM should be reviewed and approved by clinician(s) who have the qualifications and expertise of a clinician"), panelists were shown, for each of (3) Relevance, (4) Possibility of quality improvement, and (5) Applicability to other clinical models, the panel's 1st-round answer (mean, e.g., 2.6–2.9), their own 1st-round answer, and a field for their 2nd-round answer on the 4-point scale.

3.4. Phase 4: finalizing DCM quality metrics
The purpose of phase 4 was to refine any quality metrics for which the results of the reliability test did not agree, in order to finalize the quality metrics. For this purpose, we first analyzed the causes of disagreement in cases where the results of reliability testing did not agree. The metrics were then refined according to the type of disagreement. The refined metrics were then tested again for their validity by Expert Group 1. After the validity test was completed, the DCM quality metrics were finalized.

4. Results

4.1. Quality metrics

4.1.1. Structure definition
Among the 6 domains used in AGREE, we included the following 4: scope and purpose, stakeholder involvement, rigor of development, and clarity and presentation. We excluded the following 2 domains: applicability and editorial independence, since these domains are specific to clinical practice guidelines and are not relevant to DCMs. An attempt was made to map all of the archetype requirements in the published documents listed earlier to the four chosen AGREE domains, but some of the archetype requirements did not fit, so four new domains were added: compliance to standards, general methodology, metadata, and management and maintenance. In total, 8 quality domains were defined.

4.1.2. Classification and specification of the quality criteria into quality domains
Existing quality criteria were classified into the 8 DCM quality domains (Fig. 3). Each allocated quality criterion was then refined to a concrete level. The concrete-level quality criteria were then decomposed into multiple measurable quality metrics. Each metric was defined together with its attributes, such as objects of evaluation, evaluation method, and scoring method. Many of the existing quality criterion statements were found to cover more than one testable feature.

Fig. 3 – Classification and specification example. The starting criterion A, "An archetype shall specify the precise nature of the clinical entity (or set of entities) for which it defines a use pattern," is classified into the "Stakeholder involvement" quality domain. Refinement of the classified quality criterion to a concrete level (1 to many) yields A.1, "Participation of clinician (as a potential user) at design, review, and clinical validation stage to provide clear requirements," and A.2, "Open access to the results of DCM evaluation, done by clinicians not involved in the development of the DCM." Decomposition of the concrete-level quality criteria into multiple measurable quality metrics then gives, for A.2, the metric "Open access to the review and/or approval results of DCM," with: Definition: public accessibility of the results of the review and/or approval of the DCM. Evaluation subject: DCM metadata (approver, approving organization), administrative documents, homepage. Evaluation method: whether the results are accessible or not. Scores: '0' if no such information (date of approval, contents of approval (approving organization), or approval expiry date) is accessible; '1' if the date of approval, contents of approval (approving organization), and approval expiry date are accessible.

4.2. Validity

4.2.1. Face validity
Based on the results of the first face validity test from Expert Group 1, it was proposed that 2 metrics be deleted and 2 new ones added (Table 3). The newly proposed quality metrics were "the majority of clinicians participating in the development stage" and "the majority of clinicians participating in the review stage". It was proposed to remove the metric "semantic link" because this is an unfamiliar concept even to experts and caused confusion, and to also remove the metric "patient's medical condition" because this will usually be redundant if the DCM is used for the relevant populations. For some quality metrics, it was agreed to rewrite the statement using plainer terms.

Table 3 – Summary of the metric change by validity and reliability.

                          Initial   Face      Content validity        Reliability   Face      Final
                          metrics   validity  1st round   2nd round                 validity  metrics
Total                     38        38        38          38          30            29        29
No. of accepted metrics   –         36        29          29          –             –         –
No. of rejected metrics   –         2         9           9           –             –         –
No. of proposed metrics   –         2         –           –           –             –         –

4.2.2. Content validity
From the results of the content validity test conducted through the Delphi survey, the following quality metrics were considered inappropriate and were rejected: the "majority of clinicians participating in development or review" and the "qualification of clinicians participating in certification". The reason for this rejection was that "clinicians' participation in DCM development, review, and certification processes" was already included as a metric for evaluating "stakeholder involvement". These changes therefore resulted in improved wording and the removal of duplication but did not materially modify the coverage of the set of metrics.

4.3. Inter-evaluator reliability
Two archetypes, Apgar score and blood pressure, were evaluated by two evaluators. The resulting kappa coefficient of the evaluation of the two archetypes was 0.73, which fell within the result range "Good agreement level" (Table 4).

Table 4 – Reliability test results.

                 Agreement   Non-agreement   Kappa
Apgar score      27          3               0.73
Blood pressure   27          3               0.73
Total            54          6               0.73

When the reliability of the quality metrics was tested using the two archetypes, the number of metrics used for this test was 30, because the metric "Clinician-participants and clinician's role involved in the development of DCM" was divided into two separate metrics (Table 3). The results agreed for 27 metrics but differed for the following 3: "Appropriate support for negation," "Appropriate provision of narrative text," and "Professional discipline for use". One of the quality metrics for which the assessments did not agree, "Appropriate support for negation," might not be applicable to the archetypes, or its objects of evaluation may need to be expanded to the level of the reference model, which is a potential reason for the disagreement. In the current quality metrics, the reference model domain is not included in the objects of evaluation. Accordingly, it may not be appropriate to evaluate these at the archetype level but rather with the reference model. Another possible reason for disagreement is differences in personal opinion: this may be why the evaluation results for the "purpose of DCM" did not agree between the two evaluators. The non-agreement metrics were modified and validated by the two-member Expert Group 1 in terms of evaluation method. After the reliability test, the "Clinician-participants role" was merged with "Clinician-participants involved in the development of DCM".

4.4. Quality metrics for DCMs
The DCM quality metrics finalized through the validity and reliability tests covered 8 quality domains and included 29 quality metrics (Table 5). The 8 DCM quality domains were "purpose and scope," "stakeholder involvement," "rigor of development," "clarity and presentation," "compliance to standard," "general methodology," "metadata," and "management and maintenance".

The meanings and functions of each domain, as well as the meanings and functions of the quality metrics within each domain, were as follows. The "purpose and scope" domain evaluated whether the purpose and scope of a DCM were presented clearly and in a way that was understandable to users. The domain included quality metrics such as "whether the purposes, subjects, and patient groups of a DCM are described"; "whether a DCM user's school system, major, and its applicable clinical environment are described"; and "whether there are contents on applicable patient groups, and whether applicable ages and genders are specified".


Table 5 – The structure of the DCM quality metrics (quality domains and their quality metrics).

1. Scope and purpose
   1.1. Purpose of DCM
   1.2. Appropriate description of application target of DCM
2. Stakeholder involvement
   2.1. Clinician-participants involved in the development of DCM
   2.2. Clinician-participants involved in the review/approval of DCM
3. Rigor of development
   3.1. Mention of reference(s) used in DCM development
   3.2. Correctness (embodiment of usage patterns)
4. Clarity and presentation
   4.1. Formal syntax (codes of data elements written in ADL, XML, UML, OWL, etc.)
   4.2. Codification of DCM text data elements
   4.3. Appropriate use of cardinality
   4.4. Precise mention of DCM domain
5. Compliance to standard
   5.1. Use of international standard terminology
   5.2. Use of international standard data types
   5.3. Use of international standard units of measures
   5.4. Examination of use of data types
   5.5. Handling exception(s) to the use of international standard terminologies
   5.6. Examination of the quality of mapping to international standard terminologies
6. General methodology
   6.1. Use of modifiers
   6.2. Information provenance
   6.3. Appropriate support for negation
   6.4. Appropriate support for past history
   6.5. Appropriate provision of narrative text
7. Metadata
   7.1. Developer of DCM
   7.2. Appropriate description of discipline of DCM user
   7.3. DCM version
   7.4. Accessibility of metadata
   7.5. Extent of metadata
   7.6. Open access to the review and/or approval results of DCM
8. Management and maintenance
   8.1. Maintenance organization of DCM
   8.2. Existence of user feedback mechanism for DCM
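Since each metric is scored '0' (not satisfied) or '1' (satisfied), an evaluation against Table 5 reduces to a per-domain tally. The following is an illustration only, encoding three of the domains above with hypothetical scores for a single archetype:

```python
# Three of the Table 5 domains, each mapping to its quality metrics.
QUALITY_DOMAINS = {
    "Scope and purpose": [
        "Purpose of DCM",
        "Appropriate description of application target of DCM",
    ],
    "Stakeholder involvement": [
        "Clinician-participants involved in the development of DCM",
        "Clinician-participants involved in the review/approval of DCM",
    ],
    "Compliance to standard": [
        "Use of international standard terminology",
        "Use of international standard data types",
        "Use of international standard units of measures",
    ],
}

def score_dcm(assessment):
    """Tally the 0/1 metric scores per domain; unscored metrics count as 0."""
    return {domain: sum(assessment.get(metric, 0) for metric in metrics)
            for domain, metrics in QUALITY_DOMAINS.items()}

# Hypothetical assessment of a single archetype.
assessment = {
    "Purpose of DCM": 1,
    "Appropriate description of application target of DCM": 0,
    "Clinician-participants involved in the development of DCM": 1,
    "Clinician-participants involved in the review/approval of DCM": 1,
    "Use of international standard terminology": 1,
    "Use of international standard data types": 1,
    "Use of international standard units of measures": 0,
}
print(score_dcm(assessment))
```

Such a tally makes the per-domain strengths and weaknesses of a model directly comparable across DCMs, which is the objective stated in Section 2.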

The "stakeholder involvement" domain evaluates whether clinicians participated in the development process and sufficiently reflected the needs of field users. This domain comprised metrics of whether clinicians had participated in DCM design (development), review, and approval processes.

The "rigor of development" domain evaluated whether all the data elements (or contents) of a DCM had been developed using evidence-based clinical knowledge. The data elements of DCMs are usually derived from clinical service guidelines, medical textbooks, care path guidelines, etc.; in addition, they may reflect current clinical practice patterns. Thus, this domain consists of metrics for evaluating whether users can see what reference data are utilized. The objects of the domain evaluated whether a DCM described clinical data elements (name, value, object, and attribute) accurately and specifically for field users; its metrics were also related to these items.

The "clarity and presentation" domain evaluated whether the language and format of a DCM were described clearly. Its quality metrics included "whether a typified DCM modeling language is used," "data coding for computer processing," "consistent representation of data elements," and "appropriate use of cardinality".

The "compliance to standards" domain is a key requirement for guaranteeing DCM-based exchange and reusability of medical information. Therefore, it evaluated whether the DCM complied with current standards. This domain consisted of metrics evaluating the use of standard terminology, standard data types, and standard measurement units.

The "general methodology" domain classified the major attributes of DCMs and evaluated whether they were satisfied. Its metrics evaluated whether the DCM supports attributes such as modifiers, the source of information, and appropriate support for negation. While other domains evaluated individual DCMs, the general methodology domain evaluated the entire modeling methodology.

The "metadata" domain evaluated the faithfulness of the metadata. Metadata is not directly related to the quality of a DCM but provides critical information on its appropriate use. All of the panel members confirmed that metadata are essential. The quality metrics of this domain included the DCM developer, version, and the accessibility and scope of the metadata.

The "management and maintenance" domain evaluated activities for the process management and quality improvement of a DCM. Its quality metrics included "responsible maintenance institution" and "feedback mechanism for collecting users' opinions on improvement".

Each metric included the following attributes: definition, objects of evaluation, evaluation method, and scoring method. The evaluation method was direct evaluation. The objects of evaluation were usually documents describing a DCM, DCM

contents (or instances), DCM metadata, DCM modeling language, as well as development policies and guidelines. In terms of scoring, '0' means 'Not satisfied' and '1' means 'Satisfied'. A detailed description of the complete set of quality measures is available as supplementary material.

5. Discussion

This study has developed DCM quality metrics with high validity and reliability for the first time, with the collaboration of an international expert panel. This study started with criteria found in the published literature, but the criteria were consolidated and converted into a coherent set of metrics that could be objectively evaluated, which distinguishes this study from previous work. This set of criteria can now serve as a basis for practical use and implementation and for further research and testing. Several interesting questions follow from this work. First of all, will the use of the metrics have the intended result? That is, are models that meet these metrics more consistent, easier to use, and a better basis for interoperability than models that do not meet these criteria? A logical next step would be to apply the metrics to a large corpus of DCMs, some of which meet all criteria and some that do not, and then

In addition, the two archetypes are typical ones in terms of their structure and attributes. Although the quality metrics were developed from a broad set of published criteria, most of the metrics ultimately came from the archetype literature. The metrics might be best suited to evaluating archetypes. Considering those limitations, the external validity of the quality metrics can be improved when their validity is tested for a larger set of archetypes and other DCMs. These quality metrics could also be improved if the metrics that showed low agreement rates in the reliability test were retested to check their completeness. It would be good to do a follow-on study that would evaluate whether the metrics would work as well and be as applicable when reviewing HL7 templates, CDA templates, clinical element models, etc.

The Clinical Information Modeling Initiative (CIMI) is an international collaboration that is dedicated to providing a common format for detailed clinical models so that semantically interoperable information may be created and shared in health records, messages, and documents [32]. CIMI is committed to making these specifications openly available in a number of formats, beginning with the Archetype Definition Language (ADL) from the openEHR Foundation (ISO 13606-2) and the Unified Modeling Language (UML) from the Object Management Group (OMG), with the intent that the users

monitor the ease of use, frequency of use, rate of error correc- of these specifications can convert the models into their

tion, etc., for all of the models and see whether the models local formats. With increasing research and implementation

that meet the metrics are actually “better” in clinical use. of DCMs in international standards organizations and local

Secondly, it would be good to know which criteria con- health information systems as well, it is important to ask

tribute the most to “goodness.” One would suspect that the how the models’ quality can be objectively measured. If DCMs

metrics are not of equal value. For example, one could specu- are to adequately support the EHR documentation needs of

late that tracking of meta data and model versions is essential clinical practice, be endorsed by clinical professional bodies

to model use while mention of references might be nice but and health services, and be adopted by vendors, these mod-

less important. It would be interesting to design an experi- els have to be of good quality, trusted, and in the future,

ment that would allow some evaluation of the value of each certified [33].

metric in producing the “best” and most usable models.

A third question would be how much it costs in time and

resources to fulfill the criteria for each metric, and whether

6. Conclusion

the cost is equal or greater than the value obtained by meet-

ing the criterion. For example, one could speculate that it

A set of quality metrics for DCMs were developed and

may be costly for modelers to involve practicing clinicians

validated using existing published quality criteria and an

at every step of model development, or that it may be time

international panel of experts. Given that the metrics were val-

consuming to collect and record literature references for the

idated by a panel of DCM experts, we expect them to be used to

models. The cost of meeting these criteria could slow or stop

support rational decision-making by DCM developers who will

the development of models and could outweigh the benefits.

now have the essential qualitative and quantitative quality

More research is needed into the cost and benefits of each

requirements to use as guidelines as they create new content.

metric to ensure that each metric is not only “good”, but that

Clinical users can then use the quantitative assessments as

it provides value that is commensurate with its development

they select models for specific use cases and implement them

cost.

in their clinical systems.

Finally, one could question whether all quality criteria and

quality domains have been discovered. Are the 29 metrics and

8 domains sufficient? Greater experience with a large corpus of

models could well lead to the discovery of additional metrics Author’s contributions

and quality domains.

Sun-Ju Ahn contributed in the design and conduct of this

There are a couple of limitations to the study. Although the

research and the writing of this manuscript. Stan Huff con-

experts agreed that these quality metrics are applicable to the

tributed as a panel member and provided valuable comments

quality evaluation of any DCM, the objects of evaluation in

to the study. Yoon Kim is an adviser to this paper. He provided

this study were limited to two archetypes. However, we think

valuable comments to the study. Dipak Kalra contributed

that these quality metrics can be used for evaluating quality of

as a panel member and provided valuable comments to

other DCMs because the structure and content of archetype is

the study.

more complex and more comprehensive than other DCMs. In

416 i n t e r n a t i o n a l j o u r n a l o f m e d i c a l i n f o r m a t i c s 8 2 ( 2 0 1 3 ) 408–417
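The binary 0/1 scoring scheme and the kappa-based reliability check described above can be sketched in a few lines of Python. This is only an illustrative sketch: the metric names and the two evaluators' scores below are hypothetical placeholders, not the study's actual 29 metrics or its rating data.

```python
# Sketch: binary metric scoring of a DCM plus inter-rater agreement (Cohen's kappa).
# Metric names and ratings are hypothetical examples, not the study's data.

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' binary (0/1) scores on the same items."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # observed agreement
    p1 = sum(rater_a) / n                                    # rater A's rate of '1'
    p2 = sum(rater_b) / n                                    # rater B's rate of '1'
    p_e = p1 * p2 + (1 - p1) * (1 - p2)                      # chance agreement
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)

# Each metric is scored 0 ('Not satisfied') or 1 ('Satisfied') for a given DCM.
metrics = ["typified modeling language", "data coding for computer processing",
           "version recorded", "responsible maintenance institution",
           "feedback mechanism"]
evaluator_1 = [1, 1, 0, 1, 0]
evaluator_2 = [1, 1, 0, 1, 1]

total_1 = sum(evaluator_1)                      # evaluator 1's total score
kappa = cohen_kappa(evaluator_1, evaluator_2)
reliable = kappa > 0.60                         # the study's reliability threshold
print(f"score {total_1}/{len(metrics)}, kappa = {kappa:.2f}, reliable = {reliable}")
```

With the placeholder ratings above, the two evaluators disagree on one of five metrics, so the kappa falls below the 0.60 threshold the study used; in practice the coefficient would be computed over the full set of valid metrics.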

Summary points

What was already known on the topic before the study?

• Detailed clinical models support health information exchange and reusability.
• In order for DCMs to adequately support the EHR documentation needs of clinical practice, to be endorsed by clinical professional bodies and health services, and to be adopted by vendors, these models have to be of good quality, trusted and, in the future, certified.
• To implement DCMs in clinical practice, one of the most critical conditions is that users should evaluate the quality of the DCM systematically.

What has this study added to our knowledge?

• A set of quality metrics for DCMs has been developed and validated using existing published quality criteria and an international panel of experts.
• Using these DCM quality metrics, any DCMs may be compared and evaluated in a systematic and objective manner.
• The quality metrics for DCMs also provide qualitative and quantitative quality requirements for DCM developers, users and evaluators.

Competing interests

None of the authors have any competing interests to declare.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.ijmedinf.2012.09.006.

References

[1] R. Haux, Health information systems – past, present, future, Int. J. Med. Inform. 75 (2006) 268–281.
[2] A.L. Rector, W.A. Nowlan, S. Kay, Foundations for an electronic medical record, Methods Inf. Med. 30 (August (3)) (1991) 179–186.
[3] L.M. Braun, F. Wiesman, H.J. Herik, A. Hausman, E. Korsten, Towards patient-related information needs, Int. J. Med. Inform. 76 (2007) 246–251.
[4] C. Friedman, S.M. Huff, W.R. Hersh, E. Pattison-Gordon, J.J. Cimino, The Canon Group's effort: working toward a merged model, JAMIA 2 (1) (1995) 4–18.
[5] D. Kalra, Electronic health record standards, in: R. Haux, C. Kulikowski (Eds.), IMIA Yearbook of Medical Informatics 2006, Methods Inf. Med., 2006, pp. S136–S144.
[6] S.M. Huff, R.A. Rocha, J.F. Coyle, S.P. Narus, Integrating detailed clinical models into application development tools, Stud. Health Technol. Inform. 107 (2004) 1058–1062.
[7] W. Goossen, A. Goossen-Baremans, M. van der Zel, Detailed clinical models: a review, Healthcare Inf. Res. 16 (4) (2010) 201–214.
[8] S.M. Huff, R.A. Rocha, B.E. Bray, H.R. Warner, P.J. Haug, An event model of medical information representation, JAMIA 2 (2) (1995) 116–134.
[9] C.G. Parker, R.A. Rocha, J.R. Campbell, S.W. Tu, S.M. Huff, Detailed clinical models for sharable, executable guidelines, Stud. Health Technol. Inform. 107 (Pt 1) (2004) 145–148.
[10] G.W. Beeler, HL7 version 3 – an object-oriented methodology for collaborative standards development, Int. J. Med. Inform. 48 (1998) 151–161.
[11] http://www.hl7.org/ (accessed 05.03.12).
[12] S. Garde, R. Chen, H. Leslie, T. Beale, I. McNicoll, S. Heard, Archetype-based knowledge management for semantic interoperability of electronic health records, Stud. Health Technol. Inform. 150 (2009) 1007–1011.
[13] T. Beale, Archetypes and the EHR, Stud. Health Technol. Inform. 96 (2003) 238–244.
[14] J. van der Kooij, W.T. Goossen, A.T. Goossen-Baremans, N. Plaisier, Evaluation of documents that integrate knowledge, terminology and information models, Stud. Health Technol. Inform. 122 (2006) 519–522.
[15] S.J. Ahn, Y. Kim, J.H. Yun, S.H. Ryu, K.H. Cho, Clinical Contents Model to ensure semantic interoperability of clinical information, J. KIISE: Softw. Appl. 37 (2010) 871–881.
[16] J.F. Coyle, A.R. Mori, S.M. Huff, Standards for detailed clinical models as the basis for medical data exchange and decision support, Int. J. Med. Inform. 69 (2003) 157–174.
[17] S.M. Huff, Ontologies, vocabularies, and data models, in: R.A. Greenes (Ed.), Clinical Decision Support: The Road Ahead, Elsevier, Oxford, UK, 2007, pp. 307–324.
[18] W.T. Goossen, Using detailed clinical models to bridge the gap between clinicians and HIT, Stud. Health Technol. Inform. 141 (2008) 3–10.
[19] S.J. Ahn, Development and application of development principles for clinical information model, J. Korea Acad. Ind. Coop. Soc. 11 (2010) 2899–2905.
[20] International Organization for Standardization, ISO/TC 215: Health Informatics – Electronic Health Record Communication, ISO 13606-2, Geneva, 2008.
[21] D. Kalra, A. Tapuria, G. Freriks, F. Mennerat, J. Devlies, et al., Management and maintenance policies for EHR interoperability resources, Inform. Soc. Technol. 6 (2008) 15–26.
[22] The AGREE Collaboration, Appraisal of Guidelines for Research and Evaluation (AGREE Instrument), 2001, pp. 2–20. Available at: http://www.agreecollaboration.org/pdf/agreeinstrumentfinal.pdf (accessed 30.07.10).
[23] D.L. Moody, G.G. Shanks, What makes a good data model? Evaluating the quality of entity relationship models, in: ER'94, vol. 881, Manchester, United Kingdom, 1994, pp. 94–111.
[24] D.L. Moody, G.G. Shanks, Improving the quality of data models: empirical validation of a quality management framework, J. Inform. Syst. 28 (2003) 80–91.
[25] O. Yildiz, O. Demirörs, Measuring health care process quality with software quality measures, Stud. Health Technol. Inform. 180 (2012) 1005–1009.
[26] M. Jaana, H. Tamim, G. Paré, M. Teitelbaum, Key IT management issues in hospitals: results of a Delphi study in Canada, Int. J. Med. Inform. 80 (12) (2011) 828–840.
[27] N.C. Dalkey, The Delphi Method: An Experimental Study of Group Opinion, Report No. RM-5957-PR, The Rand Corporation, Santa Monica, CA, 1969, pp. 15–73.
[28] H.A. Linstone, M. Turoff (Eds.), The Delphi Method: Techniques and Applications, Addison-Wesley Publishing Company, Reading, MA, 1975.
[29] C.S. Goodman, R. Ahn, Methodological approaches of health technology assessment, Int. J. Med. Inform. 56 (1999) 97–105.
[30] J.L. Fleiss, Statistical Methods for Rates and Proportions, John Wiley, New York, 1980, pp. 150–185.
[31] http://www.openehr.org/ (accessed June 2011).
[32] http://informatics.mayo.edu/CIMI/index.php/Main Page (accessed April 2012).
[33] D. Kalra, A. Tapuria, T. Austin, G. Moor, Quality requirements for EHR archetypes, in: European Federation for Medical Informatics 2012, IOS Press, Amsterdam, Netherlands, 2012, pp. 48–52.