An Architectural Style for Outcomes Research
Designing a Cardio-thoracic Patient Registry in the Heart and Vascular Institute (HVI)
• Patient record abstraction
• Workflow management
• Quality reporting
• Cohort identification

Chimezie Ogbuji
1. “Exploring Validation in an End-to-end XML Architecture” (2007)
2. “GRDDL: The Why, What, How, and Where” (2008)
3. “Semantic Web Technologies as a Framework for Clinical Informatics” (2009)
4. “A Role for Semantic Web Technologies in Patient Record Data Collection” (2010)
5. “Harnessing Cyc to Answer Clinical Researchers’ Ad Hoc Queries” (2010)

1 SemanticDB

• http://www.w3.org/2001/sw/sweo/public/UseCases/ClevelandClinic/
• Challenges:
  ◦ Fragmented gathering and storing of data
  ◦ Compartmentalization of medical science and practice
  ◦ Clinical knowledge is typically expressed in ambiguous, idiosyncratic terminology
  ◦ Problematic for longitudinal patient data that can feasibly span multiple, geographically separated sources and disciplines

2 The Director of Research’s Goal

• Create a framework for context-free data management systems.
• Expert-provided, domain-specific knowledge is used to control all aspects of data entry, storage, display, retrieval, communication, and formatting for external systems.
• Context-free: the framework can be used for any domain, and nothing (or little) about the domain is assumed or hardcoded.

3 Architectural Style

• A coordinated set of architectural constraints that restricts the roles/features of architectural elements and the allowed relationships among those elements within any architecture that conforms to that style [Fielding ‘00]
• The Semantic Web and Linked Open Data are architectural styles that restrict the roles and features to those of the W3C’s Semantic Web Activity standards


4 GRDDL: The Acronym

• Gleaning
• Resource
• Descriptions (from)
• Dialects (of)
• Languages

• Rather long and clumsy

5 GRDDL: By Deconstruction

• WordNet definition of glean:
  ◦ (gather, as of natural products)
  ◦ Synonyms: reap, harvest
• Resource Description Framework (RDF)
  ◦ Logical assertions as a labeled, directed graph of web resources
• Dialects of Language
  ◦ XML document families (XHTML, for instance)

6 GRDDL: By Analogy

GRDDL can be thought of as a protocol for sowing semantics in web content for later harvest.

7 Why

• Vast amount of latent semantics in markup
• Web content today is primarily built for human consumption
• Text indexing will only get you so far for large-scale document retrieval from discrete data

8 Faithful Rendition

“By specifying a GRDDL transformation, the author of a document states that the transformation will provide a faithful rendition in RDF of information (or some portion of the information) expressed through the XML dialect used in the source document.”
• Licenses an author-certified interpretation of an XML document
• A powerful paradigm for messaging
• See David Booth’s “RDF and SOA”: http://www.w3.org/2007/01/wos-papers/booth

9 GRDDL Transformations

• Functions that take an XML document and return an RDF graph
• Transformations can be written in any language
• The “reference” transformation language is XSLT
• “[XSLT1] is the format most widely supported by GRDDL-aware agents as of this writing […] is specifically designed to express XML to XML transformations and has some good safety characteristics”
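The idea of a transformation as a function from an XML document to an RDF graph can be sketched in a few lines. This is only an illustration in Python rather than XSLT, and the `patient` dialect, the `http://example.org/...` URIs, and the `transform` function are hypothetical names invented for the example. The graph is modeled as a plain set of (subject, predicate, object) tuples.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML dialect and URIs, invented purely for illustration.
NS = "http://example.org/registry#"
BASE = "http://example.org/records/"

SOURCE = """<patient xmlns="http://example.org/registry#" id="p1">
  <name>Jane Doe</name>
  <procedure>CABG</procedure>
</patient>"""


def transform(xml_text):
    """Take an XML document, return an RDF graph as a set of triples."""
    root = ET.fromstring(xml_text)
    subject = BASE + root.get("id")
    triples = set()
    for child in root:
        # ElementTree spells qualified names as "{namespace}local".
        local = child.tag.split("}", 1)[1]
        triples.add((subject, NS + local, child.text.strip()))
    return triples


graph = transform(SOURCE)
for triple in sorted(graph):
    print(triple)
```

A real GRDDL-aware agent would typically run an XSLT stylesheet that emits RDF/XML; the Python function only stands in for that step.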

10 Namespace Documents

“Transformations can be associated not only with individual documents but also with whole dialects that share an XML namespace”
• A GRDDL source document lives at the location of the namespace URI of the root element (the namespace document)
• The GRDDL result of the namespace document has a statement of the form: ?nsDoc grddl:namespaceTransformation ?txDoc
  ◦ ?txDoc is the location of a transformation applicable to such XML documents
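A rough sketch of the lookup described above: read the namespace of the root element, then search the GRDDL result of the namespace document for `grddl:namespaceTransformation` statements. The `http://example.org/tasks` namespace, the stylesheet location, and both helper functions are hypothetical. The sketch also assumes that the namespace document's URI equals the namespace URI and that its GRDDL result has already been obtained as a set of triples.

```python
import xml.etree.ElementTree as ET

# The GRDDL vocabulary namespace, with the property named on the slide.
GRDDL_NS_TX = "http://www.w3.org/2003/g/data-view#namespaceTransformation"


def root_namespace(xml_text):
    """Return the namespace URI of the document's root element, if any."""
    tag = ET.fromstring(xml_text).tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else None


def namespace_transformations(ns_doc_graph, ns_uri):
    """Find every ?txDoc in: ns_uri grddl:namespaceTransformation ?txDoc."""
    return sorted(o for (s, p, o) in ns_doc_graph
                  if s == ns_uri and p == GRDDL_NS_TX)


# Hypothetical source document and hypothetical GRDDL result of its
# namespace document (assumed to be already fetched and transformed).
doc = '<task xmlns="http://example.org/tasks"/>'
ns_result = {
    ("http://example.org/tasks", GRDDL_NS_TX, "http://example.org/tasks2rdf.xsl"),
}

ns = root_namespace(doc)
transforms = namespace_transformations(ns_result, ns)
print(ns, transforms)
```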

11 Hidden Value Proposition

• Supports separation of concerns:
  ◦ XML for messaging, data collection, structural validation
  ◦ RDF for expressive assertions, inference, etc.
• A way to invest in data richness and accessibility

12 Constraint: Dual Representation

• Emerging archetype:
  ◦ XML is the document and messaging syntax
  ◦ A semantics-preserving RDF rendering (mirrored persistently in an RDF dataset) is used as the knowledge representation (KR) for inference and querying
• Facilitates symbiotic usage of documents, messages, and formal KR in a content repository with XML and RDF processing capabilities


13 Patient Record Abstraction

[1]


14 Declarative User Interface Plans


15 Compiling Screens: The What

• Compiling XForms via XSLT from user interface plan documents
  ◦ UI widgets refer to concepts in the domain ontology via their URIs
• Compile Schematron rules into XForms binds
  ◦ Schematron is a rule-based validation language for XML document families
• Salt (XForms) instance data with state information
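One way to picture the rule-to-bind step is shown below. The slides describe doing this with XSLT; this Python sketch only illustrates a plausible mapping, assumed here for illustration, in which a rule's `context` becomes the bind's `nodeset` and its `assert` tests are joined into the bind's `constraint`. The `/record/ejectionFraction` path and the `compile_binds` function are hypothetical.

```python
import xml.etree.ElementTree as ET

SCH = "http://purl.oclc.org/dsdl/schematron"  # ISO Schematron namespace
XF = "http://www.w3.org/2002/xforms"          # XForms namespace

# Hypothetical rule for a hypothetical data collection document.
RULES = """<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <rule context="/record/ejectionFraction">
      <assert test=". &gt;= 0 and . &lt;= 100">Must be a percentage</assert>
    </rule>
  </pattern>
</schema>"""


def compile_binds(schematron_text):
    """Emit one xforms:bind per Schematron rule that has assertions."""
    schema = ET.fromstring(schematron_text)
    binds = []
    for rule in schema.iter("{%s}rule" % SCH):
        tests = [a.get("test") for a in rule.findall("{%s}assert" % SCH)]
        if not tests:
            continue
        bind = ET.Element("{%s}bind" % XF)
        bind.set("nodeset", rule.get("context"))
        # Every assertion must hold, so the tests are joined with "and".
        bind.set("constraint", " and ".join("(%s)" % t for t in tests))
        binds.append(bind)
    return binds


binds = compile_binds(RULES)
for b in binds:
    print(ET.tostring(b, encoding="unicode"))
```

The mapping is only approximate: a Schematron `context` is a match pattern, while an XForms `nodeset` is an XPath expression evaluated against instance data, so a real compiler would have to handle contexts that are not absolute paths.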


16 Demo of XForms app on Firefox


17 Data Entry Workflow Mgt.

• Semantic web technologies can be used to facilitate a patient record data collection workflow over hundreds of thousands of patient records [3]
• RDF works well as the state machine of a workflow engine
• The process of transcribing details of a procedure from the EHR into a registry can be thought of as a business process whose state is managed in RDF
  ◦ Concurrent data collection task

18 Workflow State as RDF Dataset

• Each task is an XML document in an open source content repository
• Each document is mirrored into a named RDF graph that shares a web location (the name) with the document
• A (SPARQL) query is dispatched against a case management dataset to find tasks in particular states or assigned to particular people
• Task provenance is managed via XForms
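The arrangement above can be mimicked with a small in-memory model: a dataset that maps each graph name (the task document's web location) to a set of triples, plus a function that plays the role of the query. The `ex:state` and `ex:assignedTo` properties, the task URIs, and the abstractor names are all hypothetical, and the pure-Python matching only stands in for a SPARQL engine.

```python
# Hypothetical workflow vocabulary and task locations.
EX = "http://example.org/workflow#"
TASKS = "http://example.org/tasks/"


def task_graph(name, state, assignee):
    """Mirror one task document as (graph name, set of triples)."""
    uri = TASKS + name  # the graph name is the document's web location
    return uri, {
        (uri, EX + "state", EX + state),
        (uri, EX + "assignedTo", assignee),
    }


# The case management dataset: one named graph per task document.
dataset = dict([
    task_graph("1", "InProgress", "abstractor-a"),
    task_graph("2", "Completed", "abstractor-a"),
    task_graph("3", "InProgress", "abstractor-b"),
])


def find_tasks(dataset, state=None, assignee=None):
    """Roughly: SELECT ?task WHERE { GRAPH ?task { ?task ex:state ... } }"""
    hits = []
    for name, triples in dataset.items():
        if state is not None and (name, EX + "state", state) not in triples:
            continue
        if assignee is not None and (name, EX + "assignedTo", assignee) not in triples:
            continue
        hits.append(name)
    return sorted(hits)


in_progress = find_tasks(dataset, state=EX + "InProgress")
mine = find_tasks(dataset, state=EX + "InProgress", assignee="abstractor-a")
print(in_progress, mine)
```

Because each graph name doubles as the document URL, every solution of the query is also an address from which the task document itself can be fetched.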

19 Data collection task schema


20 Workflow State as RDF Dataset

• Web resources in resulting solutions can be interacted with to fetch:
  ◦ XML representation (for use with XForms)
  ◦ JSON representation (for use with Exhibit)
  ◦ Exhibit (XHTML) documents that render a faceted view of a collection of tasks
  ◦ Faceted view includes links to subsequent stages in the workflow and into other web applications on the same web server
• Such hypermedia applications are quite RESTful [3]

21 Quality Reporting


22 Cohort Identification

• SPARQL and RDF datasets are well-suited as infrastructure for longitudinal patient record data warehouses [3][5]
  ◦ Longitudinal patient record: patient records from different times, providers, and sites of care that are linked to form a lifelong view of a patient’s health care experience


23 GRAPH Operator


24 SPARQL Topology

• An RDF dataset with no default graph and one named graph per patient record (a patient record graph)
• There are almost no cross-graph statements
• Beyond cohort identification criteria, most processing happens within a single patient record graph


25 SPARQL Constraint

• In our vocabulary, there are instances of PatientRecord, Operation, Patient, etc.
• PatientRecord resources share a URI with their containing graph
• The GRAPH operator can be used to narrow the search space
• Optimal for cohort querying (constraints in the first part of the query are cross-graph, while those in the second part are intra-graph)
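The two-part shape of a cohort query can be illustrated with a toy dataset that has one named graph per patient record. The first step scans across all graphs for the cohort criterion; the second step looks only inside the graphs that matched. The `ex:procedure` and `ex:age` properties, the record URIs, and the sample values are hypothetical; only the class names PatientRecord, Operation, and Patient come from the slide.

```python
# Hypothetical vocabulary; class names are the ones listed on the slide.
EX = "http://example.org/registry#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def record(uri, procedure, age):
    """Build one patient record graph; the record URI names the graph."""
    op, pt = uri + "#op", uri + "#patient"
    return uri, {
        (uri, RDF_TYPE, EX + "PatientRecord"),
        (op, RDF_TYPE, EX + "Operation"),
        (op, EX + "procedure", procedure),
        (pt, RDF_TYPE, EX + "Patient"),
        (pt, EX + "age", age),
    }


# No default graph: every triple lives in exactly one named graph.
dataset = dict([
    record("http://example.org/records/1", "CABG", 64),
    record("http://example.org/records/2", "Valve repair", 71),
    record("http://example.org/records/3", "CABG", 58),
])


def cohort(dataset, procedure):
    """Cross-graph step: which record graphs satisfy the criterion?"""
    return sorted(
        name for name, g in dataset.items()
        if any(p == EX + "procedure" and o == procedure for (_, p, o) in g)
    )


def ages(dataset, graph_names):
    """Intra-graph step: read values only from the selected graphs."""
    out = {}
    for name in graph_names:
        for (_, p, o) in dataset[name]:
            if p == EX + "age":
                out[name] = o
    return out


# Roughly: SELECT ?record ?age WHERE {
#   GRAPH ?record { ?op ex:procedure "CABG" . ?pt ex:age ?age } }
selected = cohort(dataset, "CABG")
result = ages(dataset, selected)
print(selected, result)
```

Since there are almost no cross-graph statements, the second step never has to leave a graph once the first step has picked it, which is what keeps the search space small.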

27 Expressive power of

• 3974+ OWL classes, 171 object properties, 217 datatype properties
• Diseases, findings, symptoms, medication, procedures, etc.
• SHOIN(D) expressiveness (OWL-DL)
• Use second-order classes to model controlled vocabularies (drop-down data collection lists)


28 Semantic Research Assistant

• Cyc-powered medical expert system for semantic web content repositories
• Natural-language driven interface for composing logical queries against a SPARQL Protocol service
• Used to identify patient cohorts from the cardiothoracic procedure registry in HVI
• An artificially intelligent system, logically aligned to an RDF dataset via an ontology (OWL) and rules
