Managing semantic metadata in public private information chains: A reference architecture for alignment of semantics, technology and stakeholders

Master thesis MSc Systems Engineering, Policy Analysis & Management (SEPAM)

V. den Bak Student #: 1268260

Delft University of Technology
Faculty of Technology, Policy & Management
Thauris B.V.

Graduation Section Information & Communications Technology

Graduation committee
Prof.dr. Y. Tan (chair), Information & Communications Technology
Dr.ir. M.F.W.H.A. Janssen (co-chair), Information & Communications Technology
Ir. N. Bharosa (first supervisor), Information & Communications Technology
Drs. H.G. van der Voort (second supervisor), Policy, Organization, Law & Gaming
R. van Wijk MSc (external supervisor), Thauris B.V.
Drs. S. Kockelkoren (external supervisor), Thauris B.V.


Managing semantic metadata in public private information chains: A reference architecture for alignment of semantics, technology and stakeholders

Master thesis MSc Systems Engineering, Policy Analysis & Management (SEPAM)

Author: Victor den Bak
Student #: 1268260

Institution: Delft University of Technology
Faculty of Technology, Policy & Management
Jaffalaan 5
2628 BK Delft
The Netherlands

In cooperation with: Thauris B.V.

Program: MSc Systems Engineering, Policy Analysis & Management
Section: Information & Communications Technology
Course: SEPAM Graduation Project (SPM5910)
Submitted: 24-10-2011
Graduation: 7-11-2011

Graduation committee
Prof.dr. Y. Tan (chair), ICT
Dr.ir. M.F.W.H.A. (Marijn) Janssen (co-chair), ICT
Ir. N. (Nitesh) Bharosa (first supervisor), ICT
Drs. H.G. (Haiko) van der Voort (second supervisor), POLG
R. (Remco) van Wijk MSc (external supervisor), Thauris B.V.
Drs. S. (Stephan) Kockelkoren (external supervisor), Thauris B.V.


Acknowledgements

Before presenting this research, I would like to express my gratitude to those without whom this research project could not have been completed successfully.

First, I would like to thank the members of my graduation committee for their support: Marijn Janssen, Nitesh Bharosa, Haiko van der Voort, Remco van Wijk and Stephan Kockelkoren. They have always been available to provide useful advice and valuable comments on my work.

Also, I am grateful to my colleagues at Thauris. They provided a friendly and motivating atmosphere that ensured steady progress amongst the many incentives for distraction. In particular Joris Hulstijn, who helped to structure this difficult problem on several occasions.

Finally, I would like to thank the 18 interviewees from TU Delft, Bureau Jeugdzorg and Belastingdienst. These people made time available in their busy schedules and have provided me with valuable insights. Their hands-on and subject-matter expertise added enormous value on top of the scientific literature available.


Summary

The use of a common set of semantic metadata is seen as one of the most promising developments in information exchange among public and private parties. Semantic metadata is data that provides context to core data and helps to convey the actual meaning and perspective of the information that is shared among people, systems and organizations. All information sharing activities are aimed at one objective: having the right information available to the end user, with as little loss, time delay and clutter as possible. Using a common set of semantics in electronic information exchange is believed to further reduce the costs and time of information exchange, increase information quality and remove many of the unforeseen side effects and complexities of interconnecting stand-alone information systems.

Semantic metadata management is required in order to use semantic metadata effectively in a PPIC. A common vocabulary is of little use if it does not match organizational requirements. The main difficulty in semantic metadata management is that it touches on many elements of the organizational architecture. Semantic metadata management is primarily an alignment effort and partially a standardization effort. It includes the alignment of processes, technology and data models, both within and beyond organizational boundaries. However, many existing approaches to semantic metadata management are ad hoc rather than coordinated and premeditated. There are many theories and studies on individual topics related to metadata management, but a documented approach that puts all elements within the given scope in perspective is non-existent.

This master thesis project was aimed at aiding those tasked with implementing a coordinated form of semantic metadata management within the domain of Public Private Information Chains (PPIC). The problem was approached from an enterprise architecture point of view. This means a broad, holistic view was applied. A PPIC is a digital information chain consisting of both public and private parties that is centered around a certain information process with a high rate of repetition and mutual responsibilities.

This research project started out with a literature review and expert interviews. Best practices were extracted and tested in an in-depth case study with two complementary cases in Dutch government organizations. The main research question has been answered by developing a reference architecture for semantic metadata management in a PPIC. A reference architecture is a generic blueprint that provides a holistic approach for a specific architecture archetype. It puts all elements required for semantic metadata management into perspective, making it easier to structure the many pieces of the puzzle. The reference architecture uses a format that on the one hand provides enough rigor to ensure interoperability, while on the other hand provides enough leeway to fit organizations with different characteristics or specific requirements. This mixture of rigor and leeway has been achieved by using both prescriptive design principles and tradeoffs that extend the design space.

The reference architecture is centered around mitigating the main challenge in this domain and reinforcing one of its main potentials: the reduction of complexity. Much of the complexity regarding information exchange in PPICs is artificial, not inherently present. Challenges have arisen from creating connections between systems, processes and organizations that were never designed from the outset to be interconnected in such a way. The semantic metadata management approach that is

introduced in this research has two pillars. First, a conceptual model is introduced to act as a single point of reference between all components, reducing the number of existing relations. Second, the relations between all components in the organizational architecture are actively managed. This proactive approach reduces incidents and improves information quality.

The solution presented in this thesis is generic. The design principles and tradeoffs apply in a similar way to both private and public organizations. Moreover, the solution applies to organizations with different maturity levels in technology, data management and processes, and with varying levels of ambition on this topic. In an information chain the diversity of stakeholders and their interests is a given. A certain degree of commitment and effort can be expected from the stakeholders in the chain, but semantic metadata management should not interfere with their private processes or impose an additional burden. The evaluated reference architecture presented in this study deals with this problem. Even though the solution is primarily aimed at providing benefits in inter-organizational information exchange, it is useful for internal use within individual organizations as well.


Table of contents

1 Introduction ------17
1.1 Problem statement ------17
1.2 Research domain ------17
1.3 Research goals ------19
1.4 Scope ------19
2 Methodology ------21
2.1 Research questions ------21
2.2 Methodology ------23
2.3 Case study approach ------27
2.4 Reference architecture ------29
3 Public Private Information Chains ------33
3.1 Public policy and bureaucratic processes ------33
3.2 Characteristics of Public Private Information Chains ------34
3.3 Public Private Information Chains in practice ------36
4 Semantic metadata: potential and challenges ------37
4.1 Potential benefits of standardizing semantic metadata ------37
4.2 Challenges regarding semantic metadata management ------41
5 Aspects of semantic metadata management in literature ------47
5.1 Semantic metadata management: Data perspective ------47
5.2 Semantic metadata management: Technology perspective ------54
5.3 Semantic metadata management: Process perspective ------61
6 Preliminary architecture ------69
6.1 Reference architecture design process ------69
6.2 Design propositions ------71
6.3 From prescriptive to evaluated reference architecture ------72
7 Case study 1: Child protective services ------73
7.1 Background ------74
7.2 Metadata management – Technology ------77
7.3 Metadata management – Data ------80
7.4 Metadata management – Processes ------83
7.5 Conclusion ------86
8 Case study 2: Tax office ------87


8.1 Background ------88
8.2 Metadata management – Technology ------91
8.3 Metadata management – Data ------93
8.4 Metadata management – Processes ------96
8.5 Conclusion of the tax office case ------100
9 Evaluated architecture ------101
9.1 Preconditions ------101
9.2 Design principles ------102
9.3 Reference architecture overview ------107
9.4 Tradeoffs ------110
9.5 Expert session ------118
10 Conclusion ------121
10.1 Conclusions ------121
10.2 Reflection on evaluated reference architecture ------126
10.3 Recommendations ------129
10.4 Scientific contribution ------131
10.5 Personal reflection ------134
11 References ------135
12 Appendix ------139


List of figures

Figure 1: Information System Research Framework adapted from Hevner (2003). The original generic content of the environment, research and knowledge base has been replaced by content applicable to this study...... 24
Figure 2: Reference architecture design model by Muller (2011)...... 25
Figure 3: Overview of the research process. Created by author...... 25
Figure 4: An indication of where semantics combined with automation can provide gains. Reducing the time spent at tasks with very little added value. Adopted from Hoffman et al...... 40
Figure 5: TBM Enterprise Architecture meta-framework. By Janssen (2009)...... 42
Figure 6: Graphic representation of an information chain, showing that each organization within the chain has an enterprise architecture. Alignment of semantics takes place both within and amongst organizations. Created by author...... 42
Figure 7: Metadata example. Created by author...... 48
Figure 8: A setup in which metadata is loosely coupled to the core data. From Brandt et al (2003)...... 49
Figure 9: The Information Maturity Model by McClowry (2008). Image is released in the public domain...... 50
Figure 10: Metadata levels in the business semantics management model. Derived from De Leenheer (2010)...... 53
Figure 11: Overview of optimal application of internal and external metadata. Created by author...... 57
Figure 12: External metadata storage in a large enterprise data storage system, showing where a metadata management tool enters the picture. Created by author...... 59
Figure 13: Business semantics management cycles. From De Leenheer (2010)...... 64
Figure 14: Flows of information to and from BJz. Created by author...... 75
Figure 15: Information chain within BJz, showing both communications within a team and as a team with others. Created by author...... 76
Figure 16: Overview of data streams in the primary process regarding sales tax. Created by author...... 89
Figure 17: Steps taken in filing a tax report through SBR. Created by author...... 90
Figure 18: Top down specification of metadata for creating the partial taxonomy (top) and bottom up reuse of existing metadata (bottom). Created by author...... 97
Figure 19: Overview of reference architecture for metadata management in a PPIC. The numbers correspond to the numbers of the design principles. Created by author...... 108
Figure 21: Cooperation models within the stakeholder constellation. Created by author...... 110
Figure 22: Governance archetypes. Created by author...... 111
Figure 23: Representation of potential positioning of moment of consultation within change management cycle. Created by author...... 114


Figure 23: A single definition provided with an example and characteristics. Ownership and status are shown on the right, with other options in the menu below. From Collibra...... 148
Figure 24: Overview of relations defined between several semantics. From Collibra...... 149
Figure 25: A relation between semantics being defined in a menu. From Collibra...... 149
Figure 26: A simple business rule added to a definition. From Collibra...... 150
Figure 27: A taxonomy created from a number of semantics. Combining both categories and relations. From Collibra...... 150
Figure 28: Information products and relations present in a single generic two year OTS case. Note that there may be multiple instances of each product, with the average case file having about 700 pages. Created by author...... 154
Figure 29: Overview that shows links among information products for any form of reuse of information. In a regular case there are multiple instances of most document types. Created by author...... 155
Figure 30: The reuse of information among information products relating to the planning, showing what information is reused in what way. Created by author...... 156
Figure 31: The reuse of information among information products relating to the evaluation, showing what information is reused in what way. Created by author...... 156
Figure 32: The reuse of information among information products related to the follow up planning, showing what information is reused in what way. Created by author...... 157


List of tables

Table 1: Features of complementary cases. Created by author...... 28
Table 2: Reference architecture quality indicators. Created by author...... 31
Table 3: TOGAF quality indicators relating to design principles. TOGAF (2007)...... 32
Table 4: Overview of XML based semantic metadata standards. Created by author...... 55
Table 5: Overview of theories on standardization. Created by author...... 61
Table 6: Roles identified in web service orchestration by Janssen, Gortmaker & Wagenaar (2006)...... 65
Table 7: Overview of the 14 steps of reference architecture design. The blue steps relate to the preliminary phase and the red steps to the evaluation phase. Created by author...... 70
Table 8: Overview of roles within BJz. Created by author...... 85
Table 9: Overview of roles within the tax office/SBR case. Created by author...... 99
Table 10: Remarks on the positioning of the design principles in the reference architecture overview figure. Created by author...... 109
Table 11: Roles in semantic metadata management. Adapted from Janssen, Gortmaker & Wagenaar (2006), created by author...... 113
Table 12: Overview of ranking (blue) and chronology (white) by experts. Created by author...... 119
Table 13: Test on quality indicators. Created by author...... 128
Table 14: Test on quality indicators for design principles. Created by author...... 128
Table 15: List of reference architecture design steps. Created by author...... 144
Table 16: Overview of reference architecture structures. Created by author...... 145
Table 17: Roles identified in web service orchestration by Janssen, Gortmaker & Wagenaar (2006)...... 152
Table 18: Roles in semantic metadata management. Adapted from Janssen, Gortmaker & Wagenaar (2006), created by author...... 153


1 Introduction

This chapter introduces the motivation behind this master thesis. The problem statement and the main research question are presented in the first section. This is followed by an introduction to the research domain and the research goals. The final section provides the scope of this research project.

1.1 Problem statement

The use of a common set of semantic metadata is seen as one of the most promising developments in information exchange among public and private parties (ICTU, 2006; Morgan, 2005; Sbodio, Moulin, Benamou, & Barth, 2010; WRR, 2011). Semantic metadata is data that provides context to core data. It helps to convey the actual meaning and perspective of the information that is shared among people, systems and organizations. A common vocabulary in semantics allows for reduced transmission costs, faster retrieval and processing of information, improved information quality and new types of automation (Ghosh, 2010). Successful use of a common set of semantic metadata requires a form of management (Houtevels, 2010). Without management of semantic metadata the potential advantages cannot be realized or are significantly inhibited. However, at this moment there is a lack of knowledge on how semantic metadata should be managed within the complex setting of public private information chains.
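To make the distinction between core data and semantic metadata concrete, it can be sketched in a few lines of code. This is an illustrative example only; all field names, values and vocabulary references below are invented and do not come from the thesis or its cases.

```python
# Core data as it might be exchanged in an information chain (values are made up).
core_data = {"amount": 1200, "period": "2011-Q3"}

# Semantic metadata: context that conveys the meaning of each core field.
# The "source_concept" entries point to a hypothetical shared vocabulary.
semantic_metadata = {
    "amount": {
        "definition": "Total sales tax due for the reporting period",
        "unit": "EUR",
        "source_concept": "taxonomy:SalesTaxDue",
    },
    "period": {
        "definition": "Reporting period in ISO 8601 quarter notation",
        "source_concept": "taxonomy:ReportingPeriod",
    },
}

def describe(field):
    """Combine a core value with its semantic context for an end user."""
    meta = semantic_metadata[field]
    return f"{field}={core_data[field]} ({meta['definition']})"

print(describe("amount"))  # amount=1200 (Total sales tax due for the reporting period)
```

Without the metadata, a receiving party would have to guess what "amount" means, in what unit it is expressed and to which period it applies; the semantic layer makes that context explicit.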

Semantic metadata requires management since semantics are dynamic and need to be aligned over the various links in the information chain. Semantics change over time, especially in a setting of multiple stakeholders, changing processes and technological advances (De Leenheer, 2009). Semantics have a lifecycle, since they need to be specified, verified, updated and removed (Hepp, De Leenheer, De Moor, & Sure, 2008). A common set of semantics has little value if it does not match the information it ought to describe, contains faults or contains many irrelevant items. Organizing semantic metadata management in Public Private Information Chains currently is a black box in both science and practice. Semantic metadata management approaches are ad hoc rather than coordinated and planned. There are many theories and studies on individual topics related to metadata management, but a documented approach that puts all elements within the given scope in perspective is non-existent.

This research shows that semantic metadata management is an alignment effort of processes, data models, technology and stakeholders. More specifically it provides an answer to the following main research question: What design principles are required and what tradeoffs still have to be made in a reference architecture for semantic metadata management in public bodies that operate in a Public Private Information Chain? As such it provides a generic perspective on how semantic metadata management should be carried out in Public Private Information Chains. This perspective should make implementing semantic metadata management in a specific context easier, allowing more of the potential benefits to materialize sooner and at lower costs.

1.2 Research domain

Private parties, consisting of both individuals and organizations such as companies, need to exchange information with the government. The government is not a single entity but consists of

numerous public bodies, each with its own services portfolio. The nature of this information exchange is diverse. It may relate to permits, regulation, taxes, statistics, subsidies and even primary processes. In order to save time and money, many processes carried out by humans using a paper based administrative system have been replaced by IT-systems in both private parties and public bodies. Additionally, with the rise of the internet many of those IT-systems have been interconnected to reduce the transaction costs of exchanging information. The combination of these two trends has led to a phenomenon called Public Private Information Chains (PPIC). These are digital information chains centered around a certain type of information, with a high rate of repetition and mutual responsibilities and dependencies amongst both public and private parties. PPICs have shown great benefits, reducing the costs and time of bureaucratic processes significantly. However, even with those benefits there still is a desire among politicians and private parties alike to reduce bureaucracy. The reduction of administrative burden has been one of the focal points of the Balkenende III and IV coalition agreements. The processes are vital to the executive branch of the government and the execution of policy, but their associated costs are a burden on society (WRR, 2011).

There is room to improve the performance of existing PPICs, since many inefficiencies remain. The root cause is that most information systems in use predate the genesis of contemporary PPICs and have not been designed to be interoperable in their current role (Hepp, et al., 2008). Note that in this context information systems are mentioned, not IT-systems. An information system is defined as the whole complex of people, IT-systems, data and processes. This means that the interoperability challenge exceeds the field of technology. Even when IT-systems have been made interoperable through web services, the organizations within a PPIC are still heterogeneous (Sun & Yen, 2005). Information is used for different purposes by experts of different backgrounds. This poses different quality needs, and even with standardized formats interpretations of exchanged information may still vary.

The heterogeneity of stakeholders in a PPIC means that the challenge of information exchange is wider than technology alone. The transaction costs incurred in information exchange do not only include transmission; the majority consists of translation costs (Delone & McLean, 1992). Translation is the effort of interpreting and reformatting information for use in another information system or for another audience. An alternative approach is to get the information from the source, avoiding translation in all other links in the chain. Often this is not an option within a PPIC: many intermediary information products are exchanged, while each link in the information chain adds a unique value, such as expertise or efficiency.

The solution for increased efficiency and improved information quality within the PPIC as a whole lies in using a common set of semantics within the chain. Semantics that describe information leave room for diversity in information itself, while lowering translation costs. The effort of aligning the various IT-systems, data models and stakeholders to use a common set of semantic metadata is called semantic metadata management.
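The reduction in translation effort can be sketched in a few lines of code: with a common vocabulary, each party maintains a single mapping from its local terms to the shared semantics, rather than a pairwise translation for every other party in the chain (n mappings instead of n·(n-1) translations). All party names, field names and values below are hypothetical, invented purely for illustration.

```python
# A hypothetical shared vocabulary agreed upon within the chain.
SHARED_VOCABULARY = {"tax_due", "reporting_period"}

# Each party maps only its own local terms to the shared vocabulary.
LOCAL_TO_SHARED = {
    "company_a": {"btw_bedrag": "tax_due", "tijdvak": "reporting_period"},
    "tax_office": {"verschuldigde_belasting": "tax_due", "periode": "reporting_period"},
}

def to_shared(party, record):
    """Translate a party's local record into the shared semantic terms."""
    mapping = LOCAL_TO_SHARED[party]
    shared = {mapping[key]: value for key, value in record.items()}
    # Every translated term must exist in the common vocabulary.
    assert set(shared) <= SHARED_VOCABULARY
    return shared

def translate(sender, receiver, record):
    """Exchange via the shared vocabulary: local -> shared -> local."""
    shared = to_shared(sender, record)
    reverse = {v: k for k, v in LOCAL_TO_SHARED[receiver].items()}
    return {reverse[key]: value for key, value in shared.items()}

msg = translate("company_a", "tax_office", {"btw_bedrag": 1200, "tijdvak": "2011-Q3"})
print(msg)  # {'verschuldigde_belasting': 1200, 'periode': '2011-Q3'}
```

The design point is that neither party needs to know the other's local data model; both only commit to the common set of semantics, which is exactly what semantic metadata management maintains.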


1.3 Research goals

This master thesis project was aimed at aiding those confronted with organizing semantic metadata management in a PPIC. The problem was approached from an enterprise architecture point of view. This means a broad, holistic view was applied. The goal was to provide a generic approach for semantic metadata management that can be used as a starting point for a specific implementation. The chosen format for the generic approach is a reference architecture.

This research project started out with a literature review and expert interviews. Best practices were extracted and tested in an in-depth case study with two complementary cases. The end result is a reference architecture that consists of (prescriptive) design principles and tradeoffs. Following from this methodology, the scientific value of this research is threefold:

1. This research provides an overview of various topics related to semantic metadata management in scientific literature. Additionally, the case study provides accurate descriptions of two real life examples of metadata management in a PPIC.
2. The cohesion among these topics and their implications are shown. The holistic view provides context to each topic, shows what tradeoffs apply and under which conditions each topic is applicable.
3. The format of the reference architecture is unique in that it is both prescriptive and allows for leeway through the inclusion of tradeoffs. The multi-stakeholder context required this leeway, but such a context is not unique to semantic metadata management. Stakeholder complexity is found in many other domains, and this type of reference architecture design could be applicable to other research areas as well.

1.4 Scope

The scope of this research project is limited to Public Private Information Chains in the Netherlands. The reason behind this delineation is that cross organizational bureaucratic processes in government are very visible and well documented. Semantic metadata management is also a challenge in the private sector, but examples there are less visible and statements on generality are harder to support. The Netherlands was chosen as the geographic delineation for practical reasons: for a Dutch student it is the most practical scope for interviews and site visits regarding the case studies.

The government institutions within the scope operate within a Public Private Information Chain (PPIC), a concept described in detail in chapter 3.1. As such they interact with many other public and private parties. This adds a multi-actor complexity on top of the technical challenges and integration within the primary processes as presented in the problem statement. This multi-actor complexity follows from the need to achieve a certain level of inter-organizational alignment in a domain that is immature at this moment. The public sector is further characterized by:

- Operating within a context of organizational stimuli and values that differ significantly from organizations operating on a competitive market. This includes a non-profit driven attitude, a low tolerance for faults and a duty to handle all cases regardless of respective effort.
- The use of very private personal or organizational information that is entrusted with the confidence and duty that it remains private.


- Strict regulation by laws regarding procedures, non-ambiguity, diligence, due dates, quality assurance and compliance.

Aside from the context, the subject of this research is delineated to semantic metadata management, not the everyday use of semantics. Only the relevant aspects of technology, data models and organizational alignment will be discussed. The definition used in this research is that semantic metadata management is the whole set of procedures and tools regarding the administration, application, alignment and governance of semantic metadata. Regarding semantic metadata management itself, the scope is limited to external semantic metadata that is used beyond organizational boundaries. This has the following consequences:

- Administrative and structural metadata are seen as context. The focus is on semantic metadata. Structural and administrative metadata are well researched and are used for very different purposes than semantic metadata. They are for example not visible or of any direct use to the end user of the information they relate to, unlike semantic metadata, which is specifically added for use by the end user of the information.
- Internal and non-standardized metadata are not included as they fall primarily within the domain of web 2.0 and crowd based tagging (De Leenheer, 2009), a domain that is seen as less than optimal for business and government use (Hepp, et al., 2008). The external metadata is stored and managed separately from the content, mostly in the form of a common glossary in a taxonomy or ontology.
- Only semantic metadata that is used in more than one link in the information chain is included. Very specific metadata that is only used internally or is otherwise irrelevant to the functioning of the information chain is out of scope.
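As an illustration of the idea of a common glossary stored separately from the content, external semantic metadata organized as a simple taxonomy might look as follows. The structure and all terms are assumptions made for the sake of the example, not the actual model used in the cases.

```python
# A hypothetical common glossary, organized as a taxonomy: each term points to
# its broader term (None for the root) and carries a shared definition.
TAXONOMY = {
    "FinancialData": (None, "Any financial concept exchanged in the chain"),
    "TaxDue": ("FinancialData", "Amount of tax owed for a period"),
    "SalesTaxDue": ("TaxDue", "Tax owed on sales in a period"),
}

def ancestors(term):
    """Walk up the taxonomy from a term to the root, broadest term last."""
    chain = []
    broader = TAXONOMY[term][0]
    while broader is not None:
        chain.append(broader)
        broader = TAXONOMY[broader][0]
    return chain

print(ancestors("SalesTaxDue"))  # ['TaxDue', 'FinancialData']
```

Because the glossary lives outside any single IT-system, every link in the chain can reference the same terms and their broader/narrower relations without embedding the definitions in its own content.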


2 Methodology

This chapter describes how this research project has been set up and carried out. First, the research questions are presented in section 2.1. The methodology used in this research project consists of a number of instruments; the rationale behind the choice of instruments can be found in section 2.2. This is followed by an overview of the case study approach in section 2.3. The main outcome of this research is a reference architecture for semantic metadata management in public private information chains. Section 2.4 describes the rationale behind the type of reference architecture that has been developed in this research.

2.1 Research questions

In this research a reference architecture for semantic metadata management in a public private information chain is developed and evaluated. To do so, research must be performed to accumulate the knowledge required to answer the main research question and to arrive at a good design. The information needs are encompassed by five research questions. Together, the answers to the research questions answer the main research question, which is presented first. All research questions are presented in their context. The relevant research instruments are described in the following three sections of this chapter.

Main research question

At this moment there is a lack of knowledge on how semantic metadata should be managed within the complex setting of public private information chains. Organizing semantic metadata management currently is a black box in both science and practice. There are many theories and studies on individual topics related to metadata management, but an approach that puts all elements in perspective is non-existent. Since metadata is credited with many potential benefits (Hepp, et al., 2008) and requires management to maximize its utility (De Leenheer, 2009), knowing how metadata management should be organized is relevant both for the scientific knowledge base and for project managers within public and private organizations that are confronted with this issue. This leads to the following main research question:

What design principles are required and what tradeoffs still have to be made in a reference architecture for semantic metadata management in public bodies that operate in a public private information chain?

2.1.1 The relevance of metadata

Question 1: Why is metadata mentioned in a wide range of solutions to an even wider range of challenges in large cross organizational IT-systems?

The answer to this question indicates the relevance of the research domain. Semantic metadata is mentioned in a wide range of solutions to an even wider range of challenges relating to information exchange across organizational boundaries. An overview of the potential gains and motives for implementation of semantic metadata in a PPIC has been compiled.

2.1.2 Challenges regarding semantic metadata management

Question 2: What makes implementing semantic metadata management within networks of organizations so difficult?


Employing a common set of semantic metadata for information exchange between organizations is technically feasible. Various taxonomies and ontologies have been developed and remain in use, albeit in less complex situations. The challenges regarding semantic metadata management exceed the technology domain.

2.1.3 Developing a reference architecture

Question 3: Which technological and organizational aspects should be incorporated in the reference architecture according to literature?

The use of semantic metadata impacts various elements of the enterprise architecture. Examples are technology, primary processes, data models, governance processes and stakeholder relations. Before implementing semantic metadata management, or developing a reference architecture, the scope must be well defined. The reference architecture for semantic metadata management should encompass all relevant topics in order to manage metadata effectively.

2.1.4 Real life application of metadata management

Question 4: What architectures do we find in practice for metadata management within the Dutch government?

The literature study has indicated that there are many individual theories on the use of semantic metadata. In practice, theories derived from literature could conflict, prove inapplicable or even prove false. Semantic metadata management could be subject to changes and challenges that are not yet covered by science. For that reason the theories previously found in literature have been validated against real life examples and expert opinions.

2.1.5 Validation and refinement
Question 5: What design principles can be derived from the application of the preliminary architecture on the cases?
The reference architecture is developed at the enterprise architecture level. Given this high level of abstraction the architecture is not highly detailed, but is based around generic design principles. In this research, design principles constitute the broader structural aspects of the composition of elements. Design principles are the leading element in both the preliminary and the evaluated reference architecture. For relevant areas where the implementation must be case specific, tradeoffs have been formulated.


2.2 Methodology
This chapter covers several aspects of the methodology. First, design science is introduced as the leading methodology in this research project. Then the Information System Research Framework is presented as the specific design science approach, followed by a description of the research process and expert validation. The case study approach and the reference architecture format are presented in the final two sections of this chapter.

2.2.1 Research characteristics
Developing a generic approach for managing semantic metadata in a Public Private Information Chain is very ambitious. First, the research domain is immature, requiring observations and expert interviews to explore this domain and to complement existing literature. Second, a holistic view is required to weigh all topics within the very broad scope. Third, the desired reference architecture can be considered a socio-technical system, thus requiring knowledge from both the engineering and social science domains. These elements result in two profound research characteristics, described below.

This research is characterized by a research domain that is new and immature. Semantic metadata management has been on the agenda in the Netherlands only for the last few years. The limited knowledge available in this field makes this research project explorative. A so-called Alice-in-wonderland approach was employed: the true subject-matter scope could only be determined during the execution of the research project. For this reason an iterative research approach was chosen, as shown in section 2.2.4.

The second major characteristic of this research is the way the main research question is answered. The goal of this research was to determine how semantic metadata management in PPICs should be carried out. This answer is hard to capture in a single paragraph given the number of relevant topics and their interrelations. For that reason the answer has been given the format of a reference architecture. A reference architecture is a generic, non-specific approach that provides a holistic view on a certain architecture archetype (Angelov & Grefen, 2008; Robertson, 2001). The number of topics makes the span of the reference architecture rather broad, which, given the available amount of time, resulted in a low granularity.

2.2.2 Design science
The leading methodology in this research is design science. This methodology was chosen since designs resulting from this approach are claimed to have both contextual and scientific merit: a solution for a problem is devised while at the same time the knowledge base is extended. According to Hevner, systems design can be a highly valued contribution to science if two requirements are met (Hevner, March, Park, & Ram, 2003). The first is that the design effort includes an appropriate scientific foundation. The second is that its field of application or use of theories and methodology is a novelty and not common practice, in which case no knowledge would be added. Since the design will be a reference architecture the addition to the scientific knowledge base will be relatively high (Angelov & Grefen, 2008).


Within design science a range of research instruments can be applied (Gonzalez, 2007). The proposed research combines three research instruments: literature review, case study and expert interviews. Case studies can be seen as both an instrument and a methodology, since a case study comprises several instruments itself, including interviews and observation (Horan & Schooley, 2007). Gonzalez indicates that these three instruments can be combined and are used increasingly often in design science. As stated in the chapter on research goals, a reference architecture is thought to be the best method to structure and present the answer to the main research question.

2.2.3 Information System Research Framework
Hevner poses that information systems research is influenced by scientific theories and methods on the one hand and relevant elements such as people, organizations and technology on the other. This is shown in the Information System Research Framework. Figure 1 shows the framework and the stated interdependence between research, environment and knowledge base. Figure 1 has been adapted from the original generic model by Hevner to fit this particular research project. The case study design was also carried out according to this framework, linking relevant research methodology to related case elements. Chapter 10.4.1 contains a reflection on the applicability of the design science approach.

[Figure 1: three linked panels. The Environment (people, organizations, technology) supplies business needs (relevance); the Knowledge Base (foundations such as enterprise architecture, business semantics management, IT-governance, master data management, information quality and expert systems, plus methodology) supplies applicable knowledge (rigor); IS Research iterates between develop/build (design propositions, design principles, reference architecture) and justify/evaluate (analytical framework, case studies, expert session, field study, observation). Results are applied in the environment and added to the knowledge base.]

Figure 1: Information System Research Framework adapted from Hevner (2003). The original generic content of the environment, research and knowledge base has been replaced by content applicable to this study.

The artifact that is developed in this research project is a reference architecture. According to Muller (2011) the design of a reference architecture is based on mining useful architecture patterns from existing architectures and theories, enriched by exploring and analyzing customer and business needs. This synthesis is shown in Figure 2. The existing architectures and patterns can be considered the knowledge base Hevner refers to; the business needs and future requirements make up the environment Hevner refers to. Using Muller's reference architecture design model thus adheres to both the relevance and the rigor proposed by the design science research approach.

Figure 2: Reference architecture design model by Muller (2011).

2.2.4 Research process and validation
The overall research process is as follows. The research started with an in-depth analysis of the problem area and the chosen delineation. Throughout the entire process scientific literature was used and interviews were held for the case studies and expert opinions. The resulting insights served as requirements for the design of the preliminary architecture. Two in-depth cases were used to apply this preliminary architecture, which was iteratively improved. Finally, the preliminary architecture was validated using expert opinions and the evaluation criteria set at the beginning of the process. Based on the validation, conclusions were drawn and an improved and validated reference architecture is presented. The entire research process is depicted in Figure 3.

[Figure 3: five sequential phases (problem analysis, preliminary architecture, case study, validation, reference architecture), with evaluation criteria defined up front. Scientific literature and experts feed the problem analysis; design propositions and tradeoffs form the preliminary architecture; the youth care and tax office cases are applied in an iterative loop; validation uses an expert session and the evaluation criteria; the final reference architecture consists of design principles and tradeoffs.]

Figure 3: Overview of the research process. Created by author.


Answering the research questions
The main methodology used to answer the first three research questions is literature review, but additional insights from the case studies and experts have been processed as well. Chapter 3 gives a description of the research domain. Both the motives for using semantic metadata and the challenges of managing semantic metadata can be found in chapter 4. Chapter 5 describes which organizational aspects should be incorporated in the reference architecture according to literature. These propositions have resulted in a preliminary reference architecture, which is untested. Research question 4 introduces the cases used for testing the preliminary architecture, found in chapters 7 and 8. The answer to question 5 merges the existing theories (question 3) with the lessons from real life cases (question 4). From the tested design propositions, design principles have been derived, turning the preliminary reference architecture into the evaluated reference architecture shown in chapter 9.

Validation by experts
The validation of the evaluated reference architecture was carried out by a number of selected experts. The experts have been involved in the creation of certain key areas of the framework and in the validation of the framework as a whole. The expert opinions have been gathered by means of interviews, for which an interview protocol was created beforehand. Due to the background of most experts the interviews were held in Dutch and a summary in English is provided. In the validation phase an expert session was planned in which the experts were confronted with the reference architecture. The ability to interact and discuss findings makes the expert opinions more explicit (Verschuren & Doorewaard, 2003). The experts for the validation have been selected by means of the following two criteria:
 The experts must possess a level of authority in the area of expertise in which they are questioned.
 The expert panel will include experts with research experience and experts with practical experience.


2.3 Case study approach
In line with the Alice-in-wonderland approach, the case studies make up the main element of this research. Literature alone did not provide enough information on the coherence, interaction or tradeoffs among topics. Case studies allow for the in-depth approach required given the research goals (Baarda & De Goede, 2001). Observations of real world use of shared semantics allow for insight into the need for, challenges with and devised solutions for semantic metadata management. This chapter starts off with the case study selection criteria. The in-depth cases used in this research are the Dutch youth care sector and the tax office; both are presented in the second section. The third section shows how both cases complement each other.

2.3.1 Case study selection criteria
Both cases are selected on a number of common criteria. The vast majority of the public sector in the Netherlands conforms to the scope as described in chapter 1. The youth care sector and the tax office were selected as cases since they were available for the in-depth review required for an explorative case study. Both cases:
 concern government organizations that operate in a network of organizations with multiple stakeholders and mutual relations.
 are heavily dependent on information, which is used for both compliance and decision making, both of which impact the third party to whom the information pertains and the functioning of the social system.
 make use of sensitive information relating to third parties, both individual persons and organizations. For that reason these cases are subject to strong regulation and strict rules.

2.3.2 Cases used in the case study
 The first case used to evaluate the preliminary reference architecture is the Dutch youth care sector, specifically one of the 15 Bureaus Jeugdzorg (BJz) in the Netherlands. The youth care sector consists of a large network of both public and private organizations, all of which are very different in nature. Within this chain a lot of information is shared, mostly in the form of reports. BJz has a coordinating role in this flow of information. Many of the reports that are created are used for informing organizations or for compliance, both internally and towards regulators. Within the youth care sector there are several initiatives regarding a common methodology, but a true semantic metadata architecture embedded in IT-systems does not exist yet. Given the intricate mix of Bureau Jeugdzorg’s own requirements, alignment with partners in the information chain and various national initiatives, this case is very interesting. It also requires the semantic metadata management effort to be able to handle the constant changes and major modifications of new methodologies, information needs and regulation.

 The second case used to evaluate the preliminary reference architecture is the Dutch tax office (Belastingdienst). Within this large organization the focus lies on the semantic metadata management effort relating to the Standard Business Reporting program. The tax office is responsible for many elements in the Dutch Taxonomy, the most prominent standardized set of semantic metadata in the Netherlands. These elements have to be


managed before being implemented by Logius, the public body that coordinates IT projects of the national government. The IT-infrastructure has been adapted for the use of semantic metadata as applied in the Dutch Taxonomy. However, a management and governance structure does not exist and these tasks are carried out on an ad hoc basis. What makes this case interesting is that the semantic metadata is not only relevant to the actual end user who does the financial reporting. When legislation is created, its impact on the back office, in this case the tax office, must be clear and possible to estimate. The whole process runs from the creation of new legislation to implementation in the Dutch Taxonomy. The focus will be on the steps of the process within the tax office; the actual law making remains out of scope. This twofold use of semantic metadata poses additional demands on management and governance.

2.3.3 Case study features
These cases were selected on a number of common criteria, but are also complementary in order to cover the entire spectrum of e-government institutions within the delineation. The cases differ enough to ensure the reference architecture is generic and not fitted to a particular situation. Yet the cases have enough in common to compare and contrast the findings.

The following three elements make the cases complementary. First, the tax office operates at the national government level while youth care is a government task delegated to the provincial and municipal level. Second, regarding the creation and governance of metadata the tax office is a body that specifies the semantic metadata, albeit with input from others. In the youth care sector this situation is reversed: much of the metadata, such as definitions, is designed at higher levels of government or must conform to partners in the information chain. Third, the tax office receives highly structured data, mostly in the form of numbers, some of which are structured in tables or other listings. The youth care sector has many ill structured information products, with most of the data stored in plain text format. Semantic metadata applies to all types of data, but the format may require or enable a specific approach.

Table 1 shows how both cases relate to each other and which properties are complementary.

Property | Youth care case | Tax office case
Data type | Information products mainly consist of written text. | Information products mainly consist of figures and tables with limited space for text.
Data structure | Limited structured data with contents varying per case | Highly structured data with many near similar products
Flow of data | Reciprocal data streams varying in intensity per case | Well defined and scheduled one way data streams
Collaboration with partners | Cooperation | Hierarchy
Government level | Provincial and municipal level | National government level
Size of organization | Several hundred employees | About 33,000 employees
Table 1: Features of complementary cases. Created by author.


2.4 Reference architecture
The main deliverable of this research project is an evaluated reference architecture, since that is the format chosen for the answer to the main research question. This chapter starts off with a definition of what a reference architecture is and which capabilities are credited to a reference architecture. Subsequently, the design principles and tradeoffs are presented as the core elements of the reference architecture. This chapter finishes with a section on the scientific value of an ‘evaluated’ reference architecture and a section on quality indicators and evaluation criteria.

In short, the chosen format for the reference architecture has the following characteristics:
 Very broad scope that ranges from technology to processes, resulting in limited granularity.
 Preliminary version based on literature, evaluation based on two case studies and experts.
 Consisting of both prescriptive design principles and tradeoffs that leave room for leeway.
 Emergent behavior: can be implemented per organization to achieve system wide effects.

2.4.1 Scope & granularity
A reference architecture is a general, non-specific approach that provides a holistic view on a certain architecture archetype (Angelov & Grefen, 2008; Robertson, 2001). Reference architectures differ in scope and granularity. The scope determines the span, the number of topics that are incorporated. The granularity is the level of detail in which these topics are addressed. As such, reference architectures may range from very detailed, like many IEEE models (Bass, Clements, & Kazman, 2003), to high level architectures based around design principles, such as the NORA (NORA, 2010). Examples of elements that can be included are process flows, roles, responsibilities, rules, guidelines, standards, system designs and data models.

Many reference architectures focus on software design and the scope usually includes the software and underlying technical architecture (Kazman et al., 1998). In this thesis an information system is defined to consist of people, processes, data and resources. This makes the scope of this research rather broad, since processes regarding cooperation and organizational alignment are added. Therefore, given the constraints on time and effort, the granularity is limited.

2.4.2 Reference architecture capabilities
According to Muller (2011) a reference architecture may facilitate multi-organization system creation and life-cycle support, especially in areas of increased complexity, scope and system size, and of increased integration dynamics due to multiple organizations. The use of a reference architecture may…
… provide guidance, principles and best practices.
… manage synergy and individual gains.
… capture and share architectural visions.
… provide an architecture baseline and blueprint.
… provide a common lexicon among partners.
… emphasize explicit modeling of functions and qualities above the systems level.
… emphasize explicit decisions about compatibility, upgrade and interchangeability.


The combination of the listed capabilities may make it easier for those tasked with implementing a semantic metadata architecture and corresponding management structure in a PPIC to achieve interoperability between many different and ever evolving system elements. These capabilities are valuable given the challenges presented in chapter 4.2.

2.4.3 Design principles
The main component of the reference architecture in this research is the set of design principles, see section 1 of this chapter. Design principles are the key element of principle-based design, which is to result in “a prescriptive theory which integrates normative and descriptive theories into design paths intended to produce more effective information systems” (Walls, Widmeyer, & El Sawy, 1992). The Open Group has defined principles, in the area of information technology, as “general rules and guidelines, that are intended to be enduring and seldom amended, that inform and support the way in which an organization sets about fulfilling its mission” (TOGAF, 2004). Bharosa (2011) has defined design principles as “normative and directive guidelines, formulated towards taking action by the information system architects”. These descriptions indicate that design principles are prescriptive but focus on goal attainment rather than compliance.

In this research the design principles are structured in the same format as detailed in the TOGAF architecture (TOGAF, 2007). This means that each design principle is captured in a short unambiguous statement which is provided with a name, rationale and implications.
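The TOGAF-style structure described above can be sketched as a simple record. The following Python snippet is purely illustrative: the class layout mirrors the four parts named in the text (name, statement, rationale, implications), while the example principle content is invented for this sketch and not taken from the thesis.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DesignPrinciple:
    """One TOGAF-style design principle record (illustrative only)."""
    name: str       # short handle for the principle
    statement: str  # the short unambiguous prescriptive statement
    rationale: str  # why the principle matters
    implications: List[str] = field(default_factory=list)  # consequences of adoption

# Hypothetical example entry, not a principle from this research:
principle = DesignPrinciple(
    name="Shared semantics",
    statement="All partners in the chain describe exchanged information "
              "with the agreed common set of semantic metadata.",
    rationale="Avoids per-partner translation effort in the chain.",
    implications=["Partners must map local data models onto the common set."],
)
print(principle.name)
```

Capturing each principle in such a uniform record makes the set easy to review against the quality indicators discussed later in this chapter.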

2.4.4 Tradeoffs
The reference architecture is a balancing act. On the one hand there must be enough rigor to provide interoperability. On the other hand the design that follows from its use must leave enough leeway to fit organizations with different characteristics or changes in functional requirements. This mixture of rigor and leeway has been achieved by using both design principles and tradeoffs.

Aside from the design principles the reference architecture consists of a number of tradeoffs. These tradeoffs are relevant topics related to metadata management in PPICs for which the implementation may strongly differ per situation. The term “tradeoffs” has been deliberately chosen to illustrate that these topics together form the metadata management infrastructure. According to Bass et al. (2003) part-whole decomposition enables a system to achieve modifiability and integrability qualities. The tradeoffs provide the ability to create different configurations as the designer sees fit.

The design principles can be seen as tradeoffs as well, but unidirectional ones: they are to be fulfilled as far as resources allow. The tradeoffs in this context are decisions on system characteristics. Functionality is the primary criterion, with resources being a secondary criterion.


2.4.5 Credibility
A reference architecture is the outcome of a design process, as indicated in the section on design science. As with any design, its actual suitability for the intended purpose is to be determined by some form of evaluation. Evaluation adds to the credibility of the design and corroborates existing scientific knowledge or adds new knowledge.

Regarding a reference architecture, three levels of credibility can be distinguished (Clements, Kazman, & Klein, 2001). At the conclusion of the design phase it can be regarded as a preliminary architecture, since it has not been tested in any way. Following expert validation and/or a case study it may be regarded as an evaluated architecture. Once it has been applied by the intended end user for its intended purpose it becomes a tested reference architecture (Kazman, et al., 1998).

In this research project a preliminary architecture is developed based on literature and evaluated using two explorative case studies. In the future it may be applied and become a tested reference architecture.

2.4.6 Quality indicators
The reference architecture is evaluated in two ways. First, the content is evaluated using the case studies and experts. Second, the format is evaluated, to see whether the architecture meets the demands set for its purpose and format. These can be seen as analogous to functional and non-functional requirements in software design. A list of quality indicators has been composed, shown below. The format is evaluated in chapter 10.2.

Common quality indicators
The format of the reference architecture is evaluated by checking its conformity to a number of quality indicators that were drawn up before the start of this research. The quality indicators are derived from literature (Angelov & Grefen, 2008; TOGAF, 2004). Additionally, during the interviews the experts were asked about their expectations of a reference architecture for metadata management. Table 2 lists the reference architecture quality indicators.

The reference architecture should…
… provide a holistic view on semantic metadata management in order to show interdependencies and trade-offs.
… address multiple stakeholder perspectives and roles.
… be a generic solution that is context and vendor neutral.
… be based on a scientific foundation and apply real life best practices.
… be able to deal with applicable laws and regulations.
… be concise, understandable and easy to communicate.
… merely assist [the real life designer] and leave as much design space as possible to accommodate a specific implementation.
… be understandable to people within the organization with various backgrounds.
Table 2: Reference architecture quality indicators. Created by author.


Quality indicators applying to design principles
TOGAF (2007) states a number of quality indicators that specifically relate to design principles. Given that the design principles are the most important part of the reference architecture and shape the final design, special attention to the design principles is prudent. The TOGAF quality indicators relating to design principles are listed in Table 3.

Quality indicators for design principles
Understandable | The intention of the design principle should be clear and unambiguous. They should be understood by individuals throughout the organization.
Robust | The design principles should be sufficiently definitive and precise to result in near similar solutions in near similar situations.
Complete | The design principles should cover any situation within the scope of application. They should also cover all topics that matter within the subject.
Consistent | The design principles should be consistent. Adhering to one design principle should not exclude adhering to another design principle.
Stable | The design principles should be enduring, yet be able to accommodate changes.
Table 3: TOGAF quality indicators relating to design principles. TOGAF (2007)


3 Public Private Information Chains
The topic of this research is semantic metadata management. This topic is rather comprehensive and is therefore delineated to the domain of Public Private Information Chains (PPIC), as shown in chapter 1.4. In order to substantiate the PPIC as a research domain, two topics are introduced. The first is a short introduction to bureaucratic processes in government, providing a historic perspective. This is followed by a definition of PPICs in the second section and two examples in the third section.

3.1 Public policy and bureaucratic processes
In Western society, laws and policy are implemented by the executive branch of government. The government is not a single entity but consists of numerous public bodies, each with its own area of responsibility and services portfolio. The executive public bodies perform a wide range of information intensive services, many of which have to do with regulation and decision making in individual cases. These services often require interaction with citizens, businesses and other government bodies (Wimmer, 2002). In order to carry out those responsibilities, public and private parties need to exchange information frequently. This need for information exchange is either a requirement by law, related to one of the primary processes, or in the interest of one of the parties. The nature of this information exchange is very diverse: it may relate to permits, accountability, taxes, statistics, subsidies and even primary processes of public and private parties alike.

Since the Industrial Revolution, information exchange in this context has been increasingly standardized in a trend called bureaucracy. In analogy to standardization in the production of physical goods, the information product and its production process have been standardized. For information processes this means that both the format of the information and the execution of transmission, transformation and decision making have been standardized (Albrow, 1970). Similar to the production of physical goods, information processes have been divided into well defined steps that are usually carried out sequentially, sometimes by different parties. Albrow lists a number of perceived benefits of bureaucracy:
 Transparency: insight into processes for both the subjects and management.
 Predictability: the use of logic makes the outcome of the process deterministic.
 Quality: due diligence can be proven and the process can be easily reviewed.
 Equality and objectivity: the use of logic restricts subjectivity.
 Efficiency: standardization allows for specialization and economies of scale.
 Reduction of complexity: decomposition of complex tasks makes them easier to understand.

Aside from benefits there are also perceived drawbacks of bureaucracy. Bureaucracy nowadays has a negative connotation of being costly and time consuming. Bureaucracy is also said to have become a goal in itself, resulting in more distance from the underlying purpose. This turnaround in connotation is described by the following quote: “It is hard to imagine today, but a hundred years ago bureaucracy meant something positive. It connoted a rational, efficient method of organization – something to take the place of the arbitrary exercise of power by authoritarian regimes. Bureaucracy brought the same logic to government work that the assembly line brought to the factory. With the

hierarchical authority and functional specialization, they made possible the efficient undertaking of large complex tasks.” (Osborne & Gaebler, 1993).

The automation of public private processes
The perceived cost of bureaucratic processes in time, effort and money has led to a desire to reduce the administrative burden. This can partly be achieved by abolishing some processes, but that is only a partial solution since most processes are deemed necessary to execute public policy. Therefore most attention is focused on more efficient execution of these processes (OECD, 2003). The high level of standardization and the extensive application of logic mean that bureaucratic processes lend themselves very well to automation. Since the 1980s two trends in automation have completely transformed the characteristics of bureaucracy.

First, in both private parties and public bodies many processes carried out by people using paper have been supplemented by IT-systems or even replaced entirely (Sbodio, et al., 2010). The result is that time and money are saved in every value adding step in the information chain. IT is unable to make professional judgments, but is very capable of carrying out logic such as classification based on characteristics. Additionally, many overhead tasks are simplified significantly using IT. An example is duplication of a document, which requires far less effort on a computer than on paper.

Second, with the rise of the internet many of those IT-systems have become interconnected, also across organizations. This allows for better cooperation, at the cost of more complex structures for sharing information (Janssen & Van Veenstra, 2005). The reduction in transaction costs has saved even more time and money.

3.2 Characteristics of Public Private Information Chains
Together, the presented trends regarding the automation of bureaucracy in government have led to what are called Public Private Information Chains (PPIC) in this research. The common terminology “e-government” is deliberately avoided. There are many definitions of e-government, many of which are either very broad or even contradictory. No clear threshold or definition exists to determine whether a public body is part of the e-government or not. Often the term e-government stands for electronic government and refers to the use of information technology by the government to provide information and services to citizens, businesses and other government bodies. This definition is much wider than the exchange of information. Introducing the term PPIC allows for an unambiguous description of the research domain.

A PPIC is defined in this research as a digital information chain spanning multiple organizations that is centered around a certain type of information and has a high rate of repetition and mutual responsibilities and dependencies. PPICs have shown great benefits, reducing costs and time significantly. However, many inefficiencies remain. Most information systems in use predate the PPIC and have not been designed to be interoperable in this way. Note that in this context information systems are mentioned, not IT-systems. An information system is defined as the whole complex of people, IT-systems, data and processes. Through web services and standardized data models many IT-systems have been made interoperable. However, since the organizations within a PPIC are heterogeneous, information is used for different purposes by experts of different backgrounds, posing different quality needs.

The transaction costs incurred in information exchange do not only include transmission; the majority consists of translation costs. Translation is the effort of interpreting information and refitting it for another information system. An option would be to get the information from the source, avoiding translation in all other links in the chain. However, this is often not an option since intermediary information products are shared and each link in the information chain adds a unique value. Therefore the key to increased efficiency and improved information quality lies in using the same semantics within the chain to describe information, leaving room for diversity in the information itself. This does away with the translation costs. The effort of aligning the various IT-systems, data models and stakeholders is called semantic metadata management.

Characteristics of information chains
Throughout this research organizations and systems are presented as architectures. An architecture is defined in this research as the constellation of people, processes, data and systems regarding a certain subject. The word architecture is used in order to indicate that such constellations can be analyzed and redesigned. In concordance with this descriptive view of architectures, the notion of a Public Private Information Chain (PPIC) is put in perspective by means of the theories of value chains and the information supply chain.

A value chain is a concept from business management introduced by Porter. It describes how products undergo a series of primary business activities that add value to the product. Information products and decision making processes are also subject to a value chain (Rayport & Sviokla, 2000). This value chain may take place within the boundary of a single organization or extend over a network of organizations. According to Rayport and Sviokla (2000) the value adding steps that can be identified in both structured and non-structured information products are gathering, organizing, selection, synthesis and distribution. Hoffman et al (2010) speak of analysis, interpretation, decision making, distributing, transmitting, verifying, reconciling, correcting, rekeying, generation, discovery and gathering.

Sun and Yen (2005) introduce the notion of an information supply chain and compare it to supply chain management theories that normally apply to tangible goods. They observe that information products are undergoing similar transitions as tangible goods. As with physical products the information product also matures in the value chain. Instead of fully finished products a lot of intermediate products are exchanged within the chain. These intermediate products, when joined together as a finished product, result in a better product or a similar quality product with lower costs. Each link in the chain adds its specific values.

In contrast to supply chains, the information supply chain possesses certain unique characteristics that tangible goods do not. Information products can be easily multiplied, especially when they exist digitally. Transportation and distribution also differ significantly. Moving a car or toaster to the other side of the world takes a day by plane and weeks by sea. Most information can be transported near instantaneously to any location by fax, email or web services. Therefore the need for warehousing (storage near the client) or demand estimation does not play a major role according to Sun and Yen. Additionally, most tangible goods are well structured, unless made or designed to order (Platier, 1996). Regarding information there are also many non-structured or ill-structured products. Even the same type of document with the same type of content often differs due to the author’s own habits and style of writing, although formats do increase the level of structure.

3.3 Public Private Information Chains in practice
The concept of public private information chains is illustrated by two examples. These also introduce the context in which the two case studies are situated. Additional information on the background of the cases, the nature of these information chains and the challenges in these cases is presented in the case studies in chapters 0 and 8.

Child protective services case
Bureau Jeugdzorg (BJz) has a coordinating role regarding the various agencies that support the families placed under their watch. Additionally, they are to constantly monitor the safety of the children. Reinforced by a string of incidents causing nationwide headlines, there was a desire for improved monitoring of safety. As a result information was to be obtained from more sources and at a higher frequency. Consequently the information chain grew steadily. The increased frequency also led to more data, which conflicts with the desire for a faster throughput and review of that information. Presented with these challenges, there are various initiatives to automate these new information exchange processes. All these initiatives take place in a context in which processes are redesigned and rationalized to reduce arbitrariness. Also, the intellectual capabilities of the employees are to be preserved for making judgments, not wasted on simpler forms of information processing and the drafting of reports.

Tax office case
Both on the municipal and national level the government aims to reduce the administrative burden on citizens and companies. As a result digital information flows are added to the existing paper based information flows. In order to orchestrate those flows and to provide a one-stop shop, independent government agencies have been established, such as Logius and Agentschap NL in the Netherlands. Another trend is that national government bodies strive not to request the same type of information twice. For instance it is preferable that an address needs to be changed in only one location, not a dozen times. Government bodies that previously had nothing to do with each other now have to interact digitally since they need to exchange information required for their primary processes. This results in a variety of public private information chains which are interlinked and form a network.


4 Semantic metadata: potential and challenges
This chapter details the motives for using semantic metadata and the challenges related to its implementation. Section 4.1 lists the potential of and the motives for using semantic metadata. Many motives are listed since semantic metadata may impact various parts of the organization. Which benefits are realized depends on the specific implementation of semantic metadata. Section 4.2 lists the challenges imposed on those who are to design and carry out semantic metadata management. These challenges are divided into the areas of technology, data and stakeholders in order to structure the problem. This chapter ends with conclusions and a view on the tradeoff between benefits and challenges.

4.1 Potential benefits of standardizing semantic metadata
This section provides insight into the potential benefits that semantic metadata management may offer within and amongst organizations. Semantic metadata management has no direct contribution, but it is a prerequisite for having a common set of semantics across organizational barriers, which does have benefits. Semantic metadata management can be viewed as a standardization effort of semantic metadata within and amongst organizations. The benefits of this standardization are presented first. Moreover, having a well managed set of semantic metadata allows for further automation of public processes that are carried out in chains (Fokkema & Hulstijn, 2011).

4.1.1 Effects of standardization
Semantic metadata management can be viewed as a standardization effort of semantic metadata. Whether employed within an organization or across organizational borders, it is aimed at increasing levels of conformity and reuse. Standards allow each individual entity, such as an organization within a PPIC, to conform individually, with an emergent performance increase across multiple entities as a result. This increases the performance of the PPIC as a whole (Egyedi, 2003). The advantages of standardization in PPICs are illustrated by the following three views.

Interoperability
The use of a common set of semantics improves interoperability in information exchange. Within an information chain this means that information can flow more easily, reducing transaction costs, throughput time and errors in translation. Houtevels (2010) describes the costs of closed world systems, where costs are incurred for translating and interpreting inbound information and for translating and compiling outbound information. The use of standards, both on the technical and the content level, allows for open world systems. An open world system is designed with inter-organizational information exchange in mind.

Reduction of complexity
Standardization enforces a form of structure. Categorization, which semantic metadata provides, is one form of structure. Standardization of semantics also reduces the amount of metadata through reuse. Both structure and data reduction allow for better insight. Insight into and understanding of the organization and its cross-organizational processes is a prerequisite for being in control. This in turn is much desired by those managing the organization. It allows for evaluating and increasing operational performance. Moreover it aids in accountability for production, spending, process compliance and performance.

Conformity of content
Semantic metadata is directly linked to the information contained within the core data. Standardization of semantic metadata results in a common vocabulary for describing data, lowering the chances of miscommunication. Subsequently, common metadata eliminates the need for interpreting and translating inbound and outbound information. This results in improved information quality throughout the information chain. The information quality is improved even further when quality indicators are added and the quality of information is actively monitored (Bharosa, 2011).

4.1.2 Possibilities of automation
The second motive for standardizing and managing semantic metadata is further automation of processes. Since the advent of the computer, more and more tasks previously carried out by people are being automated. A well managed and common set of semantic metadata is a precondition for and enabler of various types of automation which may be desirable but cannot yet be implemented. On the one hand these forms of automation may counter the side effects of automation presented in the problem statement. On the other hand they may further reduce paper based processes and human effort.

Compliance by design
In the public sector processes need to conform to numerous legal requirements. The government is not only a source of regulation, but is also highly regulated itself. Compliance depends on the way the processes are set up and how strictly these processes are adhered to. Computer systems are deterministic by nature, lacking any form of arbitrariness humans might display. Once the legal requirements are correctly embedded within the design there is reasonable assurance that the outcomes of the process will be compliant too (Fokkema & Hulstijn, 2011). Therefore the use of computer systems in the public sector allows for compliance by design.

Business intelligence
Having correct and consistent semantic metadata is of key importance for various business intelligence and data mining applications, which in turn are key to reducing human effort. Statistics based business intelligence allows for projections of trends, insight into production and process flows, and consistency checks. Rule based business intelligence allows for classification, input assistance and fault reduction (Witten & Eibe, 2005). Especially within the public sector the use of classification is appealing, since laws and regulations clearly indicate which persons or entities have which privileges and responsibilities. Classification is already common for numerical data, since attaching metadata to numbers is common practice and numbers have the right level of granularity. A good example is the tax office application for filing personal taxes. By providing several values the application itself determines whether you are entitled to certain tax advantages.
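As an illustration, this kind of deterministic, rule based classification can be sketched as follows. The rules, thresholds and function name are invented for this example and do not reflect actual tax law:

```python
# Hypothetical sketch of rule-based classification: given a few provided
# values, explicit rules decide whether a (fictitious) tax advantage applies.
# Because the rules are deterministic, every case with the same inputs is
# classified the same way, without human arbitrariness.

def eligible_for_advantage(age: int, annual_income: float, has_children: bool) -> bool:
    """Classify a taxpayer using explicit, auditable rules (illustrative only)."""
    if age < 18:
        return False                      # minors file via a parent
    if has_children and annual_income < 30000:
        return True                       # low-income family advantage
    if age >= 65 and annual_income < 20000:
        return True                       # senior low-income advantage
    return False

print(eligible_for_advantage(40, 25000, True))   # True
print(eligible_for_advantage(30, 50000, False))  # False
```

Because such rules operate on well-defined fields (age, income), they only work when the semantic metadata guarantees that each value means the same thing in every system that supplies it.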


Drafting of reports
Within a PPIC much information is exchanged. Usually this information is contained in a report of some sort: a (digital) document which provides a certain set of information for a certain purpose in a presentable form. Since a form of professional judgment is required in compiling reports, this often takes a lot of human effort. Jans (2007) divides this effort into five steps: overview, selection, ordering, describing and compiling. These are briefly presented. First there needs to be an overview of all information which might be relevant given the purpose of the report; this becomes easier when search criteria match the content well. Then a selection needs to be made, which is easier when excerpts can be reused. Subsequently this information needs to be ordered in a way that makes sense. Fourth, there needs to be a form of description of the purpose, content and quality assurance of the document. Finally the report needs to be compiled in a desired format. Semantic metadata can aid in all five steps, providing the end user with much better insight into the available information. For reports of a predetermined format many of these steps may even be fully automated.
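The five steps can be sketched in code. The items, tags and field names below are purely illustrative, and the semantic metadata is reduced to simple tag sets:

```python
# Illustrative sketch of Jans's five report-drafting steps (overview,
# selection, ordering, describing, compiling), where semantic metadata
# (here: tag sets) supports each step. All data is hypothetical.

items = [
    {"text": "Income rose 5%.",    "tags": {"finance", "2011"}, "order": 2},
    {"text": "Costs were stable.", "tags": {"finance", "2011"}, "order": 3},
    {"text": "Staff picnic held.", "tags": {"social"},          "order": 1},
]

def draft_report(items, required_tags, purpose):
    overview = items                                                 # 1. overview of candidates
    selected = [i for i in overview if required_tags <= i["tags"]]   # 2. selection via tags
    ordered = sorted(selected, key=lambda i: i["order"])             # 3. ordering
    header = f"Purpose: {purpose}"                                   # 4. describing the report
    return "\n".join([header] + [i["text"] for i in ordered])        # 5. compiling

print(draft_report(items, {"finance", "2011"}, "Annual summary"))
```

In this toy version the selection step works only because the tags are consistent across items, which is exactly what standardized semantic metadata is meant to guarantee.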

4.1.3 Conclusions on potential benefits of semantic metadata
Information exchange and (human) processing have much to gain from semantic metadata, especially when standardized and exchanged alongside the core data. The actual benefits that are realized depend on the actual implementation. In any case a form of management is required in order to achieve these benefits. Many of the stated benefits are overlapping, interrelated, or could even be mutually exclusive. Therefore there needs to be a focus on which benefits are desired most.

Hoffman et al (2010) describe how the advantages of standardization and automation reinforce each other. This matches the observation that benefits are overlapping and interrelated. As an example they describe the generic processes regarding the drafting of reports and the exchange of information, as seen in Figure 4. This figure uses arbitrary numbers but gives a rough indication of the benefits of standardization. In short, the tasks that are necessary but considered a waste of human intellect require less effort. This allows more human attention to go to areas in which people have a distinct added value, such as analysis and decision making. Alternatively, the combination of semantic metadata and further automation allows for lower costs and/or reduced process time.


Figure 4: An indication of where semantics combined with automation can provide gains by reducing the time spent on tasks with very little added value. Adapted from Hoffman et al.

The described processes can be found in nearly every type of organization. For some it is their core business, for others it is in support of their primary processes. The benefits of semantic metadata standardization are applicable to three generic groups (Hoffman, et al., 2010). These three groups include all who are potentially impacted by semantic metadata in their daily activities:
1. Those responsible for specifying metadata. This group consists of those who are responsible within an organization for determining which metadata is used and for coordinating what definitions are used. In the context of this research this group extends beyond a single organization to those cooperating in a consortium of organizations.
2. The creators of information. Those responsible for registering new information and creating new information products. Semantics ease registration efforts and result in higher-quality input information, improving performance in later processes.
3. The consumers of information. Those who receive information and need to use it for any task other than creating information. That task is made easier since the metadata potentially adds various benefits regarding interpretation, verification, validation, analysis, trust and extraction of information.

Organizations that operate within a chain are subject to all three categories. Inbound information needs to be interpreted. Outbound information is to be composed and drafted in various formats. Meanwhile metadata must be perpetually specified and reviewed for internal and external use, even when common standards are in use. The first category is encompassed by metadata management and enables the other two.


4.2 Challenges regarding semantic metadata management
System architects who are to implement semantic metadata management face a number of challenges, which make implementing semantic metadata management in PPICs complicated. The challenges regarding managing the use of semantic metadata in a PPIC are very diverse. Aside from the actual content of the semantics, the challenges range from technology and information exchange to cooperation and trust among stakeholders. The root of all the challenges lies in the fact that a PPIC is an information chain that consists of various heterogeneous organizations. The metadata management effort takes place both within each organization and over the chain as a whole. Metadata management efforts in PPICs have been on the agenda since the early 2000s (Sbodio, et al., 2010), but actual implementations and best practices are limited.

4.2.1 Impact on enterprise architecture
In this thesis semantic metadata management is reviewed from both an organizational and an inter-organizational view. These views are structured by plotting the challenges on a three layer enterprise architecture. The three layers are the technology level, the data level and the process level. This three layer model originates from the ArchiMate standard. It was chosen as it provides structure and is simple enough to communicate. The ArchiMate standard was developed to describe information system architectures in public and private organizations (Lankhorst et al., 2008). In ArchiMate a three level enterprise architecture is used to represent organizational structures, for both public and private organizations. These levels are the technology level, the data and application level and the business level. This three level view conforms with the Federal Enterprise Architecture Program of the US federal government (FEA-PMO, 2007).

The TBM Enterprise Architecture model by Janssen (2009) is shown in Figure 5. This model uses five levels and clearly shows how the elements on different levels interact. The TBM-EA model also shows the context of an enterprise architecture:
- The enterprise architecture is influenced by the business and stakeholder environment. The enterprise architecture must fulfill the needs posed by the business, which in turn is influenced by the stakeholder environment.
- The enterprise architecture is governed by the managers and architects responsible for the business processes, information architecture, applications and technical infrastructure. The architecture is never static but continuously changes. These changes may be ad hoc or part of a pre-planned growth path.
- Finally, the enterprise architecture is implemented, translating plans into reality.


Figure 5: TBM Enterprise Architecture meta-framework. By Janssen (2009).

Inter-organizational alignment
The target audience of this research operates within their own organization, in which there are dynamics and dependencies both within and between the three enterprise levels detailed in the next section. Aside from their own organization, those tasked with implementing semantic metadata management are confronted with other organizations in the PPIC over which they have no direct control. Figure 6 provides insight into the complexity and dependencies that make alignment difficult to achieve. It shows a chain of five organizations, each of which needs to apply a form of semantic metadata management. These management efforts can range from very extensive to being as limited as simply implementing the common standard.

Figure 6: Graphic representation of an information chain, showing that each organization within the chain has an enterprise architecture. Alignment of semantics takes place both within and amongst organizations. Created by author.


4.2.2 Challenges on sub domains
For each of the three levels an overview of the challenges is given.

Technology level
The main technological issue is that there is no green field regarding technologies. Semantic metadata initiatives have to cope with existing IT-systems. Many organizations still employ legacy IT-systems that have been developed, or even purpose built, many years ago. This has the following consequences:
- Many organizational information systems are ‘closed world systems’: they have not been designed to be interoperable with other systems, let alone with systems outside of the organization.
- Development of dedicated IT-systems, or the configuration of generic systems for that matter, is costly and very time consuming. Many specialist systems are not available off the shelf and the replacement of large IT-systems takes years. Hence there is no green field in which to implement external semantic metadata; legacy systems are a given context.
- The staff in the IT-department and the primary process employees are accustomed to the existing systems. This hinders both the acquisition of new systems and the implementation of external semantic metadata, which is new to many organizations.

These three technological issues are applicable to all IT-innovations and are not specifically related to semantic metadata. This means that other IT-projects may be influenced by metadata management. The chapter on potential benefits speaks of enabling other IT-projects, but they may also be impaired. Given that the central challenge in metadata management is alignment, these technological challenges are very relevant. A further complication is that these challenges exist within each organization in the PPIC.

Data level
Digital information exchange requires alignment between the two exchanging parties (Delone & McLean, 1992). When computer systems are in use but information exchange takes place in a non-digital manner (on paper or verbally), there are human ‘translators’ on either side of the exchange. Natural language and figures form a common standard both parties adhere to. The translators interpret the message and enter the information in a manner that suits their information system. Digital information exchange does not automatically remove the inefficiency of having these translators, since different data models and definitions may be in use.
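A minimal sketch of this translation effort, assuming two hypothetical data models that name the same concept differently; all field names are invented for illustration:

```python
# The 'translator' problem in miniature: system A and system B store the
# same concept under different field names, so an explicit mapping must be
# built and maintained for every pairwise exchange in the chain.

record_system_a = {"annual_income_eur": 42000}

# Explicit mapping from system A's data model to system B's data model.
FIELD_MAP = {"annual_income_eur": "income"}

def translate(record: dict, field_map: dict) -> dict:
    """Rename fields so system B can ingest system A's record."""
    return {field_map[key]: value for key, value in record.items()}

print(translate(record_system_a, FIELD_MAP))  # {'income': 42000}
```

With n parties and no shared semantics, up to n(n-1) such mappings may be needed; a common set of semantics replaces them with one mapping per party.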

A workaround for this problem, used primarily since the 1990s, is the use of plain text messages during the exchange, as if it were paper, leaving all of the interpretation to the end user of the information during reading. For example the US Securities and Exchange Commission (SEC) uses the XBRL standard to interact with publicly traded companies, but no semantics have been defined. A recent study shows that there is only a 2% commonality in semantic metadata (Bergeron, 2003). This means that the need for translation and interpretation by the end user is equal to receiving the same information on paper.

A solution to this problem would be the common application of the semantics in use. This requires that semantic metadata either be included within the message or be referred to in a manner accessible to the recipient. This could be a standard common to both parties or a publication of the semantics by the sender of the message (Brandt, Miller, Long, & Xue, 2003).
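A minimal sketch of such a message, assuming a hypothetical JSON format in which semantics are both referenced via a published vocabulary and embedded inline; the URL and field names are invented:

```python
import json

# Two options for making semantics accessible to the recipient, combined in
# one message: a reference to a published vocabulary (shared standard), and
# inline semantic metadata (unit, definition) travelling with each value.

message = {
    "semantics_ref": "http://example.org/vocab/profit-statement-v1",  # published semantics
    "data": {
        "net_profit": {
            "value": 12000,
            "unit": "EUR",                       # inline semantic metadata
            "definition": "income minus costs",  # inline semantic metadata
        }
    },
}

print(json.dumps(message, indent=2))
```

The recipient can then interpret `net_profit` without a human translator, either by resolving the referenced vocabulary or by reading the inline metadata directly.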

Process level
Management processes are very important for semantic metadata management. They directly touch upon the metadata lifecycle and the performance of the system (Kimball, Reeves, Ross, & Thornthwaite, 2002). The management processes take place on two levels: within the organization itself and within the information chain.

Within the organization the management focuses on the metadata lifecycle. Metadata must be specified, made available, verified and eventually deleted. With many IT-systems the management processes can be carried out as a standalone effort; with metadata management this is not possible. Semantics serve multiple end users and span various primary processes, technologies, data models and even organizations. Alignment is critical to stay in sync with all those other elements in the organizational architecture.

Cooperation among stakeholders poses a number of stakeholder related challenges (De Bruijn & Ten Heuvelhof, 2007). These challenges are grouped into three categories. In appendix 12.1.3 these are presented in detail.
- The first category of stakeholder related challenges applies to all projects with multiple stakeholders. These generic challenges need to be overcome in nearly any project. For instance, it is hard to determine exactly which stakeholders are involved and what their motives and interests are.
- Operating within a chain means that responsibilities and efforts are being redistributed (Fokkema & Hulstijn, 2011). Transitioning from a closed world organization to a situation in which information from various third parties is used requires organizational change. Before people and organizations are willing to be dependent on other parties there needs to be a level of trust, for instance in the form of agreements or transparency.
- Infrastructures have a number of unique features that provide their own set of specific stakeholder related challenges (Blecker & Kersten, 2006). These are also present in information technology based infrastructures such as metadata management in information chains.

4.2.3 Conclusion on challenges
The enterprise architecture indicates that there are challenges regarding semantic metadata management on various levels. The differentiation into technology, data and processes is artificial but gives a good indication of the wide variety of topics that relate to metadata management. For each of these topics there are different subject-matter experts, managers that are accountable and end users that want or need to be included in the design process. This applies to every organization within the information chain.

Semantic metadata management is partially complicated, but most of all it is complex. There is a difference between complicated and complex. Anderson (1999) defines complex as the opposite of independent, and complicated as the opposite of simple. Individually each of the presented challenges can be overcome. Even though some challenges are intrinsically more complicated than others, for each individual challenge one or more acceptable solutions can be devised. However, devising a solution that fits all challenges is difficult since every action will also impact another part of the system. All parts of the system are interconnected. Decisions or the given situation in one area will limit or open options in another area. For instance, the option technically most suitable for data transmission may not fit the type of semantics or the data model in use. The optimal data model may not be acceptable to some partners in the PPIC.

Aside from the interrelation of challenges there is another complicating factor: dynamics. None of the described challenges are static. All factors change over time due to continuous developments. New technologies may arise, processes adapt to changing needs and laws, people change their minds and partnerships evolve.

In summary, this complexity stems from a number of factors, including:
- The number of components;
- The number of relations;
- The number of processes;
- The number of stakeholders;
- The number of interests per stakeholder;
- The dynamics of each stated factor.

4.2.4 The balance between potential benefits and challenges
A characteristic of semantic metadata management is that it requires a significant short term investment for long term gains. A wise approach would be to perform a cost benefit analysis before starting on semantic metadata management in a chain. In some cases the effort of overcoming the challenges may indeed not be worth the gains. On the other hand, given the large design space there probably are some improvements that can be made.

Not pursuing semantic metadata management in the scope presented in the problem statement will possibly force all parties to carry out mitigation efforts somewhere in the future. These individual mitigation efforts are likely less beneficial than combined efforts. Combined and coordinated efforts may lead to a situation in which the gains are greater than the sum of all effort. Whether the benefits of semantic metadata management truly outweigh the cost of overcoming the challenges depends on the actual situation at hand and what solution is ultimately achieved.

Risk management theories argue that knowing what challenges and risks may occur allows for preparation, thus increasing the chances for success. Knowing the risks allows them to be treated, transferred, terminated or taken. The evaluated reference architecture developed in this research project may be used to identify what risks may occur.



5 Aspects of semantic metadata management in literature
This chapter describes what technological and organizational aspects should be incorporated in the reference architecture according to literature. The aspects are divided into three perspectives, in concordance with the enterprise architecture approach presented in chapter 4. The three perspectives are data, technology and processes. Within each perspective a number of aspects are covered. Each aspect ends with a summary of best practices presented in literature. Together, these best practices make up the preliminary reference architecture described in chapter 6.

5.1 Semantic metadata management: Data perspective
The data perspective is chosen as the first domain since the definitions needed to understand the other domains are introduced here. The introduction to semantic metadata is followed by a description of how the use of a common set of semantics benefits information quality. The third aspect discussed in this section is the relation between semantics and business rules.

5.1.1 Introducing semantic metadata
This section introduces semantic metadata and the need for consistency in order to preserve the relation between data and its actual meaning. Complementary to this, Appendix 0 provides an overview of metadata typologies and of the relation between semantic metadata and other types of metadata.

Semantic metadata
Semantic metadata provides semantics, meaning and context, to a data element (NISO, 2004). Semantic metadata may include tags, labels, definitions, context, concepts, units, references and notes. In short it represents all data that adds context to other data. The data that is being placed in a context is also referred to as core data. Unlike other types of metadata, semantic metadata is of particular interest to the human end user who is using data for a certain purpose or task (Borghoff & Pareschi, 1997). Implementations of descriptive metadata can vary. It may be generic or specific to a single data element, and can be stored externally or with the data itself (Elmasri & Navathe, 2007). Semantic metadata is also directly linked to information quality indicators, such as origin, owner, age and mutation (Strong, Lee, & Wang, 1997).

Interrelations among metadata types

Administrative and structural metadata are out of the scope of this research, as indicated in chapter 1.4. Nevertheless, there are some relations between semantic metadata and these other types of metadata. Although not related to meaning, structural metadata is related to semantic metadata because the semantics need to be linked in some way to the core data. Administrative metadata may in some cases supplement semantic metadata in providing a context for the data. A timestamp that tells when certain data was entered into the system is not only relevant for administrative purposes; the end user may also use it to put the information into perspective. Since administrative metadata is usually unique to the system, it is lost upon transmission to another system. Some administrative metadata can be added to the core data itself,

for instance a timestamp. In this example administrative metadata is transformed into semantic metadata as it becomes visible to the end user of the data.

Example of metadata types

To provide better insight into metadata types, a simple example is provided. Figure 7 shows an instance of a very rudimentary profit statement. All three types of metadata are present in this figure. The structural metadata indicates what core data (shown in blue) and internal metadata (shown in green) relate to the generic metadata (shown on the right, in orange). The actual date shown with the timestamp could be administrative metadata recorded when the instance was received, or be entered by hand. A business rule that could be present in this example is that net profit should be equal to income minus costs.

Figure 7: Metadata example. Created by author.
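The separation described for Figure 7 can be sketched in code. The following minimal Python sketch distinguishes the core data, the instance-specific (internal) semantic metadata and the generic (external) semantic metadata; the concrete values are taken from the figure as described in the text, while the structure and numeric amounts are illustrative assumptions.

```python
# Core data: the bare figures, meaningless without context (illustrative amounts).
core_data = {"income": 1500, "costs": 900, "net_profit": 600}

# Internal semantic metadata: stored with this specific instance.
internal_metadata = {"company": "ABC group", "currency": "euro",
                     "timestamp": "19-12-2008"}

# Generic (external) semantic metadata: shared definitions that apply to
# every instance of this type of profit statement.
generic_metadata = {
    "income": "Total revenue over the reporting period",
    "costs": "Total expenses over the reporting period",
    "net_profit": "Income minus costs",
}

# The business rule from the example, expressed over the defined concepts.
assert core_data["net_profit"] == core_data["income"] - core_data["costs"]
```

Note that only the generic metadata would be reused across instances; the internal metadata travels with each individual statement.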

The whole reason to carry out semantic metadata management is that semantic metadata only has value if it is correctly linked to the core data it describes. This means that semantic metadata must make sense and be linked to the data models that are in use. Semantic metadata must also be internally consistent: semantics should not contradict each other, and upon aggregation the meaning should not be lost.

Topic A: Consistency

 The data model should be consistent, with relations among metadata defined.
 Implemented semantic metadata should be linked to the data model.
 The data model should be horizontally and vertically consistent.
 There should be little or no overlap in semantic metadata.

The table above summarizes the best practices distilled from the literature discussed in this paragraph. These best practices form one of the topics in the preliminary reference architecture shown in chapter 6.

5.1.2 Common set of semantics

Using a common set of semantics is a form of standardization. Master data management is a well-known form of data standardization in organizations. This analogy is used to present the benefits of using a common set of semantics. Subsequently, the use of a common set of semantics is linked to information maturity. A higher level of maturity requires standardization and reuse of semantics, which is achieved by using a common set.

Metadata management as a form of master data management

Master data management is the effort of maintaining a set of information within an organization which can be used as reference data. Semantic metadata management is related to master data

management in two ways. First, it is a form of master data management itself, but performed on a very specific subset of the data within the organization: the external semantic metadata. Second, metadata management may aid master data management efforts related to the core data.

Efficient metadata management is a critical aspect of overall system performance in large distributed storage systems (Brandt, et al., 2003). The goal in systems with specialized metadata management is to manage the metadata efficiently so that conventional directory and file semantics can be maintained without negatively affecting overall system performance. Although the size of metadata is generally small compared to the overall storage capacity of such a system, 50% to 80% of all file system accesses are metadata accesses. As such, applying master data management principles to semantic metadata increases technological performance. Keeping metadata external to the core data does not only improve performance, it also allows for easier management. Figure 8 shows a generic setup in which metadata is kept external in practice.

Figure 8: A setup in which metadata is loosely coupled to the core data. From Brandt et al (2003).

Information maturity and common set of semantics

The added value of semantics increases when organizations become more dependent on information, be it for carrying out primary processes or for management and control. McClowry (2008) has developed the Information Maturity Model (IMM), an information-oriented equivalent of the Capability Maturity Model (CMM). The CMM was originally developed to assess an organization's software development maturity level. The Information Maturity Model is shown in Figure 9; the description of the five levels of maturity is given within the figure.


Figure 9: The Information Maturity Model by McClowry (2008). Image is released in the public domain.

Relation between information maturity and semantics

Semantics add context to data, turning data into information. The Information Maturity Model focuses on information itself; metadata is not explicitly mentioned. As presented in chapter 4.1, metadata is key in enriching information and closely linked with monitoring information quality. As information becomes more important from level 1 to level 5, semantic metadata automatically becomes more important as well. The higher levels cannot be reached without dedicated attention to the use of semantic metadata. How information maturity reflects on semantic metadata management is shown in the list below:

 At level 1 the organization has no common information practices. All organization of data is solely the initiative of individuals. The use of metadata, including semantics, is ad hoc.
 At level 2 there are some information management practices. Certain elements within the organization are aware of the importance of information for the primary processes. Yet there is no shared approach and metadata is not shared or reused.
 At level 3 the organization is aware of the importance of information for the primary and management processes. At this level policies, procedures and standards exist throughout all parts of the organization and information management is supported by IT. This results in the structural use of administrative and structural metadata and a degree of semantic metadata.
 At level 4 information is managed as an organizational asset and staff is heavily engaged in information management procedures, including metadata management. Processes and structures that aid in management and quality control exist.
 At level 5 information is managed as one of the dominant organizational assets and is part of the organizational strategy. Metadata is managed by well coordinated processes and supported by dedicated tools. Information and metadata are preemptively shaped to the needs of the organization.

Relation to networks and information chains

The Information Maturity Model relates to a single organization, but its principles can be extended to information chains as well. In some respects a PPIC is similar to an information chain within a single large organization: the information is transferred among several departments, or partners, that may act as standalone entities with different responsibilities, procedures, frames of reference, standards and technology. The lower levels of the framework reflect how many organizations have historically exchanged data, for instance in full text with a lot of room for inconsistencies. The higher levels show increased standardization, for which more standardized processes and data models are used.

Topic B: Common set of semantics

 Use generic metadata as intermediate level between business needs and implementation.
 Generic metadata derived by subject-matter experts should translate into implementation.
 There should be a leading data model from which all implementations are derived.

The table above summarizes the best practices distilled from the literature discussed in this paragraph. These best practices form one of the topics in the preliminary reference architecture shown in chapter 6.

5.1.3 Business rules

Semantics are of most value to experts in the primary process, the actual end users of the exchanged information. Semantics do not only provide context to human end users, but also to computer systems. This allows experts to be supported by dedicated expert systems. An expert system is a software system that supports human expertise and is usually found in knowledge intensive organizations. These systems serve niche markets and are often very specific, unlike accounting, document management and other systems that can be found in nearly any organization. Many expert systems use semantics to apply business rules.

Expert systems are usually divided into subject-matter support and management support systems. The first category relates to systems that support knowledge workers in their tasks, which are usually part of the primary process. Management support systems are more generic systems which are used to provide information for management decisions and accountability. Both types of systems use similar techniques but apply them to different data. Subject-matter systems usually pertain to a subset of specific application data sets, while management support systems are usually fed by data marts that combine selected data from several application data sets (Kimball, et al., 2002). Witten and Eibe (2005) describe a number of techniques that are an important part of many expert systems. The following techniques require metadata for functioning:
 Projections of existing and future trends, combining aggregated data, data analysis and models.


 Reasoning using inference rules for forward and backward chaining.
 Quasi-probabilistic judgments/classification and reasoning using Bayesian or fuzzy logic.
 Use and creation of decision models that can be based on either statistical data or business rules.
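Forward chaining, one of the rule-based techniques listed above, can be illustrated with a minimal sketch: rules fire whenever their conditions are satisfied by the known facts, until no new conclusions appear. The rules and facts below are invented for illustration only.

```python
# Each rule: (set of required facts, fact that may be concluded).
rules = [
    ({"return_client"}, "discount_allowed"),
    ({"discount_allowed", "in_stock"}, "offer_discount"),
]
facts = {"return_client", "in_stock"}

# Repeatedly fire rules whose conditions hold until nothing changes.
changed = True
while changed:
    changed = False
    for conditions, conclusion in rules:
        if conditions <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print("offer_discount" in facts)  # True
```

The semantic definitions of the terms used in the rules ("return client", "in stock") determine whether the initial facts may be asserted at all, which is where the metadata comes in.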

Need for consistent semantics

Within and among organizations existing definitions may differ. This may cause communication problems and misunderstandings, which may lead to loss of opportunities or even harm performance due to faults. The effect of conflicting definitions is similar to that of low data quality (Silvola, Jaaskelainen, Kropsu-Vehkapera, & Haapasalo, 2011). A clear and consistent data model, regarding both data and metadata, is required when experts cooperate. Therefore semantics must be determined and agreed upon by professionals from the primary process.

Business rules and semantics

Semantic metadata is closely linked to business rules and other types of rule based logic. Business rules are used to structure and control business processes. They can be carried out by people or be implemented in IT-systems (Borghoff & Pareschi, 1997). Business rules are in place to deliver the desired outcome in the primary processes as planned at the macro level, the organizational strategy (Papazoglou & Ribbers, 2008). Despite the use of the word 'business' this applies to private and public organizations alike.

Semantic metadata is related to business rules because definitions and context do not only apply to data but also to rules. A rule such as "every return client is entitled to perform action X without further checks" gets a different meaning depending on whether a client is defined as a "natural person" or as a "natural person aged 18 or older". Additionally, business rules may be incorporated within semantic metadata. In economics a term such as "net profit" is generally defined as "net income minus net loss", a definition that includes a formula. This relation is reciprocal, since the same business rule links to several definitions: the business rule that checks whether the net profit is correctly derived (net profit = net income - net loss) involves three concepts that each have a semantic context. According to Kimball (2002) consistency among metadata and business rules can be maintained by defining and specifying the relations between the two.
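The dependence of a business rule on the underlying definition can be made concrete with a small sketch. The following Python fragment applies the return-client rule from the text under the two competing definitions of 'client'; the person record and function names are illustrative assumptions, not taken from any real system.

```python
def is_client(person, definition):
    """Apply a semantic definition of 'client' to a person record."""
    if definition == "natural person":
        return person["is_natural_person"]
    if definition == "natural person aged 18 or older":
        return person["is_natural_person"] and person["age"] >= 18
    raise ValueError("unknown definition")

def may_perform_action_x(person, client_definition):
    # Business rule: every return client is entitled to perform action X
    # without further checks.
    return person["is_return_client"] and is_client(person, client_definition)

# A 16-year-old return client: the same rule yields different outcomes.
person = {"is_natural_person": True, "age": 16, "is_return_client": True}
print(may_perform_action_x(person, "natural person"))                   # True
print(may_perform_action_x(person, "natural person aged 18 or older"))  # False
```

The rule itself never changed; only the semantic metadata behind the word 'client' did, which is exactly why the two must be managed together.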

De Leenheer (2010) specifies four levels of metadata: conceptual, relational, technical and operational. This model is depicted in Figure 10, in which it is applied to the architecture design of the Belgian tax office. In this model the semantics are found at the conceptual level. De Leenheer views business rules as rules and patterns among concepts and positions them at the relational level. These in turn are translated into technical models, which are implemented at the operational level.


Figure 10: Metadata levels in the business semantics management model. Derived from De Leenheer (2010).

Topic C: Business rules

 Semantics (definitions) are a source for business rules.
 Business rules should be linked to semantics in order to safeguard their meaning.
 Semantic metadata should be linked to business rules and vice versa.
 Business rules must be kept at generic level to enforce consistent implementation.

The table above summarizes the best practices distilled from the literature discussed in this paragraph. These best practices form one of the topics in the preliminary reference architecture shown in chapter 6.


5.2 Semantic metadata management: Technology perspective

The second category of elements that make up a semantic metadata management reference architecture is technology. Technology is an enabler for semantic metadata and data exchange. Additionally, technology in the form of tooling may help to carry out semantic metadata management.

5.2.1 Standards and interfaces

External semantic metadata must be linked in some way to the core data. Using a standardized interface allows applications and data models to be mapped to a standardized set of semantic metadata, lowering transaction costs in data exchange. This section introduces link types and their role in the standardization of semantic metadata.

Link types

Metadata needs to be linked in some way to the data it relates to. It must be known to both man and machine what relations exist between the data and the metadata (Sen, 2002). In the absence of links the information is lost. There are various ways of linking data to metadata. Five generic methods for linking are described in this paragraph: text, forms, tables, schemas and tags. The vast majority of implementations are variations on these five generic practices.

The most natural way for a person is to use text, for instance describing the origin, units, context and definitions. In a text the data and metadata are linked through grammar and are usually located close together. Text may exist on paper and digitally. The same goes for an alternative to text: forms. Forms are structured formats in which data is entered. The headers of the various sections usually act as metadata, indicating what the meaning of that bit of text is. Many forms require a date, with the label 'date' acting as metadata for the actual date that is entered.

In IT-systems the data and metadata do not need to be in the vicinity of each other and the links do not have to be comprehensible for humans. The system needs to be able to comprehend the links and the presentation should be comprehensible for the end user (Elmazri & Navathe, 2007). The actual data model and the presentation may be very different. The most common form of linking data to metadata is the use of tables in a database. Metadata can be found both in column headers and in another column of the same row. Instead of a table, other structures can be used as well. Schemas in data systems are definitions of structure, content and optionally some semantics. A schema may encompass some degree of semantics in a similar way as the headers in a database, but mostly indicates what metadata is linked to the data.
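The table-based linking described above can be sketched with a separate metadata table keyed by column name, so that the system (not the human reader) maintains the link. The schema and values below are illustrative assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# A data table and a separate table holding semantic metadata per column.
con.execute("CREATE TABLE profits (company TEXT, net_profit REAL)")
con.execute("CREATE TABLE column_metadata "
            "(column_name TEXT, definition TEXT, unit TEXT)")
con.execute("INSERT INTO profits VALUES ('ABC group', 600)")
con.execute("INSERT INTO column_metadata VALUES "
            "('net_profit', 'Income minus costs', 'euro')")

# The link between data and metadata is the shared column name.
row = con.execute("SELECT definition, unit FROM column_metadata "
                  "WHERE column_name = 'net_profit'").fetchone()
print(row)  # ('Income minus costs', 'euro')
```

Here the metadata survives independently of any single data row, which is the property the external-metadata discussion later in this chapter builds on.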

Aside from fixed structures such as tables and forms it is also possible to loosely couple data and metadata. The most common type of loose coupling is the use of tags. In the setting of internal metadata a tag is semantic metadata itself, usually a single word or phrase that identifies, classifies or describes the related data. In the Netherlands the metadata initiative is used to make government information accessible on the web through tags (ICTU, 2006). Regarding external metadata a tag is a link to semantic metadata located elsewhere. In many XML standards links to external metadata are established by uniform resource locators, very similar to those used to identify websites (Debreceny, Felden, Ochocki, Piechocki, & Piechocki, 2009). Instead of linking to a website the URL links to a definition that is comprehensible to either man or machine.
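The URL-based loose coupling described above can be sketched as an XML element that carries a reference to an externally maintained definition, in the spirit of the XBRL-style standards discussed in this section. The element names and URL below are fictitious.

```python
import xml.etree.ElementTree as ET

doc = """
<report>
  <netProfit ref="http://example.org/taxonomy#netProfit" unit="euro">600</netProfit>
</report>
"""
root = ET.fromstring(doc)
element = root.find("netProfit")

print(element.text)        # the core data: 600
print(element.get("ref"))  # the link to the external semantic definition
```

The receiving system resolves the `ref` attribute against the shared taxonomy, so the definition itself never has to travel with the message.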


Standardization of semantic metadata

External metadata is usually standardized, which makes it easier to exchange alongside the core data. Standardization is an answer to a coordination problem. First of all, standardization leads to reuse, lowering the quantity of semantic metadata and reducing complexity. Second, information exchange is made easier since compatibility and interoperability are improved. Finally, standardization may improve information quality.

Standard   Description
XML        The generic eXtensible Markup Language that is used in countless applications and web services.
XBRL       eXtensible Business Reporting Language. An XML standard used for financial reporting.
UBL        Universal Business Language, a standard used for transactions and invoices.
HR-XML     Human Resource XML. A standard used for exchanging human resource and employment related information.
ebXML      Electronic Business using eXtensible Markup Language.

Table 4: Overview of XML based semantic metadata standards. Created by author.

Many semantic metadata standardization efforts are based on the XML format, as shown in Table 4. Besides XML based standards there are many others, but XML is the most dominant. XML is popular since it intrinsically carries metadata along with the data, and it calls for agreements and definitions among stakeholders (Bakker, 2006). XML is used in most web service technologies, and in its role as data carrying format in web services it has proved very useful. The inclusion of semantic metadata probably contributed to its success. This inclusion also forced parties to adapt their technological infrastructure for these semantics, compelling them to draft the aforementioned agreements and definitions. As such XML presented a bottom-up approach to semantic metadata management. According to Bakker, XML is not the solution to the challenges regarding the use of semantic metadata among multiple stakeholders; it merely enables a better approach.

Topic D: Interfaces/standards

 The metadata should be independent from (existing) technological implementations.
 There should be a loose coupling with the technical implementation.
 The metadata should be independent from data standards (including rules) in use.
 Use a flexible infrastructure based on interfaces.
 The metadata management effort should be independent of the standards in use.
 Standardize information formats as much as possible.

The table above summarizes the best practices distilled from the literature discussed in this paragraph. These best practices form one of the topics in the preliminary reference architecture shown in chapter 6.


5.2.2 External semantic metadata

As presented in the scope of the research area, this thesis covers only external semantic metadata. External semantic metadata has a low level of granularity and is loosely coupled to the data it relates to. This section provides insight into two aspects: granularity and externality. Together these aspects define external semantic metadata and its specific added value.

Granularity

Semantic metadata may relate to entire clusters of data (documents and collections), to individual facts (numbers, words), or to anything in between (paragraphs in a document). The granularity is the level of detail the semantic metadata has, more specifically the level to which it is structured.

As with structural metadata, external semantic metadata is usually designed, i.e. specified, beforehand. This may be a list or an entire data model that includes hierarchy and relations. Figure 7 shows an instance of a very rudimentary profit statement. The core data shown in blue is what it is all about; without the context provided by the metadata (green/orange) these three figures would be meaningless. The values in green are quite specifically linked to the core data, with the three values ranging from high to low granularity. The values in orange have the lowest granularity since they apply to every instance of this type of profit statement and may even be used in a variety of other information products.

Even within external semantic metadata there is a range of granularity. For instance, the semantic 'address' has a very low level of granularity. When an address is divided into 'street', 'postal code' and 'city' the granularity is much higher. Even then it might be possible to divide 'street' into 'street name', 'number' and 'suffix'. As granularity rises, the quantity and specificity of the semantics increase as well.
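The address example above can be sketched at two granularity levels; as the data is split over more specific semantics, the number of semantic elements grows with it. The field names are illustrative, and the sample address is the faculty address from the title page.

```python
# Low granularity: one semantic ('address') covers the whole value.
low = {"address": "Jaffalaan 5, 2628BK Delft"}

# Higher granularity: the same value split over more specific semantics.
high = {"street_name": "Jaffalaan", "number": 5, "suffix": "",
        "postal_code": "2628BK", "city": "Delft"}

# Rising granularity means more, and more specific, semantic metadata.
print(len(low), len(high))  # 1 5
```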

Internal vs. external semantic metadata

Metadata can be either internal or external. Internal metadata is embedded with the core data itself, while external metadata is stored alongside the core data, as shown in Figure 8. Internal and external metadata are abstract concepts; they are best clarified by an everyday life analogy regarding price tags in a clothing store. With internal metadata the price tag would be attached to the t-shirt (and thus part of the shirt), indicating the price, size, brand and so on. This setup requires that every t-shirt has its own price tag, which takes a lot of effort to control uniformity and adds complexity. This way of adding metadata to t-shirts works very well when there is much diversity in the store and there are very few duplicates. In the case of external metadata the price tag is on the clothes rack and applies to every t-shirt on that rack. In this setup far fewer price tags are needed, it is easier to observe the price level of the entire store and when the price changes only one sign needs to be changed instead of a multitude of individual price tags.
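The price-tag analogy translates directly into the two storage patterns; the sketch below contrasts metadata repeated on every item with metadata stored once and shared. All names and values are illustrative.

```python
# Internal metadata: every t-shirt carries its own tag.
shirts_internal = [
    {"item": "t-shirt", "size": "M", "price": 10},
    {"item": "t-shirt", "size": "L", "price": 10},
]

# External metadata: one tag on the rack applies to every shirt on it.
racks = {"A": {"price": 10}}
shirts_external = [
    {"item": "t-shirt", "size": "M", "rack": "A"},
    {"item": "t-shirt", "size": "L", "rack": "A"},
]

# A price change now touches one record instead of every individual item.
racks["A"]["price"] = 8
print(racks["A"]["price"])  # 8
```

With the internal variant the same change would have to be applied to every shirt record, with the attendant risk of inconsistencies.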

The example shows that both internal and external metadata have their own sets of benefits and drawbacks. In practice this means that a mixture is used. Combining the observations of Kimball and Brandt, the optimal management approach can be determined by three factors, shown in Figure 11.


These factors lead to three categories:
1) Semantics which are generic and very common make up a small quantity of all semantics. The low granularity makes them applicable to a lot of core data and other metadata. In the example shown in Figure 7 these would be all semantics in orange.
2) Semantics with lower granularity but with significant levels of reuse can be considered part of the data, thus be kept internally, but can be patterned after a master data file. In Figure 7 these would be the "ABC group" and "euro".
3) Finally, the majority of semantics are (nearly) unique, such as the date "19-12-2008". For this type of metadata standardization is not of any use.

With a single instance of Figure 7 this may be hard to see, but with a thousand instances of Figure 7 all external metadata can be reused, there could be about a hundred company names and all timestamps are probably unique. The exact boundaries of each category are hard to determine and may be worthy of further research.

Figure 11: Overview of optimal application of internal and external metadata. Created by author.

Topic E: External metadata

 The metadata should be stored and managed independently from the core data.
 Semantic metadata should be linked to the data model.
 Semantic metadata should be exchanged among partners in the information chain.

The table above summarizes the best practices distilled from the literature discussed in this paragraph. These best practices form one of the topics in the preliminary reference architecture shown in chapter 6.


5.2.3 Tooling

Tooling is a form of technology that may aid semantic metadata management processes. First the possible functions of tooling are described, followed by metadata repository standards. This section ends with a reflection on the availability of tools. An example of metadata tooling is provided in appendix 12.1.5.

Functions of tooling

Semantic metadata management can be supported by tooling. Such a tool is a dedicated piece of software that aids in managing a set of external semantic metadata. Tooling is able to support semantic metadata management in a variety of ways. All metadata management activities can in theory be carried out by people; tooling may reduce that effort significantly and produce better results. Based on literature (Pieter De Leenheer, 2009; Hepp, et al., 2008; Kimball, et al., 2002) six functions for a semantic metadata management tool have been identified: repository, access, relation management, business rules, versioning and translation. A single metadata management tool may fulfill one or more of these functions; if no single tool covers all of them, a mix of compatible tools may be used side by side. Yet according to Kimball it is uncommon for all functions to be supported by adequate tooling.

Identified functions for tooling:
1) Repository. A dedicated tool that acts as a collector for all semantic metadata at the conceptual level.
2) Access. A tool that makes semantic metadata available to human end users in a way that better suits human consumption, for instance for review or for use in the primary process.
3) Relation management. A tool in which relations among semantic metadata can be defined and reviewed.
4) Business rules. A tool that maps semantic metadata to business rules or even stores business rules.
5) Versioning. A tool that stores all past versions, maintains a change log and aids in the publication of a new version.
6) Translation. A tool that supports the export of generic level semantic metadata to operational systems and aids in the translation to other standards.
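Two of the functions listed above, the repository and versioning, can be combined in a minimal sketch: a store of current definitions that keeps a change log of every published revision. The interface is an illustrative assumption, not based on any existing product.

```python
class MetadataRepository:
    """Toy repository for semantic metadata with a simple change log."""

    def __init__(self):
        self.current = {}   # concept -> current definition
        self.history = []   # change log: (concept, old definition, new definition)

    def publish(self, concept, definition):
        old = self.current.get(concept)
        self.history.append((concept, old, definition))
        self.current[concept] = definition

repo = MetadataRepository()
repo.publish("net profit", "income minus costs")
repo.publish("net profit", "net income minus net loss")

print(repo.current["net profit"])  # net income minus net loss
print(len(repo.history))           # 2
```

A real tool would add the remaining functions (access views, relation management, export to operational standards) on top of such a core.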

Metadata repository standards

Most semantic metadata standards focus on the data model and schemas. Section 5.2.1 indicates that a lot of semantic metadata standards exist. Aside from using a standard to describe the data model it is also possible to standardize the capabilities and processes regarding a metadata repository. Unlike data model standards, there are hardly any standards relating to repositories and tools, except for a single ISO standard. The ISO has a standard for metadata registries, abbreviated by the ISO as MDR, called ISO 11179 (ISO/IEC, 2004). In the terms of the section above, the registry would be a tool that combines the roles of repository and access. Each data element in an ISO/IEC 11179 metadata registry:
 should be registered according to the Registration guidelines.
 will be uniquely identified within the register.
 should be named according to Naming and Identification Principles.


 should be defined by the Formulation of Data Definitions rules.
 may be classified in a Classification Scheme.
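The registry requirements above can be caricatured in a few lines: every registered data element receives an identifier that is unique within the register, along with its name, definition and optional classification. This is a drastic simplification of ISO/IEC 11179 for illustration only; the class and field names are invented.

```python
import itertools

class Registry:
    """Toy registry: unique identification plus name/definition/classification."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.elements = {}

    def register(self, name, definition, classification=None):
        element_id = next(self._ids)  # unique within this register
        self.elements[element_id] = {"name": name, "definition": definition,
                                     "classification": classification}
        return element_id

reg = Registry()
eid = reg.register("net profit", "income minus costs", "financial")
print(eid, reg.elements[eid]["name"])  # 1 net profit
```

The actual standard additionally prescribes registration procedures, naming principles and definition-formulation rules that a sketch like this cannot capture.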

The de facto standard model for data integration platforms and exchange among large enterprises is the Common Warehouse Metamodel (CWM). The ISO standard matches the CWM standard, but is a higher level description of what features should be present. It is not a model that is easily implemented and as such it has seen little adoption among corporate systems. On the contrary, it has become popular among government agencies, mostly in the USA, Australia and the UK (Sbodio, et al., 2010). This is possibly because off the shelf metadata technologies do not match government requirements due to very specific legacy systems, prompting the design of new dedicated systems. With the design of new systems, design guidelines can be valuable, and conforming to an international standard is favorable when justifying government spending.

Availability of tools

Metadata management tools are present in most business grade database management systems. These tools run at the application level. Most of these tools relate to administrative and structural metadata. To a degree semantic metadata is present, but mostly as internal metadata within the same table (Elmazri & Navathe, 2007). High end data systems have shown a need for external metadata. An enterprise grade data warehouse usually holds several application data servers and uses data marts for easy access to core data and aggregated core data. Figure 12 shows a simplified model of a common large enterprise system with dedicated external metadata storage. The metadata management tool directly interfaces with the external metadata and has no link with the actual data or other components.

Figure 12: External metadata storage in a large enterprise data storage system, showing where a metadata management tool enters the picture. Created by author

In case of multiple parties involved in metadata management, such as in a PPIC, standard tooling does not fully meet the needs. Metadata management in cooperative systems requires multiple user access and process support (Kimball, et al., 2002). Most metadata management tools are aimed at a single data architect, not multiple users. Where cooperative systems exist they are made from scratch, as seen in the tax office case in chapter 8. In PPICs metadata is usually centrally published in taxonomies or ontologies. Such semantic metadata models are relatively new and are structured very differently from semantic metadata stored in a data warehouse. Specific tooling for maintaining such taxonomies is hardly available and is mostly made from scratch to meet specific situational demands.

Topic F: Tooling

 Metadata management processes should be supported by adequate tooling.
 Tooling should support a variety of roles and needs in the management aspect.

The table above summarizes the best practices distilled from the literature discussed in this paragraph. These best practices form one of the topics in the preliminary reference architecture shown in chapter 6.


5.3 Semantic metadata management: Process perspective

The third category of elements that make up a semantic metadata management reference architecture consists of processes. In this category human interaction and cooperation play a prominent role. The first section describes how management theory looks at standardization in the primary processes of organizations, which is partly the result of metadata management. No best practices are derived from this section, since the primary processes are not within the scope of semantic metadata management. The second section is about metadata management processes and the different roles that must be fulfilled. Finally, the third section is about cooperation amongst stakeholders.

5.3.1 Management theory on standardization

Applying semantic metadata management in a PPIC is a standardization effort. There are many theories that view standardization from different angles, including inter-organizational cooperation, innovation and competitive advantage, and knowledge management. Table 5 shows that standardization is thought to have both positive and detrimental effects on organizational performance.

Economic & management theory on standardization
  Positive effects:
  • Allows for being in control over (aggregated) information
  • Allows for being in control over technology
  • Standards reduce transaction costs in chains
  • Standardization enables innovation: reuse increases value
  • Standardization reduces time to market
  Detrimental effects:
  • Standardization reduces uniqueness, loss of competitive advantage
  • Alignment requires effort and creates lock-in
  • Standardization enforces obsolescence
  • Standards are a compromise and never cover every area

Knowledge management theory on standardization
  Positive effects:
  • Knowledge elicitation benefits from categories and procedures
  • Definitions are valuable to bridge the cognitive distance between creators and users
  • Definitions and interfaces enforce quality
  Detrimental effects:
  • Knowledge workers know best: infinite options
  • Limitations change organizational mentality
  • Standardization reduces creativity and quality in creation

Table 5: Overview of theories on standardization. Created by author.

Standardization is usually driven by those who benefit most, both amongst and within organizations (Egyedi, 2003). Within organizations these are the managers and those responsible for the IT systems. Standardization of information allows for easier aggregation and for the use of business intelligence, creating better insight into the organization and improving the control of the management over the organization. This has partly been achieved by master data management [reference other chapter]. Standardization of semantics might be even more attractive to the management, since semantic metadata will not only describe the content of data but also the processes in which this data is used. The advantages for those responsible for the IT systems have other causes.


Standardization may reduce the amount of data, making it easier to manage and structure IT systems in use.

According to Farrell and Saloner (1985) standardization and compatibility are key enablers of innovation: they reduce operational costs and provide a new starting point for even more advanced technologies. In their macroeconomic view they pose that standardization enables reuse. Reuse in turn adds value to both the users of the standard and society as a whole. Uniqueness commands a premium, as the fixed costs are relatively higher. This premium might seem desirable, but reuse allows the fixed costs to be spread over a larger number of products. Reduced price levels force competitors to develop new products. Increased gains allow for more spending on new products and innovation by those embracing innovation. From the market side, cheaper products allow the savings to be spent on new innovative products, which usually demand a risk or low-scale production premium at first. Additionally, standards reduce the dissimilarity of products and increase knowledge about the product, resulting in a market that performs better.

Aside from these positive effects, the literature indicates that there are also drawbacks to standardization. Organizational alignment requires an effort which is hard to trace to the products and their price level. It forms part of the overhead costs, which are often divided unequally over all products. Alignment may also create lock-in situations, in which potentially better performing alternative standards exist but are not adopted due to the sunk costs of aligning with and adhering to the standard in use. Farrell and Saloner (1985) also pose that industries can be trapped in inferior standards, but that these will eventually be overcome on the premise that information is complete. Those who use the standard should know their organizational environment well.

Apart from the costs and efforts relating to standardization, the use of standards itself may be detrimental to performance. Standards are a compromise to satisfy a group of users; this holds true both within and amongst organizations. A standard never perfectly fits all end user needs, especially when the standard favors a certain group more than others, as is often the case when a standard is forced on others (Egyedi, 2003). Another hazard of standardization is that it may reduce uniqueness, meaning that a competitive advantage or economic specialization might be lost.

Knowledge management and standardization From the field of knowledge management there are various views on standardization. Bessant and Tidd (2007) pose that for information products the knowledge workers need total freedom of expression in order to produce the most creative work and the best level of quality. This is in line with the infinite options theory, which poses that the combination of procedures, accountability, time limits and so on poses unnecessary barriers that limit the design space of professionals and potentially result in suboptimal performance. Standardization imposes new rules to add to this list. Semantic metadata management is a standardization effort that may reduce the available vocabulary and thus limit the design space. The extent of this limitation depends on the granularity, as shown in chapter 5.2.2.

Standardizing contents or processes has an impact on the organizational mentality when producing information products. The atmosphere in which tasks are carried out changes when guidelines are in

place. Standardization may reduce creativity, imagination, conceptual thinking, curiosity, effort, judgment, commitment, nuance, reflection or a number of other valuable attributes (Vanderfeesten, Reijers, & Van der Aalst, 2010).

Contrary to the theories presented above, there are knowledge management theories that do embrace the use of standards for capturing information. Knowledge elicitation is the effort of capturing expert knowledge in information products. This term is mostly linked to filling a knowledge repository, but the creation of every information product can be considered knowledge elicitation (Bessant & Tidd, 2007). Knowledge elicitation improves in speed and quality when the documents used to capture that information are well structured (Universiteit van Amsterdam, 2010). Additionally, definitions are valuable to bridge the cognitive distance between creators and users of information. In line with the notions of division of labor and specialization of individuals within organizations, standardization plays an important role here. It becomes harder to understand each other when the cognitive distance increases. Two lawyers with the same specialization working in the same department will understand each other fine. Lawyers with different specializations working in different organizations will have more difficulty understanding each other. Having a reduced, common, well documented overview of semantics aids in bridging communication barriers. A third reason for standardization is that the use of standards such as taxonomies and ontologies may enforce quality through uniformity and a stronger link between definitions and meaning.

Conclusion on management theories on standardization Standardization has both advantages and disadvantages. Allowing for further innovation, improved insight into and control over the organization, and increased quality of knowledge elicitation are advantages that are desirable to any organization. On the other hand, imposing barriers on the professional judgment of knowledge workers, and standards that do not fully meet operational needs, are serious threats to operational performance. Given the number of variables, the design space for semantic metadata management should allow for standardization in such a way that many of the benefits materialize and most disadvantages are avoided. Therefore the reference architecture should leave as much room as possible to those using it to implement semantic metadata management so as to incorporate or avoid certain characteristics.

5.3.2 Metadata management processes The metadata management processes revolve around managing the lifecycle of the common set of semantics. Versioning is one of the major activities in metadata management. Through verification and versioning the match between semantics and actual meaning is retained.

Metadata lifecycle Semantic metadata is never static; it has a lifecycle. Much of the metadata management effort is spent on versioning. The organizational structure, primary process, actor constellation, technology and all other components will change over time. Having more stakeholders, as in a PPIC, makes changes more frequent, and they may potentially impact more processes and stakeholders. Versioning must be incorporated into the management design, since it will be a recurring activity. If not well embedded within the management structure, versioning may be detrimental to quality. Change management requires well defined protocols, which are agreed upon by all involved stakeholders.


The metadata lifecycle consists of six phases (Luther, 2009):
• Specification. Semantics are created by those with subject-matter knowledge.
• Application. Semantics are applied, being used to provide context for core data.
• Duplication. Semantics are duplicated when used over multiple systems, or when printed out on paper.
• Verification. The relation between the definition and its actual use must be periodically verified.
• Evolution. The meaning attached to semantics may change over time when the meaning of the content changes or perspectives on the contents change.
• Deletion. Eventually every type of metadata will become obsolete and will be discarded.
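Read as a small state machine, these six phases can be sketched in code. The transition table below is an assumption made for illustration (Luther does not prescribe one); it merely shows how tooling could enforce, for instance, that semantics are verified before they evolve or are deleted.

```python
from enum import Enum

class Phase(Enum):
    SPECIFICATION = 1
    APPLICATION = 2
    DUPLICATION = 3
    VERIFICATION = 4
    EVOLUTION = 5
    DELETION = 6

# Hypothetical transition table: a definition moves forward through the
# lifecycle; verification can lead back to application, to evolution
# (a new version), or to deletion. Deletion is terminal.
TRANSITIONS = {
    Phase.SPECIFICATION: {Phase.APPLICATION},
    Phase.APPLICATION: {Phase.DUPLICATION, Phase.VERIFICATION},
    Phase.DUPLICATION: {Phase.VERIFICATION},
    Phase.VERIFICATION: {Phase.APPLICATION, Phase.EVOLUTION, Phase.DELETION},
    Phase.EVOLUTION: {Phase.APPLICATION},
    Phase.DELETION: set(),
}

def can_move(current: Phase, target: Phase) -> bool:
    """Return True if the lifecycle allows moving from `current` to `target`."""
    return target in TRANSITIONS[current]
```

A management tool built on such a table could reject, for example, deleting a definition that was never verified, making the lifecycle explicit rather than implicit in working procedures.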

De Leenheer (2010) has created a model for both versioning and maintaining organizational alignment, called Business Semantics Management. In this model a top-down view is employed. Semantics are specified during a reconciliation phase and verified during the application phase.

Figure 13: Business semantics management cycles. From De Leenheer (2010).

Metadata management roles According to Silviola (2011) data management roles within organizations are imperative. Semantic metadata management is seen as a key component in being able to manage the core data. Silviola does not provide any details on what specific roles should be present. Janssen, Gortmaker & Wagenaar (2006) have identified eight types of roles for web service orchestration in public administration. These observations are very valuable, since that context is very similar to this research. First, the public administration/e-government setting meets the criteria of the PPIC definition. Second, web services are one of the premier means of electronic data interchange. Third, the means of cooperation is not only at the technical level but is also strongly related to the content and primary processes.

The roles in Table 6 have been tested on the case studies in chapters 9 and 10, with an updated table shown in the tradeoffs of section 9.4.


Initiator and enabler role
This role is to convince and stimulate agencies to participate in and commit to an automated process execution. Some organizations might initially resist the idea to use Web service orchestration technology for improving cross-agency processes. This might be due to a lack of knowledge, but also healthy suspicion. Often it is necessary to educate agencies on the basics of the technology and to show the potential advantages.

Developer role
This role is about defining the requirements for each agency in order to enable cross-agency processes. This role involves the identification of the organizations and departments involved and determines the interests, objectives, and requirements for each of them.

Standardization role
Technology interface standards should be determined and set as a standard. Existing systems can be selected as standard, but it can also be better to develop and impose new, preferably open standards.

Control and progress monitoring role
The time-dependent sequence of activities performed by agencies needs to be managed. This role should control the sequence of Web service invocations and collect progress and status information. All unexpected events, such as non-availability of Web services, should be tracked as soon as they occur and analyzed to determine what actually did happen and why, to ensure reliable cross-agency process execution.

Facilitator role
This role facilitates the implementation of cross-agency processes by collecting and disseminating best practices, reference models, and reusable system functionality such as identification, authentication, and payment. Ideally, functionality and databases are shared when possible and duplication of efforts is avoided.

Service and product aggregator role
There should be a one-stop shop that provides a consistent point of aggregation and is equipped with logic to meet customers’ needs. Needs should be analyzed and translated into product and service requests, and related products and services should be recommended, multiple processes started, status information provided, and the results of each process aggregated into a single answer. For this purpose the services and products should be bundled into one large catalogue and rules determined to translate citizens’ and business’ needs into the appropriate multiple cross-agency processes.

Accountability role
As a general rule in modern societies, governmental decisions should have accountability. This role should ensure that the motivations behind decisions made by each agency and the performance and outcomes of the complete cross-agency process can be accounted for.

Process improvement role
Changes in processes and governmental rules often affect more than one agency. This role should maintain an overview of the cross-agency processes and define mechanisms and procedures to assess the implications of changes in law, technology, and other developments. This role initiates complex transformation processes to restructure the public sector.

Table 6: Roles identified in web service orchestration by Janssen, Gortmaker & Wagenaar (2006).


Topic G: Versioning

• The metadata management approach should be able to provide insight into the impact of versioning.
• There should be a process/protocol for versioning.
• There should be a process/protocol for changes to the metadata architecture.
• Besides the current metadata model, the historic versions should be accessible.
• The metadata architecture should be flexible in order to respond to changing needs.

The table above summarizes the best practices distilled from the literature discussed in this paragraph. These best practices form one of the topics in the preliminary reference architecture shown in chapter 6.
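As a minimal sketch of the versioning practices above, the hypothetical `Definition` class below keeps historic versions accessible next to the current one, and `impact` gives a crude insight into which forms or systems would be affected by a new version. The class and the usage registry are illustrative assumptions, not part of any cited tooling.

```python
from dataclasses import dataclass, field

@dataclass
class Definition:
    term: str
    text: str
    version: int = 1
    # Older (version, text) pairs stay accessible next to the current model.
    history: list = field(default_factory=list)

    def revise(self, new_text: str) -> None:
        """Create a new version while preserving the historic one."""
        self.history.append((self.version, self.text))
        self.version += 1
        self.text = new_text

def impact(definition: Definition, usages: dict) -> list:
    """Hypothetical impact analysis: list the forms/systems that reference
    the term and would be affected by a new version."""
    return usages.get(definition.term, [])
```

In a PPIC the `usages` registry would itself be shared metadata, so that each partner can see the chain-wide impact of a proposed change before the versioning protocol is started.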

5.3.3 Cooperation with partners This section explains the stakeholder complexity in semantic metadata management beyond organizational boundaries. Best practices from other inter-organizational standardization efforts are presented as potential solutions.

Dealing with stakeholder complexity Most literature on semantic metadata management assumes that the designer or the management is able to take any decision that is desired. The project management approach that most literature uses assumes a form of hierarchy. With or without consultation among those involved, the designer or management takes a decision on what the end product will be like and how it is to be achieved. In practice the decision making process is not that simple, even when on paper it is deemed hierarchic. Within most hierarchic organizations there are departments with conflicting interests, or experts with a knowledge advantage compared to the management (De Bruijn & Ten Heuvelhof, 2007).

Since PPICs span multiple organizations, hierarchic decision making is even more difficult. Hierarchy is replaced by joint decision making, a process that is more laborious than hierarchical decision making and has less predictable outcomes. Cooperation amongst multiple stakeholders requires a form of trust, whether or not achieved by letters of intent or contracts. The mixture of both public and private organizations means that there are differences amongst organizational interests, information needs and mentality. The diversity of stakeholders makes cooperation more difficult, but also allows for interesting opportunities.

The stakeholder complexity in semantic metadata management in PPICs is illustrated in appendix 0. Four domains of potential stakeholder related issues are listed. According to De Bruijn, Ten Heuvelhof, & In 't Veld (2007) there are many options to deal with the listed stakeholder related issues. Instead of starting off with a substantive discussion, a process design leaves most of the substance and details of the agenda until a level of trust and shared goals has been built. Additional approaches presented by De Bruijn are substantive, command and control, and project management.


Irrespective of the chosen process approach and the stakeholder constellation, the reference architecture ought to be applicable. The impact on the reference architecture is that a degree of leeway is required. The reference architecture is a balancing act. On the one hand there must be enough rigor to provide interoperability. On the other hand there must be enough leeway in the designs that follow from its use to fit organizations with different characteristics or changes in functional requirements. The mixture of rigor and leeway has been achieved by using both design principles and tradeoffs.

Lessons from standards consortia The previous section poses many potential challenges relating to cooperation in networks. There are many theories available on cooperating with stakeholders in various constellations, ranging from management in networks (De Bruijn & Ten Heuvelhof, 2007), to project management, to policy measures (Janssen et al., 2010), to open cooperation models (Bessant & Tidd, 2007). This section draws lessons for this specific context from endeavors with similar complexity in a near-identical setting. These lessons can be used in combination with the mentioned theories.

Semantic metadata management is a relatively new phenomenon, especially when carried out in an information chain such as a PPIC. However, it is not the first standardization effort in chains. It is not even the first standardization effort regarding technology in the very same setting of PPICs. Janssen, Gortmaker & Wagenaar (2006) have already identified challenges and growth stages in web service orchestration in public administration and PPICs. The wide-scale use of web services is one of the reasons semantic metadata management is an issue.

Regarding stakeholder relations, the main area for drawing lessons is standards consortia. Standards consortia are conglomerates of stakeholders who participate together in a standardization effort. They differ from standard setting organizations in that they are usually a single-issue conglomeration and operate at a much smaller scale. Often they do not only result in a standard, but also in a form of pact or declaration of intent.

Egyedi (2003) describes three features of standards consortia that might benefit metadata management in public private information chains:
• First, consortia impose a form of clarity within the actor constellation. They turn a difficult situation into one which is easier to understand. Instead of viewing the situation from a holistic view, which is hard to grasp, the problem is cut into several topics (technological standards, data formats, processes, etc.). For each of these fields conformity is measurable.
• Second, the use of interfaces allows for individual alignment. Conforming to the interfaces defined between stakeholders allows each stakeholder to do a lot of work on their own. This limits the need for cooperation, reducing time and effort. Standards lead to pre-planned emergent behavior: if each link in the chain conforms to the agreements, the whole chain will function in a new manner profitable to all.
• Third, consortia act as a high level steering committee. This functions as a platform in which decisions are made and interests are weighed. It allows the conglomerate to cope with the diversity of actors and to manage the various interests. Alignment between many organizations at once reduces the overall inter-organizational alignment effort.


Topic H: Cooperation with partners

• The metadata management processes and coordination structures must be defined and implemented.
• The goals of the semantic metadata architecture should be defined and accepted.
• Roles and responsibilities among organizations should be present and formalized.
• All required roles and responsibilities should be filled to an adequate degree.
• Management processes and protocols should be defined and respected.
• The ownership model should match the PPIC needs.

The table above summarizes the best practices distilled from the literature discussed in this paragraph. These best practices form one of the topics in the preliminary reference architecture shown in chapter 6.


6 Preliminary architecture The preliminary reference architecture has been based on literature and several expert opinions. Chapter 6.1 explains that the preliminary architecture is a snapshot of the reference architecture halfway through the research process. Chapter 6.2 lists the topics and their best practices that have been distilled from the research in chapter 5. Finally, chapter 6.3 describes the relation between the topics and the final reference architecture.

6.1 Reference architecture design process This section gives an overview of the steps that were taken to develop the preliminary and validated reference architecture. The development of the (preliminary) architecture was an iterative process. New insights were continuously gained from literature, case studies and experts. Literature provided a lot of insight and best practices on many individual topics related to semantic metadata management. How these topics relate to each other and which topics should be leading was hard to find out: a holistic view was missing. This view could be gained from the case studies. Reality forces designers of metadata management processes to consider the relations between all elements included in the design. Using these observations, the earlier knowledge gained from literature was put into perspective. This in turn led to a search for literature in areas that had not yet been explored earlier in the research process. For that reason the design of the preliminary architecture and the drawing of lessons from the case studies form an iterative process, as is stated in chapter 2.2.4 on research methodology.

Additionally, along the way the final format and scope of the reference architecture became clear. In this thesis the preliminary reference architecture is presented as if it were a single design (version A), which was then tested and turned into an evaluated reference architecture (version B). In reality the design was constantly updated, modified, rephrased and changed. There were no two designs (A & B) but many iterations. The preliminary architecture is a snapshot of what the architecture looked like when most of the literature study was complete. The evaluated reference architecture is the version that incorporates the lessons learned from the case studies and the comments of the expert panel.

All actions taken during the design of the reference architecture have been listed in 14 steps. These 14 steps are listed in Table 7. Appendix 0 provides the details of the actions performed in each step. This appendix is very valuable for those who want to know the rationale behind the design process.


Step 1: Deriving principles from theories and best practices in literature
Step 2: Clustering principles into similar topics
Step 3: Adding first lessons from case studies/additional literature
Step 4: Removing principles unrelated to metadata management
Step 5: Adding first lessons from case studies
Step 6: Moving some principles to preconditions
Step 7: Restructuring topics according to new insights
Step 8: Defining the tradeoffs and listing their contents
Step 9: Validation of principles in case study interviews
Step 10: Rewriting clusters into design principles
Step 11: Matching design principles to each other
Step 12: Writing design principles in TOGAF format
Step 13: Validation of principles in expert session
Step 14: Finalizing and updating the tradeoffs

Table 7: Overview of the 14 steps of reference architecture design. The blue steps relate to the preliminary phase and the red steps to the evaluation phase. Created by author.


6.2 Design propositions The principles and best practices distilled from the literature and expert interviews have been categorized into 8 topics. These design propositions are listed below and are a collection of the tables with best practices from chapter 5. Each topic has been evaluated in the case studies. The evaluation process has been presented in the previous section. Appendix 0 provides greater detail on the evaluation process. Appendix 12.1.9 shows the interview protocol that was built around these 8 topics.

Topic A: Consistency

• The data model should be consistent, with relations among metadata defined.
• Implemented semantic metadata should be linked to the data model.
• The data model should be horizontally and vertically consistent.
• There should be little or no overlap in semantic metadata.
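To illustrate how the consistency practices above could be checked mechanically, the sketch below flags overlapping definitions and semantic metadata that is not linked to any data model element. Both checks are simplified assumptions (exact-text matching on definitions, a flat set of model elements), not an implementation from the literature.

```python
def overlapping_terms(definitions: dict) -> set:
    """Return terms whose definition text is shared by more than one term,
    a crude indicator of semantic overlap (assumption: exact-text match)."""
    seen, overlap = {}, set()
    for term, text in definitions.items():
        key = text.strip().lower()
        if key in seen:
            overlap.update({term, seen[key]})
        else:
            seen[key] = term
    return overlap

def unlinked_metadata(definitions: dict, data_model_elements: set) -> set:
    """Semantic metadata that is not linked to any element in the data
    model violates the linkage practice above."""
    return set(definitions) - data_model_elements
```

A real implementation would compare meanings rather than literal text, for instance via the relations defined in a taxonomy or ontology, but the principle of an automated consistency report remains the same.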

Topic B: Common set of semantics

• Use generic metadata as an intermediate level between business needs and implementation.
• Generic metadata derived by subject-matter experts should translate into implementation.
• There should be a leading data model from which all implementations are derived.

Topic C: Business rules

• Semantics (definitions) are a source for business rules.
• Business rules should be linked to semantics in order to safeguard their meaning.
• Semantic metadata should be linked to business rules and vice versa.
• Business rules must be kept at a generic level to enforce consistent implementation.

Topic D: Interfaces/standards

• The metadata should be independent from (existing) technological implementations.
• There should be a loose coupling with the technical implementation.
• The metadata should be independent from data standards (including rules) in use.
• Use a flexible infrastructure based on interfaces.
• The metadata management effort should be independent of the standards in use.
• Standardize information formats as much as possible.

Topic E: External metadata

• The metadata should be stored and managed independently from the core data.
• Semantic metadata should be linked to the data model.
• Semantic metadata should be exchanged among partners in the information chain.
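A minimal sketch of exchanging semantic metadata independently from the core data: the metadata is serialized into a small envelope that chain partners can publish and import. The envelope fields (`model_version`, `definitions`) are illustrative assumptions; in practice a standard vocabulary format such as SKOS or an XBRL taxonomy would be a more likely carrier.

```python
import json

def export_metadata(definitions: dict, version: int) -> str:
    """Serialize the semantic metadata (not the core data) so it can be
    published to, and exchanged with, partners in the information chain."""
    envelope = {
        "model_version": version,
        "definitions": [
            {"term": t, "definition": d} for t, d in sorted(definitions.items())
        ],
    }
    return json.dumps(envelope, indent=2)

def import_metadata(payload: str) -> dict:
    """Rebuild the term-to-definition mapping from a partner's payload."""
    envelope = json.loads(payload)
    return {e["term"]: e["definition"] for e in envelope["definitions"]}
```

Because the metadata travels separately from the core data, each partner can validate incoming data against the agreed model version without ever sharing the underlying records.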

Topic F: Tooling

• Metadata management processes should be supported by adequate tooling.
• Tooling should support a variety of roles and needs in the management aspect.


Topic G: Versioning

• The metadata management approach should be able to provide insight into the impact of versioning.
• There should be a process/protocol for versioning.
• There should be a process/protocol for changes to the metadata architecture.
• Besides the current metadata model, the historic versions should be accessible.
• The metadata architecture should be flexible in order to respond to changing needs.

Topic H: Cooperation with partners

• The metadata management processes and coordination structures must be defined and implemented.
• The goals of the semantic metadata architecture should be defined and accepted.
• Roles and responsibilities among organizations should be present and formalized.
• All required roles and responsibilities should be filled to an adequate degree.
• Management processes and protocols should be defined and respected.
• The ownership model should match the PPIC needs.

6.3 From prescriptive to evaluated reference architecture The topics are distilled from the literature study. Each consists of a number of best practices on the same subject. This makes the topics prescriptive and very broad. Evaluation in the case studies allowed the topics to become more specific and mutually consistent.

Relation between topics and evaluated architecture As described in appendix 0, all topics proved relevant to semantic metadata management. However, not all best practices from literature could be represented by design principles. The applicable best practices that make up the topics can be found in the evaluated architecture in three forms:
1. All topics were turned into design principles with a much narrower scope that are mutually consistent. The design principles have the same subject as the topics but are more specific and much better defined.
2. A number of best practices were found to be relevant, but not generically applicable. These form the basis of the tradeoffs: best practices that are applicable in a certain situation or require a balance of positive and negative effects.
3. Some best practices have been rewritten as preconditions. For instance, having managerial support is not something that is designed, but should be present or be reached before the start of the project.


7 Case study 1: Child protective services This chapter presents the first of two complementary case studies. The selection criteria for the case are given in chapter 2.3. This case study was carried out at one of the 15 Bureaus Jeugdzorg in the Netherlands. Because of the long name, Bureau Jeugdzorg is hereafter referred to by the commonly used abbreviation BJz. The particular BJz in this case study covers a large geographical area in the Netherlands and employs several hundred people.

Scope BJz has a wide variety of responsibilities, ranging from minor incidental care, to probation, to providing access to various kinds of care providers, to safeguarding children’s wellbeing. This case study focused on the responsibility called Jeugdbescherming, which is best translated as child protective services. This responsibility was chosen since it fits best with the PPIC definition. Additionally, the focus in this case study lies on the information system. The information system extends beyond the technical domain, since it includes the aspects people, processes, data and technology.

Case study approach The findings in this case study are based on four ways in which information regarding this case was extracted from BJz:
• A large number of interviews with various employees at BJz. The interviews relating to the primary process include 2 secretaries, 3 family supervisors, 3 behavioral scientists and 2 team leaders. The other interviews relate to the information technology department and metadata management and include the data warehouse specialist, the application manager and an information analyst.
• Close inspection of 10 case files. Each paper case file ranges in thickness from 2 to 25 centimeters. The analyzed files are from two different locations and were selected to reflect reality. They range from simple to very complex, differ in the issues at hand and originate from various family supervisors from different teams.
• Site visits to three locations of BJz in three different cities. Several interviews were held in this setting and daily activities and difficulties were shown on site.
• Dozens of documents, including manuals on working procedures regarding child protective services, manuals for the information system IJ, forms and checklists, descriptions of the information architecture and data structure, lists and glossaries with definitions, metadata standards and formats, and aggregated data reports for internal and external use.

The preliminary reference architecture from chapter 6 has been reviewed in the light of the information gathered in this case study. During the case study notable topics were identified. A topic can be seen as a cluster that consists of an introduction, observations and findings. In the introduction the topic and its link to the architecture are presented. This is followed by observations, which are presented as factually as possible. Each topic has its own findings, in which the observations are valued. To conform with the literature study, the same approach has been used and the topics are grouped in the sections technology, data and processes.


7.1 Background
In order to understand the specific needs and complexities of the information system and the stakeholder context, background knowledge regarding child protective services is required. This section provides a brief description of the primary processes and the position of BJz in the information chain.

7.1.1 The primary processes
A child can be placed under the care of BJz in two cases. The most common situation is that the Raad voor de Kinderbescherming, the child protection council, asks the court for an ondertoezichtstelling (supervision by BJz), normally abbreviated to OTS. An OTS is requested when it is found that the child resides in a situation which is (potentially) detrimental to its physical or mental wellbeing. Various reasons can call for an OTS, including parents with drug or mental problems, neglect of proper care and food, domestic violence, child molestation and loverboy situations. The court signs off on a one-year OTS period, which can be extended one year at a time. The average OTS period is about 3 years. The second, less common possibility is that BJz is given custody because the family situation is permanently untenable. This second category is much less laborious, since in such a case the child normally resides in a safe environment with proper care, leaving only various administrative tasks.

In the child protective services role BJz has two major responsibilities. First, it must safeguard the physical and mental wellbeing of the children placed under its care and monitor their situation continuously. Second, it must coordinate all types of care given to the children and family members, ensure that all points of concern are being addressed and monitor progress.

These two tasks are carried out by the many family supervisors working for BJz, who form the bulk of the BJz employees tasked with child protective services. They operate in small teams. The caseload varies from 15 to 25 cases per supervisor, depending on the severity of the cases and working hours. The procedures are based on the Delta methodology, a relatively standardized approach that is applied to all cases and used throughout the Netherlands. Since the cases are rather complex, the family supervisors are aided by a number of behavioral analysts: specialists with in-depth training in the various aspects of child care. The family supervisors are also supported by secretaries for administrative tasks.

The team leader is formally responsible for the content and timeliness of all information products and monitors how the family supervisors and behavioral analysts are performing. The team leader also serves as the interface between the primary processes and the higher levels of the organization, and must manage the capabilities of the team, provide insights and implement organizational policies.


7.1.2 Information chain BJz can be considered part of a Public Private Information Chain as it is part of a network consisting of a variety of public and private parties in which child care related information is exchanged. BJz is the central node regarding information collection and is considered the director for all bodies that aid or monitor the client.

The information chain in which Jeugdbescherming is located can be viewed on two levels of aggregation. The first is a high-level view in which BJz is seen as an institution within a chain of institutions; this is the primary view within this research, since the research is centered around Public Private Information Chains. The second view is how the organization looks at itself. The proposed architecture must work in both circumstances to be acceptable.

Public Private Information Chain
Figure 14 shows BJz from the Public Private Information Chain perspective. Yet this figure hardly looks like a chain. The organizations and individuals other than BJz interact with each other as well, but it proved too difficult to map these interactions: every case is unique and the flow of information differs from case to case. Therefore Figure 14 is limited to the flows that are certain to exist, which also corresponds with the image of BJz as the central node regarding information on child protective services.

Figure 14: Flows of information to and from BJz. Created by author

Figure 14 shows both individual case-related information flows and aggregated flows. Green represents parties directly related to child protective services, light blue the official bodies that ‘contract’ BJz, purple the care providers and dark blue other related parties.

Organizational view Figure 15 shows the information chain from the point of view of the primary process within BJz. The team, as described in the introduction, is posed as the central entity that carries out the coordination in each case.

Figure 15: Information chain within BJz, showing both communications within a team and as a team with others. Created by author.

In Figure 15 four clusters of types of communication can be distinguished. First, the communications within a team are shown in blue. The family supervisors can be seen as the central node in the exchange of case-related information. Additionally, some information is aggregated for the team leader in order to manage the team, including planning, goals and caseloads. Second, case-related information is shared on a peer level with other dedicated child care bodies, colored green. Third, information is used in aggregated form within BJz, shown in red. This mostly concerns aggregated information for management and accountability, occasionally supplemented with individual cases involving difficult and important decisions. Fourth, information is continuously shared with the clients and other partners in the chain, plus there are sporadic contacts with a variety of third parties.


7.2 Metadata management – Technology This section tests three principles that are mainly related to the technology level. The technology level is introduced by providing an overview of the current architecture and a view on its adaptability. This is followed by the interfaces and standards that link the semantic metadata to its application. Finally, the third section describes what tooling is in use and which roles for tooling have been identified.

7.2.1 Architecture & adaptability
Within BJz the main IT-system that supports the primary processes and collects the data that is aggregated into reports is called IJ, an abbreviation for “Informatiesysteem Jeugdzorg”, which translates to “information system for child care services”. IJ supports the primary processes with report generation and provides the organization with information. Besides IJ there are 27 other software programs in use.

IJ is based on an Oracle 10g database platform in which the tables contain the data that make up the content of the forms and reports, which are hardcoded in the Microsoft .NET environment. The user interface is implemented in Microsoft Active Server Pages and accessed through the browser. Drafted reports remain stored in the database, but can be exported to Microsoft Word through a COM-object relationship. Some information products are exported as PDF.

When it was implemented over a decade ago IJ represented a state-of-the-art system, yet nowadays its performance is found less than satisfactory. Over time changes have been carried out regarding the system architecture, data model and hardware. As with many system designs from that era the adaptability of the system proved low, making changes expensive. An extra complicating factor is that there is no real forum in which architecture changes in IJ can be discussed among all 15 BJz in the Netherlands. Consequently each BJz adapted its own system to meet local needs, many of which are comparable. As a result, the IJ that started out years ago as a common system is now different in each BJz. This means that case files are handed over on paper between two BJz and manually entered into the other system.

Principle No.8: Adaptable architecture
The lesson that can be drawn from IJ is that not only the contents of the information system change over time, but also the context. When the context changes, the fundamentals of the architecture might need to change. The contents can be changed through versioning protocols (see principle 4). Changes in architecture pose two requirements. First, the technical design needs to be adaptable, for instance through loose coupling, the use of standards and SOA. Second, there needs to be a platform in which the fundamentals of the architecture can be discussed amongst all users.


7.2.2 Interfaces & standards
The PPIC in which BJz operates originally exchanged information on paper only. Many information products were reports that existed on paper before being implemented in IJ. Other implementations, such as new paper forms, are specified alongside IJ; they are not derived from the de facto leading data model. This means that changes are not consistently implemented, causing errors. Orchestration and consistency checks amongst all implementations are best effort: inconsistencies are remedied when they are encountered.

The principle states that there should be loose coupling with the implementation using standardized interfaces. Since the semantics are hardcoded in IJ there is no possibility to use an interface, and creating one in this particular situation is not worth the effort. As IJ may be replaced in the future, it is better to include such an interface in the list of desired specifications for a new system.

Principle No. 9: Mapping with implementation
Currently there is no loose coupling between semantics and their implementation: the semantics are hardcoded in IJ. Having a leading data model that is mapped to the other implementations would avoid inconsistencies in the implementation.
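The mapping idea behind this principle can be sketched as follows. In this minimal, hypothetical example (all term and field names are invented; they are not taken from IJ), every implementation declares which leading-model term each local field realizes, so that a simple check can flag fields that point at a term the leading model does not define:

```python
# Hypothetical sketch of principle no. 9: a leading data model whose terms
# are mapped to the local fields of each implementation (forms, reports).
# All names below are invented for illustration.

LEADING_MODEL = {
    "client_birth_date": "Date of birth of the client",
    "ots_start_date": "Start date of the supervision order (OTS)",
    "case_status": "Current status of the case",
}

# Each implementation maps its local field names to leading-model terms.
IMPLEMENTATIONS = {
    "IJ_form_intake": {"geb_datum": "client_birth_date", "status": "case_status"},
    "paper_form_A": {"geboortedatum": "client_birth_date",
                     "ots_begin": "ots_start_dat"},  # deliberate dangling mapping
}

def check_mappings(model, implementations):
    """Return (implementation, local_field, unknown_term) for every local
    field mapped to a term that does not exist in the leading model."""
    errors = []
    for impl, fields in implementations.items():
        for local, term in fields.items():
            if term not in model:
                errors.append((impl, local, term))
    return errors

print(check_mappings(LEADING_MODEL, IMPLEMENTATIONS))
```

Running such a check whenever an implementation changes would surface exactly the kind of inconsistencies that are now only remedied when encountered.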

7.2.3 Tooling for metadata management
At this moment no dedicated tooling for metadata management is in use at BJz. Semantics regarding aggregated reports are kept on paper and in MS Word files. Additionally, the terms from the Rapportageformat and the newly developed national quality indicators are available as PDF. The definitions are kept in these lists to ensure access and to act as a form of repository: a single location in which all definitions are stored. The semantics used within the primary process are either hardcoded in the forms in IJ or non-standardized, being internal semantic metadata stored alongside the core data in IJ.

The lack of tooling is not deemed problematic, since the quantity of semantic metadata used in the forms and reports can still be maintained by one or more individuals. A further reason that tooling is not yet required is that semantics are not actively shared within the information chain; as of now there is no need for sharing them.

It is expected that tooling will be required once the IT-infrastructure matures and information is exchanged in a more standardized manner:
 When the contents of the information products are provided with semantic metadata, and improved versions of IJ offer higher granularity in categories and additional process-related information, the quantity of semantics will increase. This quantity is expected to be beyond what can be managed without processes or tooling.
 There is a desire, both within and amongst BJz, to define relations between semantics. A first step in this direction is the JZ-XML project. Defining relations can be done manually, but a tool that aids in mapping would greatly reduce effort and improve consistency, quality and presentation.


 As changes are gradually implemented in IJ and the corresponding data model, it becomes harder to keep an overview of the various versions of semantics. This overview is required for the change management processes in section 7.4.2 and the creation of reports with aggregated data as described in section 7.3.3.

Principle No.5: Adequate tooling
Without standardized exchange of information and with little use of standardized semantics, metadata management can be carried out without tooling. Any increased use of semantic metadata makes tooling very desirable, or even necessary, as the BJz case shows. The roles for tooling that were found in use or desired are repository, access and relation management. In section 7.3.3 the roles of version management and mapping with business rules are confirmed as well. The need for translation might arise in the future but was not directly observed in this case.
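The three tooling roles found in use or desired (repository, access and relation management) can be illustrated with a minimal sketch. The class, terms and relation below are invented for illustration; they do not describe any tool actually in use at BJz:

```python
# Minimal sketch of the three tooling roles observed or desired at BJz:
# repository (single storage location), access (lookup) and relation
# management. Terms and relations are invented for illustration.
class MetadataRepository:
    def __init__(self):
        self.terms = {}          # repository role: single location for definitions
        self.relations = []      # relation management role: (term, relation, term)

    def add_term(self, name, definition):
        self.terms[name] = definition

    def relate(self, a, relation, b):
        self.relations.append((a, relation, b))

    def related_to(self, name):
        """Access role: return all relations a term participates in."""
        return [r for r in self.relations if name in (r[0], r[2])]

repo = MetadataRepository()
repo.add_term("youth_care", "Care provided to minors under supervision.")
repo.add_term("foster_care", "Placement of a minor in a foster family.")
repo.relate("foster_care", "narrower_than", "youth_care")
print(repo.related_to("youth_care"))
```

Even such a simple structure already covers what the paper lists and PDF glossaries provide today, while making the relations between terms explicit and queryable.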


7.3 Metadata management – Data
In this section three principles are tested that relate mainly to the data level of semantic metadata management. First the topic of external metadata is discussed, since it is related to having a consistent model, which is described in the second section. The third and final section describes how the semantics are related to the business rules used in the primary process and for creating aggregated reports.

7.3.1 External metadata
When making an inventory of the semantic metadata in use at BJz, three categories can be observed:
 Semantic metadata related to categorization and context, such as the field names in the forms and the column headers in the database. The column headers are not for end-user consumption as they are linked to IJ. Both groups are external metadata. However, all of them are hardcoded in IJ, which was a common approach in the era when IJ was developed. This diminishes the advantages of external metadata with respect to metadata management.
 Semantic metadata related to the contents of the forms, which is internal metadata. In order to allow maximum freedom the end user may provide any input, as long as it is within a 10,000-character limit. The downside of this freedom is that additional text must be added to provide context for the findings that are central to that section. Additionally, free input may reduce the consistency of this added context: a different choice of words may not show up during a search.
 Semantic metadata related to aggregated information. This category is fully external, as these definitions are kept on paper, as shown in section 7.3.3.

The current implementation allows for limited management options and automation. Versioning is very difficult: changes to the first category have to be hardcoded in IJ, whereby older versions are lost and a mismatch between older content and new semantics may arise. Creating new columns in the database for changed semantics creates problems during aggregation, as detailed in section 7.3.3. With the second category being semantic metadata within plain text, the options for automation to support the knowledge workers in their daily activities are limited. End users have expressed the wish for contents to be interpretable by computer systems. When composing a new information product it would be very valuable if every observation regarding, for example, 'education' could be easily retrieved, instead of having to manually search through dozens of information products for the right excerpts.
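The retrieval wish expressed by the end users can be illustrated with a small sketch of externalized semantics. The tag names and report fragments below are invented; the point is only that once free-text fragments carry concept tags alongside the text, all observations about a concept such as 'education' become retrievable without scanning whole documents:

```python
# Sketch of externalizing semantics: free-text report sections carry
# concept tags alongside the text, so every observation regarding e.g.
# 'education' can be retrieved directly. Tags and contents are invented.
reports = [
    {"case": 101, "text": "Attendance at school has improved.", "tags": ["education"]},
    {"case": 101, "text": "Home situation remains tense.", "tags": ["home_situation"]},
    {"case": 102, "text": "Tutoring was arranged via school.", "tags": ["education", "care"]},
]

def find_by_concept(sections, concept):
    """Return all text fragments tagged with the given concept."""
    return [s["text"] for s in sections if concept in s["tags"]]

print(find_by_concept(reports, "education"))
```

Note that the lookup matches on the tag, not on the wording of the text, so a different choice of words in the free text no longer hides an observation from a search.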

Principle No. 3: External metadata
The use of external metadata is limited within BJz. This was to be expected, since literature indicated that full-text information products often lack external metadata; highly standardized products such as financial records or measurements are more likely to use it. Making all metadata external would benefit BJz in two ways. First, it may make metadata management efforts much easier, especially regarding versioning. Second, tasks that are essential but take up time that could otherwise be spent on professional judgment could be further automated. Having the metadata external may also increase system performance, but this could not be fully verified.


7.3.2 Consistent data model
The data model within IJ originates from the paper forms in use before IJ was introduced: 10 years ago IJ merely provided a digital way of filling in the forms that already existed on paper. Nowadays the implementation within IJ can be considered the de facto data model, even though it is hardcoded. The database on which IJ runs is a relational database, describing a number of relations. As a result of this heritage the data model in IJ is not fully consistent and there is a lot of overlap. Additionally, not all semantics are well defined, since they are form titles, not definitions.

The need for definitions used to be limited, since the forms were always seen and analyzed by experts in their original context. With the current automated reuse of fields from various forms this context is lost. According to the end users the reuse of fields is helpful, but from time to time it can be detrimental to information quality as well. Over time changes have been made to make the data model more consistent and to reduce overlap: partially because of automation such as the reuse of fields over several forms, partially to provide a better match with the queries used in reports with aggregated data, but mostly at the request of end users within the primary process.

Currently the data models regarding single cases (in IJ) and aggregated reports are unrelated. No relations are defined between the two; the consequences are presented in section 7.3.3.

Principle No.6: Consistent data model
The data model within BJz is not fully consistent, a heritage from digitizing the paper forms previously used. These inconsistencies hamper the reuse of information between different information products, a reuse that, when properly implemented, may reduce the effort required of family supervisors. Creating consistency requires the relations among semantics to be specified. This cannot be done by the application manager alone: it requires the cooperation of the information analyst and subject-matter experts, in particular the behavioral analysts tasked with checking various documents.

7.3.3 Link to business rules
Within BJz many reports are drafted for both internal and external use. These reports vary from a single table with an overview for a single municipality to the quarterly reports to the province, which are dozens of pages long. Some reports are drafted periodically, such as the weekly caseloads per team, the monthly production figures or the quarterly and annual reports to the province. On occasion unique reports are drafted to answer a specific question for management purposes or at the request of third parties.

The contents of the reports are extracted by aggregating all relevant individual cases. Every night relevant data from the data storage is extracted and reformatted through ETL (Extract, Transform and Load, a common database technique) and loaded into a dedicated business intelligence database. On this database queries are run to aggregate the individual cases into the desired information. Business rules are incorporated within these queries in order to correctly classify cases and to determine whether thresholds or terms are violated. The queries are created, stored and run in the IBM Cognos software, which is commonly used in medium and large enterprises.

Depending on the type of report, drafting the queries takes half an hour to several weeks. Most of this time is spent on determining what data is exactly present within the various tables and on checking whether the queries provide the exact information that is desired. The exact definitions are looked up manually: the definitions are on paper, while most of the metadata is present in the forms available digitally within IJ. There is no mapping between semantic metadata and business rules.
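How a business rule ends up embedded inside an aggregation query can be sketched with a small in-memory example. The table, column names and the 5-day threshold below are invented for illustration; the real queries run in IBM Cognos against the business intelligence database:

```python
import sqlite3

# Sketch of a reporting query with an embedded business rule (here: an
# assumed term of 5 days for first contact). Table and column names are
# invented; the real queries are created and run in IBM Cognos.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cases (team TEXT, days_to_first_contact INTEGER)")
con.executemany("INSERT INTO cases VALUES (?, ?)",
                [("A", 3), ("A", 8), ("B", 4), ("B", 2)])

# Business rule: a case violates the term when first contact took > 5 days.
rows = con.execute("""
    SELECT team,
           COUNT(*) AS total,
           SUM(CASE WHEN days_to_first_contact > 5 THEN 1 ELSE 0 END) AS violations
    FROM cases GROUP BY team ORDER BY team
""").fetchall()
print(rows)  # [('A', 2, 1), ('B', 2, 0)]
```

The threshold lives only inside the query text; without a mapping between the semantic definition of the term and this CASE expression, a change in the definition must be found and corrected by hand in every query that uses it.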

The close relation between business rules and semantic metadata is clearly shown in this case. Over time definitions have changed, and for altered semantics new columns have been added to the tables. In queries ranging over a time span that covers both the old and the new situation, combining both to provide accurate data is very hard and time consuming.

The creation of reports with aggregated data is a transformation process in which each of the four links of the chain has the potential to go wrong:
 First, within the primary process the data must match the semantics (in the case of forms) or the semantics must match the context (in the case of tags).
 The ETL that extracts the relevant data from the primary process and transforms it in a way that the business intelligence software can use must be correct.
 It is possible that the queries do not match what the report claims to provide.
 Based on the reports created within the official reporting process, employees carry out their own calculations and modifications of tables. This is out of scope for this research.

Principle No.2: Mapping with business rules The case shows that there is a strong relation between semantics and business rules, even though their use is very different. A change in the semantic metadata may have impact on the associated business rules and vice versa.

When the semantics are mapped to the existing queries it becomes much easier to create queries and check their validity. This is especially true when the data model is consistent (principle no. 6) and the relations between single instances and aggregations are well defined. From an organizational point of view, effort is reduced and the quality of reports with aggregated data is increased.


7.4 Metadata management – Processes
This section tests the three final principles. The first section is related to having a conceptual model and the role concepts may play in alignment efforts. This is followed by a review of the change management processes and the lessons drawn from them. Finally, the third section identifies what division of roles and responsibilities has been encountered and how they match the roles found in literature.

7.4.1 Conceptual level
As stated earlier, the use of semantic metadata within BJz (and all other partners within the chain) is currently very limited. Semantic metadata is mostly found in the forms in IJ that are used to draft reports and in the various reports in which data is aggregated for management and accountability purposes. Regarding the latter category there is a form of conceptual level: Jeugdzorg Nederland and the Ministry of Health, Welfare and Sport created a list of definitions called the Rapportageformat. These definitions are listed in written text and the format of the corresponding tables is given. Additional definitions drafted by the management of BJz are held centrally at the management level. All these definitions apply to aggregated data; those related to single cases are hardcoded in the system (IJ) and forms.

Within BJz a total of 28 software systems are in use, excluding various office applications. Only some of these systems use client data; other programs are used for HRM, finance and other tasks. Most digital forms also exist on paper, although the use of paper declines steadily. Altogether these systems hold many forms, and many tags could be applicable. A list of terminology is thought to be very useful. In fact this would constitute a conceptual level, although without the mapping and digital availability presented in this thesis.

Principle No.1: Conceptual level
There is a conceptual level for metadata within BJz. This level is very thin, since it only holds concepts used for information aggregation for a selection of management and accountability reports. Despite its limited size, the conceptual level already shows its benefits within the processes of drafting and analyzing these reports: it increases consistency and information quality. It also serves as a master data file that can be used as a reference for implementing changes or for reviewing the performance and compliance of the system.
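What a machine-readable version of such a conceptual level could look like is sketched below. The entries are invented for illustration; the real definitions currently live in the Rapportageformat and other paper lists:

```python
# Minimal sketch of a machine-readable conceptual level: each concept has
# a definition, a source, and a record of where it is implemented.
# Entries are invented; the real definitions live on paper lists such as
# the Rapportageformat.
glossary = {
    "caseload": {
        "definition": "Number of open cases assigned to one family supervisor.",
        "source": "Rapportageformat",
        "implemented_in": ["weekly team report", "quarterly province report"],
    },
    "ots_duration": {
        "definition": "Elapsed time between start and end of a supervision order.",
        "source": "internal BJz definition",
        "implemented_in": ["annual report"],
    },
}

def where_used(term):
    """Look up every information product a concept appears in."""
    return glossary[term]["implemented_in"]

print(where_used("caseload"))
```

Recording the implementation locations alongside each definition is what turns the glossary into a master data file: a change to a concept immediately shows which reports are affected.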

7.4.2 Change management
BJz employs a number of information analysts whose job it is to match the information needs of the primary process with those of the aggregated reports. Additionally, the information needs and the implementation are matched. Requests for change are drawn up in a document that specifies what the new situation should look like.

Modifications to IJ are edited directly into the Microsoft .NET environment and are directly visible through the Active Server Pages rendering. Implementing changes is considered costly and time consuming. The application manager has a high workload, partially because not all changes in IJ have been documented. When the ETL or the business intelligence server has to be changed, external expertise is required.

Principle No. 4: Change management
External metadata would allow dedicated tooling to be used for changing semantics. Additionally, external metadata combined with a rich taxonomy in which the information products are defined would allow easy modifications. This would reduce complexity, since it disconnects the data model from the implementation.
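How external metadata eases change management, and in particular the versioning problem noted in section 7.3.1, can be sketched as follows. The term and its two definitions are invented for illustration: when a definition changes, the older version remains retrievable, so data recorded under either era can still be interpreted correctly:

```python
# Sketch of versioned external metadata: when a definition changes, older
# versions remain retrievable, so queries over mixed-era data can still
# be interpreted. The term and definitions are invented for illustration.
from datetime import date

versions = {
    "active_case": [
        (date(2008, 1, 1), "Case with a running OTS."),
        (date(2011, 1, 1), "Case with a running OTS or pending extension."),
    ],
}

def definition_at(term, when):
    """Return the definition of a term that was valid on a given date."""
    valid = [d for start, d in versions[term] if start <= when]
    return valid[-1] if valid else None

print(definition_at("active_case", date(2010, 6, 1)))
```

With the semantics hardcoded in IJ this history is lost as soon as a form is edited; kept externally, both versions coexist and the change itself becomes an auditable record.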

7.4.3 Roles and responsibilities
Within BJz there are four roles in the semantic metadata management domain:
 The data warehouse specialist maintains an overview of the semantics for aggregated reports and creates and stores the business rules.
 The information manager matches the information needs of the primary process with those of the aggregated reports and with the implementation.
 The application manager collects requests for change regarding IJ and carries them out in concordance with the information managers.
 The management specifies semantics for aggregated reports in case definitions are not yet covered by the Rapportageformat.
See chapter 9.4.3 and appendix 12.1.6 for further observations on required roles.

Role – Observations

Initiator & enabler – The initiator and enabler role was found lacking, both within BJz and within the PPIC. Within BJz the responsibility for metadata, let alone semantics, was not specifically laid down. Some of the tasks are included in other responsibilities, such as those of the information manager and the data warehouse specialist. Within the PPIC there also is no initiator for information exchange, mainly out of fear that in the future a different standard will be adopted. There is a pilot project, but most organizations are waiting for a communication standard that is to be developed by industry organization Jeugdzorg Nederland, yet has little priority.

Developer – The developer role is taken on by multidisciplinary project teams. Since BJz is the leading party in information orchestration it takes a leading role when it comes to information exchange. The multiple disciplines represented within the team ensure that all fields are covered, including strategic interests and technological capabilities.

Standardization – Regarding semantics for aggregated reports the standardization role clearly lies with industry organization Jeugdzorg Nederland. On the technology level Jeugdzorg Nederland, along with the council of the 15 BJz, is working on a standard, but progress is slow.

Control & process monitoring – Within BJz the timely and correct delivery of incoming and outgoing information is the responsibility of the family supervisors. Their performance is monitored by the team leaders. Regarding the information system as a whole the I&A department is responsible: the application manager stores requests for change and the information managers review operational performance.

Facilitator – The facilitator role is carried out on a high level by Jeugdzorg Nederland, which collects and disseminates best practices amongst the 15 BJz. However, most are related to the primary process, a few to IJ and hardly any to inter-organizational cooperation.

Service and product aggregator – The service and product aggregator roles were not found in the BJz case. This probably has to do with the size of the organizations involved: many links within the PPIC are only a few dozen or a few hundred people large. These organizations do not feature large dedicated departments that support the primary processes. Additionally, many IT-systems in use are off-the-shelf products provided by software developers.

Accountability – BJz is accountable for every information product it produces, even when it is based on information acquired from third parties. For that reason every information product from the primary process shared with any third party is signed off by a team leader. Higher up in the organization the management checks outgoing reports. In the end the accountability lies with the director.

Process improvement – Cross-agency processes are revised and improved in pilot projects. There is no standard approach or trigger for these pilot projects. However, all pilot projects are reviewed by the management and carried out by multidisciplinary teams.

Information analyst – The information analyst ensures a good fit between the semantics within the information system and the requirements in the primary process. Semantics have a lifecycle, starting at specification and requiring constant alignment.

Table 8: Overview of roles within BJz. Created by author.

Principle No. 7: Cooperation with stakeholders
In the BJz case study most roles described in literature were found to exist. However, they are often not formalized and not consistently carried out in practice. Many of the described roles relate to cooperation with partners, but in practice most roles are carried out with a focus on the organization itself. There are very few dedicated specialists, since BJz has relatively few support departments. As a result a single role can be fulfilled by multiple people, but more often one person holds various roles.


7.5 Conclusion
All 9 principles tested in the case were found to be applicable and relevant. They were confirmed by real-life practices and examples and were found relevant by the experts at BJz. As expected, the maturity level of semantic metadata management was intermediate: not many principles are carried out in the current state. Thus far semantic metadata management has not been high on the agenda within BJz, within the information chain, or within the groups that represent the interests of the 15 BJz in the Netherlands. This particular BJz aims to take a leading role.

The case study does indicate that there is a lot of potential for semantic metadata management in this particular PPIC. Transaction costs of information may be significantly reduced. This does not only reduce costs, but also reduces the burden on specialists and increases information quality.

Recommendations for Bureau Jeugdzorg
The first thing BJz should do is create a conceptual level in order to align the primary processes and to align the semantics of the individual cases with the creation of aggregated reports. This will steadily improve the quality of management information, allowing better decisions to be taken in all areas. Having a conceptual level is a no-regret measure: even when other organizations in the chain do not adopt the model, it helps in three ways. It makes internal information exchange easier, results in more accurate management information and makes the output of BJz easier to interpret for other parties such as the courts and care providers.

The second priority should be making the semantic metadata external and adding a form of tagging. This will unlock the contents for advanced search options. The employees within BJz would immediately save a significant amount of time during their daily activities, alleviating some of the work pressure that BJz employees perceive as high. In addition it will make versioning less costly and less time consuming.

Third, semantic metadata should be actively managed by BJz, with clearly defined roles and responsibilities. Ownership and active management prevent faults and misinterpretation. Protocols for change management should be implemented in order to match semantics with the needs of the primary process. Those changes must also be aligned with the creation of aggregated reports.


8 Case study 2: Tax office This chapter entails the second of two complementary case studies. The selection criteria for the case are given in chapter 2.3. This case study was carried out at Logius (the Dutch digital government office) and the Belastingdienst (the Dutch tax office, further referred to simply as the tax office). Both Logius and the tax office operate at the national level. The tax office is one of the largest organizations in the Netherlands, with over 33,000 employees in many locations. The tax office is also one of the few organizations that has its own department for developing its business applications.

Scope Semantic metadata is relevant in nearly every branch of the tax office, but that scope would be too large. This case study focuses on the Standard Business Reporting (SBR) program, which is one of the most mature metadata management efforts in the Netherlands. The SBR program involves other branches of government as well: the chambers of commerce (KvK) and the central bureau for statistics (CBS) also take part, but are out of scope. The focus is on the upstream part of the SBR chain, ranging from Logius to the tax office. This is where the specification and most of the metadata management effort take place.

Case study approach The findings in this case study are based on four ways in which information regarding this case was extracted from both Logius and the tax office:
 Several interviews with senior employees at Logius and four tax office business units. The interviews touched upon the current state of metadata management, pilot projects and future projects, roles, responsibilities and cooperation, and best practices for use in any context.
 A review of the published architecture and presentations on XBRL. The Dutch Taxonomy is published and available to the public. The contents and structure have been reviewed.
 Site visits to the tax office in Apeldoorn and Utrecht.
 A wide variety of documents, including workflows for versioning, procedures for consultation, checklists, lists and glossaries, and high level overviews of processes, departments and cooperation schemes. Additionally, there is information regarding the SBR project that is available to the public, including brochures and the SBR wiki.

The preliminary reference architecture from chapter 6 has been reviewed in the light of the information gathered in this case study. During the case study notable topics were identified. A topic can be seen as a cluster that consists of an introduction, observations and findings. In the introduction the topic and its link to the architecture are presented. This is followed by observations, which are presented as factually as possible. Each topic has its own findings, in which the observations are valued. To conform with the literature study the same approach has been used and the topics are grouped in the sections technology, data and processes.


8.1 Background The Dutch Standard Business Reporting (SBR) program stems from the desire to reduce the administrative burden. In the 1990s it was found that companies spent considerable time and effort filing taxes and reporting statistics. This was found to be a waste of effort: spending time and money on tasks not related to the core business is detrimental to economic performance and lowers the competitive power of businesses. The SBR program was one of the initiatives designed to lower the transaction costs of reporting to the government.

8.1.1 Primary processes Companies vested in the Netherlands are required to pay taxes and to report various statistics to the government. The primary process within the SBR program can be viewed as the information exchange between the government and private parties. The SBR program allows companies to file a number of tax statements and statistics to three types of government bodies:

Tax office:
 Inkomstenbelasting / IB (income tax)
 Omzetbelasting / OB (sales tax)
 Vennootschapsbelasting / VPB (corporate tax)

CBS (central bureau for statistics):
 Opgaven productiestatistieken (statement of production statistics)
 Opgaven investeringsstatistieken (statement of investment statistics)
 Opgaven korte termijnstatistieken (statement of short term statistics)

Chambers of commerce:
 Jaarrekeningen (balance sheets)

In this case study only the reporting to the tax office is in scope. Return messages follow a different process; the metadata management effort is similar, but the processes are not. Combining all message types, the tax office expects to receive as little as 125,000 messages in the year 2011. This number is expected to grow to nearly 3 million messages in 2012 and well over 10 million messages in the year 2014. All the stated reports are to be handed in using the XBRL format, an adaptation of the XML standard used for financial reporting. Aside from the XBRL format, the semantic metadata is specified in the Nederlandse Taxonomie (Dutch Taxonomy), which is published for public use.
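To illustrate how such a report is structured, the following Python sketch builds a minimal XBRL-style instance document. The namespaces and element names (Turnover, VatDue) are hypothetical simplifications; the real Dutch Taxonomy defines its own namespaces and concepts.

```python
import xml.etree.ElementTree as ET

# Illustrative namespaces; the actual Dutch Taxonomy publishes its own.
NS_XBRLI = "http://www.xbrl.org/2003/instance"
NS_NT = "http://example.org/nt/ob"  # placeholder, not a real taxonomy URI

def build_sales_tax_message(turnover, vat_due):
    """Build a minimal XBRL-style instance for a sales tax (OB) report.

    Element names are hypothetical; a real instance would also carry
    contexts, units and references into the taxonomy schema.
    """
    root = ET.Element("{%s}xbrl" % NS_XBRLI)
    ET.SubElement(root, "{%s}Turnover" % NS_NT).text = str(turnover)
    ET.SubElement(root, "{%s}VatDue" % NS_NT).text = str(vat_due)
    return ET.tostring(root, encoding="unicode")

message = build_sales_tax_message(100000, 21000)
```

The point of the format is that each figure is tagged with a taxonomy concept, so the receiver can interpret the message against the published semantics rather than against a form layout.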

According to the tax office, the following advantages apply to the use of XBRL messages and standardized semantic metadata in the Dutch Taxonomy:
1. Reduced costs for companies and intermediaries by standardizing reports by means of a taxonomy, allowing their processes to be standardized as well.
2. Reduced costs for companies and intermediaries by reducing transaction costs.
3. Improved data integrity by means of Horizontaal Toezicht, a form of benchmarking.
4. Improved certainty on fiscal position through faster delivery and processing.
5. Reduced costs, since forced checks result in fewer reports to be reviewed manually.
6. Improved information quality by using a common set of semantics within the information chain.
7. Improved consistency of reports and a better match with bookkeeping software.


Overview of the information chain In order to understand the context of this case study, the PPIC of the SBR project is introduced. The PPIC is shown in Figure 16. The PPIC starts with the companies who are to report to the government. The majority of the companies in the Netherlands (an estimated 80%) have an intermediary, such as a bookkeeper, who handles tax reports and other official financial and statistical statements. One of the ideas of the SBR project is that in time companies will also use XBRL for communications. Dutch banks are already participating; however, these other private parties are out of scope for this research.

Figure 16: Overview of data streams in the primary process regarding sales tax. Created by author.

All messages are sent to the Digipoort, one of the government bodies established to function as a portal for electronic messages directed to the government (Fokkema & Hulstijn, 2011). The Digipoort takes care of the reception of the messages, as is shown in green in Figure 17. Upon reception a validation process runs in order to check whether the message is intact, the type of message is known, the XBRL and Dutch Taxonomy standards are adhered to, and whether the message contains any viruses or other undesirable payload. The message is then transmitted to the government body it was addressed to, and the sender is notified of its delivery by means of an acknowledgement.
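The reception checks can be sketched as a small validation pipeline. This is an illustrative Python sketch, not the actual Digipoort implementation: the schema validation against the Dutch Taxonomy and the virus scan are stubbed out, and the message types listed are a guessed subset.

```python
import xml.etree.ElementTree as ET

# Illustrative message types; the actual list is defined by the SBR program.
KNOWN_MESSAGE_TYPES = {"IB", "OB", "VPB"}

def validate_message(raw, message_type):
    """Run the reception checks sketched above; an empty list means accepted.

    Taxonomy schema validation and virus scanning depend on external
    tooling and are therefore not shown here.
    """
    errors = []
    if message_type not in KNOWN_MESSAGE_TYPES:
        errors.append("unknown message type")
    try:
        ET.fromstring(raw)  # is the message intact (well-formed XML)?
    except ET.ParseError:
        errors.append("message is not well-formed")
    return errors
```

For example, `validate_message(b"<xbrl/>", "OB")` yields no errors, while an unknown type or a broken payload produces a rejection that can be reported back in the acknowledgement.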

At the tax office the message arrives at the central administration, which handles all digital processing. These steps are shown in dark blue in Figure 17. Even though the message was validated at the Digipoort, some additional checks are carried out. The XBRL message is stored and archived in its original format. Subsequently it is converted to XML for internal processing and sent to the respective back office. Much of the fiscal processing is done automatically; areas of special attention and messages with suspicious values are checked manually by fiscal specialists. Eventually there is an outcome of the process and a response to the company is provided, on paper and digitally if desired. These are both out of scope for this research.


Figure 17: Steps taken in filing a tax report through SBR. Created by author.


8.2 Metadata management – Technology This section tests three principles that are mainly related to the technology level. The technology level is introduced by providing an overview of the current architecture and a view on its adaptability. This is followed by the interfaces and standards that link the semantic metadata to its application. Finally, the third section describes what tooling is in use and which roles for tooling have been identified.

8.2.1 Architecture & adaptability The SBR case shows the necessity of having an adaptable architecture. None of the systems in use before the SBR project was initiated were designed to be interoperable in this specific way, or to accommodate XBRL. One of the major lessons that can be drawn is that new data streams, new partners and new content may become part of the information system, with the information system defined in a broad sense to include people, technology, processes and data. The information system must be able to respond to and facilitate changing needs and requirements.

The Dutch Taxonomy (NT) holds the semantics. The Dutch Taxonomy Architecture (NTA) contains the format of the taxonomy. For the NTA there is a special expert group that is to be consulted. A platform for discussing the architecture is not only relevant for major changes but also for more practical issues. For instance, it is possible to agree on semantics, only to disagree on format. Stakeholders may have different systems: in this case the tax office wanted a name to be 40 characters long, while another partner's system was only able to cope with 25 characters.

Requests for changes in the Dutch Taxonomy Architecture (NTA) follow a well defined protocol, which is described in detail in section 8.4.2.

Principle No. 8: Adaptable architecture The SBR case shows that an architecture has to be adaptable. Requirements for an information system do not only exist prior to its development; requirements are defined continuously. Having a platform to discuss changes helps the architecture respond to these evolving requirements.


8.2.2 Interfaces & standards Processes have been defined for each type of tax, and various technologies are in use to support them. Additionally, various standards containing the same concepts are in use. For the types of tax covered by the SBR project, each semantic exists in both XML (back office) and XBRL (delivery by the beneficiary). Currently the leading data model is the implementation in XML. Since this is a technical implementation, it is difficult to use it as a reference.

When semantics are changed, it is hard to determine everywhere the changes should be implemented. There is no mapping of the semantics to a leading data model, let alone a conceptual one. Oversight of where particular semantics are used is easily lost. Some types of tax still use paper forms, others do not. Some semantics are used in the tax office computer application, others are not. Most are used in third party software packages, some are not. Some semantics are explained on the website or in brochures. All semantics should be known to the service desks that aid customers. Types of tax used in the SBR program also require XBRL translation.

Principle No. 9: Mapping with implementation Having a mapping with the implementation creates an overview of where which semantics are applied. When changes are to be implemented, it is immediately known where they apply. This also allows forecasting of the impact of the proposed changes.
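As a sketch of how such a mapping could be recorded, the Python fragment below links semantic concepts to the artefacts in which they are implemented, so the impact of a proposed change can be listed directly. Both the concepts and the artefact names are hypothetical examples, not the actual tax office mapping.

```python
# Hypothetical mapping from semantic concepts to implementation artefacts;
# the real mapping would cover many more concepts and locations.
IMPLEMENTATION_MAP = {
    "Turnover": ["back-office XML schema", "Dutch Taxonomy (XBRL)",
                 "paper form", "website glossary"],
    "VatDue": ["back-office XML schema", "Dutch Taxonomy (XBRL)"],
}

def impact_of_change(concept):
    """Return every artefact that must be updated when a concept changes."""
    return IMPLEMENTATION_MAP.get(concept, [])
```

With such a registry in place, a request for change can be accompanied by the list of affected forms, schemas and publications, which is exactly the impact forecast the principle calls for.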


8.3 Metadata management – Data In this section three principles are tested that relate mainly to the data level of semantic metadata management. First, the topic of external metadata is discussed, since it is related to having a consistent model, described in the second section. The third and final section describes how the semantics are related to the business rules used in the primary process and for creating aggregated reports.

8.3.1 External metadata External metadata is used both within the tax office processes and within the SBR chain.

Tax office primary processes Within the primary processes of the tax office, the large information systems that process the tax reports all use external semantic metadata. The semantics are stored in a repository linked to the implementation. Due to its age, the repository does not meet modern standards. For instance, the primary key is an auto number and definitions are optional and not standardized. Because of the auto number, no relations among semantics are defined. The external metadata makes versioning much easier. This is an important requirement for the tax office, since changes have to be implemented every year. This means that each year a new set of semantics is used, although most remain unchanged. Previous versions are stored in order to be able to interpret tax statements made several years ago: for at least 5 years in the past, the tax office still has the option to review taxes and adjust its notification.
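The versioning behaviour described here can be sketched as a small repository that keeps a set of semantics per year, so that statements filed in earlier years can still be interpreted with the semantics valid at the time of filing. This is an illustrative design, not the actual tax office repository, and the definitions used are placeholders.

```python
class SemanticsRepository:
    """Versioned store of semantic definitions (illustrative sketch).

    Past versions are retained so tax statements from earlier years can
    still be interpreted during review, as described above.
    """

    def __init__(self):
        self._versions = {}  # year -> {term: definition}

    def publish(self, year, definitions):
        """Publish the set of semantics that is valid for a given year."""
        self._versions[year] = dict(definitions)

    def lookup(self, term, year):
        """Interpret a term with the definition valid in the given year."""
        return self._versions[year][term]

repo = SemanticsRepository()
repo.publish(2010, {"Turnover": "2010 definition"})
repo.publish(2011, {"Turnover": "2011 definition"})
```

Because each year's set is stored whole, a reviewer looking at a 2010 statement resolves its terms against the 2010 semantics even after later versions have been published.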

Standard Business Reporting In the Standard Business Reporting chain all semantics are encompassed by the Dutch Taxonomy, which is published in the public domain. All reports are made up in XBRL and are linked to the Dutch Taxonomy. The semantics are also provided to software developers, which allows them to be integrated in accountancy software. With the right labels (semantics) attached, semantics are linked to figures during ordinary accountancy tasks. A tax statement can then be created at the push of a button: the right figures are extracted from the accountancy software and an XBRL message is created.

Principle 3: External metadata The SBR case is a great example of how external semantic metadata allows for much easier metadata management, improved performance and additional means of automation. These include validation of messages using a predefined schema with threshold values and automated creation of reports from a larger dataset.

8.3.2 Consistent data model The tax office case shows that a consistent data model is desirable. Processes need to be reviewed by subject-matter experts when changed, and changes are common, as shown in 10.4.2. The high level of automation means that even a small mistake in a process that handles millions of cases per year may have significant consequences. Doing it right the first time saves a lot of time. Even when processes are segregated in such a way that inconsistencies do not matter, removing them is desirable. End users are confronted with multiple types of tax, and inconsistencies undermine the reputation of the tax office. Moreover, segregated systems may become interconnected over time, as shown by the move towards PPICs. That move is characterized by creating digital relations among organizations that did not exist earlier.

Data models are created on three levels: per type of tax, the partial taxonomy and the top down data models that are under construction. Currently the leading models are the functional designs that are created per type of tax. A number of these models are consolidated into the partial taxonomy, but most types of tax are not in the SBR program.

Currently there are two efforts to create a consistent data model that spans multiple processes. First, there is the bottom up approach of the partial taxonomy. This is a bottom up approach since the semantic metadata from the application level is bundled and reused. The partial taxonomy is made consistent for the end user, but is not used as a data model for organizational redesign. Second, there is a top down approach that aims to create an object model and conceptual data model. The first step in this process is to determine the objects and relations per type of tax. In the year 2013 these are to be combined into a single model. This requires tooling for relation management and versioning, which is not available to the project yet and possibly has to be made to specification. The top down approach has the semantics derived from the business information needs, as argued in principle 1. In time this is seen as the proper way; the bottom up data model serves as an intermediate model that is used because it is already available.

Principle No. 6: Consistent data model The case study shows that having a consistent data model is desirable. Inconsistencies may lead to loss of information quality and to incidents. Not having a consistent data model requires additional quality checks, claiming effort from knowledge workers that is not spent on adding value. Consistency in the data model is attained by specifying the relations. This requires subject-matter expertise, at least for quality review. In order to maintain consistency from the conceptual level to the implementation, a mapping as suggested in principle 9 is required.

8.3.3 Link to business rules Business rules play a vital role within the primary process of the tax office. The fiscal process of processing and reviewing tax statements is based on tax law. The outcome of the process is defined by many rules, values, thresholds, entity characteristics, transitional arrangements and so on. These values are clearly stated in the law. The principle is that the same rules apply to any entity in an equal way, making the process fully deterministic.

The tax office currently uses Regelspraak, an adaptation of the RuleSpeak restricted language, for defining many of the business rules in use. The use of Regelspraak makes review easier, since it is quite close to natural language. Currently a program is carried out within the tax office to orchestrate the use of business rules. Since business rules and semantics are the responsibilities of different departments, a direct link between the two seems unthinkable; the tax office tries to reduce complexity by limiting interdependencies. But there is a possibility that a mapping will be made with semantics. If a mapping is adopted, it will be loosely coupled and used only to inform the other department of changes. Whether this mapping will be with conceptual level semantics or with the implementation is not known yet.

The rules are clearly linked to the definitions (semantics) in use. For instance, the definition determines the category an entity falls into, and the linked business rules determine whether it is entitled to a certain arrangement or not. In practice, tax laws are updated every year. The number of annual changes differs per type of tax, but it is often easy to lose overview. Court cases are also a source of changes: a judge may rule that under a certain circumstance an entity should be viewed differently, with corresponding entitlements or obligations. Such a ruling requires both new semantics and new business rules, which need to be in sync.

Those responsible for specifying the metadata agree that business rules should be managed on a conceptual level as well. In line with the conceptual level for semantic metadata, there should be a link between the business rules in concept and those in use. Business rules and semantics are related, and insight into these relations is deemed useful. In case these relations are to be specified, there should be a form of loose coupling to comply with the tax office strategy of independent business units. Forms of loose coupling could be references or a mapping.
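A minimal sketch of such loose coupling: business rules reference semantics by identifier only, so a change in a definition can be traced to the rules that need review without tightly binding the two administrations together. The concept and rule identifiers below are invented for illustration.

```python
# Hypothetical semantics; real definitions follow from tax law.
SEMANTICS = {
    "SmallEnterprise": "an entity meeting the small-enterprise criteria",
}

# Loose coupling: rules refer to semantics by identifier only, so the two
# can be managed independently by different departments.
BUSINESS_RULES = [
    {"id": "R1", "uses": ["SmallEnterprise"],
     "text": "A SmallEnterprise is entitled to the reduced arrangement."},
]

def rules_affected_by(concept):
    """When a definition changes, list the rules that must be reviewed."""
    return [rule["id"] for rule in BUSINESS_RULES if concept in rule["uses"]]
```

The mapping is only informational: changing a definition does not automatically rewrite any rule, it merely tells the other department which rules to look at, which matches the "inform only" intent described above.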

Principle No. 2: Mapping with business rules The tax office case indicates there are strong links between business rules and semantics. A change in the semantic metadata may have impact on the associated business rules and vice versa. In the future there probably will be a mapping between semantics and business rules. This mapping will be loosely coupled, as proposed in the principle.


8.4 Metadata management – Processes This section tests the three final principles. The first section is related to having a conceptual model and the role concepts may play in alignment efforts. This is followed by a review of the change management processes and the lessons drawn from them. Finally, the third section identifies the division of roles and responsibilities that was encountered and how it matches the roles found in literature.

8.4.1 Conceptual level Currently semantic metadata is specified by specialists on behalf of the business level. The semantics are specified right into the implementation. A conceptual level is currently non-existent, although there are calls for implementing one. A conceptual level is deemed valuable for a number of reasons:
 The implementation of semantics has to be reviewed by subject-matter experts in order to validate the fit between semantics and their intended purpose. This is much easier at the conceptual level, since subject-matter experts have a hard time reading code.
 As with the primary process, having a conceptual model is easier for high level review. The most important of these is the legal review. Definitions follow from the law and the task given to the government body; these in turn lead to additional semantics practically required to carry out the given task. The tax office is only allowed to request information as stated by the law or absolutely required to carry out that task.
 A conceptual level allows for vendor independency on the lower levels. Not being bound to a certain standard makes it easier to change to a different format or standard. This increases vendor options and makes transitions less costly.
 A conceptual level allows for a higher level of complexity. This complexity is required in order to define all relations that exist, especially when making a model that spans multiple types of tax. In practice there are multiple realities per entity depending on the context: a person can be both an individual and a one person undertaking. Having a consistent high level model allows for much simpler derivative models implemented in technology that are still consistent.
 Concepts need to be distributed to software developers and other partners. Concepts in plain text are the easiest to communicate. Plain text semantics are also most favorable for creating forms.

A conceptual level is being created from various angles. In practice the Dutch Taxonomy is the closest thing to a normalized set of semantics. It covers only a portion of all types of tax and is not truly conceptual, since it is expressed in the XBRL standard. Another initiative is the phased top down specification of an all-encompassing data model, which is to be finished around 2014.

Principle No. 1: Conceptual level A conceptual level is currently non-existent but desired by nearly all those involved in some way with semantic metadata management. These are experts with very different backgrounds and motives.


8.4.2 Change management Change management processes take place in two areas. First, the tax office creates its partial taxonomy and incorporates changes in its primary process. Then the partial taxonomies are combined and normalized at Logius. Besides the content, there is also a dedicated change management process for the architecture.

Partial taxonomy tax office In 2010 the tax office started the Competence Center Taxonomy (CCT). The CCT bundles the knowledge that was previously present in a number of pilot teams related to taxonomy development. The main task of the CCT is to annually produce the tax office partial taxonomy. In practice this is done by combining the existing data models for each type of tax: the models are combined, overlap is eliminated and a translation from XML to XBRL is carried out. The workflow is shown in the bottom chain in Figure 18. This method ensures that the partial taxonomy, and messages based on that taxonomy, fit the implementation of the primary processes well.
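The consolidation step, combining per-tax data models and eliminating overlap, can be sketched as follows. The model contents are invented for illustration; in practice the CCT works with far richer models and an XML-to-XBRL translation step not shown here.

```python
def combine_partial_models(models):
    """Merge per-tax data models into one, eliminating overlapping terms.

    Terms with conflicting definitions are reported for manual resolution
    rather than silently overwritten.
    """
    combined, conflicts = {}, []
    for model in models:
        for term, definition in model.items():
            if term in combined and combined[term] != definition:
                conflicts.append(term)
            else:
                combined[term] = definition
    return combined, conflicts

# Hypothetical fragments of per-tax data models:
ib = {"Turnover": "total revenue", "Income": "taxable income"}
ob = {"Turnover": "total revenue", "VatDue": "VAT owed"}
merged, conflicts = combine_partial_models([ib, ob])
```

Reporting conflicts instead of merging them reflects the bottom up risk named below: identical symbols from different tax models may carry subtly different meanings and need expert review.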

The downside of this approach is that the semantics may lose their fit with the real life meaning. A top down specification is seen as more favorable: subject-matter experts, such as fiscalists and legal experts, should interpret the tax laws and specify the semantic metadata. This reinforces the link between symbol and actual meaning. Ultimately this should result in a better organizational fit with the law the tax office is to carry out. In practice this is harder than the bottom up approach and remains in the pilot phase for the moment. The top down approach is shown in Figure 18 as well.

Figure 18: Top down specification of metadata for creating the partial taxonomy (top) and bottom up reuse of existing metadata (bottom). Created by author.


Creation of Dutch Taxonomy Logius receives the partial taxonomies in January. Each partner is responsible for drafting its partial taxonomy against the then current Dutch Taxonomy Architecture. Logius is responsible for normalizing the partial taxonomies by combining the overlapping semantics in the GEN-base. Besides the GEN-base there are domain specific parts within the Dutch Taxonomy. In July the alpha versions are released for public review. This is followed by a period of consultation and modification lasting until November 1st, when the beta version of the Dutch Taxonomy is released. The beta version is considered error free, but is published to be certain. On the 1st of December the definitive version is published.

Dutch Taxonomy Architecture Regarding requests for changes in the Dutch Taxonomy Architecture (NTA) there is a well defined protocol. Changes can be put on the agenda in several ways. These include bringing them up during a meeting of the associated expert group and through request-for-change-forms. Each request must meet a predefined format and motivation. Before any decisions are made there is a period in which all stakeholders can review the request and provide their comments. The request for change is judged on 7 criteria. Among these criteria there is the check on a set of hierarchically ordered architecture principles. Each year the final architecture for next year is determined by the 15th of May. The months up to December are required to implement the requested changes and to test the architecture for flaws and performance.

Principle No.4: Change management Semantic metadata management should be part of standard procedures for organizational changes and review. Well defined procedures, protocols and criteria make versioning more objective and transparent. The well defined timeline for versioning allows all those involved to work towards the deadlines. Strict deadlines also reduce the possibility for endless discussion and delays.

8.4.3 Roles and responsibilities Table 9 shows the testing of the roles found by Janssen, Gortmaker & Wagenaar (2006). The implementation orchestrator was added as a new role. See chapter 9.4.3 and appendix 12.1.6 for further observations on required roles.

Initiator & enabler: Regarding the SBR project the initiative was with the Dutch government. Currently the initiator role is carried out by the SBR team at Logius. The SBR team cooperates with the public bodies in the SBR program and maintains support amongst the intermediaries, software developers and end users.

Developer: Within the tax office there is not a single developer for the further implementation of semantic metadata. Various departments are working on semantic metadata from different perspectives. For the SBR project a project manager is responsible. Regarding the cooperation between the partners in the SBR project, the requirements for cooperation are sought in three fields: technology, data and processes.

Standardization: The major technology standards have been set by using XBRL and a taxonomy. There is a council in which the participants are able to discuss the Dutch Taxonomy Architecture. Protocols for changes to the architecture exist.

Control & process monitoring: Both within the Digipoort and in the three phases of receiving messages at the tax office the process of information exchange is closely monitored. At the tax office the department Ontvangen en Mededelen is responsible for reviewing message integrity.

Facilitator: There are various steering committees that facilitate cooperation between the partners in the SBR project. These so-called expert groups are centered around various subjects, such as data and architecture.

Service and product aggregator: The service and product aggregator role has not been observed. There is no such thing as a one-stop shop that aggregates services to meet customer needs.

Accountability: Regarding semantic metadata management in the SBR program the SBR project leader is accountable. For creating and publishing the Dutch Taxonomy the Logius taxonomy project team is responsible.

Process improvement: The process improvement role is mainly carried out by Logius. Logius has the overview of the interests of the stakeholders and, being interconnected with each of them, can measure operational performance.

Implementation orchestrator: Semantic metadata management is to be centered around a conceptual set of semantics. This serves as a reference for the implementation in various systems, processes, data models, forms and business rules. Within the tax office the data specification team ensures that semantics are uniformly applied over various back office systems, paper forms, end user software developers and the Dutch Taxonomy.

Table 9: Overview of roles within the tax office/SBR case. Created by author.

Principle 7: Cooperation with stakeholders. The tax office case indicates that all roles are very relevant, except for the service and product aggregator. One additional role has been found, the implementation orchestrator. This role is unique to having a conceptual model for semantic metadata as proposed in principle 1.


8.5 Conclusion of the tax office case All 9 principles were tested on the case and were found to be applicable and relevant. Compared to the youth care case, the tax office had many more of the principles already (partially) implemented or under development. This can be explained by the nature of the organization. For both BJz and the tax office proper information management is a core value, but the tax office is a much larger organization and information management is a more prominent topic, receiving more attention and funds. The tax office has several departments that are focused on different aspects of managing information and its exchange, ranging from the conceptual level down to the development of proprietary programs. Together these departments are several times larger than the entire organization of BJz. Having a dedicated IT staff of over 2,000 people allows for a more mature infrastructure than a staff of about a dozen. Additionally, the information is much more standardized than in the case of BJz: numbers and tables are easier to support with semantics than written text.

Even with such capabilities and level of maturity, the case study shows that there is a lot of potential for semantic metadata management in this particular PPIC. This is primarily because SBR is quite new and the ad hoc processes are in the process of being formalized.

Recommendations for the tax office The first priority of the tax office should be the creation of a conceptual level in order to align the primary processes and their implementation, within and beyond organizational boundaries. The conceptual level may act as a point of reference, a master data file. Given the scale of the effort, this will not be the first principle to be in effect. Even during development, the insights that are gained can be put to use in other projects that are implemented earlier, such as the two described below.

Second, change management protocols must be in place. The management of semantic metadata should support follow-on processes, such as its implementation in processes, protocols and technology. In the tax office case a number of changes regarding processes and the corresponding data models must be processed every year. Before changes can be implemented the design, including the semantic metadata, must be finished and reviewed. Communication with partners in the chain and across the organizational implementation takes time. This reduces the one-year window to only a few months.

Third, the tax office should use an adequate set of tooling for the various metadata management activities. Given the amount of semantic metadata, the rapid cycle of changes and the number of people involved, alignment is very challenging. Tooling helps to capture and record all actions and results of metadata management efforts.


9 Evaluated architecture
This chapter presents the evaluated architecture. This architecture is the result of the validation of the preliminary architecture by means of two complementary in-depth case studies and an expert review.

The format of the reference architecture is discussed in section 2.4. The preliminary architecture is based on theory as found in the literature review. The lessons from the literature review that are present in this architecture are indicated in the blue tables in chapter 5. The red boxes in the literature review and case studies relate to lessons learned during the validation phase. The lessons from the literature review that proved correct remain; those that were falsified have been removed. The additional lessons learned from the case studies and expert review have been added.

This chapter starts with a section on the preconditions. This is followed by the design principles and the reference architecture overview. Additional knowledge is provided in the tradeoffs that complete the reference architecture. The chapter ends with the conclusions from the expert session.

9.1 Preconditions
This section lists a number of preconditions for the setting in which the reference architecture and the corresponding design principles must be viewed. For any situation that does not meet the preconditions the value of the reference architecture is not guaranteed. In such a situation it can be presented as one of the arguments to go through a process of change and cooperation that will result in a situation that does meet the preconditions.

1) The reference architecture has been designed to be applicable within the scope as defined in chapter 1.4: the public private information chain (PPIC). Any applicability of the reference architecture outside the defined scope is speculative.
2) Managerial support among partners in the PPIC is required for inter-organizational cooperation. Far-reaching cooperation is of little use when there is no desire or trust to do so.
3) Managerial support and organizational changes are required within each organization. Metadata management must be on the agenda and must be incorporated within the enterprise architecture.
4) Exchange of information and information products should already take place within the PPIC. Semantic metadata management over organizational boundaries is not the first topic one should pick to start cooperation and develop a new chain. It becomes a relevant topic once it is known what information is to be exchanged.
5) An intermediate level of both information and IT maturity is required before semantic metadata management is of any use; otherwise efforts are better spent in those areas first. At too low a maturity level the benefits of semantic metadata management are limited and additional implementation challenges arise.
6) There must be a basic level of common vocabulary and semantics within the PPIC. An infrastructure that enables reuse of metadata and information is of little use in a situation that holds only unique information.


7) There must be agreement on which added value the metadata management effort is focused. How the metadata management architecture is arranged specifically depends on which benefits are desired most, e.g. automation, reduction of complexity, information quality, etc.
8) The scope of the semantics to be shared should be determined beforehand. All relevant external metadata should be shared within the PPIC. Non-relevant external and internal semantic metadata should not be shared in this manner.

9.2 Design principles
This section lists the design principles of the reference architecture. In total 9 principles have been derived. Each principle is provided with a rationale and a brief description of the implications that follow from adhering to it. These design principles complement each other and are all interrelated to a certain level. Their interdependence is shown in section 9.3, which shows how these principles translate into a generic, high-level organizational model for semantic metadata management.

No. 1: Conceptual metadata model
Statement: Use conceptual, generic semantic metadata as an intermediate level between specification and implementation.
Rationale: There is a gap between those who specify the business information needs (and thus metadata) and the technical implementation. This gap may be bridged by an intermediate level in which the conceptual design is laid down, a practice that is common in many other fields of design. This level may act as a master data layer, combining the various business needs and many technical implementations into a single, normalized, non-overlapping data model. This conceptual semantic metadata model is much easier to understand and maintain than a multitude of coexisting implementations. As such it may not only act as an interface between the business and technological layers, but also among multiple stakeholders in the PPIC. And as a master data file it may be the lead model after which all implementations are modeled, creating consistency.
Implications: Conceptual designs often exist for process design and technological architectures, but are lacking for semantic metadata and must be created. Such an implementation requires organizational and cultural changes. The conceptual metadata model is not a one-time development effort, but should be continuously updated. A dedicated group should be given the responsibility for this task and should be equipped with the right skills, tools and managerial support. Operating as a hub between the business and technological level requires communication skills, knowledge of business requirements and a view on technological implications. With multiple stakeholders involved (as in a PPIC) communication skills and a solid approach are even more important.
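The idea of a single, normalized conceptual model that is mapped to several technical implementations can be illustrated with a minimal sketch. This is not taken from the thesis; the concept names, system names and XBRL-style tags are invented for illustration.

```python
# Hypothetical sketch: a conceptual metadata model as a master data layer
# between business specification and multiple technical implementations.
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One entry in the conceptual (master) metadata model."""
    concept_id: str
    name: str
    definition: str
    # How the concept is realized per technical implementation,
    # e.g. {"tax_xbrl": "bd-i:AnnualIncome", "crm_db": "income_year"}.
    implementations: dict = field(default_factory=dict)


class ConceptualModel:
    """Normalized, non-overlapping set of concepts shared in the PPIC."""

    def __init__(self):
        self._concepts = {}

    def add(self, concept: Concept):
        # Enforce the non-overlapping property: one id, one concept.
        if concept.concept_id in self._concepts:
            raise ValueError(f"duplicate concept: {concept.concept_id}")
        self._concepts[concept.concept_id] = concept

    def implementation_of(self, concept_id: str, system: str) -> str:
        """Translate a conceptual term into a system-specific term."""
        return self._concepts[concept_id].implementations[system]


model = ConceptualModel()
model.add(Concept("C001", "annual income",
                  "Gross income of a person over one calendar year",
                  {"tax_xbrl": "bd-i:AnnualIncome", "crm_db": "income_year"}))
print(model.implementation_of("C001", "crm_db"))  # income_year
```

The single `ConceptualModel` object plays the role of the master data file: all implementations are modeled after it, and the duplicate check mirrors the non-overlapping requirement of the principle.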


No. 2: Mapping with business rules
Statement: Semantic metadata should be mapped to business rules on both the conceptual and the implementation level.
Rationale: Business rules and semantic metadata are both closely linked to the primary process. A change in the semantic metadata may have impact on the associated business rules, or vice versa. Introducing a link between the two (either tight or loose) provides additional insights for those who design and coordinate business processes, and it reduces the effort of change management. This allows the involved stakeholders to act proactively: the impact of changes and inconsistencies is known beforehand, rather than only emerging when faults are made and detected in the primary process.
Implications: Process design and data architecture are often separate responsibilities within organizations, making links between semantic metadata and business rules uncommon. Both on the conceptual level and in functional designs, business rules and metadata are normally managed in different tools and procedures, lacking support for any kind of coupling. Once the coupling issue is resolved this new functionality must be actively used before it is of any value. The management processes and protocols for either product must be changed to take the other product into account. Depending on the situation the implementation may vary from simple automated change notifications to close cooperation.

No. 3: External metadata
Statement: The semantic metadata should be stored and managed independently from the core data.
Rationale: Metadata has a different purpose and different properties than the core data it describes. For a machine the vicinity of the metadata to the core data is of no concern; only the end user requires a clear representation. The information system performs best when both are stored and managed in ways that best match their individual properties. To do so the metadata must be separated from the core data. Storing semantic metadata externally makes it easier to manage and reduces data quantity and complexity, while all functionality is maintained or even expanded.
Implications: When metadata is not yet stored externally, legacy data must be migrated; most data warehousing implementations support this activity. In small and simple systems using external metadata increases complexity and overhead. In a setting with large volumes, processes that alter data, and information exchange with various parties, this overhead structure greatly increases performance. Due to its importance this structure requires robustness, since when the link between metadata and data is lost, information is lost. Robustness is achieved by a good technological design, supported by protocols (testing) and competent staff.
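A minimal sketch of this principle, with invented store and field names: core records carry only a reference to externally managed metadata, and the robustness concern from the rationale shows up as an explicit check for dangling references.

```python
# Hypothetical sketch: semantic metadata stored externally, separate from
# the core data it describes; the link is a metadata identifier per record.

metadata_store = {   # managed independently from the core data
    "md-income": {"definition": "Gross yearly income", "unit": "EUR"},
}

core_data = [        # core records carry only a reference to their metadata
    {"value": 42000, "metadata_ref": "md-income"},
]


def describe(record):
    """Resolve a record's external metadata. The robustness of this link
    matters: a dangling reference means the meaning of the data is lost."""
    ref = record["metadata_ref"]
    if ref not in metadata_store:
        raise KeyError(f"dangling metadata reference: {ref}")
    return metadata_store[ref]


print(describe(core_data[0])["unit"])  # EUR
```

Because the metadata lives in one external store, a definition can be corrected once instead of in every record, which is exactly the manageability gain the principle claims.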


No. 4: Change management
Statement: The metadata management approach should support change management and versioning.
Rationale: The metadata architecture is never static. The organizational structure, primary process, actor constellation, technology and all other components will change over time. Having more stakeholders, as in a PPIC, makes changes more frequent, and they may potentially impact more processes and stakeholders. Versioning must be incorporated into the management design, since it will be a recurring activity that, if not well embedded within the management structure, may be detrimental to quality.
Implications: Change management requires well-defined protocols that are agreed upon by all involved stakeholders. Frequently releasing new versions creates a better match with operational needs, but versioning has its costs and many versions may create a confusing situation. Furthermore, it must be determined who has what access level to the semantic metadata. This also relates to how the use of metadata is enforced: strict enforcement creates consistency but limits professional judgment.
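Versioning as described here can be sketched as definitions that are released rather than overwritten, so that past information products remain interpretable. The definition texts and dates below are invented.

```python
# Hypothetical sketch: versioned semantic metadata. Each change produces a
# new version with an effective date instead of overwriting the old one.
from datetime import date


class VersionedDefinition:
    def __init__(self):
        self._versions = []  # list of (version, valid_from, definition)

    def release(self, valid_from: date, definition: str) -> int:
        version = len(self._versions) + 1
        self._versions.append((version, valid_from, definition))
        return version

    def at(self, when: date) -> str:
        """Return the definition that was in force on a given date."""
        applicable = [v for v in self._versions if v[1] <= when]
        if not applicable:
            raise LookupError("no version in force on that date")
        return max(applicable, key=lambda v: v[1])[2]


income = VersionedDefinition()
income.release(date(2010, 1, 1), "gross income, excluding benefits")
income.release(date(2011, 1, 1), "gross income, including benefits")
print(income.at(date(2010, 6, 1)))  # gross income, excluding benefits
```

Keeping the full version history is what allows an information product created under an old definition to be interpreted correctly later, which is the quality concern the rationale raises.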

No. 5: Adequate tooling for a multi-stakeholder context
Statement: Semantic metadata management efforts and processes should be supported by adequate tooling.
Rationale: Semantic metadata management consists of a variety of processes, including publishing present and past metadata models, reducing redundancy, removing unused metadata, translating concepts into implementation, and various others. Many of these tasks can be supported by tools. Tools are advisable since many of these tasks are complex, complicated and larger than a single person can grasp. Tools may reduce the management effort, provide a structure for working, log management activities and reduce errors.
Implications: Semantic metadata management in chains often requires other tools than those currently in use at most organizations. This requires new tools to be adopted or developed. As indicated, tools may aid in a variety of metadata management processes. These multiple roles for tooling may result in multiple tools being used alongside each other. The need for and importance of tools varies with the setting of implementation.

No. 6: Consistent data model
Statement: The relations among concepts in the data model should be defined and consistent.
Rationale: On the conceptual level the semantics can merely be listed and stored, but a model that defines relations is preferable. If inconsistencies are found and removed at the conceptual level they are avoided in practice. This results in lower error rates within the primary process, creates fewer versioning requests and reduces effort, as a single conceptual model is changed instead of redoing various primary processes. Links among semantics are also required to add additional context and to indicate the impact of change.
Implications: This design principle has two implications. First, if not yet present, relations among semantics must be defined. This requires the active involvement of subject-matter experts. Additionally it requires a semantic metadata model that supports relations and tooling to be able to define the relations. Second, the consistency requirement demands review of the relations by those with subject-matter experience.
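The kind of consistency check this principle asks for can be automated at the conceptual level. The sketch below, with invented concept names and relation types, flags relations that point at undefined concepts and cycles in the specialization hierarchy.

```python
# Hypothetical sketch: a consistency check on the conceptual data model.
# Every relation must point at a defined concept, and specialization
# ("is_a") relations must not form cycles.

concepts = {"person", "taxpayer", "income"}
is_a = {"taxpayer": "person"}          # specialization relations
references = {"income": "taxpayer"}    # other typed relations


def inconsistencies():
    problems = []
    for rel in (is_a, references):
        for src, dst in rel.items():
            if src not in concepts or dst not in concepts:
                problems.append(f"undefined concept in relation {src}->{dst}")
    # Detect cycles by walking each specialization chain upwards.
    for start in is_a:
        seen, node = set(), start
        while node in is_a:
            if node in seen:
                problems.append(f"cycle in is_a involving {node}")
                break
            seen.add(node)
            node = is_a[node]
    return problems


print(inconsistencies())  # []
```

Running such a check before a model version is released is one way inconsistencies can be "found and removed at the conceptual level" so they never reach the implementations.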


No. 7: Cooperation with stakeholders
Statement: The metadata management coordination structures and roles must be defined and implemented.
Rationale: When cooperating across various organizational departments and among stakeholders in the PPIC it is wise to define coordination structures. These are processes and protocols for cooperation, alignment, change management, versioning and strategic decision making. Additionally, formalized roles and responsibilities provide clarity on who does what, aside from creating trust among cooperating stakeholders.
Implications: Formalizing semantic metadata management in a PPIC creates two types of new organizational structures and processes. First, internally within organizations a structure is required to create and maintain the conceptual level metadata. Second, coordination among stakeholders within the PPIC should take place. The design on this level depends on the level of cooperation.

No. 8: Adaptable architecture
Statement: The metadata architecture should be adaptable in order to respond to changing needs.
Rationale: Lessons learned from IT projects indicate that the requirements on IT systems change over time. The reasons for change may include the use of new technology, changes in the primary processes or a change in the stakeholder constellation. With a larger number of stakeholders involved, the chances that requirements change increase. Should the architecture not be adaptable, changes are either not made or made at a later stage, reducing alignment with the needs. Additionally, such changes require more effort and are more expensive.
Implications: In order to meet changing demands the architecture needs to be adaptable. This means that there are not only protocols for changes within the architecture, but that changes to the architecture itself are possible as well. It also means that the architecture is well defined and documented, so that when a modification is proposed the effort and the impact are clear.

No. 9: Mapping semantics with implementation
Statement: There should be a loose coupling with the technical implementation using standardized interfaces.
Rationale: The conceptual layer should be linked to the technical implementation, since the implementation, not the concepts, is what is used in the day-to-day primary processes. On the one hand this allows an organization, or an entire PPIC, to be in control of the technical implementation and the semantics used in the primary process; any translation error between concept and implementation is detrimental to performance. On the other hand the use of standardized interfaces allows a wide variety of systems, databases and forms to be used without much alignment effort. Orchestration of the implementation includes alignment of various systems, consistency checks and change management. Orchestration should empower the implementations, but its existence should not reduce the design space.
Implications: Orchestration is difficult without any links between concept and implementation. These links should provide insight into how the concept is translated into reality. The existence of various technologies and legacy systems requires the coupling between concept and implementation, and among systems, to be standardized. With the design of the right interfaces, coupling can be achieved without (seriously) impacting existing technologies and future design space.
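Loose coupling through a standardized interface can be sketched as a small shared contract that every implementation provides, so the orchestrator never depends on system internals. The class and system names below are hypothetical.

```python
# Hypothetical sketch: each implementation exposes the same small contract
# for looking up which local term realizes a conceptual term, so systems
# can be aligned or swapped without changing the conceptual layer.


class MetadataInterface:
    """The standardized contract every implementation must provide."""

    def local_term(self, concept_id: str) -> str:
        raise NotImplementedError


class XbrlTaxonomy(MetadataInterface):
    def __init__(self, mapping):
        self._mapping = mapping

    def local_term(self, concept_id):
        return self._mapping[concept_id]


class LegacyDatabase(MetadataInterface):
    def __init__(self, mapping):
        self._mapping = mapping

    def local_term(self, concept_id):
        return self._mapping[concept_id]


def impact_of_change(concept_id, systems):
    """Orchestration task: which local terms must be reviewed when a
    conceptual term changes."""
    return {name: s.local_term(concept_id) for name, s in systems.items()}


systems = {
    "xbrl": XbrlTaxonomy({"C001": "bd-i:AnnualIncome"}),
    "legacy": LegacyDatabase({"C001": "INCOME_YR"}),
}
print(impact_of_change("C001", systems))
```

Because the orchestrator only uses the shared `local_term` contract, a new or legacy system can join the constellation by implementing that one method, which is the "standardized interface" idea of the principle.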



9.3 Reference architecture overview
The 9 design principles in the final reference architecture have been designed to be implemented in sync. They complement each other, and their combined effect exceeds the sum of their individual effects. The actual implementation will differ per case in which the architecture is applied. Each of the principles has some degree of leeway: they can be carried out at different levels of maturity. For instance, a conceptual metadata model can be realized on paper, in a relational database or in a dedicated software tool.

Even though each implementation will differ, it is possible to show how the principles are interrelated in a generic way. Figure 19 shows all 9 principles embedded in a conceptual overview. This model shows how the conceptual level is located between the business level and the technical implementation. The arrows indicate the direction in which metadata is specified. Table 10 contains remarks on the positioning of the design principles in Figure 19.

Categories for semantic metadata specification
The categories on the right-hand side of the architecture overview have been added to provide better insight into the three levels that are portrayed. The levels are based on enterprise architecture models (sources) and are confirmed by the case studies and expert opinions. All categories relate to external semantic metadata.

The goals and strategy are at the top of the enterprise architecture and determine what business processes take place. The business information needs are the metadata requirements on the business level, which are to be represented by the conceptual level and provided by the implementation. The object model is the highest level of concepts, in which all concepts are unified into a single non-overlapping data model showing both relations and meaning. The functional design does the same for each implementation, meaning a functional design exists for each implementation, thus allowing for partial overlap of concepts among functional designs. The concepts of the functional design are translated one-to-one into a technical design, which translates the universal concepts into a syntax a computer can use, such as XML, XBRL and many others. The technical implementation portrays semantic metadata as implemented within the technology itself, such as in electronic forms. Within the technical implementation resides the actual data, such as instances of XML messages or rows in a relational database.
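The one-to-one translation of a functional-design concept into a technical design can be sketched for XML, one of the syntaxes mentioned above. The element layout and datatype below are invented for illustration, not taken from any actual taxonomy.

```python
# Hypothetical sketch: rendering one functional-design concept as a
# schema-like XML element, i.e. the step from universal concept to a
# syntax a computer can use.
import xml.etree.ElementTree as ET


def to_technical_design(concept_name: str, datatype: str) -> str:
    """Render one functional concept as an XML element definition."""
    element = ET.Element("element",
                         name=concept_name.replace(" ", "_"),
                         type=datatype)
    return ET.tostring(element, encoding="unicode")


print(to_technical_design("annual income", "xs:decimal"))
```

The point of the sketch is that the technical design is derived mechanically from the functional concept, so consistency between the two levels can be maintained by regeneration rather than by hand.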


Figure 19: Overview of reference architecture for metadata management in a PPIC. The numbers correspond to the numbers of the design principles. Created by author.


1: Conceptual metadata level. The overview figure shows an intermediate level between the top level from which metadata originates and the bottom level in which metadata is implemented in various systems and processes. Combined with principle 2 there are also generic business rules on this level, which are likewise implemented one level lower.

2: Link to business rules. Both on the conceptual level and in the implementation the business rules are linked to the semantic metadata. How they are linked depends on the implementation; the link represents the minimum coupling between the two: for each concept it is possible to see how it relates to one or more rules, and vice versa.

3: External metadata. On the implementation level the semantics are stored separately from the actual data. This allows them to be managed more easily. Combined with principle 9 it allows the semantics in use to be coupled with the semantics on the conceptual level. In turn this coupling allows the organization, or stakeholder constellation, to be in control of their organization.

4: Change management. Change management ranges from the business level to the implementation level. The number is positioned between the business and conceptual level since the majority of the change management takes place at this level.

5: Adequate tooling. The tooling is positioned at the entire conceptual metadata level. In this area tooling is most valuable for semantic metadata management. The tooling supports the tasks carried out in principles 2, 4, 6, 7 and 9.

6: Consistent data model. The consistency is positioned at the conceptual level. If the data model is consistent at the conceptual level the consistency will cascade to the implementation level. This principle is of key importance to realize the alignment between the business and implementation level.

7: Cooperation with partners. The cooperation with partners is located on the business level, since cooperation on this level is required should organizations want to (partly) share the same conceptual metadata design or align the primary processes.

8: Adaptable architecture. The principle of an adaptable architecture is applicable to the entire architecture. When the design on the conceptual level is adaptable, then it can easily incorporate significant changes on both the business level and the implementation level. The adaptability of the implementation is partly assured by implementing design principles 2 and 9.

9: Interfaces/loose coupling. This principle acts as an interface between the conceptual level and the implementation. This coupling shows how changes on the conceptual level impact the implementation. This indicates where changes will/must take place, which can be reviewed preemptively.

Table 10: Remarks on the positioning of the design principles in the reference architecture overview figure. Created by author.


9.4 Tradeoffs
The tradeoffs make up the second part of the reference architecture. They cover topics that are deemed important but could not be covered by the design principles. The reason that they are not principles is that they are not prescriptive features. The design principles have some leeway, but the tradeoffs have a much larger design space. That design space is required since organizations operate within different contexts, have different goals and priorities, and differ in ambition and resources. The name 'tradeoff' is used since it has the connotation of deciding on the preference for one set of characteristics over another: a balance of interests is required. Tradeoffs allow the designer to apply them in a way that matches the views and demands of the stakeholders. Additionally they have the connotation of being interdependent: there is much freedom in their application, but they also impact other elements in the reference architecture.

9.4.1 Cooperation archetypes and growth path
The level at which alignment takes place determines the transaction costs that are incurred. Figure 20 shows three archetypes of cooperation between two parties. The alignment can also be organized as a growth path, starting at the technology level and moving up to data and process alignment. The nature of the primary processes limits the level of alignment that can be realized.

Low-level alignment vs. high-level alignment
Alignment at lower levels of the enterprise architecture is easier to achieve. Cooperation at higher levels reduces the transaction costs the most, but it also requires more alignment effort, which is costly. The disparity between organizational goals may limit the level of alignment that can be achieved.
- The cooperation between A and B takes place on the operational level. Information products including metadata are exchanged, but there is no alignment; the metadata is available to a human end user but cannot be reused. In practice this situation is common, but without the conceptual level. This figure, including the conceptual level, has been adopted in order to show that creating a conceptual level might be a first step in a growth path.
- The cooperation between K and L takes place on the data level. Parts of the data model that are used by the other party but are not covered by the own data model are copied, allowing for easier exchange and reuse of semantic metadata.
- The cooperation between X and Y takes place on the business level. This means that alignment takes place on the business level, not only on the data level. Since the business level determines the data level, a common set of concepts is required.

Figure 20: Cooperation models within the stakeholder constellation. Created by author.


9.4.2 Metadata publication and commonality
Four generic types of semantic metadata publication can be identified. Each of them is presented in this section. Within an information chain, however, multiple typologies may exist side by side. In practice this means that information chains are more complex than the archetypes presented here.

Using a common set of semantics vs. independence
Using a single common semantic metadata standard results in the lowest transaction costs, but requires significant alignment efforts that limit the freedom of self-determination.

Publishing semantics vs. using a common set of semantics
Publishing semantics counters loss of information quality and makes interpretation easier, but does not allow for easy reuse. Reuse is enabled by common semantics. Having common semantics requires cooperation and consultation, for which costs are incurred.

Situation ABC shows a situation in which each organization specifies its own metadata and keeps this metadata repository to itself. When information is exchanged metadata might be included, but it cannot be reused internally, and things like definitions are often not included. This means that information is partially lost and translation costs are high.

Situation EFG shows a situation in which all organizations publish their metadata. This means that when E receives a message from G it is possible to look up the external metadata. This means that not all external metadata has to be added to each information exchange. In the same way F may use G’s metadata when creating a product specifically for G.

In situation KLM the overlap among each other's metadata is harmonized, either through cooperation or because it is enforced. All red metadata is standardized and can thus be easily exchanged and reused internally. Information products that feature a mixture of common and unique metadata pose challenges.

Figure 21: Governance archetypes. Created by author.

In situation XYZ a single common metadata standard is in use. This standard covers all semantics in use within the organizations, or at least all semantics that are exchanged with others in the PPIC. This means that all metadata exchanged within the PPIC is in common use and can be easily exchanged and reused internally.

9.4.3 Roles in semantic metadata management
A number of roles within an organization operating in a PPIC have been identified. Depending on the amount of work to be carried out, multiple roles can be assigned to a single person, or an entire department can be assigned a single role. With respect to the roles defined by Janssen, Gortmaker & Wagenaar (2006), the roles have been reworked to match semantic metadata management. The service and product aggregator role was found not to be relevant in this domain. The end user, information analyst and implementation orchestrator have been added. Appendix 12.1.6 shows the original table and a reflection on the changes that follow from the case studies.

Formalized roles vs. professional judgment
The roles that are described can be highly formalized. This results in clear ownership of tasks and aids communication: it is known who is responsible for each defined task. A drawback is that relevant tasks that are not defined may not be carried out by anyone, since nobody feels responsible for them. Fulfilling a role can also become a goal in itself, leading to inefficiencies. When roles are not strictly defined a more flexible situation is created, which is not as inherently transparent. The execution of the required roles can then be reviewed periodically to verify that coordination amongst professionals is sufficient.

Initiator & enabler: This role is to convince and stimulate agencies to participate in and commit to an automated process execution. Often it is necessary to educate agencies on the basics of the technology and to show the potential advantages.

End user: The end user is the one who actually uses the semantic metadata. In general these are the experts within the primary process that handle the information that is provided with semantic metadata. Given the number of organizations and different types of specialists within the PPIC this group is rather large and heterogeneous. As such the end user is both the expert in the own organization as well as the next link in the chain. In metadata management the explicit role of the end user is to verify the validity and applicability of the common set of semantics.

Developer: Defining the requirements for each organization in order to enable cross-agency processes. This role involves the identification of the organizations and departments involved and determines the interests, objectives, and requirements for each of them.

Information analyst: The information analyst ensures a good fit between semantics within the information system and the requirements in the primary process. Semantics have a lifecycle, starting at specification and requiring constant alignment.

Standardization: Technology interface standards should be determined and set as a standard. Existing systems can be selected as standard, but it can also be better to develop and impose new, preferably open standards.

Implementation orchestrator: Semantic metadata management is to be centered around a conceptual set of semantics. This serves as a reference for the implementation in various systems, processes, data models, forms and business rules. Whether this conceptual set is self-maintained or imposed, the application of the metadata in the various forms of implementation must be orchestrated.

Control and process monitoring role: The time-dependent sequence of activities needs to be managed. All unexpected events should be tracked as soon as they occur and analyzed to determine what actually did happen and why, to ensure reliable cross-agency process execution.

Facilitator: This role facilitates the implementation of cross-agency processes by collecting and disseminating best practices, reference models, and reusable system functionality such as identification, authentication, and payment. Ideally, components are shared when possible and duplication of efforts is avoided.

Accountability management: Governmental decisions should have accountability. This role should ensure that the motivations behind decisions made by each agency and the performance and outcomes of the complete cross-agency process can be accounted for.

Process improvement: Changes in processes and governmental rules often affect more than one agency. This role should maintain an overview of the cross-agency processes and define mechanisms and procedures to assess the implications of changes in law, technology, and other developments.

Table 11: Roles in semantic metadata management. Adapted from Janssen, Gortmaker & Wagenaar (2006), created by author.

9.4.4 Consultation protocols
Semantic metadata management requires cooperation amongst stakeholders, both within and across organizational boundaries. Having a common set of semantic metadata in a situation without absolute hierarchy demands that, at a given moment, other stakeholders be consulted whenever changes are deemed necessary. Consultation is a repetitive operation in metadata management and protocols can safeguard these management activities.

Early consultation vs. late consultation of stakeholders
Early consultation allows the additional input of other stakeholders to jump-start the change management process. In case the proposed changes are a no-go or a very different direction is chosen, hardly any effort has been wasted. The drawback of early consultation is that there is hardly any material for the consulted parties to review; much depends on the presentation skills of the initiator to convey a clear picture. Consultation at a later moment is much more focused on the presented proposal, avoiding confusion or a lack of momentum. The risk is that effort is wasted when there is no agreement. Figure 22 shows in which phases of change management consultation can be located.


Figure 22: Representation of potential positioning of moment of consultation within change management cycle. Created by author.

The positioning of the moment of consultation is related to the responsibilities and decision making authority the stakeholders possess. This means that stakeholders can be grouped and multiple moments for consultation can be planned, with varying stakeholder constellations.

Frequent meetings vs. avoiding endless conversation
Low thresholds for triggering meetings may lead to endless conversation and little action. Setting the thresholds too high may result in dysfunctional cooperation.

Metadata management is a continuous process. In the previous section roles have been identified; protocols determine how these roles interact with each other. Consultation can be ensured by implementing triggers for meetings. Active management of interfaces may indicate at which points consultation is required:
- Consultation should take place when the primary processes change. An indication of such a change is a change in business rules.
- Consultation should take place when interfaces or relations between objects change.
- Consultation should take place when technologies and data models change.
- A list of unresolved issues may be used as input for meetings. Problems that do not require immediate attention can be noted and will not be forgotten. The same applies to good ideas and opportunities.
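To make this concrete, the triggers above can be sketched as a small registry that maps change events to the stakeholder group that must be consulted, with unmatched events parked on the issue list for the next scheduled meeting. This is an illustrative sketch only; all event names and group names are hypothetical, not taken from the cases.

```python
# Illustrative sketch: consultation triggers for semantic metadata management.
# Event names and stakeholder groups are hypothetical examples.

TRIGGERS = {
    "business_rule_changed": "primary process owners",
    "interface_changed": "chain partners sharing the interface",
    "data_model_changed": "system owners and data modellers",
}

class IssueList:
    """Unresolved issues, ideas and opportunities, kept as meeting input."""
    def __init__(self):
        self.items = []

    def note(self, description):
        self.items.append(description)

def consultation_needed(event, issue_list):
    """Return the group to consult for a change event, or None if it can wait.

    Events that do not trigger immediate consultation are noted on the
    issue list so they are not forgotten."""
    group = TRIGGERS.get(event)
    if group is None:
        issue_list.note(event)
    return group
```

In this sketch the thresholds discussed above correspond to how many event types appear in the registry: a larger registry means more frequent meetings, a smaller one shifts topics to the periodic issue-list review.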


9.4.5 Versioning protocols
Versioning is an integral part of semantic metadata management. Semantics have a lifecycle: they have to be specified, applied, reviewed, changed and removed. In a PPIC the frequency of change is even higher, since within each link a request for modification of the semantics may occur. Versioning is covered by principle 4, but several tradeoffs remain.

Long versioning cycle vs. fast versioning cycle
A low rate of versioning allows time for consultation, has lower annual implementation costs and creates more awareness of the changes in each version. A higher frequency allows for a better fit with changes and demands in the primary process. The time between an identified need for change and its implementation is reduced, but the overview of versions and their changes may be lost.

Actively removing unused semantics vs. no removal
Semantics that are not used can be actively traced and removed. Removing unused semantics requires additional effort, but results in a smaller, better fitting and more up-to-date set of semantics that reduces clutter and improves the overview.

Top down vs. bottom up specification
Semantics can be specified by experts and enforced on the end users in the primary process. Another option is crowdsourcing, by allowing the end users to add their own metadata. This may result in a better fit and allows for more freedom, but may also lead to synonyms and lower quality metadata.

9.4.6 Functions of tooling
A number of tasks regarding semantic metadata management can be supported by tooling. Tooling is an investment which may reduce alignment efforts. In theory all activities can be carried out by people using pen and paper, although this is not recommended.

Using tools right from the start vs. later in the process
Using dedicated tools from the start may ease the implementation of a more mature form of semantic metadata management and may result in a more consistent approach. Adopting tools later in the process allows the best tools to be selected using the lessons learned in the meantime; however, at that moment it may also be much harder to switch tooling.

Single tools vs. multiple tools
Having multiple roles carried out by a single tool is often less costly and often requires less effort. The drawback is that dedicated tools may perform better and allow a single function to be switched to another tool; with multiple functions in a single tool, it is a package deal.

Designed to order vs. off the shelf tools
Developing tools to fit the particular needs of the organization may produce better results. On the other hand, such tools are more expensive than off-the-shelf products and might not fully conform to standards.


Decision making on tooling requires insight into the functions that tooling may have. The following list describes a number of functions. Which functions are required depends on the specific situation. Any number of functions can be present in a tool and several tools can be used side by side.

1. Repository. First of all, a metadata management tool is used as a repository for external semantic metadata. A repository acts as a single point of storage, ensuring all metadata is captured and no metadata remains out of view. Such a repository may also be used to preserve metadata that is no longer in use. A repository tool may be integrated with the metadata server or use stand-alone data. Often the repository function is combined with one or more of those described below.
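The repository function can be sketched as follows: a single point of storage in which retired entries are preserved rather than deleted, so that metadata no longer in active use remains available for interpreting older data. This is a minimal illustration under assumed names, not a prescribed design.

```python
# Minimal sketch of the repository function; class and method names are
# illustrative assumptions.

class MetadataRepository:
    """Single point of storage for external semantic metadata.

    Retired entries are preserved rather than deleted, so metadata that
    is no longer in use stays available for older data."""

    def __init__(self):
        self._entries = {}   # term -> definition, in active use
        self._retired = {}   # term -> definition, preserved but inactive

    def store(self, term, definition):
        self._entries[term] = definition

    def retire(self, term):
        # move the entry out of active use without losing it
        self._retired[term] = self._entries.pop(term)

    def lookup(self, term):
        # active entries take precedence over preserved ones
        return self._entries.get(term) or self._retired.get(term)
```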

2. Relation management. A metadata management tool can also be used as a tool in which relations between the semantics are drawn. These relations exist in code, but a relation management tool may provide a graphical representation that matches human perceptual capabilities better. It may also act as a tool to simplify the coding, combining a simple graphical user interface with a coding engine. Both practices are common in other types of coding, such as software and website development. Relation management is of particular use in taxonomies and ontologies.

3. Access. Metadata management tools can be used to access metadata in a different way than an IT-system would: a way that is more suitable for human end users who want to access the metadata directly. Motives for accessing the metadata directly could be communication and review; in both cases a graphical user interface is an added value. Options within an access tool may be querying the metadata or categorizing metadata, thus creating semantic metadata regarding semantic metadata.

4. Business rules. A tool that maps semantic metadata to business rules, or even stores business rules. There are strong links between business rules and semantics: a change in the semantic metadata may have impact on the associated business rules and vice versa. Tooling may support forms of loose coupling such as references or a mapping.

5. Version control. Another possible use of a metadata tool is version control. Since organizational requirements are not static, semantic metadata will change over time. Within networks the rate of change may be even higher due to the greater number of stakeholders. Regarding metadata versioning there are two approaches: first, a new metadata set may be released on an interval basis; second, changes can be made continuously with checks before or after each change. A tool may support both types of processes. Furthermore, it is often valuable to track which changes have been made, or to be able to access old versions that are associated with older data. For managing the versioning process it may be useful to add versioning process metadata, such as which metadata has been checked and by whom, and which partners in the chain nominated the metadata.
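The interval-based approach with versioning process metadata could be sketched as below: each release of a term keeps its full history, together with who checked it and which chain partner nominated it. All class and field names are hypothetical illustrations of the idea, not an actual tool.

```python
# Sketch of interval-based version control for semantic metadata,
# including versioning process metadata (checked_by, nominated_by).
# All names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class MetadataVersion:
    term: str
    definition: str
    version: int
    checked_by: list = field(default_factory=list)  # reviewers of this version
    nominated_by: str = ""                          # chain partner that nominated it

class VersionedRepository:
    """Keeps the full version history so older data can still be interpreted."""

    def __init__(self):
        self._history = {}  # term -> list of MetadataVersion, oldest first

    def release(self, term, definition, nominated_by=""):
        versions = self._history.setdefault(term, [])
        versions.append(MetadataVersion(term, definition, len(versions) + 1,
                                        nominated_by=nominated_by))
        return versions[-1]

    def latest(self, term):
        return self._history[term][-1]

    def history(self, term):
        return list(self._history[term])
```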

6. Reduction. Metadata management tools may also be used to reduce overlap and redundancy. Different systems and/or databases often use the same data. In a metadata repository duplicates may be removed, resulting in a reduction of metadata with improvements in performance and consistency. Additionally, there may be different standards in use with an overlap in semantics. Some tools allow for more than one representation of the definition, ensuring that changes are carried out consistently over the various data standards in use.

7. Translation. Metadata management tools may also aid in the translation of metadata to different formats. Within a network it is possible that more than one metadata standard is used; the adopted standard may differ from legacy systems that are still in use. A metadata management tool may map metadata one-to-one, but may also provide predefined rules for translation. Translation may be done either periodically or on the fly; the former is less flexible but more reliable and less complex than the latter.
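The translation function can be illustrated as a one-to-one mapping between two metadata standards, with an optional predefined rule as fallback for unmapped terms. The example terms and names are hypothetical.

```python
# Sketch of the translation function: one-to-one mapping between metadata
# standards, with an optional predefined rule for unmapped terms.
# Class, term and rule names are illustrative assumptions.

class MetadataTranslator:
    def __init__(self, mapping, fallback=None):
        self.mapping = mapping    # term in standard A -> term in standard B
        self.fallback = fallback  # predefined rule applied to unmapped terms

    def translate(self, term):
        if term in self.mapping:
            return self.mapping[term]
        if self.fallback is not None:
            return self.fallback(term)
        raise KeyError(f"no translation for {term!r}")

    def translate_batch(self, terms):
        # periodic translation of a whole set: less flexible than on-the-fly
        # translation, but more reliable and easier to verify as one batch
        return [self.translate(t) for t in terms]
```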

8. Orchestration of processes. A tool that aids in cooperation amongst various roles and organizations. Semantic metadata management is a series of processes that are carried out by people. Information exchange and planning of these processes can be supported by tooling. It may include relation management, notification of deadlines, gathering requests for change, indicating level of participation, publication of new versions, etc.

9. Testing. A tool that simulates performance for a new version or a larger volume of data, for instance by applying historic data sets. External metadata has its own technical infrastructure. Since both metadata and core data make up the information, both must be available. If the capacity of the semantic metadata infrastructure cannot meet the demands, a bottleneck is created that impairs information services.


9.5 Expert session
During an expert session the reference architecture, as validated by the case studies, has been reviewed, as indicated in chapter 2.2.4. On the one hand these experts are part of the target group of the reference architecture. On the other hand they are in the best position to judge its value, since they are senior experts in the field of semantic metadata management.

Expert selection
The experts have been selected for their holistic view on semantic metadata management and their complementary expertise. Each expert has years of experience, and metadata management is a key part of their daily activities, unlike many IT experts for whom metadata is one of many topics. Expert 1 has been involved for many years with a multitude of metadata management initiatives at the tax office. These include projects very close to the actual implementation, mapping several implementations for an improved data model and sharing semantics with various other stakeholders. Expert 2 specifies semantics “on behalf of the experts at the business level” for use in the technical implementation of processes at the tax office. Expert 3 has been involved with merging the partial taxonomies in the SBR project and creating a stable release, which can be considered “one of the largest mergers of semantics coming from multiple stakeholders operating in the same field”.

The experts are complementary since they have a different focus. Expert 1 is most aware of the link with the technology, expert 2 is closer to the business processes and expert 3 is confronted with cooperation with partners. All experts share a holistic view, since they are aware of the various facets of metadata management, including cooperation with other stakeholders, metadata specification, change management and the link with technology.

The expert session focused on the design principles, but the tradeoffs were also reviewed. The experts agreed on the topics chosen as tradeoffs, but noted that more topics could be viewed as tradeoffs. There was no agreement on what these topics ought to be. The experts agreed that the design principles are mostly a balancing act between features and investments, and that the described tradeoffs are mostly about balancing different features of the semantic metadata management effort. The principles should be carried out up to a level that is limited by available effort and funds. The tradeoffs are carried out up to a level that suits the interests of the stakeholders best.

During the expert sessions four aspects of the design principles were reviewed: 1) The experts indicated whether they agreed with each individual design principle and its rationale. 2) The second aspect called for a holistic view on all principles regarding consistency and completeness: the principles should not conflict, cancel each other out or overlap too much. 3) All principles that were found valid were ranked in order of importance, ranging from 1 to X. 4) All principles that were found valid were ranked in order of chronology of implementation, ranging from 1 to X.

Validity of the principles
Regarding the first step, all principles were found relevant and correct by all the experts. Various comments were given and the rationale and implications were adjusted to be more internally consistent or understandable. The second aspect did not lead to the removal of any design principles. There was unanimous agreement that none of the principles were conflicting. However, the question was raised whether change management and adaptability were the same. The rationale behind this division is that change management is related to the contents and adaptability is related to the architecture. The experts did agree that it is possible to have a versioning approach in place that functions well, but within a rigid system that is unable to accommodate changes in structure and technology. Additionally, it was found that all major topics were covered. Some lower level topics were added to the rationale or the implications of the design principles.

Ranking the design principles
Since all principles were found valid, all principles were ranked by the experts in order of importance. These rankings are shown in Table 12. Having a consistent metadata model and a conceptual metadata level both rank very high. This is no surprise, since these are part of the reason metadata management is carried out. Change management follows with a rank of 3 or 4. According to expert 1, “combining all semantics and sharing them with partners did not prove as hard as expected, keeping track of all changes that followed is what proved to be the major challenge”. Expert 3 adds: “determining the release frequency, processing all comments and providing a stable release is much more difficult than most technical challenges”. While the experts agreed on the higher ranks, the lower ranks show hardly any similarities. Unlike the other experts, Expert 1 ranked the adaptable architecture lowest as “it is nice to think ahead and make a system durable over time, but making a working system is hard enough already and we have grown accustomed to work with systems that are not adaptable”. Expert 2 ranked adequate tooling the lowest since it has very little to do with the contents and the primary processes the semantics are to support, but did indicate that “without tooling there is no metadata management”. Expert 3 ranked the mapping with business rules the lowest since “it is something that is nice to have, but even though it can save time, this can be done manually”.

Design principle                 Expert 1 (JB)    Expert 2 (CZ)    Expert 3 (SK)
                                 Rank  Planning   Rank  Planning   Rank  Planning
1. Conceptual metadata level      1     1          1     1          5     1
2. Mapping with business rules    6     2          5     3          7     3
3. External metadata              4     2          7     4          2     3
4. Change management              3     2          4     3          3     2
5. Adequate tooling               7     3          9     4          2     1
6. Consistent data model          2     2          3     2          1     1
7. Cooperation with partners      8     3          6     4          4     1
8. Adaptable architecture         9     3          2     1          3     2
9. Mapping with implementation    5     2          8     4          6     3

Table 12: Overview of ranking and planning (chronology of implementation) by experts. Created by author.


Chronology
The fourth aspect is included since importance does not necessarily equal chronology: principles which are less important may lay the foundation for those that are deemed more important. This ranking is shown in Table 12. Interestingly, the experts did not rank the principles from 1 through 9, but each of them independently opted for using 3 or 4 stages. Their rationale was that some design principles could, and perhaps should, be implemented concurrently. A three-stage approach is also easier to communicate than a nine-stage one.

All experts indicated that having a conceptual semantic metadata level was the place to start, even though it is a difficult topic to start on. The implementation of a conceptual level will mature over time: at first it may exist only on paper, while later, with all protocols and tooling in place, the management effort will truly bear fruit and the benefits will outweigh the effort. Implementing change management is the runner-up. According to an interviewee from the tax office, “change management is where the real challenge of metadata management lies”. Early implementation allows change management to become accepted and a routine.


10 Conclusion
This chapter presents the conclusions of this research project. First, section 10.1 presents the answers to the research questions that form the pinnacle of this study. Subsequently, section 10.2 reflects on the reference architecture that is the main result of this study. Section 10.3 presents the societal value of this research and provides recommendations. Then section 10.4 discusses the scientific contribution and remaining knowledge gaps that provide a basis for further research. Section 10.5 finalizes the conclusion with a personal reflection.

10.1 Conclusions
This section presents the main conclusions of this research project. The potential and complexity of semantic metadata management are presented first. These findings match the findings and opinions of other research, subject-matter experts and real life examples from the case studies. This section continues by drawing conclusions on the evaluated reference architecture. Subsequently the cornerstones of semantic metadata management in PPICs are presented. These are followed by a view on implementation and on how the unique characteristics of the reference architecture provide their merit.

10.1.1 Potential benefits of using a common set of semantic metadata
The first research question asked: Why is metadata mentioned in a wide range of solutions to an even wider range of challenges in large cross organizational IT-systems? It is believed that using a common set of semantic metadata can increase the quality and speed of creation and exchange of information products, while at the same time costs and effort can be reduced even further. All information sharing activities are aimed at one objective: having the right information available to the end user, with as little loss, time delay and clutter as possible. Using external semantic metadata enables a number of benefits not attainable before and removes a number of existing barriers:
- Electronic information exchange reduces transmission costs for information, which are only a part of the overall transaction costs. Those transaction costs also include translation. Commonality in semantics removes most of the costs regarding translation on both ends of the information exchange.
- Retrieval and reuse of existing information is made much easier because information is indexed in a manner that suits the content and the processes in which the information is used. This makes creation of new information products less time consuming. In turn this allows more time to be spent on activities that add more value.
- Increased use of semantics, improved consistency among semantics and a better fit with the primary processes improve overall information quality. The link between contents and their actual use is safeguarded. Especially amongst organizations operating in a PPIC, information quality and insight in the quality level are much improved.
- Semantics do not only provide context to data for human end users; information technology is also able to use semantics. This allows for further automation of processes normally carried out by people. Technologies that are enabled or improved include better workflow support, business intelligence, data mining and automated quality checks.


10.1.2 Complexity as fundamental challenge
The second research question asked: What makes implementing semantic metadata management within networks of organizations difficult? Semantic metadata management is required in order to use semantic metadata effectively in a PPIC. Semantic metadata management is primarily an alignment effort and partially a standardization effort. The alignment effort is very challenging due to the complexity of the situation in which it is carried out. The complexity of metadata management can be characterized by the following four properties. Each property reinforces complexity due to the number of relations, making alignment difficult.
- There are strong interdependencies between the technology in use, data models and formats, semantic metadata management processes, the actual primary processes, stakeholders and their various interests.
- Compared to other topics in IT, semantics have a much closer link to primary processes and end users. Subject-matter experts need to be included in metadata management and coordination with the end user is required.
- Metadata management takes place both within the own organization as well as between the organizations that cooperate within the PPIC. Cooperation among stakeholders has very different dynamics compared to processes within an organization. This makes it more difficult and the results unpredictable.
- Certain components or activities are inherently complex by themselves. For instance, mapping all relations in a single conceptual data model is challenging both from a technological (tooling) point of view and from a subject-matter point of view.

Many of the challenges regarding information exchange in PPICs are artificial; they are not inherently complex. Challenges have arisen by creating connections between systems, processes and organizations that were never designed from the outset to be interconnected in such a way. The benefits of this interconnectivity are desirable. However, the many related inefficiencies and incidents are not.

This complexity, we believe, is the reason that well orchestrated semantic metadata management is currently not very common in PPICs. Since semantic metadata management touches on so many aspects of the organization, it can be considered an organizational redesign. Stopgap measures have proven more attractive in the short term. Ad hoc efforts are triggered by incidents and focus on certain aspects of semantic metadata management. For instance, creating a point-to-point information exchange between two specific systems is a much simpler solution when looking exclusively at the transmission costs of information exchange. However, according to the cases, translation makes up most of the transaction costs and often remains unchanged.

Fortunately, awareness of and interest in semantic metadata management are increasing. Ad hoc coordination and stopgap measures are a reactive approach. Semantic metadata management should be a proactive approach that coordinates and streamlines management activities in order to deal with the complexity. This is believed to reduce the complexity and to allow the potential benefits to materialize sooner and at lower costs.


10.1.3 Evaluated reference architecture
Three research questions remain unanswered. Together, the answers to these research questions have led to the development of an evaluated reference architecture for semantic metadata management in Public Private Information Chains.

The preliminary architecture in chapter 6 answers the third research question: Which technological and organizational aspects should be incorporated in the reference architecture according to literature? The case studies in chapters 7 and 8 answer the fourth research question: What architectures do we find in practice for metadata management within the Dutch government? The evaluated reference architecture in chapter 9 includes the answer to the fifth and final research question: What design principles can be derived from the application of the preliminary architecture on the cases?

Answer to the main research question The reference architecture in chapter 9 answers the main research question: What design principles are required and what tradeoffs still have to be made in a reference architecture for semantic metadata management in public bodies that operate in a public private information chain?

The 15 elements that make up the evaluated reference architecture are nine design principles and six domains for tradeoffs. The design principles are prescriptive, while the tradeoffs provide leeway by balancing the characteristics of the chosen approach to semantic metadata management. The design principles include creating a conceptual metadata model, mapping semantics with business rules, using external metadata, having change management incorporated, using adequate tooling for the multi-stakeholder context, having a consistent data model, cooperation with stakeholders, using an adaptable architecture and mapping semantics with implementation. The domains for the tradeoffs are cooperation archetypes, metadata publication and level of commonality, roles in semantic metadata management, consultation protocols, versioning protocols and functions of tooling.

The evaluated reference architecture was derived by applying the preliminary reference architecture, which was based on scientific literature, to two complementary case studies, followed by an expert review. The reference architecture is regarded as complete: all best practices from literature and practice could be grouped under the 15 elements that make up the evaluated reference architecture, or proved to be not relevant. Conclusions related to the content of the evaluated reference architecture can be found in the next section, followed by conclusions related to the chosen format.

10.1.4 Fundamental solution
The reference architecture is the answer to the main research question of this thesis; a summary is provided here. The solution for metadata management presented in the reference architecture is based on mitigating the main challenge and reinforcing one of the main potentials: reduction of complexity. As stated earlier, much of the complexity regarding information exchange in PPICs is artificial. Introduction of metadata management enhances performance over time by reducing complexity. This reduction is achieved through establishing alignment between processes, data, technology and organizations. The semantic metadata management approach that is introduced in this research has two pillars. First, a conceptual model is introduced. Second, the relations between all components in the organizational architecture are actively managed.

There are three reasons why having a conceptual model for semantics within the organization is desirable. First of all, it acts as a master file, being a single authoritative location for semantics. In its role as a master file it can be used as a guide for specifying and implementing changes in semantics and for ensuring consistency among processes and implementations. Second, it acts as a bridge between the source of the semantics, the primary processes, and their implementation in technology and data models. Experts in the primary process who are consulted to specify and review the semantics in use often have little knowledge of the technical implementation; therefore, they might find it hard to grasp the context and review software code. Third, it acts as a bridge between the organization and the other organizations within the PPIC, making it easier to communicate and to prove compliance with agreements and standards.

The active management of relations is the second pillar of the solution. The effort takes place on various levels and is captured in procedures. First of all, the relations among the semantics on the conceptual level are mapped. The dynamics of the relations over time are covered as well by including versioning. Second, the relations between concepts and implementation are mapped; this mapping relates to both technology and data models. Third, the link between the actual meaning and the implementation is carefully monitored, which comprises consulting subject-matter experts from the primary process and mapping the business rules that are used within those processes. Finally, in the ideal model there is interaction and coordination amongst the partners in the PPIC.

10.1.5 Implementing semantic metadata management
The general direction of the solution is described above. This is the first step towards implementation. Developing the actual implementation, including both physical components and the intangible roles and procedures, requires a much higher degree of detail. The optimal implementation strongly depends on the specific context in which semantic metadata management is implemented. The reference architecture developed in this research supports the persons tasked with semantic metadata management, both within and between organizations. The reference architecture provides the final picture, making it easier to structure the many pieces of the puzzle.

The solution presented in this thesis is generic. The design principles and tradeoffs apply in a similar way to both private and public organizations. Moreover, they apply to organizations with different base maturity levels in technology, data management and processes, and with varying levels of ambition on this topic. In an information chain the diversity in stakeholders and their interests is a given. A certain degree of commitment and effort can be expected from the partners in the chain, but much of the alignment should not interfere with their own processes or bring an additional burden. The solution in this research deals with this problem. Even though the solution is primarily aimed at providing benefits in inter-organizational information exchange, it is beneficial to individual organizations as well.


The reference architecture in this thesis differs from many ‘classic’ IT reference architectures dating from the 1990s. These types of architectures have shown a number of pitfalls that have been avoided in this reference architecture. First of all, they depicted a utopia which assumed a green field to start with, and they would provide their merit only once fully finished; the current structure of an organization or a roadmap were not included. The proposed reference architecture allows for incremental implementation and transition in an order and pace that suits the balance of interests of an organization. Second, most reference architectures have the end goal that all implemented infrastructures are identical and therefore compatible. This reference architecture improves alignment but allows for much freedom and diversity to suit organizational needs and characteristics. Finally, the classic reference architectures have a very strong focus on information technology and little regard for the end users and the primary organizational processes that they support. The presented reference architecture does include primary processes, their professionals and their business rules, in order to maintain the relationship between the semantics in the system and their real life meaning and application.


10.2 Reflection on evaluated reference architecture The preliminary reference architecture was based on literature and then evaluated by applying it to two case studies. Many individual topics presented in literature proved right in reality. However, the interrelations and tradeoffs between topics found during this research were hardly mentioned in literature. In reality each topic had more depth, issues and tradeoffs.

The reference architecture is very generic by nature and contains little of the specifics that are found in many other architectures, such as those from the IEEE. This was clear from the start of the research, as shown in chapter 2.4. The generic nature is required since the organizations that will have to apply the architecture differ a lot in technology, data and processes. They also differ in maturity levels, size and competences. Even within a PPIC there are many different types of organizations. This reality imposed a new requirement which was not found in literature before the start of the research.

The reference architecture covers the most important topics, specifically those linked to semantic metadata management. These are addressed in the design principles and tradeoffs. Due to this method of selecting which topics the reference architecture covers, some are on a very high level (strategic choices on the level of cooperation) while others reach much lower levels (tooling). The topics that are not covered in depth can be found in the scientific literature referred to throughout this thesis.

The reference architecture provides a good view of what the desired end state looks like, but provides limited advice on how to get there. Initiation of cooperation and growth stages are mentioned, and some topics, like cooperation and integration levels, are covered. However, there are no clear approaches or guidelines for growth towards the desired situation. Including a growth path was never part of the research plan. Given the added value of such a feature, this omission has been partly addressed by having the experts list a number of growth stages for the design principles.

10.2.1 Assumptions

The reference architecture is based on a number of assumptions. Should any of them prove faulty, the impact on the validity of the architecture should be reviewed. The main assumptions are:

- A reference architecture helps during the design phase in creating awareness, easing communication, exchanging knowledge and providing insight into relations and tradeoffs.
- Alignment within the organization and across organizational boundaries is possible, even with so many variables and components that have to be aligned.
- Every relevant actor in the PPIC cooperates and takes an active part, at least at the interface (standards) level. In reality the chain (or even network) can break without full cooperation, and alternative solutions have to be found.

The final assumption relates to the methodology:

- Having two extensive, complementary cases provides enough validation to label the reference architecture as evaluated.


10.2.2 Test on quality indicators

Before the reference architecture was created, a number of quality indicators were listed in chapter 2.4.6. In Table 13 the evaluated reference architecture is compared against these quality indicators. In general the comparison shows that the reference architecture meets the goals set in advance.

Interdependencies & tradeoffs: Literature provided many insights into components for semantic metadata management, but hardly any interdependencies and tradeoffs. The figure in which all design principles are combined into a single generic enterprise architecture for semantic metadata management shows the interdependencies between the design principles. The tradeoffs are specifically listed as well. As such, both interdependencies and tradeoffs in characteristics are covered.

Multiple perspectives & roles: The final reference architecture encompasses the views, requirements and best practices of many (if not all) perspectives and roles. This was assured by using experts with various backgrounds, including the end users in the primary process, those who check the system's compliance, those responsible for data models, those who specify and maintain semantics, those responsible for technology and information exchange, those coordinating inter-organizational cooperation, and so on.

Generic solution, neutral: The evaluated reference architecture is a very generic solution applicable to a wide range of organizations. It encompasses commonly used technologies, although applied in a new context. Regarding tooling for semantic metadata there are very few vendors. However, none are explicitly mentioned and the functions of tooling are described in a generic way. Since no existing tool fulfills all roles, a mixture has to be acquired in any case.

Science & best practices: The preliminary architecture is based on a literature review, while the evaluation is carried out on two case studies. The final evaluation was carried out by experts with hands-on experience. Therefore both requirements are met.

Laws & regulations: Using the reference architecture does not automatically result in compliance with laws. There is enough room for compliance while adhering to the design principles at the same time. With semantic metadata management in place it actually becomes easier to prove compliance, since processes and semantics will be well defined, aligned and described.

Concise, understandable, easy to communicate: Spanning multiple pages, the reference architecture is not very compact. Given the complexity and large scope of the architecture it can also be considered very concise; there is no actual criterion on what makes a reference architecture concise or lengthy. The architecture by Angelov and Grefen is only a few pages long, the NORA well over 300 pages including appendices. The experts viewed the reference architecture as understandable, and the use of jargon was avoided where possible.

Open design space: Leaving as much design space as possible to meet conditions unique to the given situation was one of the premises of the design of the reference architecture. The design principles capture the prescriptive part of the reference architecture; even though prescriptive, they still offer room. The tradeoffs offer a lot of design space, since they allow for different solutions in many areas.


Understandable to various backgrounds: There is very little jargon and the technological nature is limited. This makes the reference architecture easy to understand while the complexity and depth are still maintained and visible. This characteristic was shown during the expert session with experts of different backgrounds.

Table 13: Test on quality indicators. Created by author.

10.2.3 Test on quality indicators for design principles

Given that the design principles are of such importance to the quality of the reference architecture, special quality indicators for them have been listed in chapter 2.4.6. Each quality indicator is addressed in Table 14. There are no quality indicators dedicated specifically to the tradeoffs that make up the other half of the reference architecture; their content has already been covered under the general quality indicators.

Understandable: Despite the complex nature of the research topic, the design principles proved understandable, both on their own and as a set. This has been corroborated by the experts that reviewed the principles, who mentioned that they were clearly presented and understandable.

Robust: The robustness of the design principles is questionable, as they have been designed to leave as much of the design space as possible open to the needs of the specific situation. Their application will result in roughly similar, but not identical, designs in near-similar situations.

Complete: During the expert sessions no principles were found to be missing. Topics that were deemed important but are more a tradeoff than a principle are covered in the listed tradeoffs.

Consistent: The consistency was tested during the expert session. None of the experts who reviewed the design principles found any inconsistencies. Additionally, the principles have been combined in a figure with three levels to show their impact and mutual relationships.

Stable: The stability of the design principles is to be proven over time. However, they have been designed with adaptability in mind and should be able to cope with new technologies, trends in data usage, and new processes and stakeholder constellations.

Table 14: Test on quality indicators for design principles. Created by author.


10.3 Recommendations

The value for society of this research project materializes in several ways. In the long term the evaluated reference architecture may aid the performance of Public Private Information Chains. In the short term the evaluated reference architecture is a practical tool for mapping existing initiatives and creating awareness. Additionally, practical recommendations have been made for the specific challenges presented in the tax office and Bureau Jeugdzorg case studies.

Long term benefits

The long term goal of this research project is to contribute to increasing the effectiveness of Public Private Information Chains. If, within PPICs, a common set of semantic metadata can be adequately managed and transaction costs can be reduced even further while improving information quality, this presents great added value to society as a whole. The exact value of this research in that grand setting is hard to determine; the short term benefits are much more tangible.

Short term benefits

In the short term the evaluated reference architecture can be used to map existing semantic metadata management efforts in order to determine the maturity within organizations. The holistic approach can be used to align efforts that already take place independently or on an ad hoc basis.

Moreover, the evaluated reference architecture may be used as a communications tool to create shared awareness among stakeholders, both within and across organizational boundaries. Once the awareness is present that semantic metadata management enhances organizational performance, it may set off a chain reaction. Awareness allows a cost benefit analysis to be carried out and ownership to be determined. Actions that are already carried out can be consciously linked to semantic metadata management and aligned with other initiatives. Ownership and accountability will eventually result in implementation.

Finally, several no regret measures can be implemented. No regret measures provide benefits even when using a common set of semantics within the PPIC is not achieved. Which initiatives are no regret measures depends on the organizational characteristics. The most likely candidate is the use of a conceptual level of semantics. Even when other organizations do not participate in using a common set of semantics, there are benefits to having a conceptual level. Mapping all processes, data models and technology to that conceptual level results in increased control over, and insight into, operational performance. This is valuable to organizations seeking to increase operational excellence, reduce the number of errors, increase efficiency or prove adherence to regulation.
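To make the idea of a conceptual level concrete, the sketch below shows how organization-specific field names can be mapped onto shared conceptual terms, so that data from different systems can be compared through one point of reference. All system and field names are hypothetical illustrations, not taken from the case studies.

```python
# Minimal sketch of a conceptual level: each system keeps its own field
# names, but every field is mapped to one shared conceptual term.
# All names below are invented for illustration.

CONCEPTUAL_MAPPING = {
    "system_a": {"bsn_nr": "citizen_id", "geb_datum": "date_of_birth"},
    "system_b": {"SSN": "citizen_id", "birthDate": "date_of_birth"},
}

def to_conceptual(system: str, record: dict) -> dict:
    """Translate a system-specific record to the shared conceptual terms."""
    mapping = CONCEPTUAL_MAPPING[system]
    return {mapping[field]: value for field, value in record.items()}

record_a = to_conceptual("system_a", {"bsn_nr": "123", "geb_datum": "1980-01-01"})
record_b = to_conceptual("system_b", {"SSN": "123", "birthDate": "1980-01-01"})
assert record_a == record_b  # both systems now speak the same conceptual language
```

Even if only one organization maintains such a mapping, it gains the increased control and insight described above, which is what makes the conceptual level a no regret measure.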

Recommendations for Bureau Jeugdzorg

The first thing BJz should do is create a conceptual level in order to align the primary processes and to align the semantics of the individual cases with the creation of aggregated reports. This will steadily improve the quality of management information, allowing better decisions to be taken in all areas. Having a conceptual level is a no regret measure. Even when other organizations in the chain do not adopt the model, it helps in three ways: it makes internal information exchange easier, it results in more accurate management information, and it makes the output of BJz easier to interpret for other parties such as the courts and care providers.


The second priority should be externalizing the semantic metadata and adding a form of tagging. This unlocks the contents for advanced search options. The employees within BJz would immediately save a significant amount of time during their daily activities, which will alleviate some of the work pressure that BJz employees perceive as high. In addition, it will make versioning less costly and less time consuming.
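As an illustration of what externalized metadata with tagging could enable, the sketch below keeps semantic tags in an index outside the documents, so that advanced search does not require opening each file. The document ids and tags are invented examples, not BJz's actual records.

```python
# Sketch: semantic metadata stored externally as tags per document id,
# instead of buried inside the documents themselves. Hypothetical data.

documents = {
    "case-001": "free-text case report ...",
    "case-002": "free-text case report ...",
}

# External tag index: document id -> set of semantic tags.
tags = {
    "case-001": {"supervision-order", "under-12"},
    "case-002": {"voluntary-care"},
}

def find_by_tag(tag: str) -> list:
    """Return the ids of all documents carrying the given tag."""
    return sorted(doc_id for doc_id, doc_tags in tags.items() if tag in doc_tags)

# Advanced search without scanning document contents:
assert find_by_tag("supervision-order") == ["case-001"]
```

Because the tags live outside the documents, a new version of a document only requires updating its entry in the index rather than re-editing embedded metadata, which is what makes versioning cheaper.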

Third, semantic metadata should be actively managed by BJz. Roles and responsibilities should be defined; ownership and active management prevent faults and misinterpretation. Protocols for change management should be implemented in order to match the semantics with the needs of the primary process. Those changes must also be aligned with the creation of aggregated reports.

Recommendations for the tax office

The first priority of the tax office should be the creation of a conceptual level in order to align the primary processes and their implementation within and beyond organizational boundaries. The conceptual level may act as a point of reference, a master data file. Given the scale of the effort, this will not be the first principle to take effect. Even during development, the insights that are gained can be put to use in other projects that are implemented earlier, such as the two described below.

Second, change management protocols must be in place. The management of semantic metadata should support follow-on processes, such as its implementation in processes, protocols and technology. In the tax office case a number of changes to processes and the corresponding data models must be processed every year. Before changes can be implemented, the design must be finished and reviewed; this includes the semantic metadata. Communication to partners in the chain and cross-organizational implementation take time, which reduces the one year window to only a few months.
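The review-before-implementation rule described above could be sketched as a small state machine for a single metadata change. The states and the example element name are illustrative assumptions, not the tax office's actual protocol.

```python
# Sketch of a change management protocol for a semantic metadata element:
# a change must pass review and approval before it may be implemented.
# States and transitions are illustrative assumptions.

ALLOWED = {
    "draft": {"in_review"},
    "in_review": {"approved", "draft"},  # reviewers may send a change back
    "approved": {"implemented"},
    "implemented": set(),                # terminal state
}

class MetadataChange:
    def __init__(self, element: str):
        self.element = element
        self.state = "draft"

    def advance(self, new_state: str) -> None:
        """Move to a new state, rejecting transitions the protocol forbids."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        self.state = new_state

change = MetadataChange("taxable_income_definition")  # hypothetical element
change.advance("in_review")
change.advance("approved")
change.advance("implemented")
```

A protocol like this makes it explicit that a draft can never be implemented directly, which is exactly the gatekeeping the one year change window depends on.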

Third, the tax office should use an adequate set of tooling for the various metadata management activities. Given the amount of semantic metadata, the rapid cycle of changes and the number of people involved, alignment is very challenging. Tooling helps to capture and record all actions and results of metadata management efforts.


10.4 Scientific contribution

This section reflects on the chosen methodology and the scientific literature that has been found. Additionally, given the low level of granularity of the evaluated reference architecture, there is much room for further research.

10.4.1 Reflection on methodology

The main methodology used in this research project is design science, using the framework by Hevner. A design approach based on both rigor and real life requirements has proven valuable. Table 7 shows the 14 steps that led to the evaluated reference architecture: the first 8 steps relate to using the existing knowledge base, the final 6 rely on real life practices. Although design science has proven valuable, there are some reservations as well. Design science leaves little room for exploratory research; it is hard to assess and refine a design when the existing knowledge base is inadequate. Moreover, the final design is to be determined by the stakeholders that are involved.

The focus on an artifact with well defined characteristics also leaves little room for process designs that focus on stakeholder constellations and interaction. The evaluated reference architecture has a unique format that combines the rigor proposed by Hevner with the leeway required to respond to multi-stakeholder complexities. The end result adds value to the design science approach. The tradeoffs that complement the prescriptive design principles add leeway to the rigor of design science.

10.4.2 Reflection on literature

The available literature can be characterized in two ways. On the one hand there is an enormous amount of literature available that is only partly relevant to semantic metadata management. On the other hand there is little literature focused on semantic metadata management across organizational boundaries. The conclusion is that the overall framework is missing, but that many individually relevant topics are well researched.

Most available literature has a single focus. A holistic view is missing; the role of each topic in the constellation of other topics is not defined. In practice, initiatives on all levels proved interrelated, resulting in tradeoffs or even mutual exclusiveness. Many research projects conclude that there is a single truth given a well defined setting. In practice the problems at hand never fit those well defined settings, and multiple options with different characteristics are available. In this research project this proved most prominent in the areas of tooling, stakeholder cooperation, metadata standards and the specification of semantics.

Aside from the engineering and project management approach there is also an area of study related to process design. This area of research is much better able to deal with stakeholder complexity, which is abundantly present in this domain. With process design the focus lies on getting from A to B rather than on what B should look like. Instead of a well defined solution, the solution remains vague. This vagueness is excellent for forging an alliance of stakeholders. The downside is that this domain relies on well defined standards for IT, data models and semantics. Once there is agreement on cooperation, the process should quickly focus on setting such standards.


The holistic view of the reference architecture may provide a context for the more focused literature, allowing those involved with semantic metadata management to identify the relations with other topics and look for potential tradeoffs or mutual benefits.

10.4.3 Grounds for further research

This research has filled in a blank spot in research on semantic metadata management. During this research project a number of questions were answered, but even more emerged. The most interesting questions might form the basis for further research in the areas of semantic metadata management and reference architectures.

Testing the performance in a real life application

The outcome of this thesis is an evaluated reference architecture. Once applied by the target audience it will become a tested reference architecture. Such a test can show whether the architecture is helpful in the design phase or of little use and, either way, in which areas it can be improved.

New generation of reference architectures

The reference architecture presented in this thesis is very different from the classic ones from the 1990s. It was designed in a way that avoids the elements considered flawed in those reference architectures. One of its unique elements is that the approach is incremental and assumes a legacy organization. Additionally, it sets out to improve alignment even though the final implementation and maturity level will differ for every organization within the PPIC. Finally, it can be reviewed whether the increased focus on the primary process translates into a better fit. The assumption that these three features make a reference architecture perform better can be tested.

Emergent behavior and independent alignment

A feature of the reference architecture is emergent behavior in systems architecture. The reference architecture constrains design choices at the micro level; at the macro level this should translate into desirable effects and system behavior. The theory of emergent behavior partially exists already in the field of interface management, but the setting of semantic metadata management in PPICs involves procedures and other forms of non-technical cooperation as well.

Wider applicability of the reference architecture

The reference architecture in this thesis was designed for a very narrow context: public private information chains in the Netherlands. Additional research may be performed to check whether it is applicable in the same situation in other nations. Furthermore, given its generic nature and the inclusion of private parties, it may also be applicable to information chains among private parties only.

Optimum size of information chains and common semantics

This research was based on the two existing and well defined information chains in the case studies. There is reason to believe that there is an optimum span for an information chain and an optimum amount of common semantics. For both factors one can argue that the overhead is disproportionate for a very small span or a very large one.

Implications of semantic metadata management

The impact and implications of every design principle and tradeoff can be researched individually; the scope of this research project did not allow for much depth in the many topics that are touched upon. Semantic metadata management is presented in this research as an enabler of further automation and a mitigation of the side effects of current IT systems. However, it might show undesirable side effects in the future, or present costs and benefits of which science is not yet aware.

Hierarchic information chains

The current approach to semantic metadata management is loosely coupled regarding inbound and outbound information, and also includes metadata management of information that is not shared in the chain. What could be researched are the variables that determine whether a hierarchic approach or the presented loosely coupled approach is more effective. It might be that in small and simple information chains, in chains with a dominant stakeholder, or in chains with a major central node, a hierarchic approach presents economic or managerial advantages.


10.5 Personal reflection

This report presents the results of an eight month research project on semantic metadata management. Although this research has been fruitful in terms of the new insights, knowledge and reference architecture it produced, a number of challenges were encountered during the process as well. These challenges were primarily related to the complexity and scope of the subject and the reference architecture that was to be designed.

First, it proved very difficult to define the scope. The topic of semantic metadata management is very broad. Starting out with ‘metadata management’ I eventually ended up with ‘metadata management regarding external semantic metadata in public private information chains’. Even with such a well defined scope the number of applicable topics, literature, theories and questions remained enormous.

Second, many challenges regarding semantic metadata management are commonly found in large IT-projects spanning multiple organizations. Others proved unique to semantic metadata management. Choices had to be made on which topics and theories to include in this research. I decided to include all topics and theories that seemed to have a major influence on the eventual design, plus some that were unique to semantic metadata management.

Third, it was very difficult to determine what the format of the reference architecture was to be like. During the SEPAM study I encountered a number of reference architectures, all of which differed in size, scope and purpose. Literature on reference architectures in general proved very limited, so I turned to examples of actual reference architectures. Using those examples I defined the scope and nature of the architecture myself.

Fourth, according to literature the 1990s school of thought regarding reference architectures has proven to have some flaws. These include green field thinking, leaving very little design space and having too strong a focus on IT alone. The proposed reference architecture was designed in a way that circumvents these known flaws, using both design principles and tradeoffs.

Fifth, keeping the reference architecture generic proved very hard. It is tempting to show insight into this difficult topic by making a much more in-depth design for a specific situation.

Finally, during the literature review, the case studies, the expert interviews and the interviews with the independent experts a lot of information was gathered. Only a portion was relevant to the narrowed down scope, with the rest providing context to the research domain and case studies. For reasons of readability and confidentiality, only a selection could be incorporated in this thesis. Having a personal desire to provide as much context and support for my observations as possible, it proved very hard to restrict myself to only the most relevant information.


11 References

Albrow, M. (1970). Bureaucracy. London: MacMillan.
Anderson, P. (1999). Complexity theory and organization science. Organization Science, 10, 17.
Angelov, S., & Grefen, P. (2008). An e-contracting reference architecture. Systems and Software, 28.
Baarda, D. B., & De Goede, M. P. M. (2001). Basisboek methoden en technieken: Handleiding voor het opzetten en uitvoeren van onderzoek. Groningen: Stenfert Kroese.
Bakker, J. G. M. (2006). De (on)betrouwbaarheid van informatie: Pearson Education Benelux.
Bass, L., Clements, P., & Kazman, R. (2003). Software Architecture in Practice (2nd ed.): Addison-Wesley Professional.
Bergeron, B. (2003). Essentials of XBRL: Financial Reporting in the 21st Century. Hoboken: John Wiley & Sons.
Bessant, J., & Tidd, J. (2007). Innovation and entrepreneurship. Chichester: John Wiley & Sons.
Bharosa, N. (2011). Netcentric Information Orchestration: Assuring information and system quality in public safety networks. PhD thesis, TU Delft.
Blecker, T., & Kersten, W. (2006). Complexity Management in Supply Chains: Concepts, Tools and Methods. Berlin: Erich Schmidt Verlag.
Borghoff, U. M., & Pareschi, R. (1997). Information Technology for Knowledge Management. Journal of Universal Computer Science, 3(8).
Brandt, S. A., Miller, E. L., Long, D. D. E., & Xue, L. (2003). Efficient metadata management in large distributed storage systems. Paper presented at the Conference on Mass Storage Systems and Technologies, San Diego.
Clements, P., Kazman, R., & Klein, M. (2001). Evaluating software architectures: methods and case studies: Addison-Wesley Professional.
De Bruijn, H., & Ten Heuvelhof, E. (2007). Management in netwerken: over veranderen in een multi-actor context (3rd ed.). Den Haag: Lemma.
De Bruijn, H., Ten Heuvelhof, E., & In 't Veld, R. (2008). Procesmanagement: over procesontwerp en besluitvorming (3rd ed.). The Hague: SDU Uitgevers.
De Leenheer, P. (2009). On Community-based Ontology Evolution. PhD thesis, Vrije Universiteit Brussel, Brussel.
De Leenheer, P., De Moor, A., & Christiaens, S. (2010). Metadataroadmap voor de Vlaamse overheid. Informatie, June.
Debreceny, R., Felden, C., Ochocki, B., Piechocki, M., & Piechocki, M. (2009). XBRL for Interactive Data: Engineering the Information Value Chain. Heidelberg: Springer-Verlag.


Delone, W., & McLean, E. (1992). Information Systems Success: the quest for the dependent variable. Information Systems Research, 35.
Egyedi, T. (2003). Consortia problem redefined: negotiating democracy in the actor network on standardization. International Journal of IT Standards and Standardization Research, 1(2), 17.
Elmasri, R., & Navathe, S. B. (2007). Fundamentals of database systems. Boston: Pearson.
Farrell, J., & Saloner, G. (1985). Standardization, compatibility and innovation. The RAND Journal of Economics, 16(1), 14.
FEA-PMO. (2007). US Federal Enterprise Architecture Practice Guidance.
Fokkema, W., & Hulstijn, J. (2011). Process compliance in public information chains. Paper presented at the IFIP e-government conference 2011, Delft.
Ghosh, S. (2010). Net centricity and technological interoperability in organizations: Perspectives and strategies. Hershey: IGI Global.
Gonzalez, R. (2007). A Concept Map of Information Systems Research Approaches: Idea Group Inc.
Hepp, M., De Leenheer, P., De Moor, A., & Sure, Y. (Eds.). (2008). Ontology Management: Semantic Web, Semantic Web Services, and Business Applications: Springer.
Hevner, A. R., March, S. T., Park, J., & Ram, S. (2003). Design science in information systems research. MIS Quarterly, 28(1), 30.
Hoffman, C., Watson, L. A., Van Hilvoorde, M., Tan, C., Van Egmond, R., & Watanabe, E. (2010). XBRL For Dummies. Indianapolis: Wiley Publishing.
Horan, T., & Schooley, B. (2007). Design science in information systems research. Communications of the ACM, 50(3), 6.
Houtevels, Y. (2010). Master data management. Informatie.
Humphreys, P. K., Lai, M. K., & Sculli, D. (2001). An inter-organizational information system for supply chain management. International Journal of Production Economics, 70, 11.
ICTU. (2006). Het verbeteren van de toegankelijkheid van digitale informatie binnen de Nederlandse overheid: Advies Overheid.nl.
ISO/IEC. (2004). ISO 11179 Metadata Registry.
Jans, E. O. J., Wezeman, K., & Van Dijk, M. (2007). Grondslagen Administratieve Organisatie (20th ed.). Houten, the Netherlands: Wolters-Noordhoff.
Janssen, M. F. W. H. A. (Ed.). (2009). Framing Enterprise Architecture: A meta-framework for analyzing architectural efforts in organizations: International Enterprise Architecture Institute.


Janssen, M. F. W. H. A., Gortmaker, J., & Wagenaar, R. W. (2006). Web service orchestration in public administration: challenges, roles and growth stages. Information Systems Management.
Janssen, M. F. W. H. A., & Van Veenstra, A. F. E. (2005). Stages of Growth in e-Government: An Architectural Approach. The Electronic Journal of e-Government, 3(4), 8.
Janssen, M. F. W. H. A., Van Veenstra, A. F. E., Groenleer, M., Van der Voort, H., De Bruijn, H., & Bastiaansen, C. (2010). Uit het Zicht: Beleidsmaatregelen voor het versnellen van het gebruik van ICT-toepassingen voor administratieve lastenverlichting. Delft: TU Delft.
Kazman, R., Klein, M., Barbacci, M., Longstaff, T., Lipson, H., & Carriere, J. (1998). The Architecture Tradeoff Analysis Method. Pittsburgh: Software Engineering Institute, Carnegie Mellon University.
Kimball, R., Reeves, L., Ross, M., & Thornthwaite, W. (2002). The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing, and Deploying Data Warehouses: Wiley.
Lankhorst, M. M., Klievink, A. J., Oude Luttighuis, P. H. W. M., Fielt, E., Heerink, L., & Van Leeuwen, D. (2008). Kanaalpatronen. Delft: Telematica Instituut.
Luther, J. (2009). Streamlining Book Metadata Workflow. Ardmore: NISO.
McClowry, S. (2008). Information Maturity Model. Retrieved 2011, from http://mike2.openmethodology.org/wiki/Information_Maturity_Model
Morgan, T. (2005). Expressing Business Semantics. Northface University.
Muller, G. (2011). A reference architecture primer. Eindhoven: Embedded Systems Institute.
NISO. (2004). Understanding metadata: National Information Standards Organization.
NORA. (2010). NORA 3.0: Principes voor samenwerking en dienstverlening.
OECD. (2003). The e-government imperative: main findings: Organisation for Economic Co-operation and Development.
Osborne, D., & Gaebler, T. (1993). Reinventing Government: How the Entrepreneurial Spirit is Transforming the Public Sector. New York: Plume.
Papazoglou, M., & Ribbers, P. (2008). e-Business: organizational and technical foundations. Chichester: John Wiley & Sons.
Platier, E. A. H. (1996). Een logistieke kijk op bedrijfsprocessen. PhD thesis, Technische Universiteit Eindhoven, Amersfoort.
Rayport, J. F., & Sviokla, J. J. (2000). Exploiting the Virtual Value Chain. Harvard Business Review, 11.
Robertson, S. (2001). Requirements Trawling: techniques for discovering requirements. International Journal of Human-Computer Studies, 55, 17.
Sbodio, M. L., Moulin, C., Benamou, N., & Barth, J. P. (Eds.). (2010). Toward an E-Government Semantic Platform.


Sen, A. (2002). Metadata management: past, present and future. Decision Support Systems, 37, 23.
Silvola, R., Jaaskelainen, O., Kropsu-Vehkapera, H., & Haapasalo, H. (2011). Managing one master data - challenges and preconditions. Industrial Management & Data Systems, 111(1), 17.
Strong, D. M., Lee, Y. W., & Wang, R. Y. (1997). Data Quality in Context. Communications of the ACM, 40(5), 8.
Sun, S., & Yen, J. (2005). Information Supply Chain: A Unified Framework for Information-Sharing. Intelligence and Security Informatics, 7.
TOGAF. (2004). The Open Group Architecture Framework (Version 8.5, Enterprise Edition).
TOGAF. (2007). TOGAF Architecture Principles, Section IV: Resource Base: The Open Group.
Universiteit van Amsterdam. (2010). Knowledge Acquisition and Documentation Structuring. Retrieved from www.commonkads.uva.nl
Vanderfeesten, I. T. P., Reijers, H. A., & Van der Aalst, W. M. P. (2010). Product-based workflow support. Information Systems, 36, 19.
Verschuren, P., & Doorewaard, H. (2003). Het ontwerpen van een onderzoek. Utrecht: Lemma.
Walls, J. G., Widmeyer, G. R., & El Sawy, O. A. (1992). Building an Information System Design Theory for Vigilant EIS. Information Systems Research, 3(1), 24.
Wimmer, M. A. (2002). Integrated Service Modelling for Online One-stop Government. Electronic Markets, 12(3), 7.
Witten, I., & Frank, E. (2005). Data mining: practical machine learning tools and techniques. San Francisco: Reed Elsevier.
WRR. (2011). iOverheid. Amsterdam: Wetenschappelijke Raad voor het Regeringsbeleid.


12 Appendix

12.1.1 Glossary

The desk study revealed that the literature contains many synonyms, homonyms and slight variations in meaning in the nomenclature of the topics discussed in this research project. The glossary below lists the meaning used for this terminology in this thesis.

Architecture: Defined in this research as the constellation of people, processes, data and objects regarding a certain subject. It is descriptive in nature, unlike a reference architecture, which is prescriptive.

Data: A collection of facts, including numbers, text, graphs, pictures, etc. When given a context, data turns into information.

Information: Data which is presented in a context, thus providing meaning for people and computers alike.

Information system: The complex of people, processes, data and technology within an organization that is used to distribute and analyze information.

Metadata: All data about other data. It is divided into administrative, structural and semantic metadata.

Metadata management: The whole set of procedures and tools for the administration, application, alignment and governance of semantic metadata. These efforts are often laid down in protocols and may be supported with tooling.

Public Private Information Chain (PPIC): A value chain that relates to information products and that extends over the organizational boundaries of various public bodies and private parties. This means that multiple heterogeneous stakeholders are involved, each with its own perceptions, motives, responsibilities and resources.

Reference architecture: Defined in this research as prescribing what the constellation of people, processes, data and objects regarding a certain subject should look like.

Semantic metadata: Metadata that adds context and meaning to data, turning it into information. Since it adds contextual information to data, it makes data interpretable, which in turn makes it easier to use for man and machine.

Structural metadata: Metadata that defines the location and structure of data. As such it has no direct relation with the meaning of the data.


12.1.2 Metadata types

The US National Information Standards Organization (2004) distinguishes three types of metadata: administrative, structural and descriptive metadata. In this research the definitions of the NISO are used, although descriptive metadata will be referred to as semantic metadata in order to conform to the most common terminology in use today. All three types of metadata are briefly introduced in order to distinguish semantic metadata from the other types of metadata.

Administrative metadata

Administrative metadata relates to the logging of all operations in a database or data warehouse (NISO, 2004). This type of metadata is closely related to the technology and the performance the technology provides when processing the data. It describes usage and indicates the service level the users are receiving. Administrative metadata is hardly ever visible to the end user and is generally used only by those monitoring the functioning of the IT systems. It has no relation at all with the meaning of the data it describes.
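To make the distinction concrete, administrative metadata can be sketched as an operations log record. This is a minimal, hypothetical illustration in Python; the record fields below are invented for the example and do not come from the NISO definitions.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical sketch of an administrative-metadata record: it describes one
# database operation and its performance, not the meaning of any data.
@dataclass
class OperationLogEntry:
    timestamp: datetime
    operation: str      # e.g. "INSERT" or "SELECT"
    table: str          # which table was touched
    duration_ms: float  # performance indicator, used for service-level monitoring

entry = OperationLogEntry(datetime(2011, 10, 24, 9, 30), "INSERT", "contactjournaal", 12.5)
print(f"{entry.operation} on {entry.table} took {entry.duration_ms} ms")
```

Such a record is typically only inspected by those monitoring the IT systems, matching the description above.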

Structural metadata

Structural metadata describes the structure and logic of the various components of a data object (NISO, 2004). Since it describes a structure, it is recurring and not unique to a certain instance; examples are the design of a table or the structure of a form. For each row in the table the structure remains the same. This makes structural metadata a key element in data warehouses and very important for retrieving and presenting the data desired by the end user (Witten & Eibe, 2005). Structural metadata is not related to the meaning of the data in any way; the structure is developed before data is entered into it. In short, structural metadata is about the whereabouts of and relations among data.
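The idea that the structure is defined once and recurs for every instance can be sketched as follows. This is a hypothetical Python illustration; the table and column names are invented for the example.

```python
# Hypothetical sketch: structural metadata as a table schema. The schema says
# where data lives and how it is laid out, not what the data means, and it is
# defined before any data is entered.
schema = {
    "table": "case_file",
    "columns": [
        {"name": "case_id", "type": "int"},
        {"name": "created", "type": "date"},
        {"name": "body", "type": "text"},
    ],
}

# Two rows of core data: each row conforms to the same recurring structure.
rows = [
    {"case_id": 1, "created": "2011-01-05", "body": "..."},
    {"case_id": 2, "created": "2011-02-17", "body": "..."},
]

column_names = {c["name"] for c in schema["columns"]}
print(all(set(row) == column_names for row in rows))  # every row matches the schema
```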

Semantic metadata

Semantic metadata provides semantics, meaning and context, to a data element (NISO, 2004). Semantic metadata may include tags, labels, definitions, context, concepts, units, references and notes. In short, it represents all data that adds context to other data. The data that is being placed in a context is also referred to as core data. Unlike other types of metadata, semantic metadata is of particular interest to the human end user who is using data for a certain purpose or task (Borghoff & Pareschi, 1997). Implementations of semantic metadata can vary: it may be a general tag or specific to a single data element, and it can be stored externally or with the data itself (Elmasri & Navathe, 2007). Semantic metadata is also directly linked to information quality indicators, such as origin, owner, age and mutation (Strong, et al., 1997).
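As a minimal sketch, the relation between core data and semantic metadata can be pictured as follows. The labels, units and values are hypothetical and only serve to show how metadata turns a bare value into information.

```python
# Hypothetical sketch: a bare value (core data) plus the semantic metadata
# that gives it meaning and context.
core_data = 12

semantic_metadata = {
    "label": "age",
    "definition": "Age of the minor at the start of the case, in whole years",
    "unit": "years",
    "origin": "municipal records",  # links to information quality indicators
}

def as_information(value, meta):
    """Present core data in its context, turning data into information."""
    return f"{meta['label']}: {value} {meta['unit']} (source: {meta['origin']})"

print(as_information(core_data, semantic_metadata))
```

Without the metadata, the value 12 is merely data; with it, both a person and a machine can interpret what it stands for.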


12.1.3 Four domains of stakeholder challenges

In this appendix a number of potential challenges for cooperation among stakeholders are listed. First, a number of challenges related to the characteristics of individual stakeholders are presented. Second, difficulties regarding stakeholder constellations are discussed. Third, challenges regarding knowledge and trust among stakeholders are described. Finally, infrastructures pose their own set of challenges.

Stakeholder characteristics

A number of challenges relate to the characteristics of each individual stakeholder. These challenges are listed below.

• The first issue regarding stakeholders is to determine who the stakeholders actually are in each case. (Potential) stakeholders are affected and involved in varying degrees. Given a certain subject there are one or more parties that are clearly stakeholders, since the subject in question is their core business. However, in the PPIC they operate in there will be various parties that are only partly or marginally involved with the subject in question, but may be very relevant to other stakeholders or may be severely impacted.

• In a PPIC individual stakeholders have varying roles, core businesses, sizes and abilities. Semantic metadata in PPICs relates to the entire enterprise architecture, ranging from business to information technology and from the boots on the ground to top level management. On both these axes there is a wide variety of people with different world views and vocabularies [EXP1]. This makes cooperation and coordination difficult. In this light it is hard to determine where the boundary of a stakeholder lies. Organizations may officially be a single entity but may have various internal stakeholders, such as departments. These internal stakeholders may have conflicting interests or perceptions (De Bruijn, Ten Heuvelhof, & In 't Veld, 2008).

• The attitude may vary among stakeholders. Information oriented organizations with the same core business may range from avant-garde early adopters to very conservative bureaucratic organizations. These different mentalities may create conflicts even though the goals may coincide [EXP2].

Stakeholder constellation

Aside from the characteristics of individual stakeholders, the constellation of stakeholders requires thought. It is hard to determine which stakeholders should be included in the cooperation and what approach and type of network are desirable. A variety of network types and constellations is possible. Networks may differ in span, level of cooperation, number of stakeholders, and type of coupling. Each type of network has its own advantages, drawbacks and opportunities.

• A PPIC does not appear out of the blue. Cooperation and exchange of information (products) will already be prevalent to a certain degree. This means that there are preexisting dependencies and agreements among stakeholders. These may impact the design space and freedom of choice in selecting partners for cooperation [EXP1]. Existing structures may require a growth path for change or be gradually phased out.

• There are various approaches available for starting and maintaining cooperation and alignment in networks. Options for control range from a hierarchy to close cooperation. Cooperation may be highly structured or ad hoc. The technical interconnection among stakeholders may range from loose coupling to tight coupling. Often large government bodies adopt a (near) hierarchic approach due to their power [EXP2]. Smaller bodies are


more likely to seek cooperation and mutual benefit [EXP1]. Inspections are an exception due to their supervisory nature [EXP2].

• The stakeholder constellation will probably not be static, but dynamic. Changes over time may relate to the stakeholders themselves: new actors may enter the arena, existing actors may merge or split up, or cease to exist due to a change of strategy or bankruptcy. Other sources of change over time are new insights, changing requirements, new opportunities, changes in laws and regulations, and various others.

Knowledge and trust

In all situations where multiple stakeholders cooperate and are dependent on each other, there are challenges regarding information dissimilarity and mutual trust. According to Farrell and Saloner (1985) knowledge of standards, partners and competition is key, while Egyedi (2003) claims that trust is the primary issue.

• There is an information dissimilarity among stakeholders. The dissimilarity exists in various areas, including knowledge of potential benefits and cost structures, risks and uncertainties, and stakeholder views and capabilities.

• Within the constellation of actors there may be free riders: those willing to reap the benefits without performing the same level of effort as others, or any effort at all. Aside from free riders, stakeholders may differ in risk aversion. Stakeholders may exhibit a wait-and-see attitude, which may not be appreciated by other stakeholders.

• Trust among partners is key before agreeing to cooperate in a project with large sunk costs. Sunk costs are investments and effort which cannot be capitalized in any other way than the original purpose. Loss of independence requires solid guarantees before commitment.

• In a network it is likely that there are existing dependencies and cooperation. Collaboration on the level of semantic metadata in a PPIC does not start out of the blue. However, the holistic approach may result in new relations that did not previously exist, since many organizations only deal with their direct relations. As a result a new constellation of actors exists with either very strong or hardly any prior ties, creating a knowledge and trust imbalance.

Infrastructures and stakeholders

Infrastructures pose their own set of specific stakeholder related challenges (Blecker & Kersten, 2006). These challenges do not only apply to physical infrastructures but also to information technology based infrastructures (Humphreys, Lai, & Sculli, 2001).

• Infrastructures provide a wide set of benefits, which are hard to determine and sell. Not all benefits are explicitly known beforehand and it is often unknown what part of the potential will materialize. Additionally, the gains are distributed over a large set of stakeholders.

• Infrastructure projects are long term projects with few quick wins. They require investments of time and money beforehand and are only operational when fully finished. A nearly finished bridge with a gap is still useless until that very last segment is installed. The same applies to an IT infrastructure, but in a less visible manner (Humphreys, et al., 2001).

• The sunk costs may seem (or even be) insurmountable. Infrastructures require high sunk costs in order to reduce small, highly repetitive costs. The single large effort is usually outweighed by smaller annual gains over a longer time span, albeit not visible at a glance. Also, from a financing and cost structure perspective, repetitive small costs


may be more favorable than a single large investment, even though in the long run this is less efficient [EXP2].

• Infrastructures require conformity, usually to standards that do not fully match each individual organization's requirements and needs (Ghosh, 2010). Conformity also limits the freedom of choice in the (near) future. Additionally, conformity may render old investments useless.

• Infrastructures are known for the number of options and alternatives that are available, making it difficult to agree on a design. Large systems with many components result in an enormous design space (Blecker & Kersten, 2006). Large budgets and long development and implementation spans make even technology that exists merely in theory an option as well. This may lead to endless discussions and everlasting considerations and trade-offs.

• Infrastructure design and implementation is made more difficult by coordination problems. The effort (time and money) may not coincide with the gains [EXP1]. Given the scale and number of involved actors a knowledge disparity may occur. It is hard to determine which benefits have materialized and who has made what effort.


12.1.4 Reference architecture design process

This appendix details each of the 14 steps of the design of the preliminary and evaluated architecture. A quick overview is presented in chapter 6.1. Table 15 lists the design steps. The blue steps relate to the design of the preliminary architecture and match the blue tables with best practices in the literature review. The red steps relate to the creation and evaluation of the final architecture and match the red boxes with conclusions in the literature study and case studies. Below the table each of the 14 steps is explained in further detail.

Step 1: Deriving principles from theories and best practices in literature
Step 2: Clustering principles into similar topics
Step 3: Adding first lessons from case studies/additional literature
Step 4: Removing principles unrelated to metadata management
Step 5: Adding first lessons from case studies
Step 6: Moving some principles to preconditions
Step 7: Restructuring topics according to new insights
Step 8: Defining the tradeoffs and listing their contents
Step 9: Validation of principles in case study interviews
Step 10: Rewriting clusters into design principles
Step 11: Matching design principles to each other
Step 12: Writing design principles in TOGAF format
Step 13: Validation of principles in expert session
Step 14: Finalizing and updating the tradeoffs

Table 15: List of reference architecture design steps. Created by author.


Step 1: Deriving principles from theories and best practices in literature

In order to write the research proposal and to answer the first three questions in this thesis a literature study was carried out. A number of principles, theories and best practices were encountered. These were listed in an Excel sheet. Double entries were removed and near-similar entries were merged.

Step 2: Clustering principles into similar topics

When reviewing the list it became apparent that many of the listed principles were related or shared the same topic. A list of over 30 principles, or likely over 50 after the early stages of the case studies, is too much to test in the given amount of time. Additionally the reference architecture would become too large, which would conflict with the desire to keep the reference architecture concise and understandable. NORA 3.0 has 40 principles and the specification of the NORA principles alone is 75 pages (NORA, 2010). The complete NORA documentation is several hundred pages long.

A lesson that can be drawn from the NORA architecture is the use of topics. The 40 design principles are clustered into 7 topics. These 7 topics cover the whole spectrum addressed by the reference architecture. Their relevance and interrelations are presented before the 40 principles are listed. Viewed from the TOGAF perspective the topics in the NORA are very close to design principles, but not phrased that way. The 40 individual principles are very specific and their combined impact on the architecture is much smaller than what the 7 topics prescribe.

Angelov and Grefen (2008) developed a reference architecture for e-contracting which is more similar to the classic IEEE reference architectures. They do not use design principles but their approach is very similar to the NORA. Their reference architecture consists of 9 components. The characteristics and interdependence of these building blocks are described and within these blocks there are more detailed functionalities, best practices and examples.

High level elements that define the architecture:
• NORA 3.0: 7 main topics, plus their relations
• E-contracting: 9 components, plus interdependencies
• This research project: 9 principles, plus their relations (prescriptive)

Low level components related to each higher level element:
• NORA 3.0: 40 low level principles, 3 to 8 per topic
• E-contracting: various features and functionalities per component
• This research project: 6 tradeoffs/dilemmas to balance characteristics

Table 16: Overview of reference architecture structures. Created by author.

Step 3: Adding first lessons from case studies and additional literature

The next step was to draw some lessons from the case studies. In scientific literature many aspects of metadata management, plus its benefits and challenges, are covered. However, most literature is focused on a single topic, and how it fits into the enterprise architecture is often not covered. In the cases the interdependencies and consequences could be observed. For nearly each topic one or more principles were added. Other observations found in the real life cases were already listed. Some observations led to additional literature review.


Step 4: Removing principles unrelated to metadata management

Semantic metadata management is closely linked to the goals it helps achieve. As such it is easy to lose sight of where metadata management ends and the effects it can have on the enterprise architecture start. A partial reduction of the number of principles could be achieved by focusing on the subject of this study. In the process of developing this reference architecture this meant that principles such as 'separate the know from the flow', 'information should be derived as close to the source as possible' and 'information quality must be documented' were deleted.

Step 5: Adding first lessons from case studies

In the iterative process of developing an architecture based on literature and real life practices, the next step was to include lessons from the case studies. A number of principles were added to the topics derived under step 2. Additionally a new topic was introduced: business rules. At first the idea was that business rules were not in the scope of this research. In practice there proved to be a close relation. Therefore the link between semantic metadata and business rules is included; any other aspect of business rules is not.

Step 6: Moving some principles to preconditions

Design principles are prescriptive: they dictate what a design should look like. Not all listed principles met the description of design principles given in chapter 2.4.3. These statements related more to the setting in which metadata management is to be conducted than to the design of the semantic metadata management itself. They were therefore moved to the preconditions.

Step 7: Restructuring topics according to new insights

The early case study observations and expert interviews led to new insights. In relation to semantic metadata management, data standards mainly function as an interface, both within the organization and when exchanging information with partners in the chain. For that reason the topics standards and interfaces were merged. The other aspect of standardization, the reduction of overlap and complexity, is already encompassed by a conceptual metadata level.

Step 8: Defining the tradeoffs and listing their contents

With all the topics specified, all elements that were to be prescribed were covered. Other topics proved to be very relevant, but not necessarily prescriptive. The reference architecture needs to be generally applicable within the setting that is presented: the PPIC. This setting still allows for a wide variety of cases. There is no single approach to metadata management that works in every case. A portion is generic, but some topics and many of the details differ per case. This knowledge is applied in the tradeoffs. The tradeoffs contain useful information for semantic metadata management, but are not prescriptive. It is up to the designer(s) to determine how the contents of the tradeoffs are to be applied. Most tradeoffs pose dilemmas, since characteristics have to be weighed against each other.

Step 9: Validation of principles in case study interviews

Each of the topics that was found relevant was discussed during the case study interviews. The topics were applied to the case and it was discussed which role and added value each would have. Additionally the experts named examples and expressed their opinions on each topic. The list of topics and example questions can be found in the interview protocol. During the validation phase it became apparent that versioning of the contents and adaptability of the infrastructure are not the same. Adaptability was added as a separate topic, bringing the total up to 9.

Step 10: Rewriting clusters into design principles

The validation of the principles in the case study interviews allowed the content of the clusters of principles to be made more specific. Each cluster was rewritten into a single design principle which covered the content of the entire cluster. The design principles were also adjusted to match one another to make sure that as a whole they made sense and covered the entire architecture.

Step 11: Matching design principles to each other

All the design principles influence each other in one way or another. It is not a set of independent principles. The set as a whole results in the characteristics that are desired. This means that all design principles must be in sync. The effect of one principle must not cancel another out. Some principles may reinforce each other. Adaptability of the design is partly achieved by loose coupling with the implementation. Change management is partly enabled by having the right roles and responsibilities. Tools may aid in any metadata management activity that is performed.

Step 12: Writing design principles in TOGAF format

In this research the design principles are structured in the same format as detailed in the TOGAF architecture (TOGAF, 2007). This means that each design principle is captured in a short unambiguous statement which is provided with a name, rationale and implications. This meant that the statements were adapted to become as unambiguous as possible. A rationale was added in which some of the statements of the extensive lists were incorporated. The implications were listed as far as possible. The true implications differ depending on the starting characteristics of the actual case.
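As an illustration, a design principle in this format can be represented as a simple record with the four TOGAF elements. The wording of the example principle below is invented for illustration and is not a quote from the reference architecture itself.

```python
# Hypothetical illustration of the TOGAF principle format: a named statement
# accompanied by a rationale and a list of implications.
principle = {
    "name": "Single conceptual set of semantics",
    "statement": "All implementations derive their semantics from one shared conceptual set.",
    "rationale": "A single reference prevents divergence of meaning across systems, "
                 "processes and data models in the chain.",
    "implications": [
        "A role must be assigned to maintain the conceptual set.",
        "Local deviations must be documented and reconciled.",
    ],
}

# The format prescribes exactly these four elements per principle.
print(sorted(principle))
```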

Step 13: Validation of principles in expert session

An expert session was held in order to validate the reference architecture. The conclusion was that the experts agreed on all principles, but each of them had some remarks or suggestions. During the expert session the principles were reviewed on three aspects. First the experts could indicate whether they agreed with the design principles. Then the principles were ranked in order of importance. Finally the experts could indicate what would be the best order in which to implement the principles.

Step 14: Finalizing and updating the tradeoffs

Based on the validated set of design principles the tradeoffs were developed. Many of the topics and contents were already listed under step 8. These were expanded into full text. The contents of the tradeoffs come directly from literature and best practices, although the way they are presented differs from the literature, case studies and interview transcripts. As with the other parts of the reference architecture they are presented as statements. The scientific foundation for these statements is this research.


12.1.5 Metadata tooling example

Tooling for semantic metadata management is a niche product. This appendix shows an example in order to give a feel for what a metadata management tool may look like. The screenshots shown are from the Collibra tool, which was selected since it combines several functions described in section 9.4. The functions shown in each screenshot are described in the figure caption.

Figure 23: A single definition provided with an example and characteristics. Ownership and status are shown on the right, with other options in the menu below. From Collibra.


Figure 24: Overview of relations defined between several semantics. From Collibra.

Figure 25: A relation between semantics being defined in a menu. From Collibra.


Figure 26: A simple business rule added to a definition. From Collibra.

Figure 27: A taxonomy created from a number of semantics. Combining both categories and relations. From Collibra.


12.1.6 Roles and responsibilities

Janssen, Gortmaker & Wagenaar (2006) have identified eight types of roles for web service orchestration in public administration. These observations are very valuable since the context is very similar to this research. First, the public administration/e-government setting meets the criteria of the PPIC definition. Second, web services are one of the premier means of electronic data interchange. Third, the means of cooperation is not only on the technical level but is also strongly related to the content and primary processes.

Table 17 shows the original list by Janssen, Gortmaker & Wagenaar. Table 18 shows the list that includes the additional insights gained during the evaluation of the reference architecture, shown in chapter 7.4.3 and chapter 8.4.3. There are four major adaptations:

• The service and product aggregator role has been removed, since it was found not applicable in both cases. This role is more specifically related to web services.

• The end user role was added. In the original list this was partly covered by the developer role, which looks after the interests, objectives and requirements on behalf of the end user. In semantic metadata management the end user is actively involved.

• The information analyst role was added to ensure a good fit between semantics within the information system and the requirements in the primary process.

• The implementation orchestrator role was added, which maintains the conceptual set of semantics that are implemented in the various systems, processes and data models.


Initiator and enabler role: This role is to convince and stimulate agencies to participate in and commit to an automated process execution. Some organizations might initially resist the idea to use Web service orchestration technology for improving cross-agency processes. This might be due to a lack of knowledge, but also healthy suspicion. Often it is necessary to educate agencies on the basics of the technology and to show the potential advantages.

Developer role: This role is about defining the requirements for each agency in order to enable cross-agency processes. This role involves the identification of the organizations and departments involved and determines the interests, objectives, and requirements for each of them.

Standardization role: Technology interface standards should be determined and set as a standard. Existing systems can be selected as standard, but it can also be better to develop and impose new, preferably open standards.

Control and progress monitoring role: The time-dependent sequence of activities performed by agencies needs to be managed. This role should control the sequence of Web service invocations and collect progress and status information. All unexpected events, such as non-availability of Web services, should be tracked as soon as they occur and analyzed to determine what actually did happen and why, to ensure reliable cross-agency process execution.

Facilitator role: This role facilitates the implementation of cross-agency processes by collecting and disseminating best practices, reference models, and reusable system functionality such as identification, authentication, and payment. Ideally, functionality and databases are shared when possible and duplication of efforts is avoided.

Service and product aggregator role: There should be a one-stop shop that provides a consistent point of aggregation and is equipped with logic to meet customers’ needs. Needs should be analyzed and translated into product and service requests, and related products and services should be recommended, multiple processes started, status information provided, and the results of each process aggregated into a single answer. For this purpose the services and products should be bundled into one large catalogue and rules determined to translate citizens’ and business’ needs into the appropriate multiple cross-agency processes.

Accountability role: As a general rule in modern societies, governmental decisions should have accountability. This role should ensure that the motivations behind decisions made by each agency and the performance and outcomes of the complete cross-agency process can be accounted for.

Process improvement role: Changes in processes and governmental rules often affect more than one agency. This role should maintain an overview of the cross-agency processes and define mechanisms and procedures to assess the implications of changes in law, technology, and other developments. This role initiates complex transformation processes to restructure the public sector.

Table 17: Roles identified in web service orchestration by Janssen, Gortmaker & Wagenaar (2006).


Initiator & enabler: This role is to convince and stimulate agencies to participate in and commit to an automated process execution. Often it is necessary to educate agencies on the basics of the technology and to show the potential advantages.

End user: The end user is the one who actually uses the semantic metadata. In general these are the experts within the primary process that handle the information that is provided with semantic metadata. Given the number of organizations and different types of specialists within the PPIC this group is rather large and heterogeneous. As such the end user is both the expert in the own organization as well as the next link in the chain. In metadata management the explicit role of the end user is to verify the validity and applicability of the common set of semantics.

Developer: Defining the requirements for each organization in order to enable cross-agency processes. This role involves the identification of the organizations and departments involved and determines the interests, objectives, and requirements for each of them.

Information analyst: The information analyst ensures a good fit between semantics within the information system and the requirements in the primary process. Semantics have a lifecycle, starting at specification and requiring constant alignment.

Standardization: Technology interface standards should be determined and set as a standard. Existing systems can be selected as standard, but it can also be better to develop and impose new, preferably open standards.

Implementation orchestrator: Semantic metadata management is to be centered around a conceptual set of semantics. This serves as a reference for the implementation in various systems, processes, data models, forms and business rules. Whether this conceptual set is self-maintained or imposed, the application of the metadata in the various forms of implementation must be orchestrated.

Control and progress monitoring: The time-dependent sequence of activities needs to be managed. All unexpected events should be tracked as soon as they occur and analyzed to determine what actually did happen and why, to ensure reliable cross-agency process execution.

Facilitator: This role facilitates the implementation of cross-agency processes by collecting and disseminating best practices, reference models, and reusable system functionality such as identification, authentication, and payment. Ideally, components are shared when possible and duplication of efforts is avoided.

Accountability management: Governmental decisions should have accountability. This role should ensure that the motivations behind decisions made by each agency and the performance and outcomes of the complete cross-agency process can be accounted for.

Process improvement: Changes in processes and governmental rules often affect more than one agency. This role should maintain an overview of the cross-agency processes and define mechanisms and procedures to assess the implications of changes in law, technology, and other developments.

Table 18: Roles in semantic metadata management. Adapted from Janssen, Gortmaker & Wagenaar (2006), created by author.


12.1.7 Schematic overview of primary process at BJz

Figure 28: Information products and relations present in a single generic two year OTS case. Note that there may be multiple instances of each product, with the average case file having about 700 pages. Created by author.


12.1.8 Information chain within BJz
Within the primary process, BJz employees reuse a lot of information, mostly excerpts and paragraphs, when creating other information products. This reuse is mainly caused by the partial information overlap among products: each individual product must be understandable as a standalone product, or must provide context. For example, the conclusion from the planning can trigger the drafting of a healthcare indication, the goals of which are then copied to the evaluation and matched with the conclusions drawn in a healthcare provider's report.

Figure 29 shows what relations exist among the most important information products within BJz. These relations are based on the review of case files, the interviews and the design for IJ.

There is also sequential reuse, meaning the same information is reused multiple times: an excerpt from product A can be reused in product B and, at a later moment, in product C. This reuse also takes place across the information chain. The inspection of case files revealed that a paragraph from the RvdK was used in the verdict by the judge, which was in turn literally typed into the planning, copy-pasted into a healthcare indication and turned up again in the evaluation. Figure 28, located in the previous appendix, shows a two-year OTS period and how information can be sequentially reused.

The reuse of information saves a lot of time when drafting products, and when used properly it also ensures consistency among information products. However, when information is retyped or copy-pasted, most metadata is lost, unless it is part of the text itself.

[Figure 29 is a diagram that cannot be reproduced in extracted text. Its nodes are the information products: Veiligheidslijst, Veiligheidsplan, Stamblad, Voorblad documenten IJ, Contactjournaals, Vervolg plan van aanpak, Onderzoeksrapport RvdK, Actieagenda, Plan van aanpak, Evaluatie/afsluiting plan van aanpak, Verzoekschrift rechtbank, Indicatiebesluit, Melding RvdK, Rapportages zorgaanbieders and Verzoek tot verderstrekkende maatregel.]

Figure 29: Overview that shows links among information products for any form of reuse of information. In a regular case there are multiple instances of most document types. Created by author.


[Figure 30 is a diagram that cannot be reproduced in extracted text. It links the Stamblad, Indicatiebesluit, Voorblad documenten IJ, Onderzoeksrapport RvdK, Plan van aanpak, Actieagenda, Veiligheidslijst and Veiligheidsplan. Edge labels indicate the reuse mechanism (via IJ, typing, or copy/paste) and the reused content, e.g. personalia, onderbouwing zorg, voorgeschiedenis & bedreigingen, conclusies, observaties.]

Figure 30: The reuse of information among information products relating to the planning, showing what information is reused in what way. Created by author.

[Figure 31 is a diagram that cannot be reproduced in extracted text. It links the Plan van aanpak, Verzoekschrift rechtbank, Actieagenda (IJ/Word), Melding RvdK, Evaluatie/afsluiting plan van aanpak, Contactjournaals, Verzoek tot verderstrekkende maatregel, Rapportages zorgaanbieders and Veiligheidslijst. Edge labels indicate the reuse mechanism (via IJ, typing, or copy/paste) and the reused content, e.g. voorgeschiedenis & bedreigingen, observaties, onderbouwing, afspraken, conclusies.]

Figure 31: The reuse of information among information products relating to the evaluation, showing what information is reused in what way. Created by author.


[Figure 32 is a diagram that cannot be reproduced in extracted text. It links the Evaluatie/afsluiting plan van aanpak, Plan van aanpak, Onderzoeksrapport RvdK, Vervolg plan van aanpak (IJ/Word), Actieagenda, Rapportages zorgaanbieders, Veiligheidslijst and Veiligheidsplan. Edge labels indicate the reuse mechanism (via IJ, typing, or copy/paste) and the reused content, e.g. observaties, voorgeschiedenis, bedreigingen, conclusies.]

Figure 32: The reuse of information among information products related to the follow-up planning, showing what information is reused in what way. Created by author.

Conclusion
The reuse of data among information products follows from the lack of a separate knowledge base. Several cycles of reuse increase the possibility of errors, and information may become outdated. Every retyping or copy-paste action can introduce errors; when such actions occur sequentially, the probability of errors rises significantly. Retyping and copy-pasting also strip the original context, leading to misinterpretation and uncertainty about correctness. The lack of any type of link between the original data and the (multitude of) copied data leads to a loss of provenance.

The information may also become outdated over time: it is unknown how old the reused information actually is, since it may already have been reused one or more times. Moreover, it is uncertain whether newer information on the topic is available, unless the author remembers changes in the situation or specifically searches all available information on that particular topic.

Additionally, metadata is implicitly treated as unnecessary in final products. This was never an active decision: in the implemented system metadata is simply lost upon reuse, which was neither regarded as a problem nor made an explicit design feature. Yet metadata would be of value to the recipient of the end product. Furthermore, the extensive sequential reuse indicates that most products are not final products at all, as they serve as input for other products.
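The difference between the copy-paste practice observed at BJz and reuse that preserves metadata can be illustrated schematically. The following Python sketch uses hypothetical data structures (not part of any BJz system; names such as Excerpt and the field choices are the author's illustration): plain copy-paste keeps only the text, whereas metadata-aware reuse carries the original creation date along and extends a provenance chain at every step.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Excerpt:
    """A piece of reusable text together with its semantic metadata."""
    text: str
    source_product: str          # product the excerpt currently lives in
    author: str
    created: date                # when the information was originally recorded
    provenance: list = field(default_factory=list)  # chain of earlier products

def reuse_by_copy_paste(excerpt: Excerpt) -> str:
    """Copy-paste as observed in the case files: only the text survives."""
    return excerpt.text  # author, date and provenance are all lost

def reuse_with_provenance(excerpt: Excerpt, target_product: str) -> Excerpt:
    """Reuse that keeps metadata and extends the provenance chain."""
    return Excerpt(
        text=excerpt.text,
        source_product=target_product,
        author=excerpt.author,
        created=excerpt.created,  # original date preserved: age stays visible
        provenance=excerpt.provenance + [excerpt.source_product],
    )

# Sequential reuse A -> B -> C, as described above.
original = Excerpt("Conclusies veiligheid", "Veiligheidslijst",
                   "case worker", date(2011, 3, 1))
step1 = reuse_with_provenance(original, "Plan van aanpak")
step2 = reuse_with_provenance(step1, "Evaluatie")
print(step2.provenance)  # ['Veiligheidslijst', 'Plan van aanpak']
print(step2.created)     # 2011-03-01
```

With such a chain the reader of the evaluation can still see where the conclusion originated and how old it is, which is exactly the information lost in the retyping and copy-paste practice described in this appendix.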


12.1.9 Interview protocol
This appendix shows the interview protocol that was used in both case studies. The additional experts were also interviewed using this protocol; this wide application is possible due to its generic nature. The interview is structured into topics to ensure that all topics and their relations are covered (Verschuren & Doorewaard, 2003). All interviews were held to validate the preliminary architecture or the final set of design principles. Depending on the case and the interviewee's area of expertise, the focus on the topics varied per interview.

Name and function of interviewee

Introduction
- What is the background and expertise of the interviewee?
- What is the function of the interviewee and how does it relate to metadata management?

Potential of semantics
- What is the rationale behind implementing metadata management in this case?
- What added value is desired?
- How important is semantic metadata for the organization?

Barriers & challenges
- What are the main technical/semantic/organizational barriers and challenges for metadata management in this case?

Metadata specification
- Should metadata specification come bottom-up (existing technical designs) or top-down (subject-matter experts)?
- (How) Is semantic metadata reviewed by those who use it in the primary processes?

Metadata architecture
- What does the current semantic metadata model look like?
- Is semantic metadata stored separately from the core data?

Business intelligence
- Is data aggregated for managerial purposes or compliance?
- Should business rules be linked to semantic metadata?

Tooling
- What tooling is needed to support metadata management?
- Is that tooling available? If not, why not?

Versioning
- Does versioning play a significant role in metadata management?
- What is the frequency and extent of the versions?
- How is versioning arranged (protocols, cooperation, review)?
- Are old versions stored or is the current version overwritten?

Management processes, roles & responsibilities
- Who is responsible for metadata management?
- Who is responsible for the alignment of primary processes?
- What protocols exist for change management? Is semantic metadata a standard part of those protocols?

Interoperability & technology
- Do legacy systems play a significant role?
- What standards are used? What role do the standards fulfill?
- How adaptable is the technical infrastructure?

Role of the reference architecture
- What should a reference architecture provide for the architects (involved in developing and employing semantic metadata governance in a PPIC)?
- What are best practices that have not yet been covered in this interview?
- What are areas of interest that have not yet been covered in this interview?
