Ref. Ares(2018)3457727 - 29/06/2018

Contract No. H2020 – 730569

IN2SMART
Project Title: Intelligent Innovative Smart Maintenance of Assets by integrated Technologies
Starting date: 01/09/2016
Duration in months: 36
Call (part) identifier: H2020-S2RJU-CFM-2016-01-1
Grant agreement no: 730569

Open data: a review of the state-of-the-art D7.1

Due date of deliverable: Month 12
Actual submission date: 31-08-2017
Leader of this deliverable: DLR
Dissemination level: CO
Revision: Issued

Revision History Table
Version | Reason for change | Issue Date
V1.0 | Initial Issue | 18/07/2017
V2.0 | Requested Revision | 28/06/2018

This project has received funding from the Shift2Rail Joint Undertaking under the European Union’s Horizon 2020 research and innovation programme under grant agreement No 730569.

D7.1 GA 730569 Open data: a review of the state-of-the-art

Details of contribution

Author(s)

DEUTSCHES ZENTRUM FUER LUFT- UND RAUMFAHRT EV (DLR)
Elmar Brockfeld, Jörn Christoffer Groos, Rüdiger Ebendt, Christian Rahmig, Lucas Schubert, Michael Scholz
- Deliverable coordination
- Main contents in chapters 1, 2, 3, 5, 7.1, 8.1.2, 8.5, 9, 10, 11
- Discussions about document structure and contents
- Complete document review

Contributor(s)

ANSALDO STS S.p.A. (ASTS)
Fabrizio Cosso, Matteo Pinasco
- Main contributions chapters 4, 6.5, 7.2
- Discussions about document structure and contents
- Complete document review

NETWORK RAIL INFRASTRUCTURE LIMITED (NR)
Caroline Lowe
- Main contributions to chapter 8.1
- Discussions about document structure and contents
- Complete document review

BOMBARDIER TRANSPORTATION SWEDEN AB (BT)
Zbigniew Dyksy, Mikael Danielsson, Martin Karlsson
- Main contributions to chapters 8.1.3 – 8.1.6
- Discussions about document structure and contents
- Complete document review

SIEMENS AKTIENGESELLSCHAFT (SIE)
Sven Adomeit, Andreas Bolm, Frank Aust, Jochen Grühser
- Main contributions to chapters 6.3, 6.4, 8.3
- Discussions about document structure and contents
- Complete document review

THALES GROUND TRANSPORTATION SYSTEMS UK LTD (THA)
David Tickem
- Main contributions to chapters 6.3.2, 7.3
- Discussions about document structure and contents
- Complete document review

IMS-WP7-D7.1-DLR-006-02-I 2 of 133 D7.1 GA 730569 Open data: a review of the state-of-the-art

Kompetenzzentrum - Das Virtuelle Fahrzeug, Forschungsgesellschaft mbH (VIF)
Josef Fuchs, Alexander Meierhofer
- Main contributions to 6.1.1, 6.1.2, 6.1.3, 8.2
- Discussions about document structure and contents
- Complete document review

FCP FRITSCH, CHIARI & PARTNER ZIVILTECHNIKER GMBH (FCP)
Gerald Julian Rajasingam
- Main contributions to chapters 8.1.4, 8.4
- Discussions about document structure and contents
- Complete document review

WIENER LINIEN GMBH & CO KG (WL)
Simon Wallner
- Main contributions to chapter 7.2.2
- Discussions about document structure and contents
- Complete document review

LULEA TEKNISKA UNIVERSITET (LTU)
Mustafa Aljumaili, Matti Rantatalo, Karim Ramin
- Main contributions to chapters 6.1, 6.4, 7.4, 7.5
- Discussions about document structure and contents
- Complete document review


TABLE OF CONTENTS

1 EXECUTIVE SUMMARY
1.1 Acronyms and Abbreviations
2 BACKGROUND
3 OBJECTIVE/AIM
4 SUMMARY OF RELEVANT IN2RAIL RESULTS
4.1 IN2RAIL Description
4.2 IN2RAIL Deliverables
4.3 D9.1 Asset Status Representation [3]
4.3.1 Deliverable content
4.3.2 Deliverable Conclusions
4.4 D8.1 Requirements for the Integration Layer
4.4.1 Deliverable content
4.4.2 Deliverable Conclusions
4.5 D8.5 Requirements for the Generic Application Framework
4.5.1 Deliverable content
4.5.2 Deliverable Conclusions
4.6 Annex to D8.3: Description of the Canonical
4.7 Conclusions
5 ONLINE SURVEY
5.1 Questionnaire
5.2 Feedback
5.2.1 Participants’ domains of service
5.2.2 Extent of use and use cases of Open Data Exchange formats
5.2.3 How generic/specialised are Open Data Exchange formats?
5.2.4 “Best” example of an Open Data Exchange format suitable for one of several sources of information
5.2.5 The extent to which a participant’s company or institution participates in Open Data Exchange initiatives and communities
5.2.6 Optional mindset questions: Open Data Exchange policy and attitude towards Open Data


5.2.7 Strengths and weaknesses of Open Data Exchange formats
5.2.8 Licensing and/or legal issues hampering application of Open Data Exchange formats
5.3 Conclusions
6 OPEN DATA EXCHANGE: TECHNOLOGIES
6.1 Files
6.1.1 General formats
6.1.2 Specific formats
6.2 Modeling Languages and Tools
6.3 Communication protocols
6.3.1 OPC UA
6.3.2 Queue/topic based messaging systems
6.4 Web services /
6.4.1 Web Services
6.4.2 Web APIs
6.4.3 Comparison Web Services vs. Web APIs
6.5 In memory data grid technologies
6.5.1 In-Memory Data Grid Overview
6.5.2 Infinispan
6.5.3 Redis
7 OPEN DATA EXCHANGE: APPLICATIONS
7.1 Geodata (DLR, LTU)
7.1.1 Vector Data
7.1.2 Raster Data
7.1.3 Geo Web Services
7.1.4 Open Street Map, Open Railway Map
7.2 Sensor / Measurement data
7.2.1 sensorML
7.2.2 LAS file format
7.3 Maintenance
7.3.1 Building Information Modeling – BIM
7.3.2 Maintenance Management
7.3.3 Asset Condition


7.3.4 Alarms Systems
7.4 Process Mining / business process analytics
7.5 Business Process Data Exchange Standards
7.5.1 Business Process Definition Metamodel (BPDM)
7.5.2 XML Process Definition Language (XPDL)
7.5.3 B2B Information Exchange Standards
7.6 Strengths and weaknesses of interchange standards
8 OPEN DATA EXCHANGE: USAGE IN DOMAINS AND RELEVANT COMMUNITIES
8.1 Railway
8.1.1 Open data provided by European infrastructure Managers
8.1.2 railML®
8.1.3 TAF/TAP TSI
8.1.4 UIC 407-1
8.1.5 RINF – Register of Infrastructure
8.1.6 EULYNX
8.2 Automotive
8.2.1 Automotive centered formats
8.3 Industry and Home Automation
8.3.1 Industry
8.3.2 Home Automation
8.3.3 OPC Foundation: The Interoperability Standard for Industrial Automation
8.4 Civil engineering / construction
8.5 Traffic Management
8.5.1 OpenLR (Location Referencing)
8.5.2 Transport Protocol Experts Group TPEG
8.5.3 Traffic Message Channel (TMC)
8.5.4 DATEX II
8.5.5 General Transit Feed Specification (GTFS)
8.5.6 The TRIAS interface
8.5.7 Mobilitätsdatenmarktplatz (MDM) and mCLOUD
9 REFERENCED DOCUMENTS
10 APPENDIX A-QUESTIONNAIRE
10.1 Appendix A-questionnaire: participants’ domains of service


10.2 Appendix A-questionnaire: extent of use and use cases of open data exchange formats
10.2.1 Appendix A-questionnaire: extent of use and use cases of railway formats
10.2.2 Appendix A-questionnaire: extent of use and use cases of maintenance formats
10.2.3 Appendix A-questionnaire: extent of use and use cases of other formats
10.3 Appendix A-questionnaire: how generic/specialised are open data exchange formats?
10.3.1 Appendix A-questionnaire: how generic/specialised are railway formats?
10.3.2 Appendix A-questionnaire: how generic/specialised are maintenance formats?
10.3.3 Appendix A-questionnaire: how generic/specialised are other formats?
10.4 Appendix A-questionnaire: “best” example of an open data exchange format suitable for one of several sources of information
10.5 Appendix A-questionnaire: Optional mindset questions: Open Data Exchange policy and attitude towards Open Data
10.6 Appendix A-questionnaire: strengths and weaknesses of open data exchange formats
10.7 Appendix A-questionnaire: licensing and/or legal issues hampering application of open data exchange formats


TABLE OF FIGURES

Figure 1: Generic architecture overview
Figure 2: Domains of the participating companies or institutions
Figure 3: How generic/specialised are railway formats on average?
Figure 4: Tag cloud of "best" formats and the respective criteria
Figure 5: Extent of participation in Open Data Exchange initiatives and communities
Figure 6: Object representation in JSON
Figure 7: Array representation in JSON
Figure 8: Value representation in JSON
Figure 9: String representation in JSON
Figure 10: Number representation in JSON
Figure 11: OPC UA Concepts
Figure 12: Typical Messaging System - Logical Model
Figure 13: ISO Standards related to BIM
Figure 14: Effectiveness of asset maintenance methodologies on
Figure 15: MIMOSA – Open Asset Information Model
Figure 16: MIMOSA Open System Architecture for Enterprise Application Integration (OSA-EAI)
Figure 17: OSA-CBM functional blocks
Figure 18: IEC 62682 Alarm State Model
Figure 19: Principle of common interface for TAF/TAP TSIs
Figure 20: Common interface for TAF TSI
Figure 21: Principle of common interface for RINF
Figure 22: EULYNX System architecture
Figure 23: Tag cloud of use cases/comments for railway formats
Figure 24: Tag cloud of use cases/comments for maintenance formats
Figure 25: Tag cloud of use cases/comments for other formats
Figure 26: How generic/specialised are maintenance formats?
Figure 27: Top 15 generic other formats
Figure 28: Top 15 specialised other formats
Figure 29: How generic/specialised are other formats on average?
Figure 30: Policy of the company/institution
Figure 31: Attitude of the company/institution: Mean attitude
Figure 32: Tag cloud of strengths and weaknesses of Open Data exchange formats
Figure 33: Tag cloud for licensing and/or legal issues hampering application of Open Data Exchange formats


TABLE OF TABLES

Table 1: IN2RAIL deliverables
Table 2: Extent of use of railway formats (frequencies of answers)
Table 3: OPC UA Standards
Table 4: Vendor Messaging system protocols
Table 5: Vendor Messaging system: Message Exchange Patterns Support
Table 6: Comparison Web Services vs. Web APIs [29]
Table 7: Data feeds in France
Table 8: Data feeds in Germany
Table 9: Data feeds in Switzerland
Table 10: Data feeds in the United Kingdom
Table 11: railML® versions [77]
Table 12: Primary purpose of RINF
Table 13: Use cases/comments for railway formats
Table 14: Extent of use of maintenance formats (frequencies of answers)
Table 15: Use cases/comments for maintenance formats
Table 16: Extent of use of other formats (frequencies of answers)
Table 17: Use cases/comments for other formats
Table 18: "Best" formats for several information sources with the respective criteria
Table 19: Strengths and weaknesses of Open Data exchange formats
Table 20: Licensing and/or legal issues hampering application of Open Data Exchange formats


1 EXECUTIVE SUMMARY

This deliverable provides a current, focused and brief spotlight on today’s most relevant formats and technologies. It can be no more than a snapshot, because IT and data technologies evolve quickly and all mature industry sectors are concurrently investing significant effort in digitalization and automation, which leads to rapid developments. The document serves as a working resource for the data standardization work in task 7.2. D7.1 furthermore extends the list of standards suitable for data representation beyond the work done in the lighthouse project IN2RAIL. These newly identified standards should be considered for integration with the data representation being defined in IN2RAIL, which focuses on TMS, in order to cover maintenance requirements and needs.

The state-of-the-art analysis presented here cannot yet provide a short-list to select from. Without forestalling the WP7.2 results, it seems that not a single standard data format but a combination of several might be recommendable. Detailing the final requirements and implementing the prototype in task 7.2 will be necessary to finally select a suitable combination of open data exchange formats. It is foreseen that, in addition to using cross-domain formats and technologies that are as flexible as possible, the ongoing IN2RAIL activities in communities relevant for railways (e.g. railML) should also be followed during the next steps in IN2SMART. Because data exchange is developing rapidly in all industry domains, available and emerging technologies have to be carefully reviewed and monitored throughout the project.

Section 4 summarizes the state of the IN2RAIL results that are relevant for IN2SMART work package 7. To broaden the basis for the state of the art, a questionnaire was designed and conducted as an online survey, “Open Data Exchange formats”, together with all project partners from March to May 2017. Section 5 presents the survey results to give first impressions of relevant formats and opinions; detailed information can be found in section 10, “Appendix A-Questionnaire”. Section 6 lists and describes the generally relevant technologies, ranging from basic file formats through modeling languages and tools, communication protocols and Web Services to special in-memory data grid technologies. Section 7 describes applications and application areas, giving insight into more specialized use in the fields of geodata, sensor and measurement data, maintenance, business process analytics and business data exchange standards. Section 8 gives insight into more domain-specific concepts and assesses the most important ones and the main trends, which are partly driven by the relevant initiatives and communities in the respective domains; these are mentioned as well, together with central places to turn to.


1.1 ACRONYMS AND ABBREVIATIONS

The following tables provide definitions for acronyms and abbreviations and for terms used in this document.

AD: Active Directory
ANSI: American National Standards Institute
AP: Access Point
API: Application programming interface
ASCII: American Standard Code for Information Interchange
ASPRS: American Society for Photogrammetry and Remote Sensing
AWS: Automatic Warning System
BIM: Building Information Modeling
BP: (“Betriebsprotokoll”) operations log
BMWi: German Federal Ministry for Economic Affairs and Energy
BMVi: German Federal Ministry for Transport and Digital Infrastructure
BPM: Business Process Management
BPML: Business Process Modeling Language
BPMI: Business Process Management Initiative
BPMS: Business Process Management System
B2B: Business to Business
C: “C” programming language
C++: “C++” programming language
CAD: Computer-Aided Design
CC: Control Component of ACC (Active Cruise Control)
CDF: Common Data Format
CDM: Canonical Data Model
CIF: The Common Interface File (CIF) format is the industry standard for electronic transfer of schedules from the Integrated Train Planning System (ITPS) to downstream operational and information systems.
CM: Configuration Management
CM: Counting Monitoring error


CM: Coupling Mode
COM: Communication
COM: Component Object Model
CPO: Code of PLM Openness
COTS: Commercial off-the-shelf
CSV: Comma-Separated Values
DB: Data Bus signal
EB: Emergency Brake
EC: Element Controller
EC: European Community
EC: Evaluation Computer
EN: European standard
ERA: European Railway Agency
ERTMS: European rail traffic management system
EV: (“Endverbinder”) terminal bond
F: Fail-safe
FIFO: First In, First Out
FM: Function Module
FS: Full Supervision
FTP: File Transfer Protocol
GIS: Geographic Information System
GML: Geography Markup Language
GPS: Global Positioning System
GTFS: General Transit Feed Specification
GWT: Google Web Toolkit
HDF: Hierarchical Data Format
HTML: Hypertext Markup Language
HTTP: Hypertext Transfer Protocol
HVAC: Heating, Ventilation and Air-Conditioning


I/O: Input/Output
ID: Identifier
IDMVU: (“Infrastruktur-Daten-Management für Verkehrsunternehmen”) infrastructure data management for transportation companies
IDS: Intrusion detection system
IEC: International Electrotechnical Commission
IEEE: Institute of Electrical and Electronics Engineers
IFC: InterFace Connection
IL: Integration Layer
INSPIRE: Infrastructure for Spatial Information in Europe
IM: Interface Module
IMDG: In-Memory Data Grid
IP: Ingress Protection (class)
IP: Internet Protocol (RFC791)
IS: Information Security
IS: Isolation Mode
ISO: International Organization for Standardization
IT: Information Technology
JMS: Java Messaging Service
JSON: JavaScript Object Notation
KM: (“Kilometrierung”) mileage, kilometrage
LCC: Life-Cycle Costs
LDP: Linked Data Platform
MDM: (“Mobilitäts-Daten-Marktplatz”) Mobility data market place
ML: Delete Reminder Note
MP: MegaPixel
MS: Mini-main Signal
MS: Modular Standard
MS
N: Neutral conductor


NetCDF: Network Common Data Format
NR: Noise speed Reduction
NR: Not Responsible mode
NURBS: Non-Uniform Rational B-Spline
O&M: Operation and Maintenance
O&M: Operation and Monitoring logic
ÖBB: Austrian Federal Railways
OGC: Open Geospatial Consortium
OMG: Object Management Group
OOP: Object-Oriented Programming
OpenLR: Open Location Referencing
openCRG: Open Curved Regular Grid
OPC UA: OPC Unified Architecture
OS: On-Sight (mode)
OS: Operating System
OSA: Operating System Adaptor
OSA-EAI: Open System Architecture for Enterprise Application Integration
OSM: Open Street Map
OSLC: Open Service for Lifecycle Collaboration
PA: Passenger Announcement
PA: Possession Area
PA: Proceed Authority
PC: Personal Computer
PDF: Portable Document Format
PDM: Product Data Management
PLM: Product Lifecycle Management
PMI: Product and Manufacturing Information
PNG: Portable Network Graphics
POI: Points Of Interest


QoS: Quality of Service
railML: Rail Markup Language
RAM: Random Access Memory
RAM: Reliability, Availability, Maintainability
RAMS: Reliability, Availability, Maintainability and Safety
RBAC: Role Based Access Control
RDF: Resource Description Framework
REST: Representational State Transfer
RIF: Requirements Interchange Format
RINF: Register of infrastructure model
RF: Radio Frequency
RST: Reset button
RST: Rolling STock
RTM: RailTopoModel
RU: Regional Unit
SCADA: Supervisory Control and Data Acquisition
SD: SecureDigital Memory (card)
sensorML: Sensor Markup Language
SGML: Standard Generalized Markup Language
SIG: Signal information
SIG: “Signaling equipment supplier”; “signaling and safety systems”
SIL: Safety Integrity Level
SIL: Siemens Language
SMTP: Simple Mail Transfer Protocol
SNMP: Simple Network Management Protocol
SOA: Service-Oriented Architecture
SOAP: Simple Object Access Protocol
SOS: Sensor Observation Service
SPS: Sensor Planning Service


SQL: Structured Query Language
SRS: Speed Restriction Section
SRS: System Requirements Specification
SSN: Semantic Sensor Network
STS: Security Translator System
SWE: Sensor Web Enablement
SWEET: Semantic Web for Earth and Environmental Terminology
SysML: Systems Modeling Language
S&C: Switches and Crossings
TAF: Track Ahead Free
TC: Track Circuit
TC: Train Consist
TCP: Transmission Control Protocol
TCP/IP: Transmission Control Protocol/Internet Protocol
TD: Maximum permissible data transmission duration
TMC: Traffic Message Channel
TMS: Traffic Management System (same as OCS)
TMS: Train Management System
TPEG: Transport Protocol Experts Group
TR: Technical Report
TSI: Technical Specifications for Interoperability
TSR: Temporary Speed Restriction
UIC: Union internationale des chemins de fer (international union of railways)
UML: Unified Modelling Language
UN: Non-provided mode
UN: Unfitted
URL: Uniform Resource Locator
V: Voltage
VB: Visual Basic


VDV: Association of German Transport Undertakings
W3C: World Wide Web Consortium
WCF: Windows Communication Foundation
WCS: Web Coverage Service
WFS: Web Feature Service
WFMC: Workflow Management Coalition
WFMS: Workflow Management System
WP: Work Package
WMS: Web Mapping Service
WS: Web Service
WSDL: Web Service Description Language
XML: eXtensible Markup Language


2 BACKGROUND

This deliverable summarizes the state-of-the-art information gathered within task 7.1 of work package 7 as background for the subsequent task 7.2. Its main purpose is to provide a source of information for the upcoming work within work package 7 on defining standards for data formats and data exchange throughout IN2SMART and within the Shift2Rail ecosystem. The document can only provide a current, focused and brief spotlight on today’s most relevant formats and technologies, because IT and data technologies evolve quickly. Furthermore, all mature industry sectors are concurrently investing significant effort in digitalization and automation, which leads to rapid developments. Figure 1 illustrates a preliminary generic architecture for seamless diagnostic data gathering from multiple signalling and telecom systems, each characterized by a proprietary interface. The objective of WP7 is to develop a guideline for Open Standard Interfaces for maintenance data, including models and data exchange. These interfaces sit in the middle of the architecture, at the level labelled “proxy level” in the figure.

Figure 1: Generic architecture overview.
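
The role of the proxy level in Figure 1 can be illustrated with a minimal Python sketch. It assumes two hypothetical proprietary payload shapes (`vendor_a`, `vendor_b`) and a hypothetical common record type `DiagnosticRecord`; none of these names, fields or payload layouts come from IN2SMART or from any real signalling or telecom product. The sketch only shows the adapter idea: one small translation function per proprietary interface, and a single open representation above it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class DiagnosticRecord:
    """Common representation used above the proxy level (hypothetical fields)."""
    source_system: str
    asset_id: str
    parameter: str
    value: float


def parse_vendor_a(raw: str) -> DiagnosticRecord:
    # Assumed format for illustration: "asset;parameter;value".
    asset_id, parameter, value = raw.split(";")
    return DiagnosticRecord("vendor_a", asset_id, parameter, float(value))


def parse_vendor_b(raw: dict) -> DiagnosticRecord:
    # Assumed format for illustration: a dict with vendor-specific key names.
    return DiagnosticRecord("vendor_b", raw["obj"], raw["key"], float(raw["val"]))


class Proxy:
    """Routes each raw message to the adapter registered for its source system."""

    def __init__(self) -> None:
        self._adapters: Dict[str, Callable[..., DiagnosticRecord]] = {}

    def register(self, system: str, adapter: Callable[..., DiagnosticRecord]) -> None:
        self._adapters[system] = adapter

    def collect(self, messages: Iterable[Tuple[str, object]]) -> List[DiagnosticRecord]:
        return [self._adapters[system](raw) for system, raw in messages]


proxy = Proxy()
proxy.register("vendor_a", parse_vendor_a)
proxy.register("vendor_b", parse_vendor_b)
records = proxy.collect([
    ("vendor_a", "switch-12;motor_current;3.4"),
    ("vendor_b", {"obj": "lx-7", "key": "barrier_time", "val": "8.1"}),
])
```

With this structure, supporting a further proprietary interface means registering one more adapter; consumers of `records` do not change.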


3 OBJECTIVE/AIM

The aim of D7.1 is to analyse the state of the art in open data exchange formats in order to provide a comprehensive overview as a resource for the data standardization work in IN2SMART task 7.2. The analysis covers not only the railway domain but also other domains, to identify applications, ideas and concepts that can be used to fulfil requirements in the IN2SMART project. The main purpose is to support the use of already established and emerging data exchange standards in railway asset management, so that sustainable solutions can be implemented efficiently instead of creating costly, isolated and railway-specific ones. The other WPs are also involved in defining the needs and requirements for task 7.2. The focus of D7.1 lies on the data exchange formats themselves; the evaluation of data exchange formats in combination with big data approaches such as data ingestion will be addressed in task 7.2. This document can only provide a current, focused and brief spotlight on today’s most relevant formats and technologies, because IT and data technologies evolve quickly. Furthermore, all mature industry sectors are concurrently investing significant effort in digitalization and automation, which leads to rapid developments. Therefore, available and emerging technologies have to be carefully reviewed and monitored throughout the project.


4 SUMMARY OF RELEVANT IN2RAIL RESULTS

4.1 IN2RAIL DESCRIPTION

The IN2RAIL project [1] is one of the “lighthouse” projects of Shift2Rail and contributes to Innovation Programmes 2 “Advanced Traffic Management and Control Systems” and 3 “Cost-Efficient and Reliable High-Capacity Infrastructure”.

IN2RAIL aims to set the foundation for a resilient, cost-efficient, high-capacity and digitalised European rail network and to make advances towards the Shift2Rail objectives:
- enhancing the existing capacity, fulfilling user demand of the European rail system,
- increasing the reliability, delivering better and consistent quality of service of the European rail system,
- reducing the Life Cycle Cost (LCC), increasing competitiveness of the European rail system and European rail supply industry.

IN2RAIL has been organized into three technical sub-projects:

1. Smart Infrastructure
Smart Infrastructure adopts a whole-system approach which addresses the fundamental design of critical infrastructure assets: switches and crossings (S&C) and the track system. It will research infrastructure components capable of meeting future demands and will use modern development technologies such as rapid prototyping and integrated virtual testing in the process. Risk- and condition-based LEAN approaches to optimise RAMS and lifecycle costs in asset maintenance activities will be created to tackle the root causes of degradation and to target known problem areas.

2. Intelligent Mobility Management sub-project (I2M)
I2M researches advanced traffic management systems that are automated, interoperable and inter-connected, as well as scalable and upgradable. Using standardised products and interfaces enables easy migration from legacy systems. The research targets the wealth of available data and transforms it into harmonised, usable information to improve and fully exploit network capacity. Currently the data is distributed over a wide range of information systems of differing standards. A standard ICT environment supporting transport operations with standard interfaces and protocols will be developed, enabling an open, integrated Traffic Management System (TMS). Advances will be made to the state of the art of asset information management systems, adding the capability of ‘nowcasting’ and forecasting of critical asset status.

3. Rail Power Supply and Energy Management
The Rail Power Supply and Energy Management sub-project provides solutions to improve the energy performance of the railway system. The research focuses on new power systems characterised by reduced losses and capable of balancing energy demands, along with innovative energy management systems that enable accurate and precise estimates of energy flows within the railway. This should result in reduced energy consumption and costs, optimised asset management and better use of the railway capacity.

IN2SMART WP7 “DRIMS Open Standard Interfaces” topics are mainly addressed by the I2M sub-project within the following work packages:
- IN2RAIL WP8 – Intelligent Mobility Management (I2M) – Integration Layer: WP8 addresses and develops a standardised integrated ICT environment capable of supporting diverse TMS dispatching services and operational systems. It includes standard interfaces to external systems outside TMS/dispatching (for other railway management systems and transport modes) with a plug-and-play framework for TMS/dispatching applications.
- IN2RAIL WP9 – Intelligent Mobility Management (I2M) – ‘Nowcasting’ and Forecasting: WP9 focuses on the design and development of an advanced asset information system with the ability to ‘nowcast’ and forecast network asset status with the associated probabilities. This should allow TMS/dispatching systems to seamlessly access heterogeneous data sources. WP9 bases its work on the findings of WP7 and complements the standardised integrated ICT environment of WP8.

4.2 IN2RAIL DELIVERABLES

The following table lists all deliverables that should be considered for the review of the state-of-the-art [2].

Table 1: IN2RAIL deliverables
Number | Title | Dissemination Level | Due date (project months) | Delivered
D8.1 | Requirements for the Integration Layer | Public | 18 | Y
D8.2 | Requirements for Interfaces | Public | 27 | N
D8.3 | Description of Integration Layer and Constituents | Public | 36 | N
D8.4 | Interface Control Document for Integration Layer Interfaces, external/Web interfaces and Dynamic Demand Service | Public | 36 | N
D8.5 | Requirements for the Generic Application Framework | Public | 15 | Y
D8.6 | Description of the Generic Application Framework and its constituents | Public | 27 | N
D8.7 | Interface Control Document (ICD) for Application-specific Interfaces | Public | 27 | N
D8.8 | Integration Test Plan for Application Framework and Constituents | Public | 36 | N
D9.1 | Asset status representation | Public | 18 | Y

The IN2RAIL project runs for 36 months, starting on 01/05/2015. The delivery date of IN2SMART deliverable D7.1 therefore corresponds to IN2RAIL M27. Consequently, not all IN2RAIL deliverables will produce their final results in time to be evaluated in this review of the state of the art.

4.3 D9.1 ASSET STATUS REPRESENTATION [3]

4.3.1 Deliverable content

The “Asset Status Representation” document aims to describe a data representation for the status of assets within the railway infrastructure. The logical steps followed by the document are:
- Identification of the attributes needed to represent the operational status of a set of nine railway assets relevant to the TMS (defined in other work packages within the IN2RAIL project). The considered assets are:
  o Switch
  o Crossing
  o Track (Rail)
  o Catenary
  o Bridge
  o Tunnel
  o Embankments
  o Line sections
  o
  Each of them has been described by dividing its attributes into:
  o Static data: related to static characteristics of the asset under examination, with values that never change or change infrequently,
  o Dynamic data: with values that change frequently and are related to the operational state; they are further classified as Internal, Asset-related, External, Diagnostic and Maintenance.
- A review of existing modelling approaches to the problem area, and production of recommendations for modelling the assets described in the previous step. Different models have been considered for both static and dynamic attributes: railML, railML2, RailTopoModel/railML3, the Register of Infrastructure (RINF) model, Infrastructure for Spatial Information in Europe (INSPIRE), the Open Geospatial Consortium’s (OGC) Sensor Web Enablement (SWE) framework, Semantic Web for Earth and Environmental Terminology (SWEET) and Semantic Sensor Network (SSN).
- Production of proof-of-concept examples illustrating the use of the proposed approach for some of the identified assets: level crossing and switch.

4.3.2 Deliverable Conclusions

None of the reviewed existing models was able to adequately represent both static and dynamic attributes on its own. For this reason a hybrid approach has been proposed:
- railML has been proposed to describe static elements
- OGC/sensorML has been proposed to describe dynamic elements

The proof-of-concept examples mentioned in the previous paragraph have been produced using the combined railML/sensorML approach.
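The split between static and dynamic attributes described in 4.3.1 can be pictured with a minimal Python sketch. It is an illustration only: it uses neither railML nor SensorML, and all class, field and attribute names (`AssetStatus`, `DynamicValue`, `heater_current_A`, and so on) are hypothetical. Only the five dynamic categories are taken from the text above.

```python
from dataclasses import dataclass, field
from enum import Enum


class DynamicCategory(Enum):
    """The five classes of dynamic data named in D9.1."""
    INTERNAL = "Internal"
    ASSET_RELATED = "Asset-related"
    EXTERNAL = "External"
    DIAGNOSTIC = "Diagnostic"
    MAINTENANCE = "Maintenance"


@dataclass
class DynamicValue:
    """One frequently changing value (hypothetical structure)."""
    name: str
    category: DynamicCategory
    value: object


@dataclass
class AssetStatus:
    """Static attributes kept apart from dynamic ones (hypothetical structure)."""
    asset_id: str
    static: dict                      # values that never or rarely change
    dynamic: list = field(default_factory=list)

    def by_category(self, category):
        """Return all dynamic values belonging to one category."""
        return [d for d in self.dynamic if d.category is category]


# Invented example data for a switch.
switch = AssetStatus(asset_id="SW-001", static={"type": "switch", "km": 12.4})
switch.dynamic.append(
    DynamicValue("heater_current_A", DynamicCategory.DIAGNOSTIC, 3.2))
switch.dynamic.append(
    DynamicValue("last_tamping", DynamicCategory.MAINTENANCE, "2017-03-01"))
```

In the hybrid approach proposed by D9.1, the `static` part would correspond to what is described in railML and the `dynamic` part to what is described in sensorML.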

4.4 D8.1 REQUIREMENTS FOR THE INTEGRATION LAYER

4.4.1 Deliverable content

D8.1 summarises the work done in the first part of Task 8.1, Integration Layer (IL), to produce a system requirements specification (SRS) for a standardised information exchange layer to be provided to the TMS and external systems.

Regarding the purpose of the IL, WP8 focused mainly on:
- providing communication based on a standardised data model between railway services, applications, and interface plug-ins communicating with external systems,
- providing a standard communication medium between the business applications (i.e. TMS applications) running in the context of the Generic Application Framework (D8.5 scope).

Among others, D8.1 introduces the concepts of:
- Canonical Data Model: it contains the data types for the data exchanged between the TMS and external systems, as well as the definitions of the relations between them
- Information Item: a unit of information exchanged within the TMS (i.e. between TMS business applications/services) or between the TMS and the external systems. An Information Item has the following properties: it may be structured data, and it is atomic (i.e. it cannot be reduced into fields without loss of meaning)


The requirement list has been organized into the following categories:
- Communication
- Messaging
- Topic Tagging
- Security and Accounting
- Availability and QoS
- IT Management
- Data Access Patterns
- Implementation of IL
- Compliance with existing standards

4.4.2 Deliverable Conclusions

The D8.1 output is a requirements list that should be used as a reference document for all following deliverables related to Integration Layer design.

4.5 D8.5 REQUIREMENTS FOR THE GENERIC APPLICATION FRAMEWORK

4.5.1 Deliverable content

D8.5 summarises the work done in the first part of Task 8.2, Generic Framework for Application, to produce a system requirements specification (SRS) for a standardised generic application framework allowing plug-and-play of service application modules.

The Generic Application Framework, which is the subject of IN2RAIL Task 8.2, comprises TMS core applications managing highly dynamic service-related processes, associated communications and the system services required to enable plug-and-play functionality.

The long term objective is to provide a standardised integrated ICT environment supporting diverse TMS applications that are connected to other multimodal operational systems.

The standardisation includes the specification of the interfaces to external systems and of the plug-and-play mechanisms for the TMS applications inside the Application Framework.

The requirement list has been organized into the following categories:
- Communication
- Availability, Performance
- Data Management system
- Security of information system
- Requirements on Data model
- Safety Integrity Level (SIL) Requirements
- Start-up, Shut-down
- Synchronization/Time Management
- Directory (naming service, identifying services)
- Requirements related to API-Type
- Requirements related to operating environment
- Monitoring/profiling the Applications
- Scalability
- Scheduling
- Transactions
- Logging and traceability
- Alarm and Events
- Workflow, module orchestration
- Backwards/version compatibility
- Applicable standards
- Diagnostics and System Maintenance
- Manuals and documentation

4.5.2 Deliverable Conclusions

The D8.5 output is a requirements list that should be used as a reference document for all following deliverables related to Generic Application Framework design.

4.6 ANNEX TO D8.3: DESCRIPTION OF THE CANONICAL DATA MODEL

The requirement analysis of D8.1 and D8.5 identified the concept of a Canonical Data Model (CDM) as one of the patterns to be used within the framework for message data modelling. A Canonical Data Model defines message formats that are independent of any specific application, so that all applications can communicate with each other in this common format.

Since no specific deliverable for this important topic has been foreseen within the project, a dedicated Annex to deliverable D8.3 “Description of Integration Layer and Constituents” will be prepared. The annex will describe the data model to be used within IN2RAIL, taking into account existing data models and analysing their characteristics. The current plan is to base the data model on railML 3. A collaboration with railML.org has been established in order to expand the scope of railML to cover the needs of the TMS, in particular real-time data. There is a major risk in that railML 3 is not yet released, so the choice to base the CDM on railML 3 could be re-evaluated.
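The Canonical Data Model pattern described in this section can be illustrated with a small Python sketch. It is not the IN2RAIL CDM and has no relation to railML 3: the message class, the two source formats and all field names are hypothetical, chosen only to show how application-specific messages are translated into one common format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalAssetStatus:
    """Common message format shared by all applications (hypothetical)."""
    asset_id: str
    asset_type: str
    status: str


def from_system_a(msg: dict) -> CanonicalAssetStatus:
    """Adapter for a hypothetical system A that sends dictionaries."""
    return CanonicalAssetStatus(
        asset_id=msg["id"],
        asset_type=msg["kind"],
        status=msg["state"].lower(),
    )


def from_system_b(msg: str) -> CanonicalAssetStatus:
    """Adapter for a hypothetical system B that sends 'type;id;status' strings."""
    asset_type, asset_id, status = msg.split(";")
    return CanonicalAssetStatus(asset_id, asset_type, status.lower())


# Both systems describe the same switch in their own format ...
a = from_system_a({"id": "SW-001", "kind": "switch", "state": "OK"})
b = from_system_b("switch;SW-001;ok")
# ... and both end up as the same canonical message.
```

The design benefit is that each application needs only one adapter to and from the common format, rather than one translator per communication partner.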

4.7 CONCLUSIONS

At the time of writing, the above-mentioned deliverables represent the work related to data management/data exchange that should be available before the due date of IN2SMART D7.1.


IN2RAIL D9.1 should be taken into account for data modelling and data format analysis, since it analyses a subset of existing standards and gives a reasoned choice of the best-fitting ones.

The CDM document, previously planned as an Annex to D8.6 and now to be refined as an Annex to D8.3, will not be delivered with D8.6 and thus will not be available before the due date of this deliverable. Nevertheless, the ongoing work on the IN2RAIL WP8 Canonical Data Model document could be taken into account as an emerging data model.

The D9.1 approach is to provide a subset of relevant assets and generate some examples of how they could be mapped into selected standards. The Annex will concentrate on providing a generic data model.

IN2RAIL D8.1 and D8.5, the requirements for the Integration Layer and the Generic Application Framework, are less oriented towards data standardisation, but they provide requirements, e.g. for communication patterns, that should be considered when analysing existing communication protocols to be used for data exchange.

All considerations and conclusions of IN2RAIL focus on a TMS application. This must be kept in mind, since the scope of IN2SMART is Intelligent Asset Management, for which the requirements may differ.


5 ONLINE SURVEY

5.1 QUESTIONNAIRE

From March 20, 2017 to May 2, 2017, experts were invited to take part in an online questionnaire on Open Data Exchange formats. The purpose of the questionnaire was to collect information
- about the use of Open Data Exchange formats in the participating companies or institutions,
- about the extent of their participation in the respective Open Data Exchange format communities,
- about their general mindset and attitude towards Open Data Exchange formats, and
- about other aspects of formats for Open Data Exchange, e.g. how generic or specialised they are perceived to be by their users.

More details can be found in Appendix A.

5.2 FEEDBACK

A summary of the obtained feedback follows; details can be found in Appendix A.

5.2.1 Participants’ domains of service

As shown in Figure 2, most participants came from Railway, Automotive, and Traffic Management (each with about the same fraction of the sample size).

[Pie chart data: Railway 8; Traffic management 7; Automotive 7; Industry automation 3; Building monitoring 1; Energy 1; Tunnel 1]

Figure 2: Domains of the participating companies or institutions


5.2.2 Extent of use and use cases of Open Data Exchange formats

Railway formats: Table 2 shows the extent of use of 19 well-known railway formats, given as frequencies of answers in the respective categories (the darker the green cell colour, the more frequently the respective category has been chosen as answer). IDMVU, LandXML, and the rail track database from ÖBB, an Austrian mobility services provider (“ÖBB Gleisdatenbank”), are all formats used within the context of the traffic planning software PROVI. RTM (RailTopoModel), railML v 3 and RINF are all tested in pilots, whereas TAF TSI is already in use for Train Composition.

Figure 23 in Appendix 10.2.1 depicts a tag cloud (generated with a tool such as Wordle [4]) of use cases/comments for railway formats given by the participants. According to Wikipedia, a tag cloud is “[…] a visual representation of text data […]. The importance of each tag is shown with font size or colour. This format is useful for quickly perceiving the most prominent terms and for locating a term alphabetically to determine its relative prominence”. Table 13 in Appendix 10.2.1 gives their answers in full detail.

Maintenance formats: According to the feedback, the alarm management standards used in industrial asset management, EN 62682:2015 and EEMUA 191, while quite new for rail markets, are well established in plant, manufacturing and materials processing markets. MIMOSA OSA-CBM is used for rail remote condition monitoring of infrastructure assets, for data acquisition, manipulation and state detection. Newer uses are in preparation for health assessment and for prognostic assessment. SensorML is currently tested in pilots.

Table 2: Extent of use of railway formats (frequencies of answers)

Format                                  of interest   considered,   in            in
                                        in future     not used      preparation   operation
RINF                                    1             2             2             2
TAF TSI                                 3             2             0             2
UIC 407-1                               0             0             1             1
LandXML                                 0             0             0             1
CIVIL3D                                 0             0             0             1
csv                                     0             0             0             1
Signalling Data Exchange Format (SDEF)  0             0             0             1
DATEX II                                0             0             0             1
OSLC                                    0             0             0             1
FMI                                     0             0             0             1
OPC-UA                                  0             0             0             1
MQTT                                    0             0             0             1
ICE870-5                                0             0             0             1
RTM / railML v 3                        3             2             7             0
TAP TSI                                 2             1             3             0
ÖBB Gleisdatenbank                      0             1             0             0
FRAME                                   0             1             0             0
OpenScenario                            0             0             1             0
OpenSimulation                          0             1             0             0

Table 14 in Appendix 10.2.2 shows the extent of use of 8 well-known maintenance formats, given as frequencies of answers in the respective categories (the darker the green cell colour, the more frequently the respective category has been chosen as answer). Figure 24 in Appendix 10.2.2 depicts a tag cloud of use cases/comments for maintenance formats given by the participants. Table 15 in Appendix 10.2.2 gives their answers in full detail.

Other formats: According to the feedback, the formats ASCII-GRID, Simple Feature Access, GeoPackage, Esri Shape and RoadXML are used for Noise Maps and Road Noise Maps; LandXML, CityGML, OSM, and Esri Shape are used in conjunction with software for traffic or infrastructure planning / Building Information Modeling (BIM) such as PROVI, InfraWorks, and Civil 3D; formats such as UML, GML, CityGML, and OLE/COM (OpenGIS) play a major role in tasks like import, export and modelling; OSM and OpenWeather-Maps find application in web services; GeoTIFF and GML in JPEG 2000 are used for overlays. Moreover, SNMP is used for Standard Server Monitoring and Product Monitoring.

Table 16 in Appendix 10.2.3 shows the extent of use of 96 other (miscellaneous) formats, given as frequencies of answers in the respective categories (the darker the green cell colour, the more frequently the respective category has been chosen as answer). Figure 25 in Appendix 10.2.3 depicts a tag cloud of use cases/comments for the other formats given by the participants. Table 17 in Appendix 10.2.3 gives their answers in full detail.

5.2.3 How generic/specialised are Open Data Exchange formats?

Railway formats: Figure 3 shows how generic/specialised 14 prominent railway formats were perceived on average over the participating experts. According to the obtained feedback, the ÖBB Gleisdatenbank, the comma-separated values (CSV) file format in its special use case for time tables, and the RTM / railML v 3 formats are perceived as the most specialised formats.

Maintenance formats: According to the obtained feedback, the NR-L2-SIG-30036-Issue1 and the comma-separated values (CSV) file format in its special format/use case for time tables are perceived as the most specialised formats. Figure 26 in Appendix 10.3.2 shows how generic/specialised 8 well-known maintenance formats were perceived on average over the participating experts.


Other formats: OpenLR, OpenSCENARIO, TMS, TPEG, WKT CRS, Coordinate Transformation, KNXbus, RoadXML, ONVIF, OpenWeather-Maps, BACnet, RDS, TMC, DATEX2, and ASCII-GRID have been named as the top 15 most specialised miscellaneous formats. Figure 27 and Figure 28 in Appendix 10.3.3 show the top 15 generic/specialised other formats, averaged over the participating experts.

[Bar chart: how generic/specialised on average, scale 1-5 (0: not answered), for the formats in the order shown: ÖBB Gleisdatenbank, csv, RTM / railML v 3, TAP TSI, RINF, railML, EULYNX, IP-KOM-ÖV, TAF TSI, SDEF - Network Rail, Civil 3D, LandXML, OJP, IDMVU, UIC 407-1]

Figure 3: How generic/specialised are railway formats on average?

5.2.4 “Best” example of an Open Data Exchange format suitable for one of several sources of information

Figure 4 depicts a tag cloud for best formats and the respective criteria given by the answers of the participants. Table 18 in Appendix 10.4 gives their answers in full detail.

In most of the answers, railML has been given as the best format for modelling rail infrastructure and rolling stock, and also for rail operation plans. railML is also the most frequently named best format in total, preferred because it is an open format and because of its large user base.


For asset condition, alarms and events, and asset maintenance, the majority of answers gave SensorML as the best format, e.g. because of a good user experience. For alarms and events, OPC-UA or OPC-AE was named as best format just as often as SensorML. Among other formats like SysML and BPMN, BPML has been named as best format for a business process notation model because it is generic, widely used, and well known.

Figure 4: Tag cloud of "best" formats and the respective criteria

5.2.5 The extent to which a participant’s company or institution participates in Open Data Exchange initiatives and communities

Figure 5 shows the extent to which a participant’s company or institution participates in Open Data Exchange initiatives and communities. According to the feedback, a significant part of the companies or institutions of the participating experts were contributors and/or active developers in OpenDRIVE, OpenLR, railML.org, the RailTopoModel Expert Group, OpenSCENARIO, OpenCRG, SysML, OSLC, and in Road2Simulation.


5.2.6 Optional mindset questions: Open Data Exchange policy and attitude towards Open Data

The feedback for a set of optional general mindset questions is shown in Figure 30 and Figure 31 in Appendix 10.5. For half of the participating companies or institutions, answers were given to the optional set of mindset questions. According to the obtained feedback, all companies or institutions are willing to contribute to Open Data Exchange initiatives/portals. Two out of three companies or institutions define Open Data strategies, and 40% of the companies or institutions even own an Open Data portal.

[Bar chart: frequency of answer (axis from 0 to 8) per initiative or community, in the categories No activities, Following/applying, and Contributing/developing]

Figure 5: Extent of participation in Open Data Exchange initiatives and communities

5.2.7 Strengths and weaknesses of Open Data Exchange formats

General weaknesses have been seen in possible misinterpretations, in potential confusion arising from the fact that there are too many formats and too many solutions in total, and finally in the relatively high risk of misuse of data with universal formats. A strength of railML and RailTopoModel is that they are defined with the involvement of the main European railway actors, and that they are standardised open formats. On the other hand, railML has been criticised because it is not yet complete for all railway assets, because it is “uglily” (nastily) hierarchical and huge, and because tools interpret the data differently (i.e. only a subset of the format is implemented and not all data format versions are supported). Figure 32 in Appendix 10.6 depicts a tag cloud for strengths and weaknesses of Open Data Exchange formats as given by the participants. Table 19 in Appendix 10.6 gives their answers in full detail.


5.2.8 Licensing and/or legal issues hampering application of Open Data Exchange formats

Issues include unclear adoption policies (railML), a confusing tangle of different uses, programs and policies (SHP), the obstacle of horrendous costs for joining the consortium prior to access (NDS), and the general problem that data from business projects usually is not royalty-free and therefore cannot be provided as Open Data. Figure 33 in Appendix 10.7 depicts a tag cloud for licensing and/or legal issues hampering application of Open Data Exchange formats as named and explained by the participants. Table 20 in Appendix gives their answers in full detail.

5.3 CONCLUSIONS

Regarding use cases of Open Data Exchange formats, RTM (RailTopoModel), railML v 3 and RINF are all tested in pilots, whereas TAF TSI is already in use for Train Composition. EN 62682:2015 and EEMUA 191, while quite new for rail markets, are well established in plant, manufacturing and materials processing markets. MIMOSA OSA-CBM is used for rail remote condition monitoring of infrastructure assets, for data acquisition, manipulation and state detection. SensorML is currently tested in pilots.

Regarding the question of how specialised or generic experts perceive Open Data Exchange formats to be, the formats perceived as most specialised are RTM / railML v 3, NR-L2-SIG-30036-Issue1, OpenLR, OpenSCENARIO, TMS, TPEG, WKT CRS, Coordinate Transformation, KNXbus, RoadXML, ONVIF, OpenWeather-Maps, BACnet, RDS, TMC, DATEX2, and ASCII-GRID.

Concerning the question of the best formats for various sources of information, in most of the answers railML has been named as the best format for modelling rail infrastructure and rolling stock, and also for rail operation plans, preferred because it is an open format and because of its large user base. For asset condition, alarms and events, and asset maintenance, the majority of answers gave SensorML as the best format, e.g. because of a good user experience.

Regarding participation in Open Data Exchange initiatives/communities and the optional mindset questions, a significant part of the companies or institutions of the participating experts were contributors and/or active developers in OpenDRIVE, OpenLR, railML.org, the RailTopoModel Expert Group, OpenSCENARIO, OpenCRG, SysML, OSLC, and in Road2Simulation. According to the obtained feedback, all companies or institutions are willing to contribute to Open Data Exchange initiatives/portals. Two out of three companies or institutions define Open Data strategies, and 40% of the companies or institutions even own an Open Data portal.
As for strengths and weaknesses of Open Data Exchange formats, general weaknesses have been seen in possible misinterpretations, in potential confusion arising from the fact that there are too many formats and too many solutions in total, and finally in the relatively high risk of misuse of data with universal formats. A strength of railML and RailTopoModel is that they are defined with the involvement of the main European railway actors, and that they are standardised open formats. On the other hand, railML has been criticised because it is not yet complete for all railway assets, and because it is “uglily” (nastily) hierarchical and huge. In this regard it is worth noting that in the planned version 3 of railML the structure will be simpler and "flatter", and thus less hierarchical.

Finally, concerning licensing and/or legal issues hampering application of Open Data Exchange formats, issues include unclear adoption policies (railML), a confusing tangle of different uses, programs and policies (SHP), and the obstacle of horrendous costs for joining the consortium prior to access (NDS). A general problem is that data from business projects usually is not royalty-free and therefore cannot be provided as Open Data.


6 OPEN DATA EXCHANGE: TECHNOLOGIES

The survey among the partners described in section 5 gives a first sketch of, and valuable hints about, the state of the art in open data exchange. As a first step into the details, section 6 describes the commonly used relevant technologies, ranging from basic file formats through modelling languages and tools, communication protocols and Web Services to special in-memory data grid technologies. Section 7 follows with the use in application areas, where some more formats are described as well, and section 8 covers more domain-specific technologies, applications and relevant communities.

6.1 FILES

6.1.1 General formats

6.1.1.1 JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy for humans to read and write and easy for machines to parse and generate. Because of its simple and strict structure, it is generally easier for computers to process than XML (and, of course, than proprietary formats). It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition - December 1999. JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON a very suitable data-interchange language for many applications (see [5]).

JSON is built on two structures:
a) A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
b) An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.

These are universal data structures. Virtually all modern programming languages support them in one form or another. It makes sense that a data format that is interchangeable with programming languages also be based on these structures. In JSON, they take on these forms:

- An object is an unordered set of name/value pairs. An object begins with { (left brace) and ends with } (right brace). Each name is followed by : (colon) and the name/value pairs are separated by , (comma).


Figure 6: Object representation in JSON.

- An array is an ordered collection of values. An array begins with [ (left bracket) and ends with ] (right bracket). Values are separated by , (comma).

Figure 7: Array representation in JSON.

- A value can be a string in double quotes, or a number, or true or false or null, or an object or an array. These structures can be nested.

Figure 8: Value representation in JSON.

- A string is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes. A character is represented as a single character string. A string is very much like a C or Java string.


Figure 9: String representation in JSON.

- A number is very much like a C or Java number, except that the octal and hexadecimal formats are not used.

Figure 10: Number representation in JSON.

- Whitespace can be inserted between any pair of tokens.

Excepting a few encoding details, that completely describes the language.
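The JSON structures listed above can be tried out with Python's standard `json` module. The following is a minimal sketch; the asset record and its field names are invented for illustration and are not taken from any format discussed in this document.

```python
import json

# Hypothetical asset record, used only for illustration.
TEXT = """
{
  "id": "SW-001",
  "type": "switch",
  "inService": true,
  "lastInspection": null,
  "position": {"km": 12.4, "track": 2},
  "alarms": ["heating fault", "slow throw"]
}
"""

# Parsing maps each JSON structure to a native Python type.
record = json.loads(TEXT)


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Serialising and parsing again yields an equal structure.
round_trip = json.loads(json.dumps(record))
```

Here the JSON object becomes a `dict`, the array a `list`, `true` becomes `True` and `null` becomes `None`. The helper `is_valid_json` can be used to check the rule on numbers: inputs such as `0x1F` or `017` are rejected because hexadecimal and octal notations are not part of JSON.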


6.1.1.2 XML

The Extensible Markup Language (XML) is a simple text-based format for representing structured information: documents, data, configuration, books, transactions, invoices, and much more. It was derived from an older standard format called SGML (ISO 8879), in order to be more suitable for Web use. XML is a widely used format for data exchange because it preserves the structure of the data and of the way files are built, and because it allows developers to write parts of the documentation inside the data files without interfering with reading them ([6]).

6.1.1.3 RDF

A W3C-recommended format called RDF makes it possible to represent data in a form that makes it easier to combine data from multiple sources. RDF is a framework for describing resources on the web. It is designed to be read and understood by computers rather than to be displayed to people. It is commonly written in XML and is a part of the W3C's Semantic Web Activity. RDF data can be stored in XML and JSON, among other serializations. RDF encourages the use of URLs as identifiers, which provides a convenient way to directly interconnect existing open data initiatives on the Web. RDF is still not widespread, but it has been a trend among Open Government initiatives, including the British and Spanish Government Linked Open Data projects. The inventor of the Web, Tim Berners-Lee, has recently proposed a five-star scheme that includes linked RDF data as a goal to be sought for open data initiatives ([7]).
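The point made in 6.1.1.2, that documentation can be written inside an XML data file without interfering with reading the data, can be sketched with Python's standard `xml.etree.ElementTree` module. The element and attribute names below are hypothetical and do not follow railML or any other schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical data file; the comment documents the unit of the km attribute.
DOC = """<assets>
  <!-- km is the distance from the line origin in kilometres -->
  <asset id="SW-001" type="switch">
    <position km="12.4" track="2"/>
  </asset>
  <asset id="LC-007" type="levelCrossing">
    <position km="15.9" track="1"/>
  </asset>
</assets>
"""

root = ET.fromstring(DOC)

# The default parser skips comments, so only the two asset elements remain.
positions = {
    asset.get("id"): float(asset.find("position").get("km"))
    for asset in root.iter("asset")
}
```

The nesting of `position` inside `asset` is preserved in the parsed tree, while the comment stays available to a human reader of the file.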

6.1.1.4 SPREADSHEETS

Many authorities hold information in spreadsheets, for example in Microsoft Excel. This data can often be used immediately, given correct descriptions of what the different columns mean. However, in some cases spreadsheets contain macros and formulas, which may be somewhat more cumbersome to handle. It is therefore advisable to document such calculations alongside the spreadsheet, since this is generally more accessible for users to read ([8]).

6.1.1.5 COMMA SEPARATED VALUES

CSV files can be very useful because CSV is a compact format and thus suitable for transferring large sets of data with the same structure. However, the format is so spartan that data are often useless without documentation, since it can be almost impossible to guess the significance of the different columns. It is therefore particularly important for comma-separated formats that the documentation of the individual fields is accurate. Furthermore, it is essential that the structure of the file is respected, as a single omission of a field may disturb the reading of all remaining data in the file without any real opportunity to rectify it, because it cannot be determined how the remaining data should be interpreted ([9]).
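The effect of an omitted field can be reproduced with Python's standard `csv` module. The sketch below uses invented column names and data; `read_strict` is a hypothetical helper that rejects rows whose number of fields does not match the header, instead of silently misreading them.

```python
import csv
import io

# Hypothetical data: the second data row of BAD omits the "type" field.
GOOD = "asset_id,type,km\nSW-001,switch,12.4\nLC-007,levelCrossing,15.9\n"
BAD = "asset_id,type,km\nSW-001,switch,12.4\nLC-007,15.9\n"


def read_strict(text, expected_header):
    """Parse CSV text, raising ValueError on any structural mismatch."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    if header != expected_header:
        raise ValueError(f"unexpected header: {header}")
    rows = []
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}")
        rows.append(dict(zip(header, row)))
    return rows


# Without the check, the kilometre value silently lands in the "type" column.
shifted = list(csv.DictReader(io.StringIO(BAD)))[1]
```

For the defective file, `shifted["type"]` holds the kilometre value and `shifted["km"]` is empty, which is exactly the kind of misinterpretation described above; `read_strict(BAD, ["asset_id", "type", "km"])` raises an error instead.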

6.1.1.6 TEXT DOCUMENT

Classic documents in formats like Word, ODF, OOXML, or PDF may be sufficient to show certain kinds of data, for example relatively stable mailing lists or equivalent. Publishing in such a format may be cheap, as it is often the format in which the data was created. The format gives no support for keeping the structure consistent, which often means that it is difficult to enter data by automated means. Be sure to use templates as the basis of documents that will display data for re-use, so that it is at least possible to pull information out of the documents. Further use of the data can also be supported by using typographical markup as much as possible, so that it becomes easier for a machine to distinguish headings (of any specified type) from the content, and so on. Generally, it is recommended not to publish in a word processing format if the data exists in a different format ([9]).

6.1.1.7 PLAIN TEXT DOCUMENTS (.TXT)

These are very easy for computers to read. However, they generally exclude structural metadata from inside the document, meaning that developers will need to create a parser that can interpret each document as it appears. Some problems can be caused by switching plain text files between operating systems: MS Windows, Mac OS X and other Unix variants have their own ways of telling the computer that the end of a line has been reached ([9]).

6.1.1.8 HTML

Nowadays much data is available in HTML format on various sites. This may well be sufficient if the data is very stable and limited in scope. In some cases, it could be preferable to have data in a form that is easier to download and manipulate, but as it is cheap and easy to refer to a page on a website, it might be a good starting point for the display of data. Typically, it would be most appropriate to use tables in HTML documents to hold data, and then it is important that the various data fields are displayed and are given IDs which make it easy to find and manipulate the data ([7]).
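The recommendation in 6.1.1.8 to give data fields IDs can be illustrated with Python's standard `html.parser` module. The page fragment and the ID naming scheme below are hypothetical; `CellCollector` is an illustrative helper, not part of any library.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: data cells carry IDs, the free-text cell does not.
PAGE = """
<table>
  <tr>
    <td id="sw-001-km">12.4</td>
    <td id="sw-001-state">in service</td>
    <td>free-text remark</td>
  </tr>
</table>
"""


class CellCollector(HTMLParser):
    """Collect the text of every <td> element that has an id attribute."""

    def __init__(self):
        super().__init__()
        self.cells = {}
        self._current_id = None

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._current_id = dict(attrs).get("id")
            if self._current_id is not None:
                self.cells[self._current_id] = ""

    def handle_data(self, data):
        # Text may arrive in several chunks, so it is accumulated.
        if self._current_id is not None:
            self.cells[self._current_id] += data

    def handle_endtag(self, tag):
        if tag == "td" and self._current_id is not None:
            self.cells[self._current_id] = self.cells[self._current_id].strip()
            self._current_id = None


collector = CellCollector()
collector.feed(PAGE)
collector.close()
```

With IDs in place, a consumer can address a value directly, for example `collector.cells["sw-001-km"]`, instead of relying on the position of a cell in the table.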

6.1.1.9 SCANNED IMAGE

This is probably the least suitable form for most data, but TIFF, JPEG 2000 and PNG all at least allow an image to be annotated with documentation of what it shows, up to marking up an image of a document with the full text content of the document. Displaying data as images may be relevant for data that was not born electronically; obvious examples are old church records and other archival material, where a picture is better than nothing ([9]).

6.1.1.10 PROPRIETARY FORMATS

Some dedicated systems have their own data formats in which they can save or export data. It can sometimes be enough to expose data in such a format, especially if it is expected that further use will take place in a system similar to the one the data comes from. It should always be indicated where further information on these proprietary formats can be found, for example by providing a link to the supplier’s website. Generally, it is recommended to display data in non-proprietary formats where feasible ([9]).

6.1.2 Specific formats

6.1.2.1 HDF5

HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data. HDF5 is portable and is extensible, allowing applications to evolve in their use of HDF5. The HDF5 Technology suite includes tools and applications for managing, manipulating, viewing, and analysing data in the HDF5 format. The HDF5 technology suite includes (see [10]):
- A versatile data model that can represent very complex data objects and a wide variety of metadata.
- A completely portable file format with no limit on the number or size of data objects in the collection.
- A software library that runs on a range of computational platforms, from laptops to massively parallel systems, and implements a high-level API with C, C++, Fortran 90, and Java interfaces.
- A rich set of integrated performance features that allow for access time and storage space optimizations.
- Tools and applications for managing, manipulating, viewing, and analysing the data in the collection.

6.1.2.2 NETCDF NetCDF is an abstraction that supports a view of data as a collection of self-describing, portable objects that can be accessed through a simple interface. Array values may be accessed directly, without knowing details of how the data are stored. Auxiliary information about the data, such as what units are used, may be stored with the data. Generic utilities and application programs can access netCDF datasets and transform, combine, analyse, or display specified fields of the data. The development of such applications has led to improved accessibility of data and improved re-usability of software for array-oriented data management, analysis, and display (see [11]). The netCDF software implements an abstract data type, which means that all operations to access and manipulate data in a netCDF dataset must use only the set of functions provided by the interface. The representation of the data is hidden from applications that use the interface, so that how the data are stored could be changed without affecting existing programs. The physical representation of netCDF data is designed to be independent of the computer on which the data were written.

6.1.2.3 JUPITER TESSELLATION (JT)

JT (Jupiter Tessellation) is an ISO-standardized 3D data format used in industry for product visualization, collaboration and CAD data exchange, and in some cases also for long-term data retention. It can contain any combination of approximate (faceted) data, surfaces (NURBS), Product and Manufacturing Information (PMI), and metadata (textual attributes), either exported from the native CAD system or inserted by a product data management (PDM) system. ([15])

6.1.2.4 OASIS OSLC LIFECYCLE INTEGRATION CORE (OSLC CORE) TC

The OSLC (Open Services for Lifecycle Collaboration) initiative supports integration between a heterogeneous set of products and components from various sources using an architecture that is minimalist, loosely coupled, and standardized. OSLC applies World Wide Web and Linked Data principles, such as those defined in the W3C Linked Data Platform (LDP), to create a cohesive set of specifications that can enable products, services and other distributed network resources to interoperate successfully. The OSLC Core TC is responsible for specifications that expand W3C LDP concepts, as needed, to enable integration. ([16])

6.1.2.5 REQUIREMENTS INTERCHANGE FORMAT (RIF/REQIF)

RIF/ReqIF (Requirements Interchange Format) is an XML file format that can be used to exchange requirements, along with their associated metadata, between software tools from different vendors. The requirements exchange format also defines a workflow for transmitting the status of requirements between partners. ([17])

6.1.2.6 OPEN DATA PROTOCOL (ODATA)

OData (Open Data Protocol) is an ISO/IEC approved OASIS standard that defines a set of best practices for building and consuming RESTful APIs. OData allows developers to focus on the business logic while building RESTful APIs, without having to worry about the various approaches to defining request and response headers, status codes, HTTP methods, URL conventions, media types, payload formats, query options, etc. ([18])
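As a minimal sketch of the URL conventions and query options mentioned above, the following Python snippet composes an OData-style request URL. The service root `https://example.org/odata`, the entity set `Assets` and its properties are hypothetical and only serve as an illustration; `$filter`, `$select` and `$top` are standard OData system query options.

```python
from urllib.parse import parse_qs, quote, urlsplit


def odata_url(service_root, entity_set, **options):
    """Compose an OData request URL; keyword names become $-prefixed query options."""
    parts = []
    for name, value in options.items():
        # Percent-encode spaces etc., but keep characters used in OData expressions.
        parts.append("$" + name + "=" + quote(str(value), safe="'(),"))
    return service_root.rstrip("/") + "/" + entity_set + "?" + "&".join(parts)


# Hypothetical service root, entity set and property names.
url = odata_url(
    "https://example.org/odata",
    "Assets",
    filter="Type eq 'Switch'",
    select="Id,Name",
    top=10,
)
print(url)

# A server (or a test) can recover the options from the query string again.
decoded = parse_qs(urlsplit(url).query)
```

The sketch only builds the URL; sending the request and interpreting the payload would depend on the concrete service.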

6.2 MODELING LANGUAGES AND TOOLS

6.2.1.1 BUSINESS PROCESS MODELING LANGUAGE (BPML) Business Process Modeling Language (BPML) is an XML-based language for business process modeling. It was maintained by the Business Process Management Initiative (BPMI) until June 2005 when BPMI and OMG (Object Management Group) announced the merger of their respective Business Process Management (BPM) activities to form the Business Modeling and Integration Domain Task Force (BMI DTF). ([12])

6.2.1.2 CODE OF PLM OPENNESS (CPO)

The Code of PLM Openness (CPO) is a worldwide unique approach that runs under the patronage of the German Federal Ministry for Economic Affairs and Energy (BMWi). CPO is a prostep ivip initiative for establishing a common understanding of the openness of IT systems in the context of PLM (product lifecycle management) between IT customers, IT vendors and IT service providers. The CPO thereby goes far beyond the requirement to provide IT standards and related interfaces. It defines measurable criteria ('shall', 'should', 'may') for the following categories: interoperability, infrastructure, extensibility, interfaces, standards, architecture and partnership. ([13])

6.2.1.3 SYSTEMS MODELING LANGUAGE (SYSML) The Systems Modeling Language (SysML) is a general-purpose modeling language for systems engineering applications. It supports the specification, analysis, design, verification and validation of a broad range of systems and systems-of-systems. ([19])

6.2.1.4 UNIFIED MODELING LANGUAGE (UML) The Unified Modeling Language (UML) is a general-purpose, developmental, modeling language in the field of software engineering that is intended to provide a standard way to visualize the design of a system.

The OMG's Unified Modeling Language™ (UML®) helps to specify, visualize, and document models of software systems, including their structure and design. (UML can also be used for business modeling and for modeling other non-software systems.) Using any one of the large number of UML-based tools on the market, the requirements of a future application can be analysed and a solution that meets them can be designed, with the results represented using UML 2.0's thirteen standard diagram types. Models can be built in UML for any type of application, running on any type and combination of hardware, operating system, programming language, and network. Its flexibility enables the modelling of distributed applications that use just about any middleware on the market. Built upon fundamental object-oriented (OO) concepts including class and operation, it is a natural fit for object-oriented languages and environments such as C++, Java, and C#, but it can also be used to model non-OO applications in, for example, Fortran, VB, or COBOL. UML Profiles (that is, subsets of UML tailored for specific purposes) help to model transactional, real-time, and fault-tolerant systems in a natural way. ([20], [21])

6.2.1.5 GOOGLE WEB TOOLKIT (GWT) Google Web Toolkit (GWT) or GWT Web Toolkit, is an open source set of tools that allows web developers to create and maintain complex JavaScript front-end applications in Java. Other than a few native libraries, everything is Java source that can be built on any supported platform with the included GWT Ant build files. It is licensed under the Apache License version 2.0. ([14])

6.3 COMMUNICATION PROTOCOLS

A communication protocol is a system of rules in telecommunications that allows two or more entities of a communications system (e.g. M2M, Machine to Machine) to transmit information via any kind of variation of a physical quantity. These rules or standards define the syntax, semantics and synchronization of communication and possible error recovery methods. In addition, data definitions have to be provided in terms of data types, structure and semantics. Protocols may be implemented by hardware, software, or a combination of both.

6.3.1 OPC UA

6.3.1.1 INTRODUCTION

OPC UA answers the increasing need for interoperability and communication in Industry 4.0. A state-of-the-art interface definition has to provide important features beyond transmitting data, including means for the standardized definition of data and functions, standardized transmission, security, availability, and others. The following sections are based on contributions of ascolab GmbH ([22]).

With the new Unified Architecture (UA), the OPC Foundation ([23]) addresses today's and future requirements of industrial communication. Based on the functionality of all previous OPC Specifications (DA, A+E, HDA, Commands, Complex Data), the newly defined standard is completely realized using a service-oriented architecture (SOA). This new approach is platform independent, scalable and high-performance. Its use in small devices of process and measurement technology with their specialized operating systems is just as possible as its use in enterprise applications on Unix/Linux machines or mainframes ([22]).

Figure 11: OPC UA Concepts

6.3.1.2 OPC UA APPROACH Definition ([22]): OPC Unified Architecture, or OPC UA for short, is a TCP/IP based communication technology developed by the OPC Foundation to allow a manufacturer independent exchange of information in the field of industrial automation. OPC UA is also referred to as a machine to machine (M2M) communication protocol. Due to its generic information model, OPC UA has been adapted to other sectors as well, e.g. building automation, power generation and distribution, oil and gas exploration.

Data Model ([22]): The OPC Information Model is no longer just a hierarchy based on folders, items and properties, but a so-called Full Mesh Network based on Nodes. This network of Nodes can additionally transmit all varieties of meta information and diagnostic data. The closest analogy to a Node is an object as known from object-oriented programming (OOP). It can have attributes for read access (Data Access (DA), Historical Data Access (HDA)), methods which can be called (Commands), and events which can be fired (AE, DA DataChange) to exchange certain information between devices. An Event contains, among other things, a time of notification, a message and a severity. Nodes are used for process data as well as for all other types of metadata. The newly modelled OPC namespace now also contains the Type Model used to describe all possible data types.
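The node concept described above can be illustrated with a small object-oriented sketch. The classes `Node` and `Event` and all names used below are hypothetical and are not part of any OPC UA stack or SDK; they only mirror the ideas of attributes for read access, callable methods, events with time, message and severity, and references between arbitrary nodes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List


@dataclass
class Event:
    time: datetime   # time of notification
    message: str
    severity: int


@dataclass(eq=False)
class Node:
    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)
    methods: Dict[str, Callable[..., Any]] = field(default_factory=dict)
    references: List["Node"] = field(default_factory=list)
    subscribers: List[Callable[[Event], None]] = field(default_factory=list)

    def read(self, attribute):
        """Read access to an attribute (comparable to Data Access)."""
        return self.attributes[attribute]

    def call(self, method, *args):
        """Call a method offered by the node (comparable to Commands)."""
        return self.methods[method](*args)

    def fire(self, message, severity):
        """Create an event and notify every subscriber."""
        event = Event(datetime.now(timezone.utc), message, severity)
        for notify in self.subscribers:
            notify(event)
        return event


# Hypothetical example: a point machine node referenced by a station node.
received = []
machine = Node("PointMachine1", attributes={"Current": 2.4, "Unit": "A"})
machine.methods["Reset"] = lambda: "reset done"
station = Node("Station")
station.references.append(machine)  # nodes may reference any other node
machine.references.append(station)  # (a mesh, not only a folder hierarchy)
machine.subscribers.append(received.append)
machine.fire("Current above threshold", severity=700)
```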

Transport ([22]): The transport layer transforms these methods into a protocol, which means that it serializes/deserializes the data and transmits it over the network. Currently, two TCP/IP based protocols are specified for this purpose: a binary TCP protocol optimized for high performance, and a web service based protocol. The binary protocol is mandatory and is supported by all UA stacks. In addition, there is a combination of both protocols, the so-called hybrid protocol, in which a binary encoded (unencrypted) message is sent over an encrypted channel (HTTPS). Additional protocols are possible and may be added when necessary.
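To illustrate what serializing and deserializing means in practice, and why a binary encoding is usually more compact than a text-based one, the following sketch encodes the same value once as a fixed binary layout and once as XML. The layout (a 4-byte integer identifier followed by an 8-byte floating point value) and the XML element are assumptions made for this example; they are not the OPC UA binary or XML encodings.

```python
import struct
import xml.etree.ElementTree as ET


def encode_binary(node_id, value):
    """Pack an integer identifier and a float into 12 bytes (network byte order)."""
    return struct.pack("!id", node_id, value)


def decode_binary(payload):
    return struct.unpack("!id", payload)


def encode_xml(node_id, value):
    """Encode the same information as a small XML document."""
    root = ET.Element("Value", {"nodeId": str(node_id)})
    root.text = repr(value)
    return ET.tostring(root)


def decode_xml(payload):
    root = ET.fromstring(payload)
    return int(root.get("nodeId")), float(root.text)


binary = encode_binary(42, 21.5)
text = encode_xml(42, 21.5)
print(len(binary), len(text))  # the binary payload is the shorter one
```

Both encodings carry the same information; the receiver only has to know which decoder to apply.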

OPC UA Implementation The OPC Foundation provides the communication stack for its members. Developers of OPC UA products can choose between three implementations: C, .NET, or Java. All stacks provide the same functionality and, within the limits of the programming languages, the APIs can be applied similarly. The OPC Foundation maintains these implementations and integrates innovations if necessary. For members, the source code is available as well. All three implementations are tested against each other to ensure compatibility of protocol implementations.

6.3.1.3 STANDARDS

A set of released standards covers all main aspects. All standards referred to are valid.

Table 3: OPC UA Standards

| Document ID | Issued | Title |
|---|---|---|
| IEC/TR 62541-1 | 2016-10-01 | OPC Unified Architecture - Part 1: Overview and concepts |
| IEC/TR 62541-2 | 2016-10-01 | OPC Unified Architecture - Part 2: Security Model |
| IEC 62541-3 | 2015-03-01 | OPC Unified Architecture - Part 3: Address Space Model |
| IEC 62541-4 | 2015-03-01 | OPC Unified Architecture - Part 4: Services |
| IEC 62541-5 | 2015-03-01 | OPC Unified Architecture - Part 5: Information Model |
| IEC 62541-6 | 2015-03-01 | OPC Unified Architecture - Part 6: Mappings |
| IEC 62541-7 | 2015-03-01 | OPC Unified Architecture - Part 7: Profiles |
| IEC 62541-8 | 2015-03-01 | OPC Unified Architecture - Part 8: Data Access |
| IEC 62541-9 | 2015-03-01 | OPC Unified Architecture - Part 9: Alarms and conditions |
| IEC 62541-10 | 2015-03-01 | OPC Unified Architecture - Part 10: Programs |
| IEC 62541-11 | 2015-03-01 | OPC Unified Architecture - Part 11: Historical Access |
| IEC 62541-13 | 2015-03-01 | OPC Unified Architecture - Part 13: Aggregates |

6.3.2 Queue/topic based messaging systems

Before describing message-based systems, it is worth giving an architectural overview which incorporates some of the main messaging pattern types. Understanding these pattern types will help in understanding the differences between the vendors' products.

Consider the real-world example of a letter being sent from one responsible party to another via a postal delivery service. The letter is the message and is contained within an envelope. The envelope defines the addressee's information as well as the destination address, amongst other things. The letter may be sent by recorded delivery, whereby a receipt is needed. It may traverse several channels and delivery hubs until it reaches its destination, where the letter is signed for. This signature, recording the delivery of the letter, will eventually let the original responsible party know that their letter has been received.

6.3.2.1 MESSAGING – OVERVIEW

Messaging provides the ability for one system to send a self-contained message to another. The sender and the recipient do not need to be aware of each other; this is called loose coupling. The receiving system cannot guarantee its availability; thus, the message needs to be sent asynchronously.

Messaging is structured according to the following logical model. Implementations may vary, and the products of some vendors deviate from it further.

Figure 12: Typical Messaging System - Logical Model (elements shown: Sender System, Messaging System, Message Channel, Receiver System)

• Channels
  o Messages are transmitted through a Message Channel that connects a Sender System to a Receiver System. The Channel has to be determined; for example, it could be TCP or HTTP.
• Message
  o A Message is a textual or binary package to be sent or received. To transmit a Message, its contents must be encoded for the Channel. To receive a Message from the Channel, the system must be able to decode the Message.
  o Message Header (the envelope)
    - Information regarding the sender and receiver
    - Time sent and time received
    - Sending system information including protocol
    - Size of the Message
  o Message Body (the letter)
    - The information to be sent and received
  o Message Attachment
    - Any other MIME type data can also be attached to the message
• Routing
  o A Message may traverse many Channels, which are triaged by Routing systems. The original Sender does not need to be aware of all the Channels, only of the one to which it submits the Message. The Routing system is then responsible for ensuring that the Message is delivered, by using Pipes/Filters, to the receiver or to the next Routing system.
• Transformation
  o Systems that do not support a common message format need the Message to be translated in transit so that the receiving system can decode it.
• Endpoints
  o An Endpoint is an interface between the sending/receiving system and the messaging system.
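The logical model above can be sketched in a few lines of Python. The classes `Message` and `Router`, the JSON encoding and the endpoint names are assumptions made for this illustration and do not correspond to any particular messaging product; the in-process queue merely stands in for a real channel such as TCP or HTTP.

```python
import json
import queue
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _now():
    return datetime.now(timezone.utc).isoformat()


@dataclass
class Message:
    sender: str
    receiver: str
    body: str                                   # the letter
    sent_at: str = field(default_factory=_now)

    def encode(self) -> bytes:
        """Encode header (the envelope) and body for the channel."""
        header = {
            "sender": self.sender,
            "receiver": self.receiver,
            "sent_at": self.sent_at,
            "size": len(self.body.encode("utf-8")),
        }
        return json.dumps({"header": header, "body": self.body}).encode("utf-8")

    @staticmethod
    def decode(raw: bytes) -> "Message":
        doc = json.loads(raw.decode("utf-8"))
        head = doc["header"]
        return Message(head["sender"], head["receiver"], doc["body"], head["sent_at"])


class Router:
    """Routes encoded messages to the channel of the addressed endpoint."""

    def __init__(self):
        self.channels = {}

    def add_endpoint(self, name):
        self.channels[name] = queue.Queue()
        return self.channels[name]

    def submit(self, raw: bytes):
        # Only the header is inspected; the sender does not know the channel.
        receiver = json.loads(raw.decode("utf-8"))["header"]["receiver"]
        self.channels[receiver].put(raw)


router = Router()
inbox = router.add_endpoint("maintenance-system")
router.submit(Message("wayside-sensor", "maintenance-system", "axle counter fault").encode())
received = Message.decode(inbox.get_nowait())
```

The sender only submits the encoded message to the router, which keeps sender and receiver loosely coupled.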

6.3.2.2 ENTERPRISE MESSAGING – OVERVIEW

Building on the principles of Messaging, Enterprise Messaging seeks to address more of the business aspects of Messaging, such as the following:

• Data structure formats
  o XML
  o JSON
• Messaging protocols
  o AMQP: Advanced Message Queuing Protocol
  o DDS: Data Distribution Service
  o MSMQ: Microsoft Message Queuing
  o JMS: Java Message Service
  o ZMQ: Zero Message Queue
• Security
  o Encryption
  o Signing
  o Altered Content
  o Point to Point Security (Transport)
  o Non-repudiation
  o Replay protection
  o Persistence
• Routing
  o Efficiency
  o QoS: First Class, Second Class etc.
• Metadata
  o Message Expiration
  o Receipt required
  o Message Correlator
  o Message Identification
  o others
• Enterprise Policies
  o Policies pertinent to the organization sending and receiving messages
  o RBAC (role-based access control)
• Message Patterns
  o Synchronous: the recipient is expected to be available and operating at the time of the request; the originator waits for a response
  o Asynchronous: the recipient may be offline or may delay processing until later; the originator does not wait for a response
  o Publish Subscribe: subscribe to Message topics that match a pattern, such as latest news
  o Distribution: one to one, one to many and many to many patterns
  o Queues: Messages utilizing FIFO
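Two of the message patterns listed above, publish/subscribe with topic patterns and FIFO queues, can be sketched as follows. The `Broker` class and the topic names are hypothetical; real middleware adds persistence, security and network transport on top of this basic idea.

```python
from collections import deque
from fnmatch import fnmatchcase


class Broker:
    def __init__(self):
        self.subscriptions = []  # list of (topic pattern, callback)
        self.queues = {}         # queue name -> deque of messages

    def subscribe(self, pattern, callback):
        """Register interest in all topics matching a shell-style pattern."""
        self.subscriptions.append((pattern, callback))

    def publish(self, topic, message):
        """Deliver the message to every matching subscriber (one to many)."""
        delivered = 0
        for pattern, callback in self.subscriptions:
            if fnmatchcase(topic, pattern):
                callback(topic, message)
                delivered += 1
        return delivered

    def enqueue(self, queue_name, message):
        self.queues.setdefault(queue_name, deque()).append(message)

    def dequeue(self, queue_name):
        """Return the oldest message of the queue (first in, first out)."""
        return self.queues[queue_name].popleft()


log = []
broker = Broker()
broker.subscribe("news/*", lambda topic, message: log.append((topic, message)))
delivered = broker.publish("news/latest", "Line S1 reopened")
ignored = broker.publish("alarms/track", "Point failure")  # no subscriber matches

broker.enqueue("work-orders", "inspect switch 17")
broker.enqueue("work-orders", "replace sensor 4")
first = broker.dequeue("work-orders")
```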

6.3.2.3 MESSAGING SYSTEMS – MIDDLEWARE

To address the varied complexities of Messaging, vendors created their own implementations of the messaging protocols, all of which have their own complexities. The following tables attempt to distill the main capabilities of the main vendors' products and the protocols they support.

Table 4: Vendor Messaging system protocols

Technology/protocol columns: JMS; MQTT (ISO/IEC PRF 20922); AMQP 1.0 (ISO/IEC 19464:2014); AMQP 0.9; CORBA; Proprietary. The marks (X) are listed per product; their assignment to the individual protocol columns is not recoverable from this layout.

| Product | Vendor | License | Protocol marks |
|---|---|---|---|
| ActiveMQ | Apache | OSS | X X X |
| ZeroMQ | | LGPL | X |
| HornetMQ | | OSS | X |
| Service Bus | Microsoft | Commercial | X X |
| MSMQ | Microsoft | Commercial | X |
| Simple Queue Service | Amazon AWS | Commercial | X |
| RabbitMQ | Rabbit | Commercial | X |
| DDS | | | X |
| WebSphereMQ | IBM | Commercial | X |
| Tibco | | Commercial | X |

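Table 5: Vendor Messaging system: Message Exchange Patterns Support

Message exchange pattern (MEP) columns: Broker; Request-Reply (Synchronous); Request-Response (Asynchronous); Publish-Subscribe (Asynchronous). Four marks (X) mean that all four patterns are supported; where fewer marks are given, their assignment to the individual columns is not recoverable from this layout.

| Product | Vendor | License | MEP marks |
|---|---|---|---|
| ActiveMQ | Apache | OSS | X X X X |
| ZeroMQ | Zero | LGPL | X X X |
| HornetMQ | Redhat | OSS | X X X X |
| Service Bus | Microsoft | Commercial | X X X X |
| MSMQ | Microsoft | Commercial | X X X X |
| Simple Queue Service | Amazon AWS | Commercial | X X X X |
| RabbitMQ | Rabbit | Commercial | X X X X |
| DDS | RTI | Commercial | X X |
| WebSphereMQ | IBM | Commercial | X X X X |
| Tibco | | Commercial | X X X |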

6.3.2.4 FUTURE OF MESSAGING SYSTEMS

Whilst the message exchange patterns stay consistent, the goals of the messaging technology depend on the domain being addressed. Domains such as messaging of medical information need to be reliable, whereas financial data works on a principle where each transaction needs to ensure synchronicity in preference to reliability.

It is fair to say that each industry sector has its own challenges in terms of choosing the appropriate messaging technology.

The following technology drivers shape this choice:

• IoT (Internet of Things): consumer grade devices
• IIoT (Industrial Internet of Things): industrial grade devices
• Cloud: storage and compute
• Smart Devices: users' devices
• Edge: devices at the edge of the enterprise network
• Fog/Mist: extending the enterprise and processing closer to the assets

All the above drivers indicate a push towards a decentralised, cloud-delivered messaging architecture, in which point-to-point and point-to-broker communication still exist. Low latency and high throughput become more of a concern when delivering global services.

The conclusion is that more effort and research should be invested in the emerging technologies that will advance messaging systems and possibly negate the need for discrete messaging middleware. These areas are seen from the perspective of IIoT, whereby edge and fog processing of a message is required in an atomic way before it reaches its intended target.

6.4 WEB SERVICES / APIS

There is some ambiguity regarding the terms. An API (application programming interface) is a basic concept of software architecture which enables the use/reuse of functionality (e.g. by including libraries). In the context of the internet and machine-to-machine communication, protocol stacks are considered as well as data specifications. Therefore, these terms are used here with the context and meaning given in the following sections.

6.4.1 Web Services

Almost all aspects of internet communication are coordinated ([25]). Web services are defined as follows:

[Definition: A Web service is a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the Web service in a manner prescribed by its description using SOAP messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards.] ([26])

With Service-Oriented Architecture (SOA), the IT systems perform services that are defined and described in the context of the enterprise's business activities. Services are offered at a business level of abstraction, which renders the interface a business interface, i.e. a contract. The contract is a platform-neutral and standard way of describing what the service does. This principle enables the use of techniques such as service composition, message-based communication, discovery, and model-driven implementation, which allow fast development of effective and flexible solutions. They are important features of SOA. Their benefits, especially that of enterprise agility, are the most frequently quoted reasons for SOA adoption ([25]).

Service-Oriented Architecture is an architecture model or design approach which states that the system should be composed of several independent, loosely coupled services. It is recommended that SOA infrastructure implementations use open standards to realize interoperability and location transparency. Key concepts of SOA are therefore loose coupling, high interoperability and services. SOA aims to enhance the agility, efficiency and productivity of an enterprise system. SOA can be implemented by using web services, for example with WCF (Windows Communication Foundation) services. WCF is configurable to communicate with web services using both SOAP and XML messages. Because WCF can communicate using web service standards, interoperability with other platforms that also support SOAP is straightforward. Interoperability is thus gained through a set of XML-based open standards, such as WSDL, SOAP, and UDDI. These standards provide a common approach for defining, publishing, and using web services ([25]).

Web services are XML-based software systems on the web or in clouds which are designed to support interoperable machine-to-machine interaction. To support communication between the various nodes in a network, the standards act as a series of protocols ([22]).

The Web service protocol stack is a collection of open standards that are used to make Web services interact with each other ([28]):

• Discovery Protocol: This protocol provides a directory for storing information about web services. Service providers use the Universal Description, Discovery, and Integration (UDDI) specification to advertise the existence of their services, and requesters use it to search for and discover already registered services.
• Description Protocol: This protocol is used to describe and locate web services. The Web Services Description Language (WSDL) is used to describe what type of message a Web service accepts and generates. For a service, it can be thought of as the overall technical interface specification. It serves not only as the definition of the interface but also contains technical information such as the allowable operations of a service and its endpoint address.
• Messaging Protocol: This protocol is responsible for encoding messages in XML format so that they can be understood at either end of a network connection. Extensible Markup Language (XML) has become the fundamental message form for SOA consumers and services. In an SOA based on Web services, the message has a structure that allows for deeper integration and cross-platform collaboration. A key part of this is an enveloping scheme known as Simple Object Access Protocol (SOAP), which includes the message content and is also encoded using XML. Thus, SOAP is the specific format for exchanging Web services data over HTTP.
• Transport Protocol: This protocol is responsible for the transport of messages between network applications. Web services typically use HTTP (HyperText Transfer Protocol) as the underlying protocol for transporting messages.

Since the service interface hides the implementation logic from the users, the service can be used on different platforms, and any application capable of communicating through the standard XML messaging protocol can use the service through the standard interface. The main advantage of Web services is that the service can be used remotely without the user's actual involvement, thus eliminating the need for constant updates to locally installed software.
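The enveloping scheme of the messaging protocol can be illustrated with a minimal sketch that builds and reads a SOAP 1.1 envelope using only the Python standard library. The operation `GetAssetStatus`, its parameter and the application namespace `http://example.org/assets` are hypothetical; in a real service they would be prescribed by the WSDL description, and the envelope would be sent to the service endpoint in an HTTP POST request.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.org/assets"                   # hypothetical service namespace


def build_request(operation, **params):
    """Wrap an operation call and its parameters in a SOAP envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def read_request(xml_text):
    """Extract the operation name and parameters from a SOAP envelope."""
    envelope = ET.fromstring(xml_text)
    call = envelope.find(f"{{{SOAP_NS}}}Body")[0]
    params = {child.tag.split("}")[1]: child.text for child in call}
    return call.tag.split("}")[1], params


request = build_request("GetAssetStatus", assetId="SW-17")
print(request)
operation, parameters = read_request(request)
```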

• Uses of Web Services
  o Standardized Protocol: For communication, Web services use industry standard protocols. All four layers of the Web services protocol stack (Service Transport, XML Messaging, Service Description and Service Discovery) use standardized protocols. This standardization of the protocol stack gives businesses many advantages, such as an increase in quality and a reduction in cost due to competition.
  o Exposing existing functions on the network: A Web service is a unit of managed code that can be activated using HTTP requests. Web services therefore allow the functionality of existing code to be exposed over the cloud. Once it is exposed, other applications can use the functionality of the program.
  o Interoperability, i.e. connecting different applications: Web services are used to make applications platform and technology independent by allowing different applications to talk to each other and to share data and services among themselves. For example, a VB or .NET application can talk to Java web services and vice versa.
  o Low cost of communication: The existing low-cost internet can be used for implementing Web services, because they use SOAP over HTTP for communication. This solution is much less costly than proprietary solutions. Besides SOAP over HTTP, Web services can also be implemented on other reliable transport mechanisms such as FTP.

6.4.2 Web APIs

Representational state transfer (REST) is a concept (and implementation) for communication via the internet. Compared to WS*, it is easier to implement. Services of this kind are often referred to as RESTful services; they offer simplified access via HTTP(S) requests and responses. The basic HTTP requests (GET, POST, PUT, DELETE) are used to transmit the data and/or function requests (e.g. searching a database with given parameters). The identification of the server, the functions, and the parameters for execution are provided in the HTTP request itself. Example of accessing Google: https://www.google.de/?gfe_rd=cr&ei=EYhzWKH9E8nb8Af0iZrICg&gws_rd=ssl
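How server, function and parameters are carried in the request itself can be shown by taking the example URL above apart with the Python standard library. The second part of the sketch prepares (but does not send) a GET request to a hypothetical endpoint `https://example.org/api/assets`; the endpoint and its parameters are assumptions for illustration only.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

example = "https://www.google.de/?gfe_rd=cr&ei=EYhzWKH9E8nb8Af0iZrICg&gws_rd=ssl"
parts = urlsplit(example)
server = parts.netloc                # identifies the server
resource = parts.path                # identifies the function/resource
parameters = parse_qs(parts.query)   # parameters for execution


def build_request(base_url, method="GET", **params):
    """Prepare an HTTP request object; nothing is sent over the network here."""
    url = base_url + ("?" + urlencode(params) if params else "")
    return Request(url, method=method)


# Hypothetical RESTful endpoint and query parameters.
request = build_request("https://example.org/api/assets", type="switch", line="S1")
print(request.get_method(), request.full_url)
```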

6.4.3 Comparison Web Services vs. Web APIs

Table 6: Comparison Web Services vs. Web APIs [29]

| Criterion | WS* | Web API |
|---|---|---|
| Specification of data and functions | Standardized | Application specific |
| Complexity | High | Low(er) |
| Standards | All aspects of application of web services are standardized | Available for basic communication (http) and data formats (xml, ) |
| Support by | .Net, Java | Many languages, easy to implement |
| Security | Included as part of the protocol stack | Depending on application |

WS* - the web services protocol stack

6.5 IN MEMORY DATA GRID TECHNOLOGIES

6.5.1 In-Memory Data Grid Overview

An In-Memory Data Grid (IMDG) is a distributed in-memory (RAM) data structure. An IMDG is typically implemented using a key-value data structure. The advantages of using an IMDG are mainly related to:

• Enhanced performance in terms of read/write speed,
• Easy scaling and upgradability,
• High availability (fault tolerance) thanks to distributed data,
• Persistent storage caching.

The IMDG can also be thought of as a data exchange platform/middleware between heterogeneous systems in the scope of open data exchange standardization. In the following paragraphs, some technologies implementing IMDG are listed.

6.5.2 Infinispan

Infinispan [30] is an open source distributed in-memory key/value data store implementing an IMDG. It is developed in Java and implements the JSR 107 specification. The main applications of Infinispan are:

• Local cache: providing a fast in-memory cache of frequently accessed data,
• Clustered cache: in case a single node is not enough for data storage,
• Remote cache: in case a decoupling between application and stored data is needed,
• Data grid: using advanced features such as transactions, notifications …
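The idea of a key-value structure that is distributed over several nodes, with fault tolerance through copies, can be sketched as follows. The `DataGrid` class, the node names and the simple hash-based placement are assumptions made for this illustration; this is not the Infinispan API, and real data grids use more elaborate placement and rebalancing strategies.

```python
import hashlib


class DataGrid:
    """Toy in-memory data grid: each key is stored on an owner node plus backups."""

    def __init__(self, node_names, backups=1):
        self.names = list(node_names)
        self.nodes = {name: {} for name in self.names}  # per-node in-memory store
        self.backups = backups

    def owners(self, key):
        """Pick the primary node from a hash of the key, then the following nodes."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        start = int.from_bytes(digest[:8], "big") % len(self.names)
        count = min(self.backups + 1, len(self.names))
        return [self.names[(start + i) % len(self.names)] for i in range(count)]

    def put(self, key, value):
        for name in self.owners(key):
            self.nodes[name][key] = value

    def get(self, key, failed=()):
        """Read from the first owner that is not marked as failed."""
        for name in self.owners(key):
            if name not in failed:
                return self.nodes[name][key]
        raise KeyError(key)


grid = DataGrid(["node-a", "node-b", "node-c"], backups=1)
grid.put("track/12/temperature", 21.5)
primary = grid.owners("track/12/temperature")[0]
# The value stays readable even if the primary owner is unavailable.
value = grid.get("track/12/temperature", failed={primary})
```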

A communication mechanism can be implemented by using Listeners and Notifications: clients can register and are notified when an event takes place; events trigger a notification which is dispatched to the listeners.

6.5.3 Redis

Redis [31] is an open source, in-memory data structure store, used as a database, cache and message broker. It supports different kinds of data structures such as strings, hashes, lists, sets, sorted sets, bitmaps, hyperloglogs and geospatial indexes. Some of the features of Redis are:

• Replication: master-slave replication that allows slave Redis servers to be exact copies of master servers
• Clustering: automatically sharding data across multiple Redis nodes
• On-disk persistence: dumping the dataset to disk periodically or appending each command to a log
• Publishing/Subscribing: implementing the Publish/Subscribe messaging paradigm

Redis clients exist for most programming languages.
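The on-disk persistence variant that appends each command to a log can be illustrated with a small sketch. The `LoggedStore` class and the JSON log format are assumptions for this example and are unrelated to the actual Redis implementation or client API; the point is only that replaying the log restores the in-memory state after a restart.

```python
import json
import os
import tempfile


class LoggedStore:
    """In-memory key-value store that appends every write command to a log file."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        if os.path.exists(log_path):
            # Rebuild the in-memory state by replaying the logged commands.
            with open(log_path, encoding="utf-8") as log:
                for line in log:
                    self._apply(json.loads(line))

    def _apply(self, command):
        if command["op"] == "set":
            self.data[command["key"]] = command["value"]
        elif command["op"] == "del":
            self.data.pop(command["key"], None)

    def _record(self, command):
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(command) + "\n")
        self._apply(command)

    def set(self, key, value):
        self._record({"op": "set", "key": key, "value": value})

    def delete(self, key):
        self._record({"op": "del", "key": key})


log_path = os.path.join(tempfile.mkdtemp(), "commands.log")
store = LoggedStore(log_path)
store.set("sensor:1", 21.5)
store.set("sensor:2", 19.0)
store.delete("sensor:2")

restored = LoggedStore(log_path)  # simulates a restart of the store
```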

7 OPEN DATA EXCHANGE: APPLICATIONS

With the basic, commonly used technologies described in section 6, section 7 describes the use of these technologies in different application areas and, amongst others, describes application-specific formats. This is followed in section 8 by the usage of these technologies and applications in different domains and relevant communities.

7.1 GEODATA (DLR, LTU)

7.1.1 Vector Data

Vector data represent geographic features with discrete coordinates as points, lines and polygons which could be stations, tracks and land use areas, for instance. Most of the geographic vector formats implement the OGC Simple Feature Access ([32]). Each geographic feature is characterised by an arbitrary number of specific attributes usually stored in tables. Vector datasets are either stored in stand-alone files on file-system-level or in databases. The Geospatial Data Abstraction Library (GDAL, [33]) offers functionality for conversion between different vector data formats.

7.1.1.1 ESRI SHAPEFILES (SHP)

The shapefile format is a popular geospatial vector data format for geographic information system (GIS) software. It is developed and regulated by Esri as a (mostly) open specification for data interoperability among Esri and other GIS software products. The shapefile format can spatially describe vector features: points, lines, and polygons, representing, for example, water wells, rivers, and lakes. Each item usually has attributes that describe it, such as name or temperature. Spatial reference system information is usually included as a textual description ([39]).

7.1.1.2 GEOJSON (.JSON, .GEOJSON) GeoJSON is based on the popular JSON format (see section 6.1.1.1) with added support for geometries in form of Point, LineString, Polygon, MultiPoint, MultiLineString and MultiPolygon. Commonly GeoJSON is found in light-weight web mapping applications such as Leaflet/OpenLayers. Spatial reference system information can be included as spatial reference identifier (SRID).
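As an illustration of the geometry types named above, the following sketch builds a small GeoJSON document with a Point and a LineString feature using the standard `json` module. The feature names and coordinates are invented for this example; coordinates are given in the order longitude, latitude.

```python
import json


def feature(geometry_type, coordinates, **properties):
    """Create a GeoJSON Feature with the given geometry and attributes."""
    return {
        "type": "Feature",
        "geometry": {"type": geometry_type, "coordinates": coordinates},
        "properties": properties,
    }


# Hypothetical station and track section.
station = feature("Point", [10.52, 52.27], name="Example Station")
track = feature("LineString", [[10.52, 52.27], [10.55, 52.29], [10.60, 52.30]],
                name="Example Track", gauge_mm=1435)

collection = {"type": "FeatureCollection", "features": [station, track]}
text = json.dumps(collection, indent=2)
print(text)
```

The resulting text can be loaded directly by web mapping libraries and most GIS software.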

7.1.1.3 WELL-KNOWN TEXT (WKT)

Well-known text is a text-based representation of geographic features, which can be of the type Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon and GeometryCollection. It is often used for small-sized, quick and human-readable exchange of geodata. Spatial reference system information is usually not included and has to be provided separately.
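The following minimal sketch shows what such text representations look like for three of the geometry types. The helper functions are written for this example only and cover just Point, LineString and Polygon without holes; the coordinates are arbitrary.

```python
def _pairs(coordinates):
    """Format coordinate pairs as 'x y' separated by commas."""
    return ", ".join(f"{x} {y}" for x, y in coordinates)


def point_wkt(x, y):
    return f"POINT ({x} {y})"


def linestring_wkt(coordinates):
    return f"LINESTRING ({_pairs(coordinates)})"


def polygon_wkt(ring):
    """Format a polygon from its outer ring, closing the ring if necessary."""
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    return f"POLYGON (({_pairs(ring)}))"


print(point_wkt(13.4, 52.5))
print(linestring_wkt([(0, 0), (10, 5), (20, 5)]))
print(polygon_wkt([(0, 0), (1, 0), (1, 1)]))
```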

7.1.1.4 SPATIAL DATABASE Many common database management systems, including Oracle, PostgreSQL, MySQL, SQLite, offer support for geographic vector data through the implementation of the OGC Simple Feature Access with spatial reference system information.

IMS-WP7-D7.1-DLR-006-02-I 54 of 133 D7.1 GA 730569 Open data: a review of the state-of-the-art

7.1.2 Raster Data Raster data represent geographic features as continuous grids of information which can be thematic base maps, aerial imagery, elevation models and arbitrary geo-referenced sensor measurements. Many common image formats such as GIF, JPEG, TIFF and PNG can be turned into geographic raster data by adding a textual meta data file (world file [34]) defining the pixel extent/scale and coordinate system origin of the dataset. The Geospatial Data Abstraction Library (GDAL) offers functionality for conversion between different raster data formats.
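The role of the world file can be sketched as follows. A world file contains six numbers, one per line, in the order A, D, B, E, C, F, which define an affine transformation from pixel (column, row) positions to map coordinates. The function names and the example values below are assumptions for illustration only; in practice a library such as GDAL would normally be used.

```python
def parse_world_file(text):
    """Parse the six numbers of a world file (order: A, D, B, E, C, F)."""
    values = [float(token) for token in text.split()]
    if len(values) != 6:
        raise ValueError("a world file must contain exactly six numbers")
    a, d, b, e, c, f = values
    return a, d, b, e, c, f


def pixel_to_map(col, row, coeffs):
    """Map a pixel position to map coordinates (centre of that pixel)."""
    a, d, b, e, c, f = coeffs
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


# Hypothetical world file: 2 m pixels, no rotation, and the centre of the
# upper-left pixel at x=500000, y=5800000. E is negative because image rows
# run from top to bottom while map northing increases upwards.
example_world_file = "2.0\n0.0\n0.0\n-2.0\n500000.0\n5800000.0\n"
coeffs = parse_world_file(example_world_file)
upper_left = pixel_to_map(0, 0, coeffs)
other_pixel = pixel_to_map(10, 5, coeffs)
print(upper_left, other_pixel)
```

Note that the world file only carries the transformation parameters. Which coordinate reference system the resulting map coordinates refer to has to be known from elsewhere.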

7.1.2.1 GEOTIFF GeoTIFF as extension of TIFF can be considered as the most popular and versatile raster exchange format. It supports different bands/channels with compression and includes spatial reference system information which is natively supported by most GIS.

7.1.3 Geo Web Services Geographic web services serve geodata from vector and raster sources through standardised interfaces for easy access and distribution independent from the underlying and often heterogeneous raw data backends.

7.1.3.1 WEB MAPPING SERVICES (WMS) Web mapping services offer tile-based images of vector and raster data as layers which are rendered on server side for each request. Such tiles can be easily included in client web mapping applications and desktop GIS. Each layer can support different styles for custom visualisation of geographic raw data. For better performance in large-scale applications such tiled mapping services can pre-render tiles (WMS-C, WMTS, TMS) which are then served from a tile store instead of dynamic re-rendering of tiles on each request. Recently also vector tiles are supported as output format offering better integration in mobile applications, smoother rendering and better performance. One drawback of vector tiles is that the styling must be known to the client side for client-based rendering of vector tiles. WMS also support spatio-temporal data offering easy/standardised access to time series of geographic features.
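A request to such a service is an ordinary HTTP GET with standardised query parameters. The following sketch assembles a WMS 1.3.0 GetMap URL with the Python standard library. The endpoint URL and the layer name are hypothetical, and the helper build_getmap_url is an assumption for illustration rather than part of any WMS client library.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layer, bbox, width, height,
                     crs="EPSG:3857", image_format="image/png", style=""):
    """Assemble a WMS 1.3.0 GetMap request URL for a single layer."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": style,  # an empty value requests the default style of the layer
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),  # minx, miny, maxx, maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical server and layer name.
url = build_getmap_url(
    "https://example.org/wms",
    "railway_tracks",
    (1160000, 6840000, 1180000, 6860000),
    512,
    512,
)
print(url)
```

The example uses a projected coordinate reference system. For some geographic reference systems such as EPSG:4326, WMS 1.3.0 expects the bounding box in latitude/longitude axis order, which is a common source of errors.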

7.1.3.2 WEB FEATURE SERVICES (WFS) Web feature services offer direct, standardised raw data access to geographic vector data obscuring different data backend implementations of various data sources. Different vector geodata output formats (see section 7.1.1) are supported depending on the server implementation. Modification of raw data is additionally realised through transactions (WFS-T).

7.1.3.3 WEB COVERAGE SERVICES (WCS) Web coverage services offer direct, standardised raw data access to geographic raster data obscuring different data backend implementations of various data sources. Different raster geodata output formats (see section 7.1.2) are supported depending on the server implementation.


7.1.4 Open Street Map, Open Railway Map

7.1.4.1 OPENSTREETMAP (OSM) OpenStreetMap (OSM) is a collaborative project to create a free editable map of the world. The creation and growth of OSM has been motivated by restrictions on use or availability of map information across much of the world, and the advent of inexpensive portable satellite navigation devices. OSM is considered a prominent example of volunteered geographic information. OSM datasets contain Points, LineStrings and Polygons representing different geographic features. Each geo-feature is tagged with specific attributes. There are no binding tagging rules, only rough guidelines, which can lead to problems when the same type of feature is tagged in different ways by different people ([35]).

7.1.4.2 POINTS OF INTEREST (POI) OSM provides a huge database of common points of interest (POI) in many different domains such as leisure, tourism, traffic, and transport ([36]). These elements can be queried via different APIs online or extracted directly from raw data which is freely available for everyone. The data basically consists of spatial objects with attached attributes describing the features by an arbitrary number of tags ([37]). The data export can be obtained as standardized XML files and can also be converted to many other vector-based spatial data formats, as described in section 7.1.1. This allows OSM data to be used easily in heterogeneous analysis scenarios.
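The tag-based structure of the OSM XML export can be illustrated with a small extraction routine. The XML fragment below is a hand-written, hypothetical example in the style of an OSM export (nodes with id, lat, lon and nested key/value tags), and extract_pois is an illustrative helper, not an OSM API.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written fragment in the style of an OSM XML export.
OSM_XML = """<osm version="0.6">
  <node id="1" lat="52.2689" lon="10.5268">
    <tag k="railway" v="station"/>
    <tag k="name" v="Example Station"/>
  </node>
  <node id="2" lat="52.2700" lon="10.5400"/>
  <node id="3" lat="52.2710" lon="10.5500">
    <tag k="railway" v="signal"/>
  </node>
</osm>
"""


def extract_pois(osm_xml, key, value=None):
    """Return all nodes carrying the given tag key (and value, if given)."""
    root = ET.fromstring(osm_xml)
    pois = []
    for node in root.iter("node"):
        # Collect the arbitrary key/value tags attached to this node.
        tags = {tag.get("k"): tag.get("v") for tag in node.findall("tag")}
        if key in tags and (value is None or tags[key] == value):
            pois.append({
                "id": int(node.get("id")),
                "lat": float(node.get("lat")),
                "lon": float(node.get("lon")),
                "tags": tags,
            })
    return pois


stations = extract_pois(OSM_XML, "railway", "station")
railway_nodes = extract_pois(OSM_XML, "railway")
print(stations)
print([poi["id"] for poi in railway_nodes])
```

Since tagging is only loosely regulated, a real extraction would typically have to check several alternative keys and values for the same kind of feature.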

7.1.4.3 OPENRAILWAYMAP The OpenRailwayMap is a collaborative project to create a map of the world’s railway infrastructure. The map is based on the OpenStreetMap project but extended for the railway domain. It can be used to display railway-specific information such as signals, infrastructure elements and their meta information. It covers diverse rail-bound systems such as railways, subways and trams. The project was founded in 2011 as “Bahnkarte” and has been known as OpenRailwayMap since 2013 ([38]). As in OSM, the available data was uploaded by individuals, companies and institutions that are willing to share their data with the rest of the community. Depending on how and when the data was recorded, the information could be outdated or might not represent the exact real-world position of an element. The main motivations of the project are:
- Worldwide coverage
- Open source and open data
- Up-to-date and detailed
- OpenStreetMap
There is also a country-specific tagging scheme, so that elements or signals that are specific to a country can also be modelled.


7.2 SENSOR / MEASUREMENT DATA The integration of measurement data coming from distributed sensors, often referring to the same system or asset, is becoming increasingly important with the advances in sensor technology and network technology. Sensors provide measurements that could be used both for dynamic monitoring and for analytics purposes. Sensor measurement data can be associated to additional properties (metadata) that can be used for discovery and for understanding the nature of the object and for qualifying the output.

7.2.1 sensorML

7.2.1.1 OGC - SENSOR WEB ENABLEMENT The Open Geospatial Consortium (OGC) [40] is an international consortium of industry, academic and government organizations that collaboratively develop open standards for geospatial and location services. The OGC standardization activities that focus on sensors, sensor networks and the Sensor Web are known as Sensor Web Enablement (SWE) [41].

The functionalities targeted by OGC within SWE include:
- Discovery of sensor systems, observations, and observation processes;
- Establishing of a sensor’s capabilities and quality of measurements;
- Access to sensor parameters that automatically allow software to process and geo-locate observations;
- Retrieval of real-time or time-series observations and coverages in standard encodings;
- Tasking of sensors to acquire observations of interest;
- Subscription to and publishing of alerts to be issued by sensors or sensor services.

To achieve its objectives the SWE initiative has created a framework including several OGC standards harmonized with other OGC standards for geospatial processing:
- Sensor Model Language (SensorML) – Standard models and XML Schema for describing the processes within sensor and observation processing systems.
- Observations & Measurements (O&M) – The general models and XML encodings for observations and measurements.
- Sensor Observation Service (SOS) – Open interface for a web service to obtain observations and sensor and platform descriptions from one or more sensors.
- Sensor Planning Service (SPS) – An open interface for a web service by which a client can 1) determine the feasibility of collecting data from one or more sensors or models and 2) submit collection requests.
- SWE Common Data Model – Defines low-level data models for exchanging sensor related data between nodes of the OGC Sensor Web Enablement (SWE) framework.
- SWE Service Model – Defines data types for common use across OGC Sensor Web Enablement (SWE) services. Five of its packages define operation request and response types.
- PUCK Protocol Standard – Defines a protocol to retrieve a SensorML description, sensor "driver" code, and other information from the device itself, thus enabling automatic sensor installation, configuration and operation.

7.2.1.2 SENSORML SensorML [42] is one of the implementation standards included in the SWE suite. It defines conceptual models and XML Schema encoding for describing sensors and measurement processes. The primary focus of SensorML is to provide a framework for defining processes and processing components associated with measurement and post-measurement transformation of observations. The common framework provided by SensorML is particularly well-suited for the description of sensors and systems and the processes underlying the act of measurement and subsequent processing of observations. Sensor and transducer components (detectors, transmitters, actuators and filters) are all modeled as physical processes that are interconnected and participate equally within a system. The basic entities of the model are processes which take one or more inputs and produce one or more outputs through the application of well-defined methods and configurable parameters. The SensorML process model also allows explicit linking between processes, using a composite pattern to define aggregate processes (e.g. chains, networks and workflows). The current version of SensorML is version 2.0. SensorML is heavily dependent on the SWE Common Data Model standard for defining inputs, outputs, and parameters, as well as for specifying characteristics, capabilities, interfaces, and event properties. The SWE Common Data Models, which were originally defined within the version 1.0 SensorML specification, are in version 2.0 defined as a separate specification and are utilized throughout the SWE family of encoding and web service specifications. The SWE Common Data Model is intended to be used for describing static data (files) as well as dynamically generated datasets (on the fly processing), data subsets, process and web service inputs and outputs and real time streaming data. UML is used to describe both SensorML and SWE Common Data models.
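The process-centred view and the composite pattern described above can be sketched in a few lines. This is a conceptual illustration only: the class names SimpleProcess and AggregateProcess, the parameter names and the numeric values are assumptions, and the sketch does not reproduce the SensorML schema or its XML encoding.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SimpleProcess:
    """A process with one input, one output, a method and configurable parameters."""
    name: str
    method: Callable[[float, Dict[str, float]], float]
    parameters: Dict[str, float] = field(default_factory=dict)

    def execute(self, value: float) -> float:
        return self.method(value, self.parameters)


@dataclass
class AggregateProcess:
    """Composite pattern: a chain of processes that itself behaves like a process."""
    name: str
    components: List  # SimpleProcess or AggregateProcess instances

    def execute(self, value: float) -> float:
        # The output of each component is linked to the input of the next one.
        for component in self.components:
            value = component.execute(value)
        return value


# Hypothetical detector (linear response) followed by a clipping filter.
detector = SimpleProcess(
    "detector",
    lambda v, p: v * p["gain"] + p["offset"],
    {"gain": 0.5, "offset": 1.0},
)
clip_filter = SimpleProcess(
    "filter",
    lambda v, p: min(max(v, p["low"]), p["high"]),
    {"low": 0.0, "high": 10.0},
)
sensor_system = AggregateProcess("sensor_system", [detector, clip_filter])

in_range = sensor_system.execute(4.0)
saturated = sensor_system.execute(100.0)
print(in_range, saturated)
```

Because an aggregate exposes the same execute interface as a simple process, aggregates can be nested, which corresponds to the chains, networks and workflows mentioned above.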

Within IN2RAIL WP9, SensorML has been selected for dynamic data representation; among different options for the encoding of the sensor data the decision taken was to enable access to the data via RESTful web service interfaces and simple text serialization.


7.2.2 LAS file format The LAS file format is an open file format for the interchange of 3-dimensional point cloud data between users of different systems. It was mainly developed for the exchange of LiDAR (light detection and ranging) or other point cloud data but supports the exchange of any 3-dimensional data tuple. This binary file format is an alternative to proprietary systems or generic ASCII file interchange systems used by many companies ([43]). The LAS 1.4 Specification was approved by the ASPRS in November 2011 and is the most recent approved version of the document ([44]). It stores an x, y, z coordinate set per point and additional information such as the intensity or magnitude of the return value and user-defined values, such as point classification. Though the LAS format is widely adopted and used, it is not spatially indexed and does not provide generalizations, which causes problems when working with very large datasets. Spatial indexing allows locating all the points within a given area quickly without scanning the whole file, and generalization allows a representative subset of the points to be used for visualization at small scales. Potential use cases in the project could be the exchange of sensor data gathered by drones or laser scanners. Two-dimensional image sequences gathered by drones could also be used to derive 3D point clouds by using structure from motion (SfM), a photogrammetric range imaging technique that may be coupled with local motion signals, and the results could be shared using the LAS format.
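What spatial indexing and generalization add on top of a plain point list can be sketched with a simple uniform grid. This is an assumption-laden illustration on in-memory tuples: it does not read LAS files, the function names are hypothetical, and real tools typically use more elaborate structures such as quadtrees or octrees.

```python
from collections import defaultdict


def build_grid_index(points, cell_size):
    """Assign each (x, y, z) point to a square grid cell keyed by (column, row)."""
    index = defaultdict(list)
    for i, (x, y, _z) in enumerate(points):
        index[(int(x // cell_size), int(y // cell_size))].append(i)
    return index


def query_box(points, index, cell_size, xmin, ymin, xmax, ymax):
    """Return indices of points inside the box, visiting only overlapping cells."""
    hits = []
    for cx in range(int(xmin // cell_size), int(xmax // cell_size) + 1):
        for cy in range(int(ymin // cell_size), int(ymax // cell_size) + 1):
            for i in index.get((cx, cy), []):
                x, y, _z = points[i]
                if xmin <= x <= xmax and ymin <= y <= ymax:
                    hits.append(i)
    return sorted(hits)


def generalize(index):
    """Keep one representative point per cell (a crude form of generalization)."""
    return sorted(ids[0] for ids in index.values())


# Hypothetical point cloud with four points.
points = [(0.5, 0.5, 1.0), (1.5, 0.2, 2.0), (9.0, 9.0, 3.0), (2.5, 2.5, 0.5)]
fine_index = build_grid_index(points, cell_size=1.0)
in_box = query_box(points, fine_index, 1.0, 0.0, 0.0, 2.0, 1.0)
coarse_index = build_grid_index(points, cell_size=5.0)
representatives = generalize(coarse_index)
print(in_box, representatives)
```

The query only touches the grid cells overlapping the search box instead of scanning all points, and the coarse grid yields a thinned subset suitable for small-scale display.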

7.3 MAINTENANCE In this section the topic of open data exchange for maintenance is explored. This subject includes the areas of:
- asset design information – how open data exchange is achieved both during the design and construction of assets, and the transfer of asset information from construction projects to the “operate and maintain” phase of the asset life (see 7.3.1),
- maintenance management – how open data exchange is achieved during the operational phase of the asset life (see 7.3.2),
- asset condition – how open data exchange is achieved for asset condition assessment and evaluation (see 7.3.3),
- asset alarms – standardization around events and notifications of asset state (see 7.3.4).

7.3.1 Building Information Modeling – BIM

7.3.1.1 BACKGROUND Building Information Modeling has its roots within the construction and building industry. The term is used to refer to software tools, design processes and structured data models used throughout the design, construction and maintenance phases of the asset lifecycle. Unlike CAD tools, which optimize traditional pen and paper design and drawing processes, the BIM approach to design is model driven. Complex assets are modeled from real construction elements such as slabs, windows, walls and roof. The model is stored as structured data and is a digital representation of the physical and functional characteristics of a facility. Drawings are generated using rendering software that takes the BIM data model and combines it (for example) with a camera position and definition, and automatically generates a drawing of the asset from that perspective.

Through the use of models rather than drawings, the effect of change on elements within a design can be rapidly evaluated for impact on other areas and the construction drawings automatically updated. This delivers a significant reduction in effort required for all phases of asset life - design, build and operation/maintenance. The use of a model also enables rapid iterations of designs between teams of different disciplines, significantly reducing the risk of design conflict and the incidence of conflict discovery during construction. Realizing the potential benefits of having a full BIM model for assets requires that the model is maintained throughout the asset life. For example, this enables decisions to be made against the current asset without the need for re-surveying.

The use of BIM approaches and modeling is growing within the rail industry with governments and administrations mandating the use of BIM in projects. Examples include:
1. Norwegian National Rail Administration – the rail administrator’s design manual defines the types of models required and the content of the models. It does not define the tools or methods to be used. The overall coordination model is a combination of base models and rail discipline models. Base models include map data, surveyed data, rail data, water and services pipelines and underground data. The discipline models include track, superstructures, signaling, telecoms, electrical distribution and substructures such as tunnels. ([45])
2. UK Rail Industry – Crossrail and Brighton Mainline upgrade investment projects are two examples where BIM is being used to significantly de-risk project development and implementation. In addition, the UK Government has mandated BIM level 2 on all centrally procured HM Government projects by 4th April 2016, and is currently on track in delivering a strategy for Level 3. ([46])

7.3.1.2 OPENBIM AND BUILDINGSMART From the OpenBIM web site ([47]), OpenBIM is described as: OpenBIM is a universal approach to the collaborative design, realization and operation of buildings based on open standards and workflows. OpenBIM is an initiative of buildingSMART and several leading software vendors using the open buildingSMART Data Model. The buildingSMART core is based around a common model called IFC that enables the storage and exchange of BIM information between software applications. These models are captured as ISO standards – as illustrated below (reproduced from http://buildingsmart.org/ifc/).


Figure 13: ISO Standards related to BIM

7.3.2 Maintenance Management

7.3.2.1 BACKGROUND Asset maintenance is used within the Rail Industry to mitigate the safety risks associated with rail undertaking. It is also used to maximize asset availability to deliver the service promised. There are many approaches to asset maintenance; an example of these as a measure of maintenance maturity is illustrated below.

[Figure content: horizontal axis "Maintenance Approach", vertical axis "Equipment Reliability and Availability"; approaches from lowest to highest: Fault repair only, Inspect and service, Preventive maintenance, Systematic planning and scheduling, Predictive Maintenance, Diagnostics, Reliability engineering]

Figure 14: Effectiveness of asset maintenance methodologies on asset reliability and availability


In summary, the methodologies illustrated are:
- Fault repair only – maintenance interventions are only performed when an asset has failed to perform the requested function.
- Inspect and service – an unstructured approach to asset condition management, with ad-hoc inspections and service actions.
- Preventive maintenance – following a time or use based regime, usually stipulated by the manufacturer to mitigate in-warranty failure.
- Systematic planning and scheduling – following a time or use based regime, designed around risk assessment and barriers to hazards, threats and consequences. Most common in developed rail operators.
- Predictive maintenance – the use of condition monitoring to detect symptoms of emerging failure modes, initiating a maintenance intervention prior to asset failure, with the goal of preventing in-service failures (predict and prevent).
- Diagnostics – using derived diagnoses to reduce the time spent maintaining an asset in response to an alarm on asset condition (emerging fault or full failure) and increasing the effectiveness of predictive maintenance through better knowledge prior to attending site.
- Reliability engineering – combining asset knowledge derived from asset performance with asset design and manufacture to design for dependability for the planned life and usage of the asset.
Open standards for interfaces and data that support these maintenance methodologies have been in development over a number of years. One example of this is from MIMOSA – the Open System Architecture for Enterprise Application Integration.

7.3.2.2 MIMOSA OSA-EAI MIMOSA is an operations and maintenance information open system alliance. It is a non-profit industry association that is focused on solutions that leverage supplier neutral, open standards to establish an interoperable industrial ecosystem for Commercial Off The Shelf (COTS) solution components provided by major industry suppliers. MIMOSA maintains a specification: Open System Architecture for Enterprise Application Integration (OSA-EAI). The OSA-EAI specification provides an information exchange standard to allow sharing asset registry, condition, maintenance and reliability information between enterprise systems; and a relational database model to allow storage of the same asset information. The specification is maintained as a UML model and is freely downloadable from the MIMOSA web site. It is aligned to the Condition Monitoring and Diagnosis Information Architecture as set out in ISO 13374-2:2007. The MIMOSA site illustrates the information scope with the following diagram: (reproduced from [48])


Figure 15: MIMOSA – Open Asset Information Model

The specification is described in four main areas:
1. Open object registry management
2. Open maintenance management
3. Open reliability management
4. Open condition management
The exchange of information is supported through the definition of an XML schema that can be exchanged over a variety of transport options, including files, HTTP and SOAP web services. The MIMOSA site illustrates the overall OSA-EAI architecture using the diagram below (reproduced from [48]).


Figure 16: MIMOSA Open System Architecture for Enterprise Application Integration (OSA-EAI)

The database model (CRIS) is represented as a logical and physical model, with direct Oracle and Microsoft SQL Server support for both table creation and reference data population.

7.3.3 Asset Condition

7.3.3.1 BACKGROUND Rail infrastructure maintainers have traditionally relied on a prescriptive, time based preventive maintenance strategy to manage asset condition. These methodologies include formal inspection and asset condition assessment executed by maintenance teams. Asset condition monitoring systems have been implemented by infrastructure maintainers to supplement the preventive maintenance strategies with predictive maintenance and to inform Reliability Centric Maintenance strategies. Traditional condition monitoring systems are often based on the SCADA systems model, being tightly integrated applications where the concept of modularity is not applied to the processing blocks within the system. This tight coupling results in applications that are not easily extended from a processing perspective. Within an environment where new algorithms or insights are still being developed (for example, in the rail industry), algorithms need to be quickly evaluated and de-risked before being generally applied for predicting and preventing failures across a rail operators’ estate.


An open, modular, loosely coupled approach to integrating data and condition processing blocks into a predictive maintenance system enables the rapid evaluation of algorithms without introducing significant risk to existing processing and benefits.

7.3.3.2 MIMOSA OSA-CBM In recognition that, historically, Condition Monitoring and Diagnostics (CM&D) systems are often tightly integrated, ISO 13374 (parts 1-3) sets out requirements for an open CM&D processing architecture. The purpose of this open architecture is to enable asset condition data to be processed and communicated in a plug-and-play manner. MIMOSA OSA-CBM (Open System Architecture for Condition Based Maintenance) is an implementation of the ISO 13374 CM&D processing architecture, adding data structures and interface method definitions for the blocks defined in the standard. It is modeled using UML and is distributed under a non-exclusive, royalty free, perpetual license: http://www.mimosa.org/sites/default/files/policies-charters/MIMOSA_License_Agreement. The processing architecture is identified as having six functional blocks: Data Acquisition (DA), Data Manipulation (DM), State Detection (SD), Health Assessment (HA), Prognostic Assessment (PA) and Advisory Generation (AG).

[Figure content: functional blocks stacked from bottom to top: Data Acquisition (fed by Sensor / Transducer / manual entry), Data Manipulation, State Detection, Health Assessment, Prognostic Assessment, Advisory Generation; side labels: "External systems, data archiving and block configuration" and "Technical displays and information presentation"]

Figure 17: OSA-CBM functional blocks

1. Data Acquisition blocks are responsible for transforming the output of a transducer or sample test to a scaled digital representation.
2. Data Manipulation blocks calculate descriptors and identify features of interest from sampled sensor data, other descriptors or the output of computations (for example, an average or calculated duration).
3. State Detection blocks categorize data and generate descriptors for a measurement, component or system as normal or abnormal, including the degree of abnormality in the associated operational context (for example, is the average current greater than a threshold).
4. Health Assessment blocks assess a component’s or system’s current health state with associated diagnoses of discovered abnormal states in the associated operational context (for example, Point Machine is at 60% health and is showing symptoms of an emerging brush fault with 80% confidence).
5. Prognostic Assessment blocks assess a component’s or system’s future health state with the associated predicted abnormal states and remaining life for a projected operational context (for example, Point Machine will reach critical health of 35% in two days based on normal timetable operation and current planned maintenance, with a confidence of 70%).
6. Advisory Generation blocks integrate information to generate advisories to operations and maintenance and to respond to capability forecast assessment requests (for example, a recommendation to maintain a point machine in the overnight maintenance window tonight as the asset has enough remaining useful life to support normal timetable operation until then, but not enough to reach the next scheduled maintenance intervention, with a confidence of 60%).

This model can be considered as a form of maturity model, where the most data and least value is at the DA layer and the least data of highest value is derived at the AG layer. Although diagrammatically presented as a sequence from DA, to DM, to SD, to HA, to PA and finally to AG, the model allows for a functional block at any level to ingest and process data from a functional block at any other level. For example, a SD block may consume data directly from a DA functional block. When realized through implementation on an appropriate software architecture, such as a service bus or middleware, this model enables a plug-and-play approach to algorithm evaluation and implementation at any of the six levels in a loosely coupled manner. This enables the system owner to introduce new functions at any of the conceptual levels as and when they are available, greatly reducing the risks and costs associated with change to a tightly coupled system.
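The loosely coupled, layered processing described above can be sketched as a chain of independent functions, one per functional block. The following is a conceptual illustration only: the function names, the dictionary-based data exchange, the threshold and the health scoring rule are assumptions and do not reproduce the OSA-CBM data structures or interface definitions. Only the first four levels (DA, DM, SD, HA) are shown.

```python
from statistics import mean


def data_acquisition(raw_counts, scale):
    """DA: transform raw transducer counts into scaled engineering values."""
    return [count * scale for count in raw_counts]


def data_manipulation(samples):
    """DM: calculate a descriptor (here the average) from the sampled data."""
    return {"average_current": mean(samples)}


def state_detection(descriptors, threshold):
    """SD: categorize the descriptor as normal or abnormal against a threshold."""
    value = descriptors["average_current"]
    return {"abnormal": value > threshold, "excess": max(0.0, value - threshold)}


def health_assessment(state, threshold):
    """HA: derive a health score in percent (hypothetical scoring rule)."""
    health = max(0.0, 100.0 * (1.0 - state["excess"] / threshold))
    return {"health_percent": health, "abnormal": state["abnormal"]}


def run_chain(value, blocks):
    """Pass the data through the configured blocks in order."""
    for block in blocks:
        value = block(value)
    return value


THRESHOLD = 5.0  # hypothetical alarm threshold in amperes
chain = [
    lambda raw: data_acquisition(raw, scale=0.5),
    data_manipulation,
    lambda descriptors: state_detection(descriptors, THRESHOLD),
    lambda state: health_assessment(state, THRESHOLD),
]
result = run_chain([8, 12, 16], chain)
print(result)
```

Because each block only depends on the data it receives, a block can be replaced or a new one inserted by editing the list of blocks, which mirrors the plug-and-play idea. In a real system the blocks would communicate via a service bus or middleware rather than direct function calls.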

7.3.4 Alarm Systems

7.3.4.1 BACKGROUND EEMUA 191 defines an alarm as a means of indicating to an operator an equipment or process malfunction or an abnormal condition. Alarm systems support operators in generating and handling alarms and in managing abnormal situations.


7.3.4.2 ALARM STATE MODEL IEC 62682 is an international technical standard on the management of alarm systems for the process industries. It does not define an open data exchange format or model, however, it defines an alarm state model that is applicable directly to alarm management in any industry including rail. Using a common alarm state model across alarm systems can greatly reduce the complexity of integration and alarm definition.

[Figure content: alarm state diagram.
States: A Normal (Process: Normal, Alarm: Not active, Ack: Acknowledged); B Unacknowledged alarm (Process: Abnormal, Alarm: Active, Ack: Unacknowledged); C Acknowledged alarm (Process: Abnormal, Alarm: Active, Ack: Acknowledged); D RTN unacknowledged (Process: Normal, Alarm: Not active, Ack: Unacknowledged); E Shelved, F Suppressed by design and G Out of service (each with Process: N/A, Alarm: Not active, Ack: N/A).
Transitions: A to B on abnormal condition; B to C on acknowledge; B to D on return to normal condition; C to A on return to normal condition; C to B on re-alarm; D to A on acknowledge; D to B on abnormal condition; E, F and G are entered and left via shelve / un-shelve, designed suppression / designed unsuppression and remove from service / return to service.]

Figure 18: IEC 62682 Alarm State Model

The alarm states are defined in IEC 62682 as:
- Normal state (A): The normal (NORM) alarm state is defined as the state in which the process is operating within normal specifications, the alarm is inactive and past alarms have been acknowledged.
- Unacknowledged state (B): The unacknowledged alarm (UNACK) state is the initial state of an alarm becoming active due to abnormal conditions. In this state the alarm is unacknowledged. Previously acknowledged alarms can be designed to re-alarm, causing a return to this state.
- Acknowledged state (C): The acknowledged (ACKED) alarm state is the state in which the alarm is active and the operator has acknowledged the alarm.
- Return to normal unacknowledged state (D): In the returned to normal unacknowledged (RTNUN) alarm state, the process is within normal limits and the alarm becomes inactive before an operator has acknowledged the alarm condition.
- Shelved state (E): In the shelved (SHLVD) alarm state an alarm is temporarily suppressed using a controlled methodology, and not annunciated. An alarm in the shelved state is under the control of the operator. The shelving function can automatically unshelve alarms.
- Suppressed-by-design state (F): In the suppressed-by-design (DSUPR) alarm state an alarm is suppressed based on operating conditions or plant states, and not annunciated. An alarm in the suppressed-by-design state is under the control of logic that determines the relevance of the alarm.
- Out-of-service state (G): In the out-of-service (OOSRV) alarm state an alarm is manually suppressed (e.g., control system functionality to remove alarm from service) when it is removed from service, typically for maintenance, and not annunciated. An alarm in the out-of-service state is under the control of maintenance.
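The core of this state model can be expressed as a small state machine. The sketch below covers the four states A to D and the transitions between them as described in the definitions above. The event names and the function next_state are assumptions for illustration, and the transitions into and out of the shelved, suppressed-by-design and out-of-service states are deliberately left out because their target states depend on the alarm system design.

```python
from enum import Enum


class AlarmState(Enum):
    NORM = "A"   # normal
    UNACK = "B"  # unacknowledged alarm
    ACKED = "C"  # acknowledged alarm
    RTNUN = "D"  # returned to normal, unacknowledged
    SHLVD = "E"  # shelved (transitions not modelled here)
    DSUPR = "F"  # suppressed by design (transitions not modelled here)
    OOSRV = "G"  # out of service (transitions not modelled here)


# (current state, event) -> next state; the event names are hypothetical.
TRANSITIONS = {
    (AlarmState.NORM, "abnormal_condition"): AlarmState.UNACK,
    (AlarmState.UNACK, "acknowledge"): AlarmState.ACKED,
    (AlarmState.UNACK, "return_to_normal"): AlarmState.RTNUN,
    (AlarmState.ACKED, "return_to_normal"): AlarmState.NORM,
    (AlarmState.ACKED, "re_alarm"): AlarmState.UNACK,
    (AlarmState.RTNUN, "acknowledge"): AlarmState.NORM,
    (AlarmState.RTNUN, "abnormal_condition"): AlarmState.UNACK,
}


def next_state(state, event):
    """Return the state reached from `state` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state.name}") from None


# Typical alarm life cycle: the alarm is raised, acknowledged and then clears.
state = AlarmState.NORM
history = [state.name]
for event in ["abnormal_condition", "acknowledge", "return_to_normal"]:
    state = next_state(state, event)
    history.append(state.name)
print(history)
```

Keeping the transitions in a single table makes it straightforward to share the same state model between different alarm systems, which is the integration benefit referred to above.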

7.4 PROCESS MINING / BUSINESS PROCESS ANALYTICS The need for companies to learn more about how their processes operate in the real world is a major driver behind the development and increasing use of process-mining techniques. The practice of business process mining derives from the field of data mining. Data mining refers to the extraction of knowledge from large data sets through identification of patterns within the data. Data mining practice has been developed and adapted to create the business process-mining techniques that are now being used to mine data logs containing process execution data to reconstruct actual business processes. Business process-mining techniques use execution logs of business processes. These are typically hosted within business process management (BPM) systems, though they may also be accessible through other process-related systems installed within a company (see [49]). There are many techniques that may be used to perform mining of business processes:
- Genetic algorithms.
- General algorithmic approach.
- Markovian approach.
- Neural network.
- Cluster analysis.
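How an execution log can be turned into information about the actual process flow is illustrated by the following sketch. It counts how often one activity directly follows another within the same case, a simple counting step that is not one of the techniques listed above but shows the kind of input they work on. The log entries (maintenance work orders) and the function name are hypothetical.

```python
from collections import Counter, defaultdict


def directly_follows(event_log):
    """Count how often activity b directly follows activity a within a case.

    event_log is an iterable of (case_id, activity) pairs in time order.
    """
    traces = defaultdict(list)
    for case_id, activity in event_log:
        traces[case_id].append(activity)
    counts = Counter()
    for trace in traces.values():
        # Pair every activity with its immediate successor in the same case.
        counts.update(zip(trace, trace[1:]))
    return counts


# Hypothetical execution log of two maintenance work orders.
log = [
    ("wo1", "create"), ("wo1", "inspect"), ("wo1", "repair"), ("wo1", "close"),
    ("wo2", "create"), ("wo2", "inspect"), ("wo2", "close"),
]
relations = directly_follows(log)
for (first, second), count in sorted(relations.items()):
    print(f"{first} -> {second}: {count}")
```

The resulting relation already reveals that the repair step is skipped in some cases, which is the type of deviation between designed and actual process that process mining aims to uncover.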


The BPM life cycle consists of the following stages (see [50]):
• Process design. In this stage, fax- or paper-based as-is business processes are electronically modeled into BPMS. Graphical standards are dominant in this stage.
• System configuration. This stage configures the BPMS and the underlying system infrastructure (e.g. synchronization of roles and organization charts from the employees’ accounts in the company’s active directory). This stage is hard to standardize due to the differing IT architectures of different enterprises.
• Process enactment. Electronically modeled business processes are deployed in BPMS engines. Execution standards dominate this stage.
• Diagnosis. Given appropriate analysis and monitoring tools, the BPM analyst can identify and improve on bottlenecks and potential fraudulent loopholes in the business processes. The tools to do this are embodied in diagnosis standards.

The main issues still encountered in business process mining are (see [50]):
• Noise. Logged data may be incorrect or incomplete, creating problems when data is being mined.
• Hidden tasks. Tasks that exist but cannot be found in the data.
• Duplicate tasks. Two process nodes may refer to the same process model.
• Non-free choice constructs. These are controlled choices that depend on choices made in other parts of the process model.
• Mining loops. A process may be executed several times; loops may be simple, involving one or more events, or more complex.
• Different perspectives. Process events may be appended with additional information for mining purposes.
• Delta analysis. Comparison of process model and reference model to check for similarity/disparity.
• Visualising results. The results of process mining may be presented in graphical form in terms of a management panel.
• Heterogeneous results. Access to information systems based on different platforms.
• Concurrent processes. Mining of processes occurring at the same time.
• Local/global search. Local strategies restrict the search space and are less complex; global strategies are complicated but have a better chance of finding the optimal solution.
• Process re-discovery. The selection of a mining algorithm which can rediscover a class of process models from a complete workflow log.
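To illustrate the "mining loops" and "concurrent processes" issues, the following minimal sketch derives a directly-follows relation from an event log and flags loop candidates. It is not taken from [50]; the log contents and all function names are hypothetical and serve only as an illustration.

```python
from collections import Counter

def directly_follows(log):
    """Count how often activity a is immediately followed by activity b."""
    counts = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts

def loop_candidates(counts):
    """Return self-loops (a, a) and pairs seen in both orders (a, b) and (b, a)."""
    self_loops = {a for (a, b) in counts if a == b}
    two_loops = {
        tuple(sorted((a, b)))
        for (a, b) in counts
        if a != b and (b, a) in counts
    }
    return self_loops, two_loops

# Hypothetical event log: one list of activity names per process instance.
log = [
    ["register", "check", "decide"],
    ["register", "check", "check", "decide"],
    ["register", "check", "repair", "check", "decide"],
]

counts = directly_follows(log)
self_loops, two_loops = loop_candidates(counts)
```

A pair observed in both orders may be a loop of length two or two concurrent activities; the directly-follows relation alone cannot distinguish the two cases, which is one reason why loops and concurrency appear in the list of open issues.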

7.5 BUSINESS PROCESS DATA EXCHANGE STANDARDS
With intensified globalisation, the effective management of an organisation’s business processes became ever more important. Many factors contribute to this, such as ([51]):
• the rise in frequency of goods ordered;
• the need for fast information transfer;
• quick decision making;
• the need to adapt to change in demand;
• more international competitors; and
• demands for shorter cycle times.
Many new BPM terminologies and technologies are not well defined and not well understood by many of the practitioners and researchers using them. Newly proposed languages and notations often contain duplicating features for similar concepts. Standardisation groups (e.g. OMG) which pioneered interchange standards often claim their creations as the missing link between the business analyst and the IT specialist. There are currently three prominent groups of interchange standards:
(1) BPDM by OMG.
(2) XPDL by the WfMC.
(3) B2B information exchange standards.

7.5.1 Business Process Definition Metamodel (BPDM)
The BPDM is an XML-based proposal by the OMG. It was initiated following an RFP issued on 31 January 2003 and is still in its formative years. At the time of writing, the finalization of the specifications is underway (see [70]). BPDM provides the capability to represent and model business processes independent of notation or methodology, thus bringing different approaches together into a cohesive capability. As its name suggests, the BPDM was meant to be the authoritative meta-object facility (an abstract modelling language by the OMG) metamodel for the common elements in process definitions (see [44]). The metamodel behind BPDM captures business processes in a very general way and provides an XML syntax for storing and transferring business process models between tools and infrastructures. Various tools, methods and technologies can then map their way to view, understand and implement processes to and through BPDM (see [70]). This means that BPDM works like a multi-lingual standards translator with a common platform. BPDM is not as concerned with graphical notation as with semantics. It is conceivable that vendors will choose to maintain their existing notations but use the OMG BP metamodel to facilitate the transfer of information to other tools and models. In other words, a variety of different notations can continue to thrive in the OMG BP metamodel. In the long run, however, the OMG will probably move most companies toward UML AD. BPDM is also criticised as a complex and user-unfriendly standard, and it is relatively immature, with no software tool using it.

7.5.2 XML Process Definition Language (XPDL)
The XML-based XPDL has stood the test of time and will mark its tenth-year anniversary in 2008. XPDL started in 1995 when the WfMC published the workflow reference model identifying five key interfaces necessary for any WfMS. One of the interfaces was for defining business processes. It includes a process definition expression language developed via a programmatic interface (i.e. process definition tool) to transfer the process definition to/from the workflow management system (see [71]). From 2002 to 2004, XPDL was an influential standard for the interchange of process design. This was especially so after the WfMC endorsed BPMN as a graphical standard in 2004 and XPDL was enhanced to represent the concepts present in a BPMN diagram in XML. This extension made XPDL ideal not only as a definition (i.e. execution) standard for business processes, but also as an interchange format between BPMN and XML-based execution standards (e.g. BPEL). The third revision of XPDL (XPDL 2.0) was released by the WfMC in 2005. Today, about 70 different BPM-related software products are based on XPDL. As its flow control features cannot be compared to those of BPEL and BPML ([122]), the main strength of XPDL still remains in its interchange capabilities, which are its selling point. There are currently over 70 products and applications that leverage XPDL on Java, Microsoft .NET Framework, or Linux. Some examples include Oracle 9i Warehouse Builder, IDS Scheer Business Architect, and BEA Enterprise Repository and BPM Suite.

7.5.3 B2B Information Exchange Standards
• Electronic data interchange. Electronic data interchange (EDI), one of the early B2B information exchange standards, was created for communications between different proprietary formats of collaborating partners. There are two predominant forms of EDI: the American National Standards Institute X12 standards and the European UN/EDIFACT standards. In 1987, the International Organisation for Standardisation (ISO) adopted the EDIFACT standard. EDI serves to facilitate document exchange between companies. It is a medium for exchanging business documents with external entities, and integrating the data from those documents into the company’s internal systems. This is done via a value-added network, which is like a post office that forwards the data bundles to their designated businesses for a service fee (see [72]).
• ebXML BPSS. The Electronic Business using eXtensible Markup Language (ebXML) was formalised in 2001 as a joint initiative between the United Nations Centre for Trade Facilitation and Electronic Business (UN/CEFACT) and OASIS. Presently, it is a full set of ISO standards maintained by its two contributing organisations. ebXML’s stated objective was to make it possible for any business of any size in any industry to do business with any other company anywhere in the world. The initial hope was that the presence of an accepted international e-business standard would motivate small business software developers to support ebXML. ebXML is a collection of general standards which are not specific to any business (i.e. horizontal standards), while RosettaNet comprises specific standards that cover their domain thoroughly (i.e. vertical standards). ebXML is adopted at much lower cost than RosettaNet (see [51]).
• RosettaNet. RosettaNet, launched in June 1998, aims to standardise supply chain interactions by creating interoperable collaborative business processes. Member companies transact billions of dollars within their trading networks using partner interface process (PIP) specifications. PIPs are system-to-system, XML-based dialogues that represent operational-level collaborative business processes. Each PIP defines how two specific processes, running in two different partners’ organizations, are standardized and interfaced across the entire supply chain. PIPs include all business logic, message flow, and message contents to align the two business processes. The entire scope of RosettaNet processes is divided into seven clusters containing all supply chain processes: partner product and service review, product information, order management, inventory management, marketing information management, service and support, and manufacturing (see [51]).
• Universal Business Language. Universal Business Language (UBL) is a royalty-free library of XML-based, commonly used business documents such as purchase orders, invoices, legal documents, etc. It is an international effort by OASIS, designed to eliminate the re-keying of data in existing fax- and paper-based business correspondence and provide an entry point into electronic commerce for small and medium-sized businesses. Its second version, UBL 2.0, was released in 2006 (see [51]).

7.6 STRENGTHS AND WEAKNESSES OF INTERCHANGE STANDARDS
The strengths of interchange standards include:
• interchange standards offer a “globally accepted” file format in which to save process definitions, making business process models in different BPMS compatible; and
• XPDL is well-accepted and stable, having had a ten-year history.
The shortcomings of interchange standards include (see [51]):
• Owing to fundamental differences between graph-oriented graphical standards and block-oriented execution standards, the quality of transformation of the interchange standards is limited by different syntax and structures. For instance, a cyclical and temporal implication in a graphical standard cannot be easily transformed into an execution standard. The translation of recursive capabilities from an execution standard to a graphical standard is an even more challenging task.
• Currently in the industry, translation from graphical to execution standards is easier than that from execution to graphical standards. This applies to XPDL and even BPDM. This limitation raises doubts as to whether the “bridge between the business analyst and the IT specialist” is in sight.


8 OPEN DATA EXCHANGE: USAGE IN DOMAINS AND RELEVANT COMMUNITIES

In section 8 the usage of the most important aforementioned technologies and applications in different domains is described, and relevant communities as well as places to go are sketched. This is done first for the railway domain, as the state of the art in this field gives a basis to anyone who works or will work in it. The usage in other domains may give hints on the possible adoption of formats and applications, and on places to go for evaluating further details.

8.1 RAILWAY
8.1.1 Open data provided by European infrastructure Managers
There is a range of open data feeds available from European railways. The tables below present an overview of some of these. Typically this type of information can be classified within the following genres:
• Health & Safety: Metrics that describe the numbers and types of safety and occupational health incidents.
• Operational: Information associated with the current timetable and the operational movements of trains.
• Operational Performance: Train delays, cancellations and other quality of service indicators.
• Network & Asset Characteristics: Physical description of the laydown of the network including asset registries and the location of assets. Some railways provide information about the condition of the assets.
• Network Usage: Information about the extent of services and the numbers of passengers or quantity of freight that is carried.
• Corporate: Information about finance, human resources, carbon emissions and other indicators where there is a responsibility to make data available to the public.
With the exception of “operational” data, most of the data does not change frequently. Consequently, whilst some operational data feeds are provided in real time (via messaging services or application programming interfaces), the remaining feeds are relatively static and may be updated quarterly or even less frequently.

France

Access to the real-time data feeds is via https://data.sncf.com/api [52] and https://ressources.data.sncf.com/explore/?sort=modified [53].


Table 7: Data feeds in France

Theme | Feed | Type of Data | Updating rate | Description and Comments
Operational | Train journeys - best duration | Static | Yearly | Best (lowest) duration evaluated for certain journeys
Operational | Train journeys | Real-time | Automatically updated 5 times a week | Calculates a multi-train itinerary that goes through multiple train stations. Provides planned scheduled times for TGV and Regional trains
Operational | Timetables | Real-time | Automatically updated 5 times a week | Consults a line’s scheduled route (and stops). Provides planned scheduled times for TGV and Regional trains
Operational | Scheduled stops | Real-time | Automatically updated 5 times a week | Looks up scheduled stops in each station. Provides planned scheduled times for TGV and Regional trains
Operational | SNCF Transilien Real-Time Departures | Real-time | Once a week | Provides SNCF Transilien network departures in real time for given stations
Network and Assets characteristics | List of lines with general information | Static | Yearly | List of lines including type of line (e.g. high speed)
Network and Assets characteristics | List of stations | Static | Yearly | List of all the stations along the network, including the type e.g. passenger station, marshalling yard, etc.
Network and Assets characteristics | Technical and operating characteristics of the lines | Static | Yearly | Data provided per homogeneous line section including operating status, maximum speed, electrified, speed control system implemented, links line/regional areas
Network and Assets characteristics | Lists of specific assets | Static | Yearly | Individual lists of assets (provided per asset type), with their location on the network, including: level crossings, track circuits, hotbox detectors, bridges, tunnels, substations, earthworks.
Network and Assets characteristics | List of private sidings | Static | Yearly | List of private sidings locations and network availability
Network and Assets characteristics | Technical and operating characteristics of the tracks | Static | Yearly | Data provided per homogeneous track section including curves, grade, specific operating rules (equipment for occasional wrong-track working)
Maintenance | Track possessions | Static | Monthly | Maintenance and renewals activities on the lines, including the kind of assets concerned (catenaries, track etc.). Data shared per month and per line (no detailed location and maintenance dates).
Operational performance | Delays and Quality of service indicators | Static | Monthly | Performance indicators provided per service line (TGV, Paris regional lines, other regional lines, etc.)
Usage of the network | Annual statistics - Passengers and Freight | Static | Yearly | National indicators expressed in Passenger-km/year and gross ton-km/year
Corporate | National Financial indicators | Static | Yearly | Data and information published as part of SNCF Réseau commitment to transparency. Indicators describing the financial status and performances of SNCF Réseau, as presented each year in the Annual Production and Activities Report. Includes sales revenue, taxes, debt, annual volumes of OPEX and CAPEX
Corporate | Corporate Social Responsibility | Static | Yearly | Data and information published as part of SNCF Réseau commitment to transparency that includes workforce and organisation, staffing / retirement, effective annual working duration, CO2 emissions, etc.
Corporate | Safety incidents | Static | Weekly | Description and location of events with significant safety issues (precursors of accidents)
Corporate | Passengers' accidents | Static | Yearly | Annual counting of events and related consequences

Germany

Table 8: Data feeds in Germany

Theme | Feed | Type of Data | Description and Comments
Network and Assets characteristics | Rail network DB | Static | National rail network provided in XML or GeoJSON format. http://data.deutschebahn.com/dataset/data-streckennetz [54]
Network and Assets characteristics | Station data | Static | The API provides station addresses, GPS and additional information (including the length of platforms). http://data.deutschebahn.com/dataset/data-stationsdaten [55]
Operational | Target timetable Fernverkehr | Static | Target timetable for long-distance trains. http://data.deutschebahn.com/dataset/api-fahrplan [56]
Operational | Berlin Brandenburg API | Real-time | The transport association Berlin-Brandenburg provides an API for real-time data for all suburban railways (S-Bahn) and metro trains (U-Bahn). http://www.vbb.de/de/article/fahrplan/webservices/schnittstellen-fuer-webentwickler/5070 [57]

Switzerland

Table 9: Data feeds in Switzerland

Theme | Feed | Type of Data | Description and Comments
Network and Assets characteristics | Actual data | Real-time | The actual service provided is displayed. The final forecast is used where no actual data are available. The "quality" is shown in the appropriate Status fields. https://opentransportdata.swiss/en/dataset/istdaten [58]
Operational | Timetable 2017 (GTFS) | Real-time | The timetable contains the essential topological and temporal elements that enable timetable display and information. https://opentransportdata.swiss/en/dataset/timetable-2017-gtfs [59]
Network and Assets characteristics | DiDok | Static | DiDok stands for “Dienststellendokumentation” (location documentation). The data are an extract from all of the operating points in Switzerland, including all of the stops. https://opentransportdata.swiss/en/dataset/didok [60]
Corporate | Business Organisations | Static | The business organisations display transport companies organised structurally by billing-related and customer-information-related features. https://opentransportdata.swiss/en/dataset/goch [61]
Network and Assets characteristics | Station list | Static | The station list consists of two files taken from the timetable (HRDF). https://opentransportdata.swiss/en/dataset/bhlist [62]
Operational | Timetable 2017 (HRDF) | Yearly | The annual timetable contains the timetable data that are primarily communicated in the form of printed materials. https://opentransportdata.swiss/en/dataset/timetable-2017-hrdf [63]
Network and Assets characteristics | GTFS Realtime | Real-time | GTFS Realtime is an expansion to GTFS static. It offers the “Trip Updates” feed for transport companies supplying real-time information. https://opentransportdata.swiss/en/dataset/gtfsrt [64]
Operational | Trip forecast | Real-time | API allows the user to retrieve real-time data about a specific trip. https://opentransportdata.swiss/en/dataset/fahrtprognose [65]
Operational | Departure/arrival display | Real-time | The departure/arrival display’s API allows you to search for the departures/arrivals from/to a stop at a specific time. Real-time information is given where applicable. https://opentransportdata.swiss/en/dataset/aaa [66]
Operational | Timetable overview | Operational | The file provides an overview of the available timetable data as well as its status, its validity and the corresponding permalink. https://opentransportdata.swiss/en/dataset/timetabeloverview [67]

United Kingdom

The real-time data feeds presented in Table 10 are targeted for use by software developers (see [68]). In addition, the Office of Rail & Road provides high-level statistics about the UK railway ([69]).

Table 10: Data feeds in the United Kingdom

Theme | Feed | Type of Data | Updating rate | Description and Comments
Operational | RTPPM | Real-time | 1 per minute | Real Time Public Performance Measure. This shows the performance of trains against the timetable, measured as the percentage of trains arriving at destination on time, and is updated every minute
Operational | Train Movements | Real-time | Up to 600 per minute | Messaging from the TRUST system, containing reports of train movements past timetabled calling and passing points. Note: Messages are batched to reduce network overheads.
Operational | TD | Real-time | Up to 1000 per minute | Berth-level data from the Train Describer system, showing raw data with train movements in more detail than the Train Movements feed. Note: Messages are batched to reduce network overheads.
Operational | VSTP | Real-time | Low volume | Late-notice train schedules which are not available through the SCHEDULE feed
Operational | TSR | Low-volume | <10 per week on Fridays | Temporary Speed Restriction data as published in the Weekly Operating Notice
Operational | SCHEDULE | Static | Daily | Extracts of train schedules from the Integrated Train Planning System in CIF and JSON format
Operational | Reference Data | Static | Infrequent | Reference data which can be used to help analyse other data feeds: SMART - train describer berth offset data used for train reporting; Corpus - location reference data (JSON format); BPLAN - train planning data, including locations and sectional running times (Public Interface Format “PIF”); Train Planning Network Model - contains very detailed information on the network model used by ITPS, the Integrated Train Planning System.
Summary | Rail Statistics Compendium | Static | Yearly | Annual compendium publication contains a summary of the statistical releases published by ORR.
Usage of the network | Freight rail usage - freight moved, freight lifted, normalised freight delay | Static | Quarterly | All information on rail freight usage in Great Britain.
Usage of the network | Estimates of station usage | Static | Yearly | All information on station usage in Great Britain.
Usage of the network | Passenger rail usage - Passenger train KM, Passenger KM, journeys, revenue | Static | Quarterly | All information on rail passenger usage in Great Britain.
Corporate | Passenger rail service complaints - Complaints, Appeals, NRE | Static | Quarterly | This release contains information on complaints made by passengers regarding rail services in Great Britain.
Corporate | Disabled Person’s Railcard (DPRC) and assisted journeys data | Static | Quarterly | Rail passenger assists and bookings.
Usage of the network | Regional rail usage | Static | Yearly | Provides passenger journeys data for each Region of Great Britain, covering the volume of journeys between Regions and within Regions. It also looks at cross-border flows between England, Scotland and Wales.
Operational performance | Passenger & freight rail performance - PPM, CaSL, FDM | Static | Quarterly | This section includes reports on the punctuality of passenger and freight services and the reliability of passenger service.
Health & Safety indicators | Signals passed at danger (SPADS) - Official Statistics | Static | Quarterly | Number of signals passed at danger (SPADs) without authority on the mainline
Corporate | UK rail industry financial information - Official Statistics | Static | Yearly | UK rail industry financial information presents ORR's analysis of the latest financial data from across the industry
Corporate | Rail fares index | Static | Yearly | Shows average change in price of rail fares and by ticket type.
Health & Safety indicators | Occupational Health - Official Statistics | Static | Yearly | Provides occupational health indicators for rail including manual handling, shock or trauma incidents, assaults and verbal abuse.
Corporate | Rail finance | Static | Yearly | Includes government support, subsidy, private investment
Health & Safety indicators | Rail safety - Key Safety Statistics | Static | Yearly | Provides safety indicators for rail including broken and buckled rail, passenger, public, workforce, road-rail interface, injuries at level crossings and near misses.
Network and Assets characteristics | Rail infrastructure, assets and environmental | Static | Yearly | Asset management information including asset renewals, remediation projects, asset failures, asset condition, carbon emissions, late possessions, station and station stewardship data.

A number of further initiatives are being considered to make data openly available to academia and suppliers to enable increased exploitation. The focus of these activities is on providing specific datasets associated with:
• Asset inventories
• Asset faults and incidents
• Track and OLE monitored data (by NR fleet and also passenger-based monitoring)
• Environmental data as available
• Train delay data
The data made available is focused on developing methodologies to respond to specific NR challenges. Activities are being progressed through research and development programs and are at an early stage of development.


Use of standard open formats
In general, the data is provided as follows:
• Static data is downloaded through a web service. Much of this data is provided in open comma-separated values (CSV) or Excel format.
• Real-time data streams are provided via an Application Programming Interface. A relatively narrow range of formats is employed, including JSON and XML.
• There is limited use of railway/transport-specific open formats; the Common Interface Format is used to exchange schedules in the UK; the General Transit Feed Specification (GTFS) is used in France and Switzerland.
• RailML and SensorML are not actively used by the Infrastructure Managers that have been involved in this review (Network Rail and SNCF).
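As an illustration of how little tooling the CSV-based open formats require, the sketch below reads a GTFS-style stops file with the Python standard library. The column names stop_id, stop_name, stop_lat and stop_lon follow the GTFS stops.txt convention; the sample rows and the helper name load_stops are invented for this example and do not come from any of the feeds listed above.

```python
import csv
import io

# Hypothetical excerpt of a GTFS stops.txt file (values are made up).
SAMPLE_STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
S1,Example Central,47.3780,8.5400
S2,Example North,47.4100,8.5440
"""

def load_stops(text):
    """Parse GTFS-style stop records into {stop_id: (name, lat, lon)}."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["stop_id"]: (
            row["stop_name"],
            float(row["stop_lat"]),
            float(row["stop_lon"]),
        )
        for row in reader
    }

stops = load_stops(SAMPLE_STOPS_TXT)
```

With a real feed, the same function could be applied to the contents of the stops.txt file extracted from the downloaded GTFS archive.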

8.1.2 railML®

8.1.2.1 INTRODUCTION
railML® is a data exchange format based on the Extensible Markup Language (XML) focusing on railway applications ([75]). At the same time, railML.org is an open source initiative working constantly on the development of this data exchange format for railway applications. Currently, railML includes the following data schemes ([76]):
• Timetable
• Infrastructure
• Rolling stock
• Interlocking
The latest version of railML®, railML® v2.3, was released in March 2016. The following table provides an overview of the versions that are currently supported and the planned future releases ([77]):
Table 11: railML® versions [77]


Additionally, railML® v2.4 was announced at the last railML® Conference, held in Berne on 22.03.2017 ([79]). According to this announcement, the changes in the railML® infrastructure schema will be small ([80]). railML is a user-driven standard for data exchange in railways and is based on an open development process. For interaction between users and developers, a number of tools are provided:
• The railML Website is the central point of information. From here, railML users, developers and interested people are directed to all the other information they are searching for ([81]).
• The railML Forum is the discussion platform where users can discuss certain modelling and application aspects with other users and with developers ([82]). The different scheme-specific forum topics are moderated by the railML scheme coordinators.
• The railML Wiki is the open usage documentation that complements the scheme documentation (cf. [83]). In particular, railML beginners will find useful information here about the different elements and attributes.
• The railML Trac is a ticket system for tracking defects and enhancements that have been discussed and consolidated in the railML forum and which shall be solved / implemented in the future (cf. [84]).
• railVIVID is an open-source tool for validation and viewing of railML files. It can be downloaded from the railML Website (cf. [85]).

The railML website [81] lists more than 100 companies as railML partners. 24 of these companies are categorized as “developers”, 42 as “users”, and the remaining companies are tagged as “supporters”. In order to use railML in projects and products, the license terms listed in [87] have to be obeyed. Anyone intending to use railML in a productive manner is obliged to certify the railML interface(s) that are to be promoted or sold. A detailed description of the certification process is given in [88]. The data exchange format is used in domains such as:
• exchange of the track geometry
• capacity operational simulation
• timetable information
• exchange of train formation data
• schematic track plan
• exchange of infrastructure, interlocking, timetable and rolling stock information.

8.1.2.2 INFRASTRUCTURE SCHEME
The railML infrastructure schema has its focus on the description of the railway network and related infrastructure ([89]).
• Topology. The track network is described as a topological node edge model at the level of tracks and switches.
• Coordinates. All railway infrastructure elements can be located in an arbitrary 2- or 3-dimensional coordinate system, e.g. the WGS84 that is widely used by today's navigation tools and software. It is further possible to define a separate height coordinate system.
• Geometry. The track geometry can be described in terms of radius and gradient change points along the track.
• Railway infrastructure elements comprise a variety of railway-relevant assets that can be found on, under, over or next to the railway track, e.g. balises, platform edges and level crossings.
• Further located elements encompass elements that are closely linked with the railway infrastructure, but that "cannot physically be touched", e.g. speed profiles and track conditions.
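The node edge idea behind the topology description can be sketched in a few lines of Python. This is an illustration only: the class and attribute names below are hypothetical and are not the element or attribute names of the railML schema, and the traversal deliberately ignores which movements a switch actually permits.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class LocatedElement:
    kind: str      # e.g. "levelCrossing" (illustrative label only)
    pos_m: float   # position along the track, in metres from its begin node

@dataclass
class Track:
    track_id: str
    begin_node: str
    end_node: str
    length_m: float
    elements: list = field(default_factory=list)

def reachable_nodes(tracks, start):
    """Breadth-first search over the undirected node edge graph."""
    adjacency = {}
    for t in tracks:
        adjacency.setdefault(t.begin_node, set()).add(t.end_node)
        adjacency.setdefault(t.end_node, set()).add(t.begin_node)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Three tracks meeting at a switch S1; A, B and C are track ends.
tracks = [
    Track("t1", "A", "S1", 500.0, [LocatedElement("levelCrossing", 120.0)]),
    Track("t2", "S1", "B", 300.0),
    Track("t3", "S1", "C", 350.0, [LocatedElement("platformEdge", 200.0)]),
]
nodes = reachable_nodes(tracks, "A")
```

Infrastructure elements are attached to an edge together with a position along it, which mirrors the separation between topology and located elements described in the list above.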

8.1.2.3 TIMETABLE SCHEME
The railML timetable schema has its focus on the description necessary to exchange any kind of timetable for operational or conceptual purposes, including the information listed below ([83]):
• Operating Periods. The operating days for train services or rostering.
• Train Parts. The basic parts of a train with the same characteristics such as formation and operating period. The train part includes the actual information regarding the path of the train as a sequence of operation or control points together with the corresponding schedule information.
• Trains. One or more train parts make up a train and represent either the operational or the commercial view of the train run.
• Connections. The relevant connections/associations between trains at a particular operation or control point.
• Rostering. Train parts can be linked to form the circulations necessary for rostering (rolling stock schedules).
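The relationship between train parts and trains can be illustrated with a small data structure sketch. The class names, field names and sample values are hypothetical and chosen for readability; they do not reproduce the railML timetable schema.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Stop:
    point: str                 # operation or control point
    arrival: Optional[str]     # "HH:MM", or None at the origin
    departure: Optional[str]   # "HH:MM", or None at the destination

@dataclass
class TrainPart:
    part_id: str
    operating_period: str      # e.g. "Mon-Fri" (illustrative)
    stops: List[Stop]

@dataclass
class Train:
    train_id: str
    parts: List[TrainPart]

def train_path(train):
    """Concatenate the paths of all train parts, merging shared junction points."""
    path = []
    for part in train.parts:
        for stop in part.stops:
            if not path or path[-1] != stop.point:
                path.append(stop.point)
    return path

part1 = TrainPart("p1", "Mon-Fri", [Stop("A", None, "08:00"), Stop("B", "08:30", "08:35")])
part2 = TrainPart("p2", "Mon-Fri", [Stop("B", "08:30", "08:35"), Stop("C", "09:10", None)])
train = Train("T100", [part1, part2])
path = train_path(train)
```

Keeping the schedule data on the train part and composing trains from parts allows a formation change at B to be represented without duplicating the rest of the run.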

8.1.2.4 ROLLING STOCK SCHEME
The rolling stock schema has its focus on the description of rail vehicles including locomotives, multiple units, passenger and freight wagons, as well as the combination of single vehicles into formations. Scheme features are listed below ([83]):
• separate parts for vehicles and for train parts or complete trains
• possible specification of vehicle families and individual vehicles using the common features of the family
• different levels of detail for data:
  1. vehicle as black box (with respect to dynamic characteristics) with only mean values
  2. vehicle as black box (with respect to dynamic characteristics) with curves for particular values being variable within the operating range
  3. vehicle as white box with details about the internal propulsion system
• vehicles with motive power, for passenger or freight use
• combination of vehicles to formations, i.e. train parts or complete trains


8.1.2.5 INTERLOCKING SCHEME
The railML interlocking scheme has its focus on the description of information that infrastructure managers usually maintain in signal plans and route locking tables. A few scenarios are listed below ([83]):
• Data transfer. A standard data exchange format will allow the automation of data transfer, which is the process of adapting a railway interlocking and signalling system to a specific yard.
• Simulation programs. The railML® IL schema allows modellers to quickly absorb information about the interlocking systems, such as timing behaviour and routes, and to analyse the impact on railway capacity.
The interlocking scheme is not available for railML version 2.x, but will be implemented in railML version 3.

8.1.2.6 RAILML® V2.3
railML version 2.3 is the latest release of the data exchange format and was published in March 2016. Like the previous versions, railML v2.3 contains infrastructure, timetable and rolling stock elements and attributes. A detailed list of changes between railML v2.2 and v2.3 can be found in [90]. For infrastructure, the modifications are not very extensive, since the railML infrastructure development is already actively working on the new baseline of the model, railML v3.

8.1.2.7 RAILTOPOMODEL AND RAILML® V3

UIC RTM Feasibility Study

Commissioned by UIC, the Swiss IT company TrafIT Solutions carried out a feasibility study for a common railway infrastructure data model. The results were presented at the 24th railML.org Conference in Paris on 18.09.2013 ([91]). The study concluded that about 95% of all the elements and attributes of the different existing data models used by European railway companies are very similar to each other, because they all refer to the built railway network. The feasibility study postulates central requirements for a generic railway infrastructure topology model:
- The model must be scalable: a generic core may be extended by various user specific themes.
- Topology is the core of the data model.
- The model shall support different levels of detail, and these levels are linked with each other.
- Depending on the specific user application, relevant information is stored in the matching level of detail.

RTM Development

Based on the results of the feasibility study, the railway infrastructure managers organized in the UIC working group ERIM (European Railway Infrastructure Masterplan) started the development of a generic railway infrastructure data model, the RailTopoModel (RTM). In 2016, the RailTopoModel approach was released by the UIC as International Railway Solution IRS 30100 ([92]). Thus, the RailTopoModel can be seen as a standardized modelling approach for railway infrastructure data models that is recommended for use by the International Union of Railways (UIC). In addition to IRS 30100, UIC published a wiki in order to provide complementary information about the modelling concepts, practical model usage and model constraints ([93]).

RTM Modelling concepts

The foundation for RTM is a mathematical graph model that is described in detail in [95]. It allows modelling the topology of a railway network of any complexity. Further, RTM defines a generic concept for locating railway infrastructure elements on the railway network described by the graph: a generic “NetEntity” works as an anchor element for any object referencing the topology network. The anchor element may be modelled as punctiform, linear or as a sub-network. For further information about the RTM modelling concepts, see the relevant wiki pages in [101].

railML® v3

The RTM forms the basis for the development of the new version of the data exchange format, railML v3. It can be seen as a first implementation of the RTM with a special focus on data exchange using XML syntax. Like the RTM, railML v3 has been developed on the basis of UML class diagrams using the software Enterprise Architect. The railML v3 schema files (XSD) are generated from the UML using integrated and proprietary export tools. A first official release of railML v3 schema files is scheduled for autumn 2017 (cf. [77]).
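The locating concept described under "RTM Modelling concepts" can be illustrated with a small Python sketch. It is not the RTM or railML v3 class model: apart from the term "NetEntity", which the text above uses, all class, field and element names (TopologyGraph, PointLocation, track_1 and so on) are hypothetical and chosen only for illustration. The sketch shows a topology graph and entities anchored to it as a point, as a linear stretch, or as a sub-network.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple, Union


@dataclass
class TopologyGraph:
    """Undirected graph: nodes are track elements, edges are connections."""
    neighbours: Dict[str, Set[str]] = field(default_factory=dict)

    def connect(self, a: str, b: str) -> None:
        # Register both elements and link them in both directions.
        self.neighbours.setdefault(a, set()).add(b)
        self.neighbours.setdefault(b, set()).add(a)


@dataclass(frozen=True)
class PointLocation:
    element: str     # graph element the object sits on
    offset_m: float  # distance from the start of that element


@dataclass(frozen=True)
class LinearLocation:
    element: str
    from_m: float
    to_m: float


@dataclass(frozen=True)
class SubNetworkLocation:
    elements: Tuple[str, ...]  # several elements, e.g. a station area


@dataclass(frozen=True)
class NetEntity:
    """Anchor that ties an infrastructure object to the topology."""
    name: str
    location: Union[PointLocation, LinearLocation, SubNetworkLocation]


def referenced_elements(entity: NetEntity) -> Set[str]:
    """Return the graph elements an entity is anchored to."""
    location = entity.location
    if isinstance(location, SubNetworkLocation):
        return set(location.elements)
    return {location.element}


def is_anchored(entity: NetEntity, graph: TopologyGraph) -> bool:
    """True if every referenced element exists in the topology graph."""
    return referenced_elements(entity) <= set(graph.neighbours)


# Hypothetical three-element network with one entity of each kind.
graph = TopologyGraph()
graph.connect("track_1", "track_2")
graph.connect("track_2", "track_3")

signal = NetEntity("signal_A", PointLocation("track_1", 120.0))
platform = NetEntity("platform_1", LinearLocation("track_2", 50.0, 250.0))
station = NetEntity("station_X", SubNetworkLocation(("track_2", "track_3")))
```

Keeping the location separate from the entity means that the same anchoring check works for every kind of infrastructure object, which is the point of a generic anchor element.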

8.1.2.8 RAILML® IN IN2RAIL

railML in IN2RAIL WP9

IN2RAIL (Innovative Intelligent Rail) is the main predecessor project to IN2SMART. Its work package WP 9 focused on the design and development of an advanced asset information system that is able to analyse and predict the status of the network assets. In this context, the data structure of the asset status data to be collected is of major interest. The deliverable D9.1 lists all the data that are necessary for the asset information system and its interfaces (cf. [3]). Further, it analyses how existing data exchange formats like railML cover these requirements. As a result of this investigation, it has been decided to use railML for exchanging static railway infrastructure data and to use a different format like SensorML for all the dynamic information.

IN2RAIL requires railML v3

Since the current railML version 2.3 does not cover all the aspects required for the asset status representation, the IN2RAIL project depends on the upcoming new version railML v3. To ensure that all the required elements and attributes are implemented there, a railML data exchange use case has been derived from the report D9.1 and officially submitted to railML.org. This use case is now available in the railML wiki [102]. It currently has priority 2, which means that the use case will not be implemented with the very first railML v3 release (railML v3.1), but with the following one (railML v3.2). According to railML.org [78], the release of railML v3.2 is planned for the end of 2018.

railML in IN2RAIL WP7

The focus of WP 7 is the development of a prototype based on the IN2RAIL WP 9 results described above. The deliverable D7.1 states that the import format of the topology data could be railML 2.3 or the new railML version 3. railML version 3 uses the UIC RailTopoModel for modelling the railway infrastructure. The format to be used is still under discussion and should be decided in WP8.

8.1.2.9 RAILML COMMUNITY

railML® is a community driven project coordinated by the non-profit organization railML.org, which is a registered association under German law. railML.org is fully independent from railways, manufacturers and authorities ([129]). Two coordinators, Vasco Paul Kolmorgen and Dr. Daniel Hürlimann, are in charge of the main coordination at railML.org. The ongoing development of the different railML data subschemes is managed by four scheme coordinators, who come from scientific railML partners as well as from industry using the railML standard.

The following two sections describe the sequence of steps for railML® scheme development. The first one is directed to the use case driven railML development, which is quite new and specifically set up for the new railML version 3. The second section addresses the process of incorporating small changes in the railML schema as it has been done in the past for previous and current versions of railML 2.x.

Use case view
- Initial situation: you have a specific data exchange issue for which you want to use the railML data exchange format. Such an issue is called a railML use case.
- Step 1: Review the lists of use cases in the railML wiki to find out whether your use case has already been recorded ([95]). There is a list of use cases for every railML subschema. The use cases for railML based data exchange of railway infrastructure data can be found in [97].
- Step 2a: If your use case is already listed, review it with respect to your specific task to find out whether relevant aspects are missing in the use case. If so, bring your issues to the railML forum ([81]) and discuss them there with the railML community. The responsible railML scheme coordinator will lead the process of use case modification. Finish.
- Step 2b: If your use case is not yet listed, inform the responsible scheme coordinator by email, including a very brief use case description. The scheme coordinator reviews the use case description and asks you to formulate the use case according to the structured template, either directly in the railML wiki ([83]) or using a Word document. The use case description comprises:
  - A precise description of the data exchange application
  - A brief analysis of the relevant data flows and interfaces for the data exchange
  - A brief summary of characteristics of the data to be exchanged via the interface
  - If possible, a list of functional elements that are included in your data exchange use case
- Step 3: The responsible railML scheme coordinator leads the task of consolidating the use case user input. In particular, requirements are derived and formulated as tickets in the railML Trac ticket system ([84]).
- Step 4: The responsible railML scheme coordinator leads the implementation of the requirements and the resulting elements and attributes in the railML data model. Usually, the work is done by the railML scheme coordinator and a specific scheme working group. The implementation comprises:
  - Changes in the railML data model (UML or XSD)
  - Tracking the changes in the railML Trac ticket system
  - Documenting the changes in the source code
  - Documenting the changes in the railML Wiki
- Step 5: The responsible railML scheme coordinator leads the task of use case element specification. The aim of this step is to specify which elements and attributes of the railML data model are mandatory for the given use case and which are optional.
- Step 6: The responsible railML scheme coordinator leads the work of writing an official use case document that brings together all the use case facets mentioned before. The official use case document will be entitled “Use Case Definition” and released on the railML website. Thus, the use case definition is the reference document for certification of railML interface implementations.

Element view
- Initial situation: you discover a bug in the existing railML® schemes or you want to enhance the model at a specific point.
- Step 1: Discuss the issue with users and developers of the railML® community in the railML forum ([81]). There is one forum for each railML subschema and one forum for common aspects.
- Step 2: The responsible scheme coordinator summarizes the outcome of the forum discussion and consolidates the solution / result. If the solution comprises a modification or extension of the existing railML data model, the scheme coordinator will create a ticket using the railML Trac ticket system ([84]). Each ticket is linked with a future version of railML. Thus, users can see when to expect which modifications to be implemented in the schema.
- Step 3: The railML scheme coordinator leads the implementation of the scheme modification or enhancement and tracks the state in the railML Trac ticket system. The implementation comprises:
  - Changes in the railML data model (UML or XSD)
  - Tracking the changes in the railML Trac ticket system
  - Documenting the changes in the source code
  - Documenting the changes in the railML Wiki
  - Releasing the new version of railML in the railML subversion repository ([98]) and publishing information about the changes
  - If necessary, adapting the use cases that are affected by the changes
  - Presenting the summary of changes at the next railML conference
- In the meantime: Modifications and enhancements of the railML data model require some time for implementation. If you cannot wait that long, you may want to make use of the “any element” and “any attribute” to attach your own temporary scheme extensions to the railML model.

Following the sequences of steps above, the schema is continually extended, modified and enhanced.

8.1.3 TAF/TAP TSI

The Telematic Applications for Freight / Passenger Services Technical Specifications for Interoperability ([TAF], [TAP]) are EU regulations specifying the exchange of information between relevant stakeholders, in order to enable cross-border rail services.

All RU/IM messages described in TAP are common with TAF (which contains additional messages specific to freight traffic). The common processes relate to path allocation, train readiness, train running reporting and service interruption. Consequently, messages are harmonised between TAP and TAF and gathered in the same data model.

The Technical Specification for Interoperability on “Telematics Applications for Passengers” (TAP TSI) prescribes protocols for the data exchange of:
- timetables
- tariffs
- reservations, fulfilment
- information to passengers in station and vehicle area
- train running information
- etc.

The European rail sector (railways, infrastructure managers, ticket vendors etc.) must comply with these protocols according to the European Rail Passengers’ Rights Regulation EC/1371/2007 and the Interoperability Directive EC/2008/57.


Figure 19: Principle of common interface for TAF/TAP TSIs

Data formats used in TAP TSI:
- EDIFACT (timetabling)
- Fixed length text files (tariff data)
- Binary messages (reservation messages)
- XML messages (home printed tickets, PRM reservation)

The Technical Specification for Interoperability on “Telematics Applications for Freight” (TAF TSI) drafted by ERA prescribes protocols for the data exchange of:
- Path request
- Train Running Forecast
- Service Disruption Information
- Shipment Estimated Time of Interchange / Arrival
- etc.

TAF TSI furthermore prescribes databases which must be implemented by European RUs, IMs, or Freight Customers:
- Reference Files (such as location ID, company ID etc.)
- Rolling Stock Reference Databases
- Wagon and Intermodal Unit Operational Database
- Trip plan for wagon / Intermodal unit

TAF TSI also prescribes the use of a so-called “common interface”, which is mandatory for all RUs and IMs:


Figure 20: Common interface for TAF TSI

The Common Interface supports the following connectors, which are used by existing legacy systems:
- IP socket connector (customized application protocol)
- JMS connector
- MQ (IBM)/JMS connector
- FTP connector
- FS connector
- SMTP connector
- Web Service connector

Supported data formats are:
- XML
- Text
- CSV
- UIC 407-1

(see [103])

8.1.4 UIC 407-1

“Standardized data exchange for the execution of train operations, including international punctuality analysis” is a standard developed by the International Union of Railways (UIC). The objective of the standard is to automate as far as possible the exchange of operationally necessary information between the parties (IMs and/or RUs) involved in a train movement and to overcome language barriers in international rail traffic. Messages defined in this standard contain the information / data needed to carry out the most important processes in train operations. These are in the first instance processes of operative train running (for instance traffic regulation and the planning of resource deployment), but also of planning and quality control. Where planning is concerned, however, this only applies in the case of alterations at short notice to current train utilization (e.g. scheduling of a special train or cancellation of a train to take effect within 24 hours).

Messages are functionally designed to facilitate exchanges between different IMs and between IMs and RUs. Exchanges solely between RUs do not form part of this standard. From a technical point of view, messages are suitable for exchange both between process-driven telematics systems and between PC-based mail systems. Furthermore, an exchange of messages is also feasible between both system variants (i.e. between telematics systems on the one hand and mail systems on the other). The type and exact content of the messages to be transmitted depend on the information requirements of the RUs involved in the exchange of messages as well as on the capabilities of their telematics or mail systems.

From an architectural point of view, an essential distinction is made between:
- train related messages (the primary statement always concerns an individual train)
- event related messages (the primary statement always concerns a specified event, for instance a strike or bomb threat)
- messages between Quality Monitoring Centres

Train related messages are in turn divided into:
- those that are not replied to by the recipient (unidirectional messages)
- those that require a reply from the recipient (bidirectional messages)

Train-related messages can be designed and subsequently issued or received, displayed or further processed within a system, either as part of a process-controlling telematics system (traffic control system) or via a suitably equipped mail system. In telematics systems, a message can be designed and sent either automatically in an event-driven fashion as part of the ongoing process, or as triggered by the operator. The latter approach is always required in mail systems. Messages are defined as text strings with fields of fixed length. In this standard, messages have been numerically coded to reflect the aforementioned principles. The coding pattern is as follows:
- 2001 to 2099: unidirectional messages between IMs
- 2101 to 2199: unidirectional messages from IMs to RUs
- 2201 to 2299: unidirectional messages from RUs to IMs
- 2301 to 2399: bidirectional messages between IMs
- 2401 to 2699: bidirectional messages between IMs and RUs
- 2701 to 2799: event related messages
- 2801 to 2899: messages defined and structured in a national context in accordance with this standard
- 2901 to 2999: messages between Quality Monitoring Centres
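As an illustration of the coding pattern above, the following Python sketch maps a message number to its block. The ranges are taken from the list; the names MESSAGE_BLOCKS and classify_message are hypothetical and not part of UIC 407-1. Numbers that fall between the listed blocks (such as 2100) are treated as unassigned, which is an assumption of this sketch.

```python
from typing import Optional

# (first, last, description) for each number block in the coding pattern.
MESSAGE_BLOCKS = [
    (2001, 2099, "unidirectional, between IMs"),
    (2101, 2199, "unidirectional, from IMs to RUs"),
    (2201, 2299, "unidirectional, from RUs to IMs"),
    (2301, 2399, "bidirectional, between IMs"),
    (2401, 2699, "bidirectional, between IMs and RUs"),
    (2701, 2799, "event related"),
    (2801, 2899, "nationally defined"),
    (2901, 2999, "between Quality Monitoring Centres"),
]


def classify_message(number: int) -> Optional[str]:
    """Return the block description for a message number, or None."""
    for first, last, description in MESSAGE_BLOCKS:
        if first <= number <= last:
            return description
    return None  # not covered by any listed block
```

A receiving system could use such a lookup to route an incoming message to the right handler before parsing its fixed-length fields.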


Unidirectional messages are numbered in rising sequence within a given number block, commencing with “01” in each case. Bidirectional messages comprise an enquiry and a reply or several possible replies. The enquiries and their respective reply messages are numbered in rising sequences of tens, the units digit for enquiries in each case being “0”. Each message within a number block has a set textual designation. Identical messages retain the same text from block to block. They are merely distinguished by their differing message numbers. These differing message numbers are nevertheless structured in such a way that the identical nature of a given message can be inferred. For example, the “running forecast” between IMs bears the message number “2001”, whilst the running forecast from IMs to RUs bears the message number “2101”.

The standard provides the message frames to be exchanged, but does not define the protocols to be used for exchanging the data. Currently there are other concepts that can replace UIC 407-1, such as TAF/TAP TSI. (see [104])
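The numbering rules above can be turned into a few helper functions. This Python sketch rests on two assumptions that the text suggests but does not state outright: that a reply belongs to the enquiry obtained by setting its units digit to 0, and that identical messages in different blocks share their last two digits (as in the 2001 / 2101 example). All function names are hypothetical.

```python
# Bidirectional number blocks as given in the coding pattern.
BIDIRECTIONAL_BLOCKS = [(2301, 2399), (2401, 2699)]


def is_bidirectional(number: int) -> bool:
    """True if the number lies in one of the bidirectional blocks."""
    return any(first <= number <= last for first, last in BIDIRECTIONAL_BLOCKS)


def is_enquiry(number: int) -> bool:
    """Enquiries are bidirectional messages whose units digit is 0."""
    return is_bidirectional(number) and number % 10 == 0


def enquiry_of(reply: int) -> int:
    """Return the enquiry a reply belongs to (assumed: same group of ten)."""
    if not is_bidirectional(reply) or reply % 10 == 0:
        raise ValueError(f"{reply} is not a reply message number")
    return reply - reply % 10


def same_designation(a: int, b: int) -> bool:
    """Assumed rule: identical messages share their last two digits."""
    return a != b and a % 100 == b % 100
```

For example, under these assumptions `enquiry_of(2313)` yields 2310, and `same_designation(2001, 2101)` holds for the two running forecast messages mentioned above.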

8.1.5 RINF – Register of Infrastructure

The European Register of Infrastructure refers to Article 49 of Directive (EU) 2016/797 and provides transparency concerning the main features of the European railway infrastructure. The common technical specifications are set out in a Commission Implementing Decision (RINF Decision). The most recent RINF Decision (Decision 2014/880/EU of 26 November 2014) repeals the previous Decision 2011/633/EU and introduces a computerised common user interface (CUI) which simplifies queries of infrastructure data. This interface, set up and managed by the European Railway Agency, is publicly available. Furthermore, the RINF Decision obliges each Member State to nominate an entity (NRE) in charge of setting up and maintaining its register of infrastructure and to notify an implementation plan.

The primary purpose of RINF is to support technical compatibility between fixed installations and rolling stock within the European community. For that purpose, the railway network is considered at the macro-level to be a series of operational points and sections of line. At the micro-level, subsystem features are assigned to infrastructure elements, such as tracks and sidings. Ultimately, macro- and micro-level should be presented in terms of digital maps.

Railway network structure for RINF

For the purpose of RINF:
- the railway network is considered to be a series of operational points (OPs) connected by sections of line
- a line is a sequence of one or more sections, which may consist of several tracks
- a section of line is the part of a line between adjacent OPs and may consist of several tracks
- operational points are locations for train service operations, for example where train services can begin and end or change route, and where passenger or freight services are provided
- stopping points for passengers on plain line are also regarded as OPs
- operational points may be locations where the functionality of basic parameters of a subsystem changes, for example track gauge, voltage and frequency, or signaling system
- operational points may be at boundaries between MSs or IMs
- passing loops and meeting loops on plain line, or track connections only required for train operation, do not need to be published (however, if parameters change at the connection, it would be considered an operational point and included in the register)
- sidings are all tracks not used for train service movements

Figure 2 shows an example of the railway network structure of RINF, the elements of which belong to different IMs.

Figure 2: Structure of the railway network for the register
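The macro-level structure described above (operational points connected by sections of line, each with one or more tracks) can be sketched as a small Python data model. This is an illustration only and not the RINF schema: the class names, field names and identifiers such as "OP_A" are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class OperationalPoint:
    op_id: str
    name: str
    infrastructure_manager: str


@dataclass(frozen=True)
class SectionOfLine:
    start_op: str             # OP at one end of the section
    end_op: str               # adjacent OP at the other end
    tracks: Tuple[str, ...]   # a section may consist of several tracks


@dataclass
class RinfNetwork:
    ops: Dict[str, OperationalPoint] = field(default_factory=dict)
    sections: List[SectionOfLine] = field(default_factory=list)

    def add_op(self, op: OperationalPoint) -> None:
        self.ops[op.op_id] = op

    def add_section(self, start_op: str, end_op: str, tracks: List[str]) -> SectionOfLine:
        # A section of line always connects two known operational points.
        for op_id in (start_op, end_op):
            if op_id not in self.ops:
                raise ValueError(f"unknown operational point: {op_id}")
        if not tracks:
            raise ValueError("a section of line needs at least one track")
        section = SectionOfLine(start_op, end_op, tuple(tracks))
        self.sections.append(section)
        return section

    def adjacent_ops(self, op_id: str) -> List[str]:
        """Return the OPs directly connected to op_id, sorted by id."""
        result = set()
        for section in self.sections:
            if section.start_op == op_id:
                result.add(section.end_op)
            elif section.end_op == op_id:
                result.add(section.start_op)
        return sorted(result)


# Hypothetical example: three OPs managed by two different IMs.
network = RinfNetwork()
network.add_op(OperationalPoint("OP_A", "Station A", "IM 1"))
network.add_op(OperationalPoint("OP_B", "Border B", "IM 1"))
network.add_op(OperationalPoint("OP_C", "Station C", "IM 2"))
network.add_section("OP_A", "OP_B", ["track 1", "track 2"])
network.add_section("OP_B", "OP_C", ["track 1"])
```

Subsystem parameters at the micro-level (gauge, voltage, signaling system) would then be attached to the individual tracks rather than to the section as a whole.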

Items collected in RINF have to be accessible to end-users (process of data retrieval). This requires an IT implementation and therefore a harmonized model of the railway system.

The use cases listed in the table below represent the primary purpose of RINF and have influenced the selection of items in the database.


Table 12: Primary purpose of RINF

Title | Description | Quality demand
Technical compatibility for route allocation | RUs to retrieve technical characteristics of a specific route for the check of line specific technical compatibility between fixed installations and rolling stock. | Mission critical
Technical compatibility for EC verification | NoBos/DeBos to retrieve technical characteristics of a MS for the conformity assessment within the process of EC verification. | Process critical
RST design | Rolling stock manufacturers to retrieve technical characteristics for a certain part of the network in order to achieve compliance when designing and authorizing vehicles for placing in service on “type”-level. | Financial critical
Interoperability progress | EC/ERA/MSs/NSAs to retrieve characteristics for specific parts of the networks to follow up regularly the progress towards a European interoperable network in terms of key performance indicators. | Financial critical

Regarding the transition period for infrastructures placed in service the RINF WG decided to set the final deadline for transition to five years. All types of network shall be included in RINF within five years after entry into force (1st January of 2015) of the RINF specification.

Figure 21: Principle of common interface for RINF


Data can be provided to RINF via the web application under “Data management”, where users can upload XML datasets compressed within a .zip file. The XML file is then validated against the corresponding XML Schema Definition (XSD). If the XSD validation fails, the data is rejected and a report listing the encountered errors is dispatched to the NRE.

The communication between the users and the RINF system takes place over the internet. Thus, in both cases the RINF architecture is transparent to the users. The users (i.e. RUs, IMs, manufacturers, NSAs, etc.) open a web browser and connect to the RINF system via HTTP (or HTTPS for extra security if necessary). The RINF system provides access to the provided infrastructure information, as well as any additional functionalities and services. The RINF system queries the central RINF database and returns the requested RINF information to the users. More information can be found in the RINF application guide or the RINF User Manual (see [130]).
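The upload step described above (an XML dataset packed into a .zip file and checked before acceptance) can be approximated on the client side with the Python standard library. The sketch below is an assumption-laden illustration: the function names and the sample XML are hypothetical, and it only checks that the XML is well-formed. A real XSD validation as performed by RINF would additionally need the official schema and a third-party XML Schema validator, since the standard library does not validate against XSD.

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import List


def package_dataset(xml_text: str, member_name: str = "dataset.xml") -> bytes:
    """Compress one XML document into an in-memory .zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member_name, xml_text)
    return buffer.getvalue()


def check_upload(zip_bytes: bytes) -> List[str]:
    """Return a list of problems found; an empty list means no problems.

    Only well-formedness is checked here, not conformance to an XSD.
    """
    errors: List[str] = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        xml_names = [n for n in archive.namelist() if n.lower().endswith(".xml")]
        if not xml_names:
            errors.append("archive contains no XML file")
        for name in xml_names:
            try:
                ET.fromstring(archive.read(name))
            except ET.ParseError as exc:
                errors.append(f"{name}: {exc}")
    return errors


# Hypothetical dataset; element names are NOT taken from the RINF schema.
SAMPLE = "<Dataset><OperationalPoint id='OP_A'/></Dataset>"
upload = package_dataset(SAMPLE)
```

Running such a check before uploading would catch malformed files early, so that the error report from the RINF system is limited to genuine schema violations.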

8.1.6 EULYNX

The EULYNX (European Initiative Linking Interlocking Subsystems) project is an initiative of several European infrastructure managers with the common goal of standardizing interfaces.

The following infrastructure managers are currently involved as partners in the initiative:
- Société Nationale des Chemins de Fer Luxembourgeois (CFL)
- DB Netz AG (DB)
- S.A.
- Bane NOR
- Liikennevirasto (FTA)
- Network Rail
- ProRail B.V.
- Société Nationale des Chemins de Fer Français (SNCF)
- SŽ-Infrastruktura, d.o.o. (SŽ)
- Trafikverket

The project aspires to a mutually shared vision of harmonized rail signaling systems, covering their technical architecture, functions and interfaces. The work breakdown structure of the EULYNX project includes items like system architecture, modelling & testing, data preparation, interfaces between interlocking, interfaces to track vacancy detection and adjacent interlocking or signaling subsystems.

The rail infrastructure managers intend to benefit from the project by being able to change, maintain, renew and update the systems in a competitive way.


This would place the IMs, as system integrators, in a position which provides them with a choice of various suppliers for different subsystems during the system life cycle. The goal is reduced costs for new projects and for modifications of existing system functionality or infrastructure layouts. Maintenance related activities should also benefit from this. Results of previous European initiatives concerning interlocking system standardization (e.g. Euro Interlocking, INESS and ERTMS) provide the basis for the project.

The EULYNX initiative acts as a cooperation based on a mutually accepted agreement following democratic principles and membership fees. The project community expects to have different kinds of partners: rail infrastructure managers as core members, and other active members like signaling or industrial partners, engineering bureaus and universities. Observers like associations, regulators etc. may also join. The current phase of the project will provide a full set of specifications. The project started on 19 February 2014, with a three year lifespan for this stage. After three years, the project organization will evolve into a standing organization for standardization of interfaces, based on a full set to be published in 2017 (Baseline Set 1 was partly released in March 2017, with the remaining documents planned for release in June 2017; Set 2, including formal models, is scheduled for December 2017). ([123])

Figure 22: EULYNX System architecture


8.2 AUTOMOTIVE

A multitude of different formats is available for use in the automotive industry. As it is not possible to mention every format in use within the scope of this research, some examples of important formats are listed below. To help readability, this section is separated into two parts. The first part discusses automotive centered formats. While these formats were developed for automotive specific tasks, some of them may be applied in other domains as well. The second part focuses on general formats that were not specifically developed with the automotive industry in mind and are often used in many different domains outside of automotive. They range from data processing and management to non-specific formats like CSV and UML.

8.2.1 Automotive centered formats

8.2.1.1 ASAM ODS

ODS (Open Data Services) focuses on the persistent storage and retrieval of testing data. The standard is primarily used to set up a test data management system on top of test systems that produce measured or calculated data from testing activities. Components of a complex testing infrastructure can store or retrieve data as needed for proper operation of tests or for test data post-processing and evaluation. A typical scenario for ODS in the automotive industry is the use of a central ODS server, which handles all testing data produced by vehicle test beds. The major strength of ODS compared to non-standardized data storage solutions is that data access is independent of the IT architecture and that the data model of the database is highly adaptable yet still well-defined for different application scenarios. ([105])

8.2.1.2 AUTOMOTIVE DATA AND TIME-TRIGGERED FRAMEWORK (ADTF)

EB Assist ADTF (Automotive Data and Time-Triggered Framework) is a tool for the development, validation, visualization and testing of driver assistance and automated driving features. It is intended to support the delivery of advanced driver assistance systems (ADAS) and highly automated driving (HAD) features. The vendor describes the tool as flexible, efficient, extendable and stable. ([106])

8.2.1.3 AUTOMOTIVE OPEN SYSTEM ARCHITECTURE (AUTOSAR)

AUTOSAR (AUTomotive Open System ARchitecture) is a worldwide development partnership of automotive interested parties founded in 2003. It pursues the objective of creating and establishing an open and standardized software architecture for automotive electronic control units (ECUs), excluding infotainment. Goals include scalability to different vehicle and platform variants, transferability of software, the consideration of availability and safety requirements, collaboration between various partners, sustainable utilization of natural resources, and maintainability throughout the whole "Product Life Cycle". ([107])

8.2.1.4 FUNCTIONAL MOCK-UP INTERFACE (FMI)

Functional Mock-up Interface (FMI) is a tool independent standard to support both model exchange and co-simulation of dynamic models using a combination of XML files and compiled C code. It is used to develop virtual products based on components that interchange their data. The variables for data exchange are defined in XML files and the components (applications) are provided as compiled C functions. ([108])
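To illustrate how exchange variables are declared in XML, the following Python sketch reads a heavily trimmed model description. The element and attribute names (fmiModelDescription, ModelVariables, ScalarVariable, valueReference, causality) follow the FMI 2.0 model description format as far as we are aware; the fragment is not schema-complete, and the model name, GUID and variable names are invented for this example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Trimmed, hypothetical model description; a real file contains more
# mandatory parts (for example the model structure) than shown here.
MODEL_DESCRIPTION = """
<fmiModelDescription fmiVersion="2.0" modelName="BrakeDemo" guid="{demo-guid}">
  <ModelVariables>
    <ScalarVariable name="brake_force" valueReference="0" causality="input">
      <Real/>
    </ScalarVariable>
    <ScalarVariable name="speed" valueReference="1" causality="output">
      <Real/>
    </ScalarVariable>
  </ModelVariables>
</fmiModelDescription>
"""


def list_variables(xml_text: str) -> Dict[str, Tuple[int, str]]:
    """Map each variable name to (valueReference, causality)."""
    root = ET.fromstring(xml_text.strip())
    variables: Dict[str, Tuple[int, str]] = {}
    for variable in root.iter("ScalarVariable"):
        variables[variable.get("name")] = (
            int(variable.get("valueReference")),
            variable.get("causality"),
        )
    return variables


variables = list_variables(MODEL_DESCRIPTION)
```

A co-simulation master would use such a table to connect the outputs of one component to the inputs of another, while the compiled C functions perform the actual computation.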

8.2.1.5 STEP AP 242 XML (ISO 10303-242)

The standard STEP AP 242 (ISO 10303-242) "Managed model based 3D engineering" is the merging of two ISO standards: Aerospace's STEP AP203 "Configuration controlled 3D design" and Automotive's STEP AP214 "Core data for automotive mechanical design processes". ([109])

8.2.1.6 OPENDRIVE

OpenDRIVE is an XML based road description format that defines road networks topologically, by linking road elements, and topographically, by using mathematical descriptions in three dimensions. Infrastructure is defined as road objects or signals in a relative coordinate system. OpenDRIVE can now be seen as the de facto standard within the driving simulator domain and is used for testing purposes as well. It is developed and maintained by a core team consisting of car manufacturers, map makers and research institutes. OpenDRIVE is royalty-free. ([110])

8.2.1.7 NAVIGATION DATA STANDARD (NDS)

Navigation Data Standard is a data format for road networks used for navigation purposes. It has a strong focus on interoperability and minimal usage of resources. It defines road networks in a topological and topographical way (at lane level, based on mathematical functions) and also defines road infrastructure, city and terrain models in a rudimentary way. NDS is developed by a consortium consisting of car manufacturers, tier-1 suppliers and map makers. ([111])

8.2.1.8 OPENSCENARIO

OpenSCENARIO is an XML based traffic scenario description format that defines entities, environmental conditions and the interaction or behavior of traffic participants. These interactions can be described relative to other entities, relative to the road, or in absolute terms. For the road description, a link to a corresponding OpenDRIVE file is used. OpenSCENARIO is developed and maintained by a core team consisting of car manufacturers, simulation tool providers and research institutes. It is royalty-free. ([112])

8.2.1.9 SENSORIS

SENSORIS is an initiative founded by HERE to define a car-to-cloud data standard. Sensor data and information about environmental conditions should be shared in a defined way between vehicles and backend systems of different manufacturers. Currently, information about hazard warnings, street parking and traffic conditions is modelled. ([113])

8.2.1.10 ADASIS

The Advancing map-enhanced driver assistance systems (ADASIS) initiative provides a so-called “electronic horizon” in the vehicle to different assistance systems. This horizon contains, for example, information about the map, vehicle position and speed in a standardized data model. The system is developed in a consortium consisting of car manufacturers, tier-1 suppliers and map makers. ([114])

8.2.1.11 CAR 2 CAR COMMUNICATION CONSORTIUM

The Car 2 Car Communication Consortium (C2C CC) defines and establishes standards for cooperative intelligent transportation systems (C-ITS). Part of this task is the specification of, and contribution to, vehicle-to-vehicle and vehicle-to-infrastructure communication. This communication uses various message types to describe the state of vehicles and to distribute information and warnings as well as information about intersection layouts and signal timing and phases. The consortium cooperates closely with the European Telecommunications Standards Institute. ([115])

8.3 INDUSTRY AND HOME AUTOMATION

8.3.1 Industry

See chapter 6.3.1 OPC UA.

8.3.2 Home Automation

8.3.2.1 BACNET

The communication protocol BACnet was developed specifically for the requirements of buildings. It is suited to both the automation and the management level. The emphasis is on building automation and control for HVAC plants, fire control panels, intrusion detection and access control systems. BACnet is continually being extended to additional building-specific systems such as escalators and elevators. By integrating IT technologies such as IPv6 and web services, the BACnet standard is developing further into a modern, IT-friendly and multidisciplinary building protocol. At the same time, standardized ASHRAE or AMEV device profiles, together with a strict testing and certification procedure, ensure a high level of quality and planning reliability. ([116])

Standard: ISO 16484-5

8.3.2.2 KNX

KNX is an open, worldwide standard that has been in use for more than 20 years, conforms to EN 50090 and ISO/IEC 14543, and is supported by more than 300 vendors. With KNX technology, advanced multidisciplinary solutions as well as simple ones can be implemented flexibly to satisfy individual requirements in room and building automation. KNX products for the control of lighting systems, shading and room climate, as well as for energy management and security functions, are easy to install and commission. A vendor-independent tool (ETS) is available for commissioning. For communication between devices, KNX can use twisted-pair cables, radio frequency (RF) or data networks based on the Internet Protocol. Coordinated room and building management often demands the integration of other technologies and systems. KNX therefore provides links and interfaces for connection to Ethernet/IP, RF, lighting control with DALI, and building automation and control systems. ([117])

Standard: EN 50090 and ISO/IEC 14543


8.3.2.3 LONWORKS

The LonWorks-based communication protocol is one of the most widely deployed technologies worldwide. Using the protocol, complete networks made up of interoperable products can be created. This is shown by the fact that more than 700 LonMark®-certified products from more than 400 companies are in use in the fields of building automation and control, traffic and energy supply. As a globally used standard, LonWorks focuses on HVAC functions in room automation and at the field level. The protocol conforms to ISO/IEC 14908 (worldwide), EN 14908 (Europe) and ANSI/CEA-709/852 (U.S.) and is also standardized in China. ([118])

Standard: ISO/IEC 14908 (worldwide), EN 14908 (Europe), ANSI/CEA-709/852 (U.S.)

8.3.2.4 DALI

DALI (Digital Addressable Lighting Interface) is a standardized interface for lighting control. Electronic ballasts for fluorescent lamps, transformers and sensors of lighting systems communicate with the building automation and control system via DALI. ([119])

Standard: IEC 62386-101:2009-06, Part 101: System; IEC 62386-102:2009-06, Part 102: Control gear

8.3.2.5 ENOCEAN

Leading international companies in the field of building infrastructure have joined to form the EnOcean Alliance, which aims to implement innovative RF solutions for sustainable building projects. The core technology is the self-powered RF technology developed by EnOcean for maintenance-free sensors, which can be installed wherever desired. The EnOcean Alliance stands for the incremental development of the interoperable standard and for a secure future of this RF sensor technology. ([120])

Standard: ISO/IEC 14543-3-10 (EnOcean radio)

8.3.3 OPC Foundation: The Interoperability Standard for Industrial Automation

The mission of the OPC Foundation (https://opcfoundation.org/) is to manage a global organization in which users, vendors and consortia collaborate to create data transfer standards for multi-vendor, multi-platform, secure and reliable interoperability in industrial automation. To support this mission, the OPC Foundation:
- creates and maintains specifications
- ensures compliance with OPC specifications via certification testing
- collaborates with industry-leading standards organizations


8.4 CIVIL ENGINEERING / CONSTRUCTION

8.4.1.1 CITY GML

City Geography Markup Language is a format for 3D models of cities and agglomerations. It is widely used for contour lines, roof areas of buildings, land use attributes, roads, railways and vegetation. It can be imported into calculation programs for 3D noise mapping, where the areas are split into parts with different noise-related attributes. Road and railway imports are used to define the sources and the acoustic properties. The vegetation attributes lead to different reflection coefficients, which have to be taken into account in large free-field calculations.

8.4.1.2 GEOTIFF

TIFF is primarily used by government bodies for satellite pictures or for pictures taken during overflights that also scan other attributes and geometries. The georeferencing makes it easy to import the pictures and adapt them to the required coordinate system. TIFF pictures are usually stored without lossy compression, so printed plans retain the highest available resolution. The pictures can also be used for modelling in a 3D noise mapping program. The picture resolution is usually sufficient for manual modelling of areas where no scanned 3D models exist. The pictures also help to improve imprecise or incorrectly scanned 3D models.

8.4.1.3 SIMPLE FEATURE ACCESS (OPENGIS)

Simple Feature Access is a format for storing 2D objects for geographical modelling. 2D objects such as lines, areas and points can be linked with attributes such as population numbers or street names and, if needed, with the height of the object. It is used in virtually every calculation program for free-field noise mapping and large-scale modelling. It is also used in GIS programs such as ArcGIS or QGIS. These programs are used for cutting and relabelling noise maps before they are uploaded to a public GIS platform, which uses the same format because it is supported by most GIS applications.
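Simple Feature Access geometries are commonly exchanged as Well-Known Text (WKT). The following minimal Python sketch illustrates how a point, a line and an area can be written as WKT and combined with attributes. It is an illustration only and not taken from any of the programs mentioned above; the helper names and the attribute values (`street_name`, `height_m`) are hypothetical.

```python
def point_wkt(x, y):
    """Format a single 2D point as WKT."""
    return f"POINT ({x} {y})"


def linestring_wkt(coords):
    """Format a sequence of (x, y) pairs as a WKT line."""
    body = ", ".join(f"{x} {y}" for x, y in coords)
    return f"LINESTRING ({body})"


def polygon_wkt(ring):
    """Format one outer ring as a WKT polygon, closing the ring if needed."""
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # WKT polygon rings must end where they start
    body = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON (({body}))"


# A feature is a geometry plus attributes (hypothetical example values).
street = {
    "geometry": linestring_wkt([(0, 0), (10, 0)]),
    "attributes": {"street_name": "Example Street", "height_m": 4.5},
}
```

In practice, a GIS library or database would usually generate and parse these strings; the sketch only shows the structure of the data.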

8.4.1.4 GEO JAVASCRIPT OBJECT NOTATION (GEOJSON)

JSON (short for JavaScript Object Notation) is a text-based file format for data exchange. The format is well structured and is syntactically valid JavaScript. JSON is widely used in web-based and mobile applications. Its file extension is *.json. GeoJSON is built upon JSON; its file extension is *.geojson. It allows geographical data to be represented by means of point, line string and polygon geometries as well as sets of them (multipoint, multiline string and multipolygon). GeoJSON files describing countries or smaller regional entities are freely available on the internet; country boundaries are usually represented by polygons. In many cases the open-source JavaScript library D3.js is used to process such files.

The GeoJSON extension TopoJSON encodes topology, which makes it possible to eliminate redundancies. In GeoJSON, the common border of two countries is stored twice (once in the polygon of country one and once in the polygon of country two). TopoJSON instead allows the definition of “arcs”: the common border is stored as a single arc, and arcs are then combined into polygons, so that topological relations are taken into account. As a result, TopoJSON is substantially more compact than GeoJSON, frequently reducing the size by 80% or more without a loss in accuracy.
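The idea of shared arcs can be sketched in a few lines of Python. The example below stores the common border of two hypothetical square "countries" A and B once, assembles both polygon rings from arcs, and writes the result as a GeoJSON FeatureCollection. The arc dictionary and the "-" prefix for a reversed arc are simplifications chosen for this sketch; they are not the actual TopoJSON encoding.

```python
import json

# Each arc is stored once; "shared" is the common border of A and B.
arcs = {
    "shared": [[1, 0], [1, 1]],
    "a_rest": [[1, 1], [0, 1], [0, 0], [1, 0]],
    "b_rest": [[1, 0], [2, 0], [2, 1], [1, 1]],
}

# Polygons reference arcs by name; a leading "-" means "traverse in reverse".
polygons = {"A": ["shared", "a_rest"], "B": ["-shared", "b_rest"]}


def ring_from_arcs(arc_refs):
    """Stitch the referenced arcs together into one closed coordinate ring."""
    ring = []
    for ref in arc_refs:
        points = arcs[ref.lstrip("-")]
        if ref.startswith("-"):
            points = points[::-1]
        # Every arc after the first starts where the previous one ended.
        ring.extend(points if not ring else points[1:])
    return ring


def feature(name, ring):
    """Wrap a ring as a GeoJSON polygon feature."""
    return {
        "type": "Feature",
        "properties": {"name": name},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


collection = {
    "type": "FeatureCollection",
    "features": [feature(n, ring_from_arcs(refs)) for n, refs in polygons.items()],
}
geojson_text = json.dumps(collection)
```

In the resulting GeoJSON text, the border points (1, 0) and (1, 1) appear in both polygons, whereas the arc-based structure holds them only once.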

8.5 TRAFFIC MANAGEMENT

In the traffic management domain, “classic data formats” such as spreadsheets, CSV files, plain text and scanned images are sometimes still used to exchange static data (e.g. traffic signal plans, loop detector metadata). Increasingly, however, more structured and flexible formats such as XML are used for static as well as highly dynamic data, and real-time data in particular is often provided via web service technologies such as WMS and WFS. As a map basis, OpenStreetMap (OSM) is becoming more and more popular alongside HERE and TomTom / Tele Atlas and is even used by public authorities. When traffic information or traffic management systems are set up, large amounts of data often have to be processed and combined, and as a first step they may be referenced to a central base map. The location referencing system OpenLR, which is described below, can be used for this purpose. An important topic in traffic management is real-time traffic information; the commonly used formats TMC, TPEG and DATEX II are also described below. For public transport, the General Transit Feed Specification (GTFS) is used to distribute real-time information and to supply journey planners, which in Germany are accessed, for example, through the TRIAS interface.

8.5.1 OpenLR (Location Referencing)

OpenLR is a royalty-free open standard for "procedures and formats for the encoding, transmission, and decoding of local data irrespective of the map" developed by TomTom. The format allows locations localised on one map to be found on another map to which the data have been transferred. ([122], [123]) OpenLR requires that the coordinates are specified in the WGS 84 format and that route links are given in meters. Also, all routes need to be assigned to a "functional road class". The specification is licensed under a Creative Commons license ([123]). TomTom has published a library for the format under the GPLv2.
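As an illustration of the two requirements above (WGS 84 coordinates and lengths in meters), the following Python sketch estimates the length of a route link from its WGS 84 shape points using the haversine formula. It is not part of the OpenLR specification or of the TomTom library; the function names and the spherical Earth radius are assumptions of this sketch, and a production system might use a more precise geodesic calculation.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumption of this sketch


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS 84 points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def link_length_m(points):
    """Sum the segment lengths of a link given as [(lat, lon), ...]."""
    return sum(haversine_m(*p, *q) for p, q in zip(points, points[1:]))
```

A call such as `link_length_m([(52.0, 13.0), (52.001, 13.0), (52.002, 13.001)])` returns the approximate length of a hypothetical three-point link in meters.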

8.5.2 Transport Protocol Experts Group TPEG

The Transport Protocol Experts Group (TPEG) is a data protocol suite for traffic and travel related information. TPEG can be carried over different transmission media (bearers), such as digital broadcast or cellular networks (wireless Internet). TPEG applications include, among others, information on road conditions, weather, fuel prices, parking or delays of public transport. ([124])


8.5.3 Traffic Message Channel (TMC)

Traffic Message Channel (TMC) is a technology for delivering traffic and travel information to motor vehicle drivers. It is digitally coded using the ALERT C protocol into RDS Type 8A groups carried via conventional FM radio broadcasts. It can also be transmitted on Digital Audio Broadcasting or satellite radio. TMC allows silent delivery of dynamic information suitable for reproduction or display in the user's language without interrupting audio broadcast services. Both public and commercial services are operational in many countries. When data is integrated directly into a navigation system, traffic information can be used in the system's route calculation. ([125])

8.5.4 DATEX II

Delivering European Transport Policy in line with the ITS Action Plan of the European Commission requires co-ordination of traffic management and the development of seamless pan-European services. With the aim of supporting sustainable mobility in Europe, the European Commission has for several years been supporting the development of information exchange, mainly between the actors of the road traffic management domain.

In the road sector, the DATEX standard was developed for information exchange between traffic management centres, traffic information centres and service providers, and it constitutes the reference for applications that have been developed in the last 10 years. The second-generation DATEX II specification now also opens the door to all actors in the traffic and travel information sector.

Much investment has been made in Europe over the last decade, both in traffic control and information centres and in a quantum shift in the monitoring of the trans-European transport network (TEN-T). This is in line with delivering the objectives of the EasyWay programme for safer roads, reduced congestion and a better environment. Collecting information is only part of the story: to make the most of the investment, data needs to be exchanged both with other centres and, in a more recent development, with those developing pan-European services provided directly to road users.

DATEX was originally designed and developed as a traffic and travel data exchange mechanism by a European task force set up to standardise the interface between traffic control and information centres. With the new generation DATEX II, it has become the reference for all applications requiring access to dynamic traffic and travel related information in Europe. The aim of the DATEX II organisation is that in 2020 DATEX II is THE information model for road traffic and travel information in Europe. ([126])

8.5.5 General Transit Feed Specification (GTFS)

The General Transit Feed Specification (GTFS), also known as GTFS static or static transit to differentiate it from the GTFS real time extension, defines a common format for public transportation schedules and associated geographic information. GTFS "feeds" let public transit agencies publish their transit data and developers write applications that consume that data in an interoperable way.


A GTFS feed is composed of a series of text files collected in a ZIP file. Each file models a particular aspect of transit information: stops, routes, trips, and other schedule data. The details of each file are defined in the GTFS reference. An example feed can be found in the GTFS examples. A transit agency can produce a GTFS feed to share their public transit information with developers, who write tools that consume GTFS feeds to incorporate public transit information into their applications. GTFS can be used to power trip planners, time table publishers, and a variety of applications, too diverse to list here, that use public transit information in some way. ([127])
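Because a feed is simply a ZIP archive of comma-separated text files, it can be read with standard tools. The following Python sketch builds a tiny in-memory feed and reads its stops.txt file. The feed content (stop and route values) is hypothetical and deliberately incomplete, since a valid feed needs further files such as trips and stop times; the helper name `read_gtfs_table` is an assumption of this sketch.

```python
import csv
import io
import zipfile


def read_gtfs_table(feed, table_name):
    """Return the rows of one text file of a GTFS feed as a list of dicts."""
    with zipfile.ZipFile(feed) as archive:
        with archive.open(table_name) as raw:
            # utf-8-sig tolerates an optional byte order mark at file start.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))


# Build a minimal demonstration feed in memory (hypothetical content).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "stops.txt",
        "stop_id,stop_name,stop_lat,stop_lon\nS1,Example Central,52.0,13.0\n",
    )
    archive.writestr(
        "routes.txt",
        "route_id,route_short_name,route_type\nR1,10,2\n",
    )
buffer.seek(0)

stops = read_gtfs_table(buffer, "stops.txt")
routes = read_gtfs_table(buffer, "routes.txt")
```

The same function can be pointed at the path of a downloaded feed instead of the in-memory buffer.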

8.5.6 The TRIAS interface

The TRIAS interface (corresponding to the German VDV-431-2) makes it possible to connect to different journey planners. The interface emerged from the research and standardisation project IP-KOM-ÖV of the German BMVI. The information provided via TRIAS is as up to date as the information provided by the original journey planner ([128]).

8.5.7 Mobilitätsdatenmarktplatz (MDM) and mCLOUD

The Mobility Data Marketplace (Mobilitätsdatenmarktplatz, MDM) is a neutral B2B platform for providers and users of traffic data ([132]). It offers defined standards for data exchange and, above all, the most extensive information in Germany on traffic flows, congestion, road works, parking facilities and more. The Mobility Data Marketplace is where stakeholders, information and opportunities meet. The MDM
- is a neutral platform on which real-time road traffic data of public authorities and the private sector are offered and exchanged,
- is the national single point of access for road traffic data in Germany under the European ITS Directive,
- offers a defined service level for data exchange and handles the distribution of data to customers,
- relieves providers and customers of organisational and technical effort.

In this way, road operators, for example, can offer their traffic data on a central platform and do not have to find individual solutions for different customers.

The mCLOUD is a data platform of the BMVI (German Federal Ministry of Transport and Digital Infrastructure) that was launched in May 2016. It makes the wealth of data held by the ministry and its subordinate agencies, millions of mobility, geo and weather data items, searchable at one central point. The mCLOUD is a growing system and is also open to data from science and industry. The mCLOUD
- is a search platform for open data from the business domain of the BMVI,
- works as a search engine that makes searching and finding simple,
- does not distribute the data itself but refers to the data interfaces and download links of the supplying providers and organisations.


At its launch in 2016, the mCLOUD included data sets on mobility (roads, railways, waterways, air transport), weather and climate, and water bodies, e.g.:
- data from the 1,700 counting stations of the German Bundesanstalt für Straßenwesen (BASt) (road utilisation, traffic density)
- high-water times and water levels in the German Bight
- real-time data and water levels of navigable waterways
- time series from more than 1,000 climate stations of the German Weather Service
- timetables of Deutsche Bahn, including data about the parking situation at railway stations

Both MDM and mCLOUD are part of the corporate strategy of the BMVI to support intelligent mobility in Germany. While the mCLOUD is a free open data portal for searching for data, the MDM offers its users comprehensive functionality for offering, subscribing to and exchanging real-time data.


9 REFERENCED DOCUMENTS

[1] http://www.in2rail.eu/, accessed 2017.
[2] http://www.in2rail.eu/Page.aspx?CAT=DELIVERABLES&IdPage=69d2e365-3355-45d4-bb3c-5d4ba797a3ac, accessed 2017.
[3] IN2RAIL: Deliverable D9.1 – Asset status representation.
[4] http://www.wordle.net/create, accessed 2017.
[5] JSON.ORG. (2017). Introducing JSON. Retrieved from json.org: http://json.org/json-en.html, accessed 2017.
[6] XML Essentials. (2017). Retrieved from w3.org: https://www.w3.org/standards/xml/core, accessed 2017.
[7] W3Schools. (2017). XML RDF. Retrieved from w3schools.com: https://www.w3schools.com/xml/xml_rdf.asp, accessed 2017.
[8] Computer-Hope. (2017). Spreadsheet. Retrieved from computerhope.com: https://www.computerhope.com/jargon/s/spreadsh.htm, accessed 2017.
[9] Opendata-handbook. (2017). File Formats. Retrieved from opendatahandbook.org: http://opendatahandbook.org/guide/en/appendices/file-formats/, accessed 2017.
[10] HDF Group, https://www.hdfgroup.org/, accessed 2017.
[11] An Introduction to NetCDF, Unidata, http://www.unidata.ucar.edu/software/netcdf/docs/netcdf_introduction.html, accessed 2017.
[12] https://en.wikipedia.org/wiki/Business_Process_Modeling_Language, accessed 2017.
[13] http://www.prostep.org/en/projects/code-of-plm-openness/, accessed 2017.
[14] https://en.wikipedia.org/wiki/Google_Web_Toolkit, accessed 2017.
[15] https://en.wikipedia.org/wiki/JT_%28visualization_format%29, accessed 2017.
[16] https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=oslc-core, accessed 2017.
[17] https://en.wikipedia.org/wiki/Requirements_Interchange_Format, accessed 2017.
[18] http://www.odata.org/, accessed 2017.
[19] https://en.wikipedia.org/wiki/Systems_Modeling_Language, accessed 2017.
[20] https://en.wikipedia.org/wiki/Unified_Modeling_Language, accessed 2017.
[21] http://www.uml.org/, accessed 2017.
[22] Website of ascolab GmbH. (2017) http://www.ascolab.com/, accessed 15.05.2017.
[23] https://opcfoundation.org/, accessed 2017.


[24] BOSAK, J., 1997. XML, Java, and the future of the Web. World Wide Web Journal, 2(4), pp. 219-227.
[25] http://www.w3.org/Consortium/, accessed 2017.
[26] http://www.w3.org/TR/2004/NOTE-ws-arch-20040211/#introduction
[27] CÂNDIDO, G., JAMMES, F., DE OLIVEIRA, J.B. and COLOMBO, A.W., 2010. SOA at device level in the industrial domain: Assessment of OPC UA and DPWS specifications, Industrial Informatics (INDIN), 2010 8th IEEE International Conference on 2010, IEEE, pp. 598-603.
[28] DISCENZO, F.M., NICKERSON, W., MITCHELL, C.E. and KELLER, K.J., 2001. Open systems architecture enables health management for next generation system monitoring and maintenance. Development program white paper, OSA-CBM Development Group.
[29] http://www.w3.org/TR/tr-technology-stds, accessed 2017.
[30] http://infinispan.org, accessed 2017.
[31] https://redis.io/, accessed 2017.
[32] http://www.opengeospatial.org/standards/sfa, accessed 2017.
[33] http://gdal.org/, accessed 2017.
[34] https://en.wikipedia.org/wiki/World_file, accessed 2017.
[35] https://en.wikipedia.org/wiki/OpenStreetMap, accessed 2017.
[36] http://wiki.openstreetmap.org/wiki/Map_Features, accessed 2017.
[37] http://wiki.openstreetmap.org/wiki/Tags, accessed 2017.
[38] http://www.openrailwaymap.org, accessed 2017.
[39] https://en.wikipedia.org/wiki/Shapefile, accessed 2017.
[40] http://www.opengeospatial.org/, accessed 2017.
[41] http://docs.opengeospatial.org/wp/07-165r1/, accessed 2017.
[42] http://www.sensorml.com/, accessed 2017.
[43] LAS Specification, Version 1.4-R13, American Society for Photogrammetry and Remote Sensing, 15 July 2013. Retrieved from http://www.asprs.org/wp-content/uploads/2010/12/LAS_1_4_r13.pdf, accessed 2017.
[44] asprs.org. The imaging & geospatial information society. https://www.asprs.org/committee-general/laser-las-file-format-exchange-activities.html, accessed 09.06.2017.
[45] http://www.novapoint.com/sets-standard-bim-railway-projects, accessed 2017.
[46] http://www.bimtaskgroup.org/, accessed 2017.
[47] http://buildingsmart.org/standards/technical-vision/, accessed 2017.
[48] http://www.mimosa.org/mimosa-osa-eai, accessed 2017.
[49] Tiwari, A., Turner, C. J., & Majeed, B. (2008). A review of business process mining: State-of-the-art and future trends. Business Process Management Journal, 14(1), 5-22.
[50] Van der Aalst, Wil MP, van Dongen, B. F., Herbst, J., Maruster, L., Schimm, G., & Weijters, A. J. (2003). Workflow mining: A survey of issues and approaches. Data & Knowledge Engineering, 47(2), 237-267.


[51] Ko, R. K., Lee, S. S., & Wah Lee, E. (2009). Business process management (BPM) standards: A survey. Business Process Management Journal, 15(5), 744-791.
[52] https://data.sncf.com/api, accessed or used 2017.
[53] https://ressources.data.sncf.com/explore/?sort=modified, accessed or used 2017.
[54] http://data.deutschebahn.com/dataset/data-streckennetz, accessed or used 2017.
[55] http://data.deutschebahn.com/dataset/data-stationsdaten, accessed or used 2017.
[56] http://data.deutschebahn.com/dataset/api-fahrplan, accessed or used 2017.
[57] http://www.vbb.de/de/article/fahrplan/webservices/schnittstellen-fuer-webentwickler/5070.html, accessed 2017.
[58] https://opentransportdata.swiss/en/dataset/istdaten, accessed or used 2017.
[59] https://opentransportdata.swiss/en/dataset/timetable-2017-gtfs, accessed or used 2017.
[60] https://opentransportdata.swiss/en/dataset/didok, accessed or used 2017.
[61] https://opentransportdata.swiss/en/dataset/goch, accessed or used 2017.
[62] https://opentransportdata.swiss/en/dataset/bhlist, accessed or used 2017.
[63] https://opentransportdata.swiss/en/dataset/timetable-2017-hrdf, accessed or used 2017.
[64] https://opentransportdata.swiss/en/dataset/gtfsrt, accessed or used 2017.
[65] https://opentransportdata.swiss/en/dataset/fahrtprognose, accessed or used 2017.
[66] https://opentransportdata.swiss/en/dataset/aaa, accessed or used 2017.
[67] https://opentransportdata.swiss/en/dataset/timetabeloverview, accessed or used 2017.
[68] http://nrodwiki.rockshore.net/index.php/Main_Page, accessed 2017.
[69] http://dataportal.orr.gov.uk/browsereports, accessed 2017.
[70] OMG website, http://www.omg.org/bpdm/, accessed 2017.
[71] WFMC website, http://www.wfmc.org/XPDL.htm, accessed 2017.
[72] Edibasics website, http://www.edibasics.com, accessed 2017.
[73] W3Schools. (2017). XML RDF. Retrieved from w3schools.com: https://www.w3schools.com/xml/xml_rdf.asp, accessed 2017.
[74] XML Essentials. (2017). Retrieved from w3.org: https://www.w3.org/standards/xml/core, accessed 2017.
[75] Nash, A.; Huerlimann, D.; Schütte, J.; Krauss, V.P. (2004): RailML – a standard data interface for railroad applications. In: Computers in Railways IX, pp. 233-240.
[76] railML.org: The railML subschemas. https://www.railml.org/en/user/subschemes.html, accessed 04.04.2017.
[77] Wikipedia: railML. https://en.wikipedia.org/wiki/RailML, accessed 30.03.2017.
[78] railML.org: Version Planning. https://www.railml.org/en/developer/version-timeline.html, accessed 26.06.2018.


[79] railML.org: 31st railML Conference. https://www.railml.org/en/event-reader/31st-railml-conference-berne.html, accessed 04.04.2017.
[80] railML.org: (Forum) railML alpha version 3.0.5 available / railML 2.4 announced. http://www.railml.org/forum/index.php?t=msg&th=506&start=0&, accessed 04.04.2017.
[81] railML.org: Official railML website. https://www.railml.org/en/, accessed 30.03.2017.
[82] railML.org: railML Forum. http://forum.railml.org/, accessed 30.03.2017.
[83] railML.org: railML Wiki. https://wiki.railml.org/, accessed 30.03.2017.
[84] railML.org: railML Trac ticket system. http://trac.railml.org/, accessed 30.03.2017.
[85] railML.org: railVIVID – The railML Viewer & Validator powered by UIC. https://www.railml.org/en/user/railvivid.html, accessed 29.04.2017.
[86] railML.org: railML® partners. https://www.railml.org/en/introduction/partners.html, accessed 04.04.2017.
[87] railML.org: Licence terms. https://www.railml.org/en/user/licence.html, accessed 29.04.2017.
[88] railML.org: Certification of your railML® interface. https://www.railml.org/en/user/certification.html, accessed 29.04.2017.
[89] railML.org: Infrastructure. https://www.railml.org/en/user/subschemes/infrastructure.html, accessed 04.04.2017.
[90] railML.org: (Wiki) CO: changes / 2.3. http://wiki.railml.org/index.php?title=CO:changes/2.3, accessed 25.04.2017.
[91] UIC: Rail TopoModel and railML® - The foundation for an universal Infrastructure Data Exchange Format. In: 24th railML.org Conference, Paris, 18.09.2013; http://documents.railml.org/events/slides/2013-09-17_uic_nissi-erim_presentation.pdf, accessed 2017.
[92] UIC: International Railway Solution IRS 30100 – RailTopoModel; 1st edition September 2016 (IRS 30100:2016).
[93] UIC, railML.org: RailTopoModel Wiki. http://wiki.railtopomodel.org/, accessed 28.04.2017.
[94] railML.org: Organisation. https://www.railml.org/en/organisation.html, accessed 04.04.2017.
[95] Gély, L.; Dessagne, G.; Vanderbeck, F. (2010): A multi scalable model based on a connexity graph representation. In: Computers in Railways XII, pp. 193-204.
[96] railML.org: (Wiki) Dev: Use cases. http://wiki.railml.org/index.php?title=Dev:Use_cases, accessed 04.04.2017.
[97] railML.org: (Wiki) IS: Use Cases. http://wiki.railml.org/index.php?title=IS:UseCases, accessed 04.04.2017.


[98] railML.org: railML Subversion repository. https://svn.railml.org, accessed 04.04.2017.
[99] railML.org: Schema railML.xsd. https://www.railml.org/files/download/schemas/2016/railML-2.3/documentation/railML.html, accessed 24.04.2017.
[100] UIC, railML.org: Official RailTopoModel Website. http://www.railtopomodel.org/en/, accessed 25.04.2017.
[101] UIC, railML.org: (Wiki) RTM modelling concepts. http://wiki.railtopomodel.org/index.php?title=RTM_modelling_concepts, accessed 28.04.2017.
[102] railML.org: (Wiki) IS:UC:Asset status representation. http://wiki.railml.org/index.php?title=IS:UC:Asset_status_representation, accessed 02.05.2017.
[103] http://www.era.europa.eu/Document-Register/Pages/TAF-TSI.aspx, accessed 2017.
[104] http://www.uic.org/com/IMG/pdf/UIC_Leaflet_407-1.pdf, accessed 2017.
[105] https://wiki.asam.net/display/STANDARDS/ASAM+ODS, accessed 2017.
[106] https://www.elektrobit.com/products/eb-assist/adtf/, accessed 2017.
[107] https://en.wikipedia.org/wiki/AUTOSAR, accessed 2017.
[108] http://fmi-standard.org/, accessed 2017.
[109] http://www.ap242.org/, accessed 2017.
[110] http://www.opendrive.org/project.html, accessed 2017.
[111] http://nds-association.org/#thestandard, accessed 2017.
[112] http://openscenario.org/project.html, accessed 2017.
[113] https://here.com/en/innovation/sensoris, accessed 2017.
[114] http://adasis.org/, accessed 2017.
[115] https://www.car-2-car.org, accessed 2017.
[116] http://www.big-eu.org, accessed 2017.
[117] http://www.knx.org, accessed 2017.
[118] http://www.lonmark.org, accessed 2017.
[119] http://www.dali-ag.org, accessed 2017.
[120] http://www.enocean-alliance.org, accessed 2017.
[121] http://www.opengeospatial.org/standards/sensorml
[122] http://web.archive.org/web/20110807055627/http://www.h-online.com/open/news/item/Open-format-for-local-map-data-743315.html, accessed 2017.


[123] http://www.openlr.org, accessed 2017.
[124] https://en.wikipedia.org/wiki/TPEG, accessed 2017.
[125] https://en.wikipedia.org/wiki/Traffic_message_channel, accessed 2017.
[126] http://www.datex2.eu/content/datex-background, accessed 2017.
[127] https://developers.google.com/transit/gtfs/, accessed 2017.
[128] http://www.connect-fahrplanauskunft.de/unsere-services/open-service.html, accessed 2017.
[129] railML.org: Organisation. https://www.railml.org/en/organisation.html, accessed 20.06.2017.
[130] http://www.era.europa.eu/Core-Activities/Interoperability/Pages/RINF.aspx, accessed 2017.
[131] http://www.eulynx.eu, accessed 2017.
[132] http://www.mdm-portal.de/, accessed 2017.


10 APPENDIX A-QUESTIONNAIRE

The questionnaire was designed around a division of the relevant formats into
- railway formats,
- maintenance formats, and
- miscellaneous other formats.

10.1 APPENDIX A-QUESTIONNAIRE: PARTICIPANTS’ DOMAINS OF SERVICE

Figure 2 showed how the participants’ companies and institutions are distributed over the domains in which they provide their services. Most participants came from Railway, Automotive and Traffic Management (each accounting for about the same share of the sample). In addition, there were participants working in the domains of Industry automation, Tunnel, Building monitoring, and Energy.

10.2 APPENDIX A-QUESTIONNAIRE: EXTENT OF USE AND USE CASES OF OPEN DATA EXCHANGE FORMATS

10.2.1 Appendix A-questionnaire: extent of use and use cases of railway formats

Figure 23 depicts a tag cloud (generated with a tool such as Wordle, cf. http://www.wordle.net/create) of the use cases of railway formats and the comments given by the participants. Table 13 gives their answers in full detail.

According to the feedback, the Infrastructure Data Management (IDMVU) Interface Standard, the civil engineering and survey measurement data format LandXML, and the rail track database from ÖBB, an Austrian mobility services provider, i.e. the ÖBB Gleisdatenbank, are all formats used within the context of the traffic planning software PROVI (Programmsystem für Verkehrs- und Infrastrukturplanung) from the OBERMEYER Planen + Beraten GmbH. RTM (RailTopoModel), a logical object model to standardise the representation of railway infrastructure-related data, and railML v 3, version 3 of the Railway Markup Language for data exchange for infrastructure managers and railway companies, as well as RINF, the Register of Infrastructure, which refers to Article 49 of Directive (EU) 2016/797 and provides for transparency concerning the main features of the European Railway infrastructure, are all tested in pilots, whereas TAF TSI (Telematics applications for freight service) is already in use for Train Composition.


Figure 23: Tag cloud of use cases/comments for railway formats

Table 13: Use cases/comments for railway formats

Format | Use case/comment
IDMVU | PROVI (program) | PROVI (Program)
LandXML | PROVI (program)
ÖBB Gleisdatenbank | PROVI (program)
CIVIL3D | CIVIL3D
csv | Timetable
Signalling Data Exchange Format (SDEF) | Network Rail XML model for capture of network models, initially for signalling. Supports model of network at multiple levels of detail, linked between levels for cross-referencing.
FRAME | European Union Transport
OpenScenario | http://www.openscenario.org/
DATEX II | Roads Format
OSLC | https://open-services.net/
OpenSimulation | https://github.com/OpenSimulationInterface
FMI | https://www.fmi-standard.org/
ICE870-5 | old but still working
IP-KOM-ÖV | no, to be verified
OJP | moovit?
railML | Tested in pilots | Tested in pilot
RTM / railML v 3 | Tested in pilots | Tested in pilot
RINF | Tested in pilot | Used customized version | A first description of the infrastructure was sent.
TAF TSI | Train Composition is already in use.
RTM | Tested in pilots | Tested in pilot | SNCF RESEAU is part of the network RailML. Not used for the moment.

10.2.2 Appendix A-questionnaire: extent of use and use cases of maintenance formats

Table 14 shows the extent of use of 8 well-known maintenance formats, given as the frequency of answers in each category (the darker the green cell colour, the more often that category was chosen as an answer). Figure 24 depicts a tag cloud of the use cases and comments given by the participants. Table 15 gives their answers in full detail.

Table 14: Extent of use of maintenance formats (frequencies of answers)

Format | of interest in future | considered, not used | in preparation | in operation
OPC UA | 0 | 1 | 0 | 5
ISO 13372 | 3 | 1 | 0 | 2
MIMOSA OSA-CBM | 3 | 2 | 0 | 2
EN 62682:2015 & EEMUA 191 | 2 | 1 | 0 | 1
NR-L2-SIG-30036-Issue1 | 0 | 0 | 0 | 1
MIMOSA OSA-EAI | 3 | 2 | 0 | 1
Sensor ML | 2 | 0 | 5 | 1
csv | 0 | 0 | 0 | 1
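The frequencies in Table 14 can be tallied programmatically. The following Python sketch transcribes the counts from Table 14 and shows one way to total the answers per format and to find the format chosen most often in a given category. The names `TABLE_14`, `answers_per_format` and `leader` are assumptions made for illustration and are not part of the questionnaire evaluation.

```python
# Column order as in the header of Table 14.
COLUMNS = (
    "of interest in future",
    "considered, not used",
    "in preparation",
    "in operation",
)

# Answer frequencies transcribed from Table 14 (one tuple per format).
TABLE_14 = {
    "OPC UA": (0, 1, 0, 5),
    "ISO 13372": (3, 1, 0, 2),
    "MIMOSA OSA-CBM": (3, 2, 0, 2),
    "EN 62682:2015 & EEMUA 191": (2, 1, 0, 1),
    "NR-L2-SIG-30036-Issue1": (0, 0, 0, 1),
    "MIMOSA OSA-EAI": (3, 2, 0, 1),
    "Sensor ML": (2, 0, 5, 1),
    "csv": (0, 0, 0, 1),
}


def answers_per_format(table):
    """Total number of answers given for each format, over all categories."""
    return {fmt: sum(counts) for fmt, counts in table.items()}


def leader(table, category):
    """Format with the highest frequency in one category (first one wins ties)."""
    idx = COLUMNS.index(category)
    return max(table, key=lambda fmt: table[fmt][idx])


totals = answers_per_format(TABLE_14)
print(totals)
print(leader(TABLE_14, "in operation"))
print(leader(TABLE_14, "in preparation"))
```

Under these transcribed counts, OPC UA leads the "in operation" category and Sensor ML the "in preparation" category, which matches the statement that Sensor ML is currently tested in pilots.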

According to the feedback, the alarm management standards for industrial asset management, EN 62682:2015 and EEMUA 191, are quite new for rail markets but well established in plant, manufacturing and materials processing markets. ISO 13372 provides common terminology for condition monitoring and diagnostics of machines and serves as a reference point for EEMUA 191, IEC 62682 and ISO 13374. The Network Rail Interlocking Log Standard NR-L2-SIG-30036-Issue1 is a common record format for interlocking incident recorders: it is used to capture sequences of events and actions and supports fault investigation and incident investigation (e.g. a signal passed at danger).

MIMOSA OSA-CBM is the open system architecture for moving information in a condition-based maintenance system, published by MIMOSA, an Operations and Maintenance Information Open System Alliance. It is an implementation of ISO 13374 and is used for rail remote condition monitoring of infrastructure assets, covering data acquisition, data manipulation and state detection; newer uses for health assessment and prognostic assessment are in preparation. Likewise, MIMOSA OSA-EAI, the Open System Architecture for Enterprise Application Integration, is of interest for enterprise application integration and is used in defence markets. Use cases for OPC UA, a machine-to-machine communication protocol for industrial automation, are seen in security-related applications; at the time of the survey, the format was also considered too specific to local (plant) data collection and not suitable for wide area / national data acquisition and test automation. Finally, the Sensor Model Language (Sensor ML) is currently being tested in pilots.
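The processing stages named in the feedback on MIMOSA OSA-CBM can be written down as an ordered enumeration. The following Python sketch is only an illustration: the stage names follow the wording of the feedback and of Table 18, the ordering follows the functional blocks of ISO 13374, and the names `Stage`, `IN_USE`, `IN_PREPARATION` and `status` are assumptions that do not come from the OSA-CBM specification.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Processing stages in the order used by ISO 13374 (illustrative names)."""
    DATA_ACQUISITION = 1
    DATA_MANIPULATION = 2
    STATE_DETECTION = 3
    HEALTH_ASSESSMENT = 4
    PROGNOSTIC_ASSESSMENT = 5
    ADVISORY_GENERATION = 6


# Status of each stage as reported in the questionnaire feedback.
IN_USE = {Stage.DATA_ACQUISITION, Stage.DATA_MANIPULATION, Stage.STATE_DETECTION}
IN_PREPARATION = {Stage.HEALTH_ASSESSMENT, Stage.PROGNOSTIC_ASSESSMENT}


def status(stage):
    """Return the reported status of a stage for rail remote condition monitoring."""
    if stage in IN_USE:
        return "in use"
    if stage in IN_PREPARATION:
        return "in preparation"
    return "not reported"


for stage in Stage:
    print(f"{stage.value}. {stage.name}: {status(stage)}")
```

The sketch makes visible that the reported rail use currently stops at state detection, while the later stages are either in preparation or not mentioned in the feedback.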


Figure 24: Tag cloud of use cases/comments for maintenance formats

Table 15: Use cases/comments for maintenance formats

Format | Use case/comment
EN 62682:2015 & EEMUA 191 | Quite new for Rail markets, established well in plant, manufacturing and materials processing markets.
ISO 13372 | Common terminology for condition monitoring and diagnosis. Reference point for EEMUA191, IEC62682 and ISO13374.
NR-L2-SIG-30036-Issue1 | Common record format for interlocking incident recorders. Used to capture sequences of events and actions, supports fault investigation and incident investigation (e.g. signal passed at danger).
MIMOSA OSA-CBM | For rail remote condition monitoring of infrastructure assets, data acquisition, manipulation and state detection. Newer uses in preparation for health assessment and prognostic assessment.
MIMOSA OSA-EAI | Of interest for Enterprise application integration. Used in defence markets.
OPC UA | At the time was too specific for local (plant) data collection, not suitable for wide area / national data acquisition and test automation. | Just for security related applications
Sensor ML | Tested in pilots | Tested in pilot
csv | Time table

10.2.3 Appendix A-questionnaire: extent of use and use cases of other formats

Table 16 shows the extent of use of 96 well-known miscellaneous open data exchange formats, given as the frequency of answers in each category (the darker the green cell colour, the more often that category was chosen as an answer). Figure 25 depicts a tag cloud of the use cases and comments given by the participants. Table 17 gives their answers in full detail.

The feedback on use cases for the 96 other formats was extensive, so only a few key statements are given here. For example, ASCII-GRID, Simple Feature Access, GeoPackage, Esri Shape and RoadXML are used for Noise Maps and Road Noise Maps; LandXML, CityGML, OSM, and Esri Shape are used together with software for traffic or infrastructure planning and Building Information Modelling (BIM) such as PROVI, Infra Works, and Civil 3D; formats such as UML, GML, CityGML, and Simple Features OLE/COM (OpenGIS) play a major role in import, export and modelling tasks; OSM and OpenWeather-Maps are applied in Webservices; and GeoTIFF and GML in JPEG 2000 are used for Overlays. Moreover, SNMP is used for Standard Server Monitoring and Product Monitoring.

Table 16: Extent of use of other formats (frequencies of answers)

Format | of interest in future | considered, not used | in preparation | in operation
UML | 2 | 2 | 0 | 14
SNMP | 0 | 2 | 0 | 10
GUID, UUID, … | 1 | 0 | 0 | 8
Esri Shape | 2 | 2 | 0 | 7
OSM | 0 | 4 | 1 | 7
GWT | 4 | 2 | 0 | 6
Modbus | 0 | 2 | 0 | 6
GeoJSON | 3 | 0 | 1 | 5
BACnet | 0 | 0 | 0 | 4
GML | 3 | 2 | 0 | 4
GeoTIFF | 2 | 0 | 1 | 4
KML | 0 | 1 | 1 | 4
OpenDRIVE | 0 | 0 | 0 | 4
OpenLayers | 0 | 0 | 0 | 4
OWS Context | 0 | 1 | 0 | 4
Simple Feature Access | 0 | 0 | 0 | 4
TMC | 1 | 1 | 0 | 4
Coordinate Transformation | 3 | 3 | 0 | 3
DATEX2 | 0 | 0 | 0 | 3
XES | 1 | 0 | 0 | 3
GeoAPI | 4 | 1 | 0 | 3
GML in JPEG 2000 | 4 | 0 | 0 | 3
OpenCRG | 0 | 1 | 0 | 3
Simple Features SQL (OpenGIS) | 1 | 0 | 0 | 3
WKT CRS | 0 | 0 | 0 | 3
Filter Encoding | 0 | 0 | 0 | 2
HDF5 | 5 | 0 | 0 | 2
INSPIRE | 1 | 0 | 0 | 2
ISO 8601 | 1 | 0 | 0 | 2
KNXbus | 0 | 0 | 0 | 2
ONVIF | 0 | 0 | 1 | 2
OpenSCENARIO | 0 | 0 | 2 | 2
OpenWeather-Maps | 2 | 1 | 2 | 2
PubSub | 1 | 1 | 0 | 2
Road2Simulation | 1 | 0 | 0 | 2
Web Feature Service | 0 | 0 | 0 | 2
Web Map Service | 0 | 0 | 0 | 2
Web Map Tile Service | 0 | 0 | 0 | 2
Earth Observation Products | 0 | 0 | 0 | 1
CityGML | 0 | 2 | 1 | 1
GeoPackage | 0 | 1 | 1 | 1
GeoSparql | 2 | 0 | 0 | 1
GeoXACML | 1 | 0 | 0 | 1
GUF | 1 | 0 | 0 | 1
LandXML | 0 | 3 | 0 | 1
Moving Features | 2 | 0 | 0 | 1
OpenLR | 0 | 0 | 0 | 1
RDS | 1 | 2 | 1 | 1
RoadXML | 0 | 4 | 0 | 1
Simple Features OLE/COM (OpenGIS) | 0 | 0 | 0 | 1
Styled Layer Descriptor | 0 | 0 | 0 | 1
TMS | 0 | 0 | 0 | 1
Web Coverage Service | 1 | 0 | 0 | 1
ASCII-GRID | 0 | 0 | 0 | 1
ARML2.0 | 1 | 1 | 1 | 0
Catalogue Service | 2 | 1 | 1 | 0
GeoSciML | 1 | 0 | 0 | 0
IndoorGML | 1 | 0 | 0 | 0
ISO 6709 | 2 | 0 | 0 | 0
OpenLS | 1 | 0 | 0 | 0
NetCDF | 0 | 1 | 0 | 0
Observations and Measurements | 1 | 0 | 0 | 0
Open GeoSMS | 1 | 0 | 0 | 0
Ordering Services Framework for Earth Observation | 0 | 1 | 0 | 0
PUCK | 1 | 1 | 0 | 0
Sensor Observation Service | 1 | 0 | 0 | 0
Sensor Planning Service | 1 | 0 | 0 | 0
SENSORIS | 1 | 1 | 0 | 0
SWE Common Data Model | 1 | 0 | 1 | 0
SWE Service Model | 0 | 0 | 1 | 0
TPEG | 1 | 0 | 0 | 0
WaterML | 1 | 0 | 0 | 0
Web Coverage Processing Service | 2 | 0 | 0 | 0
Web Map Context | 1 | 0 | 0 | 0
Web Processing Service | 0 | 0 | 1 | 0
Web Service Common | 1 | 0 | 0 | 0
Navigation Data Standard (NDS) | 0 | 0 | 1 | 0

Figure 25: Tag cloud of use cases/comments for other formats

Table 17: Use cases/comments for other formats

Format | Use case/comment
GUID, UUID, … | IT-Infrastructure/Inventory | NR datafeeds | ARIANE Model
BACnet | Building Control Management
CityGML | Data Import, Modelling | Infra Works
Coordinate Transformation | Likely to be used in RINM | Dominion, Virtuelle Welt, Road2Simulation, wherever spatial data is processed
DATEX2 | for Highways England
Esri Shape | Export; Import; Noise Mapping | Civil 3D, Infra Works | Used in multiple systems for geospatial information | Dominion, Virtuelle Welt, Road2Simulation, wherever spatial data is processed
XES | Used in a number of software systems
GeoAPI | Webservices | General use of ARCGIS
GML | Import; Modelling
GeoJSON | SQL presentation of interactive contents
GeoPackage | Import; Noise Mapping
GeoTIFF | Webservices | Overlays | Civil 3D, Automap | Dominion, Virtuelle Welt, Road2Simulation, wherever spatial data is processed
GML in JPEG 2000 | Webservices | Overlays
GWT | tested in pilots
IndoorGML | Room Acoustics
ISO 6709 | Cross system
ISO 8601 | cross system
KML | project-specific application in the course of a Libyan project | Noise Maps | Dominion, Virtuelle Welt, Road2Simulation, wherever spatial data is processed
LandXML | Import, Noise Maps | Infra Works, Civil3D, PROVI
Modbus | device configuration | Many customers
ONVIF | for Highways England cameras
OpenCRG | Road2Simulation
OpenDRIVE | Dominion, Virtuelle Welt, Road2Simulation, ...
OpenLayers | DAT-GDV-Wiki, Bahnserver, lot more; superb for quick, accessible visualisation of geodata
OpenSCENARIO | Dominion
OSM | Webservices | Navigation; Overlays | InfraWorks | nearly in every project
OpenWeather-Maps | Webservices
OWS Context | the institute's geodata infrastructure
PUCK | Measurements Devices
Road2Simulation | We develop it :-p
RoadXML | Road Noise Maps
Simple Feature Access | Data Handling; Noise Mapping | Nearly all geo-libraries build up on that: Oracle Spatial, PostGIS, GeoTools, GDAL/OGR, Spatialite, ESRI stuff, QGIS
Simple Features OLE/COM (OpenGIS) | Import; Export
Simple Features SQL (OpenGIS) | Import; Export | Oracle Spatial, PostGIS, SpatiaLite ...
SNMP | Standard Server Monitoring; Product Monitoring | device-dependent | many customers
Styled Layer Descriptor | Used in OGC OWS
SWE Common Data Model | Industry common model
TMC | for Highways England
UML | Software Development | Import, Modelling
Web Coverage Service | the institute's geodata infrastructure
Web Feature Service | the institute's geodata infrastructure, Geo-Bug-Tracker, ...
ASCII-GRID | Noise Mapping, Height Points

10.3 APPENDIX A-QUESTIONNAIRE: HOW GENERIC/SPECIALISED ARE OPEN DATA EXCHANGE FORMATS?

10.3.1 Appendix A-questionnaire: how generic/specialised are railway formats?

Figure 3 showed how generic or specialised 14 prominent railway formats were perceived to be, averaged over the participating experts. According to the feedback, the ÖBB Gleisdatenbank, the comma-separated values (CSV) file format in its specific use for timetables, and the RTM / railML v 3 formats are perceived as the most specialised formats.


10.3.2 Appendix A-questionnaire: how generic/specialised are maintenance formats?

Figure 26 shows how generic or specialised 8 prominent maintenance formats are perceived to be, averaged over the participating experts. According to the feedback, NR-L2-SIG-30036-Issue1 and the comma-separated values (CSV) file format in its specific use for timetables are perceived as the most specialised formats.

[Bar chart "How generic/specialised are maintenance formats?": rating (1-5; 0: not answered) for csv, NR-L2-SIG-30036-Issue1, MIMOSA OSA-CBM, OPC UA, Sensor ML, ISO 13372, EN 62682:2015 & EEMUA 191, and MIMOSA OSA-EAI; axis from 0 to 5.]

Figure 26: How generic/specialised are maintenance formats?

10.3.3 Appendix A-questionnaire: how generic/specialised are other formats?

Figure 27 shows the top 15 generic other (miscellaneous) formats, and Figure 28 shows the top 15 specialised other formats. Figure 29 shows how generic or specialised all of the other formats are perceived to be, averaged over the participating experts.
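The figures plot ratings on a scale from 1 to 5, with 0 reserved for "not answered", averaged over the participating experts. The source does not state whether unanswered entries enter the average. The following Python sketch contrasts the two possible conventions; the ratings are hypothetical and are not survey data, and the function names are assumptions made for illustration.

```python
from statistics import mean

NOT_ANSWERED = 0  # code used in the figures for a missing rating


def mean_including_unanswered(ratings):
    """Average over all experts, counting 'not answered' as the value 0."""
    return mean(ratings) if ratings else 0.0


def mean_of_answers(ratings):
    """Average over the experts who actually rated the format (1-5 only)."""
    answered = [r for r in ratings if r != NOT_ANSWERED]
    return mean(answered) if answered else None


# Hypothetical ratings of one format by six experts (NOT survey data).
hypothetical_ratings = [0, 0, 4, 0, 5, 0]

print(mean_including_unanswered(hypothetical_ratings))
print(mean_of_answers(hypothetical_ratings))
```

The two conventions can differ considerably when only a few experts rated a format, so the convention in use matters when the averaged values of different formats are compared.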


[Bar chart "Top 15 generic other formats": rating (1-5; 0: not answered) for Web Service Common, Web Map Context, Table Joining Service, Symbology Encoding, Simple Features CORBA, SensorThings, Sensor Planning Service, Sensor Observation Service, Observations and Measurements, NetCDF, Moving Features, LonMark, GUF, ARML2.0, and OpenLayers; axis from 0 to 1.5.]

Figure 27: Top 15 generic other formats


[Bar chart "Top 15 specialised other formats": rating (1-5; 0: not answered) for ASCII-GRID, DATEX2, TMC, RDS, BACnet, OpenWeather-Maps, ONVIF, RoadXML, KNXbus, Coordinate Transformation, WKT CRS, TPEG, TMS, OpenSCENARIO, and OpenLR; axis from 0 to 5.]

Figure 28: Top 15 specialised other formats


[Bar chart "How generic/specialised are other formats on average?": rating (1-5; 0: not answered) for all other formats; the list of formats begins with those of Figure 28 and ends with those of Figure 27; axis from 0 to 5.]

Figure 29: How generic/specialised are other formats on average?


10.4 APPENDIX A-QUESTIONNAIRE: “BEST” EXAMPLE OF AN OPEN DATA EXCHANGE FORMAT SUITABLE FOR ONE OF SEVERAL SOURCES OF INFORMATION

One question of the questionnaire asked for the (one) “best” example of an Open Data Exchange format suitable for each of the following information sources:
• Asset model – Rail infrastructure, including asset definition, geospatial, logical and relational model
• Asset model – Rolling-stock, including asset definition, logical and relational model
• Asset condition
• Alarms and events
• Asset maintenance activity – past and future schedule
• Asset fault history
• Rail operation history – past rolling stock movements and information
• Rail operation plan – timetable data
• Business process notation model (including standard operating procedures format)

The question also asked with respect to which criterion the chosen format is “best”. Figure 4 depicted a tag cloud of the best formats and the respective criteria given in the participants' answers. Table 18 gives their answers in full detail. In most of the answers, railML was given as the best format for modelling Rail infrastructure and Rolling-stock, and also for Rail operation plans. railML is also the most frequently named best format overall, preferred because it is an open format and because of its large user base. For asset condition, alarms and events, and asset maintenance, the majority of answers gave SensorML as the best format, e.g. because of a good user experience. For alarms and events, OPC-UA or OPC-AE was named as best format just as often as SensorML. Among other formats such as the Systems Modeling Language (SysML, based on UML) and Business Process Model and Notation (BPMN), the Business Process Modeling Language (BPML) was named as best format for a business process notation model because it is generic, widely used, and well known.
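The "Most frequently" row of Table 18 amounts to a per-source tally of the named formats, in which ties are possible (as for alarms and events). The following Python sketch shows one way to compute such a tally while keeping ties. The answer list is hypothetical and is not a transcription of Table 18, and the function name `most_frequent` is an assumption made for illustration.

```python
from collections import Counter


def most_frequent(answers):
    """Return all formats that share the highest count, sorted alphabetically."""
    counts = Counter(answers)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(fmt for fmt, n in counts.items() if n == top)


# Hypothetical answers for one information source (NOT survey data).
alarms_and_events = [
    "sensorML",
    "OPC-UA or OPC-AE",
    "sensorML",
    "OPC-UA or OPC-AE",
    "SMTP",
]

print(most_frequent(alarms_and_events))
```

Returning a list instead of a single value avoids silently dropping one of two equally popular formats.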

Table 18: "Best" formats for several information sources with the respective criteria Information source:

Rail infrastructure | Rolling-stock | Asset condition | Alarms and events | Asset maintenance | Rail operation plan | Business process notation model | Road infrastructure

Format: SHP, ASCII, XML, GML | railML, sensorML | OSA-CBM | sensorML | sensorML | XML | SysML/UML | OpenDRIVE
Criterion: Applicability; user experience; Specific meaning and definition clear. For: acquisition, manipulation, state detection, health assessment, prognosis and advisory generation; open format, large user base

Format: PROVI, Civil 3D; railML; sensorML; OPC-UA or OPC-AE; CIF; UML, BPMN
Criterion: open format, large user base; user experience; widely used; user experience

Format: (1) RailML, (2) SDEF; railML; SMTP; railML; BPML
Criterion: for network models, at many layers, interlinked, with time domain; wide use; user experience; widely used, generic, well known

Format: RailML; sensorML; railML
Criterion: XML based so easy to understand; open format, large user base

Format: railML + RailTopoModel; sensorML; railML
Criterion: user experience

Format: railML, sensorML
Criterion:

Format: railML
Criterion: open format, large user base

Format: railML
Criterion: openness

Format: railML
Criterion:

Most frequently: railML | railML | sensorML | sensorML, OPC-UA or OPC-AE | sensorML | railML | UML | OpenDRIVE

10.5 APPENDIX A-QUESTIONNAIRE: OPTIONAL MINDSET QUESTIONS: OPEN DATA EXCHANGE POLICY AND ATTITUDE TOWARDS OPEN DATA

In Figure 30, the percentages of “yes” answers are given for three questions regarding the Open Data Exchange policy of the company or institution:
• Does your company or institution define Open Data strategies?
• Does your company or institution own Open Data portals?
• Is your company or institution willing to contribute to Open Data Exchange initiatives/portals?

These questions belong to a set of optional general mindset questions, the feedback for which is shown in Figure 30 and Figure 31. For exactly half of the participating companies or institutions (i.e. 50% of them), a participating expert chose to answer this set of questions. According to the feedback, all companies or institutions with an expert answering on their behalf (i.e. 100% of them) are willing to contribute to Open Data Exchange initiatives/portals. Two out of three of these companies or institutions (i.e. 66.6% of them) define Open Data strategies, whereas 40% of them even own an Open Data portal.


[Bar chart "Policy of the company/institution: Percentage of answers "yes"": percentage of "yes" answers among all companies with an expert answer, for the questions "Is your company or institution willing to contribute to Open Data Exchange initiatives/portals?", "Does your company or institution own Open Data portals?", and "Does your company or institution define Open Data strategies?"; axis from 0% to 100%.]

Figure 30: Policy of the company/institution

Figure 31 shows the attitude of the experts' companies or institutions, averaged over the participating experts. According to the feedback, the mean attitudes towards using non-proprietary data formats in products and towards Open Data are both very positive: both lie between 4 and 5 on a scale ranging from 1 (strongly negative attitude) to 5 (strongly positive attitude).


[Bar chart "Attitude of the company/institution: Mean attitude": mean attitude (1: strongly negative; 5: strongly positive) for the questions "What is your company's/institution's attitude towards using non-proprietary data formats in products?" and "What is your company's/institution's attitude towards Open Data?"; axis from 4 to 4.5.]

Figure 31: Attitude of the company/institution: Mean attitude

10.6 APPENDIX A-QUESTIONNAIRE: STRENGTHS AND WEAKNESSES OF OPEN DATA EXCHANGE FORMATS

Figure 32 depicts a tag cloud of the strengths and weaknesses of Open Data Exchange formats as given by the participants. Table 19 gives their answers in full detail. Note that the strengths and weaknesses presented in one row of the table may come from different participants and may therefore contradict each other.

According to the feedback, general strengths of Open Data Exchange formats are:
• open documentation,
• universal use of data,
• the absence of vendor lock-in,
• the fact that no country-specific data is involved.

General weaknesses were seen in possible misinterpretations, in the confusion that may arise because there are too many formats and too many solutions, and in the relatively high risk that data is misused with universal formats.


Figure 32: Tag cloud of strengths and weaknesses of Open Data exchange formats

Moreover, a strength of OSA-CBM is that it covers applications ranging from data acquisition through to advisory generation in a structured, well-defined format. A strength of railML and railTopoModel is that they were defined with the involvement of the main European railway actors and that they are standardised open formats. On the other hand, railML was criticised because it is not yet complete for all railway assets, and because it is “uglily” (/nasty) hierarchical and huge. Other weaknesses of railML and OpenDRIVE include their slow development and innovation cycle, the fact that the development of supporting tools is uncoupled from that of the formats, and the fact that the tools interpret the data differently (i.e. only a subset of the format is implemented and not all data format versions are supported).

Table 19: Strengths and weaknesses of Open Data exchange formats

Format | Strengths | Weaknesses
OSA-CBM | Covers data acquisition through to advisory generation in a structured, well defined format. | Verbose. Somewhat ambiguous use of XML inheritance.
OPC-UA (and OPC-DA) | Very broadly used | Proprietary, inflexible, limited structure and functionality in DA (i.e. just Data Acquisition, no higher level functional support - e.g. health assessment, diagnosis, prognosis)
XML | clear definition | Creates large data files
time/location | ease of exchange and use | none
OGC OWS (WMS, WFS, WCS, ...) | versatile, quite generic; standardised for decades already, today easy to use | only suitable to transport simple/flat geodata without sophisticated data model dependencies
OpenDRIVE | detailed road topology and topography description, well established, standardised / open formats | in version <= 1.4 very specific; representation of lanes other than driving lanes very poorly possible, e.g. in intersection areas; mathematical representation of geometries is pain in the ass for data exchange and should be switched to something like OGC Simple Features. Do the bloody (/nasty) smoothing of your trajectories on the application's side, (for hell's sake)! / slow innovation cycle / slow development, development of supporting tools uncoupled and interpret the data differently (only subset of the format is implemented, not all data format versions are supported)
OpenCRG | full open source | none
OpenSCENARIO | great potential | slow innovation cycle
railTopoModel | defined involving the main European railway actors |
railML | defined involving the main European railway actors / standardised / open formats | not yet completed for all railway assets, uglily (/nasty) hierarchical, huge, you should split it into smaller formats / too generic in asset description / slow development, development of supporting tools uncoupled and interpret the data differently (only subset of the format is implemented, not all data format versions are supported), slow innovation cycle
Road2Simulation | more generic road description in OGC Simple Features; offers possibility to derive other formats from, like OpenDRIVE, NDS, … | still under development and not known well, yet
sensorML | good feedback from other sectors |
CityGML | you can represent nearly everything with it | you can represent nearly everything with it; huge, you should split it into smaller formats

10.7 APPENDIX A-QUESTIONNAIRE: LICENSING AND/OR LEGAL ISSUES HAMPERING APPLICATION OF OPEN DATA EXCHANGE FORMATS

Figure 33 depicts a tag cloud of the licensing and/or legal issues hampering the application of Open Data Exchange formats, as named and explained by the participants. Table 20 gives their answers in full detail. The issues named include unclear adoption policies (railML), a confusing tangle of different uses, programs and policies (SHP), the obstacle of horrendous costs for joining the consortium before access to the specification is granted (NDS), and the general problem that data from business projects is usually not royalty-free and therefore cannot be provided as Open Data.


Figure 33: Tag cloud for licensing and/or legal issues hampering application of Open Data Exchange formats

Table 20: Licensing and/or legal issues hampering application of Open Data Exchange formats

Format | Issue
SHP | many different programs and different uses
Navigation Data Standard (NDS) | no access to specification unless joining the consortium/development group for horrendous costs!
railML | policy for adoption not clear
general | data from projects is not royalty-free and cannot be provided as Open Data
