Open Data Observatories

Naeima Hamed
School of Computer Science and Informatics, Cardiff University, UK
[email protected]

Omer Rana
School of Computer Science and Informatics, Cardiff University, UK
[email protected]

Pablo Orozco-terWengel
School of Biosciences, Cardiff University, UK
[email protected]

Benoît Goossens
School of Biosciences, Cardiff University, UK
[email protected]

Charith Perera
School of Computer Science and Informatics, Cardiff University, UK
[email protected]

March 26, 2021

ABSTRACT

Open Data Observatories are online data platforms that provide free real-time and historical data. They offer citizens and applications a collaborative, unified environment, supplemented with reusable datasets, analysis tools and interactive visualisations. Open Data Observatories collect and integrate various data types from multiple disparate providers. Data types include variables such as weather, traffic and social media, while providers are mainly the interconnected devices, services and individuals in the Internet of Things (IoT). The continually increasing volume and variety of such data require timely integration, management and analysis, yet the results must still be presented in a way that end-users can easily understand. Data that interact in real time preserve their value and enable a more in-depth understanding of real-world choices. This survey explored Open Data and reviewed twelve data observatories, focusing on their data management approaches. We investigated the observatories' aims, designs and data types for some applied domains, namely transport, energy, environment, and social sensing. Finally, we outlined five research challenges that influence their implementation.
Keywords Data Observatories, Dashboards, Data portal, Smart City Data

A PREPRINT - MARCH 26, 2021

1 Introduction

The proliferation of technology has brought an immense increase in raw data: humans record information in their brains, manually on paper and digitally in machines. Structured, semi-structured and unstructured data accumulate from different sources including authorities, academic bodies, and the public. Each source uses various methods to collect information, most commonly the Internet of Things (IoT). Many governments worldwide have published some of these data as Open Data; a few internet businesses, however, hold vast amounts of data but open only a small portion of them [1]. Opening data can be achieved by using data observatories and interactive dashboards [2] that can break raw data down to a fine granularity and apply scientific techniques to analyse and visualise them [3][4][5][2][6]. Open Data Observatories aim to promptly communicate heterogeneous data and their associated metadata on a single screen. Intended audiences and stakeholders may interact with the wealth of real and dynamic information, which enables them to discover new phenomena and infer future events.

Most of the observatories extract real-time data from devices in the Internet of Things (IoT) and transmit them to remote locations [7]. IoT devices collect data from the sensors and actuators embedded in objects and share them through nodes and controllers [8]. The sensors' central role is to collect observations from the source and interact with the associated controllers (consumer perspective) or a gateway (industrial perspective). Controllers aggregate streams of real-time data and transmit them to back-end systems such as IoT cloud platforms [9], yet they can also perform data analysis tasks. The IoT cloud platforms serve as an information repository that may enable data-centric actions such as modelling, analysis and visualisation.
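The sensor, controller and cloud-platform flow described above can be illustrated with a minimal sketch. It assumes a hypothetical `Reading` schema and a fixed 60-second window, neither of which comes from the surveyed observatories; it only shows the idea that a controller receives many raw observations and forwards fewer, summarised records to the back end.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Reading:
    """One observation from a sensor (hypothetical schema)."""
    sensor_id: str
    variable: str   # e.g. "temperature"
    timestamp: int  # seconds since an arbitrary start
    value: float


def aggregate(readings, window_seconds=60):
    """Average readings per (variable, time window).

    Stands in for the controller/gateway step: raw observations go in,
    summarised records ready for a back-end platform come out.
    """
    buckets = defaultdict(list)
    for r in readings:
        # Align each timestamp to the start of its window.
        window_start = r.timestamp - (r.timestamp % window_seconds)
        buckets[(r.variable, window_start)].append(r.value)
    return [
        {"variable": var, "window_start": start,
         "mean": mean(vals), "count": len(vals)}
        for (var, start), vals in sorted(buckets.items())
    ]


# Illustrative data only; these values are made up for the example.
readings = [
    Reading("s1", "temperature", 0, 20.0),
    Reading("s2", "temperature", 30, 22.0),
    Reading("s1", "temperature", 60, 21.0),
    Reading("s3", "humidity", 10, 50.0),
]
summary = aggregate(readings)
```

Here four raw readings collapse into three summary records, one per variable and window. A real deployment would also have to handle late or missing readings, which this sketch ignores.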
Legacy data systems, including data lakes [10] and relational databases [11], are too slow and siloed to cope with the ever-growing size and diversity of IoT data. Open Data Observatories can integrate, process, and share these big and heterogeneous data carefully and in a timely manner, besides making them findable, downloadable and easily accessible in a user-friendly format [2]. In the absence of competent data observatories, crucial information may lose value, become isolated, and eventually go stale.

• Existing Survey Our survey was inspired by the work of Ma et al. [12] on finding timely solutions for managing IoT data across multiple smart cities. The authors predominantly aimed to bridge the gap between data collection and utilisation. For this, they surveyed fourteen smart-city datasets and a few modern methods for data modelling and decision making, outlining some research challenges that may grow with time.

• Our Contribution Our survey reviewed twelve data observatories that collected, integrated, and delivered real-time and historical data. We studied their fundamental data management approaches from generation to processing and presentation. Further, we highlighted five challenges that may constrain their launch and viability, such as integrating heterogeneous data while maintaining sufficient data quality, provenance, and privacy. Our primary intention is to review some existing literature in the hope that it will be helpful to developers, users, and stakeholders of data observatories.

The survey is structured as follows: Section 2 explains and distinguishes Open Data, Public Data and Open Government Data. Section 3 introduces the twelve selected Open Data Observatories and describes the purposes, data management approaches and smart-service applications of each. Section 4 recapitulates the data types and insights for the reviewed observatories in the applied domains of transport, environment, energy, and social sensing.
Section 5 briefly describes and compares most of the data formats, storage databases, predictive analytics and visualisations used at Open Data Observatories. Next, Section 6 specifies the five nominated research challenges, namely data integration, context, quality, provenance, and privacy. Finally, Section 7 concludes and summarises the survey.

2 Open Data

For the past decade, many citizens and organisations worldwide have used Open Data to develop applications that address social and economic challenges. Open Data are digital, non-personal information that is accessible without limits and free for the intended users [13]. Citizens can use them lawfully provided that they credit the sources and their contributors [14][15]. There are many different types of Open Data, concerned with the environment, energy, education, and health. Open Data are predictably structured [16] and in machine-readable formats, for example spreadsheets [17], Comma Separated Value files (CSV), eXtensible Markup Language (XML), JavaScript Object Notation (JSON), Shapefiles, Sequence (SP), Record Columnar (RC), Optimised Row Columnar (ORC), and Parquet files [16]. Machine-readable formats enable computer software to re-use, integrate and model the data for analysis. There are also a few inflexible Open Data formats, specifically Portable Document Format (PDF) and HyperText Markup Language (HTML), that computers cannot manipulate directly. Data collected and shared instantly are referred to as real-time data, in contrast to historical data, which are stored over time.

For the data to be open, they must satisfy the following criteria, as summarised by the Open Data Handbook at opendatahandbook.org and discussed in [18].

• Available and accessible: the data must be complete, unaltered, and preferably downloadable over the internet in machine-readable formats.
• Re-use and re-distribute: the data must be permitted for full exploitation and re-publication, including merging with other datasets.
• Universal participation: the data must be non-discriminatory and non-restricted, equally offered to everyone.

2.1 Public Data

Public Data are different from Open Data. They refer to information made freely available to the public; however, to access them, users must undergo certain procedures and satisfy usage requirements. An example of public data that are not open is the archive of legal records [19]. Public Data can become Open Data if transformed to a digital and standardised machine-processable format.

2.2 Open Government Data

Governing bodies are essential providers of Open Data. They are accountable for maintaining the technical and legal aspects of these data. Information made public by governments covers subject areas ranging from science and environment to public spending [20][13]. Several developed countries worldwide are now obliged by law to publish their collected data. In the UK, the scheme commenced in 2009, when the government issued a Command Paper [21] committing to release government datasets and make them publicly available for re-use at no cost. In 2010, the UK government launched the data.gov.uk website, which permits the central government, local authorities, and public bodies to freely publish their data. These data representations, also referred to as data catalogues, contain datasets in many formats, including Comma Separated Value files (CSV) and JavaScript Object Notation (JSON). The Data Catalog Vocabulary (DCAT), a W3C recommendation, describes a dataset as a collection of data, curated and published by an organisation and accessible in multiple formats [22]. UK government Open Data have grown exponentially since the launch, reaching over 40,000 datasets by November 2017 [23].
The primary purposes of such data are to promote transparency and re-use, improve public services, engage citizens, and create broader opportunities for innovation and best practice [19].
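The idea of one catalogued dataset offered in several machine-readable formats can be made concrete with a small sketch. The catalogue entry below is hypothetical and only loosely modelled on the DCAT notion of a dataset with multiple distributions; its field names, publisher and values are assumptions for illustration, not the DCAT vocabulary itself nor any real data.gov.uk record.

```python
import csv
import io
import json

# Hypothetical catalogue entry: one dataset, two distributions.
# Field names are illustrative and are not actual DCAT terms.
dataset = {
    "title": "Air quality readings (illustrative)",
    "publisher": "Example City Council",
    "distributions": [
        {"format": "CSV",
         "content": "station,no2\nA,41\nB,37\n"},
        {"format": "JSON",
         "content": '[{"station": "A", "no2": 41}, {"station": "B", "no2": 37}]'},
    ],
}


def load_records(distribution):
    """Parse one distribution into a list of {'station', 'no2'} dicts."""
    fmt = distribution["format"]
    text = distribution["content"]
    if fmt == "CSV":
        rows = csv.DictReader(io.StringIO(text))
        # CSV carries no types, so the numeric column is converted by hand.
        return [{"station": r["station"], "no2": int(r["no2"])} for r in rows]
    if fmt == "JSON":
        return json.loads(text)
    # Formats such as PDF would need extraction before any reuse.
    raise ValueError(f"no parser for format: {fmt}")


parsed = {d["format"]: load_records(d) for d in dataset["distributions"]}
same = parsed["CSV"] == parsed["JSON"]
```

Both distributions yield identical records, so `same` is true. Software can move between machine-readable formats without losing content, whereas a PDF of the same table would fall through to the error branch.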
