D4.1. Establish
Total Page:16
File Type:pdf, Size:1020Kb
D4.1 Data Management Platform Architecture Deliverable ID: D 4.1 Deliverable Title: Data Management Platform Architecture Revision #: 1.4 Dissemination Level: Public Responsible beneficiary: VTT Contributing beneficiaries: All Contractual date of delivery: 30.09.2017 Actual submission date: 27.10.2017 ESTABLISH D4.1 Architecture of Data Management Platform 1 Table of Content Table of Content .................................................................................................................................... 2 1. Introduction ..................................................................................................................................... 3 1.1 Definitions and abbreviations ........................................................................................................ 3 1.2 Introduction .................................................................................................................................. 4 2. ESTABLISH Data Management Platform Architecture ..................................................................... 6 2.1 Data Management Platform in relation to Whole ESTABLISH Architecture .................................... 6 2.2 ESTABLISH Microsoft Azure Reference Architecture .................................................................... 6 2.3 ESTABLISH Azure Reference Architecture’s PaaS services ......................................................... 7 2.4 Data Management Platforms relation to country pilots and relevant processes & principles........... 9 3. Security related questions and Best Practices regarding Cloud Architecture .................................. 12 3.1 Multi-factor Authentication and Role Based Access Control ........................................................ 12 3.2 Data Encryption and Encryption Key Management...................................................................... 12 3.3 Active Monitoring ........................................................................................................................ 13 3.4 Compliance with Regulation ....................................................................................................... 13 4. Data management in ESTABLISH ................................................................................................. 14 4.1 Data Management ...................................................................................................................... 14 4.2 Meta Data Management ............................................................................................................. 14 4.3 Storing Audit Trail information on Cloud Environment ................................................................. 14 4.4 Documentation ........................................................................................................................... 15 4.5 Life Cycle Management and Maintainability ................................................................................ 15 5. Conclusions .................................................................................................................................. 16 ESTABLISH D4.1 Architecture of Data Management Platform 2 1. Introduction 1.1 Definitions and abbreviations DMP (Data Management Platform) - is a platform for managing data from multiple sources for storing, processing, analyzing and provisioning. In this project, the platform is built over Azure which is Microsoft cloud platform developed in the PaaS model. Architecture - is both the process and the product of planning, designing, and constructing data solutions. It contains following elements: information technology specifications, models and guidelines, using a variety of Information Technology notations. Architecture of DMP is covered further in this document by describing all aspects of application Azure services and processes to fulfill project objectives. AI (Artificial Intelligence) - is apparently intelligent behavior by machines, rather than the natural intelligence (NI) of humans and other animals. In computer science AI research is defined as the study of "intelligent agents”. ETL (Extract, Transform, Load) - refers to a process in database usage and especially in data warehousing. Data extraction is where data is extracted from homogeneous or heterogeneous data sources; data transformation where the data is transformed for storing in the proper format or structure for the purposes of querying and analysis; data loading where the data is loaded into the final target database. GDPR (General Data Protection Regulation) – A 2016 formed regulation by European Union that targets to strengthen and unify Data Protection for individuals in EU. HSM (Hardware Security Module) - is a physical computing device that safeguards and manages digital keys for strong authentication and provides crypto processing. HTTPs (Hyper Text Transport Protocol) - also called HTTP over Transport Layer Security, is a communications protocol for secure communication over a computer network which is widely used on the Internet. IaaS (Infrastructure as a service) - refers to online services that provide high-level APIs used to dereference various low-level details of underlying network infrastructure like physical computing resources, location, data partitioning, scaling, security, backup etc. IoT (Internet of Things) - is the network of physical devices, vehicles, and other items embedded with electronics, software, sensors, actuators, and network connectivity which enable these objects to collect and exchange data. ML (Machine Learning) – field of computer science that utilizes algorithms to enable learning form data and predicting based on the trained algorithms. PaaS (Platform as a Service) - is a category of cloud computing services that provides a platform allowing customers to develop, run, and manage applications without the complexity of building and maintaining the infrastructure typically associated with developing and launching an app. REST API (Representational State Transfer Application Protocol Interface) - is a way of providing interoperability between computer systems on the Internet. REST-compliant Web services allow requesting systems to access and manipulate textual representations of Web resources using a uniform and predefined set of stateless operations. ESTABLISH D4.1 Architecture of Data Management Platform 3 SQL (Structured Query Language) - domain-specific language used in programming and designed for managing data held in a relational database management system. SSL/ TLS (Secure Sockets Layer / Transport Layer Security) – SSL is a predecessor of TLS, both frequently referred to as "SSL", are cryptographic protocols that provide communications security over a computer network. VPN (Virtual Private Network) - extends a private network across a public network, and enables users to send and receive data across shared or public networks as if their computing devices were directl y connected to the private network. BYOL (Bring Your Own License) - it enables companies to leverage the power of cloud computing in order to use their existing licenses to deploy and track their license use across the entire company. 1.2 Introduction This deliverable is about the Establish Data Management Cloud Platform Architecture. A strong architectural foundation is important for any project. Good understanding of the common issues and possible solutions are essential to guarantee good performance and provide good Data Security. Industry recognized best practices and Azure PaaS services ensure that all necessary threats are considered. The aim of the Data Management platform is to provide a secure and reliable platform for data management purposes. The platform is intended to gather, store, process and utilize data provided by ESTABLISH country pilots and possible applications or custom Web sites. Establish Data Modelling is built on Microsoft Azure Cloud Platform1. Azure is the current Cloud Platform leader achieving global scale with 42 announced regions – more than any other cloud provider. Compared to other platform providers on the market, it is characterized by more comprehensive compliance coverage, with more than 60 compliance offerings. Azure was also the first major cloud provider to contractually commit to the requirements of the General Data Protection Regulation (GDPR). In-order-to protect solutions, Azure embeds security, privacy and compliance into its development methodology, and has been recognized as the most trusted cloud for government institutions, earning a FedRAMP High authorization that covers 18 Azure services. Looking from the project’s requirements’ and deliverables perspective, Azure in the only Cloud Platform that enables building intelligent solutions at scale (Big Data) using cognitive APIs, bots, IoT and Machine Learning capabilities as PaaS With complaining these capabilities with powerful GPU-based computing, Azure accelerates deep learning, enables high-performance computing simulations and conducts real- time data analytics, which is crucial when dealing with sensor data. Also, to support local pilots, Azure offers hybrid consistency everywhere – in application development, management and security, identity management across the data platform. This helps reduce the risk and cost of a hybrid cloud environment by enabling a common set of skills and offering portability of applications and workloads. Azure is a great helper in getting more value out of investments by reducing costs of solutions. It enables