BIG

Big Data Public Private Forum

Consolidated Technical White Papers


Project Acronym: BIG
Project Title: Big Data Public Private Forum (BIG)
Project Number: 318062
Instrument: CSA
Thematic Priority: ICT-2011.4.4

D2.2.1 First Draft of Technical White Papers
Work Package: WP2 Strategy & Operations
Due Date: 30/04/2013
Submission Date: 31/05/2013
Start Date of Project: 01/10/2012
Duration of Project: 26 Months
Organisation Responsible for Deliverable: DERI
Version: 1.02
Status: Draft
Author name(s): Tim van Kasteren (AGT), Herman Ravkin (AGT), Martin Strohbach (AGT), Mario Lischka (AGT), Miguel Tinte (ATOS), Tomas Pariente (ATOS), Tilman Becker (DFKI), Axel Ngonga (InfAI), Klaus Lyko (InfAI), Sebastian Hellmann (InfAI), Mohamed Morsey (InfAI), Philipp Frischmuth (InfAI), Ivan Ermilov (InfAI), Michael Martin (InfAI), Amrapali Zaveri (InfAI), Sarven Capadisli (InfAI), Edward Curry (NUIG), Andre Freitas (NUIG), Nur Aini Rakhmawati (NUIG), Umair ul Hassan (NUIG), Aftab Iqbal (NUIG), Anna Karpinska (NUIG), Szymon Danielczyk (NUIG), Pablo Mendes (OKFN), John Domingue (STIR), Anna Fensel (UIBK), Andreas Thalhammer (UIBK)
Reviewer(s): Ricard Munné (ATOS)
Nature: R – Report, P – Prototype, D – Demonstrator, O – Other
Dissemination level: PU – Public, CO – Confidential, only for members of the consortium (including the Commission), RE – Restricted to a group specified by the consortium (including the Commission Services)
Project co-funded by the European Commission within the Seventh Framework Programme (2007-2013)


Revision history

Version  Date        Modified by            Comments
0.1      27/04/2013  Nur Aini (DERI)        Integrated the data acquisition, data analysis, data curation and data storage white papers from the TWGs.
0.2      28/04/2013  Nur Aini (DERI)        Fixed formatting issues.
0.3      29/04/2013  Aftab Iqbal (DERI)     Executive summary added, reviewed the whole document and fixed references, typos etc.
0.4      30/04/2013  Aftab Iqbal (DERI)     References added in Data Curation chapter.
0.5      01/05/2013  Aftab Iqbal (DERI)     Updated 2.5.4, 4.2.4 and 4.2.5
0.6      01/05/2013  Edward Curry (DERI)    Overall review. Formatting changes. Updated Executive Summary.
0.61     10/05/2013  Ricard Munné (ATOS)    Full document review
0.7      15/05/2013  Aftab Iqbal (DERI)     Minor updates to the document based on the review by Ricard Munné (ATOS)
0.7.1    17/05/2013  Edward Curry (DERI)    Review of updates
0.8      28/05/2013  Aftab Iqbal (DERI)     Merged Data Usage white paper
0.81     29/05/2013  Edward Curry (DERI)    Internal review of white paper
0.82     30/05/2013  Edward Curry (DERI)    Minor feedback from Data Storage working group
0.82     30/05/2013  Ricard Munné (ATOS)    Final document review
1.0      31/05/2013  Edward Curry (DERI)    Minor changes based on review. Final release.
1.01     10/06/2013  Edward Curry (DERI)    Minor fix to figure references and added missing authors
1.02     09/10/2013  Andre Freitas (DERI)   Minor text corrections, authors added, new references added.


Copyright © 2012, BIG Consortium

The BIG Consortium (http://www.big-project.eu/) grants third parties the right to use and distribute all or parts of this document, provided that the BIG project and the document are properly referenced.

THIS DOCUMENT IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS DOCUMENT, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


Executive Summary

Big Data is an emerging field in which innovative technology offers alternatives for resolving the inherent problems that appear when working with huge amounts of data, providing new ways to reuse and extract value from information. The main dimensions of Big Data are typically characterized by the 3 Vs (volume, velocity and variety), which require new forms of data processing to enable enhanced decision making, insight discovery and process optimization. As a result, Big Data technology adoption within industrial domains is not a luxury but an imperative need for most organizations to gain competitive advantage. The BIG Project (http://www.big-project.eu/) is an EU coordination action that will provide a roadmap for Big Data. The work is split into various groups covering industrial sectors and technical areas.

Figure 0-1: The BIG Project Structure

As can be seen in Figure 0-1, BIG is split into a number of Sector Forums, which gather needs from vertical industrial sectors, and Technology Working Groups, which focus on different Big Data technical aspects. The Data Value Chain is comprised of Data Acquisition, Data Analysis, Data Curation, Data Storage, and Data Usage. For each part of the value chain, a Technical Working Group has been established to examine the capabilities and current maturity of Big Data technologies in these areas. Each technical working group has established the current state of play within its area by, firstly, analysing the state of the art from the literature and, secondly, interviewing prominent industrial players and academic researchers in related organisations and research fields. This white paper details the initial reports from the Technical Working Groups in the Data Value Chain, describing the state of the art in each part of the chain together with emerging technological trends for coping with Big Data. Exemplar use cases are provided to illustrate existing approaches, and a comprehensive survey of existing approaches and tools is given. This document also provides an initial mechanism to tie the work of the Technology Working Groups and the Sector Forums together by linking the requirements gathered by the Sector Forums with the capabilities identified by the Technology Working Groups.


Table of Contents

Executive Summary
1. Data Acquisition
  1.1. Introduction
  1.2. State of the Art
    1.2.1 Definition
    1.2.2 Technologies
  1.3. Industrial Sector approach towards Data Acquisition
    1.3.1 Finance and Insurance
    1.3.2 Health Sector
    1.3.3 Manufacturing, Retail, Transport
    1.3.4 Government, Public, Non-profit
    1.3.5 Telco, Media, Entertainment
  1.4. Roadmap
    1.4.1 Requirements
    1.4.2 Emerging Trends
  1.5. Conclusion and Next Steps
  1.6. References
  1.7. Useful Links
2. Data Analysis
  2.1. Introduction
    2.1.1 Objectives
    2.1.2 Intended audience
  2.2. The Process
    2.2.1 Interviewees
    2.2.2 Target Interviewees
    2.2.3 The Interview Questions
  2.3. State of the art
    2.3.1 Use of Linked Data and Semantic Approaches to Big Data Analysis
    2.3.2 Recommendations and personal data
    2.3.3 Stream data processing
    2.3.4 Large-scale Reasoning
  2.4. Research Claims, Requirements and Future Technological Capabilities for Big Data Analysis
    2.4.1 Simplicity
    2.4.2 Users
    2.4.3 Privacy
  2.5. Standardisation
    2.5.1 Tools context
  2.6. Highlighted Semantic Technologies
    2.6.1 Requested Technological Capabilities
    2.6.2 Requested Inference Capabilities


    2.6.3 Business Models
    2.6.4 Impact
  2.7. Summary
  2.8. Acknowledgements
  2.9. References
3. Data Curation
  3.1. Introduction
    3.1.1 Objectives
    3.1.2 Lifecycle Model
  3.2. State of the Art
    3.2.1 Outline
    3.2.2 Data Selection Criteria
    3.2.3 Data Quality Dimensions
    3.2.4 Data Curation Roles
    3.2.5 Core Technological/Infrastructure Dimensions
  3.3. Sectors Case Studies for Big Data Curation
    3.3.1 Health and Life Sciences
    3.3.2 Telco, Media, Entertainment
    3.3.3 Summary
    3.3.4 Case Studies: Future Work
  3.4. Data Curation Road-map
    3.4.1 Future Requirement of Big Data Curation
    3.4.2 Emerging Paradigms
    3.4.3 Summary
  3.5. References
4. Data Storage
  4.1. Introduction
    4.1.1 Data Storage Working Group
    4.1.2 Objectives
  4.2. State of the Art
    4.2.1 Background
    4.2.2 Database landscape
    4.2.3 Applicability of different data store technologies
    4.2.4 Current Trends
    4.2.5 Standardization
    4.2.6 Integration
  4.3. Future Trends
    4.3.1 Data growth
    4.3.2 Storage Capacity
    4.3.3 Computer performance
    4.3.4 Energy consumption
    4.3.5 Efficient Data
  4.4. Challenges for Security and Privacy
    4.4.1 Security Challenges for Non-Relational Data Stores


    4.4.2 Security Challenges in Multi-Tiered Data Storage
    4.4.3 Challenges in Cryptographically Enforced Access Control and Secure Communication
    4.4.4 Security and Privacy Challenges of Granular Access Control
    4.4.5 Challenges in Data Provenance
    4.4.6 Technical and Operational Recommendation
    4.4.7 Privacy Problems in Big Data storage
    4.4.8 Conclusions
  4.5. Summary
  4.6. Abbreviations and acronyms
  4.7. References
5. Data Usage
  5.1. Introduction
  5.2. State of the Art
  5.3. A Case Study: Dremel and the YouTube Data Warehouse
    5.3.1 The YTDW infrastructure
    5.3.2 Sawzall
    5.3.3 Tenzing
    5.3.4 Dremel and ABI
    5.3.5 Current Developments
    5.3.6 Big Data Lessons
  5.4. Sectorial Requirements
  5.5. Data Usage in the Roadmap
    5.5.1 Target Markets for Big Data
    5.5.2 Future Requirements
    5.5.3 Emerging Trends
    5.5.4 Data Usage in an Integrated and Service-Based Environment
  5.6. Conclusion and Next Steps
  5.7. References
  5.8. Abbreviations and Acronyms


Index of Figures

Figure 0-1: The BIG Project Structure
Figure 1-1: Oracle's Big Data Processing Pipeline
Figure 1-2: The Velocity architecture (Vivisimo, 2012)
Figure 1-3: IBM Big Data Architecture
Figure 1-4: AMQP message structure (Schneider, 2013)
Figure 1-5: Java Message Service Overview
Figure 1-6: Memcached's shared cache approach
Figure 1-7: Tools in the Big Data Acquisition workflow
Figure 1-8: Kafka deployment at LinkedIn
Figure 1-9: Schematic showing logical components in a flow. The arrows represent the direction in which events travel across the system.
Figure 1-10: Architecture of the Storm framework
Figure 1-11: A Topology in Storm. The dots in a node represent the concurrent tasks of the spout/bolt.
Figure 1-12: Architecture of an S4 processing node
Figure 1-13: Architecture of a Hadoop multi-node cluster
Figure 2-1: The Main BIG Activities
Figure 2-2: The Sector and Technical Working Groups in BIG
Figure 2-3: A screen snapshot of the FlashMeeting online meeting interface
Figure 2-4: A screen snapshot of the standard FlashMeeting replay interface showing an interview with Hjalmar Gislason, founder of DataMarket Inc.
Figure 2-5: A screen snapshot of the simplified FlashMeeting interface with a navigation bar for the interview with Hjalmar Gislason of DataMarket Inc.
Figure 2-6: A screen snapshot of the simplest FlashMeeting interface where the navigation bar is hidden under the menu in the lower right. The snapshot shows the interview with Hjalmar Gislason of DataMarket Inc.
Figure 2-7: A screen snapshot of the front page of the BIG Project Evidence Hub
Figure 2-8: A screen snapshot where we can see Ricardo Baeza-Yates introducing himself within the Evidence Hub Map Display
Figure 2-9: A screen snapshot showing part of the argumentation structure for the issue "Communities and Big Data will have new interesting relationships"
Figure 2-10: A screen snapshot of the automatic audio transcription and analysis tool
Figure 3-1: The DCC Curation Lifecycle Model (taken from a DCC publication)
Figure 3-2: The SURF foundation Curation Lifecycle Model (taken from SURF Foundation)
Figure 3-3: Data curation elements
Figure 3-4: RSC profile of a curator with awards attributed based on his/her contributions
Figure 3-5: PA Content and Metadata Pattern Workflow
Figure 3-6: A typical data curation process at Thomson Reuters
Figure 3-7: The NYT article classification curation workflow
Figure 3-8: The continuum of control and participation
Figure 3-9: Emerging paradigms highlighted in DCC Curation Lifecycle Model
Figure 4-1: Data complexity and data size scalability of NoSQL categories. Source: Neo4j, July 2010
Figure 4-2: Database landscape. Source: 451 Group, "MySQL vs. NoSQL and NewSQL: 2011-2015", 2012
Figure 4-3: General purpose RDBMS processing profile. Source: Harizopoulos, Stavros, et al., "OLTP through the looking glass, and what we found there", ACM SIGMOD 2008
Figure 4-4: Data growth between 2009 and 2020. Source: IDC's Digital Universe Study, sponsored by EMC, December 2012
Figure 4-5: Useful Big Data sources. Source: IDC's Digital Universe Study, sponsored by EMC, December 2012
Figure 4-6: Data percentage being analysed. Source: IDC's Digital Universe Study, sponsored by EMC, December 2012
Figure 4-7: The emerging gap
Figure 4-8: Cost of storage. Source: IDC's Digital Universe Study, sponsored by EMC, December 2012
Figure 4-9: Memory capacity trends. Source: ITRS, 2011
Figure 4-10: Micro-processing functions per chip prediction by ITRS. Source: ITRS, 2011


Figure 4-11: Costs of power and cooling. Source: IDC's Digital Universe Study, sponsored by EMC, March 2008
Figure 5-1: Big Data Technology Stacks for Data Access (source: TU Berlin, FG DIMA 2013)
Figure 5-2: The YouTube Data Warehouse (YTDW) infrastructure
Figure 5-3: Dimensions of Integration in Industry 4.0. From: GE, Industrial Internet, 2012
Figure 5-4: Big Data in the context of an extended service infrastructure. W. Wahlster, 2013


Index of Tables

Table 2-1: Big Data Analysis Interviewees
Table 2-2: Potential Big Data Analysis Interviewees
Table 3-1: Data features associated with the curated data
Table 3-2: Critical data quality dimensions for existing data curation projects
Table 3-3: Existing data curation roles and their coverage on existing projects
Table 3-4: Technological infrastructure dimensions
Table 3-5: Summary of sector case studies
Table 3-6: Summary of emerging data curation systems
Table 5-1: Comparison of Data Usage Technologies used in YTDW


1. Data Acquisition

1.1. Introduction

In this chapter, we present the state of the art in data acquisition for Big Data. Over the last few years, the term Big Data has been used by different major players to label data with different attributes, and different data processing architectures have been proposed to address its characteristics. Overall, data acquisition has so far been understood as the process of gathering, filtering and cleaning data (e.g., removing irrelevant data) before the data is put into a data warehouse or any other scalable storage solution on which data analysis can be carried out. The acquisition of Big Data is most commonly governed by four of the at least seven Vs that can be found in the literature (Eaton et al., 2012; Zikopoulos et al., 2012): Volume, Velocity, Variety and Value. In particular, most data acquisition scenarios assume high-volume, high-velocity, high-variety but low-value data, making it important to have adaptable and time-efficient gathering, filtering and cleaning algorithms that ensure that only the high-value fragments of the data are actually processed by the data warehouse analysis. However, for some organizations whose business relies on providing other organizations with the right data, most data is of potentially high value, as it can be important for new customers. For such organizations, data analysis, classification and packaging on very high data volumes play the most central role after data acquisition, which makes the volume dimension crucial for these companies.

The detailed goals of this section of the white paper are threefold: first, we aim to present open state-of-the-art frameworks and protocols for Big Data acquisition as used in industry; second, we unveil the current approaches used for data acquisition in the different sector forums of the project; finally, we present current requirements for data acquisition as well as possible future developments in this area. The results presented herein are closely intertwined with the results on data storage and data analysis presented in the subsequent chapters.

The structure of this chapter is thus as follows: we begin by presenting the different understandings of Big Data acquisition and giving a working definition of data acquisition for this document. To this end, we present some of the main architectures which underlie Big Data processing. The working definition that we adopt governs the choice of the two main technological components which play a role in Big Data acquisition, i.e., data acquisition protocols and data acquisition frameworks. We present the currently most prominent representatives of each of the two categories. Finally, we give an overview of the current state of data acquisition in each of the sector forums and present some of the challenges that data acquisition practitioners currently face.

1.2. State of the Art

1.2.1 Definition

The Big Data processing pipeline has been abstracted in manifold ways in previous works. Oracle (Oracle, 2012) relies on a three-step approach to data processing. In the first step, the content of different data sources is retrieved and stored within a scalable storage solution such as a NoSQL database or the Hadoop Distributed File System (HDFS). The stored data is subsequently processed by first being reorganized and stored in SQL-capable Big Data analytics software and finally analysed using Big Data analytics algorithms. The reference architecture is shown in Figure 1-1.


Figure 1-1: Oracle's Big Data Processing Pipeline

Velocity1 (Vivisimo, 2012) relies on a different view of Big Data (see Figure 1-2). Here, the approach is more search-oriented. The basic component of the architecture is thus a connector layer, in which different data sources can be addressed. The content of these data sources is gathered in parallel, converted and finally added to an index, which builds the basis for data analytics, business intelligence and all other data-driven applications. Other big players such as IBM rely on architectures similar to Oracle's (IBM, 2013; see Figure 1-3). Throughout the different architectures for Big Data processing, the core of data acquisition boils down to gathering data from distributed information sources with the aim of storing it in a Big Data-capable data storage unit. To achieve this goal, three main components are required:

• Protocols that allow gathering information from distributed data sources of any type (unstructured, semi-structured, structured),
• Frameworks with which the data collected from the distributed sources by using these different protocols can be processed, and
• Technologies that allow the persistent storage of the data retrieved by the frameworks.

In the following, we will focus on the first two technologies. The third is covered in the chapter on data storage for Big Data.

1.2.2 Technologies

The bulk of Big Data acquisition is carried out within the message queuing paradigm (sometimes also called the streaming paradigm). Here, the basic assumption is that manifold volatile data sources generate information that needs to be captured, stored and analysed by a Big Data processing platform. The new information generated by a data source is forwarded to the data storage by means of a data acquisition framework, which implements a predefined protocol. In the following, we describe these two core technologies for acquiring Big Data.

1.2.2.1 Protocols

Several of the organizations that rely internally on Big Data processing have devised enterprise-specific protocols, most of which have not been publicly released and can thus not be described in this chapter. In this section, we therefore present commonly used open protocols for data acquisition.

1 Velocity is now part of IBM, see http://www-01.ibm.com/software/data/information-optimization/


Figure 1-2: The Velocity architecture (Vivisimo, 2012)

Figure 1-3: IBM Big Data Architecture


1.2.2.1.1. AMQP

The reason for the development of AMQP2 was the need for an open protocol that would satisfy the requirements of large companies with respect to data acquisition. To achieve this goal, 23 companies including Bank of America, Barclays, Cisco Systems, Credit Suisse, Deutsche Börse Systems, Goldman Sachs, HCL Technologies Ltd, Progress Software, IIT Software, INETCO Systems, Informatica Corporation (incl. 29West), JPMorgan Chase Bank Inc. N.A, Microsoft Corporation, my-Channels, Novell, Red Hat, Inc., Software AG, Solace Systems, StormMQ, Tervela Inc., TWIST Process Innovations ltd, VMware (incl. Rabbit Technologies) and WSO2 compiled a set of requirements for a data acquisition protocol. The result, AMQP (Advanced Message Queuing Protocol), became an OASIS standard in October 2012.

Figure 1-4: AMQP message structure (Schneider, 2013)

The rationale behind AMQP (Bank of America et al., 2011) was to provide a protocol with the following characteristics:

• Ubiquity: This property refers to AMQP's ability to be used across different industries within both current and future data acquisition architectures. AMQP's ubiquity was achieved by making it easily extensible and simple to implement. The large number of frameworks that implement it, including SwiftMQ, Microsoft Windows Azure Service Bus, Apache Qpid and Apache ActiveMQ, reflects how easy the protocol is to implement.
• Safety: The safety property was implemented across two different dimensions. First, the protocol allows the integration of message encryption to ensure that even intercepted messages cannot be decoded easily; thus, it can be used to transfer business-critical information. Moreover, the protocol is robust against the injection of spam, making AMQP brokers difficult to attack. Second, AMQP ensures the durability of messages, meaning that it allows messages to be transferred even when the sender and the receiver are not online at the same time.
• Fidelity: This third characteristic is concerned with the integrity of the message. AMQP includes means to ensure that the sender can state the semantics of the message and thus allow the receiver to understand what it is receiving. In particular, the protocol implements reliable failure semantics that allow systems to detect errors from the creation of the message at the sender's end until the storage of the information by the receiver.
• Applicability: The intention behind this property is to ensure that AMQP clients and brokers can communicate by using several of the protocols of the OSI model layers, such as TCP, UDP but also SCTP. By these means, AMQP is applicable in manifold scenarios and industries where not all the protocols of the OSI model layers are required and used. Moreover, the protocol was designed to support manifold messaging patterns, including direct messaging, request/reply, publish/subscribe, etc.

2 http://www.amqp.org/


• Interoperability: The protocol was designed to be independent of particular implementations and vendors. Thus, clients and brokers with fully independent implementations, architectures and ownership can interact by means of AMQP. As stated above, several frameworks from different organizations now implement the protocol.
• Manageability: One of the main concerns during the specification of AMQP was to ensure that frameworks implementing it could scale easily. This was achieved by making AMQP a fault-tolerant and lossless wire protocol through which information of any kind (e.g., XML, audio, video) can be transferred.

To implement these requirements, AMQP relies on a type system and four different layers: a transport layer, a messaging layer, a transaction layer and a security layer:

• The type system is based on primitive types known from databases (integers, strings, symbols, etc.), described types as known from programming languages, and descriptor values that can be extended by the users of the protocol. In addition, AMQP allows the use of different encodings to store symbols and values, as well as the definition of compound types that consist of combinations of several primary types.

• The transport layer defines how AMQP messages are to be processed. An AMQP network consists of nodes that are connected via links. Messages can originate from nodes (senders), be forwarded by nodes (relays) or be consumed by nodes (receivers). Messages are only allowed to travel across a link when the link abides by the criteria defined by the source of the message. The transport layer supports several types of routing exchanges, including message fan-out and topic exchange.

• The messaging layer (see Figure 1-4) of AMQP describes what valid messages look like. A bare message is a message as submitted by the sender to an AMQP network. According to the AMQP specification, a valid AMQP message consists of the following parts:

  - At most one header.
  - At most one delivery-annotation.
  - At most one message-annotation.
  - At most one property.
  - At most one application-property.
  - The body, consisting of either one or more data sections, one or more AMQP-sequence sections or a single AMQP-value section.
  - At most one footer.

• The transaction layer allows for the "coordinated outcome of otherwise independent transfers" (Bank of America et al., 2011, p. 95). The basic idea behind the transactional messaging approach followed by this layer is that the sender of a message acts as the controller while the receiver acts as a resource and transfers the message as specified by the controller. By these means, decentralised and scalable message processing can be achieved.
• The final AMQP layer is the security layer, which allows defining means to encrypt the content of AMQP messages. The protocols for achieving this goal are supposed to be defined externally to AMQP itself; protocols that can be used to this end include TLS and SASL.

Due to its adoption across several industries and its high flexibility, AMQP has a good chance of becoming the standard approach for message processing in industries that cannot afford to implement their own dedicated protocols. With the upcoming data-as-a-service industry, it also promises to be the go-to solution for implementing services around data streams.


One of the most commonly used AMQP brokers is RabbitMQ3, whose popularity is largely due to the fact that it implements several messaging protocols, including JMS.
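To make the messaging workflow concrete, the following minimal sketch publishes a single message to a RabbitMQ broker using its Java client. Note that this client speaks AMQP 0-9-1, the pre-1.0 dialect RabbitMQ is built on; the broker address and the queue name "acquisition" are illustrative assumptions, not part of any standard.

    import java.nio.charset.StandardCharsets;

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import com.rabbitmq.client.MessageProperties;

    // Publishes a single sensor reading to a durable queue on a local RabbitMQ broker.
    public class AmqpPublishExample {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost");              // hypothetical broker location

            Connection connection = factory.newConnection();
            Channel channel = connection.createChannel();

            // Durable queue: survives broker restarts.
            channel.queueDeclare("acquisition", true, false, false, null);

            String payload = "{\"sensor\":\"s1\",\"value\":42}";
            // PERSISTENT_TEXT_PLAIN marks the message itself as persistent.
            channel.basicPublish("", "acquisition",
                    MessageProperties.PERSISTENT_TEXT_PLAIN,
                    payload.getBytes(StandardCharsets.UTF_8));

            channel.close();
            connection.close();
        }
    }

Declaring the queue as durable and marking the message as persistent corresponds to the durability aspect of AMQP's safety property discussed above.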

1.2.2.1.2. JMS

The Java Message Service (JMS) API was included in the Java 2 Enterprise Edition on March 18, 2002, after the Java Community Process ratified version 1.1 as a standard. According to the 1.1 specification4, JMS "provides a common way for Java programs to create, send, receive and read an enterprise messaging system's messages". Administrative tools allow destinations and connection factories to be bound into a Java Naming and Directory Interface (JNDI) namespace. A JMS client can then use resource injection to access the administered objects in the namespace and establish a logical connection to the same objects through the JMS provider5:

Figure 1-5: Java Message Service Overview

The JNDI serves in this case as the moderator between different clients who want to exchange messages. Note that we use the term "client" here (as does the specification) to denote the sender as well as the receiver of a message, because JMS was originally designed to exchange messages peer-to-peer. Currently, JMS offers two messaging models: point-to-point and publisher-subscriber, where the latter is a one-to-many connection. AMQP is compatible with JMS6, which is the de-facto standard for message passing in the Java world. While AMQP is defined at the format level (i.e., as a byte stream of octets), JMS is standardized at the API level and is therefore not easy to implement in other programming languages (as the "J" in "JMS" suggests). JMS also does not provide functionality for load balancing/fault tolerance, error/advisory notification, administration of services, security, a wire protocol or a message type repository (database access). The definite benefit of AMQP over JMS is, however, the programming-language independence of the implementation, which avoids vendor lock-in and platform compatibility issues. JMS 2.0 is under active development, with completion scheduled for 2013.
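As a comparison at the API level, the following sketch sends one message through the JMS 1.1 API. The JNDI names ("jms/ConnectionFactory", "jms/AcquisitionQueue") are placeholders that depend entirely on how the chosen JMS provider has been administered.

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.Destination;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import javax.naming.InitialContext;

    // Sends a single text message using the JMS 1.1 API.
    public class JmsSendExample {
        public static void main(String[] args) throws Exception {
            InitialContext ctx = new InitialContext();
            ConnectionFactory factory = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");
            Destination queue = (Destination) ctx.lookup("jms/AcquisitionQueue");

            Connection connection = factory.createConnection();
            try {
                // Non-transacted session with automatic acknowledgement.
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer = session.createProducer(queue);
                TextMessage message = session.createTextMessage("sensor s1: 42");
                producer.send(message);
            } finally {
                connection.close();
            }
        }
    }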

1.2.2.1.3. MemcacheQ

MemcacheQ is a simple queuing system based on MemcacheDB, a persistent storage system that implements the Memcached protocol. The basic incentive behind MemcacheDB was the need for a time-efficient persistent storage solution that allows serving thousands of clients without the overhead of a database management system, leading to faster response times at the cost of less expressive queries (Chu et al., 2008). The Memcached protocol7 is a memory caching system used by many sites to decrease database load and hence speed up dynamic websites that work on top of databases. It is currently used by many high-traffic sites, including LiveJournal, Wikipedia, Flickr, Twitter (Kestrel), YouTube and WordPress.com.

3 http://www.rabbitmq.com/ 4 http://download.oracle.com/otndocs/jcp/7195-jms-1.1-fr-spec-oth-JSpec 5 http://docs.oracle.com/javaee/5/tutorial/doc/figures/jms-architecture.gif 6 http://www.wmrichards.com/amqp.pdf 7 http://memcached.org/


The approach works by caching small chunks of arbitrary data (strings, objects) in memory in order to reduce the number of times entries must be read from persistent storage. This is achieved with a simple yet very large key/value data structure stored in memory. The key/value structure is based on hashing, i.e., the process of converting large data into a small integer that can serve as an index into an array, which speeds up table lookups and data comparisons (MySQL, 2008). A typical Memcached installation consists of the following components:

• A client, which is given a list of available Memcached servers.
• A client-based hashing algorithm, which chooses a server based on the "key" input.
• A server, which stores values with their keys in an internal hash table, and
• Server algorithms, which determine when to throw out old data (if memory runs out) or reuse memory.

These components enable Memcached to make better use of the system memory: the system can take memory from parts where it has more than it needs and make it accessible to areas where it has less than it needs, as indicated in Figure 1-6. Thus, MemcacheQ scales with the total memory of the server network on which it is installed. MemcacheQ's popularity is due to its simple design, which makes it easy to deploy. Furthermore, its API is available for many popular programming languages.

Figure 1-6: Memcached’s shared cache approach8
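The cache-aside usage pattern described above can be illustrated with the third-party spymemcached Java client (one of several available bindings for the Memcached protocol); the key naming and the database helper are purely illustrative.

    import java.net.InetSocketAddress;
    import net.spy.memcached.MemcachedClient;

    // Cache-aside pattern against a local memcached server.
    public class MemcachedExample {
        public static void main(String[] args) throws Exception {
            MemcachedClient cache = new MemcachedClient(new InetSocketAddress("localhost", 11211));

            String key = "user:42:profile";
            Object profile = cache.get(key);           // fast in-memory lookup
            if (profile == null) {
                profile = loadProfileFromDatabase(42); // expensive query, only on a cache miss
                cache.set(key, 3600, profile);         // keep the value for one hour
            }
            System.out.println(profile);
            cache.shutdown();
        }

        // Stand-in for the expensive persistent-storage access the cache is meant to avoid.
        private static Object loadProfileFromDatabase(int userId) {
            return "profile-of-user-" + userId;
        }
    }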

1.2.2.2 Software Tools

Software tools for data acquisition are broadly distributed across manifold applications and use cases. Still, the correct use of each tool requires deep knowledge of its internal workings and implementation. Different paradigms of data acquisition have appeared depending on the scope on which these tools focus. Figure 1-7 shows an overall picture of the Big Data workflow with a focus on data acquisition tools and frameworks.

8 http://www.memcached.org


Figure 1-7: Tools in the Big Data Acquisition workflow

1.2.2.2.1. Kafka

Figure 1-8: Kafka deployment at LinkedIn

Kafka9 is a distributed publish-subscribe messaging system designed mainly to support persistent messaging with high throughput. Kafka aims to unify offline and online processing by providing a mechanism for parallel load into Hadoop as well as the ability to partition real-time consumption over a cluster of machines. Its use for activity stream processing makes Kafka comparable to Apache Flume, though the architecture and primitives of the two systems are very different and make Kafka more comparable to a traditional messaging system. Kafka was originally developed at LinkedIn10 for tracking the huge volume of activity events generated by the website. These activity events are critical for monitoring user engagement as well as for improving relevancy in LinkedIn's data-driven products.

9 http://engineering.linkedin.com/tags/kafka 10 http://linkedin.com


Figure 1-8 gives a simplified view of the deployment topology at LinkedIn. Note that a single Kafka cluster handles all activity data from all the different sources. This provides a single pipeline of data for both online and offline consumers; this tier acts as a buffer between live activity and asynchronous processing. Kafka can also be used to replicate the data from a given data source to a different data centre for offline consumption. Moreover, Kafka can feed Hadoop for offline analytics, as well as provide means to track internal operational metrics that feed graphs in real time. In this context, a very appropriate use of Kafka and its publish-subscribe mechanism is the processing of stream data, from tracking user actions on large-scale web sites to relevance and ranking.
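A minimal producer illustrates how activity events of the kind described above reach a Kafka cluster. The sketch uses the newer Java producer API (org.apache.kafka.clients), which postdates the LinkedIn-internal deployment discussed here; the broker address and topic name are assumptions.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    // Publishes one user-activity event to a Kafka topic.
    public class KafkaActivityProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            Producer<String, String> producer = new KafkaProducer<>(props);
            try {
                // The key (here the user id) determines the partition, so events of the
                // same user stay ordered within one partition.
                producer.send(new ProducerRecord<>("user-activity", "user-42", "clicked:/products/123"));
            } finally {
                producer.close();
            }
        }
    }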

1.2.2.2.2. Flume

Flume is a service for efficiently collecting and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault-tolerant, with tuneable reliability mechanisms and many failover and recovery mechanisms, and it uses a simple extensible data model that allows for online analytic applications. The system was designed with four key goals in mind: reliability, scalability, manageability and extensibility. The purpose of Flume is to provide a distributed, reliable and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store. The architecture of Flume NG is based on a few concepts that together help achieve this objective:

• Event: A byte payload with optional string headers that represents the unit of data Flume can transport from its point of origination to its final destination.
• Flow: The movement of events from the point of origin to their final destination is considered a data flow, or simply a flow.
• Client: An interface implementation that operates at the point of origin of events and delivers them to a Flume agent.
• Agent: An independent process that hosts Flume components such as sources, channels and sinks, and thus has the ability to receive, store and forward events to their next-hop destination.
• Source: An interface implementation that can consume events delivered to it via a specific mechanism.
• Channel: A transient store for events, where events are delivered to the channel via sources operating within the agent. An event put in a channel stays in that channel until a sink removes it for further transport.
• Sink: An interface implementation that can remove events from a channel and transmit them to the next agent in the flow, or to the event's final destination.

These concepts help in simplifying the architecture, implementation, configuration and deployment of Flume. A flow in Flume NG starts from the client. The client transmits the event to its next-hop destination. This destination is an agent; more precisely, the destination is a source operating within the agent. The source receiving this event then delivers it to one or more channels. One or more sinks operating within the same agent drain the channels that received the event. If the sink is a regular sink, it forwards the event to its next-hop destination, which will be another agent; if instead it is a terminal sink, it forwards the event to its final destination. Channels allow the decoupling of sources from sinks using the familiar producer-consumer model of data exchange. This allows sources and sinks to have different performance and runtime characteristics and yet be able to effectively use the physical resources available to the system. Figure 1-9 shows how the various components interact with each other within a flow pipeline.


Figure 1-9: Schematic showing logical components in a flow. The arrows represent the direction in which events travel across the system.

The primary use case for Flume is as a logging system that gathers a set of log files on every machine in a cluster and aggregates them to a centralized persistent store such as the Hadoop Distributed File System (HDFS). Flume can also be used, for instance, as an HTTP event manager that deals with different types of requests and routes each of them to a specific data store during the data acquisition process, such as NoSQL databases like HBase. Consequently, Apache Flume is not pure data acquisition software but can successfully complement such systems by managing the different data types acquired and moving them to specific data stores or repositories.
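The client/agent interaction can be sketched with Flume NG's RPC client API: an application acts as the Flume client and hands events to an agent whose Avro source is assumed to listen on localhost:41414 (both the address and the log line are illustrative).

    import java.nio.charset.StandardCharsets;

    import org.apache.flume.Event;
    import org.apache.flume.api.RpcClient;
    import org.apache.flume.api.RpcClientFactory;
    import org.apache.flume.event.EventBuilder;

    // Acts as a Flume "client": builds an event and hands it to an agent.
    public class FlumeClientExample {
        public static void main(String[] args) throws Exception {
            RpcClient client = RpcClientFactory.getDefaultInstance("localhost", 41414);
            try {
                Event event = EventBuilder.withBody("app-server-1 GET /index.html 200",
                        StandardCharsets.UTF_8);
                client.append(event);   // delivered to the agent's source, then channel, then sink
            } finally {
                client.close();
            }
        }
    }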

1.2.2.2.3. Storm

Storm is an open-source framework for robust distributed real-time computation on streams of data. It started off as an open-source project11, now has a large and active community, and supports a wide range of programming languages and storage facilities (relational databases, NoSQL stores, etc.). One of the main advantages of Storm is that it can be utilized in manifold data gathering scenarios, including stream processing (processing streams of data, updating and emitting changes and forgetting the messages afterwards), distributed RPC for solving computationally intensive functions on-the-fly, and continuous computation applications (Grant, 2012). Many companies and applications use Storm to power a wide variety of production systems processing data, among others12 Groupon, The Weather Channel, fullcontact.com and Twitter. The logical network of Storm consists of three types of nodes: a master node called Nimbus, a set of intermediate Zookeeper nodes and a set of Supervisor nodes.

• The Nimbus is equivalent to Hadoop's JobTracker: it uploads the computation for execution, distributes code across the cluster and monitors the computation.
• The Zookeepers handle the complete cluster coordination. This cluster organization layer is based upon the Apache ZooKeeper project.
• The Supervisor daemon spawns workers and is comparable to Hadoop's TaskTracker. This is where most of the work of application developers goes. The worker nodes communicate with the Nimbus via the Zookeepers to determine what to run on the machine, starting and stopping workers.

11 https://github.com/nathanmarz/storm/ 12 A complete list can be found at https://github.com/nathanmarz/storm/wiki/Powered-By


Figure 1-10: Architecture of the Storm framework13.

A computation is called a Topology in Storm. Once the user has specified the job, Storm is invoked by a command-line client that submits the custom code, packed for example in a JAR file. Storm uploads the JAR to the cluster and tells the Nimbus to start the Topology. Once deployed, topologies can run indefinitely. There are four concepts and abstraction layers within Storm:

• Streams: Unbounded sequences of tuples, which are named lists of values. Values can be arbitrary objects implementing a serialization interface.
• Spouts: Sources of streams in a computation, e.g., reading from the Twitter APIs.
• Bolts: Process any number of input streams and produce any number of output streams. This is where most of the application logic goes.
• Topologies: The top-level abstraction of Storm. Basically, a topology is a network of spouts and bolts connected by edges, where every edge is a bolt subscribing to the stream of a spout or of another bolt.

Figure 1-11: A Topology in Storm. The dots in a node represent the concurrent tasks of the spout/bolt.

Both spouts and bolts are stateless nodes and inherently parallel, executing as many tasks across the cluster, whereby tasks communicate directly with one another. From a physical point of view, a worker is a JVM process with a number of tasks running within it. Both spouts and bolts are distributed over a number of tasks and workers. Storm supports a number of stream grouping approaches, ranging from random grouping of tuples to tasks, to field grouping, where tuples with the same values of specific fields are routed to the same tasks (Madsen, 2012). Storm uses a pull model; each bolt pulls events from its source. Tuples traverse the entire network within a specified time window or are considered failed. Therefore, in terms of recovery, the spouts are responsible for keeping tuples ready for replay.
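The following sketch wires a minimal word-count topology and runs it in local mode. It uses the backtype.storm package names of the pre-Apache releases current at the time of writing (newer releases use org.apache.storm) and the TestWordSpout shipped with Storm's testing utilities, which is assumed to emit tuples on a field named "word".

    import java.util.HashMap;
    import java.util.Map;

    import backtype.storm.Config;
    import backtype.storm.LocalCluster;
    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.testing.TestWordSpout;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.utils.Utils;

    // A minimal word-count topology: a test spout emits random words, a bolt counts them.
    public class WordCountTopology {

        // Bolt that keeps a running count per word; the state lives inside the task.
        public static class CountBolt extends BaseRichBolt {
            private final Map<String, Integer> counts = new HashMap<String, Integer>();
            private OutputCollector collector;

            @Override
            public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
                this.collector = collector;
            }

            @Override
            public void execute(Tuple input) {
                String word = input.getStringByField("word");
                Integer count = counts.get(word);
                counts.put(word, count == null ? 1 : count + 1);
                collector.ack(input);    // acknowledge so the spout does not replay the tuple
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                // Terminal bolt: emits no stream.
            }
        }

        public static void main(String[] args) {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("words", new TestWordSpout(), 2);      // 2 spout tasks
            builder.setBolt("counter", new CountBolt(), 4)          // 4 bolt tasks
                   .fieldsGrouping("words", new Fields("word"));    // same word -> same task

            LocalCluster cluster = new LocalCluster();              // in-process test cluster
            cluster.submitTopology("word-count", new Config(), builder.createTopology());
            Utils.sleep(10000);
            cluster.shutdown();
        }
    }

The fieldsGrouping call corresponds to the field grouping described above: all tuples carrying the same word are routed to the same bolt task, so each task can keep a consistent partial count.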

13 Figure taken from: Dan Lynn: "Storm: the Real-Time Layer Your Big Data's Been Missing", Slides @Gluecon 2012


1.2.2.2.4. Yahoo S4

S4 (Simply Scalable Streaming System) is a distributed, general-purpose platform for developing applications that process streams of data. Started in 2008 by Yahoo! Inc., it has been an Apache Incubator project since 2011. S4 is designed to work on commodity hardware, avoiding I/O bottlenecks by relying on an all-in-memory approach (Neumeyer, 2011). In general, keyed data events are routed to Processing Elements (PEs). A PE receives events and either emits resulting events and/or publishes results. The S4 engine was inspired by the MapReduce model and resembles the Actors model (encapsulation semantics and location transparency). Among other things, it provides a simple programming interface for processing data streams in a decentralized, symmetric and pluggable architecture. A stream in S4 is a sequence of elements (events) composed of both tuple-valued keys and attributes. A basic computational unit, the PE, is identified by the following four components:

• its functionality, provided by the PE class and associated configuration,
• the event types it consumes,
• the keyed attribute in these events, and
• the value of the keyed attribute of the events it consumes.

A PE is instantiated by the platform for each value of the key attribute. Keyless PEs are a special class of PEs with no keyed attribute and value. These PEs consume all events of the corresponding type and are typically found at the input layer of an S4 cluster. A large number of standard PEs is available for tasks such as aggregation and joins. The logical hosts of PEs are the Processing Nodes (PNs). PNs listen to events, execute operations on incoming events, and dispatch events with the assistance of the communication layer. S4 routes each event to PNs based on a hash function over all known values of the keyed attribute in the event. There is another special type of PE object: the PE prototype, which is identified by the first three components only. These objects are configured upon initialization and, for any value of the keyed attribute, can clone themselves to create a fully qualified PE. This cloning is triggered by the PN for each unique value of the keyed attribute. An S4 application is a graph composed of PE prototypes and streams that produce, consume and transmit messages, whereas PE instances are clones of the corresponding prototypes that contain the state and are associated with unique keys (Neumeyer et al., 2011).

Figure 1-12: Architecture of an S4 processing node.14

As a consequence of this design, S4 guarantees that all events with a specific value of the keyed attribute arrive at the corresponding PN and, within it, are routed to the specific PE instance (Bradic, 2011).

14 Taken from Bradic, Aleksandar: “S4: Distributed Stream Computing Platform”, at Software Scalability Meetup, Belgrad 2011: http://de.slideshare.net/alekbr/s4-stream-computing-platform


The current state of a PE is inaccessible to other PEs. S4 is based upon a push model: events are routed to the next PE as fast as possible. Therefore, if a receiver's buffer fills up, events may be dropped. Via lossy checkpointing, S4 provides state recovery: in case of a node crash, a new node takes over the task from the most recent snapshot, and events sent in between are lost. The communication layer is based upon the Apache ZooKeeper project; it manages the cluster and provides failover handling to stand-by nodes. PEs are built in Java using a fairly simple API and are assembled into the application using the Spring framework.

1.2.2.2.5. Amazon SQS

Amazon Simple Queue Service (Amazon SQS15) is a commoditization of messaging systems, exposing Amazon's messaging infrastructure as a web service. The service is intended to complement Amazon's SAAS (software as a service) suite, including Amazon S3 and Amazon EC2, with a simple paid messaging service, so that users do not require their own hardware and hosting for messaging. The content of each message is a 64 KB text blob and can be used arbitrarily by the customer to encode further features (e.g., by using XML, or by ordering a sequence of messages, as Amazon does not guarantee that messages are delivered in order). Larger messages can be split into multiple segments that are sent separately, or the message data can be stored using the Amazon Simple Storage Service (Amazon S3) or Amazon SimpleDB with just a pointer to the data transmitted in the SQS message. Security is provided by the Amazon Web Services (AWS) authentication. Amazon charges customers based on the received and delivered messages ($0.50 per 1 million messages), but allows arbitrary queues and infrastructure. When combined with the Amazon Simple Notification Service (SNS), developers can 'fan out' identical messages to multiple SQS queues in parallel. Each queue, however, has a retention parameter defaulting to 4 days and configurable from 1 minute up to 14 days; any message residing in the queue for longer is purged automatically. Customers receive 1 million Amazon SQS queuing requests for free each month. Clients like ElasticMQ16 implement the SQS interface. In 2012, Amazon introduced new features such as long polling, i.e., a poll request that waits 1 to 20 seconds for messages on an empty queue, as well as batch processing, which effectively cuts down cost leakage (no empty poll requests, and sending 10 messages as a batch for the price of one).
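A round trip through an SQS queue with the AWS SDK for Java (v1) might look as follows; credentials and region are assumed to come from the SDK's default provider chain, and the queue name is illustrative.

    import java.util.List;

    import com.amazonaws.services.sqs.AmazonSQS;
    import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
    import com.amazonaws.services.sqs.model.Message;

    // Create a queue, send one message, receive it and delete it.
    public class SqsExample {
        public static void main(String[] args) {
            AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();

            String queueUrl = sqs.createQueue("big-acquisition-demo").getQueueUrl();
            sqs.sendMessage(queueUrl, "sensor s1: 42");

            List<Message> messages = sqs.receiveMessage(queueUrl).getMessages();
            for (Message m : messages) {
                System.out.println("received: " + m.getBody());
                // A message must be deleted explicitly, otherwise it reappears
                // after the visibility timeout expires.
                sqs.deleteMessage(queueUrl, m.getReceiptHandle());
            }
        }
    }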

1.2.2.2.6. Scribe

The Scribe log server is a server for the aggregation of streamed log data in real time. It was developed by Facebook and released as open source in 2008. It has been designed to scale to a large number of nodes and to provide a robust and reliable service for storing log data. Scribe is Facebook's internal logging mechanism and is thus able to log tens of billions of messages per day piling up in a distributed network of data centres. The scribe server topology is arranged as a directed graph. Servers running scribe receive log messages from clients, aggregate them and send them to a central scribe server, or even to multiple central servers. This, together with a simple configuration, allows new layers to be added to the system as required, without clients having to explicitly understand the network topology or restart the whole system. If the central scribe server is not available, servers store the messages on local disk until the central server recovers. The central scribe server can write the messages to their final destination, in most cases a distributed file system or an NFS (Network File System), or push them to another layer of scribe servers. A scribe log message is a tuple of two strings: a high-level description of the intended destination of the log message, called the category, and a message. Scribe servers can be configured per category, thus allowing data stores to be moved without any changes on the client side by simply adjusting the scribe server configuration.

15 http://aws.amazon.com/sqs/ 16 https://github.com/adamw/elasticmq


Scribe is run from the command line, typically pointing to a configuration file. Flexibility and extensibility are guaranteed through the store abstraction. Stores are loaded dynamically through the configuration file and can thus be changed at runtime. The basic principle of a configuration is to handle/store messages based on their category. Some stores can contain other stores: a bucket store, for example, contains multiple stores and distributes messages to them based on hashing. Aside from the bucket store, the following stores are also available in scribe:

• File store: Writes to local files or NFS files.
• Thriftfile store: Similar to the file store, but writes to a Thrift TFileTransport file.
• Network store: Sends messages to another scribe server.
• Multi store: Forwards messages to multiple other stores.
• Buffer store: Contains a primary and a secondary store. Messages are sent to the primary store if possible, otherwise to a readable secondary store. When the primary store becomes available again, messages are sent to it from the secondary store, preserving their ordering.
• Null store: Discards all messages passed to it.

Though scribe is not designed to offer transactional guarantees, it is robust to failures in the network or on a specific machine. If scribe is not able to deliver a message to the central scribe server, it buffers the message on the local file system and tries to resend it once the network connection is re-established. As a concurrent resend from several scribe servers could easily overload the central server, the scribe instances hold back the resend for a randomly chosen period of time. Besides, if the central server reaches its maximum capacity, it sends the senders a try-later signal so that they do not attempt another send for several minutes. The central server is also able to buffer messages on its local disk if the file system is down. In both cases, the original order of the messages is preserved upon re-establishment. Nevertheless, message losses are still possible:

• If a client cannot connect to either the local or the central scribe server.
• Messages in memory but not yet on disk will be lost if a scribe server crashes.
• If a reconnection to the central server is not established before the local disk fills up.
• There are also some rare timeout conditions, which will result in duplicate messages.
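Since scribe exposes its logging interface via Thrift, a client can be sketched in Java as follows. The classes scribe.Client and LogEntry are assumed to have been generated from scribe's Thrift interface definition (scribe.thrift), and 1463 is assumed to be the server's listening port; category and message are illustrative.

    import java.util.Arrays;

    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TFramedTransport;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    // Sends one log message to a local scribe server over Thrift.
    public class ScribeClientExample {
        public static void main(String[] args) throws Exception {
            TTransport transport = new TFramedTransport(new TSocket("localhost", 1463));
            TBinaryProtocol protocol = new TBinaryProtocol(transport);
            scribe.Client client = new scribe.Client(protocol);   // generated Thrift client (assumption)

            transport.open();
            try {
                // A scribe message is a (category, message) pair; the category decides
                // which store configuration handles it on the server.
                LogEntry entry = new LogEntry("web_access", "GET /index.html 200");
                client.Log(Arrays.asList(entry));
            } finally {
                transport.close();
            }
        }
    }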

1.2.2.2.7. Hadoop

Apache Hadoop is an open-source project developing a framework for reliable, scalable and distributed computing on Big Data using clusters of commodity hardware. It was derived from Google's MapReduce and the Google File System (GFS) and is written in Java. Hadoop is used and supported by a large community. Moreover, it is used both in production and research environments by many organizations, most notably Facebook, a9.com, AOL, Baidu, IBM, ImageShack and Yahoo. The Hadoop project consists of four modules:

• Hadoop Common: common utilities used throughout Hadoop.
• Hadoop Distributed File System (HDFS): a highly available and efficient distributed file system.
• Hadoop YARN: a framework for job scheduling and cluster management.
• Hadoop MapReduce: a system for the parallel processing of large amounts of data.


Figure 1-13: Architecture of a Hadoop multi-node cluster17.

A Hadoop cluster is designed according to the master-slave principle. The master keeps track of the metadata about the file distribution, in which large files are typically split into chunks of 128 MB. These parts are copied three times and the replicas are distributed through the cluster of data nodes (slave nodes). This strategy ensures that in case of a node failure the information processed by the node is not lost: the master node is able to allocate the data again. To monitor the cluster, every slave node regularly sends a heartbeat to the master node; if a slave is not heard from for a longer period, it is considered dead. As the master node is a single point of failure, it is typically run on highly reliable hardware, and a secondary master node can keep track of changes in the metadata. With the help of this second master node, it is possible to rebuild the functionality of the name node and thereby ensure the functionality of the cluster.

Hadoop processes the data according to the MapReduce paradigm. MapReduce is a framework for parallel distributed computation. The job tracker distributes computation tasks, called jobs. Instead of moving the data to the calculation, Hadoop moves the calculation to the data. The job tracker functions as a master, distributing and administering jobs in the cluster, while task trackers do the actual work on the jobs. Typically, each cluster node runs a task tracker instance and a data node. The MapReduce framework eases the programming of highly distributed parallel programs: a programmer can focus on writing the simpler map() and reduce() functions dealing with the real problem, while the MapReduce infrastructure takes care of running and managing the tasks in the cluster, as sketched in the example below. In the orbit of the Hadoop project a number of related projects have emerged. The Apache Pig18 project, for instance, is built upon Hadoop and simplifies writing and maintaining Hadoop implementations, while the Apache HBase19 project aims to provide real-time access to Big Data.
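As a concrete illustration of this division of labour, the canonical word-count job is sketched below using the org.apache.hadoop.mapreduce API (Hadoop 2 style); input and output paths are supplied on the command line.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // The canonical word-count job: the mapper emits (word, 1) pairs, the reducer sums them.
    public class WordCount {

        public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer tokens = new StringTokenizer(value.toString());
                while (tokens.hasMoreTokens()) {
                    word.set(tokens.nextToken());
                    context.write(word, ONE);            // one record per word occurrence
                }
            }
        }

        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable value : values) {
                    sum += value.get();
                }
                context.write(key, new IntWritable(sum)); // total count per word
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);    // local pre-aggregation on the map side
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }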

1.3. Industrial Sector approach towards Data Acquisition

In the following, we give a brief overview of the problems faced by the different sectors when dealing with data acquisition. We then present a series of requirements that were derived from these insights.

1.3.1 Finance and Insurance

Big Data now plays a central role in the finance and insurance sector. Integrating large amounts of data from heterogeneous sources with business intelligence systems for analysis is the chief goal that motivates the acquisition of this data. Some of the major areas for acquiring data in this sector are exchange markets, investments, banking, and customer profiles and behaviour.

17 The figure was taken from http://www.computerwoche.de/i/detail/artikel/2507037/3/945897/d2e264-media/ 18 http://pig.apache.org/ 19 http://hbase.apache.org/


According to (Brown et al., 2012), "financial services has the most to gain from Big Data" in terms of ease of capturing data and value potential, and "financial players get the highest marks for value creation opportunities". Banks can add value by improving a number of products, e.g., customizing the user experience, improving targeting, adapting business models, reducing portfolio losses and capital costs, improving office efficiencies and creating new value propositions (Gordon, 2012). Some of the publicly available financial data is provided by international statistical agencies such as Eurostat, the World Bank, the European Central Bank, the International Monetary Fund, the International Finance Corporation and the Organisation for Economic Co-operation and Development. While these data sources are not as time-sensitive as exchange market data, they provide valuable complementary data that must be included in any data acquisition strategy.

1.3.2 Health Sector

The health care system currently has four major pools of large amounts of health data, each of which is held by a different stakeholder:

 Clinical data, which is provided by hospitals, care centres, physicians, etc., and encompasses any information stored within the classical hospital information systems or EHR, such as medical records, medical images, lab results, etc.
 Claims, cost and administrative data, which is provided by payers and encompasses any data sets relevant for reimbursement issues, such as utilization of care, cost estimates, claims, etc.
 Pharmaceutical and R&D data, which is provided by pharmaceutical companies, research labs/academia and government, and encompasses clinical trials, population and disease data, etc.
 Patient behaviour and sentiment data, which is provided by consumers or monitoring device producers and encompasses any information related to patient behaviours and preferences.

As each data pool is provided by different stakeholders, the data in the health domain is highly fragmented. The acquisition of data from these various and highly heterogeneous data sets is an important prerequisite of Big Data Health applications and requires the effective involvement and interplay of the various stakeholders. Currently, online platforms and communities like PatientsLikeMe.com or the personal health records offered by Google Health and Microsoft HealthVault allow consumers to create personal health profiles and goals, track treatments and prescriptions, and monitor their experiences as patients. In terms of data availability, the volume in the public sector is lower than in health care, but still very relevant; in contrast with the former, up to 90% of the data generated in the public sector is created in digital form, partly as a result of e-government initiatives undertaken during the past 15 years. The main problem in this case is access: in general, the data is not exposed in a way that allows third parties to make use of it and generate value. Additionally, Data.gov.uk in the UK and the Aporta Web portal in Spain (www.proyectoaporta.es) make government databases available to the public, and services like ExpectMore.gov and Dr. Foster Intelligence provide health care information to citizens.

1.3.3 Manufacturing, Retail, Transport

Big Data acquisition in the context of the manufacturing, retail and transportation domains is becoming increasingly important. As data processing costs decrease and storage capacities increase, data can now be gathered continuously. Manufacturing companies as well as retailers may, for example, monitor channels like Facebook, Twitter or news sites for any mentions20 and analyse those data later on (e.g. customer sentiment analysis). Retailers on the web are also collecting large amounts of data by storing log files and combining that growing amount of

20 http://www.manufacturingpulse.com/featured-stories/2013/04/01/what-big-data-means-for-manufacturers


information with other data sources, such as sales data, in order to analyse and predict customer behaviour. In the field of manufacturing, all participating devices are nowadays interconnected (e.g. sensors, RFID), such that vital information is constantly gathered in order to predict defective parts at an early stage. In the context of transportation, a variety of data sources is already available (e.g. http://publicdata.eu/dataset?groups=transport).

1.3.4 Government, Public, Non-profit

Integrating and analysing large amounts of data plays an increasingly important role in today's society. Often, however, new discoveries and insights can only be attained by integrating information from dispersed sources. Despite recent advances in structured data publishing on the Web (such as RDFa and the schema.org initiative), the question arises of how larger datasets can be published and described in order to make them easily discoverable and to facilitate their integration and analysis. One approach for addressing this problem is data portals, which enable organizations to upload and describe datasets using comprehensive metadata schemes. Similar to digital libraries, networks of such data catalogues can support the description, archiving and discovery of datasets on the Web. Recently, we have seen a rapid growth of data catalogues being made available on the Web. The data catalogue registry datacatalogs.org, for example, already lists 314 data catalogues worldwide. Examples of the increasing popularity of data catalogues are governmental data portals, data portals of international organizations and NGOs, as well as scientific data portals. In the public and governmental sector, a few catalogues and data hubs can be used to find metadata or at least the locations (links) of interesting files, such as publicdata.eu21.
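As an illustration of how such data catalogues can be queried programmatically, the following minimal sketch searches a data portal through the CKAN action API. The portal URL, the query term and the assumption that the portal exposes the standard package_search action are illustrative; real portals may differ in their API versions and metadata fields.

    # Minimal sketch: discovering datasets on a CKAN-compatible data portal.
    import json
    import urllib.request

    PORTAL = "http://publicdata.eu"          # assumed CKAN-compatible portal
    QUERY = "transport"

    url = "%s/api/3/action/package_search?q=%s&rows=5" % (PORTAL, QUERY)
    with urllib.request.urlopen(url) as response:
        result = json.load(response)

    for dataset in result["result"]["results"]:
        # each dataset record carries metadata such as title and the
        # list of downloadable resources (CSV, RDF, ...)
        print(dataset["name"], "-", dataset.get("title", ""))
        for resource in dataset.get("resources", []):
            print("   ", resource.get("format", "?"), resource.get("url", ""))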

1.3.5 Telco, Media, Entertainment

Big data acquisition also plays a central role in this sector. The amount of knowledge contained in media files has been increasing heavily since the beginning of electronic data processing. The evolution of the Internet and the social web acted (and still acts) as a catalyst for a growing mass of media files and the metadata about them. Media data is multimodal and can be represented in the form of text (including full text, tables and slides), images, audio/video files and streams. All of these types can be represented technically in different formats such as odp, xls, pptx, mpeg4, avi, wiki markup, html and many more. Acquiring Big Data in this sector is thus linked with manifold challenges. On the one hand, users (end users and professionals) want to search/find, explore and discover media data and their respective metadata. On the other hand, users will add, edit, version, synchronize, etc. media data and the respective metadata (as well as recommendations and reviews). A further challenge is the topic of multilingualism. Because knowledge is always encoded in a specific natural language and is rarely translated into other ones, content retrieval and linking algorithms have to use language-specific NLP tools. It is possible to acquire knowledge from media files using different APIs that offer the ability to extract data from the desired technical representations. Furthermore, it is possible to extract knowledge from existing meta-information (a crowd-based approach). To search, find and explore such metadata, different portals and APIs exist, such as YouTube (and its API22), the Internet Movie Database (IMDb23), the Open Movie Database (OMDb and its API24) and SoundCloud (and its API25).

21 http://publicdata.eu/dataset?q=media
22 https://developers.google.com/youtube/v3
23 http://www.imdb.com/
24 http://www.omdbapi.com/
25 http://developers.soundcloud.com/docs


1.4. Roadmap

1.4.1 Requirements

Big Data acquisition tooling has to deal with high-velocity, high-variety, real-time data acquisition. Thus, tooling for data acquisition has to ensure a very high throughput. This means that data can come from multiple resources (social networks, sensors, web mining, logs, etc.), with different structures or unstructured (text, video, pictures and media files), at a very high pace (tens or hundreds of thousands of events per second). Therefore, the main challenge in acquiring Big Data is to provide frameworks and tools that ensure the required throughput for the problem at hand without losing any data in the process. In this context, other emerging challenges to deal with in the future include the following.

1.4.1.1 General Requirements

1.4.1.1.1. Requirement 1 Data acquisition is often started by tools that provide some kind of input data to the system, such as social network and web mining algorithms, sensor data acquisition software, periodically injected logs, etc. Typically, the data acquisition process starts with one or multiple end points where the data comes from. These end points can take different technical forms, such as log importers or Storm-based algorithms, or the data acquisition layer may offer APIs to the external world for injecting data, for example RESTful services or other programmatic APIs. Hence, any technical solution that aims to acquire data from different sources should be able to deal with this wide range of different implementations.
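As a minimal illustration of such an ingestion end point, the sketch below exposes a RESTful interface to which external producers can POST events. It assumes the Flask micro-framework; the route, the field names and the in-memory buffer are purely illustrative stand-ins for a real message queue or write-ahead log.

    # Minimal sketch of a programmatic ingestion end point (assumes Flask).
    # Producers (log shippers, web-mining jobs, sensors) POST JSON events.
    from flask import Flask, request, jsonify

    app = Flask(__name__)
    buffer = []  # stand-in for a message queue or master dataset


    @app.route("/ingest", methods=["POST"])
    def ingest():
        event = request.get_json(force=True, silent=True)
        if event is None or "source" not in event:
            return jsonify({"status": "rejected"}), 400
        buffer.append(event)  # in a real system: forward to Kafka/Flume/storage
        return jsonify({"status": "accepted", "queued": len(buffer)}), 202


    if __name__ == "__main__":
        app.run(port=8080)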

1.4.1.1.2. Requirement 2 Another challenge for data acquisition frameworks is to provide the mechanisms to connect the data acquisition with the data pre- and post-processing (analysis) and storage, both in the historical and the real-time layers. In order to do so, the batch and real-time processing tools (e.g., Hadoop and Storm) should be able to be contacted by the data acquisition tools. Different tools achieve this in different ways. For instance, Apache Kafka uses a publish-subscribe mechanism to which both Hadoop and Storm can be subscribed, so that the messages received become available to them. Apache Flume, on the other hand, follows a different approach, storing the data in a NoSQL key-value store to ensure velocity, and pushing the data to one or several receivers (e.g., Hadoop and Storm). There is a thin line between data acquisition, storage and analysis in this process, as data acquisition typically ends by storing the raw data in an appropriate master dataset and connecting with the analytical pipeline (especially the real-time, but also the batch processing).
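The following minimal sketch illustrates the publish-subscribe pattern described above, assuming the kafka-python client and a broker running on localhost:9092. The topic name and payload are illustrative; in a real deployment the consumer side would be a Storm spout or a Hadoop ingestion job rather than a simple loop.

    # Minimal publish-subscribe sketch (assumes the kafka-python client).
    from kafka import KafkaProducer, KafkaConsumer

    TOPIC = "acquired-events"  # illustrative topic name

    # producer side: the data acquisition component publishes raw events
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send(TOPIC, b'{"source": "web-log", "line": "GET /index.html"}')
    producer.flush()

    # consumer side: e.g. the real-time processing layer reads the same stream
    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        consumer_timeout_ms=5000,   # stop iterating when no new messages arrive
    )
    for message in consumer:
        print(message.offset, message.value)

Because Kafka decouples producers from consumers, the batch and the real-time layer can each maintain their own position in the same stream, which is precisely the property exploited in the architecture sketched above.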

1.4.1.1.3. Requirement 3 The effective pre-processing of the acquired data (especially of unstructured data) is yet another challenge for Big Data acquisition. In particular, it is essential that data pre-processing nodes in the data acquisition process provide a structured or semi-structured output that can be easily stored and later analysed. Note that the borders between data acquisition and analysis are blurred in this form of pre-processing. Some may argue that pre-processing is part of processing, and therefore of data analysis, while others believe that data acquisition does not end with the actual gathering, but also includes cleaning the data and providing a minimal level of coherence and metadata on top of it. Data cleaning usually takes several steps, such as boilerplate removal (i.e., getting rid of HTML headers in web mining acquisition), language detection and named entity recognition (for textual resources), and providing extra metadata such as timestamps, provenance information (yet another frontier with data curation), etc.
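A minimal pre-processing sketch along these lines is shown below. It assumes the optional langdetect library for language detection; the regular-expression-based boilerplate removal and the output field names are simplifications chosen purely for illustration.

    # Minimal pre-processing sketch: boilerplate removal, optional language
    # detection and enrichment with timestamp/provenance metadata.
    import re
    from datetime import datetime, timezone

    try:
        from langdetect import detect          # optional third-party library
    except ImportError:
        detect = None

    TAG_RE = re.compile(r"<[^>]+>")            # crude HTML boilerplate removal


    def preprocess(raw_html, source_url):
        text = TAG_RE.sub(" ", raw_html)
        text = re.sub(r"\s+", " ", text).strip()
        return {
            "text": text,
            "language": detect(text) if (detect and text) else "unknown",
            "provenance": source_url,
            "acquired_at": datetime.now(timezone.utc).isoformat(),
        }


    if __name__ == "__main__":
        page = "<html><head><title>x</title></head><body><p>Big Data news</p></body></html>"
        print(preprocess(page, "http://example.org/news"))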


To overcome challenges related to the pre- and post-processing of acquired data, the current state of the art provides a set of open source and commercial tools and frameworks. The main goal when defining a correct data acquisition strategy is therefore to understand the needs of the system in terms of data volume, variety and velocity, and to make the right decision on which tool is best suited to ensure the acquisition and the desired throughput. Approaches that can analyse the data requirements of companies and provide suggestions in this respect would enable even SMEs to partake in the use of Big Data.

1.4.1.1.4. Requirement 4 The final challenge is related to the acquisition of media (pictures, video, etc.). Acquisition is a challenge in itself, but the analysis and storage of video and images is an even bigger one. It would be useful to survey existing open source and commercial tools for the acquisition of media.

1.4.1.2 Finance and Insurance (SC)

The finance and insurance sector relies on several data sources in several formats to generate prognoses. Approaches that allow the detection of interesting data sources are at the forefront of requirements for financial institutions and insurers. This requirement has clear implications for data analysis and storage in this sector. In particular, the consumption of significant amounts of relevant data requires flexible storage solutions that can deal with vast amounts of heterogeneous data. Moreover, the different data sources require time-efficient yet accurate distributed algorithms for pattern detection. As markets are in constant motion, data detection and acquisition needs to be equally adaptive.

1.4.1.3 Health Sector

The integration of the various heterogeneous data sets is an important prerequisite of Big Data Health applications and requires the effective involvement and interplay of the various stakeholders. To establish the basis for Big Health Data applications, several challenges need to be addressed (McKinsey, 2011):

 Data Digitalization: A significant portion of clinical data is not yet digitized. There is a substantial opportunity to create value if these pools of data can be digitized, combined, and used effectively.
 Data Integration: Each stakeholder generates pools of data, but these have typically remained unconnected from each other.
 System Incentives: The incentives to leverage Big Data in this sector are often out of alignment, offering an instructive case on the sector-wide interventions that can be necessary to capture value.
 Data security and privacy concerns hinder data exchange.
 The lack of standardized health data (e.g. EHR, common models/ontologies) affects analytics usage.
 Only a small percentage of data is documented, and often with low quality.

1.4.1.4 Manufacturing, Retail, Transport

Challenges in the context of manufacturing, retail and transportation include data extraction challenges: gathered data comes from very heterogeneous sources (e.g. log files, data from social media that needs to be extracted via proprietary APIs, data from sensors, etc.), and some data comes in at a very high pace, such that the right technologies need to be chosen for extraction (e.g. MapReduce). They also include data integration challenges: for example, product names used by customers on social media platforms need to be matched


against the IDs used for product pages on the web and then matched again against internal IDs used in ERP systems.

1.4.1.5 Government, Public, Non-profit

Governments and public administrations have started to publish large amounts of structured data on the Web. However, to enable Big Data applications in this sector, the following challenges have to be addressed:
 The published data is heterogeneous, mostly in the form of tabular data such as CSV files or Excel sheets. Only a tiny fraction of it is currently available as RDF (a minimal mapping sketch is shown after this list).
 None of the existing tools supports a truly incremental, pay-as-you-go data publication and mapping strategy. The lack of such an architecture of participation with regard to the mapping and transformation of tabular data to semantically richer representations hampers the creation of an ecosystem that enables Big Data applications.
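As a minimal illustration of such a mapping step, the sketch below converts rows of a fictitious governmental CSV file into RDF triples in N-Triples syntax. The vocabulary URIs, column names and data are assumptions made purely for illustration; a pay-as-you-go strategy would refine such mappings incrementally rather than fixing them upfront.

    # Minimal sketch: mapping rows of a tabular (CSV) dataset to RDF triples.
    import csv
    import io

    CSV_DATA = """municipality,population,year
    Galway,75529,2011
    Leipzig,531562,2011
    """

    BASE = "http://example.org/resource/"      # illustrative namespaces
    PROP = "http://example.org/property/"


    def row_to_triples(row):
        subject = "<%s%s>" % (BASE, row["municipality"].replace(" ", "_"))
        yield '%s <%spopulation> "%s" .' % (subject, PROP, row["population"])
        yield '%s <%syear> "%s" .' % (subject, PROP, row["year"])


    for row in csv.DictReader(io.StringIO(CSV_DATA)):
        for triple in row_to_triples(row.copy()):
            print(triple)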

1.4.1.6 Telco, Media, Entertainment (MM)

In this sector a few challenges can be listed:
 Data/Knowledge Extraction: A significant portion of communication/media/entertainment data is not yet extracted from its original sources, or at least annotated. There is a substantial opportunity to create value out of these pools of data. Furthermore, meta-information about the source files has to be integrated and fused. This requires means of discovery for media files.
 Language Management: There is no standardized language that is always used to encode the extracted knowledge. This causes, for instance, linking problems for anything that builds on this extracted information.
 Synchronization of Data: How should extracted knowledge bases be updated when sources change, which happens frequently, at least for textual content?
 Data Integration: Each stakeholder generates pools of data, but these have typically remained unconnected from each other. Devising approaches to link this data at acquisition time (using hashing, for example; see the sketch after this list) would significantly improve the quality of the data acquired.
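The hashing idea mentioned in the last item can be sketched as follows. The chosen fields, the SHA-1 digest and the sample records are illustrative assumptions; the point is simply that records describing the same item, arriving from different channels, receive the same key at acquisition time and can therefore be linked or de-duplicated immediately.

    # Minimal sketch of linking records at acquisition time via content hashing.
    import hashlib
    import json


    def record_key(record, fields=("title", "artist", "year")):
        # normalise the identifying fields and hash them to a stable key
        canonical = json.dumps(
            {f: str(record.get(f, "")).strip().lower() for f in fields},
            sort_keys=True,
        )
        return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


    index = {}
    for source, rec in [
        ("portal_a", {"title": "Nosferatu", "artist": "Murnau", "year": 1922}),
        ("portal_b", {"title": "nosferatu ", "artist": "murnau", "year": "1922"}),
    ]:
        index.setdefault(record_key(rec), []).append(source)

    print(index)   # both sources end up linked under the same hash key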

1.4.2 Emerging Trends

First future trends can be derived from the current state of our analysis of data acquisition for Big Data. Over the last years, the pay-as-you-go paradigm has become most prominent in data-driven organizations. The idea here is the provision of IT and data landscapes that allow the IT and data of organizations to evolve based on the current data management requirements faced by the organization. This paradigm is currently being extended to the pay-as-you-grow paradigm, which aims to facilitate the extension of the data processing capabilities of organizations as required by the data they have. The data business promises to develop a novel pay-as-you-know industry, which will enable companies to pay for tailored data acquisition as well as analysis while increasing the quantity of data analysed and the quality of the data analysis as their requirements grow. This trend will lead to the development of data consultants, who will be able to tell companies where to get data to address their data needs. A more technical trend that is currently emerging within the semantics-driven community is the idea of self-describing data sources. This trend emerges from the requirements expressed by several of the sector forums. Here, the idea is that data sources can describe their own content in a way that enables organizations that require the data they contain to find them easily. To this aim, a high-tech data source search and recommendation industry promises to be born in the near future. This industry promises to achieve revenues comparable to world-renowned


companies such as Google and Yahoo, as industry will be willing to invest large sums to achieve fast growth and optimal data use. In the public sector, Big Data acquisition also promises the beginning of a revolution. First applications, such as the use of Big Data for crime prediction, show that governments can use Big Data to improve law enforcement. Other possible areas of application include improving childcare, health and education facilities based on data feeds from the population.

1.5. Conclusion and Next Steps

In this section we presented the state of the art in data acquisition for Big Data with a focus on tools and protocols. In the final version of the white paper, we will aim to present further tools in more detail. Moreover, we will aim to devise a catalogue of characteristics for Big Data acquisition tools, with the aim of comparing existing solutions and generating guidelines for when to employ them. More future trends will be pinpointed and described.

1.6. References

Bank of America et al.: "AMQP v1.0", 2011. Available online at http://www.amqp.org/sites/amqp.org/files/amqp.pdf
Bradic, Aleksandar: "S4: Distributed Stream Computing Platform". Slides, Software Scalability Belgrad 2011. Available online at http://de.slideshare.net/alekbr/s4-stream-computing-platform
Brown, Brad and Chui, Michael and Manyika, James: "Are you ready for the era of 'big data'?", 2011. Available online at http://www.mckinsey.com/insights/strategy/are_you_ready_for_the_era_of_big_data
Chu, Steve: "Memcached. The complete guide", 2008. Available online at http://odbms.org/download/memcachedb-guide-1.0.pdf
Grant, Gabriel: "Storm: the Hadoop of Realtime Stream Processing". PyConUS 2012. Available online at http://pyvideo.org/video/675/storm-the-hadoop-of-realtime-stream-processing
Gordon, Terry: "Big Data Transform the Future of Banking". Think Finance SG, 2012. Available online at http://www.youtube.com/watch?v=BjNzEwx51sE
Eaton, Chris and Deroos, Dirk and Deutsch, Tom and Lapis, George and Zikopoulos, Paul: Understanding Big Data. IBM, 2012
IBM: Architecture of the IBM Big Data Platform, 2013. Available online at http://public.dhe.ibm.com/software/data/sw-library/big-data/ibm-bigdata-platform-19-04-2012.pdf
Madsen, Kasper: "Storm: Comparison-Introduction-Concepts". Slides, March 2012. Available online at http://de.slideshare.net/KasperMadsen/storm-12024820
McKinsey & Company: "Big data: The next frontier for innovation, competition, and productivity", 2011
MySQL: "Designing and Implementing Scalable Applications with Memcached and MySQL", 2008. Available online at http://mahmudahsan.files.wordpress.com/2009/02/mysql_wp_memcached.pdf
Neumeyer, Leo: "Apache S4: A Distributed Stream Computing Platform". Slides, Stanford Infolab, Nov 2011. Available online at http://de.slideshare.net/leoneu/20111104-s4-overview
Neumeyer, L., Robbins, B., Nair, A., Kesari, A.: S4: Distributed Stream Computing Platform. KDCloud 2011. Available online at http://www.4lunas.org/pub/2010-s4.pdf
Oracle: Oracle Information Architecture: An Architect's Guide to Big Data, 2012. Available online at http://www.oracle.com/technetwork/topics/entarch/articles/oea-big-data-guide-1522052.pdf
Schneider, Stan: "What's The Difference Between DDS And AMQP?". Electronic Design, April 2013. Available online at http://electronicdesign.com/embedded/what-s-difference-between-dds-and-amqp
Vivisimo: Big Data White Paper, 2012.
Zikopoulos, Paul and Deroos, Dirk and Parasuraman, Krishnan and Deutsch, Thomas and Corrigan, David and Giles, James: Harness the Power of Big Data. IBM, 2012

1.7. Useful Links

 http://highscalability.com/blog/2008/11/24/product-scribe-facebooks-scalable-logging-system.html
 http://www.ibm.com/developerworks/opensource/library/os-twitterstorm/index.html?ca=drs-
 http://storm-project.net/
 https://github.com/nathanmarz/storm/wiki/Tutorial
 Gabriel Grant: "Storm: the Hadoop of Realtime Stream Processing" @PyConUS 2012
 http://incubator.apache.org/s4/
 http://kafka.apache.org/design.html
 http://blog.linkedin.com/2011/01/11/open-source-linkedin-kafka/
 http://flume.apache.org/
 http://archive.cloudera.com/cdh/3/flume/UserGuide/
 http://www.kegel.com/c10k.html
 http://www-01.ibm.com/software/data/bigdata/
 http://www.ni.com/data-acquisition/
 http://stackoverflow.com/questions/13483809/middleware-to-build-data-gathering-and-monitoring-for-a-distributed-system
 http://www.oracle.com/technetwork/topics/entarch/articles/oea-big-data-guide-1522052.pdf?ssSourceSiteId=ocomen
 https://www.hangseng.com/cms/ipd/eng/analyses/PDF/invo.pdf
 http://strata.oreilly.com/2009/08/big-data-and-real-time-structured-data-analytics.html
 http://www.oracle.com/us/products/database/nosql/overview/index.html
 http://www.oracle.com/us/products/database/big-data-for-enterprise-519135.pdf
 http://en.wikipedia.org/wiki/Data_as_a_service
 http://readwrite.com/2012/09/26/big-data-effective-beyond-the-enterprise
 http://moa.cs.waikato.ac.nz/
 http://public.dhe.ibm.com/common/ssi/ecm/en/imm14100usen/IMM14100USEN.PDF
 http://www.informationweek.com/software/information-management/13-big-data-vendors-to-watch-in-2013/240144124
 http://www.intel.com/content/dam/www/public/us/en/documents/reports/data-insights-peer-research-report.pdf
 http://public.dhe.ibm.com/common/ssi/ecm/en/iml14296usen/IML14296USEN.PDF
 http://terraechos.com/ (sound analytics)
 http://vivisimo.com/docs/whitepaper_bigdata.pdf
 http://www.zeromq.org/area:results
 http://blogs.vmware.com/vfabric/2013/02/choosing-your-messaging-protocol-amqp-mqtt-or-stomp.html
 http://www.storediq.com/downloads/StoredIQ-Truth_About_Big_Data-May_2012.pdf


2. Data Analysis

2.1. Introduction

Figure 2-1: The Main BIG Activities.

As can be seen in Figure 2-1, BIG is split into a number of Sector Forums, which will gather needs from vertical industrial sectors, and Technology Working Groups, which focus on different Big Data technical aspects. Big Data Analysis is concerned with making raw data, which has been acquired, amenable to use – primarily in supporting decision-making as well as domain-specific usage. Typically this will entail transforming the data into a richer, possibly semantics-based, representation. The techniques associated with Big Data Analysis encompass those related to data mining and machine learning, to information extraction, and to new forms of data processing and reasoning including, for example, stream data processing and large-scale reasoning.

2.1.1 Objectives

Given that this is the first white paper we have elected to explain the overall process that we have taken in generating our content. This is based on the following approach:

 Analysing the state of the art in key research areas associated with Big Data Analysis.

 Online interviews with prominent leaders and researchers associated with Big Data. Here we wanted to gain viewpoints from a strategic as well as a technical perspective. We have also used the interviews to support dissemination and learning (see Annex 1, Annex d).

 Interview analysis tools, including a ‘collective intelligence’ tool, which supports the gaining of evidence-based community consensus on complex issues. We are also experimenting with tools for automatic interview transcription and tag extraction.

 Generation of research claims, requirements and future technological capabilities from the interviews. These are listed in section 2.4 along with the name of the source.


2.1.2 Intended audience

The target readers for this deliverable are policy makers, researchers and practitioners in the area of Big Data.

2.2. The Process

2.2.1 Interviewees

To aid in the writing of this white paper we interviewed a number of key players in the Big Data Analysis space and the Big Data field generally, from industry and academia, to gain their views and insights. Table 2-1 below contains the list of interviewees to date. As can be seen, we have thus far succeeded in covering individuals from academia and from both small and large industry. In order not to unintentionally narrow the scope of this work, we elected not to restrict our interviewees to those perceived to be in the area of Big Data Analysis. Table 2-1: Big Data Analysis Interviewees

No | First Name | Last Name | Organisation | Role/Position
1 | Soren | Auer | University of Leipzig | Professor
2 | Ricardo | Baeza-Yates | Yahoo! | Vice President of Research for Europe and Latin America
3 | François | Bancilhon | Data Publica | Chief Executive Officer
4 | Richard | Benjamins | Telefonica | Director Business Intelligence
5 | Hjalmar | Gislason | DataMarket Inc. | Founder and Chief Executive Officer
6 | Alon | Halevy | Google | Research Scientist
7 | Usman | Haque | COSM (formerly Pachube) | Founder and Director, Urban Projects Division, COSM
8 | Jim | Hendler | Rensselaer Polytechnic Institute (RPI) | Professor
9 | Alek | Kołcz | Twitter | Data Scientist
10 | Prasanna | Lal Das | World Bank | Lead Program Officer, Controllers
11 | Peter | Mika | Yahoo! | Researcher
12 | Jeni | Tennison | Open Data Institute | Technical Director
13 | Bill | Thompson | BBC | Head of Partnership Development
14 | Andraž | Tori | Zemanta | Founder and Chief Technology Officer
15 | Frank | van Harmelen | Free University of Amsterdam | Professor
16 | Jim | Webber | Neo Technology | Chief Scientist


2.2.2 Target Interviewees

We continue to aggregate suggestions for interviewees from the BIG consortium, our network and the interviewees - the last question in every interview targets this (see Annex 1). Table 2-2: Potential Big Data Analysis Interviewees

No | First Name | Last Name | Organisation | Role/Position
1 | Lars | Backstrom | Facebook | Data Scientist
2 | Omar | Benjelloun | Google | Staff Software Engineer
3 | Joerg | Bienert | ParStream | Chief Technology Officer
4 | Paolo | Boldi | University of Milano | Associate Professor
5 | Peter | Boncz | Free University of Amsterdam | Professor
6 | Ciro | Catutto | ISI Foundation | Head of the Data Science Lab
7 | Hamish | Cunningham | Sheffield | Professor
8 | Leigh | Dodds | Freelance (formerly CTO of Kasabi) |
9 | Bruce | Durlin | Mastodon C | Co-Founder and Chief Technology Officer
10 | Nicholas | Halstead | DataSift, Inc | Founder and Chief Technology Officer
11 | Kristian | Hammond | Narrative Science | Co-Founder and Chief Technology Officer
12 | Steve | Harris | Garlik/Experian | Chief Technology Officer
13 | Sriram | Krishnan | Facebook | Product Manager
14 | Ronny | Lempel | Yahoo! | Big Data Chief Scientist
15 | Paul | Miller | Cloudofdata blog | Blogger
16 | Magnus | Sahlgren | Gavagai | Chief Scientist
17 | Joachim | Schaper | AGT Germany | VP Research
18 | Roman | Stanek | Gooddata | Founder and Chief Executive Officer

2.2.3 The Interview Questions

The full document is contained in Annex 1, Annex b. This document contains background information on the BIG Project, including the structure based on Sector Forums and Technology Working Groups, and then the list of questions. The questions are structured in four main parts:

 Asking the interviewee to introduce himself/herself.

 Technology focused questions aiming to elicit the design principles, components, trade-offs and metrics associated with the significant (in the eyes of the interviewee) Big Data analysis technologies.

 Questions on the potential impact of the technologies from a scientific, economic, business and societal point of view.


 Questions related to which other knowledge sources including individuals and organisations that, in the view of the interviewee, we should interview.

2.3. State of the art

Here we examine Big Data analysis from the literature, overviewing a variety of topics ranging from working efficiently with data to large-scale data management.

2.3.1 Use of Linked Data and Semantic Approaches to Big Data Analysis

Linked Data and semantic approaches to Big Data Analysis have been highlighted by a number of interviewees including Soren Auer, François Bancilhon, Richard Benjamins, Hjalmar Gislason, Frank van Harmelen, Jim Hendler, Peter Mika and Jeni Tennison.

2.3.1.1 State of the Art in Entity Summarisation

To the best of our knowledge, entity summarization was first mentioned in (Cheng, Ge, & Qu, 2008). The authors present Falcons, which “... provides keyword-based search for Semantic Web entities”. Next to features such as concept search, ontology and class recommendation, and keyword-based search, the system also includes a popularity-based approach for ranking the statements an entity is involved in. Further, the authors describe the use of the MMR technique (Carbonell & Jade, 1998) to re-rank statements to account for diversity. In a later publication (Cheng, Tran, & Qu, 2011), entity summarization requires “... ranking data elements according to how much they help identify the underlying entity.” This statement accounts for the most common definition of entity summarization: the ranking and selection of statements that identify or define an entity. However, a different notion of entity summarization was mentioned in (Demartini, Saad Missen, Blanco, & Zaragoza, 2010), where the authors describe the problems of entity retrieval and the determination of entities in current news articles. The authors define entity summarization as follows: “given a query, a relevant document and possible a set of previous related documents (the history of the document), retrieve a set of entities that best summarize the document.” This problem is different from the problem we mean to describe in this section. In the following we provide a snapshot of the most recent developments in the field of entity summarization. In (Singhal, 2012), the author introduces Google’s Knowledge Graph. Next to entity disambiguation (“Find the right thing”) and exploratory search (“Go deeper and broader”), the Knowledge Graph also provides summaries of entities, i.e. “Get the best summary”. Although not explained in detail, Google points out that it uses the search queries of users for the summaries:

“How do we know which facts are most likely to be needed for each item? For that, we go back to our users and study in aggregate what they’ve been asking Google about each item. For example, people are interested in knowing what books Charles Dickens wrote, whereas they’re less interested in what books Frank Lloyd Wright wrote, and more in what buildings he designed.” (Singhal, 2012)

For the Knowledge Graph summaries, Google uses a unique dataset of millions of day-by-day queries in order to provide concise summaries. Such a dataset is, however, not available to all content providers. As an alternative, (Thalhammer, Toma, Roa-Valverde, & Fensel, 2012) suggest using background data on the consumption patterns of items in order to derive summaries of movie entities. The idea stems from the field of recommender systems, where item neighbourhoods can be derived from the co-consumption behaviour of users (i.e., through analyzing the user-item matrix).
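A minimal sketch of the underlying idea (deriving item neighbourhoods from the user-item matrix) is shown below. The matrix, the binary consumption values and the cosine similarity measure are illustrative and do not reproduce the cited approach in detail.

    # Minimal sketch: item neighbourhoods from co-consumption behaviour.
    import numpy as np

    # rows = users, columns = items; 1 = user consumed the item
    user_item = np.array([
        [1, 1, 0, 0],
        [1, 1, 1, 0],
        [0, 0, 1, 1],
        [0, 1, 0, 1],
    ])

    # cosine similarity between the item columns of the user-item matrix
    norms = np.linalg.norm(user_item, axis=0)
    item_sim = (user_item.T @ user_item) / np.outer(norms, norms)
    np.fill_diagonal(item_sim, 0.0)

    for item in range(user_item.shape[1]):
        neighbour = int(np.argmax(item_sim[item]))
        print("item %d -> nearest neighbour: item %d (sim %.2f)"
              % (item, neighbour, item_sim[item, neighbour]))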


A first attempt to standardize evaluation of entity summarization is provided by (Thalhammer, Knuth, & Sack, 2012). The authors suggest a game with a purpose (GWAP) in order to produce a reference dataset for entity summarization. In the description, the game is designed as a quiz game about movie entities from Freebase. In their evaluation, the authors compare the summaries produced by (Singhal, 2012) and the summaries of (Thalhammer, Toma, Roa-Valverde, & Fensel, 2012).

2.3.1.2 Data abstraction based on ontologies and communication workflow patterns

The problem of communication on the Web, as well as beyond it, is not trivial, considering the rapidly increasing number of channels (content sharing platforms, social media and networks, a variety of devices) and the audience to be reached; numerous technologies to address this problem are appearing (Fensel et al., 2012). Data management via semantic techniques can certainly facilitate the abstraction of communication and make it more automated and less time consuming. Inspired by the work of Mika (Mika, 2005) regarding the tripartite model of ontologies for social networks (i.e. Actor-Concept-Instance), communication workflow patterns (e.g. typical query response patterns for online communication) that are usable and adaptable to the needs of the Social Web can be defined (Stavrakantonakis, 2013b). Moreover, there has already been considerable interest in social network interactions, like the work in (Fuentes-Fernandez et al., 2012), which coined the 'social property' as a network of activity theory concepts with a given meaning. Social properties are considered as "patterns that represent knowledge grounded in the social sciences about motivation, behaviour, organization, interaction..." (Fuentes-Fernandez et al., 2012). Peter Mika mentioned the use of psychology and economic theories to increase the effectiveness of analysis tools in his interview (see section 2.5.1 item 2). The results of this research direction, combined with the generic workflow patterns described in (Van Der Aalst et al., 2003), are highly relevant to the objectives of this approach and the materialisation of the communication patterns. Furthermore, the design of the patterns is related to the collaboration among the various agents as described in (Dorn et al., 2012) in the scope of social workflows. Aside from the social properties, the work described in (Rowe et al., 2011) introduces the usage of ontologies in the modelling of the user's activities in conjunction with content and sentiment. In the context of the approach, modelling behaviours enables one to identify patterns in communication problems and to understand the dynamics in discussions in order to discover ways of engaging more efficiently with the public in the Social Web. Extending the state of the art in existing behaviour modelling methods, the contribution is the specialisation of the ontology towards specific domains, with respect to the datasets of the use cases. Several researchers have proposed the realisation of context-aware workflows (Wieland et al., 2007) and social collaboration processes (Liptchinsky et al., 2012), which are related to the idea of modelling the related actors and artefacts in order to enable adaptiveness and personalization in the communication patterns infrastructure. Research in the area of semantics regarding the retrieval of workflows (Bergmann et al., 2011), as well as semantic annotation paradigms such as those described in (Uren et al., 2006) and (Sivashanmugam et al., 2003), is also related to the above. The contribution of the outlined approach can be considered to be three-fold: the application of user and behaviour modelling methods in certain domains, the design and implementation of workflow patterns tailored for communication in the Social Web, and the context-aware adaptation and evolution of the patterns.

2.3.2 Recommendations and personal data

The majority of recommender systems, for example in tourism, follow the same workflow pattern. The personalisation procedure consists of three stages, as presented in (Stavrakantonakis, 2013a):


1. The initialisation phase, in which the user explicitly specifies personal information and preferences;
2. The analysis phase, which is responsible for matching the user's preferences with the available business entities, points-of-interest (POIs), etc.; and
3. The presentation of the recommendations to the user. The feedback from the user is used to improve future results and refine the produced user model.

The initial step cannot be skipped in any way by the various recommendation systems that are available in the Web sphere, because it is crucial for the proper exploitation of the algorithms that lie at the backend (i.e. the part of the system comprising the core knowledge management functions). In most cases, this step is required only at the initialisation of the platform, which keeps those platforms user friendly. Examples of recently launched systems, just to name a few, are Nara (nara.me), which is a restaurant recommender, Sightsplanner (tallinn.sightsplanner.com) for Tallinn-related recommendations, and the Supe system (Parundekar & Oguchi, 2012), which personalises GPS devices for automobile drivers. Nara is a Web-based recommender application that aims at providing users with personalised suggestions for restaurants in various cities. The workflow that is followed consists of three main steps. Firstly, the user explicitly specifies three restaurants in one of the cities in which the system is running. Based on this information, the system is able to make initial recommendations about other restaurants that may be interesting to the user. Then, the user can train the system by liking or disliking the recommendations. The recommendation aspect relies on neural network computing techniques. This constitutes the major difference from pre-existing approaches, as has been communicated by the Chief Technology Officer of the company and disseminated within an online Semantic Web news portal (Zaino, 2012). Sightsplanner is a route recommender system for tourists, helping them to specify their visiting plan in the city of Tallinn according to their traveller profile and interests. Travellers access the system via its web interface and explicitly specify their visiting interests regarding various categories (i.e. events, museums and arts, architecture and city, eating out, etc.) in terms of visit frequency. Sliders from “less” to “more” in the Web interface help the users to indicate the level of interest they have in activities from each category. Thus, upon visiting the website, the tourists specify their interests by using the “less” to “more” sliders or by choosing one of the predefined profiles. Afterwards, the system considers the visitor's time in the city in conjunction with the interests and the characteristics of the various points-of-interest that are stored in the database of the platform. The POIs stored in the database stem from the scraping of six web portals (Luberg et al., 2012). The results are then presented to the users, which in turn helps them decide their route. Supe stands for Semantic User Preference Engine and has, as its main objective, the improvement of the in-vehicle navigation system experience. The Toyota Info Technology department wanted to leverage the POI results of GPS devices to personalise POI search results by using Semantic Web technologies.
The semantic pillar helps them to define and understand the preferences of the driver as well as the POIs, and furthermore to use certain formulas to calculate the level of similarity between them. Therefore, the system presents to the drivers a list of POIs according to the places they have visited and prefer. A preference model is stored in the GPS device and synchronised with a replica in the cloud, which reflects the behaviour of the driver up until that moment and can be used in future searches of POIs. The diversity of approaches affects the effectiveness of the systems and the suitability of the results for the traveller. Nara uses a highly sophisticated recommendation methodology, Sightsplanner relies on predefined traveller profiles, and Supe combines a recommendation methodology with data from GPS devices. These systems were presented to illustrate the diversity in recommendation approaches and to emphasize their relationship rather than to evaluate them. Regarding the aforementioned presentation of a few recommendation approaches, the lack of a unified user model is noticeable (Stavrakantonakis, 2013a). The personal data of the users that populates the user models is stored and trapped in the proprietary data warehouses of the systems and is not available for reuse by other systems, even after user approval. Systems like Sightsplanner require the user's input about his/her preferences on every use of the system, which


adds an overhead to the whole procedure in terms of time from the user's point of view. Additionally, systems like Supe could benefit from a user model that aggregates the interests of the users and makes them available in an open and interoperable manner. From the reverse perspective, GPS data could enrich the user model of travellers with valuable information for future use. Thus, a unified user model could help all the above-mentioned systems to have better knowledge of the user and to benefit from each other.

2.3.3 Stream data processing

Ricardo Baeza-Yates highlighted stream data mining as an important area in his interview (see item 4 in section 2.6.1).

2.3.3.1 RDF data stream pattern matching

Lately, an increasing number of data streams have become common on the Web, from several different domains (e.g. social media, sensor readings, transportation, and communication). Motivated by the huge amount of structured and unstructured data available as continuous streams, stream processing techniques using Web technologies have appeared. In order to process data streams on the Web, it is important to cope with their heterogeneity. A core issue for data stream processing systems is to process data within a certain time frame and to be able to query for patterns. An additional desired feature is support for static data, which does not change over time and can be used to enhance the dynamic data. Temporal operators and time-based windows are also typically found in these systems, used to combine several RDF graphs with time dependencies. Some major developments in this area are C-SPARQL (Barbieri et al., 2010), ETALIS (Anicic et al., 2011) and SPARKWAVE (Komazec et al., 2012). C-SPARQL is a language based on SPARQL and extended with definitions for streams and time windows. Incoming triples are first materialized based on RDFS and then fed into the evaluation system. C-SPARQL does not provide true continuous pattern evaluation, due to the usage of RDF snapshots, which are evaluated periodically. C-SPARQL's strength, however, lies in situations with significant amounts of static knowledge, which need to be combined with dynamic incoming data streams. ETALIS is an event processing system on top of SPARQL: the pattern language component of SPARQL was extended with event processing syntax, and the resulting pattern language is called EP-SPARQL. The supported features are temporal operators, out-of-order evaluation, aggregate functions, several garbage collection modes and different consumption strategies. An additional feature is the support of static background knowledge, which can be loaded as an RDFS ontology. SPARKWAVE provides continuous pattern matching over schema-enhanced RDF data streams. In contrast to C-SPARQL and EP-SPARQL, SPARKWAVE is fixed regarding the utilised schema and does not support temporal operators or aggregate functions. The schema supported by SPARKWAVE is RDF Schema with two additional OWL properties (inverse and symmetric). The benefit of having a fixed schema and no complex reasoning is that the system can optimize and pre-calculate the pattern structure in memory at the initialization phase, thus leading to high throughput when processing incoming RDF data.
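To make the notion of window-based pattern matching concrete, the following minimal sketch evaluates a wildcard pattern over a time window of RDF-like triples directly in Python. The window size, the triples and the pattern are illustrative, and the sketch does not reproduce the query syntax or semantics of C-SPARQL, EP-SPARQL or SPARKWAVE.

    # Minimal sketch: time-window pattern matching over a triple stream.
    from collections import deque

    WINDOW_SECONDS = 10
    window = deque()   # holds (timestamp, (subject, predicate, object))


    def feed(triple, now):
        """Add a triple to the window and evict expired ones."""
        window.append((now, triple))
        while window and now - window[0][0] > WINDOW_SECONDS:
            window.popleft()


    def match(pattern):
        """Return triples matching a pattern; None acts as a wildcard."""
        return [t for _, t in window
                if all(p is None or p == v for p, v in zip(pattern, t))]


    # simulated stream: sensor readings arriving over time (seconds)
    stream = [
        (("sensor1", "hasReading", "21.5"), 0),
        (("sensor2", "hasReading", "22.0"), 4),
        (("sensor1", "hasReading", "23.1"), 12),
    ]
    for triple, ts in stream:
        feed(triple, ts)

    # continuous-query style pattern: readings of sensor1 in the last window
    print(match(("sensor1", "hasReading", None)))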

2.3.4 Large-scale Reasoning

The promise of reasoning promoted by the idea of the Semantic Web presents, however, a difficult scalability problem. Reasoning is defined by certain principles, such as soundness or completeness, which are far from the practical world and the characteristics of the Web, where data is often contradictory, incomplete and of an overwhelming size. Moreover, there exists a gap between reasoning at Web scale and the more tailored reasoning over simplified subsets of first-order logic, due to the fact that many assumptions are made that differ from reality (e.g. a small set of axioms and facts, completeness and correctness of inference rules, static domains).


State of the art approaches (Fensel et al., 2007) propose a combination of reasoning and information retrieval methods (based on search techniques) to overcome the problems of Web-scale reasoning. Incomplete and approximate reasoning was highlighted by Frank van Harmelen as an important topic in his interview (see section 2.6.2 item 1). One of the key aspects of these approaches is the combination of logic-based reasoning with information retrieval and also machine learning techniques, which provides a trade-off between the fully fledged aspects of reasoning and their practicality for the Web. When the topic of scalability arises, storage systems play an important role as well, especially the indexing techniques and retrieval strategies. The trade-off between online (backward) reasoning and offline (forward) reasoning was mentioned by Frank van Harmelen in his interview (see 2.6.2 item 2). Peter Mika mentioned the importance of efficient indexing techniques in his interview (see section 2.6 item 4). A comparison of state-of-the-art semantic storage systems can be found in (Ognyanoff et al., 2007). Under the topic of large-scale systems, we find LarKC (Fensel et al., 2008) as a flagship project. LarKC was an EU FP7 Large-Scale Integrating Project (http://www.larkc.eu) whose aim was to deal with large, scalable reasoning systems and techniques using semantic technologies. With regard to the scalability problem, the goal of LarKC was to provide a platform specifically adapted to massive distribution and support for incomplete reasoning, which mitigates the high costs of computation in traditional reasoning systems for the Web and facilitates the creation of experiments that are better adapted to the characteristics found in this context.

Figure 2-2: The LarKC Project Strategy and Architecture.

The main achievements of LarKC associated with large-scale reasoning were focused on enriching the current logic-based Semantic Web reasoning methods, employing cognitively inspired approaches and techniques, building a distributed reasoning platform and realizing the platform on a high-performance computing cluster.

2.3.4.1 Benchmarking for Large-scale Repositories

Benchmarking is nascent in the area of large-scale semantic data processing; in fact, benchmarks are only now being produced. In particular, the Linked Data Benchmark Council (LDBC) project aims to:

 Create a suite of benchmarks that encourage the advancement of technology by providing both academia and industry with clear targets for performance and functionality, and


 Establish an independent authority for developing benchmarks and verifying the results of those engines.
The same holds for the emerging field of NoSQL graph databases, which share with RDF a graph data model and pattern- and path-oriented query languages. Part of the suite of benchmarks created in LDBC covers benchmarking and testing of the data integration and reasoning functionalities supported by RDF systems. These benchmarks are focused on testing: (i) instance matching and ETL techniques that play a critical role in data integration; and (ii) the reasoning capabilities of existing RDF engines. Both topics are very important in practice, and both have been largely ignored by existing benchmarks for Linked Data processing. In creating such benchmarks, LDBC analyses various available scenarios to identify those that can best showcase the data integration and reasoning functionalities of RDF engines. Based on these scenarios, the limitations of existing RDF systems are identified in order to gather a set of requirements for RDF data integration and reasoning benchmarks. For instance, it is well known that existing systems do not perform well in the presence of non-standard reasoning rules (e.g., advanced reasoning that considers negation and aggregation). Moreover, existing reasoners perform inference by materialising the closure of the dataset (using backward or forward chaining). However, this approach might not be applicable when application-specific reasoning rules are provided, and hence it is likely that improving the state of the art will imply support for hybrid reasoning strategies involving both backward and forward chaining, and query rewriting (i.e., incorporating the rule set in the query). The limitations of existing RDF systems are classified according to the expected impact that they will have in the future, and the requirements for RDF data integration and reasoning benchmarks are identified and used as the basis for the benchmark design. Preliminary experiments are conducted to test the plausibility of the proposed benchmarks in order to ensure that they tackle realistic and interesting objectives (i.e., we should not only include trivial tasks, or tasks that are too complex to be solved). Based on the identified requirements, the instance matching, ETL and reasoning benchmarks are implemented. One of the benchmarks for RDF systems that is currently under development in LDBC is the "Publishing Benchmark", which is led by Ontotext and stems from requirements coming from players active in the publishing domain, such as the BBC and the Press Association. The compulsory part of this benchmark boils down to a raw measurement of concurrent query and update throughput based on modifications of 'objects', i.e., where a set of RDF triples (probably in a named graph) describes all the properties of an object. The benchmark is organised around a 'publishing' scenario on top of an RDF database that involves CRUD operations on 'objects' (usually publishers' journalistic assets), where data related to objects is read (for publication/aggregation) far more frequently than it is updated. Metadata about objects links the asset via a tagging ontology to reference and domain-specific ontologies and datasets. The data used for this benchmark falls under the following categories: (1) reference data (a combination of several LOD datasets, e.g. GeoNames, MusicBrainz, etc.); (2) domain ontologies - specialist ontologies that describe certain areas of expertise of the publishing domain, e.g. Sport, Education; and (3) publication asset ontologies - which describe the structure and form of the assets that are published, e.g. news stories, photos, video, audio, etc. More details about the Publishing Benchmark are available at: http://www.ldbc.eu:8090/display/TUC/Semantic+Publishing+Task+Force. Ricardo Baeza-Yates highlighted setting up standard benchmarks for computation protocols in his interview (see section 2.5 item 2).

2.3.4.2 Large-scale Machine Learning

Machine learning algorithms use data to automatically learn how to perform tasks such as prediction, classification and anomaly detection. Most machine learning algorithms have been designed to run efficiently on a single processor or core. Developments in multi-core architectures and grid computing have led to an increasing need for machine learning to take advantage of the availability of multiple processing units. Many programming interfaces and languages dedicated to parallel programming exist such as Orca, MPI, Pthreads, OpenCL, OpenMP and OpenACC, which are useful for general purpose parallel programming. However, it is not always obvious how existing machine learning algorithms can be implemented in a


parallelized manner. There is a large body of research on distributed learning and data mining (Bhaduri et al., 2011), which encompasses machine learning algorithms that have been designed specifically for distributed computing purposes. These include many approaches that target ways to parallelize specific algorithms, such as cascaded SVMs (Graf et al., 2004). Instead of creating specific parallel versions of algorithms, more generalized approaches involve frameworks for programming machine learning on multiple processing units. One approach is to use a high-level abstraction, which significantly simplifies the design and implementation of a restricted class of parallel algorithms. In particular, the MapReduce abstraction has been successfully applied to a broad range of machine learning applications. (Chu et al., 2007) show that any algorithm fitting the Statistical Query Model can be written in a certain summation form, which can be easily implemented in a MapReduce fashion and achieves a near-linear speed-up with the number of processing units used. They show that this applies to a variety of learning algorithms, including locally weighted linear regression, k-means, logistic regression, naïve Bayes, Support Vector Machines (SVM), Independent Component Analysis (ICA), Principal Component Analysis (PCA), Gaussian discriminant analysis, Expectation Maximization (EM) and back propagation in neural networks (Chu et al., 2007). The implementations shown in the paper led to the first version of the MapReduce machine-learning library Mahout. (Low et al., 2010) explain how the MapReduce paradigm restricts us to using overly simple modelling assumptions to ensure there are no computational dependencies in processing the data. They propose the GraphLab abstraction, which insulates users from the complexities of parallel programming (i.e. data races, deadlocks), while maintaining the ability to express complex computational dependencies using a data graph. Their framework assumes shared memory is available and provides a number of scheduling strategies to accommodate various styles of algorithms. They demonstrate the effectiveness of their framework on parallel versions of belief propagation, Gibbs sampling, Co-EM, Lasso and Compressed Sensing (Low et al., 2010). Other frameworks and abstractions for implementing parallel machine learning algorithms exist. In the directed acyclic graph (DAG) abstraction, parallel computation is represented by data flowing along the edges of a graph. The vertices of the graph correspond to functions, which receive information from the inbound edges and output results to the outbound edges. Both Dryad (Isard et al., 2007) and Pig Latin (Olston et al., 2008) use the DAG abstraction. There also exists a large number of machine learning toolkits that are either designed for parallel computing purposes or have been extended to support multi-core architectures. For example, WEKA is a popular machine learning toolkit for which development started in 1997. In more recent years a number of projects have extended WEKA (Hall et al., 2009) to work in distributed and parallel computing settings. Weka-Parallel provides distributed cross-validation functionality (Celis & Musicant, 2002) and GridWeka provides distributed scoring and testing as well as cross-validation (Talia et al., 2005). Finally, there are also a number of efforts that utilize the processing power of the Graphics Processing Unit (GPU) for machine learning purposes.
GPUs often have significantly more computational power than Central Processing Units (CPUs), and this power can be used for machine learning as well. Implementations on GPUs have been reported to result in speed-ups of a factor of 70 to 200 compared to CPU implementations (Raina et al., 2009) (Weinman et al., 2011). The programming languages, toolkits and frameworks discussed allow many different configurations for carrying out large-scale machine learning. The ideal configuration to use is application dependent, since different applications will have different sets of requirements. However, one of the most popular frameworks used in recent years is Apache Hadoop, which is an open-source and free implementation of the MapReduce paradigm discussed above. Andraž Tori, one of our interviewees, identifies the simplicity of Hadoop and MapReduce as the main driver of its success. He explains that a Hadoop implementation can be outperformed in terms of computation time by, for example, an implementation using OpenMP, but Hadoop won in terms of popularity because it was easy enough to use (see section 2.4.1 item 3).
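The summation form exploited by (Chu et al., 2007) can be made concrete with a small sketch: for linear regression the sufficient statistics X^T X and X^T y decompose into per-partition sums, so each mapper computes partial sums over its data partition and a reducer adds them and solves the normal equations. The data, the number of partitions and the in-process "mappers" below are illustrative; a real job would run on a MapReduce cluster such as Hadoop/Mahout.

    # Minimal sketch of the summation form for MapReduce linear regression.
    import numpy as np


    def map_partition(X_part, y_part):
        # each mapper emits its partial sufficient statistics
        return X_part.T @ X_part, X_part.T @ y_part


    def reduce_partials(partials):
        # the reducer adds the partial sums and solves the normal equations
        XtX = sum(p[0] for p in partials)
        Xty = sum(p[1] for p in partials)
        return np.linalg.solve(XtX, Xty)


    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 3))
    true_w = np.array([2.0, -1.0, 0.5])
    y = X @ true_w + 0.01 * rng.normal(size=1000)

    partitions = np.array_split(np.arange(1000), 4)          # 4 "mappers"
    partials = [map_partition(X[idx], y[idx]) for idx in partitions]
    print(reduce_partials(partials))   # close to [2.0, -1.0, 0.5]

Because the combination step is a simple sum, the speed-up grows almost linearly with the number of partitions, which is exactly the property that makes this class of algorithms a good fit for MapReduce.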


Besides the practical, implementation-oriented approaches we have discussed so far, large-scale machine learning can also be addressed from a theoretical point of view. (Bottou & Bousquet, 2011) developed a theoretical framework which demonstrates that large-scale learning problems are subject to qualitatively different trade-offs than small-scale learning problems. They break optimization down into the minimization of three types of errors and show how different conditions impact each type of error. The result of their theoretical analysis is that approximate optimization can achieve better generalization in the case of large-scale learning. They support their analysis by empirically comparing the performance of stochastic gradient descent with standard gradient descent learning algorithms. Their results show that stochastic gradient descent can achieve a similar performance to standard gradient descent in significantly less time (Bottou & Bousquet, 2011).

The parallelized computation efforts described above make it possible to process large amounts of data. Besides the obvious application of applying existing methods to increasingly large datasets, the increase in computational power also leads to novel large-scale machine learning approaches. An example is the recent work of (Le et al., 2011), in which a dataset of ten million images was used to learn a face detector using only unlabelled data. The resulting face detector is robust to translation, rotation and scaling. Using the resulting features in an object recognition task resulted in a performance increase of 70% over the state of the art (Le et al., 2011). Utilizing large amounts of data to overcome the need for labelled training data could become an important trend: by using only unlabelled data, one of the biggest bottlenecks to the wide adoption of machine learning is bypassed. The use of unsupervised learning methods has its limitations, though, and it remains to be seen whether similar techniques can also be applied in other application domains of machine learning. François Bancilhon mentioned in his interview how machine learning is important for topic detection and document classification at Data Publica (see section 2.6 item 1).
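As a small illustration of the trade-off analysed by (Bottou & Bousquet, 2011) earlier in this section — and not a reproduction of their experiments — the sketch below compares batch gradient descent, which scans the full dataset at every step, with stochastic gradient descent, which updates after every example. The data and hyper-parameters are invented for the example; in large-scale settings the stochastic variant typically reaches a comparable error while touching far fewer data points per update.

```python
# Minimal sketch contrasting batch and stochastic gradient descent on a
# least-squares problem.  Illustrative only; data and step sizes are made up.
import numpy as np

rng = np.random.default_rng(1)
n, d = 50_000, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + rng.normal(scale=0.1, size=n)

def batch_gd(steps=100, lr=0.1):
    w = np.zeros(d)
    for _ in range(steps):                 # each step scans the full dataset
        grad = X.T @ (X @ w - y) / n
        w -= lr * grad
    return w

def sgd(epochs=2, lr=0.01):
    w = np.zeros(d)
    for _ in range(epochs):                # each update uses a single example
        for i in rng.permutation(n):
            grad = (X[i] @ w - y[i]) * X[i]
            w -= lr * grad
    return w

print("batch GD error:", np.linalg.norm(batch_gd() - w_true))
print("SGD error:     ", np.linalg.norm(sgd() - w_true))
```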

2.4. Research Claims, Requirements and Future Technological Capabilities for Big Data Analysis

Following our ongoing analysis of the sixteen interviews, we have thus far derived a number of research claims around Big Data analysis and Big Data in general, as well as a number of requirements and future technological capabilities which are seen as important. For each statement we include examples and counter-statements where they were outlined in the interview, together with the name of the source interviewee(s).

2.4.1 Simplicity

1. Linked Data technologies, e.g. RDF and SPARQL, are overly complex and have too steep a learning curve. These technologies seem to be over-designed and overly complicated, suitable for use only by experts. The only place where general users see these technologies is through venues such as Facebook. There seems to be a gap between what technologists envisioned and what users really need. Bridging this gap and enabling normal users to gain the benefits of semantics would be good. We need the "democratisation of semantic technologies". Hjalmar Gislason

2. The browser paradigm coupled with the Software-as-a-Service model has many benefits for delivering software, especially that the software is easily accessible. It is important to support legacy browsers – 85% of users use legacy versions of Internet Explorer – as well as newer technologies such as HTML5 and JavaScript. Dropping Flash a number of years ago aided the transition of our platform to iOS and Android. Browser delivery and deployment is especially good for Big Data visualisation. Hjalmar Gislason

3. Simplicity leads to adoptability. Hadoop succeeded because it is the easiest tool to use for developers. Hadoop changed the game in the area of Big Data. It did not succeed because it was the best but because it was the easiest to use (along with Hive). Hadoop managed to successfully balance dealing with complexity (processing Big Data) and simplicity for developers. JSON is popular in the Web context for similar reasons. (Editor's comment: the latter comment is also backed up by a recent comment by Tim Berners-Lee on the Semantic Web mailing list, "JSON is the new XML :-)". Also, "Worse is Better" was a notion advanced by Richard Gabriel to explain why systems that are limited, but simple to use, may be more appealing to developers and the market than the reverse.26) Andraž Tori & Ricardo Baeza-Yates

2.4.2 Users

1. Users perceive speed to be linked to quality. There are studies that show that users perceive interfaces which give faster results to be of higher quality. "If the system is not fast then it will not be used" is an axiom adopted at Google. Peter Mika & Alon Halevy

2. Focussing on how best to respond to users' requests is important, rather than, for example, focussing solely on how to answer structured queries or on how to structure data. Sometimes queries can be answered directly from unstructured data. (Editor's comment: this observation was also noted by Chris Welty reflecting on the success of WATSON.27) Alon Halevy

3. Rather than personalising based on user data we should personalise around tasks. This would remove some privacy issues since we would no longer rely on personal user data. Ricardo Baeza-Yates

2.4.3 Privacy

1. There is an inherent tension between Open Data and privacy – it may not be possible to truly have both. Differences between US and European privacy laws may lead to Europe lagging behind the US in Big Data deployment and usage. Overcoming this challenge will not be easy but may lead to opportunities in the European context. Andraž Tori
Example:
a. In Slovenia, anti-corruption policies have opened up data for all government business dealings with agencies, but the authorities retain control of the queries you can and cannot run because of privacy laws. So the data is technically open but not truly queryable in a Big Data sort of way. Andraž Tori
Counter:
b. The Open Data Institute and the Open Knowledge Foundation navigate the distinction between personal and open data in a way which respects the individual. Commercial companies should incentivize customers to agree to data usage, for example by offering to reduce charges. Bill Thompson

2. Anonymising data is hard. In query logs, for example, it is easy to identify users.28 Ricardo Baeza-Yates

2.5. Standardisation

1. Standardisation of quantitative data would be of great benefit for integration and search. There are currently multiple standards, so there is no unified way of accessing quantitative data. Hjalmar Gislason

2. Standard computation protocols. Having a standard for Big Data computation, for example around the volume of data per unit of computation similar to that found in the data mining community, allowing Big Data providers to compare their systems, would be beneficial. Ricardo Baeza-Yates

26 http://en.wikipedia.org/wiki/Worse_is_better
27 http://videolectures.net/eswc2012_welty_watson/
28 Richard Benjamins recommended the recent code of practice for anonymisation which came out of the UK's Information Commissioner's Office. See http://ico.org.uk/for_organisations/data_protection/topic_guides/anonymisation

2.5.1 Tools context

1. Ecosystems built around collections of tools have a significant impact. These are often driven by large companies where a technology is created to solve an internal problem and then given away. Apache Cassandra is an example of this: it was initially developed by Facebook to power their Inbox Search feature, which it did until 2010. The ecosystem around Hadoop is perhaps the best known. Andraž Tori

2. Social sciences, including psychology and economics, should be used to increase the effectiveness of analysis tools. Understanding how people behave and think aids in the design of online tools. Peter Mika

2.6. Highlighted Semantic Technologies

1. Machine learning is important for topic detection and document classification. This technology is used extensively at Data Publica. François Bancilhon

2. Semantic analysis is important for extracting entities. Data Publica uses semantic analysis to support named entity extraction and classification. François Bancilhon

3. Old technologies applied in a new context. We can see individual and combinations of old technologies being applied in the Big Data context. The difference is the scale (obviously) and the amount of heterogeneity encountered. Specifically, we see large semantically based datasets such as Freebase and a focus on the extraction of high-quality data from the Web. Besides scale, there is novelty in the fact that these technologies come together at the same time. Alon Halevy

4. Efficient indexing is important to support keyword search. Efficient indexing is fundamental as it affects online lookup. A consequence of this is the need to efficiently manage the storage of large collections of documents and their associated metadata. Peter Mika

5. Background knowledge aids in understanding unstructured data.
Example:
a. When a user types "brad pitt angelina jolie", from a linguistic point of view we may consider this as two named entities, but the user will be interested in their relationship. When a user types "brad pitt fight club", "brad pitt" is used as a disambiguation device to distinguish between clubs related to fighting and the well-known movie. Knowledge about movies is thus important. Peter Mika
b. Metadata and linked vocabularies are needed to support search over data on the Web. Raw data found on the Web, as provided by databases, lacks the necessary context to be understood automatically. For example, when one finds a set of numbers one will not know whether they signify ages, dates or social security numbers. You may find other numbers in neighbouring columns, but these do not help either, as being a neighbour need not signify any relationship. To alleviate this we have been creating a metadata standard with 25-30 fields to support semantic search, based on the Data Catalog Vocabulary standard from W3C29 and extended to be more compatible with schema.org.

29 https://dvcs.w3.org/hg/gld/raw-file/default/dcat/index.html
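To make the metadata discussion in item 5b more concrete, the sketch below shows how a dataset description along the lines of the W3C Data Catalog Vocabulary (DCAT) can be expressed using the open-source rdflib Python library. The dataset URI, publisher and keyword values are invented for illustration, and the actual 25-30 field standard mentioned in the interview extends DCAT beyond what is shown here.

```python
# Illustrative sketch of a DCAT-style dataset description using rdflib.
# The dataset, publisher and keywords are invented; the metadata standard
# referred to in the interview extends DCAT with further fields.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, DCTERMS

DCAT = Namespace("http://www.w3.org/ns/dcat#")

g = Graph()
g.bind("dcat", DCAT)
g.bind("dct", DCTERMS)

ds = URIRef("http://example.org/dataset/unemployment-rates")  # hypothetical URI
g.add((ds, RDF.type, DCAT.Dataset))
g.add((ds, DCTERMS.title, Literal("Monthly unemployment rates", lang="en")))
g.add((ds, DCTERMS.description,
       Literal("Columns contain percentages, not ages or identifiers.", lang="en")))
g.add((ds, DCTERMS.publisher, URIRef("http://example.org/org/statistics-office")))
g.add((ds, DCAT.keyword, Literal("employment")))
g.add((ds, DCAT.keyword, Literal("labour market")))

print(g.serialize(format="turtle"))
```

Even this minimal description resolves the ambiguity described in the example: a search engine reading it knows what the columns of numbers signify and who published them.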

2.6.1 Requested Technological Capabilities

1. Better access to the Web infrastructure and data. Companies such as Data Publica rely on snapshots of the Web (which is 60-70 terabytes) to support online services. Freely available versions of web snapshots exist, but more up-to-date versions are preferred. These do not necessarily have to be free, but cheap. The big web players such as Google and Facebook have access to Big Data related to searches and social networks which has important societal benefit. For example, dynamic social processes such as the spread of disease or rates of employment are often most accurately tracked by Google searches. The EU may want to prioritise European equivalents of these, analogous to the way the Chinese have cloned Google and Twitter. François Bancilhon

2. Provision by the public sector of held data for certain types of named entities.30 This would support many business applications. François Bancilhon
Example:
a. The names of the 9 million companies that exist in France. François Bancilhon

3. Hadoop++ – going beyond Hadoop to graphs with trillions of edges. The MapReduce parallelism metaphor does not scale well for graph data. Reducing the offline computation time for the analysis of large graphs – say from many months to a week – would be of interest. Ricardo Baeza-Yates

4. Stream data mining – is required to handle high volumes of stream data that may come from sensor networks or from the online activities of large numbers of users. This capability would allow us to provide highly adaptive and accurate personalisation. Ricardo Baeza-Yates

5. Scalability for concurrency on the Web – is very useful for handling large volumes of users at the same time. This involves high-speed networks to support fast communication and fast synchronisation. Ricardo Baeza-Yates

6. 'Good' data discovery. At the Open Data Institute people often ask: where can we get the data about X? Where can we get information about Y? It is hard to find the data; found data is often out of date and not in the right format. We can learn valuable lessons from how document discovery evolved on the Web. Early on we had a registry – all of the Web could be listed on a single web page; then users and organisations had their own lists; then lists of lists. Later, Google came to dominate by providing metrics on how documents link to other documents. If we draw an analogy to the data area, we are currently in the registry era. We need crawlers to find Big Data sets, good data set metadata on contents, links between related data sets and a relevant data set ranking mechanism (analogous to PageRank), as sketched after this list. Jeni Tennison

7. Dynamic orchestration of services in multi-server and cloud contexts. Most platforms today are not suitable for the cloud, and keeping data consistent between different data stores is not easy. Andraž Tori

8. Dealing with very broad and very specific data is a good topic for more research. A neat feature of information extraction from the Web is that the Web is about everything, so coverage is broad. Pre-web, we used to focus on specific domains when building our databases and knowledge bases. We can no longer do that in the context of the Web. The whole notion of conceptualising the domain is altered – now the domain is everything in the world. The benefit is that you get a lot of breadth, and the research challenge is how one can go deeper into a domain whilst keeping the broad context. Alon Halevy

30 In the UK there has been criticism from the Open Data Institute, Sir Tim Berners-Lee and Prof. Nigel Shadbolt of the decision to include UK postcode data as part of the sale of Royal Mail: http://www.techweekeurope.co.uk/news/open-data-institute-post-code-database-privatisation-royal-mail-government-113817
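As a sketch of the dataset-ranking idea in item 6 above, the following code runs a plain PageRank-style power iteration over a small, invented graph of links between datasets. A real discovery service would derive such a graph by crawling dataset metadata; the damping factor and iteration count are illustrative defaults rather than recommended settings.

```python
# Illustrative PageRank-style ranking over an invented graph of dataset links.
def rank_datasets(links, damping=0.85, iterations=50):
    """links: dict mapping a dataset to the datasets it references."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for source in nodes:
            targets = links.get(source, [])
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:  # dangling dataset: spread its rank evenly
                for n in nodes:
                    new_rank[n] += damping * rank[source] / len(nodes)
        rank = new_rank
    return sorted(rank.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical links between open datasets (who references whom).
links = {
    "bus-stops": ["postcodes"],
    "crime-stats": ["postcodes", "census"],
    "census": ["postcodes"],
    "postcodes": [],
}
for dataset, score in rank_datasets(links):
    print(f"{dataset}: {score:.3f}")   # heavily referenced datasets rank highest
```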

2.6.2 Requested Inference Capabilities

1. The area related to incomplete and approximate reasoning is worth investigating. When we are not able to carry out full reasoning (e.g. when the inference space is very large) we can elect to compute only a subset of the inferences. Building reasoners that can decide which inferences are the most important or urgent would be highly valuable. Frank van Harmelen

2. The trade-off between online (backward) reasoning and offline (forward) reasoning needs more investigation. Do you do all the reasoning at query time? The advantage is that you only need to carry out the required reasoning, whereas the disadvantage is that the user is kept waiting. Or do you do all the reasoning offline? The advantage here is that the time constraints are not as severe, whereas the disadvantage is that a lot more inference needs to be carried out. Frank van Harmelen

3. Parallelisation of inference is a key research challenge. We need to investigate which types of inference are amenable to parallelisation and which are not. Frank van Harmelen
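The trade-off in item 2 can be illustrated with a toy example of RDFS-style subclass reasoning: forward (offline) reasoning materialises the transitive closure of subClassOf in advance so that queries become lookups, while backward (online) reasoning searches the hierarchy only when a query arrives. The tiny class hierarchy below is invented for illustration; real knowledge bases contain millions of such statements, which is exactly what makes the choice consequential.

```python
# Toy illustration of offline (forward) versus online (backward) reasoning
# over an invented subClassOf hierarchy.
SUBCLASS_OF = {
    "Sedan": ["Car"],
    "Car": ["Vehicle"],
    "Vehicle": ["Thing"],
}

def forward_closure(edges):
    """Offline: materialise all (class, superclass) pairs once, up front."""
    closure = set()
    for cls in edges:
        stack = list(edges.get(cls, []))
        while stack:
            parent = stack.pop()
            if (cls, parent) not in closure:
                closure.add((cls, parent))
                stack.extend(edges.get(parent, []))
    return closure

def backward_is_subclass(cls, target, edges):
    """Online: search the hierarchy at query time, only for the asked question."""
    if cls == target:
        return True
    return any(backward_is_subclass(p, target, edges) for p in edges.get(cls, []))

closure = forward_closure(SUBCLASS_OF)                      # cost paid before any query
print(("Sedan", "Thing") in closure)                        # True, answered by lookup
print(backward_is_subclass("Sedan", "Thing", SUBCLASS_OF))  # True, computed on demand
```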

2.6.3 Business Models

1. Cross-sectorial uses of Big Data will open up new business opportunities. Richard Benjamins
Example:
a. O2 UK, together with Telefónica Digital, recently launched a service called Telefónica Dynamic Insights.31 This service takes mobile data from across the UK, in particular location data generated when customers make calls, send texts and move from one mast to another. This data is mapped and repurposed for the retail industry. The data is first anonymised, aggregated and placed in the Cloud. Then analytics are run which calculate where people live, where they work and where they are in transit. If this data is combined with anonymised CRM data, one can then determine the type of people who pass by a particular shop at a specific time-point. We can also calculate the type of people who visit a shop, where they live and where else they shop. This is called catchment. This service supports real estate management for retailers and contrasts well with present practice: what retailers do today is hire students with clickers to count the number of people who walk past the shop, leading to data which is far less detailed. The service is thus solving an existing problem in a new way. The service can be run on a weekly or daily basis and provides a completely new business opportunity. Instead of retail, the service could be run in other sectors; for example, within the public sector we could analyse who walks past an underground station. Combining mobile data with preference data could open up new propositions for existing and new industries. This example is a taste of what is to come, the sum of which will definitely improve the competitiveness of European industry. Richard Benjamins

2. Freemium models – can be an important part of a business strategy. Students who learn about specific platforms at school and university will be predisposed to use pay-for versions of the tool later in life. A layered model can aid in this. Hjalmar Gislason

3. Big Data discovery will see the emergence of new successful companies. A number of small start-ups will emerge and grow based on the usage of Big Data. A discovery mechanism that can only work with good-quality data will drive data owners to publish their data in a better way, analogous to the way that SEO drives the quality of the current Web. Jeni Tennison

4. The EU needs to ensure that Big Data meets the needs of its citizens. Large companies such as Google and Facebook are working on Big Data, and they will focus their energies on certain areas and not on others. EU involvement could support a Big Data ecosystem which encourages a variety of small, medium and large players, where regulation is effective and data is open. Bill Thompson
Example:
a. By analogy, Big Pharma does not invest in malaria because this disease primarily affects the poor. Bill Thompson
b. In theory the BBC could use iPlayer data to calculate who is watching what. This data could be used in-house, by government, or sold for commercial use. Because of the way it is funded, the BBC does not collect this data. Bill Thompson

31 http://dynamicinsights.telefonica.com/


5. Open public sector data supported by language technologies and lightweight semantics will provide a platform for new business. We are seeing a growth in public datasets from governments, from a low number in 2009 to many thousands today. We also see a growing number of third-party applications based on language and semantic technologies in a diverse set of areas, e.g. health, education, crime data, pharmaceutical and agricultural data. Public sector data is also released for other reasons: to lower costs, for transparency and for service delivery. Jim Hendler

2.6.4 Impact

1. Bringing the right data to the right people at the right time will have a significant impact. There is far more data out there than most people realise, and this data could help us to make better decisions, identify threats and see opportunities. A lot of the data needed already exists, but it is not easy to find and use. Solving this issue will help businesses, policy makers and end users in decision-making. Just making more of the world's data available at people's fingertips will have a big effect overall. There will be a significant impact for this item in emergency situations such as earthquakes and other natural disasters. Hjalmar Gislason & Alon Halevy

2. Solving significant European economic and societal problems. François Bancilhon
Example:
a. In the US 45% of fruits and vegetables reach the plate of the consumer, and in Europe 55% reaches the plate – close to half of what we produce is lost. This is a Big Data problem: collecting data over the whole food distribution supply chain and identifying losses and bottlenecks in the system would have an enormous impact. If implemented, we would have a better handle on prices and a fairer distribution of wealth amongst all the agents in the food supply chain. Big Data technology is important, and so is access to the right data and data sources. François Bancilhon

3. Solving significant problems globally. The EU should look to produce solutions that solve global problems rather than focus solely on problems that affect the EU. Bill Thompson
Example:
a. Clean water wells are a very important issue in villages across many parts of Africa. The decisions on where to locate wells are based on spreadsheets, which may contain data that has not been updated for two years. Given that new wells can stop working after six months, this causes unnecessary hardship and more. Alon Halevy

4. Communities and Big Data will be involved in new and interesting relationships. Alon Halevy & Usman Haque
Examples:
a. Rise of data journalists – who are able to write interesting articles based on data uploaded by the public to the Google Fusion Tables infrastructure. The Guardian journalist Simon Rogers won the Best UK Internet Journalist award for his work.32 A feature of journalistic take-up is that data blogs have a high dissemination impact. Alon Halevy
b. Community engagement in local political issues. Two months after the school massacre in Connecticut,33 local citizens started looking at data related to gun permit applications in two locations and exposed this on a map.34 This led to a huge discussion on the related issues. Alon Halevy
c. Engagement through community data collection and analysis. The company COSM (formerly Pachube) has been driving a number of community-led efforts. The main idea behind these is that the way data is collected introduces specific slants on how the data can be interpreted and used. Getting communities involved has various benefits: the number of data collection points can be dramatically increased; communities will often create bespoke tools for the particular situation and to handle any problems in data collection; and citizen engagement is increased significantly. In one example the company crowd-sourced real-time radiation monitoring in Japan following the problems with the reactors in Fukushima. There are now hundreds of radiation-related feeds from Japan on Pachube, monitoring conditions in real-time and underpinning more than half a dozen incredibly valuable applications built by people around the world. These combine 'official' data, 'unofficial' official data, and also real-time networked Geiger counter measurements contributed by concerned citizens. Usman Haque
d. Crowdsourcing to improve data accuracy. Through crowdsourcing, the precision of released UK Government data on the location of bus stops was dramatically increased. Jim Hendler

5. Handle the growth of the Internet. Big Data will enable us to handle the growth of the Internet as more users come online and the pressures for increased data concurrency grow. In short, Big Data will ensure the survival of the Web and Internet as it is today. Ricardo Baeza-Yates

6. A secondary impact on society will be transferred through fields such as smart cities and Grid computing. Big Data will have a secondary impact on society and business as it supports these technologies. Ricardo Baeza-Yates

7. Impact on academia due to the availability of large datasets. Social sciences will be invigorated by the abundance of data, which can be crunched to validate theories. Open data will support the repeatability of research results. Studying aspects of Big Data as a technical and social phenomenon, for example privacy and latency, could be of interest to the scientific community. Andraž Tori

8. Data visualisation – will move data from something that is of interest to 0.1% of the population to 25% of the population. One image has the power to alter your perception of the world. Hjalmar Gislason
Example:
a. Recent weeks have seen a lot of discussion on wealth distribution in the US based on a data visualisation video.35 Hjalmar Gislason

9. Intelligent agents will have an impact in scenarios where users cannot use their hands.
Example:
a. When driving or when at the gym. Peter Mika

10. By 2015 there will be over 10 million datasets available on the Web. Jim Hendler

32 http://www.oii.ox.ac.uk/news/?id=576
33 http://en.wikipedia.org/wiki/Sandy_Hook_Elementary_School_shooting
34 See https://www.google.com/fusiontables/DataSource?docid=1ceMXdjAkCDLa4o5boKyHFCkpy2d11XSwDehyBsQ#map:id=3
35 http://newsfeed.time.com/2013/03/04/watch-video-on-wealth-inequality-in-the-u-s/

2.7. Summary

In this First Draft of the Technical White Paper for Big Data Analysis we have outlined the process that we are using which is based upon the following:

• State-of-the-art analysis – where we have covered the areas of Linked Data and semantic approaches to Big Data analysis, recommendation systems and personal data, stream data processing and large-scale reasoning.

• Online interviews – using the FlashMeeting tool with sixteen interviewees from industry and academia. The interviews have been used to support dissemination and learning through the use of multiple replay interfaces.

• Interview analysis – using the Evidence Hub collective intelligence tool. This tool partitions statements into Key Challenges, Requirements, Issues, Technological Capabilities, Research Claims, Evidence and Resources. The interviews have now been partially analysed using this tool, and segments of the online interviews were used as evidence for the various statements. Our intention is that the interview analysis can serve as a living resource for the community to view and discuss the issues around Big Data.

The results from the above interview analysis have led to an initial set of research claims, requirements and future technological capabilities for Big Data analysis. These results have been categorised into the following themes: simplicity, users, privacy, standardisation, tools context, highlighted semantic technologies, requested capabilities, business models and impact. Future work will focus on continuing with the online interviews and the analysis within the Evidence Hub. We will also continue to evolve the automatic transcription and tag extraction tool.

2.8. Acknowledgements

The authors thank Iker Larizgoitia, Michael Rogger, Ioannis Stavrakantonakis, Amy Strub, Anna Fensel and Ioan Toma (all UIBK) for useful inputs.

2.9. References

Anicic, D., Fodor, P., Rudolph, S., Stühmer, R., Stojanovic, N., Studer, R. (2011). ETALIS: Rule-Based Reasoning in Event Processing. In: Reasoning in Event-Based Distributed Systems (pp. 99-124), Studies in Computational Intelligence, vol. 347. Springer.
Barbieri, D.F., Braga, D., Ceri, S., Della Valle, E., Grossniklaus, M. (2010). C-SPARQL: a Continuous Query Language for RDF Data Streams. Int. J. Semantic Computing 4(1), pp. 3-25.
Bergmann, R., Gil, Y. (2011). Retrieval of semantic workflows with knowledge intensive similarity measures. Case-Based Reasoning Research and Development (pp. 17-31).
Bhaduri, K., Das, K., Liu, K., Kargupta, H., and Ryan, J. (2011). Distributed data mining bibliography. http://www.cs.umbc.edu/~hillol/DDMBIB/
Bottou, L., and Bousquet, O. (2011). The Tradeoffs of Large-Scale Learning. Optimization for Machine Learning, p. 351.
Carbonell, J., & Goldstein, J. (1998). The use of MMR, diversity-based reranking for reordering documents and producing summaries. SIGIR (pp. 335-336). Melbourne, Australia: ACM.
Castañeda, L., Tomadaki, E. and Scott, P. (2008). New objects in formal professional learning: replaying meetings to learn. UPGRADE, The European Journal for the Informatics Professional, 9(3), pp. 69-75. http://www.cepis.org/upgrade/files/2008-III-castaneda.pdf
Celis, S., and Musicant, D. R. (2002). Weka-parallel: machine learning in parallel. Technical Report.
Cheng, G., Ge, W., & Qu, Y. (2008). Falcons: searching and browsing entities on the semantic web. Proceedings of the 17th International Conference on World Wide Web (pp. 1101-1102). Beijing, China: ACM.
Cheng, G., Tran, T., & Qu, Y. (2011). RELIN: Relatedness and Informativeness-Based Centrality for Entity Summarization. In: Aroyo, L. et al. (eds.), ISWC 2011, Part I, LNCS vol. 7031 (pp. 114-129). Springer, Heidelberg.
Chu, C.-T., et al. (2007). Map-Reduce for machine learning on multicore. Advances in Neural Information Processing Systems 19, p. 281.
De Liddo, A., Buckingham Shum, S., McAndrew, P. and Farrow, R. (2012). The Evidence Hub: a collective intelligence tool for evidence based policy. Presented at Cambridge 2012: Joint OER12 and OpenCourseWare Consortium Global 2012 Conference, 16-18 April 2012, Cambridge, UK. Eprint: http://oro.open.ac.uk/33253
Demartini, G., Saad Missen, M. M., Blanco, R., & Zaragoza, H. (2010). Entity Summarization of News Articles. SIGIR. Geneva, Switzerland: ACM.
Dorn, C., Taylor, R., Dustdar, S. (2012). Flexible social workflows: Collaborations as human architecture. IEEE Internet Computing 16(2), pp. 72-77.
Fensel, A., Fensel, D., Leiter, B., Thalhammer, A. (2012). Effective and Efficient Online Communication: The Channel Model. In Proceedings of the International Conference on Data Technologies and Applications (DATA'12), pp. 209-215, SciTePress, Rome, Italy, 25-27 July 2012.
Fensel, D. (2007). Unifying Reasoning and Search to Web Scale. IEEE Internet Computing, 11(2).
Fensel, D., van Harmelen, F., et al. (2008). Towards LarKC: a Platform for Web-scale Reasoning. Los Alamitos, CA, USA: IEEE Computer Society Press.
Fuentes-Fernandez, R., Gomez-Sanz, J.J., Pavon, J. (2012). User-oriented analysis of interactions in online social networks. IEEE Intelligent Systems 27, pp. 18-25.
Graf, H. P., Cosatto, E., Bottou, L., Dourdanovic, I., and Vapnik, V. (2004). Parallel support vector machines: The cascade SVM. Advances in Neural Information Processing Systems 17, pp. 521-528.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I. H. (2009). The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter 11(1), pp. 10-18.
Isard, M., Budiu, M., Yu, Y., Birrell, A., and Fetterly, D. (2007). Dryad: distributed data-parallel programs from sequential building blocks. ACM SIGOPS Operating Systems Review 41(3), pp. 59-72.
Komazec, S., Cerri, D., and Fensel, D. (2012). Sparkwave: continuous schema-enhanced pattern matching over RDF data streams. In Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems (DEBS '12), pp. 58-68. ACM, New York, NY, USA. DOI=10.1145/2335484.2335491
Le, Q. V., et al. (2011). Building high-level features using large-scale unsupervised learning. International Conference on Machine Learning.
Liptchinsky, V., Khazankin, R., Truong, H., Dustdar, S. (2012). A novel approach to modeling context-aware and social collaboration processes. In: Advanced Information Systems Engineering (pp. 565-580). Springer.
Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., and Hellerstein, J. M. (2010). GraphLab: A New Framework for Parallel Machine Learning. The 26th Conference on Uncertainty in Artificial Intelligence (UAI 2010), Catalina Island, California, July 8-11, 2010.
Luberg, A., Järv, P., & Tammet, T. (2012). Information extraction for a tourist recommender system. Information and Communication Technologies in Tourism 2012.
Mika, P. (2005). Ontologies are us: A unified model of social networks and semantics. The Semantic Web - ISWC (pp. 522-536).
Ognyanoff, D., Kiryakov, A., Velkov, R., and Yankova, M. (2007). D2.6.3 A scalable repository for massive semantic annotation. Technical report, SEKT.
Olston, C., Reed, B., Srivastava, U., Kumar, R., and Tomkins, A. (2008). Pig Latin: a not-so-foreign language for data processing. Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 1099-1110.
Parundekar, R., & Oguchi, K. (2012). Learning Driver Preferences of POIs Using a Semantic Web Knowledge System. The Semantic Web: Research and Applications, pp. 703-717.
Raina, R., Madhavan, A., and Ng, A. Y. (2009). Large-scale Deep Unsupervised Learning using Graphics Processors. Proceedings of the 26th Annual International Conference on Machine Learning, pp. 873-880.
Rowe, M., Angeletou, S., Alani, H. (2011). Predicting discussions on the social semantic web. The Semantic Web: Research and Applications (pp. 405-420).
Scott, P., Castaneda, L., Quick, K.A. and Linney, J. (2009). Synchronous symmetrical support: a naturalistic study of live online peer-to-peer learning via software videoconferencing. Interactive Learning Environments, 17(2), pp. 119-134. http://dx.doi.org/doi:10.1080/10494820701794730
Singhal, A. (2012, May). Introducing the Knowledge Graph. Retrieved from the Google Blog: http://googleblog.blogspot.com/2012/05/introducing-knowledge-graphthings-
Sivashanmugam, K., Verma, K., Sheth, A., Miller, J. (2003). Adding semantics to web services standards. In: Proceedings of the International Conference on Web Services (pp. 395-401).
Stavrakantonakis, I. (2013a). Personal Data and User Modelling in Tourism. Information and Communication Technologies in Tourism (pp. 507-51).
Stavrakantonakis, I. (2013b). Semantically assisted Workflow Patterns for the Social Web. Proceedings of the 10th Extended Semantic Web Conference (ESWC 2013) PhD Symposium track (pp. 692-696).
Talia, D., Trunfio, P., and Verta, O. (2005). Weka4WS: a WSRF-enabled Weka toolkit for distributed data mining on grids. Knowledge Discovery in Databases: PKDD 2005, pp. 309-320.
Thalhammer, A., Knuth, M., & Sack, H. (2012). Evaluating Entity Summarization Using a Game-Based Ground Truth. International Semantic Web Conference (2) (pp. 350-361). Boston, USA: Springer.
Thalhammer, A., Toma, I., Roa-Valverde, A. J., & Fensel, D. (2012). Leveraging Usage Data for Linked Data Movie Entity Summarization. 2nd International Workshop on Usage Analysis and the Web of Data (USEWOD2012) at the 21st International World Wide Web Conference (WWW2012). Lyon, France: arXiv.org.
Tomadaki, E., Quick, K. A. and Scott, P. (2008). Videoconferencing in open learning. Journal of Interactive Media in Education. http://jime.open.ac.uk/jime/article/view/2008-8
Uren, V., Cimiano, P., Iria, J., Handschuh, S., Vargas-Vera, M., Motta, E., Ciravegna, F. (2006). Semantic annotation for knowledge management: Requirements and a survey of the state of the art. Web Semantics: Science, Services and Agents on the World Wide Web.
Van Der Aalst, W., Ter Hofstede, A., Kiepuszewski, B., Barros, A. (2003). Workflow patterns. Distributed and Parallel Databases 14(1), pp. 5-51.
Weinman, J. J., Lidaka, A., and Aggarwal, S. (2011). GPU Computing Gems Emerald Edition 1, p. 277.
Wieland, M., Kopp, O., Nicklas, D., Leymann, F. (2007). Towards context-aware workflows. In: CAiSE (pp. 11-15).
Zaino, J. (2012, June). Where To Eat? Let Neural Network Computing Help You Decide. Retrieved from http://semanticweb.com/where-to-eat-let-neural-network-computing-help-you-decide_b30218


Annex 1. The Background and Question Document sent to interviewees

Big Data Analysis Interview Questions
John Domingue, STI International

Annex a. Background

We are concerned here with Big Data, which is typically characterized by the 3 Vs (volume, velocity and variety) that require new forms of processing to enable enhanced decision making, insight discovery and process optimization. BIG (http://www.big-project.eu/) is an EU coordination action which will provide a roadmap for Big Data. The work is split into various groups, as shown below, over industrial sectors and technical areas.

Figure 2-2: The Sector and Technical Working Groups in BIG.

This interview is related to the Data Analysis area, which we define as follows. Data analysis refers to the activity of exploring, cleaning, transforming and modelling data with the goal of highlighting the relevant data, synthesizing and extracting useful hidden information. Some related areas like data mining, business intelligence or machine learning appear as techniques for this purpose; however, we focus on large-scale data analysis, for which different techniques are needed: distributed computing, massive parallel processing, stream data processing, sentiment analysis and exploratory analytics are some of them. Successful implementation of those solutions will have to be accompanied by effective policies for backup, recovery and security, being aware of potential network limitations, solving real-time aspects in distributed environments, enhancing solutions that provide not only processing but also storage, and fostering interoperability as much as possible.

The specific areas that we want to target are:
• Semantic approaches to data analysis
• Stream data processing
• Large-scale reasoning
• Data harvesting
• Topic detection
• Named entity recognition
• Co-reference and disambiguation
• Relation extraction
• Sentiment detection and opinion mining
• Social annotation
• Analytical data management

Annex b. Questions

1. Could you please briefly introduce yourself and give a little of your background related to Big Data and Big Data analysis?
2. We would like you to describe new or upcoming technologies which you think are relevant to Big Data analysis in the following way:
a. Briefly describe the technology.
b. Describe its relationship to Big Data analysis.
c. What is novel about the technology (above the previous state of the art)?
d. What are the main benefits/capabilities of the technology?
e. What is the scope (limits) of the technology in terms of performance or domain of application?
f. What are the underlying design principles behind the technology?
g. What are the main performance metrics (e.g. speed, cost, size) for the technology? (E.g. if I wanted to select from several implementations.)
h. What are the trade-offs (e.g. performance vs. richness of representation) associated with the technology?
i. What are the component technologies?
j. What technologies does the specific technology depend on?
k. What (additional) technologies does your core business model depend on? (Optional)
l. Are there any technologies missing? Any technologies that, if you had them, would really improve your core business/work?
m. What impact do you think the technology will have along the dimensions of:
i. Academic
ii. Technical
iii. Societal
iv. Economic (which?)
v. Policy
vi. Business
n. Who are the main actors/players in terms of researchers, vendors or users of the technology?
o. Are there any other people that we should talk to about the technology?
p. Are there any other resources related to the technology that we should consider?

Annex c. Online Interview Tools

All the interviews were conducted online using the FlashMeeting tool1 (Castañeda et al., 2008; Tomadaki et al., 2008; Scott et al., 2009). This tool, designed for running large online meetings, was chosen for a number of reasons, specifically:

• Ease-of-use – no software/hardware installation required.

• Ease-of-editing and analysis – the final interviews are segmented into speaker turns (only one person can speak at a time), easing the task of segmentation and analysis.

• Ease-of-dissemination – FlashMeeting has inbuilt tools to generate versions of different sizes with a selection of controls, including audio-only MP3. An HTML5 generator is currently under construction.

1 http://flashmeeting.open.ac.uk/home.html


Figure 2-3: A screen snapshot of the FlashMeeting online meeting interface.

In Figure 2-3 above we can see the FlashMeeting meeting interface. In it we can see that Peter is currently broadcasting and that Jon and Kevin have requested to speak next, with Jon being first in the queue. Once Peter hits the 'STOP BROADCASTING' button, Jon will start broadcasting to the group. The interface contains panels to allow text chat (as seen in Figure 2-3), voting and the sharing of URLs.


Figure 2-4: A screen snapshot of the standard FlashMeeting replay interface showing an interview with Hjalmar Gislason, founder of DataMarket Inc.

Figure 2-4 shows the standard replay panel for an interview with Hjalmar Gislason, founder of DataMarket Inc., and Edward Farmer, Director of Communications. The replay can be navigated in several ways:

• Through the standard begin, end, fast-forward, fast-backward and play/pause buttons below the video display.

• Through the panel on the top right, by simply selecting a row. Currently the first section "06:35 Hjalmar" is being played.

• By clicking anywhere on the grey time panel on the lower right labelled "06:35 53:34".

• By clicking on one of the coloured horizontal bars in the lower right. Each speaker is allocated a colour (Hjalmar blue, John red and Edward purple). This panel is also useful for analysis, as one can immediately see where interviewees have made significant statements. We can see in Figure 2-4 that Edward only made two short statements ("Hello" and "Goodbye") at the start and end of the interview. The names displayed in the lower left corner can be used to alter the playback volume for any individual speaker.

Annex d. Supporting Dissemination and Learning

To support dissemination, FlashMeeting has translation facilities to generate replays in a number of formats, two of which are shown in Figure 2-5 and Figure 2-6 below. The interface shown in Figure 2-5 allows users to: play/pause, fast-forward and fast-backward (using the video-style control buttons); navigate directly to specific segments (using the panel on the right); and control the volume of any speaker (using the slider just above the video-style control buttons). In Figure 2-6 the speaker volume control has been removed and the right navigation bar is hidden under a menu button in the bottom right corner of the interface. An MP3 file for each interview is also generated. We have used the interviews for dissemination in a number of ways:

• Each interview has been blogged about (see http://www.big-project.eu/blog); each post includes a short description and links to the interview in the various formats.

• We have also uploaded all the interviews as part of a set of learning materials for the forthcoming ESWC Summer School1 to support students as they prepare for the school.

• Three of our interviews have been embedded in the web site of the European Data Forum for the overlapping Keynotes and Invited Speakers (Richard Benjamins, Hjalmar Gislason and Bill Thompson).2

Figure 2-5: A screen snapshot of the simplified FlashMeeting interface with a navigation bar, for the interview with Hjalmar Gislason of DataMarket Inc.

Figure 2-6: A screen snapshot of the simplest FlashMeeting interface, where the navigation bar is hidden under the menu in the lower right. The snapshot shows the interview with Hjalmar Gislason of DataMarket Inc.

1 See http://summerschool2013.eswc-conferences.org/learning-materials/interviews/
2 See http://2013.data-forum.eu/program/keynotes and http://2013.data-forum.eu/program/invited-speakers


Annex e. Online Interview Analysis

Annex i. BIG Project Evidence Hub

As with the majority of EU projects, BIG comprises a number of researchers spread geographically. Because of this we elected to use an online 'Collective Intelligence' tool which allows us to share evidence-based analysis of the interviews we have conducted and online resources that we have acquired. The Evidence Hub (De Liddo et al., 2012) is a tool which scaffolds discourse on a topic in a number of distinct ways. In the early spring of 2013 a specific version of the tool was set up at http://big.evidence-hub.net. As well as supporting analysis, our intention is that the BIG Project Evidence Hub will serve as a living resource supporting the wider community in sharing resources and coming to a consensus on the future directions for Big Data. A list of all the interviews within the BIG Evidence Hub can be found at http://big.evidence-hub.net/index.php?#web-list.

We can see in Figure 2-7 five main areas. At the very top, on the left we have the project logo and the title of the Evidence Hub, and on the right we have links to carry out various forms of administration. We then have a set of tabs ('Home', 'Key Challenges', etc.) which correspond to the types of artefacts that we can add. Within the selected 'Home' tab we can see at the top left an introduction to the system, including a generic video guide; on the top right, links to further information; and in the middle, the themes of the Hub, which mirror BIG's themes for the Sector Forums and Technical Working Groups. At the bottom of the Home tab we are able to bring up a map of the included projects and organisations and an explanation of the different categories and how these relate.

Figure 2-7: A screen snapshot of the front page of the BIG Project Evidence Hub.

As can be seen in Figure 2-7 the flow of connections is as follows:


• Key Challenges – represent the main challenges for the domain. For BIG these are:
  • How will Big Data be created and used?
  • What new or existing business models will Big Data support?
  • What are the main barriers that could inhibit the take-up and use of Big Data?
  • What are the industrial and sector requirements for Big Data?
  • What are the new and emerging Big Data technologies?

• Requirements/Issues – are derived from the Key Challenges and relate to industrial and sector requirements raised, and also to scientific, policy, economic or societal issues.

• Technological Capabilities – can provide solutions to requirements. These capabilities may exist already or may require further research.

• Research Claims – are statements made by individuals or organisations that relate to requirements or issues.

• Evidence – may provide support for, or be counter to, a Technological Capability or Research Claim.

• Resource – other available online resources which can provide additional support for an argument.

The tool also allows one to add people, organisations and projects. Where these are supplied with addresses they appear within the Map Display. We have begun to use the Evidence Hub to analyse the 16 interviews that we have conducted so far. Each interview has been added as a Resource, and the 16 interviewees and their affiliated organisations have been added. For each interview, the first segment of the video, where the interviewee introduces himself/herself, is included within their description. In Figure 2-8 we can see Ricardo Baeza-Yates introducing himself within the Map Display. From the display we can see that he is located in Barcelona and that other interviewees are based in London, Madrid and Germany.

Figure 2-8: A screen snapshot where we can see Ricardo Baeza-Yates introducing himself within the Evidence Hub Map Display.


Further, in Figure 2-9 we can see part of the argumentation structure held within the BIG Evidence Hub, in this case the part related to the Issue "Communities and Big Data will have new interesting relationships". In the left panel we can see that this is related to the Key Challenge "How will Big Data be created and used?" and to the Research Claim "Communities will be involved in Big Data Collection and Analysis". This Research Claim is supported by three pieces of evidence, which are segments from the interviews with Alon Halevy, a Research Scientist at Google, and Usman Haque, the founder of Pachube. The segments are available to be played if desired.

Figure 2-9: A screen snapshot showing part of the argumentation structure for the issue “Communities and Big Data will have new interesting relationships”

Annex f. Automatic Transcription and Analysis

Figure 2-10 below shows the interface of a preliminary tool that we have constructed to automate parts of the interview analysis. With the interface the user can select an interview in the left panel (the interview with Usman Haque has been selected); the automatically generated transcript then appears on the right, with the extracted tags in the lower window. To support search, the user can classify the tags. We can see in Figure 2-10 that the tags "Analysis", "Citizenship" and "Community" have been classified as Central; "Accountability", "Air Quality", "Boston" and "Climate Change" have been classified as Relevant; and "Artist", "Bit" and "Characteristic" as Neutral. Technically, the audio-to-text conversion is carried out using a number of publicly available speech-to-text systems and the tags are extracted using DBpedia Spotlight. We aim to have this tool fully functional in the near future, supporting search over interviews, and at a later date to integrate it into the FlashMeeting tool.
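For readers who wish to experiment with the tag-extraction step, the sketch below posts a transcript fragment to a public DBpedia Spotlight annotation endpoint using the Python requests library. The endpoint URL, the confidence threshold and the response handling are assumptions about the publicly hosted service and may differ from the exact configuration used in our tool.

```python
# Sketch of tag extraction with DBpedia Spotlight; the endpoint URL and the
# confidence threshold are assumptions and may need adjusting.
import requests

def extract_tags(text, confidence=0.5):
    response = requests.get(
        "https://api.dbpedia-spotlight.org/en/annotate",  # assumed public endpoint
        params={"text": text, "confidence": confidence},
        headers={"Accept": "application/json"},
        timeout=30,
    )
    response.raise_for_status()
    # The service returns matched surface forms with linked DBpedia resources.
    return [(r["@surfaceForm"], r["@URI"])
            for r in response.json().get("Resources", [])]

if __name__ == "__main__":
    snippet = ("Communities crowd-sourced real-time radiation monitoring in Japan "
               "after the problems with the reactors in Fukushima.")
    for surface_form, uri in extract_tags(snippet):
        print(surface_form, "->", uri)
```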


Figure 2-10: A screen snapshot of the automatic audio transcription and analysis tool.


3. Data Curation

3.1. Introduction

The Data Curation Working Group (WG) is responsible for investigating curation approaches for how information is managed, preserved and reused in organizations. Specifically, the WG will document various aspects of data curation from a Big Data perspective, which include but are not limited to:

• Data creation or collection processes
• Data quality, cleaning, and integration
• Data provenance and archiving
• Data storage technologies
• Curation workflows
• Data curation roles
• Standardisation approaches
• Software tools
• Hardware requirements
• Organisation approaches
• Best practices of data curation

In this section, we collect information about data curation, existing approaches, platforms, challenges and trends of particular interest to Big Data scenarios.

3.1.1 Objectives

For the purposes of this white paper, we provide a high-level overview of some key questions that must be addressed to set up a curation process within an organization. The starting point is the identification of the use case for creating a curated dataset. Typically a curation effort will have a number of associated motivations, including improving accessibility or data quality, or repurposing data for a specific use. Once the goal is clearly established, one can start to define the curation process. There is no single process for curating data and there are many ways to set up a data curation effort. The major factors influencing the design of a curation approach include:

• Quantity of data to be curated (including new and legacy data)
• Rate of change of the data
• Amount of effort required to curate the data
• Availability of experts

These factors determine the amount of work required to curate a dataset. Big Data environments largely impact the major factors of the curation process. When dealing with a small and infrequently changing quantity of data (<1,000 records) that requires minimal curation effort per record (minutes), curation can easily be undertaken by an individual. However, once the number of records enters the thousands, a curation group/department with a formal process has to be considered. Curation groups can deal with large curation efforts, but there is a limit to their scalability. When curating large quantities of dynamic data (>1 million records), even the most sophisticated and specialized curation department can struggle with the workload. An approach to curating data on this scale is to utilize crowd-sourcing/community-based curation in conjunction with algorithmic curation approaches. An example of a popular crowd-sourcing technique is sheer curation, which integrates curation activities within the normal workflow of those creating and managing the target data. Different curation approaches are not mutually exclusive and can be composed in different ways for data curation. These blended approaches are proving to be successful in existing projects (Curry et al., 2010).


3.1.2 Lifecycle Model

The DCC Curation Lifecycle Model (Higgins, 2008) provides a graphical, high-level overview of the stages required for the successful curation and preservation of data, from initial conceptualisation through the iterative curation cycle (Figure 3-1). It distinguishes three major types of action: full lifecycle actions, sequential actions and occasional actions. Full lifecycle actions apply throughout the data curation process and can be broken down into four activities: Description and Representation Information, Preservation Planning, Community Watch and Participation, and Curate and Preserve. Sequential actions address the recurrent tasks during data curation, namely Conceptualise, Create or Receive, Appraise and Select, Ingest, Preservation Action, Store, Access, Use and Reuse, and Transform. Occasional actions can occur at any point during the sequential workflow.

Figure 3-1: The DCC Curation Lifecycle Model (taken from a DCC publication1)

The SURF Foundation adapted the DCC Curation Lifecycle Model in order to make it applicable to the art and media research domains (Verhaar et al., 2010). As shown in Figure 3-2, the SURF Foundation lifecycle model defines an associated curation agent who takes responsibility for each action.

1 http://www.dcc.ac.uk/sites/default/files/documents/publications/DCCLifecycle.pdf


Figure 3-2: The SURF Foundation Curation Lifecycle Model (taken from the SURF Foundation1)

3.2. State of the Art

3.2.1 Outline

Data curation is currently evolving under the demand to manage data that grows in volume and heterogeneity, a trend that has intensified over the last few years. Despite the growth in the number of organisations and practitioners involved in data curation, the field is still taking shape and is highly dynamic. The data curation workflow can be categorised into the following elements (see Figure 3-3).

Raw data/curated data: The data being curated has different characteristics that strongly impact the data curation requirements. Data can vary in terms of structure level (unstructured, semi-structured, structured), data model (relational, XML, RDF, text, etc.), dynamicity, volume, distribution (distributed, centralised) and heterogeneity.

Data curators: Curation activities can be carried out by individuals, organisations, communities, etc. Data curators can have different roles according to the curation activities in which they are involved within the data curation workflow. Section 3.2.4 describes existing data curation roles.

Artefacts, tools, and processes needed to support the curation process: A number of artefacts, tools and processes can support data curation efforts, including workflow support, web-based community collaboration platforms, taxonomies, etc. Algorithms can help to automate or semi-automate curation activities: for example, data cleansing, record deduplication and classification algorithms can be used within sheer curation (Curry et al., 2010), as sketched below.

Data curation workflow: Defines how the data curation activities are composed and executed. The curation workflow will be influenced by the previous dimensions. A workflow can employ different methods for organising the data curation effort, such as the creation of a curation group/department or a sheer curation workflow that enlists the support of users.
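To give a flavour of the algorithmic curation activities mentioned above, the sketch below flags likely duplicate records by normalising names and comparing them with a string-similarity ratio from the Python standard library. The records and the 0.85 threshold are invented for illustration; a production deduplication step would typically add blocking and richer matching rules before any human review.

```python
# Illustrative record-deduplication step for algorithmic/sheer curation.
# Records and the similarity threshold are made up for the example.
from difflib import SequenceMatcher
from itertools import combinations

records = [
    {"id": 1, "name": "Acme Ltd.", "city": "Dublin"},
    {"id": 2, "name": "ACME Limited", "city": "Dublin"},
    {"id": 3, "name": "Beta Analytics", "city": "Berlin"},
]

def normalise(name):
    name = name.lower().replace(".", "").replace(",", "")
    return name.replace("limited", "ltd").strip()

def likely_duplicates(records, threshold=0.85):
    """Return pairs of record ids whose normalised names are highly similar."""
    pairs = []
    for a, b in combinations(records, 2):
        score = SequenceMatcher(None, normalise(a["name"]),
                                normalise(b["name"])).ratio()
        if score >= threshold and a["city"] == b["city"]:
            pairs.append((a["id"], b["id"], round(score, 2)))
    return pairs

print(likely_duplicates(records))   # flags the Acme pair for review
```

In a sheer curation setting such a check would run silently as records are created, surfacing only the suspected duplicates to the person already working with the data.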

1 http://www.surf.nl/en/


Figure 3-3: Data curation elements

The following sections describe and organize the core data curation elements (Figure 3-3) in order to portray the data curation landscape. Section 3.2.2 describes the core guidelines and criteria used for data selection and appraisal. Section 3.2.3 provides an analysis of the data quality dimensions that shape the data curation activities. Section 3.2.4 specifies the roles associated with data curation and Section 3.2.5 analyses the key technological/infrastructure elements involved in data curation. A set of exemplar data curation projects was selected to provide an initial depiction of the data curation landscape. Projects from different sectors (Government, Media, e-Science/Biomedical, Open Web) were selected. In each section, these projects are categorized according to these dimensions.

3.2.2 Data Selection Criteria

With the growth of digital data in recent years, it is necessary to determine which data should be kept for long-term retention, given the large associated data maintenance costs (Beagrie et al., 2008). For this task, clear criteria must be defined for which data should be curated. With respect to data appraisal, Eastwood (2004) describes four core activities for appraising digital data: (1) compiling and analysing information, (2) assessing value, (3) determining the feasibility of preservation and (4) making the appraisal decision. Complementary to the data appraisal activities, the DCC introduced a list of indicators to help in the quantified evaluation of data appraisal (Ball, 2010): (1) quantity, (2) timeframe, (3) key records, (4) ownership, (5) requirement to keep data, (6) lifespan of the data, (7) documentation and metadata, (8) technical aspects, (9) economic concerns, (10) access, (11) use and reuse. A subset of these indicators can be used to depict the characteristics of the data being curated. A consolidated set of data appraisal indicators was selected from the original DCC list as being most representative for the appraisal process of the selected data curation projects. Table 3-1 defines each of these dimensions for the set of analysed projects.

Table 3-1: Data features associated with the curated data

Project | Quantity | Timeframe / Lifespan | Key Records / Metadata | Ownership | Access / Use
NYT | 10^7 articles | complete / undetermined | articles, associated categories | partially open | internal annotation platform and public API for entities
PDB | 10^3 protein structures | complete / undetermined | protein structures, associated protein data | open | public search and navigation interface, data download
ChemSpider | 10^3 molecules | complete / undetermined | molecule data, associated links | open | public search and navigation interface, data download
Wikipedia | 10^6 articles | complete / undetermined | articles, links | open | public search and navigation interface, data download
legislation.gov.uk | 10^3 legislation items | complete / undetermined | laws, categories, historical differences and evolution | open | public search and navigation interface

Most of the data being curated is targeted towards a large consumer base (provided on the Web), and all of the analysed projects need to cope with open data.
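In practice, appraisal decisions based on the DCC indicators above can be supported by a simple weighted scoring scheme. The sketch below is a hypothetical illustration only; the indicator subset, weights and decision threshold are not part of the DCC guidance and would have to be defined by each organisation.

```python
# Hypothetical appraisal scoring: each indicator receives a 0-5 score and a
# weight reflecting its importance for the organisation's retention policy.
WEIGHTS = {
    "quantity": 1.0,
    "key_records": 2.0,
    "requirement_to_keep": 3.0,
    "documentation_metadata": 2.0,
    "economic_concerns": 1.5,
    "use_and_reuse": 2.5,
}

def appraisal_score(scores: dict) -> float:
    """Weighted average (0-5) of the indicator scores supplied."""
    total_weight = sum(WEIGHTS[name] for name in scores)
    return sum(WEIGHTS[name] * value for name, value in scores.items()) / total_weight

candidate = {"quantity": 3, "key_records": 5, "requirement_to_keep": 4,
             "documentation_metadata": 2, "economic_concerns": 3, "use_and_reuse": 5}
score = appraisal_score(candidate)
print(round(score, 2), "-> retain" if score >= 3.5 else "-> do not retain")
```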

3.2.3 Data Quality Dimensions

The increased utilisation of data within a wide range of key organisational activities has created a data-intensive landscape and caused a drastic increase in the sophistication of data infrastructures in organisations. One of the key principles of data analytics is that the quality of the analysis depends on the quality of the information analysed. However, within operational data-driven systems, there is a large variance in the quality of information. Gartner recently estimated that more than 25% of critical data in the world’s top companies is flawed1. Uncertainty over the validity of data or ambiguity in its interpretation can have a significant effect on organisational operations, especially when it comes to decision making (Curry et al., 2010). When making decisions or interpreting business intelligence reports or dashboards, decision makers must be able to assess the quality of the data they are using. Perception of data quality is highly dependent on fitness for use (Curry et al., 2010); it is relative to the specific task that a user has at hand. Data quality is usually described in the scientific literature by a series of quality dimensions that represent a set of consistency properties for a data artefact. The following data quality dimensions are based on the data quality classification for data curation proposed in (Curry et al., 2010):

 Discoverability, Accessibility & Availability: Addresses whether users can find the data that satisfies their information needs and then access it in a simple manner. Data curation impacts the accessibility of the data by representing, classifying and storing it in a consistent manner.
 Completeness: Addresses whether all the information required for a certain task, including the contextual description, is present in the dataset. Data curation can be used to verify omissions of values or records in the data. Curation can also be used to provide the wider context of data by linking/connecting related datasets.
 Interpretability and Reusability: Ensures the common interpretation of data. Humans can have different underlying assumptions around a subject that can significantly affect the way they interpret data. Data curation tasks related to this dimension include the removal of ambiguity, the normalisation of terminology, and making the semantic assumptions explicit.
 Accuracy: Ensures that the data correctly represents the “real-world” values it models. Data curation can be used to provide additional levels of verification of the data.
 Consistency & Integrity: Covers the uniformity and the semantic consistency of the data under a specific conceptual model, representation model and format. Inconsistent data can introduce significant barriers for organisations attempting to integrate different systems and applications. Data curation can be used to ensure that data is consistently created and maintained under standardised terminologies and identifiers.

1 http://www.gartner.com/newsroom/id/501733


 Trustworthiness: Ensures the reliability of the facts expressed in the data. Data curation tasks can support data consumers in obtaining explicit reliability indicators for the data, answering questions such as: where did the data come from? which activities are behind the data? can it be reliably traced back to the original source? what is the reputation of the data sources? The provision of a principled provenance representation associated with the data can be used to assess the trustworthiness of data production and delivery. Data curation activities can also be used to determine the reputation of data sources.
 Timeliness: Ensures that the information is up to date. Data curation can be used to provide a principled approach to classifying the temporal aspects of the data with respect to the task at hand.

Table 3-2 covers the data quality dimensions and their importance for the set of curation platforms in the use cases.

Table 3-2: Critical data quality dimensions for existing data curation projects

Project | Discoverability, Accessibility & Availability | Completeness | Interpretability & Reusability | Accuracy | Consistency & Integrity | Trustworthiness | Timeliness
NYT | High | Medium | High | Medium | Medium | High | Low
PDB | High | High | High | High | High | High | High
ChemSpider | High | High | High | High | High | High | High
Wikipedia | High | Medium | High | Medium | Medium | High | High
legislation.gov.uk | High | High | High | High | High | High | High

An analysis of the categories shows that projects curating smaller and more structured datasets tend to assess completeness as a critical dimension. Discoverability, accessibility and availability as well as interpretability and reusability are critical dimensions for all of the projects, which act as data providers either for third-party data consumers or for a large number of end users. Accuracy tends to be a central concern for projects curating structured data.
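Some of the quality dimensions discussed above lend themselves to simple automated checks that can run alongside manual curation. The Python sketch below computes a completeness and a timeliness indicator for a small record set; the required fields, date format and freshness window are hypothetical and purely illustrative.

```python
from datetime import datetime, timedelta

REQUIRED_FIELDS = ["id", "title", "category", "updated"]   # hypothetical schema

def completeness(record: dict) -> float:
    """Fraction of required fields that are present and non-empty."""
    filled = sum(1 for field in REQUIRED_FIELDS if record.get(field))
    return filled / len(REQUIRED_FIELDS)

def is_timely(record: dict, max_age_days: int = 30) -> bool:
    """Timeliness check: was the record updated within the freshness window?"""
    updated = datetime.strptime(record["updated"], "%Y-%m-%d")
    return datetime.now() - updated <= timedelta(days=max_age_days)

records = [
    {"id": "1", "title": "Protein X", "category": "enzyme", "updated": "2013-05-20"},
    {"id": "2", "title": "", "category": "inhibitor", "updated": "2012-01-02"},
]
for record in records:
    print(record["id"], round(completeness(record), 2), is_timely(record))
```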

3.2.4 Data Curation Roles

Human actors play an important role in the data curation lifecycle. Data curation projects evolved in the direction of specifying different roles for data curators according to their associated activity in the data curation workflow. The following categories summarize the core roles for data curators:

 Coordinator: Coordinates and manages the data curation workflow.
 Rules & Policies Manager: Determines the set of requirements associated with the data curation activities and provides policies and good practices to enforce those requirements.
 Schema/Taxonomy/Ontology Manager: Coordinates the maintenance of the data/metadata conceptual model.
 Data Validator: Validates and oversees specific data curation activities and ensures that the policies and good practices are being followed by the data curators. This role is also commonly performed by algorithmic approaches.
 Domain Experts: Experts in the curated domain who, in most cases, carry out the core data curation tasks.
 Data Consumers: The consumers of the data, who can in some cases help with the data curation activities.

Table 3-3 compares the relevance of the data curation roles in existing projects. There is a large variability in the distribution of roles across different data curation infrastructures. The larger the scale (NYT, Wikipedia), the more open the participation in the curation process. Likewise, the


more structured the data (ChemSpider), the greater the participation and importance of the rules manager and data validator roles. In most of the projects, data consumers are domain experts who also work as curators. However, for open Web projects the ratio of data consumers to active curators is very high (for the larger Wikipedia versions, active curators amount to only around 0.02-0.03% of users).1

Table 3-3: Existing data curation roles and their coverage on existing projects

Project | Coordinator | Rules Mng. | Schema Mng. | Data Validator | Domain Expert | Consumers as Curators | # of Curators
NYT | High | High | High | High | High | High | 10^3
PDB | Medium | Medium | High | High | High | Medium | N/A
ChemSpider | High | Medium | Medium | High | High | Medium | 10
Wikipedia | High | High | Low | High | High | Medium | 10^3
legislation.gov.uk | Low | Low | High | Low | High | Low | N/A

3.2.5 Core Technological/Infrastructure Dimensions

Data curation has specific technological and infrastructure demands. The following core technological and infrastructure dimensions were present, either as implemented modules or as key infrastructure requirements, across the analysed data curation projects.

 Standardised and Extensible Data Representation: Standardised data representation models encourage data consumption, consequently affecting community growth and indirectly creating incentives for tool development. Data models that support the decentralised growth of the database “schema” (schema-less models) are also desirable, and support for legacy data formats should be considered.
 Creation, Collection and Maintenance of Metadata and Annotations: Used to support the interpretation of the data, including description, contextual tracking, provenance, and digital rights management.
 Data Transformation/Integration Approaches: Balancing human- and computer-based data curation is a critical dimension. For larger and more homogeneous datasets, algorithmic curation should be used for validation, while humans target ad-hoc and more complex curation tasks.
 Data Discovery/Consumption/Access Infrastructure: As open datasets become more prevalent, organisations will need to develop appropriate internal infrastructures to consume existing third-party data and to support collaboration with third-party data consumers and producers.
 Infrastructures for Human Collaboration and Validation: Including incentives, policies, good practices, etc.
 Provenance and Trust Management: A user consuming data generated by third parties needs mechanisms to assess the quality of the data. Provenance management is a key aspect in mapping the historical trail behind an information artefact and can help determine whether the data is high quality, trustworthy and compliant. Provenance can be represented at different granularities: while a scientist may need to evaluate the fine-grained set of computations behind the data to determine its quality, a business analyst may find it sufficient to know the ’brand’ of a data provider. The ability to provide an interpretable provenance description attached to the data plays an important role in the data quality process. Curation activities, such as edits, should be recorded when possible and maintained as part of a larger data provenance tracking effort (a minimal sketch of such a record follows this list).
 Permissions/Access Management: Storage of the data and metadata with levels of security and accessibility appropriate to the content.
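The sketch below illustrates the kind of lightweight provenance record that could be attached to each curation edit. The attribute names loosely follow the agent/activity/entity split of W3C PROV but are simplified and hypothetical; a production system would use the full PROV data model and a proper store.

```python
from datetime import datetime, timezone

def provenance_record(entity_id, agent, activity, derived_from=None):
    """Minimal provenance entry for a single curation action.

    Loosely inspired by the W3C PROV notions of entity, agent and activity;
    simplified here for illustration only.
    """
    return {
        "entity": entity_id,          # the data item that was curated
        "agent": agent,               # who (or which algorithm) acted
        "activity": activity,         # e.g. "classified", "corrected", "merged"
        "derived_from": derived_from, # upstream record or dataset, if any
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

trail = []
trail.append(provenance_record("article:42", "editor:jsmith", "classified"))
trail.append(provenance_record("article:42", "bot:tagger-v1", "suggested-tags",
                               derived_from="vocabulary:2013-05"))
print(trail)   # the trail can later support trustworthiness assessments
```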

1 Wikimedia users: http://strategy.wikimedia.org/wiki/Wikimedia_users, last access September 2013.


Table 3-4 describes the coverage of each core technological dimension for the exemplar curation platforms. More principled provenance capture and management is still underdeveloped in existing projects, although it is a core concern for most of them (Buneman et al., 2006). Since most of the evaluated projects target open data, permissions and access management are modelled in a coarse-grained manner, covering the existing curation roles and not focusing on data-item-level access. As data curation moves in the direction of private collaboration networks, this dimension should gain greater importance. Data consumption and access infrastructures usually rely on traditional Web interfaces (keyword search and link navigation). More advanced semantic search, query and interaction capabilities (Freitas et al., 2012) are still missing from most of the existing systems, which impacts both consumption and data curation tasks (such as schema matching). For projects curating structured and semi-structured data, interoperability at the data model and conceptual model levels is recognised as one of the most critical features, pointing in the direction of standardised data representations for data and metadata.

Table 3-4: Technological infrastructure dimensions

Project | Data Representation | Data Transformation / Integration Approaches | Data Consumption / Access Infrastructure | Infrastructures for Human Collaboration and Validation | Provenance and Trust Management | Permission / Access Management
NYT | Domain specific / Linked Data | Human | Navigation, search | Critical | Low | Coarse-grained
PDB | Domain specific | N/A | Search interface | Data deposits | Low | Coarse-grained
ChemSpider | Relational (moving to RDF) | Human and algorithmic | Search interface, navigation | Critical | Low | Medium
Wikipedia | HTML | Human and algorithmic | Navigation, search | Critical | Medium | Medium
legislation.gov.uk | XML/RDF | Human | Navigation, search | Critical | Low | Coarse-grained

3.3. Sectors Case Studies for Big Data Curation

In this section, we discuss case studies that cover data curation processes in different domains. The purpose of these case studies is to capture the different workflows that have been adopted or designed to deal with data curation in the Big Data context.

3.3.1 Health and Life Sciences

ChemSpider: ChemSpider1 is a free search engine providing access to a structure-centric repository for the chemistry community. It has been designed to aggregate and index chemical structures and their associated information into a single searchable repository. ChemSpider contains tens of millions of

1 http://www.chemspider.com


chemical compounds and their associated data, and serves as a data provider to many websites and software tools. Available since 2007, ChemSpider has collated over 300 data sources from chemical vendors, government databases, private laboratories and individuals, providing access to millions of records related to chemicals. Used by chemists for identifier conversion and property prediction, ChemSpider datasets are also heavily leveraged by chemical vendors and pharmaceutical companies as pre-competitive resources for experimental and clinical trial investigation. Data curation in ChemSpider consists of the manual annotation and correction of data (Pence & Williams, 2010). This may include changes to the chemical structure of a compound, addition or deletion of identifiers associated with a chemical compound, associating links between a chemical compound and its related data sources, etc. ChemSpider supports two ways for curators to help in curating data:

 Posting comments on a record, so that a master curator can take appropriate action on the concern raised.
 Registered members with curation rights can participate directly in marking data for master curation or in removing erroneous data.

ChemSpider adopts a meritocratic model for its curation activities. Normal curators are responsible for deposition, which is checked and verified by master curators. Normal curators, in turn, can be invited to become masters after a qualifying period of contribution. The platform has a blended human and computer-based curation process. Robotic curation uses algorithms for error correction and data validation at deposition time. ChemSpider uses a mixture of computational approaches to perform a certain level of data validation. The team has built its own chemical data validation tool, called CVSP (Chemical Validation and Standardization Platform). CVSP helps chemists to check chemicals and quickly tell whether or not they are validly represented and whether there are any data quality issues, so that such issues can be flagged easily and efficiently. Using an open community model, ChemSpider distributes its curation activity across its community, using crowd-sourcing to accommodate massive growth rates and quality issues. A wiki-like approach lets people interact with the data, so that they can annotate it, validate it, curate it, delete it or flag it for deletion. ChemSpider is also implementing an automated recognition system that will measure the contribution effort of curators through the data validation and engagement process. The contribution metrics then become publicly viewable and accessible through a central RSC profile, as shown in Figure 3-4.

Figure 3-4: RSC profile of a curator with awards attributed based on his/her contributions
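Rule-based validation at deposition time, as performed by robotic curation, can be illustrated with a deliberately simplified sketch. The checks below (a molecular-formula syntax test and a mass sanity test) are hypothetical examples and are not taken from CVSP, whose real rules cover structures, valences and standard representations.

```python
import re

# Very simplified molecular formula pattern: element symbols optionally
# followed by counts, e.g. "C6H12O6". Purely a syntax check, nothing more.
FORMULA = re.compile(r"^([A-Z][a-z]?\d*)+$")

def validate_record(record: dict) -> list:
    """Return a list of data-quality flags for a deposited record."""
    flags = []
    if not FORMULA.match(record.get("formula", "")):
        flags.append("malformed molecular formula")
    if record.get("mass", 0) <= 0:
        flags.append("non-positive molecular mass")
    return flags

deposit = {"name": "glucose", "formula": "C6H12O6", "mass": 180.16}
print(validate_record(deposit))            # [] -> passes the automated checks
print(validate_record({"formula": "??"}))  # flagged for review by a human curator
```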


Protein Data Bank: The Research Collaboratory for Structural Bioinformatics Protein Data Bank1 (RCSB PDB) is a group dedicated to improving understanding of the functions of biological systems through the study of the 3-D structure of biological macromolecules. Started in 1971 with 3 core members, it originally offered free access to 7 crystal structures and has grown to the current 63,000 structures available freely online. The PDB has had over 300 million data set downloads. Its tools and resource offerings have grown from a curated data download service to a platform that serves complex molecular visualisation, search, and analysis tools. A significant part of the curation process at PDB consists of providing standardised vocabulary for describing the relationships between biological entities, varying from organ tissue to the description of the molecular structure. The use of standardised vocabularies also helps with the nomenclature used to describe protein and small molecule names and their descriptors present in the structure entry. The data curation process also covers the identification and correction of inconsistencies in the 3-D protein structure and experimental data. The platform accepts the deposition of data in multiple formats such as the legacy PDB format, mmCIF, and the current PDBML. To implement a global hierarchical governance approach to the data curation workflow, wwPDB staff review and annotate each submitted entry before robotic curation checks it for plausibility as part of data deposition, processing and distribution. The data curation effort is distributed across its sister sites. Robotic curation automates data validation and verification, while human curators contribute to the definition of rules for the detection of inconsistencies. Errors found in the data are also corrected retrospectively in the archives. Up-to-date versions of the data sets are released on a weekly basis to keep all sources consistent with the current standards and to ensure good data curation quality.
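Vocabulary standardisation of the kind described above can be approximated by mapping free-text names onto canonical terms. The sketch below uses a small hypothetical synonym table; it illustrates the general idea only and is not the wwPDB annotation pipeline.

```python
# Hypothetical synonym table mapping deposited names to canonical terms.
CANONICAL = {
    "haemoglobin": "Hemoglobin",
    "hemoglobin alpha chain": "Hemoglobin subunit alpha",
    "hb alpha": "Hemoglobin subunit alpha",
}

def normalise_name(name: str) -> str:
    """Map a deposited molecule name to its canonical term, or flag it."""
    key = name.strip().lower()
    # Unresolved names are flagged so that a human curator can extend the
    # controlled vocabulary or correct the deposition.
    return CANONICAL.get(key, f"UNRESOLVED:{name}")

for deposited in ["Haemoglobin", "HB alpha", "novel kinase X"]:
    print(deposited, "->", normalise_name(deposited))
```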

3.3.2 Telco, Media, Entertainment

Press Association: Press Association (PA) is the national news agency for the UK and Ireland and a leading multimedia content provider across Web, mobile, broadcast and print. For the last 145 years, PA has been providing feeds of text, data, photos and video. The objective of data curation at Press Association is to select the most relevant information for its customers, classifying, enriching and distributing it in a way that can be readily consumed. The curation process at Press Association employs a large number of curators in the content classification process, working over thousands of data sources. A curator inside Press Association is an analyst who collects, aggregates, classifies, normalises and analyses the raw information coming from different data sources. Since the information analysed at Press Association is typically high volume and near real-time, data curation is a big challenge inside the company and the use of automated tools plays an important role in this process. In the curation process, automatic tools provide a first-level triage and classification, which is further refined by the intervention of human curators, as shown in Figure 3-5. The data curation process starts with an article submitted to a platform that uses a set of linguistic extraction rules over unstructured text to automatically derive tags for the article, enriching it with machine-readable structured data. A data curator then selects the terms that best describe the contents and inserts new tags if necessary. The tags enrich the original text with the general category of the analysed contents, while also providing a description of specific entities (places, people, events, facts) present in the text. The classification is then reviewed by the metadata manager and the content is published online.

Thomson Reuters: Thomson Reuters is a leading information provider focused on the provision of specialist curated information in different domains, including Healthcare, Science, Financial, Legal and Media. In addition to the selection and classification of the most relevant information for its customers, Thomson Reuters focuses on the deployment of information (including structured data) in a way that can be readily consumed. The curation process at Thomson Reuters employs thousands of curators working over approximately 1000 data sources. In the curation process, automatic tools provide a first-level selection and classification, which is further refined by the intervention of human curators. A typical curator is a domain specialist who selects, aggregates, classifies, normalises and analyses the raw information

1 http://www.pdb.org


coming from different data sources. Semantic Web technologies are already applied in the company’s data environment. The overall data curation workflow at Thomson Reuters is depicted in Figure 3-6.

Figure 3-5: PA Content and Metadata Pattern Workflow

Figure 3-6: A typical data curation process at Thomson Reuters

The New York Times: The New York Times (NYT) is the largest metropolitan newspaper and the third largest newspaper overall in the United States. The company has a long history of curating its articles in its 100-year-old curated repository (the NYT Index).


The New York Times curation pipeline (see Figure 3-7) starts when an article comes out of the newsroom. The first-level curation consists of the content classification process carried out by the editorial staff, which consists of several hundred journalists. Using a Web application, a member of the editorial staff submits the new article to a rule-based information extraction system (in this case, SAS Teragram1). Teragram uses a set of linguistic extraction rules, which are created by the taxonomy managers based on a subset of the controlled vocabulary used by the Index Department. Teragram suggests tags based on the Index vocabulary that can potentially describe the content of the article (Curry et al., 2010). The member of the editorial staff then selects the terms that best describe the contents and inserts new tags if necessary. Taxonomy managers review the classification and the content is published online, providing continuous feedback into the classification process. In a later stage, the article receives a second-level curation by the Index Department, which appends additional tags and a summary of the article to the stored resource. The data curation workflow at NYT is outlined in Figure 3-7.

Figure 3-7: The NYT article classification curation workflow
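The automated first stage of such a pipeline can be approximated by a simple dictionary-based tagger that suggests tags from a controlled vocabulary whenever their trigger phrases occur in the article text. The sketch below is a hypothetical illustration of that pattern only; it is not the Teragram rule language used at the NYT.

```python
# Hypothetical controlled vocabulary: trigger phrase -> suggested tag.
VOCABULARY = {
    "interest rate": "Economy/Monetary Policy",
    "world cup": "Sport/Football",
    "general election": "Politics/Elections",
}

def suggest_tags(article_text: str) -> list:
    """Return vocabulary tags whose trigger phrases appear in the article."""
    text = article_text.lower()
    return sorted({tag for phrase, tag in VOCABULARY.items() if phrase in text})

draft = ("The central bank left the interest rate unchanged "
         "ahead of the general election.")
print(suggest_tags(draft))
# ['Economy/Monetary Policy', 'Politics/Elections'] -- an editor then accepts,
# rejects or extends these suggestions before the article is published.
```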

3.3.3 Summary

In this section, we summarise the case studies discussed above in terms of the key points identified in the process.

Table 3-5: Summary of sector case studies

Case study | Consumers | Data Validation | Semantic Technologies | Consumers as Curators
ChemSpider | chemists | Yes | Yes | Yes
Protein Data Bank | biomedical community | Yes | Yes | Yes
Press Association | public | Yes | Yes | No
Thomson Reuters | public | Yes | Yes | No
The New York Times | public | Yes | Yes | Yes

1 SAS Teragram http://www.teragram.com


3.3.4 Case Studies: Future Work

In this document, we have covered five case studies related to data curation by studying two different sectors: Health & Life Sciences, and Telco, Media & Entertainment. We plan to extend this document in the near future with more case studies covering a broader range of sectors, including but not limited to the following:

 Finance & Insurance
 Manufacturing, Retail, Energy, Transport
 Public & Government sector

We will interview data curation experts working in the above areas and document their experiences, the processes they follow, their opinions and future directions, in order to outline the core challenges, trends and opportunities in data curation.

3.4. Data Curation Road-map

The Big Data phenomenon (Howe et al., 2008) underlines the explosion of information content generated by a variety of sources including, but not limited to, sensors, multimedia, and knowledge bases on the Web. This massive amount of collected data can contain errors and lead to incorrect analyses; the value of Big Data therefore depends on data quality. Specifically, the veracity dimension of Big Data deals with the accuracy and trustworthiness of information (Lord & Macdonald, 2003). The main objective of data curation is to address the veracity requirements of heterogeneous data collected from a variety of sources, while at the same time considering the volume and velocity dimensions of Big Data.

3.4.1 Future Requirement of Big Data Curation

Future data curation systems will need to address the following requirements, which are typically absent from current approaches.

 Curation at scale: Since volume, velocity and variety are the fundamental dimensions of Big Data, future curation systems will need to incorporate more automation. Advanced techniques based on machine learning and semantic reasoning can be used to perform repeatable curation tasks, consequently allowing human resources to concentrate on complex tasks. A complementary challenge lies in the fact that automated curation should be verifiable to ensure trust and provenance.
 Variety of expertise: Current data curation technologies and processes are limited to specific curators because of their heavy reliance on domain expertise and programming languages. Therefore, most curation tasks are limited to specialists. Since human attention and expertise are limited resources, future systems need to enable contributions from a wide range of human resources such as programmers, domain experts, non-expert contributors, and crowds. Designing systems that can distribute curation tasks while considering the abilities of persons and the complexity of tasks is an important challenge in this regard.
 Access management: Control over the accessibility of individual data items becomes more important as more people are involved in the curation process. In this regard, a curation system should provide different levels of update and view interfaces to humans and machines. Discoverability of data items is another related aspect of data curation that is becoming crucial due to advances in Internet technologies.
 Multimedia & Text: Current tools and technologies mostly support curation activities for structured or semi-structured data. However, Big Data will require the curation of unstructured information like text, audio and video. Future data curation should be able to understand unstructured information in order to ensure quality and trust with minimal human supervision.

The following section describes some recent developments on the Web that can lead us towards addressing the above-mentioned requirements of future data curation systems.


3.4.2 Emerging Paradigms

3.4.2.1 Communities & Crowds

Data curation can be a resource-intensive and complex task, which can easily exceed the capacity of a single individual. Most non-trivial data curation efforts therefore depend on a collective data curation set-up, in which participants share the costs, risks and technical challenges. Depending on the domain, data scale and type of curation activity, data curation efforts can utilise relevant communities through invitation or crowds through an open call. Figure 3-8 shows the continuum of control and participation in data curation. Curation systems such as FlyBase1 and PDB reside on the left side, due to the high level of control over curation processes and the small number of participants. Crowd-based knowledge bases like Wikipedia2 and Freebase3 exhibit lower levels of control with a large number of contributors.

Figure 3-8: The continuum of control and participation

A consortium is a type of community where participating organisations collaborate on curation activities. A consortium is usually a closed community where members are invited based on their domain experience to contribute to focussed curation activities. The resulting output data may be publicly available or limited to the members. Consortia typically form a democratic partnership, with some vote proportionality reflecting the level of investment of each organisation. Within crowd-based curation, everyone can participate in the curation of any data item. The leaders or developers of the curation system define the desired curation activity and seek public support from a potentially unlimited number of anonymous participants who feel they have the skills to provide or contribute to curation activities. Wikipedia, Freebase, and DBpedia are good examples of large open communities where anyone can contribute to managing large datasets. These platforms provide exemplar use cases of the balance between human and algorithmic curation. Wikipedia, for example, supports its curators by providing processes, tools and artefacts, such as the Wiki Article Editor, Talk Pages, Watch-lists, Permission Mechanisms, Automated Edition, Page History and Restore, Guidelines, Policies & Templates, Dispute Resolution, Article Edition, Deletion, Merging, Redirection, and Archival. Freebase is a large knowledge base that facilitates the creation and management of structured data by a community of web users (Bollacker et al., 2008). It has been argued that Freebase was built with the objective of high collaborative density, increased semantic integration and a wide range of administrative proximity. Graphical interfaces for data creation and back-end data transformation frameworks such as OpenRefine support the scale of the data curation activity.

3.4.2.2 Mixed Human-Computer Intelligence

Algorithms are becoming more intelligent with advances in machine learning and artificial intelligence. It is expected that machine intelligence will be able to validate, repair, and annotate data within seconds, tasks which might take hours for a human to perform (Kong et al., 2011). In effect, humans will be involved as required, e.g. for defining curation rules, validating hard instances, or providing data for training algorithms (Hassan et al., 2012). During the next few decades, large-scale data management will become a collaboration

1 http://www.flybase.org 2 http://www.wikipedia.org 3 http://www.freebase.com


between machines and humans. Apart from machine learning, other enabling technologies include speech recognition, natural language processing, machine vision, and reasoning. The following list summarises some of the research prototypes using machine intelligence in concert with human effort to manage data:

 Data Tamer1: This prototype aims to replace the current developer-centric extract-transform-load (ETL) process with automated data integration. The system uses a suite of algorithms to automatically map schemas and de-duplicate entities. However, human experts and crowds are leveraged to verify integration updates that are particularly difficult for algorithms.
 ZenCrowd2: This system tries to address the problem of linking named entities in text with a knowledge base. ZenCrowd bridges the gap between automated and manual linking by improving the results of automated linking with humans. The prototype was demonstrated for linking named entities in news articles with entities in the Linked Open Data cloud.
 CrowdDB3: This database system answers SQL queries with the help of crowds, specifically queries that cannot be answered by a database management system or a search engine. As opposed to the exact operations in databases, CrowdDB allows fuzzy operations with the help of humans, for example ranking items by relevance or comparing the equivalence of images.
 Qurk4: Although similar to CrowdDB, this system tries to improve the costs and latency of human-powered sorts and joins. In this regard, Qurk applies techniques such as batching, filtering, and output agreement.
 Wikipedia Bots: Wikipedia runs scheduled algorithms, known as bots, to assess the quality of text articles. These bots also flag articles that require further review by experts. SuggestBot5 recommends flagged articles to a Wikipedia editor based on their profile.
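The common pattern behind these prototypes — algorithms process the bulk of the data and hand low-confidence cases to people — can be sketched in a few lines. The classifier, threshold and queue below are placeholders, not components of any of the systems listed above.

```python
def route(items, classify, threshold=0.8):
    """Split items between automated acceptance and a human review queue.

    `classify` is any function returning (label, confidence); items whose
    confidence falls below `threshold` are routed to human curators, in the
    spirit of the mixed human-computer pipelines described above.
    """
    auto_accepted, human_queue = [], []
    for item in items:
        label, confidence = classify(item)
        if confidence >= threshold:
            auto_accepted.append((item, label))
        else:
            human_queue.append(item)
    return auto_accepted, human_queue

# Toy classifier: "knows" one entity and is unsure about everything else.
def toy_classifier(mention):
    return ("dbpedia:Dublin", 0.95) if mention == "Dublin" else ("unknown", 0.3)

accepted, for_humans = route(["Dublin", "Springfield"], toy_classifier)
print(accepted)    # [('Dublin', 'dbpedia:Dublin')]
print(for_humans)  # ['Springfield'] -- sent to crowd or expert validation
```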

3.4.2.3 Sheer Curation

Sheer curation, or curation-at-source, is an approach in which lightweight curation activities are integrated into the normal workflow of those creating and managing data and other digital assets (Curry et al., 2010). Sheer curation activities can include lightweight categorisation and normalisation activities; an example would be vetting or “rating” the results of a categorisation process performed by a curation algorithm. Sheer curation activities can also be composed with other curation activities, allowing more immediate access to curated data while ensuring the quality control that is only possible with an expert curation team. The following are the high-level objectives of sheer curation described by Hedges & Blanke (2012):

 Avoid data deposit by integrating with normal workflow tools
 Capture provenance information of the workflow
 Seamless interfacing with data curation infrastructure

The successful implementation of sheer curation faces the following limitations:

 The set of tools used for curation purposes is not fully integrated, leading to a lot of manual work in transporting data between tools or defining interfaces between them.
 Specialised tools are mostly designed for desktop environments and are limited to input and output from file systems.
 It is very difficult to automatically predict workflows, since workflows are domain specific and dependent on a person’s work preferences.

Nonetheless, the above-mentioned challenges also provide opportunities for both researchers and practitioners.
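A minimal illustration of curation-at-source is to wrap the normal save step of a workflow tool so that lightweight categorisation and provenance capture happen as a side effect of ordinary work. The `save_document` function, the categoriser and the tool name below are all hypothetical.

```python
import getpass
from datetime import datetime, timezone

def save_document(doc: dict, store: list):
    """Stand-in for the user's normal save step (hypothetical)."""
    store.append(doc)

def save_with_sheer_curation(doc: dict, store: list, categorise):
    """Wrap the normal save so curation happens as a side effect of work."""
    doc.setdefault("tags", []).extend(categorise(doc["text"]))
    doc["provenance"] = {
        "created_by": getpass.getuser(),
        "created_at": datetime.now(timezone.utc).isoformat(),
        "tool": "report-editor",      # the workflow tool, not a curation UI
    }
    save_document(doc, store)

store = []
save_with_sheer_curation(
    {"text": "Quarterly sales rose in the retail segment."},
    store,
    categorise=lambda text: ["retail"] if "retail" in text else [],
)
print(store[0]["tags"], store[0]["provenance"]["created_by"])
```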

1 http://www.data-tamer.com/ 2 http://exascale.info/ZenCrowd 3 https://amplab.cs.berkeley.edu/publication/crowddb-answering-queries-with-crowdsourcing/ 4 http://db.csail.mit.edu/qurk/ 5 http://en.wikipedia.org/wiki/User:SuggestBot


3.4.2.4 Open Data & Open Access

The Open Data movement is based on the idea that data should be freely available for use by everyone, without any restrictions of copyright (Klump et al., 2006). This movement has recently gained momentum with the launch of open data portals by governments such as the UK and the US. However, the adoption of open data has received varied degrees of interest across industries. For example:

 World Bank Data: There are more than 200 datasets available online published by several national governments. The Open Government Data Toolkit1, by the World Bank, defines a set of criteria for governments and sectors for publishing open data on the web. For example, data is considered open if it is machine accessible and licensed for commercial and non-commercial use. Additionally, there is a set of guidelines to assess the readiness and quality of data for open publication.
 Science Data: Existing data portals provide access to experimental results, compounds, genes, etc. However, the level of detail and data availability vary across fields. For example, the Protein Data Bank and GenBank are both publicly available. On the other hand, since data about chemical compounds and drugs is easy to monetise, the chemical and pharmaceutical industry is reluctant to follow the open initiative. LinkedScience2 is a recent effort targeted at establishing practices for the open sharing and interlinking of scientific assets.

Similarly to Open Data, Open Access (OA) involves providing unlimited and unrestricted access to articles and publications. The objective of OA is to remove the barrier of payment for online access to content. For instance, the Public Library of Science (PLOS) is a non-profit project creating a library of open access journals and scientific literature.

3.4.2.5 Visualization

One of the data curation experts expressed that “from a Big Data perspective, the challenges are around finding the slices, views or ways into the dataset that enables you to find the bits that need to be edited, changed”. Appropriate visualisation of data is therefore important not only from the usage perspective but also from the maintenance perspective (Hey & Trefethen, 2004). Specifically, for collaborative methods of data cleaning, it is fundamental to enable the discovery of anomalies in both structured and unstructured data. Additionally, making data management activities more mobile and interactive is required as mobile devices overtake traditional PCs. The following technologies provide directions towards interactive visualisation:

 Data-Driven Documents (D3.js)3: D3.js is a library for displaying interactive graphs in web documents. The library adheres to open web standards such as HTML5, SVG and CSS, enabling powerful visualisations with open-source-like licensing.
 Tableau4: This software allows users to visualise multiple dimensions of relational databases. Furthermore, it enables the visualisation of unstructured data through third-party adapters. Tableau has received a lot of attention due to its ease of use and its free public plan.
 Open Refine5: This open source application allows users to clean and transform data in a variety of formats such as CSV, XML, RDF and JSON. OpenRefine (formerly Google Refine) is particularly useful for finding outliers in data and checking the distribution of values in columns through facets. It allows data reconciliation with external data sources such as Freebase and OpenCorporates6.
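A core interaction in tools like OpenRefine is the facet: counting the distinct values of a column to expose outliers and inconsistent entries. The sketch below reproduces that idea in a few lines of Python over a hypothetical ‘country’ column; it only illustrates the concept and is not OpenRefine’s implementation.

```python
from collections import Counter

rows = [
    {"company": "Acme Ltd",  "country": "UK"},
    {"company": "Globex",    "country": "United Kingdom"},  # inconsistent value
    {"company": "Initech",   "country": "UK"},
    {"company": "Umbrella",  "country": "UKK"},             # likely a typo
]

def facet(rows, column):
    """Value-frequency facet for one column, rarest values first."""
    counts = Counter(row.get(column, "") for row in rows)
    return sorted(counts.items(), key=lambda kv: kv[1])

print(facet(rows, "country"))
# [('United Kingdom', 1), ('UKK', 1), ('UK', 2)] -- the low-frequency values
# are the candidates for clustering and merging during data cleaning.
```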

1 http://data.worldbank.org/open-government-data-toolkit 2 http://linkedscience.org/about/ 3 http://d3js.org/ 4 http://www.tableausoftware.com/public/ 5 https://github.com/OpenRefine/OpenRefine/wiki 6 https://www.opencorporates.com


3.4.3 Summary

Table 3-6 maps the emerging systems and technologies discussed above to the Big Data curation requirements and emerging paradigms. The mapping to the requirements is based on the main functionality of the systems and technologies.

Table 3-6: Summary of emerging data curation systems

Paradigm | Scale | Variety of Expertise | Access Management | Multimedia & Text
Communities & Crowds | Wikipedia, Freebase, Protein DB | Wikipedia, Freebase, Protein DB | Wikipedia, Protein DB, ChemSpider | Wikipedia
Mixed Human-Computer Algorithms | Data Tamer, ZenCrowd, CrowdDB, Qurk | Data Tamer, Wikipedia Bots | Wikipedia Bots | Wikipedia Bots
Sheer Curation | Semantic Desktops | – | – | Semantic Desktops
Open Data & Open Access | World Bank Data, LinkedScience, PLOS | World Bank Data, LinkedScience, PLOS | World Bank Data, LinkedScience, PLOS | –
Visualization | D3.js, Tableau, Open Refine | Tableau | – | Tableau

This section discussed some emerging paradigms that can enable the curation of Big Data. Figure 3-9 highlights the areas where the paradigms and associated technologies can be applied within the data curation lifecycle. As illustrated, the sequential actions of the data curation lifecycle are better suited to the adoption of emerging technologies. People from communities or crowds can participate in the curation process: for example, crowds can be leveraged to correct data values against reference data, and community members could be asked to appraise data items generated by others. However, the potential impact on full lifecycle actions and occasional actions should be considered before adoption.


Figure 3-9: Emerging paradigms highlighted in DCC Curation Lifecycle Model

3.5. References

Ball, A. (2010). Preservation and Curation in Institutional Repositories. Digital Curation Centre.
Beagrie, N., Chruszcz, J., & Lavoie, B. (2008). Keeping Research Data Safe: A cost model and guidance for UK universities. JISC.
Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., & Hellmann, S. (2009). DBpedia - A crystallization point for the Web of Data. Web Semantics: Science, Services and Agents on the World Wide Web, 7(3), 154-165.
Bollacker, K., Evans, C., Paritosh, P., Sturge, T., & Taylor, J. (2008). Freebase: a collaboratively created graph database for structuring human knowledge. Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (pp. 1247-1250). New York, NY, USA: ACM.
Buneman, P., Chapman, A., & Cheney, J. (2006). Provenance management in curated databases. Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data (pp. 539-550). New York, NY, USA: ACM.
Curry, E., Freitas, A., & O’Riáin, S. (2010). The Role of Community-Driven Data Curation for Enterprise. In D. Wood, Linking Enterprise Data (pp. 25-47). Springer US.
Eastwood, T. (2004). Appraising digital records for long-term preservation. Data Science Journal, 3, 202-208.
Freitas, A., Curry, E., Oliveira, J. G., & O'Riain, S. (2012). Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches and Trends. IEEE Internet Computing, 16(1), 24-33.
Hassan, U. U., O'Riain, S., & Curry, E. (2012). Towards Expertise Modelling for Routing Data Cleaning Tasks within a Community of Knowledge Workers. Proceedings of the 17th International Conference on Information Quality. Paris, France.
Hedges, M., & Blanke, T. (2012). Sheer curation for experimental data and provenance. Proceedings of the 12th ACM/IEEE-CS Joint Conference on Digital Libraries (pp. 405-406). New York, NY, USA: ACM Press.
Hey, T., & Trefethen, A. (2004). The Data Deluge: An e-Science Perspective. In F. Berman, G. Fox, & T. Hey, Grid Computing: Making the Global Infrastructure a Reality (pp. 809-823). John Wiley & Sons, Ltd.
Higgins, S. (2008). The DCC Curation Lifecycle Model. The International Journal of Digital Curation, 3(1), 134-140.


Howe, D., Costanzo, M., Fey, P., Gojobori, T., Hannick, L., Hide, W., . . . Yon Rhee, S. (2008). Big data: The future of biocuration. Nature, 455(7209), 47-50.
Klump, J., Bertelmann, R., Brase, J., Diepenbroek, M., Grobe, H., Höck, H., . . . Wächter, J. (2006). Data publication in the open access initiative. Data Science Journal, 5, 79-83.
Kong, N., Hanrahan, B., Weksteen, T., Convertino, G., & Chi, E. H. (2011). VisualWikiCurator: Human and Machine Intelligence for Organizing Wiki Content. Proceedings of the 16th International Conference on Intelligent User Interfaces (pp. 367-370). New York, NY, USA: ACM.
Lord, P., & Macdonald, A. (2003). e-Science Curation Report. JISC.
Pence, H. E., & Williams, A. (2010). ChemSpider: An Online Chemical Information Resource. Journal of Chemical Education, 87(11), 1123-1124.
Verhaar, P., Mitova, M., Rutten, P., Weel, A. v., Birnie, F., Wagenaar, A., & Gloerich, J. (2010). Data Curation in Arts and Media Research. Utrecht: SURFfoundation.


Annex 2. Services and Tools

This section lists the services and tools available for data curation. The tables are grouped into four categories: Repository, Data Preparation, Data Identification and Integrated Curation System. Each category is divided into two tables, namely Tools and Services. Tools are applications that can be downloaded and deployed in a curation environment, whereas Services are online functionalities provided through web interfaces by a service provider.

Annex a. Repository

The curated data is preserved in a repository, which might be hosted by a third party. The online repository services offer data storage management and visualisation; in general, these services charge for the storage space, as seen in Table 3A-1. The repository tools listed in Table 3A-2 can be deployed to archive data.

Table 3A-1: List of Repository Services

Name | Provider or Developer | Licence | Platform or Language | Domain
Archive-It (http://www.archive-it.org/) | The Internet Archive | Commercial | Web based | General
SparkWise (http://sparkwi.se/) | Sparkwise | Open Source | Web based | General
WebCite (http://www.webcitation.org/) | The University of Toronto / University Health Network's Centre for Global eHealth Innovation | Creative Commons Attribution-NonCommercial-ShareAlike 2.5 | Web based | Web Archiving
DuraCloud (http://duracloud.org/) | DuraSpace (http://www.duraspace.org/) | Commercial | Web based | General
Portico (http://www.portico.org) | Ithaka | Commercial | Web based | Library
MetaArchive (http://www.metaarchive.org) | Educopia Institute | Commercial | Web based | General

Table 3A-2: List of Repository Tools

Name | Provider or Developer | Licence | Platform or Language | Domain
Eprints (http://www.eprints.org/) | University of Southampton | GPL and Free | Web based | General
DSpace (http://www.dspace.org/) | Hewlett-Packard and MIT | Open source | Java | General
The Flexible Extensible Digital Object Repository Architecture (Fedora, http://fedora-commons.org/) | The University of Virginia Library and Cornell University's Digital Library Research Group | Open source | Java | General
Nesstar (http://www.nesstar.com/) | The Norwegian Social Science Data Services, UK Data Archive, and the Danish Data Archive | Free under a custom license; 60-day trial versions of the Server and WebView | Java | General
Archivematica (https://www.archivematica.org/wiki/Main_Page) | Artefactual Systems in collaboration with the UNESCO Memory of the World's Subcommittee on Technology, the City of Vancouver Archives, the University of British Columbia Library, the Rockefeller Archive Center, Simon Fraser University Archives and Records Management, and a number of other collaborators | GPL 3 | Linux Desktop | General
The Dataverse Network Project (http://thedata.org/) | The Data Preservation Alliance for the Social Sciences | Open Source and Free | Web based | General
Hydra (http://projecthydra.org/) | Stanford University, the University of Virginia and the University of Hull in close collaboration with Fedora | Open Source | Web based | General
gCube framework (http://www.gcube-system.org/) | A number of Europe Institutes | EUPL | Web based | Science
ICA-AtoM (https://www.ica-atom.org/) | Managed by Artefactual Systems on behalf of the International Council on Archives | AGPL | Web based | General
Kepler (https://kepler-project.org/) | Kepler Team | BSD | Java | Science
Netarchive Suite (https://sbforge.org/display/NAS/) | The Royal Library of Denmark and the State and University Library of Denmark | GNU Lesser General Public License | Linux | Web Archives
RRDTool (http://oss.oetiker.ch/rrdtool/) | CAIDA | GPL | Linux | General
Hoppla (http://www.ifs.tuwien.ac.at/dp/hoppla) | Vienna University of Technology, Institute of Software Technology and Interactive Systems, Information & Software Engineering Group | Free | Java | General


Annex b. Data Identification

In order to organise a data collection, each digital object should have an identifier. This identification is also useful for citation and reference management. Table 3B-1 describes five open source tools that automatically generate digital identification.

Table 3B-1: List of Data Identification Tools

Name | Provider/Developer | Licence | Platform or Language
Digital Record Object Identification (DROID) (http://droid.sourceforge.net/) | The National Archives UK | Open Source and Free | Java
FITS (http://code.google.com/p/fits/) | Harvard University Library Office for Information Systems | GNU Lesser GPL | Java
JHOVE2 (http://sourceforge.net/projects/jhove/) | California Digital Library, Portico, and Stanford University, with funding from the National Digital Information Infrastructure and Preservation Program (NDIIPP) | Open Source, BSD license | Java
Xena (Xena Software) | The National Archives of Australia | GNU General Public License version 3 | Java
Manifest Maker (http://manifestmaker.sourceforge.net/) | The National Archives of Australia | GNU General Public License version 3 | Windows Application

Annex c. Data Preparation

Before retaining data, several steps are required for data preparation, such as data conversion, data cleansing, data annotation and data integrity checking. Table 3C-1 summarizes the data preparation tools.

Table 3C-1: List of Data Preparation Tools

Name | Provider or Developer | Licence | Platform or Language | Domain
Duke Data Accessioner (http://library.duke.edu/uarchives/about/tools/data-accessioner.html) | Duke University Libraries | Free and available for non-commercial, educational and research purposes under an open-source, Duke-written license | Java | General
MIXED (Migration to Intermediate XML for Electronic Data) (https://sites.google.com/a/datanetworkservice.nl/mixed/) | Data Archiving and Network Services (DANS) | Free under an unknown license | Web based | General
NLNZ Metadata Extraction Tool (http://meta-extractor.sourceforge.net/) | The National Library of New Zealand (NLNZ) | Apache Public License (version 2) | Java based | General
PREMIS in METS Toolbox (http://pim.fcla.edu/) | The Florida Center for Library Automation, sponsored by the Library of Congress | Free and open source | Web based | General
The BagIt Library (http://sourceforge.net/projects/loc-xferutils/) | The United States Library of Congress and the National Digital Information Infrastructure and Preservation Program (NDIIPP) | BSD License | Java based | General
Checksum Checker (http://checksumchecker.sourceforge.net/) | The National Archives of Australia | GNU General Public License version 3 | Windows Application | General
SWORD (http://swordapp.org) | JISC Project | Apache License | Web based | General

Annex d. Integrated Curation System

An integrated curation system is a tool or service that supports each step of the data curation process, including data identification, data preparation and data storage management. However, several systems do not provide data storage management. Table 3D-1 lists the integrated data curation tools and Table 3D-2 lists the integrated data curation services.

Table 3D-1: List of Integrated Curation System Tools

Name | Provider or Developer | Licence | Platform or Language | Domain
ARCHER Project (http://archer.edu.au/) | The Australian Commonwealth Department of Education, Science and Training; the Australian Research Enabling Environment | GPL | Web based | Science
BigSheets | IBM | – | Java | General
Curator's Workbench (https://github.com/UNC-Libraries/Curators-Workbench) | Carolina Digital Repository, an initiative of the University Libraries of UNC Chapel Hill | Apache 2.0 | Java | General
Digital Curation for Excel (http://dataup.cdlib.org/) | California Digital Library | Open source | Windows | General
D-Net (http://www.d-net.research-infrastructures.eu) | Driver Project | Apache License | Java | General
SAP Crystal Solution (http://www.sap.com/solutions/sme/business-intelligence-crystal-solutions.epx) | SAP | Commercial | Java | General
Oracle Optimized Solution (http://www.oracle.com/technetwork/server-storage/hardware-solutions/oo-soln-lifecycle-content-mngmt-403579.html) | Oracle | Commercial | Web | General
Web Curator Tool (http://webcurator.sourceforge.net/) | Developed by the National Library of New Zealand and the British Library, initiated by the International Internet Preservation Consortium; currently maintained by Oakleigh Consulting Ltd. | Apache Public License | Web based | Web Archive

Table 3D-2: List of Integrated Curation System Services

Name | Provider or Developer | Licence | Platform or Language | Domain
DataStax (http://www.datastax.com) | DataStax Team | Free for the limited edition | Java | General
DMPTool (https://dmp.cdlib.org/) | A consortium of US research bodies | GPL | Web | Science
Open DOAR (http://www.opendoar.org/) | SHERPA | Free | Web | General


4. Data Storage

4.1. Introduction

In this section, we trace the roots of the Big Data buzz, examine the trends and problems in data storage software products, and envision future solutions and approaches to store and process data. There are many different aspects of Big Data, but in this section we limit ourselves to those aspects that are relevant to data storage, as defined in the following section.

4.1.1 Data Storage Working Group

The data storage WG is responsible for different aspects of storage, organization and manipulation of information on electronic data storage devices. These include:

 Electronic devices and resources required for data storage

 Data organization and modelling, data definition languages and schemas

 Basic data manipulation operations (Create, Read, Update, Delete - CRUD)

 Database systems architecture, availability, consistency and partition tolerance

 Data compression, data recovery, security and encryption

4.1.2 Objectives

In order to provide a comprehensive overview of the current state of the data storage area and predict possible future trends, the data storage WG studied available technical papers, including but not restricted to technical market analyses, company and industry roadmaps, and related research studies. In addition, we accumulated information from different sources to present a clear picture of the current state of Big Data storage technologies and their main advantages and disadvantages. Below are the main topics covered in this document:

 Resources to support data storage and data manipulation: In this topic we investigate volume-related questions that should provide a clear view on future data growth and on the availability of storage, CPU power and energy resources to support that growth.
 Hardware influence on future database architecture: As new hardware becomes available, we can expect it to influence data storage and data store architecture. As part of this topic we examine such relations, for instance the influence of SSDs and in-memory storage on databases, or network quality and partition tolerance problems.
 Data Store technologies: Big Data triggered an explosion of new software paradigms and technologies. As part of this topic we focus on existing technologies and possible future trends.
 Standardization and integration: The technological explosion has led to significant complexity in managing and integrating Big Data technologies. As part of this topic we examine standardization and integration possibilities.
 Efficient Data: To address data growth, two approaches are possible: (1) an extensive approach, based on resource growth such as storage and CPU, and (2) an intensive approach, which looks for better use of the available resources. For instance, more data does not necessarily create more information, and as a result sometimes only a small portion of the data is really valuable. As part of this topic we study different approaches for the intelligent use of resources, such as data marketplaces, as well as research perspectives on dealing with Big Data by applying principles similar to information theory.
 Security and privacy: Data everywhere poses new challenges to data security and privacy. As part of this topic we address security- and privacy-related technologies and future trends.


4.2. State of the Art

In this section we assess the current state of the art in data store technologies that are capable of handling large amounts of data and also identify data store related trends.

4.2.1 Background

During the last decade, the need to deal with the data explosion, together with the hardware shift from a scale-up to a scale-out approach, led to an explosion of new Big Data software technologies and solutions. These technologies can be roughly assigned to two main classes: Big Data Stores and a set of technologies that we call "Distributed Processing Frameworks". We define Distributed Processing Frameworks as software technologies that support the development and execution of distributed algorithms based on a distributed programming model; the two best-known examples today are MapReduce and Bulk Synchronous Parallel (a minimal sketch of the MapReduce model is given after the list below). Distributed Processing Frameworks are usually used for analytical purposes or as an infrastructure layer for some Big Data Querying Platforms (in a data store role) such as Hive. As for Big Data Stores, we can distinguish the following types:

 Distributed File Systems, such as the Hadoop Distributed File System (HDFS), which offer the capability to store large amounts of data in a reliable way on commodity hardware.
 NoSQL Databases: Probably the most important family of Big Data storage technologies are the NoSQL database management systems. NoSQL databases use data models other than the relational model known from the SQL world and do not necessarily adhere to the transactional properties of atomicity, consistency, isolation and durability (ACID).
 Big Data Querying Platforms: Technologies that are layered over NoSQL databases (Drill) or over the MapReduce distributed processing framework (Hive). Their main goal is to provide an SQL-like interface to developers, whether to support ad-hoc queries or batch processing.
 NewSQL Databases: Another technology set to be considered as part of Big Data storage technologies is NewSQL databases. NewSQL is shorthand for the various new scalable, high-performance SQL databases.
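To make the MapReduce programming model referenced above concrete, the following minimal Python sketch imitates its map, shuffle and reduce phases on a single machine for a word-count task. It is purely illustrative: the function names are our own and do not correspond to the API of Hadoop or any other framework.

from collections import defaultdict

def map_phase(documents):
    # Emit intermediate (key, value) pairs; here one (word, 1) pair per word.
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Group all intermediate values by key, as the framework would do between phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Combine the values of each key into the final result.
    return {key: sum(values) for key, values in grouped.items()}

documents = ["big data stores", "big data querying platforms"]
print(reduce_phase(shuffle(map_phase(documents))))
# {'big': 2, 'data': 2, 'stores': 1, 'querying': 1, 'platforms': 1}

In a real framework the map and reduce phases run in parallel on many nodes, and the shuffle is performed over the network; the data flow, however, is the same.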

Big Data Stores are used in a similar way as regular RDBMSs, specifically for Online Transactional Processing (OLTP) solutions and data warehouses over structured or semi-structured data. NoSQL databases can further be classified into four main categories:

 Key-value stores: Key-value stores such as the Table Service (Microsoft Azure Table Service) allow storage of data in a schema-less way. Data objects can be completely unstructured or structured and are accessed by a single key. As no schema is used, it is not even necessary that data objects share the same structure. (A small sketch contrasting key-based and content-based access follows this list.)
 BigTable clones: BigTable clones (columnar stores) refer to implementations that are based on Google's BigTable data model and storage system. BigTable is a sparse, distributed and persistent multi-dimensional sorted map in which data is indexed by a triple of a row key, a column key and a timestamp. The value is represented as an uninterpreted string data type. BigTable data is accessed by column families, i.e. sets of related column keys that effectively compress the sparse data in the columns. Column families are created before data can be stored and their number is expected to be small. In contrast, the number of columns in BigTable is unlimited.
 Document databases: Document databases such as CouchDB (Apache CouchDB Project) store documents, typically encoded in a standard format such as XML or JSON. Similar to key-value stores, documents are accessed and indexed via a unique key. However, it is also possible to access documents by their contents. The querying capability typically depends on the encoding format used by the database, for instance filtering according to field values in a database that encodes documents in JSON.


 Graph databases: Graph databases such as Neo4j (Neo4j Graph Database) store data in graph structures, making them suitable for storing highly associative data, e.g. social network graphs. A particular flavour of graph databases is triple stores, such as AllegroGraph (Allegro Graph) and Virtuoso (Virtuoso), that are specifically designed to store RDF triples. However, existing triple store technologies are not yet suitable for storing truly large data sets efficiently.
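The following toy Python sketch (an in-memory stand-in, not the API of any real product) contrasts the two access patterns mentioned above: a key-value store is accessed only through its key, whereas a document store can additionally be queried by document content.

# Key-value store: opaque values, accessible only by their key.
kv_store = {}
kv_store["user:42"] = b"opaque blob of bytes"   # schema-less write
blob = kv_store["user:42"]                      # the single access path

# Document store: structured documents, also queryable by content.
doc_store = {
    "u42": {"name": "Alice", "country": "IE", "tags": ["analytics"]},
    "u43": {"name": "Bob", "country": "DE", "tags": ["storage"]},
}

def find(store, **criteria):
    # Toy content query: documents whose fields match all given criteria.
    return [doc for doc in store.values()
            if all(doc.get(k) == v for k, v in criteria.items())]

print(find(doc_store, country="DE"))   # [{'name': 'Bob', 'country': 'DE', 'tags': ['storage']}]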


Figure 4-1 shows the main NoSQL categories in the context of Complexity and Scalability aspects:

Figure 4-1: Data complexity and data size scalability of NoSQL categories. Source: Neo4j, July 2010

As introduced above, NewSQL databases are another technology to be considered as part of Big Data storage technologies; NewSQL is shorthand for the various new scalable, high-performance SQL databases. Those solutions have the following technical characteristics:

 SQL as the primary mechanism for application interaction.
 ACID support for transactions.
 A non-locking concurrency control mechanism.
 An architecture providing much higher per-node performance.
 A scale-out, shared-nothing architecture, capable of running on a large number of nodes without suffering bottlenecks.

The expectation is that NewSQL systems are about 50 times faster than traditional OLTP RDBMSs. For example, VoltDB (VoltDB) scales linearly in the case of non-complex (single-partition) queries and provides ACID support. It scales to dozens of nodes, where each node is restricted to the size of its main memory.

4.2.2 Database landscape

Big Data caused an explosion of different database tools. According to research by the 451 Group, the database landscape looks as follows:


Figure 4-2: Database landscape. Source: 451 Group, "MySQL vs. NoSQL and NewSQL: 2011-2015", 2012

Storage technologies can be classified according to their data model support, their approach to scalability, availability and consistency, as well as their best-use objectives. Such a classification supports matching a data store to specific architecture requirements by finding the best balance between different, sometimes conflicting, features. In the next section we provide some rules of thumb on their applicability.

4.2.3 Applicability of different data store technologies

To compare different technologies according to their applicability we should take into account such Big Data properties as volume, velocity etc. Although the exact solution should be defined according to the specific problem, some rules of thumb can be given. With respect to volume versus the analytics required, the following rules of thumb can be defined (as explained in Michael Stonebraker's blogs (Michael Stonebraker, What Does ‘Big Data Mean?' Part1, 2012), (Michael Stonebraker, What Does ‘Big Data Mean?' Part2, 2012), (Michael Stonebraker, What Does ‘Big Data Mean?' Part3, 2012)):

 Regular volume, basic analytics: RDBMS/BI tools.
 Regular volume, advanced analytics: statistics packages such as SAS (SAS), SPSS (SPSS) and the open source R project (R project).
 Big volume, basic analytics: Big Data querying platforms such as Drill (Drill), Impala (Impala), Hive (Hive), or parallel DBMSs such as Vertica (Vertica), Netezza (Netezza), Greenplum (Greenplum) etc.
 Big volume, advanced analytics: solutions are still in their infancy.

In-memory databases are limited in size and are usually more expensive than disk-based solutions. On the other hand, they can significantly improve performance.


In contrast to classical RDBMSs and NewSQL solutions, NoSQL solutions are capable of operating at scale in a grid computing environment (over a WAN, with a significant probability of network partitioning) and provide the best scalability. On the other hand, NoSQL solutions usually do not provide sequential consistency and are less suitable for applications with high data connectivity, which usually implies joins. In addition, NoSQL solutions are harder to maintain, as their schema is not explicitly defined.

4.2.4 Current Trends

Big Data storage technologies are a hot topic and each year many new solutions are presented to the market. Below are some notable trends and expectations:

 Memory hierarchy: As the cost of DRAM main memory decreases ($/GB: about $60 in 2010, projected to about $4 for 2015-2020), we will see more solutions using main memory to boost performance. In addition, benefits of flash memory such as low access latency, low energy consumption and persistence are being explored by vendors to create a memory hierarchy in which the hottest data resides in DRAM, cooler data resides in flash memory with a main memory interface, and the coolest data resides on HDD. As main memory becomes more affordable, many new databases build on this availability a new architecture for high OLTP performance. Such architectures, according to (Michael Stonebraker, Use Main Memory for OLTP, 2012), "get rid of the vast majority of all four sources of overhead" in classical databases, as identified in the "OLTP Through the Looking Glass, and What We Found There" research (S. Harizopoulos, 2008). The lower latency of such databases will make them suitable for near-real-time computations. Indeed, based on Gartner research, one of the promising trends is in-memory computing technology, such as in-memory database management systems and in-memory data grids (Gartner, 2012).

Figure 4-3: General purpose RDBMS processing profile. Source: Harizopoulos, Stavros, et al. "OLTP through the looking glass, and what we found there", ACM SIGMOD 2008

 New data stores that push the CAP theorem to the edge: The CAP theorem, the cornerstone of distributed computing systems, began as a conjecture by Eric Brewer and was later proven by Seth Gilbert and Nancy Lynch (Gilbert, 2002). It states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: Consistency, Availability and Partition tolerance. The first wave of NoSQL database implementations simplified the theorem to "Consistency, Availability, Partition tolerance: pick any two"; these early Big Data technologies, developed mainly in big web-oriented companies, favoured availability and partition tolerance over sequential consistency (replacing it with eventual consistency). In reality, the theorem's restrictions on combinations of the three guarantees are more fine-grained. As a result, a new wave of NoSQL databases is emerging: databases that try to keep as much Consistency, Availability and Partition tolerance as possible and, at the same time, provide a valuable alternative to traditional database indexes. For instance, the HyperDex database (Sirer, 2012) introduces hyperspace hashing to support efficient search and a novel technique called value-dependent chaining to preserve, to some degree, all three properties. In addition, as Big Data has become mainstream and more and more companies look at its potential, the landscape of the average application has changed. Many database management systems branded under the Big Data buzz today require a cluster-computing environment; some rely heavily on the computer's main memory and take the CAP theorem in a more practical way. Indeed, as mentioned earlier in this report, most Big Data related projects today deal with a few dozen terabytes rather than petabytes. Many of them can afford a few seconds for failover to replicas and are built around more efficient in-memory solutions and more reliable cluster computing. As a result, many such projects prefer consistency over immediate availability. (A toy quorum-based illustration of these consistency trade-offs follows this list.)

 New graph data storage solutions: As main memory and flash memory with a main memory interface become cheaper, graph-related problems, which usually require graph traversal and consequently loading a significant part of the data into memory, become more affordable. The Microsoft research project Trinity (Trinity) achieved a significant breakthrough in this area. Trinity is an in-memory data storage and distributed processing platform; by building on its very fast graph traversal capabilities, Microsoft researchers introduced a new approach to cope with graph queries. New analytics-related requirements and technology advances in graph processing translate into broader adoption of graph-based technologies. Indeed, web giants are already processing huge graphs such as Facebook's graph search or Google's knowledge graph. According to Jim Webber (Jim Webber, 2013), "Graph technologies are going to be incredibly important", and like Hadoop's MapReduce, which originated in Google's internal development and democratized disconnected flat data processing, the graph processing solutions currently used by web giants will eventually find their way to enterprises.

 Column-oriented databases: "A column-oriented DBMS is a database management system (DBMS) that stores data tables as sections of columns of data rather than as rows of data, like most relational DBMSs" (Column-oriented DBMS, 2013). The benefits of the column-oriented approach, such as being engineered for analytic performance, rapid joins and aggregation, a smaller storage footprint, suitability for compression, optimization for query efficiency and rapid data loading (Loshin, 2009), have become widely recognized by the database community. As a result, we might expect all SQL vendors to move to column stores (Michael Stonebraker, What Does ‘Big Data Mean?' Part3, 2012).

 Need for standardization: The dozens of current solutions, each optimized for a specific type of problem, create a new problem for organizations: the problem of standardization. To overcome this problem, current solutions should be covered by standards that are yet to be defined by the software community, as well as by abstractions that will reduce development and maintenance costs.

 Semantic abstraction: The multitude of heterogeneous data sources increases development costs, as applications require knowledge about the individual data formats of each source. An emerging trend is the semantic web (Semantic Web), and in particular the semantic sensor web (Semantic sensor web homepage), which tries to address this challenge. A multitude of research projects are concerned with all levels of semantic modelling and computation. We expect that in the coming years semantic technologies will become usable for Big Data problems. This trend is facilitated by the emergence of new graph data storage solutions, as detailed above, and may contribute to standardization.
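The consistency trade-offs discussed in the list above are often exposed in practice through tunable replication quorums. The following single-process Python sketch is only a toy model (no networking, no failures): with N replicas, choosing write and read quorums such that R + W > N guarantees that a read overlaps the latest write, while smaller quorums favour availability and yield only eventual consistency.

import random

class ReplicaSet:
    # Toy model of N replicas holding a versioned value for a single key.
    def __init__(self, n):
        self.replicas = [{"version": 0, "value": None} for _ in range(n)]

    def write(self, value, w):
        # Acknowledge the write after updating only W replicas; the rest lag behind.
        version = max(r["version"] for r in self.replicas) + 1
        for replica in random.sample(self.replicas, w):
            replica.update(version=version, value=value)

    def read(self, r):
        # Contact R replicas and return the freshest value seen.
        return max(random.sample(self.replicas, r),
                   key=lambda replica: replica["version"])["value"]

rs = ReplicaSet(n=3)
rs.write("v1", w=2)
print(rs.read(r=2))   # R + W > N: always returns "v1"
print(rs.read(r=1))   # R + W <= N: may still return the stale initial None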

4.2.5 Standardization

Contrary to the well-standardized RDBMS world, the NoSQL world lacks standards. Currently no standards exist beyond the de-facto Blueprints standard for graph databases and the triple store data manipulation language SPARQL, which is supported by triple store vendors. Other NoSQL databases usually provide their own declarative language or API, and standardization for these declarative languages is missing. Although declarative language standardization is currently still missing for some database categories (key/value, document, etc.), it seems to be just a matter of time until such standards become available. At the same time, creating one language for different NoSQL database types is a hard task with an unclear outcome. One such attempt to create a unified query language (UnQL) seems to have failed, as each database type has too many specific features to be covered by one unified language. Another promising direction could be the creation of a heterogeneous, extendable language with common core operations that can be extended for each database type according to its data model; a minimal sketch of this idea is given below. As for distributed processing frameworks (in their database role), the situation is similar: they lack a standardized declarative language for CRUD.
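To illustrate the idea of a common core extended per database type, the following Python sketch defines a minimal shared CRUD interface and one store-specific extension. The class and method names are our own invention for illustration and are not part of any existing or proposed standard.

from abc import ABC, abstractmethod

class CoreStore(ABC):
    # Hypothetical common core: CRUD operations every store type would share.
    @abstractmethod
    def create(self, key, value): ...
    @abstractmethod
    def read(self, key): ...
    @abstractmethod
    def update(self, key, value): ...
    @abstractmethod
    def delete(self, key): ...

class InMemoryGraphStore(CoreStore):
    # A graph store implements the core and adds model-specific operations.
    def __init__(self):
        self._nodes, self._edges = {}, {}
    def create(self, key, value): self._nodes[key] = value
    def read(self, key): return self._nodes[key]
    def update(self, key, value): self._nodes[key].update(value)
    def delete(self, key):
        self._nodes.pop(key)
        self._edges.pop(key, None)
    # Extensions beyond the common core: edges and traversal.
    def link(self, a, b): self._edges.setdefault(a, set()).add(b)
    def neighbours(self, key): return self._edges.get(key, set())

A key-value or document store would implement the same four core operations and add its own extensions (e.g. content queries), so that applications depending only on the core remain portable across database types.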

4.2.6 Integration

Many directions are possible for integrating the abundance of existing Big Data technologies. Integration of different distributed processing platforms is one such direction: the Hadoop YARN project (Apache Hadoop Yarn project) goes in this direction, and more solutions supporting additional programming paradigms should be expected. Federation of heterogeneous database categories (such as key/value, document, graph etc.) is another possible direction for integration; it faces challenges such as mapping between data models and defining a standard query language for all database categories. There are also other possibilities for technology integration, for instance integrating federations of heterogeneous databases with different distributed programming platforms or with analytical packages like SAS (SAS), the R project (R project) etc. Indeed, many NoSQL databases already integrate with Hadoop, so it is natural to expect other distributed programming platforms (based, for instance, on the Bulk Synchronous Parallel approach) to be integrated with databases as well.

4.3. Future Trends

There is a lot of buzz over "Big Data". But what actually is Big Data? From the very beginning of the computer industry, data sets have always grown bigger and IT departments have always been concerned about insufficient resources to support this growth. So what caused the Big Data phenomenon? One possible way to look at it is through the eyes of the semiconductor industry, asking: are we experiencing a tectonic change in the semiconductor industry that influences the nature of software development? In order to answer these questions we studied data growth trends, storage technology trends and CPU development trends, as well as their influence on energy consumption.

4.3.1 Data growth

A number of studies have been conducted to estimate the amount of data created annually and the growth trends of data creation. The first study (Lyman P. V., 2000) was followed by a second edition in 2003 (Lyman P., 2003). According to this study, the amount of new stored data (analog and digital) grew by about 30% a year between 1999 and 2002, reaching 5 exabytes (EB) in 2002, of which 92% was stored on magnetic media, mostly hard disks. The second study was performed by the analyst firm IDC in 2007 (IDC, 2008) and has been updated every year since then. According to this study, the so-called digital universe ("a measure of all the digital data created, replicated, and consumed in a single year") was 281 EB in 2007 and 1.8 zettabytes (ZB) in 2011, and we should expect 40-60% annual digital data growth (while storage grows at 35% per year), which will bring us to 40 ZB by the year 2020 (IDC, December 2012).


Figure 4-4: Data growth between 2009 and 2020. Source: IDC's Digital Universe Study, sponsored by EMC, December 2012

In the same study, total storage capacity was estimated, and it was found that storage capacity growth falls short of data growth. Specifically, world storage capacity in 2007 was estimated at 264 EB, compared to a digital universe of 281 EB. Hard disks made up the lion's share of storage in 2007 (52%), while optical storage contributed 28%. The annual storage growth per capita over two decades was estimated at 23%; the lower growth rate results from the relatively high base level provided by prevalent analog storage devices. Where is this data coming from? A number of reasons support today's fast data growth rate. Below are some of them, as listed in (Michael Wu, 2012):

 Ubiquity of data capturing devices. Examples are cell phones, digital cameras, digital video recorders, etc.

 Increased data resolution. Examples are higher-density CCDs in cameras and recorders; scientific instruments, medical diagnostics, satellite imaging systems and telescopes that benefit from increased spatial resolution; and faster CPUs that allow data to be captured at higher sampling rates.

 Super-linear scaling of the data production rate, which is particularly relevant to social data.

When looking at data growth we can list the following sources of Big Data:

 As users become a valid source of data, we experience a jump in the growth rate of data. In fact according to (Booz&co, 2012), by 2015, 68% of the world’s unstructured data will be created by consumers.

 Our business and analytical needs caused an explosion of machine-created data. For example, the Large Hadron Collider generates 40 TB of data every second during experiments. According to IDC (IDC, 2012), machine-generated data is increasing "from 11% of the digital universe in 2005 to over 40% in 2020".

 Large retailers and B2B companies generate multitudes of transactional data; companies like Amazon generate petabytes of it.

The ubiquity of data capturing devices and increased data resolution have contributed substantially to image-based data: more than 80 percent of the digital universe consists of images, such as pictures, surveillance videos and TV streams. Though Big Data is a very wide phenomenon, the digital universe does not resemble the world economy, the workforce or the population. According to EMA & 9sight (Barry Devlin, 2012), the Communication-Media-Entertainment industry, for instance, is responsible for approximately 50% of the digital universe while its share of worldwide gross economic output is less than 5%. On the other hand, the manufacturing and utilities industry, which contributes approximately 30% of worldwide gross economic output, is responsible for only approximately 15% of the digital universe. The disproportion is even bigger in the professional services industry, whose share of worldwide gross economic output is approximately 4 times bigger than its share of the digital universe. This might suggest that the cost effectiveness of adopting Big Data technologies varies across industries. Another way to look at the applicability scope of Big Data technologies is to examine the major sources of big data sets. Figure 4-5 shows such potentially useful Big Data sources.

Figure 4-5: Useful Big Data sources. Source: IDC's Digital Universe Study, sponsored by EMC, December 2012

The growth rate of unstructured data in the digital universe is much higher (approximately 15 times) than that of structured data (Paul S. Otellini, 2012) and, as mentioned earlier, is driven mainly by image-based data. But according to EMA & 9sight (Barry Devlin, 2012), the data sources used in practical Big Data projects are mainly structured operational data (50%). Most probably this is because our capability to extract meaningful information from unstructured data is inferior to our capability to deal with the well-known structured world. In fact, according to IDC (IDC, 2012), "only 3% of the potentially useful data is tagged and even less is analyzed".

Figure 4-6: Percentage of data being analysed. Source: IDC's Digital Universe Study, sponsored by EMC, December 2012

But general data growth is not the real concern for organizations; they are interested in their own data size and data growth rate. According to EMA & 9sight (Barry Devlin, 2012), data growth for organizational needs is very different: in some organizations the growth rate is insignificant, in others it is above 50%, with the most common growth rate being around 25%. On the practical level, according to the same study, organizations define "Big Data" projects with significantly smaller data sizes than those generally considered as Big Data. Only 5% of organizations deal with data sets above 1 PB, while the majority deal with data sets somewhere between 1 TB and 100 TB. As a result, the question to be asked is: how is it possible that data growth in organizations is relatively modest while general data growth is so high? Unfortunately we do not have appropriate data to clarify whether this disproportion is caused by a high growth rate in user-stored data, a long-tail distribution of growth rates between organizations, fast-growing data sets that are not analysed in Big Data projects, or some other reason. Another interesting question is whether processing a few dozen terabytes of data, as these "Big Data" projects suggest, is indeed a problem. If so, the reason is probably not the data volume (and probably not the velocity) but the algorithmic complexity of the analysis that makes the effort so big. Another explanation for such "small" Big Data projects could possibly be found in the way people and organizations define the term "Big Data". It seems that at least two commonly accepted definitions coexist: one defining Big Data as a "collection of data sets so large and complex that it becomes difficult to process" (Big Data, 2013), and another implicit definition associated with distributed technologies and advanced analytical processing (data analysis that goes beyond classical BI reports and OLAP capabilities).

4.3.2 Storage Capacity

When looking at the data growth rate it is important to realize that the rate by itself is not important; it becomes important only when compared with our capability to store and to process data. As mentioned earlier, storage capacity grows more slowly than data (see Figure 4-7) and we should expect a significant gap by 2020.

Figure 4-7: The emerging gap

As a result, though the cost per GB of storage will decrease, total investment will rise (IDC, 2011):


Figure 4-8: Cost of storage. Source: IDC's Digital Universe Study, sponsored by EMC, December 2012

Today our storage is based mainly on hard disk drives (HDD) and will remain so in the foreseeable future. HDD areal density is still far from its fundamental limits (Mark H. Kryder, 2009) and its future looks bright at least until the end of this decade. Yet HDD capacity growth is slowing down: capacity grew approximately 40% per year in the last decade, but estimates of the compound annual growth rate of HDD areal densities over the next five years are significantly smaller, at about 20% annual growth (R. E. Fontana, 2012). As for flash memory, it is expected to grow at a rate of 20-25% per year, to reach its limits in 10 years (ITRS, 2011), and it will not be able to compete with HDD in size and price per GB. Later on, NAND-based flash memory for SSD (Arie Tal, 2002) will probably be replaced by PCRAM (phase change RAM) or STTRAM (spin-transfer torque RAM) memory (Mark H. Kryder, 2009). According to (Frost & Sullivan, 2010), the market might expect holographic storage in the near term if R&D organizations enhance their efforts in realizing commercial products. MRAM (magneto-resistive RAM) has opportunities in the medium to longer term as an alternative to DRAM and/or flash memory in certain applications. Other memory types, such as nano memory (memory based on the position of carbon nanotubes on a chip-like substrate), hold potential in the long run.


Figure 4-9: Memory capacity trends. Source: ITRS, 2011

4.3.3 Computer performance

In order to deal efficiently with ever-growing data sets, it is important to understand not only our capacity to store the data but also our capability to process it. Four main sources provided the basis for our estimation of computer performance: Intel publications (Patrick P. Gelsinger, 2008), (Kirk Skaugen, 2011), (Jesse Treger, 2012), the semiconductor industry roadmap, the report by Martin Hilbert (Martin Hilbert et al., 2011) and the report by the US National Research Council (Samuel H. Fuller & Lynette I. Millett, 2011). According to these sources, growth in the number of transistors per square inch will continue as projected by Moore's law, but single-processor performance growth slowed down significantly in the early 2000s, from a previous ~50% to the current ~10-20% year-on-year increase. Current computing performance growth is based mainly on adding processing cores to the CPU, but according to the US National Research Council (NRC), even the performance growth of computing systems based on multi-processor parallel systems "will become limited by power consumption within a decade". The end of single-processor performance scaling poses a new challenge for the software industry: future growth in computing performance will have to come from software parallelism, yet an adequate general-purpose programming paradigm does not exist yet. In addition, the NRC report points out that computer memory speed grows more slowly than CPU speed (DRAM speeds increase approximately 2-11% annually). This imbalance creates a continuously growing gap between memory system and processor performance and adds another challenge for software developers seeking high computing performance.


Figure 4-10: Micro-processing functions per chip prediction by ITRS. Source: ITRS, 2011

4.3.4 Energy consumption

Energy is an important global resource and its importance for the computer industry grows with the growing demand for computation. According to (Koomey J.G., 2008), data centres' energy consumption in 2000 was 70.8 BkWh, or 0.53% of the world's total electricity consumption. By 2005 that share had almost doubled to 0.97% of the world's total electricity consumption, or 152.5 BkWh. According to the same source, consumption growth slowed in the period between 2005 and 2010: data centres consumed 235.5 BkWh, or 1.3% of the world's total, in 2010. This is mainly due to significant efforts by leading computer vendors to cut computer energy consumption. Though currently only a small percentage of the world's electricity consumption is spent on data centres, energy costs already heavily influence the computer industry, as there is a correlation between CPU speed and CPU energy consumption, and of course between energy consumption and data centre costs. Figure 4-11 shows the increasing influence of power costs on data centre expenses.

Figure 4-11: Costs of power and cooling. Source: IDC's Digital Universe Study, sponsored by EMC, March 2008


4.3.5 Efficient Data

As mentioned earlier, our capability to deal with ever-growing data sets is theoretically limited, and it seems that these limits are starting to constrain our capability to proceed with an extensive approach to data. This naturally raises alternatives. For instance, instead of every party keeping all of its data only for itself, one might consider a centralized approach to data storage, which would reduce the number of data copies in the world. Such an approach fits well with cloud technologies, and such marketplaces already exist, for instance the Microsoft data market. Currently, the data available at such markets is data that the market owner finds useful to keep and share, but in the future it might include additional data sources, such as data originally provided by a data consumer who seeks to reduce storage costs by sharing it with others. Another possible approach is based on ideas similar to information theory. It is well known that more data does not necessarily mean more information; data redundancy can, for instance, be the result of multiple copies, frequent measurements of slowly changing data or cross-correlation of data sources. A capability to measure the marginal value of each piece of data in a data set could help us find the optimal data size.
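As a toy illustration of this information-theoretic view, the Python sketch below uses compressed size as a crude proxy for information content and measures the marginal contribution of a new piece of data; the "sensor" example data is invented for illustration only.

import os
import zlib

def information_proxy(data: bytes) -> int:
    # Compressed size in bytes: a crude proxy for the information a blob carries.
    return len(zlib.compress(data))

def marginal_value(existing: bytes, new_piece: bytes) -> int:
    # Additional compressed bytes contributed by the new piece, given what we already hold.
    return information_proxy(existing + new_piece) - information_proxy(existing)

novel = os.urandom(100_000)              # e.g. genuinely new, incompressible measurements
redundant = b"sensor=21.5C\n" * 8_000    # frequent samples of a value that never changes

print(information_proxy(novel))          # close to 100,000: almost every byte carries information
print(information_proxy(redundant))      # a few hundred bytes: most of the volume is redundancy
print(marginal_value(redundant, b"sensor=21.5C\n"))   # near zero: the new sample adds no information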

4.4. Challenges for Security and Privacy

The CSA Big Data Working Group interviewed Cloud and Big Data security experts to determine the following "Top 10 Big Data Security and Privacy Challenges" (Big Data Working Group, 2012):

1. Secure computation in distributed programming frameworks
2. Security best practices for non-relational data stores
3. Secure data storage and transactions logs
4. End-point input validation/filtering
5. Real-time security/compliance monitoring
6. Scalable and composable privacy-preserving data mining and analytics
7. Cryptographically enforced access control and secure communication
8. Granular access control
9. Granular audits
10. Data provenance

Five of these challenges (numbers 2, 3, 7, 8 and 10) are directly related to data storage. In the following we discuss the security and privacy challenges related to data storage in more detail.

4.4.1 Security Challenges for Non-Relational Data Stores

Major security challenges for Big Data storage are related to non-relational data stores. For companies it is often not clear how NoSQL databases can be implemented securely. Many security measures that are implemented by default within traditional RDBMSs are missing in NoSQL databases. However, the security threats for NoSQL databases are generally very similar to those of traditional RDBMSs, and therefore the same best practices should be applied (Davey Winder, 2012):

 Encryption of sensitive data.
 Sandboxing of unencrypted processing of data sets.
 Validation of all input (a minimal validation sketch is given at the end of this subsection).
 Strong user authentication mechanisms.

However, the focus of NoSQL databases is on solving the challenges of the analytics world. Security was not a priority at the design stage and there are no accepted standards for authentication, authorization and encryption for NoSQL databases yet. For example, the NoSQL databases Cassandra and MongoDB have the following security weaknesses (Lior Okman, 2011):

 Encryption of data files is not supported.
 Connections may be compromised due to weak authentication mechanisms (for client/server as well as server/server communication).


 Only a very simple authorization mechanism is provided; RBAC and fine-grained authorizations are not supported.
 The database systems are vulnerable to different kinds of injection (SQL, JSON and view injection) and to Denial of Service attacks.

Some NoSQL products recommend the use of the database in a trusted environment with no additional security or authentication measures in place. However, in an interconnected Internet world this is not a reasonable approach to protecting data. The security concept of NoSQL databases generally relies on external enforcement mechanisms. Developers or security teams should review the security architecture and policies of the overall system and apply their own encryption and authentication controls to safeguard their Big Data in NoSQL databases. Security mechanisms should be embedded in the middleware or added at the application level (Big Data Working Group, 2012); many middleware platforms provide ready-made support for authentication, authorization and access control. Security of NoSQL databases is getting more attention from security researchers and hackers, and as the market for NoSQL solutions matures, security mechanisms will improve. For example, there are initiatives to provide access control capabilities for NoSQL databases based on Kerberos authentication modules (Davey Winder, 2012).
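As a minimal sketch of the input-validation best practice in a schema-less setting, the following Python snippet enforces an application-level field allow-list before a record is inserted; the field names and store are purely illustrative and not tied to any particular NoSQL product.

ALLOWED_FIELDS = {"username": str, "email": str, "age": int}

def validate(record: dict) -> dict:
    # Allow-list validation performed by the application, since the store enforces no schema.
    unknown = set(record) - set(ALLOWED_FIELDS)
    if unknown:
        raise ValueError(f"unexpected fields: {unknown}")
    for field, expected_type in ALLOWED_FIELDS.items():
        if field not in record or not isinstance(record[field], expected_type):
            raise ValueError(f"missing or badly typed field: {field}")
    return record

# Operator-style keys (e.g. '$where' in some document stores) are rejected because they
# are not on the allow-list, which blocks that class of injection at the application level.
store = {}
store["u1"] = validate({"username": "alice", "email": "a@example.org", "age": 30})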

4.4.2 Security Challenges in Multi-Tiered Data Storage

Particular security challenges for data storage arise in multi-tier based solutions. The increasing size of data sets necessitates auto-tiering to optimize Big Data storage management and to achieve scalability and availability. In this case operators give away control over data storage compared to manual solutions, where the IT manager decides what data is stored where and when it should be moved to another location: instead, an algorithm implemented in a software module automatically decides when data should be moved and stored in another location. Automatic tracking and logging mechanisms recording where data is stored are missing in existing auto-tiering solutions. Auto-tier storage solutions save money for Big Data storage, as rarely used data can be moved to a lower and cheaper tier, but this may lead to privacy and security problems. Some data sets may not be used very often but may still contain critical information of a company, for example research results or patent information. Lower tier storage is more cost efficient but may also provide lower security; in this way the security and privacy of critical data can be put at risk. Therefore, auto-tiering strategies have to be designed very carefully, and monitoring and logging mechanisms should be in place in order to have a clear view of data storage and data movement in auto-tiering solutions (Big Data Working Group, 2012). A toy policy sketch follows.
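The sketch below illustrates, in Python, what such a carefully designed policy could look like: a toy tiering rule that never demotes data flagged as sensitive below a given tier and logs every placement decision. Tier names, thresholds and the sensitivity flag are invented for illustration.

import logging

logging.basicConfig(level=logging.INFO)

def choose_tier(days_since_access: int, sensitive: bool) -> str:
    # Toy policy: cold data moves down, but sensitive data never reaches the archive tier.
    if days_since_access < 7:
        tier = "hot-ssd"
    elif days_since_access < 90:
        tier = "warm-hdd"
    else:
        tier = "cold-archive"
    if sensitive and tier == "cold-archive":
        tier = "warm-hdd"
    return tier

def place(object_id: str, days_since_access: int, sensitive: bool) -> str:
    tier = choose_tier(days_since_access, sensitive)
    # Track and log every placement decision, as recommended above.
    logging.info("object %s placed on tier %s (sensitive=%s)", object_id, tier, sensitive)
    return tier

place("patent-draft-2013", days_since_access=400, sensitive=True)   # stays on warm-hdd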

4.4.3 Challenges in Cryptographically Enforced Access Control and Secure Communication

Data should only be accessible by authorized entities; nevertheless, even sensitive data is often stored unencrypted in the cloud. Efficient mechanisms for authentication, secure communication and access control in distributed Big Data environments are required. Data should be encrypted based on access control policies, and end-to-end communication has to be cryptographically secured. For these purposes new cryptographic mechanisms are required that provide the necessary functionality in an efficient and scalable way. Attribute-based encryption (ABE) (Vipul Goyal, 2006), an approach based on public key cryptography, can be used as a starting point: it enables access control based on attributes of an entity instead of the identity of the entity itself. Less sensitive data may be stored unencrypted, but this data too should be exchanged securely using a cryptographically secure communication framework. Most providers offer mechanisms to store encrypted data in the cloud. However, encrypted data is only effectively protected if the encryption keys are generated and stored at the client and not at the provider; otherwise users cannot be sure whether the storage provider can decrypt the data and pass it on to other parties. According to (Moritz Borgmann, 2012), three approaches that encrypt data before transmission to the cloud are recommended: directory-based encryption, container-based encryption and manual encryption. The sketch below illustrates the principle of client-held keys.
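The following Python sketch shows the client-held-key principle using symmetric encryption from the cryptography package; it is a minimal illustration, not ABE, and the upload_to_cloud function and bucket are placeholders rather than any provider's API. Real deployments additionally need key management, rotation and protection of the key store.

from cryptography.fernet import Fernet

# The key is generated and kept at the client; the storage provider never sees it.
key = Fernet.generate_key()
cipher = Fernet(key)

cloud_bucket = {}   # placeholder for the provider's storage service

def upload_to_cloud(name: str, blob: bytes) -> None:
    cloud_bucket[name] = blob

ciphertext = cipher.encrypt(b"sensitive patient record")
upload_to_cloud("record-001", ciphertext)

# Only a holder of the client-side key can recover the plaintext.
plaintext = cipher.decrypt(cloud_bucket["record-001"])
assert plaintext == b"sensitive patient record"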


4.4.4 Security and Privacy Challenges of Granular Access Control

The handling of diverse data sets is a challenge, not only in terms of different structures and schemas but also with respect to diverse security requirements. There is a plethora of restrictions that have to be considered, including legal restrictions, privacy policies and other corporate policies. Access control mechanisms are required to ensure data secrecy and to prevent access to data by people who should not have access. The problem with coarse-grained access mechanisms is that data has to be stored in more restrictive areas to guarantee security and cannot be shared. Granular access control enables operators to share data at a fine-grained level without compromising secrecy; fine-grained access control, and not only an all-or-nothing retrieval policy, is therefore required for data storage in order to enable effective analytics on Big Data. However, the increased cost of developing applications based on fine-grained access control mechanisms leads to a restricted approach in which only a small number of people and systems can participate in the analysis. Therefore, efficient solutions for granular access control in data storage for Big Data are required to enable analytical systems to work in this increasingly complex security environment. A toy field-level policy sketch follows.
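A minimal Python sketch of field-level, attribute-driven access control follows; the policy, attributes and record fields are invented for illustration, and the check is a plain policy decision rather than cryptographic enforcement.

# Field-level policy: user attributes required to read each field.
POLICY = {
    "diagnosis": {"role": "physician"},
    "birth_date": {"role": "physician"},
    "zip_code": {},                       # unrestricted: usable by analysts
}

def readable_fields(record: dict, user_attrs: dict) -> dict:
    # Return only the fields whose policy is satisfied by the user's attributes.
    return {field: value for field, value in record.items()
            if all(user_attrs.get(k) == v for k, v in POLICY.get(field, {}).items())}

record = {"diagnosis": "A12.3", "birth_date": "1970-01-01", "zip_code": "60311"}
print(readable_fields(record, {"role": "analyst"}))     # {'zip_code': '60311'}
print(readable_fields(record, {"role": "physician"}))   # the full record

With such a policy in place, an analytics job can run over the shared, unrestricted fields without ever receiving the restricted ones, instead of being denied access to the whole record.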

4.4.5 Challenges in Data Provenance

For some security applications, information about the history of a data object, such as its creation date, is required. In general, the main aspects of data provenance are ownership of the data and data usage. In Big Data applications the complexity of provenance metadata will increase significantly: provenance-enabled programming environments allow the generation of large provenance graphs, and the analysis of such graphs for security applications can be computationally intensive, requiring fast algorithms. The challenge of secure data provenance is to guarantee the integrity and confidentiality of provenance information. It has to be ensured that the history of a data object cannot be changed and that users can specify who is able to track their actions on data objects. A minimal integrity-protecting sketch follows.
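As a minimal sketch of integrity-protected provenance, the following Python snippet hash-chains an append-only history of operations on a data object, so that any later tampering with an entry is detectable; actors and actions are invented for illustration, and confidentiality (who may read the history) is not addressed here.

import hashlib
import json
import time

def _digest(entry: dict, previous_hash: str) -> str:
    payload = json.dumps(entry, sort_keys=True) + previous_hash
    return hashlib.sha256(payload.encode()).hexdigest()

class ProvenanceLog:
    # Append-only, hash-chained history of operations on a data object.
    def __init__(self):
        self.entries = []

    def record(self, actor: str, action: str) -> None:
        previous = self.entries[-1]["hash"] if self.entries else ""
        entry = {"actor": actor, "action": action, "time": time.time()}
        entry["hash"] = _digest(entry, previous)
        self.entries.append(entry)

    def verify(self) -> bool:
        # Recompute the chain; any modified entry breaks all hashes that follow it.
        previous = ""
        for entry in self.entries:
            expected = _digest({k: v for k, v in entry.items() if k != "hash"}, previous)
            if entry["hash"] != expected:
                return False
            previous = entry["hash"]
        return True

log = ProvenanceLog()
log.record("ingest-service", "created")
log.record("curator-7", "cleaned missing values")
assert log.verify()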

4.4.6 Technical and Operational Recommendation

Some general findings along these lines have also been drawn in (Lane A, 2012), where further technical and operational recommendations are made, particularly for the Hadoop stack. It has been noted that this and similar available stacks do not offer much built-in security and sometimes lack the most basic mechanisms, so malicious nodes can easily corrupt the whole system. While node authentication can be established through the Kerberos system provided by the stack, stolen keys (e.g. taken from a node image) are an issue and the setup of a Kerberos system has its difficulties; nevertheless, proper key and certificate management is required. There is a strong need for file-system encryption to protect the data against malicious local administrators. Encryption can also be used to establish separation of duty, but this has to be provided through the design of the overall system and is not provided out of the box. While there is a strong need for secure communication, as the nodes by default trust the various requests sent via RPC, TLS or SSL is not part of the usual stack. Some third-party software is available, but its origin in traditional systems limits its scalability or restricts the core functionality of a Big Data installation. Detailed fine-granular access control is not available; instead it has to be built, depending on the application, on top of the primitives provided by HDFS. There are also open issues with API security, which has to protect the system from command injection and other known attacks. In addition, mechanisms to carefully control MapReduce requests are missing.


4.4.7 Privacy Problems in Big Data storage

As social security numbers are used in the USA for various identification and authentication purposes, their privacy, or rather confidentiality, has become crucial. In (Lane A, 2012) it has been shown that Big Data analysis of publicly available information can be exploited to derive the social security number of a person. Together with other aspects of Big Data, the implications for users' privacy were discussed at a roundtable organized by the Aspen Institute (Bollier D, 2010). In particular the use of patient information was addressed: while anonymising or de-identifying the data is one approach to preserving the privacy of the patient, the huge amount of data may enable re-identification later on. It has also been noted that actual control over data is lost as soon as it is sent out.1 Regarding the abuse of Big Data, two general options were advocated during the roundtable (Bollier D, 2010). One direction was described as a "new deal on Big Data", which sees the end user as the owner of the data, gives the data a well-defined lifespan, and foresees company audits of data usage as well as general transparency about the handling of data and the algorithms applied; the envisioned regulation has some similarities with the European data protection approach. Other participants doubted that a regulatory approach would be adequate for Big Data, as it enforces centralized control. Instead, breaches should be addressed through "new types of social ordering and social norms", which was seen as quite a visionary approach; more practically, protection should be enforced through a private law regime, i.e. contracts between companies and individuals. Anonymised data, or data commons, have been identified as a key topic for research. However, it has been observed (Acquisti A., 2009) that the privacy of such data is key to the user's trust in the system and to their willingness to provide their information.

4.4.8 Conclusions

 Digital data growth rate (40-60%) is bigger than storage growth rate (20-35%). As a result, we should expect storage capacity challenges. This is especially true for image-based data.

 Computational power growth (20-25% for CPU and 2-11% for memory speed) is slower than data growth and thus we should expect significant challenges in processing and analysis of data.

 Electrical power does not yet restrict our capability to deal with Big Data, though it imposes significant costs on such operations.

With a better understanding of the resources available to support computation, we can now better understand the origins of Big Data as well as its future. Today, the growth of computer systems can still be supported by improvements in hardware technology, but this growth is not sustainable in the long run and we are already experiencing the first warning signals. As a result, we might start thinking about switching from an extensive approach to an efficient approach to dealing with data. Such an efficient approach ("Smart Data", as coined by Michael Wu, Lithium's Principal Scientist of Analytics), which might be based, for instance, on information theory, should be able to evaluate the value of each piece of data in order to make a conscious decision on whether to keep it or dispose of it, in a way similar to the human brain.

4.5. Summary

This paper provides a high-level overview of the current state of the Big Data storage area. It provides a systematic view of the large number of different data storage approaches, shows possible origins of Big Data, and extrapolates possible future implications of current trends as well as some possible solution directions. Based on this better understanding, our next goal for the final white paper version is to further engage with major industry and academia stakeholders in a discussion about a European roadmap for Big Data storage technologies. Our objective is to better understand how Big Data storage technologies will have to be further developed in order to fill the gaps of the respective industry segments. We invite stakeholders from industry and academia to participate in this discussion.

1 The concept of attaching policies to data (containers) and enforcing them wherever the data are sent has been presented in (Karjoth G., 2002) and discussed in (Bezzi M., 2011), (Antón A.I. et al., 2007).

4.6. Abbreviations and acronyms

RAM: Random access memory
DRAM: Dynamic random access memory
HDD: Hard disk drive
HDFS: Hadoop distributed file system
MRAM: Magneto-resistive RAM
STTRAM: Spin-transfer torque RAM
NAND: Type of flash memory named after the NAND logic gate
PCRAM: Phase change RAM
SSD: Solid state drive
CPU: Central processing unit
WG: Work group
DBMS: Database management system
RDBMS: Relational database management system
OLTP: Online transaction processing
SQL: Structured query language
SPARQL: Recursive acronym for SPARQL Protocol and RDF Query Language
UnQL: Unstructured query language
OLAP: Online analytical processing
ACID: Atomicity, Consistency, Isolation, Durability
CAP: Consistency, Availability, Partition tolerance
CRUD: Create, Read, Update, Delete
ABE: Attribute based encryption
RPC: Remote procedure call
TLS: Transport layer security
SSL: Secure sockets layer
NRC: National Research Council

4.7. References

Neo4j Graph Database. http://neo4j.org.
Acquisti A., G. R. (2009). Predicting Social Security numbers from public data. Proceedings of the National Academy of Sciences of the United States of America, 106, 27 (Jul. 2009), 10975-80.
Allegro Graph. http://www.franz.com/agraph/allegrograph/.
Antón A.I. et al. (2007). A roadmap for comprehensive online privacy policy management. Communications of the ACM, 50, 7 (2007), 109-116.
Apache CouchDB Project. http://couchdb.apache.org/.
Apache Hadoop Yarn project. http://hadoop.apache.org/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/YARN.html.
Arie Tal. (2002). NAND vs. NOR flash technology. Retrieved March 4, 2013, from www2.electronicproducts.com.


Barry Devlin, S. R. (2012). Big Data Comes of Age. An Enterprise Management Associates (EMA) and 9sight Consulting report.
Bezzi M., T. S. (2011). Data usage control in the future internet cloud. The Future Internet, 223-231.
Big Data. (2013). In Wikipedia. Retrieved April 25, 2013, from http://en.wikipedia.org/wiki/Big_data.
Big Data Working Group. (2012). Top 10 Big Data Security and Privacy Challenges. Cloud Security Alliance (CSA).
Bollier D, F. C. (2010). The Promise and Peril of Big Data. The Aspen Institute.
Booz&co. (2012). Benefitting from Big Data: Leveraging Unstructured Data Capabilities for Competitive Advantage.
Column-oriented DBMS. (2013). Retrieved April 25, 2013, from http://en.wikipedia.org/wiki/Column-oriented_DBMS.
Davey Winder. (June 2012). Securing NoSQL applications: Best practises for big data security. Computer Weekly.
Drill. http://wiki.apache.org/incubator/DrillProposal/.
Frost & Sullivan. (2010). Assessment of high-density data storage technologies: technology market penetration and road mapping.
Gartner. (2012). Taxonomy, Definitions and Vendors Landscape for In-Memory Computing Technologies. Gartner.
Gilbert, S. N. (2002). Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. ACM SIGACT News, 51-59.
Greenplum. http://www.greenplum.com/.
Hive. http://hive.apache.org/.
IDC. (2008). Diverse exploding digital universe. Digital Universe Study, sponsored by EMC.
IDC. (2011). Extracting value from chaos. Digital Universe Study, sponsored by EMC.
IDC. (2012). The digital universe in 2020. Digital Universe Study, sponsored by EMC.
IDC. (2011). The Digital Universe Decade: Are You Ready? Digital Universe Study, sponsored by EMC.
Impala. http://blog.cloudera.com/blog/2012/10/cloudera-impala-real-time-queries-in-apache-hadoop-for-real/.
ITRS. (2011). International technology roadmap for semiconductors. Retrieved March 4, 2013, from http://www.itrs.net/Links/2011ITRS/2011Chapters/2011ExecSum.pdf.
Jesse Treger. (2012). A peek at the future: Intel's technology roadmap. Intel. Retrieved April 25, 2013, from http://www.intel.com/content/www/us/en/it-managers/peek-at-the-future-rick-white-presentation.html?wapkw=roadmap.
Jim Webber. (2013). John Domingue interview with Jim Webber. BIG. Retrieved April 25, 2013, from http://fm.ea-tel.eu/fm/fmm.php?pwd=aa8bc2-32916.
Karjoth G., S. M. (2002). Privacy policy model for enterprises. Proceedings of the 15th IEEE Computer Security Foundations Workshop (CSFW-15), 271-281.
Kirk Skaugen. (2011). IDF 2011. Intel. Retrieved April 25, 2013, from http://download.intel.com/newsroom/kits/idf/2011_fall/pdfs/Kirk_Skaugen_DCSG_MegaBriefing.pdf#page=21.
Koomey J.G. (2008). Worldwide electricity used in data centers. Environmental Research Letters, 3(3), 034008.
Lane A. (2012). Securing Big Data: Security Recommendations for Hadoop and NoSQL Environments.
Lior Okman, N. G.-O. (2011). Security Issues in NoSQL Databases. In Proceedings of the 2011 IEEE 10th International Conference on Trust, Security and Privacy in Computing and Communications (TRUSTCOM '11). IEEE.
Loshin, D. (2009). Gaining the Performance Edge Using a Column-Oriented Database Management System. Sybase.
Lyman, P. (2003). How Much Information? 2003. Retrieved April 25, 2013, from http://www.sims.berkeley.edu/how-much-info-2003.
Lyman, P. V. (2000). How Much Information? 2000. Retrieved April 25, 2013, from http://www.sims.berkeley.edu/how-much-info.
Mark H. Kryder, C. S. (2009). After hard drives: what comes next? IEEE Transactions on Magnetics, vol. 45, no. 10.
Martin Hilbert et al. (2011). The world's technological capacity to store, communicate and compute information. Retrieved April 25, 2013, from http://www.uvm.edu/~pdodds/files/papers/others/2011/hilbert2011a.pdf.
Michael Stonebraker. (2012). Use Main Memory for OLTP. VoltDB Blog. Retrieved April 25, 2013, from http://blog.voltdb.com/use-main-memory-oltp/.
Michael Stonebraker. (2012). What Does ‘Big Data Mean?' Part1. BLOG@CACM. Retrieved April 25, 2013, from http://cacm.acm.org/blogs/blog-cacm/155468-what-does-big-data-mean/fulltext.


Michael Stonebraker. (2012). What Does ‘Big Data Mean?' Part2. BLOG@CACM. Retrieved April 25, 2013, from http://cacm.acm.org/blogs/blog-cacm/156102-what-does-big-data-mean-part-2/fulltext.
Michael Stonebraker. (2012). What Does ‘Big Data Mean?' Part3. BLOG@CACM. Retrieved April 25, 2013, from http://cacm.acm.org/blogs/blog-cacm/157589-what-does-big-data-mean-part-3/fulltext.
Michael Wu. (2012). Why is Big Data So Big? Lithosphere. Retrieved April 25, 2013, from http://lithosphere.lithium.com/.
Microsoft Azure Table Service. http://www.windowsazure.com/en-us/develop/net/how-to-guides/table-services/.
Moritz Borgmann, T. H. (2012). On the Security of Cloud Storage Services. Fraunhofer SIT.
Netezza. http://www-05.ibm.com/il/software/netezza/.
Patrick P. Gelsinger. (2008). 40 years of changing the world. Intel. Retrieved April 25, 2013, from http://download.intel.com/pressroom/kits/events/idfspr_2008/2008_0402_IDF-PRC_Gelsinger_EN.pdf.
Paul S. Otellini. (2012). Intel investor meeting 2012. Intel. Retrieved April 25, 2013, from http://www.cnx-software.com/pdf/Intel_2012/2012_Intel_Investor_Meeting_Otellini.pdf.
R project. http://www.r-project.org/.
R. E. Fontana, S. R. (2012). Technology Roadmap Comparisons for TAPE, HDD, and NAND Flash: Implications for Data Storage Applications. IEEE Transactions on Magnetics, volume 48.
S. Harizopoulos, S. e. (2008). OLTP through the looking glass, and what we found there. ACM SIGMOD International Conference on Management of Data. ACM.
Samuel H. Fuller & Lynette I. Millett. (2011). The future of computing performance: game over or next level? National Research Council.
SAS. http://www.sas.com/.
Semantic sensor web homepage. W3C, http://www.w3.org/2005/Incubator/ssn/.
Semantic Web. W3C, http://www.w3.org/2001/sw/.
Sirer, E. (2012). HyperDex, a new era in high performance data stores for the cloud. HyperDex. Retrieved March 2013, from http://hyperdex.org/slides/2012-04-13-HyperDex.pdf.
SPSS. http://www-01.ibm.com/software/analytics/spss/.
Trinity. http://research.microsoft.com/en-us/projects/trinity/.
Vertica. http://www.vertica.com/.
Vipul Goyal, O. P. (2006). Attribute-based encryption for fine-grained access control of encrypted data. In Proceedings of the 13th ACM Conference on Computer and Communications Security (CCS '06). ACM.
Virtuoso. http://virtuoso.openlinksw.com/.
VoltDB. http://voltdb.com/.


5. Data Usage

Big Data is used in a wide variety of applications and use cases. One of the core business tasks of advanced data usage is the support of business decisions. In extreme simplification, this amounts to taking the entire Big Data set, applying appropriate analysis and condensing the results into a management report of considerably less than 100 pages. Other use cases include automated actions, e.g. in smart grid analysis, where anomalies in the networks are detected and corrective actions are executed. Updates of very large indices, e.g. the Google search index, are another important use case of Big Data. As an emerging trend, Big Data usage will be supplemented by interactive decision support systems that allow explorative analysis of Big Data sets.

5.1. Introduction

The full life cycle of information is covered in this report, with the previous chapters covering data acquisition, storage, analysis and curation. Data Usage covers the business goals that need access to such data and its analysis, and the tools needed to integrate analysis into business decision-making. The process of decision-making includes exploration of data (browsing and lookup) and exploratory search (finding correlations, comparisons, what-if scenarios, etc.). The business value of such information logistics is two-fold: (i) control over the value chain, and (ii) transparency of the value chain. The former is generally independent from Big Data; the latter, although related to the former, additionally provides opportunities and requirements for data markets and services. Big Data influences the validity of data-driven decision making in the future. Situation-dependent influence factors are (i) the time range for a decision or recommendation, from short-term to long-term, and (ii) the various data bases drawn upon (in a non-technical sense), from past, historical data to current and up-to-date data. New data-driven applications will strongly influence the development of new markets. A potential blocker in such developments is always the need for new partner networks (combining currently separate capabilities), business processes and markets.

5.2. State of the Art

Access to data for usage purposes is provided through specific tools and, in turn, through query and scripting languages that typically depend on the underlying data stores, their execution engines, APIs and programming models. Some examples include SQL for classical relational database management systems (RDBMS), Dremel and Sawzall for Google's file system GFS and MapReduce setup, Hive, Pig and Jaql for Hadoop-based approaches, Scope for Microsoft's Dryad and CosmosFS, and many other offerings, e.g. Stratosphere's Meteor/Sopremo and ASTERIX's AQL/Algebricks. See Figure 5-1 for an overview.


Figure 5-1: Big Data Technology Stacks for Data Access (source: TU Berlin, FG DIMA 2013)

Analytics tools that are relevant for Data Usage include SystemT (IBM, for data mining and information extraction), R and Matlab (University of Auckland and MathWorks, respectively, for mathematical and statistical analysis), tools for business intelligence and analytics (SAS Analytics (SAS), Vertica (HP), SPSS (IBM)), tools for search and indexing (Lucene and Solr (Apache)) and specific tools for visualisation (Tableau, Tableau Software). Each of these tools has its specific area of application and covers different aspects of Big Data well. For example, high-level tools do not provide the code efficiency needed for Big Data, and packages like R and SAS lack data management features. See the blog posting at http://cacm.acm.org/blogs/blog-cacm/156102-what-does-big-data-mean-part-2/fulltext for a discussion of further aspects.
The tools for Data Usage support business activities that can be grouped into three categories: lookup, learning and investigating. The boundaries are sometimes fuzzy, and learning and investigating might be grouped together as examples of exploratory search. Decision support needs access to data in many ways, but mostly through the interfaces that enable exploratory search. Current decision support systems, as far as they rely on static reports, use these techniques but do not allow sufficiently dynamic usage to reap the full potential of exploratory search. In increasing order of complexity, these groups encompass the following business goals:
• Lookup
  o Fact retrieval
  o Known item search
  o Navigation
  o Transaction
  o Verification
  o Question answering
• Learn
  o Knowledge acquisition
  o Comprehension and interpretation
  o Comparison
  o Aggregation and integration
  o Socialise


• Investigate
  o Accretion
  o Analysis
  o Exclusion and negation
  o Synthesis
  o Evaluation
  o Discovery
  o Planning and forecasting
  o Transformation
Examples for learning include simple searches for a particular item (knowledge acquisition), e.g., a celebrity and their use in advertising (retail); a Big Data appliance is expected to find all related data. In a follow-up step (aggregation), questions like "Which web sites influence the sale of our product?" will arise. Higher levels of investigation (discovery) will attempt to find important correlations, say the influence of seasons and/or weather on sales of specific products at specific events; a minimal sketch of such an analysis is given below. More examples, in particular on Data Usage for high-level strategic business decisions, are given in the upcoming section on Future Requirements. The final version of the white paper will provide examples of these business activities, mapping them to tools that are representative of the groups.
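As an illustration of the "discovery" level just mentioned, the following sketch checks whether daily temperature correlates with sales of a specific product. The pandas library is assumed to be available; the column names and the tiny inline data set are invented for illustration, and a real analysis would pull both series from the warehouse over a full year or more.

    import pandas as pd

    # Invented example data: daily temperature (degrees C) and product sales (units).
    df = pd.DataFrame({
        "date": pd.date_range("2013-06-01", periods=7, freq="D"),
        "temperature": [18, 21, 25, 27, 30, 24, 19],
        "sales": [120, 150, 210, 260, 330, 190, 140],
    })

    # Pearson correlation between the two series; values near +1 suggest that
    # warmer days coincide with higher sales (correlation, not causation).
    corr = df["temperature"].corr(df["sales"])
    print(f"temperature/sales correlation: {corr:.2f}")

    # A seasonal view over a full year could be obtained analogously, e.g.
    # df.groupby(df["date"].dt.month)["sales"].mean()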

5.3. A Case Study: Dremel and the YouTube Data Warehouse

Google maintains a comprehensive data warehouse infrastructure for their YouTube video service on the web.1 Although the raw data volume consists mostly of the actual video files, the amount of metadata available, including usage log files, additional services such as files in various codecs and resolutions, the social network created by users, their comments and messages, video annotations, automatic speech recognition, etc., makes it a prime example of Big Data. The software stacks and environments (see Figure 5-2 for an overview of the complete software landscape) are mostly in-house products and adapted open source software; however, some of the relevant packages have either been opened up or have companion projects that aim at producing an open source implementation. Perhaps the most important component within this software infrastructure is Dremel (see the following section). It aims at making MapReduce (Hadoop) queries interactive; Google claims: "Dremel can execute many queries over such data that would ordinarily require a sequence of MapReduce jobs, but at a fraction of the execution time, […] it can run a query on a petabyte of data in about three seconds" (Urs Hölzle, Google). Dremel also scales its processing power over large farms of servers, running on thousands of servers in parallel.

5.3.1 The YTDW infrastructure

Google keeps all YouTube data in one consolidated warehouse. It is a very large database (multiple PB of uncompressed data, with tables of beyond a trillion rows). The data turnover is also high, with hundreds of TB processed each day. The software stack has three core components, each of which has evolved over time:
• Query: Oracle -> MySQL -> ColumnIO
• ETL: Python -> Sawzall + Tenzing + Python

1 See Biswapesh Chattopadhyay’s presentation at XLDB 2011. http://www-conf.slac.stanford.edu/xldb2011/talks/xldb2011_tue_1120_YoutubeDataWarehouse.pdf


• Reporting: Microstrategy -> ABI
The ETL cycle (extract, transform, load, as commonly used in data warehouses) is covered by a bundle of software systems, including Python and C++ as well as the Sawzall and Tenzing packages. The key technologies are Sawzall, Tenzing, Dremel and ABI.
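The following is a deliberately minimal sketch of such an ETL step in plain Python: raw request-log lines are extracted, transformed into typed records, and loaded into a relational store. The log format, field names and the SQLite target are all assumptions made for illustration; YTDW itself loads into ColumnIO via Sawzall and Tenzing.

    import sqlite3

    RAW_LOG_LINES = [
        "2013-05-01T12:00:01 watch video_id=abc123 user=u42 ms=5400",
        "2013-05-01T12:00:02 watch video_id=xyz789 user=u17 ms=120000",
    ]

    def extract(lines):
        # Extract: split each raw log line into its space-separated tokens.
        for line in lines:
            yield line.split()

    def transform(tokens):
        # Transform: turn tokens into a typed record (timestamp, video, user, seconds).
        ts, _event, video, user, ms = tokens
        return (ts, video.split("=")[1], user.split("=")[1], int(ms.split("=")[1]) / 1000.0)

    def load(records, conn):
        # Load: append the cleaned records to the warehouse table.
        conn.executemany("INSERT INTO watch_events VALUES (?, ?, ?, ?)", records)
        conn.commit()

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE watch_events (ts TEXT, video TEXT, user TEXT, seconds REAL)")
    load((transform(t) for t in extract(RAW_LOG_LINES)), conn)
    print(conn.execute("SELECT video, SUM(seconds) FROM watch_events GROUP BY video").fetchall())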

Figure 5-2: The YouTube Data Warehouse (YTDW) infrastructure

5.3.2 Sawzall

Sawzall (the name derives from a popular power tool) is a scripting language built on Google's MapReduce framework. It has built-in security for accessing sensitive data such as the log file data in YouTube. It also provides strong support for aggregation and complex computations, with read/write operations in various formats. As a procedural language, it implements complex functionalities and is mostly used for repeated (or highly specialised) operations; repeated operations include the ETL of YouTube's log files. It has been open sourced, and Szl, the compiler for Sawzall, can be found at http://code.google.com/p/szl
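Sawzall programmes process one record at a time and emit values into pre-declared aggregator tables (sums, counts, quantiles) that the framework merges across machines. The sketch below imitates that record-at-a-time, emit-to-aggregator style in Python; the record layout and aggregator names are invented, and real Sawzall syntax differs.

    from collections import Counter

    # Invented log records; a Sawzall job would receive these one at a time.
    log_records = [
        {"country": "DE", "bytes": 1200},
        {"country": "FR", "bytes": 800},
        {"country": "DE", "bytes": 300},
    ]

    # Aggregator tables, declared up front as in Sawzall ("table sum of int ...").
    hits_per_country = Counter()
    bytes_per_country = Counter()

    def process(record):
        # The per-record function only *emits*; merging across machines is the framework's job.
        hits_per_country[record["country"]] += 1
        bytes_per_country[record["country"]] += record["bytes"]

    for record in log_records:
        process(record)

    print(hits_per_country)   # Counter({'DE': 2, 'FR': 1})
    print(bytes_per_country)  # Counter({'DE': 1500, 'FR': 800})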

5.3.3 Tenzing

Tenzing provides SQL functionality on top of MapReduce; it is comparable to Hive and HadoopSQL. Its key strengths are strong SQL support and high scalability, the latter being of particular importance for all Big Data usage. It has its limitations with respect to complex operations that require procedural code, e.g., only limited support for nested and/or repeated structures, and it has a higher latency than Dremel. The core areas of usage of Tenzing within YTDW are ETL operations for non-logs data and denormalisations. It is also used for simpler, medium-complexity analyses on YTDW data.

5.3.4 Dremel and ABI

The next step in the Data Usage chain is formatting and reporting the analysis results. In YTDW this is handled by Dremel. Dremel is mostly used as a reporting query engine, but also for simple interactive log analysis. Compared against Sawzall and Tenzing, it has a very low latency, making it the best tool for interactive access. It provides SQL support, including strong nested-relational support, and direct access to log files. There are some limitations, however, including a more limited library of functions, and its scalability is lacking when compared to MapReduce.


A quote from Google clarifies the high expectations that Google has for Dremel as a tool in many Big Data applications: "You have a SQL-like language that makes it very easy to formulate ad hoc queries or recurring queries — and you don't have to do any programming. You just type the query into a command line" (Urs Hölzle, Google). ABI is a complete reporting and dashboarding solution. It is built on the Google stack and provides tight integration with Dremel and ColumnIO. It also includes Google Visualizations and even some Flash components. Currently, it is used within YTDW mostly for reports and dashboards.
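The quote describes exactly the ad-hoc style that Google exposes publicly through its BigQuery service, whose query engine is based on Dremel. As a hedged illustration only, the sketch below issues such a query from Python using the google-cloud-bigquery client as a stand-in; the project, dataset and table names are invented, and the YTDW-internal interfaces themselves are not public.

    from google.cloud import bigquery  # pip install google-cloud-bigquery

    # Hypothetical project/table names, used purely for illustration.
    client = bigquery.Client(project="my-analytics-project")

    sql = """
        SELECT video_id, COUNT(*) AS views, SUM(watch_seconds) AS total_seconds
        FROM `my-analytics-project.video_logs.watch_events`
        WHERE event_date = DATE '2013-05-01'
        GROUP BY video_id
        ORDER BY views DESC
        LIMIT 10
    """

    # The query is simply typed and submitted; no MapReduce pipeline has to be written.
    for row in client.query(sql).result():
        print(row.video_id, row.views, row.total_seconds)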

5.3.5 Current Developments

Current extensions concentrate on the further development of Dremel, including the addition of MapReduce capabilities to Dremel, a scalable and reliable shuffle, the materialisation of large result sets, and a further extension of the read and write functionality for multiple data formats. Analysis in Dremel will be further advanced by including user-defined scalar and table-valued functions and more SQL features, such as better support for joins, analytic functions, set operators, etc. In the long term, Dremel might completely replace the Tenzing MapReduce backend and extend the BigQuery service capabilities. With growing data sets, performance improvement strategies are continuously in focus. Some strategies target the data model by using de-normalised, aggregated materialised views and range partitioning; performance then improves in the query rewrite layer when the right aggregated materialised view is used and when partitions can be pruned based on knowledge of the data. Performance improvements in the reporting front end are expected from more aggressive result caching with memcached.
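A sketch of the kind of result caching mentioned for the reporting front end is given below, using the pymemcache client as one example library; the cache key scheme and time-to-live are assumptions, and the expensive computation is only a stand-in for a real Dremel query.

    import json
    from pymemcache.client.base import Client  # pip install pymemcache

    cache = Client(("localhost", 11211))

    def cached_report(query_key, compute_fn, ttl_seconds=300):
        # Return a cached report result if present, otherwise compute and cache it.
        hit = cache.get(query_key)
        if hit is not None:
            return json.loads(hit)
        result = compute_fn()
        cache.set(query_key, json.dumps(result), expire=ttl_seconds)
        return result

    # Example: an expensive dashboard aggregation, keyed by report name and date.
    report = cached_report(
        "daily_views:2013-05-01",
        compute_fn=lambda: {"total_views": 123456789},  # stand-in for the real query
    )
    print(report)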

5.3.6 Big Data Lessons

The YouTube Data Warehouse case presents a number of general lessons to be learned and opportunities to note for Big Data. Besides efficient implementations, even where sacrifices have to be made with respect to functionality, most of the software is or will be available as open source, presenting opportunities for others to benefit from Google's experience. Currently, Sawzall has been open sourced and can be found at http://code.google.com/p/szl. There is a proposal to create a Dremel implementation as an Apache Incubator project called Drill (see http://wiki.apache.org/incubator/DrillProposal), driven by people working at MapR and some other companies. There is also a project called OpenDremel (see http://code.google.com/p/dremel), a clone of Dremel built by a team of Israeli engineers. These projects are particularly interesting as they address a perceived trade-off between high scalability on the one hand and interactivity and flexibility (as, e.g., provided by SQL) on the other. Dremel is proof that low-latency, interactive, SQL-like queries do scale to what Google calls 'medium' scales. Recall the figures cited for Dremel: 'trillions of rows in seconds', running on 'thousands of CPUs and petabytes of data', and processing 'quadrillions of records per month'. While Google regards this as medium scale, it might be sufficient for many applications that are clearly in the realm of Big Data. Table 5-1 from Google's XLDB 2011 presentation summarises the trade-offs that are important in this regard. Even though Dremel's latency is low and, as expected, its expressive power is also low, it stays in the medium range for scalability and SQL support when compared to SQL proper.

               Sawzall   Tenzing   Dremel
  Latency      High      Medium    Low
  Scalability  High      High      Medium
  SQL          None      High      Medium
  Power        High      Medium    Low

Table 5-1: Comparison of Data Usage Technologies used in YTDW


5.4. Sectorial Requirements

First results of the requirements collection from the Sector Forums of BIG present a wide variety of requirements in the Data Usage area. A common thread is detailed but task-specific applications of Data Usage that naturally vary from sector to sector; no common, horizontal aspect to these requirements can be identified yet. Some general trends are identifiable already and can be grouped into requirements on the:
• use of Big Data for marketing purposes
• detection of abnormal events in incoming data in real time (see the sketch after this list)
• use of Big Data to improve efficiency (and effectiveness) in core operations
  o realising savings during operations through
    - real-time data availability
    - more fine-grained data
    - automated processing
  o a better data basis for planning operational details and new business processes
  o transparency for internal and external (customer) purposes
• customisation, situation adaptivity, context-awareness and personalisation
• integration with additional data sets
  o open access data
  o data obtained through sharing and data marketplaces
• data quality issues where data is not curated or is provided under pressure, e.g., to acquire an account in a social network where the intended usage is anonymous
• privacy and confidentiality issues and data access control, arising from internal and additional (see previous item) data sources
• interfaces
  o interactive and flexible ad-hoc analyses to provide situation-adaptive and context-aware reactions, e.g., recommendations
  o suitable interfaces to include the above-mentioned functionalities and provide access to Big Data Usage in non-office environments, e.g., mobile situations, factory floors, etc.
  o tools for visualisation, query building, etc.
• the discrepancy between the technical know-how necessary to execute data analysis (technical staff) and its usage in business decisions (by non-technical staff)
• the need for tools that enable early adoption. As developments in industry are perceived as accelerating, the head start from early adoption is also perceived as being of growing importance and a growing competitive advantage.
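As a minimal illustration of the real-time anomaly detection requirement above, the following sketch flags readings that deviate strongly from a sliding-window mean (a simple z-score test). The window size, threshold and synthetic readings are arbitrary assumptions; production systems in, e.g., smart grids would use far more elaborate models.

    from collections import deque
    from statistics import mean, stdev

    def detect_anomalies(stream, window=20, threshold=3.0):
        # Yield (index, value) for readings more than `threshold` sigmas from the window mean.
        history = deque(maxlen=window)
        for i, value in enumerate(stream):
            if len(history) >= 5:  # need a few samples before judging
                mu, sigma = mean(history), stdev(history)
                if sigma > 0 and abs(value - mu) > threshold * sigma:
                    yield i, value
            history.append(value)

    # Synthetic sensor readings with one injected spike at position 30.
    readings = [50.0 + (i % 3) * 0.5 for i in range(60)]
    readings[30] = 95.0
    print(list(detect_anomalies(readings)))  # e.g. [(30, 95.0)]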

5.5. Data Usage in the Roadmap

The (sector-specific) roadmaps must provide the basis for recommendations for (European) companies to adapt or formulate their business strategy for Big Data. This includes business users of Big Data as well as technology suppliers, regulators, and funding agencies. The roadmap must help to understand the market situation. Concretely, this means the influence of the state of the art on the market situation, i.e., the market power of Google, Yahoo!, Bing (Microsoft), iTunes (Apple) and Amazon in the U.S., as well as competing niche market players such as Baidu in China, Orkut in Brazil, Yandex in Russia and Exalead in France, alongside smaller European players.


5.5.1 Target Markets for Big Data

The Big Data roadmaps must evaluate the expected, desired and possible chances in various markets for European companies. These will be strongly sector-dependent; however, there are a number of horizontal aspects, e.g., the enormous lead that some very big players already have in end-consumer offerings. Derived from a consideration of questions such as "How likely is a company in Europe to become a veritable competitor for Google, Yahoo, Amazon, etc.?", an evaluation must be made of the most promising target markets with respect to likely development and the most effective supporting actions. For the end-consumer market, this might lead to the conclusion that it is more effective to concentrate on business-to-business markets than to attempt to create a global competitor to existing market leaders. On the other hand, specific cultural or regulatory circumstances might allow for veritable local competition. Existing examples of such success stories include the Chinese search engine Baidu, the social network Orkut's status in Brazil and the Russian search engine Yandex. Ideally, the roadmap will include an analysis of these circumstances and their applicability to the European situation.

5.5.2 Future Requirements

Derived from current developments, the roadmap identifies future requirements that need to be addressed to enable effective and efficient Data Usage. Some of these potential blockers are briefly outlined in the following sections.

5.5.2.1 Data quality in Data Usage

In order to evaluate decisions derived from Data Usage, the veracity of data is of high importance. The provenance and quality of data must be available as metadata. This is a demanding task when data becomes part of larger data cascades and is merged with data sets of different provenance and quality. This results in a requirement for methods for measuring and providing metadata on provenance, quality and confirmability. The quality of the data used in exploratory search has a high impact on the validity of the results, which requires complex and careful design of the algorithms used in exploration. High-level "programming languages" and tools must provide explicit support for exploration and consider the influence of data quality, allowing for an evaluation of the results. Comparisons of exploratory searches for the same analysis problem based on different data sets and algorithms are a typical use case of such an evaluation of results. The impact of uncertainty in the data must be evaluated and communicated, e.g., through extensions of classical methods such as confidence intervals and statistical significance.
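To make the last point concrete, the sketch below attaches a bootstrap confidence interval to a simple aggregate, so that the uncertainty of the underlying sample can be communicated alongside the figure itself. The sample data, resample count and confidence level are illustrative assumptions only.

    import random
    from statistics import mean

    def bootstrap_ci(sample, stat=mean, n_resamples=2000, confidence=0.95, seed=42):
        # Percentile bootstrap confidence interval for `stat` over `sample`.
        rng = random.Random(seed)
        estimates = sorted(
            stat([rng.choice(sample) for _ in range(len(sample))])
            for _ in range(n_resamples)
        )
        lo = estimates[int((1 - confidence) / 2 * n_resamples)]
        hi = estimates[int((1 + confidence) / 2 * n_resamples) - 1]
        return stat(sample), (lo, hi)

    # Illustrative data: daily conversion rates (%) from two differently curated sources.
    curated   = [2.1, 2.3, 2.2, 2.4, 2.2, 2.3, 2.1, 2.2]
    uncurated = [1.8, 3.5, 2.0, 0.9, 4.1, 2.6, 1.2, 3.0]

    for name, data in [("curated", curated), ("uncurated", uncurated)]:
        estimate, (lo, hi) = bootstrap_ci(data)
        print(f"{name}: mean={estimate:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")

The wider interval for the uncurated source communicates its lower reliability even though both sources deliver a single "mean conversion rate" of similar magnitude.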

5.5.2.2 Tools Performance

Real-time aspects are of the highest importance for exploratory searches and decision-making tasks. The main factors influencing performance are data volume, but also variety and type; in particular, unstructured text, images, video and audio data can become critical factors. Within Data Usage, the time-critical aspects are typically prominent in the actual data analysis; see the chapter on Data Analysis for further considerations.

5.5.2.3 Strategic Business Decisions

Strategic business decisions that currently cannot be supported directly by data analysis tools, e.g., 'Which products and companies are not on our strategic watch lists?' or 'Which companies are potential targets for mergers and acquisitions?', will come into the focus of Data Usage in the context of Big Data. Obviously, such Data Usage requires not only large volumes (and coverage) of data but also data sources from outside of the company. This calls for the development of data marketplaces and/or related services.


As in all aspects of Big Data, Data Usage has legal ramifications and touches upon regulatory issues that must be addressed appropriately.

5.5.2.4 Big Data Job Descriptions

A potential blocker for Data Usage is the availability of experts and expertise. Job profiles develop over time and need to be aligned with business needs and technological development. The advent of Big Data and extracted knowledge calls for workers with a two-fold qualification: the first is the technical know-how for using Big Data tools efficiently, the second is the business know-how for applying Big Data tools effectively. The new type of job description that is emerging has been called the "Data Scientist", and various requirements have been formulated, e.g., a "T-shaped" expert having a combination of deep and broad knowledge. The responsible handling of data is also a key qualification, and the question arises whether a classical curriculum is sufficient to ensure that students acquire the right qualifications. Besides these data scientists, engineers will be needed to work on the underlying technologies, ranging from the cloud-based, virtualised infrastructure to the industrial engineering that provides the embedded smart products and production environments. Finally, user interface experts will be needed to provide the tools, technologies and techniques that are needed for efficient Data Usage.

5.5.3 Emerging Trends

There are a number of trends emerging in the area of technologies, e.g., in-memory databases that allow for sufficiently fast analysis to enable explorative data analysis and decision support. New services are developing that provide data analytics, integration and the transformation of Big Data into organisational knowledge. As in all new digital markets, the development is driven in part by startups that fill new technology niches; however, the dominance of big players is particularly important, as they have much easier access to Big Data. The transfer of technology to SMEs is faster than in previous digital revolutions; however, appropriate business cases for SMEs are not easy to design in isolation and typically involve integration into larger networks or markets.

5.5.3.1 Data Filtering

The LHC (Large Hadron Collider) at CERN, on the border between Switzerland and France, creates enormous amounts of data. The data from the collider experiments themselves are highly structured, so they might not count as proper Big Data. However, the methods of Data Usage at the experiments are a highly instructive example for other applications of Big Data. The two particle beams that are held on collision courses in the 27 km accelerator ring collide at a rate of about 10 million collisions of particle packets per second. At each collision of two particle packets, approximately 20 protons actually collide, generating an enormous amount of data, up to a billion collisions at peak. Early analysis of the collision data is used to discard about 95% of the collisions (and their data) completely, as they can be recognised as well-understood reactions that do not reveal the Higgs boson that the LHC is looking for. Although storage is taken to be "cheap", in a realistic scenario not everything can or must be stored. Applying appropriate data filters can be a key measure in making Big Data effective and economically efficient.
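The pattern can be imitated in a few lines: filter events as close to the source as possible and persist only the small, interesting fraction, without ever buffering the full stream. The event structure and the roughly 5% retention criterion below are invented for illustration.

    import random

    def event_stream(n, seed=1):
        # Generate synthetic events; roughly 5% are marked as 'interesting'.
        rng = random.Random(seed)
        for i in range(n):
            yield {"id": i, "interesting": rng.random() < 0.05, "payload": rng.random()}

    def early_filter(events):
        # Discard well-understood events immediately; only interesting ones pass on to storage.
        for event in events:
            if event["interesting"]:
                yield event

    kept = list(early_filter(event_stream(100_000)))
    print(f"stored {len(kept)} of 100000 events ({len(kept) / 1000:.1f}%)")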


5.5.4 Data Usage in an Integrated and Service-Based Environment

The continuing integration of digital services (Internet of Services), smart digital products (Internet of Things) and production environments (Internet of Things, Industry 4.0; see BIG Deliverable D2.3.1 on Manufacturing) includes the usage of Big Data in most integration steps. Figure 5-3, from a study by General Electric, shows the various dimensions of integration. Smart products like a turbine are integrated into larger machines; in the first example this is an airplane. Planes are in turn part of whole fleets that operate in a complex network of airports, maintenance hangars, etc. At each step, the current integration of the business processes is extended by Big Data integration. The benefits of optimisation can be harvested at each level (assets, facility, fleets, and the entire network) and by integrating knowledge from data across all steps.

Figure 5-3: Dimensions of Integration in Industry 4.0 from: GE, Industrial Internet, 2012

5.5.4.1 Service integration

The infrastructure within which Data Usage will be applied will adapt to this integration tendency. Hardware and software will be offered as services, all integrated to support Big Data Usage. Figure 5-4 gives a concrete picture of the stack of services that will provide the environment for "Beyond technical standards and protocols, new platforms that enable firms to build specific applications upon a shared framework/architecture [are necessary]", as foreseen by the GE study, and for the "need for ongoing innovation in technologies and techniques that will help individuals and organisations to integrate, analyse, visualise, and consume the growing torrent of big data", as sketched by McKinsey's study. Figure 5-4 shows Big Data as part of a virtualised service infrastructure. At the bottom level, the current hardware infrastructure will be virtualised, and both hardware infrastructure and platforms will be provided as services. On top of this cloud-based infrastructure, business processes as a service can be built. In parallel, Big Data will be offered as a service and embedded as the precondition for knowledge services, e.g., the integration of semantic technologies for the analysis of unstructured and aggregated data. Note that Big Data as a Service may be seen as a layer between PaaS and SaaS.


This virtualisation chain from hardware to software to information and knowledge also determines where the skills needed to maintain this infrastructure are located. Knowledge workers or data scientists are needed to run Big Data and Knowledge as a Service (or their equivalents in a non-virtualised infrastructure).

Figure 5-4: Big Data in the context of an extended service infrastructure. W. Wahlster, 2013.

5.6. Conclusion and Next Steps

This chapter has collated the basic inputs and the state of the art in Big Data sources and their potential areas of usage. The core areas of Data Usage are lookup, learning and investigating, supporting the tasks of decision-making, decision support and controlling. Interface technologies that need to be adapted for these tasks comprise adaptive and personalised interfaces, multimodal interfaces and, last but not least, visualisation techniques. All of these areas have a long history, and the full version of the white paper will explore the specific areas where these interface, visualisation and decision support technologies must be extended to meet the needs of Big Data Usage.

5.7. References

Benjamin Bloom. (1956). Taxonomy of Educational Objectives: Handbook I: Cognitive Domain. New York: Longmans, Green.

G. Marchionini. (2006). Exploratory search: from searching to understanding. Communications of the ACM, 49:41-46, April 2006.

Biswapesh Chattopadhyay (Google). (2011). "YouTube Data Warehouse: Latest technologies behind YouTube, including Dremel and Tenzing", XLDB 2011, Stanford.

Peter C. Evans, Marco Annunziata. (2012). "Industrial Internet: Pushing the Boundaries of Minds and Machines", GE, November 26, 2012.

McKinsey & Company. (2011). "Big data: The next frontier for innovation, competition, and productivity".

Lisa Randall. (2011). "Knocking on Heaven's Door: How Physics and Scientific Thinking Illuminate the Universe and the Modern World", Ecco.


5.8. Abbreviations and Acronyms

MR MapReduce
DB Database
DBMS Database management system
RDBMS Relational database management system
SQL Structured query language
SPARQL Recursive acronym for SPARQL Protocol and RDF Query Language
UnQL Unstructured query language
OLAP Online analytical processing
ACID Atomicity, Consistency, Isolation, Durability
CRUD Create, Read, Update, Delete
ETL Extract, Transform, Load (in data warehouses)
YTDW YouTube Data Warehouse
SME Small and Medium Enterprises
TB Terabyte
PB Petabyte
BIG The BIG project
IaaS Infrastructure as a service
PaaS Platform as a service
SaaS Software as a service
BPaaS Business processes as a service
BDaaS Big Data as a service
KaaS Knowledge as a service
