Project Acronym Sobigdata Project Title Sobigdata Research

Ref. Ares(2017)5087922 - 18/10/2017

SoBigData – 654024 www.sobigdata.eu

Project Acronym: SoBigData
Project Title: SoBigData Research Infrastructure Social Mining & Big Data Ecosystem
Project Number: 654024
Deliverable Title: Data Scientists Training Materials 1
Deliverable No.: D4.4
Delivery Date: 30 April 2017
Authors: Giles Greenway (KCL), Tobias Blanke (KCL), Marco Braghieri (KCL)

SoBigData receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 654024

DOCUMENT INFORMATION

PROJECT
Project Acronym: SoBigData
Project Title: SoBigData Research Infrastructure Social Mining & Big Data Ecosystem
Project Start: 1st September 2015
Project Duration: 48 months
Funding: H2020-INFRAIA-2014-2015
Grant Agreement No.: 654024

DOCUMENT
Deliverable No.: D4.4
Deliverable Title: Data Scientists Training Materials
Contractual Delivery Date: 30 April 2017
Actual Delivery Date: 18 October 2017
Author(s): Giles Greenway (KCL), Tobias Blanke (KCL), Marco Braghieri (KCL)
Editor(s): Marco Braghieri (KCL), Valerio Grossi (CNR)
Reviewer(s): Valerio Grossi (CNR), Anna Monreale (UNIPI), Paolo Ferragina (UNIPI), Beatrice Rapisarda (CNR)
Contributor(s): Giles Greenway (KCL), Dominic Rout (USFD), Valerio Grossi (CNR), Anna Monreale (UNIPI)
Work Package No.: WP4
Work Package Title: Training
Work Package Leader: KCL
Work Package Participants: CNR, USFD, UNIPI, FRH, UT, IMT, LUH, KCL, SNS, AALTO, ETHZ, TUDelft
Dissemination: Public
Nature: Report
Version / Revision: V1.0
Draft / Final: Draft
Total No. Pages: 23 (including cover)
Keywords: Training Materials, Data Scientists

DISCLAIMER

SoBigData (654024) is a Research and Innovation Action (RIA) funded by the European Commission under the Horizon 2020 research and innovation programme.
SoBigData proposes to create the Social Mining & Big Data Ecosystem: a research infrastructure (RI) providing an integrated ecosystem for ethic-sensitive scientific discoveries and advanced applications of social data mining on the various dimensions of social life, as recorded by “big data”. Building on several established national infrastructures, SoBigData will open up new research avenues in multiple research fields, including mathematics, ICT, and human, social and economic sciences, by enabling easy comparison, re-use and integration of state-of-the-art big social data, methods, and services into new research.

This document contains information on SoBigData core activities, findings and outcomes, and it may also contain contributions from distinguished experts who contribute as SoBigData Board members. Any reference to content in this document should clearly indicate the authors, source, organisation and publication date.

The document has been produced with the funding of the European Commission. The content of this publication is the sole responsibility of the SoBigData Consortium and its experts, and it cannot be considered to reflect the views of the European Commission. The authors of this document have taken every available measure to ensure that its content is accurate, consistent and lawful. However, neither the project consortium as a whole nor the individual partners that implicitly or explicitly participated in the creation and publication of this document accept any responsibility for consequences that might arise from using its content.

The European Union (EU) was established in accordance with the Treaty on the European Union (Maastricht). There are currently 27 member states of the European Union. It is based on the European Communities and the member states’ cooperation in the fields of Common Foreign and Security Policy and Justice and Home Affairs.
The five main institutions of the European Union are the European Parliament, the Council of Ministers, the European Commission, the Court of Justice, and the Court of Auditors (http://europa.eu.int/).

Copyright © The SoBigData Consortium 2015. See http://project.sobigdata.eu/ for details on the copyright holders.

For more information on the project, its partners and contributors please see http://project.sobigdata.eu/. You are permitted to copy and distribute verbatim copies of this document containing this copyright notice, but modifying this document is not allowed. You are permitted to copy this document in whole or in part into other documents if you attach the following reference to the copied elements: “Copyright © The SoBigData Consortium 2015.”

The information contained in this document represents the views of the SoBigData Consortium as of the date they are published. The SoBigData Consortium does not guarantee that any information contained herein is error-free, or up to date. THE SoBigData CONSORTIUM MAKES NO WARRANTIES, EXPRESS, IMPLIED, OR STATUTORY, BY PUBLISHING THIS DOCUMENT.

GLOSSARY

ABBREVIATION: DEFINITION

Python: Python is an interpreted, interactive, object-oriented programming language. It incorporates modules, exceptions, dynamic typing, very high level dynamic data types, and classes. It has interfaces to many system calls and libraries, as well as to various window systems, and is extensible in C or C++. It is also usable as an extension language for applications that need a programmable interface. Python runs on many Unix variants, on the Mac, and on Windows 2000 and later.

R: R is a language and environment for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, etc.) and graphical techniques, and is highly extensible.

GitHub: GitHub is a development platform. It allows users, ranging from open source projects to businesses, to host and review code, manage projects, and build software together in a participative environment.

Jupyter: The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text.

GATE: GATE is open-source software that focuses on text processing and includes a desktop client for developers, a workflow-based web application, a Java library, an architecture and a process.

Apromore: Apromore is an online business process analytics platform that includes a wide range of process mining analytics. It is an open-source tool, available both via the cloud and as a download.

TABLE OF CONTENT

DOCUMENT INFORMATION
DISCLAIMER
GLOSSARY
TABLE OF CONTENT
DELIVERABLE SUMMARY
EXECUTIVE SUMMARY
1 Relevance to SoBigData
1.1 Purpose of this document
1.2 Relevance to project objectives
1.3 SOBIGDATA project description
1.4 Relation to other workpackages
1.5 Structure of the document
2 Report on Training Activities: T2 – Training Modules
3 Interactive Training Environments
4 Notebooks
5 GATE Training Course Materials
6 Business Process Monitoring and Mining Materials
7 SoBigData Master Program Training Materials
7.1 Module Example: Information Retrieval
8 Conclusions

DELIVERABLE SUMMARY

This deliverable’s objective is to report on training materials that have been developed within Work Package 4. As in Task 4.2, Work Package 4 has the goal of
