HPI Future SOC Lab: Proceedings 2015

Christoph Meinel, Andreas Polze, Gerhard Oswald, Rolf Strotmann, Ulrich Seibold, Bernhard Schulzki (Eds.)


Hasso-Plattner-Institut für Softwaresystemtechnik GmbH

Bibliographic information of the Deutsche Nationalbibliothek: the Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de/.

Hasso-Plattner-Institut 2017 https://hpi.de/

Prof.-Dr.-Helmert-Straße 2-3, 14482 Potsdam Tel.: +49-(0)331 5509-0 7 / Fax: +49-(0)331 5509-325 E-Mail: [email protected]

The manuscript is protected by copyright.

Published online on the publication server of the University of Potsdam
URN urn:nbn:de:kobv:517-opus4-102516
http://nbn-resolving.de/urn:nbn:de:kobv:517-opus4-102516

Contents

Spring 2015

Prof. Dr. Karl Kurbel, European University Viadrina Frankfurt (Oder)

Solving of LP and CWLP Problems Using SAP HANA ...... 1

Prof. Dr. Christoph Meinel, Hasso Plattner Institute

Security Monitoring and Analytics of HPI FutureSoC Lab (Phase I) ...... 7

Machine Learning for Security Analytics powered by SAP HANA (Phase II) ...... 11

Prof. Dr. Frank Morelli, Pforzheim University of Applied Sciences

Sales Planning and Forecast ...... 19

Prof. Dr.-Ing. Jorge Marx Gómez, Carl von Ossietzky Universität Oldenburg

Project OliMP: In-Memory Planning with SAP HANA ...... 25

Dr. Lena Wiese, Georg-August-Universität Göttingen

OntQA-Replica: Intelligent Data Replication for Ontology-Based Query Answering . . . . 29

Prof. Dr. Hasso Plattner, Hasso Plattner Institute

Natural Language Processing for In-Memory Databases: an Application to Biomedical Question Answering ...... 35

Provision of Analyze Genomes Services in a Federated In-Memory Database System for Life Sciences ...... 39

Prof. Dr. Andreas Polze, Hasso Plattner Institute

Inspection and Evaluation of Modern Hardware Architectures ...... 43

Prof. Dr. Holger Giese, Hasso Plattner Institute

Large-Scale Graph-Databases based on Graph Transformations & Multi-Core Architectures 49

Prof. Dr. Tadeusz Czachórski, Silesian University of Technology

Modelling wide area networks using SAP HANA in-memory database ...... 53

Prof. Dr. Helmut Krcmar, Technical University of Munich

Using Process Mining to Identify Fraud in the Purchase-to-Pay Process ...... 57

Dr. Harald Sack, Hasso Plattner Institute

Comparison of Image Classification Models on Varying Dataset Sizes ...... 63

Prof. Dr. Witold Abramowicz, Department of Information Systems, Poznan University of Economics

Sentiment Analysis for the needs of benchmarking the Energy Sector ...... 69

Prof. Dr. Katinka Wolter, Institute of Computer Science, Freie Universität Berlin

Model-based Quantitative Security Analysis of Mobile Offloading Systems under Timing Attacks ...... 73

Fall 2015

Prof. Dr. Peter Fettke, Institut für Wirtschaftsinformatik (IWi) at Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI) and

Towards Process Mining on Big Data: Optimizing Process Model Matching Approaches on High Performance Computing Infrastructure ...... 79

Prof. Dr. Andreas Polze, Hasso Plattner Institute

A survey of security-aware approaches for cloud-based storage and processing technologies 83

Dr. Lena Wiese, Institut für Informatik, Georg-August-Universität Göttingen

OntQA-Replica: Intelligent Data Replication for Ontology-Based Query Answering (Revisited and Verified) ...... 89

Prof. Dr. Hasso Plattner, Hasso Plattner Institute

Natural Language Processing for In-Memory Databases: Boosting Biomedical Applications 95

Extending Analyze Genomes to a Federated In-Memory Database System For Life Sciences 99

Interactive Product Cost Simulation on Coprocessors ...... 103

Prof. Dr. Gunther Piller, University of Applied Sciences Mainz

ActOnAir: Data Mining and Forecasting for the Personal Guidance of Asthma Patients . . . 109

Prof. Dr. Dr. h.c. Hans-Jürgen Appelrath, Carl von Ossietzky Universität Oldenburg

BICE: A Cloud-based Business Intelligence System ...... 113

Prof. Dr. Holger Giese, Hasso Plattner Institute

Large-Scale Graph-Databases based on Graph Transformations & Multi-Core Architectures 117

Dr. Benjamin Fabian, Institute of Information Systems, Humboldt University of Berlin

Analyzing the Global-Scale Internet Graph at Different Topology Levels: Data Collection and Integration ...... 121

Dr. Harald Sack, Hasso Plattner Institute

Comparison of Feature Extraction Approaches for Image Classification ...... 127

Prof. Dr. Christoph Engels, University of Applied Sciences and Arts Dortmund

Optimization of Data Mining Ensemble Algorithms on SAP HANA ...... 131

Prof. Dr. Christoph Meinel, Hasso Plattner Institute

Simulation of User Behavior on a Security Testbed ...... 137

Prof. Dr. Jan Eloff, University of Pretoria, South Africa

Protecting minors on social media platforms ...... 141

Prof. Dr. Helmut Krcmar, Technical University of Munich

Using Process Mining to Identify Fraud in the Purchase-to-Pay Process ...... 145

Prof. Dr. Bernd Scheuermann, Hochschule Karlsruhe - Technik und Wirtschaft

On the Potential of Big Data Boosting Bio-inspired Optimization: A Study Using SAP HANA 151


Solving of LP and CWLP Problems Using SAP HANA

Karl Kurbel
European University Viadrina
Große Scharrnstraße 59
15230 Frankfurt (Oder), Germany
[email protected]

Dawid Nowak
European University Viadrina
Große Scharrnstraße 59
15230 Frankfurt (Oder), Germany
[email protected]

Abstract

This paper explores the capabilities of SAP HANA for solving optimization problems, in particular linear programming and mixed-integer programming problems. The study contrasts a tightly integrated solution approach (GENIOS) with an external solver approach (R server) and with self-implemented optimization heuristics. All solution approaches are integrated into a test environment in HANA and compared with respect to performance and solution quality. Based on a series of test cases, performance indicators are evaluated and factors influencing the performance are discussed.

1 Introduction and motivation

Today's business software, such as enterprise resource planning (ERP) and supply chain management (SCM) systems, provides solutions to many planning and decision problems [1]. A number of exact and heuristic optimization algorithms have been incorporated into the software (for example, mixed-integer linear programming for supply network planning in SAP SNP [2]). Optimization models representing real-life situations can be very large, requiring a lot of computing power and efficient algorithms.

SAP HANA as an in-memory database exhibits very good performance where data access and data processing are concerned. Due to efficient techniques such as parallel processing and column-oriented data storage [3], SAP HANA outperforms typical databases [4] [5]. The questions we are investigating in this paper are how optimization models can be integrated into HANA and whether optimization can also benefit from HANA's processing power.

The next section discusses different approaches to solving optimization problems in HANA. Section 3 focuses on the solution architecture and the performance measures used for the test runs. The fourth section briefly describes the testing environment and the test cases used. Section 5 discusses the test results, while the concluding section wraps up the findings and gives an outlook to further research.

2 Solution approach

From release SPS 08 on, SAP HANA can be equipped with its own solver, GENIOS (GENeric Integer Optimization System) [6]. GENIOS is capable of solving continuous and mixed-integer linear programming (MILP) problems. It is a part of the AFL (Application Function Libraries) [7, p. 6], which extends HANA's functionality by predefined functions. These functions can be called by user-defined stored procedures. They are native to the data engine underlying SAP HANA's index server, thus offering the best possible performance for data access and data processing [8, p. 14].

Bearing in mind the architectural concept of SAP HANA [8, pp. 13-15] and the results of previous studies [9], two approaches seem to be worth trying in order to make a comparison with GENIOS. Since HANA's power is exploited best when "everything" happens inside HANA, the first option is to create a solver for optimization problems with the means that HANA provides for software development. This means, in the first place, using the native programming language, SQLScript [10].

The second option is a linear programming package that is available on R servers. Although an R server is external to HANA, procedures in R can be written and invoked inside HANA, just like SQLScript procedures [11]. An R server is a statistical server providing advanced calculation functionalities. These functionalities can be easily extended through downloadable function libraries and R scripts. One of them is the lpSolve package, which provides an application programming interface (API) for building and solving linear programs [12].

The types of problems we are considering in this study are linear programming (LP) and mixed-integer linear programming (MILP) problems. A representative of the latter type is the so-called capacitated warehouse location problem (CWLP). The CWLP embraces the decision in which locations warehouses with limited capacities are to be built, and the decision which customers are to be served by which warehouses, so as to satisfy all customers' demands while minimizing total fixed and transportation costs.

3 Solution architecture

For continuous linear programming problems, the solvers of GENIOS and lpSolve were used. Mixed-integer problems were narrowed down to the capacitated warehouse location problem and tackled with the help of GENIOS and self-implemented heuristics, namely Add and Drop [13]. These special heuristics for CWLP were implemented as stored procedures in HANA using SQLScript. An overview of the solution architecture is given in Figure 1.

Figure 1. Solution architecture (a parsing procedure in the SAP HANA cloud instance converts .MPS/.TXT problem files into input tables; solvers: GENIOS, Add and Drop in SQLScript, and lpSolve via an R script on the R server; results are written to an output table)

3.1 Integration concept and functionality

Optimization models are usually stored in a common format (such as MPS [14]). Due to the fact that SAP HANA (SPS version 08) has no features to directly read such a format, it was necessary to write procedures for importing and parsing the problem data according to each of the solvers' requirements.

One such procedure converts a textual input format (i.e., .mps, .txt) into a table-based format that the various solvers (GENIOS, lpSolve, custom algorithms) can use. All solvers rely on a common data model. This is the data model underlying GENIOS [6], which was deemed appropriate for the other solvers as well. In this model, the data of the optimization problem are distributed into five input tables, as shown in Figure 1 ("T" indicates a table).

Input data are imported at the moment when the solver procedure is invoked. Next, the optimization task is carried out. Afterwards, the final result (objective function value) and the performance parameters are sent to the output table.

3.2 Performance measuring concept

In order to measure the solvers' performance and the accuracy of the test results, the following indicators were applied:
- Function runtime – time needed for calculation of the solution (i.e. the time the data spends in the optimization algorithm)
- Execution time – time from procedure invocation to the end of the procedure, resulting in output of the optimization result; includes time needed to access and transfer the problem data
- Data load time – difference between execution time and function runtime
- Number of iterations – number of iterations the optimization algorithm needs to compute the final solution
- Solution value – final value of the objective function for the particular problem being tested

Whereas the first indicator focuses on the time the optimization algorithm needs, the second one additionally considers the time needed for data transfer from and to the database tables. The third one – total number of iterations needed for calculation of the final solution – helps to understand the algorithm's performance and the problem complexity (i.e. numerical stability/instability). The fourth indicator, solution value (final value of the objective function), allows comparing the achieved solution with the optimal solution provided in the test case library.

4 Characteristics of problem set and test environment

In order to be able to validate the test results, collections of optimization problems suggested in the literature were used. For the linear programming tests, 80 real-life problems of different size and complexity were taken from the NetLib library [15]. NetLib is a set of LP problems available on the Internet, which is often used for benchmarking LP solvers.

Table 1 outlines the LP problems chosen for the tests. The columns show the name of the problem (in NetLib); the numbers of constraints, variables and non-zero elements; and the known optimum of the objective function. The number of non-zeros is important because it indicates the amount of input data to be parsed into the input tables and then read during execution of the solution procedure.

Similarly, a set of 40 reference problems from a well-known library of CWLP models [16], proposed by Beasley [17], were selected for the test runs using the MIP solvers (cf. Table 2).

Model data of all of the problems displayed in Tables 1 and 2 were retrieved from the libraries and saved locally as text files.
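Although the solvers themselves are only callable from inside HANA (GENIOS, SQLScript) and R (lpSolve), the general shape of the task after parsing, minimize c'x subject to A x <= b with x >= 0, can be illustrated with a short, self-contained sketch. The snippet below is only a stand-in using scipy.optimize.linprog; the coefficient values are invented for illustration and are not taken from NetLib.

```python
# Minimal LP sketch; scipy stands in for the GENIOS / lpSolve calls described above.
# The coefficients are invented for illustration and do not correspond to a NetLib case.
from scipy.optimize import linprog

# Maximization of 2x1 + 3x2 + x3 is expressed as minimization of the negated objective.
c = [-2.0, -3.0, -1.0]
A_ub = [[1.0, 1.0, 1.0],      # two capacity-style constraints, x >= 0 by default
        [2.0, 1.0, 0.0]]
b_ub = [10.0, 8.0]

res = linprog(c=c, A_ub=A_ub, b_ub=b_ub, method="highs")

# Rough mapping onto the indicators of Section 3.2:
print("solution value:", -res.fun)      # final objective function value
print("variables:", res.x)
print("iterations:", res.nit)           # number of iterations of the solver
```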

Table 1. LP test cases used for evaluation (80 NetLib problems, from AFIRO to MAROS-R7; columns: Name, Cons, Var, Nonzeros, Optimum)

Table 2. CWL test cases used for evaluation (40 Beasley problems, CAP41 to CAPA (12000); columns: Name, Cons, Var, Tableau, Optimum)

The hardware configurations used for the test runs have the following characteristics:
- HANA server: 64 virtual cores (2 GHz), 1 TB RAM
- R server: 4 virtual cores (2 GHz), 16 GB RAM.

This infrastructure was shared between all users of the HANA instance pool.

5 Results

The test problems outlined in section 4 were uploaded, parsed, and tested using the solvers implemented in HANA. The results were then exported to Excel in order to perform a more detailed analysis.

Since the infrastructure was shared between many users assigned to the same HANA instance pool, it cannot be assumed that the hardware resources were exclusively allocated to a particular test run. In order to obtain more reliable results, variances in resource allocation must be leveled. Therefore, each test case was repeated several times (at different times and days) and then the mean value was taken.

5.1 LP results

For the linear programming problems, test results are shown in Table 3.

In particular, the execution time, the function runtime (i.e. the time the algorithm needs), and the data-load time of each problem are listed.

Table 3. LP test results (per NetLib problem: execution time, function runtime and data-load time in milliseconds, for GENIOS and for R (lpSolve))

Three test cases could not be solved on the R server, but were returned with execution time overrun. In all other cases, both solvers returned the same optimal solution as documented in the NetLib library.

Execution times for GENIOS are much shorter than execution times for lpSolve (on average three times shorter). The most likely explanation is that GENIOS is tightly integrated with the database [8, p. 14] and therefore can access problem data very quickly. In contrast to this, lpSolve requires problem data to be sent to the R server via a network first, before the problem can be solved.

This explanation is supported by an analysis of the data-load times. Comparing these times, on average 290 milliseconds more per test run were needed for data transfer between the R solver and the database than between GENIOS and the database.

Regarding the function runtime, the dominance of GENIOS is not as clear. GENIOS still outperformed R, yet only in 61 test cases. In 19 test cases, the function runtime of the R solver was smaller than the function runtime of the GENIOS solver. However, these shorter runtimes can only compensate the difference in execution times by approximately 15%.

An important observation is that most NetLib problems are numerically unstable. This requires additional operations and checking, resulting in a significant amount of runtime, regardless of the problem size.

5.2 CWL results

The two CWLP heuristics used, Add and Drop, do not necessarily find an optimal solution. Therefore, we compared not only their performance measures with GENIOS but also their solution quality. The test results are shown in Table 4. The column "Abs. deviation" shows how far away from the known optimum (as documented by academia [16]) the solution found by the particular solver is.

Regarding solution quality, GENIOS remains unrivaled, being able to find the optimum in every test run. Looking at the other two solvers, more accurate results are returned by Add, with a mean deviation from the optimum of 0.19%. Results provided by Drop are mostly inaccurate, with a mean deviation from the optimum of 31%.

Considering execution times, GENIOS outperforms the other solvers. It is 16 times faster than Drop and more than 164 times faster than Add. This huge difference is mostly due to the fact that our implementations are not very sophisticated. Therefore, a large number of iterations are needed to approximate the optimum. The Add procedure is slower than Drop, but its solution quality is better.

Table 4. CWL test results (per CWL problem: execution time and function runtime in milliseconds for GENIOS, Add and Drop, and the absolute deviation from the known optimum in percent)

An analysis of the function runtime yields similar results. GENIOS is by far the fastest. Its mean calculation time is 340 milliseconds per test run, whereas Drop and Add need 4,047 milliseconds and 39,404 milliseconds, respectively.

However, it needs to be mentioned that GENIOS was not able to solve the larger CWL problems (100,000 non-zeros) – not even within the same time that the heuristics took. Apart from this limitation, GENIOS delivered more accurate results within significantly shorter times than any of the two heuristics.

6 Conclusions and outlook

Based on our comprehensive test runs, it can be concluded that GENIOS offers significantly better performance than the other optimization approaches. This is true for both the LP and the CWL problems.

The performance gap between GENIOS and lpSolve on the R server can be partly explained by the differences in data transfer between the database and the solver. GENIOS is tightly integrated with HANA's index server, resulting in very fast data access. For lpSolve, on the other hand, data must be transferred over a network to the R server.

However, looking solely at the function runtime, the difference between GENIOS and lpSolve is not so big. For the test problem set, GENIOS still outperformed R, but only in three quarters of the problems. This confirms again that the most crucial factor is the relatively slow data transfer between the HANA database and the R server. Shorter function runtimes in R could not compensate this slowdown.

For the CWL problems, huge differences between GENIOS and the Add and Drop heuristics were observed, regarding both execution time and solution quality. Whereas the heuristics miss the optimum in two out of three cases, GENIOS computes the exact optimal solution in all cases. Add returns better solutions than Drop. While Add is slower than Drop, both heuristics are very slow compared to GENIOS.

The results show that both heuristic approaches were not the best choices. Other techniques most likely would provide better results. However, it is doubtful whether SQLScript-based implementations could ever compete with GENIOS.

The main limitation of our results is that all calculations were performed using a shared infrastructure. Hence, the performance results may be partially biased by varying amounts of resources allocated to the test runs. In further research, performance tests should be conducted using a dedicated infrastructure. This would enhance the credibility of the performance measurements and conclusions found in this study.

In addition, more external solvers from well-known optimization packages (such as Cplex, Gurobi, or Lingo) should be tested. Since GENIOS is tightly integrated with HANA, it would be interesting to see how those really "external" servers compare with GENIOS regarding the runtimes for optimization.

References

[1] K. Kurbel: Enterprise Resource Planning and Supply Chain Management. Functions, Business Processes and Software for Manufacturing Companies. Springer, Heidelberg, New York, 2013.
[2] M. Hoppe: Sales and Inventory Planning with SAP APO. Galileo Press, Boston, 2007.
[3] H. Plattner and A. Zeier: In-Memory Data Management - Technology and Applications. Springer, Berlin, Heidelberg, 2012.
[4] H. Plattner: A common database approach for OLTP and OLAP using an in-memory column database. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data (pp. 1-2). ACM, Providence, Rhode Island, USA, 2009.
[5] J. Word: SAP HANA Essentials. 2nd ed., Epistemy Press, 2012.
[6] SAP AG: OFL AFL (internal documentation). 2014.
[7] SAP AG: SAP HANA Predictive Analysis Library (PAL). 16 February 2015. [Online]. Available: http://help.sap.com/hana/SAP_HANA_Predictive_Analysis_Library_PAL_en.pdf. [Accessed 20 February 2015].
[8] SAP AG: SAP HANA Developer Guide - SAP HANA Platform SPS 08. 21 August 2014. [Online]. Available: http://help.sap.com/hana/SAP_HANA_Developer_Guide_en.pdf. [Accessed 20 January 2015].
[9] K. Kurbel and D. Nowak: Is SAP HANA Useful for Optimization? - An Exploration on LP Implementation Alternatives. In: Proceedings of the ERP Future 2014 Conference, 17.-18. November 2014, Dornbirn, Austria, 2014.
[10] SAP AG: SAP HANA SQLScript Reference. 2014. [Online]. Available: http://help.sap.com/hana/SAP_HANA_SQL_Script_Reference_en.pdf. [Accessed 20 January 2015].
[11] SAP AG: SAP HANA R Integration Guide. 2014. [Online]. Available: http://help.sap.com/hana/SAP_HANA_R_Integration_Guide_en.pdf. [Accessed 20 January 2015].
[12] K. Konis: R Interface for lp_solve version 5.5.2.0. 12 November 2014. [Online]. Available: http://cran.r-project.org/web/packages/lpSolveAPI/lpSolveAPI.pdf. [Accessed 20 February 2015].
[13] S.K. Jacobsen: Heuristics for the capacitated plant location model. European Journal of Operational Research, 12(3):253-261, 1983.
[14] P. Notebaert and K. Eikland: MPS file format. [Online]. Available: http://lpsolve.sourceforge.net/5.5/mps-format.htm. [Accessed 31 March 2015].
[15] Netlib.org: The NETLIB LP Test Problem Set. [Online]. Available: http://www.netlib.org/lp/data/. [Accessed 20 January 2015].
[16] J.E. Beasley: OR-Library - Capacitated warehouse location. June 2012. [Online]. Available: http://people.brunel.ac.uk/~mastjjb/jeb/orlib/capinfo.html. [Accessed 20 January 2015].
[17] J.E. Beasley: OR-Library: distributing test problems by electronic mail. European Journal of Operational Research, 41(11):1069-1072, 1990.

Security Monitoring and Analytics of HPI FutureSoC Lab (Phase I)

Amir Azodi, David Jaeger, Feng Cheng, Christoph Meinel

Hasso Plattner Institute (HPI) 14482 Potsdam, Germany [email protected], [email protected], [email protected], [email protected]

Abstract. Utilizing the resources of the HPI FutureSoC Lab we have worked on resolving a number of challenges in the field of Network Monitoring. As computer networks grow in size and complexity, monitoring them becomes more challenging. In order to meet the needs of IT administrators maintaining such networks, various Network Monitoring Systems (NMS) have been developed. Most NMSs rely solely on active scanning techniques in order to detect the topology of the networks they monitor. We propose a passive scanning solution using the logs produced by the systems within the networks. Additionally, we demonstrate how passive monitoring can be used to develop a holistic knowledge graph of the network landscape.

Index terms— Network Monitoring, Network Graph, Event Processing, Event Normalization

1 Introduction

A primary benefit of NMSs is that they allow the administrators to monitor the systems across their network more effectively by being able to observe events of interest, whether they are related to the security, availability, quality of service, or other aspects of keeping a network and its available services operating in an optimal manner. Often administrators need to target systems matching specific criteria within their network's landscape, for instance when trying to patch newly discovered security vulnerabilities, issuing software updates, or detecting hardware anomalies. License management and regulatory compliance are further areas where operating an NMS can be beneficial. Interestingly, although the concept of a generic and automated NMS — using logical network topology graphs — has been around for quite some time [1], software vendors have largely opted for providing vendor-specific NMSs. As a result, even for simple tasks such as getting the system uptime information of a particular host, network administrators have had to deal with different NMSs or even rely on simpler methods (i.e. locally/remotely reading a host's system logs). Many software vendors provide their own means of monitoring their systems within a network. Therefore these

NMSs are often not self-contained and can only monitor systems specific to a subset of the network.

Fig. 1. This figure demonstrates the logical difference between Active and Passive network monitoring mechanisms.

2 Using Processed Events to Generate a Network Graph

Mapping out a network requires an understanding of the underlying logical topology of the network. By focusing on passive monitoring we inherently avoid performance and compatibility issues marring active monitoring systems. From the events which are useful in building a network graph (events that include information from the higher layers of the OSI Model), a subset is used for mapping the lower levels of the network's logical structure. We refer to this subset as Network Layer Events (NLE). NLEs are often produced by networking systems such as Routers, Switches, Firewalls, as well as basic network services such as DHCP, DNS, IDS Sensors, Wireless Access points, etc. Events used to populate information about an individual object's applications are referred to as Application Level Events (ALE). ALEs are produced by Operating Systems, Web Servers, FTP Servers, Mail Servers, Authentication Services, etc. By processing a set of events produced by a network firewall, our passive monitoring system can detect an internal network and the presence of hosts within that network. The sample event set used for the purposes of this paper is comprised of 1586 events produced over a time span of 4 hours and 12 minutes by a Cisco ASA firewall system. We used a relatively small event set in order to be able to produce reasonably sized figures due to space constraints.
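As an illustration of this mapping step, the sketch below accumulates a simple connection graph from already-normalised firewall events. The event field names (src_ip, dst_ip, dst_port) and the example records are assumptions; the actual NLE attributes depend on the log source and on the normalisation performed by REAMS.

```python
# Illustrative accumulation of a network graph from normalised firewall events.
# Field names (src_ip, dst_ip, dst_port) are assumptions; real NLEs depend on the log source.
from collections import defaultdict
from ipaddress import ip_address, ip_network

INTERNAL = ip_network("10.10.10.0/24")    # example internal subnet discussed in this paper

def build_graph(events):
    """events: iterable of dicts like {'src_ip': ..., 'dst_ip': ..., 'dst_port': ...}."""
    edges = defaultdict(set)               # (src, dst) -> set of destination ports seen
    internal_hosts, external_hosts = set(), set()
    for ev in events:
        src, dst = ev["src_ip"], ev["dst_ip"]
        edges[(src, dst)].add(ev["dst_port"])
        for host in (src, dst):
            if ip_address(host) in INTERNAL:
                internal_hosts.add(host)
            else:
                external_hosts.add(host)
    return internal_hosts, external_hosts, edges

events = [
    {"src_ip": "10.10.10.5", "dst_ip": "10.10.10.1", "dst_port": 53},
    {"src_ip": "10.10.10.5", "dst_ip": "93.184.216.34", "dst_port": 443},
]
internal, external, edges = build_graph(events)
print(internal, external, dict(edges))
```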

Using the information extracted from the processed events, we attempt to construct a map of the network topology by observing the connections between the communicating hosts. Figure 2 demonstrates a simple example mapping communications between one subnet of the internal network and a subset of the external networks and hosts. Subsequently, by monitoring the application level events, an overall specification list for the individual systems is formed. Host-specific information is then overlaid onto the network topology map to construct a comprehensive view of the network. In addition to the ALEs, the context information added to the ALEs during their processing by REAMS can also be used to create a more complete image of the system. By analyzing NLEs in our sample log file, we can detect the presence of multiple external networks. As a security feature, we resolved the external IP addresses discovered in the events and matched them against a list of known malicious domains found in [2]. We discovered no correlation between the external hosts and the list of known malicious domains. However, such tests could be instrumental in discovering network security breaches by network monitoring systems. Table ?? represents the 10.10.10.0/24 subnet discovered (one of the internal subnets) and displays the hosts within the subnet as well as the ports used for communication by those hosts.1 We resolve the port numbers against a knowledge base of port number information. Table 1 presents a mapping between the ports and the relevant information. At this point we can observe that there is no service registered on port 1028, although it is being used. Additionally, it can be seen that in the past certain malware has used this port for communication. In a commercial setting, the NMS could raise an alert to inform the network administrators of the event's occurrence.

Port # | Known Service(s) | Status
17 | Quote of the Day | Official
22 | Secure Shell (SSH) | Official
53 | Domain Name System (DNS) | Official
80 | Hypertext Transfer Protocol (HTTP) | Official
123 | Network Time Protocol (NTP) | Official
161 | Simple Network Management Protocol (SNMP) | Official
427 | Service Location Protocol (SLP) | Official
443 | Hypertext Transfer Protocol over TLS/SSL (HTTPS) | Official
500 | Internet Security Association...(ISAKMP) | Official
993 | IMAP over TLS/SSL (IMAPS) | Official
1027 | Native IPv6 behind IPv4-to-IPv4 NAT | Official
1028 | KiLo, SubSARI | Unknown
1702 | Deskshare | Unofficial
1906 | Unknown | Unknown

Table 1. Correlating a subset of the communication open ports with likely services.²

1 Some ports and hosts were omitted due to size constraints
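The port-to-service correlation of Table 1 boils down to a lookup against a local knowledge base plus an alert condition for ports that are in use but not registered, as in the port 1028 case above. The sketch below illustrates this; the small dictionary is only an excerpt standing in for the actual knowledge base, whose contents are not part of this report.

```python
# Minimal port-to-service lookup; the knowledge base below is a small illustrative excerpt.
KNOWN_PORTS = {
    22: ("Secure Shell (SSH)", "Official"),
    53: ("Domain Name System (DNS)", "Official"),
    80: ("Hypertext Transfer Protocol (HTTP)", "Official"),
    443: ("HTTP over TLS/SSL (HTTPS)", "Official"),
}

def correlate(observed_ports):
    """Return (port, service, status) rows and collect ports worth alerting on."""
    rows, alerts = [], []
    for port in sorted(observed_ports):
        service, status = KNOWN_PORTS.get(port, ("Unknown", "Unknown"))
        rows.append((port, service, status))
        if status == "Unknown":
            # Port is in use but not mapped to a registered service -> candidate alert.
            alerts.append(port)
    return rows, alerts

rows, alerts = correlate({22, 53, 80, 443, 1028, 1906})
for row in rows:
    print(row)
print("unregistered ports in use:", alerts)
```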

Fig. 2. A simplified view of the network topology using event driven passive network discovery.

References

1. A.B. Bondi. Network management system with improved node discovery and monitoring, January 20, 1998. US Patent 5,710,885.
2. The DNS-BH project. Malware prevention through domain blocking (black hole DNS sinkhole), 2014. [Online; accessed 11-August-2014].

2 Well Known Ports: 0 through 1023. Registered Ports: 1024 through 49151. Dynamic/Private: 49152 through 65535.

Technical Report

Machine Learning for Security Analytics powered by SAP HANA (Phase II)

Future SOC Lab Project by Andrey Sapegin, Marian Gawron, Feng Cheng, Christoph Meinel

Potsdam, Version: March 23, 2015


1 Project Concepts

Nowadays, malicious user behaviour that does not trigger an access violation or a data leak alert is difficult to detect. Using stolen login credentials, an intruder conducting espionage will first try to stay undetected: silently collect data from the company network and use only resources he is authorised to access. To deal with such cases, a Poisson-based anomaly detection algorithm is proposed in this report. To validate the proposed approach, we developed a special simulation testbed that emulates user behaviour in the virtual network environment deployed in the Future SOC Lab. The proof-of-concept implementation has been integrated into our prototype of a SIEM system, the Real-time Event Analysis and Monitoring System (REAMS), where the emulated Active Directory logs from a Microsoft Windows domain are extracted and normalised into the Object Log Format for further processing and anomaly detection. The experimental results show that our algorithm was able to detect all events related to malicious activity and produced zero false positives. Conceived as a module for our self-developed SIEM system based on the SAP HANA in-memory database, our solution is capable of processing high volumes of data and shows high efficiency on the experimental dataset. The results of this project phase will be utilised to analyse real data and confirm the efficiency of this approach on a non-simulated dataset.

1.1 Target scenarios

We decided to analyse the case where a user is authorised to access a resource, but the resource is never used by this user or by any other users (from the same group) under normal working conditions. This case emulates an example of malicious user behaviour, where the user credentials were stolen and are now used to collect information from the enterprise network while avoiding any access violations, in order to stay undetected by an intrusion detection system. For this case we describe two scenarios: (1) normal, which will be used to generate a training dataset, and (2) abnormal, used to generate a testing dataset, in which all malicious events should be captured automatically. The details are illustrated in Figures 1.1 and 1.2.

Figure 1.1: Normal behaviour of users in the network

Figure 1.1 describes the normal behaviour of users in a small network. Alice, Bob and Carol are ordinary network users. All users have access to both Server 1 and Server 2,

however, only Bob and Carol regularly log on to Server 1, and nobody accesses Server 2. As in a real network, a user first needs to log in to his own PC, and only then can he access a server remotely. The users can also log in on each other's PCs, e.g. Carol logs on to PC1 of Alice. Finally, the network administrator has access to all virtual machines in the network and regularly uses all servers (including the Domain Controller) as well as his own PC. In contrast to the normal scenario, Figure 1.2 shows an example of malicious user behaviour.

Figure 1.2: Malicious user behaviour of user Bob

In Figure 1.2, Bob connects to Server 2, which was never used before, neither by him nor by other users, despite the fact that all users are authorised to access this server. We classify such cases as malicious user behaviour. Bob's local connection to the Admin PC should then be classified as malicious as well. However, the fact that Alice connects to Server 1 should be classified as benign, even though it was also not expected for her. We do not mark such cases as suspicious, since Server 1 has been regularly accessed by other users from the same group (Bob and Carol). We therefore classify only Bob's behaviour as malicious and will try to capture it during the analysis phase.

Our simulation scenario is based on a testbed with virtual machines running on a VMware ESXi virtualisation server inside the Future SOC Lab network. For all PCs (PC1-PC3 as well as the Admin PC) we use virtual machines with Microsoft Windows 7. Server 1 and Server 2 have Microsoft Windows Server 2003 installed, while for the Domain Controller we set up Microsoft Windows Server 2012. The Domain Controller provides only Active Directory services, DHCP and DNS; all virtual machines have Remote Desktop enabled. Using this testbed, we simulate logins on the user PCs as well as connections to the servers and the domain controller.

To simulate user logon and logoff events, we decided to utilise the VNC access functionality of the VMware ESXi hypervisor. We have created a program in Python that is able to simultaneously connect to different virtual machine consoles and execute user actions, such as logon and logoff, based on the screen capture and recognition functions provided by the Python Imaging Library. To simulate connections from user PCs to servers, we use the Remote Desktop Protocol. The simulation software is able to unlock the computer (by sending the "Ctrl-Alt-Del" combination and typing in a password) and then start a PowerShell script located on the desktop of each virtual PC. This script opens a connection via Remote Desktop to the predefined server.


Since every logon event (both local and via RDP) is saved as a Windows event and reported to the domain controller, we are able to realistically reproduce Active Directory logs for our scenario of user behaviour, including both local logon/logoff events and connections to servers.

2 Poisson-based anomaly detection

To identify anomalies in the user behaviour, we first divide the data into time intervals. We choose the appropriate time interval heuristically, since it depends on the time range of a dataset, the number of users and on how often they perform logon events. For our simulated data we consider 15 minutes to be the optimal time interval. After selection of the time interval, we calculate the number of logon events per time interval for each user group on each workstation and check the probability of this number of logon events according to the Poisson formula:

P(x) = \frac{e^{-\lambda}\,\lambda^{x}}{x!} \qquad (2.1)

where x is the number of logon events per time interval. Please see Algorithm 1 for details.

Algorithm 1 Anomaly detection groups(threshold_group)
1: for all {user group, workstation} from training data do
2:   λ_group = average number of logon events per time interval
3: end for
4: for all time interval in testing data do
5:   for all {user group, workstation} from testing data do
6:     logons = number of logon events
7:     if PoissonsProbability({user group, workstation}, logons, λ_group) < threshold_group then
8:       mark {user group, workstation, time interval} as suspicious
9:     end if
10:   end for
11: end for

Algorithm 1 first calculates the value of λ_group for each {user group, workstation} pair. For a Poisson distribution, this value can be calculated as the average number of logon events per time interval (line 2). The λ_group value represents a model of normal user group behaviour, based on the training data. In lines 4-11 we apply our model to the testing data to identify events that differ from the modelled normal behaviour of user groups. Using the same time interval as for the training data, we count the number of logon events for each {user group, workstation} pair in the testing data (line 6) and calculate its probability (according to Formula 2.1). In line 7 we compare the probability of a particular number of logon events for a user group on some workstation within a particular time interval with the threshold value. If the probability is less than the threshold, we mark the triplet {user group, workstation, time interval} as suspicious in line 8.
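The following Python sketch mirrors this group-level check. It is an illustrative re-implementation, not the SQLScript/HANA module described in the report; the 15-minute binning comes from the text above, while the event layout and the threshold value are assumptions.

```python
# Illustrative re-implementation of the group-level Poisson check (Algorithm 1).
# Event field layout and the threshold value are assumptions; the real module runs on SAP HANA.
import math
from collections import Counter, defaultdict

INTERVAL = 15 * 60   # seconds; the heuristically chosen time interval from the text

def poisson_probability(x, lam):
    # Formula 2.1: P(x) = e^(-lambda) * lambda^x / x!
    return math.exp(-lam) * lam ** x / math.factorial(x)

def learn_lambdas(training_events):
    """training_events: iterable of (unix_timestamp, user_group, workstation) logon events."""
    counts = Counter((int(t // INTERVAL), g, w) for t, g, w in training_events)
    per_pair = defaultdict(list)
    for (_, g, w), n in counts.items():
        per_pair[(g, w)].append(n)
    # lambda_group = average number of logon events per time interval (line 2 of Algorithm 1);
    # simplification: the average is taken over intervals in which the pair was active.
    return {pair: sum(ns) / len(ns) for pair, ns in per_pair.items()}

def detect_suspicious(testing_events, lambdas, threshold_group=0.01):
    counts = Counter((int(t // INTERVAL), g, w) for t, g, w in testing_events)
    suspicious = []
    for (interval, g, w), logons in counts.items():
        lam = lambdas.get((g, w), 0.0)        # pairs unseen in training get lambda = 0
        if poisson_probability(logons, lam) < threshold_group:
            suspicious.append((g, w, interval))   # line 8: mark the triplet as suspicious
    return suspicious
```

Note that a pair never seen in the training data (such as Server 2 in the scenario above) receives λ = 0, so any logon count greater than zero gets probability 0 and is flagged.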


For all revealed suspicious events we utilise Algorithm 2 to identify which users from the highlighted groups caused them.

Algorithm 2 Anomaly detection users(threshold_user)
1: for all {user, workstation} from training data do
2:   λ_user = average number of logon events per time interval
3: end for
4: for all time interval in testing data do
5:   for all {user, workstation} from testing data where {user group, workstation, time interval} is suspicious do
6:     if PoissonsProbability({user, workstation}, λ_user) < threshold_user then
7:       mark {user, workstation, time interval} as anomaly
8:     end if
9:   end for
10: end for

Similar to Algorithm 1, Algorithm 2 calculates λ_user based on the training data. First, we find the average number of logon events per time interval for each {user, workstation} pair (line 2). Next, from the testing data, we select all users related to suspicious events identified by Algorithm 1 and check the probability of the number of logon events per time interval for each user and each workstation (lines 4-10) using Formula 2.1. If the probability is less than the threshold, we mark all events related to the triplet {user, workstation, time interval} as anomalies. Using a two-step probability check – first for user groups, and then for particular users from suspicious groups – we avoid false positive anomaly alerts for situations where a user performs some action (a login on a workstation) that is not usual for him personally, but is usual for his user group.
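A matching sketch of the user-level drill-down could look as follows. It reuses the same binning and Poisson probability as the group-level sketch above; all names and the threshold are again illustrative rather than the authors' implementation.

```python
# Illustrative user-level drill-down (Algorithm 2): only users belonging to triplets already
# flagged by the group-level check are re-examined. Names and threshold are assumptions.
import math
from collections import Counter

INTERVAL = 15 * 60   # seconds, same binning as the group-level check

def poisson_probability(x, lam):
    return math.exp(-lam) * lam ** x / math.factorial(x)

def drill_down(testing_events, user_lambdas, suspicious, threshold_user=0.01):
    """testing_events: (unix_timestamp, user, user_group, workstation) logon events.
    user_lambdas: {(user, workstation): average logons per interval} learned from training data.
    suspicious: set of (user_group, workstation, interval) triplets from Algorithm 1."""
    counts = Counter((int(t // INTERVAL), u, g, w) for t, u, g, w in testing_events)
    anomalies = []
    for (interval, user, group, workstation), logons in counts.items():
        if (group, workstation, interval) not in suspicious:
            continue                                    # line 5: restrict to suspicious triplets
        lam = user_lambdas.get((user, workstation), 0.0)
        if poisson_probability(logons, lam) < threshold_user:
            anomalies.append((user, workstation, interval))   # line 7: mark as anomaly
    return anomalies
```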

Let’s review the generated dataset and check if our algorithm is able to detect cases of simulated malicious user behaviour. We present distribution of logon-related events from the training dataset in Figure 3.1, while Figure 3.2 shows the same distribution for the testing dataset. Figures 3.1 and 3.2 describe the distribution of simulated logon events in the testbed network. Each circle points to the logon of the user (on the x-axis) on the workstation or server (on the y-axis). The logarithmically scaled size of the circles reflects number of events for each particular user and workstation/server. Both Figures correspond to original simulation scenarios, which provides an extra proof, that no logon activities were lost during data filtering. After our Poisson-based anomaly detection algorithm learns user behaviour model from training dataset, the model is applied on testing dataset to find anomalies, that do not fit into the learned model. We present the identified anomalies on Figure 3.3. Figure 3.3 shows results of anomaly detection using the same representation as Fig- ures 3.1 and 3.2. Compared to Figure 3.2, it presents the subset of events (including users and workstation), that were marked as anomalous by our algorithm. The results prove that our algorithm has successfully captured all logon-related events for user

Figure 3.1: Distribution of user logon events in the training dataset (users Administrator, Alice, Bob and Carol on the x-axis; Server_1, Server_2, Domain controller, Admin PC and PC1-PC3 on the y-axis)

Figure 3.2: Distribution of user logon events in the testing dataset (same axes as Figure 3.1)

Figure 3.3: Detected user behaviour anomalies (user Bob on Server_2 and the Admin PC)

The results prove that our algorithm has successfully captured all logon-related events for user Bob on both Server 2 and the Admin PC. All these events are malicious according to our scenario. Moreover, no other event was classified as an anomaly, which implies that our approach of performing a two-step probability check, first for user groups and then for each user individually, helped to avoid producing any false positive results.

4 Conclusion

We have developed an effective approach to detect malicious user activity even when such activity does not violate any access or data protection policies. Our results show that our algorithm is able to detect all existing malicious events without producing any false positive alarms. Such precision was possible due to the performed two-step probability check. By checking the Poisson probability first for each {user group, workstation} pair, and later for {user, workstation} pairs, we are able to correctly classify legitimate cases where a resource regularly accessed by a group of users was not accessed by one particular user from that group for a long time period.

Our concepts were checked using a proof-of-concept implementation that utilises a simulated dataset. The simulation tool emulates user activity to produce real events on the domain controller within our testbed in the Future SOC Lab. The use of a simulated dataset not only allows us to precisely estimate the efficiency of our algorithm, but also to emulate normal user behaviour in cases where cleaning up a real training dataset would be too complex. If a real training dataset contains undetected malicious activity, our algorithm will be unable to detect the same kind of activity later in the data; however, all new instances of malicious user behaviour that were not captured in the training dataset will still be detected.

We hope that the future integration of our Poisson-based anomaly detection approach into a SIEM system prototype like REAMS will significantly improve its ability to detect cases of malicious user behaviour and achieve a higher level of security in the monitored network.


Sales Planning and Forecast

Frank Morelli
Pforzheim University of Applied Sciences
Tiefenbronnerstr. 65
75175 Pforzheim
[email protected]

Lukas Stahl
Pforzheim University of Applied Sciences
Tiefenbronnerstr. 65
75175 Pforzheim
[email protected]

Stefan Kerl
PIKON International Consulting Group
Kurt-Schumacher-Str. 28 - 30
66130 Saarbrücken
[email protected]

Abstract

This research project presents a sales planning and forecast solution based on proposals generated by SAP HANA. The major goal is to explore potential benefits derived by using this IT technology in comparison to classical sales planning in practice from a business perspective. In a concrete scenario for a fictitious SME company, a human sales planner interacts with SAP HANA and Predictive Analysis Library (PAL) features to create an accurate plan. Within the involved system landscape an automatic transformation of real-time ERP data into high-level information in SAP BW on HANA takes place and external data is added. The prototype demonstrates the modus operandi of the interaction between a human sales planner and the active management support from SAP HANA based on the PAL methods.

2 Introduction

The central concern of operative planning activities within a company is to make strategic targets work. In contrast to the field of strategic planning, it is based on a system of interrelated subplans typically combined with short-term activities. Sales planning, as a part of the integrated sales and operations planning process, continually achieves focus, alignment and synchronization among all functions of the sales organization. It is based on an (updated) forecast referring to past sales and trend analysis. The result has to be aggregated sales plan data in one uniform view comprising all relevant components and participants (e.g. groups, business units). Thus current planning and forecasting applications face the problem of delivering exact and detailed plan data on the one hand and reducing the user effort during the planning process to a minimum on the other.

This research project aims to implement and evaluate an IT solution employing the technical capabilities of SAP HANA to optimize a sales planning process with regard to the detail level and accuracy of proposed plan data. It is based on an active support paradigm for an IT system within the field of analysis, data mining, forecast, projections, and simulations. The SAP system is designed to generate proposals which can be used by a human sales planner (in the best case) without any adaption, on a high (aggregated) and appropriate level. The human-computer interaction is located on the level of SAP BW on HANA. A proposal of plan data is generated based on actuals of the past (loaded from an SAP ERP system) combined with external data by methods of the Predictive Analysis Library (PAL). Customer planning function types trigger the predictive analysis process and provide the result to the planning application. After checking and adapting the proposed plan data, a disaggregation to the required detail level takes place in real time. Saved plan data is therefore immediately available for further analysis (in other business areas like finance or production), which could lead again to adaption of plan data (closed loop).

Plan frequency and planning horizon depend on the specifics of each industry. Therefore a concrete branch, discrete manufacturing, has been chosen. Companies within the discrete manufacturing industry can be categorized by the production of distinct and therefore countable items. As the fictive Global Bike Company (GBI) is typically utilized within universities having integrated SAP content in their programs (members of the SAP UA program), the corresponding structure and data seem to be a suitable example. In general it can be compared to the IDES, the "International Demonstration and Evaluation System", created once by SAP to demonstrate various business scenarios executed in an SAP ERP system.

3 Business Scenario

The following subchapters describe the research project from a business perspective. They cover a characterization of the situation within a company intended to apply a sales planning application within SAP HANA including Predictive Analytics methods of the PAL, a short explanation of the methodical approach which was followed, and a description of the created data model.

3.1 Initial Situation and Hypotheses

As a concrete example the fictional company Global Bike Inc. (GBI) has been chosen. It contains application data with many realistic characteristics for a bicycle-producing company and business processes that are designed to reflect real-life business requirements (e.g. order-to-cash processes and procurement-to-pay handling).

GBI has a complete story attached to it and compris- es two companies located in the US and in Germany. Figure 1: Methodical Approach of a Predictive Analyt- The material spectrum includes trading goods, raw ics Process materials, semi-finished goods and finished goods. The actual detailed content, which requires license and use of SAP software to function, is available for 3.3 Data Model SAP UA members. Within the research project the existing data structure has been enhanced for scenar- Before building the prototype of the SAP BW on ios of make-to-stock production (production with no HANA system, an appropriate and consistent data customer relation) as well as make-to-order process- model needed to be conceptualized. First of all, the es. Talking hereinafter about GBI, the specific case actual sales data for the period from January 2008 to created for this research project is meant. December 2014 were generated by the usage of a self-programmed ABAP tool. Starting point for the The GBI produces eight different types of bicycles generation of the actual sales data was the total num- for the German market and sells exclusively to retail- ber of bicycles sold in Germany per year. All the ers. There are about 45 customers, each located in a market information has been drawn from official different city within Germany. The overall market statistics of the Federal Statistical Office Germany. share stands permanently at about 30%. In order to As a pre-assumption a solid market share of the GBI satisfy customer's volume requirements and to opti- was set whereby the number of sales units per year mize the company's inventory, the sales department was derived. This quantity was broken down into wants to implement a forecasting solution whose periods, customers and products. Hereby following results directly flow into the monthly sales planning influencing factors were taken into consideration: process.  Number of inhabitants per state and city To examine how the PAL works and to demonstrate the interaction between the forecasting proposals of  Percentage of bicycle traffic installations per the PAL and the sales planning in SAP HANA, some state (fixed over the whole period) central assumptions were made: First, the sale of  Actual monthly weather in the customer's city different types of bicycles depends on the geographic  Geographical location of the customer's city location of the customers. And secondly, there is a correlation between actual sales and the correspond- Following the breakdown of the yearly sales units per ing weather situation. For the weather, past weather customer to the material groups (type of bicycles), reports for the relevant period of time were used. the distribution to the single months took place through dividing the yearly amount by 12. Then 3.2 Methodical Approach weather and geographical position correction factors were applied to simulate the local physical character- "Predictive Analytics as a subfield of Business Intel- istics of each customer and the consequential buying ligence describes a type of preparation and evaluation patterns with regard to the single material groups. In of data to support the future decision making within terms of the weather, a monthly rating per city was all enterprise levels. The data mining is extended by made, categorizing the weather in "good", "medium" the determination of forecast values to provide in- and "bad" based on the corresponding actual tem- formation about the future in order to improve the perature and rainfall. 
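This monthly rating can be expressed as a simple derivation rule. The following SQL sketch only illustrates the idea; the table and column names (weather_raw, avg_temp, rainfall) and the concrete thresholds are assumptions for illustration, as the report does not state them.

-- Hypothetical derivation of the monthly weather rating per customer city;
-- thresholds are assumed, not taken from the project.
SELECT city,
       calmonth,
       CASE
         WHEN avg_temp >= 15 AND rainfall <= 40 THEN 'good'
         WHEN avg_temp >= 5  AND rainfall <= 80 THEN 'medium'
         ELSE 'bad'
       END AS weather_rating
FROM weather_raw;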
The geographical location of decision making." [1] the cities was categorized into the four categories "urban flat", "urban mountainous", "rural flat", and The basic idea of Predictive Analytics is to gain in- "rural mountainous" based on the corresponding formation and knowledge about the customer to im- number of habitants and geographical altitude of each prove the internal processes within a company. By customer's city. By taking the average monthly applying mathematical algorithms and by using sta- weather and the geographical location of the custom- tistical instruments, the available internal and exter- er's city into account, it was determined for each

20 material group how strong the number of sales react second for the sales quantity plan, and the third one on the factors. In case of a reaction on weather and for the sales revenue plan. geographical location factors with regard to a specific material group (e.g. mountain bike), the normal sales 4.2 Functions changes positively or negatively by up to 30%. At this point, assumptions regarding the correlations Generally, the planner can choose the planning detail were made. In the last step, the monthly sales num- level e.g. customer, material group, article and the bers per material group were distributed to the indi- monthly or yearly time period. The result area of the vidual articles (10 per material group). The first three forecast workbook shows the actual quantities of the articles within a material group assigned 70 % of the last seven years as reference. amount per material group, the next three further 20 % and the last four articles assigned to the remaining The forecast comprises two different proposal func- 10 %. The distribution was finally corrected by a tions: on the one hand it determines proposals only random factor of +/- 1 %. based on historical data of seven business years with a Time Series Algorithm (Exponential Smoothing) As soon as the data had been generated according to and on the other hand it is able to calculate proposals the procedure described above, the data was imported based on the weather expectations of the planner for via an interface to the Profitability Analysis (CO-PA) the next month by means of a Regression Analysis. module of the SAP ERP system. This was made as a To generate forecast proposals, the Forecast Smooth- substitute for the conventional way of creating CO- ing Algorithm was chosen. In the first step this algo- PA data out of invoices from the SD module in order rithm automatically selects and calculates the optimal to reduce the complexity of data generation within algorithm and corresponding parameters out of a set the research. of smoothing functions, including Single Exponential Smoothing, Double Exponential Smoothing and Triple Exponential Smoothing. Exponential Smooth- 4 Planning Application ing is generally a time series technique that can be used with any discrete set of repeated measurements. The second step results in a concrete forecast pro- 4.1 Concept posal based on the algorithm found in step one, which is delivered on the detailed level and trans- The main idea of the applied sales planning in this ferred to the planning environment of SAP BW on research project is to use the new capabilities of in- HANA. memory technology provided by SAP HANA within In order to simulate the possible weather effects on the planning and forecast process such as PAL, au- the sales quantity, the planner chooses the month to tomatic disaggregation, in-memory and real-time be adapted, one of the weather categories (1=good, access to SAP ERP data. 2=medium, 3=bad) and the minimum coefficient of In the designed period of the planning application, determination (between 0 and 1).This triggers a Re- the influence factors from a business´ perspective has gression Analysis (Linear Regression - LR) resulting to be determined from the very beginning. In this in a regression model which displays indicated corre- case, internal data from the company (e.g. historical lations between the specified parameters (sales quan- sales volumes of previous years) and external data tity and weather category). Based on this the fore- (e.g. 
past weather reports) will be considered in the casting with LR Algorithm proposes a precise opti- data model. Depending on types and origins of input mum forecast quantity if the minimum coefficient of data, certain technical preparations have to be made determination is reached. Afterwards the planner can and appropriate statistical methods provided by the manually change the forecast values at his own dis- PAL need to be chosen. As output from PAL algo- cretion. In this case the planner changes for example rithms highly significant forecast data on a detailed the monthly overall sales forecast for one specific level will be delivered. This is expected to be the customer and one material group. Then, the proposed solid basis for further planning steps such as sales values of each article automatically change analo- quantity and revenue planning. gously to the previous weighted distribution. For the sales planning type, the rolling sales planning As soon as the planner has finished the forecast steps, for the next 24 months was chosen. The rolling sales he saves the plan data and switches over to the sales planning works as an instrument to closely and dy- quantity planning workbook. The planner needs to namically intermesh the single sales plans over sev- transfer the proposed forecast 1:1 by using a copy eral planning periods. function. The forecast and the copied sales quantity As front-end tool, SAP BusinessObjects Analysis for are displayed in the result area of the sales quantity Office is used for the implemented planning applica- workbook. If necessary, the planner can make manu- tion. This application includes three different plan- al adjustments once again. If the adaptation takes ning workbooks, one for the forecast proposals, the place on a higher level, the plan data will be dis- aggregated to the detailed level. At this point of the

21 planning process, the planner is able to compare the Datastore Object (Type Infocube). An Advanced proposed forecast generated by the system with the Datastore Object is a new Infoprovider which com- actuals and plan quantities. While saving in sales bines functionalities of all classic providers like In- quantity workbook the data is also transferred to and focubes or Datastore Objects in a single provider visible in the sales revenue workbook. During the type. The Calculation View for prices was integrated transfer, a calculation of the revenues takes place. by Smart Data Access (SDA) in combination with an For the last step, the planner opens the sales revenue Open ODS View. These are new components as well, workbook. In addition to the sales quantities, the that allow easily building up a virtual data model monthly prices for each article and the calculated without persistent data storage in BW. During query plan revenues are displayed. The prices are main- runtime prices are directly read from SAP ERP. The tained on material level within the ERP System and plan data (generated by system or by manual user visualized in real-time. If required, the revenue can input) is stored into three real-time Infocubes. The be adjusted and again, the respective effected chang- separate basis providers have been combined in three es in all levels take place analogously to the previous Composite Provider of type union. On top of the calculated weighted distribution. If the revenue is composite providers, four aggregation levels were set manually adjusted, sales quantities will be adjusted up to build planning objects such as planning queries correspondingly. Revenues will be updated as well, if with SAP BEx Query Designer, planning workbooks the quantity has been changed. (SAP BO Analysis for Office) and planning func- tions, planning filters or planning sequences. Within As result of the planning process, planning quantities these planning functions (two of type Exit) the logic and revenues are available on detailed level for fur- for proposal generation was implemented by using ther analysis and usage in other company processes. ABAP Managed Database Procedures (AMDP), which is the bridge between ABAP application server 5 Prototype and HANA database. The Exit planning functions were implemented so that they can be executed com- The research prototype includes the four systems pletely in-memory. Planning functions were called SAP ERP on HANA, SAP BW on HANA, SAP from the planning workbook to determine for which BusinessObjects Server and a database for the exter- customer / product combination (depending on user nal data. The first component of the prototype is a input) a proposal should be generated. Inside plan- database where the information about the weather as ning function, HANA stored procedures were called well as the market data is extracted from. Within this with following algorithms of the PAL: Forecast database the external data is prepared and adapted to Smoothing (Time Series Algorithm) and Linear Re- the needs of the business scenario. This database is gression (Regression Algorithm). connected via SAP BusinessObjects (BO) Data Ser- vices to both SAP systems, ERP on HANA and BW SAP BW on HANA on HANA. 
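The PAL algorithms named above are invoked from the Exit planning functions through HANA stored procedures. The following is only a rough sketch of the call pattern commonly used for PAL at that time, assuming that a wrapper procedure (here called P_FORECAST_SMOOTHING) has already been generated for the Forecast Smoothing function; all table, procedure and parameter names are assumptions and not taken from the project.

-- Hypothetical input, parameter and result tables for a generated PAL wrapper.
CREATE COLUMN TABLE SALES_HISTORY (PERIOD_ID INTEGER, QUANTITY DOUBLE);
CREATE COLUMN TABLE PAL_PARAMS (NAME VARCHAR(100), INTARGS INTEGER,
                                DOUBLEARGS DOUBLE, STRINGARGS VARCHAR(100));
CREATE COLUMN TABLE FORECAST_RESULT (PERIOD_ID INTEGER, FORECAST DOUBLE);

-- Parameter name assumed; it would control how many periods are forecast.
INSERT INTO PAL_PARAMS VALUES ('FORECAST_NUM', 12, NULL, NULL);

-- Assumed wrapper procedure generated for the PAL Forecast Smoothing algorithm.
CALL P_FORECAST_SMOOTHING (SALES_HISTORY, PAL_PARAMS, FORECAST_RESULT) WITH OVERVIEW;

In the prototype such calls are embedded in ABAP Managed Database Procedures, so that the Exit planning functions can execute them completely in-memory; the Linear Regression algorithm used for the weather simulation would follow the same pattern with its own parameter table.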
Thus the data provisioning for generating the actuals (ABAP program on SAP ERP) as well as the data basis for the planning application (SAP BW on HANA) have been realized.

Figure 2: Data Model for the Research Project (BW schema with planning functions, aggregation levels, composite providers, the plan cubes for sales, sales quantity and forecast, an Open ODS view for prices and a CO-PA actuals aDSO; HANA schema with stored procedures; data is provided via SDA from the CO-PA prices calculation view in SAP ERP on HANA and via BO Data Services ETL from the external weather database)

The implementation of the ABAP program for the generation of actuals is based on parameters (e.g. market share) to control the creation of actuals. It selects the external data from tables filled by BO Data Services and creates, according to the defined logic (see chapter 3.3), the actuals of the past. As a result the actuals are stored in a new database table. This table is used by the interface for CO-PA data creation. As preparatory work the master data in SAP ERP on HANA for 45 customers, 80 products and 8 material groups were created, as well as the operating

concern “GL00” was customized and the BW Extrac- tor for CO-PA transaction data was generated with all The Forecast Smoothing Algorithm executes Single, required fields. To deliver product prices in real-time, Double and Triple Smoothing Algorithm several a calculation view was developed in SAP ERP on times to determine the optimal parameters for a cur- HANA by means of the SAP HANA Studio. rent run. The result of this execution is a forecast of sales quantity on a detailed level. In SAP BW on HANA, a data model with new BW Objects was set up (see figure 2): The second algorithm (Linear Regression) takes the weather factor into account. Sales quantity in combi- Starting in SAP ERP CO-PA, data was extracted via nation with weather category on level customer / the SAP standard ETL process into an Advanced article / period is passed to the algorithm. If the cal-

22 culated coefficient of determination reveals to be models. This can increase the accuracy of plan data greater than the user defined value (set at runtime by and therefore the efficiency of the company's perfor- planner), a forecast based on that regression is calcu- mance processes. lated. To proof the significance of result the f-value Planning with an in-memory technology like SAP delivered by the regression algorithm is checked. HANA leads to a substantial enhancement with re- To allow the sales planner to change generated fore- gard to the performance during planning, especially cast manually, the data is copied from the forecast in terms of disaggregation or aggregation of data. cube to the sales quantity cube by a SAP standard Consequently, the velocity as well as the dynamic copy function (completely HANA optimized). and flexibility during planning increases. Mapping After possible manual adaption of sales quantities a real-time data to simulate different scenarios in order last step of calculating / adapting sales revenue is to support management decisions helps the company processed by first copying data from the sales quanti- to be competitive in a dynamic market environment. ty cube to the final sales cube. After the data has been Due to the high performance, companies, if neces- copied, the human planner can change quantities as sary, are able to store mass data online instead of a well as revenues. Due to technical problems, instead time-delayed back-end-execution. of an inverse formula within the query definition, two In this respect a “non-disruptive” and improved sales planning functions (Type FOX Formula) were creat- planning process can be implemented by using SAP ed to re-calculate quantities after revenue adaption HANA in combination with the PAL. and to re-calculate revenues after quantity changes, so that quantities, prices and revenues always fit together. References Comp. Acknowledgements [1] C. Felden / C. Koschtial / J. Buder (2012), p.522. In: 6 Conclusions P. Mertens / S. Rässler (eds.,2012): Prognoserech- nung, 7th edition, Springer, Heidelberg a.o. The major goal of the research project was to explore potential benefits of SAP HANA in combination with The authors would like to thank to the active collabo- the PAL in comparison to classical sales planning in ration with SAP, and especially to the experts of practice from a business´ perspective. According to PIKON Deutschland AG (Andreas Adam, Joschka this, the following statements summarize the insights Argast, Sebastian Broschart, Benjamin Duppe, Oliver of the research team: Dworschak, Yuanyuan Gao, Jörg Hofmann, André Klos, Sarah Leichtweis, Fabian Mosbach, Carsten One of the findings during the research project has Promper, Christian Schlömer). been, that for figuring out which algorithm offered by the PAL is the most fitting one (regarding the specif- ic business case), the user has to have professional and methodical know-how about the appropriate statistical and mathematical analysis as a prerequi- site. With SAP HANA, companies can use the PAL as a provided library which enables the planner to use optimized statistical algorithms for forecasts. In general, planning processes are all very data and performance intensive. A way to increase the plan- ning quality in all levels of details, is to simulate different scenarios with different planning processes and plan numbers. 
One of the main advantages of sales planning using SAP HANA is the high perfor- mance regarding the data processing of the used algorithms within the forecast calculation. Since the efforts in terms of installing the PAL algorithms and meeting the technical prerequisites has been made in advance, the effort during the original planning pro- cess is reduced significantly. The usage of SAP HANA enables to integrate exter- nal data in the planning process by binding different databases employing SAP BO Data Services or in- cluding external data in real-time using virtual data


Project OliMP: In-Memory Planning with SAP HANA

Mariska Janz, Abdulmasih Hadaya and Ivaylo Ivanov Department of Very Large Business Applications (VLBA) Carl von Ossietzky University of Oldenburg {mariska.janz, abdulmasih.hadaya, ivaylo.ivanov}@uni-oldenburg.de

Abstract

While dealing with planning and optimization it is important not only to perform strategic decisions, but also to adequately react to environmental changes. On operational planning and optimization level, the feedback must be provided in real-time and with high certainty. To do this in the current situation, an in-memory solution should be used to accelerate the speed of the analysis and provide a near real-time response. In detail, beside collecting and performing ETL, the project is split into 3 major parts: (a) Posthumous Analysis; (b) Planning and Simulation; (c) Predictive Analytics.

1. Introduction

In-memory computing allows the processing of very large amounts of real-time data in the main memory of the server, so that results from analysis and transactions are immediately available.

How can the use of in-memory planning and forecasting tools provide a better simulation of the future effects of decisions taken in business questions today? How can responsible actors be supported to take transparent decisions rather than subjective or gut-feeling decisions? Can the technological advancement generate economic/business added value?

The goal of our project group "In-Memory Planning with SAP HANA"1 is to answer the questions above and to obtain the following working skills:

• Software development
• Use of software development tools
• Writing reports and project documentation
• Teamwork and personal soft skills
• Use of SAP HANA to obtain fast in-memory computing
• Use of predictive and analytical tools like SAP Predictive Analysis

In the following, we will describe our findings and results from the seminar phase, our project idea, the next steps we plan to undertake and the Future SOC Lab resources we are using.

2. Project Idea

The research project "In-Memory Planning with SAP HANA" is focusing on operational planning and optimization. It is our agenda to improve the process of business planning with predictive data analysis. This can be done by using In-Memory-Technology [4]. This new technology is able to create much faster predictions when dealing with many data sources, which is especially interesting for enterprises. In our special case of research, we deal with the estimation and calculation of energy consumption. Although this can be done by just using historical data, a more accurate result can be achieved by including further data. Concerning the scientific side of the project, it needs to be clarified how the predictions change with the involvement of different combinations of data. This is done by using the SAP HANA In-Memory Database with SAP AFM [10].

3. Used Future SOC Lab resources

The Future SOC Lab provided us with our own SAP HANA instance. We have used it for evaluating and testing first approaches to earn initial practical experience in how to handle the appliance. We implemented a specific software tool using Java that allowed us to import data from the following sources: (a) Deutscher Wetterdienst (DWD) [1]; (b) European Network Of Transmission System Operators For Electricity (ENTSO-E) [3]; (c) European Energy Exchange (EEX) [2]; and (d) Calendar Data.

The delivered resources are significantly involved in the development process and needed for further tasks.

1 For more information about our project group, see Appendix A.
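The import itself was realized with the Java tool mentioned above; purely as an illustration of how such flat files can end up in the appliance, a server-side CSV import into a staging table could look as follows. The file path, table and column names are assumptions, not the project's actual schema.

-- Hypothetical staging table and CSV import for DWD temperature data.
CREATE COLUMN TABLE STG_DWD_WEATHER (
  STATION_ID  INTEGER,
  MEASURED_AT TIMESTAMP,
  AIR_TEMP    DOUBLE
);

IMPORT FROM CSV FILE '/data/dwd/temperature.csv'
INTO STG_DWD_WEATHER
WITH RECORD DELIMITED BY '\n'
     FIELD DELIMITED BY ';';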

25 Future SOC Lab delivers the indispensable basis for on building a model using data from 2009 till 2013 our project group work. to deliver a forecast for January 2014 on an hourly basis. The first hypothesis considers calendar events next to the historical data of the energy consumption. 4. Hypotheses Calendar events are thought to play a role regarding energy consumption. Official holidays and occasions The main hypothesis is that aggregating more data and could relate the more or less energy consumption. The using it as an input should lead to better forecast re- second hypothesis adds the air temperature as a factor sults. Three sub-hypotheses were built to prove or dis- to the model. Weather conditions and air temperature prove the main hypothesis. The base hypothesis takes in specific are also factors that probably affect the en- no factors into account and is based on historical data ergy consumption. The third hypothesis adds the en- of energy consumption. The hypotheses from 1 to 3 ergy prices to the previous factors. Energy prices are consider additional factors that influence the energy thought to be a major factor that helps -when com- consumption such as weather and energy prices with bined with the other factors- to build a model that has hypothesis 1 having the least factors and hypothesis enough input to predict the future of the energy con- 3 having the most factors. All hypotheses are based sumption.
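A convenient way to think of the base hypothesis and the three sub-hypotheses is as one wide input table whose factor columns are switched on incrementally. The following view is only a sketch under assumed table and column names for the imported ENTSO-E, calendar, DWD and EEX data.

-- Hypothetical training input: hourly consumption plus the additional factors.
-- Hypothesis 1 adds the calendar flag, hypothesis 2 the air temperature,
-- hypothesis 3 the energy price.
CREATE VIEW V_TRAINING_INPUT AS
SELECT c.ts_hour,
       c.consumption_mw,          -- target: ENTSO-E energy consumption
       cal.is_holiday,            -- hypothesis 1: calendar events
       w.air_temp,                -- hypothesis 2: DWD air temperature
       p.spot_price_eur           -- hypothesis 3: EEX energy price
FROM entsoe_load   AS c
JOIN calendar_data AS cal ON cal.cal_date = TO_DATE(c.ts_hour)
JOIN dwd_weather   AS w   ON w.ts_hour   = c.ts_hour
JOIN eex_prices    AS p   ON p.ts_hour   = c.ts_hour;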

Figure 1. System architecture: Deployment view

5. Working method ponential regression, linear regression, multiple linear regression and other regression algorithms [8, S.156 In the previous section we saw that the main hypoth- ]. Also here, different settings were implemented to esis depends on the base hypothesis and three sub- adjust the built model in a way that delivers the best hypotheses to be proved or disproved. The next step forecast. The evaluation of the forecast was carried would be to evaluate these sub-hypotheses and use the out through different measures. These measures give results as input to evaluate the main one. For that, anal- different sights on the forecast data compared to the ysis capabilities and prediction tools from SAP PAL actual data to assess the errors in the predictions, the were used to test the sub-hypothesis. For the base hy- goodness of the built models and the suitability of the pothesis the time series algorithms which predict the models to predict further periods [6]. future of a factor depending on its past values and be- havior were used [8, S. 225]. Different settings and 6. Data loading into SAP HANA parameters for each algorithm were implemented to enable different weighing of the trends in the data. The following graphic illustrates the deployment view The algorithms used for the first, second and the third of the architecture. On the left side are the different hypothesis analyse the relations between the factors data sources in different file formats. An ETL [5] based on different mathematical models such as ex- process was used to bring data from various of data

sources to the centralized database; in our particular case it was SAP HANA. The further analysis was conducted mainly by means of the prediction library shipped within SAP HANA, known as the Predictive Analysis Library (PAL) [8]. The evaluation of the goodness of the predictions was done on the client side with spreadsheet processing tools such as Microsoft Excel.

7. Results

The assumption that more data can improve the prediction was true for tests with algorithms such as Exponential Regression [7], Multiple Linear Regression [7] and Support Vector Machine [9]. But the enhancement that the addition of new factors contributed to the predictions was insignificant. The best prediction result was achieved with the base hypothesis based on historical data of the energy consumption, as tested with the linear regression algorithm with damped trend and seasonality. Based on hypothesis 3, no significant predictions compared to the other hypotheses were achieved. The following graphic shows the best run results with different algorithms, using curves in different colors to show the actual and the predicted data:

Figure 2. The three hypotheses compared to actual energy consumption
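The comparison shown in Figure 2 can also be quantified directly in SQL rather than in a spreadsheet. The sketch below assumes a table forecast_vs_actual with one row per predicted hour and hypothesis; the concrete error measures used in the project are not named in the report, so two common ones are shown.

-- Hypothetical evaluation: mean absolute error and mean absolute
-- percentage error per hypothesis.
SELECT hypothesis,
       AVG(ABS(actual_mw - predicted_mw))                               AS mae,
       AVG(ABS(actual_mw - predicted_mw) / NULLIF(actual_mw, 0)) * 100  AS mape_pct
FROM forecast_vs_actual
GROUP BY hypothesis;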

8. Future works

There are plenty of possibilities for enhancements on top of the obtained results. The achieved results can be significantly extended by trying out other machine learning or data mining algorithms not used in the current study, such as Artificial Neural Networks [11]. We would also like to point out that other data sources could enrich the input, leading to better forecasts. New factors which might affect the energy consumption rate could be figured out by generating more features and analytical dimensions from newly added, previously unseen data sets.

References

[1] Deutscher Wetterdienst (DWD), url: http://www.dwd.de/.
[2] European Energy Exchange (EEX), url: https://www.eex.com/en/.
[3] European Network of Transmission System Operators for Electricity (ENTSO-E), url: https://www.entsoe.eu/pages/default.aspx.
[4] SAP in-memory computing technology changing the way business intelligence is managed.
[5] F. N. Alexander Albrecht. ETL (extract-transform-load).
[6] K. Grace-Martin. Assessing the fit of regression models, May 2005.
[7] R. T. O'Connell and A. B. Koehler. Forecasting, time series, and regression: An applied approach, volume 4. South-Western Pub, 2005.
[8] SAP. SAP HANA Predictive Analysis Library (PAL).
[9] B. Schölkopf and A. J. Smola. Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press, 2001.
[10] SAP SE and an SAP Affiliate Company. SAP HANA Developer Guide, SAP HANA Platform SPS 08, Document Version: 1.1, 2014-08-21.
[11] B. Yegnanarayana. Artificial neural networks. PHI Learning Pvt. Ltd., 2009.

27 Appendix A. The Team and Work Organi- – Opportunities and challenges in the clas- zation sic enterprise planning (Johannes Steffen Scheer) We are 11 students working in a project group at the – Integrated enterprise planning systems Business Informatics faculty with five supervisors (Farhad El-Yazdin) from the department of Very Large Business Appli- cations (VLBA) at the Carl von Ossietzky University • Basics of SAP HANA and inMemory Com- of Oldenburg (Germany). The project group is a com- puting pulsory activity for receiving a master’s degree at our university. – Pros and Cons of InMemory-Computing (Rima Adhikari K.C.) 2 Our project group called “OliMP - In-Memory Plan- – Analytical capabilities of SAP HANA: in- ning with SAP HANA” started at the beginning of tegration with R & Excel? (Eduard Rajski) April 2014. The project group has a duration of one year and stops in the end of March 2015. Firstly, we • Planning processes and tools did Massive Open Online Courses provided by open- SAP and the HPI. The main objective of doing these – Statistical methods for updating historical online courses was to improve our knowledge skills data (Daniel Stratmann) in the SAP HANA technology. Every project member – Estimation of the use of predictive meth- did the following online-courses: ods and tools for SAP (SAP Predictive • An Introduction to SAP HANA by Dr. Vishal Analysis) (Jonas Schlemminger) Sikka – Planning and forecasting tools with their • Introduction to Software Development on SAP strengths and weaknesses using the exam- HANA by Thomas Jung ple of the systems SEM-BPS, BW-IP and BPC by SAP AG (Abdulmasih Hadaya • In-Memory Data Management 2013 by Prof. and Mariska Janz) Hasso Plattner The first phase of our project consisted of specific • Development tools and organisation methods course works: Every project member wrote a seminar of projects paper about 10-15 pages about a specific topic. These topics are connected to different project perspectives – Design Thinking (Ivaylo Ivanov) in order to get an all-overview of the working pack- – Classical vs. Agile Software Develop- age. Moreover every project member gave a presen- ment: Using Scrum in the project group tation about his topic in order to share his knowledge (Benjamin Hemken) with the other project members. So, all members of the project group have an equal level of knowledge. The An important topic in connection with team work and topics of the seminar papers and the project members organization was to find an appropriate process model working on them are as follows: for the project. This project contains risks related to the complexity of the potential solution. Another im- • Business-related processes portant fact is that the requirements can be changed – Processes of planning in an enterprise during the development process. To be able to re- from technical and organizational point of act to such changes we use the agile process model view and their goals (Igor Perelman) SCRUM.

2 OliMP is an acronym for Oldenburger inMemory Planung (http://www.ol-imp.de/)

OntQA-Replica: Intelligent Data Replication for Ontology-Based Query Answering

Lena Wiese
Research Group Knowledge Engineering
Institut für Informatik, Georg-August-Universität Göttingen
Goldschmidtstraße 7, 37077 Göttingen
[email protected]

Abstract he has to revise his query and send the revised query to the database system in order to get some informa- The OntQA-Replica project aims to improve the per- tion from the database. In contrast, flexible query an- formance of ontology-based query answering in dis- swering systems internally revise failing user queries tributed databases by employing a preprocessing pro- themselves and by evaluating the revised query re- cedure (including a clustering step and a fragmenta- turn answers to the user that are more informative for tion step): for efficient query answering, data records the user than just an empty answer. One way to ob- that are semantically related are grouped in the same tain such informative answers is to use an ontology to data fragment based on a notion of similarity in the return answers that are related to the original search ontology. At the same time, the OntQA-Replica project term according to some notion of similarity. For ex- supports an intelligent data replication approach that ample, in an electronic health record, when searching minimizes the amount of resources used and enables for the term cough, the terms bronchitis and asthma an efficient recovery in case of server failures. The might be similar to cough and might be returned as re- OntQA-Replica project aims to show scalability and lated answers. Unfortunately, finding related answers failure tolerance of this intelligent query answering at runtime by consulting the ontology for each query approach by developing novel replication schemes for is highly inefficient. several fragmentations with overlapping fragments. 2. Ontology-Based Fragmentation

1. Introduction In previous work [3], a clustering procedure was applied to partition the original tables into frag- In the era of “big data” huge data sets usually cannot ments based on a relaxation attribute chosen for anti- be stored on a single server any longer. Cloud storage instantiation. Finding these fragments is achieved by (where data are stored in a cloud infrastructure) of- grouping (that is, clustering) the values of the respec- fers the advantage of flexibly adapting the amount of tive table column (corresponding to the relaxation at- used storage based on the growing or shrinking stor- tribute) and then splitting the table into fragments ac- age demands of the data owners. In a cloud stor- cording to the clusters found. age system, a distributed database management sys- We assume that each of the clusterings (and hence the tem (DDBMS) can be used to manage the data in a corresponding fragmentation) is complete: every value network of servers. The decisive features are replica- in the column is assigned to one cluster and hence ev- tion (for recovery and scalability purposes) and load ery tuple is assigned to one fragment. We also assume balancing (data distribution according to the capaci- that each clustering and each fragmentation are also ties of servers). On the other hand, flexible query an- non-redundant: every value is assigned to exactly one swering [1] offers mechanisms to intelligently answer cluster and every tuple belongs to exactly one fragment user queries going beyond conventional exact query (for one of the relaxation attributes); in other words, answering. If a database system is not able to find an the fragments inside one fragmentation do not overlap. exactly matching answer, the query is said to be a fail- More formally, we apply the clustering approach de- ing query. Conventional database systems usually re- scribed in [3] (or any other method to semantically turn an empty answer to a failing query. In most cases, split the attribute domain into subsets) on the relax- this is an undesirable situation for the user, because ation attribute, so that each cluster inside one cluster-

29 Fragment 1: cough, of 1,802,912 rows and the largest of 14,423,296 rows bronchitis, Table column: asthma cough, (obtained by duplicating the original data set 5 times query rewrite & cluster & brokenLeg, and 8 times, respectively). A clustering is executed on user redirect fragment bronchitis, brokenArm, Fragment 2: the MeSH data based on the concept identifier (which asthma brokenLeg, brokenArm orders the MeSH terms in a tree); in other words, en- tries from the same subconcept belong to the same Figure 1. Ontology-based fragmentation cluster. One fragmentation (the clustered fragmenta- tion) was obtained from this clustering and consists of 117 fragments which are each stored in a smaller ta- ing is represented by a head term (also called proto- ble called ill-i where i is the cluster ID. To allow for type) and each term in a cluster has a similarity sim to a comparison and a test of the recovery strategy, an- the cluster head above a certain threshold α. We then other fragmentation of the table was done using round obtain a clustering-based fragmentation for the origi- robin resulting in a table called ill-rr; this distributes nal table F into fragments. the data among the database servers in chunks of equal size without considering their semantic relationship; 3. Ontology-driven Query Answering these fragments have an extra column called clusterid. 3.2. Identifying matching Clusters To enable ontology-driven query answering, when a user sends a query to the database, the term (that is, constant) that can be anti-instantiated has to be ex- Flexible Query Answering intends to return those tracted, the matching cluster has to be identified and terms belonging to the same cluster as the query term then the user query has to be rewritten to return an- as informative answers. Before being able to return the swers covering the entire cluster. related terms, we hence have to identify the matching cluster: that is, the ID of the cluster the head of which 3.1. Metadata and Test Dataset has the highest similarity to the query term. We do this by consulting the similarities table and the root table. The relaxation term t is extracted from the query and In order to manage the fragmentation, several meta- then the top-1 entry of the similarities table is obtained data tables are maintained: when ordering the similarities in descending order: • A root table stores an ID for each cluster (col- SELECT TOP 1 root.clusterid umn clusterid) as well as the cluster head (col- FROM root, similarities umn head) and the name of the server that hosts WHERE similarities.term=’t’ the cluster (column serverid). AND similarities.head = root.head ORDER BY similarities.sim DESC • A lookup table stores for each cluster ID (column The query was tested on similarities tables of sizes clusterid) the tuple IDs (column tupleid) of those 56341 entries, 14423296 entries and 72116480 en- tuples that constitute the clustered fragment. tries. The runtime measurements show a decent per- formance of at most 125 ms impact even the largest • A similarities table stores for each head term table size. (column head) and each other term (column term) that occurs in the active domain of the corre- 3.3. Query Rewriting Strategies sponding relaxation attribute a similarity value between 0 and 1 (column sim). 
After having obtained the ID of the matching cluster, Our prototype implementation – the OntQA-Replica the original query has to be rewritten in order to con- system – runs on a distributed SAP HANA installation sider all the related terms as valid answers. We tested with 3 database server nodes provided by the Future and compared three query rewriting procedures: SOC Lab of Hasso Plattner Institute. All runtime mea- • lookup table: the first rewriting approach uses the surements are taken as the median of several (at least lookup table to retrieve the tuple IDs of the corre- 5) runs per experiment. sponding rows and executes a JOIN on table ill. The example data set consists of a table (called ill) that resembles a medical health record and is based • extra clusterid column: the next approach relies on the set of Medical Subject Headings (MeSH [2]). on the round robin table and retrieves all relevant The table contains as columns an artificial, sequential tuples based on a selection predicate on the clus- tupleid, a random patientid, and a disease chosen from terid column. the MeSH data set as well as the concept identifier of the MeSH entry. We varied the table sizes during our • clustered fragmentation: the last rewriting ap- test runs. The smallest table consists of 56,341 rows proach replaces the occurrences of the ill table by (one row for each MeSH term), a medium-sized table the corresponding ill-i table for clusterid i.
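The tables referenced by these strategies, together with the metadata tables root, lookup and similarities described above, have a simple relational layout. The following declarations are only a sketch: the table and column names are taken from the report, while data types and sizes are assumptions.

-- Sketch of the metadata tables used for ontology-based fragmentation;
-- data types and sizes are assumed for illustration.
CREATE COLUMN TABLE root (
  clusterid INTEGER PRIMARY KEY,   -- ID of the cluster
  head      NVARCHAR(256),         -- cluster head (prototype term)
  serverid  NVARCHAR(128)          -- server hosting the clustered fragment
);

CREATE COLUMN TABLE lookup (
  clusterid INTEGER,               -- cluster the tuple belongs to
  tupleid   BIGINT                 -- tuple ID inside the original ill table
);

CREATE COLUMN TABLE similarities (
  head NVARCHAR(256),              -- cluster head term
  term NVARCHAR(256),              -- term from the active domain
  sim  DOUBLE                      -- similarity value between 0 and 1
);

-- The round-robin fragmented table carries the extra clusterid column,
-- as used by the second rewriting strategy.
CREATE COLUMN TABLE "ill-rr" (
  mesh      NVARCHAR(256),
  concept   NVARCHAR(64),
  patientid INTEGER,
  tupleid   BIGINT,
  clusterid INTEGER
);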

30 Assume the user sends a query FROM ill AS a,info AS b,lookup SELECT mesh,concept,patientid,tupleid WHERE lookup.tupleid=a.tupleid FROM ill WHERE mesh =’cough’. AND lookup.clusterid=101 and 101 is the ID of the cluster containing cough. In AND b.patientid= a.patientid. the first strategy (lookup table) the rewritten query is In the second strategy (extra clusterid column) the SELECT mesh,concept,patientid,tupleid rewritten query is FROM ill JOIN lookup SELECT a.mesh,a.concept,a.patientid, ON (lookup.tupleid = ill.tupleid AND lookup.clusterid=101). a.tupleid,b.address In the second strategy (extra clusterid column) the FROM ill AS a,info AS b rewritten query is WHERE a.clusterid=101 SELECT mesh,concept,patientid,tupleid AND b.patientid=a.patientid. FROM ill-rr WHERE clusterid=101 In the third strategy (clustered fragmentation), the In the third strategy (clustered fragmentation), the rewritten query is rewritten query is SELECT a.mesh,a.concept,a.patientid, SELECT mesh,concept,patientid,tupleid a.tupleid,b.address FROM ill-101 FROM ill-101 AS a In the small ill table with 56341 entries, 90 related JOIN info-101 AS b answers are obtained, in the medium-sized ill table with 1802912 entries, 2880 related answers are ob- ON (a.patientid=b.patientid). tained and in the large ill table with 14423296 entries, A small secondary table does not make much of a dif- 23040 related answers are obtained. The runtime mea- ference when executing the join operation (one match- surements in particular show that the lookup table ap- ing tuple in the secondary table for each tuple in the proach does not scale with increasing data set size. primary table). However, for the larger secondary ta- ble (256 matching tuples in the secondary table for 4. Query Answering with Derived Frag- each tuple in the primary table), the impact of the lookup table access is huge in the case of the largest ments ill table with 14423296 entries.

While the evaluation of a selection query on a sin- gle table shows a similar performance for all rewrit- 5. Insertions ing strategies, the evaluations of queries on two tables using a distributed JOIN show a performance impact We tested the update behavior for all three rewriting for the first two strategies when the secondary table is strategies by inserting 117 new rows (one for each large. We tested a JOIN on the patient ID with a sec- cluster). Any insertion requires identifying the match- ondary table called info having a column address. The ing cluster i again (see Section 3.2). Each insertion original query is query looks like this for mesh term m, concept c, pati- SELECT a.mesh,a.concept,a.patientid, entid 1 and tupleid 1: a.tupleid,b.address FROM ill AS a,info AS b INSERT INTO ill WHERE mesh=’cough’ VALUES (’m’,’c’,1,1). AND b.patientid= a.patientid In the first rewriting strategy, the lookup table has to We devised two test runs: test run one uses a small be updated, too, so that two insertion queries are exe- secondary table (each patient ID occurs only once) and cuted: test run two uses a large secondary table (each patient INSERT INTO ill ID occurs 256 times). For the first rewriting strategy VALUES (’m’,’c’,1,1). (lookup table) the secondary table is a non-fragmented INSERT INTO lookup table. For the second strategy, the secondary table is VALUES (i,1). distributed in round robin fashion, too. For the last For the second rewriting strategy, the extra clusterid rewriting strategy, the secondary table is fragmented column contains the identified cluster i: into a derived fragmentation: whenever a patient ID INSERT INTO ill-rr occurs in some fragment in the ill-i table, then the cor- VALUES (’m’,’c’,1,1,i) responding tuples in the secondary table are stored in . a fragment info-i on the same server as the primary For the third rewriting strategy, the matching clustered fragment. fragment is updated: In the first strategy (lookup table) the rewritten query INSERT INTO ill-i is VALUES (’m’,’c’,1,1). SELECT a.mesh,a.concept,a.patientid, Here the lookup table approach has a huge runtime im- a.tupleid,b.address,lookup.clusterid pact due to the maintenance of the lookup table entries.

31 6. Deletions one relaxation attribute. In this case, several fragmen- tations will be obtained. More formally, if α relax- After the insertions we made a similar test by deleting ation attributes are chosen and clustered, then we ob- the newly added tuples by issuing the query tain α fragmentations Fl (l = 1 . . . α) of the same ta- DELETE FROM ill WHERE mesh=’m’. ble; each fragmentation contains fragments fl,s where In the first rewriting strategy, the corresponding row in index s depends on the number of clusters found: if the lookup table has to be deleted, too, so that now first nl clusters are found, then Fl = {fl,1, . . . , fl,nl }. the corresponding tuple id of the to-be-deleted row has Due to completeness, every tuple is contained in ex- to be obtained and then two deletion queries are exe- actly one of the fragments of each of the α fragmen- cuted: tations: for any tuple j, if α relaxation attributes are DELETE FROM lookup chosen and clustered, then in any fragmentation Fl WHERE lookup.tupleid (l = 1 . . . α) there is a fragment fl,s such that tuple IN (SELECT ill.tupleid FROM ill j ∈ fl,s. Another fragmentation (the “range-based” WHERE mesh=’m’). fragmentation) is based on ranges of the patient ID and DELETE FROM ill WHERE mesh=’m’ consists of 6 fragments for the small table, 19 for the For the second rewriting strategy, no modification is medium-sized table and 145 for the large table; these necessary apart from replacing the table name and no fragments have an extra column called rangeid. clusterid is needed: For the replication procedure, first the overlapping DELETE FROM ill-rr WHERE mesh=’m’ fragments (the “conflicts”) are identified by using For the third rewriting strategy, the matching clustered SELECT DISTINCT clusterid, rangeid fragment i is accessed which has to be identified first FROM ci JOIN ci (as in Section 3.2): ON (rj.tupleid=rj.tupleid) DELETE FROM ill-i WHERE mesh=’m’ for each clustered fragment ci and each range-based Interestingly, even the round robin approach with extra fragment rj. Afterwards based on the conflicts a Bin clusterid does not perform well on the largest data set. Packing Problem with Conflicts (BPPC) problem is generated to enforce that overlapping fragments are placed on a different server. Based on the obtained 7. Recovery solution, the fragments are moved to different servers by using Lastly, we tested how long it takes to recover the clus- ALTER TABLE ci tered fragmentation by either using the lookup table or MOVE TO ’severname’ PHYSICAL. the extra column ID. The recovery procedure was ex- ecuted first on the original table and the lookup table by running for each cluster i: 9. Conclusion and Future Work INSERT INTO ci SELECT * FROM ill JOIN lookup Due to the small size of the partitioned tables, the run- ON (lookup.tupleid=ill.tupleid) time performance is best for the clustered partitioning WHERE lookup.clusterid=i approach and the overhead of metadata management for each cluster i. Then, the recovery procedure was is negligible. It outperforms the lookup table approach executed on the round robin fragmented table with the that stores for each cluster the corresponding tuple IDs extra clusterid column for each cluster i: does not scale well as the data set size grows. In ad- dition, the ontology-driven partitioning enables fine- INSERT INTO ci SELECT * FROM ill-rr grained load balancing and data locality: less servers WHERE clusterid=i have to be accessed when answering queries or updat- ing tables. 
The idea of data locality can even be carried While for the smallest and the largest table the two ap- further by considering cluster affinity: if two clusters proaches perform nearly identically, for the medium- are accessed together frequently, their corresponding sized table the extra cluster id approach offers some partitions can be placed on the same server. So far we benefit. did not address the dynamic adaptation of the cluster- ing: whenever values are inserted or deleted, the clus- tering procedure on the entire data set might lead to 8. Overlaps and multiple relaxation at- different clusters. A particular problem that must be tributes handled is the deletion of the head of a cluster: a new cluster head must be chosen before the current head So far, for the clustering approach only a single relax- can be deleted; in the simplest case, the term that is ation attribute has been considered. In order to support most similar to the previous head is chosen as the new flexible query answering on multiple columns, one ta- head. ble can be fragmented multiple times (by clustering Similarly, deletions and insertions lead to shrinking or different columns); that is, we can choose more than growing partitions. Hence in some situation it might

32 be useful to merge two smaller partitions that are se- mantically close to each other; or to repartition a larger partition into subpartitions based on a clustering of values of the relaxation attribute in the partition. Another major research question that remains is how to parallelize the clustering step on multiple servers.

References

[1] K. Inoue and L. Wiese. Generalizing conjunctive queries for informative answers. In Flexible Query Answering Systems, pages 1–12. Springer, 2011.
[2] U.S. National Library of Medicine. Medical subject headings. http://www.nlm.nih.gov/mesh/.
[3] L. Wiese. Clustering-based fragmentation and data replication for flexible query answering in distributed databases. Journal of Cloud Computing, 3(1):1–15, 2014.


Natural Language Processing for In-Memory Databases: an Application to Biomedical Question Answering

Mariana Neves Hasso-Plattner-Institute at the University of Potsdam, Germany [email protected]

Abstract Processing and semantically annotating large textual collection is a time-consuming and tiresome task We currently face a deluge of textual documents which which requires integration of various tools [11]. demands real-time processing for various natural lan- In-memory database (IMDB) technology comes as an guage processing (NLP) applications. In-memory alternative given its ability to quickly process large database (IMDB) technology has the potential of al- document collections in real time [13]. Indeed, it has lowing traditional methods to be scaled for large doc- already been efficiently used for a variety of NLP ument collections and speeding up the processing. In applications [9, 12, 6] but it still lacks more advanced this report, I give an overview of my current efforts in NLP functionalities which could allow its use for utilizing IMDB for natural language processing tasks, a broader range of tasks as well as providing more and in particular for biomedical question answering. precise analysis.

I propose the implementation of new methods for some of the NLP tasks introduced above, namely 1. Introduction chunking, semantic role labeling and relation extrac- tion. Further, I suggest utilizing these methods for applications in biomedicine, a field that currently de- The current data deluge demands fast and real-time mands both real-time processing of large textual col- processing of large datasets to support various applica- lections as well as deliver of precise results. This re- tions, also for textual data, such as scientific publica- port will describe the architecture and the preliminary tions, Web pages or messages in the social media. Nat- results on the use of IMDB in the development of a ural language processing (NLP) [7] is the field of auto- question answering system for biomedicine. matically processing textual documents and includes a variety of tasks: 2. Question answering • tokenization: delimitation of words; Question answering (QA) [7] is the task of posing • part-of-speech tagging: assignment of syntactic questions, instead of only keywords, to a system and categories to words; getting exact answers in return, instead of relevant documents. For instance, for the question “What dis- • chunking: delimitation of phrases; ease is mirtazapine predominantly used for?” derived from the BioASQ dataset1, the user expects to receive • and syntactic parsing: construction of syntactic one or more disease names in return, e.g., “major tree for a sentence. depression”, instead of documents which match the question keywords. Besides providing exact answers, Further, NLP also involves semantic-related tasks: QA systems can also return tailored summaries for the questions, thus, giving more details and context • named-entity recognition: delimitation of prede- for the answer. For instance, the following passage is fined entity types, e.g., person and organization an adequate short summary for the above question: names; “Mirtazapine is predominantly used in the treatment of major depression.”. • relation extraction: identification of pre-defined relations from text; Question answering systems for biomedicine [1] are usually restricted to documents specific for the • and semantic role labeling: determining pre- defined semantic arguments. 1http://bioasq.org/

35 in terms of precision, recall, f-measure and MAP and is performed by on Oracle system available for registered users. My preliminary approach partici- pated in the 2014 edition of the BioASQ challenge, and it is currently taking part in the 2015 edition of the challenge. I could recently obtain top results for snippet retrieval and the highest recall over all participating systems [9, 10].

Figure 1. Time response from three Provided the inadequacy of querying the PubMed question answering systems for database on real-time applications, our current ap- biomedicine, e.g., AskHermes (blue), proach relies on a local copy of PubMed which is EAGLi (red) and HONQA (yellow), for 40 currently being integrated into a SAP HANA instance randomly selected questions from the of 1 Tb of memory of the HPI Future SOC lab. BioASQ dataset. Figure 2.1 shows the architecture of our QA system which includes integration of biomedical documents as well as various biomedical databases and ontolo- gies. domain, such as scientific publications from PubMed2 and reliable Web sites. As far as I know, there are For the moment, I have indexed more than 7,2 millions currently three available systems for biomedical QA, of the more than 24 millions citations included in namely: AskHermes [2], EAGLi [5] and HONQA [4]. PubMed. I currently consider only titles and abstracts I have recently carried out an evaluation on both the of the documents and these entries are indexed in SAP quality of the answers and the response times of these HANA using the full text indexing feature, which systems and results shown that only five questions automatically splits the text into sentences and the out of 40 could be appropriately answered and time later into tokens. The indexed documents currently responses varied from a couple of seconds to more occupy more than 300 Gb of memory in HANA and than two minutes (cf. Figure 2.1). comprises of more than 1,7 billion tokens and 68 millions sentences. In the following section, I will describe the architec- ture of of question answering system which is being I have performed preliminary experiments with this built on the top of a SAP HANA instance of 1 Tb of incomplete copy of PubMed, which covers less than memory provided by the FutureSOC3. at the Hasso- 80% of the documents referred in the snippets curated Plattner Institute. in the BioASQ datasets. In comparison to the results of my previous approach, I have obtained lower values 2.1. Preliminary methods and results both for precision and recall when querying for pas- sages based only on keywords, This outcome suggests Question answering systems are usually composed of that a different approach is necessary when dealing three components [1]: question processing, passage with this huge collection. Improvements of the results processing and answer processing. My previous work requires implementation of more advanced NLP fea- [9] has relied on external resources for document tures for precisely retrieving passages which not only 4 retrieval, BioPortal for query expansion and the cite the words mentioned in the questions but also cites IMDB technology for passage retrieval. Due to these words in similar contexts as those from the ques- the impossibility of indexing the complete PubMed tions. Additionally, recall of the system can be im- collection, I had to rely on a limited number of proved through integration of domain knowledge into documents which I previously retrieved from PubMed the methods, as the biomedical field is known for the using keywords-based queries. high complexity and variability of the nomenclature of the entities. I carried out an evaluation based on the more than 800 questions derived from the BioASQ challenge5, an 2.2. Future work EU-funded project which aims to foster improvements on biomedical QA. 
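The indexing of the local PubMed copy relies on SAP HANA's full-text indexing feature. The following sketch shows how such an index might be declared and queried; the table name, columns, configuration and fuzziness threshold are assumptions and do not reflect the actual schema of the system.

-- Hypothetical document table for PubMed titles and abstracts.
CREATE COLUMN TABLE PUBMED_CITATIONS (
  PMID     INTEGER PRIMARY KEY,
  TITLE    NVARCHAR(500),
  ABSTRACT NCLOB
);

-- Full-text index with text analysis, so that sentences and tokens are
-- extracted automatically into the generated $TA_ token table.
CREATE FULLTEXT INDEX FT_ABSTRACT ON PUBMED_CITATIONS (ABSTRACT)
  TEXT ANALYSIS ON
  CONFIGURATION 'LINGANALYSIS_FULL';

-- Keyword-based retrieval of candidate passages for a question term.
SELECT PMID, TITLE
FROM PUBMED_CITATIONS
WHERE CONTAINS(ABSTRACT, 'mirtazapine depression', FUZZY(0.8));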
2.2. Future work

Future work will include the development of NLP features still not included in SAP HANA, as well as the adaptation of existing NLP methods to the biomedical domain. The implementation will rely on a supervised learning approach, e.g., support vector machines, and on available corpora from various domains, as summarized below:

• part-of-speech tagging and chunking: training and evaluation based on the GENIA corpus [8];

• semantic role labeling: training and evaluation based on the BioProp corpus [3].

Further, future work will also include the development of the answer processing component, which is still missing in our current system, improvements to the existing components, the use of the advanced NLP techniques described above, and their integration into the HANA database, as summarized below:

• question processing based on IMDB technology and implementation of methods for identifying the question type, e.g., yes/no, factoid or summary, and the expected answer, e.g., a gene or a disease;

• passage retrieval using the complete PubMed collection and previously identified named entities;

• answer processing based on IMDB technology, including exact answers and summaries.

Figure 2. Architecture of a question answering system which integrates various biomedical resources on an in-memory database.

References

[1] S. J. Athenikos and H. Han. Biomedical question answering: A survey. Computer Methods and Programs in Biomedicine, 99(1):1–24, 2010.

[2] Y. Cao, F. Liu, P. Simpson, L. D. Antieau, A. S. Bennett, J. J. Cimino, J. W. Ely, and H. Yu. AskHermes: An online question answering system for complex clinical questions. Journal of Biomedical Informatics, 44(2):277–288, 2011.

[3] W.-C. Chou, R. T.-H. Tsai, Y.-S. Su, W. Ku, T.-Y. Sung, and W.-L. Hsu. A semi-automatic method for annotating a biomedical proposition bank. In Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora (LAC '06), 2006.

[4] S. Cruchet, A. Gaudinat, T. Rindflesch, and C. Boyer. What about trust in the question answering world? In Proceedings of the AMIA Annual Symposium, pages 1–5, San Francisco, USA, 2009.

[5] J. Gobeill, E. Patsche, D. Theodoro, A.-L. Veuthey, C. Lovis, and P. Ruch. Question answering for biology and medicine. In 9th International Conference on Information Technology and Applications in Biomedicine (ITAB 2009), pages 1–5, 2009.

[6] K. Herbst, C. Fähnrich, M. Neves, and M.-P. Schapranow. Applying in-memory technology for automatic template filling in the clinical domain. In CLEF Working Notes, 2014.

[7] D. Jurafsky and J. H. Martin. Speech and Language Processing. Prentice Hall International, 2nd revised edition, 2013.

[8] J.-D. Kim, T. Ohta, Y. Tateisi, and J. Tsujii. GENIA corpus - a semantically annotated corpus for bio-textmining. Bioinformatics, 19(suppl 1):i180–i182, 2003.

[9] M. Neves. HPI in-memory-based database system in task 2b of BioASQ. In Working Notes for CLEF 2014 Conference, Sheffield, UK, September 15-18, 2014, pages 1337–1347, 2014.

[10] M. Neves. In-memory database for passage retrieval in biomedical question answering. Journal of Biomedical Semantics, (submitted), 2015.

[11] M. Neves, A. Damaschun, N. Mah, F. Lekschas, S. Seltmann, H. Stachelscheid, J.-F. Fontaine, A. Kurtz, and U. Leser. Preliminary evaluation of the CellFinder literature curation pipeline for gene expression in kidney cells and anatomical parts. Database, 2013, 2013.

[12] M. Neves, K. Herbst, M. Uflacker, and H. Plattner. Preliminary evaluation of passage retrieval in biomedical multilingual question answering. In Proceedings of the Fourth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM 2014) at Language Resources and Evaluation (LREC) 2014, 2014.

[13] H. Plattner. A Course in In-Memory Data Management: The Inner Mechanics of In-Memory Databases. Springer, 1st edition, 2013.


Provision of Analyze Genomes Services in a Federated In-Memory Database System for Life Sciences

Matthieu-P. Schapranow, Cindy Fähnrich
Hasso Plattner Institute
Enterprise Platform and Integration Concepts
August-Bebel-Str. 88, 14482 Potsdam, Germany
{schapranow|cindy.faehnrich}@hpi.de

Abstract

Next-generation sequencing generates large amounts of detailed diagnostic data within hours. We consider it as sensitive data, which must not be exchanged with public cloud infrastructures. Our Analyze Genomes cloud platform enables researchers and clinicians to design individual analysis pipelines and to choose from a wide range of analysis tools without the need to have software and bioinformatics experts on site. In the given contribution, we developed a federated in-memory system connecting central Future SOC Lab resources and an external research facility in a hybrid cloud approach, eliminating the need to transfer sensitive data. Thus, we were able to provide analysis services as a service provider whilst sensitive data resides on local sites for processing.

Figure 1: Computing resources and data reside on local sites whilst the service provider manages algorithms and apps remotely in our system.

1. Project Idea

In the scope of the Analyze Genomes project, we have built an analysis platform providing tools for setting up and executing analysis pipelines, assessing their results, and combining these with scientific data from distributed data sources [10]. We provide a modeling environment to create customized pipelines and choose from a range of ready-to-use third-party tools, which are executed in our own distributed processing framework incorporating cloud computing and In-Memory Database (IMDB) technology to accelerate the processing of genome data [6].

Today, data processing in cloud environments requires the transfer of data from local sites to our shared computing resources prior to its execution. However, processing of Next-Generation Sequencing (NGS) data easily involves 750 GB and more per patient sample [9]. Thus, transferring raw NGS data to cloud resources requires a significant amount of time prior to its execution. Furthermore, we consider NGS data as sensitive information, which must remain on the premises of the research facility for reasons of data protection.

In the given application scenario, we developed a Federated In-Memory Database (FIMDB) system combining local computing resources of research facilities and Analyze Genomes services as managed services in a unique hybrid cloud approach, as depicted in Figure 1. NGS data are created at decentralized research sites or sequencing centers and must not be transferred to a central site due to legal restrictions for the use of the acquired patient-specific data. However, for the assessment of treatment alternatives, the comparison of the concrete patient case with a spectrum of similar patient cases, e.g. similar patient history or identical diagnosis stored at individual partner sites, is required. Individual research sites and the service provider site can consist of multiple computing nodes, together forming the FIMDB system. Based on the concrete requirements of this use case, we attempt to build a unique cloud setup integrating decentralized computing resources to form a federated in-memory database system.

The computing resources and data reside on local sites, whilst the service provider manages algorithms and apps remotely in our system. In cooperation with a research facility in Berlin, we have started to set up our FIMDB by connecting their cloud infrastructure with the Future SOC's resources. By that, our partner shall be able to use Analyze Genomes services while their data resides on their local computing resources. In the following, we document our current progress in setting up such a system and outline further steps that need to be undertaken.

2. Realization

In the following, we describe all steps that have been accomplished so far. To this belongs the setup of a Virtual Private Network (VPN) connection between the Future SOC and the research facility's network, the configuration of the remote directories, and the installation of the databases at the facility's local computing nodes.

2.1 VPN Connection

The network team of the research facility's local IT department needs to install and configure a VPN client. For our application scenario, we used and configured OpenVPN version 2.3.5 to establish a secured bidirectional site-to-site VPN tunnel. The VPN tunnel connects the local area networks at site A and site B via the public Internet as depicted in Figure 2. In the typical VPN setup, multiple VPN clients connect to a corporate network via a single VPN server, i.e. the corporate network is extended and clients consume corporate services in the same way as if they were physically connected to the corporate network. In contrast, the site-to-site setup used by us connects multiple Local Area Networks (LANs) with each other, i.e. multiple LANs are connected, forming a dedicated virtual network across all network topologies. By that, we are able to create any kind of point-to-point connections. In the given application scenario, the local research facility configured a single system as gateway system for the VPN connection, while the updated network routes were pushed to the individual computing nodes. Thus, the configuration efforts were minimized while a single point of maintenance was established.

Figure 2: Site-to-site VPN connection interconnecting the two local area networks of site A and site B.

2.2 Configuring the Remote Services Directory

The managed service provider grants the local research facility access to the required algorithms provided by the Analyze Genomes FIMDB system. In the application scenario, our services are either file- or database-based. We expose the file-based services, such as the alignment algorithms Burrows-Wheeler Aligner (BWA) or Bowtie, as runtime binaries in a remote service directory using a Network File System (NFS) [4, 3]. Thus, our service consumer needs to create a local mount point and add the configuration for automatically mounting the remote service directory to all of its local computing nodes. In the given scenario, the local site integrated the configuration in their central configuration scripts, meaning that deployment to all involved nodes was a single step. Database-based services, e.g. stored procedures or analytical queries, are deployed via the shared database landscape system that has been set up prior. By that, our database-based services become automatically available once the local database instances are connected to Analyze Genomes' database landscape. Thus, no dedicated configuration steps were required.

2.3 Database Install

In order to use the services provided by Analyze Genomes on the local site of the research facility, we need to set up local IMDB instances that are then connected to the shared database landscape of Analyze Genomes, forming the FIMDB system. Therefore, we need to install and configure an individual database instance on each of the research facility's local computing nodes. We incorporate SAP HANA version 1.00.82.394270 as our in-memory database system in landscape mode to form a distributed database [2]. The required database software is provided via a dedicated remote services directory. Thus, after mounting the services directory, we need to perform the installation of the local database instances. For minimizing the effort of installing all database instances in a larger cluster, e.g. more than 25 computing nodes, we incorporate the parameter-based installation. This means that we predefined all parameters for the installation and provided them as command parameters, which we then executed in parallel across all nodes at the same time using the Linux tool Parallel Distributed Shell (PDSH) version 2.29 [5]. As a result, the required binaries were copied to the local database nodes, the local instances started and registered online with the SAP HANA master server, which is located within the Future SOC network, without any configuration downtime of the overall system. In the concrete use case, we incorporated the SAP HANA Database Lifecycle Manager with the add_hosts command [7]:

    ./hdblcm --action=add_hosts \
             --addhosts=node01,...,node25 \
             --root_user=lmroot \
             --listen_interface=global

3. Next Steps

We faced various network configuration challenges in our attempt to connect both systems of the Future SOC Lab and the research facility, causing delays in the overall system deployment. Therefore, we were not able to finish the overall setup and testing by the end of the current Future SOC Lab period. In the following, we outline the steps we will further undertake to set up the overall system if Future SOC Lab resources are granted for the upcoming period, which are mainly subscribing to and configuring the selected services.

3.1 Subscribe to Managed Service

In the given application scenario, we provide the managed service for processing and analysis of genome data as a web application, which is accessible via any Internet browser. We host and manage the web applications at our site as the service provider, whilst users of the research site can access them using the URL of the application. Currently, we support either local user accounts or the integration of existing authentication providers, e.g. using OAuth 2.0, for authentication [1].

Customer: The application administrator of the research facility subscribes to the managed services for the entire facility or research department. The service provider grants access to administer the application and its settings. The application administrator is responsible for maintaining user groups and access rights for users of the research site within the application. The application administrator maintains the mapping of application users to corresponding database users and roles, and the service provider maintains users and roles in the database.

Service Provider: The service provider defines a dedicated database schema per research facility. A database schema is a container for a set of database tables, functions, and stored procedures. The service provider keeps each database schema isolated, i.e. tenant-specific data is separated to ensure data privacy [8]. The database administrator maintains specific user roles per tenant and grants them access to their tenant-specific database schemas. Each database schema is partitioned across a tenant-specific resource set, i.e. a local subset of the overall computing nodes, which is used for storing and processing the data. The database administrator can update the list of computing nodes online without interfering with running operations, i.e. data is repartitioned without any database downtime. Furthermore, the database landscape administrator can assign additional resources to a resource set, e.g. to ensure scalability by adding resources of the service provider.

3.2 Configure Selected Service

The user of the research site accesses the managed service using the URL of the web application. The web application is accessed via the VPN connection, i.e. all data is exchanged via the secured tunnel. The end user is able to maintain her/his personal profile and configure application settings. In the application scenario, each end user was able to define her/his local home directory, which contains all genome data they were working on.

4. Conclusion

In this project report, we shared our current status and the experience gained in setting up a hybrid cloud computing environment enabling a) the use of cloud services even if legal requirements do not allow the exchange of sensitive data with traditional cloud apps, and b) the processing of huge data sets locally when their exchange would significantly delay the processing of data even with the latest network bandwidths. We described all steps that have been undertaken so far to interconnect local computing resources of research facilities in a star schema using secured VPN connections via the Internet. We also outlined what other steps are required before we can test our computing environment. As a result, we are able to provide research apps in a Software-as-a-Service (SaaS) provisioning approach without moving high-throughput NGS data to remote computing resources. Instead, we are able to provide managed algorithms that are executed on the local research site computing resources where the data resides. The results of the data processing are stored in local database systems, which are then configured to form a single distributed database instance granting access only to personal results. We are convinced that sharing knowledge is the foundation to support research cooperation and to discover new insights cooperatively.

References

[1] D. Hardt. RFC 6749: The OAuth 2.0 Authorization Framework. http://tools.ietf.org/html/rfc6749/, 2012. Last accessed: Mar 27, 2015.

[2] F. Färber, S. K. Cha, J. Primsch, C. Bornhövd, S. Sigg, and W. Lehner. SAP HANA database: data management for modern business applications. ACM SIGMOD Record, 40(4):45–51, 2012.

[3] B. Langmead and S. Salzberg. Fast Gapped Read Alignment with Bowtie 2. Nature Methods, 9:357–359, 2012.

[4] H. Li and R. Durbin. Fast and Accurate Short Read Alignment with Burrows-Wheeler Transformation. Bioinformatics, 25:1754–1760, 2009.

[5] M. A. Grondona. Parallel Distributed Shell (PDSH). https://code.google.com/p/pdsh/wiki/UsingPDSH, 2012. Last accessed: Mar 27, 2015.

[6] H. Plattner and M.-P. Schapranow, editors. High-Performance In-Memory Genome Data Analysis: How In-Memory Database Technology Accelerates Personalized Medicine. Springer-Verlag, 2014.

[7] SAP SE. Add Hosts Using the Command-Line Interface. http://help.sap.com/saphelp_hanaplatform/helpdata/en/0d/9fe701e2214e98ad4f8721f6558c34/content.htm, 2012. Last accessed: Mar 27, 2015.

[8] J. Schaffner. Multi Tenancy for Cloud-Based In-Memory Column Databases. Springer, 2013.

[9] M.-P. Schapranow, F. Häger, C. Fähnrich, E. Ziegler, and H. Plattner. In-Memory Computing Enabling Real-time Genome Data Analysis. International Journal on Advances in Life Sciences, 6(1-2), 2014.

[10] M.-P. Schapranow, F. Häger, and H. Plattner. High-Performance In-Memory Genome Project: A Platform for Integrated Real-Time Genome Data Analysis. In Proceedings of the 2nd Int'l Conf. on Global Health Challenges, pages 5–10. IARIA, Nov 2013.

Inspection and Evaluation of Modern Hardware Architectures

Frank Feinbube, Felix Eberhardt, Wieland Hagen, Max Plauth, Lena Herscheid and Andreas Polze
Hasso Plattner Institute
Prof.-Dr.-Helmert-Str. 2-3, 14482 Potsdam, Germany
{frank.feinbube, felix.eberhardt}@hpi.de

Abstract

Modern hardware architectures introduce complex memory hierarchies and a variety of processing unit designs. To allow a beneficial distribution of data and processing, successful algorithms and competitive software systems need to be enabled to deduce the composition and the capabilities of the underlying hardware. Examples of such modern hardware architectures are NUMA-based computer systems, GPU-based accelerators, Intel's Xeon Phi, and hybrid architectures that have multiple such characteristics.

The purpose of this study is to evaluate means to inspect hardware architectures and to develop tools to monitor the execution of algorithms in order to find potential performance opportunities. Based on these findings we plan to optimize important algorithms to better utilize the resources at hand.

1 Introduction

In the course of our studies in the Future SOC Lab of the Hasso Plattner Institute, we looked into the capabilities of modern hardware architectures to improve the performance of algorithms with well-known application bottlenecks. Due to the novelty of their on-chip networks, interconnection characteristics and memory hierarchy, modern NUMA systems were of special interest for our studies. In order to evaluate their capabilities for improved scalability, execution performance and programmability, we studied the optimization of well-known algorithms from the Berkeley Dwarfs collection (1) (2). This report summarizes our work and findings.

2 Survey on NUMA tools

NUMA tools provide means to achieve one of the following goals: acquire topology information or identify application bottlenecks. Acquiring the topology information is crucial for an efficient data partitioning and task mapping onto the underlying hardware. To further optimize the resulting performance, it is then essential to be able to identify and study the application bottlenecks.

In our survey we evaluated and categorized the following NUMA tools:

• ACPI
• Linux sysfs
• numactl --hardware
• Intel Performance Counter Monitor
• hwloc lstopo
• Linux perf
• numatop
• MemAxes
• VTune
• mlc -e

Furthermore, we looked into their capabilities to support performance engineers in discovering and studying the topology, as well as in gaining a better understanding of the performance-critical execution characteristics of their algorithms on the target hardware architecture. The full survey is available in (3).

3 Related Work

After we looked into the body of NUMA tools (3) (4) (5) (6), we widened our view by studying the current trends in NUMA hardware (7) (8) (9) and their impact on operating system developments (10) and support (11). Furthermore, we looked into popular parallel programming models and reviewed their capabilities to support NUMA-oriented parallelizations (12) (13) (14) (15), as well as the state of the art for thread and data placement (16) and memory consistency models (17). After this elaborate survey of the research field, we started to work on a variety of case studies to apply the state of the art, identify research gaps and propose new optimization schemes for important application bottlenecks (1) (2).
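Complementing the command-line tools listed in Section 2, basic topology information can also be queried programmatically. The following minimal C++ sketch uses the Linux libnuma API; it is an illustrative addition and not one of the tools evaluated in the survey.

    // topo_probe.cpp -- print NUMA nodes, CPU-to-node mapping and node distances.
    // Build: g++ -O2 topo_probe.cpp -o topo_probe -lnuma   (requires libnuma headers)
    #include <numa.h>
    #include <cstdio>

    int main() {
        if (numa_available() < 0) {          // kernel or library without NUMA support
            std::printf("NUMA API not available on this system\n");
            return 1;
        }
        int nodes = numa_max_node() + 1;
        int cpus  = numa_num_configured_cpus();
        std::printf("%d NUMA node(s), %d logical CPU(s)\n", nodes, cpus);

        for (int cpu = 0; cpu < cpus; ++cpu)
            std::printf("cpu %3d -> node %d\n", cpu, numa_node_of_cpu(cpu));

        // ACPI SLIT-style distance matrix (10 = local, larger values = more remote)
        std::printf("node distances:\n");
        for (int i = 0; i < nodes; ++i) {
            for (int j = 0; j < nodes; ++j)
                std::printf("%4d", numa_distance(i, j));
            std::printf("\n");
        }
        return 0;
    }

The reported distance matrix corresponds to the information that numactl --hardware and hwloc expose, and is the basis for the data partitioning and task mapping decisions discussed above.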

4 Case Study #1: Error Detection Code Graph Search

The EDC Graph Search algorithm was provided by the research group for Fault-tolerant Computing of Prof. Michael Gössel. It is the foundation of their current research in error detection codes, since it allows the derivation of such codes from error graphs modelling arbitrary errors. A detailed description of the algorithm can be found in (18). The graph of all possible error codes is iteratively refined; in each iteration the vertex with the highest rank is removed.

EDC Graph Search is an NP-complete problem representing Berkeley Dwarf number 2: Graph Traversal. It is a very compute-intensive algorithm with an exponentially growing solution space. Based on the sequential version used in the paper, we investigated its potential for parallelization and performance optimization to speed up the computation and to allow the computation of larger problem sizes.

Graph algorithms are a very interesting case for NUMA environments: their inherent dependencies lead to very bad performance results in fully distributed execution environments, while their vast resource requirements only allow the computation of very small problems in the small-sized memories of UMA environments. For NUMA systems, on the other hand, the strong notion of locality that can usually be found in the graphs' dependencies seems to promise significant performance potential if they can be mapped to the interconnected memories accordingly.

As a first step, we used the Intel VTune profiler to identify where most of the runtime is spent. We identified the bottleneck of the EDC algorithm to be a maximum search over the graph, a typical reduction operation. We used OpenMP to parallelize the for loop at the heart of the function: each thread operates on a subset of the vertices, first determining a local maximum and eventually merging its result with the other threads.

Figure 1, Figure 2 and Figure 3 show the observed execution times for one to 240 threads on our hierarchical NUMA test system. For the 15 cores of a single processor the achieved speedup is nearly linear, i.e. very close to the optimum. This demonstrates that we indeed found the bottleneck and that the parallel part - obtaining thread-local maxima - is significantly larger than the accumulation of the results at the end of our parallel region. The straightforward optimization that we applied is well suited for UMA situations, where the graph completely resides in the local memory of the assigned processor.

Figure 1: Single-processor execution times (measured vs. linear speedup) for 1 to 15 threads. On a single 15-core processor, we achieve close-to-optimal speedup.

When we also utilize the other three processors on the same blade (Figure 2), we see the first NUMA effects bending our speedup curve. This degradation in relative performance is typical for parallel algorithms that are highly optimized for UMA systems. Without taking the NUMA characteristics into account and applying further optimizations accordingly, it is still possible to get faster with more resources, but the speedup becomes increasingly insignificant.

Figure 2: Multi-processor execution times (measured vs. linear speedup) for 15 to 60 threads. On a single 4-processor blade, we see speedup degradations with each additional processor due to the NUMA overhead.

When we start using the processors on the other three blades as well (Figure 3), the NUMA overhead becomes so strong that the additional resources increase the overall execution time. The bandwidth and latency limitations are so significant that the access times to the graph data cannot be mitigated by the increased processing performance of the additional processors.

Figure 3: Multi-blade execution times (measured vs. linear speedup) for 15 to 225 threads. The NUMA overhead between blades is so huge that the use of more than one blade literally decreases the execution performance.

This case study demonstrates that, in order to benefit from the additional resources provided by modern business-class NUMA systems, algorithms have to be refined and novel optimization techniques have to be applied accordingly. Further details on this case study are provided in (3).
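The parallel maximum search described above can be sketched as follows. The report does not reproduce the actual EDC source code, so the graph is reduced here to a plain array of vertex ranks; the sketch only illustrates the OpenMP pattern that was used, namely thread-local maxima merged in a critical section.

    // Sketch of the parallelized reduction bottleneck (compile with -fopenmp).
    #include <cstddef>
    #include <limits>
    #include <vector>

    // Returns the index of the vertex with the highest rank.
    std::size_t max_rank_vertex(const std::vector<double>& rank) {
        std::size_t best_idx = 0;
        double best_rank = -std::numeric_limits<double>::infinity();

        #pragma omp parallel
        {
            std::size_t local_idx = 0;                       // thread-local maximum ...
            double local_rank =
                -std::numeric_limits<double>::infinity();    // ... over this thread's subset

            #pragma omp for nowait
            for (std::ptrdiff_t v = 0; v < (std::ptrdiff_t)rank.size(); ++v)
                if (rank[v] > local_rank) {
                    local_rank = rank[v];
                    local_idx = (std::size_t)v;
                }

            #pragma omp critical                             // merge the per-thread results
            if (local_rank > best_rank) {
                best_rank = local_rank;
                best_idx = local_idx;
            }
        }
        return best_idx;
    }

As discussed above, this straightforward scheme scales almost linearly on a single processor, but without NUMA-aware data placement the remote accesses to the shared rank data dominate once several blades are involved.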

5 Case Study #2: Matrix-Matrix Multiplication

The multiplication of matrices is a computational pattern which plays a pivotal role in many big data analyses, since it is fundamental for statistics algorithms, finance, scientific computing, signal processing, imaging, and many more. It is the most prominent representative of Berkeley Dwarf number 1: Dense Linear Algebra. An implementation of matrix multiplication is part of any high performance computing library, such as the Intel Math Kernel Library (MKL). We studied the parallel performance of matrix multiplication using several NUMA optimizations.

A naïve implementation of a matrix-matrix multiplication comes with a complexity of O(n³). It can be parallelized in a straightforward fashion by distributing the input data to all computation nodes and pinning threads accordingly so that they only work on locally available memory. This is very efficient for NUMA systems as well, because it minimizes the amount of remote memory accesses. Using a naïve matrix-matrix multiplication parallelized by rows, we achieved a speedup of 45 on an 8-node NUMA system with 128 cores (see Figure 4). Furthermore, the impact of distributing threads and memory accordingly varies from speedup factors of 1.5 to 2.0 (see Figure 5).

Figure 4: Execution time of matrix-matrix multiplications on an 8-node NUMA system with 128 cores for matrix sizes of 512, 1024 and 2048.

Figure 5: Execution time of thread and memory placements on an 8-node NUMA system with 128 cores for matrix sizes of 1024, 2048 and 4096.

A more sophisticated matrix multiplication algorithm is Strassen's algorithm. Instead of O(n³), as with the naïve approach, its asymptotic complexity is only about O(n^2.8). The key idea is the recursive definition of temporary submatrices. The result can be expressed in a certain way using these submatrices, thus saving one expensive multiplication operation.

After implementing a parallel version of Strassen's algorithm, we applied further optimizations, such as tiling. This is the approach of dividing the data into tiles, whose size depends on the cache size. Within a tile, a partial result can be computed while decreasing the likelihood of cache misses. Since the multiplication always affects one row of the left-hand-side matrix and one column of the right-hand-side matrix, one of the two accesses will not be cache aligned. Therefore, a further optimization step is to transpose one matrix before multiplying. As shown in Figure 6, the amount of cache misses can be reduced and the runtime shortened significantly.
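The tiling and transposition steps can be illustrated with the following simplified C++/OpenMP sketch. It is not the project's implementation (which additionally distributes data across NUMA nodes and pins threads) and, for brevity, assumes that the matrix dimension n is a multiple of the tile size.

    // Tiled multiplication of row-major n*n matrices with a transposed operand.
    // Compile with -fopenmp; tile should be chosen so that three tiles fit in cache.
    #include <algorithm>
    #include <cstddef>
    #include <vector>

    using Matrix = std::vector<double>;   // row-major n*n matrix

    void multiply_tiled(const Matrix& a, const Matrix& b, Matrix& c,
                        std::size_t n, std::size_t tile) {
        // Transpose b so that both operands are traversed row-wise (cache friendly).
        Matrix bt(n * n);
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                bt[j * n + i] = b[i * n + j];

        std::fill(c.begin(), c.end(), 0.0);

        // Each thread owns whole (ii, jj) result tiles, so no write conflicts occur.
        #pragma omp parallel for collapse(2) schedule(static)
        for (std::size_t ii = 0; ii < n; ii += tile)
            for (std::size_t jj = 0; jj < n; jj += tile)
                for (std::size_t kk = 0; kk < n; kk += tile)
                    for (std::size_t i = ii; i < ii + tile; ++i)
                        for (std::size_t j = jj; j < jj + tile; ++j) {
                            double sum = c[i * n + j];
                            for (std::size_t k = kk; k < kk + tile; ++k)
                                sum += a[i * n + k] * bt[j * n + k];
                            c[i * n + j] = sum;
                        }
    }

Because every thread only writes to its own result tiles, the scheme combines naturally with first-touch allocation and thread pinning, which is where the NUMA-specific speedups reported in Figures 4 and 5 come from.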


Figure 6: Cache misses and execution time for tiling and transposition with a matrix size of 2048.

Next, we leveraged SSE instructions to gain further performance improvements. Provided the data is aligned in 128-bit pieces, these instructions can operate efficiently on multiple floating point numbers at once. The allocation of 16-byte-aligned memory for SSE purposes was implemented accordingly. We observed that using SSE instructions pays off, especially for small matrices. A comparison of runtimes for small matrices (Figure 7) and large matrices (Figure 8) shows this effect.

Figure 7: Execution time of naive, SSE-based, Strassen, and MKL matrix multiplications for small matrices.

Figure 8: Execution time of naive, SSE-based, Strassen, and MKL matrix multiplications for larger matrices.

Even for basic fundamental algorithms such as matrix multiplication, realizing efficient implementations for NUMA systems is challenging. At a high level, the thread and data placement needs to be considered carefully, but this does not imply that low-level optimizations like using SSE instructions are no longer necessary to draw on the full potential of the hardware. Further details on this case study are provided in (19) and (3).

6 Case Study #3: Reader/Writer Locks

One of the fundamental challenges to ensure computational correctness in parallel systems is the prevention of race conditions by synchronization mechanisms. Wrongly applied mutual exclusion mechanisms can lead to deadlocks, livelocks, starvation and fairness problems. These mechanisms, as well as their lock-free counterparts, also have very severe performance implications due to the sequentialization, the protocol overhead and the interaction with coherency protocols. The performance costs for cache coherence and for the management of the data structures needed for synchronization implementations are significantly higher in NUMA systems, a cost all algorithms that cannot avoid the use of shared resources or critical sections will have to pay.

This underlines the fact that with hierarchical NUMA systems, not only the placement of memory and threads is crucial, but it is likewise important to consider cache and resource issues related to synchronization primitives. A solid NUMA-aware implementation of such mechanisms would be very advantageous for a vast amount of algorithms and applications.

In this case study, we looked into Cohort Locks, which are the state of the art in NUMA-aware Reader-Writer Locks. They are specifically tailored to reduce the overhead of locking mechanisms for NUMA systems. Cohort Locks differ from basic Reader-Writer Locks by introducing the notion of cohorts and by supporting thread obliviousness. Cohorts are a set of

46 threads interested in accessing the same resource. Co- a vast amount of fully independent operations. If im- hort detection allows a thread to determine whether or plemented accordingly, both spatial localities along not there are additional thread waiting to acquire a de- cache lines, as well as, temporal localities allowing for sired lock. Thread obliviousness makes it possible that cache reuse can be exploited. Furthermore the parti- a lock can be acquired by one thread and released by tioning of the grid into sub-grids allows a distributed another. The concept of Cohort Locking is very inter- computation, requiring communication only for the esting because it allows to compose NUMA-aware synchronization of boundary cells. locking primitives from NUMA-oblivious ones. The While existing parallelization strategies already pro- cohort concept allows to leverage locality and reduce vide exceptional performance results, the inherent af- the overhead for cache coherence protocols. The im- finity of the algorithm to produce localities makes it pact on caches is demonstrated in Figure 9. the perfect fit for hierarchical NUMA systems. (See Figure 10) Since cells behind the boarder need to be 100% accessed, communication overhead is introduced on 80% Distributed systems. These cells are accessed in a 60% read-only fashion, having a decreasing access fre- 40% quency behind the border. 20% 0% L1 LFB L2 L3

Figure 9: Percentage of cache hits (L1, LFB, L2, L3) for Reader/Writer Lock benchmarks with a NUMA-unaware and a NUMA-aware implementation. In contrast to the NUMA-unaware pthreads-based implementation, most of the cache hits for the NUMA-aware implementation occur in the L1 cache, which results in reduced execution times.

Further details on this case study are provided in (20) Figure 10: Detailed view of the memory access and (3). pattern for SURF at the border of the 2D grid. On NUMA systems the efficient computation of grids 7 Case Study #4: Speeded Up Robust way larger and far more complex than what a UMA Features (SURF) system could support can be afforded, without the price of big latencies and small bandwidth usually SURF is a descriptor-based object detection algo- found in Distributed systems. The interesting research rithm. Similar to other object detection algorithms objective is the reliable determination of the “sweet SURF is realized by a number of distinct computation spot” of copying data versus accessing it remotely. steps, the compute heaviest being a convolution. Further details on this case study are provided in (21) Convolution algorithms are a popular representative and (3). of Berkeley Dwarf number 5: Structured Grid. These kinds of computational patterns are often found in 8 Conclusion and Future Work computer vision, signal processing and simulations of all kinds. We see a vast amount of innovative applications seiz- Data is organized in a regular multidimensional grid ing the opportunities of real-time analytics on Big usually realized as an array data type. Accesses to the Data which is made possible by the creation of novel grid are regular, statically determinable and therefore business-class hierarchical NUMA systems. We feel predictable. Computations are realized as a sequence that these systems promise significant performance of logically concurrent grid updates where the new improvements for the identified crucial application value for a cell is determined using values from its bottlenecks. What really sets them apart from power- neighborhood only. It is usually implemented as a se- ful cluster systems, though, is the fact that they also quential sweep through the computation domain, ei- provide a close-to-ideal environment for composite al- ther using an in-place update algorithm or two grids: a gorithms. In contrast to the specific, pure algorithms read-only ‘old’ version of the grid and a write-only usually found in benchmarks, real applications are a ‘new’ version. complex choreography of interwoven algorithms, The characteristics of this algorithm allow for very ef- each having distinct requirements for efficient compu- ficient parallelizations on both, UMA and Distributed tation and communication. By supporting the scalabil- systems. It is usually highly vectorizable and provides ity of Distributed systems while by providing a UMA- like environment of coherent memory with reasonable

47 latencies and bandwidth characteristics, hierarchical 7. Multiprocessor architectures. Heidler, Kirstin. NUMA systems, allow to realize all types of commu- Potsdam : NUMA Seminar, 2014. Hasso Plattner nication and computation patterns efficiently. Institute. The hard question is, though, how to express an appli- 8. Cache Coherence in NUMA Systems . Frohnhofen, cation in a way that it can be mapped and executed on Johannes. Potsdam : NUMA Seminar, 2014. Hasso a NUMA system in the best possible way. To date, Plattner Institute. best practices and optimization techniques focus either 9. Interconnection Technologies. Zarisheva, Elina. on parallel shared memory systems (UMA, e.g. with Potsdam : NUMA Seminar, 2014. Hasso Plattner OpenMP) or distributed message-passing systems Institute. (e.g. with MPI); pure NUMA optimizations have been mostly neglected because the performance penalties 10. Linux NUMA evolution. Teschke, Fredrik and were moderate and there is no intuitive programming Pirl, Lukas. Potsdam : NUMA Seminar, 2014. Hasso metaphor allowing for portable performance. Plattner Institute. The emerging class of hierarchical cache-coherent 11. NUMA Kernel APIs. Korsch, Dimitri. Potsdam : NUMA systems requires: NUMA Seminar, 2014. Hasso Plattner Institute.  Novel portable optimization techniques and 12. NUMA in High-Level Languages. Siegler, best practices Patrick. Potsdam : NUMA Seminar, 2014. Hasso  NUMA-aware tools, libraries, programming Plattner Institute. models, patterns, distribution schemes, … 13. NUMA and Open-CL. Sachse, Jan Philipp.  The adequate consideration of topology and Potsdam : NUMA Seminar, 2015. Hasso Plattner hardware characteristics. Institute. In our future research we are focusing on contributing 14. NUMA with OpenMP. Springer, Matthias. to these efforts, in order to pave the way for complex Potsdam : NUMA Seminar, 2015. Hasso Plattner and challenging applications of the future. Institute. 15. NUMA with OpenMPI. Fiedler, Carolin. 9 Acknowledgments Potsdam : NUMA Seminar, 2015. Hasso Plattner Institute. This work was only possible due to the generous offer 16. Scientific approaches to Thread and Data to conduct our studies on the modern hardware within Placement. Eckert, Fabian. Potsdam : NUMA the HPI FutureSOC Lab. Hereby we want to show our Seminar, 2014. Hasso Plattner Institute. gratitude to the FutureSOC Lab steering committee 17. C++ 11 Memory Consistency Model. which accepted our project proposal and the Fu- Gerstenberg, Sebastian. Potsdam : NUMA Seminar, tureSOC Lab team for the daily cooperation. Special 2015. Hasso Plattner Institute. thanks go to Bernhard Rabe for friendly and timely support, advice and help. 18. Error Detection Codes for Arbitrary Errors Modeled by Error Graphs. Nieß, Günther, Kern, Thomas and Gössel, Michael. Porto : GI/ITG/GMA References Technical Committee “Dependability and Fault 1. Patterson, David, et al. The Landscape of Parallel Tolerance” (VERFE) and DFG Priority Program SPP Computing Research: A View from Berkeley. 1500 “Dependable Embedded Systems”, 2015. 11th Berkeley : University of California, 2006. Workshop on Dependability and Fault Tolerance UCB/EECS-2006-183. (VERFE’15). 2. —. Dwarf Mine. The Landscape of Parallel 19. NUMA-aware Matrix-Matrix-Multiplication. Computing Research: A View From Berkeley. Reimann, Max and Otto, Philipp. Potsdam : NUMA [Online] 2015. Seminar, 2015. Hasso Plattner Institute. http://view.eecs.berkeley.edu/wiki/Dwarfs. 20. NUMA-aware Reader-Writer Locks. Herold, Tom 3. 
Eberhardt, Felix, et al. Technical Report and Lamina, Marco. Potsdam : NUMA Seminar, NUMA4HANA #2. Potsdam : Hasso Plattner Institute, 2015. Hasso Plattner Institute. 2015. 21. NUMA-aware SURF. Sterz, Christoph and 4. Topology Detection. Knebel, Sven. Potsdam : Schmidt, Patrick. Potsdam : NUMA Seminar, 2015. NUMA Seminar, 2014. Hasso Plattner Institute. Hasso Plattner Institute. 5. Scientific approaches: NUMA Profilers/analyzing runtime behavior. Swart, Malte. Potsdam : NUMA Seminar, 2014. Hasso Plattner Institute. 6. Performance Counter. Tausche, Karsten. Potsdam : NUMA Seminar, 2014. Hasso Plattner Institute.

Project Report on "Large-Scale Graph-Databases based on Graph Transformations and Multi-Core Architectures" [1]

Hasso Plattner Institute

Matthias Barkowsky ([email protected])
Henriette Dinger ([email protected])
Lukas Faber ([email protected])
Felix Montenegro ([email protected])

Abstract

Updating large-scale graph data interactively accessed by multiple users applying graph transformations requires a high throughput. This can only be achieved by concurrently executing multiple queries. In order to maintain consistency of the graph data, different synchronisation strategies for graph transformations were prototypically implemented and evaluated using the Future SOC Lab. This project report presents the project's basics and the results of the test runs.

1. Introduction

Large-scale graph databases keep gaining influence on the web, for example in social networks. Accordingly, the amount of requested updates on the graph data becomes hard to handle, with requests potentially incoming faster than they can be processed. To enable interactive querying [5] and a multi-user mode on the data, the throughput of queries needs to be increased. This can be achieved using parallelism on multicore architectures. Then, the problem of ensuring consistency of the data needs to be addressed. Therefore, different strategies for synchronising access to graph data were developed and implemented in the project. In our project, updates are performed as graph transformations [4], which consist of a left-side graph, which is matched onto the graph data and replaced by a right-side graph. There are transformation engines like Henshin (http://www.eclipse.org/henshin/) or the Story Diagram Interpreter (http://www.hpi.uni-potsdam.de/giese/public/mdelab/mdelab-projects/story-diagram-tools/) capable of performing such graph transformations. Their work consists of searching for matches and replacing them. We distinguish between one-ary and two-ary engines. Two-ary engines like Henshin are able to execute a query in two steps: first, the matches are found, and in a second step they are replaced. In contrast, one-ary engines like the Story Diagram Interpreter lack this option, only offering an execution in one command.

2. Implementation and Synchronisation

The framework offers adapters for different transformation engines which operate on graph data generated by the Eclipse Modelling Framework (http://www.eclipse.org/modeling/emf/) [3]. To assure a consistent state during parallel executions, accesses to the databases must be synchronised. In this project, multiple synchronisation strategies are implemented and evaluated.

2.1. Global Lock Synchronisation Strategy

The Global Lock Synchronisation Strategy allows an execution only after acquiring a lock for the entire graph database, so no two queries will be executed simultaneously. This strategy is used as a reference to evaluate the other strategies.

2.2. Improved Global Lock Synchronisation Strategy

The Improved Global Lock Synchronisation Strategy additionally uses read-write locks. Thus, any number of reading queries can be executed in parallel. Since reading queries do not have side effects, the concurrent execution of an arbitrary number of reading queries cannot harm the consistency of the database. Since data can be stored in different graphs, the strategy can further be improved by providing one lock for each graph, allowing parallel executions on different graphs [2].

2.3. Version Synchronisation Strategy

A Version Synchronisation Strategy lets changes be written on a copy of the graph data, a so-called version. Thus all queries can be executed simultaneously. As changes of unfinished queries are performed in versions, the original graph data is always in a consistent state. Therefore queries that only need reading access can work on the original graph. When a writing query is finished, the version and the original graph have to be merged. The result is a new original graph on which new reading queries operate. Merging versions is the most difficult part of this strategy since it has to maintain the database in a consistent state and at the same time ensure that all changes were saved [2].

3. Work on the Future SOC Lab

3.1. Used Future SOC Lab Resources

All experiments were executed on a Fujitsu RX600S5-2 architecture with 4 Xeon X7550 processors, each having eight cores at 2 GHz, providing 1024 GB of RAM and running Java 7 and Ubuntu 14.04.

3.2. Results

In our experiments, we executed two different benchmarks consisting of 1,000 queries each on a graph with 500,000 nodes. We used six types of queries:

• Reading queries (R): A reading query queries for one node and all adjacent nodes.

• Complex reading queries (CR): A complex reading query reads approximately 10% of all nodes in the graph.

• Modifying queries (U): A modifying query modifies an attribute of one node based on the attribute values of its adjacent nodes.

• Complex modifying queries (CU): A complex modifying query modifies an attribute of approximately 10% of all nodes in the graph.

• Creating queries (C): A creating query copies one node, thereby also creating several edges.

• Deleting queries (D): A deleting query deletes one node, thereby also deleting several edges.

From these queries we built two benchmarks. One of these benchmarks, named "Reading", only includes reading queries; the other, named "Mixed", includes both reading and writing queries. The distribution of queries can be seen in Table 1.

Table 1. Percentage of query types in each benchmark

    benchmark   R     CR    U     CU    C     D
    Reading     95%   5%    0%    0%    0%    0%
    Mixed       45%   5%    34%   1%    10%   5%

The tested strategies include a simple Global Lock Synchronisation Strategy (GL), an improved variation, the Improved Global Lock Synchronisation Strategy (IGL), and a Version Synchronisation Strategy (V). The Global Lock Synchronisation Strategy only allows the execution of a single query at a time, regardless of whether this query is a reading or a writing query. This means that using the Global Lock Synchronisation Strategy, all queries are executed sequentially. The Improved Global Lock Synchronisation Strategy uses a read/write lock to enable concurrent execution of reading queries. It also allows writing queries that are executed with a two-ary engine to search for matches while other reading queries are executed, but it blocks concurrent write operations. The Version Synchronisation Strategy creates a copy of the graph that writing queries can work on without interfering with reading queries, which means that reading and writing queries can be executed simultaneously; however, it also allows only one writing query at a time to avoid conflicting versions. Because writing queries modify only the copy, a parallel execution of reading queries is possible.

Figure 1 shows the throughput achieved by the different strategies. It can be seen that the Improved Global Lock Synchronisation Strategy and the Version Synchronisation Strategy have a very high throughput when there are only reading queries, since those are allowed to work at the same time. The throughput significantly drops when there are writing queries. The Improved Global Lock Synchronisation Strategy suspends reading queries while a writing query is being executed. The Version Synchronisation Strategy creates one copy for writing queries, which is periodically merged. However, writing queries can only sequentially modify that copy. Additionally, the overhead of copying the whole graph for every ten writing queries further reduces the throughput of the Version Synchronisation Strategy compared to the Improved Global Lock Synchronisation Strategy.

Unlike the Improved Global Lock Synchronisation Strategy, writing queries modify that copy so that reading queries can concurrently operate on the original graph. The Global Lock Synchronisation Strategy has a very low throughput overall, because only one query is allowed to work at a time. This also explains why the throughput for this strategy is nearly the same for the reading and the mixed benchmark.
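The difference between the Global Lock and the Improved Global Lock strategies can be conveyed with a small conceptual sketch. The project itself is implemented in Java on top of the Eclipse Modelling Framework; the C++ fragment below uses placeholder types (GraphDatabase, Query are not names from the project) and only illustrates the read-write-lock idea of running arbitrarily many reading queries in parallel while writing queries execute exclusively.

    #include <mutex>
    #include <shared_mutex>

    struct GraphDatabase { /* graph data */ };
    struct Query {
        bool reads_only;
        void run(GraphDatabase&) const { /* match and replace */ }
    };

    class ImprovedGlobalLock {
    public:
        void execute(const Query& q, GraphDatabase& db) {
            if (q.reads_only) {
                // Many reading queries may hold the shared lock at the same time.
                std::shared_lock<std::shared_mutex> lock(mutex_);
                q.run(db);
            } else {
                // Writing queries take the lock exclusively and run sequentially.
                std::unique_lock<std::shared_mutex> lock(mutex_);
                q.run(db);
            }
        }
    private:
        std::shared_mutex mutex_;   // one such object per graph yields the per-graph variant
    };

Keeping one ImprovedGlobalLock instance per graph corresponds to the per-graph refinement described in Section 2.2, which allows writing queries on different graphs to proceed in parallel.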

Figure 2. Execution progress of two benchmarks using different synchronisation strategies.

3.3. Further Steps

Further steps may include improving the strategies Figure 1. Average number of benchmark to allow higher parallelism when executing writing queries executed per second using dif- queries and allowing the user to parametrise the strate- ferent synchronisation strategies gies. Another strategy, the Node Lock Synchronisation Strategy, has been prototypical implemented and will be advanced to support the complex queries of the test Figure 2 shows how much time the queries needed de- framework. The Node Lock Synchronisation Strategy pending on the strategy. As indicated, the Improved ensures that no two parallel executed transformations Global Lock Sychronisation Strategy and the Version access the same node of the graph data by providing Synchronisation Strategy need almost the same time to a lock for each node. In a variation of this strategy a execute the reading query benchmark. Executing the conflict graph is built in which each node represents a writing benchmark the Version Synchronisation Strat- query. If two queries want to access the same node, egy takes significantly longer. Similarly, the Global an edge appears between their nodes. Thus, it is visi- Lock Sychronisation Strategy shows a comparably low ble which queries cannot be executed simultaneously. performance in both benchmarks. Striking, however, A scheduler decides in which order conflicting queries is that the curves of the improved global lock and the will be executed. The user has the possibility to affect Version Synchronisation Strategy both have a sharp the execution order by a parametrisation of the sched- bend, which occurs when the reading queries, which uler, influencing for example the maximum or aver- are executed faster, are finished. As the Improved age waiting time of the queries or prioritise them. The Global lock Synchronisation Strategy allows simul- danger of deadlocks that this strategy brings when us- taneous execution only for reading queries, they will ing one-ary engines will be handled by documenting be preferred. Writing queries are then delayed until changes. Thus, they can be undone when a deadlock all reading queries are finished. This could become occurs or saved in a version that will be merged after a problem when multiple users access the database, being finished. [2]. since it cannot be assured that a writing query will start To increase the performance of the Version Synchro- at any time. nisation Strategy, a more efficient algorithm for cre- ating the versions can be developed. In the strategy used in our experiments, versions include the whole graph. Thus, many node copies are created but never used. One approach to minimise the number of un- used copies is a copy-on-write strategy that only cre- ates copies of nodes at the moment they are changed. Also, this variation of the Version Synchronisation Strategy only allows a single writing query at a time. The use of an algorithm for merging multiple versions would allow a concurrent execution of multiple writ-

51 ing queries. However this approach needs to address how to handle conflicting versions for one or multiple nodes. Merging nodes can be avoided by a combination with the Node Lock Synchronisation Strategy, only allow- ing a node to be copied to at most one version. This will allow a maximum number of simultaneously writ- ing queries on a copy of the graph data while not blocking any number of reading queries. Further evaluation will include simultaneous querying of different graphs. Additionally, more data sets and benchmarks will be used to evaluate the applicabil- ity of the strategies in different domains and to further identify their advantages and drawbacks.
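The Node Lock Synchronisation Strategy mentioned above can also be illustrated conceptually: each graph node carries its own lock, and a transformation acquires the locks of all nodes it touches at once, so that it never blocks queries working on disjoint parts of the graph. The C++ sketch below uses placeholder types and a fixed number of touched nodes; the conflict-graph scheduler and the project's Java/EMF implementation are not reproduced here.

    #include <mutex>

    struct GraphNode {
        std::mutex lock;     // one lock per node of the graph data
        int attribute = 0;
    };

    // Update a node from two adjacent nodes while holding all three node locks.
    // std::scoped_lock acquires the mutexes with a deadlock-avoidance algorithm;
    // the three nodes are assumed to be distinct.
    void update_from_neighbours(GraphNode& target, GraphNode& n1, GraphNode& n2) {
        std::scoped_lock guard(target.lock, n1.lock, n2.lock);
        target.attribute = (n1.attribute + n2.attribute) / 2;
    }

In the project, the set of touched nodes is only known after matching, which is why the report additionally discusses documenting changes so that they can be undone or merged when a deadlock is detected with one-ary engines.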

4. Conclusion

By looking at our test results, it becomes apparent that a highly parallel execution is desirable and greatly in- creases the throughput of the developed database. To maximize the amount of concurrently executed queries while retaining a consistent state of the stored data, an efficient synchronisation strategy has to be used. We presented different possible solutions on how to maximise concurrent modifications of graph data. We implemented and evaluated two global lock strategies and a version synchronisation strategy. Improving strategies to simultaneously execute both reading and writing queries will allow further parallelism, thereby allowing higher throughput of queries.

References

[1] Large-scale graph-databases based on graph transformations and multi-core architectures. http://hpi.de/fileadmin/user_upload/hpi/dokumente/studiendokumente/bachelor/bachelorprojekte/2014_15/BA_Projekt_G1_FG_Giese_Framework_Graphendatenbanken.pdf.

[2] M. Barkowsky, H. Dinger, L. Faber, F. Montenegro. Anforderungsdokument. Hasso Plattner Institute, Universität Potsdam, 2014. Bachelorprojekt.

[3] M. Barkowsky, H. Dinger, L. Faber, F. Montenegro. Designdokument. Hasso Plattner Institute, Universität Potsdam, 2014. Bachelorprojekt.

[4] A. Habel and K.-H. Pennemann. Correctness of high-level transformation systems relative to nested conditions. Mathematical Structures in Computer Science, 19, 2009. Cambridge University Press.

[5] R. B. Miller. Response time in man-computer conversational transactions. In Proc. AFIPS Fall Joint Computer Conference, pages 267–277, 1968.

Modelling wide area networks using SAP HANA in-memory database

Tadeusz Czachórski, Monika Nycz, Tomasz Nycz
Institute of Informatics, Silesian University of Technology
Akademicka 16, 44-100 Gliwice, Poland
[email protected], [email protected], [email protected]

Abstract

This report presents our preliminary results in the area of numerical modelling of the dynamics of flows in wide area computer networks with the TCP/IP protocol, using the SAP HANA in-memory database. The aim of the project is to explore the possibility of transforming the widely known fluid-flow approximation modelling method into a database language, thus removing the need to develop dedicated solutions and to transfer results from the application to the database. We implemented the modelling logic as SQL procedures and performed the mathematical calculations for an exemplary topology. The experiments show that the database engine can perform all model computations; however, further studies on how to optimize them are still necessary.

1. Introduction

The analysis of transient phenomena that occur in computer networks is performed with simulation or analytical models. Simulation models are usually more detailed and therefore, in the case of large networks such as Internet topologies, time-consuming and demanding in computing power. Analytical models are based on mathematical equations defining the changes occurring in the network. The use of mathematical models maintains sufficient accuracy of results and makes the analysis less time-consuming.

Transient queue analysis is needed to model time-dependent flows and the dynamics of changes of router queues in modern computer networks. It helps to predict packet loss probability and queueing delays, which are the major factors of Internet quality of service. The most popular methods of modelling TCP flows are Markov chains, diffusion approximation and fluid-flow approximation. Each approach has its advantages and drawbacks, but it is the fluid-flow approximation [4, 2] that is widely used for modelling transient states in wide area networks, including the Internet.

The fluid approximation uses first-order ordinary linear differential equations to determine the dynamics of the average length of node queues, eq. (1), and the dynamics of TCP congestion windows, eq. (2), in a modelled network. The TCP flows traverse the network from sources to destinations through predefined sets of routers. Each router receives K classes and each class consists of M identical flows. Packets within flows are either sent forward, when the station is empty, or queued otherwise.

The changes of the queue length at a router v, dq_v(t)/dt, eq. (1), are defined as the intensity of the input stream reduced by the intensity of the output flow C_v, i.e. the number of packets sent further in a time unit. The amount of input stream, i.e. the sum of all flows traversing a particular node, is controlled by the factor (1 - p_v(t)), which implies the rejection of packets in case of congestion.

\frac{dq_v(t)}{dt} = \sum_{i=1}^{K} \frac{W_i(t)}{R_i(q(t))} \cdot M \cdot (1 - p_v(t)) - \mathbf{1}(q_v(t) > 0) \cdot C_v \qquad (1)

A single flow is characterized by its actual window size and its RTT time, i.e. the time after which the sender is informed whether the recipient successfully received the packet or not. The window size grows by one at each RTT time R_i(t) in the absence of packet loss and decreases by half after every packet loss occurring on the path. The amount of loss for the entire TCP connection is modelled as the throughput intensity multiplied by the total drop probability on the flow path.

\frac{dW_i(t)}{dt} = \frac{1}{R_i(q(t))} - \frac{W_i(t)}{2} \cdot \frac{W_i(t-\tau)}{R_i(q(t-\tau))} \cdot \Bigl(1 - \prod_{j \in V}(1 - P_{ij})\Bigr) \qquad (2)

The RTT time, eq. (3), in the above formulas determines the time needed for the information on congestion and packet loss to propagate through the network back to the sender of flow i. It consists of the total queue delays at all nodes defined along the connection and the total link propagation delay.

R_i(q(t)) = \sum_{j=1}^{K} \frac{q_j(t)}{C_j} + \sum_{j=1}^{K-1} Lp_j \qquad (3)

The routers have mechanisms preventing the overloading of their buffers, such as RED, which proactively drops packets when the queue increases, with probability p_v(t):

p_v(x_v) = \begin{cases} 0, & x_v < t_{min_v} \\ p_{max_v} \cdot \frac{x_v - t_{min_v}}{t_{max_v} - t_{min_v}}, & t_{min_v} \le x_v \le t_{max_v} \\ 1, & t_{max_v} < x_v \end{cases} \qquad (4)

where t_{min}, t_{max} are the thresholds, p_{max} is the probability at the t_{max} threshold and x(t) is the weighted average queue length, i.e. the sum of the current queue q_v(t) taken with a weight parameter α and the previous average queue taken with (1 - α). The fluid-flow differential equations are solved numerically.

2. Project idea

The analysis of the behaviour of computer networks based on a mathematical and numerical approach implies generating, processing and storing large amounts of output data. The project aims to explore the possibility of transferring the modelling application logic into the database engine, thus removing the need to develop dedicated solutions and to transfer results from the application to the database. If we consider thousand- and million-node topologies, the calculations generate a large amount of data. The standard solution is to create a dedicated software structure for storing and analysing the obtained data, or to use a database as storage. To eliminate the necessity of customizing the resulting structures and the waiting times for sending results to storage, we transferred the application logic into the database layer. However, such a solution requires a specialized hardware infrastructure. Thus, we focused on SAP HANA as an in-memory database solution.

3. Methodology & Findings

The research consisted of three phases: preprocessing, implementation and modelling, and visualization. In the first phase we selected an exemplary real network topology [3] with 134023 nodes and generated 50000 router pairs for flows. Next, we generated the link propagation delays (tens to hundreds of milliseconds), based on which the flow paths were determined using the Dijkstra algorithm. The last step concerned the preparation of the initial configuration data as CSV files, which were imported into the SAP HANA database.

The application modelling logic assumed the update of all network parameters at each step and stored them as historical data. The values of a particular step were used in the subsequent modelling step. The main project phase involved translating the numerical fluid-flow approximation logic into the database language. In this case we created tables, loaded them with initial data and tested several algorithms based on combinations of updates, upserts and inserts. SAP HANA provides two types of data storage: row store and column store. Both were used to determine the most effective way to perform modelling inside the database engine. The logic was processed within an SQL procedure using a loop, or as an SQL script. Despite the fact that the modelling logic calculated aggregates on both routers and flows, the preliminary results indicated update-insert on the row store as the best solution. However, the project period turned out to be too short to check all necessary optimization possibilities, which could change the best option in favour of an insert-only approach on the column store.

The final phase involved the visualization of the calculated results. Therefore, we extracted values such as the queue length for particular steps from the SAP HANA database as CSV files. Then we imported the basic topology into the Gephi tool (The Open Graph Viz Platform, [1]) and updated it with the files obtained from the database. The process ended with a visualization presenting the changes of queues in the whole network.
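The report describes the SQL implementation only in prose, so the following is merely a minimal sketch of how one Euler integration step of eq. (1) could be expressed as a set-oriented statement inside the database. All table, column and variable names (NODE_STATE, FLOW_STATE, FLOW_PATH, :dt, :m_flows, :cur_step) are illustrative assumptions, not the project's actual schema; the variables would be parameters of the enclosing SQLScript procedure.

-- advance the node queues of eq. (1) by one step of length :dt (illustrative schema only)
INSERT INTO NODE_STATE (STEP, NODE_ID, Q, P_DROP, CAPACITY)
SELECT STEP + 1, NODE_ID,
       CASE WHEN Q_NEW < 0 THEN 0 ELSE Q_NEW END,   -- queue lengths cannot become negative
       P_DROP, CAPACITY
FROM (
    SELECT n.STEP, n.NODE_ID, n.P_DROP, n.CAPACITY,
           n.Q + :dt * ( SUM(f.W / f.RTT) * :m_flows * (1 - n.P_DROP)
                         - CASE WHEN n.Q > 0 THEN n.CAPACITY ELSE 0 END ) AS Q_NEW
    FROM   NODE_STATE n
    JOIN   FLOW_PATH  p ON p.NODE_ID = n.NODE_ID            -- flows traversing the node
    JOIN   FLOW_STATE f ON f.FLOW_ID = p.FLOW_ID AND f.STEP = n.STEP
    WHERE  n.STEP = :cur_step
    GROUP  BY n.STEP, n.NODE_ID, n.Q, n.P_DROP, n.CAPACITY
) AS s;
-- the congestion windows of eq. (2) and the RED probabilities of eq. (4) are advanced analogously

Writing every step as a new set of rows mirrors the report's decision to keep all parameter values of previous steps as historical data.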
4. Numerical results

The loads of the network nodes (in percent) are indicated in our visualization by colours: 0% (green), 50% (yellow), 100% (red). At the beginning (t = 1 sec), the queues are mainly empty, Fig. 1. With the increase of time the network congestion grows, Fig. 2, reaching a critical point (t = 26 sec), Fig. 3. With further time growth, the increasing congestion activates the mechanisms that reduce the window sizes of flows, which counteract the overflows in nodes and balance the network loads, Fig. 4.

We may also apply filtering on particular routers or flows and observe the situation from a more detailed point of view. In our example, Figs. 5 and 6 show the behaviour of the longest flow (42 nodes) in the network. As it turned out, it was not congested during the critical point at time = 26 sec.
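The per-node colouring used in Figs. 1-4 is driven by data extracted from the database. A query of roughly the following shape, again with assumed table and column names, yields the per-step node loads that are then exported as CSV and merged with the topology in Gephi.

-- per-node load for one modelling step, to be exported as CSV for Gephi
SELECT NODE_ID,
       ROUND(100 * Q / BUFFER_SIZE, 1) AS LOAD_PERCENT
FROM   NODE_STATE
WHERE  STEP = :step_of_interest
ORDER  BY NODE_ID;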

Figure 1. The loads of nodes in the modelled network for time = 1 sec.
Figure 3. The loads of nodes in the modelled network for time = 26 sec.

Figure 2. The loads of nodes in the modelled network for time = 13 sec.
Figure 4. The loads of nodes in the modelled network for time = 50 sec.

5. Future SOC Lab resources

During the project we used the HPI Future SOC Lab HP DL980 G7 server with, i.a., 4 x Xeon (Nehalem EX) X7560 processors and 1 TB RAM. The calculations were performed by running the SQL procedures and scripts within SAP HANA Studio on SAP HANA version 1.00.091. The time needed to load the initial configuration data is surprisingly small, as is the time for extracting the results in the appropriate format accepted by Gephi (a few to several seconds). Thus, once the data are generated, we have a wide spectrum of tools and algorithms at hand to mine knowledge about the network loads.

6. Conclusions & Next steps

The studies, intended to analyse the behaviour of the fluid-flow approximation model for large networks, have indicated that it is possible to solve first-order differential equations with the use of a database engine. Our proposed and examined method concentrated on pure SQL code as the implementation language. We used the code to solve the model for a real Internet topology, and the characteristics of the obtained results, concerning the network congestion changes, were presented with the use of a visualization tool. Generally, the main advantages of our solution are its scalability, portability and universality. However, further studies on this subject are still necessary.

The next step of the research will focus on further optimization of the developed SQL algorithms, based on a set of time performance tests, both on column and row store, with the use of inserts, updates and information models in the case of aggregates. We would like to model other topologies, also based on CAIDA measurements. Additionally, we plan to compare the performance of our approach with competitive OLTP engines, graph database solutions and a multiprocessor native implementation.

Figure 5. The loads of nodes of the longest flow in the modelled network for time = 1 sec.
Figure 6. The loads of nodes of the longest flow in the modelled network for time = 26 sec.

References

[1] Gephi: The open graph viz platform. http://gephi.github.io/.
[2] Y. Liu, F. L. Presti, V. Misra, D. Towsley, and Y. Gu. Fluid models and solutions for large-scale IP networks. ACM/SigMetrics, 2003.
[3] B. Lyon. The Opte Project. http://www.opte.org/.
[4] V. Misra, W.-B. Gong, and D. Towsley. A fluid-based analysis of a network of AQM routers supporting TCP flows with an application to RED. In Proceedings of the Conference on Applications, Technologies, Architectures and Protocols for Computer Communication (SIGCOMM 2000), pages 151–160, 2000.

Using Process Mining to Identify Fraud in the Purchase-to-Pay Process - Progress Report -

Galina Baader, Veronika Besner
Technische Universität München, Chair for Information Systems
Boltzmannstr. 3, 85748 Garching, Germany
[email protected], [email protected]

Sonja Hecht, Michael Schermann
Technische Universität München, Chair for Information Systems
Boltzmannstr. 3, 85748 Garching, Germany
[email protected], [email protected]

Helmut Krcmar Technische Universität München Chair for Information Systems Boltzmannstr. 3, 85748 Garching, Germany [email protected]

Abstract

The aim of our project is to investigate the use of process mining to identify fraud in the purchase-to-pay business process. Therefore, we used the process mining tool Celonis1 to reconstruct the as-is process from the underlying log data of our ERP system. After loading the respective ERP tables into HANA, we had to transform the relevant data into Celonis-readable event log tables. We then displayed the as-is process, including 74 deviations from the standard purchase-to-pay process, and analyzed them regarding fraud. Our preliminary results identified indicators for two types of fraud: non-purchase fraud and the double issue of an invoice. In the next project period we plan to refine our analysis and use it to identify fraud in real time within a scientific experiment. For further information, please refer to our project extension application.

1 http://www.celonis.de/

1 Introduction

Corporate fraud is a massive problem in most companies, consuming an estimated 5% of the annual revenues of a typical organization [1]. One of the most common and important fraud schemes is asset misappropriation. Since a large number of such cases occur, manual auditing methods are mostly unable to cope with this number. Therefore, computer-assisted audit tools and techniques are needed. With these tools and techniques auditors can retrieve and analyze huge amounts of data from enterprise resource planning (ERP) and linked systems [2; 3].

Most auditing tools use data mining based fraud detection, but rarely utilize temporal information [2]. One reason might be that running such algorithms on a productive system might have a huge impact on the performance of the system [3]. Process mining can be used to utilize this temporal information. Furthermore, the use of the in-memory database SAP HANA can provide reasonable response times for the analysis of temporal data.

2 Project Goal

The goal of the project is to determine the suitability of process mining to detect fraudulent behavior in the purchase-to-pay business process. We therefore used Celonis process mining, as the tool is able to visualize business processes based on event logs from the underlying system (here SAP ERP). Deviations from the standard processes are visualized and can be investigated regarding fraudulent behavior.

The project should result in a proof of concept showing that fraudulent behavior can be efficiently found when using process mining. Furthermore, it should be shown that the use of SAP HANA can overcome the performance issues which have limited the use of process mining up to now.

3 Project Design

The basic design of the project and its underlying dataset are presented in the following section.

3.1 Underlying Dataset

The data used in the project was created in a previous project, the White Collar Hacking Contest (WCHC). In the WCHC, seven teams competed against each other by first hiding fraud in the purchase-to-pay business process in an ERP system and, after switching datasets, trying to detect the frauds of another team. Each team was advised by a professional fraud consultant or auditor, so the datasets include different fraud cases that happen in real companies.

3.2 Project Architecture

Figure 1 shows how the different tools used in the project are interconnected. At the client side, SAP HANA Studio is installed and allows interaction with and manipulation of the data stored in SAP HANA. We created the needed event log tables for Celonis directly in HANA and gave Celonis access to the corresponding database schema. Celonis was installed on a web server provided by the HPI.

Figure 1: Project Architecture

4 Project Procedure

In the following, the current status of the process mining project in cooperation with Celonis and the HPI will be described. The project consists of six main steps:

1. Load data into HANA
2. Create activity, case and process tables
3. Create data cubes and data models in Celonis
4. Create analysis
5. Detect process deviations, investigate analysis, filter and explore processes
6. Refine analysis to detect further fraud cases

In the following, these steps will be described in detail.

4.1 Load Data into HANA

The underlying dataset for the examination was taken from the White Collar Hacking Contest. We loaded the respective tables from the ERP system into HANA using SAP BusinessObjects Data Services Designer. Table 1 provides an overview of the most important tables that we used for the analysis.

Table Name   Table Description
EBAN         Purchase Requisition
EKKO         Purchasing Document Header
EKPO         Purchasing Document Item
EKBE         History per Purchasing Document
BSEG         Accounting Document Segment
BKPF         Accounting Document Header
LFA1         Vendor Master
CDHDR        Change Document Header
CDPOS        Change Document Items

Table 1: Important Purchase-to-Pay Tables [6]
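As a quick plausibility check after the import, the purchasing tables can be queried directly in HANA. The following sketch, which is not part of the report, counts purchasing document items per vendor; it assumes the standard SAP key columns MANDT and EBELN and the vendor field LIFNR in EKKO.

-- order items per vendor over the imported tables (illustrative check only)
SELECT k.LIFNR   AS VENDOR,
       COUNT(*)  AS ORDER_ITEMS
FROM   EKKO k
JOIN   EKPO p ON p.MANDT = k.MANDT
             AND p.EBELN = k.EBELN
GROUP  BY k.LIFNR
ORDER  BY ORDER_ITEMS DESC;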

4.2 Create Activity, Case and Process Tables

Once the data tables are imported into HANA, certain event log tables have to be created manually to display process models with Celonis. Therefore, we developed a script (using SQLScript, as it is supported by HANA) that creates the respective activity, case and process tables. Activities represent every single step of every order (like purchase requisition, purchase order, invoice receipt etc.), and cases represent the complete process path of every order position. The third table, the process table, is a utility table and maps activities to integer activity IDs for Celonis' performance improvements. The activity table is created based on the previously imported tables, and afterwards the case and process tables are derived from this activity table. In the following section, this table creation will be explained.

For the activity table, at first five different columns are needed. The creation of these columns can be seen in Figure 2. The first one is the "ActivityCaseID" column, which is a unique number for each case. A case represents one single position of an order and all the steps belonging to this order. The "Activity" column is a textual representation of the current action, for example "Purchase Requisition", "Purchase Order", or "Invoice Receipt". The "EventTime" column provides a timestamp consisting of the date and the time when the action was performed. "Sorting" provides a classification of the normal order in which the activities should be performed; "Purchase Requisition" should, for example, in a normal case happen before "Purchase Order", which again should happen before "Invoice Receipt". The "EventUser" is the SAP system user who created the respective activity, i.e. who, for example, booked the purchase requisition or order in the system.

CREATE TABLE CELONIS_P2P_ACTIVITIES(
    ActivityCaseID VARCHAR(18)
   ,Activity       VARCHAR(40)
   ,EventTime      TIMESTAMP
   ,Sorting        INTEGER
   ,EventUser      VARCHAR(12));

Figure 2: Creation of the Activity Table

Three more columns are generated by a Celonis procedure. One is the "Lifecycle" column, which is derived from the "Sorting" and "Timestamp" columns and represents the position in a case's lifecycle. Another one is the "Case_Num_ID", which gives each case (defined by "ActivityCaseID") an integer number that is easier to process. Finally, a "PrimaryKey" is added for each position of the activity table.

To fill the activity table with values, for every activity different SAP tables have to be joined to meet certain criteria. To create the purchase requisition activities, for example, the tables EBAN and EKPO have to be joined, as can be seen in Figure 3.

INSERT INTO CELONIS_P2P_ACTIVITIES(
  SELECT
    (EKPO.MANDT||EKPO.EBELN||EKPO.EBELP) AS ActivityCaseID,
    'Purchase Requisition'               AS Activity,
    EBAN.BADAT                           AS EventTime,
    10                                   AS Sorting,
    EBAN.ERNAM                           AS EventUser
  FROM EBAN
  JOIN EKPO ON EBAN.MANDT = EKPO.MANDT
           AND EBAN.EBELN = EKPO.EBELN
           AND EBAN.EBELP = EKPO.EBELP);

Figure 3: Join of Tables EBAN and EKPO

Further, we join EKPO and EKKO to get the purchase orders, EKPO and EKBE to get the invoice receipts, or EKPO with CDPOS and CDHDR to get deleted positions or changed items. A deep understanding of the relevant tables is necessary. For example, the deletion of a position can be seen in table CDPOS (Change Document Items): the column "table name" should equal 'EKPO' to refer to the purchasing document, and the item must be deleted (so the fieldname equals 'LOEKZ' and the new value equals 'L').

Once we had included all needed activities in the activity table, we created the case and the process tables. The creation of the process table is quite straightforward, as it only maps an "Activity_ID" to each activity in the activity table. In addition to these four columns, one can choose to add further columns. It could, for example, be interesting to know the vendor for each case, as well as the ordered material and its price and amount.

4.3 Create Data Cubes and Data Models in Celonis

Once we have the activity, case and process tables, we can create data cubes and models in Celonis. To do so, we first give Celonis access to the respective database schema in HANA, where the tables are located, and choose this schema as our data store. Then we define the tables Celonis should use to create our data model, in our case the activity, case and process tables created in step 2. As a next step we can configure this data model and define which tables and columns Celonis should use for each attribute. Finally, we have to create at least one foreign key to connect the activity and case tables.

4.4 Create Analysis

As soon as the data cube and data model are set up and configured properly, we can create our analysis. To do so, we have to create a new document based on our data model and choose a suitable format, and Celonis automatically creates a process model based on our dataset. In Figure 4 an exemplary process model can be seen. It represents the first part of the standard purchase-to-pay process, going from purchase requisition to purchase order, to goods receipt and to invoice receipt, without consideration of specific deviations like deleted positions or a changed order of activities. As one can see, most cases in the example do not have a purchase requisition but start directly with the purchase order, and not all cases go through all of the activities.

Figure 4: Exemplary Process Model
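The report reproduces only the statements for the activity table; the case and process tables described in Section 4.2 could be derived from it along the following lines. This is a minimal sketch under assumed table names (CELONIS_P2P_CASES, CELONIS_P2P_PROCESS), not the script actually used in the project.

-- one row per case, i.e. per order position occurring in the event log
INSERT INTO CELONIS_P2P_CASES (ActivityCaseID)
SELECT DISTINCT ActivityCaseID
FROM   CELONIS_P2P_ACTIVITIES;

-- the process table maps every distinct activity name to an integer ID
INSERT INTO CELONIS_P2P_PROCESS (Activity_ID, Activity)
SELECT ROW_NUMBER() OVER (ORDER BY Activity) AS Activity_ID,
       Activity
FROM   (SELECT DISTINCT Activity FROM CELONIS_P2P_ACTIVITIES) AS a;

Further case attributes, such as the vendor, material, price and amount mentioned above, could be joined in from EKKO, EKPO and LFA1 in the same way.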

4.5 Investigate Analysis, Filter and Explore Processes

Once the process model is created, there are many possibilities to explore it and to display process deviations. The model depicted in Figure 4, e.g., is a strong simplification of all the processes that are derived from the used event log. Depending on the filter one chooses, the tool displays a specific number of variants, which equal our different process paths. With the currently used data set we have 74 variants of the process, each showing a different path that is used by one or more processes. Many processes, for example, use a path that is similar to the one shown in Figure 4, but where the invoice receipt was created before the goods receipt. Since the order of these two activities can be swapped without consequences, this is no indicator for fraudulent behaviour. A different path shows, e.g., all the deleted orders, and another one all the orders where the price of the good was changed after the order was created. These deviations may, for example, be more interesting to check for occurring fraud cases. If needed, different filters can also be used, e.g., to hide a certain activity or to show only processes that start or end at a certain activity. Besides these possibilities to visualize the different process paths, Celonis provides a lot more features to investigate the analysis. It contains, among others, a case explorer, where the path of a specific case can be analysed, the possibility to define different charts and tables, and diverse useful statistics. Generally, the visualisation possibilities are extensive.

4.6 Refine Analysis to Detect Further Fraud Cases

We analysed all process deviations in our dataset and found indicators for fraud. For example, we identified hints that show a double issue of an invoice or the non-purchase fraud. In the double issue of an invoice, two or more invoices are received for one product, resulting in a double payment to the vendor. We have seen 724 fraud cases where two invoices have been received for one product, as can be seen in Figure 5. Analysing this data in detail reveals that in some cases the sum of the multiple invoices exceeds the price of the ordered product, which is an indicator for fraud.

Figure 5: Double Received Invoices for one Product

As a further example, we were able to identify 41 cases where we did not receive a good but an invoice was issued, as displayed in Figure 6. In the so-called non-purchase fraud, a certain good is not delivered but the invoice is paid. This is a further strong indicator for fraud.
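Both patterns can also be pre-filtered directly on the event log before inspecting the process graph. The following sketch assumes the activity table from Section 4.2; the activity label 'Goods Receipt' is an assumption, since the report names that step only in prose.

-- cases with more than one invoice receipt (candidate double invoices)
SELECT ActivityCaseID, COUNT(*) AS INVOICE_RECEIPTS
FROM   CELONIS_P2P_ACTIVITIES
WHERE  Activity = 'Invoice Receipt'
GROUP  BY ActivityCaseID
HAVING COUNT(*) > 1;

-- cases with an invoice receipt but no goods receipt (candidate non-purchase fraud)
SELECT DISTINCT a.ActivityCaseID
FROM   CELONIS_P2P_ACTIVITIES a
WHERE  a.Activity = 'Invoice Receipt'
  AND  NOT EXISTS (SELECT 1 FROM CELONIS_P2P_ACTIVITIES g
                   WHERE  g.ActivityCaseID = a.ActivityCaseID
                     AND  g.Activity = 'Goods Receipt');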


Figure 6: Invoice receipt without goods receipt

However, there are fraud cases included that cannot be spotted through process deviations. One example is the kickback scheme: a fraudster and an accomplice vendor agree on an overpayment and share the overpaid amount. This fraud can be analysed by comparing different process instances and looking for deviations. Celonis further offers the possibility to create key figures using SQL. A key figure "over-average payment" can also be added to spot the respective fraud. We plan to further improve our fraud detection strategy in the next project phase of the HPI.

5 Conclusion and Outlook

Up to now, we identified all necessary tables of the purchase-to-pay business process and loaded them into HANA. We then created the necessary Celonis tables to be able to display the business process with all deviations. Our preliminary fraud analysis showed fraud cases like the double issue of an invoice and the non-purchase fraud. We further analysed cases that cannot be identified through process deviations, but rather as data irregularities in certain columns or by comparing different process instances.

In the next project phase, we plan to improve our fraud detection strategy by including further KPIs and to examine fraud in real time. Instead of loading data into HANA, our aim is to use an ERP system running on HANA. Our identified fraud detection strategy should display fraud while it happens. We will test our strategy in a scientific experiment, where fraud will be perpetrated and examined in real time. For more information please refer to our project extension application.

Sources

[1] ACFE, "Report to the Nations on Occupational Fraud and Abuse," Association of Certified Fraud Examiners, Austin, Texas, USA, 2012.
[2] A. Bönner, M. Riedl and S. Wenig, Digitale SAP®-Massendatenanalyse: Risiken erkennen - Prozesse optimieren, Berlin (Germany): Erich Schmidt, 2011.
[3] D. Coderre, Computer Aided Fraud Prevention and Detection: A Step by Step Guide, John Wiley & Sons, 2009.
[4] C. Phua, V. Lee, K. Smith and R. Gayler, A Comprehensive Survey of Data Mining-based Fraud Detection Research, 2010.
[5] Y. Yannikos, F. Franke, C. Winter and M. Schneider, "3LSPG: Forensic tool evaluation by three layer stochastic process-based generation of data," in Computational Forensics, Vol. 6540, Berlin/Heidelberg: Springer Verlag, 2011, pp. 200-211.
[6] "SAP Datasheet," [Online]. Available: http://www.sapdatasheet.org. [Accessed 18 03 2015].


Comparison of Image Classification Models on Varying Dataset Sizes

Timur Pratama Wiradarma, Christian Hentschel, Harald Sack
Hasso-Plattner-Institut
Prof.-Dr.-Helmert-Str. 2-3, 14482 Potsdam
[email protected], [email protected], [email protected]

Abstract of-the-art in image classification for couple of years, until it was outperformed by Convolutional Neural This paper aims to compare two competing ap- Networks (CNN) [1]. Their success should mainly be proaches for image classification, namely Bag-of- attributed to the fact that differing from BoVW ap- Visual-Words (BoVW) and Convolutional Neural Net- proaches, which use hand-crafted visual feature de- work (CNN). Recent works have shown that CNNs scriptors, CNNs learn visual feature descriptors as part have surpassed hand-crafted feature extraction tech- of the training process. Thus, the derived features can niques in image classification problems. The success represent the data much better which usually leads to of CNNs can be mainly attributed to two factors: re- more accurate models. While the general idea of using cent advances in GPU supported computation and the CNNs in image classification is not new [8] only re- availability of large training datasets for selected ap- cently the availability of large scale computing power plications scenarios. In classification scenarios where as well as large training datasets – assembled using only limited training data is available BoVW-based crowd sourcing – made CNNs successful. Generally, approaches may still perform better than CNNs. This a CNN model is trained on a GPU [1, 4, 11], which assumption makes BoVW a viable option for a smaller offers a high degree of parallelization and enables the training datasets. In this paper, we verify this hypoth- model to learn from thousands of hand-labeled train- esis by training both approaches on a varying number ing images. While one could argue that the neces- of training images. We conduct our experiments using sity for powerful computing hardware will less likely the publicly available 2012 ImageNet Large Scale Vi- be a constraining factor in future due to technological sual Recognition Challenge dataset and limit the train- progress, providing large amounts of manually anno- ing data to smaller randomly selected fractions. Since tated training data cannot be easily substituted. For training a CNN is only reasonably possible using GPU many classification scenarios it is simply impossible hardware support, we largely benefit from the Future to provide a sufficiently large amount of labeled im- SOC infrastructure. ages as it is a tedious and time consuming task. There- fore, in scenarios with limited amount of training data available, hand-crafted feature extractors may provide better results since CNN will likely fail to extract good 1 Introduction features.

Content-based image classification is an important As the main contribution of this paper, we intend to means to address the challenge of search and re- help researchers and practitioners to decide, which ap- trieval in large image datasets such as community and proach best suits certain use cases. Therefore, we stock photo galleries. Typically, manual tagging is compared the two aforementioned approaches (BoVW impossible due to large amount of visual informa- and CNN) with respect to the classification accuracy tion. Research efforts have therefore been focusing on obtained for varying amount of training data. We (semi-)automatic approaches for generating content- evaluated different training set sizes in order to iden- descriptive tags for visual information. Bag-of-Visual- tify the threshold where CNNs outperform BoVW ap- Words (BoVW) [12] has been representing the state- proaches.

63 This paper is structured as follows: In the following 3 Classification Models section we briefly present relevant related work. In Section 3 we describe both approaches and present our In this section, we will describe briefly the Bag- experimental setting. Section 4 discusses the obtained of-Visual-Words and the Convolutional Neural Net- results and Section 5 provides an outlook on future work implementations used throughout our experi- work. ments. We also present the hardware infrastructure we build upon to train the classification models. Fi- 2 Related Work nally, we describe the train and test datasets we used for training and evaluating the respective classification Since the 2012 ImageNet Large Scale Visual Recog- models. nition Challenge (ILSVRC) 1 where Krizhevsky et al. have presented a convolutional neural network-based 3.1 Bag of Visual Words approach which significantly outperformed all other solutions [1], CNNs have gained a lot of attention Bag of Visual Words (BoVW) is an adaptation of Bag in image classification. Their success was partly at- of Words model that has been successfully applied tributed to the fact that the ILSVRC authors were able for text classification and retrieval and hence was ex- to provide massive amounts of training data (1.2 mil- tended to visual features for image classification. As a lion images were manually assigned to more than 1000 visual analogue for a word the BoVW model employs categories) assembled in a huge crowdsourcing effort vector quantized visual features extracted at local im- [10]. This gives rise to the question of how to han- age regions [12]. Typically Scale-Invariant Feature dle classification tasks in which these huge amounts Transform (SIFT) [9] descriptors are computed at a of training data are not available and where it would dense grid of sampling points. These local features are be too costly and time-consuming to provide these aggregated into a global image descriptor that encodes datasets. Furthermore, the authors in [1] state that the frequency distribution of individual visual words training the proposed CNN model took between 5 to within an image. In our experiment, we use Vector of 6 days on two GTX 580 3GB GPUs, which even today Locally Aggregated Descriptors (VLAD) [2, 6] encod- represents hardware resources not available to every ings as provided by the VLFeat implementation [13] researcher. Many research efforts have therefore been and train linear SVM (Support Vector Machine) mod- focusing on alleviating both – the necessity for large els for each category (one-vs.-rest). This implemen- scale training sets as well as highly optimized hard- tation was extended in order to make use of the SMP 2 ware resources. architecture provided by the HPI Future SOC Lab . The most promising results in this direction were given by approaches that reuse the penultimate layer of a 3.2 Convolutional Neural Networks deep CNN trained on a large dataset such as ImageNet can serve as a powerful image descriptor even when A Convolutional Neural Network is the variation of applied for different datasets. The authors in [4] show a multilayer neural network, which learns local filter that these results could even be improved by fine tun- kernels that are each convolved with the image to pro- ing the pre-trained network on the novel dataset be- duce feature maps. Each map is subsampled (usually fore using it for feature extraction. A work by Zeiler with mean or max pooling). These local filters exploit et al. 
[14] shows that a CNN model pre-trained on Im- the strong local spatial correlations present in an im- ageNet and fine-tuned to the Caltech-101 dataset [5] age. Connected to an arbitrary number of convolu- outperforms Bag-of-Visual-Words implementation of tional layers may be any number of fully connected Bo et al. [3]. However, as the ImageNet and Caltech- layers. These are identical to the layers in a standard 101 datasets tend to be highly overlapping in terms of multilayer neural network (for a more detailed descrip- object/scene categories (essentially many of the im- tion of the architecture, see [11, 1]). age classes in Caltech-101 are likewise found in Im- In our experiments we used the CNN architecture that ageNet) and both collect photos of real-world objects has been successfully applied in the 2012 ILSVRC[1]. the two datasets necessarily show very similar visual A replication is provided by the Caffe framework characteristics. [7] (with small modifications to the order of pool- In this project we evaluate the impact of varying ing and normalization steps – pooling is applied be- training set sizes on the classification performance of fore normalization). The CNN architecture consists of BoVW and CNN approaches. Our hypothesis is that 5 convolutional-layers, 3 fully-connected layers, with BoVW performs better than CNNs for smaller training the last layer representing the output classes for the sets since a CNN model needs first to learn the respec- 1.000 ImageNet categories. tive feature maps whereas BoVW uses pre-engineered Following [1], we resize the input images to 256 × 256 features. pixels. We also subtract the mean – obtained by aver-

1ILSVRC 2012 – http://www.image-net.org/ 2HPI Future SOC Lab – http://hpi.de/en/research/ challenges/LSVRC/2012/ future-soc-lab.html

Ratio    no. images
5 %      622
10 %     1,242
20 %     2,485
40 %     4,969
60 %     7,455
80 %     5,082
100 %    6,352

Table 1. Training sets generated by randomly sub-selecting images from the 2012 ILSVRC training set according to the denoted ratio.

Easiest                  mean error
geyser                   0.001
odometer                 0.011
canoe                    0.013
yellow lady's slipper    0.015
website                  0.015

Most difficult           mean error
Ladle                    0.877
Hatchet                  0.857
Spatula                  0.833
Muzzle                   0.832
Hook and Claw            0.805

Table 2. Easiest and hardest categories to classify based on evaluation of the mean error of the top 5 predictions from all submissions to the 2012 ILSVRC.

aging the pixel values from all training images – from each input image. Unlike proposed by the authors in [1], we don't perform any further data augmentation, since it was reported to contribute only slightly to the results.3

from the respective classes from the ILSVRC 2012 validation set (in total 500 images – 50 per class – were selected).

The HPI Future SOC Lab provides two Nvidia Tesla K20X GPUs that allowed us to train different CNNs multiple times.

3.3 Dataset 4 Results

We run our experiments using the 2012 ILSVRC train- Figure 1 shows the average precision (AP) scores ob- ing data, since it provides one of the largest manu- tained for different training set sizes using both ap- ally annotated data collection available. We split the proaches. When considering the 5 “easy” classes dataset into subsets with increasing number of images (’geyser’, ’odometer’, ’canoe’, ’yellow lady’s slipper’, by taking random percentages of the entire ILSVRC ’website’) BoVW and CNN reach similar accuracy us- 2012 training set. In total we generated 7 datasets (cf. ing 60% of the entire training dataset. However, when Table 1). using datasets much smaller in size BoVW clearly out- In order to avoid having to train the entire dataset mul- performs the CNN-based approach achieving close to tiple times, we selected 10 categories to train models perfect (AP ≈ 1.0) results for datasets comprising for. These categories have been chosen by taking the only 5% of the original training data. 5 easiest and 5 most difficult classes to train consider- With respect to the most difficult categories when con- ing the mean error from the top 5 predictions from all sidering the 2012 ILSVRC results, BoVW also per- submissions to the 2012 ILSVRC4 (cf. Table 2). forms better than the CNN approach for most cate- We trained models for each category for every training gories. Surprisingly, however, both approaches show a data subset using both approaches – BoVW+VLAD significant drop in accuracy when increasing the train- as well as CNN. The duration of the training process ing data to more than 80% of the original dataset. At varies for each approach as well as for each subset. this point of our experiments, we can only explain this While it took about 10 minutes to train the CNN model by certain image characteristics not fully reflected in using the smallest dataset (5%) on the Tesla K20X the testing data. GPU and about 15 minutes to train VLAD models on As discussed in Section 3 the CNN approach learns a 48-core machine, training the entire (100%) dataset filter kernels using the training data while the BoVW took about 160 minutes for the CNN model as well as approach makes assumptions about the characteristics 190 minutes for the VLAD models. of the training data reflected in the extracted local fea- In order to validate the classification accuracy of tures (i.e. SIFT). By adding more unlabeled images to the individual models and compare the different ap- the training set, the CNN approach will most likely be proaches we built a validation dataset using all images able to learn better features and thus will show an in- creased classification performance. In order to validate 3We use the following training parameters: momentum 0.9, this assumption, we added images from an additional weight decay 5.10−4, initial learning rate 102 (which is decreased 90 classes from the ImageNet dataset and trained a by a factor of 10, for every 20 epochs) and we train a total 90 epochs CNN model for a total of 100 classes using the same for each dataset. 4See http://image-net.org/challenges/LSVRC/ training subset ratios as before. Figure 2 shows the 2012/ilsvrc2012.pdf for more information. mean average precision scores (obtained from predict-

Figure 1. Average Precision Scores obtained for different training set sizes comparing BoVW and CNN approach

ImageNet dataset) and use the obtained models for feature extraction (i.e. by taking the penultimate fully connected layer as a feature descriptor and training a Support Vector Machine using these features). In a next step we therefore intend to compare the results presented here to those achieved using a pre-trained model. We will further investigate how pre-trained models perform when used as feature extractors on a dataset that differs visually from the one used to train Figure 2. Mean average precision scores the CNN (e.g. using collections of art images or video obtained from training using more data. keyframes).

References ing only the initial 10 classes on the same testset as before). As can be seen the CNN model indeed per- [1] G. E. Alex Krizhevsky, Ilya Sutskever. Imagenet forms better, when increasing the number of training classification with deep convolutional neural network. images. NIPS, 2012. [2] R. Arandjelovic´ and A. Zisserman. All about VLAD. In IEEE Conference on Computer Vision and Pattern 5 Future Work Recognition, 2013. [3] L. Bo, X. Ren, and D. Fox. Multipath sparse cod- In this work we presented a comparison between ing using hierarchical matching pursuit. In Proceed- two competing approaches for image classification, ings of the 2013 IEEE Conference on Computer Vision namely Bag of Visual Words and Convolutional Neu- and Pattern Recognition, CVPR ’13, pages 660–667, ral Networks. We analyzed both approaches for Washington, DC, USA, 2013. IEEE Computer Soci- ety. the influence of varying training set sizes and have [4] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisser- shown that BoVW outperforms CNN when only small man. Return of the devil in the details: Delving deep amounts of training data is available. Recent research into convolutional nets. CoRR, abs/1405.3531, 2014. efforts have focused on reusing CNN models that have [5] L. Fei-Fei, R. Fergus, and P. Perona. Learning gen- been trained on a large image corpus (i.e. the entire erative visual models from few training examples: an

66 incremental Bayesian approach tested on 101 object [10] O. Russakovsky, J. Deng, H. Su, J. Krause, categories. In CVPR 2004 Workshop on Generative- S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, Model Based Vision, 2004. M. Bernstein, A. C. Berg, and L. Fei-Fei. Ima- [6] H. Jegou,´ M. Douze, C. Schmid, and P. Perez.´ Ag- genet large scale visual recognition challenge. arXiv gregating local descriptors into a compact image rep- preprint arXiv:1409.0575, 2014. resentation. In IEEE Conference on Computer Vision [11] K. Simonyan and A. Zisserman. Very deep convo- & Pattern Recognition, pages 3304–3311, jun 2010. lutional networks for large-scale image recognition. [7] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, CoRR, abs/1409.1556, 2014. R. Girshick, S. Guadarrama, and T. Darrell. Caffe: [12] C.-F. Tsai. Visualizing and understanding convolu- Convolutional architecture for fast feature embedding. tional networks. ISRN Artificial Intelligence, 2012. arXiv preprint arXiv:1408.5093, 2014. [13] A. Vedaldi and B. Fulkerson. Vlfeat: An open [8] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. and portable library of computer vision algorithms. Gradient-based learning applied to document recogni- http://www.vlfeat.org/, 2008. tion. In Proceedings of the IEEE, pages 2278–2324, [14] M. D. Zeiler and R. Fergus. Visualizing and 1998. understanding convolutional networks. CoRR, [9] D. G. Lowe. Distinctive image features from scale- abs/1311.2901, 2013. invariant keypoints. In International Journal of Com- puter Vision, 2004.


Sentiment Analysis for the needs of benchmarking the Energy Sector – Project Report –

Witold Abramowicz Wioletta Sokolowska Tymoteusz Hossa Jakub Opalka Karol Fabisz Mateusz Kubaczyk Milena Cmil Department of Information Systems Faculty of Informatics and Electronic Economy Poznan University of Economics Al. Niepodleglosci 10, 61-875 Poznan, Poland firstname.lastname @kie.ue.poznan.pl

Abstract offers provided on different Internet portals (profes- sional fora or popular social media websites). This report gives an insight into activities performed Taking into account all the aforementioned issues, the in the field of sentiment analysis using analytical and energy utilities will need to respond to the changing computational power of SAP HANA, so to expand the market needs and learn how to efficiently tackle with possibilities of the previously created working proto- not only structured data (i.e. historical observations type of an in-memory Business Intelligence solution of energy generation and consumption) but especially for the support of decision making in the field of en- with vast amount of data coming from different types ergy sector. The report provides information on the of unstructured sources (like tweets, comments, spa- project main objectives, used HPI Future SOC Lab re- tial data etc.). Only, by handling both they will be able sources, findings as well as next steps envisioned. to gain an invaluable information helping to i.e. pre- dict the customers fall-out rate anddynamically adjust the offer to market expectations. The adoption of data 1. Introduction coming from social media, satellites or sensors into an- alytics creates new reguirements towards defining and creating an adequate analytical tool. The energy system operators are dealing with an in- flux of diverse information from multiple sources that The project is a continuation of four projects: Quasi have a great influence on their decisions. On daily ba- Real-Time Individual Customer Based Forecasting of sis they need to analyze vast amount of data on en- Energy Load Demand Using In Memory Computing; ergy supply and demand at different time span, energy Forecasting of Energy Load Demand and Energy Pro- prices, technical data and much more. The currently duction from Renewable Sources using In-Memory observed deployment of Smart Grid infrastructures Computing and Prototype of an In-Memory Business implies the emergence of previously unknown prob- Intelligence Solution for the Support of Forecasting of lems associated with processing and analyzing large Energy Load Demand and Smart Data Analysis for and diverse data sets in a real time. Therefore, the en- the Support of Rational Decision Making in the En- ergy utilities (and their business analysts) are becom- ergy Sector ran previously under HPI Future SOC Lab. ing to tackle with the Big Data1challenge2. Moreover, These projects focused on the analysis of structured the shift from traditional to more customer-oriented data (coming from the energy consumption and en- energy market can be noticed. Customers are becom- ergy generation from renewable sources) and to some ing crucial market players as they now have an ac- extent the unstructured data for the needs of business cess to tools and information enabling them for re- analysts working in the energy sector. Within the two ducing the energy consuption and taking control of last, taking advantage of the previously achieved out- their own energy supply needs [4]. Finally, as they are comes, the dashboard like solution was developed and more socially interconnected, they are willing to com- enriched to equip business analysts with up-to-date ment, discuss and rate the energy suppliers and their forecasts and sentiment analysis results. 
However, in terms of automatically extracting and analysing unstructured data sources using SAP HANA, only very few issues were explored in detail. The customer opinions on energy utilities were extracted from one forum directly into SAP

HANA, and basic sentiment analysis was conducted using the default SAP HANA Text Analysis tool. In the current project the emphasis was put on extending the scope of the sentiment analysis for the needs of rational decision-making: we focused on analysing more than 10 000 comments and on developing a new method that uses a domain-specific dictionary, thus expanding the possibilities of the previously created working prototype of our Business Intelligence solution.

The document presents the attempts undertaken within this project and is organized as follows. First, the project aims are shortly presented. Then the Future SOC Lab resources used are described and a few technical details are given. Next, the obtained results are briefly summarized. The document concludes with final remarks.

1 The BIG DATA is described with the following terms: Volume, Velocity, Variety and Veracity, also known as the 4V model [1], [2], [3], [5].
2 The Big Data issue within the energy sector will not be discussed in detail in this report.
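For orientation, sentiment extraction with the SAP HANA text analysis engine is typically enabled by creating a full-text index with the Voice-of-Customer configuration; the extracted tokens then appear in a generated $TA_ table. The statement below is only a generic sketch with assumed table, column and index names, not the configuration used in the project, which, as described in Section 2, additionally registers a custom .hdbtextdict dictionary.

-- enable Voice-of-Customer text analysis on the table holding the scraped opinions
CREATE FULLTEXT INDEX IDX_OPINIONS_TA ON OPINIONS (OPINION_TEXT)
  CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER'
  TEXT ANALYSIS ON;

-- extracted sentiment tokens can then be read from the generated $TA_ table
SELECT TA_TOKEN, TA_TYPE
FROM   "$TA_IDX_OPINIONS_TA"
WHERE  TA_TYPE LIKE '%Sentiment%';   -- exact type names depend on the configuration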

2. Project idea Figure 1. Opinion of energy provider’s client - example. Source As already mentioned, the project reported in this doc- http://www.moneysupermarket.com, ument is a part of a cycle of undertakings aiming at http://www.reviewcentre.com/ building an analytical solution using SAP HANA to Electricity-Suppliers support business analysts in the energy sector. The main research activities focused on continuing our work on the analysis of the unstructured data com- HANA. The average length of each opinion was ing from selected Internet sources in order to iden- greater than 160 characters. Figure 1 presents an tify relevant information (e.g. a sentiment on the en- example of opinion available on the aforementioned ergy provider and relevant aspects of provider’s offer). websites. these comments were then stored together To realize this goal, we decided to extend our data in one coherent corpus (please see 4.1). The whole set at least ten times in comparison to the previous corpus analysis was conducted in AntConc software 4 project, where we focused on analysing the sentiment - a corpus analysis toolkit for concordancing and text of around 900 opinions. In addition, data had to be ex- analysis. The newly developed method enables for tracted from several different sources. In parallel, we loading data into SAP HANA, conducting text anal- aimed at veryfing the already developed method using ysis with the use of FullText Index on VOICEOF- the newly acquired data and improving the accuracy of CUSTOMER configuration. Then a sentiment of each the so far proposed solution. Moreover, the idea was comment is calculated (to learn more about the results not only to improve the method but also to create a please go to 4.2). The method was implemented in domain specific dictonary that would be added to the SAP HANA using SQLScript. In this way the overall default ones used within the SAP HANA Text Analy- computing time shortened to 12 sec. sis. Moreover, the customized SAP HANA Dictionary was created. The final dictionary is an XML file (coher- 3. Future SOC Lab resources used ent with a SAP HANA custom dictionary syntax) The XML file was implemented in SAP HANA as a hdb- During the project, we accessed a standard physical textdict file and added to the repository. Next step machine with SAP HANA instance (12) with 24 cores included creating hdbtextconfig file where a new dic- and 64 GB RAM. Moreover, the work on the strictly tionary was added to the list of default dictionaries. As unstructured data required to use SAP HANA Text a result, it was possible to run sentiment analysis us- Analysis. ing the new set of dictionaries (to learn more about the We used Python programming language to create a results please go to 4.1) . code that would scrap all reviews from identified 3 sources . Our Python code visited each of sub- 4. Findings webpages and gathered only the opinions on energy providers. 10 012 opinions were automatically gath- ered, finally 9583 of them were uploaded to SAP The results of our project consist of two main ele- ments. One of our project’s goals was to create domain 3In the previous project, Python was used for performing the ETL process and running the method that would assess the senti- 4AntConc software website: http://www. ment for a particular energy provider. laurenceanthony.net/software.html

70 specific dictionary in SAP HANA consisting of terms 4.2. Sentiment analysis method connected to the energy industry’s issues. The second goal was to implement a method which will be able to To perform evaluation of the implemented sentiment judge the sentiment (positive, neutral or negative) of a analysis’ method we used the so-called gold standard. single customer’s opinion. The hyphotesis was that the The precision, recall and F-measure parameters were method created under this project will perform better calculated with the reference to the human annotated than the method prepared within the last edition (with opinions. reference to the Gold Standard). First of all, we tested our new method on the small data set (656 comments) from our previously run Fu- 4.1. Custom dictionary ture SOC Lab project. Moreover, we compared the obtained results with the latest ones. Table 2 presents To achieve the first goal we conducted a corpus analy- the gold standard values for the method of sentiment sis in AntConc software. The corpus consist of 15798 analysis we implemented in the previous project. word types and 946047 word tokens and includes 9583 individual comments on 17 British energy suppliers. Precision Recall F-measure There was an attempt to find all possible variants of Positive 85% 71% 77% energy suppliers’ names. Although the reviews re- fer only to the 17 utilities, in the dictionary there are Neutral 51% 59% 55% also different companies mentioned that occurred in the corpus, mainly the less significant energy providers Negative 57% 73% 64% (for instance previous suppliers or utilities in the com- parative phrases). Table 1 presents an example of a Table 2. Gold standard values of preci- companys name and its variants. The standard form sion, recall, F-measure parameters for British Gas occurred in the corpus 905 times. How- positive, neutral and negative comments ever, a lot of people prefer using different, usually (the previous SA method on the small shorter variants like abbreviations (for example Brit data set). Gas) or acronyms (BG).

British Gas Name’s variants A number of instances in the corpus In the Table 3, we can see that new method is per- BG 339 forming almost as good as the previous one. The ex- periment showed that the identification of the positive British Gas 905 sentiment among comments is even better. It must be Brit Gas 4 emphasized that the processed data set contained rela- tively balanced customer reviews. BritGas 1 Precision Recall F-measure Scottish Gas 26 Positive 79% 81% 80% Table 1. British Gas - standard form and variant names. Neutral 8% 44% 13% Negative 85% 45% 59%

In the corpus analysis the emphasis was also put Table 3. Gold standard values of preci- on searching the most significant words and phrases. sion, recall, F-measure parameters for The analysis of the bigrams, trigrams and four-grams positive, neutral and negative comments which occurred mostly was conducted. In this way it (the new SA method on the small data was possible to identify issues that are more important set). than others. Subsequently, a list of words character- istic for energy invoices5 was defined on the basis of the analysis of three examples of particular energy in- voices and the Reuters Financial Glossary. The second step was to apply the new SA method on The final dictionary is an XML file (coherent with SAP a big data set (9583 comments). Table 4 presents the HANA custom dictionary syntax). It comprises of 14 results of the experiment. The identification of the pos- categories and 110 words and phrases connected with itive and negative sentiment among comments is very the energy suppliers and all electricity relevant issues. good, as it is indicated by high values of the precision and recall measures. However, it is fair to say that our 5http://www.momentumenergy.com.au/ system/files/images/Sample_Invoice/Invoice% new method is underperforming in field of identify- 20Explainer.pdf ing neutral sentiment in comments. It should be also

71 pointed that the data set processed this time contained results are satisfying. many customer reviews which are clear to judge. • The default set of dictionaries in SAP HANA Precision Recall F-measure Text Analysis can be improved by enriching it with a domain specific dictionary. However, its Positive 91% 75% 82% development must be preceded by a thorough words’ corpus analysis. Neutral 26% 15% 19% • Working with code and queries is too technical Negative 74% 95% 83% for any practical use, therefore a lot of work has to be done before the data may be visualized in Table 4. Gold standard values of preci- an easy to read and interpret manner; sion, recall, F-measure parameters for positive, neutral and negative comments To conclude, the our research proved that business an- (the new method on the big data set). alyst may take an advantage from using our applica- tion. The later employs the analytical and computa- tional power of SAP HANA, PAL, R, SAP HANA Text Analysis and Tableau Software, especially when the At the end, we also tested our method performance af- use of more advanced methods than the default ones is ter applying the custom dictionary to the set od dictio- required. The main lesson that comes from activities naries used within the Voice of Customer analysis in perforrmed within this project is that the default capa- SAP HANA. The obtained results are presented in Ta- bilities of SAP HANA Text Analysis should be sup- ble 5. Unfortunately, the experiment showed that pre- ported by methods improvement and expanded with cisely extracted words and phrases concerning elec- domain specific dictionaries in order to achieve better tricity and gas topics do not give more relevant and results. satisfying results. Surprisingly, all gold standard val- ues were worse than in the case wof the default set of dictionaries available in SAP HANA Text Analysis References tool. [1] K. Fabisz, A. Filipowska, and T. H. R. Hofman. Profil- Precision Recall F-measure ing of prosumers for the needs of energy demand esti- mation in microgrids. In Proccedings of the 5th Inter- Positive 85% 68% 75% national Renewable Energy Congress, 2014. [2] A. Filipowska, K. Fabisz, T. Hossa, M. Mucha, and Neutral 4% 2% 3% R. Hofman. Towards forecasting demand and produc- tion of electric energy in smart grids. In Perspectives in Negative 64% 85% 73% Business Informatics Research, 12th International Con- ference BIR2013, 2013. Table 5. Gold standard values of preci- [3] T. Hossa, A. Filipowska, and K. Fabisz. The compari- sion, recall, F-measure parameters for son of medium-term energy demand forecasting meth- positive, neutral and negative comments ods for the needs of microgrid management. In Pro- ceedings of SmartGridComm, IEEE International Con- (the new method on big data set, using ference on Smart Grid Communications, 2014. domain specific dictionary). [4] PwC. Utility of the future. a customer led shift in the electricity sector, April 2014. [5] W. Sokolowska, J. Opalka, T. Hossa, and W. Abramow- icz. The quality of weather information for forecasting of intermittent renewable generation. In J. Marx Gmez, 5. Conclusion M. Sonnenschein, U. Vogel, A. Winter, B. Rapp, and N. Giesen, editors, INFORMATION AND COMMUNI- The conducted research using the Future SOC Lab re- CATION TECHNOLOGY FOR ENERGY EFFICIENY, Proceedings of the 28th International Conference on sources allowed us to deepen our knowledge about Informatics for Environmental Protection (EnviroInfo SAP HANA, especially its text analysis capabilities. 
References

[1] K. Fabisz, A. Filipowska, and T. H. R. Hofman. Profiling of prosumers for the needs of energy demand estimation in microgrids. In Proceedings of the 5th International Renewable Energy Congress, 2014.
[2] A. Filipowska, K. Fabisz, T. Hossa, M. Mucha, and R. Hofman. Towards forecasting demand and production of electric energy in smart grids. In Perspectives in Business Informatics Research, 12th International Conference BIR 2013, 2013.
[3] T. Hossa, A. Filipowska, and K. Fabisz. The comparison of medium-term energy demand forecasting methods for the needs of microgrid management. In Proceedings of SmartGridComm, IEEE International Conference on Smart Grid Communications, 2014.
[4] PwC. Utility of the future. A customer led shift in the electricity sector, April 2014.
[5] W. Sokolowska, J. Opalka, T. Hossa, and W. Abramowicz. The quality of weather information for forecasting of intermittent renewable generation. In J. Marx Gómez, M. Sonnenschein, U. Vogel, A. Winter, B. Rapp, and N. Giesen, editors, Information and Communication Technology for Energy Efficiency, Proceedings of the 28th International Conference on Informatics for Environmental Protection (EnviroInfo 2014). Oldenburg: BIS-Verlag, Carl von Ossietzky University Oldenburg, Germany, 2014.

Model-based Quantitative Security Analysis of Mobile Offloading Systems under Timing Attacks

Tianhui Meng
Department of Mathematics and Computer Science
Freie Universität Berlin
Takustr. 9, 14195 Berlin, Germany
[email protected]

Abstract

Mobile offloading systems have been proposed to migrate complex computations from mobile devices to powerful servers. While this may be beneficial from the performance and energy perspective, it certainly exhibits new challenges in terms of security due to increased data transmission over networks with potentially unknown threats. Among possible security issues are timing attacks, which are not prevented by traditional cryptographic security. Metrics on which offloading decisions are based must include security aspects in addition to performance and energy-efficiency. This project aims at quantifying the security attributes of mobile offloading systems. The offloading system is modeled as a stochastic process. Experiments are conducted to obtain the parameters for the model.

1 Introduction

Cloud computing has become widely accepted as the computing infrastructure of the next generation, as it offers advantages by allowing users to exploit platforms and software provided by cloud providers (e.g., Google, Amazon and IBM) from anywhere on demand at low price [5]. At the same time, mobile devices are progressively becoming an important constituent part of everyday life as very convenient communication and business tools with a wide variety of software covering all aspects of life. The concept of computation offloading has been proposed with the objective to migrate large computations and complex processing from mobile devices with energy limitations to resourceful servers in the cloud. This avoids a long application execution time on mobile devices, which results in large power consumption.

Along with the benefits of high performance, the offloading system witnesses potential security threats including compromised data due to the increased number of parties, devices and applications involved, which leads to an increase in the number of points of access. Security threats have become an obstacle in the rapid expansion of the mobile cloud computing paradigm. Significant efforts have been devoted in research organisations and academia to build secure mobile cloud computing environments and infrastructures [2]. However, work on modelling and quantifying the security attributes of mobile offloading systems is rare.

We propose a state transition model of a general mobile offloading system under the specific threat of timing attacks. In a timing attack the attacker deduces information about a secret key from runtime measurements of successive requests [4]. This process can be interrupted by frequently changing the key. From the security quantification point of view, since the sojourn time distribution function in different system states may not always be exponential, the underlying stochastic model needs to be formulated as a Semi-Markov Process (SMP). Computing the combined system security and cost trade-off metric, we investigate the cost for a given security requirement. Our results will give security metrics on which offloading decisions are based. The remainder of this report is structured as follows. In Section 2, we develop a Semi-Markov model for a general offloading system under the threat of timing attacks. The steady-state probabilities leading to the computation of steady-state security measures are addressed in Section 3. Section 4 shows the implementation details and results. Finally, the report is concluded and future work is presented in Section 5.

2 Security Analysis based on SMP Model

The state transition model represents the system behaviour for a specific attack and given system configuration that depends on the actual security requirements. Semi-Markov Processes (SMPs) are generalizations of Markov chains where the sojourn times in the states need not be exponentially distributed [3].

2.1 Behaviour of attacker and system

In the offloading systems we consider, a server master key is used for the encryption and decryption operations of user data. In order to improve security, the server should regularly or irregularly change the master key. The system has to process all user files with both the new and the old master key. In this process, the system does not accept any other user commands.

In timing attacks on our offloading system, an attacker will continue to send requests to the server and the obtained service will be properly performed by the server. In addition, the attacker records each response time for a certain service and tries to find clues to the master secret of the server by comparing time differences from several request queues. If the attacker successfully breaks the secret information from the timing results, he can read and even modify other users' information without authorisation.

2.2 The Model

Fig. 1 depicts the state transition model we propose for describing the dynamic behaviour of a generic offloading system. This system is under the specific threat of timing attacks conducted by random attackers. We describe the events that trigger transitions among states in terms of probabilities and cumulative distribution functions.

Figure 1. State transition diagram for a generic offloading system

The states and parameters of the SMP model are summarized here:

• I Initial state of the offloading system after start-up
• T Timing attack happening state
• A Attack state after the attacker gets the secret of the system
• R Rekeying state
• pt probability that an attacker begins to conduct a timing attack on the system
• pa probability of attacking system confidentiality after a successful timing attack
• pi probability that the system returns to the initial state by manual intervention
• pr probability that the attack is terminated due to the rekeying operation

After initialisation, the system is in the good state I. The sojourn time in state I is the life time of the system before an attacker starts a timing attack or the system renews its key. We assume there is only one attacker in the system at one time. If an attack happens, the system is brought to state T, in which the timing attack takes place and the attacker deciphers the encryption key by making time observations. So while the system is in state T, the attacker is not yet able to access confidential information. We assume that it takes a certain time to perform the timing attack, after which the attacker will know the encryption key and the system moves to the compromised state A. Changing the encryption key can prevent or interrupt a timing attack. During rekeying the system is in state R. The challenge is to find an optimal value for the rekey interval. The rekeying should certainly happen before or soon after the system enters the compromised state. Rekeying will bring the system back to the initial state I. If the attacker succeeds in determining the encryption key through time measurements, confidential data will be disclosed, which is assumed to incur a high cost. This can only happen if the system is in the compromised state A, and we call the incident of entering the compromised state a security failure. One possibility is that one attacker stops himself and another attacker comes for a new timing attack. So the system is brought from the compromised state A to another timing attack state T. The attack can also be stopped by manual intervention, i.e. triggering the rekey operation. This can happen either in the attack state T or in the compromised state A, both transitioning the system to the rekey state R, from which it will return to the initial state.

2.3 Measures on SMP

The measures are defined in this work as system cost and confidentiality, which are functions of the state probabilities of the SMP model.

The system loses sensitive information in the compromised state, and cost is also incurred when the system deploys a rekeying process regularly. The steady-state probabilities πi may be interpreted as the proportion of time that the SMP spends in the state i. In our model, the rekeying cost and the data disclosure cost are both interpreted as proportions of the system life time, that is, as steady-state probabilities of the SMP. We define two weights, c and its complement 1 − c, for the two kinds of cost. We use normalized weights for simplicity. The system cost is defined as:

    Cost = c·πA + (1 − c)·πR ,    (1)

where πi, i ∈ {A, R}, denotes the steady-state probability that the SMP is in state i, and 0 ≤ c ≤ 1 is the weighting parameter used to share relative importance between the loss of sensitive information and the effort needed to rekey regularly. Similarly, if a timing attack on the offloading system is successful, the attacker obtains the master key and can browse unauthorised files thereafter. The entered states denote the loss of confidentiality. Therefore, the steady-state confidentiality measure can be computed as

    Confid = 1 − πA .    (2)

In order to investigate how system security interacts with the cost, we also define a trade-off metric. An objective function formed from the division of the security attribute confidentiality by the system cost is created to demonstrate the relationship between the cost the system has to pay and the corresponding security gain. The trade-off metric shows how much security per cost you can obtain:

    Trade = Confid / Cost .    (3)

3 Semi-Markov Process analysis

In this section, we derive and evaluate the security attributes using methods for quantitative assessment of dependability.

3.1 DTMC steady-state probability

It was explained earlier that, in order to carry out the security quantification analysis, we need to analyse the SMP model of the system that was described by its state transition diagram. The steady-state probabilities {πi, i ∈ Xs} of the SMP states are computed in terms of the embedded DTMC steady-state probabilities vi and the mean sojourn times hi [6]:

    πi = vi·hi / Σj vj·hj ,   i, j ∈ Xs.    (4)

Assuming the existence of the steady state in the underlying DTMC, it can be computed as

    v = v·P ,   i ∈ Xs,    (5)

where v = [vI, vT, vA, vR] and P is the DTMC transition probability matrix, which can be written as:

              I     T              A     R
        I  (  0     pt             0     1 − pt )
    P = T  (  0     0              pa    1 − pa )    (6)
        A  (  pi    1 − pi − pr    0     pr     )
        R  (  1     0              0     0      )

In addition, we have the total probability relationship:

    Σi vi = 1 ,   i ∈ Xs.    (7)

The transition probability matrix P describes the DTMC state transition probabilities between the DTMC states as shown in Fig. 1. The first step towards evaluating security attributes is to find the steady-state probability vector v of the DTMC states by solving Eqs. 5 and 7. We obtain the solutions:

    vI = (pi·pa + 1 − pa + pa·pr) / ϕ ,
    vT = pt / ϕ ,   vA = pt·pa / ϕ ,   vR = vI − pi·pt·pa / ϕ ,    (8)

where, for the sake of brevity, we assume ϕ = 2 + 2·pi·pa + pt + pt·pa − 2·pa + 2·pa·pr − pi·pt·pa.

In the next subsection, the DTMC steady-state probabilities are used to compute the SMP steady-state probabilities.

3.2 Semi-Markov model analysis

The mean sojourn time hi in a particular state i ∈ Xs is the other quantity that is needed to compute the SMP steady-state probabilities. We list the hi again here:

• hI the mean time the system spends before an attacker conducts a timing attack or it rekeys itself
• hT the mean time before the attacker breaks the master secret of the server by a timing attack
• hA the mean time the system is losing information
• hR the mean time for the rekeying process

In our model, some parameters can be obtained from experiments. The measurements we are in the process of taking are based on an offloading server under timing attacks. We have built a timing attack demonstrator and measure the mean time for a successful attack, which will be used as hT. Some parameters, e.g. the probability that an attacker begins to conduct a timing attack and the probability of attacking system confidentiality after a successful timing attack, have to be assumed on behalf of the attacker. Other parameters used in our system can be tuned by the system administrator, like the rekey probability pr and the mean sojourn time in the initial state hI.

Here, we can compute the steady-state probabilities {πi, i ∈ Xs} of the SMP states by using Eqs. 4 and 8. Again, for the sake of brevity, we assume

    Φ = (pi·pa + 1 − pa + pa·pr)·hI + pt·hT + pt·pa·hA + (pi·pa + 1 − pa + pa·pr − pi·pt·pa)·hR .

The solutions are presented as

    πI = (pi·pa + 1 − pa + pa·pr)·hI / Φ      (9)
    πT = (pt / Φ)·hT                          (10)
    πA = (pt·pa / Φ)·hA                       (11)
    πR = (hR / hI)·πI − (pi·pt·pa / Φ)·hR     (12)
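The closed-form expressions above can be cross-checked numerically. The following Python sketch is a minimal illustration that builds the transition matrix P of Eq. 6, solves Eqs. 5 and 7 for the embedded DTMC and applies Eq. 4; the parameter values in the example call are arbitrary placeholders, not measurements from the project.

    import numpy as np

    def smp_steady_state(pt, pa, pi, pr, h):
        """Solve the embedded DTMC (Eqs. 5-7) and apply Eq. 4.

        h is a vector of mean sojourn times [h_I, h_T, h_A, h_R].
        Returns (v, pi_smp) for the state order (I, T, A, R).
        """
        # Transition probability matrix P of Eq. 6, rows/columns ordered I, T, A, R.
        P = np.array([
            [0.0, pt,            0.0, 1.0 - pt],
            [0.0, 0.0,           pa,  1.0 - pa],
            [pi,  1.0 - pi - pr, 0.0, pr      ],
            [1.0, 0.0,           0.0, 0.0     ],
        ])
        # Solve v (P^T - I) = 0 together with sum(v) = 1 as a least-squares system.
        A = np.vstack([P.T - np.eye(4), np.ones(4)])
        b = np.concatenate([np.zeros(4), [1.0]])
        v, *_ = np.linalg.lstsq(A, b, rcond=None)
        # Eq. 4: weight the DTMC probabilities with the mean sojourn times.
        weighted = v * np.asarray(h)
        return v, weighted / weighted.sum()

    # Placeholder parameters, chosen only to demonstrate the computation.
    v, pi_smp = smp_steady_state(pt=0.4, pa=0.5, pi=0.3, pr=0.4, h=[8, 4, 2, 1])
    print(dict(zip("ITAR", pi_smp.round(4))))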

Given the SMP model steady-state probabilities, various measures can be computed via Eqs. 1 to 3.
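As a minimal illustration of Eqs. 1 to 3, the following sketch evaluates the three measures for given steady-state probabilities; the probability values and the weight c = 0.5 are assumptions chosen for the example only.

    def cost(pi_A, pi_R, c):
        """Eq. 1: weighted sum of data-disclosure and rekeying cost."""
        return c * pi_A + (1 - c) * pi_R

    def confidentiality(pi_A):
        """Eq. 2: probability of not being in the compromised state."""
        return 1 - pi_A

    def trade_off(pi_A, pi_R, c):
        """Eq. 3: confidentiality gained per unit of cost."""
        return confidentiality(pi_A) / cost(pi_A, pi_R, c)

    # Illustrative steady-state probabilities and an assumed weight c.
    pi_A, pi_R, c = 0.07, 0.06, 0.5
    print(cost(pi_A, pi_R, c), confidentiality(pi_A), trade_off(pi_A, pi_R, c))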

4 Implementation details and results

4.1 Timing attack implementation

We have implemented a file-upload server webpage based on Apache Tomcat 7.0.54. The server is deployed on an HP ProLiant DL980 G7 machine in the Future SOC Lab. A timing attack demo client has also been developed to upload a text file to the server and record the response time. When a file is uploaded to the server, its content is read and encrypted, then decrypted to show the original text. The encryption and decryption times are recorded and sent back to the client. The client visited the server through VPN and measured the time from uploading the file to receiving the response from the server. First we record the time for one step of the attack, i.e. one service request and response time. For 1024 bit RSA (the modulus N is 1024 binary bits), p and q are both 512 bits. Using the B.B. attack, an attacker can break the private key in 256 steps, because after recovering the half-most significant bits of q, we can use Coppersmith's algorithm [1] to retrieve the complete factorization. Fig. 2 (a) and (b) show the completion time distribution of one attack step. We assume that each attack step is independent and identically distributed (i.i.d.). The cumulative distribution function of the completion time of one complete attack can be computed by an iterative convolution method. To simplify the computational process, we can do the convolution pairwise. We obtain the time distribution for a complete timing attack as shown in Fig. 2 (c).

Figure 2. Time distribution of timing attack

4.2 Numerical Study

In this section we give numerical results as examples to show how one can evaluate security attributes of the SMP model defined in the previous sections using different measures.

First, we assume that the probability of a timing attack coming to the offloading system is equal to the probability that the system will trigger its rekeying process, i.e., pt = 0.5. The mean time the system spends before an attacker conducts a timing attack or it rekeys is hI = 10 time units. Further, the probability that the attacker successfully cracks the system secret using a timing attack is pa = 0.6 and the probability of an unsuccessful attack is 1 − pa = 0.4. The time taken by a successful timing attack is assumed to be hT = 5 time units. Besides, suppose that the probability that the system returns to the initial state by manual intervention is pi = 0.2 and the probability that the attack is terminated due to the rekeying operation is pr = 0.5. Hence, the probability that the current attack stops and another timing attack affects the system is 1 − pi − pr = 0.3. We also assume the duration of a specific attack to be hA = 3 time units and the rekeying process time to be hR = 1 time unit, respectively.

Using the values given above as the model input parameters and Eqs. 9 - 12, we obtain the steady-state probabilities of the Semi-Markov process as: πI = 0.6634, πT = 0.2023, πA = 0.0728, πR = 0.0615. The steady-state probabilities πi may be interpreted as the proportion of time that the SMP spends in the state i. For the assumed values of the input parameters, the proportion of time that the offloading system spends in the initial state I is approximately 66% of the whole system life time.
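The probabilities quoted above follow directly from Eqs. 9 to 12. The short sketch below plugs in the parameter values assumed in this section and reproduces them; it is a cross-check added for illustration, not part of the original implementation.

    # Parameters assumed in Section 4.2.
    pt, pa, pi, pr = 0.5, 0.6, 0.2, 0.5
    hI, hT, hA, hR = 10.0, 5.0, 3.0, 1.0

    # Normalization constant of Eqs. 9-12.
    Phi = ((pi * pa + 1 - pa + pa * pr) * hI
           + pt * hT
           + pt * pa * hA
           + (pi * pa + 1 - pa + pa * pr - pi * pt * pa) * hR)

    pi_I = (pi * pa + 1 - pa + pa * pr) * hI / Phi      # Eq. 9
    pi_T = pt * hT / Phi                                 # Eq. 10
    pi_A = pt * pa * hA / Phi                            # Eq. 11
    pi_R = (hR / hI) * pi_I - (pi * pt * pa / Phi) * hR  # Eq. 12

    # Expected output: approximately 0.6634, 0.2023, 0.0728, 0.0615.
    print(round(pi_I, 4), round(pi_T, 4), round(pi_A, 4), round(pi_R, 4))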

Figure 3. Cost as a function of hI and pr

Figure 4. Confid as a function of hI and pr

Figure 5. Trade-off metric as a function of hI and pr

Fig. 3 shows the system cost measure as a function of hI and pr. Interestingly, when the mean time in the initial state hI is short, the system cost increases as the rekey probability pr increases. However, we see a decrease in system cost as pr increases when hI is very long. Also we can see that the system cost is more sensitive to hI than to pr. In Fig. 4, we conduct a sensitivity analysis of the system confidentiality measure Confid. It increases dramatically with the sojourn time hI in state I when the system is rekeying more frequently. However, it does not interact that strikingly with the model input parameter pr. The trade-off metric as a function of hI and pr is depicted in Fig. 5. As expected, the trade-off metric monotonically increases as pr and hI increase. That is because the system rekeys more often and it spends more time in the good state.

5 Conclusion and future work

In this report, we have presented an approach for quantitative assessment of security attributes for an offloading system under the specific threat of timing attacks. We have solved for the steady-state probabilities of the Semi-Markov Process model as the foundation of the security attribute analysis. Also, the model analysis is illustrated in a numerical example. The objective of our future work is to conduct more experiments to obtain precise input parameters for our model. Furthermore, a transient analysis of our model will be conducted.

References

[1] D. Coppersmith. Small solutions to polynomial equations, and low exponent RSA vulnerabilities. Journal of Cryptology, 10(4):233–260, 1997.
[2] A. N. Khan, M. M. Kiah, S. U. Khan, and S. A. Madani. Towards secure mobile cloud computing: A survey. Future Generation Computer Systems, 29(5):1278–1299, 2013.
[3] N. Limnios and G. Oprisan. Semi-Markov processes and reliability. Springer Science & Business Media, 2001.
[4] C. Rebeiro, D. Mukhopadhyay, and S. Bhattacharya. An introduction to timing attacks. In Timing Channels in Cryptography, pages 1–11. Springer, 2015.
[5] S. Subashini and V. Kavitha. A survey on security issues in service delivery models of cloud computing. Journal of Network and Computer Applications, 34(1):1–11, 2011.
[6] K. S. Trivedi. Probability & statistics with reliability, queuing and computer science applications. John Wiley & Sons, 2008.


Towards Process Mining on Big Data: Optimizing Process Model Matching Approaches on High Performance Computing Infrastructure

Sharam Dadashnia, Tim Niesen, Peter Fettke, Peter Loos Institute for Information Systems (IWi) at the German Research Center for Artificial Intelligence (DFKI) Campus D3 2, 66123 Saarbrücken {sharam.dadashnia | tim.niesen | peter.fettke | peter.loos}@iwi.dfki.de

Abstract

The project "Process Mining on Big Data" illustrates the potential of high performance computing infrastructures to face problems in the context of process mining, especially process model matching. Since a large number of input data exponentially affects the determination of correspondences between process models, processing huge model sets leads to an explosion in complexity and, thus, cannot be performed on standard machines. An iterative architectural prototyping research approach is used to calculate different parameter configurations for a newly developed matching algorithm in order to determine an optimal configuration. Optimal parameters were determined according to defined quality criteria using different process model collections and used to further improve the matching approach.

1 Introduction

The project Process Mining on Big Data aims at investigating the potential of high performance computing (HPC) architectures for the field of process mining and related problem domains. Due to an increased availability of process log data, algorithms to extract ("discover") process models based on real usage scenarios gain in importance. This results in a growing number of inductively-created business process models that have to be managed in an efficient way. Against this background, one of the most prominent problems is the identification of similar or equivalent process models in a collection [1].

An important challenge in that regard is the determination of correspondences in the form of common substructures or shared elements between pairs of models [2]. Since a large number of input data exponentially affects correspondence determination, processing huge sets of process models leads to an explosion in complexity and, thus, cannot be performed on standard machines. In this project, we focused on a specific sub-problem related to the problem of correspondence determination, which is known as process model matching. In particular, the following research question has been investigated:

RQ: Can HPC provide adequate means to improve existing matching algorithms and to enhance state-of-the-art process mining?

Against this background, the remainder of this report is structured as follows: Section 2 describes the research approach that our findings are based on. Next, section 3 shortly elaborates on the matching approach that was adapted to the HPC platform before section 4 reports on the optimization scenario. Section 5 then discusses the results as well as their implications. Finally, section 6 concludes the report and gives an outlook on possible follow-up projects.

2 Research Approach

The research described in the present report is based on the concept of architectural prototyping (AP) from software architecture development. An architectural prototype in that regard represents a means to learn and communicate different styles, features and patterns of a system under development and, hence, helps to explore and evaluate the best alternative in the software architecture development process [3].

The main objective of the approach described hereafter relates to an optimization problem regarding parameter configurations in the context of business process model matching. The problem is faced in an iterative manner: by using a repetitive cycle containing a feedback loop, parameters are incrementally refined towards a solution that performs best on average as described in section 4. Figure 1 visualizes the employed four-step research approach with multiple iterations in phases two and three, which are executed on the IT basis infrastructure provided by the HPI Future SOC Lab consisting of a dedicated Ubuntu Linux-based server blade (24 cores, 64 GB of RAM). The approach is implemented as a first prototype in Java, running on a multi-threaded Java Virtual Machine (JVM) to demonstrate the applicability of the matching concept.

Figure 1: AP research approach

3 Vector Based Matching

In order to introduce the application setting for the following description of our HPC-based evaluation scenario, we briefly present the underlying matching algorithm. It was first published in [4] and describes a multi-phase approach to business process model matching, which adopts the vector space model from Information Retrieval and combines it with additional techniques from Natural Language Processing.

As briefly mentioned before, process model matching aims at identifying correspondences between the nodes of models, for instance identical or similar model elements. Our approach focuses on semantic similarities within element description texts ("labels") and is based on the four-phase procedure sketched in figure 2. The phases are processed subsequently, i.e. a subordinate phase is only reached if no satisfying result could be generated in a preceding phase.

Figure 2: Overview of the multi-phase matching procedure

Phase 1: Trivial Matching. Following the assumption that identical labels and one label being a substring of another almost certainly indicate a correspondence, a trivial matching is performed in the first place. Parameter l denotes the allowed difference in the labels' length and can be adjusted to optimize matching performance.

Phase 2: Preparation. In order to execute phases 3 and 4, further preprocessing of label texts is necessary. For instance, lemmatization of words and stopword removal is performed in the preparation phase.

Phase 3: Lemma-based Matching. By considering each label as a set of words while abstracting from specific order and grammatical aspects (e.g. inflection), correspondences between "mostly identical" labels are identified. In particular, the intersection between the two sets must be greater than a threshold parameter t1 and each label set must have a minimum size (parameter j). Furthermore, the set sizes must not vary by a value greater than i.

Phase 4: Vector-based detail Matching. The final matching phase is based on a vector space approach, where process models as well as model labels are represented as individual vectors. By using k-NN clustering (parameter k), subcorpora are constructed, which are in turn used to accurately estimate the importance of label terms and, hence, to construct the vectors. To calculate the degree of similarity between two vectors, standard vector similarity measures like cosine similarity are computed and compared against a threshold (parameter t2): similarity scores above the threshold indicate corresponding labels.

4 Evaluation

4.1 Evaluation Parameters

Referring to the matching procedure described in section 3, we have a total of six parameters that influence the actual matching performance. By using different sets of process models to be matched along with a set of corresponding reference matches, an optimal parameter configuration can be determined to optimize the average matching performance with respect to the available data. Therefore, possible combinations of parameter values have to be investigated, which depicts a complex combinational problem. Table 1 gives an overview of the parameters with the respective sets of possible values.

parameter   value set m                               |m|
l           {1,2,3,4,5,6,7}                            7
t1          {0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8}          8
i           {1,2,3,4,5,6}                              6
j           {5,6,7,8,9,10}                             6
k           {1,2,3,4,5}                                5
t2          {0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8}          8

Table 1: Parameter values and combinations

Calculating each possible value combination of the six parameters results in a total of 7*8*6*6*5*8 = 80,640 walkthroughs for the whole matching procedure. As a single walkthrough took approximately 25 seconds on the employed workstation, the overall runtime can roughly be estimated to more than 23 days, which was not feasible in the scenario.

4.2 Evaluation Scenario

From the set of 80,640 possible parameter configurations, the evaluation scenario aims at finding the one configuration that performs best in the task of matching process models from a given collection. Performance is therefore measured in terms of the so-called F-measure value, which has been established in the field of process model matching [5].

In order to calculate the respective measure, a "reference matching" for a given process model collection must be available. To account for that issue, the described evaluation scenario draws on the setup of the second edition of the Process Model Matching Contest (PMMC) [4]. In the context of the PMMC, three different model collections ("datasets") have been provided along with a set of reference matchings between combinations of models ("goldstandard").

The results presented in section 5 are based on a detailed examination of datasets 1 and 2 from the PMMC: the results produced by our vector-based matching algorithm are checked against the provided goldstandards while testing different parameter configurations. The configuration which yields the highest F-measure across all datasets is denoted the optimal or best configuration.
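The F-measure used here compares a computed matching against the goldstandard. A minimal sketch of that comparison is given below, assuming matchings are represented as sets of element pairs; the example pairs are invented and not taken from the PMMC datasets.

    def f_measure(computed, reference):
        """Compare a computed matching against a reference matching (goldstandard).

        Both arguments are sets of correspondences, e.g. tuples of
        (element_of_model_1, element_of_model_2).
        """
        true_positives = len(computed & reference)
        precision = true_positives / len(computed) if computed else 0.0
        recall = true_positives / len(reference) if reference else 0.0
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

    # Illustrative toy matchings (not taken from the PMMC datasets).
    gold = {("check invoice", "verify invoice"), ("pay invoice", "settle invoice")}
    found = {("check invoice", "verify invoice"), ("archive invoice", "store invoice")}
    print(f_measure(found, gold))  # 0.5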

5 Results

Building on the evaluation scenario described in section 4, the following paragraphs present the results of the conducted analyses.

In terms of processing time and resource requirements, the usage of the HPC infrastructure provided by the HPI lead to a tremendous improvement. The overall processing time, which had been estimated to at least 23 days using a standard machine to test possible parameter configurations, could be reduced to less than one day. Interestingly, this improvement could already be realized by running the original implementation without further changes to the source code. The only alteration in comparison to the original run configuration relates to the settings of the respective Java virtual machine, which have been slightly adapted to better suit the characteristics of multi-core environments. While the achieved performance benefit is already high, it still reveals the potential of a dedicated implementation that fully exhausts the possibilities of multi-threading optimizations.

Regarding the evaluation results, testing every parameter configuration out of the set of 80,640 on all datasets was key to determine interdependencies between the parameters and to find a final configuration that performs best on average across all datasets regarding F-measure. Generally speaking, some parameters tend to have a stronger influence on matching performance than others. For instance, the parameters t1 and t2 strongly influence matching performance especially for datasets 1 and 2, whereas the effect was not that large for dataset 3, while it was still significant.

Table 2 shows an exemplary excerpt from the calculation results focusing on a specific evaluation scenario where parameters t1 and t2 were varied while the other parameters have been fixated. As can be seen from the data, F-measure values vary from a minimum of 0.21 (t1 = 0.1, t2 = 0.4-0.8) to a maximum of 0.41 (t1 = 0.7-0.8, t2 = 0.8).

Figure 3 visualizes the results in an area graph diagram, emphasizing the dependency between the two variables: while greater values for only one parameter do not yield a considerable gain in F-measure performance, a common increase of both parameters at the same time clearly leads to higher performance.

t2\t1   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8
0.1    0.22  0.25  0.29  0.29  0.30  0.30  0.30  0.30
0.2    0.22  0.25  0.30  0.31  0.32  0.32  0.32  0.32
0.3    0.22  0.24  0.30  0.31  0.32  0.32  0.32  0.32
0.4    0.21  0.24  0.30  0.30  0.32  0.32  0.32  0.32
0.5    0.21  0.24  0.31  0.31  0.33  0.33  0.33  0.33
0.6    0.21  0.24  0.32  0.33  0.35  0.35  0.35  0.35
0.7    0.21  0.24  0.33  0.35  0.38  0.38  0.38  0.38
0.8    0.21  0.24  0.34  0.37  0.40  0.40  0.41  0.41

Table 2: F-measures in dataset 1 dependent on parameters t1 and t2

Figure 3: Visualization of parameter correlation in dataset 1
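The sweep over all 80,640 configurations behind Table 2 and Figure 3 is embarrassingly parallel, which is what the many-core server blade exploits. The sketch below outlines such a sweep in Python; the actual prototype is a Java implementation, and the evaluate function is a placeholder standing in for a call to the matcher plus a goldstandard comparison.

    from itertools import product
    from multiprocessing import Pool
    from statistics import mean

    # Parameter domains from Table 1.
    L  = range(1, 8)
    T1 = [x / 10 for x in range(1, 9)]
    I  = range(1, 7)
    J  = range(5, 11)
    K  = range(1, 6)
    T2 = [x / 10 for x in range(1, 9)]

    def evaluate(config):
        """Run the matcher for one configuration and return its average F-measure.

        Placeholder: the real project invokes the Java matching prototype here and
        compares its output against the PMMC goldstandard for each dataset.
        """
        l, t1, i, j, k, t2 = config
        f_per_dataset = [abs(t1 - t2)]   # dummy score standing in for real F-measures
        return mean(f_per_dataset), config

    if __name__ == "__main__":
        configs = list(product(L, T1, I, J, K, T2))   # 7*8*6*6*5*8 = 80,640 walkthroughs
        with Pool() as pool:                          # spread walkthroughs over all cores
            best_score, best_config = max(pool.map(evaluate, configs, chunksize=256))
        print(best_score, best_config)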

6 Conclusion and Outlook

The project Process Mining on Big Data further extends fundamental work that has been conducted in a preceding project as a first step towards addressing new requirements regarding the application of high performance computing in the context of business process management [6]. It addresses a specific sub-problem from the field of process mining, namely business process model matching. Within the project, a novel, innovative matching algorithm has been evaluated in terms of different parameter configurations to determine an optimal setting. The usage of a high performance computing infrastructure enabled this to be done in a reasonable amount of time, thus allowing adequate adjustments of implementation details. Yet, it also revealed the potential for further optimizations of the implementation regarding specific adaptions to multi-core environments. Finally, this enabled an iterative approximation of parameter configurations which is also in line with the employed research approach of architectural prototyping. As a next step, the evaluation results will be used to extend the existing matching procedure towards a dataset-adaptive learning approach.

Acknowledgement

The provided high performance IT infrastructure from the HPI allowed the investigation of concrete problem fields in information systems research. The authors thank the HPI Future SOC Lab for the chance of using these resources and appreciate a continuation of the project. The basic concepts were developed in context of the project "Konzeptionelle, methodische und technische Grundlagen zur induktiven Erstellung von Referenzmodellen (Reference Model Mining)", which is funded by the Deutsche Forschungsgemeinschaft DFG (GZ LO 752/5-1).

7 References

[1] Thaler, T.; Dadashnia, S.; Sonntag, S.; Fettke, P.; Loos, P.: The IWi Process Model Corpus. In: Publications of the Institute for Information Systems (IWi) at the German Research Center for Artificial Intelligence (DFKI), IWi-Heft, Vol. 199, 10/2015.
[2] Weidlich, M.; Mendling, J.: Perceived consistency between process models. Information Systems 37 (2012) 2, S. 80–98.
[3] Bardram, J. E.; Christensen, H. B.; Hansen, K. M.: Architectural prototyping: An approach for grounding architectural design and learning. In: Software Architecture, 2004. WICSA 2004. Proceedings. Fourth Working IEEE/IFIP Conference on (pp. 15-24). IEEE.
[4] Antunes, G.; Bakhshandelh, M.; Borbinha, J.; Cardoso, J.; Dadashnia, S.; Chiara, F.; Gragoni, M.; Fettke, P.; Gal, A.; Ghidini, C.; Hake, P.; Khiat, A.; Klinkmüller, C.; Kuss, E.; Leopold, H.; Loos, P.; Meilicke, C.; Niesen, T.; Pesquita, C.; Peus, T.; Schoknecht, A.; Sheetrit, E.; Sonntag, A.; Stickenschmidt, H.; Thaler, T.; Weber, I.; Weidlich, M.: The Process Model Matching Contest 2015. In: Kolb J, Leopold H, Mendling J (eds) Proceedings of the 6th International Workshop on Enterprise Modelling and Information Systems Architectures (EMISA-15). pp 1–22.
[5] Cayoglu, U.; Dijkman, R.; Dumas, M.; Fettke, P.; García-Bañuelos, L.; Hake, P.; Klinkmüller, C.; Leopold, H.; Ludwig, A.; Loos, P.; Mendling, J.; Oberweis, A.; Schoknecht, A.; Sheetrit, E.; Thaler, T.; Ullrich, M.; Weber, I.; Weidlich, M.: The Process Model Matching Contest 2013. In: 4th International Workshop on Process Model Collections: Management and Reuse, PMC-MR. Springer 2013.
[6] Thaler, T.; Dadashnia, S.; Fettke, P.; Loos, P.: Multi-Facet BPM: Identification, Analysis and Resolution of Resource-Intensive BPM Application. In: Proceedings of the Fall 2014 Future SOC Lab Day. HPI Future SOC Lab, October 29, Potsdam, Germany, Hasso-Plattner-Institut, 2014.

A survey of security-aware approaches for cloud-based storage and processing technologies

Max Plauth, Felix Eberhardt, Frank Feinbube and Andreas Polze Operating Systems and Middleware Group Hasso Plattner Institute for Software Systems Engineering University of Potsdam Potsdam, Germany {max.plauth, felix.eberhardt, frank.feinbube, andreas.polze}@hpi.de

Abstract—In the Gartner hype cycle, cloud computing is a paradigm that has crossed the peak of inflated expectations but also has overcome the worst part of the trough of disillusionment. While the advantages of cloud computing are the best qualification for traversing the slope of enlightenment, security concerns are still a major hindrance that prevents full adoption of cloud services across all conceivable user groups and use cases. With the goal of building a solid foundation for future research efforts, this paper provides a body of knowledge about a choice of upcoming research opportunities that focus on different strategies for improving the security level of cloud-based storage and processing technologies.

I. INTRODUCTION

Providing low total cost of ownership, high degrees of scalability and ubiquitous access, cloud computing offers a compelling list of favorable features to both businesses and consumers. At the same time, these positive qualities also come with the less favorable drawback that guaranteeing data confidentiality in cloud-based storage and processing services still remains an insufficiently tackled problem. As a consequence, many companies and public institutions are still refraining from moving storage or processing tasks into the domain of cloud computing. While this reluctance might be appropriate for a few, highly sensitive use cases, it poses the risk of an economic disadvantage in many other scenarios.

This paper provides an overview about the current state of the art in security-aware approaches for cloud-based storage and processing technologies. Since there are numerous ways to approach the topic, a large variety of potential starting points is presented. The goal is to provide a solid body of knowledge, which will be used as a foundation upon which novel security mechanisms can be identified and studied in the future. In the ensuing section, we present a selected list of preceding contributions to the field of security research in the context of cloud computing. Afterwards, a comprehensive review of the state of the art is provided to form a body of knowledge.
II. PRECEDING CONTRIBUTIONS

In the last couple of years, several aspects relevant to security-aware approaches for cloud-based storage and processing technologies have been researched at our research group. Among these aspects are technologies such as threshold cryptography, trust-based access control, virtual machine introspection and searchable encryption. Since our ongoing research efforts build up on top of the insights gained in these preceding contributions, a brief overview is provided.

A. Threshold Cryptography

In a widely distributed environment, traditional authorization services represent a single point of failure: If the service is unavailable, the encrypted data cannot be accessed by any party. In distributed setups, simple replication mechanisms can be considered a security threat, since attackers can gain full control as soon as a single node has been compromised. In order to eliminate this weakness, the general approach presented by Neuhaus et al. (2012) [1] employs the concept of Fragmentation-Redundancy-Scattering [2]: Confidential information is broken up into insignificant pieces which can be distributed over several network nodes.

The contribution of Neuhaus et al. (2012) [1] is the design of a distributed authorization service. A system architecture has been presented that enables fine-grained access control on data stored in a distributed system. In order to maintain privacy in the presence of compromised parties, a threshold encryption scheme has been applied in order to limit the power of a single authorization service instance.
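The threshold idea can be illustrated with a classic (k, n) secret-sharing construction. The toy sketch below is not the threshold encryption scheme applied by Neuhaus et al. [1]; it only illustrates how recovering a secret can be made to require a quorum of k out of n scattered fragments.

    import random

    PRIME = 2**127 - 1  # a Mersenne prime large enough for short secrets

    def split(secret: int, n: int, k: int):
        """Split `secret` into n shares; any k of them reconstruct it."""
        coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
        def f(x):
            return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
        return [(x, f(x)) for x in range(1, n + 1)]

    def reconstruct(shares):
        """Lagrange interpolation at x = 0 over the prime field."""
        secret = 0
        for i, (xi, yi) in enumerate(shares):
            num, den = 1, 1
            for j, (xj, _) in enumerate(shares):
                if i != j:
                    num = num * (-xj) % PRIME
                    den = den * (xi - xj) % PRIME
            secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
        return secret

    shares = split(secret=123456789, n=5, k=3)    # scatter 5 fragments
    print(reconstruct(random.sample(shares, 3)))  # any 3 recover 123456789
    print(reconstruct(random.sample(shares, 2)))  # 2 shares are not enough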

B. Trust-Based Access Control

The Operating System and Middleware Group operates a web platform called InstantLab [3], [4]. The purpose of the platform is to provide operating system experiments for student exercises in the undergraduate curriculum. Virtualization technology is used to provide pre-packaged experiments, which can be conducted through a terminal session in the browser. Thus far, massive open online courses (MOOCs) have not been well suited for hands-on experiments, since assignments have been non-interactive. The main goal of InstantLab is to provide more interactive assignments and enable iterative test-and-improve software development cycles as well as observational assignments.

Providing a platform that enables a large audience to perform live software experiments creates several challenges regarding the security of such a platform. Malicious users might abuse resources for other means than the intended software experiments. In order to detect misuse of the provided resources, virtual machine introspection is applied. Furthermore, InstantLab [3], [4] demonstrates how automatic resource management is enabled by trust-based access control schemes. The purpose of trust-based access control is to restrict user access to resource-intensive experiments. The approach implemented in InstantLab [3], [4] calculates a user's trust level based on his/her previous behavior.

C. Virtual Machine Introspection

In the age of cloud computing and virtualization, virtual machine introspection provides the means to inspect the state of virtual machines through a hypervisor without the risk of contaminating its state. Inspection capabilities are useful for a wide range of use case scenarios, ranging from forensics to more harmless cases such as making sure a tenant is not violating the terms of use of the provider.

The work of Westphal et al. (2014) [5] contributes to the field of virtual machine introspection by providing a monitoring language called VMI-PL. Using this language, users can specify which information should be obtained from a virtual machine. Unlike competing approaches like libVMI [6] and VProbes [7], VMI-PL does not limit users to hardware level metrics, but it also provides operating system level information such as running processes and other operating system events. Furthermore, the language can also be used to monitor data streams such as network traffic or user interaction.

D. Searchable Encryption

For many use cases, efficient and secure data sharing mechanisms are crucial, especially in distributed scenarios where multiple parties have to access the same data repositories securely from arbitrary locations. In such scenarios, the scalability of cloud computing makes resources simple to provision and to extend. However, when it comes to storing sensitive data in cloud-hosted data repositories, data confidentiality is still a major issue that discourages the use of cloud resources in sensitive scenarios. While traditional encryption can be used to protect the privacy of data, it also limits the set of operations that can be performed efficiently on encrypted data, such as search. Encryption schemes which allow the execution of arbitrary operations on encrypted data are still utopian. However, searchable encryption schemes exist that enable keyword-based search without the disclosure of keywords.

Neuhaus et al. (2015) [8] studied the practical applicability of searchable encryption for data archives in the cloud. For their evaluation, an implementation of Goh's searchable encryption scheme [9] was embedded into the document-based database MongoDB. With the encryption scheme in place, benchmarks revealed that the overhead for insertions is negligible compared to an unencrypted mode of operation. Search queries on the other hand come with a considerable overhead, since Goh's scheme [9] mandates a linear dependency between the complexity of search operations and the number of documents. However, the processing time of encrypted queries should be in acceptable orders of magnitude for interactive use cases where the increased security is mandatory.
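The linear search cost mentioned above stems from testing a per-document index for every query. The following toy sketch illustrates that idea with HMAC-derived Bloom-filter indexes; it is a strong simplification and not Goh's actual construction [9] or its MongoDB integration.

    import hashlib, hmac

    FILTER_BITS = 1024
    HASH_COUNT = 4

    def _positions(key: bytes, word: str):
        """Derive HASH_COUNT Bloom-filter positions for a keyword via HMAC."""
        for i in range(HASH_COUNT):
            digest = hmac.new(key, f"{i}:{word}".encode(), hashlib.sha256).digest()
            yield int.from_bytes(digest[:4], "big") % FILTER_BITS

    def build_index(key: bytes, keywords):
        """Per-document index: a Bloom filter over the document's keywords."""
        bits = bytearray(FILTER_BITS)
        for word in keywords:
            for pos in _positions(key, word):
                bits[pos] = 1
        return bits

    def search(key: bytes, indexes, word):
        """Test every document index; cost is linear in the number of documents."""
        trapdoor = list(_positions(key, word))
        return [doc_id for doc_id, bits in indexes.items()
                if all(bits[pos] for pos in trapdoor)]

    key = b"secret-index-key"          # hypothetical key, held by the data owner
    indexes = {
        "doc1": build_index(key, ["invoice", "energy"]),
        "doc2": build_index(key, ["contract"]),
    }
    print(search(key, indexes, "energy"))   # ['doc1'] (false positives are possible)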
III. STATE OF THE ART

In the context of cloud computing, the field of work related to security-aware approaches for storage and processing technologies comprises a wide range of diverse directions. In the consequent part of this document, the state of the art is presented for a selection of differentiated topics. First, projects are highlighted which provide best practices for increasing security. Afterwards, new trends in virtualization strategies are outlined, followed by a brief introduction to novel hardware security mechanisms. Finally, the security aspects of providing coprocessor resources in virtual machines are illustrated.

A. New trends in virtualization strategies

Virtualization still remains one of the main technological pillars of cloud computing. The main reason for this key role is that it enables high degrees of resource utilization and flexibility. Today, the most common approach for virtualization resorts to low-level hypervisors like Xen or KVM that employ hardware-assisted virtualization in order to run regular guest operating systems in a para-virtualized or fully virtualized fashion. Recently however, new virtualization approaches have gained momentum. While containerization approaches move the scope of virtualization to higher levels of the application stack, unikernels are working at the same level of abstraction as regular operating systems but at the same time change the operating system drastically. A comparison of the different approaches is illustrated in Figure 1.

Fig. 1: The virtual machine stack as well as both containerization approaches come with a significant amount of overhead. Unikernels aim at reducing the footprint of virtualized applications. Source: [10]

1) Containers: In contrast to hypervisor-level virtualization approaches, where an entire operating system instance is virtualized, containers belong to the class of operating-system-level virtualization strategies that utilize multiple user-space instances in order to isolate tenants. The main goal of popular containerization implementations like Linux Containers and Docker is to reduce the memory footprint of hosted applications and to get rid of the overhead inherent to hypervisor-based virtualization. Recent studies demonstrate that the concept of containerization is able to outperform matured hypervisors in many use cases [11]–[13]. Regarding security aspects, most containerization approaches thus far rely on the operating system kernel to provide sufficient means of isolation between different containers. However, the LXD project aims at providing hardware-based security features to

84 containers in order to provide isolation levels on par with B. Hardware-based security mechanisms hypervisor-level based virtualization. 1) Trusted Execution Technology (TXT): The goal of Intel 2) Unikernels: Unikernels are a new approach to Trusted Execution Technology (TXT) [22] technology is that hypervisor-level virtualization. The core concept of unikernels the user can verify if the operating system or its configuration is based on the idea of deploying applications by merging was altered after the boot up. This requires a trusted platform application code and a minimal operating system kernel into module (TPM) which stores system indicators securely. The a single immutable virtual machine image that is run on top of approach TXT is using is called dynamic root of trust mea- a standard hypervisor [14]–[16]. Since unikernels intentionally surement. For this methodology, the system can be brought do not support the concept of process isolation, no time- into a clean state (SENTER instruction) after the firmware consuming context switches have to be performed. The general was loaded. In this approach as mentioned earlier only the idea behind unikernel systems is not entirely new, as it builds operating system level software gets measured. These mea- up on top of the concept of library operating systems such as surements can be compared with the original files / properties exokernel [17] or Nemesis [18]. The main difference to library of the OS that have to be known beforehand. 2) Software Guard Extensions (SGX): operating systems is that unikernels only run on hypervisors Sensitive tasks have and do not support bare metal deployments, whereas library to face an abundance of potential threats on both the software operating systems are targeting physical hardware. Due to and the hardware level. On the hardware level, sensitive the necessity to support physical hardware, library operating information such as encryption keys can be extracted from systems struggled with compatibility issues and proper re- the systems main memory using DMA attacks or cold boot source isolation among applications. Unikernels are solving attacks. On the software level, the worst case has to be these problems by using a hypervisor in order to abstract assumed and even the operating system has to be considered as from physical hardware and to provide strict resource isolation a potential threat. While the concept of processes implements between applications [15]. Currently, the two most popular a high level of isolation between different applications, the unikernel implementations are OSv [19] and Mirage OS [20]. elevated privileges of an operating system allow it to tamper with any process. These capabilities always pose a security According to Madhavapeddy et al. [15], unikernels are threat, not just in the obvious case where the operating system able to outperform regular operating systems in the following might not be fully trusted. Even with a trusted operating disciplines: system, there is always a certain risk that malicious code running in a separate process might gain elevated privileges. a) Boot time: Unikernel systems are single purpose As soon as that happens, a malicious application can tamper systems, meaning that they run only one application. Un- with any process running on the system. necessary overhead is stripped of by only linking libraries As a countermeasure to these threats, the Intel Software into a unikernel image which are required by the application. 
Guard Extensions (SGX) [23] introduced secure enclaves, As a result, very fast boot times can be achieved. In their which allow the safe execution of sensitive tasks even in latest project Jitsu: Just-In-Time Summoning of Unikernels untrustworthy environments. Enclaves are protected memory [21], Madhavapeddy et al. managed to achieve boot times areas, which are encrypted and entirely isolated (see Figure in the order of 350ms on ARM CPUs and 30ms on x86 2). Even privileged code is not able to access the contents CPUs, which enables the possibility of dynamically bringing of an enclave. One process can even use multiple enclaves, up virtual machines in response to network traffic. which allows a high degree of flexibility. SGX does not require b) Image size: Since a unikernel system only contains a trusted platform module (TPM), as the entire feature is the application and only the required functionality of the spe- implemented on the CPU. This level of integration reduces cialized operating system kernel, unikernel images are much the list of trusted vendors to the CPU manufacturer and thus smaller compared to traditional operating system images. The minimizes the number of potential attack vectors. smaller binaries simplify management tasks like live-migration C. Virtualization of coprocessors resources of running virtual machine instances. Coprocessors such as Graphics Processing Units (GPUs), c) Security: By eliminating functionality which is not Field-Programmable Gate Arrays (FPGAs) or Intel’s Many needed for the execution of an application inside a unikernel Integrated Core (MIC) devices have become essential com- image, the attack surface of the system is reduced massively. ponents in the High Performance Computing (HPC) field. Furthermore, the specialized operating system kernel of a Regarding the domain of cloud computing however, the uti- unikernel image is usually written in the same high-level lan- lization of coprocessors is not that well established. While guage as the application. The resulting absence of technology there has been little demand for HPC-like applications on borders facilitates additional opportunities for code checking cloud resources in the past, the demand for running scientific like static type checking and automated code checking. How- applications on cloud computing infrastructure has increased ever, even if an attacker should manage to inject malicious [25]. Furthermore, moving compute-intensive applications to code into a unikernel instance, it can only cause limited harm the cloud is becoming increasingly feasible [26] when it comes since no other application runs within the same image. to CPU-based tasks. While several providers already offer

85 to regular pass-through, mediated pass-through uses a trap- OS and-emulate mechanism is used to isolate virtual machine instances from each other. The main drawback is that the

Entry Table implementation of the mediated pass-through strategy has to Enclave be tailored to the specifications of each supported GPU, which Enclave Heap again requires detailed knowledge about the GPU design.

Enclave Stack Overall, this approach is only feasible for the manufacturers App Code of GPUs themselves. Enclave Code With rCUDA [31]–[33], vCUDA [34], gVirtuS [35], GViM App Code [36], VirtualCL [37] and VOCL [38], many approaches exist which are based on call forwarding. Originating from the field of High Performance Computing, the call forwarding approach Fig. 2: Secure enclaves provide an encrypted address space uses a driver stub in the guest operating system which redirects that is protected even from operating system access. Source: the calls to a native device driver in the privileged domain. [24] Since isolation is barely an issue in the HPC domain, none of the existing approaches implement isolation mechanisms. 2) Multiplexing: Sharing a single GPU among multiple cloud resources with integrated GPUs, their implementation is virtual machines is possible for all aforementioned implemen- based on pass-through of native hardware. Assigning dedicated tation strategies. In the faction of mediated pass-through im- devices to each virtual machine results in high operational plementations, both Intel GVT-g and NVIDIA GRID support costs and decreased levels of flexibility. In such setups, virtual multiplexing in order to serve multiple virtual machines with machines can neither be suspended nor migrated. Furthermore, a single GPU. As for isolation, a trap-and-emulate mechanism the one-to-one mapping between pass-through devices and in the hypervisor coordinates devices accesses from multiple virtual machines prevent efficient utilization of coprocessors, virtual machines. On the side of call forwarding approaches, which has a negative impact on cost effectiveness due to the the implementation of multiplexing capabilities with low over- high energy consumption of such hardware. Although projects head is very a tough challenge. In the privileged domain, exist that maximize resource utilization by providing unused additional logic has to be implemented that schedules requests GPU resources to other compute nodes [27] or that save energy from different guests. So far, only vCUDA [34] provides such by shutting down inactive compute nodes [28], a large gap multiplexing mechanisms. between the capabilities of GPU and CPU virtualization still 3) Interposition: While mediated pass-through approaches exists. excel call forwarding strategies in both isolation and multiplex- In the ensuing paragraphs, the state of the art of coprocessor ing, interposition is hard to achieve for mediated pass-through. virtualization is evaluated based on several characteristics. Although an implementation is possible in theory [39], it is not Most work deals with GPU compute devices, however the feasible in practice as it is susceptible to the slightest variations general techniques are applicable to other coprocessor classes on the hardware level. With vCUDA [34] and VOCL [38] on as well. For desktop-based GPU virtualization, Dowty and the other hand, multiple projects based on call forwarding exist Sugerman [29] define four characteristics that are to be con- that successfully implement interposition capabilities. Again, sidered: performance, fidelity, multiplexing and interposition. a piece of middleware is required in the hypervisor which Regarding cloud-based virtualization, the aforementioned enu- carefully tracks the state of each virtual GPU instance. With meration is missing isolation as an important characteristic. 
In such capabilities at hands, virtual machines can be suspended order to establish coprocessors in cloud computing, one of the and even live-migrated to other virtual machine hosts. most crucial characteristics is that multiple tenants have to be properly isolated. Since the focus is set on coprocessors and D. Best practices for secure coding thus compute-based capabilities instead of interactive graphics, Over the last years, several best practice collections and fidelity can mostly be ignored for our use case. Last but not frameworks dealing with improving the security of information least, performance should not suffer severely from the virtu- technology were established and maintained by companies and alization overhead. However, without isolation, multiplexing public authorities alike. and interposition capabilities, performance is worthless in the 1) Critical Security Controls: The term “Security Fog of cloud computing use case. More” was established by Tony Sager, a chief technologist of 1) Isolation: Thus far, isolation is only addressed by ap- the Council on CyberSecurity. He noticed that security pro- proaches that make use of mediated pass-through strategies fessionals are confronted with a plethora of security products like Intel GVT-g (formerly called gVirt) [30] and NVIDIA and services. These choices are influenced by compliance, GRID. While the latter is a commercial closed-source imple- regulations, frameworks and audits e.g. the “Security Fog of mentation, the implementation details of GVT-g are publicly More”. As a consequence, one of the main challenges today available as an open source project. In Intel’s approach, each is making an educated choice. Sager wants to help security virtual machine runs the native graphics driver. In contrast professionals by providing a framework for security choices

86 called Critical Security Controls [21] (see Figure 3) that spans oriented research prototypes. On the other end of the scale 20 different areas of IT security containing suggestions for are hardware security features such as TXT and SGX. each of these areas with a focus on scalability of the solutions. The Software Guard Extensions is an interesting new feature that can be used to evaluate problems that require the execution of crucial code in untrustworthy requirements. However, it should be noted that until the day of writing, no commercially available processor implements the SGX feature. Moreover, it is even unclear when such a processor can be expected to become available. Under the bottom line, it seems like SGX provides various research opportunities, however the focus for near future projects should be shifted to different topics. With virtualization being a key technology in cloud com- puting, it is important to keep an eye on new virtualiza- tion concepts. With the advent of containerization, a new approach to virtualization has surfaced that tries to minimize the performance overhead caused by an additional level of context switches. While containers have already achieved a Fig. 3: The Critical Security Controls framework categorizes certain prevalence rate, unikernels are a recent re-discovery of security threats in 20 classes. Source: [40] an old concept. Unikernels should be considered as a direct competition to containers, since they also address mitigation 2) Open Web Application Security Project: The Open Web of virtualization overhead while maintaining a thorough level Application Security Project (OWASP) is a non-profit orga- of isolation. Even though there is a certain risk that unikernels nization founded 2004 with the goal of improving software might be a fashionable trend, eventual benefits over traditional security. The OWASP houses a wide range of security related virtual machines and containerization approaches should be projects centered around all aspects of software development evaluated. With boot times of tens of milliseconds, the use driven by a large community of volunteers. We will provide a of unikernels might enable new degrees of dynamic resource brief overview over a selection of the most popular OWASP utilization and improved power management. projects: In the subject area of virtualization, server-based virtual- a) OWASP Developer Guide: The OWASP Developer ization of coprocessor resources, most importantly Graphics Guide was the first project pursued by OWASP. In its latest Processing Units (GPUs), is another aspect that has been revision, the guide describes general concepts about develop- neglected in the past. High operational costs are caused by ing secure software without a focus on specific technologies. poorly utilized devices. Even though some approaches exist The guide covers topics such as Architecture, Design, Build that allow resource multiplexing, the near absence of proper Process and Configuration of secure software and is targeting isolation has been a deal-breaker for the cloud computing developers as its target audience. The instructions can be used scenario thus far. as additional guidelines for penetration testers as well. V. OUTLOOK b) OWASP Testing Guide: The OWASP Testing Guide is a best practice collection for penetration testing of web Recalling the topics presented and discussed in Sections III applications and services. 
The guide covers the software and IV, many approaches exist for providing increased levels development process as well as testing approaches for different of security in the use case of cloud computing. While best parts of web applications (e.g. Authentication, Encryption or practices collections may be beneficial for everyday use, they Input Validation). are of limited use for technically oriented research interests. c) OWASP Top 10: The OWASP Top 10 is a list main- It seems as if technological improvements like the Software tained by security experts which contains the 10 most prevalent Guarded Extensions (SGX) are an interesting target for further security flaws in web applications. The goal of this list to research efforts. However, the uncertain date of availability establish a security awareness in IT companies to prevent of the technology enforces a postponed examination of the the occurrence of the most common vulnerabilities in their topic. Regarding new virtualization approaches, there is a applications. certain risk that unikernels are a fashionable trend that might disappear rather sooner than later. However, the crucial role IV. DISCUSSION of virtualization in cloud computing suggests that unikernels The state of the art presented in the previous section has and containers should be evaluated more thoroughly. From a demonstrated that a vast variety of approaches exist that can be functional perspective, these new virtualization approaches do accommodated under the headline security-aware approaches have the potential to improve both security aspects as well for cloud-based storage and processing technologies. While as performance. Regarding the non-function side of the topic, soft approaches such as best practice collections are beneficial unikernels might enable us to improve both dynamic resource for everyday use, they are of limited use for technically utilization and power management strategies. Last but not

87 least, employing coprocessor resources in cloud computing is a [21] Council on CyberSecurity, “The Critical Security Controls for Effective topic that requires extensive research efforts. In order to move Cyber Defense Version 5.0,” https://www.sans.org/, Tech. Rep., February 2014. from dedicated devices to truly shared resources, security is [22] Intel Corporation, “Intel trusted execution technology white paper,” a major concern that has not been solved yet. Lightweight http://www.intel.com/content/dam/www/public/us/en/documents/ isolation mechanisms have to be researched that provide tight white-papers/trusted-execution-technology-security-paper.pdf, online, Accessed 31.07.2015. levels of isolation while inducing bearable levels of overhead [23] F. McKeen, I. Alexandrovich, A. Berenzon, C. V. Rozas, H. Shafi, compared to native hardware. V. Shanbhogue, and U. R. Savagaonkar, “Innovative instructions and software model for isolated execution,” in Proceedings of the 2Nd REFERENCES International Workshop on Hardware and Architectural Support for Security and Privacy, ser. HASP ’13. New York, NY, USA: ACM, [1] C. Neuhaus, M. von Lowis,¨ and A. Polze, “A dependable and secure 2013, pp. 10:1–10:1. authorisation service in the cloud,” in CLOSER, 2012, pp. 568–573. [24] Intel Corporation, “Intel c Software Guard Extensions Programming [2] J.-C. Fabre, Y. Deswarte, and B. Randell, Designing secure and reli- Reference,” Tech. Rep., Oct. 2014. able applications using fragmentation-redundancy-scattering: an object- [25] S. Benedict, “Performance issues and performance analysis tools for hpc oriented approach. Springer, 1994. cloud applications: a survey,” Computing, vol. 95, no. 2, pp. 89–108, [3] C. Neuhaus, F. Feinbube, A. Polze, and A. Retik, “Scaling software 2013. experiments to the thousands,” in CSEDU 2014 - Proceedings of the 6th [26] K. Mantripragada, A. Binotto, L. Tizzei, and M. Netto, “A feasibility International Conference on Computer Supported Education, Volume 1, study of using hpc cloud environment for seismic exploration,” in 77th Barcelona, Spain, 1-3 April, 2014, 2014, pp. 594–601. EAGE Conference and Exhibition 2015, 2015. [4] C. Neuhaus, F. Feinbube, and A. Polze, “A platform for interactive [27] P. Markthub, A. Nomura, and S. Matsuoka, “Using rCUDA to Reduce software experiments in massive open online courses,” Journal of GPU Resource-assignment Fragmentation caused by Job Scheduler,” in Integrated Design and Process Science, vol. 18, no. 1, pp. 69–87, 2014. 15th International Conference on Parallel and Distributed Computing, [5] F. Westphal, S. Axelsson, C. Neuhaus, and A. Polze, “VMI-PL: A Applications and Technologies, 2014. monitoring language for virtual platforms using virtual machine intro- [28] P. Lama, Y. Li, A. M. Aji, P. Balaji, J. Dinan, S. Xiao, Y. Zhang, W.- spection,” Digital Investigation, vol. 11, no. S-2, pp. S85–S94, 2014. c. Feng, R. Thakur, and X. Zhou, “pVOCL: Power-Aware Dynamic [6] LibVMI Project, “LibVMI,” http://libvmi.com, accessed: 2015-07-17. Placement and Migration in Virtualized GPU Environments,” in 2013 [7] VMware, Inc., “VProbes Programming Reference,” Tech. Rep., 2011. IEEE 33rd International Conference on Distributed Computing Systems. [8] C. Neuhaus, F. Feinbube, D. Janusz, and A. Polze, “Secure Keyword IEEE, Jul. 2013, pp. 145–154. Search over Data Archives in the Cloud: Performance and Security [29] M. Dowty and J. 
Sugerman, “Gpu virtualization on vmware’s hosted i/o Aspects of Searchable Encryption,” in 5th International Conference on architecture,” ACM SIGOPS Operating Systems Review, vol. 43, no. 3, Cloud Computing and Services Science, . ACM, 2015. pp. 73–82, 2009. [9] E.-J. Goh et al., “Secure indexes.” IACR Cryptology ePrint Archive, vol. [30] K. Tian, Y. Dong, and D. Cowperthwaite, “A full GPU virtualization 2003, p. 216, 2003. solution with mediated pass-through,” in Proc. USENIX ATC, 2014. [10] Xen Project, “The Next Generation Cloud: The Rise of the Unikernel,” [31] J. Duato, F. D. Igual, R. Mayo, A. J. Pena,˜ E. S. Quintana-Ort´ı, and http://xenproject.org, Tech. Rep., 2015. F. Silla, “An Efficient Implementation of GPU Virtualization in High [11] S. Soltesz, H. Potzl,¨ M. E. Fiuczynski, A. Bavier, and L. Peter- Performance Clusters,” in Euro-Par 2009 Parallel Processing Work- son, “Container-based operating system virtualization: a scalable, high- shops, ser. Lecture Notes in Computer Science, H.-X. Lin, M. Alexander, performance alternative to hypervisors,” in ACM SIGOPS Operating M. Forsell, A. Knupfer,¨ R. Prodan, L. Sousa, and A. Streit, Eds. Berlin, Systems Review, vol. 41, no. 3. ACM, 2007, pp. 275–287. Heidelberg: Springer Berlin Heidelberg, 2009, vol. 6043, pp. 385–394. [12] M. G. Xavier, M. V. Neves, F. D. Rossi, T. C. Ferreto, T. Lange, [32] J. Duato, A. J. Pena, F. Silla, R. Mayo, and E. S. Quintana-Orti, and C. A. De Rose, “Performance evaluation of container-based vir- “rCUDA: Reducing the number of GPU-based accelerators in high tualization for high performance computing environments,” in Parallel, performance clusters,” in 2010 International Conference on High Per- Distributed and Network-Based Processing (PDP), 2013 21st Euromicro formance Computing & Simulation. IEEE, Jun. 2010, pp. 224–231. International Conference on. IEEE, 2013, pp. 233–240. [33] J. Duato, A. J. Pena, F. Silla, J. C. Fernandez, R. Mayo, and E. S. [13] W. Felter, A. Ferreira, R. Rajamony, and J. Rubio, “An updated per- Quintana-Orti, “Enabling CUDA acceleration within virtual machines formance comparison of virtual machines and linux containers,” Tech. using rCUDA,” in 2011 18th International Conference on High Perfor- Rep., 2014. mance Computing. IEEE, Dec. 2011, pp. 1–10. [14] A. Madhavapeddy and D. J. Scott, “Unikernels: Rise of the virtual library [34] L. Shi, H. Chen, J. Sun, and K. Li, “vCUDA: GPU-accelerated high- operating system,” Queue, vol. 11, no. 11, p. 30, 2013. performance computing in virtual machines,” Computers, IEEE Trans- [15] A. Madhavapeddy, R. Mortier, C. Rotsos, D. Scott, B. Singh, T. Gaza- actions on, vol. 61, no. 6, pp. 804–816, 2012. gnaire, S. Smith, S. Hand, and J. Crowcroft, “Unikernels: Library [35] G. Giunta, R. Montella, G. Agrillo, and G. Coviello, “A GPGPU operating systems for the cloud,” in ACM SIGPLAN Notices, vol. 48, transparent virtualization component for high performance computing no. 4. ACM, 2013, pp. 461–472. clouds,” in Euro-Par 2010-Parallel Processing. Springer, 2010, pp. [16] D. Schatzberg, J. Cadden, O. Krieger, and J. Appavoo, “A way forward: 379–391. enabling operating system innovation in the cloud,” in Proceedings of the [36] V. Gupta, A. Gavrilovska, K. Schwan, H. Kharche, N. Tolia, V. Tal- 6th USENIX conference on Hot Topics in Cloud Computing. USENIX war, and P. Ranganathan, “GViM: GPU-accelerated Virtual Machines Association, 2014, pp. 4–4. Vishakha,” in Proceedings of the 3rd ACM Workshop on System-level [17] D. R. Engler, M. F. 
Kaashoek et al., Exokernel: An operating system Virtualization for High Performance Computing - HPCVirt ’09. New architecture for application-level resource management. ACM, 1995, York, New York, USA: ACM Press, Mar. 2009, pp. 17–24. vol. 29, no. 5. [37] A. Barak and A. Shiloh, “The VirtualCL (VCL) Cluster Platform.” [18] D. E. Porter, S. Boyd-Wickizer, J. Howell, R. Olinsky, and G. C. Hunt, [38] S. Xiao, P. Balaji, Q. Zhu, R. Thakur, S. Coghlan, H. Lin, G. Wen, “Rethinking the library os from the top down,” ACM SIGPLAN Notices, J. Hong, and W.-c. Feng, “VOCL: An Optimized Environment for vol. 46, no. 3, pp. 291–304, 2011. Transparent Virtualization of Graphics Processing Units,” in Proceedings [19] A. Kivity, D. Laor, G. Costa, P. Enberg, N. Har’El, D. Marti, and of 1st Innovative Parallel Computing (InPar), 2012, pp. 1–12. V. Zolotarov, “Osv—optimizing the operating system for virtual ma- [39] E. Zhai, G. D. Cummings, and Y. Dong, “Live migration with pass- chines,” in 2014 usenix annual technical conference (usenix atc 14), through device for Linux VM,” in OLS08: The 2008 Ottawa Linux vol. 1. USENIX Association, 2014, pp. 61–72. Symposium, 2008, pp. 261–268. [20] A. Madhavapeddy, R. Mortier, R. Sohan, T. Gazagnaire, S. Hand, [40] F. T. Insider, “Continuous Diagnostics and Mitigation T. Deegan, D. McAuley, and J. Crowcroft, “Turning down the lamp: Addresses “Foundational” Issues Identified by SANS,” software specialisation for the cloud,” in Proceedings of the 2nd USENIX http://www.federaltechnologyinsider.com/cdm-addresses-foundational- conference on Hot topics in cloud computing, HotCloud, vol. 10, 2010, issues-identified-sans/, May 2014. pp. 11–11.

88 OntQA-Replica: Intelligent Data Replication for Ontology-Based Query Answering (Revisited and Verified)

Lena Wiese Research Group Knowledge Engineering Institut fur¨ Informatik Georg-August-Universitat¨ Gottingen¨ Goldschmidtstraße 7 37077 Gottingen¨ [email protected]

Abstract tion from the database. In contrast, flexible query an- swering systems internally revise failing user queries We extend previous work in the OntQA-Replica project themselves and – by evaluating the revised query – re- by validating the results in a larger database clus- turn answers to the user that are more informative for ter (10 nodes) and a larger dataset (four times larger the user than just an empty answer. One way to ob- than previously). The project aims to improve the per- tain such informative answers is to use an ontology to formance of ontology-based query answering in dis- return answers that are related to the original search tributed databases by employing a preprocessing pro- term according to some notion of similarity. For ex- cedure (including a clustering step and a fragmenta- ample, in an electronic health record, when searching tion step): for efficient query answering, data records for the term cough, the terms bronchitis and asthma that are semantically related are grouped in the same might be similar to cough and might be returned as re- data fragment based on a notion of similarity in the lated answers. Unfortunately, finding related answers ontology. At the same time, the OntQA-Replica project at runtime by consulting the ontology for each query supports an intelligent data replication approach that is highly inefficient. minimizes the amount of resources used and enables an efficient recovery in case of server failures. 2 Ontology-Driven Fragmentation

Our aim is to extend previous work [3] to become a 1 Introduction scalable method for flexible query anwering. The ba- sic idea is to apply a clustering procedure to partition In a cloud storage system, a distributed database man- the original tables into fragments based on a relaxation agement system (DDBMS; see [4]) can be used to attribute chosen for anti-instantiation. Finding these manage the data in a network of servers. The deci- fragments is achieved by grouping (that is, clustering) sive features are replication (for recovery and scala- the values of the respective table column (correspond- bility purposes) and load balancing (data distribution ing to the relaxation attribute); for this a notion of sim- according to the capacities of servers). Previous work ilarity in the project aimed at facilitating the partitioning of Next, the table is split into fragments according to the large data tables based on similarity of values in an clusters found such that the following formal defini- ontology or taxonomy. Hence due to its preprocess- tion applies. ing nature, it provides an efficient means to support a Clustering-based fragmentation: Let A be a relax- form of flexible query answering. Flexible query an- ation attribute; let F be a table instance (a set of tu- swering [1] offers mechanisms to intelligently answer ples); let C = {c1, . . . cn} be a complete clustering of user queries going beyond conventional exact query the active domain πA(F ) of A in F ; let head i ∈ ci; answering. If a database system is not able to find an then, a set of fragments {F1,...,Fn} (defined over exactly matching answer, the query is said to be a fail- the same attributes as F ) is a clustering-based frag- ing query. Conventional database systems usually re- mentation if turn an empty answer to a failing query. In most cases, • Horizontal fragmentation: for every fragment Fi, this is an undesirable situation for the user, because F ⊆ F he has to revise his query and send the revised query i to the database system in order to get some informa- • Clustering: for every Fi there is a cluster ci ∈ C

89 Fragment 1: cough, consists of 117 fragments which are each stored in a bronchitis, Table column: asthma cough, smaller table called ill-i where i is the cluster ID. To query rewrite & cluster & brokenLeg, allow for a comparison and a test of the recovery strat- user redirect fragment bronchitis, brokenArm, Fragment 2: egy, another fragmentation of the table was done using asthma brokenLeg, brokenArm round robin resulting in a table called ill-rr; this dis- tributes the data among the database servers in chunks Figure 1: Ontology-driven fragmentation of equal size without considering their semantic rela- tionship; these fragments have an extra column called clusterid. such that ci = πA(Fi) (that is, the active domain In order to manage the fragmentation, several meta- of Fi on A is equal to a cluster in C) data tables are maintained:

• Threshold: for every a ∈ ci (with a =6 head i) it • A root table stores an ID for each cluster (col- holds that sim(a, head i) ≥ α umn clusterid) as well as the cluster head (col- • Completeness: For every tuple t in F there is an umn head) and the name of the server that hosts the cluster (column serverid). Fi in which t is contained • A lookup table stores for each cluster ID (column • Reconstructability: F = F1 ∪ ... ∪ Fn clusterid) the tuple IDs (column tupleid) of those • Non-redundancy: for any i =6 j, Fi ∩ Fj = ∅ (or tuples that constitute the clustered fragment. in other words ci ∩ cj = ∅) • A similarities table stores for each head term More formally, we apply the clustering approach de- (column head) and each other term (column term) scribed in [3] (or any other method to semantically that occurs in the active domain of the corre- split the attribute domain into subsets) on the relax- sponding relaxation attribute a similarity value ation attribute, so that each cluster inside one cluster- between 0 and 1 (column sim). ing is represented by a head term (also called proto- type) and each term in a cluster has a similarity sim to The original query has to be rewritten in order to con- the cluster head above a certain threshold α. We then sider all the related terms as valid answers. We tested obtain a clustering-based fragmentation for the origi- and compared three query rewriting procedures: nal table F into fragments. • lookup table: the first rewriting approach uses the lookup table to retrieve the tuple IDs of the corre- 3 Evaluation with larger Dataset in sponding rows and executes a JOIN on table ill. larger Cluster • extra clusterid column: the next approach relies Our experimental evaluation – the OntQA-Replica on the round robin table and retrieves all relevant system – was run on a distributed SAP HANA instal- tuples based on a selection predicate on the clus- lation with 10 database server nodes (and hence on a terid column. larger cluster than in previous experiments) provided • clustered fragmentation: the last rewriting ap- by the Future SOC Lab of Hasso Plattner Institute. All proach replaces the occurrences of the ill table by runtime measurements are taken as the median of sev- the corresponding ill-i table for clusterid i. eral (at least 5) runs per experiment. The example data set consists of a table (called ill) that resembles a medical health record and is based 3.1 Identifying the matching cluster on the set of Medical Subject Headings (MeSH [2]). The table contains as columns an artificial, sequen- Flexible Query Answering intends to return those tial tupleid, a random patientid, and a disease chosen terms belonging to the same cluster as the query term from the MeSH data set as well as the concept iden- as informative answers. Before being able to return the tifier of the MeSH entry. We varied the table sizes related terms, we hence have to identify the matching during our test runs. The smallest table consists of cluster: that is, the ID of the cluster the head of which 56,341 rows (one row for each MeSH term). We in- has the highest similarity to the query term. creased that table size by duplicating the original data The relaxation term t is extracted from the query and up to 12 times (hence four times more than in previ- then the top-1 entry of the similarities table is obtained ous runs), resulting in 230,772,736 rows. 
A clustering when ordering the similarities in descending order: is executed on the MeSH data based on the concept SELECT TOP 1 root.clusterid identifier (which orders the MeSH terms in a tree); in FROM root, similarities other words, entries from the same subconcept belong WHERE similarities.term=’t’ to the same cluster. One fragmentation (the clustered AND similarities.head = root.head fragmentation) was obtained from this clustering and ORDER BY similarities.sim DESC

90 All query rewriting strategies require the identification SELECT a.mesh,a.concept,a.patientid, of the matching cluster previously. This is done us- a.tupleid,b.address ing one query over the similarities table as described FROM ill AS a,info AS b above. That table has 6,591,897 rows (56341 rows of WHERE mesh=’cough’ the basic data set times 117 cluster heads). The run- AND b.patientid= a.patientid time measurements for this query show a decent per- formance of at most 24 ms. In the first strategy (lookup table) the rewritten query is 3.2 Query Answering without Derived Fragments SELECT a.mesh,a.concept,a.patientid, a.tupleid,b.address,lookup.clusterid Assume the user sends a query FROM ill AS a,info AS b,lookup SELECT mesh,concept,patientid,tupleid WHERE lookup.tupleid=a.tupleid FROM ill WHERE mesh =’cough’. AND lookup.clusterid=101 and 101 is the ID of the cluster containing cough. In AND b.patientid= a.patientid. the first strategy (lookup table) the rewritten query is In the second strategy (extra clusterid column) the SELECT mesh,concept,patientid,tupleid rewritten query is FROM ill JOIN lookup SELECT a.mesh,a.concept,a.patientid, ON (lookup.tupleid = ill.tupleid a.tupleid,b.address AND lookup.clusterid=101). FROM ill AS a,info AS b In the second strategy (extra clusterid column) the WHERE a.clusterid=101 rewritten query is AND b.patientid=a.patientid. SELECT mesh,concept,patientid,tupleid In the third strategy (clustered fragmentation), the FROM ill-rr WHERE clusterid=101 rewritten query is In the third strategy (clustered fragmentation), the SELECT a.mesh,a.concept,a.patientid, rewritten query is a.tupleid,b.address SELECT mesh,concept,patientid,tupleid FROM ill-101 AS a FROM ill-101 JOIN info-101 AS b ON (a.patientid=b.patientid). 0.6 Lookup Table We devised two test runs: test run one uses a small Extra ClusterID Column 0.5 Clustered Partitioning secondary table (each patient ID occurs only once) and test run two uses a large secondary table (each patient 0.4 ID occurs 256 times). For the first rewriting strategy

[s] (lookup table) the secondary table is a non-fragmented 0.3

time table. For the second strategy, the secondary table is

0.2 distributed in round robin fashion, too. For the last rewriting strategy, the secondary table is fragmented 0.1 into a derived fragmentation: whenever a patient ID occurs in some fragment in the ill-i table, then the cor- 0 0 2 4 6 8 10 12 responding tuples in the secondary table are stored in number of data set duplications a fragment info-i on the same server as the primary Figure 2: Queries without derived partitioning fragment. Figure 3 presents the runtime measurements for queries with derived fragments with the small sec- The runtime measurements in Figure 2 in particular ondary table (one matching tuple in the secondary ta- show that the lookup table approach does not scale ble for each tuple in the primary table). It can be with increasing data set size. The extra cluster-id col- observed that the necessary join operation causes all umn performs better, but does not scale either, when three approaches to perform significantly worse. The the data set becomes very large. The approach us- clustered partitioning strategy still shows the best per- ing clustered partitioning outperforms both by having formance with being roughly twice as fast as the other nearly identical runtimes for all sizes of the test data ones. While the lookup table approach performed set. Note, that after duplicating the data set 12 times it worst without derived fragments, it performed better is 4096 times as large as the basic data set. than the extra cluster-id column strategy when tested with derived fragments using small secondary tables. 3.3 Query Answering with Derived Fragments However, as can be seen in figure 4 both approaches are clearly outperformed by the clustered partitioning We tested a JOIN on the patient ID with a secondary strategy when the secondary table is large (256 match- table called info having a column address. The origi- ing tuples in the secondary table for each tuple in the nal query is primary table). It delivers feasible performance up to

91 6 – 7 data set duplications, while the lookup table and VALUES (’m’,’c’,1,1,i). extra cluster-id column approaches fail in doing so af- For the third rewriting strategy, the matching clustered ter only 2 – 3 data set duplications. fragment is updated: INSERT INTO ill-i 0.6 Lookup Table VALUES (’m’,’c’,1,1). Extra ClusterID Column 0.5 Clustered Partitioning After the insertions we made a similar test by deleting the newly added tuples. 0.4 Deletions require queries of the basic form [s] 0.3 DELETE FROM ill WHERE mesh=’m’. time

0.2 In the first rewriting strategy, the corresponding row in the lookup table has to be deleted, too, so that now first 0.1 the corresponding tuple id of the to-be-deleted row has to be obtained and then two deletion queries are exe- 0 0 2 4 6 8 10 12 cuted: number of data set duplications DELETE FROM lookup Figure 3: Queries with derived partitioning (small WHERE lookup.tupleid secondary tables) IN (SELECT ill.tupleid FROM ill WHERE mesh=’m’). DELETE FROM ill WHERE mesh=’m’ 20 Lookup Table Extra ClusterID Column For the second rewriting strategy, no modification is Clustered Partitioning necessary apart from replacing the table name and no 15 clusterid is needed: DELETE FROM ill-rr WHERE mesh=’m’ [s] 10 For the third rewriting strategy, the matching clustered time fragment i is accessed which has to be identified first (Section 3.1): 5 DELETE FROM ill-i WHERE mesh=’m’

0 0 2 4 6 8 10 12 2 Lookup Table number of data set duplications Extra ClusterID Column Clustered Partitioning Figure 4: Queries with derived partitioning (large 1.5 secondary tables)

[s] 1 time 3.4 Insertions and Deletions 0.5 We tested the update behavior for all three rewriting strategies by inserting 117 new rows (one for each 0 0 2 4 6 8 10 12 cluster). number of data set duplications Any insertion requires identifying the matching cluster i again (Section 3.1). Each insertion query looks like Figure 5: Insertion this for mesh term m, concept c, patientid 1 and tupleid 1: INSERT INTO ill As shown in Figure 5, the runtime for insertions ap- VALUES (’m’,’c’,1,1). pears to be constant for all approaches. Interestingly In the first rewriting strategy, the lookup table has to only the round robin approach performs worse by fac- be updated, too, so that two insertion queries are exe- tor 2.5. cuted: Figure 6 presents the measurements for deletions. INSERT INTO ill Here the runtimes for the extra cluster-id column and VALUES (’m’,’c’,1,1). clustered partitioning approach is constant and on a INSERT INTO lookup similar level, while the lookup table strategy performs VALUES (i,1). roughly 4 times worse due to its higher complexity. For the second rewriting strategy, the extra clusterid Starting from a certain data set size the deletion time column contains the identified cluster i: of this approach even begins to grow significantly fur- INSERT INTO ill-rr ther.

92 5 timeframe. Lookup Table Extra ClusterID Column Clustered Partitioning 4 4 Conclusion and Future Work

3

[s] With the larger cluster and the larger dataset, we ver-

time 2 ified the result that – due to the small size of the par- titioned tables – the runtime performance is best for

1 the clustered partitioning approach and the overhead of metadata management is negligible. In particular, it

0 outperforms the lookup table approach that stores for 0 2 4 6 8 10 12 each cluster the corresponding tuple IDs does not scale number of data set duplications well as the data set size grows. Figure 6: Deletion Current work covers the following extensions: • We are adding data locality constraints to the sys- 3.5 Recovery tem so that fragments that are accessed together frequently can be placed on the same server. Lastly, we tested how long it takes to recover the clus- • We are adding the feature of a more dynamic tered fragmentation by either using the lookup table replication based on common subfragments and or the extra column ID. For the lookup table approach optional conflict constraints. this is done using the following query on the original table and the lookup table by running for each cluster • We are investigating dynamic adaptation of the i: clustering: whenever values are inserted or INSERT INTO ci SELECT mesh, concept, deleted, the clustering procedure on the entire patientid, ill.tupleid FROM ill data set might lead to different clusters. JOIN lookup • Comparing R and the PAL for obtaining a clus- ON (lookup.tupleid=ill.tupleid) tering of terms. WHERE lookup.clusterid=i for each cluster i. For the round robin fragmented table with the extra References clusterid column the query for each cluster i is as fol- lows: [1] K. Inoue and L. Wiese. Generalizing conjunctive queries for informative answers. In Flexible Query An- INSERT INTO ci SELECT mesh, concept, patientid, swering Systems, pages 1–12. Springer, 2011. [2] U.S. National Library of Medicine. Medical subject tupleid FROM ill-rr headings. http://www.nlm.nih.gov/mesh/. WHERE clusterid=i [3] L. Wiese. Clustering-based fragmentation and data replication for flexible query answering in distributed In both cases this results one ci table per cluster. databases. Journal of Cloud Computing, 3(1):1–15, 2014. 100 [4] L. Wiese. Advanced Data Management: For SQL, Lookup Table Extra ClusterID Column NoSQL, Cloud and Distributed Databases. De-

80 Gruyter/Oldenbourg, 2015.

60 [s]

time 40

20

0 0 2 4 6 8 10 12 number of data set duplications

Figure 7: Recovery

As can be seen in figure 7 both recovery procedures become unfeasible very quickly with the approach for the extra cluster-id column strategy being able to han- dle 2 – 3 data set duplications more in an acceptable

93

Natural Language Processing for In-Memory Databases: Boosting Biomedical Applications

Mariana Neves Hasso-Plattner-Institut Prof.-Dr.-Helmert-Str. 2-3 14482 Potsdam [email protected]

Abstract 2 Linguistic processing

Clinicians and researchers in the biomedical domain There is a variety of preliminary NLP tasks (cf. Fig- often need to look for information about particular ure 1) [2] which are usually carried out in documents disease, gene or protein. The scientific publications in many NLP applications. For instance, each docu- are valuable source of information but looking for a ment needs to split by sentences and the later sepa- specific answer and fact among the millions of ab- rated by tokens (words). The next step usually con- stracts available at the PubMed database might take sists in classifying each token with the corresponding many hours or days. Natural language processing al- part-of-speech (POS) tag, i.e., assigning whether the lows linguistic and semantic processing of textual doc- word is an adjective, verb, noun, etc. The POS tags uments while the in-memory database technology has are used in various NLP tasks, e.g., for named-entity the potential of speeding up this process to support the recognition, i.e., for identifying named entities, such development of real-time applications. In this report, as genes and disease names in textual documents. Ex- I describe the activities that me and my students per- amples of more advanced linguistic processing include formed in the last six months when making use of a chunking and semantic role labeling (SRL). Chunk- large SAP HANA instance which belongs to the HPI ing, also called shallow parsing, consists in splitting Future SOC Lab resources. the sentence into chunks or phrases, followed by clas- sification in predefined classes, such as noun or ver- bal phrases. This is an important input for many NLP 1 Introduction applications, and specially for relationship extraction. Finally, SRL consists in identifying predefined pred- The current data deluge demands fast and real-time icates, usually a verb, and their corresponding argu- processing of large datasets to support various ap- ments, usually nouns or noun phrases. This infor- plications, also for textual data, such as scientific mation is particularly helpful for applications which publications. In-memory database (IMDB) tech- need to rely on advanced linguistic processing, such nology allows scalability of traditional methods to as question answering. process large document collections. In this report, I give an overview of the resources that I am currently using in the HPI Future SOC Lab, the projects that makes use of these resources and future work I plan to carry out in future.

I rely on an instance of 1 Tb of memory of the SAP HANA database from the HPI Future SOC lab, into which I imported around 15 millions scientific publi- cations (titles and abstracts) which were retrieved from the PubMed database1. This large collection of doc- uments is currently being used for various research Figure 1: Example of the various linguistic process- activities and applications related to natural language ing tasks which were applied to the sentence “What processing (NLP) and text mining, as described in the disease is mirtazapine predominantly used for?”. next sections of this report. 1http://www.ncbi.nlm.nih.gov/pubmed From the above described NLP linguistic tasks,

95 tokenization and POS tagging are automatically and (c) extraction of the answer and generation of an computed in HANA when creating a full text index for adequate summary to complement the answer. a column which contain textual documents. However, the returned POS tags and words are frequently incor- I had previously worked on QA using the SAP HANA rect when processing documents from the biomedical database, but my research was limited to the passage domain, as specific methods trained in biomedical retrieval component. Further, I did not use a local documents are usually necessary to obtain precise copy of the PubMed database and relied on a small results. Moreover, chunking and semantic role label- collection of around 100,000 publications previously ing is still inexistent in the HANA database, which retrieved from PubMed. During the Master Project hinders its utilization for some NLP applications, such “Ask your database” in the HPI summer term, six as question answering and information extraction. students (Tim Draeger, Daniel Dummer, Alexander Ernst, Pedro Flemming, Ricarda Schuler¨ and Frederik During the Master Project “Ask your database” in Schulze) worked in the development of a QA system. the HPI summer term, three students (Fabian Eckert, The team implemented new methods as stored proce- David Heller and Thomas Hille) analyzed the suit- dures for all QA steps and developed a Web interface3 ability of the existing NLP features in HANA in the for the system (cf. Figure 2). scope of the development of a question answering system for biomedicine. The students evaluated the tokenization feature of the SAP HANA database and developed a post-processing step to correct some of the words which were incorrectly tokenized. The problem occurs frequently due to the complexity of the biomedical nomenclature which includes many characters other than letters, e.g., hyphens, numbers, slashes, such as in gene and chemical names.

Additionally, the students implemented new methods for chunking and SRL in the HANA database. Both approaches relied on the machine learning (ML) al- gorithms available in the Predictive Analysis Library (PAL), such as Support Vector Machines, Naive Bayes Figure 2: Screen-shot of the question answering and Decision Trees. When using supervised learn- system for biomedicine. ing algorithms, a first step is training a model based on manually annotated data, followed by the evalua- tion of the model on unknown data. Therefore, the The students also integrated various terminologies students used the Genia [3] and the BioProp [4] cor- from UMLS database into HANA, which were used pora for training and evaluating the chunking and SRL to detect entities of various types (genes, diseases, components, respectively, and obtained 90% and 80% chemicals, etc) both in the questions and in the precision for the tasks, respectively. documents. For this purpose, they used the built-in dictionary-based named-entity recognition feature 3 Question answering available in HANA. For the question processing step, they relied on the chunking method developed by their colleagues (cf. above) and implemented Question answering (QA) [1] is the task of posing methods to predict the question type (yes/no, factoid, questions, instead of only keywords, to a system list or summary) and the target of the question. e.g., and getting exact answers in return, instead of only a gene or disease name, and to construct a query potentially relevant documents. For instance, for the derived from the question. For the passage retrieval question “What disease is mirtazapine predominantly 2 step, the students implemented information retrieval used for?” derived from the BioASQ dataset , the methods to select and rank relevant sentences to the user expects to receive one or more disease names in query. Finally, in the answer extraction step, the return, e.g., “depression”, instead of documents which students created methods for extracting the answers, match the question keywords. Besides providing based on the previously recognized entities, and for exact answers, QA systems can also return tailored automatically retrieving and ordering sentences to summaries for the questions for providing more generate a tailored summary which provide further details and context for the answer. QA systems answer to the question. All steps of the QA system usually include there main steps [2]: (a) question were evaluated on the BioASQ dataset [5]. processing and generation, (b) retrieval of relevant passages (sentences) which are relevant to the query, 3http://vm-hig-mp2015.eaalab.hpi. 2http://bioasq.org/ uni-potsdam.de:9050/

96 Despite the success of the Master Project, the stu- dent have reported various problems when using the HANA database for developing the QA system and, especially, when dealing with the huge collection of around 15 millions documents from PubMed. Index- ing these documents into the database requires care- fully running a sequence of tasks which consists in: (i) creating the table, (ii) creating an empty full text index on the empty table, (iii) partitioning the index, Figure 3: Examples of parallel sentences in four and (iv) importing the documents gradually. This pro- languages: English (EN), French (FR), Portuguese cedure is even more complex due to the need of hav- (PT) and Spanish (ES). ing a duplication of the data, as two identical columns were created in the table into which the publications were imported. This is due a limitation in HANA that Bachelor Project HP2 “Text Mining for Biomedical only allows one index per column, when we need two Applications” during the next winter term and year, of them, one for the linguistic processing (words, sen- respectively. tences, POS tags) and one for the semantic process- ing (e.g., detection of gene and disease names). Cur- In the scope of the Master Project HP/N, four stu- rently, the QA system is running on a small collection dents will develop an intelligent annotation tool and of around four millions documents instead of the 15 implement information extraction methods based on millions available in PubMed, due to the impossibility semi-supervised algorithms. This approach needs of loading all PubMed publications. to rely on large collection of textual data, i.e., the biomedical publications from PubMed, and on ma- 4 Multilingual processing chine learning algorithms, making HANA a suitable framework for implementing such methods.

I explored the multilingual text analysis features of In the scope of the Bachelor Project HP2, five students HANA in the scope of constructing a parallel corpus will perform improvements on the question answering in collaboration with three other colleagues: Dr. system developed during the Master Project “Ask Aurelie´ Nev´ eol´ (LIMSI-CNRS, France), Dr. Antonio your database” during the summer term. The students Jimeno (IBM research, Australia) and Dr. Karin will also be in charge of adapting the system to the Verspoor (University of Melbourne, Australia). I have needs of a client, the American Society of Clinical crawled and collected scientific publications from the Oncology (ASCO), in collaboration with the SAP 4 Scielo database , namely titles and abstracts, in four Innovation Center Potsdam (ICP). languages: English (EN), French (FR), Portuguese (PT) and Spanish (ES). Additionally, we will also use the HANA instance for running an application on sentiment analysis whose I imported the documents into the HANA database and prototype was developed by the student Fabian Eck- utilized built-in features for language recognition, sen- ert during the Seminar ”In-Memory Databases: Ap- tence splitting and tokenization. I also performed a plications in Healthcare” in the summer term. This comparison of the sentence splitting output returned application allows users to visualize the evolution of 5 by HANA and by the Apache OpenNLP Java library the opinion expressed in the scientific publications for for some selected sentences and I observed that re- associations between certain diseases and consuming sults were better when using HANA. The corpus is certain foods. composed of many thousands of titles and abstracts for each language pair (EN/FR, EN/PT, EN/ES) (cf. Figure 3) and it will be used in a shared task in the References 2016 edition of the Workshop of Machine Translation (WMT)6. [1] S. J. Athenikos and H. Han. Biomedical question an- swering: A survey. Computer Methods and Programs in Biomedicine, 99(1):1 – 24, 2010. 5 Future work [2] D. Jurafsky and J. H. Martin. Speech and Language Processing. Prentice Hall International, 2 revised edi- Future work to be carried out in my HANA instance in tion, 2013. the Future SOC Lab will include activities related to [3] J.-D. Kim, T. Ohta, Y. Tateisi, and J. Tsujii. Ge- the Master Project HP/N “Learning to Note” and the nia corpus - a semantically annotated corpus for bio- textmining. Bioinformatics, 19(suppl 1):i180–i182, 4http://www.scielo.br/ 2003. 5https://opennlp.apache.org/ [4] R. Tsai, W.-C. Chou, Y.-S. Su, Y.-C. Lin, C.-L. Sung, 6http://www.statmt.org/wmt15/ H.-J. Dai, I. Yeh, W. Ku, T.-Y. Sung, and W.-L. Hsu.

97 Biosmile: A semantic role labeling system for biomed- ical verbs using a maximum-entropy model with auto- matically generated template features. BMC Bioinfor- matics, 8(1):325, 2007. [5] G. Tsatsaronis, G. Balikas, P. Malakasiotis, I. Par- talas, M. Zschunke, M. R. Alvers, D. Weissenborn, A. Krithara, S. Petridis, D. Polychronopoulos, et al. An overview of the bioasq large-scale biomedical seman- tic indexing and question answering competition. BMC bioinformatics, 16(1):138, 2015.

98 Extending Analyze Genomes to a Federated In-Memory Database System For Life Sciences

Matthieu-P. Schapranow, Cindy Perscheid Hasso Plattner Institute Enterprise Platform and Integration Concepts August–Bebel–Str. 88 14482 Potsdam, Germany {Schapranow|cindy.perscheid}@hpi.de

Abstract

Cloud computing has become a synonym for elastic provision of shared computing resources operated by a professional service provider. However, data needs to be transferred from local systems to shared resources, such as our Analyze Genomes platform, for processing, which might result in significant process delays and the need to comply with special data privacy acts. Based on the concrete require- ments of life sciences research, we share our ex- perience in integrating existing decentralized com- puting resources to form a federated in-memory database system. Our approach combines advan- Figure 1: Computing resources and data reside tages of cloud computing, such as efficient use of on local sites whilst the service provider manages hardware resources and provisioning of managed algorithms and apps remotely in our system. software, whilst sensitive data is stored and pro- cessed on local hardware only. European Union [2]. The collaboration of interna- tional life science research centers and hospitals all 1. Project Idea over the globe is important to support the find- ing of new scientific insights, e.g. by sharing of Latest medical devices, such as Next-Generation selected knowledge about existing patient cases. Sequencing (NGS) machines, generate more and However, nowadays collaborations face various IT more fine-grained diagnostic data in a very short challenges, e.g. heterogeneous data formats and period of time. In the scope of the Analyze requirements for data privacy. In the past two Fu- Genomes project, we have built a platform for ture SOC lab periods, we therefore focused on an analyzing such data and combining it with sci- approach we call Federated In-Memory Database entific data from distributed data sources [6, 3]. (FIMDB) [7]. The system combines local com- Using our cloud services currently requires trans- puting resources of research facilities and Analyze ferring the own data from local sites to our shared Genomes services as managed services in a unique computing resources prior to its execution. How- hybrid cloud approach as depicted in Figure 1. We ever, especially processing of NGS data can in- attempt to build a unique cloud setup integrat- volve 750 GB and more per patient sample [5]. As ing decentralized computing resources, which re- a result, even with increasing network bandwidth, side on local sites with the data, whilst the service sharing data results in a significant delay due to provider manages algorithms and apps remotely in data duplication when following state-of-the-art our system. In cooperation with a research facility analysis models. In addition, sharing of medical in Berlin, we have started to set up our FIMDB, data requires very specific handling and exchange by connecting their cloud infrastructure with the is limited, e.g. due to legal and privacy regula- Future SOC lab’s resources. In the following, we tions, such as the Data Protection Directive of the document our current progress in setting up such

99 a system and outline further steps that need to be undertaken.

2. Realization

In the last Future SOC lab period, we had ac- complished to establish a Virtual Private Network (VPN) connection between the Future SOC and the research facility’s network. We also configured the remote directories, and set up two database instances on two of their local computing nodes. These were then connected to our database land- (a) (b) scape located at the Future SOC lab’s cluster. We have conducted further steps in order to adapt Figure 2: (a) A FIMDB database schema groups our Analyze Genomes system to also use the re- data entities, e.g. database tables, functions, and sources provided by the research facility. These stored procedures; (b) selected partitioning details include the subscription to and configuration of of a database table of the FIMDB. our managed services. In addition, we needed to extend our worker framework by further function- ality. As we now have data that must reside on folders and containing all sequences, func- the local computing resources, our scheduler must tions, tables, or indices, respectively. The ser- be able to assign specific computation resources vice provider is responsible for keeping each for a pipeline execution. database schema isolated from the rest, i.e. Setting up such a system is a challenging task, tenant-specific data is separated to ensure which takes its time to be finished. As a result, we data privacy [4]. Only the research facility were not able to test and benchmark our FIMDB to which the database schema belongs to will system by the end of the current Future SOC Lab be granted access by the database adminis- period. Therefore, we concentrate on describing trator. The latter is responsible for manag- our research steps conducted to set up our system ing all users and their assigned roles. In ad- for further evaluation. dition, he also makes sure that the data is partitioned correctly and that the partitions 2.1 Subscription to Managed Service are distributed to the right sites. Although the data of the research sites is integrated The services for processing and analyzing genome into our FIMDB system, the data itself will data, which are offered by our Analyze Genomes be distributed exclusively to the computing platform, are accessible via a web application. On resources of the research facility. By that, the one hand, we at Analyze Genomes act as a ser- we can ensure that the data remains at the vice provider by hosting the project website and research sites for both data storing and pro- its applications at our side. On the other hand, cessing all the time. users from research sites act as customers by using Customer An application administrator repre- the services offered by us via the web application. sents the customer from the research facility. We support local user accounts, i.e. users need On behalf of the research facility, the appli- to register with a user name and password. In cation administrator subscribes to the man- addition, we have already integrated existing au- aged services of our Analyze Genomes service thentication providers, such as OAuth, to allow portfolio. As a result, the administrator is successful authentication using credentials of ex- granted administration rights for the selected isting accounts from social networks, e.g. Google, application and the specific department. This Facebook, LinkedIn [1]. includes maintaining user groups and access Service Provider A database administrator, rights within the application for users. 
In who creates distinct database schemata for addition, the application administrator must each research facility and their data, repre- map these application users to correspond- sents the service provider. A database schema ing database users, which are maintained by can be seen as a dedicated space for the the service provider. Figure 2 on the right research facility, containing everything from depicts selected details of a database table database tables to functions and stored pro- partitioned across the FIMDB. For exam- cedures, which belong to the research facil- ple, the first line describes that chunk 16 ity. Figure 2 on the left depicts an exam- of the database table, which has a cardinal- ple schema whose content is structured in ity of 85,286 entries and a size of 2.6MB, is

100 stored on the Future SOC computing resource of our FIMDB system, we now need to iden- named node-01. tify corresponding computation resources for a particular user when a pipeline is executed. 2.2 Configuration of Selected Service For that, we introduce user groups to our sys- tem. When a research site is included into our For configuring the services offered by our Analyze FIMDB system, the database administrator Genomes platform, end users at research sites are creates a corresponding user group. Each user able to maintain their personal profiles and tailor from that research site will be assigned exclu- application settings. In our application scenario, sively to this group. No researcher can belong each user has to define a local home directory. two more than one group of a research facil- This directory then contains all data that belongs ity, as this implies that data from one research to the research site: On the one hand, it is the facility could potentially be executed on the data that has been produced by them and that is nodes belonging to another research facility. analyzed by our services. On the other hand, it is With the help of user groups, our scheduler is the data that is generated by our services during able to identify whether the pipeline of a par- the analysis process. ticular user needs to be executed on a subset of the computing resources of the complete 2.3 Adaptations to the Worker FIMDB system. Framework of Analyze Genomes Resource Allocation The standard scheduling The backend of our Analyze Genomes platform routine applied in Analyze Genomes’ worker consists of a set of workers operating on distinct framework assigns tasks to any worker that is nodes on the cluster. Once a pipeline is triggered not working on any task. However, with our for processing genome data, it is split up into FIMDB system setup, we need to make sure atomic tasks, which are assigned and delivered to a that tasks of a pipeline get executed only by particular worker. A dedicated scheduler compo- a subset of specific workers belonging to the nent is part of the Analyze Genomes worker frame- facility of the task’s creator. Thus, we need work. The scheduler guarantees a smooth pipeline to extend our scheduler to use his knowledge execution and is able to flexibly react on errors on the user and his user group to distribute occurring during execution, e.g. restart of jobs open tasks correctly. For that, we have intro- or pipeline execution on another worker. When duced a mapping of workers and their nodes starting all workers, one of them will also possess to corresponding user groups. As each re- the role of the scheduler and thereby be respon- search facility or department uses their own sible for assigning tasks to workers and managing computational resources exclusively, and each the execution of the distinct pipeline steps. In our research facility or department makes up one original system setup, those tasks were assigned user group, we have a one to one mapping to any worker who had free capacities. With the from user groups to nodes. By that, once a use case of the FIMDB system, we have to adapt user triggers a pipeline execution, the pool of this procedure, as we need to satisfy the increased available workers for the scheduler reduces by privacy requirements of the research sites. In or- those running on the nodes that are assigned der to ensure that the data of a research site is for the user group the user belongs to. 
processed exclusively on local machines within our FIMDB system, we need to adapt and extend our 3. Conclusion backend, e.g. user administration and scheduling capabilities. We are convinced that sharing knowledge is the User Administration In order to execute a foundation to support research cooperation and pipeline on a particular research site, a sched- to discover new insights cooperatively. Therefore, uler needs to be able to recognize it. An- we aim at realizing our approach of an FIMDB alyze Genomes already provided a rudimen- system, which enables research sites to make use tal user administration - that is, the user had of cloud services whilst their sensitive data re- to be logged in in order to use our services. mains on-site. By this, research sites can make Pipeline executions therefore could be traced use of an existing technical infrastructure, and can to the corresponding user who had started it. avoid transferring their huge data amounts to an- With the current status of the system, how- other site, thus preventing a significant delay in ever, the scheduler is not able to associate a the course of processing big medical data sets. user with the execution on a particular set of In this report, we provided insights on the steps nodes. Instead, all computing resources are undertaken after the infrastructural requirements, used for pipeline execution. With the setup such as an existing VPN connection and database

3 Conclusion

We are convinced that sharing knowledge is the foundation to support research cooperation and to discover new insights cooperatively. Therefore, we aim at realizing our approach of an FIMDB system, which enables research sites to make use of cloud services whilst their sensitive data remains on-site. By this, research sites can make use of an existing technical infrastructure and can avoid transferring their huge data amounts to another site, thus preventing a significant delay in the course of processing big medical data sets. In this report, we provided insights on the steps undertaken after the infrastructural requirements, such as an existing VPN connection and a database instance added to the landscape, had been fulfilled. We have outlined the subscription to and configuration of our managed services, and also provided insights into the adaptations necessary for the worker framework in order to guarantee that data is processed at the research sites. Setting up such a system is challenging, and we have not yet fully deployed the system; we are aiming at running first tests on it in the next Future SOC Lab period together with our partners.

References

[1] D. Hardt. RFC 6749: The OAuth 2.0 Authorization Framework. http://tools.ietf.org/html/rfc6749/, 2012. Last accessed: Mar 27, 2015.
[2] R. Fears, H. Brand, R. Frackowiak, P.-P. Pastoret, R. Souhami, and B. Thompson. Data protection regulation and the promotion of health research: getting the balance right. QJM, 107(1):3-5, 2014.
[3] H. Plattner and M.-P. Schapranow, editors. High-Performance In-Memory Genome Data Analysis: How In-Memory Database Technology Accelerates Personalized Medicine. Springer-Verlag, 2014.
[4] J. Schaffner. Multi Tenancy for Cloud-Based In-Memory Column Databases. Springer, 2013.
[5] M.-P. Schapranow, F. Häger, C. Fähnrich, E. Ziegler, and H. Plattner. In-Memory Computing Enabling Real-time Genome Data Analysis. International Journal on Advances in Life Sciences, 6(1-2), 2014.
[6] M.-P. Schapranow, F. Häger, and H. Plattner. High-Performance In-Memory Genome Project: A Platform for Integrated Real-Time Genome Data Analysis. In Proceedings of the 2nd Int'l Conf. on Global Health Challenges, pages 5-10. IARIA, Nov 2013.
[7] M.-P. Schapranow, C. Perscheid, A. Wachsmann, M. Siegert, C. Bock, F. Horschig, F. Liedke, J. Brauer, and H. Plattner. A federated in-memory database system for life sciences. Business Intelligence for the Real Time Enterprise (BIRTE) Workshop at VLDB, 2015.

Interactive Product Cost Simulation on Coprocessors

Christian Schwarz, Hasso Plattner Institute, Prof.-Dr.-Helmert-Str. 2-3, 14482 Potsdam, Germany, [email protected]
Christopher Schmidt, Hasso Plattner Institute, Prof.-Dr.-Helmert-Str. 2-3, 14482 Potsdam, Germany, [email protected]

Abstract a system of linear equations. The resulting linear sys- tem can be presented as a matrix, which enables to use Enterprise systems evaluate complex data dependen- common matrix operations, including matrix inversion cies within their applications, requiring computational and matrix vector multiplication to calculate the simu- power of today’s modern CPUs. Furthermore, in busi- lated product costs. Both operations have been studied nesses, simulations support decision makers as they extensively and proven to be suitable for parallel ex- can better estimate the effect of changes. When mod- ecution across multiple cores [1, 2], speeding up the eling enterprise specific application logic as system execution times. Using the available knowledge about of equations, hybrid architectures can be leveraged the structure of the matrix, we can further improve the to speed up execution times. Nevertheless, today’s algorithm for the inverse calculation. enterprise systems mainly ignore the availability of The faster execution time achieved through the par- modern coprocessors, such as NVIDIA’s Tesla. The allel execution of the operations, results in a higher project evaluates the applicability of coprocessors for CPU usage during the runtime. In a real business set- in-memory database centric enterprise applications ting, running the product cost simulation would there- on the example of product cost calculation. We ex- fore lower the overall system’s transactional through- plore how to utilize the compute capabilities of graph- put, presenting a potential bottleneck for the everyday ical processing units in the enterprise specific simula- business. To overcome this drawback, we investigate tion scenario. For this purpose, we present a CUDA- how graphical processing units (GPUs) can be utilized enabled implementation of the matrix inversion, ex- to prevent this disadvantage. Moving the matrix oper- ploiting features of the business logic modeled in the ations on to the GPU is valuable due to two reasons. matrix to speed up the calculation. Afterwards we First, the operations performed on the matrix are data- evaluate the performance of our implementation and parallel problems, as the same algebraic operations are compare it with a CPU-based implementation and a executed on different data points of the matrix. GPUs GPU-accelerated library implementation. provide a high amount of parallel computing resources that can speed up the calculation process in compari- son to CPUs, from a cost and performance perspective. 1 Motivation Second the SIMT [3] architecture of modern graphical processing units is designed especially for problems of In the manufacturing industry changes in raw mate- this kind, where one instruction is executed by multi- rial prices have a high impact on the production costs ple threads on different data. Therefore using a GPU of manufactured goods. Cost center rates, such as will result in load reduction on the CPU during cal- electricity expenses or wages, influence the produc- culation, and in speed-up of the execution time. For tion costs of these goods through the manufacturing the implementation presented in this paper we use a process. NVIDIA Tesla K20Xm graphics card [4], provided by With a product cost simulation prototype developed the HPI Future SOC Lab. at the Enterprise Platform and Integration Concepts The remainder of this report is organized as follows. 
group at the Hasso Plattner Institute this challenge is Section 2 gives an overview about related work re- addressed, simulating the impact of changed material garding the matrix operations. Section 3 gives a brief prices and cost center rates for a manufacturing com- overview of the product cost simulation, including the pany, returning the expected new product costs for all data sources and the algorithms used for the computa- products of the company within sub-seconds to enable tion. Section 4 presents our implementation of the ma- interactive applications. The complex manufacturing trix inversion, based on CUDA [5]. In Section 5, we steps for products, including multiple input materials give an overview about our preliminary results, com- and several stages of intermediates can be modeled in pared to two different methods. We draw a conclusion

103 and give an outlook on the next steps in Section 6. 3.1 Business Knowledge-Based Matrix Inversion

2 Related Work Equation 1 shows the two operations necessary for the computation, which is the inverse calculation of the matrix and afterwards a matrix vector multiplication. The performance of matrix inversion on graphical pro- Due to the major benefits of using structural knowl- cessing units has already been investigated by Ezzatti edge of a matrix for the inverse calculation, we fo- et. al [6]. They compared implementations on CPU, cus on this step. Standard libraries, such as NVIDIA’s GPU and hybrid versions, using a LU factorization as cuBLAS [13], already provide highly parallel imple- implemented in LAPACK [7] and the Gauss-Jordan mentations for matrix vector multiplication, running elimination procedure, which is also the foundation on the GPU. for the implementation presented in this report. Their results indicate that the Gauss-Jordan elimination is A common approach to calculate the inverse of a dense a well-suited algorithm for parallel computing. Ad- non-singular matrix is to apply the Gauss-Jordan elim- ditionally they show that hybrid implementations, ex- ination. This procedure first augments the matrix to ploiting the underlying platform features can still out- the right with the identity matrix. In the next step, perform pure GPU versions. One problem arising with the matrix is transformed into the reduced row eche- computations on the GPU is the limited memory avail- lon form, which is achieved by applying elementary able. With a growing matrix size, this issue will be- row operations. The transformation will turn the left come a challenge that needs to be addressed. To re- part of the matrix into the identity matrix and the right duce the memory required during the inversion Das- part, which has been augmented, will contain the in- Gupta [8] proposes a modified version of the Gauss- verted matrix. Jordan elimination calculating the inverse within the For the specific matrix described in Figure 1, we use original matrix space. Another possible approach to the knowledge of the structure of the matrix to reduce solve this issue is to calculate subsets of the entries of the number of operations needed during the Gauss- the inverse [9]. Jordan elimination. At first, we see that the diagonal A different approach to the simulation problem is to values are set to 1 for the bill of material quadrant and use linear solvers instead of an inversion and matrix- differ in the cost center load quadrant only. Knowing vector multiplication. Kruger¨ and Westermann [10] that the non-zero value in these rows is stored in the have proposed a framework for implementing linear diagonal only, we need to divide them and the equiv- algebra operators on GPU and used it to implement alent position in the augmented identity matrix. As a a high-performant sparse conjugate gradient solver. following step, all entries in the bill of operation quad- More work on linear solvers on GPU includes Tomov rant are eliminated. We can parallelize all prior oper- et. al [11], who present solver implementations on hy- ations on item level. The last quadrant of the matrix, brid systems, which are GPU accelerated. They inte- which needs to be reduced, is the bill of material. In grate their work into the Matrix Algebra on GPU and this matrix quadrant, we profit from the lower triangu- Multicore Architectures (MAGMA) project [12]. In lar format of the sub matrix. 
contrast to linear solvers, the use of matrix inversion To avoid locking, the operation on the secondary rows, is beneficial for the simulation scenario, based on rel- meaning the check for non-zero values and the sub- atively small inversion overhead in comparison to the traction, is chosen for parallelization. amount of matrix vector multiplication. In addition, the inverse matrix can be reused for other scenario rel- evant aspects, such fast traversal of hierarchies for data selection and visualization. 3.2 Dataset

3 Calculation of Product Costs The data is provided by a real manufacturing company. It is limited to contain data for a single country the company operates in. It consists of 17,304 materi- The model underlying the product cost simulation is als and 327 cost center activities, resulting in a ma- based on a system of linear equations, which can be trix with a dimension of 17,631. The materials are expressed in a matrix, as shown in Equation 1. The connected with each other by 34,578 bill of material resulting matrix is organized as depicted in Figure 1. entries and with cost center activities by 30,838 bill The transformation of the material dependencies in the of operation entries. All data is retrieved via ODBC manufacturing process into a matrix format allows us and stored within a SAP HANA instance, which was to solve the equation, which calculates the production provided by the HPI Future SOC Lab. SAP’s HANA costs for all products at once, using standard matrix provides business functions required by the prototype, operations. such as foreign currency conversion [14].
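As a concrete illustration of how the quadrants of Figure 1 fit together before Equation (1), shown just below, is solved, the following sketch fills a dense row-major matrix from bill of material, bill of operation and cost center load entries. The Entry struct and the dense storage are simplifying assumptions made for this example; the prototype itself retrieves the data from the SAP HANA instance via ODBC.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical entry type for this illustration only.
struct Entry { int row; int col; float value; };

// Builds the (m + c) x (m + c) simulation matrix of Figure 1 in row-major
// order: bill of material (top left, lower triangular with unit diagonal),
// bill of operation (top right), zeros (bottom left) and the diagonal
// cost center load quadrant (bottom right).
std::vector<float> buildMatrix(int m, int c,
                               const std::vector<Entry>& billOfMaterial,
                               const std::vector<Entry>& billOfOperation,
                               const std::vector<float>& costCenterLoad) {
    const int n = m + c;
    std::vector<float> a(static_cast<std::size_t>(n) * n, 0.0f);

    for (int i = 0; i < m; ++i)             // unit diagonal of the BoM quadrant
        a[i * n + i] = 1.0f;
    for (const Entry& e : billOfMaterial)   // material -> material dependencies
        a[e.row * n + e.col] = e.value;
    for (const Entry& e : billOfOperation)  // material -> cost center activity
        a[e.row * n + (m + e.col)] = e.value;
    for (int j = 0; j < c; ++j)             // diagonal cost center loads
        a[(m + j) * n + (m + j)] = costCenterLoad[j];

    return a;
}
```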

\[
\begin{pmatrix} \text{ManufacturingCosts} \\ \text{CostCenterRates} \end{pmatrix}
=
\begin{pmatrix} \text{BillOfMaterial} & \text{BillOfOperation} \\ 0 & \text{CostCenterLoad} \end{pmatrix}^{-1}
\cdot
\begin{pmatrix} \text{MaterialPurchasePrice} \\ -\text{CostCenterCosts} \end{pmatrix}
\quad (1)
\]
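Because the coefficient matrix in Equation (1) is block upper triangular, the structure of the solution can also be read as a block back-substitution; the rewriting below follows directly from Equation (1) and is stated here only as a reading aid, not as an additional result of the report:

\[
\text{CostCenterRates} = \text{CostCenterLoad}^{-1}\,\bigl(-\text{CostCenterCosts}\bigr),
\]
\[
\text{ManufacturingCosts} = \text{BillOfMaterial}^{-1}\,\bigl(\text{MaterialPurchasePrice} - \text{BillOfOperation}\cdot\text{CostCenterRates}\bigr).
\]

Since the cost center load quadrant is diagonal and the bill of material quadrant is lower triangular with a unit diagonal (Figure 1), both block inverses are comparatively cheap, which is what the knowledge-based Gauss-Jordan variant of Section 3.1 exploits.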

physical memory. Therefore, no memory accesses by any threads within a warp can be combined, resulting in poor memory bandwidth. Since the cost center load is by far the smallest part of the matrix and only 327 values have to be accessed in the current product cost simulation, we have focused on improving other parts of the inversion algorithm. Nevertheless, a change in the underlying data structure, for example storing the diagonal of cost center load sub matrix as a separate Figure 1: The product cost simulation matrix: Bill array, is conceivable to improve the execution time in of Material quadrant (black) is a lower triangular the future. matrix. Bill of Operation (dark grey) in general matrix format. The Cost Center Load quadrant 4.2 Invert Bill of Operation (light grey) has values different to 0 on its diagonal. White spaces contain zero values. The kernel for the inversion of the bill of operation re- quires the same information as the first kernel. With- out any prior knowledge about the position of non-zero 4 GPU-powered Matrix Inversion and zero values within this quadrant of the matrix re- quires processing each data point. For all data points In the following description of our implementation, we having values different from 0 the corresponding data reuse the three proposed separate steps of the inver- point within the identity matrix is calculated. After- sion, according to the matrix quadrants. Therefore, we wards the data point’s value in the original matrix is have a separate kernel for each of the quadrants. set to 0. The original matrix, which is processed during the in- Since all data points of this matrix quadrant are pro- version, is stored in a one-dimensional array in row- cessed, the memory accesses for the original matrix major ordering. In addition to the original matrix a and for the identity matrix can each be combined second matrix, the augmented identity matrix using within a warp. The access to the memory containing the same format is processed during the operation. It is the cost center load values is expensive, as no memory simpler for later processing steps, for example the ma- accesses by any threads within a warp can be com- trix vector multiplication, to keep both matrices sep- bined. A solution for this issue has been proposed in arated. The matrices are copied to the GPU device, the previous kernel description of the cost center load using cudaMemcpy() after the memory has been al- inversion 4.1. located using cudaMalloc(). Once the calculation is finished the inverted matrix is currently transferred 4.3 Invert Bill of Material back into host memory, which is achieved using cud- aMemcpy(), for further processing. In future, we avoid When implementing an inversion for the bill of mate- this transfer, by performing the matrix-vector multipli- rial we deal with two different access patterns for the cation on the same device, reusing the data. matrix. The algorithm described in the section3.1 will loop through the matrix row by row. Since the ma- 4.1 Invert Cost Center Load trix is in row-major order, the values within a row are stored in consecutive memory regions, allowing fast The cost center load inversion is implemented in a ker- access on the GPU. When accessing the values within nel.The kernel accepts pointers to the matrix data and a column for the check for non-zeros, the access to to the augmented identity matrix, which later stores the the memory on the GPU will become a bottleneck, as inverted data of the matrix. 
Each executed kernel op- each value has to be fetched separately. To overcome erates on one cost center load data point. It divides the this challenge, an additional data structure for the col- corresponding value in the diagonal of the same row umn values is needed. Storing the complete original in the identity matrix by the cost center load data point matrix in column-major order may not be wise due to from the original matrix. The original matrix’s value memory limitations on the GPU. Therefore we pro- is set to 1. pose a sparse data structure for the values in the lower When executing this kernel the memory bandwidth be- triangular of the bill of material matrix quadrant. The comes a bottleneck, as in the current matrix structure, data structure includes one array with the non-zero val- the cost center loads in the diagonal are all far apart in ues, a second array storing the row of each value and

a third array storing the number of non-zero values for each column. Hence, the size of the first two arrays corresponds to the number of non-zero values, and the third array has the size of the number of materials. Even though the memory alignment of this data structure is not yet optimal, it allows combining the memory accesses within a warp and significantly improves the execution time for the bill of material. In our current implementation, we iterate through the matrix rows on the CPU and start the bill of material inversion kernel for each row separately. In the future, this iteration could also be implemented on the GPU using CUDA's dynamic parallelism feature.
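To make the proposed column-oriented layout and the per-row kernel launches more concrete, the following CUDA sketch shows one possible shape of the elimination step. It is an illustration only: the array names, the colStart prefix-sum array and the launch configuration are assumptions made for this example and do not reproduce the project's actual kernels; the sketch also assumes that the unit diagonal of the bill of material quadrant is not stored in the three arrays.

```cuda
#include <cuda_runtime.h>

// Sketch of the column-wise elimination for the unit lower triangular
// bill-of-material quadrant. bomValues/bomRows hold the strictly lower
// triangular non-zeros grouped by column; colStart (length n + 1) is the
// exclusive prefix sum of the per-column counts, so column c owns the
// entries [colStart[c], colStart[c + 1]). identity is the augmented n x n
// matrix in row-major order that ends up holding the inverse.
__global__ void eliminateColumn(const float* bomValues, const int* bomRows,
                                const int* colStart, float* identity,
                                int n, int pivot) {
    int k = colStart[pivot] + blockIdx.x * blockDim.x + threadIdx.x;
    if (k >= colStart[pivot + 1]) return;

    int row = bomRows[k];         // secondary row with a non-zero in column `pivot`
    float factor = bomValues[k];  // stored factor; it never changes, because the pivot
                                  // row of a unit lower triangular matrix only touches
                                  // column `pivot` itself during the elimination

    // Subtract factor * (pivot row of the augmented identity) from this row.
    // Simplified: one thread walks the whole row; a tuned kernel would also
    // parallelise over the columns.
    for (int col = 0; col < n; ++col)
        identity[row * n + col] -= factor * identity[pivot * n + col];
}

// Host-side driver matching the report's "one kernel launch per row": the
// launches go to the same stream, so pivot p is finished before pivot p + 1.
void invertBillOfMaterial(const float* dValues, const int* dRows,
                          const int* dColStart, float* dIdentity, int n) {
    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;  // upper bound; surplus threads exit early
    for (int pivot = 0; pivot < n; ++pivot)
        eliminateColumn<<<blocks, threads>>>(dValues, dRows, dColStart,
                                             dIdentity, n, pivot);
    cudaDeviceSynchronize();
}
```

With such a layout the threads of a warp read consecutive entries of bomValues and bomRows, which is the coalescing benefit described above.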

5 Performance Evaluation - Preliminary Results

In this section, we evaluate the implementation with regard to its execution time and compare the GPU implementation with different inversion implementations run on a CPU. We examine an implementation using the same knowledge about the matrix structure and a standard library's upper triangular matrix inversion algorithm. Additionally, we compare our matrix-specific implementation with a standard library's upper triangular matrix inversion algorithm also running on the GPU. For this purpose, we use the CULA Dense library [15], which implements BLAS and LAPACK routines to be run on GPUs supported by NVIDIA's CUDA.

5.1 Benchmark Environment

For our performance measurements, we use the following two systems. The performance measurements executed on the CPU are run on a system with SUSE Linux Enterprise Server 11 Service Pack 2 installed, equipped with two hexacore Intel Xeon E5-2640 processors. For the measurements executed on a GPU, we use an NVIDIA Tesla K20Xm graphics card with CUDA compute capability 3.5, 14 streaming multiprocessors with 192 CUDA cores each, and a warp size of 32. The hardware was provided by the HPI Future SOC Lab. When executing kernels, a maximum number of 1,024 threads per block is allowed. The total amount of global memory available on the device is 5,760 MB, and we use CUDA driver version 7.0. The graphics card is integrated in a system with SUSE Linux Enterprise Server 11 Service Pack 3 installed, which is equipped with two quadcore Intel Xeon E5620 processors. For the measurements executed on the CPU we use the C++11 library chrono, using the function std::chrono::system_clock::now() to obtain the start time and the stop time. To calculate the execution time on the GPU, we use the cudaEventElapsedTime() function, which calculates the elapsed time between two cudaEvents.

5.2 Comparing GPU- and CPU-based implementations

In the first evaluation, we compare the presented GPU-accelerated implementation of the matrix inversion with three implementations running on the CPU. Two of the three CPU versions are also separated into three different steps, operating on the matrix quadrants presented in Figure 1 in the way presented in our algorithm description in Section 3.1. The first version is a standard implementation using OpenMP [16] for row-level and advanced vector extensions (AVX) [17] for item-level parallelization. The second CPU version uses the triangular matrix inversion function strtri() from the Intel Math Kernel Library [18]. The reordering of the rows and columns required for the algorithm is not included in the measurements, as we could already select the matrix in this format from the database. For this inversion, we provide a total execution time only, as the function operates on the complete matrix. The results are shown in Table 1.

                         GPU       CPU      CPU strtri
    Cost Center Load     0.05      1.5      -
    Bill of Operation    1.4       31       -
    Bill of Material     254.2     2,227    -
    Total                255.6     2,260    6,699
    Total incl. transfer 1,088.2   2,260    6,699

Table 1: GPU vs. CPU, median execution time over 10 executions, in ms

The transfer time for the GPU includes data transferred to the device, as we assume that in the future the inverted matrix can be utilized again during the matrix vector multiplication. The result is also not required by the user immediately. Comparing the total execution times for the algorithms, the GPU outperforms even the best CPU version by a factor greater than 8. When the transfer time is included, the execution time of the GPU-accelerated version is approximately 2 times faster.

5.3 Comparing GPU-accelerated implementations

We now compare our GPU-accelerated version to a version from a library which executes its code on the GPU. We use the CULA Dense library's implementation of the LAPACK function strtri(). We compare against the triangular inversion, as the matrix can be expressed in this format after reordering its rows and its columns. The reordering is not included in the measurements. We have executed each version 10 times and take the median execution time in milliseconds. We also include the transfer time in both measurements, as we cannot distinguish between execution

106 and transfer when using the library’s implementation. The Journal of Supercomputing, 58(3):429–437, Our implementation took 1,088 milliseconds com- April 2011. pared to 2,372 milliseconds using stritri. Hence, we also show that our inversion, which uses available [7] Edward Anderson, Zhaojun Bai, Christian Hein- business logic, is approximately by factor 2 faster than rich Bischof, Laura Susan Blackford, James Wel- an optimized and architecturally tuned library imple- don Demmel, Jack J Dongarra, Jeremy J mentation. Du Croz, Sven Hammarling, Anne Greenbaum, A McKenney, and Danny C Sorensen. LAPACK Users’ Guide. Third Edition. Society for Indus- 6 Conclusion and Next Steps trial and Applied Mathematics Philadelphia, PA, USA, August 1999. We propose a GPU-accelerated matrix inversion algo- rithm for an enterprise specific application scenario. [8] Debabrata DasGupta. In-Place Matrix Inversion We have presented how the available business knowl- by Modified Gauss-Jordan Algorithm. Applied edge is leveraged to provide an implementation for Mathematics, 04(10):1392–1396, 2013. the matrix inversion, enabling interactive applications. [9] Patrick R Amestoy, Iain S Duff, Yves Robert, Preliminary measurements show that we were able to Francois-Henry Rouet, and Bora Ucar. On com- achieve a speed up of approximately factor 8 towards a puting inverse entries of a sparse matrix in an CPU version, which also exploits the business knowl- out-of-core environment. SIAM Journal on Sci- edge and a speed up of factor 2 towards a comparable entific Computing, 34(4):1975–1999, July 2012. library implementation running on the GPU. These results suggest to integrate the GPU-accelerated inver- [10] Jens Kruger¨ and Rudiger¨ Westermann. Lin- sion fully into the current prototype, and to exploit ear Algebra Operators for GPU Implementation the GPU further for the matrix vector multiplication as of Numerical Algorithms. ACM Transactions well. As the prototype stands for a class of enterprise on Graphics - Proceedings of ACM SIGGRAPH problems, we are confident that the chosen implemen- 2003, pages 908–916, July 2003. tation can be used as a blueprint for future applications of the same class. [11] Stanimire Tomov, Rajib Nath, Hatem Ltaief, and As next steps, we will evaluate the performance of Jack Dongarra. Dense Linear Algebra Solvers for the implemented algorithms based on an extended real Multicore with GPU Accelerators. IEEE Inter- world dataset. national Symposium on Parallel Distributed Pro- cessing, Workshops and Phd Forum, pages 1–8, April 2010. References [12] The University of Tennessee. MAGMA User [1] Bruce Hendrickson, Robert Leland, and Steve Guide, 1.7.0 edition, September 2015. Plimpton. An Efficient Parallel Algorithm [13] NVIDIA Corporation. cuBLAS Library User for Matrix-Vector Multiplication. International Guide, 7.5 edition, September 2015. Journal of High Speed Computing, 7:73–88, 1995. [14] Franz Farber,¨ Sang Kyun Cha, Jurgen¨ Primsch, Christof Bornhovd,¨ Stefan Sigg, and Wolfgang [2] Enrique S Quintana, Gregorio Quintana, Xiaobai Lehner. SAP HANA Database - Data Manage- Sun, and Robert van de Geijn. A Note On Paral- ment for Modern Business Applications. ACM lel Matrix Inversion. SIAM Journal on Scientific Sigmod Record, 40(4):45–51, December 2011. Computing, 22(5):1762–1771, May 2000. [15] NVIDIA Corporation. CULA Tools: GPU- [3] Erik Lindholm, John Nickolls, Stuart Oberman, Accelerated Libraries, October 2015. and John Montrym. Nvidia Tesla: A Unified Graphics and Computing Architecture. 
Micro, [16] Barbara Chapman, Gabriele Jost, and Ruud IEEE, 28(2):39–55, April 2008. van der Pas. Using OpenMP: Portable Shared Memory Parallel Programming. The MIT Press, [4] NVIDIA Corporation. Tesla K20 GPU Acceler- December 2007. ator, Board Specification. NVIDIA Corporation, July 2013. [17] Intel Corporation. Intel Architecture Instruction Set Extensions Programming Reference. Intel [5] NVIDIA Corporation. CUDA C Programming Corporation, August 2015. Guide, 7.0 edition, March 2015. [18] Intel Corporation. Intel Math Kernel Library Ref- [6] Pablo Ezzatti, Enrique S Quintana-Orti, and Al- erence Manual - C. Intel Corporation, 11.3 edi- fredo Remon. Using graphics processors to ac- tion, August 2015. celerate the computation of the matrix inverse.


ActOnAir: Data Mining and Forecasting for the Personal Guidance of Asthma Patients

Matthias Scholz, Hochschule Mainz, Lucy-Hillebrand-Straße 2, 55128 Mainz, [email protected]
Nikolai Bock, Hochschule Mainz, Lucy-Hillebrand-Straße 2, 55128 Mainz, [email protected]

Gunther Piller, Hochschule Mainz, Lucy-Hillebrand-Straße 2, 55128 Mainz, [email protected]
Klaus Böhm, Health & media, Dolivostraße 9, 64293 Darmstadt, klaus.bö[email protected]

Abstract

Goal of the project ActOnAir is a novel approach for  Individual analyses and situational predictions the capture, combination and analysis of bio signals are absent. The dependence of asthma attacks on and environmental data, to provide personal guid- pollution thresholds, environmental aspects and ance for individuals, who suffer from asthma and individual dispositions are not considered in a need to reduce their exposure to air pollutants. This holistic way. contribution outlines the corresponding system archi- In the near future available information on personal tecture and discusses first results from a prototype bio signals and the environment will strongly in- implementation. Central part is the novel usage of crease due to the growing number of mobile and sta- data mining and forecast algorithms, which is im- tionary sensors. Therefore, it is all the more im- plemented on the in-memory platform SAP HANA. portant to develop methods and tools for a reliable provisioning of situation specific, individual guid- 1 Introduction and Motivation ance. The situation described above shall be improved sig- Air pollution is a major health risk. Estimates from nificantly though a new hard and software system the World Health Organization reveal that approxi- “ActOnAir”. A corresponding project has been start- mately one in eight of total global deaths result from ed early 2015. This paper summarizes current chal- the exposure to air pollutants. In particular people lenges, the system architecture of the new solution as with asthma are affected by worsening air quality. well as first implementation results. Approximately 235 million persons suffer from asthma with more than 2 million deaths every year [1]. Although there is plenty of research on the influ- 2 State of the Art ence of environmental factors upon the appearance of asthma symptoms (see e.g. [2]), detailed personal Since around 10 years local and fine granular meas- guidance for everyday life is hard to provide. Rea- urements of air pollutants are topic of a growing sons are: number of projects, e.g. CitiSense [3] or Copenhagen Wheel [4]. On the other hand, new concepts and  A comprehensive and fine granular capture of trends, like People as Sensors [5] and Quantified the symptoms of individuals and their exposure Self, yield more and more information on personal to environmental factors is missing. Many pollu- wellbeing. Pre-requisite of a comprehensive analysis tants are only measured in a few places and dif- of environmental and personal data is their meaning- ficult to relate to individual persons. ful integration. Due to the inherent heterogeneity of data sources this is often a challenge. The ActOnAir


Figure 1: ActOnAir system architecture

system addresses this key issue through extendable Harmonized sensor data are persisted within the Geo integration services. Sensor Network. It is based on standards of the Sen- Data mining of sensor information becomes increas- sor Web Enablement from the Open Geospatial Con- ingly important for personalized analyses in health sortium (OGC), like the Sensor Observation Services care. The development of methods which cover all (SOS) and SensorML [8]. potentially relevant data and ensure an appropriate The component Data Fusion enables the combination interpretation and further processing of results, is of sensor data, personal bio signals and generally high on the agenda [6]. A good starting point for available environmental information. For this pur- asthma monitoring is the work of Lee et al. [7]. Still pose all data are projected onto an appropriate spa- missing are methods for a systematic improvement of tial-temporal reference. personalized results, as well as for the provisioning Data Mining and Forecasting: This building block of real-time guidance. ActOnAir addresses both is- receives harmonized and interpolated sensor data sues. from the Data Fusion component. During pre- processing, data are assigned to suitable patient seg- 3 System Architecture ments. Data mining analyses then yield frequent se- quential patterns for days with asthma attacks, as A suitable system architecture to overcome the chal- well as for times without discomfort. These patterns lenges mentioned above, is shown in Figure 1. It con- serve as input for the derivation of rules for forecasts. sists of five main building blocks: Corresponding results are finally provisioned to end user applications in the form of PMML documents. Mobile Sensor Box: To measure the individual ex- posure of persons to air pollutants, the ActOnAir The system shall be able to automatically identify system provides a wearable box with a set of sensors. potential improvements for individual forecasts, e.g. Amongst others it contains measuring units for par- based on newly measured sensor data. It may then ticulate matter emissions PM10 and PM2.5, for appropriately optimize patient segments, calculate ozone, temperature and relative humidity. Captured new rules and distribute them accordingly. data are transmitted via Bluetooth to the mobile ap- Mobile Application: The user or patient interacts plications of end users and from there to the Sensor through a smartphone application with the ActOnAir Integration module. system. The mobile app calculates and displays per- Sensor Integration and Geo Sensor Network: This sonal forecasts. For this purpose it interprets the rules component is responsible for the processing of heter- from the data mining component using current sensor ogeneous sensor data. The module Sensor Integration data and personal bio signals. provides services for a flexible plug-and-play integra- In addition, the smartphone application enables a tion and combination of data from different sources. regular recording of personal symptoms through an asthma diary. It typically covers information on peak

110 expiratory flow, cough, wheeze, shortness of breath, The module Data Mining and Forecasting is built chest tightness and further indications. Finally, the upon the in-memory platform SAP HANA. The per- smartphone application acts as control unit for all formance of SAP HANA is beneficial for the ex- mobile and personal sensors. pected high data volumes. In particular, initial ex- General MCC Services: This module provides plorative analyses, as well as automated systematic common mobile cloud computing services for, e.g., improvements of forecasting models require short user management, authorization as well as communi- response times. Also the Predictive Analysis Library cation and application monitoring. (PAL) within the HANA platform is of significant advantage, as ActOnAir needs to use and combine The interplay of these major components of the several data mining algorithms in a performant way. ActOnAir system is illustrated through a typical data flow: The smartphone app of a patient collects infor- To exploit the possibilities of HANA, the program- mation on her asthma symptoms, as well as sensor ming model has been chosen appropriately: a de- data captured by her Mobile Sensor Box. All data are normalized data model is used; table variables are transferred to the component System Integration. applied as far as possible to avoid write operations; They are then technically and semantically harmo- data intensive application logic is largely embedded nized. Finally, they are persisted in the Geo Sensor into the database with SQLScript; stored procedures Network. Here also publicly available environmental – e.g. for data pre-processing – are parallelized wher- data, e.g. from weather and air quality monitoring ever applicable. stations, are stored. In the module Data Fusion gen- For sequential pattern mining an R implementation of eral environmental data are extrapolated to appropri- the SPADE algorithm is applied [12]. Frequent pat- ate temporal and spatial coordinates and combined terns for sensor data, environmental information and with data from individuals. personal symptoms are then taken as attributes for the Next, the completed data sets are provided to Data computation of decision trees [7]. For this purpose Mining and Forecasting. After appropriate pre- the CART algorithm of PAL is used [13]. The corre- processing, e.g. segmentation and binning, frequent sponding results are provided to the smartphone ap- patterns and rules for forecasting are derived. The plications of end users as PMML document through rules for forecasting are transferred to the smartphone an OData interface. application of the patient. Here they are used for real- This multi-step analysis procedure has been success- time guidance. Inputs are: actual symptoms, most fully tested with sample data for selected asthma current sensor data from the Mobile Sensor Box and symptoms and generally available environmental possibly other wearable devices of the patient, gen- information. eral environmental data with proper temporal and spatial reference. The latter are retrieved on request 5 Next Steps from interfaces of the component Sensor Data Ac- cess. In the area of sensor data management algorithms for the harmonization and fusion of diverse sensor data 4 Prototype Implementation will be implemented next. Necessary is also the com- pletion of the mobile application for end users. 
Fur- This section summarizes selected details of a proto- thermore, an authorization concept based on OAuth type implementation of the ActOnAir software sys- will be explored for the overall data flow. tem. For the Geo Sensor Network the SOS frame- For data mining and forecasting test runs with realis- work of 52°north [9] is used. It implements the actual tic data are scheduled. To fine-tune the applied algo- OGC SOS standards and appropriate information rithms, a medical evaluation of results is of particular models for air quality. The SOS framework allows a importance. Potential adjustments include the seg- physical, as well as a virtual integration of data from mentation of patients, binning of data during pre- different sources. Based on current OGC standards processing, pruning of frequent patterns and the sim- [8], the system offers a uniform API for outgoing and plification of decision trees. incoming data communication, covering also spatio- temporal filter operations. The component Sensor In addition, concepts for an automated optimization Integration uses SpringXD [10], which builds upon of personal predictions will be investigated. Results Spring Integration and the messaging patterns of from a retrospective comparison of predictions with Hohpe and Woolf [11]. The first proof of concept corresponding event data from patients, or significant successfully implements adapters for generally avail- changes in volume and quality of sensor data could able data services for environmental information. be important triggers. These typically use different formats, like CSV, In Q3 2016 the complete system should be available JSON or NetCDF. All basic services for data pro- for final evaluation. cessing and communication with mobile applications Supported by the Federal Ministry for Economic and the component for data mining have been estab- Affairs and Energy lished and tested.
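Sections 3 and 4 describe how the smartphone application interprets forecasting rules, delivered as PMML documents, against current sensor data and diary entries. Purely as an illustration of that last step, the sketch below evaluates a simplified threshold tree; the node structure, feature names and risk classes are invented for this example and are not the project's PMML handling or the CART output of the PAL.

```cpp
#include <map>
#include <string>

// Hypothetical, heavily simplified decision-tree node; real rules arrive as PMML.
struct RuleNode {
    std::string feature;      // e.g. "pm10", "ozone", "peak_flow" (made-up names)
    double threshold = 0.0;   // go left if value <= threshold, right otherwise
    RuleNode* left = nullptr;
    RuleNode* right = nullptr;
    int riskClass = -1;       // set on leaves, e.g. 0 = low, 1 = elevated risk
};

// Current readings from the Mobile Sensor Box and the asthma diary.
using Readings = std::map<std::string, double>;

int predictRisk(const RuleNode* node, const Readings& values) {
    while (node->left && node->right) {                 // descend until a leaf is reached
        double v = values.count(node->feature) ? values.at(node->feature) : 0.0;
        node = (v <= node->threshold) ? node->left : node->right;
    }
    return node->riskClass;                             // the leaf carries the forecast
}
```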

111 References [1] WHO: 7 Million Premature Deaths Annually Linked [8] OGC: OGC Standards and Supporting Documents. http://www.opengeospatial.org/standards, 2015. Last to Air Pollution. http://www.who.int/mediacentre/ th news/releases/2014/air-pollution/en/, 2014. Last ac- accessed 17 September 2015 th cessed 17 September 2015 [9] 52n SOS: Sensor Web Community. http://52north.org/ communities/sensorweb/index.html, 2015. Last ac- [2] Schachter E. N. et al.: Outdoor air pollution and health th effects in urban children with moderate to severe cessed 17 September 2015 asthma. Air Quality, Atmosphere & Health 2015 [10] Spring XD: http://projects.spring.io/spring-xd, 2015. (4):1-13, 2015 Last accessed 17th September 2015 [3] CitiSense: http://sosa.ucsd.edu/confluence/display/ [11] Hohpe G., Woolf B.: Enterprise Integration Patterns: th CitiSensePublic/CitiSense, 2010. Last accessed 17 Designing, Building, and Deploying Messaging Solu- September 2015 tions. Addison-Wesley, Boston, 2003 [4] CopenhagenWheel: http://senseable.mit.edu/ [12] Buchta C., Hahsler M., Diaz D.: Package ‘arulesSe- th copenhagenwheel, 2014. Last accessed 17 September quences’. https://cran.r-project.org/web/packages/ 2015 arulesSequences/arulesSequences.pdf, 2015. Last ac- th [5] Sagl G., Resch B., Blaschke T.: Contextual Sensing: cessed 17 September 2015 Integrating Contextual Information with Human and [13] SAP PAL: SAP HANA Predictive Analysis Library. Technical Geo Sensor Information for Smart Cities. http://help.sap.com/hana/sap_hana_predictive_analysi Sensors 2015 (15): 17013-17035, 2015 s_library_pal_en.pdf, 2015. Last accessed 17th Sep- [6] Sow D. et al.: Mining of Sensor Data in Health Care: tember 2015 A Survey. In: Aggrawal C. C. (ed.): Managing and Mining Sensor Data. Springer, Heidelberg, 2014 [7] Lee C. H. et al.: A Novel Data Mining Mechanism Considering Bio-Signal and Environmental Data with Applications on Asthma Monitoring. Computer Meth- ods and Programs in Biomedicine 2011, 101 (1): 44- 61, 2011

BICE: A Cloud-based Business Intelligence System

Oliver Norkus, Brian Clark, Björn Friedrich, Babak Izadpanah, Florian Merkel, Ilias Schweer, Alexander Zimak, Jürgen Sauer
Carl von Ossietzky Universität Oldenburg, Escherweg 2, 26121 Oldenburg
{firstname.surname}@uni-oldenburg.de

Abstract For this reason we have designed an integrated architectural model for BI in the cloud. This consists At the University of Oldenburg, a research project of two artifacts: is working on the standardization of business intelli- gence in the cloud. A reference architecture for BI- • First, the taxonomy for describing and comparing Cloud systems and services was defined and elabo- BI-Cloud services [4] and rated. In this paper we introduce our cloud–based BI– system which was build up on the base of this refer- • secondly, the architecture for realizing, for com- ence architecture. We report from the implementation parison and evaluation of BI-Cloud systems [5]. of the prototype under the use of hardware provided by To determine the feasibility we realized a prototype the Hasso-Plattner-Institute (HPI). Furthermore we based on this architecture and to evaluate the con- present our use-case and discuss our scenario-based cepts we applied the architecture and the prototype in evaluation. a case study. In this contribution we introduce our cloud–based BI–system which was build up on the base of the reference architecture. Moreover, we re- 1 Introduction port from the implementation of the prototype under the use of hardware provided by the Hasso-Plattner- Institute (HPI). And at least, we present the results of The combination of business intelligence (BI) and our case study, discuss our scenario-based evaluation cloud computing (CC) is discussed increasingly in and describe further work. But first, let’s short explain science and industry. The absence of standards and our understanding of BI in the cloud. transparency encourages many organizations skepti- cism and incomprehension. The wide use in width BI in the cloud is a disruptive technology. It has pre- remains off. First products of various tool manufac- cursor and roots in both domains BI and CC. BI in turers exist on the market - but they are heterogeneous the cloud offers new technology combinations to pro- and partly in an early maturity. They can not be com- vide individual and differentiated configurable, scal- bined in any way, or integrate with existing systems able and flexible analytical services. Analytical appli- [2], [5]. cations, services and systems can be deployed by using CC. In general, the resulting IT services will be sum- This is because, among other things, that some funda- marized as Business Intelligence as a Service (BIaaS) mental issues regarding the enterprise and software ar- [5]. More specifically, it can differ between [4]: chitecture have not been clarified yet. This underlines the potential of standards [1]. • BI-Software as a Service: Visualization (e.g., Re- Therefore, in our research the standardization of BI in port, Dashboard, Scorecards) or Modell (e.g., self the cloud is addressed. Thereby, the aim is to increase service BI, data mining) as a BI-Cloud service transparency and to promote the standardization of BI in the cloud. Operationalized, the objective is to • BI-Platform as a Service: Data Warehouse define and elaborate artifacts for the description and (DWH) platform as a cloud platform for develop- analysis, comparison, evaluation and uniform design ing and configuration of BI-Cloud software ser- of cloud-based BI systems and services. vices.

• BI-Infrastructure as a Service: Storage and com- puting components for the infrastructural provi- sion of BI-Cloud services.

113 The rest of this paper is structured as follows: In the analysis. Modularity enables the effective man- next section, see 2, we introduce our case study and agement of the system. therein requirements and business drivers.. • Expandability There are multiple processes re- 2 Use case lated to the calculation of analysis. There is the potential for modifications or integrations of fur- ther analysis. Modularity enables the effective The considered use case originates from energy do- management of the system. main in the field of electricity trading. To identify the requirements and business drivers we performed inter- • User empowerment There are multiple pro- views with expert from the energy industry INF-Paper. cesses related to the calculation of analysis. For the sake of brevity, the results are summarized con- There is the potential for modifications or inte- centrates in the following. grations of further analysis. Modularity enables Energy suppliers are chiefly interested in the costs a the effective management of the system. customer needs to pay per kilowatt–hour so that they realize a profit. In the initial situation, this calculation • Web standards Nearly every mobile device or is handcrafted with the help of spreadsheet software. computer supports web technologies like HTML This calculation is named breakeven analysis or and JavaScript. The best way to bring the service contribution margin control and the result is the to a wide variety of devices is to use the accepted cumulative cost. The calculation of the cumulative standards. costs is needed when energy suppliers make offerings to prospective customers. The cumulative cost per The requirements availability and accessibility lead to kilowatt–hour is the important part of any offering. If the two more computer science related requirements a contract between a customer and an energy supplier scalability and flexibility. The hardware resources is up for extension, the current cumulative cost must have to be provisioned as required. If one server is be analyzed to calculate the cost for the new offer. The overburdened, another server must be started to take process of calculation lasts up to a few days and has to the load away from the overburdened server. Through be completed before the sales pitch takes place. This resource scaling, the service’s full quality is available is disadvantageous for the field managers. When they to every user. visit a customer site for a customer pitch, it would be In the next section, we present our conceptual ap- impossible for them to calculate a new value of the proach and our prototype for a cloud–based contribu- cumulative cost when necessary. They have to return tion margin dashboard. to the office and find a new appointment with the customer. This is time–consuming for both sides [3]. 2.1 Prototype The use case yields to the following requirements [3]: As a feasibility study of the reference architecture, we • Availability System redundancy is required to have created a prototype covering the requirements of guarantee high availability. the case study outlined above (see 2). In order to sat- • Accessibility Field managers are often on site for isfy the identified requirements and to follow the refer- customer pitches. Thus, the service should be ence we implemented a BI-Cloud system, providing a available on a wide range of devices and must be contribution margin dashboard for electricity trading. available over the Internet. 
The complete architectural approach for the use case regardless of the technology can be found in [3], the • Accounting Given that BI systems are expensive generic architecture is summarized [5]. The follow- to set up and maintain, it is necessary for clients ing sections only discuss the technological architec- to pay for their usage. The charges can be calcu- ture of the implementation. The developed prototype lated in respect to the usage. is based on state of the art technologies provided by the Hasso-Plattner-Institute (HPI). The virtual machines • Performance The duration of calculation, and (VM) required for running the components are running data transport to the user should be minimized. and managed using HP Converged Cloud, which ad- Poor performance leads to decreased user accep- heres to the Open Stack architecture. The in-memory tance of BI systems. While the quality of mo- database system is a shared SAP HAHA instance. In bile Internet connections an unavoidable factor, figure 1 the realized architecture of the prototype is keeping transmitted data small and presenting shown. overview before detail helps to mitigate this. For the foundation infrastructure, BICE uses an SAP • Modularity There are multiple processes related HANA instance for a Database, as well as a HP Con- to the calculation of analysis. There is the po- verged Cloud blade for computing hardware. An Open tential for modifications or integrations of further Stack platform has been installed on the HP Converged

Figure 1: Architecture of the prototype (a local client accesses the HP Converged Cloud, where virtual machines running Tomcat host the BICE components; load balancing is provided by Neutron, monitoring by Ceilometer, orchestration by Heat, on top of the database system)

Cloud, to provide virtual machines while handling orchestration, scaling, and load balancing. The individual software components of the BICE system run inside these virtual machines using Ubuntu/Tomcat. All relevant data (consumption, energy, labor, and network usage costs) are provided on the SAP HANA database system. The salesperson has the opportunity to view the contribution margin and the total cost of any customer site in tabular or graphical form, taking into account the desired filter settings, such as time range and interval.

3 Evaluation

The scenario-based and the expert-based evaluation were used as evaluation methods. For testing, extensive pseudonymized test data were provided by our industry partner from the German energy sector. During design and development, early feedback from the experts on the different stages of development was obtained. For the final testing, the experts assumed the role of the users. In the form of a cognitive walkthrough of the use case, the system was fully tested. The experts independently verified the functionality and usability. It was found, and is for the sake of brevity only summarized here, that the functionality of the prototype fulfilled the requirements. Furthermore, the reference architecture proved very useful as an implementation basis. With the use of the architectural model as an implementation template, risks were minimized and the efficiency of the development process was increased. The project team, especially the architects and developers, was supported by the reference architecture during realization and development.

4 Conclusion and further challenges

Based on the reference architecture, a prototype was developed in order to show its basic feasibility. This prototype has been used and applied in the context of a case study. Within this case study and a scenario-based evaluation, the results were tested and rated. In general, BI in the cloud appears to be an adequate technological answer to the questions from the business in the areas of agility, flexibility and cost efficiency. The realized prototype fulfilled the requirements of the case study, as the expert- and scenario-based evaluations showed. The deployment templates from the reference architecture were well suited to implement a cloud-based BI system.

Since the concentration was on feasibility and functionality, the issue of security, especially in the cloud context, has not yet been given full consideration. The expansion of the reference architecture towards a comparison and assessment model for BI in the cloud is another next step. The suitability as a deployment basis has been shown. The distribution of the integrated reference architecture is a long-term step.

Regarding the overall goal of standardizing BI in the cloud, we consider our work at this point an important step towards a unification of the architecture of cloud-based BI systems.

References

[1] O. Norkus. An approach for the standardization of business intelligence in the cloud. In U. Aßmann, B. Demuth, T. Spitta, G. Püschel, and R. Kaiser, editors, Proceedings of Software Engineering & Management, number 239 in LNI, pages 299-303. Bonner Köllen Verlag, 03 2015.
[2] O. Norkus and H.-J. Appelrath. Towards a business intelligence cloud. In Proceedings of the Third International Conference on Informatics Engineering and Information Science (ICIEIS2014), pages 55-66. SDIWC, 09 2014.
[3] O. Norkus, B. Clark, F. Merkel, B. Friedrich, J. Sauer, and H.-J. Appelrath. An approach for a cloud-based contribution margin dashboard in the field of electricity trading. In D. Cunningham, P. Hofstedt, K. Meer, and I. Schmitt, editors, Informatik 2015. Bonner Köllen Verlag.
[4] O. Norkus and J. Sauer. A taxonomy for describing BI cloud services. In Proceedings of the International Conference on Semantic Web Business and Innovation, pages 1-12. SDIWC, 2015.
[5] O. Norkus and J. Sauer. Towards an architecture of BI in the cloud. In G. Silaghi, J. Altmann, and O. Rana, editors, Economics of Grids, Clouds, Systems and Services (GECON 2015). Springer, 09 2015.

Project Report on "Large-Scale Graph-Databases based on Graph Transformations and Multi-Core Architectures"

Hasso Plattner Institute, Prof.-Dr.-Helmert-Straße 2-3, 14482 Potsdam, Germany

Matthias Barkowsky [email protected] Henriette Dinger [email protected] Lukas Faber [email protected] Felix Montenegro [email protected]

Abstract queries are performed as graph transformations [7], which consist of a left side graph, which is matched Updating large scale graph data interactively ac- onto the graph data and replaced by a right side graph. cessed by multiple users applying graph transforma- We distinguish two types of queries. On the one hand tions requires a high throughput. This can be achieved queries are considered as reading queries if their ex- by concurrently executing multiple queries. In order ecution does not have any side effect on the data. In to obtain consistency of the graph data, different syn- that case their left hand side and their right hand side chronisation strategies for graph transformations were are equal. On the other hand there are writing queries prototypically implemented and evaluated using the whose execution does alter the graph data. There are Future Soc Lab. This project report presents the con- transformation engines like Henshin1 or Story Dia- ceptual basics and results of the project. gram Interpreter2 capable of performing such graph transformations. Their work consists of the search for matches and re- placing them. We distinguish engines in one-stage and 1 Introduction two-stage engines. Two-stage engines like Henshin are able to execute a query in two steps: First, the Large scale graph databases keep gaining influence on matches are found. This can be done without any side the web, for example in social networks [2]. Accord- effects. In a second step these matches are replaced. ingly, the amount of requested updates on the graph In contrast one-stage engines like Story Diagram In- data becomes hard to handle with requests potentially terpreter lack this option only offering an execution in incoming faster than they can be processed. To ob- one step. tain the possibility of interactively querying [8] in a multi-user setting the throughput of queries needs to 2 Implementation and Synchronisation be increased. This can be achieved using parallelism on multicore architectures. Then, the problem of en- suring consistency in the data needs to be addressed. The framework offers adapters for different transfor- Therefore, a framework for graph databases, which al- mation engines which are operating on graph data lows integration of different synchronisation strategies 1http://www.eclipse.org/henshin/ for synchronising access to graph data, was developed 2http://www.hpi.uni-potsdam.de/giese/public/mdelab/mdelab- and implemented in the project [1]. In our project, projects/story-diagram-tools/
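As a reading aid for the distinction between reading and writing queries made above, the fragment below models a transformation as a left-hand side and a right-hand side pattern; the types and the structural comparison (C++20 defaulted operators) are invented for this sketch and are not the Henshin or Story Diagram Interpreter APIs.

```cpp
#include <string>
#include <vector>

// Illustrative only: a way to picture what a graph transformation query
// consists of, not any engine's real data model.
struct PatternNode {
    std::string type;
    std::string constraint;
    bool operator==(const PatternNode&) const = default;   // C++20
};
struct PatternEdge {
    int from;
    int to;
    std::string label;
    bool operator==(const PatternEdge&) const = default;
};
struct Pattern {
    std::vector<PatternNode> nodes;
    std::vector<PatternEdge> edges;
    bool operator==(const Pattern&) const = default;
};

struct GraphTransformation {
    Pattern lhs;   // matched against the graph data
    Pattern rhs;   // replaces the matched part

    // A query is "reading" if applying it has no side effect,
    // i.e. its left-hand side and right-hand side are equal.
    bool isReading() const { return lhs == rhs; }
};
```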

117 generated by the Eclipse Modelling Framework3 [5]. requestet nodes befor acquiring locks on them. There- These enable the usage of said transformation en- fore, these nodes must have been identified by the gines for executing queries, allowing users to formu- query in a previous reading phase [9]. late queries in their transformation language of choice. To assure a consistent state during parallel executions, 2.4 Version Node Lock accesses to the databases must be synchronised. In this Synchronisation Strategy (VNL) project, multiple synchronisation strategies are imple- mented and evaluated. VersionNodeLock combines the approaches of lock- ing nodes and versioning. Queries start reading on an 2.1 Reference Strategy (GL) invariable version of the graph. Thus, reading accesses do not need to be synchronised. Results are written on For evaluating the developed strategies, the Global copies of the nodes. Queries must sort and lock nodes Lock Synchronisation Strategy, which only allows se- before committing written versions of them [9]. quential execution, is used as a reference. The strategy allows an execution after acquiring a lock for the en- 3 Work on the Future SOC Lab tire graph database, so no two queries will be executed simultaneously [6]. 3.1 Used Future SOC Lab Resources

2.2 Class Lock Versioning All experiments were executed on a Fujitsu RX600S5 Synchronisation Strategy (CLV) - 2 architecture with 4 Xeon X7550 processors, each having eight cores with 2GHz, providing 1024 GB of The Class Lock Versioning Strategy is a synchronisa- RAM and running an OpenJDK Runtime Environment tion strategy that uses multiple versions of the stored (IcedTea 2.5.1) in Java version 1.7.0 65 and Ubuntu data to allow reading said data while simultaneously 14.04.01 LTS. performing writing operations. Therefore, different versions of the same node can exist at the same time 3.2 Setting without overwriting each other. Instead of directly modifying the data, writing operations create new ver- We conducted our experiments on a graph resembling sions. For reading operations, an appropriate exist- interconnected pages with page ranks. Each page links ing version is chosen to execute the operation on. other pages and is linked by other pages. Additionally Conflicting versions created by parallel executed writ- each page has a page rank dependant of the page rank ing queries are avoided because each query locks all of the pages that link to this page. The corresponding classes of nodes it targets beforehand. ecore model is shown in Figure 1. By only letting reading queries read from versions of the data that are consistent and not modified by parallel executed writing queries, reading and writing queries can be executed in parallel without the risk of reading inconsistent data. To ensure consistency of the data stored in the database, the strategy uses mutual exclu- sion of writing queries with type granularity. Thus, only writing queries that operate on differently typed nodes are allowed to be executed in parallel [3].
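The type-granular mutual exclusion of CLV can be pictured as one lock per node class that writing queries acquire up front, while reading queries proceed without locks on a previously committed version. The following sketch is only an illustration of that idea under the stated assumptions (classes identified by strings, acquisition in sorted order); it is not the project's implementation.

```cpp
#include <map>
#include <mutex>
#include <set>
#include <string>
#include <vector>

// Illustrative sketch of class-level locking in the spirit of CLV.
class ClassLocks {
    std::map<std::string, std::mutex> locks_;   // one mutex per node type
    std::mutex registry_;                       // protects the map itself

    std::mutex& lockFor(const std::string& type) {
        std::lock_guard<std::mutex> g(registry_);
        return locks_[type];                    // creates the mutex on first use
    }

public:
    // A writing query declares the node types it targets beforehand.
    // Acquiring them in sorted order gives a global ordering and thus avoids
    // deadlocks; writers on disjoint type sets still run in parallel.
    std::vector<std::unique_lock<std::mutex>>
    acquireForWrite(const std::set<std::string>& types) {   // std::set is sorted
        std::vector<std::unique_lock<std::mutex>> held;
        for (const auto& t : types)
            held.emplace_back(lockFor(t));
        return held;                                         // released on destruction
    }
    // Reading queries take no locks at all: they read a consistent,
    // previously committed version of the nodes.
};
```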

2.4 Version Node Lock Synchronisation Strategy (VNL)

VersionNodeLock combines the approaches of locking nodes and versioning. Queries start reading on an invariable version of the graph. Thus, reading accesses do not need to be synchronised. Results are written on copies of the nodes. Queries must sort and lock nodes before committing written versions of them [9].

3 Work on the Future SOC Lab

3.1 Used Future SOC Lab Resources

All experiments were executed on a Fujitsu RX600 S5-2 architecture with 4 Xeon X7550 processors, each having eight cores at 2 GHz, providing 1024 GB of RAM and running an OpenJDK Runtime Environment (IcedTea 2.5.1) with Java version 1.7.0_65 on Ubuntu 14.04.01 LTS.

3.2 Setting

We conducted our experiments on a graph resembling interconnected pages with page ranks. Each page links to other pages and is linked by other pages. Additionally, each page has a page rank dependent on the page ranks of the pages that link to it. The corresponding Ecore model is shown in Figure 1.

Figure 1: Ecore model for the page graph.

In this example, we use a page graph consisting of 1,000,000 nodes and 10,000,000 edges. The edges are uniformly distributed. This means that each node has ten outgoing edges; the number of incoming edges is distributed randomly.
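Such a test graph can be generated in a few lines. The following Python sketch only mirrors the description above (ten random outgoing links per node, uniform initial page rank); the actual project stores the graph as an EMF model, so the data structure and names here are our own assumptions.

```python
import random

def build_page_graph(num_nodes=1_000_000, out_degree=10, seed=42):
    """Adjacency list: each page links to `out_degree` randomly chosen pages,
    so incoming degrees end up randomly distributed."""
    rng = random.Random(seed)
    links = [rng.sample(range(num_nodes), out_degree) for _ in range(num_nodes)]
    ranks = [1.0 / num_nodes] * num_nodes   # assumed uniform initial page rank
    return links, ranks

# links, ranks = build_page_graph()   # 10,000,000 edges in total
```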

We defined five queries that we use to query the graph. They differ in the actions they perform:

Reading: The first type of queries is given an id attribute. It will match on a page with an equating id and will return the id attributes of all pages that link to that page.

Range Reading: Unlike the previous query, the second type of queries targets a large number of nodes. These queries are given two decimal parameters. The first parameter describes a lower bound of page rank values and the second describes an upper bound of page rank values. The query will return all page ids of pages whose page rank is within the interval defined by these two bounds.

Updating: Updating queries recalculate the page rank for one page. The query identifies the page with a page id parameter.

Creating: The fourth type has a page id parameter. All pages with an equating id are matched. The query creates a new page with the same id attribute for each matching page. This query inserts both nodes and edges into the graph.

Deleting: The last type of queries matches on pages whose id equals a given id parameter. Then it will delete all matched pages.

We constructed two benchmarks that use the queries introduced above.

Read Load Benchmark

In a multi-user setting many parallel reading queries are performed on the database. Simultaneously there may be writing queries that access the same data. To allow for an interactive use of the database, writing queries need to be executed fast while there are reading queries that would target the same data, which is what we test in this benchmark. In this benchmark 16 reading queries are queried in parallel, 11 being simple reading queries and 5 being range reading queries. When any query is finished it is immediately restarted. In addition, 200 writing queries also need to be executed. They are split up into 160 updating, 20 creating and 20 deleting queries. The benchmark will only terminate after all writing queries are successfully terminated.

Figure 2 shows the execution time using different strategies.

Figure 2: Average execution time of one query using the Read Load Benchmark and different synchronisation strategies (execution time in seconds for GL, CLV and VNL).

It can be seen that both version-based strategies require less time to execute the given writing queries. This is due to their ability to execute the reading queries completely in parallel without hindering the writing queries. Being able to parallelize the writing queries, too, the strategy VNL achieves the best performance. While the strategy GL takes more time to execute the updates, it still completes the execution in a moderate time because it strictly obeys the FIFO principle. The benchmark was also run with the strategy TPNL, but it took a considerably longer time (>7200 seconds) to complete because of the reading queries blocking the execution of the writing queries.

Write Benchmark

In a multi-user setting, different types of queries, such as reading, updating, creating or deleting queries, might be queried. This benchmark tests the performance of the strategies when executing a mix of such queries at the same time. It contains 200 reading queries. On the one hand, these reading queries consist of 195 simple reading queries. They are organized in groups of 5 queries. Each group starts to execute when all queries of the previous group are finished. That way, the reading queries are spread throughout the benchmark. On the other hand, the benchmark contains 5 range reading queries, which are queried at the beginning of the benchmark. Additionally, the benchmark contains 300 writing queries with 150 updating queries, 75 creating queries and 75 deleting queries.

Figure 3 shows how much time the queries needed depending on the strategy.

Figure 3: Execution progress of the Write Benchmark using different synchronisation strategies.

As indicated, GL does not execute any queries in parallel. This is most significantly reflected in the fact that the five long running reading queries are executed sequentially instead of in parallel. Each of these queries needs around four minutes to complete. Therefore the graph of GL quickly grows to 1200 in the beginning. The other strategies can execute these queries in parallel.

TPNL needs these queries to terminate before writing queries can be executed. Its graph remarkably increases in the beginning when the five complex queries need to finish. The version strategies CLV and VNL can execute writing queries in parallel to reading queries. Therefore their execution time grows continuously.

The average time needed to complete further queries notably increases around 200 finished queries for CLV. This strategy can process reading queries in parallel; thereafter only writing queries are left, which all of these strategies can only process sequentially. Unlike the previous strategies, TPNL and VNL can execute about 350 queries before the average time needed to execute further queries significantly increases. This is because they can execute reading and updating queries in parallel. The remaining queries mostly consist of creating and deleting queries that cannot be executed in parallel with other creating and deleting queries. This results in these queries remaining after most other queries are finished. They are tried in parallel, but each query will invalidate the matches of the other queries, causing them to be restarted. Therefore the remaining queries finish sequentially. Like CLV, the VNL strategy uses the multi-version code; both need more time than TPNL to execute queries sequentially.

4 Conclusion

Looking at our test results, it becomes apparent that a highly parallel execution is desirable and greatly increases the throughput of the developed database. To maximise the amount of concurrently executed queries while retaining a consistent state of the stored data, an efficient synchronisation strategy has to be used. We presented different possible solutions on how to maximise concurrent modifications of graph data. We implemented and evaluated a global lock strategy, a version strategy, a node locking strategy and a combination of version and node locking synchronisation strategies. The test results showed that TPNL introduces mechanisms to allow for parallel execution of multiple reading queries. CLV introduces a versioning approach that allows for simultaneous execution of both reading and writing queries. VNL combines the benefits of both approaches.

5 Further Steps

Further steps may include the optimisation of the database, for example by adding the option to create and utilise indices. While this does not increase parallelism when executing multiple queries, the execution of a single query could be sped up, thus increasing overall performance. Some of the implemented strategies do not provide support for graph transformation concepts such as NACs (Negative Application Conditions) and PACs (Positive Application Conditions), which can also be addressed in future work. Furthermore, since the execution of graph transformations as a two-stage process in this context has significant advantages over a one-stage execution, it might be desirable to implement means to split the execution process of otherwise one-stage transformation engines.

References

[1] Large-scale graph-databases based on graph transformations and multi-core architectures. http://hpi.de/fileadmin/user_upload/hpi/dokumente/studiendokumente/bachelor/bachelorprojekte/2014_15/BA_Projekt_G1_FG_Giese_Framework_Graphendatenbanken.pdf. Accessed: 04.08.2015.
[2] R. Angles. A Comparison of Current Graph Database Models. Proceedings of the 2012 IEEE 28th International Conference, pages 171–177, April 2012.
[3] M. Barkowsky. Implementing Parallel Execution of Queries over Graph Databases using Versions. Bachelor's Thesis, Hasso Plattner Institute, University of Potsdam, June 2015.
[4] M. Barkowsky, H. Dinger, L. Faber, F. Montenegro. Anforderungsdokument. Hasso Plattner Institute, University of Potsdam, 2014. Bachelor's Project.
[5] M. Barkowsky, H. Dinger, L. Faber, F. Montenegro. Designdokument. Hasso Plattner Institute, University of Potsdam, 2014. Bachelor's Project.
[6] L. Faber. Development of a Benchmarking Framework and Implementation of Reference Strategies for Evaluation of Synchronisation Strategies for Graph Transformations. Bachelor's Thesis, Hasso Plattner Institute, University of Potsdam, June 2015.
[7] A. Habel, K.-H. Pennemann. Correctness of high-level transformation systems relative to nested conditions. Mathematical Structures in Computer Science, 19, 2009. Cambridge University Press.
[8] R. B. Miller. Response time in man-computer conversational transactions. In Proc. AFIPS Fall Joint Computer Conference, pages 267–277, 1968.
[9] F. Montenegro. Synchronisation of Graph Transformations by Mutual Exclusion of Overlapping Queries. Bachelor's Thesis, Hasso Plattner Institute, University of Potsdam, June 2015.

Analyzing the Global-Scale Internet Graph at Different Topology Levels: Data Collection and Integration

Benjamin Fabian and Georg Tilch
Institute of Information Systems, Humboldt University of Berlin
Spandauer Straße 1, 10178 Berlin, Germany
[email protected], [email protected]

Abstract

Traceroute data from global-scale mapping projects can be used to generate comprehensive graphs at different abstraction levels of the Internet. In our project we aim to integrate several large data sets and create a map of the Internet. Moreover, we will conduct graph analyses with respect to identifying important nodes and assess Internet robustness. In the first phase of the project, the data integration and graph creation have been successfully completed. We also conducted first statistical analyses of the resulting Internet graphs.

1 Internet Topology

The purpose of the Internet as a globe-spanning network is to enable connectivity among the billions of connected machines, i.e., each device should be able to communicate with every other device. Reliable information about the Internet topology is crucial to the development of effective routing algorithms, security purposes, robustness analyses, resilience management, and designing countermeasures against global surveillance.

The Internet topology can be represented at different levels, some of which relate to the classical conceptual OSI (Open Systems Interconnection) model. The Internet is often considered at five granularities, illustrated in Figure 1: the IP-interface, router, Point-of-Presence (PoP), Autonomous System (AS), and Internet Service Provider (ISP) levels.

The IP-interface level gives the most fine-grained resolution. Every router has by definition at least two interfaces, while backbone routers may have many more. Each interface is assigned one to many IP addresses. Therefore, each router usually has multiple IP addresses depending on the configuration of that machine. End-hosts, however, typically only have one (dynamically assigned) IP address.

At this level of granularity, each IP address appears as one node in the corresponding graph while an edge refers to a single-hop network link (layer 3) between them. This implies that each router appears many times in an IP-interface graph. Interfaces are depicted as solid black dots in Figure 1.

Figure 1: Levels of the Internet Topology (schematic view of interfaces, routers, and PoPs grouped into Autonomous Systems AS1–AS3 and Internet Service Providers ISP1–ISP2).

Note that this granularity of topology disregards devices and connections working at lower OSI layers, such as hubs or switches. This ignoring of data link and physical layers comes at the price of some lost information and potential mistakes while inferring actual topologies: for example, machines connected to each other via a switch appear to have a direct edge with one another, while in reality there is one link from each to the switch. Another drawback is the practice of tunneling across different routers, which can significantly distort the inferred topologies.

It is unfortunately not trivial to infer lower layer topologies, which is why network research has mainly concentrated on the IP-interface and higher levels. The largest advantage of this level is that its topology can be measured relatively easily. The traceroute tool can be used to infer the connections between two IP addresses; therefore it has become a standard in topology research.

This project aims at developing a method for creating a large integrated graph at the IP-interface level as the basis for subsequent examinations. Such future analyses include the search for bottlenecks and weak points in the entire Internet topology as well as in the

121 topological connectivity of individual firms and ser- sis in this respect has so far been placed on the AS vices such as cloud provisioning; so far we have only level. Earlier projects included the generation and conducted such investigations at the AS level [9, 11, analysis of a large AS-level graph extracted from 12]. Such graph-based analyses can be enriched by different sources. In particular, robustness analyses business and industry classification [13] as well as and vulnerability assessments of the Internet at AS- geographic and geopolitical information [14]. level have been conducted [9, 11-13]. Large-scale graph analysis has also been applied on the Bitcoin 2 Research Approach transaction log (“blockchain”) [10] as a representa- tion of a social network. Whereas the calculations of the AS-level graph have been possible with the Even though topology discovery has been subject means of the institute, the Bitcoin network could only to earlier research [1-4], details of the macroscopic be examined on subgraphs. This is due to limited picture still remain unclear. This project aims at ad- computing capabilities of end-user equipment and vancing the understanding of the Internet topology by university servers in dealing with the massive integrating empirical data into a multi-leveled graph amounts of generated data. These issues are exacer- model. The main emphasis of our research project is bated when carrying out actual graph analyses: While placed on both generating and analyzing a combined some metrics are relatively easy to calculate (e.g., global-scale Internet graph at different topological degrees), others such as centrality measures are pro- abstraction levels (i.e., IP-interface, Point-of- hibitively expensive with respect to computation. An Presence PoP, Autonomous Systems AS). additional issue with existing graph databases is that Different global-scale Internet mapping projects the whole graph data needs to be stored in the main [2-4, 6] provide the raw data for this project. Direct memory for any calculations. adjacencies of traceroute paths called from globally The same issues are expected in the present project: dispersed vantage points yield edges of an IP- A few tests during the first phase of data acquisition interface level graph. In contrast to assessing the and pre-processing lasted several days even though it datasets individually, our project combines different was conducted on a capable machine (8-core CPU, data sources into one large graph in order to capture 16 GB RAM, 3 TB HDD). Having these experiences as much information as possible. in dealing with large-scale graphs, it was very helpful Relying on the results of earlier work [7], the AS- to use the powerful resources of the HPI Future SOC level graph can be generated with IP-to-AS mapping. Lab for our project [20] that made the present ap- Focus will also be placed on the analysis of the proach feasible. PoP-level graph. With the aid of node labels and subgraph analysis, the initial IP-level graph will be 4 Project Plan further aggregated to the PoP-level. Graph and com- plex network analyses will be conducted on these Our project requires a powerful computational graphs. Moreover, using rDNS and geolocation data configuration to enable the transformation of the of the clustered IP addresses, it is possible to infer the huge amounts of raw data into graphs. Furthermore, a geolocation of the particular PoPs as well [4]. 
The capable graph engine is needed for the topological characteristic geospatial information of the PoP graph analyses of the graph-structured data. That is why we is used to validate the extracted graph with “top- plan to exploit the large-scale memory and multicore down” PoP-level maps from ISP information [5]. architecture of the HP Converged Cloud and the Earlier works have only concentrated on a single newly implemented SAP HANA Graph Engine. The dataset to conduct graph analysis at different topolo- project is structured in four phases. While the first gy levels. Our novel approach attempts to combine and most of the second phase are already completed, different datasets and will process them in an inte- the remaining steps are also planned to be carried out grated and coherent fashion to yield a complimentary on the resources provided by the HPI Future SOC and reliable data basis for future research. To the best Lab [20]. of our knowledge, this will be the first time that such The first phase consists of data acquisition and a comprehensive approach will be followed. pre-processing. The collected traceroute datasets are examined individually to mitigate any peculiarities. 3 Related Publications & Earlier This phase includes cleansing the datasets from Practical Experiences anomalies (e.g., anonymous routers), bringing them into a consistent format, and combining them into a unified raw dataset. The Institute of Information Systems at Humboldt University Berlin has been conducting research in the The second phase is concerned with extracting the graph analysis domain for several years. The empha- graph at different granularities from the cleansed and

combined raw data. First, the IP-interface-level graph will be spanned by interpreting adjacencies in traceroute paths as edges of the graph. This basic graph will then be augmented with additional information, such as timing, edge weights and rDNS lookups. Secondly, IP-to-AS mapping will be used to generate a traceroute-inferred AS-level graph. Thirdly, the IP-level graph will further be clustered into PoP-level entities, using subgraph analysis and rDNS data. This particular step is computationally expensive. Using information of the clustered IP-level nodes, appropriate measures of geolocating the PoP-level graph will be taken.

The third phase deals with the actual graph analysis of the extracted datasets, which is computationally expensive on such a massive scale. That is why we hope to exploit the promising high-performance (OLAP) graph processing capabilities of the recently implemented SAP HANA Graph Engine. With the help of such computational power, we will probably be able to examine the centrality measures, clustering coefficients, shortest paths, and (strongly) connected components in detail [8]. Those metrics describe important characteristics of the graphs.

The fourth and last phase comprises an evaluation of the PoP-level graph on recently created "top-down" PoP-level maps taken from ISP information. Both the extracted PoP-level graph and the "ground truth" maps inherently show geospatial information, which will enable the comparison of the two points of view.

5 Project Status and Results

The project has successfully achieved major milestones of the first and second phases.

5.1 Phase 1: Data Integration

During the "IPv6 Launch" on June 6, 2012, major ISPs permanently enabled IPv6 for their services [15] and since then, more and more traffic has been routed with the new system. This is the main driver why this work considers data collected during the timeframe of June 7 – June 20, 2012 as the observation period. Choosing a point at the advent of IPv6 avoids large distortions of IPv4-collected topologies from the new protocol. The risk of wrongly inferred topologies due to infrastructures transferred during the IPv6 Launch is negligible because ISPs start implementing IPv6 with dual-stack implementations rather than with a clear-cut transition towards the new system [16]. Taking the characteristics of IPv6 introduction into consideration, this selection of the sampling period allows obtaining a comprehensive view of the IPv4 Internet without large distortions by the new routing protocol.

An overview of the traceroute data sources that have been integrated in our project is given in Table 1.

Table 1: Integrated Traceroute Data Sources

                      iPlane      CAIDA      Carna      DIMES     RIPE Atlas  RIPE IPv6L
Size of raw data      45.9 GiB    86.2 GiB   17.8 GiB   30.7 GiB  20.3 GiB    30.5 GiB
Number of files       2,106       1,154      1          7         35          1
Number of records     264.6 mn.   203.3 mn.  67.0 mn.   21.0 mn.  20.9 mn.    10.3 mn.
Vantage points        299         56         266,604    783       4,780       56
Destination IPs       127,566     195.7 mn.  63.0 mn.   2.3 mn.   39          4,323
Number of traces      112.9 mn.   105.6 mn.  41.8 mn.   15.1 mn.  4.1 mn.     1.8 mn.

Whereas each single dataset has some disadvantages, the combination of all of them is expected to yield reliable results. Merging data from diverse topology discovery projects could change the individual drawbacks to advantages because the information created from various points and with diverging methods provides a complemented view on the Internet. For the investigation period, there is presumably no additional traceroute data from any large-scale measurement project available.

A first indication of the quality of the combined dataset can be seen in exploring the conjoint traceroute characteristics: the individual raw data sum to a vast 231.4 GiB and include 587.3 million records for the duration of the observation period. The number of unique monitors adds up to 272,505. Most of these stem from the Carna botnet, but the merging of datasets results in an even larger network of vantage points.

Examine Figure 2, where all monitors are plotted on a world map. As one can see, some areas such as Europe, the US East Coast, and the South East of Brazil show a concentration of sources. Again, the vast majority (97.8%) of vantage points originate in the botnet, but it is illustrative to get a visual idea of the complete probing infrastructure. An interesting fact is that there are 73 duplicate monitors, i.e., machines that executed traceroute queries in more than one project.

Figure 2: Geographic Location of all Vantage Points.

The 257.7 million unique destination IPs in the combined dataset targeted all 14.4 million routable /24 prefixes. This does not only indicate that the number of routable prefixes is accurate but also that the whole Internet in terms of subnets with the size of 256 IPs was queried. In total 40.49% of all traces reached their intended target.

This effect also emerges in Figure 3. The distribution of trace lengths is noticeably an overlap of individual distributions. One can recognize the dominance of few hops and the familiar outlier of two-hop traces (with two or three anonymous routers). There is an increased prevalence of anonymous routers in traces with many hops but generally, the impact of unknown interfaces is rather small: 91.3% of all traces traversed three or fewer non-responding routers, with almost half of them not seeing even one. Overall, the average length of traceroutes is small: 13.29 hops with a standard deviation of 6.04 hops.

Figure 3: Distribution of Trace Lengths in Hops, Combined Dataset (y-axis: number of traceroutes in millions; x-axis: trace length in hops, 0–30).

Conspicuous is, however, the considerably smooth pattern of the combined hop distribution, which demonstrates the underlying principle of this project in an illustrative way: the individual datasets have distortions or peculiarities due to shortcomings in the traceroute implementation, the choice of packets, or the location of sources and targets. However, anomalies "even out" after superimposing the individual components. The features of the combined dataset produce results that are probably closer to reality than any of the individual approaches.

Finally, the acquisition and preprocessing of the data resulted in a combined dataset with 281.5 million unique traces for the observation period. To our knowledge this is the largest and most diverse dataset in a traceroute-based topology discovery project so far and it establishes a thorough basis for the following graph extraction and analysis.

5.2 Phase 2: Graph Extraction

The idea of how a graph is extracted from raw traceroute data is the following: adjacencies in traces are interpreted as direct edges of the IP graph. Moreover, the graphs of higher topology levels can be constructed through aggregation. The assumption is that layer-3 information contains knowledge about the higher levels of the Internet, akin to the statistical physics approach that links "microscopic dynamics and interactions […] to the statistical regularities of macroscopic physical systems" [17, Preface, p. x].

The source datasets suffer, however, from the shortcoming that none of them is large enough to obtain data about the whole Internet and thus necessarily only captures a subset. This limitation is apparently due to the massive effort needed (in terms of time and money) to establish a sufficiently large measurement infrastructure. Therefore, this project chooses a different approach for graph extraction. Recall that in the large-scale topology discovery projects "the outcome of many traceroute measurements should be merged" [18, p. 5] to extract an IP-level graph. Therefore, there is no conceptual break in merging the raw traceroutes from different datasets as well – even if their collection strategies differ. In the end, only direct IP adjacencies of the traceroute records determine how an edge is defined, and the combination of raw data from different projects behaves just like the installation of additional monitors and choosing more destinations as targets. The data from all datasets are processed in exactly the same fashion to alleviate the risk that differences in data aggregation influence the properties of the graphs. The combination of diverging data thus yields a more comprehensive picture of the actual Internet topology and can alleviate individual measurement biases.
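The extraction step described above (traceroute hop lists to IP-interface edges, then aggregation to higher levels) can be illustrated with a short sketch. This is an illustrative Python/networkx sketch under our own assumptions — the record parsing, the ip_to_asn lookup (e.g., something derived from the Team Cymru mapping data [7]), and all names are hypothetical, not the project's actual pipeline.

```python
import networkx as nx

def add_trace(ip_graph, hops):
    """hops: ordered list of IP strings from one traceroute record;
    None marks a non-responding (anonymous) router."""
    for src, dst in zip(hops, hops[1:]):
        if src is None or dst is None or src == dst:
            continue                       # skip adjacencies touching unknown interfaces
        ip_graph.add_edge(src, dst)        # direct IP-interface adjacency

def aggregate_to_as(ip_graph, ip_to_asn):
    """Collapse the IP-interface graph to the AS level using an IP-to-AS lookup;
    IPs without a mapping are dropped."""
    as_graph = nx.Graph()
    for u, v in ip_graph.edges():
        a, b = ip_to_asn.get(u), ip_to_asn.get(v)
        if a and b and a != b:
            as_graph.add_edge(a, b)
    return as_graph

ip_graph = nx.Graph()
# for hops in parsed_traceroute_records:   # records merged from all six sources
#     add_trace(ip_graph, hops)
```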

In Table 2, the fundamental statistics of the extracted graphs are shown (for the largest connected components, LCC).

Table 2: Size Metrics of the Graphs (LCC)

              IP          Router      PoP       AS        ISP
Nodes         3,255,088   2,806,857   53,348    33,752    31,030
Edges         8,544,788   5,039,348   102,591   122,561   113,489
Avg. degree   5.2501      3.5907      3.8461    7.2624    7.3148

5.3 Use of Hardware Resources

The procedures to convert the raw data into graphs include the loading, parsing, filtering, and extracting of edges from the massive amounts of data and are resource-demanding. Furthermore, the calculation of graph metrics is computationally highly expensive. That is why there is a need for massive computing power.

The hardware provided by the Future SOC Lab so far included three HP Converged Cloud Blades with 24 x 64-bit CPUs running at a frequency of 1.2 GHz on Ubuntu 14.04. Each of the three machines had 64 GiB memory and was equipped with a 1 TiB HDD. Furthermore, 3 TiB were mounted on every machine to store intermediate and final results. This configuration permits an extensive parallelization of tasks. The calculated results would not have been possible without the support of the HPI. For the intense calculations of phase 3, the use of HANA could become crucial.
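For reference, size metrics of the kind reported in Table 2 can be computed as in the following networkx sketch (our own illustration, not the project's HANA-based tooling). The average degree of an undirected graph is 2·|E|/|V|, which is consistent with the values in Table 2.

```python
import networkx as nx

def lcc_metrics(graph):
    """Node count, edge count, and average degree of the largest connected component."""
    lcc_nodes = max(nx.connected_components(graph), key=len)
    lcc = graph.subgraph(lcc_nodes)
    n, m = lcc.number_of_nodes(), lcc.number_of_edges()
    return {"nodes": n, "edges": m, "avg_degree": 2.0 * m / n}

# Example check against the IP-level row of Table 2:
# 2 * 8,544,788 / 3,255,088 ≈ 5.2501
```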
6 Conclusion

The phases 1 and 2 of the project, data integration and graph extraction, have been completed. In future work, we aim to conduct the further steps of the project, in particular the graph centrality and robustness analyses.

References

[1] N. Spring, R. Mahajan, D. Wetherall, and T. Anderson, "Measuring ISP Topologies With Rocketfuel," IEEE/ACM Trans. Networking, vol. 12, no. 1, pp. 2–16, 2004.
[2] KC Claffy, Y. Hyun, K. Keys, M. Fomenkov, and D. Krioukov, "Internet Mapping: From Art to Science," in Proceedings of the Cybersecurity Applications and Technology Conference for Homeland Security (CATCH '09), Los Alamitos, CA: IEEE, 2009, pp. 205–211.
[3] H. V. Madhyastha, T. Isdal, M. Piatek, C. Dixon, T. Anderson, A. Krishnamurthy, and A. Venkataramani, "iPlane: An Information Plane for Distributed Services," in Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI '06), Berkeley, CA: USENIX, 2006, pp. 367–380.
[4] D. Feldman and Y. Shavitt, "Automatic Large Scale Generation of Internet PoP Level Maps," in Proceedings of the 2008 IEEE Global Telecommunications Conference (GLOBECOM 2008), Piscataway, NJ: IEEE, 2008, pp. 1–6.
[5] S. Knight, H. X. Nguyen, N. Falkner, R. Bowden, and M. Roughan, "The Internet Topology Zoo," IEEE J. Select. Areas Commun., vol. 29, no. 9, pp. 1765–1775, 2011.
[6] Carna Botnet, Internet Census 2012: Port scanning /0 using insecure embedded devices. Available: http://internetcensus2012.bitbucket.org/paper.html. Accessed 26 Oct 2015.
[7] Team Cymru, IP to ASN Mapping. Available: http://www.team-cymru.org/IP-ASN-mapping.html. Accessed 26 Oct 2015.
[8] M. Rudolf, M. Paradies, C. Bornhövd, and W. Lehner, "The Graph Story of the SAP HANA Database," in Proceedings of the 15th GI-Conference on Database System in Business, Technology and Web (BTW 2013), Bonn: Gesellschaft für Informatik, 2013, pp. 403–420.
[9] A. Baumann and B. Fabian, "How Robust is the Internet? – Insights from Graph Analysis," in Proceedings of the 9th International Conference on Risks and Security of Internet and Systems (CRiSIS 2014), Trento, Italy, Springer, LNCS 8924, 2014.
[10] A. Baumann, B. Fabian, and M. Lischke, "Exploring the Bitcoin Network," in 10th International Conference on Web Information Systems and Technologies (WEBIST 2014), 2014, pp. 369–374.
[11] B. Fabian, A. Baumann, and J. Lackner, "Topological Analysis of Cloud Service Connectivity," Computers & Industrial Engineering, vol. 88, pp. 151–165, October 2015.
[12] A. Baumann and B. Fabian, "Vulnerability Against Internet Disruptions – A Graph-based Perspective," Proceedings of the 10th International Conference on Critical Information Infrastructures Security (CRITIS 2015), Berlin, Germany, October 2015, Springer LNCS.
[13] A. Baumann and B. Fabian, "Who Runs the Internet? Classifying Autonomous Systems into Industries," Proceedings of the 10th International Conference on Web Information Systems and Technologies (WEBIST), Barcelona, Spain, April 2014.
[14] A. Baumann and B. Fabian, "Towards Measuring the Geographic and Political Resilience of the Internet," International Journal of Networking and Virtual Organisations, 13(4):365–384, 2013.
[15] Internet Society: World IPv6 Launch. Available: http://www.worldipv6launch.org/. Accessed 26 Oct 2015.
[16] Internet Society: World IPv6 Launch Measurements. Available: http://www.worldipv6launch.org/measurements/. Accessed 26 Oct 2015.
[17] R. Pastor-Satorras and A. Vespignani, "Evolution and Structure of the Internet: A Statistical Physics Approach," Cambridge University Press, 2004.
[18] R. Motamedi, R. Rejaie, W. Willinger, "A Survey of Techniques for Internet Topology Discovery," IEEE Communications Surveys & Tutorials, vol. 17 (2), 2014, pp. 1044–1065.
[19] Y. Shavitt, E. Shir, "DIMES: Let the Internet Measure Itself," ACM SIGCOMM Computer Communication Review, 35(5): 71–74.
[20] HPI Future SOC Lab. Available: https://hpi.de/forschung/future-soc-lab.html. Accessed 26 Oct 2015.


Comparison of Feature Extraction Approaches for Image Classification

Christian Hentschel, Timur Pratama Wiradarma, Harald Sack
Hasso-Plattner-Institut, Prof.-Dr.-Helmert-Str. 2-3, 14482 Potsdam
[email protected], [email protected], [email protected]

Abstract large scale computing power as well as large training datasets manually pooled from web resources such as Deep Convolutional Neural Networks (CNN) have re- Facebook or Flickr made CNNs successful. Typically, cently been shown to outperform previous state of the a CNN model for image classification today is trained art approaches for image classification. Recent efforts on a GPU1 which offers a high degree of paralleliza- focus on finetuning CNN models trained on outside tion. data to novel datasets. In this report we investigate While the requirement of powerful GPU hardware will to what extent this can be successful when the outside most likely diminish as an obstacle in future, manu- data and the new data to be classified differ. ally labeling large amounts of training data cannot be easily substituted. For many classification scenarios it is simply impossible to provide a reasonably large 1 Introduction amount of training data as it requires manual label- ing of images, which is a tedious and time consum- ing task. Recent efforts have therefore been focusing Content-based image classification is an important on reusing CNN models that have been pre-trained on means to address the challenge of search and retrieval outside data for classification of novel datasets. How- in large datasets such as community and stock photo ever, it is still unclear how pre-trained models perform galleries. Typically, manual tagging is impossible due as generic feature extractors especially in cases where to the large amount of visual information and perfor- the new dataset to be classified differs visually from mance improvements in automatic categorization of the dataset used to train the CNN model. In this re- visual data have rendered automatic tagging an alter- port we evaluate these new techniques and to com- native worth considering. pare their performance on visually different datasets In this report, we compare two different approaches for to the performance obtained by Bag-of-Visual-Words image classification – Bag-of-Visual-Words (BoVW) approaches. and Convolutional Neural Networks (CNN) – in terms Based on our experience in the preceding Future SOC of the obtained classification accuracy. While BoVW Lab project [7] where we successfully built an infras- has proven to provide good results in the past in recent tructure that allowed us to train CNN models using dif- benchmarking initiatives (cf. e.g. [6]) they have been ferent dataset sizes and evaluate their individual accu- largely outperformed by CNNs. Their success should racy on the ImageNet dataset [6] we reuse these indi- mainly be attributed to the fact that different from stan- vidual models as feature extractors and fine-tune them dard image classification approaches such as BoVW, in order to classify different datasets. As this requires which use hand-crafted Feature Extractors, CNNs con- several training and testing runs, we largely benefited sider the process of feature extraction as part of the from the Future SOC resources in order to conduct model training process. Thus, the derived features usu- our experiments in reasonable time. An own imple- ally represent the data much better and therefore lead 1 to more accurate models. Nvidia created the parallel computing platform and program- ming model called CUDA (Compute Unified Device Architecture, While the idea of using CNNs in image classification [1]) meant to be a general purpose architecture not limited to com- is not new (cf. e.g. 
[4]) only recently the availability of puter graphics.

mentation of BoVW extraction and model training is available and has been tuned to exploit modern multiprocessor architectures such as provided by the Future SOC Lab. For training our own CNN models we rely on existing implementations [2] which have successfully been adapted to the Future SOC Lab environment in the preceding project. These implementations improve computation speed by exploiting modern GPU hardware (such as the Nvidia Tesla architecture provided in the Future SOC Lab) in order to speed up the model training processes.

2 Experimental Setup

The dataset provided by the "WikiArt.org – Encyclopedia of fine arts" project (http://www.wikiart.org) contains images that have visual characteristics different from the object and scene categories in the ImageNet dataset, which we used to train a Convolutional Neural Network. The Wikipaintings dataset (a list of image URLs can be obtained from http://sergeykarayev.com/vislab/datasets.html) is based on the "WikiArt.org" project and contains a collection of paintings from different epochs, ranging from Renaissance to Modern Art movements. Figure 1 shows example images for some classes. As one can see, the images exhibit visual characteristics different from real-world object images, with strong strokes and unique color compositions.

Figure 1: Example images from the Wikipaintings dataset.

All paintings are manually labeled according to the respective art epoch. We used the dataset to generate groundtruth data by selecting 1000 images for training and 50 images for testing. Since the number of images in the Wikipaintings collection varies a lot (e.g., for some categories less than 100 images are available while others provide more than 10,000 images), only the classes that have at least 1050 images were chosen (i.e., 22 classes). Based on these classes, we generated training subsets by randomly selecting 5, 10, 20, 40, 60, 80 and 100% of the entire training data (while keeping the test data fixed).

We tested three different approaches to classify Wikipaintings test images using the different training set splits: Following our findings in [7] we trained Support Vector Machine classifiers based on Improved Fisher Vector (IFV) image encodings (see [5]). Furthermore we trained CNNs using only the available training data. Finally, we used the best performing model from our previous experiments on the ImageNet dataset (using all training data from all classes) and fine-tuned that model to the Wikipaintings dataset.

2.1 Fine-tuning CNNs

In our experiments, the CNN architecture proposed by Krizhevsky et al. in the ImageNet 2012 competition was chosen [3]. A replication of the model architecture is provided by the Caffe framework [2], with small modifications in the order of pooling and normalization steps (i.e., pooling is applied before normalization). This setting helped to speed up the forward run without reducing accuracies. In order to fine-tune a pre-trained CNN model to a new target dataset, the last layer (i.e., the third fully-connected layer in the architecture we employed) is replaced with the new target outputs – 22 outputs for the Wikipaintings scenario. Additionally, the learning rate on the last layer is multiplied by a factor of 10. Moreover, based on some experimental results, pre-trained models converge faster if using fewer epochs (i.e., 30 epochs). The remainder of the training parameters – such as the initial learning rate, momentum or batch sizes – is the same as when training the entire network from scratch (i.e., without using outside data).
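In Caffe, such a fine-tuning run is typically driven by a solver/prototxt pair in which the renamed final layer (22 outputs, 10x learning-rate multiplier) is defined; a minimal Python sketch of the training call is shown below. The file names are assumptions for illustration — the report does not list the exact prototxt or model files used.

```python
import caffe

caffe.set_mode_gpu()
# The train prototxt referenced by the solver is assumed to define a renamed
# 22-way output layer (e.g. "fc8_wikipaintings") with a 10x lr_mult, so its
# weights are freshly initialised instead of being copied from the source model.
solver = caffe.SGDSolver('solver_wikipaintings.prototxt')
solver.net.copy_from('imagenet_pretrained.caffemodel')  # copies only layers whose names match
solver.solve()
```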

3 Comparison of Fine-tuned CNNs and IFV models

In order to evaluate the results achieved by the different approaches, we computed the mean average precision scores (MAP, averaged over all classes) and plotted them as a function of the training dataset size used to train the model (see Fig. 2).

Figure 2: A comparison of IFV, CNN (trained from scratch) and CNN (pre-trained using ImageNet data) on the Wikipaintings dataset. Reported scores are MAP scores (y-axis, 0.0–0.7) as a function of increasing training set sizes (x-axis, dataset ratio in percent).

The figure shows that, by adding more positive sample images per class, the classification performance increased for all approaches. The best performing results are achieved by the pre-trained CNN model, whereas CNNs trained from scratch performed worse than SVM models using IFV encodings.
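The MAP score used here can be reproduced as the unweighted mean of per-class average precision values. A small sketch, assuming scikit-learn and one-hot ground-truth/score matrices for the 22 classes (our own illustration of the metric, not the authors' evaluation code):

```python
import numpy as np
from sklearn.metrics import average_precision_score

def mean_average_precision(y_true, y_score):
    """y_true: (n_samples, n_classes) binary labels,
    y_score: (n_samples, n_classes) classifier scores.
    Returns average precision averaged over all classes."""
    aps = [average_precision_score(y_true[:, c], y_score[:, c])
           for c in range(y_true.shape[1])]
    return float(np.mean(aps))
```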

In scenarios with a very small amount of training data available (e.g., considering 5% – 10% of the original training data, resulting in 50 – 100 images per class) the models based on IFV outperform both CNN-based approaches, with the biggest difference at 5% (about 6% difference in achieved MAP score). By incrementing the number of training images, the CNN model based on outside data improves and outperforms the IFV-based approach when using about 20% of the available training data. The highest score (i.e., MAP=55.9%) is achieved by the pre-trained CNN model trained on the entire training set. Moreover, while the IFV models converged at 80% and even dropped at 100% (from 51.2% to 48.9% MAP score), both CNN approaches improve with increasing number of training images without showing any sign of saturation. A similar finding with IFV saturating at around 80% of the entire training data has also been observed in previous experiments when comparing CNNs and IFV on ImageNet data (see [7]).

Our findings show that pre-trained CNN models need a certain amount of training images to adapt to a new dataset. Improved Fisher Vector encodings can be a competitive alternative when only limited amounts of data are available. Finally, we observe that the overall achieved average precision scores are significantly lower when comparing to the classification of real-world objects and scenes as in ImageNet. Whereas CNNs as well as IFV-based approaches were able to achieve near perfect results (AP=100%) for the easiest categories of the ImageNet dataset, none of the classification models could achieve near perfect results for any of the available classes. This is an indicator of the Wikipaintings classification scenario being more complicated than ImageNet.

4 Conclusions and Future Work

In this report we have shown an experimental setup for the comparison of three state-of-the-art image classification approaches: Convolutional Neural Networks trained with and without outside data as well as Improved Fisher Encodings and Support Vector Classifiers. Reported results underline the superior performance of pre-trained CNN models when at least 100 training images are available per category. In scenarios with fewer training data available, IFV have proven to outperform CNN-based approaches by up to 6% MAP. Future work will focus on improving CNN models by adding data augmentation techniques. CNNs have been reported to benefit from simple data manipulation tricks such as rotating, flipping and blurring training images. By that means, the available training data can be increased at no additional labeling cost.

Running our experiments in a reasonable amount of time required modern GPU and CPU hardware as provided by the HPI Future SOC Lab. Training CNNs was conducted on two Nvidia Tesla K20X GPUs and IFV-based encodings were computed using a highly parallelized implementation running on a 128-core machine using up to 1 TB of main memory.

References

[1] NVIDIA, CUDA Parallel Computing Platform. http://www.nvidia.com/object/cuda_home_new.html. Accessed: 2015-10-02.
[2] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
[3] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In F. Pereira, C. Burges, L. Bottou, and K. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc., 2012.
[4] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. In Proceedings of the IEEE, pages 2278–2324, 1998.
[5] F. Perronnin, J. Sánchez, and T. Mensink. Improving the Fisher kernel for large-scale image classification. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 6314 LNCS (Part 4), 2010.
[6] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575, 2014.
[7] T. Wiradarma, C. Hentschel, and H. Sack. Comparison of image classification models on varying dataset sizes. Technical report, Hasso-Plattner-Institut, Future SOC Lab report, 2015.


Optimization of Data Mining Ensemble Algorithms on SAP HANA

David Müller, Sabrina Plöger, Christoph M. Friedrich and Christoph Engels
University of Applied Sciences and Arts Dortmund, Department of Computer Science
Emil-Figge-Str. 42, D-44227 Dortmund
[email protected], [email protected], [email protected], [email protected]

Abstract to utilize HANA’s powerful capabilities for CPU- intensive algorithms [18][23]. Ensemble methods (like random forests, quantile for- The project idea of the last and the resent periods was ests, gradient boosting machines and variants) have and is to optimize the self-implemented random forest demonstrated their outstanding behavior in the do- in C++ [20][21][22]. main of data mining techniques.

This project focuses on optimization of a self-imple- Why Ensemble Methods? mented ensemble method on SAP HANA to combine a Predictive statistical data mining has evolved further powerful environment with a fully developed data over the recent years and remains a steady field of ac- mining algorithm. tive research. The latest research results provide new data mining methods which lead to better results in 1 Project Idea model identification and behave more robustly espe- cially in the domain of predictive analytics. Most ana- In the first four FSOC Lab periods the University of lytic business applications lead to improved financial Applied Sciences and Arts Dortmund successfully ad- outcomes directly, for instance demand prediction, dressed the topic Data Mining on SAP HANA with fraud detection and churn prediction their projects Raising the power of Ensemble Tech- [1][2][10][14][15][30]. Even small improvements in niques and Performance Optimization of Data Mining prediction quality lead to enhanced financial effects. Ensemble Algorithms on SAP HANA [11][18]. The in- Therefore the application of new sophisticated predic- itial project idea was to compare different opportuni- tive data mining techniques enables business pro- ties, which enable the usage of predictive analytical cesses to leverage hidden potentials and should be techniques on SAP HANA. considered seriously. SAP is offering the Predictive Analysis Library Especially for classification tasks ensemble methods (PAL), which contains more than 40 well-known al- (like random forests) show powerful behavior gorithms in the fields of classification analysis, asso- [6][7][28] which includes that ciation analysis, data preparation, outlier detection,  they exhibit an excellent accuracy, cluster analysis, time series analysis, link prediction and others [27].  they scale up and are parallel by design, In the first project period very accurate predictions  they are able to handle could be achieved by using PAL’s decision tree imple- o thousands of variables, mentation [13]. On the other hand performance prob- o many valued categories, lems for certain functions in combination with special datasets occurred as the PAL implementation was rel- o extensive missing values, atively new and the high potential of the HANA archi- o badly unbalanced datasets, tecture was not fully exploited [5]. Furthermore, no ensemble methods were part of the comprehensive se-  they give an internal unbiased estimate of test set lection of algorithms offered by PAL, yet [27]. error as primitives are added to ensemble, In the second period the project team focused on the  they can hardly overfit, implementation of an ensemble method on HANA by  they provide a variable importance and using the SAP internal language L [12][19].  they enable an easy approach for outlier detection. As this implementation could not match the expected performance advantages, a new project has been initi- Why SAP HANA? ated in order to write and implement the random forest algorithm in C++ by using the SAP HANA AFL SDK, SAP HANA is a “flexible, data-source-agnostic tool- set […] that allows you to hold and analyze massive


volumes of data in real time" [3]. It enhances data processing by sophisticated technologies like Massive Parallel Processing (MPP), in-memory computing, columnar data storage, compression and others [3][16][17][26]. Through this project the powerful capabilities of SAP HANA shall be exploited to gain fast processing of CPU-intensive predictive calculations.

Project Goal and Strategy

The overall project idea is to optimize the C++ random forest implementation on SAP HANA. On the one hand, the usage of this method leads to longer runtimes for one special data set in comparison to the PAL C4.5 function. Therefore, the reason has to be identified to optimize the existing code. On the other hand, the implemented algorithm is not aligned to a special decision tree specification. Thus, the implementation has to be adjusted to such a specification, to make a comparison between this method and other algorithms more valid.

The project consists of the following milestones:

- Optimize source code. Find options to make the implementation more reliable and stable.
- Identify specifications of the C4.5 method by Quinlan [24].
- Construct a concept for adjusting the random forest method to the specifications of the C4.5 algorithm.
- Adjust the random forest implementation in C++.
- Compare the C++ method with the PAL C4.5 algorithm, to get valid statements about accuracy and runtime results of the C++ implementation.

2 Used Future SOC Lab Resources

For this project a HANA environment revision nine with the latest PAL distribution and access to the HANA AFL SDK is needed.

3 Project Findings and Impacts

Impacts on the project and its results are listed in this chapter, as well as the project findings.

3.1 Adjustment of implementation code for stable application

The random forest implementation is the outcome of an ongoing research project. Tests have shown that the current version does not yet run stably for all possible input data. Consequently, sources of errors are identified and corrected. This comprises especially the prediction method for multiple tree inputs. The prediction function was redesigned and now uses linked tree node objects. This allows not only a more stable runtime behavior but also reduces the prediction runtime.

3.2 Adjustment to the C4.5 by Quinlan

The comparison between the implemented method and the C4.5 specification by Quinlan shows some opportunities for adjusting the current implementation. Main differences are the randomization, error-based pruning, windowing and the handling of missing values [24]. In this project period the focus is put on the randomization method. The implementation is adjusted to a random feature selection with a maximum consideration of log2(number of attributes)+1 attributes at each node of a tree.
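For illustration, the random feature selection rule described above can be sketched in a few lines of Python (the project itself implements this in C++ via the HANA AFL SDK; attribute names below are hypothetical):

```python
import math
import random

def candidate_attributes(attributes, rng=random):
    """Attributes a split node may consider: at most log2(#attributes) + 1,
    drawn at random, as in the adjusted randomization."""
    k = min(int(math.log2(len(attributes))) + 1, len(attributes))
    return rng.sample(attributes, k)

# Example: 42 discrete attributes (as in Connect4) -> 6 randomly chosen candidates
print(candidate_attributes([f"attr_{i}" for i in range(42)]))
```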
3.3 Datasets for Testing

Four datasets are picked for the testing, comprising:

- Car-Purchase: The goal of this dataset is to predict if a car purchased at an auction is a good or bad buy. The dataset consists of 72,666 observations, 13 discrete and 13 numeric attributes [8].
- Connect4: Predicting if player one wins the game or not. The dataset consists of 67,557 observations and 42 discrete attributes [29].
- Covertype: Predicting the forest cover type from cartographic variables. The dataset consists of 581,012 observations, 44 discrete attributes and 10 numeric attributes [4].
- Pokerhand: Each record is an example of a hand consisting of five playing cards. The class describes the poker hand. The dataset consists of 1,025,010 observations and 10 discrete attributes [9].

These datasets are heterogeneous in the number of observations, the number of attributes, the distribution of discrete and numeric attributes, and the number of distinct values of both the discrete attributes and the class column. Thus they provide a foundation for solid and applicable test results.

3.4 Performance

The C++ implementation achieves the shortest runtimes for creating a decision tree for data sets with just discrete attributes. This covers all tests with the data sets Connect4 and Pokerhand (see appendix 1). The training of a model can be executed up to eight times faster by the usage of the C++ decision tree. For Covertype and Car-Purchase the PAL C4.5 method is the faster algorithm, even if the results are more similar compared to the results with just discrete data sets. It is important to mention that the PAL method is not aligned to the C4.5 specifications by Quinlan concerning the handling of numeric attributes


[31]. Tests are pending, if this different approach is re- 6 Conclusion sponsible for better performance results compared to the C++ method. Optimizations of the ensemble technique are imple- The C++ decision tree is the fastest method for pre- mented successfully in this project period and all pro- dicting with the created model for all performed tests ject goals are accomplished. The implementation of- (see appendix 2). fers a strong and fast predictive model. Nevertheless, there are still some opportunities to optimize the im- The random forest implementation achieves good per- plementation with respect to performance and accu- formance results as well. Despite the application of 50 racy of prediction by application of other program- unpruned trees, the training algorithm is only three to ming paradigms and further-developed prediction ten times slower than creating only one tree with C++ methods. and the prediction is just two to seven times slower. This results can be attributed to the usage of bagging, randomization and parallelization. 7 References

6 Conclusion

Optimizations of the ensemble technique were implemented successfully in this project period and all project goals were accomplished. The implementation offers a strong and fast predictive model. Nevertheless, there are still opportunities to optimize the implementation with respect to performance and accuracy of prediction by applying other programming paradigms and further-developed prediction methods.

7 References

[1] R. E. Banfield et al.: "A Comparison of Decision Tree Ensemble Creation Techniques", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 29, No. 1 (2007).
[2] S. Benkner, A. Arbona, G. Berti, A. Chiarini, R. Dunlop, G. Engelbrecht, A. F. Frangi, C. M. Friedrich, S. Hanser, P. Hasselmeyer, R. D. Hose, J. Iavindrasana, M. Köhler, L. Lo Iacono, G. Lonsdale, R. Meyer, B. Moore, H. Rajasekaran, P. E. Summers, A. Wöhrer and S. Wood: "@neurIST: Infrastructure for Advanced Disease Management through Integration of Heterogeneous Data, Computing, and Complex Processing Services", DOI:10.1109/TITB.2010.2049268, IEEE Transactions on Information Technology in BioMedicine, 14(6), pages 1365-1377 (2010).
[3] B. Berg, P. Silvia: "SAP HANA: An Introduction", 2nd edition, Galileo Press, Boston (2013).
[4] J. A. Blackard (Colorado State University): Covertype Database, (1998), UCI Machine Learning Repository, URL: http://archive.ics.uci.edu/ml, accessed on 15.10.2014.
[5] J.-H. Böse, SAP Innovation Center Potsdam, personal communication, Aug. 2013.
[6] L. Breiman: "RF/tools – A Class of Two-eyed Algorithms", SIAM Workshop, (2003), URL: http://www.stat.berkeley.edu/~breiman/siamtalk2003.pdf, accessed on 11.03.2014.
[7] L. Breiman: "Random Forests", (1999), URL: http://www.stat.berkeley.edu/~breiman/random-forests-rev.pdf, accessed on 11.03.2014.
[8] Car-Purchase Dataset, (2011), Kaggle, URL: https://www.kaggle.com/c/DontGetKicked, accessed on 15.10.2014.
[9] R. Cattral (Carleton University): Pokerhand Database, (2007), UCI Machine Learning Repository, URL: http://archive.ics.uci.edu/ml, accessed on 15.10.2014.
[10] C. Engels: "Basiswissen Business Intelligence", W3L Verlag, Witten (2009).
[11] C. Engels, C. Friedrich: "Proposal - Raising the Power of Ensemble Techniques", Proposal to the summer 2013 period at the HPI Future SOC Lab, (2013).
[12] C. Engels, C. Friedrich: "Proposal - Follow-up & Extension Activities to the Raising the Power of Ensemble Techniques Project", Proposal to the winter 2013 period at the HPI Future SOC Lab, (2013).
[13] C. Engels, C. Friedrich, D. Müller: "Report - Raising the Power of Ensemble Techniques", Report to the summer 2013 period at the HPI Future SOC Lab, (2013).
[14] C. Engels, W. Konen: "Adaptive Hierarchical Forecasting", Proceedings of the IEEE IDAACS 2007 Conference, Dortmund (2007).
[15] J. Friedman: Computational Statistics & Data Analysis, Volume 38, Issue 4, 28 February 2002, pages 367-378, URL: http://dx.doi.org/10.1016/S0167-9473(01)00065-2, accessed on 11.03.2014.
[16] J. Haun et al.: "Implementing SAP HANA", 1st edition, Galileo Press, Boston (2013).
[17] R. Klopp: "Massively Parallel Processing on HANA", (2013), URL: http://www.saphana.com/community/blogs/blog/2013/04/22/massively-parallel-processing-on-hana, accessed on 11.03.2014.
[18] D. Müller, C. Engels, C. Friedrich: "Proposal - Performance Optimization of Data Mining Ensemble Algorithms on SAP HANA", Proposal to the summer 2014 period at the HPI Future SOC Lab, (2014).
[19] D. Müller, C. Engels, C. Friedrich: "Report - Follow-up & Extension Activities to the Raising the Power of Ensemble Techniques Project", Report to the winter 2013 period at the HPI Future SOC Lab, (2014).
[20] D. Müller, S. Plöger, C. Engels, C. Friedrich: "Proposal - Follow-up & Extension Activities to Performance Optimization of Data Mining Ensemble Algorithms on SAP HANA", Proposal to the winter 2014 period at the HPI Future SOC Lab, (2014).
[21] D. Müller, S. Plöger, C. Engels, C. Friedrich: "Proposal - Optimization of Data Mining Ensemble Algorithms on SAP HANA", Proposal to the summer 2015 period at the HPI Future SOC Lab, (2015).
[22] D. Müller, S. Plöger, C. Engels, C. Friedrich: "Report - Follow-up & Extension Activities to Performance Optimization of Data Mining Ensemble Algorithms on SAP HANA", Report to the winter 2014 period at the HPI Future SOC Lab, (2014).
[23] D. Müller, S. Plöger, C. Engels, C. Friedrich: "Report - Performance Optimization of Data Mining Ensemble Algorithms on SAP HANA", Report to the summer 2014 period at the HPI Future SOC Lab, (2014).
[24] J. R. Quinlan: "C4.5: Programs for Machine Learning", Morgan Kaufmann (The Morgan Kaufmann Series in Machine Learning), San Mateo, California (1993).
[25] R Project (2015): What is R? Available online.
[26] SAP AG: "SAP HANA Developer Guide (document version: 1.0 – 27.11.2013, SPS 07)", (2013), URL: http://help.sap.com/hana/SAP_HANA_Developer_Guide_en.pdf, accessed on 11.03.2014.
[27] SAP AG: "What's New? SAP HANA SPS 07 - SAP HANA Application Function Library (AFL)", (2013), URL: http://www.saphana.com/servlet/JiveServlet/download/4267-1-12720/What%C2%B4s%20New%20SAP%20HANA%20SPS%2007%20-%20AFL%20Predictive.pdf, accessed on 11.03.2014.
[28] G. Seni, J. Elder: "Ensemble Methods in Data Mining", Morgan & Claypool, San Rafael, California (2010).
[29] J. Tromp: Connect4 Database, (1995), UCI Machine Learning Repository, URL: http://archive.ics.uci.edu/ml, accessed on 15.10.2014.
[30] G. Üstünkar, S. Özögür-Akyüz, G. W. Weber, C. M. Friedrich and Y. A. Son: "Selection of Representative SNP Sets for Genome-Wide Association Studies: A Metaheuristic Approach", DOI:10.1007/s11590-011-0419-7, Optimization Letters, Volume 6(6), pages 1207-1218, (2012).
[31] P. Wang: Numeric Attribute Split in Decision Tree C4.5, mail to Daniel Johannsen, Nov. 2014.


Appendix:

Appendix 1: Training Runtime of C++ Decision Trees and PAL C4.5 in Relation to PAL C4.5

Appendix 2: Test Runtime of C++ Decision Trees and PAL C4.5 in Relation to PAL C4.5

Appendix 3: Accuracy of C++ Decision Trees, C++ Random Forest and PAL C4.5


Simulation of User Behavior on a Security Testbed Report for the Project ”Security Monitoring and Analytics of HPI Future SOC Lab (Phase II) in 2015 Spring”

Aragats Amirkhanyan, Andrey Sapegin, Marian Gawron, Feng Cheng, Christoph Meinel Hasso Plattner Institute (HPI), University of Potsdam D-14482 Prof.-Dr.-Helmert-Str. 2-3 {Aragats.Amirkhanyan, Andrey.Sapegin, Marian.Gawron, Feng.Cheng, Christoph.Meinel}@hpi.de

Abstract

For testing new methods of network security or new algorithms of security analytics, we need experimental environments as well as testing data which are as similar as possible to real-world data. Therefore, in this technical report, we present our approach of describing user behavior for the simulation tool which we use to simulate user behavior on Windows-family virtual machines. The proposed approach is applied to our simulation tool, which was developed and tested based on the resources of the HPI Future SOC Lab. We use the developed simulation tool to address the lack of data for research in the network security and security analytics areas by generating log datasets that can be used for testing new methods of network security and new algorithms of security analytics.

1 User Behavior States Graph

The User Behavior States Graph (UBSG) is the concept used to describe user activities for the simulation tool. In other words, it says what we need to do and in which order. Before learning the concept of the UBSG, we need to learn the following entities: State and Action. They are the foundation of the framework, because they are used to build the UBSG and to describe simulation scenarios.

1.1 State

When you work on a computer, you can say that the virtual machine (VM) has some state and that it changes its own state based on your (the user's) manipulation. A basic example is the change of the state of your computer when you log in to the system. At one particular time you are logged off, but after entering the credentials you change the state of the computer, because now you are logged on. The State entity is used to define the specific state of the computer (VM). When we define a state, we should define the name, the parameters that describe the current state and the method that we can use to find out whether this state belongs to some VM.

To define the state parameters, we use a screenshot of the VM. This parameter is called the Reference Image. The logon hotkey state is the Windows welcome page state, which you can see after the system is loaded. The logon user state is a state in which the user is asked to enter his username and password. The desktop state is a state in which the user is logged on to the system and can see the desktop view. The rdp success state illustrates the state of a closed successful RDP session.

Now we need to define the method to check the state of the VM. Our implementation of the simulation tool uses the virtual network computing (VNC) protocol [2] to connect to the VM. To support VNC on the client side, we use the vncdotool [4] library. This library can also take a screenshot of the remote VM desktop. We use this screenshot functionality for checking the state of the virtual machine by comparing the image histogram of the taken screenshot with the predefined reference image in the State entity.

1.2 Action

After defining the state, we can decide which action we need to apply. For that, we have the entity called Action. The Action entity is defined by some commands which are associated with the particular action. The commands could be typing something on the keyboard or some manipulation with the mouse. In Table 1, you can find the action names, descriptions of the actions and snippet codes for them. Some actions can be parameterized; for example, action A3 requires two parameters: the username and the password.
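The state check described in Section 1.1 (comparing the histogram of a live VNC screenshot against the stored reference image) can be sketched as follows. This is an illustrative reconstruction, not the tool's actual code; the class name, file layout, similarity threshold and VNC connection string are assumptions.

# Illustrative sketch of the State entity from Section 1.1.
from PIL import Image
from vncdotool import api

class State:
    def __init__(self, name, reference_image, threshold=10.0):
        self.name = name
        self.reference_image = reference_image  # path to the stored screenshot
        self.threshold = threshold              # max. allowed histogram distance (assumed)

    def matches(self, screenshot_path):
        """Compare the histogram of a fresh screenshot with the reference image."""
        ref = Image.open(self.reference_image).convert("RGB").histogram()
        cur = Image.open(screenshot_path).convert("RGB").histogram()
        # mean absolute difference between the two histograms
        distance = sum(abs(r - c) for r, c in zip(ref, cur)) / len(ref)
        return distance <= self.threshold

# Take a screenshot of a VM over VNC and test a state, e.g. the logon hotkey state.
client = api.connect("client-vm::5901")          # assumed VNC endpoint of one test VM
client.captureScreen("current.png")
logon_hotkey = State("logon hotkey", "refs/logon_hotkey.png")
print(logon_hotkey.matches("current.png"))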

Table 1: Actions and commands.

Action  Commands
A1      ['ctrl-alt-del', 'alt-w', 'right', 'right', 'right', 'right', 'enter']
A2      ['esc', 'esc']
A3      [':' + args['username'], 'tab', ':' + args['password'], 'enter']
A4      ['lsuper-r', ':shutdown /l /f', 'enter']
A5      ['cd /', 'powershell rdp.ps1']
A6      [':exit', 'enter']
A7      ['lsuper-r', ':shutdown /l /f', 'enter']

A1 - open the logon window; A2 - close the logon window and return to the Windows welcome page; A3 - enter username and password and log on to Windows; A4 - log off; A5 - run the RDP session by running a script; A6 - close the RDP session; A7 - log off.
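As a rough illustration of how such command lists could be replayed over VNC, the following sketch sends each entry of an action either as a key press or, when prefixed with ':', as typed text. The dispatch helper, the handling of parameterized actions and the connection string are assumptions, not the authors' code.

# Hypothetical sketch of replaying one of the command lists from Table 1 over
# VNC with vncdotool. The ':' prefix marks text to be typed (as in Table 1).
from vncdotool import api

def apply_action(client, commands):
    for command in commands:
        if command.startswith(":"):
            # Type the literal text character by character.
            for char in command[1:]:
                client.keyPress("space" if char == " " else char)
        else:
            # Send a (possibly combined) key press such as 'ctrl-alt-del'.
            client.keyPress(command)

client = api.connect("client-vm::5901")   # assumed VNC endpoint of one test VM
A1 = ["ctrl-alt-del", "alt-w", "right", "right", "right", "right", "enter"]
apply_action(client, A1)                  # A1: open the logon window
# Parameterized actions such as A3 would substitute args['username'] and
# args['password'] into the command list before calling apply_action().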

As we have many states and actions, we want to be able to reuse already defined actions. This means that some actions are not unique, but are combined from other already defined actions. For example, we could have the action to press Ctrl and the action to press W. To define the action Ctrl+W (to close the folder explorer), we do not need to define new commands, but we say that this new action is combined from the other two, which should be undertaken together, although they can also be applied consecutively. Once we have defined a list of the common actions, we can use them to build any new action.

1.3 The states graph

Up to this point, we have an idea of what the State and Action entities are. In this section, we show how to bind them to build the states graph and, in this way, to describe the UBSG.

Figure 1: The User Behavior States Graph.

Figure 1 shows the UBSG. In this figure, nodes reflect States and edges reflect Actions. This means that if we have the state S0 and we want to switch the state from S0 to S1, we need to apply the action A0. For example, if we have the logon user state and we want to change to the desktop state, we need to pass the action that contains the commands of entering the username, entering the password and pressing enter (the action A3). In the same way, we can switch to any other state in the graph.

1.4 Extending the states graph

Once we have defined the graph with states and actions, it could be that we need to extend the states graph by adding new states. To do so, we firstly need to find out which state we need. Then we need to create a new State entity with parameters that describe this state. As mentioned in the previous section, we use the reference image (a VM screenshot) to define the state. So, we need to take a screenshot and assign the reference image to the new state.

Once we have defined the state, we need to bind the new state to the existing states graph. As said before, states are nodes and actions are edges. So, to bind the new state to the states graph, we need to specify with which existing state in the graph we want to bind our new state and by which actions. We can use existing actions to make connections between states, or we can define new ones if no appropriate action exists in the graph. In Figure 1, you can see the UBSG with the extended branch: the S8 state and the A20 and A21 actions.

1.5 Scenario

The goal of defining the UBSG is to use it to define scenarios of user behavior for the simulation tool. Defining a scenario means defining a particular path in the UBSG. In Figure 1, you can see an example of a scenario covered by the graph. According to the example, the scenario is defined by the path: (S0,A0) - (S1,A1) - (S2,A2) - (S3,A3) - (S0). It is marked in the figure by wide red edges. This path means that we start with the state S0 and compare the current state of the VM with the state S0. If the states are not equal, we have to finish the scenario; otherwise we go further and apply the action A0 to switch the state from S0 to S1. After that, we check the state S1 and apply the action A1, and so on until we reach the state S0 again.

To give a better idea of the scenario term, we present the example of the UBSG which we used in the real simulation. Figure 2 describes all possible state changes of a VM. From this figure, you can see from which state to which state you can switch and what you should do (which actions to apply) to change the state. The description of the actions can be found in Table 1.

Figure 2: The User Behavior States Graph's example.

In Figure 2, we define the scenario describing the path: logon hotkey - logon user - desktop - logon hotkey. According to the scenario and based on the UBSG, the simulator finds out that it should apply the following actions: A1, A3, A4. If you look at Table 1, you can see that this is a pretty simple scenario of user behavior: log on to the system and then log off from the system. For example, if during the scenario simulation we want to change the state from desktop to logon hotkey, we should pass the action A4, which is the log off action according to Table 1, and which includes the following commands: ['lsuper-r', ':shutdown /l /f', 'enter']. The same approach can be used to trigger any needed VM change.
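The scenario execution just described (verify the expected state, then replay the action that leads to the next state) can be sketched as a simple loop. This is an illustrative reconstruction with assumed helper names standing in for the mechanisms of Sections 1.1 and 1.2; it is not the tool's actual code.

# Illustrative sketch of walking a scenario path through the UBSG.
def run_scenario(client, path, states, actions, check_state, apply_action):
    """path is a list like [("S0", "A0"), ("S1", "A1"), ..., ("S0", None)]."""
    for state_name, action_name in path:
        client.captureScreen("current.png")
        # check_state(): histogram comparison against the state's reference image
        if not check_state(states[state_name], "current.png"):
            print("unexpected state, aborting scenario at", state_name)
            return False
        # apply_action(): replay the action's command list over VNC
        if action_name is not None:
            apply_action(client, actions[action_name])
    return True

# Example: the "log on, then log off" scenario from Figure 2, expressed as
# (logon hotkey, A1) -> (logon user, A3) -> (desktop, A4) -> (logon hotkey, end).
scenario = [("logon hotkey", "A1"),
            ("logon user", "A3"),
            ("desktop", "A4"),
            ("logon hotkey", None)]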

2 Target scenarios

We are interested in using the UBSG and the simulation tool to produce data for research in IT security, especially to train Intrusion Detection Systems and test Anomaly Detection Algorithms. To prove our UBSG concept, we aim to develop the simulation tool based on the UBSG and to simulate scenarios with normal and abnormal behavior to use them for analyzing attacks. Therefore, we specified the target scenarios which we want to simulate.

2.1 Typical setup

For the first challenge, we started with designing and describing the test network. Our test network contains the domain controller (DC) with installed Windows Server 2012, the wiki and database (DB) servers with Windows Server 2003 and four client computers with Windows 7 Professional 64-bit. To complete the installation, we created four user accounts: Alice, Bob, Carol and Admin.

The first part of the implementation is the deployment and the configuration of the test network environment. To set up the network, we used a dedicated server at the HPI Future SOC Lab with installed VMware ESXi [3]. The configuration of the dedicated server is as follows: HP ProLiant DL980 G7 with 8x8 CPUs x 2.26 GHz and 2 TB RAM. Firstly, we created the network to isolate our virtual machines from others hosted on the same VMware ESXi server. Afterwards, we deployed and configured all virtual machines as described above.

2.2 Normal scenario

In our configuration, all users have access to the client computers, the wiki server and the database server, but only Admin has direct access to the domain controller. In the case of the normal scenario, users log on to client computers with their credentials and they do this several times per day. Users Bob and Carol visit the wiki server during the workday, but Alice does not visit the wiki server although she has access. All users have access to the database server, but only Admin uses it in the normal scenario. In turn, Admin usually logs on to his computer and during the workday he logs on to the domain controller, the database server and the wiki server by remote desktop connection (RDP).

2.3 Abnormal scenario

The abnormal scenario is a scenario that differs from the normal scenario by some unusual but acceptable behaviors. In our case, the abnormal scenario includes additional user behaviors. The first abnormal behavior is that the user Alice uses the wiki server. This behavior is abnormal but not suspicious, because other users use it every day. The second abnormal behavior is that the user Bob uses the database server, although in the normal scenario only Admin uses it. This user behavior is more suspicious and could be classified as an attack or malicious behavior.

3 Simulator

3.1 Architecture

The architecture of the experimental setup is illustrated in Figure 3. We implemented the simulation tool (simulator) in the Python programming language [1]. To connect from the simulator to the virtual machines on the VMware ESXi server, we use the virtual network computing (VNC) protocol [2]. To use the VNC protocol, we enabled VNC on the ESXi server, specified the VNC port for each client virtual machine and supported the VNC protocol on the application side. To use VNC in the simulator, we used the vncdotool

library [4]. This library can also take a screenshot of the remote VM desktop. We used this screenshot functionality for checking the state of the virtual machine by comparing image histograms with the predefined reference states (UBSG block in Figure 3). Once we have determined the current state of the virtual machine, we can specify the activities to be undertaken in each case according to the scenario description, such as sending the ctrl+alt+delete command to the virtual machine, entering the username and the password, opening the RDP connection and others. The RDP connection from the client computer to the database and wiki servers is implemented by invoking a PowerShell [6] script hosted on the client computer. This script can accept parameters, such as the IP address, the username, the password and the session time, which means that we can use the script to establish the RDP connection with any server by specifying the parameters.

Figure 3: Experimental setup.

3.2 Simulation results

We have successfully used the simulator to simulate simple normal and abnormal scenarios. The normal scenario takes about 4 hours with 46 login and logout events on client computers and 30 RDP connections to the wiki, database and domain controller servers. The abnormal scenario takes about 6 hours with 50 login and logout events on client computers and 39 RDP connections to the wiki, database and domain controller servers. The result of running the simulator is a set of real logs in the Windows Active Directory [5] of the domain controller. The domain controller produces a huge amount of log events, but we are interested only in events related to user behavior. For example, we are interested in the event with number 4624 (an account was successfully logged on). All the event data generated by the domain controller will then be collected, normalized and pushed to a 1-TB HANA instance at the HPI Future SOC Lab for further analytics. Thanks to the applied approach, we have data that are approximately real data. And we can use the simulation tool and the UBSG concept to generate more sophisticated scenarios and use the generated data for testing analytics approaches and for training intrusion detection systems (IDS).
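A minimal sketch of the log-filtering step mentioned above, keeping only user-behavior events such as 4624 before normalization and loading into HANA. The CSV layout and column names are assumptions, not the project's actual export format.

# Hypothetical sketch: filter exported domain-controller events down to the
# user-behavior events of interest (e.g. 4624, "an account was successfully
# logged on").
import csv

INTERESTING_EVENT_IDS = {"4624"}   # extend with other logon/logoff event IDs

def filter_events(in_path, out_path):
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row.get("EventID") in INTERESTING_EVENT_IDS:
                writer.writerow(row)

filter_events("dc_security_log.csv", "user_behavior_events.csv")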

4 Conclusion

We have presented a new approach to solving the problem of the lack of data for research in the area of network security and security analytics. Our proposal is based on the simulation of synthetic user behavior to generate a realistic dataset. We introduced the concept of the User Behavior States Graph for simulation tools. To prove the concept of the approach, we provided an implementation of the idea, and we presented the experimental setup used for the simulation, the description of the target scenarios, the architecture of the simulation tool and implementation details. As a result, we have successfully used the simulation tool to generate the data needed for research. The full description of the work is presented in the paper [7].

As future work, we would like to use the simulator with a large number of users to generate more sophisticated data and to test the simulator under load. These experiments imply more resources and additional challenges, such as using the simulator in a network of enterprise or campus scale and using several simulators simultaneously in the same network.

Acknowledgment

We would like to thank the HPI Future SOC Lab for providing us with the latest and most powerful computing resources, which make the testing and experiments specified in the paper possible.

References

[1] Python programming language. https://www.python.org. [last access: 19.10.2015].
[2] Virtual Network Computing. http://www.hep.phy.cam.ac.uk/vnc_docs. [last access: 19.10.2015].
[3] VMware ESXi. http://www.vmware.com/products/vsphere-hypervisor. [last access: 19.10.2015].
[4] Vncdotool. A command line VNC client. https://github.com/sibson/vncdotool. [last access: 19.10.2015].
[5] Windows Active Directory. http://msdn.microsoft.com/en-us/library/bb742424.aspx. [last access: 19.10.2015].
[6] Windows PowerShell. http://technet.microsoft.com/en-us/library/bb978526.aspx. [last access: 19.10.2015].
[7] A. Amirkhanyan, A. Sapegin, M. Gawron, F. Cheng, and C. Meinel. Simulation user behavior on a security testbed using user behavior states graph. In Proceedings of the 8th International Conference on Security of Information and Networks, SIN '15, pages 217-223, New York, NY, USA, 2015. ACM.

Protecting minors on social media platforms

Estée van der Walt J.H.P. Eloff Department of Computer Science Department of Computer Science Security & Data Science Research Group Security & Data Science Research Group University of Pretoria, South Africa University of Pretoria, South Africa [email protected] [email protected]

Abstract

Minors are at risk on the internet and specifically on social media. Not only are the dangers to them unknown, but they are not mature enough to handle these threats and protect themselves. Certain laws have been proposed for their protection, but these are either ignored, cannot be enforced, or are not preventative. With this big data science research project we propose an identity deception indicator (IDI) for pointing out the uncertainty in trusting individuals on social media sites and for protecting minors with this knowledge.

1 Project idea

The upsurge of internet use by minors has provided online predators, like pedophiles, with an unlimited source for soliciting new victims [1] [2]. This can be ascribed to the fact that, according to the authors, minors are likely to be less protective of their personal information and tend to be more trusting of people they have just met [3].

The idea of this research project is to use advances made in the fields of big data [4] and data science [5], cyber-security [6] and human factors [7] [8] to explore an identity deception indicator towards protecting minors on social media sites like Twitter. The project has been divided into various processes, which consist of initially identifying, collecting and cleaning big data. Thereafter, potential variables are identified and, through inspection, machine learning and deep learning techniques, enhanced to culminate in the creation of an identity deception indicator. Figure 1 illustrates all processes in the research project and the current status of the research after the last semester.

Figure 1: The project process diagram

1.1 Main deliverables

The main deliverables of 2015 were:

• To gather a big dataset for the experiment consisting of minors.
• To collect the last 200 tweets of all identified accounts as well as their friends and followers.
• To clean, enrich and transform the data.
• To understand the data through variable inspection.
• To explore machine learning for enrichment and addition of more variables to the research at hand.

2 Use of HPI Future SOC Lab resources

To summarize, the following resources were used for the research at the HPI Future SOC Lab:

• Twitter: The Twitter4j Java API was used to dump the data needed for the experiment into a big data repository.
• Hortonworks Hadoop 2.2: For the purposes of this experiment, HDP Hadoop runs on an Ubuntu Linux virtual machine hosted in the HPI Future SOC research lab in Potsdam, Germany. This machine contains 4 TB of storage, 8 GB RAM, 4 x Intel Xeon CPU E5-2620 @ 2 GHz and 2 cores per CPU. Hadoop is well known for handling heterogeneous data in a low-cost distributed environment, which is a requirement for the experiment at hand.
  - Flume: Flume is used as one of the services offered in Hadoop to stream the initial Twitter data into Hadoop and also into SAP HANA.
  - Sqoop: This service in Hadoop is used to pull data from SAP HANA back to the Hadoop HDFS. Analytical results, e.g. machine learning and predictive modeling for the experiment, will be generated on both Hadoop and SAP HANA. These results will be stored on both platforms, and Sqoop facilitates this requirement.
  - Ambari: For administration of the Hadoop instance and starting/stopping services like Flume and Sqoop.
  - Hue 3.7: The Hadoop UI for executing queries and inspecting HDFS data.
  - Java: Java is used to enrich the Twitter stream with additional information required for the experiment at hand.
• SAP HANA: A SAP HANA instance is used which is hosted in the HPI Future SOC research lab in Potsdam, Germany on a SUSE Linux operating system. The machine contains 4 TB of storage, 1 TB of RAM and 32 CPUs / 100 cores. The in-memory high-performance processing capabilities of SAP HANA enable almost instantaneous results for analytics. The XS Engine from SAP HANA is used to accept streamed tweets and populate the appropriate database tables.
• Machine learning APIs: Various tools are considered to perform classification and analysis and to apply deep learning techniques on the data. These include the PAL library from SAP HANA, libraries in Python, the Hadoop Mahout service and the GraphLab platform.
• Visualization of the results is done in HTML, Angular and CSS.

The following ancillary tools were used as part of the experiment:

• For the connection to the FSOC lab we used the OpenVPN GUI as suggested by the lab.
• For connecting to and configuring the Linux VM instance we used PuTTY and WinSCP.
• For connecting to the SAP HANA instance we used SAP HANA Studio (Eclipse) 1.80.3.

3 Findings in 2015

The purpose of this phase of the research project was to gather enough data, cleanse the data and understand the variables available to the research.

We found that building a big data set (2-4 TB in volume) is challenging in an experimental or research environment. Future work for the research at hand will increase the size of the data set through various options such as:

• Simulate data
• Ask Twitter to lift the rate limit
• Use application as opposed to user authentication for better rate limits
• Investigate other possible API calls
• Pull data from multiple Twitter accounts

We also found that even when data is available, a lot of effort has to be applied to cleansing the data. We found it, for example, helpful to:

• Ignore retweets
• Only use data from Twitter accounts with less than 1,000 followers, to filter out tweets from news and advertising agencies.
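A hedged sketch of these two cleansing heuristics, assuming tweets are held as dictionaries in the style of the public Twitter API JSON; the field names follow that format and are not a statement about the project's actual schema.

# Illustrative sketch of the two cleansing heuristics listed above.
def keep_tweet(tweet, max_followers=1000):
    # Heuristic 1: ignore retweets.
    if "retweeted_status" in tweet or tweet.get("text", "").startswith("RT @"):
        return False
    # Heuristic 2: drop accounts with too many followers (news/advertising agencies).
    if tweet.get("user", {}).get("followers_count", 0) >= max_followers:
        return False
    return True

tweets = [
    {"text": "RT @news: headline", "user": {"followers_count": 50}},
    {"text": "im in 5th grade lol", "user": {"followers_count": 120}},
    {"text": "breaking story", "user": {"followers_count": 250000}},
]
cleaned = [t for t in tweets if keep_tweet(t)]
print(len(cleaned))   # -> 1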

The initial inspection of the variables was quite promising in that we found that, even though most Twitter accounts have their location disabled, the location could potentially be deduced from the time zone of the account. We also found that a simple word cloud interrogation showed promise towards extracting age from tweets, as it seems that many Twitter users indicate their class (e.g. 5th grade). This is depicted in Figure 2.

Figure 2: A word cloud of tweet content

The SAP HANA instance, virtual machine and storage were provided by the HPI FSOC research lab, and the following is worth mentioning:

• There were no issues with the connection.
• The lab was always responsive and helpful in handling any queries.
• The environment is very powerful and more than enough resources are available, which makes the HPI FSOC research lab facilities ideal for the experiment at hand.

Overall we found that the environment and its power enabled the collection of a big dataset without issue. The support of the HPI FSOC research lab is appreciated.

4 Next steps for 2016

The next steps in the project are to continue with the investigation and analysis of the variables available in the big dataset. Heterogeneous variables like the profile images contained in Twitter accounts will also be considered. Machine learning and deep learning techniques will be applied to further enrich the variables for the experiment at hand. Additional measures will then be applied, like Shannon entropy, to indicate whether a variable contributes to the value of the identity deception indicator.

The deliverables for this phase are:

• To clean the data based on initial findings from previous data interrogation
• To add more variables for experimentation
• To apply various different machine learning techniques in both SAP HANA and Hadoop
• To evaluate the results from these techniques and identify enriched variables
• To experiment with methods of identifying useful variables and weighting their importance
• To produce an initial identity deception indicator per online persona

References

[1] A. Schulz, E. Bergen, P. Schuhmann, J. Hoyer, and P. Santtila, "Online Sexual Solicitation of Minors: How Often and between Whom Does It Occur?," Journal of Research in Crime and Delinquency, p. 0022427815599426, 2015.
[2] R. Williams, "Children using social networks underage 'exposes them to danger'," in The Telegraph, 2014.
[3] S. Livingstone, G. Mascheroni, K. Ólafsson, and L. Haddon, "Children's online risks and opportunities: comparative findings from EU Kids Online and Net Children Go Mobile," 2014.
[4] R. Kannadasan, R. Shaikh, and P. Parkhi, "Survey on big data technologies," International Journal of Advances in Engineering Research, vol. 3, Mar 2013.
[5] F. Provost and T. Fawcett, Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking, O'Reilly Media, Inc., 2013.
[6] R. Von Solms and J. Van Niekerk, "From information security to cyber security," Computers & Security, vol. 38, pp. 97-102, 2013.
[7] E. Hargittai, J. Schultz, and J. Palfrey, "Why parents help their children lie to Facebook about age: Unintended consequences of the 'Children's Online Privacy Protection Act'," First Monday, vol. 16, 2011.
[8] I. Liccardi, M. Bulger, H. Abelson, D. J. Weitzner, and W. Mackay, "Can apps play by the COPPA Rules?," in Privacy, Security and Trust (PST), 2014 Twelfth Annual International Conference on, 2014, pp. 1-9.


Using Process Mining to Identify Fraud in the Purchase-to-Pay Process - Final Report -

Galina Baader Veronika Besner Technische Universität München Technische Universität München Chair for Information Systems Chair for Information Systems Boltzmannstr. 3, 85748 Garching, Germany Boltzmannstr. 3, 85748 Garching, Germany [email protected] [email protected]

Michael Schermann Sonja Hecht Technische Universität München Technische Universität München Chair for Information Systems Chair for Information Systems Boltzmannstr. 3 85748 Garching, Germany Boltzmannstr. 3, 85748 Garching, Germany [email protected] [email protected]

Helmut Krcmar Technische Universität München Chair for Information Systems Boltzmannstr. 3, 85748 Garching, Germany [email protected]

Abstract

The aim of our project is to investigate the use of process mining to identify fraud in the purchase-to-pay business process in real-time. Therefore, we used the process mining tool Celonis1 to reconstruct the as-is process from the underlying log data of our ERP system. Displaying the as-is process reveals process deviations which can further be analyzed regarding fraud. We applied our fraud detection strategy within an experiment, where the participants simultaneously hide fraud in the ERP system and analyze the fraud of the other teams. In total, 9 out of 14 fraud cases were identified in real-time.

1 Introduction

Corporate fraud is a massive problem in most companies, consuming an estimated 5% of the annual revenues of a typical organization [1]. Computer-assisted audit tools and techniques have made it possible to retrieve and analyze huge data volumes [2; 3] regarding fraud. Most auditing tools use data-mining-based fraud detection. These methods do not take sequential information into account [2], which significantly limits such approaches. As process mining builds directly upon sequential dependencies, we concentrate on process mining for fraud detection in our research.

In most companies, fraud detection is performed once a year on a database export from the productive systems. One reason is that running fraud detection algorithms on a productive system might have a huge impact on its performance [3]. However, SAP HANA promises huge performance improvements. In our research we aim to identify fraud in real-time without affecting the productive system.

2 Project Goal

The goal of the project is to determine the suitability of process mining to detect fraudulent behavior in real-time in the purchase-to-pay business process. We therefore use Celonis process mining, as the tool is able to visualize business processes based on event

1 http://www.celonis.de/

145 logs from the underlying system (here SAP ERP). De- The project consists of six main steps: viations from the standard processes are visualized 1. Load data into HANA and can be investigated regarding fraudulent behavior. 2. Create activity, case and process tables The project should result in a proof of concept to show that fraudulent behavior can be efficiently found when 3. Create data cubes and data models in Celonis using process mining. Furthermore, it should be ana- 4. Create analysis lyzed, if SAP HANA can overcome performance is- sues, which limits the use of process mining on a pro- 5. Detect process deviations, investigate analy- ductive system. sis, filter and explore processes 6. Refine analysis to detect further fraud cases 3 Project Design

The basic design of the project and its underlying da- In the following, these steps will be described in detail. taset are presented in the following section. 4.1 Relevant SAP Tables 3.1 Experimental setting We had access to an SAP ERP system running on SAP Fraud should be detected in real-time. As we do not HANA. As fraud was hidden in the ERP system, the have access to a productive system of an enterprise to corresponding data were directly within the SAP run the process-mining tool and search for fraud, we HANA database. No data transfer was necessary. designed an experiment, which simulates real-world conditions. Table 1 provides an overview of the most important tables that we used for the analysis. In a competition, participants of the experiment have to simultaneously commit fraud in an ERP system and Table Name Table Description detect the fraud of the other teams. For each success- EBAN Purchase Requisition fully committed or identified fraud the respective team virtually gets the money. The team with the highest EKKO Purchasing Document Header amount of money wins the competition. To ensure the EKPO Purchasing Document Item quality of the committed fraud cases each fraud case first had to be discussed with a jury of professional ac- EKBE History per Purchasing Document countants. BSEG Accounting Document Segment 3.2 Technical Architecture BKPF Accounting Document Header Figure 1 shows, how the different tools we used in the LFA1 Vendor Master project are interconnected. At the client side, the SAP HANA Studio is installed and allows interaction and CDHDR Change Document Header manipulation of data stored in SAP HANA. We cre- CDPOS Change Document Items ated the needed event log tables for Celonis directly in HANA and gave Celonis access to the corresponding Table 1: Important Purchase-to-Pay Tables [6] database scheme. Celonis was installed on a web

server provided from the HPI. 4.2 Create Activity, Case and Process Ta- bles Once the data tables are imported into HANA, cer- tain event log tables have to be created manually to display process models with Celonis. Therefore, we developed a script (using SQLScript as it is supported by HANA) that creates the respective activity, case and process table. Activities represent every single step in every order Figure 1: Project Architecture (like purchase requisition, purchase order, invoice re- ceived etc.) and cases represent the complete process path of every order position. The third table, the pro- 4 Setting Up Celonis cess table, is a utility table and maps activities to in- teger activity IDs for Celonis’ improvements. The ac- In the following, the process of creating a detection strategy will be described.

146 tivity table is created based on the previously im- Further, we join EKPO and EKKO to get the purchase ported tables and afterwards the case and process ta- orders, EKPO and EKBE to get the invoice receipts, bles are derived from this activity table. or EKPO with CDPOS and CDHDR to get deleted po- sitions or changed items. A deep understanding of the In the following section, this table creation will be ex- relevant tables is necessary. For example deletion of plained. a position can be seen in table CDPOS (Change Doc- ument Items). The column “table name” should equal CREATE TABLE CELONIS_P2P_ACTIVITIES( ActivityCaseID VARCHAR(18) ‘EKPO’ to refer to the purchasing document and it ,Activity VARCHAR(40) must be deleted (so the fieldname equals ‘LOEKZ’ ,EventTime TIMESTAMP and the new value equals ‘L’). ,Sorting INTEGER Once we included all needed activities in the activity ,EventUser VARCHAR(12)); table, we created the case and the process tables. The creation of the process table is quite straightforward as Figure 2: Creation of the Activity Table it only maps an “Activity_ID” to each activity in the For the activity table, at first five different columns are activity table. needed. The creation of these columns can be seen in Additionally to these four columns, one can choose to figure 2. The first one is the “ActivityCaseID” column, add further columns. It could, for example, be interest- which is a unique number for each case. A case repre- ing to know the vendor for each case, as well as the sents one single position of an order and all the steps ordered material and its price and amount. belonging to this order. The “Activity” column is a textual representation of the current action, for exam- ple “Purchase Requisition”, “Purchase Order”, or “In- voice Receipt”. The “EventTime” column provides a 4.3 Create Data Cubes and Data Models in timestamp consisting of the date and the time when the Celonis action was performed. “Sorting” provides a classifica- Once we have the activity, case and process tables we tion of the normal order, in which the activities should can create data cubes and models in Celonis. To do so be performed; “Purchase Requisition” should, for ex- we first give Celonis access to the respective database ample, in a normal case happen before “Purchase Or- scheme in HANA, where the tables are located and der”, which again should happen before “Invoice Re- choose this scheme as our data store. Then we define ceipt”. The “EventUser” is the SAP system user, who the tables Celonis should use to create our data model, created the respective activity, for example, booked in our case the activity, case and process tables created the purchase requisition or order in the system. in step 2. Three more columns are generated by a Celonis pro- As a next step we can configure this data model and cedure. One is the “Lifecycle” column, which is de- define, which tables and columns Celonis should use rived from the “Sorting” and “Timestamp” columns for each attribute. Finally we have to create at least one and represents the position in a case’s lifecycle. An- foreign key to connect the activity and case table. other one is the ”Case_Num_ID”, which gives each case (defined by “ActivityCaseID”) an integer number that is easier to process. Finally, a “PrimaryKey” is 5 Real-Time Fraud Detection added for each position of the activity table. 
As soon as the data cube and data model is set up and To fill the activity table with values, for every activity configured properly, we can create our analysis. To do different SAP tables have to be joined to meet certain so we have to create a new document based on our data criteria. To create the purchase requisition tables, for model, choose a suitable format and Celonis automat- example, the tables EBAN and EKPO have to be ically creates a process model. joined, as it can be seen in figure 3. In figure 4 an exemplary process model can be seen. It INSERT INTO CELONIS_P2P_ACTIVITIES( represents the first part of the standard purchase-to- SELECT pay process going from purchase requisition to pur- (EKPO.MANDT||EKPO.EBELN||EKPO.EBELP) chase order, to goods receipt and to invoice receipt AS ActivityCaseID, without consideration of specific deviations like de- 'Purchase Requisition' AS Activity, EBAN.BADAT AS EventTime, leted positions or changed order of activities. As one 10 AS Sorting, can see most cases in the example do not have a pur- EBAN.ERNAM AS EventUser chase requisition, but start directly with the purchase FROM EBAN order and not all cases go through all of the activities. JOIN EKPO ON EBAN.MANDT = EKPO.MANDT AND EBAN.EBELN = EKPO.EBELN AND EBAN.EBELP = EKPO.EBELP);

Figure 3: Join of Tables EBAN and EKPO

147 detail reveals that in some cases the sum of the multi- ple invoices exceeds the amount of the prize for the ordered product, which is an indicator for fraud.

Figure 4: Exemplary Process Model

5.1 Investigate Analysis, Filter and Explore Processes The model depicted in figure 4, e.g., is a strong sim- Figure 5: Double Received Invoices plification of all the processes that are derived from the used event log. Depending on the filter one choses, As a further example we were able to identify 41 the tool displays a specific number of variants – which cases, where we have not received a good, but the in- equal our different process paths. Within the fraud de- voice was issued as displayed in figure 6. In the so- tection experiment, 16 variants of the process were called non-purchase fraud, a certain good is not deliv- identified. Many processes, for example, use a path ered but the invoice paid. This is a further strong in- that is similar to the one shown in figure 4, but where dicator for fraud. the invoice receipt was created before the good receipt creation. Since the order of these two activities can be swapped without consequences, this is therefore no in- dicator for fraudulent behaviour. A different path shows, e.g., all the deleted orders and another one all the orders where the price of the good was changed after the order was created. These deviations may, for example, be more interesting to check for occurring fraud cases. If needed, different filters can also be used, e.g., hide a certain activity or only show pro- cesses that start or end at a certain activity. 5.2 Refine Analysis to Detect Further Fraud Cases The experiment participants analysed all process devi- ations in our dataset and found indicators for fraud. For example, we have identified hints that show a dou- ble issue of an invoice or the non-purchase fraud.

In double issue of an invoice the fraudster receives two Figure 6: Invoice receipt without goods receipt or more invoices for one product to get a double pay- ment from the vendor. We have seen 724 fraud cases However, there are fraud cases included, that cannot where two invoices have been received for one prod- be spotted through process deviations. One example is uct as it can be seen in figure 5. Analysing this data in the kickback scheme. A fraudster and a complice ven- dor agree on an overpayment and share the overpaid

148 amount. This fraud can be analysed comparing differ- ent process instances and looking for deviations. Ce- lonis further offers the possibility to create key figures using SQL. A key figure “over-average payment” can also be added to spot the respective fraud.

6 Conclusion and Outlook

We identified all necessary tables of the purchase-to- pay business process. We then created the necessary Celonis tables to be able to display the business pro- cess with all deviations. We conducted an experiment where the participants simultaneously hide and detect fraud. Out of 16 fraud cases 9 were identified in real- time. The performance of the ERP system was not throttled back. Our fraud analysis showed fraud cases like double issue of an invoice and the non-purchase fraud. We further analysed cases that cannot be iden- tified through process deviations, but as data irregular- ities in certain columns or by comparing different pro- cess instances. It should be subject to further research to identify pro- cess conformant fraud cases, by adding further hints for fraud.

Sources [1] ACFE, „Report to the Nations on Occupational Fraud and Abuse,“ Association of Certified Fraud Examiners, Austin, Texas, USA, 2012.

[2] A. Bönner, M. Riedl und S. Wenig, Digitale SAP®-Massendatenanalyse: Risiken erkennen - Prozesse optimieren, Berlin (Germany): Erich Schmidt, 2011.

[3] D. Coderre, Computer Aided Fraud Prevention and Detection: A Step by Step Guide, John Wiley & Sons, 2009.

[4] C. Phua, V. Lee, K. Smith und R. Gayler, A Comprehensive Survey of Data Mining-based Fraud Detection Research., 2010.

[5] Y. Yannikos, F. Franke, C. Winter und M. Schneider, „3LSPG: Forensic tool evaluation by three layer stochastic process-based generation of data,“ in Computational Forensics Vol. 6540, Berlin/Heidelberg, Springer Verlag, 2011, pp. 200-211.

[6] "SAP Datasheet," [Online]. Available: http://www.sapdatasheet.org. [Accessed 18 03 2015].


On the Potential of Big Data Boosting Bio-inspired Optimization A Study Using SAP HANA

Bernd Scheuermann, Elisa Weinknecht Hochschule Karlsruhe Technik und Wirtschaft University of Applied Sciences Karlsruhe, Germany [email protected]

Abstract heuristics, search for good solutions to the optimiza- tion problem, however, they cannot guarantee to find Meta-heuristics inspired by the principles of biology the optimum. are often successfully applied to a wide range of com- Many heuristics are problem-dependent, i.e. they ex- plex optimization problems with practical relevance in ploit problem-specific knowledge and can therefore business and engineering. To further boost the perfor- often provide good solutions in reasonably short time, mance of such algorithms, this paper proposes to en- although these heuristics are often very specialized hance bio-inspired optimization by a framework that for one sort of problem and can only hardly if ever provides guidance obtained through the analysis of ex- be applied to others. In this context, so-called meta- tensive amounts of historical optimization knowledge heuristics play an important role, since they provide maintained by SAP HANA. The review of related work a generic approach to the creation of problem-specific shows that the envisaged framework makes progress heuristics. For some applications, the techniques of beyond the state of the art bringing together the con- meta-heuristics offer the only way for an efficient op- cepts of bio-inspired optimization, big data and in- timization, when other heuristics cannot be properly memory database technology. The architecture of the adapted. A range of meta-heuristics has been inspired framework is provided along with a discussion of the by principles observed in biology. Some popular ex- prospected advantages and challenges. amples of such bio-inspired metaheuristics include: Evolutionary Algorithms [6] (inspired by evolution and natural selection), Ant Colony Optimization [3] 1 Introduction (inspired by the foraging behavior of ants), and Parti- cle Swarm Optimization [7] (inspired through the dy- namics of bird flocks or fish schools). Many optimization problems in industry are known to Although being very successful in a wide range of be NP-hard problem (see, e.g., [5] for a concise in- industrial-strength optimization scenarios, typically troduction to the intractability of optimization prob- bio-inspired meta-heuristics suffer a major drawback: lems). For such problems, it is yet unknown if there they ”forget” their search history. Therefore, this pa- exists an algorithm that is capable of finding the opti- per proposes an approach to resolve this weakness mum in polynomial time. Such problems include, e.g., by designing and developing novel techniques for the Vehicle Routing Problem (VRP) [13], the Travel- bio-inspired metaheuristics guided through extensive ing Salesperson Problem (TSP) [8], or the Quadratic amounts of optimization knowledge managed by SAP Assignment Problem (QAP) [2] or many other prob- HANA. lems in production, warehouse and transportation lo- gistics. Decades of research in the field of complexity theory suggest that solving NP-hard problems to opti- 2 Related Work mality always requires exponential runtime, although it has never been proven. In literature, several attempts have been made to en- Considering NP-hard combinatorial optimization able meta-heuristics to memorize previously created problems, exact algorithms guarantee to find the solutions and to tackle the above-mentioned forget- optimum, but in the worst case, the search requires fulness. Implicit memory, e.g. diploidy, is imple- exponential time. 
On the other hand, many algorithms mented by using redundant representations of solu- have been developed which afford only polynomial tions [9, 4]. Alternatively explicit memory stores spe- time. These approximate algorithms, also called cific information about useful solutions, which can be

151 re-inserted later again [14, 15]. An overview of im- • To implement a selection of bio-inspired meta- plicit and explicit memory can be found in [1]. Ab- heuristics that shall be directly executed on the stract memory schemes [10, 11] store the abstraction SAP HANA platform. of good solutions in the memory instead of good so- lutions themselves. This yields learning processes, • To use the SAP HANA database and its columnar which are functionally similar to machine learning and storage to log the optimization knowledge and are claimed to improve the optimization performance the solutions created throughout the optimization of Evolutionary Algorithms. Simoes˜ and Costa [12] process. introduce variable-size memory for Evolutionary Al- • To implement big data techniques to analyze and gorithms with two populations: the main population to explore the optimization knowledge re-siding searches the best solution, the other variable-size pop- on HANA and to use the results to further guide ulation serves as memory. In cyclic environments, the the optimization. performance of the algorithm is increased by applying a genetic operator inspired by the biological conjuga- • To evaluate the algorithms developed with the tion process. aim of comparing distinct strategies of logging The review of the state-of-the-art suggests that all pre- data to SAP HANA and analyzing the benefit of vious approaches to memory-based meta-heuristics re- big data in bio-inspired optimization. stricted themselves to using as little main memory as possible and refrained from using database technology It is anticipated that in a representative industry- for storage and data persistence. This can be attributed strength optimization run the optimization data being to several reasons: logged and explored on HANA requires a storage vol- ume between 1 GB and 1 TB. 1. the high cost and limited capacity of main mem- ory in the past, 3.1 Architecture 2. the intractability of parallel programming needed to efficiently explore vast amounts of optimiza- It is envisaged to work towards a programming and tion knowledge, execution framework for bio-inspired optimization ex- ploiting the in-memory database of SAP HANA as 3. the bottleneck experienced when accessing tradi- knowledge store and its analytical engines to explore tional disc-resident databases. and exploit this knowledge. An overview is provided in Figure 1. 3 Using SAP HANA as Knowledge Store in Bio-inspired Optimization

parameter Simulator definitions On the one hand, the forgetfulness of traditional search xml

heuristics is purpose and belongs to the strategy of the problem iterative definition optimization approach, i.e. the impact of unfavorable xml

decisions in the past shall fade away, whereas more log recent favorable decisions shall be rewarded so as to results & statistics Bio‐Optimizer steer the algorithm towards promising areas within the inject extract call search space. On the other hand, the deletion of the PAL Analy‐ Pre‐

search history induces the following disadvantages: tics Processor Controller retrieve 1. the algorithm loses track of the previously tra- store versed paths in the decision graph, HANA DB Maintenance Knowledge update 2. earlier created solutions may be unknowingly vis- Store ited again and need to be re-evaluated, XS‐Engine 3. the algorithm is unaware of the extend of search space exploration and runs at risk of getting Figure 1: Architectural Overview trapped in a local optimum due to premature con- vergence. The framework is being implemented as native appli- To overcome these major drawbacks, it is motivated to cation on the SAP HANA XS Engine and consists of develop and to evaluate an approach that provides the the following modules: A simulator reads the defini- optimizer with guidance obtained through the analysis tion of the problem instance and parameter definitions of extensive historical optimization knowledge main- of the optimization algorithm from files. These def- tained by SAP HANA. This approach involves the fol- initions are fed to the bio-optimizer which embodies lowing steps: the implementation of an iterative meta-heuristics as

152 introduced in the previous sections. In static environ- • To decide which knowledge (here solutions) ments, these definitions are read once; whereas in dy- should be stored in HANA and at which fre- namically changing environments, the simulator pro- quency. One may be in favor of storing the best, duces a configurable online stream of change events the above average or the latest solutions. This that require the optimizer to adapt appropriately and may reduce the amount of data stored but could to respond quickly. Typically, the algorithm contin- prevent the algorithm from sustaining diversity uously evolves on a set of optimization knowledge. and from identifying new unvisited areas. Such knowledge is represented by raw data which de- • To determine the number and the set of solu- pends on the algorithm and the problem considered. tions to be replaced when inserting new ones to In the case of Evolutionary Algorithms, this raw data the knowledge store. As opposed to earlier re- may consist of the genotypical representations of indi- placement strategies, the use of data analytics viduals created, along with associations describing the in HANA may suggest to create and maintain a parent-offspring dependencies as well as fitness val- larger knowledge base than before and to apply a ues and elitist solutions. With regards to Ant Colony less restrictive replacement strategy. Moreover, Optimization, which is a constructive approach, solu- with HANA the calculation of the replacement tions and their objective functions values are relevant candidate should also be expedited. Such calcu- as well. Furthermore, the distributions of pheromone lations include e.g. a) deleting the solution which trails could be memorized. In Particle Swarm Opti- keeps the highest variance among the solutions mization, the vector representations of moving parti- in the knowledge store after deletion, b) replacing cles are memorized supplemented by coordinates de- the most similar solution in the knowledge base, scribing the locations of individually and globally best or c) determining the two solutions with pairwise solutions visited, or also including data describing the minimum distance and deleting the worst of them swarm topology and motion characteristics. from the knowledge store. The pre-processor is responsible for extracting this raw data from the optimizer, pre-computing it and • To determine the proper set of solutions chosen routing the results to the knowledge store residing in from the knowledge store and when to inject them SAP HANA. Such pre-processing may include e.g. the to the optimizer. In static environments, an in- filtering of memorized solutions, datatype transforma- jection may be advisable, when solution diversity tions, the calculation of similarity, diversity, conver- drops below a threshold or when the best solution gence figures, or computing the entropy of pheromone has not improved considerably for a number of matrices. The analytics module evaluates the knowl- iterations. In dynamic environments, also the oc- edge store and aims at maintaining solution diversity currence of a change may trigger an injection of and identifying promising albeit unexplored areas in solutions. the search space. Here the clustering algorithms em- bodied in PAL (Predictive Analysis Library) may help • To determine a suitable data format for the stored to identify such areas. 
In dynamic environments, the knowledge in order to make best use of the row- analytics module may analyze the nature of the last dy- store or column-store techniques in HANA, re- namic change and explore historical data in HANA to spectively. Here columnar tables prefer low- find similar environments that could be re-instantiated frequent insertions or columns with low cardinal- and injected into the current optimization run. Fur- ity. thermore, the capabilities of PAL may be exploited to analyze historical entries in the knowledge store 4 Conclusion and Status of Work thereby predicting future movements of optima. In- jecting suitable solutions (or sub-populations) to pre- This paper is concluded by summarizing the dicted areas may make the algorithm better prepared prospected advantages of the proposed approach: and improve its responsiveness at the next change. • Logging and online-analysis of optimization Housekeeping tasks are performed by the maintenance knowledge shall enable the optimizer to learn module which could execute age-dependent or quality- from the decisions of the past and to make better- dependent deletions of solutions and to automatically informed decisions in the forthcoming iterations re-evaluate solutions at dynamic changes. Finally, the of the optimization algorithm. controller instance monitors and coordinates the inter- play of the modules described above. • Analyzing the optimization history shall enable the algorithms to discover as yet unexplored but 3.2 Challenges promising regions in the search space. This shall exploit the expedited data exploration capabilities of SAP HANA. Many open research questions and challenges arise. Some of them are related to the tasks of the mainte- • The optimizer may be enabled to detect and avoid nance and the controller module: previously visited solutions, which shall prevent

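The injection criteria from the third bullet can be expressed as a simple trigger. The sketch below assumes that diversity is measured as the mean pairwise distance of the current population and that stagnation is counted in iterations without improvement; the threshold values are placeholders, not recommendations from the project.

```python
# Sketch of an injection trigger: inject stored solutions when population
# diversity drops below a threshold, when the best objective value has
# stagnated, or (in dynamic settings) when a change event was observed.
# Thresholds are illustrative placeholders.
import numpy as np

def mean_pairwise_distance(population):
    diff = population[:, None, :] - population[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    n = len(population)
    return float(dist.sum() / (n * (n - 1)))   # diagonal is zero, excluded

def should_inject(population, stagnation_iters, change_detected,
                  diversity_threshold=0.1, stagnation_limit=50):
    if change_detected:
        return True
    if mean_pairwise_distance(population) < diversity_threshold:
        return True
    return stagnation_iters >= stagnation_limit

pop = np.random.rand(30, 10)
print(should_inject(pop, stagnation_iters=10, change_detected=False))
```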
4 Conclusion and Status of Work

This paper concludes by summarizing the prospective advantages of the proposed approach:

• Logging and online analysis of optimization knowledge shall enable the optimizer to learn from the decisions of the past and to make better-informed decisions in the forthcoming iterations of the optimization algorithm.

• Analyzing the optimization history shall enable the algorithms to discover as yet unexplored but promising regions in the search space. This shall exploit the expedited data exploration capabilities of SAP HANA.

• The optimizer may be enabled to detect and avoid previously visited solutions, which shall prevent the algorithm from re-visiting them and wasting computation time.

• Implementing the optimizer directly on the SAP HANA platform shall bring the algorithm into the proximity of the data, thereby reducing latency and improving data throughput.

• In online optimization scenarios with dynamically changing constraints and objective functions, the optimization knowledge kept in HANA may be used to control and maintain solution diversity, which may be readily exploited to guide the optimization process to promising areas after a change.

The development of the introduced framework is work in progress. Two biologically inspired optimizers (EA and ACO) for static environments have been implemented on SAP HANA. In initial experiments, extensive amounts of optimization knowledge have been logged to the SAP HANA database during optimization runs. A first procedure has been implemented to explore this optimization knowledge and to inject solutions into the EA optimizer following the random immigrants approach. Future research continues the development and the empirical performance benchmarking to work towards the architecture and methods presented in Section 3.
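As an illustration of the injection procedure mentioned above, here is a minimal sketch of the random immigrants idea: a fraction of the population, in this variant the worst individuals, is replaced by freshly generated solutions each generation. In the described setting these replacements could instead be drawn from the knowledge store in HANA; the sampling bounds and the replacement rate below are illustrative assumptions.

```python
# Sketch of a random-immigrants style injection: replace the worst
# fraction of the population with new solutions, here drawn uniformly
# at random; in the framework they could instead be selected from the
# knowledge store in HANA. Bounds and rate are illustrative.
import numpy as np

def inject_random_immigrants(population, objectives, rate=0.2,
                             low=0.0, high=1.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n, d = population.shape
    k = max(1, int(rate * n))
    worst = np.argsort(objectives)[-k:]        # minimization assumed
    population = population.copy()
    population[worst] = rng.uniform(low, high, size=(k, d))
    return population

pop = np.random.rand(30, 10)
fit = np.random.rand(30)
new_pop = inject_random_immigrants(pop, fit)
print(new_pop.shape)   # (30, 10)
```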


