HPI Future SOC Lab : Proceedings 2012
Total Page:16
File Type:pdf, Size:1020Kb
HPI Future SOC Lab: Proceedings 2012 Christoph Meinel, Andreas Polze, Gerhard Oswald, Rolf Strotmann, Ulrich Seibold, Bernhard Schulzki (Hrsg.) Technische Berichte Nr. 85 des Hasso-Plattner-Instituts für Softwaresystemtechnik an der Universität Potsdam ISBN 978-3-86956-276-6 ISSN 1613-5652 Technische Berichte des Hasso-Plattner-Instituts für Softwaresystemtechnik an der Universität Potsdam Technische Berichte des Hasso-Plattner-Instituts für Softwaresystemtechnik an der Universität Potsdam | 85 Christoph Meinel | Andreas Polze | Gerhard Oswald | Rolf Strotmann | Ulrich Seibold | Bernhard Schulzki (Hrsg.) HPI Future SOC Lab Proceedings 2012 Universitätsverlag Potsdam Bibliografische Information der Deutschen Nationalbibliothek Die Deutsche Nationalbibliothek verzeichnet diese Publikation in der Deutschen Nationalbibliografie; detaillierte bibliografische Daten sind im Internet über http://dnb.dnb.de/ abrufbar. Universitätsverlag Potsdam 2014 http://verlag.ub.uni-potsdam.de/ Am Neuen Palais 10, 14469 Potsdam Tel.: +49 (0)331 977 2533 / Fax: 2292 E-Mail: [email protected] Die Schriftenreihe Technische Berichte des Hasso-Plattner-Instituts für Softwaresystemtechnik an der Universität Potsdam wird herausgegeben von den Professoren des Hasso-Plattner-Instituts für Softwaresystemtechnik an der Universität Potsdam. ISSN (print) 1613-5652 ISSN (online) 2191-1665 Das Manuskript ist urheberrechtlich geschützt. Druck: docupoint GmbH Magdeburg ISBN 978-3-86956-276-6 Zugleich online veröffentlicht auf dem Publikationsserver der Universität Potsdam: URL http://pub.ub.uni-potsdam.de/volltexte/2014/6899/ URN urn:nbn:de:kobv:517-opus-68991 http://nbn-resolving.de/urn:nbn:de:kobv:517-opus-68991 Contents Spring 2012 Prof. Dr. Ben Juurlink, Architektur eingebetteter Systeme, Technische Universitat¨ Berlin Parallelizing H.264 Decoding with OpenMP Superscalar . 1 Prof. Dr. Jurgen¨ Dollner,¨ Computer Graphics Systems, Hasso-Plattner-Institut Service-Based 3D Rendering and Interactive 3D Visualization . 7 Dr. Ralf Kuhne,¨ SAP Innovation Center, Potsdam Benchmarking and Tenant Placement for Efficient Cloud Operations . 11 Prof. Dr. Christoph Meinel, Internet-Technologies and Systems Group, Hasso-Plattner- Institut Towards Multi-Core and In-Memory for IDS Alert Correlation: Approaches and Capabilities . 15 Multicore-Based High Performance IPv6 Cryptographically Generated Addresses (CGA) . 21 Blog- Intelligence Extension with SAP HANA . 27 Accurate Mutlicore Processor Power Models for Power-Aware Resource Management . 29 VMs Core-allocation scheduling Policy for Energy and Performance Management . 35 Prof. Dr. Andreas Polze, Operating Systems & Middleware Group, Hasso-Plattner- Institut Parallelization of Elementary Flux Mode Enumeration for Large-scale Metabolic Networks . 41 Dr. Felix Salfner, SAP Innovation Center, Potsdam Early Anomaly Detection in SAP Business ByDesign . 47 Prof. Dr. Michael Schottner,¨ Betriebssysteme, Universitat¨ Dusseldorf¨ ECRAM (Elastic Cooperative Random-Access Memory) . 53 Prof. Dr. Steffen Staab, Institute for Web Science and Technologies, Universitat¨ Koblenz-Landau KONECT Cloud — Large Scale Network Mining in the Cloud . 55 Prof. Dr. Rainer Thome, Chair in Business administration and business computing, Universitat¨ Wurzburg¨ Integrated Management Support with Forward Business Recommendations . 59 i Fall 2012 Prof. Dr. Jurgen¨ Dollner,¨ Hasso-Plattner-Institut Potsdam Service-Based 3D Rendering and Interactive 3D Visualization . 63 Prof. Dr. Jorge Marx Gomez,´ Carl von Ossietzky University Oldenburg Smart Wind Farm Control . 65 Prof. Dr. Helmut Krcmar, Technical University of Munich Measurement of Execution Times and Resource Requirements for single user requests . 69 Dr. Ralph Kuhne,¨ SAP Innovation Center Potsdam Benchmarking for Efficient Cloud Operations . 71 Prof. Dr. Christoph Meinel, Hasso-Plattner-Institut Potsdam Instant Intrusion Detection using Live Attack Graphs and Event Correlation . 73 Prof. Dr. Andreas Polze, Hasso-Plattner-Institut Potsdam Exploiting Heterogeneous Architectures for Algorithms with Low Arithmetic Intensity . 77 Dr. Felix Salfner, SAP Innovation Center Potsdam Using In-Memory Computing for Proactive Cloud Operations . 83 Prof. Dr. Kai-Uwe Sattler, Ilmenau University of Technology Evaluation of Multicore Query Execution Techniques for Linked Open Data . 89 Dr. Sascha Sauer, Max Planck Institute for Molecular Genetics (MPIMG) Berlin Next Generation Sequencing: From Computational Challenges to Biological Insight . 95 Prof. Dr. Michael Schottner,¨ Heinrich Heine University of Dusseldorf¨ ECRAM (Elastic Cooperative Random-Access Memory) . 99 Prof. Assaf Schuster, HPI research school, Technion IIT Analysis of CPU/GPU data transfer bottlenecks in multi-GPU systems for hard real-time data streams . 103 Prof. Dr. Steffen Staab, University of Koblenz and Landau KONECT Cloud — Large Scale Network Mining in the Cloud . 107 Prof. Dr. Rainer Thome, University of Wurzburg¨ Adaptive Realtime KPI Analysis of ERP transaction data using In-Memory technology . 111 Till Winkler, Humboldt University of Berlin The Impact of Software as a Service . 115 ii Parallelizing H.264 Decoding with OpenMP Superscalar Chi Ching Chi, Ben Juurlink Embedded Systems Architecture Einsteinufer 17 10551 Berlin {chi.c.chi,b.juurlink}@tu-berlin.de Abstract Once all input dependencies of a task are resolved, it is scheduled for execution. Since the advent of multi-core processors and systems, A key difference between OmpSs and other task-based programmers are faced with the challenge of exploit- parallel programming models is the ability to add tasks ing thread-level parallelism (TLP). In the past years before they are ready to execute. This is a power- several parallel programming models have been intro- ful feature which allows more complex paralleliza- duced to simplify the development of parallel applica- tion strategies that simultaneously exploit function- tions. OpenMP Superscalar is a novel task-based pro- level and data-level parallelism. In this paper we show gramming model, which incorporates advanced fea- how these features can be used to implement a parallel tures such as automated runtime dependency reso- implementation of H.264 decoding. lution, while maintaining simple pragma-based pro- The structure of this paper is as follows: Sections 2 gramming for C/C++. We have parallelized H.264 to 5 describe the implementation and optimization of decoding using OpenMP Superscalar to investigate its parallel H.264 video decoding using OmpSs. In Sec- ease-of-use and performance. tion 6 the performance of the OmpSs version is eval- uated and compared to an optimized Pthreads imple- mentation. Conclusions are drawn in Section 7. 1 Introduction 2 Pipelining H.264 Because multi-core processors have become om- The H.264 decoder pipeline consists in our design of nipresent, there is a lot of renewed interest in parallel 5 pipeline stages, shown in Figure 1. In the read stage programming models that are easy to use while being the bitstream is read from the disk and parsed into sep- expressive, and allow to write performance-portable arated frames. In the parse stage the headers of the applications. An important question is, however, how frame are parsed and a picture info entry in the pic- to evaluate the ease of use, expressiveness, and perfor- ture info buffer (PIB) is allocated. The entropy decode mance of a programming model. In this work we try (ED) stage performs a lossless decompression by ex- to answer this question by describing our experiences tracting the syntax elements for each macroblock in in parallelizing H.264 decoding using OmpSs, a novel the frame. Some syntax elements are directly pro- task-based parallel programming model. To evaluate cessed, e.g. the motion vectors differences are trans- its performance, the performance of the OmpSs appli- formed into motion vectors. The macroblock recon- cation is compared to a similary structured Pthreads struction stage allocates a picture in the decoded pic- applications. H.264 decoding is an excellent case ture buffer (DPB) and reconstructs the picture using study because it is highly irregular and dynamic, ex- the syntax elements and motion vectors. The output hibits many different types of data dependencies as stage reorders and outputs the decoded pictures either well as rather fine-grained tasks, and requires differ- to an output file or the display. ent types of parallelism to be exploited. In contrast to other task-based programming models, In OmpSs [3], in addition to OpenMP functionality, like Cilk++ [7] and OpenMP [2], pipeline parallelism programmers can express parallelism by annotating can be straightforwardly expressed in OmpSs, because certain code sections (typically functions) as tasks, OmpSs tasks can be spawned before its dependencies as well as the inputs, outputs, and inputs/outputs of have been resolved [8, 6]. Listing 1 presents the sim- these tasks. When these functions are called, they are plified code of the pipelined main decoder loop using added to a task graph instead of being executed. The OmpSs pragmas. A task is created for each pipeline task dependencies are resolved at runtime, using the stage in each loop iteration. For correct pipelining of input/output specification of the function arguments. the tasks, it is required that all tasks in iteration i are 1 ∼ 40% ∼ 50% #pragma omp task inout(*rc) output(*frm) Read Parse ED MBR Output void read_frame_task(ReadContext *rc, EncFrame *frm ); PIB DPB #pragma omp task inout(*nc, *frm) output(*s)