Analysis of Automatic Parallelization Methods for Multicore Embedded Systems
Total Page:16
File Type:pdf, Size:1020Kb
Recommended publications
-
Using the GNU Compiler Collection (GCC)
Using the GNU Compiler Collection (GCC). Using the GNU Compiler Collection, by Richard M. Stallman and the GCC Developer Community. Last updated 23 May 2004 for GCC 3.4.6. For GCC Version 3.4.6. Published by: GNU Press, a division of the Free Software Foundation, 59 Temple Place Suite 330, Boston, MA 02111-1307 USA. Website: www.gnupress.org; General: [email protected]; Orders: [email protected]; Tel 617-542-5942; Fax 617-542-2652. Last printed October 2003 for GCC 3.3.1. Printed copies are available for $45 each. Copyright © 1988, 1989, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004 Free Software Foundation, Inc. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with the Invariant Sections being "GNU General Public License" and "Funding Free Software", the Front-Cover texts being (a) (see below), and with the Back-Cover Texts being (b) (see below). A copy of the license is included in the section entitled "GNU Free Documentation License". (a) The FSF's Front-Cover Text is: A GNU Manual (b) The FSF's Back-Cover Text is: You have freedom to copy and modify this GNU Manual, like GNU software. Copies published by the Free Software Foundation raise funds for GNU development. Short Contents: Introduction; 1 Programming Languages Supported by GCC; 2 Language Standards Supported by GCC; 3 GCC Command Options -
Memory Tagging and How It Improves C/C++ Memory Safety Kostya Serebryany, Evgenii Stepanov, Aleksey Shlyapnikov, Vlad Tsyrklevich, Dmitry Vyukov Google February 2018
Memory Tagging and how it improves C/C++ memory safety. Kostya Serebryany, Evgenii Stepanov, Aleksey Shlyapnikov, Vlad Tsyrklevich, Dmitry Vyukov. Google, February 2018. Contents: Introduction; Memory Safety in C/C++; AddressSanitizer; Memory Tagging; SPARC ADI; AArch64 HWASAN; Compiler And Run-time Support; Overhead; RAM; CPU; Code Size; Usage Modes; Testing; Always-on Bug Detection In Production; Sampling In Production; Security Hardening; Strengths; Weaknesses; Legacy Code; Kernel; Uninitialized Memory; Possible Improvements; Precision Of Buffer Overflow Detection; Probability Of Bug Detection; Conclusion. Introduction: Memory safety in C and C++ remains largely unresolved. A technique usually called “memory tagging” may dramatically improve the situation if implemented in hardware with reasonable overhead. This paper describes two existing implementations of memory tagging: one is the full hardware implementation in SPARC; the other is a partially hardware-assisted compiler-based tool for AArch64. We describe the basic idea, evaluate the two implementations, and explain how they improve memory safety. This paper is intended to initiate a wider discussion of memory tagging and to motivate the CPU and OS vendors to add support for it in the near future. Memory Safety in C/C++: C and C++ are well known for their performance and flexibility, but perhaps even more for their extreme memory unsafety. This year we are celebrating the 30th anniversary of the Morris Worm, one of the first known exploitations of a memory safety bug, and the problem is still not solved. -
Statically Detecting Likely Buffer Overflow Vulnerabilities
Statically Detecting Likely Buffer Overflow Vulnerabilities. David Larochelle, [email protected], University of Virginia, Department of Computer Science; David Evans, [email protected], University of Virginia, Department of Computer Science. Abstract: Buffer overflow attacks may be today’s single most important security threat. This paper presents a new approach to mitigating buffer overflow vulnerabilities by detecting likely vulnerabilities through an analysis of the program source code. Our approach exploits information provided in semantic comments and uses lightweight and efficient static analyses. This paper describes an implementation of our approach that extends the LCLint annotation-assisted static checking tool. Our tool is as fast as a compiler and nearly as easy to use. We present experience using our approach to detect buffer overflow vulnerabilities in two security-sensitive programs. 1. Introduction: Buffer overflow attacks are an important and persistent security problem. Buffer overflows account for approximately half of all security vulnerabilities [CWPBW00, WFBA00]. Richard Pethia of CERT identified buffer overflow attacks as the single most important security problem at a recent software engineering conference [Pethia00]; Brian Snow of the NSA predicted that buffer overflow attacks would still […] ed a prototype tool that does this by extending LCLint [Evans96]. Our work differs from other work on static detection of buffer overflows in three key ways: (1) we exploit semantic comments added to source code to enable local checking of interprocedural properties; (2) we focus on lightweight static checking techniques that have good performance and scalability characteristics, but sacrifice soundness and completeness; and (3) we introduce loop heuristics, a simple approach for efficiently analyzing many loops found in typical programs. -
Pattern Matching
Functional Programming. Steven Lau, March 2015. before function programming... https://www.youtube.com/watch?v=92WHN-pAFCs Models of computation ● Turing machine ○ invented by Alan Turing in 1936 ● Lambda calculus ○ invented by Alonzo Church in 1930 ● more... Turing machine ● A machine operates on an infinite tape (memory) and executes a stored program ● It may ○ read a symbol ○ write a symbol ○ move to the left cell ○ move to the right cell ○ change the machine’s state ○ halt Turing machine: Have some fun http://www.google.com/logos/2012/turing-doodle-static.html http://www.ioi2012.org/wp-content/uploads/2011/12/Odometer.pdf http://wcipeg.com/problems/desc/ioi1211 Turing machine incrementer, transition table (state, symbol: action, next_state): (0, _ or 0: write 1, 1); (0, 1: write 0, 2); (1, _: left, 0); (1, 0 or 1: right, 1); (2, 0: left, 0). Trace (tape, state): _____ 0; __1__ 1; __1__ 1; __1__ 0; __0__ 2; __0__ 0; _10__ 1; _10__ 1; _10__ 1; _10__ 0; _11__ 1; _11__ 1; _11__ 0; _10__ 2; _10__ 0; _00__ 2; _00__ 0; 100__ 1; 100__ 1; 100__ 1; 100__ 1; 100__ 0; 101__ 1; 101__ 1; 101__ 0. λ-calculus Beware! ● think mathematical, not C++/Pascal ● (parentheses) are for grouping ● variables cannot be mutated ○ x = 1 OK ○ x = 2 NO ○ x = x + 1 NO λ-calculus Simplification 1 of 2: ● Only anonymous functions are used ○ f(x) = x²+1, f(1) = 1²+1 = 2 is written as ○ (λx.x²+1)(1) = 1²+1 = 2; note that f = λx.x²+1 λ-calculus Simplification 2 of 2: ● Only unary functions are used ○ a binary function can be written as a unary function that returns another unary function ○ (λ(x,y).x+y)(1,2) = 1+2 = 3 is written as [(λx.(λy.x+y))(1)](2) = [(λy.1+y)](2) = 1+2 = 3 ○ this technique is known as Currying (after Haskell Curry) λ-calculus ● A lambda term has 3 forms: ○ x ○ λx.A ○ AB where x is a variable, A and B are lambda terms. -
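The incrementer table and the currying rewrite in the excerpt above can be tried out with a short Python sketch. This is only an illustration under stated assumptions: the names `RULES`, `run`, `add` and `curried_add` are made up here, and the five-cell tape with the head starting on the middle cell is a guess chosen to match the tape width shown in the slide's trace, not something the slides state.

```python
# Transition table from the slide: (state, symbol) -> (action, next_state).
# "_" is the blank symbol; an action is "write 0", "write 1", "left" or "right".
RULES = {
    (0, "_"): ("write 1", 1),
    (0, "0"): ("write 1", 1),
    (0, "1"): ("write 0", 2),
    (1, "_"): ("left", 0),
    (1, "0"): ("right", 1),
    (1, "1"): ("right", 1),
    (2, "0"): ("left", 0),
}


def run(steps, width=5, head=2):
    """Apply `steps` transitions to a blank tape and return the trace.

    `width` and `head` (tape size and start cell) are assumptions, not
    values given in the slides.
    """
    tape = ["_"] * width
    state = 0
    trace = [("".join(tape), state)]
    for _ in range(steps):
        action, state = RULES[(state, tape[head])]
        if action == "left":
            head -= 1
        elif action == "right":
            head += 1
        else:  # "write 0" or "write 1": store the digit in the current cell
            tape[head] = action[-1]
        trace.append(("".join(tape), state))
    return trace


# Currying: a binary function rewritten as a unary function that
# returns another unary function, as in (λx.(λy.x+y)).
def add(x, y):
    return x + y


curried_add = lambda x: lambda y: x + y

if __name__ == "__main__":
    for tape, state in run(24):
        print(tape, "state", state)
    print(curried_add(1)(2))
```

Under these assumptions the machine never halts; it keeps counting in binary, so `run` takes an explicit step budget rather than waiting for a halt state.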
PathScale ENZO GTC12 S0631 – Programming Heterogeneous Many-Cores Using Directives C.
PathScale ENZO GTC12 S0631 – Programming Heterogeneous Many-Cores Using Directives. C. Bergström | May 14th, 2012. Brief Introduction to ENZO. 2 | PathScale GTC12 S0631 Tutorial | May 14th, 2012. ENZO Overview & Goals: Speed transition to GPU & many-core systems • Simplify the task of migrating software written in C, C++ & Fortran • Uses OpenHMPP Standard (easy migration) • CAPS HMPP compatible. Performance & HPC focused • Fully exploits NVIDIA GPU features • Generates native instructions optimized for NVIDIA GPU. Project Schedule & Status. Project Schedule • ENZO Production release June 2012: OpenHMPP 2.5 C, C++ and Fortran • Next ENZO Production release October 2012: More tools and better support for libraries; x86_64 OpenHMPP task parallelism (similar to OMP3 tasks); More optimizations (IPA / CG2 / textures); OpenHMPP 3.0; CUDA 4.x; Kepler. Project Status • OpenHMPP 2.5: Running CAPS C & Fortran Labs; PathScale written HMPP test suite; Customer code • New C++ compiler: Perennial C++VS and CVSA regression free; Corner case compile time issues; Corner case runtime issues • Ongoing effort: Performance tuning & benchmarking; Compiler robustness; Nightly compiler builds to address issues. Performance • NVIDIA Tesla 2050 - “Lab2” SGEMM: ENZO: /opt/enzo/bin/pathcc -hmpp -
Lambda Calculus and Functional Programming
Global Journal of Researches in Engineering, Vol. 10 Issue 2 (Ver 1.0), June 2010, Page 47. Lambda Calculus and Functional Programming. Anahid Bassiri1, Mohammad Reza Malek2, Pouria Amirian3. GJRE Classification (FOR) 080299, 010199, 010203, 010109. Abstract: The lambda calculus can be thought of as an idealized, minimalistic programming language. It is capable of expressing any algorithm, and it is this fact that makes the model of functional programming an important one. This paper is focused on introducing lambda calculus and its application. As an application, Dijkstra's algorithm is implemented using lambda calculus. As the program shows, the algorithm is more understandable using lambda calculus in comparison with other imperative languages. I. INTRODUCTION Lambda calculus (λ-calculus) is a useful device to make the theories realizable. Lambda calculus, introduced by Alonzo Church and Stephen Cole Kleene in the 1930s, is a formal system designed to investigate function definition, function application and recursion in mathematical logic and […] Basis concept of a Turing machine is the present day Von Neumann computers. Conceptually these are Turing machines with random access registers. Imperative programming languages such as FORTRAN, Pascal etcetera as well as all the assembler languages are based on the way a Turing machine is instructed by a sequence of statements. In addition functional programming languages, like Miranda, ML etcetera, are based on the lambda calculus. Functional programming is a programming paradigm that treats computation as the evaluation of mathematical functions and avoids state and mutable data. It emphasizes the application of functions, in contrast with the imperative programming style that emphasizes changes in state. Lambda calculus provides a theoretical framework for describing functions and their evaluation. -
Aliasing Restrictions of C11 Formalized in Coq
Aliasing restrictions of C11 formalized in Coq. Robbert Krebbers, Radboud University Nijmegen. December 11, 2013 @ CPP, Melbourne, Australia. Aliasing: multiple pointers referring to the same object. int f(int *p, int *q) { int x = *p; *q = 314; return x; } If p and q alias, the original value n of *p is returned. Optimizing x away (returning *p instead of x) is unsound: 314 would be returned. Alias analysis: to determine whether pointers can alias. It can still be called with aliased pointers: union { int x; float y; } u; u.x = 271; return h(&u.x, &u.y); C89 allows p and q to be aliased, and thus requires it to return 271. C99/C11 allows type-based alias analysis: A compiler -
Locality-Aware Automatic Parallelization for GPGPU with OpenHMPP Directives
Locality-Aware Automatic Parallelization for GPGPU with OpenHMPP Directives. José M. Andión, Manuel Arenaz, François Bodin, Gabriel Rodríguez and Juan Touriño. 7th International Symposium on High-Level Parallel Programming and Applications (HLPP 2014), July 3-4, 2014, Amsterdam, Netherlands. Outline • Motivation: General Purpose Computation with GPUs • GPGPU with CUDA & OpenHMPP • The KIR: an IR for the Detection of Parallelism • Locality-Aware Generation of Efficient GPGPU Code • Case Studies: CONV3D & SGEMM • Performance Evaluation • Conclusions & Future Work. J.M. Andión et al. Locality-Aware Automatic Parallelization for GPGPU with OpenHMPP Directives. HLPP 2014.
[Figure: processor performance chart on a log scale (100 to 100,000), with data points ranging from the IBM RS6000/540 (30 MHz) to the Intel Xeon (6 cores, 3.3 GHz) and growth-rate annotations of 52%/year and 22%/year; y-axis: Performance (vs. …)] -
Teach Yourself Perl 5 in 21 Days
Teach Yourself Perl 5 in 21 days, David Till. Table of Contents: Introduction ● Who Should Read This Book? ● Special Features of This Book ● Programming Examples ● End-of-Day Q&A and Workshop ● Conventions Used in This Book ● What You'll Learn in 21 Days Week 1 Week at a Glance ● Where You're Going Day 1 Getting Started ● What Is Perl? ● How Do I Find Perl? ❍ Where Do I Get Perl? ❍ Other Places to Get Perl ● A Sample Perl Program ● Running a Perl Program ❍ If Something Goes Wrong ● The First Line of Your Perl Program: How Comments Work ❍ Comments ● Line 2: Statements, Tokens, and <STDIN> ❍ Statements and Tokens ❍ Tokens and White Space ❍ What the Tokens Do: Reading from Standard Input ● Line 3: Writing to Standard Output ❍ Function Invocations and Arguments ● Error Messages ● Interpretive Languages Versus Compiled Languages ● Summary ● Q&A ● Workshop ❍ Quiz ❍ Exercises Day 2 Basic Operators and Control Flow ● Storing in Scalar Variables Assignment ❍ The Definition of a Scalar Variable ❍ Scalar Variable Syntax ❍ Assigning a Value to a Scalar Variable ● Performing Arithmetic ❍ Example of Miles-to-Kilometers Conversion ❍ The chop Library Function ● Expressions ❍ Assignments and Expressions ● Other Perl Operators ● Introduction to Conditional Statements ● The if Statement ❍ The Conditional Expression ❍ The Statement Block ❍ Testing for Equality Using == ❍ Other Comparison Operators ● Two-Way Branching Using if and else ● Multi-Way Branching Using elsif ● Writing Loops Using the while Statement ● Nesting Conditional Statements ● Looping Using -
User-Directed Loop-Transformations in Clang
User-Directed Loop-Transformations in Clang. Michael Kruse, Argonne Leadership Computing Facility, Argonne National Laboratory, Argonne, USA, [email protected]; Hal Finkel, Argonne Leadership Computing Facility, Argonne National Laboratory, Argonne, USA, hfi[email protected]. Abstract: Directives for the compiler such as pragmas can help programmers to separate an algorithm’s semantics from its optimization. This keeps the code understandable and easier to optimize for different platforms. Simple transformations such as loop unrolling are already implemented in most mainstream compilers. We recently submitted a proposal to add generalized loop transformations to the OpenMP standard. We are also working on an implementation in LLVM/Clang/Polly to show its feasibility and usefulness. The current prototype allows applying patterns common to matrix-matrix multiplication optimizations. Index Terms: OpenMP, Pragma, Loop Transformation, C/C++, Clang, LLVM, Polly. I. MOTIVATION Almost all processor time is spent in some kind of loop, and […] Only #pragma unroll has broad support. #pragma ivdep made popular by icc and Cray to help vectorization is mimicked by other compilers as well, but with different interpretations of its meaning. However, no compiler allows applying multiple transformations on a single loop systematically. In addition to straightforward trial-and-error execution time optimization, code transformation pragmas can be useful for machine-learning assisted autotuning. The OpenMP approach is to make the programmer responsible for the semantic correctness of the transformation. This unfortunately makes it hard for an autotuner which only measures the timing difference without understanding the code. Such an autotuner would therefore likely suggest transformations that make the program return wrong results or crash. -
Purity in Erlang
Purity in Erlang. Mihalis Pitidis1 and Konstantinos Sagonas1,2. 1 School of Electrical and Computer Engineering, National Technical University of Athens, Greece; 2 Department of Information Technology, Uppsala University, Sweden. [email protected], [email protected] Abstract. Motivated by a concrete goal, namely to extend Erlang with the ability to employ user-defined guards, we developed a parameterized static analysis tool called PURITY, that classifies functions as referentially transparent (i.e., side-effect free with no dependency on the execution environment and never raising an exception), side-effect free with no dependencies but possibly raising exceptions, or side-effect free but with possible dependencies and possibly raising exceptions. We have applied PURITY on a large corpus of Erlang code bases and report experimental results showing the percentage of functions that the analysis definitely classifies in each category. Moreover, we discuss how our analysis has been incorporated on a development branch of the Erlang/OTP compiler in order to allow extending the language with user-defined guards. 1 Introduction Purity plays an important role in functional programming languages as it is a cornerstone of referential transparency, namely that the same language expression produces the same value when evaluated twice. Referential transparency helps in writing easy to test, robust and comprehensible code, makes equational reasoning possible, and aids program analysis and optimisation. In pure functional languages like Clean or Haskell, any side-effect or dependency on the state is captured by the type system and is reflected in the types of functions. In a language like ERLANG, which has been developed primarily with concurrency in mind, pure functions are not the norm and impure functions can freely be used interchangeably with pure ones. -
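The three categories named in the abstract above can be illustrated with a rough Python analogy. This is not Erlang and not output of the PURITY tool; the function names `double`, `head` and `home_dir` are hypothetical, and the comments describe how each function would fit the categories as the abstract words them.

```python
import os


def double(x):
    # Category 1 analogue (referentially transparent): for integer inputs
    # this has no side effects, does not read the environment, and does
    # not raise.
    return 2 * x


def head(items):
    # Category 2 analogue: side-effect free and independent of the
    # environment, but it may raise (IndexError on an empty list).
    return items[0]


def home_dir():
    # Category 3 analogue: side-effect free, but the result depends on the
    # execution environment and the lookup may raise (KeyError if HOME is
    # not set).
    return os.environ["HOME"]
```

A caller can replace `double(21)` with its value anywhere, whereas `head` and `home_dir` need extra care: the first because of the possible exception, the second because two runs in different environments can give different answers.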
How to Write Code That Will Survive
Programming Heterogeneous Many-cores Using Directives: HMPP - OpenACC. F. Bodin, CAPS CTO. Introduction • Programming many-core systems faces the following dilemma o Achieve "portable" performance • Multiple forms of parallelism cohabiting – Multiple devices (e.g. GPUs) with their own address space – Multiple threads inside a device – Vector/SIMD parallelism inside a thread • Massive parallelism – Tens of thousands of threads needed o The constraint of keeping a unique version of codes, preferably mono-language • Reduces maintenance cost • Preserves code assets • Less sensitive to fast moving hardware targets • Codes last several generations of hardware architecture • For legacy codes, directive-based approach may be an alternative o And may benefit from auto-tuning techniques. CC 2012 www.caps-entreprise.com 2 Profile of a Legacy Application • Written in C/C++/Fortran • Mix of user code and library calls • Hotspots may or may not be parallel • Lifetime in 10s of years • Cannot be fully re-written • Migration can be risky and mandatory. Code on the slide: while(many){ ... mylib1(A,B); ... myuserfunc1(B,A); ... mylib2(A,B); ... myuserfunc2(B,A); ... } Overview of the Presentation • Many-core architectures o Definition and forecast o Why usual parallel programming techniques won't work per se • Directive-based programming o OpenACC sets of directives o HMPP directives o Library integration issue • Toward a portable infrastructure for auto-tuning o Current auto-tuning directives in HMPP 3.0 o CodeletFinder for offline auto-tuning o Toward a standard auto-tuning interface. Many-Core Architectures. Heterogeneous Many-Cores • Many general purposes cores coupled with a massively parallel accelerator (HWA) • CPU and HWA linked with a PCIx bus • Data/stream/vector parallelism to be exploited by HWA e.g.