The Opencl™ C 2.0 Specification

206

Copy Link

The OpenCL™ C 2.0 Specification Khronos® OpenCL Working Group Version V2.2-11, Fri, 19 Jul 2019 14:13:00 +0000 Table of Contents 6. The OpenCL C Programming Language . 2 6.1. Supported Data Types . 2 6.1.1. Built-in Scalar Data Types . 2 4 6.1.2. Built-in Vector Data Types . 5 6.1.3. Other Built-in Data Types . 6 6.1.4. Reserved Data Types . 7 6.1.5. Alignment of Types. 8 6.1.6. Vector Literals . 9 6.1.7. Vector Components. 9 6.1.8. Aliasing Rules. 14 6.1.9. Keywords . 14 6.2. Conversions and Type Casting. 14 6.2.1. Implicit Conversions . 14 6.2.2. Explicit Casts . 15 6.2.3. Explicit Conversions. 16 6.2.4. Reinterpreting Data As Another Type . 19 6.2.5. Pointer Casting. 22 6.2.6. Usual Arithmetic Conversions . 22 6.3. Operators . 23 6.3.1. Arithmetic Operators . 23 6.3.2. Unary Operators . 23 6.3.3. Pre- and Post-Operators . 23 6.3.4. Relational Operators . 24 6.3.5. Equality Operators . 24 6.3.6. Bitwise Operators . 25 6.3.7. Logical Operators . 25 6.3.8. Unary Logical Operator. 26 6.3.9. Ternary Selection Operator . 26 6.3.10. Shift Operators . 26 6.3.11. Sizeof Operator . 27 6.3.12. Comma Operator . 27 6.3.13. Indirection Operator . 27 6.3.14. Address Operator . 28 6.3.15. Assignment Operator. 28 6.4. Vector Operations . 29 6.5. Address Space Qualifiers . 30 6.5.1. __global (or global). 31 6.5.2. __local (or local). 32 6.5.3. __constant (or constant). 33 6.5.4. __private (or private). 34 6.5.5. The generic address space . 34 6.5.6. Changes to C99. 36 6.6. Access Qualifiers. 42 6.7. Function Qualifiers . 42 6.7.1. __kernel (or kernel). 42 6.7.2. Optional Attribute Qualifiers . 43 6.8. Storage-Class Specifiers. 45 6.9. Restrictions. 45 6.10. Preprocessor Directives and Macros . 48 6.11. Attribute Qualifiers . 49 6.11.1. Specifying Attributes of Types. 50 6.11.2. Specifying Attributes of Functions . 52 6.11.3. Specifying Attributes of Variables . 53 6.11.4. Specifying Attributes of Blocks and Control-Flow-Statements . 55 6.11.5. Specifying Attribute For Unrolling Loops . 56 6.11.6. Extending Attribute Qualifiers . ..

Recommended publications

Undefined Behaviour in the C Language

FAKULTA INFORMATIKY, MASARYKOVA UNIVERZITA Undefined Behaviour in the C Language BAKALÁŘSKÁ PRÁCE Tobiáš Kamenický Brno, květen 2015 Declaration Hereby I declare, that this paper is my original authorial work, which I have worked out by my own. All sources, references, and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source. Vedoucí práce: RNDr. Adam Rambousek ii Acknowledgements I am very grateful to my supervisor Miroslav Franc for his guidance, invaluable help and feedback throughout the work on this thesis. iii Summary This bachelor’s thesis deals with the concept of undefined behavior and its aspects. It explains some specific undefined behaviors extracted from the C standard and provides each with a detailed description from the view of a programmer and a tester. It summarizes the possibilities to prevent and to test these undefined behaviors. To achieve that, some compilers and tools are introduced and further described. The thesis contains a set of example programs to ease the understanding of the discussed undefined behaviors. Keywords undefined behavior, C, testing, detection, secure coding, analysis tools, standard, programming language iv Table of Contents Declaration ................................................................................................................................ ii Acknowledgements .................................................................................................................. iii Summary .................................................................................................................................
What Is LLVM? and a Status Update

What is LLVM? And a Status Update. Approved for public release Hal Finkel Leadership Computing Facility Argonne National Laboratory Clang, LLVM, etc. ✔ LLVM is a liberally-licensed(*) infrastructure for creating compilers, other toolchain components, and JIT compilation engines. ✔ Clang is a modern C++ frontend for LLVM ✔ LLVM and Clang will play significant roles in exascale computing systems! (*) Now under the Apache 2 license with the LLVM Exception LLVM/Clang is both a research platform and a production-quality compiler. 2 A role in exascale? Current/Future HPC vendors are already involved (plus many others)... Apple + Google Intel (Many millions invested annually) + many others (Qualcomm, Sony, Microsoft, Facebook, Ericcson, etc.) ARM LLVM IBM Cray NVIDIA (and PGI) Academia, Labs, etc. AMD 3 What is LLVM: LLVM is a multi-architecture infrastructure for constructing compilers and other toolchain components. LLVM is not a “low-level virtual machine”! LLVM IR Architecture-independent simplification Architecture-aware optimization (e.g. vectorization) Assembly printing, binary generation, or JIT execution Backends (Type legalization, instruction selection, register allocation, etc.) 4 What is Clang: LLVM IR Clang is a C++ frontend for LLVM... Code generation Parsing and C++ Source semantic analysis (C++14, C11, etc.) Static analysis ● For basic compilation, Clang works just like gcc – using clang instead of gcc, or clang++ instead of g++, in your makefile will likely “just work.” ● Clang has a scalable LTO, check out: https://clang.llvm.org/docs/ThinLTO.html 5 The core LLVM compiler-infrastructure components are one of the subprojects in the LLVM project. These components are also referred to as “LLVM.” 6 What About Flang? ● Started as a collaboration between DOE and NVIDIA/PGI.
Lecture Notes on Types for Part II of the Computer Science Tripos

Q Lecture Notes on Types for Part II of the Computer Science Tripos Prof. Andrew M. Pitts University of Cambridge Computer Laboratory c 2016 A. M. Pitts Contents Learning Guide i 1 Introduction 1 2 ML Polymorphism 6 2.1 Mini-ML type system . 6 2.2 Examples of type inference, by hand . 14 2.3 Principal type schemes . 16 2.4 A type inference algorithm . 18 3 Polymorphic Reference Types 25 3.1 The problem . 25 3.2 Restoring type soundness . 30 4 Polymorphic Lambda Calculus 33 4.1 From type schemes to polymorphic types . 33 4.2 The Polymorphic Lambda Calculus (PLC) type system . 37 4.3 PLC type inference . 42 4.4 Datatypes in PLC . 43 4.5 Existential types . 50 5 Dependent Types 53 5.1 Dependent functions . 53 5.2 Pure Type Systems . 57 5.3 System Fw .............................................. 63 6 Propositions as Types 67 6.1 Intuitionistic logics . 67 6.2 Curry-Howard correspondence . 69 6.3 Calculus of Constructions, lC ................................... 73 6.4 Inductive types . 76 7 Further Topics 81 References 84 Learning Guide These notes and slides are designed to accompany 12 lectures on type systems for Part II of the Cambridge University Computer Science Tripos. The course builds on the techniques intro- duced in the Part IB course on Semantics of Programming Languages for specifying type systems for programming languages and reasoning about their properties. The emphasis here is on type systems for functional languages and their connection to constructive logic. We pay par- ticular attention to the notion of parametric polymorphism (also known as generics), both because it has proven useful in practice and because its theory is quite subtle.
Threads Chapter 4

Threads Chapter 4 Reading: 4.1,4.4, 4.5 1 Process Characteristics ● Unit of resource ownership - process is allocated: ■ a virtual address space to hold the process image ■ control of some resources (files, I/O devices...) ● Unit of dispatching - process is an execution path through one or more programs ■ execution may be interleaved with other process ■ the process has an execution state and a dispatching priority 2 Process Characteristics ● These two characteristics are treated independently by some recent OS ● The unit of dispatching is usually referred to a thread or a lightweight process ● The unit of resource ownership is usually referred to as a process or task 3 Multithreading vs. Single threading ● Multithreading: when the OS supports multiple threads of execution within a single process ● Single threading: when the OS does not recognize the concept of thread ● MS-DOS support a single user process and a single thread ● UNIX supports multiple user processes but only supports one thread per process ● Solaris /NT supports multiple threads 4 Threads and Processes 5 Processes Vs Threads ● Have a virtual address space which holds the process image ■ Process: an address space, an execution context ■ Protected access to processors, other processes, files, and I/O Class threadex{ resources Public static void main(String arg[]){ ■ Context switch between Int x=0; processes expensive My_thread t1= new my_thread(x); t1.start(); ● Threads of a process execute in Thr_wait(); a single address space System.out.println(x) ■ Global variables are
Lecture 10: Introduction to Openmp (Part 2)

Lecture 10: Introduction to OpenMP (Part 2) 1 Performance Issues I • C/C++ stores matrices in row-major fashion. • Loop interchanges may increase cache locality { … #pragma omp parallel for for(i=0;i< N; i++) { for(j=0;j< M; j++) { A[i][j] =B[i][j] + C[i][j]; } } } • Parallelize outer-most loop 2 Performance Issues II • Move synchronization points outwards. The inner loop is parallelized. • In each iteration step of the outer loop, a parallel region is created. This causes parallelization overhead. { … for(i=0;i< N; i++) { #pragma omp parallel for for(j=0;j< M; j++) { A[i][j] =B[i][j] + C[i][j]; } } } 3 Performance Issues III • Avoid parallel overhead at low iteration counts { … #pragma omp parallel for if(M > 800) for(j=0;j< M; j++) { aa[j] =alpha*bb[j] + cc[j]; } } 4 C++: Random Access Iterators Loops • Parallelization of random access iterator loops is supported void iterator_example(){ std::vector vec(23); std::vector::iterator it; #pragma omp parallel for default(none) shared(vec) for(it=vec.begin(); it< vec.end(); it++) { // do work with it // } } 5 Conditional Compilation • Keep sequential and parallel programs as a single source code #if def _OPENMP #include “omp.h” #endif Main() { #ifdef _OPENMP omp_set_num_threads(3); #endif for(i=0;i< N; i++) { #pragma omp parallel for for(j=0;j< M; j++) { A[i][j] =B[i][j] + C[i][j]; } } } 6 Be Careful with Data Dependences • Whenever a statement in a program reads or writes a memory location and another statement reads or writes the same memory location, and at least one of the two statements writes the location, then there is a data dependence on that memory location between the two statements.
A DSL for Resource Checking Using Finite State Automaton-Driven Symbolic Execution Code

Open Comput. Sci. 2021; 11:107–115 Research Article Endre Fülöp and Norbert Pataki* A DSL for Resource Checking Using Finite State Automaton-Driven Symbolic Execution https://doi.org/10.1515/comp-2020-0120 code. Compilers validate the syntactic elements, referred Received Mar 31, 2020; accepted May 28, 2020 variables, called functions to name a few. However, many problems may remain undiscovered. Abstract: Static analysis is an essential way to find code Static analysis is a widely-used method which is by smells and bugs. It checks the source code without exe- definition the act of uncovering properties and reasoning cution and no test cases are required, therefore its cost is about software without observing its runtime behaviour, lower than testing. Moreover, static analysis can help in restricting the scope of tools to those which operate on the software engineering comprehensively, since static anal- source representation, the code written in a single or mul- ysis can be used for the validation of code conventions, tiple programming languages. While most static analysis for measuring software complexity and for executing code methods are designed to detect anomalies (called bugs) in refactorings as well. Symbolic execution is a static analy- software code, the methods they employ are varied [1]. One sis method where the variables (e.g. input data) are inter- major difference is the level of abstraction at which the preted with symbolic values. code is represented [2]. Because static analysis is closely Clang Static Analyzer is a powerful symbolic execution related to the compilation of the code, the formats used engine based on the Clang compiler infrastructure that to represent the different abstractions are not unique to can be used with C, C++ and Objective-C.
SMT-Based Refutation of Spurious Bug Reports in the Clang Static Analyzer

SMT-Based Refutation of Spurious Bug Reports in the Clang Static Analyzer Mikhail R. Gadelha∗, Enrico Steffinlongo∗, Lucas C. Cordeiroy, Bernd Fischerz, and Denis A. Nicole∗ ∗University of Southampton, UK. yUniversity of Manchester, UK. zStellenbosch University, South Africa. Abstract—We describe and evaluate a bug refutation extension bit in a is one, and (a & 1) ˆ 1 inverts the last bit in a. for the Clang Static Analyzer (CSA) that addresses the limi- The analyzer, however, produces the following (spurious) bug tations of the existing built-in constraint solver. In particular, report when analyzing the program: we complement CSA’s existing heuristics that remove spurious bug reports. We encode the path constraints produced by CSA as Satisfiability Modulo Theories (SMT) problems, use SMT main.c:4:12: warning: Dereference of null solvers to precisely check them for satisfiability, and remove pointer (loaded from variable ’z’) bug reports whose associated path constraints are unsatisfi- return *z; able. Our refutation extension refutes spurious bug reports in ˆ˜ 8 out of 12 widely used open-source applications; on aver- age, it refutes ca. 7% of all bug reports, and never refutes 1 warning generated. any true bug report. It incurs only negligible performance overheads, and on average adds 1.2% to the runtime of the The null pointer dereference reported here means that CSA full Clang/LLVM toolchain. A demonstration is available at claims to nevertheless have found a path where the dereference https://www.youtube.com/watch?v=ylW5iRYNsGA. of z is reachable. Such spurious bug reports are in practice common; in our I.
The Problem with Threads

The Problem with Threads Edward A. Lee Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2006-1 http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.html January 10, 2006 Copyright © 2006, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission. Acknowledgement This work was supported in part by the Center for Hybrid and Embedded Software Systems (CHESS) at UC Berkeley, which receives support from the National Science Foundation (NSF award No. CCR-0225610), the State of California Micro Program, and the following companies: Agilent, DGIST, General Motors, Hewlett Packard, Infineon, Microsoft, and Toyota. The Problem with Threads Edward A. Lee Professor, Chair of EE, Associate Chair of EECS EECS Department University of California at Berkeley Berkeley, CA 94720, U.S.A. [email protected] January 10, 2006 Abstract Threads are a seemingly straightforward adaptation of the dominant sequential model of computation to concurrent systems. Languages require little or no syntactic changes to sup- port threads, and operating systems and architectures have evolved to efﬁciently support them. Many technologists are pushing for increased use of multithreading in software in order to take advantage of the predicted increases in parallelism in computer architectures.
ACCU 2015 “New” Features in C

"New" Features in C ACCU 2015 “New” Features in C Dan Saks Saks & Associates www.dansaks.com 1 Abstract The first international standard for the C programming language was C90. Since then, two newer standards have been published, C99 and C11. C99 introduced a significant number of new features. C11 introduced a few more, some of which have been available in compilers for some time. Curiously, many of these added features don’t seem to have caught on. Many C programmers still program in C90. This session explains many of these “new” features, including declarations in for-statements, typedef redefinitions, inline functions, complex arithmetic, extended integer types, variable- length arrays, flexible array members, compound literals, designated initializers, restricted pointers, type-qualified array parameters, anonymous structures and unions, alignment support, non-returning functions, and static assertions. 2 Copyright © 2015 by Daniel Saks 1 "New" Features in C About Dan Saks Dan Saks is the president of Saks & Associates, which offers training and consulting in C and C++ and their use in developing embedded systems. Dan has written columns for numerous print publications including The C/C++ Users Journal , The C++ Report , Software Development , and Embedded Systems Design . He currently writes the online “Programming Pointers” column for embedded.com . With Thomas Plum, he wrote C++ Programming Guidelines , which won a 1992 Computer Language Magazine Productivity Award . He has also been a Microsoft MVP. Dan has taught thousands of programmers around the world. He has presented at conferences such as Software Development and Embedded Systems , and served on the advisory boards for those conferences.
Open C++ Tutorial∗

Open C++ Tutorial∗ Shigeru Chiba Institute of Information Science and Electronics University of Tsukuba [email protected] Copyright c 1998 by Shigeru Chiba. All Rights Reserved. 1 Introduction OpenC++ is an extensible language based on C++. The extended features of OpenC++ are specified by a meta-level program given at compile time. For dis- tinction, regular programs written in OpenC++ are called base-level programs. If no meta-level program is given, OpenC++ is identical to regular C++. The meta-level program extends OpenC++ through the interface called the OpenC++ MOP. The OpenC++ compiler consists of three stages: preprocessor, source-to-source translator from OpenC++ to C++, and the back-end C++ com- piler. The OpenC++ MOP is an interface to control the translator at the second stage. It allows to specify how an extended feature of OpenC++ is translated into regular C++ code. An extended feature of OpenC++ is supplied as an add-on software for the compiler. The add-on software consists of not only the meta-level program but also runtime support code. The runtime support code provides classes and functions used by the base-level program translated into C++. The base-level program in OpenC++ is first translated into C++ according to the meta-level program. Then it is linked with the runtime support code to be executable code. This flow is illustrated by Figure 1. The meta-level program is also written in OpenC++ since OpenC++ is a self- reflective language. It defines new metaobjects to control source-to-source transla- tion. The metaobjects are the meta-level representation of the base-level program and they perform the translation.
Let's Get Functional

5 LET’S GET FUNCTIONAL I’ve mentioned several times that F# is a functional language, but as you’ve learned from previous chapters you can build rich applications in F# without using any functional techniques. Does that mean that F# isn’t really a functional language? No. F# is a general-purpose, multi paradigm language that allows you to program in the style most suited to your task. It is considered a functional-first lan- guage, meaning that its constructs encourage a functional style. In other words, when developing in F# you should favor functional approaches whenever possible and switch to other styles as appropriate. In this chapter, we’ll see what functional programming really is and how functions in F# differ from those in other languages. Once we’ve estab- lished that foundation, we’ll explore several data types commonly used with functional programming and take a brief side trip into lazy evaluation. The Book of F# © 2014 by Dave Fancher What Is Functional Programming? Functional programming takes a fundamentally different approach toward developing software than object-oriented programming. While object-oriented programming is primarily concerned with managing an ever-changing system state, functional programming emphasizes immutability and the application of deterministic functions. This difference drastically changes the way you build software, because in object-oriented programming you’re mostly concerned with defining classes (or structs), whereas in functional programming your focus is on defining functions with particular emphasis on their input and output. F# is an impure functional language where data is immutable by default, though you can still define mutable data or cause other side effects in your functions.
Modifiable Array Data Structures for Mesh Topology

Modifiable Array Data Structures for Mesh Topology Dan Ibanez Mark S Shephard February 27, 2016 Abstract Topological data structures are useful in many areas, including the various mesh data structures used in finite element applications. Based on the graph-theoretic foundation for these data structures, we begin with a generic modifiable graph data structure and apply successive optimiza- tions leading to a family of mesh data structures. The results are compact array-based mesh structures that can be modified in constant time. Spe- cific implementations for finite elements and graphics are studied in detail and compared to the current state of the art. 1 Introduction An unstructured mesh simulation code relies heavily on multiple core capabili- ties to deal with the mesh itself, and the range of features available at this level constrain the capabilities of the simulation as a whole. As such, the long-term goal towards which this paper contributes is the development of a mesh data structure with the following capabilities: 1. The flexibility to deal with evolving meshes 2. A highly scalable implementation for distributed memory computers 3. The ability to represent any of the conforming meshes typically used by Finite Element (FE) and Finite Volume (FV) methods 4. Minimize memory use 5. Maximize locality of storage 6. The ability to parallelize work inside supercomputer nodes with hybrid architecture such as accelerators This paper focuses on the first five goals; the sixth will be the subject of a future publication. In particular, we present a derivation for a family of structures with these properties: 1 1.