Scientific Programming and Computer Architecture
Recommended publications
ClangJIT: Enhancing C++ with Just-in-Time Compilation
Hal Finkel (Lead, Compiler Technology and Programming Languages, Leadership Computing Facility, Argonne National Laboratory, Lemont, IL, USA), David Poliakoff (Lawrence Livermore National Laboratory, Livermore, CA, USA), David F. Richards (Lawrence Livermore National Laboratory, Livermore, CA, USA)

ABSTRACT
The C++ programming language is not only a keystone of the high-performance-computing ecosystem but has proven to be a successful base for portable parallel-programming frameworks. As is well known, C++ programmers use templates to specialize algorithms, thus allowing the compiler to generate highly efficient code for specific parameters, data structures, and so on. This capability has been limited to those specializations that can be identified when the application is compiled, and in many critical cases, compiling all potentially relevant specializations is not practical. ClangJIT provides a well-integrated C++ language extension allowing template-based specialization to occur during program execution.

... body of C++ code, but critically, defer the generation and optimization of template specializations until runtime using a relatively natural extension to the core C++ programming language.

A significant design requirement for ClangJIT is that the runtime-compilation process not explicitly access the file system (only loading data from the running binary is permitted), which allows for deployment within environments where file-system access is either unavailable or prohibitively expensive. In addition, this requirement maintains the redistributability of the binaries using the JIT-compilation features (i.e., they can run on systems where the source code is unavailable). For example, on large HPC deployments, especially on supercomputers with distributed file systems, ...
Three-Dimensional Integrated Circuit Design: EDA, Design and Microarchitectures
Integrated Circuits and Systems
Series Editor: Anantha Chandrakasan, Massachusetts Institute of Technology, Cambridge, Massachusetts
For other titles published in this series, go to http://www.springer.com/series/7236

Three-Dimensional Integrated Circuit Design: EDA, Design and Microarchitectures
Editors:
Yuan Xie, Department of Computer Science and Engineering, Pennsylvania State University
Jason Cong, Department of Computer Science, University of California, Los Angeles
Sachin Sapatnekar, Department of Electrical and Computer Engineering, University of Minnesota

ISBN 978-1-4419-0783-7, e-ISBN 978-1-4419-0784-4, DOI 10.1007/978-1-4419-0784-4
Springer New York Dordrecht Heidelberg London
Library of Congress Control Number: 2009939282
© Springer Science+Business Media, LLC 2010. All rights reserved.

Foreword
We live in a time of great change.
Overview of LLVM
Architecture of LLVM
- Front-end: high-level programming language => LLVM IR
- Optimizer: optimize/analyze/secure the program in the IR form
- Back-end: LLVM IR => machine code

Optimizer
The optimizer's job: analyze/optimize/secure programs. Optimizations are implemented as passes that traverse some portion of a program to either collect information or transform the program. A pass is an operation on a unit of IR code, and it is an important concept in LLVM.

LLVM IR
- A low-level, strongly typed, language-independent, SSA-based representation.
- Tailored for static analyses and optimization purposes.

Part 1
Part 1 has two kinds of passes:
- Analysis pass (Section 1): only analyzes code statically
- Transformation pass (Sections 2 & 3): inserts code into the program

[Figure: analysis-pass workflow. test.c is compiled by Clang to test.bc (LLVM IR); opt loads mypass.so, runs it on test.bc, and writes its output to stderr.]
[Figure: transformation-pass workflow. test.cpp is compiled by Clang++ to test.bc; opt loads mypass.so and produces the instrumented test-ins.bc; main.cpp (whose main calls foo) and the runtime library lib.cpp are compiled by Clang++ to main.bc and lib.bc; together they form the executable.]

Section 1 challenges:
- How to traverse instructions in a function: http://releases.llvm.org/3.9.1/docs/ProgrammersManual.html#iterating-over-the-instruction-in-a-function
- How to print to stderr

Sections 2 & 3 challenges:
1. How to traverse basic blocks in a function and instructions in a basic block
2. How to insert function calls to the runtime library
   a. Add the function signature to the symbol table of the module
Declustering Spatial Databases on a Multi-Computer Architecture
Nikos Koudas (1), Christos Faloutsos (2), Ibrahim Kamel (3)
(1) Computer Systems Research Institute, University of Toronto
(2) AT&T Bell Laboratories, Murray Hill, NJ
(3) Matsushita Information Technology Laboratory

Abstract. We present a technique to decluster a spatial access method on a shared-nothing multi-computer architecture [DGS+90]. We propose a software architecture with the R-tree as the underlying spatial access method, with its non-leaf levels on the 'master-server' and its leaf nodes distributed across the servers. The major contribution of our work is the study of the optimal capacity of leaf nodes, or 'chunk size' (or 'striping unit'): we express the response time on range queries as a function of the 'chunk size', and we show how to optimize it. We implemented our method on a network of workstations, using a real dataset, and we compared the experimental and the theoretical results. The conclusion is that our formula for the response time is very accurate (the maximum relative error was 29%; the typical error was in the vicinity of 10-15%). We illustrate one of the possible ways to exploit such an accurate formula by examining several 'what-if' scenarios. One major, practical conclusion is that a chunk size of 1 page gives either optimal or close to optimal results for a wide range of the parameters.

Keywords: Parallel databases, spatial access methods, shared-nothing architecture.

1 Introduction
One of the requirements for the database management systems (DBMSs) of the future is the ability to handle spatial data.
Chap01: Computer Abstractions and Technology
CHAPTER 1 Computer Abstractions and Technology
1.1 Introduction
1.2 Eight Great Ideas in Computer Architecture
1.3 Below Your Program
1.4 Under the Covers
1.5 Technologies for Building Processors and Memory
1.6 Performance
1.7 The Power Wall
1.8 The Sea Change: The Switch from Uniprocessors to Multiprocessors
1.9 Real Stuff: Benchmarking the Intel Core i7
1.10 Fallacies and Pitfalls
1.11 Concluding Remarks
1.12 Historical Perspective and Further Reading
1.13 Exercises

CMPS290 Class Notes (Chap01), by Kuo-pao Yang

1.1 Introduction
Modern computer technology requires professionals of every computing specialty to understand both hardware and software.

Classes of Computing Applications and Their Characteristics
Personal computers
  o A computer designed for use by an individual, usually incorporating a graphics display, a keyboard, and a mouse.
  o Personal computers emphasize delivery of good performance to single users at low cost and usually execute third-party software.
  o This class of computing, which is only about 35 years old, drove the evolution of many computing technologies.
Server computers
  o A computer used for running larger programs for multiple users, often simultaneously, and typically accessed only via a network.
  o Servers are built from the same basic technology as desktop computers but provide greater computing, storage, and input/output capacity.
Supercomputers
  o A class of computers with the highest performance and cost.
  o Supercomputers consist of tens of thousands of processors and many terabytes of memory, and cost tens to hundreds of millions of dollars.
Arch2030: A Vision of Computer Architecture Research over the Next 15 Years
This material is based upon work supported by the National Science Foundation under Grant No. 1136993. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Luis Ceze, Mark D. Hill, Thomas F. Wenisch

Summary
The Specialization Gap: Democratizing Hardware Design
The Cloud as an Abstraction for Architecture Innovation
Going Vertical
Architectures "Closer to Physics"
Machine Learning as a Key Workload
About this
Compiling a Higher-Order Smart Contract Language to LLVM
Vaivaswatha Nagaraj, Jacob Johannsen, Anton Trunov, George Pîrlea, Amrit Kumar (Zilliqa Research); Ilya Sergey (Yale-NUS College and National University of Singapore)

Abstract
Scilla is a higher-order polymorphic typed intermediate-level language for implementing smart contracts. In this talk, we describe a Scilla compiler targeting LLVM, with a focus on mapping Scilla types, values, and its functional language constructs to LLVM-IR. The compiled LLVM-IR, when executed with LLVM's JIT framework, achieves a speedup of about 10x over the reference interpreter on a typical Scilla contract. This reduced latency is crucial in the setting of blockchains, where smart contracts are executed as parts of transactions, to achieve peak transactions processed per second. Experiments on the Ackermann ...

[Figure: architecture diagram with the Blockchain Smart Contract Module in C++ (BC), the JIT Driver in C++ (JITD), and the Scilla Run-time Library in C++ (SRTL); other labels include foo.scilla, foo.ll, "state variable & message", "fetch", and "update".]
Computer Architecture Techniques for Power-Efficiency
MOCL005-FM MOCL005-FM.cls June 27, 2008 8:35
COMPUTER ARCHITECTURE TECHNIQUES FOR POWER-EFFICIENCY

Synthesis Lectures on Computer Architecture
Editor: Mark D. Hill, University of Wisconsin, Madison

Synthesis Lectures on Computer Architecture publishes 50- to 150-page publications on topics pertaining to the science and art of designing, analyzing, selecting and interconnecting hardware components to create computers that meet functional, performance and cost goals.

Computer Architecture Techniques for Power-Efficiency, Stefanos Kaxiras and Margaret Martonosi, 2008
Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency, Kunle Olukotun, Lance Hammond, James Laudon, 2007
Transactional Memory, James R. Larus, Ravi Rajwar, 2007
Quantum Computing for Computer Architects, Tzvetan S. Metodi, Frederic T. Chong, 2006

Copyright © 2008 by Morgan & Claypool. All rights reserved.
Computer Architecture Techniques for Power-Efficiency
Stefanos Kaxiras and Margaret Martonosi
www.morganclaypool.com
ISBN: 9781598292084 (paper)
ISBN: 9781598292091 (ebook)
DOI: 10.2200/S00119ED1V01Y200805CAC004
A Publication in the Morgan & Claypool Publishers
majnemer-fuzzingclang.pdf
Fuzzing Clang to find ABI Bugs
David Majnemer

What's in an ABI?
• The size, alignment, etc. of types
• Layout of records, RTTI, virtual tables, etc.
• The decoration of types, functions, etc.
• To generalize: anything that you need N > 1 compilers to agree upon

C++: A complicated language
    union U { int a; int b; };
    int U::*x = &U::a;
    int U::*y = &U::b;
Does 'x' equal 'y'?

We've got a standard
How hard could it be?
"[T]wo pointers to members compare equal if they would refer to the same member of the same most derived object or the same subobject if indirection with a hypothetical object of the associated class type were performed, otherwise they compare unequal."
No ABI correctly implements this.

Why does any of this matter?
• Data passed across ABI boundaries may be interpreted by another compiler
• Unpredictable things may happen if two compilers disagree about how to interpret this data
• Subtle bugs can be some of the worst bugs

Finding bugs isn't easy
• ABI implementation techniques may collide with each other in unpredictable ways
• One compiler permutes field order in structs if the alignment is 16 AND it has an empty virtual base AND it has at least one bitfield member AND …
• Some ABIs are not documented
• Even if they are, you can't always trust the documentation

What happens if we aren't proactive
• Let users find our bugs for us
• This can be demoralizing for users, eroding their trust
• Altruistic; we must hope that the user will file the bug
• At best, the user's time has been spent on something they probably didn't want to do

Let computers find the bugs
COSC 6385 Computer Architecture - Multi-Processors (IV) Simultaneous Multi-Threading and Multi-Core Processors Edgar Gabriel Spring 2011
Moore's Law
• Long-term trend on the number of transistors per integrated circuit
• Number of transistors doubles every ~18 months
Source: http://en.wikipedia.org/wki/Images:Moores_law.svg

What do we do with that many transistors?
• Optimizing the execution of a single instruction stream through
  – Pipelining
    • Overlap the execution of multiple instructions
    • Example: all RISC architectures; Intel x86 underneath the hood
  – Out-of-order execution:
    • Allow instructions to overtake each other in accordance with code dependencies (RAW, WAW, WAR)
    • Example: all commercial processors (Intel, AMD, IBM, SUN)
  – Branch prediction and speculative execution:
    • Reduce the number of stall cycles due to unresolved branches
    • Example: (nearly) all commercial processors

What do we do with that many transistors? (II)
  – Multi-issue processors:
    • Allow multiple instructions to start execution per clock cycle
    • Superscalar (Intel x86, AMD, …) vs. VLIW architectures
  – VLIW/EPIC architectures:
    • Allow compilers to indicate independent instructions per issue packet
    • Example: Intel Itanium series
  – Vector units:
    • Allow for the efficient expression and execution of vector operations
    • Example: SSE, SSE2, SSE3, SSE4 instructions

Limitations of optimizing a single instruction ...
Using ID TECH Universal SDK Library Files in a C++ Project
Introduction
From time to time, customers who wish to use ID TECH's Universal SDK for Windows (which is .NET-based and comes with C# code examples) ask if it is possible to do development against the SDK solely in C++ (on Windows). The answer is yes. Universal SDK library files (DLLs) are COM-visible and ready to be accessed from C++ code. (SDK runtimes require the .NET Common Language Runtime, but your C++ binaries can still use the SDK.) Note that while the example shown in this document involves Microsoft's Visual Studio, it is also possible to use SDK libraries in C++ projects created in Eclipse or other IDEs.

How to Use the IDTechSDK.dll File in a C++ Project:
1. Create a Visual C++ project in Visual Studio 2015 (an MFC Application is used as the example).
2. Change the properties of the Visual C++ project. Under the General tab, set Common Language Runtime Support to "Common Language Runtime Support (/clr)" (Target Platform: Windows).
3. Under VC++ Directories, add the path to the C# .dll file(s) to Reference Directories.
4. Under C/C++ General, set Common Language Runtime Support to "Common Language Runtime Support (/clr)".
5. Under C/C++ Preprocessor, add _AFXDLL to Preprocessor Definitions.
6. Under C/C++ Code Generation, change Runtime Library to "Multi-threaded DLL (/MD)".
7. Under Code Analysis General, change Rule Set to "Microsoft Mixed (C++ /CLR) Recommended Rules".
8. Use IDTechSDK.dll in your .cpp file.
Using ld, the GNU Linker
Using ld
The GNU linker
ld version 2
January 1994
Steve Chamberlain, Cygnus Support

Using LD, the GNU linker
Edited by Jeffrey Osier
Copyright © 1991, 92, 93, 94, 95, 96, 97, 1998 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.
Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided also that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.
Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions.

1 Overview
ld combines a number of object and archive files, relocates their data and ties up symbol references. Usually the last step in compiling a program is to run ld.
ld accepts Linker Command Language files written in a superset of AT&T's Link Editor Command Language syntax, to provide explicit and total control over the linking process.
This version of ld uses the general-purpose BFD libraries to operate on object files. This allows ld to read, combine, and write object files in many different formats, for example COFF or a.out. Different formats may be linked together to produce any available kind of object file. See Chapter 5 [BFD], page 47, for more information.
Aside from its flexibility, the GNU linker is more helpful than other linkers in providing diagnostic information.