Scientific Programming and Computer Architecture


Scientific and Engineering Computation
William Gropp and Ewing Lusk, editors; Janusz Kowalik, founding editor
A complete list of books published in the Scientific and Engineering Computation series appears at the back of this book.

Scientific Programming and Computer Architecture
Divakar Viswanath
The MIT Press, Cambridge, Massachusetts; London, England

© 2017 Massachusetts Institute of Technology
All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher. This book was set in LyX by the author. Printed and bound in the United States of America.

Library of Congress Cataloging-in-Publication Data
Names: Viswanath, Divakar, author.
Title: Scientific programming and computer architecture / Divakar Viswanath.
Description: Cambridge, MA : The MIT Press, [2017] | Series: Scientific and engineering computation | Includes bibliographical references and index.
Identifiers: LCCN 2016043792 | ISBN 9780262036290 (hardcover : alk. paper)
Subjects: LCSH: Computer programming. | Computer architecture. | Software engineering. | C (Computer program language)
Classification: LCC QA76.6 .V573 2017 | DDC 005.1--dc23
LC record available at https://lccn.loc.gov/2016043792

To all my teachers, with thanks.

Table of Contents

Preface

Chapter 1: C/C++: Review
  Section 1.1: An example: The Aitken transformation
    Subsection 1.1.1: Leibniz series and the logarithmic series
    Subsection 1.1.2: Modular organization of sources
  Section 1.2: C review
    Subsection 1.2.1: Header files
    Subsection 1.2.2: Arrays and pointers
    Subsection 1.2.3: The Aitken iteration using arrays and pointers
    Subsection 1.2.4: Declarations and definitions
    Subsection 1.2.5: Function calls and the compilation process
  Section 1.3: C++ review
    Subsection 1.3.1: The Vector class
    Subsection 1.3.2: Aitken transformation in C++
  Section 1.4: A little Fortran
  Section 1.5: References

Chapter 2: C/C++: Libraries and Makefiles
  Section 2.1: Mixed-language programming
    Subsection 2.1.1: Transmutation of names from source to object files
    Subsection 2.1.2: Linking Fortran programs with C and C++
  Section 2.2: Using BLAS and LAPACK libraries
    Subsection 2.2.1: Arrays, matrices, and leading dimensions
    Subsection 2.2.2: BLAS and LAPACK
    Subsection 2.2.3: C++ class interface to BLAS/LAPACK
  Section 2.3: Building programs using GNU Make
    Subsection 2.3.1: The utils/ folder
    Subsection 2.3.2: Targets, prerequisites, and dependency graphs
    Subsection 2.3.3: Make variables in makevars.mk
    Subsection 2.3.4: Pattern rules in makevars.mk
    Subsection 2.3.5: Phony targets in makevars.mk
    Subsection 2.3.6: Recursive make and .d files
    Subsection 2.3.7: Beyond recursive make
    Subsection 2.3.8: Building your own library
  Section 2.4: The Fast Fourier Transform
    Subsection 2.4.1: The FFT algorithm in outline
    Subsection 2.4.2: FFT using MKL
    Subsection 2.4.3: FFT using FFTW
    Subsection 2.4.4: Cycles and histograms
    Subsection 2.4.5: Optimality of FFT implementations
  Section 2.5: References

Chapter 3: The Processor
  Section 3.1: Overview of the x86 architecture
    Subsection 3.1.1: 64-bit x86 architecture
    Subsection 3.1.2: 64-bit x86 assembly programming
    Subsection 3.1.3: The Time Stamp Counter
    Subsection 3.1.4: Cache parameters and the CPUID instruction
  Section 3.2: Compiler optimizations
    Subsection 3.2.1: Preliminaries
    Subsection 3.2.2: Loop unrolling
    Subsection 3.2.3: Loop fusion
    Subsection 3.2.4: Unroll and jam
    Subsection 3.2.5: Loop interchange
    Subsection 3.2.6: C++ overhead
    Subsection 3.2.7: A little compiler theory
  Section 3.3: Optimizing for the instruction pipeline
    Subsection 3.3.1: Instruction pipelines
    Subsection 3.3.2: Chipsets
    Subsection 3.3.3: Peak floating point performance
    Subsection 3.3.4: Microkernel for matrix multiplication
  Section 3.4: References

Chapter 4: Memory
  Section 4.1: DRAM and cache memory
    Subsection 4.1.1: DRAM memory
    Subsection 4.1.2: Cache memory
    Subsection 4.1.3: Physical memory and virtual memory
    Subsection 4.1.4: Latency to DRAM memory: First attempts
    Subsection 4.1.5: Latency to DRAM
  Section 4.2: Optimizing memory access
    Subsection 4.2.1: Bandwidth to DRAM
    Subsection 4.2.2: Matrix transpose
    Subsection 4.2.3: Optimized matrix multiplication
  Section 4.3: Reading from and writing to disk
    Subsection 4.3.1: C versus C++
    Subsection 4.3.2: Latency to disk
    Subsection 4.3.3: Bandwidth to disk
  Section 4.4: Page tables and virtual memory
    Subsection 4.4.1: Partitioning the virtual address space
    Subsection 4.4.2: Physical address space and page tables
  Section 4.5: References

Chapter 5: Threads and Shared Memory
  Section 5.1: Introduction to OpenMP
    Subsection 5.1.1: OpenMP syntax
    Subsection 5.1.2: Shared variables and OpenMP's memory model
    Subsection 5.1.3: Overheads of OpenMP constructs
  Section 5.2: Optimizing OpenMP programs
    Subsection 5.2.1: Near memory and far memory
    Subsection 5.2.2: Bandwidth to DRAM memory
    Subsection 5.2.3: Matrix transpose
    Subsection 5.2.4: Fast Fourier transform
  Section 5.3: Introduction to Pthreads
    Subsection 5.3.1: Pthreads
    Subsection 5.3.2: Overhead of thread creation
    Subsection 5.3.3: Parallel regions using Pthreads
  Section 5.4: Program memory
    Subsection 5.4.1: An easy system call
    Subsection 5.4.2: Stacks
    Subsection 5.4.3: Segmentation faults and memory errors
  Section 5.5: References

Chapter 6: Special Topic: Networks and Message Passing
  Section 6.1: MPI: Getting started
    Subsection 6.1.1: Initializing MPI
    Subsection 6.1.2: Unsafe communication in MPI
  Section 6.2: High-performance network architecture
    Subsection 6.2.1: Fat-tree network
    Subsection 6.2.2: Infiniband network architecture
  Section 6.3: MPI examples
    Subsection 6.3.1: Variants of MPI send and receive
    Subsection 6.3.2: Jacobi iteration
    Subsection 6.3.3: Matrix transpose
    Subsection 6.3.4: Collective communication
    Subsection 6.3.5: Parallel I/O in MPI
  Section 6.4: The Internet
    Subsection 6.4.1: IP addresses
    Subsection 6.4.2: Send and receive
    Subsection 6.4.3: Server
    Subsection 6.4.4: Client
    Subsection 6.4.5: Internet latency
    Subsection 6.4.6: Internet bandwidth
  Section 6.5: References

Chapter 7: Special Topic: The Xeon Phi Coprocessor
  Section 7.1: Xeon Phi architecture
    Subsection 7.1.1: Peak floating point bandwidth
    Subsection 7.1.2: A simple Phi program
    Subsection 7.1.3: Xeon Phi memory system
  Section 7.2: Offload
    Subsection 7.2.1: Initializing to use the MIC device
    Subsection 7.2.2: The target(mic) declaration specification
    Subsection 7.2.3: Summing the Leibniz series
    Subsection 7.2.4: Offload bandwidth
  Section 7.3: Two examples: FFT and matrix multiplication
    Subsection 7.3.1: FFT
    Subsection 7.3.2: Matrix multiplication

Chapter 8: Special Topic: Graphics Coprocessor
  Section 8.1: Graphics coprocessor architecture
    Subsection 8.1.1: Graphics processor capability
    Subsection 8.1.2: Host and device memory
    Subsection 8.1.3: Timing CUDA kernels
    Subsection 8.1.4: Warps and thread blocks
  Section 8.2: Introduction to CUDA
    Subsection 8.2.1: Summing the Leibniz series
    Subsection 8.2.2: CUDA compilation
  Section 8.3: Two examples
    Subsection 8.3.1: Bandwidth to memory
    Subsection 8.3.2: Matrix multiplication
  Section 8.4: References

Chapter 9: Machines Used, Plotting, Python, GIT, Cscope, and gcc
  Section 9.1: Machines used
  Section 9.2: Plotting in C/C++ and other preliminaries
  Section 9.3: C/C++ versus Python versus MATLAB
  Section 9.4: GIT
  Section 9.5: Cscope
  Section 9.6: Compiling with gcc/g++

The website https://github.com/divakarvi/bk-spca has all the programs discussed in this book.

Preface

It is a common experience that minor changes to C/C++ programs can make a big difference to their speed. Although all programmers who opt for C/C++ do so at least partly, and much of the time mainly, because programs in these languages can be fast, writing fast programs in these languages is not so straightforward. Well-optimized programs in C/C++ can be even 10 or more times faster than programs that are not well optimized. At the heart of this book is the following question: what makes computer programs fast or slow?

Programming languages provide a level of abstraction that makes computers look simpler than they are. As soon as we ask this question about program speed, we have to get behind the abstractions and understand how a computer really works and how programming constructs map to different parts of the computer's architecture. Although there is much that can be understood, the modern computer is such a complicated device that this basic question cannot be answered perfectly.

Writing fast programs is the major theme of this book, but it is not the only theme. The other theme is modularity of programs.
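The preface's central claim, that a minor change can make a large speed difference, can be illustrated with a small self-contained sketch. It is not taken from the book; it simply swaps the order of two loops, which changes only the memory access pattern, a topic the table of contents covers under loop interchange and cache memory.

    #include <cstdio>
    #include <vector>

    // Illustrative sketch, not from the book: the same O(n*n) reduction written
    // two ways. The row-by-row version walks the matrix contiguously; the
    // column-by-column version strides across rows and, for large n, typically
    // runs several times slower because of cache misses.
    double sum_row_by_row(const std::vector<double>& a, int n) {
        double s = 0.0;
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                s += a[(size_t)i * n + j];   // unit-stride access
        return s;
    }

    double sum_col_by_col(const std::vector<double>& a, int n) {
        double s = 0.0;
        for (int j = 0; j < n; ++j)
            for (int i = 0; i < n; ++i)
                s += a[(size_t)i * n + j];   // stride-n access
        return s;
    }

    int main() {
        const int n = 4096;
        std::vector<double> a((size_t)n * n, 1.0);
        std::printf("%g %g\n", sum_row_by_row(a, n), sum_col_by_col(a, n));
        return 0;
    }

Timing the two functions (for example with the Time Stamp Counter discussed in Chapter 3) is a simple way to see the effect on a particular machine.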
Recommended publications
  • ClangJIT: Enhancing C++ with Just-in-Time Compilation
    ClangJIT: Enhancing C++ with Just-in-Time Compilation. Hal Finkel (Lead, Compiler Technology and Programming Languages, Leadership Computing Facility, Argonne National Laboratory, Lemont, IL, USA); David Poliakoff and David F. Richards (Lawrence Livermore National Laboratory, Livermore, CA, USA).

    ABSTRACT. The C++ programming language is not only a keystone of the high-performance-computing ecosystem but has proven to be a successful base for portable parallel-programming frameworks. As is well known, C++ programmers use templates to specialize algorithms, thus allowing the compiler to generate highly-efficient code for specific parameters, data structures, and so on. This capability has been limited to those specializations that can be identified when the application is compiled, and in many critical cases, compiling all potentially-relevant specializations is not practical. ClangJIT provides a well-integrated C++ language extension allowing template-based specialization to occur during program execution. [...] body of C++ code, but critically, defer the generation and optimization of template specializations until runtime using a relatively natural extension to the core C++ programming language.

    A significant design requirement for ClangJIT is that the runtime-compilation process not explicitly access the file system - only loading data from the running binary is permitted - which allows for deployment within environments where file-system access is either unavailable or prohibitively expensive. In addition, this requirement maintains the redistributibility of the binaries using the JIT-compilation features (i.e., they can run on systems where the source code is unavailable). For example, on large HPC deployments, especially on supercomputers with distributed file systems, [...]
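    The abstract above centers on C++ template specialization: the compiler emits a dedicated, highly optimized body for each set of template parameters seen at compile time, and ClangJIT's contribution is to defer that generation until runtime. A minimal ahead-of-time version of the idea follows (ordinary C++, no ClangJIT extension; the function and names are illustrative only):

        #include <cstdio>

        // Ordinary (compile-time) template specialization: for each constant N used
        // in the source, the compiler emits a separate poly<N> whose loop bound is a
        // known constant and can be fully unrolled and optimized. ClangJIT, per the
        // abstract, generates such specializations at runtime instead, when N is not
        // known until the program is running.
        template <int N>
        double poly(const double* c, double x) {
            double r = 0.0;
            for (int i = N - 1; i >= 0; --i)   // trip count fixed at compile time
                r = r * x + c[i];
            return r;
        }

        int main() {
            const double c[4] = {1.0, 2.0, 3.0, 4.0};
            std::printf("%g\n", poly<4>(c, 0.5));  // here N must be a compile-time constant
            return 0;
        }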
  • Three-Dimensional Integrated Circuit Design: EDA, Design and Microarchitectures
    Integrated Circuits and Systems. Series Editor: Anantha Chandrakasan, Massachusetts Institute of Technology, Cambridge, Massachusetts. For other titles published in this series, go to http://www.springer.com/series/7236

    Yuan Xie, Jason Cong, Sachin Sapatnekar (Editors), Three-Dimensional Integrated Circuit Design: EDA, Design and Microarchitectures. Editors: Yuan Xie, Department of Computer Science and Engineering, Pennsylvania State University; Jason Cong, Department of Computer Science, University of California, Los Angeles; Sachin Sapatnekar, Department of Electrical and Computer Engineering, University of Minnesota. ISBN 978-1-4419-0783-7, e-ISBN 978-1-4419-0784-4, DOI 10.1007/978-1-4419-0784-4. Springer New York Dordrecht Heidelberg London. Library of Congress Control Number: 2009939282. © Springer Science+Business Media, LLC 2010.

    All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. Printed on acid-free paper. Springer is part of Springer Science+Business Media (www.springer.com)

    Foreword: We live in a time of great change.
  • Overview of LLVM
    Overview of LLVM

    Architecture of LLVM: the front-end translates a high-level programming language into LLVM IR; the optimizer optimizes/analyzes/secures the program in IR form; the back-end translates LLVM IR into machine code.

    Optimizer: the optimizer's job is to analyze/optimize/secure programs. Optimizations are implemented as passes that traverse some portion of a program to either collect information or transform the program. A pass is an operation on a unit of IR code. Pass is an important concept in LLVM.

    LLVM IR: a low-level, strongly-typed, language-independent, SSA-based representation, tailored for static analyses and optimization purposes.

    Part 1 has two kinds of passes: an analysis pass (section 1), which only analyzes code statically, and a transformation pass (sections 2 & 3), which inserts code into the program. [Pipeline diagrams omitted: in the analysis-pass flow, Clang compiles test.c to test.bc, and opt loads mypass.so and prints to stderr; in the transformation-pass flow, opt uses mypass.so to rewrite test.bc into test-ins.bc, which is linked together with main.bc and lib.bc (the runtime library) into an executable.]

    Section 1 challenges: how to traverse instructions in a function (http://releases.llvm.org/3.9.1/docs/ProgrammersManual.html#iterating-over-the-instruction-in-a-function); how to print to stderr.

    Sections 2 & 3 challenges: (1) how to traverse basic blocks in a function and instructions in a basic block; (2) how to insert function calls to the runtime library (a. add the function signature to the symbol table of the module).
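    As a concrete sketch of the analysis-pass idea described above, written against the legacy pass manager that matches the LLVM 3.9 documentation linked in the excerpt (the pass name and registration string are placeholders), a function pass can iterate over every basic block and instruction and print them to stderr:

        #include "llvm/Pass.h"
        #include "llvm/IR/Function.h"
        #include "llvm/IR/BasicBlock.h"
        #include "llvm/IR/Instruction.h"
        #include "llvm/Support/raw_ostream.h"

        using namespace llvm;

        namespace {
        // Analysis-style pass: collects/prints information, transforms nothing.
        struct PrintInstructions : public FunctionPass {
            static char ID;
            PrintInstructions() : FunctionPass(ID) {}

            bool runOnFunction(Function &F) override {
                errs() << "Function: " << F.getName() << "\n";
                for (BasicBlock &BB : F)       // traverse basic blocks in the function
                    for (Instruction &I : BB)  // traverse instructions in the block
                        errs() << "  " << I << "\n";
                return false;                  // false: the IR was not modified
            }
        };
        } // end anonymous namespace

        char PrintInstructions::ID = 0;
        static RegisterPass<PrintInstructions>
            X("print-insts", "Print every instruction in each function", false, false);

    Built as a shared object, it would be run with something like opt -load mypass.so -print-insts test.bc, in the spirit of the pipeline sketched above.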
  • Declustering Spatial Databases on a Multi-Computer Architecture
    Declustering spatial databases on a multi-computer architecture. Nikos Koudas (Computer Systems Research Institute, University of Toronto), Christos Faloutsos (AT&T Bell Laboratories, Murray Hill, NJ), and Ibrahim Kamel (Matsushita Information Technology Laboratory).

    Abstract. We present a technique to decluster a spatial access method on a shared-nothing multi-computer architecture [DGS+90]. We propose a software architecture with the R-tree as the underlying spatial access method, with its non-leaf levels on the 'master-server' and its leaf nodes distributed across the servers. The major contribution of our work is the study of the optimal capacity of leaf nodes, or 'chunk size' (or 'striping unit'): we express the response time on range queries as a function of the 'chunk size', and we show how to optimize it. We implemented our method on a network of workstations, using a real dataset, and we compared the experimental and the theoretical results. The conclusion is that our formula for the response time is very accurate (the maximum relative error was 29%; the typical error was in the vicinity of 10-15%). We illustrate one of the possible ways to exploit such an accurate formula, by examining several 'what-if' scenarios. One major, practical conclusion is that a chunk size of 1 page gives either optimal or close to optimal results, for a wide range of the parameters.

    Keywords: parallel databases, spatial access methods, shared-nothing architecture.

    1 Introduction. One of the requirements for the database management systems (DBMSs) of the future is the ability to handle spatial data.
  • Chap01: Computer Abstractions and Technology
    CHAPTER 1: Computer Abstractions and Technology
    1.1 Introduction 3
    1.2 Eight Great Ideas in Computer Architecture 11
    1.3 Below Your Program 13
    1.4 Under the Covers 16
    1.5 Technologies for Building Processors and Memory 24
    1.6 Performance 28
    1.7 The Power Wall 40
    1.8 The Sea Change: The Switch from Uniprocessors to Multiprocessors 43
    1.9 Real Stuff: Benchmarking the Intel Core i7 46
    1.10 Fallacies and Pitfalls 49
    1.11 Concluding Remarks 52
    1.12 Historical Perspective and Further Reading 54
    1.13 Exercises 54
    CMPS290 Class Notes (Chap01), by Kuo-pao Yang

    1.1 Introduction. Modern computer technology requires professionals of every computing specialty to understand both hardware and software.

    Classes of Computing Applications and Their Characteristics:
    - Personal computers: a computer designed for use by an individual, usually incorporating a graphics display, a keyboard, and a mouse. Personal computers emphasize delivery of good performance to single users at low cost and usually execute third-party software. This class of computing drove the evolution of many computing technologies, which is only about 35 years old!
    - Server computers: a computer used for running larger programs for multiple users, often simultaneously, and typically accessed only via a network. Servers are built from the same basic technology as desktop computers, but provide for greater computing, storage, and input/output capacity.
    - Supercomputers: a class of computers with the highest performance and cost. Supercomputers consist of tens of thousands of processors and many terabytes of memory, and cost tens to hundreds of millions of dollars.
  • Arch2030: A Vision of Computer Architecture Research over the Next 15 Years
    Arch2030: A Vision of Computer Architecture Research over the Next 15 Years. Luis Ceze, Mark D. Hill, Thomas F. Wenisch. This material is based upon work supported by the National Science Foundation under Grant No. (1136993). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

    Contents: Summary; The Specialization Gap: Democratizing Hardware Design; The Cloud as an Abstraction for Architecture Innovation; Going Vertical; Architectures "Closer to Physics"; Machine Learning as a Key Workload; About this [...]
  • Compiling a Higher-Order Smart Contract Language to LLVM
    Compiling a Higher-Order Smart Contract Language to LLVM. Vaivaswatha Nagaraj, Jacob Johannsen, Anton Trunov, George Pîrlea, Amrit Kumar (Zilliqa Research); Ilya Sergey (Yale-NUS College and National University of Singapore).

    Abstract. Scilla is a higher-order polymorphic typed intermediate level language for implementing smart contracts. In this talk, we describe a Scilla compiler targeting LLVM, with a focus on mapping Scilla types, values, and its functional language constructs to LLVM-IR. The compiled LLVM-IR, when executed with LLVM's JIT framework, achieves a speedup of about 10x over the reference interpreter on a typical Scilla contract. This reduced latency is crucial in the setting of blockchains, where smart contracts are executed as parts of transactions, to achieve peak transactions processed per second. Experiments on the Ackermann [...]

    [Architecture diagram omitted: a blockchain smart contract module in C++ (BC) fetches and updates state variables and passes foo.scilla and messages to a JIT driver in C++ (JITD), which compiles foo.scilla to foo.ll and executes it against the Scilla run-time library in C++ (SRTL).]
  • Computer Architecture Techniques for Power-Efficiency
    Computer Architecture Techniques for Power-Efficiency. Synthesis Lectures on Computer Architecture. Editor: Mark D. Hill, University of Wisconsin, Madison. Synthesis Lectures on Computer Architecture publishes 50 to 150 page publications on topics pertaining to the science and art of designing, analyzing, selecting and interconnecting hardware components to create computers that meet functional, performance and cost goals.

    Titles in the series: Computer Architecture Techniques for Power-Efficiency, Stefanos Kaxiras and Margaret Martonosi, 2008; Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency, Kunle Olukotun, Lance Hammond, James Laudon, 2007; Transactional Memory, James R. Larus, Ravi Rajwar, 2007; Quantum Computing for Computer Architects, Tzvetan S. Metodi, Frederic T. Chong, 2006.

    Copyright © 2008 by Morgan & Claypool. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopy, recording, or any other) except for brief quotations in printed reviews, without the prior permission of the publisher. Computer Architecture Techniques for Power-Efficiency, Stefanos Kaxiras and Margaret Martonosi, www.morganclaypool.com. ISBN: 9781598292084 (paper); ISBN: 9781598292091 (ebook). DOI: 10.2200/S00119ED1V01Y200805CAC004. A Publication in the Morgan & Claypool Publishers [...]
  • Fuzzing Clang to Find ABI Bugs
    Fuzzing Clang to find ABI Bugs. David Majnemer.

    What's in an ABI? The size, alignment, etc. of types; layout of records, RTTI, virtual tables, etc.; the decoration of types, functions, etc. To generalize: anything that you need N > 1 compilers to agree upon.

    C++: a complicated language.

    union U { int a; int b; };
    int U::*x = &U::a;
    int U::*y = &U::b;

    Does 'x' equal 'y'?

    We've got a standard; how hard could it be? "[T]wo pointers to members compare equal if they would refer to the same member of the same most derived object or the same subobject if indirection with a hypothetical object of the associated class type were performed, otherwise they compare unequal." No ABI correctly implements this.

    Why does any of this matter? Data passed across ABI boundaries may be interpreted by another compiler. Unpredictable things may happen if two compilers disagree about how to interpret this data. Subtle bugs can be some of the worst bugs.

    Finding bugs isn't easy. ABI implementation techniques may collide with each other in unpredictable ways. One compiler permutes field order in structs if the alignment is 16 AND it has an empty virtual base AND it has at least one bitfield member AND ... Some ABIs are not documented. Even if they are, you can't always trust the documentation.

    What happens if we aren't proactive? Let users find our bugs for us. This can be demoralizing for users, eroding their trust. Altruistic; we must hope that the user will file the bug. At best, the user's time has been spent on something they probably didn't want to do. Let computers find the bugs: 1. [...]
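    The union example in the slide compiles as ordinary C++; a self-contained version is sketched below. The quoted standard wording says x and y should compare equal, and the slide's point is that ABIs get corners like this wrong, so the program simply prints whatever the compiler at hand decides rather than asserting an answer.

        #include <iostream>

        // Self-contained version of the pointer-to-member example from the slide.
        union U { int a; int b; };

        int main() {
            int U::*x = &U::a;
            int U::*y = &U::b;
            // Per the quoted standard text, x and y refer to members of the same
            // most derived object, so they should compare equal; whether a given
            // compiler/ABI agrees is exactly what the fuzzer probes.
            std::cout << "x == y : " << (x == y ? "true" : "false") << '\n';
            return 0;
        }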
  • COSC 6385 Computer Architecture - Multi-Processors (IV) Simultaneous Multi-Threading and Multi-Core Processors Edgar Gabriel Spring 2011
    COSC 6385 Computer Architecture - Multi-Processors (IV): Simultaneous multi-threading and multi-core processors. Edgar Gabriel, Spring 2011.

    Moore's Law: a long-term trend on the number of transistors per integrated circuit; the number of transistors doubles every ~18 months. (Source: http://en.wikipedia.org/wki/Images:Moores_law.svg)

    What do we do with that many transistors? Optimizing the execution of a single instruction stream through:
    - Pipelining: overlap the execution of multiple instructions. Example: all RISC architectures; Intel x86 underneath the hood.
    - Out-of-order execution: allow instructions to overtake each other in accordance with code dependencies (RAW, WAW, WAR). Example: all commercial processors (Intel, AMD, IBM, SUN).
    - Branch prediction and speculative execution: reduce the number of stall cycles due to unresolved branches. Example: (nearly) all commercial processors.
    - Multi-issue processors: allow multiple instructions to start execution per clock cycle. Superscalar (Intel x86, AMD, ...) vs. VLIW architectures.
    - VLIW/EPIC architectures: allow compilers to indicate independent instructions per issue packet. Example: Intel Itanium series.
    - Vector units: allow for the efficient expression and execution of vector operations. Example: SSE, SSE2, SSE3, SSE4 instructions.

    Limitations of optimizing a single instruction [...]
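    The excerpt's last bullet mentions vector units such as SSE. A minimal sketch of what "efficient expression and execution of vector operations" means in practice (standard SSE intrinsics, not taken from the course slides): one instruction adds four single-precision floats at once.

        #include <xmmintrin.h>   // SSE intrinsics
        #include <cstdio>

        int main() {
            alignas(16) float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
            alignas(16) float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
            alignas(16) float c[4];

            __m128 va = _mm_load_ps(a);            // load 4 floats into one register
            __m128 vb = _mm_load_ps(b);
            _mm_store_ps(c, _mm_add_ps(va, vb));   // one SIMD add covers all 4 lanes

            std::printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]);
            return 0;
        }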
  • Using ID TECH Universal SDK Library Files in a C++ Project
    Using ID TECH Universal SDK Library Files in a C++ Project

    Introduction. From time to time, customers who wish to use ID TECH's Universal SDK for Windows (which is .NET-based and comes with C# code examples) ask if it is possible to do development against the SDK solely in C++ (on Windows). The answer is yes. Universal SDK library files (DLLs) are COM-visible and ready to be accessed from C++ code. (SDK runtimes require the .NET Common Language Runtime, but your C++ binaries can still use the SDK.) Note that while the example shown in this document involves Microsoft's Visual Studio, it is also possible to use SDK libraries in C++ projects created in Eclipse or other IDEs.

    How to use the IDTechSDK.dll file in a C++ project:
    1. Create a Visual C++ project in Visual Studio 2015 (shown below, an MFC Application as an example).
    2. Change the properties of the Visual C++ project. Under the General tab, set Common Language Runtime Support under Target Platform to "Common Language Runtime Support (/clr)" under Windows.
    3. Under VC++ Directories, add the path to the C# .dll file(s) to Reference Directories.
    4. Under C/C++ > General, set Common Language Runtime Support to "Common Language Runtime Support (/clr)."
    5. Under C/C++ > Preprocessor, add _AFXDLL to Preprocessor Definitions.
    6. Under C/C++ > Code Generation, change Runtime Library to "Multi-threaded DLL (/MD)."
    7. Under Code Analysis > General, change Rule Set to "Microsoft Mixed (C++ /CLR) Recommended Rules."
    8. Use IDTechSDK.dll in your .cpp file.
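    Step 8 is where the managed DLL is actually consumed from C++. Below is a minimal console-style sketch of what such a .cpp file could look like under the /clr settings above; the #using directive and the System namespace are standard C++/CLI, but the IDTechSDK class and method names in the comments are hypothetical placeholders rather than the SDK's real API (consult the SDK's C# examples for the actual names).

        // Compiled with /clr as configured in steps 2-7.
        #using "IDTechSDK.dll"       // reference the managed SDK assembly

        using namespace System;

        int main()
        {
            // Hypothetical call into the SDK; the real class/method names differ.
            // IDTechSDK::Device^ dev = gcnew IDTechSDK::Device();
            // dev->Connect();
            Console::WriteLine(L"IDTechSDK.dll loaded through the Common Language Runtime.");
            return 0;
        }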
  • Using ld, the GNU Linker
    Using ld, the GNU linker. ld version 2, January 1994. Steve Chamberlain, Cygnus Support. Edited by Jeffrey Osier. Copyright © 1991, 92, 93, 94, 95, 96, 97, 1998 Free Software Foundation, Inc. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided also that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions.

    Chapter 1: Overview. ld combines a number of object and archive files, relocates their data and ties up symbol references. Usually the last step in compiling a program is to run ld. ld accepts Linker Command Language files written in a superset of AT&T's Link Editor Command Language syntax, to provide explicit and total control over the linking process. This version of ld uses the general purpose BFD libraries to operate on object files. This allows ld to read, combine, and write object files in many different formats, for example, COFF or a.out. Different formats may be linked together to produce any available kind of object file. See Chapter 5 [BFD], page 47, for more information. Aside from its flexibility, the GNU linker is more helpful than other linkers in providing diagnostic information.
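    To make "ties up symbol references" concrete, here is a tiny two-file sketch (not from the manual): main.o carries an unresolved reference to greet, greet.o carries its definition, and the link step, normally invoked through the compiler driver, is what joins them.

        // greet.cpp -- compiled separately: g++ -c greet.cpp -o greet.o
        // #include <cstdio>
        // void greet() { std::puts("hello from greet.o"); }

        // main.cpp -- compiled with: g++ -c main.cpp -o main.o
        // Final link (the driver runs ld underneath): g++ main.o greet.o -o hello
        extern void greet();   // unresolved in main.o; ld binds it to greet.o's definition

        int main() {
            greet();
            return 0;
        }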