Altivec™ Technology Programming Environments Manual
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Vxworks Architecture Supplement, 6.2
VxWorks Architecture Supplement VxWorks® ARCHITECTURE SUPPLEMENT 6.2 Copyright © 2005 Wind River Systems, Inc. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means without the prior written permission of Wind River Systems, Inc. Wind River, the Wind River logo, Tornado, and VxWorks are registered trademarks of Wind River Systems, Inc. Any third-party trademarks referenced are the property of their respective owners. For further information regarding Wind River trademarks, please see: http://www.windriver.com/company/terms/trademark.html This product may include software licensed to Wind River by third parties. Relevant notices (if any) are provided in your product installation at the following location: installDir/product_name/3rd_party_licensor_notice.pdf. Wind River may refer to third-party documentation by listing publications or providing links to third-party Web sites for informational purposes. Wind River accepts no responsibility for the information provided in such third-party documentation. Corporate Headquarters Wind River Systems, Inc. 500 Wind River Way Alameda, CA 94501-1153 U.S.A. toll free (U.S.): (800) 545-WIND telephone: (510) 748-4100 facsimile: (510) 749-2010 For additional contact information, please visit the Wind River URL: http://www.windriver.com For information on how to contact Customer Support, please visit the following URL: http://www.windriver.com/support VxWorks Architecture Supplement, 6.2 11 Oct 05 Part #: DOC-15660-ND-00 Contents 1 Introduction -
Data Representation
Chapter 4 – Data Representation The focus of this chapter is the representation of data in a digital computer. We begin with a review of several number systems (decimal, binary, octal, and hexadecimal) and a discussion of methods for conversion between the systems. The two most important methods are conversion from decimal to binary and binary to decimal. The conversions between binary and each of octal and hexadecimal are quite simple. Other conversions, such as hexadecimal to decimal, are often best done via binary. After discussion of conversion between bases, we discuss the methods used to store integers in a digital computer: one’s complement and two’s complement arithmetic. This includes a characterization of the range of integers that can be stored given the number of bits allocated to store an integer. The most common integer storage formats are 16, 32, and 64 bits. The next topic for this chapter is the storage of real (floating point) numbers. This discussion will mention the standard put forward by the Institute of Electrical and Electronic Engineers, the IEEE Standard 754 for floating point numbers, but will focus on the base–16 format used by IBM Mainframes. The chapter closes with a discussion of codes for storing characters, focusing on the EBCDIC system used on IBM mainframes. Number Systems There are four number systems of possible interest to the computer programmer: decimal, binary, octal, and hexadecimal. Each system is characterized by its base or radix, always given in decimal, and the set of permissible digits. Note that the hexadecimal numbering system calls for more than ten digits, so we use the first six letters of the alphabet. -
Bits and Bytes and Words, Oh My!
Bits and Bytes and Words, Oh My! Let’s get down to the real ni8y gri8y • CIS 211 In principle you already know ... Computer memory is binary (base 2) Everything: instrucEons, numbers, strings, ... memory is just one big array of binary numbers If I ask you what 01101011101010 represents, the only correct answer is “it depends” 0 = False = 0 volts; 1 = True = 5 volts (maybe) • CIS 211 CPU and Memory Main Memory CPU Address ALU Reg Reg Reg Values Reg Reg Reg Reg Reg Reg Other buses • CIS 211 CPU and Memory (simplified*) Main Memory CPU Address ALU Reg Reg Reg Values Reg Reg Reg Reg Reg Reg Other buses • CIS 211 (*with several useful lies) CPU and Memory CPU places a memory address on the address bus CPU may place a value on the data base and assert a “write” line (wire) or assert a “read” line and read a value from the data bus Main Memory CPU Address ALU Reg Reg Reg Values Reg Reg Reg Reg Reg Reg Other buses • CIS 211 A few terms: • A bit is a single binary digit • A byte is 8 binary digits • Most computer memory is “byte addressed”; a byte is the smallest addressable unit • What’s half a byte? (4 bits)? • A “word” is a sequence of bytes • Usually 4 bytes (32 bits) or 8 bytes (64 bits) depending on the computer (see next slide) 1 = True = 5 volts ; 0 = False = 0 volts (or 3.3 volts) • CIS 211 Typical byte-addressed memory 15 0 1 0 1 0 0 0 1 with 32-bit words 14 0 0 1 0 0 1 0 0 13 1 1 0 0 1 1 1 0 12 0 0 0 1 1 0 0 0 11 0 1 0 0 1 1 0 0 10 0 0 0 0 0 0 0 0 9 0 0 0 1 1 0 0 0 8 0 0 0 0 0 1 1 0 7 0 0 0 0 0 0 0 0 6 0 0 1 1 1 0 0 1 5 0 1 1 0 0 1 1 -
SIMD Extensions
SIMD Extensions PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 12 May 2012 17:14:46 UTC Contents Articles SIMD 1 MMX (instruction set) 6 3DNow! 8 Streaming SIMD Extensions 12 SSE2 16 SSE3 18 SSSE3 20 SSE4 22 SSE5 26 Advanced Vector Extensions 28 CVT16 instruction set 31 XOP instruction set 31 References Article Sources and Contributors 33 Image Sources, Licenses and Contributors 34 Article Licenses License 35 SIMD 1 SIMD Single instruction Multiple instruction Single data SISD MISD Multiple data SIMD MIMD Single instruction, multiple data (SIMD), is a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data simultaneously. Thus, such machines exploit data level parallelism. History The first use of SIMD instructions was in vector supercomputers of the early 1970s such as the CDC Star-100 and the Texas Instruments ASC, which could operate on a vector of data with a single instruction. Vector processing was especially popularized by Cray in the 1970s and 1980s. Vector-processing architectures are now considered separate from SIMD machines, based on the fact that vector machines processed the vectors one word at a time through pipelined processors (though still based on a single instruction), whereas modern SIMD machines process all elements of the vector simultaneously.[1] The first era of modern SIMD machines was characterized by massively parallel processing-style supercomputers such as the Thinking Machines CM-1 and CM-2. These machines had many limited-functionality processors that would work in parallel. -
Endian: from the Ground up a Coordinated Approach
WHITEPAPER Endian: From the Ground Up A Coordinated Approach By Kevin Johnston Senior Staff Engineer, Verilab July 2008 © 2008 Verilab, Inc. 7320 N MOPAC Expressway | Suite 203 | Austin, TX 78731-2309 | 512.372.8367 | www.verilab.com WHITEPAPER INTRODUCTION WHat DOES ENDIAN MEAN? Data in Imagine XYZ Corp finally receives first silicon for the main Endian relates the significance order of symbols to the computers chip for its new camera phone. All initial testing proceeds position order of symbols in any representation of any flawlessly until they try an image capture. The display is kind of data, if significance is position-dependent in that regularly completely garbled. representation. undergoes Of course there are many possible causes, and the debug Let’s take a specific type of data, and a specific form of dozens if not team analyzes code traces, packet traces, memory dumps. representation that possesses position-dependent signifi- There is no problem with the code. There is no problem cance: A digit sequence representing a numeric value, like hundreds of with data transport. The problem is eventually tracked “5896”. Each digit position has significance relative to all down to the data format. other digit positions. transformations The development team ran many, many pre-silicon simula- I’m using the word “digit” in the generalized sense of an between tions of the system to check datapath integrity, bandwidth, arbitrary radix, not necessarily decimal. Decimal and a few producer and error correction. The verification effort checked that all other specific radixes happen to be particularly useful for the data submitted at the camera port eventually emerged illustration simply due to their familiarity, but all of the consumer. -
Binary Numbers 8'S Colum 8'S Colum 2'S Colum 1'S 4'S Colum 4'S N N N N
Codes and number systems Introduction to Computer Yung-Yu Chuang with slides by Nisan & Schocken (www.nand2tetris.org) and Harris & Harris (DDCA) Coding • Assume that you want to communicate with your friend with a flashlight in a night, what will you do? light painting? What’s the problem? Solution #1 • A: 1 blink • B: 2 blibliknks • C: 3 blinks : • Z: 26 blinks Wha t’s the problem ? • How are you? = 131 blinks Solution #2: Morse code Hello Lookup • It is easy to translate into Morse code than reverse. Why? Lookup Lookup Useful for checking the correctness/ redddundency Braille Braille What’s common in these codes? • They are both binary codes. Binary representations • Electronic Implementation – Easy to store with bitblbistable elemen ts – Reliably transmitted on noisy and inaccurate wires 0 1 0 3.3V 2.8V 0.5V 0.0V Number systems <13> 1 Chapter = = Systems 2 1's column 10 1's column 10's column 2's column 100's column 4's column 1000's column 8's column 5374 1101 Number • numbers Decimal • Binary numbers Number Systems • Decimal numbers 1000's col 10's colum 1's column 100's colu u m n mn n 3 2 1 0 537410 = 5 ? 10 + 3 ? 10 + 7 ? 10 + 4 ? 10 five three seven four thousands hundreds tens ones • Binary numbers 8's colum 2's colum 1's colum 4's colum n n n n 3 2 1 0 11012 = 1 ? 2 + 1 ? 2 + 0 ? 2 + 1 ? 2 = 1310 one one no one eight four two one Chapter 1 <14> Binary numbers • Digits are 1 and 0 (a binary dig it is calle d a bit) 1 = true 0 = false • MSB –most significant bit • LSB –least significant bit MSB LSB • Bit numbering: 1 0 1 1 0 -
TCG ACPI Specification
TCG ACPI Specification Family “1.2” and “2.0” Version 1.3 Revision 8 August 5, 2021 PUBLISHED Contact: [email protected] TCG PUBLISHED Copyright © TCG 2021 TCG TCG ACPI Specification Disclaimers, Notices, and License Terms THIS SPECIFICATION IS PROVIDED "AS IS" WITH NO WARRANTIES WHATSOEVER, INCLUDING ANY WARRANTY OF MERCHANTABILITY, NONINFRINGEMENT, FITNESS FOR ANY PARTICULAR PURPOSE, OR ANY WARRANTY OTHERWISE ARISING OUT OF ANY PROPOSAL, SPECIFICATION OR SAMPLE. Without limitation, TCG disclaims all liability, including liability for infringement of any proprietary rights, relating to use of information in this specification and to the implementation of this specification, and TCG disclaims all liability for cost of procurement of substitute goods or services, lost profits, loss of use, loss of data or any incidental, consequential, direct, indirect, or special damages, whether under contract, tort, warranty or otherwise, arising in any way out of use or reliance upon this specification or any information herein. This document is copyrighted by Trusted Computing Group (TCG), and no license, express or implied, is granted herein other than as follows: You may not copy or reproduce the document or distribute it to others without written permission from TCG, except that you may freely do so for the purposes of (a) examining or implementing TCG specifications or (b) developing, testing, or promoting information technology standards and best practices, so long as you distribute the document with these disclaimers, notices, and license terms. Contact the Trusted Computing Group at www.trustedcomputinggroup.org for information on specification licensing through membership agreements. Any marks and brands contained herein are the property of their respective owners. -
Numbering Systems
12 Digital Principles Switching Theory R HAPTE 1 C NUMBERING SYSTEMS 1.0 INTRODUCTION Inside today’s computers, data is represented as 1’s and 0’s. These 1’s and 0’s might be stored magnetically on a disk, or as a state in a transistor, core, or vacuum tube. To perform useful operations on these 1’s and 0’s one have to organize them together into patterns that make up codes. Modern digital systems do not represent numeric values using the decimal system. Instead, they typically use a binary or two’s complement numbering system. To understand the digital system arithmetic, one must understand how digital systems represent numbers. This chapter discusses several important concepts including the binary, octal and hexadeci- mal numbering systems, binary data organization (bits, nibbles, bytes, words, and double words), signed and unsigned numbering systems. If one is already familiar with these terms he should at least skim this material. 1.1 A REVIEW OF THE DECIMAL SYSTEM People have been using the decimal (base 10) numbering system for so long that they probably take it for granted. When one see a number like “123”, he don’t think about the value 123; rather, he generate a mental image of how many items this value represents. In reality, however, the number 123 represents: 1*102 + 2*101 + 3*100 OR 100 + 20 + 3 Each digit appearing to the left of the decimal point represents a value between zero and nine times an increasing power of ten. Digits appearing to the right of the decimal point represent a value between zero and nine times an increasing negative power of ten. -
An Empirical Analysis of the Ipv4 Number Market
View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Illinois Digital Environment for Access to Learning and Scholarship Repository Buying numbers: An Empirical Analysis of the IPv4 Number Market Milton L. Mueller Brenden Kuerbis Syracuse University Syracuse University [email protected] and University of Toronto [email protected] Abstract The emergence of a trading market for previously allocated Internet number blocks is an important change in Internet governance. Almost all of the Internet’s 32-bit address space has been given out, and we have not migrated to a new internet protocol, IPv6, with a larger address space. IP addresses are therefore being commoditized, as organizations with surplus numbers sell address blocks to organizations that want more. Though controversial, we know little about this phenomenon. This paper quantifies the number of address blocks that have been traded as of July 2012 and analyzes the scant information that exists about the pricing of these resources, discovering the emergence of a billion dollar market. The paper then shows how this factual information relates to key policy debates, in particularly the role of needs assessment and property rights in IPv4 number blocks. Keywords: Internet governance, IP address markets, IPv4 scarcity, IPv6 migration, property rights Introduction One of the most important but least-studied aspects of Internet policy is the emergence of a trading market for previously allocated Internet number blocks. Without unique Internet protocol numbers for the networks and devices attached, the Internet simply doesn’t work. The original Internet Protocol standard, known as IPv4, specified a 32-bit numbering space, which provided slightly less than 4 billion unique numbers that could be used as addresses (Postel, 1981). -
Multi-Platform Auto-Vectorization
H-0236 (H0512-002) November 30, 2005 Computer Science IBM Research Report Multi-Platform Auto-Vectorization Dorit Naishlos, Richard Henderson* IBM Research Division Haifa Research Laboratory Mt. Carmel 31905 Haifa, Israel *Red Hat Research Division Almaden - Austin - Beijing - Haifa - India - T. J. Watson - Tokyo - Zurich LIMITED DISTRIBUTION NOTICE: This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. I thas been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g ,. payment of royalties). Copies may be requested from IBM T. J. Watson Research Center , P. O. Box 218, Yorktown Heights, NY 10598 USA (email: [email protected]). Some reports are available on the internet at http://domino.watson.ibm.com/library/CyberDig.nsf/home . Multi-Platform Auto-Vectorization Dorit Naishlos Richard Henderson IBM Haifa Labs Red Hat [email protected] [email protected] Abstract. The recent proliferation of the Single Instruction Multiple Data (SIMD) model has lead to a wide variety of implementations. These have been incorporated into many platforms, from gaming machines and em- bedded DSPs to general purpose architectures. In this paper we present an automatic vectorizer as implemented in GCC - the most multi-targetable compiler available today. We discuss the considerations that are involved in developing a multi-platform vectorization technology, and demonstrate how our vectorization scheme is suited to a variety of SIMD architectures. -
A Bibliography of Publications in IEEE Micro
A Bibliography of Publications in IEEE Micro Nelson H. F. Beebe University of Utah Department of Mathematics, 110 LCB 155 S 1400 E RM 233 Salt Lake City, UT 84112-0090 USA Tel: +1 801 581 5254 FAX: +1 801 581 4148 E-mail: [email protected], [email protected], [email protected] (Internet) WWW URL: http://www.math.utah.edu/~beebe/ 16 September 2021 Version 2.108 Title word cross-reference -Core [MAT+18]. -Cubes [YW94]. -D [ASX19, BWMS19, DDG+19, Joh19c, PZB+19, ZSS+19]. -nm [ABG+16, KBN16, TKI+14]. #1 [Kah93i]. 0.18-Micron [HBd+99]. 0.9-micron + [Ano02d]. 000-fps [KII09]. 000-Processor $1 [Ano17-58, Ano17-59]. 12 [MAT 18]. 16 + + [ABG+16]. 2 [DTH+95]. 21=2 [Ste00a]. 28 [BSP 17]. 024-Core [JJK 11]. [KBN16]. 3 [ASX19, Alt14e, Ano96o, + AOYS95, BWMS19, CMAS11, DDG+19, 1 [Ano98s, BH15, Bre10, PFC 02a, Ste02a, + + Ste14a]. 1-GHz [Ano98s]. 1-terabits DFG 13, Joh19c, LXB07, LX10, MKT 13, + MAS+07, PMM15, PZB+19, SYW+14, [MIM 97]. 10 [Loc03]. 10-Gigabit SCSR93, VPV12, WLF+08, ZSS+19]. 60 [Gad07, HcF04]. 100 [TKI+14]. < [BMM15]. > [BMM15]. 2 [Kir84a, Pat84, PSW91, YSMH91, ZACM14]. [WHCK18]. 3 [KBW95]. II [BAH+05]. ∆ 100-Mops [PSW91]. 1000 [ES84]. 11- + [Lyl04]. 11/780 [Abr83]. 115 [JBF94]. [MKG 20]. k [Eng00j]. µ + [AT93, Dia95c, TS95]. N [YW94]. x 11FO4 [ASD 05]. 12 [And82a]. [DTB01, Dur96, SS05]. 12-DSP [Dur96]. 1284 [Dia94b]. 1284-1994 [Dia94b]. 13 * [CCD+82]. [KW02]. 1394 [SB00]. 1394-1955 [Dia96d]. 1 2 14 [WD03]. 15 [FD04]. 15-Billion-Dollar [KR19a]. -
Compiler-Based Data Prefetching and Streaming Non-Temporal Store Generation for the Intel R Xeon Phitm Coprocessor
Compiler-based Data Prefetching and Streaming Non-temporal Store Generation for the Intel R Xeon PhiTM Coprocessor Rakesh Krishnaiyer†, Emre K¨ult¨ursay†‡, Pankaj Chawla†, Serguei Preis†, Anatoly Zvezdin†, and Hideki Saito† † Intel Corporation ‡ The Pennsylvania State University TM Abstract—The Intel R Xeon Phi coprocessor has software used, and generating them. In this paper, we: TM prefetching instructions to hide memory latencies and spe- • Present how the Intel R Xeon Phi coprocessor soft- cial store instructions to save bandwidth on streaming non- ware prefetch and non-temporal streaming store instructions temporal store operations. In this work, we provide details on R compiler-based generation of these instructions and evaluate are generated by the Intel Composer XE 2013, TM their impact on the performance of the Intel R Xeon Phi • Evaluate the impact of these mechanisms on the overall coprocessor using a wide range of parallel applications with performance of the coprocessor using a variety of parallel different characteristics. Our results show that the Intel R applications with different characteristics. Composer XE 2013 compiler can make effective use of these Our experimental results demonstrate that (i) a large mechanisms to achieve significant performance improvements. number of applications benefit significantly from software prefetching instructions (on top of hardware prefetching) that I. INTRODUCTION are generated automatically by the compiler for the Intel R TM TM The Intel R Xeon Phi coprocessor based on Intel R Xeon Phi coprocessor, (ii) some benchmarks can further Many Integrated Core Architecture (Intel R MIC Architec- improve when compiler options that control prefetching ture) is a many-core processor with long vector (SIMD) units behavior are used (e.g., to enable indirect prefetching), targeted for highly parallel workloads in the High Perfor- and (iii) many applications benefit from compiler generated mance Computing (HPC) segment.