Simplifying Many-Core-Based Heterogeneous SoC Programming With Offload Directives

Andrea Marongiu, Member, IEEE, Alessandro Capotondi, Student Member, IEEE, Giuseppe Tagliavini, and Luca Benini, Fellow, IEEE

IEEE Transactions on Industrial Informatics, vol. 11, no. 4, August 2015. Digital Object Identifier 10.1109/TII.2015.2449994. Paper no. TII-14-0489.

Manuscript received May 07, 2014; revised February 05, 2015; accepted June 08, 2015. Date of publication June 24, 2015; date of current version July 31, 2015. This work was supported by EU projects ERC-AdG MultiTherman (291125) and FP7 P-SOCRATES (611016). A. Marongiu and L. Benini are with the Department of Electrical, Electronic, and Information Engineering "Guglielmo Marconi" (DEI), University of Bologna, Bologna 40136, Italy, and also with the Department of Information Technology and Electrical Engineering, Swiss Federal Institute of Technology Zurich (ETH Zurich), Zurich 8092, Switzerland (e-mail: [email protected]; [email protected]). A. Capotondi and G. Tagliavini are with the Department of Electrical, Electronic, and Information Engineering "Guglielmo Marconi" (DEI), University of Bologna, Bologna 40136, Italy (e-mail: [email protected]; [email protected]).

Abstract—Multiprocessor systems-on-chip (MPSoC) are evolving into heterogeneous architectures based on one host processor plus many-core accelerators. While heterogeneous SoCs promise higher performance/watt, they are programmed at the cost of major code rewrites with low-level programming abstractions (e.g., OpenCL). We present a programming model based on OpenMP, with additional directives to program the accelerator from a single host program. As a test case, we evaluate an implementation of this programming model for the STMicroelectronics STHORM development board. We obtain near-ideal throughput for most benchmarks, very close performance to hand-optimized OpenCL codes at a significantly lower programming complexity, and up to 30× speedup versus host execution time.

Index Terms—Heterogeneous systems-on-chip (SoC), many-core, nonuniform memory access (NUMA), OpenMP.

I. INTRODUCTION

THE EVER-INCREASING demand for computational power within tight energy budgets has recently led to radical evolutions of multiprocessor systems-on-chip (MPSoC). Two design paradigms have proven particularly effective in increasing performance and energy efficiency of such systems: 1) architectural heterogeneity; and 2) many-core processors. Power-aware design [1], [2] and mapping [3], [4] for heterogeneous MPSoCs are being widely studied by the research community, and many-core-based heterogeneous MPSoCs are now a reality [5]–[8].

A common embodiment of architectural heterogeneity is a template where a powerful general-purpose processor (usually called the host), featuring a sophisticated cache hierarchy and a full-fledged operating system, is coupled to programmable many-core accelerators composed of several tens of simple processors, where highly parallel computation kernels of an application can be offloaded to improve overall performance/watt. Unfortunately, these advantages are traded off for an increased programming complexity: an extensive and time-consuming rewrite of applications is required, using specialized programming paradigms. For example, in the general-purpose and high-performance computing (HPC) domains, Graphics Processing Unit (GPU)-based systems are programmed with OpenCL.¹ OpenCL aims at providing a standardized way of programming such accelerators; however, it offers a very low-level programming style. Programmers must write and compile separate programs for the host system and for the accelerator. Data transfers to/from the many-core and synchronization must also be manually orchestrated. In addition, OpenCL is not performance-portable: specific optimizations have to be recoded for the accelerator at hand. This generates the need for a higher-level programming style [9], [10]. OpenMP [11] has included in its latest specification extensions to manage accelerators, and Xeon Phi coprocessors offer all standard programming models [OpenMP, Pthreads, message passing interface (MPI)] [12], where the accelerator appears like a symmetric multiprocessor (SMP) on a single chip.

¹[Online]. Available: http://www.khronos.org/opencl

In the embedded domain, such proposals are still lacking, but there is a clear trend toward designing embedded SoCs in the same way as is happening in the HPC domain [13], which will eventually call for the same programming solutions.

In this paper, we present a programming model, compiler, and runtime system for a heterogeneous embedded platform template featuring a host system plus a many-core accelerator. The many-core relies on a multicluster design, where each cluster features several simple cores sharing L1 scratchpad memory. Intercluster communication is subject to nonuniform memory access (NUMA) effects. The programming model consists of an extended OpenMP, where additional directives allow the accelerator to be programmed efficiently from a single host program, rather than writing separate host and accelerator programs, and the workload to be distributed among clusters in a NUMA-aware manner, thus improving performance.
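To make the single-source offload style concrete, the sketch below shows what directive-based offload of a parallel loop looks like. It is written in standard OpenMP 4.0 target/teams syntax purely for illustration; the extensions proposed in this paper predate and only partly overlap with that specification, so the directive and clause names here should not be read as the authors' actual API.

    /* Illustration only: standard OpenMP 4.0 accelerator directives, not the
     * extensions proposed in this paper. The application stays a single host
     * program; the compiler outlines the marked region for the accelerator. */
    #include <stdio.h>

    #define N 1024

    int main(void)
    {
        static float a[N], b[N], c[N];
        for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

        /* Offload the region and spread the loop across the accelerator's
         * processing elements; any data movement is driven by the map clauses. */
        #pragma omp target map(to: a, b) map(from: c)
        #pragma omp teams distribute parallel for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[10] = %f\n", c[10]);
        return 0;
    }

An equivalent OpenCL version would require a separate kernel source, explicit buffer and command-queue management, and manual synchronization on the host; that boilerplate is exactly what the directive-based approach hides behind the compiler and runtime.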
The proposed OpenMP extensions are only partly in line with the latest OpenMP v4.0 specifications. The latter are, in our view, too tailored to the characteristics of today's GPUs, as they emphasize data-level accelerator parallelism (modern GPUs being conceived for that) and copy-based host-to-accelerator communication (modern GPUs being based on private-memory designs). Our focus is on many-core accelerators which efficiently support more types of parallelism (e.g., tasks) and leverage shared-memory communication with the host, which is where the heterogeneous system architecture (HSA) and all GPU roadmaps are heading in the longer term. We discuss how to provide efficient communication with the host on top of shared memory by transparently relying on pointer exchange when virtual memory paging is natively supported by the many-core, and by leveraging software virtual address translation plus copies into contiguous shared memory (to overcome paging issues) when such support is lacking. We also comment on how copies can be used to implement offload on top of a private accelerator memory space. To achieve our goals, we propose minimal extensions to the previous OpenMP v3.1, emphasizing ease of programming.
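The two data-sharing schemes just described can be summarized with a small sketch. The helper names below (accel_offload, contig_shared_alloc, virt_to_accel) are hypothetical, introduced only to illustrate the contrast between pointer exchange and copy-based sharing; they are not the STHORM runtime API nor the directives proposed in the paper.

    /* Conceptual sketch with hypothetical runtime calls; not a real API. */
    #include <stddef.h>
    #include <string.h>

    typedef void (*kernel_fn)(void *arg);

    /* Hypothetical accelerator runtime entry points (assumptions): */
    extern void  accel_offload(kernel_fn k, void *arg, size_t bytes);
    extern void *contig_shared_alloc(size_t bytes);
    extern void  contig_shared_free(void *p);
    extern void *virt_to_accel(void *host_virt);

    /* Case 1: the accelerator can walk host page tables (paging supported).
     * Offload reduces to exchanging a pointer; no data is copied. */
    void offload_shared(kernel_fn k, float *data, size_t n)
    {
        accel_offload(k, data, n * sizeof(float));
    }

    /* Case 2: no paging support. Data is staged in a physically contiguous
     * shared buffer and its address is translated in software before launch. */
    void offload_copy(kernel_fn k, float *data, size_t n)
    {
        size_t bytes = n * sizeof(float);
        void *buf = contig_shared_alloc(bytes);       /* accelerator-visible */
        memcpy(buf, data, bytes);                     /* copy in             */
        accel_offload(k, virt_to_accel(buf), bytes);  /* translated address  */
        memcpy(data, buf, bytes);                     /* copy results back   */
        contig_shared_free(buf);
    }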
For validation, we present a concrete embodiment of our proposal targeting the first STMicroelectronics STHORM development board [5]. This board couples an ARM9 host system and main memory (based on the Zynq7 device) to a 69-core STHORM chip. We present a multi-ISA compilation toolchain that hides the whole process of outlining an accelerator program from the host application, compiling it for the STHORM platform, offloading the execution binary, and implementing data sharing between the host and the accelerator. Two separate OpenMP runtime systems are developed, one for the host and one for the STHORM accelerator.

Our experiments thoroughly assess the performance of the proposed programming framework, considering six representative benchmarks from the computer vision, image processing, and linear algebra domains. The evaluation is articulated in three parts.
1) We relate the achieved throughput to each benchmark's operational intensity using the Roofline methodology [14] (an illustrative sketch of the Roofline bound is given at the end of this excerpt). Here, we observe near-ideal throughput for most benchmarks.
2) We compare the performance of our OpenMP to OpenCL, natively supported by the STHORM platform, achieving very close performance to hand-optimized OpenCL codes at a significantly lower programming complexity.
3) We measure the speedup of our OpenMP versus sequential execution on the ARM host, which exhibits peaks of 30×.

This paper is organized as follows. In Section II, we describe the target heterogeneous embedded …

Fig. 1. Heterogeneous embedded SoC template.

… where critical computation kernels of an application can be offloaded to improve overall performance/watt [5]–[8], [15]. The type of many-core accelerator that we consider here has a few key characteristics.
1) It leverages a multicluster design to overcome scalability limitations [5], [6]. Processors within a cluster are tightly coupled to local L1 scratchpad memory, which implies low-latency and high-bandwidth communication. Globally, the many-core accelerator leverages a partitioned global address space (PGAS). Every remote memory can be directly accessed by each processor, but intercluster communication travels through a network-on-chip (NoC) and is subject to NUMA latency and bandwidth.
2) The processors within a cluster are not GPU-like data-parallel cores with common fetch/decode phases, which imply a performance loss when parallel cores fall out of lock-step execution. The accelerator processors considered here are simple, independent RISC cores, perfectly suited to execute both single instruction, multiple data (SIMD) and multiple instruction, multiple data (MIMD) types of parallelism. This allows to efficiently support a …
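As a reference for the evaluation overview above, the Roofline methodology bounds a kernel's attainable throughput by the minimum of the platform's peak compute throughput and the product of the kernel's operational intensity with the peak memory bandwidth. The sketch below illustrates that bound with made-up platform numbers; it does not use the STHORM figures from the paper.

    /* Generic Roofline bound; the peak numbers here are illustrative only. */
    #include <stdio.h>

    /* Attainable performance (GFLOP/s) for a kernel with the given
     * operational intensity (FLOPs per byte moved to/from memory). */
    static double roofline_gflops(double oi_flop_per_byte,
                                  double peak_gflops,
                                  double peak_bw_gbytes_s)
    {
        double memory_bound = oi_flop_per_byte * peak_bw_gbytes_s;
        return memory_bound < peak_gflops ? memory_bound : peak_gflops;
    }

    int main(void)
    {
        const double peak_gflops = 10.0; /* assumed accelerator peak        */
        const double peak_bw     = 4.0;  /* assumed memory bandwidth, GB/s  */

        /* Example: a kernel doing 4 FLOPs per 16 bytes of traffic -> OI = 0.25 */
        double oi = 0.25;
        printf("OI = %.2f FLOP/B -> bound = %.2f GFLOP/s\n",
               oi, roofline_gflops(oi, peak_gflops, peak_bw));
        return 0;
    }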