Six-Core AMD Opteron Processor Istanbul Paul G. Howard, Ph.D
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
SIMD Extensions
SIMD Extensions PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 12 May 2012 17:14:46 UTC Contents Articles SIMD 1 MMX (instruction set) 6 3DNow! 8 Streaming SIMD Extensions 12 SSE2 16 SSE3 18 SSSE3 20 SSE4 22 SSE5 26 Advanced Vector Extensions 28 CVT16 instruction set 31 XOP instruction set 31 References Article Sources and Contributors 33 Image Sources, Licenses and Contributors 34 Article Licenses License 35 SIMD 1 SIMD Single instruction Multiple instruction Single data SISD MISD Multiple data SIMD MIMD Single instruction, multiple data (SIMD), is a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data simultaneously. Thus, such machines exploit data level parallelism. History The first use of SIMD instructions was in vector supercomputers of the early 1970s such as the CDC Star-100 and the Texas Instruments ASC, which could operate on a vector of data with a single instruction. Vector processing was especially popularized by Cray in the 1970s and 1980s. Vector-processing architectures are now considered separate from SIMD machines, based on the fact that vector machines processed the vectors one word at a time through pipelined processors (though still based on a single instruction), whereas modern SIMD machines process all elements of the vector simultaneously.[1] The first era of modern SIMD machines was characterized by massively parallel processing-style supercomputers such as the Thinking Machines CM-1 and CM-2. These machines had many limited-functionality processors that would work in parallel. -
RISC-V Vector Extension Webinar I
RISC-V Vector Extension Webinar I July 13th, 2021 Thang Tran, Ph.D. Principal Engineer Who WeAndes Are Technology Corporation CPU Pure-play RISC-V Founding Major Open-Source CPU IP Vendor Premier Member Contributor/Maintainer RISC-V Ambassador 16-year-old Running Task Groups Public Company TSC Vice Chair Director of the Board Quick Facts + NL 100 years 80% FR BJ KR USA JP IL SH CPU Experience in Silicon Valley R&D SZ TW (HQ) 200+ 20K+ 7B+ Licensees AndeSight IDE Total shipment of Andes- installations Embedded™ SoC Confidential Taking RISC-V® Mainstream 2 Andes Technology Corporation Overview Andes Highlights •Founded in March 2005 in Hsinchu Science Park, Taiwan, ROC. •World class 32/64-bit RISC-V CPU IP public company •Over 200 people; 80% are engineers; R&D team consisting of Silicon Valley veterans •TSMC OIP Award “Partner of the Year” for New IP (2015) •A Premier founding member of RISC-V Foundation •2018 MCU Innovation Award by China Electronic News: AndesCore™ N25F/NX25F •ASPENCORE WEAA 2020 Outstanding Product Performance of the Year: AndesCore™ NX27V •2020 HsinChu Science Park Innovation Award: AndesCore™ NX27V Andes Mission • Trusted Computing Expert and World No.1 RISC-V IP Provider Emerging Opportunities • AIoT, 5G/Networking, Storage and Cloud computing 3 V5 Adoptions: From MCU to Datacenters • Edge to Cloud: − ADAS − Datacenter AI accelerators − AIoT − SSD: enterprise (& consumer) − Blockchain − 5G macro/small cells − FPGA − MCU − Multimedia − Security − Wireless (BT/WiFi) 5G Macro • 1 to 1000+ core • 40nm to 5nm • Many in AI Copyright© 2020 Andes Technology Corp. 4 Webinar I - Agenda • Andes overview • Vector technology background – SIMD/vector concept – Vector processor basic • RISC-V V extension ISA – Basic – CSR – Memory operations – Compute instructions • Sample codes – Matrix multiplication – Loads with RVV versions 0.8 and 1.0 • AndesCore™ NX27V introduction • Summary Copyright© 2020 Andes Technology Corp. -
Intermediate X86 Part 2
Intermediate x86 Part 2 Xeno Kovah – 2010 xkovah at gmail All materials are licensed under a Creative Commons “Share Alike” license. • http://creativecommons.org/licenses/by-sa/3.0/ 2 Paging • Previously we discussed how segmentation translates a logical address (segment selector + offset) into a 32 bit linear address. • When paging is disabled, linear addresses map 1:1 to physical addresses. • When paging is enabled, a linear address must be translated to determine the physical address it corresponds to. • It’s called “paging” because physical memory is divided into fixed size chunks called pages. • The analogy is to books in a library. When you need to find some information first you go to the library where you look up the relevant book, then you select the book and look up a specific page, from there you maybe look for a specific specific sentence …or “word”? ;) • The internets ruined the analogy! 3 Notes and Terminology • All of the figures or references to the manual in this section refer to the Nov 2008 manual (available in the class materials). This is because I think the manuals <= 2008 organized and presented much clearer than >= 2009 manuals. • When I refer to a “frame” it means “a page- sized chunk of physical memory” • When paging is enabled, a “linear address” is the same thing as a “virtual memory ” “ ” address or virtual address 4 The terrifying truth revealed! And now you knowAHHHHHHHH!!!…the rest of the story.(Nah, Good it’s not so bad day. :)) 5 Virtual Memory • When paging is enabled, the 32 bit linear address space can be mapped to a physical address space less than 32 bits. -
NASM – the Netwide Assembler
NASM – The Netwide Assembler version 2.14rc7 © 1996−2017 The NASM Development Team — All Rights Reserved This document is redistributable under the license given in the file "LICENSE" distributed in the NASM archive. Contents Chapter 1: Introduction . 17 1.1 What Is NASM?. 17 1.1.1 License Conditions . 17 Chapter 2: Running NASM . 19 2.1 NASM Command−Line Syntax . 19 2.1.1 The −o Option: Specifying the Output File Name . 19 2.1.2 The −f Option: Specifying the Output File Format . 20 2.1.3 The −l Option: Generating a Listing File . 20 2.1.4 The −M Option: Generate Makefile Dependencies. 20 2.1.5 The −MG Option: Generate Makefile Dependencies . 20 2.1.6 The −MF Option: Set Makefile Dependency File. 20 2.1.7 The −MD Option: Assemble and Generate Dependencies . 20 2.1.8 The −MT Option: Dependency Target Name . 21 2.1.9 The −MQ Option: Dependency Target Name (Quoted) . 21 2.1.10 The −MP Option: Emit phony targets . 21 2.1.11 The −MW Option: Watcom Make quoting style . 21 2.1.12 The −F Option: Selecting a Debug Information Format . 21 2.1.13 The −g Option: Enabling Debug Information. 21 2.1.14 The −X Option: Selecting an Error Reporting Format . 21 2.1.15 The −Z Option: Send Errors to a File. 22 2.1.16 The −s Option: Send Errors to stdout ..........................22 2.1.17 The −i Option: Include File Search Directories . 22 2.1.18 The −p Option: Pre−Include a File . 22 2.1.19 The −d Option: Pre−Define a Macro . -
Multiprocessing Contents
Multiprocessing Contents 1 Multiprocessing 1 1.1 Pre-history .............................................. 1 1.2 Key topics ............................................... 1 1.2.1 Processor symmetry ...................................... 1 1.2.2 Instruction and data streams ................................. 1 1.2.3 Processor coupling ...................................... 2 1.2.4 Multiprocessor Communication Architecture ......................... 2 1.3 Flynn’s taxonomy ........................................... 2 1.3.1 SISD multiprocessing ..................................... 2 1.3.2 SIMD multiprocessing .................................... 2 1.3.3 MISD multiprocessing .................................... 3 1.3.4 MIMD multiprocessing .................................... 3 1.4 See also ................................................ 3 1.5 References ............................................... 3 2 Computer multitasking 5 2.1 Multiprogramming .......................................... 5 2.2 Cooperative multitasking ....................................... 6 2.3 Preemptive multitasking ....................................... 6 2.4 Real time ............................................... 7 2.5 Multithreading ............................................ 7 2.6 Memory protection .......................................... 7 2.7 Memory swapping .......................................... 7 2.8 Programming ............................................. 7 2.9 See also ................................................ 8 2.10 References ............................................. -
Cray XT and Cray XE Y Y System Overview
Crayyy XT and Cray XE System Overview Customer Documentation and Training Overview Topics • System Overview – Cabinets, Chassis, and Blades – Compute and Service Nodes – Components of a Node Opteron Processor SeaStar ASIC • Portals API Design Gemini ASIC • System Networks • Interconnection Topologies 10/18/2010 Cray Private 2 Cray XT System 10/18/2010 Cray Private 3 System Overview Y Z GigE X 10 GigE GigE SMW Fibre Channels RAID Subsystem Compute node Login node Network node Boot /Syslog/Database nodes 10/18/2010 Cray Private I/O and Metadata nodes 4 Cabinet – The cabinet contains three chassis, a blower for cooling, a power distribution unit (PDU), a control system (CRMS), and the compute and service blades (modules) – All components of the system are air cooled A blower in the bottom of the cabinet cools the blades within the cabinet • Other rack-mounted devices within the cabinet have their own internal fans for cooling – The PDU is located behind the blower in the back of the cabinet 10/18/2010 Cray Private 5 Liquid Cooled Cabinets Heat exchanger Heat exchanger (XT5-HE LC only) (LC cabinets only) 48Vdc flexible Cage 2 buses Cage 2 Cage 1 Cage 1 Cage VRMs Cage 0 Cage 0 backplane assembly Cage ID controller Interconnect 01234567 Heat exchanger network cable Cage inlet (LC cabinets only) connection air temp sensor Airflow Heat exchanger (slot 3 rail) conditioner 48Vdc shelf 3 (XT5-HE LC only) 48Vdc shelf 2 L1 controller 48Vdc shelf 1 Blower speed controller (VFD) Blooewer PDU line filter XDP temperature XDP interface & humidity sensor -
X86 Memory Protection and Translation
2/5/20 COMP 790: OS Implementation COMP 790: OS Implementation Logical Diagram Binary Memory x86 Memory Protection and Threads Formats Allocators Translation User System Calls Kernel Don Porter RCU File System Networking Sync Memory Device CPU Today’s Management Drivers Scheduler Lecture Hardware Interrupts Disk Net Consistency 1 Today’s Lecture: Focus on Hardware ABI 2 1 2 COMP 790: OS Implementation COMP 790: OS Implementation Lecture Goal Undergrad Review • Understand the hardware tools available on a • What is: modern x86 processor for manipulating and – Virtual memory? protecting memory – Segmentation? • Lab 2: You will program this hardware – Paging? • Apologies: Material can be a bit dry, but important – Plus, slides will be good reference • But, cool tech tricks: – How does thread-local storage (TLS) work? – An actual (and tough) Microsoft interview question 3 4 3 4 COMP 790: OS Implementation COMP 790: OS Implementation Memory Mapping Two System Goals 1) Provide an abstraction of contiguous, isolated virtual Process 1 Process 2 memory to a program Virtual Memory Virtual Memory 2) Prevent illegal operations // Program expects (*x) – Prevent access to other application or OS memory 0x1000 Only one physical 0x1000 address 0x1000!! // to always be at – Detect failures early (e.g., segfault on address 0) // address 0x1000 – More recently, prevent exploits that try to execute int *x = 0x1000; program data 0x1000 Physical Memory 5 6 5 6 1 2/5/20 COMP 790: OS Implementation COMP 790: OS Implementation Outline x86 Processor Modes • x86 -
AMD's Bulldozer Architecture
AMD's Bulldozer Architecture Chris Ziemba Jonathan Lunt Overview • AMD's Roadmap • Instruction Set • Architecture • Performance • Later Iterations o Piledriver o Steamroller o Excavator Slide 2 1 Changed this section, bulldozer is covered in architecture so it makes sense to not reiterate with later slides Chris Ziemba, 鳬o AMD's Roadmap • October 2011 o First iteration, Bulldozer released • June 2013 o Piledriver, implemented in 2nd gen FX-CPUs • 2013 o Steamroller, implemented in 3rd gen FX-CPUs • 2014 o Excavator, implemented in 4th gen Fusion APUs • 2015 o Revised Excavator adopted in 2015 for FX-CPUs and beyond Instruction Set: Overview • Type: CISC • Instruction Set: x86-64 (AMD64) o Includes Old x86 Registers o Extends Registers and adds new ones o Two Operating Modes: Long Mode & Legacy Mode • Integer Size: 64 bits • Virtual Address Space: 64 bits o 16 EB of Address Space (17,179,869,184 GB) • Physical Address Space: 48 bits (Current Versions) o Saves space/transistors/etc o 256TB of Address Space Instruction Set: ISA Registers Instruction Set: Operating Modes Instruction Set: Extensions • Intel x86 Extensions o SSE4 : Streaming SIMD (Single Instruction, Multiple Data) Extension 4. Mainly for DSP and Graphics Processing. o AES-NI: Advanced Encryption Standard (AES) Instructions o AVX: Advanced Vector Extensions. 256 bit registers for computationally complex floating point operations such as image/video processing, simulation, etc. • AMD x86 Extensions o XOP: AMD specified SSE5 Revision o FMA4: Fused multiply-add (MAC) instructions -
Operating Systems & Virtualisation Security Knowledge Area
Operating Systems & Virtualisation Security Knowledge Area Issue 1.0 Herbert Bos Vrije Universiteit Amsterdam EDITOR Andrew Martin Oxford University REVIEWERS Chris Dalton Hewlett Packard David Lie University of Toronto Gernot Heiser University of New South Wales Mathias Payer École Polytechnique Fédérale de Lausanne The Cyber Security Body Of Knowledge www.cybok.org COPYRIGHT © Crown Copyright, The National Cyber Security Centre 2019. This information is licensed under the Open Government Licence v3.0. To view this licence, visit: http://www.nationalarchives.gov.uk/doc/open-government-licence/ When you use this information under the Open Government Licence, you should include the following attribution: CyBOK © Crown Copyright, The National Cyber Security Centre 2018, li- censed under the Open Government Licence: http://www.nationalarchives.gov.uk/doc/open- government-licence/. The CyBOK project would like to understand how the CyBOK is being used and its uptake. The project would like organisations using, or intending to use, CyBOK for the purposes of education, training, course development, professional development etc. to contact it at con- [email protected] to let the project know how they are using CyBOK. Issue 1.0 is a stable public release of the Operating Systems & Virtualisation Security Knowl- edge Area. However, it should be noted that a fully-collated CyBOK document which includes all of the Knowledge Areas is anticipated to be released by the end of July 2019. This will likely include updated page layout and formatting of the individual Knowledge Areas KA Operating Systems & Virtualisation Security j October 2019 Page 1 The Cyber Security Body Of Knowledge www.cybok.org INTRODUCTION In this Knowledge Area, we introduce the principles, primitives and practices for ensuring se- curity at the operating system and hypervisor levels. -
Chapter 3 Protected-Mode Memory Management
CHAPTER 3 PROTECTED-MODE MEMORY MANAGEMENT This chapter describes the Intel 64 and IA-32 architecture’s protected-mode memory management facilities, including the physical memory requirements, segmentation mechanism, and paging mechanism. See also: Chapter 5, “Protection” (for a description of the processor’s protection mechanism) and Chapter 20, “8086 Emulation” (for a description of memory addressing protection in real-address and virtual-8086 modes). 3.1 MEMORY MANAGEMENT OVERVIEW The memory management facilities of the IA-32 architecture are divided into two parts: segmentation and paging. Segmentation provides a mechanism of isolating individual code, data, and stack modules so that multiple programs (or tasks) can run on the same processor without interfering with one another. Paging provides a mech- anism for implementing a conventional demand-paged, virtual-memory system where sections of a program’s execution environment are mapped into physical memory as needed. Paging can also be used to provide isolation between multiple tasks. When operating in protected mode, some form of segmentation must be used. There is no mode bit to disable segmentation. The use of paging, however, is optional. These two mechanisms (segmentation and paging) can be configured to support simple single-program (or single- task) systems, multitasking systems, or multiple-processor systems that used shared memory. As shown in Figure 3-1, segmentation provides a mechanism for dividing the processor’s addressable memory space (called the linear address space) into smaller protected address spaces called segments. Segments can be used to hold the code, data, and stack for a program or to hold system data structures (such as a TSS or LDT). -
X86 Memory Protection and Translation
x86 Memory Protection and Translation Don Porter CSE 506 Lecture Goal ò Understand the hardware tools available on a modern x86 processor for manipulating and protecting memory ò Lab 2: You will program this hardware ò Apologies: Material can be a bit dry, but important ò Plus, slides will be good reference ò But, cool tech tricks: ò How does thread-local storage (TLS) work? ò An actual (and tough) Microsoft interview question Undergrad Review ò What is: ò Virtual memory? ò Segmentation? ò Paging? Two System Goals 1) Provide an abstraction of contiguous, isolated virtual memory to a program 2) Prevent illegal operations ò Prevent access to other application or OS memory ò Detect failures early (e.g., segfault on address 0) ò More recently, prevent exploits that try to execute program data Outline ò x86 processor modes ò x86 segmentation ò x86 page tables ò Software vs. Hardware mechanisms ò Advanced Features ò Interesting applications/problems x86 Processor Modes ò Real mode – walks and talks like a really old x86 chip ò State at boot ò 20-bit address space, direct physical memory access ò Segmentation available (no paging) ò Protected mode – Standard 32-bit x86 mode ò Segmentation and paging ò Privilege levels (separate user and kernel) x86 Processor Modes ò Long mode – 64-bit mode (aka amd64, x86_64, etc.) ò Very similar to 32-bit mode (protected mode), but bigger ò Restrict segmentation use ò Garbage collect deprecated instructions ò Chips can still run in protected mode with old instructions Translation Overview 0xdeadbeef Segmentation 0x0eadbeef Paging 0x6eadbeef Virtual Address Linear Address Physical Address Protected/Long mode only ò Segmentation cannot be disabled! ò But can be a no-op (aka flat mode) x86 Segmentation ò A segment has: ò Base address (linear address) ò Length ò Type (code, data, etc). -
C++ Code __M128 Add (Const __M128 &X, Const __M128 &Y){ X X3 X2 X1 X0 Return Mm Add Ps(X, Y); } + + + + +
ECE/ME/EMA/CS 759 High Performance Computing for Engineering Applications Final Project Related Issues Variable Sharing in OpenMP OpenMP synchronization issues OpenMP performance issues November 9, 2015 Lecture 24 © Dan Negrut, 2015 ECE/ME/EMA/CS 759 UW-Madison Quote of the Day “Without music to decorate it, time is just a bunch of boring production deadlines or dates by which bills must be paid.” -- Frank Zappa, Musician 1940 - 1993 2 Before We Get Started Issues covered last time: Final Project discussion Open MP optimization issues, wrap up Today’s topics SSE and AVX quick overview Parallel computing w/ MPI Other issues: HW08, due on Wd, Nov. 10 at 11:59 PM 3 Parallelism, as Expressed at Various Levels Cluster Group of computers communicating through fast interconnect Coprocessors/Accelerators Special compute devices attached to the local node through special interconnect Node Group of processors communicating through shared memory Socket Group of cores communicating through shared cache Core Group of functional units communicating through registers Hyper-Threads Group of thread contexts sharing functional units Superscalar Group of instructions sharing functional units Pipeline Sequence of instructions sharing functional units Vector Single instruction using multiple functional units Have discussed already Haven’t discussed yet 4 [Intel] Have discussed, but little direct control Instruction Set Architecture (ISA) Extensions Extensions to the base x86 ISA One way the x86 has evolved over the years Extensions for vectorizing