А.М. Sergiyenko

Computer Architecture
Part 2. Parallel Architectures

Kyiv-2016

CONTENTS

CONTENTS ..... 1
ABBREVIATIONS ..... 3
3 PARALLELISM OF SINGLE-PROCESSOR COMPUTERS ..... 6
3.1 Pipelined and vector data processing ..... 6
3.2 Vector pipelined computers ..... 16
3.3 Superscalar processors ..... 21
3.4 Methods of superscalar processor performance increasing ..... 25
3.5 Multithreading ..... 35
3.6 Memory access parallelism ..... 39
4 MULTITASKING AND DATA PROTECTION ..... 47
4.1 Virtual memory ..... 47
4.2 Computer multitasking ..... 52
4.3 Memory protection ..... 55
4.4 Memory management in the IA-32 architecture ..... 58
4.5 Virtual memory of the IA-64 architecture ..... 69
4.6 Problems ..... 71
5 MULTIPROCESSOR ARCHITECTURE ..... 73
5.1 Fundamentals of multiprocessor architectures ..... 73
5.2 Processors with the SIMD parallelism ..... 89
5.3 Multiprocessor architectures ..... 97
5.4 Graphics accelerator architecture ..... 108
6 APPLICATION SPECIFIC PROCESSORS ..... 111
6.1 Introduction ..... 111
6.2 Microcontrollers ..... 112
6.3 DSP microprocessors ..... 119
6.4 Processors with hardware control. Configurable computers ..... 122
6.5 System on chip in FPGA ..... 128
7 COMPUTER ARCHITECTURE PROSPECTS ..... 132
LIST OF RECOMMENDED LITERATURE ..... 137
ANNEX 1 ..... 139

ABBREVIATIONS

ALU — arithmetic and logic unit
APIC — Advanced Programmable Interrupt Controller
ARM — Acorn (Advanced) RISC Machines
AS — Architecture State
BHT — Branch History Table
BTB — Branch Target Buffer
CISC — complex instruction set computer
COMA — Cache-Only Memory Architecture
CPL — Current Privilege Level
CPU — central processing unit
CRC — Cyclic Redundancy Check
CUDA — Compute Unified Device Architecture
DMA — Direct Memory Access
DPL — Descriptor Privilege Level
DRAM — Dynamic Random Access Memory
DRIS — Deferred scheduling register Renaming Instruction Shelf
DSMA — Distributed Shared Memory Architecture
DSP — Digital Signal Processing
GDT — Global Descriptor Table
GPIO — General Purpose Input-Output
GPU — Graphics Processing Unit
HDD — Hard Disk Drive
HSA — Heterogeneous System Architecture
IA-32 — 32-bit Intel architecture
IDT — Interrupt Descriptor Table
IoT — Internet of Things
IP — Instruction Pointer
ITLB — Instruction Translation Lookaside Buffer
I2C — Inter-Integrated Circuit
JTAG — Joint Test Action Group
LDT — local descriptor table
LP — Logical Processor
MCM — MultiChip Module
MIMD — Multiple Instruction flows — Multiple Data flows
MIPS — million instructions per second; microprocessor without interlocked pipeline stages
MFLOPS — million floating-point operations per second
MMU — Memory Management Unit
MMX — MultiMedia extension
MPI — Message Passing Interface
MRMW — Multiple Readers — Multiple Writers
MRSW — Multiple Readers — Single Writer
MT mode — Multi-Task mode
MTA — Multi-Threaded Architecture
NUMA — Non-Uniform Memory Access architecture
OpenCL — Open Computing Language
OS — Operating System
PC — Program Counter; Personal Computer
PKR — Protection Key Register
PRAM — Parallel Random Access Machine
PU — Processing Unit
PVM — Parallel Virtual Machine
RAM — Random Access Memory
RAS — Row Address Select; Return Address Stack
RAT — Register Alias Table
RID — Region IDentifier
RISC — Reduced Instruction Set Computer
ROM — Read-Only Memory
RPL — Requested Privilege Level
SAXPY — Single-precision A multiplied by X Plus Y
SIMD — Single Instruction flow — Multiple Data flows
SIMT — Single Instruction — Multiple Threads
SM — Streaming Multiprocessor
SMP — Symmetric MultiProcessor
SMT — Simultaneous MultiThreading
SoC — System on a Chip
SPI — Serial Peripheral Interface
SPMD — Single Program — Multiple Data
ST mode — Single-Task mode
T-cache — Trace cache
TLB — Translation Lookaside Buffer
TR — Task Register
TSS — Task State Segment
UART — Universal Asynchronous Receiver-Transmitter
UMA — Uniform Memory Access architecture
VLIW — Very Long Instruction Word
VPN — Virtual Page Number
VSMA — Virtual Shared Memory Architecture
µop — micro-operation

3 PARALLELISM OF SINGLE-PROCESSOR COMPUTERS

3.1 Pipelined and vector data processing

3.1.1 Basic types of parallelism in computers

The main feature of the von Neumann computer architecture reviewed above is the inseparable execution of all actions. This means that all the instructions and microinstructions are indivisible and are strictly executed in sequence. The instruction inseparability, or atomicity, means that its execution cannot be suspended so that another action could meanwhile be performed on the memory elements involved.
For example, an instruction cannot start running before the end of the previous instruction, and a datum is read from RAM by the running instruction strictly after it was written there by the previous instruction. A set of instructions cannot be executed in several ALUs, i.e., in parallel. Therefore, the speed limit of the von Neumann processor is determined by the period of instruction execution, which in turn is bounded by the sum of the delays of the instruction decoder, ALU, program RAM, and data RAM.

In this section, the architecture details are considered which help to do without the action atomicity and to speed up the computation through parallelization. First, some principles of computer design are recalled.

The parallel processing principle. To achieve high performance and (or) high reliability of computer computations, the independent control or calculation steps are distributed among several operating and control machines (or CPUs), which are connected through some switching system.

The pipeline processing principle means that a complex operation sequence is divided into several successively performed steps (micro-steps) so that these steps can be performed in parallel for a flow of such operations.

A few definitions are added. Parallelism is the ability to execute independent arithmetic, logical, or service operations simultaneously. There are three main forms of parallelism:

— natural, or vector, parallelism;
— parallelism of independent branches;
— parallelism of related operations, or scalar parallelism.

The essence of the parallelism of independent branches is that independent software branches can be allocated in a program which solves a large problem, and these branches are processed in parallel.

Under natural parallelism, above all the ability to process independent data by the same algorithm is understood.
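The gain promised by the pipeline processing principle can be estimated with a simple timing model: if an operation is split into S stages of one clock each, N operations run sequentially in N·S clocks, while a full pipeline accepts a new operation every clock and needs only S + (N − 1) clocks. The following Python sketch (an illustration added here, not taken from the text) makes this concrete:

```python
# Timing model of the pipeline principle: an operation is split into
# n_stages successive steps of one clock each.

def sequential_clocks(n_ops: int, n_stages: int) -> int:
    """Clocks to run n_ops atomic (indivisible) operations one after another."""
    return n_ops * n_stages

def pipelined_clocks(n_ops: int, n_stages: int) -> int:
    """Clocks to run n_ops operations through a pipeline n_stages deep:
    the first result appears after n_stages clocks, then one per clock."""
    return n_stages + (n_ops - 1)

if __name__ == "__main__":
    ops, stages = 1000, 5
    seq = sequential_clocks(ops, stages)   # 5000 clocks
    pipe = pipelined_clocks(ops, stages)   # 1004 clocks
    print(f"sequential: {seq}, pipelined: {pipe}, speedup: {seq / pipe:.2f}")
```

For a long flow of operations the speedup approaches the number of stages S, which is why pipelining pays off only when such a flow exists.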
This is, for example, the ability to process the elements of data vectors simultaneously (vector parallelism), performing a number of identical or similar tasks for different sets of input data.

The program for the von Neumann machine is a list of operations performed in series. Scalar parallelism means that subsets of these operations can be performed in parallel if there are no dependencies between them.

A synchronization act is a moment of time when a portion of calculations transfers control to other portions, such as the moment when an
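The vector form of parallelism described above can be sketched with the SAXPY operation listed among the abbreviations (y = a·x + y): every element y[i] depends only on x[i] and y[i], so all iterations are independent and could be executed simultaneously. The Python sketch below is a hypothetical illustration added here; a real vector machine would issue the whole loop as a single vector instruction:

```python
# SAXPY (y = a*x + y): the same algorithm applied to independent data
# elements -- the textbook case of natural (vector) parallelism.
from concurrent.futures import ThreadPoolExecutor

def saxpy(a, x, y):
    """Sequential reference: each iteration is independent of the others."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def saxpy_parallel(a, x, y, workers=4):
    """The same computation with independent elements mapped to workers,
    mimicking how a vector unit processes the elements side by side."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: a * p[0] + p[1], zip(x, y)))

if __name__ == "__main__":
    x, y = [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]
    print(saxpy(2.0, x, y))           # [12.0, 24.0, 36.0]
    print(saxpy_parallel(2.0, x, y))  # same result; map preserves order
```

Because no iteration reads a value written by another, no synchronization acts are needed between the element computations, which is exactly what makes this parallelism "natural".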