Architecture and Advantages of SIMD in Multimedia Applications


Journal of Xi'an University of Architecture & Technology, ISSN No: 1006-7930, Volume XII, Issue VI, 2020, Page No: 1452

Sarah M. Al-sudany, Department of Computer Engineering, University of Technology, Baghdad-Iraq. E-mail: [email protected]
Ahmed S. Al-Araji, Department of Computer Engineering, University of Technology, Baghdad-Iraq. E-mail: [email protected]
Bassam M. Saeed, Department of Computer Engineering, University of Technology, Baghdad-Iraq. E-mail: [email protected]

Abstract: This paper examines the single instruction, multiple data (SIMD) architecture, a method of parallel computing. Most modern processor designs include SIMD in order to increase performance. The aims of this work are to describe the classification of SIMD architectures in computer systems, which depends on how the implementation affects execution time, and to show how SIMD can be used efficiently in multicore and multiprocessing systems to obtain better program performance. To this end, the paper studies the basic principle of SIMD architecture, emphasizes two types of SIMD processor, the array processor and the vector processor, identifies the advantages and disadvantages of each, and considers the two types of SIMD architecture, true and pipelined SIMD. The paper then gives an overview of the characteristic multimedia extensions to SIMD and analyzes the development of their use in applications that need faster processing, such as digital signal processing (DSP), image processing, and mobile applications.

Keywords: Array Processing Architecture, SIMD Architecture, Vector Processing Architecture.

I. INTRODUCTION

Microprocessor architecture has usually been driven by rising performance demands. Several design methods have been used to extract the various forms of parallelism present in applications [1]. Computer engineering has seen technological developments in recent years that have had a great mutual impact, especially in the spread of single instruction, multiple data (SIMD) techniques, including in individual education [2].

For example, SIMD instructions were first used in the early 1970s in the CDC STAR-100 and TI ASC machines, which could perform the same operation on a batch of data. A new era in the use of SIMD for parallel data processing began with the Thinking Machines CM-1 and CM-2, which are considered massively parallel supercomputers. Many desktop computers now perform tasks such as real-time video processing and gaming on large volumes of digital information, so companies sought to bring this architecture to desktop computers [3]. Sun Microsystems introduced SIMD integer instructions in the 1995 UltraSPARC I microprocessor as the VIS (Visual Instruction Set) extension. MIPS introduced MDMX (MIPS Digital Media eXtension). By adding the MMX extensions to the x86 architecture in 1996, Intel made SIMD widely available. Motorola then introduced AltiVec for the PowerPC architecture, which IBM also used in its POWER systems [4]. Intel responded with SSE. SSE and its extensions are now more widely used than the others.

SIMD extensions were introduced because they suit common tasks such as adjusting the contrast of a digital image, adjusting the volume of digital audio, and other audio processing [5]. Most modern central processing unit (CPU) designs include SIMD instructions to improve multimedia performance. The SIMD capabilities of processors must be taken into account during standardization. For example, a reasonable precision for intermediate calculations is specified, and dependencies between samples are handled with deliberate effort.
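The audio volume task and the remark on intermediate calculation accuracy can be illustrated with a small sketch. It is only a plain Python model of what one packed multiply with saturation does across its lanes, not real SIMD code; the function names and the fixed-point gain format (8 fractional bits) are assumptions made for this illustration.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def saturate_int16(value):
    """Clamp a wide intermediate result into the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, value))

def scale_volume(samples, gain_q8):
    """Apply the same gain to every 16-bit sample, one 'lane' per sample.

    gain_q8 is a hypothetical fixed-point gain with 8 fractional bits
    (256 means 1.0). The product is held in a wider intermediate value
    before it is shifted back and saturated, which is the kind of
    intermediate precision a standard has to specify.
    """
    return [saturate_int16((s * gain_q8) >> 8) for s in samples]

samples = [1000, -2000, 30000, -30000]
louder = scale_volume(samples, 384)  # gain of 1.5
```

Every sample is processed by the same operation and no sample depends on another, which is what makes the task suitable for SIMD hardware.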
Parallel computer systems execute the same operation on different pieces of data [6]. A SIMD accelerator therefore needs only replicated functional units with a very simple control mechanism. The high performance and low complexity of SIMD accelerators follow from the way SIMD operates: as its name suggests, a single instruction acts on several data elements in parallel [7]. The first step in using SIMD is therefore to encode, at compile time, several data elements that can be operated on in parallel into vector instructions. Compilers perform program analysis to identify multiple scalar instructions that execute the same operation [8]. Several of these instructions are combined into a single instruction, called a vector or SIMD instruction, depending on the instruction type and the configuration of the SIMD accelerator [9]. The vectorized instructions are then executed on the SIMD accelerator. Some of the resulting benefits are described below.

First, the total number of instructions executed by a program decreases, because a single vector instruction encodes several operations. This lowers the demand on instruction cache capacity, which in turn improves the instruction cache hit rate and performance [10].

Second, fewer instructions reduce the work of the processor front end, since fewer instructions must be fetched, decoded, and scheduled. The back end likewise retires fewer instructions. This improves energy efficiency [11].

Third, encoding several operations in one instruction allows a larger effective number of operations to be in flight, which leads to more efficient instruction scheduling.
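The reduction in instruction count can be made concrete with a simple model. The sketch below is written in plain Python and only counts how many add instructions would be issued; the lane width of 4 and the function names are assumptions chosen for illustration, and no real vector hardware is involved.

```python
def scalar_add(a, b):
    """Model of scalar code: one add instruction per element pair."""
    result, issued = [], 0
    for x, y in zip(a, b):
        result.append(x + y)
        issued += 1  # one scalar instruction
    return result, issued

def vector_add(a, b, lanes=4):
    """Model of vectorized code: one add instruction per group of `lanes` pairs."""
    result, issued = [], 0
    for start in range(0, len(a), lanes):
        # All lanes of this group are covered by a single vector instruction.
        group = zip(a[start:start + lanes], b[start:start + lanes])
        result.extend(x + y for x, y in group)
        issued += 1
    return result, issued

a = list(range(10))
b = [10] * 10
scalar_result, scalar_count = scalar_add(a, b)   # 10 instructions
vector_result, vector_count = vector_add(a, b)   # 3 instructions
```

Both versions compute the same sums, but the vectorized model issues roughly one instruction for every four elements, which is the effect on the front end and the instruction cache described above.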
The SIMD architecture therefore offers several benefits to computer design, such as improved execution time, scalability with data size, cost savings, and concurrency. The largest performance benefits of SIMD are seen in graphics applications, but many other problems can also benefit from the technique. Even in consumer applications, careful use of memory and CPU can bring large benefits to the user experience.

This paper is organized as follows: Section II describes the basic principles of SIMD architecture; Section III introduces the implementation of SIMD architecture; Section IV explains the multimedia applications; conclusions are given in Section V.

II. BASIC PRINCIPLES OF SIMD ARCHITECTURE

To study SIMD architecture, it is necessary to understand how data parallelism works, because the core operation of a SIMD component depends on this form of parallelism. This section therefore covers SIMD processing, in particular attached array processors and SIMD array processors, explains the terms vector processing and array processing, and focuses on the differences between these architectures with regard to each distinct application. Reviewing these topics makes it easier to move on to the classification of applications that use SIMD architecture.

2.1 Data Parallelism

Data parallelism aims to increase processing speed by dividing a data set among computing streams that run at the same time. For example, a customer address standardization routine takes an address and tries to convert it into a standard form. This function is well suited to data parallelism: it can be sped up by a factor of 8 by running eight 32-bit standardization processes and streaming a portion of the data to each instance, as in Fig. 1.
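The address example can be sketched as follows. This is a minimal model of data parallelism in Python, assuming a hypothetical standard form (trimmed, single-spaced, upper case) and using eight worker threads to stand in for the eight standardization processes; it shows the partitioning pattern only and makes no claim about the speedup obtained.

```python
from concurrent.futures import ThreadPoolExecutor

LANES = 8  # the eight standardization processes from the text

def standardize(address):
    """Hypothetical standard form: trimmed, single-spaced, upper case."""
    return " ".join(address.split()).upper()

def split_into_lanes(records, lanes=LANES):
    """Deal records round-robin so each lane receives about 1/lanes of the data."""
    return [records[i::lanes] for i in range(lanes)]

def standardize_all(records, lanes=LANES):
    """Run the same routine on every lane, then restore the original order."""
    parts = split_into_lanes(records, lanes)
    with ThreadPoolExecutor(max_workers=lanes) as pool:
        done = list(pool.map(lambda part: [standardize(r) for r in part], parts))
    merged = [None] * len(records)
    for lane, part in enumerate(done):
        merged[lane::lanes] = part  # re-interleave the lane outputs
    return merged

addresses = ["  12 main st ", "5  elm   road", "77 Oak Ave", " 3 pine ct"]
clean = standardize_all(addresses)
```

Each lane applies the same routine to its own share of the records, and the result is identical to processing the records one after another.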
Figure 1. The data parallelism in SIMD.

This form of parallelism is finer-grained and more powerful: it increases throughput by repeatedly applying the same small set of operations to several data streams. Large gains can be made on today's processors if one knows when and how to apply SIMD. As with all performance improvements, the gains should be measured on a typical target device before the code is put into production. Although some kinds of code can use SIMD with minimal changes, other custom algorithms bring additional complexity, so this trade-off should also be weighed against the expected performance gains [12].

2.2 SIMD Processing

Array processors are also known as multiprocessors or vector processors; they perform computations on large arrays of data and can therefore boost computer performance. There are two types of array processors: attached array processors and SIMD array processors.

An attached array processor is a processor attached to a general-purpose computer whose function is to enhance that computer's performance in numerical computation. It achieves high performance by operating multiple functional units in parallel. This review is concerned with the second type, the SIMD array processor, which is a single computer system that has multiple processors running in parallel. The processing units operate under the control of a common control unit, which provides a single instruction stream together with multiple data streams. A general block diagram of an array processor is shown in Fig. 2.

Figure 2. General block of an array processor.

It comprises