A CLOSER LOOK AT MICROPROCESSORS THAT HAVE SHAPED THE DIGITAL WORLD

Christopher U. Ngene1, Student Member IEEE, Manish Kumar Mishra2

1 Computer Engineering Faculty, Kharkov National University of Radioelectronics, Kharkov, Ukraine; [email protected]
2 Department of Computer Engineering, University of Maiduguri, Nigeria; [email protected]

Manuscript received November 3, 2009. Christopher U. Ngene is with the Kharkov National University of Radioelectronics, Computer Engineering Faculty, Lenin Prosp., 14, Kharkov, 61166, Ukraine (corresponding author, phone: +38057-7021326), [email protected]. Manish Kumar Mishra is with the Department of Computer Engineering, University of Maiduguri, Nigeria; [email protected]

R&I, 2009, No. 4, p. 41

Abstract – If you have been following developments in the microprocessor world, you will attest that things have changed dramatically since the introduction of the first world-acclaimed microprocessor, the Intel 4004, in 1971. Which of the changes made to these processors have actually improved our lot, especially how we perceive the world around us and how productive we are at work? In this study we investigate different general-purpose processors with a view to enlightening consumers and enthusiasts alike, and to helping them determine which of the myriad processors is most appropriate for their tasks and which choice makes the most economic sense. We show in relative detail that processor speed is not the only determinant of processor performance; the most significant factor is the architecture. The study also reveals that new technology is not the only factor that determines whether a new processor is actually new; most importantly, marketing considerations have been the driving force.

I. INTRODUCTION

Since the introduction of the first commercial integrated circuit in 1961 and of the first microprocessor in 1971, the semiconductor industry has experienced healthy growth. Anyone involved in electronic design or electronic design automation (EDA), or in the marketing or analysis of electronic devices, knows that things become ever more complex as the years go by, and microprocessors are no exception to this rule. Today's microprocessors are more complex than the central processing units of the mainframes of the 1960s and 1970s. Most importantly, these processors outperform those mainframe CPUs and are cheaper and more affordable. Architectures have also evolved to the extent that we no longer talk about CISC or RISC but about a combination of both. Other performance-enhancing architectures, such as EPIC (Explicitly Parallel Instruction Computing) based on VLIW, in conjunction with pipelining, superscalar execution and hyper-threading, have boosted performance to previously unimaginable levels.

The fate of both Intel and Microsoft changed dramatically in 1981 when IBM introduced the IBM PC, which was based on a 4.77 MHz Intel 8088 16-bit processor running the Microsoft Disk Operating System (MS-DOS) 1.0. Was there any one chip that propelled Intel into the Fortune 500? Intel says there was: the 8088 [6]. Since that fateful decision was made, computers have become affordable and found their way into our homes. The focus of this paper is on Intel and AMD processors. Even so, we discuss only a selected number of processors that have made their mark in the industry. The obvious reason for this selective approach is that these vendors have a myriad of processors in their inventories, more than the limited space at our disposal can accommodate.

Available publications on microprocessors have focused mainly on historical development, specific processor reviews and benchmarking [1][2][18]. We agree that these are necessary, but many users of these processors may not fully appreciate some of the technical jargon employed in these publications. In view of this, this paper takes a wider perspective to give our readers a holistic view. This has dictated the approach we have taken, and we have encapsulated the vital information regarding processors within the pages of this paper. We give a brief historical background of microprocessors and describe how Moore's law has continued to hold as a result of innovations in chip fabrication that enable smaller and smaller feature sizes. The rest of this paper is organised as follows. Section 2 presents some historical background on the development of transistors, integrated circuits and, subsequently, microprocessors; in that section we also look at the basic processor architectures. Section 3 presents the different types of processors and microprocessor feature trends. Special-purpose processors (microcontrollers, graphics processors and digital signal processors) are discussed, but a detailed description of these processors has been left for future work. Section 4 presents methods of chip fabrication, foundries, process yield and process technologies, and an evaluation of the relationships between die size, lithography and transistor count per die.

2. BACKGROUND INFORMATION

Before we proceed further, a little background will help us appreciate where we are now by knowing where we came from. The development of computer systems is closely tied to processors and, subsequently, microprocessors. The processor (Central Processing Unit, CPU), in conjunction with the memory, is the brain of the computer. Data processing (arithmetic and logic operations) takes place in the CPU. Early computers, which were mainly mainframes, had very large CPUs. Early CPUs were implemented as discrete components and numerous small integrated circuits (ICs) on one or more circuit boards. Microprocessors, on the other hand, are CPUs manufactured on a very small number of ICs, usually just one. The overall smaller CPU size that results from implementation on a single die means faster switching times because of physical factors such as decreased gate parasitic capacitance.

Prior to the advent of machines that resemble today's CPUs, computers such as the ENIAC had to be physically rewired in order to perform different tasks. These machines are often referred to as "fixed-program computers," since they had to be physically reconfigured in order to run a different program. Since the term "CPU" is generally defined as a software (computer program) execution device, the earliest devices that could rightly be called CPUs came with the advent of the stored-program computer.

A CPU deals with discrete states and thus employs switching elements to change state. Before the discovery of transistors, electrical relays and vacuum tubes (thermionic valves) were commonly used as switching elements. Electromechanical relays and vacuum tubes suffer from contact bounce and heat respectively. They generally switch slowly, and for these reasons they are considered very unreliable. Tube computers such as EDVAC were generally faster than electromechanical computers (Harvard Mark I) but were less reliable. EDVAC tended to average eight hours between failures, whereas relay computers (Harvard Mark I) failed rarely.

2.1 Evolution and Direction of Development

Let us start by examining the different switching elements and subsequent technologies that characterise the generations of microprocessors. Vacuum tubes and electromechanical relays were used in the first-generation processors. One other important drawback of these early […] language.

The invention of the transistor by three Bell Laboratories scientists, J. Bardeen, W. H. Brattain and W. Shockley, launched the second generation of processors. Transistors revolutionised electronics in general and computers in particular. Transistors were much smaller than vacuum tubes, consumed less energy, switched faster and were more reliable. The CPU was programmed in assembly languages (symbolic languages) and later in high-level languages such as FORTRAN and COBOL. The trend towards standardization generally began in the era of discrete-transistor CPUs. With this improvement, more complex and reliable CPUs were built on one or several printed circuit boards containing individual components.

Even after transistors were deployed as switching elements, CPUs were still large and occupied several circuit boards. The need to reduce the size of components became a primary preoccupation of engineers and scientists. A method of manufacturing many transistors in a compact space was developed; the result is known as the integrated circuit (IC). An IC is a complete electronic circuit on a small chip of silicon. Beginning in 1965, ICs began to replace transistors in CPUs. In 1959, Jack Kilby and Robert Noyce independently invented a means of fabricating multiple transistors on a single slab of semiconductor material.

2.2 Scale of Integration

Dimensions on an IC are measured in micrometers, with one micrometer (1 µm) being one millionth of a meter. As a reference point, a human hair is roughly 100 µm in diameter. Each year, researchers and engineers find new ways to steadily reduce these feature sizes and pack more transistors into the same silicon area. There are different levels of integration, which refer to the number of digital components placed on a single chip. The early ICs contained only one kind of building block (logic gates), such as AND gates. CPUs based on this sort of IC are known as Small Scale Integration (SSI) devices. Such ICs contained tens of transistors. To build an entire CPU out of SSI ICs required thousands of individual chips,