A Survey of Multicore Processors


Geoffrey Blake, Ronald G. Dreslinski, and Trevor Mudge

[A review of their common attributes]

General-purpose multicore processors are being accepted in all segments of the industry, including the signal processing and embedded space, as the need for more performance and general-purpose programmability has grown. Parallel processing increases performance by adding more parallel resources while maintaining manageable power characteristics. The implementations of multicore processors are numerous and diverse. Designs range from conventional multiprocessor machines to designs that consist of a “sea” of programmable arithmetic logic units (ALUs). In this article, we cover some of the attributes common to all multicore processor implementations and illustrate these attributes with current and future commercial multicore designs. The characteristics we focus on are application domain, power/performance, processing elements, memory system, and accelerators/integrated peripherals.

Digital Object Identifier 10.1109/MSP.2009.934110

IEEE SIGNAL PROCESSING MAGAZINE [26] NOVEMBER 2009, 1053-5888/09/$26.00 © 2009 IEEE

INTRODUCTION

Parallel processors have had a long history going back at least to the Solomon computer of the mid-1960s. The difficulty of programming them meant they have been primarily employed by scientists and engineers who understood the application domain and had the resources and skill to program them. Along the way, a surprising number of companies created parallel machines. They were largely unsuccessful since their difficulty of use limited their customer base, although there were exceptions: the Cray vector machines are perhaps the best example. However, these Cray machines also had a very fast scalar processor that could be easily programmed in a conventional manner, and the vector programming paradigm was not as daunting as creating general parallel programs. Recently the evolution of parallel machines has changed dramatically. For the first time, major chip manufacturers (companies whose primary business is fabricating and selling microprocessors) have turned to offering parallel machines, or single-chip multicore microprocessors as they have been styled.

There are a number of reasons behind this, but the leading one is to continue the raw performance growth that customers have come to expect from Moore’s law scaling without being overwhelmed by the growth in power consumption. As single-core designs were pushed to ever higher clock speeds, the power required grew at a faster rate than the frequency. This power problem was exacerbated by designs that attempted to dynamically extract extra performance from the instruction stream, as we will note later. This led to designs that were complex, unmanageable, and power hungry. The trend was unsustainable. But ever higher performance is still desired, as is evident from the ITRS Roadmap [1], which predicts a need for 300x more performance by 2022, as shown in Figure 1. To meet these demands, chip designers have turned to multicore processors and parallel programming to continue the push for more performance, and in turn the ITRS Roadmap has projected that by 2022 there will be chips with upwards of 100x more cores than on current multicore processors. The main advantage of multicore systems is that raw performance increase can come from increasing the number of cores rather than frequency, which translates into a slower growth in power consumption. However, this approach represents a significant gamble because parallel programming science has not advanced nearly as fast as our ability to build parallel hardware.

[FIG1] ITRS Roadmap [1] for frequency, number of data processing elements (DPE), and overall performance. (Plot of 1/T intrinsic switching speed, number of DPE, and processing performance, normalized to 2007, against year of production from 2007 to 2022.)

General-purpose multicores are becoming necessary even in the realm of digital signal processing (DSP) where, in the past, one general-purpose control core marshaled many special-purpose application-specific integrated circuits (ASICs) as part of a “system on chip.” This is primarily due to the variety of applications and performance required from these chips. This has driven the need for more general-purpose processors. Recent examples would include software-defined radio (SDR) base stations, or cell phone processors that are required to support numerous codecs and applications, all with different characteristics, requiring a general programmable multicore.

ARCHITECTURE CLASSIFICATIONS

Multicore architectures can be classified in a number of ways. In this section we discuss five of the most distinguishing attributes: the application class, power/performance, processing elements, memory system, and accelerators/integrated peripherals.

APPLICATION CLASS

If a machine is targeted to a specific application domain, the architecture can be made to reflect this. The result is a design that is efficient for the domain in question but often ill-suited to other areas. The extreme example is an ASIC. Tuning to an application domain can have several positive consequences. Perhaps the most valuable is the potential for significant power savings. Conventional DSPs are a good example.

There are two broad classes of processing into which an application can fall: data processing dominated and control dominated.

DATA PROCESSING DOMINATED

Data processing-dominated applications contain many familiar types of applications including graphics rasterization, image processing, audio processing, and wireless baseband processing. Many of the classic signal processing algorithms are part of this group. The computation of these types of applications is typically a sequence of operations on a stream of data with little or no data reuse. The operations can frequently be performed in parallel and often require high throughput and performance to handle the large amounts of data. These kinds of applications favor designs that have as many processing elements as practical with regard to the desired power/performance ratio.

CONTROL PROCESSING DOMINATED

Control-dominated applications include file compression/decompression, network processing, and transactional query processing. The code for these types of applications tends to be dominated by conditional branches, complicating parallelism. The programs themselves often need to keep track of large amounts of state and often have a high amount of data reuse. These types of applications favor a more modest number of general-purpose processing elements to handle the unstructured nature of control-dominated code.

In almost all cases, no application can fit into these neat divisions, but execution phases of an application may. For instance, the H.264/AVC [4] video codec is data dominated when performing the block filter, but control dominated when compressing or decompressing video using context-adaptive binary arithmetic coding (CABAC) compression. It is valuable to think of applications as falling into these divisions to understand how different multicore design aspects can affect performance. An unbalanced architecture may do very well on the data-dominated portion of H.264/AVC, but be very inefficient for CABAC encoding/decoding, leading to less than desired performance.

POWER/PERFORMANCE

Many applications and devices have strict performance and power requirements. For instance, a mobile phone that must support video playback has a strict power budget, but it also has to meet certain performance characteristics.

Performance has been the traditional goal. In the past decade, power has joined performance as a first-class design constraint. This is, in large part, due to the rise of mobile phones and other forms of mobile computing where battery life and size are critical. More recently, power consumption has also become a concern for computers that are not mobile. The driving force behind this is the growth in data centers to support “cloud” computing. The typical general-purpose multicore processor is ideally suited to these centers, but these centers are now consuming more energy than heavy manufacturing in the United States. They are so large (Google now refers to them as warehouse-scale computers) that the power consumption of the basic multicore component is critical to the cost and operation.

[...] vector transpose in one instruction. The soft-core provider Tensilica has made this the main selling point for their Xtensa CPUs [7], offering customizable special-purpose instructions for specific designs.

MICROARCHITECTURE

The processing element microarchitecture governs, in many respects, the performance and power consumption that can be expected from the multicore. The microarchitecture of each processing element is often tailored to the application domain that is targeted by the multicore machine. Although the commercial offerings of the major chip manufacturers like Intel employ a number of identical cores in a homogeneous architecture, it is often advantageous to combine different types of processing elements into a heterogeneous architecture. The idea is again