Research Challenges for On-Chip Interconnection Networks

John D. Owens, University of California, Davis
William J. Dally, Stanford University
Ron Ho, Sun Microsystems
D.N. (Jay) Jayasimha, Intel Corporation
Stephen W. Keckler, University of Texas at Austin
Li-Shiuan Peh, Princeton University

On-chip interconnection networks are rapidly becoming a key enabling technology for commodity multicore processors and SoCs common in consumer embedded systems. Last year, the National Science Foundation initiated a workshop that addressed upcoming research issues in OCIN technology, design, and implementation and set a direction for researchers in the field.

VLSI technology's increased capability is yielding a more powerful, more capable, and more flexible computing system on a single processor die. The microprocessor industry is moving from single-core to multicore and eventually to many-core architectures, containing tens to hundreds of identical cores arranged as chip multiprocessors (CMPs) [1]. Another equally important direction is toward systems on a chip (SoCs), composed of many types of processors on a single chip. Microprocessor vendors are also pursuing mixed approaches that combine multiple identical cores with different cores, such as the AMD Fusion processors combining multiple CPU cores and a graphics core.

Whether homogeneous, heterogeneous, or hybrid, cores must be connected in a high-performance, flexible, scalable, design-friendly manner. The emerging technology that targets such connections is called an on-chip interconnection network (OCIN), also known as a network on chip (NoC), whose philosophy has been summarized as "route packets, not wires" [2]. Connecting components through an on-chip network has several advantages over dedicated wiring, potentially delivering high-bandwidth, low-latency, low-power communication over a flexible, modular medium. OCINs combine performance with design modularity, allowing the integration of many design elements on a single die.
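The "route packets, not wires" philosophy is easiest to see in a single routing step. The sketch below is our own minimal illustration, not taken from the article: it assumes a 2D mesh addressed by (x, y) coordinates and uses dimension-order (XY) routing, one common choice for on-chip networks, to pick the output port a router would use for a packet headed to a given destination. The function names, port labels, and coordinates are placeholders for this example.

```python
# Minimal sketch of dimension-order (XY) routing on a 2D mesh NoC.
# Illustrative only: coordinates, port names, and the routing policy
# are assumptions for this example, not details from the article.

def xy_route(cur, dst):
    """Return the output port a router at `cur` uses for a packet to `dst`.

    Packets travel fully in X first, then in Y; 'LOCAL' means the packet
    has arrived and is ejected to the attached core.
    """
    cx, cy = cur
    dx, dy = dst
    if dx > cx:
        return "EAST"
    if dx < cx:
        return "WEST"
    if dy > cy:
        return "NORTH"
    if dy < cy:
        return "SOUTH"
    return "LOCAL"

def path(src, dst):
    """List the routers a packet visits from src to dst (inclusive)."""
    hops = [src]
    cur = src
    while cur != dst:
        port = xy_route(cur, dst)
        step = {"EAST": (1, 0), "WEST": (-1, 0),
                "NORTH": (0, 1), "SOUTH": (0, -1)}[port]
        cur = (cur[0] + step[0], cur[1] + step[1])
        hops.append(cur)
    return hops

if __name__ == "__main__":
    # A packet from core (0, 0) to core (3, 2) visits 6 routers over 5 links.
    print(path((0, 0), (3, 2)))
```

Dimension-order routing on a mesh is deterministic and deadlock-free and needs almost no per-router state, which is one reason such simple schemes are attractive when router latency, area, and power budgets are tight.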
Although the benefits of OCINs are substantial, reaching their full potential presents numerous research challenges. In 2006, the National Science Foundation initiated a workshop to identify these challenges and to chart a course to solve them. The conclusions we present here are the work of all the attendees of the workshop, held last December at Stanford University. All the presentation slides, posters, and videos of the workshop talks are available online at http://www.ece.ucdavis.edu/~ocin06/program.html.

About the workshop

The 2006 Workshop on On- and Off-Chip Interconnection Networks for Multicore Systems, held at Stanford University on 6 and 7 December 2006, brought together about 50 of the leading researchers from academia and industry studying on-chip interconnection networks (OCINs). The NSF-initiated workshop featured invited presentations, poster presentations, and working groups. The 15 invited presentations gave a technology forecast, surveyed applications, and captured the current state of the art and identified gaps in it. The posters covered related topics for which time did not allow a plenary presentation. Each of the five working groups met for a total of four hours to assess one aspect of OCIN technology, to perform a gap analysis, and to develop a research agenda for that aspect of on-chip networks. Each working group then presented a briefing on its findings.

We greatly appreciate the dedication and energy of the workshop participants in defining the research agenda we present in this article. The technology working group included Dave Albonesi, Cornell University; Keren Bergman, Columbia University; Nathan Binkert, HP Labs; Shekhar Borkar, Intel; Chung-Kuan Cheng, UC San Diego; Danny Cohen, Sun Labs; Jo Ebergen, Sun Labs; and Ron Ho, Sun Labs. The system architectures working group members included Jose Duato, Polytechnic University of Valencia; Partha Kundu, Intel; Manolis Katevenis, University of Crete; Chita Das, Penn State; Sudhakar Yalamanchili, Georgia Tech; John Lockwood, Washington University; and Ani Vaidya, Intel. The microarchitectures working group included Luca Carloni, Columbia University; Steve Keckler, University of Texas at Austin; Robert Mullins, Cambridge University; Vijay Narayanan, Penn State; Steve Reinhardt, Reservoir Labs; and Michael Taylor, UC San Diego. The design tools working group included Luca Benini, University of Bologna; Mark Hummel, AMD; Olav Lysne, Simula Lab, Norway; Li-Shiuan Peh, Princeton; Li Shang, Queens University, Canada; and Mithuna Thottethodi, Purdue. The evaluation working group included Rajeev Balasubramaniam, University of Utah; Angelos Bilas, University of Crete; D.N. (Jay) Jayasimha, Intel; Rich Oehler, AMD; D.K. Panda, Ohio State University; Darshan Patra, Intel; Fabrizio Petrini, Pacific National Labs; and Drew Wingard, Sonics.

The generous support of the National Science Foundation (through the Computer Architecture Research and Computer Systems Research programs) and the University of California Discovery Program made the workshop possible. Bill Dally and John Owens chaired the workshop, Timothy Pinkston and Jan Rabaey provided suggestions for workshop direction, and Jane Klickman provided expert logistic and administrative support.

We found that three issues stand out as particularly critical challenges for OCINs: power, latency, and CAD compatibility. First, the power of OCINs implemented with current techniques is too high (by a factor of 10) to meet the expected needs of future CMPs. Fortunately, a combination of circuit and architecture techniques has the potential to reduce power to acceptable levels. Second, the latency of these networks is too large, leading to performance degradation when they are used to access on-chip memory. Research efforts to develop speculative microarchitectures that reduce latency through a router to a single clock, circuit techniques that increase signal velocity on channels, and network architectures that reduce the number of hops might overcome this problem. Third, many on-chip network circuit and architecture techniques are incompatible with modern design flows and CAD tools, making them unsuitable for use in SoCs. Research to provide library encapsulation of network components might provide compatibility.
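To make the latency and power concerns concrete, the following back-of-the-envelope model is our own illustration, not a result from the workshop: it estimates zero-load packet latency and energy per packet on a mesh from hop count, per-hop router and link delay, and serialization time. All parameter values (hop count, packet and flit sizes, energy coefficients) are assumed placeholders; the structure of the model is what matters.

```python
# Back-of-the-envelope zero-load latency and energy model for a mesh NoC.
# All parameter values below are illustrative placeholders, not measured
# data from the article or the workshop.

def zero_load_latency_cycles(hops, router_cycles, link_cycles,
                             packet_bits, flit_bits):
    """Head latency through `hops` routers plus serialization of the body."""
    head = hops * (router_cycles + link_cycles)
    serialization = (packet_bits + flit_bits - 1) // flit_bits  # ceil division
    return head + serialization

def energy_per_packet_pj(hops, packet_bits, e_router_pj_per_bit,
                         e_link_pj_per_bit):
    """Energy grows linearly with both hop count and packet size."""
    return hops * packet_bits * (e_router_pj_per_bit + e_link_pj_per_bit)

if __name__ == "__main__":
    hops = 6            # assumed average hop count on a many-core die
    packet_bits = 512   # one cache line plus header (assumed)
    flit_bits = 128     # channel width (assumed)

    # Compare an assumed 3-cycle router against a speculative single-cycle router:
    for router_cycles in (3, 1):
        lat = zero_load_latency_cycles(hops, router_cycles,
                                       link_cycles=1,
                                       packet_bits=packet_bits,
                                       flit_bits=flit_bits)
        print(f"router={router_cycles} cycle(s): {lat} cycles zero-load")

    # Placeholder energy coefficients in pJ/bit:
    print(energy_per_packet_pj(hops, packet_bits,
                               e_router_pj_per_bit=0.1,
                               e_link_pj_per_bit=0.05), "pJ per packet")
```

Even this crude model shows the two levers named above: cutting per-router delay from several cycles to a single cycle and reducing the hop count both shrink the dominant head-latency term, while energy per packet scales with hop count regardless of how fast the routers run.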
The workshop identified five broad research areas and the key issues in each area:

- OCIN technology and circuits. How will technology (such as the CMOS roadmap from the International Technology Roadmap for Semiconductors) and circuit design affect on-chip network design?
- OCIN microarchitecture. What microarchitecture is needed for on-chip routers and network interfaces to meet latency, area, and power constraints?
- OCIN system architecture. What system architecture (topology, routing, flow control, interfaces) is best suited for on-chip networks?
- CAD and design tools for OCINs. What CAD tools are needed to design on-chip networks and systems using on-chip networks?
- Evaluation and driving applications for OCINs. How should on-chip networks be evaluated? What will be the dominant workloads for OCINs in five to 10 years?

Technology-driving applications

At the workshop, we considered two representative technology-driving applications for on-chip networks.

Applications for CMP systems

Large-scale, enterprise-class systems assembled as CMP-style machines require a high-performance network to attain the throughput important to their applications. For these machines, users will be willing to spend on power to achieve performance, at least to reasonable levels, such as to the air-cooled limit for chips. Cost will be important because it will determine how many racks can be purchased for a data center, but it will not be the overriding factor. With the emergence of graphics-based applications targeted to the end user, even desktop systems will have general- and special-purpose computing cores and other platform elements integrated on a die. These designs, which require an appropriate on-die interconnect,

back to a central control location. Communication devices for soldiers will have similar computation, storage, and communication requirements. Other possible applications include real-time medical communication devices, handheld gaming devices, and