Research Challenges for On-Chip Interconnection Networks

John D. Owens, University of California, Davis
William J. Dally, Stanford University
Ron Ho, Sun Microsystems
D.N. (Jay) Jayasimha, Intel Corporation
Stephen W. Keckler, University of Texas at Austin
Li-Shiuan Peh, Princeton University

On-chip interconnection networks are rapidly becoming a key enabling technology for commodity multicore processors and SoCs common in consumer embedded systems. Last year, the National Science Foundation initiated a workshop that addressed upcoming research issues in OCIN technology, design, and implementation and set a direction for researchers in the field.

Published by the IEEE Computer Society. 0272-1732/07/$20.00 © 2007 IEEE. IEEE Micro, September–October 2007.

VLSI technology’s increased capability is yielding a more powerful, more capable, and more flexible computing system on a single processor die. The microprocessor industry is moving from single-core to multicore and eventually to many-core architectures, containing tens to hundreds of identical cores arranged as chip multiprocessors (CMPs).1 Another equally important direction is toward systems on a chip (SoCs), composed of many types of processors on a single chip. Microprocessor vendors are also pursuing mixed approaches that combine multiple identical cores with different cores, such as the AMD Fusion processors combining multiple CPU cores and a graphics core.

Whether homogeneous, heterogeneous, or hybrid, cores must be connected in a high-performance, flexible, scalable, design-friendly manner. The emerging technology that targets such connections is called an on-chip interconnection network (OCIN), also known as a network on chip (NoC), whose philosophy has been summarized as “route packets, not wires.”2 Connecting components through an on-chip network has several advantages over dedicated wiring, potentially delivering high-bandwidth, low-latency, low-power communication over a flexible, modular medium. OCINs combine performance with design modularity, allowing the integration of many design elements on a single die.

Although the benefits of OCINs are substantial, reaching their full potential presents numerous research challenges. In 2006, the National Science Foundation initiated a workshop to identify these challenges and to chart a course to solve them. The conclusions we present here are the work of all the attendees of the workshop, held last December at Stanford University. All the presentation slides, posters, and videos of the workshop talks are available online at http://www.ece.ucdavis.edu/~ocin06/program.html.

We found that three issues stand out as particularly critical challenges for OCINs: power, latency, and CAD compatibility. First, the power of OCINs implemented with current techniques is too high (by a factor of 10) to meet the expected needs of future CMPs. Fortunately, a combination of circuit and architecture techniques has the potential to reduce power to acceptable levels. Second, the latency of these networks is too large, leading to performance degradation when they are used to access on-chip memory. Research efforts to develop speculative microarchitectures that reduce latency through a router to a single clock, circuit techniques that increase signal velocity on channels, and network architectures that reduce the number of hops might overcome this problem. Third, many on-chip network circuit and architecture techniques are incompatible with modern design flows and CAD tools, making them unsuitable for use in SoCs. Research to provide library encapsulation of network components might provide compatibility.

The workshop identified five broad research areas and the key issues in each area:

- OCIN technology and circuits. How will technology (such as the CMOS roadmap from the International Technology Roadmap for Semiconductors) and circuit design affect on-chip network design?
- OCIN microarchitecture. What microarchitecture is needed for on-chip routers and network interfaces to meet latency, area, and power constraints?
- OCIN system architecture. What system architecture (topology, routing, flow control, interfaces) is best suited for on-chip networks?
- CAD and design tools for OCINs. What CAD tools are needed to design on-chip networks and systems using on-chip networks?
- Evaluation and driving applications for OCINs. How should on-chip networks be evaluated? What will be the dominant workloads for OCINs in five to 10 years?

About the workshop

The 2006 Workshop on On- and Off-Chip Interconnection Networks for Multicore Systems, held at Stanford University on 6 and 7 December 2006, brought together about 50 of the leading researchers from academia and industry studying on-chip interconnection networks (OCINs). The NSF-initiated workshop featured invited presentations, poster presentations, and working groups. The 15 invited presentations gave a technology forecast, surveyed applications, and captured the current state of the art and identified gaps in it. The posters covered related topics for which time did not allow a plenary presentation. Each of the five working groups met for a total of four hours to assess one aspect of OCIN technology, to perform a gap analysis, and to develop a research agenda for that aspect of on-chip networks. Each working group then presented a briefing on its findings.

We greatly appreciate the dedication and energy of the workshop participants in defining the research agenda we present in this article. The technology working group included Dave Albonesi, Cornell University; Keren Bergman, Columbia University; Nathan Binkert, HP Labs; Shekhar Borkar, Intel; Chung-Kuan Cheng, UC San Diego; Danny Cohen, Sun Labs; Jo Ebergen, Sun Labs; and Ron Ho, Sun Labs. The system architectures working group members included Jose Duato, Polytechnic University of Valencia; Partha Kundu, Intel; Manolis Katevenis, University of Crete; Chita Das, Penn State; Sudhakar Yalamanchili, Georgia Tech; John Lockwood, Washington University; and Ani Vaidya, Intel. The microarchitectures working group included Luca Carloni, Columbia University; Steve Keckler, University of Texas at Austin; Robert Mullins, Cambridge University; Vijay Narayanan, Penn State; Steve Reinhardt, Reservoir Labs; and Michael Taylor, UC San Diego. The design tools working group included Luca Benini, University of Bologna; Mark Hummel, AMD; Olav Lysne, Simula Lab, Norway; Li-Shiuan Peh, Princeton; Li Shang, Queens University, Canada; and Mithuna Thottethodi, Purdue. The evaluation working group included Rajeev Balasubramaniam, University of Utah; Angelos Bilas, University of Crete; D.N. (Jay) Jayasimha, Intel; Rich Oehler, AMD; D.K. Panda, Ohio State University; Darshan Patra, Intel; Fabrizio Petrini, Pacific National Labs; and Drew Wingard, Sonics.

The generous support of the National Science Foundation (through the Computer Architecture Research and Computer Systems Research programs) and the University of California Discovery Program made the workshop possible. Bill Dally and John Owens chaired the workshop, Timothy Pinkston and Jan Rabaey provided suggestions for workshop direction, and Jane Klickman provided expert logistic and administrative support.

Technology-driving applications

At the workshop, we considered two representative technology-driving applications for on-chip networks.

Applications for CMP systems

Large-scale, enterprise-class systems assembled as CMP-style machines require a high-performance network to attain the throughput important to their applications. For these machines, users will be willing to spend on power to achieve performance, at least to reasonable levels, such as to the air-cooled limit for chips. Cost will be important because it will determine how many racks can be purchased for a data center, but it will not be the overriding factor. With the emergence of graphics-based applications targeted to the end user, even desktop systems will have general- and special-purpose computing cores and other platform elements integrated on a die. These designs, which require an appropriate on-die interconnect, ...

... back to a central control location. Communication devices for soldiers will have similar computation, storage, and communication requirements. Other possible applications include real-time medical communication devices, handheld gaming devices, and
