Automatic Instruction-Set Architecture Synthesis for VLIW Processor Cores in the ASAM Project
Total Page:16
File Type:pdf, Size:1020Kb
Automatic Instruction-set Architecture Synthesis for VLIW Processor Cores in the ASAM Project Roel Jordansa,∗, Lech J´o´zwiaka, Henk Corporaala, Rosilde Corvinob aEindhoven University of Technology, Postbus 513, 5600MB Eindhoven, The Netherlands bIntel Benelux B.V., Capronilaan 37, 1119NG Schiphol-Rijk Abstract The design of high-performance application-specific multi-core processor systems still is a time consuming task which involves many manual steps and decisions that need to be performed by experienced design engineers. The ASAM project sought to change this by proposing an auto- matic architecture synthesis and mapping flow aimed at the design of such application specific instruction-set processor (ASIP) systems. The ASAM flow separated the design problem into two cooperating exploration levels, known as the macro-level and micro-level exploration. This paper presents an overview of the micro-level exploration level, which is concerned with the anal- ysis and design of individual processors within the overall multi-core design starting at the initial exploration stages but continuing up to the selection of the final design of the individual proces- sors within the system. The designed processors use a combination of very-long instruction-word (VLIW), single-instruction multiple-data (SIMD), and complex custom DSP-like operations in or- der to provide an area- and energy-efficient and high-performance execution of the program parts assigned to the processor node. In this paper we present an overview of how the micro-level design space exploration interacts with the macro-level, how early performance estimates are used within the ASAM flow to determine the tasks executed by each processor node, and how an initial processor design is then proposed and refined into a highly specialized VLIW ASIP. The micro-level architecture exploration is then demonstrated with a walk-through description of the process on an example program kernel to further clarify the exploration and architecture specialization process. The main findings of the experimental research are that the presented method enables an auto- matic instruction-set architecture synthesis for VLIW ASIPs within a reasonable exploration time. Using the presented approach, we were able to automatically determine an initial architecture pro- totype that was able to meet the temporal performance requirements of the target application. Subsequently, refinement of this architecture considerably reduced both the design area (by 4x) and the active energy consumption (by 2x). Keywords: Very Long Instruction Word (VLIW), Application Specific Instruction-set Processor (ASIP), Instruction-set architecture synthesis 1. Introduction The recent nano-dimension semiconductor technology nodes enable implementation of complex (heterogeneous) parallel processors and very complex multi-processor systems on a single chip (MPSoCs). Those MPSoCs may involve tens of different complex parallel processors, substantial memory and communication resources, and can realize complex high-performance computations ∗Corresponding author Email address: [email protected] (Roel Jordans) Preprint submitted to Microprocessors and Microsystems April 19, 2017 in an energy efficient way. This facilitates a rapid progress in mobile and autonomous comput- ing, global networking and wireless communication which, combined with a substantial progress in sensor and actuator technologies, creates new important opportunities. Many traditional ap- plications can now be served much better, but what is more important, numerous new sorts of smart communicating mobile and autonomous cyber-physical systems became technologically fea- sible and economically justified. Various systems performing monitoring, control, diagnostics, communication, visualization or combination of these tasks, and representing (parts of) different mobile, remote or poorly accessible objects, installations, machines, vehicles or devices, or even being wearable or implantable in human or animal bodies can serve as examples. A new wave of information technology revolution is arriving that started to create much more coherent and fit to use modern smart communicating cyber-physical systems (CPS) that are part of the internet of things (IoT). The rapidly developing markets of the modern CPS and IoT represent great opportunities, but these opportunities come with the high system complexity and stringent requirements of many modern CPS applications. Two essential characteristics of the mobile CPS are: • the requirement of (ultra-)low energy consumption, often combined with the requirement of high performance, and • heterogeneity, in the sense of convergence and combination of various earlier separated ap- plications, systems and technologies of different sorts in one system, or even in a single chip or package. Smart wearable devices represent a huge heterogeneous CPS system area, covering many differ- ent fields and kinds of applications (from mobile personal computing and communications, through security and safety systems to wireless health-care or sport applications), with each application having its different specific requirements. Also a modern car involves numerous sub/systems im- posing different specific requirements. What is however common to virtually all these applications is that they are autonomous regarding their limited energy sources, and therefore they are re- quired to ensure (ultra-)low energy consumption. Some of these applications do not impose any high demands in computing and communication performance, and can satisfactorily be served with simple micro-controllers and a short distance communication with e.g. a smartphone. Some other applications impose more substantial computing and communication performance requirements and need more sophisticated low-power micro-controllers or micro-controller MPSoCs, equipped a. o. with on-chip embedded memories, multiple sensors, A/D converters and Bluetooth Low Energy, ZigBee or multi-standard communication. However, many of the modern complex and highly-demanding mobile and autonomous CPS impose difficult to satisfy ultra-high demands for which the micro-controller or general-purpose processor based solutions are not satisfactory. These are mainly applications that are required to provide continuous autonomous service in a long time and involve big instant data generated/- consumed by video sensors/actuators and other multi-sensors/actuators, and in consequence, at the same time demand (ultra-)low energy consumption (often below 1 W) and (ultra-)high per- formance in computing and communication (often above 1 Gbps). Examples of such applications include: mobile personal computing, communication and media, video sensing and monitoring, computer vision, augmented reality, image and other multi-sensor processing for health-care and well-being, mobile robots, intelligent car applications, etc. Specifically, smart cars and various wearable systems to a growing degree involve big instant data from multiple complex sensors (e.g. camera, radar, lidar, ultrasonic, EEG, sensor network tissues, etc.) or from other systems. To process these big data in real-time, they demand a guaranteed (ultra-)high performance. Being used for safety-critical applications and demanded to provide continuous autonomous service in a long time, they are required to be highly reliable, safe and secure. The (ultra-) high demands of the modern mobile CPS combined with todays unusual silicon and systems complexity result in serious system development challenges, such as: • guaranteeing the real-time high performance, while at the same time satisfying the require- ments of (ultra-)low energy consumption, and high safety, security and dependability; 2 • accounting in design for a lot of aspects and changed relationships among aspects (e.g. leakage power, negligible in the past, is a very serious issue now); • complex multi-level multi-objective optimization (e.g. processing power versus energy con- sumption and silicon area) and adequate resolution of numerous complex design trade-offs (e.g. between various conflicting requirements, solutions at different design levels, or solutions for different system parts); • reduction of the design productivity gap for the increasingly complex and sophisticated systems; • reduction of the time-to-market and development costs without compromising the system quality, etc. To satisfy the above discussed demands and overcome the challenges, a substantial system ar- chitecture and design technology adaptation is necessary. New sophisticated high-performance and low-power heterogeneous computing architectures are needed involving highly-parallel application- specific instruction-set processors (ASIPs) and/or HW accelerators, as well as, new effective design methods and design automation tools are needed to efficiently create the sophisticated architec- tures and perform the complex processes of application mapping on those architectures. The systemic realizations of complex and highly demanding applications, for which the cus- tomizable multi-ASIP MPSoC technology is especially suitable, demand the performance and energy usage levels comparable to those of ASICs, while requiring a short time-to-market and remaining cost-effective. Satisfaction of these stringent and often conflicting application demands requires construction of highly-optimized application-specific hardware architectures and corre- sponding software structures mapped on the architectures. This can be achieved through an effi- cient exploitation of different kinds of parallelism