Survey on High Performance Reconfigurable Soft - Core Processor for SIMD Applications

PROTEUS JOURNAL ISSN/eISSN: 0889-6348 Survey on High Performance Reconfigurable Soft - Core Processor for SIMD Applications Maheswari. R Pattabiraman. V Kanagaraj Venusamy Associate Professor Professor Lecturer SCOPE, Vellore Institute of SCOPE, Vellore Institute of University of Technology and Technology, Chennai, Tamilnadu, Technology, Chennai, Tamilnadu, Applied Sciences - AlMusanaah, India. India. Muscat, Sultanate of Oman [email protected] [email protected] [email protected] ABSTRACT—The prospective need of SIMD (Single requirement in most of the modern embedded systems. Instruction and Multiple Data) applications like video and The modern processor comes in a wide spectrum of image processing systems requires a processor with greater architecture ranging from hard-core to soft-core processor flexibility and computation to deliver high quality real time with CISC (Complex Instruction Set) and RISC (Reduced output. The main goal of work is to offer a wider survey over high performance processor through various reconfigurable Instruction Set) architecture. The general purpose hard- techniques targeting on real time SIMD dataset. The real- core processor like Intel, Motorola 68000, etc., falls on its time multimedia streaming with extensive parallel data one side and reconfigurable FPGA based soft-core demands high computational processors with low power processor like MicroBlaze, SPARC, etc., falls on the other consumption. The modern processors with flexible computation supports on-the-fly system redesign with side of the spectrum. Each core contains various ranges of reconfiguration techniques at reduced cost. It also functional sub-blocks implemented either in FPGA (Field- accommodates new functionalities sustaining the increasing Programmable Gate Array) or in an ASIC (Application demand for processor with less NREs (Non-Recurring Specific Integrated Circuits). The major advantage of Engineering) cost and shorter time-to-market. The processors synthesizing the processor in FPGA rather than ASIC is with flexible platform permit the designer to incorporate the demanding changes in the existing standards of any that it provides re-configurability when altering the design application like wireless communication, telecommunication with no significant cost [1]. etc. Adaptive computing is the new processor paradigm evolved in early 90s to bridge the space which exists between A.MOTIVATION the generic processor and application specific processor. It In this embedded era, 90% of applications (mobile phones, also supports the inclusion of specific hardware accelerator in the existing RISC (Reduced Instruction Set Computer) based video surveillance systems, medical equipment like architecture to improve the performance of particular medical imaging systems, patient monitoring systems) application. In the last decade, processor based embedded target high performance computing at low power systems like SoC (System–on–Chip) have become more consumption. To accomplish this multimedia data prevalent due to their high performance at low power. Many embedded application designers like mobile and smart phone processing, there is a huge demand for high performance developers namely Philips N-Experia, Intel PXA etc., and low power processor. The two major solutions to explored SoC as computational devices. Thus this work improve the performance of the processor are through summarizes a literature survey of all the related works such hardware and software. Hardware solutions always ensure as RC (Reconfigurable Computing) , HPRC (High high performance and low power than software solutions, Performance Reconfigurable Computing), FPGA based processors, open source soft-core processor, reconfigurable but with very less flexibility to reconfigurability. Here architecture, parallel computing and SIMD. The detailed comes the need for processor architecture, literature survey analyzing the impact of current techniques targeting processor performance is illustrated. Which has to balance between the power as well as performance of multimedia dataset. There exists a KEYWORDS—Reconfigurable Computing, High Performance significant performance gap between commercial SCP like Computing, FPGA, Soft-core Processor MicroBlaze (Xilinx), Nios (Altera) and open source SCP like OR1200, SPARC. As an example MicroBlaze and I . INTRODUCTION Nios have 31-37% lower run times than OR1200 (Gartner Modern embedded applications are ironic in Dataquest). To fill this gap, there is a need for multimedia data involving audio and video processing. reconfigurable high performance soft-core processor This real time streaming with extensive parallel data architecture. To support DLP and ILP in multimedia demands for high computational devices with low power applications, many techniques such as SIMD, SMP consumption. These high performance processors work as (Symmetric Multi-Processor), VLIW (Very Long the driving tool exploring the resource and design relevant 1 VOLUME 11 ISSUE 9 2020 http://www.proteusresearch.org/ Page No: 713 PROTEUS JOURNAL ISSN/eISSN: 0889-6348 Instruction Word), super-scalar instruction sets are being utilized in the processor. Out of these techniques, this work focused on the wider study on SIMD instruction set along with super-scalar architecture as it provides greater flexibility in the reconfigurable architecture with improved performance. B.Processor Processor architecture works as fundamental component Fig.1. when exploring the resource and design relevant Impact of embedded systems in various application domains requirement in most of the modern embedded systems. As a result, most research investigations aim to achieve good D.FPGA improvement in processor performance at reasonably low An FPGA is a device with large collection of power dissipation. The hard-core processors provide high programmable logic cells interconnected through a performance for real-time embedded applications, but they network. The high density interconnections across the are characterized by non-reconfigurability feature. Fig.1 logic cells provides higher flexibility to implement any represents the trade-off between flexibility and computational model at low power consumption [2]. Since performance of processors based on its implementation. It these logic cells are independent blocks, exploiting is implicit that General Purpose Processor (GPP) proves to temporal and spatial parallelism is very efficient, i.e. it provide more flexibility adapting itself to configure in allows many processing elements to execute concurrently. many fields, but with degradation in performance. An They are typically described by register transfer level Application Specific Processor gives better performance (RTL) abstractions using an HDL (Hardware Description than GPP with optimized instruction set. Whereas ASIC Language). The building blocks of any FPGA are CLB gives higher performance than other implementation, (Configurable Logic Block), input and output blocks and altering its feature after fabrication is not feasible one. interconnection network. Through direct interconnect, all Thus reconfigurable processor lies in between the GPP and the adjacent CLBs are interconnected whereas through ASIC provides greater flexibility with better performance. programmable switching matrix (PSM), the horizontal and Almost all digital circuits are classically partitioned into vertical CLBs are intercomnected. various functional modules called as cores. Each core contains various ranges of functional sub-blocks implemented either in FPGA or in an ASIC. C .EMBEDDED SYSTEMS Embedded system is a computational system with dedicated function to perform specific job. It integrates three components into functional units such as: • Hardware specific for the Application• Embedded software • RTOS (Real Time Operating Systems). Most of the functionality of embedded systems will be implemented using software. Unlike GPP (General Purpose Processor), embedded systems demand variety of hardware design to quench the Figure 2: FPGA functional view need of various applications. The major challenge in The internal architecture of FPGA is shown in Fig. 2. Each designing an embedded system is that it needs to work and slice of FPGA contains flip-flops, multiplexers and sustain in a diverse environment to handle various types of collections of digital logic gates called LUTs (Look-Up tasks and applications. Thus, the embedded designer needs Table). The LUTs are alsocalled functional generator of to address various issues such as high-computation, low FPGA. These FPGAs are reprogrammed dynamically power, reliability, flexibility, predictability and memory during compile time or execution time using virtual management. The occupancy of embedded systems in hardware. The CLB functional view of FPGA is shown in various fields is given in Fig 1. Fig.3. They are typically described by register transfer level (RTL) abstractions using HDL. 2 VOLUME 11 ISSUE 9 2020 http://www.proteusresearch.org/ Page No: 714 PROTEUS JOURNAL ISSN/eISSN: 0889-6348 Fig.3. CLB functional view The FPGA 7 allows the user to program the hardware even electronics, video and image processing, etc. were drifting during post-fabrication. In reality, FPGA exploits dynamic to the FPGA. reconfigurable fabric design methods to improve Fig.4.FPGA with Embedded Processor and without Embedded Processor performance of the targeted architecture (Wang et al. Survey (2009)). The major benefit of synthesizing the design in D.Embedded Processors FPGA is that

Survey on High Performance Reconfigurable Soft - Core Processor for SIMD Applications

Latticemico32 Development Kit User's Guide for Latticeecp

Latticemico32 Software Developer User Guide

Openpiton: an Open Source Manycore Research Framework

Network on Chip for FPGA Development of a Test System for Network on Chip

FPGA Design Guide

Embedded Networks on Chip for Field-Programmable Gate Arrays by Mohamed Saied Abdelfattah a Thesis Submitted in Conformity With

An Early Performance Evaluation of Many Integrated Core Architecture Based SGI Rackable Computing System

A Comparison of Four Series of Cisco Network Processors

MYTHIC MULTIPLIES in a FLASH Analog In-Memory Computing Eliminates DRAM Read/Write Cycles

Small Soft Core up Inventory ©2019 James Brakefield Opencore and Other Soft Core Processors Reverse-U16 A.T

Using Inspiration from Synaptic Plasticity Rules to Optimize Traffic Flow in Distributed Engineered Networks Arxiv:1611.06937V1

Mico32 White Paper 2008-02