A MATLAB Compiler for Distributed, Heterogeneous, Reconfigurable

A MATLAB Compiler For Distributed, Heterogeneous, Reconfigurable Computing Systems P. Banerjee, N. Shenoy, A. Choudhary, S. Hauck, C. Bachmann, M. Haldar, P. Joisha, A. Jones, A. Kanhare A. Nayak, S. Periyacheri, M. Walkden, D. Zaretsky Electrical and Computer Engineering Northwestern University 2145 Sheridan Road Evanston, IL-60208 [email protected] Abstract capabilities and are coordinated to perform an application whose subtasks have diverse execution requirements. One Recently, high-level languages such as MATLAB have can visualize such systems to consist of embedded proces- become popular in prototyping algorithms in domains such sors, digital signal processors, specialized chips, and field- as signal and image processing. Many of these applications programmable gate arrays (FPGA) interconnected through whose subtasks have diverse execution requirements, often a high-speed interconnection network; several such systems employ distributed, heterogeneous, reconfigurable systems. have been described in [9]. These systems consist of an interconnected set of heteroge- A key question that needs to be addressed is how to map neous processing resources that provide a variety of archi- a given computation on such a heterogeneous architecture tectural capabilities. The objective of the MATCH (MATlab without expecting the application programmer to get into Compiler for Heterogeneous computing systems) compiler the low level details of the architecture or forcing him/her to project at Northwestern University is to make it easier for understand the finer issues of mapping the applications on the users to develop efficient codes for distributed, hetero- such a distributed heterogeneous platform. Recently, high- geneous, reconfirgurable computing systems. Towards this level languages such as MATLAB have become popular in end we are implementing and evaluating an experimental prototyping algorithms in domains such as signal and im- prototype of a software system that will take MATLAB de- age processing, the same domains which are the primary scriptions of various applications, and automatically map users of embedded systems. MATLAB provides a very them on to a distributed computing environment consisting high level language abstraction to express computations in a of embedded processors, digital signal processors and field- functional style which is not only intuitive but also concise. programmable gate arrays built from commercial off-the- However, currently no tools exist that can take such high- shelf components. In this paper, we provide an overview level specifications and generate low level code for such a of the MATCH compiler and discuss the testbed which is heterogeneous testbed automatically. being used to demonstrate our ideas of the MATCH com- The objective of the MATCH (MATlab Compiler for piler. We present preliminary experimental results on some distributed Heterogeneous computing systems) compiler benchmark MATLAB programs with the use of the MATCH project [3] at Northwestern University is to make it easier compiler. for the users to develop efficient codes for distributed heterogeneous computing systems. Towards this end we are implementing and evaluating an experimental prototype of 1 Introduction a software system that will take MATLAB descriptions of various embedded systems applications, and automatically A distributed, heterogeneous, reconfigurable computing map them on to a heterogeneous computing environment system consists of a distributed set of diverse processing re- consisting of field-programmable gate arrays, embedded sources which are connected by a high-speed interconnec- processors and digital signal processors built from commer- tion network; the resources provide a variety of architectural cial off-the-shelf components. An overview of the easy-to- 1 Development Environment: SUN Solaris 2, HP HPUX and MATLAB Windows Algorithm Ultra C/C++ for MVME Local Ethernet PACE C for TMS320 XACT for XILINX MATCH Compiler Transtech TDMB Annapolis Motorola MVME-2604 Wildchild board 428 DSP board Force 5V embedded boards •Nine XILINX •Four TDM 411 85 MHz •200 MHz PowerPC 604 4010 FPGAs • 60 MHz TMS MicroSPARC •64 MB RAM •2 MB RAM 320C40 DSP, CPU •OS-9 OS •50 MHz • 8 MB RAM 64 MB RAM COTS •Ultra C compiler •TI C compiler •Wildfire COTS COTS COTS software Embedded DSP FPGA Controller boards boards boards VME bus Distributed Heteregeous envvironment Figure 1. A graphical representation of the Figure 2. Overview of the Testbed to Demon- objectives of our MATCH compiler strate the MATCH Compiler use programming environment that we are trying to accom- We use an off-the-shelf multi-FPGA board from An- plish through our MATCH compiler is shown in Figure 1. napolis Microsystems [2] as the reconfigurable part of our TM The goal of our compiler is to generate efficient code auto- testbed. This W il dC hil d board has 8 Xilinx 4010 FP- matically for such a heterogeneous target that will be within GAs (each with 400 CLBs, 512KB local memory) and a a factor of 2-4 of the best manual approach with regard Xilinx 4028 FPGAs (with 1024 CLBs, 1MB local mem- to two optimization objectives: (1) Optimizing resources ory). A Transtech TDM-428 board is used as a DSP re- (such as type and number of processors, FPGAs, etc) under source. This board has four Texas Instruments TMS320C40 performance constraints (such as delays, and throughput) processors (each running at 60MHz, with 8MB RAM) in- (2) Optimizing performance under resource constraints. terconnected by an on board 8 bit wide 20MB/sec com- The paper is organized as follows. Section 2 provides an munication network. The other general purpose compute overview of the testbed which is being used to demonstrate resource employed in the MATCH testbed is a pair of Mo- our ideas of the MATCH compiler. We describe the vari- torola MVME2604 boards. Each of these boards hosts a ous components of the MATCH compiler in Section 3. We PowerPC-604 processor (each running at 200 MHz, with present preliminary experimental results of our compiler in 64 MB local memory) running Microware’s OS-9 operating Section 4. We compare our work with other related research system. These processors can communicate among them- in Section 5, and conclude the paper in Section 6. selves via a 100BaseT ethernet interface. A Force 5V board with MicroSPARC-II processor run- 2OverviewofMATCHTestbed ning Solaris 2.6 operating system forms one of the compute resources that also plays the role of a main controller of the testbed. This board can communicate with other boards ei- The testbed that we have designed to work with the ther via the VME bus or via the ethernet interface. MATCH project consists of four types of compute resources. These resources are realized by using off-the-shelf boards plugged into a VME cage. The VME bus provides 3 The MATCH Compiler the communication backbone for some control applications. In addition to the reconfigurable resources, our testbed also We will now discuss various aspects of the MATCH incorporates conventional embedded and DSP processors compiler that automatically translates the MATLAB pro- to handle special needs of some of the applications. Real grams and maps them on to different parts of the target sys- life applications often have parts of the computations which tem shown in Figure 2. The overview of the compiler is may not be ideally suited for the FPGAs. They could be ei- shown in Figure 3. ther control intensive parts or could be even complex float- MATLAB is basically a function oriented language and ing point applications. Such computations are performed most of the MATLAB programs can be written using pre- by these embedded and DSP processors. An overview of defined functions. These functions can be primitive func- the testbed is shown in Figure 2. tions or application specific. In a sequential MATLAB 2 developed a shape algebra which can be used to infer types and shapes of variables in MATLAB programs [6]. Often, it may not be possible to infer these attributes, in which case our compiler takes the help of user directives which allow the programmer to explicitly declare the MATLAB type/shape information. The directives for the MATCH Program compiler start with %!match and hence appear as comments Directives and MATLAB Libraries to other MATLAB interpreters/compilers. An extensive set Partitioner Automation on various Targets of directives have been designed for the MATCH compiler [3]. MATLAB to MATLAB to C/MPI MATLAB to C/MPI The maximum performance from a parallel heteroge- with Library Calls RTL VHDL with Library Calls neous target machine such as the one shown in Figure 2 can only be extracted by efficient mapping of various parts of C compiler for C compiler VHDL to embedded for DSP FPGA Synthesis the MATLAB program onto the appropriate parts of the tar- Object code for get. The MATCH compiler incorporates automatic mecha- Object code for Binaries for TMS 320C40 PowerPC 604 XILINX 4010 nisms to perform such mapping. It also provides ways using which an experienced programmer well versed with the target characteristics can guide the compiler to fine tune the mapping in the form of directives. Figure 3. The MATCH Compiler Components 3.1 Compilation Overview program, these functions are normally implemented as se- The first step in producing parallel code from a MAT- quential codes running on a conventional processor. In the LAB program involves parsing the input MATLAB pro- MATCH compiler however, these functions need to be im- gram based on a formal grammar and building

A MATLAB Compiler for Distributed, Heterogeneous, Reconfigurable

Reconfigurable Computing

Syllabus: EEL4930/5934 Reconfigurable Computing

Architecture and Programming Model Support for Reconfigurable Accelerators in Multi-Core Embedded Systems »

Hardware Platforms for Embedded Computing Graphics: © Alexandra Nolte, Gesine Marwedel, 2003 Universität Dortmund Importance of Energy Efficiency

Reconfigurable Computing

A Reconfigurable Convolutional Neural Network-Accelerated

Reconfigurable Computing Brittany Ransbottom Benjamin Wheeler What Is Reconfigurable Computing?

Elements of High-Performance Reconfigurable Computing*

Coarse-Grained Reconfigurable Computing with the Versat

Reconfigurable Computing: Architectures and Design Methods

Dionysios Diamantopoulos

Power Efficiency Benchmarking of a Partially Reconfigurable, Many-Tile System Implemented on a Xilinx Virtex-6 FPGA