A3MAP: Architecture-Aware Analytic Mapping for Networks-On-Chip Wooyoung Jang and David Z
Total Page:16
File Type:pdf, Size:1020Kb
6C-2 A3MAP: Architecture-Aware Analytic Mapping for Networks-on-Chip Wooyoung Jang and David Z. Pan Department of Electrical and Computer Engineering University of Texas at Austin [email protected], [email protected] Abstract - In this paper, we propose a novel and global A3MAP formulation, we seek to embed a task graph into the metric (Architecture-Aware Analytic Mapping) algorithm applied to space of network. Then, the quality of task mapping is NoC (Networks-on-Chip) based MPSoC (Multi-Processor measured by the total distortion of metric embedding. System-on-Chip) not only with homogeneous cores on regular Through this formulation, our A3MAP can map a task mesh architecture as done by most previous mapping adaptively to any different sized tile both on a algorithms but also with heterogeneous cores on irregular mesh or custom architecture. As a main contribution, we develop a regular/irregular mesh and on a custom network. Fig. 1 simple yet efficient interconnection matrix that models any task shows the methodology of our A3MAP. Given a task graph graph and network. Then, task mapping problem is exactly and a network as inputs, an interconnection matrix that can formulated to an MIQP (Mixed Integer Quadratic model any task graph and network along interconnection is Programming). Since MIQP is NP-hard [15], we propose two generated. Then, task mapping problem is exactly effective heuristics, a successive relaxation algorithm and a formulated to an MIQP (Mixed Integer Quadratic genetic algorithm. Experimental results show that A3MAP by Programming) and is solved by two effective heuristics since the successive relaxation algorithm reduces an amount of the MIQP is NP-hard [15]. One is successive relaxation of traffic up to 5.7%, 16.1% and 7.3% on average in regular mesh, MIQP to a sequence of QP (Quadratic Programming) as a irregular mesh and custom network, respectively, compared to the previous state-of-the-art work [1]. A3MAP by the genetic fast algorithm and the other is a genetic algorithm that is algorithm reduces more traffic up to 8.8%, 29.4% and 16.1 % efficient random search algorithm to find better mapping on average than [1] in regular mesh, irregular mesh and solution. Importantly, our framework not only enables global, custom network, respectively even if its runtime is longer. rather than local, optimization but also maps a task to both a regular mesh and an irregular mesh/custom network. I. Introduction To the best of our knowledge, this is the first work that addresses task mapping in an irregular mesh network and a So far, most of the NoC (Networks-on-Chip) based custom network. The rest of this paper is organized as MPSoCs (Multi-Processor System-on-Chip) targeting follows: Section 2 reviews related works. Section 3 presents general-purposed computing favor a regular mesh network our novel A3MAP formulation. In Section 4, our successive [1-9]. The regular mesh network lets task-to-tile mapping relaxation algorithm and genetic algorithm are proposed to easier, increases routing efficiency, provides desirable solve A3MAP formulated to a MIQP. Section 5 shows electrical and physical properties and reduces the complexity experiment results in comparison with the previous of resource management. However, real MPSoCs consist of state-of-art work [1]. Section 6 is used for conclusion. various processors together with a DSP (Digital Signal Processor) and a memory as shown in Philips Nexperia platform [10], STMicroelectronics Nomadik [11] and Texas Instruments OMAP [12]. Since the physically different-sized processor, DSP and memory cannot be floorplanned to a regular mesh network, a resulting NoC-based MPSoC gets an irregular mesh network or even a custom network. The irregular mesh network can be also found in a regular mesh network when some links become faulty and degraded by process variation and temperature variation. After the fault and degradation of link are detected, task mapping and routing path allocation [13] should deal with the abnormal links and compensate for the loss of yield and performance. The previous task mapping algorithms make it inefficient to Fig. 1. Overview of A3MAP perform tasks in the irregular/custom network since they are not adaptive to the variation of network. In this paper, we propose an A3MAP (Architecture-Aware II. Related Works and Our Contributions Analytic Mapping) algorithm that is analogous to analytical traffic minimization in a given NoC-based MPSoC. We use a In the last decade, a task mapping problem has been metric space that exactly captures the interconnection of solved on a regular mesh network [1-5]. Murali et al. [1] network and that is simple yet efficient for a task mapping present NMAP that is a fast algorithm, where the tasks are problem in various networks. In our task mapping mapped onto a regular mesh network under bandwidth 978-1-4244-5766-3/10/$26.00 ©2010 IEEE 523 6C-2 constrains, aiming at minimizing average communication ei,jE represents communication between vi to vj. vol(ei,j) delay. In [2], a branch and bound algorithm is adopted for represents communication volume between vi to vj in a task task mapping in a regular mesh-based NoC architecture, graph and bw(ei,j) represents bandwidth requirement which minimizes the total amount of power consumed in between vi to vj in a network. We construct an n×n communications. Shin et al. [3] explores the design space of interconnection matrix, CN corresponding to a network, NoC based systems, including task assignment, tile mapping, where cNi,jCN is equal to bw(ei,j) as shown in Fig. 2(a). routing path allocation, task scheduling and link speed Each row of CN represents interconnection relation with assignment using three nested genetic algorithms. The work respect to a single tile on an NoC. Thus, CN contains presented in [4] proposes an efficient technique for run-time interconnection relations for an entire network, representing application mapping onto a homogeneous NoC platform the metric space of network. Similarly, we construct a n×n with multiple voltage levels. Chen et al. in [5] proposes a interconnection matrix CC, corresponding to a task graph, complier-based application mapping algorithm that consists where cCi,jCC is equal to vol(ei,j) as shown in Fig. 2(b). of task scheduling, processor mapping, data mapping and For example, Fig. 2(c), (d) and (e) show three network packet routing to reduce energy consumption. graphs and their metric spaces using the proposed Recently, heterogeneous cores have been considered in an interconnection matrix. In Fig. 2(c) that is a regular mesh, all NoC-based MPSoC for low energy consumption [6-9]. Smit routers are interconnected bidirectionally. Its interconnection et al. in [6] solve the problem of run-time task assignment on matrix is composed symmetrically as shown in the below of heterogeneous processors. Carvalho et al. in [7] investigate Fig. 2(c). In case of an irregular mesh network in Fig. 2(d), the performance of several mapping heuristics promising for interconnections between tile A and tile B or between tile C run-time use in NoC based MPSOCs with dynamic and tile F are unidirectional and tile D is not interconnected workloads, targeting NoC congestion minimization. Chang to tile E. Since the bandwidth of links is also different, its et al. in [8] propose ETAHM to allocate tasks on a target interconnection matrix is composed asymmetrically. This multiprocessor system. It mixes task scheduling, mapping type of network is shown in VFI (Voltage Frequency Island) and DVS (Dynamic Voltage Scaling) utilization and couples based NoC where each processor operates with own voltage ant colony optimization algorithm. ADAM presented in [9] and frequency and in NoC with faulty [14] and degraded is run-time application mapping in a distributed manner links by process and temperature variation [13]. In case of a using agents targeting for adaptive NoC based custom network, there is slightly difference in the heterogeneous MPSoCs. composition of interconnection matrix. In Fig. 2(e), the However, those task mapping solutions do not consider wirelength between tile E and tile F is different from others the irregularity of tiles and interconnections which generates due to different sized tile E. The composition of more traffic on a network so that high energy may be interconnection matrix for the custom network is similar consumed or QoS (Quality of Service) requirement may not with the previous two cases, yet weight Į is added in the be guaranteed. Our main novelty and contribution for matrix in order to consider low energy consumption. Let the solving these problems include followings: energy consumption of single link, Elink, computed as: y A simple yet efficient metric space is proposed to capture EE E (1) the interconnection of task graph or network. Then, a link drivers repeaters task mapping problem is exactly formulated to a MIQP where Edrivers and Erepeaters are the energy consumed by based on a metric embedding technique. drivers and repeaters on a link respectively. If Elink1 and Elink2 y Successive relaxation approximation and genetic are the energy consumption of dashed link and dotted link algorithm are employed to solve the MIQP. They fit respectively, Į is equal to the ratio of Elink1 and Elink2 (=Elink1/ well our formulation and provide a reasonable trade-off Elink2). The weigh Į (0<Į<1) reduces the bandwidth of link between performance and runtime. with a long wire in the network. For example, we assume y We show that the A3MAP can achieve good mapping that a packet generated in tile A is transmitted to tile D, the performance not only in a regular network but also in dotted links are three times longer than the solid links and it an irregular network or custom network.