FLEXBUS: a High-Performance System-On-Chip Communication

34.4 FLEXBUS: A High-Performance System-on-Chip Communication Architecture with a Dynamically Configurable Topology ∗ Krishna Sekar Kanishka Lahiri Anand Raghunathan Sujit Dey Dept. of ECE NEC Laboratories America NEC Laboratories America Dept. of ECE UC San Diego, CA 92093 Princeton, NJ 08540 Princeton, NJ 08540 UC San Diego, CA 92093 [email protected] [email protected] [email protected] [email protected] ABSTRACT portunities for dynamically controlling the spatial allocation of communication architecture resources among different SoC components, In this paper, we describe FLEXBUS, a flexible, high-performance on- chip communication architecture featuring a dynamically configurable a capability, which if properly exploited, can yield substantial performance gains. We also describe techniques for choosing optimized topology. FLEXBUS is designed to detect run-time variations in communication traffic characteristics, and efficiently adapt the topology of FLEXBUS configurations under time-varying traffic profiles. We have the communication architecture, both at the system-level, through dy- conducted detailed experiments on FLEXBUS using a commercial de- namic bridge by-pass, as well as at the component-level, using compo- sign flow to analyze its area and performance under a wide variety of system-level traffic profiles, and applied it to an IEEE 802.11 MAC nent re-mapping. We describe the FLEXBUS architecture in detail and present techniques for its run-time configuration based on the char- processor design. The results demonstrate that FLEXBUS provides up acteristics of the on-chip communication traffic. The techniques un- to 31.5% performance gains compared to conventional architectures, with negligible hardware overhead. derlying FLEXBUS can be used in the context of a variety of on-chip communication architectures. In particular, we demonstrate its appli- 1.1 Related Work cation to AMBA AHB, a popular commercial on-chip bus. Detailed The increasing importance of on-chip communication has resulted experiments conducted on the FLEXBUS architecture using a commer- in numerous advances in communication architecture topology and cial design flow, and its application to an IEEE 802.11 MAC processor protocol design [1]-[5]. However, these architectures are based on design, demonstrate that it can provide significant performance gains fixed topologies, and therefore, are not fully optimized for traffic with as compared to conventional architectures (up to 31.5% in our experi- time-varying characteristics. The use of sophisticated communication ments), with negligible hardware overhead. protocols [3], [6] and techniques for customizing these protocols, both Categories and Subject Descriptors: B.4.3 [Input/Output and Data statically and dynamically [7], [8], [9], have been proposed. Protocol Communications]: Interconnections (Subsystems) and topology customization are complementary, and hence, may be General Terms: Algorithms, Design, Performance combined to yield large performance gains. A number of automatic approaches have been proposed to statically optimize the communica- Keywords: Communication architectures, On-chip bus tion architecture topology [10], [11]. While many of these techniques aim at exploiting application characteristics, they do not adequately 1. INTRODUCTION address dynamic variations in communication traffic profiles. The on-chip communication architecture has emerged as a criti- cal determinant of overall performance in complex System-on-Chip 2. BACKGROUND (SoC) designs, which makes it crucial to customize it to the charac- Communication architecture topologies can range from a single teristics of the traffic generated by the application. Since application shared bus, to which all the system components are connected, to a traffic can exhibit significant diversity, configurable communication network of bus segments interconnected by bridges. Component map- architectures that can be easily adapted to an application’s require- ping refers to the association between components and bus segments. ments are desirable. Components can be either masters (e.g., CPUs, DSPs), which can initi- As shown in this paper, complex SoC designs can benefit from com- ate transactions (reads/writes), or slaves,(e.g., memories, peripherals), munication architectures that can be dynamically configured in order which only respond to transactions initiated by a master. Communica- to adapt to changing traffic characteristics. Most state-of-the-art com- tion protocols specify conventions for the data transfer (e.g., arbitra- munication architectures provide limited customization opportunities tion policies, burst transfer modes). Bridges are specialized compo- through a static (design time) configuration of architectural parame- nents that enable transactions between masters and slaves located on ters, and as such, lack the flexibility to provide high performance in different bus segments. “Cross-bridge” transactions execute as fol- cases where the traffic profiles exhibit significant dynamic variation. lows: the transaction request from the master, once granted by the first In this paper, we describe FLEXBUS, a flexible communication ar- bus, is registered by the bridge’s slave interface. The bridge forwards chitecture that detects run-time variations in system-wide communica- the transaction to it’s master interface, which then requests access to tion traffic characteristics, and efficiently adapts the logical connectiv- the second bus. On being granted, the bridge executes the transaction ity of the communication architecture and the components connected with the destination slave and returns the response to it’s slave inter- to it. This is achieved using two novel techniques, bridge by-pass, face, which finally returns the response to the original master. and component re-mapping, which address configurability at the system and component levels, respectively. These techniques provide op- 3. MOTIVATIONAL EXAMPLE: IEEE 802.11 ∗ This work was supported by the UC Discovery Grant (grant# com02-10123), the Center MAC PROCESSOR for Wireless Communications (UC San Diego), and NEC Laboratories America. In this section, we use an IEEE 802.11 MAC processor design as an example to illustrate the concepts behind the FLEXBUS architec- Permission to make digital or hard copies of all or part of this work for ture. The functional specification of the IEEE 802.11 MAC proces- personal or classroom use is granted without fee provided that copies are sor consists of a set of communicating tasks, shown in Figure 1(a) not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to (details of the 802.11 MAC protocol are available in [12]). For out- republish, to post on servers or to redistribute to lists, requires prior specific going frames, the LLC task receives frames from the Logical Link permission and/or a fee. Control Layer. The Wired Equivalent Privacy (WEP) task and the In- DAC 2005, June 13–17, 2005, Anaheim, California, USA. tegrity Checksum Vector (ICV) task work in conjunction to encrypt Copyright 2005 ACM 1-59593-058-2/05/0006 ...$5.00. and compute a checksum of the frame, respectively. The HDR task 571 From Logical Link To 3.2 Dynamic Topology Configuration Ctrl PHY WEP_ WEP_ We next consider the execution of the IEEE 802.11 MAC processor LLC HDR FCS PLI INIT ENCRYPT under two variants of the FLEXBUS architecture. ICV Frame Example 3: We first assume that the keys are statically configured. MAC_ Data TKIP ICV CTRL Under FLEXBUS, the architecture operates in a multiple bus mode To carrier Control during intervals that exhibit localized communications, and hence en- (a) sense Signals ables concurrent processing of the WEP and FCS tasks. In intervals AHB WEP FCS ARM946ES that require low latency communications between components located Arbiter AHB I/F AHB I/F AHB I/F AHB I/F on different bus segments, a dynamic bridge by-pass mechanism tem- AHB porarily transforms the architecture into a single shared bus topology. The measured data rate under this architecture was 248 Mbps, a 23% AHB I/F AHB I/F AHB I/F To LLC Frame Key (b) PLI PHY improvement over the best statically configured architecture. Buf Buffer The bridge by-pass mechanism provides a technique to make AHB_1 AHB_2 coarse-grained (system level) changes to the communication archi- WEP ARM946ES FCS Arbiter Arbiter tecture topology. However, at times, the ability to make more fine- AHB I/F AHB I/F AHB I/F AHB I/F AHB I/F AHB I/F AHB_1 AHB_2 grained (component level) changes to the topology is also important, Bridge as illustrated next. AHB I/F AHB I/F AHB I/F AHB I/F AHB I/F LLC Frame Key Frame Example 4: When the TKIP task is enabled, the FLEXBUS architec- PLI Buf_1 Buffer Buf_2 (c) ture with dynamic bridge by-pass achieves a data rate of 208 Mbps, which is 18% better than the best conventional architecture. We next Figure 1: IEEE 802.11 MAC processor:(a) functional specifica- consider the application of a dynamic component re-mapping capabil- tion, mapping to a (b) single shared, (c) multiple bus architecture ity. Using this technique, while the overall architecture remains in the multiple bus configuration, the mapping of the slave Frame buf2 is generates the MAC header. The Frame Check Sequence (FCS) task dynamically switched between the two buses. It is mapped to AHB 2 computes a CRC-32 checksum over the encrypted frame and header. as long as the FCS and WEP components are processing frame data, MAC CTRL implements the

FLEXBUS: a High-Performance System-On-Chip Communication

System Buses EE2222 Computer Interfacing and Microprocessors

Input/Output

PDP-11 Bus Handbook (1979)

Upgrading and Repairing Pcs, 21St Edition Editor-In-Chief Greg Wiegand Copyright © 2013 by Pearson Education, Inc

BUS and CACHE MEMORY ORGANIZATIONS for MULTIPROCESSORS By

The System Bus

DMA, System Buses and Peripheral Buses

Application Note Ap-28

PCI Local Bus Specification

The Motherboard ‐ Chapter #5

What Is a DMA Controller?

PDP-11/24 - Computer History Wiki