Chapter 11
REAL-TIME SYSTEM-ON-A-CHIP EMULATION
Emulation Driven System Design with Direct Mapped Virtual Components

Kimmo Kuusilinna1,2, Chen Chang1, Hans-Martin Bluethgen3, W. Rhett Davis4, Brian Richards1, B. Nikolic1 and Robert W. Brodersen1
1UC Berkeley, Berkeley Wireless Research Center; 2Tampere University of Technology, Finland; 3Infineon Technologies AG; 4North Carolina State University

Abstract: The productivity gap between the designer and the opportunities on silicon places increasing pressure particularly on system verification. This chapter presents a comprehensive design flow for digital systems, from high-level algorithmic specifications to FPGA-based emulation and final ASIC implementation. The design is entered using only a component library with predictable performance, thereby enabling rapid system development and easing the verification burden. Hardware emulation from this description enables rapid prototyping of large systems for which gate-level simulations are impractical. The primary goal of the emulator is to support design space exploration of real-time algorithms. The design environment is customized towards low-power, dataflow-dominant designs, focusing particularly on applications related to wireless communications. The design process of a 1 Mbit/s transmission system is explored, demonstrating the design convenience and the early performance analysis.

Key words: Electronic Design, Hardware Emulation, Rapid Prototyping, Field-Programmable Gate-Array.

1. INTRODUCTION

In order to implement system-on-a-chip (SoC) designs most efficiently, it is desirable to employ the expertise of the algorithm developers throughout the design process. Therefore, the design entry should be as familiar as possible to these designers, which often results in the desire to use a standard programming language. Additionally, a path is necessary that gives feedback to the algorithm developer about the realizability of the implementation. Finally, it is also often desirable to evaluate the performance of the final system through a real-time prototype. Satisfying these constraints using conventional approaches leads the SoC developer to a software solution with complexity estimates made by monitoring the execution run time. The design can then be prototyped on a processor which employs the same instruction set. The integration of more optimized hardware, such as accelerators, becomes very difficult because of the performance mismatch between a software implementation and hardware acceleration.

This chapter describes another approach which satisfies the above requirements of flexibility in the system specification stage, but is able to drive an implementation using highly optimized architectures whose energy efficiency, area efficiency, and performance are orders of magnitude better than achievable with a processor solution. The basic approach is to take a fully parallel description of the algorithm, using a timed data flow description, and then to directly map it into hardware using libraries that provide resource requirements. The libraries have two realizations: an ASIC implementation for the actual SoC integration and an FPGA realization for real-time prototyping. The timed data flow description used is Simulink from Mathworks, which is a common environment for communication system developers. The ability to accurately estimate resources from that level provides the feedback to the system developer to do system optimizations with accurate knowledge of the final implementation costs. Finally, the prototyping is done using the large FPGA array shown in Figure 11-1, which is capable of implementing the processing of an integrated realization from the same description that is used for the SoC design.

The direct mapping of the parallel description has the advantage that automatic transitions between different descriptions of the design are possible due to the straightforward conversions. Design verification is easier because difficulties in lower level descriptions are conveniently associated with the top-level design. Furthermore, this means that most, if not all, of the design decisions can be raised to the top level. With direct mapped designs the algorithm details are explicit, therefore forcing the algorithm designer to be involved in the hardware design at a high level. The designers can quickly see the impacts of architectural decisions on actual silicon.

This chapter is organized as follows. Section 2 describes the fundamental approach taken for the SoC design. Section 3 discusses the basics of system design and the BEE integration into the system design. In addition, virtual components and the library development methodology are examined. In Section 4, the implementation paths to both emulation and ASICs (Application Specific Integrated Circuits) are explained. In particular, the BEE and some of its inherent capabilities to support contemporary logic design are detailed. Section 5 is an application example of a system building block: a 1 Mbit/s transceiver. Finally, Section 6 concludes the chapter.

Figure 11-1. The BEE Main Processing Board with one riser card.

2. DESCRIBING SOC DESIGNS

A “single-chip radio” SoC requires RF and baseband analog circuitry, A/D and D/A converters, memories, and hard-wired digital signal processing which requires the highest possible performance at the lowest possible power. In this chapter we will concentrate on the digital part of the problem and, in particular, on a strategy which allows real-time emulation of highly optimized architectures.

Many radio chip design flows can be broken into four stages: specification, architecture, front-end, and back-end. Specification generally includes a description of the algorithms and protocols for the digital portion and requirements for the analog portion such as noise figure, phase noise, and distortion. Architecture, sometimes called chip architecture or micro-architecture, is generally the first stage for IC designers. An architecture is a general plan of what kind of signals will exist on a chip and how they are processed, stored, and carried across the chip. A micro-architecture should include some idea of spatial locality, since the further a signal must be carried on the chip, the more likely it is to incur delay or be corrupted by noise and coupling effects. Furthermore, a micro-architecture is a partitioning of a design, which is more or less known to be physically realizable. Front-end design includes the complete specification of all logic functions on the chip, generally as RTL logic, and models for all analog blocks. Back-end design includes the mapping of the front-end design to transistors and making the final mask patterns to be used in fabrication.

Chip projects generally begin with a specification and proceed sequentially through the three later stages of the design flow with a different team working on each aspect. The strength of this approach is that it allows the design flow to be broken up into many parts, which can proceed more or less independently. Changes to this kind of flow are usually relatively small for each new generation of wireless systems, so that the balance between the stages is not disturbed. Micro-architectures change slowly to retain predictable performance, making multi-level design optimization difficult. However, it would be better for the algorithms to drive the implementation decisions instead of letting the implementation dictate the algorithms.

The design flow should allow suitable point optimization techniques to be applied at each level. For example, at the specification level, altering the number and type of operations and adding parallelism should be possible. At the architectural level, one should be able to simplify interfaces between blocks, adjust the supply voltage, add buffers and caches to increase throughput, and remove buffering constructs to reduce power. At the front-end level, switching off unused circuitry, resource sharing to reduce area, elimination of resource sharing to reduce power, and pipeline re-timing to minimize cycle time are important. For the back-end, careful floor-planning to reduce interconnects in critical paths, reducing noise coupled to sensitive circuitry, and resizing transistors to reduce power or increase speed are issues to consider.

Our approach has been to choose a single environment, which acts as a common language for analog, digital, and algorithm designers. The common language and simulation paradigm allow the effects of point optimizations to be quickly checked against the rest of the design.
Simulink was chosen as the common environment because it seems to be a good compromise among the different design environment requirements. Since Simulink is a structural rather than a procedural description, the system descriptions contain the basic parallelism, which can be exploited by concurrent hardware. Both analog design, including baseband models of the RF front end and modeling of non-idealities such as phase noise in the VCO (Voltage Controlled Oscillator), circuit noise in the LNA (Low-Noise Amplifier), and distortion in the mixer, and digital design can be conducted in this environment. For the digital design, data-paths are described using a fixed-point block set and control logic using the Stateflow finite state machine package. Typically, these models use discrete time instead of continuous time. Therefore, Simulink allows the description of a class of synchronous, mixed-signal, heterogeneous systems.

The micro-architecture is defined by mapping the Simulink blocks to hardware. Some blocks correspond to hard or soft macros. Stateflow blocks and simple look-up tables correspond to RTL code, which will be synthesized to a standard-cell netlist. Other blocks correspond to semi-custom module generators that create parameterized blocks for data-path circuits such as adders and multipliers. Thus, to see the effect of an optimization is to see how these well-understood blocks interact. The real challenge for this design flow is to provide automation that is seamless enough to let all designers understand the interactions of the blocks, regardless of their CAD expertise. The goal is to make the entire flow simple enough for one person to take a design from specification to mask layout.
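To make the fixed-point modeling style concrete, the short Python sketch below quantizes a floating-point signal to a given word length with rounding and saturation, which is the kind of bit-true behavior a fixed-point data-path block exhibits. It is an illustrative model only; the function name and the rounding and saturation choices are assumptions, not part of the actual block set.

import numpy as np

def quantize_fixed_point(x, word_length, frac_bits):
    """Quantize a floating-point signal to a signed fixed-point format
    (word_length total bits, frac_bits fractional bits) with saturation."""
    scale = 2.0 ** frac_bits
    max_code = 2 ** (word_length - 1) - 1
    min_code = -2 ** (word_length - 1)
    codes = np.clip(np.round(np.asarray(x) * scale), min_code, max_code)
    return codes / scale

# Example: an 8-bit representation with 6 fractional bits of a sine wave.
t = np.arange(64)
ideal = np.sin(2 * np.pi * t / 64)
quantized = quantize_fixed_point(ideal, word_length=8, frac_bits=6)
print("max quantization error:", np.max(np.abs(ideal - quantized)))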

3. SYSTEM-LEVEL DESIGN FLOW

The single most critical aspect of this design flow, after functional correctness, is the speed at which the designer can try out different implementations. This means performance analyses without synthesis and automation of verification at various levels of design abstraction. The direct mapped design strategy using virtual components benefits from architectures with only a limited amount of dynamic control. One model of computation for such designs is the synchronous dataflow model, which is the primary way we choose to interpret the top-level design descriptions. This model is convenient for many digital signal processing applications since it captures the design in a concise but relatively unambiguous manner. Therefore, the path from high-level description to hardware is not broken.

Two issues merit special emphasis when designing with synchronous dataflow. First, deadlocks or unresolved signals can cause simulation problems, in addition to the hardware implementation behaving unexpectedly. These situations can arise, for example, from poorly defined initial conditions and feedback loops. In particular, feedback loops without explicit delays can cause problems. The solutions are explicit initial conditions and breaking the feedback loops with delay elements. Second, in designs with multiple clock speeds, the data rates must be kept consistent at the clock domain boundaries. The input gateways in the design keep track of the original data rates and this information is then propagated through the whole design. When crossing to or from different clock domains, the data must be converted by a corresponding up-sample or down-sample component.
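Both issues can be illustrated with a minimal cycle-based Python sketch, given below, that is not generated by the flow itself: a feedback accumulator closed through an explicit delay element with a stated initial condition, and a simple up-sampling step that keeps the data rate consistent when crossing to a faster clock domain.

def simulate_accumulator(samples):
    """Cycle-accurate model of a feedback loop broken by an explicit unit
    delay (a register): y[n] = x[n] + y[n-1]. The initial condition of the
    delay element is stated explicitly, as recommended above."""
    state = 0  # explicit initial condition of the delay element
    out = []
    for x in samples:
        out.append(x + state)
        state = out[-1]
    return out

def upsample(samples, factor):
    """Rate conversion at a clock-domain boundary: repeat each sample
    'factor' times so that data rates stay consistent (zero-stuffing would
    be an alternative, depending on the library block chosen)."""
    return [s for s in samples for _ in range(factor)]

slow = simulate_accumulator([1, 2, 3, 4])   # slow clock domain
fast = upsample(slow, 4)                    # crossing to a 4x faster domain
print(slow, fast)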

3.1 Fundamental Concepts in the Design Flow

A general system-level development flow is depicted in Figure 11-2. Beginning from the desired application functionality and the specifications along with the available architectural resources, the design is refined into an implementation. The role of early performance analysis is again emphasized at all abstraction levels as the key to functionally correct and high performance implementations. A finished design contains all the representations from all the available abstraction levels. The block arrows in Figure 11-2 represent the primary information flow. The black arrow is a design flow internal feedback path, which is used to accept or reject particular implementations. The gray arrows signify relationships that are more abstract. The results from performance analyses and the realized designs influence the designers to innovate new applications and specifications. In addition, designs can become the basis for architectural structures and new library components.

Figure 11-2. System design flow. [7]

For this system design flow to work, it is essential that the designer is not too constrained by the available architectures and components. If the architectural platforms and the components in the library are fixed, either the design space or the productivity is severely limited, depending on the size of the components. Therefore, parameterizable virtual components are a reasonable compromise. As with most other flows, this approach is viable only if sufficient libraries exist and they are easily extensible.

Figure 11-3. Emulation driven system design flow.

3.2 System Design Utilizing BEE Emulation

The speed and the quality of information in the internal evaluation loop, depicted in Figure 11-2, is one of the primary problems in system design. Figure 11-3 depicts the BEE system design flow, where the performance analysis is predominantly either done at a very high level or based on hardware emulation. Both methods are very fast compared to detailed simulations, and the emulation results provide additional confidence from the hardware level for the design decisions. For BEE-based designs, the Block Library is mainly based on Xilinx components and IP libraries. Currently, the high-level design capture and simulation environment is based on the Mathworks Simulink (version 6.5) simulator [10] combined with Xilinx System Generator (version 2.2) [11].

Figure 11-4. Design entry with both datapath and control elements.

Therefore, most of the high-level infrastructure is already in place and the effort required for library development is acceptable. The environment also provides automatic VHDL testbench generation. The estimation step is built on top of these tools and, in the case of ASIC designs, on additional technology characterization. Figure 11-4 depicts an example of a typical design entry, which contains direct mapped components like AddSub and Up Sample, hierarchical sub-systems like FIR and FIR1, and synthesizable control like FIR1_ctrl. The control is described as a state machine in the Mathworks Stateflow environment.

Exposing the designer only to the high-level design environment is an important concept in this flow. That is, all the design decisions should be made at the top level. In addition, feedback from the underlying flow steps needs to be presented in a form that the designer can easily process and convert into alternative design decisions and optimization goals.
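As an illustration of the kind of control such a Stateflow block expresses, the hypothetical Python state machine below sequences coefficient loading before enabling a filter data-path. Its states, inputs, and outputs are invented for the example and do not correspond to the actual FIR1_ctrl implementation.

from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    LOAD = auto()
    RUN = auto()

def fir_controller(events):
    """Hypothetical control FSM: wait for a start request, enable
    coefficient loading, then enable the data-path once loading is done."""
    state = State.IDLE
    outputs = []
    for start, load_done in events:          # one (start, load_done) tuple per clock cycle
        if state is State.IDLE and start:
            state = State.LOAD
        elif state is State.LOAD and load_done:
            state = State.RUN
        outputs.append({"load_en": state is State.LOAD,
                        "run_en": state is State.RUN})
    return outputs

# start asserted in cycle 0, coefficients finish loading in cycle 2
print(fir_controller([(1, 0), (0, 0), (0, 1), (0, 0)]))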

User decisions are divided into functional, signal, circuit, and floorplan categories. Routing problems are typically solved automatically. Functional decisions relate to the system responses to external stimuli and system behavior in general. Signal decisions deal with physical signal properties, particularly with word lengths. Circuit decisions specify the transistors to implement each subsystem and the overall subsystem architecture. A floorplan indicates pad placement and optionally the physical locations of functional units.

Area, speed, and power are the three traditional focus points when considering the quality and performance of a design. Providing high-level feedback to the user based on these criteria allows the designer to focus on the architectural decisions. The design area is expressed as a scalar number that is proportional to resource utilization, based on the component types and parameters. On FPGA’s (Field-Programmable Gate-Arrays), the number is related to slices, basically two look-up tables and two registers. On ASIC’s, the number is based on Synopsys Design Compiler area estimates for the target technology. In addition to top-level evaluation, the user can request area estimation for any of the subsystems. The operating speed is estimated from a table-lookup of the maximum delay of each library block, previously obtained from the FPGA timing analysis tools, or measured from the emulated system. The power estimates are based only on component utilization and the target clock frequency.

This feedback should primarily translate into functional and signal decisions and optimizations, since the system architecture is the part of the design where decisions based on accurate information count the most and the greatest savings in power and area are available. The number of circuit decisions should be minimal since these issues are dealt with in the library development. Floorplanning inside the chip is typically not exposed to the user, but partitioning the subsystems to each of the FPGA’s is the responsibility of the designer and potentially affects the global system architecture.

Design for emulation has a couple of high-level, global optimization goals. The number of FPGA’s in the design should be minimized since this typically speeds up the design and tends to limit the number of signals between the FPGA’s. The external interfaces, either to the emulator itself or between the components inside the emulator, are potential performance bottlenecks for any emulator. Virtual wires [1] can be used to overcome the physical wire number limitations, but these methods incur a speed penalty. Good design practices like pipelining, registered outputs, and restricting the logic depth should be followed, particularly for off-chip signals. [3]
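A minimal Python sketch of such table-driven feedback is given below; the block names, slice counts, delays, and power weights are invented placeholders rather than values from the actual library characterization.

# Hypothetical per-block characterization data; real numbers come from FPGA
# timing analysis and Synopsys area reports, as described above.
BLOCK_LIBRARY = {
    #  type        (slices, max_delay_ns, rel_power)
    "AddSub":      (10,     4.0,          1.0),
    "Mult8x8":     (40,     8.5,          4.0),
    "Register":    (1,      1.5,          0.2),
}

def estimate(design, clock_mhz):
    """Toy high-level estimator: area is the sum of per-block slice counts,
    speed is limited by the slowest block (table look-up of maximum delays),
    and power scales with component utilization and clock frequency."""
    slices = sum(BLOCK_LIBRARY[b][0] * n for b, n in design.items())
    max_delay = max(BLOCK_LIBRARY[b][1] for b in design)
    max_clock_mhz = 1000.0 / max_delay
    power_units = clock_mhz * sum(BLOCK_LIBRARY[b][2] * n for b, n in design.items())
    return slices, max_clock_mhz, power_units

design = {"AddSub": 12, "Mult8x8": 4, "Register": 64}
print(estimate(design, clock_mhz=32))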


4. SOC IMPLEMENTATION PATHS – EMULATION AND ASICS

4.1 Berkeley Emulation Engine

Hardware emulation is usually seen as a form of rapid prototyping where a physical system that implements an algorithm is developed using an existing HW framework to speed up the design process. In addition, the term hardware emulation typically refers to synthesis-based design methods and to a fabric of configurable logic. The advantages gained from high-level design entry and an automatic tool flow are obvious. Preferably, the emulation runs at the same speed as the final product. The Berkeley Emulation Engine was built to allow rapid algorithm exploration at the hardware level. In addition, designs that are difficult to simulate due to their size and complexity can be designed utilizing BEE. A class of real-time applications from the wireless communication domain [3],[5] can be emulated, providing real-time feedback on algorithm optimizations, including bit-length and quantization selection. After the definition of the algorithm, the description can be utilized in both the emulation environment and in the final ASIC implementation, while maintaining cycle-to-cycle and bit-true correspondence.

4.1.1 Direct Mapped Designs and the BEE Architecture

Direct mapping implies that the top-level design elements already explicitly specify the hardware architecture and the cycle-to-cycle behavior. For example, if an 8-bit multiplier is specified, an 8-bit multiplier is instantiated in the silicon implementation, with the pipelining depth specified in the parameters. This facilitates the development of fast CAD (Computer-Aided Design) tools, and the FPGA and ASIC implementations are functionally the same. Direct mapping is especially suited for designs with a high level of parallelism or optimization goals that emphasize low power with stringent performance specifications.

The underlying goals of the Berkeley Emulation Engine hardware development were to provide a large, unified, real-time emulation platform for dataflow-centric designs. Combining these requirements with the direct mapped design approach resulted in a Two-Layer Mesh routing architecture for BEE. Balancing between fast local interconnections that cross an FPGA chip boundary and the global connectivity is perhaps the most crucial HW emulator design parameter. BEE is optimized towards local connectivity. In other words, the number of hops, chip-to-chip connections, to neighboring FPGA’s is minimized. Figure 11-5 depicts the BEE routing architecture. An aggregate of FPGA chips on a printed circuit board, in addition to all the supporting equipment, is called a BEE Processing Unit (BPU) and is large enough to emulate systems with up to 10 million ASIC equivalent gates. Physically, this is implemented as a Main Processing Board (MPB) with 26 signal and power layers, 20 Xilinx VirtexE 2000 chips, and 16 SRAM's, each with a capacity of 1 MB, for data buffering purposes. Larger emulation systems can be constructed by connecting multiple BPU's together using the external I/Os. Currently, four BPU’s have been built.

Designs implemented on the BEE platform cannot be very power efficient due to the FPGA technology. However, retargeted to an ASIC technology, the same design could achieve very low-power behavior. To this end, a fundamental requirement for BEE is to emulate the logical behavior of these designs. The BEE hardware flow supports two powerful low-power design techniques: multiple clock domains within a single design and clock gating. Within each FPGA, the primary clock frequency can be multiplied by 2 or 4 and divided by 1.5, 2, 2.5, 3, 4, 5, 8, or 16 using a delay-locked loop. Other frequencies are available if logic is used to divide the clock. In addition, all the FPGA’s have three additional clock inputs, which can be used to input clocks that are completely independent from the primary clock. The FPGA’s do not support real clock gating, but the registers have a clock enable input that can be utilized to emulate the behavior.
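The clock-enable technique can be pictured with the small cycle-based Python model below. It is an illustrative sketch rather than BEE tooling: the register only captures a new value on cycles where its enable is asserted and therefore behaves as if its clock were gated.

def clocked_register(data_stream, enable_stream, initial=0):
    """Emulated clock gating with a clock-enable register: the register
    captures a new value only on cycles where the enable is asserted and
    otherwise holds its previous contents."""
    q = initial
    outputs = []
    for d, en in zip(data_stream, enable_stream):
        if en:          # behaves like gating the clock for this register
            q = d
        outputs.append(q)
    return outputs

# Enable every other cycle: the register holds its value on disabled cycles.
print(clocked_register([1, 2, 3, 4, 5, 6], [1, 0, 1, 0, 1, 0]))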

Figure 11-5. BEE routing architecture. The solid lines are the first-level mesh and the dashed lines the second. Each square represents an FPGA; those marked ‘F’ connect to the first-level mesh and those marked ‘X’ connect to the second-level mesh.

4.1.2 BEE Emulation

Figure 11-6 depicts the BEE hardware infrastructure and the information flow within the emulation system. A BPU has a separate connection to the Host Server via dedicated Ethernet. The server is responsible for configuring the system and for allowing it to be remotely accessed from client workstations. Analog radio systems, called front-ends, are used to form complete transmission systems. An integrated Single Board Computer (SBC) enables a BPU to be connected to Ethernet. The information flow between the Host Server and the BEE system can begin after the user has generated the necessary design files utilizing the BEE design flow. The design files are sent to each BPU through the Ethernet and stored in memory or the hard disk of the SBC. Finally, the user issues commands to the SBC, instructing it to either configure the BPU or read back information.

Figure 11-6. BEE information flow and emulation setup.

The SBC connects to the 20 FPGA’s on a MPB through a configuration FPGA, which mainly serves as a bi-directional signal multiplexer between the 16 general-purpose I/O lines from the SBC and over 100 control signals on a MPB. In addition, the off-board main power supply system is controllable through this link. All control functions on a BPU can be controlled from the SBC. The functions are divided into the following categories: programming of the processing FPGA’s, data read-back from the processing FPGA’s, clock domain control, power management, and thermal management.

On the MPB, less than 2% of the total signals on the board are directly accessible through probing headers. Using FPGA programming, internal signals could be routed to these headers for direct probing with a logic analyzer. However, this is not enough for practical hardware debugging. Therefore, a software-based digital logic analyzer solution is used as the primary debugging tool for the MPB’s. Xilinx ChipScope Integrated Logic Analyzer (ILA) [11] cores can be inserted into the design, where they act as tiny logic analyzers at run-time. The ILA records the values of the monitored signals and transmits these through the JTAG interface back to the host workstation. The ChipScope software collects data from different ILA cores, which can reside on different FPGA’s, and combines them onto a single waveform display. In addition to the ILA cores, which use the on-chip BlockRAM as data storage, the external SRAM could be used for synchronous signal recording.
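Conceptually, each ILA core behaves like the simplified capture model sketched below in Python; the buffer depth, trigger condition, and recorded fields are illustrative assumptions and not ChipScope parameters.

from collections import deque

class CaptureCore:
    """Simplified software model of an embedded logic-analyzer core: a
    circular buffer continuously records the monitored signals and freezes
    a window of samples once a trigger condition is seen."""
    def __init__(self, depth=1024):
        self.buffer = deque(maxlen=depth)
        self.captured = None

    def sample(self, signals, trigger):
        self.buffer.append(dict(signals))
        if trigger and self.captured is None:
            self.captured = list(self.buffer)  # freeze the recorded window

core = CaptureCore(depth=8)
for cycle in range(20):
    core.sample({"cycle": cycle, "valid": cycle % 4 == 0}, trigger=(cycle == 10))
print([s["cycle"] for s in core.captured])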

4.1.3 Prototyping Concepts for BEE Hardware Emulation

The BEE emulation is based on the concept of functional and cycle-level equivalence between the emulation and the final ASIC implementation. This means that signals have identical behavior in both implementations, but the underlying low-level hardware architecture may be different. And as noted before, there is only one top-level description and simulation of the system.

The design flow automation and the emulation facilitate rapid prototyping, where the design space of requirements, alternative specifications, and implementation feasibility is explored before committing to the final design and its optimizations. In addition, this concept-oriented prototyping [9] is well suited for hardware-accelerated simulation and computation. Computation acceleration is especially attractive for computationally intensive but parallelizable algorithms. The rapid hardware acceleration path can be utilized to build specialized hardware systems to solve problems that would otherwise have to be addressed with slow computer programs because of costs related to time, manpower, and the complexity of building dedicated hardware. Simulations can be accelerated either with synthesizable test-benches or by offloading some of the simulation computations to the emulator. The latter option is particularly efficient since usually there is no need to emulate the simulation computations, but the simulated hardware can simply be run on the emulator.

The high-level simulations and rapid prototyping verify the functionality and physical behavior of the design. The goal is to gradually build confidence in the high-level simulations and prototype large systems on the BEE emulator. Particularly designs that are too large and complex to be conveniently simulated leverage the resources available in BEE. However, special circumstances like library development may require even finer grain design verification. BEE excels in real-time in-circuit verification. That is, physical components from the target design, typically radio front-ends, are added to the emulation system. Real-time emulation runs increase the confidence that all the sub-designs are integrated correctly, and systems can be tested in their real operating environment. Furthermore, in many signal processing applications, the processing speed affects the perceived quality of the design, thus increasing the value of real-time experiments.

4.1.4 Designing for Hardware Emulation

Many hardware emulators follow a generic design flow beginning from behavioral system design and behavioral synthesis and continuing to emulator-specific operations like emulator partitioning, logic synthesis, and technology mapping [3],[8]. The end result is the emulation run. Due to the direct mapped nature of a typical BEE design, behavioral synthesis is unnecessary and the need for logic synthesis is minimal. However, many of the emulation technology dependent design flow steps are analogous for general emulators and BEE. The technology mapping part of the BEE flow is depicted in Figure 11-7.

Partitioning for heterogeneous resources like the BEE emulator is known to be a hard problem. Approaching this problem from the FPGA point of view has been documented, for example, in [6]. The system-level routing architecture has a profound influence on this design phase, and the typical designer has a lot of a priori information on the layout of the design. Therefore, the high-level partitioning is left to the user. Routing, as mentioned, is automatic on all levels of the design. The partition information also directly indicates which FPGA’s should receive which bitstream. This information is called the Emulator Configuration.

The Xilinx System Generator and the Integrated Synthesis Environment (ISE) automatically take care of the internal phases of technology mapping for individual FPGA’s. The design netlist is inferred from the high-level design. Core components are generated based on the virtual component libraries; thus, a core is an instance of a virtual component that has a fully specified implementation. Synthesis is applied to the parts of the design requiring it and VHDL test benches are generated. The backend of the flow partitions, maps, and routes each FPGA.

Figure 11-7. HW emulator technology mapping.

4.2 Designing for ASIC’s

An ASIC implementation is possible after the design has been evaluated and approved using BEE hardware emulation. Parts of the ASIC flow were developed in a separate project called SSHAFT (Simulink-to-Silicon Hierarchical Automated Flow Tool) [4]. The virtual components for ASIC’s are in the form of parameterizable Synopsys Module Compiler descriptions. The ASIC technology flow is depicted in Figure 11-8.

Figure 11-8. ASIC technology mapping.

The Frontend Technology Mapping utilizes the System Generator generated design hierarchy, VHDL core wrappers, and test benches. The design is imported into the Synopsys synthesis framework, where cores are instantiated and synthesis is performed on the appropriate parts of the design. In addition, a boundary scan chain can be added to the design at this stage. The Synopsys framework outputs a hierarchical netlist. The Backend Technology Mapping is based on the Cadence tool suite. A floorplan can be entered manually if desired. These inputs are merged and the layout is optimized. The final major design refinement phase is the automatic routing. In addition, design rule checks are run against the Spice and GDSII netlists.


5. CASE STUDY: A 1 MBIT/S NARROW-BAND TRANSMISSION SYSTEM

To demonstrate the features of the design environment, the transmitter portion of a simple 1 Mbit/s narrow-band transmission system will be described in some detail. The design was emulated with built-in test vector generation and in an in-circuit mode with analog radio front-ends (2.4 GHz transceiver). Following this verification, an ASIC layout was generated using the same description. The tables collecting the emulation results also summarize the channel model and the receiver part of the design.

Figure 11-9. The top-level block diagram of the 1 Mbit/s transmitter.

5.1 DQPSK 1 Mbit/s Transmitter

Figure 11-9 depicts the top-level Simulink / Xilinx System Generator design entry for the transmitter. The Data Source block generates a data stream and performs the mapping to symbols. After up-sampling and linear pulse forming, the signal is modulated onto a low intermediate frequency. The model shows the input generation (Data Source), a low-pass filter, and the complex modulator to I and Q channels. The solid gray blocks contain sub-systems and the rest, like AddSub1, are System Generator blocks that have a direct parametrizable hardware implementation. The transmission system operates frame-based and utilizes a differentially encoded pseudo-noise sequence for frame detection. The transmitter, as well as the entire transmission system, has been designed for a master clock frequency of 32 MHz.
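For reference, a floating-point Python model of such a transmitter chain is sketched below: differential (DQPSK) symbol mapping, up-sampling, linear pulse shaping, and complex modulation onto a low intermediate frequency. The symbol mapping, oversampling ratio, filter taps, and intermediate frequency are illustrative assumptions and not the parameters of the actual design.

import numpy as np

def dqpsk_transmitter(bits, upsample=8, fs=32e6, f_if=2e6):
    """Floating-point reference model of a DQPSK transmitter chain:
    bit-pair to differential symbol mapping, up-sampling, linear pulse
    shaping, and modulation onto a low intermediate frequency."""
    pairs = bits.reshape(-1, 2)
    phase_steps = np.pi / 2 * (2 * pairs[:, 0] + pairs[:, 1])  # phase increments
    phases = np.cumsum(phase_steps)                            # differential encoding
    symbols = np.exp(1j * phases)

    # Up-sample (zero-stuff) and apply a simple linear pulse-shaping FIR.
    x = np.zeros(len(symbols) * upsample, dtype=complex)
    x[::upsample] = symbols
    taps = np.hamming(4 * upsample)
    baseband = np.convolve(x, taps / taps.sum(), mode="same")

    # Complex modulation onto the low IF, split into I and Q channels.
    n = np.arange(len(baseband))
    carrier = np.exp(2j * np.pi * f_if / fs * n)
    return (baseband * carrier).real, (baseband * carrier).imag

i_chan, q_chan = dqpsk_transmitter(np.random.randint(0, 2, 200))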

5.2 High-Level Analysis and the Emulation Run

For this experiment, only one BEE Processing Unit is required. The emulation run is arranged so that the transmitter resides on one FPGA, the transmission channel is modeled with another FPGA, and a third FPGA functions as the receiver. Table 11-1 tabulates general statistics for the whole system. The results show that practical systems can be constructed with this blockset in reasonable time. A significant portion of the design time was spent by the designer familiarizing himself with the blockset and the design flow in general.

Table 11-1. Some general properties of the emulated system.
                 No. of design objects   Types of objects   Max. levels of hierarchy   Design time
Transmitter      133                     22                 5                          1.5 weeks
Channel model    15                      8                  1                          1 day
Receiver         209                     26                 4                          3.5 weeks

Table 11-2 tabulates the high-level (Simulink) area estimates, the final resource usage, the run times for the estimation software and for synthesis, and the estimated maximum clock speed after synthesis, placement, and routing. The results show that the high-level area estimation is relatively accurate and the run time to achieve these results reasonable.

Table 11-2. Transmission system implementation data.
                      Transmitter   Channel    Receiver
Est. slices           902.5         3393       2373
Final slices          933           2347       2802
Est. LUT's            1433          4144       3525
Final LUT's           1073          2366       3819
Est. flip-flops       1521          6784       4407
Final flip-flops      1430          4257       4305
Est. block RAM's      3             0          9
Final block RAM's     3             0          9
Runtime for est.      27 s          33 s       81 s
Max. clk frequency    57 MHz        62 MHz     33 MHz
Synthesis runtime     2:42 min      6:54 min   9:02 min

In this design, the average error in the number of slices is 17%, while the average error is 25% for look-up tables (LUT's) and 15.2% for registers. The BlockRAM's were always reported accurately. The estimation error is considerably larger for the Channel implementation due to the small number of components and problems in estimating the size of the FIR filter in this design. The average time for estimation is 13% of the time required for running the synthesis, and the estimation should compare even more favorably for larger designs.

The FPGA utilizations in this design range between 5 and 15%, which were intentionally kept low for future design expansions with the opportunity of maintaining the system-level partitions. In addition, partitioning early for multiple chips facilitates the division of work between several designers. In general, the target utilization should not exceed 80%, which alleviates routing congestion and helps to achieve the real-time requirements. In this case, the emulation was able to run at 32 MHz, as required by the design. Figure 11-10 depicts a portion of the emulation results, namely the received data and some of the receiver control signals.

Figure 11-10. Signal waveforms from the emulation.
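The averaged estimation errors quoted above follow directly from Table 11-2 when each error is taken relative to the high-level estimate; the short Python calculation below reproduces them.

# Reproducing the averaged estimation errors from Table 11-2, with each
# design's error taken relative to its high-level estimate.
estimates = {"slices": (902.5, 3393, 2373), "LUTs": (1433, 4144, 3525), "FFs": (1521, 6784, 4407)}
finals    = {"slices": (933,   2347, 2802), "LUTs": (1073, 2366, 3819), "FFs": (1430, 4257, 4305)}

for res in estimates:
    errors = [abs(e - f) / e for e, f in zip(estimates[res], finals[res])]
    print(res, f"{100 * sum(errors) / len(errors):.1f}%")
# Prints roughly 17% for slices, 25% for LUTs, and 15% for flip-flops.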


5.3 ASIC Implementation

Running the transmitter through the ASIC flow took 56 minutes of processor time on a 400 MHz Sun UltraSPARC II, and the resulting layout is depicted in Figure 11-11. The core area is 0.28 mm2 with a utilization factor of 0.34, thus being heavily pad limited. The estimated maximum clock speed is 100 MHz, which satisfies the 32 MHz target speed. The dynamic power is estimated to be 0.611 mW and the leakage power 0.016 mW. The target technology is the ST Microelectronics 0.13 µm CMOS process with low-leakage standard cells.

Figure 11-11. Layout of the transmitter.

6. CONCLUSIONS

A methodology for the development and rapid prototyping of dataflow dominant hardware system designs was introduced. The design flow is based on direct mapped virtual components, which simultaneously allow a high degree of designer productivity and predictable performance. Hardware emulation is the basis of system verification, offering a combination of high verification speed and confidence in the achieved results. Tests can be performed with real-world I/O if the rest of the system exists, test vectors can be fed from mass storage devices, or test benches can be compiled into the emulator to achieve comprehensive verification. Similarly, performance and functionality verification can be implemented on the emulator. In addition, these methods facilitate data collection over large sample sets, which are needed to validate bit error rates in high signal-to-noise ratio environments where the error rates can be extremely low. This tight coupling of the hardware emulation, even in the early stages of the design and its implementation, facilitates the primary goal of early performance evaluation.

To demonstrate the approach and the method of description, a simple 1 Mbit/s transmission system was mapped to 3 FPGA’s and run at 32 MHz, thus achieving real-time behavior in emulation conditions. The same description was then used to generate an ASIC which functionally had exactly the same performance, but used a logical representation more optimized for standard cell implementation. Objective advantages from the designer’s point of view include improved understanding of the overall system and its real-time behavior with the analog portions of the system, the effective elimination of simulation speed bottlenecks, automatic testbench generation, and interoperability with other analysis software such as the Matlab tools.

7. ACKNOWLEDGEMENTS

Dr. Kuusilinna’s work was supported by the Technology Development Center of Finland (Tekes), Jenny and Antti Wihuri Foundation, and the Finnish Cultural Foundation. This work was supported by DARPA and MARCO under the Center for Circuits, Systems and Software (C2S2) in the Focus Centers Research Program and the MURI program sponsored by the U.S. Army Research Office. In addition, we would like to thank Xilinx for donating the FPGA chips. Finally, we would like to acknowledge the support of the members of the Berkeley Wireless Research Center.

8. REFERENCES

[1] J. Babb, R. Tessier, and A. Agarwal, “Virtual Wires: Overcoming Pin Limitations in FPGA-based Logic Emulators,” Proc. IEEE Workshop on FPGAs for Custom Computing Machines, pp. 142-151, Apr. 5-7, 1993.
[2] M. Butts, J. Batcheller, and J. Varghese, “An Efficient Logic Emulation System,” Proc. 1992 IEEE Int’l Conf. Computer Design: VLSI in Computers and Processors, pp. 138-141, Oct. 11-14, 1992.
[3] M. Courtoy, “Rapid Prototyping for Communications Design Validation,” Conference Record Southcon/96, pp. 49-54, June 25-27, 1996.
[4] W.R. Davis, N. Zhang, et al., “A Design Environment for High-Throughput, Low-Power Dedicated Signal Processing Systems,” IEEE J. Solid-State Circuits, Vol. 37, March 2002.
[5] H. Krupnova, Dinh Duc Anh Vu, G. Saucier, and M. Boubal, “Real Time Prototyping Method and a Case Study,” Proc. 1998 9th Int’l Workshop on Rapid System Prototyping, pp. 13-18, June 3-5, 1998.
[6] H. Krupnova, C. Rabedaoro, and G. Saucier, “FPGA Partitioning for Rapid Prototyping: A 1 Million Gate Design Case Study,” Proc. 1999 IEEE Int’l Workshop on Rapid System Prototyping, pp. 13-18, June 16-18, 1999.
[7] A.D. Pimentel, L.O. Hertzberger, et al., “Exploring Embedded-Systems Architectures with Artemis,” Computer, Vol. 34, No. 11, pp. 57-63, Nov. 2001.
[8] F. Slomka, M. Dorfel, R. Munzenberger, and R. Hofmann, “Hardware/Software Codesign and Rapid Prototyping of Embedded Systems,” IEEE Design & Test of Computers, Vol. 17, No. 2, pp. 28-38, Apr.-June 2000.
[9] B. Spitzer, M. Kuhl, and K.D. Muller-Glaser, “A Methodology for Architecture-Oriented Rapid Prototyping,” Proc. 12th Int’l Workshop on Rapid System Prototyping, pp. 200-205, June 25-27, 2001.
[10] www.mathworks.com.
[11] www.xilinx.com.