Tradeoffs Abound Infpga Design

Home , Mentor Graphics, Place and route

BASICSof FPGAs Design David Maliniak, Electronic Design Automation Editor Tradeoffs Abound in FPGA Design ield-programmable gate arrays (FPGAs) arrived in 1984 as an Understanding device alternative to programmable logic devices (PLDs) and ASICs. types and design flows FAs their name implies, FPGAs offer the significant benefit of being readily is key to getting the programmable. Unlike their forebearers in the PLD category, FPGAs can (in most cas- most out of FPGAs es) be programmed again and again, giving designers multiple opportunities to tweak their circuits. the programming scheme. There’s no large non-recurring engineering (NRE) cost Just a few years ago, the largest FPGA was measured associated with FPGAs. In addition, lengthy, nerve- in tens of thousands of system gates and operated at 40 wracking waits for mask-making operations are MHz. Older FPGAs often cost more than $150 for the squashed. Often, with FPGA development, logic design most advanced parts at the time. Today, however, FPGAs begins to resemble software design due to the many itera- offer millions of gates of logic capacity, operate at 300 tions of a given design. Innovative design often happens MHz, can cost less than $10, and offer integrated func- with FPGAs as an implementation platform. tions like processors and memory (Table 1). But there are some downsides to FPGAs as well. The PGAs offer all of the features needed to imple- economics of FPGAs force designers to balance their rel- ment most complex designs. Clock management atively high piece-part pricing compared to ASICs with is facilitated by on-chip PLL (phase-locked loop) the absence of high NREs and long development cycles. For DLL (delay-locked loop) circuitry. Dedicated They’re also available only in fixed sizes, which matters memory blocks can be configured as basic single-port when you’re determined to avoid unused silicon area. RAMs, ROMs, FIFOs, or CAMs. Data processing, as embodied in the devices’ logic fabric, varies widely. The What are FPGAs? ability to link the FPGA with backplanes, high-speed bus- FPGAs fill a gap between discrete logic and the smaller es, and memories is afforded by support for various single- PLDs on the low end of the complexity scale and costly ended and differential I/O standards. Also found on custom ASICs on the high end. They consist of an array today’s FPGAs are system-building resources such as high- of logic blocks that are configured using software. Pro- speed serial I/Os, arithmetic modules, embedded proces- grammable I/O blocks surround these logic blocks. Both sors, and large amounts of memory. are connected by programmable interconnects (Fig. 1). Initially seen as a vehicle for rapid prototyping and The programming technology in an FPGA determines the emulation systems, FPGAs have spread into a host of type of basic logic cell and the interconnect scheme. In applications. They were once too simple, and too costly, turn, the logic cells and interconnection scheme deter- for anything but small-volume production. Now, with the mine the design of the input and output circuits as well as advent of much larger devices and declining per-part costs,

A Supplement to Electronic Design/December 4, 2003 Sponsored by Mentor Graphics Corp. Do’s And Don’ts For The FPGA Designer

1.Do concentrate on I/O timing, not 1. Don’t synthesize unless just the register-to-register internal frequency that you’ve fully and correctly constrained the FPGA place-and-route tools report. Frequently, your design. This includes correct the hardest challenge in a complete FPGA design is clock domains, I/O timing require- the I/O timing. Focus on how your signals enter and ments, multicycle paths, and false leave your FPGA, because that’s where the bottlenecks frequently occur. paths. If your synthesis tool doesn’t see exactly what you want, it can’t make 2. Do create hierarchy around vendor-specific structures and decisions to optimize your design accordingly. instantiations. Give yourself the freedom to migrate from one technology to 2. Don’t try to fix every timing problem in place and route. Place and another by ensuring that each instantiation of a vendor-specific element is route offers little room for fixing timing where a properly constrained in a separate hierarchical block. This applies especially to RAMs and clock- synthesis tool would. management blocks. 3. Don’t vainly floor plan at the RTL or block level hoping to 3. Do use IP timing models during synthesis to give the true pic- improve place-and-route results. Manual area placement can cause more ture of your design. By importing EDIF netlists of pre-synthesized blocks, problems than it might initially appear to solve. Unless you are an expert in your synthesis tool can fully understand your timing requirements. Be cau- manual placement and floorplanning, this is best left alone. tious when using vendor cores that you can bring into your synthesis tool if 4. Don’t string clock buffers together, create multiple clock they have no timing model. trees from the same clock, or use multiple clocks when a simple enable will 4. Do design your hierarchical blocks with registered outputs do. Clocking schemes in FPGAs can become very complicated now that there where possible to avoid having critical paths pass through many levels of hier- are PLLs, DLLs, and large numbers of clock-distribution networks. Poor archy. FPGAs exhibit step-functions in logic-limited performance. When hier- clocking schemes can lead to extended place-and-route times, failure to archy is preserved and the critical path passes across a hierarchical boundary, meet timing, and even failure to place in some technologies. Simpler you may introduce an extra level of logic. When considered along with the schemes are vastly more desirable. Avoid those gated clocks, too! associated routing, this can add significant delay to your critical path. 5. Don’t forget to simulate your design blocks as well as your 5. Do enable retiming in your synthesis tool. FPGAs tend to be reg- entire design. Discovering and back-tracking an error from the chip’s pins ister-rich architectures. When you correctly constrain your design in synthe- during on-board testing can be extremely difficult. On-board FPGA testing sis, you allow the tool to optimize your design to take advantage of positive can miss important design flaws that are much easier to identify during sim- slack timing within the design. Sometimes this can be done after initial place ulation; they can be rectified by modifying the FPGA’s programming. and route to improve retiming over wireload estimation.

FPGAs are finding their way off the prototyping bench and board or system test, and then reprogrammed to perform into production (Table 2). their main task. On the flip side, though, SRAM-based FPGAs must be reconfigured each time their host system is Comparing FPGA Architectures powered up, and additional external circuitry is required to FPGAs must be programmed by users to connect the chip’s do so. Further, because the configuration file used to pro- resources in the appropriate manner to implement the gram the FPGA is stored in external memory, security issues desired functionality. Over the years, various technologies concerning intellectual property emerge. have emerged to suit different requirements. Some FPGAs Antifuse-based FPGAs aren’t in-system programmable, can only be programmed once. These Table 1: KEY RESOURCES AVAILABLE IN THE LARGEST DEVICES FROM MAJOR FPGA VENDORS devices employ antifuse technology. Features Xilinx Virtex II Pro Altera Stratix Actel Axcelerator Lattice ispXPGA Flash-based devices can be programmed Clock DCM PLL PLL SysCLOCK PLL and reprogrammed management Up to 12 Up to 12 Up to 8 Up to 8 again after debug- ging. Still others can Embedded BlockRAM TriMatrix memory Embedded RAM SysMEM blocks be dynamically pro- memory blocks Up to 10 Mbits Up to 10 Mbits Up to 338 kbits Up to 414 kbits grammed thanks to Data processing Configurable logic Logic elements and Logic modules (C- Based on SRAM-based technol- blocks and 18-bit by embedded multipli- Cell and R-Cell) programmable ogy. Each has its 18-bit multipliers ers Up to 10,000 functional unit advantages and disadvantages (Table 3). Up to 125,000 logic Up to 79,000 LEs and R-Cells and 21,000 Up to 3844 PFUs ost mod- cells and 556 multipli- 176 embedded mul- C-Cells ern er blocks tipliers FPGAs Mare Programmable I/Os SelectI/O Advanced I/O Advanced I/O sup- SysI/O based on SRAM con- support port figuration cells, which offer the benefit of Special features Embedded PowerPC DSP blocks PerPin FIFOs for SysHSI for high- unlimited reprogram- 405 cores bus applications speed serial mability. When pow- High-speed differ- interface RocketI/O multi-giga- ered up, they can be ential I/O and inter- bit transceiver face standards sup- configured to perform port a given task, such as a

Sponsored by Mentor Graphics Inc. BASICS of but rather are pro- The FPGA design flow FPGAs Design grammed offline After weighing all implementation options, you must consid- using a device pro- er the design flow. The process of implementing a design on grammer. Once the an FPGA can be broken down into several stages, loosely chip is configured, it can’t be altered. definable as design entry or capture, synthesis, and place and However, in antifuse technology, device configuration is non- route (Fig. 2). Along the way, the design is simulated at vari- volatile with no need for external memory. On top of that, it’s ous levels of abstraction as in ASIC design. The availability virtually impossible to reverse-engineer their programming. of sophisticated and coherent tool suites for FPGA design They often work as replacements for ASICs in small volumes. makes them all the more attractive. n a sense, flash-based FPGAs fulfill the promise of FPGAs in At one time, design entry was performed in the form of that they can be reprogrammed many times. They’re non- schematic capture. Most designers have moved over to hard- volatile, retaining their configuration even when powered ware description languages (HDLs) for design entry. Some Idown. Programming is done either in-system or with a pro- will prefer a mixture of the two techniques. Schematic-based grammer. In some cases, IP security can be achieved using a multi- design-capture tools gave designers a great deal of control bit key that locks the configuration data after programming. over the physical placement and partitioning of logic on the But flash-based FPGAs require extra process steps above and device. But it’s becoming less likely that designers will take beyond standard CMOS technology, leaving them at least a gen- that route. Meanwhile, language-based design entry is faster, eration behind. Moreover, the many pull-up resistors result in but often at the expense of performance or density. high static power consumption. or many designers, the choice of whether to use FPGAs can also be characterized as having either fine-, medi- schematic- or HDL-based design entry comes down um-, or coarse-grained architectures. Fine-grained architectures to their conception of their design. For those who boast a large number of relatively simple logic blocks. Each logic Fthink in software or algorithmic-like terms, HDLs are block usually contains either a two-input logic function or a 4-to- the better choice. HDLs are well suited for highly complex 1 multiplexer and a flip-flop. Blocks can only be used to imple- designs, especially when the designer has a good handle on ment simple functions. But fine-grained architectures lend them- how the logic must be structured. They can also be very useful for designing smaller functions when you haven’t the time or 1. Functional Blocks inclination to work through the actual hardware implementation. Just about all FPGAs include a regular, programmable, and flexible architecture of logic blocks surrounded by input/output blocks on the perimeter. These functional blocks are linked together by a hierarchy of highly versatile programmable interconnects. 2. The Big Picture A “big picture” look at an FPGA design flow shows the major steps in the process: design entry, synthesis from RTL to gate level, and physical design. Place and Input/output blocks route is done using the FPGA vendors’ proprietary tools that account for the devices’ architectures and logic-block structures.

Logic blocks

Modify design

Programmable interconnects

Vendor placeplace and routeroute

selves to execution of functions that benefit from parallelism. No-guess flow oarse-grained architectures consist of relatively large log- Achieved ic blocks often containing two or more lookup tables and timing?timing?timing? No two or more flip-flops. In most of these architectures, a Cfour-input lookup table (think of it as a 16 x 1 ROM) implements the actual logic. Yes Done!

Sponsored by Mentor Graphics Corp. On the other Table 2: FPGA USAGE formance in hand, HDLs rep- Emulation: 3% Prototyping: 30% Preproduction: 30% Production: 37% terms of speed. resent a level of As a result, Time-to-market Fairly high; fast Fairly high; fast Fairly high; fast com- Fairly high; fast abstraction that compile times compile times pile times compile times designers can can isolate design- perform many Performance Not stringent Not stringent Very critical Very critical ers from the simulation runs details of the Very low per in an effort to Volume application Low per application Moderately high per High per applica- hardware imple- application tion refine the logic. mentation. At this stage, Schematic-based FPGA develop- entry gives designers much more visibility into the hardware. It’s ment isn’t unlike software development. Signals and variables are a better method for those who are hardware-oriented. The observed, procedures and functions traced, and breakpoints set. downside of schematic-based entry is that it makes the design The good news is that it’s a very fast simulation. But because the more difficult to modify or port to another FPGA. design hasn’t yet been synthesized to gate level, properties such as third option for design entry, state-machine entry, timing and resource usage are still unknowns. works well for designers who can see their logic design The next step following RTL simulation is to convert the RTL as a representation Aseries Table 3: ADVANTAGES/DISADVANTAGES OF VARIOUS FPGA TECHNOLOGIES of the design of states that the into a bit-stream system steps Feature SRAM Antifuse Flash file that can be through. It shines Reprogrammable? Yes (in-system) No Yes (in-system or offline) loaded onto the when designing Reprogramming speed Fast Not 3X SRAM FPGA. The somewhat simple (including erasure) applicable interim step is functions, often in Volatile? Yes No No (but can be if required) FPGA synthesis, the area of system External configuration file? Yes No No which translates control, that can Good for prototyping? Yes No Yes the VHDL or be clearly repre- Instant-on? No Yes Yes Verilog code sented in visual into a device IP security Poor Very good Very good formats. Tool netlist format support for finite Size of configuration cell Large (six transistors) Very small Small (two transistors) that can be state-machine Power consumption High Low Medium understood by a entry is limited, Radiation hardness? No Yes No bit-stream con- though. verter. Some designers The synthesis approach the start of their design from a level of abstraction process can be broken down into three steps. First, the HDL higher than HDLs, which is algorithmic design using the C/C++ code is converted into device netlist format. Then the resulting programming languages. A number of EDA vendors have tool file is converted into a hexadecimal bit-stream file, or .bit file. flows supporting this design style. Generally, algorithmic design This step is necessary to change the list of required devices and has been thought of as a interconnects into hexa- tool for architectural 3. Go With The Flow decimal bits to down- exploration. But increas- The implementation flow for FPGAs begins with synthesis of the HDL design description into a gate-level netlist. load to the FPGA. Last- ingly, as tool flows Accounting for user-defined design constraints on area, power, and speed, the tool performs various optimiza- ly, the .bit file is emerge for C-level syn- tions before creating the netlist that’s passed on to place-and-route tools. downloaded to the thesis, it’s being accepted physical FPGA. This as a first step on the road final step completes the to hardware implemen- FPGA synthesis proce- tation. Language input (VHDL/Verilog) dure by programming fter design the design onto the HDL files Initial optimization entry, the physical FPGA. design is sim- Timing analysis t’s important to fully ulated at the constrain designs A Timing optimization register-transfer level Constraints before synthesis (Fig. (RTL). This is the first of Design I3). A constraint file several simulation stages, is an input to the synthe- because the design must Placement sis process just as the be simulated at successive RTL code itself. Con- levels of abstraction as it Routing VHDL/IP RTL straints can be applied moves down the chain globally or to specific toward physical imple- Implement portions of the design. mentation on the FPGA Verilog/IP RTL The synthesis engine uses itself. RTL simulation FPGA/PLD these constraints to opti- offers the highest per- mize the netlist. However, it’s equally important to not over-constrain the design, which odern FPGAs also incorporate a JTAG port that, will generally result in less-than-optimal results from the next happily, can be used for more than boundary-scan step in the implementation process—physical device place- testing. The JTAG port can be connected to the ment—and interconnect routing. Synthesis constraints soon Mdevice’s internal SRAM configuration-cell shift reg- become place-and-route constraints. ister, which in turn can be instructed to connect to the chip’s This traditional flow will work, but it can lead to numer- JTAG scan chain. ous iterations before achieving timing closure. Some EDA If you’ve gotten this far with your design, chances are you vendors have incorporated more modern physical synthesis have a finished FPGA. There’s one more step to the process, techniques, which automate device re-timing by moving however, which is to attach the device to a printed-circuit board lookup tables (LUTs) across registers to balance out timing in a system. The appearance of 10-Gbit/s serial transmitters, or slack. Physical synthesis also anticipates place and route to I/Os, on the chip, coupled with packages containing as many as leverage delay information. 1500 pins, makes the interface between the FPGA and its intend- ollowing synthesis, device implementation begins. ed system board a very sticky issue. All too often, an FPGA is After netlist synthesis, the design is automatically soldered to a pc board and it doesn’t function as expected or, converted into the format supported internally by worse, it doesn’t function at all. That can be the result of errors Fthe FPGA vendor’s place-and-route tools. Design- caused by manual placement of all those pins, not to mention rule checking and optimization is performed on the incoming the board-level timing issues created by a complex FPGA. netlist and the software partitions the design onto the avail- More than ever, designers must strongly consider an integrat- able logic resources. Good partitioning is required to achieve ed flow that takes them from conception of the FPGA through high routing completion and high performance. board design. Such flows maintain complete connectivity Increasingly, FPGA designers are turning to floorplanning between the system-level design and the FPGA; they also do so after synthesis and design partitioning. FPGA floorplanners between design iterations. Not only do today’s integrated FPGA- work from the netlist hierarchy as defined by the RTL cod- to-board flows create the schematic connectivity needed for veri- ing. Floorplanning can help if area is tight. When possible, it’s a good idea to place critical logic in 4. Simulation Stages separate blocks. After partitioning and floorplanning, the FPGA simulation occurs at various stages of the design process: after RTL design, after synthesis, and once again placement tool tries to place the logic blocks to after implementation. The latter is a final gate-level check, accounting for actual logic and interconnect delays, of logic functionality. achieve efficient routing. The tool monitors routing length and track congestion while placing the blocks. It may also track the absolute path delays to meet the user’s timing constraints. Overall, the FPGA gate process mimics PCB place and route. FPGA Testbench library Functional simulation is performed after syn- RTL design thesis and before physical implementation. This step ensures correct logic functionality. After implementation, there’s a final verification step with full timing information. After placement and routing, the logic and routing delays are HDL simulator back-annotated to the gate-level netlist for this Place and route final simulation. At this point, simulation is a much longer process, because timing is also a fac- tor (Fig. 4). Often, designers substitute static timing analysis for timing simulation. Static timing Synthesis analysis calculates the timing of combinational paths between registers and compares it against the designer’s timing constraints. nce the design is successfully verified and found to meet timing, the final step is to actually program the FPGA itself. At the com- fication and layout of the board, but they also document which Opletion of placement and routing, a binary pro- signal connections are made to which device pins and how these gramming file is created. It’s used to configure the device. No map to the original board-level bus structures. matter what the device’s underlying technology, the FPGA ntegrated flows for FPGAs make sense in general, consider- interconnect fabric has cells that configure it to connect to ing that FPGA vendors will continue to introduce more com- the inputs and outputs of the logic blocks. In turn, the cells plex, powerful, and economical devices over time. An inte- configure those logic blocks to each other. Most programma- Igrated third-party flow makes it easier to re-target a design ble-logic technologies, including the PROMs for SRAM- to different technologies from different vendors as conditions based FPGAs, require some sort of a device programmer. warrant. Devices can also be programmed through their configuration ports using a set of dedicated pins.

BASICSof FPGAs Design