XILINX, INC. XCELL JOURNAL ISSUE 64, SECOND QUARTER 2008 EMBEDDED PROCESSING SOLUTIONS EDITION
Xcell journal
SOLUTIONS FOR A PROGRAMMABLE WORLD
Inside the New Virtex-5 FXT FPGA
COVER
Embedded Processing Innovations with Virtex-5 FXT Devices
INSIDE
Next-Generation Wireless Standards Using Virtex-5 FXT Devices
FPGA Coprocessors for Spacecraft Image Processing
Designing Multiprocessor SOCs
Building Linux Platforms on Xilinx Processors
www.xilinx.com/xcell/

Support Across The Board™
Bringing Products to Life.
At Avnet Electronics Marketing, support across the board is much more than a tagline for us. From design to delivery – we are deeply committed to driving maximum efficiency throughout the product lifecycle.
Bryan Fletcher, Global Technical Marketing Manager, Avnet
David Loftus, Vice President / General Manager, General Products Division (GPD), Xilinx
The Challenge
When the world's largest FPGA maker Xilinx was mounting the launch of its latest Spartan™-3 Generation family of low-cost FPGAs, the company wanted to ensure designers would have everything they need to compete in the highly competitive high-volume market. Xilinx turned to longtime partner Avnet to lend its technical expertise in creating a complete solution.

The Solution
Avnet spent countless engineering hours alongside Xilinx – developing a comprehensive solution of starter kits, development boards, and Speedway™ educational workshops. With Xilinx® Spartan-3 Generation FPGAs and Avnet's support across the board, designers have access to all they need to save up to 50 percent in total system cost over competing FPGAs.
Visit www.em.avnet.com/xilinxspartan3kits to learn more about Spartan-3 solutions and to purchase a Spartan-3A evaluation kit for only $39!
1 800 332 8638 www.em.avnet.com
©Avnet, Inc. 2008. All rights reserved. AVNET is a registered trademark of Avnet, Inc. ©2008 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc. All other trademarks are the property of their respective owners.

LETTER FROM THE PUBLISHER
Integrated System Platforms: The Next Wave in FPGA-based Innovation for Embedded Processing Applications

FPGAs have come a long way since the days of serving as prototypes to test design concepts before committing them to 'permanent' silicon. FPGAs have now become the foundation for many of the most innovative products that improve our lives every day. Driven by diverse and demanding market requirements, FPGAs are valued above all else for their flexibility. Their inherent programmability is the insurance policy product developers depend on to quickly and efficiently meet market windows and keep products current in fast moving, evolving markets.

As many of the articles published in Xcell Journal illustrate, FPGAs are no longer a single-function device. These programmable devices have emerged as the preferred platform for the highly integrated world of systems-on-chip (SoC) and embedded system design. Therefore, it is no coincidence that many of you will see this issue of our magazine at the Embedded System Conference in San Jose, CA, one of the largest gatherings of hardware designers and software developers.

It's also appropriate that this theme is highlighted as Xilinx introduces its ultimate system integration platform – the Virtex™-5 FXT FPGA. Comprising the fourth platform in the 65-nanometer Virtex-5 family, Virtex-5 FXT devices combine flexibility with very high-performance embedded processing, digital signal processing and connectivity capabilities into a single chip to reduce total system cost, power, and board real estate.

In this Issue
The significance of the Virtex-5 FXT platform for embedded developers is underscored in the cover article, which describes in detail the industry's first FPGAs with embedded PowerPC® 440 processor blocks, high-speed RocketIO™ serial transceivers, and dedicated XtremeDSP™ processing capabilities. These devices deliver an optimal system integration platform that meets market and bandwidth requirements of transporting voice, video, and data for a wide range of applications in wired and wireless communications, audio/video broadcast equipment, military, aerospace and industrial systems, along with many others. This issue provides a "behind-the-scene" look into a variety of FPGA-based embedded system applications, ranging from outer space and aerospace to more earthly scientific and automotive innovations.

In addition, you'll learn how every aspect of Xilinx embedded processing solutions has been substantially upgraded and usability drastically simplified through intuitive hardware and software tools. Advancements include the v7 release of the MicroBlaze™ 32-bit soft processor with configurable memory management unit, the new high-performance processor interconnect architecture, and the ISE™ Design Suite 10.1 development environment with unified access to all logic, embedded, and DSP design tools. Of course, our expansive ecosystem of third-party embedded technology and service providers, several of whom are featured in this issue, also plays a key role in helping developers maximize the benefits of Xilinx embedded solutions.

At the end of the day, the goal of these efforts is to enable rapid design of high-performance embedded systems with highly integrated, scalable hardware platforms, customizable embedded processors, and pre-verified IP cores. Designers can then focus their efforts on creating differentiated value, thereby getting products to market in time to extract maximum return and ultimately realizing the full potential of their own inspirations.

Forrest Couch
Publisher

Xcell journal
PUBLISHER: Forrest Couch, [email protected], 408-879-5270
EDITOR: Charmaine Cooper Hussain
ART DIRECTOR: Scott Blair
DESIGN/PRODUCTION: Teie, Gelwicks & Associates, 1-800-493-5551
ADVERTISING SALES: Dan Teie, 1-800-493-5551, [email protected]
TECHNICAL COORDINATORS: Larry Caputo, Jay Gould
INTERNATIONAL: Piera Or, Asia Pacific, [email protected]; Christelle Moraga, Europe/Middle East/Africa, [email protected]; Yumi Homura, Japan, [email protected]
SUBSCRIPTIONS: All Inquiries, www.xcellpublications.com
REPRINT ORDERS: 1-800-493-5551

www.xilinx.com/xcell/

Xilinx, Inc., 2100 Logic Drive, San Jose, CA 95124-3400. Phone: 408-559-7778. FAX: 408-879-4780. www.xilinx.com/xcell/

© 2008 Xilinx, Inc. All rights reserved. XILINX, the Xilinx Logo, and other designated brands included herein are trademarks of Xilinx, Inc. All other trademarks are the property of their respective owners.

The articles, information, and other materials included in this issue are provided solely for the convenience of our readers. Xilinx makes no warranties, express, implied, statutory, or otherwise, and accepts no liability with respect to any such articles, information, or other materials or their use, and any use thereof is solely at the risk of the user. Any person or entity using such information in any way releases and waives any claim it might have against Xilinx for any loss, damage, or expense caused thereby.

ON THE COVER
EMBEDDED APPLICATIONS (page 14)
Next-Generation Wireless Standards Using Virtex-5 FXT Devices
LTE baseband reference design implements hardware, software and high-speed off-chip comms for wireless baseband processing on single device.

SYSTEM DEVELOPMENT (page 48)
Designing Multiprocessor SOCs
Using Xilinx EDK tools and IP, you can scale your applications by designing SOCs with multiple processors.

EMBEDDED APPLICATIONS (page 22)
FPGA Coprocessors for Spacecraft Image Processing
C-to-FPGA design techniques speed development of performance-accelerated space-based imaging applications.

PLATFORMS, PROCESSORS, & TOOLS (page 61)
Building Linux Platforms on Xilinx Processors
Find out how a LinuxLink subscription accelerates your Linux development and mitigates project risks.

COVER (page 8)
Embedded Processing Innovations with Virtex-5 FXT FPGAs
Get ahead of the embedded system design curve by maximizing performance and throughput with the built-in PowerPC 440 processor block.

SECOND QUARTER 2008, ISSUE 64
Xcell journal
VIEWPOINTS
Letter from the Publisher: The Next Wave in Embedded Processing Innovation ...... 3
Xilinx Welcomes New President & CEO ...... 7
Power.org: Presenting Power Architecture Technology ...... 20

FEATURES

Cover
Embedded Processing Innovations with Virtex-5 FXT Devices ...... 8

Embedded Applications
Implementing Next-Generation Wireless Standards Using Virtex-5 FXT Device Parts ...... 14
Developing FPGA Coprocessors for Performance-Accelerated Spacecraft Image Processing ...... 22
Using the MicroBlaze Processor to Develop Speed Sensors for F1 Racing Cars ...... 28
FPGA-Based Controller Enables Precise Chemical Analysis ...... 32
Reconfiguring the Battlespace ...... 36

System Development
Hardware/Software Optimization of Embedded Systems ...... 41
Custom Processors in FPGAs ...... 45
Designing Multiprocessor SOCs ...... 48
Accelerating Video Development on FPGAs Using the Xilinx XtremeDSP Video Starter Kit ...... 52

Platforms, Processors, & Tools
Microprocessor Report: MicroBlaze v7 Gets an MMU ...... 55
Building Linux Platforms on Xilinx Processors with LinuxLink ...... 61
Quickly Build Distributed Systems ...... 65
Debugging Systems with FPGA Embedded Processors ...... 68
A Complete Debugging Solution for Xilinx Embedded Processors ...... 72
Reduce FPGA Design Time with PICO Express FPGA ...... 75
Design Embedded Systems with Xilinx and Synplicity ...... 78

RESOURCES
Embedded Processing Application Notes and Reference Systems ...... 81
Embedded Processing QuickStart! ...... 82

Unlock your future
Enter the New Era of Configurable Embedded Processing
Adapt to changing algorithms, protocols and interfaces, by creating your next embedded design on the world’s most flexible system platform. With the latest processing breakthroughs at your fingertips, you can readily meet the demands of applications in automotive, industrial, medical, communications, or defense markets.
Architect your embedded vision
• Choose MicroBlaze™, the only 32-bit soft processor with a configurable MMU, or the industry-standard 32-bit PowerPC® architecture
• Select the exact mix of peripherals that meet your I/O needs, and stitch them together with the new optimized CoreConnect™ PLB bus

Build, program, debug . . . your way
• Port the OS of your choice including Linux 2.6 for PowerPC or MicroBlaze
• Reduce hardware/software debug time using Eclipse-based IDEs together with integrated ChipScope™ analyzer

Eliminate risk & reduce cost
• No worry of processor obsolescence with Xilinx Embedded Processing technology and a range of programmable devices
• Reconfigure your design even after deployment, reducing support cost and increasing product life

Get the Complete Embedded Solution
Order your complete development kit today, and unlock the future of embedded design.
www.xilinx.com/processor
At the Heart of Innovation
©2008 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc. All other trademarks are the property of their respective owners.

View from the top
Xilinx Welcomes New President & CEO
Wim Roelandts Passes the Torch to Industry Veteran Moshe Gavrielov
On January 7, 2008, Xilinx announced the appointment of a new president and CEO, Moshe Gavrielov. He assumes the role from Wim Roelandts, one of the most revered CEOs in the industry. Roelandts remains chairman of the board, while Gavrielov becomes only the third Xilinx® CEO in the company's 24-year history. In this issue's View from the Top, Moshe provides a glimpse into his vision for Xilinx and how he plans to get us there.

Let me start by saying how honored and excited I am to assume responsibility for leading one of the most highly respected semiconductor companies in the world. This is truly my dream job and I feel I'm joining the company at a very exciting time. We have a great opportunity to bring the benefits of Xilinx technology to a broader range of industries and applications in the coming years.

FPGAs are becoming more and more relevant to a wider range of designers. It's a well-known fact that the best way for designers to prepare for change is to arm themselves with options. The FPGA has always provided these options through the programmability of its hardware. We are the ultimate providers of faster time to market. But it's not just about silicon and flexibility – it's about the IP as well. In that way, Xilinx today is in a similar transition to one I have faced before: moving from a supplier of blank gates to being a supplier and supporter of the IP that goes into the gates. But supporting a body of IP in the field is a non-trivial undertaking.

I see a three-fold opportunity for Xilinx to better serve our customers while achieving higher revenues and increased market share. First, we must maintain our pace in expanding the underlying capabilities of our silicon. Second, we must continue to build our portfolio of IP across a growing breadth of applications. And third, we must continue to invest in our development tools so that we are able to serve the diverse needs of a growing, increasingly specialized community of traditional and new users.

Xilinx is already on the path to becoming a complete solutions supplier. But we have a ways to go before we get there. We need to learn from our customers across the globe, developing an intimate understanding of their pain points as if they were our own. Then we need to provide solutions to make their pain disappear.

On DSP and Embedded
Xilinx already had its eye on the DSP and embedded markets when I arrived. Three years ago, the company announced the formation of a new division to broaden the reach of Xilinx FPGAs into multi-billion dollar vertical markets previously the domain of ASICs and ASSPs, including audio, video and broadcast, industrial automation, aerospace and defense, medical, automotive, and consumer electronics.

FPGAs offer a compelling value proposition and significant technology advantages for both high-performance DSP and embedded applications. Already, DSP and embedded processors have opened an entirely new set of opportunities for Xilinx. By integrating key functionalities into a single FPGA, designers can reduce total system cost, power, and board space by reducing component count. Simple, right? Not so simple actually, which leads me back to my point on becoming a total solutions provider. Every designer – particularly those coming from the DSP or embedded space – will need more than just FPGAs to make the proverbial leap from an ASIC or ASSP.

Xilinx must provide all of the necessary tools, including IP, software tools, and design methodologies that allow designers to leverage the high-performance, flexible capabilities of the FPGA. Oh yeah – and we need to make that leap as compelling and simple as possible. I, for one, am looking forward to doing just that.

For the latest in Xilinx solutions designed to meet the needs of the DSP and embedded designer, visit www.xilinx.com.
Second Quarter 2008 Xcell Journal 7

COVER STORY
Embedded Processing Innovations with Virtex-5 FXT Devices
Maximizing performance and throughput utilizing the built-in PowerPC 440 processor block.
by Craig Abramson
Product Marketing Manager
Xilinx, Inc.
[email protected]

Dan Isaacs
Director of Embedded Processor Marketing
Xilinx, Inc.
[email protected]

Ahmad Ansari
Principal Engineer
Xilinx, Inc.
[email protected]

With the advent of the Xilinx® Virtex™-5 FXT FPGA, you have an opportunity to get ahead of the embedded system design curve. The need to quickly develop and validate embedded systems has never been more apparent than in the realm of embedded system design.

Combining software and hardware to demonstrate this at a system level (as quickly as time permits) has become commonplace in the industry. By providing a more tightly coupled, flexible, scalable solution, you have a means to address many hardware and software SOC design challenges. FPGAs provide a significantly faster path for designers to rapidly develop, prototype, and test their embedded designs. The Virtex-5 FXT device platform, the third-generation FPGA to feature a PowerPC processor, has added an embedded block that will help you meet more demanding design requirements while allowing you to finish your designs quickly and easily.

In this article, we'll provide a detailed description of the embedded processing innovations in the PowerPC 440 processor block and system interconnect. A key area of focus in the Virtex-5 FXT FPGA processor block is simplification through integration.

A corollary to this is ease of development and test. Quickly bringing up a system to allow software developers to get a head start on actual hardware is a major emphasis for the Virtex-5 FXT device's PowerPC 440 processor.

Simplification Through Integration
Integration is key. We have reduced the amount of FPGA logic needed to build a high-performance processing system while still allowing a wide variety of topologies. You still have the flexibility and advantages of an FPGA-based implementation, but you now also have the added benefit of a hardened, integrated interconnect architecture that (among other things) maximizes access to external memory.

As you will see, the result is an embedded block that allows you to develop a wider range of high-performance processing architectures in a shorter period of time.

PowerPC processors generically have three interfaces: instruction read, data read, and data write. In previous Virtex device architectures, which embedded the PowerPC 405, these processor buses would connect to FPGA fabric. The timing closure requirements of this circuitry would vary based on how many and what types of loads the design presented to the buses.

In the Virtex-5 FXT FPGA (where the processor is now the PowerPC 440), these buses are hardened and hooked directly to a new structure, an integrated 5 x 2 crossbar switch – generically referred to as the crossbar. This hardened interconnect provides significantly higher performance (with virtually no consumption of FPGA logic resources and fixed timing) when combined with the rest of the architectural enhancements in the Virtex-5 FXT device's embedded processor block. This results in an overall system cost reduction and invariably a more tightly integrated processor system.

The processor buses only take up three of the five "crossbar master" ports on the 5 x 2 crossbar (see Figure 1). The crossbar includes two additional master ports, because in many real-world applications it's not just the processor that needs access to memory or peripherals. Each of these "crossbar master" ports comprises a processor local bus (PLB) slave interface, as well as two channels of scatter-gather direct memory access (DMA).

The "slave" side of the crossbar comprises two ports. One port is a dedicated memory controller interface that provides a high-throughput generic interface to soft memory controllers. The other is a bus for attaching I/O devices and peripherals.

[Figure 1 – The crossbar. Crossbar masters: PLB Slave/DMA, Processor Instruction Read, Processor Data Read, Processor Data Write, PLB Slave/DMA. Crossbar slaves: Memory Controller Interface, PLB Master.]

A Better Processor
Providing all of this extra functionality in the embedded processor block would be of little consequence if there were not a processor with the horsepower to take advantage of it. The Virtex-5 FXT FPGA represents the first time anyone has embedded a PowerPC 440-class processor in an FPGA.
The PowerPC 440 offers a significant performance improvement over the PowerPC 405 (which was embedded in previous Virtex families) in a number of areas.

First, the PowerPC 440, when in the fastest speed-grade FPGA, can be clocked at 550 MHz. The PowerPC 405 topped out at 450 MHz. This is a clock-rate increase of roughly 22%. But add to that the fact that the I and D cache sizes are doubled, the instruction pipeline is seven instead of five stages, and the execution unit can now execute two instructions out of order and in parallel.

The result? You've got a processor with performance sufficient to handle a great many of today's embedded processing challenges.

There are a number of other advantages to moving from the PowerPC 405 to the PowerPC 440, as shown in Table 1.

Table 1 – Virtex PowerPC integrated processor feature comparison
Feature | PPC405 (Virtex-4 FX FPGA) | PPC440 (Virtex-5 FXT FPGA) | Benefit
Architecture | 32-bit instruction, 32-bit address, 64-bit data | 32-bit instruction, 36-bit address, 128-bit data, Book E compliant | Access more physical memory, higher speed data movement
Pipeline | Single instruction/cycle, five-stage pipeline, in-order issue | Two instructions/cycle, seven-stage pipeline, out-of-order issue | More efficient instruction execution
Caches – I/D | 16K/16K, two-way set associative, no locking | 32K/32K, 64-way set associative, locking | Less memory access latency
MMU | Page size: 1 KB to 16 MB | Page size: 1 KB to 256 MB | Less page swapping
DMIPS Estimate | 700+ DMIPS | 1000+ DMIPS | Better benchmarks equal higher performance

The PowerPC 440 embedded block is shown in Figure 2.

[Figure 2 – The PowerPC 440 embedded processor block. Block labels: PowerPC 440 Processor (I, R, W), APU, Memory I/F, PLB Master, PLB Slv0, PLB Slv1, DMA (four channels), DCR.]

High-Throughput Switch Matrix
The 5 x 2 crossbar is more than just a big switch. It provides non-blocking pipelined access from the five crossbar masters to the two crossbar slaves (see Figure 1). It allows concurrent transfers between different agents on the crossbar at the same time.

As shown, we'll call the buses going into the crossbar "crossbar masters" and the buses coming out "crossbar slaves." These interfaces are highly pipelined, thus allowing a large number of transactions to be in progress at the same time.

In fact, up to four concurrent transactions can exist: two for each crossbar slave (such as the memory controller or PLB master). Additionally, each crossbar master (that is, the three processor PLBs and the two PLB slave interfaces) can pipeline four read and four write transactions to the same slave.

Another key feature of the crossbar is its highly programmable memory mapping. You can think of the entire system as having an available memory space of 4 GB. Both the memory controller interface and the PLB master can have different memory windows mapped into the memory space of any of the crossbar masters. These memory spaces can be programmed through the FPGA bitstream, by the processor at run time, or even by external logic on the FPGA using the crossbar's sideband bus, called the device control register (DCR) bus.

Integrated PLB Interfaces
As we mentioned earlier, many of the buses connected to the crossbar are processor local buses, also called PLBs. The PLB is one of the standard CoreConnect buses as defined by IBM.

An earlier version of the PLB (version 3.4) was used as one of the standard buses on PowerPC 405 designs in Virtex-II Pro and Virtex-4 FX FPGAs and is also used in the new PowerPC 440 embedded processor block.

In the PowerPC 440 embedded processor block, the PLBs connect the processor's internal caches to the input side of the crosspoint. The buses are:

• ICURD: instruction cache unit read
• DCURD: data cache unit read
• DCUWR: data cache unit write
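To make the crossbar's programmable memory mapping a little more concrete, here is a minimal Python sketch of how an address in the 4 GB space might be steered to one of the two crossbar slaves. It is a conceptual model only. The window base addresses and sizes are invented for illustration, and the names `Window` and `route` are hypothetical; the real mapping is programmed through the bitstream, the processor, or the DCR bus, as described in the High-Throughput Switch Matrix section.

```python
from dataclasses import dataclass

ADDRESS_SPACE = 1 << 32  # the 4 GB system address space mentioned in the article


@dataclass(frozen=True)
class Window:
    """One address window owned by a crossbar slave (hypothetical layout)."""
    slave: str  # "MCI" (memory controller interface) or "MPLB" (PLB master)
    base: int   # first byte address covered by the window
    size: int   # window length in bytes

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


# Illustrative values only: not taken from any Xilinx documentation.
WINDOWS = [
    Window("MCI", 0x0000_0000, 256 * 1024 * 1024),  # external memory
    Window("MPLB", 0x8000_0000, 64 * 1024 * 1024),  # I/O devices and peripherals
]


def route(addr: int, windows=WINDOWS) -> str:
    """Return the name of the crossbar slave that claims addr."""
    if not 0 <= addr < ADDRESS_SPACE:
        raise ValueError(f"address {addr:#x} is outside the 4 GB space")
    for window in windows:
        if window.contains(addr):
            return window.slave
    raise LookupError(f"no crossbar slave is mapped at {addr:#010x}")
```

With these made-up windows, `route(0x8000_0010)` selects the PLB master, while an unmapped address such as `0x4000_0000` raises a `LookupError`. Giving each crossbar master its own `windows` list would model the per-master mapping the article describes.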
[Figure 3 – PLB masters and slaves. Block labels: PPC440 (ICURD, DCURD, DCUWR), PLB master devices, PLB slave devices, PLB Slave ports, MCI, soft memory controller, external DDR2 memory, MPLB.]
The PLB used in the Virtex-5 FXT device is version 4.6 (PLB46). The PLB46 bus architecture brings with it a number of new capabilities that give it a nice performance boost over its predecessor. The most obvious is the fact that while PLB34 was 64 bits, PLB46 is 128 bits. But not to worry – if the IP connected to the bus is less than that, the bus will perform dynamic bus sizing as required to accommodate 32- and 64-bit transactions.

We should also point out that the PLB46 version is a Xilinx implementation of the IBM-defined PLB46, optimized to take advantage of FPGA resources.

PLB46 – and indeed all versions of PLB – have the concept of master and slave. This should not be confused with crossbar master and crossbar slave. (Again, refer to Figure 1.)

As we stated earlier, there are two PLB slave port interfaces on the crossbar; they are crossbar masters. These slave ports are connected to the FPGA fabric.

In a processor system there is often the need to allow something besides the processor to access external memory or on-chip peripherals. The PLB slave interfaces allow just that. PLB masters, built from FPGA logic, connected to the PLB Slave ports (see Figure 3) can access either the MCI or the MPLB through the crossbar.

Similarly, the function of the PLB master (the one that is the crossbar slave) is to have a PLB to hook to I/O devices and soft peripherals. And because the PLB master is a crossbar slave, anything hooked to a crossbar master port can access it.

Note that there can be no more than four PLB masters connected to each PLB slave bus. Few systems are likely to need more than four masters, but if you did need more, you could always use the PLB/PLB bridge IP provided with the Embedded Development Kit (EDK) (see www.xilinx.com/support/documentation/ipembedprocess_coreconnect_plbbusstruct.htm).

Figure 3 is a simplified system diagram showing how PLB peripherals can be hooked to crossbar master and crossbar slave ports. Note that if you have multiple masters on any PLB, arbitration is handled by the IP for the bus. You do not need a separate arbiter.

[Figure 4 – PLB slave buses and DMA channels share crossbar arbitration. Block labels: two arbitration groups (PLB Slv0 and PLB Slv1), each with a 128-bit PLB slave and two DMA channels having 32-bit LocalLink RX and TX interfaces; programmable arbitration; MCI; MPLB.]

Optimized DMA Engines
There are four additional crossbar masters; they are the four DMA channels. Each DMA channel has separate 32-bit transmit and 32-bit receive interfaces. They share crossbar arbitration with PLB slave interfaces, as shown in Figure 4.

All DMA ports can be operating at the same time. Each one has a dedicated FIFO, so as one DMA is accumulating data, the other DMA can be pumping data through the crossbar. Each DMA channel operates asynchronously to the processor clock.

The interface into the DMA channels is through an interface called LocalLink. Xilinx uses the LocalLink interface in a
number of IP blocks. LocalLink is a point-to-point interface that sends packets to, or receives packets from, some external device.

The most notable type of processor IP that uses the LocalLink interface is the hard embedded tri-mode Ethernet media access controller (TEMAC) block. The TEMAC has a wrapper that allows it to communicate directly with the PowerPC 440 DMA.

Although all data paths through the crossbar are 128 bits, the LocalLink interfaces into and out of the DMA channels are all 32 bits. As such, there is built-in logic between the DMA controller and the crossbar that realigns data.

To maximize throughput and performance, the PowerPC 440 embedded block employs scatter/gather DMA. To make using this capability as easy as possible, Xilinx provides wrappers for the various pieces of IP and embedded blocks it offers.

The first one targeted specifically toward the PowerPC 440 is the soft wrapper for the embedded TEMAC blocks. This wrapper, combined with the functionality of the DMA engine in the PowerPC 440 embedded block, allows you to easily build a processing system with a high-performance TEMAC connected directly to the PowerPC 440 DMA channels. Figure 5 is a simplified system showing how both DMA and PLB peripherals can be hooked to crossbar master and crossbar slave ports.

[Figure 5 block labels: PPC440 (ICURD, DCURD, DCUWR), PLB master and slave devices, PLB Slave and DMA ports, MCI, soft memory controller, external DDR2 memory, MPLB, LocalLink interfaces, wrappers, hard TEMACs.]

The DMA channels are controlled by descriptors, small blocks of memory that are set up by the PowerPC 440 processor before commencing DMA operations. The descriptors control how much data is transferred and where data is located in system memory.

Descriptors can be chained together if need be, effectively creating a sequence of commands to control a DMA channel.

The DMA controller is covered in its entirety in the reference guide, entitled "Embedded Processor Block in Virtex-5 FPGAs" (http://www.xilinx.com/support/documentation/user_guides/ug200.pdf).

High-Performance Dedicated Memory Interface
Rounding out the new processor block is the dedicated memory controller interface. The purpose of this interface is to provide a dedicated link out to external memory, but at the same time not be tied to any specific memory technology.

At this time, the memory controller interface supports a stand-alone DDR2 controller and MPMC4 controller, all available through Xilinx Platform Studio, EDK 10.1. This interface provides the flexibility to connect to virtually any memory technology now or in the future.

The memory controller interface is streamlined, comprising address/data/control. It can be programmed to support 128-, 64-, 32-, or even 16-bit memory. It does width and burst realignment, so while the DMA may be bursting one size packet, the memory controller can buffer and realign the packet data to maximize the bandwidth to the memory. Burst size is programmable and can be 1, 2, 4, or 8, and the memory controller interface will automatically adjust the address to accommodate the various burst widths.

The majority of soft memory controller handshaking signals are generated by the interface on behalf of the memory controller. They are provided ahead of time such that the soft memory controller can generate throttling signals back to the memory interface. The memory controller interface – on behalf of the soft memory controller – can also be programmed to detect bank and row misses ahead of time and will inform the soft memory controller to anticipate a bank or row miss. All of these features together provide a solution whose primary goal is to maximize memory throughput.
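The descriptor chaining described in the Optimized DMA Engines section can be sketched in a few lines of Python. This is a conceptual model only: the field names (`address`, `length`, `next`) and the `gather` helper are assumptions made for illustration and do not reflect the actual descriptor layout, which is documented in the reference guide cited above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Descriptor:
    """Hypothetical DMA descriptor: where the data lives and how much to move."""
    address: int                         # byte offset of the buffer in memory
    length: int                          # number of bytes to transfer
    next: Optional["Descriptor"] = None  # link to the next descriptor, if any


def gather(memory: bytes, head: Optional[Descriptor]) -> bytes:
    """Walk a descriptor chain and concatenate the scattered buffers.

    This mimics the "gather" half of scatter/gather DMA: the processor builds
    the chain once, then the engine follows the links on its own.
    """
    out = bytearray()
    desc = head
    while desc is not None:
        out += memory[desc.address:desc.address + desc.length]
        desc = desc.next
    return bytes(out)


# Three buffers scattered through a small stand-in for system memory.
memory = b"....HEADER....PAYLOAD-DATA..CRC."
chain = Descriptor(4, 6, Descriptor(14, 12, Descriptor(28, 3)))
packet = gather(memory, chain)  # b"HEADERPAYLOAD-DATACRC"
```

The point of the linked structure is that the processor only has to set up the chain before the transfer starts; after that, a single pass over the links moves every buffer, which is what lets a DMA channel run without per-buffer intervention.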
Figure 5 – Sample system with both PLB and DMA peripherals

Tuning the System
In some situations, a PLB or DMA interface just may not be the right solution. For
instance, you might find that you have a software algorithm that takes too many cycles to execute and is affecting your system bandwidth. That algorithm is a great candidate for implementation in hardware, and the interface to which you may want that hardware connected would be the auxiliary processing unit, or APU interface.

The PowerPC 440 has a second-generation APU interface that is tightly coupled to the execution units of the processor. The interface is controlled by 16 user-defined instructions (UDIs). The data path of the APU interface is 128 bits.

Perhaps the most common use of the APU interface is for connecting to a floating-point unit (FPU). The FPU is IEEE754-compatible and supports both single- and double-precision operations for the PowerPC 440.

The FPU is implemented in the FPGA soft logic fabric and utilizes the DSP48E blocks. The soft logic implementation operates at up to half the frequency of the embedded processor.

Other uses of the APU interface include hardware algorithm acceleration, as well as an alternative high-bandwidth link to block RAM.

Configuring the Embedded Block

Because the PowerPC 440 block is integrated in the FPGA, the processor block can be configured in multiple ways. Virtually every interface is programmable.

For example, when you build your processing system in the Xilinx Platform Studio development environment and a bitstream is created, all of the specifications of the processing system are in the bitstream. Thus, when the FPGA starts up, your processor is up and running.

Now, let's say the processing system is up and running and you want to modify the operation of one of the DMA channels. You would do that through the DCR interface. There are DCR registers to control every aspect of DMA operation.

In fact, there is DCR access to virtually every other subsystem of the embedded block: the PLBs and crossbar, memory controller interface, and the APU controller. Refer to Figure 2 for more details.

Putting It All Together

This innovation would be for naught if Xilinx did not provide a comprehensive infrastructure to take advantage of all of the architectural enhancements. We should point out that the Virtex-5 FXT FPGA with the PowerPC 440 block represents our eighth year in embedded processing and our third-generation FPGA with a hardened processor.

Throughout that time we've been constantly updating EDK, our award-winning Embedded Development Kit. EDK includes Platform Studio, with its comprehensive library of IP for hardware design, and Platform Studio SDK, a software development environment familiar to many embedded software engineers.

With the introduction of the Virtex-5 FXT family of devices, we continue to strengthen our third-party alliances with support from industry-leading operating system providers, including Wind River Systems with VxWorks and Green Hills Integrity. Linux support is provided through LynuxWorks, MontaVista, and Wind River Systems. In addition, Xilinx recognizes the importance of open-source Linux, and we're moving forward on that front.

Xilinx and its partner companies are also developing a wide variety of boards. Xilinx has multiple boards for the Virtex-5 FXT device: the ML507 with the XC5VFX70T and the ML510 with the XC5VFX130T, as shown in Figure 6. The ML507 evaluation platform enables your team to quickly begin developing hardware, software, or both. When multiple processors or a motherboard-type platform are required, the ML510 with the dual-processor XC5VFX130T is ideal.

Figure 6 – Xilinx ML507 evaluation and ML510 development boards (Virtex-5 FXT development platforms. ML507 Evaluation Board: single PowerPC, XC5VFX70T-1FF1136C, PCIe x1 plug-in or stand-alone. ML510 Development Board: dual PowerPC, XC5VFX130T-1FF1738C, ATX form factor.)

Conclusion

A high-performance processing solution with optimized data throughput is high on the wish list of embedded designers everywhere. This is true whether you are running critical algorithms at the heart of the latest wireless base station, switching high-bandwidth data through a video switch, performing advanced signal processing for guidance systems using coprocessor acceleration, or handling complex control and system management tasks.

The Virtex-5 FXT embedded processor block, with a multi-ported non-blocking integrated processor interconnect and hard high-performance integrated DMA, offers a solution that allows you to focus on the key elements of your embedded design.

With a virtually unlimited number of ways to harness these embedded capabilities, the Virtex-5 FXT FPGA embedded processing solution provides a highly integrated platform for high-performance, high-throughput SOC designs.
EMBEDDED APPLICATIONS

Implementing Next-Generation Wireless Standards Using Virtex-5 FXT Device Parts

The Xilinx LTE baseband reference design shows how the Virtex-5 FXT device enables hardware, software, and high-speed off-chip comms for wireless baseband processing – implemented on a single device.
by Rob Payne Staff Engineer Xilinx, Inc. [email protected]
The next generation of the 3GPP wireless standard is called long-term evolution (LTE). It provides a leap in performance and a complete move to packet-based processing. In the physical (PHY) level of the LTE specification, specific challenges exist when dealing with higher data throughput rates, as well as the move to OFDM (orthogonal frequency-division multiplexing) for transmission. Xilinx has developed – or is in the process of developing – several new or revised DSP LogiCORE™ solutions to meet the demands of the new specification. With such blocks it is critical not only to verify them as stand-alone blocks, but also to validate them in real systems with real-world data. The Xilinx 3GPP downlink reference design provides this validation, as well as providing a reference to customers about how to use the blocks.
The higher data rate in LTE places increased processing demands on all parts of the system: increased DSP hardware processing in the baseband, increased software processing to implement the higher layers of the UMTS protocol stack, and increased I/O communication bandwidth to accept packets and pass data to remote radio-heads.

In this article, I'll review some of the new features of the LTE specification and how Xilinx® Virtex™-5 FXT devices address the increased processing demands of LTE through their tight integration of microprocessor subsystem, DSP-enhanced FPGA fabric, and high-speed communication links.

The 3GPP LTE Physical Layer

One of the key changes in Layer 1 (the PHY layer) of 3GPP LTE is the change from CDMA (code-division multiple access) to OFDM (orthogonal frequency-division multiplexing). One of the main benefits of OFDM is that it reduces the problems associated with multiple paths in the radio channel. In CDMA, a significant amount of processing must be devoted to characterizing and tracking the radio channel to compensate for the effects of fading in the channel.

Figure 1 illustrates the structure of an example LTE subframe. The subframe comprises a number of OFDM symbols. Each OFDM symbol provides the data input for an inverse fast Fourier transform (IFFT). In LTE this may be as many as 2,048 input points for in-phase (I) and quadrature (Q) components.

A subframe can be represented as a resource grid, where each resource element in the grid comprises a single I/Q input point for the IFFT in an OFDM symbol. Resource grids can be layered to provide data to multiple antennas, supporting transmission schemes such as transmit diversity or MIMO (multiple input/multiple output) techniques.

The resource grid is allocated to different purposes. Resource elements are allocated to control channels, data channels, and synchronization signals. The diagram also shows the packetization of data on the channel – different areas of the resource grid are allocated to different users' data as resource blocks. The task of scheduling data transmission and allocating resource blocks to users is performed by the higher software layers in the LTE stack.

Figure 1 – LTE uses OFDM. A subframe comprises a resource grid, with areas allocated to control, synchronization, and user data. Each column of the grid forms an OFDM symbol that is converted to the time domain by an IFFT. (Horizontal axis: time, 1 subframe = 1 ms = 14 OFDM symbols, numbered 0 to 13. Vertical axis: frequency, sub-frequency = 15 kHz. A resource block is a 12 x 14 block of resource elements in a subframe. Key: user 1, user 2, and user 3 resource blocks; synchronization signal; control channel.)

3GPP LTE Downlink Processing

Figure 2 shows the processing stages in the baseband section of the 3GPP LTE downlink. The processing for both transmit and receive can be split into two main sections: symbol rate processing and sample rate processing.

Symbol-rate processing is centered around forward error correction, used to add redundancy to the data stream in a bandwidth-efficient manner and to recover data on the receiver. Sample-rate processing is centered around the IFFT/FFT that performs the OFDM part of the baseband operation.

Figure 2 – 3GPP LTE downlink processing: transmit and receive chains
Downlink TX, symbol rate processing: transport blocks, TB CRC insertion, TB segmentation, CB CRC insertion, turbo encoder / convolutional encoder, rate matcher, QAM modulation.
Downlink TX, sample rate processing: layer mapper, resource mapping and sync insertion, IFFT + CP insertion, I/Q samples (time domain).
Downlink RX, sample rate processing: FFT + synchronization, resource mapping, layer demapper.
Downlink RX, symbol rate processing: LLR generation, turbo decoder / convolutional decoder, CB CRC checking, TB reassembly, TB CRC checking, transport blocks.

Transmit Symbol Rate Processing

The first stage in the Layer 1 (PHY) processing of the LTE downlink takes transport blocks from the media access controller (MAC) layer. Transport blocks have cyclic redundancy checks (CRCs) added, while larger transport blocks may be segmented to ensure that blocks do not exceed a maximum size supported by the forward-error encoder.

Each segment then has a CRC added before it is supplied to the forward-error encoder: for data channels this is a turbo encoder and for control channels this is a convolutional encoder. Following the encoding, rate matching is applied to the output to puncture the data so that it will fill the available resource blocks in the OFDM resource grid. Finally, the data stream is modulated with a specified modulation (QPSK, QAM16, or QAM64) to give a sample value to go in the OFDM resource grid.

Transmit Sample Rate Processing

Samples are first mapped to different antenna layers. This mapping allows support for schemes such as transmit diversity (multiple transmit paths to reduce noise at the receiver) or MIMO (which uses multiple channels between transmitter and receiver to increase data rates).

The next step is to map samples to resource elements in the OFDM resource grid. This stage also adds synchronization signals to the resource grid, allowing the receiver to synchronize with the transmitter.

The output of this stage is passed to an IFFT. The IFFT takes the samples from the frequency domain to I/Q signals in the time domain, which are ready to be sent to the radio-head for transmission. OFDM requires a cyclic prefix (CP) added to the data, which repeats the end of the time-domain signal at the start of the output. The CP size is determined by the size of the mobile cell and reflections. A sufficiently large CP is required to remove multi-path effects from the OFDM symbol.

Receive Sample Rate Processing

The receive processing generally follows the inverse of the transmit processing. The first step is to do an FFT on the incoming data to convert the time-domain signal back to the frequency domain. In the reference design, data is received across all OFDM sub-frequencies for all users, but a real mobile user would only decode data on the resource blocks that it had been allocated.

Synchronization is also performed at this step to align the system to the start of each sub-frame in the OFDM data. The output of the FFT is processed through a layer demapper that inverts the layer mapping in the transmission.

Receive Symbol Rate Processing

The first step in the receive processing is to take modulation symbols and convert them to individual bits. For turbo-encoded data, this is a set of probabilities in the form of log-likelihood ratios (LLRs) for the turbo decoder. For convolutionally encoded data, it is a distance metric that is then fed to a Viterbi decoder. The output of the error correction is checked for CRC validity and reassembled into the original transport blocks.

The LTE Baseband Reference Design

LTE has required a number of new or revised Xilinx LogiCORE solutions to meet the new requirements of the specification. For example, the turbo encoder uses LTE-specific interleavers; the IFFT requires the addition of a variable-length cyclic prefix; and the turbo decoder requires a far higher throughput than previous 3GPP standards.

Verifying these individual blocks presents a number of challenges. First, the LTE standard is still changing and has not been ratified. Second, the size of the OFDM symbol (potentially 2,048 points) and the length of the sub-frame and turbo-decoder iterations required mean that a large number of simulation cycles are required to verify the behavior of even a single transport block being decoded. This can limit the test coverage achievable in simulation. Finally, just running unit tests on blocks does not test the macroscopic behavior of systems such as transport-block error rates, or validate that blocks have compatible interfaces.

To address these issues, the design team decided to implement an LTE reference system design that would provide system-level validation of the new LTE LogiCORE IP using real-world data sources such as video streams.

Since the main aim of the reference system was to validate new IP LogiCORE solutions, we wanted to minimize the amount of additional design work for the reference design. We also wanted to minimize the system integration and tool issues and use off-the-shelf boards and IP blocks as much as possible.

Previous reference systems for WCDMA Release 6 of the 3GPP specifications had used a separate DSP microprocessor board coupled with an FPGA board. This meant using different tool flows for software and hardware design, plus the overhead of designing interposer boards to connect the main boards together.

For the LTE baseband reference design, we chose to avoid these issues by gaining early access to the Xilinx Virtex-5 FXT device silicon. The FXT parts integrate on a single chip a PowerPC 440 microprocessor, DSP-enhanced FPGA fabric, and high-speed GTX off-chip communication blocks. This decision minimized our system integration problems (as all the key components were on a single chip) and allowed us to use the standard customer design board, the ML507.

In addition, the design simplified our tool flow. The top level of the system was integrated in Xilinx Platform Studio (XPS), which allows a large number of pre-verified blocks to be pulled into the system from the Xilinx Embedded Development Kit (EDK). XPS was the framework to pull in the various LogiCORE systems that needed validating, plus additional blocks that were implemented in a mixture of VHDL and Xilinx System Generator for the DSP portions of the design.

Implementing on the Virtex-5 FXT FPGA

Figure 3 shows a top-level block diagram of the LTE baseband reference design. The system shown consists of two ML507s, which act as IP packet bridges between the Gigabit Ethernet and LTE protocol stacks.

Figure 3 – Xilinx LTE baseband reference design showing system integration of hardware and software blocks. Blocks are stacked into respective protocols so that higher layers use services of lower-layer blocks. Peer-to-peer communication goes from left to right. (Columns: TX PC, TX board ML507, RX board ML507, RX PC. The PCs connect to the FX70T boards over Gigabit Ethernet; the boards connect to each other over an Aurora link on GTX. Key: Virtex-5 FXT features are PPC440 software, DSP blocks, and high-speed I/O blocks; other features are new/revised LogiCORE solutions, the reference design GUI, and off-the-shelf LogiCORE solutions.)

The reference application is video streaming using the open-source VideoLAN server. The VideoLAN server runs on the transmitter PC, which communicates with the system over the Gigabit Ethernet link. Video is received by the VideoLAN client on the receive PC. A control GUI also runs on the PC, allowing the LTE blocks to be set up, changed, and monitored remotely.

Video packets are transmitted through the LTE software driver and into the LTE downlink transmission blocks previously described. The output I/Q data from the LTE downlink is then sent out over an Aurora link. Aurora was chosen initially over dedicated base station standards such as CPRI/OBSAI protocols because of its early availability on FXT parts.

The I/Q data is received on the receive ML507 board and passed into the LTE downlink receive chain. Processing and communication follow the inverse order to that for the transmission, and finally deliver video packets to the VideoLAN client on the receiving PC.

Figure 3 is color-coded to highlight how the different features of the FXT parts are used. The software elements of the system that run on the PowerPC 440 are highlighted in dark blue. DSP functions comprising the LTE downlink transmit and receive processing are in orange and high-speed I/O blocks are in light blue. New or revised LogiCORE solutions are shown in yellow.

We found we could implement both transmit and receive functionality on a single FX70T part, so by looping back the data on the Aurora link, we could implement the system on a single ML507.
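The sample-rate steps described in this article (QPSK modulation, IFFT, cyclic prefix insertion, and the inverse receive path) can be illustrated with a small Python sketch. This is not the reference design's implementation: the function names, the 8-point symbol size, the 2-sample cyclic prefix, and the QPSK bit mapping are all assumptions chosen for readability, and a naive DFT stands in for the FFT LogiCORE. There is no channel model, so the loopback is ideal.

```python
import cmath
import math

# Illustrative sizes only (assumptions): LTE may use up to 2,048 IFFT points.
N_POINTS = 8   # number of frequency-domain I/Q points in one OFDM symbol
CP_LEN = 2     # hypothetical cyclic-prefix length in samples


def qpsk_modulate(bits):
    """Map bit pairs to unit-energy QPSK points (assumed mapping: 0 -> +, 1 -> -)."""
    scale = 1 / math.sqrt(2)
    return [complex((1 - 2 * bits[i]) * scale, (1 - 2 * bits[i + 1]) * scale)
            for i in range(0, len(bits), 2)]


def qpsk_demodulate(symbols):
    """Hard-decision inverse of qpsk_modulate (a real receiver would emit LLRs)."""
    bits = []
    for s in symbols:
        bits += [0 if s.real >= 0 else 1, 0 if s.imag >= 0 else 1]
    return bits


def dft(samples, inverse=False):
    """Naive O(N^2) DFT; the inverse transform is scaled by 1/N."""
    n = len(samples)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(samples[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
                  for m in range(n))
        out.append(acc / n if inverse else acc)
    return out


def ofdm_transmit(freq_samples):
    """IFFT to the time domain, then copy the tail to the front as the cyclic prefix."""
    time_samples = dft(freq_samples, inverse=True)
    return time_samples[-CP_LEN:] + time_samples


def ofdm_receive(rx_samples):
    """Drop the cyclic prefix, then transform back to the frequency domain."""
    return dft(rx_samples[CP_LEN:])


bits = [0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1]  # 16 bits -> 8 QPSK points
tx = ofdm_transmit(qpsk_modulate(bits))
recovered = qpsk_demodulate(ofdm_receive(tx))
```

In this ideal loopback the recovered bits match the transmitted bits, and the first `CP_LEN` transmitted samples repeat the last `CP_LEN` samples of the symbol, which is the property the cyclic prefix relies on to absorb multi-path delay.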
Figure 4 – Screenshot of demo running. The GUI allows variable modulation, encoding rate, and channel noise. Transmitted video is in the upper right-hand corner. Received video (with time lag for software video decode buffers) is shown for two users in the bottom half of the figure. Uncorrected errors appear as artifacts in the video decode.
Final Demonstration System

Figure 4 shows a screenshot of the final demonstration system in operation. The transmit video stream is shown along with the video stream received by two different LTE mobile users after the packets were transmitted over a noisy channel. The control GUI allows variable noise in the channel, along with modulation and encoding rate parameter settings for each user. The number of iterations used by the turbo decode is also variable.

The demonstration allows us to examine the behavior of the LogiCORE solutions when placed in a system with real-world data. In the case of the video streams shown in Figure 4, we used the same modulation scheme (QPSK) for both users. However, for user 1, we started increasing the data encoding rate, thus reducing redundancy in the data.

As we pass an encoding rate of 0.8, we cross a threshold and start to see a significant number of errors in the decoded video stream at this value of SNR. These errors appear as artifacts in the decoded video stream. In a real base station, higher layers in the LTE protocol stack would retransmit extra redundant data for the packet. To maximize efficient use of the channel, the base station has to balance the cost of retransmission against the benefits of lower overall data encoding rates, and also attempt to minimize latency in the system. These may feed into quality of service (QoS) parameters used by higher layers in the LTE protocol.

Conclusion

The Xilinx Virtex-5 FXT device provides a tightly coupled integration of processor subsystem, DSP-enabled FPGA fabric, and high-speed communication. Such high levels of integration have allowed both the hardware and software elements of the LTE baseband reference system to be integrated on a single Xilinx FX70T part using standard hardware boards.

By using blocks available in the EDK toolkit to maximize IP reuse, and using Xilinx Platform Studio as a single integration framework, design teams can concentrate on the novel parts of the LTE downlink design. This has allowed rapid development and tracking of changes in the LTE specification as it approaches ratification.

Many thanks to other members of the LTE baseband design team: Beth Cowie, Graham Johnston, Andrew Laney, Neil Lilliott, Jorge Seoane, and Bill Wilkie.
VIEWPOINT

Presenting Power Architecture Technology

The Power Architecture processing platform experiences rapid growth in unit shipments and software support.
by Mike Paczan CTO Power.org [email protected]
Power Architecture processing technology is the common thread for a very broad range of computing devices, from 32-bit microcontrollers to 64-bit ASICs. It is also found in some of the most sophisticated FPGAs available today, including Xilinx® Virtex™-4 FX and Virtex-5 FXT FPGAs. More than a billion Power Architecture-based chips have been built into electronic equipment since 1991, when the processing platform was first introduced as IBM's POWER (Performance Optimized With Enhanced RISC).

Every Power Architecture-based chip is rooted in the Power Instruction Set Architecture (ISA), a processing specification spanning server and embedded computing capabilities and incorporating the AltiVec vector (SIMD) processing extension. The Power ISA serves as the basis for future advancement of this highly successful processing platform.

Although the Power ISA name may be new to some, its features are familiar to thousands of software, hardware, and tool developers who have worked with PowerPC devices over the past 16 years. Power Architecture technology underlies many well-known chip families, including full-featured Virtex FPGAs from Xilinx; the Cell Broadband Engine from IBM, Sony, and Toshiba; the PowerQUICC line of SOCs from Freescale; the POWER5 and POWER6 server chips from IBM; and, of course, hundreds of products from companies all over the world.

Silicon is just a small part of the complete technology platform: hundreds of companies provide Power Architecture tools, software, systems, and services to speed up and simplify product development. In any given product category – from custom design services to real-time operating services – there are multiple providers from which to choose. The products and services these vendors offer are not only state-of-the-art but in many cases have also been refined by decades of experience in the marketplace.
Expanding Market Opportunities

The Power Architecture community's focus on improving customers' systems design experience, along with our product variety and vendor quality, has led to tremendous growth in the technology platform. Between 2005 and 2006, unit shipments for Power Architecture processors grew by 61%.

Today, Power Architecture processing technology is used in a huge variety of advanced electronics. For example, the world's two fastest supercomputers run Power Architecture processors, as do five of the world's 10 best-performing enterprise servers. Power Architecture controllers are in more than half the new cars on the road. Practically every phone call, e-mail, and Web page is delivered by equipment using Power Architecture microprocessors. Power Architecture devices are also pervasive in consumer game consoles, including the Microsoft Xbox 360, Nintendo Wii, and Sony PlayStation 3.

Looking forward, the communications infrastructure, automotive control, aerospace, and defense market segments will continue to be strongholds for Power Architecture technology. The consumer market – digital televisions, DVD players, set-top boxes, and particularly game consoles – will drive strong double-digit growth for Power Architecture devices through 2011. Xilinx Virtex FPGAs with PowerPC cores will remain a vital part of the Power Architecture portfolio for many of these major markets.

Enhancing Architecture Through Power.org

Dozens of influential companies providing chips, tools, software, services, and systems have joined together in Power.org to develop open specifications and standards for the Power Architecture platform. To simplify the development experience for system designers, Power.org's members will complete a number of projects and specifications this year:

• Searchable online product catalog covering almost all silicon, tools, software, and services offerings available for the Power Architecture technology platform
• Specifications establishing a common set of hardware debugging interfaces for Power Architecture implementations
• Publication of consolidated application binary interface specifications for 32- and 64-bit Power Architecture implementations with neutral licensing that permits wide use and reference by customers, third parties, and other interested parties to support the development of tools, operating systems, and other platform technologies
• Platform requirements specification for embedded systems built on Power Architecture technology
• An open-source Hypervisor optimized for embedded systems
• New models and simulation specifications that support the construction of "virtual prototypes" with Power Architecture processors
• Technical training for Power Architecture products offered online through webinars or in person at our regional Power Architecture conferences in Europe and Asia

Conclusion

Power.org's open-community approach to managing Power Architecture technology has already resulted in many tangible benefits. For instance, in 2007, Power.org's members completed the Server Power Architecture Platform Reference (sPAPR), an open specification for streamlining the development of Power Architecture products for Linux. This specification, built on more than 20 IBM patents, was made available to corporate members of Power.org royalty-free. Along with this specification, a compliant reference design was also released based on IBM's 970MP processor. This was made available with a Linux stack, Hypervisor, and firmware.

By working together within Power.org, our member companies are continuing to improve the product design experience for our many shared customers and to expand business opportunities for the entire Power Architecture community.

For more information about Power.org, visit www.power.org.
EMBEDDED APPLICATIONS

Developing FPGA Coprocessors for Performance-Accelerated Spacecraft Image Processing

C-to-FPGA design techniques speed development of space-based imaging applications.

Copyright 2008 California Institute of Technology. Government sponsorship acknowledged.
by Paula J. Pingree Senior Engineer Jet Propulsion Laboratory, California Institute of Technology [email protected]
Lucas J. Scharenbroich Staff Engineer Jet Propulsion Laboratory, California Institute of Technology [email protected]
Thomas A. Werne Associate Engineer Jet Propulsion Laboratory, California Institute of Technology [email protected]
David Pellerin CTO Impulse Accelerated Technologies [email protected]
Fast and accurate on-board classification of image data is a critical part of modern satellite image processing. For Earth sciences and other applications, space-based smart payloads make use of intelligent, machine-learning algorithms and instrument autonomy to detect and identify natural phenomena such as flooding, volcanic eruptions, and sea ice break-up.
The Jet Propulsion Laboratory (JPL), a National Aeronautics and Space Administration (NASA) laboratory, has developed support vector machine (SVM) classification algorithms used on board spacecraft to identify high-priority image data for downlinking to Earth. These algorithms also provide onboard data analysis to enable rapid reaction to dynamic events (Figure 1). These onboard classifiers help reduce the amount of data downloaded to Earth, greatly increasing the science return of the instrument.

Figure 1 – NASA uses smart payloads to classify Earth image data and reduce the amount of data required for downloading (Image courtesy of NASA).

SVM classification algorithms are flying today, using computational platforms such as the RAD6000 and Mongoose V processors. These legacy processors have only limited computing power and extremely limited active storage capabilities, and are no longer considered state of the art. For this reason, onboard classification has been limited to only the simplest functions running on only a subset of the full instrument data: for example, only 11 of 242 bands in the case of the Hyperion instrument on the Earth Observing-1 (EO-1) satellite.

FPGA coprocessors are an ideal candidate for these algorithms. FPGAs can provide significant improvement in onboard classification capability and accuracy when compared to the legacy processing platforms now flying.

To evaluate the effectiveness of FPGAs for SVM algorithms, we implemented a legacy snow-water-ice-land (SWIL) classifier, originally developed for the Hyperion instrument, on the Xilinx® Virtex™-4 FX60 FPGA. To develop the application more quickly, we took advantage of the Impulse C-to-FPGA compiler tools provided by Impulse Accelerated Technologies. These tools support the rapid development of highly parallel hardware algorithms and applications.

This article describes our approach to implementing the Hyperion linear SVM on the Virtex-4 FX60 FPGA, as well as additional experiments that we performed using an increased number of data bands and a more sophisticated SVM kernel. These experiments show the potential for more efficient, higher performance onboard classification using FPGA-embedded algorithms.

FPGAs for Onboard Computation

Onboard computation has become a significant bottleneck for advanced, space-based scientific and engineering applications. Currently available spacecraft processors have high power consumption, are expensive, require additional interface boards, and are limited in their computational capabilities.

Recently developed hybrid FPGAs, such as the Xilinx Virtex-4 FX device, offer the versatility of running diverse software applications on embedded processors while at the same time taking advantage of reconfigurable hardware resources, all on a single chip. These tightly coupled, single-chip hardware/software systems offer lower power and lower cost than general-purpose, single-board computers (SBCs). FPGA platforms offer breakthrough performance over radiation-hardened SBCs, leading to entirely new architectures for smart payloads.

To evaluate the potential for acceleration using FPGAs, we selected the Xilinx ML410 evaluation platform for development and demonstration of selected smart payload concepts. The Xilinx ML410 evaluation board comes equipped with a Virtex-4 FX60 FPGA that features two embedded PowerPC 405 processors and a large amount of available FPGA logic.

The specific algorithm we chose for our investigations is the SVM classification algorithm. Algorithms of this type have found broad application in general machine learning and classification tasks, as well as for onboard remote sensing. An SVM is a maximum margin classifier that calculates a separating hyperplane between two labeled classes such that the distance to the nearest data in each class is maximized (Figure 2). By selecting such a maximum margin hyperplane, the SVM classifier can exhibit better generalization to new data than other linear classification methods.

Figure 2 – Calculating a separating hyperplane using an SVM. The circled data points are the support vectors that lie on the margin.

The goal of training a support vector machine is to learn a set of weights such that the sign of a weighted sum of dot products between the training data, xi, and a test vector, t, will correctly predict the class of the new data vector.

SVMs also incorporate what is known as the kernel trick, a method allowing them to be extended from purely linear to non-linear classifiers. This method involves formulating the training and testing algorithms in terms of dot products and then replacing the dot products with a kernel function that represents a dot product after passing the arguments through some non-linear function. The kernel function permits the high-dimensional, or even infinite-dimensional, dot products in the non-linear space to be computed using terms from the original, low-dimensional space.

SVMs are well suited to onboard autonomy applications. The property that makes SVMs particularly applicable is the asymmetry of computational effort in the training and testing stages of the algorithm. Classifying new data points requires orders-of-magnitude less computation than training because the process of training an SVM requires solving a quadratic optimization problem. SVM training requires O(n^3) operations, where n is the number of training examples. In contrast, testing a new vector with a trained SVM requires only O(n) operations. Faster training algorithms that exploit the specific structure of the SVM optimization problem exist, but the training remains the primary computational bottleneck.

After training the SVM, many of the weights, wi, will be equal to zero. This means that these terms can be ignored in the classification formula. Input vectors that have a corresponding non-zero weight are called support vectors. Reducing the number of support vectors is key to successfully deploying an SVM classifier on board a spacecraft, where there are severe constraints on the amount of CPU resources available. Previously deployed classifiers have used such reduced-set methods, but were still constrained to operate on only a subset of the available classification features. Removing such bottlenecks is critical to realizing the full potential of SVMs as an onboard autonomy tool.

Figure 3 – Software/hardware partitioning for the SVM algorithm
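The classification step described above, the sign of a weighted sum of dot products (or kernel values) between the support vectors and a test vector, can be sketched in a few lines. This is a minimal illustration, not the flight code: the bias term, the polynomial kernel parameters, the folding of the class label into the sign of each weight, and all data values are assumptions made for the example.

```python
def linear_kernel(x, t):
    """Plain dot product between two equal-length band vectors."""
    return sum(a * b for a, b in zip(x, t))


def polynomial_kernel(x, t, degree=2, offset=1.0):
    """Kernel trick: a dot product in a higher-dimensional space,
    computed from the dot product in the original space.
    The degree and offset are illustrative assumptions."""
    return (linear_kernel(x, t) + offset) ** degree


def classify(support_vectors, weights, bias, t, kernel=linear_kernel):
    """Return +1 or -1 from the sign of the weighted sum of kernel values.
    Vectors with zero weight are simply left out of support_vectors."""
    score = bias + sum(w * kernel(x, t) for x, w in zip(support_vectors, weights))
    return 1 if score >= 0 else -1


# Hypothetical two-band toy data; real pixels would have 11 or more bands.
support_vectors = [(2.0, 2.0), (0.0, 0.0)]
weights = [0.25, -0.25]  # assumed: the sign carries each support vector's class
bias = -1.0              # assumed: the article does not mention a bias term

print(classify(support_vectors, weights, bias, (3.0, 3.0)))
print(classify(support_vectors, weights, bias, (0.5, 0.5)))
print(classify(support_vectors, weights, bias, (3.0, 3.0), kernel=polynomial_kernel))
```

The cost of `classify` grows linearly with the number of support vectors, which is why reducing that number matters on a resource-constrained processor.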
Partitioning the Problem

When using FPGAs with embedded processors, efficient partitioning of algorithms between software and hardware is important to achieve high performance. For the FPGA-based development of the SVM, we implemented a previously software-only legacy algorithm in the FPGA hardware fabric to take advantage of the FPGA’s high-speed parallel processing capabilities. The image file input and classification file output are managed within the embedded PowerPC processor using the CompactFlash card provided on the ML410 board.

In this implementation, the software side of the application is coded in C and compiled to the embedded PowerPC 405 processor using the Xilinx EDK tools. The embedded software application reads an input image file consisting of 857,856 pixels. The image file is read from the CompactFlash card installed in the ML410 board.

The software-side application streams the image data to the SVM, which is also written in C but has been compiled (using the Impulse C-to-FPGA compiler) to FPGA hardware. The SVM hardware process performs the required SVM operation on the image and streams the results back to the PowerPC 405 processor. The processor then writes the pixel classifications (e.g., snow, water, ice, land, cloud, or unclassified) to an output file on the CompactFlash card (Figure 3).

The PowerPC ran the software portion of the task, which sends data to and collects data from the SVM hardware module. We chose to use the PowerPC instead of a MicroBlaze™ processor because the PowerPC can operate at triple the clock frequency of the MicroBlaze processor. Also, the MicroBlaze processor would be instantiated in valuable FPGA fabric, whereas the PowerPC exists external to the fabric.

Because the 256-MB DIMM is the largest source of memory on the board, we used it as main memory for the program. The processor local bus (PLB) is a high-speed bus (compared to the on-chip peripheral bus [OPB]) that allows for fast data transfer to/from the memory and SVM core peripherals. The 16-GB CompactFlash card holds the input and output data files, which are too large to fit on the DIMM. The UART was used for debugging output. The OPB is a low-speed bus that serves as the default interface between the processor and the SystemACE™ interface controller and UART peripherals.
In support of partitioned software/hardware applications such as this, the Impulse tools include a library of C-compatible functions that implement a number of process-to-process communication methods. These methods include streaming, shared memory, and message passing. For this application, the Impulse C streaming programming model was the obvious choice.

In Impulse C streaming applications, hardware and software processes communicate primarily through buffered data streams implemented directly in hardware. This buffering of data makes it possible to write parallel applications at a relatively high level of abstraction without the cycle-by-cycle synchronization that would otherwise be required.

Figure 4 illustrates the design flow for C-to-hardware compilation using the Xilinx FX60 FPGA as a target. On the software side of the application (in this case on the PowerPC 405 processor used for hardware-level testing), Impulse C functions are used to open and close data streams, read or write data on the streams, and, if desired, send status messages or poll for results. In the case of the Virtex-4 FX, stream reads and writes can be specified as operations that take advantage of either the PLB or the auxiliary peripheral unit (APU) interface.

Figure 4 – Design flow from C-code to FPGA-embedded application

Generating Parallel FPGA Hardware

To create the hardware portion of our application, we used the Impulse C compiler to generate synthesizable HDL files ready to use with the Xilinx EDK tools. In addition to generating HDL files, the Impulse compiler also generates additional files required by the EDK tools, including the needed PLB and APU bus interfaces.

The Impulse C compiler performs a variety of low-level optimizations, including C statement scheduling and loop pipelining, saving application developers a great deal of time that would otherwise be spent performing tedious, low-level hardware optimization.

The Impulse C compiler performs these optimizations and generates hardware in the form of either VHDL or Verilog. This hardware can then be synthesized using FPGA tools such as Xilinx ISE™ software and Platform Studio. On the processor side, the compiler generates run-time libraries ready for use on the embedded PowerPC processor.

Validating the Algorithm

The output of the combined software/hardware application is a file comprising a column of integers indicating the resulting class of each pixel in the image. To validate and visualize the results of the experiment, we used MATLAB to reformat the column of integer values to the original pixel-wise dimensions of the image. Each class was assigned an arbitrary color, and the number of pixels belonging to each class was tabulated. We could then easily calculate the percentage of pixels belonging to each class as well as visualize the resulting file of classified pixels as a colored image.

This validation was required to meet two goals of this project. First, it was necessary to validate both the pixel classification results from the legacy (software-only) version of the SVM and the generated hardware implementation of the SVM. This was necessary to verify that the integer and floating-point calculations performed in the FPGA (in hardware) returned the same results (within acceptable margins) as those observed in software currently flying.
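The validation steps in this article (reshaping the column of class integers into image dimensions, tabulating the pixels per class, and comparing two classifications pixel by pixel) can be sketched as follows. The article's authors used MATLAB; this is a Python stand-in, and the class codes, image width, and sample data are hypothetical.

```python
from collections import Counter


def reshape_column(labels, width):
    """Turn the flat column of class integers into rows of `width` pixels."""
    if len(labels) % width != 0:
        raise ValueError("label count is not a multiple of the image width")
    return [labels[i:i + width] for i in range(0, len(labels), width)]


def class_percentages(labels):
    """Percentage of pixels assigned to each class."""
    counts = Counter(labels)
    return {cls: 100.0 * n / len(labels) for cls, n in counts.items()}


def agreement(labels_a, labels_b):
    """Percentage of pixels on which two classifications agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("classifications cover different pixel counts")
    same = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * same / len(labels_a)


# Hypothetical class codes: 0 = unclassified, 1 = snow, 2 = water, 3 = ice.
hardware = [1, 1, 2, 2, 3, 0, 2, 2]
software = [1, 1, 2, 2, 3, 0, 2, 1]

print(reshape_column(hardware, width=4))
print(class_percentages(hardware))
print(agreement(hardware, software))
```

An agreement of exactly 100% between two runs on the same input is the pixel-by-pixel criterion; anything lower points at either differing training data or a genuine implementation difference.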
Figure 5 – A comparison of the results from a) the legacy 11-band SVM implementation, b) the FPGA-accelerated implementation, and c) the original hyperspectral image. [The color key is blue = water, cyan = ice, dark purple = snow, lavender = unclassified]

We began the validation process by comparing the pixel classification percentage results to those reported by the SVM used in the currently flying EO-1 satellite. The classification percentages show good agreement, particularly for the snow and water classes.

Image visualizations were also important in this effort. Our resulting visualizations show excellent agreement with the results from the EO-1 (Figure 5).

In addition to the qualitative comparison of the images, we also conducted a pixel-by-pixel comparison of the legacy algorithm results and our classifications. The pixel-by-pixel classification comparison showed that 76.8% of the pixel classifications in our results matched those of the Autonomous Sciencecraft Experiment (ASE). We believed that the remaining discrepancies were caused by the differences in the training data sets of the SVMs, but nonetheless we decided to verify the hardware implementation on a pixel-by-pixel basis.

To completely validate the algorithm, we compared the outputs of a software implementation, using the same input data, to the version implemented in hardware by the Impulse C compiler. The two implementations produce identical classifications on a pixel-by-pixel basis. The combination of the good agreement of our results with the legacy ASE results, as well as the independent results from the software platform, led us to believe that our implementation was valid. All of these validations were performed using a combination of C language and MATLAB programming methods, with no need to use hardware design methods or hardware description languages.

Extending the Algorithm

Having successfully implemented the legacy SVM designed for Hyperion, we then considered two extensions to the C-language algorithm: using a larger number of bands with the same linear kernel SVM and creating a new SVM with a non-linear kernel. For the expanded linear kernel SVM, we arbitrarily selected 30 of the available 242 bands in the image. For the non-linear kernel SVM, we used the same 11 bands as the legacy SVM, but with a modified kernel. Because training data was not available for the original legacy SVM, we could not generate new SVMs that would be comparable to it, so we used new training data to generate the two new SVMs. We also generated a new, 11-band linear-kernel SVM for comparison to the legacy SVM.

Using the C-to-hardware compiler, we were able to quickly experiment with these alternative implementations and compare results, both in terms of accuracy and in terms of performance. The hardware implementation of these SVMs produced results that agree very well with the software simulations of the algorithms and demonstrate significant increases in overall system performance.

We were also able to determine (using Xilinx synthesis tools) the FPGA fabric utilization percentages for each of these SVMs, as shown in Table 1.

Table 1 – Device utilization for an FPGA-based SVM algorithm

FPGA Resources     Total on V4FX60   Linear (11 bands)   Linear (30 bands)   (2,1) Polynomial
Slices             25,280            1,151 (4%)          2,253 (8%)          2,082 (8%)
Slice Flip-Flops   50,560            1,290 (2%)          1,337 (2%)          2,519 (4%)
Four-Input LUTs    50,560            1,838 (3%)          3,110 (6%)          3,287 (6%)
FIFO16/RAMB16s     232               2 (1%)              2 (1%)              5 (2%)
DSP48s             128               4 (3%)              4 (3%)              12 (9%)

Conclusion

FPGAs with embedded processors are demonstrating levels of performance and efficiency that were previously impossible using traditional processors. Hardware acceleration of SVM algorithms promises to dramatically improve onboard data processing in future science missions.

By using embedded FPGAs such as the Virtex-4 FX60 device, we can implement increasingly advanced SVMs with “room to grow” in onboard resources. Our results demonstrate that we can achieve even our most advanced SVM extension, a polynomial kernel, using less than 10% of the DSP blocks and slices available in the Virtex-4 FX60 FPGA (Table 1).

Software-to-hardware design tools played an important role in the fast prototyping and development of these SVM algorithms. The Impulse C tools allowed us to more easily manage the complexity of the application and to experiment with alternative implementations. For testing and algorithm validation, the Xilinx ML410 board provided an excellent and cost-effective development target.
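The utilization percentages in Table 1 follow directly from the resource counts. As a quick check, the sketch below recomputes them for the polynomial-kernel SVM from the counts printed in the table; the dictionary and function names are illustrative only. The table's figures appear to be rounded down to whole percentages.

```python
# Resource totals for the Virtex-4 FX60 and usage of the polynomial-kernel
# SVM, copied from Table 1.
TOTALS = {
    "Slices": 25280,
    "Slice Flip-Flops": 50560,
    "Four-Input LUTs": 50560,
    "FIFO16/RAMB16s": 232,
    "DSP48s": 128,
}
POLYNOMIAL = {
    "Slices": 2082,
    "Slice Flip-Flops": 2519,
    "Four-Input LUTs": 3287,
    "FIFO16/RAMB16s": 5,
    "DSP48s": 12,
}


def utilization(used, total):
    """Share of a resource in use, as a percentage."""
    return 100.0 * used / total


for name, used in POLYNOMIAL.items():
    print(f"{name}: {utilization(used, TOTALS[name]):.1f}%")
```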
EMBEDDED APPLICATIONS

Using the MicroBlaze Processor to Develop Speed Sensors for F1 Racing Cars

CEA Leti Minatec, Michelin, and EASii IC developed an accurate and real-time optical sensor prototype that calculates both speed and slip angle.

by Viviane Cattin
Research Engineer
CEA Grenoble Leti Minatec
[email protected]

Bernard Guilhamat
Research Engineer
CEA Grenoble Leti Minatec
[email protected]

Angélo Guiga
Research Engineer
CEA Grenoble Leti Minatec
[email protected]

Sébastien Riccardi
Hardware Activity Manager
EASii IC
[email protected]

Understanding the dynamic behavior of cars, in particular two-dimensional horizontal speed parameters (also known as the car’s slip angle), is critical to optimize performance during races and improve essential equipment like tires.

You can estimate horizontal speed parameters through indirect modeling of wheel speed and yaw-angle measurements; however, this method suffers from modeling errors and other uncertainties. Some optical sensors give measurements that are, in practice, not reliable in wet conditions.

We set out to develop a compact and real-time sensor that could deliver a direct slip-angle measurement with high accuracy and flexibility in various conditions (dry, wet, humid, or snowy). In this article, we’ll describe how the Xilinx® Virtex™ FPGA family, with its embedded MicroBlaze™ processors, helped us achieve our goal.

Sensor Measurement Principles

A laser velocimeter is commonly used for taking speed measurements in the printing and textile industries. The velocimeter takes laser images of the surface and uses an autocorrelation function to compute accurate speed estimates.

We extended this approach to measure both longitudinal and transversal speed components. For our sensor, two laser diodes emit elliptical beam shapes to illuminate the road. The first front laser beam runs parallel to the longitudinal axis of the car, while the second back laser beam measures orthogonally. Two linear photodiode arrays collect the reflected light beams on the ground.

A pixel from the front photodiode array is compared to all pixels on the back photodiode array. The back pixels provide information about slip angle or transversal speed, while a corresponding time delay gives longitudinal speed estimation. Figure 1 illustrates the general optical configuration of the sensor.

Figure 1 – General optical configuration of a laser speed sensor. Three photodiode arrays measure longitudinal speed, height, and transversal speed. [Diagram labels: lasers 1 and 2, 685 nm, 50 mW, Class 3B]

We then developed an additional functionality to our sensor. Because of the laser beam shape, we inserted a standard triangulation process to measure ground height variation beneath the car. This parameter is useful for understanding the behavior of the vehicle and also improves overall system accuracy.

Finally, we developed an innovative process to avoid optical disturbances. In wet conditions, random spurious reflections disturb systems locally and corrupt the calculation of correlations. Our solution is two-fold: inclining the laser beams at a specific angle and using preprocessing to split the optical signals. We successfully tested our sensor in real time in various conditions.

Embedded Intelligence

During the development of our sensor, we designed and validated the signal processing algorithms using a reconfigurable FPGA. Unlike other targets such as DSPs or ASICs, an FPGA allowed us to adapt routines and software architecture depending on test results and evolving specifications.
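The measurement principle, comparing one front pixel against every back pixel and reading the slip angle from the best-matching pixel and the longitudinal speed from the matching delay, can be sketched with a plain cross-correlation search. This is an illustration under stated assumptions rather than the sensor's algorithm: it omits the filtering, chopping, and interpolation steps, and the beam spacing, pixel pitch, sampling frequency, and signals are all hypothetical values.

```python
import math


def correlate_at_lag(front, back, lag):
    """Sum of products of the front signal and the back signal `lag` samples later.
    Assumes both signals have the same length."""
    return sum(front[n] * back[n + lag] for n in range(len(front) - lag))


def best_match(front, back_pixels, max_lag):
    """Return (pixel index, lag) of the back-array signal most similar to `front`."""
    best_pixel, best_lag, best_score = None, None, float("-inf")
    for j, back in enumerate(back_pixels):
        for lag in range(max_lag + 1):
            score = correlate_at_lag(front, back, lag)
            if score > best_score:
                best_pixel, best_lag, best_score = j, lag, score
    return best_pixel, best_lag


def speed_and_slip(pixel_offset, lag, fs_hz, beam_gap_m, pixel_pitch_m):
    """Longitudinal speed (m/s) and slip angle (degrees) from the best match."""
    if lag == 0:
        raise ValueError("zero delay: speed cannot be resolved")
    delay_s = lag / fs_hz
    v_long = beam_gap_m / delay_s
    v_trans = pixel_offset * pixel_pitch_m / delay_s
    return v_long, math.degrees(math.atan2(v_trans, v_long))


# Toy signals: back pixel 2 sees the front pattern three samples later.
front = [0, 0, 1, -1, 2, 0, 0, 0, 0, 0]
back_pixels = [
    [0] * 10,
    [0, 0, 0, 0, 0, 0.5, -0.5, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, -1, 2, 0, 0],
]
CENTER_PIXEL = 1  # assumed: the back pixel directly behind the front pixel

pixel, lag = best_match(front, back_pixels, max_lag=5)
v_long, slip_deg = speed_and_slip(pixel - CENTER_PIXEL, lag, fs_hz=30000.0,
                                  beam_gap_m=0.01, pixel_pitch_m=0.0001)
print(pixel, lag, v_long, slip_deg)
```

Each (pixel, lag) pair is an independent sum of products, which is the kind of computation that parallelizes naturally in FPGA fabric.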
Software Architecture

Figure 2 shows the main architecture of the sensor as proposed by Leti, the research laboratory in charge of this project. Three processes run concurrently:

• The height process calculates ground height variations.

• The speed process computes longitudinal speed and slip angle from correlations and height estimations of the car.

• The output process sends sequential outputs on a high-speed, CAN-standard automotive bus system.

These processes share variables through memory access.

Figure 2 – Main software architecture with key points

Figure 2 includes some key points marked with circles. As in the example previously mentioned, signal chopping improved our ability to take measurements in wet conditions. Still other developments are essential to provide accurate measurements:

• Sampling frequency is the constant parameter that ensures accurate measurements. We controlled the acquisition frequency of diodes so that the ground sampling remained constant during vehicle displacement. A control loop adjusted the sampling frequency according to the speed estimation and prediction, part of the speed process previously described.
• Oversampling of ground signals improved measurement accuracy, particularly for the high speeds (above 190 kilometers per hour) of F1 racing cars. At these frequencies, we encountered a technological ceiling of photodiode array integration time. Therefore, we averaged several slightly shifted measurements to improve accuracy (similar to when using very fast oscilloscopes). This routine additionally offered a critical way to evaluate the quality of sensor measurements. If all calculations matched, we could confidently believe in the accuracy of the measurement.

• Taking into account height estimation when calculating speed improves measurement quality; we used the car’s ground height variation to refine optical parameters and added a suitable threshold to the signal to avoid sun perturbations on height calculation.

Other smaller processes run in parallel. For example, a specific algorithm evaluates whether speed is equal to zero and communicates that the car is stopped.

Hardware Architecture

We wanted a programmable embedded target that offered the possibility to accelerate software run times, used reconfigurable parameters, and facilitated evolutions of the algorithm. An FPGA had all of these features, as well as offering the potential to parallelize correlation calculations.

We used two FPGAs from the Virtex device family: one devoted to preprocessing (sensor control, digital-to-analog converter, memory, filters, averages, height processes) and one focused on the main processing.

We added a MicroBlaze embedded processor inside the second FPGA to implement our high-level algorithm. It includes mainly conditional instructions, with few mathematical calculations. Using this processor allowed us to apply mathematical functions (like trigonometric functions, for example). The soft-core processor is also more user-friendly (based on C language) and facilitates debugging.

Figure 3 shows the architecture as proposed by EASii IC, the company in charge of hardware realization.
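Returning to the sampling-frequency control loop listed among the key points of the software architecture: keeping the ground sampling constant means the acquisition frequency has to scale with vehicle speed. The sketch below shows one simple way such a loop could pick the next frequency. The linear speed prediction, the ground step, and the frequency limits are assumptions for illustration; the article does not describe the actual prediction model.

```python
def predict_speed(previous_mps, current_mps):
    """Assumed predictor: linear extrapolation from the last two speed estimates."""
    return current_mps + (current_mps - previous_mps)


def next_sampling_frequency(previous_mps, current_mps, ground_step_m,
                            fs_min_hz, fs_max_hz):
    """Pick Fs so that one sample is taken every `ground_step_m` of travel,
    clamped to the range the acquisition hardware supports."""
    predicted = max(predict_speed(previous_mps, current_mps), 0.0)
    fs = predicted / ground_step_m
    return min(max(fs, fs_min_hz), fs_max_hz)


# Hypothetical values: 2 mm ground step, 30 Hz to 30 kHz acquisition range.
accelerating = next_sampling_frequency(50.0, 52.0, 0.002, 30.0, 30000.0)
stopped = next_sampling_frequency(0.0, 0.0, 0.002, 30.0, 30000.0)
very_fast = next_sampling_frequency(100.0, 110.0, 0.002, 30.0, 30000.0)
print(accelerating, stopped, very_fast)
```

Once the frequency reaches its upper limit, the ground step can no longer be held constant, which is the ceiling the oversampling and averaging step described above is meant to work around.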
Figure 3 – Main software architecture with four analog inputs, digital-to-analog converter timing, memory storage, calculations, timing, and output control. [Diagram labels: two XC2V1000 FPGAs (preprocessing and processing), MicroBlaze 32-bit RISC processor, CAN controller, ZBT memory 256K x 16 bits]

Component Choice

The sensor must operate through the severe conditions of F1 racing cars, which impose low weight, reduced size, high vibration, and high acceleration. We used both DC/DC converters and low dropout regulators to avoid major perturbations from the car’s power supply.

The output is transmitted over a high-speed automotive CAN bus, thanks to an SJA1000 chip manufactured by NXP Semiconductors. To improve the heat transfer of the most power-hungry components (such as the analog-to-digital converter), we added a thermal paste, which facilitated thermal exchanges with the metal packaging. Additionally, a thermal sensor gives the temperature of the sensor to prevent overheating.

To avoid connection failure and stress breakdowns, we added glue to the largest electronic modules. Most of the basic components of our system are off-the-shelf, which means that further optimization is possible if we applied our system to a specific design. The total power consumption of the sensor is ~10W. The total size (195 mm x 107 mm x 35 mm, including both electronic and optical elements) facilitates rapid mounting on the car, either on the keel for racing cars or anywhere on the chassis.

Figure 4 – Sensor photograph and main parameters

Performance

We set up and calibrated our prototype at the Michelin Technology Centre, our partner in this project and the eventual end user of the sensor. They have a dedicated tool for simulating high-speed and slip-angle variations. The sensor is designed to work with velocities from 2 to 400 kilometers per hour and can calculate slip angles from -10° to +10°. We reported relative errors of 0.5% on speed, 0.3 mm on ground height variation, and less than 0.1° on slip angle.

We have conducted more than 50 trials since 2003 at the Michelin Technology Centre and with F1 partner teams. These measurements allowed us to significantly improve sensor performance and validate its use under a wide variety of conditions.

Figure 5 presents experimental results in wet conditions. Outputs of the other commercial speed sensor are unusable because it was strongly disturbed by the behavior of water on asphalt. Our prototype is more robust; F1 car experts consider these results fully functional.

Figure 5 – Comparative result on wet surfaces (LETI sensor in blue, other commercial sensor in red) showing speed (top), slip angle (middle), and height output (bottom)

Conclusion

Various experiments have shown that our prototype addresses the accuracy and robustness necessary for F1 race cars. We believe that these results show the engineering maturity of our sensor, especially with its innovative ability to operate on wet surfaces, a key challenge for non-contact speed sensors.

The simplicity of the global architecture (which includes electronic devices as well as optical parts) makes us confident that our optical speed sensor prototype could be implemented at a low cost. We have already studied a new hardware design that fully exploits the available space between optical elements, reducing package volume by a third.

Future work may include an image sensor to simplify the optical part and fusion with other data (such as accelerometer measurements) to enhance sensor functionalities.

For more information, visit Leti (www-leti.cea.fr) or EASii IC (www.easii-ic.com) or e-mail [email protected].

We wish to acknowledge Sébastien Dauvé, Bruno Flament, William Foucault, Jean-Michel Ittel, Philipe Peltié, and Jean-Charles Vidal from Leti research laboratories and Jean-Baptiste Monnard from EASii IC for their contributions to this project.
EMBEDDED APPLICATIONS

FPGA-Based Controller Enables Precise Chemical Analysis

An embedded controller using a Spartan-3 device and MicroBlaze processor provides electronic control of gas flows and pressures.

by Robert Henderson
Development Engineer
Agilent Technologies
[email protected]
A gas chromatograph (GC) is a chemical analysis instrument that separates volatile substances in a complex chemical sample. GCs help answer such basic questions as:

• “What contaminants are in this ground water?”

• “Is this gasoline formulated correctly?”

• “Are there any residual solvents in this pill?”

The basic mechanism for gas chromatographic separation is the distribution of a sample between two phases (a stationary phase and a gas mobile phase). In modern chromatography, the stationary phase is coated on the inside of a long narrow tube known as a column. As the sample passes down the column, the different chemical components travel at different rates and are detected at the exit of the column by a detector. In addition to the gas supporting the flow of the sample down the column, each detector also uses between one and three supporting gases (see Figure 1).

Agilent Technologies (spun off from Hewlett Packard in 1999) has been a part of the GC business since 1965. In 1973, Hewlett Packard introduced the world’s first microprocessor-controlled analytical instrument, the 5830 GC, and over the years introduced a series of next-generation instruments, including the Agilent 7890A high-performance GC in 2007.
Figure 1 – Block diagram of gas flows in a gas chromatograph

Figure 2 – Block diagram of Spartan-3 FPGA-based pneumatic module

GCs have evolved measurably over the last 20 years, from simple mechanical pressure and flow regulators to sophisticated electronic closed-loop pressure and flow controllers that allow for the storage and retrieval of the entire instrument setup. This capability also allows for programmable setpoints and for the logging of any deviation from the setpoint(s).

Spartan-3 FPGA-Based Pneumatic Modules

Because of the disparate application needs of our customers, we designed the 7890A GC with a modular structure, allowing as many as six pneumatic modules to communicate with a common “base unit” that provides the central CPU and memory, communication ports, keyboard and display interfaces, and power supplies.

Pneumatic modules can regulate up to three channels of gas for each inlet or detector. Each module is essentially an embedded controller that receives commands from the CPU in the base unit, but otherwise operates independently. The module controls the three independent pneumatic channels in a closed-loop manner, offloading the CPU in the base unit from this real-time task (Figure 2).

Four-Wire Interface (Module to Base Unit)

From the base unit to the module, the entire interface comprises an unshielded four-wire cable: two wires are for power and two are for communications. The power supplied is an unregulated, full-wave-rectified +24V, and the communications path is a bus LVDS system.

The LVDS communications path is a differential signal path that minimizes noise susceptibility issues and allows for cable lengths long enough to place modules anywhere inside the instrument, maximizing instrument configurability options. The bus LVDS communications operate in a half-duplex mode, allowing the same pair of wires that sends commands to the module to provide command responses from the module.

Additionally, because the communications interface in the base unit is also implemented with an FPGA, the LVDS communications link between the two FPGAs required essentially no hardware, saving cost, reducing complexity, and improving reliability.

MicroBlaze Embedded Processor Core

In order to implement the embedded controller, we configured the Xilinx® MicroBlaze™ embedded processor in the FPGA using the Platform Studio tool provided in the EDK development system. This tool not only defines, synthesizes, and routes user logic, utilizing MicroBlaze processor IP and associated bus interface designs, but also develops and compiles the program code for the MicroBlaze processor. All FPGA hardware description is in VHDL, and all MicroBlaze processor coding is in C.

Specifying the optional barrel shifter and hardware divider in the microprocessor hardware specification file (system.mhs) enabled the processor system to function as a real-time controller. Performing operations such as bit shifting or division in C code instead of VHDL hardware takes too long for some of the time-critical PID controller processes.
Using the Xilinx-supplied template VHDL file, the user logic interfaces to the MicroBlaze processor OPB bus with an OPB to IPIF (IP interface) block. This block takes care of the OPB bus protocol and interface signals and presents a simplified interface to the user logic called the IP InterConnect (IPIC).

The major user logic VHDL blocks complement the external hardware controlled by the MicroBlaze processor, resulting in the overall FPGA system design (Figure 3). Let's review the major components.

Figure 3 – Overall pneumatic controller block diagram

Multiplexed A/D Converter Control
A common delta-sigma analog-to-digital converter (ADC) is multiplexed between eight inputs. With this type of ADC, when the input changes the output filter must be "flushed" of the previous channel's data. A VHDL block manages the ADC and its serial data output.

When the MicroBlaze processor selects a specific channel, the VHDL subsystem reads and throws away the old channel's readings, then reads and averages data automatically from the ADC. The MicroBlaze processor specifies to the VHDL block how many readings to average, and thus can perform other operations (such as executing a new command from the base unit CPU) until an averaged reading is ready.

Serial EEPROM Control
The VHDL hardware manages the commands, addresses, and 8-bit data to and from the serial EEPROM and presents a memory-mapped, 32-bit-word-wide interface to the MicroBlaze processor. The EEPROM stores calibration coefficients for the sensors in the pneumatic module and PID coefficients, as well as other configuration and identification information.

Valve PWM Drive and Monitor
After each PID calculation, the MicroBlaze embedded processor outputs another valve drive setpoint to the memory-mapped addresses for the valve drive. The valve drive VHDL state machine uses these values to generate a pulse-width modulated (PWM) drive to the proportional valves at 65 kHz. We use such a high frequency because, in addition to being above the audible range, it is filtered out well by the inductive time constant of the valve, resulting in a smooth current profile.

Signals from the valve drive are read back into the FPGA for diagnostic purposes. This allows the MicroBlaze processor to examine the voltages and determine if a valve is installed and if the PWM drive is able to drive both high and low.

UART/FIFO
We implemented a UART and FIFO in VHDL hardware to handle the commands and responses with the base unit CPU. Because of the half-duplex protocol, the pneumatic module must be in transmit mode only while responding to commands.

VHDL hardware automatically returns to receive mode a fixed time after the command response and presents to the MicroBlaze processor a transmit and receive FIFO along with the necessary flags (FIFO empty, FIFO full).

Basic Pneumatic Module Operation
As I mentioned earlier, the physical setpoints supplied by the customer may be static or programmed. They are expressed in physical units such as psig (gauge pressure) or ml/minute (flow rate) and can be set with a resolution of 0.001 psig and 0.01 ml/min. These setpoints are sent at a rate of 50 Hz to each pneumatic control module.

The basic job of the pneumatic controller module is to regulate the pressure or flow to the desired setpoint. This is done with a modified proportional integral derivative (PID) controller.

Command Interpreter
Because there is no clock shared between the base unit CPU and the LVDS pneumatic modules, commands to the pneumatic control module are not synchronized with the closed-loop operation of the embedded controller in the module.

For example, you can send commands while the MicroBlaze processor is in the middle of PID calculations. Because the control loop takes precedence, the command parsing and execution will be temporarily delayed until the MicroBlaze embedded processor is idle. This delay is short enough that the command response still falls within the acceptable response-time limits set by the base unit CPU.

The simple command interface defines commands that set setpoints, read actuals, read and write the EEPROM, specify the gas type (H2, Helium, N2, ArCH4), as well as a number of diagnostic tests on the system, A/D converter, and valve drives.

Filtered Sensor Readings
The sensors read data hundreds of times a second. Two IIR filters in the MicroBlaze processor process the data, making use of the barrel shifter hardware configured as part of the processor configuration file.

The first low-pass filter helps reduce raw sensor noise above the bandwidth of the control loop that would result in a wider control band. The second low-pass filter is used on data returned to the base unit CPU. It limits the bandwidth of the data to below 25 Hz so that the data is not aliased at the 50-Hz rate at which it is sent to the base unit CPU.

Control Loop
The PID controller function implements a standard PID control with a couple of necessary twists. Figure 4 shows the overall operation of the PID controller.

The first modification of a standard PID is due to the +24V power supply. A fully loaded instrument could have 18 proportional valves to power. Rather than implement a 20W regulated +24V supply, we decided to use a simple and reliable full-wave-rectified and unregulated supply. A supply like this will track the AC voltage and ripple at twice the line frequency.

The magnitude of the +24V supply can range from 22V to 28V, depending on the AC voltage to the instrument. This results in a variation in the open-loop gain of the system, which can affect stability.

Additionally, the AC ripple on the +24V supply modulates the drive to the valve and can result in pressure or flow variations that are at too high a frequency for the control loop to attenuate away (the disturbances are above the bandwidth of the controller).

We solved these problems by reading the value of the +24V supply with the multiplexed A/D converter and having the MicroBlaze processor calculate a feed-forward compensation term. If the +24V supply increases, the valve drive is automatically reduced to compensate. This requires a divide operation. Because this happens at over 100 Hz and takes away from PID calculation time, the hardware divider was implemented in the MicroBlaze processor system specification.

The second modification from a standard PID controller is due to the pneumatic system itself. The range of setpoints you can define covers a broad dynamic range, with pressure setpoints from 0.2 psi up to 150 psi and flows from 3 ml/min up to 1,250 ml/min.

Not surprisingly, the dynamics of the pneumatic system change a lot over this range of operation. It is possible, for example, to have more than a 30-dB gain change in the transfer function of the pneumatic system (ratio of sensor out to valve drive in) over these ranges. In addition, the bandwidth (poles) of the pneumatic system varies greatly over this operating range.

To obtain good control and step response performance, you can alter the values of the PID coefficients based on the setpoints specified.
This alters both the gain and frequency response of the PID controller to match the characteristics of the pneumatic system at that setpoint.
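The two modifications described above can be sketched in a few lines. This is a minimal model under stated assumptions, not the controller running on the MicroBlaze processor: the gain table, its breakpoints, the class and function names, and the choice of scaling the drive by the ratio of nominal to measured supply voltage are all hypothetical. The article states only that a feed-forward term is computed with a divide and that the PID coefficients depend on the setpoint.

```python
from dataclasses import dataclass

NOMINAL_SUPPLY_V = 24.0  # nominal value of the unregulated supply


@dataclass
class PidGains:
    kp: float
    ki: float
    kd: float


def scheduled_gains(setpoint, table):
    """Return the gains of the first row whose upper bound covers the setpoint.

    `table` is a list of (upper_bound, PidGains) rows sorted by upper_bound
    (a hypothetical layout chosen for this sketch).
    """
    for upper_bound, gains in table:
        if setpoint <= upper_bound:
            return gains
    return table[-1][1]  # above the last breakpoint: reuse the last row


class ModifiedPid:
    def __init__(self, table, dt):
        self.table = table
        self.dt = dt              # control period in seconds
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measured, supply_v):
        gains = scheduled_gains(setpoint, self.table)   # gain/bandwidth correction
        error = setpoint - measured
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        drive = (gains.kp * error
                 + gains.ki * self.integral
                 + gains.kd * derivative)
        # Feed-forward supply compensation: a higher supply voltage
        # yields a proportionally smaller drive (this is the divide).
        return drive * NOMINAL_SUPPLY_V / supply_v
```

For the same error, a supply reading of 28V produces a drive 24/28 times the drive computed at 24V, which keeps the effective open-loop gain roughly constant as the line voltage varies.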
Figure 4 – Modified PID controller

Conclusion
New techniques and improved precision in chemical analysis have enabled our customers to meet increasingly stringent requirements for chemical identification. Part of that improvement comes from the increased accuracy and precision of gas supplies within the GC. The new Agilent 7890A network gas chromatograph continues this tradition, while the Xilinx Spartan-3 XC3S200 FPGA with MicroBlaze embedded processor helped the product meet its functionality, precision, cost, and modularity requirements.
Reconfiguring the Battlespace
You can develop aerospace and defense applications with Xilinx and Wind River technologies.

by Paul Parkinson
Senior Systems Architect
Wind River
[email protected]

The development of modern defense systems presents a familiar technological challenge in that the requirements for increased application functionality conflict with the requirements for minimal footprint in terms of space, weight, and power (often referred to as "SWaP"). This is a particular concern for airborne platforms, especially unmanned air vehicles (UAVs) that have physical constraints.

Some defense systems are physically vulnerable to interception by hostile forces because of their operational role, including intelligence, surveillance, target acquisition, and reconnaissance (ISTAR) assets such as reconnaissance aircraft, UAVs, and land-based sensor systems. If these platforms are to use leading-edge technology, it must be technology that cannot be compromised or reverse-engineered.

In addition, field-based configuration and upgradability are essential to achieve interoperability between new and rapidly deployed coalition forces.

In this article, I'll explain how Xilinx® Virtex™-II Pro and Virtex-4 platform FPGAs, along with Wind River software platforms, are ideal for the development of ISTAR systems, and present design techniques that will enable secure deployment, with the use of partial reconfiguration in the field to help enable interoperability between coalition forces.
Technology Challenges
In previous decades, defense systems used military-grade components from semiconductor manufacturers. These components had long life cycles, which were essential to support the in-service use of deployed hardware for 20, 30, or even 40 years.

In 1994, U.S. Secretary of Defense William Perry first advocated the use of commercial off-the-shelf (COTS) components on U.S. military programs where appropriate; other governments around the world subsequently adopted this philosophy (to varying degrees). The migration to
COTS has led to a move away from specific military-grade components and toward a greater reliance on industrial-grade components. The latter's shorter supported life cycles could impact in-service lifetimes.

In addition, defense companies have developed custom ASICs at great expense for specialized functions in defense systems, particularly ISTAR systems. Replacing these devices in deployed systems can be prohibitively expensive. There is also an ever-growing requirement to extend the in-service life of ASICs as the operational lifetimes of front-line defense systems are extended.

These challenges can now be addressed through the use of Virtex-II Pro and Virtex-4 FX FPGAs, given their increased gate counts and incorporation of CPU and DSP functionality.

The platform FPGA also has the potential to be used for algorithmic operations, but until recent years this has not been exploited because it is difficult to express software algorithmic operations in million-gate applications in VHDL. However, the ability to program reconfigurable logic from high-level software languages such as C as well as VHDL opens up the potential of these devices to further exploitation.

Platform Development
You can readily exploit the potential of the platform FPGA through the Xilinx Platform Studio (XPS), which generates hardware peripheral and IP definitions in VHDL, closely coupled with the automatic generation of a VxWorks board support package (BSP) in C source code for the PowerPC 405 processor core(s) contained within the Virtex-II Pro and Virtex-4 FX device fabric.

(The details of this approach were previously discussed by Rick Moleres and Milan Saini in their article, "Generating Efficient Board Support Packages" [Xilinx Embedded magazine, March 2006] and included integration with Wind River's Tornado 2.2 IDE and VxWorks 5.5 RTOS.)

Since the publication of that article, it has become possible to undertake similar developments using the state-of-the-art Wind River Workbench development suite and VxWorks 6 RTOS. VxWorks 6 provides a number of enhanced capabilities when compared to VxWorks 5.5, including technologies that are critical in defense applications; advanced memory protection for application isolation using real-time processes; and a dual-mode IPv4 and IPv6 network stack, which the U.S. Department of Defense mandated for new programs in 2003.

After XPS generates the VxWorks 6 BSP, it can be configured in the Workbench project configurator (Figure 1). The configurator enables components to be configured in a hierarchical manner and parameter values to be specified. When this has been completed, you can build the VxWorks 6 kernel for your target device from Workbench through automated processes (Figure 2). Once VxWorks is running on the PowerPC 405 core(s), you can use Workbench for source-level application development (Figure 3).

Figure 1 – Workbench kernel configuration for Xilinx ML410 VxWorks 6 BSP
Figure 2 – Workbench build of Xilinx board VxWorks 6 kernel image
Figure 3 – Workbench target connection to PowerPC 405 core in Virtex FPGA

The integration between XPS and Workbench allows you to use a VxWorks 6-based common software platform on Virtex FPGAs for application or algorithm control or gateway network interfaces. This provides a very flexible approach that you can exploit for functions such as image compression and data link encryption to relevant NATO standardization agreements.

System and application software in these devices (known as device software) are often implemented in an architecture-specific manner, using low-level programming languages such as assembly language and custom software schedulers. Although this has enabled exploitation of the hardware's performance potential, it has also made the task of technology insertion and program upgrades more difficult.

You can overcome this problem by using software platforms based on open standards. For example, the Wind River General Purpose Platform – VxWorks Edition incorporates the Wind River Workbench development suite, which uses the Eclipse open-source framework to provide seamless integration between tools for different parts of the device software development life cycle, as well as a consistent GUI for developers. This approach has tangible benefits in terms of developer efficiency, transferable skills, and knowledge retention.
On the runtime side, VxWorks 6.4 introduced 100% conformance to the POSIX PSE52 real-time controller profile, enabling software reuse from legacy systems and the development of new portable applications.

Communications Security in Defense Systems Design
Communications security (ComSec) is an important requirement in many defense systems, especially in UAVs, which often need to maintain continuous communications links for remote piloting and real-time image streaming. These communication links must be secure from eavesdropping and interference by hostile forces, so encryption is required for flight control and sensor data transmissions. You could implement this encryption in software (which places an additional load on the processor) or in dedicated hardware logic. You could also use reconfigurable technology, which provides benefits in terms of performance and secure design (which I'll soon discuss).

Let's consider a hypothetical scenario involving the use of Triple DES encryption on a communications link. Triple DES encryption provides a relatively high level of security by encrypting data three times using three 64-bit private encryption keys. This is inherently more secure than single DES encryption but will take a processor three times longer to compute, because the processor performs the encryption as three sequential steps.

Given the parallelism inherent in a Xilinx platform FPGA, you can implement three encryption stages operating in parallel, with the output of one stage pipelined into the next stage in 64-bit words. This acceleration enables the passing of data at higher rates over secure downlinks.

For example, a 350-MHz PowerPC can Triple DES-encrypt a data stream at the rate of 1.2 Mbps, whereas a lower power 20-MHz Xilinx Virtex-II Pro FPGA can Triple DES-encrypt a data stream at the rate of 22 Mbps, with each 64-bit word processed in 57 clock cycles.

You could apply this encryption acceleration technique to provide a secure TCP/IP-based communications framework (using IPSec in conjunction with Triple DES or potentially 256-bit AES encryption) and data link protocols. This would involve performing the network packet processing on the PowerPC processor, with computationally intensive encryption offloaded to the FPGA, acting as a coprocessor.

Information Security in Defense Systems Design
Defense systems now contain an increasing number of subsystems. In an airborne platform, for example, there are avionic systems (for flight control), mission systems, and sensor systems for payloads such as electro-optic/infrared sensors (EO/IR) and synthetic aperture radar (SAR) (Figure 3). ComSec in this case refers to the use of firewalls and encryption for the transmission and reception of information securely between networks without transforming the information during transport.

Figure 3 – Military airborne data networks

Data communication between the avionics systems and mission systems will include the passing of global positioning information, bearing, and altitude. However, information with differing security classifications may also need to be transformed securely between applications, subsystems, or networks; this is known as InfoSec.

Preventing Reverse Engineering by Secure Design
The implementation of a system destruct sequence capability is possible for systems that are vulnerable to compromise or capture by foreign forces. However, previous implementations of destruct sequences may not provide a rapid, complete, and irreversible destruction of subsystems; although they may achieve software destruction, they may not sufficiently destroy all of the hardware architecture, especially fixed hardware components such as ASICs.

Thus, it could still be possible to perform reverse engineering on some aspects of the hardware design. To prevent this, reconfigurable technology offers the ability to achieve rapid and complete destruct sequences and prevent reverse engineering.

FPGA devices can be erased completely, leaving no trace of their original application. By asserting a specific signal, the device could be cleared in hardware in a few hundred microseconds. You could also perform the software destruct sequence through programmatic software control issued from a remote system, a sensible scenario in the case of a UAV capture.

You can implement the sequence from a VxWorks-based common software platform connected to the command and control center through a secure IP-based network, permanently and irreversibly erasing the content of a Xilinx FPGA using the PLD API for VxWorks Embedded Systems (PAVE), as shown in Figure 4.

I referred to Triple DES encryption in the context of secure communication links earlier, but you can also employ this method within the content of the Xilinx FPGA (also known as the payload, not to be confused with a UAV payload). When the bitstream is read out from the FPGA, the Triple DES-encrypted bitstream must be decrypted before it can be decoded. This not only protects the FPGA content from being copied blindly and reused; it also prevents it from being reverse-engineered should the UAV be intercepted.
Figure 4 – CPU-controlled reconfiguration of a Xilinx FPGA (software: system destruct application with PAVE, application algorithms, and encryption algorithms on VxWorks; hardware: CPU and FPGA, with a secure TCP/IP ground link)

Figure 5 – Reconfigurable technology life cycle (technical advantage over time for an FPGA-based solution and a fixed-chip solution. Time to deployment: first to deploy increases technical advantage. Time in field: increases the in-life support yield while in field.)
Conclusion
ISTAR systems will spearhead the deployment of new coalitions in response to potential or emerging conflicts, providing vital imagery intelligence and situational awareness from reconnaissance missions.

A common software platform and reconfigurable technology could be used to implement codecs that support reconfiguration in the field, assisting in the rapid deployment and interoperability with NATO and/or other coalition forces by sharing encryption keys. Platforms can also be reconfigured on the fly to communicate with legacy or incompatible systems belonging to other coalition members (if a suitable codec exists).

Reconfigurable technology not only provides the ability to rapidly deploy new technology, but also the means to extend the in-service lifetimes of deployed systems. A network-enabled software platform acts as a secure gateway to deliver new content and perform partial or full reconfiguration of FPGAs while deployed in the field. This approach would reuse the architecture shown in Figure 4, but employ an upgrade application instead of a system destruct application.

When considered together, these capabilities provide a technical advantage over traditional fixed-chip solutions (Figure 5) and present a powerful argument for their adoption on defense programs.

For more information, I recommend a paper by Saar Drimer of the University of Cambridge on the use of FPGAs in the design of secure defense systems. To read "Volatile FPGA Design Security – A Survey," visit www.cl.cam.ac.uk/~sd410/papers/fpga_security.pdf.

For more information about the Wind River General Purpose Platform – VxWorks Edition, visit www.windriver.com.
Agilent Logic Analyzers Agilent Mixed Signal Oscilloscopes Up to 612 channels, up to 256 M deep memory + 4 scope channels + 16 timing channels Agilent FPGA Dynamic Probe Application software to increase visibility inside your FPGA = Fastest FPGA Debug Available