Issue 67 First Quarter 2009 XcellXcelljournaljournal SOLUTIONS FOR A PROGRAMMABLE WORLD

FPGAs Command Center Stage in Next-Gen Wired Networks

INSIDEINSIDE Dorkbot uses Xilinx Spartan-3E Starter Kit toto MakeMake 19th-Century19th-Century Pipe Organ Rock!

Satellite-Based Computing Flies Flexible Virtex Platform

Tools and Techniques to Tame FPGA Power Budgets

R

www.xilinx.com/xcell/ Xilinx® Spartan®-3A Evaluation Kit

Target Applications » General FPGA prototyping » MicroBlaze™ systems » Configuration development » USB-powered controller » Cypress® PSoC® evaluation Key Features » Xilinx XC3S400A-4FTG256C Spartan-3A FPGA » Four LEDs » Four CapSense switches » I2C temperature sensor » Two 6-pin expansion headers ® ® The Xilinx Spartan -3A Evaluation Kit provides an » 20 x 2, 0.1-inch user I/O header easy-to-use, low-cost platform for experimenting » ® ® and prototyping applications based on the Xilinx 32 Mb Spansion MirrorBit NOR GL Parallel Flash Spartan-3A FPGA family. Designed as an entry-level » kit, first-time FPGA designers will find the board’s 128 Mb Spansion MirrorBit functionality to be straightforward and practical, SPI FL Serial Flash while advanced users will appreciate the board’s » USB-UART bridge unique features. » I2C port » Get Behind the Wheel of the Xilinx Spartan-3A Evaluation Kit and take SPI and BPI configuration a quick video tour to see the kit in action (Run time: 7 minutes). » Xilinx JTAG interface » FPGA configuration via PSoC® Ordering Information Part Number Hardware Resale Kit Includes » Xilinx Spartan-3A evaluation board AES-SP3A-EVAL400-G Xilinx Spartan-3A Evaluation Kit $39.00* USD (*Limit 5 per customer) » ISE® WebPACK™ 10.1 DVD » USB cable » ® Take the quick video tour or purchase this kit at: Windows programming application www.em.avnet.com/spartan3a-evl » Cypress MiniProg Programming Unit » Downloadable documentation and reference designs

1.800.332.8638 www.em.avnet.com

Copyright© 2008, Avnet, Inc. All rights reserved. AVNET and the AV logo are registered trademarks of Avnet, Inc. All other brands are the property of their respective owners. Prices and kit configurations shown are subject to change.

LETTER FROM THE PUBLISHER Will FPGAs Lead Semiconductors Xcell journal out of the Economic Doldrums? PUBLISHER Mike Santarini [email protected] 408-626-5981 Programmable logic’s ‘forgiveness factor’ should not be overlooked.

EDITOR Jacqueline Damian ver the last couple of months, I’ve read opinion pieces in the electronics trade press predicting FPGAs could very well lead the semiconductor industry’s recovery from the economic down- ART DIRECTOR Scott Blair turn. I’m no financial guru, but I have to think this could well be the case. DESIGN/PRODUCTION Teie, Gelwicks & Associates O In my 14 years covering the electronic-design space (EDA, ASIC, FPGA, PCB and mem- 1-800-493-5551 ory industries), I’ve witnessed firsthand the maturation of the FPGA industry in its seemingly relentless quest to grab market share from the ASIC business. As FPGAs leveraged new process tech- ADVERTISING SALES Dan Teie nologies to increase logic-cell counts in line with Moore’s Law, they grew faster and lower in power with 1-800-493-5551 [email protected] each new node and became ever more affordable. At the same time, FPGA vendors have added a slew of advanced functionality, hardwiring MPU cores into their devices and offering the latest soft IP. INTERNATIONAL Melissa Zhang, Asia Pacific Tools for FPGAs have kept pace with this complexity, allowing engineers to invent novel designs [email protected] and implement them in FPGAs. What’s more impressive is that most FPGAs are forgiving—if your Christelle Moraga, Europe/ design doesn’t come out quite the way you want it, simply make a few tweaks and reprogram the Middle East/Africa FPGA until you have the functionality you need. If you need to accommodate a particular standard [email protected] that is not quite fully defined, you can quickly add that new functionality once the standard has Yumi Homura, Japan firmed up. You can even adjust your design and reprogram your FPGA after it has been deployed in [email protected] the field. You don’t have to be afraid to try something risky—something innovative. This forgiveness factor, I believe, will become a much more powerful selling point for SUBSCRIPTIONS All Inquiries www.xcellpublications.com FPGAs over ASICs as system companies look at their budgets and examine the cost and risks of implementing designs in application-specific ICs. It certainly is no secret that the mask costs REPRINT ORDERS 1-800-493-5551 for designs in the latest process technologies have grown exponentially, making ASIC starts viable only for extremely high-volume applications. At the same time, the cost of ASIC tools to deal with new process complexity is skyrocketing. Today, for example, most ASIC design teams must acquire design-for-manufacturing tools, sophis- ticated power-analysis tools to manage leakage and perhaps very soon (or already, in some cases) sta- tistical analysis tools just to produce viable silicon. Chances are, to become proficient in these new tools will require a training ramp-up and a bit of troubleshooting. Then you have to deploy all these new tools optimally to make sure your multimillion-gate ASIC is right the first time and doesn’t require a mask respin or 30. ASIC prospects are especially scary when you consider that during the last economic downturn, the dot-com bust, ASIC design starts plunged, from roughly 9,000 in 1999 to 4,000 in 2003, according to research firm Gartner Dataquest. And they have been steadily declin- ing ever since. Considering that ASICs back in 2000 and 2003 were nowhere near as complex as they www.xilinx.com/xcell/ are today and mask costs were far less, one has to wonder how much lower ASIC starts may go. Some may argue that application-specific standard products could step in to replace ASICs. You could indeed buy an ASSP that generally handles your specific task, and then modify the software

Xilinx, Inc. running on it to do the job you want it to. But your competitors can buy the same part and imple- 2100 Logic Drive ment similar, or even better, software in their product. In short, with an ASSP you no longer have San Jose, CA 95124-3400 Phone: 408-559-7778 the advantage of differentiating your hardware, just the software. Gartner Dataquest predicts a FAX: 408-879-4780 steady, albeit gentle, erosion in ASSP starts, from roughly 4,200 in 2008 to 4,000 in 2011. www.xilinx.com/xcell/ Meanwhile, FPGA starts are expected to grow from 90,000 to around 105,000 over that time frame. Can you take innovative risks with ASICs and ASSPs? Can you afford to? These are questions © 2009 Xilinx, Inc. All rights reserved. XILINX, the Xilinx Logo, and other designated brands included that I think will resonate with greater frequency throughout the remainder of this economic down- herein are trademarks of Xilinx, Inc. All other trade- turn and likely afterward. marks are the property of their respective owners. Does all this mean that FPGAs will, in fact, lead the semiconductor industry in the recovery? No The articles, information, and other materials included one can say for sure, but the arguments ring true to me. in this issue are provided solely for the convenience of our readers. Xilinx makes no warranties, express, implied, statutory, or otherwise, and accepts no liability with respect to any such articles, information, or other materials or their use, and any use thereof is solely at the risk of the user. Any person or entity using such information in any way releases and waives any claim it might have against Xilinx for any loss, damage, or expense caused thereby. Mike Santarini Publisher GET UP TO 50% LOWER COST

sIntegrated features and only two power rails minimize need for external components sSave up to 70% in logic cell resources with dedicated DSP blocks sRun cool with 11mW static power, 0μW in hibernate mode

In the highly competitive high-volume market, cost is king. Our latest Extended Spartan®-3A FPGAs deliver the integrated features, low static power, complex computation and embedded processing capabilities you need to achieve the absolute lowest total cost. Period.

Combine these advantages with the industry’s largest selection of IP cores, reference designs and I/O standards and you have the most complete low-cost programmable solution available for your next high-volume design.

Visit us at www.xilinx.com to download our free ISE® WebPACK™ design tools and start saving money today.

©2009 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc. All other trademarks are the property of their respective owners. CONTENTS

VIEWPOINTS XCELLENCE BY DESIGN APPLICATION FEATURES 18 Xcellence in Wireless Communications Baseband Development for 3GPP-LTE Just Got Easier…14 66 Xcellence in Automotive & ISM Updated Starter Kit Speeds Video Development…18 Letter from the Publisher Will FPGAs Lead Semiconductors Xcellence in Aerospace & Defense out of the Economic Doldrums?...4 A Flexible Platform for Satellite-Based High-Performance Computing…22 28 Xpectations Xilinx’s Support Network: 28 Our Success Is Your Success...66 Virtex-5 Powers Reconfigurable, Rugged PC…28

XCELLENCE IN WIRED COMMUNICATIONS Cover Story FPGAs Take Central Role 8 in Wired Networks 8 FIRST QUARTER 2009, ISSUE 67

THE XILINX XPERIENCE FEATURES

Xperts Corner Floating Point: Have it Your Way 50 with FPGA Embedded Processors…32

Xplanation: FPGA 101 Optimizing Xilinx FPGAs for Power…36

Xperiment Computer Interface Makes 19th-Century Pipe Organ Rock…44

Ask FAE-X Hidden in Plain View…50

Profiles of Xcellence Mark Moshayedi Drives STEC Enterprise Storage to Greener Pastures…58 36

XTRA READING Are You Xperienced? 62 Xilinx, Partners Offer Free Seminar on High-Performance System Design…55

Xamples A mix of new and popular application notes…56

Tools of Xcellence A bit of news about our partners and their latest offerings…62

44 COVER STORY FPGAsFPGAs TakeTake CentralCentral RoleRole inin WiredWired CommunicationsCommunications

A demand for speed and thethe adventadvent ofof multimediamultimedia fuelfuel aa needneed forfor advancedadvanced programmable devices in next-generation networks.

8 Xcell Journal First Quarter 2009 COVER STORY

by Mike Santarini Wired Network Basics Publisher, Xcell Journal Today’s wired communication networks are Xilinx, Inc. like a series of roads, highways and super- [email protected] highways linking one destination to anoth- er. Each type of road has a speed limit, and The wired communications business has an the pokiest byways slow the overall traffic, insatiable need for speed. Fifteen years ago, increasing the time it takes for information data transport rates (aka bandwidth) were to reach its destination. typically in the hundreds of thousands of When a typical user accesses an Internet bits per second (bps). Today’s networks can site from a home PC and downloads a file, hurl data across the globe at 10 Gbps, and the request for data leaves the computer in a at some points in the network transmission data packet at a maximum 1 Gbps—the reaches terabit speeds. FPGAs have played copper wire connecting your PC to the car- a part in this evolution, and as FPGA tech- rier’s access network limits the speed. The nologies advance with Moore’s Law they access network reads the data packet for, will likely take a more central role in next- among other things, destination and size, generation wired networks. and then forwards it to what’s called a metro In a bid to attract customers willing to network—a faster series of electrical routers pay more for new, high-bandwidth net- and switches that reads the packet and for- works delivering multimedia content, wards the data to the next router along the telecommunication companies such as line. The data ricochets from router to AT&T and Verizon are pressuring network router in the metro network at 10 Gbps. equipment manufacturers to build faster For long-distance routes, the metro net- systems that will speed the delivery of sev- work may ultimately connect to a data eral types of data, not just voice. Steve superhighway called the core, an optical Rago, principal analyst of broadband and network that shoots the information at the Internet Protocol TV at market research speed of light to a series of metro networks firm iSuppli Corp., notes that telephone near the data server that contains the companies are urgently trying to transition Internet file—be it Web page, video clip or their businesses from voice-only networks music—you are trying to access. The data (see sidebar, page 12). Similarly, big corpo- server then sends the requested data files rations are also demanding faster network back, sometimes the same way, through the equipment that will allow employees to network (Figure 1). communicate more effectively worldwide. At each intersection or hub, a router In the financial sector, for example, a must read the data packet for such infor- speedy network can allow traders in far- mation as destination and size, and deter- flung locations to place trades quickly— mine the fastest route given network traffic here, faster communications translates conditions, before forwarding that data to literally into increased revenue. its next stop. When negotiating longer Network equipment manufacturers routes onto the optical network, a router such as Cisco Systems, Alcatel-Lucent, on the front of the optical network must Nokia-Siemens Networks and Juniper translate that data from the digital signal Networks are among the many companies suited for electronic routers to a pattern of vying to be first to market with equip- light in the optical domain. Finally, at the ment that can offer carriers and enterpris- end of the core network, another router es 40-Gbps and 100-Gbps data transport must do the opposite, retranslating that speeds. To do so, they must first create a data from light back to the format for an new generation of routers and switches electrical packet. Then it must read the net- powered by the latest generation of bleed- work traffic conditions for the fastest route ing-edge ICs. What’s more, they must and ship the data to the next electrical accomplish this feat while standards for router or data server. next-generation networks—namely, 40G Access or download typically occurs in a and 100G—are still evolving. matter of minutes to a matter of seconds,

First Quarter 2009 Xcell Journal 9 COVER STORY

Corporate Communications Personal Computers

Personal Digital Assistants

Internet Enabled Cell Phones Workstations Servers

ENTERPRISE

Broadband Communications ACCESS

EDGE EDGE Wireless Access Point CORE

Workstations Wireless Networking EDGE

METRO Figure 1 – Next-generation wired communications ENTERPRISE networks running at a bandwidth of 40 to 100 Gbps will spur a new generation of broadband services and a host of new electronic devices. Storage

depending on the size and location of the those networks still remain independent, Inside the Router file. But tomorrow’s networks will hum the network industry has been doing its best To transport data at these speeds, network along even faster, thanks in part to ever- over the past few years to merge them, equipment makers will need to create very advancing FPGA technology. specifically around Ethernet. sophisticated equipment—routers, switches “Ethernet used to be the technology and transport systems—that employ Telecom and Datacom Convergence that simply connected you to your IT extremely advanced circuitry. Today there are two types of wired net- department,” said Brebner. “Now there is For example, at the heart of a metro router works—one for computing and the second Ethernet everywhere, and it has devel- is a series of line cards. Each line card receives for telecommunications. Traditionally, these oped into carrier Ethernet, in which the data packets in a wide range of protocols, networks have been separate, each with its telecom industry is using this Ethernet examines the packets for origin, size, destina- own set of unique protocols, routing equip- technology internally in their network.” tion and information regarding the rest of the ment, bandwidth requirements and rate of The 10-Gbps Ethernet (10GE) technolo- network, and then forwards the packet to a bandwidth growth. For example, the tele- gy “has been standardized for a few years switch. The switch, in its turn, shuttles the com industry has typically increased band- now,” Brebner said, “and currently, 40GE packet to its next destination along the net- width in increments of roughly four (2.5 and 100GE are being drafted together as work. The line card must accomplish all these Gbps to 10 Gbps, now moving to 40 IEEE 802.03ba.The final standard is computations in nanoseconds. Gbps), while computer networking has expected to be completed in late 2009.” Traditionally, a line card consists of a done the job in leaps of 10x (100M, 10G, Brebner explained that 40GE would CPU, a series of dedicated network processor 100G). However, Xilinx distinguished engi- have been the next step for telecom, but units (NPUs) and a number of the highest- neer Gordon Brebner notes that during the the industry expects that rate will initial- speed FPGAs available. As a packet enters a last wired-network retooling a few years ly best suit enterprise networking. For line card, an FPGA translates the raw data ago, a convergence of sorts took place at 10 longer transmissions, carriers will use the into formats that a given router can read. Gbps, where physical signaling for Ethernet 100GE standard. It would not be surpris- The processor coordinates the NPUs to read converged with signaling for telecom as ing, however, if competition spurs some and route data, while the FPGAs facilitate both network types independently equipment companies to drive enterprise some of the communication between the increased their top bandwidth rates. While networking, too, to 100GE. CPU and the NPUs.

10 Xcell Journal First Quarter 2009 COVER STORY

Some equipment makers expect FPGAs to start playing a more-central role in the router, integrating the functionality of an NPU into the programmable logic fabric.

To handle packets properly, routers usually reserved for NPUs. Integrating the tions applications, network equipment must understand multiple protocols. translation and interface functionality on a designers are making even greater use of Indeed, said Brebner, they must support a single chip ultimately speeds processing, these versatile devices in next-generation variety of legacy and new protocols that are reduces the overall bill of materials and routers. Each new generation of FPGA layered together in a single packet. Of lowers the power consumption of the technology includes a greater number of course, if the entire world converged on router—and with it, the operating expen- high-speed transceivers to match the one protocol or set of protocols, the net- ditures for the overall network. Moreover, increasing overall bandwidth of the net- work might be able to really speed up. But because FPGAs offer hardware as well as work, even as the overall speed of each much of the differentiation that makes one software reconfigurability and can be mod- transceiver continues to climb. For exam- network superior to its competitors lies in ified in the field, network equipment ven- ple, the recently released Virtex®-5 TXT the protocols the routers use, Brebner dors have an opportunity to upgrade their devices (Table 1) contain up to 48 noted. Carriers are not about to relinquish equipment while it’s in use—indeed, even RocketIO™ multirate transceivers run- this competitive edge. while it’s still running. ning at 6.5 Gbps. They allow the device to Next-generation wired networks will be With FPGAs evolving rapidly in ways deliver the 312 Gbps total bandwidth transferring voice, Internet data and video that particularly suit wired communica- required for building network bridges. simultaneously. This so-called triple play requires the development of new protocols and, inevitably, a series of refinements and Virtex-5 TXT FPGA Platform modifications as carriers race to transfer Part Number XC5VTX150T XC5VTX240T this data more efficiently and safely. Slices 23,200 37,440 That’s why the ability to modify hard- ware and change functionality is becoming Logic Cells 148,480 239,616 so important—it allows telecommunica- CLB Flip-Flops 92,800 149,760 tions equipment to take advantage of the new protocols and in so doing, delivers a Maximum Distributed RAM (kbits) 1,500 2,400 huge advantage to OEMs. Many companies Block RAM/FIFO w/ECC (36 kbits each) 228 324 are shunning ASICs and ASSPs in their Total Block RAM (kbits) 8,208 11,664 communications systems because those ICs offer the ability to modify only their own Digital Clock Manager (DCM) 12 12 software. FPGAs, by contrast, let you mod- Phase-Locked Loop 6 6 ify the hardware, test software functionality Maximum Single-Ended Pins (4) 680 680 in the software domain and then speed it up by creating a hardware implementation of DSP48E Slices 80 96 algorithms in an FPGA. PCI Express Endpoint Blocks 1 1 Still other equipment makers expect FPGAs to start playing a more-central role 10/100/1000 Ethernet MAC Blocks 4 4 in the router, integrating the functionality RocketIO™ GTX High-Speed Transceivers 40 48 of an NPU into the programmable logic Package (7,8) Area fabric. FPGA vendors tend to be the first silicon developers to use new IC processes. FFA Packages (FF): flip-chip fine-pitch BGA (1.0 mm ball spacing) This trait has given them the full doubling- FF1156 35 x 35 mm 360 (40) of-capacity benefits of Moore’s Law, with the payoff of more real estate on each die FF1759 42.5 x 42.5 mm 680 (40) 680 (48) for additional functionality. With each new generation of FPGA, the likelihood Table 1 – Xilinx’s serdes-heavy Virtex-5 TXT provides developers of next-generation wired communications equipment with a programmable platform for innovation. increases that customers can add functions

First Quarter 2009 Xcell Journal 11 COVER STORY

In addition to the high-speed trans- right for the job, is one of the most trying NPU design companies have come and ceivers, the number of logic cells roughly issues equipment manufacturers confront, gone, leading some big vendors, such as doubles with each new generation of Brebner said. The choice is complicated by Cisco Systems, to develop their own. FPGA, in keeping with Moore’s Law. These the fact that with each generation of router, However, as FPGA technology advances additional logic cells allow equipment a new batch of startups arises to build the in each generation, there is a greater oppor- manufacturers to place greater functionali- NPUs to power them. “NPUs are about the tunity for customers to integrate NPU - ty within each FPGA, perhaps functionali- most fractious area in the market,” said lectual property (IP) into FPGAs themselves. ty that was previously assigned to NPUs. Brebner. “Each NPU is designed in a par- OEMs can also leverage the devices’ recon- Developing NPUs for each generation ticular way for various niches and func- figurability so that, as data packets come in of equipment, or deciding which NPU is tions.” The vendor landscape is volatile: with different protocols, the FPGA can

Battle for the Broadband Bundle

acing declining revenue as cable competitors poach their voice Further, Rago said that instead of charging consumers for customers, telephone companies are rapidly bulking up their bits per second, the telcos have decided to do what MSOs do networksF to offer multimedia services, potentially driving new today: have users pay for the services they want. “You pay for growth in next-generation broadband equipment. your video service [IPTV, for example], you pay for your voice “For the last several years, traditional telephone companies have and your other services—you’ll pay more depending on what been losing their subscriber base at an alarming rate,” said iSuppli services you add to your plan.” Rago said that most telephone analyst Steve Rago. “Roughly 4 to 10 percent of their subscribers are companies are already offering these new services or plan to disappearing every year.” soon do so. The erosion is occurring for several reasons. “Number one is that many folks are using mobile phones as their only phone Multibillion-Dollar Question line,” said Rago. “The second reason is that the need for a second Meanwhile, MSOs will continue to attempt to lure traditional line for the Internet and in some cases even a fax is disappear- voice subscribers to their multimedia mix and will likely come up ing—with broadband you don’t need a second line for the with their own value-added services. Internet.” In addition, he said, cable multiple-service operators But the multibillion-dollar question for all these companies, (MSOs) have been successfully snatching away traditional voice telephone and MSO alike, is not simply how to establish services, bundling voice with cable TV and the Internet. In last growth, but how to establish sustainable growth. year’s fourth quarter, said Rago, “cable added 1 million voice sub- One challenge is “getting a fat enough pipe to the home to scribers in the United States alone.” The same transition is going offer all of these new services,” Rago said. DSL worldwide, and on worldwide, he noted, even in mainland China, where the ADSL in particular, is still the biggest, he said, “but we’ve seen voice networks are only a dozen or so years old. growth in broadband DSL and fiber to the home; or fiber near to Telcos, for their part, are enjoying increased revenue from the the home and BDSL near to the curb. Fiber to the home is second broadband services they currently offer, Rago said. However, it has- now in terms of new services. It even surpasses cable modems.” n’t offset the loss of revenue from voice. “The net result is that they Upgrading the access equipment to handle these new serv- are either holding steady [in terms of] revenue growth or growth is ices will, of course, be key to making all this possible. “We’re declining, which is not a very good position for Wall Street,” he looking at equipment that can handle speeds anywhere from said. “If the telcos don’t change the way they do business, they are 30 Mbps to 100 Mbps,” he said. New services such as time- going to become extinct—there won’t be a need for them anymore.” shifting TV and video-on-demand will put tremendous pres- Meanwhile, their competitors in the wired space, the MSOs, are sure on bandwidth in the network. “It will drive a major need not experiencing huge growth either, said Rago. “Actually, they are for innovations and enhancements in long-haul and the metro seeing revenues hold steady or even decline, with more competition networking space,” said Rago. It will also give rise to a new from satellite companies and the telcos,” he said. crop of high-speed-data consumer devices that will in turn To get back on a growth path, telephone companies worldwide shape the feature requirements for next-generation services. have collectively decided to deliver video as a value-added service to Indeed, Rago said that “the battle for the broadband bun- voice, along with other offerings, Rago said. They are banking on dle” should result in novel technologies and services that ulti- one of these Internet-based services in particular: time-shifting TV, mately drive innovations in related fields. But who will win which will allow you to watch whatever program you want to see that battle is anyone’s guess at this point. whenever you want to see it. “It will be a paradigm shift from the To read more about the competitive landscape, contact way you watch TV today,” said Rago. “It’s one of the advantages iSuppli for its latest report on consumer communications. telecom companies have with IPTV over the MSOs.” — Mike Santarini

12 Xcell Journal First Quarter 2009 COVER STORY

Quad into the GTX transceiver, saving nearly 2 x 5.15G to 10G Mux 10x "MLD" = 20 x 5.15G one-fifth of the logic count and power Saves PCB Space consumption of the design. 2 1 In June 2008, telecommunications SFP ± 10g Phy giant Comcast Corp. announced it had 2 2 SFP ± 10g Phy successfully completed a 100GE technolo- gy test over its existing backbone infra- 2 3 SFP ± 10g Phy structure between Philadelphia and TM & System 20-40 SERDES 100G PP @ MAC McLean, Va., using the industry’s first MAC 100G 100GE router interface. The system used Interlaken or XAUI or RXAUI or the same Sarance Technologies’ High Speed Customer Ethernet IP Core (HSEC) running on a Proprietary Virtex-5 FXT FPGA that is supported by 2 10 SFP ± 10g Phy the Virtex-5 TXT platform today. The demonstrations follow up on early

achievements in the 100GE domain. In Figure 2 – Sarance Technologies’ 100GE MAC solution implemented with Virtex-5 FPGAs November 2006, Xilinx FPGAs were the vehicle used to showcase the world’s first instantiate or implement an NPU architec- the ASSPs and go directly with an FPGA. successful 100GE transmission through a ture on the fly that is best suited to read the Every generation has had that brief window live production network demonstrated at data, even run a security check in it, negoti- where they use ASSPs, but with each gener- the SC06 International Conference, the ate the fastest route to its destination and ation that window is becoming more nar- confab of high-performance computing, then forward the data there. row—eventually it will stay closed.” networking, storage and analysis. “FPGAs have traditionally performed Finisar teamed with Level 3 embedded RISC and control plane func- Advancing FPGA Technologies for Wired Comms Communications, Internet2 and the tions,” said Loring Wirbel, longtime commu- Today, Xilinx’s largest Virtex-5 TXT University of California at Santa Cruz to nications watcher and moderator of EDN XC5VTX240T device contains 37,440 demonstrate the transmission of 100GE Magazine’s new FPGA Gurus Web site. logic slices with a total of 239,616 logic traffic over Level 3’s DWDM network “Today’s FPGAs can now handle a lot of cells. This architecture has afforded design from the show site in Tampa, Fla., to datapath functions, so now seemingly a single teams and IP vendors great opportunities Houston and back—a total of 4,000 miles. FPGA—depending on how it’s parti- to innovate solutions that support XAUI, The Xilinx FPGA electrically transmitted tioned—can serve as an aggregation box in an RXAUI, Interlaken, Sonet, ODN and all ten signals to ten 10-Gbps XFP optical enterprise or one of the blades in a big switch- many other wired standards with the most transceivers, which converted the signals into ing center. So you don’t need coprocessing if advanced FPGA silicon to date. the optical domain. From there, the signals you have partitioned things correctly. One of For example, Xilinx worked closely with traveled to Infinera’s commercially available the stories of the death of the network proces- Sarance Technologies to provide the indus- DTN Switched WDM System, which hand- sor is that slowly but surely, FPGAs started try’s first 100GE media-access controller, a ed them off to the Level 3 network. taking over the NPU’s function.” full-featured, IEEE 802.3ba-compliant Overall, FPGA technologies are Wirbel notes that traditionally, as each solution implemented with Virtex-5 advancing fast. With each turn of Moore’s new generation of equipment has rolled FPGAs (Figure 2). Law, FPGAs offer communications out, there is a very brief opportunity for Sarance announced in mid-2008 that its designers the ability to create higher-band- NPU vendors to field specialized packet- 100GE MAC solution was up and running width, next-generation networks. In the forwarding engines. But in the wink of an on tier-one vendor hardware prototypes not-so-distant future, network designers eye the opportunity passes, he said, and the using two Virtex-5 FXT FPGAs, 10 exter- will give FPGAs a more-central place in engines get replaced with FPGAs. nal 10-Gbps physical-layer devices and a their designs. How big that role turns out “As we start moving to 40G and 100G variety of system-side interfaces. to be will depend not only on the silicon, networks, there will be a temporary place for The 100GE MAC-to-Interlaken bridge but on the IP and hardware and software very fast engines that just do the packet for- solution that the new Virtex-5 TXT FPGA tools customers have at their disposal. warding, just as there was for 1G and 10G platform supports is a low-risk way to con- Xilinx remains committed to creating [technology],” Wirbel said. “But the thing dense functionality into a single FPGA innovations for the wired communications is, as you move to the new feature sizes, that and three external quad serdes muxes. In market, with an aim not just to maintain is only going to be a narrow window and this implementation, Xilinx’s 64/66 and its leadership but to build on it with new they may very likely just skip looking over 64/67 encode/decode gearboxes are built programmable solutions.

First Quarter 2009 Xcell Journal 13 XCELLENCE IN WIRELESS COMMS Baseband Development for 3GPP-LTE Just Got Easier With the release of LTE-Channel Encoder and Decoder, Xilinx helps customers speed Layer-1 subsystem development and address the performance and latency challenges of 4G wireless.

by David Nicklin Senior Manager, Wireless Product Marketing Xilinx, Inc.

The baseband processing signal chain presents both the greatest challenge and the best opportunity for innovation in the base transceiver station. No wonder, then, that it has become a key area for product differentiation among OEMs. Competition in baseband architecture designs has intensified with the realiza- tion that many of the techniques used for earlier 2G and 3G systems simply will not scale to meet the performance and latency requirements of the 3GPP Long Term Evolution (LTE) technology, wire- less’ fourth generation Not only does the processing chain have to contend with far more processing than previously, but also, all the functions must be completed in much less time. Rounding out the set of challenges facing system architects is the need to develop a system that can meet the operators’ aggres- sive capital- and operational-expenditure reduction targets. These major pressures on the baseband processing system design are illustrated in Figure 1.

14 Xcell Journal First Quarter 2009 XCELLENCE IN WIRELESS COMMS

More Complex Algorithms come into play due to the enormous Iterative Techniques amount of data that has to pass to-and-fro between them. So, how can we remove such bottle- necks? The key lies in simplifying the Layer-1 system architecture and eliminat- ing all unnecessary chip-to-chip data transfers. This simplification process rais- Reducing Increasing Processing es some uncomfortable questions about Data Rates Time the scalability of architectures based on digital signal processors. Designers need a strong portfolio of intellectual property (IP), software and support to aid them with the transition to a Layer-1 system architecture in which most functions are implemented in programmable hardware Cost Pressure rather than DSPs.

Figure 1 –Challenges in evolving baseband processing needs Simplifying Layer-1 Design Let us look more closely at the problems that can occur when the FPGA is solely employed as a coprocessor to offload per TTI (1msec): 7.2k QAM Symbols 221kb 553kb turbo decoding functions from the DSP processor. In analyzing the effectiveness of such a partitioning in a typical LTE base- Digital Down Soft QAM Turbo De-puncturing band design (see Figure 2), Xilinx system Demapper Decoder Conversion architects have discovered that more than 20 percent of the available latency-timing Digital Front End Baseband budget can go into just shifting the data Transferring this data @ 3.125Gbps eats up 22% of available processing time. from a DSP processor to an FPGA and back again via SRIO connections. FPGA Shockingly, this is far from a worst-case Required scenario. Add to the mix data that is coded with higher-order modulation DSP Figure 2 – Data rates required between FPGA schemes, such as 64-QAM, two MIMO and DSP in a typical LTE system code words at a 1/3 code rate, over a full 20-MHz LTE band, and the numbers rap- An FPGA-based solution can meet all deployment of iterative turbo error-correc- idly get much, much worse. of these demands, while sidestepping tion techniques in 3G networks, a scheme One response is simply to add bigger the usual performance issues and bottle- that migrated from discovery to commer- “pipes” to move the data around by deploy- necks. Initiatives such as Xilinx’s newly cial release inside of 10 years. The pace of ing more high-speed multigigabit-trans- released LTE Uplink Channel Decoder innovation has continued to accelerate, ceiver connections. While it is certainly and LTE Downlink Channel Encoder most notably with the exploitation of the possible to construct a system this way, it LogiCOREs™ seek to remove the barri- spatial dimension in wireless communica- leads to an unnecessary increase in system ers to FPGA adoption by incorporating tion through various multiple-input, mul- power dissipation, as the relatively power- many of the critical Layer-1 functions in tiple-output (MIMO) antenna techniques. hungry high-speed serial connections a single IP solution. With the advent of 4G air interfaces transport the data back and forth and Advances in silicon technology have however, the pressures have built to the bridging functions are replicated, leading been a key enabler in the success of wire- point where traditional programmable to the need for more hardware resources. less communications by facilitating the DSP-centric channel card architectures are There is a better, more-optimal solu- rollout of ever-more-complex algorithmic struggling to cope. The traditional parti- tion. By incorporating most of the Layer-1 techniques from the research labs into tioning between FPGA and DSP is con- functionality within the FPGA, a designer products. One such example was the strained by performance bottlenecks that can free up this unnecessary overhead and

First Quarter 2009 Xcell Journal 15 XCELLENCE IN WIRELESS COMMS

pass on the savings in improved system Xilinx’s LTE Uplink Channel Decoder throughput and latency, while at the same and LTE Downlink Channel Encoder time lowering power requirements. The LogiCOREs remove such barriers to adop- power saving alone translates directly into tion by folding many of the critical Layer- improved reliability, reduced system cost 1 functions in a single IP solution that can and opex savings. be customized via a GUI in the Xilinx Such an architectural approach avoids CORE Generator™ tool. This design flow the need for DSPs altogether—although enables engineers with limited FPGA expe- they can be incorporated, if desired, to per- rience to concentrate on the wider system form lower-rate functions. With this kind of design, saving significant development and a partitioning, the FPGA implements the integration effort. entire Layer-1 baseband processing, leaving the other higher-layer functions, such as The Shape of Things to Come media-access control and HARQ process- The drive to ever-faster connectivity and ing, to a more cost-effective general-purpose low-latency connections is a key require- processor or a network processor, which can ment of LTE and will remain so for future also handle additional backhaul connectivi- systems beyond 4G. As these newer data- ty functions. Integrating all the high-per- centric wireless systems evolve, many formance and time-critical functions on a companies that have adopted the tradi- single platform FPGA effectively circum- tional partitioning between DSP and vents delay and bandwidth limitations, and FPGA will find the overhead involved in partitioning becomes a much simpler task. shifting data between separate chips has The key stumbling block to date in become unacceptable. An FPGA-based adopting such an approach has been the solution is now much more accessible to need to simplify the process, from design designers looking for that extra edge in concept to hardware. Also, designers accus- their product design. Those who move tomed to a DSP-centric design flow need past their ties to a legacy system design IP and development tools that make it eas- approach will deliver a product that sur- ier to unlock the capabilities of the FPGA, mounts the performance issues and bot- and quickly and effectively develop base- tlenecks their competition will continue band functionality within it. to experience.

Inside the LogiCOREs Xilinx’s recently released LTE Channel Encoder and Decoder LogiCOREs, designed for 3GPP rel8 E-UTRA eNB baseband processing, comply with spec- ifications 3GPP TS 36.211 v8.2.0 and TS 36.212 v8.2.0 (2008-03). They sup- port different configurations up to 20 MHz in bandwidth with normal (short) CP, 64-QAM modulation and two MIMO code words. Both can also handle FDD and TDD frame structures, suiting them for systems that have evolved from the TD-SCDMA standard. Both the encoder and the decoder are supplied as standalone, parameterized IP blocks that can be easily incorporated into a customer’s design using the Coregen software tool. Supporting them is a comprehensive set of tests, simu- lations and C models for system simulation in order to aid design integration. More details on the new LogiCORES are available at the Xilinx IPcenter (www.xilinx.com/ipcenter). Designers can access additional wireless reference designs and solutions via the Wireless End-Market page (www.xilinx.com/esp/wireless).

16 Xcell Journal First Quarter 2009 More gates, more speed, more versatility, and of course, less cost — it’s what you expect from The Dini Group. This new board features 16 Xilinx Virtex-5 LX 330s (-1 or -2 speed grades). With over 32 Million ASIC gates (not counting memories or multipliers) the DN9000K10 is the biggest, fastest, ASIC prototyping platform in production. User-friendly features include: • 9 clock networks, balanced and distributed to all FPGAs • 6 DDR2 SODIMM modules with options for FLASH, SSRAM, QDR SSRAM, Mictor(s), DDR3, RLDRAM, and other memories • USB and multiple RS 232 ports for user interface • 1500 I/O pins for the most demanding expansion requirements Software for board operation includes reference designs to get you up and running quickly. The board is available “off-the-shelf” with lead times of 2-3 weeks. For more gates and more speed, call The Dini Group and get your product to market faster. www.dinigroup.com • 1010 Pearl Street, Suite 6 • La Jolla, CA 92037 • (858) 454-3419 • e-mail: [email protected] XCELLENCE IN AUTOMOTIVE & ISM Updated Starter Kit Speeds Video Development The XtremeDSP Video Starter Kit, Spartan-3A DSP FPGA Edition 2, provides a high-performance development platformplatform for complex high-definition systems.

by Joe Mallett Senior Product Line Manager Xilinx, Inc. [email protected]

The video industry’s shift to more-complex and integrated processing solutions along with demanding, next-generation video compression standards is driving system requirements for video performance that exceeds what standalone DSPs can deliver. No wonder, then, that many companies designing state-of-the-art video equipment are turning to FPGA platforms. In particu- lar, many of them are joining companies in sensitive military, automotive, medical, con- sumer, industrial and security applications in choosing the Xilinx® Spartan®-3A DSP, which supplies more than 20 GMACs of DSP performance for less than $30. In addition to performance, this FPGA provides an integrated solution supporting DSP and embedded processing using MicroBlaze™ processors in the system. Those features enable OS support and driv- ers for specific market segments. Xilinx recently updated the XtremeDSP™ Video Starter Kit—Spartan-3A Edition to help design groups get started with these kinds of advanced designs, and to accelerate devel- opment times. This kit can help you create advanced video systems for whatever applica- tion you are targeting.

18 Xcell Journal First Quarter 2009 XCELLENCE IN AUTOMOTIVE & ISM

Video Starter Kit Version 2.0 Reference Design Functionality Description The latest version of the XtremeDSP DVI Pass-Through • Capture a video stream input port Video Starter Kit (VSK) for the Spartan- • Perform real-time image processing on the video stream 3A provides a comprehensive platform • Display the processed video that accelerates the development of video applications on Xilinx FPGAs. Designed DVI Frame Buffer • Capture a video stream from a camera to leverage the cost/performance advan- • Perform processing on the video stream tages of the Spartan-3A DSP FPGA device • Buffer the video stream in external memory family, the kit offers an updated set of • Report memory bandwidth utilization data video reference designs based on an Camera Frame Buffer • Capture a video stream from a camera embedded design framework that allows • Perform processing on the video stream customers to focus on their unique value- • Buffer the video stream in external memory added development. • Display the processed video at a different rate The VSK provides multiple reference • Use a MicroBlaze to configure the video pipeline designs that can accelerate the develop- Video Processing • Capture video stream from an S-video interface ment of video applications running on • Support weave and de-weave access to video memory Xilinx FPGAs. Each reference design is • Perform gamma processing and 2D FIR filtering built upon a common framework and uses multiple interfaces for the I/O of video Hardware Co-SIM • Initialize a point-to-point connection to the platform data to the FPGA. Table 1 lists each refer- • Any design within System Generator can add a HW-CoSIM token ence design and the video processing and • Validation acceleration with hardware in-the-loop connectivity capability it illustrates.

Table 1 – Reference designs

Block RAM

ILMB DLMB

Processor MicroBlaze GPIO GPIO GPIO UART System Reset Processor MDM DIP Switches Push Buttons LEDs

PLB PLB Clock ARB Generator IXCL

DXCL System ACE XPS IIC XPS IIC

MPMC

V V Camera VIDEO Camera Camera F F Display Process- Gamma _TO_ DVI_OUT DVI_OUT In In B B Controller ing VFBC C C

Key: EDK IP VSK IP Figure 1 - Camera reference design

First Quarter 2009 Xcell Journal 19 XCELLENCE IN AUTOMOTIVE & ISM

Starting with the camera reference design, a software developer can, from day one, begin implementing an operating system and programming the application layer using the EDK software development tools.

The reference designs help accelerate system without having to design the sup- processor. Designers gain performance development by implementing specific porting hardware peripherals. and achieve system integration by exploit- data flows that are common to video sys- ing the flexibility of the device to config- tems. One such example is having a camera Embedded Processing ure a hardware architecture optimized for provide RAW image data to the FPGA for The migration to a complex hardware a particular application. Starting with the processing and display, as seen in the cam- acceleration processing system is stepping camera reference design, for example, a era reference design shown in Figure 1. up the need for embedded processing to software developer can, from day one, The VSK provides all necessary source handle all the real-time control, configura- begin implementing an operating system and project files for the reference designs, tion and system interaction. and programming the application layer which developers use as a starting point. The tight integration means designers using the EDK software development In the camera reference design, the camera can convert DSP designs captured in tools (see Figure 2). processing block is a design developed in System Generator into custom peripher- This flexibility adds a degree of free- System Generator and integrated as a ded- als for Platform Studio, and connect dom to the development process that also icated hardware peripheral with the EDK them to the base system using the PLB reduces the design complexity. The embedded system. This allows hardware bus. This allows a system designer to eas- XtremeDSP Video Starter Kit gives a designers to easily remove the example ily enable the system control and migra- hardware or software developer a com- image processing, replace it with new or tion of existing system software with the plete and easy-to-use design environment existing designs and integrate it within the adoption of a MicroBlaze v7 soft-core with example applications and full sup-

Application Stack

Operating System

Block RAM

ILMB DLMB

Processor MicroBlaze GPIO GPIO GPIO UART System Reset Processor MDM DIP Switches Push Buttons LEDs

PLB PLB Clock ARB Generator IXCL

DXCL MPMC System ACE XPS IIC XPS IIC

DDR2

Figure 2 – Application programming on the VSK

20 Xcell Journal First Quarter 2009 XCELLENCE IN AUTOMOTIVE & ISM

Figure 3 – System Generator diagram for a camera reference design port for the standard Xilinx tool flows. Designers can first implement video That combination can help to accelerate algorithm designs as a MATLAB or the design process and allows for end Simulink model using the optional Video product differentiation. and Image Processing Blockset. As the System Generator supports hardware- development moves to the next stage, in-the-loop co-simulation using the hardware implementation is easily Spartan-3A DSP 3400A development enabled through System Generator for platform, which can accelerate the per- DSP, which provides a rich set of DSP formance of Simulink® simulations up building blocks, optimized for Xilinx to 100x. This acceleration enables video devices, for use within the Simulink mod- algorithm development and debug using eling environment (see Figure 3). real-time video streams read into Once the hardware design is completed Simulink using The Mathworks’ Data in System Generator, the HW-CoSIM Acquisition Toolbox™. functionality accelerates the validation time by placing hardware in the loop. For Hardware Acceleration complex systems, this can greatly improve Now that the required processing band- the testing run-time and thereby increase width is outpacing the current capabilities the number of iterations completed within of standalone DSP processors, hardware a given amount of time. acceleration is becoming a necessity in The XtremeDSP Video Starter many video applications. FPGAs enable Kit–Spartan-3A DSP Edition coupled this hardware acceleration while delivering with the integrated reference designs the additional benefits of system integra- provides the ideal platform for video tion and architecture repartitioning. developers by enabling various modes The migration from a standalone sys- for processing data from streaming to tem processor to the integration of a frame buffer based. Video developers coprocessor requires some design explo- can quickly design and validate hardware ration as the hardware designer is looking peripherals (HW-CoSim) using System at the various functions to accelerate. The Generator. Integration of hardware first challenge is the need to have a variety peripherals and embedded processing of design flows that enable abstract model accelerates the development of complex programming using MATLAB™ and video systems within industrial imaging, Simulink, and the easy integration of any broadcast, consumer, medical and auto- existing VHDL/Verilog designs. motive applications.

First Quarter 2009 Xcell Journal 21 XCELLENCE IN AEROSPACE & DEFENSE A Flexible Platform for Satellite-Based High-Performance Computing Space-grade Virtex FPGAs and a reconfigurable system architecture satisfy demanding size, weight and power requirements and accelerate design cycles.

by Ian Troxel Future Systems Architect SEAKR Engineering, Inc. [email protected]

Greg Lara Marketing Manager Xilinx, Inc. [email protected]

Developers of space-based electronic sys- tems face increasing pressure to deliver higher levels of performance while working within more-aggressive project schedules and tighter budgets. Many space-based sys- tems now under development call for advanced video equipment that will cap- ture images with extremely high resolution and then relay those images instantaneous- ly back to earth. To do this, new systems need to include advanced processing cir- cuitry, greater storage capacity and the abil- ity to transfer large data files quickly over long distances. However space-based sys- tems have a unique set of size, weight and power (SWAP) constraints that can prove taxing for designers. Requirements to do more for less are driving the adoption of commercial, off-the- shelf (COTS) devices, such as FPGAs. The flexibility inherent in reconfigurable FPGAs offers tremendous benefits for developers of space-based systems in terms of SWAP con- straints, cost and productivity.

22 Xcell Journal First Quarter 2009 XCELLENCE IN AEROSPACE & DEFENSE

One way to get the maximum leverage prototyping and evaluation), but undergo additional logic. The mezzanine card, from available engineering and budget special processing and screening to Class-Q which also plays a part in SEE mitigation resources is to create a flexible payload that and Class-V requirements. in certain applications, joins the RCC can be deployed in multiple missions. Our Working in concert with sequential board through a set of three connectors, company, SEAKR Engineering, Inc., processors, the Virtex-4 FPGAs serve as a each providing 170 LVDS I/Os. employed reconfigurable Xilinx® Virtex® coprocessor for accelerating key processing- Moving mission-specific functionality FPGAs to create a flexible high-perform- intensive tasks. The three-FPGA board to the mezzanine card has enabled us to use ance computing platform that serves as the architecture offers flexibility for addressing the same FPGA-based processing card for heart of a variety of space-based systems. the unique requirements of different mis- multiple unique applications. The com- This reconfigurable computing (RCC) sions. In some applications we use the three mon architecture has reduced project risk, methodology has enabled our engineers to FPGAs for SEE mitigation techniques that costs and schedules. achieve demanding performance targets require component-level redundancy. In within SWAP, cost and time constraints others, we partition a large coprocessor Mitigating Radiation Effects in FPGAs for a number of missions, most notably across multiple devices and use the ring bus Flying reconfigurable FPGA-based sys- our onboard processor for Raytheon’s connecting the three FPGAs via an LVDS tems in space requires special considera- Advanced Responsive Tactically Effective interface for high-speed communication tions to ensure reliable operation in a Military Imaging Spectrometer (Artemis), among the devices. Employing an extended high-radiation environment, because the our Programmable Space Transceiver, our 6U form factor, the card has two connec- SRAM-based configuration circuitry is Programmable Space IP Modem and the tors for interboard communication: one for susceptible to radiation-induced upset. Orion Vision Processing Unit, currently a CompactPCI backplane and another for The first consideration is the choice of under development. a high-speed serial network. component. In addition to industrial and Each FPGA has direct access to dedi- military temperature-grade options, Application-Independent Processor Architecture cated high-speed memory on the RCC Xilinx offers V-grade Virtex-4 and Virtex- This new platform we developed, called the board and to a connector supporting 5 FPGAs that undergo special processing Application Independent Processor (AIP), expansion and customization via a high- to make them immune to radiation- comprises a mix of scalar processors and speed mezzanine card. This architecture induced latch-up and deliver guaranteed RCCs in a flexible, scalable architecture enables us to expand the capabilities of performance for total-dose effects. These that supports open standards (Figure 1). Its the RCC board with mission-specific devices also undergo extensive characteri- flexible I/O architecture allows us to mix I/O, memory, analog circuitry and even zation in neutron and proton beams to and match boards to create different con- figurations that best suit our application requirements, delivering what we call mis- sion-unique functionality. The AIP lever- AIP System Architecture ages the unique capabilities of Xilinx’s SRAM-based FPGAs to enable in-orbit Shown in the TacSat3 ARTEMIS Configuration reconfigurability for additional flexibility GPS CameraLink Analog Focus GigE Switched 1 pps 2 Ports Position Mechanism PMC 28V and SWAP benefits. The AIP also supports a variety of single-event effect (SEE) radia- tion-mitigation techniques to ensure reli- Mezzanine Card able operation in different orbits. Spare Slot Additional RA-RCC RA-RCC PowerPC Power The heart of the AIP system architecture UPS Memory Cards with Processor Supply is a reconfigurable computer board hosting I/O Cards Virtex-4 FPGAs a trio of Virtex®-4 FPGAs (Figure 2). Our investigation of available components con- cluded that Virtex FPGAs were the only Power Planes 3.3,5 devices that could achieve our performance 28 Volts Spacewire targets and provide the characteristics PCI Bus A required for spaceflight. For the most 4-bit SERDES demanding applications, Xilinx offers Backplane Virtex-4QV space-grade devices. These FPGAs incorporate the same architecture as their commercial-grade counterparts Figure 1 – AIP architecture, showing combination of RCC boards and other system boards in Artemis (enabling low-cost system development,

First Quarter 2009 Xcell Journal 23 XCELLENCE IN AEROSPACE & DEFENSE

tion in the selected orbit and uptime High Speed High Speed High Speed requirements of the circuit. Mezzanine Mezzanine Mezzanine High Speed High Speed High Speed Memory Memory Memory The basic concept behind memory scrubbing is to rewrite the configuration Coprocessor A Coprocessor B Coprocessor C Xilinx Virtex-4 LVDS I/O Xilinx Virtex-4 LVDS I/O Xilinx Virtex-4 memory more frequently than upsets accu-

LVDS I/O mulate. Designers can choose from among SelectMap a number of memory-scrubbing techniques PCI Memory to suit different upset rates and uptime Memory Memory PCI-PCI requirements. At one extreme, the simplest Bridge / Config technique involves reloading a complete bitstream into the configuration memory. This method involves low overhead but cPCI High Speed Serial Network RCC Board Architecture requires that the circuit be inactive for a period at least as long as a configuration cycle. More-advanced techniques exist for applications with stricter uptime require- ments, higher upset rates or both. Leveraging the partial-reconfiguration capability of Virtex FPGAs, these methods involve circuits that detect memory upsets and then initiate reconfiguration of only a selected subset of the memory array.

AIP in Action We have already used the AIP architecture in four separate missions. The combination of the FPGA-based RCC board and flexible mezzanine card has enabled our engineers to build a variety of processing and com- munications systems rapidly and to imple- ment mitigation schemes suitable for the unique requirements of each mission. The first incarnation of the AIP was Artemis—the Advanced Responsive Figure 2 – RCC board architecture diagram and photo Tactically Effective Military Imaging Spectrometer—which will be flying on the generate reliable predictions of single- Designers can choose from among a TacSat-3 satellite, scheduled for launch in event upset (SEU) and single-event func- number of approaches to satisfy the per- the second quarter. Designed to provide sit- tional interrupt rates for particular orbits. formance and availability requirements of uational awareness on the battlefield, This data guides engineers in selecting an their system. One technique involves Artemis performs advanced image process- upset-mitigation scheme appropriate for redundant FPGAs and an external radia- ing on data the satellite collects and delivers the application and orbit. tion-hardened voter circuit. Another it to soldiers in the field via a narrowband Upset mitigation for reconfigurable approach is device-level mitigation, which downlink. Our engineers realized that an FPGAs generally involves some combina- involves triplicating mission-critical logic RCC approach would be required to meet tion of hardware triple redundancy and con- inside a single FPGA, along with the asso- the spacecraft’s size, weight and power goals: figuration memory scrubbing. Hardware ciated voter circuits. Traditionally, engi- dimensions of 7.8 x 11.41 x 10 inches; a triple redundancy involves tripling critical neers have tackled this triple-modular weight of 18 pounds; and power of 40 watts circuits to ensure uninterrupted operation redundancy (TMR) design technique by (with a hard limit of 50 W). even if one element experiences a radiation- hand. Xilinx now offers special design tools Two Virtex-4 FPGAs perform sensor data induced upset. It also adds a voter circuit that automate TMR implementation with- acquisition along with preprocessing func- that compares signals arriving from the trip- in an FPGA. Factors influencing the tions such as calibration. An embedded pro- licated logic branches and rejects an invalid choice of mitigation scheme include the cessing system based on the MicroBlaze™ signal resulting from an upset. size of the target circuit, the level of radia- soft-processor core coordinates memory

24 Xcell Journal First Quarter 2009 XCELLENCE IN AEROSPACE & DEFENSE

access and processor coordination, while a ; each bitstream configures Internet in Space PowerPC® single-board computer handles the FPGAs to process a specific waveform Packet-based networking in space prom- image generation and target cueing. Figure 1 and frequency. In this way the system is able ises to provide the same flexibility and shows the Artemis system architecture. to support multiple waveforms using a min- robustness available in terrestrial net- Because the image data path is not mis- imum amount of hardware (see Table 1). works. Long a mainstay of wireline net- sion critical, configuration memory scrub- The flexibility of the RCC board pro- working equipment, reconfigurable bing provides suitable mitigation for vides benefits that start with initial sys- FPGAs lend the same benefits of per- Artemis. That is, the designers were able to tem development. The delay between formance, flexibility and design accelera- satisfy the availability requirements without requesting and receiving a spectrum slot tion to space-based applications, as resorting to logic triplication or redundant can be greater than one year. demonstrated by our Programmable devices. Furthermore, we determined that Reconfigurable hardware allows designers Satellite Internet Protocol Modem. PSIM we could use commercial-grade FPGAs to to initiate development before receiving extracts Ethernet frames from standard satellite communications waveforms and Receiver/Uplink L-Band: 1,760 to 1,840 MHz facilitates IP routing on the spacecraft. Packet-based satellite communication S-Band: 2,025 to 2,120 MHz enables beam- and waveform-independ- ent routing of data through virtual cir- Transmitter/Downlink S-Band: 2,200 to 3,200 MHz cuits. Compared with standard bent-pipe Space Ground Link System FSK-AM Command Uplink (1 kbps, 2 kbps) satellite communication channels, pack- et-based networking improves scalability Subcarrier BPSK Telemetry Downlink (256 kpbs) and throughput, enables decentralized Universal S-Band Subcarrier BPSK Command Uplink (

First Quarter 2009 Xcell Journal 25 XCELLENCE IN AEROSPACE & DEFENSE

Figure 3 – The platform’s Programmable Satellite Internet Protocol Modem configuration

High-Performance Video sive mitigation scheme. Combining TMR for Manned Spaceflight methodology and configuration memory The most recent application of the AIP scrubbing ensures transparent correction of architecture is the Visual Processing Unit control path corruptions. (VPU) for the Orion Crew Exploration In conclusion, leveraging the capabilities of Vehicle. The VPU provides a reconfig- Virtex FPGAs, SEAKR engineers have devel- urable platform for processing image algo- oped an application-independent processor rithms driving pose estimation, optical for space applications and demonstrated its navigation and compression/decompres- flexibility on several missions. The RCC serves sion. The system receives image data from as a key component in satellite-based image a variety of sensors: star tracker, vision processing and communications, flexible radio navigation sensor, docking camera and sit- communications, space-based networking and uational-awareness camera. navigation for human spaceflight. To process this volume of data takes a Space-grade Virtex FPGAs are COTS combination of sequential processors and components that offer the performance FGPA-based RCC cards. Virtex-4 FPGAs required for demanding data-processing implement video-processing algorithms and communications systems. These such as feature recognition, graphical over- reconfigurable FPGAs enable a flexible, lay, tiling and video compression. They also scalable architecture that reduces develop- incorporate a MicroBlaze soft-processor ment cost and accelerates design cycles. In core to coordinate algorithm cores and addition to supporting rapid development processor communication. A LEON fault- and flexible manufacturing on the ground, tolerant processor-based single-board com- Virtex FPGAs offer the ability to reconfig- puter is dedicated to system coordination, ure on-orbit for additional, significant error handling, RCC configuration and SWAP benefits. oversight, and interconnect control. Future generations of V-grade reconfig- A mezzanine card provides the sensor urable FPGAs promise to offer greater size, interfaces, implementing LVDS links to weight and power benefits by delivering high- all three FPGAs for maximum flexibility er logic capacity, greater integration of hard- in video stream selection and mitigation ened IP blocks, higher performance and lower schemes. power consumption. Radiation-hardened Because of the “human-critical” nature reconfigurable Virtex FPGAs will simplify the of the tasks the VPU carries out, SEAKR designers’ task and extend SWAP benefits fur- engineers selected Virtex-4QV space- ther by eliminating the need to implement grade FPGAs and implemented an aggres- logic- or device-level redundancy.

26 Xcell Journal First Quarter 2009 OR

Agilent Logic Analyzers Agilent Mixed Signal Oscilloscopes Up to 1.2 GHz timing, 667 MHz state, and 256 M 4 scope channels + 16 timing channels deep memory + Agilent FPGA Dynamic Probe Application software to increase visibility inside your FPGA = Fastest FPGA Debug Available

2——EŽNoŽg—ŽE!c_™`gE—N¢l3™`ol!c—!l<—y!Ž!gE™Ž`3—

RMT’s SwitchBack uses Virtex-5 Powers Xilinx Virtex-5 FPGA in a PC that users can customize in the field Reconfigurable, and upgrade on the fly.

by Shane Lewis Director of Technology Development Rugged PC RMT, Inc. [email protected]

The U.S. military and companies in heavy industries such as mining, transportation, warehousing, logistics and public safety all have strict requirements for personal com- puters. First and foremost, their PCs must be ruggedized to endure physical abuse, extreme heat and cold, and exposure to moisture, even submersion. At the same time, these rugged PCs need computing functionality that’s on par with the latest commercial PCs, but surpasses them in security and global communications capa- bilities. Customers looking for this type of computer also require very specific peripheral features targeted at mission- critical tasks. But up until recently, these buyers have been forced to use standard, off-the-shelf PCs that often don’t fully meet their needs. This deficiency presented our design group at RMT, Inc., with the enormous challenge of designing a modular and cus- tomizable computing solution—a “com- mon platform” that truly executed the customers’ strictest requirements and met their expectations. Savvy R&D teams know the pitfalls of trying to build a “one- size-fits-all” device, which too often results in a kluge of difficult compromises. Our design team set out to defy the odds and engineer a truly adaptable computer plat- form that would be field-reconfigurable and customizable at the circuit level, while at the same time remaining an elegant, rugged and user-friendly system.

28 Xcell Journal First Quarter 2009 XCELLENCE IN AEROSPACE & DEFENSE

The result of this effort is the SwitchBack. This computer is truly different at every level, and redefines PC architecture through its innovative use of the Xilinx® Virtex®-5 FPGA. The difference between the tradition- al PC and the SwitchBack architecture is quite extraordinary.

Traditional Open PC Architecture The base of any modern legacy PC is an x86 processor and associated chip set for either Linux or Windows, namely Windows XP or Vista. It’s the legacy sup- port behind this code set that has enabled Figure 1 – The difference between a traditional PC architecture (left) and the revolu- tionary SwitchBack is that in the latter, the FPGA is the primary controller. This it to dominate the PC world and subse- improves processing time and makes the SwitchBack reconfigurable and customizable. quently, to constrain the embedded com- puter space, where operating systems and processor technologies tend to be more fragmented. If you open any desktop, lap- top or tablet PC designed to run System Flash Processor DDR RAM Windows XP or Vista, you will find a chip Memory set/CPU circuit topology. This architecture, which has been the de North DDR2 RAM facto standard for all PCs, both rugged and Bridge LVDS commercial, for many years, has one funda- mental limitation. Task execution must Virtex-5 PCIe either be written for a specific processor or South Bridge must be plugged into one of the many avail- GPIO able expansion ports as external hardware. The boundaries of what can be done are closely defined around the wiring of the ded- Figure 2 – In the SwitchBack architecture, primary system control is a icated ASICs that make up the chip set itself. function of the Virtex-5 FPGA, not the main x86 CPU processor.

SwitchBack Architecture To get a different outcome, the RMT team makes it possible to access and view data the Virtex-5 for its vast array of internal knew that we had to rethink this design. without booting the Windows operating resources as well as its RocketI/O™ and We crafted SwitchBack’s patent-pending system—a feat that’s virtually impossible PCI Express® Endpoint Block. We then set architecture to be both field-reconfig- with a traditional PC. The SwitchBack the PCI Express interface as the major con- urable as well as compatible with takes it a step further by allowing users to nection point between the embedded sys- Windows-based applications. The concept access and control custom functions and tem and the Intel x86 system. Although of field reconfiguration is widely utilized peripherals without the assistance of the there are other interface bridges between today in embedded computing, a trend processor or operating system. These are the components, this one is the major data that FPGAs have enabled. programmed into the BackPack, a modular pipeline for processing and control. While programmable logic may play system that attaches to the rear of the various supporting roles on some x86 PC- SwitchBack and allows full control of any Hardware Design based motherboards, the FPGA is the hero peripherals without waiting for slow-mov- The fundamental layout of SwitchBack’s in the SwitchBack, directing the comput- ing processors or operating systems. hardware design is essentially two systems er’s functions (Figure 1). The Virtex-5 is that can operate independently. The Virtex- the primary controller of all major subsys- The Heart of the SwitchBack 5 FPGA, not the x86 CPU processor, is the tems. From the moment the user presses The amazing flexibility and control of the primary control system (Figure 2). It actual- the power button, the FPGA controls all SwitchBack is made possible through an ly configures immediately upon system peripherals, including the display itself, as architectural design based on the Xilinx boot-up, before allowing the secondary sys- well as the flow of most data. This scheme Virtex-5 LX30T. The RMT team selected tem to begin booting.

First Quarter 2009 Xcell Journal 29 XCELLENCE IN AEROSPACE & DEFENSE

Additionally, the FPGA has its own Reconfigurable Hardware ent microprocessors, communications RAM and flash memory for both configu- The Virtex-5’s internal resource array— devices, arithmetic cores and autonomous ration and program storage, should users including the ExpressFabric Architecture, BackPack control. need to program secondary operations into block RAM, 1.25-Gbit/second Select I/O In fact, we call these functions “virtual the FPGA. This revolutionary architecture and DSP48E slices—offers a multitude of devices” because they appear to the operat- and its FPGA-defined algorithms provide possibilities when creating functions and ing system as separate hardware, yet there is several key capabilities that ordinary PCs processes that would normally be done in no physical circuit card to implement them. cannot accomplish. They include control of software on the main processor. With these One or more devices can be implemented at all system resources allowed by the main chip features, we were able to implement the same time, expanding the platform far processor or its base chip set, and inde- many capabilities into the FPGA to beyond the features expressed by the pendent and autonomous control of offload a portion of the main processor’s onboard chip set. peripherals, including those programmed duties, or create from scratch entirely new into the SwitchBack’s BackPacks. The subsystems that behave like physical Open FPGA for Open Architecture FPGA also includes processor-independent expansion cards in a regular PC. Part of our vision was to preserve the PC’s functions to assist the main processor. By using the PCI Express bus between open architecture platform and extend that Control of system resources and the two systems, we can connect the new flexibility further to include the FPGA itself. peripherals is straightforward and well- hardware expressed in the FPGA to the The capacity of the Virtex-5 FPGA is greater understood in the embedded world. But main processor as though it were physical than the SwitchBack requires for its main the use of processor-independent func- hardware plugged into an expansion port control functions. Our team purposely chose tions raises new possibilities for realizing (Figure 3). This allows the construction of a larger FPGA to allow customers to add the SwitchBack’s full potential. The new types of devices that operate inde- their own programming and functionality to options include reconfigurable hardware, pendently of the main x86 processor. SwitchBack, enabling them to develop a truly open FPGA for open architecture, addi- These devices or functions may include custom, precision tool that can accomplish tional customization via BackPacks and a data format conversion, custom logic the same results as multiple independent sub- BackPack modular development kit. interfaces, hardware emulation, independ- systems attempting to work in parallel.

Figure 3 – Any reconfigurable hardware expressed in the FPGA can connect to the main processor, which can access it as though it were physical hardware plugged into an expansion port.

30 Xcell Journal First Quarter 2009 XCELLENCE IN AEROSPACE & DEFENSE

Thanks to the innovative use of the Virtex-5 in the SwitchBack, customers need no longer settle for a suboptimal computer and an assortment of dissimilar peripherals connected by a mess of cables.

We also intentionally underutilized the Additional Customization via BackPacks and tie them to the computer rapidly. Virtex-5 FPGA as the system master in The SwitchBack is capable of further cus- Although a BackPack is part of the computer SwitchBack. For example, of the Virtex-5 tomization through its BackPack technology. itself, it can operate completely autonomous- LX30T FPGA’s 32 DSP48E slices, the BackPacks are customer-specific modules ly, sharing only processed data rather than SwitchBack uses only one, leaving the that users can securely attach to the back of bogging down the main processor with extra remaining 31 available for the end user. the SwitchBack. RMT initially created the duties, as can happen when attaching devices Depending on the buyer’s resource needs BackPack to eliminate the need for external through traditional methods (such as USB). and the configuration of the SwitchBack peripherals and to add multiple ports. Thus, customers can implement BackPacks at purchase, available resources such as BackPacks can take on an infinite array of with equal or greater processing responsibili- registers, LUTS, BRAM, DCM and PLLs sizes, shapes and complexity—handling ty without burdening the main processor. may be available for customer-specific use. additional processing capabilities, for exam- The Xilinx Virtex-5 with DSP48E slices In general, the SwitchBack uses less than ple—endowing this personal computer with can be put into service to provide additional half of the FPGA for system management the computational clout of a supercomputer. signal processing at an enhanced and and general housekeeping, leaving the In this way, the BackPack is a field-customiz- improved efficiency level. A few examples of majority of the remaining resources for able system that can morph SwitchBack into possible adaptations include image process- the user to define. a highly integrated, precision tool. ing, software radios, cryptography, network A custom update interface tool allows The RMT team routed the GPIO from security and analog modems. users to easily update the flash memory the FPGA directly to the BackPack port so Users can increase the performance of par- inside the SwitchBack. This simple soft- that logic in the FPGA could gain access ticular functions when they build the analog ware update requires no JTAG or special- to the attached BackPack, making it pos- topology of these functions into a BackPack ized equipment to modify the FPGA’s sible to control any type of device without while letting SwitchBack handle the process- configuration file. Installing new hard- ever involving the main system processor. ing duties onboard. ware is quick and easy. Users simply The SwitchBack’s promise can now be Thanks to our design group’s innovative reboot the SwitchBack system and watch fully realized when combining the repro- use of the Virtex-5 in the SwitchBack, cus- the new hardware appear in the Hardware grammable FPGA with the unlimited tomers no longer have to settle for a subopti- Manager, ready for immediate use. potential of external adaptability. mal computer and an assortment of We also packaged the SwitchBack’s dissimilar peripherals connected by a mess of requirements for the FPGA into an easy- Modular Development Kit cables. In this way, the SwitchBack enables to-implement core, which allows the end Building upon the success of the BackPack, rapid system design and deployment for mis- customer to quickly add logic, registers we found that FPGA-savvy customers could sion-critical applications. and data buses to the remaining FPGA develop and program their own BackPacks if SwitchBack is the next evolutionary step space available. we provided them with the right tool kit. in computer technology. Essentially, it’s a Our Firmware Development Kit The SwitchBack’s Modular Development reconfigurable PC platform that customers (FDK) allows customers to modify and Kit (MDK) allows them to design custom can customize in the field and upgrade on reconfigure the SwitchBack to meet their BackPacks that will make use of SwitchBack’s the fly. Since it can run mission-critical own specific needs, as the mission unique architecture. In many cases, cus- applications without interruption, it is espe- changes. The SwitchBack can quickly tomers have electronic devices or circuit cially suited for operations in key markets adapt to the in-field situation with a cards they wish to integrate for rapid testing such as the U.S. military, mining, trans- unique module upgrade, custom logic and deployment. The MDK makes doing so portation, warehousing and logistics, public changes to the FPGA or both. This a snap. It provides a functional circuit board, safety and many others. scheme effectively redesigns the system cables, schematics and mechanical CAD To learn more about the SwitchBack and in the field. By providing this capability data, allowing a customer to build a its revolutionary architecture, visit www. in an FDK, customers with FPGA expe- BackPack in a matter of days. ropermobile.com/products/switchback or rience and the proper place-and-route By using SwitchBack’s onboard FPGA e-mail or phone the author at RMT, Inc., tools can create and shape the and BackPack technology, users can build at [email protected] or (480) 705- SwitchBack to fit their exact needs. entire subsystems of specialized functions 4200, ext. 306.

First Quarter 2009 Xcell Journal 31 XPERTS CORNER Floating Point: Have it Your Way with FPGA Embedded Processors Implementing an FPU for the Xilinx PowerPC 440 is easy.

by Glenn Steiner Senior Manager Xilinx, Inc. [email protected]

Ben Jones FPGAs are well suited to performing results in programs that are application- Senior DSP Design Engineer fixed-point operations, and offer the ability specific and not reusable. The alternative Xilinx, Inc. to construct highly parallel data path solu- is to adopt a standard floating-point rep- [email protected] tions in logic or soft- or hard-processor- resentation to provide a high dynamic based implementations. The Xilinx® range that’s adaptable to many applica- Peter Alfke PowerPC® 440, the latest hard processor tions. The reward is that you need not Distinguished Engineer within the FXT series of the Virtex®-5 modify the algorithm to obtain a fixed- Xilinx, Inc. FPGA family, offers a superscalar capability, point implementation for any particular [email protected] allowing users to program the device to per- application or operating environment, nor form one or two fixed-point operations in extensively modify the code for subse- When creating embedded applications parallel at clock rates of up to 550 MHz. quent projects and applications. employing numerical processing, it’s While users can program the device to While Xilinx offers a very efficient emu- important to keep the arithmetic opera- perform most calculations using integer lated floating-point solution for the tions simple, generally by using an integer or fixed-point arithmetic, they often must PowerPC 440 processor based upon the or fixed-point representation. This helps rearrange calculations and insert scaling IBM floating-point performance libraries, to minimize the cost and power and to operations to calculate results to suffi- the core still requires tens of cycles to per- maximize the speed of an implementation cient accuracy. For complex algorithms, form each operation. Hardware acceleration in hardware. this can be time-consuming and often of floating-point operations in the form of a

32 Xcell Journal First Quarter 2009 XPERTS CORNER

floating-point unit (FPU) reduces this cycle time. The PowerPC 440 processor in the PowerPC 440 Virtex-5 FXT provides an efficient interface Processor Block for connecting a hardware accelerator, such as the Xilinx soft FPU, to the processor Virtex-5 APU core. This scheme bridges the 128-bit auxil- Floating-Point Unit iary processor unit (APU) interface provid- PowerPC APU FCB ed on PowerPC 440 processors to 440 FCB Processor I/F coprocessors via the fabric coprocessor bus (FCB). One such coprocessor, the Xilinx (in FPGA Logic) LogiCORE™ IP Virtex-5 APU-FPU, gives Virtex-5 FXT users floating-point on the PLB and Memory PowerPC “their way,” using either software Crossbar emulation or a dedicated soft-logic FPU. Figure 1 shows a typical implementation of a PowerPC 440 processor connected to the Virtex-5 APU-FPU via the FCB. Figure 1 – Embedded processor system containing an APU-FPU core

About the PowerPC 440 FPU Xilinx designed the APU-FPU specifically for the PowerPC 440 processor embedded in the Virtex-5 FXT FPGA. The tight Mov, Abs, Neg coupling of the FPU to the processor Execution through the APU interface lets the float- Control / Decode Logic Add/Sub/Convert ing-point unit directly execute native PowerPC floating-point instructions to FCB2 Bus FCB2 achieve typically 6x acceleration over soft- Bus ware emulation. Interface Compare

The Xilinx PowerPC FPU complies Register with the IEEE-754 standard for single- File & Multiply and double-precision floating-point arith- Forwarding metic, with minor exceptions. Xilinx Divide offers variants optimized for 2:1 and 3:1 APU-to-CPU clock ratios, allowing the FPSCR PowerPC processor to operate at maxi- Square Root mum frequency. Autonomous instruction issuing hides arithmetic latency and Round decreases the cycles per instruction. What’s more, these optimized implemen- tations leverage the device’s high-per- formance DSP features to reduce operator Figure 2 – Virtex-5 FXT PowerPC 440 floating-point coprocessor architecture latency and trim logic count and power consumption. Xilinx supports the APU- gle-precision, with the exception of the Tying APU-FPU to PowerPC 440 FPU flow in the Xilinx Embedded PowerPC ISA graphics subset (fsel, fres and There are two ways to attach the APU- Development Kit (EDK). frsqrte). This means you can use the FPU FPU to the PowerPC 440 processor. The Figure 2 provides an overview of the FPU with a range of commercial compilers and first method is to use the Base System architecture. The APU-FPU comprises exe- operating systems, listed at www.xilinx.com/ Builder (BSB) wizard within the Xilinx cution units, register file, bus interface and ise/embedded/epartners/listing.htm. Platform Studio design tool. The second all the control logic necessary to manage the A single-precision variant of the APU- method is to simply attach the APU-FPU execution of floating-point instructions. FPU, which the Xilinx compiler supports, unit to an existing design. The FPU comes in two variants. The uses fewer resources. When this FPU is With the BSB wizard, you specify a tar- double-precision version executes all of the employed, double-precision operations are get board and desired processor, either floating-point instructions, including sin- performed using software emulation. PowerPC or MicroBlaze™. Then, via a

First Quarter 2009 Xcell Journal 33 XPERTS CORNER

On average, the soft FPU is six times faster than software emulation. The single-precision FPU is typically 13 percent faster than the double-precision version. series of check-boxes and drop-downs, you top screen). The wizard implements a dou- FPU IP from the IP Catalog to the System select the IP you want to include in the ble-precision FPU optimized to operate at Assembly View, and then configure the design. The BSB wizard makes it easy to one-third of the processor clock frequency. FPU. The bottom screen in Figure 3 shows quickly assemble and test a basic processor You can further customize the FPU for a the IP Catalog on the left and a newly added system. Connection of the APU-FPU is as higher clock rate or for single precision. FPU in the System Assembly View. By right- easy as checking the box indicating that If you don’t want to use the wizard, the clicking on the FPU and selecting Configure you wish to include an FPU (see Figure 3, other method is to simply drag the APU- IP, you can pick the desired precision (single or double) and choose whether you want the FPU optimized for low latency (one-third clock rate) or high speed (one-half clock rate). Finally, connect the FPU to the FCB and link the FPU/FCB clock to the appro- priate clock (typically one-half or one-third of the processor clock rate).

Have Floating Point Your Way Provided free of charge with Platform Studio, the Virtex-5 APU-FPU delivers customized floating-point support. You may choose to implement the single-pre- cision FPU using approximately 2,500 LUT-register pairs or the double-preci- sion FPU using roughly 4,900 LUT-regis- ter pairs. Alternatively, you can run your software application with floating-point emulation and no additional FPGA logic. You can choose your performance level up front: either select the appropriate FPU, or implement your design and deter- mine if software emulation is meeting your requirements. If it’s not, you can upgrade to a soft FPU. Clearly, if the performance you’re get- ting from software emulation is sufficient, then you don’t need an FPU. But for high- er performance, you can use an APU- FPU. Use the double-precision FPU if your application requires it or you are using a partner compiler. If your applica- tion needs only single-precision arithmetic and you are using the Xilinx-provided GNU compiler, then the single-precision FPU will reduce your logic requirements. Remember, if you choose the double-pre- Figure 3 – Adding an FPU to an existing PowerPC processor cision FPU, it will execute single-precision design via BSB wizard (top) and via System Assembly View arithmetic by rounding the result to pro- vide single-precision accuracy.

34 Xcell Journal First Quarter 2009 XPERTS CORNER

Typical Performance Gains In situations where floating-point is dom- When evaluating the need for a hard or a inant, you can boost the performance of the soft FPU, you should determine how soft FPU by optimizing the code to take full floating-point-intensive your code is. advantage of the FPU pipeline. The FIR filter Frequently, code contains a mix of float- benchmark is a good example of the potential ing-point, integer, memory and logic performance gains. The nonoptimized code operations. Thus, while benchmarks can was typical “textbook code”—though easy for be good indicators of potential perform- humans to read, it tends to execute ineffi- ance improvement, there is nothing better ciently with most FPUs. However, by imple- than running your own code. menting loop unrolling, maximizing the To get a feel for how well the APU-FPU retention of constants in the FPU registers performs with code that is floating-point- and interleaving other code between floating- intensive, Table 1 shows selected bench- point instructions, you can obtain significant mark data for a Virtex-5 FXT PowerPC performance improvements for your design. 440 processor operating at 400 MHz, with In this example, the optimized filter code was software emulation and with the processor 3.8 times faster than the nonoptimized code connected to the double-precision APU- and 30 times faster than software emulation. FPU running at 200 MHz. Overall, the Virtex-5 FXT with its The listed numbers are a subset of a PowerPC 440 processor offers numerous larger suite of benchmarks that Xilinx has options for embedded applications. You can run to evaluate the performance of the implement your design with or without an processor floating-point units. On the FPU, trading off the higher performance of average, the soft FPU is six times faster the FPU vs. software emulation, to tailor the than software emulation and the single- processing-power resources of the Virtex-5 precision FPU is typically 13 percent faster FXT to best suit your design requirements than the double-precision FPU. and thus “have it your way.”

Software Double-Precision FPU Speedup Benchmark* Units Emulation FPU Over Software

1k fast Fourier Iterations/sec 83 637 7.6 transform

FIR filter MFLOPS 6.5 51.4 7.9 (nonoptimized)

FIR filter MFLOPS 6.5 194 30 (optimized code)

Whetstone MFLOPS 6.2 34.8 5.6

Bytemark LU Iterations/sec 8.1 43.6 5.4 decomposition

Bytemark Iterations/sec 0.255 1.42 5.6 neural net

SPEC PID Iterations/sec 1.07 3.826 3.56

* EDK 10.1 SP2 GNU compiler with flags: -O3 -funroll-loops

Table 1 – Typical floating-point performance, 400-MHz processor and 200-MHz FPU

First Quarter 2009 Xcell Journal 35 XPLANATION:FPGA101 Optimizing Xilinx FPGAs for Power Designers can rely on a multitude of tools and techniques to tame their power budgets.

by Matt Klein Principal Engineer, Technical Marketing Xilinx, Inc. [email protected]

As IC processes have advanced over the last half dozen years from the 130-nanometer to the 90-nm and now the 65-nm node, at each step power management has grown in importance. It was at the 130-nm node that manufacturers started noticing that transistors leaked power, even in standby mode. At 90 nm, the operating voltage of ICs decreased, but leakage continued to rise, wasting a greater percentage of the device’s power. At 65 nm, both these trends continue. Indeed, leakage at the 65-nm node is so pronounced that many designers consider managing power as important as meeting performance specifications. Because FPGA vendors traditionally design for a broad range of applications and endow their devices with a plethora of high- speed transistors, FPGAs have not been the most power-conservative devices. Like other silicon designed in the most advanced processes, they use transistors that leak. However, designers can leverage an FPGA’s programmability and use related tools to accurately estimate power and then employ optimization techniques to make their FPGA designs and the PCBs that contain them much more power efficient.

36 Xcell Journal First Quarter 2009 XPLANATION:FPGA101

There are two primary types of power Leakage Power vs. Junction Temperature consumption in an FPGA: static and dynamic. Static power consumption is 5 caused by leaking transistors—those that leak even when they are not doing tasks in a 4 design. Dynamic power is the power the device consumes when it is running a task— 3 toggling nodes as a function of voltage, fre- quency and capacitance. It is important to 2 understand both power types and how each varies under different operating conditions 1 Normalized Leakage Power so that you can properly optimize them to meet your design’s power budget. 0 -40 -20 0 20 40 60 80 100 120 140

o Static and Dynamic Power and Variation Die Temperature, TJ ( C) Leakage current becomes fairly significant Figure 1 – Leakage power variation with die temperature for both ASICs and FPGAs at 90 nm and even more challenging at 65 nm. To obtain higher performance from the transistor, its Leakage Power vs. Core Voltage threshold voltage needs to be lowered, but 0.3 that also increases leakage. Xilinx has done many things to minimize leakage, but 0.2 nonetheless, the variation in static power from leakage is about two-to-one between 0.1 worst case and typical process. Leakage 0 power is also strongly influenced by core voltage (VCCINT), varying with the cube of -0.1 VCCINT. Static power rises by approximate- -0.2

ly 15 percent for only a 5 percent increase Normalized Leakage Power in V . Lastly, leakage is strongly influ- CCINT -0.3 enced by junction (or die) temperature. 0.90 0.95 1.00 1.05 1.10 Figures 1 and 2 illustrate the variation of static power from leakage with voltage VCCINT (V) and temperature. Other sources of static power consump- Figure 2 – Leakage power variation with core voltage (VCCINT) tion in the FPGA are DC currents from operating circuits, but for the most part Dynamic power is the power consumed number of switching transistors. Using they are notably process and temperature during switching events in the core or I/O smaller transistors trims routing lengths invariant. Examples include I/O DC cur- of an FPGA. To calculate dynamic power, between them, which reduces dynamic rents (such as I/O termination voltages on we must know the number of toggling power. So the 65-nm transistors in the terminated standards like HSTL, SSTL transistors and traces, capacitance and tog- Virtex-5 FPGA have lower gate capaci- and LVDS) as well as DC currents in cur- gling frequency. Transistors are used for tance and shorter interconnect traces, a rent driver I/O types like LVDS. Some logic and programmable interconnects combination that drops node capacitance FPGA analog blocks also are sources of between metal traces in the FPGA. The by about 15 to 20 percent. That in turn static power consumption, likewise process capacitance consists of transistor parasitic lowers the dynamic power. and temperature invariant. Among them capacitance and metal interconnect capaci- Voltage also has an effect on dynamic are the digital clock manager (DCM), a tance. The formula for dynamic power is: power. Moving from the 90-nm to the 65- clock control element in Xilinx® FPGAs; nm process node reduces dynamic power in PDYNAMIC=nCV2f, where n = the num- phase-locked loops (PLLs), which are avail- Virtex-5 FPGA designs by approximately 30 ber of toggling nodes, C = capacitance, able in the Xilinx Virtex®-5 FPGA; and percent, simply by decreasing V from V = voltage swing, f = toggle frequency. CCINT IODELAY, an element used to select pro- 1.2 volts to 1 V. That plus architectural grammable delays on input and output sig- Tighter logic packing (through internal enhancements allowed a net dynamic power nals in Xilinx FPGAs. FPGA architectural changes) reduces the reduction of 40 to 50 percent compared

First Quarter 2009 Xcell Journal 37 XPLANATION:FPGA101

with 90-nm technology. (Note: While ensures that the design will also meet the Change Dump (VCD) and Switching dynamic power varies with the square of thermal specifications for commercial- Activity Interchange Format (SAIF) files.

VCCINT, it is largely temperature and process grade or industrial-grade devices. The If you are using either the VCD or SAIF invariant for the core of the FPGA.) Block Summary, meanwhile, shows the formats, you need to create representative power from each block and the Power simulation vectors so the tool can record the FPGA Power Analysis Tools Summary displays the sum of the quiescent toggle rates of nodes in the system, which in Xilinx has two types of power analysis tools. and dynamic power. turn allows you to access the data later. In We designed the first, the XPower Each of the tabs in the XPE tool allows the absence of these simulation files, the Estimator (XPE) spreadsheet tool, for use you to enter utilization and toggle rates for user can have the XPower Analyzer tool per- before a designer employs implementation a given type of resource, such as clocks, form a vectorless simulation. This type of tools. Use the second tool, XPower logic, I/O, block RAM (BRAM), PLLs, simulation uses mathematical and statistical Analyzer, after you have implemented your DSP and so on. modeling to propagate starting toggle rates design to check how the changes you’ve Finally, XPE’s Graphs tab/sheet gives through the actual design logic. It then gen- made affect power consumption. you a graphical look at power by function, erates a result containing toggle rates of each The XPower Estimator gives you a process, voltage and temperature varia- node in the design. quick power estimation based on user descriptions of resource utilization in the FPGA, toggle rates, loading, etc. in a spreadsheet environment. This is the tool to use for your initial power evaluation, selection of power supplies and regulator, as well as any cooling solutions for the sys- tem (heat sinks, fans and the like). With this Microsoft Excel-based tool, system architects can make device-, design- and system-oriented power decisions. You simply enter the estimated design parame- ters, such as resource utilization, operating environment, and clock and toggle rates. XPE then calculates estimated power for a given design and reports total power and Figure 3 – Xilinx XPower Analyzer summary page maximum junction temperature as well as rail-based and block-based power. In setting up the estimation run, the tion. The Power by Function graphic, in With both the vector-based (from tool’s Process function is an important fea- particular, lists each feature and shows its VCD and SAIF) files and the vectorless ture. It allows you to see typical or worst- power consumption, allowing you to variety, XPower considers physical connec- case power consumption by various blocks. identify features that could best benefit tivity of the placed-and-routed design and Primarily, the static power from leakage on from optimization. exact resource usage. The tool cross-refer- the VCCINT supply is very process depend- The second Xilinx power analysis tool, ences the activity or toggle rate at each ent. Further, the Voltage Source Summary XPower Analyzer, provides an even more node with characterized capacitance data lets you quickly see the effect on power accurate view of the power breakdown for physical resources and individual consumption when voltages are varied. based on exact resource information it dynamic power consumption of each That is especially important to understand extracts during the implementation. You block at given toggle rates. The result, relative to VCCINT, which is one of the can supply the tool with test and simula- shown in Figure 3, represents total power power supplies representing all the core tion vectors, or perform vectorless power and maximum junction temperature, and logic. Both the process variation and volt- estimation. This tool uses characterized contains rail-based, block-based and hier- age variation selections in the XPE tool capacitance data for physical resources in archically based power reporting. ensure that you can determine proper the FPGA design. XPower gives you a detailed look at worst-case power supply sizing. XPower Analyzer is tied into the Xilinx where your design is consuming power and One other valuable feature of XPE is the Integrated Software Environment (ISE®), lets you do “what if” analysis to make Thermal Information/Summary, which and it accepts post-place-and-route informa- more-informed choices for which blocks allows you to specify heat sink, PCB prop- tion from several internal Xilinx file formats. could most benefit from optimizations, erties and temperature information. This It also accepts industry-standard Value ranging from simple ones through rearchi-

38 Xcell Journal First Quarter 2009 XPLANATION:FPGA101

Memory Write Memory Read

Driving: Not three-stated IOB IOB VCCO = 1.5V

Ω 2RVRP = 2Z0 = 100 HSTL_II_T_DCI HSTL_II_T_DCI

Three-statable Internal + Termination via T_DCI - VREF = 0.75V

2R = 2Z = 100Ω VREF = 0.75V VRN 0

Receiving: Three-stated

Best Signal Integrity, but Input DC Termination Power is Removed During a Memory Write

Figure 4 – FPGA pin shown during memory read and memory write using T_DCI tecting. Additionally, you can use XPower Another way to reduce static power is to document the actual power specs for a Static Power Reduction to carefully audit your design and elimi- Going to Smaller Device given design and pass that information to nate redundant DC consumers. Often, the board level. your design may employ blocks with ‡ 330k 220k -33% extraneous or hidden DCMs or PLLs. Reducing Power with FPGA Design Techniques 220k ‡ 110k -51% That can happen if you are redesigning While the process shrink to 65 nm gives blocks and forget to remove them, or if ‡ the Virtex-5 an inherent dynamic power 110k 85k -24% you’re building a next-generation product reduction, you can also employ new tools, 85k ‡ 50k -46% with a bit of legacy code. Extracting the tricks and techniques to further reduce DCM or PLL to the top level of your 50k ‡ 30k -33% the juice. design, allowing blocks to share the One way to attack power consumption resources, will further reduce the size of Figure 5 – Static power reduction is by selecting the right FPGA for your with part-size reduction the design and DC power. design and then leveraging its program- Making wise use of memory blocks will mability to further optimize your design’s the data going into a single instance of that also help reduce your FPGA design’s power consumption. Your design choices circuitry. This uses half the logic. dynamic power consumption, and in turn can affect both static and dynamic power Another way to reduce logic size is to its overall power consumption. Since consumption. use Xilinx’s unique partial-reconfiguration dynamic power is a function of capacitance Static power from leakage is proportion- feature to replace sections of circuitry with (area or length) and frequency, you should al to the amount of logic and, hence, the new sections when only one section is examine the way your design accesses block number of transistors used to construct a needed at a time. memory and identify areas where you can given FPGA. Thus, if you reduce the num- You can also move functions to nonlim- optimize capacitance and frequency. ber of FPGA resources you are using, you iting available resources—state machines to Xilinx FPGAs include two types of mem- can probably implement your design in a BRAM, for example, or counters to DSP48 ory arrays. BRAM, which we provide in 18k smaller device, which lowers your leakage (Xilinx multiply, add, DSP block), registers or 36k bit size, is optimized for large mem- power. The effect of moving to the next- to shift register logic and BRAM to lookup- ory blocks. LUTRAM is optimized for small smaller device is shown in Figure 5. table RAM (LUTRAM). You can also make granularity and is based on the lookup table To reduce the size of your design, you sure that you are not overconstraining the in the FPGA. LUTRAM comes in units of can employ several techniques, starting timing for your design, which can duplicate 64 bits in Xilinx Virtex-5 FPGAs. with time slicing of logic functions. That is, logic/registers. Of these two types, BRAM typically if two sets of circuitry perform a linear set Also, you should take full advantage of consumes more power. Its enable rate is of functions in the FPGA and are copies of the hard IP blocks (BRAM, DSP, FIFO, typically the largest source of BRAM power each other, one may use one set of that cir- Ethernet MAC, PCI Express) we’ve imple- consumption, while its toggle also con- cuitry, run it at twice the rate and multiplex mented in the FPGA architecture. tributes, but is secondary. Designers can

First Quarter 2009 Xcell Journal 39 XPLANATION:FPGA101

Xilinx FPGAs have some interesting features in the area of clock gating. For example, you can use the BUFGMUX clock buffer to have the FPGA shut off a global clock or dynamically select a slower one.

take a few actions to minimize BRAM The other half of Figure 6 shows Xilinx’s 85 percent power savings over implement- power consumption. For example, you can Block Memory Generator, which allows ing them in BRAM. That’s because we had enable the BRAM only during an active you to build arbitrarily sized memory a lot of wasted space in the BRAM and read or write cycle. You can also make sure arrays and optimize them for speed or inefficiently used sixteen 18-kbit blocks to to use LUTRAM instead of BRAM for power. Figure 7 shows the Xilinx Power get 16 very small (64 x 32-bit) memories. small memory blocks, reserving the BRAM Estimator for this case, comparing the When we look at the power compar- for larger ones. Additionally, you can try to power consumption between N blocks isons for the second case of 16 sets of 18k use BRAM for multiple large blocks. running with a given enable rate to N bit array, the XPE tool shows the opposite Another technique is to arrange the blocks with an enable rate of N/4. The for large memory arrays (Figure 9). memory arrays to minimize area and maxi- results show a 75 percent reduction in Implementing them in BRAM instead of in mize performance or to minimize power dynamic power. LUTRAM gives a 28 percent power sav- consumption. Figure 6 shows a 2k x 36-bit Xilinx tools will help you pick the right ings, attributed to the many small-granu- storage array optimized for speed and area. memory array for the job. Consider two larity objects that need to be turned on and We formed it by using four 2k x 9-bit sets of memory storage areas needed for a interconnected.

Conceptual Xilinx Block Memory Generator Tool Storage 2k x 36 Storage 2k x 36 4 Discrete CEs CE CE ADDR/CE ADDR BRAM 2kx9 ADDR Decode BRAM 512x36 DIN DOUT DIN DOUT ADDR ADDR CE CE BRAM 2kx9 BRAM 512x36 DIN DOUT DIN DOUT ADDR ADDR CE Almost 75% CE DOUT BRAM 2kx9 Reduction in BRAM 512x36 Mux DIN DOUT DIN DOUT ADDR Power ADDR CE CE BRAM 2kx9 BRAM 512x36 DIN DOUT DIN DOUT ADDR ADDR CE CE

Optimized for Optimized for Power Speed and Area (Only one block enabled at a time)

Figure 6 – Speed and area vs. power-optimized memory array (left) and Xilinx Block Memory Generator with power vs. area selections

blocks in parallel and always enabling all design. In one case we need 16 sets of 64 x Xilinx FPGAs also have some interesting four blocks when a new value is needed. 32-bit memory structures (total bits = 32k) features in the area of clock gating. For You can also employ another arrangement running at 300 MHz. In the other case we example, you can use the BUFGMUX clock of 2k x 36 bits by constructing four 512 x need 16 sets of 512 x 36-bit memory struc- buffer to have the FPGA shut off a global 36-bit blocks, but decoding the lower two tures (total bits = 294k). clock or dynamically select a slower clock. address bits to select which 512 x 36-bit If we look at power comparisons for the You can also use the BUFGCE clock buffer block is being accessed. In the latter case, 16 sets of 64 x 32-bit memory structures, to perform cycle-by-cycle clock gating in a accessing no more than one memory block what the XPE tool shows us (Figure 8) is manner very similar to the cycle-gating at a time reduces the power consumption that small memory arrays are best to imple- technique designers use for ASIC design. by 75 percent compared with the first case. ment in LUTRAM. Doing so delivers an You should consider using both features.

40 Xcell Journal First Quarter 2009 XPLANATION:FPGA101

vs. the ±5 percent specification. Keeping the core voltage at the 1-V nominal rather than the 1.05-V maximum setting can reduce static power from leakage by 15 per- cent and dynamic power by 10 percent. You can also reduce power consumption by keeping the junction temperature under control. The thermal properties of the Figure 7 – XPE results of power-optimized array FPGA, PCB, heat sink, ambient tempera- ture, airflow and FPGA power for a given design all influence what the junction tem- perature of the FPGA will be. One simple and somewhat obvious way to reduce the FPGA’s junction temperature is simply to use a more thermally efficient PCB or heat sink. Then, any changes the FPGA designer can make to reduce power will be a bonus. At elevated junction tem- perature, such as 100°C, a reduction of 15°C will reduce static power consumption Figure 8 – Small memory power estimation using block RAM or LUTRAM from leakage by 20 percent. Another way to reduce power consump- tion is to monitor the temperature and voltage in an FPGA. The Virtex-5 FPGA includes an analog block called System Monitor, which monitors external or inter- nal analog voltages and die temperature. System Monitor is wrapped around a 10- bit A/D converter, which can provide accu- rate and reliable results over a temperature range of ¯40°C to +125°C. The A/D con- Figure 9 – Large memory power estimation using LUTRAM vs. block RAM verter digitizes the output of on-chip sen- sors; you can use it to monitor up to 17 external analog inputs to check environ- They are especially helpful in designs where Reducing Power at the Board Level mental aspects of the system performance. certain blocks are not in use but contribute The PCB designer, mechanical engineer The block contains configurable to power consumption. In those cases, you and system architect have several things to thresholds and warning levels, and it can turn off a very large clock domain with consider at the board level to reduce power stores the results of its measurements in thousands of clock loads either on a clock- consumption in the FPGA. Both the core configurable registers that easily interface cycle-by-clock-cycle basis or for many, voltage and the junction temperature of the to user logic or a microprocessor. many clock cycles. FPGA have a strong influence on various Additionally, you can read the values via You can also rein in dynamic power components of power consumption. the JTAG port or even at power-up, prior consumption by reducing glitch energy. In Keeping control over VCCINT core volt- to the FPGA being configured. designs that contain combinatorial logic age is one way to reduce power consump- As power from the core voltage is driven and registers, occasionally various inputs to tion at the board level. Static power from down through transistor improvements, a block of combinatorial logic will arrive at leakage and dynamic power are both highly reduced capacitance and lower voltage, I/O slightly different times, generating short- dependent on the core voltage of the FPGA. power becomes another important consid- duration glitches that can propagate to Thus, one way to reduce leakage is to set eration that you need to be aware of to bal- other structures and waste power (see the core voltage close to nominal (1 volt) ance power and performance. You can also Figure 10). By using more pipelining rather than at the high end of the Virtex-5’s reduce overall power consumption by mak- between layers of logic, you can block a operating range (1.05 V = +5 percent). ing smart I/O choices. To make an glitch from propagating to other structures, With modern switching regulators you can informed choice, you have to consider each which will reduce dynamic power. achieve a voltage tolerance of ±1.5 percent FPGA design’s I/O interface requirements.

First Quarter 2009 Xcell Journal 41 XPLANATION:FPGA101

Intermediate logic propagation glitches cannot pass a flip-flop.

LUT

Glitches propagate and multiply the more logic levels they travel.

LUT LUT LUT LUT

Glitch energy reduced by adding more "roadblocks."

LUT LUT LUT LUT

Figure 10 – Glitch propagation and blocking with inserted flip-flops

For example, interfacing to a memory ance) in the Virtex-5 allows the user to Also, you can configure ISE’s power- (DDR2, QDR, RLDRAM, etc.), may dynamically remove the termination optimized synthesis engine to automatical- require termination inside the FPGA for sig- when a given I/O pad is used as an out- ly locate small arrays in the source code and nal integrity, but it will typically consume put. This is useful for the data bus or synthesize them into LUTRAM. At your more power and raise junction temperature. memory interface, and depending on the command, the engine will locate large Meanwhile, if you are interfacing an read vs. write ratio, it can rein in a fair arrays (in a size you specify) and synthesize FPGA to an ASIC/ASSP, you must select amount of power (see Figure 4). them into block RAM. If it finds a large the interface based on what the ASIC/ASSP When selecting I/O interfaces, wise counter, it can implement it in a DSP48 target specifies (LVDS, HSTL, etc.). And if choices that balance performance and block. It can also make smart choices when you are interfacing one FPGA to another power are important. You should use inter- replicating logic to ensure it implements FPGA, you may choose the interface based faces like LVDS when your design needs only the optimal amount. on the design’s performance needs and may absolute maximum performance and mini- More recently, Xilinx has introduced an be able to more readily find opportunities mum noise or the target device requires the optimized placer that will group functions for power optimization. I/O standard. together to minimize routing distance and, While both inputs and output consume Because termination generally con- hence, capacitance. A related set of tools power, the reference standards like LVDS, sumes a large amount of power, you need called PlanAhead™ allows you to take HSTL and SSTL consume the most. For to use it wisely and take into account the hierarchical groups of logic and physically outputs, the higher-drive-strength standards balance of power and performance. place them in rough areas inside the FPGA. consume the most power and the power Schemes that use external termination or This helps reduce capacitance and speed up varies linearly with output enable rates and no termination can greatly reduce power. routing time as well. toggle rates. However, LVDS is an exception As Xilinx continues to pioneer technol- in that it is based on a fixed current source, Yesterday, Today and Tomorrow ogy on the latest process nodes, we antici- which is independent of toggle rate. Ever since power management began to pate that dynamic and static power will For inputs, the referenced standards loom as a big issue, Xilinx has been dili- continue to pose challenges. But at the consume a lot of power because their gently building power-optimization tech- same time, we are diligently working not receive structure incorporates a differential nologies into tools throughout our ISE only to optimize our tools and methods for receiver and also because they include a suite. For example, in addition to launching power management, but also to make a selectable internal termination. Both con- XPE and XPower Analyzer, a few years ago concerted effort to nip power problems in sume DC power. we gave ISE a power-optimized router that the bud—in silicon. A feature called T_DCI (dynamically works based on known capacitance of rout- For more information on Xilinx power three-statable digitally controlled imped- ing resources inside the FPGA. management, visit www.xilinx.com/power.

42 Xcell Journal First Quarter 2009

XPERIMENT Computer Interface Makes 19th-Century Pipe Organ Rock How a group of engineers in Edinburgh, Scotland, used a Xilinx Spartan-3E Starter Kit to create a robotic organist.

by Gareth Edwards Design Manager Xilinx, Inc. [email protected]

It all started, as these things often do, with a conversation in the pub. “Do you know the pipe organ upstairs in the Forest Café?” “`Yes.” “We should build a robotic organist to play it.” “Of course we should!” And with that casual exchange, so began Project Waldflöte. My day job is as a design manager in the IP group at Xilinx Scotland, but in my spare time, I’m also part of an informal movement called “dorkbot,” which pro- motes grassroots collaborations between the engineering-and-scientific community and the artistic community; its tongue-in- cheek motto is “people doing strange things with electricity.” I belong to the Edinburgh-based chapter (named either “dorkbot alba” or “dorkbot Edinburgh,” depending on who you are talking to). Members have, in the past, built a pixel- mapped LED top hat, a self-propelled toothbrush, a persistence-of-vision poi jug- gling device, a barely electromagnetic screwdriver and various noise-generating boxes. Injuries are surprisingly rare.

44 Xcell Journal First Quarter 2009 XPERIMENT

The dorkbot Edinburgh group meets every other Tuesday in the Forest Café, a volunteer-run nonprofit gathering place near the University of Edinburgh. I had been attending dorkbot workshops in the café for a few weeks when one night, I ventured upstairs to repair some stage lighting and, surprisingly, found myself in a church, complete with pulpit, bal- cony and, most important, a 16-foot pipe organ (Figure 1). It turned out that the building the café occupies was once the meeting place of the Edinburgh Congregational Church— hence the organ. But this wasn’t the instru- ment’s original home. Initially installed in Dublin Castle in Ireland in the late 19th century by Gray and Davison, the famed London-based pipe organ builders, the organ, for reasons unknown, was moved to Edinburgh in 1900. There it has stayed, in varying states of repair, ever since. So after the conversation in the pub, we leaped into inaction. Over the course of seven months’ worth of Tuesday evenings, we pondered, poked, prodded and prototyped several ways of driving the organ’s keyboard. As for the name, we settled on “Project Waldflöte” for a stop on the organ called Waldflöte. It means “forest flute” in German, and since the organ is in the Forest Café, it seemed poetically apt.

Getting the Mechanics Right It became obvious very early on in the development that we could partition the problem into a mechanical part and an electronic part. Once we had a solution to the mechanical problem, we could proceed with the construction of both parts rela- LING PHOTO: MARTIN tively independently. Figure 1 - The Forest Café pipe organ in Edinburgh, Scotland, was built One of the main constraints was by the London firm of Gray and Davison in the late 19th century. financial—we didn’t have any real money to spend, just whatever our group of What we found was that the size of the gram of how it works in Figure 3. For the roughly a half-dozen core people was solenoids was ideal, but the travel of the core white keys, the top plywood lever is willing to throw into the pot. Scouring was a bit less than we would need for con- hinged at the back with a piece of duct the surplus market, we found some sole- sistent triggering of the organ’s white keys. tape, and is pulled down when the sole- noids that looked like they could fit the Although we could drive the black keys noid is energized. When the solenoid is bill. We could get a hundred of them for directly with the solenoid core, we would released, the organ key itself provides the about a pound each (roughly $1.50), so need some kind of lever for the white ones. upward force—there is no need for an we ordered half a dozen to play with on You can see the first prototype of the additional spring. For the black keys, a the organ itself. solenoid assembly in Figure 2 and a dia- small pin protruding from the bottom of

First Quarter 2009 Xcell Journal 45 XPERIMENT

noids had demonstrated that each one would require around a 350-milliamp drive from a 15-volt supply—way beyond what a CMOS output stage can deliver. To meet this requirement, we added a ULN2803A Darlington output stage to each shift regis- ter IC. This chip also has an integral protec- tion diode to shunt the high flyback voltage that a solenoid can generate when the cur- rent is turned off, which saves adding a dis- crete diode to the layout. We built a few prototype driver boards on stripboard, each capable of driving 16 solenoids.

Figure 2 - Prototype solenoid assembly The Controller Design Although there were a number of ways we could have implemented the controller (including on an Arduino platform or using some other microcontroller), we chose to do it on a Xilinx® Spartan®-3E Starter Kit, Solenoid Core since, in my day job for Xilinx, I had access Solenoid to the board and I knew the tool set inside out. In particular, I knew my way around

Plywood Lever the debug tools such as the Platform Studio Duct Tape Hinge Pin SDK and ChipScope™, and since this was likely to be a project in which we debugged live, that would save time down the line. We Solenoid Core used the Xilinx Embedded Development Kit to create the MicroBlaze™ subsystem that Solenoid Organ Keyboard would be the heart of the design (Figure 5). In addition to the MIDI interface and shift register interfaces, we chose to add a serial RS-232 console to help us debug Wooden Backplate the system. The RS-232 protocol might seem a bit old-school, but in this kind of project its presence can be invaluable. We Figure 3 - Mechanical layout also added some GPIO ports to drive LEDs and read switches and pushbut- the solenoid pushes down directly on the Designing the Electronics tons, to allow some interactivity without key with sufficient force and travel to At that point we sat down and roughed out having to use the console. sound the note. the architecture; that basic diagram can be Testing with this assembly showed that seen in Figure 4. On the left, MIDI mes- Writing the MicroBlaze Firmware the keys were indeed successfully pressed. sages arrive from the outside world (I’ll talk We had decided that the best input inter- It also demonstrated that I am incapable more on the MIDI protocol below). On face for the system would be a MIDI port. of dividing by seven—the spacing on the right is a shift register chain; the con- The Musical Instrument Digital Interface which I had placed the solenoids was not troller toggles the “clock” signal while driv- has been, since the 1980s, the standard even close to matching the actual spacing ing the appropriate “data” value to fill the way to connect digitally controlled musi- of the octaves of the keyboard, so we shift register chain, then asserts the “strobe” cal instruments such as synthesizers to could only test one key at a time. Still, we signal to deliver the contents of the chain other instruments or to a controlling com- had proven the principle was sound, so we in parallel onto the solenoid driver inputs. puter, and it was therefore obvious that we went ahead and ordered the parts for a We implemented the shift register/driver should use it too. MIDI would give us full-length key rig and then started on the chain with 74HC595 shift register ICs. maximum flexibility in the devices we design of the electronics. However, the experiments with the sole- could connect to the organ.

46 Xcell Journal First Quarter 2009 XPERIMENT

The MicroBlaze maintains an internal map of the state of the entire keyboard and which keys the system is pressing—that is, which solenoids the system is energizing.

MIDI is a unidirectional low-speed seri- (known as “omni” operation). And since NOTE ON message sets the corresponding al protocol, operating at 31250 baud. It a pipe organ keyboard is not velocity- map entry to “1” and a NOTE OFF mes- consists of a variety of message types, but sensitive, we can safely ignore all the sage sets the entry to “0.” for our purposes, the only important ones velocity bytes. After the map is updated, the solenoid are NOTE ON and NOTE OFF. Each The EDK UART IP core receives the registers are refreshed with the entire con- NOTE ON message consists of three bytes. MIDI message bytes and presents them to tents of the map; by bit-banging on the Byte 1 is 0x9n, where n is the channel the MicroBlaze processor one at a time GPIO port, the MicroBlaze processor writes number. through a FIFO. The MicroBlaze main- the map one bit at a time onto the data Byte 2 is the note number from 0 to tains an internal map of the state of the input to the shift register and toggles the 127, where middle C is No. 60. entire keyboard and which keys the system clock signal to move the shift register along Byte 3 is the velocity value from 0 to is currently pressing (that is, which sole- by one position. Once the entire shift regis- 127. noids the system is energizing). The ter has been updated with the map contents, NOTE OFF is very similar, except the firmware uses a static lookup table to figure the MicroBlaze then writes a rising edge first byte is 0x8n. out which solenoid is associated with the onto the STROBE line, which copies the In our implementation, we decided to musical note in the event and uses this as values in the shift registers into the output listen on all channels simultaneously an index into the map; the arrival of a registers, energizing or de-energizing the correct solenoids to make beautiful music. We implemented the firmware as a soft- ware state machine; for an embedded appli- data cation without a real-time operating system, clock MIDI Input Waldflöte this can provide some of the capabilities of Controller strobe a multithreaded application but without the overhead of an actual thread implemen- tation. A static array of structs describes, for each current state, what action the system should take for a particular event: const midi_state_table_entry_t MIDI_STATE_TABLE[] = Figure 4 - Electronics architecture { {INHIBITED,PANIC, MidiSM_Panic,INHIBITED}, {ANY_STATE,PANIC, Program ROM/RAM MidiSM_Panic,INIT},

LMB {ANY_STATE,INHIBIT, MidiSM_DoNothing,INHIBITED}, {ANY_STATE,OTHER_STATUS_RECEIVED MicroBlaze ,MidiSM_ClearMessage,INIT}, Processor {INIT,NOTE_ON_OR_OFF_RECEIVED, LEDs and MIDI Input xps_uart16550 xps_gpio MidiSM_StoreStatusByte,NOTE_ Buttons ON_OR_OFF}, clk RS-232 PLBv46 data Shift Register xps_uart16550 xps_gpio {INIT,DATA_RECEIVED, Console strobe Control MidiSM_DoNothing,INIT},

Figure 5 - MicroBlaze subsystem {NOTE_ON_OR_OFF,NOTE_ON_ OR_OFF_RECEIVED,MidiSM_

First Quarter 2009 Xcell Journal 47 XPERIMENT

We’ve successfully played some very complex and fast pieces of music, from classical to rock; there doesn’t seem to be any serious limitation in the speed of the solenoids and drivers.

StoreStatusByte,NOTE_ON_OR_OFF}, The event loop supplies an event as an argu- {NOTE_ON_OR_OFF,DATA_RECEIVED, || (pTable->state == ment to this function and, depending on MidiSM_StoreNoteNumber,NOTE_ON_OR ANY_STATE))) the current state and the event, some action _OFF_NUMBER}, { is taken and the state of the system changed. The types of events include the {NOTE_ON_OR_OFF_NUMBER, (*pTable- arrival of bytes on the MIDI interface, the NOTE_ON_OR_OFF_RECEIVED,MidiSM_St >transition_function)((v arrival of characters on the console and oreStatusByte, NOTE_ON_OR_OFF}, oid *)pInstance); presses of the panic button. All experienced {NOTE_ON_OR_OFF_NUMBER, pInstance->current_state MIDI hackers know a panic button is a DATA_RECEIVED,MidiSM = pTable->next_state; must-have feature to save your ears and _NoteOnOrOffComplete, your power supplies—it unconditionally return XST_ NOTE_ON_OR_OFF}, turns off all the solenoids and returns the SUCCESS; {INHIBITED,ENABLE, system to a known-safe state. MidiSM_DoNothing,INIT}, } Waldflöte in Action {LAST_STATE, LAST_EVENT, 0, pTable++; Figure 6 is a photograph of the organ with LAST_STATE}, } while (pTable->state != the contraption in place. Hiding the keys at LAST_STATE); }; the bottom are the solenoids’ wooden back- The first entry in the struct is the cur- planes—each plank has 30 or more sole- rent state; the second entry, the event that // Aaargh, something bad happened noids mounted on it, along with some has arrived; the third entry, the state transi- - should never get here recycled can capacitors to provide the ener- tion function needed to handle the event; gy reservoirs for the solenoids. We C- XASSERT_NONVOID_ALWAYS(); and the fourth entry, the next state. clamped the whole driver assembly to the The code that implements the business } organ. At the top you can see the Spartan- end of the state machine looks something 3E Starter Kit board and the interface strip- like this:

XStatus MidiSM_ DoStateTransition (midi_state_machine_t *pInstance, u8 event) { const midi_state_table_ entry_t *pTable = pInstance- >pStateTable;

// Search for a match in the state table do { if ((event == pTable- >received_event) && ((pInstance- PHOTO: MARTIN LING PHOTO: MARTIN >current_state == pTable- Figure 6 - The finished controller in place; the robotic organist plays everything from rhapsodies to rock. >state)

48 Xcell Journal First Quarter 2009 XPERIMENT

board on its right; we wired these to the driv- er assemblies using recycled CAT5 cable. Get Published It’s hard to describe the operation of the organ in print, so I encourage you to follow the Internet link at the end of the article and take a look at the videos we’ve uploaded. The main thing you will notice is the click-clack- ing sound of the solenoids as the robot plays, be it “Moonlight Sonata” or “Jump”—this is the sound of the solenoid cores bottoming out in the coil, not of the actual levers hit- ting off the key. However, when you are down in the main hall of the café rather than standing up on the balcony where the organ is situated, the solenoid noise is much less noticeable. Only the majestic sound of the pipes dominates. We’ve successfully played some very com- plex and fast pieces of music, from classical to rock, using the system; there doesn’t seem to be any serious limitation in the speed of the solenoids and drivers. The solenoid power Would you like to write supply generally draws less than 4 A from the 15-V supply even in the most demanding for Xcell Publications? pieces. And even though we are overdriving the solenoids slightly, there is no perceptible heating in the solenoid coils. All in all, we are It's easier than you think! very, very pleased with the system and proud to have been involved in its creation. Submit an article draft for our Web-based or printed So what’s next for Waldflöte? Well, we’ve publications and we will assign an editor and a informally invited some musicians to create compositions for the new instrument (par- graphic artist to work with you to make ticularly, composers who are excited by the your work look as good as possible. idea of a 53-fingered performer that never tires), and we are thinking about staging a For more information on this exciting and highly recital. Another possibility is to mechanize rewarding program, please contact: the operation of the stops of the organ, so we can have changes in volume and timbre dur- ing a performance under software control. Mike Santarini We’re also pondering ways of driving the Publisher, Xcell Publications bass pedals of the organ, which will bring the longest and lowest pipes into play. Finally, [email protected] and possibly the most achievable option, we are considering putting a service onto the Internet that would allow the public to upload their own MIDI files to the system and have the audio from the organ streamed back to them in real time. But then again, we might just go back down the pub. See all the new publications on our website. To see and hear Waldflöte in action, visit http://dorkbot.noodlefactory.co.uk/ www.xilinx.com/xcell wiki/WaldFlote.

First Quarter 2009 Xcell Journal 49 ASK FAE-X Hidden in Plain View Xilinx offers a wealth of resources to simplify the prep work for your next design.

by Barrie Timpe Field Applications Engineer, Southern Ohio Valley Xilinx, Inc.

Designers are under the gun, pressured to do more in less time. All too often, customers and design management want it all, and they want it now. At the same time, silicon is growing more complex as each new device family piles on features. The FPGA is no longer a relatively simple array of logic blocks, a peripheral I/O ring and a central- ized clock tree. As densities have grown, hardened silicon resources (block memories, DSP slices, advanced I/O blocks, multigigabit transceivers, PPC405/440 and the like) have added power- ful features and performance capabilities to programmable logic devices.

50 Xcell Journal First Quarter 2009 ASK FAE-X

All of these factors put an increased bur- den on designers. Luckily, Xilinx users have The user guides and schematics can be a access to a wealth of resources that can great resource for peripheral interfacing, reduce overall design effort and risk. It’s all too easy to overlook some of the more power supply reference or other features you important ones during these high-pressure development cycles, starting with the most want to include in an end-product design. basic: documentation.

User Guides Abounding With previous generations of products (for may be updated during the life cycle of the Application notes can be useful in pro- example, the Virtex®-II), the datasheet was product. Meanwhile, Xilinx Change Notices viding information on how something is often the definitive introduction to the sil- detail updates to device availability, package possible—often you know it is, but you icon capabilities, outlining detailed func- or test locations, package materials, revised may not know where to get started or tional and performance characteristics specifications and the like. simply don’t have a handle on some of the associated with the various device elements. considerations. Even if the application Due to the increasing complexity of newer Application Notes note does not fit your exact requirement, families, however, the datasheets are now Xilinx Application Notes outline issues of a it can often serve as a baseline or may largely reserved for the associated DC and more specific slant in terms of actual design spark derivative ideas. The idea is to switching characteristics. It’s up to user considerations. More than 1,000 of these incorporate application notes and extend guides to fill the role of introducing the fea- documents now exist, with new ones com- them, as appropriate, for a user’s specific tures and functions of the devices. ing out regularly and the existing docu- design requirements. A dozen or more separate user guides ments getting occasional updates. Check address specific topics of interest in the cur- the various categories and the associated Development Boards rent flagship Virtex®-5 FPGA, including titles to see if there’s one you could leverage Development boards and kits are another configuration, system monitor, trimode on a current or future design. Many appli- crucial resource. They are most often used Ethernet MAC, gigabit transceivers, PCI cation notes also have accompanying in the early stages of a design cycle with a Express hard block, packaging/pinout, design files, such as RTL source or con- new device family, but can be appropriate printed-circuit board design and so on. The straint files. Here are a few examples of the at other times as well. These boards and more cost-optimized Spartan®-3 products most popular application notes. kits can function as a platform to evaluate have two user guides, one focused on the project feasibility. You can use them as an silicon capabilities and the other for device • XAPP058, XAPP424 and XAPP502 initial development platform during your configuration. A third, optional guide describe techniques for updating own hardware design or to gain familiarity explains the DSP48A slice unique to devices in the field. with introductory designs and tool flows. Spartan-3A DSP devices. • XAPP802 is an overview of approaches However, they can also be helpful even User guides are often the best place to for memory interfacing, by device and when you have no apparent need for an start to familiarize yourself with a new memory technology. evaluation board. For example, some of the device family. Don’t overlook the book- “getting started” guides help with common mark index and search features; they will • XAPP1052 and XAPP859 offer tasks like downloading a design to the help you quickly navigate to specific areas DMA/initiator example designs for the FPGA/PROM and generating or loading of interest. Virtex-5 PCI Express Block Plus core. designs onto a System ACE™ CF card. Deviations from the datasheet may give • XAPP485 and XAPP486 cover Spartan- The user guides and schematics can rise to errata. These typically apply to engi- 3E LVDS serializer/deserializer designs. also be a great resource for peripheral neering-sample revisions, but may also affect Versions are also included with the interfacing, power supply reference or production-step device revisions. As a Spartan-3A Starter Kit reference designs. other features you want to include in an designer, it is important to become familiar end-product design. They come with ref- • XAPP866, XAPP856, XAPP873, with the details and associated considerations erence designs that can bring designers up XAPP855 and XAPP860 describe outlined in the respective errata, including to speed on IP usage, configuration flexi- Virtex-5 LVDS interfacing. devices affected and how to identify them. bility and so on. Errata are posted on the Web site’s Customer • XAPP514 is a collection of audio/video A detailed review of reference designs can Notices page; users can also receive e-mail interfacing application notes. Contact turn up a wealth of practical design nuggets. updates via MySupport Alerts. It is impor- your FAE for updated versions for the For example, the EDK TFT controller, tant to stay on top of the errata, since they Virtex-5. mutex and mailbox cores first launched on

First Quarter 2009 Xcell Journal 51 ASK FAE-X

Virtex-4 reference designs (ML405 and Only a Mouse Click Away cessing document or design archive if it ML410) before they made their way into The Xilinx User Community Forums are a isn’t necessary. And if someone’s response the embedded development kit. Web-based arena for the exchange of ideas has been helpful, say “thank you.” Designers can lean on other user and questions relating to Xilinx silicon, More important, post your own ideas guides, characterization reports and white tools and designs. Since its launch in July and share your successes. Post follow-up papers as well. For those with rusty skills, 2007, the site has drawn more than 6,500 resolutions to aid others who face similar the Programmable Logic Quick Start registered users and likely many more issues. The idea is for the forum to become Guide (ug500.pdf ) can be a good place to “lurking” as readers. a place to share concepts with the commu- get started. Xilinx design tools (for exam- This is a powerful and easy-to-use nity—not merely to serve as a help line. ple, ISE®, EDK, AccelDSP™ and resource, organized by functional areas into Eric Steven Raymond, the open-source System Generator) also include a number “boards.” It is not a place to advertise prod- advocate and spokesman known to many of example designs or tutorials with the ucts or services, however. Nor is it a replace- simply as “ESR,” has his own useful guide- installation directory. While there is no ment for closed-loop technical support (like lines in his article “How to Ask Questions substitute for practical experience, the Xilinx WebCases, a formal mechanism for the Smart Way.” While his suggestions were existing documentation provides a frame- receiving Xilinx technical support from cus- initially intended for usenet and IRC dis- work for novice users to understand the tomer application engineers). Although a cussions, this content is appropriate to tool flow and options. number of Xilinx employees read these Web-based forums as well. Warning: Some We also recommended reviewing the forums and actively participate (often after audiences may view some of the content as release notes and answer records for the work), there is no guarantee you will get an politically incorrect. Nor does Xilinx as a tools and IP. A search of the Xilinx Web answer. But by following a number of corporation officially endorse it. site with appropriate search criteria (for guidelines, you can increase the site’s utility example, “MIG v2.3” or “MIG release to you and the wider community. Help from Freeware notes” for our Memory Interface The first tip is to review the datasheets ADEPT—the acronym stands for Advanced Generator tool) will yield related answer and user guides before posting. Designers Debugging/Editing/Planning Tool—is a records of known issues, questions and could find the answers to many of the ques- freeware application that enjoys a cultlike recommendations. A number of useful tions posted by simply reading the appro- following with many users and Xilinx tech- “index” answer records can provide sum- priate documentation. In addition, search nical employees because of its specific capa- mary information (search for “master before posting. It’s likely that others have bilities and ease of use. Xilinx itself does not answer”). Of course, nothing stands still, addressed many of the same topics before. officially distribute or support ADEPT. especially technology. So, even after Also, provide the appropriate information Developed by Xilinx strategic applica- becoming familiar with the documenta- so someone can help you. A rundown of tions engineer Jim Wu, ADEPT gets fre- tion necessary for your design, these what have you tried, what tools and IP you quent updates to address new features and resources will continue to change as new are using and the specific error message is to support updated versions of ISE as they documents are introduced and existing much better than “it doesn’t work.” In are released. Users must install the ISE ones are updated. But you won’t need to other words, keep your question specific; Foundation or WebPACK™, since comprehensively crawl through the public vague ones are hard to answer. ADEPT accesses those databases to pro- Xilinx site to stay current. It’s generally not realistic to expect a vide its functionality. Fortunately, a convenient mechanism specific example with your exact configu- ADEPT supports Virtex-4, Virtex-5 exists to notify users of changes. ration on a particular board. Many times and Spartan-3 Generation devices. It pro- MySupport Alerts act almost like a Web- an answer may guide you in the right vides important insight into aspects such based “diff” utility in bringing important direction, but it will still require some as footprint, clocking resources, transceiv- changes to your notice. Once you have an work to address your specific scenario. Be er location and attributes, and clock skew. account on the Xilinx Web site, log in and realistic. Don’t expect someone to do It also offers useful insight into your then update your profile to include the your professional homework (or universi- design, including logic utilization (by alerts you would like to receive. The site ty assignment), or to download your hierarchical decomposition) and control offers filtering by type of alert, device fam- design, review a lot of code and find an set usage. ADEPT can help generate ily, board/kit and IP. As updates are pub- obscure bug. Post in the appropriate Orcad and Viewlogic (now Mentor) for- lished, you will get an e-mail with a list of forum and do not cross-post the same mat schematic symbols, which is especial- changes, URLs and brief overviews of question to multiple forums. Start a new ly useful with our larger packages each change. You can drill down into the thread if you are switching subjects rather approaching 2,000 pins. After download- respective documents, each with its own than “hijacking” an existing thread. Make ing the zip archive, simply install this 32- revision history. Xilinx batches these alerts your post succinct and easy to access; bit Windows application by extracting the and typically issues them weekly. nobody wants to download a word-pro- zip archive and setting its environment

52 Xcell Journal First Quarter 2009 ASK FAE-X

A Hidden Gem: PicoBlaze IP

The PicoBlaze 8-bit sequencer is a little bit like the bicycle stashed in your garage behind the sedan and the minivan. The capabilities and targeted application are different, but this venerable technology is an important resource to consider depend- ing on where you are going and what baggage you are taking along. PicoBlaze is the marketing umbrella for a family of 8-bit programmable sequencers developed by Ken Chapman of Xilinx UK. The latest version, kcpsm3, is targeted for Spartan-3 Generation, Virtex-4 and (unofficially) Virtex-5 FPGAs. There are other ver- sions of PicoBlaze (Virtex/Spartan-II, Virtex-II and CoolRunner-II) as well. While there are slight differences in features and tar- geted device family support among them, all leverage a common lineage and share many similarities and design concepts. In truth, PicoBlaze isn’t often the focus of much marketing promotion. It has been around for a while, and its capabilities are often overshadowed by the performance and flexibility of its bigger siblings, the MicroBlaze and the PowerPCs, Xilinx’s 32-bit embedded 1Kx18 64-Byte PORT_ID processing offerings. Yet, even for designs Instruction

(PC) Scratchpad RAM Stack PROM 31x10 deploying the MicroBlaze or PowerPC, OUT_PORT CALL/RETURN PicoBlaze can be a natural complement for spe- Counter Program Flags cific areas of the design decomposition. Constants Instruction Z Zero Decoder Technically, PicoBlaze is a full 8-bit micro- C Carry INTERRUPT controller. However, that nomenclature usually 16 Byte-Wide Registers Operand 1 IE Enable s0 s1 s2 s3 sets the expectation that a graphical integrated ALU s4 s5 s6 s7 development environment and C compiler are IN_PORT s8 s9 sA sB sC sD sE sF available. While third parties do supply such Operand 2 tools for PicoBlaze, they are not part of the ref- erence design release; nor does Xilinx support Figure 1 – PicoBlaze Embedded Microcontroller Block Diagram them. “Programmable sequencer” better posi- tions how most engineers see the PicoBlaze. The PicoBlaze/kcpsm3 consists of an ALU, program counter stack (for nested subroutines), sixteen 8-bit registers, 64 bytes of scratchpad memory, program counter and control, and interrupt support circuitry. The control of the sequencer is captured in assembly language, and the assembler (a 32-bit Windows executable appropriately named kcpsm3.exe) will produce an RTL netlist of the sequencer instantiation and block RAM initialization. The design files include an HDL example, netlist (for schematic flows), UART and FIFO, testbench and JTAG boot loader. The documentation also includes a detailed user man- ual with an example design for an alarm clock. The Spartan-3E and Spartan-3A Starter Kit reference designs contain many more. System Generator also supports PicoBlaze. As an embedded sequencer, the device is best suited for applications in PicoBlaze Microcontroller which the control flexibility quickly outgrows expression in traditional HDL.

IN_PORT[7:0] OUT_PORT[7:0] PicoBlaze is a programmable state machine; its useful features include a fixed INTERRUPT PORT_ID[7:0] resource footprint (one 18-kbyte BRAM and 96 Spartan-3 slices) and a deter- RESET READ_STROBE ministic two clock cycles of execution time per instruction. PicoBlaze imple- WRITE_STROBE ments the instructions you would expect to find in an 8-bit microcontroller,

CLK INTERRUPT_ACK including bit operations (compare, add, subtract, rotate/shift, test and/or/xor,) data movement and program execution (call, jump, return). PicoBlaze includes an I/O space of 256 inputs and 256 outputs with an 8- Figure 2 – PicoBlaze Interface Connections bit address, 8-bit data input and 8-bit data bus. Since PicoBlaze is embedded in the FPGA, this I/O port is a key to interfacing with other logic in the FPGA and peripheral components. One common application is interfacing to user I/O, including LEDs (PWM), 2x16-character LCDs, switch inputs, rotary encoders or keyboard scanners. It is also well-suited for low-speed communications interfacing such as a serial port, SPI and I2C. Other interfaced examples include the Maxim/Dallas DS2432 secure EEPROM. It is also possible to interface PicoBlaze to a fast simplex link as an intelligent coprocessor for a MicroBlaze soft processor. You could use PicoBlaze for basic calcula- tions, such as an ADSR (attack, decay, sustain, release) envelope for an audio synthesizer. In short, the PicoBlaze sequencer is a nice complement to the capabilities of traditional HDL design for FPGAs. The PicoBlaze user guide (ug129.pdf ) provides guidance on the trade-offs with traditional FPGA design capture; it can be a help- ful resource when decomposing a larger design and deciding on the applicable capture technique of particular functions or logical blocks. The PicoBlaze download pages (www.xilinx.com/picoblaze) are a great place to start. — Barrie Timpe

First Quarter 2009 Xcell Journal 53 ASK FAE-X

Clearly, the key to successful design is not just to understand the tasks, challenges and possible risks, but to be able to draw appropriately from a variety of resources. variable to point to its location. A typical voltage supplies with higher associated cur- information on tools and devices. A num- usage would be as follows: rent, larger number of supply rails for ber of other solutions are available for a fee. peripheral components, increased energy For example, Xilinx offers one- and two- • Launch the application. costs and continued trends to higher-effi- day classes via an authorized training net- • Select the appropriate family, device ciency distribution systems. work. They provide more-detailed formal and package from the top subwindow. Few FPGA and board designers are experts training on technology, devices and tools. • Load the device (this may take a few in power supply systems. But trends like gate The company’s Titanium Dedicated pro- minutes, so be patient). leakage in deep-submicron geometries now gram, meanwhile, offers access to an appli- make it critical to understand the contribu- cations engineer for a period of a week or • Enable the appropriate display options tions of both static and dynamic power. Static more (either on-site or remote) to address via the View menu (this may also take power is the cover charge—the amount the specific areas of concern in your design. a while). device will dissipate even if your design is QuickStart is another popular option. It • Optionally, load an existing ucf or doing nothing. It is a function of device fami- bundles two days of on-site training with ncd file. ly, device size and junction temperature. three days of on-site consulting from a Dynamic power is design-dependent, based Titanium applications engineer. • Sort by appropriate field (pin number, on resources used, resource configuration and Beyond that, Xilinx Design Services is bank, type, etc.). toggle rates. Understanding both types is available for larger projects that require You can also export many of the views important if you hope to gauge a realistic esti- longer periods of engagement on design to a spreadsheet for design documentation mate of your power budget. development and implementation. Please and exchanges with other engineers. XPower is a Xilinx utility that can estimate contact your distributor or sales representa- Although there are other ways to per- power by extracting resource utilization from an tive for additional information on training, form this task, I often use ADEPT to inves- existing design. However, it is most useful when Titantium, QuickStart or XDS. tigate the resource utilization by module. I the design nears its final form. The XPower Clearly, the key to successful design is not also find it helpful when correlating the Estimator spreadsheets (distributed per family) just to understand the tasks, challenges and number and locations of required are a more appropriate resource earlier in the possible risks, but to be able to draw appro- IDELAYCTRLs on designs that utilize design cycle. You enter your estimated resource priately from a variety of resources. Xilinx Virtex-4 IDELAYs or Virtex-5 IODELAYs, utilization (logic, clocking, memory, I/O, etc.) continues to invest in providing an entire such as DDR2 memory interfaces. and see the resulting power. The XPower ecosystem of solutions beyond devices and For an introduction to this tool, Jim Estimator also automates thermal calculations corresponding tools. One or more of them Wu’s Web site (http://home.comcast.net/ based on the package, calculated power dissipa- will no doubt supply the help you need to ~jimwu88/tools/adept/) supplies the user tion, ambient temperature and airflow. It’s easy make your next design a success. guide, link to the download files, release to quickly play “what-if” scenarios with various notes history and descriptions of other pos- implementation approaches. Barrie Timpe is a Xilinx FAE supporting cus- sible functions (for example, how to port a The Power Solutions page (www. tomers in the Southern Ohio Valley. Prior to PCI pinout). However, please do not con- xilinx.com/power) includes links to these joining Xilinx in 2005, he worked as an FAE tact Xilinx technical support for questions resources, supporting documentation, supporting Xilinx customers with Memec on ADEPT. Webcasts and application notes. It also has Insight. His FPGA and hardware design links to popular third-party power solu- experience is in the area of military wired- The Power Issue tions providers, many of which have target- communication systems and I/O interfacing. Not long ago, power was a concern only ed literature and reference designs to aid in Timpe spends his spare time with his wife and for certain classes of applications, such as the power supply design for the board. four young children. He also actively partici- handheld or battery-powered designs. pates in the Xilinx User Community Forums. Now it is nearly always at the top of the Other Learning Options If you need assistance, check in with your list of design criteria, even with (or above) For those who need to go further, Xilinx local FAE, contact Xilinx tech support at issues like performance, functionality and Educational Services offers free online train- (800) 255-7778 or visit www.xilinx.com/ cost. Exacerbating factors include lower- ing and Webcasts to provide introductory support/clearexpress/websupport.htm.

54 Xcell Journal First Quarter 2009 ARE YOU XPERIENCED? Xilinx, Partners Offer Free Technical Seminar on High-Performance System Design

As part of a comprehensive customer training The seminar will also feature presenta- Other presentation topics include “DSP program, Xilinx regularly offers free technical tions on embedded processing in FPGAs, Design in FPGAs Using System Generator” seminars. For the months of February and covering both hardware and software. and “FPGA System Design with Built-in March, the seminars will focus on high-per- Based on region, the hardware presenta- Processor and DSP Blocks (Video Example).” formance system design. tion will focus on PowerPC® 440 or Optional presentations, depending on loca- Applications experts will offer tips and MicroBlaze® processor design, showing tion, will cover preverified telecom IP for OC- techniques for serial-link design, eight-lane how to create an embedded processing 3 to 100G line cards, video connectivity for PCI Express v2.0, 100G Ethernet MACs application in FPGAs. Attendees will broadcast applications, Internet Protocol and the use of DSP and embedded process- learn about the tool flow (EDK) and will video, memory interfaces, FPGA power- ing in FPGAs. The seminars will include implement an example design. The soft- reduction techniques and design techniques live demos from Xilinx and partners Agilent ware presentation, meanwhile, will cover for achieving high performance. Technologies, Mentor Graphics, Linear Linux and real-time operating systems. To register, visit www.xilinx.com/seminars. Technology, Northwest Logic and others. Each seminar will include multiple pre- sentations. “Programmable Solutions for City Delivery Date Day of the week Today and Tomorrow’s Design Challenges” North America Boston 2/3/2009 Tuesday is an overview of Xilinx’s current and next- Saddlebrook, NJ 2/10/2009 Tuesday generation products, and the key features, Philadelphia 2/11/2009 Wednesday platforms and applications they enable. “Building Robust Serial Links for 3G and Baltimore 2/17/2009 Tuesday 6.5G Applications” is an introduction to serial Montreal 2/24/2009 Tuesday transceivers in Virtex® FPGAs, GTP/GTX, Ottawa 2/25/2009 Wednesday supported protocols, char reports and boards, Toronto 2/26/2009 Thursday along with a design checklist. Attendees will Calgary, Canada 2/27/2009 Friday hear about serial-link analysis and learn do’s Dayton, Ohio 3/3/2009 Tuesday and don’ts in link debug for various use cases (chip-to-chip, backplanes and so on). Schaumburg, Ill. 3/4/2009 Wednesday In “Designing 40G and 100G Applications Milwaukee 3/5/2009 Thursday in FPGAs,” attendees will learn about Europe and Israel Cambridge, U.K. 2/25/2009 Wednesday emerging communications applications at Tel Aviv 3/10/2009 Tuesday 40- and 100-Gbps rates and how to success- Haifa 3/11/2009 Wednesday fully implement them in FPGAs. Use cases will cover 100G Ethernet MAC implemen- Oslo 3/17/2009 Tuesday tation in a single FPGA, Interlaken and SFI- Paris 3/19/2009 Thursday 5 bridging, and RXAUI. Milan 3/24/2009 Tuesday The presentation “Implement PCI Stuttgart 3/25/2009 Wednesday Express (Gen 1 and Gen 2) in FPGAs” cov- Hannover 3/26/2009 Thursday ers the key concepts of PCI Express protocols and interoperability. Attendees will learn China Beijing 2/23/2009 Monday how to create a PCI Express design using Xian 2/25/2009 Wednesday Xilinx IP tools and see a demo for PCI Chengdu 2/27/2009 Friday Express generation 2. Presenters will review Wuhan 3/3/2009 Tuesday multiple example applications and use mod- Nanjing 3/5/2009 Thursday els with multiple end points.

First Quarter 2009 Xcell Journal 55 XAMPLES... Application Notes If you want to do a bit more reading about how our FPGAs lend themselves to a broad number of applications, we recommend these notes.

XAPP1127: XPS LL Tri-Mode Ethernet MAC Performance XAPP1103: Simulation of the IEEE 802.16 CTC Encoder and Decoder with Monta Vista Linux www.xilinx.com/support/documentation/application_notes/xapp1103.pdf www.xilinx.com/support/documentation/application_notes/xapp1127.pdf In this application note, Michael Francis and Raied Mazahreh describe In this new application note, Brian Hill describes how to use the stan- how to simulate the LogiCORE™ IP IEEE 802.16e CTC Encoder dard network performance suite Netperf to measure XPS LL Tri-Mode and IEEE 802.16e CTC Decoder together using either Mentor Ethernet MAC (TEMAC) performance with MontaVista Linux 4.0. Graphics’ ModelSim simulator or hardware-in-the-loop based on The note discusses several of the tunable values that can affect Ethernet Xilinx System Generator software. Simulation using ModelSim gives performance while introducing Netperf as a means of testing and users a better understanding of the operation of the IP and the design’s measuring network performance. Netperf 2.4.4 source is included with various control signals, while hardware-in-the-loop provides users with this application note, along with prebuilt Linux and Cygwin images. bit-error rate (BER) performance measurements. The authors describe their use of a noisy-channel model to test the encoder/decoder combination under a variety of additive white XAPP1126: Reference System: Designing an EDK Custom Peripheral Gaussian noise (AWGN) conditions using the AWGN module. They with a LocalLink Interface use a combination of Simulink® blocks and MATLAB® scripts to www.xilinx.com/support/documentation/application_notes/xapp1126.pdf control the hardware, and collect and display the results. This appli- cation note does not deal with MATLAB simulation; it’s covered in James Lucero describes the design of an EDK core with a LocalLink more detail in ug497, “LogiCORE IP 802.16e CTC Decoder v3.0 interface in this new application note. The note, which includes a ref- Bit-Accurate MATLAB Model User Guide” (www.xilinx.com/ erence design, shows you how to use the Create IP Wizard application member/ieee802_16e_ctc_dec_eval/ctc_decoder_v3_0_bitacc_ to generate the PLB v4.6 slave interface and then add the LocalLink cmodel_ug497.pdf ). interface to the core. To create simple user logic with the LocalLink interface, you can take the payload data transmitted from the Hard DMA (HDMA) and loop it back to the payload on the receive chan- XAPP1113: Designing Efficient Digital Up- and Down-Converters nel. In the note, the XPS LL EXAMPLE core is connected to the for Narrowband Systems Master PLB (MPLB) for the slave registers and the LocalLink inter- www.xilinx.com/support/documentation/application_notes/xapp1113.pdf face is connected to the first HDMA block on the PowerPC® 440 processor block. The note shows how you can then use the This new application note by Stephen Creaney and Igor ChipScope™ Analyzer to verify the core’s LocalLink loopback func- Kostarnov demonstrates how you can create efficient digital tionality. A standalone software application is included to initiate a upconverter and downconverter (DUC/DDC) implementations single DMA transaction. The receive channel is set up to accept and by leveraging Xilinx’s DSP tools and IP portfolio. Digital upcon- verify that the Rx and Tx payload data are the same. Lucero uses the verters and digital downconverters are key components of RF sys- Xilinx ML507 Rev. A board for this reference system. tems in communications, sensing and imaging. While previous

56 Xcell Journal First Quarter 2009 XAMPLES...

application notes, including XAPP1018 (www.xilinx.com/ In addition, Xilinx included performance cores for PLB v.4.6 support/documentation/application_notes/xapp1018.pdf ), have master interfaces and HDMA in the system to measure system provided examples of DUC and DDC implementation in wide- performance before and after optimizations. The note includes band communications systems, this document concentrates on two standalone software applications to demonstrate DMA narrowband systems and the building-block components avail- transactions for XPS Central DMA and HDMA to DDR2. able to meet their particular requirements. During these DMA transactions, you can measure the core’s per- The note provides you with step-by-step guidance on how to formance. For XPS Central DMA, the performance is measured perform simulation of narrowband DUC/DDC systems in MAT- between PLB_PaValid and when the XPS Central DMA provides LAB, how to map functions onto building blocks and IP cores for an interrupt (DMA transactions are complete). For HDMA, the Xilinx FPGAs with Xilinx’s System Generator software and how Tx and Rx channels are measured between the first frame’s start to verify the implementation against the simulation model. The of frame and the last frame’s end of frame note includes two examples: a multicarrier GSM system (both Further, this application note describes how to obtain latency DUC and DDC) and a multichannel MRI receiver (DDC only). numbers across the crossbar and obtain performance numbers The examples provide a guide and template for implementation before optimizing the system/software applications. It also details of your own application systems. The authors also outline the how to optimize the system/software application for performance advantages and limitations of the approaches and methods they and obtain performance numbers with the optimized system. cite, so that you can better tailor these schemes for your particu- The reference system uses the Xilinx ML507 Rev. A board. lar system design.

XAPP1043: Measuring Treck TCP/IP Performance Using the XPS XAPP468: Fail-safe MultiBoot Reference Design LocalLink TEMAC in an Embedded Processor System www.xilinx.com/support/documentation/application_notes/xapp468.pdf www.xilinx.com/support/documentation/application_notes/xapp1043.pdf Jim Wesselkamper in this application note describes a reference In this application note, Doug Gibbs illustrates how to measure design that adds fail-safe mechanisms to the multiboot capabilities the network performance of the XPS Local-Link Tri-Mode of the Extended Spartan®-3A family of FPGAs (Spartan-3A, Ethernet MAC (TEMAC) in an embedded-processor system Spartan-3AN and Spartan-3A DSP platforms). The reference running the Treck TCP/IP stack. The measurements use a design configures specific FPGA logic via an initial bitstream that PPC405 processor-based system with the ML405 Evaluation determines which application (by alternate bitstreams) to load. The Platform or a MicroBlaze-based system with the ML505 decision is based on the bitstream revision, the number of prior Evaluation Platform. Test software and methods are provided so configuration attempts and the integrity of the alternate bitstreams. designers can perform similar tests using different boards. This Users implement the algorithms that test bitstream integrity and application note outlines how to acquire the hardware designs which bitstream image to load using a PicoBlaze™ controller. and set up the two boards. It details the setup of the Treck Additional independent modules manage communication with the TCP/IP software along with how to integrate it with a stand- Internal Configuration Access Port (ICAP) and the Serial alone test application. Test software that can be run on a Peripheral Interface (SPI) flash device. Windows or Linux PC is also discussed.

XAPP1121: Reference System: Optimizing Performance XAPP1106 (Updated): Using and Creating Flash Files for the MicroBlaze in PowerPC 440 Processor Systems Development Kit—Spartan-3A DSP 1800A Starter Platform www.xilinx.com/support/documentation/application_notes/xapp1121.pdf www.xilinx.com/support/documentation/application_notes/xapp1106.pdf In this reference system, James Lucero demonstrates how to This application note describes the files for programming the seri- improve system performance in the PowerPC 440 processor block al flash memory and the StrataFlash memory for the MicroBlaze on the Virtex®-5 FXT FPGA. The reference system note describes Development Kit—Spartan-3A DSP 1800A Starter Platform. how to connect the XPS Central DMA master interface to the The reference system executes the HelloWorld software applica- Processor Local Bus (PLB) v4.6 on either PLB Slave 0 (SPLB0) or tion to run from the serial flash using SPI configuration mode PLB Slave 1 (SPLB1). You can then modify the parameters for the and the BlueCat Linux image from the StrataFlash in BPI config- XPS Central DMA for the DMA engine to allow for parallel reads uration mode. The development kit includes all the files you need and writes through the crossbar. For HDMA, the paper discusses to run both. This application note describes how to use the pro- setting threshold values for interrupts and changing addresses in vided files, as well as how to create new files and run the reference main memory for buffer descriptors and transmit/receive buffers. system successfully from the flash memories. Xilinx recently A simple loopback core is connected to the LocalLink interface updated this app note to be in line with the ISE® software 10.1.3 on one HDMA. update and added JFFS2 root file system information.

First Quarter 2009 Xcell Journal 57 PROFILES OF XCELLENCE Mark Moshayedi Drives STEC Enterprise Storage to Greener Pastures The company and its president and COO seem to have timed the enterprise solid-state disk drive market to perfection.

by Mike Santarini Publisher, Xcell Journal Xilinx, Inc. [email protected]

Spotting the next big opportunity in elec- tronics and moving quickly to capitalize on it is a skill not many possess. But one who does is Mark Moshayedi, the 46-year-old president and chief operating officer of solid-state disk drive (SSD) OEM STEC Inc. The Santa Ana, Calif., company, which he founded with his two older brothers, has reinvented itself a few times to capitalize on new opportunities. Today, STEC is a public company offering a faster, lower-power replacement for mechanical hard-disk drives at a time when power con- sumption is the hot topic in electronics and the green movement is on the precipice of becoming the next big economic driver for the U.S. economy. STEC’s flagship prod- uct, Zeus-IOPS, is built around Xilinx® Virtex® FPGA devices. “SSDs offer a 97 percent power savings over mechanical hard-disk drives—one SSD can do the work of 30 HDs,” said Moshayedi. “Increasingly, customers aren’t as concerned with the unit cost as much as the total cost of the product over its life- time.” After you buy the equipment, “you must have the space to run it, you have to spend money to power it and then you have to spend more money to cool it,” Moshayedi explained. “Today, the purchase price of SSDs is higher than that of mechanical disk drives, but the long-term cost to the user is much less. And SSDs are more reliable.”

58 Xcell Journal First Quarter 2009 PROFILES OF XCELLENCE

STEC Inc. is a growing company seem- ingly equipped with the right technology STEC uses Xilinx Virtex FPGA at the right time. But Moshayedi didn’t propel it there out of luck. Indeed, he devices in its drives to monitor worked hard and overcame many obstacles to get to where he is today, wearing many bit integrity across 16 channels hats along the way. In 1982, Moshayedi received his BSEE of flash in the system. from the University of California, Irvine, and went to work as a design engineer for Texas Instruments. Two and a half years ny started manufacturing its own memory name change to STEC Inc. It was a bit of later, he joined as a field applications modules. Then, in 1994, the company gamble, but seemingly one that has paid off. engineer and took night classes to get an began developing CompactFlash cards. In late 2006, the company began work MBA from Pepperdine. He leveraged that “At the time, the only other supplier was on its Zeus-IOPS product line for the know-how to go into sales, becoming an SanDisk, so we started developing our own enterprise market. These devices boast vast area sales manager for Sony Semiconductor. flash controller and working with a few performance improvements over mechani- In the late 1980s, some of his friends start- other controller manufacturers,” Moshayedi cal disk drives. STEC announced the Zeus- ed a rep firm called Westar Rep Co., which said. “One in particular was Cirrus Logic, IOPS first running in a product line with became the exclusive representative for which was doing solid-state controllers. customer EMC in January 2008. Samsung Semiconductor in Southern And we ended up in 1995 buying that “The product caught on like wildfire and California. Moshayedi joined them. [controller] group from Cirrus Logic and immediately turned on all the other cus- “In those days, Samsung was an up-and- naming it Lexar Media.” tomers in the world to look at using solid- coming company and we were mainly sell- But in 1995-96, the memory market state disk drives,” Moshayedi said. “HP, Dell, ing DRAM for them,” said Moshayedi. “A went into free-fall, and soon Simple Sun...several companies in the storage area lot of our customers back then were Technology found that it had to spin Lexar immediately engaged with us to design-in DRAM memory module manufacturers out. “We sold it to a group of investors and our product into their product lines.” like Kingston, Southland Micro and it became its own entity,” said Moshayedi. The offering is indeed impressive. “Our Viking Components. Those companies did The company then signed a deal with Zeus-IOPS product line replaces 30 a tremendous business.” Hitachi, to become its exclusive manufactur- drives,” Moshayedi said. “It not only gets Moshayedi said it was around that time er of solid-state disk drives in North the same performance you get from 30 when his two brothers, Mike and America. “We would design the products in HDDs from an I/O perspective, but you Manouch, both of whom are civil engi- conjunction with Hitachi and sell them to all also cut down latencies to less than a mil- neers, sought his advice for starting a new the customers,” he said. But in 2000, Hitachi lisecond, from 10 to 15 milliseconds. At the company. “I told them they should look opted out of the SSD business, and Simple same time, it cuts down power consump- into building and selling memory mod- Technology began developing its own con- tion by 97 percent.” ules,” he recalled. His siblings liked the trollers. Simple Technology also made its One of the main concerns of flash-based idea. “I remember we got our start by buy- public offering in September of that year. SSDs is that after many tens of thousands ing a thousand 1-Gbyte memory modules By 2002, the company had developed a of writes and reads, the memory bit-error for $60 apiece and sold them later the same full line of FPGA- and ASIC-based SSDs rates increase. STEC uses Xilinx Virtex day for $64 apiece. They said, ‘hey, this and began selling them to the market. In FPGA devices in its drives to, among other isn’t a bad gig,’” Moshayedi recalled. late 2004, Moshayedi said the company things, monitor bit integrity across 16 That exercise kicked off Simple saw an opportunity to sell even bigger channels of flash in the system. Technology Inc. as a business. After 18 drives to the enterprise market. Thus began “The Zeus-IOPS lineup uses the Virtex-4 months, the fledgling company started to development work on its Zeus line, tailored FX60, which has the integrated PowerPC break even, so Moshayedi left Westar Rep for enterprise servers. processor,” said Moshayedi. “It provides to head up engineering, product develop- By 2005, Simple Technology owned 4 throughput of roughly 300 Mbps for reads ment and manufacturing. Manouch percent of the $250 million SSD market. and writes. But more importantly, it gives us Moshayedi became the company’s CEO, Also around this time, the company sold off 50,000 I/Os per second out of a single drive. the same job he now holds at STEC, while its retail memory module, flash card and The Fibre Channel interface is done by the brother Mike served as its president and portable hard-drive business—along with Xilinx chip; then, because it has the PowerPC has since retired. the name Simple Technology—to Fabrik, on board, it does all the ware leveling and all In 1992, Mark Moshayedi established to focus exclusively on its SSD OEM busi- the flash block management—everything is an engineering department and the compa- ness. With the sale came the corporate done in that controller.”

First Quarter 2009 Xcell Journal 59 PROFILES OF XCELLENCE

Moshayedi said that STEC isn’t resting on STEC is forecasting that by 2013, stantly looking for ways to reduce operating its laurels. The company, which employs 800 “nobody is going to be manufacturing expenditures. The Zeus-IOPS’ 97 percent people, followed up on the success of its 15k RPM drives any longer, and every- power reduction means the equipment Zeus-IOPS introduction with the release of a thing is going to be solid state,” doesn’t generate nearly as much heat, deliv- 3-Gbyte serial-attached SCSI version in Moshayedi said. “The enterprise disk ering a savings in cooling costs. November. It expects to roll out a 6-Gbyte drive market today is about a 30 mil- Ten years ago, $50 billion worth of SAS model in 2009. “Those will also be first- lion-units-per-year market. If you enterprise equipment sold every year, to-market products that use Xilinx chips,” assume one-to-one replacement, without consuming $1 billion worth of energy. said Moshayedi. any growth in the market—and that one Today, the same amount of enterprise In Moshayedi’s view, the market for of our drives can do the work of these 30 equipment consumes $5 billion worth of enterprise-class SSDs is just now starting to disk drives—we have an opportunity to energy. “This growth in energy cost could take form. sell 1 million new units per year.” That be addressed by using SSDs in these sys- “I believe this is a disruptive technology prognosis, he added, is “assuming there tems,” Moshayedi said. that will drive a huge change in the disk drive is no growth in the market: we believe it Time will tell if the market lives up to, industry,” he said. “It used to be that enter- will indeed grow.” or even exceeds, its potential. Certainly, prise-level systems would use one tier of high- In fact, Moshayedi’s estimates seem you can be sure that if it does, Moshayedi speed disk drives to get I/O and then use a somewhat conservative considering the will have had a hand in its success. If it second tier of slower drives for capacity. Now fact that many companies are trying to doesn’t, he’ll move on to the next hot we see that second tier completely going away.” improve their “green” profiles and are con- technology.

60 Xcell Journal First Quarter 2009 Xilinx Global Services offers comprehensive service solutions that help you design more efficiently and cost- effectively. Seize the competitive advantage today with services designed to lower your costs, reduce speed grades, decrease your time-to-market, and close the gap between consumable and consumed technology.

Xilinx Services Portfolio Education Services – High quality, comprehensive courses to give you a competitive edge

Titanium Dedicated Engineering – An engineer dedicated to your team on your site or remote, provides the design help you need to get to market faster

QuickStart! – Combines an on-site, customized education course with a dedicated engineer to coach you through your design and increase your design productivity

Xilinx Productivity Advantage Program – A bundled solution offering a one-stop opportunity to purchase all of the tools you need to get it right the first time

To learn more about Xilinx Global Services, we invite you to visit: www.xilinx.com/services

www.xilinx.com

©2009 Xilinx Inc. All rights reserved. XILINX, the Xilinx logo, are other designated brands included herein are trademarks of Xilinx, Inc. All other trademarks are the property of their respective owners. TOOLS OF XCELLENCE ToolsTools ofof XcellenceXcellence News and the latest products from Xilinx partners

by Mike Santarini Publisher, Xcell Journal Xilinx, Inc. [email protected]

62 Xcell Journal First Quarter 2009 TOOLS OF XCELLENCE

Omiino Pioneers ‘Virtual ASSP’ Business Model Running on Xilinx FPGAs

A startup based in Belfast, Northern Ireland, is on a mission to The OM1312 is a 2.5-Gbit/second, high- and low-order replace ASSPs in the carrier optical-transport market with FPGA- Ethernet-over-Sonet/SDH (EOS) packet mapper targeted to the based designs offered through a pioneering business model. With Xilinx Virtex-5 FPGA family. T The OM1312 transports 2.5 Gbps the “Virtual ASSP,” the company provides its full chip designs to of channelized Ethernet over high-order or low-order Sonet/SDH customers as completed FPGA netlists ready for implementation in using standards compliant to GFP-F, VCAT and LCAS. Its SPI3 various Xilinx® Spartan®-3 and Virtex®-5 FPGAs as a cost-effective packet interface supports 128 channels. alternative to application-specific standard products. The OM1411 is a 10-Gbps high-order EOS packet mapper Founded in 2007 by a team drawn from tier-one equipment likewise targeted to the Xilinx Virtex-5 FPGA family. It transports and ASSP manufacturers, the 10-person Omiino is currently 10 Gbps of channelized Ethernet over Sonet/SDH using standards delivering full Ethernet-over-Sonet/SDH devices. Its 2009 road compliant to GFP-F, VCAT and LCAS. The SPI4.2 packet inter- map calls for the launch of Sonet/Ethernet-over-OTN framers at face supports 192 channels. The Sonet/SDH interface is a 10-Gbps 10G, 40G and beyond. 1+1 protected TFI5 link for high-reliability carrier-grade deploy- “FPGAs are clearly powerful enough to replace ASSPs on ments. The OM1411 handles both real and virtual concatenation boards,” said Gary Hamilton, the company’s CEO. “There isn’t a (CCAT and VCAT). tier-one or tier-two equipment vendor in the carrier space who isn’t The company’s third offering, the OM1412, is a 10-Gbps high- doing it or considering doing it, mainly because both they and the and low-order EOS packet mapper targeted to the Xilinx Virtex-5 ASSP vendors struggle to justify new 65-nanometer ASIC and ASSP FPGA. It transports 10 Gbps of channelized Ethernet over high- designs, given the considerable costs. FPGAs also promise faster order or low-order Sonet/SDH using standards compliant with design cycles and the ability to field-upgrade, which is invaluable GFP-F, VCAT and LCAS. Up to 25 percent of the 10G bandwidth when the standards are evolving so rapidly.” can be low-order traffic. Its SPI4.2 packet interface supports 192 But especially in a tough economy, “the real challenge for the channels, making it possible to handle more individual customers equipment manufacturers in transitioning from ASSP- to FPGA- on a single device, further reducing operating expenditures. The based designs is to stop R&D budgets from expanding and to lower Sonet/SDH interface is 10-Gbps 1+1 protected TFI5. product costs,” Hamilton said. “The lack of off-the-shelf third-party Like the OM1411, the OM1412 handles both CCAT and ASSP replacements in FPGA means that equipment vendors are VCAT. It also supports the Link Capacity Adjustment Scheme to having to grow internal capability and are therefore increasing their allow a hitless increase or decrease in group bandwidth as customer R&D budgets.” That’s why “the resulting FPGA designs are gener- need dictates. ally bespoke solutions”—highly integrated, but expensive and Along with its designs, Omiino offers the software drivers as power-hungry designs, compared with the ASSPs they are replacing, standard, but uniquely, Omiino has developed on-chip debug he said. “This is not a scalable business model.” tools. OmniTest and OmniSpy allow the user to view and capture To address these issues, Omiino has developed a suite of “ultra- the internal state information of the device or to set up the device compact” intellectual property (IP) which it will integrate and veri- and test it independently of external software. fy to deliver fully functional, high-performance Virtual ASSPs “The OmniTest and OmniSpy were born from our frustra- targeted at very cost-effective Xilinx FPGAs. Hamilton said Omiino tion, as product developers, with time taken to integrate silicon designs consume 60 percent to 80 percent fewer FPGA resources into the product, the dependency on software for board bring-up than conventional ASSP-targeted IP. and the manually intensive and time-consuming root-cause- “We can give customers a lower product cost and do it in the analysis support models,” said Hamilton. “The OmniTest and familiar ASSP business model, which is why we call our offerings OmniSpy tools can dramatically accelerate [development] and Virtual ASSPs,” he said. reduce costs during integration, verification and field support. Implementing the designs cost-effectively in FPGAs rather We have great feedback from all of our customer engagements as than ASSPs or even structured ASICs reduces the delivery time they see firsthand the benefits, simplicity and power of the tools.” and preserves the reconfigurability, Hamilton said. Retaining Despite the fact that Omiino has a number of offerings that are reconfigurability is important, especially when a standard isn’t off the shelf and ready to go, the company recognizes that cus- quite solidified or changes frequently—a common occurrence in tomers have individual requirements, said Hamilton. Omiino will telecom. “We say retain the programmability, allowing customers work with them to customize designs for specific applications. In to go to market early, and then do field upgrades if they are need- addition, Omiino can supply subsystems of ultracompact IP blocks ed,” said Hamilton. such as GFP, VCAT and Sonet/SDH framers with speeds from 155 Today, the company has three main offerings: the OM1312, Mbps to 10 Gbps. OM1411 and OM1412 Virtual ASSPs. It describes each of these For further information, visit www.omiino.com/en/home or devices as highly integrated and ultracompact. contact [email protected].

First Quarter 2009 Xcell Journal 63 TOOLS OF XCELLENCE

GateRocket Drive Speeds Verification of Virtex-5 FPGAs 4x to 10x

With a mission to help “ASIC refugees” speed the verification because the RocketDrive itself contains an FPGA from the largest and debug of ever-more-complex system-on-chip (SoC) designs platforms in the Virtex-5 family. One model packs a Virtex-5 they’re implementing on FPGAs, EDA tool startup GateRocket LX330T, another the Virtex-5 SX240T and a third, the Virtex-5 has released a Xilinx Virtex-5 version of its RocketDrive “native FX200T device. Orecchio said this native-device approach, among device” verification and debug environment. Using it, engi- other things, eliminates the nagging question that haunts users of neers can test the behavior of a design running on actual FPGA traditional logic emulation systems: is the cause of a given bug silicon while still in their HDL verification and debug envi- really a problem with the design under test or is it due to a flaw in ronment of choice. the emulation system running the design?

GateRocket’s RocketDrive lets you test the behavior of your design running on actual FPGA silicon while still in your HDL verification and debug environment of choice.

Traditionally, hardware-assisted verification and emulation RocketDrive claims to speed verification 4x to 10x over soft- systems have targeted ASIC designs, but Dave Orecchio, ware-based simulation. The larger the design, the greater the GateRocket’s president and CEO, said that today’s FPGAs have speedup, Orecchio said. In addition, he went on, because the advanced to the point where designers can now truly implement drive is device-native, it can help users quickly find the root caus- SoC designs on a single FPGA. “It used to be that for many appli- es of functional errors in their HDL code. Designers can select cations you had no choice but to design an ASIC,” Orecchio said. which blocks they want to run on the drive and those they want “Today designers have a choice, and increasingly they are design- to run on the simulator. ing with FPGAs.” Customers can also use the drive to validate a model of an IP Orecchio described RocketDrive as “the first hardware-assisted block against the IP actually running on silicon, or to validate verification and debug environment specifically targeting these that they have selected an optimal pin placement during the early complex FPGA design projects—we’re here to help ASIC refugees.” stages of design. RocketDrive also helps users figure out if they GateRocket’s offering has a hardware and a software compo- have any tool chain faults. nent. The hardware is a drive that fits into any 5.25-inch drive “You can use the RocketDrive at every phase of the design, slot in a PC chassis, while the RocketVision software coordinates from RTL creation all the way through onboard test,” said verification and debug, connecting the drive to Xilinx ISE® and Orecchio. third-party vendor tools of the user’s choice. The RocketDrive sells for $45,000, while the RocketVision soft- Customers select which model of drive they need for their ware is priced at $9,500. You can learn more about GateRocket and designs based on which Virtex-5 device they plan to use. That is even sign up for a free Webinar at www.gaterocket.com.

64 Xcell Journal First Quarter 2009 TOOLS OF XCELLENCE

GE Fanuc Aims Versatile Virtex-5 FPGA Board at A&D COTS Market

Building on the success of its board offerings built around Virtex-4 and preconfigured PCI Express end points, along with an embed- FPGAs, GE Fanuc has rolled out a Virtex-5 FPGA-based mezzanine ded-processor example system. Mitchell noted that the company card targeting DSP-centric applications developers in the aerospace- developed all of these designs with the Xilinx ISE tool suite, and and-defense COTS markets. sticks to Xilinx’s recommendations whenever possible so that The company designed its Intelligent Platforms XMCV5 FPGA when customers integrate the cores into their designs, there are no mezzanine board in a way that allows customers to use the Virtex-5 compatibility surprises. device that best suits their application needs, said Ramon Mitchell, GE Fanuc has also extended its internally developed tool, Axis, principal engineer of the MNG Group at GE Fanuc. to now support its FPGA systems. Axis allows customers at a high The board supports Virtex-5 devices based around the 1136 level of abstraction to assign algorithms to processors running on the package. As such, Mitchell said, it is possible to fit members of the company’s processor-based cards, said Mitchell. “We’ve now added Xilinx LXT, SXT and FXT families into the XMCV5 system. the hooks in to allow this to work with our FPGA nodes as well,” Mitchell noted that GE Fanuc also offers processor-based boards he said. “So someone that wants to do a system that uses both and switches, which essentially allows A&D COTS customers to mix processors and FPGAs can use Axis to partition the design and hook and match boards to create systems that best fit their requirements. up the various interfaces.” “During the development of an FPGA-based system, the system GE Fanuc also provides a BIST tool called VBIT. “It’s basical- may require a logic-based FPGA, then a DSP-based one at some ly a preconfigured image which will test all the interfaces on the other point in the system and maybe one that will benefit from the board,” said Mitchell. “Customers run that at power-up to check processors and MGTs [multigigabit transceivers] in the FXT,” said the board is working properly and then program their own image Mitchell. “With these almost bite-size FPGA boards, you can slot in once they know it is good. We do all this to make life easier for these in throughout your system. Flexibility is the key, because this our customers.” is a commercial off-the-shelf product and it isn’t designed to a spe- Because the system is targeting harsh environments, GE Fanuc cific requirement. Every pin in the FPGA is connected to some- has rigorously tested the board running at -45°C to 90°C and has thing; we didn’t want to miss an opportunity.” done extensive signal integrity testing on the system as well. “It’s In addition to a Virtex-5 FPGA, the XMCV5 FPGA mezzanine extremely robust,” said Mitchell. board includes two banks of QDR SRAM, DDR2 SDRAM and SPI For more information, visit www.gefanucembedded.com/ flash. For back I/O it has VITA 42.3 PCI Express Gen2 (to 5 GHz) products/2260. or a VITA 42.2 Serial RapidIO build option, along with RocketI/O™ to 6.5 GHz via GTX ports, LVDS, GPIO, Gigabit Ethernet and serial ports. Front I/O includes Gigabit Ethernet, seri- al, LVDS and GPIO. The company has also created its own design care package called VWRAP. “It is based on work we’ve done to develop all the required interface IP inside the FPGA to make our customers’ lives easier when they build projects,” said Mitchell. Among other things, VWRAP includes DDR and DDR2 con- trollers, LVDS drivers

GE Fanuc’s Intelligent Platforms XMCV5 FPGA mezzanine board is built around the Xilinx Virtex-5 series FPGA.

First Quarter 2009 Xcell Journal 65 XPECTATIONS Xilinx’s Support Network: Our Success Is Your Success A stellar support staff and long-term partnerships with customers make all the difference.

by Bruce Kleinman Vice President, Technical Sales Xilinx, Inc. [email protected]

Over my many years here at Xilinx, I’ve come to understand the true value that comes from open communication and real partnership with our customers. A growing user base typically presents Throughout our history, Xilinx has made a challenges to companies, but Xilinx’s support sizable investment in establishing an exten- staff is committed to solving problems users sive customer support network and staffing it may encounter in the quest to build innova- with not just great application engineers and tions and get them to market quickly. technical support staff, but people who share Indeed, in addition to our FAE staff, we also a passion for solving problems. have our worldwide technical support team, Indeed, I have deep respect for the many which is the front line in helping you fix hardworking people in our technical sales problems or directing you to helpful docu- and support network, and awe for their mentation that answers your questions. If the abilities. One would think that solving problem is particularly tough, they’ll have problems from a variety of customers Of course, it would be great if all design the best experts assist you. worldwide, who vary greatly in experience techniques, tools, methodologies and silicon Some customers may be fresh out of col- levels and are targeting a large swath of features were crystal-clear to all customers on lege, others may be ASIC refugees who are applications, would be a bit overwhelming. the day the features were released. For some trying to ramp up on FPGA technology. But I’m proud to say that Xilinx’s staff customers that may be the case, but as Xilinx Still others may be embedded-software (more than 400 people in technical sales remains committed to fielding the most designers or DSP programmers who want and support) is full of unique individuals advanced FPGA platforms available, design to create super-advanced embedded systems who each get charged when a new challenge is certainly becoming more complex and or use the ultra-advanced DSP capabilities arises. Many of our FAEs have done the job FPGAs are adapting themselves to an ever- of an FPGA. For those folks, we offer train- for 10 to 15 years and all remain passionate growing number of applications. Today, we ing courses throughout the year and have an about helping users solve today’s problems. at Xilinx have technical sales and support extensive network of training partners. In addition, they play a vital role in personnel who specialize not just in FPGAs, And for companies that need additional delivering constructive feedback to our tool, but in targeting those FPGAs in application specific expertise to get a project done, we IP and FPGA silicon development groups spaces such as wired and wireless communi- have a technical services group packed with about what features next-generation FPGA cations, aerospace and defense, automotive, experienced engineers who boast broad platforms should include. Sometimes that industrial, scientific and medical. applications knowledge. feedback is not eagerly received; but more What’s more, over the last few years we’ve It’s no small task to design with an times than not, it keeps our tool, IP and sil- seen Xilinx’s user base expand beyond the advanced FPGA, but it shouldn’t be a for- icon developers keenly in tune with reality traditional hardware engineer. Each year we midable task either. Xilinx is keenly aware so they can quickly fix problems that arise see a growing number of embedded-software that our success is directly linked to yours. and integrate those lessons learned into engineers, DSP algorithm developers and The company, and our extensive support next-generation products. system architects using our devices. network, is committed to your success.

66 Xcell Journal First Quarter 2009

Design Simply

Design Completely

Design Today

The Virtex®-5 Family: The Ultimate System Integration Platform

s )NCREASE9OUR3YSTEM0ERFORMANCE s ,OWER9OUR3YSTEM#OST s $ESIGNWITH%ASE

4HE6IRTEX FAMILYDELIVERSUNPARALLELEDSYSTEMINTEGRATIONCAPABILITIESFORDRIVINGYOUR MOSTMISSION CRITICAL HIGH PERFORMANCEAPPLICATIONS7ITHACHOICEOFFIVE PLATFORMSOPTIMIZEDFORLOGIC SERIALCONNECTIVITY $30ANDEMBEDDEDPROCESSINGWITH HARDENED0OWER0#® PROCESSORBLOCKS THE6IRTEX FAMILYDELIVERSANUNPRECEDENTED COMBINATIONOFFLEXIBILITYANDPERFORMANCE—BACKEDBYWORLDCLASSAPPLICATIONSUPPORT

/NLYTHE6IRTEX FAMILYOFFERSYOUACOMPLETESUITEOFDESIGNSOLUTIONSBUILTONPROVEN NMTECHNOLOGYINDEVICESSHIPPINGTODAY

'ETSTARTEDONYOUR6IRTEX DESIGN6ISITwww.xilinx.com/ise FORAFREEDAYEVALUATION OFANY)3%® $ESIGN3UITEPRODUCT

www.xilinx.com/virtex5

©2009 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc.The PowerPC name and logo are registered trademarks of IBM Corp. and used under license. All other trademarks are the property of their respective owners