Evolving Low-Cost Solutions

SU 5 PIG20 CL ORA XILINX, INC. XCELL JOURNAL ISSUE 45, SPRING 2003

Issue 45 Spring 2003 XcellXcelljournaljournal THE AUTHORITATIVE JOURNAL FOR PROGRAMMABLE LOGIC USERS

EvolvingEvolving Low-CostLow-Cost SolutionsSolutions

SERIAL TSUNAMI Serial Tsunami Verification and Planning SOFTWARE ISEISE 5.2i5.2i FurtherFurther Reduces Costs RealFast RTOS

EMBEDDED PROCESSORS New PicoBlaze RISC Reference Design FPCs for Cost-Sensitive Applications APPLICATIONS Processors for Software Defined Radio How to Make Smart Antenna Arrays NEW PRODUCTS MultiPRO – New Prototyping ToolTool COVER STORY

Serial Tsunami High-Speed I/O Technology... R Higher Performance, Lower Costs Will the RealDigital™ CPLD please stand up.

If you are not using the latest breakthrough in dividing the clock off-chip. And the new RealDigital CPLD technology, you are not getting the most out of your CPLD. Only the CPLDs offer the smallest packaging with the highest level new 1.8V CoolRunner-II RealDigital CPLD from Xilinx, offering a 100% digital of design security. core, gives you the performance, low power, and features you are looking for Get the new CoolRunner-II Design Kit with no price premium. Get started now with the new CoolRunner-II Design Kit, The best system performance in the industry including a populated board, programming cable, design guide, and software The CoolRunner-II RealDigital CPLD features resource CDs… all for under fifty dollars! 1.8V CORE VOLTAGE CPLD COMPETITIVE CHART the most I/O per macrocell, and advanced Just contact your local Xilinx distributor or Manufacturer Xilinx Lattice Altera I/O interfacing including HSTL and SSTL. visit Device Family CoolRunner-II ispMACH4000C None www.xilinx.com/coolrunner2 System performance exceeds 400 MHz Standby Current <100 µA 2 mA N/A to order your kit today. And don’t forget with the lowest dynamic current and lowest Clock Divider YES NO N/A the new CoolRunner-II RealDigital CPLDs standby current in the industry (20 times Clock Doubler YES NO N/A are fully supported by the easy-to-use I/O Standards Support LVTTL, LVCMOS, LVTTL, LVCMOS N/A ™ lower compared to all other 1.8V devices!) HSTL, SSTL ISE WebPACK tool, which you can The unique Clock Divider means no more I/O Banks (max) 4 2 N/A download FREE right now.

www.xilinx.com/realcpld

© 2002, Xilinx Inc. All rights reserved. The Xilinx name and the Xilinx logo are registered trademarks, CoolRunner, WebPACK are trademarks, and The Programmable Logic Company is a service mark of Xilinx Inc. All other trademarks and registered trademarks are the property of their respective owners. LETTER FROM THE EDITOR

Why You Should Read Xcell

rogrammable logic is now the overwhelming choice for all types of designs, from low-cost P consumer devices to high-performance switching systems. When you consider the many advantages of programmable logic, you’ll see that it’s not only the fastest, easiest, and most flexible way to develop new products, it’s also the lowest cost and the most reliable solution in most applications. There simply is no better way to develop new products.

The Xcell Journal will help you bring your imagination into reality through programmable logic technology. Xcell is written by engineers who understand the challenges you face every day, and we strive to bring you the latest information about the products and services from Xilinx and our entire Partner Ecosystem, so you can make the best and most informed choices. We show you how to save time, effort, and money, while creating better, more profitable products. EDITOR IN CHIEF Carlis Collins [email protected] 408-879-4519 Xilinx is the undisputed world leader in programmable logic. We invented the Field Programmable

MANAGING EDITOR Tom Durkin Gate Array (FPGA) back in 1984, and since that time we have continually created faster, cheaper, [email protected] 530-271-0899 more capable products with each new generation. In 1994, a basic 25K-gate XC4025 FPGA, built

XCELL ONLINE EDITOR Tom Pyles on 0.6µ technology, cost you $654. This year the price of our smallest Spartan™-IIE device with [email protected] 720-652-3883 twice as many system gates is just $6.00. Plus, this year we’ll be the first in the industry to introduce CONTRIBUTING EDITORS Jane Middleton 90 nm technology to even further reduce costs. As you can see, programmable logic has matured Charmaine Hussain Peter Bergsman rapidly, and that’s why its use is growing faster than any other market segment.

DESIGN & ILLUSTRATION Scott Blair Dan Teie Our Partner Ecosystem is also an important part of what makes Xilinx technology so attractive. Our partners are the best in the business, and they provide a wide range of development tools, Xcell journal intellectual property (IP), design services, and support products that help make your job easier Xilinx, Inc. while making your products more successful. Together, we are changing the future of logic 2100 Logic Drive San Jose, CA 95124-3400 design, bringing you advanced design solutions, helping you create new realities. Phone: 408-559-7778 FAX: 408-879-4780 ©2003 Xilinx, Inc. The Xcell Journal is your best source for information about this exciting technology. All rights reserved.

Xcell is published quarterly. XILINX, the Xilinx logo, CoolRunner, Rocket Chips, Rocket IP, Spartan, and Virtex are registered trademarks of Xilinx, Inc. ACE Controller, ACE Flash, Alliance Series, AllianceCORE, ChipScope, CORE Generator, Fast CONNECT, Fast FLASH, Fast Zero Power, Foundation, HDL Bencher, IRL, J Drive, LogiCORE, MicroBlaze, MultiLINX, NanoBlaze, PicoBlaze, QPro, Real-PCI, RocketIO, SelectIO, Carlis Collins SelectRAM, SelectRAM+, Smart-IP, System ACE, Virtex-II Pro, Virtex-II EasyPath, WebFITTER, WebPACK, Editor-in-Chief WebPOWERED, XACT-Floorplanner, XACT-Performance, XAPP, XC designated products, Xilinx Foundation Series, Xilinx XDTV, and XtremeDSP are trademarks, and The Programmable Logic Company is a service mark of Xilinx, Inc. Other brand or product names are registered trademarks or trademarks of their respective owners.

The articles, information, and other materials included To subscribe to the printed Xcell Journal, in this issue are provided solely for the convenience of our readers. Xilinx makes no warranties, express, implied, statutory, or otherwise, and accepts no liability or to view the Web-based Xcell Online Journal, visit: with respect to any such articles, information, or other materials or their use, and any use thereof is solely at the risk of the user. Any person or entity www.xilinx.com/publications/xcellonline/ using such information in any way releases and waives any claim it might have against Xilinx for any loss, damage, or expense caused thereby. TABLE OF CONTENTS

COVER STORY 9 2121 Could Microprocessor Obsolescence Be History? Embedding soft processors in FPGA fabric offers a radical but robust new solution that effectively Ride the Crest of the Serial Tsunami eradicates the problem of processor obsolescence. The Xilinx Serial Tsunami Initiative is a comprehensive set of programmable serial I/O solutions and system strategies that will help you simplify designs, increase performance, and lower system costs.

Working with the Best Use SpyGlass Predictive Analysis Xilinx now ranks number four on for Effective RTL Coding FORTUNE magazine’s list of the “100 Best Companies to Work For.” Atrenta Inc.’s SpyGlass software uses Here’s why it matters to you. a “look-ahead” engine based on fast-synthesis technology to help you identify potential problems – and fix 6 them – early in the design process. 54 SPRING 2003, ISSUE 45 Xcell journal

Serial Tsunami Requires Working with the Best ...... 6 Changes to SI Verification Ride the Crest of the Serial Tsunami ...... 9

and Planning Surf the Serial Wave to Success...... 14

Mentor Graphics’ HyperLynx Serial Tsunami Verification and Planning...... 17 products are designed to facilitate the transition from parallel to Could Microprocessor Obsolescence Be History? ...... 21 serial connectivity. 17 Reduce Your RISC – PicoBlaze Reference Design ...... 24 Xilinx FPCs Target Cost-Sensitive Applications...... 29

Design Embedded Programmable Systems...... 34 Reduce Your RISC Integrated FPGA and Microprocessor Solutions...... 36 with a PicoBlaze Versatile MicroEngine Simplifies Embedded Designs...... 41

Reference Design Push the DSP Performance Envelope...... 44

Build your own soft ISE 5.2i Further Reduces Your Design Costs ...... 48 microcontroller in a CoolRunner-II CPLD. Prototype Xilinx Devices with MultiPRO Desktop Tool ...... 50 It’s easy, fast, and free. 27 DLK Enables Cost-Effective Design...... 52 Spyglass Predictive Analysis for Effective RTL Coding ...... 54

Get RealFast RTOS with Xilinx FPGAs ...... 58 Get RealFast RTOS Virtex-II Pro FPGAs Deliver Proven Interoperability ...... 61 with Xilinx FPGAs Spartan-IIE Family Grows...... 66 Real-time operating systems CoolRunner-II Solutions Save Money ...... 69 implemented in Xilinx FPGAs enhance performance, improve Reinventing the Signal Processor ...... 72 predictability, simplify design, How to Make Smart Antenna Arrays...... 76 and lower system costs. 58 FPGAs – Multiprocessing I/O Infrastructure for 3G...... 80 Bluetooth Wireless Technology BOOST Lite Processor...... 86

Decode MPEG-2 Video with Virtex FPGAs...... 88 How to Make Smart Bughunters @ Siemens...... 91 Antenna Arrays support.xilinx.com: The Answers You Need...... 94 The Nallatech BenADIC card combines Learn Smarter, Faster ...... 96 a 20-channel data acquisition system with Xilinx XtremeDSP technology and Virtex Xilinx Technology Enabled Deployment of Real-PCI Express...... 98 FPGAs for high-performance digital Xilinx Events and Tradeshows...... 99 signal processing. 76 Fuzzy Logic...... 100 Reference Pages ...... 101 Viefromw the top Working with the Best Xilinx now ranks number four on FORTUNE magazine’s list of the “100 Best Companies to Work For.” Here’s why it matters to you.

6 Xcell Journal Spring 2003 VIEW FROM THE TOP

Xilinx is a certainly they can take advantage of every new oppor- This new process technology has resulted great company to tunity with a full staff. Companies that must in a 5% to 80% percent chip-size reduction work for, as layoff their workers suffer from lower compared to any competing FPGA. IBM acknowledged by morale, less productivity, and a slower return plans to manufacture this new product, in FORTUNE maga- to profitability. That’s what we are seeing high volumes, in the second half of 2003. zine’s ranking, and now: Xilinx is gaining market share, while The new IBM $2.5 billion, 300 mm chip- we are a great com- our competitors are struggling to keep up. making facility combines – for the first time pany to work with, anywhere – IBM chip-making break- by Wim Roelandts because we have What We’ve Accomplished throughs such as copper interconnects, sili- not slowed our fast The technology development that we are con-on-insulator (SOI), and low-k dielectric CEO, Xilinx Inc. pace of innovation doing today will not show up in your insulation on 300 mm wafers. nor decreased our customer support activi- hands for several years. That’s why it is Our investment in 90 nm manufactur- ties – all critical to the success of our cus- imperative that we keep our technology ing technology will enable us to drive pric- tomers. We weathered the recent economic advancing, even when current revenues are ing down to under $25 for a downturn in ways that have made Xilinx slumping. Otherwise, we would find that one-million-gate FPGA, which represents a stronger across the board. How did we keep when the economy turns around, we would savings of 35% to 70% compared to any our technology advancing at a high rate, not be ready to support your demand. Here competitive offering. Such a significant continue to expand our support, and are some of the important advances we are reduction in pricing is possible due to the gain market share, in a downward spiraling now introducing. remarkable economies of scale involved economy? Here’s how. with moving to next-generation A large part of our business manufacturing processes at came from the telecommunica- increasingly finer geometries. tions industry, which was Now we can achieve greater severely hit by the downturn. device densities and higher Our revenues were cut almost yields, making our FPGAs the in half, overnight. Our com- logical alternative to ASICs. petitors were affected just as The rising costs of develop- severely and as a result, many ing ASICs on more advanced were forced to lay off a sizeable processes are well known. Not part of their workforce and to only are non-recurring engi- reduce both their product neering (NREs) charges rising development and their cus- to over one million dollars per tomer support activities. We design, but the engineering cost chose to make layoffs only as a of developing and testing a last resort, and yet we needed to complex ASIC on advanced cut our expenses dramatically. processes such as 150 nm or We chose to take pay cuts, 130 nm technology can run up on a sliding scale, instead of to 10 times that amount. With doing layoffs. The average pay the deployment of our new cut was 6%, rising to 20% for FPGAs on 90 nm technology, myself. Everyone shared the Xilinx has resolved all of the burden according to their abili- deep sub-micron design chal- ty, and everyone was very lenges for you. Using these new happy to have some job securi- FPGAs, you will get all of the ty in a time when many of our colleagues in 90 nm Low-Cost Fabrication Technology substantial cost advantages of the 90 nm other companies were losing their jobs. Using IBM’s most advanced, copper-based, technology without being forced to worry Not only did our morale remain high, but 90 nm semiconductor manufacturing about the detailed circuit design issues our productivity increased as well, and we process technology, IBM and Xilinx are associated with ASICs. You can concentrate continued to produce our new technolo- manufacturing a new FPGA design in on getting your system designed rather gies even faster than before. IBM’s new 300 mm chip fabrication facili- than on getting the chip itself to function. Studies have shown that companies that ty. This technology is a major reason why With our increased density, performance, can avoid layoffs rebound more quickly our FPGAs will continue to lead the indus- and system features, you get all the benefits when the economy turns around, because try in cost reduction. of an ASIC in a flexible, programmable

Spring 2003 Xcell Journal 7 VIEW FROM THE TOP

FPGA, without the risk and the huge NRE (XGMII), RapidIO, PCI, PCI-X, CSIX, for the RocketIO transceivers for flawless costs. This is dramatically expanding the HyperTransport™, XSBI, and SFI-4. system design. You will know exactly total market for FPGAs, both in new appli- Therefore, Virtex-II Pro FPGAs are the what to expect for your specific design cations in existing markets and in totally only devices in production today that situation, plus we provide best design new markets. enable you to bridge between the parallel practices and PCB layout guidelines to and serial interfaces, making them the ulti- help you succeed. Serial Tsunami mate connectivity platform. We have also developed a new, open, We have pioneered the development of very Xilinx and partners are delivering pre- scalable, lightweight serial standard called high-speed serial I/O technology – we call it engineered IP cores for serial connectivity Aurora to help you transition from parallel the Serial Tsunami Initiative. This new protocols. Cores for Gb Ethernet MAC to serial interfaces. It is a link layer technology will solve many design chal- with PHY, 10 Gb Ethernet MAC with protocol that can encapsulate and trans- lenges, allowing you to replace old parallel XAUI, PCI Express, Fibre Channel, and port any higher-level protocol. It is very busses with a far less expensive solution. reference designs for SONET OC-48 resource-efficient with low latency. With Serial Tsunami, you can significantly backplanes are available now, and more are A single-lane reference design and the reduce costs, produce faster designs, reduce your PC board area, and create products that were never possible before. Other advantages of Serial Tsunami include reduced EMI, noise, cross talk, and skew, which makes your overall design more reliable. You can easily expand the I/O for increased bandwidth, and the physical interface can drive long signal traces on your PC boards, making them ideal for backplanes. The move to serial I/O technology is inevitable – there is no better way of cutting costs while keeping pace with current and future bandwidth requirements. Our Virtex-II Pro™ FPGAs with embedded RocketIO™ 3.125 Gbps transceivers, and the accompanying IP cores, reference designs, and support infrastructure, provide the best possible way for you to realize all the advantages of serial I/O technology without the pitfalls. Virtex-II Pro FPGAs support major emerging serial interfaces such as PCI Express™, Gb Ethernet PHY, 10 Gb Ethernet XAUI, Fibre Channel, OC-48, OC-192 and OC-768 SONET for being added. You can also build higher- specification are available for free down- backplanes, Serial RapidIO™, and level protocols using our embedded IBM load at www.xilinx.com/aurora/. A quad- InfiniBand™. Each embedded RocketIO PowerPC™ processors. Reference designs link reference design will be available transceiver in the Virtex-II Pro FPGAs is and evaluation/prototype boards help you during the first half of 2003. based on several generations of customer- verify the performance of the transceivers proven Mindspeed SkyRail™ technology in real hardware. And Much More and can run from 622 Mbps to 3.125 Creating designs with speeds of 622 I’ve only mentioned a few of our most Gbps; there are up to 24 of these trans- MHz, 3.125 GHz, 10 GHz, and beyond important technologies. As you can see, the ceivers available in one FPGA. will present challenges with PCB design and current downturn has not slowed our inno- Virtex-II Pro FPGAs also support par- signal integrity. Therefore, Xilinx and lead- vation, and we are fully ready to support allel interface standards such as SPI-3 ing EDA partners are solving this dilemma your design requirements, both now and in (POS PHY Level 3), SPI-4.1 (Flexbus 4), with tools such as the Cadence the future. We are the fourth best company SPI-4.2 (POS PHY™ Level 4), 10 Gb SPECCTRAQuest™ and HSPICE models. to work for, and the number one company Ethernet Media Independent Interface We provide in-depth characterization data to work with.

8 Xcell Journal Spring 2003 RideRide thethe CrestCrest ofof thethe SerialSerial TsunamiTsunami The Xilinx Serial Tsunami Initiative is a comprehensive set of programmable serial I/O solutions and system strategies that will help you simplify designs, increase performance, and lower system costs.

Spring 2003 Xcell Journal 9 by Anil Telikepalli Marketing Manager, Virtex Solutions Xilinx, Inc. [email protected]

Remember the days when serial I/O brought either USB or IEEE 1394 to mind? Not any more. A veritable tsunami of new and evolving serial I/O standards is washing over the technological landscape, delivering promises of higher performance, lower costs, and simpler designs. Remarkable advances in semiconductor technology and the availability of low-power CMOS serial transceivers are driving a migration of tidal wave proportions from parallel to serial interfaces. Xilinx is at the forefront of this movement. We have been shipping our flagship Virtex-II Pro™ FPGAs (www.xilinx.com/ Figure 1 - Virtex-II Pro Platform FPGA virtex2pro) since the beginning of 2002. The Virtex-II Pro devices are the only Platform Serial system interfaces such as PCI Serial Tsunami Initiative help you surf the FPGAs with embedded 3.125 Gbps Express™, Serial RapidIO™, Infini- surges of multiple, evolving serial stan- RocketIO™ CMOS serial transceivers. Band™, 1 Gb Ethernet, 10 Gb Ethernet dards all the way across devices, IP, and Designing cutting edge serial I/O technolo- XAUI (10 gigabit attachment unit inter- software as you build your systems. gies is a challenging endeavor, but using seri- face), Fibre Channel, Serial ATA, SxI-5, and The foundation of the Serial Tsunami al I/Os to build your systems need not be. TFI-5 are all available today, with many Initiative was our aqusition of RocketChips more coming to address specific needs. Inc. two years ago. Today, a dedicated Broad Trend Toward Serial Connectivity R&D team in the Communications Experts agree that both single-ended and Serial Tsunami Initiative Technology Division (CTD) is wholly differential parallel I/Os have reached Xilinx launched the Serial Tsunami focused on improving, developing, and their physical limitations and cannot pro- Initiative (www.xilinx.com/connectivity) to delivering the capabilities that make up the vide a reliable and cost-effective means for sail the crest of the industry move from Serial Tsunami vision. Commenting on data rates greater than 1 Gbps. Serial I/O parallel to serial interfaces, to reduce this strategy, Wim Roelandts, president provides performance benefits to high- costs, and to keep pace with current and and CEO of Xilinx, said: “The underlying speed systems and cost benefits to low- future bandwidth requirements. The ini- RocketIO technology that’s making it pos- “The underlying RocketIO technology that’s making it possible for Xilinx to support multi-gigabit systems is truly ‘rocket science’ – not something you’d want to simply license from an external IP provider” speed systems. This double benefit has tiative is a vision for delivering a complete sible for Xilinx to support multi-gigabit propagated waves of new serial interface suite of serial connectivity solutions – systems is truly ‘rocket science’ – not some- standards development. including Platform FPGAs, IP cores, thing you’d want to simply license from an The inevitable result is the current wide- design software and methodologies, refer- external IP provider. Having the internal spread migration toward serial I/O across ence designs, solution boards, extensive expertise is a critical element of our strate- many segments of the industry, including characterization data, and training classes gy to make serial a mainstream technology PC and consumer, storage and servers, com- – to enable you to design your next- by providing designers with a comprehen- munications networking, industrial com- generation products. sive, scalable, and cost-effective solution.” puting and control, and test equipment. Unlike point solutions, the Xilinx “As the only FPGA vendor shipping plat-

10 Xcell Journal Spring 2003 ing EDA partners such as Cadence are solv- PCS PMA ing this dilemma with tools such as From FPGA SPECCTRAQuest™ transmission media Transmit TX+ Fabric TX DATA 8B / 10B Serializer FIFO Buffer TX- Encode Transmit and HSPICE™ (highly accurate simula- TX Clock Generator tion program with integrated circuit 40.0 MHz 622 Mbps emphasis) models. REFCLK to 20X Multiplier to Xilinx provides in-depth characteriza- 156.3 MHz Comma Detect Loopback tion data (Figure 3) for the RocketIO and Word RX Clock Generator 3.125 Gbps Alignment transceivers for flawless system design using To FPGA Receive Receive RX+ Fabric RX DATA Elastic 8B / 10B Deserializer our Virtex-II Pro FPGAs. You will know Buffer Decode Buffer RX- exactly how your design is expected to operate, and you will be supported every step of the way with best design practices Figure 2 - RocketIO serial transceiver in Virtex-II Pro FPGAs and PCB layout guidelines. form devices with programmable 3.125 10 Gb Ethernet XAUI, Fibre Channel, Aurora Reference Design Gbps transceivers and IP cores for key serial OC-48, OC-192, and OC-768 SONET Aurora is a new, open, lightweight, scala- interface standards, it’s evident that our for backplanes, and Serial RapidIO. ble serial interface provided by Xilinx to approach is paying off,” he added. “Watch Virtex-II Pro FPGAs also support parallel help you transition from parallel to serial for many more exciting developments in the interface standards such as SPI-3 (POS PHY interfaces. It supports any transport pro- coming months as we continue to extend Level 3), SPI-4.1 (Flexbus 4), SPI-4.2 (POS tocol, has a compact architecture, and our market lead in serial connectivity.” PHY™ Level 4), 10 Gb Ethernet Media delivers low latency. A single-lane refer- Steve Berry, principal analyst for Independent Interface (XGMII), RapidIO, ence design is available for free download Electronic Trend Publications, evaluated the PCI, PCI-X, CSIX, HyperTransport™, at: www.xilinx.com/aurora. A quad link benefits of the serial I/O trend and the role XSBI, and SFI-4. reference design will be available during Xilinx plays in the movement: “Through its The Virtex-II Pro FPGA is the only the first half of 2003. leading technology and well-established device available today that enables bridg- Xilinx takes an active role in industry strategic partnerships, Xilinx is poised to ing across all these interface classes and standards organizations – including lead the industry in the transition to serial generations, making it the ultimate con- PICMG, RapidIO Trade Association, interfaces. System architects will experience nectivity platform. NPF, OIF, PCI-SIG, XFP, SMPTE, and dramatic improvements in bandwidth, pin count, power, and signal integrity.” Xilinx Tools and Solutions for Serial Connectivity Virtex-II Pro Platform FPGAs Using the 3.125 Gbps RocketIO integrated Although it is widely accepted that serial transceivers in Virtex-II Pro FPGAs, Xilinx I/O delivers significant advantages over and its partners are delivering pre-engineered parallel I/O methods, until now there was IP cores for serial connectivity protocols. no flexible, cost-effective, general-purpose Cores for 1 Gb Ethernet MAC with PHY, 10 silicon support. The standards wars do not Gb Ethernet MAC with XAUI, PCI Express, have clear winners, and the transition path Fibre Channel, and reference designs for is not obvious. SONET OC-48 backplanes are available Virtex-II Pro Platform FPGAs (Figure 1) now, and more are being added. You can also deliver state-of-the-art serial I/O with as build higher level protocols using the embed- many as 24 RocketIO transceivers embedded IBM PowerPC™ processors. Reference ded in the highest performance FPGA fab- designs and evaluation/prototype boards ric in production. The RocketIO help you verify the performance of trans- transceivers (Figure 2) operate at speeds ceivers in real hardware. from 622 Mbps to 3.125 Gbps. The prod- Designing with parallel I/O meant you uct is available in a wide range of program- were limited to speeds of 33 MHz to 133 mable logic densities in 10 devices and MHz. In the serial world, these speeds leap Figure 3 - RocketIO characterization using ML320 hardware platform: several packages. Virtex-II Pro FPGAs sup- to 622 MHz, 3.125 GHz, 10 GHz, and eye diagram at receiver shows 3.125 Gbps port all major emerging serial interfaces beyond. This raises challenges with PCB on 44-inch FR4, two connectors with such as PCI Express, 1 Gb Ethernet PHY, design and signal integrity. Xilinx and lead- 33% pre-emphasis.

Spring 2003 Xcell Journal 11 ing both serial and parallel interfaces within PCI 32/33, 5-client system (250 pins, 1 Gbps) the same FPGA. Xilinx delivers IP cores and reference designs to interface with par- 12345 allel standards such as 10 Gb Ethernet 50 XGMII, RapidIO, SPI-3, SPI-4.1, SPI-4.2, HyperTransport, PCI, PCI-X, CSIX, XSBI, PCI Express 5-client system (80 pins, 80 Gbps) SFI4, and many others. With Virtex-II Pro FPGAs you can 1 2345bridge across parallel and serial interfaces to make a seamless transition. Plus, you can 4 continue to interface with the ASSPs that best suit your needs based on their function, not on their interface support.

Case Studies – Serial vs. Parallel Serial helps you break the bandwidth bot- Figure 4 - The PCI Express serial standard delivers 80X bandwidth with less than one-third of the pins required by the PCI parallel protocol. tleneck with higher data rates using fewer pins. This lowered pin count delivers many advantages over traditional parallel implementations. • Low device pin counts: With fewer Four Serial Channels Bonded Together pins per connection, you save costs TXC 36 from small board real estate, smaller TXD PCS/PMA MAC PMD packages, fewer PCB traces and layers, Media Access Physical Coding Physical Medium Control RXC Sublayer/Physical Dependent and even smaller connectors. RXD 36 Media Attachment Interface • Expansion and scalability: High-speed serial pipes scale to support higher data Parallel Serial rates as needs change. XGMII 2" FR4 XAUI 20" FR4 • Improved EMI and noise immunity: With embedded clock-data mechanism Figure 5 - Serial XAUI allows 10 times longer PCB trace, lower pin count, and lower jitter/skew than XGMII. that is serial vs. wide parallel data and clock, you get higher clock rates at others – with the goal of providing com- interfaces embraced by available ASSPs. lower EMI, noise, cross talk, and skew. plete solutions synchronized with the Your optimum design could easily incorpo- • Physical interfaces: Serial I/O physical availability of new standards. An example rate multiple ASSPs equipped with a variety interfaces can also drive long PCB is the industry’s first PCI Express core, of interfaces. Xilinx and ASSP vendors work traces on backplanes or even external which Xilinx released on the same date closely together to ensure interoperability cable (copper or optical). that the PCI-SIG ratified the specifica- with Virtex-II Pro features and electrical I/O tion. [Ed. note: See “Xilinx Technology standards as well as interface IP cores jointly One obvious way to analyze the costs Enabled Instant Deployment of Real-PCI verified in hardware. [Ed. note: For a discus- and benefits of I/O interfaces is to compare Express” in this issue.] sion of ASSP and Virtex-II Pro FPGA inter- performance per pin. Let us take a look at The Serial Tsunami Initiative also offers operability, see “Virtex-II Pro Platform FPGAs two examples – PCI Express vs. PCI, and several levels of training about working Deliver Proven Interoperability” in this issue.] 10 Gb Ethernet XAUI vs. XGMII. with serial technologies. These range from an online introduction to in-depth face-to- Bridge Across Any Standard Case 1 – PCI and PCI-Express face classes. In addition, design support The leading parallel standards are not going PCI is a higher bandwidth, parallel, shared- and services are available from experienced to vanish overnight, and you might find it bus standard, while the PCI Express proto- designers within Xilinx Design Services. necessary to accommodate an older parallel col is a new serial version. A 5-client standard in a design that is intended to pro- communication example (Figure 4) con- Manage Interoperability mote a newer serial standard. So how do trasts the two standards: The multiplicity of available and emerging you manage the transition? Virtex-II Pro • A 32 bit/33 MHz PCI interface requires serial standards is reflected in the range of FPGAs have solved this for you by support- 50 pins, delivering about 1 Gbps

12 Xcell Journal Spring 2003 (32 bits x 33 MHz = ~1 Gbps) total • XGMII is a full-duplex, parallel inter- route as much as 20 inches FR4 on bandwidth that would be shared among face operating at 312.5 Mbps per wire PCBs, backplanes, and even cable. Its all 5 clients (5 x 50 pins = 250 pins). using 74 pins for data, clock, and con- low pin count makes it highly scalable. trol. Its data rate is 10 Gbps each way. • The PCI-Express interface operates at Fewer pins, higher bandwidth, scalability, Due to signal count, skew, and other 2 Gbps data rate in each direction, and significant cost savings are all driving the problems, XGMII fails to support delivering 4 Gbps full-duplex band- move to serial. multiple interfaces in a single chip or width over just four pins (that is, two routes longer than a few centimeters. differential pairs) for each point-to- Conclusion Hence, it is restricted to be only a chip- point connection. The total aggregate The Xilinx Serial Tsunami Initiative is revo- to-chip interface and has a maximum bandwidth provided in the same lutionizing system architectures. State-of- FR4 trace limitation of 2 inches. 5-client configuration is 80 Gbps the-art serial I/O delivers scalable, (4 Gbps/connection x 4 connections/- • XAUI is a 4-lane, full-duplex, serial high-bandwidth at low cost. As the industry client x 5 clients = 80 Gbps). interface, with each lane running at rapidly moves to serial connectivity, and 2.5 Gbps data rate (3.125 Gbps baud away from current parallel interface schemes, Case 2 – 10 Gigabit Ethernet XGMII rate). It requires 4 differential signal you must either sink or swim. Xilinx and and XAUI pairs in each direction and hence, 16 Virtex-II Pro Platform FPGAs offer a whole XGMII is a full-duplex interface between pins in total, delivering 10 Gbps aggre- boatload of serial solutions from the initial the MAC and PHY layers. XAUI is a seri- gate data bandwidth. Automatic de- concept to the finished product. Check it out alized version of this interface (Figure 5). skew and pre-emphasis allows XAUI to at: www.xilinx.com/serialsolution/.

Spring 2003 Xcell Journal 13 SurfSurf thethe SerialSerial WaveWave toto Success Success LearnLearn howhow toto accelerateaccelerate youryour multi-gigabitmulti-gigabit serialserial linklink designdesign process.process.

by Donald Telian This article will show you how to har- You simply cannot afford to lose data. Technologist ness the power of the Xilinx Serial Figure 1 illustrates two types of data Cadence Design Systems, Inc Tsunami Initiative. We’ll look at helpful transfers relevant in MGHz design. The [email protected] tools and effective techniques already first row shows the serial link itself. Data being used by engineers today. is sourced by the transmitter (Tx), trans- The move to multi-gigahertz (MGHz) seri- ferred through the differential intercon- al technology represents a sea change of Effective Data Transfers nect, and latched onto by the receiver tsunami proportions. A variety of industry Whether you’re moving bits down an (Rx). If all elements are not tuned to each forces are revolutionizing product design to MGHz serial link or moving money into other, the data is not transferred effective- accommodate the speed and throughput of your bank account, it’s important to make ly. The transmission medium must be serial data transfer. sure all the data is transferred correctly. designed carefully – and all three ele-

14 Xcell Journal Spring 2003 ments (Tx, transmission medium, Rx) Although the red signal appears more deter- • Thorough characterization and must be well-matched. ministic than the blue case, the transmitter in accounting for the interconnect’s dis- Similarly, the second row of Figure 1 this case is not delivering enough voltage continuities and behaviors (such as shows the design chain between Xilinx swing to the circuit to meet the thresholds in vias, connectors, dielectric loss). serial technology and your design process. the receiver to extract the serial data. Just as in the case of the serial link, the Items that make an MGHz serial link The SPECCTRAQuest RocketIO Design Kit Xilinx technology must be delivered to you work right include: Recognizing that the MGHz design process in a medium that matches your design • Proper sizing of the transmitter for the has discontinuities too, Xilinx proactively process to ensure a clean data transfer. required voltage swing developed the RocketIO Design Kit for In cooperation with Cadence, Xilinx has Cadence’s SPECCTRAQuest high-speed • An understanding of the differential developed the SPECCTRAQuest Design PCB design tool. This kit was first intro- impedance of the transmission medi- Kit as a way to effectively communicate the duced with the Virtex-II Pro FPGA um (Z_differential is typically operation of the RocketIO™ MGHz trans- in March 2002, and was described in an 2*[Z_uncoupled Ð Z_coupled]) ceivers found in the Xilinx Virtex-II Pro™ Xcell Journal article at that time FPGAs. Later, we’ll examine the Xilinx- • Matching that impedance with a ter- (see support.xilinx.com/publications/xcellon- Cadence partnership and how you can use mination resistor between the two nets line/partners/xc_speckit42.htm). The kit it to accelerate your design process and at the receiver’s inputs helps you implement the RocketIO tech- improve your products. But first, let’s take a nology by providing the electronic files and closer look at the MGHz models that match and can serial link itself. be inserted directly into your Data Transmission User design process. Multimedia How Serial Links Work Source Medium of Data tutorials within the kit help Measuring the “opening” you quickly understand the on an eye diagram is a steps involved. common way to judge the Serial Differential Mohammad W. Ali, Tx Rx effectiveness of serial trans- Link Interconnect Ph.D., a technologist at mission. Figure 2 superim- Tellabs, found the kit to poses eye diagrams of offer significant improve- received signals in three ments in both the through- slightly different test cases. Design Xilinx SPECCTRAQuest Your put and quality of his design Chain Technology Design Kit Design In the green signal’s circuit, process. He states, “The the transmitter, intercon- new silicon package board nect, and receiver imped- Figure 1 - How data is effectively transferred in a serial link, and its design chain. solutions in the design kit ances are well-matched. save me a lot of time, par- Here, all three elements are ticularly for my multiboard working together to pro- simulations that involve dif- duce an acceptably wide ferent styles of routed 2.5 eye opening. GHz differential pairs. The other two wave- With the new interfaces in forms in Figure 2 show this SPECCTRAQuest Kit, what happens when only I can accomplish my simu- one of the three elements lation task 10 to 20 times becomes imbalanced. In faster.” the blue signal’s circuit, a With RocketIO trans- mismatch in the imped- ceivers, signaling through- ance of the transmission put has increased an order line causes erratic signal of magnitude. And with the behavior and a collapse of accompanying design kit, the eye opening. Changing the throughput of the the transmitter’s imped- design process has increased ance, however, causes an Figure 2 - Mismatch in any one of the serial link components can similarly as well – even with even further collapse in the adversely affect performance. the challenges of MGHz red signal’s circuit behavior. design.

Spring 2003 Xcell Journal 15 Design Chain Optimization task and improve his prod- Great technology that is hard to Design chain Supply chain uct’s quality instead of use isn’t really all that great. wading through thousands Company A Deployment Distribution New technologies have failed of lines of text-based mod- because they were just too hard els and netlists. to access or too complex to Engineering Manufacturing Stéphane Tessier, a hard- work with. That’s why Figure 1 ware engineer at Radical shows the two parallel chal- Horizon, a Montreal-based lenges that must be solved for Company B Deployment Distribution software-defined radio high-speed serial communica- Design Component (SDR) solution provider, Kits Kits agrees that the kits are a tion to succeed: Engineering Manufacturing 1. Proper transmission of “must-have” for developing serial data from transmitter multi-gigabit links. He to receiver, and found that the tutorial infor- Company C Deployment Distribution mation in the kits shortened 2. Proper transfer of serial his learning curve, and he technology from Xilinx Engineering Manufacturing believes use of the kits will to you. “reduce the number of board Focusing on the second chal- iterations.” lenge is what “design chain optimization” is all about. Figure 3 - Design and supply chains and the role of design kits. Conclusion Design chain optimization is A survey of engineers cur- the only way to achieve the 10X to 20X by PCB designers would not work with rently using the kits revealed design task improvement that the this new technology. In fact, the only accu- that they unanimously find them valuable RocketIO kit has to offer. rate representation of the RocketIO trans- for serial MGHz design. Already, 75% of the Figure 3 illustrates the design chain. ceiver was the model used to design the engineers believe that using the combined Because the term “design chain” is not as silicon – an IC-level model that only Xilinx/Cadence kit has helped them improve common as “supply chain,” both are shown worked in IC design tools. their product’s quality. to help you understand their function and As Figure 1 shows, the SPECCTRA- During 2002, the integration of the relationship to each other. Within the Quest Design Kit answered this challenge SPECCTRAQuest and RocketIO design design chain, design kits of “virtual compo- and became an efficient “transmission kits have become an integral part of the nents” (in the form of models, EDA files, medium” to bridge the worlds of IC and Xilinx Serial Tsunami Initiative – listed and databases) are transferred from the PCB modeling. New technology in the among EDN magazine’s top 100 products technology deployment group at one com- SPECCTRAQuest kit allows you to simu- for 2002. pany to the engineering group of another. late arbitrary PCB layouts with complex The serial tsunami is here and growing. In our example, the RocketIO kit effec- IC models – all from the SPECCTRA- As you join fellow engineers in riding the tively communicates the nuances of Quest user interfaces commonly found in serial wave, be sure to download your free MGHz technology to Xilinx customers. the high-speed PCB design process. If copy of the SPECCTRAQuest Design This is done by avoiding the vagaries of Xilinx had required PCB engineers to Kit. It will help you put the power of textual datasheets, instead providing elec- learn new IC simulation tools, it would MGHz signaling into your next design. tronic files that can be easily inserted into have caused a mismatch in the design your design process. These files are “exe- chain and hindered the adoption of For More Information cutable specifications” that can quickly be RocketIO transceivers. The SPECCTRAQuest RocketIO Design understood by engineers all over the Wenwei Qiao, an engineer at Applied Kit can be downloaded free of charge at: world, because the tool shows the Materials, prefers using the Xilinx and support.xilinx.com/support/software/spice/spice RocketIO serial transceiver in a context Cadence kit’s pre-packaged complex sili- -request.htm. Registration and click-license with which they are familiar. con models within the SPECCTRAQuest NDA are required. environment because they can be manip- Information about Cadence SPECCTRA- Bridging IC to PCB ulated much like simpler IBIS-style mod- Quest (SQ) and other free SQ design kits is Just as all elements in a serial link must be els. “In only 10 minutes after installation, available at www.specctraquest.com. matched, so must the elements in the seri- I was able to begin simulating my multi- An executive white paper on design chain al design chain. But here Xilinx had a chal- gigabit solution,” he reports. The user optimization is available at http://register. lenge: the model formats commonly used interface helps him focus on the design cadence.com/register.nsf/designChain/.

16 Xcell Journal Spring 2003 SerialSerial TsunamiTsunami RequiresRequires ChangesChanges toto SISI VerificationVerification Mentor Graphics’ HyperLynx products are designed to facilitate the transition andand PlanningPlanning fromfrom parallelparallel toto serialserial connectivity.connectivity.

Spring 2003 Xcell Journal 17 by Dave Kohlmeier including broadside (up and down a Director of Engineering, Simulation and layer) and edge-coupled (side to side Analysis Products, System Design Division on same layer) pairs. Figure 1 shows Mentor Graphics, Inc. the EM field lines calculated by the [email protected] Mentor Graphics HyperLynx™ 2D field solver for two diff pairs. Notice We see third-generation I/O (3GIO) the difference in the number of lines serial connectivity standards showing (which indicate field strength) up everywhere: HyperTransport™, between members of a pair versus InfiniBand™, and PCI Express™. those between the pairs themselves. Why are all the major platform ven- Impedance planning is an impor- dors moving to these new interconnect tant step with diff pairs where inter- schemes? Basically, the speed of this nal terminations might require you “Serial Tsunami” is being driven by the to maintain specific differential inability to cost-effectively resolve sig- Figure 1 - Plotted EM field lines for two adjacent differential pairs impedance. For this reason, we have nal and power integrity issues prevalent included an impedance-planning in standard synchronous, parallel, mul- dialog in the HyperLynx stackup edi- tidrop bus designs. Power and ground tor, as shown in Figure 2. In the edi- noise due to large high-frequency cur- tor, you can simply set a priority rent demands in large voltage, swing- parameter, such as trace width, and single-ended designs; reflections from request a spacing value that allows stubs due to multidrop connections; you to reach your goal (say, a differ- impossible delay/skew constraints for ential impedance of 100 ohms). data versus clock nets; and other limitations have made cost-effective high- Vias speed parallel design unattainable. At extremely high frequencies, using At the crest of the serial tsunami vias can increasingly introduce sig- are differential signaling, embedded nal integrity (SI) problems. Why? clocks, pre-emphasis, and point-to- Vias are a discontinuity in the trans- point interconnect; all these elements Figure 2 - Differential impedance stackup planning mission line. give us a way around the electromag- using the stackup editor Just visualize the nicely controlled netic barriers of parallel multidrop impedance of a trace over a ground design. These changes require us to plane and then compare that to a ver- New 3GIO constraints for differential change our verification methodologies to tical tube of copper with no corresponding (diff) pair topologies include: assure successful designs. Let’s take a look at ground return path – the impedance is • Minimize coupling from adjacent pairs some of the changes in detail, focusing on clearly different. In these systems, it’s how they affect the need for up-front plan- • Match length of traces in a pair important to keep diff pair vias close ning and verification. together and to add ground stitching vias in • Minimize the number of vias used, but the vicinity for shielding and common match use and location within pairs Differential Signaling mode return. when they are used. Low voltage differential signaling (LVDS) is However, it’s also important to simulate the real basis for the transition to 3GIO. The idea here is that the more the diff the impedance discontinuity. HyperLynx LVDS, a two-wire system where return cur- pair is coupled, the more external noise will Version 7.0 has given you the ability to have rents are expressly dealt with by the second be rejected. An aggressor pair will induce a via L and C calculated automatically, to closely coupled wire, is used to minimize signal in the victim pair, but as such, it specify L and C values for each via type, or ground-return loops and the cumulative would induce the same signal in both wires to set a default via value for the entire board. effect produced by wide parallel buses on that (if they are closely coupled). Any noise return loop (otherwise known as simultane- agent that affects both wires in a diff pair Transmission Line Losses ous switching noise, or SSN). At these high will have no effect on the resulting differ- Without going into a lengthy discussion on frequencies, though, it is still very important ential waveform. losses in transmission lines, let’s just stress to maintain a complete ground return path So, it is important that a verification the importance of your verification and (no cuts or voids in the ground plane), espe- tool understands the self-coupling of each planning tools in recognizing and support- cially for common mode currents. diff pair and the coupling between pairs, ing loss. As frequencies increase, AC losses

18 Xcell Journal Spring 2003 are significant. Skin effect (where more ceiver implementation, simply assign the is a large enough window of time where the current is forced to the surface of a con- RocketIO model just as you would an IBIS signals are reliably in one state or the other ductor) and dielectric loss (thermal heating model and use either your true topology (necessary for the receiver clock recovery of the dielectric as the EM waves travel extracted from your layout system circuit to dependably extract the data). through the dielectric) are the culprits. (HyperLynx supports trace model extrac- As in digital simulation, we are required Especially in backplanes where the trace tion from all major PCB vendors) or a to create stimuli that can affect the signals length on FR4 can be substantial, loss- in a realistic way, which in turn we es will reduce and “smear” voltages at see as a changing shape in the “eye.” the receiver. This reduces your ability Because the bit history will affect the to have a clean “eye” for the receiving analog result, an-easy-to-use multibit IC to sync to, and extract, the correct stimulus editor in HyperLynx data. HyperLynx Version 7.0 includes Version 7.0 has been added that the trusted “W” element in its simula- allows you to create bit streams of tor for robust loss support. ones and zeros for the simulator to drive the diff pair (Figure 4). Pseudo- High-Speed Models random, 8-bit/10-bit encoded and Modeling and simulation go hand- customer defined patterns are sup- in-hand. Whether you are doing ana- ported, as well as random (uniform log or digital simulation, you can’t get or Gaussian) jitter. This functionality, very far without models. In the SI combined with easy-to-use differen- business, the I/O Buffer Information Figure 3 - Oscilloscope view of an eye diagram including eye mask tial probes, provides you with robust Specification (IBIS) standard has eye diagram support. been a huge benefit to systems 3GIO systems typically specify designers. Virtually all IC vendors are what a sufficient eye should look like now making I/O models available for for a receiver to extract the clock and their devices. data. This capability is referred to as As we enter this next decade with an “eye mask.” The ability to specify 3GIO, IBIS is moving to support sub- an eye mask is included in HyperLynx circuit models in SPICE (Simulation Version 7.0, giving you an easy way to Program with Integrated Circuit “see” in the oscilloscope view if your Emphasis) or VHDL_AMS in the resulting eye pattern meets the manu- proposed IBIS 4.1 (now Bird 75) spec- facturer’s specification. ification. In the meantime, simulation environments must support the mod- Conclusion els that are currently available, and High-speed, low-voltage differential those are predominately HSPICE serial interconnect is the wave of the (from MetaSoft). future for both interboard and intra- HyperLynx Version 7.0 offers sup- Figure 4 - Multibit stimulus generation dialog board interconnects. The features port for HSPICE models. These mod- you need in a verification and analy- els are generally encrypted and require a “what-if” topology defined in HyperLynx. sis tool are changing to support this transi- license for the HSPICE simulator; Version Then hit the simulate button and view the tion. Because tolerances are extremely tight 7.0 calls the HSPICE simulator from the results that are displayed in an eye diagram in all aspects of these interconnect systems, HyperLynx environment. Model assignment based on HSPICE results. the entire system must be simulated to and waveform analysis are the same as assure first-pass success of the PCB design. always, but behind the scene HyperLynx pre- Eye Diagrams In summary, consider a quote from an pares a SPICE netlist of the entire electrical For us digital designers, an eye diagram – as Intel® white paper on PCI Express, topology (including all coupling and loss ele- shown in Figure 3 – is nothing more than “...detailed simulation and validation are ments) and then invokes HSPICE, extracts what we have been used to looking at in a necessary to guarantee a successful design.” the results, and presents them in the native logic analyzer for years – data bits and high We at Mentor Graphics believe our latest HyperLynx environment. This includes any and low voltage levels. The difference release of the HyperLynx products will multibit stimulus patterns you have defined. between a logic analyzer and an eye diagram make this task easier and more productive. For example, if you want to simulate a is that in the eye, hundreds of bits are over- We wish you luck as you move into the Xilinx RocketIO™ multi-gigabit trans- laid on each other so that we can see if there gigahertz world of serial connectivity.

Spring 2003 Xcell Journal 19 FPGA Design 20 etrGahc oprto.AlRgt eevd etrGahc n PAAdvantage areregistered trademarksofMentorG ©2002 Mentor GraphicsCorporation. All Rights Reserved.MentorGraphics andFPGA Concept to Silicon video seminar at seminar video Silicon to Concept our View want. FPGA-on- you doors any and open to able be you'll property and support intellectual customer award-winning Mentor’s in Add systems, board. embedded synthesis, verification, creation, design in challenges your handles suite full inte- Our designs. edge cutting today’s handle fully to power the you gives that a solution FPGA grated has Graphics Mentor Only performance. and speed maximize to technologies and tools robust more FPGA In the world of FPGA, the only constant is change. As gate counts soar into the millions, designers require designers millions, the into soar counts gate As change. is constant only the FPGA, of world the In www.mentor.com/fpga So are the tools to handle it. handle to tools the are So here. is complexity FPGA Thresholds. New raphics Corporation. Could Microprocessor Obsolescence Be History? Embedding soft processors in FPGA fabric offers a radical but robust new solution that effectively eradicates the problem of processor obsolescence.

by Karen Parnell Product Marketing Manager, Automotive Xilinx, Inc. [email protected]

Your biggest obsolescence problem is out-of- date microprocessors and microcontrollers. Processors have shorter life spans than ever and are often discontinued at short notice, victims of fluctuating consumer market trends and the demand for ever-greater speed. In addition, obsolescence is deliber- ately built into such items as consumer products, encouraging microprocessor manufacturers to abandon existing processors in pursuit of planned platform introductions, thus propagating a ripple effect of obsolescence. Even for designs coded in “C” (tout- ed as being “portable code”), there are always architecture-specific instructions and features that hamper the changeover from the obsolete processor to the next-generation device. This changeover problem is exacer- bated by different package options and I/O configurations, which can necessitate a complete board re-spin. Soft processor cores offer an extremely attractive alternative to traditional approaches toward the problem of obsolescence, all of which are time-, labor-, and cost-intensive, and waste legacy development efforts.

Spring 2003 Xcell Journal 21 A Look at Traditional Approaches Inserting a new processor along with Off-the-shelf solutions often merely sub- The predicament of automotive telematics software written to emulate the old one is, stitute new headaches for old ones. Let’s say equipment designers, for example, is rep- at present, better in theory than in reality – you need a processor with 10 UARTs, an resentative of the problem. Although although it’s an appealing concept that interrupt controller, and access to a block of design and development time scales have does have some operational history. This external flash memory. Although there are shrunk from five years down to two, pro- option preserves the legacy software, so the many off-the-shelf processors offering mul- duction is measured in years and the prod- process is relatively cheap and fast. But tiple UARTs and other desired peripherals, ucts remain active in the field for another once again, the solution is not permanent. they typically have numerous other periph- 10 years plus. If we imagine a scenario in If the system has a long projected life, a erals that would go unused in your system. which every Electronic Control Unit new solution might have to be repeated So, not only are you paying for the addi- (ECU) in a car contains at least one every few years. tional peripherals, but you often have to processor, and that place unused peripher- every car contains up to als into a safe mode or 60 ECUs, this translates otherwise disable them to a major problem via software. every time a processor is Decommissioning obsoleted at relatively unused peripherals short notice. MicroBlaze MicroBlaze places an additional 900 in in Designers rely on Virtex-II Virtex-II Pro burden on your soft- several solutions to deal ware design team, who with processor obsoles- not only have to make cence. The applicability the processor peripher- Logic Cells PicoBlaze als operate correctly, of any given solution 140 in depends upon a num- Spartan-II but now have to write ber of variables: the code for the parts of value of the application the processor that are software, the projected not used. Clearly, life of the system, and 76 MHz 125 MHz 150 MHz purchasing an off-the- shelf solution in this the amount of time and Frequency money available to solve scenario would be highly wasteful, not the problem. Figure 1 - Xilinx PicoBlaze and MicroBlaze soft processors The most expensive only in terms of initial solution is to redesign cost, but also in wasted the system around a new processor. More important, software emulation is engineering time during the design process. Depending on the volume of the code, a inherently a serial process. Because so redesign can cost hundreds of worker- much performance is consumed running The Soft Processor Solution years, much of it devoted to validation and the emulation rather than the applica- The soft processor solution eradicates testing. Not only is the huge investment in tion, it is slow. It has been shown empiri- processor obsolescence and preserves many debugging and refining the existing soft- cally that emulation requires, on ware lost – a clear case of throwing the average, about 20 clock baby out with the bath water – but the cycles of the new solution is temporary at best. If the system processor for every has a long projected life, the same problem legacy instruction will recur every few years. it executes. Another solution is the Last Time Buy In addition, (LTB), which, on the surface, appears to be emulation breeds the most cost-effective option. The prob- further obsoles- lem with this approach is that the designer cence, because the must guess how much product to buy for processor used as the the life of his program. If he guesses wrong, emulation engine will he is faced with an even more difficult itself become obsolete and problem: a larger legacy investment that may force an entire rewrite of must somehow be upgraded. the emulator.

22 Xcell Journal Spring 2003 MicroBlaze Development Board Part Numbers The MicroBlaze marizes the specifications for the two 32-bit soft proces- processor cores. Avnet (Silica) sor core is the Selected Xilinx FPGAs in the new IQ ADS-SP2-MB-EVL MicroBlaze/Spartan-II Evaluation Kit industry’s fastest Solutions range have been qualified to soft processing solu- operate over the -40ºC/-40ºF to ADS-VE-MB-EVL MicroBlaze/Virtex-E Evaluation Kit tion. It runs at 150 125ºC/257ºF temperature range and are ADS-VE-MB-DEV MicroBlaze/Virtex-E Development Kit MHz and delivers designed for use in automotive applica- 100 D-MIPS. It tions, such as telematics systems. Memec features a RISC DS-KIT-MBLAZE-V2 Virtex-II Embedded Development Kit architecture with Conclusion Nohau Harvard-style, sepa- Embedded in FPGA fabric, Xilinx soft rate 32-bit instruc- processor cores such as MicroBlaze and EMUL-MicroBlaze-PC High-Speed Debugger for MicroBlaze Soft Processor tion and data buses PicoBlaze can eradicate processor obsoles- running at full cence issues by providing a stable platform Table 1 - EDK Development boards speed to execute that is owned and configured by the programs and access designer. Used in combination with the years of legacy code and development. The data from both on-chip and external mem- new IQ Solutions extended temperature new approach is to own the soft processor ory. A standard set of peripherals is IBM range FPGAs, they are ideal for such appli- core and embed it in an FPGA host. Not CoreConnect™ bus- enabled to cations as automotive telematics. only can you port the core to multiple offer MicroBlaze designers compatibility Not only will you benefit from the flex- FPGA platforms, but you can also design and reuse. ibility, integration, and upgradeability the peripheral set to meet your exact design offered by programmable logic, you can requirements, thus eradicating architecture Xilinx EDK Solutions now take advantage of a processor tailored compromises and wasted peripherals. Xilinx Embedded Development Kits to your design needs that will not go obso- The Xilinx MicroBlaze™ soft processor (EDKs), including the soft processor core lete, and will ultimately save the time and gives you the luxury of a different approach. and a standard set of peripherals, are available money associated with costly redesign. Now, you can start with a processor core from Xilinx and its distribution partners For more information, visit these websites: and build the peripheral set to meet your (see Table 1). The kits include a complete set exact requirements. You’ve eliminated sili- of GNU-based software tools, including MicroBlaze Information con wastage because you implement only compiler, assembler, debugger, and linker. www.xilinx.com/microblaze/ what you need. You’ve reduced software MicroBlaze EDKs bought from Xilinx PicoBlaze Information design complexity because no code need ever and its distribution partners also include www.xilinx.com/picoblaze/ be written to disable unwanted processor development boards that support the functionality. Creating unusual processor Virtex™-E, Virtex-II, Spartan™-II, and Automotive IQ Solutions configurations – which can be changed at Spartan-IIE series of FPGAs. Table 2 sum- www.xilinx.com/automotive/. any time to suit changes in the specification – is now a simple task. Even if after five or six years of field use, Soft Processor Architecture Bus MIPS/Speed Size FPGA Support Support when the FPGA hardware may itself be nearing the end of its life, the soft proces- MicroBlaze 32-bit RISC Harvard-style 100 D-MIPs 225 CLBs Virtex Embedded sor core can simply be dropped into its new buses Virtex-E Development Kit FPGA host, using the same “C” code. The 150 MHz Virtex-II (EDK) – soft processor 32-bit Virtex-II Pro core, peripherals, hardware platform may need some PCB instruction Spartan-II GNU-based software modifications but the legacy code remains and data Spartan-IIE tools (compiler, usable and intact. buses assembler, debugger, and linker) MicroBlaze and PicoBlaze Soft Processors Xilinx offers both a 32-bit MicroBlaze soft PicoBlaze 8-bit 8-bit address 35 MIPS 35 CLBs Virtex Free-of-charge processor core and an 8-bit PicoBlaze™ and data buses Spartan-II reference design soft core. The PicoBlaze processor runs at 116 MHz and application note, assembler speeds of 116 MHz, yet occupies a tiny footprint of just 35 configurable logic blocks (CLBs). (See Figure 1.) Table 2 - Xilinx soft processors

Spring 2003 Xcell Journal 23 by Jesse Jenkins, Applications Manager CPLD Business Unit Xilinx, Inc. ReduceReduce YourYour RISCRISC [email protected] Would you like to design your own microcontroller, but don’t want all the hassle of starting from scratch with either the withwith aa PicoBlazePicoBlaze architecture or the support software? Would you like to update older code from earlier microcontrollers to run faster and with the new memory standards? ReferenceReference DesignDesign Would you like to take the same code to substantially lower power dissipation?

Would you like to gain insight into Build your own soft microcontroller in a the inner workings of a microcontroller with in-depth simulation as well as code CoolRunner-II CPLD. It’s easy, fast, and free. creation support?

If the answer is yes to any of these questions, read on about the new Xilinx PicoBlaze™ microcontroller reference design. It’s here now, it’s fast, and maybe most important of all, it’s free. This article details the CPLD version of the PicoBlaze reference design, including where to get the application code, examples, and the cross assembler. With that under your belt, you are ready to start – but first, a little more detail.

PicoBlaze Explained The PicoBlaze soft microcontroller is an 8- bit design that supports an 8-bit data bus and 16-bit instruction bus. As you may have guessed, the PicoBlaze design is based on the RISC (reduced instruction set computer) “Harvard architecture” model with separate data and instruction ports. The PicoBlaze design is written in VHDL, and is intentionally documented so that the accompanying cross assembler directly tracks the architecture. The PicoBlaze version currently shipping supports 49 instructions that operate within any of several Xilinx CoolRunner™-II CPLDs. The speed will vary depending on exactly which instructions you wish to support and which version of the architecture you choose. For instance, with the full instruction

24 Xcell Journal Spring 2003 set and all instructions held outside the CPLD, you can expect to achieve about 30 Port 8 MHz performance. By streamlining either 8 PORT_ID Address READ_STROBE the instruction set or the program, you can 8 Control WRITE_STROBE 8 8 triple performance to 90 MHz. 8 Registers OUT_PORT 8 8-bit Naturally, the PicoBlaze microcontroller IN_PORT 8 8 8 architecture takes advantage of the two ALU key CoolRunner-II features – high-speed execution and low power consumption. Interrupt ZERO & Flag CARRY Store Add or Delete Instructions Interrupt Flags INTERRUPT Figure 1 shows the PicoBlaze base architec- Control 8 Program 8 ADDRESS ture, but don’t restrict your thinking to that Program Counter 8 16 CONSTANT DATA Flow architecture alone. Think of it as a starting INSTRUCTION Operational Control 8 Control & point. You are free to either add or delete RESET Program Instruction CLK Counter capability as you see fit. Decoding Stack For instance, you can trim instructions from the instruction set by merely commenting them out of the VHDL. If you Figure 1 - PicoBlaze architecture wish, you can also remove them from the assembler, but that is not required. You tents. The algorithm starts with a byte of pointers and counters and unroll it. can also add instructions, if you have data labeled A-H. This byte is first inter- In Figure 3, the “flip” instruction is some application that can take advantage nally swapped (four rotates), then succes- added to the VHDL, the design is recom- of essential instructions beyond those sively, inner bits are picked off with piled, and the processor is “rewired” to add currently supplied. Boolean AND/OR into a target register in this key instruction. This method col- It’s also possible to do both – cut some that will build up the results, two bits at a lapses many instructions down to some instructions and add some instructions. time. One pass through this results in the gate rewiring with the synthesis tools. Very For instance, most programmers use about final register with the original contents many bit-level operations boil down to 20 instructions in their day-to-day pro- reversed. Depending on algorithm details, simply rewiring the CPU, and best of all, gramming. Select the 20 you typically use, it can take approximately 12 to 18 instruc- the synthesizer does the work. Many other remove the rest, and then program. If you tions. In this case, we dispense with adding examples exist. See the links at the end of discover a bottlenecking “inner loop” that the overhead of loop management with this article. could benefit from a single instruction customized for that specific task, go ahead and write the VHDL that will do it at hardware ABCDEFGH Initial value 00010001 Mask 2 bits speeds. Remember, the PicoBlaze micro- 0HGF0DCB Or Int. Result controller uses DualEDGE flip-flops with- EFGHABCD Swap nibbles in the processor to accomplish 00010001 BCDEFGHA Right Rot Swap computation on both clock edges. Mask 2 bits 000H000D Or Int. Result HGF0 DCB0 Left Rot Int. Result A DSP Example Right Rot Swap To illustrate the ability of the PicoBlaze DEFGHABC 00010001 Mask 2 bits architecture to adapt, let’s look at an example from DSP. The code to bit-reverse a bus 00H000D0 Left Rot Int. Result HGFEDCBA OR Int. Result is a fundamental operation used in Fast 00010001 Mask 2 bits Fourier Transforms. The value is then typi- Final result is bit-reversed cally driven out on the address lines as a 00HG00DC Or Int. Result critical step in the base algorithm. To do this in “standard” instructions would take CDEF GHAB Right Rot Swap multiple “mask and rotate” commands, 0HG00DC0 Left Rot Int. Result creating a processing bottleneck. Figure 2 shows the basic operations in Figure 2 - Bit-reversal code steps assembler-like steps to display register con-

Spring 2003 Xcell Journal 25 counters or timers, interrupt handlers, and unused circuitry that they pay for but never ABCDEFGH DMA circuits. With PicoBlaze, just add the really use. Don’t fall into that trap. Select the right set of peripheral capability within the functions you will need and get the cheap- chip, depending on the density of the est, fastest, lowest power solution possible. CoolRunner-II CPLD chosen. Table 1 Note that most of the items listed in Table 2 shows the densities available in exist as separate reference designs and may CoolRunner-II CPLDs, and Table 2 gives be found on the Xilinx website. HGFEDCBA some estimates of macrocell usage for various add-on functions. Performance Improvement Figure 3 - Recoded instruction One very important thing to remember Getting the most out of your design will be for bit-reversal (“flip”) operation is that when choosing a function to add in, another step. A classic way to improve the select just the functionality actually needed, design is to “tune” it. Observe its perform- Processor Enhancement so you will get the best usage from your ance behavior, identify where the processor We just mentioned instruction set opti- choice. Bill Carter, one of the founders of is spending its time, discover what it is mization, but it’s also possible to add Xilinx, frequently comments that most peo- doing, and think through the best set of functionality. Remember that many micro- ple don’t really want or need a UART (uni- operations to improve. Then, implement a controllers include on-board function versal asynchronous receiver/transmitter), new version of the architecture and/or code blocks that have a payoff beyond the but only an “RT.” That is, they select a func- and evaluate it again. instruction set. For instance, many 8-bit tion that comes with 50 options, then only One easy way to do that is with the microcontrollers include internal peripheral use two. They end up carrying along lots of CoolRunner-II Design Kit (see Figure 4). Many target designs easily fit onto the 256- CoolRunner-II XC2C32 XC2C64 XC2C128 XC2C256 XC2C384 XC2C512 macrocell XC2C256 that resides on that board. There is also a blank pinout site for Macrocells 32 64 128 256 384 512 adding a 64 macrocell XC2C64, with sig- MAX I/O 33 64 100 184 240 270 nals already attached to the XC256. Simply construct a small hardware performance TPD (ns) 3.5 4.0 4.5 5.0 5.5 6.0 monitor that will time various code sections TSU (ns) 1.7 2.0 2.1 2.2 2.3 2.4 and report back the execution time. That TCO (ns) 2.8 3.0 3.4 3.8 4.2 4.6 way, by examining the behavior over address space and time, you can determine just how Fsystem1 (MHz) 333 270 263 238 217 217 much time is spent doing the various tasks. Table 1 - CoolRunner-II macrocell capacities and pertinent data Figure 5 shows a simple approach to doing this operation. With care, both

Function Macrocells IrDA and UAR/T 87 Timer/Counter 16 and up DMA Port 16-32 Manchester Encoder/Decoder 55 Wireless XCVR 156 16b/20b Encoder/Decoder 76 Flash NAND Interface 9 SPI Interface 135 UAR/T 61 DDR SDRAM Interface 128 SMBus Controller 158

Table 2 - Common functions and approximate CoolRunner-II macrocell usage Figure 4 - CoolRunner-II Design Kit

26 Xcell Journal Spring 2003 PicoBlaze Conclusions and Recommendations bits and others for 16? It’s up to you. (XC2C256) This introduction to designing PicoBlaze Application areas where these kinds of microcontrollers was written to stimulate processors make sense include industrial 8 Timer Out your imagination to discover the fascinat- control, low-power portable DSP (brain- Interrupt ing world of creating your own CPUs. waves, EKG, medical), and cryptography. 8 Counter In Once you start, it can be addictive. You can Did you know that most cryptographic Decode Performance Monitor easily alter the processor to be 16 or even operations are bit-level operations and (XC2C64) 32 bits wide – or even non-binary powers. almost never do floating point arithmetic? Then you will discover which instructions The PicoBlaze microcontroller reference 8 Instruction design for CPLDs has been built, tested, Address burn through the macrocells and what the Memory new speed limits will be. and is now available over the Internet, free Instructions 16 Do you need some instructions for 8 to the user.

Figure 5 - PicoBlaze performance monitor Acknowledgements Xilinx engineer Ken Chapman, who received additional support and encouragement CPLDs can communicate through a PC par- from Henk van Kampen at Mediatronix BV, developed the original reference design. allel port via their JTAG boundary scan The CPLD version was created by Scott Lien, who also wrote the PicoBlaze cross chains. Performance monitoring can help assembler and the application note that is on the Xilinx website. The VHDL and decide which aspects of the PicoBlaze design cross assembler (source and executable) links are shown in xapp387 (see below). to perform in software and which to embody in hardware. One beautiful thing about For more information, see the following: building functions out of programmable CoolRunner-II Design Kit: logic is that different experiments to focus on www.xilinx.com/products/cpldsolutions/demoboard.htm (purchasing details) specific performance targets are easily developed – giving you highly tuned, fast designs. CoolRunner-II Application Notes And you don’t have to worry about power www.xilinx.com/xapp/xapp375.pdf (timing model) enhancement. CoolRunner-II CPLDs are www.xilinx.com/xapp/xapp376.pdf (logic engine) already the lowest power CPLDs available www.xilinx.com/xapp/xapp377.pdf (low-power design) today, and PicoBlaze is a very competitive www.xilinx.com/xapp/xapp378.pdf (advanced features) low power microcontroller. www.xilinx.com/xapp/xapp379.pdf (high-speed design) www.xilinx.com/xapp/xapp380.pdf (cross point switch) PicoBlaze Cross Assembler www.xilinx.com/xapp/xapp381.pdf (demo board) As mentioned before, the PicoBlaze cross www.xilinx.com/xapp/xapp382.pdf (I/O characteristics) assembler is so well documented that a www.xilinx.com/xapp/xapp383.pdf (single error correction, double error detection) direct correspondence between assembly www.xilinx.com/xapp/xapp384.pdf (DDR SDRAM interface) code and VHDL in the PicoBlaze design file already exists. The translator is written www.xilinx.com/xapp/xapp387.pdf (PicoBlaze microcontroller) in ANSI-C and is assembled on Microsoft www.xilinx.com/xapp/xapp388.pdf (on-the-fly reconfiguration) assemblers. The cross assembler is highly www.xilinx.com/xapp/xapp389.pdf (powering CoolRunner-II CPLDs) transportable and supports multiple output CoolRunner-II Data Sheets file types. For instance, it produces a bina- www.xilinx.com/bvdocs/publications/ds090.pdf (CoolRunner-II CPLD family data sheet) ry output file ready to load into external www.xilinx.com/bvdocs/publications/ds091.pdf (XC2C32 data sheet) EPROM in Intel hex format. www.xilinx.com/bvdocs/publications/ds092.pdf (XC2C64 data sheet) The cross assembler also produces the www.xilinx.com/bvdocs/publications/ds093.pdf (XC2C128 data sheet) essential modeling files for the VHDL simulator. You can instantly analyze your pro- www.xilinx.com/bvdocs/publications/ds094.pdf (XC2C256 data sheet) duced code with high-speed simulations to www.xilinx.com/bvdocs/publications/ds095.pdf (XC2C384 data sheet) determine the functionality and effective- www.xilinx.com/bvdocs/publications/ds096.pdf (XC2C512 data sheet) ness of the code you have implemented. CoolRunner-II White Papers Then download the code into the www.xilinx.com/publications/products/cool2/wp_pdf/wp165.pdf (chip scale packaging) CoolRunner-II Design Kit and see that www.xilinx.com/publications/whitepapers/wp_pdf/wp170.pdf (security) they actually do work as expected.

Spring 2003 Xcell Journal 27 Helping To Make Your CPLD Designs Better... CoolRunner-II Development Board Available from Nu Horizons

Being able to get your device to market in real time is the name of the game. The CoolRunner-II Evaluation Board from Nu Horizons is a very flexible testing platform that allows engineers to evaluate the XC2C256 low power CPLD from Xilinx in a typical environment.

What sets this board apart from other boards is its robust features and the XC9572XL that allows the engineer to interface 5V signals to the XC2C256. All at a cost of only $99.

For more information on the Nu Horizons CoolRunner-II 256 Macrocell Evaluation Board, including features, benefits, schematic and users manual, please visit us on-line at www.nuhorizons.com/cr2

If you are new to designing with CPLDs, and would like more information on how these devices can benefit your application, please visit our dedicated Xilinx CPLD Solution Center for a complete source of Xilinx CPLD products and support software. www.nuhorizons.com/CPLD

1-888-747-NUHO Xilinx field programmable XilinxXilinx FPCs FPCs Target Target controllers combine the power of the 32-bit MicroBlaze Cost-SensitiveCost-Sensitive soft processor cores with the versatility of Spartan-IIE FPGAs to deliver the greatest ApplicationsApplications number of I/Os for low-cost processing solutions.

by Helen Yu Processor Solutions Marketing Manager Xilinx, Inc. [email protected]

Staying ahead of the competition is getting tougher everyday. Cost pressures, changing standards, and device obsolescence are just a few of the challenges. To maintain leadership, you need a low-cost, competitive processing solution that’s customizable throughout the entire design cycle and can quickly be brought into high-volume production. The Xilinx field programmable controller (FPC) solution allows you to create low-cost, customized processors with the peripherals, memory, and logic you want – all on a single, cost-optimized Spartan™-IIE FPGA. With the flexibility to allow integration of other intellectual property (IP) cores on the FPGA fabric, the Spartan-IIE family presents an ideal embedded solution.

Spring 2003 Xcell Journal 29 Traditional Design Issues columns of block RAM. A power- The introduction of embedded soft ful hierarchy of versatile routing processors has offered substantial DLL DLL channels interconnects these func- benefits to the world of digital elec- tional elements. Figure 1 shows a tronics. The industry’s huge block diagram of a Spartan-IIE appetite for increasingly intelligent FPGA device. CLBs CLBs and sophisticated control systems This flexible platform is an has dictated the rapid advancement BLOCK RAM BLOCK RAM ideal base for implementing a of processor technology. Perhaps controller system. An embedded most prevalent of all is the huge leap microcontroller takes the concept in demand for embedded micro- of integration one stage further by controllers and microprocessors. permitting you to embed the con- If you are using a traditional troller system into a small section microcontroller unit (MCU) or of a programmable device. No CLBs CLBs microprocessor unit (MPU) in your longer does the microcontroller BLOCK RAM application, selecting the proper BLOCK RAM have to exist in a stand-alone device is one of the most critical package; it can now be embedded decisions that will ultimately deter- DLL DLL deep within custom hardware. mine the success or failure of your Spartan-IIE FPGAs are cus- design. Typically, you must address I/O LOGIC tomized by loading configura- a number of issues, including: Figure 1- Spartan-IIE FPGA block diagram tion data into internal static • Is the MCU or MPU affordable? memory cells while permitting Does it minimize the overall cost unlimited reprogramming cycles of the system while still fulfilling the The enhanced integration of embedded to become a viable upgrade path for design specification? applications means fewer interface issues, future product enhancements. Therefore, and processor-based designs developed in Spartan-IIE FPGAs are ideal for shorten- • Does the unit have the required number the FPGA can more easily be updated ing product development cycles while of I/Os? Too few can’t do the job; too without changing the PC board. offering a cost-effective solution for high- many can lead to excessive cost. volume production. • Does the device have all the required Spartan-IIE FPGAs In addition, the Spartan-IIE family peripherals? Can you add your own? The Spartan-IIE 1.8V family of FPGAs delivers a cost-effective platform with Does the unit include peripherals you achieves high-performance, low-cost opera- high numbers of I/Os to provide excellent don’t need? tion through advanced architecture and the I/O expansion (up to 514 user I/Os). • Are you paying for unneeded IP? latest semiconduc- Table 1 compares the number of I/Os in tor technology. The two traditional microcontrollers against • Will the MCU or MPU be available seven-member fam- the number of I/Os delivered by two over the long term? Will the processor ily (see “Spartan-IIE Spartan-IIE FPGAs. become obsolete? Family Grows” in These questions are just some of the cri- this issue) offers MicroBlaze Soft Processor teria to consider when choosing a tradi- densities ranging The MicroBlaze 32-bit RISC soft processor tional MCU or MPU for your application. from 50,000 to 600,000 system gates. is a true 32-bit processor, supporting 32-bit The Xilinx FPC solution, however, makes Spartan-IIE devices also provide system bus widths. many of these concerns irrelevant. clock rates beyond 200 MHz. The core is a The Spartan-IIE FPGAs have a flexible, RISC-based The FPC Solution programmable architecture of configurable engine with a 32-bit LUT RAM-based reg- Using the 32-bit MicroBlaze™ soft logic blocks (CLBs), surrounded by a ister file with separate instructions for data processor core in the Spartan-IIE FPGA, perimeter of programmable input/output and memory access. the FPC offers a true low-cost solution for blocks (IOBs). There are four delay-locked The MicroBlaze soft processor supports real-time processor control. The combina- loops (DLLs), one at each corner of the die. both on-chip block RAM and external tion of a high-performance soft processor Two columns of block RAM lie on oppo- memory. All peripherals use the same IBM and a low-cost FPGA enables you to rap- site sides of the die, between the CLBs and CoreConnect™ OPB bus as the IBM idly develop programmable systems for the IOB columns. The XC2S400E has four PowerPC™ processor – which means the cost-sensitive applications. columns and the XC2S600E has six processor peripherals are hardware compat-

30 Xcell Journal Spring 2003 ible with the PowerPC processors on concentration, requiring only the

Virtex-II Pro™ FPGAs. SPARTAN-IIE FPGA most minimal maintenance and The MicroBlaze embedded sys- supervision. tem, including the MicroBlaze core and selected processor peripherals, is Consumer shown in Figure 2. Several peripher- Walk into any electronic store als are available to support the and you will find a host of prod- MicroBlaze processor, including ucts that use some kind of micro- memory controllers, UART, GPIO, controller: MP3 players, video I2C, 10/100 Ethernet MAC, and recorders, Web tablets, televi- ILMB DLMB many more. On-chip Local Memory sions, plasma displays, set-top The MicroBlaze core offers the boxes, refrigerators, washing flexibility and scalability of embed- Add/Sub machines, telephones, answering Program Shift/Logical ded processor programmable logic Counter machines, ovens, toasters, print- devices. The MicroBlaze processor Multiply ers, and scanners all offer added requires less than half the logic functionality through the use of a Instruction resources yet offers more than twice Decode microprocessor. the performance of competing soft Instruction Register File Buffer 32 x 32b Data Side Bus Interface The Benefits of FPCs

processors, as measured by industry- Instruction Side Bus Interface standard Dhrystone-MIPS (D- MicroBlaze Soft Processor The Xilinx FPC solution com- MIPS) benchmarks. Delivering 49 bines the low-cost Spartan-IIE Watchdog Interrupt D-MIPS of performance at 75 UART FPGA family with the compact, Timer Controller MHz, the MicroBlaze processor IOPB DOPB high-performance MicroBlaze occupies only 1,050 logic cells in the 32-bit RISC processor core. You

Off-chip General Off-chip Spartan-IIE FPGA. Timer Counters also get complete embedded sys- Memory Purpose I/O Memory Interface Interface tem tools (EST) support, includ- Processor Peripherals FPC Applications ing GNU compiler and As shown in Figure 3, FPCs have debugger; hardware and software Off-chip Off-chip significant applications in the tra- Memory MicroBlaze Embedded System Memory development tools for imple- ditional 16- and 32-bit microcon- mentation, simulation, and veri- troller and microprocessor markets, Figure 2 - MicroBlaze embedded system fication; and more than 30 fully which include automotive, indus- parameterizable processor IPs. trial, and high-end consumer Industrial Overall, FPCs offer a low-cost, applications. No longer must large workforces be trained high-performance, and easy-to-use solu- to monitor a specific area of a production tion. Among the benefits of FPCs are: Automotive plant. Today, a series of microcontroller- • Reduced costs – By integrating your The modern automobile is replete with based monitoring modules, often linked to design onto a single device, you not only microcontroller-based systems providing a central station, replaces these human save time and effort, you also reduce automated control for just about every counterparts. Microcontrollers work tire- your overall costs. Spartan-IIE FPGAs conceivable part of the car. Braking sys- lessly around the clock without lapses in are the lowest cost programmable tems use microcontrollers to deliver advanced safety features, such as ABS and traction control. Windshield wipers are Traditional Microcontrollers # I/Os controlled to bring us timed interval NEC V850E/IA2 (32-bit microcontroller) 100-pin package wipes and even rain-sensitive automatic wiper activation. Heating controls for the Motorola MC68HC912B32 (16-bit flash microcontroller) 80-pin package vehicle interior monitor multiple zones within the passenger compartment, auto- Spartan-IIE FPGAs # I/Os Advantages matically adjusting the supply of air to S-IIE XC2S600E 514-pin package 5x more I/O than NEC, 6x more than Motorola maintain the desired temperature. Seats even remember the favored positions for S-IIE XC2S300E 329-pin package 3x more I/O than NEC, 4x more than Motorola different drivers of the car and readjust themselves automatically. Table 1 - Comparison of Spartan-IIE devices and traditional microcontrollers

Spring 2003 Xcell Journal 31 logic devices you can get. They MP3 Player also allow you to integrate costly FLASH board-level features such as DLLs, DRAM Bluetooth RAM, a variety of I/O translators – Spartan-IIE Module User as well as MicroBlaze, the industry’s Interface Adapter/ System Control fastest soft processor – into a single, DRAM Logic & Display Clock Controller LCD Display compact, low-cost platform. Interface Distribution IrDA • More user I/Os – Spartan-IIE FPGA I2C MP3 Decoder USB devices contain as many as 514 user Interface MicroBlaze GPIO I/Os, with more than 70% additional

Audio DAC capacity, and the lowest cost per I/O than competing FPGAs in the same density ranges. This competitive Web Tablet advantage allows you to integrate AC97 more features and also shrink the

Speakers form factor for each device.

AC97 Codec Spartan-IIE • Customization – You can create a customized controller and peripheral RTC I2C Storage Controller HDD set to meet your exact and evolving

UART Display Controller design requirements. Unlike other Touch-screen Display SRAM solutions, with FPCs you are no Memory Controller SDRAM MicroBlaze longer locked into a rigid, pre-selected PCI USB 2.0 ADC Host set of peripherals – or have to pay for unused function sets. In addition, Xilinx offers more than 30 processor PCMCIA IPs to choose from. Plug-in Card • No obsolescence – Xilinx allows you to purchase the MicroBlaze soft processor Flat Panel Display core source code. This option guarantees product availability for any application you choose. You can also port Spartan-IIE the core across Xilinx product lines – Display Timing Digital Clock Gen and even target an ASIC device. Generator Video MPEG DLLs DVD Decoder MicroBlaze Memory SDRAM Conclusion Controller FPCs from Xilinx deliver the highest I/O De- for low-cost processing solutions. They cre- Analog Video I/F interlacing/ Color Video Video RGB to Scaling Space ate a customized controller and peripheral Decoder YUV 4:2:2 Filters and Converter Buffers sets to meet your exact – and evolving – Y/Red Y/Red Green Green design requirements. Freed from a fixed set UV/Blue UV/Blue Display of peripherals, you now have the power to Output VLUT Formatter DAC custom tailor your peripheral set to meet Gamma your needs. In addition, the opportunity to Correction purchase the MicroBlaze structural VHDL source code assures you of product avail- Legend ability well into the future. FPCs reduce your overall design cost and inventory

Xilinx Memory CPU Non-Xilinx Mixed Signal Embedded Logic while bringing you the highest performance from your logic devices. For more information on the FPC solu- Figure 3 - FPC applications gallery tion, visit www.xilinx.com/fpc/.

32 Xcell Journal Spring 2003 Memec Lead by Design Training Embedded Processor Solutions Workshop During this hands-on training course, learn how to use the new Embedded Development Kit software to architect processor-based Xilinx solutions. Easy-to-follow lab exercises provide practical experience with: • Defining your hardware system • Adding peripherals • Writing application code • Performing simulation • Implementing the design • Debugging (in-system) A MicroBlaze™ and Virtex-II Pro™ PowerPC™ version of the course focuses on the details of the processor development flow. During the MicroBlaze class, you will experiment with the new Memec Design Spartan™-IIE LC Development Kit; likewise, the Virtex-II Pro PowerPC course allows you to explore the features of the Memec Design Virtex-II Pro Development Kit. Exclusive to Attendees: Specially Bundled Kits at Reduced Pricing! With special kit pricing, you can continue your development efforts with the same hardware and software you used during the hands-on labs.

Virtex-II Pro PowerPC Development Kit Spartan-IIE MicroBlaze Development Kit Explore the power of an embedded PowerPC processor, The ideal starter kit for developing MicroBlaze-based applications. Rocket I/Os, and the most advanced FPGA architecture available. • Spartan-IIE LC Development Board • Virtex-II Pro Development Board • Programming Cable • Programming Cable • Power Supply • Serial and MGT Loop-back Cables • Xilinx EDK Software • Power Supply • Xilinx ISE BaseX Software • Xilinx EDK Software • Documentation and Reference Designs • Documentation and Reference Designs

What Are You Waiting For? Classes take place in March and April, in locations throughout North America. Sign up now for the Embedded Processor Solutions Workshop nearest you.

Call 800.488.4133 ext.980 or register at www.insight.na.memec.com/embedded_workshops. DesignDesign EmbeddedEmbedded ProgrammableProgrammable SystemsSystems WithoutWithout CompromiseCompromise

With the new Xilinx Embedded Design Kit, you can easily develop programmable, embedded, hardware/software solutions that deliver optimum results – and you can do it faster than ever.

by Ravi Pragasam market windows and changing industry pling ensures that the hardware platform Marketing Manager, Embedded Processor Solutions standards, on the other, often necessitate you create will be one a software platform Xilinx Inc. compromises in performance. The result can match. [email protected] is often failure to meet original specifica- The EDK also enables you to rapidly tion goals. What has been lacking is a fast, match the programming environment and The trouble with embedded design up to effective way to integrate HW and SW software capability to available hardware now has been that neither ASICs nor flows to produce a solution that brings out resources. ASSPs – the traditional choices for embed- the full advantage of embedded design. For the software designer, the greatest ded solutions – offer an ideal fit. In an The new Xilinx Embedded Dev- advantage afforded by PSF is access to an ASIC flow, hardware (HW) limitation elopment Kit (EDK) bridges the gap embedded tool that allows HW/SW co- problems discovered late in the design cycle between HW and SW flows by providing design, and which also interfaces with must be dealt with in the software (SW). a single HW/SW design environment. industry standard software tools, such as This drawback is becoming increasingly Fusing the two early in the design flow RTOS support from Wind River Systems, critical now as physical geometries shrink, enables you to deliver an embedded pro- Linux support from MontaVista Software, and as design rules and chip fabrication grammable system that meets your and popular open source embedded tools processes evolve. design specifications on time – without like GNU. Likewise, ASSPs are also less than perfect compromising performance. because you must design your solution PSF Benefits around standard elements, which often A New Era of System Design • The PSF overcomes many of the most impose limitations. In addition, the non- The EDK allows you to define the hard- common problems of the traditional standard elements of the ASSP usually don’t ware and software platforms of your system “over-the-wall” approach to HW/SW add enough functionality to deliver a solu- using powerful tools based on the Platform interface definition and integration. tion that sets it apart from the competition. Specification Format (PSF). PSF is an open With the PSF, you can give a hardware Typically, the resulting design has too much format that provides an abstraction layer platform specification to a firmware or functionality here, and not enough there. between the HW and SW sections so they software engineer without having to Long design cycles involving extensive can be tightly integrated during the defini- wait until the hardware is available. simulation, on the one hand, and narrow tion and development process. This cou- The firmware and software developer

34 Xcell Journal Spring 2003 can then configure the SW platform Conclusion • Embedded system tools with full knowledge of the HW plat- With key products, such as MicroBlaze form – with which devices are - Xilinx Platform Studio 32-bit soft processor cores and Virtex-II Pro attached, memory maps, register defi- - Tools for specifying the hardware and FPGAs, Xilinx offers a complete solution nitions, and so on. software platforms based on PSF that resolves many of the design challenges presented by more traditional tools and • Offering a single-environment approach - Xilinx microprocessor debug (XMD) technology. With the Virtex-II Pro device, for HW/SW platform configuration, - GNU tools for MicroBlaze and hard Xilinx has ushered in a new era of system PSF enables a rapid re-architecture of the embedded PowerPC cores in Virtex-II design in which SW/HW flows blend to embedded programmable systems design. Pro FPGAs (compiler and debugger) take advantage of programmability and This enables you to add to, or subtract high-performance features – such as the from, HW resources with matching SW - Support for simulation tools multi-gigabit serial transceivers available in in the modified architecture easily. - System generator for processors the silicon – in a way that enables solution • The unified environment, common bus (beta version) providers to get ahead in their markets structure, and common core libraries - Board Support Package (BSP) without making performance compromises. enable you to easily use a Xilinx generator Working within a common HW/SW MicroBlaze™ core, IBM PowerPC™ tool environment, you can rapidly con- • Interface and infrastructure IP cores processor in a Virtex-II Pro™ FPGA, or struct a custom processor system consisting both, in a single design. - Arbiters of processor, peripheral cores, and interconnect bus in a Xilinx FPGA. You can also • Multi-core, homogeneous (multiple - Memory controllers for external integrate your own custom IP cores into MicroBlaze cores or multiple PowerPC memory interfaces the processor system. processors), or heterogeneous (mixed - More than 40 standard IP cores (such And now, with the EDK, Xilinx has PowerPC and MicroBlaze cores) in a as UART, GPIO, and timer/counter) enabled the process of HW/SW co-design single design (multiple bus masters), - Evaluation version of high-value that has long been a dream of embedded are also supported. cores (such as 10/100 EMAC, single system designers everywhere. The EDK The PSF is a one-of-a-kind specifica- channel HDLC controller, and serial (Part No. DO-EDK) is available now. For tion format for both HW and SW. ATA L2) more information and updates to the Because it is an open format, it is available • MicroBlaze 32-bit soft processor core EDK, visit www.xilinx.com/edk/. for adoption by customers and third par- For more information about Xilinx ties, enabling them to integrate custom • Reference designs and examples embedded processor solutions, visit peripherals, third-party IP cores, and • Evaluation versions of Wind River and Processor Central at www.xilinx.com/ tools. PSF provides a common tool chain ISE 5.1i software tools. processor/. for the HW/SW platform specification, generation, development, and debug, while supporting multimaster and mixed architecture designs. The Xilinx EDK and ISE 5.1i logic design tools provide you with the technology to help you achieve your performance requirements – faster and better than ever.

EDK Contents The EDK includes the Xilinx Platform Studio (XPS), an all-encompassing design environment that permits you to define, configure, and generate a custom hardware and matching software/programming environment for either a stand-alone or RTOS- enabled programmable system. Hardware, software, and firmware developers who need to work in both domains can use the XPS integrated design environment.

Spring 2003 Xcell Journal 35 DesignDesign toto WinWin withwith IntegratedIntegrated FPGAFPGA andand MicroprocessorMicroprocessor SolutionsSolutions Using “software-compiled system design” for programmable systems, we show how you can combine software and hardware design methodologies – and development tools – fromfrom system-levelsystem-level specificationspecification toto directdirect implementationimplementation andand run-timerun-time reconfiguration.reconfiguration.

by Chris Sullivan Director of Strategic Alliances Celoxica, Ltd. [email protected]

As FPGAs have developed from logic prototyping devices into fundamental system Existing design examples that combine time to market. Starting at the system level, elements, there has been enthusiasm for the Xilinx FPGA solutions with development verification becomes a whole-design life concept of using high-performance proces- tools from Celoxica and Wind River cycle activity, and by enabling system-level sors closely coupled to or immersed inside Systems already provide unique and tangi- partitioning, you can realize a better quality the FPGA fabric for applications that ble proof that this concept and design flow of design (QoD) – right the first time, more require unrivalled levels of performance works. They form a core element of proof the time. and flexibility. grammable system co-design, delivering a In this architecture, the microprocessor quick, efficient, and verifiable route to A Design Example typically runs system applications while the device-optimized implementation. An early design example – developed by FPGA manages computationally intensive “Software-compiled system design” pro- Celoxica, Wind River, and Xilinx – focused tasks. Offloading processor-intensive tasks vides the capability to drive partitioning, on the design methodology, tools, and run- to hardware reduces the load on the proces- co-verification, and direct implementation time environments that can be applied to sor and delivers greater bandwidth. It can from the system specification. Moreover, it programmable systems. Specifically, we also remove bottlenecks by migrating algo- allows engineers to jump-start their system developed a triple-DES encryption and rithms to hardware. In short, FPGAs have and software application development decryption engine to compare a program- evolved into fully programmable systems before actual hardware is available, thereby mable system solution with an alternative and fast co-processors, rather than just flex- enabling concurrent design, saving valuable software implementation. A compressed ible relations of the ASIC. development effort and delivering the best video stream formed the basis of test data.

36 Xcell Journal Spring 2003 Secondary PCI Host Primary PCI PMC #1

PCI-PCI Bridge Figure 1 - PPMC750 and RC1000 connected and functioning as a prototyping platform PMC #2 for programmable systems

PLX PCI9080 Clocks & Hardware PCI Bridge Control The selected hardware architecture was Xilinx initially based around a discrete IBM SRAM BANK BG560 512Kx32 e.g. 4085XL PowerPC™ processor and a Xilinx SRAM BANK 40150XV Virtex™ FPGA – effectively a first- 512Kx32 V1000 generation Virtex-II Pro™ prototyping SRAM BANK V2000E V812E platform (Figure 1). We used a PPMC750 512Kx32 SRAM BANK single-board computer from Wind River Isolation 512Kx32 and Celoxica’s RC1000 – a Virtex-based Isolation PCI card with a Xilinx FPGA and 8 Mb of +3v3/+2v5 Linear local memory (Figure 2). Regulator Auxiliary I/O Subsequently, we deployed a newer Header reference platform using the PowerPC 405GP processor (Figure 3). In addition Figure 2 - Above, the RC1000 is a Celoxica PCI slot-based card, which has a to PCI and PMC (PCI mezzanine card) secondary bus on-board for connecting other PMC based FPGA-cards as well as the existing FPGA on-board. Below the board is a schematic of the RC1000. connectors, this platform also featured a custom connector that allowed an FPGA daughtercard to be plugged directly onto The design platform is completed Nexus and DK the processor peripheral bus, thus provid- by a simple DAC interface, enabling Nexus is a powerful co-design environment ing even closer coupling, lower latency, the FPGA card to drive a video monitor for programmable systems. It supports and higher throughput. or a flat-panel LCD for standalone system partitioning, co-verification, and Various FPGA daughtercards can be demonstrations. co-simulation. Nexus allows you to fully used with this reference platform, such as the ADM-XRC from Alpha Data Development Tools Systems, Xilinx Durango, or the Proteus The 405GP processor runs Wind River’s IBM PowerPC microprocessor card from Wind River. Xilinx VxWorks™ real-time operating system Virtex-II FPGA Wind River’s Proteus card is equipped (RTOS), together with hardware bring-up with a Xilinx Virtex device and memory tools that allow close control of the boot includes 4 MB on-board SSRAM. The cycle for the board during the time period FPGA PMC can interface with any stan- before control is passed to the RTOS. dard PMC slot (with an image containing The PAVE Framework API from a PCI soft core) or the microprocessor Xilinx was used to program the FPGA local bus on Wind River’s SBC405GP with configuration files. single board computer. There is a sub- Determination of the system partition Proteus FPGA Daughtercard stantial performance boost from direct and application content for the FPGA Wind River SBC405GP processor bus connection, compared were developed using Celoxica’s Nexus co- with PCI. design environment and DK Design Suite. Figure 3 - Virtex-II prototyping platform

Spring 2003 Xcell Journal 37 explore the design space to identify calling software thread are handled trans- optimal system partitioning. System a = 3; a = 3 parently by the hardware side of the functionality can be simulated par DSM. { between hardware and software using a = 7; a = 7 b = 4 On the software side, there are two multiple languages such as C, C++, seq main parts to the DSM: the control/set- Handel-C, SystemC, and HDLs. { up phase and then the specific usage of b = 4; These models can be used throughout c = 5; c = 5 the custom hardware function. the design for co-verification. Nexus } Essentially, information about the FPGA } communicates directly during simula- configuration and available functions are a = b + c; a = b + c tion with popular, third-party, hard- retrieved by reading a memory- ware RTL simulators and software ISS Enhanced bit manipulation mapped register. User-defined environments. Preprocessor macros Parallelism - par{ } identifiers (called function Using DK, the resulting code may be ANSI-C Standard Library ie #define Structures Macro procedures addresses) are assigned to ANSI-C Constructs Macro expressions debugged using a familiar integrated Functions each available hardware Side Effects for, while, if, switch Arbitrary width variables development environment (IDE), and ie x = i++ * y++; Arrays Arithmetic Operations function, and these applications are compiled direct to the Pointers ie +, -, *, /, % Interfaces function addresses are Recursion FPGA fabric using device-optimized Bitwise Logical Operators RAM & ROM later used to commu- ie ~, ^, &, | synthesis. VHDL and Verilog output Signals nicate between the Floating Point Logical Operators is also supported for traditional RTL ie &&, ||, | Channels application software and synthesis. Handel-C Standard Library the functions implemented in hardware. Handel-C ANSI-C Handel-C Using this methodology, the We selected the Handel-C language for optimal system partition can be identi- hardware implementation, as it provides a fied by porting blocks of software to common level of abstraction and a com- Figure 4 - ANSI-C/Handel-C comparison Handel-C, for hardware prototyping, test- mon language base for both the hardware ing, and verification. DSM’s portability and software. The language has simple means that multiple partitions can be rap- extensions to ANSI-C (Figure 4) that can redevelop individual communications pro- idly evaluated, tested, and verified with the be leveraged to quickly create applications tocols and data-marshalling routines for software used as a testbench throughout. that fully exploit the capabilities of a pro- each application. This problem is overcome DSM also provides a functionally accu- grammable system, without compromising by using DSM (Data Streaming Manager), rate simulation environment that allows performance or area. a portable co-design API developed for ANSI-C programs and Handel-C applica- As a fully synthesizable language, every- hardware/software integration in program- tions to interact using the DSM (Figure 7). thing that can be described in Handel-C has mable systems. The ANSI-C program is run as a native exe- translation to hardware (Figure 5). The code cutable on the PC. The Handel-C applica- illustrates concepts and extensions, such as Data Streaming Manager tion is run using the simulation and par, chan, synchronization, functions, DSM is a portable hardware/software co- debugging capability of Celoxica’s co-design pointers, structures, interfacing, and extern- design API that offers a simple and trans- environment. A utility is provided through ing pure C functions for simulation. parent interface for transferring multiple which the data passing between the applica- With a simple timing model, each independent streams of data between hard- tions may be monitored to assist with assignment in a program takes one clock ware and software. DSM supports system debugging (Figure 8). All of the API func- cycle to execute, giving you full control partitioning and final implementation; it is tions are provided, allowing complete sys- over what is happening in the design at any both bus/interconnect and OS-independent; tem development to begin – without the point in time. Results are predictable and and for the developer, it simplifies the inte- development platform being available. Once controllable, and the facility for complex gration between the hardware and software working, the application can be easily trans- sequential control flows means there are no (Figure 6). ferred to the target platform for final testing. state machines to design. As an example, the hardware function reads parameters from an input port and Triple DES Encryption Run-Time Environment then writes the results to an output port. Our design example was based around Typically, the FPGA is connected to the All the complexities of receiving com- streaming of compressed and encrypted microprocessor in a memory-mapped or mands over the PCI or bus, routing param- video data. The Autodesk FLI file format programmed I/O fashion, but this creates eters to the appropriate hardware function, was used to compress the video, and an FLI the challenge of needing to develop and and then routing the responses back to the player, developed by Celoxica, was imple-

38 Xcell Journal Spring 2003 mented in FPGA hardware connected to The same C source code was ported to which subsequently allowed correct decryp- the processor via the PCI bus. To bench- Handel-C, optimized in terms of control- tion of the video stream. Implementing mark the design, we loaded a cartoon ani- ling parallelism and timing, and compiled three DES algorithms in sequence (triple mation into the memory on the processor to a gate-level design that was device- DES encryption) provided further increases board. A triple DES algorithm described in optimized for the target FPGA. in this standard’s security. Three 64-bit keys C ran on the PowerPC microprocessor. A 64-bit key was used for the encryption, were used for an encrypt/decrypt/encrypt cycle in a triple DES pass, and the same keys allowed decrypt/encrypt/decrypt for define WIDTH 9 // parameterisable data widths decrypting the data. typedef struct // complex number type { This was a robust test of performance. signed WIDTH re; signed WIDTH im; The algorithm was inherently sequential in } complex; software, but it could be heavily pipelined

set clock = external " ClockSource "; for a hardware implementation. void main() // single synchronous clock domain (ClockSource) { To measure the performance improve- chan complex cDataIn[2], cDataOut; // communication channels ment, we played a cartoon with each com- while(1) par // parallel hardware pressed frame being encrypted, decrypted, { DataIO(cDataIn, &cDataOut); // data input/output function and displayed on a VGA monitor. Both Transform(cDataIn, &cDataOut); // data transform function } hardware and software implementations } were displayed together. They were trig- void Transform(chan complex *pcDataIn, chan *pcDataOut) { gered to start simultaneously, with the complex DataInReg[2],DataOutReg; // complex data registers

par (i=0; i<2; i++) // replicted par{} hardware version programmed to cycle { pcDataIn[i] ? DataInReg[i]; // read complex numbers in parallel continuously until the software implemen- } par tation finished. Processed data from the { // single cycle multiplication of two complex numbers DataOutReg.re = DataInReg[0].re*DataInReg[1].re - DataInReg[0].im*DataInReg[1].im; microprocessor was fed to the FPGA, DataOutReg.im = DataInReg[0].re*DataInReg[1].im + DataInReg[0].im*DataInReg[1].re; } which as well as performing DES encryp- *pcDataOut ! DataOutReg; // write complex number }

void DataIO(chan complex *pcDataIn, chan *pcDataOut) { Handel-C Handel-C complex DataInReg[2],DataOutReg; // complex data registers

DataInput( &DataInReg[0] ); // data input function DataInput( &DataInReg[1] ); // data input function Hardware DSM Library par (i=0; i<2; i++) // replicted par{} { pcDataIn[i] ! DataInReg[i]; // write complex numbers in parallel Hardware Bus Controller } *pcDataOut ? DataOutReg; // read complex result DataOutput(&DataOutReg); // data output function } Software Bus Controller #ifdef SIMULATE // Simulation Testbenches Software DSM Library void DataInput(complex *pDataIn) { long Data[2]; scanf("%d%d", &Data[0], &Data[1]); // ANSI-C function pDataIn->re = adjs(Data[0], WIDTH); // type conversion from ANSI-C call Software Software pDataIn->im = adjs(Data[1], WIDTH); // type conversion from ANSI-C call }

void DataOutput(complex *pDataOut) { Figure 6 - DSM Data Streaming Manager long Data[2]; Data[0] = adjs(pDataOut->re, 32); // type conversion for ANSI-C call Data[1] = adjs(pDataOut->im, 32); // type conversion for ANSI-C call printf("DataOut: %d %d\n", Data[0], Data[1]); // ANSI-C function }

#else // Hardware Implementations DK Simulator void DataInput(complex *pDataIn) { interface port_in(signed WIDTH in) DataInPort(); // WIDTH-bit input port pDataIn->re = DataInPort.in; Handel-C Applications pDataIn->im = DataInPort.in; }

void DataOutput(complex *pDataOut) { signed WIDTH Data; interface port_out() DataOutPort(signed WIDTH OutPort = Data); // WIDTH-bit output port PC DSM Sim Data = pDataOut->re; Data = pDataOut->im; } #endif Processor Applications

Figure 5 - Handel-C code for single cycle multiplication of two complex numbers Figure 7 - DSM simulation environment

Spring 2003 Xcell Journal 39 Following more detailed partitioning analysis, performance closer to the theoretical limits might be realized by removing code and functionality that are not directly associated with the triple DES algorithm (for example, the FLI decoder, frame buffer, and VGA driver). Better performance would also be achievable by connecting the FPGA directly to the processor bus in a memory-mapped fashion rather than across the PCI bus.

Conclusion The performance analysis results demon- strated significant improvements in overall Figure 8 - DSM Sim Monitor for assisted debugging system performance and quality of design. The results were achieved using a software- compiled system design methodology – Encryption Decryption specifically developed for programmable Software FPGA Software FPGA systems – that consistently delivered the fastest time to market (some 50% to 75% Elapsed time for 1 Mb of data 5558.8 ms 424.8 ms 5562.9 ms 424.7 ms advantage in design time) without com- Cryptography rate 1.51 Mbps 19.7 Mbps 1.51 Mbps 19.8 Mbps promising performance or area. For example, using the selected devel- Table 1 - Performance statistics for triple DES encryption/decryption opment tools and run-time environment, the FLI player took 10 person-days to tion/decryption, was programmed to gen- implement, as did the triple-DES erate VGA signals. Output from both the Platform Clock Speed Performance functionality. On the other hand, inte- hardware and software implementations SBC405 300 MHz 1.51 Mbps grating these two blocks to produce the cartoon demonstration took just half a was merged to form a composite image on Xilinx FPGA 20 MHz 33.8 Mbps the monitor. day. Moreover, you can very quickly Table 2 - Performance comparison explore the design space, experiment, and Performance Comparison of hardware and software encryption analyze different hardware/software A test harness enabled triple DES per- trade-offs, and rapidly implement and formance to be benchmarked by stream- This test scenario showed an FPGA prototype the system. ing data into either the software or throughput about 13 times faster using Coupling Celoxica’s co-design tech- hardware encryption algorithm. hardware than running software on a nology with high-performance profiling Theoretical performance for the PowerPC microprocessor. Nevertheless, it tools in the development tool chain FPGA was calculated as follows: The was still about a quarter of the theoretical enabled further performance boosts and triple DES implementation produced a maximum rate. This indicated that the full time-to-market efficiencies. Overall 64-bit word every 19 clock cycles, giving benefits of placing core routines in hard- improvements in the quality of design a data throughput of 85.6 Mbps for a ware might be compromised by other sys- were realized by more informed and accu- device running at 25.4 MHz. tem bottlenecks. Further analysis showed rate partitioning decisions, better up- Actual performance was profiled using that there were overheads associated with front system verification, and by WindView, a diagnostic tool from Wind offloading functionality into hardware. maximizing the speed gains of hardware River that enables visualization and These overheads were associated with implementation while minimizing the analysis of performance and timing RAM access latency and/or bus speeds. negative impact of transferring data issues in embedded systems. It allowed We also calculated the performance of between the FPGA and microprocessor. triggers to be set at different points in the hardware and software encryption in the The bottom line is that these system- code, and then provided accurate cartoon demonstration. Results demon- level design qualities offer real and com- timing information for each trigger strated that the hardware performed 22 petitive advantages for designers of event. Performance statistics are detailed times faster on a 15 times slower clock, as programmable systems who want to move in Table 1. shown in Table 2. to volume production.

40 Xcell Journal Spring 2003 VersatileVersatile MicroEngineMicroEngine SimplifiesSimplifies EmbeddedEmbedded SystemSystem DesignsDesigns NMI Electronics’Electronics’ CPUCPU deploymentdeployment modulemodule uses XilinxXilinx Spartan-IISpartan-II andand Virtex-IIVirtex-II FPGAsFPGAs to achieveachieve highhigh levelslevels ofof customization.customization.

by Kevin Heawood correct peripheral mix and the interfaces based on an industry standard, such as Vice President, Strategic Marketing that go with a particular CPU. Even if you PC/104, or on a combination of a standard Intrinsyc Software, Inc. do find the right SBC, the peripherals may interface and a custom interface defined by [email protected] be too expensive for your application, or the board manufacturer. otherwise inappropriate. Peripheral functionality is normally pro- Typically, embedded systems are designed NMI Electronics Ltd. has developed an vided by standard chipsets and is deter- around a specific CPU architecture that SBC, or more specifically, a deployment mined according to whatever the device or comprises a 32-bit high-performance module, that greatly simplifies system CPU manufacturer considers the most processor unit, volatile and non-volatile design. The module contains all the main commonly requested functions. Beyond a memory, and a set of peripherals and inter- components of a 32-bit CPU system, few basics (for example, baud rates on faces specific to that system or to a particu- including volatile and non-volatile memo- UARTs or display resolutions), the mix of lar application. With on-chip clock speeds ry, but it uses either a Xilinx Spartan™-II peripheral functions is neither flexible nor reaching 400 MHz to 700 MHz and or Virtex™-II FPGA to provide a com- programmable, and so may be inconven- board-level clock speeds as fast as 133 pletely programmable peripheral set and ient or inappropriate for your application. MHz, system design has become increas- an interface that can be specifically tailored Using an FPGA effectively eliminates ingly time-consuming, complex, and risky. to any application (Figure 1). these restrictions. The FPGA can be placed Using a single-board computer (SBC) onto the CPU local bus and closely cou- or companion chips and interface logic to Programmable Interface and Companion Chip pled with the memory subsystem, thereby reduce risk can be difficult, however, as it is Most SBCs have a predefined interface and creating a highly programmable compan- not always possible to find an SBC with the set of functions. The interface is usually ion and interface chip.

Spring 2003 Xcell Journal 41 to a mobile phone and one for diagnostics. It may not be possible to find a CPU and JTAG, Debug and Test Connector companion chip with that particular CPU Flash EPROM SDRAM peripheral mix. By using an FPGA-based companion chip, however, you can implement a PCI host bridge as an interface to a high- performance standard graphics chip, plus a

FPGA Local Power CAN controller, and two UARTs, all with- Supplies in the FPGA. You could even implement a

12 CPU 88 general purpose, DMA controller to feed the graphics chip Specific pins FPGA I/O pins and service the CAN controller and SODIMM 144 Expansion Bus Connector UARTs, which frees up the CPU to perform more compute-intensive tasks. Figure 1 - µEngine general architecture NMI MicroEngine Interfaces performance of specific functions. Using the FPGA as the basis for the system This FPGA system architecture allows you Examples of such accelerators include: interface and peripheral companion chip also to implement the interface you require to • 2D display assistance makes it possible to isolate the CPU/memo- the rest of the product hardware in a way • Hardware cursor support ry/FPGA subsystem and mount this onto a that provides the optimum solution for your • DSPs. small printed circuit board. application. NMI supplies a number of In fact, this is what we at NMI have standard interfaces, including the following: Companion Chip Application Example done with our MicroEngine (µEngine). • PCI Imagine you are working on an automobile The µEngine is a small form factor deploy- • ISA navigation system that requires high- ment module that contains all the key ele- • PCMCIA. performance graphics, an interface to CAN ments of a high-performance, processor- It is also possible to support the CPU bus, and two serial ports – one for interface based system (CPU, flash memory, local bus or a custom-designed interface.

Peripherals CPU local External bus (on the Interfaces Because almost every application has a MicroEngine) (from the unique set of requirements, the peripherals MicroEngine) required for each specific system are as diverse as the applications themselves. CPU Interface PC/XT Interface bus Using the FPGA as the companion chip to Logic (CPU type NMI ISA Bus to PC/104 set at synthesis Interface IP Core fulfill these requirements, it is now relative- time) ly simple to mix and match a range of peripherals in a way that meets the exact External interrupt needs of your application. NMI Interrupt sources Controller Examples of peripherals that can be IP Core included in the FPGA are: • UARTs • SPI, I2C, and AC97 serial interfaces NMI SDRAM NMI Distributed NMI Display CRT or LCD display • Display controller (LCD or CRT) Controller IP DMA Controller Controller IP Core IP Core • Stepper motor controller • Camera frame grabber • PCI slave and host interfaces NMI Color NMI 2D Graphics • PC/104 interface. Space Converter Accelerator IP Core IP Core Accelerators You can also place application-specific accelerators (co-processors) into the FPGA. Figure 2 - Example µEngine FPGA design These accelerators assist the CPU in the

42 Xcell Journal Spring 2003 SDRAM), but which uses a Xilinx Spartan- tages. It enables upgrades of CPUs as well • Xilinx LogiCORE™ IP II or Virtex-II FPGA to provide the system as the use of CPUs of varying perform- • Third-party IP interface and peripheral functions. This ance levels, such as an application that • Your own IP arrangement provides a totally flexible core requires modest graphics performance in • Custom-developed IP. module, and it enables you to include pre- an entry-level product and high perform- These elements can be freely mixed in cisely the peripherals and interfaces you ance in another, high-end product. For the µEngine to produce the unique func- need for your system. It also means that instance, the entry-level product might be tionality required for any application. The you can use the same basic board in a wide based on a Hitachi SH3 (without FPU) NMI deployment module has many fea- range of equipment. 100 MHz µEngine (Figure 3) – and the tures that simplify integration into the In addition, the µEngine addresses the high-end product might be based on a final system. For instance, because most conceptually simple, yet practically more Hitachi SH4 (with FPU) 200 MHz systems using high-performance CPUs are difficult, problem of designing microproces- µEngine (Figure 4). Both units would use running an embedded operating system, sor systems with high-speed external clocks the same baseboard. such as Windows® CE.NET, we have pro- and buses. The µEngine is, in its vided Windows CE software own right, a self-contained, pre- drivers for all of our FPGA IP on tested, high-performance micro- the full range of µEngines. processor subsystem. All it needs What’s more, you can popu- to “run” is power. late a µEngine with various In other words, all you need FPGA densities: 50K to 200K to implement your application is gates on Spartan-II FPGAs, 50K a baseboard containing a power to 300K gates on Virtex-E supply and the specific interface FPGAs, and 250K to 1M gates logic to suit your application. In on Virtex-II FPGAs. This vari- many cases, the baseboard can be able gate population makes the relatively straightforward, and µEngine the most cost-effective use lower technology design and Figure 3 - Hitachi SH3 µEngine with Virtex XCV100 solution for almost any applica- less stringent manufacturing tion, interface, peripheral, or rules than the high-speed µEngine design. accelerator mix. The connection between the µEngine Lastly, to ease portability from one and the baseboard is made via an industry- FPGA device to another, our FPGA designs standard, 144-pin, SODIMM connector use only high-level description languages. that carries both power and logic signals. Eighty-eight pins of the interface are con- Conclusion nected to the FPGA and are completely NMI developed the µEngine deployment user-programmable. Twelve CPU-specific module through imaginative use of FPGAs pins carry such dedicated functions as in CPU-based systems, creating a high- serial ports, ADCs, DACs, or USB, performance module that provides depending on the CPU deployed on the extraordinary hardware flexibility and µEngine (Figure 2). upgradeability. The availability of FPGA The image for the FPGA is held in the IP and reference designs facilitates rapid µEngine’s flash memory and is completely and low-risk development of new products reprogrammable. You can even place more and applications, allowing companies to than one FPGA image on a µEngine, focus on adding value, rather than having enabling it to support multiple baseboards. Figure 4 - Virtex XC2V1000 implemented to reinvent the core technology. It does this by means of a mechanism that on Hitachi SH4 µEngine NMI offers several development plat- identifies the type of baseboard it is plugged forms, enabling you to easily evaluate the into at power-up and automatically loads FPGA Intellectual Property µEngine and its associated IP. the correct FPGA for the application. FPGA IP cores for the µEngine are avail- In addition, the ability to isolate the able from many sources: Editor’s note: Since this article was written, CPU from the baseboard allows you to • NMI provides a wide range of proven NMI Electronics was purchased by Intrinsyc plug different CPU-based µEngines into IP (for example, PCI host bridge, dis- Software Inc. For more information on the inno- the same baseboard. This is one of the play controllers, UART, frame grabber, vative use of Xilinx FPGAs in µEngines, visit µEngine architecture’s greatest advan- 2D graphics accelerator). www.intrinsyc.com/products/microengine/.

Spring 2003 Xcell Journal 43 by Jane S. Donaldson President Annapolis Micro Systems, Inc. [email protected]

PushPush thethe DSPDSP DSP developers and their customers know FPGA-based processing outperforms conventional processors on a board-for-board comparison, resulting in significant improvements in processing speed, size, weight, power, and costs. Your FPGA design can be a customized PerformancePerformance parallel processing chip, specifically crafted for a particular application, accelerating the application to run in hardware and at hardware speeds far faster than could be achieved with software on a generic processor. EnvelopeEnvelope • Process data in real time, on site, saving all the time and money involved in data collection and off-site processing.

Use CoreFire, the FPGA Design Enabler, for quick and easy • Modify the processing by simply high-performance DSP application development with Xilinx reconfiguring the chip (by downloading a different FPGA file) to fix bugs, Virtex-II FPGAs and Annapolis Micro Systems’ COTS hardware. to adapt to a new set of interface requirements, or to modify the processing in response to application input data or processed results.

• Deliver new applications in place, with no human on-site intervention, by any means of file transfer, including Internet, internal network, hard drive storage, smart card, or wireless modem.

To deploy your application quickly to meet customer demands, you need commer- cially available hardware with the latest Xilinx FPGAs – plenty of gates, ample memory, and fast standard I/O options like Fibre Channel 2 and 1.5 GHz A-to-D input. You need a quick and easy way to develop, modify, and test your applications. With VHDL, Verilog, or schematics, even the most experienced ASIC designers need many months to develop applications using upwards of 40 million gates for a single VME or PCI slot. You can jumpstart your DSP design process – saving time and money – by using the eighth-generation, commercial off-the-shelf (COTS), general purpose Xilinx Virtex™-II FPGA-based hardware from Annapolis Micro Systems, Inc. You

44 Xcell Journal Spring 2003 can develop at the application level with When you implement your digital sig- To illustrate the requirements of DSP the easy-to-learn CoreFire™ FPGA nal processing application in a Xilinx applications, we chose a radar signal Design Enabler. It’s loaded with high- Virtex-II FPGA, you build a customized, processor that uses a 16-channel channel- performance IP modules, created for your parallel-processing design that outper- izer with a polyphase FIR filter and FFTs to use by FPGA application design experts. forms both general-purpose processors and divide the incoming data into multiple fre- digital signal processing chips. Some of the quency channels for real-time processing. Meet the Demand for Real-Time Virtex-II features that enable this very high Refer to Figure 1 to see the data flow and DSP Applications performance are: processing required by this application. Managing digital signal processing data in The data comes into the system at 1200 • Chip performance in excess of real time for applications like radar and MegaSamples/second with 8 bits per sam- 300 MHz image processing is very demanding. You ple. This stream is broken into 16 data need: • Multiple on-chip memory banks for streams and processed with polyphase FIR vector-based processing filters and FFTs. The resulting data is again • High-speed real-time processing split into 16 channels, the nine most inter- • Very fast data rates • High ratio of memory to logic esting of which are chosen for further pro- • A combination of complex and real • Fast embedded multipliers cessing. Channels 1-3 are sent into the first data types radar signal processing module, 4-6 are • 16 pre-engineered clock domains sent into the second radar signal processing • Integer and floating point data repre- to support the multiple frequencies module, and 7-9 are sent into the third sentations and computation and multiple-phase requirements of radar signal processing module, all at 300 • Variable and changing data path sizes complex system design. MB/s per channel.

Radar Signal Processor 16 Channel Channelizer WILDSTAR II for VME

Raw Data - 1200 MB/s 900 MB/s 2700 MB/s 400 MB/s SDRAM Complex Data 64 MB 1200 MSPS @ 300 MB/s @ 8 bits/ Per Each of 9 Channels 80 Sample Radar Signal Two Parallel Processing PE0 MHz Channelizer 8 Bit Inputs at XC2V6000-5 ADC Polyphase Sample Rate 200 MB/s 16 FIR and FFT Over 2 Data PE Streams XCVE2000-8 PE 400 MB/s SDRAM XC2V3000-5 64 MB 900 MB/s 1.5 GHz A/D I/O Card Radar Signal Processing PE2 XC2V6000-5 405 SDRAM 150 MB/s 200 MB/s Write PPC 128 MB 900 MB/s Dual FC2 400 MB/s SDRAM Controller PE 150 MB/s 64 MB Write Buffer XC2V4000-5 for Xfer 150 MB/s to Disk 600 MB/s Radar Signal Write Dual FC2 Processing PE1 Controller XC2V6000-5 150 MB/s SDRAM Write 128 MB 200 MB/s Fibre Channel 2 I/O Card

Figure 1 - Channelizer block diagram

Spring 2003 Xcell Journal 45 The pre-channelizer raw data, at 1200 MB/s, is divided into three streams of 400 MB/s each. Each stream is stored in its own SDRAM block. The appropriate raw data is folded back into the radar signal processing with the channelizer-processed data. The radar signal processors perform filters, FFTs, and other DSP functions on the data.The final result is sent to buffer memory, and then out to disk at 600 MB/s.

Use COTS Hardware from Annapolis for Fast Deployment On the right side of Figure 2 is the Annapolis WILDSTAR™ II VME board. This board is available with one, two, or three Virtex-II 6000 or 8000 FPGAs, with Figure 2 - WILDSTAR II up to 72 MB of DDR2 SRAM in 18 banks, for VME with 1.5 GHz A/D and Fibre Channel 2 I/O cards up to 384 MB of DDR SDRAM in three banks, and programmable flash memory for storing FPGA files for fast reconfiguration. On the top left side of Figure 2 is the Annapolis 1.5 GHz A/D I/O card, which, the WILDSTAR II card, buffers it, and for this application, plugs into the top slot sends it out to disk via the four Fibre of the WILDSTAR II card. This board Channel 2 channels with the help of the comes with a MAX 104 or MAX 108 8-bit QLogic and PowerPC chips. A/D converter, one Virtex-II 1000 or Table 1 is a comparison of the system 3000, one Virtex-E 1000 or 2000, with up data transfer speeds provided by this to 2 MB of DDR2 SRAM accessible by the Annapolis system to the data transfer Virtex-II bridge PE and up to 16 MB of speeds required by this channelizer applica- ZBT SRAM in four banks accessible by the tion. You can see that the system easily Virtex-E PE. meets the throughput requirements for the On the bottom left side of Figure 2 is channelizer application. the Annapolis Fibre Channel 2 I/O card, These WILDSTAR II and I/O boards which, for this application, plugs into the are the eighth-generation of Xilinx bottom slot of the WILDSTAR II card. FPGA-based, high-performance process- This board has four full duplex Fibre ing boards produced by Annapolis Micro Channel 2 I/O channels, with peak rates of Systems. Annapolis continues to push the 200 MB each way per channel. The board high-performance envelope, using latest- comes with two QLogic ISP2312s, a standard Xilinx FPGAs. Virtex-II 4000, 264 MB of DDR SDRAM Figure 3 - WILDSTAR II for VME with I/O cards ready to insert in chassis in four banks, and an IBM PowerPC™ You Can Build Your Application Quickly 405 running Linux. and Easily with CoreFire Figure 3 shows the boards connected The channelized data and raw data are You can see that the channelizer applica- together and ready to fit into one slot in the both split into three paths in the Virtex-II tion fits on the chosen WILDSTAR II sys- VME chassis. The ADC on the 1.5 GHz PE0 on the WILDSTAR II card. Each tem, so acquiring readily available A/D I/O card performs the analog input Virtex-II PE on the WILDSTAR performs hardware for your application will be easy. and A/D conversion. The Virtex-II and radar signal processing functions. The The next step is to figure out how you Virtex-E FPGAs on the 1.5 GHz A/D I/O Virtex-II PE1 on the WILDSTAR II card will develop the host software and FPGA card create parallel data streams and per- gathers and processes the results for output implementations for your application. form the channelizer function, using to the Fibre Channel 2 I/O card. Remember, this project stretches across polyphase filters, FFTs, and other DSP The Virtex-II FPGA on the Fibre three different printed circuit cards and six functions, as well as data reduction. Channel 2 I/O card accepts the data from different FPGAs.

46 Xcell Journal Spring 2003 find it very easy to use the CoreFire design Channelizer Data Transfer Data Path Speed Requirements System Data Transfer Speed window to modify and rebuild your FPGA design until you are satisfied with the Input to ADC Speed 80 MHz 75-150 MHz results. Use the CoreFire program to build and debug each of your FPGA designs, and ADC to PE Speed 1200 MegaSamples/s 80-1500 MegaSamples/s then use the jar file and the WILDSTAR II A/D I/O to PE0 on WS II 3900 MB/s 4000 MB/s API to develop your overall host program. WS II PE to its SDRAM 900 MB/s 950 MB/s Conclusion PE0 to PE1 1300 MB/s 4000 MB/s It is easy to push the DSP performance PE0 to PE2 1500 MB/s 4000 MB/s limits with Virtex-II FPGAs and Annapolis Micro Systems boards. Some of WS II PE2 to FC II I/O 600 MB/s 4000 MB/s our customers have gone from initial Table 1 - System data transfer speeds versus channelizer requirements inquiry to first deployment in as short a time as two months. The classic VHDL methodology for Button to check for errors and sizes and to When you buy high-performance, world- implementing applications on FPGAs is build an encrypted EDIF file. Use the class, Virtex-II-based off-the-shelf hardware difficult, and requires expert knowledge Xilinx ISE tool to place-and-route each from Annapolis, it is easy to build and mod- and countless months of painstaking work. FPGA design. ify applications. You have more time to fine- You cannot wait months to deploy your Modify and use the jar file created by tune your algorithms. You can get prototypes product. You need a tool that will allow the CoreFire build to load your new file up and running sooner, so you have more you to deploy your project within weeks, into your WILDSTAR II and I/O card time to test market your product. not months or years. You must be able to hardware. Use the CoreFire debugger to Final development is just as easy. You develop new application files rapidly and view and modify register and memory con- will be in the market far ahead of your easily, as well as accommodate specification tents in the FPGA, and to step through the competition, saving time and money. To changes, functional additions, and algo- data flow of your design running in the real learn more, contact Annapolis Micro rithm development. physical hardware. Systems, Inc., at 410-841-2514, or visit Using the CoreFire FPGA Design Suite Armed with your debug results, you will our website at www. annapmicro.com. from Annapolis, you can implement each of your algorithms in as little as a few hours. Use the standard WILDSTAR II C or Java API to write your host program. The CoreFire board support packages handle all the I/O, memory, and FPGA interfaces seamlessly, providing excellent performance. Refer to the CoreFire screen display in Figure 4. CoreFire is a graphical user interface FPGA application development tool that allows you to build your application very quickly by dragging and dropping library elements onto the design window. Choose from more than 400 expertly crafted modules. Modify your input and output types, numbers of bits, and other variables by changing module parameters with pull- down menus. Move modules around on the screen and reconnect with a flick of the mouse. The modules automatically provide correct timing and clock control. Insert debug modules to report actual hardware values for in-the-loop debugging. Hit the Build Figure 4 - CoreFire screen display

Spring 2003 Xcell Journal 47 ISE 5.2i Further Reduces Xilinx industry-leading design tools provide a low-cost, low-risk, and Your Design Costs high-performance logic solution.

by Mark Goosman / Lee Hansen Product Marketing Managers ISE WebPACK includes: Xilinx, Inc. [email protected], [email protected] • ModelSim Xilinx Edition (MXE-II) Starter Version ModelSim® XE is a complete HDL Reducing project costs isn’t new to most simulation environment that has been designers, but in tight economic times the optimized for programmable logic pressure to bring project costs down design, enabling you to quickly verify becomes much more important. In his Xcell source code and functional and timing Journal Winter 2002 article titled, “When models of your design. Total Cost Management Counts, Xilinx PLDs Pay Off” (www.xilinx.com/publica- • HDL Bencher tions/products/cool2/xc_tcm43.htm), Eric Within the ISE WebPACK toolset, the Thacker described how programmable logic HDL Bencher™ test bench generator devices (PLDs) offer significant benefits in Free ISE WebPACK automatically imports the current HDL dynamic, rapidly changing markets. The ISE WebPACK™ design suite is the design file and creates an editable stim- With our ISE development systems and ideal Web-downloadable desktop solu- ulus waveform by default. development options, Xilinx not only sup- tion. It offers a complete development • StateCAD ports the benefits of PLDs but also offers environment with modules from ABEL The StateCAD® FSM wizard automates additional cost savings. ISE 5.2i, the latest and HDL synthesis to device fitting and the state machine design process. You release of our design software, delivers a JTAG programming. ISE WebPACK can specify complex state machines to number of productivity technologies that tools are a subset of our award-winning quickly meet tough product require- shorten logic design flow, optimize design ISE Foundation™ design tools, provid- ments. The state machines can then be results, shorten implementation and verifi- ing instant access to the ISE tools at no automatically translated to an HDL for- cation cycles, and provide interactive design cost. By providing a design solution that mat you can include in your design flow. assistance. At the same time, ISE 5.2i is always up-to-date, with error-free enables you to realize even faster design per- downloading and single file installation, • ChipViewer formance. The end result to you is cost sav- Xilinx has created a solution that allows ChipViewer is a pre- and post-fit graph- ings across your entire project. instant productivity. Because ISE ical utility to assign or view pin place- The shorter design cycles and time-to- WebPACK development tools are avail- ment and implemented logic for all market advantages of FPGAs and CPLDs able for download from the Xilinx web- Xilinx CPLD devices. This removes the mean that you need less engineering site at www.xilinx.com/ise/webpack5, risks associated with changes late in the resources. This allows you to make the best you can get started immediately on design process. use of your staff when difficult economic designs for leading Xilinx CPLDs and • XPower conditions restrict your ability to hire more mid-density FPGAs. XPower is a graphical power-analysis engineers. Our fast, efficient, and highly This Web-downloadable design solu- tool. Total device power, power per-net, productive ISE software tools help you get tion reduces your design costs by includ- fitted, routed, partially routed, or the job done in less time, and they make ing all the tools you need to complete unrouted designs can be easily analyzed. each engineer more productive. your design.

48 Xcell Journal Spring 2003 Optimized Design Performance the nearest competitive offering. ChipScope Pro tool allows you to monitor – and Device Utilization In the Virtex-II fabric, the LUT and flip- in real time – any signal in the FPGA. This Xilinx ISE design tools have raised the flop can be used independently, without includes the IBM® PowerPC™ 405 periph- industry standard for both design perform- restrictions. In Stratix devices, a LUT can- eral bus in the advanced Virtex-II Pro ance and device utilization. Through patent- not be used with its flip-flop in all circum- FPGA. Design signals are captured and ed implementation algorithms, ISE allows stances, because one input pin of the LUT brought to the outside world through the you to achieve the fastest possible design is shared with the path that has direct access FPGA JTAG programming port. This min- performance. Compared with competitive to the flip-flop. By default, when the flip- imizes the amount of dedicated FPGA space solutions, designs can achieve better than flop is not fed by any logic, the LUT in that and I/O pins required – as opposed to using 15% higher performance. LE is unavailable to the rest of the design. more traditional ASIC and competing This performance edge means you can As a remedy, Quartus II tools provide a reg- FPGA debug methodologies. potentially target a lower cost device – leveraging faster performance from the software. Thus, you can hit your timing goals earlier, spending less time in the design flow. Virtex-II (2 true independent paths) Stratix (shared pin on the LUT) For example, based on benchmark data, LUT LUT you can achieve 20% to 30% better performance in Virtex-II Pro™ designs using FF LUT4 FF ISE than you can get from an offering from 0 0 the leading competitor. In many cases, you 1 1 can target your design to a slower speed grade device and still achieve targeted design performance. Figure 1 - Connectivity LUT to flip-flop in Virtex-II and Stratix devices ISE also reduces project costs by packing more logic into Virtex™-II devices, letting you fit your design in the smallest possible ister packing option (off by default) to Additionally, signal monitor points can device. Advanced FPGAs are not solely enable the packing of LUTs along with the be changed through the ISE FPGA editor made of look-up tables and flip-flops any- flip-flop. This still does not allow LUTs without having to re-compile the design, more. Today’s logic fabrics are best described using 4-inputs to be packed, because the saving even more debug time. The as “feature rich.” This trend requires sophis- connectivity restriction is still present. ChipScope Pro analyzer cuts verification ticated algorithms in both synthesis and Figure 1 shows LUT to flip-flop connectiv- times dramatically, even when the device is implementation tools, providing optimal ity in both Virtex-II and Stratix devices. on the board – or in the field. performance and logic utilization by leveraging new hardware features. Advanced Technology Streamlines Conclusion Xilinx ISE development tools separate the Design Flow For logic design, the true cost of the proj- unrelated functions and assign them to dif- ISE is also packed with advanced software ect includes much more than just device ferent clusters (called a slice) on the fabric. technology designed to accelerate the more cost. Factors like development cost, project This avoids conflicting placement constraint time-consuming parts of the design and timelines, access to development tools, and guarantees optimal performance. As the debug logic flow. designer efficiency, ability to achieve device gets full, powerful algorithms pack Incremental Design is a technology device performance goals, and verification unrelated logic into common clusters. This included in ISE that shortens design re- costs can have a big impact on the overall gradual process ensures that the device is uti- compile times. By locking performance for project cost. lized at its best, with minimal impact to areas of the design that don’t need to change, Xilinx allows you to meet – or beat – design performance. Incremental Design lets you perform re- your project budget through free ISE With competing FPGA solutions devel- synthesis and re-place-and-route on only WebPACK development tools, other ISE opment tools, the packing of logic requires a those pieces of the design that have to configurations, a complete design envi- special option. With this option turned on, change. This reduction in time adds up fast ronment, ISE’s powerful implementation packing is limited, because an unrelated in the crucial verification cycle, where debug tools, robust verification technology, and LUT using its 4-inputs and a flip-flop can- changes are common. more. As you evaluate various logic not be merged together in any logic element The Xilinx ChipScope™ Pro integrated design solutions, look at the total costs – and the limited packing comes at a cost to logic analyzer also delivers added productiv- associated with design tools and designer design performance. Virtex-II logic utiliza- ity to the verification cycle. Through small, resources in addition to the cost of tion with ISE comes out 15% better than easy-to-place software debug cores, the the device.

Spring 2003 Xcell Journal 49 PrototypePrototype AllAll XilinxXilinx DevicesDevices In-SystemIn-System oror StandaloneStandalone –– withwith MultiPROMultiPRO DesktopDesktop ToolTool

Now you can program/configure Xilinx devices from your by Michelle Badal Marketing Manager, Configuration Solutions desktop with a single programming hardware solution. Xilinx, Inc. [email protected]

In today’s competitive landscape, engineers need to have flexibility – it’s the key to maximizing your design efforts. The MultiPRO Desktop Tool, our latest programming hardware, offers a number of features to provide you with the most flexible, low-risk, and cost-effective way to prototype all Xilinx devices.

MultiPRO Desktop Tool Increases Flexibility and Reduces Cost MultiPRO Desktop Tool (Table 1), released in December, is a complete programming solution that enables you to realize the full potential of Xilinx programmable logic devices. Designed specifically to interface with a PC via parallel port (IEEE 1284), the MultiPRO Desktop Tool provides: • In-system programming and configuration of all Xilinx devices • Standalone programming of CoolRunner™-II CPLDs and XC18V00 ISP PROMs • Comprehensive configuration mode support • A low-cost solution by integrating desktop programmer and download cable functionality.

50 Xcell Journal Spring 2003 In-System Programming • Connect the MultiPRO pod to a Comprehensive Configuration Mode Support With MultiPRO Desktop Tool, you can CoolRunner-II CPLD or XC18V00 The comprehensive configuration mode program and configure all Xilinx devices ISP PROM adapter (adapters available support gives you the flexibility to choose in-system. MultiPRO Desktop Tool’s flexi- for all package types) the most suitable mode for your design, bility enables you to program your devices while using only one programmer hardware • Run software to program target right at your desk by following these steps: solution. MultiPRO Desktop Tool is sup- CoolRunner-II or XC18V00 ISP • Power MultiPRO with +5VDC, via ported by ISE iMPACT download software PROM (Figure 2). an external AC power brick v5.1i SP3 or higher, and supports JTAG (IEEE 1149.1), Xilinx slave serial, and • Use ISE iMPACT v5.1i SP3 or SelectMAP modes (Table 1). higher on your PC • Interface the MultiPRO pod with MultiPRO Desktop Tool Features the PC via a parallel port (IEEE • Provides standardization 1284) (cable included) with J Drive™ IEEE 1532 • Connect the MultiPRO pod to programming engine the PCB with the ribbon cables • Provides prototype debug environ- (included) ment with ChipScope™ ILA and • Run software to download bit- ChipScope Pro compatible stream and program/configure tar- • Automatically adapts to target get FPGA, CPLD, or PROM I/O voltage. (Figure 1). Conclusion Figure 1 - MultiPRO with download cable ribbons Standalone Programming MultiPRO Desktop Tool reduces the MultiPRO Desktop Tool reduces the risks and costs of prototyping in a num- risks of prototyping by enabling you ber of ways: to perform standalone programming • It provides a single solution that on CoolRunner-II CPLDs and meets all your desktop program- XC18V00 ISP. As with in-system pro- ming and download cable needs. gramming, you can program your • It frees up more expensive resources devices right at your desk by following designed for mass programming these steps: and configuring of devices. • Power MultiPRO with +5VDC, via an external AC power brick MultiPRO Desktop Tool provides the most comprehensive programming solu- • Use ISE iMPACT v5.1i SP3 or tion to date from Xilinx. For more infor- higher on your PC mation on how to increase flexibility, • Interface the MultiPRO pod with reduce risk, and reduce system prototyp- the PC via a parallel port (IEEE ing cost, visit: www.xilinx.com/xlnx/xil_ 1284) (cable included) Figure 2 - MultiPRO with programming adapter prodcat_product.jsp?title=csd_cables/.

Devices Supported Software Host/Interface Configuration Mode

MultiPRO • Standalone programming of CoolRunner-II • ISE iMPACT download software • PC (Parallel Port - • JTAG (IEEE 1149.1) mode Desktop Tool CPLDs and XC18V00 ISP PROMs v5.1i SP3 or higher IEEE 1284) • Slave Serial mode • In-system programming of all Xilinx ISP • SelectMAP mode PROMs and CPLDs • Desktop Programming mode • In-system configuration of all Xilinx FPGAs

Table 1 - MultiPRO Desktop Tool

Spring 2003 Xcell Journal 51 by Peter Seng Managing Director SENG digitale Systeme GmbH [email protected]

Existing application-specific development DLKDLK EnablesEnables and production solutions may be fine for high-volume markets, but for low- and mid-range product volumes, these solutions are most often either too expensive or otherwise not suitable. As a result, these types of products are usually built with off-the-shelf PC- or microcontroller Cost-EffectiveCost-Effective board-based components. Although standard components are relatively low cost, they are rarely a perfect fit for your design The new design environment specification and almost always require you to develop and add specialized hard- from SENG digitale Systeme ware. Programmable FPGA-based hardware offers an attractive alternative, but DesignDesign GmbH – based on FPGAs and so far, entrance barriers have been high, and development time and costs difficult standard parts – makes it easier to predict. to estimate development costs. SENG digitale Systeme GmbH recently introduced a scalable development environment that makes it easy to prototype and produce programmable FPGA-based hardware using standard parts and at predictable costs. The core of the SENG environment is the Digital Logic Kernel (DLK), which consists of an FPGA + CPU + memory + PC interface. The DLK uses standard Xilinx parts, including FPGA and CPLD source codes, software, and design rules, thus eliminating the need to program equipment or preprogram parts before use. The DLK is, by default, a self-bootable device. To exchange data or for administration purposes, the DLK is accessed via an integrated PC parallel printer port interface. All you need to build, program, and service devices in the field is a PC with a parallel printer port (Figure 1).

The DLK Concept The DLK is well suited for designing, building, and programming products containing any kind of hard or soft CPU with internal and external memory, and programmable logic. The system interconnect concept makes the DLK appropriate for several CPU, FPGA, and CPLD device families.

52 Xcell Journal Spring 2003 PLD memory; disconnect from PC. The others... CPU (RS-232, I2C) application is ready. (others...) 9. For administration, upgrade, or com- Flash RAM munication purposes, just reconnect PC parallel portPC parallel interface Connector D25 memory board to PC.

Conclusion

Connector Connector User application The DLK concept is an open system environment that uses standard tools and readily FPGA available elements, enabling development of digital products quickly and cost-effectively. This development system is especially suited for developing prototypes and for low- and JTAG Interface PLD/FPGA chain (Power supply) (Power supply) mid-range product volumes. The system is compatible with Xilinx ISE design tools and Figure 1 - DLK concept Windows operating systems. The kit parport includes several real-world examples to min- Components SPP+PS2+EPP imize training. All source code is available. The DLK development kit consists of soft- PP_D(7:0) DIN(7:0) DLK development kits, including demo ware and hardware source code, software, PP_nBUSY DOUT(7:0) boards, are available now, starting at $298. System and support is available from SENG schematics, and a demonstration board. PP_ERROR digitale Systeme GmbH. Write to Components include: PP_SLCT [email protected] or visit www.seng.de for more • FPGA internal 8-bit bidirectional PP_PE information. interface to the PC with parallel I/O PP_nSLCTIN A(7:0) bus structure (Figure 2) PP_nSTROBE nWR

• Driver and software for several operat- PP_nAUTOFD nRD ing systems, including Win 9x- and PP_INIT PPinit WinNT 4.0-based OS PP_ACK PPack • Flash CPLD-based state machines and memory CLK ReProg

• FPGA internal emulation of JTAG Figure 2 - PC parallel port interface, programming interface ISE schematic symbol

• Administration software running on ware according to your specific needs; PC for Windows™ OS, including compile. Figure 3 - DLK running on Windows dynamic link library, application program, and source code (Figure 3) 3. Connect application-specific board to PC parallel port; configure it with JTAG • Demonstration board (Figure 4) emulation bitstream using DLK software. • FPGA source codes (varies with grade 4. Use iMPACT software to embed flash of license) memory on board CPLD. • 8032 C sample source code containing 5. Develop application-specific logic and LCD, UART, interrupt, and I2C CPU programs; compile. routines. 6. Download FPGA bitstream to the board and communicate with it using DLK Design Flow DLK routines. 1. Design an application-specific board according to the DLK principles, or use 7. Debug your application. one of the available DLK boards. 8. Store the final FPGA configuration bit- 2. Rearrange FPGA, CPLD, and PC soft- stream and CPU program in flash Figure 4 - DLK51 board

Spring 2003 Xcell Journal 53 UseUse SpyGlassSpyGlass PredictivePredictive AnalysisAnalysis forfor EffectiveEffective Atrenta Inc.’s SpyGlassSpyGlass softwaresoftware usesuses aa “look-ahead” engine based on fast-synthesis technologytechnology toto helphelp youyou identifyidentify potentialpotential problemsproblems –– RTLRTL CodingCoding and fix them – early in the design process.

by Bhanu Kapoor Technology Director Atrenta Inc. [email protected]

Decisions made early in the design process affect the entire chip design process. Thus, to manage designs effectively, the tools you use for RTL (register transfer level/language) design must enable a stepwise refine- ment of the code by predicting likely downstream problems. For example, to achieve high performance, the guidelines for synchronous design must be met, as they play an important role toward meeting timing objectives. Recognizing the importance of adherence to appropriate guidelines during the RTL coding process, Xilinx recommends a set of coding guidelines for designs targeting the various Xilinx device families. Most designers currently follow a manual method of design reviews to check to see if coding guidelines have been met. This manual method is both prone to error and time consuming. Automating adherence to

54 Xcell Journal Spring 2003 RTL coding guidelines, therefore, would process of RTL code development. Policy Engine greatly increase the timely success of proj- Examples of such policies include lint, The policy engine accelerates electronic ects targeting high-performance designs on reuse, verification, timing, and testability. product development by enabling develop- Xilinx device families, such as the ment teams to capture, aggregate, distrib- Virtex™-II series of platform FPGAs. Rule Groups ute, and apply constraints and requirements The SpyGlass™ Predictive Analyzer A collection of rules within a policy is early in the development cycle. The fast tool from Atrenta Inc. automates the termed a group. Typically, groups consist synthesis engine internally creates a struc- process of meeting design guidelines of rules addressing a particular area of tural view of the design and foresees down- through the use of an underlying predic- interest in the RTL code. Groups are hier- stream issues early in the development tive analysis technology. This approach archical, meaning that a group can contain cycle, thereby eliminating errors at the ear- performs detailed structural analysis on other lower-level rule groups as well as liest possible stage. RTL aspects to check for Although you can coding styles, RTL-handoff, check many complex rules design reuse, clock/reset statically on the internal requirements, verification, synthesized structural timing guidelines, and view, other rules require much more. The SpyGlass some understanding of the tool employs a “look- logic function of the ahead” engine based on design. This is particularly fast-synthesis technology true for testability-related and a fast, built-in, cycle- checks. In order to per- based simulator to carry out form a testability check, such analyses. Such a look- you must use an evaluator. ahead methodology only The evaluator in the poli- uses RTL code as input to cy engine is effectively a the SpyGlass tool and does zero-delay, cycle-based not require any vectors or simulator you can use to assertions. It is therefore resolve functional design very easy to set up and run. constraints, as well as carry out a simulation required Policy-Based RTL Coding to set up the design for The SpyGlass™ Predictive testability analysis. Analyzer is a comprehensive, Policy implementa- policy-based system that tion requires a traversal defines, in a succinct and engine that works on the Figure 1 - Example of policy consisting of rule groups and rules organized way, design poli- RTL netlist produced by cies that automatically point fast synthesis. The out time-consuming downstream issues. individual rules. A group provides an addi- SpyGlass engine generates basic primitives The SpyGlass tool then offers suggestions tional level of modularity in applying poli- for rules that cover design traversal. The on ways to overcome these downstream cies to a given RTL design. connectivity information, coupled with issues during the RTL code development the traversal primitives, enable you to cre- process. This helps you to meet your time- Rules ate rules that look for violations across the to-market target. A rule is the most fundamental element in design hierarchy. the policy-based management system. It By applying a policy over a given RTL Policy describes a set of conditions that – when design code, you can obtain a wealth of A policy is a collection of rules for a specif- checked by the policy engine – result in an information about the design – as well as ic purpose, such as rules associated with a indication of a specific problem with the any of violations of the defined rules. The standard, a silicon vendor, or a specific RTL code. Rules allow standard analysis SpyGlass interface highlights rule viola- design tool. Policies enhance user extensi- of the RTL code. An example of a policy, tions on the specific lines of RTL code. bility, allowing you to develop and manage rule groups, and rules in the SpyGlass sys- Not only does the SpyGlass predictive customized groupings of rules more easily. tem is shown in Figure 1. With this sys- analysis tool offer an extensive help func- The design methodology includes a set of tem, you can selectively turn on specific tion to assist you in understanding the policies that you can select during the groups or specific rules within a group. nature of the violation, but also, where

Spring 2003 Xcell Journal 55 to Xilinx FPGA designs: • Single clock edge is used in the design to clock the data. • Internally derived or generated clocks and set/resets are not used in the design. • Appropriate synchronizers are used as signals cross clock-domain boundaries. • The number of levels of logic between registers is less than a specified value. • The fanout of any design node does not exceed a specified number. • Unintended latch inferences are avoided in the code. • If-then-else or case statements are not nested deeper than three levels. • Naming conventions are followed while coding pipeline logic. Figure 2 - Example of rule violation and associated debug windows • Assess memory requirements and find out feasibility of conversion into regular flops, appropriate, it makes suggestions for mentation. For this reason, it is important distributed memory, and block memory. resolving the problem. to identify the clocks and resets in the • For state machines, separate the next-state Moreover, each violation is highlighted design, as well as the clock-tree network, decoding and output decoding into two on a structural view of the design, which early in the code development process. discrete processes or always blocks. can give you additional debugging infor- Many designs, especially in the net- mation for complex problems. And, as working domain, typically use multiple • Outputs from design units are registered. shown in Figure 2, cross-probing between clock domains, a situation that presents • Identify dead code and floating nodes the code and structural views reveals an challenging integration and verification in the design. example of a clock domain synchronization issues. Signals that originate in one clock problem. Additionally, a comprehensive set domain may be used in other clock Several of these checks are critical for of reporting mechanisms and violation domains. Correct chip operation can only meeting timing objectives in high perform- management utilities – such as waiver to be ensured if rules governing correct use of ance designs. allow a specific rule violation in a given signals across asynchronous clock bound- point in the design – further facilitate the aries are followed. Conclusion RTL coding and assessment process. The routability of design is another We have described the elements of policy- aspect that can benefit greatly by follow- based design methodology, enabled Coding for Xilinx Devices ing appropriate coding guidelines. The through SpyGlass predictive analysis, for Xilinx recommends a set of coding guide- number of I/Os, fanout of intermediate effective RTL coding leading to efficient lines for designs targeting the various nodes, and widths of internal buses must implementation of designs on Xilinx-based Xilinx device families. These include gener- meet specified limits for various parts of a platforms. This approach performs detailed al coding guidelines, as well as synthesiz- given design. structural analysis on RTL code to check able HDL guidelines. Furthermore, syn- Using the SpyGlass predictive analysis for coding styles, design reuse, clock/reset chronous design guidelines help you meet system, you can check for all of the above requirements, verification, timing guide- timing objectives in high-performance violations, problems, and issues during lines, and more. SpyGlass software includes designs such as those targeting the Virtex- the RTL coding process itself. This saves the most comprehensive coverage of II FPGAs, which can accommodate as many design iterations that might other- Xilinx-recommended coding guidelines many as 8 million gates and operate at fre- wise be needed to fix errors later in the that are essential for reuse and efficient quencies as high as 400 MHz. design cycle. design implementation. Clock distribution also requires specific Specifically, you can use the SpyGlass To learn more about SpyGlass predic- design code guidelines for successful imple- tool to check the following issues relevant tive analysis, go to www.atrenta.com.

56 Xcell Journal Spring 2003 Enter the next dimension

As a professional engineer you know that electronics signal integrity analysis, and VHDL simulation. For circuit design involves many dimensions. Capturing design data, implementation, nVisage's extensive output options analyzing and verifying the circuit's performance, for board layout are complemented by powerful RTL-level synthesizing VHDL for FPGA implementation - these are VHDL synthesis for multiple FPGA target architectures, not fragmented processes, but multiple dimensions of a allowing you to design for both programmable devices single design flow. That's why Altium has pioneered and PCBs within the one application. nVisage DXP - the first design capture system to go beyond current one-dimensional schematic capture systems. Why not envisage a new dimension in design and take a look at the With nVisage DXP you can capture the multiple dimen- nVisage multimedia demonstration sions of your design within a single application. nVisage at www.nVisage.com. You can also includes multiple design entry methods that support both download a free 30-day trial version, schematic and VHDL based entry. nVisage also gives you from the website or order the trial multiple analysis and verification tools, such as integrated CD by calling: 800-470-2686. XSpice/SPICE 3f5 circuit simulation, pre- and post-layout

Switch to nVisage * Multi-dimensional design capture for every engineer from $1,495 1039-ADXA-1

*Retail value $2,995. Receive up to 50% discount if you switch to nVisage DXP from another qualifying Schematic Capture package. nVisage, nVisage DXP, Design Explorer and Altium and their respective logos are trademarks or registered trademarks of Altium Limited or its subsidiaries. GetGet RealFastRealFast RTOSRTOS withwith XilinxXilinx FPGAsFPGAs Real-time operating systems implemented in Xilinx FPGAs enhance performance, improveimprove predictability,predictability, simplifysimplify design,design, andand lowerlower systemsystem costs.costs.

by Tommy Klevin Product Manager RealFast [email protected]

Designers once used field programmable gate arrays (FPGAs) as the glue logic to interconnect discrete components on a printed circuit board (PCB). As we start a new era, though, we can now build complete systems in one FPGA chip. To take advantage of the possibilities offered by Xilinx Platform FPGAs, we must consider the kind of operating system best suited for these one-chip systems, as well as other systems that contain more than one central processing unit (CPU) or one digital signal processor (DSP). The RealFast company in Sweden has many years of experience in FPGA design, in operating systems, and in real-time systems development. We recently developed the Sierra 16 real-time operating system (RTOS), implemented in an FPGA, to enhance performance and predictability and to reduce system complexity. This article describes the Sierra 16 RTOS, explains what it can do, and explores the possibilities of putting operating system functions into hardware.

58 Xcell Journal Spring 2003 In the coming months, RealFast will complement the Sierra series with other configu- Operating System Accelerator rations to suit both small and large systems.

Interrupt Handling Irq Irq = Interrupt Handler Tmq = Time Management Handler Interrupt handling is critical in most Sem = Semaphore Handler systems and is particularly critical in real- Tmq GBI = Generic Bus Interface GBI time systems. Interrupts introduce unpre-

TDBI TDBI = Technology Dependent Bus Interface Local Bus interface Sem dictable behavior because of the difficulty Scheduler Accelerator in predicting when they will arrive. The Sierra 16 RTOS has an intelligent interrupt handler that makes it possible to achieve fully predictable system behavior, even though it is not possible to know Figure 1 - Block schematics of the Sierra 16 HW-RTOS exactly when the interrupts will arrive. We treat the interrupt service routines Sierra 16 RTOS you save a lot of space in memory, and at the (ISR) like any ordinary task with a certain Why implement an operating system in same time, you increase performance and priority in the system. As shown in Figure 2, hardware? Is it not better to have the oper- achieve a fully predictable system. when not taking care of any service, ISRs are ating system in software as we are used to? The Sierra 16 RTOS has the same func- in a wait state. When the interrupt arrives, Well, who would have imagined in the tionalities found in other traditional soft- the ISR is transferred to the ready state in the mid-1980s that mathematical operations ware real-time operating systems. As shown Sierra 16 RTOS scheduler. ISR tasks execute would be performed by hardware in the in Figure 1, the Sierra 16 RTOS has the by priority. To switch between different tasks, CPU instead of software? A closer look at following configuration: the Sierra 16 RTOS interrupts the CPU with operating systems shows that many of the a task-switch interrupt. This is the only inter- • Handles as many as 16 tasks at eight low-level primitives are similarly independ- rupt that interrupts the CPU. All mecha- priority levels ent of the operating system. These primi- nisms are handled by the Sierra 16 RTOS – tives can be performed by either hardware • Supports 16 semaphores from the physical pin, to scheduling, to inter- or software. rupting the CPU so it can switch from pres- • Handles timing – delay, timers Some example low-level primitives are: ent task code to the ISR code and start executing it. • Task handling – create and schedule • Supports eight external interrupts. tasks • Synchronization – semaphores, Interrupt occurs message passing Suspended • Time handling – delay, timeout handling • Interrupts.

What is the point of doing these opera- Blocked/ Ready Wait for IRQ tions in a hardware FPGA system instead Waiting of in software? Even in the new platform FPGAs, memory is limited if you do not add external memory. In small, cheap products, you probably want to squeeze Running your application into the available memory ISR=task is moved from in the FPGA. If you use a traditional oper- ”wait for IRQ“-state to ready-state ating system, the operating system eats substantial memory, leaving less space for the application. Figure 2 - ISR is transferred to the ready state when an Dormant On the other hand, platform FPGAs external interrupt occurs. contain logic that is just waiting to be used. By moving the operating system to logic,

Spring 2003 Xcell Journal 59 desired performance boost compared to single CPU systems. Tool Environment An RTOS implemented in hardware FPGA System-on-Chip can, on the other hand, perform many par- Port allel operations. This capability provides a completely new approach to solving the IP problem with multiprocessor systems. All algorithms that are complex and time- MAMon consuming in software are moved to hard- RTOS ware. All synchronization of tasks, such as Core semaphores, is handled by the HW-RTOS. System Bus The Sierra 16 RTOS supports single CPU systems, but RealFast will soon release another HW-RTOS series that provides support for systems with two or more CPUs or DSPs.

MicroBlaze CPU + Sierra 16 HW-RTOS A powerful but very inexpensive RTOS solution is to combine a Xilinx Figure 3 - On-chip MicroBlaze™ software CPU with a monitoring system RealFast Sierra 16 HW-RTOS. With a small 150K-gate Spartan™-II FPGA and (maybe) some external memory, you can Nonintrusive Monitoring and Debugging intellectual property (IP) component create a complete and advanced real-time Run-time observability in embedded sys- available for the Sierra 16 RTOS is the system at a very low cost. tem architectures is a requirement for test- Multiprocess Application Monitor In bigger Spartan-II and Virtex™-II ing, debugging, and validating design (MAMon). As shown in Figure 3, this FPGAs, it is even possible to exclude exter- assumptions made about the behavior of monitor is implemented in hardware and nal memory, as the internal block RAMs the system and its environment. The clas- listens nonintrusively to the different will be big enough for both driver and sic approach to run-time observability is parts inside the Sierra 16 RTOS. application in many cases. to apply monitoring – the process of The software driver for the Sierra 16 has detecting, collecting, and interpreting Multiprocessor Solutions Made Easy a minimal footprint, only a couple of kilo- run-time information regarding the sys- Loosely coupled systems and tightly cou- bytes. This driver, together with the HW tem’s execution behavior. pled systems contain more than one kernel, makes a very fast and predictable When monitoring real-time systems, CPU. In a loosely coupled system, all RTOS kernel. With a system running at 50 an important aspect is to minimize – or CPUs have their own memory for such MHz, most of the system-calls are finished better yet, completely avoid – the intru- things as program codes and data, and the within 2 microseconds. siveness of the monitor on the system’s CPUs communicate through a shared timing and execution properties. Failure memory. In a tightly coupled system, Conclusion to handle monitor intrusiveness may lead often called Symmetrical Multiprocessor The new Xilinx Platform FPGAs offer new to probe effects that cause nondeterminis- System (SMP), all CPUs share the same design approaches and new possibilities. tic behavior in programs with race condi- memory and execute code and data from Because logic has grown, we need to start tions and poor synchronization. this memory. solving problems in a parallel manner, When we use a hardware RTOS (HW- The problem with these kinds of sys- rather than in the sequential manner that RTOS) like the Sierra 16 RTOS, we avoid tems, in particular SMP systems, is the we are used to. the intrusiveness problem. Because the difficulty in making efficient operating An operating system implemented in Sierra 16 RTOS comprises a number of systems that use the efficiency of multiple hardware is a step towards parallel process- hardware components that know at each CPUs. It takes complex algorithms to ing, and we believe that these kinds of cost- moment exactly what is going on in the handle and synchronize multiple CPUs. effective solutions will meet the increasing system, this information can be extracted Pure software solutions for operating sys- demands of applications in the future. To without any software probes normally tems with multiprocessor support learn more about the RealFast Sierra 16 used for extracting information. One become inefficient and do not give the HW-RTOS, visit www.realfast.se/.

60 Xcell Journal Spring 2003 Virtex-IIVirtex-II ProPro PlatformPlatform FPGAsFPGAs DeliverDeliver ProvenProven InteroperabilityInteroperability

With verified interoperability between specialty ASSPs and Xilinx Virtex-II series FPGAs, you can focus on designing and debugging your system rather than how to make all the parts work together.

by Anil Telikepalli Marketing Manager, Virtex Solutions Xilinx, Inc. [email protected]

Virtex™-II series Platform FPGAs include a rich set of system features such as embedded memories, DSP, and I/O connectivity with which you will design most blocks of your system. But your design might need to incorporate specialty ASSPs. These could be large memories, optical transceivers, analog filters, A/D and D/A converters, specialized DSP/network processors, connectors, and many others. Interoperability of all these devices on the printed circuit board (PCB) is a tough challenge. Xilinx Virtex-II Pro™ and Virtex-II FPGAs solve these challenges by enabling you to connect to almost any ASSP without a hitch. This article will show you how.

Interoperability Can Make or Break Your Design Making multiple disparate devices compatible – often from different vendors – can be an arduous task involving numerous design and debug iterations. Even if you do succeed in making all the devices work together, the inefficiency of such a design is likely to impose a performance penalty that pre- vents the overall design from achieving its

Spring 2003 Xcell Journal 61 maximum performance objectives. Pro Platform FPGAs and IP cores for pro- Unknown interoperability of devices tocols, the Xilinx SystemIO solution pro- increases the risk of cost overruns and vides ultimate connectivity with ASSPs. slipped schedules, and can even jeopardize The Virtex-II Pro FPGA family was built design completion. Starting with a on the Virtex-II family in a completely Platform FPGA that delivers interoperabil- scalable framework – specifically to address ity of the devices you will use in your interoperability and prevent problems design is critical to predicting project time, common with ground-up competing archi- cost, and feasibility. tectures. Virtex-II Pro FPGAs inherit the entire Virtex-II SelectIO-Ultra technology, Xilinx Leads in Interoperability which directly addresses connectivity and Figure 1 - Four RocketIO channels at 2.488 Gbps Xilinx offers complete interoperability interoperability with ASSPs using estab- with Ignis Optics’ four IGP-2000 OC-48 SFP optical transceivers solutions with leading ASSPs across a lished parallel system interfaces. Virtex-II broad range of interface standards. This FPGAs have been widely used as an inter- Given that ASSP vendors have been working with SelectIO-Ultra technology for more than two years, Virtex-II Pro FPGAs give you unparalleled edge in interoperability over any competing FPGA in the world.

Figure 2 - SPI4.2 static and dynamic alignment mode HW verification completed ensures that the integration of third-party face to bridge disparate ASSPs. Given that with PMC-Sierra S/UNI-9953 ASSPs will go smoothly, that the cost of ASSP vendors have been working with design implementation will be predictable, SelectIO-Ultra technology for more than and that your projects will succeed the first two years, Virtex-II Pro FPGAs give you time through. unparalleled edge in interoperability over By choosing Virtex-II Platform FPGAs any competing FPGA in the world. over alternatives that lack powerful capabil- Virtex-II Pro devices support dozens of ities for interoperability, you will lower the emerging, established, and even proprietary risk of slipped deadlines, significantly lower connectivity standards. The embedded the cost of the project, accelerate design RocketIO serial transceivers enable the and development cycles, and simplify the widest range of programmable serial band- expensive system debug process. width: 622 Mbps to 75 Gbps covering With its extensive network of partner- emerging serial standards such as Gigabit ships with ASSP vendors through the Ethernet, 10 Gigabit Ethernet XAUI, PCI Reference Design Alliance Program – the Express™, SxI-5, TFI-5, Serial RapidIO™, Figure 3 – SPI4.1 HW interoperability with only such program in the programmable InfiniBand™, and Fibre Channel. AMCC Ganges-II device logic industry – Xilinx gives you the edge. SelectIO-Ultra parallel I/O system Xilinx and ASSP vendors work closely to interfaces support such standards as SPI3 deliver fully interoperable solutions. This (POS PHY 3), SPI-4.1 (Flexbus 4), Xilinx goes far beyond this by providing allows you to simplify system development SPI4.2 (POS PHY 4), XGMII, RapidIO, pre-verified interface and controller IP and eliminates the tough challenges of mak- PCI, PCI-X, CSIX, HyperTransport™, cores – jointly verified in hardware with ing all the different devices on the board XSBI, and SFI-4 using 22 single-ended ASSP vendors during several months of work together seamlessly. You can focus and six differential electric standards engineering testing to guarantee full your creative energy on product differentia- including LVTTL, LVCMOS, PCI, interoperability. Reference designs and tion and maximizing system performance. PCI-X, HSTL, SSTL, LVDS, LVPECL, boards are also available to demonstrate and HyperTransport. true hardware interoperability, as shown Virtex-II Pro Delivers Proven Interoperability Checking silicon features and electri- in Figures 1, 2, and 3. With proven RocketIO™ serial and cal I/O standard support is only the first The list of ASSP devices interoperable SelectIO™-Ultra parallel I/Os in Virtex-II step towards interoperability with ASSPs. with Virtex-II series FPGAs is shown in

62 Xcell Journal Spring 2003 Table 1. The latest list of interoperable you are in safe hands when working with credit. Focus your creative energy on devices, along with reference designs, is their devices. product function, performance, and dif- always available at www.xilinx.com/company/ Start your next design on a robust ferentiation, rather than worrying about reference_design/interop_solutions.htm. These platform that has a proven record of sev- whether the components will work ASSPs include devices from Intel, AMCC, eral years of diverse interoperability to its together. PMC-Sierra, Mindspeed Technologies, IDT, and other vendors. ASSP Vendor ASSP Device ASSP Device Type Interface Standard Reference Design In addition, Xilinx actively participates in interoperability events and testing activities. AMCC Several devices Several ViX-v3 to SPI4 Yes Recently, Xilinx submitted Virtex-II Pro AMCC Ganges-II OC-192, 4xOC-48 POS, SPI4.1 Coming FPGAs for interoperability testing to the 10 ATM, SONET/SDH Gigabit Ethernet Consortium at the Framer/Mapper University of New Hampshire’s InterOperability Lab, which was attended by Bay Microsystems Montego Network Processor Host/Accountant Yes 13 participating vendors. (ftp://public.iol. unh.edu/pub/10gec/Oct02_GTP_release.pdf). Broadcom BCM8040 Multi-rate XAUI Coming Octal SerDes Xilinx Lets You Choose the Best ASSPs The Virtex-II Pro solution permits you to Ignis Optics 2.5 Gbps SFP Optical transceiver 2.488 Gbps Yes design-in the best ASSPs that fit your per Channel needs. ASSPs come with varying and often IDT TeraSync FIFO FIFOs HSTL Coming incompatible interface standards. Lack of extensive interoperability capabilities can Intel IXF1810x Family OC-192c POS/GFP & SPI4.2 Coming limit design alternatives. For example, 10Gb Ethernet choosing a network processor ASSP sup- MAC/Framer porting the HyperTransport standard could prevent you from choosing a Mindspeed OptiPHY M29730 OC-192/STM-64, SPI4.2 Yes security co-processor ASSP supporting Technologies Quad OC-48/STM-16 only the RapidIO standard. Even if POS Framer these two ASSPs matched your requirements perfectly, you may still be forced to Netlogic NSE4256, CAM NSE bus/nPu bus Yes choose less suitable ASSPs just because Microsystem IXP1200, SA110 they work together. PMC-Sierra S/UNI-2xGE Dual Gigabit SPI3 Coming With the Virtex-II Pro FPGAs’ support (PM3386) Ethernet Controller for multiple system connectivity standards and proven ASSP interoperability, you can PMC-Sierra Xenon Family 10 Gb Ethernet SPI4.2 Coming effortlessly bridge standards and ASSPs. Select the right ASSPs that best fit your PMC-Sierra S/UNI 9953 10 Gb PHY for SPI4.2 dynamic Coming product needs based on capability and POS, ATM, Ethernet alignment price, not on how well they work together. The Xilinx leadership in interoperability TeraCross TXQ1450, TXS1400 Scheduler, traffic Line Card Interfcace Yes gives you the clear advantage in differentiat- generation & Control Link monitoring ing your products from the competition.

Velio VC1003 SONET backplane, LVDS Yes Conclusion gbE SerDes Interoperability of disparate devices on the board is a critical parameter for design suc- Velio VC1021/1022 OC-48, OC-192 SerDes LVDS Yes cess. Your design productivity, performance, and even the probability of completion can Velio VC1061/1062 Storage Quad SerDes 10 GigE MAC No be significantly improved by thoroughly verified interoperability among semiconductor ZettaCom ZTM202 Traffic Management CAM, SRAM, CSIX Yes devices and vendors. Xilinx and the leading ASSP vendors want you to be confident that Table 1 - Virtex-II series interoperable solutions

Spring 2003 Xcell Journal 63 Available now, the The Power of Xtreme Processing Each IBM PowerPC runs at 300+ MHz and 420 Dhrystone MIPS, and is new Virtex®-II Pro supported by IBM CoreConnect™ bus technology. In addition, the FPGA fabric Platform FPGAs herald an enables ultra-fast hardware processing, such as TeraMACs/s DSP applications. With Xilinx’s unique astonishing breakthrough in system-level solutions. IP-Immersion™ architecture, system architects can now harness the power of With up to four IBM PowerPC™ 405 processors high-performance processors, along with easy integration of soft IP into the immersed into the industry’s leading FPGA fabric, industry’s highest performance programmable logic. Designers have absolute freedom to implement any design they can imagine. Never before has such Xilinx/Conexant’s flawless high-speed serial I/O performance been achieved enabling hardware accelerated processing and technology, and Wind River System’s cutting-edge multiple processing in an off-the-shelf device. embedded design tools, Xilinx delivers a complete Enabling a new development paradigm development platform of infinite possibilities. For the first time ever, system designers can partition and re-partition their system between hardware and software at any time during the development The era of the programmable system is here. cycle__even after the product has shipped. That means you can optimize the overall system, guaranteeing your performance target in the most cost-efficient manner. You can also debug hardware and software simultaneously at speed. Over 10Mb Block RAM

Up to 4 IBM PowerPC™ Processors Immersed in FPGA Fabric

Up to 24 Embedded Rocket I/O Multi-Gigabit Transceivers

Up to 24 Digital Clock Managers

On-Chip Termination with XCITE Technology

Up to 556 Multipliers

The ultimate Connectivity platform Industry-Leading Tools from Wind River and Xilinx

The first programmable device to combine embedded processors along with Optimized for the PowerPC, Wind River’s industry-proven embedded tools are 3.125 Gbps transceivers, the Virtex-II Pro series addresses all existing connec- the premier support for real-time microprocessor and logic tivity requirements as well as emerging high-speed interface standards. Xilinx designs. And driving the Virtex-II Pro FPGA is Xilinx’s Rocket I/O™ transceivers offer a complete serial interface solution, supporting lightning-fast ISE 4.2i software, the most comprehensive, 10 Gigabit Ethernet with XAUI, 3GIO, SerialATA, InfiniBand, you name it. easy-to-use development system available. And our SelectI/O™-Ultra supports 840 Mbps LVDS and other parallel methodologies. Think of it: up to 16 Rocket I/O 3.125 Gbps transceivers at your disposal, See the new Virtex-II Pro Platform FPGA in detail. delivering the ultra-high bandwidth you need (40+ Gigabit per second total Visit www.xilinx.com/virtex2pro today and step into the era of the serial bandwidth) for real market challenges like optical networking, high-end programmable system. broadcast, storage and DSP systems and so much more.

The Power of Integration

In a single off-the-shelf programmable device, system architects can take The Programmable Logic CompanySM advantage of microprocessors, multi-gigabit transceivers, digital clock managers, www.xilinx.com/virtex2pro highest density on-chip memory, on-chip termination and more. The result is

©2002, Xilinx, Inc. All rights reserved. The Xilinx name, the Xilinx logo, and Virtex are registered trademarks. RocketI/O and SelectI/O are trademarks, and The a dramatic simplification of board layout, a reduced bill of materials, and Programmable Logic Company is a service mark of Xilinx, Inc. The following are trademarks of International Business Machines Corporation in the United States, or other countries, or both: IBM, IBM logo, PowerPC, PowerPC logo, and CoreConnect. All other trademarks and registered trademarks are the property of their unbeatable time to market. respective owners. Spartan-IIE Family Grows

Building on a tradition of cost-effective speed and reliability, two new Xilinx Spartan-IIE devices offer enhanced flexibility and higher densities – at the lowest cost per I/O.

by Rufino Olay Solutions Marketing Manager Xilinx, Inc. [email protected]

With today’s challenging economic times, making cost-sensitive products such as plasma displays, set-top boxes, and broadcast video equipment requires a low-cost solution. Additionally, the integration of more features in digital consumer products often demands more pins than previously available. Thus, we were challenged to create a low-cost solution that addressed the need for a high pin count device. Two new additions to the successful Spartan™-IIE family of devices meet these exacting criteria. The two devices, XC2S400E and XC2S600E, are low-cost, high-density, high-I/O devices that will allow you to target a wider spectrum of designs than you previously could with programmable logic.

66 Xcell Journal Spring 2003 Fourth Generation Spartan FPGAs Device XC2S50E XC2S100E XC2S150E XC2S200E XC2S300E XC2S400E XC2S600E Since introducing the Spartan series more System Gates 50K 100K 150K 200K 300K 400K 600K than four years ago, Xilinx has delivered Logic Cells 1,728 2,700 3,888 5,292 6,912 10,800 15,552 four generations and shipped more than 40 Block RAM Bits 32K 40K 48K 56K 64K 160K 288K million Spartan series FPGAs, with prices Distributed RAM Bits 24K 37K 54K 73K 96K 150K 216K starting as low as $2.55 per device. With the DLLs 4444444 Spartan series, you get the advantages of I/O Standards 19 19 19 19 19 19 19 high I/O-count ASICs and gate arrays – Max Differential I/O Pairs 83 86 114 120 120 172 205 and you get the added flexibility of a gener- Max Single Ended I/O 182 202 265 289 329 410 514 al purpose, programmable architecture. You Packages TQ144 TQ144 also get the benefit of a proven architecture PQ208 PQ208 PQ208 PQ208 with the industry’s fastest and most produc- FT256 FT256 FT256 FT256 FT256 tive software tools, plus the most compre- FG456 FG456 FG456 FG456 FG456 FG456 hensive offering of IP cores from Xilinx and FG676 FG676 third-party AllianceCORE™ vendors. Table 1 - Spartan-IIE product matrix On November 18, 2002, Xilinx announced an extension to the Spartan product line to address customer demand for 100% even higher density and higher I/O-count ASIC Shipments by I/O Count devices in the price ranges required for con- 90% sumer applications. As shown in Table 1, the 80% >1,000 XC2S400E device (400,000 system gates 70% 597 - 1000 and up to 410 I/Os) and the XC2S600E 60% 457 - 596 device (600,000 system gates and up to 514 Spartan-IIE Family Extension 50% 305 - 456 I/Os) give you the ability to integrate more 196 - 304 functionality into a smaller form factor and 40% Prior Spartan-IIE Family 85 - 195 still meet your stringent budget require- 30% <84 ments. With the Spartan-IIE family, you get: Served Market by Pin Count 20%

• The lowest cost per I/O – The new 10% Spartan-IIE devices give you more 0% I/Os at much lower prices than any <84 <196 <305 <457 <597 <=1,000 >1,000 competing FPGA. Source: Xilinx

• Up to 514 I/Os – With the highest Figure 1 - Spartan-IIE extensions meet customer demand for more I/O. number of I/Os available in the low-cost segment of the FPGA industry, Spartan-IIE devices allow More I/Os for Digital Consumer Applications These two new FPGAs deliver a greater you to put higher density ASIC Moving to advanced technologies has than 67% increase in I/O capacity over designs into FPGAs and still keep always enabled Xilinx to dramatically previous Spartan-IIE offerings, and up to the benefits of reprogrammability. reduce costs and simultaneously bring 100% more I/Os than competing FPGAs • Four DLLs – The DLLs allow easy larger density devices within the reach of in the same density ranges. Additionally, clock duplication, quick frequency many more cost-conscious customers. the I/Os can be configured as differential adjustment, faster state machines Now, with the two new Spartan-IIE I/O pairs (up to 205), giving you LVDS using different clock phases, FPGAs, this same advantage is being performance up to 400 Mbps. de-skewing of the incoming clock, brought into the I/O arena. and generation of fast setup and Traditionally, consumer applications Supporting More I/O Standards hold times or fast clock to outs. with more than 305 I/Os have required Today’s designs are more complicated than ASICs, as shown in Figure 1. But with the ever, with the majority typically containing • More than one billion MACs/sec introduction of the XC2S400E and numerous I/O standards on a single PCB. per dollar – You can implement high XC2S600E devices, you now have up to With the Spartan-IIE FPGAs you can con- performance DSP functionality at the 410 and 514 I/Os, respectively, and with nect as many as 19 different I/O standards lowest cost possible. the added advantage of reprogrammability. on a single chip. This flexibility gives you

Spring 2003 Xcell Journal 67 processor and peripheral solution on the XC2S50E XC2S100E XC2S150E XC2S200E XC2S300E XC2S400E XC2S600E market today for traditional 16-bit and 32- bit microprocessor and microcontroller TQ144 102 102 applications. Coupling the ISE and MicroBlaze solu- PQ208 146 146 146 146 146 tions gives you a winning combination with the benefits of: FT256 182 182 182 182 182 182 • Flexibility – Easily create a customized FG456 202 265 289 329 329 329 processor design that can be modified at any time during the design cycle. FG676 410 514 • Guaranteed product availability – You can purchase the MicroBlaze source Table 2 - Density migration possibilities TQ144 2 devices for migration code and never have to worry about PQ208 5 devices for migration processor obsolescence. FT256 6 devices for migration • Reduce system cost – By integrating the ability to bridge different I/O standards FG456 6 devices for migration your entire processing solution within and protocols – and completely eliminates FG676 2 devices for migration one device you not only save time and the need for costly bus transceivers. effort but you also reduce your bill of multiple small, fast, and flexible memories materials, inventory, and debug time. Scaleable Footprints situated close to the logic. As with all other Following the tradition of Xilinx FPGA Xilinx FPGA families, the 4-input LUT in Conclusion families, the Spartan-IIE family continues to Spartan-IIE devices can also be used as To be successful in this tough, competi- support density migration across common memory, where it can be configured as tive marketplace, you need an inexpensive packages without changing the PC board ROM, or single-port or dual-port synchro- and flexible design solution. You also footprint. The relative positions of VCC and nous RAM. need fast, reliable performance and the GND remain constant across like packages, We’ve doubled the amount of distrib- lowest cost per I/O. With the expanded unlike competing low-cost FPGAs. uted RAM in these new devices to 150K Spartan-IIE family there is no faster, safer, For example, when using a FG456 for the XC2S400E, and 216K for the or lower cost way to develop next-genera- package, six density members of the XC2S600E. tion consumer products. Spartan-IIE family can be interchanged, To obtain a free Spartan-IIE Resource providing outstanding flexibility for design Software and IP CD containing a wealth of information on revision, upgrade, or cost optimization, as The entire Spartan-IIE family is supported the XC2S400E and XC2S600E Spartan-IIE shown in Table 2. by the Xilinx ISE (Integrated Software devices, visit www.xilinx.com/spartan2e. Environment) tool set, which includes the More RAM industry’s most advanced timing-driven The new Spartan-IIE devices differentiate implementation tools available for pro- themselves in the amount of available grammable logic design, along with design memory both in block RAM and distrib- entry, synthesis, and verification capabili- uted RAM. ties (www.xilinx.com/ise5/). The XC2S400 has four columns and There are also more than 200 IP cores, the XC2S600E has six columns of block including PCI, DSP, and other pre- RAM, equating to 160K and 288K of designed and tested solutions (www.xil- block RAM, respectively. This is a greater inx.com/ipcenter/) to get your designs up than 4X increase in capacity over the previ- and running fast. ous largest density Spartan-IIE device. With this increase comes the possibility of Processing Solutions on a Budget storing more data, coefficients, FIFO func- By utilizing the Xilinx MicroBlaze™ 32- tions, and larger general memory functions bit field programmable controller option in your memory-hungry applications. with the Spartan series devices, you can cre- Xilinx continues to be the only FPGA ate an easy-to-use, low-cost, customized supplier to offer distributed RAM, which is processing solution. The MicroBlaze an ideal solution for designs that require processor is the fastest, most powerful soft

68 Xcell Journal Spring 2003 by Steve Prokosch CPLD Product Marketing Xilinx, Inc. [email protected] CoolRunner-II When you’re looking for a simple, low cost, CoolRunner-II and easy-to-use programmable logic device that incorporates multiple functions, think Xilinx CoolRunner™-II CPLDs. These versatile, nonvolatile devices can save you time and money on your next design by reducing board costs and redesigns. SolutionsSolutions Cost can be thought of in several different ways, depending on your point of view. For a buyer, it’s the bottom line of a bill of materials. For a design engineer, it’s time invested and looming deadlines. Engineers also face tradeoffs, such as how Save Money fast a product can be designed with a mini- Save Money mal number of board layouts. Your success may rest in the decisions you make while try- ing to accomplish this goal. When making With CoolRunner-II CPLD devices, component choices, it pays to have built-in flexibility; with reprogrammable logic, you great things come in small packages. get the best dollar value as well as the ability to deliver products ahead of schedule. Additionally, engineers must consider such factors as single-chip solutions, package size, density, versatility, flexible I/O structures, and the ability to modify pin functionality after placement on the board. By considering these items before parts selection, you can save costs and still maintain flexibility.

Single Chip Integration If you have unlimited board space, a large stocking warehouse, and inexpensive test and assembly costs, some of these cost factors may not enter into the price equation. But if you’re in a competitive marketplace, usually one or more of these items will be scrutinized: • Power-efficient board size • Minimum number of parts and suppliers • Low assembly costs.

Board Size Typically, the packaging of your product is defined by board size, which is driven by the number of components you need to

Spring 2003 Xcell Journal 69 get the job done. If you can squeeze out you can squeeze into a single CPLD and reflects a direct assembly charge. If your the required functionality and still stay still get the low power operation you desire. contract manufacturer charges you to within the power budget, you have met Maintaining multiple components for stock devices, this will also add cost to your goal. specific functions can lead to a nightmare your end product. Don’t forget that cost can mean board for procurement. By expanding your sup- By keeping the component count space to some engineers and inexpensive plier base, you increase demands on many down, you can dramatically reduce both parts to others. As shown in Figure 1, with different departments within your compa- direct expenses and the indirect cost of CoolRunner-II you can select small BGA ny and thus lengthen your time-to-market. doing business. packages such as 56- or 132-ball chip scale These areas may include accounting, packages (CSP) for high integration or flat shipping and receiving, or component Integration and Flexibility pack (FP) packages for low-cost solutions. engineering. If you have a quality depart- If you need multiple I/O standards for If you are concerned with board size, you ment, they may want reports on each indi- unique memory devices or CMOS level may need unique CSP options. Xilinx also vidual device. translation, conversion devices may be nec- offers 0.5 mm to 0.8 mm ball spacing Furthermore, the more devices you essary. Depending on your application, spe- packages that can save you more than 50% specify, the higher the chance of encoun- cialty memory devices may also be required. when compared to similar I/O count FP package options. Although the space savings from a 14 mm-by-14 mm, 100-pin FP package to an 8mm-by-8 mm, 132-ball CSP package may seem trivial, consider the routing involved. With flat packs, all pins typically route outward from the package. With BGA packages, routing can be achieved by running traces between the adjacent solder balls. These packages also offer more options when using denser, multilayer PCBs. This may yield twice the routing efficiency of a comparable FP package, further reducing board space. Thus, the capability of these small packages goes well beyond the “wow factor” of their physical size.

Parts and Suppliers Lower power consumption can also be achieved through reduced component counts. A single low-power CPLD device Figure 1 - CoolRunner-II CPLD package offerings improves reliability by reducing the total number chance of cold or weak solder joints that may cause intermittent fail- tering a production problem. It would be If your processor does not support ures. Heat dissipation may be reduced. devastating to not be able to ship a multi- HSTL or SSTL memory types, you may And more solder joints also increase the million-dollar product due to a $2 part on need to select voltage referenced to chance of manufacturing problems. The back order. By using more parts, you also CMOS translators. In high-volume appli- more solder joints, the higher the chances run a higher risk of device obsolescence. cations, these single-function translators of developing manufacturing problems. This may not cause a delay in shipment, can cost from $4 to $6 in 48-pin pack- Heat dissipation may be reduced through but it typically costs a board re-layout. ages. The problem is, they only serve one fully utilizing a single part instead of purpose. If you don’t use all of the pins, powering multiple parts that may not be Assembly Costs it’s wasted board space and power. With a fully utilized. These two factors can have The more components shipped to your single CPLD, you get translation coupled a direct impact on customer service and contract manufacturer, the more money with extra logic capabilities and the free- customer reliability ratings you spend in shipping costs. Each elec- dom to use pins for other purposes Figure 2 illustrates the many functions tronic component placed on your board besides translation.

70 Xcell Journal Spring 2003 if not, just leave it disabled. Because you don’t always know if you need hysteresis, this flexibility may save a new board layout. Moreover, with features such as clock dividers and doublers (DualEDGE flip- flops), you can set up independent clock domains in CoolRunner-II CPLDs, thus eliminating the need for independent oscil- lators or crystals. The devices can handle fast-running sequential functions such as pulse width modulator, conversion functions (BCD to decimal), and serial communications functions.

Conclusion Due to their multifunctional nature, Xilinx CPLDs can integrate many applications to save costs in your design. The high- performance, low-power CoolRunner-II Figure 2 - Multiple functions in a single CPLD CPLDs can reduce the number of board redesigns, minimize the total number of Even if you don’t use any specialty spins is input hysteresis. Schmitt trigger devices, and increase overall flexibility. This memory, what about legacy parts that use inverters can cost from $4 to $8 in 20-pin will have a direct impact on bringing your different voltage levels than your processor? packages. These devices usually operate from product to market faster. Again, you have the choice of purchasing a 1.6V to 3.6V, which gives them a wide oper- To get you started with CPLDs, Xilinx single function device that can cost around ating window. CoolRunner-II CPLDs have offers multiple aids, including beginner $2 for a 48-pin package. A comparable pin input hysteresis on every input pin. And tutorials with demo boards and reference count CPLD can cost half as much as this because you can configure CoolRunner-II designs that include detailed application single function device – and again, give you CPLD input buffers to any voltage from notes with HDL code. Some design exam- more functionality. So if you need voltage 1.5V to 3.3V, they also have a wide range of ples include SMBus, I2C, SPI, and proces- level translation in the range of 1.5V to operation. In a head-to-head comparison, sor interfaces. You can also look at full-up 3.3V, CoolRunner-II CPLDs can also pro- CoolRunner-II CPLDs can cost 75% less reference designs, such as designing an vide this integrated function. than a discrete Schmitt trigger device. MP3 player. Whatever your level of experi- One specialty function that sometimes is Also, by using a CPLD solution, you ence, Xilinx makes it easy to use repro- not considered but may prevent board re- can enable the input hysteresis, if required; grammable logic.

Reach over 70,000 Millogic IP is programmable logic Xilinx Proven! professionals worldwide. • PCI-X, VME, 12C, Advertise in the... Video, LZS, PS2 • Design Services • Prototyping Xcell journal • Xilinx Xperts Millogic, Ltd. 6 Clocktower Place forfor moremore informationinformation Maynard, MA 01754 Call: (800) 493-5551 978.461.1560 [email protected] [email protected]@aol.com www.millogic.com

Spring 2003 Xcell Journal 71 by Chris Dick, Ph.D. Chief DSP Architect and Director Signal Processing Engineering Xilinx, Inc. [email protected] ReinventingReinventing The ultimate goal in software radio has been the realization of an agile radio that can transmit and receive at any carrier frequency using any protocol, all of which can be reprogrammed virtually instantaneously. The Software Defined Radio Forum (SDRF) (www.sdrforum.org), an organiza- thethe SignalSignal tion dedicated to supporting the development, deployment, and use of open architectures for advanced wireless systems, defines a software defined radio (originally coined by Joe Mitola in 1991 [1]) as radios that provide software control of a variety of modulation techniques. These include ProcessorProcessor wide-band or narrow-band operation, communications security functions (such as FPGAs are ideal for building high-performance, hopping), and waveform requirements over a broad frequency range. reconfigurable signal processing systems such Figure 1 shows the architecture of a generic software radio. Smart antenna array as software defined radios. technology is used for both the receive and transmit paths in the system. On the receive side, multiple high-bandwidth digitized antenna data is channelized, converted to baseband, and filtered – typically the sample rate is adjusted at this node. Other sections of the radio’s physical layer (PHY) perform demodulation, synchronization, multiuser detection, adaptive interference cancellation, source decoding, forward error correction, beam forming, and adaptive equalization. All of these computations present significant challenges for the radio PHY signal processing engine. Furthermore, much of the processing occurs at very high data rates.

Demands for Configurability and Agility One of the driving objectives underlying SDR concepts is the desire to have a single hardware platform capable of servicing a number of radio environments. This type of reconfigurability could be used in several ways. For example, manufacturers developing infrastructure equipment or network operators building out a network could deploy a soft radio system in Europe configured to support Universal Mobile

72 Xcell Journal Spring 2003 Smart Antenna Array ADC Array DSP Radio PHY

Channelization - Demod/Mod Configurable ADC Down ADC - Synchronization RF/IF Analog ADC Conversion & ADC - MUD Signal Multirate Filters Processing - ICU Network (possibly - FEC - Source coding microelectro DAC Up-Conversion DAC - Security mechanical DAC & Multirate based) DAC Filters - Beam forming - Equalization

MUD = Multiuser Detection ICU = Interface cancellation unit FEC = Format Error Correction

Figure 1 - Generic software radio architecture showing adaptive antenna array, analog signal processing, digitization, and the radio signal processing PHY.

Telecommunications System (UMTS) or permit field upgrades or bug fixes to equip- Instruction set architecture (ISA) signal Global System for Mobile Communication ment already deployed, by supplying a new processors share many similarities to gener- (GSM) standards, or operate the system in BTS profile using the Internet or a al purpose processors. Architectural differ- the U.S. with a Code Division Multiple microwave link from a radio network con- entiates such as very long instruction word Access (CDMA)2000 radio personality troller to the BTS. Soft radios can also be (VLIW), super-scalar extensions, and vari- profile. This one system could also be oper- viewed as a means to protect infrastructure ous types of predictive enhancements are ated as a multimode radio in an environ- investments by keeping radio hardware from really micro-architecture evolutions of the ment that employs both wideband and becoming obsolete as new standards and basic architecture credited to von Neumann narrowband CDMA communications. techniques become available. and his colleagues in the 1940s and 1950s. Radio agility is important in situations As such, signal microprocessors have lever- where standards are fluid. For example, con- Economics and The Effect on Software aged most of their performance via a raw sider the evolution of the 3GPP standard Radio Development increase in clock frequencies. For example, and the length of time required for that Although commercial technology and eco- in the early 1980s the first fixed-point sig- standard to stabilize. Agility is also impor- nomics have always been inextricably nal processors supported clock frequencies tant during transition periods. As we move linked, significant changes are occurring in in the 5- to 10-MHz region. Current- from 2G to 3G mobile cellular systems, both of these domains that will alter the way generation high-end ISA Digital Signal multiple standards such as Personal Digital electronic equipment, including software Processors (DSPs) use 600-MHz clocks and Communication System (PCS), GSM, radios, is developed and deployed. From a are on a trajectory to the Giga-Hertz region. IS-95, Personal Handyphone System purely technological perspective, The Obviously, this curve has the same thermal (PHS), DECT, EDGE, GPRS, IMT-2000, International Technology Roadmap for pitfalls described above. and CDMA2000 must all coexist. Semiconductors (ITRS) shows that Moore’s Although FPGAs take advantage of Multistandard support will be a fact of life Law will remain in effect for at least another Moore’s Law (and other advanced process for the foreseeable future. When the 4G 15 years, and that in the year 2016 devices technology such as all-copper interconnect wireless network build-out is completed, will be produced on a 22 nanometer node. and low-K dielectric substrates) to provide multimode operation will be required to sup- Yet Patrick Gelsinger, Intel’s chief tech- increased clock frequencies over time, their port third-generation wireless direct nology officer, announced at last year’s primary mechanism for supplying perform- sequence spread spectrum (DSSS) and International Solid State Circuit’s ance is completely different than the ISA orthogonal frequency division multiplexing Conference (ISSCC 2001) plans for a 20 or approach. FPGAs exploit the large amount (OFDM), the modulation scheme most like- 30 nanometer process in 2010, delivering a of parallelism inherent in most signal pro- ly to be deployed in 4G systems. From a net- device consisting of 2 billion transistors cessing algorithms. With as many as 556 work operator perspective, base tranceiver operating at a clock frequency of 30 GHz. embedded multipliers and 125,136 logic station (BTS) configurability could be used The estimated power consumption of the cells in the Xilinx Virtex-II Pro™ Platform to dynamically allocate radio resources. This device would be 3 to 5 kW, or a power den- FPGA, we can readily see how these devices might occur on the time-scale of hours in sity of 1 kW/square centimeter, about the can be viewed as a naturally parallel pro- order to provide the highest quality of service same as a rocket nozzle. This has obvious cessing engine that can take advantage of to the subscriber base at any given time. thermal implications that must be dealt the rich parallelism in a software radio PHY. Both manufacturers and network opera- with using techniques radically different The software radio PHY is a complex tors could also use a configurable BTS to from today’s methods. signal processing system in which algorith-

Spring 2003 Xcell Journal 73 mic and functional-level state-of-the-art ISA DSPs. parallelism can be lever- Embedded Even though the ASICs used CLB, RISC CPU aged to realize a high- Switch IOB, in this part of the system may Matrix DCM performance system that offer some programmability, it does not rely on raw speed Active super-fast is generally limited in nature for its performance. The interconnect and is certainly a departure multiplier array could be from the intended philosophy used to implement space- Synchronous of the fully configurable soft Dual-Port RAM Programmable time processing in a receiv- radio. DSP processors might I/Os with LVDS er, while at the functional BRAM be used for certain baseband level multiple turbo convo- functions such as source (de-) Ω lutional decoders could be 50 encoding, as with CELP XCITE operating concurrently to codec. Reduced Instruction Multipliers Impedance support multiple users, 3.125 Gb Serial Set Code (RISC) processing • 18b x 18b multiplier Impedance Control each with a 2 Mbps data • 300 MHz pipelined Controller resources in the system could rate in a 3G environment. also support the requirements Manufacturers have of the higher levels in the pro- Figure 2 - Virtex-II Pro Platform FPGA showing the multiplier array for supporting made 60% more transis- parallel signal processing, multi-gigabit transceivers for inter-chip and inter-system tocol stack. tors available to circuit connectivity, and embedded RISC processor technology for performing decision-oriented From a soft radio perspec- designers per area of silicon tasks and running a real-time operating system. tive, the ASIC/processor com- compared with what was bination is poor partitioning available a year earlier. In contrast, the ratio streams. The Virtex-II Pro platform FPGA from both a flexibility and efficiency stand- at which designers are able to utilize transis- shown in Figure 2 is the cornerstone tech- point. In recent years FPGAs have experi- tors in circuits of any given tier of complexi- nology for building high-performance enced hyper-growth in both arithmetic ty has only been increasing at a rate of 20% reconfigurable signal processing systems – complexity and compute density (number of per year [3]. which includes the PHY in an SDR. In con- operations/unit area of FPGA) that can be This “design gap” is associated with per- junction with the logic fabric and active achieved by current generation devices. What formance supply and demand, but another interconnect, this device has an array of types of signal processing functions can be aspect is methodology related. It is becoming embedded multipliers for supporting the usefully realized by an FPGA? an increasingly complex, time-consuming, most demanding of arithmetic tasks in a Radio designers working with FPGA and error-prone procedure to develop and radio PHY. This particular FPGA family technology implement IF sampled receivers, verify a sophisticated ASIC. Furthermore, at a also offers integrated Power PC 405 tech- channelizers of different varieties including cost of $1-$2 million for mask set costs, it is nology, multi-gigabit transceivers and classical digital down (and up) conversion becoming prohibitively expensive. ASIC dynamic impedance matching capability on (DDC and DUC) architectures, FFT-based development timelines are now spanning the device I/O ports that can be used to sim- polyphase transforms, multistage multirate years, and may even extend beyond the win- plify printed circuit board design and man- polyphase decimators and interpolators, dow of opportunity for the intended product. ufacturing. Using a platform-based adaptive interference cancellers for DSSS Harvard Business School Professor approach to system implementation, system channels, multiuser detection (MUD), and Clayton Christensen highlights that while designers can create product differentiates rake receivers (including acquisition and price and performance are still important, by implementing signal processing func- tracking). More recently, FPGAs have been there are signs that a seismic shift is taking tions in the logic fabric as well as through used to construct space-time processors for place, leading to a new era where other fac- embedded software running on a Power PC. advanced smart antenna systems. FPGAs are tors – such as customization – matter more extremely adept and flexible at implement- [4]. This is precisely where the FPGA fits in: The DSP Dilemma ing FFTs, and this functionality has been it is the ultimate in customization. One approach to BTS implementation used to construct OFDM modulators and FPGAs address the technical as well as has been to employ a combination of ASIC demodulators. business perspectives outlined above. Because and ISA DSPs. The ASIC technology is typ- FPGAs have also found extensive use in they are off-the-shelf commodity items, com- ically used to address the significant arith- narrowband bandwidth-efficient Quadrature panies can access state-of-the-art device tech- metic requirements of the radio front-end, Amplitude Modulation (QAM) systems. In nology with minimal NRE, and quickly such as digital down conversion and chan- this environment they have been used to build and deploy customized systems, achiev- nelization filters to support multicarrier W- implement adaptive channel equalizers, digi- ing very short time-to-market while simulta- CDMA or CDMA2000 standards. These tal timing recovery circuits, carrier recovery neously maximizing first-to-market revenue functions are beyond the capabilities of even loops, frequency locked loops, and fractional

74 Xcell Journal Spring 2003 rate change filters. even be used to run a Node B FPGAs are also exten- High MIPs Processing application protocol (NBAP) sively used for forward error in logic fabric for a BTS, as a java virtual correction in communica- Polyphase MUD machine, or even provide ADC Transform tion systems. For example, ADC ICU CORBA support. ADC Demod 1 OC-3 155 Mbps Viterbi ADC Radio PHY Advances in analog-to- ADC Demod 2 decoders, Reed-Solomon Demod N MAC (Media Access) digital converter technology decoding at OC-192 10 DAC TCC are still required to support DAC MAC PPC405 Gbps data rates, and DAC Viterbi the high dynamic range DAC - Decision oriented tasks (de-)interleavers operating DAC - CORBA requirements of wideband at clock frequencies greater - Java Virtual Machine radio front-ends that offer than 200 MHz are all - NBAP true multimode global oper- Ω achievable with current gen- 50 ability. Advances are also 3.125Gb Serial XCITE Connectivity to eration Virtex™-II FPGAs. Network Impedance - other components required in the area of config- connectivity - other FPGAs FPGAs are the ultimate Impedance Control urable high-bandwidth ana- device technology in terms Controller log signal processing for of user customization. They realizing the RF and IF stages allow system architects to Figure 3 - Platform FPGA approach to software-defined radio realization. The high MIPs of a radio. Micro-electro- processing is implemented in the logic fabric, while decision-oriented and non-real-time perform area-performance tasks are provided as embedded software running on the Power PC. The multi-gigabit mechanical systems (MEMs) tradeoffs and to therefore transceivers could be used for providing connectivity to the broader network. appear to be a promising “right-size” the functional technology for addressing in- components in the system. FFTs with execu- past, FPGA-DSP design has required signal system configurable analog signal process- tion times in the microsecond to tens-of- processing and communication engineers to ing. As this technology matures and is microseconds are possible. In the context of use tool flows and languages with which combined on a single platform with digital an OFDM communication system, a small they are typically unfamiliar. The introduc- functionality, the ideal of a completely con- number of FPGA resources could be used to tion of tools like System Generator for DSP figurable radio will move closer to reality. realize a (de-)modulator that supports a mod- (www.xilinx.com/xlnx/xil_prodcat_product.js The significant computation demands of erate data rate, or by using more resources an p?title=system_generator) has gone a long the SDR PHY have been largely satisfied by extremely high-performance high data-rate way to let engineers work in the language of highly parallel signal processing platforms link could be realized. the problem. In this case the system is realized using recent generation FPGA tech- With FPGA technology, control of the sil- developed using a visual dataflow paradigm nology from companies like Xilinx. To com- icon is put back into the hands of the system in The Mathworks Simulink environment. plement the device technology, an increased developer rather than the chip architect – as The approach not only allows the design to emphasis on Signal Processing IP libraries is the case with an ISA signal processor. In be specified, simulated, and parameterized, and design methodologies such as System fact, one way to view an FPGA is as a minia- but it also enables design reuse through the Generator for DSP are taking on renewed ture silicon foundry with turnaround times of use of IP cores. roles to provide a solution to the challenges hours rather than months. presented by the software radio. Leveraging these types of tradeoffs does The Reconstruction of the Software Radio (Note: A useful primer for engineers not always mean that the engineering team The platform FPGA provides an opportuni- and executives interested in developing has to construct the functional units from ty for the radio architect to reinvent the sys- products in the SDR application space first principles. To facilitate rapid product tem. Instead of having a radio card that is can be found on the SDRF’s web site, development, many signal processing func- responsible for the DSP heavy lifting at the www.sdrforum.org/sdr_primer.html.) tions are available from the FPGA manufac- front-end of a soft radio system, and then turers themselves and from third-party passing this partially processed data over a References intellectual property (IP) suppliers. FFTs, VME or PCI-X bus to a baseband processor, [1] Mitola, J., III. 2000. Software Radio Architecture: Object Oriented Approaches to Wireless Systems Engineering. multirate filters, Viterbi decoders, and Reed- multiple functions could be integrated into John Wiley and Sons. Solomon encoders and decoders are all avail- one or a small number of platform FPGAs. [2] Allan, A., D. Edenfeld, W. H. Joyner Jr., A.B. Kahng, able as pre-verified IP from Xilinx As shown in Figure 3, compute-intensive M. Rodgers, and Y. Zorian. 2002. 2001 Technology Roadmap for Semiconductors. IEEE Computer (www.xilinx.com/xlnx/xil_prodcat_landing- tasks in the radio PHY could be imple- Magazine 35 (1):42-53. page.jsp?title=Xilinx+DSP). mented in the FPGA logic fabric, while [3] Verlinden, M.C., S.M. King, and C.M. Christensen. One of the roadblocks to the widespread more decision- and control-oriented tasks 2002. Seeing Beyond Moore’s Law. Semiconductor International 25 (8):50-60. deployment of FPGA-based signal processing are run as embedded software on the Power [4] Bass, M.J., and C.M. Christensen. The Future of the has been design methodology related. In the PC™. This embedded processor could Microprocessor. 2002. IEEE Spectrum Magazine: 34-39.

Spring 2003 Xcell Journal 75 How to Make Smart

The Nallatech BenADIC Antenna Arrays card combines a 20- channel data acquisition system with Xilinx XtremeDSP technology and Virtex FPGAs for high-performance digital signal processing.

by Malachy Devlin, Ph.D. Chief Technology Officer Nallatech [email protected]

Wireless communication has created a con- tinuing demand for increased bandwidth and better quality of service. With the ever- increasing number of mobile network sub- scribers, available capacity is becoming more of a premium. “Smart” antenna arrays are one way to accommodate this increasing demand for bandwidth and quality. These antenna arrays provide numerous benefits to service providers. However, the processing requirements for smart antenna arrays are many orders of magnitude greater than those for single antenna implementations. 76 Xcell Journal Spring 2003 In this article, we will describe how This dynamic adaptation of the antenna smart antenna arrays work and present a array response provides focused beams to new product from Nallatech™ that specific users and a new mechanism for combines a 20-channel data acquisition multiuser access to the base station. system with an FPGA computing fabric Conventionally, multiple users are separat- for handling the high-performance digital ed when accessing the base station by signal processing (DSP) operations. We using different frequencies, as in frequency also show you how this combined prod- division multiple access (FDMA). FDMA uct is integrated into a scalable system is used in advanced mobile phone services using Xilinx Internet Reconfigurable (AMPS) and total access communications Logic (IRL™) technology for remote Figure 1- Omnidirectional antenna systems (TACS). FDMA is also used in configuration and control of the system using Nallatech’s field upgrade systems environment (FUSE™) software. Active Beam

Focus Power with Smart Antennas Antennas Figure 1 shows a conventional antenna as omnidirectional. It radiates and receives information equally in all directions. This Antenna Array equal distribution leads to power being transmitted to, but not received by, the user. This wasted power becomes potential interference to other users or to other base stations in other cells. Interference and noise reduce the signal-to-noise ratio used by the detection and demodulation operations, resulting in poor signal quality. Figure 2 - Sectorized antenna with four sectors Figure 3 - Switched antenna array To overcome the problems associated with active beam highlighted with omnidirectional arrays, smart antennas focus all transmitted power to the user provides the maximum signal response. and only “look” in the direction of the Adaptive antenna arrays, on the other user for the received signal. This ensures hand, incorporate more intelligence into Targeted User that the user receives the optimum quality their control system than do switched- of service and maximum coverage for a beam arrays. Adaptive antennas monitor base station. An intermediate step to this their environment and, in particular, the ideal is using directional antennas that response of the data path between the user divide the 360-degree coverage into sec- and the base station. This information is tors. As shown in Figure 2, four direction- then used to adjust the gains of the data Interfering User al antennas can each cover approximately received or transmitted from the array to 90 degrees. maximize the response for the user. With Antenna Array Instead of using individual antennas, adaptive antenna arrays, the control sys- we can create a smart antenna array and tem has full flexibility and determines add further processing intelligence to the how the gains of the arrays are adjusted. data received or transmitted with this By adjusting the gains in this way, the Figure 4 - Adaptive antenna array enhancing the SINR array. Smart antenna arrays enable us to control system can – in addition to maxi- direct beams in specific directions through mizing the gain from a particular user – electronic or software control. also attenuate the signal from an interfer- Two types of smart antenna arrays are ing source, such as from another user or time, as in time division multiple access switched-beam arrays and adaptive arrays. from multipath signals. Therefore, as (TDMA) for global systems for mobile As shown in Figure 3, switched-beam shown in Figure 4, adaptive arrays maxi- communications, interim standard 136 arrays comprise a number of predefined mize the signal-to-interference-plus-noise (IS-136), or code division multiple access beams. The control system switches among ratio (SINR) and not just the signal-to- (CDMA), which is used in third genera- the beams and selects the beam that noise ratio (SNR). tion (3G) systems.

Spring 2003 Xcell Journal 77 cessing requirements. per second, for a sample rate of 105 mega Previously, we had a samples per second. This sample rate is for Antenna Array single stream of data a single beam and does not include the coming from a single processing requirements for the adaptive antenna; now, we have update algorithm. This amount of process- multiple data streams to ing does not seem unreasonable for per- process. As shown in formance in a DSP processor. However, if User 1 Figure 6, the data flow we want to support multiple beams and diagram for a beam- achieve finer beams by increasing the num- forming application is ber of antennas, we could quickly exhaust not a single input data the processing capability of a standard User 2 stream. We now have N processor architecture as we reach process- data streams that must ing requirements of several billion MACs be processed from the N per second. Figure 5 - SDMA allows two users to access antenna elements. By using FPGAs, we have powerful the same base station on the same frequency. The fundamental DSP devices for handling these high- operation carried out performance requirements at sampled data As shown in Figure 5, by using smart in adaptive arrays is to pass the data rates. Furthermore, we can take advantage antenna arrays, we can now use space divi- stream from each antenna through an of the FPGA flexibility for directly han- sion multiple access (SDMA). In this case, adaptive finite impulse response (FIR) dling acquisition control and other DSP users may use the same frequency, time, or filter. Note that in narrowband applica- functions, such as digital down-conversion, code allocations over the air interface and tions, the adaptive FIR filters simplify to demodulation, and matched filtering. only be separated spatially. This enables a single weight vector. The processing SDMA to be a complementary scheme to requirements increase, however, with each 20-Channel Data Acquisition FDMA, TDMA, and CDMA, and SDMA beam processed. Figure 7 shows the Nallatech BenADIC™ thus provides increased capacity within If we consider a simple example where data acquisition card, which can simultane- congested areas. we have four antennas and a narrowband ously capture data from 20 sources at a sus- system, such that the adaptive filters result tained rate of 105 mega samples per sec- Smart Antenna Processing in a single multiplication, we can see that ond. The analog inputs have a 250 MHz A fully adaptive antenna array implementa- the processing requirements approach one- bandwidth and the data is digitized at 14 tion requires a considerable increase in pro- half billion multiple accumulates (MACs) bits resolution. The card produces 3.675 gigabytes of digitized data every second, or the equivalent of 5.4 audio Antenna 1 CDs, for processing. RF Down-conversion Adaptive Filter The large number of tightly coupled input Antenna 2 channels makes the BenADIC card ideal for RF Down-conversion Adaptive Filter processing smart anten-

Array Output na arrays. As shown in ∑ Figure 8, the 20 input channels are partitioned into five groups of four channels. The analog-to- Antenna N digital converters (ADC) RF Down-conversion Adaptive Filter in each of these groups are connected to their own Xilinx Virtex™-E Reference Signal FPGAs. This enables Filter Weight Update local processing of the Figure 6 - Data flow for beam-forming four channels. Thus, the architecture can be

78 Xcell Journal Spring 2003 Xilinx FPGAs Allow for By using the FUSE software, this control Software-Defined Radio can be handled locally over a PCI or The great thing about Xilinx FPGAs is their remotely via TCP/IP, for example. ability to be reprogrammed on the fly and to give hardware different personalities based Conclusion on the application. Nallatech has been The BenADIC card from Nallatech cou- implementing dynamically reconfigurable ples the power of Xilinx FPGAs with a FPGA systems for a number of years. The highly integrated 20-channel data acquisi- BenADIC card is Nallatech’s newest product. tion system on a single card. It is ideally The BenADIC card is compatible with suited for handling large, smart antenna the Nallatech FUSE software environment, arrays or several smaller arrays. Figure 7 - Xilinx Virtex-E FPGAs which provides the capability to selectively The combination of the FPGA per- on a BenADIC 20-channel, and dynamically change the operation of formance and flexibility enables the realiza- 14-bit data acquisition card FPGAs in the BenADIC card or other tion of advanced DSP algorithms, which in FPGA-based systems, including modular turn opens the possibility for deploying arranged to handle five antenna arrays, DIME systems. This ability to dynamically advanced wireless interfaces. Deploying each with four antennas within the array. update a system leads to the definition of a advanced wireless interfaces provides users Alternatively, the high-speed internal software-defined radio where the receiver with a better quality of service and gives buses enable these groups to be intercon- characteristics are controlled via software. service providers greater capacity. nected to handle an array of 20 antennas. In addition to the channel 20 analog inputs group FPGAs, a large FPGA can handle further processing and communicate with the compact PCI (cPCI) backplane and the PCI bus (via the interface FPGA). Communications over the cPCI backplane allow data Interface Interface Interface Interface Interface transfer to other cards in the sys- FPGA5 FPGA4 FPGA3 FPGA2 FPGA1 tem, such as to the Nallatech SD-RAM SD-RAM SD-RAM SD-RAM SD-RAM DIME-II™-based BenADIC card, which can accept DIME-II modules and provide more than 50 million system gates with the Virtex-II family for Xilinx XtremeDSP™ operations. User FPGA: The BenADIC card is Control FPGA: Clocks Clock A Clock Xilinx distributed Clock B sources and compatible with the tools and Flash Virtex-E all FPGAs configuration Memory Xilinx Clock C cores produced by the Xilinx Virtex FPGA FPGA for user DSP group and includes the with control & designs PCI core powerful System Generator SD-RAM tool. By using System Generator, you can develop 82 165 and verify your algorithms 64 bit / 66 MHz J3 Connector J4 & J5 backplane within MATLAB™ and PCI bus digital I/O COMPACT PCI BACKPLANE Simulink™ environments. You can then synthesize and Local bus (64-bit width) implement your design for the Interface FPGA to user FPGA bus (58-bit width) PCI to user FPGA bus (40-bit width) FPGA. This implementation Parallel link buses (12-bit width) exercise has been carried out Adjacent bus (64-bit width) successfully for the BenADIC Figure 8 - Block diagram of a BenADIC card using System Generator.

Spring 2003 Xcell Journal 79 FPGAs Have the Multiprocessing I/O Infrastructure to Meet 3G Base Station Design Goals

Two-dimensional fabric efficiently links by Peter Galicki, CEO CrossBow Technologies Inc. arrays of processors inside Virtex-II Pro [email protected] With increased data traffic and new devices to enable parallel processing of data. multiuser detection and adaptive beam- forming algorithms, data processing requirements of 3G base stations will increase by as much as 100 times relative to current equipment. This increase in processing capacity must be matched by low power consumption, as the new pico- cell base stations mounted on building sides will not be using forced air cooling. Arrays of small and specialized processors (Figures 1 and 2) will provide a power-efficient method of increasing performance, more so than can be obtained by increasing the features of larger, general-purpose super processors. The evolution of current standards and introduction of new standards currently force base station operators to perform frequent upgrades to their wireless infrastructure, often requiring board replace- ments. To reduce field maintenance, 3G equipment must be upgradable without board swapping. The high cost of 3G equipment often leaves wireless infrastructure manufacturers with thin profit margins; costs will have to come down to enable large-scale deployment. Finally, OEMs cannot abandon their current design methods to start designing 3G equipment from scratch. They must be able to reuse semiconductor IP, code, and development tools to hit market windows and to obtain a return on investment.

80 Xcell Journal Spring 2003 cated DSP blocks. Virtex™-II FPGAs sup- general purpose super processor array of small specialized processors port DSP functions and a MicroBlaze™ soft processor. Virtex-II Pro™ devices also feature embedded IBM PowerPC™ processors. Depending on the task at hand, smaller processors may be better suited for simpler functions, and larger processors may path of one instruction be a better fit for more complex algorithms. In order to work in parallel, processors must be able to easily communicate with each other. An efficient way for processors to communicate is through a fabric dispersed across the entire design that looks to one instruction consumes one instruction consumes individual processors like conventional a lot of power very little power memory (Figure 3). This approach enables each processing element to be developed Figure 1 – Arrays of small specialized processors reduce power consumption. and verified individually, yet easily exchange data with other processors.

Two-Dimensional Data Communications general purpose super processor array of small specialized processors An effective data interconnect fabric must support low latency and deterministic data transfers occurring simultaneously among multiple processing elements. It must also be flexible and scalable to allow for the same size addition of new elements or the removal of silicon unwanted elements without affecting the rest of the design. Finally, it should be compatible with existing processors and be as easy to use as accessing memory.

Memory-Like Interface 8 data operations per cycle 256 data operations per cycle Using conventional bus cycles to transfer data between processors dispenses with Figure 2 – Arrays of small specialized processors increase performance. exotic and hard-to-implement communications peripherals and protocols in favor Meeting these seemingly exclusive goals precisely deterministic – otherwise compo- of a simple memory-like interface. As requires a combination of top-notch process nents will waste valuable processing cycles shown in Figure 4, 2D-fabric from technology, combined with comprehensive while waiting for data. CrossBow appears to processors as a mem- component library and efficient data com- Low latency requires the removal of data ory-mapped peripheral on an IBM munication methods, inside-chip and chip- communication bottlenecks by spreading CoreConnect™ bus. PowerPC and to-chip. Reaching 3G design goals hinges on the data flow over the entire area of a chip. MicroBlaze processors can issue conven- achieving the right balance between the size Transfer determinism is achieved by a com- tional read and/or write bus cycles to their and number of data processing components bination of low latency and a uniform data local 2D-fabric peripherals to communi- to keep most of the chip busy at all times, communications structure inside the chip. cate with other processors on the chip while reducing the overall distance that the (Figures 5 and 6). The payload for each data has to travel inside ICs. System efficien- Data Processing Elements transfer is derived from the data bus. The cy is heavily influenced by design partition- Each 3G chip is likely to contain hundreds destination location and the initial direc- ing, optimization of individual data of processing elements, many representing tion of travel are derived from the address processing components, and streamlining autonomous processors with their own data bus. The transfers are totally transparent to data flow between components. To keep data processing flows, control flows, memory, the sending and receiving processors, processing components busy, inter-processor and communications ports. Some data pro- launching transfers with write cycles and data transfers must have low latency and be cessing flows may be augmented with dedi- terminating transfers with read cycles.

Spring 2003 Xcell Journal 81 Routing of data from source to destination, lines to transfer all kinds of data, including cycles are autonomously converted to small as well as arbitration with other data traffic, payloads, control words, and configuration packets that travel between source processors is performed autonomously by the inter- data. Duplication of data transfer lines and destination processors through chains of connected 2D-fabric peripherals. reduces overall system efficiency. 2D-fabric peripherals of the intermediate As shown in Figure 7, 2D-fabric periph- processors along the way. Short point-to- A 2D Array of Data Transport Links erals of adjacent processors are interconnect- point links reduce power consumption. Efficient 3G designs will feature global ed with a single mesh of horizontal and Small packets with single word payloads communication fabrics using single sets of vertical data transport links. Individual bus reduce data transfer latencies, enabling data and control packets to share common transfer lines. The same lines can also be used for system initialization and configuration. Peripheral PowerPC MicroBlaze + DSP Scalability CoreConnect Bus I/F Scalability is an important requirement for the design effort and product field upgrades. Constantly changing standards may require adding or removing processors late in the design cycle or even after field deployment. In the past, adding or removing proces- 2D-fabric peripheral sors has always been difficult when using centralized DMAs for movement of data. In any centralized I/O structure, removing CPU 2D-fabric BUS transport or adding new components is likely to links affect other system components. Two- dimensional I/O structures are much less EDGE OF 2D-fabric sensitive to design changes. Adding anoth- A CHIP (high-level view) other local (detailed view) peripherals er processor to a chip is as simple as wrap- ping it with a 2D-fabric peripheral and connecting the respective data transport links to the existing fabric. This can be eas- Figure 3 – Two-dimensional I/O fabric enables arrays of processors to work in parallel. ily done without affecting any hardware or software already in place.

Low Latency and Deterministic Data Transfers HARDWARE VIEW 11 12 13 14 15 16 In computing environments where hun-

YX coordinates SOFTWARE VIEW dreds of processors are simultaneously 21 22 23 24 25 26 exchanging data, how can you guarantee that any one of those transfers is going to

31 32 33 34 35 36 3 wait states arrive at its destination no later than a

C PACKET FC13 13 EXIT fixed amount of time? Buses, crossbars, 41 429 433 44 45 46 DIRECTION 2 wait states F324 24 and other centralized I/O structures force FC23 23 F335 35 6 all data traffic through one central loca- F334 34 F346 46 51 52 53 54 55 56 tion, creating huge traffic jams. Two- F345 45 F655 55

1 wait state F654 54 F664 64 dimensional I/O structures, however, can 61 62 63 64 65 66 FC33 33 F663 63 F962 62 easily guarantee data delivery by spreading F344 44 F952 52 F951 51 out data traffic across the design. As EDGE OF F653 53 F941 41 FC31 31 A CHIP shown in Figure 7, a two-dimensional FXXX XX F942 42 FC32 32 FC22 22 1 wait state data transport grid dispersed across the LOCAL BUS 2 wait states PROCESSOR 43 + 2D-FABRIC READ READ WRITE WRITE entire design area removes communica- 3 wait states ADDR DATA ADDR DATA tion bottlenecks to allow individual transfers to complete on time, without Figure 4 – Deterministic worst-case transfer latencies look like memory wait-states. interfering with other transfers. Individual processors must use worst-

82 Xcell Journal Spring 2003 case transfer latency when planning data Worst-Case Latency Across packet delay time is 50 ns latency plus transfers. Although it is acceptable for data One Processing Node another 50ns for the packet to fully emerge to wait to be transferred, processors waiting A 50 ns packet latency across one node from the node. for data are wasting precious processing represents the time elapsed from when the If packets exiting from a given output cycles. Total transfer latency depends on packet started entering the node to the port can arrive from three different sources, the worst-case latency across one process- time when it started exiting that node. A the worst-case latency for any one packet is ing node and the number of intermediate packet delay time is the time from when it 250 ns. This is equal to the best-case laten- processing nodes between the source and starts entering the node to the time when cy of 50 ns plus two packet-delay slots of destination nodes. it completely emerges. Thus, a 100 ns 100 ns each.

Worst-Case Latency Across Several Processing Nodes If the worst-case latency for crossing of one processing node is 250 ns, the worst-case 2D-fabric ready for the next write bus cycle TXI 200 MHz CLK latency for the entire transfer chain of two

Packet exit direction and YX destination coordinates come from the write address bus ADR 9 BA nodes, for example, would amount to 500 ns. Thus, if a packet is launched from a Data payload comes from the write data bus DWR 87654321 source processor two nodes away from its RNW destination, it will take it a maximum of 500 REQ ns to arrive at its destination processor, ACK regardless of any other data traffic in the sys-

2D-fabric peripheral tem (Figure 8). converts one bus write cycle to one packet 2D-fabric 3 C Total Latency 100 MHz CPU T9P Because 2D-fabric appears to processors as if HEADER DATA PAYLOAD T9D BA 81765432 it were memory, and because transfer laten- 9 6 YX BYTE 4 BYTE 3 BYTE 2 BYTE 1 cy increases with the geographical distance T9B from the source, processors can treat transfer T9N latency as memory wait states for the purpose of scheduling the transfers. In a fully deterministic way, the further you go, the Figure 5 – Conventional write bus cycles launch data packets. more wait states will be required to complete a transfer (Figure 4). Although actual latency for the above

2D-fabric example is most likely to be closer to the

100 MHz best-case latency of 100 ns, the worst-case CPU BA R3P latency should always be used when plan- HEADER DATA PAYLOAD R3D BA 81765432 ning data transfers between processors. In YX BYTE 4 BYTE 3 BYTE 2 BYTE 1 some I/O fabrics, worst-case latency can be R3B further reduced by launching packets in spe- R3N cific routing directions to avoid interference

2D-fabric peripheral with other packets, thus reducing the num- converts one packet to one read bus cycle ber of packet delay slots from two to one, or RXI 2D-fabric ready for the next read bus cycle (the header is discarded) 200 MHz even down to zero. CLK As shown in Figure 4, 2D-fabric allows ADR Address bus only needs decode bits to select 2D-fabric and de-select other peripherals processors to easily determine the worst- DRD Read data bus content comes from the data payload 87654321 case transfer latency for any destination RNW inside the chip by simply counting the REQ number of intermediate nodes. 2D-fabric ACK also enables packets to be launched in any one of four possible directions by encoding Figure 6 – Conventional read bus cycles receive data packets. exit directions in the address field of each data write cycle.

Spring 2003 Xcell Journal 83 Conclusion Two-dimensional inter-processor interfaces enable fast, easy, and efficient data communications among hundreds of data processing elements of 3G functions implemented inside Virtex-II FPGAs. In addition to 3G, two-dimensional I/O also benefits voice- 2D-fabric over-packet, routers, medical imaging, radar, read CPU bus and sonar applications. Linking processing 35 cycle elements with 2D-fabric increases system D performance by enabling multiple processors

path of a to process data in parallel. At the same time, single packet 2D-fabric reduces power consumption by minimizing the total distance that data has to 4-byte travel inside chips. payload

2D-fabric And because it looks to the processors like conventional memory, 2D-fabric does not 35 write CPU 44YX coordinates 45 bus 43 force system programmers to change their cycle D A programming methods to benefit from high-

2D-fabric 2D-fabric er performance. Serial programming code investment is preserved, because each processing element has only one processor. Finally, system designers can now drasti- cally increase processing throughput and I/O Figure 7 – Non-blocking multiple simultaneous data transfers bandwidth while retaining current processor architectures and design tools. For more information on the 2D-fabric parallel-processing interface, go to 100 ns www.xilinx.com/products/logicore/alliance/ crossbow/crossbow.htm. LT input

output L1 L2

best-case latency = 50 ns

input output

worst-case latency = 250 ns input output

4-byte 100 MHz LT = L1 + L2 payload CPU

Total best-case latency 100 ns= 50 ns+ 50 ns

Total worst-case latency 500 ns= 250 ns + 250 ns 2D-fabric

Figure 8 – Latency can be determined by counting intermediate nodes.

84 Xcell Journal Spring 2003 AWESOME ASIC PCI-XPCI-X PCIPCIPCI Prototyping Solutions Cool Emulation Platforms DN3000K10S Single VirtexIITM PWB - 2V4000/6000/8000 - 2 PLL's for high speed clock distribution - 3 512k x 36 SSRAM's $995 - 1 GB SDRAM DIMM (PC133) - FPGA configuration using SmartMedia Card - Sun, Win2000/NT/98, LINUX Drivers and Utilities

PCI-X Extender DN3000K10 64-bit/133MHz Active Extender Logic Emulation PWB with the - Tundra TSI320 Dual-mode PCIX-to-PCIX bridge Hot new Xilinx VirtexIITM FPGA's. - Active PCI Extender with TWO 32/64-bit slots - Up to 5 2V8000's per PWB -- 466,000 FF's! - Easy access to all PCI-X signals - 2 PLL's for high speed clock distribution - PCI/PCI-X Universal 5V/3.3V I/O - 4 512k x 36 SSRAM's - Secondary PCI Frequencies as low as 10MHz - 1 GB SDRAM DIMM (PC133) - Numerous LEDs show system status - Monstrous amounts of interconnect - +3.3V not needed on host - FPGA configuration using SmartMedia Card $595!

PCI Extender 64-bit/66MHz Active Extender DN2000K10 Design Consulting - Intel® 21154 PCI-to-PCI bridge Logic Emulation PWB based on the Services Available - Active PCI Extender with THREE 32/64-bit slots for any ASIC or - Easy access to all PCI signals Xilinx VirtexE series FPGA's - 5V 3.3V or 3.3V 5V signal translation - Up to 6 Xilinx Virtex or Virtex-E FPGAs FPGA projects! - Frequency division of PCI Clock - Stackable to 4 or more PWBs! - Numerous LEDs show system status - 32/64 bit Universal PCI - +3.3V not needed on host A Sample of Our Services: Mr. Lazy Says: Mr. Freaky Says: - FPGA/ASIC/Board Design "Cheepest Way - Turnkey Verilog/VHDL "I hate searching for To a Working Design and Verification bugs; these guys Design!" - ASIC FPGA Design Translation made my life easy!" - HW Emulation and Testing - White-Backed - Blue Naped Mousebird Mousebird - Algorithmic Acceleration The DINI Group 1010 Pearl Street, Suite 6 La Jolla, CA 92037-5165 ASIC http://www.dinigroup.com 400M+ Gates Voice: (858) 454-3419 FAX: (858) 454-1728 SOLD! Email: [email protected] BluetoothBluetooth WirelessWireless TechnologyTechnology GetsGets BOOSTBOOST LiteLite ProcessorProcessor For easy and fast access to Bluetooth forfor VirtexVirtex FPGAsFPGAs wireless technology, NewLogic offers the BOOST Lite processor and development board for Xilinx Virtex FPGA products.

by Yan Siang Goh NewLogic Technologies, Inc. [email protected]

In recent months, many Bluetooth™- enabled products have been slowly appear- ing in the market. The proliferation of these products strengthens the acceptance of Bluetooth wireless technology in the market and as a global standard for short distance wireless communication. The BOOST Lite™ baseband processor for Bluetooth technology is designed for easy integration into the Xilinx Virtex™ FPGA family. The BOOST Lite processor is based on the popular BOOST Core™ processor and is complemented with the Bluetooth protocol stack and BOOST software to implement a complete Bluetooth wireless system.

86 Xcell Journal Spring 2003 External PCM CVSD transcoder BOOST Lite core PCM Codec and PCM interface inside Xilinx Virtex FPGA Voice bus

Interrupt Voice Internal Timing Registers bus Generator Port Generators

Bus HEC Exchange FEC Whitening Encryp tion Interface Data CRC Memory External Bluetooth radio module Control Radio Frequency Synchronization Interface Selection Logic Processor bus

BOOST Lite Core

BOOST Lite Option External ARM7TDMI External components processor External bus signals Figure 1 - BOOST Lite architecture

BOOST Lite Core Features stream from a UART. • Based on BQB version 1.1 qualified The BOOST Lite core BOOST Core processor interfaces to a fast processor bus or the AMBA-type bus. • Interface to most major Bluetooth This bus ensures that data is radios in the market moved quickly between the • Supports co-existence with 802.11b, processor and the exchange piconet, and scatternet operation memory, which is accom- modated internally on the • Supports Bluetooth low power modes Xilinx Virtex FPGA. Figure 2 - BOOST Lite development board • Supports all data and voice packet types From the architecture, you can see that the BOOST Lite processor A BOOST Lite development board is • ARM™ processor AMBA-type interface is designed as an “out of the box” Bluetooth also available (see Figure 2). The board is • Optimized for the Xilinx Virtex family. wireless solution. The BOOST Lite proces- supplied with an ARM7TDMI™ processor, sor can be integrated with any other third- external Bluetooth radio, and a Xilinx Virtex BOOST Lite Architecture party system, such as a printer or test FPGA as the BOOST Lite core. The BOOST Lite core has a fixed bus equipment core, that requires Bluetooth interface to an external ARM processor functionality. Conclusion and external Bluetooth radio (see Figure Furthermore, you can use the BOOST With the availability of BOOST Lite 1). Some external RAM and ROM (as Lite processor for fast prototyping before processors for the Virtex FPGA family, the well as EPROM, EEPROM, and so on) committing to a ASIC/ASSP design cycle. process of designing a Bluetooth-enabled are necessary to host the BOOST soft- This enables an easy progression from an system using Xilinx Virtex FPGAs is ware, which is the Bluetooth protocol FPGA prototype to a silicon solution. greatly simplified. For more information stack. The CVSD encoder (available as an NewLogic offers an option for BOOST on the BOOST™ family and the option) and a voice coder are necessary to Lite users to upgrade to the BOOST Core WiLD™ 802.11 WLAN technology support voice operation. For data applica- processor as a full, source code version of a family, visit NewLogic’s website at tions, it is possible to input/output a data Bluetooth baseband processor. www.newlogic.com.

Spring 2003 Xcell Journal 87 by Rick Richmond Senior Design Engineer Amphion DecodeDecode MPEG-2MPEG-2 [email protected] MPEG-2 is the digital video paradigm of today. It is at the heart of the Digital Video Broadcast (DVB) and Advanced Television Systems Committee (ATSC) standard as well as high-definition digital television VideoVideo withwith systems and DVD-video, which has seen incredible market growth in recent years. The widespread adoption of these applications and systems, coupled with considerable investment by broadcasters and dis- tributors, indicate that MPEG-2 is going to VirtexVirtex FPGAsFPGAs be around for a good while to come – despite the emergence of new, even better video compression algorithms. Future con- Amphion’s CS6651 video decoder sumer digital applications, in which audio, video, and data networking technologies enables the decompression of converge, are certain to need built-in MPEG-2 video capability. video streams in real time. Yet MPEG-2 video is fairly complex and computationally intensive to decode. The main features of the algorithm are discrete cosine transform (DCT)-based compression and motion estimation techniques. Until now, decoder implementations for digital STBs (Set Top Boxes) and DVD- video players had been the domain of ASIC implementations or software running on very powerful processors. As the sophistication of products like STBs grows, designs will require ever-increasing flexibility and ever-decreasing development time scales. Now, using the Xilinx Virtex™ series of FPGAs and a new intellectual property (IP) core from Amphion, building MPEG-2 video into your designs is simple. Amphion’s CS6651 video decoder, which incorporates an integrated external SDRAM memory controller and display direct memory access (DMA), is a great example of the Platform FPGA capability of the Xilinx Virtex series. This solution allows decoding of MP@ML MPEG-2 video with NTSC or PAL frame rates and resolutions.

The MPEG-2 Video Algorithm MPEG-2 MP@ML video provides a generic video compression solution for applications such as satellite, terrestrial, and cable

88 Xcell Journal Spring 2003 television (in DVB and ATSC formats), as ment from the current position at which shown in Figure 1. The core requires a well as optical storage (DVD-video) at the stored reference picture best resembles minimum clock speed of 27 MHz to NTSC, PAL, and SECAM resolution and the current macro block. The difference (if maintain MP@ML decoding rates. Video frame rates. any) between the motion-compensated elementary streams are accepted into the MPEG-2 video sequences are composed prediction and the actual image is coded core via the byte-wide ES_Data input of three different types of pictures: using the DCT-based techniques described port on the elementary stream interface. • Intra coded pictures (I-pictures), above. The sum of the predicted samples which are compressed using DCT- and the prediction error give the final Parser based techniques reconstructed macro block. The front end of the core is the video elementary stream parser. It searches • Predictive coded pictures the syntax of the incoming stream (P-pictures), which use motion for start codes at which decoding compensation to predict the may commence. The parser extracts current picture from a past ref- the various encoding parameters erence picture from the headers, which are used to • Bidirectional predictive coded direct subsequent decoding. The pictures (B-pictures), which use remaining variable-length encoded predictions from past and picture data is passed onto the VLC future reference pictures. B- decoder. type pictures are not themselves used as reference pictures. VLC Decoder Here, the Huffman-style VLC DCT-based Compression picture data is decoded. The out- In MPEG-2 MP@ML video, puts of this block include DCT each picture is broken down block run-level codes and motion into 16x16-sized blocks of lumi- vectors for motion compensation nance samples, called macro of each macro block. blocks. These blocks are further divided into 8x8 blocks. Each Run-Level Decoding and Inverse macro block has six blocks in Quantization total: four luminance blocks and two sub Decoding MPEG-2 Video The run-level decoder converts run-level sampled chrominance blocks (a 4:2:0 As devices and applications grow in com- codes from the VLC decoder into com- chrominance format). In the encoding plexity, the Amphion CS6651 MPEG-2 plete blocks of 64 quantized DCT coeffi- process, a two-dimensional, 8-point video decoder IP core can be implemented cients. These coefficients are then con- DCT is applied to each 8x8 block; the by taking advantage of the capabilities of verted from zigzag scan order to natural resulting coefficients are quantized using Xilinx Platform FPGAs. row order before being dequantized. a 64-element quantization matrix. As part of a demonstration system, the Virtex block RAMs are used to support This process reduces amplitude and core is implemented in a Xilinx Virtex the scan conversion operation and the increases the number of zero-value coeffi- XCV800 device. Including extra interfac- storage of custom quantization matrices, cients. The quantized DCT coefficients are ing glue logic and PAL/NTSC video which may be sent in the elementary reordered in a zigzag fashion into a one- encoder driver logic, the core consumes stream headers. dimensional stream, effectively grouping fewer than 8,000 slices and 26 block together runs of zero-valued coefficients RAMs. The implementation benefited Inverse DCT interspersed with non-zeros. This stream of greatly from the following features of the This unit performs the computationally run-level pairs is then encoded using target device: intensive inverse DCT (IDCT) on Huffman-style variable length codes the dequantized 8x8 blocks of DCT coef- • Ample high-performance block RAM (VLCs) based on a statistical model. ficients. Making use of the high-speed • Fast I/Os on-chip block RAM, the IDCT unit is Motion Compensation capable of streaming data continuously, • Extensive logic resources. In predictive and bidirectionally predictive transforming in 64 clock cycles an coded pictures, each macro block may have The functional blocks and simplified 8x8 block of DCT coefficients into an a number of pairs of motion vectors. These interfaces of the Amphion CS6651 8x8 block of luminance or chrominance specify the horizontal and vertical displace- MPEG-2 video decoder IP core are samples or prediction errors.

Spring 2003 Xcell Journal 89 Motion Compensation and Picture Reconstruction For each macro block in a P-picture or B-picture, the motion compensation unit ES_Data SD_DataOut takes the decoded motion 8-bits 32-bits Motion vectors from the VLC ES_Valid Parser Comp. decoder and translates them VLC SD_DataIn ES_Stall into row and column coor- Decoder 32-bits dinates for the prediction samples in the reference picture. The frame store SDRAM Inverse I/F memory then requests these Quant & samples and retrieves them Run-Level Decoder via the SDRAM interface. The samples are combined Picture with other prediction sam- Recon. ples, if necessary, to com- IDCT plete the prediction for the macro block. The final stage of decoding is to add the prediction P_Data samples to the prediction H_Addr 16-bits error corrections from the Control Display 22-bits inverse DCT unit and write DMA P_DataAvail H_DataIn Host I/F the reconstructed samples H_DataOut P_DataStrobe into the frame store memory. If a block has no predict- 32-bits ed samples, then the samples from the inverse DCT are the final samples and are passed straight through. Figure 1 - Simplified block diagram of Amphion CS6651 MPEG-2 video decoder IP core Both motion compensation and picture reconstruction employ Virtex block RAMs for buffering Display DMA Simple 32-bit read/write access to the predictions, samples, or prediction errors The display DMA unit retrieves decoded frame store is also available. In the and final reconstructed samples. samples from the frame-store memory line- demonstration system, a host processor by-line for display. This unit has a config- controls the core to perform special SDRAM Interface urable double-byte output interface for effects modes such as pause and fast- The frame store memory can be imple- luminance and chrominance samples and forward in response to user commands. mented using a commodity PC100 or can also perform chrominance upsampling better 64-Mb SDRAM part. The to a 4:2:2 format in the vertical direction. Conclusion SDRAM interface handles the mapping This interface provides a number of hand- MPEG-2 video decoding promises to be of row and column motion compensation shake signals and flags (not shown in an important feature of many future prediction requests and reconstructed Figure 1) to easily allow for the addition of products. In addition to the Amphion sample writes into linear memory extra logic, to create sync pulses suitable for CS6651 MPEG-2 video decoder IP core addresses. Motion compensation places connecting the decoder to a NTSC or PAL implemented on a single Xilinx Virtex particular memory bandwidth demands video encoder chip. Platform FPGA, Amphion is developing on the decoder implementation. To even more sophisticated FPGA-based achieve adequate decoding performance, Host Interface and Control decoders. this unit must then arbitrate between the Access to internal control, status, and For the latest information about other functions, such as the display video stream parameter registers within Amphion’s advanced decoders, visit DMA, which also access the frame store. the core is provided via the host interface. www.amphion.com/video.html.

90 Xcell Journal Spring 2003 BughuntersBughunters @@ SiemensSiemens

Siemens has developed a powerful diagnostic tool for its high-volume by Alfred Fuchs, MSEE Siemens has developed a powerful diagnostic tool for its high-volume Project Manager telephone switches. Advanced FPGA technology and high-productivity EDA Siemens CES Design Services telephone switches. Advanced FPGA technology and high-productivity EDA www.ces-designservices.com toolstools fromfrom XilinxXilinx havehave enabledenabled sophisticatedsophisticated testtest strategies.strategies. [email protected] Gerhard R. Cadek, Pd.D. EDA Trainer Oregano Systems [email protected]

Spring 2003 Xcell Journal 91 Most people are unaware that hundreds of microprocessors are involved in placing a cellular telephone call. Although this may be a surprising fact for the average cell phone user, mobile communications experts are even more astonished that such a complex system is stable and seemingly robust. For despite a well-defined development methodology and quality assurance meas- ures, bugs are sure to exist in any microprocessor code or software. Particularly annoying are sporadic bugs, which show up only under special, rare circumstances at intervals as long as months apart. Typically, such bugs are associated with the dynamics of the multiprocessor system. Of course, there are plenty of debug tools to analyze software, but applying Figure 1 - Siemens “Bughunter” HW-Tracer with Virtex-E XCV2000E FPGA them may change the dynamics of the system and can therefore mask the problem. per second. Actual cycle rates can even be FPGAs was the only technology available This is where the Siemens Hardware higher, as data bursts are transferred in a that could meet the requirements. We (HW) Tracer comes into its own in identi- compressed format. chose Virtex-E FPGAs because: fying software bugs before they can cause To use the HW-Tracer’s two channels • Virtex-E devices have up to 804 user problems in the field. Based on Xilinx independently for processor or memory, we I/Os and a 130 MHz internal per- Virtex™-E FPGAs, HW-Tracer is a stand- dynamically reconfigured the FPGA that formance level. alone, rack-based data capture, analysis, acts as the heart of the tool. Reconfiguring • There were more than enough rout- and debug tool (Figure 1). the hardware to complete two different ing resources to capture the 32-bit trace tasks halved the cost of the hardware. busses with widths as wide as 144 Hardware Tracer bits. With densities ranging from The HW-Tracer’s task is to acquire, filter, The Technical Challenge 58K to 4M system gates, a cascade and qualify all bus cycles and additional A number of key design considerations chain for wide-input functions, and monitor signals. These signals can be decided the final hardware implementa- dedicated carry logic for high-speed accessed via interfaces on the front panels tion. Because of their reprogrammability arithmetic functions, the Virtex-E of every processor and memory board and and dynamic reconfigurability, we selected FPGAs were ideal for this task. are recorded in trace memory together with Virtex-E FPGAs. a timestamp that is 2 million cycles deep. The tasks the HW-Tracer had to per- • The programmable SelectIO™ The tracer can log the complete activity form included: standards on Virtex-E devices were of the system in real time without interfer- • Analyze nearly 200 input signals at a able to handle signal integrity issues ing with its operation. Moreover, it can also system speed of 75 MHz at the board level, supporting 20 trigger specific events due to a tight cou- • Capture a high number of 32-bit buses high-performance interface standards pling with the Coordination Processor’s with a data path as wide as 144 bits, with as many as 804 single-ended (CP) operating system. which would stress routing resources I/Os or 344 differential I/O pairs. HW-Tracer is essentially a logic analyzer • The fast, high-density, 1.8V Virtex-E with many proprietary extensions. Because • Cope with signal integrity issues on the family is designed for low-power of the number, high speed, and complexity board level. operation. of these extensions, we developed the HW- Power aspects also had to be taken into Tracer as an in-house tool for Siemens. For account in the overall design. The circuit design took a massive example, it tracked processes (whose address- We conducted a feasibility study before pipelining approach, mathematically es were determined at runtime) and com- we developed the new HW-Tracer. speaking. In addition to a design complex- piled a view containing only program jumps. Specifically, we investigated whether FPGA ity of about 400K gates, which was main- To meet the demanding requirements of technology was capable of dealing with the ly determined by the logic in the data the new CP generation, HW-Tracer has to necessary system performance. Our study path, there was also complex control logic cope with a peak data rate of 2 gigabytes showed that the Xilinx Virtex-E family of to be implemented. A unique feature in

92 Xcell Journal Spring 2003 Xilinx FPGAs is a mode called SRL16 this issue successfully by interacting with dards, and pipelining capabilities of Virtex- (Shift Register LUT), which can be used Xilinx Alliance software, and thus, reduc- E combined with the powerful design tools to increase the effective number of flip- ing the number of design iterations. We made the choice of FPGA easy.” flops per configurable logic block (CLB) have seen improvements on the critical by a factor of 16. (Adding flip-flops paths as high as 15%. Emulation enables fast pipelining.) We carried out many implementation The application software of the HW- The state machines required to control runs to incrementally move toward our Tracer had to be thoroughly verified before the operation modes of the HW-Tracer performance goal. In our project, we its first use. In this environment, the could not be pipelined and threatened to started the development based on prelim- Tracer design was stimulated by a pattern limit the achievable system speed. We inary timing data; later updates of the generator, which was included into the have found, however, that arithmetic per- timing data entailed some refinements of FPGA design using internal Virtex-E formance is not the bottleneck in today’s the constraints. block memories. The patterns were again leading-edge FPGAs. We benefited a great deal from the sta- derived from the system simulation. In the bility and performance of Xilinx design final product the pattern generator is used Choosing the Design Flow software. We achieved turnaround times of for a “learn mode,” which helps users Despite the extreme performance needed, three to four hours for the whole imple- familiarize themselves with HW-Tracer we decided to code the design in VHDL, mentation task, which is excellent when before being connected to an actual CP which is technology-independent at a high taking into account the complexity and unit for test and debug. level and controls the implementation via tough timing constraints. just the constraints in the synthesis and Internet Reconfigurable place-and-route tools. We have successfully Management Views In order to facilitate troubleshooting on used this automated design flow already in The CP’s bring-up-phase depended heavily any of the switches installed throughout the past for several FPGA designs. It pre- on the availability of HW-Tracer, which the world, the HW-Tracer was designed as vents design issues and saves time when was developed in parallel. To address this a portable device and squeezed into a 3U- modifying the design. risk, we added several additional verifica- CompactPCI chassis. Only the memory modules had to be tion steps to the process. The FPGAs were loaded by the instantiated; using the Xilinx CORE The test equipment included in the embedded PC, which took the configura- Generator™ tool, this proved to be quick VHDL system simulation comprised: tion data from the hard disk. Embedded and easy to implement. • Two main memory units, each with a software as well as FPGA design files are It was clear that we would have to utilize million-gate ASIC containing two accessible on a dedicated homepage – simulation as if we were developing an embedded CPU-cores where future updates and upgrades can be ASIC. Simulation (including a gate-level downloaded. simulation with the VHDL output of • Several different processing units, Xilinx Alliance Series™ software) added to each with a million-gate ASIC and Conclusion the overall design time but paid off at the two CPUs The powerful combination of Mentor end, when the FPGA was delivered with • Up to 16 peripheral models and an Graphics’ Leonardo Spectrum synthesis only two very minor bugs detected. ATM-controller. with Alliance software and Virtex-E hardware from Xilinx enabled Siemens to Striving for Timing Closure In the context of the system simulation, bring its new range of telephone switches We selected Mentor Graphics’ Leonardo the CPU’s firmware was already verified. to market. Virtex-E FPGAs were used at Spectrum™ software to synthesize the “The verification of the HW-Tracer at the heart of the HW-Tracer, which was design and Alliance Series software for the the virtual, simulated level ensured that it used to debug and test the Coordination back-end task. Both tools enabled us to could be immediately used for testing the Processor project. The software combina- implement a completely automated physical prototype of the CP unit. This is a tion allowed FPGA design iterations to be design flow. This was important because great advantage over ASIC designs. Using completed quickly without affecting the in past designs, timing closure could not FPGA simulation tools shortened our tight timing constraints. We chose Virtex- be guaranteed; code changes tended to design time greatly,” said Johann Notbauer, E devices for their high logic densities, result in new worst-case paths that violat- Siemens CES Design Services’ technical high system speeds, flexible system I/O, ed the tight timing constraints. We have director for ASIC and FPGA design. and ability to perform fast pipelining uti- experienced this timing constraints prob- Friedrich Wilhelm, technical director lizing the SRL16 mode. lem when designing with ASICs. and specialist for proprietary test tools at As one customer put it, “The HW- The most recent releases of the Siemens, added, “The high logic densities, Tracer is the most important debug tool in Leonardo Spectrum synthesis tool address flexible high-performance interface stan- the CP project.”

Spring 2003 Xcell Journal 93 support.xilinx.comsupport.xilinx.com:: AnswersAnswers YouYou NeedNeed WhenWhen YouYou NeedNeed ThemThem

Need help? Simply go to support.xilinx.com on the World Wide Web.

by Doug Horne Product Marketing & Web Development Global Services Division Xilinx, Inc. [email protected] A robust, comprehensive, online resource, support.xilinx.com stands ready to answer all your design questions 24 hours a day, 7 days a week, 365 days a year (366 in a leap year). With customized access to such features as Answers Search, Software Manuals, Tech Tips, Forums, Problem Solvers, techXclusives, WebCase, Software Updates, and Agents, you can get the answer you need when you need it (Figure 1): • Answers Databases: Use our advanced search or answer browser tools to easily access the latest answers to technical issues from our huge database of more than 4,000 answers, indexed by logical category.

94 Xcell Journal Spring 2003 • Software Manuals: View manuals in • Latest Answers: View the answer PDF, compressed PDF, or in conven- records you visited, make note of ient search-enabled HTML format. the most recent answers published by Xilinx engineers, and see what • Tech Tips: Get the latest technical visitors are saying. information about development tools, device families, interface • Tech Tips: This is the best resource tools, and Virtex-II Pro™ FPGAs. for hot issues and tips that will get you up and running quickly. • Forums: Collaborate with other designers in discussion groups or • My Bookmarks: Create links to chat rooms; join the popular news- your favorite websites and Answer group comp.arch.fpga. Records, or search Queries on support.xilinx.com. • Problem Solvers: Get instant help with installation and configura- • Stock Quotes: Watch your favorite tion, PCI applications, and JTAG stock ticker symbols, latest price, implementation. This interactive change, and last trade information. tool uses a series of questions to Figure 1 - support.xilinx.com: the best in online support diagnose and troubleshoot your Do It Today configuration or installation prob- Take advantage of the powerful lem automatically, saving you tools at your disposal by visiting hours of work. support.xilinx.com today. To sign up for MySupport.xilinx.com, visit • techXclusives: Expert Xilinx appli- mysupport.xilinx.com, or click on cation engineers will keep you the MySupport logo on the informed of the hottest issues and support.xilinx.com home page. latest techniques. Questions can be directed to • WebCase: Use our WebCase to Doug Horne at 408-626-6317 or manage hotline cases. You can [email protected]. check status, add notes, and even close a case -on the Web, anytime.

Figure 2 - MySupport.xilinx.com, the Let’s Get Personal information you need when you need it MySupport.xilinx.com puts the powerful tools of support.xilinx.com to work for • Xilinx Global Services: you with an easy-to-use interface that lets Here you’ll find all the lat- you create your own personalized page est news on the Xilinx (Figure 2). Whenever you log in, you will Global Services Division’s automatically be notified of new down- offerings, designed to help loads, such as service packs and IP update you “finish faster.” You can releases, relevant to your design. Stock customize your news alerts quotes, bookmarks (for Xilinx and non- according to your interests. Xilinx websites), and Answer Records trails • MyAlerts: This personalized are handy features that add a personal touch service gives you the latest to MySupport.xilinx.com. Registration is product news and Answer easy and free! Records pertaining to your MySupport makes it easy to stay on top specific needs. (Figure 3): • What’s New: View the lat- • Support News: Get the latest in support- est products, devices, related news and developments on design tools, and system support.xilinx.com; techXclusives are also solutions from the Xilinx featured in this section. product catalog. Figure 3 - Keeping track of issues is easy.

Spring 2003 Xcell Journal 95 Learning Without Luggage Design engineer Lukose Ninan was just getting into Xilinx FPGA design when he found out about our first online FPGA training, e- Series I. “I wanted a refresher course,” said LearnLearn Smarter,Smarter, Ninan of ComSonics Inc. “e-Series I is exactly what I wanted to get up to speed on Xilinx design, and I got the information I needed without having to travel.” When Mike Schell of Convergent Design Inc. learned about the online pro- LearnLearn FasterFaster gram, he enrolled immediately. “The number one reason I prefer e-learning classes is WithWith itsits new DesigningDesigning forfor convenience,” said Schell, a design consult- ant. “I have deadlines to meet, and it would PerformancePerformance LiveLive OnlineOnline course, be lost time if I had to travel to a class.” Ninan and Schell are among a growing Xilinx Education Services now offers number of engineers and managers who use online e-learning programs to acquire real-time,real-time, interactiveinteractive instructioninstruction or enhance existing skills (see “Benefits for Both Managers and Engineers”). According over the Web. Learn at your desk to a report from Merrill Lynch, employees from trained specialists -- live online. in more than half of U.S. corporations used from trained specialists -- live online. e-learning training programs in 2000. Here at Xilinx, online course enrollments have steadily increased since our initial e-learning offerings began in 2000.

Virtual Education The Designing for Performance Live Online package consists of five one-hour lecture modules and four two-hour lecture-and- lab modules (see “Designing for Performance Live Online Modules”). This series of modules was selected from the popular Designing for Performance course. Each module is delivered live on the World Wide Web. The entire package is available for $900 by Cindy Andruss USD or nine training credits, and you can Technical Writer/Production Editor In addition to developing state-of-the- register and pay for the package online. If Xilinx, Inc. art logic devices and software, Xilinx also you prefer, you may purchase modules [email protected] continues to pioneer educational oppor- individually. tunities that reduce engineers’ time to The modules are scheduled sequentially, Reduced training budgets, travel restric- knowledge and increase their proficiency two per week, over a five-week time period. tions, and explosive advances in program- in using Xilinx FPGA design tools. The Some students say these smaller units of mable logic technology make it harder than latest offering from Xilinx Education content provide a more lasting learning ever to stay on the high side of the learning Services, Designing for Performance Live effect compared with the average content curve. Combined with demands to do Online, is an education package that retention rate for intensive, all-day seminars. more in less time, your job as a design combines the best of live instruction with “It’s not necessary to have all the infor- engineer – or as a manager of a design team none of the inconveniences and lost mation in an eight-hour class,” Schell said. – requires an innovative training solution opportunity costs of travel to off-site “That’s what I really like about the pro- that will save you time and money. training centers. gram. You have a day or two to absorb the

96 Xcell Journal Spring 2003 After you register for Designing for Performance Live Online, Xilinx will send Benefits for Both Managers and Engineers you a series of e-mails containing the URL As a manager, you will benefit from Designing for Performance Live Online if: address and phone number for each session, along with lab requirements and • You plan to hire engineers throughout the year and would like to offer this instructions. Ten minutes before each ses- packaged training to new hires as they come on board. sion begins, you can log in and download • You do not have a sufficient number of engineers at any one time to warrant the presentation materials and lab docu- an on-site learning program. mentation so you will be prepared for class. • You do not want to send the engineers away for training because of time or Conclusion budget restrictions. The Designing for Performance Live • You want accessible, consistent, and affordable training for your engineers who Online series delivers a convenient, have experience with Xilinx ISE software tools but who need to enhance their advanced training solution for design engi- knowledge of FPGAs. neers who have taken the Xilinx As a design engineer, you will learn how to: Fundamentals course or who have equivalent knowledge of Virtex™-II architecture, • Use synchronous design techniques to improve performance. software tool flow, and global timing con- • Design synchronization circuits to improve design reliability. straints. This Live Online course focuses on enhancing your knowledge of the latest • Write HDL code to efficiently target Virtex-II architecture resources. Xilinx FPGA design tools and techniques. • Generate customized cores using the CORE Generator™ system. To learn more about Designing for Performance Live Online or other Xilinx • Estimate power consumption using the XPower utility. e-learning courses, visit the Xilinx Education • Pinpoint design bottlenecks by interpreting timing reports. Services website at support.xilinx.com/ support/education-home.htm or call the regis- • Apply advanced timing constraints to meet your performance objectives. trar at 877-XLX-CLAS (877-959-2527). • Improve design performance by using advanced implementation options.

Designing for Performance information before the next session. It’s Real-time, synchronous training sessions Live Online Modules* easier to learn that way.” create an environment where you can ask The predetermined schedule lets you questions and hold collaborative discus- 1. FPGA Design Techniques lock in your dates ahead of time. If by sions. Also, you can use the chat feature to 2. HDL Coding Style (and lab) chance you miss a session, the sequence is ask questions of other engineers in the class, repeated every five weeks. You may even or pose your questions to the instructor. 3. Synthesis Techniques (and lab) begin the series with almost any of the first All you need to begin Designing for 4. CORE Generator™ System five modules, but units 6 through 9 are Performance Live Online are a Web-enabled (and lab) somewhat dependent upon each other and computer and a telephone line. You will log should be taken sequentially. Once you on to an online server (provided by Toolwire) 5. Xpower have completed Designing for Performance to run the lab exercises. Before enrolling in 6. Achieving Timing Closure Live Online, you’ll receive a Certificate of this class, you must pretest your system to Completion to add to your record of con- ensure that it will perform the labs in the 7. Timing Groups and Offset tinuing education credits. Toolwire virtual environment. Go to Constraints support.xilinx.com/support/training/using- 8. Path-Specific Timing Constraints A Classroom at Your Fingertips toolwire.htm and follow the steps for testing (and lab) As the name promises, Designing for your network, connection speed, firewall Performance Live Online is delivered by a compatibility, and installation of the Citrix 9. Advanced Implementation Options live instructor – no recordings. At the ICA client software. Once you have pretest- *Modules are subject to change to stay current beginning of each interactive session, the ed your system, you will be able to connect with emerging technology. For the most up-to- date information, visit the Xilinx Education instructor will review labs and address any to the Toolwire remote Windows 2000 desk- Services website at support.xilinx.com/support/ questions you may have before moving on top. If any issues arise, simply contact the education-home.htm. to the lecture. registrar at 1-877-959-2527 for support.

Spring 2003 Xcell Journal 97 Xilinx Technology Enabled Instant Deployment of Real-PCI Express Xilinx delivered the world’s first PCI Express product on the same day the specification was finalized.

By Xilinx Staff the definition of the specification – a true testament to the capability and Last July – on the same day the PCI benefits of programmable systems,” Special Interest Group (SIG) Snyder added. announced the final PCI Express specification – Xilinx delivered the Always Looking Forward world’s first PCI Express intellectual Real-PCI Express is currently compati- property (IP) core: Real-PCI™ ble with both the protocol and electrical Express. The solution expedited the requirements of the v1.0 base PCI implementation of PCI Express for Express specification, but Xilinx is con- Xilinx customers by 12 to 18 months, tinuing to participate as an active demonstrating the power of program- developer in the PCI Express Advanced mable logic over ASIC technology. Switching working group to develop PCI Express is the successor to the communications extension for PCI the legacy peripheral component Express. Xilinx plans to incorporate interconnect local bus standard estab- Xilinx Smart-IP™ technology to meet crit- PCI Express with advanced switching into lished by Intel. The new PCI Express stan- ical 2.5 GHz timing requirements. The its Virtex™-II series FPGAs. dard is targeted at the desktop, mobile, core reaches a line speed of 2.5 Gbps, uti- server, storage, and embedded communi- lizing the features of the RocketIO multi- cations markets. gigabit transceivers, such as clock data License Price and Availability “PCI Express takes PCI to another recovery, 8B/10B encoding, 3.125 Gbps Real-PCI Express is level, with a high-speed, scalable, serial SerDes, transmit/receive FIFOs, and CRC. available now as a Xilinx architecture that provides exciting new LogiCORE™ product under I/O options for system partitioning The Logic Advantage the terms of SignOnce™ IP designs and form factors,” said Tony The programmability and serial transceiv- license and is priced at $25,000. Pierce, PCI-SIG chairman. er capability of Virtex-II Pro FPGAs Once purchased, it may be allowed Xilinx to develop the core simulta- configured and downloaded The PCI Express Core neously with the definition of the PCI from the Xilinx website at The Xilinx Real-PCI Express core uses Express specification as it evolved. www.xilinx.com/pciexpress/ the proven RocketIO™ 3.125 Gbps The shipment of the Real-PCI Express For more information and to serial transceivers on Xilinx Virtex-II core at the same time a final specification purchase Virtex-II Pro FPGAs, Pro™ FPGAs – the only devices on was released allowed designers to begin visit www.xilinx.com/platform/ the market capable of implementing the prototyping PCI Express solutions imme- Information about licensing and new specification. diately, well ahead of any ASIC-based other Xilinx LogiCORE products is The Real-PCI Express interface can be implementations, according to Cary available on the Xilinx IP Center at used to maximize performance and feature Snyder, a noted industry expert and analyst. www.xilinx.com/ipcenter/ quality in high-performance workstations “By using the programmability and For complete information as well as consumer gaming devices. serial transceiver capability of the Virtex- about Real-PCI Express, visit Designers can use the core to design high II Pro device, Xilinx was able to develop www.xilinx.com/systemio/ performance PCI Express systems using its core in parallel with its participation in

98 Xcell Journal Spring 2003 Xilinx Events and Tradeshows HW-AcceleratedHW-Accelerated

Xilinx participates in numerous trade shows and events throughout the year to help you stay informed. This is a perfect opportunity to meet our silicon and software experts, ask questions, see demonstrations of new products, and discuss the latest trends. You’ll meet people just like you and you’ll see how RTOSRTOS they are using programmable logic to solve their technical challenges.

Worldwide Event Schedule TheThe worldsworlds fastestfastest January 28-29 Platform Conference San Jose, CA realreal timetime kernel!kernel! January 30-31 EDSF 2003 Pacifico, Yokohama

February 17-21 3GSM Cannes, France SierraSierra isis aa RTOSRTOS kernelkernel IPIP

February 19-20 Platform Conference Taipei, Taiwan forfor XilinxXilinx FPGAs.FPGAs. SuitableSuitable forfor bothboth MicroBlazeMicroBlaze andand February 23-25 FPGA Monterey, CA PowerPC. Amazing low price! February 25-27 Wireless Systems Conference San Jose, CA

March 3-11 IIC China Spring Shanghai, Beijing, Shenzhen

March 17 Synopsys Users Group San Jose, CA

April 1-3 Global Signal Processing Expo Dallas, TX

April 8-11 FCCM Napa, CA

April 23-25 Embedded Systems Conference San Francisco, CA

June 16 Embedded Processor Forum San Jose, CA

June 2-4 40th Design Automation Conference Anaheim, CA

For more information see:

For more information about Xilinx Worldwide Events, http://www.realfast.se/ please contact one of the following Xilinx team members or see our website at: Click on www.xilinx.com/events/ Real-Time For North American shows, contact Jennifer Waibel at: [email protected] Kernel. For European shows, contact Andrew Stock at: [email protected] For Japanese shows, contact Yumi Homura at: [email protected] For Asia Pacific shows, contact Mary Leung at: [email protected]

Spring 2003 Xcell Journal 99 FUZZYFUZZY LOGICLOGIC

Want to win a new Handspring PDA? Heard any good jokes lately?

In each issue of the Xcell Journal we give you many golden nuggets of information about programmable logic. So, we thought Contest Rules it would be fun to bury a few “diamonds” for you to find as well – something to challenge you and give you a diversion from your There are nine diamonds shown usual routine. Some of these diamonds are buried deep and are below. Each comes from a page somewhere in this issue of Xcell. Your difficult to dig up, some lie there on the surface, easy to find. job is to find the page where each Plus, we thought you might enjoy your colleagues’ humor. diamond originated. Add up the nine page numbers where you find each With a funny joke and a keen eye, you have a good chance of diamond, to give you a total - that’s winning one of five Handspring™ Visor Pro™ PDAs. your answer. Send us your answer along with your funniest clean joke, (one that is acceptable for printing). The five winning entries will be the ones with the correct answers and the funniest jokes (as judged by the Xcell editors). The five winners will receive a new Handspring Visor Pro PDA. Only one entry per person; Xilinx employees and contractors, and their families, are not eligible. The entry deadline is May 1, 2003. The winners and their jokes will be announced in the next issue of Xcell. Send your entry to [email protected], with the following words in the subject line: Xcell Diamonds - 123 (Replace 123 with your answer). In the body of your e-mail send us your funniest clean joke. Write down the page number where you find each diamond, add up the nine page numbers, Here’s a joke from us to get you started: and send us the total along with your joke. There are only 10 kinds of You could win a new Handspring PDA! engineers in the world: Those who understand binary numbers, and those who don’t.

100 Xcell Journal Spring 2003 REFERENCE

Xilinx CPLD Product Selection Matrix

PRODUCT SELECTION MATRIX I/O Features Speed Clocking System Gates Macrocells Product terms per Macrocell Input Voltage Compatible Output Voltage Compatible Max. I/O I/O Banking Min. Pin-to-Pin Logic Delay (ns) Commercial Speed Grades (fastest to slowest) Industrial Speed Grades (fastest to slowest) Global Clocks Clocks per Term Product Function Block CoolRunner-II Family — 1.8 Volt XC2C32 750 32 40 1.5/1.8/2.5/3.3 1.5/1.8/2.5/3.3 33 1 3.5 -3 -4 -6 -6 3 17 XC2C64 1500 64 40 1.5/1.8/2.5/3.3 1.5/1.8/2.5/3.3 64 1 4 -4 -5 -7 -7 3 17 XC2C128 3000 128 40 1.5/1.8/2.5/3.3 1.5/1.8/2.5/3.3 100 2 4.5 -4 -6 -7 -7 3 17 XC2C256 6000 256 40 1.5/1.8/2.5/3.3 1.5/1.8/2.5/3.3 184 2 5 -5 -6 -7 -7 3 17 XC2C384 9000 384 40 1.5/1.8/2.5/3.3 1.5/1.8/2.5/3.3 240 4 6 -6 -7 -10 -10 3 17 XC2C512 12000 512 40 1.5/1.8/2.5/3.3 1.5/1.8/2.5/3.3 270 4 6 -6 -7 -10 -10 3 17 CoolRunner XPLA3 Family — 3.3 Volt XCR3032XL 750 32 48 3.3/5 3.3 36 5 -5 -7 -10 -7 -10 4 16 XCR3064XL 1500 64 48 3.3/5 3.3 68 6 -6 -7 -10 -7 -10 4 16 XCR3128XL 3000 128 48 3.3/5 3.3 108 6 -6 -7 -10 -7 -10 4 16 XCR3256XL 6000 256 48 3.3/5 3.3 164 7.5 -7 -10 -12 -10 -12 4 16 XCR3384XL 9000 384 48 3.3/5 3.3 220 7.5 -7 -10 -12 -10 -12 4 16 XCR3512XL 12000 512 48 3.3/5 3.3 260 7.5 -7 -10 -12 -10 -12 4 16 XC9500XV Family — 2.5 Volt XC9536XV 800 36 90 2.5/3.3 1.8/2.5/3.3 36 1 5 -5 -7 -7 3 18 XC9572XV 1600 72 90 2.5/3.3 1.8/2.5/3.3 72 1 5 -5 -7 -7 3 18 XC95144XV 3200 144 90 2.5/3.3 1.8/2.5/3.3 117 2 5 -5 -7 -7 3 18 XC95288XV 6400 288 90 2.5/3.3 1.8/2.5/3.3 192 4 6 -6 -7 -10 -7 -10 3 18 XC9500XL Family — 3.3 Volt XC9536XL 800 36 90 2.5/3.3/5 2.5/3.3 36 5 -5 -7 -10 -7 -10 3 18 PACKAGE OPTIONS AND USER I/O XC9572XL 1600 72 90 2.5/3.3/5 2.5/3.3 72 5 -5 -7 -10 -7 -10 3 18 XC95144XL 3200 144 90 2.5/3.3/5 2.5/3.3 117 5 -5 -7 -10 -7 -10 3 18 CoolRunner-II CoolRunner XPLA3 XC9500XV XC9500XL XC95288XL 6400 288 90 2.5/3.3/5 2.5/3.3 192 6 -6 -7 -10 -7 -10 3 18 XC2C32 XC2C64 XC2C128 XC2C256 XC2C384 XC2C512 XCR3032XL XCR3064XL XCR3128XL XCR3256XL XCR3384XL XCR3512XL XC9536XV XC9572XV XC95144XV XC95288XV Body Size XC9536XL XC9572XL XC95144XL XC95288XL PLCC Packages (PC) 44 16.5 x 16.5 mm 33 33 36 36 34 34 34 34 PQFP Packages (PQ) 208 28 x 28 mm 173 173 173 164 172 180 168 168 VQFP Packages (VQ) 44 12 x 12 mm 33 33 36 36 34 34 34 34 64 12 x 12 mm 36 52 100 16 x 16 mm 64 80 80 68 84 TQFP Packages (TQ) 100 14 x 14 mm 72 81 72 81 144 20 x 20 mm 100 118 118 108 120 118* 117 117 117 117 Chip Scale Packages (CP) — wire-bond chip-scale BGA (0.5 mm ball spacing) 56 6 X 6 mm 33 45 48 *JTAG pins and port enable are not pin compatible 132 8 X 8 mm 100 106 in this package for this member of the family. Chip Scale Packages (CS) — wire-bond chip-scale BGA (0.8 mm ball spacing) Important: Verify all Data with Device Data Sheet and Product Availability with 48 7 x 7mm 36 40 36 38 36 38 your local Xilinx Rep 144 12 x 12 mm 108 117 117 280 16 x 16 mm 164 192 192 BGA Packages (BG) — wire-bond standard BGA (1.27 mm ball spacing) Automotive products are highlighted: 256 27 x 27 mm 192 -40C to +125C ambient temperature for CPLDs FGA Packages (FT) — wire-bond fine-pitch thin BGA (1.0 mm ball spacing) 256 17 x 17 mm 184 212 212 164 212 212 FBGA Packages (FG) — wire-bond Fineline BGA (1.0 mm ball spacing) 256 17 x 17 mm 192 192 324 23 x 23 mm 240 270 220 260

Spring 2003 Xcell Journal 101 REFERENCE

Xilinx Virtex FPGA Product Selection Matrix

Virtex-II Pro Package Configurations with Virtex-II Pro (1.5V) Virtex-II (1.5V) Available RocketIO Transceiver Blocks XC2VP2 XC2VP4 XC2VP7 XC2VP20 XC2VP30 XC2VP40 XC2VP50 XC2VP70 XC2VP100 XC2VP125 XC2V40 XC2V80 XC2V250 XC2V500 XC2V1000 XC2V1500 XC2V2000 XC2V3000 XC2V4000 XC2V6000 XC2V8000

Package XC2VP2 XC2VP4 XC2VP7 XC2VP20 XC2VP30 XC2VP40 XC2VP50 XC2VP70 XC2VP100 XC2VP125 Pins Body Size I/O’s 204 348 396 564 692 804 852 996 1164 1200 88 120 200 264 432 528 624 720 912 1104 1296 FG256 44 Chip Scale Packages — wire-bond chip-scale BGA (0.8 mm ball spacing) FG456 448 144 12 x 12 mm 88 92 92 FG676 888 BGA Packages (BG) — wire-bond standard BGA (1.27 mm ball spacing) FF672 448 575 31 x 31 mm 328 392 408 FF896 888 728 35 x 35 mm 516 FF1152 8 8 12 16 FGA Packages (FG) — wire-bond fine-pitch BGA (1.0 mm ball spacing) FF1148 0* 0* 256 17 x 17 mm 140 140 88 120 172 172 172 FF1517 16 16 456 23 x 23 mm 156 248 248 200 264 324 FF1704 20 20 24 676 27 x 27 mm 404 416 416 392 456 484 FF1696 0* 0* FFA Packages (FF) — flip-chip fine-pitch BGA (1.0 mm ball spacing) 672 27 x 27 mm 204 348 396 Note: * FF1148 and FF1696 packages support higher number of user I/O and 896 31 x 31 mm 396 556 556 432 528 624 zero RocketIO multi-gigabit transceivers 1152 35 x 35 mm 564 644 692 692 720 824 824 824 1148* 35 x 35 mm 804 812 1517 40 x 40 mm 852 964 912 1104 1108 1704 42.5 x 42.5 mm 996 1040 1040 1696* 42.5 x 42.5 mm 1164 1200 BFA Packages (BF) — flip-chip fine-pitch BGA (1.27 mm ball spacing) 957 40 x 40 mm 624 684 684 684

Note: Within the same family, all devices in a particular package are pin-out (footprint) compatible. Virtex-II packages FG456 and FG676 are also footprint compatible. Virtex-II packages FF896 and FF1152 are also footprint compatible. * The FF1148 and FF1696 packages support higher number of user I/O and zero RocketIO™ multi-gigabit transceivers. Important: Verify all Data with Device Data Sheet (http://www.xilinx.com/partinfo/databook.htm)

Numbers indicated in the matrix are the maximum number of user I/O’s for that package and device combination, I/Os for RocketIO MGTs are not included in this table.

CLB Resources Memory Resources DSP Clock Resources I/O Features Speed Transceiver Blocks Transceiver ™ System Gates (see note 1) (Row X Col) Array CLB Number of Slices Logic Cells (see note 2) CLB Flip-Flops Max. Distributed RAM Bits(kbits) # 18 kbits Block RAM Block RAM (kbits) Total Dedicated Multipliers # 18x18 (min/max) DCM Frequency # DCM Blocks (see note 3) Digitally Controlled Impedance Maximum Differential I/O Pairs Max. I/O I/O Standards Commercial Speed Grades (slowest to fastest) Industrial Speed Grades (slowest to fastest) Serial PROM Family System ACE Config. Memory (Bits) RocketIO PowerPC Processor Blocks PowerPC Series EasyPath Virtex-II Solution (see note 4) Virtex-II Pro Family — 1.5 Volt .13um Nine Layer Copper Process XC2VP2 * 16 x 22 1,408 3,168 2,816 44 12 216 12 24/420 4 YES 100 204 LDT-25, LVDS-25, LVDSEXT-25, -5 -6 -7 -5 -6 1.31M 4 0 XC2VP4 * 40 x 22 3,008 6,768 6,016 94 28 504 28 24/420 4 YES 172 348 BLVDS-25, ULVDS-25, LVPECL-25, -5 -6 -7 -5 -6 3.01M 4 1 XC2VP7 * 40 x 34 4,928 11,088 9,856 154 44 792 44 24/420 4 YES 196 396 LVCMOS25, LVCMOS18, -5 -6 -7 -5 -6 4.49M 8 1 XC2VP20 * 56 x 46 9,280 20,880 18,560 290 88 1,584 88 24/420 8 YES 276 564 LVCMOS15, PCI33, LVTTL, -5 -6 -7 -5 -6 8.21M 8 2 XC2VP30 * 80 x 46 13,696 30,816 27,392 428 136 2,448 136 24/420 8 YES 372 644 LVCMOS33, PCI-X, PCI66, GTL, -5 -6 -7 -5 -6 11.36M 8 2 ✔ ISP

XC2VP40 * 88 x 58 19,392 43,632 38,784 606 192 3,456 192 24/420 8 YES 396 804 GTL+, HSTL I (1.5V,1.8V), -5 -6 -7 -5 -6 ISP/OTP 15.56M 0**or 12 2 ✔ XC2VP50 * 88 x 70 23,616 53,136 47,232 738 232 4,176 232 24/420 8 YES 420 852 HSTL II (1.5V,1.8V), -5 -6 -7 -5 -6 19.02M 0**or 16 2 ✔ XC2VP70 * 104 x 82 33,088 74,448 66,176 1,034 328 5,904 328 24/420 8 YES 492 996 HSTL III (1.5V,1.8V), -5 -6 -7 -5 -6 25.60M 16 or 20 2 ✔ XC2VP100 * 120 x 94 44,096 99,216 88,192 1,378 444 7,992 444 24/420 12 YES 572 1,164 HSTL IV (1.5V,1.8V), SSTL2I, -5 -6 -7 -5 -6 33.65M 0**or 20 2 ✔ XC2VP125 * 136 x 106 55,616 125,136 111,232 1,738 556 10,008 556 24/420 12 YES 644 1,200 SSTL2II, SSTL18 I, SSTL18 II -5 -6 -7 -5 -6 42.78M 0**,20, or 24 4 ✔ Virtex-II Family — 1.5 Volt .15um Eight Layer Metal Process XC2V40 40K 8 x 8 256 576 512 8 4 72 4 24/420 4 YES 44 88 LDT-25, LVPECL-33, -4 -5 -6 -4 -5 0.4M XC2V80 80K 16 x8 512 1,152 1,024 16 8 144 8 24/420 4 YES 60 120 LVDS-33, LVDS-25, -4 -5 -6 -4 -5 0.6M Platform FPGAs XC2V250 250K 24 x16 1,536 3,456 3,072 48 24 432 24 24/420 8 YES 100 200 LVDSEXT-33, LVDSEXT-25, -4 -5 -6 -4 -5 1.7M XC2V500 500K 32 x 24 3,072 6,912 6,144 96 32 576 32 24/420 8 YES 132 264 BLVDS-25, ULVDS-25, -4 -5 -6 -4 -5 2.8M XC2V1000 1M 40 x 32 5,120 11,520 10,240 160 40 720 40 24/420 8 YES 216 432 LVTTL, LVCMOS33, -4 -5 -6 -4 -5 4.1M

XC2V1500 1.5M 48 x 40 7,680 17,280 15,360 240 48 864 48 24/420 8 YES 264 528 LVCMOS25, LVCMOS18, -4 -5 -6 -4 -5 ISP 5.7M ISP/OTP XC2V2000 2M 56 x 48 10,752 24,192 21,504 336 56 1,008 56 24/420 8 YES 312 624 LVCMOS15, PCI33, PCI66, -4 -5 -6 -4 -5 7.5M XC2V3000 3M 64 x 56 14,336 32,256 28,672 448 96 1,728 96 24/420 12 YES 360 720 PCI-X, GTL, GTL+, HSTL I, -4 -5 -6 -4 -5 10.5M ✔ XC2V4000 4M 80 x 72 23,040 51,840 46,080 720 120 2,160 120 24/420 12 YES 456 912 HSTL II, HSTL III, HSTL IV, -4 -5 -6 -4 -5 15.7M ✔ XC2V6000 6M 96 x 88 33,792 76,032 67,584 1,056 144 2,592 144 24/420 12 YES 552 1,104 SSTL2I, SSTL2II, SSTL3 I, -4 -5 -6 -4 -5 21.9M ✔ XC2V8000 8M 112 x 104 46,592 104,832 93,184 1,456 168 3,024 168 24/420 12 YES 554 1,108 SSTL3 II, AGP,AGP-2X -4 -5 29.1M ✔

Note: 1. System Gates include 20-30% of CLBs used as RAM 2. Logic cell = (1) 4 Input (LUT) Look Up Table + Flip Flop + Carry Logic. 3. DCM – Digital Clock Management 4. Virtex-II Series EasyPath solution available to provide a no risk, no effort cost reduction path for volume production. * System gate count not meaningful for Virtex-II Pro devices with immersed special blocks such as PowerPC processors and multi-gigabit transceivers. ** The FF1148 and FF1696 packages support higher number of user I/O and zero RocketIO multi-gigabit transceivers. Important: Verify all Data with Device Data Sheet (http://www.xilinx.com/partinfo/databook.htm)

102 Xcell Journal Spring 2003 REFERENCE

Xilinx Spartan FPGAs

PRODUCT SELECTION MATRIX

CLB Resources BLK RAM CLK Resources I/O Features Speed System Gates (see note 1) (Row X Col) Array CLB Number of Slices Logic Cells (see note 2) CLB Flip-Flops Max. Distributed RAM Bits # Block RAM Block RAM (kbits) # Dedicated Multipliers (min/max) DLL Frequency # DLL's Synthesis Frequency Phase Shift Digitally Controlled Impedance Number of Differential I/O Pairs Max. I/O I/O Standards Commercial Speed Grades (slowest to fastest) Industrial Speed Grades (slowest to fastest) Serial PROM Family Config. Memory (Bits) Spartan-IIE Family — 1.8 Volt .18/.15um Six Layer Metal Process XC2S50E 50K 16 x 24 768 1,728 1,536 24K 8 32K NA 25/320 4 YES YES NA 83 182 LVTTL,LVCMOS2, -6 -7 -6 0.6M XC2S100E 100K 20 x 30 1,200 2,700 2,400 37K 10 40K NA 25/320 4 YES YES NA 86 202 LVCMOS18, PCI33, PCI66, -6 -7 -6 0.9M XC2S150E 150K 24 x 36 1,728 3,888 3,456 54K 12 48K NA 25/320 4 YES YES NA 114 265 GTL, GTL+, HSTL I, HSTL III, -6 -7 -6 1.1M XC2S200E 200K 28 x 42 2,352 5,292 4,704 73K 14 56K NA 25/320 4 YES YES NA 120 289 HSTL IV,SSTL3 I, SSTL3 II, -6 -7 -6 1.4M XC2S300E 300K 32 x 48 3,072 6,912 6,144 96K 16 64K NA 25/320 4 YES YES NA 120 329 SSTL2 I, SSTL2 II,AGP-2X, -6 -7 -6 1.9M XC2S400E 400K 40 x 60 4,800 10,800 9,600 150K 40 160K NA 25/320 4 YES YES NA 172 410 CTT, LVDS, BLVDS, LVPECL -6 -7 -6 2.7M XC2S600E 600K 48 x 72 6,912 15,552 13,824 216K 72 288K NA 25/320 4 YES YES NA 205 514 -6 -7 -6 4.0M Spartan-II Family — 2.5 Volt .22/.18um Six Layer Metal Process XC2S15 15K 8 x 12 192 432 384 6K 4 16K NA 25/200 4 YES YES NA NA 86 LVTTL, LVCMOS2, -5 -6 -5 0.2M XC2S30 30K 12 x 18 432 972 864 13.5K 6 24K NA 25/200 4 YES YES NA NA 132 PCI33 (3.3V & 5V), -5 -6 -5 0.4M XC2S50 50K 16 x 24 768 1,728 1,536 24K 8 32K NA 25/200 4 YES YES NA NA 176 PCI66 (3.3V), GTL, GTL+, -5 -6 -5 0.6M XC2S100 100K 20 x 30 1,200 2,700 2,400 37.5K 10 40K NA 25/200 4 YES YES NA NA 196 HSTL I, HSTL III, HSTL IV, -5 -6 -5 0.8M XC2S150 150K 24 x 36 1,728 3,888 3,456 54K 12 48K NA 25/200 4 YES YES NA NA 260 SSTL3 I, SSTL3 II, SSTL2 I, -5 -6 -5 1.1M XC2S200 200K 28 x 42 2,352 5,292 4,704 73.5K 14 56K NA 25/200 4 YES YES NA NA 284 SSTL2 II, AGP-2X, CTT -5 -6 -5 1.4M Spartan-XL Family — 3.3 Volt XCS05XL 5K 10 x 10 100 238 200 3.1K NA NA NA NA NA NA NA NA NA 77 TTL, LVTTL, CMOS, -4 -5 -4 0.05M XCS10XL 10K 14 x 14 196 466 392 6.1K NA NA NA NA NA NA NA NA NA 112 LVMOS, PCI -4 -5 -4 0.09M XCS20XL 20K 20 x 20 400 950 800 12.5K NA NA NA NA NA NA NA NA NA 160 -4 -5 -4 0.18M ISP ISP ISP OTP OTP OTP XCS30XL 30K 24 x 24 576 1,368 1,152 18.0K NA NA NA NA NA NA NA NA NA 192 -4 -5 -4 0.25M XCS40XL 40K 28 x 28 784 1,862 1,568 24.5K NA NA NA NA NA NA NA NA NA 224 -4 -5 -4 0.33M

PACKAGE OPTIONS AND USER I/O

Spartan-IIE (1.8V) Spartan-II (2.5V) Spartan-XL (3.3V) Note: 1. System Gates include 20-30% of CLBs used as RAM 2. Logic Cell is defined as a 4 input LUT and a register

Important: Verify all Data with Device Data Sheet (http://www.xilinx.com/spartan) XC2S50E XC2S100E XC2S150E XC2S200E XC2S300E XC2S400E XC2S600E XCS05XL XCS10XL XCS20XL XCS30XL XCS40XL XC2S15 XC2S30 XC2S50 XC2S100 XC2S150 XC2S200

Pins Body Size I/O’s 182 202 265 289 329 410 514 86 132 176 196 260 284 77 112 160 192 224 Numbers indicated in the matrix are the maximum number of user I/O’s for PLCC Packages that package and device combination. 84 30 x 30 mm 61 61 PQFP Packages (PQ) 208 28 x 28 mm 146 146 146 146 146 132 140 140 140 140 160 169 169 Automotive products are highlighted: 240 32 x 32 mm 192 192 -40C to +125C junction temperature for FPGAs VQFP Packages (VQ) 100 14 x 14 mm 60 60 77 77 77 77 TQFP Packages (TQ) 144 20 x 20 mm 102 102 86 92 92 92 112 113 113 Chip Scale Packages — wire-bond chip-scale BGA (0.8 mm ball spacing) 144 12 x 12 mm 86 92 112 113 280 16 x 16 mm 192 224 FGA Packages (FT) — wire-bond fine-pitch thin BGA (1.0 mm ball spacing) 256 17 x 17 mm 182 182 182 182 182 182 FGA Packages (FG) — wire-bond fine-pitch BGA (1.0 mm ball spacing) 256 17 x 17 mm 176 176 176 176 456 23 x 23 mm 202 265 289 329 329 329 196 260 284 676 27 x 27 mm 410 514 BGA Packages 256 27 x 27 mm 192 205

Spring 2003 Xcell Journal 103 REFERENCE

Xilinx IQ Solutions

Part Number Speed Package Voltage Description Part Number Speed Grade Package Voltage Description

XC9500XL CPLD Spartan-XL

XC9536XL 10 ns/100 MHz VQ44, VQ64 3.3V 36 Macrocells (800 Gates), ISP, JTAG, Bus Hold XCS05XL -4 VQ100 3.3V Low cost FPGA with power down pin, 5V tol I/O, & I/P Hysteresis 5,000 Gate, 238 logic cells, 100 CLBs.

XC9572XL 10 ns/100 MHz VQ64, TQ100 3.3V 72 Macrocells (1,600 Gates), ISP, JTAG, Bus Hold XCS10XL -4 VQ100 3.3V Low cost FPGA with power down pin, 5V tol I/O, & I/P Hysteresis 10,000 Gate, 466 logic cells, 196 CLBs. CoolRunner XPLA3 XCS20XL -4 TQ144, PQ208 3.3V Low cost FPGA with power down pin, 5V tol I/O, 20,000 Gate, 950 logic cells, 400 CLBs. XCR3032XL 10 ns/100 MHz VQ44 3.3V 32 Macrocells (800 Gates), Low Power, Slew Rate Control, ISP & JTAG XCS30XL -4 TQ144, PQ208 3.3V Low cost FPGA with power down pin, 5V tol I/O, 30,000 Gate, 1,368 logic cells, 576 CLBs. XCR3064XL 10 ns/100 MHz VQ44, VQ100 3.3V 64 Macrocells (1,600 Gates), Low Power, Slew Rate Control, ISP & JTAG XCS40XL -4 PQ208, BG256 3.3V Low cost FPGA with power down pin, 5V tol I/O, 40,000 Gate, 1,862 logic cells, 784 CLBs. XCR3128XL 10 ns/100 MHzVQ100, TQ144 3.3V 128 Macrocells (3,200 Gates), Low Power, Slew Rate Control, ISP & JTAG Spartan-II

XCR3256XL 10 ns/100 MHz TQ144, PQ208 3.3V 256 Macrocells (6,400 Gates), Low Power, XC2S15 -5 TQ144 2.5V High volume FPGA, on-chip RAM, 16 I/O Slew Rate Control, ISP & JTAG standards, 15,000 Gate, 432 logic cells, XCR3384XL 10 ns/100 MHz PQ208 3.3V 384 Macrocells (9,600 Gates), Low Power, 96 CLBs, 4 block RAM blocks, 4 DLLS. Slew Rate Control, ISP & JTAG XC2S30 -5 TQ144, PQ208 2.5V High volume FPGA, on-chip RAM, 16 I/O XCR3512XL 10 ns/100 MHz PQ208 3.3V 512 Macrocells (12,800 Gates), Low Power, standards, 30,000 Gate, 972 logic cells, Slew Rate Control, ISP & JTAG 216 CLBs, 6 block RAM blocks, 4 DLLS.

CoolRunner-II XC2S50 -5 TQ144, PQ208, 2.5V High volume FPGA, on-chip RAM, 16 I/O FG256 standards, 50,000 Gate, 1,728 logic cells, XC2C32 6 ns/145 MHz VQ44 1.8V 32 Macrocells (800 Gates), 6 I/O Standards, 384 CLBs, 8 block RAM blocks, 4 DLLs. Slew Rate Control, Clock Doubler, Bus Hold, I/P Hysteresis. Ultra low power. XC2S100 -5 TQ144, PQ208, 2.5V High volume FPGA, on-chip RAM, 16 I/O FG256 standards, 100,000 Gate, 2,700 logic cells, XC2C64 7.5 ns/127 MHz VQ44, VQ100 1.8V 64 Macrocells (1,600 Gates), 6 I/O Standards, 600 CLBs, 10 block RAM blocks, 4 DLLs. Slew Rate Control, Clock Doubler, Bus Hold, XC2S150 -5 PQ208, FG256 2.5V High volume FPGA, on-chip RAM, 16 I/O I/P Hysteresis. Ultra low power. standards, 150,000 Gate, 3,888 logic cells, XC2C128 7.5 ns/127 MHz VQ44, VQ100 1.8V 128 Macrocells (3,200 Gates), 9 I/O Standards, 864 CLBs, 12 block RAM blocks, 4 DLLs. Slew Rate Control, Clock Doubler, Clcok Divider, XC2S200 -5 PQ208, FG456 2.5V High volume FPGA, on-chip RAM, 16 I/O CoolClock, DataGate, Bus Hold, I/P Hysteresis. standards, 200,000 Gate, 5,292 logic cells, Ultra low power. 1,176 CLBs, 14 block RAM blocks, 4 DLLs. XC2C256 7.5 ns/127 MHz VQ100, TQ144 1.8V 256 Macrocells (6,400 Gates), 9 I/O Standards, Spartan-IIE Slew Rate Control, Clock Doubler, Clcok Divider, CoolClock, DataGate, Bus Hold, I/P Hysteresis. XC2S50E -6 TQ144, PQ208, 1.8V High volume FPGA, on-chip RAM, 19 I/O Ultra low power. FT256 standards, 50,000 Gate, 1,728 logic cells, 384 CLBs, 8 block RAM blocks, 4 DLLs. XC2C384 10 ns/100 MHz TQ144, PQ208 1.8V 384 Macrocells (9,600 Gates), 9 I/O Standards, Slew Rate Control, Clock Doubler, Clcok Divider, XC2S100E -6 TQ144, PQ208, 1.8V High volume FPGA, on-chip RAM, 19 I/O CoolClock, DataGate, Bus Hold, I/P Hysteresis. FT256 standards, 100,000 Gate, 2,700 logic cells, Ultra low power. 600 CLBs, 10 block RAM blocks, 4 DLLs.

XC2C512 10 ns/100 MHz PQ208 1.8V 512 Macrocells (12,800 Gates), 9 I/O Standards, XC2S150E -6 PQ208, FT256 1.8V High volume FPGA, on-chip RAM, 19 I/O Slew Rate Control, Clock Doubler, Clcok Divider, standards, 150,000 Gate, 3,888 logic cells, CoolClock, DataGate, Bus Hold, I/P Hysteresis. 864 CLBs, 12 block RAM blocks , 4 DLLs. Ultra low power. XC2S200E -6 PQ208, FT256 1.8V High volume FPGA, on-chip RAM, 19 I/O standards, 200,000 Gate, 5,292 logic cells, Note: See page 96 for CPLD IQ devices Package Options and User I/O. 1,176 CLBs, 14 block RAM blocks, 4 DLLs. XC2S300E -6 PQ208, FG456 1.8V High volume FPGA, on-chip RAM, 19 I/O standards, 300,000 Gate, 6,912 logic cells, 1,536 CLBs, 16 block RAM blocks, 4 DLLs.

XC2S400E -6 FT256, FG456, 1.8V High volume FPGA, on-chip RAM,19 I/O FG676 standards, 400,000 Gate,10,800 logic cells, 2,400 CLBs, 40 block RAM blocks, 4DLLs.

XC2S600E -6 FG456, FG676 1.8V High volume FPGA, on-chip RAM,19 I/O standards, 600,000 Gate,10,800 logic cells, 3,456 CLBs, 72block RAM blocks, 4DLLs.

Note: See page 93 for Spartan IQ devices Package Options and User I/O.

104 Xcell Journal Spring 2003 REFERENCE FG676 (1.0mm) 27.0x27.0mm BG728 BF957 (1.27mm) FF1517 (1.0mm) (1.27mm) 35.0x35.0mm 40.0x40.0mm 40.0x40.0mm FG456 (1.0mm) 23.0x23.0mm FF1152 (1.0mm) 35.0x35.0mm BG575 (1.27mm) FG1156 (1.0mm) FG324 (1.0mm) 31.0x31.0mm 35.0x35.0mm 23.0x23.0mm FG256 (1.0mm) 17.0x17.0mm FF896 (1.00mm) 31.0x31.0mm FG900 BG256 (1.0mm) (1.27mm) 31.0x31.0mm 27.0x27.0mm FT256 (1.0mm) 17.0x17.0mm CAVITY-UP BGA’S CAVITY-UP FF672 (1.0mm) 27.0x27.0mm FLIP-CHIP BGA’S BG560 FG680 (1.27mm) (1.0mm) 42.5x42.5mm 40.0x40.0mm CS280 (0.8mm) 16.0x16.0mm CAVITY-DOWN BGA’S CAVITY-DOWN (with copper Heatslug) (0.5mm) HQ/PQ240 34.6x34.6mm CS144 (0.8mm) 12.0x12.0mm (0.5mm) VQ100 16.0x16.0mm PC84 TQ144 (0.5mm) (0.05in) 1.19x1.19in 22.0x22.0mm CS48 (0.8mm) 7.0x7.0mm VQ64 (0.5mm) 12.0x12.0mm CP132 (0.5mm) 8.0x8.0mm (0.5mm) HQ/PQ208 30.6x30.6mm TQ100 (0.5mm) PC44 (0.05in) 16.0x16.0mm VQ44 CP56 0.69x0.69in (0.8mm) (0.5mm) 12.0x12.0mm CHIP-SCALE BGA 6.0x6.0mm VERY THIN QUAD FLAT PACK THIN QUAD FLAT VERY THIN QUAD FLAT PACK THIN QUAD FLAT PLASTIC QUAD FLAT PACK PLASTIC QUAD FLAT PLASTIC QUAD FLAT PACK PLASTIC QUAD FLAT

Spring 2003 Xcell Journal 105 REFERENCE

Xilinx Configuration Storage Solutions Memory Density Number of Components space Min board Compression FPGA Config. Mode Multiple Designs Storage Software Removable IRL Hooks Max Config. Speed Media Non-Volatile

SystemACE CF up to 8 Gbit 2 25 cm2 No JTAG Unlimited Yes Yes Yes 30 Mbit/sec CompactFlash

SystemACE MPM 16 Mbit 1 12.25 cm2 Yes SelectMAP (up to 4 FPGA) Up to 8 No No Yes 152 Mbit/sec AMD Flash Memory 32 Mbit Slave-Serial (up to 8 FPGA chains) 64 Mbit

SystemACE SC 16 Mbit 3 Custom Yes SelectMAP (up to 4 FPGA) Up to 8 No No Yes 152 Mbit/sec AMD Flash memory 32 Mbit Slave-Serial (up to 8 FPGA chains) 64 Mbit

I/O Voltage FPGA PROM Solution PD8 VO8 SO20 PC20 PC44 VQ44 Core Voltage 2.5V 3.3V I/O OTP Configuration PROMs for Spartan-II/IIE Voltage XC2S50E XC17S50A Y Y Y 3.3V Y Y XC2S100E XC17S100A Y Y Y 3.3V Y Y Density PD8 VO8 SO20 PC20 PC44 VQ44 Core Voltage 2.5V 3.3V XC2S150E XC17S200A Y Y Y 3.3V Y Y In-System Programming (ISP) Configuration PROMs XC2S200E XC17S200A Y Y Y 3.3V Y Y XC18V256 256Kb Y Y Y 3.3V Y Y XC2S300E XC17S300A Y 3.3V Y Y XC18V512 512Kb Y Y Y 3.3V Y Y XC2S15 XC17S15A Y Y Y 3.3V Y Y XC18V01 1Mb Y Y Y 3.3V Y Y XC2S30 XC17S30A Y Y Y 3.3V Y Y XC18V02 2Mb Y Y 3.3V Y Y XC2S50 XC17S50A Y Y Y 3.3V Y Y XC18V04 4Mb Y Y 3.3V Y Y XC2S100 XC17S100A Y Y Y 3.3V Y Y One-Time Programmable (OTP) Configuration PROMs XC2S150 XC17S150A Y Y Y 3.3V Y Y XC17V01 1.6Mb Y Y Y 3.3V Y Y XC2S200 XC17S200A Y Y Y 3.3V Y Y XC17V02 2Mb Y Y Y 3.3V Y Y XC17V04 4Mb Y Y Y 3.3V Y Y XC17V08 8Mb Y Y 3.3V Y Y I/O XC17V16 16Mb Y Y 3.3V Y Y Voltage FPGA PROM Solution PD8 VO8 SO20 PC20 PC44 VQ44 Core Voltage 3.3V 5.0V OTP Configuration PROMs for Spartan-XL XCS05XL XC17S05XL Y Y 3.3V Y Y XCS10XL XC17S10XL Y Y 3.3V Y Y XCS20XL XC17S20XL Y Y 3.3V Y Y XCS30XL XC17S30XL Y Y 3.3V Y Y XCS40XL XC17S40XL Y Y 3.3V Y Y

Xilinx Home Page Xilinx Education Center http://www.xilinx.com http://www.xilinx.com/support/education-home.htm

Xilinx Online Support Xilinx Tutorial Center http://www.xilinx.com/support/support.htm http://www.xilinx.com/support/techsup/tutorials/index.htm

Xilinx IP Center Xilinx WebPACK http://www.xilinx.com/ipcenter/index.htm http://www.xilinx.com/sxpresso/webpack.htm

Spring 2003 Xcell Journal 106 REFERENCE

Xilinx Software

Feature ISE WebPACK ISE BaseX ISE Foundation ISE Alliance Devices Virtex™Series Virtex-E: V50E – V300E Virtex: V50 – V300 ALL ALL Virtex-II: 2V40 – 2V250 Virtex-E: V50E – V300E Virtex-II Pro: 2VP2 Virtex-II: 2V40 – 2V250 Virtex-II Pro: 2VP2 Spartan™ II/IIE Families ALL (except XC2S400E and XC2S600E) ALL ALL ALL CoolRunner™ XPLA3 / CoolRunner-II ALL ALL ALL ALL XC9500™ Series ALL ALL ALL ALL Services Educational Services Yes Yes Yes Yes Design Services Sold as an Option Sold as an Option Sold as an Option Sold as an Option Support Services Web Only Yes Yes Yes Design Entry Schematic Editor Yes Yes PC Only No HDL Editor Yes Yes Yes Yes State Diagram Editor Yes Yes PC Only No CORE Generator System No Yes Yes Yes PACE (Pinout and Area Constraint Editor) Yes Yes Yes Yes Architecture Wizards No Yes Yes Yes DCM – Digital Clock Management MGT – Multi-Gigabit Transcievers 3rd Party RTL Checker Support Yes Yes Yes Yes Xilinx System Generator for DSP No Sold as an Option Sold as an Option Sold as an Option Embedded GNU Embedded Tools Yes Yes Yes Yes System Design GCC – GNU Compiler GDB – GNU Software Debugger WindRiver Xilinx Edition Development Tools No Sold as an Option Sold as an Option Sold as an Option Diab C/C++ Compiler SingleStep Debugger visionPROBE II target connection Synthesis Xilinx Synthesis Technology (XST) Yes Yes Yes No Synplicity Synplify/Pro Integrated Interface Integrated Interface Integrated Interface (PC Only) Integrated Interface (PC Only) Synplicity Amplify Physical Synthesis Support Yes Yes Yes Yes Leonardo Spectrum Integrated Interface Integrated Interface Integrated Interface Integrated Interface Synopsys FPGA Compiler II EDIF Interface EDIF Interface EDIF Interface EDIF Interface ABEL CPLD CPLD CPLD (PC Only) No Implementation iMPACT Yes Yes Yes Yes FloorPlanner Yes Yes Yes Yes Xilinx Constraints Editor Yes Yes Yes Yes Timing Driven Place & Route Yes Yes Yes Yes System ACE Configuration Manager Yes Yes Yes Yes Modular Design Yes Yes Yes Yes Timing Improvement Wizard Yes Yes Yes Yes Board Level IBIS Models Yes Yes Yes Yes Integration STAMP Models Yes Yes Yes Yes LMG SmartModels Yes (Available from Synopsys) Yes (Available from Synopsys) Yes (Available from Synopsys) Yes (Available from Synopsys) HSPICE Models* Yes Yes Yes Yes Verification HDL Bencher™ Yes Yes PC Only No ModelSim® Xilinx Edition (MXE II) ModelSim XE II Starter** ModelSim XE II Starter** ModelSim XE II Starter** ModelSim XE II Starter** Static Timing Analyzer Yes Yes Yes Yes ChipScope PRO No Sold as an Option Sold as an Option Sold as an Option FPGA Editor with Probe No Yes Yes Yes ChipViewer Yes Yes Yes Yes XPower (Power Analysis) Yes Yes Yes Yes 3rd Party Equivalency Checking Support Yes Yes Yes Yes SMARTModels for PPC and Rocket I/O No Yes Yes Yes 3rd Party Simulator Support Yes Yes Yes Yes Platforms PC (MS Windows 2000/MS Windows XP) PC Only (MS Windows 2000/MS Windows XP) PC (MS Windows 2000/MS Windows XP)Sun Solaris, Linux PC (MS Windows 2000/MS Windows XP)Sun Solaris, Linux IP/CORE For more information on the complete list of Xilinx IP products, visit the Xilinx IP Center at http://www.xilinx.com/ipcenter

* HSPICE Models available at the Xilinx Design Tools Center at www.xilinx.com/ise. ** MXE II supports the simulation of designs up to 1 million system gates and is sold as an option. For more information, visit the Xilinx Design Tools Center at www.xilinx.com/ise

Spring 2003 Xcell Journal 107 REFERENCE

Xilinx Global Services

Part Number Product Description Duration Availability support.xilinx.com Education Services hours FPGA13000-5-ILT Fundamentals of FPGA Design 8 Now • Sign up for personalized email FPGA23000-5-ILT Designing for Performance 16 Now alerts @mysupport.xilinx.com FPGA33000-5-ILT Advanced FPGA Implementation 16 Now • Search our knowledge database LANG11000-5-ILT Introduction to VHDL 24 Now • Troubleshoot with Problem Solvers LANG21000-5-ILT Advanced VHDL 16 Now • Consult with engineers in Forums LANG12000-5-ILT Introduction to Verilog 24 Now PCI8000-4-ILT PCI CORE Basics 8 Now Xilinx Productivity Advantage Program PCI28000-4-ILT Designing a PCI System 16 Now (XPA) ASIC25000-5-ILT Designing for Performance for the ASIC User 24 Now The XPA provides customers with everything they need to DSP2000-3-ILT DSP Implementation Techniques for Xilinx FPGAs 24 Now improve their designs — Software, Education and Support DSP-10000-5-ILT DSP Design Flow 24 Now Services, IP cores and demo boards — in one package. A single purchase order allows their designers to get what RIO22000-5-ILT Designing with RocketIO MGT 16 Now they need, when they need it, without the worry of ordering PROMO-5004-5-ILT FPGA Essentials 15 Now each piece of the solution separately. PROMO-5003-5-LEL Designing for Performance, Live Online 9 Now Two types of XPAs are available; custom XPA and EMBD-21000-5-ILT Embedded Systems Development 16 Now Prepacked “XPA Seat”. “XPA Seat” is primarily for Platinum Technical Service individuals or small design teams. SC-PLAT-SVC-10 1 Seat Platinum Technical Service w/10 education credits N/A Now http://support.xilinx.com/support/gsd/xpa_program.htm SC-PLAT-SITE-50 Platinum Technical Service site license up to 50 customers N/A Now SC-PLAT-SITE-100 Platinum Technical Service site license for 51-100 customers N/A Now Xilinx Design Services (XDS) SC-PLAT-SITE-150 Platinum Technical Service site license for 101-150 customers N/A Now • XDS provides extensive FPGA hardware and Titanium Technical Service embedded software design experience backed PS-TEC-SERV Titanium Technical Service (minimum 40 hours) N/A Now by industry recognized experts and resources to solve even the most complex design challenge. Design Services DC-DES-SERV Design Services Contract N/A Now • System Architecture Consulting — Provide engineering services to define system architec- Xilinx Productivity Advantage ture and partitioning for design specification. DS-XPA Custom XPA Packaged Solution N/A Now DS-ISE-ALI-XPA XPA Seat, ISE Alliance N/A Now • Custom Design Solutions — Project designed, DS-ISE-FND-XPA XPA Seat, ISE Foundation N/A Now verified, and delivered to mutually agreed upon design specifications. Titanium Technical Service Platinum Technical Service • IP Core Development, Optimization, Integration, Modification, and Verification — • Dedicated application engineer • Access to Senior Applications Engineers Modify, integrate, and optimize customer • Design Flow methodology coaching • Dedicated Toll Free Number* intellectual property or third party cores to work with Xilinx technology. Develop customer-required • Contract based service • Priority Case Resolution special features to Xilinx IP cores or third party • Factory escalation process • Ten Education Credits cores. Perform integration, optimization, and • Timing Closure expertise • Electronic Newsletter verification of IP cores in Xilinx technology. • Service at customer site or Xilinx site • Formal Escalation Process • Embedded Software — Develop complex embedded software with real-time constraints, • Service Packs and Software Updates using hardware/software co-design techniques. • Application Engineer to Customer Ratio, 2x Gold Level • Conversions — Convert ASIC designs and other *Toll free number available in US only, dedicated local numbers FPGAs to Xilinx technology and devices. available across Europe

Education Services Contacts Design Services Contacts XPA Contacts Titanium Technical Service Contacts North America: 877-XLX-CLAS (877-959-2527) North America & Asia: North America: 800-888-FPGA (3742) North America: http://support.xilinx.com/support/training/training.htm Richard Fodor: 408-626-4256 fpga.xilinx.com Telesales: 1-800-888-3742 Mike Barone: 512-238-1473 Europe: +44-870-7350-548 Europe: Europe: [email protected] Europe: Stuart Elston: +44-870-7350-532 Stuart Elston: +44-870-7350-632 Alex Hillier: +44-870-7350-516 Japan: +81-03-5321-7750 Japan: Martina Finnerty: +353-1-4032469 http://support.xilinx.co.jp/support/education-home.htm 81-3-5321-7730 or [email protected] Asia Pacific: +03-5321-7711 [email protected] http://support.xilinx.com/support/education-home.htm

108 Xcell Journal Spring 2003 REFERENCE on, data storage ATM PHY layer ATM PHY layer ATM ical layer of Fiber Channel ical layer of Fiber Channel Bluetooth applications ATM, SONET, and Ethernet ATM, SONET, and Ethernet POS PHY4, iSCSI line cards NICs, routers, switches, hubs Bridges, switches, links WAN DECT,VOIP, cordless telephony DECT,VOIP, cordless telephony DECT,VOIP, cordless telephony DECT,VOIP, cordless telephony X.25, Relay, Frame B/D-Channel Secure communication, data storage Digital broadcast, transmitter microwave Line card: routers & optical switches terabit Line card: routers & optical switches terabit iSCSI line cards, SPI-3 (POS-PHY L3) to Gb Ethernet and other bridges Channel coding in telecom/wireless, broadcast Line cards, iSCSI cards, gigabit routers and switches Layer 2 swtiches/hubs, test equipment, bridge to Line cards, switches, routers and optical switches Networking, edge and access, Switches and routers Line cards, cards, iSCSI gigabit routers and switches Line cards, cards, iSCSI gigabit routers and switches Line cards, cards, iSCSI gigabit routers and switches Line cards, cards, iSCSI gigabit routers and switches Access systems, multi-service switches, DSLAMs, Basestation controllers eCommerce, Banking, phones, Video PDA, satellite communications Broadcast, wireless LAN, cable modem, xDSL, nets, satellite com,uwave TV, digital CDMA2000 eCommerce, Banking, phones, Video PDA, satellite communications X.25, POS, cable modems, relay switches, frame video conferencing over ISDN X.25, POS, cable modems, relay switches, frame video conferencing over ISDN BER measurements for Forward Error Correction cores such as Reed-Solomon, BER measurements for Forward 3G base stations, broadcast, wireless LAN, cable modem, xDSL, satellite com, uwave Viterbi Decoder,Turbo Convolutional code or Turbo Product code Turbo Convolutional code or Decoder,Turbo Viterbi GbE Network Interface Cards (NICs), based line cards, routers – packet Edge switches and terabit Transaction and secure communications;Transaction surveillance, and embedded systems storage on ches WAN/LAN functionality,WAN/LAN Statistics gathering 100Base-T/TX/FX/4; RMON & Etherstats SERDES for GMII; Auto-negotiation for 1000BASE-X and Mindspeed OC-192 framers. Supports dynamic alignment. parallel interface or XAUI interface, interface or XAUI parallel supports 10 GBASE-X, integrated PCS/PMA interface,integrated supporting 1000BASE-X application IEEE 802.3-2000 compliant, support 8-bit GMII interface or processor (NPV), low power an device count than competing network processors. Single/multi-mode fiber optics; 802.3x full duples flow control; OIF SPI-4 Phase 2 compliant. at OC-48 line rate Operates Line cards, switches, and optical switches routers Compliant with ANSI X9.52,Compliant with or two independent 64-bit keys 128-bit key Separate generator and verifier blocks, generator Separate AAL5 compat with ITU-T I.363 for Supports up to 32 links/32 groups; I/Fs,ATM L2 PHY & UTOPIA supports IDCR OIF SPI-3 (POS-PHY L3) compliant. OC-48 framers. with PMC-Sierra Fully HW interoperable OIF SPI-3 (POS-PHY L3) compliant. OC-48 framers. with PMC-Sierra Fully HW interoperable OIF SPI-3 (POS-PHY L3) compliant. OC-48 framers. with PMC-Sierra Fully HW interoperable Supports ECB, OFB, CFB, CBC modes; Supports 128, 192 and 256-bit keys Supports G.721 G.723 G.726 G.726a G.727 G.727a, G.727 G.726a G.726 G.723 Supports G.721 u-law, a-law G.727a, G.727 G.726a G.726 G.723 Supports G.721 u-law, a-law G.727a, G.727 G.726a G.726 G.723 Supports G.721 u-law, a-law G.727a, G.727 G.726a G.726 G.723 Supports G.721 u-law, a-law Compliant with ITU-T I.432 scrambler. data width, Parameterizable cell length, header length Designed to IEEE 802.3ae, version D4.1 support both 32-bit XGMII RFC1619 (IP&IPX) POS, and verification, 16/32 bit FCS generation stats 16/32-bit frame seq,16/32-bit frame 8/16-bit addr insert/delete, flag/zerop insert/detect OIF SPI-3 (POS-PHY L3) compliant. OC-48 framers. with PMC-Sierra Fully HW interoperable Block & convolutional support. features. param 3GPP, UMTS, GSM, DVB compliant Conforms to ETSI EN 300 421 v1.1.2, selectable convolutional code rate Total Solution requires this core + SPEEDAnalyzer ASIC, Solution requires this core + SPEEDAnalyzer Total 2.5 Gbps full duplex wire speed; network Compliant to Bluetooth v1.1, for L2CAP, BQB qualified software LHP, HC1, voice support Compliant with ITU-T I.432. data width, Parameterizable cell & header length Separate generator and verifier blocks, generator Separate AAL3/AAL4 compatible with ITU-T I.363 for Probability density function (PDF) deviates less than 0.2 percent from OIF SPI-4 Phase 1 and Flexbus4 compliant.AMCC OC-192 framers. with Fully HW interoperable OIF SPI-4 Phase 1 and Flexbus4 compliant.AMCC OC-192 framers. with Fully HW interoperable the Gaussian PDF for |x| < 4.8_ and is obtained from a closed-form expression OIF SPI-4 Phase 2 compliant. with PMC-Sierra Fully HW interoperable 16/32-bit frame seq,16/32-bit frame 8/16-bit addr insert/delete, flag/zerop insert/detection evice Features Key Application Examples FG676-5 XC2V3000-5 XC2V1000 FG456-4 XC2V1000 FG456-5 XC2V1000 FG456-4 XC2V1000 FG456-4 XC2V3000 FG676-5 XC2V1000 FG456-5 or 1.25 Gbps or 4 channels of 3.125 Gbps Rocket I/O transceivers Rocket 156.25 DDR for XGMII

Implementation Example

Spartan

Spartan-II

Spartan-IIE

Virtex

Virtex-E

Virtex-II Virtex-II Pro Virtex-II AllianceCOREAllianceCOREAllianceCOREAllianceCORE V-IIAllianceCORE V-IIAllianceCORE V-II V-E V-II VAllianceCORE V-IIAllianceCORE V-II V-E V V-E V V-II VAllianceCORE V-EAllianceCORE VAllianceCORE S-II S-IIAllianceCORE 89% S 66% VAllianceCORE S-II 89% 49%AllianceCORE 36% 16 89% 30 V S-II VAllianceCORE 50 26% 6 103 V 50 S XCV150-6 XCV400E-8 AllianceCORE 134 VAllianceCORE 71% XC2V500-5 XCV400E-8 V XC2V250-5 XC2V500-5 S-IIAllianceCORE S 67 V XC2V250-5 S S V V-II 2% S XCV50-6 S-II V S V 22% 144 S 60 S-IIAllianceCORE 44% 40 S S-II 20AllianceCORE XC4005XL-1 XCV50-6 S-II 29 Data syntax analysis of IP, V-II MPEG, S ATM XC4010XL-9 S Octet wide operation, HEC computation, 14% cell scrambling 48 XCS30-4 Octet wide operation, code rate, HEC verification, gen. 9% cell scrambling vectors, CMSTR length customizable XCS30-4 94 74AllianceCORE 46% XC2S150-6 104 VAllianceCORE XC2S100-6 adapter cards, ATM S-II routers, switches XCV50-6 31AllianceCORE adapter cards, ATM rtouters, switches XCV50-6 S-IIAllianceCORE NIST certified, supports ECB, CBC, 13% XC2V1500-5 CFB, and OFB V-II S ATM, IP, MPEG V V-E Error correction 45% 31 V S-II 50 XC2V1500-5 S-II V Secure communication, data storage Single/multi-mode fiber optics; 10/100 MII PHY, 10Base-T, S 75% XCV150-4 21% S-II 80 IEEE 802.3 compliant RMON, MIBs stats, MII support S 73 NICs, routers, switches, hubs, printers 23% XC2S150-6 80 XCV50-6 100 Ethernet switched, XC2V1500-5 hub, NICS XCV50-6 profile noise generation Programmable channel Noise emulation in transmission endor Name IPType Occ MHz D Paxonet Communications Paxonet Communications Paxonet Communications Paxonet Communications Paxonet Communications Paxonet Communications Paxonet inSilicon Corporation inSilicon Corporation IP Semiconductors A/S IP Semiconductors Telecom Italia Lab S.p.A. Telecom Italia Lab S.p.A. Telecom Italia Lab S.p.A. Telecom Italia Lab S.p.A. Telecom Italia Lab S.p.A. Telecom Italia Lab S.p.A. Telecom Alcatel Technology Licensing Group Technology Alcatel Licensing Group Technology Alcatel Amphion Semiconductor Ltd. Amphion Semiconductor Ltd. Amphion Semiconductor Ltd. Amphion Semiconductor Ltd. Amphion Semiconductor Ltd. Amphion Semiconductor Ltd. Amphion Semiconductor Ltd. Function V with XGMII or XAUI Communication & Networking 3G FEC Package8b/10b Decoder8b/10b EncoderADPCM, 16 channel ADPCM, 256 channel ADPCM, 512 channel ADPCM, 768 channel ADPCM, 1024 channel AES Decryption core family AES Encryption CoreAES Encryption core family XilinxAnalyzer and Data Extractor Bit Stream XilinxBluetooth Baseband Processor Xilinx Baseband ControllerBluetooth Hardware LogiCORECell Assembler LogiCORECell Delineation LogiCOREConvolutional Encoder V-IIP V-IIConvolutional Encoder Wipro, V-IIP Ltd. V-IIVerifier and CRC10 Generator V-II V-E NewLogic GmbH V-E AllianceCORE AllianceCOREVerifier and CRC32 Generator V-E V CAST, Inc.DES3 Encryption V V S-IIEDES and 3DES Cryptoprocessor AllianceCORE V-II S-IIE S-II V-IIDES and 3DES encryption engineDES Cryptoprocessor S-IIDES Encryption V-IIDistributed Sample Descrambler V 0.4% V V-EDistributed Sample Scrambler 0.4%DVB Satellite Modulator V Xilinx 160 Memec Core 160 AllianceCOREEEthernet MAC, 1 Gigabit Half/Full duplex with LogiCOREGMII or Gigabit serial PHY S-II XC2V40-5Ethernet MAC, V-IIP CAST, 10 Gigabit Full Duplex Inc. XC2V40-5 V-II Xilinx Industry std 8b/10b en/decode for serial data transmission AllianceCORE 69% V-E Industry std 8b/10b en/decode for serial data transmission 16%Ethernet MAC, 10/100 V Memec Core LogiCORE CAST, Inc. S-IIE 25 V-IIEthernet MAC, AllianceCORE 10/100 V-IIP 131 S-IISPI-4.1 (Flexbus 4) Interface Core, V-E 1-Channel Xilinx AllianceCORE V-IISPI-4.1 (Flexbus 4) Interface Core, 4-Channel V-E V Phys HDLC Controller Core, XC2V1500-4 1-Channel XC2V250-5 HDLC Controller, Phys LogiCORE 1-Channel V-II Xilinx 10%HDLC Controller Core, 32-Channel V-IIP V-E XC2V1000-4 S-IIE Xilinx over SONET HDLC-PPP for Packet V-II S-II Decoder, Codec, Viterbi Turbo Convolutional EncoderInterleaver/De-interleaver V S LogiCORE 26Interleaver/Deinterleaver V LogiCOREATMInverse Multiplexer for 79%Network Processor 15% Xilinx S-II XC2V40-6 V-II 23% Channel Model Transmission Noisy 25 Memec Core Xilinx V-II 167SPI-3 (POS-PHY L3) Link Layer Interface, 125 (GMII) 1-256Ch AllianceCORE LogiCORESPI-3 (POS-PHY L3) Link Layer Interface, 1-Ch 93% XC2V1000-4 Wireless Infrastructure 3G SPI-3 (POS-PHY L3) Link Layer Interface, LogiCORE 2-Ch k from 3 to 9, puncturing from 2/3 to 12/13 XC2V1000-5 XCS20-4 XilinxSPI-3 (POS-PHY L3) Link Layer Interface, 4-Ch ModelWare, Inc.SPI-3 (POS-PHY L3) Physical Layer Interface Xilinx 204 AllianceCORE V-II XilinxSPI-4.2 (POS-PHY L4) Multi-Channel Interface V-E Xilinx LogiCORE V-II 25% Xilinx NIST certified, supports EBC, LogiCORE VSPI-4.2 (POS-PHY L4) Multi-Channel Interface V-E CBC, V-IIP CFB, V-II LogiCORE and OFB XC2V40-5 45-70 V-II Xilinx V-IIP LogiCORE V Xilinx V-E V V-IIP V-II V-E LogiCORE V-II V-IIP V-E Xilinx V-II XCV50-4 V S-II V-IIP V-E 12% LogiCORE LogiCORE V-II V-E V S-IIE S-II 27% V S-II V-IIP V-E S-II S-IIE V LogiCORE S-IIE 200 V-II S-II V Secure communicati 15% S-IIE S-II 200 S-IIE S-II 95% 34% S-II V-E 9% 115 V-II 30% 10% 100% 77 81 55% 175 15% XC2V250 208 200 31 200 XC2S15-5 XC2V250 200 XC2V40-6 32 full duplex, CRC-16/32, XC2V1000-4 8/16-bit address insertion/deleti XCV50E-8 Block & convolutional, width up to 256 bits, 256 bran 29% 52% 400 DDR 11% 104 XC2V3000 100 DDR XCV50E-8 OIF SPI-3 (POS-PHY L3) compliant. AWGN - Additive White Gaussian NoiseAdditive - AWGN Xilinx LogiCORE V-IIP V-II 93.0% 245 XC2V80-5 Ethernet MAC, 1 Gigabit Full Duplex Xilinx IP Selection Guide

Spring 2003 Xcell Journal 109 REFERENCE uwave nets,uwave TV digital ss cture ess Infrastructure ection, LMDS, MMDS rrection, LMDS, MMDS ATM PHY layer ATM PHY layer ATM PHY layer ATM PHY layer ATM PHY layer ATM Error correction Error correction Digital receivers xDSL, sat com, nets uwave Data transmission, wireless tions, wireless communications, image filtering xDSL, satellite com, uwave, CDMA2000 Embedded systems, professional audio, video Error correction, wireless, DVB, Satellite data link Error correction, wireless, DVB, Satellite data link L/MMDS, broadcast equip, wireless LAN, modem, cable 3G base stations, broadcast, wireless LAN, modem, cable Secure comms, video suveillance, data storage, financial transactions Broadcast, wireless LAN, cable modem, xDSL, nets, satellite com,uwave TV digital BPSK, QPSK, QAM demodulators, spread spectrum communication systems, CDMA2000 & 3G basestations Wireless & wireline communication systems such as software defined radios,Wireless & wireline communication systems such as software digital receivers, cable modems, , r dithering/offset D4, ESF, SLC-96 formats. XC4000. For DSI trunk, PBX I/F polyphase decimators, 25 to 108db DDS spurious free dynamic range Coded Modulation, IEEE802.11a/16a compatible, reaches OC3 (155Mbps) and higher Configurable datapath comprising mixer,Configurable DDS, optional CIC filter, optional with AMCC,with and Mindspeed OC-192 framers. PMC-Sierra DVB-RCS compliant, 9Mbps, data rate, sizes and frame switchable code rates DVB-RCS compliant, 9Mbps, data rate, sizes and frame switchable code rates Std or cust coding, 3-12 bit width, up to 4095 symbols with 256 check symb. Broadcast, wireless LAN, cable modem, xDSL, satellite com, Protocol conversion from UTOPIA L2 Pb (RACE BLNT), L2 Pb (RACE Protocol conversion from UTOPIA 8/16 bit operation Not recommended for new designs. Suggested replacement: Direct Digital Synthesizer Polar to rectangular,Polar rectangular to polar, sin & cos, sinh & cosh, atan & atanh, square root SPHY, MPHY, HEC processing, round robin polling, ind. receiver transmitter Not recommended for new designs. Suggested replacement:Arithmetic FIR Filter Distributed Not recommended for new designs. Suggested replacement:Arithmetic FIR Filter Distributed 64-,256-,1024-point programmable point size,64-,256-,1024-point programmable and inverse transform, forward 16-bit complex data, Not recommended for new designs. replacement: Suggested Direct Digital Synthesize Radix-2/radix4 architectures,Radix-2/radix4 BER, depuncturing. Code rate, length parameterizable contraint 1024-point transform - 7.31 us,1024-point transform - 1.83 us, 256-point transform - 0.46 us 64-point transform compatible with standards such as DVB ETS, 3GPP2, IEEE802.16, Hiperlan, IESS-308/309 Intelsat Protocol conversion from Pb (RACE BLNT) to UTOPIA L2, BLNT) to UTOPIA Protocol conversion from Pb (RACE 8/16 bit operation Not recommended for new designs. Suggested replacement: Comb Filter Cascaded Integrator Like Intel 8XC152 Global Serial Channel,Like Serial Comm., HDLC apps, telecom Cell handshake in SPHY mode,Cell handshake 8/16 bit operation, internal FIFO, detects runt cells 32-bit input/coeff width, 1024 taps, 1-8 chan, polyphase, online coeff reload includes BestState Logic, depth on the fly, and traceback ability to change code rate Trellis supports Std or custom coding, 3-12 bit symbol width, up to 4095 symbols, decoding error & erasure Parameterizable source code with constraint length(k)=7, source code with constraint Parameterizable G0=171, G1=133; or G0=133, G1=171 18-bit phase factors, complement, 2's built-in memories, data scaling, programmable 140 MHz, Complex FFT, and Inverse transform. Forward Supports bit precisions from 2-32 bits. Embedded memory Puncturing, architecture, serial & parallel change, dynamic rate length, constraint paramterized soft/hard decision with programmable number of soft bits,decision with programmable decoder, dual rate pins for external puncturing, erasure Cell handshake in SPHY mode,Cell handshake 8/16 bit operation, 32 bit FIFO interface, detects runt cells evice Features Key Application Examples FG676-5AMCC, with and Mindspeed OC-192 framers. PMC-Sierra XC2V500 16 bit complex data, 2's comp, and inverse transform forward XC2V500 16 bit complex data, 2's comp, and inverse transform forward XC2V500 16 bit complex data, 2's comp, and inverse transform forward XC2V500 16 bit complex data, 2's comp, and inverse transform forward XC2V3000XC2V3000 OIF SPI-4 Phase 1& 2 compliant. Fully HW interoperable OIF SPI-4 Phase 1& 2 compliant. Fully HW interoperable Line cards, switches, routers and optical switches Line cards, switches, routers and optical switches I 130MHz 00 on FB4, 350 DDR 156 DDR on XGMI 350 DDR on SPI-4.2, on SPI-4.2,on 175 internal 41us, 100MHz 2 7.7us, 100MHz 1.9us, 100MHz

Implementation Example

Spartan

Spartan-II

Spartan-IIE

Virtex

Virtex-E

Virtex-II Virtex-II Pro Virtex-II AllianceCOREAllianceCORE V-IIAllianceCORE V VAllianceCORE VAllianceCOREAllianceCORE V-IIAllianceCORE V-II V-E S 97% V 50% V-IIAllianceCORE V-II 61 V-E S VAllianceCORE V 50AllianceCORE 11%AllianceCORE XC2V500-5 VAllianceCORE 82AllianceCORE XCV100-4 AllianceCOREAllianceCORE 44% S-IIAllianceCORE XCV50-4 V 70% parameterizable, available V RTL 71 2% V 48% V 65 S-II V XC2V2000-5 69 S-II V 120 S S-II VAllianceCORE S XC2V2000-5 S-II V 8% S XC2V2000-5 S-II 10% S XC2V80-5 S-II V-II S S-II S V-E 53 3GPP/UMTS compliant, S-II >2Mbps data rate 61 S Error correction, 3GPP/UMTS compliant, wireless, S upto 4 interleaver laws DSL 26% XCV50-6 XCV50-6 S-II 79 74% XCV50-4 Error correction, wireless Error correction, wireless 91 XC2V500-5 Xilinx LogiCORE V-II 35% endor Name IPType Occ MHz D Paxonet Communications Paxonet Communications Paxonet Communications Paxonet inSilicon Corporation inSilicon Corporation inSilicon Corporation inSilicon Corporation Telecom Italia Lab S.p.A. Telecom Italia Lab S.p.A. Telecom Italia Lab S.p.A. Telecom Italia Lab S.p.A. Telecom Italia Lab S.p.A. Telecom Italia Lab S.p.A. Telecom iCoding Technology, Inc. iCoding Technology, Inc. Amphion Semiconductor Ltd. Amphion Semiconductor Ltd. Function V Communication & Networking (continued) SPI-4.2 (POS-PHY L4) to SPI-4.1 (Flexbus 4) Bridge Reed-Solomon DecoderReed-Solomon Decoder Reed Solomon Decoder Reed Solomon DecoderReed-Solomon EncoderReed Solomon Encoder Reed Solomon EncoderSDLC ControllerSHA-1 Encryption processorT1 Framer Convolutional Decoder - 3GPP Compliant Turbo Xilinx Decoder,Turbo 3GPP Decoder,Turbo DVB-RCS Memec Core Xilinx Decoder Turbo LogiCORE AllianceCORE Convolutional Encoder - 3GPP CompliantTurbo Xilinx V-IIP Encoder,Turbo DVB-RCS LogiCORE Memec Core V-II Encoder Turbo CAST, Inc. AllianceCORE V-E Xilinx Product Code DecoderTurbo LogiCORE Product Code EncoderTurbo AllianceCORE V V-IIP level 2 slave inteface UTOPIA V-II CAST, Inc. S-IIE V-II LogiCORE Level-2 PHY Side RX Interface UTOPIA V-E S-II SysOnChip,TX Interface V-E Level-2 PHY Side AllianceCORE Inc.UTOPIA V-II V AllianceCORE V Receiver ATM Level-3 UTOPIA IP Group Virtual S V V-E AllianceCORE Transmitter UTOPIA Level-3 ATM V-II S-IIE Level-3 PHY Receiver UTOPIA 40% V V-II V-E V S-II V-IITransmitter Level-3 PHY UTOPIA S-IIE S-II Xilinx Master UTOPIA V S S-II Xilinx S 98 Slave UTOPIA V 42% S-II V 83% LogiCORE LogiCORE S V-IIP 24% XC2V250-6 80% 180 V-II Decoder,Viterbi Purpose General V-IIP 73 12% V-II 79 40 Decoder Viterbi XC2V40-6 113 65%Digital Signal Processing XCV50-6Bit Correlator XC2V500-6 11% XC2V500 Comb (CIC) FilterCascaded Integrator 60 XCV50-6Comb Filter XilinxCORDIC 158 3GPP specs, 2 Mbps, XC2V250 BER=10-6 for 1.5dB SNR 66 LogiCORE XC2V1000-5 Direct Digital Synthesizer (DDS) Customizable, >580 Mbps V-IIP XilinxFFT/IFFT, Complex SHA-1 algorithm compliant 1024-Point 31% V-IIVirtex-II,FFT/IFFT for Complex 1024-Point 64% XC2V500-5 V-EFFT/IFFT, Complex 16-Point Customizable, > 900 Mbps LogiCOREVirtex-II,FFT/IFFT for Complex 16-Point 285 VFFT/IFFT, Complex 256-Point V-IIP 150 Compliant w/ 3GPP, puncturing S-IIE 3GPP/UMTS compliant,Virtex-II, IMT-2000,FFT/IFFT for V-II 2Mbps data rate Complex 256-Point Xilinx S-IIFFT/IFFT, Complex 32-Point V-E XC2V40-5 XilinxFFT/IFFT, 64-, XilinxWirel 3G XC2V1000-5 256-, Complex 1024-Point V Xilinx Xilinx LogiCORE Fully compliant with IEEE 802.16 and 802.16a standards S-IIE Fully compliant with IEEE 802.16 and 802.16a standards Xilinx 38% LogiCOREFFT/IFFT, LogiCORE S-II Complex Xilinx 64-Point V-IIPVirtex-II,FFT/IFFT for Complex 64-Point Xilinx V-IIP LogiCORE LogiCORE Xilinx V-IIFIR Filter,Arithmetic (DA) Distributed 128 LogiCORE V-II V-II Xilinx Xilinx LogiCOREFIR Filter, V-E Arithmetic Parallel Distributed V-E Error correction, Error correction wirele FIR Filter,Arithmetic Serial Distributed LogiCORE LogiCORE V VFIR Filter, V-II MAC Xilinx XC2V500-5 LogiCORE Error Corr V-IIP Error Co LogiCORE S-IIELFSR, Error Correction Wireless Infrastru S-IIE Shift Register 3G Linear Feedback Xilinx V-II V-IIP V-II S-IIOscillator, Dual-Channel Numerically Controlled S-II V-E V-II V-E V-IIOscillator, Numerically Controlled LogiCORE Xilinx Xilinx V Buffer,Time-Skew V-E Nonsymmetric 16-Deep V Xilinx LogiCORE V-II V-IIP Xilinx Xilinx Buffer,Time-Skew Nonsymmetric 32-Deep V-II S-IIE V V-E 12% LogiCORE LogiCORE S-II V-E V LogiCORE V-IIP Xilinx LogiCORE LogiCORE V V-II V-II Xilinx 245 S-IIE V-E Xilinx 62% S-II Xilinx LogiCORE V LogiCORE XC2V80-6 V-IIP S-IIE LogiCORE S 37% V-II V-E Xilinx 8-65K samples, LogiCORE S-II 32-bits output precision, phase V-E 123ns V-E 29% 54% 32 bits data width, change from 8 to 16384 rate V 32% V V LogiCORE S-IIE 110 S-IIE V-IIP S-II S-II 140 V-II S V-E V-E XC2V500-6 38% V XC2V1500-6 V 4096 taps, input, serial/parallel S 4096 bits width S-IIE S-IIE S-II S S-II S S 16% S XC2V250 Single rate, Decimator, Polyphase Interpolator Polyphase 168 input widths, SRL16/register implementation 3G base sta Not recommended for new designs. Not recommended for new designs. Digital Down Converter (DDC) Xilinx LogiCORE V-IIP V-II V-E V S-IIE S-II 47% 116 XC2V500-5 SPI-4.2 (POS-PHY L4) to XGMII (10GE MAC) BridgeSPI-4.2 (POS-PHY L4) to XGMII (10GE MAC) Xilinx LogiCORE V-II 53% Viterbi Decoder,Viterbi IEEE 802-compatible Xilinx LogiCORE V-IIP V-II V-E V S-IIE S-II 35% 157 XC2V500-5

110 Xcell Journal Spring 2003 REFERENCE deo IP routers Processor applications Processor applications DSP, Math, Arithmetic apps DSP, Math, Arithmetic apps DSP, Math, Arithmetic apps DSP, Math, Arithmetic apps DSP, Math, Arithmetic apps DSP, Math, Arithmetic apps. DSP, Math, Arithmetic apps. Embedded systems, telecom Embedded systems, telecom Serial and modem applications Serial data applications, modems Serial data applications, modems Embedded systems, telecom, video Embedded systems, telecom, video Embedded systems, telecom, video Simple microcontroller applications Low cost embedded systems, telecom Telecom, industrial, high speed control DSP,Telecom, industrial, high speed control High speed embedded systems, audio, video Networking, communications, processor apps vailable w/EDK)vailable Networking, communications, processor apps 80C31 instruction set, high speed multiplier, RISC architecture 6.7X faster than standard 8051 80C31 instruction set, RISC architecture 6.7X faster than standard 8051 Supports 16450 and 16550a synchronous s/w; serial interface programmable 80C31 instruction set,ALU, 8 bit 8 bit control, 32 bit I/O ports, two 16 bit timer/counters, SFR I/F Not recommended for new designs. Suggested replacement: CORDIC Independently controlled transmit, receive and data interrupts. 16X clock. 12X faster (average) and code compatible wrt legacy 8051,12X faster (average) verification bus monitor, SFR interface CoreConnect Bus (OPB), ($) Core to be bought separately Value w/EDK) or High Core (Available Evaluation 32 bit I/O, 3 counters, interrupt controller, SFR interface, dual data pointer Full IEEE-754 compliance, 4 pipelines, Single precision real format support Full IEEE-754 compliance, 4 pipelines, Single precision real format support Full IEEE-754 compliance, 4 pipelines, Single precision real format support 64-bit input data width, constant, inputs, reloadable or variable implementation parallel/sequential Full IEEE-754 compliance, double word input, 2 pipelines, Single precision real output Full IEEE-754 compliance, 15 pipelines, Single precision real format support Full IEEE-754 compliance, mult, 7 pipelines,32x32 Single precision real format support Not recommended for new designs. replacement: Suggested Distributed Memory Not recommended for new designs. replacement: Suggested Distributed Memory 8X faster (average) and code compatible wrt legacy 8051,8X faster (average) verification bus monitor, SFR interface, DSP focused Not recommended for new designs. Suggested replacement: Comb Filter Cascaded Integrator CoreConnect Bus (OPB), ($) Core to be bought separately Value w/EDK) or High Core (Available Evaluation CoreConnect Bus (OPB), ($) Core to be bought separately Value w/EDK) or High Core (Available Evaluation 80C31 instruction set, high speed multiplication and division ,RISC architecture 6.7X faster than standard 8051 RISC implementation,ALU, 8 bit 8 bit control, 32 bit I/O, 16 bit timer/counters, SFR I/F, ext. memory I/F Not recommended for new designs. Suggested replacement: Synchronous FIFO supporting Spartan-II/Spartan-IIE Not recommended for new designs. Suggested replacement: Multiplier Not recommended for new designs. Suggested replacement: Multiplier Not recommended for new designs. Suggested replacement: Multiplier Not recommended for new designs. Suggested replacement: Multiplier Not recommended for new designs. Suggested replacement: Multiplier 32 bit I/O, 3 counters, timer, 27-bit watchdog 3-priority interrupt controller, SFR interface Not recommended for new designs. Suggested replacement: Adder Subtracter Not recommended for new designs. Suggested replacement: Adder Subtracter Not recommended for new designs. Suggested replacement: Adder Subtracter Not recommended for new designs. Suggested replacement: Adder Subtracter Not recommended for new designs. Suggested replacement: Accumulator Prog. Data width, parity, stop bits. 16X internal clock, FIFO mode, false start bit detection 1-256 bits, 15-65535 words, DRAM or BRAM, independent I/O clock domains Not recommended for new designs. Suggested replacement: Distributed Memory using SRL16 based memory type 1-1024 bit, 16-65536 word, RAM/ROM/SRL16, opt output regs and pipelining Eight function ALU,Eight function 4 status flags- Carry, Overflow, Zero and Negative Full IEEE-754 compliance, 4 pipelines, Single precision real format support Input width up to 32 bits, 65-bit accumulator, truncation rounding evice Features Key Application Examples

Implementation Example

Spartan

Spartan-II

Spartan-IIE

Virtex

Virtex-E

Virtex-II Virtex-II Pro Virtex-II AllianceCOREAllianceCOREAllianceCOREAllianceCORE V-IIAllianceCORE V-IIAllianceCORE V-IIAllianceCORE V-II V-E V V-II V V V-II V-E V-II V S-II S-II S-II V V S-II S-II S-II S-II 66 74 53 91 XC2V250-5 66 XC2V250-5 XC2V250-5 66 73 XC2V80-5 XC2V250-5 XC2V250-5 XC2V250-5 AllianceCORE VAllianceCORE S-II S V-II 19% V-E V 49 S-IIAllianceCORE XCV50-6 AllianceCOREAllianceCORE V-IIAllianceCOREAllianceCORE V-II V-II VAllianceCORE 91 V-II V V-E V V-II XC2V500-5 V S-II S-II 44 opcodes, 64-K word data, V program, arch. Harvard S-II 77% S-II S S-II 49 39% 33% Control functions, State machines, Coprocessor 73 XC2V1000-5 38 29.8 90 XC2V250-5 XC2V1000-5 XCV300-6 XC2V250-5 71 i80186 compatible plus enhanced mode XC2V250-5 Industrial, wireless, communications, embedded endor Name IPType Occ MHz D Digital Core Design Digital Core Design Digital Core Design Digital Core Design Digital Core Design Digital Core Design Digital Core Design Digital Core Design Digital Core Design Digital Core Design Telecom Italia Lab S.p.A. Telecom eInfochips Pvt. Ltd. Dolphin Integration Dolphin Integration Loarant Corporation Loarant Function V Digital Signal Processing (continued) Digital Signal Processing Buffer,Time-Skew Symmetric 16-Deep TMS32025 DSP Processor coreMath Functions 1s and 2s ComplementAccumulatorAdder Subtracter XilinxDivider, Pipelined Adder Floating Point Divider Floating Point CAST, Inc. LogiCORE Multiplier Floating Point Square Comparator Floating Point AllianceCORE Square Root Operator Floating Point to Integer Converter Floating Point Converter Integer to Floating Point V-II XilinxIntegrator V-EMultiplier, Constant Coefficient Multiplier, V Constant Coefficient - Pipelined LogiCOREMultiplier, Dynamic Constant Coefficient XilinxMultiplier, XilinxArea Optimized - Parallel XilinxVirtex,Multiplier for Parallel Variable S-IIMultiply Accumulator (MAC) LogiCORE Xilinx LogiCOREMultiply Generator LogiCORE V-IIPRegistered Adder Xilinx V-IIP V-IIAdderRegistered Loadable Xilinx V-II 66% LogiCORE V-ERegistered Loadable Subtracter S V-E XilinxAdderRegistered Scaled LogiCORE V-II V Xilinx VAdder LogiCORERegistered Serial V-E 63 S-IIERegistered Subtracter S-IIE LogiCORE V S-IIScaled-by-One-Half Accumulator S-II LogiCORE XilinxTableSine Cosine Look Up S-IIE Xilinx XC2V500-5 Square Root S-II ComplementerTwos V-E S LogiCORE S LogiCOREMemories & Storage Elements Xilinx Xilinx V Utopia Level 2 ATM Xilinx V-IIPBlock Memory, Dual-Port V-II V-E XilinxBlock Memory, LogiCORE Single-Port Xilinx LogiCORE Xilinx V LogiCOREAddressable Memory (CAM)Content XilinxCAM, S-IIE for Internet Protocol V-IIP LogiCORE VDistributed Memory V-II LogiCORE S-II Xilinx Xilinx LogiCORE S FIFO, Asynchronous Xilinx S-IIE V-E LogiCOREFIFO, Synchronous Xilinx S-II VFIFO, LogiCORE Synchronous S LogiCOREPipelined Delay Element LogiCORE S-IIE Xilinx Xilinx V-IIP S Registered ROM LogiCORE S-II V-IIP V-II RAMRegistered Single Port Xilinx V-II Not recommended for new designs. Xilinx replacement: Suggested Xilinx Complementer Twos Xilinx Bus Gate or V-E & Peripherals Controllers Microprocessors, LogiCORE LogiCORE V-E w/OPB interface10/100 Ethernet MAC V V-IIP V-IIP V Lite w/OPB interface LogiCORE10/100 Ethernet MAC S LogiCORE LogiCORE V-II LogiCORE S-IIE V-II16-bit proprietary RISC processor S-IIE V-IIP V-E S-II V-IIP V-E V-IIP 32-bit input data width, Registered LUT16-Word-Deep multiple clock per output S-II V-II V-II Xilinx16450 UART V S V S V-E S V-E w/OPB interface16450 UART Xilinx S-IIE S-IIE Xilinx 12% w/FIFOs16550 UART V 1-256s bit wide Xilinx V S-II S-II LogiCORE 1-256s bit wide S w/FIFOs & synch interface16550 UART Xilinx Xilinx S-IIE S S-IIE w/OPB interface V-IIP S 16550 UART LogiCORE S-II LogiCORE 270 S-II S V-II2901 microprocessor slice LogiCORE V-IIP Xilinx V-IIP controller2910A microprogram V-E LogiCORE LogiCORE V-II S V-II68000 compatible microprocessor Xilinx V-IIP V V-E CAST, Inc. XC2V40-6 V-E80186 compatible processor V-II S LogiCORE S-IIE8051 compatible microcontroller V V V-E AllianceCORE Xilinx S-II 8051 compatible microcontroller LogiCORE S-IIE S-IIE V8051 compatible microcontroller S-II IP Group Virtual 3-10 bit in, S-II 4-32 bit out,8051 compatible microcontroller AllianceCORE distributed/block ROM IP Group S-IIE Virtual S LogiCORE Xilinx V-II AllianceCORE8051 compatible microcontroller S-II CAST, Inc. CAST, Inc. V-IIP V-E80515 high-speed 8-bit RISC microcontroller CAST, Inc. V-II8052 compatible microcontroller AllianceCORE AllianceCORE V LogiCORE AllianceCORE V-E CAST,80530 8-bit microcontroller, Inc. compact S-IIE V-IIP CAST, Inc. V S-II V-II AllianceCORE V-II S-IIE V-E AllianceCORE 125 S-II 125 V S 57% S S-IIE CAST, Inc. V XC2V80-5 S-II XC2V80-5 V Not recommended for new designs. AllianceCORE V S 87 Not recommended for new designs. 1-512 bits, CoreConnect Bus (OPB), 2-10K words, Core (A Evaluation SRL16 Input width up to 256 bits S V S-II V S S-II S 1-256 bits, 125 1-256 bits, XC2V80-5 2-13K words 2-128K words S S-II 12% 89% 125 XC2V80-5 19% V 1-256 bits, 16-256 words, distributed/block RAM 63 52% 16 32 60 XC2V80-5 36 S-II 90% 68 XCV50-6 XC2V500-5 XCS20-4 XC2S50-6 XC2S50-6 42 88% XCV200E-8 XCV200E-8 51 MC68000 CompatibleAMD 2910a Based on XC2S150-6 Embedded systems, professional audio, vi High-speed bit slice design 80530 8-bit microcontroller CAST, Inc. AllianceCORE V 81% 66 XCV200E-8

Spring 2003 Xcell Journal 111 REFERENCE rking, communications Processor applications Processor applications Processor applications Processor applications Processor applications Processor applications Processor applications Processor applications Processor applications Processor applications Processor applications Processor applications 32 bit processing, DSP Processor I/O interface Processor I/O interface Processor I/O interface Embedded systems interface PowerPC embedded system design PowerPC embedded system design PowerPC embedded system design PowerPC embedded system design PowerPC embedded system design PowerPC embedded system design PowerPC embedded system design PowerPC embedded system design PowerPC Event counterr, generator baud rate Communication, embedded systems Real-time interrupt based uP designs Real-time interrupt based uP designs Networking, communications, processor apps Networking, communications, processor apps Networking, communications, processor apps Embedded systems, telecom, audio and video Embedded systems, telecom, audio and video Embedded systems, telecom, audio and video Digital video, embedded computing, networking Internet appliance, industrial control, multimedia, HAVi set top boxes K Processor applications DM Microprocessor based systems e w/EDK Processor applications DK Networking, communications ilable w/EDKilable w/EDK Processor applications Processor applications le, w/EDK Available Processor applications dule, w/EDK Available Processor applications odule, w/EDK Available Processor applications oreConnect Bus (PLB), Core (includes device drivers), Infrastructure w/EDK Available oreConnect Bus (OPB), Core (includes device drivers), Infrastructure w/EDK Available oreConnect Bus (OPB), Core (includes device drivers), Infrastructure w/EDK Available PIC 12c4x like, 2X faster, 12-bit wide instruction set, 33 instructions 8 vectored priority interrupts, e.g., all 8259/A modes programmable- special mask, buffer CoreConnect Bus (PLB), ($) Core to be bought separately Value w/EDK) or High Core (Available Evaluation CoreConnect Bus (OPB), Core (includes device drivers, Peripheral adaptation layers), RTOS w/EDK Available CoreConnect Bus (OPB), Core (includes device drivers, Peripheral adaptation layers), RTOS w/EDK Available CoreConnect Bus (DCR), Core (includes device drivers, Peripheral adaptation layers), RTOS w/EDK Available CoreConnect Bus (OPB), ($) Core to be bought separately Value w/EDK) or High Core (Available Evaluation CoreConnect Bus (OPB), ($) Core to be bought separately Value w/EDK) or High Core (Available Evaluation CoreConnect Bus (OPB), ($) Core to be bought separately Value w/EDK) or High Core (Available Evaluation DDR SDRAM burst length support for 2,4,8 per access, supports data 16,32, 64, 72. CoreConnect Bus (PLB), Memory Controller Core, w/EDK Available Three 8-bit peripheral ports, 8-bit peripheral Three IO lines, 24 programmable 8-bit bidi data bus CoreConnect Bus (PLB), Core (includes device drivers), Infrastructure w/EDK Available CoreConnect Bus (PLB), Core (includes device drivers), Infrastructure w/EDK Available CoreConnect Bus (PLB), Core (includes device drivers), Infrastructure w/EDK Available CoreConnect Bus (PLB), Core (includes device drivers), Infrastructure w/EDK Available CoreConnect Bus (PLB), Core (includes device drivers), Infrastructure w/EDK Available CoreConnect Bus (OPB), Memory Controller Core, w/EDK Available CoreConnect Bus (OPB), Memory Controller Core, w/EDK Available CoreConnect Bus (OPB), Memory Controller Core, w/EDK Available CoreConnect Bus (OPB), Core (includes device drivers), Infrastructure w/EDK Available 8 char keyboard FIFO,8 char keyboard lockout, 2-key rollover, n-key 4-16 char display CoreConnect Bus (DCR), Core (includes device drivers), Infrastructure w/EDK Available CoreConnect Bus (DCR), Core (includes device drivers), Infrastructure w/EDK Available S/W compatible with PIC16C55X, 14-bit instruction set, 35 instructions Baud rate generator for 13 common baud rates, generator Baud rate I/O ports, parallel prog. timer/counters Three 8-bit parallel ports, 8-bit parallel Three IO lines, 24 programmable 8-bit bidi data bus PIC 12c4x like, 2X faster, 12-bit wide instruction set, 33 instructions Status feedback, counter latch, mode, Square wave 6 counter modes, binary/BCD count, LSB/MSB R/W Three 8-bit peripheral ports, 8-bit peripheral Three I/O lines, 24 programmable 8-bit bidi data bus 4 stage pipeline, 16 single cycle instructions10, 3 interrupt exception levels, 24 bit stack pointer 8 vectored priority interrupts, e.g., all 8259/A modes programmable- mask, special buffer (-6) CoreConnect Bus (PLB), Memory Controller Core, w/EDK Available Device Features Key Application Examples

Implementation Example

Spartan

Spartan-II

Spartan-IIE

Virtex

Virtex-E

Virtex-II Virtex-II Pro Virtex-II AllianceCOREAllianceCORE V-II V-IIAllianceCORE S-II S-II 11% 2% V-E 51 175 XC2V1000-4 XC2V1000-5 S-II Status red, Six prog, counter modes, Intel8254 like 89% 37 XC2S150-6 Processor I/O interface AllianceCOREAllianceCORE V-II V-II V V S-IIAllianceCOREAllianceCOREAllianceCORE V-II 33% V-IIAllianceCORE 38%AllianceCORE V-II 40 VAllianceCORE V-E VAllianceCORE 20 VAllianceCORE XC2V1000-5 32bit data, S-II 24 bit address, 3 Stage pipeline, dev. Java/C tools XC2V1000-5 V-E S-II S-II V V V V 32b data/address optional DES 126 140 S-II 126 S-II 3% 10% XC2V80-5 XC2V80-5 37% XC2V80-5 80 80 91 Internet appliance, industrial control 137 XCV400E-8 XCV400-6 XCV50-6 XCV50-6 SDRAM refresh, customizable Embedded systems using SDRAMs XilinxXilinx LogiCORE LogiCORE V-IIP V-II V-IIP V-EXilinx VXilinx S-IIE LogiCORE S-II V-IIP LogiCORE V-II V-IIP V-E V S-IIE S-II 125 XC2V80-5 150 125 Pro (-6) Virtex-II XC2V80-5 150 Pro Virtex-II endor Name IPType Occ MHz Memec Core AllianceCORE S 89% 10 XCS20-4 Rapid Prototypes, Inc. NMI Electronics, Ltd. Digital Core Design Digital Core Design Digital Core Design Eureka Technology Eureka Technology Eureka Technology Digital Communications Technologies,Digital Communications Ltd. eInfochips Pvt. Ltd. eInfochips Pvt. Ltd. ARC International plc Derivation Systems,Derivation Inc. Function V Microprocessors, Controllers & Peripherals (continued) Controllers Microprocessors, 80C51 compatible RISC microcontroller DMA controller8237 programmable 8250 UART timer/counter core interval 8254 programmable timer8254 programmable CAST, Inc. timer/counter 8254 programmable CAST, Inc. I/O controller 8255 programmable AllianceCORE CAST, Inc. AllianceCORE interface peripheral 8255 programmable interface peripheral 8255 programmable AllianceCORE interface8255A peripheral 8256 multifunction microprocessor support controller V-II interrupt controller Memec Core8259 programmable V-E IP Group Virtual V-II V-E IP Group Virtual interrupt controller8259A programmable AllianceCORE AllianceCORE AllianceCORE V V-E display interface keyboard 8279 programmable VArbiter and Bus Structure w/OPB interface V IP Group Virtual Memec CoreArbiter and Bus Structure w/PLB interface Memec Core AllianceCORE RISC processor ARC 32-bit configurable AllianceCORE CAST, Inc. AllianceCORE S-II Utopia Level 2 Master and Slave w/OPB Interface ATM S-II CAST, AllianceCORE Inc. Utopia Level 2 Master and Slave w/PLB Interface S-IIATM XilinxBRAM Controller w/LMB interface AllianceCORE XilinxBRAM Controller w/OPB interface 74% V V 75% V-IIBRAM Controller w/PLB interface LogiCORE 40% (SW only)BSP Generator V-E LogiCORE V-IIPDCR Bus Structure 68 V 34 V-II S-IIExternal Memory Controller (EMC) w/OPB interface V-IIP S-II 34 V-E(Includes support for Flash, SRAM, ZBT, SACE) System S External Memory Controller (EMC) w/PLB interface Xilinx V XC2V80-5 S(Includes support for Flash, XC2S150-6 SRAM, Xilinx ZBT, S-IIACE) System V S-IIE XC2S100-6GPIO w/OPB interface Xilinx 64% S-IIHDLC Controller (Single Channel) w/OPB interface LogiCORE 4 independent DMA channels (expandable),I2C w/OPB interface S software LogiCORE V-IIP S 90%Interrupt Controller (IntC) w/OPB interface S Xilinx S-II 227 V-II LogiCORE V-IIP 8Interrupt Controller (IntC) w/DCR interface Xilinx 46% V-II V-E 58% V-IIPAddress Decode w/OPB interfaceIPIF V-E 142IPIF DMA w/OPB interface V LogiCORE XCV50E-8 10%IPIF Interrupt Controller w/OPB interface LogiCORE V Xilinx S-IIE 12X faster, V-IIP SRF I/F Xilinx 8 XCS05-4 10Attachment w/OPB interfaceIPIF Master/Slave V-II V-IIP S-IIE S-II Xilinx 125 FIFO w/OPB interfaceIPIF Read/Write Packet XC2V40-5 V-E S-II 227IPIF Scatter/Gather w/OPB interface LogiCORE LogiCORE VAttachment w/PLB interfaceIPIF Slave Xilinx Xilinx Xilinx V-IIP XCS10-4 LogiCORE XCS20-4 V-IIP processor, XC2V80-5 Java 32-bit S-IIE Xilinx XC2S50-6 V-II V-IIP Xilinx XCV50E-8 processor,Java core configurable S-II Xilinx V-II w/OPB interface V-E LogiCORE UART JTAG LogiCORE LogiCORE V-E Utility (SW only) 150Test Memory LogiCORE V-IIP V V-IIP V-IIP Xilinx LogiCOREMicroBlaze Soft RISC Processor Bit set/reset support V-II 125 V-II V V-IIP V-II LogiCORE S-IIE Xilinx XilinxMicroBlaze Source Code V-IIP Pro (-6) V-E Virtex-II V-II 125 V-E S-IIE V-E S-II V-IIP V-II BSP (SW only)VxWorks ML300 V-E LogiCORE S-II V-II V VOPB2DCR Bridge V-E V XC2V80-5 LogiCORE LogiCORE V-E V-IIP VOPB2OPB Bridge (Lite) S-IIE DC to 625K baud S-IIE S-IIE V XC2V80-5 V-II V-IIPOPB2PCI Full Bridge (32/33) V-IIP S-II 125 S-IIE S-II V S-II V-II S-IIE V-EOPB2PLB Bridge Embedded systems S-II S-IIE S-IIPIC125x fast RISC microcontroller Xilinx V-E V 150 Xilinx S-IIPIC1655x fast RISC microcontroller Xilinx XC2V80-5 V S-IIEPIC165X compatible microcontroller 125 S-IIE S-II LogiCORE Pro (-6) Virtex-II PIC165x fast RISC microcontroller Xilinx LogiCORE 125 S-IIPLB2OPB Bridge LogiCORE 150 V-IIP Xilinx V-IIP Bus Master PowerPC V-II V-IIP XC2V80-5 LogiCORE Bus Slave PowerPC 125 V-II 125 V-E 125 Pro (-6) Virtex-II Xilinx XC2V80-5 SDRAM Controller LogiCORE 150 125 V-IIP V-E Embedded systems V Xilinx 125 CAST, Inc.SDRAM Controller V-IIP Xilinx 125 VSDRAM Controller, S-IIE 200 MHz XC2V80-5 V-II XC2V80-5 AllianceCORE Pro (-6) Virtex-II LogiCORE XC2V80-5 S-II S-IIE XC2V80-5 V-E LogiCORE 125 V-IIP CoreConnect Bus (OPB), XC2V80-5 CoreConnect Bus (OPB), S-II IP Interface Module, Ava IP Interface M Serial communications LogiCORE Xilinx 125 V-II XC2V80-5 V-IIP CoreConnect Bus (OPB), V IP Interface Modu CoreConnect Bus (OPB), V-II V-IIP IP Interface Mo V-II V-E S-IIE XC2V80-5 V-E CoreConnect Bus (OPB), w/EDK Available S-II V LogiCORE XC2V80-5 CoreConnect Bus (OPB), V IP Interface Module, S-IIE w/ED Available V-IIP V CoreConnect Bus (OPB), Xilinx IP Interface Module, S-IIE S-II Ava 150 S-II 125 150 Pro (-6) Virtex-II LogiCORE CoreConnect Bus (PLB), S-II IP Interface Module, Availabl V-IIP XC2V80-5 Pro (-6) Virtex-II 150 150 61% Pro (-6) Virtex-II Processor applications 125 32-bit Soft Processor Core, w/E Available 125 Pro (-6) Virtex-II 150 128 ($) XC2V80-5 Core to be bought separately Value High Pro (-6) Virtex-II XC2V80-5 C XC2V80-5 C 150 Pro (-6) Virtex-II 150 Netwo Microchip 16C5X PIC like Pro (-6) Virtex-II C 150 Pro (-6) Virtex-II SDRAM Controller, DDR Memec Core AllianceCORE V-II V-E S-II 7% 133 XC2V1000-5

112 Xcell Journal Spring 2003 REFERENCE , , s tions gn Processor applications Processor applications Processor applications Processor applications JPEG, MPEG, H.261, H.263 Server,Embedded,gb ethernet, Server,Embedded,gb eo phone, Set-top box, PDA display and chip-to-chip interconnections PowerPC embedded system design PowerPC embedded system design PowerPC PC add-in boards, CPCI, Embedded PC add-in boards, CPCI, Embedded Embedded systems, communications comm systems, SAN, clustered servers, U320 SCSI,Fibre Ch,RAID cntl,graphics beded microcontroller and communications Aurora is an ideal solution for backplane Aurora Ultra 3 SCSI/Fibre Ch RAID,Ultra multi-port Gb Embedded systems, 8-bit processing apps. Scanners, Printers, Handhelds, Mass Storage Networking, communications, processor apps Intelligent RapidIO peripherals, route exception processor, channel processor enterprise storage PC boards,CPCI,Embedded,hiperf video,gb ethernet video,gb PC boards,CPCI,Embedded,hiperf ethernet video,gb PC boards,CPCI,Embedded,hiperf PC boards, CPCI, Embedded, hiperf video, gb ethernet Routers, switches, backplane, control plane, data path, embedded sys Routers, switches, backplane, control plane, data path, embedded sys high speed interface to memory and encryption engines, high end video high speed interface to memory and encryption engines, high end video Programmable ,Programmable dual-port device, keyboards, printers, paper table readers Embedded microprocessor boards, and SOCs, audio/video, home and automotive radio Programmable frequency divider,Programmable Pulse counter, pulse generator, interrupt controller Picture and video, archiving, digital television compression and transmission, teleconference des Embedded microprocessor systems, I2C peripherals odes Embedded microprocessor systems, I2C peripherals VSC7216-01, Instruments TLK2501 and Texas devices initiator and target IF,V PCI-X at 33-66 MHz, 3.3 V PCI at 0-33 MHz 3.3 Circuit to detect SONET A1A2 transitions Circuit to detect SONET are tightly integrated. to RapidIO PHY layer customers for free Available PCI initiator and target IF,V PCI-X at 33-66 MHz, 3.3 V PCI at 0-33 MHz 3.3 customer education 3-day training class for US & Canada locations customer education 3-day training channel. design back-end Aurora Uses LocalLink specification on the v2.3 compliant, assured PCI timing, 3.3/5-V, 0-waitstate, friendly CPCI hot swap v2.3 compiliant, assured PCI timing, 3.3/5-V, 0-waitstate, friendly CPCI hot swap v2.3 compiliant, assured PCI timing, 3.3/5-V, 0-waitstate, friendly CPCI hot swap Reference design implements the Aurora link-layer protocol for a single Aurora Reference design implements the USB 1.1; up to 31 endpoints; suspend and resume power mgmt; remote wake-up PCI-X 2.0 mode1 comp, 64/32-bit, 66 MHz PCI-X initiator and target IF, PCI 2.3 comp, 64/32-bit, 33 MHz CoreConnect Bus (OPB), Core (includes device drivers, Peripheral adaptation layers), RTOS w/EDK Available CoreConnect Bus (OPB), Core (includes device drivers, Peripheral adaptation layers), RTOS w/EDK Available CoreConnect Bus (OPB), Core (includes device drivers, Peripheral adaptation layers), RTOS w/EDK Available Includes PCI32 board, driver development kit, class for US & Canada locations and customer education 3-day training 8x8 parameterized FDCT,8x8 parameterized IDCT & IEEE 1180-1990 compliant v2.3 compliant, assured PCI timing, 3.3/5-V, 0-waitstate, friendly, CPCI hot swap CoreConnect Bus (PLB), Memory Controller Core, w/EDK Available CoreConnect Bus (PLB), Core (includes device drivers), Infrastructure w/EDK Available CoreConnect Bus (OPB), Memory Controller Core, w/EDK Available World's first PowerPC with a RapidIO interface. first PowerPC World's RapidIO physical layer core and the CoreConnect PLB The SONET/OTN descrambler and scrambler are implemented in V-IIP fabric V-IIP are implemented in and scrambler descrambler SONET/OTN CoreConnect Bus (OPB), Core (includes device drivers, Peripheral adaptation layers), RTOS w/EDK Available PCI-X 2.0 mode1 comp, 64/32-bit, 133 MHz PCI-X initiator and target IF, PCI 2.3 comp, 64/32-bit, 33 MHz PCI Proprietary 8-bit processor,ALU, 8 bit 16 bit stack pointer, 33 opcodes, 4 addr. Modes, 2 user opcodes Application note describes how to create compliant PCI3.3V designs with V-II Pro V-II Application note describes how to create compliant PCI3.3V designs with Compliant with USB1.1 spec.,VCI bus, Supports CRC, Performs Supports 1.5 Mbps & 12 Discusses emulating Mindspeed CX27C201,VSC7123 and quad channel single channel Vitesse evice Features Key Application Examples FG456-5 FG456-5 FG456-5 0-waitstate, friendly CPCI hot swap FG456-5FG456-5 with Motorola's RapidIO bus functional model v1.5 with Motorola's RapidIO bus functional model v1.5 XC2V1000 XC2V1000 v2.3 compliant, assured PCI timing, 3.3/5-V, ethernet video,gb PC boards,CPCI,Embedded,hiperf XC2S200 PQ208-6 XC2V1000 FG456-5 XC2V1000 FG456-5 XC2V1000 FG456-5 per channel 3.125 Gbps Any 2.488 Gbps per channel2.488 Gbps per channel 2.488 Gbps per channel inside the FPGA @ 155.52 MHz transceiver Designed to I/F 16-bit SONET datapath the RocketIO Implementation Example

~500 slices

Spartan

Spartan-II

Spartan-IIE

Virtex

Virtex-E

Virtex-II Virtex-II Pro Virtex-II V-IIP V-IIP V-IIP V-IIP V-IIP V-IIP AllianceCOREAllianceCOREAllianceCOREAllianceCOREAllianceCORE V-II V-II V-EAllianceCORE V-II V-E VAllianceCORE V-E V-II V-II V-E S-II S-II V-E S-II S S S-II S 24% 16 S 33 XC4000E 143 157 187 XCV50-6 22% 22% XC2V40-5 XC2V40-5 66 XC2V40-5 priority classes - strong/weak, Two access counters 33 XC2V1000-5 I2C-like, multi master, fast/std. modes XC2V1000-5 purpose bus arbitration General I2C-like, Slave I2C-like, Slave Embedded systems Embedded Systems Embedded Reference Design Reference Design Reference Design Reference Design Reference Design Reference Design ilinx LogiCORE V-IIP V-II V-E V S-IIE S-II 7% 66 ilinx LogiCORE V-IIP V-II V-E V S-IIE S-II 7% 66 Xilinx White Paper V-IIP Xilinx Xilinx LogiCOREXilinxXilinx LogiCOREXilinx LogiCOREXilinx S-IIE V-II LogiCORE S-IIXilinx V-IIP LogiCORE V-II V-IIP 12% VE V-II 66 Xilinx Xilinx Xilinx 30% 30% 24% 133 20% 66 250 XC2V1000 62.5 XC2V1000 XCV300E-8 Xilinx XC2V1000Xilinx RapidIO Interconnect v1.2 compliant, LogiCORE verified LogiCORE V-IIP RapidIO Interconnect v1.1 compliant, verified V-II V-IIP V-II V-E V-E V V S-IIE S-IIE S-II S-II 32% 140 XC2V1000-5 8-32 pt FDCT, IDCT with 8-24 bits for coeff & input JPEG, MPEG, H.261, H.263 CAST, Inc.CAST, Inc. AllianceCORE AllianceCORE V-II V-II V-E V-E V V S-II S-II 65% 55% 95 95 XC2V500-5 XC2V500-5 endor Name IPType Occ MHz D elecom Italia Lab S.p.A. Digital Core Design Digital Core Design Digital Core Design Eureka Technology Eureka Technology ARC International plc Function V PCI64 Interface Design Kit (DO-DI-PCI64-DKT) X WP160: Emulating External SERDES Devices USB 1.1 Function Controller CAST, Inc. AllianceCORE V-II V-E V S-II 92% 48 XC2S200-6 Microprocessors, Controllers & Peripherals (continued) Controllers Microprocessors, SPI Master and Slave w/OPB interfaceSDRAM Controller w/OPB interface SDRAM Controller w/PLB interface Systems Reset Module (WDT) w/OPB interfaceTimer Dog Timebase/Watch w/OPB interfaceTimer/Counter XilinxUART, compact generic Xilinx Lite w/OPB interfaceUART XilinxV8-uRISC 8-bit RISC Microprocessor Xilinx LogiCORE (BSP) Board Support Package VxWorks LogiCORE timer/counter coreZ80 compatible programmable V-IIP V-IIP LogiCORE I/O controller coreZ80 peripheral V-II V-II CAST, Inc. LogiCORE V-IIPZ80CPU Microprocessor V-E V-E V-II Bus Interfaces Standard V-IIP Xilinx AllianceCORE V Xilinx V V-EArbiter XilinxI2C Bus Controller S-IIE S-IIE V Memec CoreI2C Bus Controller Master LogiCORE S-II S-II V-II Xilinx S-IIE LogiCORE AllianceCOREI2C Bus Controller Slave V-IIP V-E LogiCORE CAST, S-II Inc.I2C Bus Controller Slave Base V-IIP V-II V-IIP Serial Interface Master-OnlyTwo-Wire I2C V LogiCORE V-E AllianceCORE Serial Interface Master-SlaveTwo-Wire V-III2C V-IIP VPCI 64-bit/66-MHz master/target interface V-E V-II CAST, Inc.PCI host bridge S-IIE Memec Core V-II V S-II V-EPCI32 Interface Design Kit (DO-DI-PCI32-DKT) Memec Core S-II AllianceCORE AllianceCORE V-E S-IIEPCI32 Interface, 125 IP Only (DO-DI-PCI32-IP) 125 V AllianceCORE S-IIPCI32 Single-Use License for Spartan (DO-DI-PCI32-SP) V S-IIE 125PCI64 & PCI32, IP Only (DO-DI-PCI-AL) Xilinx CAST, V-II S 9% Inc. S-II V-II XC2V80-5 XC2V80-5 V-II V-E T 15% AllianceCORE V-E S-II XC2V80-5 PCI64 Interface, Xilinx IP Only (DO-DI-PCI64-IP) LogiCORE V 167 V V V-IIP S-IIEXAPP653:Virtex-II Pro 3.3V PCI Reference Design 73 125 V-II S-IIE V-II S-II Xilinx 150 (DO-DI-PCIX64-VE) LogiCOREVirtex-II PCI-X 64/133 Interface for V-E S-II 49% XC2V500-5 V-EIncludes PCI 64 bit interface at 33 MHz V-IIP (DO-DI-PCIX64-VE) Virtex-E VPCI-X 64/66 Interface for S-II V X V-II Pro (-6) Virtex-II XC2V80-5 XC2S50E-6 125 LogiCOREIncludes PCI 64 bit interface at 33 MHz Xilinx V-E S-IIE 16% 76 Phy Layer (DO-DI-RIO8-PHY) RapidIO 8-bit port LP-LVDS V-IIP 28% S-II V V-II 150 56% S-II 150 XC2V80-5 Layer (DO-DI-RIO8-LOG) Transport RapidIO Logical (I/O) and V-E S-IIE 83 108 XC2S50-6 S-II Pro (-6) Virtex-II VRapidIO Phy Layer to PLB Bridge reference design Pro (-6) Virtex-II 6% 72 generator and baud rate S-IIE UART 21% XC2S50E-6 S-IISerial Protocol Interface (SPI) Slave XC2V80-4USB 1.1 Device Controller 6% 66 XC2V500-5 103WindRiver RTOSBackplanes and Gigabit Serial I/O Interfaces HW and 6 - 7%XAPP 649: Pro Devices Virtex-II SONET Rate Conversion in 66 XAPP 651: I2C-like, multi master fast/std. Scramblers/Descramblers SONET and OTN XC2S100-6 mo I2C-like, 66 multi master fast/std. m XAPP 652:Alignment and SONET/SDH Framing Word CAST, Inc. Philips I2C 1.1; Zilog Z80 compatible, supports master tx/rx and slave modes 8-bit processorTransceivers AllianceCORE with Embedded RocketIO 401:Aurora single lane, 32-bit interface Memec Core Serial data communication AllianceCORE Em Video & Image Processing V-II2D discrete/inverse cosine transform V-ETransform Discrete Cosine 2D Forward V Embedded system desi V-IITransform2D Inverse Discrete Cosine Transform Wavelet Block-based 2D Discrete S-IIE V-ETransform Discrete Cosine Combined 2D Forward/Inverse Embedded systems, Communica Transform Wavelet Discrete Combined 2D Forward/Inverse Barco-Silex ControllerVideo Compact CAST, Inc. CAST, Inc.Transform, Discrete Cosine DCT/IDCT Forward/Inverse 1-D AllianceCORE CAST, Inc. AllianceCORETransform, Discrete Cosine DCT/IDCT Forward/Inverse 2-D AllianceCORE 14% S-II AllianceCORE V-II V-II V-II V-E 80 V-E V-II V-E V V-E V V Xylon d.o.o. V XC2S50E-6 AllianceCORE S-II 12 S-II S-II S-II V-II XC2V1000-5 77% 42% 62% 45% V 133 83 52 95 XC2V250-5 S-II XC2V500-5 XC2V250-5 XC2V500-5 88 XC2V250-4 Single & double panel, support, LCD/CRT 4 gray, 256 colors Vid

Spring 2003 Xcell Journal 113 REFERENCE lation Video editing,Video digital camera, scanners Video editing,Video digital camera, scanners editing,Video digital camera, scanners Video coding and security,Video medical imaging, scanners, copiers, digital still cameras Digital RGB to TV output conversion,Digital RGB to image filtering, machine vision, still and video image processing. Audio/Video recording and editing equipment Motion JPEG Codec JPEG recording and editing equipment Motion Audio/Video . Not recommended for new designs. Suggested replacement: Register FD-based Parallel 8-bit I/O, 10-bit coeff, 13-bit internal precision; 5-cycle latency; fully synchronous SMTPE/EBU compliant, PAL/NTSC, lock-on external video reference onforms to ISO/IEC Baseline 10918-1, 4 quantization tables, 4 Huffman tables. Stallable Not recommended for new designs. Suggested replacement: Bit Multiplexer or Bus Not recommended for new designs. Suggested replacement: Bit Multiplexer or Bus Not recommended for new designs. Suggested replacement: Bit Multiplexer or Bus Conforms to ISO/IEC Baseline 10918-1, color, multi-scan, Gray-Scale Not recommended for new designs. Suggested replacement: FD-based Shift Register Device Features Key Application Examples

Implementation Example

Spartan

Spartan-II

Spartan-IIE

Virtex

Virtex-E

Virtex-II Virtex-II Pro Virtex-II www.xilinx.com/ipcenter AllianceCOREAllianceCOREAllianceCORE V-II V-II V-E V-II V-E V V-E V V 40 42 70 Virtex-II Virtex-II Virtex-II AllianceCOREAllianceCORE V-II V-EAllianceCORE V S-IIE S-II V 35% V S-II 83 S-II XC2V1000-5 S 77% 8-bit input/12-bit output precision; 76 clock cycle latency 78 20 XCV200-6 DCT for 8X8, XCV400E-8 16X16, IDCT IEEE1180-1990 compliant C JPEG, MPEG, H.26X endor Name IPType Occ MHz Telecom Italia Lab S.p.A. Telecom Amphion Semiconductor Ltd. Amphion Semiconductor Ltd. Amphion Semiconductor Ltd. eInfochips Pvt. Ltd. inSilicon Corporation Function V Video & Image Processing (continued) Video & Image Processing Transform Discrete Cosine JPEG color image decoderFast scale image decoder JPEG gray Fast Transform Discrete Cosine FIDCT Forward/Inverse Huffman DecoderJPEG encoder/decoder DWT forward Line-based programmable Code GeneratorTime Longitudinal Barco-Silexcore V1.0 Barco-SilexV1.0 Motion JPEG Decoder core AllianceCORE AllianceCOREV2.0 Motion JPEG Encoder core CAST, Inc.RGB2YCrCb Color Space ConverterRGB2YCrCb Color Space Converter AllianceCORE V-II Deltatec S.A. V-II V-EBasic Elements AllianceCOREBinary Counter V V-II CAST, Inc.Binary Decoder V-E VBit Bus Gate AllianceCORE CAST, Inc.Bit Gate V Perigee, LLCBit Multiplexer AllianceCORE AllianceCOREBUFE-based Multiplexer Slice V-II Multiplexer SliceBUFT-based V-E S-II VBus Gate V-IIBus Multiplexer V 68% V-EComparator S-IIE RegisterFD-based Parallel 78% V 145% S-IIFD-based Shift Register 73 S-IIE MUXFour-Input S-II V LatchLD-based Parallel 56 51 Xilinx Xilinx ConverterParallel-to-Serial 22% Xilinx XC2V1000-4 XilinxRAM-based Shift Register 36%Register Xilinx XC2V1000-4 LogiCORE 29% XC2V250-5 S-II LogiCORE MUXThree-Input LogiCORE 25 Conforms to ISO/IEC Baseline 10918-1, Xilinx V-IIP Gray-Scale LogiCORE MUX V-IIPTwo-Input S 80 V-II Xilinx V-IIP V-II LogiCORE V-IIP 96 V-II V-E Xilinx V-II V-E V-IIP LogiCORE XC2V1000-5 V-E V Xilinx V-E V-II LogiCORE V V-IIP Xilinx XC2S15-5 V Xilinx V-E S-IIE V-II V V-IIP LogiCORE XC2S50E-7 S-IIE Xilinx S-IIE S-II Xilinx V-II Xilinx V-E V S-IIE S-II V-IIP LogiCORE 202 S-II LogiCORE V-E V-II Xilinx Xilinx S-II LogiCORE S-IIE V V-IIP V-IIP V-E LogiCORE S-II V V-II V-IIP LogiCORE S-IIE LogiCORE V-II XCV100E-8 V-II V-E S-IIE V-IIP S-II V LogiCORE LogiCORE V-IIP V-E V-II V-E S-II V-II S-IIE V V-IIP V V-E Xilinx S-II V-E V V-II S-IIE S-IIE Xilinx V S-IIE V-E S-II V Xilinx S-II S-II S-IIE LogiCORE S-IIE V S-II S-II LogiCORE S-IIE One clock cycle throughput LogiCORE S-II S S HDTV,TV output modu real time 2-256 bits output width 2-256 bits output width S 1-256 bits wide S 1-256 bits wide S 1-256 bits wide 1-256 bits wide 1-256 bits wide IO widths up to 256 bits 1-256 bits wide 1-64 bits wide 1-256 bits wide, 1024 words deep 1-256 bits wide 1-256 bits wide 1-256 bits wide YCrCb2RGB Color Space Converter Perigee, LLC AllianceCORE V S-II S XCV100E-8 One clock cycle throughput HDTV, real time video Please visit the Xilinx IP Center for most up-to-date information at

114 Xcell Journal Spring 2003 TheThe BestBest ofof BothBoth WorldsWorlds

Eureka Technology offers a wide range of silicon proven IP cores for SoC designs. By using our free web-based SoCDesigner™ software, our Eureka Technology customers are able to combine the benefits of using pre-verified silicon proven IP cores with the ability of customizing the SoC design according to their exact specifications. Our rich repertoire of IP cores enable our customers to Call today for more information build a wide varieties of system controller/CPU companion chips to support tel: +1 650 960 3800 different CPU and bus standards such as ARM™, PowerPC™, ARC™, email: [email protected] MIPS™, SH™2/3/4, PCI™, Cardbus™ and PCMCIA™. www.eurekatech.com To learn more about our silicon proven IP cores and SoCDesigner, please visit 4962 El Camino Real, Los Altos, www.eurekatech.com/socdesigner CA 94022 USA

© 2003 Eureka Technology Inc. All rights reserved. The Eureka Technology logo and SoCDesigner are trademarks of Eureka Technology Inc. All other trademarks are properties of their respective owners. Up to 600K Gates New and 514 I/Os Got-a-Lotta-Pins. Not-a-Lotta-Price.

With up to 514 I/Os, the newly extended Fits your budget. Hits your market. Spartan®-IIE family offers the lowest cost per pin Spartan-IIE FPGAs lead the way in quick-turn, cost-sensitive in the industry. That’s why Spartan-IIE FPGAs markets like consumer, digital video, home networking, auto- are the first choice of designers seeking a low- motive and much more. Supporting 19 I/O cost, higher density solution for high-volume applications. standards and driven by Xilinx’s proven, lightning-fast ISE 5.1i software, there’s no All the I/O. All the density. No compromises. better low-cost solution Spartan-IIE FPGAs achieve minimum die size with- shipping today. out sacrificing I/Os. With double the competitive I/O count, and densities ranging from 50,000 Visit www.xilinx.com/spartan2e/ today to 600,000 system gates, the Spartan-IIE series is and find out how you can get all the pins the only true solution to ASIC headaches. you need . . . at the price you want.

The Programmable Logic CompanySM www.xilinx.com/spartan2e/

® 2003 Xilinx, Inc., 2100 Logic Drive, San Jose, CA 95124. Europe +44-870-7350-600; Japan +81-3-5321-7711; Asia Pacific +852-2-424-5200; Xilinx and Spartan are registered trademarks, WebPACK ia a trademark and The Programmable Logic Company is a service mark of Xilinx, Inc.

PN 0010707