Reiner Hartenstein, University of Kaiserslautern, Germany [email protected] http://kressarray.de

Enabling Technologies for Reconfigurable Computing and Software / Configware Co-Design
Part 3: Resources for RC
Reiner Hartenstein, Xputer Lab, University of Kaiserslautern
July 8, 2002, ENST, Paris, France

Schedule

time slot
xx.30 – xx.00 Reconfigurable Computing (RC)
xx.00 – xx.30 coffee break
xx.30 – xx.00 Design / Compilation Techniques
xx.00 – xx.00 lunch break
xx.00 – xx.30 Resources for Data-Stream-based RC
xx.30 – xx.00 coffee break
xx.00 – xx.30 FPGAs: recent developments

2002, 2 http://kressaray.de - © [email protected]

Opportunities by new patent laws ?

• to clever guys being keen on patents:
• don't file patents on the following details !
• everything shown in this presentation has been published years ago

>> Configware Industry

• Configware Industry
• Terminology
• MoPL data-procedural language
• Anti architecture and circuitry
• Stream-based Memory Architecture

http://www.uni-kl.de


Configware heading for mainstream

• Configware market taking off for mainstream
• FPGA-based designs more complex, even SoC
• No design productivity and quality without good configware libraries (soft IP cores) from various application areas.
• Growing no. of independent configware houses (soft IP core vendors) and design services
• AllianceCORE & Reference Design Alliance
• Currently the top FPGA vendors are the key innovators and meet most configware demand.

OS for PLDs

• separate EDA software market, comparable to the compiler / OS market in computers,
• Cadence, Mentor, just jumped in.
• < 5% Xilinx / Altera income from EDA SW


Enabling Technologies for Reconfigurable Computing and Software / Configware Co-Design, Part 3: ENST, Paris, 8 July 2002 Resources for RC # . Reiner Hartenstein, University of Kaiserslautern, Germany [email protected] http://kressarray.de

The Software Alliance EDA Program

• The Software Alliance EDA Program helps leading EDA vendors to integrate Xilinx Alliance software tightly into their tools
• ... Xilinx Inc.'s Foundation...
• free WebPACK downloadable tool palette

provides a wide selection of EDA tools:
Acugen Software, Agilent EEsof EDA, Aptix, Auspy Development, Cadence, Celoxica, Dolphin Integration, Elanix, Exemplar, Flynn Systems, Hyperlynx, IKOS Systems, Innoveda, Mentor Graphics, MiroTech, Model Technology, Protel International, Simucad, SynaptiCAD, Synopsys, Synplicity, Translogic, Virtual Computer Corporation.

Alliances

• The Xilinx XtremeDSP Initiative (with Mentor Graphics)
• MathWorks / Xilinx Alliance.
• The Wind River / Xilinx alliance


The Xilinx AllianceCORE program

a cooperation between Xilinx and third-party core developers, to produce a broad selection of industry-standard solutions for use in Xilinx platforms. Partners are:
Amphion Semiconductor, Ltd., ARC Cores, CAST, Inc., DELTATEC, Derivation Systems, Inc., Dolphin Integration (Grenoble), Eureka Technology Inc., Frontier Design Inc., GV & Associates, Inc., inSilicon Corporation, iCODING Technology Inc., Loarant Corporation, LYR Technologies, Mindspeed Technologies - A Conexant Business, NetLogic Microsystems (formerly Applied Telecom), MemecCore, Inventra, NewLogic Technologies, Inc. (Europe), NMI Electronics, Paxonet Communications, Inc., Perigee, LLC, Rapid Prototypes Inc., sci-worx GmbH (Hannover, Germany), SysOnChip, TILAB (Telecom Italia Lab), VAutomation, Virtual IP Group, Inc., XYLON.

The Xilinx Reference Design Alliance Program

The Xilinx Reference Design Alliance Program helps the development of multi-component reference designs that incorporate Xilinx devices and other semiconductors.
The designs are fully functional, but no warranties, no liability. Partners are:
JK microsystems, Inc., ADI Engineering, Innovative Integration.


The Xilinx University Program

The Xilinx University Program provides
• Xilinx Student Edition Software,
• Professor Workshops,
• a Xilinx University User Group,
• Presentation Materials and Lab Files,
• Course Examples,
• Research,
• Books, etc.

Altera offers over a hundred IP cores (1)

Altera offers over a hundred IP cores like, for example:
• modulator,
• controller,
• synchronizer,
• UART,
• DDR SDRAM controller,
• microprocessor,
• Hadamard transform,
• decoder,
• interrupt controller,
• bus control,
• Real86 16 bit microprocessor,
• USB controller,
• floating point,
• PCI bus interface,
• FIR filter,
• Viterbi controller,
• discrete cosine,
• fast Ethernet
• ATM cell processor,
• MAC receiver or transmitter,
• and many others.


Altera offers over a hundred IP cores (2)

from:
Altera, AMIRIX Systems, Inc., Amphion Semiconductor, Ltd., Arasan Chip Systems, Inc., CAST, Inc., Digital Core Design, Eureka Technology Inc., HammerCores, Innocor, Ktech Telecommunications, Inc., Lexra Computing Engines, Mentor Graphics - Inventra, Modelware, Ncomm, Inc., NewLogic Technologies, Northwest Logic, Nova Engineering, Inc., Palmchip Corporation, Paxonet Communications, PLD Applications, Sciworx, Simple Silicon, Tensilica, TurboConcept.

Altera IP core design services

Altera IP core design services are available from:
• Northwest Logic


Altera Certified Design Center (CDC) Program

Certified Design Center (CDC) Program:
• Barco Silex
• El Camino GmbH
• Excel Consultants
• Plextek
• Reflex Consulting
• Sci-worx
• Tality
• Zaiq Technologies.

The Altera Consultants Alliance Program (ACAP)

The Altera Consultants Alliance Program (ACAP) lists
• 41 offices in North America and
• 29 in the rest of the world.


Development boards

Development boards are offered from:
• Altera
• El Camino GmbH
• Gid'el Limited
• Nova Engineering, Inc.
• PLD Applications
• Princeton Technology Group
• RPA Electronics Design, LLC
• Tensilica.

Consultants and services not listed by Xilinx nor Altera (index)

Algotronix, Edinburgh,
Andraka Consulting Group
Arkham Technology, Pasadena, CA
Barco Silex, Louvain-la-Neuve, Belgium,
Bottom Line Technologies, Milford, NJ
Codelogic, Helderberg, South Africa,
Coelacanth Engineering, Norwell, MA
Comit Systems, Inc., Santa Clara, CA
EDTN Programmable Logic Design Center
Flexibilis, Tampere, Finland,
Geoff Bostock Designs, Wiltshire, England,
Great River Technology, Albuquerque, NM,
New Horizons GB Ltd, United Kingdom,
North West Logic
Silicon System Solutions, Canterbury, Australia,
Smartech, Tampere, Finland,
Tekmosv, Austin, Texas,
The Rockland Group, Garden Valley, CA
Nick Tredennick, Los Gatos, California,
Vitesse,


Consultants and services not listed by Xilinx nor Altera (1)

• Algotronix, Edinburgh: Reconfigurable Computing and FPL in software radio, communications and computer security
• Andraka Consulting Group: high performance FPGA designs for DSP applications
• Arkham Technology, Pasadena: low cost IP cores for Xilinx, embedded processor, DSP, wireless communication, COM / CORBA / DirectX, client-server database programming, software internationalization, PCB design
• Barco Silex, Louvain-la-Neuve, Belgium: IP integration boards for ASIC and FPGA, consultancy, design, sub-contracting

Consultants and services not listed by Xilinx nor Altera (2)

• Bottom Line Technologies, Milford, New Jersey: FPGA design, training, designing Xilinx parts since 1985
• Codelogic, Helderberg, South Africa: consulting, FPGA design services
• Coelacanth Engineering, Norwell, Massachusetts: design services, test development services, in wireless communication, DSP-based instrumentation, mixed-signal ATE
• Comit Systems, Inc., Santa Clara, California: DSP, ASIC, networking, embedded control in avionics; FPGA / ASIC design and system software
• EDTN Programmable Logic Design Center


Consultants and services not listed by Xilinx nor Altera (3)

• FirstPass, Castle Rock, Colorado: Vitesse, ASIC design
• Flexibilis, Tampere, Finland: VHDL IP cores for Xilinx products
• Geoff Bostock Designs, Wiltshire, England: FPGA design services
• Great River Technology, Albuquerque, New Mexico: FPGA design services in digital video and point-to-point data transmission for aerospace, military, and commercial broadcasters
• New Horizons GB Ltd, United Kingdom: FPGA design and training, Xilinx specialist
• North West Logic: FPGA and embedded processor design in digital communications, digital video

Consultants and services not listed by Xilinx nor Altera (4)

• Silicon System Solutions, Canterbury, Australia: VHDL IP cores for the ASIC and FPGA/CPLD/EPLD markets
• Smartech, Tampere, Finland: ASIC and FPGA design
• Tekmosv, Austin, Texas: Multiple Designs on a Single Gate Array, HDL synthesis, design conversions, chip debug, test generation
• The Rockland Group, Garden Valley, California: a TeleConsulting organization about logic design for FPGAs
• Nick Tredennick, Los Gatos, California: investor and consultant


>> Terminology

• Configware Industry
• Terminology
• MoPL data-procedural language
• Anti architecture and circuitry
• Stream-based Memory Architecture

http://www.uni-kl.de

Terminology

Paradigm / Platform / Programming source:
• “von Neumann”: Hardware; Software
• Soft Machine (w. soft datapaths): coarse grain Flexware; high level Configware
• RL (FPGA etc.): fine grain Flexware; netlist level Configware


Terminology & Acronyms

• RC: reconfigurable computing
• RL: reconfigurable logic
• DPU: datapath unit
• DPA: datapath array
• rDPU: reconfigurable DPU
• rDPA: reconfigurable DPA
• stream-based computing: using complex pipe network (super-systolic: Kress et al.)

Stream-based Computing (2)

terms:
• Software (SW): procedural sources*
• Configware (CW): structural sources
• Hardware (HW): hardwired platforms
• ASIC: customizable hardwired platforms
• Flexware (FW): reconfigurable platforms
• FPGA: field-programmable gate array
• FPL: field-programmable logic

*) note: firmware is SW !


Confusing Terminology

Computer Science and EE as well as their R&D and application areas suffer from a Babylonian confusion.
Communication not only between Computer Science and EE, but also between their special areas, even between their different abstraction levels is made difficult, mainly because of immature terminology in relation to reconfigurable circuits and their applications.
Terms are rarely standardized and often used with drastically different meanings, even within the same special area.
Often terms have been so badly coined that they are not self-explanatory, but misleading. A demonstrative example is the comparison of terms used in VHDL and .
Ideal are "intuitive" terms. But often intuition yields the wrong idea. Whenever a new term appears in teaching, I often have to tell the students that the term does not mean what they believe.

[à la Ingo Kreuz] Terms (1)

Term: Meaning. Example
• Hardware: hardwired. Example: Processor, ASIC
• Flexware: Reconfigurable (structurally programmable). Example: FPLA, FPGA, KressArray
• Firmware: Microprogramme (rarely used after introduction of RISC proc.). Example: IBM 360 Computer Family
• Software: procedural programs (sequentially executable by a CPU). Example: Word, C, OS, Compiler, etc.
• Configware: structural programs, soft IP cores, personalizing CPLD, FPGA, or other Flexware. Example: for rDPA or FPGA configuration, e. g. as a logic circuit, state machine, datapath, function


[à la Ingo Kreuz] Terms (2)

Term: Meaning. Example
• data: objects of computing; the “data” property depends on the moment of watching. Example: bits, numbers, operands, results, any text (also compiler input), lists, graphs, tables, images, ...
• data stream: ordered, also parallel data word lists, obtained by scheduling. Example: I/O data streams for systolic or other arrays
• programming: personalisation by loading program code. Example: procedural code or structural code: for (re)configuration
• program: source text or object code for programming. Example: procedural or structural

[à la Ingo Kreuz] Terms (3)

Term: Meaning. Example
• boot program: simple program to enable programming, usually saved in non-volatile memory. Example: comparable to the starter of the motor of a car
• booting: load and execute a boot program


[à la Ingo Kreuz] Hardware Terms (1)

Term: Meaning. Example
• DPU: data path unit, processes operands; no CPU since without sequencer; no machine. Example: ALU with registers, multiplexers etc.
• Computer: CPU with RAM and interfaces
• Parallel Computer: ensemble of several Computers
• CPU: Instruction Set Processor ("von Neumann”): program counter (instruction sequencer) and DPU; mode of operation: deterministically instruction-driven. Example: ARM, Pentium core

[à la Ingo Kreuz] Hardware Terms (2)

Term: Meaning. Example
• machine: execution unit, driven by deterministic sequencer. Example: von Neumann machine
• „dataflow machine“: not a machine, since without a deterministic sequencer (exotic concept). Example: (sleeping research area)
• Xputer: deterministically data-driven (transport-triggered); data counter(s) used instead of a program counter. Example: MoM Machine architectures (Kaiserslautern)
• dataflow machine: indeterministically data-driven (execution sequence unpredictable). Example: (sleeping research area)


[à la Ingo Kreuz] Terms on Parallelism (1)

Term: Meaning. Example
• parallelism: several levels of parallelism distinguished. Example: parallel processes, parallelism at instruction set level, pipelines, etc.
• concurrent: parallel processes run on different CPUs of a parallel computer; may occasionally exchange signals or data. Example: weather forecasting, complex simulations, etc.
• ISP (instruction set parallelism): several CPUs run in parallel by clocked synchronization. Example: VLIW (very long instruction word) computer

[à la Ingo Kreuz] Terms on Parallelism (2)

Term: Meaning. Example
• pipelining: several uniform or different DPUs running simultaneously, connected to a pipeline by buffer registers. Example: pipelined CPUs, pipe networks, systolic, etc.
• chaining: several uniform or different DPUs running simultaneously, connected to a pipeline without buffer registers. Example: combinational networks, complex arithmetic operators
• Pipe network: ensemble of DPUs, also multiple pipelines, also with irregular or wild structures. Example: systolic arrays, stream-based computing arrays


[à la Ingo Kreuz] Terms on Parallelism (3)

Term: Meaning. Example
• Systolic Array: pipe network with only linear (straight-on, no branching), uniform pipelines (all DPUs hardwired and with same functionality). Example: matrix computation, DSP, DNA sequencing, etc.
• stream-based computing arrays (super-systolic arrays): pipe network, configured before fabrication. Example: image processing, DSP, complex functions and algorithms
• reconf. stream-based arrays: (coarse grain) stream-based arrays, configurable after fabrication. Example: KressArray

[à la Ingo Kreuz] Counterparts

category: property / counterpart
• programming mode: procedural (classical) / structural (synthesis, design): „field-programmable“, PLA „programming“, etc.
• machine, principle of operation: controlflow-driven (instruction-driven): v. Neumann / data-driven: Xputer machine
• system, principle of operation: instruction-flow-driven (parallel computer etc.) / data-stream-based (systolic array, DPU array, KressArray)
• set-up time: during run time (instruction-driven) / before run time (datapaths switched thru): FPGA (at compile time), Gate Array (at fabrication)


>> MoPL data-procedural language

• Configware Industry
• Terminology
• MoPL data-procedural language
• Anti architecture and circuitry
• Stream-based Memory Architecture

http://www.uni-kl.de

Fundamental Ideas available (1)

• Data Sequencer Methodology
• Data-procedural Languages (Duality with v N)
• ... supporting memory bandwidth optimization
• Soft Data Path Synthesis Algorithms
• Parallelizing Loop Transformation Methods
• Compilers supporting Soft Machines
• SW / CW Partitioning Co-Compilers


Fundamental Ideas available (2)

• Programming Xputers
• Similarities to programming computers
• How not to get confused by similarities
• What benefits vs. Computers ?

Programming Language Paradigms

language category: Computer Languages / Xputer Languages
• both deterministic: procedural sequencing: traceable, checkpointable
• operation sequence driven by: read next instruction, goto (instr. addr.), jump (to instr. addr.), instr. loop, loop nesting, no parallel loops, escapes, instruction stream branching / read next data item, goto (data addr.), jump (to data addr.), data loop, loop nesting, parallel loops, escapes, data stream branching
• state register: program counter / data counter(s)
• address computation: massive memory cycle overhead / overhead avoided
• instruction fetch: memory cycle overhead / overhead avoided
• parallel memory bank access: interleaving only / no restrictions


JPEG zigzag scan pattern (published in 1993)

    *> Declarations
    EastScan is
      step by [1,0]
    end EastScan;
    SouthScan is
      step by [0,1]
    endSouthScan;
    NorthEastScan is
      loop 8 times until [*,1]
        step by [1,-1]
      endloop
    end NorthEastScan;
    SouthWestScan is
      loop 8 times until [1,*]
        step by [-1,1]
      endloop
    end SouthWestScan;
    HalfZigZag is
      EastScan
      loop 3 times
        SouthWestScan
        SouthScan
        NorthEastScan
        EastScan
      endloop
    end HalfZigZag;

    goto PixMap[1,1]
    HalfZigZag;
    SouthWestScan
    uturn (HalfZigZag)

Figure labels: pixel map with axes x and y; scan segments numbered 1 to 4; HalfZigZag; data counter

Similar Programming Language Paradigms

language category: Computer Languages / Xputer Languages
• both deterministic: procedural sequencing: traceable, checkpointable
• sequencing driven by: read next instruction, goto (instruction addr.), jump (to instruction addr.), instruction loop, instruction loop nesting, no parallel loops, instruction loop escapes, instruction stream branching / read next data object, goto (data addr.), jump (to data addr.), data loop, data loop nesting, parallel data loops, data loop escapes, data stream branching
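The MoPL zigzag declarations can be traced with a small Python sketch. This is only an illustration under stated assumptions: the `Scanner` class and its method names are hypothetical, `until` is read as "stop once the named coordinate is reached", and `uturn` is read as replaying the recorded step vectors of HalfZigZag in reverse order. None of this is taken from a MoPL implementation.

```python
class Scanner:
    """Hypothetical data counter that walks a 1-based pixel map."""

    def __init__(self, x=1, y=1):
        self.x, self.y = x, y
        self.path = [(x, y)]   # visited data addresses
        self.log = []          # step vectors taken so far

    def step(self, dx, dy):
        self.x += dx
        self.y += dy
        self.path.append((self.x, self.y))
        self.log.append((dx, dy))

    def east_scan(self):
        self.step(1, 0)

    def south_scan(self):
        self.step(0, 1)

    def north_east_scan(self, limit=8):
        # loop 8 times until [*,1]: stop when y == 1
        for _ in range(limit):
            if self.y == 1:
                break
            self.step(1, -1)

    def south_west_scan(self, limit=8):
        # loop 8 times until [1,*]: stop when x == 1
        for _ in range(limit):
            if self.x == 1:
                break
            self.step(-1, 1)

    def half_zigzag(self):
        self.east_scan()
        for _ in range(3):
            self.south_west_scan()
            self.south_scan()
            self.north_east_scan()
            self.east_scan()

    def uturn(self, recorded_steps):
        # Assumption: replay the same step vectors in reverse order.
        for dx, dy in reversed(recorded_steps):
            self.step(dx, dy)


def jpeg_zigzag():
    s = Scanner(1, 1)          # goto PixMap[1,1]
    s.half_zigzag()            # HalfZigZag;
    half = list(s.log)         # steps of the first half
    s.south_west_scan()        # SouthWestScan
    s.uturn(half)              # uturn (HalfZigZag)
    return s.path


path = jpeg_zigzag()
```

Under these assumptions the data counter visits each address of an 8 by 8 pixel map once, without any instruction fetch deciding the next address.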


GAG = Generic Address Generator

GAU generic address unit Scheme

Figure labels: Limit Slider (L0), Address Stepper (DA), Base Slider (B0); GAU with address output A; address range [ | | | | ] from base B0 in steps of DA up to limit L; all 3 are copies of the same BSU stepper circuit

>> Anti architecture and circuitry

• Configware Industry
• Terminology
• MoPL data-procedural language
• Anti architecture and circuitry
• Stream-based Memory Architecture

http://www.uni-kl.de

GAG: Address Stepper

Figure labels: Limit Slider (L0), Address Stepper (DA), Base Slider (B0); GAU with address output A; stepper parameters Limit, Base, stepVector, maxStepCount, init tag; Step Counter; + / –; until limit; Escape Clause; End Detect; endExec

GAG = Generic Address Generator
BSU = Basic Stepper Unit

Generic Sequence Examples

Figure panels a) to g):
• atomic scan
• linear scan
• video scan
• -90º rotated video scan
• -45º rotated (mirx (v scan))
• sheared video scan
• non-rectangular video scan
• zigzag video scan
• spiral scan
• feed-back-driven scans
• perfect shuffle
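How a stepper with base, limit, step and maximum step count can produce some of these sequences is sketched below in Python. The function names and the exact stopping rule are assumptions made for illustration; the slides only name the parameters (Base, Limit, stepVector, maxStepCount), not their precise semantics.

```python
def stepper(base, limit, step, max_step_count=None):
    """Yield addresses from base towards limit (inclusive) in increments of step.

    Hypothetical model of one stepper: a step of 0 gives an atomic scan
    (a single address); max_step_count caps the number of addresses.
    """
    if step == 0:
        yield base
        return
    address, count = base, 0
    while (step > 0 and address <= limit) or (step < 0 and address >= limit):
        yield address
        address += step
        count += 1
        if max_step_count is not None and count >= max_step_count:
            break


def video_scan(x_base, x_limit, dx, y_base, y_limit, dy):
    """Nest two steppers: x runs in the inner loop, y in the outer loop."""
    for y in stepper(y_base, y_limit, dy):
        for x in stepper(x_base, x_limit, dx):
            yield x, y


def rotated_video_scan(x_base, x_limit, dx, y_base, y_limit, dy):
    """Swap the loop nesting: y runs in the inner loop (a rotated video scan)."""
    for x in stepper(x_base, x_limit, dx):
        for y in stepper(y_base, y_limit, dy):
            yield x, y


linear = list(stepper(0, 6, 2))
video = list(video_scan(0, 2, 1, 0, 1, 1))
rotated = list(rotated_video_scan(0, 2, 1, 0, 1, 1))
```

In this sketch a linear scan is a single stepper, and the video scan variants differ only in how two steppers are nested, which matches the idea that the same stepper circuit is reused for all parts of the address generator.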


Slider Animation Demo

Figure labels: GAU with Limit Slider (L0), Address Stepper (DA) and Base Slider (B0); address A; floor F, base B0, limit L0, ceiling C

GAG Complex Sequencer Implementation

Figure labels: VLIW stack; one GAU each for the x and y dimension, each with Limit Slider (L0), Address Stepper (DA) and Base Slider (B0); DL, DB; address outputs A; SDS; GAG

GAG: Generic Address Generator

all been published in 1990


>> Stream-based Memory Architecture

• Configware Industry
• Terminology
• MoPL data-procedural language
• Anti architecture and circuitry
• Stream-based Memory Architecture

http://www.uni-kl.de

MoM Xputer Architecture

Figure labels: Smart memory interface; Scan Window „Cache“; rDPA; Multiple RAM banks


Antimachine: MoM architecture

Figure labels: Handle Position Generator (y-GAG, x-GAG); Scan Window Generator; scan window; bank 0, 1, ..., n

Linear Filter Application

Figure b) labels: scan window example with r/w and r accesses; handle position; intra scan window accesses (low level sequencing); scan pattern memory accesses (high level sequencing); handle positions; scan step; rows alternating between Bank a and Bank b; axes x and y


Scanline unrolling

Figure labels: rows of r, w and r/w accesses alternating between Bank a and Bank b

90° Rotation of Scan Pattern

Figure labels: scan window; overlap area; rows of r, w and r/w accesses alternating between Bank a and Bank b


Linear Filter Application

Parallelized Merged Buffer Linear Filter Application with example image of x=22 by y=11 pixel

Figure panels:
• initial design
• hardw. level access optim.
• after scan line unrolling
• after inner scan line loop unrolling
• final design

XMDS Scan Pattern Editor GUI
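The interplay of scan pattern (high level sequencing) and intra scan window accesses (low level sequencing) in the linear filter example can be illustrated with a short Python sketch. The 3 by 3 window size, the averaging weights, the test image contents and all function names are assumptions chosen for illustration; only the 22 by 11 pixel image size comes from the slide.

```python
def video_scan(width, height, win):
    """High level sequencing: yield handle positions (top-left corner)
    of a win x win scan window in video-scan order."""
    for y in range(height - win + 1):
        for x in range(width - win + 1):
            yield x, y


def linear_filter(image, weights):
    """Apply a linear filter by moving a scan window over the image."""
    win = len(weights)
    height, width = len(image), len(image[0])
    out = [[0.0] * (width - win + 1) for _ in range(height - win + 1)]
    for x, y in video_scan(width, height, win):
        acc = 0.0
        # Low level sequencing: accesses inside the scan window.
        for j in range(win):
            for i in range(win):
                acc += weights[j][i] * image[y + j][x + i]
        out[y][x] = acc
    return out


# Example image of x=22 by y=11 pixels with arbitrary contents.
image = [[(x + y) % 4 for x in range(22)] for y in range(11)]
weights = [[1 / 9] * 3 for _ in range(3)]   # assumed averaging filter
filtered = linear_filter(image, weights)
```

In this sequential sketch every window access is a separate memory read; the slides' unrolling steps aim at serving several of these reads in parallel from different memory banks.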


MoM Architecture Features

• Scan Cache Size adjustable at run time
• Any other shape than square supported
• 2-dimensional memory space
• Supports generic „scan patterns“
– Subject of parallel access transformations
– compare Francky Catthoor et al.
• Supports visualization

Hot Research Topic: Memory Architectures

• High Performance Embedded Memory Architectures [Catthoor et al.]
• High Performance Memory Communication Architectures [Herz]
• Custom Memory Management Methodology [Catthoor et al.]
• Data Reuse Transformations [Kougia et al.]
• Data Reuse Exploration [Soudris, Wuytack]
• Rapidly growing market: IP cores, module generators etc.


Processor Memory Performance Gap

Figure: performance (logarithmic axis, 1 to 1000) over the years 1980 to 2000; µProc (CPU) 60%/yr.; DRAM 7%/yr.; Processor-Memory Performance Gap (grows 50% / year)

rDPAs: classical cache does not help

• Stream-based arrays are a memory bandwidth problem
• the memory bandwidth problem is often more dramatic than for microprocessors
• classical caches do not help, since instruction sequencing is not used
• super pipe networks, no parallel computers !
• classical interleaving is not practicable, since based on sequential instruction streams
• the problem: throughput of parallel data streams, not instruction streams
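The "grows 50% / year" figure follows from the two growth rates on the chart. A short Python check, with function names chosen here for illustration:

```python
def annual_gap_growth(cpu_rate=0.60, dram_rate=0.07):
    """Relative yearly growth of the processor-memory performance gap."""
    return (1 + cpu_rate) / (1 + dram_rate) - 1


def gap_after(years, cpu_rate=0.60, dram_rate=0.07):
    """Gap factor after the given number of years, starting from 1."""
    return ((1 + cpu_rate) / (1 + dram_rate)) ** years


growth = annual_gap_growth()      # roughly 0.495, i.e. about 50% per year
gap_1980_to_2000 = gap_after(20)  # compounded over the chart's time span
```

The ratio 1.60 / 1.07 is about 1.495, which is where the roughly 50% per year comes from; compounding it over two decades gives a gap of more than three orders of magnitude.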


Data-Stream-based Soft Anti Machine

Figure labels: Compiler; “instructions”; Scheduler; Memory (data memory) with memory banks; Sequencers (data stream generator); rDPA

The Disk Farm? or a System On a Card? [Gordon Bell, Jim Gray, ISCA2000]

• The 500GB disc card: LOTS of bandwidth (14")
• A few disks replaced by >10s Gbytes RAM and a processor
• MicroDrive: 1.7” x 1.4” x 0.2”
– 1999: 340 MB, 5400 RPM, 5 MB/s, 15 ms seek
– 2006: 9 GB, 50 MB/s ? (1.6X/yr capacity, 1.4X/yr BW)
• Integrated IRAM processor
– 2x height
• Connected via crossbar switch
– growing like Moore’s law
• 16 Mbytes; 1.6 Gflops; 6.4 Gops
• 10,000+ nodes in one rack!
• 100/board = 1 TB; 0.16 Tflops
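The 2006 MicroDrive projection can be reproduced from the 1999 figures and the quoted yearly factors. A minimal Python check, where the `project` helper is a name introduced here for illustration:

```python
def project(value, factor_per_year, years):
    """Compound a value by a constant yearly growth factor."""
    return value * factor_per_year ** years


years = 2006 - 1999
capacity_2006_mb = project(340, 1.6, years)    # from 340 MB at 1.6X/yr
bandwidth_2006_mb_s = project(5, 1.4, years)   # from 5 MB/s at 1.4X/yr
```

Seven years at 1.6X per year turn 340 MB into roughly 9 GB, and seven years at 1.4X per year turn 5 MB/s into roughly 50 MB/s, consistent with the slide's question-marked estimate.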


MoM Application Examples

• Image Processing
• Grid-based design rule check [1983*]
– 4 by 4 word scan cache
– Pattern-matching based
– Our own nMOS „DPLA“ design
– design rule violation pixel map automatically generated from textual design rules
– 256 M&C nMOS, 800 single metal CMOS
– Speed-up > 10000 vs. Motorola 68000

*) „machine“ not yet discovered

Schedule

time slot
08.30 – 10.00 Reconfigurable Computing (RC)
10.00 – 10.30 coffee break
10.30 – 12.00 Stream-based Computing for RC
12.00 – 14.00 lunch break
14.00 – 15.30 Resources for RC
15.30 – 16.00 coffee break
16.00 – 17.30 FPGAs: recent developments


>>> Coarse Grain

Schedule

time slot
08.30 – 10.00 Reconfigurable Computing (RC)
10.00 – 10.30 coffee break
10.30 – 12.00 Stream-based Computing for RC - END -
12.00 – 14.00 lunch break
14.00 – 15.30 Resources for RC
15.30 – 16.00 coffee break
16.00 – 17.30 FPGAs: recent developments


Synthesizable Memory Communication

An example by Nageldinger’s KressArray Xplorer

Figure legend: Optimized sequencers; Parallel memory ports; Memory Controller; application; not used

http://kressarray.de

Memory Communication Architecture

Efficient Memory Communication should be directly supported by the Mapper Tools
• hot research topic in embedded systems
• storage context transformations [Herz, others]
• for low power
• for high performance
• startups provide memory IP or generators
