Reiner Hartenstein, TU Kaiserslautern, Germany [email protected] http://hartenstein.de 13 September 2012

TU Kaiserslautern Preface (1)

The ubiquitous everlasting crisis of computing

Reiner Hartenstein Stonewalled Progress The dominant reductionist tunnel vision mind set SDPS fellow, IEEE fellow, FPL fellow of Computing Efficiency

http://hartenstein.de Why we must Reinvent Computing [email protected] (keynote)

http://xputer.de/UnB/Reiner-1SBCCI -12.pdf © 2012, [email protected] 2 http://xputer.de/UnB/Reiner-http://hartenstein.deSBCCI-12.pdf

FPL2013 at Porto, Portugal September 2 – 4, 2013 TU Kaiserslautern Tunnel Vision …. TU Kaiserslautern Preface (2) See here: http://fpl.org/FPL2013-PreliminaryCFP.pdf

23rd International Conference On Field Programmable Logic And Applications The ubiquitous everlasting crisis of computing

A dominant reductionist tunnel vision mind set

PRELIMINARY CALL FOR PAPERS kind of „anti tunnel General Chair: João M. P. Cardoso, University of Porto, Portugal vision Program Co-Chairs: Katherine (Compton) Morrow, University of Wisconsin-Madison, USA society“ Pedro Diniz, INESC-ID, Portugal http://www.sdpsnet.org

© 2012, [email protected] 3 http://xputer.de/UnB/Reiner-http://hartenstein.deSBCCI-12.pdf © 2012, [email protected] 4 http://xputer.de/UnB/Reiner-http://hartenstein.deSBCCI-12.pdf

TU Kaiserslautern Preface (3) TU Kaiserslautern Outline

The ubiquitous everlasting crisis of computing The ubiquitous everlasting Computing Crisis The von Neumann Syndrome and its history A dominant reductionist tunnel vision mind set Datastream-based Computing is not new Paradigm shifts leading into the wrong direction The sustained Language Selection problem Stonewalls preventing very urgent improvements We must Reinvent Programmer Education What we urgently have to change Conclusions

© 2012, [email protected] 5 http://xputer.de/UnB/Reiner-http://hartenstein.deSBCCI-12.pdf © 2012, [email protected] 6 http://xputer.de/UnB/Reiner-http://hartenstein.deSBCCI-12.pdf

Keynote Chip in Brasilia - SBCCI 2012, 25th Symposium on Integrated Circuits and System Design, August 30 – September 2, 2012, Brasilia, Brasil 1 Reiner Hartenstein, TU Kaiserslautern, Germany [email protected] http://hartenstein.de 13 September 2012

The Productivity Paradox Banking with computers? TU Kaiserslautern TU Kaiserslautern (an example)

Robert Solow Nobel Prize 1987 From early 1970s to early 1990s a massive slow-down in economic growth

The Productivity Paradox of IT: “discrepancy between IT investment and output”…

… at company level, national economy, and worldwide

Literature. Robert Solow: We'd better watch out; NYT Book Review, July 12, 1987, Brynjolfsson, Erik: The productivity paradox of information technology; Comm. ACM 36 (12), 1993 Jack E. Triplett: The Solow productivity paradox; Can. J. of Economics, 32,2, 1999 http://xputer.de/CJE/ took 10 minutes

© 2012, [email protected] 7 http://xputer.de/UnB/Reiner-http://hartenstein.deSBCCI-12.pdf © 2012, [email protected] 8 http://hartenstein.de

45 years Software Crisis So many Experts ??? TU Kaiserslautern TU Kaiserslautern F. L. Bauer 1968: All software developers know that projects „Software Crisis“ typically run over budget and behind schedule.

>10% projects abandoned >30% not successful >50% late or over budget

Before multi-core crisis: rapdily growing clock speed: Wirth‘s Law “software is slowing faster [Niklaus Wirth] than hardware is accelerating“ … much more SE conferences ….

© 2012, [email protected] 9 http://xputer.de/UnB/Reiner-http://hartenstein.deSBCCI-12.pdf © 2012, [email protected] 10 http://xputer.de/UnB/Reiner-http://hartenstein.deSBCCI-12.pdf

„Software“ stands for extremely memory-cycle-hungry instruction streams Scientific Revolutions TU Kaiserslautern TU Kaiserslautern Science does not progress continuously - about the uglyness of this term: Thomas S. Kuhn 1960: It proceeds as a series of The Structure of revolutionary upheavals Patterson’s Law: Nathan’s Law: Scientific Revolutions bandwidth gap grows 50% / year Software is a gas. It expands http://xputer.de/Kuhn/ Shortcomings in an established Dave - has reached >1000x to fill all its containers ... Patterson paradigm produce a crisis … until being limited by Moore’s Law that may lead to a revolution to Nathan [& Kryder’s Law] “The Memory Wall” Myhrvold replace the established paradigm coined by Sally McKee (& co-author) . But sometimes an enduring massive “The von Neumann overhead piles up to code sizes crisis does not lead to a revolution C.V. Syndrome” Ramamoorthy, of astronomic dimensions However, some „revolutions“ may even UC Berkeley introduce much worse shortcomings MLOC = Million Windows Vista: > 50 MLOC SAP NetWeaver : >250 MLOC Lines Of Code An example: Marvin Minsky 1969 © 2012, [email protected] 11 http://xputer.de/UnB/Reiner-http://hartenstein.deSBCCI-12.pdf © 2012, [email protected] 12 http://xputer.de/UnB/Reiner-http://hartenstein.deSBCCI-12.pdf

Keynote Chip in Brasilia - SBCCI 2012, 25th Symposium on Integrated Circuits and System Design, August 30 – September 2, 2012, Brasilia, Brasil 2 Reiner Hartenstein, TU Kaiserslautern, Germany [email protected] http://hartenstein.de 13 September 2012

A long time no Revolution The Catastrophe Gap TU Kaiserslautern TU Kaiserslautern [Harold „Bud“ Lawson]

Unsolved Multicore, HPC and FPGA Programming dilemma unnecessary complexity inside Decades of tunnel vision – e.g. Software Engineering*: *) Ernst Denert, “from fashion to fashion, but no real improvements” founder sd&m complexity Fundamental misconceptions in the theory of algorithmic complexity block further progress**. We have to rethink the basic assumptions**.

** catastrophe gap We cannot afford further dominance of reductionism ) James Larus

We must radically reinvent computing and its education framework year

© 2012, [email protected] 13 http://xputer.de/UnB/Reiner-http://hartenstein.deSBCCI-12.pdf © 2012, [email protected] 14 http://xputer.de/UnB/Reiner-http://hartenstein.deSBCCI-12.pdf

How Societies Chose Shut down Data Stations? TU Kaiserslautern to Fail or Succeed TU Kaiserslautern computers in this context: COMPUTER, August 2008 G. Fettweis, E. Zimmermann: Power consumption by internet: ICT Energy Consumption - Trends and Challenges; WPMC'08, x30 til 2030 if trends continue Lapland, Finland, 8 –11 Sep 2008

Will our IT culture partly destroy itself ?

or our economy? e. g. Flash Crash

© 2012, [email protected] 15 http://xputer.de/UnB/Reiner-http://hartenstein.deSBCCI-12.pdf © 2012, [email protected] 16 © Newhttp://hartenstein.de York Times

Affordability of Computing TU Kaiserslautern TU Kaiserslautern Outline [datacenterknowledge.com] Google causing 2% electricity consumption worldwide ? The ubiquitous everlasting Computing Crisis The von Neumann Syndrome and its history Datastream-based Computing is not new The sustained Language Selection problem „The possibility of computer equipment We must Reinvent Programmer Education power consumption spiraling out of control could have serious consequences PELAMIS wave power Conclusions for the overall affordability of computing”

Google sells electricity 17 [L. A. Barroso, Google] © 2012, [email protected] http://hartenstein.de © 2012, [email protected] 18 http://xputer.de/UnB/Reiner-http://hartenstein.deSBCCI-12.pdf

Keynote Chip in Brasilia - SBCCI 2012, 25th Symposium on Integrated Circuits and System Design, August 30 – September 2, 2012, Brasilia, Brasil 3 Reiner Hartenstein, TU Kaiserslautern, Germany [email protected] http://hartenstein.de 13 September 2012

The History of Computing The History of Computing TU Kaiserslautern (1) TU Kaiserslautern Prototype 1884: Herman Hollerith (2) the first reconfigurable computer (picture from earlier talks) datastream-based ! The 1st electrical computer, ready prototyped for mass production ?

Guess: which year, which company ? non-volatile !! The LUT (lookup table)

first FPGA 100 years later © 2012, [email protected] 19 http://xputer.de/UnB/Reiner-http://hartenstein.deSBCCI-12.pdf © 2012, [email protected] 20 http://hartenstein.de

Punched Card Data Memory „Dataflow“ Machine TU Kaiserslautern … state of the art ….. TU Kaiserslautern Bizarre architecture motivated by loving dataflow graphs Not yet invented in 1884: arbiter-driven queue machine: fully indeterministic operation • magnetic tape (1898), principles (no program counter)

• the vacuum tube (1904), despite of highest ranking US universities‘ support efforts • magnetic drum (1932), the scene is dead, meanwhile

• the transistor (1934), Data-Flow* DFM: Execution Models new for Extreme Scale • ferrite core memory (1949), Computing workshop (DFM 2012) series • hard disc (1956). *) bad name: it‘s on datastream computing http://www.cs.ucy.ac.cy/dfmworkshop/ 22 © 2012, [email protected] 21 http://xputer.de/UnB/Reiner-http://hartenstein.deSBCCI-12.pdf http://arch© 2012, [email protected]/conferences/83 http://xputer.de/UnB/Reiner-http://hartenstein.deSBCCI-12.pdf

the ENIAC Electronic Numerical Another Bizarre „Programming“ the ENIAC Integrator and Computer TU Kaiserslautern „Paradigm Shift TU Kaiserslautern 167 m2 30 tons, 178 kW no datastream computer 17468 vacuum tubes no stored-program computer

data input

© 2012, [email protected] 23 http://hartenstein.de © 2012, [email protected] 24 http://hartenstein.de

Keynote Chip in Brasilia - SBCCI 2012, 25th Symposium on Integrated Circuits and System Design, August 30 – September 2, 2012, Brasilia, Brasil 4 Reiner Hartenstein, TU Kaiserslautern, Germany [email protected] http://hartenstein.de 13 September 2012

really a von Multiple von Neumann Syndrome EDSAC Neumann TU Kaiserslautern computer TU Kaiserslautern

The first von-Neumann stored program* computer was Even hardware design went von Neumann completed 1949 by Maurice Wilkes at Cambridge, the EDSAC 1 (Electronic Delay Storage Automatic Calculator).. EDSAC 2, 1958: the first microprogrammable computer

Still massive MTBF problems Microprogramming: nested von Neumann machines: Eccles-Jordan requires 2 vacuum tubes per bit ! instruction streams + microinstruction streams

multiple von Neumann Syndrome

punched tape input and teleprinter. output *) 1 kBit mercury delay line memory Brief History of Microprogramming: http://cs.clemson.edu/~mark/uprog.html

© 2012, [email protected] 25 http://hartenstein.de © 2012, [email protected] 26 http://hartenstein.de

von Neumann Syndrome TU Kaiserslautern 27 TU Kaiserslautern von Neumann principles are fundamentally wrong, since data processing targets data Von Neumann streams – not instruction streams. [R. H.] Syndrome book

Many celebrities criticized the von Neumann paradigm: A Book by: 1997: Turing is irrelvant. Robert N. Charette 2005: Why Software Fails; David Parnas Lambert M. Surhone, N. N. 1995: THE STANDISH GROUP REPORT F. L. Bauer 1968: coined “Software Crisis” Mariam T. Tennoe, Nathan Myhrvold („Nathan’s Law”): Software is a gas ….. Peter G. Neumann Susan F. Hennessow (ed.): 1985-2003: 216x “Inside Risks“ G. Koch & R.H. 1975: The universal Bus considered harmful (18 years inside back cover C_ACM) Von Neumann Syndrome; Edsgar J. Dijkstra 1968: The Goto considered harmful Brad Cox 1990: Planning ßetascript publishing 2011 the Software Industrial Revolution Arvind et al., 1983: A critique of Multiprocessing the von Neumann Style L. Savain 2006: J.© 2012, Backus, [email protected] 1978: Can programming be liberated from the von Neumann style? Why http://hartenstein.deSoftware is bad … http://xputer.de© 2012, [email protected]/UnB/ Reiner-SBCCI-12.pdf 28 http://hartenstein.de

VAX Complexity Problem Some Symptoms (my personal experience) TU Kaiserslautern TU Kaiserslautern

„mini“computers: VAX-11/750: punched card VAX-11/750 input was by orders of magnitude faster than by terminals quasi-standard ~ 1980

(my experiences:) NATO ASI on VLSI at SOGESTA, Urbino, Italy, 1981

UC Berkeley

CS dept Kaiserslautern

my Xputer lab at Kaiserslautern during E.I.S. project

NATO ASI, Urbino ‚81

30 © 2012, [email protected] 29 http://hartenstein.de © 2012, [email protected] 30 http://hartenstein.de

Keynote Chip in Brasilia - SBCCI 2012, 25th Symposium on Integrated Circuits and System Design, August 30 – September 2, 2012, Brasilia, Brasil 5 Reiner Hartenstein, TU Kaiserslautern, Germany [email protected] http://hartenstein.de 13 September 2012

That‘s why it is so slow TU Kaiserslautern TU Kaiserslautern Outline IEEE 7th ISCA, La Baule, The ubiquitous everlasting Computing Crisis France, May 6-8, 1980 The von Neumann Syndrome and its history Datastream-based Computing is not new By rental car back from  from systolic array to anti-Machine paradigm La Baule to the airport  over to a Copernican paradigm world model The sustained Language Selection problem Mario Barbacci: We must Reinvent Programmer Education „VAX? That‘s why Conclusions it is so slow“

© 2012, [email protected] 31 http://hartenstein.de © 2012, [email protected] 32http://xputer.de /UnB/Reiner-http://hartenstein.deSBCCI-12.pdf

Systolic Arrays What Synthesis Method? (1) TU Kaiserslautern TU Kaiserslautern of course algebraic !! IEEE 7th ISCA, La Baule, (picture from x earlier talks) x x France, May 6-8, 1980 x

x - x

-

x - x

x M. J. Foster and H. T. Kung: x x x - - - x x x x x x - - - - - x x x

The Design of Special- x x x ------x x x

-

-

-

-

-

Purpose VLSI Chips ... -

-

-

-

-

-

x - x x … invented something ? x x x x x

You should recognize, that … creativecan x

© creativecan

[Karl Steinbuch] http://xputer.de/Steinbuch/ © Why not a general purpose methodology ? Mathematicians (H. T. Kung et al.): favorite examples: matrix computations

© 2012, [email protected] 33http://xputer.de /UnB/Reiner-http://hartenstein.deSBCCI-12.pdf © 2012, [email protected] 34http://xputer.de /UnB/Reiner-http://hartenstein.deSBCCI-12.pdf

What Synthesis Method? (2) Introducing Data stream definition time TU Kaiserslautern TU Kaiserslautern H. T. Kung‘s Definiton : algebraic means linear projection! x „which data item x x (picture from from or to which port input data x x x DPA earlier talks) at which time step“ only for applications with streams x x | x strictly regular data dependencies | | port # time time x x x - - - x x x *) KressArray [ASP-DAC-1995] x x x - - - - - x x x http://kressarray.de/ 1995: x x x ------x x x port # | | | port # | | | output data | | | streams x | | x x | * x x x Rainer Kress replaced it by simulated annealing : x x systolic array research: x throughout the 80ies: supports also any irregular & wild form pipe networks time Mathematicians‘ hobby

© 2012, [email protected] 35http://xputer.de /UnB/Reiner-http://hartenstein.deSBCCI-12.pdf © 2012, [email protected] 36http://xputer.de /UnB/Reiner-http://hartenstein.deSBCCI-12.pdf

Keynote Chip in Brasilia - SBCCI 2012, 25th Symposium on Integrated Circuits and System Design, August 30 – September 2, 2012, Brasilia, Brasil 6 Reiner Hartenstein, TU Kaiserslautern, Germany [email protected] http://hartenstein.de 13 September 2012

Mathematicians: “It’s not our job” Who generates TU Kaiserslautern TU Kaiserslautern the data streams? (picture from earlier talks) http://xputer.de/ http://data-streams.org/ x x x x

x - x

-

x - x

x x x x - - - x x x x x x - - - - - x x x

x x x ------x x x

-

-

-

-

-

-

-

-

-

-

-

x - x x x x x x x x http://xputer.de/Steinbuch/ without a sequencer: missed to define the machine paradigm

© 2012, [email protected] *) or receives 37http://xputer.de /UnB/Reiner-http://hartenstein.deSBCCI-12.pdf © 2012, [email protected] 38http://xputer.de /UnB/Reiner-http://hartenstein.deSBCCI-12.pdf

http://xputer.de/GAG/ JPEG zigzag goto PixMap[1,1] The Datastream Machine ** HalfZigZag; scan pattern TU Kaiserslautern (aM = anti-Machine ) TUSouthWestScan Kaiserslautern is loop 8 times until [1,*] SouthWestScan **) data counters: no step by [-1,1] uturn (HalfZigZag) Flowware language example (MoPL) : Auto-Sequencing Memory*** endloop asM program counter end SouthWestScan; published at FPL 1994 with reconfigurable address FPGA.or ASIC SouthScan is (e.g. GAG*) step by [0,1] generator endSouthScan; x asM y NorthEastScan is HalfZigZag asM designed or loop 8 times until [*,1] data counter data counter minimum # step by [1,-1] asM compiled e. g. endloop of asMs end NorthEastScan; asM datapath by loop to pipe application- network EastScan is asM transforms step by [1,0] (picture from dependent end EastScan; earlier talks) HalfZigZag is EastScan asM loop 3 times programmed SouthWestScan SouthScan

***) … asM-based machine *) GAG & enabling technology

NorthEastScan HalfZigZag by Flowware paradigm… R.H. et al.: published 1989, survey: EastScan data counter data counter HICSS-24, Koloa, Hawaii, 1991, [M. Herz et al.: IEEE endloop & IEEE-COMPEURO, Hamburg, 1989 ICECS 2003, Dubrovnik] end HalfZigZag; 40 © 2012, [email protected] http://hartenstein.de © 2012, [email protected] http://xputer.de/UnB/zigzag.ppsxhttp://hartenstein.de 39 See here:

CPU-centric flat world TU Kaiserslautern Outline TU Kaiserslautern (Aristotelian model) The ubiquitous everlasting Computing Crisis The von Neumann Syndrome and its history Datastream-based Computing is not new from systolic array to anti Machine paradigm to a Copernican multi paradigm world model obsolete The sustained Language Selection problem We must Reinvent Programmer Education Conclusions CPU 41

© 2012, [email protected] See here: http://xputer.de/UnB/zigzag.ppsxhttp://hartenstein.de © 2012, [email protected] http://xputer.de/UnB/Reiner-http://hartenstein.deSBCCI-12.pdf

Keynote Chip in Brasilia - SBCCI 2012, 25th Symposium on Integrated Circuits and System Design, August 30 – September 2, 2012, Brasilia, Brasil 7 Reiner Hartenstein, TU Kaiserslautern, Germany [email protected] http://hartenstein.de 13 September 2012

Duality of procedural Languages A Clean Terminology, please TU Kaiserslautern TU Kaiserslautern program counter: data counter(s): Software Languages Flowware Languages „program“ source compilation result domain read next instruction read next data item Software instruction streams goto (instruction address) goto (data address) time Flowware data streams jump to (instruction address) jump to (data address) datapath structures instruction loop data loop Configware configured (or designed) space instruction loop nesting data loop nesting instruction loop escape data loop escape The paradigm difference: instruction stream branching data stream branching vN: moves the data to the operator at run time von Neumann (vN) versus no: internally parallel loops yes: internally parallel loops aM: moves the operator at compile time (or set- anti Machine (aM) more simple: up time) into the prepared path of the data But there is an Asymmetry no ALU tasks 43 * © 2012, [email protected] See here: http://xputer.de/UnB/zigzag.ppsxhttp://hartenstein.de © 2012, [email protected] (picture from earlier talks) 44 ) published at FPL 1994 http://hartenstein.de

The Triple Paradigm Model Twin paradigm mind set:

TU Kaiserslautern TU Kaiserslautern very old hat - but ignored PE instead of SE : future time to space mapping: procedural to structural Programmer Education FE token bit Flowware evoke Engineering FF

SE PE Software Program Engineering Engineering FF FF 1967: W. A. Clark: Macromodular Computer onfigware Systems; 1967 SJCC, AFIPS Conf. Proc. structures C PE: the Generalization CE Engineering of Software Engineering C. G. Bell et al: The Description and Use of Register- Transfer Modules (RTM's); IEEE Trans-C21/5, May 1972 © 2012, [email protected] 45http://xputer.de /UnB/Reiner-http://hartenstein.deSBCCI-12.pdf © 2012, [email protected] 46 http://hartenstein.de

47 106 Speed-up Factors FPGA: much worse Technology? by Software to FPGA migration TU Kaiserslautern TU Kaiserslautern Image processing,

Power save ~10% Factor - Pattern matching, 28500 Speed-ups by Orders of Magnitude with an

PISA project [ICCAD Multimedia 3439 DSP and DES Orders of Magnitude worse Technology? >15000 1984] real-time breaking face detection wireless 1116*

DNA Tarek Reed-Solomon seq. 8723 2400 DPLA replacing Speedup 6000 Decoding 256 FPGAs’1984 3000 • Lower clock speed video-rate MAC crypto CT imaging (E.I.S. project) 103 stereo vision 1000 pattern 730 1000 recognition 900 Viterbi Decoding 400 • Wiring overhead 288 Smith-Waterman SPIHT wavelet-based pattern matching Speed-up image compression 457 100 FFT factors are 52 BLAST molecular • Reconfigurability overhead 88 dynamics not new protein simulation 40 identification Bioinformatics (avoiding the von • Routing congestion *)DES br. equipment size Neumann syndrome) 20 Astrophysics 100 GRAPE The Paradox

(version© 2012, [email protected] a picture from earlier talks) [Tarek El-Ghazawi et al.: IEEE COMPUTER,http://hartenstein.de Febr. 2008] © 2012, [email protected] 48http://xputer.de /UnB/Reiner-http://hartenstein.deSBCCI-12.pdf

Keynote Chip in Brasilia - SBCCI 2012, 25th Symposium on Integrated Circuits and System Design, August 30 – September 2, 2012, Brasilia, Brasil 8

Reiner Hartenstein, TU Kaiserslautern, Germany [email protected] http://hartenstein.de 13 September 2012

TU Kaiserslautern Outline TU Kaiserslautern

The ubiquitous everlasting Computing Crisis The von Neumann Syndrome and its history Lemming-like Datastream-based Computing is not new Programming Languages The sustained Language Selection problem Wiki: 659 languages • Programming Languages old lists: >3800 • Hardware Description Languages We must Reinvent Programming Conclusions

© 2012, [email protected] 49http://xputer.de /UnB/Reiner-http://hartenstein.deSBCCI-12.pdf © 2012, [email protected] 50 http://hartenstein.de

The wrong Direction: We need „une' Levée en Masses“ TU Kaiserslautern Lobby or Herd Instinct? TU Kaiserslautern The transition from machine or, a C mafia ? level to higher level languages led to the biggest productivity gain ever made [Fred Brooks] WeWe wouldwould havehave neededneeded It‘s alarming that today‘s mega-bytes of code are a mass movement to compiled from languages at low a mass movement to abstraction levels (C, C++,Java) fightfight thisthis disasterdisaster Java is a religion – not a language [Yale Patt] © 2012, [email protected] 51 http://hartenstein.de © 2012, [email protected] 52 http://hartenstein.de 52

Language designer‘s “Programming” Domains (1) TU Kaiserslautern TU Kaiserslautern tunnel vision Personalization Programming Communication Platform ( “Programs” ) by Domain Paths Setup Time

Teaching to the students: Hardware CAD Space Fabrication Time Absurdely incomprehensible abstractions are the problem in „standard“ languages Hardware pipe network CAD Time and Space Fabrication Time

control-procedural Software Time Run Time Programming Languages: Debugging: uncertifyable, i. e. an art – data-procedural Flowware Time Run Time no science provably unprovable Flexware Configware Space Compile Time

Concurrency models should operate at Flexware pipe network Configware Time and Space Compile Time component architecture level rather Configware / Soft- Compile Time than programming languages. [E. A. Lee] Embedded Flexware ware Co-Compilation Time and Space and Run Time CAD / Configware / Fabrication time / modern embedded FPGA Time and Space 53 [E. A. Lee: Are new languages necessary for multicore? 2007] Flowware / Software compile time / run time [E. A. Lee. The problem with threads. Computer, 2006.] © 2012, [email protected] http://hartenstein.de © 2012, [email protected] 54 http://hartenstein.de

Keynote Chip in Brasilia - SBCCI 2012, 25th Symposium on Integrated Circuits and System Design, August 30 – September 2, 2012, Brasilia, Brasil 9 Reiner Hartenstein, TU Kaiserslautern, Germany [email protected] http://hartenstein.de 13 September 2012

“Programming” Domains (2) “Programming” Domains (3) We urgently must bridge the paradigm gaps TU Kaiserslautern traditional Software Engineering tunnel vision TU Kaiserslautern Personalization Programming Communication Personalization Programming Communication Platform ( “Programs” ) by Domain Paths Setup Time Platform ( “Programs” ) by Domain Paths Setup Time

Hardware CAD Space Fabrication Time Hardware CAD Space Fabrication Time

Hardware pipe network CAD Time and Space Fabrication Time Hardware pipe network CAD Time and Space Fabrication Time

control-procedural Software Time Run Time control-procedural Software Time Run Time

data-procedural Flowware Time Run Time data-procedural Flowware Time Run Time

Flexware Configware Space Compile Time Flexware Configware Space Compile Time

Flexware pipe network Configware Time and Space Compile Time Flexware pipe network Configware Time and Space Compile Time

Configware / Soft- Compile Time Configware / Soft- Compile Time Embedded Flexware ware Co-Compilation Time and Space and Run Time Embedded Flexware ware Co-Compilation Time and Space and Run Time CAD / Configware / Fabrication time / CAD / Configware / Fabrication time / modern embedded FPGA Time and Space modern embedded FPGA Time and Space Flowware / Software compile time / run time Flowware / Software compile time / run time © 2012, [email protected] http://hartenstein.de © 2012, [email protected] http://hartenstein.de 55 56

TU Kaiserslautern Outline TU Kaiserslautern

The ubiquitous everlasting Computing Crisis Some DeFacto MATCH The von Neumann Syndrome and its history Hardware Datastream-based Computing is not new Description Languages The sustained Language Selection problem • Programming Languages • Hardware Description Languages already in 1980 We must Reinvent Programmer Education > 200 listed

Conclusions Galadriel & Nenia

© 2012, [email protected] 57http://xputer.de /UnB/Reiner-http://hartenstein.deSBCCI-12.pdf © 2012, [email protected] 58http://xputer.de /UnB/Reiner-http://hartenstein.deSBCCI-12.pdf

VHDL: biggest mistake TU Kaiserslautern TU Kaiserslautern The „nroff“ of EDA! in memoriam Richard Newton

[courtesy Richard Newton]

1951 - 2007 how to hide the ugliness [courtesy Richard Newton] from the user [Herman Schmit] © 2012, [email protected] 59http://xputer.de /UnB/Reiner-http://hartenstein.deSBCCI-12.pdf © 2012, [email protected] 60 Richard Newton: The Next EDA Revolutionhttp://hartenstein.de (Japan, Sept 1996)

Keynote Chip in Brasilia - SBCCI 2012, 25th Symposium on Integrated Circuits and System Design, August 30 – September 2, 2012, Brasilia, Brasil 10 Reiner Hartenstein, TU Kaiserslautern, Germany [email protected] http://hartenstein.de 13 September 2012

Lean Hardware Language-of the Year Phenomenon TU Kaiserslautern Qualification: TU Kaiserslautern [Richard Newton] imbelievable! 62 KARL

KARL

[courtesy Richard Newton]

… should be fired immediately

[courtesy Richard Newton] © 2012, [email protected] 61 Richard Newton: The Next EDA Revolutionhttp://hartenstein.de (Japan, Sept 1996) © 2012, [email protected] See here: http://xputer.de/UnB/Reinerhttp://hartenstein.de-SBCCI-12.pdf

Last language of the Year Obituary: CHDL series † TU Kaiserslautern TU Kaiserslautern st EU stops funding 1 CHDL'73 New Brunswick, NJ, USA 2nd CHDL'74 Darmstadt, Germany 63 CHDL series † CHDL’97 (last CHDL) 3rd CHDL'75 New York, NY, US 4th CHDL'79 Palo Alto, CA, USA 5th CHDL'81 Kaiserslautern, Germany 6th CHDL'83 Pittsburgh, PA, USA 7h CHDL'85 Tokyo, Japan 8th CHDL'87 Amsterdam, Netherlands 9th CHDL'89 Washington, DC, USA 10th CHDL'91 Marseille, France 11th CHDL'93 Ottawa, Ontario, Canada 12th CHDL'95 Makuhari, Chiba, Japan 13th CHDL'97 Toledo, Spain* [courtesy Richard Newton] *) attendance: >800 14th CHDL'99 Charlottesville, VA, USA © 2012, [email protected] See here: http://xputer.de/UnB/Reinerhttp://hartenstein.de-SBCCI-12.pdf © 2012, [email protected] 64 http://hartenstein.de

FPGA’s Semiconductor 66 VHDL Disadvantages

TU Kaiserslautern Market Share TU Kaiserslautern • VHDL is verbose, complicated and confusing • Some VHDL programs cannot be synthesized • Many different ways of saying the same thing • 7 Different tools support different subsets of VHDL. courtesy [Nick Tredennick] • Constructs that have similar purpose have very • Different tools generate different circuits for same code different syntax (case vs. select) • – The infamous latch inference problem (See section • Constructs that have similar syntax have very 1.5.2 for more information) different semantics (variables vs signals) • Used when timing of communication between • Hardware that is synthesized is not always obvious producer and consumer is unpredictable. still < 2% (when is a signal a flip-flop vs latch vs combinational) • Especially since VHDL is not sequential! • The disadvantage is that it is cumbersome to • VHDL is hard to debug implement and slow to execute. • Safety mechanisms may be more difficult to verify • Getting familiar with X32 (directory) • Design mapping into the FPGA device may not be structure/conventions takes some time optimal across hierarchical boundaries. This can cause lesser device utilization and decreased design performance. FPGAs Achilles’ heel is their long development time: • Design file revision control becomes more difficult. https://ece.uwaterloo.ca/~ece327/course-notes/notes-up1.pdf • Designs become more verbose. http://digitalelectronics.blogspot.de/2007/07/comparison-of--to-other-hardware.html Low level HDLs (VHDL/) are still dominant [Nick Tredennick] VHDL is made by man, so it is imperfect. In fact, VHDL is made by many, many men, so it is very, very imperfect. Let's see the words that VHDL users choose to describe VHDL :

Development with VHDL is very expensive … the impression that it is very unforgiving: strict, hard, very very bad, painful, terrible, [Nick Tredennick] it's too f[***]ing hard, because the smallest error will cause huge problems. © 2012, [email protected] 65http://xputer.de /UnB/Reiner-http://hartenstein.deSBCCI-12.pdf © 2012, [email protected] See here: http://xputer.de/UnB/Reinerhttp://hartenstein.de-SBCCI-12.pdf

Keynote Chip in Brasilia - SBCCI 2012, 25th Symposium on Integrated Circuits and System Design, August 30 – September 2, 2012, Brasilia, Brasil 11 Reiner Hartenstein, TU Kaiserslautern, Germany [email protected] http://hartenstein.de 13 September 2012

Looking at Games and Azido [Nick Tredennick] TU Kaiserslautern http://azido.net/ TU Kaiserslautern Outline

The ubiquitous everlasting Computing Crisis Game systems become development platforms No special expertise required for System development The von Neumann Syndrome Visual system assembly by friendly graphical user interfaces Datastream-based Computing is not new Easy manipulation of open source component libraries The sustained Language Selection problem * Azido: “Software reinvents FPGA programming”: We must Reinvent Programmer Education (open-source libraries: electronics, FPGAs, mechanical components, controllers, actuators) Conclusions *) Windows client software © 2012, [email protected] 67http://xputer.de /UnB/Reiner-http://hartenstein.deSBCCI-12.pdf © 2012, [email protected] 68 http://xputer.de/UnB/Reiner-http://hartenstein.deSBCCI-12.pdf

Dead Supercomputer Society [list by Gordon Bell, Radical Progress needed

TU Kaiserslautern keynote ISCA 2000]. TU Kaiserslautern Will we never reach Zettaflops ? [Thomas Sterling] •ACRI •DAPP •MasPar (> 50) •Alliant •Denelcor •Meiko „Computing is cheap. Moving data is expensive“ •American •Elexsi •Multiflow Supercomputer •ETA Systems •Myrias •Ametek •Evans and Sutherland •Numerix But moving instructions is even more expensive •Applied Dynamics •Computer •Prisma •Astronautics •Floating Point Systems •Tera •BBN •Galaxy YH-1 •Thinking Machines Locality and transportation method is essential: •CDC •Goodyear Aerospace MPP •Convex •Gould NPL •Saxpy invisible with single abstraction layer tunnel vision •Cray Computer •Guiltech •Scientific Computer •Cray Research •ICL •Systems (SCS) Name FLOPS •Culler-Harris • Scientific Computers •Soviet Supercomputers More parallelism needed yottaFLOPS 1024 •Supertek zettaFLOPS 1021 •Culler Scientific •International Parallel by orders of magnitude. 18 •Cydrome Machines •Supercomputer Systems exaFLOPS 10 New architectures required petaFLOPS 1015 •Dana/Ardent/ •Kendall Square Research •Suprenum teraFLOPS 1012 Stellar/Stardent •Key Computer Laboratories •Vitesse Electronics for OS, RS, APIs and compilers … gigaFLOPS 109 megaFLOPS 106 © 2012, [email protected] http://hartenstein.de © 2012, [email protected] http://hartenstein.de 69 70 kiloFLOPS 103

More Abstraction Layers Mike Flynn‘s Taxonomy TU Kaiserslautern TU Kaiserslautern However, programmability crisis solution impossible without mastering the entire design space Include more Abstraction Layers to find critical sources of traditional efficiency limits to detect overhead and bottlenecks ! microprocessor

We must change how programmers think, also by …..

… fusion of abstraction layers (a challenge to educators!) … open the borders between different paradigm domains vector processor Instruction versus Data E. g. see Shuffle Sort: http://xputer.de/pucminas/ Single versus Multiple

© 2012, [email protected] http://hartenstein.de © 2012, [email protected] http://hartenstein.de 71 72 http://xputer.de/UnB/Reiner-SBCCI-12.pdf

Keynote Chip in Brasilia - SBCCI 2012, 25th Symposium on Integrated Circuits and System Design, August 30 – September 2, 2012, Brasilia, Brasil 12 Reiner Hartenstein, TU Kaiserslautern, Germany [email protected] http://hartenstein.de 13 September 2012

A Huge Design Space The Design Space for the TU Kaiserslautern TU Kaiserslautern new Hw/Software Stack

extending Flynn‘s taxonomy by extending Diana‘s taxonomy by 4 boxes:

) noI versus SI or MI

reconfigurable or not Diana‘s Taxonomy Reiner‘s Taxonomy

machine

-

(NISC)

(anti

set

based -

Diana Göhringer instruction (on the TPC of

this conference) no datastream

© 2012, [email protected] 73 http://xputer.de/UnB/Reiner-http://hartenstein.deSBCCI-12.pdf © 2012, [email protected] 74 http://xputer.de/UnB/Reiner-http://hartenstein.deSBCCI-12.pdf

The tunnel vision of EDA Education Abstraction Layers merged http:// puter.de/EIS/ TU Kaiserslautern TU Kaiserslautern the pre-manycore age M-&-C merged layers Carver Mead extending Diana‘s taxonomy by traditional division of labor: of abstraction: application application Lynn [1980] ) noI versus SI or MI Conway submit reject

Reiner‘s Taxonomy

RT level *

submit reject

machine

- logic level clearing out & (NISC)

submit reject intuitive models

(anti

set switching level

submit reject

to master the coherence

fragmentation specialisation based - circuit level dilemma

submit reject tall thin man thin tall

instruction layout level

Silicon Foundry no local technology

datastream depth of reduced depth specialization of specialization

© 2012, [email protected] 75 http://xputer.de/UnB/Reiner-http://hartenstein.deSBCCI-12.pdf © 2012, [email protected] *) or ”tall thin woman” 76 stressing “Systemshttp://hartenstein.de”

It is possible ! Programmer Education TU Kaiserslautern connected TU Kaiserslautern ... but you can never thinking „You can always teach hardware The most influential project * teach programming

in modern computer history to a hardware guy ... nor configware

[Luigi Dadda] to a programmer“ Clearing off niche specialists’s The required programmer population is not existing … reductionist tunnel vision junk … also due to the yaw-dropping sclerosis of the joint

coherence IEEE-CS /ACM task force on Computing Curricula, A coherent multi level methodology tall thin man thin tall Leaving the traditional tunnel vision reduced depth of specialization excellent exercises, tutorials and textbooks approach we need a multi-paradigm and vertically integrated programmer education *) or ”tall thin woman”

© 2012, [email protected] 77 http://xputer.de/EIS/http://hartenstein.de © 2012, [email protected] 78 http://xputer.de/UnB/Reiner-http://hartenstein.deSBCCI-12.pdf

Keynote Chip in Brasilia - SBCCI 2012, 25th Symposium on Integrated Circuits and System Design, August 30 – September 2, 2012, Brasilia, Brasil 13 Reiner Hartenstein, TU Kaiserslautern, Germany [email protected] http://hartenstein.de 13 September 2012

TU Kaiserslautern Outline TU Kaiserslautern CS was a Monster 35 years wrong mainstream education approach: The ubiquitous everlasting Computing Crisis Horizontal challenges demand to bridge the paradigm gap between microprocessors, datastreams, FPGAs, and memory The von Neumann Syndrome and its history RC will be a significant dominating part of HPC solutions Datastream-based Computing is not new We need advanced tools for training the software community The sustained Language Selection problem The exclusively instruction-stream-oriented dominant mind set has lead us to the fatal mapping of parallelism We must Reinvent Programming into time only – abstracting away the space domain.

Conclusions We urgently need to remove all kinds of stonewalls

© 2012, [email protected] 79 http://xputer.de/UnB/Reiner-http://hartenstein.deSBCCI-12.pdf © 2012, [email protected] 80 http://xputer.de/UnB/Reiner-http://hartenstein.deSBCCI-12.pdf

We need „une' Levée en Masses“ We need „une' Levée en Masses“

TU Kaiserslautern TU Kaiserslautern WeWe neededneeded aa MassMass We needed a Mass MovementMovement toto examineexamine We needed a Mass Movement to encourage andand keepkeep aa checkcheck onon Movement to encourage & support our pioneers demolitionistdemolitionist intensionsintensions & support our pioneers

© 2012, [email protected] 81 http://hartenstein.de © 2012, [email protected] 82 http://hartenstein.de 81 82

Conclusions TU Kaiserslautern We need a Radical Approach TU Kaiserslautern 84 Since we‘ve also to re-write legacy software anyway, we are forced to go twin-pardigm. We need tool flow & education by a twin-paradigm multi-level approach providing locality awareness Obrigado ! Twin Paradigm & merged-level hardware knowledge are essential qualifications for programmers. We need a CS Education and Research Revolution Questions ? for multi-rail-thinking strictly avoiding tunnel vision

This is heralding a new era [James Larus]

© 2012, [email protected] 83 http://xputer.de/UnB/Reiner-http://hartenstein.deSBCCI-12.pdf © 2012, [email protected] http://xputer.de/UnB/Reiner-http://hartenstein.deSBCCI-12.pdf

Keynote Chip in Brasilia - SBCCI 2012, 25th Symposium on Integrated Circuits and System Design, August 30 – September 2, 2012, Brasilia, Brasil 14 Reiner Hartenstein, TU Kaiserslautern, Germany [email protected] http://hartenstein.de 13 September 2012

FPL2013 at Porto, Portugal

TU Kaiserslautern September 2 – 4, 2013 TU Kaiserslautern See here: http://fpl.org/FPL2013-PreliminaryCFP.pdf

23rd International Conference On Field Programmable Logic And Applications

Appendix PRELIMINARY CALL FOR PAPERS

General Chair: João M. P. Cardoso, University of Porto, Portugal Program Co-Chairs: Katherine (Compton) Morrow, University of Wisconsin-Madison, USA Pedro Diniz, INESC-ID, Portugal

© 2012, [email protected] 85http://xputer.de /UnB/Reiner-http://hartenstein.deSBCCI-12.pdf © 2012, [email protected] here : http://fpl.org/FPL201386 -PreliminaryCFP.pdfhttp://hartenstein.de

Bio Links to Reinvent Computing TU Kaiserslautern TU Kaiserslautern The Grand Challenge to Reinvent Computing http://xputer.de/pucminas/

Dr.-Ing. Reiner Hartenstein is full professor of the TU Kaiserslautern and Reinvent Computing? This idea is not new. See the keynote • MyBench independent expert and consultant of EDA in Reconfigurable Computing. by Burton Smith (former Cray CTO): http://xputer.de/reinvent/ • iNOC

As a scholar of Karl Steinbuch all his academic degrees are from EE at KIT Invasic Computing: agressive 30 people project http://xputer.de/invasic/ (Karlsruhe Institute of Technology), where he later was associate professor, working in image processing, computer architecture and hardware Invasive Computing — An Overview http://xputer.de/invasive/ description languages. He appreciates more than a decade of very fruitful cooperation with colleagues of the University of Brasilia and of the KIT. KAHRISMA: KArlsruhe's Hypermorphic Reconfigurable-Instruction-Set Multi- grained-Array Processor http://www.kahrisma.de/ http://xputer.de/kahr/ Prof. Hartenstein is FPL fellow, SDPS fellow, IEEE fellow and recipient of other awards. He gave more than 200 invited talks and 40 international ARAMiS (Automotive, Railway and Avionics Multicore Systems), a large keynote addresses. He has published more than 400 papers and authored, German/European project http://xputer.de/aramis/ edited or co-edited 16 books 1st Workshop on Data-Flow Execution Models for Extreme Scale Computing (DFM-2011) http://xputer.de/DFM1/ DFM 2012: http://xputer.de/DFM2/

© 2012, [email protected] 87http://xputer.de /UnB/Reiner-http://hartenstein.deSBCCI-12.pdf © 2012, [email protected] 88http://xputer.de /UnB/Reiner-http://hartenstein.deSBCCI-12.pdf

Microsoft Data Center The World's largest Data Center (in 2010) TU Kaiserslautern at Quincey TU Kaiserslautern [datacenterknowledge.com]

© 2012, [email protected] http://hartenstein.de © 2012, [email protected] http://hartenstein.de 89 90 [datacenterknowledge.com]

Keynote Chip in Brasilia - SBCCI 2012, 25th Symposium on Integrated Circuits and System Design, August 30 – September 2, 2012, Brasilia, Brasil 15 Reiner Hartenstein, TU Kaiserslautern, Germany [email protected] http://hartenstein.de 13 September 2012

>2000 datacenters world-wide ... Supercomputers ... [datacenterknowledge.com] TU Kaiserslautern TU Kaiserslautern

(picture from earlier talks)

© 2012, [email protected] 91 http://hartenstein.de © 2012, [email protected] 92 http://hartenstein.de

Sucperomputers as Scientific Instruments TU Kaiserslautern TU Kaiserslautern ... more Supercomputers ... …Their usage patterns and scientific impact are closer to major research facilities such as CERN, ITER, or Hubble. [Andrew Jones, vice president Numerical Algorithms Group]

93 © 2012, [email protected] http://hartenstein.de 94© 2012, [email protected] 94 http://hartenstein.de

Learning to go Exascale All but ALU is overhead: TU Kaiserslautern CACHES 2011 TU Kaiserslautern x20 inefficiency 95 st (picture from 1 International Workshop on Characterizing earlier talks) [R. Hameed et al.: Understanding Sources of Inefficiency in General-Purpose Chips; Applications for Heterogeneous Exascale Systems 37th ISCA, June 19-23, 2010, St. Malo, France] June 4th, 2011, held in conjunction with ICS'2011

(data Just one cashe) of several overhead layers 25th International Conference on Supercomputing May© 2012, 31 [email protected] - June 4, 2011, Loews Ventana Canyon Resort, Tucson,http://hartenstein.de Arizona © 2012, [email protected] 96 http://xputer.de/UnB/Reiner-http://hartenstein.deSBCCI-12.pdf 95

Keynote Chip in Brasilia - SBCCI 2012, 25th Symposium on Integrated Circuits and System Design, August 30 – September 2, 2012, Brasilia, Brasil 16