
“Write Once, Parallelize Anywhere”

The Flow Programming Language: An implicitly-parallelizing, programmer-safe language

Luke Hutchison

Moore's Law

● The number of transistors that can be placed inexpensively on an integrated circuit has doubled approximately every two years.

Moore's Law

● Computing is work; transistors do work; more transistors working in parallel should yield faster computers

● => Moore's Law has (so far) also applied to CPU speed

● BUT now we're hitting multiple walls:

● Transistor size – tunneling; lithography feature size vs. wavelength – no more 2D density increases

● Thermal envelope – function of frequency and feature size => we have hit a clock-speed wall

● Data width – 128-bit CPUs? Need parallel control flow instead

But it's worse than that

It's not just the end of Moore's Law...

...it's the beginning of an era of buggier software.

Humans will never be good at writing multi-threaded code – our brains simply don't work like that.

(Abstractions like Futures only help a little bit.)

Parallel Programming Toolset

● Traditional locks: mutex, semaphore etc.

● Libraries: java.lang.concurrent

● Message passing: MPI; Actor model (Erlang)

● Futures, channels, STM

● MapReduce

● Array programming

● Java7 ParallelArray

● Intel Concurrent Collections (CnC)

● Intel Array BuildingBlocks (ArBB)

● Data Parallel Haskell (DPH)

● ZPL

But: all of these require programmer skill, extra effort, and shoehorning of the design.

How to shoot yourself in the foot

Classic 1991 – http://bit.ly/hiwaG1

FORTRAN : You shoot yourself in each toe, iteratively, until you run out of toes, then you read in the next foot and repeat.

COBOL : USEing a COLT 45 HANDGUN, AIM gun at LEG.FOOT, THEN place ARM.HAND.FINGER on HANDGUN.TRIGGER and SQUEEZE. THEN return HANDGUN to HOLSTER. CHECK whether shoelace needs to be retied.

Lisp : You shoot yourself in the appendage which holds the gun with which you shoot yourself in the appendage which holds the gun with which you shoot yourself in the appendage which holds...

BASIC : You shoot yourself in the foot with a water pistol.

Forth : Foot yourself in the shoot.

C++ : You accidentally create a dozen instances of yourself and shoot them all in the foot. Providing emergency medical assistance is impossible since you can't tell which are bitwise copies and which are just pointing at others and saying "That's me, over there."

C : You shoot yourself in the foot.

My addition: Explicit parallelization in any language : You shoot in the yourself

The state of Moore's Law, 2010

"Finding: There is no known alternative for sustaining growth in computing performance; however, no compelling programming paradigms for general parallel systems have yet emerged."

—The Future of Computing Performance: Game Over or Next Level?

Fuller & Millett (Eds.); Committee on Sustaining Growth in Computing Performance, National Research Council, The National Academies Press, 2010, p.81

=> The Multicore Dilemma

The root of the problem

● Purely functional programming languages can be safely implicitly parallelized

● No side-effects, no state

● => threads cannot interact

● e.g. several parallel versions of Haskell

The root of the problem

but purely functional programming languages are extremely hard for mere mortals to use to solve real-world problems.

On the other hand,

getting explicit parallelization right – and writing multithreaded code quickly – may actually be harder.

Mere mortals

"Recommendation: Invest in research and development of programming methods that will enable efficient use of parallel systems not only by parallel systems experts but also by typical programmers."

—The Future of Computing Performance: Game Over or Next Level?

Fuller & Millett (Eds.); Committee on Sustaining Growth in Computing Performance, National Research Council, The National Academies Press, 2010, p.99

The root of the problem

● We like imperative programming languages: “do this, then this”.

● But for imperative programming languages, it is impossible to determine the exact data dependency graph of a program by static analysis (i.e. by looking at the source).

● (The data dependency graph tells you the order you need to compute values.)

● Therefore you don't know what can be computed in parallel without:

– (1) trusting the programmer (explicit parallelism) – VERY BAD

– (2) guessing (static code analysis) – possibly/probably very bad

– (3) trying and failing/bailing if you were wrong (STM)
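● A minimal Java sketch (all names hypothetical) of why static analysis can't recover the exact dependency graph – the dependency between two writes can hinge on runtime aliasing:

    // AliasDemo.java — hypothetical illustration, not Flow code.
    class AliasDemo {
        // Can these two writes run in parallel? Only if (a != b || i != j),
        // which depends on runtime values, not on the source text.
        static void update(int[] a, int[] b, int i, int j) {
            a[i] = 1;
            b[j] = 2;
        }
        public static void main(String[] args) {
            int[] x = {0, 0};
            update(x, new int[2], 0, 0);  // not aliased: writes are independent
            update(x, x, 0, 0);           // aliased: second write must follow first
            System.out.println(x[0]);     // 2 – order mattered in the aliased call
        }
    }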

Functional vs. imperative

● Functional: gather, pull, reduce

● Imperative: same as functional, but adds scatter, push, produce
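● A small Java sketch (hypothetical names) contrasting the two styles – a gather/reduce is order-free, while a scatter makes write order matter:

    // GatherVsScatter.java — hypothetical illustration.
    import java.util.Arrays;

    class GatherVsScatter {
        public static void main(String[] args) {
            int[] data = {3, 1, 4, 1, 5};

            // Gather / pull / reduce: the result is a pure function of its
            // inputs, so evaluations can be reordered or parallelized freely.
            int sum = Arrays.stream(data).sum();

            // Scatter / push / produce: writes land at data-dependent indices,
            // so two iterations may target the same slot; order now matters.
            int[] buckets = new int[6];
            for (int v : data) buckets[v]++;   // e.g. v == 1 occurs twice

            System.out.println(sum + " " + Arrays.toString(buckets));
        }
    }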

Training wheels

● How to minimally constrain imperative programming to allow for automatic implicit parallelization with guaranteed thread safety...

…without forcing the programmer to use training wheels?

The real root of the problem

● ...is the ability to read the current value of a variable. (Surprising. Foundational.)

● => Einstein's theory of the relativity of simultaneity: there is no such thing as “now” given physical separation.

● cf. values vs. variables.

● Remove the concept of “now”, and you minimally restrict a language such that it's impossible to create race conditions or deadlocks.

● But you also don't have to program in purely functional style: a subset of imperative, push-style programming can still be supported.

Introducing Flow

● Flow is a new programming paradigm that enforces this restriction to solve the multicore dilemma while providing strong and specific guarantees about program correctness and runtime safety.

Introducing Flow

● Flow is a compiler framework that can be targeted by a wide range of different languages

● [will probably plug into LLVM]

● ...and (eventually), a reference language

● It doesn't actually exist yet – come help make it exist

● http://flowlang.net/

Goals of Flow

● Solve the multicore dilemma

● Ubiquitous implicit parallelization, zero programmer effort

● “Write once, parallelize anywhere”

– Cluster : Hadoop / MapReduce, MPI or similar

– JVM via Java Threads

– C via pthreads

– CPU ↔ GPU via CUDA

● Optimal load balancing via big-Oh analysis of computation and communication cost

● Prevent programmers from shooting themselves in the foot

● Make race conditions, deadlocks, segfaults/NPEs, memory leaks impossible

Back to the drawing board

Flow enables implicit parallelization by

syntactically enforcing that every program be a lattice or partial ordering,

such that

the program source code itself is the data dependency graph.

This makes parallelization trivial.

Back to the drawing board

Also note that this does not eliminate “push” programming

● G can be an unordered collection

It also does not mean you can't “change the value of a variable”

● But you have to specify a specific timestamp, e.g. x@(t) = x@(t-1) + 1 (then each value of x is a specific node in the DAG)

=> Minimally-restricted imperative programming with maximum implicit parallelization
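● A rough Java analogue (hypothetical, not Flow code) of x@(t): each timestamped value gets its own write-once storage location, i.e. its own DAG node:

    // VersionedCounter.java — a rough analogue of x@(t) = x@(t-1) + 1.
    // Each x[t] is written exactly once, so it behaves like a distinct
    // DAG node rather than a mutable "current value".
    class VersionedCounter {
        public static void main(String[] args) {
            int T = 5;
            int[] x = new int[T + 1];
            x[0] = 0;                       // x@(0)
            for (int t = 1; t <= T; t++) {
                x[t] = x[t - 1] + 1;        // x@(t) = x@(t-1) + 1
            }
            System.out.println(x[T]);       // 5
        }
    }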

An example

● Histogram generation

● Multithreaded version? Two options:

(1) locks (=> contention!)

(2) keep separate copies in TLS; combine at end of operation

● Both options are manual => not easily composable, load-balanceable or scalable
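● A sketch of both manual options in Java (hypothetical code; option 1's locks are approximated here with atomic increments):

    // HistogramDemo.java — hypothetical sketch of the two manual options.
    import java.util.concurrent.atomic.AtomicLongArray;
    import java.util.stream.IntStream;

    class HistogramDemo {
        public static void main(String[] args) {
            int[] data = new int[1_000_000];
            for (int i = 0; i < data.length; i++) data[i] = i % 16;

            // Option 1: one shared histogram, synchronized updates.
            // Correct, but every increment contends on shared state.
            AtomicLongArray shared = new AtomicLongArray(16);
            IntStream.range(0, data.length).parallel()
                     .forEach(i -> shared.incrementAndGet(data[i]));

            // Option 2: per-thread copies, combined at the end.
            // No contention during counting, but the split/merge is manual.
            long[] merged = IntStream.range(0, data.length).parallel()
                    .collect(() -> new long[16],
                             (h, i) -> h[data[i]]++,
                             (h1, h2) -> { for (int k = 0; k < 16; k++) h1[k] += h2[k]; });

            System.out.println(shared.get(3) + " " + merged[3]);
        }
    }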

An example

● More complex example

● strsOfLen[ j ] can be an unordered collection (a Set); this does not affect the result

● Writes can therefore be interleaved or reordered => automatic TLS split
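● A Java sketch (hypothetical code) of the same idea: because each strsOfLen[ j ] is a Set, a parallel grouping can interleave and merge writes freely:

    // StrsOfLen.java — hypothetical sketch of the slide's strsOfLen[ j ].
    import java.util.*;
    import java.util.stream.Collectors;

    class StrsOfLen {
        public static void main(String[] args) {
            List<String> strs = List.of("a", "bb", "cc", "ddd", "e");

            // Each strsOfLen[j] is a Set, so insertion order is irrelevant:
            // parallel workers may interleave or reorder writes, and the
            // runtime can give each thread its own partial map and merge later.
            Map<Integer, Set<String>> strsOfLen = strs.parallelStream()
                    .collect(Collectors.groupingByConcurrent(String::length,
                                                             Collectors.toSet()));

            System.out.println(strsOfLen);  // e.g. {1=[a, e], 2=[bb, cc], 3=[ddd]}
        }
    }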

An example

● More complex example

● The less we constrain subproblems, the more parallelizable the code

● To relax constraints, must understand properties of functions / collections

An example

● More complex example

[Figure: the example pipeline split into a Map stage and a Reduce stage]

● It turns out that any function can be split into Map and Reduce stages

● But MapReduce doesn't include flow control, and MapReduce programs can't be auto-generated yet
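● For illustration, a hypothetical Java sketch of one function (word counting) split into a Map stage and a Reduce stage:

    // MapReduceSketch.java — hypothetical illustration, not Flow's generated code.
    import java.util.Map;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    class MapReduceSketch {
        public static void main(String[] args) {
            Stream<String> words = Stream.of("a", "b", "a", "c", "a");

            Map<String, Integer> counts = words.parallel()
                    // Map: each element independently becomes a (key, 1) pair.
                    .map(w -> Map.entry(w, 1))
                    // Reduce: pairs with the same key are combined with +.
                    .collect(Collectors.toMap(Map.Entry::getKey,
                                              Map.Entry::getValue,
                                              Integer::sum));

            System.out.println(counts);  // {a=3, b=1, c=1}
        }
    }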

Back to the drawing board

Flow syntactically enforces that a program's structure be a partial ordering or DAG (specifically a lattice).

Each node is the result of a function application of some sort

=> the process of computing each node can be thought of as a MapReduce operation

Back to the drawing board

Because Flow enforces program structure to be a partial ordering, some amazing properties emerge:

(1) Precise memory allocation:

● E is only allocated (or computed) after both A and B have been computed

● E is freed as soon as F and G are computed

No malloc/free, but no GC either!

● [Google will not use Java for core infrastructure because of GC]

No wasted memory
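● A minimal sketch (hypothetical Java, not Flow's implementation) of the idea: a node's result is freed the moment its last consumer finishes, deterministically, with neither explicit malloc/free nor a GC:

    // NodeRefCount.java — hypothetical sketch of dependency-driven freeing.
    import java.util.concurrent.atomic.AtomicInteger;

    class NodeRefCount {
        Object result;                       // computed once all inputs exist
        final AtomicInteger consumersLeft;   // e.g. 2 for E (consumers F and G)

        NodeRefCount(int consumers) { consumersLeft = new AtomicInteger(consumers); }

        Object consume() {
            Object r = result;
            if (consumersLeft.decrementAndGet() == 0) {
                result = null;               // last consumer done: free eagerly
            }
            return r;
        }

        public static void main(String[] args) {
            NodeRefCount e = new NodeRefCount(2);  // E has consumers F and G
            e.result = "E's value";
            e.consume();                           // F reads E
            e.consume();                           // G reads E -> E freed here
            System.out.println(e.result);          // null: storage reclaimed
        }
    }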

Back to the drawing board

(2) Precise execution order; direct knowledge of parallelizability.

● The program structure is the data dependency graph, which directly constrains the execution order, so there are no race conditions

(3) There are no cycles in a DAG by definition, so deadlocks are impossible too.

(4) You can write the statements in your program in any order within each scope, and it will still run identically – you no longer need to do a topological sort in your head to artificially serialize your code (ABDEFCGH)

Back to the drawing board

(5) Next-gen source-code editor: forget text editors, edit the program directly as a DAG!

● Why do a topo sort on the source at all? It's completely artificial.

● Tap into the shape recognition power of the human visual system

Back to the drawing board

(6) Next-gen debugging

● Watch nodes light up on the DAG as they are computed

● Watch data literally flowing through the pipeline.

● Easy to add checkpointing for stop/restart because you know all data deps

● Stopped at a breakpoint => run backwards one step and re-compute values at a previous node

● Holy grail of debugging

Back to the drawing board

(7) There is no execution stack in Flow!

● This is weird and rare among programming languages; Prolog is a notable exception

● The stack is conceptually replaced by conditional recursive expansion of sub-lattices within a lattice

● (The compiler will of course generate code that uses the CPU stack)

Back to the drawing board

(8) Because each node is effectively a collection (Map/Set/List), we can calculate the big-Oh size of each collection, i.e. the big-Oh time required to compute it

● Every operation can now provide multiple equivalent algorithms or different parallelization strategies with different scaling characteristics, and Flow will switch between them at compile time or runtime depending on input sizes

● e.g. switch to single-threaded code for small inputs

● Find CPU bottlenecks and memory hogs before you even run your code (compile-time profiling!)
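● A minimal Java sketch (hypothetical threshold and names) of switching strategies by input size:

    // AdaptiveSum.java — hypothetical sketch of runtime strategy switching.
    import java.util.stream.IntStream;

    class AdaptiveSum {
        // Below the threshold, thread startup costs more than it saves,
        // so fall back to single-threaded code; above it, parallelize.
        static final int THRESHOLD = 10_000;

        static long sum(int[] data) {
            IntStream s = IntStream.of(data);
            return (data.length < THRESHOLD ? s : s.parallel()).asLongStream().sum();
        }

        public static void main(String[] args) {
            System.out.println(sum(new int[]{1, 2, 3}));                     // serial path
            System.out.println(sum(IntStream.range(0, 100_000).toArray())); // parallel path
        }
    }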

Category Theory

● Further optimize implicit parallelization by data reordering and grouping

● Uses category theory for inference over function properties: associativity, commutativity, idempotence, …

● Each property offers a degree of freedom to parallelize more efficiently

● e.g. successive applications of functions that are associative and commutative can be reordered/grouped at will:

(1+(3+(9+(2+(4+7))))) = (1+3+9) + (2+4+7) (serial → parallel)
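● The same regrouping spelled out in Java (hypothetical code); parallel reduce() demands an associative operator for exactly this reason:

    // Regroup.java — the slide's regrouping, spelled out.
    import java.util.stream.IntStream;

    class Regroup {
        public static void main(String[] args) {
            // Serial, right-nested grouping:
            int serial = 1 + (3 + (9 + (2 + (4 + 7))));
            // Regrouped for two parallel workers; legal because + is
            // associative and commutative:
            int parallel = (1 + 3 + 9) + (2 + 4 + 7);
            System.out.println(serial + " == " + parallel);  // 26 == 26

            // Streams exploit exactly this freedom: chunks can be
            // combined in any grouping.
            int sum = IntStream.of(1, 3, 9, 2, 4, 7).parallel().reduce(0, Integer::sum);
            System.out.println(sum);  // 26
        }
    }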

Iteration and Turing-completeness

● The Structured Program Theorem: every computable function can be implemented by combining subprograms in only three ways:

(1) Executing one subprogram, and then another subprogram (sequence)

– Composition of lattices in Flow

(2) Executing one of two subprograms according to the value of a boolean variable (selection)

– Trivial boolean function application in Flow

Iteration and Turing-completeness

(3) Executing a subprogram until a boolean variable is true (repetition)

– Conditional recursive expansion of sub-lattices in Flow (see (7) above)
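● A minimal Java sketch (hypothetical names) of repetition expressed as conditional recursion, mirroring how Flow conceptually replaces the loop:

    // LoopAsRecursion.java — hypothetical illustration.
    class LoopAsRecursion {
        // while (!done(x)) x = step(x);  becomes:
        static int iterate(int x) {
            return done(x) ? x : iterate(step(x));
        }
        static boolean done(int x) { return x >= 10; }
        static int step(int x) { return x + 3; }

        public static void main(String[] args) {
            System.out.println(iterate(0));  // 12
        }
    }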

Conclusion

● Flow imposes the minimal constraint on an imperative programming language needed to allow implicit parallelization with zero programmer effort

● Many different language syntaxes could be built on top of this paradigm (sharing the compiler backend)

● Mere-mortal programmers can continue to work (mostly) as normal; the compiler will figure out the hard stuff

● This (in theory) solves the multicore dilemma – it was a human issue to start with.

flowlang.net

twitter.com/LH