
Heterogeneous Computing Architectures: Challenges and Vision. Edited by Olivier Terzo, Karim Djemame, Alberto Scionti, Clara Pezuela.

Programming and Architecture Models

Publication details https://www.routledgehandbooks.com/doi/10.1201/9780429399602-3 J. Fumero, C. Kotselidis, F. Zakkak, M. Papadimitriou, O. Akrivopoulos, C. Tselios, N. Kanakis, K. Doka, I. Konstantinou, I. Mytilinis, C. Bitsakos Published online on: 10 Sep 2019

How to cite: J. Fumero, C. Kotselidis, F. Zakkak, M. Papadimitriou, O. Akrivopoulos, C. Tselios, N. Kanakis, K. Doka, I. Konstantinou, I. Mytilinis, C. Bitsakos. 10 Sep 2019, Programming and Architecture Models. In: Heterogeneous Computing Architectures: Challenges and Vision. CRC Press. https://www.routledgehandbooks.com/doi/10.1201/9780429399602-3




3 Programming and Architecture Models

J. Fumero, C. Kotselidis, F. Zakkak, M. Papadimitriou
The University of Manchester, Manchester, UK

O. Akrivopoulos, C. Tselios, N. Kanakis
Spark Works ITC, Derbyshire, UK

K. Doka, I. Konstantinou, I. Mytilinis, C. Bitsakos
National Technical University of Athens, Athens, Greece

CONTENTS
3.1 Introduction
3.2 Heterogeneous Programming Models
3.2.1 Directive-Based Programming Models
3.2.1.1 OpenACC
3.2.1.2 OpenMP 4.0
3.2.2 Explicit Heterogeneous Programming Models
3.2.2.1 CUDA
3.2.3 Intermediate Representations for Heterogeneous Programming Models
3.2.4 Heterogeneous Hardware Synthesis: FPGA High-Level Synthesis Tools
3.3 Heterogeneous Programming Languages
3.3.1 Unmanaged Programming Languages
3.3.2 Managed and Dynamic Programming Languages
3.3.3 Domain Specific Languages
3.4 Heterogeneous Device Selection
3.4.1 Schedulers
3.4.2 Platforms
3.4.3 Intelligence
3.5 Emerging Programming Models and Architectures
3.5.1 Hardware-bound Collaborative Initiatives
3.5.1.1 Khronos Group
3.5.1.2 OpenCAPI
3.5.2 Serverless Frameworks
3.6 Ongoing European Projects
3.6.1 ALOHA
3.6.2 E2Data
3.6.3 EPEEC
3.6.4 EXA2PRO
3.6.5 EXTRA
3.6.6 LEGaTO
3.6.7 MANGO
3.6.8 MONT-BLANC
3.6.9 PHANTOM
3.6.10 RECIPE
3.6.11 TANGO
3.6.12 VINEYARD
3.7 Conclusions

3.1 Introduction

Contemporary computing systems comprise processors of different types and specialized integrated chips. For example, mobile devices currently contain a Central Processing Unit (CPU) with a set of computing cores, and a Graphics Processing Unit (GPU). Recently, data centers started adding FPGAs at the rack and switch levels to accelerate computation and data transfers while reducing energy consumption.

Heterogeneous hardware is present everywhere and at every computing scale, ranging from mobile devices, such as tablets, laptops, and personal desktop computers, to servers. Heterogeneous computing systems usually contain one or more CPUs, each with a set of computing cores, and a GPU. Additionally, data centers are currently integrating more and more heterogeneous hardware, such as Field-Programmable Gate Arrays (FPGAs), in their servers, enabling their clients to accelerate their applications via programmable hardware. This chapter presents the most popular state-of-the-art programming models and architectures used to develop applications that take advantage of all heterogeneous resources available within a computer system.¹

Figure 3.1 shows an abstract representation of a heterogeneous shared-memory computing system. Heterogeneous shared-memory computing systems are typically—at the time of writing—equipped with at least one CPU comprising 4 to 8 cores. Each core has its own local cache and a set of caches shared with the rest of the cores, normally the last level cache (LLC). In addition to the CPU, such systems are also equipped with a GPU. Such accelerators are normally connected through the PCIe bus. They may also be equipped with other accelerators, such as FPGAs, or other specialized hardware components.

FIGURE 3.1: Abstract representation of a heterogeneous computing system. (© Juan Fumero 2017.)

Each of these devices, including the CPU, has its own instruction set, memory model, and programming models. This combination of different instruction sets, memory models, and programming models is what renders a system heterogeneous. Ideally, programmers want to make use of as many resources as possible, primarily to increase performance and save energy.

On one hand, GPUs contain hundreds, or even thousands, of computing cores. These cores are considered "light-cores" compared to those of a CPU, since their capabilities are limited. GPUs organize cores in groups, and each group contains its own cache memory. These groups are called Compute Units (CUs) or Stream Multiprocessors (SMs). FPGAs, on the other hand, are programmable hardware composed of many programmable units and logic blocks. In contrast to CPUs and GPUs, FPGAs do not contain traditional computing cores. Instead, they can be seen as a big programmable core in which logic blocks are wired at runtime to obtain the desired functionality.

This chapter describes the prevalent programming models and languages used today for programming heterogeneous systems. The rest of this chapter is structured as follows: Section 3.2 discusses the programming models that are currently the most popular in industry and academia; Section 3.3 shows the most popular heterogeneous programming languages that are currently used to develop applications on heterogeneous platforms; Section 3.4 analyses state-of-the-art platforms and projects for selecting the most suitable device to run a task within a heterogeneous architecture; Section 3.5 describes new and emerging platforms that are currently under research; Section 3.6 shortly presents ongoing European projects that perform research on heterogeneous programming models and architectures; finally, Section 3.7 summarizes the chapter.

¹ Part of the material presented in this chapter is included in Juan Fumero's PhD thesis [162], with the permission of the author.
251]—and a is be 395, of compiler differ- The combination should 394, a in points [55, as synchronization source-to-source accessed implemented how compiler—often are are and variables models code programming how Directive-based the code, of location of interpreting the parts sequential regions about of ent original compiler parallel capable the the potentially inform of is essentially version of directives that parallel ap- new a compiler These This the create compatible code. ignoring possibly code. simply a and original by directives use the sequentially the to program alter the or to run directives, need to users the enables without proach code source the source in as quential annotations known of use also the code, on based is Models programming Directive-based Programming Directive-Based 3.2.1 the and benefits the discusses models it model. Furthermore, programming each CUDA. of heterogeneous trade-offs and and OpenCL parallel relevant as explicit and such describes OpenMP. popular and it most OpenACC most the Then, as such the describes models first programming of It directive-based computing. notable overview heterogeneous high-level for models a programming gives section Models This Programming Heterogeneous 3.2 the summarizes 3.7 Section heterogeneous finally, on architectures; chapter. research and perform models that programming projects European ongoing presents 56 eu rhtcue,sc smlicr PsadGU,uigtedirective- the using GPUs, and heteroge- CPUs approach. on multi-core based running as for such applications architectures, stan- implement new neous This to 2011. programmers in allows conference stan- dard Super-Computing Accelerators) the (Open at [293] pro- proposed Following OpenACC was for the approach. dard OpenMP, standard directive-based of case dominant the successful using the the systems is multi-core [296] gramming (OpenMP) Multi-Processing Open OpenACC 3.2.1.1 pragmas ietvsaeue oantt xsigse- existing annotate to used are Directives . 
memory for computing computation directive shared express existing program the and to the extending multi-cores for from GPU terminology for GPU new OpenMP the for introduced use also OpenMP 4.0 to OpenMP that OpenMP how level. because about is is explicit This observation completely OpenACC. than first is verbose The more much 4.0. is programming OpenMP with GPUs 4 3 2 1 raigdw itn ,ln esanwOeM einta agt an targets that region OpenMP new a sets 1 line 2, Listing down Breaking target to addition vector perform to how of example an shows 2 Listing } { i++) copyout(c[0:n]) i

69 Downloaded By: 10.3.98.104 At: 14:20 24 Sep 2021; For: 9780429399602, chapter3, 10.1201/9780429399602-3 aino hs agae nldsalnug ita ahn V)that (VM) ob- Machine including memory, Virtual implemen- manages The language automatically area. a and heap applications includes and managed the Java a languages executes in like those allocated languages by of are provided are tation languages objects system which examples those runtime in Some to a C#, implementation. refers by language managed that automatically the term is a memory is whose Languages languages Programming programming Dynamic Managed and Managed reduced 3.3.2 of cost the program. at the of programmers, execution the cycles, the from development over quicker expertise control low-level thus less are and requiring prototyping, they while faster that programming for Higher-level is allow programmers. the languages languages contrary, from the expertise unmanaged higher On require models, language. using and target execution of the syntax, for disadvantage rules, defined all the well that are models is memory sections and previous identify the to in extend language described CUDA the and in tokens OpenCL and time, modifiers same new parallelism. with the standard At C99 definition. languages the both standard programming for heterogeneous directives the all explicit include in as OpenMP and well OpenACC as Both models. directive-based, both high-performance use and can low-level use to tend learning, languages. recently, machine programming more and applications and simula- data of forecast, physics weather big majority biology, as chemistry, such the physics, (HPC), particle Furthermore, tion, Computing Fortran. programming High-Performance and heterogeneous for for C developed standards for many GPUs, designed fact, as are In such FPGAs. architectures and heterogeneous CPUs programming for (e.g., used memory languages manage automatically tend that languages those Java). These to programmers. 
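For comparison, the same vector addition under OpenMP 4.0 target directives might look as follows. This is a sketch rather than the chapter's original listing; the map(to:...) and map(from:...) clauses play the role of OpenACC's copyin and copyout:

```c
/* Vector addition offloaded with OpenMP 4.0 target directives (sketch).
   map(to:...) copies the inputs to the accelerator before the region runs,
   map(from:...) copies the result back afterwards. Without an offloading
   OpenMP compiler the pragmas are ignored and the loop runs sequentially. */
void vector_add_omp(int n, const float *a, const float *b, float *c)
{
    #pragma omp target map(to: a[0:n], b[0:n]) map(from: c[0:n])
    #pragma omp teams distribute parallel for
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

Note the extra directive level (teams distribute parallel for) that OpenMP requires to spread the loop over the accelerator's compute units—one concrete instance of the verbosity difference discussed above.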
compared lower-level the in be by languages to handled those directly to is refers that memory term which a is languages Languages programming Unmanaged Programming Unmanaged 3.3.1 70 oplrt xct ffiinl h nu programs. and input interpreter the an efficiently VM, execute language to Those efficient it. compiler an implement a efficiently on to how rely of fo- languages instead solve to programming to programmers want they allowing what faster, on cus and makes easier this much perspective, development programmer’s application the From Javascript. and Ruby Python, to application. their them of allowing aspects bugs, other accompanying the on from potential management focus the memory with of along burden programmer, the removes This de-allocation. ject utemr,sm aae rgamn agae r ntpd like un-typed, are languages programming managed some Furthermore, standards the of any using Fortran or C in programming of advantage The developers Fortran, and C/C++ from devices heterogeneous program To C/C++ and Fortran r h otcmo naae programming unmanaged common most the are rgamn n rhtcueModels Architecture and Programming Downloaded By: 10.3.98.104 At: 14:20 24 Sep 2021; For: 9780429399602, chapter3, 10.1201/9780429399602-3 sn xenllibraries: external Using languages: to managed techniques from main two FPGAs previous currently the and are in GPUs There listed program languages. models managed programming from the follow section that accelerators use as to such languages, managed-programming Javascript. are majority and 10 the Python, top that Google C#, the is Java, in common projects, in languages Github have the all as of What such papers. popularity, academic and its Index searches, by TIOBE languages sources: programming different three using PYPL 2018 October in using languages programming PYPL. 
and popular IEEE, most Index, 10 TIOBE sources: the different of three Raking 3.1: TABLE Languages Programming Heterogeneous 2018 wrapper: a Using ntecs fJv,te a eepsduigteJv aieInterface from Native called Java and the code C using using exposed implemented side. be are Java can example, the operations For they which level. Java, in C man- of the (JNI), the from case to interface the exposed language normally in are a operations, that through common computation language of matrix aged and set array a for provide operations They Fortran. and C/C++ ceeaos u otait ihtentr ftemngdprogramming understanding. managed the low-level the on requires of it executed nature since be the languages to with applications contradicts the but over calls. accelerators, API control standards fine-tune of provides set same This the using languages programming high-level 4 3 2 h usinta rssatrloiga h aafo al . show is 3.1 Table from data the at looking after arises that question The languages programming popular most the of ranking a shows 3.1 Table http://pypl.github.io/PYPL.html https://spectrum.ieee.org/static/interactive-the-top-programming-languages- https://www.tiobe.com/tiobe-index/ 4 ahsuc eeec ssadffrn erhciei o reigthe ordering for criteria search different a uses reference source Each . agaeRank Language 10 9 8 7 6 5 4 3 2 1 h s faclrtr sdrcl rgamdfo the from programmed directly is accelerators of use the ceeae-irre r mlmne ietyin directly implemented are accelerated-libraries Javascript TIOBE Python VBasic C++ Swift PHP SQL Java C# C Javascript Assembly Python IEEE C++ PHP Java C# Go R C Objective-C Javascript C/C++ Python Matlab PYPL Swift PHP Java C# R 2 IEEE , 3 and , 71 Downloaded By: 10.3.98.104 At: 14:20 24 Sep 2021; For: 9780429399602, chapter3, 10.1201/9780429399602-3 uha iiinb eo ntecs fJv,teJMi eurdt ho an throw to required OpenCL. is or JVM CUDA the Java, in of case the exceptions, In runtime zero. 
ArithmeticException with by division happens a example, what as For clarify such not models. does programming standard heterogeneous OpenCL the the of those match always a to called the representation is by high-level process This their required marshalling versa. from is vice data and effort representation the extra low-level convert Therefore, to programming object. system high-level runtime an object-oriented OpenCL is in neither everything while example, languages For objects, types. support complex- and CUDA the memory not increases managing efficiently languages of managed Program- ity from C. like devices languages heterogeneous unmanaged differ- extra in be ming are an will implemented makes arrays applications those JNI of to all performance that compared Java, the means ent therefore, in This and side. arrays, example, op- all JNI’s For to for the copy platform. becomes to side target it Java’s difficult the from more copied for the code abstraction the develop- the timize the higher ease the languages process, ment managed-programming and high-level Although calls. JNI via Challenges execute and compiled OpenCL that the string the to Java matches a passed that as Java then Both expressed from is are Java. invoked Kernels be respectively. within to definitions CUDA CUDA calls and native programming of for set a [386], expose JCUDA are and that Java, operations within high-level of set a CUDA. exposes in It implemented GPUs. OpenCL efficiently library on for Python and learning a GPUs is deep CUDA PyTorch for processing. highly-optimized-function using image and numerous and computation statistics, Java, processing, contains C++, matrix signal it C, for in and designed GPUs underneath, and is CPUs It parallel programming Python. 
for library a is 72 hn erigdetefc hti rvdsa nrosqatt fexternal of quantity enormous an ma- provides using it events that future fact to been predict R the has even use due and and scientists learning popular, statistics, Many chine most provide years. the data, three of last big one the process is during 3.1, popularity Table its for in designed increasing shown specifically languages as programming pur- R, a specific statistics. is an [205] for R designed example For languages pose. are (DSLs) Languages Languages Specific Domain Specific Domain 3.3.3 5 utemr,tesmniso ihlvlpormiglnugsd not do languages programming high-level of semantics the Furthermore, OpenCL programming for [217], JOCL are wrappers of example Some PyTorch and [264] ArrayFire are libraries external of Examples https://pytorch.org n a etm-osmn eedn nteojcsa hand. at objects the on depending time-consuming be can and u adaeecpin r o urnl supported currently not are exceptions hardware but , rgamn n rhtcueModels Architecture and Programming marshalling/un- 5 ArrayFire . Downloaded By: 10.3.98.104 At: 14:20 24 Sep 2021; For: 9780429399602, chapter3, 10.1201/9780429399602-3 nonee nCodifatutrs h ceuigtcnqe a ecate- be can techniques scheduling The infrastructures. Cloud in encountered PBS-Torque and [362] architecture Condor same with as the 89]. such themselves following schedulers [196, etc, register “batch” messages, legacy resource to typical heartbeat centralized with approaches as a such different utilizing master, use the by the manage workers slaves) and (i.e., The consolidate workers registry. to a different used where is of manner, “master”) amount master-slave the a networking, (called in storage, machine operate nodes, central typically computing infras- schedulers of underlying Resource clusters sched- the of etc. to resource consist tasks called can allocating that systems for tructure software responsible the are by datacenters, Schedulers managed HPC ulers. 
and being Cloud is as such infrastructure environments, computing typical In Schedulers 3.4.1 machine analysis, code various device source selection. discusses profile-guided offline suitable and and as: presents most models, such it learning selection the specifically, device select More for can code. techniques their developers executing how for describes section Selection This Device Heterogeneous 3.4 required domain. totally are of models, their programmers architecture complexity to and result, unrelated programming the a different understand raises As and specialization. mix wrappers special- to their their Introducing breaks and domain. simplicity and specific their DSLs of a DSLs, because in of used ization more case mainly even the are are systems In DSLs heterogeneous complex. languages. programming programming of challenges managed the in however, as wrappers, performance. and better for Fortran programming lower and in C written as normally such are languages modules external These modules. Selection Device Heterogeneous rmdffrn plctos ohshdlr ffrtepsiiiyt “label” to possibility the requests offer negotiation upon schedulers a provided Both and utilizes applications. requested also de- different being It modern from are software. in resources ecosystem used where scheduler Hadoop those framework de-facto on the the execute of framework to is node ployments each application [372] master what YARN while Mesos and Apache framework, The accepts resources. each it Mesos). offer resources of to the top determines resources programs, on many map-reduce run how servers, that web decides like etc. (apps databases, frameworks NoSQL to made are offers [276]. subtask-based MapReduce-based dynamic), and vs. 
(static pipe-lining, scheduling, methods partitioning workload in gorized pceMss[9]ue w-ee ceuigmcaimweeresource where mechanism scheduling two-level a uses [198] Mesos Apache resources of heterogeneity the embrace to starting are schedulers Modern libraries external using programmed be can devices heterogeneous DSLs, In 73 Downloaded By: 10.3.98.104 At: 14:20 24 Sep 2021; For: 9780429399602, chapter3, 10.1201/9780429399602-3 .. Platforms 3.4.2 identify not. to of or achieved merging) effect be (i.e., the can device study speedup same they features. a the that whether selected on is carefully kernels [378] on executing to based concurrently compared time differentiation execution main predict Their to built is metrics. model performance they different Nevertheless, explore they arrival. not and workload do machines they upon to and vector time DAG-oblivious features execution also support the are these on predict based utilize to it model They use learning time). machine that execution features a (i.e., kernel based build important is performance the approach and the identifying Their predict on affect engine. can i.e., appropriate BlackScholes, modeling, they most predictive bfs, that the on to show as kernels they such their methodology and deploy kernels) their QuasirandomG, evaluate (i.e., and They algorithms Dotproduct, devices. complex GPU different and CPU with both on executed of similar DAG-oblivious), aware is not it (i.e., ORiON. is workload to submitted it the nevertheless in relations problem, task any Programming Linear Integer Mixed het- hardware both the exploited. Although systems, not technique. software is heterogeneous dynamic-programming erogeneity with a given work utilizing workflow required can execution by schedulers engines abstract user an analytics in the of pipeline by sequence data-processing can a IReS optimal execute optimization. 
the to resource materialize for formulation and problem identify com- linear prediction a integer resource on and for a based software process and is learning of It machine decision-tree-like job. terms a analytics appropri- of in incoming bination the an configuration execute cluster decide to adaptively the resources hardware can with along ORiON framework clusters. ate data big on workloads handle to user the leaving defined-labels, user heterogeneity. with nodes execution different 74 gr)ta r epnil o h P aaeet xcto scoordinated (GPUMan- is Execution management. processes GPU upon introduces the for builds and responsible it can are Flink, that, that it achieve agers) of To processors, accelerators. architecture CPU GPU master-slave of from the top apart on that, tasks execute is also key-feature Its Flink[93]. extent Apache application what improve to to investigate exploited a we is in section, datacenters data performance. modern of this of amounts In heterogeneity large manner. the parallel frameworks streaming in analytics or data process parallel batch systems and These distributed emerged. many have years last the Over Fik[0]i itiue tempoesn ltomta extends that platform stream-processing distributed a is [103] GFlink learning machine a where [377], by followed also is approach similar A be can that tasks for methodology scheduling a present authors the [378] In a forming by resources heterogeneous schedules also [369] TetriSched heterogeneous schedule can both schedulers [133] IReS and [391] ORiON rgamn n rhtcueModels Architecture and Programming Downloaded By: 10.3.98.104 At: 14:20 24 Sep 2021; For: 9780429399602, chapter3, 10.1201/9780429399602-3 eto faGUfrats eed nwehra loih t h GPU the fits algorithm an whether on se- depends The task device. a hardware for beneficial GPU most a the of adap- to optimizer lection queries rule-based A user it. schedules of version tively GPU-aware fully a creates to and optimizer Thus, its with GPUs. 
comes for Spark-GPU the manager. hardware, execution At resource underlying isolated custom time. the own offer given of potential not a the at did exploit [372] device [198] YARN like the Mesos managers on resource and developed, run was Spark-GPU should when application moment one only and to tasks. GFlink, order serialization-deserialization in as often space Moreover, of heap cost Java basis. excessive the and per-block of the memory or avoid instead native memory per-element in native a data utilizes in its Spark-GPU all consumed buffers be which can structure introducing and new Spark- model a underutilization. iterator Spark’s resource GPU-RDD: extending to by consumed issue leads massive this and and overcomes the offer GPU match [392] can not RDDs GPU does a as this parallelization However, modeled fashion. is one-element-at-a-time Spark a in in Data modes. Spark[393]. mod- execution but Apache both execution support GPU to managing TaskManagers GFlink, for Flink to process existing similar separate the very a ifies Although use datasets. not large does proccess EML to GPUs and data Flink operations. cache copying to in devices spent GPU time the serialization the instruct object way, reduce can This and master them. the structures on Moreover, custom avoided. directly define is operate can and Users off-heap How- management: live opts bus. memory GFlink that PCIe efficient cost, memory the more serialization-deserialization over serialize a high data incurs for transfer to and process is heap, this JVM as devices ever, the GPU in live the that and objects JVM between provided. communication not are algorithm scheduling GPU the Nev- on or un- GPUs. avoid details CPU among to ertheless, load-balancing for achieves manages either it and while workers locality-aware communication, to is necessary scheme tasks scheduling schedules The which execution. 
master single a by Selection Device Heterogeneous i ieaGUkre,(i eeo rpe nCta ae s fthe of use and makes kernel, the that for C API Java in a wrapper create Spark. a to in deploy order develop (iii) in (ii) (JNI) kernel, Interface steps: Native GPU following Java the Het- a requires with It way. wite automated application an (i) Java in happen a not accelerating does Furthermore, eroSpark demand. device. on GPU rialized a or on whether executed choose be explicitly should can task HeteroSpark a of top not on run that Applications model. execution otayt h frmnindsses bet r eilzdaddese- and serialized are objects systems, aforementioned the to Contrary architecture. Spark-based GPU-accelerated another is [250] HeteroSpark query Spark-SQL the extends Spark-GPU scheduling, task Regarding exist should isolation GPU, a of capabilities the of advantage taking For on based system analytics data hybrid CPU-GPU a is [389] Spark-GPU combines that system another [129], EML propose authors the [102] In of way standard The (JVM). Machine Virtual Java the in runs GFlink 75 Downloaded By: 10.3.98.104 At: 14:20 24 Sep 2021; For: 9780429399602, chapter3, 10.1201/9780429399602-3 iempigtcnqe hc uoaial ascmuain ohetero- to computations maps automatically which technique, intelligence mapping techniques. learning tive on machine based and profiling, resource offline ed- computing analysis, making code preferred configuration, from the and derived on input, processing application, decisions each the ucated fits hetero- determine best and automatically that tasks they element application Contrarily, between manually hardware. of mapping burden geneous beneficial the most alleviate the subsection selecting this in presented approaches The Intelligence 3.4.3 devices. supported apply case. not all do this by serialization-deserialization in of interpreted overhead take the efficiently about to be concerns Thus, enough can mature that yet values primitive not clusters. 
large-scale is in and performance device) state optimal same its guarantee and the that operation in decisions stateful Tensorflow placed a not, be (e.g., does considerations should takes she basic algorithm the the only case select However, account In explicitly algorithm. into placement can task. automatic user separate an the each employ task and may for Each processor, preference dependencies. of of kind data for dataflow different device edges a a specialized and on on unit tasks run based can represent a is nodes model Units, where execution Processing graph, The (Tensor applications). learning TPUs over machine algorithms and learning GPUs, machine TensorFlow are: of supports CPUs, TensorFlow Google’s training devices well-known the The environments. The enables heterogeneous that large-scale learning. system machine a of is realm [34] the in on ists based is objects. accelerators Java the of with serialization-deserialization communication on-demand the the and kernels OpenCL DSPs. to and FPGAs process- CPUs, of APUs, range GPUs, broader SWAT like a While devices, targets OpenCL. SparkCL ing using acceleration, by GPU kernels only Spark supports user-defined accelerate to able 76 stando-h-y uigtets’ rteeuinudrQln h input The Qilin: under execution first task task’s each the of cur- during model the on-the-fly, regression The for trained configuration. estimations is system time and per-task execution size a problem provides on rent that relying model units near- regression processing the linear to finds computations automatically from application it mapping Then, the optimal dependencies. of the Acyclic data tasks Directed though represent computational a edges code represent and builds GPU nodes first where or compiler C/C++, (DAG), the CPU Graph of specifically, native More top into API compiler. on Qilin translated Qilin The built dynamically operations. are API parallelizable express calls programming to a primitives provides offers which Qilin devices. 
geneous ii 29 sahtrgnospormigsse htrle na adap- an on relies that system programming heterogeneous a is [259] Qilin of consist structures, data Tensorflow’s of abstraction logical the Tensors, ex- also systems heterogeneous of development the in activity of flurry A methods Java translating for framework [47] Aparapi the use systems Both are that frameworks open-source two are [339] SparkCL and [184] SWAT rgamn n rhtcueModels Architecture and Programming Downloaded By: 10.3.98.104 At: 14:20 24 Sep 2021; For: 9780429399602, chapter3, 10.1201/9780429399602-3 plctost eeoeeu lses C ersnshtrgnosapplica- heterogeneous represents offline HCl an clusters. heterogeneous in to trained applications schemes. are partitioning models various over Both measurements models. profiling SVM with CPU-80% again 20% manner GPU, using CPU-90% spec- etc.) 10% the (i.e., along GPU, is GPU-only is categories and latter different performance CPU- The nine between best CPU. to trum the and programs where GPU classifying by both cases over performed the execution clas- handles distributing (SVM) one when Machine achieved Vector second Support the binary while a sifier, GPU- using and executions CPU- the distinguishes optimal for level only partitioning first The optimal program: the per- OpenCL determine that corresponding to predictor machine-learning classification two-level low-dimensional hierarchical a normalized, forms through The passed data. then the dimensionality are normalize etc. the data and transferred, reduce space data to feature float of applied the and is of size int (PCA) the of Analysis static accesses, amount Component 13 memory the Principal extract of as number to such the compilation information operations, CPU- during include heterogeneous which used of features, is code units analysis Code processing available systems. the that GPU among GPU) tasks vs. OpenCL (CPU resource transfer computing data performance. 
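Both Qilin's curve fitting and the partitioning predictor above answer the same question: how to split work between CPU and GPU. A minimal sketch of the regression-based variant follows; all timings, function names, and the grid search are invented for illustration and are not the actual Qilin API.

```python
# Illustrative sketch of Qilin-style adaptive mapping. All timings, function
# names, and the grid search are invented; this is not the actual Qilin API.

def fit_linear(sizes, times):
    """Least-squares fit of time = a * size + b from profiled runs."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    var = sum((x - mean_x) ** 2 for x in sizes)
    a = cov / var
    return a, mean_y - a * mean_x

def best_split(total, cpu_model, gpu_model, steps=10):
    """Pick the CPU share (0..1) that minimizes the projected runtime."""
    def projected(share):
        t_cpu = cpu_model[0] * (share * total) + cpu_model[1]
        t_gpu = gpu_model[0] * ((1 - share) * total) + gpu_model[1]
        return max(t_cpu, t_gpu)  # CPU and GPU parts run concurrently
    return min((i / steps for i in range(steps + 1)), key=projected)

# Measurements from a task's first runs (hypothetical, in seconds):
cpu = fit_linear([1000, 2000, 4000], [0.10, 0.20, 0.40])  # ~1e-4 s/element
gpu = fit_linear([1000, 2000, 4000], [0.06, 0.07, 0.09])  # fast, fixed overhead

share = best_split(100_000, cpu, gpu)
print(f"assign {share:.0%} of the input to the CPU")  # assign 10% ...
```

The grid search stands in for whatever solver the real system uses; the point is only that a cheap per-device cost model, fitted from a few profiled runs, is enough to project a work split.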
HCl (Heterogeneous Cluster) [227] is a scheduler that maps heterogeneous applications to heterogeneous clusters. HCl represents applications as Directed Acyclic Graphs (DAG), where nodes stand for computations and edges account for the volume of data transferred and the I/O between connected nodes. Taking into account runtime estimations of the DAG tasks, HCl evaluates all possible combinations of task-to-resource mappings, i.e., it exhaustively maps each task on each available hardware resource, and selects the execution schedule that optimizes the entire task graph rather than each task separately, aiming at the global optimum.

Hayashi et al. [195] have developed a system for compiling and optimizing Java 8 programs for GPU execution by extending IBM's just-in-time (JIT) compiler. One of the extensions consists in adding the capability of automatic CPU vs. GPU selection. This is achieved by relying on supervised machine learning models, i.e., performance heuristics constructed using a binary Support Vector Machine (SVM) classifier in an offline manner, using as input data measurements of actual program executions over various input datasets. The training data consist of a set of static code features extracted at compile time: the number of instructions per loop iteration, the range of a parallel loop, the number of array accesses, and features that affect the input data size. The output of the classifier is the preferred device that optimizes the program's performance.

DeepTune [119] is an optimization framework that relies on machine learning over raw code. Its goal is to bypass the code feature selection stage involved in the techniques presented so far.
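Returning to HCl: the payoff of its exhaustive, whole-graph mapping over a per-task greedy choice shows up even on a toy two-task DAG. In the sketch below the device costs and the cross-device transfer penalty are invented numbers chosen to make the effect visible.

```python
# Toy illustration of global vs. per-task device mapping. The two-task DAG,
# device costs, and transfer penalty are all invented numbers.
from itertools import product

tasks = ["a", "b"]
edges = [("a", "b")]          # task b consumes task a's output
cost = {("a", "cpu"): 4, ("a", "gpu"): 3,
        ("b", "cpu"): 5, ("b", "gpu"): 6}
transfer = 4                  # penalty when an edge crosses devices

def total_cost(assign):
    c = sum(cost[(t, assign[t])] for t in tasks)
    return c + sum(transfer for u, v in edges if assign[u] != assign[v])

# Greedy per-task choice ignores data transfers:
greedy = {t: min(("cpu", "gpu"), key=lambda d: cost[(t, d)]) for t in tasks}

# HCl-style: evaluate every possible mapping, keep the global optimum.
best = min((dict(zip(tasks, devs))
            for devs in product(("cpu", "gpu"), repeat=len(tasks))),
           key=total_cost)

print(greedy, total_cost(greedy))  # {'a': 'gpu', 'b': 'cpu'} costs 12
print(best, total_cost(best))      # {'a': 'cpu', 'b': 'cpu'} costs 9
```

The greedy schedule picks the locally cheapest device for each task but pays the transfer penalty on the crossing edge; enumerating all mappings finds the cheaper co-located schedule, which is exactly why HCl optimizes the graph as a whole.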
Feature selection requires manual work by domain experts and heavily affects the quality of the resulting machine learning model. One of the demonstrated use cases of DeepTune is the creation of a heuristic to select the optimal execution device (CPU or GPU) for an OpenCL kernel.

The architecture of DeepTune is a machine-learning pipeline: after the source code is automatically rewritten according to a consistent code style, it is transformed into a sequence of integers, using a language-specific vocabulary which maps each source code token to an integer. The sequence of integers is translated to embeddings, i.e., vectors in a low-dimensional space that capture the semantic relationship between vocabulary indices. Using a Long Short-Term Memory (LSTM) neural network, the entire sequence of embedding vectors is then used to extract a single, fixed-size vector that characterizes the learned representations of the source code. During the last stage, the resulting vector is fed to a fully connected two-layer neural network to make the final prediction: the first layer has a constant number of neurons, while the second layer consists of one neuron per possible heuristic decision.

3.5 Emerging Programming Models and Architectures

This section describes the state-of-the-art and current developments in the areas of emerging heterogeneous programming models and hardware-oriented architectures. The overall description is divided into two parts. The first part describes end-to-end solutions, where hardware heterogeneity is tackled by deploying low-level software drivers and common sets of frameworks which unify most of the different underlying hardware. The second part describes a different approach, in which an abstraction layer, mostly generated by containerization techniques, handles all the hardware matters. The user code thus remains hardware-agnostic but is executed in a highly efficient, seamless manner using all contemporary hardware.

3.5.1 Hardware-bound Collaborative Initiatives

3.5.1.1 Khronos Group

The Khronos Group (https://www.khronos.org) is probably the leading industry consortium for the creation of open-standard, royalty-free application programming interfaces (APIs) for the authoring and hardware-accelerated playback of dynamic media onto a large variety of platforms and devices. The activity of the consortium is mainly towards graphics and video in general; however, all heterogeneous platforms benefit from software and APIs which enhance the abilities of the underlying
hardware. In addition, the consortium is also interested in parallel computation efficiency, thus having several of its solutions addressing related issues.

Vulkan

Vulkan (https://www.khronos.org/vulkan/) is a cross-platform graphics and compute API which provides highly-efficient access to all modern GPU hardware resources. Its prime scope is to offer higher performance in 3D graphics applications, such as video games and interactive media, with a more balanced CPU/GPU utilization. More specifically, Vulkan ensures that the GPU executes only shaders, while the CPU executes everything else. In addition to its lower CPU usage, Vulkan is also able to better distribute work among multiple CPU cores, offering a reduced driver overhead and utilizing batching extensively, thus releasing the CPU for more computational cycles. The latest version of the API, Vulkan 1.1, enables highly-efficient sharing and manipulation of data between multiple tasks running in parallel on a GPU, an important new feature since it also supports subgroup operations.

SPIR

SPIR (https://www.khronos.org/spir/), the Standard Portable Intermediate Representation, is an intermediate language for parallel compute and graphics, originally developed for use with OpenCL. SPIR has now evolved into a cross-API intermediate language that is fully defined by Khronos, with native support for shader and kernel features used by the affiliated APIs. The current version, SPIR-V 1.3, was released on March 7th, 2018 to accompany the launch of Vulkan 1.1, and expands the capabilities of the Vulkan shader intermediate representation by also supporting subgroup operations, thus enabling enhanced compiler optimizations. SPIR-V is the first open-standard, cross-API intermediate language for natively representing parallel compute and graphics. It is part of the core specifications of OpenCL 2.1 and OpenCL 2.2, and an extension in OpenGL 4.6, but it is no longer supported using LLVM. However, Khronos has open-sourced SPIR-V/LLVM conversion tools.

SPIR-V is catalyzing a revolution in the language-compiler ecosystem: it can split the compiler chain across multiple vendors' products, enabling high-level language front-ends to emit programs in a standardized intermediate form to be ingested by Vulkan, OpenGL, or OpenCL drivers. For hardware vendors, ingesting SPIR-V eliminates the need to build a high-level language source compiler into device drivers, significantly reducing driver complexity, and will enable a broad range of language and framework front-ends to run on diverse hardware architectures.
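As a small, concrete illustration of SPIR-V being a well-defined binary form, the sketch below parses a module header, i.e., the fixed magic number 0x07230203 followed by an encoded version word, from synthetic bytes. Validating this header is the first step any ingesting driver or tool performs; the helper name is ours, and real loaders also handle the byte-swapped case, which is omitted here.

```python
# Sketch: validate a SPIR-V module header (magic number + version word).
# The helper name is ours; real loaders also handle byte-swapped modules.
import struct

SPIRV_MAGIC = 0x07230203  # fixed magic number of the SPIR-V binary format

def spirv_version(blob):
    """Return (major, minor) from a little-endian SPIR-V header, else None."""
    if len(blob) < 8:
        return None
    magic, version = struct.unpack_from("<II", blob, 0)
    if magic != SPIRV_MAGIC:
        return None
    # Version word layout: 0 | major byte | minor byte | 0.
    return (version >> 16) & 0xFF, (version >> 8) & 0xFF

# Synthetic header words for a SPIR-V 1.3 module:
header = struct.pack("<II", SPIRV_MAGIC, (1 << 16) | (3 << 8))
print(spirv_version(header))  # (1, 3)
```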
SYCL

SYCL (https://www.khronos.org/sycl/) is a high-level abstraction layer that builds on the underlying concepts, portability, and efficiency of OpenCL, thus allowing code for heterogeneous processors to be written in a "single-source" style using completely standard C++. SYCL enables single-source development for heterogeneous programming, with kernel code and the entire host application contained in the same source file, in a type-safe way and with the simplicity of a cross-platform asynchronous task graph. SYCL includes templates and generic lambda functions to enable higher-level application software development. Not only does it integrate with the SPIR world, but its recent 1.2.1 revision also introduces features related to machine learning and the power of modern C++. In addition, Khronos provides an open-source implementation (triSYCL, https://github.com/Xilinx/triSYCL) to experiment with the specification's requirements and provide the necessary feedback.

OpenKODE

OpenKODE (https://www.khronos.org/openkode/) is a royalty-free, open standard that combines a set of native APIs to increase source portability for rich media and graphics applications. It reduces mobile platform fragmentation by providing a cross-platform API for accessing operating system resources, and a media architecture for advanced mixed graphics acceleration. OpenKODE also includes the OpenKODE Core API that abstracts operating system resources to minimize source changes during application porting.

3.5.1.2 OpenCAPI

OpenCAPI (https://opencapi.org/) is an Open Interface Architecture that allows any microprocessor to attach to (i) coherent user-level accelerators and I/O devices, and (ii) advanced memories accessible via read/write or user-level DMA semantics, while (iii) being agnostic to the processor architecture. This initiative aims to create an open, high-performance bus interface based on a new bus standard called the Open Coherent Accelerator Processor Interface (OpenCAPI) and grow the ecosystem of devices that utilize this interface. The main drive behind this initiative is the constantly increasing acceleration in computing and the advanced memory/storage solutions that have introduced significant bottlenecks in today's open-bus protocols, all of which require a technical solution that is openly available.
3.5.2 Serverless Frameworks

The Function-as-a-Service (FaaS) approach has recently emerged and immediately gained a lot of interest in the Cloud community. Initially, only Cloud providers provided such functionality as services, but recently several open-source approaches emerged. This trend was also boosted by the rise of container orchestrators simplifying the deployment and execution of such frameworks in a serverless way. Serverless services enable the developers to compose an application from several multi-language services, disengaging different teams from utilizing the same programming language. Moreover, since every individual Cloud provider is not able to meet all of the requirements of every service, it is common to deploy the services of an application to multiple Cloud providers according to the needs of each service. This requirement can also be covered by a serverless platform enabling the composing of architectures and the deployment of a platform in heterogeneous Cloud infrastructures.

OpenFaaS

OpenFaaS [17] is an open-source FaaS framework that utilizes Docker and Kubernetes to host serverless functions. OpenFaaS is able to package any process as a function and execute it anywhere utilizing containerization technology, supporting many programming languages. Since OpenFaaS is open source, it can easily be extended to meet specific requirements, for instance to add support for a programming language.

Nuclio

Nuclio [14] is another serverless framework focused on high-performance events and data processing. Nuclio provides a convenient way for processing any function with several heterogeneous data sources and event streams. Nuclio also provides an SDK and a user integration architecture offering a convenient way to write, test, and submit function code without knowledge of the entire Nuclio architecture and source code.

Fn Project

Fn Project [7] is another open-source serverless platform, since each function submitted in Fn is executed in a Docker container. Fn provides support for a broad range of programming languages utilizing the Fn FDK (Function Development Kit), and a smart load balancer for routing traffic to functions. Through the FDK, a user is able to quickly bootstrap functions in any language supported by Docker, defining the input binding models and testing the submitted functions.
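Common to OpenFaaS, Nuclio, and Fn is that a "function" is just a plain handler that the platform packages into a container. A minimal OpenFaaS-style Python handler might look as follows; the `handle` entry point mirrors OpenFaaS's Python template, while the word-count body is an invented example.

```python
# Minimal FaaS-style handler. OpenFaaS's Python template expects a module
# exposing `handle`; the word-count body is an invented example.
import json

def handle(req: str) -> str:
    """Receive the request body as a string, return the response body."""
    return json.dumps({"words": len(req.split()), "echo": req})

# Exercising the handler locally, before packaging it into a container:
print(handle("hello heterogeneous world"))
```

Because the contract is this small, the same function body can be moved between frameworks largely by changing packaging metadata rather than code, which is what makes multi-provider deployments practical.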
Apache OpenWhisk

Apache OpenWhisk [3] is also an open-source event-driven distributed serverless platform, since the functions are executed in response to events. Again, the platform uses Docker containers to manage the infrastructure and handle the scaling. The user is able to define a function that gets triggered upon the reception of events from several event sources. OpenWhisk supports several programming languages and provides a CLI for managing several aspects of an OpenWhisk instance, with integration blocks for several event providers.

Kubeless

Kubeless [10] is a Kubernetes-native serverless framework designed to be deployed on top of a Kubernetes cluster, leveraging Kubernetes primitives and infrastructure such as auto-scaling, routing, monitoring, and resource management. Kubeless uses a Kubernetes Custom Resource Definition to be able to create functions as custom Kubernetes resources. It then runs an in-cluster controller that watches these custom resources and launches runtimes on-demand. The controller dynamically injects the functions' code into the runtimes and makes them available over HTTP or via a PubSub mechanism.

Fission

Fission [6] is also a framework for the definition of serverless functions on Kubernetes, abstracting away containers and enabling the creation of serverless functions in any language. Moreover, it maps the functions to event triggers without directly dealing with networking, message queues, etc. Fission Workflows enable the orchestration and management of a set of serverless functions, simplifying the process of building complex serverless applications that span many functions.

Funktion

Funktion [8] is an open-source event-driven lambda-style programming model designed for Kubernetes. Funktion supports many network protocols and connectors, including several event sources and messaging transports, systems, databases, social networks, Cloud services, and SaaS offerings. Funktion is a serverless approach to event-driven microservices and focusses on being Kubernetes- and OpenShift-native rather than a generic serverless framework.

Quebic

Quebic [19] is a FaaS framework for writing serverless functions to run on Kubernetes. Currently, Quebic supports only Python, Java, and NodeJS. Quebic enables invocations of serverless functions from an API Gateway and provides an event-driven messaging mechanism for inter-function calls. Quebic also provides automated update or downgrade processes on submitted functions and, apart from functions, Quebic can also host event-driven microservices.
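Kubeless's CRD-based model described above means a function is literally a Kubernetes object: deploying one amounts to applying a manifest that the in-cluster controller watches. A sketch along these lines, where the runtime tag and all names are illustrative rather than a tested manifest, shows the shape of such an object:

```yaml
# Illustrative Kubeless Function object; the runtime tag and names are
# examples, not a tested manifest.
apiVersion: kubeless.io/v1beta1
kind: Function
metadata:
  name: hello
spec:
  runtime: python3.7        # runtime image the controller launches
  handler: handler.hello    # <module>.<function> injected into the runtime
  function: |
    def hello(event, context):
        return "Hello from Kubeless"
```

The controller reacts to this object by starting the matching runtime pod and injecting the inline source, which is exactly the on-demand behavior described in the text.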
Riff

Riff [21] is another open-source serverless framework that works in any certified Kubernetes environment. Again, Riff is a framework designed for running functions in response to events. Since the functions can be written in a variety of languages and are packaged as containers, Riff provides integration with several event sources. In Riff, the orchestrator software is Kubernetes; when a function is triggered, Riff spins off one container and kills it afterwards, abstracting those operations away from the developers.

Serverless Framework

Serverless Framework [22] is an open-source CLI for building and deploying entire serverless applications. Serverless enables Infrastructure as Code, utilizing popular serverless technologies like AWS Lambda, for building and deploying serverless applications with simple configuration files. The Serverless framework is Cloud-provider agnostic and provides a simple, intuitive CLI experience that makes it easy to develop and deploy applications to public Cloud platforms. It also supports several programming languages and provides a robust plug-ins ecosystem and built-in support for application life-cycle management.

3.6 Ongoing European Projects

This section gives a high-level overview of ongoing European research projects on programming models for heterogeneous computing and the study of programming challenges. Given the fact that in the upcoming era of exascale computing [147], that is, a quintillion (10^18) floating-point operations per second (FLOPS), all major infrastructure is expected to be vastly heterogeneous and to rely heavily on GPUs [244], we have also included the most prominent ongoing projects from the relevant EU call (https://cordis.europa.eu/programme/rcn/702036_en.html).

3.6.1 ALOHA

ALOHA (https://www.aloha-h2020.eu/) aims to ease the deployment of deep learning (DL) algorithms on edge-nodes. To achieve its goals the ALOHA project will develop a software development tool flow, automating, among others, the porting of DL tasks to heterogeneous embedded architectures and their optimized mapping and scheduling. The project uses two distinct test-beds. The first one is a low-power Internet of Things (IoT) platform, while the second is an
FPGA-based heterogeneous architecture designed to accelerate Convolutional Neural Networks (CNNs).

3.6.2 E2Data

E2Data (https://e2data.eu) proposes an end-to-end solution for Big Data deployments that will deliver performance increases while utilizing less Cloud resources without affecting current programming norms (i.e., no code changes in the original source). E2Data will provide a new Big Data software paradigm combining state-of-the-art software components, in order to achieve maximum resource utilization for heterogeneous Cloud deployments. The evaluation will be conducted on both high-performing and low-power ARM cluster architectures representing realistic execution scenarios of real-world deployments, with four resource-demanding applications from the finance, health, green buildings, and security domains.

3.6.3 EPEEC

The European joint Effort toward a Highly Productive Programming Environment for Heterogeneous Exascale Computing (https://cordis.europa.eu/project/rcn/215832_en.html), in short EPEEC, aims to deliver a production-ready parallel programming environment that will ease the development and deployment of applications on the upcoming overwhelmingly-heterogeneous exa-scale supercomputers. The project will advance and integrate state-of-the-art components based on European technology, with the ultimate goal to provide high coding productivity, high performance, and energy awareness.

3.6.4 EXA2PRO

EXA2PRO (https://exa2pro.eu) is another project that aims to deliver a programming environment that will enable the efficient exploitation of exa-scale systems' heterogeneity. The EXA2PRO programming environment will support a wide range of scientific applications, provide tools for improving source code quality, and integrate tools for data and memory management optimization. Furthermore, it will provide performance monitoring features, as well as fault-tolerance mechanisms.

3.6.5 EXTRA

The EXTRA (https://www.extrahpc.eu/) project aims to create a new and flexible exploration platform for developing reconfigurable architectures, design tools and HPC applications with runtime reconfiguration built-in from the start.
and interrelated cooling quality-of-service, many and passive architectures two-phase isolation for tive heterogeneous mechanisms new- native power-efficient, of with definition exploration high-performance, the architecture on generation based cross-boundary (PPP) ambitious performance/power/predictability for through HPC sensitive MANGO MANGO hardware. 3.6.7 heterogeneous the on running stack ensure software to ways the explore of in- will resilience magnitude LEGaTO the of Additionally, order efficiency. programming one energy a task-based in achieve ultimately crease leverage to FPGA-based OpenMP, will to and similar LEGaTO FPGAs models, (DFEs). GPUs, CPUs, engines off data-flow composed hardware erogeneous LEGaTO’s LEGaTO 3.6.6 Projects European Ongoing gotcsfwr ltomta iloe h en o multi-dimensional for means the offer hardware- will a comprises that system platform PHANTOM software The hard- agnostic programmer. underlying the the of from complexity the ware hiding while systems, computing power PHANTOM PHANTOM 3.6.9 22 21 20 19 http://www.phantom-project.org/ https://www.montblanc-project.eu/ http://www.mango-project.eu https://legato-project.eu/ 20 19 22 agt civn xrm eoreecec nftr QoS- future in efficiency resource extreme achieving targets oli opoieasfwr csse o aei-uoehet- Made-in-Europe for ecosystem software a provide to is goal ist nbenx-eeainhtrgnos aalladlow- and parallel heterogeneous, next-generation enable to aims 21 saln-unn rjc,cretyi t orhphase. fourth its in currently project, long-running a is 85 Downloaded By: 10.3.98.104 At: 14:20 24 Sep 2021; For: 9780429399602, chapter3, 10.1201/9780429399602-3 n uoai oegnrto nldn otaeadhrwr modeling. hardware and software including generation code featur- pro- architectures cross-layer automatic hardware new parallel ing a heterogeneous creates for TANGO approach pro- addition, gramming hardware and In various devices. 
chips for logic heterogeneous grammable support clusters, built-in heterogeneous with including architectures model programming a TANGO Moreover, integrates optimization architectures). time- different target cost, implementa- into on location, dependability work actual and security, movement criticality, research its data the performance, and efficiency, of architecture (energy results areas reference the a includes that is tion project key the The architectures). of location, target novelty and on movement dependability security, design data time-criticality, software performance, cost, of efficiency, dimensions de- (energy various logic operations optimize and to programmable tools and developing chips while vices clusters, heterogeneous software including and systems configurations architectures, hardware heterogeneous underlying TANGO of scope The TANGO 3.6.11 on focusing specifically task a includes models. runtime package programming both the work for from second reliability RECIPE’s Apart the ensure enforce itself computation. and will throughput-oriented it applications, and time the same time-critical by the min- imposed At constraints and hotspots. thermal time efficiency of runtime energy occurrence hierarchical the optimise a imise provide to will infrastructure systems) resource-management Exascale heterogeneous of ment RECIPE RECIPE 3.6.10 dynamically platforms. orchestrates hardware reconfigurable it of Additionally, components application. software and an ex- hardware to of the CPU) part FPGA, (GPU, each analog-digital, technology hybrid ecute heterogeneous cen- digital, which (analog, data on level and desktops, software) system devices, cross-layer mobile which at systems, ters), embedded con- Cloud, computing the (e.g., in tinuum where decides scheduler multi-objective A optimization. 
86 h oto falaoeetoe eeoeeu aallifatutrsi an in facilitate infrastructures which parallel mechanisms heterogeneous toolbox certain aforementioned open-source provides all project of the control the least, not but Last 25 24 23 http://www.tango-project.eu/content/beta-tango-toolbox-released-open-source http://www.tango-project.eu http://www.recipe-project.eu/ 23 Rlal oe n ieCntansaaePeitv manage- Predictive time-ConstraInts-aware and power (REliable 25 24 . st rvd h en o otoln n abstracting and controlling for means the provide to is rgamn n rhtcueModels Architecture and Programming Downloaded By: 10.3.98.104 At: 14:20 24 Sep 2021; For: 9780429399602, chapter3, 10.1201/9780429399602-3 oad mrvn n adigpormigfrhtrgnoscomputing. heterogeneous working for projects state-of-the-art programming European handling the ongoing and presented and improving towards initiatives also computing. current heterogeneous It showed for it selection R. Finally, device and regarding projects Python, and Java, lan- research programming as managed such from systems guages such OpenCL. programming and covered CUDA, of it OpenACC, overview Then, as an ar- such showed and models first programming programming It heterogeneous current computing. heterogeneous of for overview models high-level chitecture a provided chapter This Conclusions 3.7 programming data-center etc.). typical Spark, Storm, using (e.g., uti- by frameworks seamlessly platforms to heterogeneous builds end-users such allowing also lize VINEYARD for framework platforms, programming such high-level accel- on hardware a productivity programmable increase with To servers on erators. based centers data erogeneous VINEYARD VINEYARD 3.6.12 Conclusions 26 http://vineyard-h2020.eu 26 ist eeo nitgae ltomfreeg-ffiin het- energy-efficient for platform integrated an develop to aims 87