Parallel Computing Workshop

Abstract: The European Grid Infrastructure (EGI) offers a wide range of platform technologies to execute parallel applications using paradigms such as message passing with MPI, shared memory programming with OpenMP, or GPGPU programming. The objective of this workshop is to present the latest advances in the support for MPI and parallel jobs on the European Grid Infrastructure. The following topics are covered: an update of the latest features and developments of MPI-Start; the objectives and status of the MPI Virtual Team; the use of MPI-Start for generic parallel jobs; and a summary of the current status of the support for GPGPU programming.

Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike Licence.

Alvaro Simon (CESGA), Enol Fernandez-del-Castillo (Instituto de Fisica de Cantabria (IFCA), CSIC-UC, Spain), John Walsh (Trinity College Dublin, Ireland)

EGI Community Forum 2012 / EMI Second Technical Conference, 26-30 March 2012, Munich, Germany

1. Introduction

Execution of parallel applications in clusters and HPC systems is commonplace. Grid infrastructures are mainly used for the execution of collections of sequential jobs, although most Grid sites are clusters where the execution of parallel applications is possible. The EGI Infrastructure [3] offers a platform to execute parallel applications using a wide range of technologies and paradigms such as message passing with MPI, shared memory programming with OpenMP, or GPGPU programming.

Previous workshops have covered the support for MPI in the infrastructure and middleware [20] and provided users with tutorials to execute their MPI jobs using the available tools [8]. In this workshop we cover the following topics: Section 2 describes the work of the MPI Virtual Team of EGI-InSPIRE, created to improve the usage of MPI in the infrastructure. Next, Section 3 presents the status of the MPI-Start development and new features that may improve the users' experience, including the use of MPI-Start for hybrid OpenMP/MPI programming. The execution of non-MPI parallel jobs is considered in Section 4, where we describe how MPI-Start may be used for the execution of generic parallel jobs, while in Section 5 an introduction to GPGPU programming is given. Last, some conclusions are given in Section 6.

2. The MPI Virtual Team

The Virtual Team (VT) framework has been created within the NA2 activity of the EGI-InSPIRE project to improve the efficiency and flexibility of the interaction between the NGIs, EGI.eu and User Communities. The original idea was to create short-lived projects that focus on well defined, non-operational activities around the production infrastructure. Different VTs cover different areas [4] (such as Marketing and Communication, Technical Outreach, etc.) and are open for NGIs, countries and regions that are not involved in NA2 or in EGI-InSPIRE. These entities can join existing Virtual Teams or propose new Virtual Teams.

One of these newly created VTs is the MPI VT [17]. This new team has different objectives:

• Collaborate with user communities and projects that use MPI resources.
• Promote MPI deployment.
• Enhance MPI experience and support for EGI users.
• Produce materials (tutorials, white papers, etc.) about successful use cases of MPI on EGI that can be used by new communities.
• Improve the communications between MPI users and developers within EGI SA3.
• Improve/create new MPI documentation for EGI users and administrators.
• Find and detect current issues and bottlenecks related to MPI jobs.
• Set up a VO on EGI with sites committed to support MPI jobs.

The new VT is a short-lived project (it has a lifetime of 6 months). It was started in November 2011 and it will end at the end of May 2012. This work will provide an output at the end of the VT timeline.

Due to the complexity of this project it was split into six tasks. These tasks were assigned to different MPI VT members to be accomplished during the following months.
2.1 Task 1: Documentation

MPI documentation was fragmented in different places and in most cases only site-admin oriented. MPI user communities are demanding new and improved user-oriented documentation. To close this gap the MPI VT created a specific documentation task. In the last months the current EGI MPI documentation was improved to include these new features, and the MPI VT group is working to gather in the same endpoint an MPI user guide to help MPI user communities. The tasks undertaken are:

• Review and extension of documentation by MPI VT members.
• MPI Administrator guide: https://wiki.egi.eu/wiki/MAN03.
• Working on an MPI user guide.
• Propose to concentrate documentation under a single endpoint.

2.2 Task 2: Information System

The information system is a critical gear for any grid infrastructure. One of the most common failure reasons for MPI jobs is site information system misconfiguration: if this information is not published correctly, MPI jobs may fail or stay queued indefinitely. To fix this issue and improve MPI site reliability, an Information System task was created within the MPI VT. This task has the following objectives:

• Detect if grid sites are publishing correctly the MPI tags and flavours.
• Inspect which GLUE 1.3/2.0 variables could be used in the MPI context and ask for their correct implementation.

MPI VT members have identified a valid GLUE variable that can be used by MPI jobs: MaxSlotsPerJob. This value reflects the maximum number of slots which could be allocated to a single job. Unfortunately this value is not filled by the current LRMS Information Providers and a static "999999999" value is published by default. The MPI VT is in contact with the different Technology Providers to have this information published correctly by the information providers.

2.3 Task 3: MPI monitoring

The current EGI monitoring system does not detect whether an MPI site is really working correctly (only an MPI "Hello world" job is submitted). A new set of nagios probes was therefore specified by MPI VT members to check if MPI sites are working correctly; the probes will be developed by the SA3 team and included into the EGI nagios framework. The new probe specifications were reviewed and include these tests:

• Environment sanity check: detect if a site has the GlueCEPolicyMaxSlotsPerJob variable set to a reasonable value and check the information published by the service.
• Check the MPI functionality with a minimum set of resources (run an MPI job using two slots in different machines, MPI or Parallel).
• Complex job check: this probe requires 4 slots with 2 instances running in different dedicated machines.

The new nagios MPI probes will be able to detect any MPI issue or misconfiguration and will improve the MPI availability and reliability metrics.
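As a rough illustration of the kind of check implied by the environment sanity probe and by the Task 2 work (this is a sketch, not the actual SA3 probe code), the value published by a site can be inspected with a standard LDAP query against its BDII; the host name below is a placeholder and the port and base follow the usual GLUE 1.3 conventions.

#!/bin/bash
# Sketch only: list the MaxSlotsPerJob value published by each CE of a BDII.
# "site-bdii.example.org" is a placeholder; 2170 and "o=grid" are the usual
# BDII port and LDAP base for GLUE 1.3 information.
BDII=${1:-site-bdii.example.org}
ldapsearch -x -LLL -H "ldap://${BDII}:2170" -b o=grid \
    '(objectClass=GlueCE)' GlueCEUniqueID GlueCEPolicyMaxSlotsPerJob
# A published value of 999999999 is the unfilled information-provider default
# discussed above, i.e. the site is not really advertising this limit.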
2.4 Task 4: Accounting

The EGI accounting system based on APEL is not yet ready to include parallel job usage records. This is an open task inside the EGI project (with a specific task by JRA1.4). The only way to recognise a parallel job in the accounting system is checking jobs with efficiency > 1, although this does not guarantee a parallel job execution. APEL developers are working to provide a new plugin to include parallel job accounting. To help in this task MPI VT members are in contact with EMI and APEL developers to provide a complete MPI accounting able to correctly record parallel usage at the end of 2012.

2.5 Task 5: LRMS status

Another important task for MPI Virtual Team members is to detect new potential failures in different batch systems. In the last months several issues that directly affect parallel job execution were detected. One of these bugs was detected in MAUI, typically used in conjunction with the Torque scheduler. Due to this issue, Torque was not able to submit jobs that require more than a single node. VT members have contacted the EMI Torque maintainers directly and a new fix will be available in the next EMI release. An important point is that some LRMS are supported by third parties and are not maintained directly by technology providers like EMI. This issue was raised by the MPI VT so that the different LRMS releases and updates are taken into account.

2.6 Task 6: MPI user communities and VO

VTs were created to improve the interaction between the NGIs and EGI.eu. The MPI VT is regularly in contact with different NGIs to gather information about different MPI use cases and site administrator feedback. In the first months of life of the VT several reports and surveys were submitted to the NGIs. These reports are discussed and reviewed by the MPI VT members in order to help user communities and site administrators. To help site admins configure parallel environments in their sites, a new MPI VO was created within the VT. The new VO will bring together sites and users interested in MPI, but it is not a permanent VO: it was created for testing purposes, to collect experience that will later be adopted by regular VOs. The MPI VO configuration is available in the MPI VT wiki page and can be used by any site to support it and test MPI functionalities.

3. MPI-Start

In order to provide a uniform interface for running MPI applications in grid environments, MPI-Start [2] was developed. MPI-Start is a unique layer that hides the details of the resources and application frameworks (such as the MPI implementation) from the user and the upper layers of the middleware. The use of such a layer removes any dependencies in the middleware related to starting parallel applications. The MPI-Start project was started in the frame of int.eu.grid [16]. Currently its development continues in the framework of the EMI project [6].

The latest released version of MPI-Start in EMI is 1.2.0, which was made available as part of EMI-1 update 11 (http://www.eu-emi.eu/emi-1-kebnekaise-updates/-/asset_publisher/Ir6q/content/update-11-15-12-2011) on 12th of December 2011. This release is an enhancement of the MPI-Start 1.x series included in the first EMI-1 release and subsequent updates. The main features are:

• Support for MPI implementations: Open MPI [9], MPICH [12] (including MPICH-G2 [13]), MPICH2 [11], and LAM-MPI [19].
• Support for Local Resource Management Systems: SGE [10], PBS/Torque [1], LSF [21] and Condor [15].
• Support for controlling the placement of processes on the allocated machines, including setting processor and memory affinity (for Open MPI and MPICH2). This allows the execution of hybrid MPI/OpenMP applications as described in the next section.
• Detection of the appropriate compilers for the MPI implementation in use.
• Use of command line parameters instead of environment variables for defining the MPI-Start behaviour.
• Refactored configuration that allows sites or users to provide their own plugins for the scheduler or execution framework.

MPI-Start is composed of a set of scripts that ease the execution of MPI programs by using a unique and stable interface to the middleware. The scripts are written in a modular way: there is a core and around it there are different plug-ins. Three main frameworks are defined in the MPI-Start architecture: Scheduler, which deals with the local batch systems; Execution, which manages the special parameters needed to start each MPI implementation; and Hooks, which provide additional features and are customisable by both sites and final users. A detailed description of the architecture and main features of MPI-Start is available at [2].
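To illustrate the switch from environment variables to command line parameters mentioned above, the sketch below contrasts the two invocation styles. The I2G_* variable names stem from MPI-Start's int.eu.grid origins and are listed here as an assumption; the command-line form mirrors the Arguments strings used in the JDL examples of Section 3.1.

# Older style: behaviour defined through environment variables (the I2G_*
# names are an assumption based on MPI-Start's int.eu.grid heritage; check
# the documentation of the release deployed at your site).
export I2G_MPI_TYPE=openmpi
export I2G_MPI_APPLICATION=user_app
export I2G_MPI_APPLICATION_ARGS="arg1 arg2"
mpi-start

# MPI-Start 1.x style: the same run driven by command line parameters only.
mpi-start -t openmpi user_app arg1 arg2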
3.1 Hybrid MPI/OpenMP Applications

Parallel applications using the hybrid paradigm are becoming more popular with the advent of multi-core architectures. The MPI-Start default behaviour is to start a process for each of the slots allocated for an execution. However, this is not suitable for applications using a hybrid architecture where several threads access a common shared memory area in each of the nodes. In order to support more use cases, the latest release of MPI-Start includes support for better control of how the processes are started, allowing the following behaviours:

• Start a process per CPU core, independently of the number of allocated slots.
• Start a single process per CPU socket. In this case, a hybrid application would start as many threads as cores are available for each CPU.
• Start a single process per host. This is the usual use case for hybrid applications: MPI-Start prepares the environment to start as many threads as slots are available in the host.
• Define the total number of processes to be started, independently of the number of allocated slots.
• Define the number of processes to be started in each host, independently of the number of allocated slots at each host.

[Figure 1: MPI-Start mapping of processes. (a) Top left: per core; (b) Top right: per socket; (c) Bottom: per node.]

Figure 1 shows the different possible mappings for two hosts with two quad-core CPUs. In the per core case, there would be 16 MPI processes, numbered from p0 to p15, each assigned to one CPU core. In the case of using a per socket mapping, each host would have two different processes (p0 and p1 in the first host, p2 and p3 in the second host) and, for each of these processes, 4 different threads (t0 to t3). Finally, if the per node mapping is used, only one process would be started at each host (p0 in the first host, p1 in the second one) and 8 threads would be started for each of them, numbered from t0 to t7.

All these modes must be used with caution, since they could interfere with other jobs running in the node. It is recommended that node exclusivity is enforced at job submission (e.g. using the WholeNodes attribute of the JDL).
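As a minimal sketch of the per-socket case, the application submitted through mpi-start could simply fix the OpenMP thread count before starting the real binary. The wrapper name and the cores-per-socket value are assumptions: 4 matches the quad-core CPUs of Figure 1 and is not a value provided by MPI-Start.

#!/bin/bash
# Wrapper used as the mpi-start application together with the per-socket
# mode: one MPI process per CPU socket, one OpenMP thread per core of that
# socket. CORES_PER_SOCKET=4 is a site-specific assumption (quad-core CPUs
# as in Figure 1), not a value exported by MPI-Start.
export OMP_NUM_THREADS=${CORES_PER_SOCKET:-4}
exec ./hybrid_app "$@"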
The jobs depicted in Fig. 1 can be submitted with a JDL that specifies the WholeNodes, SMPGranularity and NodeNumber attributes, as shown below:

Executable = "/usr/bin/mpi-start";
Arguments = "-t openmpi user_app arg1 arg2";
NodeNumber = 2;
SMPGranularity = 8;
WholeNodes = True;
InputSandbox = {"user_app", ...};

The JDL requires 2 different hosts, each with 8 cores and used exclusively by the job. With the arguments shown, mpi-start would execute user_app with Open MPI using the process distribution of case (a) in the Figure, without any special treatment of the allocated resources. Case (b) would be achieved by adding the -psocket option to the Arguments attribute:

Arguments = "-t openmpi -psocket user_app arg1 arg2";

The behaviour of case (c) is obtained by using the -pnode option in the Arguments attribute:

Arguments = "-t openmpi -pnode user_app arg1 arg2";

In this case, using the per-core option would be equivalent to not using it, since there is a slot allocated per each of the available cores as defined in the JDL.

MPI-Start also provides a set of variables accessible from the user application that describe the resources allocated by the batch system as well as the placement of processes that is used for the execution.
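A minimal sketch of how a job wrapper could inspect that information is shown below; it only assumes the MPI_START_MACHINEFILE variable, which also appears in the Charm++ example of Section 4.2.1, plus Open MPI's rank variable listed later in Table 1.

#!/bin/bash
# Sketch: report the node placement prepared by MPI-Start before starting the
# real application. MPI_START_MACHINEFILE is the machinefile exported by
# MPI-Start; OMPI_COMM_WORLD_RANK is Open MPI's rank variable (see Table 1).
if test x"${OMPI_COMM_WORLD_RANK:-0}" = x"0" ; then
    echo "Hosts allocated to this run:"
    sort -u "${MPI_START_MACHINEFILE}"
fi
exec ./user_app "$@"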
3.2 Integration with ARC and UNICORE: towards EMI-ES

MPI-Start was originally developed for the integration with the gLite [14] middleware, although the design and architecture of MPI-Start are completely independent of this stack. With the latest releases, MPI-Start can be integrated with the ARC and UNICORE middleware using the native mechanisms of each platform. In the case of ARC [5], a new Runtime Environment that invokes MPI-Start is available. In the case of UNICORE [7], an Execution Environment is defined using an XML file where the different options available to the users are described. In order to use MPI-Start in such a way, we introduced the possibility of setting the parameters via command line arguments instead of environment variables.

In order to provide a unified interface for the different middleware stacks in EMI, the EMI Execution Service (EMI-ES) [18] was developed. This specification includes a Parallel Environment section that will provide a simple and user-friendly way of setting up parallel jobs. Each parallel environment is identified by a type (e.g. generic MPI, Open MPI, MPICH2) and a version. Additional option tags allow users to specify non-default behaviour, and the resource specification field allows the number of processes per slot or threads per process to be defined. Once EMI-ES implementations are available, MPI-Start will be adapted to act as the back-end of the Parallel Environments, thus all MPI-Start features will be easily accessible from the new interface.

4. Using MPI-Start for Generic Parallel Jobs

The MPI frameworks (OpenMPI, MPICH) are the most successful methods for managing parallel workloads distributed across a local network. However, numerous legacy applications using the parallel virtual machine (PVM) framework are still in general production use, and more importantly, several newer paradigms for handling distributed workloads have also emerged. These include Map/Reduce and Charm++.

The gLite JDL NodeNumber variable allows a user's job to request multiple job slots on a Computing Element. This facilitates allocation of worker-node resources, but not execution of the workload across those resources. The MPI-Start framework, however, was developed for exactly this purpose and has been successfully used for many years on the EGI infrastructure.

The authors investigated whether MPI-Start could be used as a way to handle non-MPI parallel workloads. The experiments were carried out on an MPI-enabled site with a shared filesystem (i.e. publishing MPI_SHARED_HOME). However, the methodology can be extended to accommodate other resource centre setup scenarios.

4.1 From mpiexec hacks to building blocks

The OSC mpiexec FAQ (http://www.osc.edu/~djohnson/mpiexec/index.php#Cute_mpiexec_hacks) describes how mpiexec can be used to execute remote processes and to copy files between nodes. These recipes can be used as primitive building blocks to develop more complicated applications. Moreover, all mpiexec implementations may be used to execute non-MPI executables. We recommend, however, that users should continue to use the MPI-Start framework, as this provides environment setup, file-copying, and explicitly launches mpiexec with the correct parameters.

4.2 Master/Slave Applications

Both OpenMPI and MPICH2 define a number of environment variables that are available to every MPI process. In particular, they export variables which relate to the number of process slots allocated to the job and to the MPI Rank of the processes. Using this information one can nominate a "master" or coordinator process in the set of processes. This allows us to accommodate master/slave use-cases. Table 1 lists a selection of MPI related environment variables.

Table 1: Example MPI related environment variables

MPI distribution | Rank Identifier       | Comm Size
OpenMPI          | OMPI_COMM_WORLD_RANK  | OMPI_COMM_WORLD_SIZE
MPICH2           | MPIRUN_RANK           | MPIRUN_NPROCS

A simple indication of how this can be used in practice can be found in the listing below.

#!/bin/bash
if test x"$OMPI_COMM_WORLD_RANK" = x"0" ; then
    ### Code for coordinating master process
else
    ### Code for slave processes
fi

Further information and full examples can be found on the EGI Parallel Computing Support User Guide web page (https://wiki.egi.eu/wiki/Parallel_Computing_Support_User_Guide).
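Building on the skeleton above, the following hypothetical sketch shows one way the Table 1 variables can drive a completely non-MPI workload: each process selects its own share of a set of independent input files. The input layout and the process_one executable are made-up names used only for illustration.

#!/bin/bash
# Hypothetical example: spread independent (non-MPI) tasks over the allocated
# slots using only the Open MPI variables of Table 1.
RANK=${OMPI_COMM_WORLD_RANK:?not launched through Open MPI/MPI-Start}
SIZE=${OMPI_COMM_WORLD_SIZE:?}
i=0
for f in ./inputs/*.dat ; do
    # each process takes every SIZE-th file, offset by its own rank
    if [ $(( i % SIZE )) -eq "${RANK}" ] ; then
        ./process_one "$f" > "result.${RANK}.$(basename "$f").out"
    fi
    i=$(( i + 1 ))
done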
It is not hard to see how the "coordinator" model can be used in a much more general sense.

4.2.1 A Charm++ master/slave application

Charm++ is an object based parallel programming system. Its execution model is message-driven, with Charm++ computations triggered based on the reception of associated messages. Moreover, it has an adaptive runtime system and supports message-driven migrateable objects.

To install Charm++ it must first be compiled from source. The authors were able to package Charm++ 6.21 into a compatible Redhat Package Management (RPM) format, and then deployed it at the Trinity College Dublin grid resource centre.

A simple example using the 3darray example code from the Charm++ distribution can be found below. Note in particular that we must convert the PBS machinefile into a Charm++ compatible format.

#!/bin/bash
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/charm/lib
export PATH=$PATH:/opt/charm/bin

if test x"$OMPI_COMM_WORLD_RANK" = x"0" ; then
    echo "Now building Charm++ 3darray example"
    make -f Makefile
    cat ${MPI_START_MACHINEFILE} | sed 's/^/host /g' > charm_nodelist
    charmrun ++remote-shell ssh ++nodelist charm_nodelist \
        ./hello +p${OMPI_UNIVERSE_SIZE} 32 +x2 +y2 +z2 +isomalloc_sync
fi
# The slave node does not actively do anything; its processes are
# launched by the Charm++ "remote-shell" invocation.

4.2.2 Map/Reduce - Hadoop on Demand

The authors are currently investigating how to use this method to implement Hadoop-on-Demand [22], a system for provisioning virtual Hadoop clusters using existing batch systems, in the grid environment.

5. GPGPU programming

The increasing capabilities of general purpose graphics processing units (GPGPUs) over the past few years have resulted in a huge increase in their exploitation by all the major scientific disciplines. With three of the top five clusters in the November 2011 Top500 list (http://www.top500.org/lists/2011/11) using NVIDIA GPGPUs, we would expect the number of GPGPU deployments at grid resource centres to grow significantly over the next few years. However, there are two major problems in supporting grid access to such resources. Firstly, there is currently no standardised way for resource centres to advertise/publish the availability of these resources. Secondly, there are deficiencies in current batch scheduling systems which make it difficult to guarantee exclusive access to those resources.
GPGPU application development is dominated by two API frameworks: OpenCL [23] and CUDA [24]. OpenCL supports heterogeneous compute environments such as AMD and Intel CPUs, AMD, Intel and NVIDIA GPGPUs, and Cell Broadband Engine processors. CUDA applications currently run on NVIDIA GPGPUs only; however, the GNU Ocelot project [25] is attempting to enable CUDA application development for non-NVIDIA processors. CUDA applications offer a performance advantage over OpenCL [26].

GPGPUs are highly efficient for "single instruction, multiple data" (SIMD) models of execution. The authors are able to report successfully running a hybrid MPI/OpenCL application on the grid infrastructure using 16 individual machines and 30 NVIDIA GPGPUs (each node had 2 GPGPUs, but the master node did not use any of its GPGPUs). MPI was used for inter-node communication, and OpenCL was responsible for device allocation and workload distribution to Intel CPUs and NVIDIA GPGPU devices.

5.1 GPGPU Grid Integration Challenges

Although it is fairly straightforward to assemble grid-enabled worker nodes with GPGPUs, integrating them as first class resources, such as CPU and storage resources, into the grid infrastructure is a much more challenging problem. The fundamental issues can be classified as follows:

• Only partial batch system integration support,
• Inadequate Operating System hardware device level security,
• Lack of a standardised grid glue-schema support for GPGPUs, and
• Lack of information system plugins.

None of these issues are huge technical hurdles and, in particular, most recent versions of batch systems such as LSF, Torque, SGE and Slurm have some level of support. The batch scheduler which does have issues is Maui, but some experimental patches are available for this. Glue-schema support requires standardisation and agreement within the grid community.

6. Conclusions

The execution of parallel applications in grid environments is a challenging problem that requires the cooperation of several middleware tools and services. Although the support from middleware and the infrastructure is constantly improving, users still need guidance and support from experts that help them with day to day issues. Both the SA3 task, devoted to providing support for MPI jobs, and the recently created MPI Virtual Team help users to get their parallel jobs running in the available resources.

The latest developments in MPI-Start, a tool that provides a unique layer for several MPI implementations, have introduced better control of job execution, the possibility of running hybrid MPI/OpenMP applications and the integration with the ARC, gLite and UNICORE middleware stacks. Users are fully abstracted from the underlying middleware by the unique interface of MPI-Start and can easily migrate their applications from one middleware provider to another.

The execution of parallel jobs is not limited to MPI. Other generic parallel jobs can be executed with the help of MPI-Start and MPI tools, as shown with Charm++ or Hadoop. An introduction to the use of GPGPU programming and possible integration methods in the grid infrastructure was also covered in the workshop.

Acknowledgments

The authors acknowledge support of the European Commission FP7 programme, under contract number 261323 through the project EGI-InSPIRE (http://www.egi.eu/) and under contract number 261611 through the project EMI (http://www.eu-emi.eu/).

References

[1] A. Bayucan, R. L. Henderson, C. Lesiak, B. Mann, T. Proett, and D. Tweten, Portable batch system: External reference specification, Technical report, MRJ Technology Solutions, 1999.

[2] K. Dichev, S. Stork, R. Keller, E. Fernandez, MPI Support on the Grid, Computing and Informatics, volume 27, pp. 213–222, 2008.

[3] European Grid Initiative (EGI), http://www.egi.eu/

[4] EGI Virtual Teams, https://wiki.egi.eu/wiki/Overview_of_Virtual_Team_projects

[5] M. Ellert et al., Advanced Resource Connector middleware for lightweight computational Grids, Future Generation Computer Systems, volume 23, pp. 219–240, 2007.

[6] European Middleware Initiative (EMI), https://twiki.cern.ch/twiki/bin/view/EMI

[7] D. Erwin, UNICORE - A Grid Computing Environment, Lecture Notes in Computer Science, volume 2150, pp. 825–834, 2001.

[8] E. Fernandez, MPI Hands on Training, EGI User Forum, April 2011.

[9] E. Gabriel et al., Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, Lecture Notes in Computer Science, volume 3241, pp. 97–104, 2004.

[10] W. Gentzsch, Sun Grid Engine: towards creating a compute power grid, Proceedings of the first IEEE/ACM International Symposium on Cluster Computing and the Grid, pp. 35–36, 2001.

[11] W. Gropp, MPICH2: A new start for MPI implementations, Proceedings of the 9th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp. 7, 2002.

[12] W. Gropp, E. Lusk, N. Doss, and A. Skjellum, A high-performance, portable implementation of the MPI message passing interface standard, Parallel Computing, volume 22(6), pp. 789–828, 1996.

[13] N. T. Karonis et al., MPICH-G2: A grid-enabled implementation of the message passing interface, J. Parallel Distrib. Comput., volume 63(5), pp. 551–563, 2003.

[14] E. Laure et al., Programming the Grid with gLite, EGEE-PUB-2006-029, 2006.

[15] M. Litzkow, M. Livny, and M. Mutka, Condor - a hunter of idle workstations, Proceedings of the 8th International Conference on Distributed Computing Systems, 1988.
[16] J. Marco et al., The Interactive European Grid: Project Objectives and Achievements, Computing and Informatics, volume 27, pp. 161–173, 2008.

[17] MPI Virtual Team, https://wiki.egi.eu/wiki/VT_MPI_within_EGI

[18] B. Schuller et al., EMI Execution Service Specification, v1.07, 2011.

[19] J. M. Squyres, A component architecture for LAM/MPI, Proceedings of the 9th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 379–387, 2003.

[20] J. Walsh et al., Message Passing Interface - Current Status and Future Developments, EGI Technical Forum, 2010.

[21] S. Zhou, LSF: Load sharing in large-scale heterogeneous distributed systems, Proceedings of the Workshop on Cluster Computing, 2002.

[22] Hadoop On Demand, http://hadoop.apache.org/common/docs/r0.17.0/hod.html

[23] M. Scarpino, OpenCL in Action: How to accelerate graphics and computation, Manning Publications Co., ISBN 9781617290176, 2011.

[24] D. Kirk, Programming Massively Parallel Processors: A Hands-on Approach, Morgan Kaufmann Publishers, ISBN 9780123814722, 2010.

[25] GNU Ocelot Homepage, http://code.google.com/p/gpuocelot/

[26] J. Fang, A. L. Varbanescu and H. Sips, A Comprehensive Performance Comparison of CUDA and OpenCL, The 40th International Conference on Parallel Processing (ICPP'11), Taipei, Taiwan.