CS693 Grid Computing
UNIT – IV: NATIVE PROGRAMMING AND SOFTWARE APPLICATIONS

Syllabus: Desktop supercomputing – parallel programming paradigms – problems of current parallel programming paradigms – desktop supercomputing programming paradigms – parallelizing existing applications – grid-enabling software applications – needs of the grid users – methods of grid deployment – requirements for grid-enabling software

Desktop Supercomputing: Parallel Computing – Historical Background

MIMD

 The language of desktop supercomputing, CxC, combines the advantages of C, Java, and Fortran and is designed for MIMD architectures
 Historically, any parallel computer not following the SIMD approach (one program on one processor controls all others) automatically fell into the MIMD category

Parallel Asynchronous Hardware Architectures

MTech CSE (PT, 2011-14) SRM, Ramapuram 1 hcr:innovationcse@gg

List of popular MIMD hardware architectures:
 Symmetric Multiprocessing Systems (SMP)
 Massively Parallel Processing Systems (MPP)
 Cluster computers
 Proprietary supercomputers
 Cache-Coherent Non-Uniform Memory Access (CC-NUMA) computers
 Blade servers
 Clusters of blade servers

MIMD computer classification
 Single-Node/Single-Processor (SNSP)
 Single-Node/Multiple-Processors (SNMP)
 Multiple-Node/Single-Processor (MNSP)
 Multiple-Node/Multiple-Processor (MNMP) systems

Single-Node/Single-Processor (SNSP)
 also known as von Neumann computers
 same as Flynn’s Single-Instruction-Single-Data (SISD) category

Single-Node/Multiple-Processors (SNMP)

computers having multiple processors within the same node accessing the same memory
 Representatives are blade servers, symmetric multiprocessing (SMP) systems, CC-NUMA architectures, and other custom-made high-performance computers
 Array and vector computers (SIMD) would fall into this category

Multiple-Node/Single-Processor (MNSP)

 distributed-memory computers represented by a network of workstations

Multiple-Node/Multiple-Processor systems (MNMP)

 multiple shared-memory computers (SNMPs) connected by a network
 MNMPs are a loosely coupled cluster of closely coupled nodes
 Typical representatives of loosely coupled shared-memory computers are SMP clusters or clusters of blade servers

Parallel Programming Paradigms

 Single Node Single Processor (SNSP)
 Single Node Multi Processor (SNMP)
 Multi Node Single Processor (MNSP)
 Multi Node Multi Processor (MNMP)

Single Node Single Processor (SNSP)

 preemptive multitasking is used as the parallel processing model
 All processes share the same processor, which spends only a limited amount of time on each
o so their execution appears to be quasi-parallel
 The local memory can usually be accessed by all threads/processes during their execution time
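The quasi-parallel behavior described above can be sketched in plain Python (illustrative, not CxC): a round-robin scheduler gives each "process" one time slice in turn, so a single processor interleaves their execution.

```python
# A sketch of preemptive multitasking on one processor: the scheduler gives
# each process a small time slice, so execution is only quasi-parallel.
from collections import deque

def run_round_robin(processes):
    # each "process" is a generator; one next() call stands in for one slice
    ready = deque(processes)
    trace = []
    while ready:
        proc = ready.popleft()         # the single processor picks a process
        try:
            trace.append(next(proc))   # runs it for one time slice ...
            ready.append(proc)         # ... then preempts and requeues it
        except StopIteration:
            pass                       # process finished; drop it
    return trace

def process(name, steps):
    for i in range(steps):
        yield f"{name}{i}"

trace = run_round_robin([process("A", 2), process("B", 2)])
print(trace)  # interleaved: ['A0', 'B0', 'A1', 'B1']
```

The interleaved trace shows why execution only *appears* parallel: at any instant exactly one process is running.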

Single Node Multi Processor (SNMP)

 based on symmetric multiprocessing hardware architecture
 Shared memory is used as the parallel processing model
 each processor works on processes truly in parallel, and each process can access the shared memory of the compute node

Data access in SNMP computers
 the processors, connected through a high-speed connection fabric, present a single shared-memory system
 The OS has been parallelized so that each processor can access the system memory at the same time
 The shared-memory programming model is easy to use
 all processors are able to run a partitioned version of sequential algorithms created for single-processor systems

Shared Memory Paradigm in Symmetric Multiprocessing
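A minimal sketch of the shared-memory model, with Python threads standing in for SMP processors (the names and data are illustrative): every worker reads and writes the same memory, with a lock serializing the updates.

```python
# Shared-memory sketch: several "processors" (threads) update one shared
# structure; a lock guards the concurrent writes.
import threading

shared = {"total": 0}          # memory visible to every "processor"
lock = threading.Lock()

def worker(chunk):
    # each processor works on its own partition of the data ...
    partial = sum(chunk)
    # ... and publishes its result directly into the shared memory
    with lock:
        shared["total"] += partial

data = list(range(1, 101))
threads = [threading.Thread(target=worker, args=(data[i::4],))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared["total"])  # 5050 (no messages were exchanged; memory is shared)
```

Note how close this stays to the sequential algorithm, which is exactly the ease-of-use argument made above for shared memory.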

Disadvantage of SNMP systems: scalability is limited to a small number of processors
 Limitations are based on system design
 include problems such as bottlenecks with the memory connection fabric or I/O channels
 every memory and I/O request has to go through the connection or I/O fabric

Methods for shared-memory (asynchronous) parallelism:
 OpenMP
 Linda
 Global Arrays (GA)

Data-parallel synchronous parallelism:
 High Performance FORTRAN (HPF)

Multi Node Single Processor (MNSP)

The programming model for standard MNSP (distributed-memory) computers, such as clusters and MPP systems, usually involves a message-passing model

Message-Passing Model

 parallel programs must explicitly specify communication functions on the sender and the receiver sides
 When data needed in a computation are not present on the local computer, this involves
o issuing a send function on the remote computer holding the data
o issuing a receive function at the local computer
 The process for passing information from one computer to another via the network includes:
o data transfer from a running application to a device driver
o the device driver assembles the message into packets, which are then sent through networks and cables to the receiving computer
o on the receiving computer’s side, the mirrored receiving process has to be initiated: the application triggers “wait for receiving a message” using the device driver
o finally, the message arrives in packets, is reconstructed, and is handed over to the waiting application
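The explicit send/receive steps listed above can be sketched with Python queues standing in for the network channel and device drivers (the names are illustrative): each send must be matched by a receive on the other side.

```python
# Message-passing sketch: explicit, matched send/receive operations.
import threading
import queue

to_remote = queue.Queue()  # channel: local -> remote
to_local = queue.Queue()   # channel: remote -> local

def remote():
    data = to_remote.get()     # remote computer blocks on "receive"
    to_local.put(sum(data))    # then issues the matching "send"

t = threading.Thread(target=remote)
t.start()
to_remote.put(list(range(10)))  # local computer issues an explicit "send"
result = to_local.get()         # and waits on the matching "receive"
t.join()
print(result)  # 45
```

If either side forgets its half of the exchange, the other blocks forever, which is precisely the synchronization hazard listed among the disadvantages below.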

Message Passing Paradigm Architecture

Disadvantages of the Message-Passing Model
 time delay/loss
o due to waiting on both transmitting and receiving ends
 synchronization problems, such as
o applications can wait indefinitely if a sender sends data to a remote computer not ready to receive it
 data loss
o each sender needs a complementing receiver (if one fails, data gets lost)
 difficult programming
o algorithms have to be specifically programmed for the message-passing model
o sequential algorithms cannot be reused without significant changes

Distributed Shared Memory (DSM) or Virtual Shared Memory

 An approach used to overcome the difficult-to-program problem of the message-passing model
 simulates the shared-memory model on top of a distributed-memory environment
 provides the function of shared memory, even though physical memory is distributed among the nodes

Disadvantage: loss of performance
 every time a processor tries to access data on a remote computer, the local computer performs message passing of a whole memory page
 this leads to huge network traffic; the network becomes such a significant bottleneck that the decreased performance is unacceptable for most applications

Problems of Current Parallel Programming Paradigms

 parallel programming is a new art for most software developers
 parallel computers are very expensive and not available to most software developers
 complexity of parallel programming
o with different architectures, there are also different parallel programming paradigms
o there is no satisfying model for Multiple-Node/Multiple-Processor computers
o simulated shared memory leads to unacceptable performance for the application
 the Shared-Memory Programming Model and the Message-Passing Programming Model each offer advantages and disadvantages
o algorithms implemented in one model have to be reprogrammed with significant effort to run in the other
o there is no effective parallel processing paradigm that works for both SNMP and MNSP systems
o mixing models is unacceptable due to the complexity it introduces into programs, which makes maintenance difficult
o the complexity of programming and the lack of a standardized programming model for hybrid compute clusters is an unsatisfying and unacceptable situation

Desktop Supercomputing Programming Paradigms

Connected Memory Paradigm

 allows developers to focus on building parallel algorithms by creating a virtual parallel computer consisting of virtual processing elements
 It then effectively maps, distributes, and executes programs on any available physical hardware
o mapping the virtual parallel computer to the available physical hardware allows creation of algorithms independent of any particular architecture
 Desktop Supercomputing makes the parallelization process simple, even for complex problems.

It enables:

Ease of programming
 The language CxC allows developers to design algorithms by defining a virtual parallel computer
o instead of having to fit algorithms into the boundaries and restrictions of a real computer

Architecture Independence
 Executables run on any of the following architectures without modification: SNMP, MNSP, or MNMP
 Today developers must use shared memory on SNMP and message passing on MNSP; the architectures are distinctly different, requiring significant effort to rewrite programs for the other architecture

Scalability
 Developers can create programs on small computers and run these same programs on a cluster of hundreds or thousands of connected computers
 This scalability allows testing of algorithms in a laboratory environment and tackling problems of sizes not previously solvable

Enhancement
 It has the ability to unleash the performance of MNMP computers, which have the best performance/price ratio of all parallel computers

Desktop Supercomputing with CxC
 offers the advantages of message passing (using distributed-computing solutions) with the easier programmability of shared memory

Parallel Programming in CxC

 CxC is the language of Desktop Supercomputing
 When using CxC, you follow an intrinsic parallelization process by creating a virtual parallel computer that may consist of millions of parallel processing elements communicating with each other via a topology

Every CxC program consists of three main creation steps:
 Specify the Virtual Parallel Computer
o create a virtual parallel computer architecture
o consisting of array controllers and parallel processing units (PPUs)
 Define the Communication Topology
o define the communication topology between the PPUs
 Implement the Parallel Programs
o implement the programs running on each PPU

Example

//// My first CxC program hello.cxc
//// controller and unit declaration

controller ArrayController     // create processor controller
{
    // create parallel processors
    unit ParallelProcessor[30];
}

//// no topology is needed since there is no communication
//// between processors taking place

//// program implementations

main hello(10)                 // execute 10 times
{
    program ArrayController    // program for all processors
    {                          // of declared controller
        println("hello parallel world!");
    }
}

 This simple CxC program creates 30 processors that all run the same program
 The parallel machine will be executed 10 times
 The result is the following output, 300 times (30 processors × 10 executions):

hello parallel world!
hello parallel world!
hello parallel world!
...

Parallelizing Existing Applications

CxC solution

 removes computational, platform, and scalability barriers
 significantly lowers the overall time and costs of development
 enables a new parallel computing paradigm, which provides the best platform for highly parallel applications
 offers an easy way to parallelize existing serial applications


 works as an intermediate “glue” for FORTRAN, C, and C++ functions
 maintains the original performance that has been achieved in these libraries
 offers a great way to simplify the development and implementation of parallel algorithms
o with a huge number of interdependent elements and their interactions
 simulation of interacting particles is considered among the most complex and challenging applications, requiring tremendous computational resources

Grand Challenges
 the sheer size of the mathematical problems
 their very complexity combined with the required computational power

Grid Enabling Software Applications

 The Needs of Grid Users
 Grid Deployment Criteria
 Methods of Grid Deployment
 When to Grid-enable Software
 Requirements for Grid-Enabling Software
 Grid Programming Tools and Expertise
 The Process of Grid-enabling Software Applications
 Grid-enabling a Mainstream Software Application: An Example

Needs of the Grid Users

Three groups of stakeholders
 Application End Users
 Business Enterprises
 Application Developers

Application End Users

 primary need: grid-enabled applications must be simple to use
 the benefits of the grid need to outweigh the difficulty users must incur in order to use it
 for adoption, they must not have to fundamentally change the way they use an application
 intolerant of any requirement for extensive configuration or management
o a zero-configuration, zero-administration “plug and play” approach is de rigueur for end users

 the benefits of the grid need to outweigh the difficulty they must incur in order to use it
o determined by calculating the return on investment that grid implementation provides
o straightforward costs, like the cost of grid infrastructure and project-based costs for implementation
o “soft” costs of grid deployment, such as the expense associated with business process engineering
 simplicity of use
 control and management
o the enterprise needs the grid to be simple to manage
o many users and many resources must be managed according to various business rules and shifting priorities

Application Developers

 Types
o independent software vendors (ISVs)
o in-house developers
o third-party solutions integrators (SIs)
 the benefits of the grid must outweigh the difficulty the developer must incur in creating new applications or changing existing applications


 grid capability needs to be a software feature
 it must be simple to develop grid-enabled applications

Grid Deployment Criteria (O)

To determine if it is worthwhile to deploy a compute grid (whether local, enterprise-wide, or global in scope), it is essential to establish whether there is sufficient benefit or return derived from the required investment of effort, time, money, and resources.

Three significant benefits can be achieved with Grid Computing:
 it is capable of providing powerful processing capacity that meets the extreme requirements of high-performance computing applications
 it allows computationally intensive software applications to run significantly faster
 it can raise the efficiency of computing resources in an enterprise network
o from the typical 10 percent usage of desktops and 30 percent utilization of server capacity to the 80 to 90 percent range

Methods of Grid Deployment

Two different methods
 the scripted batch queue distribution method
 the programmatic method of coding for parallel distributed processing

The Scripted Batch Queue Distribution Method

 Distributed Resource Management (DRM) solutions use the batch queue method
o examples: Sun ONE Grid Engine, Platform LSF
 Grid deployment can be achieved with little or no code modification by
o replicating an application across several computers
o distributing computing jobs through scripting
 Multiple jobs from application users are submitted and queued
 the DRM software allocates the jobs to the available computing resources as efficiently and appropriately as possible

Example
 ten jobs that each require one hour to process
 can all be completed in one hour when distributed across a grid of ten identical computers

Conditions
 there is a large quantity of jobs for a single application to process
 there is a large pool of computing resources available and capable of performing the processing
 the application is scriptable
 there is sufficient MIS/IT expertise to configure, deploy, and manage the grid
 if there is only one job to be processed, for example, or if there is only one computer to do the processing, the time required to complete the job(s) cannot be reduced

The Programmatic Method of Coding

 allows a single, large job to be broken down into several smaller tasks
 tasks are sent out and processed individually on separate computers
 results are returned to be recompiled
 instead of requiring the application to be installed,
o each task provides the compute resource with an instruction set
o and only the portion of data it requires for the computation
 can be effective with even a single job and as few as two computing resources
 can be applied to both scriptable and non-scriptable applications
 requires software development expertise and access to application source code
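The programmatic method can be sketched in Python (an illustrative analogy, not an actual grid framework): one job is split into task slices, each slice carries only the data it needs and is processed separately, and the partial results are recompiled into one answer.

```python
# Programmatic-method sketch: split one job into tasks, process the tasks
# independently, and recombine the partial results.
from multiprocessing.pool import ThreadPool  # threads keep the sketch portable

def task(slice_):
    # the "instruction set" each compute resource runs on its data portion
    return sum(x * x for x in slice_)

def run_job(data, n_tasks):
    chunk = (len(data) + n_tasks - 1) // n_tasks
    slices = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPool(n_tasks) as pool:
        partials = pool.map(task, slices)   # tasks processed separately
    return sum(partials)                    # results recompiled

data = list(range(1000))
answer = run_job(data, 10)
print(answer)  # 332833500, identical to the serial sum of squares
```

With ideal scaling this is exactly the arithmetic cited in the notes: one task per resource means a one-hour job takes thirty minutes on two machines and six minutes on ten.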

Example
 a single job that requires one hour to process could be completed in
o thirty minutes on two computers
o six minutes across a grid of ten identical computers

When to Grid-enable Software (O)

Applications are categorized as either scriptable or non-scriptable.

Three key factors to consider in determining the most appropriate method of grid-deploying a scriptable application:
 the number of jobs to be processed
 the number of available resources
 the size of the jobs and the capacity of the resources

It may be preferable to distribute scriptable applications using the programmatic method if:
 there are few jobs
 there are few resources
 typical job size is very large
 the capacity of most resources is small
 user self-sufficiency is preferred to IT control and management
 the frequency of job submission is sporadic
 software deployment on each compute resource is prohibitive for reasons of cost, provisioning, or inadequate system requirements

Hybrid approach
 using standards such as the Distributed Resource Management Application API (DRMAA) and OGSA, programmatic solutions can interface with DRM software to create an integrated environment where both scriptable and non-scriptable applications can be distributed together on a common grid infrastructure

Requirements for Grid-Enabling Software

Two requirements must be met in order to modify software for grid deployment:
1. access to the application source code
2. the ability to modify it
 i.e., both the legal right and the development expertise necessary to change an uncompiled application

Three groups meet these requirements.

Independent Software Vendors (ISVs)
 develop and commercially distribute software applications
 ISVs own their software code and have software developers in their employ

Academic Institutions and Enterprises in Research-Intensive Industries
 example: life-sciences organizations that use open source software applications
 open source software licenses permit modification of the code
 and allow redistribution of the modified version, subject to certain conditions

Enterprises That Have Developed Their Own Proprietary Software Applications
 developed for securing competitive advantage through superior implementation of information technology
 as these applications are proprietary, enterprises typically own or in some fashion retain intellectual property rights to the source code

Grid Programming Tools and Expertise

The primary tools for modifying code to enable parallel distributed processing have been protocols such as
 MPI (Message Passing Interface)
 PVM (Parallel Virtual Machine)

GridIron XLR8

 a development tool that simplifies the process of implementing application-embedded parallel processing

Consists of two parts
 an application developers’ toolkit, or SDK,
o comprised of APIs that are added to the source code of a computationally intensive application
o documentation
o sample applications
o other tools and materials to assist a software developer in modifying their code
 runtime software
o installed on each computer in a grid, providing the processing power

GridIron XLR8 provides APIs at a high level of abstraction
 developers do not have to worry about communications-level programming
 they simply work with familiar variables and data as they would in a serial program

The GridIron XLR8 runtime software
 allows the computers on the grid to discover each other
 automatically sets up a processing network and distributes work
 recovers from failure
 this autonomic capability eliminates the need to code or manage the processing environment outside the application

The Process of Grid-enabling Software Applications

The process steps must have the following three characteristics:
 they can be split into smaller tasks
 each task can be processed on a separate computer
 the results from each task can be returned and re-assembled into one final result

Analysis

To grid-enable an application for distribution, it must first be analyzed prior to modification of the software source code.

Identifying Hot Spots
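Execution profiling, the analysis technique the notes recommend for locating hot spots, can be sketched with Python's built-in cProfile; the application and its nested-loop hot spot here are purely illustrative.

```python
# Profiling sketch: cProfile flags the function where the majority of the
# execution time is spent, which is the candidate partition point.
import cProfile
import io
import pstats

def hot_spot(n):                 # nested iterative code (FOR-loop style)
    total = 0
    for i in range(n):
        for j in range(n):
            total += i * j
    return total

def application():
    hot_spot(300)                # most execution time is spent here
    return sum(range(1000))      # cheap bookkeeping code

pr = cProfile.Profile()
pr.enable()
application()
pr.disable()

s = io.StringIO()
pstats.Stats(pr, stream=s).sort_stats("tottime").print_stats(3)
report = s.getvalue()
print("hot_spot" in report)  # True: the profiler points at the hot spot
```

The few lines inside `hot_spot` account for nearly all of the run time, exactly the "few lines of code, high percentage of execution time" pattern described below.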

 identify the partition points at which the application is most appropriately split into smaller tasks
 achieved by locating the computational hot spots in the application algorithm, where the majority of the execution time is spent running an encapsulated sub-algorithm
 typically found within portions of iterative code,
o i.e., nested FOR or WHILE loops
 execution profiling can be used to empirically identify partition points
 by highlighting which few lines of code account for a high percentage of the application execution time

Tightly Coupled versus Embarrassingly Parallel Algorithms

Application algorithms fall into the two categories.

Embarrassingly Parallel Algorithms
 some algorithms are easily segmented into tasks that can be processed entirely independently
 scriptable applications of this kind can be deployed using the batch queue method


 many non-scriptable applications can be modified using the programmatic method
 example: the MPEG encoder application

Tightly Coupled Algorithms
 have dependencies on interim communication and exchange of results in the computational process
 do not lend themselves to batch queue distribution
 require code modification
 generally require more effort and are more difficult to modify through programmatic means

Operating Environment

 determine what specific items are required for each task to be processed on a separate computer
 this means identifying the application requirements for local files, libraries, or databases,
 as well as special licensing or hardware requirements
 these need to be dynamically provided to the grid as part of the task
o if they are not statically pre-installed on all compute nodes
o the GridIron XLR8 framework provides support for the automated distribution of these files

Results Generation
 identify what kind of results the task computation will generate
 determine how task computation results are to be stored and/or processed
 task results will arrive asynchronously, in a different order than they were defined

Application Modifications

The application code can then be modified using the GridIron XLR8 development tool
 to allow the job to be split into distributed tasks
 sent to the grid for computation
 and to re-assemble the individually returned task results into a single aggregate job result

Defining Tasks
 an instance of the application called the distributor needs to be created
 the distributor contains the GridIron XLR8 defineTask method,
o which divides the job into smaller tasks
 at run time, the GridIron XLR8 framework will repeatedly invoke the function to create new tasks whenever compute resources are available
 through defineTask, the distributor controls the size of each task
 task size can be varied based on constraints of the deployment environment, such as
o interconnect speed
o whether the network is static or dynamic
o reliability and availability of the network and computers
o the minimum and maximum number of computers making up the grid
 a default task size is selected which maximizes the ratio of the task computation time relative to the time required for setup, result recompiling, and communication
 the size of a task should be configurable
o so that optimization can be achieved without making further changes to the application code
 the GridIron XLR8 framework will invoke defineTask until the entire job is complete

Task Computation

 create an instance of the application called the executor,
 which defines a new function completely encapsulating the algorithm contained inside the selected hot spot
 this is accomplished with the GridIron XLR8 doTask method
 this is where the computationally intense part of the application code goes
 GridIron XLR8 distributes the doTask method to the compute nodes for execution
 upon completion of the computation, the framework will automatically return results back to the distributor
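A hedged Python sketch of the distributor/executor pattern described above: the function names only mirror the roles of defineTask, doTask, and checkTaskResults and are NOT the actual GridIron XLR8 API; a thread pool stands in for the grid.

```python
# Distributor/executor sketch: split a job into tasks, run each task on a
# worker, and re-assemble the asynchronously returned results.
from concurrent.futures import ThreadPoolExecutor, as_completed

def define_tasks(job, size):              # distributor role: split the job
    for i in range(0, len(job), size):
        yield i, job[i:i + size]

def do_task(index, part):                 # executor role: the "hot spot"
    return index, sum(part)

def check_task_results(results):          # distributor role: re-assembly
    # results arrive asynchronously, so sort by task index before aggregating
    return sum(value for _, value in sorted(results))

job = list(range(100))
with ThreadPoolExecutor(max_workers=4) as grid:
    futures = [grid.submit(do_task, i, part)
               for i, part in define_tasks(job, 25)]
    results = [f.result() for f in as_completed(futures)]
total = check_task_results(results)
print(total)  # 4950, identical to the serial result
```

Tagging each task with its index and sorting before aggregation handles the out-of-order arrival of results that the notes warn about.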

Result Re-assembly
 modify the distributor to define another new function containing the results-handling portion of the algorithm
 this is the GridIron XLR8 checkTaskResults method,
 it contains code originally found within or immediately following the application hot spot
 at run time, after processing is complete, the GridIron XLR8 framework will ensure all files and/or data defined by the executor have been transmitted back to the distributor
 task results will be returned in a different order than that in which they were defined, so results may have to be stored until all results have been received, whereupon the final, aggregated result can be generated

Grid-enabling a Mainstream Software Application: An Example

Video Encoding

 video encoding is one of the applications used for digital content creation
 many processes in the creation of digital audio, video, and graphics are computationally intensive
 for example, rendering, compositing, animation, and encoding/decoding require large amounts of processing and handle large amounts of data
 video encoding is one such computationally intensive digital content creation software process
 Grid Computing provides powerful processing capacity at low cost using commodity computing hardware
 it therefore offers tremendous potential as a means of accelerating digital video encoding
 an open source MPEG-4 software encoding application was grid-enabled to allow video encoding to be performed more rapidly on a compute grid
 video encoding is a process for compressing raw digital video data by several factors in order to make storage and transmission more practical

MPEG-4 encoding involves a number of tasks, including:
 image processing
 format conversion
 quantization and inverse quantization
 discrete cosine transform (DCT)
 inverse DCT
 motion compensation
 motion estimation

Motion-compensated prediction is highly computationally intensive and can require several gigaflops of processing power.

The Need for Speed

Three pronounced trends

1) The growing popularity of HDTV (high-definition television)
 the NTSC standard, at a resolution of 720 by 480 pixels and a frame rate of 30 frames per second, is delivered at a bit rate of 249 Mbps and requires approximately 1.9 GB of storage per minute
 the HD standard of 1920 by 1080 pixels at the same 30 fps frame rate is delivered at a bit rate of 1.5 Gbps and requires approximately 1.1 TB of storage per minute of raw video
 with the bit rate increased by a factor of six and the storage requirement increased by over two orders of magnitude, HD has much more rigorous compression requirements than the current NTSC standard

2) New standards
 in addition to new video presentation standards, each new digital coding standard is similarly satisfied only through greater computing power

3) Digital video (DV) cameras
 DV cameras are among the fastest-selling consumer electronics products

Current Solutions

Two general categories of solutions currently exist

Hardware solutions
 hardware blocks and loosely coupled coprocessors are available to speed the encoding process

Software encoding solutions
 afford superior portability and flexibility
 but are expensive and difficult to scale to achieve fast encoding

Grid Deployment of Video Encoding

 a number of academic studies have investigated the applicability of distributed processing and data distribution methods for the purposes of video encoding
 the goal is to develop a complete implementation that allows distributed video encoding to be successfully migrated from the research lab to commercial environments

Requirements for Broad Marketplace Adoption

 simple enough for application end users to deploy and use without having to acquire special skills or knowledge
 sufficiently robust to operate reliably outside of controlled lab conditions, where environments may have varied hardware, software, and network elements
 fast and easy for commercial software developers to integrate into their video encoding applications

Overview of MPEG-4 Encoder

The MPEG4IP package includes
 an MPEG-4 AAC audio encoder
 an MP3 encoder
 two MPEG-4 video encoders
 an MP4 file creator and hinter
 an IETF standards-based streaming server
 an MPEG-4 player that can both stream and play back from a local file

MPEG4IP’s tools are available on the Linux platform; various components have been ported to Windows, Solaris, FreeBSD, BSD/OS, and Mac OS X.

Overview of GridIron XLR8

GridIron XLR8 consists of two parts:
 an application developers’ toolkit, or SDK, comprised of APIs, plus documentation, sample applications, and other tools
 runtime software that is installed on each computer in a network, providing additional processing power

Using GridIron XLR8 to add distributed processing to the MPEG4IP video encoder is beneficial in three ways:
 it provides a simple and rapid development environment
 it eliminates the need to code or manage the processing environment outside the application
 it is simple for end users to work with the final compiled and installed version of the software, with the distributed processing embedded directly into the application
 once compiled and installed, users can benefit from the speed of distributed computing

Distributed Computing Strategy

 The MPEG-4 specification makes use of a hierarchy of video objects to represent content
 a Video Session is the top tier in the hierarchy, representing the whole MPEG-4 scene
 each Video Session, or scene, is populated with Video Objects, which can be encoded as single or multiple Video Object Layers
 a Video Object Layer is in turn comprised of Groups Of Pictures (GOPs)
 a Group Of Pictures is a collection of three different picture types
o I-pictures, or intra pictures,
. are moderately compressed and coded without reference to other pictures
o P-pictures, or “predictive” pictures,
. take advantage of motion-compensated prediction from a preceding I- or P-picture to allow much greater compression
o B-pictures, or bi-directionally-predictive pictures,
. use motion-compensated prediction from both past and future pictures to allow the highest degree of compression
 GOPs have the same number of pictures per sequence and are similar in size
 segmenting the video data for distribution at the GOP tier of the hierarchy takes advantage of these characteristics to effectively treat the MPEG-4 encoding process as a parallel software algorithm
 the size and structure of the GOPs make them convenient to distribute for parallel processing at a reasonable level of granularity, to accommodate typical CPU and bandwidth capabilities, and to achieve reasonable load balancing across multiple processors

Implementation

The implementation process undertaken to grid-enable and test the application:

Application Modification
 identification of the appropriate partition points in the encoder, i.e., areas where data are encoded into I-, P-, and B-pictures
 files were modified as required in order to segment and distribute the video data based on the GOP strategy
 the number of frames in a GOP is configurable and was set to 20 frames
 the raw video data comprised a total of 1,824 frames
 at a partitioning of 20 frames, this yielded a total of 92 partitions
 each partition was treated as an individual processing task and encoded by an individual XLR8 runtime software peer
 this yielded 92 compressed files, which were asynchronously returned and appropriately multiplexed back into a single, new compressed file
 encoding occurred at a target bit rate of 1.5 Mb per second

Hardware
 a grid comprised of 13 IBM xSeries 335 servers was used for this implementation
 each with dual 2.0 GHz Intel XEON processors
 1.0 GB RAM
 Windows 2000 Server operating system
 the modified MPEG4IP MPEG-4 software encoder was installed on one of the 13 machines
 only the GridIron XLR8 peer runtime software was installed on the remaining 12 xSeries computers, which were configured as individual servers and connected together with a gigabit switch

Data
 ~60 seconds of NTSC broadcast-quality video (YUV 4:2:0) at a resolution of 720 × 480 pixels @ 29.97 fps
 the size of the uncompressed file was 923,400 KB (~900 MB)
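The partitioning and file-size figures above, together with the ~98.5 percent reduction reported in the Compression section, can be checked directly; the only assumption is that YUV 4:2:0 stores 1.5 bytes per pixel.

```python
# Checking the implementation arithmetic: GOP partition count, raw file
# size, and the compression ratio of the encoded output.
import math

frames, gop_frames = 1824, 20
partitions = math.ceil(frames / gop_frames)
print(partitions)                       # 92 tasks, one per GOP chunk

bytes_per_frame = 720 * 480 * 1.5       # YUV 4:2:0 = 1.5 bytes per pixel
raw_kb = frames * bytes_per_frame / 1024
print(raw_kb)                           # 923400.0 KB, the raw file size

encoded_kb = 13_798                     # encoded MPEG-4 file size
reduction = round((1 - encoded_kb / raw_kb) * 100, 1)
print(reduction)                        # 98.5 percent reduction
```

The 1,824 frames also match the stated data: ~60 seconds at 29.97 fps is roughly 1,798 frames, so "~60 seconds" is consistent with 1,824.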

Hyperthreading
 performance improvement technologies include
o vectorization (e.g., AltiVec), Single Instruction Multiple Data (SIMD), Pthreads, hyperthreading, SSE2
 hyperthreading provides simultaneous execution of two threads on the same physical processor
 performance improvements range from 5 percent to approximately 30 percent
 the IBM xSeries 335 hardware and the Windows 2000 Server operating system support hyperthreading
 resulting in four instances of the GridIron XLR8 runtime software per node (machine)
 giving each dual-processor node the ability to execute four independent parallel processing tasks

GridIron XLR8 Runtime Software
 manages the processing environment
 allows computers to discover each other
 automatically sets up a processing network, distributes work, and recovers from failure
 a total of 48 peers were deployed on the 12 servers acting as processing nodes in this implementation
 each peer has an installed footprint of 24 MB

Results Output
 generated a compressed and encoded MPEG-4 file that was decodable and playable at expected levels of quality using an existing MPEG player

Compression

 The resultant MPEG-4 compressed file was 13,798 KB,
o a reduction of about 98.5 percent from the original raw video data

Speed Improvement

Disclaimer
 Intended for educational purposes only; not intended for any sort of commercial use
 Purely created to help students with limited preparation time
 Text and pictures used were taken from the reference items

Reference
 Grid Computing: A Practical Guide to Technology and Applications, by Ahmar Abbas

Credits
 Thanks to my family members who supported me while I spent considerable time preparing these notes. Feedback is always welcome at [email protected]
