CS693 Unit V
Total Page:16
File Type:pdf, Size:1020Kb
CS693 – Grid Computing Unit - V UNIT – V APPLICATIONS, SERVICES AND ENVIRONMENTS Application integration – application classification – Grid requirements – Integrating applications with Middleware platforms – Grid enabling Network services – managing Grid environments – Managing Grids – Management reporting – Monitoring – Data catalogs and replica management – portals – Different application areas of Grid computing Application Classification The dimensions relevant to enabling the application to run on a grid are o parallelism o granularity o communications o dependency affects the way in which it can be migrated to a grid environment not mutually exclusive—a specific application can be characterized along most, of these dimensions. Parallelism classification scheme attributed to Flynn has a significant impact on how the application is integrated to run on a grid. Single Program, Single Data (SPSD) simple sequential programs that take a single input set and generate a single output set Motivations 1) A vast number of computing resources are immediately available grid is used as a throughput engine when many instances of this single program need to be executed. Leveraging resources outside the local domain can vastly improve the throughput of these jobs. Example: Logic simulation of a microprocessor design The design needs to be verified by running millions of test cases against the design. By leveraging grid resources, each test case can be run on a different remote machine, thereby improving the overall throughput of the simulation. 2) Remotely available shared data can be used program can be made to execute where the data reside or the data can be accessed via a “data grid.” grid environment enables controlled access to shared data. It is simple to integrate these applications to run on a grid, as no additional code development is required. Single Program, Multiple Data (SPMD) the input data can be partitioned and processed concurrently using the same program. comprises the majority of applications that utilize the grid today and covers a wide range of domains. Examples Finite Element Method evaluations using an MPI-based program Large-scale Internet applications such as SETI@home UD-Cancer project Motivation to significantly improve performance and/or scope by scaling the application out to as many resources on the grid as possible MTech CSE (PT, 2011-14) SRM, Ramapuram 1 hcr:innovationcse@gg CS693 – Grid Computing Unit - V Multiple Program Multiple Data (MIMD) broadest category of parallel/distributed applications both the program and the data can be partitioned and processed concurrently. Motivation to improve performance to ensure that resources are more optimally matched to application needs Example there might be partitions of an application that require a very tightly-coupled SMP system while another partition could be run in a more loosely-coupled environment or on a single machine. Multiple Program Single Data (MPSD) require different transformations to be applied to the same set of input data can be carried out concurrently very rare Communications very dependent on the performance of the underlying grid network infrastructure. The communications cost associated with an application is characterized by the following: The cost of initial movement of data and programs to grid resources prior to the computation. o This cost can be avoided on the grid in cases where the data or program is already available at the remote grid resource. The cost of communication while the application execution executes. o This cost is a function of the frequency and size of communication. Granularity based on the granularity of their programs as o coarse-grained o fine-grained specifies how long a program can execute before it needs to communicate with other programs will dictate the amount of overhead associated with running the application remotely on the grid. Dependency programs within an application have no dependencies between them and each program can be scheduled and executed independently of the others. the application would be a good candidate for deployment on the grid Grid requirements 1. Interfaces 2. Job Scheduling Interfaces Users and applications have accessed grids using simple command-line tools and programming APIs Command line tools such as ‘qsub’ and ‘qstat’ offer ways for users to interactively submit and query the status of their jobs. o These tools can be further extended through wrapper scripts o hide the job creation and management complexities from the user. programming APIs provide a way for developers to embed job submission and management functions into an application-specific wrapper program. users execute an application-specific wrapper script or program and input the data required for the job. MTech CSE (PT, 2011-14) SRM, Ramapuram 2 hcr:innovationcse@gg CS693 – Grid Computing Unit - V usually limited to running SPSD-type applications more sophisticated interfaces that allow the creation of single- and multi-dimensional job arrays to support SPMD-, MPMD-, and MPSD-type applications Web browsers are replacing command-line tools enabling users to remotely access the grid from anywhere and at any time. The use of a Web Services description language (WSDL) to describe jobs, applications, and data in conjunction with Web Services provides a very powerful programmatic interface to the grid. Web Services provides a language-independent interface for programmers to develop complex wrappers using their favorite programming language. Web Services is a well-defined standard, grids based on it will easily integrate into existing environments The Open Grid Services Architecture (OGSA) is in the process of defining a set of Web Services Description Language (WSDL) interfaces for creating, managing, and securely accessing large computational grids. Job Scheduling To provide transparent and efficient access to remote and geographically distributed resources. A scheduling service is necessary to coordinate access to the different resources such as network, data, storage, software, and computing elements that are available on the grid. the heterogeneous and dynamic nature of the grid makes scheduling complicated. Schedulers need to automate the three essential tasks viz o resource discovery, o system selection o job execution. At a very basic level, jobs, once submitted into the grid, are o queued by the scheduler o dispatched to compute nodes as they become available. grid schedulers are typically required to dispatch jobs based on a well-defined algorithm such as o First-Come-First-Serve (FCFS), o Shortest-Job-First (SJF), o Round-Robin (RR), or o Least-Recently-Serviced (LRS). Schedulers have to support advanced features such as: o User requested job priority o Allocation of resources to users based on percentages o Concurrency limit on the numbers jobs a user is allowed to run o User specifiable resource requirements o Advanced reservation of resources o Resource usage limits enforced by administrators Data Management Data processed by jobs are typically sourced at submission time obtained by the compute node from a shared file system (NFS, AFS, DFS) at execution time. o the location of the data is input at submission o each node is required to have access to the specified location work well when the input data size is small or when all nodes in the grid have access to a global shared file system. Typical data sizes in both commercial and R&D are in gigabytes Several hardware and software solutions are available o such as distributed data caching, replication, uniform namespace virtualize access to data and provide a seamless way to store and retrieve data. MTech CSE (PT, 2011-14) SRM, Ramapuram 3 hcr:innovationcse@gg CS693 – Grid Computing Unit - V Remote Execution Environment Jobs executing on a grid require the same environment as they would if executed on submitters’ machine These include environment variables, runtime environments such as Java, Small Talk, or .NET CLR, and libraries such as C language runtime and operating system libraries. This total environment may be already available on the compute node or may have to be created by the job on node prior to execution. Many grid frameworks provide tools and well-documented procedures to create such environments. Security grids are vulnerable to security attacks at all network points Some grid frameworks offer end-to-end security that includes sand-boxing of jobs sand-boxing feature enforces a security wall between the job execution environment and the running node o Example, a desktop user will be unable to view or manipulate the grid job running on his computer. The following are some basic features required to operate a secure grid: Authentication used to positively verify the identity of users, devices, or other entity in the grid, accomplished using passwords and challenge-and-response protocols. Confidentiality assurance that job information is not disclosed to unauthorized persons, processes, or devices. accomplished through access controls, and protection of data through encryption techniques. Data Integrity refers to a condition when job data are unchanged from their source and have not been accidentally or maliciously modified, altered, or destroyed. accomplished through checksum validation and digital signature schemes. Non-repudiation is a method by which the sender of job data, such as the grid scheduler, is provided with proof of delivery and the recipient, such as the