CS693 Grid Computing

UNIT I: INTRODUCTION
Grid Computing values and risks – History of Grid Computing – Grid Computing model and protocols – Overview of types of Grids

Grid Computing Values and Risks
Key value elements provide credible inputs to the various valuation metrics that enterprises can use to build successful Grid Computing deployment business cases.

Each of the value elements can be applied to one, two, or all of the valuation models that a company may consider using, such as return on investment (ROI), total cost of ownership (TCO), and return on assets (ROA).

Grid Computing Value Elements
• Leveraging Existing Hardware Investments and Resources
• Reducing Operational Expenses
• Creating a Scalable and Flexible Enterprise IT Infrastructure
• Accelerating Product Development, Improving Time to Market, and Raising Customer Satisfaction
• Increasing Productivity

Leveraging Existing Hardware Investments and Resources
• Grids can be deployed on an enterprise's existing infrastructure, mitigating the need for investment in new hardware systems.
• This also eliminates expenditure on air conditioning, electricity, and, in many cases, the development of new data centers.
• Example: grids deployed on existing desktops and servers provide over 93 percent in up-front hardware cost savings when compared to High Performance Computing (HPC) systems.

Reducing Operational Expenses
• Key self-healing and self-optimizing capabilities free system administrators from routine tasks, allowing them to focus on high-value, important system administration.
• The operational expenses of a Grid Computing deployment are 73 percent less than for comparable HPC-based solutions.
• Grids are being deployed in enterprises in as little as two days, with little or no disruption to operations.
• Cluster system deployments, on the other hand, take 60–90 days, in addition to the days required to configure and deploy the applications.

Creating a Scalable and Flexible Enterprise IT Infrastructure
• Grid Computing allows companies to add resources linearly based on real-time business requirements.
• These resources can be derived from within the enterprise or from utility computing services.
• Projects never have to be put on hold for lack of computational capacity, space, or system priority.
• The entire compute infrastructure of the enterprise is available for connecting.
• Grid Computing can help bring about the end of departmental silos and expose computational assets curtained off by "server huggers" and bureaucracy.

Accelerating Product Development, Improving Time to Market, and Raising Customer Satisfaction
• The dramatic reduction in simulation times allows products to be completed quickly.
• It also provides the capability to perform far more detailed and exhaustive product design, as the computational resources brought to bear by the grid can quickly churn through the complex models and scenarios to detect design flaws.

Example: Life Sciences: Drug Discovery
• Introducing a "New Chemical Entity" (drug) into the market costs roughly US $802M and takes 12–15 years.


• Grid Computing is allowing drug companies to get the most out of their R&D expenditure by developing the right product and getting it to market in the shortest possible time.
• Grids can save almost US $5M per month in R&D expenses during the drug development process.
• Early entry can be worth almost US $1M per day for each day that the product is brought to market early.

Increasing Productivity
• Example: one enterprise reduced the run times of jobs submitted by its engineers by 58 percent by deploying a grid.

Risk Analysis

This section examines the key risk factors that plague technology deployments and analyzes Grid Computing's vulnerability to each.

Lock-in
• With a grid deployment, the IT manager need not make investments in durable complementary assets that promote lock-in.
• Software and supporting infrastructure may, however, contribute to customer lock-in.
• IT managers should pay keen attention to which vendors are supporting the Grid Computing standards at the Global Grid Forum.

Switching Costs
• The primary switching cost is driven by the effort required to integrate and enable enterprise applications to work on whatever replacement grid infrastructure has been selected.
• This integration is performed by utilizing software development toolkits.
• An alternative is to introduce new grid software in the enterprise to support new grid-enabled applications, while leaving the existing software deployment and its integration with legacy grid software unchanged.

Project Implementation Failure
• The main risk is project failure, due either to bad project management or to incorrect needs assessment.
• IT managers can take advantage of hosted pilots and professional services offered by grid software vendors.
• This allows the IT manager to accurately pre-assess the suitability of the grid software, the level of integration required, and feasibility (application speedup times, productivity gains, etc.).
• Hosted pilots are conducted in the vendors' data centers and have no impact on the company.

History of Grid Computing

Academic Research Projects

I-Way
• 11 high-speed networks were used to connect 17 sites with high-end computing resources for a demonstration, creating one super "metacomputer".
• Sixty different applications, spanning various faculties of science and engineering, were developed and run over this demonstration network.
• Many of the early Grid Computing concepts were explored.

Globus
• A suite of tools that laid the foundation for Grid Computing activities.
• 80 sites worldwide running software based on the Globus Toolkit were connected together.

Entropia
• Founded to harness idle computers worldwide to solve problems of scientific interest.
• The network grew to 30,000 computers with an aggregate speed of over one teraflop.
• Ordinary users volunteered their PCs to analyze research topics such as patients' response to chemotherapy, discovering drugs for AIDS, and potential cures for anthrax.

High-Performance Computing

• Refers to supercomputing.
• There are hundreds of supercomputers deployed throughout the world.
• Key parallel processing algorithms have already been developed to support execution of programs on different, but co-located, processors.
• High-performance computing system deployment is not limited to academic or research institutions.
• High-performance systems are deployed across numerous industries.
• Examples: Telecommunications, Finance, Automotive, Database, Transportation, Electronics, Geophysics, Aerospace, Energy, World Wide Web, Information Services, Chemistry, Manufacturing, Mechanics, Pharmaceutics

Cluster Computing

• Clusters are high-performance, massively parallel computers built primarily out of commodity hardware components, running free software such as Linux or FreeBSD, and interconnected by a private high-speed network.
• A cluster consists of PCs or workstations dedicated to running high-performance computing tasks.
• The nodes in the cluster do not sit on users' desks, but are dedicated to running cluster jobs.
• A cluster is usually connected to the outside world through only a single node.
• Numerous tools have been developed to run and manage clusters.
• These include load-balancing tools and tools to adapt applications to run in the parallel cluster environment (e.g., ForgeExplorer).

Peer-to-Peer Computing

• Two main models: the centralized model and the decentralized model (a sketch of the centralized directory lookup follows this section).

Centralized model
• File sharing is based around the use of a central server system that directs traffic between individual registered users.
• The central servers maintain directories of the shared files stored on the respective PCs of registered users of the network.
• These directories are updated every time a user logs on or off the network (as in Napster).
• Each time a user of a centralized P2P file sharing system submits a request or searches for a particular file, the central server creates a list of files matching the search request by cross-checking the request with the server's database of files belonging to users who are currently connected to the network.
• The central server then displays that list to the requesting user.
• The requesting user can then select the desired file from the list and open a direct HTTP link with the individual computer that currently possesses that file.
• The download of the actual file takes place directly, from one network user to the other.
• The actual file is never stored on the central server or on any intermediate point on the network.

Decentralized model
• File sharing does not use a central server to keep track of files.
• It relies on each individual computer to announce its existence to a peer, which in turn announces it to all the users that it is connected to, and so on.
• If one of the computers in the peer network has a file that matches the request, it transmits the file information (name, size) back through all the computers in the pathway to the user that requested the file.
• A direct connection is established and the file is transferred.
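The following is a minimal sketch (not from the notes) of the centralized P2P directory model just described. The class and method names are illustrative, not a real protocol:

```python
# Toy model of a centralized P2P directory: the server indexes who shares what;
# the actual file transfer happens peer-to-peer and never touches the server.

class DirectoryServer:
    def __init__(self):
        self.online = {}  # user -> list of shared file names

    def log_on(self, user, shared_files):
        self.online[user] = list(shared_files)  # directory updated on log-on

    def log_off(self, user):
        self.online.pop(user, None)             # and on log-off

    def search(self, query):
        # Cross-check the request against files of currently connected users.
        return [(user, f) for user, files in self.online.items()
                for f in files if query in f]

server = DirectoryServer()
server.log_on("alice", ["song.mp3", "notes.txt"])
server.log_on("bob", ["song_remix.mp3"])
print(server.search("song"))  # [('alice', 'song.mp3'), ('bob', 'song_remix.mp3')]
# The requester would then open a direct HTTP connection to the chosen peer.
```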

Internet Computing
• Internet computing projects utilize the vast processing cycles available at users' desktops.
• Large compute-intensive projects are coded so that tasks can be broken down into smaller subtasks and distributed over the Internet for processing.
• Volunteer users then download a lightweight client onto their desktop, which periodically communicates with the central server to receive tasks (see the sketch after the project list).
• The client initiates the tasks only when the desktop CPU is not in use.
• Upon completion of the task, it communicates results back to the central server.
• The central server aggregates the information received from all the different desktops and compiles the results.
• Many areas of interest and projects:
o Science: SETI@home, eOn
o Life Science: Find-a-Drug, Genome@Home
o Cryptography: Distributed.net, ECCp-109
o Mathematics: PCP@Home, Proth Prime Search
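A minimal sketch of the volunteer-client loop described above. The functions fetch_task, cpu_is_idle, and report are hypothetical placeholders, not the API of any real project client:

```python
import time

def cpu_is_idle():
    return True  # stand-in for a real idle-detection check

def fetch_task():
    return {"id": 1, "data": list(range(10))}  # stand-in for an HTTP request

def report(task_id, result):
    print(f"task {task_id} -> {result}")       # stand-in for an HTTP POST

def client_loop(iterations=3, poll_seconds=1):
    for _ in range(iterations):                # a real client would loop forever
        task = fetch_task()                    # the client initiates all contact
        if task and cpu_is_idle():             # compute only when the CPU is unused
            report(task["id"], sum(task["data"]))  # the cause-related computation
        time.sleep(poll_seconds)               # periodic, client-initiated polling

client_loop()
```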

Grid Computing
• Acronym for Global Resource Information Database.
• Grid Computing enables virtual organizations to share geographically distributed resources as they pursue common goals, assuming the absence of a central location, central control, omniscience, and an existing trust relationship.
• Virtual organizations can span from small corporate departments to large groups of people from different organizations.
• A resource is an entity that is to be shared: computational power, storage, sensors, bandwidth, etc.
• The resources do not have prior information about each other, nor do they have pre-defined security relationships.

Grid Computing Model and Protocols

Challenges in sharing resources in a Grid across boundaries
• Identity and Authentication
o Is this user who he says he is? Is this program the right program?
• Authorization and Policy
o What can the user do on the grid? What can the application do on the grid? What resources are the user and/or application allowed to access?
• Resource Discovery
o Where are the resources?
• Resource Characterization
o What types of resources are available?
• Resource Allocation
o What policy is applied when assigning the resources? What is the actual process of assigning the resources? Who gets how much?
• Resource Management
o Which resource can be used at what time and for what purpose?
• Accounting/Billing/Service Level Agreement (SLA)
o How much of the resources is being used? What is the rating schedule? What is the SLA?
• Security
o How do I make sure that this is done securely? How do we know if we have been compromised? What steps are taken once a security breach is detected?

Grid computing architecture model
• A set of protocols and mechanisms needs to be defined that addresses the security and policy concerns of the resource owners and users.
• The grid protocols should be flexible enough to deal with many resource types and to scale to large numbers of resources, with many users and many program components, in an efficient and cost-effective manner.
• In addition to the grid protocols, a set of grid application programming interfaces (APIs) and software development toolkits (SDKs) needs to be defined. These provide interfaces to the grid protocols and services, and facilitate application development by supplying higher-level abstractions.
• The grid architecture model shown in the figure is closely aligned with the layered Internet protocol architecture (analogous to the OSI reference model).

Grid Computing architecture model—detail
• Protocols, services, and APIs occur at each level of the grid architecture model.
• The figure shows the relationship between APIs, services, and protocols.
• At each protocol layer in the grid architecture, one or more services are defined.
• Access to these services is provided by one or more APIs.
• More sophisticated interfaces, or software development toolkits, provide complex functionality that may not map one-to-one onto service functions, and may combine services and protocols at lower levels in the grid protocol stack.


• At the top of this figure, we include languages and frameworks, which utilize the various APIs and SDKs to provide programming environments for grid applications.

• Each layer provides a set of services that allow Grid Computing resources to be identified and accessed securely based on a set of rules.
• The rules are defined both by the user of the resource and by the owner.
• The services can be accessed through a set of application programming interfaces and software development toolkits that have been defined for each layer.

The Fabric Layer
• Includes the protocols and interfaces that provide access to the resources that are being shared.
• This layer is a logical view rather than a physical view.
• Examples:
o The view of a cluster with a local resource manager is defined by the local resource manager and not by the cluster hardware.
o The fabric provided by a storage system is defined by the file system that is available on that system and not by the raw disks or tapes.

The Connectivity Layer
• Defines core protocols required for grid-specific network transactions.
• These utilize existing Internet protocols such as IP, the Domain Name Service, and various routing protocols such as BGP.
• Another set of protocols defined by the connectivity layer comprises the core grid security protocols, also known as the Grid Security Infrastructure (GSI).
• GSI provides uniform authentication, authorization, and message protection mechanisms.
• It also provides single sign-on to all the services that will be used, and it utilizes public key technology such as X.509. [11]

The Resource Layer
• Defines protocols required to initiate and control sharing of local resources.
• Protocols defined at this layer include:
o Grid Resource Allocation Management (GRAM): remote allocation, reservation, monitoring, and control of resources
o GridFTP (FTP extensions): high-performance data access and transport
o Grid Resource Information Service (GRIS): access to structure and state information

• These protocols are built on the connectivity layer's Grid Security Infrastructure and utilize standard IP protocols for communications.

The Collective Layer
• Defines protocols that provide system-oriented (versus local) capabilities for wide-scale deployment.
• This includes index or meta-directory services, so that a custom view can be created of the resources available on the grid.
• It also includes resource brokers that discover and then allocate resources based on defined criteria.

The Application Layer
• Defines protocols and services that are targeted toward a specific application or a class of applications.

Grid Protocols
• Security: Grid Security Infrastructure
• Resource Management: Grid Resource Allocation Management
• Data Transfer: Grid File Transfer Protocol
• Information Services: Grid Information Services

Grid Security Infrastructure (GSI)

• The Grid Security Infrastructure (GSI) for grids has been defined by creating extensions to standard and well-known protocols and APIs.
• Extensions to Secure Socket Layer/Transport Layer Security (SSL/TLS) and X.509 have been defined to allow single sign-on (proxy certificates) and delegation.
• The X.509 proxy certificate grid extension defines how a short-term, restricted credential can be created from a normal, long-term X.509 credential (a conceptual sketch follows this section).
• This supports single sign-on and delegation through "impersonation" and is also an Internet Engineering Task Force (IETF) draft.
• Generic Security Service (GSS) API extensions have been created and are under review at the Global Grid Forum (GGF).
• GSS is an IETF standard that provides functions for authentication, delegation, and message protection.
• The figure shows the Grid Security Infrastructure in action.
• The request submitted is as follows: "Create processes at A and B that communicate and access files at C."
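A conceptual sketch (not real GSI/X.509 code) of the proxy-credential idea: a long-term credential signs a short-term, restricted proxy with an expiry, and services verify the proxy instead of repeatedly asking the user to sign on. Real GSI verifies public-key certificate chains; HMAC with a shared secret is used here only to keep the sketch self-contained:

```python
import hmac, hashlib, time

def sign(key: bytes, message: bytes) -> str:
    return hmac.new(key, message, hashlib.sha256).hexdigest()

LONG_TERM_KEY = b"users-long-term-secret"   # stands in for the X.509 private key

def make_proxy(lifetime_seconds: int = 3600):
    """Create a short-term proxy 'credential' derived from the long-term one."""
    expiry = int(time.time()) + lifetime_seconds
    body = f"proxy-for:alice;expires:{expiry}".encode()
    return body, sign(LONG_TERM_KEY, body)

def verify_proxy(body: bytes, signature: str) -> bool:
    """A service checks the signature and the expiry; no password is needed."""
    expiry = int(body.decode().split("expires:")[1])
    return (hmac.compare_digest(sign(LONG_TERM_KEY, body), signature)
            and time.time() < expiry)

proxy_body, proxy_sig = make_proxy()
print(verify_proxy(proxy_body, proxy_sig))  # True: single sign-on via delegation
```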

Grid Resource Allocation Management (GRAM)

• The Grid Resource Allocation and Management protocol and client API allow programs to be started on remote resources.
• A Resource Specification Language (RSL) has been developed as a common notation for the exchange of information between applications, resource brokers, and local resource managers.
• RSL provides two types of information:
o Resource requirements: machine type, number of nodes, memory, etc.
o Job configuration: directory, executable, arguments, environment
• An example of an RSL-based requirement would be as follows (see the sketch after this list):
o "Create 5–10 instances of myprog, each on a machine with at least 64MB of memory, that is available to me for 4 hours; or 10 instances on a machine with at least 32MB of memory."
• The GRAM protocol is a simple, HTTP-based remote procedure call (RPC).
o It sends messages such as job request, job cancel, status, and signal.
o Event notifications for state changes include pending, active, done, failed, and suspended.
• The GRAM-2 protocol adds multiple resource types, such as storage, network, sensors, etc.
• It will also use Web Services protocols such as the Web Services Description Language (WSDL) and the Simple Object Access Protocol (SOAP).
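An illustrative sketch of building an RSL-style job description like the example above. The attribute names (executable, count, maxMemory, maxWallTime) follow the general flavor of Globus RSL, but exact names vary by implementation and version:

```python
# Build an RSL-style string; a broker would hand this to GRAM, which performs
# the HTTP-based RPC (job request / cancel / status / signal) on the resource.

def make_rsl(executable, count, min_memory_mb, wall_time_minutes):
    attrs = {
        "executable": executable,
        "count": count,                    # number of instances requested
        "maxMemory": min_memory_mb,        # resource requirement
        "maxWallTime": wall_time_minutes,  # availability window
    }
    return "&" + "".join(f"({k}={v})" for k, v in attrs.items())

rsl = make_rsl("myprog", count=5, min_memory_mb=64, wall_time_minutes=240)
print(rsl)  # &(executable=myprog)(count=5)(maxMemory=64)(maxWallTime=240)
```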

Grid File Transfer Protocol
• One of the key requirements for data-intensive grids is high-speed, reliable access to remote data.
• The standard FTP protocol has been extended while preserving interoperability with existing servers.
• The extensions provide striped/parallel data channels, partial file transfers, automatic and manual TCP buffer size settings, progress monitoring, and extended restart functionality (see the sketch after this list).
• The protocol extension to FTP for the grid (GridFTP) has been submitted as a draft to the GGF DWG.
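A minimal sketch of the parallel/partial-transfer idea behind GridFTP, using a local file and threads as a stand-in for multiple striped TCP data channels. This is not the GridFTP protocol itself, only the decomposition it enables:

```python
import threading, os

def fetch_range(path, offset, length, out, index):
    with open(path, "rb") as f:       # each "channel" reads its own byte range
        f.seek(offset)
        out[index] = f.read(length)   # a partial transfer

def parallel_fetch(path, channels=4):
    size = os.path.getsize(path)
    chunk = (size + channels - 1) // channels
    parts = [b""] * channels
    threads = [threading.Thread(target=fetch_range,
                                args=(path, i * chunk, chunk, parts, i))
               for i in range(channels)]
    for t in threads: t.start()
    for t in threads: t.join()
    return b"".join(parts)            # reassemble the partial transfers in order
```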

Grid Information Services
• A set of protocols and APIs defined in the resource layer provides key information about the grid infrastructure.
• The Grid Information Service (GIS) provides access to static and dynamic information regarding a grid's various components, including the type and state of available resources.
• There are two types of Grid Information Services (a toy sketch follows this list):
o The Grid Resource Information Service (GRIS) supplies information about a specific resource.
o The Grid Index Information Service (GIIS) is an aggregate directory service.
• The Grid Resource Registration protocol is used by resources to register with GRIS servers.
• The Grid Resource Inquiry protocol is used to query a resource description server for information, and also to query the aggregate server for information.
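A toy sketch of the GRIS/GIIS split described above: per-resource information services plus an index service that aggregates them for directory queries. Class and method names are illustrative, not the actual GIS protocols:

```python
class GRIS:
    """Supplies information about one specific resource."""
    def __init__(self, name, **state):
        self.name, self.state = name, state  # e.g. cpus, load

    def inquire(self):
        return {"name": self.name, **self.state}

class GIIS:
    """Aggregate directory service over many registered GRIS servers."""
    def __init__(self):
        self.registered = []

    def register(self, gris):            # the Grid Resource Registration step
        self.registered.append(gris)

    def query(self, predicate):          # the aggregate Grid Resource Inquiry
        return [g.inquire() for g in self.registered if predicate(g.inquire())]

index = GIIS()
index.register(GRIS("nodeA", cpus=8, load=0.2))
index.register(GRIS("nodeB", cpus=16, load=0.9))
print(index.query(lambda info: info["load"] < 0.5))  # lightly loaded resources
```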

Overview of Types of Grids

1. Departmental Grids

• Departmental grids are deployed to solve problems for a particular group of people within an enterprise.
• The resources are not shared by other groups within the enterprise.
Cluster Grids
• Typically used by a team for a single project; can support both high-throughput and high-performance jobs.
Infra Grids
• A grid that optimizes resources within an enterprise and does not involve any other internal partner.
• It can be within a campus or across campuses.

2. Enterprise Grids

• Enterprise grids consist of resources spread across an enterprise.
• They provide service to all users within that enterprise.
• They run behind the corporate firewall.
Enterprise Grids
• Deployed within large corporations that have a global presence or a need to access resources outside a single corporate location.
Intra Grids
• Resource sharing among different groups within an enterprise constitutes an intra grid.
• An intra grid can be local or traverse the wide area network.
Campus Grids
• Enable multiple projects or departments to share computing resources in a cooperative way.
• Campus grids may consist of dispersed workstations and servers as well as centralized resources located in multiple administrative domains, in departments, or across the enterprise.

3. Extraprise Grids

• Established between companies, their partners, and their customers.
• The grid resources are generally made available through a virtual private network.
Extra Grids
• Enable sharing of resources with external partners.
• Assume that connectivity between the two enterprises is through some trusted service, such as a private network or a virtual private network.
Partner Grids
• Grids between organizations within similar industries that need to collaborate on projects and use each other's resources as a means to reach a common goal.

4. Global Grids

• Grids established over the public Internet constitute global grids.


• They can be established by organizations to facilitate their business, or purchased in part or in whole from service providers.
Global Grids
• Allow users to tap into external resources.
• Provide the power of distributed resources to users anywhere in the world.
Inter Grids
• Provide the ability to share compute and data/storage resources across the public Web.
• This can involve sharing resources with other enterprises or buying and selling excess capacity.

5. Compute Grids

• Created solely for the purpose of providing access to computational resources.
Desktop Grids
• Leverage the compute resources of desktop computers.
Server Grids
• Grid Computing limited to servers.
• Created for the purpose of building an internal "utility grid" with resources made available to various departments.
High-Performance/Cluster Grids
• These grids comprise high-end systems, such as supercomputers or HPC clusters.

6. Data Grids

• Grid deployments that require access to, and processing of, data are called data grids.
• They are optimized for data-oriented operations.

7. Utility Grids

• Utility grids consist of commercial compute resources that are maintained and managed by a service provider.
• Customers may purchase "cycles" from a utility grid.
• Customers may also choose to use utility grids for business continuity and disaster recovery purposes.
• Utility grid providers are also called Grid Resource Providers (GReP).

Disclaimer
Intended for educational purposes only; not intended for any sort of commercial use. Purely created to help students with limited preparation time. Text and pictures used were taken from the reference items.

Reference
Grid Computing: A Practical Guide to Technology and Applications by Ahmar Abbas

Credits
Thanks to my family members, who supported me while I spent a considerable amount of time preparing these notes. Feedback is always welcome at [email protected]

UNIT II: TYPES OF GRIDS
Desktop Grids: Background – Definition – Challenges – Technology – Suitability – Grid server and practical uses; Clusters and Cluster Grids; HPC Grids; Scientific insight – application and architecture – HPC application development environment and HPC Grids; Data Grids; Alternatives to Data Grid – Data Grid architecture

Desktop Grids: Background

Cause Computing and the Internet

• Aggregation of PC processing power through one of the many "cause computing" projects.
• Example projects:
o Searching for extraterrestrials: SETI@home
o Evaluating AIDS drug candidates: FightAIDS@Home
o Screening for extremely large prime numbers: Great Internet Mersenne Prime Search
o Predicting climate on a global scale: ClimatePrediction.net

Concept / Idea
Enrolling
• A conscious decision on the part of a PC owner to sign up with a particular organization to allow the spare computational cycles of his PC to be used by the selected project.
• Upon enrollment, a small control program is downloaded to the PC. This program is responsible for communicating with the central project server.
Harvesting
• Using the spare capacity of the machine by executing cause-related computations relayed by the central server.

Key Concepts
Resource Management
• All Internet-based grids use passive resource management; they rely on the enrolled PCs to initiate communication on a periodic basis.
• This limits the degree to which the timeliness of results from such a grid can be predicted.
• In addition, it limits the ability to reprioritize the computational behavior of the grid in a timely manner.
Communication and Data Security
• HTTP is the communication protocol between the PCs and the central server.
• The data usually reside in an unencrypted format on the enrolled PC.
• This limits the nature of the problems that can be attempted over the public Internet to those in which compromise of the data is not a pressing issue.
• In some cases, the answers produced on the enrolled PC may be vulnerable to tampering, causing confidence in the results to be lower than desired.
Machine Heterogeneity
• Enrolled machines vary widely in CPU speed, RAM, hard-drive capacity, and operating system level.
• The management infrastructure either needs to operate at the lowest common denominator or needs to be aware of differences in the machines and assign tasks appropriately.
Resource Availability
• The entire cause-computing paradigm relies on the idea of voluntary participation.
• The availability and utility of any particular resource are subject to the whim of the person controlling the PC.
• This adds a layer of unpredictability to the performance expectations that can be associated with such a grid.

Distributed Computing in the Enterprise

• Distributed computing in the corporate world evolved out of the high-performance computing grids consisting of inter-networked UNIX™ and/or Linux™ machines.
• Many corporate users realized that the aggregated, unused power of the PCs assigned to employees represented a large pool of computational cycles that were being wasted rather than benefiting their company.

Fundamental differences between a corporate intranet and Internet-based desktop grids
Network Connectivity
• Corporate intranets offer dedicated high-speed (100 Mbps) or very-high-speed (1 Gbps) networks.
• However, many organizations are using portable computers as the primary desktop computing device, so both the duration and the quality of any device's connection are difficult to predict.
Required Participation
• Participation can be made part of the standard "way of doing things" within the company.
• This does not, however, address the robustness issues that remain (PCs may reboot; PCs may be turned off).
PC Administration and Security
• Mechanisms are already in place so that "sensitive" information can be distributed.
• Active management of PCs is an accepted part of corporate PC infrastructure.
Access to Shared Resources
• Most organizations have common data storage on their intranets (a shared drive or a multi-tier, multi-terabyte data warehouse).
• A desktop grid can use this knowledge to reduce or eliminate redundant copies of data and to optimize work assignments.

Definition
Characteristics of a Desktop Grid
• A defined (named) collection of machines on a shared network, behind a single firewall, with all machines running the Windows operating system.
• A set of user-controlled policies describing the way in which each of these machines participates in the grid. These policies should also support automated addition and removal of machines.
• A hub-and-spoke virtual network topology controlled by a dedicated, central server. The machines on the grid are unaware of each other; this is more of a client-server architecture than a peer-to-peer architecture.
• An actively managed mechanism for distribution, execution, and retrieval of work to and from the grid under the control of a central server.

Desktop Grid Challenges
Intermittent Availability
• Unlike a dedicated compute infrastructure, a user may choose to turn off or reboot his PC at any time.
• In addition, the increasing trend of using a laptop (portable) computer as a desktop replacement means that some PCs may disappear and reappear, and may connect from multiple locations over network connections of varying speeds and quality.
User Expectations
• The user of the PC on the corporate desktop views it as a truly "personal" part of his work experience, much like a telephone or a stapler.
• It is often running many concurrent applications and needs to appear as if it is always and completely available to serve that employee's needs.
• After a distributed computing component is deployed on an employee's PC, that component will tend to be blamed for every future fault that occurs, at least until the next new component or application is installed.

Desktop Grid Technology—Key Elements to Evaluate
Key elements that need to be considered and incorporated in Desktop Grid technology:
• Security
• Unobtrusiveness
• Openness/Ease of Application Integration
• Robustness
• Scalability
• Central Manageability

Security

• The Desktop Grid must protect the integrity of the distributed computation.
• Tampering with, or disclosure of, the application data and program must be prevented.
• It must also protect the integrity of the underlying computing resources.
• The Grid Client Executive must prevent distributed computing applications from accessing or modifying data on the computing resources.
Application Level
• The distributed application should run on the PC in an environment that is completely separate from the PC's normal operating environment.
• Grid Clients should receive executable programs only from Grid Servers, which should always be authenticated.
System Level
• The Grid Client Executive should prevent an application from using/misusing local or network resources. Machine configuration, applications, and data should be unaffected.
Task Level
• The Grid Client Executive must encrypt the entire work unit to protect the integrity of the application, the input data, and the results.

Unobtrusiveness

• Usage of these resources should be unobtrusive.
• The Grid Client should cause no degradation in PC performance.
• When the user's tasks require any resources from the Grid Client, the Grid Client Executive should yield instantly and resume activity only as the resources again become available.
o The result is maximum utilization of resources with no loss of PC user productivity.

Openness/Ease of Application Integration

• Should provide safe and rapid application integration.
• Should support applications at the executable level.
• There should be no requirement for recompilation, relinking, or access to application source code.
• Must provide application integration in a secure manner.

Robustness

• Must complete computational jobs with minimal failures.
• The grid must execute reliably with fault tolerance on heterogeneous resources that may be turned off or disconnected during execution.
• This includes matching and dispatching appropriately sized tasks to each machine.

Scalability

• Should be capable of scaling to tens of thousands of PCs, to take advantage of the increased speed and power a large grid can provide.
• Should also scale downward, performing well even when the grid is limited in scope.

Central Manageability

• A management capability that allows control of grid resources, application configuration, scheduling, and software version management and upgrades.
• The administrator should be able to manage all of the Grid Clients without requiring physical access to them.
• Management, queuing, and monitoring of the computational work should be easy and intuitive.

Key Technology Elements—Checklists

Security Checklist
• Disallow (or limit) access to network or local resources by the distributed application.
• Encrypt application and data to preserve confidentiality and integrity.
• Ensure that the Grid Client environment (disk contents, memory utilization, registry contents, and other settings) remains unchanged after running the distributed application.
• Prevent local user from interfering with the execution of the distributed application.
• Prevent local user from tampering with or deleting data associated with the distributed application.

Unobtrusiveness Checklist
• Centrally manage unobtrusiveness levels that are changeable based on time-of-day or other factors.
• Ensure that the Grid Client Executive relinquishes client resources automatically.
• Ensure invisibility to the local user.
• Prevent distributed application from displaying dialogs or action requests.
• Prevent performance degradation (and total system failure) due to execution of the distributed application.
• Require very little (ideally, zero) interaction with the day-to-day user of the Grid Client.

Application Integration Checklist
• Ability to simulate a standalone environment within the Grid Client.
• Binary-level integration (no recompilation, relinking, or source code access).
• Easy integration (tools, examples, and wizards are provided).
• Integrated security and encryption of sensitive data.
• Support for any native 32-bit Windows application.

Robustness Checklist
• Allocate work to appropriately configured Grid Clients.
• Automatically reallocate work units when Grid Clients are removed either permanently or temporarily.
• Automatically reallocate work units due to other resource or network failures.
• Prevent aberrant applications from completely consuming Grid Client resources (disk, memory, CPU, etc.).
• Provide transparent support for all versions of Windows in the Grid Client population.

Scalability Checklist
• Automatic addition, configuration, and registration of new Grid Clients.
• Compatible with heterogeneous resource population.
• Configurable over multiple geographic locations.

Central Manageability Checklist
• Automated monitoring of all grid resources.
• Central queuing and management of work units for the grid.
• Central policy administration for grid access and utilization.
• Compatibility with existing IT management systems.
• Product installation and upgrade can be accomplished using enterprise tools (SMS, WinInstall, etc.).
• Remote client deployment and management.

Desktop Grid Suitability—Key Areas for Exploration
Key practical areas that need to be addressed for successful integration of grid computing technology in enterprise desktop environments:
• Applications
• Computing Environment
• Culture

Applications

• The most technically advanced Desktop Grid deployment is of little use without applications that can execute on it.
• One must have a Windows version of the application, along with all supporting files and environmental settings needed to establish an execution environment for the application.
• Appropriate licensing is needed to permit multiple, concurrent copies of the application to be executed.
• Applications fall along a continuum of "Application Suitability."

Application Categories
Data Parallel
• These applications process large input datasets in a sequential fashion with no application dependencies between or among the records of the dataset.
• Example: an application that examines a large file of integers (stored one integer per line) and counts the number of items greater than a particular target value.
Parameter Sweep
• These applications use an iterative approach to generate a multidimensional series of input values used to evaluate a particular set of output functions.
• Example: an application that finds the maximum value of a function F(X,Y) over a specified range using an exhaustive search approach that involves iteration of the parameters.
Probabilistic
• These applications process a very large number of trials using randomized inputs (or other ab initio processes) to generate input values used to evaluate a particular set of output functions.
• Example: an application that finds the maximum value of a function F(X,Y) over a specified range using a Monte Carlo approach.

Analyzing Application Distribution Possibilities
• Understanding how to decompose the input(s) of a large, monolithic job into an equivalent set of smaller input(s) that can be processed in a distributed fashion.
• Understanding how to recompose the output(s) from these smaller distributed instances of the application into a combined output that is indistinguishable from that of the single large job.
• (A minimal decompose/recompose sketch follows the definition below.)

A "grid-enabled application" refers to the combination of:

• an application prepared to execute on a Grid Client,
• a particular decomposition approach, and
• a particular recomposition approach.
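A minimal sketch of the decompose/recompose idea, using the integer-counting example above: split the input, count each piece independently (as separate work units on Grid Clients), then recompose by summing the partial counts:

```python
def decompose(lines, pieces):
    """Split the big input into roughly equal segments (one per work unit)."""
    size = (len(lines) + pieces - 1) // pieces
    return [lines[i:i + size] for i in range(0, len(lines), size)]

def work_unit(segment, target):
    """What each Grid Client runs: count values greater than the target."""
    return sum(1 for line in segment if int(line) > target)

def recompose(partial_counts):
    """Combine partial results; indistinguishable from the monolithic answer."""
    return sum(partial_counts)

data = [str(n) for n in range(1000)]              # stand-in for the large file
partials = [work_unit(seg, target=900) for seg in decompose(data, pieces=10)]
print(recompose(partials))                        # 99, same as a single big job
```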

Determining Application Suitability
• The Compute Intensity (CI) of a typical work unit reflects the relative percentage of time spent moving data to and from the Desktop Grid Client compared to the time spent performing calculations on that data.
• Example: a typical work unit executes in 15 minutes (900 seconds) on a hypothetical "average" grid client, consumes 2MB (2,000 KB) of input data, and produces 0.4MB (400 KB) of output data.


• CI = (4 * 900) / (2,000 + 400) = 1.5
• Work units whose CI is greater than 1.0 are "well suited" for distributed processing using a Desktop Grid solution (see the sketch below).

Fine-Tuning a Grid-Enabled Application
• Plan how to receive the benefit of using a Desktop Grid with that application:
o receiving the same answer faster, or
o receiving a "better" answer in the same time.
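A small sketch of the Compute Intensity calculation used above. The constant 4 mirrors the worked example (CI = (4 * seconds) / (input KB + output KB)); the notes do not spell out its derivation, so treat it as the book's scaling factor:

```python
def compute_intensity(exec_seconds: float, input_kb: float, output_kb: float) -> float:
    # Time spent computing, scaled, relative to the volume of data moved.
    return (4 * exec_seconds) / (input_kb + output_kb)

ci = compute_intensity(900, 2000, 400)
print(ci)                                               # 1.5, matching the example
print("well suited" if ci > 1.0 else "poorly suited")   # the CI > 1.0 threshold
```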

Computing Environment
• The Desktop Grid can be considered as:
o an alternative, lower-cost option when compared with acquiring new, dedicated computing resources, or
o a complementary addition to an existing infrastructure, in which problems with a Windows-based solution can be executed on the Desktop Grid.
• A partial list of environmental considerations includes:
o Archival and Cleanup: of information regarding completed computing tasks and their results
o Backup and Recovery: in the event of failure of a Grid Server
o Performance Monitoring and Tuning: of the Grid Server and the Grid Clients

Culture

A number of cultural considerations will contribute to the success or failure of a Desktop Grid within any particular organization.

At the Desktop—Employee Culture
• Employees must learn new habits and a different kind of "enrollment."
• Employees may be concerned about the Grid Client Executive, viewing it as a kind of "spyware" or blaming it for any fault or failure that occurs on their PC.
• These concerns can be overcome with:
o employee education, which might include informational e-mails,
o published results (savings in capital expense dollars, faster time to results, etc.), and
o even a hands-on laboratory.
In the User Community
• Current UNIX/Linux cluster systems generally have a fixed, physical presence and a notion of temporary, but exclusive, control.
• There is prejudice against Windows-based devices with regard to their ability to do "serious" computation.
• Reluctance to use a new technology is easy to overcome based on the delivery of initial results.

Grid Server and Practical Uses
The Grid Server—Additional Functionality to Consider
Essential functions provided by the Grid Server:
• management/administration of all work units
• assignment of work units to Grid Clients
• management/administration of all Grid Clients
Additional functionality provided by the Grid Server:
• Client Group-level Operations
• Data Caching
• Job-level Scheduling and Administration
• Performance Tuning and Analysis
• Security
• System Interfaces

Client Group-level Operations
• In small (departmental) grids, administering clients on a one-by-one basis is relatively straightforward.
• As the size and complexity of the grid grow, it is more useful to administer the grid as a collection of virtual, overlapping groups.
• Example: one group might be all machines located on the second floor; another group might be all Windows XP® machines; these groups will have zero or more machines in common.
• Client Groups must be accompanied by a set of rules that allow client membership to be determined automatically, both for new Grid Clients and for Grid Clients that have changed status (for example, upgrading the Windows operating system on that client or adding memory to that client). A small sketch of such rules follows at the end of this section.

Data Caching
• The time needed to move data to and from the Grid Client figures in the calculation of Compute Intensity.
• Advanced Desktop Grid systems will provide various forms of data caching, so that data needed for a work unit can be placed in (or very close to) the Grid Client in advance of execution.
• Caching can either be manually controlled or automatically administered (the Grid Server examines its queue of work and ensures that any data needed for a work unit will be available at the Client).

Job-level Scheduling and Administration
• The Grid Server should support various levels of job priority, along with the ability to select particular Clients (or groups of Clients) for a particular job based on characteristics of the job itself.
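An illustrative sketch of rule-based Client Group membership as described above. Groups are defined by predicates over client attributes, so membership is recomputed automatically when a client is added or its status changes:

```python
clients = [
    {"name": "pc-201", "floor": 2, "os": "Windows XP", "ram_mb": 512},
    {"name": "pc-305", "floor": 3, "os": "Windows XP", "ram_mb": 1024},
    {"name": "pc-202", "floor": 2, "os": "Windows 2000", "ram_mb": 256},
]

# Overlapping groups: a client may satisfy zero, one, or several rules.
group_rules = {
    "second-floor": lambda c: c["floor"] == 2,
    "windows-xp":   lambda c: c["os"] == "Windows XP",
}

def members(group):
    return [c["name"] for c in clients if group_rules[group](c)]

print(members("second-floor"))  # ['pc-201', 'pc-202']
print(members("windows-xp"))    # ['pc-201', 'pc-305']

# A status change (e.g., an OS upgrade) changes membership automatically:
clients[2]["os"] = "Windows XP"
print(members("windows-xp"))    # now includes 'pc-202' as well
```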

Performance Tuning and Analysis
• The Desktop Grid system should provide all necessary data and reports to allow an administrator to determine important performance characteristics of each Grid Client and of the grid as a whole.
• This should include optimum (theoretical) throughput calculations for the grid, actual throughput calculations for any particular job or set of work units, identification of any problematic Clients (or groups of Clients), etc.

Security
• Each function within the Grid Server user environment should include user-level security: which users may add new applications, which users may submit jobs, which users may review job output, etc.
• The Grid Server should have its own security system for access to any of its components through direct methods.

System Interfaces
• The Grid Server should support a variety of interfaces for its various user and administrative functions:
o a browser-based interface,
o a command-line interface (for scripting support),
o a Windows-based API (for invoking grid functionality from other Windows programs), and
o an XML interface (as a general-purpose communication methodology).
• Any system interfaces provided must also include a security protocol.

Practical Uses of Desktop Grids

• Data Mining—Demographic analysis and legal discovery
• Engineering Design—CAD/CAM and two-dimensional rendering
• Financial Modeling—Portfolio management and risk management
• Geophysical Modeling—Climate prediction and seismic computations
• Graphic Design—Animation and three-dimensional rendering
• Life Sciences—Disease simulation and target identification
• Material Sciences—Physical property prediction and product optimization
• Supply Chain Management—Process optimization and total cost minimization

Real-world Examples

Risk Management for Financial Derivatives (financial sector)
Opportunity
• A large North American brokerage organization used a series of interconnected Excel spreadsheets to calculate various risk parameters.
• The group was constrained by its ability to complete all the required calculations in the available time window.
• Upper management also wanted to move from a once-per-day (batch) calculation window to one in which complex risk-management calculations could be executed on a near-real-time basis.
Desktop Grid Solution
• Departmental grid: all of the PCs in the Risk Management group participated in a small-scale grid.
• The application was grid-enabled with a combination of parameter sweep and probabilistic techniques.
• A PC grid with fewer than 25 PCs was more than sufficient.

Molecular Docking for Drug Discovery (life sciences sector)
Opportunity
• Capacity constraints limited the speedy evaluation of new drug candidates against a known database of compounds.
Desktop Grid Solution
• A data-parallel application integration methodology: splitting the large database of compounds into hundreds of pieces and having each Grid Client evaluate the new drug candidates against a much smaller subset of the larger database.

Architectural Rendering
Opportunity
• The challenge of generating realistic renderings of increasingly complex environments in a timely manner.
Desktop Grid Solution
• Turning every designer workstation and each of the dedicated rendering PCs into a Grid Client.

Clusters and Cluster Grids
Introduction
• A cluster is a local-area, logical arrangement of independent entities that collectively provide a service.

Figure: The cluster landscape as revealed by virtualized instances as a function of external appearance

Clusters
• HPC clusters use Smart System Software (SSS) to virtualize independent operating-system instances to provide an HPC service.
• SSS allows a number of distinct systems to appear as one, even though each runs its own instance of the operating system.
• There are two possibilities for SSS:
o The Single System Image (SSI) is SSS that involves kernel modification.
o The Single System Environment (SSE) is SSS that runs in user space as a layered service.
• To varying degrees, these solutions enable computing for capacity (i.e., throughput of serial, parametric, and embarrassingly parallel applications) and/or capability (i.e., multithreaded and distributed-memory parallel applications).

Figure: A layered view of SSS opposite the operating system and end-user applications

Single System Image

• The Beowulf approach allows interconnected COTS-class hardware, each node running its own instance of GNU/Linux, to function as a distributed-memory parallel compute engine.
• Beowulf SSS incorporates a kernel modification to provide a distributed process space (BPROC).
• BPROC allows:
o PIDs to span multiple physical systems, each running its own instance of GNU/Linux
o Processes to be launched on multiple physical systems, each running its own instance of GNU/Linux
• This distributed process space is key to the creation of a clustered Linux environment for distributed-memory parallel computation.
• Beowulf clusters support distributed-memory parallel computing via the Parallel Virtual Machine (PVM) or via the Message Passing Interface (MPI).

The next-generation solution offers the following enhancements:
• Installation and administration improvements
• Efficient, single-point distributed process management
• Various 64-bit capabilities
• Distributed-process-space-aware MPICH
• MPI-enabled linear algebra libraries and Beowulf application examples

Limitations
• There is a tight dependency between Beowulf's BPROC and the Linux kernel.
• The architecture of BPROC itself has caused scalability concerns.
• Licensing for BPROC is under the GNU Public License (GPL), which makes it challenging for commercial adoption.

Single System Environment

• SSE runs in user space and provides a distributed process abstraction that includes primitives for process creation and process control • SSE solutions make use of dynamic-load-state data to assist in making effective, policy-based scheduling decisions, and in applying utilization rules to hosts, users, jobs, queues, etc., all in real time.


• This dynamic-load-state capability has significant implications, as task-placement advice is provided directly to the application on dispatch for execution.
• A remote-execution service is required to allow:
o authenticated communications over a network,
o a high degree of transparency in maintaining the user's execution environment, and
o task control with respect to limits, signal passing, etc.
• Task-tracking mechanisms are also required:
o beyond providing a unique identifier for application control, cluster-wide identifiers can be used in monitoring, manipulating, reporting, and accounting contexts.
• SSE solutions employ a policy center to manage all resources, e.g., jobs, hosts, users, queues, external events, etc.
• Through the use of a scheduler, and subject to predefined policies, demands for resources are mapped against the supply of the same in order to facilitate specific activities.

Industry Examples
Three application areas can be identified:
• Capacity HPC
o serial-processing requirements,
o highly focused on throughput;
o individual tasks are loosely related, and a parametric processing approach can be applied;
o compute farms based on SSE are in common use in this case.
• Capability HPC
o parallel-processing requirements,
o focused more on the resources required to address challenging problems;
o traditional Grand Challenge problems fall into this category;
o solutions based on either SSI or SSE are common in this case.
• Hybrid HPC
o capacity and capability tend to be idealized end points;
o real-world examples involve some combination of capacity and capability requirements;
o SSE handles the mixed workloads found in this case.

Electronic Design Automation (EDA)

• SSE is used in capacity-driven simulation.
• EDA faces challenges in design synthesis, verification, timing closure, and power consumption.

Figure: SSE-enhanced productivity stack for electronic design automation (EDA)

• The Synopsys VCS tool translates Verilog source code into a highly optimized executable simulation through the use of a C compiler or its own native compiler.
• The simulation workload is a number of jobs that together form verification/regression suites.
• Example:
o vcs test_vector1
o . . .
o vcs test_vectorN
• where N tests are run through sequentially.
• Each Synopsys VCS simulation run can be easily cast, submitted, executed, and summarized as a series of jobs for Platform LSF (see the submission sketch after the benefits list).

Primary Benefits
• Increased ability to handle high-volume ASIC regression requirements
• Compute capacity is automatically allocated where it is needed the most
• Actual compute capacity requirements are tracked historically via Platform Intelligence
• Effective partitioning of the varied workloads
• Virtual engineers work round the clock, i.e., 24/7/365
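A hedged sketch of casting the VCS regression suite above as Platform LSF jobs. bsub is LSF's actual job-submission command; the queue name and the idea of wrapping each test vector as one job are illustrative assumptions here:

```python
import subprocess

def submit_regression(n_tests, queue="normal"):
    for i in range(1, n_tests + 1):
        # Each simulation becomes an independent LSF job instead of running
        # sequentially on one machine; LSF schedules them across the cluster.
        subprocess.run(["bsub", "-q", queue, f"vcs test_vector{i}"], check=True)

# submit_regression(200)  # e.g., the 200-job suite mentioned in the results
```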


Result
• A regression suite consisting of 200 individual Synopsys VCS jobs took:
o almost 12 days on a dual-processor Sun Enterprise 450 SMP system, but
o less than 12 hours on a 50-CPU workstation cluster.

Bioinformatics

Five steps in genome sequence analysis (GSA) via Platform ActiveCluster (see the sketch below):
• Split the sequences of interest into M files
• Divide the reference database of interest into N pieces
• Dispatch N×M jobs to desktops
• Collect N×M results
• Merge and sort the results
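A minimal sketch of the N×M decomposition in the five steps above: every (sequence file, database piece) pair becomes one independent desktop job. File names and the stand-in result payload are illustrative assumptions:

```python
from itertools import product

M, N = 3, 4
sequence_files = [f"seqs_{m}.fasta" for m in range(M)]   # step 1: M files
db_pieces = [f"refdb_part_{n}" for n in range(N)]        # step 2: N pieces

# Step 3: dispatch N*M jobs (here, just build the job list).
jobs = [{"seqs": s, "db": d} for s, d in product(sequence_files, db_pieces)]
print(len(jobs))  # 12 == N * M

# Steps 4-5: collect the N*M partial results, then merge and sort them.
results = [f"hits({j['seqs']},{j['db']})" for j in jobs]  # stand-in results
merged = sorted(results)
```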

• LSF stands for Load Sharing Facility.
• Platform ActiveCluster shares much in common with the Desktop Grid solutions.
• It can be cast as a complementary addition to a cluster based on Platform LSF.
• In this case, desktops pull workload for processing on an opportunistic basis.
• This contrasts with the proactive push to the dedicated resources managed by Platform LSF.
• The figure also provides a visual representation of a standard GSA use-case scenario.
• In five steps, it is made clear that the embarrassingly parallel nature of GSA can be fully exploited.
• Thus the opportunistic and dedicated HPC resources, virtualized by Platform ActiveCluster and Platform LSF, respectively, collectively form an HPC cluster.

Industrial Manufacturing

• Computer-aided engineering (CAE) is focused on obtaining high-fidelity results in a cost-effective fashion.
• A key application area is computational fluid dynamics (CFD).
• Fluent, Inc. provides software technology for fluid-flow simulations in three dimensions.
• Fluent offers an MPI implementation of its solver across a cluster of interconnected workstations.
• Ford Motor Company's enterprise VISTEON implemented an MPI version of Fluent across a cluster of Hewlett-Packard workstations.


• The goal was a compute infrastructure that could simulate complex airflows and heat-transfer phenomena in automotive heating, ventilation, and air conditioning (HVAC) units.
• Platform HPC was implemented to manage this capability processing:
o transparent allocation of all compute resources between all classes of application workload.

Figure: SSS-enhanced productivity stack for industrial engineering

Key Benefits
• The ability to schedule jobs requiring multiple processors on physically distinct systems
• The ability to ensure that this processing is completed reliably
Additional seamless benefits
• Jobs can leverage an optimal array of distributed processors due to dynamic scheduling.
• The rich workload policy infrastructure of Platform HPC can be used.
• Platform HPC provides an infrastructure that allows for signal propagation, limit enforcement, and real-time resource data.

Cluster Grids
Based on their success with clusters, customers in EDA and industrial manufacturing sought to take virtualization to the next level. The resulting federation of clusters had to:
• Allow for selective sharing of resources
• Preserve a high degree of local autonomy
• Address availability on a cluster-by-cluster basis
• Span geographic boundaries within the organization

Platform MultiCluster
• Platform MultiCluster was introduced in 1996 to federate clusters based on Platform LSF.
• Platform MultiCluster addresses the identified requirements and is deployed in one of three submission-execution topologies.
• Even so, it is clear that an HPC cluster is not a grid.
• A grid needs to involve more than one cluster, address collaborative requirements, and span distance.

A three-point grid checklist: a grid
(1) coordinates resources that are not subject to centralized control using
(2) standard, open, general-purpose protocols and interfaces to
(3) deliver nontrivial qualities of service.

Clusters Are Not Grids
• Centrally controlled, and not distributed geographically
o In HPC clusters, BPROC and the master of a Platform LSF cluster are points of central control.
o Clusters tend to be tightly coupled architectures.
• Often built around proprietary technologies
o In clustering for HPC via SSE, proprietary protocols and interfaces remain in use.
o This is more pronounced in the case of clustering for High Availability and transactional processing.
• Delivering non-trivial QoS
o There is a locality of concern: a tightly coupled LAN versus a geographically dispersed WAN environment.

HPC Grids

Scientific Insight
Five steps to scientific insight

• Determine the relevant physics, chemistry, etc.
• Represent the science mathematically
• Represent the mathematics numerically
• Model/simulate numerically
• Produce numerical/visual results

Step 1) Determine the relevant physics, chemistry, etc.
• Once the problem under investigation has been determined, the first task is to determine the relevant physics, chemistry, etc.

Step 2) Represent the science mathematically
• The mathematical description typically exists; there is rarely a need to invent it.
• It can often be formulated by combining existing descriptions.
• Existing mathematical solution methods are often not enough to allow the resulting equations to be solved.
• It is often difficult, or near impossible, to derive analytic solutions to many scientific equations; in some cases it is difficult even to prove that such solutions exist.

Step 3) Represent the mathematics numerically
• Due to the challenging mathematical context, numerical methods are used to permit progress on otherwise unsolvable scientific problems.
• Typically this involves a discrete representation of the equation(s) in space and/or time, and performing calculations that trace out an evolution in space and/or time (see the sketch after the notes below).
• The underlying structure of the resulting set of equations has an impact on the types of numerical methods that can be applied.

Step 4) Model/simulate numerically
• Numerical experiments are acts of modeling or simulation subject to a set of pre-specified constraints.
• Problems in which time variations are key need to be seeded with initial conditions, whereas those with variations in space are subject to boundary conditions.
• The numerical model or simulation, subject to various constraints, can be regarded as a scientific application.

Step 5) Produce numerical/visual results
• the solution of a scientific problem results in numerical output that may or may not be represented graphically
• this application is further regarded as scientific workload that needs to be managed as the calculations are carried out
• typically undertaken as a process of discovery, with a recursive nature of investigation
Notes
• many of the equations of classical physics and chemistry push even the most powerful compute architectures to the limits of their capability
• irrespective of numerical methods and/or compute capability, these Grand Challenge Equations afford solutions only under simplifying assumptions, plus restrictions in space and/or time, etc.
Application and Architecture
Four types of applications are revealed by exploring process granularity:
• granularity refers to the size of a computation that can be performed between communication or synchronization points
• any point on the vertical axis of the figure identifies a specific ratio of computation (increasing from bottom to top) to communication (increasing from top to bottom)
• task parallelism, increasing from left to right on the horizontal axis, refers to the degree of parallelism present in the application

Serial Applications

• Most scientific problems are implemented initially as serial applications
• These problems require that each step of the scientific calculation be performed in sequence
• Serial applications can be executed on compute architectures ranging from isolated desktops, servers, or supercomputers to compute farms
• Compute farms are loosely coupled compute architectures in which system software is used to virtualize compute servers into a single system environment (SSE)
Data Parallel Applications

• the focus is on data processing
• seek and exploit any parallelism in the data
• often termed embarrassingly parallel
The parallelism in data is leveraged by:
• subdividing input data into multiple segments
• processing each data segment independently via the same executable
• reassembling the individual results to produce the output data
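A hedged sketch of this subdivide/process/reassemble pattern using a plain Java thread pool (the input array and segment count are illustrative, not from the notes):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.*;

public class DataParallel {
    public static void main(String[] args) throws Exception {
        int[] input = new int[1_000_000];
        Arrays.fill(input, 1);
        int segments = 4, len = input.length / segments;
        ExecutorService pool = Executors.newFixedThreadPool(segments);
        List<Future<Long>> parts = new ArrayList<>();
        for (int s = 0; s < segments; s++) {                  // subdivide the input data
            final int lo = s * len;
            final int hi = (s == segments - 1) ? input.length : lo + len;
            parts.add(pool.submit(() -> {                     // same code on each segment, independently
                long sum = 0;
                for (int i = lo; i < hi; i++) sum += input[i];
                return sum;
            }));
        }
        long total = 0;
        for (Future<Long> f : parts) total += f.get();        // reassemble the partial results
        pool.shutdown();
        System.out.println("total = " + total);
    }
}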


Data-driven parametric processing

Compute Parallel Applications

• parallelism in the Grand Challenge Equations can be exploited at the source-code level—e.g., by taking advantage of loop constructs in which each calculation is independent of the others in the same loop
• compute parallel applications are further segmented on the basis of memory access
o shared versus distributed memory
• with minimal language extensions and explicit code-level directives, OpenMP and Unified Parallel C (UPC) offer parallel computing with shared-memory programming semantics
• Symmetric MultiProcessor (SMP) systems allow for shared-memory programming semantics via threads, through uniform (UMA) and non-uniform (NUMA) memory access architectures
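A small Java sketch of loop-level parallelism with shared-memory semantics, in the spirit of an OpenMP parallel loop (the arrays and arithmetic are placeholders); each iteration is independent of every other, which is exactly the property exploited above:

import java.util.stream.IntStream;

public class ComputeParallel {
    public static void main(String[] args) {
        double[] a = new double[1_000_000];
        double[] b = new double[a.length];
        IntStream.range(0, a.length)
                 .parallel()                              // iterations spread across threads
                 .forEach(i -> b[i] = 2.0 * a[i] + 1.0);  // no iteration depends on another
        System.out.println("b[0] = " + b[0]);
    }
}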

Service Applications
• the focus is networking itself, or Web Services
• whereas MPI applications require tightly coupled architectures, networking applications can be applied in a variety of contexts
• loosely coupled architectures can be used in the instantiation of Web Services
HPC application development environment
• MPI libraries are architecture-specific and may come from a variety of sources—e.g., a system vendor, an interconnect vendor, or an open source contribution
• the relevant library implements the MPI specification to some degree of compliance
• this MPI library, in combination with the tools and utilities that support developers, collectively forms the application development environment for a particular platform
Challenges for MPI
• re-synchronization and re-connection were not factored into the specification
• fault tolerance was not factored in
• hosts and numbers of processors need to be specified as static quantities
• a single point of control is absent
• multiple versions of MPI may exist on the same architecture
MPI places the responsibility for these shortcomings on the application developer and user
MPI application development environment

Production HPC Reinvented

SKIP

• production HPC can be reinvented through the use of an integrated development and run-time environment in which workload management system software plays a key role
• consider the integrated production HPC solution stack shown below

Production HPC for Linux
• Linux
o low-processor-count servers, each running its own instance of the GNU/Linux operating system
• Myrinet Interconnect
o the GM message-passing protocol across low-latency, high-bandwidth, multi-port Myricom Myrinet switches, used solely to support parallel computing via MPI and the GM driver
• Platform HPC for Linux
o provides core workload management services
o also provides the control and audit primitives that allow parallel applications to be completely managed
• MPICH-GM Library
o users' MPI applications need to be compiled and linked against the application development environment provided by Myricom
The multi-protocol nature of MPI

• Portability was identified as a design goal for MPI
• This objective has been carried through in MPI implementations such as MPICH
• Despite this, heterogeneous parallel applications based on MPI must not only use the same implementation of MPI (e.g., MPICH) but also the same protocol implementation (e.g., GM)
• MPI is a multi-protocol API in which each protocol implements its own message formats, exchange sequences, etc.

Data Grids
Characteristics of grid resources
• numerous
• owned and managed by different organizations and individuals
• potentially faulty
• different security requirements and policies
• heterogeneous
• connected by heterogeneous, multilevel networks
• different resource management policies
• separated geographically
A grid enables users to

• Find and share data: be able to access it like data on their own system
• Find and share applications
• Share computing resources
Grid Computing requirements
• security
• a global name space
• fault tolerance, accommodating heterogeneity
• binary management
• multi-language support
• scalability
• persistence
• extensibility
• site autonomy
• complexity management
Data Grids

A data grid provides transparent, secure, high-performance access to federated data sets across administrative domains and organizations.

Data grids are used to provide secure access to remote data resources:

• flat-file data
• relational data
• streaming data
Examples
• two collaborators at sites A and B need to share the results of a computation performed at site A
• design data for a new part needs to be accessible by multiple team members working on a new product at different sites—and in different companies
Alternatives to Data Grid
• Network File System (NFS)
• File Transfer Protocol (FTP)
• NFS over IPSec
• Secure Copy—scp/sftp
• De-Militarized Zone (DMZ)
• GridFTP
• Andrew File System (AFS)

Network File System (NFS)

• NFS is the standard Unix solution for accessing files on remote machines within a LAN
• With NFS, a disk on a remote machine can be made part of the local machine's file system
• Accessing data from the remote system then becomes a matter of accessing a particular part of the file system in the usual manner
Advantages
• easy to understand and use
• applications need not be changed to access files on an NFS mount
• NFS server and client tools come standard on all Unix systems
Disadvantages
• a LAN protocol—it simply does not scale to WAN environments
Drawbacks
• frequent retransmission of data and over-consumption of bandwidth
o due to the caching strategy on the NFS server
o the read block size is too small, typically 8 KB; in a wide-area environment, latency can be high
• NFS does not address security well
o an NFS request packet is sent in the clear and contains the (integer) User ID (UID) and Group ID (GID) of the user making the read or write request
o a VPN deployed between the organizations may attenuate some of these attacks
• NFS requires the identity spaces at the two sites to be the same
o users must have accounts on each other's machines
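The "usual manner" point is worth stressing: once a remote disk is NFS-mounted, an unmodified program simply reads a path. A minimal sketch (the mount point and file name below are assumptions):

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadNfsFile {
    public static void main(String[] args) throws Exception {
        Path p = Paths.get("/mnt/remote-data/results.txt"); // hypothetical NFS mount point
        for (String line : Files.readAllLines(p))           // ordinary local-file API; NFS is invisible here
            System.out.println(line);
    }
}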

File Transfer Protocol (FTP)
• FTP is a command-line tool that provides its own command prompt and has its own set of commands
• Several of the commands resemble Unix commands, although several new commands, particularly for file transfer as well as for manipulating the local file system, are different
Advantages
• relatively easy to use
• has been around for a long time
• likely to be installed virtually everywhere
Disadvantages
• one must have access to an account on the other machine
o and potentially could do more than just file transfer
• every transfer requires typing the appropriate machine name, username, and password
o one can implement anonymous ftp instead
o but then anyone may access the ftp directory
• inherently insecure; passwords and data are transmitted in the clear
• applications cannot use ftp to take advantage of remote files without significant modification
NFS over IPSec

• IPSec is a protocol devised by the IETF to encrypt data on a network
• NFS over IPSec means traffic between an NFS server and an NFS client flows over a network on which the data has been encrypted using IPSec
• the encryption is transparent to the end user
Advantages
• regains privacy and integrity


Disadvantages
• all of the performance, scalability, configuration, and identity-space problems of NFS remain
• kernels must be recompiled in order to insert IPSec into the communication protocol stack
o once this recompilation is done, all traffic between all machines is encrypted; even Web, e-mail, and ftp traffic is encrypted, whether desired or not
Secure Copy—scp/sftp

• scp/sftp belong to the ssh family of tools
• scp is basically a secure version of the Unix rcp command that can copy files to and from remote sites
• sftp is a secure version of ftp
• both are command-line tools
• the syntax for scp resembles the standard Unix cp command, with a provision for naming a remote machine and a user on it
• the syntax and usage of sftp resemble ftp
Advantages
• their usage is similar to existing tools
• passwords and data transfers are encrypted, and therefore secure
Disadvantages
• these tools must be installed specifically on the machines on which they will be used
• installations for Windows are hard to come by
• scp/sftp do not solve several of the problems with ftp
• applications cannot take advantage of remote files using scp/sftp without significant modification
De-Militarized Zone (DMZ)

• A DMZ is simply a third set of machines, accessible to both Alice and Bob using ftp or scp/sftp, established to create an environment trusted by both parties
• When Alice wishes to share a file with Bob, she must transfer the file to a machine in the DMZ, inform Bob about the transfer, and request Bob to transfer the file from the DMZ machine to his own machine
Advantages
• neither party compromises his/her own machine by letting the other have access to it
Disadvantages
• the additional step of informing the other whenever a transfer occurs
• DMZs worsen consistency problems by maintaining three copies of the file
• the file makes two hops to get to its final destination, so network usage increases
GridFTP

• GridFTP is a tool for transferring files
• It is built on top of the Globus Toolkit
• It characterizes the Globus "sum of services" approach to grid architecture
• Users could use GridFTP to transfer files from one machine to another, similar to the way they would use ftp
• Both parties must install the Globus Toolkit in order to use this service
Advantages
• solves the privacy and integrity problems of ftp by encrypting passwords and data
• provides for high-performance, concurrent accesses by design
• an API enables accessing files programmatically
• data can be accessed in a variety of ways—for example, blocked and striped
• part or all of a data file may be accessed, thus removing the all-or-nothing disadvantage of ftp
Disadvantages
• does not address the identity-space problems of ftp
o identities are managed by Globus, using session-based credentials
• GridFTP does not solve the problems of maintaining consistency between multiple copies
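For the programmatic-API point above, a hedged sketch using the Java CoG/jglobus GridFTP client as I recall it—the class and method names are assumptions to be checked against the installed toolkit, not a verified recipe:

import java.io.File;
import org.globus.ftp.GridFTPClient;

public class FetchViaGridFTP {
    public static void main(String[] args) throws Exception {
        GridFTPClient client = new GridFTPClient("gridftp.example.org", 2811); // 2811 is the usual GridFTP port
        client.authenticate(null);                 // null => use the default GSI proxy credential (assumed behavior)
        client.get("/data/results.dat",            // remote path (illustrative)
                   new File("results.dat"));       // local destination
        client.close();
    }
}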

Andrew File System (AFS)
• The Andrew File System is a distributed network file system that enables access to files and directories distributed across multiple sites
• Access to files involves becoming part of a single virtual file system
• AFS comprises several cells, with each cell representing an independently administered file system
• The cells together form a single large virtual file system that can be accessed like a Unix file system
• AFS permits different cells to be managed by different organizations, thus managing trust
Advantages
• users do not require accounts on each other's machines
• they can control each other's access to their cell using the fine-grained permissions provided by AFS
• AFS avoids the consistency problems of other approaches by using copy-on-open semantics
• AFS supports intelligent caching mechanisms for performance
• access to an AFS file system is almost identical to accessing a Unix file system, so users have to learn few new commands
• legacy applications can run almost unchanged
• AFS implements strong security features
o all data are encrypted in transit
o authentication uses Kerberos, and access control lists are supported
Disadvantages
The use of Kerberos:
• all sites and organizations that want to connect using AFS must themselves use Kerberos authentication, and all of the Kerberos realms must trust each other
• this means changing the authentication mechanism in use at the organization
o a non-trivial—and typically politically very difficult—step to accomplish
• the Kerberos security credentials eventually time out
o long-running applications must be changed to renew credentials using Kerberos's API
AFS also requires that:
• all parties migrate to the same file system
o which would probably be a significant burden on them and their organizations
Avaki Data Grid
Objectives of the Avaki Data Grid
• High-performance
o to provide high-performance access in the wide area, local copies must be cached to reduce the time spent transferring data over the wide-area network
• Coherent
o the data grid must provide cache-coherent data while recognizing and exploiting the fact that different applications have different coherence requirements
• Transparent
o the data grid must be transparent to end users and applications
• Secure
o must support strong authentication, with identities that span administrative domains and organizations
• Between different administrative domains and organizations
o a grid must address the identity-mapping problem
o to span organizations, issues of trust management must be addressed

Three Design Principles
In designing the Avaki Data Grid:
• Provide a single-system view
o beyond the local network or cluster, to a geographically dispersed group of sites, perhaps consisting of several different types of platforms
o re-creating the illusion of a single resource for heterogeneous, distributed resources reduces the complexity of the overall system and provides a single namespace
• Provide transparency as a means of hiding detail
o the traditional distributed-system transparencies: access, location, heterogeneity, failure, migration, replication, scaling, concurrency, and behavior
o Example: an object should not have to know where a peer object is located in order to use it
• Reduce "activation energy"
o make using the technology easy
o users can easily and readily realize the benefit of using grids
Avaki Data Grid

• a federated sharing model
• a global name space
• a set of servers—called DGASs (Data Grid Access Servers)—that support the NFS protocols
o these can be mounted by user machines, effectively mapping the data grid into the local file system
Global name space
• a globally visible directory structure whose leaves may be files, directories, servers, users, groups, or any other named entity in the data grid
• Example: the path "/shares/grimshaw/myfile" uniquely identifies myfile, and the path can be used anywhere in the data grid by a client
• The share command takes a rooted directory tree on some source machine and maps it into the global name space
• Example: a user can share c:\data on his laptop into /shares/grimshaw/data
Data access in the ADG
• from an NFS client to a DGAS, via the local file system
• shell scripts and other applications that use stdio will work without any modification
Access control
• via access control lists (ACLs) on each grid object

Accessing the Avaki Data Grid
• no programming is required at all
• applications that access the local file system will work out of the box with a data grid
Three ways for end users to access data in the data grid:
NFS
• transparent access via the native file system
• applications require no modification
• tools such as "ls" in Unix and "dir" in Windows work on mounted data grids
• a similar capability using Windows file-system access (via CIFS) is also available
Command Line Interface
• a data grid can be accessed using a set of command-line tools that mimic the Unix file system commands such as ls, cat, etc.
• the Avaki analogues are avaki ls, avaki cat, etc.
• provided for administrators, or for users who may be unable for whatever reason to mount a DGAS into their file system
Web-based Portal
• using the portal, a user can traverse the directory structure, manage access control lists for files and directories, and create and remove shares
Managing the Avaki Data Grid
• Avaki ensures secure access to resources on the grid; files on participating computers become part of the grid only when they are shared, or explicitly made available to the grid
• Avaki's fine-grained access control is used to prevent unauthorized access to shared data
• only users who have explicitly been granted access can take advantage of the shared data
• the administrative unit of a data grid is the grid domain
• a data grid can be made up of one grid domain or several grid domains—effectively a "grid of grids"
Multiple domains should be established when:
• networks do not share a namespace
o separate grid domains should be established for those networks, and the two grid domains interconnected
• the two units truly represent different administrative domains within the organization, or two separate organizations
o in these cases, each organization will want to administer its grid separately, and might also be creating grids under different projects with different time constraints
Basic tasks for systems administrators
• Server management
o the number of grid servers is specified, and hot spares for high availability are configured
• Grid user management
o users and groups are either imported from the existing LDAP, Active Directory, or NIS environment, or they are defined within the grid itself
• Grid object management
o files and directories can be created and destroyed, ACLs set, and new shares added
• Grid monitoring
o logging levels, event triggers, and so on are set; the ADG can be configured to generate SNMP traps and thus be integrated into the existing network management infrastructure
• Grid interconnects
o the system administrator can establish and support connections with other grid domains

Data Grid architecture
Architecture

• ADG 3.0 has been written almost entirely in Java
• The architecture is based on an off-the-shelf J2EE application server
• Every ADG component runs within an application server
• Objects are created, deactivated, reactivated, and destroyed within the application server on demand
• Interactions between objects within the same application server are processed by the Java Virtual Machine (JVM) within the application server
• Interactions between objects in different application servers (typically, on different machines) are processed using remote method invocation (RMI)
• The product is configured to run RMI over SSL sockets in order to protect user credentials
• All objects log several levels of messages using log4j, but can also be configured to generate Simple Network Management Protocol (SNMP) traps, send e-mail, etc.
The major components of an ADG are:
• Grid Servers
• Share Servers
• Data Grid Access Servers (DGAS)
• Proxy Servers
• Failover Servers (Secondary Grid Domain Controllers)
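Since the inter-server interactions above are plain Java RMI, a minimal, self-contained RMI sketch may help; the interface and names are illustrative, not Avaki's actual API:

import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

interface MetaDataService extends Remote {
    String lookup(String objectName) throws RemoteException;  // invoked across JVMs/machines
}

public class MetaDataServer implements MetaDataService {
    public String lookup(String objectName) { return "meta:" + objectName; }
    public static void main(String[] args) throws Exception {
        MetaDataService stub =
            (MetaDataService) UnicastRemoteObject.exportObject(new MetaDataServer(), 0);
        Registry reg = LocateRegistry.createRegistry(1099);   // in-process registry for the demo
        reg.rebind("metadata", stub);                         // clients call reg.lookup("metadata")
        System.out.println("metadata service exported");
    }
}

Running RMI over SSL, as the product does, would additionally pass SslRMIClientSocketFactory/SslRMIServerSocketFactory to exportObject.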

Grid Servers
A grid server performs the following grid-related tasks:

Domain Creation
• This grid server is also called a grid domain controller (GDC)
• The GDC creates and defines a domain
• A domain represents a single grid
• Every domain has exactly one GDC
• Multiple domains may be interconnected
Authentication
• this grid server is responsible for receiving the user name and password
• verifying the user's identity using a built-in grid authentication service
• or delegating this process to a third-party authentication service such as NIS, Active Directory, or Netegrity
Access Control
• the grid server uses the user's credentials to retrieve her identity and then checks the access controls on the object to determine whether the requested access is permissible
Meta-data management
• every object in a grid has meta-data associated with it
o creation time, ownership information, modification time, etc.
• a grid server is responsible for storing the meta-data in an internal database
• performing searches on it when requested
• and refreshing the information when it becomes stale
Monitoring
• monitoring typically involves determining the response time of other components to ping messages
Searching

Share Servers

• responsible for bulk data transfer to and from a local disk on a machine
• always associated with a grid server
• The grid server is responsible for verifying whether a given read/write request is permissible
• If the request is permitted, the grid server passes a handle to the user as well as to the share server
o the user's request is then forwarded to the share server along with this handle
• Subsequent requests are satisfied by the share server without the intervention of the grid server
• A share server performs the actual bulk data transfers; its grid server performs grid-related tasks for it
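A purely illustrative sketch of this handle flow—none of these types exist in Avaki's API; they only model the division of labor between the two servers:

import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

class ShareServer {
    private final Set<String> handles = new HashSet<>();
    void register(String handle) { handles.add(handle); }     // handle pushed by the grid server
    byte[] read(String handle, String path) {                 // bulk transfer; only a local handle check
        if (!handles.contains(handle)) throw new SecurityException("unknown handle");
        return new byte[0];                                   // stand-in for real disk I/O
    }
}

class GridServer {
    private final ShareServer share;
    GridServer(ShareServer share) { this.share = share; }
    String authorize(String user, String path) {              // ACL checks would happen here
        String handle = UUID.randomUUID().toString();
        share.register(handle);                               // handle goes to the share server...
        return handle;                                        // ...and back to the user
    }
}

public class HandleFlowDemo {
    public static void main(String[] args) {
        ShareServer share = new ShareServer();
        GridServer grid = new GridServer(share);
        String handle = grid.authorize("alice", "/shares/a/file");
        share.read(handle, "/shares/a/file");                 // later reads bypass the grid server
        System.out.println("read satisfied by share server alone");
    }
}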

Data Grid Access Servers (DGAS)
• A DGAS provides a standards-based mechanism to access a data grid
• A DGAS is a server that responds to NFS 2.0/3.0 protocols and interacts with other data grid components
A DGAS is not a typical NFS server:
• it has no actual disk or file system behind it; it interacts with components that may be distributed, be owned by multiple organizations, be behind firewalls, etc.
• it supports the Avaki security mechanisms; access control is via signed credentials, and interactions with the data grid can be encrypted
• it caches data aggressively, using configurable local memory and disk caches
• furthermore, a DGAS can be modified to exploit semantic data that can be carried in the meta-data of a file object, such as "cacheable," "cacheable until," or "coherence window size"
• in effect, a DGAS provides a highly secure, wide-area NFS
Proxy Servers

• enables accesses across a firewall
• requires a single port in the firewall to be opened for TCP—specifically HTTP/HTTPS—traffic
• all Avaki traffic passes through this port
• the proxy server accepts all Avaki traffic forwarded from the firewall and redirects the traffic to the appropriate components running on machines within the firewall
• the responses of these machines are sent back to the proxy server, which forwards this traffic to the appropriate destination through the open port on the firewall
Failover Servers (Secondary Grid Domain Controllers)

• A failover server is a grid server that serves as a backup for the GDC
• A failover server is configured to synchronize its internal database periodically with the GDC
• If a GDC becomes unavailable—whether because the machine on which it is running is down, because the network is partitioned, or for any other reason—users can continue to access grid data without significant interruption in service
• Grid objects are accessed using a unique name, called a Location-independent Object IDentifier (LOID)
• When a data grid is operating in failover mode, i.e., with a failover server acting in lieu of the GDC, actions that change the GDC's database are prohibited

Disclaimer
Intended for educational purposes only. Not intended for any sort of commercial use.
Purely created to help students with limited preparation time.
Text and pictures used were taken from the reference items.

Reference
Grid Computing: A Practical Guide to Technology and Applications, by Ahmar Abbas

Credits
Thanks to my family members who supported me while I spent a considerable amount of time preparing these notes.
Feedback is always welcome at [email protected]

UNIT – III ARCHITECTURE AND MANAGEMENT
The Open Grid Services Architecture – Analogy – Evolution – Overview – Building on the OGSA platform – Implementing OGSA-based Grids – Creating and Managing Services – Services and the Grid – Service Discovery – Tools and Toolkits – Universal Description Discovery and Integration (UDDI)
The Open Grid Services Architecture (OGSA)
• a set of technical specifications that define a common framework allowing businesses to build grids both across the enterprise and with their business partners
• OGSA will define the standards required for both open source and commercial software for a broadly applicable and widely adopted global grid infrastructure
• an enabling infrastructure for systems and applications that require the integration and management of services within distributed, heterogeneous, dynamic "virtual organizations"
• defines the notion of a "Grid Service," which is a Web Service that conforms to a specific interface and behavior, as defined in various specifications developed by the Global Grid Forum (GGF)
Analogy
• The analogy provides before-and-after reference points through a familiar example
• Office-productivity software is used here
Multiple vendors with multiple products sharing a common operating environment

• vendors had to develop their own user interfaces, build their own tools and utilities, etc.
• end users had to learn several different user interfaces, plus the details of per-application spell checkers, print drivers, etc.
• this approach was not sustainable
Multiple vendors with multiple products under the COM-enhanced OS

• OLE has been superseded by the Component Object Model (COM)—Microsoft's framework for developing and supporting program component objects
• COM has closer technical ties with OGSA
• DCOM is generally equivalent to CORBA (Common Object Request Broker Architecture)


• Although OGSA leverages CORBA concepts, OGSA also needs to directly address secure interoperability and provide a richer interface definition language
The introduction of COM:
• introduces a de facto standard and implementation
o encouraging modularity and facilitating extensibility in the Microsoft Windows operating environment
• facilitates the transition from isolated products to a suite of integrated products
o almost eliminating the value of per-application differentiation, and favoring a single-vendor solution
• enables customers—from end users to support organizations
• enables independent software vendors (ISVs)
o leveraging COM simplifies and accelerates software development
Evolution of OGSA
Grid Computing

The grid problem
• defined as "flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources"—referred to as virtual organizations
• the purpose of such sharing is to enable a new kind of eScience, by harnessing collections of resources that could not be "brought under one roof"
• distributed computing technologies such as CORBA or DCE are effective within the domain of one organization
o management of cross-organizational relationships with them is unscalable
Globus Toolkit
• developed to help solve this grid problem
• a need was identified for a common set of APIs, protocols, and services to support Grid Computing in a way that would facilitate the dynamic interoperation required by the virtual organization
Global Grid Forum
• formed to host a number of working groups focused on defining standards for distributed computing
• the main focus was on high-performance computing
• there are focus areas covering specific aspects of the grid problem, including working groups on
o security issues
o data management and access
o resource management and scheduling
o user environments and programming models
Platform Computing
• solving subsets of the grid problem in order to enable organizations to realize more efficient utilization of computing resources
Web Services

• In parallel with the development of grid technologies for scientific computing, businesses were attempting to deal with data- and application-integration problems
o hitting up against the constraints of many existing distributed computing models
• incompatibilities existed at the protocol level and at the interface level
o Example: a business based on CORBA couldn't integrate systems with businesses based on Java RMI
• Web Services were defined to decouple the programming environment from the integration environment
• based on existing Internet standards such as XML, and W3C standards such as
o Simple Object Access Protocol (SOAP)
o Web Services Description Language (WSDL)
o Web Services Inspection Language (WSIL)
• these help define conventions for a service provider and a service consumer to exchange messages in a protocol- and programming-language-independent way
• industry adoption of the Web Services model exploded
o Examples: Microsoft (.NET™), IBM (Dynamic eBusiness™), and Sun (Sun ONE™)

• developers can leverage existing programming skills
• legacy applications can continue to be maintained
• focus only on the interaction between systems and the semantics of the messages being sent
• allowed organizations to take advantage of computing architectures like Cluster Computing
o applications could now be architected as sets of services
o and run on more cost-effective clusters of Intel-based systems, while still maintaining previous levels of QoS
Convergence

• The Web Services approach provides a number of desirable characteristics for a grid technology
• The focus on interface rather than implementation, and the need to support activities such as dynamic discovery of available services and interoperation of heterogeneous environments, provide the necessary level of abstraction to facilitate the dynamic relationships represented by virtual organizations
• Adopting Web Services mechanisms allows for the accelerated adoption of grid technologies, due to the large number of Web Services tools, services, and hosting environments available
• The Open Grid Services Architecture (OGSA) is being proposed to the GGF as a standard platform for Grid Computing, and represents the definition of grid technology and functionality as Web Services
OGSA Overview
Contents

• Introduction
• The OGSA Platform
• OGSI
o WSDL extensions and conventions
o Service Data
o core Grid Service properties
o portTypes for basic services
• OGSA Platform Interfaces
• OGSA Platform Models
Introduction

• OGSA is intended to define an infrastructure for integrating and managing services within a virtual organization
• it is an architecture for building grid applications
• the specification is being developed under the OGSA Working Group of the GGF
• the scope of the working group includes identifying and outlining requirements for the different OGSA services, protocols, and usage models, and defining relationships with other standards activities
• the approach of the working group is to drive this work based on the following use-case scenarios:
o Commercial Data Center—management of outsourced computing infrastructure
o National Fusion Collaboratory—on-demand application services for large-scale analysis and simulation
o Severe Storm Prediction—comparison of models to "streamed" sensor data, triggered by a "storm event"
o Online Media and Entertainment—video on demand or online gaming
o Service-Based Distributed Query Processing—data and computing resource federation
The functional requirements fall into different categories:
• Support for heterogeneity—of platforms, mechanisms, and administrative environments
• Different application structures—process, resource requirements, flows, workload
• Basic functionality—discovery and resource brokering, metering and accounting, data sharing, support for managing virtual organizations, monitoring, and policy enforcement
• Security—the need to support security infrastructures and perimeter security solutions such as firewalls
• Resource management—provisioning in a uniform way, virtualizing access, optimization of usage, managing transport, batch and interactive access, SLA management and monitoring, and CPU scavenging
• Some desirable system properties—fault tolerance, disaster recovery, self-healing capabilities, the ability to detect defects or attacks and to route around them, administration capabilities

The OGSA Platform

• The OGSA Platform is made up of three components
• it focuses on interfaces and usage models
• it does not define protocol bindings, hosting environments, or domain-specific services
Open Grid Services Infrastructure (OGSI)
• OGSI represents the convergence of Web Services and grid technologies
• it defines the mechanisms for managing Grid Service instances (e.g., messaging, lifecycle management)
OGSA Platform Interfaces
• OGSI-compliant Grid Services (i.e., interfaces and associated behaviors) that are not defined within OGSI
• examples include registries, data access and integration, resource manager interfaces, etc.
OGSA Platform Models
• a combination of OGSA services and information schemas for representing real entities on the grid
• Example: a standard definition of terms describing a computer system and the associated behavior
OGSI

• defines mechanisms for creating, managing, and exchanging information among Grid Services
• everything in OGSA is a Grid Service
• Grid Services can be permanent or transient, long- or short-lived
• each Grid Service is required to support a specific set of interfaces, and to act in a specifically defined way, so that two services can communicate meaningfully and act in a predictable fashion
OGSI extends WSDL and XML Schema Definition to support:
• stateful Web Services
• inheritance of Web Services interfaces (portTypes)
• asynchronous notification of state change
• references to instances of services
• collections of service instances
• service state data that augment the constraint capabilities of XML Schema Definition
OGSI definitions can be grouped into four broad categories:
• WSDL extensions and conventions
• Service Data
• core Grid Service properties
• portTypes for basic services
WSDL Extensions and Conventions
• OGSI extended WSDL 1.1 to correct two deficiencies
o first, there was no interface inheritance (i.e., inheritance of portTypes)
o second, it wasn't possible to add informational elements to the portType
• OGSI redefines the wsdl:portType element to add the desired semantics
Service Data
• a mechanism to expose a service instance's state data, called serviceData
• can be used for reading, writing, or subscription
Mutability
• an indication of how an SDE value can change over time
• mutability can take one of four values:
o Static—the SDE value is assigned in the WSDL service definition
o Constant—the SDE value must not change during the lifetime of the service
o Extendable—new elements can be added, but none are removed
o Mutable—any elements of the SDE may be added or removed at any time

Core Grid Service Properties
Service Description and Service Instance
• OGSI makes a point of distinguishing between the two
• the Grid Service description specifies how clients interact with instances of the service
• it can also be used for discovering instances of Grid Services that implement a particular service description
Modeling Time in OGSI
• the Greenwich Mean Time (GMT) global time standard is used
• nothing specific is said about time synchronization between Grid Services
• OGSI also defines conventions for representing "zero time" and "infinity"
XML Element Lifetime Declaration Properties
• used to describe the lifetime for which a particular SDE value is valid
• the three attributes are:
o ogsi:goodFrom—the time from which the content of an element is valid
o ogsi:goodUntil—the time after which the content can be considered invalid
o ogsi:availableUntil—the time up until which the element itself can be considered valid
Interface Naming and Change Management
• OGSI specifies a naming scheme for portTypes to make sure that changes are detectable
• all elements of a Grid Service description MUST be immutable
• if a change is needed, a new portType MUST be defined with a new QName
Naming Grid Service Instances
• OGSI uses a two-level naming scheme for locating Grid Services
• one or more Grid Service Handles (GSHs) name every Grid Service
• a GSH names only one Grid Service instance
• a GSH is resolved into a Grid Service Reference (GSR)
• service locator
o an XML structure containing zero or more GSHs, zero or more GSRs, and zero or more interface (i.e., portType) QNames
Grid Service Lifecycle
• the life of a Grid Service instance is demarcated by its creation and destruction
• clients create Grid Services by invoking the createService operation
• destruction can be accomplished
o by explicitly destroying the Grid Service, or
o via a "soft-state" approach where clients indicate their interest in a Grid Service for a particular duration
Common Handling of Operation Faults
• OGSI defines an XML base type (ogsi:FaultType), which must be returned in all fault messages
• it includes description, originator, timestamp, fault cause, fault code, and extension elements
Extensible Operations
• many OGSI operations can accept an "untyped" input argument
• this allows common patterns of behavior to be expressed without needing to define an operation for every different type that can be used in the pattern
• the extensible parameter is defined in a serviceData element of type ogsi:OperationExtensibilityType
PortTypes for Basic Services
• OGSI defines a set of portTypes and describes the behavior of a collection of patterns
The portTypes defined (in the ogsi namespace) are:

o GridService—encapsulates the root behavior of the service model
o HandleResolver—mapping from a GSH to a GSR
o Factory—standard operation for creation of Grid Service instances
o NotificationSource—allows clients to subscribe to notification messages


o NotificationSubscription—defines the relationship between a source and sink pair
o NotificationSink—defines a single operation for delivering a notification message
o ServiceGroup—allows clients to maintain groups of services
o ServiceGroupRegistration—allows Grid Services to be added to and removed from a ServiceGroup
o ServiceGroupEntry—defines the relationship between a Grid Service and its membership in a ServiceGroup
GridService includes the following serviceData elements:
• interface
• serviceDataName
• factoryLocator
• gridServiceHandle
• gridServiceReference
• findServiceDataExtensibility
• setServiceDataExtensibility
• terminationTime
The operations defined for GridService are:
• findServiceData
• setServiceData
• requestTerminationAfter
• requestTerminationBefore
• destroy
OGSA Platform Interfaces

• The OGSA Platform Interfaces define a number of functions that commonly occur within grid systems
The function categories:
• Service Groups and Discovery Interfaces
• Service Domain Interfaces
• Security
• Policy
• Data Management Services
• Messaging and Queuing
• Events
• Distributed Logging
• Metering and Accounting
o Metering Interface
o Rating Interface
o Accounting Interface
o Billing/Payment Interface
• Administrative Services
• Transactions
• Grid Service Orchestration
OGSA Platform Models

• The OGSA Platform includes sets of common models for describing and manipulating the real entities that are represented through Grid Services
• there will be multiple models defined, corresponding to specific domains
• Examples:
o a model for a "Linux HPC Cluster" might be defined, which describes the attributes of Linux clusters that are specific to HPC and that are needed to run jobs on the cluster
o a model for a "Point of Sale" system might be defined, which includes services and attributes pertaining to this type of application (e.g., how to represent "orders" and "inventory" as Grid Services)

Building on the OGSA Platform
Pre-OGSA grid deployments
• academic "Partner Grids" based on the Globus Toolkit
• production "Enterprise Grids" as implemented by Platform Computing
• two essential elements of this kind of problem are
o the scheduling of computing resources (i.e., CPUs, memory, network, applications)
o uniform access to data
Open Grid Services Architecture (OGSA)
• The Agreement-based Grid Service Management specification (WS-Agreement, also referred to as OGSI-Agreement) defines the way that Grid Services can be managed based on organizational goals and application requirements, such that resources can be scheduled for use
• The Data Access and Integration Services (DAIS) working group at the GGF is focused on developing specifications that help virtualize access to data sources, hiding the complexity associated with integrating data from multiple locations and in multiple forms
• both efforts are being driven equally from academia and industry
WS-Agreement

• WS-Agreement defines the Agreement-based Grid Service Management model: a set of OGSI-compliant portTypes allowing clients to negotiate with management services in order to manage Grid Services or other legacy applications (e.g., a local resource manager)
• if a user wants to submit a compute job to run on a cluster, their Grid Service client contacts a job management service and negotiates a set of agreements ensuring that the user's job will have access to a number of CPUs, memory, storage space, etc.
• WS-Agreement defines fundamental mechanisms based on OGSI-compliant Agreement services, which represent an ongoing relationship between an agreement provider and an agreement initiator
• the agreements define the behavior of a delivered service with respect to a service consumer
• an Agreement is defined in sets of domain-specific agreement terms (defined in other specifications); the WS-Agreement specification focuses on defining the abstraction of the agreement and the protocol for coming to agreement, rather than on defining sets of agreement terms
Steps in creating agreements
• Agreements are negotiated by creating a new Agreement service instance through a Grid Service that implements the AgreementFactory interface, inherited from Factory
• a client calls the Factory::createService operation with CreationParameters for the requested terms
• some CreationParameters can be required, and some can be open to counter-offers
• if the agreement terms are not acceptable, the createService operation returns a fault
• otherwise, a new Agreement service instance is created with Service Data Elements (SDEs) representing the negotiated terms, or an AgreementOffer service instance is created, which represents a number of alternative offers that the client can then choose from
• instantiation of the Agreement service implies that the agreement has been accepted and is in force
Job submission example
• the client might contact a job management service, which implements the AgreementFactory interface,
• with CreationParameters that say "my job has to have a software license for application X, would like to have 8 CPUs, and would like to have 4 GB of RAM"
• if it could provide all of the terms, an Agreement service instance representing the job resources would be created; otherwise the request would be rejected
• if, because of available resource constraints, the job management service could not fulfill the terms of the original CreationParameters, but could supply either 4 CPUs and 4 GB of RAM, or 8 CPUs and 2 GB of RAM, the job management service could create an AgreementOffer that included two potential agreements—"app X, 4 CPUs, 4 GB" and "app X, 8 CPUs, 2 GB"—one of which the client could choose (because CPUs and memory were terms subject to counter-offers)
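The negotiation is defined at the portType level, not as a Java API, but the shape of the exchange can be sketched with invented Java types; AgreementFactory, Agreement, FaultException, and the term map below are illustrative assumptions, not part of any specification:

import java.util.HashMap;
import java.util.Map;

interface Agreement {                           // an accepted agreement, now in force
    Map<String, Object> negotiatedTerms();      // exposed as SDEs in the real protocol
}

class FaultException extends Exception {
    FaultException(String msg) { super(msg); }
}

interface AgreementFactory {
    // Mirrors Factory::createService: faults if required terms can't be met;
    // a real service could instead return an AgreementOffer with counter-offers.
    Agreement createService(Map<String, Object> creationParameters) throws FaultException;
}

public class NegotiationSketch {
    public static void main(String[] args) {
        Map<String, Object> terms = new HashMap<>();
        terms.put("license", "appX");   // required term
        terms.put("cpus", 8);           // open to counter-offer
        terms.put("ramGB", 4);          // open to counter-offer
        // A client would pass 'terms' to a job management service's createService here.
        System.out.println("requested terms: " + terms);
    }
}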

Data Access and Integration Services (DAIS)

The Data Access and Integration Services working group is focused on defining grid data services that provide consistent access to existing, autonomously managed databases.

Issues
• the wide-area distribution of resources means that data to be processed could be located far away
• accessing multiple data sources, with different formats and different access mechanisms
The purpose of the DAIS specifications is to:
• highlight a set of "transparencies" needed to deal with the complexities of data management
• define a set of OGSI-compliant Grid Services intended to virtualize data access
Heterogeneity Transparency
• applications accessing a data source should not have to be cognizant of its implementation
Name Transparency
• applications should not manipulate data objects directly
• they access them through "logical domains" constrained by attributes
• this implies both location and replication transparency
Ownership and Costing Transparency
• applications should not need to separately negotiate access to data sources when dealing with multiple, autonomous data sources, both in terms of access rights and usage costs
Parallelism Transparency
• applications processing data should automatically get parallel execution over nodes on the grid, if possible
Distribution Transparency
• distributed data should be able to be maintained in a unified way
Services Needed
Discovery
• Name (maps logical domain and predicates onto an actual data source or service)
• Ownership and Costing (optimizes for source-independent metrics like time or cost)
Federated Access
• Heterogeneity (allows access independent of data format and implementation)
• Distribution (provides unified access to distributed data sources)
• Name (provides some location transparency for file access)
Consistency Management
• Distribution (maintains consistency of distributed data)
Collaboration
• Distribution (maintains consistency of data updated in multiple places)
Workflow Coordination
• Parallelism (automatically parallelizes requests)
Authorization
• Ownership and Costing (single sign-on)
Replication/Caching
• Name (migrates data on demand to the right place)
Schema Management
• Heterogeneity (maps between data formats)

Implementing OGSA-based Grids
The Globus Toolkit 3
• The Globus Project is a consortium of academic researchers
• investigating distributed computing in support of Big Science
o sharing terascale volumes of data from high-energy-physics (HEP) experiments between hundreds of globally distributed scientists, and aggregating hundreds of CPUs to perform Grand Challenge computations
• a software toolkit that addresses the key technical challenges in building grid tools, services, and applications
Protocol stack in version 2.x of the Globus Toolkit

Components of GT3
• An OGSI reference implementation—implements all the OGSI-specified portTypes, as well as some APIs that help developers implement OGSI-compliant services
• Security Infrastructure—based on GSI, this provides the encryption, authentication, and authorization services well known from GT2
• System-level Services—some infrastructural services that can be used with all other Grid Services
• Base Services—higher-level Grid Services, corresponding to GRAM, MDS, and GridFTP
• User-defined Services—any higher-level services built on top of the base services
• Grid Service Container—the OGSI runtime environment, intended to shield end users from details of implementation, such as the kind of database used to store service data
• Hosting Environment—implements the typical Web-server type of services, such as transport protocols
o four hosting environments are supported:
o 1) an embedded environment for clients and lightweight servers
o 2) a standalone server, based on the embedded environment but with some additional server functionality
o 3) the grid container inside a standard Java Servlet™ engine
o 4) the grid container inside an EJB application server
Globus Toolkit Model

A functional representation of the OGSA Platform

• GT3 Core focuses on providing support for writing Grid Services in the Java programming language
• there are a number of utility classes intended to make it easier to develop OGSI-compliant Grid Services
• this includes support for
o automatically adding OGSI-mandated service data to all services
o APIs for dynamically adding service data definitions corresponding to SDEs in your WSDL
o automatic lifecycle management for service instances in the container, based on demand
o some support for service state management
GCSF
• an open source implementation of the WS-Agreement protocol contributed by Platform Computing
• it provides an implementation of the OGSI-Agreement Grid Services, built on the Globus Toolkit's OGSA hosting environment
• the framework provides the basic protocols for negotiating agreements and the state data associated with managing agreements
• it provides the basis for the implementation of Grid Services for meta-scheduling
Creating and Managing services
None

Services and the Grid
• A service is defined as a behavior provided by a component for use by any other component, based on a network-addressable interface contract
• such a contract specifies the set of operations that one service can invoke on another (a set of method calls, for instance, when a service is implemented using object-oriented technologies)
• such an interface may also contain additional attributes that are not specifically related to the functionality of the called service, covering aspects such as
o the performance of the service
o the cost of accessing and using the service
o the details of ownership and access rights associated with the service
• the interface also allows the discovery, advertising, delegation, and composition of services
Example
• an automobile company needs to share its compute and data resources for developing a new car
• the following participants are involved in such a collaboration:
Grid services and toolkit developers
• these users care about what grid tools and resources are available within the grid infrastructure to guide the software development process
Grid administrators
• these users want to know about the status of resources in a production-mode setting, which includes controlling, monitoring, and utilization-related information
Grid application users
• these users care about having a transparent, high-level view of the grid that exposes information related to a convenient problem-solving environment, rather than the maintenance and development of complex grid applications
A service stresses interoperability and may be dynamically discovered and used.

A computational grid is composed of a number of heterogeneous resources, which may be owned and managed by different administrators.

Each of these resources may offer one or more services:
• a single application with a well-defined API
• a single application used to access services on other resources managed by a different systems administrator
• a collection of coupled applications, with predefined interdependencies between elements of the collection
o each element provides a subservice that must be combined with other services in a particular order
• a software library containing a number of subservices, all of which are related in some functional sense
o e.g., a graphics or a numeric library
• an interface for managing access to a resource
o this may include access rights and security privileges, scheduling priorities, and license-checking software
Executing a service
• it does not involve persistent software applications running on a particular server
• a service is primarily executed only when a request for the service is received by a service provider
• the service provider publishes (advertises) the capability it can offer, but it does not need to have a permanently running server to support the service
• the environment that supports the service abstraction must satisfy all preconditions needed to execute a service
• a service may also have soft state, implying that the results of a service may not exist forever
• soft state is particularly important in the context of dynamic systems such as Computational Grids, where resources and user properties can vary over time
• the soft-state mechanism also allows a system to adapt its structure, depending on the behavior of participants as they enter and leave the system at will


• The soft-state approach allows participation for a particular time period, subsequently requiring the participants to renew their membership
• participants not able to renew their membership are automatically removed from the system
OGSA and WSDL
• OGSA is a distributed interaction and computing architecture that aims to integrate grid systems with Web Services, and it defines standard mechanisms to create, name, and discover Grid Service instances
• the core concept in OGSA is the "Grid Service," defined in the Web Services Description Language (WSDL) and containing some pre-defined attributes (such as types, operations, and bindings)
• WSDL provides a set of well-defined interfaces that require the developer to follow specific naming conventions
• a new tag, gsdl, has been added to the WSDL document to enable the description of Grid Services
• once the interface of a Grid Service has been defined, it must be made available for use by others
o this process generally involves publishing the service within one or more registries
• the standard interface for a Grid Service includes multiple bindings and implementations (such as the Java and C# languages), and development may be undertaken using a range of commercial (such as Microsoft's Visual Studio .NET™) or public-domain (IBM's WSTK) tools
• a Grid Service may be deployed on a number of different hosting environments, although all services require the existence of the Globus Toolkit
• Grid Services implemented with OGSA are generally transient and are created by using a factory service
o there may be many instances of a particular Grid Service, and each instance can maintain internal state
• service instances can exchange state via messaging
• connectivity between the user and a number of organizations is achieved by using XML/SOAP messages
• each organization in this instance offers a particular set of services—such as mathematical and graphics libraries—that are encoded as WSDL services
• a user can pick and combine a number of such services to implement an application
• each organization may itself support a local grid containing compute and data resources internal to the organization
• the ability to integrate a number of local grids is a significant advantage of using Web Service technologies to build Computational Grids
General framework for enabling interaction between multiple grid systems based on Web Services

Creating a new service
• When creating a new service, a user application issues a create Grid Service request on a factory interface, leading to the creation of a new instance.
• This newly created instance is automatically allocated some computing resources.
• An initial lifetime of the instance can also be specified prior to its creation, allowing the OGSA infrastructure to keep the service "alive" for the duration of this lifetime.
  o This is achieved by sending it keepalive messages.
• The newly created instance is assigned a globally unique identifier, called the Grid Service Handle (GSH), which is used to distinguish this particular service instance.
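A toy, in-memory Java model of this factory interaction (all class names here are invented stand-ins; the real OGSA interfaces are WSDL-defined and invoked over SOAP, not local calls): a factory call allocates an instance, stamps it with a GSH, and gives it an initial lifetime that keepalives extend.

import java.util.UUID;

public class FactoryDemo {
    record GridServiceHandle(String id) {}                 // stands in for the GSH

    static class ServiceInstance {
        final GridServiceHandle gsh = new GridServiceHandle(UUID.randomUUID().toString());
        long expiresAt;                                    // soft-state lifetime
        ServiceInstance(long lifetimeMillis) { renew(lifetimeMillis); }
        void renew(long lifetimeMillis) {                  // a keepalive extends the lifetime
            expiresAt = System.currentTimeMillis() + lifetimeMillis;
        }
    }

    static class Factory {
        ServiceInstance createService(long initialLifetimeMillis) {
            // Allocate resources and assign a globally unique handle.
            return new ServiceInstance(initialLifetimeMillis);
        }
    }

    public static void main(String[] args) {
        Factory factory = new Factory();
        ServiceInstance svc = factory.createService(30 * 60 * 1000);  // 30-minute initial lifetime
        System.out.println("created instance with GSH " + svc.gsh.id());
        svc.renew(30 * 60 * 1000);                         // client keepalive message
    }
}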

Steps involved in the implementation of a Grid Service
1. Write a WSDL portType definition, using OGSA types (or defining new ones).
2. Write a WSDL binding definition, identifying ways in which one could connect to the service, e.g., by using SOAP/HTTP, TCP/IP, etc.
3. Write a WSDL service definition based on the portType and binding identified in Steps 1 and 2.
4. Implement a factory by extending the FactorySkeleton provided, to indicate how new instances of a service are to be created.
5. Configure the factory with the various options available, such as the schemas supported.
6. Implement the functionality of the service by extending the ServiceSkeleton class.
   a. If existing code is to be used in some way, then the delegation mechanism should be used.
   b. When used in this mode, the factory returns a skeleton instance in Step 4.
7. Implement code for the client that must interact with the service.
Role of WSDL in OGSA
• Extensibility elements available within WSDL play a significant role in OGSA, enabling users to customize their Grid Services.
• This capability is useful for enabling a community or group of users to define special XML tags or queries that may be relevant within a particular application context.
• In OGSA, such extensibility elements include factory input parameters, query expressions, and service data elements (SDEs).
  o Factory input parameters provide the user with a mechanism to customize the creation of a new Grid Service.
  o Query expressions enable a user to utilize a specialist query language or representation scheme.
  o SDEs enable the definition of specialist XML tags.
WSDL Example
A mathematical service which makes use of numeric service bindings is illustrated below ...
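(The WSDL listing itself is elided above.) Separately, a hedged Java sketch of Steps 4–7: FactorySkeleton and ServiceSkeleton are the base-class names given in the text, but their real signatures are not shown there, so minimal stubs are included here only to keep the sketch self-contained and compilable.

class FactorySkeleton {                       // stub standing in for the toolkit class
    protected Object newInstance() { return null; }
}
class ServiceSkeleton { }                     // stub standing in for the toolkit class

// Step 4: extend FactorySkeleton to say how new instances are created.
class MathServiceFactory extends FactorySkeleton {
    @Override
    protected Object newInstance() {
        return new MathService();             // Step 5 would configure schemas etc.
    }
}

// Step 6: implement the service's functionality by extending ServiceSkeleton.
class MathService extends ServiceSkeleton {
    public double add(double a, double b) {   // operation declared in the WSDL portType
        return a + b;
    }
}

// Step 7: client code that interacts with the service (locally here; in
// reality the call travels over SOAP using the binding from Step 2).
public class MathClient {
    public static void main(String[] args) {
        MathService svc = (MathService) new MathServiceFactory().newInstance();
        System.out.println("3 + 4 = " + svc.add(3, 4));
    }
}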

Service Discovery
• Service discovery involves searching through local and remote registries to discover services that match a particular criterion defined by the user.
• It is generally restricted to hardware resources being made available as services and registered in a directory service such as the Metacomputing Directory Service (MDS).
• The MDS uses the Lightweight Directory Access Protocol (LDAP) to hierarchically structure the available resources within organizational domains.
• Each node within the LDAP structure contains properties associated with a particular resource.
• A user or application program can query the LDAP server to discover, for example, the type of hardware architecture.
• Each organizational domain participating in the grid would need to run a Grid Resource Information Service (GRIS) that is registered with a Grid Index Information Service (GIIS) to form an aggregated view of information across the virtual organization.
• The GIIS and GRIS together constitute the discovery service currently provided in the Globus Toolkit.
The functionality to be supported by a grid information service includes:
• Lookup and retrieval of information that can be pinpointed to a particular resource or object
• Query and search of information to retrieve a collection of related resources or objects
• Creation of new information upon request that is otherwise not stored or cached in the grid
• Event forwarding to relay information based on dynamic events within the grid infrastructure
• Aggregation for topical retrieval and organization of information
• Filtering of information to reduce the amount of information communicated and stored, and to increase performance
• Storage, backup, and caching of the information
• Security, protection, and encryption to enable access control and authentication
A grid information service builds the bridge between information about resources on the grid and the user community.

Tools and Toolkits
• The Globus Toolkit 3 provides a standard toolkit for developing prototype Grid Services applications.
• It includes services for information, remote job submission, file transfer, and security.
• Because these services are exposed through standard protocols, one can interface with them through a variety of client libraries.
• Easy-to-use client libraries are provided in Java and Python as part of the Commodity Grid (CoG) Kit project.
Globus Toolkit Grid Information Service
• The Globus Toolkit contains a grid information service called MDS (originally the Metacomputing Directory Service, later the Monitoring and Discovery Service).
• MDS was designed from the start as a distributed information service, with an information entry point for each virtual organization.
• Object classes describe what information can be stored in the directory.
• MDS comprises the Grid Resource Information Service (GRIS) and the Grid Index Information Service (GIIS).
• A GRIS is an information service that runs on a single resource and can answer queries from a user about that particular resource by directing these queries to an information provider deployed on that resource.
• A query to a GRIS typically returns information about the resource platform, architecture, operating system, CPU, memory, network, and file systems.
• A GIIS is an aggregate directory service that builds a collection of information services out of multiple GRISs.
• It supports queries against information spread across multiple GRIS resources.
• MDS can use the Grid Security Infrastructure (GSI), which enables the use of certificates for authentication and authorization.
• MDS provides both authenticated and anonymous access by users.
• Site policies (open/closed) specify the restrictions on registration of resources with a GIIS.
Accessing Grid Information
• The Java CoG Kit project developed an LDAP browser/editor that provides a user-friendly, Explorer™-like interface to LDAP directories.
• It is written entirely in Java with the help of the JNDI class libraries.
• It can connect to LDAP v2 and v3 servers.
• The figure shows the user interface of the LDAP browser/editor.
• Other interfaces have been developed in Perl, PHP, and Python.
The Java CoG Kit LDAP browser/editor
• The Java CoG Kit also provides a shell/batch script called grid-info-search.
• Example: the following command can be used to retrieve all available information from the machine hot.mcs.anl.gov while accessing a GRIS service running at port 2135:
> grid-info-search -x -h hot.mcs.anl.gov -p 2135 -b "Mds-Vo-name=local, o=Grid" "(objectclass=*)"
Performance Issues with MDS
• The performance of a query depends upon the information providers and the frequency of retrieval.
• One should stay connected to the MDS server only as long as the connection is required, in order to avoid blocking the limited number of ports to an MDS server.
• Avoid holding a connection open while analyzing results between subsequent queries.
• Consider the correlation between the query frequency and the update frequency of a value in the MDS.
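The same query as the grid-info-search example above, expressed with plain JNDI (the standard javax.naming API mentioned in this section); host, port, base DN, and filter are taken from that example, and this is only a sketch, assuming an MDS/LDAP server is actually reachable at that address.

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.directory.*;

public class MdsQuery {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://hot.mcs.anl.gov:2135");  // GRIS endpoint
        DirContext ctx = new InitialDirContext(env);

        SearchControls sc = new SearchControls();
        sc.setSearchScope(SearchControls.SUBTREE_SCOPE);

        // Equivalent of: grid-info-search -b "Mds-Vo-name=local, o=Grid" "(objectclass=*)"
        NamingEnumeration<SearchResult> results =
                ctx.search("Mds-Vo-name=local, o=Grid", "(objectclass=*)", sc);
        while (results.hasMore()) {
            System.out.println(results.next().getName());
        }
        ctx.close();   // release the connection promptly, per the advice above
    }
}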

Universal Description Discovery and Integration (UDDI)
• UDDI provides a standard method for publishing and discovering information about Web Services.
• It is a platform-independent, open framework for describing and discovering services offered by one or more businesses.
• Service discovery can happen at application design time or at run time.
• UDDI is a registry, not a data repository.
The issues that UDDI addresses are:
• Making it possible for organizations to quickly discover the right business from the millions online
• Defining how to enable interaction with the business, once it has been discovered
Information that can be stored in UDDI may be classified as follows:
White pages
• These contain basic contact information and identifiers about a company, including business name, address, contact information, and unique identifiers such as its DUNS or tax IDs.
• This information allows others to discover a Web Service based on business identification.
• In the context of Grid Computing, white pages can support the retrieval of an IP address or the amount of memory available on a particular resource.
Yellow pages
• These contain information that describes a Web Service using different business categories (taxonomies).
• This information allows others to discover Web Services based on their categorization (e.g., flower sellers).
Green pages
• These contain technical information about the Web Services that are exposed by a business, including references to specifications of interfaces, as well as support for pointers to various file- and URL-based discovery mechanisms.
UDDI and OGSA

UDDI OGSA bridge
• The UDDI OGSA bridge is meant to connect the two environments without necessitating a major change in either.
• It can enable the use of existing UDDI implementations by relating key structures in grid environments (such as Globus) with the UDDI registry.
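As an illustration of design-time discovery against a UDDI registry, a minimal sketch using JAXR (javax.xml.registry), the standard Java API for UDDI-style registries. The inquiry URL and search pattern are invented, and a JAXR provider implementation is assumed to be available on the classpath.

import java.util.Collections;
import java.util.Properties;
import javax.xml.registry.*;
import javax.xml.registry.infomodel.Organization;

public class UddiLookup {
    public static void main(String[] args) throws JAXRException {
        Properties props = new Properties();
        // Inquiry endpoint of the UDDI registry (illustrative URL).
        props.setProperty("javax.xml.registry.queryManagerURL",
                          "http://uddi.example.org/inquiry");

        ConnectionFactory factory = ConnectionFactory.newInstance();
        factory.setProperties(props);
        Connection connection = factory.createConnection();

        // "Yellow pages"-style lookup: find businesses by name pattern.
        BusinessQueryManager bqm =
                connection.getRegistryService().getBusinessQueryManager();
        BulkResponse response = bqm.findOrganizations(
                null, Collections.singletonList("Flower%"), null, null, null, null);

        for (Object o : response.getCollection()) {
            Organization org = (Organization) o;
            System.out.println("found: " + org.getName().getValue());
        }
        connection.close();
    }
}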

Disclaimer
Intended for educational purposes only. Not intended for any sort of commercial use.
Purely created to help students with limited preparation time.
Text and pictures used were taken from the reference items.

Reference Grid Computing: A Practical Guide to Technology and Applications by Ahmar Abbas

Credits
Thanks to my family members who supported me while I spent a considerable amount of time preparing these notes.
Feedback is always welcome at [email protected]

UNIT – IV NATIVE PROGRAMMING AND SOFTWARE APPLICATIONS
Desktop supercomputing – parallel computing – parallel programming paradigms – problems of current parallel programming paradigms – Desktop supercomputing programming paradigms – parallelizing existing applications – Grid enabling software applications – Needs of the Grid users – methods of Grid deployment – Requirements for Grid enabling software – Grid enabling software applications
Desktop Supercomputing: Parallel Computing – Historical Background

MIMD Computers

• The language of desktop supercomputing, CxC, combines the advantages of C, Java, and Fortran and is designed for MIMD architectures.
• Any parallel computer not following the SIMD approach (one program on one processor controls all the others) automatically fell into the MIMD category.
Parallel Asynchronous Hardware Architectures

List of popular MIMD hardware architectures:
• Symmetric Multiprocessing Systems (SMP)
• Massively Parallel Processing Systems (MPP)
• Cluster computers
• Proprietary supercomputers
• Cache-Coherent Non-Uniform Memory Access (CC-NUMA) computers
• Blade servers
• Clusters of blade servers
MIMD computer classification
• Single-Node/Single-Processor (SNSP)
• Single-Node/Multiple-Processors (SNMP)
• Multiple-Node/Single-Processor (MNSP)
• Multiple-Node/Multiple-Processor systems (MNMP)
Single-Node/Single-Processor (SNSP)
• also known as von Neumann computers
• the same as Flynn's Single-Instruction-Single-Data (SISD) category
Single-Node/Multiple-Processors (SNMP)

• shared-memory computers having multiple processors within the same node accessing the same memory
• Representatives are blade servers, symmetric multiprocessing systems (SMP), CC-NUMA architectures, and other custom-made high-performance computers.
• Array and vector computers (SIMD) would also fall into this category.
Multiple-Node/Single-Processor (MNSP)

• distributed-memory computers represented by a network of workstations

Multiple-Node/Multiple-Processor systems (MNMP)

• multiple shared-memory computers (SNMPs) connected by a network
• MNMPs are a loosely coupled cluster of closely coupled nodes. Typical representatives of loosely coupled shared-memory computers are SMP clusters or clusters of blade servers.
Parallel Programming Paradigms
• Single Node Single Processor (SNSP)
• Single Node Multi Processor (SNMP)
• Multi Node Single Processor (MNSP)
• Multi Node Multi Processor (MNMP)

Single Node Single Processor (SNSP)

• Preemptive multitasking is used as the parallel processing model.
• All processes share the same processor, which spends only a limited amount of time on each process, so their execution appears to be quasi-parallel.
• The local memory can usually be accessed by all threads/processes during their execution time.

Single Node Multi Processor (SNMP)

• based on the symmetric multiprocessing hardware architecture
• Shared memory is used as the parallel processing model.
• Each processor works on processes truly in parallel, and each process can access the shared memory of the compute node.
Data access in an SNMP computer
• The processors, connected through a high-speed connection fabric, present a single shared-memory system.
• The OS has been parallelized so that each processor can access the system memory at the same time.
• The shared-memory programming model is easy to use:
• all processors are able to run a partitioned version of sequential algorithms created for single-processor systems.
Shared-Memory Paradigm in Symmetric Multiprocessing
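A minimal Java analogue of the shared-memory model (not CxC or OpenMP; thread count and array size are arbitrary): several threads work truly in parallel on one shared array, each summing its own partition of a sequential algorithm.

public class SharedMemorySum {
    public static void main(String[] args) throws InterruptedException {
        final double[] data = new double[1_000_000];      // shared memory: visible to all threads
        java.util.Arrays.fill(data, 1.0);
        final int nThreads = Runtime.getRuntime().availableProcessors();
        final double[] partial = new double[nThreads];    // one slot per thread, no locking needed
        Thread[] workers = new Thread[nThreads];

        for (int t = 0; t < nThreads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                int chunk = data.length / nThreads;
                int lo = id * chunk;
                int hi = (id == nThreads - 1) ? data.length : lo + chunk;
                double s = 0;
                for (int i = lo; i < hi; i++) s += data[i];  // every processor reads the same shared array
                partial[id] = s;
            });
            workers[t].start();
        }

        double total = 0;
        for (int t = 0; t < nThreads; t++) { workers[t].join(); total += partial[t]; }
        System.out.println("sum = " + total);
    }
}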

Disadvantage of SNMP systems: scalability is limited to a small number of processors.
• The limitations are based on the system design.
• They include problems such as bottlenecks in the memory connection fabric or I/O channels,
• since every memory and I/O request has to go through the connection or I/O fabric.
Methods for shared-memory (asynchronous) parallelism:
• OpenMP, Linda, or Global Arrays (GA)
Data-parallel synchronous parallelism:
• High Performance Fortran (HPF)
Multi Node Single Processor (MNSP)

The programming model for standard MNSP (distributed-memory) computers, such as clusters and MPP systems, usually involves a message-passing model

Message-Passing Model
• Parallel programs must explicitly specify communication functions on the sender and the receiver sides.
• When data needed in a computation are not present on the local computer, this means:
  o issuing a send function on the remote computer holding the data
  o issuing a receive function at the local computer
• The process for passing information from one computer to another via the network includes:
  o data transfer from a running application to a device driver;
  o the device driver then assembles the message into packets, which are subsequently sent through networks and cables to the receiving computer.
  o On the receiving computer's side, the mirrored receiving process has to be initiated: the application triggers "wait for receiving a message," using the device driver.
  o Finally, the message arrives in packets, is reconstructed, and is handed over to the waiting application.
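A self-contained Java sketch of the send/receive pairing just described, using TCP sockets between two threads in one process for simplicity (in a real grid, the sender and receiver run on different machines; the port number is arbitrary).

import java.io.*;
import java.net.*;

public class MessagePassingDemo {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(5000);   // receiver's mailbox, opened first

        // Receiver side: issues a blocking receive and waits for the message.
        Thread receiver = new Thread(() -> {
            try (Socket peer = server.accept();
                 DataInputStream in = new DataInputStream(peer.getInputStream())) {
                System.out.println("received: " + in.readDouble());
            } catch (IOException e) { e.printStackTrace(); }
        });
        receiver.start();

        // Sender side: data needed remotely must be shipped explicitly.
        try (Socket peer = new Socket("localhost", 5000);
             DataOutputStream out = new DataOutputStream(peer.getOutputStream())) {
            out.writeDouble(42.0);
        }

        receiver.join();
        server.close();
    }
}

Note the pairing requirement this makes visible: if the receive side never starts, the sender blocks or fails, which is exactly the deadlock/data-loss risk listed below.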

Message-Passing Paradigm in a Distributed-Memory Architecture

Disadvantages of the Message-Passing Model
• time delay/loss
  o due to waiting on both the transmitting and receiving ends
• synchronization problems, such as deadlocks
  o applications can wait indefinitely if a sender sends data to a remote computer not ready to receive it
• data loss
  o each sender needs a complementing receiver (if one fails, data get lost)
• difficult programming
  o algorithms have to be specifically programmed for the message-passing model
  o sequential algorithms cannot be reused without significant changes
Distributed Shared Memory (DSM) or Virtual Shared Memory

• an approach used to overcome the difficult-to-program problem of the message-passing model
• a simulation of the shared-memory model on top of a distributed-memory environment
• provides the function of shared memory, even though physical memory is distributed among the nodes
Disadvantage: loss of performance
• Every time a processor tries to access data on a remote computer, the local computer performs message passing of a whole memory page.
• This leads to huge network traffic, and the network becomes such a significant bottleneck that the decreased performance is unacceptable for most applications.

Problems of Current Parallel Programming Paradigms
• Parallel programming is a new art for most software developers.
• Parallel computers are very expensive and not available to most software developers.
• complexity of parallel programming
  o With different architectures, there are also different parallel programming paradigms;
  o there is no satisfying model for Multiple-Node/Multiple-Processor computers,
  o and simulated shared memory leads to unacceptable performance for the application.
• The Shared-Memory Programming Model and the Message-Passing Programming Model both offer advantages and disadvantages.
  o Algorithms implemented in one model have to be reprogrammed with significant effort to run under the other.
  o There is no effective parallel processing paradigm that works for both SNMP and MNSP systems.
  o Mixing models is unacceptable due to complexity in the programs that makes maintenance difficult.
  o The complexity of programming and the lack of a standardized programming model for hybrid compute clusters is an unsatisfying and unacceptable situation.
Desktop Supercomputing Programming Paradigms
Connected Memory Paradigm

• allows developers to focus on building parallel algorithms by creating a virtual parallel computer consisting of virtual processing elements
• It effectively maps, distributes, and executes programs on any available physical hardware.
• It maps the virtual parallel computer to the available physical hardware, so algorithms can be created independent of any particular architecture.
Desktop Supercomputing makes the parallelization process simple, even for complex problems.

It enables:

Ease of programming
• The language CxC allows developers to design algorithms by defining a virtual parallel computer, instead of having to fit algorithms into the boundaries and restrictions of a real computer.
Architecture Independence
• Executables run on any of the following architectures without modification: SNMP, MNSP, or MNMP.
• Today, developers must use shared memory on SNMP and message passing on MNSP—architectures that are distinctly different—requiring significant effort to rewrite programs for the other architecture.
Scalability
• Developers can create programs on small computers and run these same programs on a cluster of hundreds or thousands of connected computers.
• This scalability allows testing of algorithms in a laboratory environment and tackling problems of sizes not previously solvable.
Enhancement
• It has the ability to unleash the performance of MNMP computers, which have the best performance/price ratio of all parallel computers.
Desktop Supercomputing with CxC
• offers the advantages of message passing—using distributed-computing solutions—with the easier programmability of shared memory

Parallel Programming in CxC

• CxC is the language of Desktop Supercomputing.
• When using CxC, you undergo an intrinsic parallelization process by creating a virtual parallel computer that may consist of millions of parallel processing elements communicating with each other via a topology.
Every CxC program consists of three main creation steps:
• Specify the virtual parallel computer architecture.
  o Create a virtual parallel computer architecture consisting of array controllers and parallel processing units (PPUs).
• Define the communication topology.
  o Define the communication topology between the PPUs.
• Implement the parallel programs.
  o Implement the programs running on each PPU.
Example
//// My first CxC program hello.cxc
//// controller and unit declaration

controller ArrayController      // create processor controller
{
    // create parallel processors
    unit ParallelProcessor[30];
}

//// don't need a topology since there is no communication
//// between processors taking place

//// program implementations

main hello(10)                  // execute 10 times
{
    program ArrayController     // program for all processors
    {                           // of declared controller
        println("hello parallel world!");
    }
}

• This simple CxC program creates 30 processors that all run the same program.
• The parallel machine will be executed 10 times.
• The result is the following output 300 times (30 processors × 10 executions):
hello parallel world!
hello parallel world!
hello parallel world!
...
Parallelizing Existing Applications
CxC solution

• removes computational, platform, and scalability barriers
• significantly lowers the overall time and cost of development
• enables a new parallel computing paradigm, which provides the best platform for highly parallel applications
• offers an easy way to parallelize existing serial applications


• works as intermediate "glue" for FORTRAN, C, and C++ functions
• maintains the original performance that has been achieved in these libraries
• offers a great way to simplify the development and implementation of parallel algorithms with a huge number of interdependent elements and their interactions
• The simulation of interacting particles is considered one of the most complex and challenging applications, requiring tremendous computational resources.
Grand Challenges
• the sheer size of the mathematical problems
• their very complexity, combined with the required computational power
Grid enabling software applications
• The Needs of Grid Users
• Grid Deployment Criteria
• Methods of Grid Deployment
• When to Grid-enable Software
• Requirements for Grid-Enabling Software
• Grid Programming Tools and Expertise
• The Process of Grid-enabling Software Applications
• Grid-enabling a Mainstream Software Application: An Example
Needs of the Grid users
Three groups of stakeholders:
• Application End Users
• Business Enterprises
• Application Developers
Application End Users

• Primary need: grid-enabled applications must be simple to use.
• The benefits of the grid need to outweigh the difficulty users must incur in order to use it.
• For adoption, users should not have to fundamentally change the way they use an application.
• Users are intolerant of any requirement for extensive configuration or management.
  o A zero-configuration, zero-administration "plug and play" approach is de rigueur for end users.
Business Enterprises

• The benefits of the grid need to outweigh the difficulty the enterprise must incur in order to use it.
  o This is determined by calculating the return on investment that a grid implementation provides.
  o Straightforward costs include the cost of grid infrastructure and project-based costs for implementation.
  o "Soft" costs of grid deployment include the expense associated with business process engineering.
• simplicity of use
• control and management
  o The enterprise needs the grid to be simple to manage.
  o There are many users and many resources to manage according to various business rules and shifting priorities.
Application Developers

• Types:
  o independent software vendors (ISVs)
  o in-house developers
  o third-party solutions integrators (SIs)
• The benefits of the grid must outweigh the difficulty the developer must incur in creating new applications or changing existing applications.


• Developers want grid capability to be a software feature.
• It must be simple to develop grid-enabled applications.
Grid Deployment Criteria
To determine if it is worthwhile to deploy a compute grid—whether local, enterprise-wide, or global in scope—it is essential to establish whether or not there is sufficient resulting benefit or return derived from the required investment of effort, time, money, and resources.
There are three significant benefits that can be achieved with Grid Computing.

• It is capable of providing powerful processing capacity that meets the extreme requirements of high-performance computing applications.
• It allows computationally intensive software applications to run significantly faster.
• It can raise the efficiency of computing resources in an enterprise network from the typical 10 percent usage of desktops and 30 percent utilization of server capacity to the 80–90 percent range.
Methods of Grid deployment
Two different methods:
• the scripted batch queue distribution method
• the programmatic method of coding for parallel distributed processing
The scripted batch queue distribution method

• Distributed Resource Management (DRM) solutions use the batch queue method.
  o Examples: Sun ONE Grid Engine, Platform LSF
• Grid deployment can be achieved with little or no code modification by:
  o replicating an application across several computers
  o distributing computing jobs through scripting
• Multiple jobs from application users are submitted and queued,
• and the DRM software allocates the jobs to the available computing resources as efficiently and appropriately as possible.
Example
• Ten jobs that each require one hour to process can all be completed in one hour when distributed across a grid of ten identical computers.
Conditions
• There is a large quantity of jobs for a single application to process;
• there is a large pool of computing resources available and capable of performing the processing;
• the application is scriptable;
• there is sufficient MIS/IT expertise to configure, deploy, and manage the grid.
• If there is only one job to be processed, for example, or if there is only one computer to do the processing, the time required to complete the job(s) cannot be reduced.
The programmatic method of coding

• allows a single, large job to be broken down into several smaller tasks
• Tasks are sent out and processed individually on separate computers, and the results are returned to be recompiled.
• Instead of requiring the application to be installed everywhere, each task provides the compute resource with an instruction set and only the portion of data it requires for the computation.
• This method can be effective with even a single job and as few as two computing resources,
• and it can be applied to both scriptable and non-scriptable applications.
• It requires software development expertise and access to the application source code.


Example
• A single job that requires one hour to process could be completed in:
  o thirty minutes on two computers
  o six minutes across a grid of ten identical computers
When to Grid-enable Software
Applications are categorized as either scriptable or non-scriptable.
Three key factors to consider in determining the most appropriate method of grid-deploying a scriptable application are:
• the number of jobs to be processed;
• the number of available resources;
• the size of the jobs and the capacity of the resources.
It may be preferable to distribute scriptable applications using the programmatic method if:
• There are few jobs.
• There are few resources.
• Typical job size is very large.
• The capacity of most resources is small.
• User self-sufficiency is preferred to IT control and management.
• The frequency of job submission is sporadic.
• Software deployment on each compute resource is prohibitive for reasons of cost, provisioning, or inadequate system requirements.
Hybrid approach
Using standards such as the Distributed Resource Management Application API (DRMAA) and OGSA, programmatic solutions can interface with DRM software to create an integrated environment where both scriptable and non-scriptable applications can be distributed together on a common grid infrastructure.
Requirements for Grid enabling software
Two requirements must be met in order to modify software for grid deployment:
1. access to the application source code, and
2. the ability to modify it
• This means both the legal right and the development expertise necessary to change an uncompiled application.
There are three groups that meet these requirements.
Independent Software Vendors (ISVs)
• ISVs develop and commercially distribute software applications.
• ISVs own their software code and have software developers in their employ.
Academic Institutions and Enterprises in Research-Intensive Industries
• Example: life sciences organizations that use open source software applications.
• Open source software licenses permit modifications of code,
• and allow redistribution of the modified version, subject to certain conditions.
Enterprises That Have Developed Their Own Applications
• built for securing competitive advantage through superior implementation of information technology
• As these applications are proprietary, enterprises typically own or in some fashion retain intellectual property rights to the source code.

Grid Programming Tools and Expertise
The primary tools for modifying code to enable parallel distributed processing have been protocols such as:
• MPI (Message Passing Interface)
• PVM (Parallel Virtual Machine)
GridIron XLR8

• a development tool that simplifies the process of implementing application-embedded parallel processing
It consists of two parts:
• an application developers' toolkit, or SDK, comprising:
  o APIs that are added to the source code of a computationally intensive application
  o documentation
  o sample applications
  o other tools and materials to assist a software developer in modifying their code
• runtime software
  o installed on each computer in a grid, providing the processing power
GridIron XLR8 provides APIs at a high level of abstraction:
• Developers do not have to worry about communications-level programming;
• they simply work with familiar variables and data as they would in a serial program.
The GridIron XLR8 runtime software:
• allows the computers on the grid to discover each other
• automatically sets up a processing network and distributes work
• recovers from failure
• This autonomic capability eliminates the need to code or manage the processing environment outside the application.
The Process of Grid-enabling Software Applications
An application's process steps have to have the following three characteristics:
• They can be split into smaller tasks.
• Each task can be processed on a separate computer.
• The results from each task can be returned and re-assembled into one final result.
Analysis

To grid-enable an application for distribution, it must first be analyzed prior to modification of the software source code.
Identifying Hot Spots
• Identify the partition points at which the application is most appropriately split into smaller tasks.
• This is achieved by locating the computational hot spots in the application algorithm, where the majority of the execution time is spent running an encapsulated sub-algorithm.
• Hot spots are typically found within portions of iterative code, i.e., nested FOR or WHILE loops.
• Execution profiling can be used to empirically identify partition points by highlighting which few lines of code account for a high percentage of the application execution time.
Tightly Coupled versus Embarrassingly Parallel Algorithms
Application algorithms fall into two categories.
Embarrassingly Parallel
• Some algorithms are easily segmented into tasks that can be processed entirely independently.
• Scriptable applications of this kind can be deployed using the batch queue method;


• many non-scriptable ones can be modified using the programmatic method.
• Example: the MPEG encoder application
Tightly Coupled Algorithms
• have dependencies on interim communication and exchange of results during the computational process
• do not lend themselves to batch queue distribution
• require code modification
• generally require more effort and are more difficult to modify through programmatic means
Operating Environment
• Determine what specific items are required for each task to be processed on a separate computer.
• This means identifying the application's requirements for local files, libraries, or databases, as well as any special licensing or hardware requirements.
• These need to be dynamically provided to the grid as part of the task if they are not statically pre-installed on all compute nodes.
  o The GridIron XLR8 framework provides support for the automated distribution of these files.
Results Generation
• Identify what kind of results the task computation will generate.
• Determine how task computation results are to be stored and/or processed.
• Task results will arrive asynchronously, in a different order than the tasks were defined.
Application Modifications

The application code can then be modified using the GridIron XLR8 development tool:
• to allow the distributed tasks to be split,
• sent to the grid for computation,
• and the individually returned task results re-assembled into a single aggregate job result.
Defining Tasks
• An instance of the application called the distributor needs to be created.
• The distributor contains the GridIron XLR8 defineTask method, which divides the job into smaller tasks.
• At run time, the GridIron XLR8 framework will repeatedly invoke this function to create new tasks whenever compute resources are available.
• Through defineTask, the distributor controls the size of each task.
• Task size can be varied based on the constraints of the deployment environment:
  o interconnect speed,
  o whether the network is static or dynamic,
  o reliability and availability of the network and computers,
  o the minimum and maximum number of computers making up the grid.
• A default task size is selected that maximizes the ratio of the task computation time relative to the time required for setup, result recompiling, and communication.
• The size of a task should be configurable, so that optimization can be achieved without making further changes to the application code.
• The GridIron XLR8 framework will invoke defineTask until the entire job is complete.
Task Computation
• Create an instance of the application called the executor.
• The executor defines a new function completely encapsulating the algorithm contained inside the selected hot spot.
• This is accomplished with the GridIron XLR8 doTask method.
• This is where the computationally intense part of the application code goes.
• GridIron XLR8 distributes the doTask method to the compute nodes for execution.
• Upon completion of the computation, the framework will automatically return results back to the distributor.
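A local Java stand-in for this distributor/executor split, using plain java.util.concurrent so it runs anywhere. defineTask, doTask, and checkTaskResults are the GridIron XLR8 method names from the text, but their real signatures are not given there, so the splitting, remote execution, and aggregation are only modeled here; the computation, job size, and task size are invented. (Result re-assembly is detailed in the next subsection.)

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class DistributorSketch {
    // Plays the role of doTask: the encapsulated hot spot, executed per task.
    static long doTask(int lo, int hi) {
        long sum = 0;
        for (int i = lo; i < hi; i++) sum += (long) i * i;   // stand-in computation
        return sum;
    }

    public static void main(String[] args) throws Exception {
        int jobSize = 1_000_000;
        int taskSize = 100_000;          // configurable, to tune the compute/communication ratio
        ExecutorService grid = Executors.newFixedThreadPool(8);   // stands in for compute nodes

        // Plays the role of defineTask: the distributor carves the job into tasks.
        List<Future<Long>> pending = new ArrayList<>();
        for (int lo = 0; lo < jobSize; lo += taskSize) {
            final int a = lo, b = Math.min(lo + taskSize, jobSize);
            pending.add(grid.submit(() -> doTask(a, b)));
        }

        // Plays the role of checkTaskResults: results arrive asynchronously and are aggregated.
        long total = 0;
        for (Future<Long> f : pending) total += f.get();
        grid.shutdown();
        System.out.println("aggregate result = " + total);
    }
}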

Result Re-assembly
• Modify the distributor to define another new function containing the results-handling portion of the algorithm.
• This is the GridIron XLR8 checkTaskResults method;
• it contains code originally found within or immediately following the application hot spot.
• At run time, after processing is complete, the GridIron XLR8 framework will ensure all files and/or data defined by the executor have been transmitted back to the distributor.
• Because task results are returned in a different order than that in which they were defined, results may have to be stored until all results have been received, whereupon the final, aggregated result can be generated.
Grid-enabling a Mainstream Software Application: An Example
Video Encoding

• Video encoding applications are used for digital content creation.
• Many processes in the creation of digital audio, video, and graphics are computationally intensive.
• For example, rendering, compositing, animation, and encoding/decoding require large amounts of processing and handle large amounts of data.
• Video encoding is one such computationally intensive digital content creation process.
• Grid Computing provides powerful processing capacity at low cost using commodity computing hardware,
• so it offers tremendous potential as a means of accelerating digital video encoding.
• An open source MPEG-4 software encoding application was grid-enabled to allow video encoding to be performed more rapidly on a compute grid.
• Video encoding is a process for compressing raw digital video data by several factors in order to make storage and transmission more practical.
MPEG-4 encoding involves a number of tasks, including:
• image processing;
• format conversion;
• quantization and inverse quantization;
• discrete cosine transform (DCT);
• inverse DCT;
• motion compensation;
• motion estimation.
Motion-compensated prediction is highly computationally intensive and can require several billion floating-point operations per second (gigaflops).
The Need for Speed

Three pronounced trends:
1) The growing popularity of HDTV (high-definition television)
• The NTSC standard, at a resolution of 720 by 480 pixels and a frame rate of 30 frames per second, is delivered at a bit rate of 249 Mbps and requires approximately 1.9 GB of storage per minute.
• The HD standard of 1920 by 1080 pixels at the same 30 fps frame rate is delivered at a bit rate of 1.5 Gbps and requires approximately 11 GB of storage per minute of raw video.
• With the bit rate, and hence the storage requirement, increased by a factor of six, HD has much more rigorous compression requirements than the current NTSC standard.
2) New standards
• In addition to new video presentation standards, each new digital coding standard is similarly satisfied only through greater computing power.
3) Digital video (DV) cameras are among the fastest selling consumer electronics products.
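The HD figures in trend 1 above can be reproduced by assuming 3 bytes per raw pixel (an assumption made here to back out the quoted rates, not stated in the source):

$$R_{HD} = 1920 \times 1080\ \text{px} \times 30\ \text{fps} \times 3\ \tfrac{\text{B}}{\text{px}} \times 8\ \tfrac{\text{b}}{\text{B}} \approx 1.49\ \text{Gbps}$$

$$S_{HD} = \frac{1.49\ \text{Gbps}}{8\ \text{b/B}} \times 60\ \text{s} \approx 11.2\ \text{GB per minute}$$

Both figures are exactly six times the corresponding NTSC values, since 1920 × 1080 has six times the pixels of 720 × 480.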

Current Solutions

Two general categories of solutions currently exist.
Hardware solutions
• hardware accelerator blocks
• loosely coupled coprocessors
• available to speed the encoding process
Software encoding solutions
• afford superior portability and flexibility
• but are expensive and difficult to scale to achieve fast encoding
Grid Deployment of Video Encoding

• A number of academic studies have investigated the applicability of distributed processing and data distribution methods for the purposes of video encoding.
• The goal is to develop a complete implementation that allows distributed video encoding to be successfully migrated from the research lab to commercial environments.
Requirements for Broad Marketplace Adoption

• Simple enough for application end users to deploy and use without having to acquire special skills or knowledge
• Sufficiently robust to operate reliably outside of controlled lab conditions, where environments may have varied hardware, operating system, and network elements
• Fast and easy for commercial software developers to integrate into their video encoding applications
Overview of MPEG-4 Encoder

The MPEG4IP package includes:
• an MPEG-4 AAC audio encoder,
• an MP3 encoder,
• two MPEG-4 video encoders,
• an MP4 file creator and hinter,
• an IETF standards-based streaming server,
• an MPEG-4 player that can both stream and play back from a local file.
MPEG4IP's tools are available on the Linux platform; various components have been ported to Windows, Solaris, FreeBSD, BSD/OS, and Mac OS X.
Overview of GridIron XLR8

GridIron XLR8 consists of two parts:
• an application developers' toolkit, or SDK, comprised of APIs, plus documentation, sample applications, and other tools, and
• runtime software that is installed on each computer in a network, providing additional processing power.
Using GridIron XLR8 to add distributed computing to the MPEG4IP video encoder is beneficial in three ways:
• It provides a simple and rapid development environment.
• It eliminates the need to code or manage the processing environment outside the application.
• It is simple for end users to work with the final compiled and installed version of the software, with the distributed computing embedded directly into the application.
• Once compiled and installed, users can benefit from the speed of distributed computing.

Distributed Computing Strategy

• The MPEG-4 specification makes use of a hierarchy of video objects to represent content.
• A Video Session is the top tier in the hierarchy, representing the whole MPEG-4 scene.
• Each Video Session, or scene, is populated with Video Objects, which can be encoded as single or multiple Video Object Layers.
• A Video Object Layer is in turn comprised of Groups Of Pictures (GOPs).
• A Group Of Pictures is a collection of three different picture types:
  o I-pictures, or intra pictures, are pictures that are moderately compressed and coded without reference to other pictures.
  o P-pictures, or "predictive" pictures, take advantage of motion-compensated prediction from a preceding I- or P-picture to allow much greater compression.
  o B-pictures, or bi-directionally-predictive pictures, use motion-compensated prediction from both past and future pictures to allow the highest degree of compression.
• GOPs have the same number of pictures per sequence and are similar in size.
• Segmenting the video data for distribution at the GOP tier of the hierarchy takes advantage of these characteristics to effectively treat the MPEG-4 encoding process as a parallel software algorithm.
• The size and structure of the GOPs make them convenient to distribute for parallel processing at a reasonable level of granularity, to accommodate typical CPU and bandwidth capabilities, and to achieve reasonable load balancing across multiple processors.
Implementation

The implementation process undertaken to grid-enable and test the application:
Application Modification
• Identification of the appropriate partition points in the encoder, i.e., areas where data are encoded into I-, P-, and B-pictures.
• Files were modified as required in order to segment and distribute the video data based on the GOP strategy.
• The number of frames in a GOP is configurable and was set to 20 frames.
• The raw video data comprised a total of 1,824 frames.
• At a partitioning of 20 frames, this yielded a total of 92 partitions.
• Each of the partitions was treated as an individual processing task and encoded by an individual XLR8 runtime software peer.
• This yielded 92 compressed files, which were asynchronously returned and appropriately multiplexed back into a single, new compressed file.
• Encoding occurred at a target bit rate of 1.5 Mb per second.
Hardware
• A grid comprised of 13 IBM xSeries 335 servers was used for this implementation,
• each with dual 2.0 GHz Intel XEON processors,
• 1.0 GB RAM,
• and the Windows 2000® Server operating system.
• The modified MPEG4IP MPEG-4 software encoder was installed on one of the 13 machines.
• Only the GridIron XLR8 peer runtime software was installed on the remaining 12 xSeries computers, which were configured as individual servers and connected together with a gigabit Ethernet switch.
Data
• ~60 seconds of NTSC broadcast-quality video (YUV 4:2:0) at a resolution of 720 × 480 pixels @ 29.97 fps
• The size of the uncompressed file was 923,400 KB (~900 MB).

Hyperthreading
• Performance-improvement technologies include vectorization (e.g., AltiVec), Single Instruction Multiple Data (SIMD), Pthreads, hyperthreading, and SSE2.
• Hyperthreading provides simultaneous execution of two threads on the same physical processor.
• Performance improvements range from 5 percent to approximately 30 percent.
• The IBM xSeries 335 hardware and the Windows 2000 Server operating system support hyperthreading,
• resulting in four instances of the GridIron XLR8 runtime software per node (machine),
• giving each dual-processor node the ability to execute four independent parallel processing tasks.
GridIron XLR8 Runtime Software
• manages the processing environment
• allows computers to discover each other
• automatically sets up a processing network, distributes work, and recovers from failure
• A total of 48 peers was deployed on the 12 servers acting as processing nodes in this implementation.
• Each peer has an installed footprint of 24 MB.
Results
Output
• The run generated a compressed and encoded MPEG-4 file that was decodable and playable at expected levels of quality using an existing MPEG player.
Compression
• The resultant MPEG-4 compressed file was 13,798 KB,
  o a reduction of about 98.5 percent from the original raw video data.
Speed Improvement


UNIT – V APPLICATIONS, SERVICES AND ENVIRONMENTS
Application integration – application classification – Grid requirements – Integrating applications with Middleware platforms – Grid enabling Network services – managing Grid environments – Managing Grids – Management reporting – Monitoring – Data catalogs and replica management – portals – Different application areas of Grid computing
Application Classification
• The dimensions relevant to enabling an application to run on a grid are:
  o parallelism
  o granularity
  o communications
  o dependency
• Each dimension affects the way in which the application can be migrated to a grid environment.
• The dimensions are not mutually exclusive—a specific application can be characterized along most of these dimensions.
Parallelism

• This classification scheme is attributed to Flynn.
• An application's parallelism has a significant impact on how it is integrated to run on a grid.
Single Program, Single Data (SPSD)
• simple sequential programs that take a single input set and generate a single output set
Motivations
1) A vast number of computing resources are immediately available
• The grid is used as a throughput engine when many instances of this single program need to be executed.
• Leveraging resources outside the local domain can vastly improve the throughput of these jobs.
Example: logic simulation of a microprocessor design
• The design needs to be verified by running millions of test cases against it.
• By leveraging grid resources, each test case can be run on a different remote machine, thereby improving the overall throughput of the simulation.
2) Remotely available shared data can be used
• The program can be made to execute where the data reside, or the data can be accessed via a "data grid."
• The grid environment enables controlled access to shared data.
• It is simple to integrate these applications to run on a grid, as no additional code development is required.
Single Program, Multiple Data (SPMD)
• The input data can be partitioned and processed concurrently using the same program.
• This category comprises the majority of applications that utilize the grid today and covers a wide range of domains.
Examples
• Finite Element Method evaluations using an MPI-based program
• Large-scale Internet applications such as SETI@home
• the UD-Cancer project
Motivation
• to significantly improve performance and/or scope by scaling the application out to as many resources on the grid as possible

Multiple Program, Multiple Data (MPMD)
• the broadest category of parallel/distributed applications
• Both the program and the data can be partitioned and processed concurrently.
Motivation
• to improve performance
• to ensure that resources are more optimally matched to application needs
Example
• There might be partitions of an application that require a very tightly-coupled SMP system, while another partition could be run in a more loosely-coupled environment or on a single machine.

Multiple Program, Single Data (MPSD)
• requires different transformations to be applied to the same set of input data
• The transformations can be carried out concurrently.
• This category is very rare.
Communications

• An application's grid performance is very dependent on the performance of the underlying grid network infrastructure.
The communications cost associated with an application is characterized by the following:

• the cost of initial movement of data and programs to grid resources prior to the computation
  o This cost can be avoided in cases where the data or program is already available at the remote grid resource.
• the cost of communication while the application executes
  o This cost is a function of the frequency and size of communication.
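One common way to make this concrete is the standard latency–bandwidth cost model (an illustration, not taken from the source):

$$T_{\text{comm}} \approx T_{\text{stage}} + n\left(t_{\text{lat}} + \frac{s}{B}\right)$$

where $T_{\text{stage}}$ is the initial movement of code and data, $n$ is the number of messages exchanged during execution, $s$ is the typical message size, $t_{\text{lat}}$ is the per-message network latency, and $B$ is the available bandwidth.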

Granularity
• Applications are classified based on the granularity of their programs as:
  o coarse-grained
  o fine-grained
• Granularity specifies how long a program can execute before it needs to communicate with other programs.
• It dictates the amount of overhead associated with running the application remotely on the grid.
Dependency

• If the programs within an application have no dependencies between them, each program can be scheduled and executed independently of the others.
• In that case, the application is a good candidate for deployment on the grid.
Grid requirements
1. Interfaces
2. Job Scheduling
Interfaces

• Users and applications have traditionally accessed grids using simple command-line tools and programming APIs.
• Command-line tools such as 'qsub' and 'qstat' offer ways for users to interactively submit jobs and query their status.
  o These tools can be further extended through wrapper scripts
  o that hide the job creation and management complexities from the user.
• Programming APIs provide a way for developers to embed job submission and management functions into an application-specific wrapper program.
• Users execute an application-specific wrapper script or program and input the data required for the job.


• These interfaces are usually limited to running SPSD-type applications.
• More sophisticated interfaces allow the creation of single- and multi-dimensional job arrays to support SPMD-, MPMD-, and MPSD-type applications.
• Web browsers are replacing command-line tools,
• enabling users to remotely access the grid from anywhere and at any time.
• The use of the Web Services Description Language (WSDL) to describe jobs, applications, and data, in conjunction with Web Services, provides a very powerful programmatic interface to the grid.
• Web Services provides a language-independent interface, letting programmers develop complex wrappers using their favorite programming language.
• Because Web Services is a well-defined standard, grids based on it will easily integrate into existing environments.
• The Open Grid Services Architecture (OGSA) is in the process of defining a set of WSDL interfaces for creating, managing, and securely accessing large computational grids.
Job Scheduling

• The goal is to provide transparent and efficient access to remote and geographically distributed resources.
• A scheduling service is necessary to coordinate access to the different resources available on the grid, such as network, data, storage, software, and computing elements.
• The heterogeneous and dynamic nature of the grid makes scheduling complicated.
• Schedulers need to automate three essential tasks:
  o resource discovery,
  o system selection,
  o job execution.
• At a very basic level, jobs, once submitted into the grid, are:
  o queued by the scheduler, and
  o dispatched to compute nodes as they become available.
• Grid schedulers are typically required to dispatch jobs based on a well-defined algorithm such as:
  o First-Come-First-Serve (FCFS),
  o Shortest-Job-First (SJF),
  o Round-Robin (RR), or
  o Least-Recently-Serviced (LRS).
• Schedulers have to support advanced features such as:
  o user-requested job priority
  o allocation of resources to users based on percentages
  o a concurrency limit on the number of jobs a user is allowed to run
  o user-specifiable resource requirements
  o advanced reservation of resources
  o resource usage limits enforced by administrators
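A toy Java sketch of one of these dispatch policies, Shortest-Job-First (job names and runtimes are invented): a priority queue orders pending jobs by estimated runtime, and the dispatch loop pulls the shortest job first.

import java.util.PriorityQueue;

public class SjfScheduler {
    record Job(String id, long estimatedRuntimeSec) {}

    public static void main(String[] args) {
        // Shortest-Job-First: the queue orders pending jobs by estimated runtime.
        PriorityQueue<Job> queue = new PriorityQueue<>(
                (a, b) -> Long.compare(a.estimatedRuntimeSec(), b.estimatedRuntimeSec()));
        queue.add(new Job("render-frames", 3600));
        queue.add(new Job("checksum", 15));
        queue.add(new Job("simulate", 600));

        // Dispatch loop: as compute nodes become available, take the next job.
        while (!queue.isEmpty()) {
            System.out.println("dispatching " + queue.poll().id());
        }
        // prints: checksum, simulate, render-frames
    }
}

Swapping the comparator gives the other policies, e.g., submission time for FCFS or last-service time for LRS.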

Data Management
Data processed by jobs are typically:

• sourced at submission time, or
• obtained by the compute node from a shared file system (NFS, AFS, DFS) at execution time
  o The location of the data is given as input at submission time,
  o and each node is required to have access to the specified location.
• These approaches work well when the input data size is small or when all nodes in the grid have access to a global shared file system.
• Typical data sizes in both commercial and R&D settings, however, are in the gigabytes.
• Several hardware and software solutions are available,
  o such as distributed data caching, replication, and a uniform namespace,
• that virtualize access to data and provide a seamless way to store and retrieve it.

Remote Execution Environment

• Jobs executing on a grid require the same environment as they would have if executed on the submitter's machine.
• This includes environment variables, runtime environments such as Java, Smalltalk, or the .NET CLR, and libraries such as the C language runtime and operating system libraries.
• This total environment may already be available on the compute node, or it may have to be created by the job on the node prior to execution.
• Many grid frameworks provide tools and well-documented procedures to create such environments.
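A minimal Java sketch of recreating such an environment on a compute node before launching a job (the program path, input file, and environment variables are all invented for illustration):

public class RemoteEnvLauncher {
    public static void main(String[] args) throws Exception {
        // Recreate the submitter's environment on the compute node before
        // launching the job: environment variables, library paths, etc.
        ProcessBuilder pb = new ProcessBuilder("/opt/app/bin/solve", "input.dat");
        pb.environment().put("APP_HOME", "/opt/app");
        pb.environment().put("LD_LIBRARY_PATH", "/opt/app/lib");
        pb.inheritIO();                      // forward the job's stdout/stderr
        Process job = pb.start();
        System.exit(job.waitFor());          // propagate the job's exit status
    }
}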

Security
• Grids are vulnerable to security attacks at all network points.
• Some grid frameworks offer end-to-end security that includes sand-boxing of jobs.
• The sand-boxing feature enforces a security wall between the job execution environment and the node it runs on.
  o For example, a desktop user will be unable to view or manipulate the grid job running on his computer.
The following are some basic features required to operate a secure grid:

Authentication
• used to positively verify the identity of users, devices, or other entities in the grid
• accomplished using passwords and challenge-and-response protocols
Confidentiality
• the assurance that job information is not disclosed to unauthorized persons, processes, or devices
• accomplished through access controls and protection of data through encryption techniques
Data Integrity
• refers to the condition in which job data are unchanged from their source and have not been accidentally or maliciously modified, altered, or destroyed
• accomplished through checksum validation and digital signature schemes
Non-repudiation
• a method by which the sender of job data, such as the grid scheduler, is provided with proof of delivery, and the recipient, such as the compute node, is assured of the sender's identity,
• so neither can later deny having processed the data
• requires strong authentication and data integrity, as well as verifying that the sender's identity is connected to the data being submitted
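To illustrate the checksum-validation idea from the Data Integrity bullets, a small sketch using Java's standard MessageDigest API (the job data are invented; a real scheme would also sign the digest to cover non-repudiation):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;

public class JobDataIntegrity {
    public static void main(String[] args) throws Exception {
        byte[] jobData = "input partition #7".getBytes(StandardCharsets.UTF_8);

        // Sender computes a digest over the job data before submission.
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(jobData);
        String expected = HexFormat.of().formatHex(digest);

        // Receiver (compute node) recomputes and compares: any accidental or
        // malicious modification in transit changes the digest.
        byte[] recomputed = MessageDigest.getInstance("SHA-256").digest(jobData);
        boolean intact = expected.equals(HexFormat.of().formatHex(recomputed));
        System.out.println("data intact: " + intact);
    }
}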

Gang Scheduling
• refers to all of a program's threads of execution being grouped into a gang and concurrently scheduled on distinct processors
• Time-slicing is supported through the concurrent preemption and later rescheduling of the gang.
• The threads may span multiple computers and/or UNIX processes.
• Communication between threads may be performed through shared memory, message passing, etc.
• Each gang-scheduled job requires all its threads to be started and stopped concurrently, causing synchronization among threads to occur at the start and end of the job.
• If any specific thread aborts abruptly, all other threads must be terminated and the entire job restarted.
Checkpointing and Job Migration
Checkpointing
• enables a job to take a snapshot of its state, so that it can be restarted later
There are two main reasons for checkpointing a job:

• Fault tolerance—where the job must recover from a compute node failure
• Load balancing—where a job on an overloaded compute node must be migrated to a node with a lighter load


• A long-running job can be checkpointed periodically during its run.
• If the execution node fails for any reason, the job can resume execution from the last checkpoint when the node recovers, rather than starting from the beginning.
• The job can also be restarted on a different node if the original node is unavailable.
• Sometimes one node is overloaded while the others are idle or lightly loaded.
• One or more jobs on the overloaded node can be checkpointed and restarted on idle or lightly loaded nodes.
• Job migration can also be used to move intensive jobs away, to avoid interference with users or other programs running on the same node.
Checkpointing may be implemented at two levels:
Kernel level
• The OS transparently supports the checkpointing and restarting process, without any changes to the application.
User level
• The application is coded in a way that checkpoints itself periodically.
• When restarted, the application looks for the checkpoint files and restores its state.
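A minimal sketch of user-level checkpointing in Java (the file name, loop bounds, and checkpoint interval are invented): the loop periodically persists just enough state to resume, and on restart it restores from the checkpoint file if one exists.

import java.io.IOException;
import java.nio.file.*;

public class CheckpointedLoop {
    public static void main(String[] args) throws IOException {
        Path ckpt = Paths.get("loop.ckpt");

        // On (re)start, look for a checkpoint file and restore state from it.
        long start = Files.exists(ckpt)
                ? Long.parseLong(Files.readString(ckpt).trim())
                : 0;

        for (long i = start; i < 1_000_000; i++) {
            // ... unit of work for iteration i ...

            if (i % 100_000 == 0) {
                // Periodically snapshot just enough state to resume later.
                Files.writeString(ckpt, Long.toString(i));
            }
        }
        Files.deleteIfExists(ckpt);   // job finished; checkpoint no longer needed
    }
}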

Management
• Grids require very complex processes and tools to manage the entire system.
• They require the underlying framework to provide some basic but very important functions:
Virtualization
• the ability for administrators and job submitters to transparently access and manage compute resources, applications, and data from anywhere in the grid
• resources may be transparently located anywhere and updated
Provisioning
• enables heterogeneous resources to be grouped based on their capability, location, or territorial control
• delivers the benefits of Grid Computing while preserving existing decentralized IT policies
Accounting
• provides IT managers the ability to assess the utilization of resources and re-provision them
• users may be charged based on the type of resources and the duration for which they use them
Integrating Applications with Middleware Platforms
• The simplest class of applications to integrate is SPSD—Single Process Single Data.
There are some basic mandatory functions that are required:

1. Prepare the application for remote execution.
2. Log in to the grid, or have the appropriate access credentials on the remote machine.
3. Submit the application and data for execution on the grid platform.
4. Retrieve results from the remote machine where the application executes.
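The four steps can be pictured as a thin client workflow. The sketch below is hypothetical; GridClient and its methods are not a real middleware API, they simply mirror the numbered steps.

    # a minimal sketch of the four mandatory steps; all names are assumptions
    class GridClient:
        def login(self, credentials):             # 2. authenticate to the grid
            print(f"authenticated with {credentials}")
        def stage(self, executable, data):        # 1. prepare app for remote execution
            print(f"staged {executable} with {data}")
        def submit(self, job_spec):               # 3. submit application and data
            print(f"submitted {job_spec}")
            return "job-0001"
        def fetch_results(self, job_id):          # 4. retrieve results from the remote node
            return f"results of {job_id}"

    grid = GridClient()
    grid.login(credentials="~/.grid/usercert.pem")
    grid.stage(executable="simulate.bin", data="inputs.dat")
    job_id = grid.submit(job_spec={"cpus": 4, "walltime": "2h"})
    print(grid.fetch_results(job_id))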

Integrating SPSD Applications
• Middleware platforms include support for one or more of these steps.
• In the simplest case, these applications simply run "as is."
• No source code modifications or additional software development is required.
• Platforms provide varying levels of abstraction to the end user by performing one or more of the above steps.
• They shield the job submitter from the remote machine and its execution environment.
The user simply specifies the execution environment and the middleware does the following:

• identifies the appropriate resource,
• sets up the execution environment (code, data, environment),
• accesses the resource, and runs the application.


Examples:
• command-line utilities to submit jobs and retrieve results from the Grid MP
• packages such as the Globus Toolkit provide a set of services for each of these steps, and the end user develops the accompanying software to integrate them
• browser-based console: provides a user or an administrator with an easy-to-use interface to install applications, register data sets, submit jobs, and retrieve results

Integrating SPMD Applications
• The input data need to be partitioned, and the partial outputs need to be merged to form a final result.
Example: Virtual Screening
• a database of drug-like molecules is tested against one or more protein targets
• the molecular database is partitioned
• each instance of execution works on one partition of the data
• the partitioning of data, submission of individual pieces of work (workunits), and retrieval and merging of results are accomplished by a separate piece of software, called an application service (see the sketch after this section)
• the application service can be accessed by a number of end users or clients simultaneously to run jobs
GUI for end-user interaction
• Both SPSD and SPMD applications sometimes include a GUI.
• The best approach is to separate the GUI from the core application, so that the GUI runs on the end user's machine while the core application runs on a grid resource.
Example: Accelrys' LigandFit application
• This application included a separate graphical user interface that was the primary interface for end users.
• The grid application service was invoked from within this GUI application.
• End users continued to interact via the graphical front end, and the grid was completely transparent.
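A sketch of the application-service pattern described above: partition the molecule database, submit each partition as a workunit, then retrieve and merge the partial results. submit_workunit stands in for the middleware's remote-execution call, and the scoring is fake.

    # SPMD application service sketch; all names and scores are illustrative
    def partition(molecules, n_parts):
        return [molecules[i::n_parts] for i in range(n_parts)]

    def submit_workunit(chunk, target):
        # stand-in for remote execution: score each molecule against the target
        return [(m, hash((m, target)) % 100) for m in chunk]

    def run_virtual_screen(molecules, target, n_parts=4):
        partials = [submit_workunit(c, target) for c in partition(molecules, n_parts)]
        merged = [hit for part in partials for hit in part]       # merge step
        return sorted(merged, key=lambda x: x[1], reverse=True)   # best scores first

    hits = run_virtual_screen(["mol-%d" % i for i in range(20)], target="protein-A")
    print(hits[:3])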

Integrating MPMD Applications
• require modifications to the application source code
• the program partitions have dependencies on one another
• require interaction between the program partitions during execution
• interaction is done via a message-passing interface such as MPI or PVM
• the final merge step is also more complex
   o there might be additional computation that must occur after all partitions complete execution
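A minimal example of the message passing that MPMD partitions rely on, written against mpi4py (a Python binding for MPI; it must be installed and the script launched under an MPI runtime, e.g. mpiexec -n 2 python script.py). The data exchanged is illustrative.

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    if rank == 0:
        # one program partition produces data the other depends on
        comm.send({"boundary_values": [1.0, 2.0, 3.0]}, dest=1, tag=11)
        final = comm.recv(source=1, tag=22)   # extra computation after partitions finish
        print("merged result:", final)
    elif rank == 1:
        data = comm.recv(source=0, tag=11)
        comm.send(sum(data["boundary_values"]), dest=0, tag=22)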

Application Preparation Example
• A simple command-line utility, buildmodule, is available with the Grid MP.
   o This utility packages the application executables along with metadata.
   o The metadata is described within a module definition file.
Example module definition file

Issues in Application Integration
Application Modification
• Legacy applications typically were not developed in-house, so source code may not be available for modification.
• It is very expensive to develop and maintain a separate code base specifically for the grid.
• Most grid systems therefore support the execution of application code as-is, typically without requiring even a recompilation of the code.
• Example: the Grid MP platform
User Interface
• The specification of scheduling policies, the parallelization of the application if desired, and basic job management functions are integrated with the existing front end for the application.
• This is very useful in maintaining the user experience while providing the benefits of the grid.
Virtualized Access
• Grid platforms provide mechanisms to allow applications to operate seamlessly across heterogeneous resources, where possible.
• United Devices Grid MP allows the registration of multiple executables for a single application, where each executable is compiled to run on a different platform.
• The overall performance of an application is best when the resource requirements of the application match the resources available on the machines on which it runs.
Scheduling
• Grid platforms typically approach this aspect of scheduling in several ways.
Leave the matching to the user
• batch queueing systems: the job submitter selects a queue for the job, where the queue is typically serviced by a specific type of machine
• Example: LSF and PBS
Users specify the resource requirements for the job
• Condor and Grid MP
• the scheduler can attempt to schedule an application to a machine that is a good match (see the matchmaking sketch after this section)
Use historical data
• use historical data from previous executions of an application when making future scheduling decisions
Management of Data and Programs
Data
• Data management systems such as SRB and Chimera provide mechanisms to manage multiple copies of a dataset across the grid and to use this cached data to improve application performance.
• The Grid MP caches data across the grid and uses data-affinity scheduling to match jobs to resources that already have the data; this optimizes performance and minimizes network traffic.
Program
• Grid platforms provide features such as redundancy, monitoring, rescheduling, and checkpointing.
• Seti@home and UD Grid MP allow the user to specify the number of redundant copies of a job.
• Condor and UD Grid MP support application checkpointing, which allows the job to be restarted, or rescheduled on a different machine and restarted, using the checkpoint information.
Security
• authentication of users
• all data and programs should be encrypted on the host machines
• Grid MP provides automatic encryption/decryption of all input and output data
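The matchmaking sketch referenced under Scheduling: a job states its resource requirements and the scheduler picks a machine that satisfies them, in the spirit of Condor-style matchmaking. The machine attributes and the best-fit rule are illustrative assumptions.

    # requirement-based matchmaking sketch; attributes are illustrative
    machines = [
        {"name": "node-a", "os": "linux",   "mem_gb": 8,  "cpus": 4},
        {"name": "node-b", "os": "windows", "mem_gb": 16, "cpus": 8},
    ]

    def matches(job_req, machine):
        # string attributes must match exactly; numeric ones are minimums
        return all(machine.get(k) == v if isinstance(v, str) else machine.get(k, 0) >= v
                   for k, v in job_req.items())

    def schedule(job_req):
        candidates = [m for m in machines if matches(job_req, m)]
        # prefer the closest fit so large machines stay free for large jobs
        return min(candidates, key=lambda m: m["mem_gb"], default=None)

    print(schedule({"os": "linux", "mem_gb": 4, "cpus": 2}))   # -> node-a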

Grid-enabling Network Services
Introduction: Differentiated Services

• The Operational Support Systems (OSS) that providers deploy are large, complex, and tightly coupled:
   o integrated internally amongst their components
   o integrated externally both to the devices in the network and to the Business Support Systems (BSS) that drive the provider's operations
• The tight coupling of the layers of the solution creates the conditions that impede service differentiation.
• For a service provider to offer a new network-based service, it must first re-engineer the OSS/BSS to support the service.
   o This process can take anywhere from several months to a year or more.
• The Open Grid Services Architecture (OGSA) provides a strategy for service providers to create service-oriented infrastructures that support more flexible resource management.
   o Web Services supplies a paradigm that supports dynamic resource modeling, a fundamental requirement in pursuit of comprehensive management for evolutionary infrastructures.
   o Peer-to-peer technology creates a mechanism for ad hoc relationships to be formed on demand, without a centralized controlling mechanism.
• Montague River, leveraging the emerging grid, Web Services, and peer-to-peer standards, has developed technology which shifts the way that network-attached resources can be managed.
On Demand Optical Connection Services

• The optical connection is becoming the connection device of choice for the user-facing edge of the service provider's network.
Organizations' Dilemma
• build out their own private optical network, gaining control of the network and the ability to respond rapidly to changing market conditions, or
• be faced with a service provider technology which requires centrally managed state for the creation of end-to-end optical connections
Montague River's Technology
Providers
• have more choice as to how they wish state to be maintained, connections to be made, and services to be subsequently managed
• can allow state to be maintained at the edge of the network
• can allow users the ability to autonomously and independently create end-to-end connections
Users
• can cross-connect and add-drop these connections independently of a central administrative organization
• can partition and re-advertise these connections to other users
Creating Grid-enabled Network Services

• Montague River has developed a layered, component-based architecture, implemented as network management appliances, to deliver on the promise of grid-based resource virtualization.
• The solution implements the OGSA developed by the Globus Project, the peer-to-peer technology of the JXTA Project, and both SOAP and WSDL Web Services.
• Montague River leverages several open-source technologies within the solution, including the Globus Toolkit 3, Apache Tomcat, AXIS, JBoss, MySQL, and Linux.
• By using standardized implementations of key technologies, Internet-scale compatibility for the solution is ensured.
Focus of Montague River's solution
• aimed towards ExtraGrids and InterGrids
• users of global grids must be inherently shielded from issues of scale, complexity, and heterogeneity
• the same focus as standards organizations such as the Global Grid Forum and the World Wide Web Consortium

Montague River Grid (MRG)

• supplies the necessary functionality to support the inter-domain and inter-provider management of network-based services (NBS)
• a self-organizing grid adapter/gateway for network-attached resources
• deployed in conjunction with technology-specific domain managers
• acts as the community authority/virtual organization for locally advertised and controlled NBS
• supports discovery, membership, registry, mapper, factory, notification, topology, and threading services
• enables user-controlled, end-to-end, inter-domain and inter-provider services
Components of the MRG include:
• Inter-Domain Services—used to represent the persistent service datastore, service configuration, etc.
• Inter-Domain Factory—primary entrance factory for users
• Factory—operational interface; used to implement the process of managing network services
• Registry—used to identify existing persistent service instances
• Mapper—used to extract detailed information about existing service instances
• Notifications—used to relay asynchronous alarms and notifications
• Membership—used to enhance path selection within a business relationship or service paradigm
• Discovery—used to identify and propagate existing services within a business relationship or service paradigm
Montague River Grid architecture (figure)

Montague River Domain (MRD)

• supplies the necessary functionality to support device-specific, domain-level management of NBS
• the MRD is a service fulfillment/configuration management platform for network-attached devices such as transport equipment, storage platforms, and computational servers
• a dynamically coupled network model allows evolution without re-engineering of the platform
• the MRD implements standard functionality such as journaled transaction management, inventory upload and reconciliation, service configuration and rollback, service and topology reporting, and alarm correlation
Components of the MRD
• Domain Services—service configuration operations, e.g., provisioning, discovery, service grooming
• Domain Factory—network configuration operations, e.g., upload, transaction management, etc.
• Security—service and network security, including resource tagging, user enablement, etc.
• Configuration—service configuration operations, e.g., partitionLightPath, findASPath, addXC, deleteXC (see the sketch below)
• Inventory—network device and component management, virtualized persistence of the physical network, etc.
• Reporting—domain-level network and service reporting
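A hypothetical sketch of how the Configuration operations named above might be composed to provision a path. The operation names come from the component list, but the signatures, return values, and transport are assumptions, not the actual MRD API.

    # hypothetical client for the MRD configuration operations
    class DomainClient:
        def findASPath(self, src, dst):
            # assumed to return an ordered list of cross-connect hops
            return [(src, "oxc-1"), ("oxc-1", dst)]
        def addXC(self, a, b):
            print(f"cross-connect {a} <-> {b}")
        def deleteXC(self, a, b):
            print(f"tear down {a} <-> {b}")

    mrd = DomainClient()
    path = mrd.findASPath("site-A", "site-B")
    for a, b in path:            # provision the LightPath hop by hop
        mrd.addXC(a, b)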

Montague River Domain architecture (figure)

Sample API

• The API is described using rough XML constructs, as used by SOAP messages.
• A criteria element:
   o used to identify the objects under consideration during a specific action, and which of the optional attributes a query will return via the API
   o the standard query/matching construct used by the system
• A regular-expression (regexp) construct:
   o a match is made only if the structure under inspection matches all of the regular expressions—effectively performing a logical 'AND'
   o can be used against any data type, but values may not behave as expected
• A numeric comparison construct:
   o operators: gt | lt | eq | ne | gte | lte, plus a comparison value
   o will only work against numeric values
• A date comparison construct:
   o operators: before | after | between, plus a comparison value (and a second comparison value, used only by 'between')
   o will only work for date- and time-related values
• An attribute-selection construct (optional):
   o allows the builder of the criteria to limit the data in the element returned
   o results in less information being transferred over the wire—a substantial performance improvement
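The matching semantics described above (every regular expression must match, a logical 'AND') can be sketched in a few lines of Python; the attribute names and sample records are illustrative, not taken from the actual API.

    import re

    def matches(obj, criteria):
        # a match is made only if every regexp matches its attribute
        return all(re.search(pattern, str(obj.get(field, "")))
                   for field, pattern in criteria.items())

    services = [{"id": "lp-17", "state": "active", "rate": "OC-48"},
                {"id": "lp-18", "state": "failed", "rate": "OC-12"}]
    print([s for s in services if matches(s, {"state": "active", "rate": "OC-48"})])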

Deployment Example: End-to-End LightPath Management

• A forward-looking, user-controlled grid network service is the LightPath.
• LightPath services are being pioneered by CANARIE Inc.
• deployed in Canada's next-generation Internet—the CA*net 4
A LightPath service is defined to be:
• "Any uni-directional point to point connection with effective guaranteed bandwidth."
• The first implementations of LightPaths are across SONET add/drop multiplexers.
• LightPaths can also be implemented over CWDM, DWDM, SONET, SDH, ATM CBR, MPLS LSR, DiffServ, and GigE VLAN devices and services.
• Montague River has developed technology to support user control of LightPaths with the MRG and MRD.
Montague River's LightPath technology allows
• end users and grid applications to dynamically invoke spatial Quality of Service mechanisms
• the configuration of a dedicated optical BGP static route between two arbitrary points,
• independent of the number of service providers or types of network devices between the points.
• High-end data traffic is then automatically re-routed over the path.
• By setting up separate, direct optical BGP paths between source and destination, advanced techniques to efficiently manage the transfer of large data sets can then be deployed.
Managing Grids
Grid Management Requirements / Management Concerns

• Trust—A grid deployment must achieve high levels of trustworthiness, and the trust infrastructure must accommodate the unique needs of federated control.
• Management Reporting—The grid environment should provide a variety of reports that support management's need to understand the deployment and utilization of resources.
• Monitoring—A wide variety of processing tasks may be using the resources of the grid; these need to be monitored in real time, and both normal and abnormal events must be reported using various modalities.
• Service Levels—Commercial collaborators who choose to use grid technologies have concrete expectations about service levels tied to specific business relationships; the tools and techniques for monitoring and managing service levels must be present.
• Data Catalogs and Replicas—Sharing data resources is often the compelling motivation for the deployment of computing grids; meta-data based mechanisms support the needs of data distribution.

Trust

• Grid deployments must be trustworthy.
• Resources must be accessible only to properly authorized and authenticated users.
• All components must work in the context of a consistent trust model.
• A robust trust model has four major components: identity, privacy, authority, and policy.
Identity
• The identity management problem has two aspects:
   o First is the mechanism for asserting and assuring identity.
      Example: Globus-based deployments rely on secure digital certificates and public key infrastructure to achieve reliable mutual authentication between users and resources.
   o Second is the need to establish a regimen for assigning, accepting, and revoking identities.
• The firm should either operate as a certificate authority in order to issue and manage certificates, or must contract with an external certificate authority for those services.
• All participants in a grid deployment must be able and willing to accept the certificates used to represent the identities of each other's employees.
• It must be possible to suspend and revoke a user's access to the grid.
Privacy
• ensures that the interaction itself is not compromised
• no third party can use or misuse the substance of the interaction
• implementing encryption yields the desired privacy
Tasks that require the use of encryption:
1. The specification of the parameters associated with a job
2. The process of identifying a job to the grid and submitting the job for processing
3. The transfer of the executable code associated with the job from one location in the grid to another
4. The transfer of input data required by the job from one location to another
5. The transfer of output data produced by the job from one location to another
6. The transfer of log files and control information generated as a byproduct of the processing job (e.g., stdout and stderr on Unix-style operating systems)
7. The notification of the user about the results, and the delivery of the results themselves
8. Information that resides in the grid middleware and identifies the job and its outcome (e.g., internal log files)
Authorization
• represents and enforces access controls
• requirements for access controls include Granularity, Federation, and Policies
Granularity
• refers to the ability to specify and enforce access controls on arbitrarily small, discrete units of work
Federation
• In the distributed adjudication model, the request includes a representation of the requester's rights.
• A resource owner may also choose to delegate the specification of access for a subset of his/her resources.
• This model of shared responsibility for the access control regimen is a federated model of access control.
Policies
• represent the mechanism for specifying the mapping of access rights to subsets of the user base
Policy
• Any organization contemplating the adoption of grid technologies should devote the necessary attention to articulating security policies. Policies should include:
• rules about the disclosure of private keys,
• procedures for reporting and disabling all credentials associated with lost and/or stolen keys,


• periodic review of personnel lists and accepted credentials.
Access control lists should be reviewed regularly:
• clear guidelines for inclusion on specific lists should be articulated
• policies regarding the exposure of the contents of the lists themselves should be developed
The Grid Computing environment should provide secure audit controls and appropriate reporting to enable (properly authorized) management personnel to validate the integrity of the operating environment.
Management Reporting
• a wide variety of metrics about the allocation and utilization of grid resources
• both detailed and summary information about
   o various categories of users
   o statistics on usage and utilization rates of various resources
Users
• A user report should contain the following information for each grid user, sorted by user id (see the aggregation sketch after this section):
   o number of requests submitted, succeeded, failed, and cancelled
   o total CPU time used, total wall-clock time used, total bytes transferred
• The report should be available for a variety of time frames:
   o today, week-to-date, month-to-date, year-to-date and, for an arbitrary time frame, daily, weekly, monthly, quarterly, yearly
• The report should also be available sorted by user within organization.
Resources
Usage reports
• show which users and/or organizations are using a particular resource or group of resources
• should contain the same details found in the user report for each grid resource, sorted by resource id
• available for a variety of time frames
• available sorted by resource within organization
Utilization reports
• show the degree to which a resource or group of resources is approaching its capacity
Jobs
• A job detail report will identify
   o each job run during a specified time period
   o the user who submitted the job
   o the user's organizational unit and organization
   o whether or not the job succeeded
   o the resources used by the job
• A variety of summary reports should be available that have the job as the primary sort, but that total usage by user, by organizational unit, and by organization.
• Reports should be available for varying time frames and, within reason, historically over varying periods.
Audit Support
• It should be possible to identify and track all important events in the computing environment
   o submission of jobs, completion of jobs, failed access attempts, and resource allocation failures
• All actions that change the state of the grid itself should be logged
   o maintenance of user lists, granting and revoking of credentials, maintenance of access control lists and policies, and creation and destruction of identity proxies
• Filtering should be available based on the type of event
   o Example: accessViolation, accessViolationSubmittingJob
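The aggregation sketch referenced under Users: how the per-user report could be built from raw job records. The record fields are assumptions about what the middleware logs, chosen to mirror the report columns above.

    from collections import defaultdict

    # hypothetical raw job records as a middleware might log them
    jobs = [
        {"user": "alice", "status": "succeeded", "cpu_s": 3600, "wall_s": 4000, "bytes": 2**20},
        {"user": "alice", "status": "failed",    "cpu_s": 120,  "wall_s": 150,  "bytes": 0},
        {"user": "bob",   "status": "cancelled", "cpu_s": 0,    "wall_s": 10,   "bytes": 0},
    ]

    report = defaultdict(lambda: defaultdict(int))
    for j in jobs:
        r = report[j["user"]]
        r["requests_submitted"] += 1
        r[f"requests_{j['status']}"] += 1     # succeeded / failed / cancelled
        r["total_cpu_s"] += j["cpu_s"]
        r["total_wall_s"] += j["wall_s"]
        r["total_bytes"] += j["bytes"]

    for user in sorted(report):               # sorted by user id
        print(user, dict(report[user]))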

Monitoring
• The grid environment should provide a mechanism for real-time monitoring and notification.
• The notification mechanism should support a variety of modalities, such as e-mail or pager, based on the severity of the event.
Types of Events
• provide notification for any event that occurs as part of the normal operation of the grid
   o such as job completion, when of sufficient importance and urgency to the initiator
• any event that could result in a log entry should be allowed to result in a real-time notification
   o the event-monitoring subsystem may support multiple levels or types of event
• For example, events can be categorized as minor, moderate, serious, or urgent.
• A notification policy can be invoked based on the category associated with a particular event.
Notification Modes
• A notification policy may require notification via
   o the most intrusive means (say, ringing a pager or telephone) for serious or urgent events
   o e-mail notification for moderate events
   o no notification for minor events
• A notification policy should be able to identify multiple responders.
• It may be useful to have a dynamically updated display of events.
• These updates can be loggable events, creating an audit trail that is useful for tracking and reporting on the responsiveness of various service and support organizations.
Data Catalogs and Replica Management
Data Catalog
• The data catalog is meta-data that identifies the data sets being managed.
• Typical meta-data include:
   o name of the data set
   o location
   o date and time it was last modified
   o size
   o type
   o correct access method
• Key issues in meta-data management include the completeness of the meta-data and the timeliness of updates to it.
• The catalog should be flexible enough to track everything.
• The catalog should be subject to strict access controls.
• Catalog maintenance activity should be diligently logged.
Replication
• A replication mechanism manages the physical distribution of the data.
• It must handle failed data transfer operations, unreachable network destinations, and unavailable source data.
• Data can be either pushed or pulled to their destinations.
Push approach
• a replication manager, operating on a schedule of some sort, periodically initiates the movement of the data to their designated replication destination(s); see the sketch below
• example: "move dataset A to locations Alpha and Beta every weekday at 2am"
Pull approach
• responds to actual demand for access to the data
• the replication mechanism makes copies available at locations proximal to the requester
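A minimal sketch of the push approach: a replication manager walks a schedule and initiates transfers when an entry comes due ("move dataset A to locations Alpha and Beta every weekday at 2am"). The schedule format and transfer() are illustrative placeholders.

    import datetime

    SCHEDULE = [  # (dataset, destinations, hour-of-day, weekdays-only)
        ("dataset-A", ["Alpha", "Beta"], 2, True),
    ]

    def transfer(dataset, dest):
        print(f"pushing {dataset} -> {dest}")   # real code would retry on failure

    def run_due_jobs(now):
        for dataset, dests, hour, weekdays_only in SCHEDULE:
            if weekdays_only and now.weekday() >= 5:   # skip Sat/Sun
                continue
            if now.hour == hour:
                for d in dests:
                    transfer(dataset, d)               # push to each replica site

    run_due_jobs(datetime.datetime(2024, 3, 4, 2, 0))  # a Monday at 2am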

Portals
• Web-based applications that encapsulate grid operations and present a uniform user interface to the grid user base can be a useful integration point for all of this management capability.
• For the average grid user, the portal provides a uniform user interface that masks the differences between the hardware and software resources available on the grid.
• For the administrative user, the portal provides a coherent application environment that uniformly enforces access controls and ensures consistent logging.
Service Level Management
• Whenever organizations share resources, there is the expectation that each party is contributing according to some mutually agreed prior arrangement.
• Such arrangements may specify that
   o a certain amount of processing capacity will be available during a given period of time
   o a certain amount of disk space and disk I/O will be available during that period of time
   o certain applications will be executed and will yield a specified throughput
• In the utility or service grid model of distributed computing,
   o resource providers are able to allocate costs to resource users
   o resource users are able to hold providers accountable to specific service-level commitments
• Monitoring and reporting provide a solid foundation for the deployment of service-level management tools.
   o Job reporting provides the basis for verifying that known workloads are processed according to throughput commitments made by the resource provider.
   o Summing job resource usage allows a useful validation that the total resources allocated as part of a grid deployment are consistent with resource provider commitments.
• Integrating this type of real-time usage and throughput information with a robust event management mechanism leads to the possibility of two important adaptive behaviors on the part of the grid infrastructure.
   o First, resource providers can use this information to automatically reconfigure or redeploy resources to a specific processing task or set of tasks.
      Example: if a resource manager is throttling the arrival of jobs of a particular type and a queue-length threshold is passed, a service-level management module could readjust the number of servers available for those jobs (a sketch follows below).
   o Second, resource consumers can take advantage of real-time quality-of-service information to make service-level-adjusted scheduling decisions.
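The queue-threshold example above, sketched as a simple feedback rule: when the queue for a job class exceeds a threshold, the service-level module assigns more servers; when it drains, servers are released. All thresholds and counts are illustrative assumptions.

    def readjust(queue_len, servers,
                 high=100, low=10, max_servers=32, min_servers=2):
        if queue_len > high and servers < max_servers:
            return servers + 1    # add capacity to honor throughput commitments
        if queue_len < low and servers > min_servers:
            return servers - 1    # release capacity back to the pool
        return servers

    servers = 4
    for qlen in (150, 150, 80, 5):    # simulated queue-length samples
        servers = readjust(qlen, servers)
        print(f"queue={qlen:3d} -> servers={servers}")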

Different Application Areas of Grid Computing
Grids in Life Sciences
• Bioinformatics
• Computational Chemistry and Biochemistry
• Protein Modeling
• Ab Initio Molecular Modeling
• Artificial Intelligence and Life Sciences
Grids in Other Industries
• Financial Services, Geo Sciences, Manufacturing
• Electronic Design Automation
• Entertainment and Media
• Chemical and Material Sciences
• Gaming
Grids in the Telecommunications Sector
• Network Planning and Management
• EDR Analysis and Data Warehouse
• A Business in the Future Network


A two-step EDR processing platform, using InnerGrid (figure)

Disclaimer
• Intended for educational purposes only; not intended for any sort of commercial use.
• Purely created to help students with limited preparation time.
• Text and pictures used were taken from the reference items.

Reference
• Grid Computing: A Practical Guide to Technology and Applications, by Ahmar Abbas

Credits
Thanks to my family members who supported me while I spent a considerable amount of time preparing these notes.
Feedback is always welcome at [email protected]
