An Evaluation of the State of Affairs of Grid Computing: Current and Future Projections

by Mesbah Haque

MEng., Civil and Environmental Engineering (IT), MIT, 2003

B.S., Civil and Environmental Engineering, University of Massachusetts - Lowell, 2001

Submitted to the System Design and Management Program in partial fulfillment of the requirements for the Degree of

MASTER OF SCIENCE IN ENGINEERING AND MANAGEMENT

AT THE MASSACHUSETTS INSTITUTE OF TECHNOLOGY

FEBRUARY 2005

© 2005 Mesbah Haque. All rights reserved.

The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part.

Signature of Author: Mesbah Haque, System Design and Management Program, January 10, 2005

Certified by: John R. Williams, Associate Professor, Department of Civil & Environmental Engineering, Thesis Supervisor

An Evaluation of the State of Affairs of Grid Computing: Current and Future Projections

By

Mesbah Haque

Submitted to the System Design and Management Program on January 14, 2005, in Partial Fulfillment of the Requirements for the Degree of Master of Science in Engineering and Management

ABSTRACT

Grid computing has a potential market opportunity of $12 billion by 2007 [6], and the recent alignment of business strategy to support Grid Computing by major vendors such as IBM, SUN, Oracle and others has generated high interest in the numerous 'Grid Computing' products and solutions on offer. The different frameworks and standards have also led to confusion in the industry as to the 'right' way of implementing Grid Computing. There is a tremendous need today for massive computing cycles to evaluate various business and engineering decisions, and businesses are under continuous pressure from high infrastructure costs and a lack of flexibility and reliability [4]. Various solutions have evolved over decades to meet these needs, but unlike in academia, corporations have not adopted such technologies on a large scale.

This thesis arises from the need to clarify the current and future state of Grid Computing by evaluating the various standards and implementations available. A hierarchy of types of Grid is also presented. Several case studies illustrate the effect of current technology implementations and their benefits. Future predictions of the market and technology drivers are presented based on interviews and available research data.

Thesis Supervisor: Dr. John R. Williams

Title: Associate Professor of Civil and Environmental Engineering, IESL Director

ACKNOWLEDGEMENTS

I dedicate this thesis to my family and friends for providing me with inspiration and encouragement.

I would like to thank my parents, Mozharul Haque and Zulekha Begum, for all their sacrifice, support and hard work to get me to where I am today and always encouraging me to achieve higher. My special thanks to my dearest wife, Maliha, for her support, encouragement and help. I am thankful to my brothers: Manny and Roma for their encouragement and support. My special thanks to my endearing nephews, Zeshan and Samee for keeping me distracted by their cuteness.

I would like to thank my thesis advisor Dr. John R. Williams for his tutelage, support and friendship throughout my learning process at MIT and for believing in me to deliver. I will always be grateful to him. I would like to thank Mark Dailey and Manny for providing me with remarkable help and support from EDS. I would also like to thank Sharon Green and Ed Reynolds at EDS for their time and valuable insights. I would also like to thank Michael Sheets at IBM Innovation Center for his help. I am also thankful to Helen Trimble for her help and friendship. I am thankful to my new friends at SDM for their help, support and friendship. You know who you are.

I would like to give special thanks to my dear friend, Sharon E. Walcott for her friendship, support, encouragement and help. I am also thankful to my friends Omar Hoda and Marco Ambrosoli for their help and support.

Lastly, but most importantly, I would like to thank God for blessing me with the opportunity to pursue my education this far.

Table of Contents

TABLE OF CONTENTS
LIST OF FIGURES
CHAPTER 1 INTRODUCTION TO GRID COMPUTING
  1.1 GRID COMPUTING
  1.2 ORIGINS OF GRID COMPUTING
  1.3 MARKET FOR GRID COMPUTING
  1.4 GRID COMPUTING COMPETITIVE LANDSCAPE
CHAPTER 2 GRID COMPUTING INFRASTRUCTURE
  2.1 GRID COMPUTING ARCHITECTURE INTRODUCTION AND IT EVOLUTION
  2.2 GRID COMPUTING ATTRIBUTES
  2.3 GRID COMPUTING STANDARD
    2.3.1 Globus Toolkit Standard
    2.3.2 Grid Architecture and Internet
    2.3.3 Grid and Web Services Standard Convergence (WSRF)
  2.4 GRID COMPUTING CATEGORIES
    2.4.1 Compute Grid
    2.4.2 Data Grid
    2.4.3 Optimization Grid
  2.5 APPLICATIONS OF GRID COMPUTING
    2.5.1 IBM On-Demand
    2.5.2 Sun N1 Grid
    2.5.3 Oracle 10g
    2.5.4 Microsoft Grid
    2.5.5 HP Adaptive Enterprise
    2.5.6 Gridbus - Open Source Grid Initiative
    2.5.7 GridGarden.NET - A Microsoft .NET based Grid Framework from MIT
CHAPTER 3 CURRENT STATE OF AFFAIRS IN GRID COMPUTING AND IMPLEMENTATIONS
  3.1 GRID COMPUTING TODAY - CURRENT STATE OF AFFAIRS
  3.2 INDUSTRY PERSPECTIVE - HIERARCHY FOR TYPES OF GRID COMPUTING
  3.3 IBM CASE STUDY: HEWITT ASSOCIATES
  3.4 SUN CASE STUDY - AXYZ ANIMATION, INC.
  3.5 ORACLE CASE STUDY - CHICAGO STOCK EXCHANGE
CHAPTER 4 INDUSTRY FUTURE OF GRID COMPUTING
  4.1 MARKET FORECAST
  4.2 TECHNOLOGY AND STANDARDS DIRECTION
  4.3 INDUSTRY PERSPECTIVES
CHAPTER 5 DISCUSSIONS AND CONCLUSIONS
  5.1 DISCUSSIONS
  5.2 CONCLUSIONS
REFERENCES
APPENDIX A
APPENDIX B
APPENDIX C
APPENDIX D

List of Figures

Figure 1.1: Architecture Evolution
Figure 1.2: Revenue Forecast by Grid Category, 2002 - 2007
Figure 1.3: Connecting Motivations to Grid Computing
Figure 1.4: Key Providers of Grid/Utility Programs
Figure 1.5: Best-Positioned Suppliers of Grid Computing Services
Figure 2.1: Consumption Model
Figure 2.2: The evolution of grid technologies and standards
Figure 2.3: GT3 Core Architecture
Figure 2.4: Layered Grid architecture and its relationship to IP architecture
Figure 2.5: Convergence of Grid and Web Service Standards
Figure 2.6: IBM Grid Toolbox Base Services
Figure 2.7: IBM Grid Toolbox Hosting Environment
Figure 2.8: SUN N1 Grid Computing Strategy
Figure 2.9: Oracle 10g Product Grid Mapping
Figure 2.10: Global Web Services Architecture
Figure 2.11: The HP grid solution stack diagram
Figure 2.12: Gridbus Middleware
Figure 2.13: Components of GridGarden.NET
Figure 3.1: Organizational issues ranked "Extremely High" and "High" in severity
Figure 3.2: Master-Worker Level 1 Grid Scenario
Figure 3.3: Level 2 Grid Type
Figure 3.4: Level 3 Grid Type
Figure 4.1: Evolutionary change through incremental IT adoption
Figure 4.2: Grid-related standards organizations
Figure 4.3: Vision of the Grid Computing journey
Figure A.1: Web Services Architecture Stack
Figure B.1: Microsoft .NET Architecture
Figure B.2: Common Language Runtime Architecture
Figure C.1: Alchemi architecture and interaction between its components

CHAPTER 1

INTRODUCTION TO GRID COMPUTING

1.1 Grid Computing

There is general agreement among businesses that today's information infrastructure costs too much and does not meet the need for flexibility and reliability that they demand [4]. Companies face the paradox of increasing demand for compute cycles while simultaneously possessing unused compute cycles on the servers and desktops across the enterprise. This disconnect exists in almost all companies and the underutilization rate often increases dramatically with the number of PCs owned by the business. Therefore, there is tremendous opportunity to capture the latent value of Information Technology (IT) investments just by the efficient utilization of deployed assets.

Grid Computing extends the concept of cluster computing to a scale that crosses organizational boundaries. The Grid refers to an infrastructure that enables the integrated, collaborative use of high-end computers, networks, databases, and other resources, such as scientific instruments, that may be owned and managed by multiple organizations. Grid applications often involve large volumes of data (more than a terabyte) and often require secure resource sharing across organizational boundaries, which is not easily handled by today's Internet and Web infrastructures [1]. This approach to computing can be thought of as a massive "utility" grid, similar to the electric grid that provides power to homes and businesses every day. In this same utility fashion, Grid Computing seeks to allow a practically unlimited number of computing devices to be added to a grid environment, each adding to the computing capability and problem resolution capacity of the operational grid environment [2]. A grid can also be described as a type of parallel and distributed system that enables the sharing, selection, and aggregation of geographically distributed "autonomous" resources dynamically at runtime depending on their availability, capability, performance, cost, and users' quality-of-service requirements [3].
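The selection and aggregation of autonomous resources described in the last definition can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from any grid toolkit: the `Resource` fields, the `select_resources` function, and the use of a per-resource price cap to stand in for a user constraint. It filters resources by availability and cost, ranks them by cost per core, and aggregates them until the requested capability is reached.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """One shareable resource; all fields are illustrative assumptions."""
    name: str
    owner: str            # organization that owns and manages the resource
    available: bool       # availability at selection time
    cpu_cores: int        # capability
    cost_per_hour: float  # price charged by the owner


def select_resources(resources, cores_needed, max_cost_per_hour):
    """Aggregate the cheapest available resources until the request is met.

    max_cost_per_hour is a hypothetical per-resource price cap set by the user.
    """
    candidates = sorted(
        (r for r in resources
         if r.available and r.cost_per_hour <= max_cost_per_hour),
        key=lambda r: r.cost_per_hour / r.cpu_cores,  # cheapest per core first
    )
    chosen, cores = [], 0
    for resource in candidates:
        if cores >= cores_needed:
            break
        chosen.append(resource)
        cores += resource.cpu_cores
    if cores < cores_needed:
        raise RuntimeError("not enough available capacity for this request")
    return chosen
```

A real grid scheduler would also weigh performance, data locality, and security policy; the greedy ranking here is only one simple way to express the idea.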

The architecture evolution reflecting the change from a centralized architecture to a distributed grid is represented in Figure 1.1 [4]. In this representation, the centralized model leverages a relatively homogeneous environment and focused support specialists. In contrast, the distributed architecture provides businesses with more flexibility and more control over their IT environment. But this also leads to additional complexity, lower asset utilization, a distributed support staff and additional facilities costs, which virtually eliminate the potential cost benefit of the mini and micro computers used. The next jump on this evolutionary trail is the virtualization of resources, such as the dynamic, efficient pooling of mixed compute, storage and network resources, which is enabled by the rapidly declining cost/performance ratio of network bandwidth and systems management software.

[Figure: architecture stages (Mainframes, Minis, Micros, Centralized, Blades, Distributed, Grids, Virtualized) placed on a timeline from 1960 to 2010]

Figure 1.1: Architecture Evolution

The terms Utility Computing and Grid Computing are being used interchangeably due to the marketing efforts of various companies, although differences exist in the principles of their approaches [25]. Utility Computing is the ability to intelligently match IT resources to business demand on a pay-for-use basis, which is an on-demand variation of the Grid Computing concept. Utility Computing benefits include reduced cost, faster time to value, and alignment of IT spending with business activity.

A further study of the evolution of the grid is presented in Chapter 2, along with the various architectures currently available and the different vendor-based initiatives towards grid computing.

1.2 Origins of Grid Computing

To understand the current trends in grid computing, it helps to put the concept in the perspective of its origins. The grid computing concept is not entirely new. It is a variation on the notion of distributed computing, which takes an application off one server and runs it more effectively by spreading it across several servers on a network. In the early 1970s, when computers were first linked by networks, the idea of harnessing unused CPU cycles was born in the form of distributed computing on the Internet's predecessor, the ARPAnet. Distributed computing scaled to a global level with the maturation of the Internet in the 1990s.

The Internet has had the powerful effect of connecting computers and, more importantly, people. However, only email and the World Wide Web have been widely used. The problem with the Internet is that it connects dispersed computing resources but does not provide a way to coordinate them to work on a common problem. New protocols for resource discovery, negotiation, and coordination are necessary for integrating and trading resources on the Internet. As more and more resources are converted from atoms to bits and are connected to the Internet, a standard framework is required to reach these resources on the net, no matter what protocols they use and where they are located, and to coordinate them in order to help people collaborate and work on a common and complicated problem [2].

The most successful and popular distributed computing project in history is the SETI@home (Search for Extraterrestrial Intelligence) project, which originated at the University of California at Berkeley. The project's focus is to search for radio signal fluctuations that may indicate a sign of intelligent life from space. The project originally received far more terabytes of data every day than its assigned computers could process, so it invited volunteers to download the SETI@home software and donate the idle processing time on their computers. Over two million people, the largest number of volunteers for any Internet distributed computing project to date, have installed the SETI@home software agent since the project started in May 1999 to search through signals collected by the Arecibo Radio Telescope in Puerto Rico. This project conclusively proved that distributed computing could accelerate computing project results while managing project costs.

Grid computing is evolving into an important discipline within the computer industry by extending distributed computing through an increased focus on resource sharing, coordination, manageability, and high performance. Current distributed computing technologies, such as Internet technologies, address communication and information exchange among computers but do not provide an integrated approach to the coordinated use of resources at multiple sites for computation. Even Business-to-Business exchanges focus on information sharing and not necessarily on computing resource sharing.

The Grid has many of the same characteristics as the electric grid, such as [5]:

- Complexity, characterized by complicated connections across multiple administrative domains.
- Standards, enabled through research, business drivers, and Government or association standard enforcement.
- Distribution, characterized by several distribution levels with different complexity and different capacity.

Some experts postulate that Grid technology will arrive in three waves: first in academic research communities, then in corporations, which is beginning to happen now. The ultimate goal, however, is the third wave, which will see the technology coalesce to create a processing network analogous to the Web, called simply "the Grid" [15].

1.3 Market for Grid Computing

Grid Computing is slowly gaining momentum in the business world, and its application is being tested in various industrial segments around the world. Marketing efforts are underway by the big technology vendors such as IBM, SUN, HP and Oracle, to name a few, touting Grid-enabled solutions that promise better use of computing cycles and enormous productivity gains.

The backing of the major industry players has opened up the 'Great Grid Rush' of this century. The rush is justified by the market opportunity that grid computing promises. According to an IDC survey, the market opportunity for grid computing is projected to be $12 billion by 2007, as shown in Figure 1.2 [6].

Some of the key findings in the IDC report include:

- Primary customer motivations, adoption scenarios, targeted workloads, and business needs suggest that the grid market is beginning to split into three distinct segments: compute, data, and optimization
- Early adoption has largely been in the high-performance computing market for large batch-oriented grids
- Emerging opportunity is focused primarily on the pooling and allocation of resources across a variety of business services

[Figure: chart titled "Grid Revenue Forecast by Grid Category, 2002-2007"; horizontal axis: Year (2002 to 2007); vertical axis: revenue, with ticks from $2,000 to $14,000; series: Compute, Data, Optimization]

Figure 1.2: Revenue Forecast by Grid Category¹, 2002 - 2007

IDC attributes the projected revenue increase to a number of factors, including the maturation and standardization of Grid software, the drive for efficient use of IT infrastructure by end users, the expanded awareness of Grids, and the expansion of the market beyond traditional HPC applications and users.

Another recent IDC study [7] shows that grid computing adoption in Western Europe, although still nascent, is expected to reach beyond the high performance computing (HPC) space and will become more pervasive in commercial data centers over the next few years. IDC's European Enterprise Server Group forecasts that grid computing will approach $1.8 billion in server revenue by 2008 across HPC technical markets and commercial applications. The study builds on IDC's server research group investigations into emerging grid technology in an attempt to define and size the present and future grid computing space in Western Europe, in particular the incremental server revenue attached to grid computing.

¹ Grid Categories are explained in Chapter 2.

Increasing corporate demand for flexibility, reliability, and resiliency at reduced technology cost has increased corporations' interest in studying the benefits provided by grid computing. Grid computing can provide many benefits not available with traditional computing models [26]:

- Better utilization of resources - Grid computing uses distributed resources more efficiently and delivers more usable computing power. This can decrease time-to-market, allow for innovation, or enable additional testing and simulation for improved product quality. By employing existing resources, grid computing helps protect IT investments, containing costs while providing more capacity.
- Increased user productivity - By providing transparent access to resources, work can be completed more quickly. Users gain additional productivity because they can focus on design and development rather than wasting valuable time hunting for resources and manually scheduling and managing large numbers of jobs. End users get uninhibited access to the computing, data and storage resources they need, when they need them.
- Scalability - Grids can grow seamlessly over time, allowing many thousands of processors to be integrated into one cluster. Components can be updated independently and additional resources can be added as needed, reducing large one-time expenses.
- Flexibility - Grid computing provides computing power where it is needed most, helping to better meet dynamically changing workloads. Grids can contain heterogeneous compute nodes, allowing resources to be added and removed as needs dictate.

Additional motivations for corporations to adopt grid computing are represented in Figure 1.3, and some key drivers are discussed below.

IT Infrastructure Complexity

Over time, organizations have adopted new information technologies to support new applications and placed them alongside older technology rather than abandoning their previous investment in systems, software, and processes. This has often led to an IT infrastructure that is very complex. The foundation often is 20 years old, and layered on top of that foundation are layers of new technology. Complexity clearly can increase an organization's staff-related costs. It may also decrease an organization's ability to respond to rapidly moving market forces [14] and thereby affect its agility and adaptability.

[Figure: diagram linking business motivations (including reducing operational costs, reducing capital costs, simplifying IT operations, and server consolidation) to Grid Computing. Source: IDC]

Figure 1.3: Connecting Motivations to Grid Computing

Reduce Capital Costs

Declining or stagnant revenue has forced many organizations to seek ways to trim costs. IT budgets either have been reduced while the IT organization is expected to continue supporting all application systems and business initiatives or have remained static while the IT organization is expected to provide expanded levels of service to the organization. Thus, the IT staff faces the paradox of having to do more with less. To address these issues, IT organizations have implemented various techniques, such as consolidating multiple workloads on fewer, larger system configurations in a smaller number of datacenters; increasing focus on the use of multi-system configurations (blade computing, clustering, or grids) constructed of high-volume, low-cost systems; and reducing the number of system architectures utilized throughout the network to make development, administration, operations, and support easier and, thus, less costly.

Improved System Utilization

Organizations have installed a number of different computing solutions and are now finding that these systems are not fully utilized nor do they interoperate as a single computing resource. IT management is often seeking ways to increase the overall utilization of its systems and integrate application systems. Since organizations are facing a rapidly changing environment, they are also concerned about agility. That is, they are seeking ways to align their system resources to address today's need, knowing that tomorrow they may need to realign to address a different set of needs.

While the potential opportunity is both broad and significant, there are numerous and varied challenges and obstacles that may hinder adoption, including:

- Cultural and organizational concerns associated with resource sharing, e.g., the comfort factor associated with virtualized resources for business units
- General lack of commercial applications running in a grid environment
- General lack of tools and industry standards, leading organizations to think of grids as requiring large people and services costs, which lessen any infrastructure cost savings
- Security concerns

Chapter 3 discusses the barriers to adoption of Grid Computing in corporations.

1.4 Grid Computing Competitive Landscape

The grid computing market has been dominated by several major vendors, such as IBM, SUN, Microsoft, Oracle and EDS, to name a few (Figure 1.4). There are also other providers in different segments of the grid computing offerings.

A Michael Porter Five Forces view of the Grid computing landscape shows that the barrier to entry into this market space is low and that rivalry within the industry, although civil, has seen a recent trend towards cluster alliances. The availability of several vendors has increased buyer power, although buyers may limit the extensibility of their architectures by using vendor-specific grid enabler tools. This can also mean higher initial switching costs for corporations that want to use alternatives. As the technology matures and acceptance grows within business communities, the potential for more entrants is high.

[Figure: provider landscape. Tier labels include Full-Service-Oriented (On-Demand Business, Adaptive Enterprise, Agile Enterprise), Emerging Utility Computing, Business Blueprinting, Hosting, Point Solutions and Managed Service Providers, and Niche Sector. Providers named include Unisys, Corio, AT&T, SevenSpace, NaviSite, Coradiant, Digex, Rackspace, Inforonics, Savvis, dbaDirect, Guardent, BroadSoft, AimNet Solutions, broad:margin and Nuvo. Source: Gartner]

Figure 1.4: Key Providers of Grid/Utility Programs

Figure 1.4 also depicts the confusing messages that are being marketed by different vendors. For example, IBM has its On-Demand offering, whereas HP and EDS have their Adaptive and Agile Enterprise offerings respectively.

Figure 1.5, in turn, shows the current best-positioned suppliers of Grid Computing services from the perspective of customers, according to IDC market research. IBM is viewed as the market leader, and its market positioning strategy has already started reaping rewards. According to Ken King, the VP of Grid Computing at IBM, IBM's Grid business had already almost hit the $1 billion mark [27].

A summary of the Go-to-Market strategy segmentation is provided below:

- Development of channel model (e.g. IBM Global Services, )
- Partnerships with ISVs to enable delivery of ASP (e.g. IBM GS)
- Stand-alone offerings (e.g. IBM GS, HP)

[Figure: horizontal bar chart ranking suppliers, from top to bottom: IBM; HP, AT&T; Microsoft; EDS, Sun Microsystems; Dell, CSC, Cisco, Qwest. Horizontal axis from 0 to 25]

Figure 1.5: Best-Positioned Suppliers of Grid Computing Services

Although the major players may market their initiatives differently for the sake of differentiation, essentially they all relate to grid computing. The next chapter discusses some of these different messages and how they relate to each vendor's application of Grid Computing architecture.

CHAPTER 2

GRID COMPUTING INFRASTRUCTURE

2.1 Grid Computing Architecture Introduction and IT Evolution

Grid Computing has been an integral part of the Information Technology evolution. In Chapter 1 the architecture evolution was presented to define the concept of virtualization and the virtual organization (VO) that it enables. Over the last several years the grid community has produced protocols, services and various tools that address the notion of building scalable virtual organizations. Grid Computing, in turn, provides highly scalable, highly secure, and extremely high-performance mechanisms for discovering and negotiating access to remote computing resources in a seamless manner. This makes it possible to share computing resources, on an unprecedented scale, among a practically unlimited number of geographically distributed groups.

Before delving deeper into Grid Computing architecture, the evolution of IT is evaluated in relation to the emergence of grid or utility computing. The consumption model reflects how corporations purchase their IT, as shown in Figure 2.1 [4]. The evolution started with the concept of 'Insourcing', where corporations owned all compute resources and managed them internally. Over the last two decades IT was used to implement additional business functions, leading to higher complexity and costs. To manage cost and complexity, companies started to outsource non-core business functions to IT vendors while retaining decisions on architectures and product selection. Although this model provided a better option than the 'Insourced' model, corporations still lacked the agility and cost performance required to remain competitive. The next wave represented in the figure is that of utility or grid computing, where the virtualized organization allows the corporation to purchase computation as a service or to utilize the idle CPU cycles it already has available. Open standards and guidelines are needed to maintain the transparency of resource availability and utilization across organizations.

[Figure: timeline from 1960 to 2010 showing three consumption stages: Insourced (company-owned, -managed and -operated assets), Selective Outsourcing (company-managed and provider-operated assets), and Utility Services (provider-owned, -managed and -operated assets)]

Figure 2.1: Consumption Model

Grid computing environments must be constructed upon the following foundations [4]:

- Coordinated resources. Avoid building grid systems with centralized control; instead, provide the necessary infrastructure for coordination among the resources, based on respective policies and service-level agreements.
- Open standard protocols and frameworks. The use of open standards provides interoperability and integration facilities. These standards must be applied for resource discovery, resource access, and resource coordination.

Additionally, a grid environment should include metrics for measuring whether the Quality of Service (QoS) requirements of the end-user community are being met.

2.2 Grid Computing Attributes

The requirements for grid computing infrastructure can be described by the following attributes [16]:

Virtualization

Virtualization is the abstraction into a service of every physical and logical entity in a grid. Virtualization is important because it enables grid components (such as storage, processors, databases, application servers, and applications) to integrate tightly without creating rigidity and brittleness in the system. Rather than making fixed ties that determine which application server node will handle requests from a particular application, for example, or where a database physically locates its data, virtualization enables each component of the grid to react to changing circumstances more quickly and to adapt to component failures without compromising performance of the system as a whole.

Dynamic Provisioning

Provisioning simply means distributing supplies where they are needed. In the context of the grid, "supplies" may mean server requests that need to be handled, data that needs to be accessed and used, or computations that need to be performed. Provisioning in the grid environment means that a grid service broker, which knows the resource requirements of one element of the grid and the resource availability of another element, links the two together automatically and dynamically to make efficient use of resources. It then adjusts the associations as circumstances change. Policies, such as response time thresholds or anticipated peak demands, can be used to further optimize the associations of resource requestors to resource providers.
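The broker behaviour described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the design of any vendor's broker: the `Provider` and `Broker` classes, the "work units" measure, and the `max_utilization_pct` policy threshold are all hypothetical. The broker links each requestor to the provider with the most headroom under the policy, and re-links requestors when a provider fails.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    """A resource provider; capacity is in hypothetical 'work units'."""
    name: str
    capacity: int
    assigned: dict = field(default_factory=dict)  # requestor name -> units

    def load(self):
        return sum(self.assigned.values())


class Broker:
    """Links requestors to providers and adjusts the links as things change."""

    def __init__(self, providers, max_utilization_pct=80):
        self.providers = list(providers)
        # Policy threshold: never load a provider beyond this percentage.
        self.max_utilization_pct = max_utilization_pct

    def headroom(self, provider):
        limit = provider.capacity * self.max_utilization_pct // 100
        return limit - provider.load()

    def provision(self, requestor, units):
        """Assign the request to the provider with the most headroom."""
        if not self.providers:
            return None
        best = max(self.providers, key=self.headroom)
        if self.headroom(best) < units:
            return None  # no provider can take the request under the policy
        best.assigned[requestor] = best.assigned.get(requestor, 0) + units
        return best.name

    def release(self, requestor):
        """Free whatever the requestor holds on any provider."""
        for provider in self.providers:
            provider.assigned.pop(requestor, None)

    def fail(self, provider_name):
        """Drop a failed provider and re-provision its requestors elsewhere."""
        failed = next(p for p in self.providers if p.name == provider_name)
        self.providers.remove(failed)
        return {req: self.provision(req, units)
                for req, units in failed.assigned.items()}
```

A production broker would also consider response time thresholds and anticipated peak demand, as the text notes; the single utilization threshold is used here only to keep the sketch short.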

Resource Pooling

Consolidation and pooling of resources is required for grids to achieve better utilization of resources, a key contributor to lower costs. By pooling individual disks into storage arrays and individual servers into blade farms, the grid runtime processes that dynamically couple service consumers to service providers have more flexibility to optimize the associations. Resource sharing also happens purely in software. Web services provide the model for applications to expose re-usable functionality for discovery and invocation by unrelated applications.

Self-Adaptive Software

With labor being the most significant portion of IT costs, savings due to better hardware utilization or more responsive systems become irrelevant if the everyday tasks of administrators are not automated and simplified. A grid infrastructure would be unworkable if every node required constant manual tuning and intervention. A critical grid infrastructure requirement is systems that automate the bulk of maintenance and tuning tasks traditionally performed by IT staff. More of the tasks that used to be performed by administrators must now be handled by the systems themselves.

Unified Management

Even with self-managing systems, human beings will always be involved in managing an enterprise grid, but the management tasks required of humans should be simplified with a single tool that can provision, monitor, and administer every element in the grid. Such a tool should evaluate availability and performance from the perspective of the user, such that any bottleneck in the system or any unavailable component raises alerts. Most importantly, with a grid infrastructure, IT professionals must be able to treat groups of systems as a single logical entity so that tasks can be performed once and executed on multiple machines.
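The idea of treating a group of systems as a single logical entity can be sketched as follows. The `Node` and `NodeGroup` classes and the task name are hypothetical and stand in for whatever a real management tool provides; the sketch only shows a task being issued once, executed on every node, and an alert being raised for any unavailable component.

```python
class Node:
    """A managed machine; 'up' models whether it is reachable (assumption)."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.log = []  # tasks that ran on this node

    def run(self, task):
        if not self.up:
            raise ConnectionError(f"{self.name} is unavailable")
        self.log.append(task)
        return f"{self.name}: {task} ok"


class NodeGroup:
    """Presents many nodes as one logical entity."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def run(self, task):
        """Run the task on every node; collect results and alerts."""
        results, alerts = [], []
        for node in self.nodes:
            try:
                results.append(node.run(task))
            except ConnectionError as exc:
                alerts.append(str(exc))  # unavailable component raises an alert
        return results, alerts
```

For example, `NodeGroup([Node("n1"), Node("n2", up=False)]).run("apply-patch")` performs the task on `n1` and reports an alert for `n2`, so the administrator issues the task once rather than per machine.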

Implement One from Many. Together, the attributes of virtualization, dynamic provisioning, and resource pooling form the requirements for software that implements a single logical entity using many services running on multiple servers and crossing multiple disks, an entity which delivers high quality of service from low-cost components.

Manage Many as One. Together, the attributes of self-adaptive software and a unified management model form the requirements for dramatically lowering management costs by viewing the entire enterprise grid as one simple whole.

2.3 Grid Computing Standard

The Global Grid Forum (GGF) has the purpose of defining specifications for grid computing. The Globus Alliance implements these standards through the Globus Toolkit, which has become the de facto standard for grid middleware. As a middleware component, it provides a standard platform for services to build upon, but grid computing needs other components as well, and many other tools operate to support a successful Grid environment. This situation resembles that of TCP/IP: the usefulness of the Internet emerged both from the success of TCP/IP and the establishment of applications such as newsgroups and web pages.

Globus has implementations of the GGF-defined protocols to provide:

1. Resource management: Grid Resource Allocation & Management Protocol (GRAM)
2. Information Services: Monitoring and Discovery Service (MDS)
3. Security Services: Grid Security Infrastructure (GSI)
4. Data Movement and Management: Global Access to Secondary Storage (GASS) and GridFTP

The emergence of Web Services and their increased popularity within industry has prompted a refocusing of the Globus Toolkit strategy to incorporate Web Services. More details about Web Services are available in Appendix A. The Open Grid Services Architecture (OGSA) represents an evolution towards a Grid system architecture based on Web Services concepts and technologies. Open Grid Services Infrastructure (OGSI) is a formal specification of the concepts described by the OGSA. OGSA is a distributed interaction and computing architecture that is based around the Grid service, assuring interoperability on heterogeneous systems so that different types of systems can communicate and share information (Figure 2.2). It leverages emerging Web services technology to define its interfaces in the Web Services Description Language (WSDL). Specifically, the Grid service interface is described by WSDL, which defines how to use the service. OGSI 1.0 specifies a set of service primitives that define a nucleus of behavior common to all grid services. The Web Services Resource Framework (WSRF) is an evolution of OGSI 1.0. Its goal is to evolve the grid architecture in a way that is more clearly aligned with the general evolution of Web services. Instead of defining a new type of grid service, these specifications will allow the services specified in the OGSA to be based completely on standard Web services.

The Globus Toolkit (GT) standard, as implemented in GT1 and GT2, was originally formulated without reference to the industry standard of Web Services. However, the most recent release (GT3) has moved fairly aggressively towards Web Services as the underlying architecture. Although GT3 incorporates many recent standards, such as XML and Web Services, it has a complicated structure that contains much legacy GT2 code. The next version, GT4, is slated to be released in 2005.

Figure 2.2: The evolution of grid technologies and standards [8] (timeline from 1990 to 2010: custom solutions; Globus Toolkit as a de facto standard with a single implementation; Open Grid Services Architecture with real standards and multiple implementations)

2.3.1 Globus Toolkit Standard

The Globus Toolkit (GT) [29] is the open source software base for building Grid infrastructure and applications. It is a software system that addresses key technical problems in the development of Grid-enabled tools, services, and applications, and it offers a modular set of orthogonal services and middleware for building solutions.

The evolution of the Globus Toolkit started with GT1.0, released in 1998, followed by GT2.0 in 2001 and the current production version, GT3.0, in June 2003. The discussion here is limited to GT3.0, as the new version, GT4, has not yet been released.

The GT3 core architecture includes four core components that are represented by a white background in Figure 2.3 [10]. The core components together make up the building blocks of the Grid Services. OGSI Specification and Security Infrastructure implementations do not provide any run time services but serve purely as a base for other services.

Figure 2.3: GT3 Core Architecture (components shown: Grid Service Container, System-Level Services, OGSI Spec Implementation, Security Infrastructure, Hosting Environment)

OGSI Specification Implementation provides implementations for all OGSI specified interfaces, as well as APIs and tools to make it easier to develop OGSI compliant services.

Security Infrastructure implementation provides SOAP as well as transport level message protection, end-to-end mutual authentication, and single sign-on service authorization; basically a rendering of the GSI implementation known from Globus Toolkit 2 in an OGSI environment. This includes using X.509 identity certificates for authentication and X.509 Proxy certificates to support delegation and single sign-on which was updated to conform to latest IETF/GGF draft.

System-Level Services are general-purpose services that facilitate the use of Grid Services in production environments and include the following services:
- Administration Service
- Logging Service
- Management Service

Grid Service Container includes the above three core components and also shields the application from environment specific run-time settings, such as what database is used to persist service data. The container also controls the lifecycle of services, and the dispatching of remote requests to service instances.

Hosting Environment currently offers support for four Java environments:
1. Embedded: utility to be used in clients or lightweight servers to expose Grid services
2. Standalone: lightweight server program (essentially the embedded hosting environment with an additional server mainline with startup options)
3. Servlet: container inside of a standard Java Servlet Engine
4. EJB: container inside of an EJB application server

Base Services include higher-level services like Program Execution, Data Management, and Information Services, which were discussed earlier.

User-Defined Services are higher-level services which are not included in the toolkit but are built on top of any subset of GT3 components including Base Services.

2.3.2 Grid Architecture and Internet

Figure 2.4 compares the Grid architecture with the Internet architecture. In this framework, each layer of the Grid provides an Application Programming Interface (API) and Software Development Kit (SDK) to help with application development. The separation of the layers makes it possible to implement many protocols in each layer without losing interoperability. The layered architecture also makes it easy to assemble resources on the grid for sharing and for building Grid applications through all the levels [3]. Descriptions of the layers are provided below.

Figure 2.4: Layered Grid architecture and its relationship to Internet Protocol architecture
Grid layers shown, top to bottom:
- Application
- Collective: "Coordinating multiple resources": ubiquitous infrastructure services, app-specific distributed services
- Resource: "Sharing single resources": negotiating access, controlling use
- Connectivity: "Talking to things": communication (Internet protocols) & security
- Fabric: "Controlling things locally": access to, & control of, resources

Fabric Layer deals with local resources and is resource-specific. The resource can be computing cycles, storage, network, code repositories, or catalogs like databases. Depending on the resource, additional resource-specific operations can also be provided. However, the minimum required operations include enquiry and resource management operations.

Connectivity Layer supports core communication and authentication. Since virtual organizations operate in an environment of untrusted and dynamic relationships, security is extremely important. The required functions are single sign-on, delegation, integration with various local security solutions, and user-based trust relationships. The related Globus protocol is the Grid Security Infrastructure (GSI) protocol, which is based on public-key cryptography.

Resource Layer defines a suite of protocols for service negotiation, initiation, monitoring, control, accounting, and payment. However, this layer still concerns only the local resource. It deals with two classes of protocols: information protocols and management protocols. Because management protocols handle negotiation and the initiation of sharing relationships, they are embedded with access policies. In the Globus Toolkit, the Grid Resource Information Protocol (GRIP), Grid Resource Registration Protocol, Grid Resource Access and Management, and GridFTP are defined in this layer.

Collective Layer is used to coordinate multiple distributed resources and capture interactions across collections of resources. It is not associated with any individual resource. Due to the variety of Grid applications, many protocols can be defined in this layer, for example directory services, collaboration frameworks, and software discovery. The Collective Layer can be general or domain specific. In the Globus Toolkit, many such protocols are defined.

The Application Layer consists of Grid applications, which can be developed on services defined at any layer. The applications within this layer may in practice call upon sophisticated frameworks and libraries like the Common Component Architecture, SciRun, CORBA, Cactus and workflow systems [9].

2.3.3 Grid and Web Services Standard Convergence (WSRF)

Since the release of Globus Toolkit 3.0 in July 2003, the GGF and the Globus Alliance have been working closely to define enhancements to the standards. In January 2004, they presented the WS-Resource Framework (WSRF), an open framework for modeling and accessing stateful resources using Web services. WSRF reflects how Web service standards are evolving to meet grid service requirements; this convergence of core technology standards would provide a common base for business and technology services (Figure 2.5). The framework consists of several separate specifications, each one focusing on a specific area.

Figure 2.5: Convergence of Grid and Web Service Standards [11]

The WS-Resource Framework (WSRF) is a set of six Web services specifications that define what is termed the WS-Resource approach to modeling and managing state in a Web services context. It can be viewed as a straightforward refactoring of the concepts and interfaces developed in the OGSI version 1.0 specification, in a manner that exploits recent developments in Web services architecture (e.g., WS-Addressing) to express these concepts and interfaces in a manner that is fully aligned with current Web services directions. WSRF retains essentially all of OGSI concepts, and introduces only modest changes to OGSI messages and their associated semantics. GT4.0 is expected to be WSRF enabled.
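The WS-Resource approach can be illustrated with a minimal sketch: the service itself holds no per-client state, and each stateful resource is addressed by a reference that pairs the service address with a resource key. This is not the WSRF API or its message format; the class, method names, and URL below are hypothetical, and a plain dictionary stands in for a WS-Addressing endpoint reference.

```python
import uuid


class CounterService:
    """Stateless front end; all state lives in resources looked up by key."""

    SERVICE_ADDRESS = "http://example.org/services/counter"  # placeholder URL

    def __init__(self):
        self._resources = {}  # resource key -> resource properties

    def create_resource(self):
        key = str(uuid.uuid4())
        self._resources[key] = {"Value": 0}
        # Stand-in for an endpoint reference: service address plus resource key.
        return {"address": self.SERVICE_ADDRESS, "resource_key": key}

    def _lookup(self, reference):
        return self._resources[reference["resource_key"]]

    def add(self, reference, amount):
        props = self._lookup(reference)
        props["Value"] += amount
        return props["Value"]

    def get_resource_property(self, reference, name):
        return self._lookup(reference)[name]

    def destroy(self, reference):
        # Explicit end of the resource's lifetime.
        del self._resources[reference["resource_key"]]


service = CounterService()
ref_a = service.create_resource()
ref_b = service.create_resource()
service.add(ref_a, 5)
service.add(ref_a, 2)
```

Each call carries the reference, so two clients holding different references see independent state even though they talk to the same service.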

2.4 Grid Computing Categories

Grid Computing applications vary with needs and utilization requirements. There are three main categories currently being assessed and implemented in both the research and business communities: Compute, Data and Optimization Grids. The details of these grid categories are presented in the following sections.

2.4.1 Compute Grid

A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities [5].

There are many different implementations of computing Grids. For example, the Condor system, a computing Grid scheduling system developed by the University of Wisconsin in Madison, has three basic components: a central manager, execution hosts, and submission hosts. The central manager collects the status of all computing servers in its cluster and matches each computing resource request with a server that can meet its requirements. Execution hosts are servers executing the assigned jobs. Submission hosts are servers that request computing resources. Every server in the cluster can be configured as an execution host and a submission host at the same time. There is always exactly one central manager in a cluster, although it can co-exist with submission and execution hosts on the same machine. These servers can be dedicated to the computing Grid or can be otherwise idle machines.
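The matching role of the central manager can be sketched as follows. This is a simplified illustration, not Condor's ClassAd language or API: the class, the host names, and the single memory requirement are all assumptions made for the example.

```python
class CentralManager:
    """Collects status from execution hosts and matches jobs to them."""

    def __init__(self):
        self.machines = {}  # host name -> last reported status

    def advertise(self, host, memory_mb, idle=True):
        """An execution host reports its current status."""
        self.machines[host] = {"memory_mb": memory_mb, "idle": idle}

    def match(self, job):
        """Return the first idle host that meets the job's requirement."""
        for host, status in self.machines.items():
            if status["idle"] and status["memory_mb"] >= job["min_memory_mb"]:
                status["idle"] = False  # the host is now running the job
                return host
        return None  # the job waits until a suitable host reports in


manager = CentralManager()
manager.advertise("exec1", memory_mb=512)
manager.advertise("exec2", memory_mb=2048)
manager.advertise("desk7", memory_mb=4096, idle=False)  # desktop currently in use

job = {"owner": "submit1", "min_memory_mb": 1024}
assigned = manager.match(job)  # exec1 is too small, so exec2 is chosen
waiting = manager.match(job)   # no suitable idle host remains
```

When the desktop machine becomes idle again and re-advertises itself, the waiting job can be matched on the next attempt.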

BASF, one of the world's largest chemical companies, uses Platform Computing Inc.'s grid products to power a project that examines the effect of catalysts on accelerating chemical reactions. With traditional computing approaches, the analysis required very complex, compute-intensive simulations that could take days to complete, and the process also required a large amount of intervention by the researchers, reducing their productivity. BASF wanted a solution that would run the simulations faster, improve researcher productivity, and use the existing IT infrastructure. By deploying Platform's LSF grid offering, BASF was able to increase processor utilization by 90%, automate manual scheduling and processing tasks, and reclaim 20% of the researchers' work [17].

2.4.2 Data Grid

Data grids are grids that provide computing resources that allow for in-depth analysis of shared large-scale (and often diverse) databases. Data Grid has overlapping goals with heterogeneous distributed database systems, which deal with different kinds of database management systems distributed in a heterogeneous environment of different hardware, operating systems, network connections, data models, and even DBMS vendors. Both aim to resolve distributed data management tasks, including integrated data catalogs, data discovery, distributed queries, distributed transactions, and semantic integration. However, they differ in two respects. First, most distributed database management systems focus on the database setting itself, while Data Grid targets an even more complicated, dynamic environment across administrative domains. Second, most distributed database systems require a central information server, which is not possible in the dynamic Data Grid environment. Therefore, notification and event mechanisms are very important in Data Grid.

Pacific Life Insurance, the fifteenth largest life insurance company in the nation, has deployed Entropia's DCGrid to accelerate financial modeling and simulation, using their existing desktop PC infrastructure. DCGrid's ability to run these simulations more quickly translated into a key competitive advantage for Pacific Life. DCGrid was able to cost-effectively exploit the excess power of existing PCs, allowing Pacific Life to avoid the purchase of additional hardware and related training [17].

2.4.3 Optimization Grid

Enterprise optimization grids help enterprises increase asset utilization through optimized grid design. These grids focus on providing increased computing resources and better storage systems utilization for enterprises that are trying to better leverage their investments in computer systems and storage. This includes pooling of resources together for better economy of scale and also scaling computing and storage resources to meet spikes in computing demand.

Hewitt Associates, a large human resource consulting and outsourcing firm, operates one of the best known examples of an enterprise optimization grid. The company reconfigured its existing information systems environment to offload complex, resource-intensive pension calculation applications from its mainframe systems onto less expensive grid-based blade servers. By so doing, Hewitt Associates has been able to reduce costs associated with processing pension calculations on the more expensive mainframe, while improving the performance (by more than 90%) of the pension application [17].

2.5 Applications of Grid Computing

Grid Computing in the commercial market has been led by several major vendors in both the software and 'hardware' aspect of grid computing. Currently there are already numerous grid computing based projects in existence in academia and government initiatives. More recently the major IT vendors have declared key Grid Computing based corporate strategic goals. IBM, SUN, Oracle, Microsoft and HP were chosen, in this thesis, as commercial vendors for an analysis of their Grid Computing offerings. Gridbus project is also analyzed to provide the Open Source and research community perspective on Grid Computing initiatives. GridGarden.NET, a Microsoft .NET based Grid Framework developed at MIT, is also presented.

2.5.1 IBM On-Demand

IBM On-Demand is the company's marketing term that denotes a company whose business processes, integrated end-to-end across the company and with key partners, suppliers and customers, can respond with flexibility and speed to any customer demand, opportunity, or external threat. On-demand businesses are responsive, variable, focused and resilient. IBM has established key business relationships with leading middleware independent software vendors (ISV), like Avaki, Platform Computing, Data Synapse and , to provide customers with the most robust grid solutions in the industry. IBM is also a strong supporter of open standards for Grid architectures and has also developed an extension of GT3 called the IBM Grid Toolbox V3 [24].

The IBM Grid Toolbox V3 for Multiplatforms implements the OGSI standard and provides the tools to build a grid and to develop, deploy, and manage grid services. The IBM Grid Toolbox consists of the following:

- A hosting environment that is capable of running grid services and sharing them with other grid participants, such as grid service providers and grid service consumers
- A set of tools to manage and administer grid services and the grid hosting environment, including a Web-based interface, the Grid Services Manager
- A set of APIs and development tools to create and deploy new grid services and grid applications

The IBM Grid Toolbox V3 for Multiplatforms includes core grid services and base grid services as shown in Figure 2.6. Core grid services are always available in any instance of the IBM Grid Toolbox. Core grid services cannot be deployed or un-deployed separately from the installation of the IBM Grid Toolbox. Base grid services can be deployed with the installation of the IBM Grid Toolbox, but they can also be deployed or un-deployed separately.

Figure 2.6: IBM Grid Toolbox Base Services (base grid services layered on the OGSI core services and IBM WebSphere Application Server - Express)

The IBM Grid Toolbox includes a list of base grid services that can be deployed with the installation or deployed or un-deployed separately. The following base grid services are included:

Program Management Services. Program Management Services manages jobs located on local or remote instances of the IBM Grid Toolbox. Program Management Services simplifies the use of remote systems by providing a standard interface for requesting and using remote system resources for the submission and control of jobs on a grid. This implementation is typically used to support distributed computing applications.

Information Services. Information Services provides information about grid resources for use in resource discovery, selection, and optimization. Information Services also maintains knowledge about resource availability, capacity, and current utilization. This information is critical to the operation of the grid and development of applications. Within any grid, resources will fluctuate, depending on their availability to process and share data. As resources become free within the grid, they can update their status within the grid information services. The client, broker, and grid resource manager use this information to make informed decisions on resource assignments.

Data Management Services. Data Management Services provides data transfer capabilities throughout the grid. Without this grid service, data cannot move between grid nodes. Grid applications use this grid service to move data from one node to another.

Common Management Models (CMM) Services. The CMM implementation in the IBM Grid Toolbox provides the infrastructure required to represent an instrumented resource as a grid service, so that it can be queried and managed in a grid context.

Policy Services. The Policy Services in the IBM Grid Toolbox enable administrators to define a set of business goals and to enforce a set of rules that allow their grid to meet those goals. In the IBM Grid Toolbox, a policy identifies the desired outcome for the interactions between different elements in the grid environment. The core set of services to define, manage, and apply policies on a grid includes the Policy Service Manager (PSM), Policy Service Agent (PSA) and Policy Repository.

Service Group Services. The Service Group Services implementation in the IBM Grid Toolbox includes extensions to the service group services in the core Globus Toolkit, and is compliant with the Service Group definition in the OGSI specification. Service group provides a framework that allows grid applications to categorize (group) grid services and later execute queries on the group of services to find specific types of grid services. Grid application writers might use the service group framework to implement a registry service for their grid.
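The registry use of a service group can be sketched in a few lines. This is only an illustration of the grouping and querying idea; the ServiceGroup class, its methods, and the example handles are hypothetical and are not the OGSI ServiceGroup interfaces or the IBM Grid Toolbox API.

```python
class ServiceGroup:
    """Groups service handles together with descriptive content for later queries."""

    def __init__(self):
        self._entries = []

    def add(self, handle, **content):
        """Register a service handle with arbitrary descriptive attributes."""
        self._entries.append({"handle": handle, "content": content})

    def remove(self, handle):
        self._entries = [e for e in self._entries if e["handle"] != handle]

    def query(self, **criteria):
        """Return handles whose content matches every given attribute."""
        return [
            e["handle"]
            for e in self._entries
            if all(e["content"].get(k) == v for k, v in criteria.items())
        ]


registry = ServiceGroup()
registry.add("http://example.org/grid/jobs-1", kind="program-management", site="boston")
registry.add("http://example.org/grid/data-1", kind="data-management", site="boston")
registry.add("http://example.org/grid/jobs-2", kind="program-management", site="austin")

job_services = registry.query(kind="program-management")
austin_jobs = registry.query(kind="program-management", site="austin")
```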

Figure 2.7: IBM Grid Toolbox Hosting Environment (IBM WebSphere, providing the HTTP server and SOAP engine, hosting the IBM Grid Toolbox on the OGSI core from GT3)

The IBM Grid Toolbox is built with an embedded version of the IBM WebSphere Application Server - Express V5.0.2. Although it functions nearly the same as a full-product version of WebSphere Application Server, it is used exclusively for the IBM Grid Toolbox. A hosting environment can be seen essentially as a grid container running inside of a Java engine (Web container) or an EJB application server (WebSphere Application Server) as shown in Figure 2.7 [24].

2.5.2 Sun N1 Grid

Sun N1 Grid is Sun's architecture for the next-generation data center. The architecture is designed to make the entire data center behave as a single, unified system. N1 is designed to reduce management complexity and cost, increase data center resource utilization, improve infrastructure responsiveness and agility, and ensure investment protection. Sun's new Grid offerings come in four categories: access, data, computation, and visualization [13].

Sun's "access software" enables efficient usage of resources regardless of location, and is provided through a new Grid Portal solution that relies on the Sun Grid Engine Enterprise Edition (SGEEE) (currently called Sun N1 Grid Engine) software and the industry-standard Globus toolkit.

Sun's Data Grid solutions rely on the Sun StorEdge Open SAN Architecture, the Sun StorEdge 3510 FC array, and Sun StorEdge SAM-FS and QFS software. Sun said its Data Grid solutions allow data to be collected, managed and protected regardless of user or data location.

The Sun Fire Compute Grid family couples Sun Fire systems with a choice of interconnect technologies, which Sun says "provides excellent price-performance with clusters of small systems as well as excellent price/productivity with superclusters" that utilize large memory and simplified programming environments. Interconnect choices include Gigabit Ethernet switches, Myrinet, Infiniband, Quadrics, or the Sun Fire Link interconnect.

Visualization Grids let applications perform graphics operations using local or remote graphics systems. Sun's Visual Grid platform is based on the Sun Fire V880z, the XVR-4000 high-speed graphics subsystem, and specialized software based on the OpenGL industry standard.

Sun says its Grid Reference Architecture provides a framework for the deployment of these building blocks. Sun's Customer Ready Systems (CRS) program integrates the building blocks with complementary third-party hardware and software into "ready-to-deploy" solutions that are built in Sun factories based on a customer's specifications and supported by a global professional services practice focused on Grid deployments.

Figure 2.8: Sun N1 Grid Computing Strategy (Sun Grid Solution Stack: Web interface via Sun Grid Portal; Solaris Operating Environment and Linux; Sun Enterprise and Sun Fire servers, Sun StorEdge systems and HPC SAN, thin and bladed servers on SPARC and Intel)

Sun N1 Grid Engine software is a distributed management product that optimizes utilization of software and hardware resources. It can increase utilization of available resources to as much as 98 percent. Sun N1 Grid Engine software is both a job manager and a for clusters of computers. The Sun N1 Grid Engine Enterprise Edition software can harness computing power across multiple clusters (campus grids). Sun's N1 Grid technology is designed to allow multiple software applications to share a common pool of servers and storage resources. By opening up previously static relationships between hardware, applications, and the , N1 Grid technology enables flexible provisioning of resources so that excess capacity can be available to power other applications in the virtualized Network.

Sun Grid Engine software works by enabling companies to submit and manage jobs from just about any Linux or UNIX system on the network. It does this by monitoring the availability of workstations, then deploying jobs to the available resources. Additionally, the command line utility gives the company the flexibility to script and automate jobs as well as build a custom front end. The GUI provides a convenient management tool for administering the Sun ONE Grid Engine software.

Sun N1 Grid Engine software enabled hosts can be master hosts, execution hosts, submission hosts, and administration hosts. These roles are not mutually exclusive; it is possible for a host to perform all four functions. A typical cluster configuration is to have one master host, running the sge_qmaster (manager) and sge_schedd (scheduler) daemons, and the other hosts running sge_execd (execution) daemons. All Sun N1 Grid Engine software hosts communicate through TCP/IP; for this purpose, there is a special daemon, sge_commd, running on each host.

Computing resources are modeled by Sun N1 Grid Engine software as job execution queues. Each queue can have specific attributes and can support multiple parallel environments. The most frequently used parallel environments are Message Passing Interface (MPI) and Parallel Virtual Machine (PVM).

2.5.3 Oracle 10g

Oracle offers organizations a comprehensive solution to manage information and run enterprise applications on grids. Oracle Database 10g has been designed to manage information on computing grids called database grids. Oracle Application Server 10g (OracleAS 10g) has been designed to run enterprise applications on computing grids called application server grids. Both Oracle Database 10g and Oracle Application Server 10g can be very efficiently managed in a grid computing environment using Oracle Enterprise Manager 10g Grid Control. Figure 2.9 illustrates how Oracle 10g products and features map to grid computing requirements [21].

Figure 2.9: Oracle 10g Product Grid Mapping (products mapped to the "Implement One from Many" and "Manage Many as One" requirements)

Oracle Database 10g builds on the success of Oracle9i Database, and adds many new grid-specific capabilities. Other vendors implement certain portions of a grid infrastructure, for example pools of virtualized storage are becoming common, but no one else can provide a true grid database. Oracle Database 10g is based on Real Application Clusters, introduced in Oracle9i.

Oracle Application Server 10g, the next generation of Oracle's integrated suite of application infrastructure software, has been specifically designed to run enterprise applications on computing grids. It is designed to run enterprise applications on pools of low cost servers and storage with very high performance, scalability, and availability while radically reducing the costs of systems and applications monitoring and management. Further, customers can deploy all their existing Oracle9iAS applications on Application Server 10g without any changes and take advantage of the new grid features. OracleAS 10g is managed by Oracle Enterprise Manager 10g Grid Control, a web-based management console that enables administrators to manage many application servers as though they were one, thereby automating administrative tasks and reducing administrative costs. Grid Control also provides facilities to enable many administrators to work together to manage an application server grid. Finally, OracleAS 10g is also integrated with Oracle Database 10g in many different ways to optimize quality of service across a unified grid computing infrastructure for data management and enterprise applications.

Oracle Enterprise Manager 10g Grid Control is the complete, integrated, central management console and underlying framework that automates administrative tasks across sets of systems in a grid environment. Grid Control helps reduce administration costs through automation and policy-based standardization. With Oracle Grid Control, IT professionals can group multiple hardware nodes, databases, application servers, and other targets into single logical entities. By executing jobs, enforcing standard policies, monitoring performance and automating many other tasks across a group of targets instead of on many systems individually, Grid Control enables IT staff to scale with a growing grid. Because of this feature, the existence of many small computers in a grid infrastructure does not increase management complexity.
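The idea of grouping targets into a single logical entity, so that a job is issued once and executed on every member, can be sketched as follows. This is a generic illustration of the "manage many as one" pattern, not Oracle Grid Control's interface; the Target and TargetGroup classes and the job name are hypothetical.

```python
class Target:
    """A single managed system, such as one database or one application server."""

    def __init__(self, name):
        self.name = name
        self.log = []  # jobs executed on this target

    def run(self, job):
        self.log.append(job)
        return [f"{self.name}: {job}"]


class TargetGroup:
    """A logical entity whose members may be targets or other groups."""

    def __init__(self, name, members):
        self.name = name
        self.members = list(members)

    def run(self, job):
        # The job is issued once against the group and fanned out to every member.
        results = []
        for member in self.members:
            results.extend(member.run(job))
        return results


databases = TargetGroup("databases", [Target("db1"), Target("db2")])
app_servers = TargetGroup("app-servers", [Target("as1")])
grid = TargetGroup("grid", [databases, app_servers])

report = grid.run("apply-patch")
```

Because groups can contain groups, adding more small machines lengthens the report but does not change how the administrator issues the job.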

Oracle's product offerings are focused on Oracle-based solutions, not generalized offerings. This being said, a great deal of what organizations are seeking when they consider Grid Computing architectures can be supported quite well using Oracle's products.

2.5.4 Microsoft Grid

In Chapter 1, Microsoft was pointed out as a player in the Grid Computing space; in reality, Microsoft does not seem to have any specific grid products in this space. Microsoft currently offers High Performance Computing (HPC) cluster solutions through its Windows 2000/2003 Server operating systems. Microsoft has also announced a future Windows 2003 Server, HPC Edition [33].

According to Microsoft's Distinguished Engineer Jim Gray, web services and Grid Computing are synergistic. The data-centric nature of web services complements Grid Computing's computational focus. ".NET is application-centric and is designed to make it easy to build the Data Grid - the integration of data sources throughout the world to produce a data library that provides a consistent, unified corpus that also is easy to query," Gray says [41]. Microsoft's strategy seems to be based on these Data Grids and on implementing the Global XML Web Services Architecture (GXA) (Figure 2.10) [18].

A recent article points out that Microsoft is working on a skunk-works project related to Grid Computing, code-named Bigtop, which is designed to allow developers to create a set of loosely coupled, distributed operating-systems components in a relatively rapid way. Microsoft is looking at the consequences of loosely coupling a larger number of moderately powerful computers to achieve similar results as using several tightly coupled high performance systems together [42].

[Figure 2.10 shows three layers, from top to bottom:
Infrastructure Protocols: future messaging protocol, future eventing protocol, WS-Transaction
SOAP Modules: WS-Referral, WS-Security, WS-Attachments, WS-Routing
Baseline Web Service Specifications: SOAP, UDDI, WSDL]

Figure 2.10: Global Web Services Architecture

GXA is a layered architecture built upon baseline Web service specifications, such as Simple Object Access Protocol (SOAP), Universal Description, Discovery and Integration (UDDI), and Web Services Description Language (WSDL) (see Appendix B). The GXA protocols above the baseline can be divided into two layers. The first layer consists of SOAP modules, such as WS-Security and WS-Attachments. These protocols are primarily concerned with the content and structure of an individual message. For example, the Security specification describes how security information, such as a Kerberos ticket, is embedded in a SOAP message. The second (top) layer is composed of infrastructure protocols such as Transaction and reliable messaging. Infrastructure protocols describe how sequences of messages are put together to solve business problems. The Transaction specification, for example, describes the flow of SOAP messages sent between a set of Web services that need to work together to coordinate a series of database updates.
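To make the "SOAP module" idea concrete, the following is a minimal Python sketch of how security information can travel in the header of a single SOAP message. It is an illustration only, not taken from the GXA specifications: the SOAP 1.1 envelope namespace and the OASIS WS-Security namespace are used as assumptions about which versions apply, the `UpdateRequest` body element is hypothetical, and the token is a placeholder string rather than a real Kerberos ticket.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces: SOAP 1.1 envelope and the OASIS WS-Security 1.0 extension schema.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
WSSE_NS = ("http://docs.oasis-open.org/wss/2004/01/"
           "oasis-200401-wss-wssecurity-secext-1.0.xsd")


def build_envelope(token_b64, body_text):
    """Build a SOAP envelope whose header carries a binary security token."""
    ET.register_namespace("soap", SOAP_NS)
    ET.register_namespace("wsse", WSSE_NS)

    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")

    # SOAP-module layer: security data lives in the header of this one message.
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    security = ET.SubElement(header, f"{{{WSSE_NS}}}Security")
    token = ET.SubElement(security, f"{{{WSSE_NS}}}BinarySecurityToken")
    token.text = token_b64  # placeholder for an encoded ticket

    # The business payload goes in the body; "UpdateRequest" is hypothetical.
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, "UpdateRequest")
    request.text = body_text
    return envelope


envelope = build_envelope("BASE64-TICKET-PLACEHOLDER", "update record 42")
xml_text = ET.tostring(envelope, encoding="unicode")
print(xml_text)
```

The header and body are siblings inside the envelope, which is why a module such as WS-Security can be layered on without changing the application payload.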

A framework using Microsoft .NET platform called GridGarden.NET, developed at Intelligent Engineering Systems Laboratory (IESL) MIT, is presented later in this chapter. Alchemi, the Open Source .NET initiative, is described in Appendix C.

2.5.5 HP Adaptive Enterprise

HP has been a pioneer in grid computing since the late 1980s when Joel Birnbaum, then the director of HP Laboratories, envisioned the notion of "utility computing". Since then HP has been putting its energies toward grid computing inside and outside the lab. Company Chairman and Chief Executive Officer Carly Fiorina recently committed to use industry standards to grid-enable all of HP's products, as illustrated in Figure 2.11, and also to provide enterprises with the services needed to implement Grid Computing [30]. Products ranging from the smallest handhelds, printers and PCs to the most powerful storage arrays and supercomputers will be able to connect with and serve as resources on a grid. Already today HP provides customers with grid-enabled systems running HP-UX, Linux, and Tru64 UNIX.

Figure 2.11: The HP grid solution stack diagram

HP believes that open, vendor-neutral standards are critical to the wide adoption of grid technology and that the flexibility of an open standards-based approach is the only way to bring together heterogeneous resources into an effective grid computing environment. Grid computing enables HP's vision of the Adaptive Enterprise -- where information technology is a highly efficient, flexible service that is agile enough to change in line with a corporation's business and its business environment.

HP has partnered with Platform Computing, Avaki, United Devices and Altair Engineering to provide grid software solutions in line with the HP Adaptive Enterprise vision.

2.5.6 Gridbus - Open Source Grid Initiative

The Grid Computing and Distributed Systems (GRIDS) Laboratory at the University of Melbourne is actively engaged in the design and development of next-generation parallel and distributed computing systems and applications. The Lab's flagship "Open Source" project, called Gridbus, is developing technology that enables GRID computing and BUSiness (Figure 2.12) [31].

[Figure 2.12 shows the layered Gridbus architecture, from top to bottom:
Grid applications: science, commerce, engineering, collaboratories, portals
User-level middleware (Grid tools): MPI, ExcelGrid, Gridscape, workflow and parameter sweep languages; Grid brokers: Nimrod-G, Workflow Engine, Gridbus Data Broker
Core Grid middleware: Alchemi, NorduGrid, Grid Storage, Grid Bank, Grid Exchange and Federation, Grid Market Directory, Grid Economy
Grid fabric software: JVM, Condor, Libra, Tomcat and others, on Windows, Solaris, Linux, AIX, IRIX, OSF1 and Mac
Grid fabric hardware: the worldwide Grid]

Figure 2.12: Gridbus Middleware

The Gridbus Project is engaged in the design and development of grid middleware technologies to support eScience and eBusiness applications. These include visual Grid application development tools for rapid creation of distributed applications, a competitive economy-based Grid scheduler, a cooperative economy-based cluster scheduler, a Web services-based Grid market directory (GMD), Grid accounting services, Gridscape for creation of dynamic and interactive test bed portals, the G-monitor portal for web-based management of Grid application execution, and the widely used GridSim toolkit for performance evaluation. Recently, the Gridbus Project has developed Windows/.NET-based desktop clustering software and Grid job web services to support the integration of both Windows and Unix-class resources for Grid computing. A layered architecture for the realization of low-level and high-level Grid technologies is shown in Figure 2.12. Some of the Gridbus technologies have been developed by making use of Web Services technologies and services provided by low-level Grid middleware, particularly the Globus Toolkit and Alchemi.

2.5.7 GridGarden.NET - A Microsoft .NET based Grid Framework from MIT

The Intelligent Engineering Systems Laboratory (IESL) at MIT has developed a .NET implementation of the middleware necessary for Grid Computing [1]. The IESL team has focused on deploying computations over a farm of worker machines using the OGSA standards but implemented on the .NET CLR. Appendix B provides a summary of the Microsoft .NET architecture. The environment supports message passing between processes (more precisely, Application Domains) using a more programmer-friendly interface than the Message Passing Interface (MPI).

This work has been partially sponsored by the National Infrastructure Simulation and Analysis Center of Sandia National Laboratories.

This is a master-slave architecture in which the master computer is called a "Proxy" and the slave computer a "Worker". The Master is called a Proxy because it provides users with "proxy" objects for manipulating the Workers. For example, a user can, over the Internet, command one or more of the Worker threads to "Start/Resume", "Pause" or "Kill".
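The control pattern described here can be sketched in a few lines of Python. This is not GridGarden.NET code (which is written for the .NET CLR); it is an illustrative stand-in in which local threads play the role of Workers, and the class and method names (`Proxy`, `Worker`, `command`) are assumptions chosen for the example.

```python
import threading
import time


class Worker(threading.Thread):
    """A local thread standing in for a remote Worker process."""

    def __init__(self, name):
        super().__init__(name=name, daemon=True)
        self._running = threading.Event()  # set while the Worker may compute
        self._killed = threading.Event()   # set once the Worker must exit
        self.steps = 0                     # units of work completed so far

    def run(self):
        while not self._killed.is_set():
            # While paused, wake up periodically to notice a Kill command.
            if not self._running.wait(timeout=0.01):
                continue
            self.steps += 1                # placeholder for real computation
            time.sleep(0.001)

    def start_or_resume(self):
        self._running.set()
        if self.ident is None:             # thread has never been started
            self.start()

    def pause(self):
        self._running.clear()

    def kill(self):
        self._killed.set()


class Proxy:
    """Gives the user one object through which all Workers are controlled."""

    def __init__(self, worker_names):
        self.workers = {name: Worker(name) for name in worker_names}

    def command(self, action, names=None):
        actions = {"start": Worker.start_or_resume,
                   "pause": Worker.pause,
                   "kill": Worker.kill}
        for name in (names or list(self.workers)):
            actions[action](self.workers[name])


demo = Proxy(["worker-1", "worker-2"])
demo.command("start")                  # Start/Resume every Worker
time.sleep(0.05)
demo.command("pause", ["worker-1"])    # Pause a single Worker
demo.command("kill")                   # Kill them all
for w in demo.workers.values():
    w.join(timeout=2)
```

In a real deployment the `command` call would travel over a network channel rather than a local method call; the sketch only shows the shape of the Start/Resume, Pause and Kill interface.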

[Figure 2.13 shows the following components: NAD Subsystem, Worker Processes, Event Subsystem, GridToolset, GridLib, Channels]

Figure 2.13: Components of GridGarden.NET

The computing is done through a number of "worker processes". Workers, the computers that hold the worker processes, are distributed across the network and identified by URLs. In the current design, two communication channels are provided to communicate with the Proxy, namely, a .NET TCP Remoting channel and an HTTP Web Service channel. The TCP Remoting channel is used on the local area network (typically for the Workers) due to its high performance, while the HTTP channel is used for communication with Internet clients, since the SOAP-based Web Service is an open standard.

Programs running as worker processes are compiled for the Common Language Runtime (CLR), which is platform independent (though presently limited to Windows, Linux and BSD). A grid program is normally a CLR dynamic-link library (DLL) that contains the minimum functionality needed by the GridGarden.NET system to Start/Resume, Pause and Kill it. We call the DLL a "Seed". The GridGarden.NET framework stores the DLLs in the Seed Pool for later distribution to the Workers. A number of abstract seed classes are provided as templates for the programmer. Typically, the programmer inherits and extends the abstract classes to suit specific needs, such as particle simulation.
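The inherit-and-extend pattern for Seeds can be illustrated with a short Python sketch. The actual GridGarden.NET seed classes are .NET types whose signatures are not given in this text, so everything below (`Seed`, `ParticleSeed`, `seed_pool`, the `step` method) is a hypothetical analogue meant only to show how a template class can supply the Start/Resume, Pause and Kill behaviour while the programmer supplies the computation.

```python
from abc import ABC, abstractmethod


class Seed(ABC):
    """Hypothetical template: lifecycle control lives here, computation in subclasses."""

    def __init__(self):
        self.state = "created"

    def start(self):          # also used to resume after a pause
        self.state = "running"

    def pause(self):
        self.state = "paused"

    def kill(self):
        self.state = "killed"

    @abstractmethod
    def step(self):
        """Perform one unit of work; subclasses must implement this."""

    def run(self, max_steps):
        """Advance up to max_steps units of work while the Seed is running."""
        done = 0
        while self.state == "running" and done < max_steps:
            self.step()
            done += 1
        return done


class ParticleSeed(Seed):
    """Example extension: particles moving at constant velocity in one dimension."""

    def __init__(self, positions, velocities, dt):
        super().__init__()
        self.positions = list(positions)
        self.velocities = list(velocities)
        self.dt = dt

    def step(self):
        self.positions = [x + v * self.dt
                          for x, v in zip(self.positions, self.velocities)]


# A dictionary standing in for the Seed Pool: seeds are looked up by name.
seed_pool = {"particles": ParticleSeed}

seed = seed_pool["particles"]([0.0, 1.0], [1.0, -1.0], dt=0.5)
seed.start()
steps_done = seed.run(2)
print(steps_done, seed.positions)
```

Because the lifecycle methods live in the base class, the framework can control any Seed without knowing what it computes.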

One important idea to be introduced is the "Net Application Domain" (NAD). Similar to MPI's "Communication Group", the NAD facilitates identification and communication of worker processes for a Grid Application. One difference between the NAD and the MPI Communication Group is that the NAD is not just for communication. It also provides a shared memory block, similar to a global COMMON, for worker processes, and its data can be exported over the Internet. The shared memory block is called "Global". It is critical for Grid Computing because it allows results from all the Workers to be aggregated and fed on to other systems, such as a graphics server or a database server. Grid users can access information about a simulation or control the application by reading from or writing to the Global data. Data stored in Global can be arbitrary serializable objects and can represent any data type.
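As a rough analogue of the "Global" block, the Python sketch below keeps serialized objects in a lock-protected store that several workers write to and an aggregator reads from. It is a simplification under stated assumptions: the `GlobalStore` class, its `read`/`write` methods and the key naming are invented for illustration, threads stand in for worker processes, and `pickle` stands in for .NET serialization.

```python
import pickle
import threading


class GlobalStore:
    """Hypothetical shared block: values are stored in serialized form."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def write(self, key, value):
        blob = pickle.dumps(value)   # fails early if value is not serializable
        with self._lock:
            self._data[key] = blob

    def read(self, key):
        with self._lock:
            blob = self._data[key]
        return pickle.loads(blob)

    def keys(self):
        with self._lock:
            return sorted(self._data)


def worker(store, worker_id, chunk):
    """Each worker publishes its partial result under its own key."""
    store.write(f"partial/{worker_id}", {"worker": worker_id, "sum": sum(chunk)})


store = GlobalStore()
data = list(range(100))
chunks = [data[i::4] for i in range(4)]          # split the data four ways

threads = [threading.Thread(target=worker, args=(store, i, chunk))
           for i, chunk in enumerate(chunks)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Aggregation step: any client with access to the store can combine the results.
total = sum(store.read(key)["sum"]
            for key in store.keys() if key.startswith("partial/"))
print(total)
```

Storing serialized bytes rather than live objects mirrors the requirement that Global data be serializable, which is what makes it exportable to other systems.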

Controlling the Grid Application is currently achieved through a Windows GUI that allows the Grid Application administrator to Run, Pause, Kill and so on, which forms a mini distributed operating environment that is quite similar to UNIX. One difference is that GridGarden.NET commands manage the distributed application through Web Services.

The coding of the framework is mostly done in C#. However, C++ and VB for .NET can also be used since they are supported and can be compiled into CLR code. There are some other language bindings for the CLR that are mostly Open Source, such as Perl, Python and even PHP. These are also potential languages for GridGarden.NET developers but at present have not been tested. High performance programming libraries, such as LAPACK, can be easily embedded into GridGarden.NET through interoperation with native code. The current system contains a matrix manipulation class "Matrix" in GridLib as a wrapper for a native code function written in C. This Matrix class is a good illustration of how to write a wrapper for legacy libraries.

CHAPTER 3

CURRENT STATE OF AFFAIRS IN GRID COMPUTING AND IMPLEMENTATIONS

3.1 Grid Computing Today - Current State of Affairs

The buzz around Grid Computing has picked up considerable momentum in recent years within the commercial realm where huge amounts of information are available but not properly understood. Grid Computing should not be regarded as the silver bullet but rather a tool that will help businesses and entities to leverage their current computing resources and be able to access others' when needed.

This momentum is validated by the incorporation of Grid Computing technologies into the corporate strategies of several major IT vendors. Although the trend shows the enormous Grid Computing market opportunity [6], there are still some barriers that need to be overcome. According to a market study report by Platform Computing [19], there is an overwhelming consensus that non-technical barriers, including organizational politics, play an important role in the implementation of Grid Computing. Further analysis revealed four top organizational issues, as illustrated in Figure 3.1:

- Loss of control or access to resources
- Risks associated with enterprise-wide deployment
- Loss or reduction of budget dollars
- Reduced priority of projects

The key issues mentioned above were also highlighted by Sharon Green, Director of Utility Computing at EDS, and Ed Reynolds, EDS Fellow [25]. In order to overcome these barriers, corporate-wide education about the benefits and realities of Grid Computing implementation is important. The study also found that corporate culture has evolved such that employees perceive that they own the corporate assets they use for their jobs, which makes it difficult for them to share their resources (loss of control) in exchange for access to a larger pool of resources. Depending on the type of solution a corporation chooses, the resources can remain under the total control of the organization.

LOSS OF CONTROL/ACCESS TO RESOURCES: 44.4%
RISKS ASSOCIATED WITH ENTERPRISE-WIDE DEPLOYMENT: 40.0%
LOSS/REDUCTION OF BUDGET DOLLARS: 33.3%
REDUCED PRIORITY OF PROJECTS: 29.6%
LACK OF DATA SECURITY AMONG DEPARTMENTS: 25.9%
FEAR OF EXTERNAL DATA LEAKS: 18.6%

Figure 3.1: Organizational issues ranked "Extremely High" and "High" in severity [19]

The risks of enterprise-wide deployment can be mitigated by initially grid-enabling one or two applications and then scaling up across the enterprise. The fear of budget loss, on the other hand, is offset by the savings from not purchasing additional expensive hardware and by the increased computing capability that allows tasks to complete faster. Once the shared infrastructure is established, one would be able to move processing around and migrate resources based on priorities. These priorities need to be decided at an organizational level and implemented top-down.

Security and standards are also key factors relating to the adoption of Grid Computing by various organizations. The broad range of standards initiatives currently focused on grid computing is a reliable indicator of the acceptance, and ultimately widespread usage, of this technology [22]. Several grid standards groups like the Global Grid Forum, Globus Alliance, and OASIS have formed a number of working groups to develop standards addressing data transport, interoperability, security and integration with existing technologies like web services.

The development of grid standards will also give customers the flexibility to take advantage of open-source and commodity products where appropriate while avoiding being locked into a proprietary, single-vendor solution.

3.2 Industry Perspective - Hierarchy for Types of Grid Computing

In the previous chapter, several key players were identified and their 'Grid Computing' implementations or initiatives were presented. In this section the hierarchy for types of Grid Computing is introduced and industry perspectives are highlighted. This is followed by several real world case study implementations of the major vendors' 'Grid Computing' solutions in the next section.

There are at least three levels of hierarchy for types of Grid Computing that have been identified [28]. Each level implies a different mode of communication and orchestration of resources among different entities. The levels are explained below:

Level 1 Grid Type:

A Level 1 Grid applies to highly parallelizable, i.e. non-coupled, problems such as Monte Carlo simulations, where the same code runs on every machine but operates on different data. In general a user distributes code to be executed across a network of machines and coordinates the messaging between the machines so that some computational goal is achieved. The metaphor of Master and Worker machines can be used here for illustration. The coordination of the machines is achieved by using the Master machine. Here it is assumed that the Master is a "special" trusted machine that has coordination and other responsibilities not shared by the Workers [1].

The Master controls the workload on each Worker, handles message routing and coordination, and either queries remote machines or distributes agents to them and gathers the results. At this level there is no Worker-to-Worker communication or messaging. This is illustrated in Figure 3.2.

[Figure 3.2 shows a Master connected to its Workers]

Figure 3.2: Master-Worker Level 1 Grid Scenario
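A Level 1 computation of this kind can be sketched with a Monte Carlo estimate of pi. The Python example below is illustrative only: a thread pool stands in for the network of Worker machines, the function names are invented, and each Worker receives only a seed and a sample count from the Master and never communicates with another Worker.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(seed, samples):
    """Worker task: same code everywhere, different random data per seed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:      # point falls inside the quarter circle
            hits += 1
    return hits


def estimate_pi(num_workers=4, samples_per_worker=50_000):
    """Master: hands out work, gathers results, combines them."""
    seeds = range(num_workers)
    counts = [samples_per_worker] * num_workers
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        total_hits = sum(pool.map(count_hits, seeds, counts))
    return 4.0 * total_hits / (num_workers * samples_per_worker)


print(estimate_pi())
```

Because the Workers are independent, adding more of them increases the total sample count without any change to the Worker code, which is what makes such problems a natural fit for Level 1 grids.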

Level 2 Grid Type:

In a Level 2 type Grid, Workers may execute different code and act on different data, and the Master may or may not coordinate the Workers. The Master distributes the code to different Workers and starts the execution of the tasks on the Workers. The Workers in this scenario can communicate with each other, possibly via the Master. This is illustrated in Figure 3.3. Issues such as large message sizes, buffers and timing need to be considered due to the communication between the Workers and the Master.

[Figure 3.3 shows Workers and a Master]

Figure 3.3: Level 2 Grid Type
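The Level 2 variant in which Workers communicate via the Master can be sketched as follows. This Python example is an assumption-laden illustration, not a description of any vendor product: threads stand in for Workers, per-Worker queues stand in for message buffers, and the names `Master`, `producer` and `squarer` are invented. The two Workers run different code and exchange data only through the Master's routing.

```python
import queue
import threading


class Master:
    """Launches Workers running different code and relays their messages."""

    def __init__(self):
        self.mailboxes = {}   # one message buffer per Worker
        self.threads = []

    def launch(self, name, task):
        self.mailboxes[name] = queue.Queue()
        self.threads.append(
            threading.Thread(target=task, args=(name, self), name=name))

    def send(self, sender, recipient, payload):
        self.mailboxes[recipient].put((sender, payload))

    def receive(self, name, timeout=5.0):
        return self.mailboxes[name].get(timeout=timeout)

    def run(self):
        for t in self.threads:
            t.start()
        for t in self.threads:
            t.join()


results = {}


def producer(name, master):
    """First Worker: generates values and sends them on via the Master."""
    for value in [1, 2, 3]:
        master.send(name, "squarer", value)
    master.send(name, "squarer", None)        # end-of-stream marker


def squarer(name, master):
    """Second Worker: different code, consumes what the first one produced."""
    total = 0
    while True:
        _sender, payload = master.receive(name)
        if payload is None:
            break
        total += payload * payload
    results["sum_of_squares"] = total


master = Master()
master.launch("producer", producer)
master.launch("squarer", squarer)
master.run()
print(results["sum_of_squares"])
```

The buffer and timing concerns mentioned above show up directly in such a design: an unbounded queue can grow without limit if a producer outpaces its consumer, and the receive timeout decides how long a Worker waits for a peer.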


Level 3 Grid Type:

Level 3 Grid Type represents true Internet grids with cross domain computing. This involves multiple grids combining to form super grids where the grid nodes are web addressable (Figure 3.4). This involves highly complex workflow and coordination where communication not only exists directly between Workers but also between Masters within the hierarchy of multiple grids.

Figure 3.4: Level 3 Grid Type

The primary concerns in this Level 3 Grid Type include security across multiple domains and the complexity of communication, coordination and scaling, among others.

The Industry View

There are current Grid Computing implementations which already fall within Level 1 and Level 2. Most Level 2 implementations use the Message Passing Interface (MPI), which is a de facto standard for communication among the nodes running a parallel program on a distributed memory system. As mentioned earlier, security is of primary concern in Level 3 Grid Type implementations, and using multiple levels of security abstraction also raises concerns about security holes for corporations, given the sensitivity of their data. It is also known in the industry that building Grid applications requires a paradigm shift among programmers, who must focus on developing and reconfiguring applications to meet the requirements of grid computing, similar to the way this was done for mainframes in the past [25]. Currently, unique solutions are being implemented by major vendors such as IBM, SUN and Oracle, together with their partners, for their corporate clients (see next section).

The challenge of implementing Grid Computing is further complicated by the variety of operating systems that run in corporations, and if open source is not adopted wholeheartedly within the corporation then the Globus standards might be difficult to implement [25]. Web Services could play a crucial role in connecting these disparate resources if the standards are agreed upon. One consistent set of standards is being developed by Microsoft under the title of Global Web Services Architecture (GXA). However, OGSA is proposing others, such as WS-Resources, that are not compatible with GXA, and therefore there is some confusion as to which standards will prevail [1].

3.3 IBM Case Study: Hewitt Associates

Hewitt, an IBM supplier and customer, is a global outsourcing and consulting firm that offers a complete range of human resource services. High-volume printing, quick reaction time to changing customer needs and around-the-clock availability of the print process are important to Hewitt, which offers printed documents to customers as a primary means of communication [24].

Hewitt needed a highly available print composition system capable of satisfying customer demands while integrating transparently and cost effectively with its existing system. The challenge was to partner with Hewitt Associates to build an enterprise print composition solution that was cost-effective, resilient and scalable. Sefas Innovation's Open Print Backstage module, running on IBM BladeCenter systems and Red Hat Linux, was selected as the solution component. To establish that Sefas' Open Print Backstage module could be grid enabled to satisfy Hewitt's demanding print composition requirements, a proof of concept (POC) exercise was staged at the IBM Innovation Center in Waltham, Massachusetts.

Hewitt had successfully implemented a grid for another application and thought the Open Print Backstage module was a good candidate for the next Hewitt grid implementation.

Backstage - part of the Sefas Open Print suite that offers complete document management software infrastructure for enterprise publishing - is the high-performance document production engine that drives the high-volume printing of transactional documents. Backstage is optimized for volume batch printing throughput and features an easy-to-use drag-and-drop interface that allows any application to be put into production from any workstation.

Proof of Concept Results

The IBM grid team, in concert with teams from Sefas, Hewitt and DataSynapse, conducted the POC grid enablement exercise. The test environment featured multiple IBM BladeCenter and IBM xSeries systems running Red Hat Linux, as well as z/OS. DataSynapse GridServer provided the grid infrastructure software layer. DataSynapse is also an IBM Business Partner.

The end-to-end integration testing showed conclusively that Hewitt jobs submitted to the Sefas grid were processed successfully, with all business and technical objectives achieved. Most importantly, the test allowed Sefas to provide added value to Hewitt, its customer. "The proof of concept grid enablement exercise was important to us for three reasons," said Jean-Philippe Sarraut, CEO of Sefas Innovation. "First, we've had a long and successful relationship with IBM that we want to continue. Second, most of our customers are using IBM infrastructure solutions, so ensuring our solutions work in an IBM environment is a must. Lastly, we've been exploring ways of splitting and spreading CPU-consuming tasks to low-cost commodity hardware platforms. IBM, Sefas and Hewitt are at the forefront of support for new technologies such as grid computing, Blades, open standards and Linux, so the timing of the grid enablement evaluation couldn't have been better from our perspective."

Key Business Benefits:

- Reduced the time required to set up a new client from 12 months with their previous system to 3 months
- The new architecture allowed Hewitt to absorb production sample reviews at a fraction of the cost of using mainframe-based architecture
- The new system can be scaled up as needed
- System resilience avoids reprocessing very large files when data is corrupt or invalid
- Workflow is more fluid, leading to more predictable processing times
- Use of IT resources is maximized using grid technology

3.4 SUN Case Study - Axyz Animation, Inc.2

The world of digital special effects is constantly concerned with the time and effort it takes for animators and technical directors to create many versions of various shots and farm these shots off to be rendered frame by frame for review. Since "rendering" is such a compute-intensive task, finding the available computing resources and distributing the work effectively can end up being tremendously time consuming. "Our scripts had to be manually tweaked for each and every render to run on various machines," explains John Coldrick, senior animator/V.P. at Axyz Animation, Inc. "The available machines and CPUs were constantly changing and required monitoring. Our process was very inefficient and error prone--animators and technical directors were constantly waiting for render tests."

2 Sun Microsystems Case Studies [20]

Based in Toronto, Ontario in Canada, Axyz Animation is a small- to mid-sized company that produces digital special effects for commercials, television series, and films. Unlike large animation shops, where having a "render farm solution" is a necessity, smaller shops do not find it economical to have a custom-made solution. Unfortunately, off-the-shelf packages do not typically work acceptably in the animation environment; therefore most large animation shops have solutions that are written from the ground up.

"What we needed was a flexible solution that didn't require re-inventing the wheel as far as distributed processes went, and yet was flexible enough for us to implement things the way we wanted," explains Coldrick. After extensive research, Coldrick read about Sun ONE Grid Engine software developed by Sun Microsystems, Inc. He continues, "What we discovered in Sun Grid Engine software was a remarkably robust, flexible, and scalable product that fit our needs like a glove. In fact, due to the scalability of Sun Grid Engine software, it would easily work as a solution for the larger animation shops as well."

Sun Grid Engine Software to the Rescue

After switching to Sun Grid Engine software, Axyz animators could submit any process--animation or render--with the same command. "Animators don't have to be concerned about what machine is available, or massaging their scripts to maximize speed," explains Coldrick. "Every available CPU in our farm is put at their disposal, and jobs that typically took a whole night in the past can often run in a fraction of the time. Bottlenecks are a thing of the past."

Sun Grid Engine software works by enabling companies to submit and manage jobs from just about any Linux or UNIX system on the network. It does this by monitoring the availability of workstations, then deploying jobs to the available resources. Additionally, the command line utility gives the company the flexibility to script and automate jobs as well as build a custom front end. And the GUI provides the business with a convenient management tool for administering the Sun ONE Grid Engine software. "We've been able to implement application-specific licensing scenarios, such as applications that will run multi-threaded on one machine without an additional token penalty," says Coldrick. "This helps utilize all of our CPUs to their maximum. By taking the significant technical task of managing distributed processing off my plate, I was free to focus on an implementation that worked for our needs."
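The general idea of monitoring availability and then deploying jobs to free resources can be sketched as a small dispatcher. The Python example below is a conceptual illustration only and does not reproduce Sun Grid Engine's scheduling algorithm or interfaces; the `dispatch` function, the host names and the job names are all hypothetical.

```python
def dispatch(jobs, hosts):
    """Assign queued jobs to hosts that currently have enough free CPU slots.

    jobs  -- list of (job_name, slots_needed), in submission order
    hosts -- dict mapping host name to its number of free slots
    Returns (placements, waiting): placed jobs with their host, and job
    names that must wait for a resource to become available.
    """
    free = dict(hosts)                  # work on a copy of the availability map
    placements, waiting = [], []
    for name, slots in jobs:
        candidates = [h for h, n in free.items() if n >= slots]
        if not candidates:
            waiting.append(name)        # nothing available: job stays queued
            continue
        # Prefer the least loaded host; break ties by host name.
        host = sorted(candidates, key=lambda h: (-free[h], h))[0]
        free[host] -= slots
        placements.append((name, host))
    return placements, waiting


hosts = {"ws1": 2, "ws2": 4, "ws3": 0}
jobs = [("render_a", 2), ("render_b", 2), ("render_c", 2), ("render_d", 2)]
placements, waiting = dispatch(jobs, hosts)
print(placements)
print(waiting)
```

The point of the sketch is that the person submitting a job never names a machine; the dispatcher matches work to whatever capacity is free, and jobs that do not fit simply wait.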

Sun Grid Engine software can be set up in three different environments depending on the company's requirements. The Cluster Grid environment serves a single group, and a company can have multiple Cluster Grids set up in different locations. Axyz used the Cluster Grid environment.

"I was able to set up a single group Cluster Grid in approximately two weeks of spare time after regular working hours--not very long at all," continues Coldrick. "I wrote script wrappers that not only allowed our staff to use language that was familiar to them, but allowed us to adapt to the specific requirements of the applications that we run." In addition, Axyz was able to easily write utility scripts that were tailored to its everyday tasks because of the open nature of the Sun Grid Engine software.

Axyz picked up a short-term project to work on a TV pilot for ABC. For this pilot, they were asked to generate a demanding 120 shots in three weeks. To accommodate this project, Axyz easily developed a second Sun Grid Engine software group. "I set up a second Sun Grid Engine software group for that short time, and combined with a powerful production pipeline that we developed, we were able to bring in quality work, on time, on budget, and without killing any of our staff," explains Coldrick. "Without Sun Grid Engine software this could not have been possible, since in such an incredibly short time having to haggle with managing something as mundane as render management would have killed the project."

All in all, Sun Grid Engine software has freed up significant amounts of time for Axyz animators, and let them focus on what they do best--animate. "In addition, turnaround time for tests has dropped significantly, allowing for more refining in the same amount of time," concludes Coldrick. "Human error has also dropped, as there are far fewer scripts that need to be edited to get the job done. We couldn't be happier."

Key Business Benefits:

- Animation and render jobs submitted and managed quickly, efficiently, and reliably by Sun ONE Grid Engine software
- Dramatically reduced time to do animation or render jobs from overnight to 1-2 hours
- Completely eliminated bottlenecks from animation process
- Significantly increased server utilization rates to almost 95 percent
- Helped enable small animation company to cost-effectively distribute jobs to and from Linux systems
- Provided robust and flexible solution helping to allow company to easily grow with future business by simply adding compute power

3.5 Oracle Case Study - Chicago Stock Exchange 3

In 2002, Chicago Stock Exchange (CHX) was exhausting the capacity of its legacy database server in its previous configuration, which was cumbersome to manage. Recovery from a hardware failure could take up to three hours, at worst, but outages of 15 to 20 minutes, which were more common, were also unacceptable. When replacing its legacy system, one challenge CHX faced was estimating capacity for a new system, as the stock market was entering a downturn at that time. If CHX bought more server power than it needed, it would be paying for idle capacity. However, if it under-bought server capacity, it risked throwing away the hardware investment when market activity picked up and business needs required a larger machine.

3 Oracle Corporation Success Stories [21]

CHX identified that its high-availability and hardware-economy requirements could be met by implementing an enterprise grid solution using Oracle Database with Real Application Clusters, Oracle Application Server, and Oracle Enterprise Manager. This move allowed CHX to limit its initial hardware investment to the server power it currently needs, and provided the flexibility to incrementally scale out system capacity as demand dictates. CHX is upgrading its system to Oracle 10g to further improve system automation and centralized management. CHX increased the number of servers in its grid configuration from two to four, further minimizing the risk of downtime.

Server Utilization

Prior to using Oracle Real Application Clusters, CHX ran on two HP Alpha Server GS60 servers. "We recorded a lot of trading information during the day, and then at night we would process it," said David Milne, director of database technologies for the Chicago Stock Exchange. "That meant one of these big machines was virtually idle while the other was very busy. We could failover and run on one machine, but that was just to get us through a crisis." CHX runs its online customer service, batch reporting, and data mining systems along with a near-real-time decision support system on Oracle. The system is managed through Oracle Enterprise Manager Grid Control and runs on economical HP ES40 servers.

The resulting infrastructure has greater flexibility for allocating workloads to get the most out of the system. "Since I now have jobs divided onto four machines, it's easier to see where the problems are and better tune the systems," Milne added. "We can now quickly identify and fix problems instead of just having to live with the problems because we weren't able to see what was going on." CHX officials also note the system consistently achieves more predictable CPU utilization rates with Oracle's grid environment and is capable of handling mixed workloads, including batch loads and online transaction processing systems. Over time, CHX can expand its 10g environment by scaling out with additional server and storage modules, a strategy that Mainstay Partners, an independent consulting firm, predicts will avoid costly hardware replacements.

Oracle 10g Solution Results

CHX officials expect significant productivity increases, operational benefits, and improved customer satisfaction levels with its Oracle enterprise grid solution. These benefits can be attributed to the built-in automation in Oracle Database 10g and the self-tuning capabilities of Oracle Enterprise Manager 10g, such as Automatic Database Diagnostic Monitor (ADDM). ADDM simplifies the query tuning process, eliminating routine diagnostics and manual performance "fire drills."

Oracle Enterprise Manager Grid Control will also provide CHX greater visibility through centralized system management. This approach will ease the challenge of managing mixed workloads in an enterprise grid by dynamically monitoring and shifting resources according to CHX's predetermined business rules. The increased automation allows CHX to redeploy database administrators and systems administrators from routine maintenance tasks to more strategic projects.

"One of my biggest goals is to continuously improve our computing environment, making it seamless and transparent, so the customer never knows that there's a database serving them," Milne said. "Oracle Real Application Clusters on HP keeps our database up and running, allowing our customers to conduct business on their schedule. Oracle helps us ensure maximum uptime so we're here whenever our customers need us."

Key Business Benefits:

- CHX's investment in Oracle technology will yield a return on investment of 171% over five years and an internal rate of return of 47% (Mainstay)
- According to CHX executives, Oracle's enterprise grid computing solution has provided opportunities for improved operations and customer service
- Cost savings stemming from headcount avoidance due to improved overall system manageability

CHAPTER 4

INDUSTRY FUTURE OF GRID COMPUTING

The current state of affairs of Grid Computing has shown both its potential and the barriers that exist, but the future perspective also needs to be taken into account. This chapter discusses several drivers for the future of Grid Computing, both in industry and in its existing areas.

4.1 Market Forecast

The potential for Grid Computing is huge and the projected investment by the major vendors is sure to increase. The market drivers identified in Chapter 1 will play a significant role in the next few years as commercial Grid Computing technology matures and cultural and political acceptance within organizations increases. The market opportunity of $12 billion [6], and of $1.8 billion in Western Europe [7], is a driver in itself. IDC has been forecasting the virtual environment software markets to grow at a faster rate than the operating system and subsystem market. This is, in part, due to its belief that Grid Computing is going to be increasingly important to organizations [14].

One major factor in the acceptance rate of Grid Computing is the clarity of the messages being marketed by industry leaders such as IBM, Sun, Oracle and others. There seems to be no clear definition of Grid Computing, and in particular of the standards that would lead to its ubiquity. Today there is a mix of application-specific code and "off the shelf" tools and services from Globus, startups, established IT vendors such as those mentioned earlier, and others in the Grid community. These are all tied together by application development and system integration. For Grid Computing to reach the next level, there should be wider open source implementations and more opportunities for newer small companies to participate, with more investment from the major IT vendors. This could be achieved by building the right 'Eco-system' to nurture the organic growth and proliferation of Grid Computing. IBM, for example, has announced its intention to build this Eco-System by partnering with and supporting new business ventures in Grid Computing and creating new market opportunities [27]. Similarly, EDS has its Agility Alliance with selected key partners such as Sun, HP, Microsoft and others, which is part of the EDS Agile Enterprise initiative to promote Utility Computing services [25].

Utility Computing can also be a major driver in acceptance of the Grid Computing concept. IT outsourcing in general has already played a major role in promoting the evolution of Utility Computing. Evolutionary change through incremental adoption as IT becomes more and more synchronized with business objectives is illustrated in Figure 4.1.

[Figure: stages labeled Fixed Infrastructure, IT utility, Shared Business Services, and Business utility, plotted along a 1990, 2002, 2010 timeline. Source: Gartner.]

Figure 4.1: Evolutionary change through incremental IT adoption

There are already several models available for Utility Computing such as Dedicated (in-house), Shared (Outsourced) and Public (Data Center) along with pricing models based on Compute (CPU cycles), Storage (GB) and Network (GB Bandwidth) that are already implemented commercially [25].
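As a rough illustration of usage-based pricing along the three dimensions named above, the following sketch computes a monthly charge from metered compute, storage and network usage. The unit rates and the names `UsageRecord` and `monthly_charge` are hypothetical and chosen only for illustration; they are not taken from [25] or from any vendor price list.

```python
from dataclasses import dataclass

# Hypothetical unit rates, for illustration only (not from any vendor).
RATES = {
    "cpu_hours": 0.50,    # price per CPU-hour of compute
    "storage_gb": 0.10,   # price per GB stored
    "network_gb": 0.05,   # price per GB of bandwidth used
}


@dataclass
class UsageRecord:
    cpu_hours: float
    storage_gb: float
    network_gb: float


def monthly_charge(usage: UsageRecord, rates: dict = RATES) -> float:
    """Sum metered usage times the unit rate for each resource type."""
    total = (
        usage.cpu_hours * rates["cpu_hours"]
        + usage.storage_gb * rates["storage_gb"]
        + usage.network_gb * rates["network_gb"]
    )
    return round(total, 2)


if __name__ == "__main__":
    usage = UsageRecord(cpu_hours=1000, storage_gb=200, network_gb=400)
    print(monthly_charge(usage))
```

Under such a model the customer pays only for what is metered, which is the contrast with a fixed, dedicated infrastructure.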


4.2 Technology and Standards Direction

The technology for Grid Computing has been evolving over several years and the pace has picked up by the endorsements of the larger vendors. The Globus standard has been at the forefront as the de facto open standard for the grid but several other projects for different platforms are already available as mentioned in previous chapters. In the recent past, tech companies have delivered products that make it relatively easy to set up grids - although complexity increases if the same operating software is not used within the hierarchy of Grid Computing types. Technology development based on open standards will play a major role in enhancing the future of grid computing as was mentioned earlier.

For now, as technology improves, so does the disparity among systems and their implementations. A clear lack of standards will hinder the global Grid Computing strategy. Figure 4.2 provides a list of Grid-related standards organizations that are promoting, defining and evolving different aspects of Grid Computing [23].

- GGF: Research and industry; use cases, architectures and specifications (OGSA, OGSI/WSRF)
- EGA: Promote and grow Enterprise grid computing
- DMTF: Distributed management standards and models (CIM)
- OASIS: eBusiness and Web Services management (WS-RF, WS-Notification, WSDM, ...)
- IETF: Internet architectures and specifications (SNMP, SMI)
- W3C: Web Services architectures and specifications
- SNIA: Advance the adoption of storage networks as complete and trusted solutions

Figure 4.2: Grid-related standards organizations

For example, the Enterprise Grid Alliance (EGA) is one such organization, a consortium of leading vendors and customers focused on developing Enterprise Grid solutions. EGA is open, independent and vendor-neutral. Anyone can join by executing the relevant agreements and paying dues; there are no admission barriers. Its technical scope includes grid activities within enterprise data centers, but not desktop grids; using proven and standard enterprise components, but not vector supercomputers; within and between trusted and secure enterprises, but not involving dynamically defined virtual organizations; and for use with enterprise commercial and technical applications, but not scientific computing or academic research grids. EGA's ultimate goal is to unify Grid Computing within and between enterprises to support true cooperative processing and not just message passing [23]. Appendix D provides summaries of the other standards organizations.

Collaboration among the various standards organizations will be a key enabler for tackling the challenges of technology and standards for wide Grid Computing acceptance.

[Figure: a progression from Silos (Eng, Mfg, Sales, IT) through Shared to Federated resources, spanning Network Fabric, Storage Fabric, and Management Tools & Processes.]

Figure 4.3: Vision of the Grid Computing journey

Figure 4.3 illustrates the journey ahead for Grid Computing and one should keep in mind that the journey is not only about technology but also people, process and finance [23]. These are also the organizational issues identified as barriers to Grid Computing implementation in Chapter 3.

4.3 Industry Perspectives

Corporations are taking a wait-and-see approach towards the adoption of Grid Computing into their corporate infrastructures. The Utility Computing service provider EDS is taking a cautious approach to adopting Grid Computing as part of its major initiatives, while remaining vigilant about future opportunities and disruptive technologies that could lead to wider acceptance of Grid Computing [25]. IBM, Sun, Oracle and others, by contrast, are touting Grid Computing enabled software and hardware services and are investing heavily in marketing their products and services. The case studies in the previous chapter provided specific examples of Grid Computing based solutions within corporations, but they did not reflect the ideal of true ubiquity of the grid. Academia and scientific foundations, as well as government initiatives, are trying to push those boundaries since they are not restricted by the business justification of cost and benefit that corporations have to take into account.

The paradigm shift among application programmers toward building deeper Grid Computing support into applications, mentioned in Chapter 3, will play a key role in making software Grid-enabled. This cultural shift, although not new since it already existed in the mainframe days, will be very important and is recognized by the industry [25].

Finally, technology is driving the standards, and the contest between open source standards and homogeneous standards⁴ will have a global impact on the adoption of Grid Computing [27].

⁴ Term used here to illustrate alliances and partnerships among vendors to promote their joint Grid Computing standards

CHAPTER 5

DISCUSSIONS AND CONCLUSIONS

5.1 Discussions

Grid Computing has existed within academia and the scientific community for several years and there are many different projects currently active worldwide. Commercial interest in Grid Computing has gained momentum and several major vendors are offering different Grid Computing initiatives. Interestingly, there seems to be confusion about the definitions of Grid Computing and Utility Computing, and the terms are used interchangeably. The market opportunity predicted for Grid Computing is very extensive and this will be a key driver in expanding Grid Computing in the commercial space [6]. Chapter 2 introduced the Grid Computing infrastructure and the de facto Globus standard. The convergence of Grid Computing and Web Services was also highlighted with the emergence of WSRF and GXA. The Grid Computing initiatives of some of the major vendors, such as IBM, Sun, Oracle and HP, were presented. Gridbus, an Open Source Grid Initiative, and GridGarden.NET, a Microsoft .NET Grid framework at MIT, were also presented to illustrate the use of the Grid standard.

The current state of affairs of Grid Computing was discussed in Chapter 3 along with three case studies of major vendor implementations of Grid Computing solutions. The organizational barriers to Grid acceptance were listed and the hierarchy of Grid types was introduced. There are already implementations of Level 1 and Level 2 Grid Types, but Level 3 Grid Types raise issues of security across domains and of the complex process of communication, coordination and scaling, among others. This is further complicated by the use of different operating systems by different systems across multiple domains and also within domains. A new paradigm of application development supporting Grid Computing was also mentioned.

The future state of Grid Computing with regard to marketing, standards and technology directions was discussed in Chapter 4. The market opportunity, as mentioned earlier, seems to be driving some of the commercial movement in the Grid Computing space, including the formation of and participation in various standards organizations. Standardization and a broader acceptance of open standards, along with identification of the key business benefits of Grid Computing, will be essential for commercial acceptance.

5.2 Conclusions

The value of Grid Computing to a company will be measured by the business benefit it provides. For that benefit to be realized, the standards need to be defined, and a new way of thinking about utilizing unused CPU cycles for business benefit has to become part of the organization's culture, backed by a long-term vision. Corporations are currently evaluating Grid Computing solutions on an as-needed basis and are not thinking in terms of how parallel processing could improve the business. Globus is the de facto standard that most organizations are following, but challenges exist in implementing this standard in a heterogeneous operating environment. The flexibility, reliability and resiliency with IT cost reduction that companies demand can be realized by implementing Grid Computing. Solutions for the computing needs of business and engineering are currently available from the major vendors and their partners, as presented in the sample case studies. It can be concluded that standards, security and the convergence of Grid and Web Services technologies in an Open Source environment are some of the key enablers of the ubiquity goal of Grid Computing.

REFERENCES

[1] Xiaohan Lin and John R. Williams, "A Grid Computing Architecture for Applications in Distributed Computation", MIT
[2] Joshy Joseph, Craig Fellenstein, "Introduction to Grid Computing", Apr 16, 2004
[3] Sivakameswari Viswanathan, "Grid Computing Impact and Implications for Business"
[4] Dan Bertrand, Dan Allen, Frank Paukovitz, "The IT Infrastructure Journey: Rapid Transformation toward Agility", EDS, September 2003
[5] Ian Foster and Carl Kesselman, "The Grid: Blueprint for a New Computing Infrastructure", Morgan Kaufmann Publishers, Inc. 1999
[6] IDC's study, "Role of Grid Computing in the Coming Innovation Wave", March 2004
[7] IDC's study, "The Western European Grid Computing Server Market Opportunity", October 2004
[8] Ian Foster, "Exploring the Power of Grid Computing", May 2003
[9] Ian Foster, Carl Kesselman, Steven Tuecke, "The Anatomy of the Grid Enabling Scalable Virtual Organizations", 2001
[10] Thomas Sandholm and Jarek Gawor, "Globus Toolkit 3 Core - A Grid Service Container Framework", July 2003
[11] Marc Brooks, "Service Oriented Architecture & Grid Computing", MITRE Corporation
[12] IBM Developerworks Web, "Globus Toolkit 3 Core - A Grid Service Container Framework", July 2003
[13] Paul Shread, "Sun Overhauls Grid Strategy", Internetnews.com, November 2003
[14] Dan Kusnetzky and Carl W. Olofson, "Oracle 10g: Putting Grids to Work", IDC, April 2003
[15] Andrew Donoghue, "What the hell are Grids anyway?", ZDNet UK, December 10, 2004
[16] Miranda Nash, "Oracle 10g: Infrastructure for Grid Computing", Oracle Corporation, 2002
[17] Clabby Analytics, "The Grid Report", 2004
[18] Directions on Microsoft, <http://www.directionsonmicrosoft.com/>
[19] Platform Computing, Market Study Report, "The Politics of Grid", 2003, 2004
[20] Sun Microsystems,
[21] Oracle Corporation,
[22] Platform Computing, Business Whitepaper, "Doing More with Less - Accelerating Performance in the Financial Industry with Grid Computing", February 2004
[23] Mark Linesch, Don Deutsch, "Grid Standards: Accelerating Grid Deployment within the Enterprise", Oracle OpenWorld, December 2004
[24] IBM Solutions Grid for Business Partners,
[25] Phone interview with Sharon Green and Ed Reynolds, EDS Corporation, January 4, 2005
[26] Sun Microsystems, "Sun Cluster Grid Architecture", May 2002
[27] "The Many Faces of Grid" Roundtable at the IBM Innovation Center in Waltham, Massachusetts, December 14, 2004
[28] Prof. John Williams, Director of Intelligent Engineering Systems Lab at MIT
[29] The Globus Alliance, <http://www.globus.org>
[30] HP Corporation,
[31] The GridBus Project, <http://www.gridbus.org>
[32] World Wide Web Consortium (W3C), Web Services Architecture,
[33] Microsoft Corporation,
[34] Akshay Luther, Rajkumar Buyya, Rajiv Ranjan, and Srikumar Venugopal, "Alchemi: A .NET-based Grid Computing Framework and its Integration into Global Grids"
[35] Global Grid Forum, <http://www.gridforum.org/>
[36] Distributed Management Task Force, Inc. (DMTF),
[37] Organization for the Advancement of Structured Information Standards (OASIS),
[38] Internet Engineering Task Force (IETF), <http://www.ietf.org/>
[39] World Wide Web Consortium (W3C), <http://w3c.org/>
[40] Storage Networking Industry Association (SNIA), <http://www.snia.org/>
[41] Microsoft PressPass, "Working Together: Researchers Unite Web Services and Grid Computing to Enhance Scientific Study", October 6, 2003
[42] Mary Jo Foley, "A Peek Under Microsoft's Secret 'Bigtop'", Microsoft Watch, December 29, 2004

APPENDIX A

XML Web Services

XML Web services are units of application logic that provide data and services to other applications. Applications access XML Web services by means of industry standard Web protocols and data formats, such as HTTP, XML, and Simple Object Access Protocol (SOAP), regardless of how each XML Web service is implemented. Web service architecture involves many layered and interrelated technologies (Figure A.1) [32]. One of the primary advantages of the XML Web services architecture is that it allows programs written in different languages on different platforms to communicate with each other in a standards-based way.

Figure A.1: Web Services Architecture Stack

One of the core characteristics of an XML Web service is the high degree of abstraction that exists between the implementation and the consumption of a service. By using XML-based messaging as the mechanism by which the service is created and accessed, both the XML Web service client and the XML Web service provider are freed from needing any knowledge of each other beyond inputs, outputs, and location.

Simple Object Access Protocol (SOAP)

SOAP defines how messages are formatted, sent, and received when working with XML Web services. SOAP is also an industry standard that is built on XML and HTTP. Any platform that supports the SOAP standard can support XML Web services. In other words SOAP is the communications protocol for XML Web services. SOAP is a specification that defines the XML format for messages. If you have a well-formed XML fragment enclosed in a couple of SOAP elements, you have a SOAP message. There are other parts of the SOAP specification that describe how to represent program data as XML and how to use SOAP to do Remote Procedure Calls. These optional parts of the specification are used to implement RPC-style applications where a SOAP message containing a callable function, and the parameters to pass to the function, is sent from the client, and the server returns a message with the results of the executed function. Most current implementations of SOAP support RPC applications because programmers who are used to doing COM or CORBA applications understand the RPC style. SOAP also supports document style applications where the SOAP message is just a wrapper around an XML document. Document-style SOAP applications are very flexible and many new XML Web services take advantage of this flexibility to build services that would be difficult to implement using RPC.
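The idea that a SOAP message is a well-formed XML fragment enclosed in a couple of SOAP elements can be sketched with Python's standard library. This is a minimal illustration that assumes the SOAP 1.1 envelope namespace; the `GetQuote` method, its `symbol` parameter and the `urn:example:quotes` namespace are made-up names, not part of any real service.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_soap_message(payload: ET.Element) -> bytes:
    """Document style: wrap an XML payload in SOAP Envelope and Body elements."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    body.append(payload)
    return ET.tostring(envelope, encoding="utf-8")


def build_rpc_call(method: str, params: dict, service_ns: str) -> bytes:
    """RPC style: the body names a callable function and carries its parameters."""
    call = ET.Element(f"{{{service_ns}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return build_soap_message(call)


if __name__ == "__main__":
    # Hypothetical method and namespace, for illustration only.
    print(build_rpc_call("GetQuote", {"symbol": "IBM"}, "urn:example:quotes").decode())
```

The same `build_soap_message` wrapper serves both styles: an RPC call is simply a payload whose root element names the function, while a document-style message passes an arbitrary XML document as the payload.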

The last optional part of the SOAP specification defines what an HTTP message that contains a SOAP message looks like. This HTTP binding is important. The HTTP binding is optional, but almost all SOAP implementations support it because it's the only standardized protocol for SOAP. For this reason, there's a common misconception that SOAP requires HTTP. Some implementations support MSMQ, MQ Series, SMTP, or TCP/IP transports, but almost all current XML Web services use HTTP because it is ubiquitous. Since HTTP is a core protocol of the Web, most organizations have a network infrastructure that supports HTTP and people who understand how to manage it already. The security, monitoring, and load-balancing infrastructure for HTTP are readily available today.
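To make the HTTP binding concrete, the sketch below builds, without sending, an HTTP POST request that carries a SOAP 1.1 message, using the `text/xml` content type and the `SOAPAction` header defined by that binding. The endpoint URL and the action value in the usage lines are hypothetical.

```python
import urllib.request


def make_soap_http_request(url: str, soap_xml: bytes, soap_action: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying a SOAP 1.1 message."""
    return urllib.request.Request(
        url,
        data=soap_xml,
        method="POST",
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            # SOAP 1.1 expects the action URI as a quoted string.
            "SOAPAction": f'"{soap_action}"',
        },
    )


if __name__ == "__main__":
    # Hypothetical endpoint and action, for illustration only.
    request = make_soap_http_request(
        "http://example.com/quotes",
        b"<Envelope/>",
        "urn:example:quotes#GetQuote",
    )
    print(request.get_method(), request.full_url)
    print(request.header_items())
```

Because the message travels as an ordinary HTTP POST, existing proxies, load balancers and monitoring tools can handle it without SOAP-specific support.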

Web Services Description Language (WSDL)

WSDL is an XML format for describing the network services that are offered by the server. You use WSDL to create a file that identifies the services that are provided by the server and the set of operations within each service that the server supports. For each of the operations, the WSDL file also describes the format that the client must follow when requesting an operation.
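As an illustration of how a client might read such a description, the sketch below parses a heavily trimmed WSDL 1.1 document and lists the operations it declares. The sample document, including the `QuoteService` and `GetQuote` names, is invented for this example, and a real WSDL file would also contain message, binding and port definitions.

```python
import xml.etree.ElementTree as ET

WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}  # WSDL 1.1 namespace

# A made-up, heavily trimmed WSDL document used only for illustration.
SAMPLE_WSDL = """\
<definitions name="QuoteService"
             xmlns="http://schemas.xmlsoap.org/wsdl/"
             targetNamespace="urn:example:quotes">
  <portType name="QuotePortType">
    <operation name="GetQuote"/>
    <operation name="ListSymbols"/>
  </portType>
  <service name="QuoteService"/>
</definitions>
"""


def list_operations(wsdl_text: str) -> dict:
    """Map each portType name to the operation names it declares."""
    root = ET.fromstring(wsdl_text)
    return {
        port.get("name"): [
            op.get("name") for op in port.findall("wsdl:operation", WSDL_NS)
        ]
        for port in root.findall("wsdl:portType", WSDL_NS)
    }


if __name__ == "__main__":
    print(list_operations(SAMPLE_WSDL))
```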

Universal Description, Discovery and Integration (UDDI)

The Universal Description, Discovery and Integration (UDDI) specifications define a registry service for Web services and for other electronic and non-electronic services. A UDDI registry service is a Web service that manages information about service providers, service implementations, and service metadata. Service providers can use UDDI to advertise the services they offer. Service consumers can use UDDI to discover services that suit their requirements and to obtain the service metadata needed to consume those services.

The UDDI specifications define:

- SOAP APIs that applications use to query and to publish information to a UDDI registry
- XML Schema schemata of the registry data model and the SOAP message formats
- WSDL definitions of the SOAP APIs
- UDDI registry definitions of various identifier and category systems that may be used to identify and categorize UDDI registrations
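The publish-and-discover roles described above can be sketched with a toy in-memory registry. This is not the UDDI SOAP API or data model; the `ToyRegistry` and `ServiceEntry` classes and their fields are hypothetical and only mirror the idea of providers advertising services and consumers finding them by category.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceEntry:
    provider: str
    name: str
    wsdl_url: str  # where a consumer could fetch the service description
    categories: set = field(default_factory=set)


class ToyRegistry:
    """In-memory stand-in for a registry: publish entries, then query them."""

    def __init__(self):
        self._entries = []

    def publish(self, entry: ServiceEntry) -> None:
        """Provider side: advertise a service."""
        self._entries.append(entry)

    def find_by_category(self, category: str) -> list:
        """Consumer side: discover services that match a category."""
        return [e for e in self._entries if category in e.categories]


if __name__ == "__main__":
    registry = ToyRegistry()
    registry.publish(
        ServiceEntry("Example Corp", "QuoteService",
                     "http://example.com/quotes?wsdl", {"finance"})
    )
    for entry in registry.find_by_category("finance"):
        print(entry.name, entry.wsdl_url)
```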

Security

Organizations building and managing secure XML Web services need to ensure that only authorized parties are allowed to use the XML Web services and that the SOAP messages sent and received by the XML Web services can only be modified or viewed by appropriate parties. WS-Security describes how to use the existing W3C security specifications, XML Signature and XML Encryption, to ensure the integrity and confidentiality of SOAP messages. Together with WS-License, it describes how existing digital credentials and their associated trust semantics can be securely associated with SOAP messages. Together, these specifications form the bottom layer of a comprehensive modular security architecture for XML Web services. Future security specifications will build on these basic capabilities to provide mechanisms for credential exchange, trust management, revocation, and other higher-level capabilities.

The two initial security specifications provide the following capabilities:

- WS-Security is a simple, stateless, SOAP extension that describes how digital credentials should be placed within SOAP messages, and how these credentials should be associated with a message to ensure message integrity and confidentiality. WS-Security describes how message integrity is maintained even for SOAP messages that use the WS-Routing specifications described below. Using WS-Security, XML Web services can examine incoming SOAP messages and, based on an evaluation of the credentials, determine whether or not to process the request. WS-Security supports a wide range of digital credentials and technologies including both public key and symmetric key cryptography.
- WS-License describes how several common license formats, including X.509 certificates and Kerberos tickets, can be used as WS-Security credentials. WS-License includes extensibility mechanisms that enable new license formats to be easily incorporated into the specification.
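The decision step described above, examining an incoming message and deciding whether to process it, can be illustrated with a symmetric-key integrity check. The sketch below uses an HMAC over the raw message bytes as a stand-in; it does not implement the WS-Security wire format, XML Signature, or WS-License, and the function names and key are hypothetical.

```python
import hashlib
import hmac


def sign(message: bytes, shared_key: bytes) -> str:
    """Sender side: derive an integrity tag from the message and a shared key."""
    return hmac.new(shared_key, message, hashlib.sha256).hexdigest()


def should_process(message: bytes, signature: str, shared_key: bytes) -> bool:
    """Receiver side: process the request only if the tag verifies."""
    expected = sign(message, shared_key)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    key = b"demo-shared-secret"  # illustration only; never hard-code real keys
    msg = b"<Envelope><Body>GetQuote</Body></Envelope>"
    tag = sign(msg, key)
    print(should_process(msg, tag, key))                 # untouched message
    print(should_process(msg + b"tampered", tag, key))   # modified in transit
```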

APPENDIX B

Microsoft .NET Framework

The Microsoft .NET Framework is a platform for building, deploying, and running Web Services and applications. It provides a highly productive, standards-based, multi-language environment for integrating existing investments with next-generation applications and services as well as the agility to solve the challenges of deployment and operation of Internet-scale applications. The .NET Framework consists of two main parts: the common language runtime (CLR) and a unified, hierarchical class library that includes a revolutionary advance to Active Server Pages (ASP.NET), an environment for building smart client applications (Windows Forms), and a loosely-coupled data access subsystem (ADO.NET) as shown in Figure B.1 [33].

The .NET Framework is designed to fulfill the following objectives:

- To provide a consistent object-oriented programming environment whether object code is stored and executed locally, executed locally but Internet-distributed, or executed remotely.
- To provide a code-execution environment that minimizes software deployment and versioning conflicts.
- To provide a code-execution environment that guarantees safe execution of code, including code created by an unknown or semi-trusted third party.
- To provide a code-execution environment that eliminates the performance problems of scripted or interpreted environments.
- To make the developer experience consistent across widely varying types of applications, such as Windows-based applications and Web-based applications.
- To build all communication on industry standards to ensure that code based on the .NET Framework can integrate with any other code.

The .NET Framework can be hosted by unmanaged components that load the common language runtime into their processes and initiate the execution of managed code, thereby creating a software environment that can exploit both managed and unmanaged features. The .NET Framework not only provides several runtime hosts, but also supports the development of third-party runtime hosts.

[Figure: layered stack showing, from top to bottom, the languages (Visual Basic, C++, C#, J#), the Common Language Specification, ASP.NET (Web Forms, XML Web Services) alongside Windows Forms, ADO.NET and XML, Base Classes, the Common Language Runtime, and the Operating System.]

Figure B.1: Microsoft .NET Architecture

The .NET Framework is an integral Windows component for building and running the next generation of software applications and Web services.

Common Language Runtime

The common language runtime manages memory, thread execution, code execution, code safety verification, compilation, and other system services (Figure B.2) [33]. These features are intrinsic to the managed code that runs on the common language runtime. With regard to security, managed components are awarded varying degrees of trust, depending on a number of factors that include their origin (such as the Internet, enterprise network, or local computer). This means that a managed component might or might not be able to perform file-access operations, registry-access operations, or other sensitive functions, even if it is being used in the same active application.

Figure B.2: Common Language Runtime Architecture

The common language runtime makes it easy to design components and applications whose objects interact across languages. Objects written in different languages can communicate with each other, and their behaviors can be tightly integrated. For example, you can define a class and then use a different language to derive a class from your original class or call a method on the original class. You can also pass an instance of a class to a method of a class written in a different language. This cross-language integration is possible because language compilers and tools that target the runtime use a common type system defined by the runtime, and they follow the runtime's rules for defining new types, as well as for creating, using, persisting, and binding to types.

Class Libraries

The .NET Framework class library is a collection of reusable types that tightly integrate with the common language runtime. The class library is object oriented, providing types from which your own managed code can derive functionality. This not only makes the .NET Framework types easy to use, but also reduces the time associated with learning new features of the .NET Framework. In addition, third-party components can integrate seamlessly with classes in the .NET Framework. Base classes provide standard functionality such as input/output, string manipulation, security management, network communications, thread management, text management, and user interface design features.

The ADO.NET classes enable developers to interact with data accessed in the form of XML through the OLE DB, ODBC, Oracle, and SQL Server interfaces. XML classes enable XML manipulation, searching, and translations. The ASP.NET classes support the development of Web-based applications and Web services. The Windows Forms classes support the development of desktop-based smart client applications.

The .NET Framework also provides a collection of classes and tools to aid in development and consumption of XML Web services applications. XML Web services are built on standards such as SOAP (a remote procedure-call protocol), XML (an extensible data format), and WSDL. The .NET Framework is built on these standards to promote interoperability with non-Microsoft solutions. It is also possible to extend the library by creating one's own classes and compiling them into libraries.

.NET Framework Security

The .NET Framework provides several mechanisms for protecting resources and code from unauthorized code and users:

- ASP.NET Web Application Security provides a way to help limit access to a site by comparing authenticated credentials (or representations of them) to Microsoft Windows NT file system permissions or to an XML file that lists authorized users, authorized roles, or authorized HTTP verbs.
- Code access security uses permissions to help limit the access that code has to protected resources and operations. It helps protect computer systems from malicious mobile code and helps provide a way to allow mobile code to run safely. (Code access security, together with the policies that govern it, is referred to as evidence-based security.)
- Role-based security provides information needed to make decisions about what a user is allowed to do. These decisions can be based on either the user's identity or role membership, or both.
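The role-based decision described in the last item can be sketched in a language-neutral way. The example below is written in Python rather than against the .NET security classes, and the `Principal` type and `is_allowed` function are hypothetical names used only to show a decision based on identity, role membership, or both.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Principal:
    name: str
    roles: frozenset = field(default_factory=frozenset)


def is_allowed(principal: Principal, allowed_users=(), allowed_roles=()) -> bool:
    """Permit access if the identity or any role membership matches."""
    by_identity = principal.name in allowed_users
    by_role = bool(principal.roles & set(allowed_roles))
    return by_identity or by_role


if __name__ == "__main__":
    alice = Principal("alice", frozenset({"dba"}))
    bob = Principal("bob", frozenset({"guest"}))
    print(is_allowed(alice, allowed_roles=["dba", "sysadmin"]))
    print(is_allowed(bob, allowed_users=["bob"]))
    print(is_allowed(bob, allowed_roles=["dba"]))
```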


APPENDIX C

Alchemi: A .NET-based Grid Computing Framework [34]

A Microsoft Windows-based grid computing infrastructure will play a critical role in the industry-wide adoption of grids due to the large-scale deployment of Windows within enterprises. This enables the harnessing of the unused computational power of desktop PCs and workstations to create a virtual supercomputing resource at a fraction of the cost of traditional supercomputers. However, there is a distinct lack of service-oriented architecture-based grid computing software in this space. To overcome this limitation, the authors of [34] developed a Windows-based grid computing framework called Alchemi, implemented on the Microsoft .NET Platform.

[Figure: layered diagram showing e-Science, e-Business, e-Commerce and e-Engineering applications (precompiled executables, any language) at the top; a Parametric Modeling Environment and the Gridbus Grid Service Broker (GSB) with its Alchemi Actuator and Globus Actuator; Grid Threads and Alchemi Jobs; Alchemi; and Windows-based machines with the .NET Framework at the bottom.]

Figure C.1: Alchemi architecture and interaction between its components

Alchemi follows the master-worker parallel programming paradigm in which a central component dispatches independent units of parallel execution to workers and manages them. This smallest unit of parallel execution is a grid thread, which is conceptually and programmatically similar to a thread object (in the object-oriented sense) that wraps a "normal" multitasking operating system thread. A grid application is defined simply as an application that is to be executed on a grid and that consists of a number of grid threads. Grid applications and grid threads are exposed to the grid application developer via the object oriented Alchemi .NET API.
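The master-worker pattern and the notion of a grid thread can be sketched as follows. This is a Python analogy, not the Alchemi .NET API: the `GridThread`, `SquareThread` and `run_application` names are hypothetical, and a local thread pool stands in for remote worker machines.

```python
from concurrent.futures import ThreadPoolExecutor


class GridThread:
    """Hypothetical unit of work: subclasses put their computation in start()."""

    def start(self):
        raise NotImplementedError


class SquareThread(GridThread):
    """Example unit of work that squares one number and stores the result."""

    def __init__(self, n):
        self.n = n
        self.result = None

    def start(self):
        self.result = self.n * self.n


def run_application(threads, workers=4):
    """Master: dispatch independent threads to workers and collect them when done."""

    def execute(thread):
        thread.start()
        return thread

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns finished threads in submission order.
        return list(pool.map(execute, threads))


if __name__ == "__main__":
    finished = run_application([SquareThread(i) for i in range(5)])
    print([t.result for t in finished])
```

The application is just the collection of independent threads; because no thread depends on another, the master is free to place them on whichever workers are available.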

Manager

The Manager manages the execution of grid applications and provides services associated with managing thread execution. The Executors register themselves with the Manager, which in turn keeps track of their availability. Threads received from the Owner are placed in a pool and scheduled to be executed on the various available Executors. A priority for each thread can be explicitly specified when it is created within the Owner; if none is specified, the thread is assigned the highest priority by default. Threads are scheduled on a Priority and First Come First Served (FCFS) basis, in that order. The Executors return completed threads to the Manager, which are subsequently passed on to or collected by the respective Owner.
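The scheduling rule described above, priority first and then first come first served, can be sketched with a heap keyed on priority and arrival order. This is an illustration rather than Alchemi's implementation; the `ThreadPool` class is hypothetical, and the convention that a smaller number means a higher priority is an assumption made for this example.

```python
import heapq
import itertools

HIGHEST_PRIORITY = 0  # assumption: a smaller number means a higher priority


class ThreadPool:
    """Pool of pending threads ordered by priority, then by arrival (FCFS)."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # increasing counter breaks priority ties

    def submit(self, thread_id, priority=HIGHEST_PRIORITY):
        """Add a thread; unspecified priority defaults to the highest."""
        heapq.heappush(self._heap, (priority, next(self._arrival), thread_id))

    def next_thread(self):
        """Return the next thread id to run, or None when the pool is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


if __name__ == "__main__":
    pool = ThreadPool()
    pool.submit("a", priority=2)
    pool.submit("b")              # default: highest priority
    pool.submit("c", priority=2)
    pool.submit("d")
    order = []
    while (thread_id := pool.next_thread()) is not None:
        order.append(thread_id)
    print(order)
```

Threads "b" and "d" run first because they carry the default highest priority, and within each priority level the earlier submission runs first.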

Executor

The Executor accepts threads from the Manager and executes them. An Executor can be configured to be dedicated, meaning the resource is centrally managed by the Manager, or non-dedicated, meaning that the resource is managed on a volunteer basis via a screen saver or by the user. For non-dedicated execution, there is one-way communication between the Executor and the Manager. In this case, the resource that the Executor resides on is managed on a volunteer basis since it requests threads to execute from the Manager. Where two-way communication is possible and dedicated execution is desired, the Executor exposes an interface (Executor) so that the Manager may communicate with it directly. In this case, the Manager explicitly instructs the Executor to execute threads, resulting in centralized management of the resource where the Executor resides. Thus, Alchemi's execution model provides the dual benefit of:

- Flexible resource management, i.e. centralized management with dedicated execution vs. decentralized management with non-dedicated execution; and
- Flexible deployment under network constraints, i.e. the component can be deployed as non-dedicated where two-way communication is not desired or not possible (e.g. when it is behind a firewall or NAT/proxy server).

Thus, dedicated execution is more suitable where the Manager and Executor are on the same Local Area Network while non-dedicated execution is more appropriate when the Manager and Executor are to be connected over the Internet.
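The difference between the two execution modes can be sketched as a "push" path and a "pull" path. The Python below is a simplified, single-process illustration and not Alchemi code: the class and method names (`Manager.dispatch`, `Manager.request_thread`, `NonDedicatedExecutor.poll_once`, and so on) are hypothetical, and network communication is replaced by direct method calls.

```python
from collections import deque
from typing import Callable, Deque, List, Optional

Work = Callable[[], int]


class DedicatedExecutor:
    """Reachable by the Manager, so the Manager can call it directly."""

    def execute(self, work: Work) -> int:
        return work()


class Manager:
    """Hypothetical Manager holding pending work and collected results."""

    def __init__(self) -> None:
        self.pending: Deque[Work] = deque()
        self.completed: List[int] = []
        self.dedicated: List[DedicatedExecutor] = []

    def register_dedicated(self, executor: DedicatedExecutor) -> None:
        self.dedicated.append(executor)

    def dispatch(self) -> None:
        """Push path: the Manager instructs each dedicated Executor."""
        for executor in self.dedicated:
            if not self.pending:
                break
            self.completed.append(executor.execute(self.pending.popleft()))

    def request_thread(self) -> Optional[Work]:
        """Pull path: a non-dedicated Executor asks for work."""
        return self.pending.popleft() if self.pending else None


class NonDedicatedExecutor:
    """Only the Executor initiates contact, e.g. from behind a firewall."""

    def __init__(self, manager: Manager) -> None:
        self.manager = manager

    def poll_once(self) -> bool:
        work = self.manager.request_thread()
        if work is None:
            return False
        self.manager.completed.append(work())
        return True


manager = Manager()
manager.pending.extend([lambda: 1, lambda: 2, lambda: 3])
manager.register_dedicated(DedicatedExecutor())
manager.dispatch()                 # Manager pushes one thread
volunteer = NonDedicatedExecutor(manager)
while volunteer.poll_once():       # volunteer pulls the remainder
    pass
```

In the pull path every interaction starts at the Executor, which is why that mode still works when the Manager cannot open a connection to the Executor.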

Owner

Grid applications created using the Alchemi API are executed on the Owner component. The Owner provides an interface with respect to grid applications between the application developer and the grid. Hence it "owns" the application and provides services associated with the ownership of an application and its constituent threads. The Owner submits threads to the Manager and collects completed threads on behalf of the application developer via the Alchemi API.

Cross-Platform Manager

The Cross-Platform Manager, an optional sub-component of the Manager, is a generic web services interface that exposes a portion of the functionality of the Manager in order to enable Alchemi to manage the execution of platform independent grid jobs (as opposed to grid applications utilizing the Alchemi grid thread model). Jobs submitted to the Cross-Platform Manager are translated into a form that is accepted by the Manager (i.e. grid threads), which are then scheduled and executed as normal in the fashion described above. Thus, in addition to supporting the grid-enabling of existing applications, the Cross-Platform Manager enables other grid middleware to interoperate with and leverage Alchemi on any platform that supports web services (e.g. Gridbus Grid Service Broker).
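The translation step described above can be pictured with the following Python sketch. The job format shown (a dictionary with a `tasks` list of `executable` and `arguments` entries) is invented for the example and is not Alchemi's job schema or its web services interface; `GridThread` and `translate_job` are likewise hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class GridThread:
    """Hypothetical internal unit of work that the Manager schedules."""
    thread_id: int
    command_line: str


def translate_job(job: Dict[str, Any]) -> List[GridThread]:
    """Turn a platform-independent job description into grid threads.

    Assumption: each task in the job becomes exactly one grid thread.
    """
    threads = []
    for index, task in enumerate(job["tasks"]):
        parts = [task["executable"], *task.get("arguments", [])]
        threads.append(GridThread(thread_id=index, command_line=" ".join(parts)))
    return threads


# A made-up job, as another piece of middleware might submit it.
job = {
    "job_id": "demo-job",
    "tasks": [
        {"executable": "render", "arguments": ["frame1.dat"]},
        {"executable": "render", "arguments": ["frame2.dat"]},
        {"executable": "merge"},
    ],
}
threads = translate_job(job)
```

Once translated, the resulting threads would be placed in the same pool as threads submitted by an Owner, so the scheduling path is shared.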

APPENDIX D

Global Grid Forum (GGF) [35]

The Global Grid Forum is a community-initiated forum of thousands of individuals from industry and research leading the global standardization effort for grid computing. GGF's primary objectives are to promote and support the development, deployment, and implementation of Grid technologies and applications via the creation and documentation of "best practices" - technical specifications, user experiences, and implementation guidelines.

GGF efforts are also aimed at the development of a broadly based Integrated Grid Architecture that can serve to guide the research, development, and deployment activities of the emerging Grid communities. GGF goals include the following:

- To facilitate and support the creation and development of regional and global computational grids that will provide to the scientific community, industry, government and the public at large dependable, consistent, pervasive and inexpensive access to high-end computational capabilities
- To address architecture, infrastructure, standards and other technical requirements for computational grids and to facilitate and find solutions to obstacles inhibiting the creation of these grids
- To educate the scientific community, industry, government and the public regarding the technologies involved in, and potential uses and benefits of, computational grids
- To facilitate the application of grid technologies within educational, research, governmental, healthcare and other industries
- To provide a forum for exploration of computational grid technologies, applications and opportunities, and to stimulate collaboration among the scientific community, industry, government and the public regarding the creation, development and use of computational grids
- To exercise all powers conferred upon corporations formed under the Illinois General Not-For-Profit Corporation Act in order to accomplish its charitable, scientific and educational purposes and to take other actions necessary, advisable or convenient to carry out any or all of these purposes

Distributed Management Task Force, Inc. (DMTF) [36]

With more than 3,000 active participants, the Distributed Management Task Force, Inc. (DMTF) is the industry organization leading the development of management standards and integration technology for enterprise and Internet environments. DMTF standards provide common management infrastructure components for instrumentation, control and communication in a platform-independent and technology-neutral way. DMTF technologies include information models (CIM), communication/control protocols (WBEM), and core management services/utilities.

DMTF works closely with its Alliance Partners, including CompTIA, Consortium for Service Innovation, Federation Against Software Theft (FAST), Global Grid Forum (GGF), Interoperability Technology Association for Information Processing (INTAP), IT Service Management Forum (itSMF), Network Applications Consortium (NAC), Northwest Energy Efficiency Alliance, The Open Group, Storage Networking Industry Association (SNIA) and TeleManagement Forum (TMF). These top industry standards bodies are working with and participating in the development of DMTF's CIM - and its semantically rich definitions of management information - as a common approach to address the challenge of providing interoperable distributed management.

Organization for the Advancement of Structured Information Standards (OASIS) [37]

OASIS was founded in 1993 under the name SGML Open as a consortium of vendors and users devoted to developing guidelines for interoperability among products that support the Standard Generalized Markup Language (SGML). OASIS changed its name in 1998 to reflect an expanded scope of technical work. It is a not-for-profit, international consortium that drives the development, convergence, and adoption of e-business standards. The consortium produces more Web services standards than any other organization, along with standards for security, e-business, and standardization efforts in the public sector and for application-specific markets. OASIS has more than 4,000 participants representing over 600 organizations and individual members in 100 countries.

OASIS is distinguished by its transparent governance and operating procedures. Members themselves set the OASIS technical agenda, using a lightweight process expressly designed to promote industry consensus and unite disparate efforts. Completed work is ratified by open ballot. Governance is accountable and unrestricted. Officers of both the OASIS Board of Directors and Technical Advisory Board are chosen by democratic election to serve two-year terms. Consortium leadership is based on individual merit and is not tied to financial contribution, corporate standing, or special appointment.

The Internet Engineering Task Force (IETF) [38]

The Internet Engineering Task Force is a large open international community of network designers, operators, vendors, and researchers concerned with the evolution of the Internet architecture and the smooth operation of the Internet. It is open to any interested individual. The actual technical work of the IETF is done in its working groups, which are organized by topic into several areas (e.g., routing, transport, security). Much of the work is handled via mailing lists. The IETF holds meetings three times per year.

The Internet Assigned Numbers Authority (IANA) is the central coordinator for the assignment of unique parameter values for Internet protocols. The IANA is chartered by the Internet Society (ISOC) to act as the clearinghouse to assign and coordinate the use of numerous Internet protocol parameters.

World Wide Web Consortium (W3C) [39]

In October 1994, Tim Berners-Lee, inventor of the Web, founded the World Wide Web Consortium (W3C) at the Massachusetts Institute of Technology, Laboratory for Computer Science [MIT/LCS] in collaboration with CERN, where the Web originated, with support from DARPA and the European Commission.

By promoting interoperability and encouraging an open forum for discussion, W3C commits to leading the technical evolution of the Web. In just ten years, W3C has developed more than eighty technical specifications for the Web's infrastructure. However, the Web is still young and there is still a lot of work to do, especially as computers, telecommunications, and multimedia technologies converge. To meet the growing expectations of users and the increasing power of machines, W3C is already laying the foundations for the next generation of the Web. W3C's technologies will help make the Web a robust, scalable, and adaptive infrastructure for a world of information. To understand how W3C pursues this mission, it is useful to understand the Consortium's goals and driving principles.

Storage Networking Industry Association (SNIA) [40]

SNIA was incorporated in December 1997 and is a non-profit trade association. Its members are dedicated to ensuring that storage networks become complete and trusted solutions across the IT community. The SNIA works toward this goal by forming and sponsoring technical work groups, producing (with its strategic partner Computerworld) the Storage Networking World Conference series, building and maintaining a vendor-neutral Technology Center in Colorado Springs, CO, and promoting activities that expand the breadth and quality of the storage networking market.
