
TECHNICAL WHITEPAPER

Grid-Enabling and Virtualizing Mission-Critical Financial Services Applications

AUGUST 2006

WWW.PLATFORM.COM

Table of Contents

Executive Summary
1. Solution Overview
2. Platform Symphony Architecture
3. Grid-Enabling Compute-Intensive Applications with Platform Symphony
   3.1 High Availability of Symphony-Enabled Applications
   3.2 Performance and Scalability of Symphony-Enabled Applications
   3.3 Platform Symphony Grid-Enablement Patterns for Service-Oriented Applications
   3.4 Platform Symphony Grid-Enablement APIs
   3.5 Platform Symphony and Application Technology Interoperability
4. Grid Resource Orchestration in Platform Symphony
   4.1 Ownership and Policy-based Sharing of Resources
   4.2 Guaranteed Service Level Agreement
   4.3 Platform EGO in Action
5. Managing the Platform Symphony Grid
   5.1 The Platform Management Console
   5.2 Data Management
   5.3 Reliability and High Availability for System and Application Components
   5.4 Monitoring, Dynamic Troubleshooting and Event Notification
   5.5 Pluggable Security Framework
   5.6 Service Workload Scheduling – Proportional Scheduling with or without Session Affinity
   5.7 Service Deployment
   5.8 Grid Node Management
6. Conclusion

Executive Summary

Today’s financial services firms are under increasing pressure to grow revenue and market share amidst intense competition, tightening regulation, and the growing need for enterprise risk management. At the same time, these firms are struggling to lower the total cost of ownership of their resources and to increase operating efficiency, so that they can be faster, more agile, and more adaptive to changing business demands.

Critical applications such as pricing, Value-at-Risk, stochastic modeling, and Monte Carlo simulations must be executed in real time, demanding massive computational power, coordination across multiple applications, and faster application runtimes. Financial services organizations have experienced dramatic growth over the last several years, and many forecast much more growth to come. To capture this market potential, FS firms need greater business agility and the capacity for more complex business risk analysis. Front- and middle-office business units are looking to deploy virtualization solutions for many demanding applications, such as:

• Credit, FX, and equities derivatives
• Convertibles calculations, bond yields
• Corporate derivative value risk management
• Intra-day portfolio and credit risk
• Financial engineering and model development
• Portfolio analysis
• FX options
• Risk reporting & compliance
• Derivatives credit exposure
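Most of these workloads are embarrassingly parallel: a large simulation decomposes into independent batches whose results are simply combined. As a purely illustrative sketch (plain Python, not Platform Symphony code; the portfolio figures and the normal log-returns model are assumptions), a one-day Monte Carlo Value-at-Risk calculation looks like this:

```python
import math
import random

def simulate_pnl(portfolio_value, mu, sigma, n_paths, seed=42):
    """Simulate one-day P&L under an assumed normal log-returns model."""
    rng = random.Random(seed)
    return [portfolio_value * (math.exp(rng.gauss(mu, sigma)) - 1.0)
            for _ in range(n_paths)]

def value_at_risk(pnl, confidence=0.99):
    """VaR: the loss threshold not exceeded with the given confidence."""
    losses = sorted(-p for p in pnl)          # positive numbers are losses
    idx = min(int(confidence * len(losses)), len(losses) - 1)
    return losses[idx]

pnl = simulate_pnl(portfolio_value=1_000_000, mu=0.0, sigma=0.02,
                   n_paths=100_000)
var_99 = value_at_risk(pnl, confidence=0.99)
# each batch of paths is independent, so the loop splits naturally into
# grid tasks whose partial results are simply concatenated
```

Because each batch of paths is independent, the simulation loop is exactly the kind of unit that can be scattered across grid CPUs and gathered back.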

The computation engines for these applications are usually tightly coupled to the overall risk management system. As a result, all of these executable programs run on the same physical machine, which may also function as the database server. This leads to an infrastructure that is difficult to manage and an environment that cannot scale to keep pace with the ever-changing business demands.

This monolithic architecture introduces high operational costs by:

• Forcing financial application developers to move outside their area of expertise to develop distribution mechanisms and load-balancing schemes for internal HPC solutions;
• Duplicating HPC solutions across each line of business; and
• Requiring each business technology team to fully staff in order to operate the HPC services and perform service maintenance, such as repairs and capacity planning, while tuning and upgrading systems.

To solve these problems, banks are turning to resource virtualization solutions. Resource virtualization enables large-scale distributed computing, in which geographically dispersed computers can be shared dynamically between many applications and behave as a single shared computer. As enterprises evolve to be more agile and adaptive and move to a Service Oriented Architecture (SOA), the Platform technology extends the SOA paradigm to compute resources and delivers Service Oriented Infrastructure (SOI). With this technology, companies can balance the supply and demand of computing resources by providing users with a single, transparent, aggregated pool of computing power. Ultimately, it lowers the total cost of computing by providing on-demand, reliable, and transparent access to available computer resources.

This white paper provides an overview of Platform Symphony, covering the technical aspects of the Symphony architecture and of grid-enabling and managing applications with Platform Symphony. The target audience for this white paper is IT managers and application architects in the financial services market.


1. Solution Overview

Platform Symphony, designed for the financial services market, is ideal for mission-critical risk management and compute-intensive application services. Built on Platform Enterprise Grid Orchestrator™ (EGO), Symphony allows you to build, test, grid-enable, and manage application services on a highly fault-tolerant, shared, scaled-out infrastructure.

Platform Symphony brings the following benefits to your business:

• Launch new products faster, better manage your risks, and increase your revenue
Designed to solve massively compute-intensive problems in record time, Symphony delivers faster results. On average, our customers have seen an 80% speed-up in application run times. Faster results allow you to make faster decisions, be more competitive, and win more business.

• Increase server utilization and infrastructure ROI
Most server farms operate at less than 30% of their capacity, and most are over-provisioned for peak loads. Wasted capacity equates to wasted investment (datacenter costs, energy, real estate, thermal issues, maintenance). With Platform Symphony, you can increase server and cluster utilization to up to 95%.

• Easily grid-enable applications
Tailored for grid-ready environments, the Symphony Developer Edition allows LOBs (Lines of Business) and application developers to build applications without involving IT, leading to faster time to build and deploy new applications on the grid and faster realization of revenue opportunities.

• High Reliability
IT organizations can leverage Platform Symphony’s resilient and fault-tolerant architecture to provide seamless business continuity. From automated failover and resiliency in the product to seasoned worldwide support, Platform Symphony delivers complete reliability to our FS customers.

The key strength of Platform Symphony comes from its unique decoupled architecture, built from the Platform EGO technology and the service-oriented application component. This decoupled architecture provides greater flexibility and reliability, allowing businesses to be agile and adaptive.

2. Platform Symphony Architecture

A full-scale installation of Platform Symphony architecture consists of three subsystems:

i. Platform Symphony Developer Edition (DE) is a service-oriented application middleware that enables enterprise application services to be segmented into parallel execution units and distributed seamlessly on a virtual resource infrastructure (Platform EGO). It enables scheduling and management of service-oriented workloads within different business applications by applying scheduling policies to the resources that are shared between applications through Platform EGO. It also provides easy-to-use APIs and rich design patterns to seamlessly virtualize all types of service-oriented applications with minimal changes. With Platform Symphony DE, application developers can virtualize, test, debug, and run service-oriented applications without requiring access to resources managed by Platform EGO.

ii. Platform EGO provides a flexible way to orchestrate multiple applications into a single, cohesive, efficient system. Platform EGO dynamically provisions virtual compute resources and manages execution of enterprise application services. By decoupling resource management from workload management, Platform EGO can effectively allocate, prioritize and manage the supply of resources with business policies across the enterprise. This functionality provides organizations the ability to scale up and scale out, while improving application performance, resource utilization and achieving better Service-Level Agreement (SLA) management overall.

iii. Platform Management Console is a common web-based console for administration and management of all products built on Platform EGO, including Platform Symphony and Platform VMO™. This enables administrators to easily view, customize, manage, monitor, and request SLA- based resources on-demand.

Figure 1. Platform Symphony 3: Built on Platform EGO

3. Grid-Enabling Compute-Intensive Applications with Platform Symphony

With Platform Symphony, users can distribute and grid-enable compute-intensive applications in several different ways, corresponding to seven distinct application patterns for optimizing application grid-enablement and performance. This section covers the internal system components of Platform Symphony, the reliability and performance of Symphony-enabled applications, and the different application patterns for grid-enablement.

The following diagram shows the internal system component architecture of Platform Symphony and its relation with Platform EGO. Platform Symphony offers an application grid-enablement library, workload management, real-time high performance and scalability, and reliable execution. Once a compute-intensive application is grid-enabled with Platform Symphony, the application can run anywhere on the dynamic set of grid resources managed by Platform EGO.

Figure 2.

SSM – Platform Symphony Service Session Manager
SIM – Platform Symphony Service Instance Manager
SD – Platform Symphony Session Director
API – Platform Symphony API

Platform Symphony is a service-oriented virtualization solution that:

• Is an out-of-the-box grid messaging middleware, managing service-oriented workloads on resources that are either dynamically allocated by Platform EGO or statically defined in the Platform Symphony DE configuration.
• Uses the key concepts of Connection, Session, Service, and Task to abstract fine-grained service workloads on the grid.
• Is designed to intelligently route independent tasks to services.
• Is optimized to run high-performance, real-time service-oriented applications with high CPU utilization and low network latency on large-scale grids consisting of thousands of CPUs.
• Has many reliability measures built in for high availability, resilience, and failover of systems and applications on dynamic grids, with no single point of failure.
• Provides easy-to-use APIs and rich design patterns to seamlessly grid-enable service-oriented applications with minimal changes.
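The Connection/Session/Service/Task abstraction can be pictured with a minimal in-process sketch. The class names echo the concepts above, but the signatures are hypothetical and everything runs locally; the real Symphony middleware dispatches each task to remote service instances:

```python
# In-process sketch of the Connection/Session/Service/Task concepts.
# Class names echo the concepts; signatures are hypothetical, not the
# actual Symphony API, and the "service" here runs locally.
class Task:
    def __init__(self, task_id, payload):
        self.task_id, self.payload, self.output = task_id, payload, None

class Session:
    """Groups related tasks for one service and collects their outputs."""
    def __init__(self, service):
        self.service, self.tasks = service, []

    def send(self, payload):
        task = Task(len(self.tasks), payload)
        self.tasks.append(task)
        return task

    def fetch(self):
        for task in self.tasks:                        # the real grid runs
            task.output = self.service(task.payload)   # these on remote CPUs
        return [t.output for t in self.tasks]

class Connection:
    """A client's handle to one application on the grid."""
    def __init__(self, application, service):
        self.application, self.service = application, service

    def create_session(self):
        return Session(self.service)

conn = Connection("risk-pricing", service=lambda x: x * x)
session = conn.create_session()
for value in (2, 3, 4):
    session.send(value)
results = session.fetch()     # [4, 9, 16]
```

The point of the abstraction is that the client only ever deals with sessions and tasks; where and on how many CPUs the service runs is the middleware's concern.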

Because the service-oriented grid messaging middleware layer of Platform Symphony is completely decoupled from the underlying grid infrastructure and resource orchestration layer, it allows application developers to build and test grid-ready applications independently of the grid, without involving enterprise IT. Further, once these grid-ready applications are built, they require no additional code changes and are ready to be published to the grid. This provides a key benefit for application developers leading to faster time to grid-enable new applications without the need to understand grid infrastructure details.

The unique decoupled architecture of Platform Symphony has allowed us to offer a completely independent tool for application developers, the Platform Symphony Developer Edition (DE) - the first and only solution tailored for the unique needs of grid-enabling, testing, and running service-oriented applications in a grid-ready environment.

3.1 High Availability of Symphony-Enabled Applications

Platform Symphony, with its decoupled architecture, enables Lines of Businesses (LOBs) to run multiple applications as well as different versions of Platform Symphony on the same Platform EGO grid. Resources can be shared across these multiple applications, yet when one of these applications is being upgraded or has problems, it can be isolated and the other applications are not impacted. You can also run multiple versions of the same application, or multiple types of workloads or third-party applications – you have the flexibility to share resources and isolate applications when they need maintenance or are down. Furthermore, if EGO becomes unavailable, all applications will continue to run using currently allocated resources. With Symphony, IT managers and application architects are assured of application-level fault tolerance, where the application can survive any hardware, OS or grid infrastructure failure.

Figure 3.

3.2 Performance and Scalability of Symphony-Enabled Applications

Platform Symphony is designed and optimized for real-time, high-performance, service-oriented computing on large-scale grids, running time-critical, sub-second tasks on hundreds or thousands of CPUs with extremely high CPU utilization and low task latency.

The following key performance and scalability factors are certified with Platform Symphony:

• Task granularity – A task is an atomic workload unit in a service-oriented application running on the Platform Symphony grid. The shorter the task, the more stress is put on the grid, and the more difficult it is to achieve high utilization. A grid system that can handle short tasks well will have no problem handling long tasks.

Platform Symphony targets task granularities from as short as 10 milliseconds up to arbitrarily long-running tasks.

• The number of nodes and CPUs – Large-scale grids, consisting of several thousands of CPUs, allow more applications to share resources and deliver faster application performance and optimization. A scalable grid system should deliver consistent performance and CPU utilization efficiency even as the number of CPUs in the grid is increased significantly. Platform Symphony is certified for the following thresholds:
o 5,000 compute nodes per cluster, running in excess of 20,000 CPUs
o 2,000 CPUs per application
o 1,000 client nodes per application
o 100 applications per cluster

Multiple clusters can be geographically dispersed, with client and compute nodes dynamically joining any cluster.

• CPU utilization efficiency – CPU utilization is a key performance indicator that maps directly back to IT infrastructure investment: the more utilization you can squeeze out of your existing resources, the fewer resources you need to meet business demand. The more CPUs a grid system can manage while still maintaining high CPU utilization, the more it can scale out. Platform Symphony achieves a consistent rate of 97% CPU utilization across 2,000 CPUs for tasks with a 2-second compute time and 1 KB messages per task.

• Task latency – Task latency is the time it takes for a single task to make a round trip through the grid system. Platform Symphony achieves task latency as low as 1.6 milliseconds for a 1 KB message round trip (provided that the number of tasks does not exceed the number of CPUs and there is no task queue).

• Task Throughput – Task throughput is the number of tasks per second completed on the grid. On 2,000 CPUs, Platform Symphony achieves task throughput of 1,000 to 3,000 tasks per second.
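These certified figures are mutually consistent, which is a useful sanity check when sizing a grid: at steady state, throughput is roughly CPUs × utilization ÷ task duration. A back-of-the-envelope check (illustrative arithmetic only):

```python
def steady_state_throughput(cpus, utilization, task_seconds):
    """Tasks completed per second when busy CPUs run tasks back to back."""
    return cpus * utilization / task_seconds

# 2,000 CPUs at 97% utilization running 2-second tasks:
t = steady_state_throughput(cpus=2000, utilization=0.97, task_seconds=2.0)
# roughly 970 tasks/second; shorter tasks push throughput toward the
# upper end of the 1,000-3,000 tasks/second range quoted above
```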

3.3 Platform Symphony Grid-Enablement Patterns for Service-Oriented Applications

Platform Symphony enables users to distribute and grid-enable compute-intensive applications in several different ways. When considering where and how to grid-enable, integrate, or optimize an existing or new service-oriented application with Platform Symphony, the following seven design patterns are recommended. Integration can be done at the client side, the service side, the middle tier between the client and service, or in more than one place:

1. Scatter-and-Gather – A client application scatters tasks to services on the grid and then gathers task output to be returned to the application.

2. Client Broker – A middle tier broker aggregates requests from multiple real clients, and acts as a single Platform Symphony client to translate these requests into Platform Symphony workloads.

3. Master Client – A Platform Symphony client holds the main application logic to send tasks to and receive results from services. This client can run as either an EGO service or a Symphony service for improved reliability. This pattern can be used to grid-enable J2EE, .NET, CORBA, or MPI applications.

4. Grid Decorator – Decorate an existing middle tier with capabilities to interact with Symphony. This middle tier can be a load balancer of a distributed or clustered service-oriented application, which invokes the services on the Platform Symphony grid.

5. Super Service Proxy – A service proxy on the service side of an existing service-oriented application intercepts service workloads and acts as a Platform Symphony client, sending the workloads to real services running on the grid as if they were on a supercomputer of hundreds or thousands of dynamic CPUs. This pattern can be used to grid-enable Web Services or JMS/MOM applications.

6. Service Façade – For grid-enabling a scalable and loosely coupled service-oriented application, a service façade provides a simple and coarse-grained service interface for Platform Symphony client services to interact with. Any detailed method name and complex data can be encapsulated in a unified form of input and output messages.

7. Pass-by-value and Pass-by-reference – If the task input and output data are small (e.g., 100 KB for a 1-second task), it is efficient to send the data value directly through Platform Symphony to the services on the grid, without the extra overhead of going through another channel. Otherwise, it may be more efficient to send a data reference through Platform Symphony to the services, which then get or return the data value via another channel outside Platform Symphony. This avoids slowing down Platform Symphony workload scheduling and dispatching with large data values.
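The Scatter-and-Gather pattern (pattern 1) maps naturally onto a futures-style interface. The sketch below uses Python's standard concurrent.futures thread pool as a local stand-in for a grid session; with Symphony, the pool would be replaced by the grid messaging layer, and price_instrument is a hypothetical stand-in for a real pricing task:

```python
from concurrent.futures import ThreadPoolExecutor

def price_instrument(instrument):
    """Hypothetical stand-in for a compute-intensive pricing task."""
    return instrument["notional"] * instrument["factor"]

def scatter_and_gather(instruments, workers=4):
    # scatter: one task per instrument; gather: collect outputs in order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(price_instrument, instruments))

book = [{"notional": 1000, "factor": 1.5},
        {"notional": 2000, "factor": 0.5}]
prices = scatter_and_gather(book)   # [1500.0, 1000.0]
```

Executor.map preserves input order, which mirrors the gather step: outputs come back matched to the tasks that were scattered.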

3.4 Platform Symphony Grid-Enablement APIs

Technologists have learned many lessons from past object-oriented architectures such as CORBA, whose APIs are tightly coupled to object data structures and methods and require very complex infrastructures for remote invocation. This complexity and tight coupling greatly limit the scalability, reliability, and portability of enterprise applications.

The emerging SOA (Service-Oriented Architecture) approach is the model Symphony has adopted. With this model, the information of a remote invocation is encapsulated in an application message regardless of how the service is implemented by the application. The infrastructure delivers a request (input) message from a client to a service, and returns its response (output) message back to the application. To this end, for ultimate scalability, reliability and portability, Symphony has provided a simple yet generic API to on-board service-oriented applications.

Grid-enablement with Platform Symphony is simple, language-agnostic (C/C++, Java, .NET, etc.), and data-format-independent. Platform Symphony provides data serialization and de-serialization APIs to pack and unpack input and output data payloads for distribution on the grid. Once the input and output payloads are distributed to their end points by Platform Symphony, they can be converted back to their native data structures to fit into the existing client and service application code. Therefore, Platform Symphony can be used to grid-enable a wide range of service-oriented applications written in different protocols and languages.
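The pack/unpack contract can be shown in plain Python. The functions below are stand-ins for the serialization hooks, not the actual Symphony APIs, and JSON is just one possible wire format; the point is that the middleware only ever handles opaque byte payloads:

```python
import json

# Hypothetical stand-ins for Symphony's serialization hooks: the middleware
# only ever sees opaque byte payloads, never the native data structures.
def pack(native):
    return json.dumps(native).encode("utf-8")

def unpack(payload):
    return json.loads(payload.decode("utf-8"))

def service(input_payload):
    """Service side: unpack the input, compute, pack the output."""
    trade = unpack(input_payload)
    pv = trade["notional"] * 0.5          # assumed toy valuation
    return pack({"trade_id": trade["trade_id"], "pv": pv})

# Client side: pack a native structure, "send" it, unpack the reply.
request = pack({"trade_id": 7, "notional": 1000.0})
response = unpack(service(request))       # {"trade_id": 7, "pv": 500.0}
```

Because the middleware never inspects the payload, the same transport carries any format the client and service agree on, which is what makes the approach language- and data-format-independent.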

3.5 Platform Symphony and Application Technology Interoperability

Web Services

Web Services (WS) are standard and implementation-independent wire protocols for interoperation among heterogeneous web clients and services. Applications written in various programming languages and running on various platforms can exchange data over networks using Web Services protocols, mostly SOAP and HTTP.

Platform Symphony supports the SOAP/HTTP WS protocol wherever appropriate for interoperability; for example, in its administration interfaces. To ensure efficient performance, the WS protocol is supported in Platform Symphony by packaging an application SOAP message as a payload in a Platform Symphony input/output message. Because of the performance limitations of SOAP/HTTP-based protocols, Symphony uses an optimized binary protocol for application messages, enabling unmatched levels of message throughput.

The following Super Service Proxy pattern can be used to grid-enable a WS application at the service side.

Figure 4.

J2EE and .NET

Platform Symphony provides grid-enablement capabilities for application server environments as a typical service-oriented application implementation.

The following Master Client pattern can be used when integrating Platform Symphony with a J2EE Application Server.

Figure 5.

JMS and MOM

JMS (Java Message Service) and MOM (Message-Oriented Middleware) provide reliable inter-application communication, as opposed to inter-process communication directly over raw network protocols such as sockets. The primary advantage of this message-oriented communication comes from its ability to store, route, and transform messages in the process of delivery. Its flexibility in integrating multiple applications into one larger application finds its major use in EAI (Enterprise Application Integration), where it can be used to integrate virtually any systems and applications, including Platform Symphony and its applications.

The following Super Service Proxy pattern can be used to integrate with JMS or MOM.

Figure 6.

CORBA

Compared to emerging technologies, a major limitation of CORBA is that it relies on an over-complicated CORBA infrastructure and a dictated communication protocol that all applications in the system must follow in order to interoperate. This is where WS and SOA provide significant advantages – defining a message transport schema (often XML or SOAP) and an overall architecture, while leaving the implementation details to the application. Other limitations of CORBA are its lack of message queuing for temporarily unavailable applications, its lack of built-in resilience and fault tolerance, and its scalability and performance limits.

Platform Symphony can help alleviate many of these limitations by grid-enabling CORBA-based applications. For example, Platform Symphony enables task queuing with significant improvements in performance, scalability, and fault tolerance by running CORBA applications as services in the Symphony grid.

Integration between a CORBA application and Platform Symphony can be done using the same model as the J2EE Application Server, using the Master Client pattern.

MPI

MPI is a library specification of a message-passing model for parallel applications on parallel computers, clusters, and heterogeneous networks. MPI implementations can be called from Fortran, C, C++, and Ada programs, and are linked as part of the applications. They are designed for parallel application patterns and optimized to take advantage of many advanced parallel hardware architectures, including CPUs, memory, and networks.

There are no resilience or fault-tolerance capabilities built into MPI; application developers must build their own reliability measures into the application. However, building a reliable application with resilience and fault tolerance in a distributed environment is nontrivial, especially for application developers whose main focus is application logic rather than system plumbing. In addition, most MPI applications have a monolithic architecture: the components inside one application are tightly coupled, and one component failure will usually bring down the whole application.

Since it is difficult to modify an existing MPI application that has been designed for a tightly coupled and monolithic architecture, Symphony enables two modes of integration with MPI applications: native support of the MPI paradigm to ensure smooth transition to a loosely coupled service-oriented architecture, or preservation of existing MPI-based applications on the Symphony grid. An example of native support is an MPI application that has its master process sending independent tasks to its slave processes. A simple way to move this application to a service-oriented architecture is to modify the master process to become a client, and the slave processes to become services, using the Master Client pattern.
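The master-to-client, slave-to-service conversion can be sketched without any MPI or Symphony code at all. Below, the Client and service names are hypothetical; the comment marks where MPI_Send/MPI_Recv calls would have been, and where the grid middleware would take over dispatch:

```python
# Illustrative only: an MPI master/slave program re-expressed as a client
# and a stateless service. All names are hypothetical, not Symphony APIs.

def service(task):
    """Former MPI slave: a pure function of its task input."""
    return sum(task)

class Client:
    """Former MPI master: owns the application logic and gathers results."""
    def __init__(self, service_fn):
        self.service_fn = service_fn

    def run(self, tasks):
        # in the MPI version this loop was MPI_Send/MPI_Recv to ranks 1..N;
        # on the grid, the middleware dispatches each task to a service instance
        return [self.service_fn(t) for t in tasks]

client = Client(service)
results = client.run([[1, 2], [3, 4], [5, 6]])   # [3, 7, 11]
```

The key structural change is that the slave becomes a stateless function of its input, so any instance on any node can run any task, and a failed task can simply be re-dispatched.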

4. Grid Resource Orchestration in Platform Symphony

Core to Platform Symphony is Platform EGO, which plays a key role in providing grid resource orchestration. Platform EGO manages a large, virtualized pool of computing resources serving multiple applications or user groups, and orchestrates on-demand, policy-based allocation of resources to applications.

Platform EGO provides Information, Allocation and Execution services to all applications in its Enterprise Grid Architecture. Platform EGO concepts are applied as follows:

• Information about the state and utilization of the managed resources is centrally available to all applications
• Allocation of resources to the applications is made based on scheduling techniques and user-defined policies
• Execution of application and system services is managed across the distributed and heterogeneous infrastructure to maximize workload throughput

Platform EGO is different from existing grid technologies because other technologies are monolithic in design and focused on solving a specific problem (such as system management, provisioning, or batch scheduling). These products create solution silos, which limits their usage. Instead of having a series of application and administration silos based on monolithic programming practices, Platform EGO offers a building block approach where any number of applications and middleware systems can be deployed, sharing compute resources as needed, and without building silos.

Figure 7. The Platform Solution

Benefits of component decoupling:

• Business Agility and Flexibility
o Share resources across different workload managers, applications, and Lines of Businesses (LOBs) in the same cluster
o Support different versions of Platform Symphony and Platform LSF in the same cluster
o Each application can be upgraded in isolation while other applications continue to run their own versions

• Higher Availability and Reliability
o Complete application segregation
o No single point of failure – EGO manages failover and redundancy of all services running in the grid
o If one instance of Platform Symphony fails, other instances are not impacted
o If EGO encounters problems, the application is not impacted

With the decoupled architecture of Symphony, the underlying grid infrastructure is completely hidden from the application itself. The application developer does not need to know whether the application runs on 20 CPUs or 2,000 CPUs, making it easier to grid-enable applications.

As businesses strive to be more agile and adaptive to meet changing business demands, they are rapidly moving toward Service Oriented Architecture (SOA). Platform EGO technology seamlessly adds grid capabilities to SOA-based applications, extends the service-oriented paradigm to compute resources, and delivers enterprise-level Service Oriented Infrastructure (SOI). Thus, the evolution and maturation of application architectures and infrastructure technologies, enabled by Platform EGO, provides the virtualization of large-scale grid infrastructures.

4.1 Ownership and Policy-based Sharing of Resources

Platform Symphony, with Platform EGO inside, provides a sophisticated ownership and sharing mechanism for resource allocation policies among hierarchical resource consumers (a resource consumer is generally configured as an application or services requiring compute resources):

• A consumer can own a pre-set number of CPUs. When the consumer does not use its own CPUs, it can lend them out to other consumers. But when the consumer’s need for CPUs increases, the previously loaned-out CPUs can be reclaimed.
• When a consumer has a workload requiring more CPUs than it owns, it can borrow CPUs either from other consumers or from a shared resource pool.
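The own/lend/borrow/reclaim accounting can be modelled with a toy allocator. This illustrates only the bookkeeping of the policy, not how Platform EGO is implemented, and all names are hypothetical:

```python
# Toy model of owned/lent/borrowed CPU accounting; illustrative only,
# not how Platform EGO is implemented. All names are hypothetical.
class Consumer:
    def __init__(self, name, owned):
        self.name, self.owned = name, owned
        self.lent = 0        # owned CPUs loaned to other consumers
        self.borrowed = 0    # CPUs on loan from other consumers

    def available(self):
        """CPUs this consumer can run workload on right now."""
        return self.owned - self.lent + self.borrowed

def lend(lender, borrower, cpus):
    """Lend idle owned CPUs; the borrower runs its workload on them."""
    cpus = min(cpus, lender.owned - lender.lent)
    lender.lent += cpus
    borrower.borrowed += cpus
    return cpus

def reclaim(lender, borrower, cpus):
    """Take loaned CPUs back when the lender's own demand rises."""
    cpus = min(cpus, borrower.borrowed, lender.lent)
    lender.lent -= cpus
    borrower.borrowed -= cpus
    return cpus

risk = Consumer("intraday-risk", owned=100)
pricing = Consumer("pricing", owned=50)
lend(risk, pricing, 40)      # risk is idle overnight; pricing borrows 40 CPUs
# pricing.available() is now 90 while the loan is out
reclaim(risk, pricing, 40)   # risk's morning workload arrives; CPUs come back
```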

4.2 Guaranteed Service Level Agreement

When a compute node becomes unavailable in the middle of application execution, Platform Symphony will allocate another compute node for the consumer to guarantee the CPU capacity as agreed to in the ownership and sharing policy. If the application running in the consumer can tolerate the node failure, it can then pick up the new compute node, and recover its guaranteed CPU capacity.

4.3 Platform EGO in Action

In the simplest of terms, a policy is defined by CPU utilization thresholds. Resource requests are then processed by the Platform Enterprise Grid Orchestrator (EGO):

1) A request for resources is made to Platform EGO
2) Information provided in this request would include:
• Characteristics of the resource (data, storage, processing)
• Environment necessary (physical or virtual processor)
• requirements
• system requirements
• Data and storage requirements needed to fulfill the overall resource request
3) Allocation decisions are made depending on the availability of resources and the priorities associated with this resource request, in the context of other resource requests currently executing on the grid
4) Execution of the resource request proceeds, either indefinitely or until re-prioritized, suspended, or completed
5) Information is fed back by the Platform EGO agent to the Platform EGO kernel regarding available resource capacity for future allocation and execution decisions

This process can be easily applied to resource requests of any kind. In addition, since Platform EGO is built as an open grid platform, it can interact with all types of IT services in a scalable, policy-driven architecture.

5. Managing the Platform Symphony Grid

5.1 The Platform Management Console

The Platform Symphony grid is managed by a web-based Platform Management Console, which also forms one common management interface for all Platform products built on Platform EGO, for example, Platform Symphony and Platform LSF. The Platform Management Console enables IT organizations to create custom administrative plug-ins that fit seamlessly into the console, as well as modify the console’s look-and-feel to comply with corporate branding standards. The Platform Management Console can also be integrated with corporate intranet portals, using standard portal frameworks. The following diagram shows how these products or sub-systems work together.

Figure 8.

5.2 Data Management

Data management is a challenging issue when running data-intensive service-oriented applications on a large grid. Because application data distribution patterns vary so widely, no "one-size-fits-all" solution can address them all; different data access patterns call for different data distribution solutions. To help customers grid-enable their applications effectively, the following solutions are provided for data distribution and caching on the grid:

• Platform provides a separate data cache product decoupled from Platform Symphony, which supports persistent and shared data cache as well as cache APIs that can be used by Platform Symphony clients and services to read and write cache on the grid.

• Based on customer needs, Platform can recommend third-party data cache solutions for data management on a Platform Symphony grid.
• Platform Symphony provides session affinity in its workload scheduling, which will cache data in the service instance memory for improved performance.
• Platform Symphony provides solutions and best practices for pass-by-value, pass-by-reference, and data staging when distributing data on the grid.
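The difference between pass-by-value and pass-by-reference distribution can be illustrated with a short sketch. This is a conceptual model only, not Platform Symphony API code; the shared store here stands in for whatever data cache product is deployed on the grid.

```python
import hashlib

shared_store = {}  # stands in for a shared/persistent data cache on the grid

def send_by_value(payload: bytes) -> dict:
    """Small inputs travel inside the task message itself."""
    return {"kind": "value", "data": payload}

def send_by_reference(payload: bytes) -> dict:
    """Large inputs are staged once; tasks carry only a lookup key."""
    key = hashlib.sha256(payload).hexdigest()
    shared_store[key] = payload
    return {"kind": "ref", "key": key}

def resolve(msg: dict) -> bytes:
    """A service instance materializes the input either way."""
    return msg["data"] if msg["kind"] == "value" else shared_store[msg["key"]]

small = send_by_value(b"tiny input")
large = send_by_reference(b"x" * 1_000_000)  # staged once, referenced by many tasks
assert resolve(small) == b"tiny input"
assert resolve(large) == b"x" * 1_000_000
```

Pass-by-reference avoids re-sending a large payload with every task of a session, at the cost of one staging step and a cache lookup per task.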

5.3 Reliability and High Availability for System and Application Components

In a heavily loaded, large-scale, dynamic grid, a diverse array of events, such as errors, failures, and administrative maintenance, can lead to system downtime. A reliable grid should tolerate these failures and events and still deliver predictable results: it should adjust dynamically to maintain overall grid functionality and keep application workload running as normal, completely transparently to users. A reliable grid should neither lose a single task or session because of a failure or event, nor fail a running session mid-execution and force users to re-submit it.

The reliability and HA (High Availability) of Platform Symphony come from its "layered" clustering technology, in which a cluster of compute nodes is connected and managed by a small group of management nodes. While Symphony provides HA and redundancy of compute nodes directly to the application, the management nodes form a "management cluster" inside a Platform Symphony cluster for their own HA and redundancy. These management nodes provide complete cluster redundancy, and any node in the grid is eligible to become a management node on failover. As long as one or more nodes are operating in the cluster, the grid infrastructure will continue to serve existing and new requests.
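The claim that any node is eligible for promotion can be sketched as a simple rebalancing rule. This is an illustrative model of the idea, not the actual election or failover protocol used by Platform Symphony; the function name and target size are assumptions.

```python
def rebalance(management, compute, target=3):
    """Keep `target` management nodes by promoting compute nodes as needed."""
    while len(management) < target and compute:
        management.append(compute.pop(0))  # any healthy node may be promoted
    return management, compute

mgmt = ["m1", "m2", "m3"]
comp = ["c1", "c2", "c3", "c4"]

mgmt.remove("m2")                 # management node m2 fails
mgmt, comp = rebalance(mgmt, comp)
print(mgmt)  # ['m1', 'm3', 'c1'] -- c1 was promoted to restore redundancy
```

The key property is that the management cluster's size is restored from the general node pool, so redundancy does not depend on a fixed set of standby machines.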

Specifically, this reliability and HA stems from Platform Symphony's layered architecture and its mutual-insurance and self-healing mechanisms:

• Platform EGO manages and monitors Platform Symphony system and application components. If a Platform Symphony system or application component becomes unavailable, Platform EGO restarts it on the same or a different machine.
• If a Platform EGO component becomes unavailable, running Platform Symphony subsystems and applications are not affected while Platform EGO self-heals or, where manual attention is required, raises a system event to notify an administrator.
• Inside a service-oriented application, if a service instance becomes unavailable, Platform Symphony can restart the service instance on the same or a different machine, or the application can keep running without that service instance. If a task fails, Platform Symphony can re-dispatch it multiple times before reporting the task failure to the client. If a Platform Symphony client is temporarily disconnected, it automatically reconnects to Platform Symphony and continues its work as if nothing had happened. If a Platform Symphony client running as a Platform EGO service becomes unavailable, Platform EGO restarts it on the same or a different machine so that it can reconnect to Platform Symphony.
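The re-dispatch behavior in the last bullet can be sketched as follows. This is an illustrative retry loop, not Symphony's actual dispatcher; the node structure and `max_retries` parameter are assumptions.

```python
def dispatch_with_retry(task, nodes, max_retries=3):
    """Try a task on successive nodes; report failure only after retries run out."""
    attempts = 0
    for node in nodes:
        attempts += 1
        if node["healthy"]:
            # task succeeded on this node
            return {"status": "done", "node": node["name"], "attempts": attempts}
        if attempts >= max_retries:
            break  # give up and surface the failure to the client
    return {"status": "failed", "attempts": attempts}

nodes = [{"name": "n1", "healthy": False},   # first dispatch fails
         {"name": "n2", "healthy": True}]    # re-dispatch succeeds
result = dispatch_with_retry("price_swap", nodes)
print(result)  # done on n2 after 2 attempts
```

From the client's perspective the intermediate failure on `n1` is invisible; only the final outcome is reported.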

In addition, you can run multiple service-oriented applications within one Platform Symphony subsystem, and multiple different versions of Platform Symphony subsystems within one Platform EGO cluster. Because these applications and Platform Symphony subsystems are completely segregated, an error event in one application or subsystem will not impact the other applications or subsystems running on the same Platform EGO cluster.

Finally, the reliability of Platform Symphony stems from its persistent state transitions and reliable messaging. Every major system state transition event of resource allocation and workload scheduling, as well as every application input and output message of Platform Symphony, is persisted to a nonvolatile storage system. Platform Symphony system and application workload can therefore be recovered from any failure situation. For example, if only a few client or compute nodes are unavailable, the application and system can be recovered from the in-memory state information held on the Platform Symphony management nodes. After a complete power outage that takes down client nodes, management nodes, and compute nodes alike, state can still be recovered from the persistent storage system. No matter what error event occurs, submitted and completed workload will not be lost, and users do not have to re-submit and re-run their workload.
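The persistence idea described above resembles a write-ahead log: every state transition is appended to durable storage before it takes effect, so state can be rebuilt by replay after a total outage. The sketch below is a minimal illustration under that assumption, not Platform Symphony's actual storage format.

```python
import json, os, tempfile

class SessionLog:
    """Append-only log of task state transitions; replay rebuilds state."""
    def __init__(self, path):
        self.path = path

    def record(self, event):
        with open(self.path, "a") as f:
            f.write(json.dumps(event) + "\n")
            f.flush()
            os.fsync(f.fileno())   # force to nonvolatile storage before proceeding

    def recover(self):
        """Replay the log; the last recorded status per task wins."""
        state = {}
        with open(self.path) as f:
            for line in f:
                e = json.loads(line)
                state[e["task"]] = e["status"]
        return state

path = os.path.join(tempfile.mkdtemp(), "wal.log")
log = SessionLog(path)
log.record({"task": "t1", "status": "dispatched"})
log.record({"task": "t1", "status": "done"})
log.record({"task": "t2", "status": "dispatched"})
print(log.recover())  # {'t1': 'done', 't2': 'dispatched'}
```

After a crash, completed work (`t1`) is not re-run, and in-flight work (`t2`) is known to need re-dispatch; nothing the user submitted is lost.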

5.4 Monitoring, Dynamic Troubleshooting, and Event Notification

For system and application monitoring, Platform Symphony provides an easy-to-use web-based Platform Management Console, easy-to-script command-line interfaces (CLIs), and easy-to-interoperate Web Services SOAP/HTTP interfaces.

Platform Symphony allows users to dynamically switch the debug levels of system components in order to clearly identify system and application issues. The user can also dynamically query the health information of a compute node.

All significant system and application events and errors can be monitored through the Platform Management Console, or forwarded as notifications through standard SNMP interfaces to a third-party monitoring system or tool. Examples of these events include a management node going down, task failures, service errors, and CPU under-allocation.

5.5 Pluggable Security Framework

Platform Symphony supports a pluggable security framework consistent across Platform Symphony middleware, Platform EGO, and the Platform Management Console in one cluster:

• SSO (single sign-on) user authentication that can be integrated with a site-specific user identity system such as LDAP, Kerberos, or web SSO. Platform Symphony also provides a built-in user identity system if an external one is not needed or not available.
• Role-based user authorization for access control.
• Consumer-based user delegation for service execution sandboxing and segregation.
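The "pluggable" aspect can be illustrated with a small plugin interface: authentication back-ends share one contract, so an LDAP, Kerberos, or built-in store can be swapped without changing the framework. All class and method names below are hypothetical sketches, not the actual Platform security SPI.

```python
class AuthPlugin:
    """Contract every authentication back-end implements."""
    def authenticate(self, user, credential):
        raise NotImplementedError

class BuiltInAuth(AuthPlugin):
    """Stand-in for the built-in user identity system."""
    def __init__(self, users):
        self.users = users
    def authenticate(self, user, credential):
        return self.users.get(user) == credential

class SecurityFramework:
    def __init__(self, plugin, roles):
        self.plugin = plugin       # any AuthPlugin: built-in, LDAP, Kerberos...
        self.roles = roles         # role-based authorization table

    def login(self, user, credential, action):
        if not self.plugin.authenticate(user, credential):
            return "denied: bad credentials"
        if action not in self.roles.get(user, ()):
            return "denied: not authorized"
        return "ok"

fw = SecurityFramework(BuiltInAuth({"alice": "s3cret"}),
                       roles={"alice": {"submit", "view"}})
print(fw.login("alice", "s3cret", "submit"))    # ok
print(fw.login("alice", "s3cret", "shutdown"))  # denied: not authorized
```

Authentication (who you are) and role-based authorization (what you may do) are separate checks, which is what allows the identity back-end to be replaced independently of access-control policy.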

5.6 Service Workload Scheduling – Proportional Scheduling with or without Session Affinity

Based on the ownership and sharing policy, Platform EGO allocates a number of CPUs to a consumer. When multiple sessions in that consumer compete for these CPUs, Platform Symphony assigns each session a proportion of the consumer's CPUs based on the session priority ratio.

• With session affinity – service instances assigned to a session keep working for that session as long as it has unfinished tasks, so data can be cached in service instance memory for better performance when processing tasks from the same session.
• Without session affinity – service instances can be assigned to any session. Some per-service data can still be cached in service instance memory for better performance, but not per-session information.
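The proportional split itself is simple arithmetic. The following sketch shows one way to divide whole CPUs according to a priority ratio (a largest-remainder split); it is illustrative only, and not necessarily the rounding rule Symphony's scheduler actually uses.

```python
def proportional_shares(total_cpus, priorities):
    """Split `total_cpus` among sessions in the ratio of their priorities."""
    weight = sum(priorities.values())
    exact = {s: total_cpus * p / weight for s, p in priorities.items()}
    shares = {s: int(v) for s, v in exact.items()}      # whole CPUs first
    leftover = total_cpus - sum(shares.values())
    # hand remaining CPUs to the sessions with the largest fractional parts
    for s in sorted(exact, key=lambda s: exact[s] - shares[s], reverse=True)[:leftover]:
        shares[s] += 1
    return shares

# A consumer allocated 10 CPUs, with two sessions at a 3:1 priority ratio:
print(proportional_shares(10, {"sessionA": 3, "sessionB": 1}))
```

Every CPU in the consumer's allocation is assigned, and a higher-priority session always receives at least its floor share.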

5.7 Service Deployment

A service module, which can include business application services as well as required Symphony services, is registered by an administrator in a central Platform Symphony repository and then dynamically propagated and deployed to compute nodes on demand. A Platform Symphony agent keeps track of the differences between each compute node and the central repository.
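The agent's job of reconciling a node against the central repository can be sketched as a checksum comparison. The names here (`repository`, `sync_node`) and the use of SHA-256 digests are illustrative assumptions, not the actual deployment protocol.

```python
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Central repository: service module name -> package contents
repository = {"pricing-svc-1.2": b"<package bytes v1.2>"}

def sync_node(node_cache: dict) -> list:
    """Fetch only modules that are missing or stale on this compute node."""
    fetched = []
    for name, pkg in repository.items():
        if name not in node_cache or digest(node_cache[name]) != digest(pkg):
            node_cache[name] = pkg      # propagate from the central repository
            fetched.append(name)
    return fetched

cache = {}
print(sync_node(cache))  # ['pricing-svc-1.2'] -- first on-demand deployment
print(sync_node(cache))  # [] -- node already up to date, nothing transferred
```

Because only differences are transferred, registering a new module version in the repository is enough to have it propagate to nodes the next time they need it.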

5.8 Grid Node Management

Platform Symphony management nodes are recommended to reside in the local cluster of one site for maximum autonomy. However, a client or a compute node can dynamically connect to a remote cluster, either when the local cluster is too busy for the client, or when the local cluster has no workload for the compute node. Platform Symphony provides an API a client can use to connect to a remote cluster, as well as the capability for a compute node to dynamically join a remote cluster.

Heterogeneous computers, including both server and desktop machines running Windows and other operating systems, can be managed in one EGO cluster to run multiple applications, and can be used by a single consumer to run the same application. Platform Symphony also supports a resource group concept, so that machines with the same or similar properties can be managed as a group. Consumer and resource policies can be set to determine where and when application services run, including powerful rules for scavenging resource slices from non-dedicated desktop and server machines.
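A resource group is essentially a selection of machines by shared properties. The sketch below illustrates the idea with hypothetical machine attributes; it is not Platform Symphony configuration syntax.

```python
# Illustrative machine inventory with arbitrary example attributes
machines = [
    {"name": "w1", "os": "windows", "type": "desktop"},
    {"name": "l1", "os": "linux",   "type": "server"},
    {"name": "l2", "os": "linux",   "type": "server"},
]

def resource_group(machines, **attrs):
    """Select machines whose properties match all of the given attributes."""
    return [m["name"] for m in machines
            if all(m.get(k) == v for k, v in attrs.items())]

print(resource_group(machines, os="linux", type="server"))  # ['l1', 'l2']
print(resource_group(machines, type="desktop"))             # ['w1']
```

Consumer policies can then be expressed against groups ("run this service only on dedicated servers", "scavenge desktops only after hours") rather than against individual machines.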

Compute node blocklisting prevents a troubled compute node from impacting application execution. Blocklisting can be triggered by a failure when loading a service module, binding a service to a session, running a task, or any combination of these. A blocklisted machine will not be used for the affected application until an administrator manually removes it from the blocklist.
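The blocklisting behavior can be sketched as a per-node failure counter with a manual reset. The threshold value and class names are illustrative assumptions, not Symphony's configuration parameters.

```python
from collections import defaultdict

class Blocklist:
    """Exclude a node after repeated failures until an admin clears it."""
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = defaultdict(int)
        self.blocked = set()

    def report_failure(self, node):
        # a failure loading a module, binding a service, or running a task
        self.failures[node] += 1
        if self.failures[node] >= self.threshold:
            self.blocked.add(node)

    def usable(self, nodes):
        return [n for n in nodes if n not in self.blocked]

    def admin_unblock(self, node):
        # manual removal by an administrator, as described above
        self.blocked.discard(node)
        self.failures[node] = 0

bl = Blocklist(threshold=2)
bl.report_failure("n3")
bl.report_failure("n3")
print(bl.usable(["n1", "n2", "n3"]))  # ['n1', 'n2'] -- n3 is excluded
bl.admin_unblock("n3")
print(bl.usable(["n1", "n2", "n3"]))  # ['n1', 'n2', 'n3'] -- n3 restored
```

Keeping the removal manual is deliberate: a node that fails repeatedly usually has a hardware or configuration problem that should be inspected before it rejoins the application.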

6. Conclusion

As enterprises evolve to be more agile and adaptive, they are rapidly moving to a service-based paradigm, where Service Oriented Architecture (SOA) provides a repeatable connectivity blueprint for responding dynamically to fluctuating business demands. Platform technology seamlessly adds grid and virtualization capabilities to SOA-based applications. Platform EGO extends the SOA paradigm to compute resources, delivering Service Oriented Infrastructure (SOI).

Platform Symphony, with Platform EGO inside and its unique decoupled architecture, provides the flexibility and reliability required for mission-critical, large-scale grid environments. By virtualizing heterogeneous hardware resources and running multiple applications in a shared environment, Symphony gives financial services customers the ability to scale and adapt IT according to their business needs. This empowers customers to consolidate risk across multiple lines of business (LOBs), reduce total cost of ownership, and respond dynamically to growth opportunities.

Further, because Symphony's APIs are easy to interface with, application developers and LOBs have the flexibility to build and test applications independently of the underlying grid infrastructure, reducing the time required to develop and deploy new applications. Reducing the time required to grid-enable applications brings their performance benefits online sooner, supporting increased revenue generation.

Grid technology has evolved with Symphony to deliver the real benefits of a common grid platform: supporting multiple applications on the same grid and managing them through a common web-based Platform Management Console. This next-generation Platform Symphony, with Platform EGO inside, is the industry's most agile, scalable, reliable, and business-responsive solution for mission-critical financial services applications.

ABOUT PLATFORM COMPUTING

Platform Computing is the global leader in grid computing solutions and a technology pioneer of the supercomputing world. The company's solutions for enterprise and high performance computing help the world's largest organizations integrate and accelerate business processes, increase competitive advantage, and enjoy a higher return on investment from IT. With over 2,000 customers, the company has achieved a clear leadership position in the market through a focus on technology innovation and execution. Founded in 1992, Platform Computing has strategic relationships with Dell, HP, IBM, Intel, Microsoft, Novell and Red Hat, along with the industry's broadest support for third-party applications. For more information please visit www.platform.com.

World Headquarters
Platform Computing Inc.
3760 14th Avenue, Markham, Ontario, L3R 3T7 Canada
Tel: 905 948 8448 | Fax: 905 948 9975 | Toll-free: 877 528 3676
[email protected]

United States
Boston: 781 685 4966 | Detroit: 248 359 7820 | Reston: 703 251 6125 | Newport Beach: 949 798 5654 | New York: 646 290 5070 | San Jose: 408 392 4900

Asia-Pacific
Beijing: +86 10 62381125 | Tokyo: +81 3 5326 3105

Europe
Düsseldorf: +49 (0) 2102 610390 | Basingstoke: +44 (0) 1256 370500 | Paris: +33 (0) 1 41 10 09 20 | London: +44 20 7956 2098

For more information, visit www.platform.com/contactus

Copyright © 2006 Platform Computing Corporation. ® ™ Trademarks of Platform Computing Corporation. All other logo and product names are the trademarks of their respective owners, errors and omissions excepted. Printed in Canada. Platform and Platform Computing refer to Platform Computing Inc. and each of its subsidiaries.

June 2006