WHITE PAPER TransLattice Technology Overview

ABSTRACT

By combining necessary computing elements into easily managed appliances, the TransLattice Application Platform (TAP) offers enterprises a fundamentally different way to provision their applications and data. This paper explains how Lattice Computing, a unique approach that combines globally distributed computing resources into a cohesive system, not only simplifies IT infrastructure and deployment but also delivers exceptional system resilience and data control, while significantly reducing costs.

Table of Contents

Introduction
E Pluribus Unum
Distributed Enterprise Applications
Distributed Relational Data
Data Distribution and Policy Controls
Cluster Hierarchy
Inherently Resilient Architecture
Sophisticated Redundancy Model
Scalability
Network Services
Management & Administration
Summary


Introduction

Today’s typical application infrastructure is an overly complex beast. Most deployments rely on numerous components—including servers, application runtimes, load balancers, storage area networks, and WAN optimizers—all of which are provided by a multitude of vendors. The resulting application stack demands considerable integration, which can drive management costs up and availability levels down. And the inherently centralized structure often results in a poor user experience for those not located near the data center. Ultimately, this type of infrastructure is inefficient, rigid, unstable, and costly.

TransLattice believes there is a better approach. Lattice Computing is a resilient, distributed application architecture, comprised of a cluster of identical nodes that cooperate to provide data storage and run applications—without any master node or centralized points of failure (Fig. 1). Utilizing Lattice Computing, the TransLattice Application Platform anticipates workers’ needs, delivering applications and data when and where they are needed. Furthermore, the ability to easily add nodes (for deployment on-premises, in the cloud, or through a combination of both) increases both capacity and redundancy.

This unique new approach to scalable application computing simplifies IT infrastructure by combining the various necessary computing elements into easily managed appliances. This paper outlines the concepts behind Lattice Computing and explains how our use of intelligent distributed systems helps enterprises boost resilience and data control, while significantly reducing costs, management burdens, and deployment complexity.

[Figure 1 diagram: the traditional approach (primary and backup data centers, each with load balancing, application servers, database, and SAN storage) contrasted with Lattice Computing.]

Figure 1. Unlike conventional deployments, Lattice Computing decentralizes all aspects of an application. This distributed infrastructure ensures business continuity and provides users with local access to information.


E Pluribus Unum

Out of many, one. This idea of distributed strength is the basic premise behind the TransLattice solution. TAP coalesces computing resources throughout an organization so that administrators and users see one unified system. With TAP, nodes may be dispersed across multiple sites and cloud providers, but they work together to form a highly resilient application and data platform (Fig. 2). Applications residing throughout the network can pull from the distributed resources, efficiently delivering the performance of a local application—while, in reality, using the world’s first truly geographically distributed relational database.

Distributed Enterprise Applications

TAP puts “e pluribus unum” into action by distributing and decentralizing the application server so that it runs multiple application containers—yet it appears as a single application runtime and database. Standard J2EE applications execute seamlessly across the entire computing environment, while actually boosting resilience, scalability, and user performance.

Adapting an application to run on TAP requires minimal effort. In fact, the majority of time invested centers on testing the application with the TransLattice platform as part of the application’s release or deployment process.

Distributed Relational Data

The TransLattice platform further employs the concept of “e pluribus unum” through a geographically distributed relational database that delivers high performance, global redundancy, and cost-efficient data storage. Existing applications that transact, access, and transform relational data using SQL can use this storage without any modification. Database tables are automatically partitioned into groups of rows based on attributes, and these partitions are redundantly stored across the computing infrastructure. The database provides full ACID semantics, ensuring reliable processing of database transactions.

Data Distribution and Policy Controls

TAP anticipates future access patterns and then automatically and intelligently distributes data according to those patterns—thereby minimizing impact on the network and improving end-user performance. When an object or database partition is created, TAP combines historical access patterns and automatically gathered network topology information to determine the relative costs of different storage strategies. These costs represent both the resources utilized by a storage strategy and the anticipated amount of time it will take users to interactively retrieve the information in the future.
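To make that cost comparison concrete, the sketch below scores candidate replica placements on storage consumed plus expected retrieval latency. It is a minimal illustration only: the node names, weights, and cost inputs are assumptions for exposition, not TAP internals.

```python
from itertools import combinations

# Illustrative inputs (assumed, not TAP internals).
storage_cost = {"node1": 1.0, "node2": 1.0, "node3": 1.5}  # per-replica cost
latency_ms = {                                             # user site -> node
    "london": {"node1": 10, "node2": 90, "node3": 150},
    "tokyo":  {"node1": 180, "node2": 30, "node3": 60},
}
access_history = {"london": 0.7, "tokyo": 0.3}             # access frequency

def plan_cost(plan, resource_weight=1.0, latency_weight=2.0):
    """Score a candidate set of replica locations: resources consumed,
    plus the expected time for users to retrieve the data later."""
    resources = sum(storage_cost[node] for node in plan)
    expected_latency = sum(
        freq * min(latency_ms[site][node] for node in plan)
        for site, freq in access_history.items()
    )
    return resource_weight * resources + latency_weight * expected_latency

# Choose the cheapest two-replica plan among all candidates.
best_plan = min(combinations(storage_cost, 2), key=plan_cost)
print(best_plan)  # ('node1', 'node2')
```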

Figure 2. Each node contains processing, storage, and network resources and works cooperatively with the rest of the cluster to provide key application services and unified management.

Policy also plays an important role in how TAP distributes and stores data, while giving administrators a high level of control. For example, by establishing location policy rules, administrators can specify that certain tables or portions of tables must or must not be stored in various locations. If critical data must be stored only on a particular continent, or may not be stored at locations with inferior physical security, administrators can pinpoint these restrictions with ease. Similarly, they can use redundancy policy rules to specify how many copies of each type of information must be stored, so organizations can meet business continuity and availability goals. A redundancy policy rule might specify that all transaction data must be stored on at least two different continents, ensuring that the data is preserved even if all computing resources on a given continent fall offline.

Within the constraints specified by policy rules, the system generally selects the most efficient calculated storage plan for each object or database partition. Because of this, the same procedure that calculates where to store information can also be used to locate information within the system for access. In some cases, the system may not be able to place information in the most ideal locations because of capacity constraints or an outage; in that case, the system notes the location of the information in a globally distributed exception index. Usage patterns and the optimal positioning of data may also change, resulting in other exception index entries. When the network is not fully utilized, the system leverages spare capacity to move items in the exception index to their preferred locations.
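As a rough illustration of how such rules might be written down, the sketch below models location and redundancy policies as simple declarative records. The class names and fields are assumptions, not TAP’s actual policy syntax.

```python
from dataclasses import dataclass, field

# Hypothetical rule records (assumed for illustration, not TAP syntax).
@dataclass
class LocationRule:
    table: str
    must_be_in: set = field(default_factory=set)      # required domains
    must_not_be_in: set = field(default_factory=set)  # forbidden domains

@dataclass
class RedundancyRule:
    table: str
    min_copies: int    # how many replicas must exist
    min_domains: int   # spread across at least this many policy domains
    level: str         # hierarchy level at which the spread is measured

policy = [
    # Keep regulated records inside the EU, and out of low-security sites.
    LocationRule("customer_records", must_be_in={"EU"}),
    LocationRule("customer_records", must_not_be_in={"low_security_sites"}),
    # Preserve transaction data even if a whole continent falls offline.
    RedundancyRule("transactions", min_copies=2, min_domains=2,
                   level="Region"),
]
```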

Cluster Hierarchy

To simplify the specification of policy rules and apply further control over the infrastructure, administrators may also define groupings between nodes. These groupings form a cluster hierarchy, which is maintained as a balanced tree (Fig. 3). The cluster hierarchy is useful because it enables administrators to align the infrastructure more closely with business policy. For instance, an administrator may group resources by geographic region to meet business continuity use cases, and then further group them by country to meet compliance goals. The hierarchy need not correspond to actual network topology; instead, it is a grouping that allows the ready specification of policy.

Figure 3. Typical Cluster Hierarchy. Grouping nodes in this way allows administrators to easily meet business and disaster recovery use cases through the intelligent placement of information.
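The grouping itself can be pictured as a small tree. The sketch below uses hypothetical node and place names and shows how a policy domain at any level expands to a concrete set of nodes; it is an illustration, not TAP’s data model.

```python
# Hypothetical cluster hierarchy: region -> country -> city -> nodes.
hierarchy = {
    "NA": {
        "Canada": {"Quebec": ["node6"], "Montreal": ["node8"]},
        "USA": {"New York": ["node5"], "San Francisco": ["node7"]},
    },
    "EMEA": {
        "France": {"Paris": ["node3"]},
        "Germany": {"Munich": ["node2"]},
    },
}

def nodes_in(domain):
    """Expand a policy domain (a subtree or a leaf list) to its node names."""
    if isinstance(domain, list):
        return domain
    return [n for child in domain.values() for n in nodes_in(child)]

print(nodes_in(hierarchy["NA"]["Canada"]))  # ['node6', 'node8']
print(nodes_in(hierarchy))                  # every node in the cluster
```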


Administrators manage these groupings through the TransLattice Cluster Hierarchy Wheel, which provides a convenient interface for exploring the current status of a cluster and its associated nodes. For quick and cohesive viewing, cluster information is represented in a circular configuration rather than in a tree (Fig. 4). Policy levels, which correspond to a concept or type of grouping, are represented as rings on the wheel, with the innermost ring representing the broadest grouping (such as a region). The sectors within policy levels are policy domains. A policy domain is a logical grouping of system resources in a cluster (which might correspond, for example, to a specific city or continent within a region). The outermost ring on the wheel ultimately drills down to the node level, and each sector corresponds to a specific node. When a specific policy domain or node is selected, the wheel rotates and zooms in to provide an enhanced view of the status of associated resources.

Policy levels and their names are shown in the legend on the left side of the hierarchy wheel. The ability to define levels provides considerable flexibility in the policy specification process.

Inherently Resilient Architecture

Inherent resilience is another aspect that sets the TransLattice platform apart from traditional infrastructures, which tend to rely on complex replication to allow for disaster recovery. In fact, most traditional frameworks require duplication of the entire application stack at some secondary location, and then use storage area network snapshot replication or database log-shipping to periodically move changes from the primary location to the secondary one.

In the TransLattice system, however, resilience against facility failure is a fundamental trait of the distributed architecture. Because the data is stored redundantly across the nodes based on policy, the system can continue processing if a node or location fails, while automatically rebuilding redundancy. Moreover, because all nodes provide the required application services in a resilient fashion, organizations no longer need to set up and maintain dedicated failover sites. No resources are dedicated purely to disaster recovery; instead, surplus resources also satisfy increases in application demand and improve performance for application users.

[Figure 4 graphic: Cluster Hierarchy Wheel. Level legend, from the innermost ring outward: root, Region, Country, City, nodes.]

Figure 4. Cluster Hierarchy Wheel. The hierarchy in this case corresponds to that shown in Figure 3. In this case, nodes are grouped first by region, next by country, and finally by city.


How much difference does this inherent resilience make when determining how well the system responds to failures? Consider the concepts of RPO (Recovery Point Objective) and RTO (Recovery Time Objective). The RPO of a system is the specified amount of data that may be lost in the event of a failure, while the RTO of a system is the amount of time that it will take to bring the system back online after a failure.

• In the case of snapshot replication systems, the RPO may be more than a day’s worth of changes, and the RTO generally requires manual intervention and may take several hours.

• In the TransLattice system, the majority of users are not affected by a failure. For the users who are affected, the RTO is generally less than a minute, which represents the amount of time required for their client to reconnect to a functioning node. Furthermore, TAP allows applications to choose their RPO on a transaction-by-transaction basis; critical transactions can become instantly durable (with an RPO of effectively zero, preserving the transaction once success has been returned), while larger and less critical transactions can be streamed out as resources allow.

In other words, TAP provides a framework for maximum resilience with minimum effort, enabling enterprises to feel secure in their ability to preserve business continuity and prevent data loss.

Sophisticated Redundancy Model

Similarly, TAP helps companies avoid the pitfalls of conventional storage architectures, which tightly couple storage components to provide redundancy. For instance, in a RAID-5 or RAID-6 array, a group of drives is combined into an array that maintains parity to cope with drive failure. However, rebuilding after a failure requires all data on the array to be read—a lengthy process that’s likely to degrade performance and leave data vulnerable to loss in the event of any additional failures.

The TAP architecture, by contrast, loosely couples all data storage locations and uses different combinations of storage elements to store each object. In the event that a node or storage element fails, only a relatively small amount of work is required to restore redundancy, and this workload is fairly distributed throughout the system (Fig. 5).
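A quick simulation makes this distribution property visible. The sketch below is purely illustrative (the node count, three-replica placement, and random choices are assumptions): because each object lives on a different combination of nodes, the re-replication work after one failure lands roughly evenly on all the survivors rather than on a single spare.

```python
import random
from collections import Counter
from itertools import combinations

random.seed(42)
nodes = [f"node{i}" for i in range(8)]
plans = list(combinations(nodes, 3))  # all three-node placement combinations

# Place 10,000 objects, each on some three-node combination.
placements = [random.choice(plans) for _ in range(10_000)]

# Fail one node; each affected object is re-replicated from one survivor.
failed = "node0"
rebuild_reads = Counter(
    random.choice([n for n in plan if n != failed])
    for plan in placements
    if failed in plan
)
print(rebuild_reads)  # roughly even load across the seven surviving nodes
```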

[Figure 5 diagram: the traditional approach (primary and backup data centers, each with a RAID-6 SAN cabinet holding numbered data blocks, parity, and a spare drive) contrasted with the TransLattice approach, where the numbered blocks are spread in varied combinations across the cluster’s nodes.]

Figure 5. TransLattice’s redundancy model combines business continuity and storage redundancy into a cohesive architecture, while ensuring that redundancy can be quickly restored after failure.

When analyzing the reliability of a system, we typically look at two key industry-standard metrics: MTBF (Mean Time Between Failures), which specifies the rate at which infrastructure component failures are expected to occur, and MTTR (Mean Time To Repair), which is the anticipated amount of time required before a failure is repaired. Conventional redundancy architectures often have substantial MTTRs, during which any subsequent failure may cause loss of data or application availability. In fact, many conventional business continuity architectures have no redundancy left while a failover site is active, and deactivating the failover site can be a complicated procedure requiring the manual replication of data from the failover site back to the primary site.

The TransLattice system improves MTTR by automatically managing redundancy in both normal and failure cases. Furthermore, the way that data placement occurs allows all the nodes to fairly amortize the work of restoring redundancy in the event of a failure, thereby speeding recovery. For example, if an organization has an eight-node cluster spread across four locations, with policy specifying that each piece of information must be stored on at least three nodes and in at least two physical locations, there are 56 different ways that each piece of data can be stored. The system selects between these storage plans individually for each object. The large number of plans ensures that only a small proportion of data objects lose redundancy if two nodes fail, while also ensuring that all nodes will share the small amount of work required to restore compliance with the redundancy policy.
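The figure of 56 plans can be checked directly: with two nodes per location, every three-node combination already spans at least two locations, so the count is simply the number of ways to choose three nodes from eight. A small sketch (node and site names are illustrative):

```python
from itertools import combinations

# Hypothetical layout: eight nodes spread evenly across four locations.
site_of = {f"node{i}": f"site{i // 2}" for i in range(8)}

# Minimal plans satisfying the policy: at least three nodes,
# spanning at least two distinct physical locations.
plans = [
    plan for plan in combinations(site_of, 3)
    if len({site_of[n] for n in plan}) >= 2
]
print(len(plans))  # 56
```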
Because redundancy is automatically managed by the distributed system, failures of individual disks or processing nodes do not require administrator intervention. Administrators can simply replace the damaged resources whenever it’s convenient. Additionally, no resources are dedicated purely to redundancy or spares, so all parts of the system can be used to satisfy user requests.

Scalability

The TransLattice Application Platform also vastly simplifies and reduces many traditional challenges and expenses associated with scaling—including component or vendor limitations, inaccurate planning, or changing business requirements.

Traditional architectures often require manual federation of data, complicated partitioning schemes, or careful balancing of components to meet performance and scalability goals. To illustrate, imagine a current infrastructure composed of a multitude of interdependent and complicated tiers of components, each of which is carefully aligned with respect to another. If one component in the infrastructure does not scale well, the components around it may be held back as well. In some cases, this precarious framework demands a “forklift upgrade,” where the existing application stack must be completely replaced with another of greater scale.

This level of complexity makes capacity planning in traditional deployments tremendously important when anticipating future demand. Overbuilding an application is expensive in both operating and capital costs, but infrastructure that is originally provisioned too small may need to be replaced prematurely and may not be able to meet business needs.

Due to TAP’s scale-out capabilities, however, organizations can easily expand capacity and storage by adding nodes to the cluster or by leveraging utility computing services in the cloud. This ease of scalability frees organizations from the need to overprovision in the face of uncertain or intermittent future demand, and instead allows them to accommodate business needs with agility.


Network Services

TransLattice nodes are designed to simplify management because they are self-administering and require minimal local configuration. Additionally, TransLattice nodes are typically deployed on dynamic (DHCP) addressing, which eliminates the need for reconfiguration when computing resources are moved or network renumbering occurs. All communications between nodes occur over an SSL-based secure overlay network, and nodes automatically maintain connectivity with other nodes to provide cluster services.

When users access the application, they do so through a Service Entry Point (SEP). The administrator specifies a number of these entry points where users connect into the cluster to obtain services. Nodes arbitrate with each other for ownership of SEP addresses; as long as one node remains functional on the subnet where a SEP resides, services will remain available through that SEP. When users connect, they are directed to the node that can most effectively handle their requests—taking into account loading, data location, and the user’s location on the network. This provides linear scalability of load-balancing performance. In the event of node failure, another node takes over its address and continues servicing requests. The end result is that connectivity between elements of the system and end-users’ ability to reach the system remain consistent even in the event of failure.
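The sketch below illustrates the failover behavior just described: if the node owning a SEP address disappears, a surviving node on that subnet claims the address. It is a simplified model with assumed names and election logic, not TransLattice’s arbitration protocol.

```python
# Simplified SEP ownership model (assumed logic, not the real protocol).
def elect_sep_owner(sep_subnet, nodes):
    """Pick an owner for a SEP address from the live nodes on its subnet.
    Any deterministic tie-break works; here, the lowest node ID wins."""
    candidates = [n for n in nodes if n["subnet"] == sep_subnet and n["alive"]]
    if not candidates:
        return None  # SEP unavailable: no functional node on that subnet
    return min(candidates, key=lambda n: n["id"])

nodes = [
    {"id": 1, "subnet": "10.0.1.0/24", "alive": True},
    {"id": 2, "subnet": "10.0.1.0/24", "alive": True},
]
print(elect_sep_owner("10.0.1.0/24", nodes)["id"])  # -> 1

nodes[0]["alive"] = False  # the owner fails; node 2 takes over the address
print(elect_sep_owner("10.0.1.0/24", nodes)["id"])  # -> 2
```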

TransLattice nodes maintain global connectivity on the overlay network through a variety of mechanisms. Nodes attempt to open connections to nodes with which they are not directly connected, using a predetermined static address, an address found using DNS-Service Discovery, or an address provided by another adjacent node that has direct connectivity to the desired node. In short, this means that if two nodes have any connectivity path and can connect to any common node, they can find each other for direct communication. As a result, minimal administration is required, and the system maintains connectivity when network changes occur.
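As a rough sketch of that connection-establishment order (the function and helper names here are assumptions, not TAP’s API), a node might try candidate addresses for an unreachable peer like this:

```python
# Assumed helper names and data shapes, for illustration only.
def candidate_addresses(peer, static_config, dns_sd_lookup, neighbor_tables):
    """Yield possible addresses for a peer in the order described above:
    static configuration, DNS-Service Discovery, then addresses relayed
    by adjacent nodes that already have direct connectivity to the peer."""
    if peer in static_config:
        yield static_config[peer]          # predetermined static address
    yield from dns_sd_lookup(peer)         # DNS-SD results, if any
    for table in neighbor_tables:          # addresses known to neighbors
        if peer in table:
            yield table[peer]

addrs = candidate_addresses(
    "node7",
    static_config={},                            # no static entry for node7
    dns_sd_lookup=lambda peer: [],               # DNS-SD finds nothing
    neighbor_tables=[{"node7": "203.0.113.7"}],  # a neighbor reaches it
)
print(list(addrs))  # ['203.0.113.7']
```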

Management & Administration

Because TAP’s significantly different architecture unites many components of the framework into a single, cohesive system, the platform automates many operations that have traditionally required careful tuning and configuration by personnel. No longer bound by tedious and tactical chores, administrators gain back valuable time to apply towards strategy and fundamental business needs. Instead of worrying about the minute details of network and resource utilization, for example, personnel can focus on delivering new functionality in the application, addressing new business cases, and planning infrastructure.

Any remaining administrative actions are tied directly to business requirements. IT staff can carefully determine what types of policy should be in place for data storage and redundancy, the structure of the underlying computing resources, and the types and methods of access provided to end-users. While administrators are no longer required to micromanage information storage, they can always determine the location of data through reports. Reports provide data access locations, which can be useful when analyzing how the system is used and where capacity might need to be added.

Summary

TransLattice offers significant advantages in the deployment and operation of relational applications. Unlike conventional infrastructures, the TransLattice Application Platform offers a geographically distributed architecture, including a decentralized application server and the first truly distributed relational database. Unified, simplified management provides administrators with greater levels of control over policy and data location, while the TransLattice platform intelligently automates the process of distributing data and maintaining redundancy. As a result, enterprise applications become highly resilient against disasters and data loss, while scalability and performance limitations are eliminated.

Ultimately, TransLattice is redefining application infrastructure to align more closely with the overall objectives of the business. With the TransLattice system in place, organizations save time, significantly reduce costs and complexity, and become better positioned to focus on value-generating business concerns. This is a dramatic change, indeed—but one that offers dramatic rewards.

Corporate Headquarters: TransLattice, Inc. | 2900 Gordon Avenue, Santa Clara, CA | phone: (408) 749-8478 email: [email protected] | translattice.com

© 2011 TransLattice, Inc. All Rights Reserved. TransLattice and the TransLattice logo are property of TransLattice, Inc. in the United States and other countries. Part # 9800-0001-03