24 x 7  StarTeam

A practical look at high availability

A Borland White Paper

Randy Guck Chief Scientist, Borland Software Corporation

January 2005

24 x 7 Borland StarTeam

Contents

Overview

High-availability fundamentals
    How available is highly available?
        A distorted term
        Availability by the numbers
        The myth of the nines
        A better approach to availability
    High availability at what cost?
        Availability vs. investment
        ALM high availability in perspective
    Enemies of high availability
        Infrastructure failures
        Application outages
        Plan of attack

StarTeam high-availability best practices
    Administrative practices
        Practice #10: Don’t be cheap
        Practice #9: Enforce security
        Practice #8: Centralize your servers
        Practice #7: Enforce change control
        Practice #6: Document everything
        Practice #5: Test everything
        Practice #4: Design for growth
        Practice #3: Choose mature software
        Practice #2: Choose mature hardware
        Practice #1: K.I.S.S.
    Flash control
    StarTeamMPX
    StarTeamMPX Cache Agent
    Backup procedures
        Backup procedure prior to StarTeam 2005
        Backup procedure with StarTeam 2005
    Redundancy: Reducing SPOFs
    Failover management
        Failover prerequisites
        Active/passive failover configuration
        Process monitoring with Borland Deployment Op-Center
    Disaster Recovery
        Replication for DR
            Synchronous replication
            Asynchronous replication
            Batch replication
    Other StarTeam 2005 high-availability features

Summary

Glossary

References


Overview

Software configuration management (SCM), change management, and other application lifecycle management (ALM) processes are quickly becoming mission-critical processes within the enterprise. Correspondingly, organizations that depend on Borland StarTeam are seeking ways to maximize its availability and resiliency to keep development teams running uninterrupted. In this white paper, StarTeam high-availability topics are discussed, such as online backups, failover techniques, and disaster recovery. Specific StarTeam capabilities and techniques that afford continuous operation in the global enterprise are described, including features introduced in the 2005 release.

The goal of this white paper is to help you prepare a comprehensive high-availability plan for your organization’s use of StarTeam. Because there are many factors that contribute to a comprehensive plan, the first part of this paper discusses general high availability issues such as what high availability means and costs vs. benefits. The second part focuses on specific strategies and options that you can use with StarTeam to provide the level of availability needed by your organization. You will find that high availability can be achieved with a graduated series of measures, each of which improves availability at an additional cost.

High-availability fundamentals

As you probably already suspect, there are no true 24 x 7 applications. Even the most critical and downtime-sensitive systems experience glitches and even failures. The real goal for critical applications is to maximize availability, coming as close to 24 x 7 operation as possible. In this section, we’ll take a practical look at what high availability means, what it costs, and the factors that make high availability a challenge.


How available is highly available?

In this section we take a look at what high availability means and how we measure it.

A distorted term Back when mainframes ruled, high availability had a well-defined meaning. Unfortunately, with today’s myriad hardware and software products, the term has been hyped, marketed, and distorted to mean many things. Depending on whom you ask, high availability could mean any of the following:

• 24 x 7 uptime: A worthy goal, but not absolutely attainable

• Clustering: One of many technologies that can improve availability

• Failover: Any of several techniques for recovering from an outage

• Online backups: Another technique that can improve availability by reducing scheduled downtime

• “Five nines” or “Six sigma”: 99.999% or 99.9999% availability, measured or predicted over time

• Currently not dating: As we said, the term has lots of meanings.

Availability by the numbers Most commonly, the availability of a given service is measured as a percentage of time that the service is actually or theoretically available, on average, compared to the desired availability of the service. The following chart translates uptime (and downtime) percentages into real time:


% Uptime                % Downtime    Downtime per year      Downtime per week
99%                     1%            3.65 days              1 hour, 41 minutes
99.9%                   0.1%          8 hours, 45 minutes    10 minutes, 5 seconds
99.99%                  0.01%         52.5 minutes           1 minute
99.999%                 0.001%        5.25 minutes           6 seconds
99.9999% (“six sigma”)  0.0001%       31.5 seconds           0.6 seconds

This chart assumes that the desired availability is 24 hours a day, seven days a week (24 x 7). In practice, many services have availability windows that are not 24 x 7 but perhaps 24 x 5 or even 12 x 5. For such services, it does not make sense to target—and pay for—a 24 x 7 availability plan when it is not needed. For example, you might be able to perform offline maintenance tasks on weekends while still meeting availability needs on weekdays.
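Under a 24 x 7 window, these figures follow from simple arithmetic. The sketch below (ours, not part of the original paper) reproduces the chart’s numbers in Python:

```python
# Translate an uptime percentage into downtime over a period.
# Reproduces the figures in the chart above for a 24 x 7 window.

def downtime_hours(uptime_pct, period_hours):
    """Downtime in hours for a given uptime percentage over a period."""
    return period_hours * (100.0 - uptime_pct) / 100.0

for pct in (99.0, 99.9, 99.99, 99.999, 99.9999):
    per_year = downtime_hours(pct, 365 * 24)   # hours per year
    per_week = downtime_hours(pct, 7 * 24)     # hours per week
    print(f"{pct}%: {per_year:.2f} h/year, {per_week * 60:.2f} min/week")
```

For 99.99% uptime, for example, this yields roughly 52.5 minutes of downtime per year and 1 minute per week, matching the chart.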

The myth of the nines Many people get stuck in the trap of focusing on the number of availability “nines” and then attempt to build systems that meet the chosen metric. This approach is very difficult for a number of reasons:

• Specific teams and individuals tend to request more availability than they really need. No one likes the thought of being unable to get to a needed service, but occasional, short service outages rarely cause serious financial loss. Project managers will say, “Our project is critical—we need 99.99% availability.” But in practice, software development teams are rarely unproductive during short application outages.

• True reliability metrics are very difficult to compute. Because of the number of variables and interactions between system components, reliability computations are very complex. For example, consider a system comprised of seven components, each of which has a stated reliability of 99.99%. The system’s reliability would be 0.9999 raised to the seventh power, which is 99.93%. A 0.07% reliability difference might not sound like much, but the predicted downtime of each component, 52 minutes per


year, grows to over 6 hours per year when measured at the system level. (References cited at the end of this article guide you through the math, if you really want to do it.)

• Downtime is often affected by future, unforeseeable business decisions that cannot be entered into simulations and computations. For example, your business might decide to install new security patches, accelerate a product upgrade schedule, or add a new business unit to existing servers, requiring some project refactoring.

• Availability is actually a function of mean time between failures (MTBF) and mean time to repair (MTTR). The generally accepted formula is: A = MTBF / (MTBF + MTTR). This means that availability is increased as MTBF is increased or MTTR is decreased. Again, the actual availability computations are complex: a disk drive might have an MTBF of 5 years, but if you depend on lots of them and do not maintain an inventory of replacement drives, your MTTR could be very high.

Goals such as “six sigma” sound good, but such a system, with 99.9999% availability, would be allowed to have only 6 minutes of downtime every 11.4 years! (You don’t even want to know what it costs to build a system that meets that metric.)
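The arithmetic behind these points can be made concrete with a short sketch (ours; the component numbers are purely illustrative):

```python
# A = MTBF / (MTBF + MTTR), plus the series-reliability effect
# described above: component availabilities multiply.

def availability(mtbf, mttr):
    """Availability from mean time between failures and mean time to
    repair, both expressed in the same unit (e.g., hours)."""
    return mtbf / (mtbf + mttr)

# Illustrative component: fails every 1,000 hours, takes 1 hour to repair.
a_component = availability(1000.0, 1.0)          # ~0.999

# Seven components in series, each 99.99% available:
a_system = 0.9999 ** 7                           # ~99.93%
downtime_per_year = (1 - a_system) * 365 * 24    # a bit over 6 hours
```

Note how a long MTBF is undone by a long MTTR: halving repair time improves availability just as surely as doubling time between failures.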

A better approach to availability Instead of focusing on availability percentages, there’s a more practical approach: focus on scenarios and probabilities. Examine your organization’s needs, the possible service disruptions, and the failure scenarios ranked from most to least likely, and develop a plan to prevent or quickly recover from the most likely problems. Here are some sample questions you and your organization should ask in preparing your plan:

• What are the critical hardware and software components that, should they fail, will cause a service outage?

• How can single points of failure (SPOFs) be eliminated?

• How would a critical component be repaired or replaced if it failed, and what is the estimated time to repair or replace? Have failure scenarios been tested?


• Does your organization have comprehensive backup, recovery, and disaster recovery plans? Have they been documented and tested?

• Is a plan in place so that knowledgeable staff members needed to enact a recovery plan can be located?

The advantage of scenario-based high availability planning is that problems can be identified, addressed, and even tested on a priority and cost/benefit basis. The next section discusses the cost/benefit topic in more depth.

High availability at what cost?

As with most things, the effectiveness of a high availability plan depends on what you put into it. In developing a plan, what should you focus on first? Should you spend money on mirrored disks? Or should you start by just documenting the backup procedure? As you might expect, there are simple things you can do that cost little but are highly effective. At the other end of the spectrum, you can spend lots of money to cover fringe conditions that are highly unlikely to occur.


Availability vs. investment As a guideline for where to start first and what to do next, consider the following chart:

Figure 1: Availability vs. investment
[Chart: availability (vertical axis) increases with investment (horizontal axis); the measures, from lowest to highest cost, are basic administrative practices, demand peak management (flash control), backup procedures, redundancy (no SPOFs), failover management, and disaster recovery planning.]

This chart suggests that a service’s availability increases as investment increases to address specific issues. It also suggests that there are basic steps that can be taken at relatively low cost but with high impact. For example, if you establish basic administrative practices, engage features that manage demand peaks, and use sound backup procedures, you are halfway up the Availability axis with little expenditure on the Investment axis.

In preparing a high-availability plan, you should take the following measures in this general order:

1. Enact basic administrative practices: This includes such things as training for IT staff members, defining and enforcing security measures, and enforcing configuration change control.

2. Manage demand peaks: This includes capacity management and instituting mechanisms to prevent service “brownouts” or “flash floods” caused by congestion. StarTeam provides specific features to help smooth out demand peaks.


3. Establish sound backup procedures: With the 2005 release, StarTeam supports online backup procedures that prevent disruption to users.

4. Use redundancy to eliminate SPOFs: Eliminating single points of failure requires some capital investment but allows you to go to the next level of availability.

5. Implement a failover plan: Failover management allows a service to more quickly recover from a hard failure than simply restoring from backups. Failover Management Systems (FMS), also known as cluster management systems, provide failover capabilities with standby servers.

6. Prepare a disaster recovery plan: Total site failure is the least likely scenario you will face, but it is also the scenario that requires the most preparation and planning. If the criticality of your business requires it, you must plan for the disaster scenario.

Each of these topics is addressed in more detail later in this article. But first, let’s compare the high availability requirements of ALM systems to those of other systems.

ALM high availability in perspective Obviously, the costs devoted to improving availability must be balanced with the benefits of increased availability. Which kind of systems typically spend the most money for maximum availability? They generally fall into two classes:

Life-rated systems: These are applications on which human life depends, such as onboard flight software, emergency response applications, and command-and-control systems. The failure of such systems endangers human life. Correspondingly, these applications invest heavily in measures such as reliability, fault tolerance, and resiliency.

High financial cost systems: These are applications that, when unavailable, can create significant and sometimes cumulative costs to business. Examples include stock-trading systems, online reservation systems, and banking systems. At the extreme end of this spectrum, systems are placed in hardened bunkers that can withstand earthquakes and


tornados, and power and cooling systems are installed that allow continuous operation when public utilities are down.1

Although application lifecycle management (ALM) services are becoming more mission-critical, they do not have the same financial or “loss of life” impact as the systems described above. Consequently, it doesn’t make sense to model high availability after them. Establishing comprehensive backup and recovery procedures makes sense. Building an underground bunker so that StarTeam can withstand a Class 5 hurricane probably does not make sense. Your money would probably be better spent elsewhere.

Enemies of high availability

To develop a comprehensive high-availability plan, you should consider all of the potential failures that could create a service outage. How do you enumerate all potential failures? This section provides food for thought on potential failures, in both hardware and software. Use it to help you examine your environment from end to end.

Infrastructure failures In this section, we use the term “infrastructure” to designate any equipment or service outside of a specific application required for the application to be accessible. If an application is operating normally but external factors cause it to be unreachable, it is the same to end users as the application itself failing. Here are some items to consider:

• Hardware: First consider the machines on which your StarTeam and database servers run. Any single hardware failure could bring one of these machines down. The most common failures occur with disks, followed (loosely) by memory, power supplies, and motherboards. But don’t forget that network cards, disk controllers, and even fans occasionally fail. Consider each crucial hardware

1 The top airline reservation system (Sabre) uses such a hardened, it-will-not-go-down approach to availability. Despite the cost invested in Sabre, in February 2000 the system went down for about 2 hours because of a router problem.


component and how its failure can be quickly repaired or mitigated via SPOF strategies (discussed later).

• Environmental failures: It’s easy to forget that services are subject to environmental factors beyond your immediate control. You certainly depend on power; you might also depend on water for cooling. How reliable are the utility services in your area? Are your services critical enough that you should have onsite backups for these services? You should also consider your site’s vulnerability to natural disasters such as fires, floods, hurricanes, and earthquakes. Unfortunately, you also might have to think about unnatural disasters such as terrorism. Does your organization have emergency plans for these kinds of failures?

• Network outages: There are a variety of issues you should consider specific to the network. One way to divide the concerns is as follows:

• LAN outages: Consider various network failures on the LAN. Switches, routers, firewalls, and other equipment can fail or be improperly configured. Cables can fail or become susceptible to interference from new EMF sources. With StarTeam, both the StarTeam server-to-database and client-to-StarTeam server network segments are critical paths.

• WAN outages: You might have clients who require access from outside of the immediate network. If they rely on VPN service, what happens when the VPN machine fails? If outside access to the building relies on an ISP, what happens when that ISP’s service fails? What happens when a backhoe cuts the external network lines leading to the building?2

• Service outages: Even when the physical network is intact, users rely on a number of network services in order to find and reach an application service:

2 Note from the author: Although this scenario may sound improbable, I have heard the backhoe-cutting-the-lines scenario from three separate companies!


DNS, DHCP, a directory server, and even email. Does your high-availability plan include these components?

• Database outages: Databases need their own care and feeding for proper health. Do you have a plan in place to monitor disk usage to prevent unexpected out-of-disk issues? Do you know how your settings affect recovery time after a server reboot? Do you regularly scan for index corruption and general database health?

• Bandwidth issues: At virtually every point in the system, you should consider capacity. Too many users with not enough bandwidth could result in network congestion, database congestion, or server resource starvation.

• Denial-of-service attacks: In this age of the Internet, you also need to worry about viruses, worms, and denial-of-service (DOS) attacks. A special case of the DOS attack is the distributed DOS (DDOS) attack, in which IP storms are sent from many machines across the Internet. Make sure your security measures are adequate.

Infrastructure issues alone provide a lot to think about. But if you don’t find the weakest links in your high-availability plan, Murphy’s Law will find them for you.

Application outages Infrastructure issues aside, let’s look at the ways in which applications can fail, reducing availability. For each of these, we will provide a peek at how these are addressed with StarTeam. In later discussions, StarTeam options are discussed in more detail.

• Application brownouts: When server applications become heavily loaded, internal locks and bottlenecks can cause disproportionate unresponsiveness. (In immature applications, server congestion can become problematic even under moderate loads.) In client/server environments, client requests come in waves, and demand peaks or “flash floods” can occur.

As a mature client/server system, StarTeam is designed for highly concurrent access, and application brownouts are mostly avoided through architectural design and years


of experience and fine-tuning. Additionally, StarTeamMPX is a product that reduces demand per client and smooths out demand peaks.

• Server outages: No application is bug free,3 and all are prone to a variety of errors: logic flaws, deadlocks, exceptions, out-of-memory conditions, etc. In many cases, these errors are fatal and will cause the application to terminate. The only lesson here is that the probability of falling prey to such errors is related, in part, to the maturity of the application and its mechanisms to defend against failures.

StarTeam provides extensive fault capturing logic to “catch” otherwise fatal errors, generate diagnostic information, and continue processing. Configurable diagnostic options allow various levels of information to be captured, which can be used to analyze errors, develop patches, and prevent recurrences. This diagnostic feedback approach has proven very beneficial in preventing errors and improving the resiliency of StarTeam.

• Scheduled outages: Applications occasionally need scheduled downtime for maintenance procedures and product upgrades. Even when necessary, these outages constitute downtime for the affected user community.

With the 2005 release, StarTeam allows online backups to eliminate corresponding scheduled downtime. New memory management and caching algorithms allow StarTeam to operate virtually nonstop even while the repository grows and evolves. Vault storage can be expanded dynamically, and archive files can be offloaded and reloaded on the fly. Only certain administrative functions such as product upgrades require scheduled downtime.

3 Even the space shuttle software, for which NASA paid $500,000,000 (or $1,000 per line of code), has bugs. The first shuttle flight (STS-1) was scrubbed at the last minute because of a previously undetected timing flaw.


Plan of attack

From a user’s perspective, a service is “down” when it is not available for any reason. Consequently, a comprehensive high-availability plan must consider all potential outages from end-to-end, on a cost/benefit basis. In the next section, we discuss high-availability best practices for StarTeam.

StarTeam high-availability best practices

Enough background. At this point, you have enough information to start building your organization’s high-availability plan. Where do you begin? The chart previously discussed in the “Availability vs. investment” section provides the roadmap: start with relatively low-cost measures that improve high availability, and work your way up the cost-vs.-benefit ladder. How far you go depends on your organization’s needs.

Administrative practices

The most cost-effective measures you can take to assure high availability are basic administrative practices. They are simple procedural guidelines or sensible rules-of-thumb. Just for fun, here they are in a top 10 list.

Practice #10: Don’t be cheap A cost/benefit approach means “don’t spend where the return doesn’t justify the investment.” The other side of the same coin is “don’t be cheap where the investment is worthwhile.” This means you should plan what you need and then spend to fulfill the plan with quality components. “Bargain” network cards, disks, and memory are no bargain if they are flaky. If you’re betting your disaster recovery plan on quality backups, use reputable media and proven utilities. If you need StarTeamMPX to mitigate demand peaks (or boost performance for distributed teams), buy it.


Practice #9: Enforce security StarTeam has a powerful security model; use it. Only administrators should have user accounts with administrative privileges. Physical access to the configuration files and vault folders (archive, cache, and attachments) should be controlled. Use the StarTeam view mechanism to partition artifacts and restrict access for external groups. Install and maintain antivirus software and other security measures. Do not let viruses or the accidental deletion of a critical file disrupt operations.

Practice #8: Centralize your servers Behold the fool saith, “put not all thine eggs in one basket”—which is but a manner of saying, “scatter your money and your attention”; but the wise man saith, “put all your eggs in the one basket and—WATCH THAT BASKET.” — Mark Twain

Centralized repositories are easier to manage and more cost-effective. For more details on why this is true for StarTeam and for guidelines on how to optimize StarTeam for distributed teams, see the BorCon session article Optimizing StarTeam for Distributed Teams.

Practice #7: Enforce change control You already use StarTeam for software change control, so why not use it for operational change control as well? Check in all your backup scripts, cluster configuration files, backup and recovery plans, and all other important operational artifacts. Then, use StarTeam change requests (CRs), process items, and other change management features to control and track changes to those artifacts. (Of course, keep a printed copy of important recovery procedures for obvious reasons!)

Practice #6: Document everything Document backup, recovery, repair, failover, and disaster recovery procedures. Provide training on these procedures to key personnel, and have qualified backup personnel identified. Maintain a list of personnel who have the appropriate training, and publish contact information on how to reach them. Check these documents into StarTeam and place them under change control.


Practice #5: Test everything No plan is worth its weight in paper if it doesn’t work. Make sure your plans work by practicing them. Include extra hardware in your resource plans so you can test various recovery scenarios, including partial recovery (e.g., the database only or a few vault files) and total recovery (e.g., entire repository or entire site) scenarios. Use recovery testing as a training vehicle for administrators. Log who has conducted which recovery tests, and check the training and test logs into StarTeam.

Practice #4: Design for growth If you are just starting out, capacity planning far down the road might be difficult. But once your organization gets a few projects under its belt, start tracking capacity trends. What is the disk usage history of your StarTeam vaults and databases? What is the usage trend for network throughput, CPU usage, and I/O rates? Build a few simple response-checking applications, run them periodically, and store the test results in StarTeam. Use all of this information to stay on top of your capacity needs. Don’t run out of network, disk, or other critical resources and find yourself addressing issues in firefighting mode.
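As one hedged example of such a response checker, the sketch below times a plain TCP connection to a server port. The host and port are placeholders, not StarTeam specifics; a richer check would issue a real request via the StarTeam SDK.

```python
import socket
import time

def check_response(host, port, timeout=5.0):
    """Time a TCP connect to a server port; return seconds, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None  # unreachable or refused: treat as an outage

# Example run (host and port are placeholders for your StarTeam server):
elapsed = check_response("127.0.0.1", 49201, timeout=2.0)
if elapsed is None:
    print("ALERT: server unreachable")
else:
    print(f"connect time: {elapsed * 1000:.1f} ms")
```

Run a checker like this periodically from a scheduler, log the results, and the trend line becomes your capacity-planning data.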

Practice #3: Choose mature software You’ve chosen StarTeam, so you’re off to a good start. Maintain the trend as you choose clustering software, backup utilities, monitoring tools, etc. Remember practice #10.

Practice #2: Choose mature hardware For each component within the hardware architecture, choose equipment from proven vendors. If you’re using fault-tolerant servers, RAID, or clustered servers, ensure that the hardware has been proven in that role. Make sure software has been qualified with the platform software—OS, database, clustering software—that you intend to use. Finally, don’t mix hardware types as redundant instances of the same component. For example, don’t use Intel Pentium® III as a failover secondary for a Pentium Xeon™ primary server.

Practice #1: K.I.S.S. Everything should be made as simple as possible, but not simpler. — Albert Einstein


The number one practice is: keep it simple silly.4 This practice might seem in conflict with practice #10 (don’t be cheap), but it really means that you must find the “as simple as possible, but not simpler” balance. A plan can fail from overdesign and unnecessary complexity just as it can from having gaps. Keeping a high-availability plan simple means:

• Do not overdesign backup, recovery, failover, and other operational plans. Cover the bases, but keep procedures easy to understand and follow.

• Do not install extraneous hardware on StarTeam and database servers. If you don’t plan to burn CDs on these servers (and you shouldn’t), don’t put a CD burner on these systems. Similarly, don’t add extra peripherals and I/O cards that you don’t need. These components can fail, and the drivers that communicate with them can have bugs.

• Do not run extraneous applications on your StarTeam and database servers. Having multiple core applications on the same machine increases the chance of unintentional resource interference. With the relatively low cost of rack-mount systems, err on the side of separation rather than sharing.

• Isolate your test and production networks. You need both environments, but don’t let the test environment’s network traffic bleed onto the production network. If you are using StarTeamMPX, the test environment should have its own Message Broker “cloud”, unconnected to the production cloud.

• Minimize manual tasks. Every manual task required in an operational procedure is a task that could be skipped or performed incorrectly. Where feasible, automate backup, failover, and other processes with scripts or custom StarTeam SDK applications.
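For example, a nightly backup step might be scripted rather than run by hand. The sketch below is ours, with placeholder paths; the real procedure depends on your repository layout and database vendor.

```python
import datetime
import pathlib
import shutil

# Placeholder locations -- substitute your actual vault and backup paths.
VAULT_DIR = pathlib.Path("/starteam/vault")
BACKUP_ROOT = pathlib.Path("/backups/starteam")

def nightly_backup(vault_dir=None, backup_root=None):
    """Copy the vault to a date-stamped backup folder and return its path."""
    vault_dir = pathlib.Path(vault_dir or VAULT_DIR)
    backup_root = pathlib.Path(backup_root or BACKUP_ROOT)
    target = backup_root / datetime.date.today().isoformat()
    shutil.copytree(vault_dir, target)  # copy every archive file
    # A real script would also invoke the database vendor's backup
    # utility here and verify the resulting archives afterward.
    return target
```

A scripted step like this can be scheduled, logged, and checked into StarTeam under change control along with the rest of your operational artifacts.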

Flash control

Client/server systems typically have “demand peaks”. This means that client requests come in “waves”, where wave “valleys” represent relatively quiet periods and wave “peaks” represent

4 OK, the original version is “keep it simple stupid”, but we don’t want to be rude here.


higher-than-average request periods. Peaks are sometimes daily occurrences caused by time-based events such as the following:

• If your business has regular working hours, many employees may arrive and log on in the morning at around the same time. Even though log on is quick, users might immediately begin checking out new files or reviewing new CRs.

• Users might launch large tasks just before going to lunch. Report generation or build scripts (that check out files in bulk) are two examples of bulk tasks.

Scenarios such as these will generate daily, cyclical demand peaks. Demand peaks also can be caused by calendar-based events such as end-of-week master builds or end-of-month reports. When demand peaks are sufficiently large, overall responsiveness will drop. In severe cases, you could have a “brownout” where responsiveness is so slow that the service is effectively, well, ineffective. Despite its scalability and performance features, StarTeam is not immune to demand peak brownouts.

Where does congestion occur during these “flash floods”? Consider the following diagram:


Figure 2: Demand peak congestion areas
[Diagram: multiple StarTeam clients connect to the StarTeam Server through the command API; the server in turn accesses the vault and the database. Congestion can arise along each of these paths during demand peaks.]

StarTeam uses a standard client/server architecture in which each client creates a TCP/IP command API connection to the server. During demand peaks, congestion can occur in these areas:

• Network traffic between clients and the server can become a bottleneck as the total volume of request and reply message traffic approaches network maximum bandwidth.

• Internal StarTeam server resources (e.g., locks) can become congested as the total number of client requests exceeds processing power.

• The I/O path to the vault can become a bottleneck if many file check-in and check- out requests are being processed in parallel.

• The network path to the database (or the database’s I/O path to disk) can become a bottleneck if many parallel database queries are being performed.


Sometimes demand peak congestion can be relieved by adding hardware, such as another network card or more CPUs. Sometimes tuning options such as increasing the maximum command threads or the database connection pool can be used to relieve congestion. (See the StarTeam Performance and Scalability Techniques article referenced at the end of this article for more information on these options.)

Ideally, you can monitor and project your peak demand requirements and plan your infrastructure accordingly. Some capacity planning guides suggest that you deploy hardware to meet three to ten times the peak demand you expect to prevent “flash floods”. But demand peak capacity planning is difficult, and purchasing more hardware than you really need can be costly.

Rather than trying to predict and provision for demand peaks, what if you could smooth out demand, making peaks significantly smaller? This is one of the benefits of StarTeamMPX.

StarTeamMPX StarTeamMPX (also called MPX) is an optional component to StarTeam. It is included with StarTeam Enterprise Advantage. The basic MPX architecture is shown below:

Figure 3: The basic StarTeamMPX architecture


The key component added by MPX is the Message Broker, which acts as a communication broadcast service. The Message Broker adds a publish/subscribe “broadcast channel” to the StarTeam framework. Update objects are “pushed” to StarTeam clients, eliminating the need for them to “pull” updates from the server using refresh and polling techniques. The result is that both the average network traffic and the server processing required by clients are reduced. Consequently, basic MPX reduces or prevents “flash flood” periods in which a large set of transactions bombards the server at the same time.

StarTeamMPX Cache Agent

The StarTeam 2005 release adds the MPX Cache Agent. Whereas standard MPX distributes metadata and database objects to online clients, the Cache Agent persistently stores file revisions at locations throughout the enterprise. An example Cache Agent added to the MPX architecture is shown below:

Figure 4: StarTeamMPX Cache Agent architecture

Cache Agent-aware StarTeam clients can check out file revisions from network-near Cache Agents, offloading additional demand from both the StarTeam Server and long-distance network connections. Studies have shown that up to 33% of the total transactions and up to 98% of the total outbound message traffic from a typical StarTeam Server consist of file checkout requests. As a result, the Cache Agent amplifies the ability of MPX to smooth out demand peaks and prevent flash floods.

See the references at the end of this article for more information on MPX and the Cache Agent.

Backup procedures

First, let’s review some general rules about backup procedures:

• Mirroring does not replace backups: Just to be clear, you need backup procedures even if you use mirroring and other redundancy techniques. Why? If a file becomes corrupted or erroneously deleted, mirroring simply reproduces the corruption or the file deletion. You still need backups to recover from many failure scenarios.

• Backups are an important part of high availability: This might not seem obvious, but remember that MTTR is the other variable in the availability equation. Without backups, the “R” in MTTR might be very difficult to provide.

• Test integrity of backups periodically: Just because tapes are spinning and data is flowing doesn’t mean your backups are working. Drives fail, media fails, and backup utilities don’t always detect these. Perform periodic recoveries with your backups to make sure everything is working.

• Consider a rotating storage scheme: Consider using a rotating or hierarchical storage system, which can facilitate your disaster recovery plan as well. In a three-level rotation scheme, three complete sets of backups (or three sets of full and incremental backups) are rotated through three different storage sites, at least one of them far enough offsite to be useful in a disaster-recovery scenario. Disaster-recovery planning is discussed in more detail later.
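To illustrate the “test integrity of backups” rule above, a periodic verification job might restore a backup to a scratch area and compare checksums against the live tree. The sketch below is hypothetical (the restore step, directory layout, and function names are assumptions, not StarTeam tooling):

```python
import hashlib
from pathlib import Path

def file_md5(path: Path) -> str:
    """Compute the MD5 digest of a file in streaming fashion."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_restore(source_dir: Path, restored_dir: Path) -> list:
    """Return relative paths of files that are missing or differ
    between the live tree and a freshly restored backup copy."""
    problems = []
    for src in source_dir.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file():
            problems.append(f"missing: {rel}")
        elif file_md5(src) != file_md5(restored):
            problems.append(f"differs: {rel}")
    return problems
```

A job like this, run against a test restore of each backup set on a regular schedule, catches failing drives and media before you need the backup in anger.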

Older (6.0 and prior) StarTeam releases require a different backup procedure than StarTeam 2005 and beyond, so let’s go over each separately.


Backup procedure prior to StarTeam 2005

If you use StarTeam 6.0 or older, your backup procedure is probably a variation of this process:

1. Lock the server. This doesn’t kick off currently connected users, but it prevents them from sending commands until the server is unlocked.

2. Back up the database and vault. (Only the archive and attachments vault components require backing up, not the cache.) You might use disk-to-disk copies and/or differential dumps to speed things up, but the process might require minutes to an hour for some repositories.

3. Unlock the server. Clients can resume sending commands to the server.

Why did StarTeam 6.0 and older releases require the server to be locked during backups? The answer lies in the design of the vault, which is illustrated below:


Figure 5: StarTeam 6.0 and earlier vault

The vault used by StarTeam 6.0 and older versions is called the Native I Vault. With this vault, each file added to a StarTeam repository is first written as a base revision to a unique archive file. New revisions checked in are then appended to the archive file as deltas (for text files) or full revisions (for binary files). (Cache files consist of independent full file revisions created for performance reasons.) Vault updates require careful synchronization with the repository’s database to ensure agreement on the number of revisions in each archive file. Consequently, the StarTeam 6.0 backup procedure has to ensure that no updates occur while the database and archive files are backed up.

Backup procedure with StarTeam 2005

With StarTeam 2005, backups can be performed in a completely online manner, requiring no server locking at all. The major change that allows online backups is the development of a new vault, termed the Native II Vault. An overview of the Native II Vault is shown below:


Figure 6: StarTeam 2005 Vault

The Native II Vault adds the notion of a “hive”, which can be thought of as a folder tree with its own archive and cache areas. New file revisions are added to the archive portion of a selected hive in either compressed format (.gz extension) or uncompressed format (no extension). (A folder for “attachments” is still used; it resides outside of the vault hives.) File revisions are stored based on their MD5 checksum, and the same MD5 is never stored twice within the same hive. (The cache area continues to hold uncompressed, full file revisions.) Once a file is added to the archive, it is never modified thereafter. Consequently, each file has a write-once-read-many usage pattern. This also means that, once a file appears in the archive area, it can be backed up without concern that it will ever be modified.
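The MD5-based layout can be sketched in a few lines. The two-level subfolder scheme (e.g., 00/0, ff/f) is inferred from Figure 6; treat the exact prefix lengths, folder names, and the function itself as illustrative assumptions rather than StarTeam’s actual implementation:

```python
import hashlib
from pathlib import Path

def archive_path(hive_root: Path, content: bytes, compressed: bool) -> Path:
    """Map a file revision's content to a write-once location in a
    hive's archive area: <hive>/Archive/<2 hex>/<1 hex>/<md5>[.gz]."""
    md5 = hashlib.md5(content).hexdigest()
    name = md5 + (".gz" if compressed else "")
    # Identical content always maps to the same path, so the same MD5
    # is never stored twice within one hive.
    return hive_root / "Archive" / md5[:2] / md5[2:3] / name
```

Because the path is a pure function of the content, a revision already in the archive is simply found rather than rewritten, which is what makes the archive area safe to back up while the server runs.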

For completeness, note that StarTeam 2005 allows multiple hives, each of which contains its own archive area that must be backed up. This is illustrated below:


Figure 7: StarTeam 2005 Vault architecture

Hives can be added dynamically to expand the capacity of the vault, so you must ensure that your backup procedure keeps pace when a new hive is added.

The new online backup procedure that exploits the characteristics of the Native II Vault is summarized below:

1. First, back up the database using the database’s online backup procedure.

2. When the database backup is complete, back up the attachments folder and each hive’s archive folder online. These backups can be performed in parallel, and a full/incremental backup schedule can be used, such as:

• Perform full backups weekly

• Perform incremental backups daily

Note that the server is never locked, hence full functionality remains available.
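The ordering of the steps above matters: the database backup must finish before the vault copies begin, which is what guarantees the vault is never “behind” the database. A hedged sketch of the orchestration follows; the actual backup commands are site-specific placeholders, not StarTeam utilities:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def backup_repository(db_backup_cmd, folder_backup_cmds):
    """Run the database's online backup to completion first, then back
    up the attachments folder and each hive's archive folder in
    parallel. Every command is a placeholder argument list."""
    # Step 1: database online backup; must complete before step 2 so
    # the vault copy is chronologically "ahead" of the database copy.
    subprocess.run(db_backup_cmd, check=True)
    # Step 2: vault folder backups may run concurrently, because each
    # folder is independent and its archive files are write-once.
    with ThreadPoolExecutor() as pool:
        list(pool.map(lambda cmd: subprocess.run(cmd, check=True),
                      folder_backup_cmds))
```

For example, `backup_repository(["db-online-backup.sh"], [["backup-folder.sh", "hive1"], ["backup-folder.sh", "attachments"]])` would express the two-phase schedule (the shell script names are hypothetical).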


As you might guess, the vault (attachments and archive) backup in this scheme will be chronologically “ahead” of the information represented in the database backup. That is, the archive and attachment folders may contain new files that are not represented in the captured database. But this is fine: the recovery procedure allows for the time mismatch. Here is the recovery plan that matches the backup plan summarized above:

1. Reload the database from the last backup.

2. Simultaneously, reload the archive and attachment folders from the last backup. If full and incremental backups were used:

• Reload the last full backup

• In parallel, reload all subsequent incremental backups.

When all loads are complete, the repository is ready to use. If the archive or attachment folders contain “future” files not represented in the database, that’s OK: StarTeam will ignore them and, if those file revisions are eventually added again, the existing files will be overwritten.

The bottom line: StarTeam 2005 contributes to your high-availability plan by eliminating the need for scheduled backup downtime.

Redundancy: Reducing SPOFs

The next level of improving high availability for your StarTeam repositories is to eliminate single points of failure (SPOFs). This means using hardware redundancy for all critical components in your infrastructure. General strategies for various components are summarized below:

• Servers: Use machines with dual power supplies, dual fans, error-correcting (ECC) memory, and other redundancy techniques. Note that redundant systems with dual CPUs might still require rebooting after a CPU failure but can sometimes operate without the failed CPU.


• Storage: Use attached or network (e.g., SAN) RAID storage for disks, and provide access via redundant controllers. One strategy is to use attached mirrored disks (RAID 1) for the operating system and StarTeam. Then use network storage (RAID 0+1, RAID 1+0, RAID 5, etc.) for the vault and database components so that they can be accessed by multiple systems.

• Network: Eliminate SPOFs in the network by deploying dual NICs on server machines and dual switches. To support external users, obtain outbound connections from two separate ISPs (and ensure that they don’t subcontract to the same provider!).

Graphically, a StarTeam environment with fully redundant hardware will look something like this:

Figure 8: StarTeam redundant hardware example

In this configuration, operations will continue through the failure of many (but not all) of the redundant components. In some cases, such as a CPU failure, a system restart will be required to restore normal operation. And of course there are scenarios, such as motherboard failures or physical damage (e.g., a soft drink spilled into the server case), that might not be recoverable without replacement of the lost component(s) and possibly recovery from backups. You should consider the possibility of these scenarios and the business costs they incur to determine whether you need to go to the next level.

Failover management

With the high availability features described to this point, the StarTeam and database server machines are still, as a whole, single failure points. If a server machine fails beyond the protection provided by its redundant components, you probably have one of two options:

1. The machine might be usable if it is restarted (e.g., with a failed CPU disabled). The operating system might take several minutes to restart, followed by the time to start the StarTeam or database server process, which could require a few more minutes to an hour depending on how large the repository is and how much restart recovery is needed.

2. If the machine is not usable, it will require repair or replacement. At one extreme, it could take several hours to locate a spare machine, install the correct software, configure it, and start it. Even if a “cold standby” machine is available, appropriately pre-installed and preconfigured, inserting it in place of the failed machine and starting it could take an hour or more.

In either scenario, your business needs might require a faster recovery plan for a failed server. The next step in improving high availability is to reduce MTTR in a failover situation by using automated failover management.

Failover prerequisites

These are the basic ingredients you’ll need to construct a failover architecture for StarTeam:

• Server machines: At least two identically configured systems, four if you use separate machines for the StarTeam and database servers (which is the normal case for enterprise repositories). Each set of machines will act as a pair: one as the primary, the other as the secondary.


• Storage: Shared disks for vault components that can be accessed by the primary and secondary StarTeam server machine, and shared disks that can be accessed by the primary and secondary database machine.

• Network connections: You will need separate network segments for each of the following:

• Heartbeat network: These are server-to-server connections that allow each secondary machine to monitor the health of its primary counterpart. (Some experts recommend redundant heartbeat networks so that the heartbeat connections do not themselves become critical failure points.)

• Client-facing network: Also called the “service” network, this is the set of network connections that allow each primary and secondary machine to reach the client community. Usually, only the StarTeam server machines require direct client-facing connectivity, but each machine should have two NICs for redundancy.

• Administrative network: This is an optional third network between servers that allows administrative applications to perform monitoring functions without consuming bandwidth on the heartbeat and client-facing networks.

• Failure Management System (FMS): Also called the cluster management system, this is the software that manages the failover components, detects failures, and performs failover actions. StarTeam has been tested with Microsoft Cluster Server (MSCS).

Note that the FMS might have additional requirements. MSCS, for example, requires a shared disk for the “quorum resource”, which is used to define and manage cluster resources.

Active/passive failover configuration

It is very important to note that StarTeam allows only one active process to access a specific StarTeam repository. For this reason, StarTeam can use “active/passive” clustering (also called “active/standby”), but it cannot use “active/active” or similar configurations that utilize multiple active processes against the same repository. Such configurations will likely cause data corruption within the repository.

To deploy StarTeam on an active/passive cluster configuration, each node must have the following items:

• The identical StarTeam release (including patches)

• A copy of the server configuration file (starteam-server-configs.xml)

• Copies of all MPX profile configuration files (EventServices\*.xml)

• A copy of the StarTeam license file (ServerLicenses.st)

A central concept in the failover architecture is the “resource group”: a set of resources that act as the failover unit. A resource group typically defines the service application (e.g., the StarTeam server process) and shared resources such as disks and database connections. The resource group appears as a “virtual server” with its own host name and virtual IP address.

An example active/passive cluster configuration is shown below (only one pair of servers is shown in this diagram, and the optional administrative network is not shown):

Figure 9: Active/passive cluster example


As shown, the resource group is mapped to the primary machine (Active Server), which hosts the active StarTeam server process. Connections to the virtual server (IP address 12.34.56.78) are routed to that machine. The secondary machine (Passive Server) has physical connections to the shared disks and the client-facing network, but they are not active.

Failover can be manually initiated, for example when maintenance on the primary machine is needed, or it can be automatic, for example when the secondary machine detects a loss of heartbeats on the primary machine. When failover occurs, the FMS software transfers control of shared resources to the secondary machine, starts the application (StarTeam server process) on that machine, and routes new connections to it. This is illustrated below:

Figure 10: Failover example

What do StarTeam clients see when a failover occurs? Because StarTeam uses TCP/IP connections, all connections are lost when the primary machine fails. Hence, StarTeam clients will see a “connection lost” message the next time they use their client in a way that requires server interaction. In that case, they must re-open the project they were using with the same parameters (virtual server name + port number). For most users, by the time they detect a connection loss and attempt to re-open the project, the failover will have completed, so they will be able to resume work right away. (One StarTeam customer reports testing failover and seeing the secondary system ready for work within 24 seconds of the initial failure.)
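From the client’s side, failover recovery amounts to reconnecting to the same virtual server name and port, since the FMS routes the virtual IP to whichever node is currently active. The generic retry loop below illustrates the idea; it is not StarTeam’s client API (which is proprietary), just a plain-TCP sketch:

```python
import socket
import time

def connect_with_retry(virtual_host: str, port: int,
                       attempts: int = 10, delay: float = 5.0) -> socket.socket:
    """(Re)connect to the virtual server. Because clients always use
    the virtual host name and port, no reconfiguration is needed when
    the secondary node takes over after a failover."""
    last_error = None
    for _ in range(attempts):
        try:
            return socket.create_connection((virtual_host, port), timeout=10)
        except OSError as exc:  # refused/reset while failover is in progress
            last_error = exc
            time.sleep(delay)
    raise ConnectionError("failover did not complete in time") from last_error
```

With the customer-reported 24-second failover time quoted above, a retry loop of this shape would typically succeed within the first few attempts.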


In case you’re wondering, StarTeam’s auto-reconnect functionality does not work in a failover scenario. This is because the StarTeam server process maintains session state to support auto-reconnect. That state is not persisted; hence, all clients must establish new sessions when connecting to a newly-started StarTeam server process.

Process monitoring with Borland Deployment Op-Center

In an enterprise environment, many processes might warrant remote monitoring: one or more StarTeam server processes, Message Brokers, Cache Agents, the workflow notification agent, etc. Not all of these services warrant installation and control by an FMS solution. For example, MPX Message Brokers and Cache Agents are optional processes; StarTeam clients can perform all normal functions without them.

Borland Deployment Op-Center (BDOC) is a tool that assists with the management of enterprise processes distributed over many machines, including processes that might be under the control of FMS. BDOC allows you to monitor multiple processes from a single console. For non-FMS controlled processes, BDOC also provides manual and automatic remote start/restart functions. An example of the BDOC monitoring console is shown below:

Figure 11: Borland Deployment Op-Center


A detailed discussion of BDOC is beyond the scope of this article—see Borland’s Web site (http://www.borland.com) for white papers and technical information on BDOC.

Disaster recovery

Disaster recovery (DR) is needed when your site experiences a widespread, systemic failure. Natural disasters such as fire, flooding, or tornadoes could cause such a failure, as could unnatural (manmade) disasters. A failure at a local power station that requires a week to repair would also constitute a disaster. The common theme is that the entire infrastructure must be restored, usually at another site.

As you can imagine, DR preparation is a major investment. You must be able to recreate the core of your production environment—servers, applications, databases—at a remote location. You must also be able to provide users access to those core services, which means that your plan must include spare client machines, software, and remote network access. And of course, to ensure that your DR plan works, you must test it. Because a disaster is the least likely failure scenario you will face, and because of its cost, DR planning is the highest rung of the high-availability cost/benefit ladder.

For the rest of this topic, we will focus on the measures required for StarTeam to participate in a DR plan.

Replication for DR

Let’s assume you’ve taken care of all the DR basics: establishing an offsite recovery location, planning for needed equipment to be available, storing offsite copies of needed software, etc. The operating system, application software, and even configuration files are fairly static and can be stored offsite ahead of time. The core challenge for repositories is to keep an up-to-date copy of business data at an offsite location. For StarTeam, this means a copy of the database as well as the attachments and archive portions of the vault. (Remember that cache files are automatically created as needed.) This is essentially a replication problem.


The types of replication that you can perform can be categorized based on the latency with which information is copied. The replication latency determines how up-to-date the remote site’s data will be should a failure at the main site occur. There are three levels of latency:

• Synchronous: Data is copied in nearly real time to the remote location.

• Asynchronous: Data is copied with some delay, but often enough that the remote site will be only a few hours out of date, possibly minutes.

• Batch: Data is copied in periodic bulk batches, usually as a copy of the backup media.

In each case, it is important that the data being copied is “transactionally consistent”. For StarTeam, this means that a complete, recoverable copy of the database is sent and that all archive and attachment files described by the database are sent. (As discussed under Backup Procedures, the archive and attachment backup can contain additional files that are in the “future” with respect to the database, but it cannot be missing files reflected by objects stored within the database.) Techniques that can be used for each level of latency are described next.

Synchronous replication

This is essentially long-distance mirroring, and it is the most expensive form of replication. There are two basic approaches, both of which can be pricey:

• Hardware approaches such as Fibre Channel mirroring, which can be performed at distances up to 10 km. With newer fibre technologies, this distance is being stretched all the time.

• Software approaches such as Veritas Volume Replicator.

The advantage of synchronous replication is that the offsite copy is always up-to-date, so virtually no data is lost. In fact, disaster recovery in this mode is more like a long-distance failover scenario. Some FMS software can even manage failover when long-distance mirroring is used.


The disadvantage of synchronous replication is cost: software, hardware, high-speed site-to-site network bandwidth, and administration. Also, the network or fibre connection must be highly reliable and probably redundant. But if the high-availability needs of your organization are sufficient, the cost of synchronous replication might fit your cost/benefit formula.

Asynchronous replication

It is possible to use asynchronous replication for StarTeam disaster recovery. The basic strategy for asynchronous replication has two components:

• Database-provided replication: This means replication specifically designed for your database system. Disk-level (volume) replication will not work in an asynchronous mode because transactional integrity cannot be guaranteed. Examples of qualified database replication techniques are SQL Server “Log Shipping” and Oracle Standby Database Replication.

• Incremental vault file copy: This means utilizing a utility that can continuously detect new archive and attachment files and copy them to the secondary system. Such a utility leverages the “write once” feature of the new StarTeam 2005 Native II vault.
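Because Native II archive files are write-once, an incremental copy utility only needs to detect files missing at the destination and copy them; an existing replica file never needs refreshing. A hypothetical sketch of one polling pass of such a utility (the function and directory names are assumptions, not an existing tool):

```python
import shutil
from pathlib import Path

def sync_new_files(archive_root: Path, replica_root: Path) -> int:
    """Copy to the replica any archive file not already present there.
    Archive files are write-once, so presence alone proves currency;
    returns the number of files copied in this pass."""
    copied = 0
    for src in archive_root.rglob("*"):
        if not src.is_file():
            continue
        dest = replica_root / src.relative_to(archive_root)
        if not dest.exists():
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
            copied += 1
    return copied
```

Run on a short interval (or driven by a file-system watcher), a pass like this keeps the offsite archive within minutes of the primary, which is the currency window asynchronous replication aims for.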

The advantage of asynchronous replication is that it requires less network bandwidth than synchronous replication. It is also lower in cost, and the “currency” window—the time by which the replicated database lags the primary database—can be tuned. The disadvantage of this approach is that it still requires a fairly reliable network, and, in the event of a failover to the remote site, some data will be lost.

This strategy has to be qualified with the term “possible” because it is not yet in use. Because StarTeam 2005 is a new release as of this writing, no customers are yet using an asynchronous replication strategy (that we know of). Hopefully, future versions of this article will document best practices for this technique, derived from real-world implementations—maybe yours!5

5 Contact me ([email protected]) if you are successfully using an asynchronous offsite replication procedure, and I’ll add it to the next version of this article.


Batch replication

“Never underestimate the bandwidth of a station wagon filled with tapes barreling down the highway” — Unknown

The most common (and lowest-cost) offsite replication technique used for DR purposes is batch replication. This means transferring backup copies to the remote site via removable media or, if the bandwidth is available, via the network. The most common practice is to rotate daily backups through the offsite storage area so that it is within one or two days of the most recent backup. (See Backup Procedures described earlier.)

The advantages of batch replication are its simplicity, reliability, and low cost. Considering the increased high availability afforded by having a DR strategy, daily use of couriers, FedEx, or Tony’s 1987 Volvo wagon is a good tradeoff. The biggest disadvantage of replication via backups is the greater window of data that could be lost if a disaster occurs. Unless network transfer is used, this approach also relies on manual steps to handle and ship media.

Other StarTeam 2005 high-availability features

StarTeam 2005 provides a few other new high-availability features that have not been described elsewhere in this article. These are not best practices per se, but rather important features that are good to know about:

• StarTeam 2005 vault conversion: If you upgrade an existing StarTeam repository to the 2005 release, you can convert your existing (Native I) vault to the new (Native II) format while the StarTeam server is fully operational. The vault conversion runs as a background thread; it can be started and stopped at any time; and it can be scheduled to run at specific times. No scheduled downtime is needed for vault conversion.

• Dynamic vault expansion: The new vault format also allows space to be added dynamically without stopping the StarTeam server. If you see that vault disk volumes are running low, you can add a new vault hive (e.g., from a SAN array) and begin using it on the fly.


• Archive file management: The new vault format is resilient to the offloading of archive files. Old archive files can be dynamically copied to secondary storage and removed. Should a user attempt to check out a file revision whose archive file has been removed, the user will be notified of which file needs to be reloaded. The required file can be reloaded and the checkout operation subsequently repeated, all without stopping the server.

• New memory management algorithms: In previous StarTeam releases, memory usage of the StarTeam server process grew according to caching options (ChangeRequestsCaching, FilesCaching, etc.), the number of objects within the repository, and the usage patterns of users. For large repositories, the StarTeam server process could eventually run out of virtual memory, requiring a restart to flush internal caches. With the 2005 release, new memory management algorithms allow the StarTeam server process to detect available memory and prevent out-of-memory conditions regardless of caching options, repository size, or usage patterns.

Summary

In this article, we have shown that the pursuit of 24 x 7 operation is a quest to balance high-availability requirements with financial considerations. In short, high availability is a cost/benefit analysis. A comprehensive high-availability plan for StarTeam must consider the risk and mitigation factors of all system components, end to end. StarTeam possesses specific features that can help you develop a high-availability plan that meets your organization’s needs.

The steps you can take to maximize high availability follow a graduated scale of cost/benefit topics. The topics, organized from lowest cost and most certain impact to highest cost and least probable impact, are:

• Review administrative practices: Follow the top 10 policies for maintaining sound operational procedures for your organization.


• Smooth demand peaks: Prevent demand peak “flash floods” by using StarTeamMPX, including the new Cache Agent component.

• Establish online backup procedures: Prevent scheduled downtime for backups by exploiting new StarTeam 2005 features.

• Eliminate SPOFs: Use redundancy to eliminate single points of failure.

• Use clustering for automated failover: Deploy an active/passive cluster to reduce MTTR when a system failure occurs.

• Implement a disaster recovery plan: Use a synchronous, asynchronous, or batch offsite storage plan to allow recovery in the event of a total site failure.

And don’t forget to document your high-availability plan and test it!

Glossary

24 x 7 (or 24 x 365): The continuous operation of an application, termed 24 x 7 to indicate availability “24 hours a day, 7 days a week.” (Alternatively, 24 x 365 indicates “24 hours a day, 365 days per year.”) In practice, no system can provide 100% availability, even under the most generous budgets. Hence, 24 x 7 operation is a goal to be strived for.

active/passive clustering: A clustering configuration in which a primary (“active”) node provides a service while a secondary (“passive”) node waits as a hot standby, ready to take over processing should the primary node fail.

Cache Agent: The StarTeamMPX component that provides file-caching functions. The Cache Agent provides distributed teams a high-performance alternate source for file checkout operations. The Cache Agent moves network traffic and processing away from the StarTeam server, helping to reduce demand peaks.

demand peak: The period of time when client request traffic rises to levels significantly above average traffic levels. A sharp demand peak that significantly affects responsiveness is referred to as a flash flood.

disaster recovery (DR): A recovery scenario in which a total site failure has occurred. Critical applications typically must be reestablished at an offsite location.


failover: A recovery scenario in which a “standby” server takes over the processing of a failed server. Compared to recovery from backups, hardware replacement, and other recovery scenarios, failover is designed to have a much lower MTTR.

Failure Management System (FMS): A software management system that provides automated failover capabilities. FMS systems are also known as cluster management systems because clustering is used to provide standby servers.

fault tolerance: The resiliency of a component in the face of a failure. Hardware is typically considered fault tolerant when it has redundant components that mitigate failures.

five nines: A term for 99.999% availability.

flash flood: A sharp increase in client request traffic (see demand peak), which often renders a service so unresponsive as to be unusable.

high availability: The ability of an application to meet or exceed the availability requirements of its users.

MTBF: Mean time between failures. The projected time that a hardware component can remain in continuous operation before it fails.

MTTR: Mean time to repair. The projected time required to repair, replace, or circumvent a failed component.

NIC: Network interface card. Most commonly an Ethernet card running at 100 megabits (100BaseT) or 1 gigabit (1000BaseT).

online backup: The ability to perform a backup of an application’s persistent data without having to shut down the application or otherwise make its services unavailable. A backup of a StarTeam instance (configuration) includes the database and the archive and attachments components of the vault. Beginning with the 2005 release, StarTeam supports online backups.

six sigma: A term for 99.9999% availability. A lofty goal.

SPOF: Single point of failure. A single hardware or software component that, if it fails, will cause a service outage.

StarTeamMPX: An add-on product for StarTeam that adds a publish/subscribe framework to the client/server architecture. StarTeamMPX (also referred to as MPX) boosts server scalability and client responsiveness while reducing overall network traffic.


References

• Borland StarTeam MPX Server, May 2003. Available at: http://www.borland.com/products/white_papers/starteam_mpx_server.html. This paper provides a high-level overview of StarTeamMPX.

• Optimizing StarTeam for Distributed Teams, September 2004. Presented at BorCon 2004. (Watch http://bdn.borland.com for forthcoming download instructions.) This article describes techniques for optimizing StarTeam for distributed teams.

• StarTeam Configuration Best Practices, December 2003. Presented at BorCon 2003. Available at: http://bdn.borland.com/article/0,1410,31869,00.html. This article presents best practices for creating StarTeam server configurations, defining and managing projects, and using StarTeam views. Do’s and don’ts, sample development scenarios, and general tips and techniques are presented for using StarTeam effectively.

• StarTeam Performance and Scalability Techniques, December 2003. Presented at BorCon 2003. Available at: http://bdn.borland.com/article/0,1410,31868,00.html. This article describes techniques for maximizing the performance and scalability of StarTeam servers. Hardware and capacity planning, StarTeamMPX configurations, and performance tuning are all discussed.

• StarTeam Security Explained!, September 2004. Presented at BorCon 2004. (Watch http://bdn.borland.com for forthcoming download instructions.) This article explains StarTeam ACLs, ACEs, and other StarTeam security concepts and how to configure security for your StarTeam projects.

Made in Borland® Copyright © 2005 Borland Software Corporation. All rights reserved. All Borland brand and product names are trademarks or registered trademarks of Borland Software Corporation in the United States and other countries. All other marks are the property of their respective owners. Corporate Headquarters: 100 Enterprise Way, Scotts Valley, CA 95066-3249 • 831-431-1000 • www.borland.com • Offices in: Australia, Brazil, Canada, China, Czech Republic, Finland, France, Germany, Hong Kong, Hungary, India, Ireland, Italy, Japan, Korea, Mexico, the Netherlands, New Zealand, Russia, Singapore, Spain, Sweden, Taiwan, the United Kingdom, and the United States. • 23254
