
Ensuring Application Uptime in the Cloud

The Importance of Uptime

Nearly every business is now reliant, in some way, on the availability of one system or another. An hour of downtime can, for an ecommerce retailer, mean millions in lost revenue; for an independent software vendor it can lead to lost customers and a tarnished reputation. For the average CTO, uptime is a top, even if sometimes unstated, priority.

Whether your infrastructure is on-premise or hosted, dedicated or virtualised, public or private, uptime is a multifaceted issue.

While a move to the cloud, for most, brings with it a welcome increase in the uptime you can expect, there is still a wide range of issues that can bring your systems down. There is also a difference between the uptime guarantee in your average hosting SLA and the actual uptime of any system. Understanding this is fundamental to ensuring that your organisation benefits from the uptime assurances it needs, at a price it can afford.

In this ebook, we look at many ways you can work with your hosting provider to influence the uptime of your cloud hosted applications and what the trade-offs in each case are.

Uptime in the Cloud

Uptime is a measure of the availability of a component or system. It is typically reported as a percentage indicating the ratio of uptime to total time over any period. Depending on the circumstances, uptime is used to refer to the actual, measured uptime of a system, the predicted uptime or the uptime that is guaranteed.
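That ratio can be made concrete with a few lines of arithmetic. The sketch below is purely illustrative, assuming a 30-day month as the reporting period:

```python
# Uptime as a percentage: the ratio of time available to total time.
# A minimal sketch; the 30-day month is an assumption for illustration.

def uptime_percent(downtime_minutes: float, period_minutes: float) -> float:
    """Return availability over the period as a percentage."""
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes

MONTH_MINUTES = 30 * 24 * 60  # 43,200 minutes in a 30-day month

# 43.2 minutes of downtime in a 30-day month works out to 99.9%:
print(round(uptime_percent(43.2, MONTH_MINUTES), 3))
```

The same function works for any period, which is useful when comparing a monthly SLA figure against an annual one.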

In the context of cloud hosting, the term ‘uptime’ is frequently used in reference to the predicted uptime of the data centre in question. This figure, or an improvement on it, is often passed on to you in the form of an uptime guarantee.

But a guarantee, with financial penalties for failure to comply, is not the same as actual system uptime. In addition, it’s entirely possible for the uptime of your specific applications to differ from that of the host data centre since it is affected by a range of other factors. Some of these factors can be the responsibility of your hosting provider while others may fall entirely at your feet. Either way, it’s essential that you take a holistic view of application uptime, and don’t rely entirely on that guarantee, if you are to meet your organisation's requirements for application availability within budget.

In the cloud hosting context, the term uptime is frequently used in reference to the expected availability of a data centre.

1 The True Cost of Uptime
2 Ensuring Application Uptime in the Cloud
3 Single Points of Failure
4 Scalability
5 Application Architecture
6 Application Performance
7 Business Continuity and Disaster Recovery
8 Service Level Agreement
9 Application Security
10 Measurement and Monitoring

www.iomart.com

1 The True Cost of Uptime

Assuming your objective is to secure a desired level of application availability, rather than simply being reimbursed for any downtime, there are a great many ways that this can be achieved. As with most things however, uptime is inextricably linked with cost and each incremental increase is likely to add to your bill.

Therefore, before making changes to your infrastructure or agreements, you need to know how much downtime is, in truth, acceptable to your organisation. Understanding your recovery time objective (RTO) – i.e. how quickly you need your systems and applications to be back up – and recovery point objective (RPO) – i.e. how much data loss between backups or replications you can tolerate following a disruption – will help to guide all your decisions going forward.

Do you need to recover in an hour, a day or a week? And does this differ between systems? Knowing this will help you make the most cost effective choices while achieving the required uptime.
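One way to make these objectives concrete is to check them against what your current arrangements can actually deliver. A minimal sketch, with purely illustrative figures:

```python
# Check recovery arrangements against RTO/RPO objectives.
# All figures below are illustrative assumptions, not recommendations.

def meets_objectives(rto_minutes, rpo_minutes,
                     restore_time_minutes, backup_interval_minutes):
    """Worst-case data loss equals the backup interval; worst-case
    recovery time is the time taken to restore from the latest backup."""
    return (restore_time_minutes <= rto_minutes and
            backup_interval_minutes <= rpo_minutes)

# A 4-hour restore with nightly backups against a 1-hour RTO / 15-min RPO:
print(meets_objectives(rto_minutes=60, rpo_minutes=15,
                       restore_time_minutes=240, backup_interval_minutes=1440))
# The same arrangements against a relaxed 8-hour RTO / 24-hour RPO:
print(meets_objectives(rto_minutes=480, rpo_minutes=1440,
                       restore_time_minutes=240, backup_interval_minutes=1440))
```

The first check fails and the second passes, which is exactly the point: the same infrastructure can be adequate or inadequate depending on the objectives you set per system.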

Before making changes to your infrastructure or agreements, you need to know how much downtime is, in truth, acceptable to your organisation.

2 Ensuring Application Uptime in the Cloud

Data centre tier

Since data centre uptime is the starting point for most predictions of application uptime, it makes sense that we start here also.

The Uptime Institute’s data centre tiers started life as a guide for those building data centre facilities. To the end- customer, they have become the standard measure of a data centre’s resilience and therefore the uptime that can be expected.

The tier of the data centre (or centres) in which you choose to host your applications will therefore have a direct effect on the level of uptime that your provider will be able to guarantee you.

Expected Availability by Data Centre Tier

Tier 1: 99.67%
Tier 2: 99.75%
Tier 3: 99.98%
Tier 4: 99.99%

If money is no problem, then you could of course invest heavily in the most robust hosting environment possible - the tier 4 data centre. But the cost of such an approach may not make financial sense for you. To move the uptime clock up from 99.982% (tier 3) to 99.995%, a tier 4 data centre has to implement an independently dual-powered cooling system along with electrical storage and distribution systems. These do not come cheap and will certainly put the cost of your hosting up significantly.

While these measures do, clearly, improve the expected uptime of the data centre, they protect against some of the less likely single incidents and do nothing to address the wide range of other factors that can still cripple your application. For many businesses, a tier 3 data centre offers a happy medium between high availability and affordable cost, leaving you with some budget to fix other issues.
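To put those percentages in perspective, each tier's expected availability can be converted into the worst-case downtime it permits per year. A minimal sketch, using the Uptime Institute's commonly quoted tier figures and assuming a 365-day year:

```python
# Convert a data centre tier's expected availability into the downtime
# it permits per year. Tier figures are the Uptime Institute's commonly
# quoted values; the 365-day year is an assumption for illustration.

HOURS_PER_YEAR = 365 * 24  # 8,760

tiers = {"Tier 1": 99.671, "Tier 2": 99.741,
         "Tier 3": 99.982, "Tier 4": 99.995}

for tier, availability in tiers.items():
    downtime_hours = HOURS_PER_YEAR * (100 - availability) / 100
    print(f"{tier}: up to {downtime_hours:.1f} hours of downtime per year")
```

The jump from tier 3 to tier 4 buys you roughly an hour a year, which is the context in which the extra cost of tier 4 should be judged.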

3 Single Points of Failure

As the saying goes, ‘a chain is only as strong as its weakest link’. The same goes for your application and the infrastructure it runs on. Analysing the entire chain to identify and remove single points of failure, i.e. single devices that, if they were to fail, would completely prevent the system from functioning, is a key step in achieving maximum uptime.

Single points of failure can take many forms and may or may not be under your direct control. Either way, don’t assume that redundancy or resilience is a given and, together with your hosting provider, take a look at all the links in your chain. As a minimum, we recommend that you look at the following:

• Power supplies - do all critical devices have dual power supplies?
• Network connections - do all critical devices have redundant connections to the network?
• Network switches - are the switches redundant?
• Load balancers - while load balancers share traffic between redundant servers, a single load balancer is a single point of failure that can take your application ‘off-air’
• Firewalls
• DNS servers
• Third-party dependencies
• Routes to the internet via multiple carriers

In most cases we recommend that you introduce device level redundancy to offset the risk of any application outage arising from the failure of one of these components. In some cases, however, your provider may be able to commit to a hardware replacement within your acceptable downtime window. For example, if you decide that your business can tolerate the failure of a specific component for an hour, but a replacement can be installed in 30 minutes, there is no need to implement a fully redundant device in defence against this risk.

[Figure: example of web hosting infrastructure with no single point of failure]

In addition to all of the above, your application itself can be a single point of failure. Refer to the section on application architecture for more information.
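The effect of removing a single point of failure can be estimated with simple probability: components chained in series multiply their availabilities, while a redundant pair fails only when both members fail. A minimal sketch, where the 99.9% per-device availability is an illustrative assumption:

```python
# Availability of chained vs redundant components.
# In series, availabilities multiply; a redundant pair fails only when
# both members fail, giving 1 - (1 - a)**2.
# The 99.9% per-device figure is an illustrative assumption.

def series(*availabilities):
    result = 1.0
    for a in availabilities:
        result *= a
    return result

def redundant_pair(a):
    return 1 - (1 - a) ** 2

device = 0.999  # 99.9% availability per device

# Firewall, switch, load balancer and server chained as single points of failure:
chain = series(device, device, device, device)
# The same chain with every device deployed as a redundant pair:
redundant_chain = series(*[redundant_pair(device)] * 4)

print(f"all single points of failure: {chain:.4%}")
print(f"fully redundant chain:        {redundant_chain:.4%}")
```

Four chained single points of failure at 99.9% each drag the chain below 99.61%, while the fully redundant version stays above 99.999% - which is why redundancy at each link, not just at the servers, matters.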

4 Scalability

Even if your applications are ‘up’ in the broadest sense, any inability to cope with increases in demand can lead to unavailability for some users. Your approach to capacity planning and scalability therefore plays a determinant role in your application’s uptime.

Cloud systems are often thought of as intrinsically scalable, but scalability can mean several things:

• Scaling out - adding new servers, or machine instances, to a cluster
• Scaling up - adding new resources (i.e. CPU, RAM) to an existing server

Scaling up and out are both perfectly possible in a cloud hosted environment, providing you’re willing to pay for it. Scaling out can, theoretically, be performed instantly, whereas scaling up demands that the machine in question be restarted.

Elasticity is the automation of the scaling out process. In an elastic system, the application can monitor demand in real time and make calls to the hypervisor to create or destroy machine instances accordingly. But scaling up, scaling out and elasticity are all worth nothing if your application hasn’t been created to utilise them. [See the section on application architecture.]
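The elastic behaviour described above amounts to a simple control loop. In the sketch below, the `Hypervisor` class is a hypothetical stand-in for a real cloud provider's API, and the thresholds are illustrative assumptions:

```python
# A minimal sketch of an elastic scaling loop. The Hypervisor class is a
# hypothetical stand-in for a real provider API; thresholds are illustrative.

class Hypervisor:
    """Placeholder for a provider API that creates/destroys instances."""
    def __init__(self, instances=2):
        self.instances = instances

    def create_instance(self):
        self.instances += 1

    def destroy_instance(self):
        self.instances -= 1

def autoscale(hypervisor, load_per_instance, low=0.30, high=0.75,
              min_instances=2):
    """Scale out when instances run hot; scale in when they idle,
    never dropping below the redundant minimum."""
    if load_per_instance > high:
        hypervisor.create_instance()
    elif load_per_instance < low and hypervisor.instances > min_instances:
        hypervisor.destroy_instance()

hv = Hypervisor(instances=2)
autoscale(hv, load_per_instance=0.90)  # hot: scale out
print(hv.instances)  # -> 3
autoscale(hv, load_per_instance=0.10)  # idle: scale back in
print(hv.instances)  # -> 2
```

Note the `min_instances` floor: an elastic system that scales in below its redundant minimum reintroduces a single point of failure.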

A sound capacity planning approach combined with knowledge of the application architecture is fundamental in determining what scalability, if any, is required within your infrastructure. For mature workloads with predictable fluctuations in volume, it may be more cost effective to add the required headroom to your system permanently, rather than to pay for flexibility.

5 Application Architecture

Application architecture plays a crucial role in uptime. For instance, if your application hasn’t been developed in such a way that it can utilise redundant or scalable resources, these are not going to offer an effective means of improving uptime and availability.

Creating a robust, scalable application is a vast topic in itself. A potential starting point for your own consideration of this is the issue that prevents most applications from utilising redundant and scalable resources today - being stateful.

But making an application stateless is not trivial and, furthermore, there is no definitive way to do it - there are a number of approaches all with their pros and cons. As a result, a great many applications remain stateful even when hosted in the cloud.

If this is the case for you, you will need to look at meeting your uptime objectives through a combination of other measures such as SLA-backed device swap-outs and master-slave replication.
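One common route to statelessness is to move session data out of the application process and into a shared store, so that any server can handle any request. The sketch below is a minimal illustration of the idea, with a plain dictionary standing in for an external store such as a cache or database:

```python
# Stateful vs stateless request handling. SHARED_STORE is a plain dict
# standing in for an external session store (cache, database, etc.).

SHARED_STORE = {}  # in production this would live outside the web servers

class StatefulServer:
    """Sessions live inside the process: lost if this server dies, and
    invisible to its redundant siblings."""
    def __init__(self):
        self.sessions = {}

    def handle(self, session_id):
        count = self.sessions.get(session_id, 0) + 1
        self.sessions[session_id] = count
        return count

class StatelessServer:
    """Sessions live in the shared store: any server can serve any user."""
    def handle(self, session_id):
        count = SHARED_STORE.get(session_id, 0) + 1
        SHARED_STORE[session_id] = count
        return count

# Two redundant stateless servers can share the same session:
a, b = StatelessServer(), StatelessServer()
a.handle("user-1")
print(b.handle("user-1"))  # -> 2: server b continues where a left off
```

With the stateful version, a load balancer must pin each user to one server (and lose their session if it fails); with the stateless version, redundancy and scaling out actually deliver uptime.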

If your application hasn’t been developed in such a way that it can utilise redundant or scalable resources, these are not going to offer an effective means of improving uptime and availability.

6 Application Performance

Even when demand is within capacity and infrastructure is functioning normally, application performance still has the potential to derail you.

Monitoring resource utilisation, such as CPU, RAM, disk and bandwidth, while planning for spikes and organic growth is critical in maintaining application availability to all users.

Page load time is of special importance as this has a profound effect on the user experience and is even used by Google in their search ranking algorithm. Numerous free and paid tools exist to monitor page load speed.

If your application is used extensively by a multi-national audience, you should consider using a Content Delivery Network (CDN) to mitigate any potential geographic latency. CDNs offer a number of benefits. By caching and serving content at the point ‘closest to the eyeball’, CDNs take load away from your origin servers. Cached content can also continue to be served even in the case of an outage at the origin servers and the CDN can also play a role in DDoS mitigation.

Page load time is of special importance as this has a profound effect on the user experience and is even used by Google in their search ranking algorithm.

7 Business Continuity and Disaster Recovery

There are some threats to application availability that you can prevent, i.e. single points of failure, and there are others that you cannot, like flood, fire and theft. Understanding how you will maintain application availability during or after such a catastrophic event puts you in the realm of business continuity and disaster recovery (BCDR) planning.

In the application infrastructure sense, BCDR requires you to decide how you will return the application to availability, and in what timeframe, if key infrastructure becomes cut off from users.

A geographically diverse infrastructure, naturally, offers a great deal of protection from this type of issue. The question is, however, what to place at each additional site and how much it costs to operate – after all, the cost of duplicating your entire infrastructure at a location only used in dire emergencies doesn’t tend to appeal.

You therefore need to look at each element of your infrastructure, such as servers, storage, the network, etc. and assess the likelihood and potential impact of a failure in order to identify the appropriate BCDR strategy. You should also look at your organisation’s ability to access and manage any redundant infrastructure, and the time it takes to return to live.

Note: A BCDR strategy, infrastructure or failover process that hasn’t been tested is one big single point of failure in your organisation. Don’t let D-day be the day that you find out your process doesn’t quite work the way you expected it to. Bite the bullet, fake an event and test your process.

[Figure: Active-Passive and Active-Active configurations. In each, two data centres contain load balancers, web servers, databases and storage, with backup and replication between the sites. In the Active-Passive configuration, traffic fails over to the second data centre after an outage; in the Active-Active configuration, traffic is served by both sites.]
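A failover drill can be rehearsed in miniature before any real event. The sketch below simulates a health-check loop that redirects traffic when the primary site stops responding; the site objects and names are hypothetical stand-ins for real health checks:

```python
# A minimal failover drill: route traffic to the first healthy site.
# Site objects and names are hypothetical stand-ins for real health checks.

class Site:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

def route(sites):
    """Return the first healthy site in priority order, or None if all are down."""
    for site in sites:
        if site.healthy:
            return site.name
    return None

primary = Site("primary-dc")
standby = Site("standby-dc")

print(route([primary, standby]))  # -> primary-dc

# Fake an event, as the note above recommends, and watch traffic move:
primary.healthy = False
print(route([primary, standby]))  # -> standby-dc
```

The real value of a drill like this is the `None` case: if no site is healthy, you have found the gap in your BCDR plan before your users do.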

8 Service Level Agreement

The Service Level Agreement (SLA) between your provider and you plays a central role in your uptime strategy.

On the one hand you need to be aware of the reality behind the typical uptime guarantees that are made in the average SLA. On the other, you can potentially use the SLA to reduce the cost of achieving your objectives.

The crucial thing to be aware of is the difference between guaranteed uptime and actual uptime. It’s quite normal for hosting providers to improve upon the uptime of the host data centre and their own infrastructure when making an uptime guarantee. Such guarantees are financially underwritten, meaning that when an outage exceeds the agreed duration, you will receive a partial refund. This may be fine for some but, on the basis that you should always plan for outages to occur, you may want to shore up actual uptime in other ways.

But the SLA can also come to the rescue. If your provider can commit to replacing a failed component within your acceptable downtime window, by keeping the relevant parts on the shelf, this can in principle perform the same role as a redundant device. For example, if your business can tolerate an hour of downtime during which a failed component is replaced, this could save you money in application redevelopment and hardware redundancy.
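Whether an outage actually breaches a guarantee comes down to the same arithmetic as the uptime percentage itself. A minimal sketch, where the 99.95% monthly guarantee is an illustrative assumption, not a quote from any specific SLA:

```python
# Compare measured downtime against an SLA's uptime guarantee.
# The 99.95% guarantee and 30-day month are illustrative assumptions.

def sla_breached(guarantee_percent, downtime_minutes,
                 period_minutes=30 * 24 * 60):
    """True if downtime exceeds what the guarantee allows for the period."""
    allowed = period_minutes * (100 - guarantee_percent) / 100
    return downtime_minutes > allowed

# A 99.95% monthly guarantee allows about 21.6 minutes of downtime:
print(sla_breached(99.95, downtime_minutes=10))  # within the guarantee
print(sla_breached(99.95, downtime_minutes=60))  # breached: credit due
```

Run against your own figures, this makes the gap between guaranteed and actual uptime concrete: an hour-long outage breaches the guarantee and earns a credit, but the hour of downtime has still happened.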

The crucial thing to be aware of is the difference between guaranteed uptime and actual uptime.

9 Application Security

The threat from cyber-attacks, in particular Distributed Denial of Service (DDoS), is ever growing and evolving. No business is entirely safe and taking steps to isolate application availability from such menaces is vital.

Work with your provider to ensure that you have:

• A resilient firewall architecture that can sustain failures and is running the latest software versions

• A solution for applying regular patches and updates to operating systems and devices - perhaps as part of a managed service agreement with your provider

• A DDoS attack detection and mitigation policy - even for attacks aimed at another business

• Anti-virus software running where appropriate and, again, updated to the latest versions and virus definitions

• Intrusion detection and prevention systems so that admins know when infiltration is being attempted

• Finally, harden security by ensuring that only the machines that need to be exposed to the web are. Isolate all others

10 Measurement and Monitoring

There is little point investing in improved application availability if you are not monitoring and measuring it too.

Monitoring load, demand and availability all enable you to measure the success of your past uptime initiatives and plan for the future. The monitoring solution you choose should also support alerting staff when incidents occur. It’s then down to you to define the procedures for responding to these alerts.

We recommend using a third party, different to your hosting provider, to monitor availability so that you avoid the possible scenario of everything looking hunky dory from within while your application is down to everyone not on your network.
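Aggregating an external probe's results is how you arrive at the actual, measured uptime discussed throughout this guide. A minimal sketch, with the probe data fabricated purely for illustration:

```python
# Compute measured availability from an external probe's results - the
# 'actual uptime' that a guarantee is not. Sample data is illustrative only.

def measured_uptime(probe_results):
    """probe_results: list of booleans, one per check (True = reachable)."""
    if not probe_results:
        return 0.0
    return 100.0 * sum(probe_results) / len(probe_results)

# 1,440 one-minute checks over a day, including a 15-minute outage:
checks = [True] * 1425 + [False] * 15
print(f"{measured_uptime(checks):.2f}%")  # -> 98.96%
```

Because the checks run from outside your network, this figure captures exactly the scenario described above: it will fall even when everything looks healthy from within.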

Monitoring load, demand and availability all enable you to measure the success of your past uptime initiatives and plan for the future.

Conclusion

Application availability is a complex issue, the product of a great many factors and choices. By systematically reviewing all of the topics in this guide you will arrive at an understanding of the infrastructure threats to your applications’ uptime, potential solutions and the cost of implementing them.

But assessing this wide variety of interdependent factors is no mean feat. Don’t suffer in silence. Make your provider work for your business by getting them to assist you every step of the way.

About iomart

iomart helps organisations maximise the flexibility, cost effectiveness and security of the cloud. With a dynamic range of managed cloud services that integrate with the hyperscale clouds of AWS and Azure, our agnostic approach delivers solutions tailored to your specific requirements.

As the most accredited cloud company in the UK, iomart’s 300+ expert consultants and solutions architects give you all the insight, guidance and technology you need to make the cloud work for you.

Call us now on 0800 040 7228 or email [email protected] to find out more.

www.iomart.com

Images for illustrative purposes only. © iomart.

Lister Pavilion, Kelvin Campus, West of Scotland Science Park, Glasgow, G20 0SP.

iomart is a registered trademark. All trademarks and registered trademarks are the property of their respective owners.