Ensuring Application Uptime in the Cloud

The Importance of Uptime
Nearly every business now relies, in some way, on the availability of one system or another. For an ecommerce retailer, an hour of downtime can mean millions in lost revenue; for an independent software vendor, it can lead to lost customers and a tarnished reputation. For the average CTO, uptime is a top priority, even if it sometimes goes unstated.
Whether your infrastructure is on-premise or hosted, dedicated or virtualised, public or private, uptime is a multifaceted issue.
While a move to the cloud, for most, brings with it a welcome increase in the uptime you can expect, there is still a wide range of issues that can bring your systems down. There is also a difference between the uptime guarantee in your average hosting SLA and the actual uptime of any system. Understanding this is fundamental to ensuring that your organisation benefits from the uptime assurances it needs, at a price it can afford.
In this ebook, we look at many ways you can work with your hosting provider to influence the uptime of your cloud hosted applications and what the trade-offs in each case are.

Uptime in the Cloud
Uptime is a measure of the availability of a component or system. It is typically reported as a percentage indicating the ratio of uptime to total time over any period. Depending on the circumstances, uptime is used to refer to the actual, measured uptime of a system, the predicted uptime or the uptime that is guaranteed.
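As a rough illustration of the ratio described above, the sketch below works out measured uptime over a reporting period from the total downtime recorded. The function name and the sample figures (a 30-day month with 43 minutes of downtime) are assumptions chosen for illustration, not figures from any provider.

```python
def uptime_percentage(total_minutes: float, downtime_minutes: float) -> float:
    """Return uptime as a percentage of total time over a period."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime_minutes must be between 0 and total_minutes")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


# Hypothetical example: a 30-day month with 43 minutes of recorded downtime.
minutes_in_month = 30 * 24 * 60  # 43,200 minutes
measured = uptime_percentage(minutes_in_month, 43)
print(f"Measured uptime: {measured:.2f}%")
```

The same calculation applies whether the figure is measured, predicted or guaranteed; only the source of the downtime number changes.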
In the context of cloud hosting, the term ‘uptime’ is frequently used in reference to the predicted uptime of the data centre in question. This figure, or an improvement on it, is often passed on to you in the form of an uptime guarantee.
But a guarantee, with financial penalties for failure to comply, is not the same as actual system uptime. In addition, it’s entirely possible for the uptime of your specific applications to differ from that of the host data centre since it is affected by a range of other factors. Some of these factors can be the responsibility of your hosting provider while others may fall entirely at your feet. Either way, it’s essential that you take a holistic view of application uptime, and don’t rely entirely on that guarantee, if you are to meet your organisation's requirements for application availability within budget.
1 The True Cost of Uptime
2 Ensuring Application Uptime in the Cloud
3 Single Points of Failure
4 Scalability
5 Application Architecture
6 Application Performance
7 Business Continuity and Disaster Recovery
8 Service Level Agreement
9 Application Security
10 Measurement and Monitoring
www.iomart.com

1 The True Cost of Uptime
Assuming your objective is to secure a desired level of application availability, rather than simply to be reimbursed for any downtime, there are many ways to achieve it. As with most things, however, uptime is inextricably linked with cost, and each incremental increase is likely to add to your bill.
Therefore, before making changes to your infrastructure or agreements, you need to know how much downtime is, in truth, acceptable to your organisation. Understanding your recovery time objective (RTO) – i.e. how quickly you need your systems to recover and your applications to be back up – and recovery point objective (RPO) – i.e. how much data loss between backups or replications you can tolerate following a disruption – will help to guide all your decisions going forward.
Do you need to recover in an hour, a day or a week? And does this differ between systems? Knowing this will help you make the most cost effective choices while achieving the required uptime.
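One way to make these objectives concrete is to compare them with what your current arrangements can deliver. The sketch below is an illustration only: the system names, objectives, backup intervals and restore times are all hypothetical. It assumes the worst-case data loss equals the interval between backups, and it flags any system whose estimated restore time exceeds its RTO or whose backup interval exceeds its RPO.

```python
from dataclasses import dataclass


@dataclass
class SystemObjectives:
    name: str
    rto_hours: float              # maximum acceptable time to recover
    rpo_hours: float              # maximum acceptable window of data loss
    est_restore_hours: float      # assumed time needed to restore service
    backup_interval_hours: float  # assumed time between backups or replications


def gaps(system: SystemObjectives) -> list[str]:
    """List the objectives that the current setup would miss."""
    problems = []
    if system.est_restore_hours > system.rto_hours:
        problems.append("RTO")
    # Worst case, a failure just before the next backup loses a full interval.
    if system.backup_interval_hours > system.rpo_hours:
        problems.append("RPO")
    return problems


# Hypothetical systems with different tolerances for downtime and data loss.
systems = [
    SystemObjectives("online store", 1, 0.25, 4, 24),
    SystemObjectives("internal wiki", 72, 24, 8, 24),
]

for s in systems:
    missed = gaps(s)
    status = "misses " + " and ".join(missed) if missed else "meets both objectives"
    print(f"{s.name}: {status}")
```

Running a check like this per system makes it easier to see where extra spend is justified and where a slower, cheaper recovery is acceptable.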
2 Ensuring Application Uptime in the Cloud

Data centre tier

Since data centre uptime is the starting point for most predictions of application uptime, it makes sense that we start here also.
The Uptime Institute’s data centre tiers started life as a guide for those building data centre facilities. To the end customer, they have become the standard measure of a data centre’s resilience and therefore the uptime that can be expected.
The tier of the data centre (or centres) in which you choose to host your applications will therefore have a direct effect on the level of uptime that your provider will be able to guarantee you.
[Figure: Expected Availability by Data Centre Tier. Tier 1: 99.67%; Tier 2: 99.75%; Tier 3: 99.98%; Tier 4: 99.99%]
If money is no object, you could of course invest heavily in the most robust hosting environment possible: the tier 4 data centre. But the cost of such an approach may not make financial sense for you. To move the uptime clock up from 99.982% (tier 3) to 99.995% (tier 4), a data centre has to implement an independently dual-powered cooling system along with electrical storage and distribution systems. These do not come cheap and will certainly put the cost of your hosting up significantly.
While these measures clearly improve the expected uptime of the data centre, they protect against only some of the less likely single incidents and do nothing to address the wide range of other factors that can still cripple your application. For many businesses, a tier 3 data centre offers a happy medium between high availability and affordable cost, leaving you with some budget to fix other issues.
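To put the tier 3 and tier 4 percentages quoted above into more familiar terms, the following sketch converts an availability percentage into expected downtime per year. It assumes a 365-day year and that the percentage applies evenly across it; the function name is ours.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def annual_downtime_hours(availability_percent: float) -> float:
    """Convert an availability percentage into expected downtime hours per year."""
    return HOURS_PER_YEAR * (100.0 - availability_percent) / 100.0


# Tier 3 and tier 4 availability figures as quoted in the text.
tier_3 = annual_downtime_hours(99.982)
tier_4 = annual_downtime_hours(99.995)

print(f"Tier 3: about {tier_3 * 60:.0f} minutes of downtime per year")
print(f"Tier 4: about {tier_4 * 60:.0f} minutes of downtime per year")
print(f"Difference: about {(tier_3 - tier_4) * 60:.0f} minutes per year")
```

On these figures, the gap between the two tiers is a little over an hour of expected downtime a year, which is the improvement you would be weighing against the higher hosting cost.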
3 Single Points of Failure
As the saying goes, ‘a chain is only as strong as its weakest link’. The same thing goes for your application and the infrastructure it runs on. Analysing the entire chain to identify and remove single points of failure, i.e. single devices that, if they were to fail, would completely prevent the system from functioning, is a key step in achieving maximum uptime.
Single points of failure can take many forms and may or may not be under your direct control. Either way, don’t assume that redundancy or resilience is a given and, together with your hosting provider, take a look at all the links in your chain. As a minimum, we recommend that you look at the following:

• Power supplies - do all critical devices have dual power supplies?
• Network connections - do all critical devices have redundant connections to the network?
• Network switches - are the switches redundant?
• Load balancers - while load balancers share traffic between redundant servers, a single load balancer is a single point of failure that can take your application ‘off-air’
• Firewalls
• DNS servers
• Third-party dependencies
• Routes to the internet via multiple carriers
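The effect of the chain described above can be sketched with some simple arithmetic. When every device in a chain must be up, the availabilities multiply, so the chain is always less available than its weakest link. Adding a redundant copy of each device raises the figure, assuming the copies fail independently, which real systems only approximate. The per-device availabilities below are hypothetical and chosen purely for illustration.

```python
from math import prod


def serial_availability(availabilities: list[float]) -> float:
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


def redundant_availability(availability: float, copies: int = 2) -> float:
    """Availability of identical components where any one copy is enough.

    Assumes the copies fail independently of one another.
    """
    return 1.0 - (1.0 - availability) ** copies


# Hypothetical per-component availabilities (fractions, not percentages).
firewall, load_balancer, server = 0.999, 0.999, 0.995

single_chain = serial_availability([firewall, load_balancer, server])
resilient_chain = serial_availability([
    redundant_availability(firewall),
    redundant_availability(load_balancer),
    redundant_availability(server),
])

print(f"Single of each device: {single_chain:.4%}")
print(f"Redundant pairs:       {resilient_chain:.4%}")
```

A calculation like this is only a guide, but it shows why a single unprotected device, such as a lone load balancer, can undo the benefit of redundancy elsewhere in the chain.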