
Will my applications run slower with cloud storage?

A common question when cloud storage is discussed is “will my applications run slower with cloud storage?” It’s a very valid question, and answering it requires examining key elements of the application behavior, the network, any intermediary devices (including cloud storage gateway devices or on-ramp devices like StorSimple), and of course, the cloud storage provider.

Looking at typical application deployment scenarios, people tend to deploy storage to support those applications using one of three methods:

– Directly-attached storage (DAS), where the storage is captive to the server (either within the server chassis, or a drive tray that is dedicated to the server). This is generally a SATA, SAS, or other SCSI connection (FC-AL, etc)

– Network-attached storage (NAS), where the server accesses a shared pool of storage using file system protocols (such as CIFS or NFS) over the network

– Storage-area networks (SAN), where the server accesses volumes that appear as DAS, even though they are accessed over a network and generally provisioned from a shared storage system

Generally speaking, these storage attachment methods exhibit high levels of bandwidth, low levels of latency, and low levels of packet loss (one exception being NAS over the WAN, but we won’t get into that here).

In the case of cloud storage, your server accesses its storage in one of four (primary) ways:

– Using file protocols directly, which is identical to NAS as described above (CIFS or NFS)

– Using RESTful HTTP-based APIs, which requires that the application or operating system be prepared to handle any intermediary translation between HTTP and one of the protocols mentioned above

– Using SOAP APIs, which again may require translation

– Using a cloud storage gateway or on-ramp device

In all of these cases, accessing cloud storage occurs over a network, and that network has some measurable amount of latency, packet loss, and some limitation on bandwidth capacity. In the data center, latency, loss, and bandwidth may not be much of a concern, aside from environments using applications that are either high-throughput or transactional in nature. Interestingly enough, in many cases people find that their application servers present storage-related performance challenges of their own as well.

In any case, however, decreases in bandwidth, increases in latency, or increases in packet loss can have a severely adverse effect on performance – and this is not just true for storage.

With cloud storage, the effects of latency, packet loss, and bandwidth can be amplified, as you are effectively inserting the Internet between your server and your storage for every IO (assuming you are using a public cloud storage service). Whether or not applications will run more slowly is a function of the BDP (bandwidth-delay product) of the network, along with the number of IOPS, average block sizes, and response time requirements that your application would drive or require in an optimal environment. If the network cannot sustain performance under those metrics, it is highly likely that your application will indeed run more slowly with cloud storage.

So what is a person to do? Cloud storage has obvious benefits – pay-on-use, lower cost per GB, elasticity, simpler management, and so on. How can you keep cloud storage from slowing down your data center application?
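Before turning to that question, the BDP check described above can be made concrete with a rough back-of-the-envelope sketch in Python. The function names and the sample numbers (a 100 Mbps link, 40 ms round trip, a 64 KiB window, 500 IOPS at 8 KiB) are illustrative assumptions, not measurements, and the model assumes a single connection with a fixed window and ignores packet loss entirely.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return bandwidth_bps / 8 * rtt_s


def window_limited_throughput_bps(window_bytes, rtt_s, bandwidth_bps):
    """Best-case throughput of one connection that keeps at most
    window_bytes in flight per round trip, capped by the link speed."""
    return min(bandwidth_bps, window_bytes * 8 / rtt_s)


def required_throughput_bps(iops, block_bytes):
    """Throughput the application drives at a given IOPS and average block size."""
    return iops * block_bytes * 8


def max_serial_iops(rtt_s):
    """Upper bound on IOPS when every IO waits one full round trip
    and IOs are issued one at a time (no overlap)."""
    return 1 / rtt_s


# Hypothetical example values, chosen only to illustrate the arithmetic.
link_bps = 100e6          # 100 Mbps WAN link
rtt_s = 0.040             # 40 ms round-trip time
window_bytes = 64 * 1024  # 64 KiB of data in flight per connection

pipe = bdp_bytes(link_bps, rtt_s)
achievable = window_limited_throughput_bps(window_bytes, rtt_s, link_bps)
required = required_throughput_bps(iops=500, block_bytes=8 * 1024)
fits = required <= achievable
```

With these assumed numbers the link needs roughly 500 KB in flight to stay full, a single 64 KiB window sustains only about 13 Mbps, and the application wants about 33 Mbps, so `fits` comes out false. Serialized IOs would also top out at around 25 per second. Under those assumptions, this application would indeed run more slowly.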

Performance is one of the many benefits of a cloud storage gateway or on-ramp device. Using such a system is not a guarantee that cloud storage will work for your application, as some applications may have storage I/O or access patterns that will not work well even in environments where a gateway is present. However, the majority of applications today that power corporations exhibit two characteristics that make cloud storage gateway devices like StorSimple attractive:

– They exhibit high locality of reference in the data they access

– They work with data that is typically compressible or able to be deduplicated

For example, email data contains text (highly compressible) and attachments (able to be deduplicated), and most people tend to work with email that is the most recent, meaning locality of reference is high (i.e. a ‘working set’ of data can be identified). Document management systems (like SharePoint), file servers, content archives and libraries, and many others, also exhibit these characteristics.
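To see why locality of reference matters so much, here is a small Python sketch that replays a sequence of block accesses through a fixed-size local cache and estimates the resulting average IO latency. LRU eviction is used only as a familiar stand-in; it is not a description of how any particular gateway decides what stays local. The function names, the access trace, and the latency figures are all assumptions for illustration.

```python
from collections import OrderedDict


def lru_hit_ratio(accesses, capacity):
    """Replay a sequence of block IDs through an LRU cache; return the hit ratio."""
    cache = OrderedDict()
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / len(accesses) if accesses else 0.0


def average_latency_ms(hit_ratio, local_ms, cloud_ms):
    """Expected latency when hits are served locally and misses go to the cloud."""
    return hit_ratio * local_ms + (1 - hit_ratio) * cloud_ms


# A hypothetical trace with a working set of three blocks and one cold read.
trace = [1, 2, 3, 1, 2, 3, 1, 2, 3, 4]

fits_working_set = lru_hit_ratio(trace, capacity=3)  # working set fits locally
too_small = lru_hit_ratio(trace, capacity=2)         # working set does not fit

# Assumed latencies: 1 ms for the local tier, 100 ms for a cloud round trip.
avg_ms = average_latency_ms(fits_working_set, local_ms=1.0, cloud_ms=100.0)
```

When the local tier holds the whole working set, most of the trace is served locally. When it is one block too small, this particular access pattern misses every time, which shows why sizing the local tier to the working set matters.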

The reason cloud storage gateway devices are attractive for these applications is that:

– They provide internal storage capacity, which can be used as a high performance on-premises tier to support your working set of data. This means that the majority of IOs are serviced locally, mitigating the need to go to the cloud

– They automatically detect shifts in the working set (insertion of new data, existing data aging and being used less frequently), and transparently move that data between faster (integrated storage) and slower (cloud storage) tiers

– They compress and deduplicate data, meaning that data is stored efficiently, and fewer bytes are transferred into and out of the cloud. Writes and reads are performed faster, since the amount of data being moved is smaller, meaning less dependence on WAN bandwidth, and less impact in terms of the effect of loss or latency
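The deduplication and compression point in the list above can be illustrated with a minimal Python sketch. It assumes fixed-size 4 KiB chunks, SHA-256 fingerprints, and zlib compression. These are choices made for the example only, and a real gateway may chunk, fingerprint, and compress quite differently. The function names are hypothetical.

```python
import hashlib
import zlib


def dedup_and_compress(data, chunk_size=4096):
    """Split data into fixed-size chunks and keep one compressed copy per unique chunk.

    Returns (store, recipe): store maps a SHA-256 digest to the compressed chunk,
    and recipe is the ordered list of digests needed to rebuild the data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:          # only new, unique chunks are stored or sent
            store[digest] = zlib.compress(chunk)
        recipe.append(digest)
    return store, recipe


def rebuild(store, recipe):
    """Reassemble the original bytes from the chunk store and the recipe."""
    return b"".join(zlib.decompress(store[digest]) for digest in recipe)


def stored_bytes(store):
    """Total size of the compressed unique chunks."""
    return sum(len(chunk) for chunk in store.values())


# Hypothetical payload: ten identical 4 KiB blocks followed by repetitive text.
payload = (b"A" * 4096) * 10 + b"hello world " * 100
store, recipe = dedup_and_compress(payload)
```

Here the eleven chunks in the recipe collapse to two unique chunks in the store, and `stored_bytes(store)` is far smaller than `len(payload)`. Fewer bytes crossing the WAN is what reduces the dependence on bandwidth.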

As a side effect, what our customers have noticed is that:

– Their performance is consistent with what they had experienced with using traditional storage for the applications we are targeting

– The cloud storage capacity costs are reduced, due to deduplication and compression

– The cloud storage transfer costs are reduced, due to deduplication, compression, and the benefits of automated data tiering (we have a patented algorithm called Weighted Storage Layout, or WSL, which performs this task)

– They can leverage the myriad benefits of cloud storage without the performance worry that typically accompanies use of cloud storage

We have a number of customers that are beta testing our system in primary storage use cases for applications that are considered critical to the function of their business, and are finding that performance is consistent with what is provided by traditional storage for those particular applications.

If you’d like more information, I encourage you to drop me an email!

Cheers,

Joel

Addressing Performance, Availability, Data Protection, and Security Concerns in Cloud Storage Environments

Hi all,

We would love to get your feedback on an article we just published on ITBusinessEdge.com, focusing on addressing performance, availability, data protection, and security concerns in cloud storage environments. Please take a read and provide feedback on other issues and concerns you see impacting and slowing broader adoption of cloud storage in your environment.

Cloud Storage 101 Episode 3 – How can cloud gateway devices, or ‘on-ramp’ devices help?

In the past two episodes of cloud storage 101 (ep1, ep2), we discussed what cloud storage is, what value it provides, and an overview of some of the challenges typically associated with cloud storage. In this episode, we’re going to look at how cloud gateway devices, also known as cloud on-ramp devices, can help make cloud storage a reality. In the spirit of intellectual honesty, we can’t make the bold generalization that cloud storage is applicable to any and every application. Quite the opposite, actually. Cloud storage is a wonderful fit for some applications, and for others, it simply isn’t. I tend to be an optimist, so I’m going to focus on how these devices can make cloud storage very useful, rather than on where it can’t.

So, what is a ‘cloud gateway device’, or a ‘cloud on-ramp device’?

Put simply, a cloud gateway device is a device deployed behind your firewall that acts as a useful intermediary between you (your application servers) and a cloud service. Typically, cloud gateway devices perform some function that improves performance, security, usability, or other aspects of the service. Several classes of cloud gateway products exist, and we’re going to focus on those that sit between your application servers and cloud storage services, i.e. cloud storage gateway devices.

Over the last year, a handful of companies have emerged with cloud storage gateway devices, StorSimple included. So what do these devices do? How do they make cloud storage services useful? The following are general characteristics and functions of these devices, and why the functions matter.

1) They give your application servers familiar protocols to use for accessing storage, and natively speak the language of your cloud storage service so your applications don’t have to. This simplifies integration, because many in-house applications (for instance, Exchange, SharePoint), do not natively speak using RESTful APIs when accessing storage. They expect SCSI in its many forms (iSCSI, FC, FCoE).

2) They optimize transmission of data to and from the cloud to improve performance. Typically, this is accomplished through data deduplication, TCP optimization, compression, and other techniques, which A) minimize the amount of data traversing the wire and B) allow the WAN pipe to be utilized in the most efficient manner. Because redundancy has been eliminated, moving a relatively small number of blocks effectively moves a tremendous amount of actual data.

3) They secure data being transferred over the network (data in motion), and the data while it is being stored on the cloud storage service (data at rest), generally using keys you supply, which are not shared with your cloud storage service provider. This means that your data is rendered practically useless to third parties that sniff your WAN, compromise your cloud storage service provider’s network or systems, steal your service provider’s hardware, or anyone that becomes the recipient of your data from your service provider through litigation.

4) They provide storage management functions, including volume management, provisioning, access control (i.e. LUN masking), snapshots, replication, and other functions. This allows you to manage the system as though you were managing an existing storage array, meaning simplified integration into existing processes, procedures, and data protection infrastructure.

Cloud storage gateway devices provide a lot of useful functions which help make cloud storage consumable by your traditional on-premises applications. Good candidates for such models include those that are centered around unstructured data – stored as such, or in any of its permutations (files, email, content repositories, collaboration systems, archives), as they tend to have a high degree of commonality within the data, and only a small portion of the data is really used by the applications and their users at any given point in time (let’s face it, the email never ceases to arrive, but I can only process so much of it in a given day).

Cloud storage gateway devices certainly do a lot more than this, and we feel that we have a very compelling solution for the applications that we’re focusing on. If you are interested in learning more, I encourage you to drop me an email at joel at dot com.

Cheers!

Joel

Cloud Storage 101 Episode 2 – Cloud storage sounds great – why hasn’t it taken off?

In our last blog post, episode 1 – “what the heck is cloud storage?”, we discussed what cloud storage is and some of its attributes. Based on those, it’s apparent why many corporations are interested in being able to take advantage of cloud storage. It’s elastic, inexpensive, and there’s less headache in maintaining complicated storage infrastructure.

So why hasn’t cloud storage taken off?

The truth is, cloud storage isn’t consumable by most traditional applications in your data center. As it is accessed over a WAN (for public cloud storage) using APIs (more often than other access methods), cloud storage has done quite well for custom-built applications where the source code is readily accessible – both in the data center and for applications running on public compute clouds. However, this is not the case when the application expects to speak directly to disk storage using block protocols that carry SCSI, such as iSCSI, FC, and FCoE.

Additionally, many people are nervous about cloud security. In a sense, when you use cloud storage, you’re giving control of your data to a third-party (unless of course you use a private cloud) and introducing a new availability concern (availability being part of the triad of security: confidentiality, integrity, availability). Naturally this brings up questions such as

– “what if my provider loses hardware?”

– “what if my provider’s data center is compromised?”

– “how isolated is my data from the rest of my provider’s customers?”

– “what if my provider is asked to turn over my data?”

– “what if my provider has an outage?”

– … and many more

Alongside the communication issues and security issues are performance issues. Accessing cloud storage involves communication over a network – potentially, the Internet, which is the case in public cloud storage services. All networks – even data center networks – introduce latency, packet loss, bandwidth limitations, congestion, and other issues, all of which can impact performance. Most applications enjoy very high performance access to their storage systems today, because generally speaking the storage is accessed over the local network where these issues are only noticeable in extremely high-throughput environments.

Cloud storage services also lack in the realm of data protection. Many of them will automatically replicate your data to two or more locations, but replication does not solve the issue of providing a consistent copy from a point-in-time in the past. Virtually all companies today rely on snapshots as the foundation of their backup and restore strategy, which when used correctly can create application-consistent, crash-consistent point-in-time copies of application data, to allow the data to be restored in the case of corruption, site failure, data loss, and so on. Cloud storage services generally do not provide this function.
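The difference between replication and a point-in-time copy is easy to show with a toy Python model. Everything here is an assumption for illustration: the `Volume` class, the `replicated_write` helper, and the full-copy snapshot, which real storage systems would typically replace with copy-on-write or a similar space-efficient technique.

```python
class Volume:
    """A toy block volume: a dict of block number -> bytes, plus named snapshots."""

    def __init__(self):
        self.blocks = {}
        self.snapshots = {}

    def write(self, block_no, data):
        self.blocks[block_no] = data

    def snapshot(self, name):
        # Full copy of the block map, kept simple for the example.
        self.snapshots[name] = dict(self.blocks)

    def restore(self, name):
        self.blocks = dict(self.snapshots[name])


def replicated_write(primary, replica, block_no, data):
    """Synchronous replication: every write, good or bad, lands on both copies."""
    primary.write(block_no, data)
    replica.write(block_no, data)


primary, replica = Volume(), Volume()
replicated_write(primary, replica, 0, b"good data")
primary.snapshot("before-corruption")
replicated_write(primary, replica, 0, b"corrupted")  # the bad write is replicated too

replica_has_good_copy = replica.blocks[0] == b"good data"  # False
primary.restore("before-corruption")
primary_recovered = primary.blocks[0] == b"good data"      # True
```

The replica faithfully mirrors the corruption, so it offers no way back. Only the snapshot preserves the earlier state.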

Unfortunately, without addressing these issues, it’s difficult for I/T organizations to take advantage of cloud storage for many of the applications they rely on to power their business.

In the next episode, we’ll talk about how these issues can be addressed. I look forward to your feedback on these items – where you agree, where you disagree, or points I may have missed. Please feel free to use the comments section below and speak up!

Cheers,

Joel

Cloud Storage 101 Episode 1 – What the heck is cloud storage?

Welcome to the first part of Cloud Storage 101, episode #1, “What the heck is cloud storage?”. Unfortunately I don’t have Lucasfilm’s budget so I’m unable to provide a Star Wars entrance for this series, but even if I did, I’m not sure if you would consider it unique, corny, interesting, or what. In any case, in this series we’re going to focus on providing a 5,000 foot view of various aspects of cloud storage. It won’t be all encompassing, but nothing ever is when you’re dealing with topics related to the cloud. I do encourage everyone to comment – good or bad – based on areas where you agree or disagree, see other values with cloud storage, and so on.

So, what the heck is ‘cloud storage’?

Put simply, cloud storage is a product or service that allows you to store your information. Generally, cloud storage is provided by a third party and accessed over the Internet (Azure, AT&T Synaptic, Iron Mountain Digital) – these are commonly referred to as "public cloud storage" (owned by a service provider) – but there are also examples of cloud storage systems that are deployed behind your firewall (EMC Atmos), commonly called "private cloud storage" (owned by you). Cloud storage is similar to traditional storage found in your data center today in that it serves the purpose of storing data, but there are a few key differences.

1) Cloud storage is generally accessed through APIs (RESTful APIs that use HTTP, SOAP APIs, WebDAV). Some cloud storage services on the Internet provide access via file protocols (such as CIFS or NFS) or even FTP. Traditional storage is usually accessed through block protocols (iSCSI, FC/FCoE, all of which carry SCSI as payload) or file protocols (CIFS, NFS)

2) Cloud storage is inherently elastic. Given that most cloud storage services have their roots in providing data storage for really, really massively-scalable Internet applications – or roots in global object namespaces – it’s no wonder that cloud storage can grow by simply adding nodes, without requiring data migration due to running out of space on a volume, as an example. This is largely due to the mindset of some of the revolutionary Internet giants wanting to come up with a way to eliminate operational costs from their business model and improve flexibility

3) Cloud storage can provide pay-as-you-use economics. For public cloud services, storage consumers are generally charged based on the amount of data they store, and how much data they access. For private cloud systems – and also legacy storage systems – there are capital and operational costs associated with hardware purchase, maintenance, support, and other items, which are not a direct factor in the case of public cloud storage services (the provider deals with these, which of course impacts your data storage and data transfer pricing)

So why do people have an interest in taking advantage of cloud storage?

1) It’s pretty inexpensive. Most services charge around $0.15/GB for data stored, and similar prices per GB of data transferred, and the pricing model is generally tiered to provide incentives for people to store more data to get better prices. This makes it far less expensive than traditional storage in many but not all cases

2) The worries of managing storage are relegated to your cloud storage provider. You can pretty much forget about replacing failed hard drives, or migrating data across systems when you run out of space, and many other things that make you want to pull out your hair

3) You can use as much as you want, and are only billed for what you use. Rather than dealing with large purchases up front – which is common since most people over-provision to account for their three-to-five year capacity needs – you get billed based on what you use, when you use it

4) Availability is “built-in”. Most cloud storage service providers automatically replicate your data across their data centers, which are spread across the country or the world. Many claim that availability provided in this manner is lower cost than traditional storage system replication techniques while providing better up-time. Of course, there are other points of failure that need to be considered
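The pay-as-you-use arithmetic from points 1 and 3 can be sketched in a few lines of Python. The $0.15/GB figures come from the text above. Everything else is an assumption for illustration: the function names, the flat (non-tiered) rate, the idea of a fixed billing period, and the sample capacities.

```python
def billing_period_cost_usd(stored_gb, transferred_gb,
                            price_per_gb_stored=0.15,
                            price_per_gb_transferred=0.15):
    """Charge for one billing period at flat per-GB rates (tiered pricing not modeled)."""
    return stored_gb * price_per_gb_stored + transferred_gb * price_per_gb_transferred


def pay_as_you_use_total_usd(start_gb, growth_gb_per_period, periods,
                             price_per_gb_stored=0.15):
    """Storage charges summed over several periods while capacity grows linearly.

    You pay only for what is stored in each period, rather than
    provisioning for the final capacity up front.
    """
    return sum((start_gb + i * growth_gb_per_period) * price_per_gb_stored
               for i in range(periods))


# Hypothetical example: 1,000 GB stored and 200 GB transferred in one period.
one_period = billing_period_cost_usd(stored_gb=1000, transferred_gb=200)

# Hypothetical growth: start at 100 GB and add 10 GB per period for 3 periods.
growing = pay_as_you_use_total_usd(start_gb=100, growth_gb_per_period=10, periods=3)
```

With these assumed inputs, one period comes to about $180, and the three growing periods total about $49.50, because the later capacity is only paid for once it is actually in use.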

Next time, we’ll look at what the issues with cloud storage are in light of all of the benefits provided as described above. If you have anything you’d like to add to the list, or areas where you feel something needs clarification or an alternate viewpoint, please do leave a comment.

Cheers

Joel

StorSimple named ‘Top Ten Data Storage Startup’ by Enterprise Storage Forum

We're humbled by Drew Robb's article on Enterprise Storage Forum today, naming StorSimple as one of ten top data storage startups. This helps to validate that customers do see storage-related issues that keep their applications from being deployed at scale and at reasonable cost. StorSimple's unique approach solves these issues through an elegant blend of on-premises integrated storage, coupled with an on-ramp to cloud storage services (Amazon S3, EMC Atmos, AT&T Synaptic, Iron Mountain Digital), automated data tiering (Weighted Storage Layout), primary storage deduplication, thin provisioning, and application-specific integration components.

Additionally, our team recognized from our past lives that storage products do not exist in isolation. Rather, they are part of a broader data management ecosystem and series of policies and procedures, and StorSimple integrates with these as well, with additional capabilities to further improve companies’ posture towards lower-cost and efficient data protection while also minimizing time to restore (through our Cloud Clones capability). All of this is packaged in an easy-to-use appliance that deploys within 10 minutes in your data center.

We encourage anyone that is interested in learning more about how to address storage issues related to Exchange 2010, SharePoint, Windows User Files, and virtualized environments to take a look at our website, and reach out so we can have a conversation! Our website is https://www.storsimple.com

Thanks, Drew, for choosing StorSimple. We are genuinely thankful!