
Technology Analysis Report: Cloud Storage Gateways

Kiran Srinivasan, ATG, CTO Office
Shwetha Krishnan, ATG, CTO Office
Monika Doshi, SPT, CTO Office
Chris Busick, V-Series product group
Sonali Sahu, V-Series product group
Kaladhar Voruganti, ATG, CTO Office


Cloud Storage Gateways NetApp Confidential

1 SUMMARY

This section presents the key observations and insights of the report; these are elaborated at length in the rest of the report.

1.1 CLOUD GATEWAYS (Section 2)

A cloud storage gateway is a hardware- or software-based appliance located on an organization's premises. It enables applications located in the local datacenter to access data over a WAN from external cloud storage. The applications continue to use iSCSI, CIFS, and NFS protocols, while the cloud storage gateway accesses data over the WAN using APIs such as SOAP or REST. Cloud gateways act as a bridge between enterprise data centers and storage resident at an external service provider, supporting the trend toward hybrid clouds.

Why are gateways important for our customers?
1. Agile storage delivery:
a. Provide access to elastic storage for enterprises, with simpler and more rapid provisioning.
2. Lower infrastructure costs:
a. Pay only for storage used (pay-as-you-go model).
b. Lower capital costs.
c. Reduced storage management, offloaded to the cloud provider.
d. No separate off-site disaster recovery solution needed.

Why are gateways important for NetApp?
1. Another storage tier offering with different SLA properties in our storage portfolio.
a. Cloud storage is viewed as low-SLA storage. A cloud gateway can enhance the value of cloud storage for enterprises through features like security, deduplication, and storage management.
2. Opportunity to offer MSEs a compelling alternative to dedicated backup appliances (e.g., Data Domain).

1.2 RATIONALE FOR CLOUD GATEWAYS (Section 2.3)
 Enterprise backups, archival data, and tape data can leverage elastic cloud storage:
o Roughly three copies of primary data are created for backup and secondary purposes, leading to provisioning issues.
o Cloud storage in a remote data center can serve as an equivalent of off-site tape (for DR).
o Storage is the fastest-growing cloud service (750 billion objects in S3 by late 2011), consisting mainly of archival data and online backups from the consumer space.
o The advent of cloud gateways can facilitate the movement of enterprise data.
 Not all enterprise applications might move to the cloud:
o Migration of compute and storage to the cloud is not cheap unless the application is offered as SaaS (Software as a Service) (e.g., salesforce.com, Microsoft's Office365).
o Security, control, and process concerns will force many larger enterprises to keep at least some applications on-premises.
o On-premises enterprise applications can benefit from elasticity and other cloud storage advantages via cloud gateways.

1.3 KEY CLOUD GATEWAY USE CASES (Sections 3, 4, 5)
 Short term (1-2 yrs): Conduit for secondary storage - backup streams, archival data, and tape data.
 Longer term (2-5 yrs): Conduit for Tier-2 application primary data - Microsoft Exchange, Microsoft SharePoint, home directories.

1.4 OPPORTUNITIES AND THREATS FOR NETAPP (Section 7)
 Threat: Tier-2 application primary data, especially in virtualized environments, forms the bulk of our revenue. That data can move to the cloud via cloud gateways, impacting our revenue significantly. Amazon's AWS gateway messaging suggests that their next version will target primary enterprise data.
 Opportunity 1: Currently NetApp does not have a compelling solution against Data Domain's backup appliances. A cloud gateway solution with inline deduplication, WAN latency optimizations, and integration with NetApp data management features (like SnapVault and SyncMirror) would allow us to compete with them in this $2.18B market.
 Opportunity 2: Cloud gateways can enable easier migration of data to cloud service providers who use NetApp storage. In addition, an integrated solution combining NetApp-based cloud storage and a NetApp cloud gateway can be efficient and compelling.

1.5 KEY COMPETITORS IN THIS SPACE (Section 6)
 Startup vendors: Nasuni (primary focus), StorSimple (SharePoint integration), Panzura (global file system), CTERA (consumer oriented). Enterprise readiness is a question with most of them; only a couple have more than 50 customers.
 Established vendors: Amazon's AWS Storage Gateway and Riverbed's Whitewater appliance. Amazon's gateway, along with other established vendors' forays into cloud storage, highlights that established players are keen to move enterprise data to the cloud.
 EMC has partnerships with almost all gateway vendors. EMC also has Atmos for cloud storage.
 Mode of deployment: All have VSAs; some have both VSAs and physical appliances. Very few have HA capabilities.

1.6 NETAPP ADVANTAGES/DISTINGUISHING FEATURES (Section 8)
 NetApp's data management value: Expose NetApp's value-add in data management (snapshots, cloning, mirroring, SnapVault) on another storage tier - cloud storage.
 SLO-based management: Enable migration of data between traditional storage tiers and cloud storage via SLOs.
 Leverage NetApp technologies: Cloud gateways require write-back caching for performance; NetApp can leverage existing technologies to create an efficient write-back cache that is protected by HA.

1.7 KEY ADDITIONAL IP FOR A CLOUD GATEWAY VIS-À-VIS NETAPP (Section 9)
 Basic cloud gateway infrastructure (for both secondary and primary storage):
o File-to-object protocol conversion.
o Volume-to-objects (or groups of objects) data granularity mapping.
o Security of objects in the cloud (encryption).

 Value-added features (for both secondary and primary storage):
o Compression.
o Deduplication.
o Application integration.
o Cloud-aware data management (like auditing cloud costs).

 Optimizations for a viable primary-storage solution:
o WAN latency optimization via read caching, write-back caching, and prefetching.

 Infrastructure for global collaboration on a primary storage repository:
o Global locking across a WAN.

1.8 RECOMMENDATIONS FOR NETAPP (Section 9, Section 10)
 BUY (Near Term - 1 yr):
o Pros: Lower time to market; compelling and unique IP (as listed in Section 1.7).
o Cons: Enterprise readiness of many startup vendors; integrating an acquired vendor's IP with NetApp data management features requires time and effort.
o Recommendation: Buy only when the IP is hard to develop; chart out a pathway to integrate.
 PARTNER (Short Term - 3 to 6 months):
o Pros: Lower time to market; parity with EMC; integrated solutions that lower TCO.
o Cons: Limited gains; NetApp's data management value-add could be hidden.
 BUILD (Long Term - 2 yrs):
o Pros: NetApp's distinguishing features can be fully leveraged; enterprise readiness.
o Cons: Building a cloud gateway in ONTAP would take beyond 2015 (LB+); building a non-ONTAP solution might limit exposing our value-adds.
The overall recommendation is to partner immediately with cloud gateway vendors and pursue the 'buy' and 'build' options in parallel. Specific projects in the 'build' option are outlined below:
 Enhance our V-Series offering to have a cloud storage backend (already underway).
 ATG projects:
o Understand the performance of primary, Tier-2 applications on a cloud gateway.
o Reliability and security aspects of cloud gateways.
o Unique data management functionality required for cloud gateways.
o Global file system using cloud gateways.

2 INTRODUCTION

The growth of cloud technologies, both public and private clouds, has been driven primarily by perceived reductions in IT costs. The commoditization of server hardware resources (especially CPU cycles, memory capacity, and disk capacity) has been the biggest enabler. In addition, the growth of hypervisor technologies, which increase resource utilization and enable consolidation of application servers, has contributed significantly to this trend. Also, analytics on large data repositories ("big data") have assumed significance for many organizations that derive revenue from web services, e.g., Google, Yahoo, and Amazon. The scale of data, and the need to compute analytics cost-effectively, have forced them to adopt cloud-based infrastructure along with novel computing paradigms such as MapReduce and Hadoop.

2.1 PRIVATE, PUBLIC AND HYBRID CLOUDS

In the context of our discussion, we primarily deal with cloud storage as opposed to cloud compute. Private cloud storage is applicable where enterprises feel insecure about certain types of data leaving their controlled administrative domains, e.g., payroll, source code, or corporate email. Public clouds, on the other hand, are applicable where flexibility in compute and/or storage, as well as ease of management, trumps other administrative considerations. Private clouds require both upfront capital expenses to create them and recurring operational expenses for administration and management. In contrast, the inherent sharing of resources, economies of scale, and multi-vendor competition among public cloud vendors enable a pure operational-expense model with an expected downward tendency in prices. From an enterprise storage perspective, there are clearly some types of data that can be moved to a public cloud, e.g., backups and archival data. Therefore, a hybrid cloud model, a combination of private and public clouds, is expected to emerge for enterprises. Some even speculate that all data will eventually move to public clouds, provided the issues around security, control, and service levels are addressed adequately.

2.2 CLOUD GATEWAYS

For hybrid clouds, and for enterprise datacenters to leverage public clouds, functionality is required that can bridge the two worlds and enable data migration between them. We call this functionality the 'cloud gateway'; it can reside in an appliance or in a virtual machine. Typically, the raw public cloud storage is accessed via a simple, object-based (PUT/GET) interface using a SOAP- or REST-based API over HTTPS. The cloud gateway employs the local disks or flash storage associated with the appliance to cache cloud data. The local disks can also serve as the final resting place for certain types of data that need to be stored permanently in the gateway, e.g., filesystem metadata. The caching can be write-back or write-through. Typically, to reduce latency for write operations, the cache is write-back. However, with write-back caching, we need to satisfy two requirements: the dirty data in the write-back cache must have adequate protection against failures, and these protection mechanisms must not debilitate the appliance's performance. The storage for the cache is the only upfront storage investment needed to leverage the cloud gateway. This implies that the cost for the customer is proportional to the working sets of their workloads, as opposed to the entire data generated by their workloads; depending on the workloads, the former could be much smaller than the latter. It also implies that enterprise storage vendors' ability to sell actual storage (in bytes) shrinks drastically. A viable cloud gateway to an external cloud storage provider has some very specific requirements, due to the nature of the raw cloud data and the API offered to access it. In Section 3.1, we list these requirements and the rationale for having them. Figure 1 illustrates a cloud gateway appliance in the context of an enterprise datacenter.
Figure 1: Cloud Storage Gateway Architecture

As can be seen, the cloud gateway is an on-premises appliance or a VSA (virtual storage appliance) that talks via NAS (CIFS, NFS) or SAN (iSCSI, FC) protocols with traditional clients in the datacenter. It interacts with the cloud storage using an object-based interface and uses local disks for caching hot content or as a permanent store for primary data. Typically, the gateway is responsible for securing the data before it leaves the data center. The gateway might also contain features (depending on its use) that enable performance optimizations and latency reductions when accessing cloud storage over the WAN. Last but not least, to enhance the reliability of data stored in the cloud, the gateway might simultaneously store data on multiple cloud vendors, protecting against cloud access outages and vendor lock-in while leveraging price changes. Overall, all of the gateway's functionality is aimed at lowering TCO by enabling the flexibility of cloud storage: simpler provisioning (pay-as-you-go), lower administrative cost, and effectively unlimited scalability.

The current cloud storage gateway market is very nascent and the offerings are not fully featured. Most of the vendors are smaller startup companies that are new to the storage space and do not own a complete storage portfolio like NetApp or EMC. As of now, only a few offer traditional enterprise capabilities like high availability, and very few actually target enterprise storage. Currently, gateways have been targeted primarily at backup streams, archival data, and tape replacement (off-site disaster recovery). These are primarily offline workloads, with limited performance requirements, that are typically tolerant of variation in throughput and latency, such as in a WAN.
Moreover, since most of the vendors are startups, they would like to target workloads that are relatively easy to support from a performance perspective, as opposed to primary workloads that have stringent requirements.
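To make the file/block-to-object translation described above concrete, the following is a minimal, illustrative sketch (not any vendor's implementation) of how a gateway might map a block volume onto fixed-size cloud objects. An in-memory dict stands in for the provider's PUT/GET interface; the chunk size and key scheme are assumptions for illustration.

```python
# Illustrative sketch: mapping a block volume onto fixed-size cloud objects.
# An in-memory dict stands in for the provider's object API; a real gateway
# would issue HTTPS requests (REST PUT/GET) against a provider such as S3.

CHUNK = 4 * 1024 * 1024  # 4 MiB per object (an assumed, tunable granularity)

class VolumeToObjectMapper:
    def __init__(self, volume_id, store):
        self.volume_id = volume_id
        self.store = store  # dict standing in for the cloud object store

    def _key(self, index):
        # Object key encodes volume and chunk index, e.g. "vol1/000042".
        return f"{self.volume_id}/{index:06d}"

    def write(self, offset, data):
        # Split the write into whole objects (read-modify-write at the edges).
        while data:
            index, pos = divmod(offset, CHUNK)
            obj = bytearray(self.store.get(self._key(index), b"\x00" * CHUNK))
            n = min(CHUNK - pos, len(data))
            obj[pos:pos + n] = data[:n]
            self.store[self._key(index)] = bytes(obj)   # PUT
            offset, data = offset + n, data[n:]

    def read(self, offset, length):
        out = bytearray()
        while length:
            index, pos = divmod(offset, CHUNK)
            obj = self.store.get(self._key(index), b"\x00" * CHUNK)  # GET
            n = min(CHUNK - pos, length)
            out += obj[pos:pos + n]
            offset, length = offset + n, length - n
        return bytes(out)
```

Note that a small write spanning an object boundary becomes a read-modify-write of two whole objects, which is one reason gateways aggregate writes in a local cache before flushing to the cloud.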

2.3 MOTIVATING FACTORS

In this section, we motivate cloud gateways from the perspective of two different enterprise workloads: secondary storage and Tier-2 application data (primary storage).

2.3.1 Cloud gateway for secondary storage

In the enterprise data center, a rule of thumb is that for every byte of primary data, three bytes of secondary data are stored. This includes backups within the data center and a copy on tape at a remote site for disaster recovery (DR) purposes. With backups, the typical enterprise workflow entails a full backup every week followed by daily incremental backups, leading to secondary data bloat. This bloat is the main reason deduplication technologies have been adopted so heavily in this realm. Even with deduplication, we observe from certain case studies that efficiently provisioning storage to accommodate secondary growth is very difficult.

The number of objects in Amazon's S3 now exceeds 700 billion [ref]. Backup and archival data constitute nearly 55% of S3's objects. These backups are largely expected to be online backups of personal laptops (consumer space); the fraction of enterprise backups is not known. Such a large percentage, however, raises the question of whether cloud storage can be an efficient option for enterprise backup data as well. Also, the features expected of the off-site copy (for disaster recovery purposes) of enterprise data maintained on tape are very similar to those offered by cloud storage. But cloud storage has other inherent advantages relative to tape, like WAN-based global access and a variable cost structure (due to multiplexing of cloud resources across clients and workloads). We observe that these advantages make the migration of enterprise backups and off-site copies to cloud storage imminent. The inherent elastic nature of cloud storage can help address the provisioning of secondary data. Thus, the need is for a conduit that sends enterprise backups to the cloud with the right level of security (protection) and recovery semantics. We envision the cloud gateway acting as this conduit, enabling existing backup applications to transparently leverage cloud storage.
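As a back-of-the-envelope illustration of this bloat, the following sketch computes the secondary-to-primary ratio for a weekly-full-plus-daily-incremental schedule. All figures here are assumed for illustration, not taken from the case studies.

```python
# Illustrative arithmetic only: how weekly fulls plus daily incrementals
# inflate secondary data relative to primary. All parameters are assumed.

primary_tb = 10.0      # primary data set (assumed)
weeks_retained = 2     # backup retention window (assumed)
daily_change = 0.08    # fraction of primary changed per day (assumed)

fulls = weeks_retained * primary_tb                            # one full/week
incrementals = weeks_retained * 6 * daily_change * primary_tb  # 6 dailies/week
secondary_tb = fulls + incrementals

print(f"secondary/primary ratio: {secondary_tb / primary_tb:.1f}x")
# → secondary/primary ratio: 3.0x
```

Even these modest assumptions reproduce the three-copies rule of thumb, and since the retained weekly fulls are largely redundant with one another, deduplication collapses most of this volume.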
From another perspective, we currently observe that within S3, an overwhelming percentage of data consists of online backups and archives from individual users (consumers). Enabling enterprise backups to utilize cloud storage would be a natural extension of this trend. As mentioned before, a copy of enterprise data is typically stored in an off-site tape archive, primarily for DR. Similar to cloud storage, the tape is maintained at a remote data center. Moreover, like cloud storage, the tape archive could be managed by a different company that maintains a repository across its multiple customers. The functionality and requirements are therefore almost identical. This implies that cloud storage can be an effective and cheaper tape replacement, because the extra costs of copying data to tapes and transporting them do not apply. Compared to tape, cloud storage has one distinct advantage: data can be accessed at any time or place, independent of its actual physical location. With the cloud gateway as the bridge to the archive in the cloud, the archive can be kept online indefinitely. Moreover, this online archive can be accessed using traditional protocols and recovered efficiently, with little logistical overhead.

2.3.2 Cloud gateway for Tier-2 application data (primary storage)

The cloud market as a whole, both private and public, is growing rapidly. Cloud vendors classify their services in many ways: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS, e.g., Microsoft's Office365), etc. Within IaaS, there is a further classification into Storage as a Service (StaaS, e.g., Amazon's S3) and Compute as a Service (CaaS, e.g., Amazon's EC2, Microsoft's Azure cloud). Among these categories, StaaS is experiencing the highest growth (a cumulative growth rate of 25% annually), but CaaS leads in terms of revenue [1,2,11].

The key question remains whether other workloads will adopt cloud storage. A related question is whether hybrid clouds will become mainstream, where some data resides in an enterprise datacenter or a private cloud and the rest resides in a public cloud. An interesting perspective is provided by Intel's whitepaper on the future of IT, datacenters, and their evolution vis-à-vis cloud technologies [4]. Figure 2, an illustration from that whitepaper, shows the evolution of hybrid clouds and where the different workloads will reside.

Figure 2: Intel's IT evolution - hybrid clouds (Source: Intel whitepaper [4])

It can be observed that in the mid term, only selective functions will move to public clouds, like caching of content on IaaS and sales support on SaaS (e.g., salesforce.com). In the longer term, however, they expect more workloads to go to public clouds: backups, storage, manageability, and client VM images to IaaS, as well as CRM, collaboration, and productivity tools to SaaS. For NetApp, the implications of this report are quite clear: a significant portion of enterprise storage data is moving to public clouds. The figure also shows that internal clients in the enterprise datacenter are expected to make use of IaaS services like cloud storage over the WAN. This observation points to the importance of cloud storage gateway technologies, which will enable this evolution in the near future.

Another aspect in the evolution of cloud workloads is the role of SaaS. SaaS provides an application as a cloud service, accessible typically over a web-based interface. For business applications like CRM and ERP, such cloud services are readily available and are being adopted zealously. A unique case in point is Microsoft's Office365, which offers the entire Microsoft Office suite of applications as web services. Such services are clearly cost-effective for enterprises. In addition to the usual advantages of a cloud service, i.e., low management and administrative costs and instant deployment, such services eliminate the extra servers (and associated data center costs) required to run the application servers in the datacenter. However, the key disadvantage is that there is very little control over the application data, i.e., the storage and security policies applied to it. Thus, in adopting a SaaS service, we place considerable trust in the provider. The SaaS security model might be suitable for MSEs but not completely for large enterprises.
We expect that large enterprises will still want the security and the control over resources and processes that standalone, on-premises application servers offer. However, they would like to take advantage of the elasticity and cost advantages of cloud storage where possible. Cloud gateways address exactly this requirement, serving as a bridge that enables applications to reside in the data center while leveraging cloud storage for their storage needs. Beyond the SaaS use cases, we expect that with cloud gateways there will be a significant number of workloads that use compute in the datacenter or private cloud but leverage storage in a public cloud. A key question is whether this assumption is valid. A contrary opinion is that for all enterprise applications

(standard as well as custom), both compute and storage would move to a public cloud and render the gateway functionality useless. There are no clear answers to this question as of now; we need to wait for the evolution to take place before passing judgment. However, a hybrid cloud scenario, with compute split into the private cloud (or enterprise datacenter) and storage in the public cloud, is plausible given the characteristics of these applications and resources (compute and storage). We list a few of those characteristics here:
1. Changes to applications: For applications to move completely to a public cloud, a few changes have to happen:
a. The applications need to be portable, i.e., encapsulated in a VM, before they can be moved to a compute cloud service.
b. These applications typically work with unencrypted data in the datacenter; when run on public cloud compute infrastructure, data traffic into and out of the VM needs to be encrypted. Moreover, data at rest created by the applications needs to be maintained in encrypted form.
c. The application, or the layer below it, needs to talk to cloud storage via a different protocol. Enterprise applications or the layer(s) below them need an object protocol to access cloud storage, as opposed to the typical enterprise storage protocols (NAS/SAN). Either a translation from NAS/SAN protocols to object protocols must be made, or new storage client software that natively speaks object protocols must be introduced into the application VM stack.
d. Along with translation, to reduce the cost of cloud storage, features like storage efficiency need to be incorporated into the application's VM stack.
None of these changes is insignificant for many legacy enterprise applications.
2. Compute and storage are different kinds of resources: With increasing processor speeds, compute is one of the cheapest resources in the datacenter and the most flexible in terms of usage.
In the cloud, however, the cost per CPU cycle, as seen with Amazon's EC2, is not low; this observation has translated into CaaS generating more revenue than StaaS. Moreover, compute is a renewable resource: the moment a CPU cycle is used, it is available for use again. Storage costs, by contrast, are going down, but the hidden costs of storage administration are still considerable. In addition, storage is a consumable resource: once a byte of storage has been used, it is consumed and cannot be reused until the data is erased. These factors might make storage a candidate to migrate to the public cloud, but not necessarily compute.

Given these observations and the growth of hybrid clouds, we feel that cloud gateways might be the conduit for enterprise storage to move to public clouds, at least for some workloads. Burton Group's report [1] on cloud gateways classifies workloads that have already moved and ones that can move to the public cloud via gateways. A key observation is that many Tier-2 applications where NetApp has enjoyed significant market share and revenue growth are listed as ones that might move. The lure of a flexible, pay-as-you-go, low-capital-expenditure model for enterprise storage is the key motivating factor behind this prognosis.


3 CLOUD GATEWAY ARCHITECTURES

In this section, we first outline the mandatory capabilities that a cloud gateway should possess, dictated mainly by design considerations and partly by the first movers' differentiation in this space. Second, we provide three alternative usage models for a cloud gateway. These models are not mutually exclusive.

3.1 MANDATORY CAPABILITIES

Given the market space for cloud gateways, the following are the mandatory features expected of an enterprise-class appliance/VSA:
 Operations and protocols (NAS/SAN) that emulate conventional storage arrays and file servers: To enable existing enterprise (storage) clients to access data.
 Translation between files or blocks and objects in the cloud storage: Since the cloud storage API offered is typically object-based, the gateway needs to translate appropriately.
 Security for data leaving the enterprise datacenter: Enterprise data cannot leave the premises in clear text and cannot be stored in clear text in the cloud. This requirement is usually met by encrypting the data before it leaves the datacenter.
 Smart caching of data to avoid WAN latency: Typically, write-back caching and efficient prefetching strategies are employed here. An effective cache also reduces the number of network requests to the cloud storage provider, enabling extra savings.
 Minimized WAN bandwidth usage via deduplication: Most external cloud SSPs charge both for the data stored and for network requests. To ensure that minimal data is stored in the cloud, deduplication is essential. Moreover, the customer then pays the SSP only for network requests carrying unique data.
 Access to multiple public cloud storage vendors: Mainly to prevent a single point of failure as well as single-vendor lock-in.
 Export of cloud storage semantics to the end admin: Cloud storage features such as on-demand capacity and the pay-as-you-go pricing model need to be exposed to admins in a transparent way.
 Monitoring, reporting, and other data management capabilities: Since the customer pays the cloud SSP for the storage as well as the network requests that originate from the cloud gateway, it is essential to audit all requests efficiently and present them to the customer on demand.
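The deduplicate-then-secure upload path implied by the capabilities above can be sketched as follows. This is an illustrative model, not a vendor implementation: a dict stands in for the cloud object store, and the XOR "cipher" is a placeholder kept dependency-free — a real gateway would use authenticated encryption such as AES-GCM.

```python
import hashlib
import zlib

# Illustrative upload path: deduplicate, compress, then encrypt before PUT.
# Duplicate chunks incur no new PUT, which is why the customer pays the SSP
# only for requests carrying unique data.

def toy_encrypt(data, key):
    # Placeholder XOR keystream (its own inverse); NOT real cryptography.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

class DedupUploader:
    def __init__(self, store, key=b"secret"):
        self.store = store   # dict standing in for the cloud object store
        self.key = key
        self.puts = 0        # billable PUT requests issued

    def upload(self, chunk):
        fp = hashlib.sha256(chunk).hexdigest()   # content fingerprint
        if fp not in self.store:                 # only unique data is sent
            self.store[fp] = toy_encrypt(zlib.compress(chunk), self.key)
            self.puts += 1
        return fp   # recipe entry recorded by the gateway for this chunk

    def download(self, fp):
        return zlib.decompress(toy_encrypt(self.store[fp], self.key))
```

Keying objects by content fingerprint makes deduplication a simple existence check before the PUT; the gateway keeps a per-file "recipe" of fingerprints so data can be reassembled on read.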

3.2 DESIGN APPROACHES TO CLOUD GATEWAYS

Different approaches, or models, have surfaced among cloud gateway vendors to facilitate cloud storage integration. Each model also has a strong relationship to the typical datasets it can support: no single model satisfies the performance characteristics of all enterprise workloads. The models are not exclusive of one another, and some appliances blend them. The models are:
 Caching device model: The gateway provides advanced caching algorithms to mask cloud performance limitations - both WAN latency and bandwidth constraints. Typically, write-back caching is done on local disk or SSD devices.

 Tiered device model: The gateway enables the creation of an explicit enterprise storage tier with specific performance and capacity characteristics. By definition, in this model the gateway is part of a larger ecosystem that provides the other storage tiers.
 Copy device model: The gateway provides conventional on-premises storage with scheduled replication services to the cloud, facilitating backup/recovery functionality as well as a disaster recovery solution.
As gateway offerings mature, we expect them to combine these models. The following subsections detail each model.

3.3 CACHING DEVICE MODEL

Figure 3 illustrates a gateway modeled as a caching device. With this approach, the appliance presents a cached copy, i.e., a virtual storage volume (filesystem or LUN), to the datacenter clients, whereas the actual volume is in the cloud. The cached copy need not be in sync with the volume in the cloud. Moreover, the cache type (write-through or write-back) dictates the invalidation and consistency requirements.

Figure 3: Gateway Caching Model (Source: Gartner)

Typically, in order to mask WAN latencies, the caches are designed as write-back caches. This implies that during steady state, some amount of dirty data (unflushed writes) will be present in the cache. Since we are dealing with enterprise data, data loss is not acceptable; therefore, we need to ensure that the dirty data can survive the loss of the gateway appliance via appropriate reliability mechanisms (e.g., mirroring to another appliance within the datacenter). Since this requirement is similar to the reliability requirements of primary enterprise data, vendors must build such functionality into their gateways for them to be feasible. In addition, in the event of a datacenter disaster, the volume recovered from the cloud needs to be in a consistent state. To enable this, the design requires well-defined cut-off points in time (checkpoints/snapshots/consistency points) for synchronizing dirty data from the gateway to the cloud.

A vendor-proprietary caching algorithm attempts to minimize data transfers between the gateway and the cloud storage provider for both reads and writes. Cache reads can be served to the enterprise data clients at speeds consistent with NAS or SAN systems. Any time the cache experiences a read cache miss, the gateway must retrieve data from the cloud and incur both latency and bandwidth penalties while the data moves from the cloud through the cloud connectivity and finally to the gateway. For writes, the cache typically aggregates, compresses/deduplicates, and encrypts the data for transfer to the

cloud at opportune times, thereby minimizing the cloud data footprint, improving performance, and preserving data privacy.
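The dirty-data and consistency-point mechanics described above can be sketched as follows. This is an illustrative model under assumed semantics, not a vendor implementation; a dict again stands in for the cloud's PUT interface.

```python
# Illustrative write-back cache with dirty tracking and consistency points.
# Writes are acknowledged locally; a consistency point flushes all dirty
# blocks to the (simulated) cloud and records a recoverable snapshot marker.

class WriteBackCache:
    def __init__(self, cloud):
        self.cloud = cloud     # dict standing in for cloud object storage
        self.cache = {}        # local disk/SSD cache: key -> data
        self.dirty = set()     # unflushed writes that must survive failure
        self.checkpoints = []  # ordered consistency-point markers

    def write(self, key, data):
        # Acknowledged at local speed; the block is dirty until flushed.
        # (A real gateway would also mirror dirty data to a partner node.)
        self.cache[key] = data
        self.dirty.add(key)

    def read(self, key):
        if key in self.cache:       # cache hit: local latency
            return self.cache[key]
        data = self.cloud[key]      # miss: WAN latency penalty
        self.cache[key] = data
        return data

    def consistency_point(self, label):
        # Flush every dirty block, then record a well-defined cut-off from
        # which a consistent volume can be recovered after a disaster.
        for key in sorted(self.dirty):
            self.cloud[key] = self.cache[key]   # PUT
        self.dirty.clear()
        self.checkpoints.append(label)
```

Between consistency points the cloud copy may lag the cache, which is exactly why the dirty set needs independent protection (mirroring or NVRAM) until the next flush completes.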

Some vendors, like StorSimple, have taken a hybrid approach in which some data, such as the filesystem metadata, always resides in the gateway while the file system data is cached. This approach enables fast access to metadata even in the event of a disconnect from the cloud.

The following key technical issues are relevant to the caching approach:

 Cacheable workloads: To mask the WAN latency effectively, the workloads need to be cache-friendly, i.e., have temporal locality properties that lend themselves to a relatively small working set. In addition, the cache replacement policy has a big impact on performance and has to be designed with due care. Lastly, effective prefetching strategies need to be devised to minimize "cold" misses, which require synchronous retrieval from the cloud and lead to unexpected and unacceptable WAN latencies (potentially three orders of magnitude higher than local disk).
 Sizing: Since the data stored in the gateway is a function of the working set sizes and not the actual data size, ideally we can serve an ever-increasing cloud storage footprint with the same local storage on the gateway appliance. Even if this ideal behavior is not achieved, we expect the growth of local storage (on the gateway) to depend on working-set growth, which is tied to applications' behavior/evolution rather than to application data growth. This insight offers significant cost advantages for datacenters, where storage capacity sizing and provisioning (usually for peak utilization) concerns are largely mitigated. In addition, the lower capacity requirements augur well for a pure flash-based cache (SSDs).
 Coherency: In some cases, the actual cloud volume could be shared by multiple cloud gateways in different geo-distributed datacenters; Panzura's Global File System is an example. To provide a globally consistent view of a single file system, we need appropriate coherency mechanisms across the WAN-distributed gateways, such as global locking and cache invalidation schemes.
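A minimal sketch of the replacement and prefetching concerns above, combining LRU replacement with a naive sequential prefetch; the block numbering, class name, and prefetch depth are our illustrative assumptions, not any vendor's algorithm:

```python
# Illustrative sketch: an LRU cache with sequential prefetch to reduce
# "cold" misses that would otherwise require synchronous WAN retrieval.

from collections import OrderedDict

class LRUPrefetchCache:
    def __init__(self, capacity, cloud, prefetch_depth=2):
        self.capacity = capacity
        self.cloud = cloud                  # block_id -> data (cloud store)
        self.prefetch_depth = prefetch_depth
        self.cache = OrderedDict()
        self.misses = 0

    def _install(self, block_id):
        self.cache[block_id] = self.cloud[block_id]
        self.cache.move_to_end(block_id)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used

    def read(self, block_id):
        if block_id in self.cache:
            self.cache.move_to_end(block_id)    # refresh recency
            return self.cache[block_id]
        self.misses += 1                        # synchronous WAN fetch
        data = self.cloud[block_id]
        self._install(block_id)
        # Prefetch the next few sequential blocks to hide future WAN latency.
        for nxt in range(block_id + 1, block_id + 1 + self.prefetch_depth):
            if nxt in self.cloud and nxt not in self.cache:
                self._install(nxt)
        return data
```

A workload with good temporal and spatial locality keeps the miss counter (and hence WAN round trips) low; a random-access workload defeats both the LRU policy and the prefetcher.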

So far, analysts like Gartner have suggested that the caching model is best for minimal-footprint installations (like branch offices), file-sharing workloads, data archival, and low-demand backup, primarily due to latency concerns. It is still unclear whether the requirements of key primary workloads like business applications can be satisfied by the caching model. Example vendors for this model are Nasuni, Riverbed's Whitewater appliance, and Panzura's Alto 6000 Series Cloud Controllers.

3.4 STORAGE TIER MODEL

Figure 4 shows a tiered gateway. With this model, in contrast to the caching model, storage volumes may wholly exist in the datacenter on local storage and/or exist in the cloud. The gateway exposes cloud storage to enterprises as a specific tier in a multi-tier storage hierarchy. Such hierarchies already exist in enterprise datacenters today, usually classified by performance characteristics, starting from fast, expensive flash-based storage down to slow, inexpensive tape. A cloud gateway would extend this hierarchy by offering another tier with flexible storage capacity and disaster protection but lower performance. Given these characteristics, this tier fits datasets described as archival/data-warehouse and cold data, as well as traditional backups. Multi-tier storage hierarchies also enable features like automated data migration, i.e., the ability to move data from one tier to another automatically via dataset policies, to reduce overall cost. Here are some key issues relevant to this model:

Figure 4: Gateway as a storage tier (Source: Gartner)

 Traditional workload compatibility: Viewed from a different perspective, a tiered gateway could be thought of as a conventional enterprise storage system with a cloud-gateway feature. Therefore, workloads that are appropriate for local storage systems are still applicable to tiered gateways. A key advantage of such a gateway is that the cloud storage provides effectively infinite capacity, albeit with performance, SLO, and cost limitations. However, current offerings have limited local storage scalability (StorSimple's 7010 appliance maxes out at 20TB) and may fall short for IO-intensive workloads.
 Caching issues mitigated: As mentioned before, with a write-back caching gateway, to protect against data loss due to failures (hardware or cloud connectivity) we need protection mechanisms like local mirroring and consistent checkpoints of the cloud data. In the tiering model, because a local storage tier exists, such issues are largely mitigated or completely absent.
 Highly competitive landscape: With the tiering model, we expect existing enterprise storage players (e.g., EMC, HP, IBM, Hitachi Data Systems) to offer tiered gateways, as they are best equipped to enhance their value proposition with multi-tier storage.

The tiering model is best suited for data archival and data-warehousing types of datasets. Example products are StorSimple and F5 Networks' ARX.
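The automated, policy-driven migration mentioned above can be sketched as follows; the tier names, the dataset record shape, and the 90-day coldness threshold are our illustrative assumptions:

```python
# Illustrative sketch: policy-driven tier migration. Datasets that go cold
# are demoted to the cloud tier; re-accessed cloud data is promoted back.

def plan_migrations(datasets, now, cloud_age_days=90):
    """datasets: list of dicts with 'name', 'tier', 'last_access' (epoch days).
    Returns (name, from_tier, to_tier) moves according to the age policy."""
    moves = []
    for ds in datasets:
        cold = (now - ds["last_access"]) >= cloud_age_days
        if cold and ds["tier"] != "cloud":
            moves.append((ds["name"], ds["tier"], "cloud"))
        elif not cold and ds["tier"] == "cloud":
            # Promote recently re-accessed data back to local disk.
            moves.append((ds["name"], "cloud", "sas"))
    return moves
```

A real policy engine would also weigh capacity thresholds and per-volume priorities, but the essential shape — evaluate each dataset against a policy and emit tier moves — is the same.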

Figure 5: Gateway used for a remote copy (Source: Gartner)

3.5 COPY MODEL

Figure 5 illustrates a copy cloud gateway. In this model, the gateway is similar to a traditional local NAS/SAN storage system, and customers are expected to use it that way. The performance and management expectations for the appliance are also similar to those for a traditional on-premises storage system. The unique value-add is the ability to connect to external cloud storage and perform replication/copy services from local storage to the cloud storage. These copy services would be similar or identical to those between enterprise storage systems. The main goal of such services is data protection in the event of the loss of either the on-premises storage system or the datacenter itself. A secondary goal is asymmetric (i.e., read-only) data sharing across geographically distributed datacenters. With this model, storage admins are required to perform cloud-storage-related configuration and map traditional copy-service notions to the equivalent ones for cloud storage. The gateway is also expected to deduplicate and compress data before transferring it to the cloud as a large stream. The copy approach is ideal for data protection use-cases (like DR) that require routine snapshot capabilities. This model has the lowest barrier to adoption and is the ideal first product offering across all gateway models. However, the limited applicability might also limit cost savings. Also, since the offering is geared towards disaster recovery, transferring large datasets from/to the cloud efficiently is a key issue; not all cloud storage protocols are suited to a streaming workload, as most of them currently export RPC-like, object-based protocols. Some key issues relevant for the copy model are:
 Storage players' imminent entry: Enterprise storage vendors with current DR offerings are likely to adopt the copy cloud gateway soon.
 Cloud-storage SLA/SLOs: The SLA/SLOs required of the cloud storage to support this model are a key open question, and the role of cloud storage providers in facilitating this model is not clear. Amazon's S3 is optimized for relatively small objects (on the order of 1MB), as opposed to the large objects that might arise from a copy workload (on the order of TB). This implies more work on the gateway to split the copy workload into many smaller objects, plus the associated bookkeeping.
 MSE suitability: Customers in the MSE space are ideal candidates for such a gateway. With cloud-based replication and DR services, they can skip the equivalent datacenter-to-datacenter solutions offered by current vendors entirely, for cost reasons. We expect this advantage to be a key driver for this model. An additional use-case is data sharing between datacenters with completely non-overlapping work patterns (two datacenters on opposite sides of the globe), though we have not seen existing vendors tout this specific advantage. CTERA is a prototypical vendor in this space.
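The object-splitting and bookkeeping just described can be sketched as follows; the chunk size, key-naming scheme, and manifest format are our illustrative assumptions, not any provider's API:

```python
# Illustrative sketch: split a large copy stream into ~1 MB cloud objects
# and keep a manifest (with checksums) for restore-time reassembly.

import hashlib

CHUNK_SIZE = 1024 * 1024  # ~1 MB, roughly the object size S3 favors

def split_into_objects(stream, snapshot_id):
    """Split a byte stream into cloud objects plus a manifest for restore."""
    objects, manifest = {}, []
    for i in range(0, len(stream), CHUNK_SIZE):
        chunk = stream[i:i + CHUNK_SIZE]
        key = f"{snapshot_id}/{i // CHUNK_SIZE:08d}"   # hypothetical key scheme
        objects[key] = chunk
        manifest.append({"key": key, "len": len(chunk),
                         "sha256": hashlib.sha256(chunk).hexdigest()})
    return objects, manifest

def reassemble(objects, manifest):
    """Restore the original stream, verifying each chunk against the manifest."""
    out = bytearray()
    for entry in manifest:
        chunk = objects[entry["key"]]
        assert hashlib.sha256(chunk).hexdigest() == entry["sha256"]
        out += chunk
    return bytes(out)
```

The manifest is exactly the "associated bookkeeping" the text refers to: it is what turns a bag of small objects back into a TB-scale copy.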

3.6 HYBRID (COMBINATION) MODELS

It is easy to see that these models are not mutually exclusive. A number of solutions from existing vendors are combinations of these different models. Typically, most gateways include a write-back cache for performance reasons, irrespective of the intended workload (backup or primary). An example is Panzura, which can be used as primary storage as well as for archival purposes. Rather than differentiating by model, vendors prefer to differentiate by offering value-added services like closer application integration; e.g., StorSimple offers MS SharePoint/Exchange server integration.

4 UNIQUE REQUIREMENTS/EXPECTATIONS OF GATEWAYS

Compared to traditional enterprise storage in datacenters, cloud gateways have unique requirements to fulfill. At a high level, most of these concern integrating gateways into familiar storage management practices. We examine them next.

4.1 CLOUD-STORAGE SERVICE AUDITING AND CONSOLIDATION

Changing a storage environment to incorporate cloud providers' storage requires service-plan tracking to manage user accounts, billing, access, and usage. Consolidating user accounts for volume pricing and indirect billing helps reduce cost. Also, auditing all network IOs to the cloud provider is key to providing the customer with the data needed to validate the cloud provider's actual cost and to project future costs.

4.2 ANALYTICS BUILT INTO THE GATEWAY

To manage accounts and optimize savings, reports on metrics such as caching efficacy, bandwidth usage, and object transfer sizes are required to validate the effectiveness of cloud storage versus traditional storage. These analytics might also identify performance bottlenecks and help in provisioning the right number of gateways and cloud storage volumes.

4.3 PROVISIONING MANAGEMENT

As with traditional storage, administrators should expect a gateway to offer simplified provisioning. For example, because cloud storage provides dynamic capacity expansion, creating a thin-provisioned volume from a gateway should be a simple procedure. Expanding storage is straightforward, but when a gateway releases storage that is unused from the users' perspective, it may not release the equivalent amount from the cloud storage provider. This is particularly true of gateways that translate block protocols to object-based cloud storage without a matching reclamation path. Thus, releasing capacity may require local server agents to release unused data blocks from the gateway and, in turn, objects in the cloud.
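The reclamation chain above (host agent releases blocks, which in turn frees cloud objects) can be sketched as follows, assuming a hypothetical fixed blocks-per-object mapping:

```python
# Illustrative sketch: a thin-provisioned volume that maps fixed-size groups
# of blocks onto cloud objects. An object can be deleted from the cloud only
# once every block mapped into it has been released by the host.

BLOCKS_PER_OBJECT = 4  # illustrative mapping, not a real product's layout

class ThinVolume:
    def __init__(self):
        self.objects = {}   # object_id -> set of live block ids

    def write(self, block_id):
        self.objects.setdefault(block_id // BLOCKS_PER_OBJECT, set()).add(block_id)

    def unmap(self, block_id):
        # e.g., driven by a host-side agent issuing SCSI UNMAP/TRIM
        obj = block_id // BLOCKS_PER_OBJECT
        if obj in self.objects:
            self.objects[obj].discard(block_id)

    def reclaim(self):
        """Delete cloud objects with no live blocks; return their ids."""
        dead = [o for o, blocks in self.objects.items() if not blocks]
        for o in dead:
            del self.objects[o]
        return sorted(dead)
```

Note the asymmetry the text points out: a single released block frees nothing in the cloud; only when a whole object's worth of blocks is released does any cloud capacity (and cost) actually go away.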

4.4 DR AND BACKUP INTEGRATION

Because DR and backup workloads are an important use-case for a gateway, it is appropriate to integrate the gateway with existing mechanisms for these operations. Relevant examples include backup applications and protocols such as Microsoft's VSS (Volume Shadow Copy Service), NDMP (Network Data Management Protocol), and Symantec's OST (OpenStorage Technology).

4.5 FILE SYSTEM INTEGRITY PROTECTION MANAGEMENT

Cloud gateways provide the ability to share a global file system across geographically distributed datacenters. Some vendors, like Panzura, have touted this as one of their major features, and it clearly distinguishes gateways from other traditional storage appliances. However, maintaining file system integrity in the presence of sharing between clients in different datacenters requires mechanisms like global lock management and global file synchronization.
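A minimal sketch of such a global lock manager (the first requesting site gets write access; later sites get read-only access until the writer releases) might look as follows; the class and method names are hypothetical, and a production system would add leases and WAN-failure handling:

```python
# Illustrative sketch: global lock management across WAN-distributed
# gateways. First requestor gets read-write; later requestors get read-only.

class GlobalLockManager:
    def __init__(self):
        self.writers = {}            # path -> site holding the write lock
        self.readers = {}            # path -> set of read-only sites

    def acquire(self, path, site):
        if self.writers.get(path) in (None, site):
            self.writers[path] = site          # first requestor: read-write
            return "read-write"
        self.readers.setdefault(path, set()).add(site)
        return "read-only"                     # another site is writing

    def release(self, path, site):
        if self.writers.get(path) == site:
            del self.writers[path]             # next acquirer may now write
```

This is the essence of how concurrent sites can share a file without corrupting it: write-order fidelity is enforced by serializing writers per path, while readers proceed concurrently.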


5 SOLUTION DEPLOYMENT

In this section, we will compare and contrast different deployment options for the cloud gateway functionality in an enterprise datacenter.

5.1 LOCATION IN DATACENTER

The cloud gateway needs the best possible access to WAN connectivity within the datacenter. To a large degree, both the throughput and the latency experienced in accessing cloud storage are dictated by WAN characteristics. Of the two, throughput can be kept close to the available bandwidth by associating with the appropriate physical datacenter of the cloud storage provider's network and by sufficient parallelism in software (multiple open connections to the cloud storage provider). For latency, each extra network hop in the local datacenter before the WAN connection adds to the overall latency. Therefore, the cloud gateway should be placed in the network topology at minimal distance from the datacenter's WAN connection.

5.2 MERITS/DEMERITS OF AN APPLIANCE DEPLOYMENT

Some cloud gateway vendors package their functionality in a dedicated physical appliance. This approach has many benefits: dedicated physical resources, performance isolation, fault isolation, typically better control over performance, a leaner data path, etc. Of these advantages, the ones that influence performance are the most prominent. Since the cloud gateway needs a write-back cache in most models, the cache can be made reliable with very little effect on performance by using specialized hardware (such as NVRAM, or high-speed interconnects to mirror contents to another node). This approach is typical of many primary storage systems. In addition, having dedicated memory and CPU resources just for the gateway functionality makes performance more predictable. There are disadvantages as well. First, a dedicated physical appliance typically comes at a higher cost. Second, deploying a physical appliance is more time consuming and expensive for datacenter admins. Third, the opex components of a physical appliance, including rackspace, cooling/heating, and power, are not insignificant. Last but not least, the vendors manufacturing the systems have more dimensions to handle (suppliers/inventory control, qualification) and a longer product cycle, resulting in a slower return on investment.

5.3 MERITS/DEMERITS OF VM DEPLOYMENT

Most cloud gateway vendors offer their solution as a VM. This has been influenced largely by the market into which they are positioning their solution: the MSE/SMB market has been the focus of many startup vendors, and the VM solution is ideal for such cost-constrained environments, where the higher performance of a dedicated appliance is not as important. Constrained in a VM, the cloud gateway is forced to share resources: CPU, memory, and storage devices. Moreover, the management of the VM has to be integrated with the hypervisor's management processes and tools. A VM-based deployment model has its unique advantages. It is possible to deploy many smaller virtual appliances in each host, such that the combined resources are greater than or equal to those of a physical appliance. In addition, being closer to the application VMs allows the caches in the gateway VMs to be more effective. Beyond forgoing the advantages listed for a physical appliance deployment, the key disadvantage is that any global optimization entailing cross-gateway communication is expensive and is avoided. For example, the cloud gateways can only perform deduplication within the I/O streams originating at their own hosts and cannot deduplicate across gateways. Therefore, some

duplicates will find their way to the cloud storage, resulting in extra costs for the admins. Also, with a VM-based deployment, the performance expectations need to be appropriately calibrated.

5.4 USE-CASES

The different architectural models enable one or more enterprise use-cases for the cloud gateways, and a single architecture can support multiple use-cases. The following are some important use-cases for cloud gateway deployments.

5.5 CASE 1: BACKUP/COLD DATA

This is by far the most common case where cloud gateways are employed today: the gateway is used to back up data to the cloud. Typically, backup copies consume a lot of storage because of traditional policies: a full backup every week, with incrementals every day. In datacenters today, dedicated storage appliances like Data Domain's disk-based backup systems are prevalent. With this model, there are significant overheads: raw storage costs (in spite of deduplication), storage administration costs, datacenter costs, and provisioning/planning for storage growth. In addition, backup and recovery software from vendors like Symantec or CommVault needs to be employed. Backing up to the cloud instead is cost-effective: little backup storage remains in the local datacenter, minimal or no storage admins are needed, and storage needs can be met dynamically.

5.6 CASE 2: PRIMARY DATA {CIRTAS, STORSIMPLE, NASUNI}

A small number of cloud gateway vendors are positioning their products as one-stop solutions for all storage needs. They claim that they can cache the most performance-critical working sets on their appliances (virtual or physical), enabling primary datasets to be stored on the cloud gateway. These appliances are expected to understand the data-access properties of the different data entities (files, blocks, or objects) stored on them and transfer only appropriate ones to the cloud. In addition, they typically perform effective prefetching from the cloud to avoid WAN latencies. These vendors are careful not to position their appliances for highly latency-sensitive tier-1 applications like OLTP. They are targeting tier-2 application content like Exchange or SharePoint databases, whose performance requirements they feel they can satisfy by careful analysis of data-access properties. Moreover, compared to OLTP datasets, tier-2 applications typically generate more data, so their storage growth trends can make moving them to the cloud economically viable.

5.7 CASE 3: DISASTER RECOVERY AND COMPLIANCE COPY

With this approach, the traditional tape-based, off-site DR copy is replaced by a copy in the cloud. The datacenter is expected to have a local disk-based backup appliance for operational recovery; data is retrieved from the cloud only when access to the datacenter is completely lost. In this use-case, data is sent to the cloud continuously but hardly ever read back. A similar use-case is keeping a copy in the cloud for compliance purposes. Sarbanes-Oxley and HIPAA regulations force the respective verticals to maintain fine-grained data for prolonged periods with the ability to recover it when needed. Maintaining an off-site datacenter just for compliance reasons is expensive; a cloud copy kept with a provider offering reasonable reliability and availability guarantees is a good option to keep costs low. Amazon's S3 provides different levels of reliability and availability, with a matching cost spectrum, to enable such use-cases.

5.8 CASE 4: REDUNDANT DATA BLOAT {NETFLIX USE}

In the rich-content space, there is a need to keep multiple copies of the same content at different resolutions. A case in point is Netflix, which needs to maintain multiple copies of its online movie content at different resolutions; coupled with the number of movies it offers, this leads to a storage explosion. Not all resolutions are used at the same time, and each resolution may be appropriate for a different device on which the movies can be played. A similar case can be made for online photo repository or sharing services. In such cases, to accommodate storage growth, it makes sense to put such content on an external cloud storage service and stream the content directly from there.


6 COMPETITIVE LANDSCAPE

6.1 CLOUD STORAGE GATEWAY ECOSYSTEM

In Table 1 below, we have listed the key products in this space with some of their main attributes and their differentiators.

Table 1: Cloud Gateway vendors, features and differentiators

 Arkeia — Form factor: hardware appliance. Use-case focus: data protection. Block/File: file only. Supported clouds: S3/Arkeia cloud. Key differentiators: integrated backup/DR, source-based dedupe.
 Axcient — Form factor: hardware appliance. Use-case focus: data protection. Block/File: file only. Supported clouds: Axcient cloud. Key differentiators: integrated data protection and business continuity.
 CTERA — Form factor: hardware appliance. Use-case focus: data protection, file sharing. Block/File: file. Supported clouds: S3, EMC Atmos, Rackspace, Hitachi HCP, Mezeo, Scality, Nirvanix, IBM GPFS, DX/Carringo. Key differentiators: all-in-one appliance plus backup agents.
 Egnyte — Form factor: PC agent, hardware or virtual appliance. Use-case focus: cloud file server, file sharing, file backup. Block/File: file only. Supported clouds: Egnyte cloud. Key differentiators: ease of use; centrally managed cloud file server with local edit and offline access.
 Gladinet — Form factor: software. Use-case focus: cloud desktop, cloud server. Block/File: file. Supported clouds: Mezeo, S3, AT&T Synaptic Drive, Internap, Google, .net, OpenStack, Nirvanix, Rackspace CloudFiles, Azure. Key differentiators: wide choice of storage clouds, low cost.
 Hitachi Data Ingestor — Form factor: hardware appliance with HA. Use-case focus: data protection, archiving. Block/File: file only. Supported clouds: HDS Hitachi Content Platform (private cloud). Key differentiators: centrally manage and control data at the 'edge'.
 MS i365 — Form factor: software, hardware. Use-case focus: data protection. Block/File: file. Supported clouds: i365 cloud. Key differentiators: range of services, Microsoft DPM integration.
 Nasuni — Form factor: hardware or virtual appliance. Use-case focus: primary NAS, data protection. Block/File: file. Supported clouds: S3. Key differentiators: 100% uptime SLA.
 Nirvanix CloudNAS — Form factor: software feature. Use-case focus: cloud filer, sharing. Block/File: file. Supported clouds: Nirvanix only. Key differentiators: free of charge, ease of use.
 Panzura Alto Cloud Controller — Form factor: hardware appliance (with HA) or virtual appliance. Use-case focus: primary, collaboration, archiving. Block/File: file. Supported clouds: S3, Limelight CDN, Microsoft Azure, AT&T Synaptic storage, Nirvanix. Key differentiators: global namespace, global data replication and locking, global deduplication.
 Riverbed Whitewater — Form factor: hardware or virtual appliance. Use-case focus: data protection only. Block/File: file. Supported clouds: S3, Nirvanix, AT&T Synaptic Storage. Key differentiators: experience in WAN bandwidth/latency optimization.
 Seven10 StorFirst EAS — Form factor: software. Use-case focus: archiving only. Block/File: file. Supported clouds: EMC Atmos, AT&T Synaptic Storage, Dell DX6000 and others. Key differentiators: multi-vendor, multi-platform and multimedia archiving.
 StorSimple — Form factor: hardware appliance (with HA). Use-case focus: primary, secondary, data protection. Block/File: block. Supported clouds: AT&T Synaptic Storage, S3, EMC Atmos, Microsoft Azure. Key differentiators: Microsoft and VMWare certification.
 TwinStrata CloudArray — Form factor: hardware or virtual appliance (with HA). Use-case focus: secondary, data protection. Block/File: block. Supported clouds: S3, EMC Atmos, Mezeo, Scality. Key differentiators: DR anywhere, compute anywhere.
 EMC Atmos GeoDrive — Form factor: software feature. Use-case focus: cloud filer, sharing. Block/File: file. Supported clouds: Atmos. Key differentiators: ease of use, integration with Atmos.

As can be seen, the preferred use-cases for most of these cloud gateway products are data protection and secondary storage; very few explicitly target the primary space. As of now, S3 seems to be the cloud storage provider of choice, and EMC is partnering with many of these vendors to fuel Atmos cloud deployments. A notable observation is that, with a few exceptions, most products handle data at file granularity. For vendors that focus specifically on data protection, handling data at the level of files is reasonable. However, for vendors offering cloud gateways for primary, collaboration, and sharing workloads, the rationale for file-level granularity has yet to be established. In the rest of the section, we present more details of a few select products, each representing one particular type of cloud gateway architecture.

6.2 NASUNI

The Nasuni Filer is an on-premises storage device that serves as a cache to the cloud storage, where the primary copy of the data resides. It is available both as a virtual machine (on VMware or Microsoft's Hyper-V servers) and as a physical appliance. It supports NFS and CIFS, with full integration with Active Directory, DFS, and older Windows versions. Key differentiating features:
 Performance with unique caching algorithms: Active data is stored in the cache for local-storage performance, while relatively inactive data, typically bulk file data, is stored off-site in the cloud. For this off-site data, all metadata is cached. Upon a cache miss, the requested file is returned in chunks so that users can access the data without waiting for the whole file to be brought into the local cache. Special algorithms handle metadata differently from data so that the system stays responsive when the user is scanning directory listings and browsing folders. Being file-based and not block-based, the appliance prefetches the rest of the file upon first access. It also performs file-type-aware differentiation; for example, it uses the fact that Word documents are much more likely to be updated and accessed than ZIP files. The cache size is configurable based on the user's working data set.
 Synchronous snapshots with fast restores: Snapshots stored in the cloud capture the filesystem at user-defined points in time, providing versioning of data and eliminating the need for local backups. Because metadata is cached and restores happen in chunk units, users can access data immediately. Snapshots are deduplicated at file level and compressed.
 Non-disruptive cloud-to-cloud migration: Downtime to move terabytes of primary storage data from one cloud storage provider to another is only on the order of a few minutes.
 Global multi-site access: Nasuni allows multiple of its storage controllers to have live access to the same volume of snapshots. It provides two-way synchronized read-write access, so workers who move from office to office can ensure that they have fast access to local data. Virtual and hardware forms of the appliance are interchangeable.
 Stringent SLAs: Nasuni guarantees 100 percent data availability and accessibility, with significant penalties if the services are unavailable for even a few minutes.
 Stress tests to qualify cloud providers: Through rigorous and ongoing testing, Nasuni chooses only the highest-performing cloud providers, testing for the performance, stability, availability, and scalability that organizations need to use the cloud for primary storage, data protection, and disaster recovery.
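The chunked-retrieval behavior described above can be sketched as follows. This is our illustrative reconstruction, not Nasuni's implementation; the chunk size and the function signature are assumptions:

```python
# Illustrative sketch: on a cache miss, fetch only the chunks covering the
# requested byte range synchronously, and queue the rest of the file for
# background prefetch, so the user is not blocked on the whole file.

CHUNK = 64 * 1024  # 64 KiB, illustrative

def read_range(cloud_file, cache, offset, length):
    """Return requested bytes, fetching only the needed chunks synchronously.
    cache: dict chunk_index -> bytes. Returns (data, chunks fetched now,
    chunk indexes left for background prefetch)."""
    first, last = offset // CHUNK, (offset + length - 1) // CHUNK
    fetched_now = []
    for idx in range(first, last + 1):
        if idx not in cache:
            cache[idx] = cloud_file[idx * CHUNK:(idx + 1) * CHUNK]
            fetched_now.append(idx)
    total_chunks = (len(cloud_file) + CHUNK - 1) // CHUNK
    prefetch_queue = [i for i in range(total_chunks) if i not in cache]
    whole = b"".join(cache[i] for i in range(first, last + 1))
    start = offset - first * CHUNK
    return whole[start:start + length], fetched_now, prefetch_queue
```

Because metadata is fully cached, only the data path pays this miss penalty; directory listing and browsing never wait on the WAN.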

6.3 PANZURA

Panzura offers Application Cloud Controller (ACC) appliances that can serve as primary storage, as collaboration and file sharing at branch offices, and for backup and archiving use cases as an alternative to offsite tape. It is available in virtual machine and hardware (1U/2U) form factors and currently supports only the NAS protocols NFS and CIFS. Key differentiating features:
 Targeting large enterprises: Selling to the high end of the enterprise market.
 Application Network Storage (ANS): Revisits traditional network-centric storage by focusing on the application and its data-usage pattern. Includes deep packet inspection and acceleration, WAN optimization, deduplication, encryption, and offline file access.
 Global namespace and file system, global replication, global block-level deduplication, and global dynamic lock management: A unified file system spans multiple physical sites. Metadata is separated from data, and this smaller metadata repository is quickly replicated to all of the ACC appliances, giving all nodes visibility for search/browse operations with response times like local file systems. Replication can be done either directly between appliances or through a cloud provider. When a user requests to open a file that isn't stored locally, its data moves to the top of the replication queue. Administrators can set replication policies to preload folders to specific locations. Global locking keeps track of file accesses and grants write access to the first user requesting a file, providing read-only access to subsequent requestors. This enables shared read/write access to data for users in different locations without accidentally creating file corruption; no administrator intervention is required to maintain write-order fidelity and atomic file synchronization. Upon a WAN failure, users get read/write access to files that were last modified locally and read-only access to files last modified in another location. Can also write simultaneously to multiple third-party clouds.
 Application integration (with SharePoint): Uses EBS (External Blob Storage) to split BLOBs from the back-end SQL database. The local ACC appliance intercepts and serves requests sent to the SharePoint server. Also supports Symantec's NetBackup application.

 High-speed SSDs for performance acceleration: Up to 12 SSDs hold frequently used data (front-end cache) to support multiple concurrent users. Admins can assign tiering policies so that, for example, .VMDK files get the performance of flash while ARCHIVE.PST files stay on disk.
 High availability with redundant components: RAID-5 or RAID-6 protection, hot-swappable drives, redundant power supplies and fans.

6.4 STORSIMPLE

StorSimple, in effect, separates off the top two tiers of an enterprise storage array, the SSDs and fast SAS disk drives, and puts them in a 3U hardware appliance, with the rest of the array, the bulk data storage part, replaced by the cloud. It offers the Armada hybrid cloud storage appliance as a primary storage alternative to conventional block storage systems in midsized companies (~500 users) and departments within enterprises. The iSCSI-based appliance is positioned as 'all-in-one': primary storage, archive, backup/recovery, and DR in a single box.
 Four storage tiers: SSD, linear (raw tier 1); SSD, deduplicated (tier 2); SAS, deduped and compressed; cloud, deduplicated, compressed, and encrypted.
 Weighted Storage Layout (WSL) and BlockRank algorithm: Figures out what data is relevant to an application over a period of time and ensures that these hotspots (working-set data) stay in the StorSimple appliance while colder data goes out to the cloud. Transparently moves data across tiers of storage to optimize performance and cost; for example, crossing an 85% utilization threshold causes spilling downward to a lower-cost, lower-performance tier. WSL is automatic, dynamic, and operates in real time. Data is carved into variable-length blocks; WSL works at block level and uses BlockRank to order blocks by usage patterns, frequency of use, age, reference counts, and the relationships segments have with each other, to find the right storage tier. Spilling can be controlled by a per-volume priority setting (local-preferred, normal, or cloud-preferred).
 Application integration and application-specific optimization for Microsoft SharePoint and Exchange 2010: Application-optimization plug-ins maximize performance on a per-volume basis. Leverages Microsoft's EBS and RBS APIs with SharePoint, wherein the SQL server database is always stored on SSD, whereas the content, including BLOBs like audio, video, and CAD drawings, can be spread over SSD, SAS drives, or the cloud. With Exchange, leverages deduplicated primary storage and the cloud to support DAG, increased mailbox quotas, and PST centralization. Can recover individual items or full mailboxes.
 High availability: Dual controllers for enterprise-grade HA, redundant power supplies and network connections, and no single point of failure. Also supports non-disruptive software upgrades. Certified by Microsoft and VMWare.

Others: concurrent inline block-level dedupe using variable-length sub-block segmentation; Cloud Snapshots for data protection and backup/recovery; Cloud Clones for off-site backup, geo-replication, DR, and tape replacement; and Cloud Bursting for compute.
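The threshold-driven spilling described above can be sketched as follows. This is an illustrative stand-in, not StorSimple's WSL/BlockRank code; the scoring function, tier names, and data shapes are our assumptions:

```python
# Illustrative sketch: when a tier crosses its utilization threshold, the
# lowest-ranked blocks spill to the next, cheaper tier.

SPILL_THRESHOLD = 0.85           # the 85% threshold mentioned in the text
TIER_ORDER = ["ssd", "sas", "cloud"]

def block_rank(block):
    # Stand-in score: frequently and recently used blocks rank higher.
    # (The real ranking also weighs reference counts and inter-segment
    # relationships; this is a simplification.)
    return block["freq"] / (1 + block["age"])

def spill(tiers, capacities):
    """tiers: tier name -> list of block dicts; capacities: tier -> max blocks.
    Mutates tiers, spilling low-ranked blocks downward; returns the moves."""
    spilled = []
    for upper, lower in zip(TIER_ORDER, TIER_ORDER[1:]):
        blocks, cap = tiers[upper], capacities[upper]
        while len(blocks) / cap > SPILL_THRESHOLD:
            blocks.sort(key=block_rank, reverse=True)
            victim = blocks.pop()               # lowest-ranked block
            tiers[lower].append(victim)
            spilled.append((victim["name"], upper, lower))
    return spilled
```

Spilling cascades tier by tier, so a block can migrate from SSD all the way to the cloud as each tier fills; a per-volume priority setting would simply bias the rank up or down.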

6.5 CTERA

CTERA offers a Cloud Attached Storage solution for SMBs and enterprise branch offices that combines secure cloud storage services with on-premises appliances and managed agents in an all-in-one solution for backup and recovery, shared storage, and file-based collaboration. Its products are all hardware appliances, ranging from the consumer-centric CloudPlug to the enterprise-grade C800 8-bay appliance. It supports file-based protocols such as NFS and CIFS.
 Tiny form-factor offering: CloudPlug is a plug-top computing device that instantly transforms any external USB/eSATA drive into a NAS device with automatic, secure cloud backup, without the need

for any user intervention or PC client software, and allows remote access, file sharing and synchronization.
- Next3 File System for Thin-Provisioned Snapshots: Developed on top of ext3, this creates snapshots using dynamically allocated space, so there is no need to pre-allocate and waste valuable disk space, and unused space is automatically recovered for filesystem use. It works by creating a special, sparse file (taking no space at the outset) to represent a snapshot of the filesystem. When a change is made to a block on disk, the filesystem first checks whether that block has already been saved in the most recent snapshot. If not, the affected block is moved over to the snapshot file, and a new block is allocated to replace it. Writes take a little longer due to the need to move the old block. Over time, this fragments the contiguous on-disk layout that ext3 tries to create, affecting streaming-read performance.
- File and disk level backup: Backups can be stored both locally and in the cloud. Individual file backup/restore as well as incremental, disk-level ("bare-metal") backup of live servers is possible for entire-system recovery. Supports built-in and custom-created "Backup Sets", where each represents a group of files of certain types and/or located in specific folders. Block-level as well as partial-file deduplication is done.
- Full Remote Management: All aspects of CTERA's solution can be managed remotely, with no on-site presence or intervention. A web browser is all that is required to access and configure every feature, including firmware updates, real-time monitoring and event notifications.
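The Next3 "move-on-write" snapshot behavior described above can be sketched in a few lines. This is an assumed, simplified model of the semantics, not Next3's implementation; the `Volume` class and block-map representation are hypothetical.

```python
# Illustrative sketch of move-on-write snapshots (assumed semantics, not
# Next3's code): on overwrite, the OLD block is moved into a sparse
# snapshot file, but only if the snapshot does not already hold it.

class Volume:
    def __init__(self):
        self.blocks = {}      # logical block number -> live data
        self.snapshot = None  # sparse: holds only blocks moved since creation

    def take_snapshot(self):
        self.snapshot = {}    # takes no space at the outset

    def write(self, lbn, data):
        if self.snapshot is not None and lbn not in self.snapshot:
            if lbn in self.blocks:
                # Move (not copy) the old block into the snapshot file;
                # a new block is then allocated for the incoming write.
                self.snapshot[lbn] = self.blocks[lbn]
        self.blocks[lbn] = data  # second write to same block costs nothing extra

    def read_snapshot(self, lbn):
        # Snapshot view: moved blocks come from the snapshot file;
        # unchanged blocks are shared with the live filesystem.
        if self.snapshot is not None and lbn in self.snapshot:
            return self.snapshot[lbn]
        return self.blocks.get(lbn)

vol = Volume()
vol.write(0, "v1")
vol.take_snapshot()
vol.write(0, "v2")   # old "v1" moves into the snapshot
vol.write(0, "v3")   # block 0 already preserved; no second move
```

This also shows why writes slow down slightly (the extra move) and why the live layout fragments over time: the moved block's original location now belongs to the snapshot file.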

6.6 AMAZON AWS CLOUD GATEWAY
This beta offering from Amazon is a service connecting an on-premises software appliance with cloud-based storage to provide seamless and secure integration between on-site IT environments and AWS's cloud storage infrastructure. It runs as a virtual machine atop the VMware ESXi hypervisor on a physical machine, with 7.5GB of RAM for the VM and 75GB of local disk storage (DAS or SAN). It exposes an iSCSI-compatible interface. It complements on-premises storage by preserving low-latency performance while asynchronously uploading data to Amazon S3.
- Versioned, compressed EBS snapshots: The gateway buffers writes temporarily on on-premises disks before compressing and asynchronously uploading them to Amazon S3, where they are encrypted and stored as Amazon EBS snapshots. Each snapshot has a unique identifier for point-in-time recovery: it is mounted as a new iSCSI volume on-premises, and the volume's data is loaded lazily in the background.
- Backup, DR, Workload Migration Use Cases: Provides low-cost offsite backup using snapshots. Amazon S3 redundantly stores these snapshots on multiple devices across multiple facilities, quickly detecting and repairing any lost redundancy. If on-premises systems go down, users can launch Amazon EC2 compute instances, restore snapshots to new EBS volumes and get the DR environment up and running with no upfront server costs. To leverage Amazon EC2's on-demand compute capacity during peak periods, or as a more cost-effective way to run normal workloads, the gateway can be used to move compute to the cloud by mirroring on-premises data to Amazon EC2 instances. It can upload data to S3 as EBS snapshots, from which new EBS volumes can be created using the AWS Management Console or Amazon EC2's APIs and attached to EC2 instances.
- Monitoring Metrics via Amazon CloudWatch: Provides insight into on-premises applications' throughput, latency, and bandwidth to S3.
- Bandwidth Throttling: Can restrict the bandwidth between the gateway and the AWS cloud based on a user-specified rate for inbound and outbound traffic.
- Gateway-Cached Volumes: Future support, wherein only a cache of recently written and frequently accessed data will be stored locally on on-premises storage hardware, and the entire data set will reside in the cloud. Fits the cloud-as-primary-storage case, with low access latency to active data only.
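The write path described above (acknowledge locally, then compress and upload asynchronously as an identifiable snapshot) can be sketched as follows. All class and method names are hypothetical; this is a simplified model of the described behavior, not the AWS Storage Gateway implementation.

```python
# Illustrative sketch (hypothetical design): writes land on a local buffer
# disk immediately; a background pass later compresses the dirty blocks and
# stores them as a point-in-time snapshot with a unique identifier.
import uuid
import zlib

class GatewayVolume:
    def __init__(self, cloud):
        self.local = {}     # on-premises disk: full copy, low-latency access
        self.dirty = set()  # blocks written since the last upload
        self.cloud = cloud  # dict standing in for the S3/EBS-snapshot store

    def write(self, lbn, data: bytes):
        self.local[lbn] = data  # acknowledged at local-disk latency
        self.dirty.add(lbn)

    def upload_snapshot(self):
        """Asynchronous pass: compress dirty blocks, store as a snapshot."""
        snap_id = str(uuid.uuid4())  # unique ID for point-in-time recovery
        payload = {lbn: zlib.compress(self.local[lbn]) for lbn in self.dirty}
        self.cloud[snap_id] = payload
        self.dirty.clear()
        return snap_id

    def restore(self, snap_id):
        """Mount a snapshot as a new volume (a real gateway would load the
        blocks lazily in the background rather than all at once)."""
        return {lbn: zlib.decompress(c) for lbn, c in self.cloud[snap_id].items()}

cloud = {}
vol = GatewayVolume(cloud)
vol.write(0, b"block zero")
vol.write(1, b"block one")
snap = vol.upload_snapshot()
restored = vol.restore(snap)
```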

7 MARKET ANALYSIS

In this section, we provide an overview of the cloud storage market and its relation to the cloud market as a whole. Specifically, we focus on the growth of cloud service providers that offer 'Storage as a Service' (StaaS), in terms of both revenue and capacity. Within the context of StaaS, we analyze the market for cloud gateways and their impact.

7.1 STAAS MARKET OVERVIEW
Among StaaS providers, AWS's S3 is an established leader but far from the only player. Today, most public cloud providers (telcos, managed service providers, hosting specialists, etc.) have some form of cloud storage offering, either on a stand-alone basis or as part of a broader 'Infrastructure as a Service' (IaaS) or 'Platform as a Service' (PaaS) capability. Therefore, cloud storage is starting to become a material market from a revenue perspective. With S3, as shown in Figure 6, the number of objects is growing at a staggering rate, approaching 1 trillion objects. However, the distribution of object sizes is not available, nor is the growth in paid usage.

Figure 6: Growth in number of objects in S3 (Source: Amazon)

Figure 7: Growth of StaaS market (Source: The 451 Group)

Figure 8: Segmentation of StaaS capacity

From a revenue perspective, as per the 451 Group's Market Monitor Service, the StaaS market generated $388m in 2010 and will grow at a CAGR of 25% to reach $1.18bn in 2014 (see Figure 7). These figures are for business-centric cloud storage, not consumer cloud storage. A significant point is that, compared to all other cloud-based services, this sector has been experiencing the highest growth. Probing a bit further, we break the StaaS market into three segments: stand-alone cloud storage, online backup and archiving. As Figure 8 shows, an overwhelming portion of cloud storage objects are backup streams (online backup) and archival datasets. We expect this trend to continue at least in the near future. In light of this observation, the role of dedicated backup appliances in the datacenter is expected to diminish. More backup objects will move to the cloud, provided the throughput requirements for the backup streams are met when the target is cloud storage.

Application Areas                                     July 2011   April 2011
Email Systems                                            39%         35%
Customer Relationship Management (CRM)                   35%         29%
Document and Enterprise Content Management (ECM)         22%         17%
Collaboration Tools                                      22%         22%
Business Intelligence / Reporting and Analytics (BI)     21%         14%
Disaster Recovery/Failover                               20%         20%
B2B e-Commerce (Business to Business)                    17%         16%
Enterprise Resource Planning (ERP)                       11%          7%
Test and Development                                     11%          9%
Supply Chain Management (SCM)                            10%          6%

Table 2: Current cloud usage (Source: ChangeWave, Corporate Trends, Aug 2011)

In addition, to understand the relative adoption of public cloud storage services by different applications, we present in Table 2 the survey results published by ChangeWave Research in their Corporate Cloud Computing Trends report. The exact survey question was: "Which of the following areas does your company currently support applications that run on public cloud computing services? (check all that apply)". As seen in the table, respondents picked a wide range of uses for cloud computing. Although the results are heavily skewed towards cloud email and CRM systems, which are compute-intensive with less of a storage element, other storage/data-centric applications were cited as well, like disaster recovery, BI and ECM. All of this data illustrates that stand-alone external cloud storage has yet to take off as a significant enterprise market in its own right. To encourage larger businesses to adopt cloud storage, some challenges need to be overcome. The key challenge is to address the performance limitations of accessing cloud storage over the WAN. Though latency is the key metric affected, given the nature of the data stored (predominantly backup/archival streams), we note that reduction in throughput is also cited as a big concern. Therefore, solutions that enable cloud storage access for enterprises have to focus on both throughput and latency. In addition, another survey, by InfoPro, pointed out that the top inhibitor of cloud adoption is the change/learning involved, over and above expected concerns like security and cost. This indicates that solutions that enable near-seamless access of applications to the cloud stand to benefit. Admins do not want to change their enterprise applications to support a new cloud protocol (as opposed to standard storage protocols: NFS/CIFS/iSCSI, etc.) in order to leverage the benefits of cloud storage.
Also, cloud storage is viewed as restricting flexibility: once data is in the cloud, it is very difficult to migrate it out, a different form of vendor lock-in. In light of these concerns, an ideal cloud gateway solution that provides a pathway to move enterprise application data to the cloud should provide relatively good performance, seamless integration with existing apps, default access to multiple clouds and the ability to move in and out of clouds. We will probe deeper into the existing cloud gateway market next and examine existing vendors' solutions from the perspective of these requirements.

7.2 CLOUD GATEWAY MARKET
7.2.1 Workload Analysis
Before examining the nascent cloud gateway market, it is useful to assess cloud gateways from the perspective of the workloads they can support. In Figure 9, per Gartner, we show the different workloads that can be relevant for cloud gateways. This picture splits the workloads into two categories: gateway reality and gateway potential. In addition, there is a third category, labeled Tier-1 workloads (such as OLTP and financial databases), that are latency-sensitive and expectedly kept out of both categories. With regards to gateway reality, Tier-3 workloads such as backup/DR and file and email archives have been the focus of existing vendors. The Gartner picture also includes home directories as well as file-distribution workloads under Tier-3. From our experience at NetApp, these are considered primary workloads by many of our customers and a prime use case for NetApp's FAS systems. Moreover, we do not see the gateway being deployed for these workloads in our analysis. So, we believe these two workloads do not fit in the 'gateway reality' category but should be classified under 'gateway potential'. For backup/DR workloads, NetApp does not sell a competitive product in the dedicated backup appliance space (like EMC/Data Domain's BRS). Therefore, having a cloud gateway solution would present an opportunity to have a solution in this space, albeit with the storage residing in a public cloud.

Figure 9: Cloud Gateway workloads (Source: Gartner)

Under 'gateway potential', we see the typical Tier-2 applications, mainly primary workloads: email, collaboration, workgroup files and development/test. These workloads are characterized by applications that care about latency and throughput, but can tolerate some variation in both of them. From NetApp's perspective, it would be useful to understand our share of this tier to assess if cloud gateways represent a threat to an existing sales segment.

Workload Segment                             FY11 Size     FY16 Size     FY11-16     FY11 NTAP    FY11 NTAP Rev
                                             ($ Billion)   ($ Billion)   CAGR (%)    Share (%)    ($ Billion)
DAS (incl. Server Attached Storage)          ~$16.0        ~$8.0         -13%        ~3%          $0.4
Big Content Depots (~29% share in the Open N/W Storage Market):
  Active and Dark Archives                   $0.6          $2.6          33%         ~0%
  Enterprise Content                         $1.3          $6.0          36%         ~0%          $1.3
  File Svcs, Home Dirs, etc.                 $4.6          $8.0          12%         15.4%
SVI:
  Virtualized (Collaboration, App Dev,
  Tier2-BP, parts of IT/Web infra)           $10.0         $19.3         14%         ~22%         $2.2
  Virtualizing (Collaboration, App Dev,
  Tier2-BP, parts of IT/Web infra)           $8.8          $6.3          -7%         ~16%         $1.4
Big Bandwidth:
  HPC, FMV, VSS                              $1.5          $2.9          15%         ~7%          $0.1
  Big Analytics (DSS/DW, Web Infra)          $0.6          $2.6          34%         ~0%
Non-SVI:
  Tier1-BP                                   $4.5          $5.0          2%          ~4%          $0.2
  DSS / DW                                   $8.5          $10.3         4%          ~2%          $0.1 (FAS), $0.2 (E-Series)

Figure 10: NetApp share based on workloads

To understand the impact of cloud gateways on NetApp's revenue streams, we show in Figure 10 the breakdown of our current revenue by both products and workloads over the different financial years. It is clear that the bulk of the revenue is obtained from two broad segments: a) collaboration, app development, Tier-2 business-processing applications and web infrastructure, both virtualized and non-virtualized segments, contributing $3.6B towards total revenue; and b) file services and home directories, contributing $1.4B to the total revenue. As can be seen, a significant portion of these overlap with the workloads that can potentially be satisfied with a cloud gateway (as seen in Figure 5). This is the clear potential threat from the emerging cloud gateway vendors to NetApp's current revenue-generation model.

7.2.2 Cloud gateway vendors' share
Since this market is very nascent, most cloud gateway vendors do not have more than a dozen customers. Some vendors, like Panzura, sell their solution exclusively to enterprises; despite the small number of customers, each customer might generate significant revenue. On the other side of the spectrum, there are vendors like CTERA that have a consumer focus and a larger number of customers. StorSimple and others like them fall between these two extremes. In Figure 11, we list an approximate number of customers for many of these cloud gateway vendors, along with their focus and investors. These numbers indicate that this market is quite nascent. However, the emergence of Amazon's AWS gateway, along with other gateways from established enterprise storage system vendors, might change the landscape significantly.

Figure 11: Cloud Gateways, customers and investors

From this figure, we can also see that more than $100m of VC funding has been infused into this space. Most vendors have an SMB or MSE focus, and there have been some notable failures as well (Cirtas). There is a high-level consensus among analysts like Gartner, ESG and the 451 Research Group in their studies of this market:
a. Current vendors are mainly startups focusing on SMBs/MSEs, with backup/archival as the primary use case.
b. Most vendors have significant disadvantages compared to enterprise storage systems today: a lack of standard high-availability/reliability features and of enterprise readiness (untested file systems). Only a few of them have HA as a default option.

c. Established enterprise storage vendors (like EMC or NetApp) can develop solutions that augment their existing solution portfolios, i.e. make cloud storage an extra tier as well. In addition, they are not constrained by many of the disadvantages of the smaller competitors, especially around enterprise readiness. The analysts feel that once one of the established storage vendors has a solution, cloud gateways will be adopted faster in enterprise data centers.

7.3 CASE STUDIES
7.3.1 MedPlast
MedPlast provides thermoplastic and elastomer molding products and related services to the healthcare, pharmaceutical and certain consumer/industrial markets. The company has about 600 employees across five sites and is approaching $100 million in revenue. It fits clearly in the MSE space that many gateways are targeting. MedPlast's IT department consists of just one person running all IT-related operations. Accordingly, it places a heavy emphasis on outsourcing IT processes and applications, retaining only critical systems under direct control. The company already uses SaaS and hosted applications (like Rackspace-hosted Exchange). However, it generates a lot of critical data internally, mainly manufacturing- and engineering-related, and used a tier-one Hitachi SAN for this data. MedPlast was using EMC's Data Domain target arrays for backup/recovery via Veeam application instances, plus tape backups for off-site DR. Upgrading the Data Domain systems to keep up with growing data volumes, and the corresponding upgrade cycles, led MedPlast to explore cloud storage for backup/recovery. This led the company to cloud gateways, as they offer both primary storage and a means to back up into cloud storage, eliminating tape backup in the process. MedPlast worked with Cirtas but replaced them with StorSimple. StorSimple offered an iSCSI interface to their applications and enterprise-grade capabilities (specifically high availability), and was a certified VMware and Microsoft partner (vendors MedPlast uses extensively). For the cloud storage provider, they chose Amazon S3 because of its ability to geo-replicate at no extra charge. The StorSimple deployment at MedPlast consists of an HA pair, with 10TB of data local and 4TB in the cloud. These storage systems are used by MedPlast as primary storage for mission-critical applications and other Tier-2 apps: ERP, SharePoint and file servers. As a result, usage of Tier-1 storage has been reduced considerably.
Also, StorSimple enables simpler backup and DR operations and eliminated tape backups entirely. In addition, snapshots are available via StorSimple for local recoveries. In summary, MedPlast was able to solve all its storage requirements (primary, backup and DR copies) necessitated by growth via StorSimple.

7.3.2 NYU Langone Medical Center
NYU Langone Medical Center comprises the NYU School of Medicine and three hospitals, and has a threefold mission: patient care, biomedical research and medical education. The storage-engineering department at the center has faced numerous challenges in recent years, which led it to evaluate cloud-based options. The department had a four-tier storage strategy that it was looking to squeeze more efficiencies out of: tier-1 was a high-performance SAN running on EMC VMAX; tier-2 was IBM XIV; tier-3 was NAS (Windows-based file server clusters mapped to the SAN); and tier-4 was off-site storage for archive/retention and setup. Key challenges included data growth on the order of tens of terabytes, which made continued investment in individual storage systems no longer feasible. Also, the center is planning to move its primary datacenter (currently housed in an IBM-hosted and managed facility, where it has about 70TB of tier-1 storage) to reduce operational costs for storage and data retention.

They evaluated systems from IBM and EMC, but neither could offer performance or cost advantages for tier-4 storage. They started looking at cloud storage providers and settled on Nirvanix for the off-site cloud component, primarily based on cost. Nirvanix was able to offer storage at $0.15/GB/month, including unlimited data movement into and out of the cloud, versus the $0.87/GB/month the center was paying for IBM storage. With Nirvanix, it needed a means to send data to the cloud that met its performance requirements; this effort led to investigating cloud gateway options. Key requirements included performance and scalability to hundreds of users, as well as high availability, seamless/transparent data access (over CIFS and NFS) and efficiency (dedupe and compression). They settled on Panzura's Alto Cloud controller after evaluating a number of options. They recently finished a pilot deployment with two 20TB controllers, with plans to move into production by the end of 2011. Panzura's selection was based on its ability to have a large front-end cache with SSDs, which allowed it to scale to hundreds of concurrent users. Panzura's global namespace was a factor too: it enables the infrastructure to grow incrementally by adding gateways at remote locations with a single namespace and point of management. The center is initially using Panzura for its research department, replacing low-end NAS units that individual researchers had purchased in the past. But it quickly realized that other potential workloads, including archival workloads (it currently archives to EMC Centera via Symantec's Enterprise Vault), can move to the gateway. It is also considering Panzura/Nirvanix to replace its tape-based backup system. According to the center, Panzura's snapshots and Nirvanix's replication functions effectively remove the need for backup. In all, 100TB of local data can move to the cloud.
The center is aware of Panzura's shortcomings against its essentials list: richer Active Directory integration; end-user snapshot-based recovery; and global read/write for NFS as well as CIFS. However, it believes the Panzura/Nirvanix combination could form an integral part of its next-generation datacenter buildout. Initially, Nirvanix was used only for tier-4 storage, but with a Nirvanix hNode deployed in the local datacenter, the center is contemplating using Panzura/Nirvanix for both tier-4 and tier-3. Panzura would front-end the hNode appliance and export a global namespace.

7.3.3 Seneca Data
Seneca Data is an IT value-added distributor and custom systems manufacturer that focuses on building integrated systems and related services for resellers and OEMs, spanning servers, desktops/laptops and storage. Its business is split into three divisions: partnering services, engineering services and life-cycle management services. The company has been experimenting with cloud-based technologies and services via its CLOUDeCITY (www.cloudecity.com) service, a marketplace for its 3000-4000 resellers to offer cloud services to SMBs and others looking for web-based tools and applications to enable sales, marketing and operational tasks. Seneca believes that CLOUDeCITY offers resellers a recurring revenue model without much investment in a complex and expensive infrastructure. The portal started in 2011 with modest services: CRM (based on SugarCRM), website/email hosting and blogging tools. However, interest has grown, and there is demand for higher-functionality tools such as business-productivity and finance tools, and even specific, vertically oriented tools for sectors like healthcare. As part of this expansion, Seneca formed a partnership with CTERA to add managed storage and data-protection services based on CTERA's cloud storage gateway. Although Seneca markets an online backup service called DataMend for users that require a more customized offering, it found value in CTERA because of its simplicity and ease of use (plug-and-play functionality). The partnership initially focused on CTERA's CloudPlug, offering shared local storage, cloud backup, snapshots and browser-based file access aimed at consumers and small-business users. However, it has since added the CTERA C200 and C400 appliances, which offer 6TB and 12TB of local storage, respectively, as well as integration with the CTERA cloud for backup, remote access and collaboration. Seneca plans to add larger appliances from CTERA in the near future.
Currently, CTERA charges a monthly fee for the CloudPlug, C200 and C400 appliances, which increases with more storage. In addition, CTERA has workstation agents that can back up and recover Microsoft Exchange, SQL and Active Directory data. As of now, Seneca claims its customers are pushing around 30% of their locally backed-up data to the cloud.
7.3.4 Energy-industry customer
One of Panzura's early wins was with a large energy-industry customer that deployed Panzura's Alto Cloud Controllers to create a private-cloud storage alternative to off-site tape repositories. Prior to the Panzura deployment, the customer used tapes to move older seismic data to an off-site repository, which created potential data-leakage vulnerabilities since the tapes were not encrypted. The customer creates 6PB of data per year, consisting of seismic trace files that are a few hundred terabytes in size. Prior to its Panzura hybrid cloud deployment, data-restoration jobs could take weeks using the off-site tape repository, a process that takes only a few hours using cloud storage. Beyond backup, and more importantly, the customer is using the cloud to keep more of its data set online, and could potentially use hybrid cloud storage to extend data access to its partners. The company uses BlueArc and NetApp NAS systems to hold live data while offloading older data sets to the off-site tape archive. Storage costs for the customer were reduced from $6m to $1.2m per year, with the hybrid appliance and private cloud storage resources eliminating the need to purchase additional NAS systems. The backup-replacement solution in the cloud costs them $0.5/GB versus $2/GB for data held on off-site tape. Also, to share its large data sets with partners, the company currently ships the NAS systems to them; with Panzura gateways, the data may be shared remotely from the private cloud. The company keeps its data in a private cloud as opposed to a public cloud due to security concerns, and does not plan on changing that anytime soon.
Panzura's compression capability helps reduce the size of the nearly 6TB of data the customer needs to transfer nightly. Also, Panzura's security certificates and key-management mechanisms were essential features expected by this company.

7.3.5 Psomas
Psomas is a 500-person civil-engineering firm based in Los Angeles, serving public and private clients in the transportation, water, site development, federal and energy markets. The company has 10 offices, plus a datacenter, spread across the western US, including locations in California, Arizona and Utah. By 2010, Psomas was struggling to meet its recovery objectives through its existing tape-based backup infrastructure. Partly due to a long-standing policy to "back up everything", it could not meet its backup windows. Moreover, it was running into tape-media upgrade and reliability issues, and was finding that maintaining tape-based backup infrastructure at each remote office was increasingly difficult to justify. The Psomas team started exploring disk-based backup alternatives, including EMC's Data Domain based backup-to-disk with deduplication, plus options from Symantec and Iron Mountain's online backup services. The latter two were too expensive to begin with, and Data Domain's solution had high up-front capex costs. Meanwhile, Psomas was exploring a new storage product from an existing vendor, Riverbed. It was one of the early beta customers for Riverbed's Whitewater product before converting into a full production customer in early 2011. It is now running the appliance in all of its locations, almost exclusively as virtual appliances, and has transitioned away from tape-based backup completely. Psomas leverages the Whitewater appliance to back up to local disks for operational recovery and uses the cloud for DR purposes. As part of this move, Psomas selectively backs up only certain data: CAD files, office documents and databases. Also, Psomas moved to a hosted Exchange service, so it need not back up email anymore. Psomas uses Amazon's S3 as its cloud storage backup target and currently has 12TB of backup data in the cloud (with dedupe ratios of 20:1). With this approach, Psomas does not worry about running out of

capacity and upgrading hardware. Amazon's very high durability (eleven nines) and availability (four nines) are important assurance metrics for them. Overall, Psomas has high confidence in its backup infrastructure, with very fast restores. Moving to Whitewater is expected to save the company around $80,000 per year across capex and opex. In addition, it is exploring leveraging Amazon's EC2 to run its Autodesk CAD application and enable "cloud bursting" during busy periods.


8 NETAPP DIFFERENTIATION

In this section, we discuss the various advantages NetApp enjoys in comparison to other cloud gateway vendors, advantages that enable significant opportunities.

8.1 UNIQUE ADVANTAGES OF HAVING A NETAPP SOLUTION IN THE MARKET
NetApp exploits different types of media in its storage systems to deliver compelling value to its customers at a low TCO. This includes low-cost SATA drives that are made enterprise-ready by providing reliability features in software over and above the raw disks. Cloud storage can be perceived as one such medium with unique characteristics: effectively infinite capacity; low cost of maintenance/management; SLOs defined by cloud service providers (performance, reliability, etc.); poor performance, i.e., varying bandwidth and high latency; an object-based interface; and global access. Just as ordinary low-cost SATA disks were made useful in the enterprise context by applying value-added features in software, cloud storage can be made enterprise-class by providing significant value above raw cloud storage.

NetApp already enables different storage tiers in the datacenter, ranging from a performance-oriented tier at or close to the host (like the flash-based Project Mercury), to the primary storage tier (FAS-based systems), to an archival tier for disk-to-disk backup based on SATA drives. Having a cloud gateway would make cloud storage an extra tier in this hierarchy. With SLO-based (or policy-based) data management, NetApp's data management software can identify and move data to the appropriate tier based on data or workload properties: hot data moves closer to the higher-performance tiers and cold data moves to the slowest tier. The cloud storage tier's performance characteristics are largely governed by the providers' policies and their cost structure. For example, Amazon's price and performance vary significantly based on the regional data center picked. So cloud storage can be an effective tier with flexible characteristics based on cost, and it enhances our SLO-based data management vision.
With most cloud storage providers, the cost of cloud storage is dominated by two factors: the network cost of access (both the number of requests and the amount of bytes transferred) and the amount of cloud storage used. We can reduce both through effective caching and prefetching strategies for suitable workloads; such techniques have been exploited by many vendors in this space. Given the latency, only write-back caching strategies are feasible for data exported by a cloud gateway. The amount of 'dirty' data buffered depends on cloud storage performance. Compared to LAN use cases, where the NVRAM is sufficient to buffer an adequate amount of new writes, WAN latencies require more capacity, leading to disk-based buffering. With any write-back caching strategy, the buffered data needs to be protected against failures that could cause permanent data loss. Typically, this is done by means of a high-availability solution that entails replicating the buffered data to another system. The replication needs to happen with very little overhead lest it affect latency. Building a high-performance HA solution requires significant effort and time, and most of the current gateway startups do not have such technology. NetApp has had HA techniques based on specialized low-latency hardware (an InfiniBand interconnect for NVRAM replication) for a long time now, and these can be fully exploited in this context to provide reliability for the buffered data. This is a significant and key advantage for NetApp over other enterprise storage system vendors. Also, gateways that are only virtual appliances need to use standard Ethernet-based schemes for replicating data between two instances. In such cases, clients will experience latencies comparable to two Ethernet network hops, making such gateways infeasible for many latency-sensitive primary workloads.
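The HA argument above can be sketched as a write path: with write-back caching in front of a high-latency cloud, a write is acknowledged only after the buffered data is mirrored to a partner node, so a single gateway failure cannot lose not-yet-destaged data. This is a hypothetical design sketch, not NetApp's implementation; all names are invented for illustration.

```python
# Illustrative sketch (hypothetical design): write-back buffering with a
# synchronous mirror to an HA partner before the acknowledgement.

class GatewayNode:
    def __init__(self):
        self.buffer = {}      # dirty data awaiting destage to the cloud
        self.partner = None   # HA peer for mirroring

    def write(self, key, data):
        self.buffer[key] = data
        if self.partner is not None:
            # The mirror hop must be very low latency (the text notes an
            # InfiniBand NVRAM interconnect); over plain Ethernet, this
            # hop dominates the client-visible write latency.
            self.partner.buffer[key] = data
        return "ack"          # acknowledged only after the mirror completes

    def destage(self, cloud):
        """Later, push buffered data to cloud storage and drop both copies."""
        cloud.update(self.buffer)
        self.buffer.clear()
        if self.partner is not None:
            self.partner.buffer.clear()

a, b = GatewayNode(), GatewayNode()
a.partner = b
a.write("blk0", "data")
survivor = b.buffer.get("blk0")   # data survives a failure of node `a`
```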

8.2 SOLUTION GAPS – GATEWAY OPPORTUNITIES FOR NETAPP

Cloud gateways can also enable NetApp to fill gaps in its solutions portfolio. As of now, we do not have a compelling solution to rival EMC's Data Domain backup appliances; enabling cloud-based backups by means of a cloud gateway can fill this gap. In addition, the key attributes of the Data Domain appliances need to be matched: inline deduplication to reduce storage capacity, with a high ingest rate to enable timely backups. Among these features, inline deduplication is essential in the cloud gateway context to reduce cloud storage capacity irrespective of the workload (backup, archival, or primary). Since the gateway is positioned to reduce cost for a customer, storage efficiency has to be a default feature. Inline deduplication's utility for backup streams is even more critical, given the typical capacity savings seen (above 90% is not uncommon). For a high ingest rate, we need to ensure efficiency both in buffering data on disk reliably and in destaging to the cloud. To buffer data on disk reliably and with high performance, our current NVRAM mirroring techniques should suffice. With regard to destaging, we need to develop strategies that are cognizant of the cloud storage provider's characteristics. Most cloud storage services, like Amazon's S3, are most cost-effective and provide good throughput when writing out large objects via parallel network streams. This implies allocating appropriate disk capacities on the cloud gateway to buffer large objects, with destage activity at the right intervals. Implementing such techniques can be accomplished with reasonable effort and can lead to a compelling NetApp solution to compete with EMC's Data Domain appliances.
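The inline deduplication described above is typically built on content fingerprinting: each incoming chunk is hashed before it is buffered, and only previously unseen chunks consume capacity or WAN bandwidth. The following is an illustrative sketch under our own assumptions (class name, chunking, and SHA-256 fingerprints are choices for the example, not a statement of any product's design).

```python
import hashlib

class InlineDedupStore:
    """Fingerprint each incoming chunk inline; only unique chunks are
    buffered for destage, so duplicate-heavy backup streams shrink
    before they ever reach the WAN."""

    def __init__(self):
        self.chunks = {}   # fingerprint -> chunk payload (unique data only)
        self.recipe = []   # ordered fingerprints that reconstruct the stream

    def ingest(self, chunk: bytes) -> bool:
        """Return True if the chunk was new (i.e., consumed capacity)."""
        fp = hashlib.sha256(chunk).hexdigest()
        self.recipe.append(fp)
        if fp in self.chunks:
            return False   # duplicate: reference it, store nothing new
        self.chunks[fp] = chunk
        return True

# A backup stream with heavy duplication, as is typical for backups.
backup = [b"blockA", b"blockB", b"blockA", b"blockA", b"blockC"]
store = InlineDedupStore()
new = sum(store.ingest(c) for c in backup)
assert new == 3 and len(store.recipe) == 5   # 5 logical blocks, 3 stored
```

Because the hash is computed in the write path, the sketch also makes the report's point about complexity concrete: fingerprinting sits inline with every request, so its cost directly affects ingest rate.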


9 RECOMMENDATIONS FOR NETAPP

In this section, we discuss the different options for NetApp in the cloud gateway space, along with the rationale and timescale for pursuing each. We start with a look at the key IP required for a compelling and complete solution, followed by an assessment and comparison of organic product development ("Build"), acquisition of existing vendors ("Buy"), and partnership ("Partner") options.

9.1 KEY INTELLECTUAL PROPERTY (IP) NEEDED FOR CLOUD GATEWAYS

In a previous section, we outlined two reasonable outcomes for positioning a NetApp cloud gateway product: a backup/tape-replacement solution using cloud storage, and a primary storage appliance for specific Tier-2 applications with integrated, SLO-based data management features. For both cases, before evaluating ways to bring a cloud storage gateway solution to market, the following are the key technical building blocks required:
1. Basic infrastructure (for both secondary and primary storage solutions):
- File-to-object protocol conversion.
- Volume to objects/group-of-objects data granularity mapping.
- Security of objects in the cloud (encryption).

2. Value-added features (for both secondary and primary storage):
- Compression.
- Deduplication.
- Application integration.
- Cloud-aware management (such as auditing).

3. Optimizations for a viable primary-specific storage solution:
- WAN latency mitigation via read caching, write-back caching, and prefetching.

4. Infrastructure for global collaboration on a primary storage repository:
- Global locking across a WAN.
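Building block 1's volume-to-object granularity mapping can be sketched very simply: a volume block address is translated into an object key plus an offset within that object, with several blocks coalesced per object. This is an illustrative sketch; the function name, key format, and coalescing factor are our own assumptions.

```python
BLOCKS_PER_OBJECT = 64  # assumed coalescing factor; a real gateway would tune this

def block_to_object(volume_id: str, block_no: int,
                    blocks_per_object: int = BLOCKS_PER_OBJECT):
    """Map a volume block address to (cloud object key, offset in object).

    Coalescing many blocks into one object keeps the object count (and
    per-request cost) down while preserving a deterministic mapping."""
    obj_index = block_no // blocks_per_object
    offset = block_no % blocks_per_object
    return f"{volume_id}/obj-{obj_index:08d}", offset

key, off = block_to_object("vol7", 130)
assert key == "vol7/obj-00000002" and off == 2
```

The mapping is stateless and invertible, which matters for the metadata-management concern raised later: any state the gateway must persist (for example, for deduplicated blocks) is additional lookup-table overhead on top of this base mapping.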

The more immediate need, based on market demand, is to deliver a secondary storage solution. In terms of IP, NetApp has a working prototype of the primary and secondary storage infrastructure (1), but not production-ready functionality. NetApp has most of the IP for the value-added features of primary and secondary storage (2), except cloud-aware data management, as these features are available within Data ONTAP today. In contrast, optimizations for a primary-specific storage solution (3) and infrastructure for global collaboration (4) are new areas where we need to acquire IP. NetApp has three possible options for obtaining the building blocks of a cloud storage gateway solution:
a. Organic development within NetApp ("Build")
i. Pros: Ability to provide this solution as a simple license key that enables customers to easily direct their data to external cloud storage. All NetApp value-add (reliability, performance, data management, SLO management) can be exposed. A prototype already exists for secondary data use cases.
ii. Cons: Slow time to market; no opening in the roadmap to deliver the (1) and (2) functionality until 2015. Resources required to deliver the solution. Primary storage use cases would need to be addressed after that timeline.
b. Partner or Buy & Rebrand ("Partner")

i. Pros: Immediate time to market.
ii. Cons: Difficult to position existing lower-end FAS solutions against partner solutions. No differentiation versus other cloud gateway solutions. Potential overlap in positioning low-end FAS and the cloud gateway solution. A new solution for the customer to manage. Cost of purchase to NetApp in the case of buy & rebrand. No unique IP being purchased in the case of buy & rebrand.
c. Buy and integrate ("Buy")
i. Pros: Ability to provide this solution as a simple license key that enables customers to easily direct their data to external cloud storage. All NetApp value-add (reliability, performance, data management, SLO management) can be exposed.
ii. Cons: Cost of purchase. Slow time to market, as integration requires data-path work in ONTAP. Resources required to bring the IP into the NetApp stack. No unique IP being purchased.

Based on the assessment of the three options, the recommendation is two-fold:
1. Tactical: Drive partnerships with 1-3 companies to have an interim solution, for time-to-market reasons, that ensures stickiness with existing customers who wish to leverage external cloud storage for certain types of data. This also enables service provider customers to easily ingest data from the enterprise into their external cloud storage. It will require careful positioning versus existing low-end FAS.
2. Strategic: Change the roadmap priority to deliver a compelling cloud gateway solution for secondary data sooner than Longboard. In parallel, conduct due diligence on interesting startups with unique IP for potential acquisition to enable primary data use cases.


10 NETAPP PROJECTS

In this section, taking into account the observations from the previous sections, we recommend a few projects to help us understand cloud gateways and to answer questions that will help NetApp decide on the nature and contours of future products in this space.

10.1 PROJECT 0: V-SERIES – SNAPVAULT TO THE CLOUD

This project is already underway in the V-Series product group and is headed by Chris Busick.
Project Goal: Provide the ability to copy snapshots from traditional NetApp storage systems to cloud storage.
Synopsis: The project aims to build a module below the disk subsystem that translates blocks to cloud objects and stores them on Amazon's S3. Coalescing a fixed number of blocks into a single Amazon object is being studied, both to reduce network traffic to S3 and to improve throughput. Initial observations indicate that latencies to S3 vary widely. As part of this project, the basic infrastructure needed to communicate with Amazon AWS (via the S3 protocol) will be built within ONTAP and can be leveraged by further ATG projects.
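The coalescing being studied in Project 0 can be sketched as follows. This is an illustrative sketch only: the function name and group size are our own assumptions, and a real implementation would issue the resulting objects to S3 as parallel PUTs rather than building them in memory.

```python
def coalesce(blocks, group_size=4):
    """Group consecutive blocks into larger objects so that fewer,
    larger PUTs are issued to S3, reducing per-request cost and
    improving throughput over the WAN."""
    objects = []
    for i in range(0, len(blocks), group_size):
        group = blocks[i:i + group_size]
        objects.append((f"obj-{i // group_size}", b"".join(group)))
    return objects

# Ten 2-byte blocks coalesce into three objects (4 + 4 + 2 blocks).
blocks = [bytes([n]) * 2 for n in range(10)]
objs = coalesce(blocks, group_size=4)
assert len(objs) == 3
assert objs[0][0] == "obj-0" and len(objs[0][1]) == 8
```

The trade-off the project must quantify is visible even in the toy: larger groups mean fewer requests but more read amplification when a single block inside a large object is needed back.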

10.2 ATG PROJECT 1: UNDERSTAND CACHING BEHAVIOR AND PERFORMANCE FOR TIER-2 WORKLOADS (WAN LATENCY MITIGATION)

This will be the first foundation project, helping us understand the performance characteristics of workloads that can leverage the cloud gateway; specifically, workloads that currently use NetApp FAS systems for their storage.
Project Goals: The goals of the project are the following:
1. Assess performance boundary limits (using ONTAP): Understand the performance limitations of cloud gateways with a realistic cloud storage provider (such as Amazon S3) and assess the applicability of a gateway for Tier-2 application workloads.
2. Efficient read caching, write-back caching, and prefetching strategies: Assess different read-caching technologies in this context. Evaluate methods to perform write-back caching (buffering on disk) and pick an efficient strategy. Also examine prefetching strategies that reduce the cold misses that hurt latency. Overall, look at different strategies to mitigate WAN latencies and their relationship to total cost.
3. Efficient inline deduplication: Since deduplication is vital to reducing cloud storage costs, it will be explored as part of this project. It introduces extra complexity in the code path, however, because it has to be done inline with requests. In addition, deduplication can reduce network traffic significantly, further reducing costs.
4. Efficient cloud interaction: Since the cloud gateway bridges file- or block-based protocols to cloud storage, we need to map blocks to objects. An efficient mapping should provide flexibility and less network traffic to the cloud.
5. Efficient metadata management: This is a side-effect of many of the features mentioned above, such as deduplication and block-to-object mapping. These features imply maintaining different types of metadata, i.e., lookup tables, which might need to be kept on disk or flash due to their size. Keeping them efficient in terms of both size and access performance is important to keep the overall performance of the gateway high.

6. Cognizant of VSA deployment: The cloud gateway might need to be deployed as a VSA in multiple scenarios. VSAs have their own peculiarities in terms of performance and resources, so while designing the gateway appliance it is important to be cognizant of how features might change in the VSA context.
Comments: As the goals indicate, this project is performance-centric, while building a relatively realistic cloud gateway prototype with many key features included. We hope to build the prototype leveraging ONTAP features wherever applicable and to evaluate it as a physical appliance. The outcome of the evaluation will be a detailed performance analysis of a cloud gateway appliance and its applicability to common Tier-2 applications. The cloud storage will likely be Amazon S3, and the Tier-2 applications we would like to evaluate in this context are Microsoft Exchange and Microsoft SharePoint. Since this is the first cloud gateway project, some essential features, such as security and reliability, have been kept out of scope. In the VSA context, compared to the physical appliance, the cloud gateway might be handicapped in terms of features and performance, further reducing the list of Tier-2 workloads that can be supported. These workloads will be highlighted as part of our evaluation of the VSA as well.
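Goal 2's interaction between read caching and prefetching can be illustrated with a toy sketch: an LRU cache that, on a miss for block n, also fetches the next few blocks in the hope of converting future cold misses into hits. All names, the capacity, and the naive sequential-prefetch policy are our own assumptions for illustration.

```python
from collections import OrderedDict

class PrefetchingReadCache:
    """LRU read cache with naive sequential prefetch: a miss on block n
    also fetches n+1..n+depth from the (slow, WAN-attached) backend."""

    def __init__(self, backend_fetch, capacity=8, depth=2):
        self.fetch = backend_fetch     # models a WAN read from cloud storage
        self.capacity = capacity
        self.depth = depth
        self.cache = OrderedDict()     # insertion/recency-ordered
        self.misses = 0

    def read(self, block_no):
        if block_no in self.cache:
            self.cache.move_to_end(block_no)   # refresh recency
            return self.cache[block_no]
        self.misses += 1
        for n in range(block_no, block_no + 1 + self.depth):
            self._install(n, self.fetch(n))
        return self.cache[block_no]

    def _install(self, block_no, data):
        self.cache[block_no] = data
        self.cache.move_to_end(block_no)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict least recently used

cache = PrefetchingReadCache(lambda n: f"data-{n}", capacity=8, depth=2)
for n in range(4):        # a sequential scan of blocks 0..3
    cache.read(n)
assert cache.misses == 2  # prefetch turned two would-be misses into hits
```

Each avoided miss is an avoided WAN round trip, which is exactly the latency-versus-cost trade-off the project goal asks us to quantify: deeper prefetch hides more latency but issues more (billable) cloud requests.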

10.3 ATG PROJECT 2: CLOUD SECURITY AND RELIABILITY ACROSS CLOUD PROVIDERS

This project is a potential follow-up to Project 1 and can leverage the infrastructure built there. However, its scope does not overlap with the previous projects, so it can be executed in parallel with them.
Project Goals: The primary goals of the project are the following:
1. Security: With a cloud gateway, enterprise data leaves the datacenter over a publicly accessible WAN and is stored for prolonged periods at a cloud storage provider. Security concerns must therefore be addressed to prevent enterprise data from being compromised. A commonly applied technique is to encrypt the data before sending it to the cloud storage provider; the cost of such encryption and the corresponding decryption needs to be evaluated in this context. Moreover, efficient key management for the encryption is another area that needs to be addressed.
2. Reliability: To prevent data loss in a traditional enterprise data center, parity schemes such as RAID are employed. With cloud storage, however, traditional RAID algorithms might not be applicable given the scale of data stored, so alternative techniques need to be evaluated. Moreover, with cloud storage it is essential to protect against data loss or unavailability caused by a network outage at a specific cloud storage provider. The data, along with its associated protection metadata (such as parity blocks or erasure-coded blocks), needs to be spread across multiple cloud providers to enable recovery when one or more of them fail or suffer an outage.
3. Auditing: The actual cost of cloud storage depends on the number of network accesses made to the cloud service, the amount of data transferred, and the actual number of bytes stored in the cloud. To validate the bill presented by the cloud storage provider for the data accessed via the gateway, the gateway has to audit all traffic to the cloud.
Comments: These features are essential to a viable gateway product. Most vendors in this space solve these issues in different ways; the project would examine all available options and make the right recommendations.
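The reliability goal's cross-provider protection can be illustrated with a single-parity toy: one data shard per provider plus an XOR parity shard at another, so the data survives an outage of any one provider. This is a deliberately simplified sketch; real systems would use Reed-Solomon style erasure codes to tolerate multiple failures, and the function names here are our own.

```python
from functools import reduce

def xor_parity(shards):
    """Compute an XOR parity shard over equal-length data shards."""
    return bytes(reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)),
                        shards))

def recover(surviving_shards, parity):
    """Rebuild the single lost shard from the survivors plus parity:
    XOR-ing everything that remains reproduces the missing shard."""
    return xor_parity(surviving_shards + [parity])

shards = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]  # one shard per provider
parity = xor_parity(shards)                        # stored at a fourth provider
lost = shards.pop(1)                               # provider 1 suffers an outage
assert recover(shards, parity) == lost             # data recovered without it
```

The sketch also shows why protection metadata must itself be placed carefully, as the text notes: losing the parity shard together with a data shard would make recovery impossible.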

10.4 ATG PROJECT 3: CLOUD-AWARE MANAGEMENT

This project will look at the data management aspects introduced by a cloud gateway in an enterprise datacenter. Some of the issues are extensions of known issues, while others are specific to the cloud.

The primary goals of the project are:
1. SLO-aware cloud tiering: With cloud storage as a tier in the storage hierarchy, we need to make this tier SLO-aware. The cloud provider would export a set of SLOs; the intelligence in the gateways would enhance these SLOs or provide stricter bounds on them. Together, these yield a new set of SLOs for the cloud storage tier. The rest of the data management infrastructure in the enterprise datacenter needs to incorporate these new SLOs and make them available to end clients.
2. Cloud storage billing support: As mentioned in the previous project, auditing support is an essential feature. In addition to auditing, we need management infrastructure in the gateway that allows admins to query the gateway and ascertain costs dynamically. For example, if an admin would like to know the cost of a specific backup or set of backups, we need an interface to expose the actual costs for a specific time period.
3. Cloud storage supportability: With cloud as a tier, a new support tier is introduced as well. The problems seen with cloud storage, and their resolution, are vastly different from those of traditional enterprise storage. We need the right management infrastructure to assess and resolve cloud storage problems efficiently.
Comments: Among these challenges, making the cloud storage tier SLO-aware is the biggest one. Doing so would let NetApp extend its SLO-aware management portfolio and provide greater value to our customers.
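The billing-support goal amounts to rolling up the audited request counts and byte totals against a price sheet, so the gateway can answer cost queries and the result can be checked against the provider's invoice. The sketch below uses assumed, illustrative price points (not actual provider rates) and hypothetical field names.

```python
# Assumed illustrative price points; real providers publish their own rates.
PRICE = {
    "put_request": 0.00001,     # per PUT request
    "get_request": 0.000001,    # per GET request
    "gb_transferred": 0.09,     # per GB sent over the WAN
    "gb_stored_month": 0.023,   # per GB-month of capacity
}

def estimate_bill(audit_log):
    """Roll an audit log of gateway-to-cloud traffic up into a cost
    estimate for a billing period."""
    cost = audit_log["puts"] * PRICE["put_request"]
    cost += audit_log["gets"] * PRICE["get_request"]
    cost += audit_log["gb_out"] * PRICE["gb_transferred"]
    cost += audit_log["gb_stored"] * PRICE["gb_stored_month"]
    return round(cost, 2)

# e.g. one month of backup traffic through the gateway:
log = {"puts": 100_000, "gets": 500_000, "gb_out": 50, "gb_stored": 1000}
assert estimate_bill(log) == 29.0   # 1.00 + 0.50 + 4.50 + 23.00
```

An interface like this, scoped to a tag such as a backup set and a time window, is what would let an admin ask "what did this backup cost?" as the goal describes.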

10.5 ATG PROJECT 4: CLOUD-ENABLED GLOBAL FILE SYSTEM

This project envisions cloud gateways enabling a single, geo-distributed, global file system, where the actual file system data resides in the cloud and the different cloud gateways provide access to it. This project requires that some of the essential features outlined in the previous projects are complete and that we have a viable single-site cloud gateway.
Project Goals: The goals are the following:
1. Globally distributed clients access a single file system: All data and metadata corresponding to the file system reside in the cloud. The gateways are deployed at different, globally distributed sites (or datacenters). All the gateways provide a view of a single file system (and namespace) and coordinate among themselves to enforce consistency and coherency.
2. Global data consistency management: Since files in the single file system could be shared across sites, there needs to be global data consistency management between the gateways by means of a global lock manager. Making the manager work effectively across WAN distances is the key challenge; understanding and articulating the consistency semantics in this scenario is another.
3. Global delegation coordination: For performance reasons, some gateways might need complete/exclusive access to certain portions of the global file system for prolonged periods of time, i.e., delegation. Since the delegation managers are separated by a WAN, new mechanisms to handle delegations might need to be devised.
Comments: Panzura already has this feature, and it is a key enabler in many of its enterprise accounts. Given Panzura's nascent customer base, however, it is not clear how valuable this complex feature is for the cloud gateway market.
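Global lock managers of the kind goal 2 describes are often built on time-bounded leases, so that a crashed or WAN-partitioned holder cannot block other sites indefinitely. The toy sketch below illustrates that idea only; the class name, API, and lease policy are our own assumptions, not a description of any existing product.

```python
class LeaseLockManager:
    """Toy lease-based global lock: a gateway holding a live lease on a
    path may serve it locally with exclusive access; leases expire, so a
    crashed WAN-remote holder eventually stops blocking other sites."""

    def __init__(self, lease_seconds=30):
        self.lease_seconds = lease_seconds
        self.leases = {}   # path -> (owner_gateway, expiry_time)

    def acquire(self, path, owner, now):
        """Grant (or renew) a lease unless another holder's lease is live."""
        holder = self.leases.get(path)
        if holder and holder[0] != owner and holder[1] > now:
            return False   # another gateway holds a live lease
        self.leases[path] = (owner, now + self.lease_seconds)
        return True

mgr = LeaseLockManager(lease_seconds=30)
assert mgr.acquire("/projects/spec.doc", "gw-london", now=0)
assert not mgr.acquire("/projects/spec.doc", "gw-tokyo", now=10)  # lease live
assert mgr.acquire("/projects/spec.doc", "gw-tokyo", now=40)      # expired
```

Choosing the lease duration is the WAN-scale version of the delegation trade-off in goal 3: long leases amortize round trips to the lock manager, but lengthen the window during which a failed site blocks everyone else.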

11 REFERENCES

1. Burton Group. Cloud-Storage Gateways: Bridge the Gap. January 2011.
2. The 451 Group. Cloud Storage On-Ramps. November 2011.
3. Nasuni. Nasuni Unveils New Storage Services Backed by a 100 Percent Uptime Service Level Agreement. July 18, 2011.
4. Intel Corporation. An Enterprise Private Cloud Architecture and Implementation Roadmap. June 2010.
5. Ian Howells (StorSimple). Cloud-as-a-Tier: Building an Enterprise Architecture for Secure High-Performance Hybrid Cloud Storage. May 2011.
6. Peer-1 Hosting. Peer-1 Hosting Rolls Out EMC Atmos. July 2011.
7. Dale Stara, Everest Group. Where Are Enterprises in the Public Cloud? | Gaining Altitude in the Cloud. April 2011.
8. Frank Gillett, Forrester. The Age of Computing Diversity. September 2010.
9. Gene Ruth, Burton Group. Market Profile: Cloud-Storage Service Providers, 2011. December 2010.
10. Intechnica. How Fast Is the Cloud? June 2011.
11. Cloud.com. 2011 Cloud Computing Outlook | Survey Results. 2011.
12. The 451 Group. CTERA Grows Hybrid Cloud Storage Base, Partners with EMC. August 2011.
13. Intel Corporation. Taking Control of the Cloud for Your Enterprise. 2011.
14. Gartner. Magic Quadrant for WAN Optimization Controllers. December 2010.
15. Gartner. Hybrid Cloud Appliances Expand Cloud Storage Use Cases. January 2011.
16. Gartner. Cloud-Based Server Backup Services 1Q11 Update. January 2011.
17. Gladinet. Gladinet Blog (http://gladinet.blogspot.com). June 2011.
18. The 451 Group. (Mezeo) IaaS, PaaS and Enabling Technologies – Market Monitor. January 2010.
19. Gartner. Cool Vendors in Storage Technologies. April 2011.
20. Gartner. Cloud IaaS: Adding Storage to Compute. October 2010.
21. Gartner. Hype Cycle for Business Continuity Management and IT Disaster Recovery Management. September 2010.
22. Gartner. Competitive Landscape: Cloud Storage Infrastructure as a Service, North America, 2010. June 2010.
23. Gartner. Cloud Storage: An Emerging Market. June 2009.
24. The 451 Group. Nirvanix Quadruples Cloud Storage Capacity, Eyes European Expansion. April 2010.
25. Panzura. The Panzura Global Cloud Storage Platform. 2011.
26. The 451 Group. Do Startups with On-Ramps Hold the Key to Unlocking Cloud Storage? June 2010.
27. Mezeo. Distributed Data Store and Async Geo Services with Single Namespace. June 2010.
28. Amazon. Amazon S3 Announces Server-Side Encryption Support. October 2011.
29. Network Computing (magazine). Quest and StorSimple Collaborate on Cloud-Based Backup. May 2011.
30. CipherCloud. Cloud Encryption Gateway. September 2011.
31. InformationWeek (magazine). HP Takes On Amazon with Enterprise Cloud Services. January 2011.
32. IDC. How Conigent Effectively Leverages Public Cloud IT Infrastructure with Egnyte. December 2010.
33. The 451 Group. Nasuni Unveils File, Aims to Be NetApp for Cloud Storage. February 2010.
34. ESG. Nasuni Builds a Bridge to the Cloud. April 2010.
35. The 451 Group. Do Startups with On-Ramps Hold the Key to Unlocking Cloud Storage? June 2010.
36. The 451 Group. TwinStrata Emerges with Focus on Hybrid Cloud Storage Environment. May 2010.
37. The 451 Group. Nirvanix Quadruples Cloud Storage Capacity, Eyes European Expansion. April 2010.
38. ESG. Will Cloud Storage Come of Age in 2010? February 2010.
39. IDC. Cirtas Announces Cloud Storage Controller for Enterprise Storage. September 2010.
40. The 451 Group. Cirtas Goes Live with Cloud Storage Controller, Reveals Amazon as an Investor. September 2010.
41. The 451 Group. Panzura Pounces on Hybrid Cloud Storage Opportunity for Enterprises. September 2010.


NetApp provides no representations or warranties regarding the accuracy, reliability or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© 2015 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, xxx, and xxx are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. TR- Cloud Storage Gateways XXX-XX NetApp Confidential