Leveraging Cloud Computing For Optimized Storage Management

EMC Proven Professional Knowledge Sharing 2009

Information Storage & Management (EMCPA)

Mohammed Hashim
Escalations & Training Manager, Global Technical Support
PSE Lab, Bangalore, Wipro Technologies

Rejaneesh Sasidharan
Technical Lead-SME, Global Technical Support
PSE Lab, Bangalore, Wipro Technologies


Table of Contents

Leveraging Cloud Computing For Optimized Storage Management
Introduction
Cloud Computing and Data Storage
Industry Relevance and Article Overview
SOA
SaaS
Distributed System
Grid Computing
Applying Cloud Computing to Storage
Cloud Computing Status in the Global Market
Cloud in Action
Market Profile and Market Size of Cloud Computing
Various Models of Cloud Computing
Customer Adoption
Drivers for Adoption and Industrial Outlook
Major Cloud Service Providers
Prominent Players
Evolution of Cloud based Storage
Need for Cloud based Storage
Comparison Chart of Major Cloud based Services
Implementing a Cloud Computing Solution
Optimizing a Solution
Managing the Cloud Solution
Managing Enterprise 2.0 and SLAs
Enterprise SLAs and Cloud Computing
Security in the Cloud
Background Analysis
Securing the Cloud Solution
Future of the Cloud
Conclusion: Cloud Vision and Strategy
Appendix A: Technical References
Bibliography
Websites
Appendix B: Cloud Taxonomy
Cloud Technology Landscape
Appendix-C SaaS, Cloud and Web2.0
Biography

Disclaimer: The views, processes, or methodologies published in this compilation are those of the authors. They do not necessarily reflect EMC Corporation’s views, processes, or methodologies.


Introduction

Cloud Computing spreads IT computing resources across cloud boundaries, and users access them selectively through service providers. Generally, users pay for computing capacity on demand and need not concern themselves with the underlying technologies, or with the challenges of achieving scalable and diverse storage, server capacity, and extensibility of other resources.

Applications of the Cloud Computing model are expanding rapidly as connectivity costs fall and computing hardware operates more efficiently at scale. The cloud’s services have expanded beyond web applications to include data storage, raw computing, and access to various specialized services. This growth is due to increasing government economic incentives for multiple users to share common resources, and to technological advances in collective hardware and software performance, the lack of which had earlier held back distributed computing solutions. The cloud is becoming a popular solution to the problem of horizontal scalability.

Cloud Computing and Data Storage

Cloud-based storage has evolved from continuing attempts to decouple storage from applications so that each resource can be optimally scaled, utilized and managed. Cloud storage is a model of networked data storage where data resides on multiple virtual servers, generally hosted by third parties rather than dedicated servers. Hosting companies operate large data centers; users who require data hosting buy or lease storage capacity. The operators, in the background, virtualize the resources according to the customer’s requirements and expose them as virtual servers that the customers can manage. Physically, the resource may span multiple servers, data centers, or even continents.

Industry Relevance and Article Overview

Cloud Optimized Storage is the new buzzword in storage networking; organizations around the world are rethinking how they manage their data storage. Gartner predicts that “By 2012, 80 percent of Fortune 1000 companies will pay for some cloud computing service, and 30 percent of them will pay for cloud computing infrastructure.” Merrill Lynch analysts predict that the cloud market potential for business and productivity applications will be about $96Bn by 2012 (including $30Bn for SaaS). A number of major industry players are waiting to adopt a Cloud Computing model to maximize the use of their services and boost revenues. This article focuses on Cloud Computing and Cloud based storage solutions and compares existing setups. It describes features of storage optimization, leveraging the current IT infrastructure, and the advantages and disadvantages of the model.

The discussion covers SOA, SaaS, Grid and Cloud Computing; cloud architecture and applying Cloud Computing to storage; outlining a cloud storage solution with optimal performance; managing the solution over existing storage infrastructure; the advantages and risks of Cloud Computing; and a comparison of different cloud storage solutions. This article provides insight into capabilities for service providers and data centers, and into the core capabilities that end users should consider when evaluating cloud storage solutions. These capabilities and benefits shed light on how cloud based storage would benefit each of them.

Engineers responsible for storage design and management will learn about the various elements of a Cloud Computing business model. The ability to increase capacity or add capability without investing in new infrastructure, training new personnel, or licensing new software is just one of the many benefits. This model comprises any subscribed service that extends existing IT capabilities in real time over the Internet cloud.

SOA

Service Oriented Architecture was an initiative leading to virtualization. In Service Orientation, we virtualized resources; everything was differentiated and billed accordingly. It is a conceptual business architecture where business functionality or application logic is made available to SOA users or consumers as shared, reusable services on an IT network.

An enterprise level reference SOA architecture establishes guidelines for defining the architecture of any SOA based project.

1. SOA Roadmap: This defines the milestones on an SOA adoption journey. The milestones are defined for maturing the SOA infrastructure and for rolling out business applications using SOA principles. In this stage, an SOA practitioner defines major activities and timelines.

2. SOA Infrastructure: After defining the roadmap, assess the customer’s IT infrastructure for SOA readiness. This stage defines the target SOA infrastructure including hardware, software tools, packages, and alliances. Making changes to the infrastructure involves the following sub-stages: Plan, Define, Design, Construct, Test, and Deploy.

3. Services, Composite Applications: This stage of SOA adoption deals with design and programming aspects of service, and process and application realization. Activities include creating a project plan, developing a risk assessment and mitigation plan, formulating QA strategy and guidelines, defining SOA design methodologies, creating a test plan, and testing report generation. Sub-stages include Plan, Define, Design, Construct, Test and Deploy.

4. Operations & Maintenance: This stage of the journey observes SOA in production and measures its value. Metrics collected during this stage guide the next set of SOA activities including infrastructure augmentation, service portfolio enhancements, changes to SOA runtime governance etc.

5. Change Management: This stage lays down the formal process for making changes to the SOA system. Change management defines how changes are brought in for the SOA Roadmap definition, infrastructure, and service portfolio. These are guided by policies defined by SOA Runtime Governance.

6. SOA Runtime Governance: This defines the policies establishing behavioral rules and guidelines. Policies are specific and cover business, organizational, compliance, security, and technology facets of services operating within SOA.

SaaS

The “Software as a Service” delivery model is increasingly popular as there is no hardware or software to manage and the service is delivered through a browser. The prime factors in this model are:

i. Pay per use
ii. Instant scalability
iii. Security and reliability
iv.

Advantages include lower cost of ownership, reduced responsibility for infrastructure management, bandwidth for unexpected resource loads, and faster application rollout. CRM, financial planning, human resources, and word processing are the common implementation areas. Risks include security, downtime, access, dependency, and interoperability.

Distributed System

A distributed system executes tasks (orders) through the cooperation of its components. In operation, it has only limited knowledge of its components’ current status because of:

• The time needed to determine the status of a component, which is longer than the duration of that status (very often a consequence of the spatial distribution of the system)
• Status hiding

Components typically operate asynchronously and communicate via messages. Protocols are the specifications for syntax, semantics, and dialog structure of the communication. The Internet, WWW, Grids and Chip are common examples of distributed systems.
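The asynchronous, message-based cooperation described above can be illustrated with a minimal Python sketch. The component names (`worker`, `coordinator`), the message format, and the choice of `asyncio.Queue` as the communication channel are assumptions made for illustration only; they do not describe any particular distributed system.

```python
import asyncio


async def worker(name, inbox, outbox):
    """Process orders from the inbox until a None sentinel arrives."""
    while True:
        order = await inbox.get()
        if order is None:              # sentinel: no more work for this worker
            break
        await asyncio.sleep(0)         # yield control, standing in for asynchronous work
        await outbox.put((name, order, order * order))


async def coordinator(orders, n_workers=2):
    """Hand out orders and collect results purely through messages."""
    inbox, outbox = asyncio.Queue(), asyncio.Queue()
    tasks = [
        asyncio.create_task(worker(f"w{i}", inbox, outbox))
        for i in range(n_workers)
    ]
    for order in orders:
        await inbox.put(order)
    for _ in tasks:
        await inbox.put(None)          # one sentinel per worker
    await asyncio.gather(*tasks)

    results = []
    while not outbox.empty():
        results.append(outbox.get_nowait())
    return results


def run_demo():
    """Run four orders through two workers and return the sorted results."""
    results = asyncio.run(coordinator([1, 2, 3, 4]))
    # The coordinator never inspects a worker's internal state; it only sees messages.
    return sorted(result for _, _, result in results)
```

The coordinator cannot tell which worker handled which order until the result message arrives, which mirrors the limited status knowledge noted above.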

Grid Computing

This is a form of distributed computing whereby a "super and virtual computer" is composed of a cluster of networked, loosely-coupled computers acting in concert to perform large tasks. It is a distributed system for a community of users that provides:

• Resource sharing (computers, storage, I/O equipment, network, other equipment such as scientific instruments, data, software)
• Support for user collaboration
• Support for virtual organizations

Resources, users and virtual organizations are dynamic, i.e. during operation they may emerge, change, or vanish. Mostly, grids:

• Use heterogeneous resources
• Are geographically distributed
• In science, combine resources and users from different management domains and institutions, whereas in business they are restricted to a single institution/company (security, liability of service)
• Approach SOA

Applying Cloud Computing to Storage

Cloud Computing is a model in which data and applications are hosted and managed remotely and provided as a service over the internet cloud. It provisions IT capabilities on-demand versus the traditional procure and provision model. Service utilization is calculated based on a consumption model where consumers pay only for the operational units consumed.
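As a rough illustration of the consumption model, the sketch below computes a monthly charge from metered units. The unit prices in `Rates` and the three metered quantities are hypothetical values chosen for the example; they are not any provider's actual price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical unit prices in USD; not a real provider's price list."""
    per_instance_hour: float = 0.10
    per_gb_month_stored: float = 0.15
    per_gb_transferred: float = 0.10


def monthly_charge(instance_hours, gb_stored, gb_transferred, rates=Rates()):
    """Return the charge for the units actually consumed in one month."""
    return round(
        instance_hours * rates.per_instance_hour
        + gb_stored * rates.per_gb_month_stored
        + gb_transferred * rates.per_gb_transferred,
        2,
    )
```

A consumer that uses nothing pays nothing (`monthly_charge(0, 0, 0)` returns `0.0`), which is the main contrast with the procure and provision model, where capacity is paid for whether or not it is used.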

Cloud Computing segments the IT services market into infrastructure, platform, and application services: Infrastructure as a service (IaaS), Platform as a service (PaaS), and Software as a service (SaaS). Although the general industry consensus treats SaaS as a subset of the Cloud Computing paradigm, the two are distinct entities that function in composition. In SaaS, complete business applications are made available for consumption as a service, whereas IT capabilities (including application platforms and services) made available for consumption as a service belong to Cloud Computing. In that sense, cloud services could be used to build SaaS applications if they were offered for consumption in a service model.

Cloud Computing Status in the Global Market

Cloud Computing symbolizes a trend toward a commoditization and utility mindset in the industry and society as a whole. It may be signaling the emergence of a paradigm of IT capabilities as utility services provided and consumed on a need rather than an ownership basis. The following trends and events may be key contributors.

• There is a significant disruption caused by new players who are using commodity hardware to provide Cloud services.

• Infrastructure virtualization and management tools are quickly maturing as viable and reliable cloud platforms. The availability of technology and business environmental factors are prompting enterprises to consider optimizing their computing resources.

• Some enterprises are unable to expand existing data centers due to space and energy constraints and government regulation. Expansion is not possible even if there is money and willingness.

• Virtualization and other dynamic schemes are reducing hardware sales, pushing hardware vendors to pursue service models. Sun, HP, and IBM started pay-per-use hosted models as an alternative revenue channel when they failed to achieve the expected growth from traditional channels.

The model aligns with how small and medium companies want to procure resources. Cloud Computing provides them with a model of incremental growth with an inherent elasticity for shrinkage. It can also meet enterprise demand for transient compute capacity and scaling requirements.

Cloud in Action

The New York Times used 100 Amazon EC2 instances and a Hadoop application to process 4TB of raw TIFF image data (stored in S3) into 1.1 million finished PDFs in the space of 24 hours, at a computation cost of just $240 [10]. This is an illustration of economy of scale. (Hadoop is the Apache implementation of Google technology for large data processing.)
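The figures in this example can be broken down with a few lines of arithmetic. The sketch below assumes that all 100 instances ran for the full 24 hours and that the $240 covers compute only (not S3 storage or data transfer); the function name `cost_breakdown` is introduced here for illustration.

```python
def cost_breakdown(instances, hours, total_cost_usd, outputs):
    """Derive per-unit costs from the totals quoted for a batch job."""
    instance_hours = instances * hours
    return {
        "instance_hours": instance_hours,
        "usd_per_instance_hour": total_cost_usd / instance_hours,
        "usd_per_output": total_cost_usd / outputs,
    }


# Figures quoted above: 100 instances, 24 hours, $240, 1.1 million PDFs.
nyt = cost_breakdown(instances=100, hours=24, total_cost_usd=240, outputs=1_100_000)
```

Under these assumptions the job consumed 2,400 instance-hours at about $0.10 each, or roughly two hundredths of a cent per finished PDF.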

In many circles, Sawzall is considered the key building block for much of Google’s data analysis (http://labs.google.com/papers/sawzall.html).

• "… Sawzall has become one of the most widely used programming languages at Google. … [O]n one dedicated Workqueue cluster with 1500 Xeon CPUs, there were 32,580 Sawzall jobs launched, using an average of 220 machines each. While running those jobs, 18,636 failures occurred (application failure, network outage, system crash, etc.) that triggered rerunning some portion of the job. The jobs read a total of 3.2 x 10^15 bytes of data (2.8PB) and wrote 9.9 x 10^12 bytes (9.3TB)."
• Other similar languages: Yahoo’s Pig Latin and Pig; Microsoft’s Dryad
• Cloned in open source: Hadoop, http://hadoop.apache.org/core/

Market Profile and Market Size of Cloud Computing

From the demand perspective, the market needs various models palatable to buyers based on their appetite for cost, control and risk. The spectrum ranges from models with full control in terms of hardware choice and root access, to complete black dynamism. Our age-old managed hosting model is at the lowest band, repackaged with a higher degree of automation and optimization from virtualization and standardization in various layers of the IT stack.

Potential cloud contenders come from various sections of the industry (e.g., telecom). The vendor market can be divided into the following classes, each of which addresses requirements in a different layer of the IT stack. It appears likely that a future cloud application will be built with services from more than one vendor. In our opinion, this is the ideal end goal for Cloud Computing.

1. Application Platform: This is classified under cloud vis-à-vis SaaS and is more infrastructure in nature than business software. Application platform providers offer software development platforms (Platform as a service) for application development. The resulting application could be either a fully consumable business application (in which case it could be called SaaS) or just a set of cloud services. Usually the application is also hosted and run by the same provider. This may not be the case going forward due to increasing standardization of virtualization and application definition (viz. XAML). Providers include Force.com, NetSuite, Bungee Labs, Coghead (developed applications run on Amazon) etc.

2. Software Service: These services are standalone business/software services for consumption by cloud applications/services inside/outside the provider. Unlike SaaS, these are not fully developed applications with enough value on their own; they must be combined with other services. Examples include Google Maps, Yahoo Pipes, PayPal, Amazon Book Web Services, StrikeIron (data as a service) etc. This is a generic bucket covering all possible software services required to make applications/services operational (integration, security, IM etc.). Note: At times, these are also classified as SaaS, but the majority considers them SaaS only when the applications are consumable for business functions rather than as components of a function.

3. Storage Service: This is about storage services on the internet. These evolved from the first generation of storage services offered on the internet during the 1990s.

4. Computing & Storage Infrastructure: This is about providing compute infrastructure service on the internet. This usually includes other services such as networking and management services to make the infrastructure useful. Examples include Amazon WS, Enomaly, CohesiveFT, GoGrid, etc.

An estimate of the market size:

• A Merrill Lynch study projects Cloud Computing business potential at $95Bn (ad revenue $65Bn); this includes SaaS applications. As per the McKinsey projection, SaaS is around $30Bn. That leaves the pure cloud market at $65Bn by 2012.

• Heuristics based: As a rough estimate, the installed base of commodity servers in the world is around 30Mn; assume that half (15Mn) of these servers migrate to the Cloud (the other half being split between SaaS, sunset/rationalization, and in-house data centers). With utilization at 80%, the cloud will need about 4Mn servers (16Mn when normalized for a typical multi-core/CPU enterprise server vs. a leaner cloud server). Amazon makes about $1K per server per year, bringing the figure to $4Bn. With CPU normalization, it can go up to $16Bn (typical two-CPU quad-core enterprise servers). The other cost components are storage and bandwidth charges totaling about $30-35Bn (storage is much more expensive). The remaining $30-35Bn could come from professional services and cloud software services like Gmail, Maps, PayPal, StrikeIron etc.
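The heuristic estimate can be retraced step by step. The sketch below is one possible reading of it, not the authors' own calculation: the consolidation step (scaling the migrating servers by current utilization over target utilization) and the 20% current utilization figure are assumptions, chosen because they reproduce the quoted result of about 4Mn cloud servers. All parameter names are illustrative.

```python
def cloud_market_estimate(
    installed_servers=30e6,        # rough installed base of commodity servers
    share_moving_to_cloud=0.5,     # half assumed to migrate to the cloud
    current_utilization=0.20,      # ASSUMPTION: typical utilization before migration
    cloud_utilization=0.80,        # utilization assumed in the cloud
    revenue_per_server_usd=1_000,  # approximate yearly revenue per cloud server
    cpu_normalization=4,           # enterprise server vs. leaner cloud server
):
    """Retrace the server-count heuristic and return the intermediate figures."""
    migrating = installed_servers * share_moving_to_cloud
    # ASSUMPTION: the same work fits on fewer servers at higher utilization.
    cloud_servers = migrating * current_utilization / cloud_utilization
    compute_revenue = cloud_servers * revenue_per_server_usd
    return {
        "migrating_servers": migrating,
        "cloud_servers": cloud_servers,
        "compute_revenue_usd": compute_revenue,
        "normalized_compute_revenue_usd": compute_revenue * cpu_normalization,
    }


estimate = cloud_market_estimate()
```

With these inputs the sketch yields 3.75Mn cloud servers and $3.75Bn of compute revenue ($15Bn after normalization), which the text rounds to 4Mn, $4Bn, and $16Bn.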

Various Models of Cloud Computing

Dominant models of Cloud Computing from the providers’ classification perspective include:

1. IaaS (Infrastructure as a Service): This category of providers offers IT infrastructure capabilities such as servers, storage, networking, and other application infrastructure like queues and databases. The primary differentiator is that these providers do not provide platforms for building applications natively.

2. PaaS (Platform as a Service): This category of providers offers the next level of platforms above IaaS, enabling consumers to build cloud ready applications using (mostly proprietary) API frameworks. These frameworks are designed and optimized for multi-tenancy (cost) and scalability (autonomy) on demand.

3. SaaS (Software as a Service): This category of providers offers complete business applications or functions on a pay-per-user model. The rental model could be based on 'concurrent user count', 'per named user', 'per transaction' etc.

Note: Please remember that these providers are going to evolve over time and the various dominant models are likely to fuse and give rise to a universal, inter-operable cloud.

Pattern: Embedded Service [Access API]
Description: The core software is owned and run by the enterprise with hooks to use services from the internet. The core software could also run on the cloud, while ownership still is with the enterprise.
Example Services: Google API, Flickr, eBay, StrikeIron

Pattern: Embedding Service [Plug-in API]
Description: The service is owned and run by the enterprise, including on the cloud. The services are then embedded into Cloud applications and platforms on the internet. It uses “Access API” to access the core of the platform.
Example Services: Facebook

Pattern: Runtime Environment
Description: In this mode, the service code is developed using the platform and runs inside the platform.
Example Services: Force.com, Ning, Bungee

Pattern: Agnostic Hosting (Infrastructure Cloud)
Description: This category of platforms is agnostic of the application details like API, language etc.
Example Services: Amazon EC2

Pattern: Integrated Hosting (Platform Cloud)
Description: This category is more aware of the computing framework used for application development, but is still domain agnostic (from a business model perspective).
Example Services: Joyent, Gigaspaces

Pattern: Verticalized Cloud (Application Cloud)
Description: These vendors focus on a particular service domain, acting more like a domain eco-system (verticalization of cloud).
Example Services: Force.com for CRM, Webex for collaboration, RightNow for Customer Care

Pattern: Private Cloud (different from cloud inside the enterprise)
Description: This is an emerging trend where customers pay a premium for extra high quality, reliable service hosted on the cloud.

Pattern: Tools and Technologies
Description: This is about ISVs supplying tools and technologies for the public cloud and also integrating the enterprise cloud to a public cloud. These may work across all clouds or a selected partner set.
Example Services: VMWare vCloud

1. VMWare is betting big on cloud with its vCloud and VDC-OS (Virtual Data Center OS).
2. 3Tera has an alliance with Citrix and large customers like BT.

Customer Adoption

Global CxOs are pressured to reduce their non-differentiating IT footprint with innovative solutions because:

i. Datacenter hardware utilization is very low (10-20%). Organizational policies and politics are barriers to platform rationalization.
ii. Applications are commonly provisioned for peak capacity.
iii. Less than 40% of the features available in packaged applications are put to use in a typical enterprise.
iv. Maintenance and data center operations costs are higher than software costs.

To summarize, organizations are paying for over-provisioned and under-utilized IT infrastructure and software that is non-differentiating and unresponsive to business change. This is the general pattern across the industry and we presume that the Cloud will have appeal once the industry overcomes FUD (fear, uncertainty and doubt). Enterprises (especially large enterprises) will take advantage of virtualization and other Cloud related technologies to boost utilization and manage their in-house infrastructure more efficiently. This is sometimes referred to as a Private Cloud (the term also seems to be used to define part of the public cloud isolated and dedicated exclusively for a customer). In the near term, enterprises will focus on making their in-house infrastructure more efficient and start leveraging cloud services to move some applications off premise.

Gartner estimates that 3% of custom and packaged IT applications are currently off premise. By 2013, this percentage will rise to 20%. Most off premise applications will run on some sort of cloud infrastructure. Enterprises will move simple and less-demanding applications to Cloud Computing in the short term. In the longer term, some of the larger and mission-critical transactions will be entrusted to cloud providers.

Email is one of the first applications to migrate to Cloud as it has a strong business case compared to the outsourced or in house options. Organizations like Sanmina-SCI, Avago, .com etc are already using Google enterprise email services. According to Gartner, by 2012, 20% of enterprise e-mail seats will use a SaaS or cloud model for e-mail services.

Drivers for Adoption and Industrial Outlook

The following table illustrates both positive and negative forces at play in terms of Cloud adoption in the industry.

Reference Study http://innovation.wipro.com/CTO/wiki

A few more points follow with respect to use cases from an infrastructure perspective.

• The SMB market is adopting the cloud infrastructure with enthusiasm. The proposition of having access to incremental infrastructure on a rental basis could be next to utopia.

• Large enterprises are using cloud infrastructure for burst computing purposes (businesses with seasonality). Burst computing is a scenario where an enterprise needs enormous computing and storage power for a brief span of time. Large scale application performance testing is a typical case. A variation of this model is a proposal for enterprise IT to use the cloud to process spill-over requirements in a traditional application setup: capacity is provisioned for the average load, and anything above a predetermined threshold is forked to the cloud.

• Large enterprises will also look at the underlying technologies to create in-house clouds to optimize their IT infrastructure. Some enterprises will also be open to the idea of moving their applications to a private cloud managed by an outsourcer.

• Cloud infrastructure is also attractive where typical internal sourcing models are too sluggish or cost inefficient.

• Cloud services can potentially be embedded inside enterprise applications to enrich them with the power of the internet; the most popular include Google Maps, StrikeIron, Amazon Book WS, eBay WS, and the Google Data API.

• ISVs will use many Cloud services to offer their software in a SaaS model. Gartner has proposed a planning assumption that through 2013, more than 70% of platform as a service based business applications will be developed by ISVs (not enterprise IT departments).

• From a cloud service perspective, we are all familiar with mashups or pure services on the internet (Gmail, Hotmail, PayPal, Google Maps, iGoogle etc.).
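The spill-over variation of burst computing mentioned in the list above can be sketched as a simple threshold rule. The function names `split_load` and `plan`, and the use of abstract load units per interval, are assumptions made for illustration; a real deployment would also have to account for data transfer, latency, and provisioning delay.

```python
def split_load(demand, in_house_capacity):
    """Return (served_in_house, forked_to_cloud) for one time interval."""
    if demand < 0 or in_house_capacity < 0:
        raise ValueError("demand and capacity must be non-negative")
    in_house = min(demand, in_house_capacity)
    return in_house, demand - in_house


def plan(demands, in_house_capacity):
    """Apply the threshold rule to a series of per-interval demands."""
    return [split_load(d, in_house_capacity) for d in demands]


# Capacity provisioned for an average load of 100 units; the third interval bursts.
schedule = plan([80, 100, 250], in_house_capacity=100)
```

In this example only the 150 units above the threshold in the third interval go to the cloud, so the enterprise pays for cloud capacity only during the burst.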

Numerous firms have developed complex applications that demonstrate the potential power of the cloud. Cloud Computing and storage have become ideal platforms to develop sophisticated, economical and flexible services. Cloud-based technology is here to stay, will rapidly become pervasive, and will change the way we do business. Cloud-based storage has evolved from continuing attempts to de-couple storage from applications so that each resource can be optimally scaled and managed.

Major Cloud Service Providers

The value proposition of Cloud Computing could be offered by many existing players by extending their skill in areas related to Cloud Computing.

1. Major telecom vendors are converting their business model for providing infrastructure services to enterprises by leveraging their skill in handling massive amounts of infrastructure equipment to offset loss of revenues due to the emergence of VOIP.

2. ISPs, hosting companies, and large scale Web platform players like Amazon, Google and Yahoo are leveraging their skills in handling large scale data centers.

3. Hardware vendors are looking to leverage the cloud paradigm to host their hardware in a rental model, like Network.com from Sun and EC2 from Amazon.

4. Traditional consumer/enterprise service providers (viz. DHL, ADP) can participate in the cloud model by also hosting the IT applications and processes.

5. Traditional software giants like Microsoft, SAP, and Oracle etc. are taking preemptive measures to retain their customer bases through cloud (SaaS) counterparts of their traditional software offerings. Also, traditional IT services providers can participate as aggregators in the cloud.

Services opportunities in Cloud Computing are in distinct areas:

1. Developing software tools and utilities to facilitate building, packaging, deploying, and managing applications in the cloud

2. Consulting and implementation services around migration to public and private cloud

Now, our discussion moves to understanding the existing and evolving market offerings in Cloud Computing. Web players are leading in cloud services and cloud-based applications.

Amazon – The company has moved to monetize its global scale eCommerce platform to offer computing resources on demand (EC2, S3) along with other cloud based web services. Amazon was one of the first vendors to bring broader Cloud services to market.

Google – Google offers its search, advertising, email and office suite as cloud based applications. Google’s cloud offering leverages its massive data center investments. The company also has a platform as a service offering called Google App Engine.

Salesforce.com – Salesforce.com has moved to monetize the infrastructure it built for its SaaS CRM application. Their PaaS offering, force.com, allows developers to build multi-tenant SaaS applications. The company has successfully built an ecosystem of third party applications on its platform (700 applications from 350 vendors).

Prominent Players

The prominent players include service providers like Amazon, AppNexus, eBay, Google, GoGrid, Salesforce and Yahoo; as well as traditional vendors including IBM, Intel, Microsoft and Nirvanix. Individual users are adopting it through large enterprises including General Electric, L'Oréal, Procter & Gamble and Valeo. Vendors such as Caringo, EMC, Ibrix, Xiotech, and others are racing to provide a storage services layer or APIs that will underpin the next generation of cloud-based storage. These vendors are developing cloud-based storage infrastructures that will go well beyond the limitations of first generation products that tried to provide basic FTP and WebDAV-type access.

Next-generation solutions will provide services on top of true Web services architectures. Moreover, these next-generation solutions will provide secure partitioning, data organization, and advanced user management services. Major IT vendors like Microsoft, IBM and HP are bringing their cloud products and services to market. Oracle is also expected to bring an application platform related offering based on BEA products to market.

Evolution of Cloud based Storage

Figure 1 (Reference Taneja Group Research)

We must analyze the evolution of Cloud based storage from the perspective of constantly changing user demands with regard to the scalability and performance of data storage and applications. In a Web-centric world, large service providers host storage and computing, and customers buy them on a pay-per-use basis; this makes the IT infrastructure elastic and cost-optimized.

According to Wikipedia, the Cloud is a metaphor for the Internet, which is commonly depicted in network diagrams as a cloud outline. The underlying concept dates back to 1960 when John McCarthy opined that "computation may someday be organized as a public utility", and the term cloud was already in commercial use in the early 1990s to refer to large ATM networks. By the turn of the 21st century, Cloud Computing solutions had started to appear on the market, primarily focused on Software as a service (SaaS). Amazon.com played a key role in the development of Cloud Computing by modernizing their data centers after the dot-com bubble and providing access to their systems by way of Amazon Web Services in 2002 on a utility computing basis.

In 2007, Google, IBM, Salesforce, Yahoo and a number of universities began large scale Cloud Computing research projects and application development. By mid 2008, Cloud Computing had gained publicity thanks to the press and global technical conferences.

In August 2008, Gartner observed that "organizations are switching from company-owned hardware and software assets to per-use service-based models" and that the "projected shift to Cloud Computing will result in dramatic growth in IT products in some areas and in significant reductions in other areas."

Need for Cloud based Storage

One of the major business needs driving cloud based storage is a service-based online economy in which resources and services are transparently provisioned and managed in real time.

Other reasons include:

• dramatic growth in interconnected devices
• increase in real-time data streams
• increased industrial adoption of service oriented architectures
• Web 2.0 applications and SaaS/PaaS/IaaS
• changes in multi-vendor collaboration across geographies
• globalization
• massive social networking and mobile commerce
• inexpensive and more efficient means of connecting to the Internet, and its immense user penetration
• improvements in virtualization
• Grid technologies
• a shift in rationale toward demand- and usage-based, cost-effective utilization of resources
• tremendous increase in the scale of IT environments

Comparison Chart of Major Cloud based Services

Reference Hysea Cloud Computing Workshop- “Migration to Cloud Computing: An Academic Perspective.”

Illustration on following page.


Implementing a Cloud Computing Solution

Cloud Computing is often confused with:

• grid computing (a form of distributed computing whereby a "super and virtual computer" is composed of a cluster of networked, loosely-coupled computers, acting in concert to perform very large tasks)
• utility computing (the packaging of computing resources, such as computation and storage, as a metered service similar to a traditional public utility such as electricity)
• autonomic computing (computer systems capable of self-management)

Today, many Cloud Computing deployments are powered by grids, have autonomic characteristics and are billed like utilities, but Cloud Computing can be seen as a natural next step from the grid-utility model.

Some successful cloud architectures have little or no centralized infrastructure or billing systems, including peer-to-peer networks like Skype and volunteer computing like Wikipedia. Today, developers can create a cloud application on a number of cloud platform technologies. To understand cloud platforms, let’s start by looking at cloud services in general. As Figure 2 shows, cloud services can be grouped into three broad categories:

Figure 2 (Reference Taneja Group Research)

1. Software as a service (SaaS): A SaaS application runs entirely in the cloud (that is, on servers at an Internet-accessible service provider). The on-premises client is typically a browser or some other simple client. Salesforce.com is the best-known example of a SaaS application, but many others are also available.

2. Attached services: Every on-premises application provides useful functions on its own. An application can sometimes enhance these by accessing application-specific services provided in the cloud. Because these services are usable only by this particular application, they can be thought of as attached. Apple’s iTunes is one popular consumer example. The desktop application plays music and more, while an attached service allows buying new audio and video content. Microsoft’s Exchange Hosted Services provides an enterprise example, adding cloud-based spam filtering, archiving, and other services to an on-premises Exchange server.

3. Cloud platforms: A cloud platform provides cloud-based services to create applications. Rather than building their own custom foundation, for example, the creators of a new SaaS application could build on a cloud platform.

Whether it’s on-premises or in the cloud, an application platform has three parts:

1. A foundation: Nearly every application uses some platform software on the machine it runs on. This typically includes various support functions, such as standard libraries and storage, and a base operating system.

2. A group of infrastructure services: In a modern distributed environment, applications frequently use basic services provided on other computers. It’s common to provide remote storage, for example, integration services, an identity service, and more.

3. A set of application services: As more and more applications become service-oriented, the functions they offer become accessible to new applications. Even though these applications exist primarily to provide services to end users, they are also part of the application platform. (It might seem odd to think of other applications as part of the platform, but in a service-oriented world, they certainly are.)

Developers build cloud applications using the three parts of an application platform. A framework for classifying and characterizing the various cloud technology vendors in the market follows. By our definition, Cloud Computing means developing, deploying, and/or managing IT services on the public Internet platform. It may be unnecessary to develop cloud software on the cloud itself. For example, you can use a local development platform to develop but a public cloud to deploy and manage. It is imperative that we understand the three pillars involved in software development:

1. Development Environment and API: We need a development environment that understands the underlying language API to develop software.

2. Runtime Platform: Next, we need a platform to execute the software developed.

3. Management: Once deployed, the software needs to be monitored and administered from time to time.

Optimizing a Cloud Storage Solution

The prominent components of optimizing Cloud Solutions, and an analysis of their utilization over existing IT, are as follows:

• Node Priority
  Issue: Some nodes are more performance critical than others
  Solution: Boost spending on critical nodes (e.g. master funding boost)

• Workflow Priority
  Issue: Some workflows are more performance critical than others (although they look the same to the system)
  Solution: Declare relative priority of workflows and split budget accordingly

• Job Priority
  Issue: Some stages of a workflow are more I/O intensive, others more CPU intensive
  Solution: Boost resource spending during resource-intense stages of workflow

• Bottleneck Mitigation
  Issue: Some nodes may be bottlenecks during map/reduce synch
  Solution: Redistribute funds to active bottlenecks

• Best Response
  Issue: Optimal configuration/allocation might change when other users place competing bids
  Solution: Find game theoretical best response bids continuously to maximize utility

• Risk
  Issue: Some users are more risk averse than others (can tolerate fewer fluctuations)
  Solution: Bid on nodes based on predicted guarantee to deliver a QoS level
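Two of the budget-oriented items above, Workflow Priority and Bottleneck Mitigation, can be illustrated with a small sketch. The Python below is an illustration only and not the method of any particular provider. The `Workflow` record, the priority units, and the `fraction` parameter are all assumptions introduced here: the first function splits a total budget in proportion to declared priorities, and the second moves a fixed fraction of funds from non-bottleneck nodes to the nodes flagged as bottlenecks.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Workflow:
    """Hypothetical record: a workflow with a declared relative priority."""
    name: str
    priority: float


def split_budget(total_budget: float, workflows: list[Workflow]) -> dict[str, float]:
    """Split the budget across workflows in proportion to declared priority."""
    total_priority = sum(w.priority for w in workflows)
    if total_priority <= 0:
        raise ValueError("at least one workflow needs a positive priority")
    return {w.name: total_budget * w.priority / total_priority for w in workflows}


def boost_bottlenecks(
    node_funds: dict[str, float], bottlenecks: set[str], fraction: float
) -> dict[str, float]:
    """Move `fraction` of every non-bottleneck node's funds to the bottleneck
    nodes, shared equally among them. Total funds are unchanged."""
    active = [node for node in node_funds if node in bottlenecks]
    if not active:
        return dict(node_funds)  # nothing to redistribute to
    result: dict[str, float] = {}
    pool = 0.0
    for node, funds in node_funds.items():
        if node in bottlenecks:
            result[node] = funds
        else:
            taken = funds * fraction
            result[node] = funds - taken
            pool += taken
    for node in active:
        result[node] += pool / len(active)
    return result
```

For example, `split_budget(100, [Workflow("reporting", 3), Workflow("archive", 1)])` gives the higher-priority workflow three quarters of the budget. A real bidding system would also need the continuous best-response and risk handling described in the last two items, which this sketch leaves out.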

Managing the Cloud Solution

Workflow management matters because many of the benefits of Cloud Computing come from the speed and ease with which IT resources can be created and put into production.

Illustration follows on the next page.


• Includes clear policies on
   i. who to admit
   ii. how to arbitrate among competing requests
   iii. what resource capacity may be requested over what time frames
• Isolated Data centre: Reset, reboot, power up, power down, get status
• Bias towards large and short experiments
• Site coordination required, e.g. accounting
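The three policy questions above (who to admit, how to arbitrate, and what capacity may be requested over what time frame) can be sketched as a single admission function. This is a minimal illustration under our own assumptions: the `Request` record, the allowlist, the per-request caps, and the ordering rule that favours large and short experiments are all hypothetical choices made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical resource request: who asks, for how many nodes, for how long."""
    user: str
    nodes: int
    hours: float


def admit(
    requests: list[Request],
    allowed_users: set[str],
    max_nodes: int,
    max_hours: float,
    free_nodes: int,
) -> list[Request]:
    """Return the requests that are granted, in the order they are granted."""
    # (i) who to admit, and (iii) capacity and time-frame limits per request
    eligible = [
        r for r in requests
        if r.user in allowed_users and r.nodes <= max_nodes and r.hours <= max_hours
    ]
    # (ii) arbitration: bias towards large and short experiments,
    # i.e. more nodes first, then fewer hours
    eligible.sort(key=lambda r: (-r.nodes, r.hours))
    granted: list[Request] = []
    for r in eligible:
        if r.nodes <= free_nodes:
            granted.append(r)
            free_nodes -= r.nodes
    return granted
```

Site coordination, such as accounting for the granted node-hours, would sit around a function like this and is not shown.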

According to the Credit Suisse analysis “Managing the Impact of SOA and Enterprise 2.0 on Financial Services IT”, broad adoption of shared services is clearly a good thing, so what’s holding it back? Line-of-Business groups will no longer own the whole front-to-back process, so we need to solve several difficult problems to ensure they can still deliver to their customers:

• Technical issues: Security, Architectural Governance, Development Lifecycle, etc.
• Non-technical issues: Culture, incentives for cross-silo cooperation, risk management

Managing Enterprise 2.0 and SLAs

• Service Level Agreements: SLAs are essential to managing business-critical functions
   – Without SLAs, how can you know if the SOA will meet business needs?
      · SLAs aren’t mentioned in most SOA discussions

• Existing systems tend to have implicit SLAs between components
   – Easier to manage the end-to-end SLA when you stay inside the silo

• Problems arise when refactoring systems as services
   – Implicit SLAs in existing applications need to be identified and made explicit
      · Tight coupling gives way to loose coupling between client, workflow, and services

• Mashups and workflow magnify the problem
   – Can no longer make assumptions about how a service is consumed, or by whom
      · In effect, each end user desktop can drive a unique application

Enterprise SLAs and Cloud Computing

• Use SLAs to model the behaviour of each level in the SOA

• At each boundary in the SOA, we should have an SLA
   – Top-level SLA will be expressed in business terms
      · transaction throughput, availability, number of concurrent users, etc.
   – Underlying services will have SLAs stated in more technical terms
      · message throughput / latency, number of concurrent connections, etc.
   – Model client behaviour too

• One size does not fit all
   – Different users will require different SLAs (at different price points!)

• A lot of effort going into building clouds of utility computing power
   – On demand computing, next-generation service fabrics, etc.

• SLAs and Policies become ever-more important the closer you get to utility computing
   – basis for pricing models; chargeback
   – SLAs give confidence that cloud model is manageable
   – SLAs can be mapped onto infrastructure and support tiers
      · e.g., automatically deploy services onto appropriate h/w based on SLA
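The idea of mapping SLAs onto infrastructure tiers can be sketched as a simple selection rule: pick the cheapest tier whose guarantees meet the SLA. The sketch below is an illustration under assumed names and numbers. The `SLA` and `Tier` records, the two metrics chosen (availability and latency), and the bronze/silver/gold figures are hypothetical and are not taken from any provider's price list.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SLA:
    """Hypothetical technical SLA for one service."""
    availability_pct: float   # minimum availability required, e.g. 99.9
    max_latency_ms: float     # maximum acceptable latency


@dataclass(frozen=True)
class Tier:
    """Hypothetical infrastructure tier with its guarantees and price."""
    name: str
    availability_pct: float
    latency_ms: float
    cost_per_hour: float


def choose_tier(sla: SLA, tiers: list[Tier]) -> Optional[Tier]:
    """Return the cheapest tier that satisfies the SLA, or None if none does."""
    candidates = [
        t for t in tiers
        if t.availability_pct >= sla.availability_pct
        and t.latency_ms <= sla.max_latency_ms
    ]
    return min(candidates, key=lambda t: t.cost_per_hour, default=None)


# Illustrative tiers only; the figures are made up for the example.
EXAMPLE_TIERS = [
    Tier("bronze", availability_pct=99.0, latency_ms=200.0, cost_per_hour=0.10),
    Tier("silver", availability_pct=99.9, latency_ms=50.0, cost_per_hour=0.40),
    Tier("gold", availability_pct=99.99, latency_ms=10.0, cost_per_hour=1.50),
]
```

Because each tier carries a price, the same table also gives a starting point for the chargeback and "different SLAs at different price points" items in the list above.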

The bottom line is that senior management won’t outsource to the cloud unless they are sure to achieve a return on their investment.

Security in the Cloud

Security is a vital concern when designing and realizing Cloud Computing as an abstraction for a complex, on-demand, scalable computation grid that is accessible to users through web-enabled devices. Customer data and programs reside on provider premises, and security is always a major concern in Open System Architectures. An optimal application for securing the cloud should be a compact, platform-independent security application that runs on Cloud Computing systems and focuses on protecting users’ sensitive and private data.

The utilization of Cloud Computing systems is steadily rising, as is the need for a practical security application. Cloud Computing typically stores a client’s data in a location accessible from the Internet, so the data is no longer stored on the client’s personal computer but in a data center operated by the Cloud Computing provider. The data is more susceptible to attack since it is not completely under the user’s control. Primarily, Cloud Security will have to minimize the risk by guaranteeing that only authorized users have access to the data. It will also have to implement security methods that protect users’ data regardless of the Cloud Computing host, its platform, or its weaknesses.

Security is often an afterthought in computer applications; the same is true with Cloud Computing. Nevertheless, its implementation is critical since so much sensitive and private data is being stored on these systems. For example, users can now create, edit and print Word documents and spreadsheets online. Users may maintain their personal contacts stored on the Cloud, with associated telephone numbers, addresses, and email addresses. Users may even retain the most sensitive data such as social security numbers or bank account and credit card numbers on the Cloud’s storage mediums.

Data is no longer stored on the creator’s computer, but on the servers of the service that provides the web application. Often, the security of these systems is questionable. Not only that, the security scheme is up to the Cloud Computing provider. The Security Application must protect data even though it is stored somewhere else.

Background Analysis

As previously mentioned, Cloud Computing is an abstraction for a complex on-demand scalable computation grid that is accessible to users through web-enabled devices. Although the specifics of this paradigm are still being defined and revised, Cloud Computing typically consists of some basic components (e.g. CPUs, storage mediums, network interconnects, etc.) upon which any number of applications can be deployed. A Cloud Computing platform incorporates some or all of these components, and each component has its own security concerns and issues. As this technology becomes more widespread and accessible, the need for proper security becomes more evident. As the general public (i.e. those with less technical expertise) shifts to Cloud Computing, security issues should be at the forefront of developers’ minds. Some aspects of Cloud Computing become more secure; others become less.

On one hand, security improves due to the centralization of data. Rather than having private data spread over a number of systems (e.g. work computer, home computer, and mobile device), data is stored on the Cloud and accessed from any of those devices. Moreover, security improves because security resources can be concentrated. Rather than having to secure the operating system and applications of many different computing devices, security can be focused on a single data center.

While some features of Cloud Computing are more secure, others are more vulnerable to exploitation and attack. These weaknesses can be categorized into two groups:

General weaknesses:
1. Loss of control of data
2. Security measures are in the hands of providers
3. Denial-of-service attacks can make all data unavailable
4. Large infrastructure offers many points of failure

Specific weaknesses:
1. Distributed Encryption/Decryption
2. Distributed Key Generation/Distribution
3. Security certification of distributed systems

Distributed Encryption/Decryption and Key Generation/Distribution have been the topic of scholarly articles, and some companies are already providing solutions. Current systems lack an easy-to-use, homogeneous application solution that encompasses all of the aforementioned security issues. A typical Cloud Computing platform has various layers, and we wish to address the issues in the platform and application layers.

Securing the Cloud Solution

Securing the cloud encapsulates many areas of computer science. Ideally, we will adjust and adapt existing security applications to serve as a basis for our application. Generally, security incorporates both hardware and software aspects of computer science, specifically cryptography, network security, and software security. Furthermore, we will incorporate the fields that are related to Cloud Computing, including networking, operating systems, and virtualization. As we develop, we will need to incorporate principles from these related fields.


Our challenge is to apply typical cryptographic schemes to a Cloud Computing environment. Some of the specific weaknesses are concerned with cryptography. It is important to maintain distinctive encryption and decryption keys since much of Cloud Computing is based on replication. Recently, Amazon faced this issue on their Cloud systems. This problem has since been resolved, but lack of foresight in cryptography can lead to disastrous results.
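One way to read the point about replication is that instances cloned from the same image should not end up sharing one key. The sketch below shows a simple way to give each replica a distinct key, assuming a master secret that is held outside the replicated image and a unique identifier per instance. Both assumptions, and the function name, are ours. The derivation uses HMAC-SHA256 from the Python standard library as a simplified stand-in; a production design would more likely use a vetted key derivation function such as HKDF together with a key management service.

```python
import hashlib
import hmac


def derive_instance_key(master_secret: bytes, instance_id: str) -> bytes:
    """Derive a 32-byte key that is specific to one replica.

    master_secret: secret kept outside the replicated image (assumption).
    instance_id:   identifier that is unique per instance (assumption).
    """
    if not master_secret:
        raise ValueError("master_secret must not be empty")
    if not instance_id:
        raise ValueError("instance_id must not be empty")
    # HMAC keyed with the master secret over the instance identifier:
    # the same inputs always give the same key, different identifiers
    # give different keys.
    return hmac.new(master_secret, instance_id.encode("utf-8"), hashlib.sha256).digest()
```

With this arrangement, compromising the key of one replica does not directly reveal the keys of the others, although the master secret itself then becomes the asset that has to be protected and distributed securely.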

Our application will address the challenges presented by distributed cryptography, and focus on encryption and key distribution. Networking is the backbone of a Cloud Computing system. Network security has been heavily researched, but our application will be concerned with networking as it relates to Clouds. Networking on the Cloud is unique in that there are so many points of entry. A typical data center may host only a handful of websites and therefore have a small number of access points. A data center focused on Cloud Computing could host hundreds of sites, plus private data, and other applications.

Users access data using a myriad of different protocols from different locations. This calls for a complex set of network defenses such as firewalls, intrusion detection systems, and secure channels. While overall network security of the entire Cloud Computing system is outside the scope of our application, it will be important to understand its functionality and implementation.

Other related fields such as virtualization and operating systems present their own security issues. We will identify where security holes exist and incorporate solutions into our design as we develop our application. Many environments exist in which we can develop our application, most notably Amazon’s Elastic Compute Cloud (EC2). This system provides the features of a typical Cloud Computing system and implements an API for usage. Other companies provide similar systems, but development will most likely occur on EC2, as it is the most accessible and offers a feature-rich API. EC2 is the system on which our work thus far has been developed. Security should be available at the following prime levels:

• Server access security
• Internet access security
• Database access security
• Data privacy security
• Program access security

Potential Advantages

If you extend the concept of virtualization from a single server to a complete grid, and make access available over the Internet, the result can be summarized as Cloud Computing. If virtualizing a single server can save 50 to 70% of resources, imagine how much you can save when a complete data center acts as a single grid and is then virtualized.

• Reducing capital and operating expenditures through infrastructure pooling and improved utilization. Customer expense is minimized, which lowers barriers to entry: the infrastructure for one-time or infrequent intensive computing tasks is owned by the provider and does not need to be purchased. The result is lower operating costs and minimized capital expenditure.

• Separating infrastructure maintenance duties from application development.

• Separating application code from physical resources. Device and location independence enables users to access systems regardless of their location or the device they are using (e.g. PC, mobile): any-time, any-place, any-device access.

• Centralizing operations, with the ability to use external assets to handle peak loads.

• Increasing administrator efficiency and scaling quickly to meet user demands.

• Sharing capability among a large pool of users, improving overall utilization. This is an alternative if departmental or central IT is non-responsive.

• Increasing flexibility to shape the software for improved operational efficiency, with high computing power and dynamic load handling.

• Enhancing scalability, which facilitates easier cross-institution collaboration.

• Offering pay-as-you-go options and allowing a focus on core business; paying only for what you need is useful when service demands fluctuate.

• Minimizing downtime with fault-tolerant clouds built with the presumption of untimely component failures.

• Improving alignment of IT resources with institutional priorities through caching of service call results, higher utilization, and improved efficiency.

Some Limitations

• Loss of control (mirrors traditional centralize/decentralize debate)
• Integration with enterprise authentication, single-sign-on
• Integration with key enterprise applications
• Accessibility and User Interface limitations of web applications
• Performance and availability concerns
• Policy/compliance concerns
• Breach forensics and mitigation
• Need to monitor application availability, not just node or VM availability

Future of the Cloud

The Cloud is evolving as each day passes. The following graphical analysis is from Saugatuck Technology (2008).

[Figure: adoption waves from SaaS 1.0 through SaaS 2.0 to Cloud Computing, plotted as adoption (low to high) over time. Source: Saugatuck Technology]

Wave I (2001-2006): Cost-Effective Software Delivery
Wave II (2005-2010): Integrated Business Solutions
Wave III (2008-2013): Workflow-Enabled Business Transformation
Wave IV (2011-2016): Measured, Monitored, Managed Business Processes

Adoption stages shown: Early Adoption, Mainstream Adoption, Ubiquitous Adoption, Post-SaaS Adoption.

Capabilities named in the figure include: stand-alone apps; multi-tenancy; limited configurability; integration with business; SaaS integration platforms; business marketplaces and SaaS ecosystems; customization capability; IT-targeted ecosystems; SaaS development platforms; inter-enterprise collaboration; IT utility / SaaS infrastructure; end-to-end business processes; integration with services anywhere; optimized business ecosystems; intelligent hubs linking platforms; mobile device- and sensor-controllable; SLAs for composite service offerings.

The enterprise evolution to ‘cloud sourcing’, as indicated in the study “Innovation and Profit: How On-Demand Computing Can Change Your Business” from Dreamforce and shown on the following page, gives another executive insight.


Conclusion: Cloud Vision and Strategy

The cloud paradigm is gaining mindshare in the market at the CxO level, and this is likely to continue, extending to complex enterprise IT as the paradigm matures. Though cloud is not yet a recognized enterprise IT sourcing strategy, it is deemed a viable alternative for business units, particularly those that are frustrated with sluggish IT departments. In this sense, cloud is emerging as an outsourcing alternative that bypasses IT; we need a relationship with the business units to capture these opportunities. Presently, the majority of projects are outsourced through the IT department. IT procurement processes will soon include an on-demand option in addition to the traditional build or buy options.

Both SaaS and Cloud Computing paradigms are touted as disruptive outsourcing models and their effect on IT staffing is expected to be significant. The future enterprise IT staff will likely include CIOs, architects and process experts. Architecture and design will likely remain inside the enterprise boundary, while the rest of the construction and operations work will be outsourced through the two models (SaaS & Cloud). Though it is not clear how and to what extent, there is a definite change on the horizon for enterprise IT processes with proportionate changes percolating down to SI vendors.

Enterprises will begin to explore the cloud paradigm in-house and externally for smaller, non-mission-critical and less demanding applications at first. Over time, larger and mission-critical systems will be trusted to the cloud.

There are no clear or immediate answers on the transition to the external cloud, ease of migration, fixed cost advantage versus variable rental cost, and so on. Nevertheless, the benefits of cloud-based computing, including scalability and lower costs, are very real. Hence, for anyone working in application development, whether for a software vendor or an end user, the cloud is definitely going to play an increasing role in next-generation computing and storage systems.

Appendix A: Technical References

Bibliography

1. Computing in the Clouds by A. Weiss.
2. A Short Introduction to Cloud Platforms by David Chappell.
3. An Introduction to SaaS and Cloud Computing by Ross Cooney.
4. Virtualization, Cloud Computing & TeraGrid by Kate Keahey and Marlon Pierce.
5. Computer Lab to Go: A “Cloud” Computing Implementation by Murphy and McClelland.
6. The Grid: Blueprint for a Future Computing Infrastructure by I. Foster and C. Kesselman.
7. Market-Oriented Cloud Computing: Vision, Hype, and Reality for Delivering IT Services as Computing Utilities by Rajkumar Buyya, Chee Shin Yeo and Srikumar Venugopal.
8. Taneja Group Research on Cloud Computing by Jeff Boles.

Websites

9. http://www.mesh.com
10. http://www.network.com
11. http://www.wikipedia.org
12. http://www.salesforce.com
13. http://aws.amazon.com/ec2
14. http://www.searchstorage.com
15. http://code.google.com/appengine/
16. http://www.datacenterknowledge.com/
17. http://www.cloudcomputing.org.il/ccd/
18. http://www.ibm.com/developerworks/websphere/zones/hipods/
19. http://www.morganstanley.com/institutional/techresearch/pdfs/TechTrends062008.pdf
20. http://www-03.ibm.com/security/products/prod_dkms.shtml
21. http://cloudsecurity.org/2008/07/14/is-your-amazon-machine-image-vulnerable-to-sshspoofing-attacks/
22. http://etherealmind.com/2008/08/21/enterprise-cloud-computing-build-your-own-cisco/
23. http://paulstamatiou.com/2008/04/05/how-to-getting-started-with-amazon-ec2
24. http://paulstamatiou.com/2008/08/21/how-to-live-the-cloud-life
25. http://justinleider.com/2008/08/20/running-your-own-hardware-vs-ec2-and-/


Appendix B: Cloud Taxonomy

Cloud Technology Landscape

A view of the cloud technology landscape, as used by a popular cloud blog site in the Cloud Computing discussion group on Google, follows.

Amazon EC2 ServePath GoGrid Rackspace Mosso Cloud Public Cloud Joyent Accelerators AppNexus Flexiscale ElasticHosts Private Cloud Cassatt Active Response Enomaly Enomalism Platform Open Cloud Platforms Morph Labs Salesforce.com force.com Custom Cloud Platforms Google App Engine [One can not run generic applications Bungee Labs Connect but ones developed using the native API Intuit Quickbase set] LongJump Coghead Cloud Platform Tools Rightscale Fabric Mgmt Elastra Cloud Server 3Tera AppLogic Kaavo IMOD Oracle Coherence IBM eXtreme Scale Data Grids GigaSpaces Data Grid Gemstone Gemfire


rPath CohesiveFT Virtual Appliances Hyperic CloudStatus Hadoop Amazon SimpleDB Storage Microsoft SSDS Rackspace Mosso CloudFS Google BigTable Bungee Labs Connect Boomi MuleSource Mule OnDemand Amazon SQS Microsoft BizTalk Services OpSource Connect Integration SnapLogic SaaS Solution Packs gnip CastIron Appirio Skemma Appian Anywhere Cloud Services OpSource Billing Aria Billing eVapt Zuora Vindicia Ping Identity Security OpenID/OAuth Data as a Service Strikeiron


Appendix-C SaaS, Cloud and Web2.0

Below is a collection of diagrams from the blog http://markusklems.wordpress.com/cloud-classification/ capturing one plausible interpretation of the relationships as they exist today.
