
Proliferating Cloud Density through Big Data Ecosystem, Novel XCLOUDX Classification and Emergence of as-a-Service Era

Sugam Sharma1*, U. Sunday Tim2, Shashi Gadia3, Johnny Wong3
1. Center for Survey Statistics and Methodology, Iowa State University, Ames, Iowa, USA
2. Department of Agricultural and Biosystems Engineering, Iowa State University, Ames, Iowa, USA
3. Department of Computer Science, Iowa State University, Ames, Iowa, USA

Email: [email protected]

Abstract. Big Data is permeating through broader aspects of human life for scientific and commercial purposes, especially for massive-scale data analytics beyond the exabyte magnitude. As the footprint of Big Data applications continuously expands, reliance on cloud environments is also increasing to obtain appropriate, robust and affordable services to deal with Big Data challenges. Cloud computing avoids any need to locally maintain overly scaled computing infrastructure that includes not only dedicated space but also expensive hardware and software. Several data models to process Big Data have already been developed, and a number of such models are still emerging, potentially relying on heterogeneous underlying storage technologies, including cloud computing. In this paper, we investigate the growing role of cloud computing in the Big Data ecosystem. Also, we propose a novel XCLOUDX {XCloudX, X…X} classification to zoom in on and gauge the intuitiveness of the scientific names of cloud-assisted NoSQL Big Data models and analyze whether XCloudX always uses cloud computing underneath, or vice versa. XCloudX symbolizes those NoSQL Big Data models that embody the term “cloud” in their name, where X is any alphanumeric variable. The discussion is strengthened by a set of important case studies. Furthermore, we study the emergence of the as-a-Service era, motivated by the cloud computing drive, and explore the new members developed over the last few years beyond the traditional cloud computing stack.

Keywords. Cloud Computing, Big Data, XCLOUDX, XCloudX, X…X, as-a-Service

1. Introduction

In the last few years, both the volume and the variety of data have grown extensively (Figure 1(a)), driven especially by social media, multimedia, the Internet of Things (IoT), etc. This fast-paced generation of heterogeneous, structured and unstructured data has overwhelmed the community, and as a whole it is named, in modern data science vocabulary, Big Data (Villars et al., 2011) (Figure 1(b)). The birth of Big Data has seriously challenged the widely prevailing data management capabilities. Legacy computation techniques are not on par with the demands of Big Data. The rapidly growing volume of multifarious data has caused an absolute paradigm shift in how modern data should be computed. This new era of Big Data witnesses a revolutionary evolution of new, potential data engineering techniques. Also, the existing infrastructure is required to be sufficiently large, efficient and scalable to easily accommodate the Big Data challenges. Such infrastructure requires not only high-end hardware and software, but extensive maintenance efforts as well. However, several data models to efficiently process Big Data have already been developed, and a number of such models are still emerging, relying on heterogeneous underlying storage technologies. As the footprint of Big Data applications grows, an increase in infrastructure cost cannot be ruled out completely. Consequently, reliance on cloud environments is also increasing to obtain appropriate services to deal with the Big Data stress. However, constructing a suitable, efficient and cost-effective cloud computing environment is also a major challenge. Today, cloud computing has emerged as one of the major shifts of the recent information and communication age and promises an affordable and robust computational architecture for large-scale and even overly complex enterprise applications (Fenn et al. 2008). It is a powerful and revolutionary paradigm that offers service-oriented computing and abstracts the software-equipped hardware infrastructure from the clients or users.


Although the concept of cloud computing is mainly popular in three flavors, 1) IaaS (Infrastructure-as-a-Service), 2) PaaS (Platform-as-a-Service), and 3) SaaS (Software-as-a-Service), in this data science age it should be equally expandable to DBaaS (Database-as-a-Service) (Curino et al., 2011; Seibold et al., 2012). One of the other rationales behind the growing popularity of cloud computing over traditional infrastructure, from simple to complex up to the enterprise level, is the economic feasibility it offers. It is a new shape of business model that is gradually turning into a powerful, ubiquitous paradigm for performing massively scaled, complex computing. It maintains and delivers the required resources and services, especially those related to information technology (IT), to the requesting business entities on demand with a pay-per-use scheme. Cloud computing largely removes the need to locally house and maintain complex and expensive computing infrastructure, both hardware and software, along with the required dedicated physical space. Some of the other salient features of cloud computing include a comparatively lower upfront investment, a risk-free environment at the client site, and elasticity in requested resources, services, and consequently payment. Today, more and more applications are inclined to avail themselves of the benefits of cloud computing, and a growing number of such applications are found on cloud platforms. Also observed is the tremendous growth in the scale of data (Big Data) cultivated through the complex computing applications housed in the cloud. Thus, scalable data management is becoming a critical but integral part of cloud computing infrastructures, especially for intensive data-driven applications such as decision support systems. Although the pursuit of addressing the Big Data challenges has given rise to a number of systems, the scalability, availability, integrity, transformation, quality, and heterogeneity of the data always remain challenging. In this study, we aim to investigate comprehensively the role and status of the cloud computing environment in the Big Data ecosystem. This extended investigation covers a broader spectrum of cloud computing and its growing utility in Big Data. As mentioned above, the quest for efficient solutions to the Big Data challenges has led to a plethora of data models; we propose a novel XCLOUDX {XCloudX, X…X} classification, which is based solely upon the scientific name of the data models; the term “X” can be a set of variables and/or numerals. The objective of this novel characterization is to understand whether the names of the cloud-assisted NoSQL Big Data models (XCLOUDX) are intuitive enough to indicate the cloud computing environment underneath, or vice versa. The pool of study contains those cloud-assisted data models (XCLOUDX) which 1) contain the term “cloud” in their name as one of its constituents, such as XCloudX (one of the two children of XCLOUDX), or 2) do not include the word “cloud” in their name at all; the latter category is represented by the X…X class (the other child of XCLOUDX). Additionally, we briefly explore some interesting case studies from different business domains, academia and industry that further strengthen the notion of the growing spectrum of cloud in the Big Data ecosystem. Over the last few years, cloud computing has evolved into a scalable, secure and cost-effective solution for an overwhelming number of applications from diversified areas. The traditional cloud ecosystem delivers the services in three popular flavors: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS).
The scientific communities understand the purview and pervasiveness of cloud computing and intend to fan out its technological advantages to benefit every single field of study. The vision of ubiquitous information access, free from geographic constraints and supported by the distinguishing characteristics of cloud computing, has further motivated the communities and has extended the cloud ecosystem beyond IaaS, PaaS, and SaaS. This gives rise to the accelerated evolution of the “as-a-Service” (aaS) era, permeating through almost all domains in academia and industry. Moreover, we highlight an extended list of new aaS entrants to the cloud computing stack on top of IaaS, PaaS, and SaaS. We believe that a thorough understanding of the relationship of cloud computing with Big Data is inevitable to ensure the efficient and successful management of scalable data (such as Big Data) in the cloud environment, emulating the popularity of relational data management in traditional infrastructure settings. The rest of the paper is structured as follows. The definition, characteristics, and classification of Big Data are briefly introduced in section 2. Section 3 provides a brief overview of cloud computing. Section 4 explores the role of cloud computing in Big Data. Section 5 discusses the proposed novel XCLOUDX classification and reviews some of the leading Big Data models of the XCloudX kind. This section also investigates whether there are some X…X type data models that also leverage cloud computing for data analytics. Section 6 explores some interesting and fresh case studies. Section 7 highlights the emergence of the as-a-Service era. Finally, the paper concludes in section 8.

2. Big Data - definition, characteristics and classification

Massive amounts of digital footprints and sizeable traces are being deposited through leading data outlets such as social media, sensors, power meters, etc. About 2.5 quintillion bytes of data (IBM, 2012) are generated primarily by these sources every day. The actual pattern and nature of such data is indistinct, but it is certainly large, complex, heterogeneous, structured and unstructured. And, in this data science age, as a whole, it is referred to as Big Data (Figure 1(b)). An extremely diverse set of structured and unstructured data is also observed from Internet users (O’Leary, 2013). Figure 1(c) is the comprehensive ecosystem of Big Data (Palermo, E. 2014), which provides a composite demonstration of some of its characteristics -- volume, velocity, variety, and veracity -- as well as some of its challenges -- storage, capture, curation, search, sharing, transfer, analysis and visualization. Apparently, in order to procure useful insights from Big Data, enormous modeling efforts are required compared to traditional data. Also, the rapid growth of this complex and heterogeneous data on such a large scale raises strong suspicion about the computing capacity of the existing data models (Manyika et al., 2011; Hilbert and López, 2011) (Figure 2(a)) and is one of the core issues for the data science community today. Not only from the storage capacity but also from the processing aspect, the existing database management systems fall short in handling it. As Big Data and its applications pervade the larger aspects of human life, various scientific communities envisage a broader outreach from the insights of Big Data. Consequently, they call for the development of new, capable technologies to cater to the current data processing needs.

Figure 1(a). Data growth vs. years (Source: walasolutions.com/simplifying-big-data/). Figure 1(b). Big Data (Source: mattclarkecto.com/page/2/). Figure 1(c). Big Data ecosystem (Source: businessnewsdaily.com/5825-big-data-risks-rewards.html)

Figure 2(a) Storage capacity vs. Computation Capacity: Data size exceeds the computation capacity (Source: Hilbert et al., 2011)

State-of-the-art research in Big Data is still far from maturity and has enormous scope for research; continuous and active research participation is required from various potential scientific communities to produce new findings. Figure 2(b) highlights some important aspects of the life of Big Data: 1) source of data, 2) format of data, 3) data management techniques, 4) modeling of data, and 5) processing of data. Potential variability can be observed at every stage. For Big Data, this variability can be treated as a category that helps in classifying the characteristics (Nugent et al., 2013) of Big Data and could offer valuable insights, especially in the cloud. Sharma et al. (2014, 2015a, 2015b) have provided detailed explanation and discussion of the characteristics and complexities of each category mentioned in Figure 2(b), and they are briefly mentioned here once again.

1) Source of data
 Internet. A huge amount of data is collected through the Internet. Devices such as smart phones, when connected through the Internet, offer various kinds of services that generate vast amounts of crude, largely unstructured data (Rao et al., 2012).
 Social media. Social sites such as Facebook, Twitter, blogs and microblogs are rich sources of data generation.
 Sensors. Sensors are used in several different types of complex systems, such as weather research and forecasting, and produce huge amounts of data.
 Other devices. There are other devices as well, which may not be considered as aggressive as social media in data or information production, but whose footprints cannot be ignored completely. Examples include power meters.

Figure 2(b). Various aspects of Big Data life

2) Format of data
 Structured. Structured data follows a specific format and has a relational structure, and managing the data is easily feasible using standard SQL-type languages, as often seen in relational database management systems (R-DBMS). Examples of structured data are numerals, strings, dates, etc.
 Unstructured. Unstructured data does not have any specific pattern. Data is collected in various forms such as text, videos, geographic location and time information. Such data continues to grow at a rapid pace, and its management poses severe challenges to existing computational capacity.
 Semi-structured. Semi-structured data is not manageable by traditional DBMS techniques, but the analysis and understanding of such data is not impractical either. Some comprehensive rules are required that are intelligent enough to dynamically decide the next process once the previous process is over (Frank, 2012).
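To make the three formats concrete, the short Python sketch below contrasts how the same booking record might appear in structured, semi-structured, and unstructured form; the field names and values are illustrative assumptions rather than examples taken from any particular system.

```python
import json

# Structured: a fixed-schema row, as it would sit in an R-DBMS table
structured_row = ("B-1001", "2016-03-14", 249.99)   # (booking_id, date, amount)

# Semi-structured: JSON with optional, nested fields and no rigid schema
semi_structured = json.loads("""
{
  "booking_id": "B-1001",
  "date": "2016-03-14",
  "amount": 249.99,
  "extras": {"seat": "12A", "meal": null}
}
""")

# Unstructured: free text whose meaning must be extracted analytically
unstructured = "Booked flight B-1001 on March 14, 2016 for $249.99, seat 12A."

print(structured_row[2], semi_structured["amount"])
```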

3) Data management techniques

In 2007, Amazon introduced a distributed NoSQL system, called Dynamo, through a published paper; it was used by Amazon primarily as its internal storage system. Amazon was one of the first few big giants to begin storing the majority of its principal business data in a storage management system that was not relational in nature (Leavitt, 2010). Ever since, data science has made advancements in the form of Big Data, and efforts are still underway to strengthen it further. The result is the presence of a good number of robust NoSQL Big Data models today. They are largely classified into four popular types.
 Key-value store. Key-value stores (Berezecki et al., 2011) store structured or unstructured data in key-value pairs, where the data is the value, which is retrieved by the key of the pair. The austerity of such data models makes them perfectly suitable for lightning-fast and highly scalable retrieval of complex data as values.
 Wide column store. The wide column store (Cattell, 2011) houses closely related data in one extendable column. Such data models are very useful for distributed data storage, especially versioned data, large-scale batch-oriented data processing (e.g. sorting, parsing, conversion, etc.), exploratory and predictive analytics, etc.
 Document-oriented store. Such data models (Cattell, 2011) store data as collections of documents. A document is flexible enough to freely accommodate a large number of exceptionally long data fields. The document in a document data model is encoded in XML, JSON (JavaScript Object Notation) or BSON (binary version of JSON) format.
 Graph data model. Graph data models (also known as Graph DBMS) use structured relational graphs attributed with interconnected key-value pairs as properties. The graph data models constitute and store the data along the nodes, edges, and properties of a graph structure (Angles and Gutierrez, 2008).

Figure 2(c). Popularity trend for different categories of NoSQL data models (Source: solidIT, 2014)

Figure 2(c) highlights the popularity trends for all four classes of NoSQL data models discussed briefly above.
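As a rough illustration of how the same record is shaped in each of the four NoSQL families, the Python sketch below uses plain in-memory structures; the identifiers and values are assumptions, and real systems in each family expose much richer APIs.

```python
# Minimal, illustrative shapes for the four NoSQL families.

# 1) Key-value store: an opaque value retrieved by its key
kv_store = {"user:42": '{"name": "Alice", "city": "Ames"}'}

# 2) Wide column store: a row key maps to an extendable set of named columns
wide_column = {"user:42": {"profile:name": "Alice",
                           "profile:city": "Ames",
                           "stats:logins": 17}}

# 3) Document-oriented store: a self-describing, nested document per record
document = {"_id": "user:42", "name": "Alice",
            "orders": [{"id": 1, "total": 19.99}]}

# 4) Graph data model: nodes, edges, and properties
nodes = {"user:42": {"name": "Alice"}, "user:7": {"name": "Bob"}}
edges = [("user:42", "FOLLOWS", "user:7")]

print(kv_store["user:42"], document["orders"][0]["total"], edges[0][1])
```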

4) Modeling of data
 Data cleansing. The data cleansing process helps in identifying data that is incomplete or corrupt (Rahm et al., 2000). Techniques such as data imputation and interpolation are used to eradicate impurities from the data.
 Data transformation. Depending on the needs of the computation, the data can be transformed into a suitable format.
 Data normalization. This process is especially applicable to structured data. The collected data may have redundancies that are required to be reduced, if not removed completely (Quackenbush, 2002).
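A toy Python sketch of these three modeling steps is given below; the records, field names, and imputation strategy are illustrative assumptions rather than a prescribed pipeline.

```python
# Illustrative records with one incomplete entry and one redundant duplicate
records = [
    {"id": 1, "temp_f": 68.0},
    {"id": 2, "temp_f": None},    # incomplete record
    {"id": 3, "temp_f": 75.2},
    {"id": 1, "temp_f": 68.0},    # redundant duplicate
]

# Data cleansing: impute the missing value with the mean of observed values
observed = [r["temp_f"] for r in records if r["temp_f"] is not None]
mean_temp = sum(observed) / len(observed)
for r in records:
    if r["temp_f"] is None:
        r["temp_f"] = mean_temp

# Data transformation: convert to the format the computation needs (Celsius here)
for r in records:
    r["temp_c"] = round((r["temp_f"] - 32) * 5 / 9, 2)

# Data normalization: reduce redundancy by keeping one record per id
normalized = list({r["id"]: r for r in records}.values())
print(normalized)
```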

5) Processing of data
 Streaming. This is also called real-time data processing. One popular system for such processing is the Simple Scalable Streaming System, famously known as S4 (Neumeyer, et al., 2010). It is a general-purpose, scalable, pluggable system that is used to develop applications that process regular data streams.
 Batch processing. The data is processed in long-running batch jobs. The related applications are scaled across large clusters consisting of several nodes of commodity hardware. Today, systems achieve batch processing by running the MapReduce framework underneath (Chen et al., 2012).
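To show the batch-processing pattern referenced above, the following is a minimal, single-process Python imitation of the MapReduce word count; on a real cluster the map and reduce phases would be distributed across many nodes by a framework such as Hadoop.

```python
from collections import defaultdict
from itertools import chain

documents = ["big data in the cloud", "cloud computing for big data"]

def map_phase(doc):
    # Emit a (word, 1) pair for every word in a document
    return [(word, 1) for word in doc.split()]

def reduce_phase(pairs):
    # Sum the counts emitted for each distinct word
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)

all_pairs = chain.from_iterable(map_phase(d) for d in documents)
print(reduce_phase(all_pairs))   # e.g. {'big': 2, 'data': 2, 'cloud': 2, ...}
```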

3. Cloud Computing Overview

Cloud computing is considered one of the most accomplished and ubiquitous paradigms of the 21st century, especially for service-related computing. It has been credited with the revolution it has brought about by abstracting the complex and expensive computing infrastructure underneath. In the service-oriented age, only three cloud paradigms have attained significant popularity -- Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS). However, in the data science era, a new paradigm -- Database-as-a-Service (DBaaS) -- is also gradually drawing widespread attention from various scientific communities. Elasticity in service usage and consequently in price (pay-per-use), low upfront financial input, reasonably reduced time to market (TTM), a responsibility shift in risk management, etc. are some of the convincing attributes of cloud computing. Such features make it a pervasive, economical, and quintessential paradigm that is highly suitable for novel application deployment. Such economic feasibility is very far from reality under the ambit of any traditional infrastructure environment; as a result, more and more applications are migrating into the cloud environment. This results in a sharp spike in the scale of data traffic, which indicates that voluminous scalable data, storage, and their processing models are important components of the cloud environment. In sum, cloud computing is a financially economical but technologically robust invention that is designed for complex and overly large-scale computations and relinquishes any need for dedicated space and onsite management of expensive hardware infrastructure and delicate software systems.


3.1 Salient features of cloud computing

Informally, cloud computing can be defined as the pay-per-use service of highly scalable computing resources, offered from an outer environment by a third-party vendor. A resource-rich cloud environment may house thousands of resources, but a client (or user) is billed only for those resources that are requested or used. However, at any point in time, and from any geographical location where Internet accessibility is available, any resources active under the contractual agreement can be accessed. The cloud service provider is solely responsible for the administration of the infrastructure, constituted by physical hardware and expensive software; a client is absolutely free from such burdens. Figure 3(a) briefly explains three main characteristics of cloud services that primarily distinguish them from classical and commercial hosting services.
 Pay per usage. A user may avail the cloud services on the basis of minutes or hours and is billed accordingly. A user is not bound to pay in advance for any service whose actual future usage is uncertain.
 Flexibility. Based upon the user’s demand, the resource usage can be scaled up or down.
 Infrastructure management. The service provider owns the infrastructure and is solely responsible for its management as well.

Figure 3(a). Characteristics of a cloud service
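As a back-of-the-envelope illustration of the pay-per-use characteristic, the small Python function below computes a usage-based bill; the hourly and per-gigabyte rates are invented for the example and do not reflect any actual provider's pricing.

```python
def monthly_bill(instance_hours, rate_per_hour, storage_gb, rate_per_gb):
    # Bill only for the hours and storage actually consumed
    return instance_hours * rate_per_hour + storage_gb * rate_per_gb

# A bursty workload: three small servers used for 40 hours plus 500 GB of storage
cost = monthly_bill(instance_hours=3 * 40, rate_per_hour=0.10,
                    storage_gb=500, rate_per_gb=0.03)
print(f"pay-per-use cost: ${cost:.2f}")
```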

3.2 Cloud services

In this section, a general classification of some of the leading cloud services is provided (Figure 3(b)). Each class is briefly described here.

3.2.1 Software-as-a-Service (SaaS): SaaS is one of the most popular, repository-rich and widely used cloud models and has been offering services for more than a decade now. The service repository is highly diversified and covers a very wide range of simple to complex services such as Google Email, Google Doc, etc. Under the contractual agreement of the SaaS model, the vendor, also called the service provider, is fully responsible for providing all the essential infrastructure consisting of robust hardware resources and expensive software systems. Also provided is a graphical user interface (GUI) that facilitates the user’s interaction with the service.

3.2.2 Infrastructure-as-a-Service (IaaS): As the name suggests, IaaS offers infrastructure as a service. The infrastructure consists of various building blocks, which can be combined or layered to derive a customized environment most appropriate for executing the designated applications. Some of the most popular examples of the IaaS cloud model include Amazon Web Services (AWS), Rackspace, etc.

3.2.3 Platform-as-a-Service (PaaS): In the PaaS cloud model, the service provider is responsible for providing a risk-free and robust environment for software product development. The environment consists of the required software tools and is hosted on the hardware infrastructure of the service provider. A user can avail the benefits of PaaS for software development either by using the APIs provided or through a GUI. Google App Engine, Force.com, etc. are some of the popular examples of PaaS.

3.2.4 Database-as-a-Service (DBaaS): The DBaaS cloud model (Curino, et al., 2011) is gaining popularity in this data science age (Wolfe, 2013). The service agreement under DBaaS assures the shift of data management responsibilities, especially administration, from the end user to a third-party vendor, called the service provider, which can be public or private. Some of the data-related operational burdens include upgrades, provisioning, failover management, configuration, seamless scaling, performance tuning, backup, privacy, access control (Seibold and Kemper, 2012), etc. DBaaS not only migrates the responsibilities, but it also promises much lower overall costs to the end users. Also, on the client site, the DBaaS model avoids any need for a professionally trained database administrator (DBA), who would otherwise be primarily responsible for data management and maintenance.


In DBaaS, full access to a complex database can be achieved through very simple service calls alone. The users of these services do not feel that they are interacting with any database. Therefore, the engineers engaged in application development are neither expected nor required to have an expert understanding of database management. Data-related operations such as failover, scaling, etc. in the DBaaS cloud are expected not to impact any live user in any way. In sum, a DBaaS offers: 1) a shared, consolidated platform on which to provision database services, 2) a self-service model for provisioning those resources, 3) elasticity to scale database resources out and back, and 4) chargeback based on database usage. Some of the early DBaaS providers include Amazon RDS and Microsoft SQL Azure.

Figure 3(b). Types of cloud services in modern computing
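To illustrate the "simple service call" access pattern described above, the Python sketch below reads and writes records over HTTP against a hypothetical DBaaS endpoint; the URL, credentials, and payload are placeholders, and a real provider's documented API would be substituted here.

```python
import requests

BASE = "https://dbaas.example.com/v1/databases/inventory"   # hypothetical endpoint
AUTH = ("app_user", "app_password")                          # placeholder credentials

# Insert a record and query it back over HTTP; no DBA, failover handling,
# or performance tuning is visible to the application developer.
requests.post(f"{BASE}/records", json={"sku": "A-100", "qty": 12}, auth=AUTH)
resp = requests.get(f"{BASE}/records", params={"sku": "A-100"}, auth=AUTH)
print(resp.json())
```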

3.3 Types of clouds

Based on the scope of accessibility and spatial location, a cloud environment can be further classified (Figure 3(c)).
 Public cloud. Accessibility of the public cloud is open to any end user who has Internet availability. However, access is hidden behind valid user credentials for returning users. A new user may be asked to fill out a simple sign-up form to gain access to the public cloud. Two leading public cloud vendors are Amazon’s AWS and Rackspace.
 Private cloud. This is a proprietary environment in the cloud that can be accessed only by a restricted group of users. The usual example of such a cloud environment is the privately owned data center secured by the firewall of the organization.
 Hybrid cloud. A hybrid cloud is an environment in which private services run on a public cloud with restricted access through a virtual private network (VPN) system.

Figure 3(c). Types of cloud (Source: blog.backupify.com)

4. Cloud Computing in Big Data

The use of incredibly large data (Big Data) can answer some incredibly complex questions and is being pursued by various organizations across a variety of disciplines; research, finance, and pharmaceuticals are just a few to name. When researched in tandem with other information, the data evidently has the potential to divulge interesting relationships and patterns that would otherwise remain unexplored. Regardless of the Big Data use case, modern technologies are required to process and analyze such substantial and overly complex datasets. Cloud computing has emerged as a robust technology for complex and massive-scale computations that shifts the burden of maintaining expensive computing resources, especially software, hardware and dedicated space, from the onsite client to a dedicated third-party service provider. Data generation through cloud computing has grown manifold lately and, based on its attributes, falls in the Big Data category. The efficient and satisfactory processing and analysis of Big Data need a comprehensive computational infrastructure. The challenges posed by Big Data put stress on redefining legacy technologies, services and practices that have been in active use for several years. Recent technological advances in storage devices have not only reduced the physical hardware size and increased the actual storage capacity but also lowered storage costs. Such compact storage media and high-end data analysis techniques have facilitated the easy availability of Big Data to communities, academia and industry.


Cloud computing can be credited with bringing the analysis of Big Data to the masses, including users of various levels of scientific and technological understanding, ranging from naïve users to elite experts. Continued technical advancements are further strengthening the relationship between cloud computing and Big Data, and the two are coming together to address the data-oriented challenges. Figure 4 is an attempt to capture the use of cloud computing in the Big Data ecosystem; this, however, is not claimed to be the best representation. Based upon the need and the circumstances, the SaaS, PaaS, IaaS or DBaaS cloud models can be sought from a cloud service provider. The data, satisfying all the V’s of Big Data (Sharma et al., 2014, 2015a, 2015b), is stored in the cloud environment rather than in the limited and relatively less efficient local storage systems, attachable or detachable to the computing units. The cloud storage is comparatively robust, distributed, scalable, and fault-tolerant, and it perfectly matches the processing frequency of a parallel and distributed programming model. The communication between the cloud environment and the external world is facilitated either by the graphical user interface (GUI) or by the APIs made available by the service providers. These interfaces provide the users not only a rich set of operations but also the platforms to perform them, directly by click-based selection or via programming implementations. Collection, aggregation, visualization, and data mining are a few such operations shown in this figure. In Big Data analytics, visualization has emerged as a relatively more important action as it abets the visual depiction of analytical outcomes, which in turn better assists the decision making process. Apparently, the applicability, usability and acceptability of the cloud environment for handling Big Data is regularly growing, not only across private industries; government organizations have also recognized the potential of cloud computation. In research and academia too, the usage of cloud-oriented services is pervasive across all dimensions.

Figure 4. Cloud computing in Big Data ecosystem

4.1 Related studies on cloud computing for Big Data

Talia (2013) agrees that cloud-assisted infrastructure can effectively be used for Big Data storage and analytics. The author also discusses the diversity of data, its complexity and the computation capacity for large data analytics. The growth of data also increases its complexity, and consequently the concurrent processing also becomes overly complicated. In such situations, the use of a cloud-enabled environment certainly becomes a necessity (Ji et al., 2012), as the traditional settings gradually succumb to the growing data stress, become ineffective, choke and eventually cease. Dean and Ghemawat (2008) suggest MapReduce as one of the good frameworks to process Big Data, as it enforces parallel computation in clusters under a distributed environment. Bollier et al. (2010) discuss the merits of cluster computing and its potential to amicably address the data growth. Miller (2013) discusses the drawbacks of the peer pressure on organizations for quick adoption and implementation of upcoming technologies, e.g. cloud computing, to have a reasonable balance between storage and computation capacities. Hence, the role of the traditional DBMS cannot be ignored in this evolving cloud environment, as it ensures a risk-free and smooth transition of complex applications from its end to the cloud-assisted environment. Kwon et al. (2014) discuss the merits and demerits of Big Data analytics. The authors are convinced that Big Data analytics is an innovative approach that has the potential to boost the performance of the related entity. However, adoption of such capability does not seem trivial for every business entity because of shallow understanding and experience in Big Data; the authors have proposed a model to understand the relevant issues in early Big Data adoption. Singh et al. (2014) use some components of the Hadoop ecosystem, such as Hive and Mahout, to build a scalable quasi-real-time intrusion detection system. The developed system is a test-bed for real-time P2P botnet detection, which acts as a pre-processing engine for existing


IDS/IPS. Schnase et al. (2014) describe the major Big Data challenges in climate science and show how cloud computing can address them through a generative approach, Climate Analytics-as-a-Service (CAaaS). CAaaS is a Climate Data Services API owned by NASA, and the authors of that paper claim to be the first to describe it. Tannahill and Jamshidi (2014) construct a bridge between System of Systems (SoS) and data analytics to develop reliable models for such systems. The data analytics is used to generate a model to forecast the produced photovoltaic energy to assist in the optimization of a micro grid SoS. Aluru and Simmhan (2013) serve as editors of a special issue that addresses the issues occurring in Big Data management and analytics. Hipgrave (2013) explores the important aspects of smarter fraud investigations -- smarter technology, improved data visualization and better internal communication -- and seeks the possibility of taking advantage of Big Data analytics to detect fraud fast and share critical investigative information. Zhang et al. (2013) also believe in MapReduce-like frameworks for parallel data processing in the cloud. They discuss the migration of massive, dynamically generated, geodispersed data into the cloud with timely and cost-minimizing uploading strategies. Demirkan and Delen (2013) present a conceptual framework for service-oriented decision support systems (DSS in the cloud) that suggests migrating analytics and Big Data into the cloud. The authors also furnish the definitions of Data-as-a-Service, Information-as-a-Service, and Analytics-as-a-Service. Pandey and Nepal (2013) serve as the editors of the CCSA2012 event that invited high-quality scientific publications addressing issues related to diversified aspects of cloud computing. Lee et al. (2012) discuss the concerns about the incapability of a single cloud in handling Big Data services and suggest addressing these issues with multi-clouds (cloud-of-clouds / rain computing). Srirama et al. (2012) take advantage of the MapReduce framework in solving scientific computing problems. For more complicated problems, an alternative to MapReduce, called Twister, is considered by the authors, and the results are analyzed in detail. Yan et al. (2013) aim to expand the data scalability of the power iteration clustering (PIC) algorithm by implementing a parallel PIC. Various parallelization strategies and implementation details are explored to reduce the costs incurred in computation and communication, also considering commodity computers. Schadt et al. (2011) explore some of the diverse cloud computing strategies presently available in biology. The objective is to address the issues surfacing due to the emergence of Big Data.

5. XCLOUDX Classification of Cloud-Assisted Big Data Models

There are a good number of cloud-assisted data models available today that provide efficient and accurate solutions for complex Big Data problems, and the pool of cloud providers is still growing. Figure 5 shows a spider-shaped comprehensive organization of Big Data and some of the leading cloud models, where the former is reflected by the central system and the latter are branched as peripherals. Each branch represents a unique characterization, and the cloud data models potentially having the same distinctive characteristics are grouped together.

Figure 5. Cloud-assisted Big Data models (Source: http://stevenimmons.org/)
Figure 6. XCLOUDX classification tree for cloud data models

Further, in this section, we propose a novel, first-hand classification, called XCLOUDX, which depends specifically upon the name of the cloud models. The motivation behind such a classification is to gauge how intuitive the name of a model is. As mentioned above, the pool of cloud providers is not yet saturated; rather, it is growing rapidly.


It becomes extremely difficult, likely impossible, for a user to read through the documentation of each individual model to understand its technology and judge its suitability, especially from a cloud perspective. And there is a high probability that after a brief web surf, the searching process ceases. The user might have missed the most appropriate model because it did not come within his or her search radar; along with other reasons, the user’s limited patience or time constraints in searching and reading through may be acceptable excuses. The XCLOUDX classification can provide a viable solution that addresses such issues to a great extent, if not fully. The classification is applicable to the cloud-assisted data models only and, considering their scientific name as the underlying basis, divides them into two classes -- XCloudX and X…X, where ‘X’ is any alphanumeric variable. The following sections further develop the proposed XCLOUDX classification and explore it in greater detail. The expectation from this classification is to encourage more intuitive names for the models, especially cloud-assisted ones, that outright could hint at the underlying technologies. Below is the mathematical representation of XCLOUDX. The pre- and post-CLOUD X’s are not synchronized and may concurrently carry dissimilar values within the defined domain.

XCLOUDX = {XCloudX, X…X} | ∀ X ϵ [A-Za-z0-9_-]+
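As a minimal sketch of how this set definition partitions model names, the Python snippet below classifies a name into the XCloudX or X…X child purely by whether the string "cloud" appears in it; the sample names are taken from the models reviewed later in this section.

```python
import re

NAME = re.compile(r"^[A-Za-z0-9_-]+$")   # the domain [A-Za-z0-9_-]+ from the definition

def classify(model_name):
    if not NAME.match(model_name):
        return None                       # outside the defined domain
    return "XCloudX" if "cloud" in model_name.lower() else "X...X"

for m in ["Cloudant", "LightCloud", "DynamoDB", "BigQuery"]:
    print(m, "->", classify(m))           # Cloudant, LightCloud -> XCloudX; others -> X...X
```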

5.1 XCloudX cloud data models

Definition 1. XCloudX is the representation of that class of Big Data models which embody the term “cloud” in their name.

XCloudX = {X}Cloud{X} | ∀ X ϵ [A-Za-z0-9_-]+

Hypothesis 1. All cloud-enabled Big Data models have the word “cloud” in their name.

In this section, we review some of the cloud-assisted Big Data models of the XCloudX kind with the objective of validating hypothesis 1.

5.1.1 Cloudant (cloud): Cloudant (IBM, 2014), presently administered by IBM, is primarily a cloud-based, non-relational, open source, distributed Database-as-a-Service (DBaaS) with zero configuration to handle the Big Data challenges. It is highly suitable for developers who need to load, store, and analyze the operational data of complex applications, including mobile and web applications. Cloudant is offered as a fully managed service that hides all the data management mechanics from the developers. As a result, the whole effort of the developers is invested solely in the development of the applications, which sometimes are critically complex. This helps improve the time-to-market (TTM) (Glinz and Mukhuija, 2003). Cloudant comes with some premier features such as high availability and elastic scalability. The synchronization of mobile devices is one of the unique attributes of Cloudant. CouchDB (Sharma et al., 2014, 2015a) is the primary project that Cloudant is built upon. Cloudant uses exactly the same set of REST APIs as CouchDB, and in exactly the same way.
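Since Cloudant exposes CouchDB's REST API, a few HTTP calls are enough to create a database and store and read a JSON document, as in the Python sketch below; the account URL, credentials, and document body are placeholders for illustration only.

```python
import requests

BASE = "https://ACCOUNT.cloudant.com"    # hypothetical account host
AUTH = ("api_key", "api_secret")          # placeholder credentials

requests.put(f"{BASE}/sensors", auth=AUTH)                          # create a database
requests.post(f"{BASE}/sensors", auth=AUTH,                         # store a JSON document
              json={"_id": "reading-1", "temp_c": 21.4})
doc = requests.get(f"{BASE}/sensors/reading-1", auth=AUTH).json()   # read it back
print(doc)
```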

5.1.2 CloudKit (cloud): The CloudKit framework (Hanson, 2015) is a new technology that is meant to be used as a service for data transportation to and from the cloud. CloudKit is not a local storage; rather, it is a transport mechanism that is used as Rack middleware. It provides an auto-versioned RESTful JSON storage that has no schema restrictions. Although it is a middleware component for Rack-assisted applications, it can also be used on its own. The auto-versioned feature of the JSON storage helps in tracking each version of every JSON document.

5.1.3 Cloud Datastore (cloud): Cloud Datastore (Google Inc., 2010a) is a NoSQL datastore for storing schema-free, non-relational data in the cloud. Each instance of the Cloud Datastore is entirely managed by Google Inc. Cloud Datastore experiences no downtime, and the data is replicated across multiple datacenters, hence high data availability. Also, the Cloud Datastore scales up automatically for growing data traffic, endorses ACID transactions, and offers eventual consistency for queries. The Cloud Datastore is accessed by JSON APIs supported by the Google APIs infrastructure underneath. The Cloud Datastore provides a web-based GUI as well as client libraries and Google APIs for Java and Python that are used to administer the cloud instances and the server instances required for local development.

5.1.4 LightCloud (non-cloud): LightCloud (.com, 2009), built on Tokyo Tyrant (FAL Lab, 2010), is an open source, distributed key-value database that is often compared with memcached from a performance point of view. It has been tested for storing millions of keys on a very small group of servers.

However, in contrast to memcachedb (Chu, 2008), LightCloud is horizontally scalable with a master-master replication architecture, and commodity hardware nodes are easily added to achieve it. Load balancing and automatic failover are some of the important features of LightCloud. Also, unlike memcachedb, which just caches the data in memory, LightCloud persists the data to disk. The overall footprint of LightCloud is relatively much smaller -- the client is around ~500 lines and the manager about ~400. Python is the only language for the core version of the LightCloud database; however, it is easily portable to other languages as well. Although Salihefendic (2009), the founder and CEO of doist.io, in his blog refers to LightCloud as “Plurk's open-source cloud database LightCloud…”, from the available documentation on LightCloud there is no indication that the LightCloud database relies on any kind of cloud for the storage and access of the data.

5.1.5 Cloudera (cloud): Cloudera has lately drawn widespread attention for its completely open source Apache Hadoop distribution, popularly known as CDH (Cloudera’s Distribution for Hadoop) (Cloudera, 2012). CDH is a rigorously tested distribution of Apache Hadoop, which has been widely deployed with extensive outreach. CDH is one of the most complete solutions for Hadoop, providing end users with batch processing of massive data, support for MapReduce, interactive SQL and search, etc. Support for the other popular components of the Hadoop ecosystem such as Pig, Hive, Flume, etc., as well as support for external solutions such as Oracle, Teradata, etc., is provided with the standard distribution of CDH. Users may use Cloudera Manager or APIs to deploy and manage the Hadoop distribution.

5.1.6 PiCloud (cloud): PiCloud (Elkabany and Staley, 2014) provides a cloud computing environment that helps users avail themselves of massive computational power (thousands of cores) from Amazon Web Services (Cloud, 2011), while completely avoiding the management, maintenance, or even configuration of the virtual servers. Access to the PiCloud interface is feasible using 1) a command line, 2) a Python cloud library, or 3) a web-based GUI. Although Python is the default programming language used to develop the standard cloud library, PiCloud can be customized to accommodate other programming languages as well.

5.2 X…X cloud data models

Definition 2. X…X represents that class of data models which have cloud computing as the common underlying technology and have any combination of words but “cloud” in their name.

X…X = {X}{X} | ∀ X ϵ [A-Za-z0-9_-]+

Hypothesis 2. Cloud-assisted Big Data models do not necessarily have the word “cloud” in their name.

In this section, we analyze some of the Big Data models that fall in the X…X class; the data models should have cloud computing support. The aim of the investigation is to test hypothesis 2.

5.2.1 1010data System (cloud): 1010data (Bloor, 2011) is a columnar database system that has advanced dynamic in-memory capability. It is a cloud-based solution that best avails massively parallel processing for data management. It is a suite of advanced analytics that is devised and designed to perform the analytics exactly on the designated data elements that are intended to be analyzed. The data is comprehended in an orderly fashion and in its inherent structure, producing efficient results. The unmatched query efficiency and overall system performance make 1010data stand out from its peers. 1010data performs the analytics right at the physical address of the Big Data (the database); therefore, the analyses can be conducted against the full granularity of Big Data without data sampling, aggregation, or redundancy, which would be highly unlikely to avoid in traditional stand-alone systems. The absence of pre-calculation, aggregation, and sampling of the data helps achieve quick response times. The system claims to deliver improved performance and is also linearly scalable. The massively parallel processing of the system helps in big data discovery, as it manages to analyze the most comprehensive Big Data in its full granularity.

5.2.2 Algebraix System (cloud): ALGEBRAIX (Silver, 2005) is a cloud technology that harnesses the power of modern computer architectures and offers robust data analytics operations for Big Data that are not possible within the confines of conventional database systems, which are inflexible and inefficient at addressing the Big Data challenges. ALGEBRAIX uses the full potential of mathematical models and maintains a universe of algebraic expressions relating data. Also, it avoids any need for prior formatting or the use of any additional temporary data structure. The data is loaded in its native format and operations are performed on it as is. Therefore, ALGEBRAIX has a highly simplified loading mechanism that monitors the query patterns from the users and adaptively restructures the data. ALGEBRAIX also furnishes the user with the capability to roll the database back to any point in time and further query the data as it was in its previous avatar.

5.2.3 Azure DocumentDB (cloud): Azure DocumentDB (Microsoft Azure, 2015) is part of Azure's enterprise-proven hybrid IaaS and PaaS cloud infrastructure; DocumentDB serves as a NoSQL document database, which natively supports JSON and JavaScript inside the database engine. DocumentDB was born in the cloud and is backed by blazingly fast, low-latency, write-optimized, solid-state drive storage. For applications, especially web and mobile, it provides predictable throughput, low latency, and flexible query. Microsoft OneNote is already using DocumentDB in production. Azure provides not only efficient but also economical data storage, backup, and recovery, and it is not restricted to any particular language, tool, or framework. Being a cloud platform, Azure DocumentDB can be scaled up or down quickly according to the users' requests and demands, which also affects the amount the users pay for the infrastructure and services.

5.2.4 Amazon DynamoDB (cloud): Amazon DynamoDB (Vogels, 2012) is a fully managed (cloud), fast and flexible NoSQL database service for Big Data that rides on both document and key-value data models. DynamoDB is befittingly suitable for those complex applications that, at any scale, cannot afford momentary data inconsistency or latency of more than single-digit milliseconds. DynamoDB also allows a user to create global secondary indexes when a table is created. Moreover, either through the DynamoDB console or by simple API calls, global secondary indexes can be deleted from or added to tables at any point in time, without disturbing the live traffic and the continuous service the same table is providing. The secondary indexes aid a user in writing rich query sets with filters. In DynamoDB, all items of a table are fetched through the Scan operation via the APIs. Also, the Scan operation can be performed on secondary indexes to fetch all the data attributed to them.
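The Python sketch below shows, under stated assumptions, how the Scan operation and a global secondary index query look through the boto3 client library; the table name, index name, and attribute names are hypothetical and would correspond to whatever schema an application defines.

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
orders = dynamodb.Table("Orders")                 # hypothetical table

# Scan fetches every item in the table
all_items = orders.scan()["Items"]

# Query a hypothetical global secondary index to filter by customer
by_customer = orders.query(
    IndexName="customer-index",
    KeyConditionExpression=Key("customer_id").eq("C-42"),
)["Items"]

print(len(all_items), len(by_customer))
```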

5.2.6 Datameer (cloud): Datameer, Inc. (Groschupf, 2009) provides cloud-based (PaaS), effective, self-serviced, and schema-free analytics and visualization support for Big Data applications on Hadoop. It also offers a deployment-specific SaaS Big Data analytics platform. Datameer is highly scalable, up to thousands of nodes, and all major distributions of Hadoop can make the best use of it. Datameer specializes in the analysis of large volumes of data for Apache Hadoop users. Datameer Analytics Solution (DAS) is one of the popular services that Datameer offers. It is a platform for Hadoop that integrates data sources and an analytics engine. The analytics engine contains a spreadsheet interface with a large number of analytic functions. The visualization aspect includes reports, charts and dashboards. All the major distributions of Hadoop, such as Cloudera, Greenplum HD, IBM BigInsights, MapR, Yahoo, Amazon, etc., are delivered with DAS. In sum, Datameer provides a cost-effective Hadoop-as-a-Service (Schneider, 2012) platform. This avoids the overly complex on-premises configuration and deployment of Hadoop and therefore helps business executives and managers stay dedicated to analytics only and utilize their time much more effectively.

5.2.7 Google BigQuery (cloud): Google BigQuery (Google Inc., 2010b) is a web service which facilitates the interactive analysis of Big Data. From the cloud classification point of view, it is delivered as IaaS. Google BigQuery can be accessed through a GUI, the command line or REST APIs with the help of client libraries such as Java, PHP or Python. BigQuery can load data directly from Cloud Storage into flat or nested tables that are stored as a columnar database. Also, data in other formats can be loaded into BigQuery with the restriction that it must be converted into JSON or CSV format before upload. The queries used in BigQuery are SQL-like and super-fast, using the processing power of Google's infrastructure.
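A minimal sketch of an interactive, SQL-like BigQuery query through the Python client library is shown below; the project id is a placeholder, and the query assumes the publicly documented Shakespeare sample dataset is available.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")     # hypothetical project id

sql = """
    SELECT corpus, COUNT(word) AS words
    FROM `bigquery-public-data.samples.shakespeare`
    GROUP BY corpus
    ORDER BY words DESC
    LIMIT 5
"""
# The query runs on Google's infrastructure; result() waits for and returns the rows
for row in client.query(sql).result():
    print(row.corpus, row.words)
```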

5.2.8 Dydra (cloud): Dydra (Anderson, et al., 2011) is a fully managed, cloud-based graph database which natively stores the data as a property graph. The property graph clearly shows the relationships, if any, in the underlying data. Dydra is a very useful solution for a vast variety of Big Data applications, as most problems are reducible to graphs with little or more effort. Dydra is a multi-tenant data store and query engine. Dydra allows data operations through the query language called SPARQL. It is an industry standard specifically devised to query graph databases such as those based on the Resource Description Framework (RDF) (Pan, 2009).


The use of SPARQL is extremely easy, and Dydra equips the user with a simple query editor embedded in the browser. Dydra is built from the ground up to be performant for use cases built on graphs, like social networking. Social networking is one use case; however, the list of cases where the performance of Dydra is really impressive is unlimited. Dydra is platform and programming language independent and delivers the full benefits of a cloud service to its users.
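As a sketch of the SPARQL-based access described above, the Python snippet below sends a query to a Dydra-style repository endpoint using the SPARQLWrapper library; the account and repository in the URL are placeholders, and the FOAF-based query is purely illustrative.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical repository endpoint following the account/repository pattern
endpoint = SPARQLWrapper("https://dydra.com/ACCOUNT/REPOSITORY/sparql")
endpoint.setQuery("""
    SELECT ?person ?friend
    WHERE { ?person <http://xmlns.com/foaf/0.1/knows> ?friend . }
    LIMIT 10
""")
endpoint.setReturnFormat(JSON)

for binding in endpoint.query().convert()["results"]["bindings"]:
    print(binding["person"]["value"], "knows", binding["friend"]["value"])
```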

Hypotheses validation. Based upon the above investigation on the cloud data models, we try to validate the hypotheses 1 and 2.

Figure 6 is a classification tree that depicts the summary of the investigation on the XCLOUDX classification. The XCloudX child of the tree root, XCLOUDX CLASSIFICATION (XCLOUDX ≠ XCloudX; the names are case sensitive), is used to test hypothesis 1, whereas the X...X child of the same root helps validate hypothesis 2. It is clear from Figure 6 that the XCloudX child has its own children, and at least one of them -- LightCloud -- is different from the rest. It can be noticed that LightCloud has localized disk persistence, while its peers rely on cloud storage and, in turn, on cloud computing technology. As per the statement of hypothesis 1, all cloud-assisted Big Data models have the term “cloud” associated with their name. The investigation of data models under the XCloudX category reveals that LightCloud does not support the hypothesis. Although the LightCloud (Light + “Cloud”) data model has the term “Cloud” as one of its name constituents, it does not use cloud computing technology; rather, it performs on-premises computations with localized persistence. Therefore, hypothesis 1 is not universally correct. Figure 6 also shows that the X…X child has children, and the investigation establishes the fact that every child under the X…X root has complete cloud assistance for the computation. However, none of them has the word “Cloud” anywhere in its scientific name; this proves the correctness of hypothesis 2.

6. Interesting Case Studies

This section briefly explores a list of some interesting and fresh case studies that witness the growing spectrum of cloud computing in the Big Data ecosystem and use one of the data models explored in this paper. The selection of case studies has been made, without any reservations, across all the interesting areas such as mobile, telecom, health, financial, agriculture, computer software and hardware, transportation, government agencies, media, community, academia, eservice, ecom, gaming, space, etc.

6.1 Adobe (computer software): Adobe, under one of its programs, called Systems Managed Services (Nelson, 2015), offers commercial software; Adobe LiveCycle Forms and the Adobe Connect conferencing software are just a few to name. The software is delivered to enterprises, e.g. Fortune 100 companies, multi-national mega corporations, and the public sector. The reason behind the success of the Adobe software is the use of the Amazon Web Services (AWS) cloud, which delivers conducive environments for multi-terabyte operations. The integration of Adobe systems with the AWS cloud helps Adobe deploy and operate its own resources, especially software, rather than worrying about the infrastructure.

6.2 CDC (public health): The Centers for Disease Control and Prevention (CDC) always strives to address public health concerns (Kass-Hout, 2013). Lately, the CDC started a health awareness program called BioSense 2.0. The goal is to provide national, state or local level awareness about all health-related threats and their appropriate responses. To bypass the financial burden of purchasing costly hardware infrastructure and software systems, and to avoid the complex configuration involved, the CDC adopted the Amazon Web Services (AWS) cloud, which provides low-cost operations with a pay-per-use scheme. Also, the hosted systems are highly available, and the cloud has full support for security and practice compliance.

6.3 Dell SecureWorks (IT services): Dell SecureWorks, with global headquarters in Georgia, USA, is an information security services company that protects thousands of customers around the globe against cyber threats and attacks (Cloudera, 2012). Dell SecureWorks collects and analyzes information on millions of simultaneous IT events occurring globally within a few milliseconds. It operates around the clock and provides real-time security for the IT assets of its clients. To provide and maintain such demanding services, Dell SecureWorks has to process enormous volumes of data on high-end technologies at high cost. Cloudera offered Dell SecureWorks a better solution that addresses these data processing challenges; the solution is highly cost-effective and fast for storing, scaling, and analyzing massive amounts of information in real time.

6.4 Expedia (e-services): Expedia, Inc. (Kobashigawa-Bates, 2010) is one of the world's largest online travel agencies, consisting of numerous popular brands such as Hotels.com and Hotwire.com and operating in more than 60 countries. Along with its other operations, direct bookings with travel suppliers and advertising opportunities are highly popular, and the company experiences heavy traffic every day; Expedia also powers its partners with the same services. Observing a regular increase in traffic, Expedia moved to Amazon Web Services (AWS) in 2010 to take advantage of a global infrastructure that supports its Asia Pacific customers and addresses key issues such as automation and customer proximity. By 2011, Expedia had shifted several of its high-volume, critical applications to AWS; the Global Deals Engine (GDE) is one such example. GDE administers the deals with Expedia's online partners and allows them to use Expedia APIs and other tools for various purposes.

6.5 NASA (space): The National Aeronautics and Space Administration (NASA) has numerous centers around the nation. The Jet Propulsion Laboratory (JPL) is one of its premier centers and is constantly engaged in the robotic exploration of space (Cureton, 2012); it has launched robots to every planet in the solar system and gathers huge amounts of imagery data pumped back by the robotic sensors. For the successful Mars Exploration Rover and Mars Science Laboratory missions, NASA/JPL made the Amazon Web Services (AWS) cloud an essential part of its tactical operations to capture and store the collected images and metadata. NASA also planned to share the thrilling success and experience of the Mars mission with interested people around the globe; hence, it decided to make the most current, up-to-the-minute details of the mission available, especially during the final seven minutes the rover took to descend through the Martian atmosphere and land on Mars. JPL therefore used AWS again to stream the landing images and videos. Considering that public users spread all over the globe would be visiting its websites, NASA decided to serve its content from AWS clusters situated around the world. This helped meet the global demand of the heavy traffic experienced by the websites and also enhanced the viewers' experience.

6.6 Netflix (media): Netflix, founded in 1999 and based in California, supplies Internet streaming media on demand to its users (TSE, 2015). Viewers almost all around the globe enjoy the subscription-based services of Netflix; its major markets include North America, South America, Australia, New Zealand, Denmark, the United Kingdom, and France. The content can be streamed online anywhere internationally; in the United States, however, a flat-rate DVD-by-mail system is also available, where the mailed DVDs are dispatched via Permit Reply Mail. Today, Netflix offers a collection of well over 100,000 titles on DVD and has greatly surpassed 10 million subscribers. The scalability of the subscriptions and the competitive pressure to support other media devices such as tablets are some of the leading factors that compelled Netflix to explore advanced solutions promising seamless global service. Netflix ultimately relies on Amazon Web Services (AWS) for its services and content delivery; the AWS cloud has enabled Netflix to deploy thousands of servers and terabytes of storage in almost no time.

6.7 Cloud in academic research: Reid et al. (2014) have been studying the growth of next-generation sequencing data in laboratories as well as in hospitals. In clinical genetics, because of this growth, analyzing DNA sequences has become more challenging than producing them, and the genomics results, which are expected to be accurate and reproducible, may scale from individuals up to sizable cohorts. Considering this, Reid and team began to leverage the Amazon Web Services (AWS) cloud: they developed the Mercury analysis pipeline and, with the help of the DNAnexus platform, deployed the pipeline to AWS. A combination of a robust, fully validated software pipeline and a scalable computational resource now exists, powered by AWS; so far, the new computational platform has been employed for over 10,000 whole-genome and exome samples. In another case, Noordhuis et al. (2010) planned to study user ranking in large amounts of Twitter data but wanted to outsource the burden of the heavy computations, so they turned to cloud computing for the kind of computations they intended. To retrieve the ranks of the users, the PageRank algorithm was applied, and the Amazon Web Services (AWS) cloud powered all the computations. All the related data were first obtained from Twitter and the PageRank algorithm was then run on the procured data; a graph with 50 million nodes and 1.8 billion edges, almost two-thirds of the total Twitter user base at the time, was crawled to retrieve the data. AWS thus provided a far more affordable infrastructure for the acquisition and analysis of this humongous amount of data.
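For readers unfamiliar with the ranking step, the following is a minimal, single-machine sketch of the PageRank power iteration on a toy follower graph. It is purely illustrative: it does not reproduce the distributed AWS setup or the Twitter data used by Noordhuis et al. (2010), and the example users are made up.

```python
# Minimal sketch of the PageRank power iteration on a toy directed graph.
# Dangling nodes (no outgoing edges) are not handled in this sketch.

def pagerank(edges, damping=0.85, iterations=50):
    """Compute PageRank over a directed graph given as (source, target) pairs."""
    nodes = {n for edge in edges for n in edge}
    out_degree = {n: 0 for n in nodes}
    incoming = {n: [] for n in nodes}
    for src, dst in edges:
        out_degree[src] += 1
        incoming[dst].append(src)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            inbound = sum(rank[m] / out_degree[m] for m in incoming[n])
            new_rank[n] = (1 - damping) / len(nodes) + damping * inbound
        rank = new_rank
    return rank

# Toy "who follows whom" graph: an edge (a, b) means user a follows user b.
edges = [("alice", "bob"), ("carol", "bob"), ("bob", "dave"), ("dave", "alice")]
for user, score in sorted(pagerank(edges).items(), key=lambda kv: -kv[1]):
    print(f"{user}: {score:.3f}")
```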


7 Emergence of as-a-Service Era

In the 21st century, data and information have emerged as the most valuable resources and are insatiably required globally across all disciplines. In the last few years, the growth of data has raised the size of the available data to a completely new level, where data management operations such as curation, storage, and modeling have become complex and financially expensive. Furthermore, the security aspects of the data have fueled additional concerns. Communities feel an urgent need to explore alternatives able to address all of these concerns, and because of the realistic promise of cloud computing to deliver scalable, secure, and cost-effective solutions, the search for an appropriate substitute has converged on it. In recent years, communities have focused on cloud-enabled systems to ensure customizable, on-demand access to computing resources. The most popular flavors a cloud model offers as services are Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS). However, the thrust of Big Data and the need for ubiquitous information access have further extended the cloud plumes beyond IaaS, PaaS, and SaaS. An evolution toward an "as-a-Service" (aaS) era is increasingly observed, and in this section some important new members of the aaS family, developed in academia and industry, are listed with a brief background description. The elements in this list can be considered new additions to the cloud computing stack, which previously had only three elements: IaaS, PaaS, and SaaS.

• Business Intelligence-as-a-Service (SaaS BI/BIaaS) is a new cloud computing paradigm for Business Intelligence processes that offers data access through a web interface. The implementation details are hidden from the users; consequently, the business processes are orchestrated in a simpler and faster manner (Sano, 2014).
• Cashier-as-a-Service (CaaS) refers to those merchant websites that accept payments through third-party cashiers such as PayPal, Amazon Payments, and Google Checkout (Wang et al., 2011).
• Climate Analytics-as-a-Service (CAaaS) is a solution to address Big Data challenges in climate science. CAaaS combines high-performance computing and data-proximal analytics with scalable data management, cloud computing virtualization, the notion of adaptive analytics, and a domain-harmonized API to improve the accessibility and usability of large collections of climate data (Schnase et al., 2015).
• Cloud-Based Analytics-as-a-Service (CLAaaS) is a platform for big data analytics which provides on-demand data storage and analytics services through customized user interfaces that include query, decision management, and workflow design and execution services for different user groups (Zulkernine et al., 2013).
• Continuous Analytics-as-a-Service (CaaaS) is a cloud computing model for enabling convenient, on-demand network access to a shared pool of continuous analytics results over real-time events such as monitoring oil & gas production, watching traffic status, and detecting accidents (Chen et al., 2011).
• Data-as-a-Service (DaaS) is a type of cloud-assisted service that delivers data on demand to consumers through APIs. DaaS helps consumers avoid the typical need to fetch and store giant data assets and then search the asset for the required information (Vu et al., 2012); a small consumer-side sketch of this pattern follows the list.
• Data Integrity-as-a-Service (DIaaS) gathers all the expertise related to data integrity in one place to deal with some well-known problems such as public verifiability and dynamic content. DIaaS not only releases the burden of data integrity management from a storage service by handling it through an independent third-party data Integrity Management Service (IMS), but also reduces the security risk of the data stored in the storage service by checking the data integrity with the help of the IMS (Nepal et al., 2011).
• Data Mining-as-a-Service (DMAS/DMaaS) allows data owners to leverage hardware and software solutions provided by DMAS providers without developing their own. It is intended especially for clients who have large volumes of data but a limited budget for data analysis, allowing them to outsource their data and data mining needs to a third-party service provider (Liu et al., 2012).
• Database-as-a-Service (DBaaS/DaaS) promises to move much of the operational burden of provisioning, configuration, scaling, performance tuning, backup, privacy, and access control from the database users to the service operator, offering lower overall costs to users. Early DBaaS examples include Amazon RDS and Microsoft SQL Azure, which promise to address the market need for such a service (Curino, 2011).
• Digital Forensics-as-a-Service (DFaaS) is a new service-based approach for processing and investigating high volumes of seized digital material. It aids in reducing case backlogs and in freeing up digital investigators to help detectives better understand the digital material. Nowadays, this approach has become a standard for hundreds of criminal cases and over a thousand detectives (Baar et al., 2014).
• Education and Learning-as-a-Service (ELaaS) can be understood as the use of cloud computing in the educational and learning environment. ELaaS enables learners, instructors, and administrators to perform their tasks effectively by using the services it offers, which results in an overall reduction of education costs (Alabbadi, 2011).
• Everything-as-a-Service (XaaS), also called Anything-as-a-Service, gives users and companies the flexibility to customize their computing environments on demand to craft the desired experiences. XaaS is highly dependent on two components: 1) a strong cloud services platform and 2) reliable Internet connectivity, in order to successfully gain traction and acceptance among both individuals and enterprises (Duan, 2012).
• Failure-as-a-Service (FaaS) allows cloud services to routinely perform large-scale failure drills in real deployments. FaaS enables cloud services to routinely exercise large-scale failures online, which strengthens the individual, organizational, and cultural ability to anticipate, mitigate, respond to, and recover from failures (Gunawi et al., 2011).
• Forensics-as-a-Service (FRaaS) provides a comprehensive cloud forensics solution for creating a repeatable system. Such a system could be implemented as a standard forensics operational model for deployment within the cloud ecosystem, regardless of environments and client service lines (Shende et al., 2014).
• Hadoop-as-a-Service (HDaaS) enables enterprises to perform various Hadoop-based operations such as analytics, management, and storage of Big Data in a cloud in a cost-effective and time-efficient manner, and hence eliminates the need for any on-premises hardware. The Hadoop architecture and the supporting applications are abstracted into a single cloud-based delivery (Wood, 2015).
• HPC-as-a-Service (HPCaaS) is a new approach that uses a cloud abstraction to provide a simple interface to high-end HPC resources; users can pay for HPC resources as needed (AbdelBaky et al., 2012).
• Mobility-as-a-Service (MobiaaS) provides consumers the required connectivity, service continuity, and seamless handover for flows such as voice as they use a multitude of devices to communicate (Baliga et al., 2011).
• Object-as-a-Service (ObaaS) is based on the notion of dynamically building the service needed on each object and then integrating it into the whole composition. ObaaS runs on the object itself, using its functionalities such as sensing, actuating, and computing (Cherrier et al., 2014).
• Ontology-as-a-Service (OaaS) is an ontology tailoring process in the cloud, which underneath exercises sub-ontology extraction and replacement on the cloud (Flahive et al., 2014).
• Security-as-a-Service (SecuaaS/SaaS) is a data protection and a host and application protection solution. It validates the security services over a geographically distributed, large-scale, multi-cloud and federated cloud infrastructure (Pawar et al., 2015).
• Sensing and Actuation-as-a-Service (SAaaS) is a step toward creating a cloud of sensors and actuators. Since the cloud provides on-demand computing and storage of resources with guaranteed quality of service, moving into it the sensors and actuators that interact with the surrounding environment enables the development of new, value-added services and opens an avenue for pervasive cloud computing (Distefano et al., 2012).
• Storage-as-a-Service (StaaS/SaaS) provides an online storage space in the cloud to store data. Users' concerns about data security and privacy are addressed by robust cryptographic algorithms, and StaaS also offers users an optimized computation cost and a level of security based on the significance of the data (Patel et al., 2012).
• Things-as-a-Service (ThiaaS) is a way in which innovative as well as value-added services are implemented through the integration of cloud computing with the Internet of Things. It is a way to develop a cloud of things where heterogeneous resources are aggregated or abstracted according to tailored thing-like semantics (Distefano et al., 2012).
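To make the DaaS pattern above concrete, the following is a minimal consumer-side sketch: rather than downloading and storing a giant data asset, the client requests only the slice it needs through an API. The endpoint URL, parameters, and API key are hypothetical placeholders, not a real provider's interface.

```python
# Minimal consumer-side sketch of the Data-as-a-Service (DaaS) pattern:
# fetch only the records needed, on demand, instead of downloading and
# storing the whole data asset. Endpoint and parameters are hypothetical.
import requests

BASE_URL = "https://daas.example.com/v1/weather"  # hypothetical DaaS endpoint

def fetch_observations(station: str, day: str, api_key: str) -> list:
    """Request a small, filtered slice of a large remote data asset."""
    response = requests.get(
        BASE_URL,
        params={"station": station, "date": day},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()  # only the requested slice crosses the network

if __name__ == "__main__":
    rows = fetch_observations("AMES-01", "2015-04-14", api_key="demo-key")
    print(len(rows), "observations retrieved on demand")
```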

8 Conclusions

The recent rise of cloud computing has been embraced by a wide range of communities beyond IT. Despite this, the notion apparently still remains ab initio cloudy to many. In this contribution, we have tried to understand, scientifically and technologically, the concept of cloud computing, which has given a completely new perspective to information delivery systems. It can be understood as a new business model that embodies the notion of outsourcing for its clientele. From the vendor perspective, it is a new, on-demand, pay-per-use, and highly flexible delivery and provisioning system for IT-related resources and services. Cloud computing is not a single solution that benefits all applications; rather, there may be cases where its usage fails to provide any gain. The potential candidates for cloud computing include applications that are highly volatile and that involve big computations to deal with Big Data. Today, in this information age, Big Data is offering big scientific insights that will eventually help improve human life and the community at large. Recent scientific and technological advances have brought cloud computing and Big Data into closer proximity; the former is gradually inching into the vicinity of the latter to host it completely. The thrust of this study was to investigate the growing constructive intrusion of cloud computing into the Big Data ecosystem. We also briefly explored the concept of Big Data and explained the core components of its life cycle. The pool of Big Data models has grown rapidly in the last couple of years and is still growing, and each data model is developed with a different storage technology and computing mechanism underneath. It becomes nearly impractical for a cloud-interested user to surf through that extended pool of data models, examine the underlying technology, and segregate the cloud-enabled data models in order to find the apt one. To assist such users, we proposed a novel first-level classification, called XCLOUDX {XCloudX, X…X}. This classification is based primarily on the scientific name and gauges the intuitiveness of the name of cloud-assisted Big Data models. The subclasses XCloudX and X…X were hypothesized, and some of the available Big Data models of each class were investigated to validate them. It was concluded that a cloud-enabled Big Data model would be classified as XCLOUDX and would belong to either the XCloudX (≠ XCLOUDX) or the X…X class, based on its scientific name, where X is any alphanumeric variable. In contrast, one instance (LightCloud) was found that elegantly qualified for the XCloudX subclass based on its scientific name but was not cloud-powered, at least not yet. Based on that very fact, it was deduced that, currently, not all XCloudX Big Data models are necessarily cloud-supported. However, according to related literature, we suspect that the idea of LightCloud was conceived to have a cloud foundation underneath, but the development has not yet reached that intended maturity. Furthermore, to strengthen the notion of XCLOUDX, the proposed classification was applied to and discussed with some important case studies. Additionally, we studied the impact of the cloud computing momentum in the context of the emerging as-a-Service era; an extended list of new members of the as-a-Service family was explored. The list comprises several new as-a-Service platforms far beyond the traditional cloud computing stack, which typically consists of IaaS, PaaS, and SaaS.

References

AbdelBaky, M., Tavakoli, R., Wheeler, M. F., Parashar, M., Kim, H., Jordan, K. E., and Pencheva, G. (2012). Enabling high-performance computing-as-a-Service. Computer, (10), 72-80. Alabbadi, M. M. (2011, September). Cloud computing for education and learning: Education and learning-as-a-Service (ELaaS). In Interactive Collaborative Learning (ICL), 2011 14th International Conference on (pp. 589-594). IEEE. Aluru, S., and Simmhan, Y. (2013). A special issue of journal of parallel and distributed computing: scalable systems for big data management and analytics. Journal of Parallel Distributing Computing 73, 896. Anderson, J., Bendiken, A., Huckabee, J. (2011). Dydra: Networks Made Friendly, A product of Datagraph, Inc. [Online] http://dydra.com (Last accessed on April 17, 2015) Angles, R. and Gutierrez, C. (2008). Survey of graph database models. ACM Computing Surveys, 40(1), 1. Bell, D. (2013). UML basics: An introduction to the Unified Modeling Language. IBM developerWorks. [Online] http://www.ibm.com/developerworks/rational/library/769.html (Last accessed on Nov 12, 2013) Baar, R. B., Beek, H. M. A., and Eijk, E. J. (2014). Digital Forensics-as-a-Service: A game changer. Digital Investigation, 11, S54-S62. Baliga, A., Chen, X., Coskun, B., de los Reyes, G., Lee, S., Mathur, S., & Van der Merwe, J. E. (2011, June). VPMN: virtual private mobile network towards mobility-as-a-service. In Proceedings of the second international workshop on Mobile cloud computing and services (pp. 7-12). ACM. Berezecki, M., Frachtenberg, E., Paleczny, M., & Steele, K. (2011). Many-core key-value store. In Proceedings of International Green Computing Conference and Workshops (IGCC), pp. 1-8, Florida, USA. Bloor, R. (2011). Big Data Analytics - This Time It's Personal. The Bloor Group. [Online] https://www.1010data.com/uploads/files/1010data_Robin_Bloor,_Ph_D,_Big_Data_Analytics_white_paper.pdf (Last accessed on April 14, 2015) Bollier, D., Firestone, C. M. 2010. The Promise and Peril of Big Data, Aspen Institute, Communications and Society Program Washington, DC, USA.

Burrows, M. (2006). The Chubby lock service for loosely coupled distributed systems. In Proceedings of the 7th symposium on Operating systems design and implementation, pp. 335-350, USENIX Asso. Berkeley, CA. Carino, F., and Sterling, W. M. (1998). Method and apparatus for extending existing database management system for new data types. U.S. Patent No. 5,794,250. Cattell, R. (2011). Scalable SQL and NoSQL data stores. ACM SIGMOD Record, 39(4), 12-27. Chen, Y., Alspaugh, S., Katz, R. (2012). Interactive analytical processing in big data systems: a cross-industry study of MapReduce workloads, Proc. VLDB Endow.5, 1802–1813. Cherrier, S., and Ghamri-Doudane, Y. M. (2014, September). The “Object-as-a-Service” paradigm. In Global Information Infrastructure and Networking Symposium (GIIS), 2014 (pp. 1-7). IEEE. Chen, Q., Hsu, M., and Zeller, H. (2011, March). Experience in Continuous analytics-as-a-Service (CaaaS). In Proceedings of the 14th International Conference on Extending Database Technology (pp. 509-514). ACM. Chu, S. (2008). Memcachedb. [Online] http://memcachedb.org/ (Last accessed on April 12, 2015) Cloud, A. E. C. (2011). Amazon web services. Retrieved November, 9, 2011. [Online] http://aws.amazon.com/ (Last accessed on April 14, 2015) Cloudera. (2012). Dell SecureWorks slashes the cost of storage with Dell, Cloudera Hadoop solution. [Online]http://www.cloudera.com/content/dam/cloudera/Resources/PDF/casestudy/ Dell_SecureWorks_Case_Study.pdf (Last accessed on April 25, 2015) Cloudera, I. (2012). CDH Proven, enterprise-ready Hadoop distribution–100% open source. [Online] cloudera.com/ content/cloudera/en/products-and-services/cdh.html (Last accessed on April 14, 2015) Cureton, L. (2012). AWS Case Study: NASA/JPL's Mars Curiosity Mission.[Online] aws.amazon.com/solutions/case- studies/nasa-jpl-curiosity/ (Last accessed on April 28, 2015) Curino, C., Jones, E. P., Popa, R. A., Malviya, N., Wu, E., Madden, S. and Zeldovich, N. (2011). Relational Cloud: A Database-as-a-Service for the Cloud. 5th Biennial Conference on Innovative Data Systems Research, CIDR, January 9-12, 2011 Asilomar, California. Dean, J., Ghemawat, S. (2008). MapReduce: simplified data processing on large clusters, Commun.ACM51, 107–113. Demirkan, H., & Delen, D. (2013). Leveraging the capabilities of service-oriented decision support systems: Putting analytics and big data in cloud. Decision Support Systems, 55(1), 412-421. Distefano, S., Merlino, G., and Puliafito, A. (2012, July). Enabling the cloud of things. In Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS), 2012 Sixth International Conference on (pp. 858-863). Distefano, S., Merlino, G., and Puliafito, A. (2012, August). Sensing and actuation-as-a-Service: A new development for clouds. In Network Computing and Applications (NCA), 11th IEEE International Symposium, 272-275. Duan, Y. (2012, August). Value modeling and calculation for everything-as-a-Service (XaaS) based on reuse. In Software Engineering, Artificial Intelligence, Networking and Parallel & Distributed Computing (SNPD), 13th ACIS International Conference on (pp. 162-167). IEEE. Elkabany, K. and Staley, A. (2014). PiCloud client-side library. [Online] https://pypi.python.org/pypi/cloud (Last accessed on April 14, 2015). FAL Lab. (2010). Tokyo Tyrant: network interface of Tokyo Cabinet. [Online] http://fallabs.com/tokyotyrant/ (Last accessed on April 12, 2015) Flahive, A., Taniar, D., & Rahayu, W. (2014). Ontology-as-a-Service (OaaS): extending sub‐ontologies on the cloud. 
Concurrency and Computation: Practice and Experience. Franks, B. (20102). Taming the BigData Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics, Wiley.com John Wiley Sons Inc. Furner, J. (2003). Little Book, Big Book: Before and After Little Science, Big Science: A Review Article, Part I. Journal of Librarianship and Information Science, 35 (2): 115–125 (Retrieved 2014-02-09). Glinz M., Mukhuija A. (2003). Constructive Cost Model. A seminar on software cost estimation WS 2002/2003. [Online] www.ifi.uzh.ch/rerg/fileadmin/downloads/teaching/seminars/seminar_ws0203/Seminar_4.pdf Google Inc. (2010a). Google Cloud Platform. [Online] https://cloud.google.com/datastore/docs/concepts/overview (Last accessed on April 11, 2015) Google Inc. (2010b). Google Cloud Platform. [Online] https://cloud.google.com/bigquery/what-is-bigquery (Last accessed on April 16, 2015) Groschupf, S. (2009). Datameer Inc. [Online] http://www.datameer.com/ (Last accessed on April 17, 2015) Gunawi, H. S., Do, T., Hellerstein, J. M., Stoica, I., Borthakur, D., & Robbins, J. (2011). Failure-as-a-Service (faas): A cloud service for large-scale, online failure drills. University of California, Berkeley, Berkeley, 3. Han, J., Song, M., and Song, J. (2011). A Novel Solution of Distributed Memory NoSQL Database for Cloud Computing. 10th International Conference on Computer and Information Science, Sanya, China. Hanson, R. 2015. YapDatabaseCloudKit. YapDatabase. [Online] github.com/yapstudios/YapDatabase/ wiki/YapDatabaseCloudKit (Last accessed on April 11, 2015) Hilbert, M., López, P. (2011). The world’s technological capacity to store, communicate, and compute information, Science 332 (6025), 60–65. Hipgrave, S. (2013). Smarter fraud investigations with big data analytics. Network Security, (12), 7-9. IBM, 2012. What is Big Data ? Bringing Big Data to the enterprise. [Online] 01.ibm.com/software/data/bigdata/


IBM, 2014. IBM Completes Acquisition of Cloudant. [Online] 03.ibm.com/press/us/en/pressrelease/43342.wss Ji, C., Li, Y., Qiu, W., Awada, U., Li, K. (2012). Big data processing in cloud computing environments. Pervasive Systems, Algorithms and Networks, in Proceedings of the 12th International Symposium, pp.17–23. Kass-Hout, T. (2013). AWS Case Study: US Centers for Disease Control and Prevention (CDC). [Online] http://aws.amazon.com/solutions/case-studies/us-centers-for-disease-control-and-prevention/ (Last accessed on April 25, 2015) Kobashigawa-Bates, L. (2010). AWS Case Study: Expedia.[Online] http://aws.amazon.com/solutions/case- studies/expedia/ (Last accessed on April 26, 2015) Kwon, O., Lee, N., & Shin, B. (2014). Data quality management, data usage experience and acquisition intention of big data analytics. International Journal of Information Management, 34(3), 387-394. Leavitt, N. (2010). Will NoSQL databases live up to their promise? Computer, 43(2), 12-14. Lee, S., Park, H., and Shin, Y. (2012). Cloud computing availability: multi-clouds for big data service. In Convergence and Hybrid Information Technology (pp. 799-806). Springer Berlin Heidelberg. Liu, R., Wang, H. W., Monreale, A., Pedreschi, D., Giannotti, F., and Guo, W. (2012). AUDIO: An Integrity\ underline {Audi} ting Framework of\ underline {O} utlier-Mining-as-a-Service Systems. In Machine Learning and Knowledge Discovery in Databases (pp. 1-18). Springer Berlin Heidelberg. Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., and Byers, A. H. (2011). Big Data: The Next Frontier for Innovation, Competition, and Productivity. McKinsey Global Institute [Online] http://www.mckinsey.com/insights/mgi/research/technology_and_innovation/ big_data_the_next_frontier_for_innovation (Last accessed on January 24, 2015). Miller, H. E. (2013). Big Data in cloud computing: a taxonomy of risks, Inf.Res.18, 571. Microsoft Azure. (2015). DocumentDB: A fully-managed, highly-scalable, NoSQL document database service. [Online] http://azure.microsoft.com/en-us/services/documentdb/ (Last accessed on April 17, 2015) Nelson, M. 2015. AWS Case Study: Adobe Systems. [Online] http://aws.amazon.com/solutions/case-studies/adobe/ (Last accesses on April 23, 2015) Nepal, S., Chen, S., Yao, J., and Thilakanathan, D. (2011, July). DIaaS: Data integrity-as-a-Service in the cloud. In Cloud Computing (CLOUD), 2011 IEEE International Conference on (pp. 308-315). IEEE. Neumeyer, L., Robbins, B., Nair, A., Kesari, A. (2010). S4: Distributed Stream Computing Platform. Data Mining Workshops (ICDMW), IEEE International Conference on, pp.170–177, Sydney, Australia. Noordhuis, P., Heijkoop, M., Lazovik, A. (2010). Mining twitter in the cloud: A case study, Cloud Computing (CLOUD), Proceedings of IEEE 3rd International Conference on, IEEE, Miami, FL, pp. 107–114. Nugent, A., Halper, F., and Kaufman, M. (2013). Big data for dummies, John Wiley & Sons. O’Leary, D.E. (2013). Artificial Intelligence and Big Data. IEEE Intelligent Systems, 28 (02), 96–99. OSGi Alliance (2010). Semantic Versioning. Technical Whitepaper, Revision 1.0 [Online] http://www.osgi.org/wiki/uploads/Links/SemanticVersioning.pdf Palermo, E. (2014). Big Data: Do the Risks Outweigh the Rewards? Business News Daily. [Online] Source: www.businessnewsdaily.com/5825-big-data-risks-rewards.html (Last accessed on February 31, 2015) Pan, J. Z. (2009). Resource description framework. In Handbook on Ontologies (pp. 71-90). Springer Berlin Heidelberg. Pandey, S., and Nepal, S. (2013). 
Cloud Computing and Scientific Applications—Big Data, Scalable Analytics, and Beyond. Future Generation Computer Systems, (29), 1774-1776. Patel, H., Patel, D., Chaudhari, J., Patel, S., and Prajapati, K. (2012, September). Tradeoffs between performance and security of cryptographic primitives used in storage-as-a-Service for cloud computing. In Proceedings of the CUBE International Information Technology Conference (pp. 557-560). ACM. Pawar, P. S., Sajjad, A., Dimitrakos, T., and Chadwick, D. W. (2015). Security-as-a-Service in Multi-cloud and Federated Cloud Environments. In Trust Management IX (pp. 251-261). Springer International Publishing. Plurk.com. (2009). Lightcloud: Distributed and persistent key-value database. Plurk Open Source. [Online] http://opensource.plurk.com/LightCloud/ (Last accessed on April 12, 2015) Quackenbush, J. (2002). Microarray data normalization and transformation, Nat. Genet. 32, 496–501. Rahm, E., Do, H. H. (2000). Data cleaning: problems and current approaches, IEEE Data Eng.Bull.23, 3–13. Rao, B. P., Saluia, P., Sharma, N., Mittal, A., Sharma, S. V. (2012). Cloud computing for Internet of Things & sensing based applications. IEEE, The Sixth International Conference on Sensing Technology, pp.374–380, Kolkata Kolkata, West Bangal, India. Reid, J.G., Carroll, A.,Veeraraghavan, N., Dahdouli, M., Sundquist, A., English, A., Bainbridge, M., White, S., Salerno, W., Buhay, C.(2014). Launching genomics into the cloud: deployment of Mercury, a next generation sequence analysis pipeline, BMC Bioinf.15, 30. Robert, H. (2012). It’s time for a new definition of Big Data. MIKE2.0: The open source standard for Information Management. [Online] http://mike2.openmethodology.org/ Salihefendic, A. 2009. LightCloud adds support for Redis. [Online] http://amix.dk/blog/viewEntry/19458 (Last accessed on April 18, 2015) Sano, M. D. (2014, June). Business Intelligence-as-a-Service: A New Approach to Manage Business Processes in the 19

Cloud. In WETICE Conference (WETICE), 2014 IEEE 23rd International (pp. 155-160). IEEE. Schadt, E. E., Linderman, M. D., Sorenson, J., Lee, L. and Nolan, G. P. (2010). Computational solutions to large-scale data management and analysis. Nature Rev. Genet. 11, 647–657. Schadt, E. E., Linderman, M. D., Sorenson, J., Lee, L., and Nolan, G. P. (2011). Cloud and heterogeneous computing solutions exist today for the emerging big data problems in biology. Nature Reviews Genetics, 12(3), 224. Schnase, J. L., Duffy, D. Q., Tamkin, G. S., Nadeau, D., Thompson, J. H., Grieg, C. M., ... & Webster, W. P. (2014). MERRA analytic services: Meeting the big data challenges of climate science through cloud-enabled climate analytics-as-a-Service. Computers, Environment and Urban Systems. Schnase, J. L., Duffy, D. Q., McInerney, M. A., Webster, W. P., and Lee, T. J. (2015).CLIMATE ANALYTICS-as-a- Service. NASA.gov. [Online] http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20150000368.pdf (Last accessed on May 3, 2015) Schneider, R. D. (2012). Hadoop for Dummies. John Willey & sons. Seibold, M., and Kemper, A. (2012). Database-as-a-Service. Datenbank-Spektrum, 12(1), 59-62. Sharma, S., Shandilya, R., Patnaik, S., and Mahapatra, A. (2015a). Leading NoSQL models for handling Big Data: a brief review, International Journal of Business Information Systems, Inderscience, Vol. 18 (4). Sharma, S., Tim, U. S., Gadia, S., Wong, J., Shandilya, R., and Peddoju, S. (2015b). Classification and Comparison of NoSQL Big Data Models, International Journal of Big Data Intelligence, Inderscience, Vol.2 (2). Sharma, S., Tim, U. S., Wong, J., Gadia, S., and Sharma, S. (2014). A Brief Review on Leading Big Data Models. Data Science Journal, 13(0), 138-157. Shende, J. R. G. (2012). Forensics-as-a-Service. Cybercrime and Cloud Forensics: Applications for Investigation Processes: Applications for Investigation Processes, 266. Silver, H. C. (2005). Algebraix Data Corporation. [Online] www.algebraixdata.com/ (Last accessed on April 17, 2015) Singh, K., Guntuku, S. C., Thakur, A., & Hota, C. (2014). Big data analytics framework for peer-to-peer botnet detection using random forests. Information Sciences, 278, 488-497. solidIT. (2014). DB-ENGINES: Knowledge Base of Relational and NoSQL Database Management Systems. [Online] http://db-engines.com/en/ home (Last accessed on Sept 3, 2014) Srirama, S. N., Jakovits, P., and Vainikko, E. (2012). Adapting scientific computing problems to clouds using MapReduce. Future Generation Computer Systems, 28(1), 184-192. Talia, D. (2012). Clouds for scalable big data analytics, Computer46, 98–101. Tannahill, B. K., and Jamshidi, M. (2014). System of Systems and Big Data analytics–Bridging the gap. Computers & Electrical Engineering, 40(1), 2-15. TSE, E. (2015). AWS Case Study: Netflix.[Online] http://aws.amazon.com/solutions/case-studies/netflix/ (Last accessed on April 30, 2015) Villars, R. L., Olofson, C. W. and Eastwood, M. (2011). Big Data: What it is and why you should care. IDC White Paper. Framingham, MA: IDC. Vogels, W. (2012). Amazon DynamoDB – a Fast and Scalable NoSQL Database Service Designed for Internet Scale Applications. All Things Distributed blog. [Online] aws.amazon.com/dynamodb (Accessed on April 17, 2015) Vu, Q. H., Pham, T. V., Truong, H. L., Dustdar, S., & Asal, R. (2012, March). Demods: A description model for data- as-a-service. In Advanced Information Networking and Applications (AINA), 2012 IEEE 26th International Conference on (pp. 605-612). IEEE. 
Wang, R., Chen, S., Wang, X., and Qadeer, S. (2011, May). How to shop for free online--security analysis of cashier-as- a-service based web stores. In Security and Privacy (SP), 2011 IEEE Symposium on (pp. 465-480). IEEE. Watters, A. (2010). The Age of Exabytes: Tools and Approaches for Managing Big Data (Website/Slideshare). Hewlett-Packard Development Company. Wolfe, A. (2013). Why Database-as-a-Service (DBaaS) Will Be The Breakaway Technology of 2014. Forbes.com, New York City, USA. [Online] http://www.forbes.com/sites/oracle/2013/12/05/why-database-as-a-Service -dbaas-will-be-the-breakaway-technology-of-2014/ (Last accessed on March 27, 2015) Wood, L. (2015). Global Hadoop-as-a-Service (HDaaS) Market. [Online] http://www.prnewswire.com/news- releases/global-hadoop-as-a-servicehdaas-market-2015-2019---increased-adoption-of-haas-among-smes- with-amazon-emc-ibm--microsoft-dominating-300053719.html (Last accessed on May 4, 2015) Yan, W., Brahmakshatriya, U., Xue, Y., Gilder, M., and Wise, B. (2013). p-PIC: Parallel power iteration clustering for big data. Journal of Parallel and Distributed Computing, 73(3), 352-359. Zhang, L., Wu, C., Li, Z., Guo, C., Chen, M., & Lau, F. C. (2013). Moving big data to the cloud. In INFOCOM, Proceedings IEEE (pp. 405-409). Zulkernine, F., Martin, P., Zou, Y., Bauer, M., Gwadry-Sridhar, F., and Aboulnaga, A. (2013, June). Towards Cloud- Based Analytics-as-a-Service (CLAaaS) for Big Data Analytics in the Cloud. In Big Data (BigData Congress), IEEE International Congress on (pp. 62-69). IEEE.
