A Data-Science-As-A-Service Model
Total Page:16
File Type:pdf, Size:1020Kb
A Data-Science-as-a-Service Model Matthias Pohl, Sascha Bosse and Klaus Turowski Magdeburg Research and Competence Cluster, Faculty of Computer Science, University of Magdeburg, Magdeburg, Germany. Keywords: Data-Science-as-a-Service, Cloud Computing, Service Model, Data Analytics, Data Science. Abstract: The keen interest in data analytics as well as the highly complex and time-consuming implementation lead to an increasing demand for services of this kind. Several approaches claim to provide data analytics functions as a service, however they do not process data analysis at all and provide only an infrastructure, a platform or a software service. This paper presents a Data-Science-as-a-Service model that covers all of the related tasks in data analytics and, in contrast to former technical considerations, takes a problem-centric and technology- independent approach. The described model enables customers to categorize terms in data analytics environ- ments. 1 INTRODUCTION able recommendations, decision or application servi- ces for non-expert customers or just analogies to stan- An increasing keen interest in data science and data dard concepts is not clarified. Furthermore, a custo- analytics exists as reported by a trend analysis of mer is confronted to choose within a set of apparently the last 5 years (Google Trends). The data-intensive unclear buzzwords like data science, data mining, big world allures with generating revenue from analy- data analytics, etc. zing information and data that are simply and quickly Therefore we make a predefinition for the usage available. Different approaches for knowledge dis- of these terms and will diffuse a definition through covery (KDD) (Fayyad et al., 1996) or business- argumentation. The whole process that contains data related data mining (CRISP-DM) (Shearer, 2000) provision, data preparation, data analysis and data vi- are in use. However, proceeding is highly com- sualization is called data science in this paper. Data plex, time-consuming and needs expertise in diffe- analytics, business analytics and big data analytics are rent disciplines, either computer sciences, mathema- often used synomously for data science, however they tics or a context-related specialization. The motiva- could differ in context of use. Data mining is a key tion within a company could arise from preventing term in most of related works and is used in varient downtime of machines, getting insights about custo- different ways. We will take this term as a byword for mer relationships or optimizing business processes. data analysis. If the required expertise for data analysis cannot be This paper has twofold objectives. Firstly, it will provided internally, the tasks can be forwarded to provide a full-service model that will be extracted external consulting services. By using such servi- from existing approaches and will address data ana- ces it is possible to compensate the lack of exper- lytics services. Secondly, it will discuss the arising tise, but it is very cost-intensive and still extremely service offers and data science process steps. The time-consuming. The paradigm of cloud computing subsumption of service models simplifies the asses- (Mell and Grance, 2011) establishes service concepts sment of IT service offers for companies that plan to that seem to be able to solve the remaining issues. get insight from their data. From a scientific view, Among concepts like Infrastructure-as-a-Service a guideline is drawn for a future usage of terms in (IaaS), Platform-as-a-Service (PaaS) or Software- data analytics environment and a classification of past as-a-Service (SaaS) that revolutionize computing work is a point of interest. The structure of the paper on several layers, service models like Analytics- is as follows. A knowledge base about related work as-a-Service, Data-Analysis-as-a-Service, Business- is presented in the second section. The review is built Analytics-as-a-Service or Big-Data-as-Service were up on a cross-reference search for data analytics ser- conceptualized. Whether these approaches are suit- vices and data mining process steps (Webster et al., 432 Pohl, M., Bosse, S. and Turowski, K. A Data-Science-as-a-Service Model. DOI: 10.5220/0006703104320439 In Proceedings of the 8th International Conference on Cloud Computing and Services Science (CLOSER 2018), pages 432-439 ISBN: 978-989-758-295-0 Copyright c 2019 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved A Data-Science-as-a-Service Model 2002). The third section shows a Data-Science-as- quirements and algorithmic implementation is esta- a-Service model that combines the different steps in blished. With the evolution of a data-intensive world data mining as well as a cloud computing service mo- new technologies are needed for handling rapidly gro- del. Section four demonstrates examples of existing wing, variously shaped mass of data (Laney, 2001). services that are offered by known IT service provi- In (Maltby, 2011), the author reviews big data with ders. A discussion that argues aspects and challenges relation to analytic techniques and mentions that data from a technical and a scientific view as well as con- mining combines statistics and machine learning with cludes the paper. database management. Machine learning is not a new thing (Michalski et al., 1983) and focuses on ”au- tomatically learn to recognize complex patterns and 2 RELATED WORKS make intelligent decisions based on data” (Maltby, 2011). Data analysis, machine learning and data vi- sualization are marked as the core disciplines of data There are numerous works that address data mining, science that is a new paradigm in the field of big data data analytics, data science or similar. First of all, we (Concolato and Chen, 2017). However, there is no want to have a look on frameworks that are related simple and unified data science framework (Marvasti to data mining, data analytics and data science. In et al., 2015). Such frameworks are often deep into case of cloud computing, different services that pro- specific context-related data and implemented in an vide data analytics are already developed. Next to application environment (Brough et al., 2017; Loet- data analytics services we will consider data science sch and Ultsch, 2016). In (Cleveland, 2001), data ana- process steps and focus on it in further search. In lysis is described as an enlargement of statistics com- order to retrieve relevant works, we took search en- bined with data computing and is called data science. gine (Science Direct, Scopus) and knowledge databa- Obviously, there exists discordance about the terms ses (DBLP, IEEE, ACM) into account. data science, (big) data analytics, data mining, etc. However, it occurs that all definitions come up with 2.1 Data Science Frameworks a similar structure. Data has to be selected, prepared and analyzed to show up new information. However, The data science process shall end up with know- the needed expertise and the lack of uniformed pro- ledge. The key step in this nontrivial process is cal- cesses lead to the fact that data analytics is more an led data mining (Fayyad et al., 1996). Therefore data art than science (Zorrilla and Garc´ıa-Saiz, 2013). has to be processed from selected sources and trans- formed into a proper format. CRISP-DM (Shearer, 2000) describes a process cycle that is similar to Kno- 2.2 Data Science Services wledge Discovery in Databases (Fayyad et al., 1996), but points out the business understanding that is ne- The thought of automation without requiring human cessary for initiating because data mining goals will interaction, the capability solving small and big data be defined on that basis. Respecting both frameworks problems, the availability as well as flexibility in com- it is conspicuous that data mining is a name of a pro- puting power and storage characterize the essentials cess step in KDD and entitles the whole reference of cloud computing (Mell and Grance, 2011). Dif- model CRISP-DM. In (Kurgan and Musilek, 2006), ferent concepts that offering machines (IaaS), opera- a review of data mining frameworks is given. There ting systems (PaaS) and applications (SaaS) are alre- exist various derivates of the mentioned frameworks ady defined and discussed (Dillon et al., 2010). There with relation to a specific field of application. In (Sun exist approaches for hosting data analytics environ- et al., 2017), the authors add data analytics that also ments in a cloud. In (Xu et al., 2015), the authors aim contains data mining as a significant part to the dis- at making real-time data analytics available as a ser- cussion and observe it as an aggregation of data ana- vice and deal with the challenges of creating service lysis, data mining, data warehouse, statistic modeling interfaces, wrapping existing big data frameworks and and data visualization. Furthermore the authors aim real-time processing of data. In (Zorrilla and Garc´ıa- at identifying an ontological separation between data Saiz, 2013), an approach is motivated with some ex- analytics, business analytics, knowledge analytics and emplary services (e.g. GoodData, IBM Smart Analy- big data analytics. In (Nalchigar and Yu, 2017), a tics) and it is concluded that none of them wraps the conceptual modeling framework for business analy- whole analytic process that is described in KDD or tics is proposed and provides catalogues for business CRISP-DM. The proposed framework contains servi- questions, analytic algorithms and data preparation. ces for the steps of data preparation, data analysis and In this manner, a transmission between business re- data visualization and is layer structured from an en- 433 CLOSER 2018 - 8th International Conference on Cloud Computing and Services Science terprise perspective. The concept is implemented as different types of Database-as-a-Service (shared ma- a SaaS, though its automation is not sufficiently dis- chine, shared processes, shared tables) are described cussed.