(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 9, No. 10, 2018

Resource Management in Cloud Data Centers

Aisha Shabbir1, Kamalrulnizam Abu Bakar2, Raja Zahilah Raja Mohd. Radzi3
School of Computing, University Technology Malaysia, Johor, Malaysia

Muhammad Siraj4
Department of Information Engineering & Computer Science, University of Trento, Trento, Italy

Abstract—Vast sums of big data are a consequence of data arriving from diverse sources. Conventional computational frameworks and platforms are incapable of computing complex big data sets and processing them at a fast pace. Cloud data centers, having massive virtual and physical resources and computing platforms, can provide support for big data processing. In addition, the most well-known framework, MapReduce, in conjunction with cloud data centers provides fundamental support to scale up and speed up the classification, investigation, and processing of huge, massive, and complex big data sets. Inappropriate handling of cloud data center resources will not yield significant results and eventually leads to poor overall utilization of the system. This research aims at analyzing and optimizing the number of compute nodes following the MapReduce framework on the computational resources of a cloud data center, focusing upon the key issue of computational overhead due to inappropriate parameter selection, and at reducing overall execution time. The evaluation has been carried out experimentally by varying the number of compute nodes, that is, map and reduce units. The results show evidently that appropriate handling of compute nodes has a significant effect on the overall performance of the cloud data center in terms of total execution time.

Keywords—Big data; cloud data center; MapReduce; resource utilization

I. INTRODUCTION

Data sets that are so huge or complex that conventional data processing techniques are incapable of dealing with them are called big data. The key sources of big data production are digital applications, social media, transactions, emails, sensors, and the migration of almost every manual entity towards automation. The increasing challenges of big data are due to its diverse nature, which is categorized by its V's [1]. As big data is growing enormously, so are its processing requirements. Consequently, it calls for huge computational infrastructure in order to successfully analyze and process large amounts of data. This is a two-pronged challenge: on one hand, the amount of data is constantly increasing and must be allocated to a suitable set of available resources; on the other hand, the output needs to be produced in less time and at minimum cost. To deal with ever-growing data sets, giants like Google, IBM, Microsoft, and Amazon have turned their attention to cloud computing and have offered various services based on it [2]. Cloud computing is a solution for performing large-scale complex computing. It abolished the need for expensive hardware, software, and dedicated space. Cloud computing offers its users a platform which grants resources, services, and applications. Mainly, cloud computing offers three different services: platform as a service (PaaS), software as a service (SaaS), and infrastructure as a service (IaaS). For the users, these services are easily accessible on a pay-per-use, on-demand basis [3].

Several frameworks have been proposed for processing big data. Some of the widely used frameworks are Hadoop MapReduce, Dryad, Spark, Dremel, and Pregel [4]. The most well-known framework is MapReduce. MapReduce was proposed by Google to simplify massively distributed parallel processing so that very large and complex datasets can be processed and analyzed efficiently. It is designed on the principle of exploiting the parallelism among the processing units. A popular implementation of the MapReduce framework is Hadoop, which is typically used in conjunction with cloud computing for executing various big data applications, including web analytics applications, scientific applications, data mining applications, and enterprise data-processing applications [5].

Cloud data centers comprise several compute nodes, servers, and storage nodes. Inappropriate handling of cloud data center resources will result in underutilization of the resources, high latency, and computational costs, and would thus yield an overall degradation of the system's performance [6]. Existing research revolves around improving the resource management of cloud data centers by focusing mainly on the assignment of tasks to the relevant processors [7-9]. Some researchers tried to alleviate the communication cost of data movement within the cloud data center [10-11], while others aim at energy conservation for the resources of the cloud data center [12-13]. Minimal attention has been given towards the optimization factor, and only some factors of the framework used on servers have been explored [14].

However, there is a lack of significant research upon the optimization of the resources of the cloud data center. In addition, the proper selection and distribution of compute nodes are not considered by the research community. The important challenge is effective utilization of resources, with trifling computational cost, while skillfully allocating the various assets of the data center to diverse tasks. This research focuses on the appropriate handling of the compute nodes of the cloud data center.


Fig. 1. Overview of Framework.

For illustration of the problem focused on in this research study, a general framework is shown in Fig. 1. Clients submit their requests, and the whole data size for processing from all requests is huge. Then, on submission of this data, it is divided further for efficient processing among the physical and virtual resources of the cloud data center. As shown in Fig. 1, the data is split among different resource clouds and, for its execution on the processing nodes, it is further chunked and assigned to suitable processing machines according to the submitted task requirements. Thus, splitting the data among the compute nodes and proper parameter handling of the compute resources is necessary.

The organization of the paper is as follows. Section II describes the preliminaries, giving an overview of big data and its characteristics and explaining the MapReduce framework and cloud computing. Section III comprises the problem formulation. Section IV presents the experimental setup with the configuration details. Section V covers the results and discussion. The last section, Section VI, gives the conclusion and future work, followed by the acknowledgements and references of this study.

II. PRELIMINARIES

A. Big Data
This vast sum of big data is a consequence of the data from different diversities, that is, digital applications, scientific data, business transactions, social networking, emails, and sensor data. The challenges of big data have arisen because of the 5 V's characteristics of big data, i.e., how to store, process, merge, manage, and govern the different formats of data [15]. Brief details of the V's are as follows.

• Volume shows the size of big data, i.e., gigabytes, terabytes, or zettabytes.
• Velocity presents the speed of data movement, i.e., batch processing, stream processing, or real-time processing.
• Variety represents the formats of the big data, i.e., structured, unstructured, and semi-structured.
• Veracity shows the quality of the big data, and the originality and authenticity of the big data.
• Value presents the information extraction of big data, in statistical and hypothetical forms.

B. MapReduce Framework
MapReduce was proposed by Google to simplify massively distributed parallel processing so that very large and complex datasets can be processed and analyzed efficiently. A popular implementation of the MapReduce programming framework is Hadoop, which is typically used in conjunction with cloud computing for executing various big data applications, including web analytics applications, scientific applications, data mining applications, and enterprise data-processing applications [16-17]. MapReduce is considered the most prominent and effective framework for big data problems, as it allows the processing of gigantic data over many underlying distributed nodes. MapReduce is composed of two basic components, i.e., mappers and reducers. The basic concept is to design the map function to generate a set of intermediate key-value pairs. The reducer is then used to merge all intermediate values associated with the same intermediate key. The key feature of the MapReduce framework is that it invokes parallelism among the computing nodes. The workflow of MapReduce is shown in Fig. 2. MapReduce computes a job in such a way that it takes the big data sets for processing, chunks the huge data sets into small pieces, and processes them over the map units. The output of the mappers is in the form of key-value pairs. This output is forwarded to the reducers for further processing, and the final output is collected after the reduce phase.
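To make the map and reduce roles concrete, the sketch below shows a minimal word-count job written against the standard Hadoop MapReduce Java API. It is an illustrative example only, not the workload used in this study; the class and variable names (WordCount, TokenMapper, SumReducer) are our own.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: each input line is split into words, emitted as (word, 1) pairs.
  public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        if (!token.isEmpty()) {
          word.set(token);
          context.write(word, ONE);   // intermediate key-value pair
        }
      }
    }
  }

  // Reduce phase: all values sharing the same intermediate key are merged (summed).
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));  // final output pair
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```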


Fig. 2. MapReduce Workflow.

C. Cloud Computing
Cloud computing is a newly emerged technology which supports computation by providing a platform that allows flexibility in efficient deployment, operation, and management environments. The core model of cloud computing is based on providing resources on a massive scale: it offers services to the users in computing, networking, and storage, and the users can consume them at their ease. For organizations, cloud computing offers reliability, availability, and scalability. Businesses can benefit from cloud computing by reducing the cost of investing in business information systems, and it can help them increase resource utilization. With the passage of time, the cloud services from providers like Google, Amazon, and Microsoft have been further developed, which has resulted in more and more companies moving towards cloud computing platforms. The main purpose of the cloud computing model is the provision of services and applications and the sharing of resources among its users [18]. Big data and cloud data centers have mingled, as big data tasks need the support of cloud data centers for their processing [19]. Cloud data centers provide platforms with physical servers, applications, operating systems, and virtual servers, either virtually or physically. The cloud service provider can hold the highest level of resources as well. Traditionally, data centers have applications which are overworked to deal with heavy workloads [20]. Modern cloud computing infrastructure has the ability to process users' requests at a much faster pace, and this ability to analyze and efficiently process a variety of data has attracted many large enterprises. Low latency in online services leads to improved user satisfaction and revenue. If a single cloud data center has to process different tasks from different clients or users, then providing efficient service to all tasks in minimum time with low hardware and computational cost is necessary.

III. PROBLEM FORMULATION

The main aim of this study is to analyze and optimize the available resources of the cloud data center for the processing of big data sets. Let us consider a cloud data center with its compute resources as shown in Fig. 1. Tasks are submitted from different users to the computational platform for processing, and the whole size of the input data tasks could be of a certain value. The data is then forwarded to the further sub-computational units, as shown in Fig. 1. Existing studies focused on improving the resource management of cloud data centers by addressing the assignment of tasks to suitable processing units. However, there is a deficiency of critical research on the optimization of the resources of the cloud data center. In addition, the proper selection and distribution of compute nodes are underexplored. This research aims at exploring the potential of parameter optimization and suitable distribution of compute nodes in cloud data centers for big data set processing.

As discussed earlier, MapReduce is becoming one of the most suitable and efficient frameworks adopted by many large enterprises. It works on the principle of parallelism in order to reduce overall execution time and thus improve the performance of the computational platform. MapReduce computes a job mainly in two phases, as suggested by the name: map and reduce. During the map phase, it splits the input and processes it on the given set of nodes. The output of the map phase is in the form of key-value pairs. These key-value sets are stored on a nearby machine and then forwarded to the reduce units; how the mapper output is divided among the reduce units is sketched below. The workflow of MapReduce with its default component selection is shown in Fig. 3.

Fig. 3. MapReduce with its Default Number of Reducers.

Fig. 4. Proposed Idea for Reduce Units.
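In Hadoop's implementation, the assignment of intermediate key-value pairs to reduce units is governed by a partitioner; by default this is a hash partitioner, and with the default single reducer every pair lands on the same reduce unit. The sketch below, written against the standard Hadoop API, comments the default behaviour and shows how a custom partitioner could spread mapper output when more reducers than the default are configured. The RangePartitioner class and its key-splitting rule are hypothetical illustrations, not the mechanism proposed in this paper.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

/*
 * Default behaviour (HashPartitioner): a mapper output pair (key, value) is sent to
 * reducer number (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks.
 * With the default of one reducer, every pair therefore goes to the same reduce unit.
 */
public class RangePartitioner extends Partitioner<Text, IntWritable> {

  // Hypothetical rule: keys starting with 'a'..'m' go to the lower half of the
  // reducers, all other keys to the upper half.
  @Override
  public int getPartition(Text key, IntWritable value, int numReduceTasks) {
    if (numReduceTasks <= 1) {
      return 0;                       // single (default) reducer: everything goes to it
    }
    String s = key.toString();
    char first = s.isEmpty() ? 'a' : Character.toLowerCase(s.charAt(0));
    int half = numReduceTasks / 2;
    int offset = (first >= 'a' && first <= 'm') ? 0 : half;
    int width = (offset == 0) ? half : numReduceTasks - half;
    return offset + (first % width);  // always within [0, numReduceTasks)
  }
}

// Driver-side wiring, together with an explicit number of reduce units:
//   job.setPartitionerClass(RangePartitioner.class);
//   job.setNumReduceTasks(4);   // more than the default of one reducer
```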


In this progression, if the mapper outputs are divided and forwarded to more reducers than the default value, the overall traffic could be reduced. An overview of the division of the mapper output among multiple reducers is presented in Fig. 4.

IV. EXPERIMENT SETUP

For the implementation of this research study, the following simulation setup has been established. First of all, Oracle VirtualBox was installed in order to take advantage of the creation of multiple machines. The machines are arranged in a master-slave architecture. Next is the installation of Ubuntu on the virtual machines. This can be achieved either by installing Ubuntu on one of the virtual machines and using the cloning feature of Oracle VirtualBox to make clones of that particular machine, or by installing Ubuntu one by one on each of the virtual machines. After this, the Hadoop installation has been carried out. There are some prerequisites for the Hadoop installation, i.e., the installation of Java, either the Java Development Kit (JDK) or the Java Runtime Environment (JRE).

The simulations have been done in Hadoop, and the comparison analysis has been done in Microsoft Excel. Hadoop version 1.x allows a maximum block size of 64 MB, and the user is able to select the split size for the given data. Hadoop version 2.x allows a maximum block size of 128 MB, and this size can be varied by the user between 64 and 128 MB. For the Hadoop 2.x case, a data split size of 128 MB was selected for the different inputs. The details of the input data sizes and the split sizes used are given in Table I.

TABLE I. INPUT DATA AND SPLIT SIZES

Input data size (MB)   Split size 1 (MB)   Split size 2 (MB)
100                    64                  128
250                    64                  128
510                    64                  128
1180                   64                  128
2020                   64                  128
5100                   64                  128
10480                  64                  128
15730                  64                  128

A. Configuration Details
For effective utilization of resources, assignment of the tasks to suitable resources is necessary. In addition, proper tuning of the parameters is equally important, because inappropriate tuning of the parameters of the resources leads to degradation of the overall system performance. The parameters can be tuned by accessing the configuration of the computational platform. For Hadoop MapReduce there are many configuration files which the user comes across in the configuration settings. The files which need to be updated for the proposed work setting are the following:
1) hadoop-env.sh
2) core-site.xml
3) hdfs-site.xml
4) mapred-site.xml

V. RESULTS AND DISCUSSION

The input data size has been varied from megabytes (MB) to gigabytes (GB). The split size was taken as 64 MB and 128 MB. It has been observed that the variation of the split size with the input data size has an effect on the number of mappers required for processing, as shown in Fig. 5. The variation of the number of nodes in Fig. 5 depicts that choosing the maximum split size results in fewer nodes for the input data, i.e., a split size of 128 MB gives fewer map units than a 64 MB split size for the same data size. Consequently, the variation of mappers will affect the number of reducers required for efficient processing.

Fig. 6 depicts that changing the number of reducers also affects the overall execution time. As observed in Fig. 6, an elevation in the number of mappers, that is, an increase in input data size, results in lower execution time when the number of reducers is small. However, for the same number of mappers, increasing the number of reducers to a certain value shows significant improvement in the total execution time.

It has been seen that changing the number of reducers along with the variation of the number of mappers shows a behavior of reducing the execution time at a certain value. However, the number of reducers providing the best execution time for a given number of mappers is different for different numbers of mappers. Fig. 6 clearly shows that there is no fixed number of reducers providing the best execution time; it varies with the number of mapper units. The default number of reducers is one for Hadoop MapReduce. However, for effective utilization of available resources, the number of mappers can be optimized by choosing the maximum split size available.

Fig. 5. Number of Map Units vs. Input Data Size (MB) for Split Sizes of 64 MB and 128 MB.
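The two parameters behind these results are the input split size, which determines the number of map units (roughly the input size divided by the split size, so the 1180 MB input of Table I yields about 19 map units at 64 MB splits and about 10 at 128 MB), and the number of reduce tasks. A minimal sketch of how these could be set programmatically, equivalent to editing mapred-site.xml, is given below; the property names follow the Hadoop 2.x convention, and the concrete values (128 MB splits, 4 reducers) are purely illustrative, not the settings of this study.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class TuningSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Force ~128 MB input splits: fewer, larger splits mean fewer map units.
    conf.setLong("mapreduce.input.fileinputformat.split.minsize", 128L * 1024 * 1024);
    conf.setLong("mapreduce.input.fileinputformat.split.maxsize", 128L * 1024 * 1024);

    // Number of reduce units (the Hadoop default is 1); 4 is an illustrative value.
    conf.setInt("mapreduce.job.reduces", 4);

    Job job = Job.getInstance(conf, "parameter tuning sketch");

    // The same limits can also be set through the Job-level helper methods.
    FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);
    job.setNumReduceTasks(4);

    System.out.println("configured reduce tasks = " + job.getNumReduceTasks());
  }
}
```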


The default number of reducers is one for the MapReduce framework. This default selection may lead to higher execution time with large network overhead, and increasing the number of reducers could possibly reduce some of this traffic and thus improve the total execution time. However, in some special cases the reduce phase can be skipped entirely; in that case, the mappers' output is considered the final output (a minimal map-only configuration is sketched below). Otherwise, for the follow-up reduce phase, the mappers' output is stored on local disk or local storage and then passed to the reducers. Careful consideration is needed before assigning the tasks, because latency will increase if extra storing and fetching of input and output data blocks is performed unnecessarily.
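In Hadoop, a map-only job is obtained simply by setting the number of reduce tasks to zero, so the shuffle and reduce phases are skipped and each mapper's output is written directly as the job's final output. The short sketch below illustrates this; it reuses the hypothetical TokenMapper from the earlier word-count example and assumes the standard Hadoop API.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyJob {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "map-only example");
    job.setJarByClass(MapOnlyJob.class);
    job.setMapperClass(WordCount.TokenMapper.class);  // mapper from the earlier sketch

    // Zero reduce tasks: no shuffle, no reduce phase; each mapper's key-value
    // output is written straight to the output path as the final result.
    job.setNumReduceTasks(0);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```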

Fig. 6. Total Execution Time (sec) vs. Number of Map Units (1 to 82) for Different Numbers of Reducers (R = 0, 1, 2, 3, 5, 10).

Fig. 7. Job Completion with Various Numbers of Mappers and Reducers.


VI. CONCLUSION AND FUTURE WORK

This research focused on the analysis and optimization of the number of compute nodes for cloud data centers, addressing the key issue of computational overhead due to inappropriate parameter selection and reducing overall execution time. For computational platforms which follow the MapReduce framework, it provides a basis to handle larger types of data sets. As illustrated by the name MapReduce, a job is computed in two phases, map and reduce. In the map phase, the input data tasks are assigned to the processing units, i.e., mappers. Next, in the reduce stage, the intermediate data generated by the mappers are sorted and reduced to obtain the final output. Between the map and reduce phases, there is a migration of data from mappers to reducers.

The input data size was varied from megabytes (MB) to gigabytes (GB), and the split size was varied between 64 MB and 128 MB. The variation of the number of nodes in Fig. 5 depicts that choosing the maximum split size results in fewer nodes for the input data, i.e., a split size of 128 MB gives fewer map units than a 64 MB split size for the same data size. Fig. 7 shows the simulation results with different numbers of mapper and reduce units. In addition, retrieving, storing, and again sending the data to the data nodes during the intermediate phase of MapReduce itself requires time and computation cost and may produce extra traffic. Moreover, the number of reducers providing the best execution time for a given number of mappers is different for different numbers of mappers. Fig. 6 shows that there is no fixed number of reducers providing the best execution time; it varies with the number of mapper units. Thus, future work of this research will follow the modeling of the calculation of compute nodes according to the cloud data center's capacity.

ACKNOWLEDGMENT

I am thankful to Universiti Teknologi Malaysia (UTM) for providing a good research environment, tools, technical support, and facilities to accomplish this research work. I pay my gratitude to my supervisor Prof. Dr. Kamalrulnizam Abu Bakar and co-supervisor Dr. Raja Zahilah Raja Mohd. Radzi for their guidance throughout the research. I am indebted to my senior Tasneem Darwish for her timely contributions and guidance.

REFERENCES

[1] Géczy, P. (2014). Big data characteristics. The Macrotheme Review, 3(6), 94-104.
[2] Nolter, C. (2018). IBM Rides the Cloud to New Heights. IBM Ratings Report, July 19, 2018.
[3] Bansal, N., Maurya, A., Kumar, T., Singh, M., & Bansal, S. (2015). Cost performance of QoS driven task scheduling in cloud computing. Procedia Computer Science, 57, 126-130.
[4] Agneeswaran, V. S. (2014). Big Data Analytics Beyond Hadoop: Real-Time Applications with Storm, Spark, and More Hadoop Alternatives. FT Press.
[5] Bechini, A., Marcelloni, F., and Segatori, A. (2016). A MapReduce solution for associative classification of big data. Information Sciences, 332, 33-55.
[6] Benifa, J. B. (2017). Performance improvement of MapReduce for heterogeneous clusters based on efficient locality and replica aware scheduling strategy. Wireless Personal Communications, 1-25.
[7] Chen, Q., Zhang, D., Guo, M., Deng, Q., and Guo, S. (2010). SAMR: A self-adaptive MapReduce scheduling algorithm in heterogeneous environment. In Proceedings of the 2010 IEEE 10th International Conference on Computer and Information Technology, 2736-2743.
[8] Tang, Z., Liu, M., Ammar, A., Li, K., and Li, K. (2016). An optimized MapReduce workflow scheduling algorithm for heterogeneous computing. The Journal of Supercomputing, 72(6), 2059-2079.
[9] Tiwari, N., Sarkar, S., Bellur, U., and Indrawan, M. (2015). Classification framework of MapReduce scheduling algorithms. ACM Computing Surveys (CSUR), 47(3), 49.
[10] Sowmya, T. S. R. (2016). Cost minimization for big data processing in geo-distributed data centers. Asia-Pacific Journal of Convergent Research Interchange, 2(4), 33-41.
[11] Cavallo, M., Cusmà, L., Di Modica, G., Polito, C., & Tomarchio, O. (2016). A Hadoop based framework to process geo-distributed big data. In Proceedings of CLOSER (1).
[12] Beloglazov, A., & Buyya, R. (2012). Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing. Future Generation Computer Systems, 28(5), 755-768.
[13] Beloglazov, A., & Buyya, R. (2010). Energy efficient resource management in virtualized cloud data centers. In Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing.
[14] Shabbir, A., Abu Bakar, K., and Raja Mohd. Radzi, R. Z. (2018). Replication effect over Hadoop MapReduce performance using regression analysis. International Journal of Computer Applications, 181(24), 33-38.
[15] Koutroumpis, P., Leiponen, A., and Thomas, L. D. (2017). The (Unfulfilled) Potential of Data Marketplaces. The Research Institute of the Finnish Economy.
[16] Bechini, A., Marcelloni, F., and Segatori, A. (2016). A MapReduce solution for associative classification of big data. Information Sciences, 332, 33-55.
[17] Shabbir, A., and Abu Bakar, K. (2018). Big data processing techniques and platforms: A comprehensive survey. In Proceedings of the 7th International Graduate Conference on Engineering, Science and Humanities (IGCESH), 698.
[18] Thaman, J., & Singh, M. (2016). Current perspective in task scheduling techniques in cloud computing: A review. International Journal in Foundations of Computer Science & Technology, 6(1), 65-85.
[19] Uta, A., & Obaseki, H. (2018). A performance study of big data workloads in cloud datacenters with network variability. In Companion of the 2018 ACM/SPEC International Conference on Performance Engineering.
[20] Dong, Z., Liu, N., & Rojas-Cessa, R. (2015). Greedy scheduling of tasks with time constraints for energy-efficient cloud-computing data centers. Journal of Cloud Computing, 4(1), 5.
