Cloud Computing and Big Data Is There a Relation Between the Two: a Study
International Journal of Applied Engineering Research ISSN 0973-4562 Volume 12, Number 17 (2017) pp. 6970-6982 © Research India Publications. http://www.ripublication.com

Nabeel Zanoon1, Abdullah Al-Haj2, Sufian M Khwaldeh3
1 Department of Applied Science, Al-Balqa Applied University/Aqaba, Jordan.
2 Faculty of Information Technology, University of Jordan/Aqaba, Jordan.
3 Business Information Technology (BIT) Department, The University of Jordan/Aqaba, Jordan.
1 ORCID: 0000-0003-0581-206X

Abstract

Communication through information technology in its various forms produces large amounts of data. Such data requires processing and storage. The cloud is an online storage model where data is stored on multiple virtual servers. Big data processing represents a new challenge in computing, especially in cloud computing. Data processing involves data acquisition, storage and analysis. This raises several questions, including: what is the relationship between big data and cloud computing, and how is big data processed in cloud computing? This paper addresses these questions by studying big data and cloud computing and examining the relationship between them in terms of safety and challenges. We suggest a new term for big data and a model that illustrates the relationship between big data and cloud computing.

Keywords: big data, Hadoop, Cloud, MapReduce, resources, Five (Vs).

INTRODUCTION

Data is the raw material for information before sorting, arranging and processing. It cannot be used in its primary form prior to processing. Information represents data after processing and analysis [1]. Technology has been developed and used in all aspects of life, increasing the demand for storing and processing more data. As a result, several systems that support big data have been developed, including cloud computing. While big data is responsible for data storage and processing, the cloud provides a reliable, accessible, and scalable environment for big data systems to function [2]. Big data is defined as the quantity of digital data produced from different sources of technology, for example sensors, digitizers, scanners, numerical modeling, mobile phones, the Internet, videos, e-mails and social networks. The data types include texts, geometries, images, videos, sounds and combinations of each. Such data can be directly or indirectly related to geospatial information [3].

Cloud computing refers to on-demand computer resources and systems available across the network that can provide a number of integrated computing services without local resources, to facilitate user access. These resources include data storage capacity, backup and self-synchronization [4]. Most IT infrastructure computing consists of services that are provided and delivered through public centers and the servers based in them. Here, clouds appear as individual access points for the computing needs of the consumer. Commercial offerings are generally expected to meet the quality-of-service (QoS) requirements of customers or consumers, and typically include service level agreements (SLAs) [5]. Clouds are an online storage model where data is stored on multiple virtual servers, rather than being hosted on a specific server, and are usually provided by a third party. The hosting companies, which have advanced data centers, rent storage space in the cloud to their customers in line with their needs [6].

The expert Erik Brynjolfsson likened big data to the microscope. When the microscope was invented, scientists were able to identify and measure things at the cell level that they had never imagined before. Similarly, big data is a modern-day microscope through which we can see and measure things we never expected [7]. The statistics in [8] show that data in cloud environments is growing exponentially as the number of Internet users around the world increases. With this rapid growth, the question that comes to mind is how these vast amounts of data can be stored in cloud environments. We need storage technology that keeps pace with rapid data growth on the cloud and that offers low cost, high reliability and high capability.

The relationship between big data and cloud computing is based on integration: the cloud represents the storehouse and big data represents the product stored in it, since there is no point in creating storehouses without storing any product in them. The traditional databases known as 'relational' are no longer sufficient to process multiple-source data. For example, how can these traditional methods deal with data such as transaction records, customer behavior, mobile phone and GPS navigation data, and others? Here comes the role of cloud computing, and at this point a relationship between big data and the cloud arises. In this paper, the relationship between them will be discussed, in addition to the obstacles and challenges that this relationship may encounter.

BIG DATA

Big data is generated by electronic operations across multiple sources. It requires proper processing power and high capabilities for analysis [9]. The importance of big data lies in its analytical use, which can help generate informed decisions to provide better and faster services [10]. The term big data refers to huge amounts of high-speed data of different types that cannot be processed and stored on regular computers. Big data is not only a matter of volume. Its main characteristics, known as the 'five Vs' (Figure 1), are as follows:

1. Volume: It represents the amount of data produced from multiple sources, measured in zettabytes. Volume is the most evident dimension of big data.
2. Variety: It represents the types of data. With the increasing number of Internet, smartphone and social network users everywhere, the familiar form of data has changed from structured data in databases to unstructured data in a large number of formats such as images, audio and video clips, SMS, and GPS data [11].
3. Velocity: It represents the speed at which data arrives from different sources, that is, the speed of data production, for example on Twitter and Facebook. The huge increase in data volume and frequency dictates the need for a system that ensures very fast data analysis.
4. Veracity: It represents the quality of the data, that is, the accuracy of the data and the confidence in its content. The quality of captured data can vary greatly, which affects the accuracy of analysis. Although there is wide agreement on the potential value of big data, the data is almost worthless if it is not accurate [12].
5. Value: It represents the importance of the data after analysis. Data on its own is almost worthless; the value lies in careful analysis of accurate data and in the information and ideas it provides. Value is the final stage that comes after processing volume, velocity, variety, contrast, validity and visualization [13].

Figure 1. Characteristics of big data

The characteristics of big data have been revised numerous times, reaching seven Vs [14]. In this paper, based on the relationship between cloud computing and big data, we suggest a new term, virtualization, which represents the data structure virtually. The virtualization of big data is a process that focuses on creating virtual structures for big data systems. Virtualization is the key technology that helps cloud computing handle large amounts of data flexibly and facilitates the management of big data. Virtual storage technology is studied in Section 6.2.

The type and nature of the data

Data in general is a set of values in the form of numbers, letters, symbols and other forms that concern a particular idea or subject. Data does not make sense without analysis and is therefore compiled for use. Data represents input, while information is the output after processing: data is entered into the system first, then processed until it comes out as useful information that has a clear meaning and on which decisions are based.

Big data comes from multiple sources, including sensors and free text such as social media, unstructured data, metadata and other geospatial data collected from web logs, GPS, medical devices, etc. [15]. Because big data is gathered from different sources, it takes several forms, including:

1. Structured data: It is data organized in the form of tables or databases for processing.
2. Unstructured data: It represents the biggest proportion of data; it is the data that people generate daily as texts, images, videos, messages, log records, click-streams, etc.
3. Semi-structured data: or multi-structured ,It is regarded a evolution of multitasking technology tools the data has kind of structured