Imperial Journal of Interdisciplinary Research (IJIR) Vol-3, Issue-4, 2017 ISSN: 2454-1362, http://www.onlinejournal.in

Cloud Computing Environment – A Big Data Dash

Rajendra Kaple, SGBAU, Amravati

Abstract: The rapid growth of internet applications such as social media, online shopping and banking services leads to the processing of a huge amount of diverse data, which continues to increase quickly. Effective management and analysis of large-scale data poses an interesting but critical challenge. Recently, big data has attracted a lot of attention from academia, industry and government. In this paper we cover several big data processing techniques from both the system and the application side. From the view of cloud data management and big data processing mechanisms, we concentrate on the key issues of big data processing, including the cloud computing platform, cloud architecture, cloud databases and data storage systems. We then introduce MapReduce optimization strategies and applications reported in the literature. Lastly, we focus on the open issues and challenges, and discuss research opportunities in big data processing in cloud computing environments.

1. Introduction: In the last twenty years, the continuous increase of computational power has produced a vast flow of data. Big data is not only becoming more accessible, but also more intelligible to computers. For example, the social website Facebook serves around six hundred billion page views per month, stores three billion new photos every month, and manages twenty-five billion pieces of content. Google's search and advertising business, as well as Facebook, Flickr, YouTube and LinkedIn, rely on AI techniques that require examining vast quantities of data and making decisions almost instantly. Multimedia data mining platforms make it easy to achieve these goals with a minimum amount of effort in terms of software, CPU and network resources. Big data and cloud computing are both fast-growing technologies: cloud computing is associated with a new standard for providing computing infrastructure and big data processing methods for all types of resources. Moreover, some new cloud-based technologies have to be adopted, because dealing with big data through parallel processing is otherwise difficult. Then what is big data? In a 2008 publication of the journal Science, big data was described as representing the progress of human cognitive processes, and it usually includes data sets with sizes beyond the ability of current technology, methods and theory to capture, manage, and process within an acceptable elapsed time. Big data has also been defined as high-volume, high-velocity and high-variety information resources that require new forms of processing to enable better decision making, insight discovery and process optimization. According to Wikipedia, big data is a collection of data sets so large and complex that they become difficult to process using traditional database management tools. The purpose of this paper is to offer an overview of big data studies and related works, aiming to provide a general view of big data management technologies and applications.

2. Big Data Management System: Many researchers have suggested that commercial Database Management Systems are not suitable for processing extremely large amounts of data. The main failure point of the classic architecture is the database server when faced with peak workloads: a single database server is limited in scalability and cost, which are two important requirements of big data processing. In order to accommodate various large data processing models, D. Kossmann et al. presented four different architectures based on the classic multi-tier database application architecture, namely partitioning, replication, distributed control and caching architectures [1]. It is clear that the alternative providers have different business models and target different kinds of applications: Google seems to be more interested in small applications with light workloads, whereas Azure is currently the most realistic service for medium to large services. Most recent cloud service providers utilize a hybrid architecture that is capable of fulfilling their actual service needs. In this section, we discuss big data architecture from three key aspects: the distributed file system, non-structural and semi-structured data storage, and open-source cloud platforms.

A. Distributed File System: The Google File System (GFS) is a chunk-based distributed file system that supports fault tolerance through data partitioning and replication. As a fundamental storage layer of Google's cloud computing platform, it is used to read input and store the output of MapReduce.

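The chunk-based design used by GFS (and by HDFS after it) can be sketched as follows: a file is split into fixed-size chunks, and each chunk is replicated on several distinct servers, so that losing one server loses no data. The sketch below is a minimal illustration in Python, not GFS code; the 64 MB chunk size and 3-way replication are the defaults reported for GFS, while the server names and the hash-based placement policy are invented for this example.

```python
import hashlib

CHUNK_SIZE = 64 * 1024 * 1024  # GFS's reported default chunk size (64 MB)
REPLICAS = 3                   # default replication factor

def place_chunks(file_size, servers, replicas=REPLICAS):
    """Split a file of `file_size` bytes into fixed-size chunks and
    assign each chunk to `replicas` distinct chunkservers."""
    assert len(servers) >= replicas, "need at least `replicas` servers"
    n_chunks = max(1, -(-file_size // CHUNK_SIZE))  # ceiling division
    placement = {}
    for chunk_id in range(n_chunks):
        # Deterministic pseudo-random start offset per chunk; taking
        # `replicas` consecutive servers guarantees distinct replicas.
        h = int(hashlib.md5(str(chunk_id).encode()).hexdigest(), 16)
        start = h % len(servers)
        placement[chunk_id] = [servers[(start + i) % len(servers)]
                               for i in range(replicas)]
    return placement

# Example: a 200 MB file on five chunkservers -> 4 chunks, 3 copies each.
servers = ["cs1", "cs2", "cs3", "cs4", "cs5"]
placement = place_chunks(200 * 1024 * 1024, servers)
```

A real chunkserver placement policy also accounts for rack topology and disk utilization; the hash here only stands in for "spread chunks across the cluster".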

Similarly, Hadoop has a distributed file system as its data storage layer, called the Hadoop Distributed File System (HDFS) [2], which is an open-source counterpart of GFS. GFS and HDFS are user-level file systems that do not implement POSIX (Portable Operating System Interface) semantics, and both are heavily optimized for the case of large files (measured in gigabytes). Amazon Simple Storage Service (S3) is an online public storage web service offered by Amazon Web Services; this file system targets clusters hosted on the Amazon Elastic Compute Cloud server-on-demand infrastructure, and it aims to provide scalability, high availability, and low latency at commodity cost. ES2 is the elastic storage system of epiC, which is designed to support both functionalities within the same storage system; it provides efficient data loading from different sources, a flexible data partitioning scheme, indexing, and parallel sequential scan. In addition, there are general-purpose file systems not covered here, such as the Moose File System (MFS) and the Kosmos Distributed Filesystem (KFS).

B. Non-structural and Semi-structured Data Storage: With the success of the web, more and more IT companies need to store and analyze ever-growing web data, such as search logs, crawled web content, and click streams, usually in the range of petabytes (1000 terabytes), collected from different web services. However, web data sets are usually non-relational or less structured, and processing such semi-structured data sets at scale poses another challenge. Moreover, the simple distributed file systems mentioned above cannot satisfy service providers like Google, Yahoo!, Microsoft and Amazon; all of these providers persevere in serving potential users and have built their own state-of-the-art big data management systems for cloud environments. Bigtable [3] is a distributed storage system of Google for managing structured data, designed to scale to a very large size, say petabytes of data, across thousands of commodity servers. Bigtable does not support a full relational data model; however, it provides clients with a simple data model that supports dynamic control over data layout and format. PNUTS [4] is a massive-scale hosted database system designed to support Yahoo!'s web applications. The main focus of the system is on data serving for web applications, rather than complex queries; using PNUTS, new applications can be built very easily, and the overhead of creating and maintaining these applications is minimal. Dynamo [5] is a highly available and scalable distributed key/value data store built for supporting Amazon's internal applications, and it provides a simple primary-key-only interface to meet the requirements of these applications. Differing from these key-value storage systems, Facebook proposed the design of a new cluster-based data warehouse system, Llama [6], a hybrid data management system which combines the features of row-wise and column-wise database systems. They also describe a new column-wise file format for Hadoop, called CFile, which provides better performance than other file formats in data analysis.

C. Open Source Cloud Platform: The main idea behind the data center is to leverage virtualization technology to maximize the utilization of computing resources; it therefore provides the basic ingredients, such as storage, CPUs, and network bandwidth, as commodities offered by specialized service providers at low unit cost. To reach the goals of big data management, most research institutions and enterprises bring virtualization into their cloud architectures. Amazon Web Services (AWS), Eucalyptus, OpenNebula, CloudStack and OpenStack are the most popular cloud management platforms for infrastructure as a service (IaaS). AWS is not free, but it is widely used as an elastic platform; it is very easy to use and billed pay-as-you-go. Eucalyptus [7] works in IaaS as an open-source platform and uses virtual machines to control and manage resources. Since Eucalyptus is the earliest cloud management platform for IaaS, it signed an API-compatibility agreement with AWS, and it holds a leading position in the private cloud market within the AWS ecosystem. OpenNebula [8] integrates with various environments; it can offer the richest features, flexible ways of working, and better interoperability to build private, public or hybrid clouds. OpenNebula is not a Service-Oriented Architecture (SOA) design and has weak decoupling of its independent computing, storage and network components. CloudStack is an open-source cloud computing platform which delivers public cloud computing similar to Amazon EC2, but using the users' own hardware; CloudStack users can take full advantage of cloud computing to deliver higher efficiency, limitless scale and faster deployment of new services and systems to the end user. At present, CloudStack is one of the Apache open-source projects. OpenStack is a collection of open-source software projects aiming to build an open-source community of researchers, developers and enterprises. People in this community share a common goal: to create a cloud that is simple to deploy, massively scalable and full of rich features. At present, OpenStack has a good community and ecosystem; however, it still has some shortcomings, such as incomplete functionality and a lack of commercial support.
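To close this section, the primary-key-only interface surveyed in part B above (the style of access a store like Dynamo exposes) can be made concrete with a toy sketch: the client sees only put/get on a key, while the store internally hashes the key onto one of several partitions, the way a distributed store routes a key to a node. This is an illustrative sketch, not Dynamo's implementation; the partition count and the hashing scheme are invented for the example.

```python
import hashlib

class PrimaryKeyStore:
    """Toy key-value store with a Dynamo-like primary-key-only interface.

    Keys are hashed onto a fixed number of partitions; no range scans or
    secondary indexes are offered, only put/get by primary key."""

    def __init__(self, n_partitions=4):
        self.partitions = [dict() for _ in range(n_partitions)]

    def _partition(self, key):
        # Hash the key to pick a partition (stands in for node routing).
        digest = hashlib.sha1(key.encode()).hexdigest()
        return int(digest, 16) % len(self.partitions)

    def put(self, key, value):
        self.partitions[self._partition(key)][key] = value

    def get(self, key, default=None):
        return self.partitions[self._partition(key)].get(key, default)

store = PrimaryKeyStore()
store.put("cart:alice", ["book", "pen"])
store.put("cart:bob", ["lamp"])
```

The deliberately narrow interface is the point: by refusing complex queries, such stores can partition and replicate freely across thousands of machines.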



3. Applications and Optimization:

A. Application: In this age of data explosion, parallel processing is essential for handling a huge volume of data in a timely manner, and the use of parallelization techniques and algorithms is the key to achieving better scalability and performance for big data processing. At present there are many popular parallel processing models, including MPI, General-Purpose GPU (GPGPU), MapReduce, and MapReduce-like models. MapReduce, proposed by Google, is a very popular big data processing model that has rapidly been studied and applied by both industry and academia. MapReduce has two major advantages: the model hides details related to data storage, distribution, replication, load balancing and so on; furthermore, it is so simple that programmers specify only two functions, a map function and a reduce function, to process big data. Existing MapReduce applications can be divided into three categories: partitioning sub-space, decomposing sub-processes, and approximate overlapping calculations. While MapReduce is referred to as a new approach to processing big data in cloud computing environments, it has also been criticized as "a major step backwards" compared with Database Management Systems (DBMS) [9]. MapReduce is schema-free and index-free; thus, the MapReduce framework must parse each record when reading input. As the argument continued, the eventual conclusion was that neither is good at what the other does well, and the two technologies are complementary [10]. HadoopDB is a hybrid system that efficiently combines the scalability of MapReduce with the performance of a DBMS; its results show that HadoopDB improves the task processing times of Hadoop by a large factor, matching shared-nothing DBMSs. Lately, J. Dittrich et al. proposed a new type of system named Hadoop++ [11] and pointed out that HadoopDB also has severe drawbacks, including forcing users to use a DBMS and changing the interface to SQL. There are also papers adapting the inverted index, a simple but practical index structure appropriate for MapReduce processing of big data, such as [12]. MapReduce has received a lot of attention in many fields, including data mining, information retrieval, image retrieval, machine learning, and pattern recognition. For example, Mahout is an Apache project that aims at building scalable machine learning libraries, all implemented on Hadoop. However, as the amount of data to be processed grows, many data processing methods have become unsuitable or limited, and many recent research efforts have exploited the MapReduce framework to solve challenging data processing problems on large-scale datasets in different domains. MapDupReducer [13] is a MapReduce-based system capable of detecting near-duplicates over massive datasets efficiently. In addition, C. Ranger et al. [14] implemented the MapReduce framework on multiple processors in a single machine and obtained good performance.

B. Optimization: In this section, let us look at approaches for improving the performance of big data processing with MapReduce.

1) Data Transfer Difficulties: It is a big challenge for cloud users to minimize the cost of data transmission; consequently, researchers have proposed a variety of approaches. Map-Reduce-Merge [15] is a new model that adds a Merge phase after the Reduce phase, combining the reduced outputs of two different MapReduce jobs into one; it can efficiently merge data that is already partitioned and sorted (or hashed) by the map and reduce modules. Map-Join-Reduce [16] is a system that extends and improves the MapReduce runtime framework by adding a Join stage before the Reduce stage to perform complex data analysis tasks on large clusters. Its authors present a new data processing strategy which runs filtering-join-aggregation tasks as two consecutive MR jobs, adopting a one-to-many shuffling scheme to avoid frequent checkpointing and shuffling of intermediate results. Moreover, different jobs often perform similar work, and sharing that work reduces the overall amount of data transferred between jobs: MRShare [17] is a sharing framework proposed by T. Nykiel et al. that transforms a batch of queries into a new batch that can be executed more efficiently, by merging jobs into groups and evaluating each group as a single query. Data skew is also an important factor affecting data transfer cost. To overcome this deficiency, a method has been proposed [18] that divides a MapReduce job into two phases: a sampling MapReduce job and an expected MapReduce job. The first phase samples the input data, gathers the inherent distribution of key frequencies, and produces a good partition scheme in advance; in the second phase, the expected MapReduce job applies this partition scheme in every mapper to group the intermediate keys quickly.

2) Iterative Optimization: MapReduce is also a prevalent platform in which the dataflow takes the form of a directed acyclic graph of operators; however, it requires a lot of I/O and unnecessary computation when solving iterative problems. Twister [19], proposed by J. Ekanayake et al., is an enhanced MapReduce runtime that supports iterative MapReduce computations efficiently by adding an extra Combine stage after the Reduce stage, so that the output of the Combine stage flows into the next iteration's Map stage. It thereby avoids instantiating workers repeatedly across iterations; previously instantiated workers are reused for the next iteration with different inputs.
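The loop-invariant caching idea behind runtimes such as Twister and HaLoop can be illustrated with a toy driver: the static input is loaded once and reused across iterations, and only the small iteration-varying state flows from one round of map/reduce into the next, instead of re-reading everything from storage each time. The sketch below is an illustrative simulation in Python, not the Twister or HaLoop runtime; the function names are invented, and a simple 1-D 2-means clustering stands in for a real iterative analysis.

```python
from collections import defaultdict

def run_iterative_mapreduce(records, state, map_fn, reduce_fn, iterations):
    """Toy iterative MapReduce driver in the spirit of Twister/HaLoop:
    `records` is the loop-invariant input, kept cached across iterations;
    only the small varying `state` is fed back into the next round."""
    for _ in range(iterations):
        # Map phase: emit (key, value) pairs; records are reused, not reloaded.
        groups = defaultdict(list)
        for record in records:
            key, value = map_fn(record, state)
            groups[key].append(value)
        # Reduce phase: one reducer call per key; its output is the next state.
        state = [reduce_fn(key, values) for key, values in sorted(groups.items())]
    return state

# Example workload: 1-D 2-means clustering, a classic iterative MR job.
def assign(point, centers):
    # Key = index of the nearest center; value = the point itself.
    return min(range(len(centers)), key=lambda i: abs(point - centers[i])), point

def recenter(_key, points):
    return sum(points) / len(points)

data = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
final = run_iterative_mapreduce(data, [0.0, 5.0], assign, recenter, iterations=5)
# final -> [1.5, 10.5]: the two cluster means.
```

In a plain MapReduce chain, each of those five iterations would be a fresh job that re-reads `data` from the file system; caching the invariant input is exactly the I/O that Twister-style runtimes save.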


HaLoop [20] is similar to Twister: it is a revised version of the MapReduce framework that supports iterative applications by adding loop control, and it also allows caching of both stages' inputs and outputs to save I/O across iterations. Many iterations also arise in graph data processing. Pregel [21] implements a programming model driven by the Bulk Synchronous Parallel (BSP) model, in which each node has its own input and transfers to other nodes only those messages required for the next iteration.

3) Online: Some jobs need to be processed online, which the original MapReduce cannot do very well. MapReduce Online [22] is designed to support online aggregation and continuous queries in MapReduce. Its authors observe that frequent checkpointing and shuffling of intermediate results limit pipelined processing; they therefore modify the MapReduce framework so that mappers periodically push data held in local storage to the reducers within the same MR job, and they use map-side pre-aggregation to reduce communication. The Hadoop Online Prototype (HOP) [23], proposed by Tyson Condie, is similar to MapReduce Online: HOP is a modified version of the MapReduce framework that allows users to get early returns from a job as it is being computed. It also supports continuous queries, which enable MapReduce programs to be written for applications such as event monitoring and stream processing, while retaining the fault-tolerance properties of Hadoop. D. Jiang et al. [24] found that the merge sort in MapReduce costs a lot of I/O and seriously affects MapReduce's performance. In their study, results are hashed and pushed to hash tables held by the reducers as soon as each map task outputs its intermediate results; the reducers then perform aggregation on the values in each bucket. Since each bucket in the hash table holds all the values corresponding to a distinct key, no grouping is required; in addition, the reducers can perform aggregation on the fly, even before all mappers have completed.

4) Join Query Optimization: The join query is a common problem in the big data area; however, a join needs two or more inputs, while MapReduce is devised for processing a single input. R. Vernica et al. [25] proposed a 3-stage approach for end-to-end set-similarity joins; they efficiently partition the data across nodes in order to balance the workload and minimize the need for replication. Wei Lu et al. investigate how to perform the kNN join using MapReduce [26]: mappers cluster objects into groups, and reducers then perform the kNN join on each group of objects separately. To reduce shuffling and computational costs, they design an effective mapping mechanism that exploits pruning rules for distance filtering; in addition, two approximate algorithms minimize the number of replicas to reduce the shuffling cost.

4. Challenge: The top seven big data drivers are science data, Internet data, finance data, mobile device data, sensor data, RFID data and streaming data. Together with recent advances in machine learning and reasoning, as well as rapid growth in computing power and storage, these are transforming our ability to make sense of increasingly large, diverse, noisy and incomplete datasets collected from a variety of sources. We consider three important aspects of the problems faced in processing big data, and present our points of view in detail as follows.

Big Data Storage and Management: We need to design hierarchical storage architectures. Besides, previous computer algorithms are not able to store effectively the data that is directly acquired from the real world, because of the heterogeneity of big data, even though they perform excellently on homogeneous data. Therefore, how to reorganize data is one big problem in big data management. Virtual server technology can intensify the problem, raising the prospect of overcommitted resources, especially if communication is poor between the application, server and storage administrators. We also need to solve the bottleneck problems of highly concurrent I/O and of the single named node in the present master-slave system model.

Big Data Computation and Analysis: When processing a query over big data, speed is a significant demand [27]; however, the process may take time, because it mostly cannot traverse all the related data in the whole database in a short time. In this case, an index is an optimal choice. Application parallelization and divide-and-conquer are natural computational paradigms for approaching big data problems, but obtaining additional computational resources is not as simple as upgrading to a bigger and more powerful machine on the fly. The traditional serial algorithm is inefficient for big data; if there is enough data parallelism in the application, users can take advantage of the cloud's reduced cost model to use hundreds of computers for a short time.

Big Data Security: By using online big data applications, many companies can greatly reduce their IT costs. However, security and privacy concerns affect the entire big data storage and processing pipeline, since there is massive use of third-party services and infrastructures to host important data or to perform critical operations.


Besides, current technologies for privacy protection are mainly based on static data sets, while real data always changes dynamically, including changes in data patterns, variation of attributes, and the addition of new data. Thus, it is a challenge to implement effective privacy protection in this complex circumstance. In addition, legal and regulatory issues also need attention.

5. Conclusion: In this paper we have discussed a systematic flow of study on big data processing in the cloud computing environment. The key issues, including cloud storage and computing architecture, parallel processing frameworks, and the major applications and optimizations of MapReduce, were targeted. Big data is not a new concept, but due to ever-increasing Internet data it becomes very challenging; there is a need for accessible storage indexes and a distributed approach to retrieve the required results in near real time. Big data will be multifaceted and will exist continuously, which is a big opportunity for us. In the future, significant challenges need to be tackled by industry and academia, and there is a crucial need for computer scientists and social science scholars to cooperate closely, in order to ensure the long-term success of cloud computing and to collectively explore new territory.

References:

[1] D. Kossmann, T. Kraska, and S. Loesing, "An evaluation of alternative architectures for transaction processing in the cloud," in Proceedings of the 2010 international conference on Management of data. ACM, 2010, pp. 579–590.
[2] D. Borthakur, "The hadoop distributed file system: Architecture and design," Hadoop Project Website, vol. 11, 2007.
[3] F. Chang, J. Dean, S. Ghemawat, W. Hsieh, D. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. Gruber, "Bigtable: A distributed structured data storage system," in 7th OSDI, 2006, pp. 305–314.
[4] B. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H. Jacobsen, N. Puz, D. Weaver, and R. Yerneni, "Pnuts: Yahoo!'s hosted data serving platform," Proceedings of the VLDB Endowment, vol. 1, no. 2, pp. 1277–1288, 2008.
[5] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, "Dynamo: amazon's highly available key-value store," in ACM SIGOPS Operating Systems Review, vol. 41, no. 6. ACM, 2007, pp. 205–220.
[6] Y. Lin, D. Agrawal, C. Chen, B. Ooi, and S. Wu, "Llama: leveraging columnar storage for scalable join processing in the mapreduce framework," in Proceedings of the 2011 international conference on Management of data. ACM, 2011, pp. 961–972.
[7] D. Nurmi, R. Wolski, C. Grzegorczyk, G. Obertelli, S. Soman, L. Youseff, and D. Zagorodnov, "The eucalyptus open-source cloud-computing system," in Cluster Computing and the Grid, 2009. CCGRID'09. 9th IEEE/ACM International Symposium on. IEEE, 2009, pp. 124–131.
[8] P. Sempolinski and D. Thain, "A comparison and critique of eucalyptus, opennebula and nimbus," in Cloud Computing Technology and Science (CloudCom), 2010 IEEE Second International Conference on. IEEE, 2010, pp. 417–426.
[9] D. DeWitt and M. Stonebraker, "Mapreduce: A major step backwards," The Database Column, vol. 1, 2008.
[10] M. Stonebraker, D. Abadi, D. DeWitt, S. Madden, E. Paulson, A. Pavlo, and A. Rasin, "Mapreduce and parallel dbmss: friends or foes," Communications of the ACM, vol. 53, no. 1, pp. 64–71, 2010.
[11] J. Dittrich, J. Quiané-Ruiz, A. Jindal, Y. Kargin, V. Setty, and J. Schad, "Hadoop++: Making a yellow elephant run like a cheetah (without it even noticing)," Proceedings of the VLDB Endowment, vol. 3, no. 1-2, pp. 515–529, 2010.
[12] D. Logothetis and K. Yocum, "Ad-hoc data processing in the cloud," Proceedings of the VLDB Endowment, vol. 1, no. 2, pp. 1472–1475, 2008.
[13] C. Wang, J. Wang, X. Lin, W. Wang, H. Wang, H. Li, W. Tian, J. Xu, and R. Li, "Mapdupreducer: detecting near duplicates over massive datasets," in Proceedings of the 2010 international conference on Management of data. ACM, 2010, pp. 1119–1122.
[14] C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, and C. Kozyrakis, "Evaluating mapreduce for multi-core and multiprocessor systems," in High Performance Computer Architecture, 2007. HPCA 2007. IEEE 13th International Symposium on. IEEE, 2007, pp. 13–24.
[15] H. Yang, A. Dasdan, R. Hsiao, and D. Parker, "Map-reduce-merge: simplified relational data processing on large clusters," in Proceedings of the 2007 ACM SIGMOD international conference on Management of data. ACM, 2007, pp. 1029–1040.
[16] D. Jiang, A. Tung, and G. Chen, "Map-Join-Reduce: Toward scalable and efficient data analysis on large clusters," Knowledge and Data Engineering, IEEE Transactions on, vol. 23, no. 9, pp. 1299–1311, 2011.
[17] T. Nykiel, M. Potamias, C. Mishra, G. Kollios, and N. Koudas, "Mrshare: Sharing across multiple queries in mapreduce," Proceedings of the VLDB Endowment, vol. 3, no. 1-2, pp. 494–505, 2010.
[18] Y. Xu, P. Zou, W. Qu, Z. Li, K. Li, and X. Cui, "Sampling-based partitioning in mapreduce for skewed data," in ChinaGrid, 2012 Seventh ChinaGrid Annual Conference on. IEEE, 2012, pp. 1–8.
[19] J. Ekanayake, H. Li, B. Zhang, T. Gunarathne, S. Bae, J. Qiu, and G. Fox, "Twister: a runtime for


iterative mapreduce," in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing. ACM, 2010, pp. 810–818.
[20] Y. Bu, B. Howe, M. Balazinska, and M. Ernst, "Haloop: Efficient iterative data processing on large clusters," Proceedings of the VLDB Endowment, vol. 3, no. 1-2, pp. 285–296, 2010.
[21] G. Malewicz, M. Austern, A. Bik, J. Dehnert, I. Horn, N. Leiser, and G. Czajkowski, "Pregel: a system for large-scale graph processing," in Proceedings of the 2010 international conference on Management of data. ACM, 2010, pp. 135–146.
[22] T. Condie, N. Conway, P. Alvaro, J. Hellerstein, K. Elmeleegy, and R. Sears, "Mapreduce online," in Proceedings of the 7th USENIX conference on Networked systems design and implementation, 2010, pp. 21–21.
[23] T. Condie, N. Conway, P. Alvaro, J. Hellerstein, J. Gerth, J. Talbot, K. Elmeleegy, and R. Sears, "Online aggregation and continuous query support in mapreduce," in ACM SIGMOD, 2010, pp. 1115–1118.
[24] D. Jiang, B. Ooi, L. Shi, and S. Wu, "The performance of mapreduce: An in-depth study," Proceedings of the VLDB Endowment, vol. 3, no. 1-2, pp. 472–483, 2010.
[25] R. Vernica, M. Carey, and C. Li, "Efficient parallel set-similarity joins using mapreduce," in SIGMOD Conference, 2010, pp. 495–506.
[26] C. Zhang, F. Li, and J. Jestes, "Efficient parallel knn joins for large data in mapreduce," in Proceedings of the 15th International Conference on Extending Database Technology. ACM, 2012, pp. 38–49.
[27] X. Zhou, J. Lu, C. Li, and X. Du, "Big data challenge in the management perspective," Communications of the CCF, vol. 8, pp. 16–20, 2012.
[28] C. Ji, Y. Li, W. Qiu, U. Awada, and K. Li, "Big data processing in cloud computing environments," in 2012 International Symposium on Pervasive Systems, Algorithms and Networks.
