A Survey of Big Data Machine Learning Applications Optimization in Cloud Data Centers and Networks

Sanaa Hamid Mohamed, Student Member, IEEE, Taisir E.H. El-Gorashi, Member, IEEE, and Jaafar M.H. Elmirghani, Senior Member, IEEE

Abstract: This survey article reviews the challenges associated with deploying and optimizing big data applications and machine learning algorithms in cloud data centers and networks. The MapReduce programming model and its widely used open-source platform, Hadoop, are enabling the development of a large number of cloud-based services and big data applications. MapReduce and Hadoop thus introduce innovative, efficient, and accelerated intensive computations and analytics. These services usually utilize commodity clusters within geographically distributed data centers and provide cost-effective and elastic solutions. However, the increasing traffic between and within the data centers that migrate, store, and process big data is becoming a bottleneck that calls for enhanced infrastructures capable of reducing congestion and power consumption. Moreover, enterprises with multiple tenants requesting various big data services are challenged by the need to optimize the leasing of their resources at reduced running costs and power consumption while avoiding under- or over-utilization. In this survey, we present a summary of the characteristics of various big data programming models and applications and provide a review of cloud computing infrastructures and related technologies, such as virtualization and software-defined networking, that increasingly support big data systems. Moreover, we provide a brief review of data center topologies, routing protocols, and traffic characteristics, and emphasize the implications of big data on such cloud data centers and their supporting networks.
Wide-ranging efforts have been devoted to optimizing systems that handle big data in terms of various application performance metrics and/or infrastructure energy efficiency. This survey summarizes some of these studies, which are classified according to their focus into application-level, networking-level, or data center-level optimizations. Finally, some insights and future research directions are provided.

Index Terms: Big Data, MapReduce, Machine Learning, Data Streaming, Cloud Computing, Cloud Networking, Software-Defined Networking (SDN), Virtual Machines (VM), Network Function Virtualization (NFV), Containers, Data Centers Networking (DCN), Energy Efficiency, Completion Time, Scheduling, Routing.

I. INTRODUCTION

The evolving paradigm of big data is essential for critical advancements in data processing models and the underlying acquisition, transmission, and storage infrastructures [1]. Big data differs from traditional data in being potentially unstructured, rapidly generated, continuously changing, and massively produced by a large number of distributed users or devices. Typically, big data workloads are transferred into powerful data centers containing sufficient storage and processing units for real-time or batch computations and analysis.

A widely used characterization of big data is the "5V" notion, which describes big data through its unique attributes of Volume, Velocity, Variety, Veracity, and Value [2]. In this characterization, the volume refers to the vast amount of data produced, which is usually measured in Exabytes (i.e., 2^60 or 10^18 bytes) or Zettabytes (i.e., 2^70 or 10^21 bytes), while the velocity reflects the high speed or rate of data generation and hence potentially the short-lived useful lifetime of data. Variety indicates that big data can be composed of different types of data, which can be categorized into structured and unstructured.
An example of structured data is bank transactions, which can fit into relational database systems, and an example of unstructured data is social media content, which could be a mix of text, photos, animated Graphics Interchange Format (GIF) images, audio files, and videos contained in the same element (e.g., a tweet or a post). The veracity measures the trustworthiness of the data, as some generated portions could be erroneous or inaccurate, while the value measures the ability of the user or owner of the data to extract useful information from it.

In 2020, the global data volume is predicted to be around 40,000 Exabytes, which represents a 300-fold growth compared to the global data volume in 2005 [3]. The global data volume is estimated at about 640 Exabytes in 2010 [4] and about 2,700 Exabytes in 2015 [5]. This huge growth in data volumes is the result of continuous developments in various applications that generate massive and rich content related to a wide range of human activities. For example, online business transactions are expected to reach a rate of 450 Billion transactions per day by 2020 [4]. Social media platforms such as Facebook, LinkedIn, and Twitter, which have between 300 Million and 2 Billion subscribers accessing them through web browsers on personal computers (PCs) or through applications installed on tablets and smart phones, are enriching the Internet with content in the range of several Terabytes (2^40 bytes) per day [5]. Analyzing the thematic connections between the subscribers, for example by grouping people with similar interests, is opening remarkable opportunities for targeted marketing and e-commerce. Moreover, the subscribers' behaviours and preferences, tracked through their activities, clickstreams, requests, and collected web log files, can be analyzed with big data mining tools for profound psychological, economical, business-oriented, and product improvement studies [6], [7].
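The volume figures quoted above can be sanity-checked with a few lines of arithmetic. The following Python sketch uses only the numbers cited in the text ([3]-[5]); the helper name `annual_growth` and the choice to express growth as a compound annual rate are illustrative assumptions and do not come from the cited sources.

```python
# Illustrative arithmetic on the data-volume figures cited in the text.
# All volumes are in Exabytes (EB); annual_growth is a hypothetical helper.

EXABYTE_DECIMAL = 10**18   # decimal definition quoted in the text
EXABYTE_BINARY = 2**60     # binary definition quoted in the text

# Global data volume estimates cited in the text: year -> EB
volumes_eb = {2010: 640, 2015: 2_700, 2020: 40_000}

# A 300-fold growth from 2005 to 2020 implies this 2005 volume.
implied_2005_eb = volumes_eb[2020] / 300


def annual_growth(start_eb: float, end_eb: float, years: int) -> float:
    """Compound annual growth rate between two volume estimates."""
    return (end_eb / start_eb) ** (1 / years) - 1


if __name__ == "__main__":
    ratio = EXABYTE_BINARY / EXABYTE_DECIMAL
    print(f"2^60 bytes is {ratio:.3f} times 10^18 bytes")
    print(f"Implied 2005 volume: {implied_2005_eb:.0f} EB")
    print(f"2010-2015 growth per year: {annual_growth(640, 2_700, 5):.1%}")
    print(f"2015-2020 growth per year: {annual_growth(2_700, 40_000, 5):.1%}")
```

The binary and decimal definitions of an Exabyte differ by about 15%, which is worth keeping in mind when comparing estimates drawn from different sources.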
To accelerate the delay-sensitive operations of web searching and indexing, distributed programming models for big data such as MapReduce were developed [8]. MapReduce is a powerful, reliable, and cost-effective programming model that performs parallel processing for large distributed datasets. These features have enabled the development of different distributed programming big data solutions and cloud computing applications.

Fig. 1. Big data communication, networking, and processing infrastructure, and examples of big data applications.

A wide range of applications are considered big data applications, from data-intensive scientific applications that require extensive computations to massive datasets that require manipulation, such as those in earth sciences, astronomy, nanotechnology, genomics, and bioinformatics [9]. Typically, the computations, simulations, and modelling in such applications are carried out in High Performance Computing (HPC) clusters with the aid of distributed and grid computing. However, the growth of datasets beyond the capacities of these systems, in addition to the desire to share datasets for scientific research collaborations in some disciplines, is encouraging the utilization of big data applications in cloud computing infrastructures with commodity devices for scientific computations, despite the resultant performance and cost tradeoffs [10].

With the prevalence of mobile applications and services that have extensive computational and storage demands exceeding the capabilities of current smart phones, emerging technologies such as Mobile Cloud Computing (MCC) were developed [11]. In MCC, the computational and storage demands of applications are outsourced over the Internet to powerful servers that are remote (or nearby, as in mobile edge computing (MEC)). As a result, on-demand rich services such as video streaming, interactive video, and online gaming can be effectively delivered to capacity- and battery-limited devices.
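To make the MapReduce model introduced earlier in this section more concrete, the following is a minimal single-process sketch of its map, shuffle, and reduce stages using the classic word-count task. It is not the Hadoop API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and chosen only for illustration, and a real deployment would distribute the map and reduce calls across cluster nodes.

```python
# Single-process sketch of the MapReduce pattern (word count).
# This is NOT the Hadoop API; all function names are hypothetical.
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(record: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values that share a key into one result."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> Dict[str, int]:
    # In a real cluster, each map_phase call could run on a different node.
    intermediate = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data in the cloud", "big data networks"]))
```

Because each map call depends only on its own record and each reduce call only on one key's values, both stages can be parallelized; the shuffle stage in between is where intermediate data moves between nodes, which is one reason network traffic matters for such workloads.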
Video content accounted for 51% of the total mobile data traffic in 2012 [11], and is predicted to account for 78% of an expected total volume of 49 Exabytes by 2021 [12]. Due to these huge demands, in addition to the large sizes of video files, big video data platforms are facing several challenges related to video streaming, storage, and replication management, while needing to meet strict quality-of-experience (QoE) requirements [13].

In addition to mobile devices, the wide range of everyday physical objects that are increasingly interconnected for automated operations has formed what is known as the Internet-of-Things (IoT). In IoT systems, the underlying communication and networking infrastructure is typically integrated with big data computing systems for data collection, analysis, and decision-making. Several technologies such as RFID, low-power communication technologies, Machine-to-Machine (M2M) communications, and wireless sensor networking (WSN) have been suggested for improved IoT communications and networking infrastructure [14]. To process the big data generated by IoT devices, different solutions such as cloud and fog computing were proposed [15]-[31]. Existing cloud computing infrastructures could be utilized by aggregating and processing big data in powerful central data centers. Alternatively, data could be processed at the edge, where fog computing units, typically with limited processing capacities compared to the cloud, are utilized [32]. By being closer to end devices, edge computing reduces both the traffic in core networks and the latency. The connected devices could be sensors gathering different real-time measurements, or actuators performing automated control operations in industrial, agricultural, or smart building applications. IoT can support vehicle communication to realize smart transportation systems. IoT can also support medical applications such as wearables and telecare