Scalable and Efficient Data Management in Distributed Clouds: Service Provisioning and Data Processing

Total Pages: 16

File Type: PDF, Size: 1020 KB

To cite this version: Jad Darrous. Scalable and Efficient Data Management in Distributed Clouds: Service Provisioning and Data Processing. Distributed, Parallel, and Cluster Computing [cs.DC]. Université de Lyon, 2019. English. NNT: 2019LYSEN077. tel-02508592.

HAL Id: tel-02508592, https://tel.archives-ouvertes.fr/tel-02508592, submitted on 15 Mar 2020. HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

National Thesis Number (NNT): 2019LYSEN077
Doctoral thesis of the Université de Lyon, prepared at the École Normale Supérieure de Lyon
Doctoral School No. 512: École Doctorale en Informatique et Mathématiques de Lyon
Discipline: Computer Science
Publicly defended on 17/12/2019 by Jad Darrous

Scalable and Efficient Data Management in Distributed Clouds: Service Provisioning and Data Processing
(French title: Gestion de données efficace et à grande échelle dans les clouds distribués : Déploiement de services et traitement de données)

Jury:
Pierre Sens, Professor, Sorbonne Université (Reviewer)
Maria S. Perez, Professor, Universidad Politécnica de Madrid (Reviewer)
Guillaume Pierre, Professor, Université de Rennes 1 (Examiner)
Patricia Pascal-Stolf, Associate Professor, IUT de Blagnac (Examiner)
Christian Perez, Research Director, Inria Grenoble Rhône-Alpes (Thesis director)
Shadi Ibrahim, Research Scientist, Inria Rennes Bretagne-Atlantique (Thesis co-supervisor)

Acknowledgements

Reaching the point of writing this acknowledgement is the result of the help, encouragement, and support of many people around me whom I would like to thank.

I would first like to express my gratitude to the members of the jury, Guillaume Pierre and Patricia Pascal-Stolf, as well as my reviewers, Pierre Sens and Maria S. Perez, for dedicating part of their valuable time to assess my work.

I would like to express my sincere gratitude to Shadi Ibrahim. Shadi was more than an advisor; his efforts were invaluable to me during these three years, both scientifically and personally. I am more than happy to have had the chance to work closely with him, which gave me the opportunity to grow as a researcher. I would also like to dedicate a big thank-you to Christian Perez. Christian happily accepted to be my thesis director, provided me with ceaseless support, and taught me how to ask "metaphysical" questions without forgetting the smallest detail. I would also like to thank Gilles Fedak, who advised me during the first year and was my first contact at the start of my Ph.D. journey.

During my Ph.D., I had the chance to work and interact with many amazing and brilliant people whom I would like to thank. Special thanks go to Amelie and Thomas, my co-authors on my first two papers, for the nice collaboration. Thanks to the members of the DISCOVERY initiative for the interesting discussions and presentations.
Many thanks go to the members of the Stack and Avalon teams for the nice moments we spent together. Likewise, I would like to thank the friends with whom I have shared happy and pleasant moments (in my remaining time!), especially Maha, Lisa, and Bachar. Last but not least, I would like to thank my beloved family for their continuous encouragement and support during this very special chapter of my academic life. In particular, I would like to thank my parents, who have always believed in me.

Abstract

We are living in the era of Big Data, where data is generated at an unprecedented pace from various sources all over the world. These data can be transformed into meaningful information that has a direct impact on our daily lives. To cope with the tremendous volumes of Big Data, large-scale infrastructures and scalable data management techniques are needed. Cloud computing has evolved into the de-facto platform for running data-intensive applications. Meanwhile, large-scale storage systems have emerged to store and access data in cloud data centers. They run on top of thousands of machines to offer their aggregated storage capacity, in addition to providing reliable and fast data access. The distributed nature of these infrastructures and storage systems makes them an ideal platform to support Big Data analytics frameworks such as Hadoop or Spark for efficiently running data-intensive applications.

In this thesis, we focus on scalable and efficient data management for building and running data-intensive applications. We first study the management of virtual machine images (VMIs) and container images as the main entry point for efficient service provisioning. Accordingly, we design, implement, and evaluate Nitro, a novel VMI management system that helps to minimize the transfer time of VMIs over a heterogeneous wide area network (WAN). In addition, we present two container image placement algorithms which aim to reduce the maximum retrieval time of container images in Edge infrastructures. Second, towards efficient Big Data processing in the cloud, we investigate erasure coding (EC) as a scalable yet cost-efficient alternative to replication in data-intensive clusters. In particular, we conduct experiments to thoroughly understand the performance of data-intensive applications under replication and EC. We find that data reads under EC are skewed and can result in noticeable performance degradation of data-intensive applications. We then introduce an EC-aware data placement algorithm that balances data accesses across nodes and therefore improves the performance of data-intensive applications under EC.
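To make the balancing objective concrete, the minimal Python sketch below greedily assigns the k data and m parity blocks of each erasure-coded stripe to the currently least-loaded distinct nodes, so that block counts (and hence reads) stay even across the cluster. It is only an illustration of the idea under simplifying assumptions (uniform block sizes, a static node set, made-up node names); it is not the EC-aware algorithm developed in the thesis.

```python
import heapq
from collections import defaultdict

def ec_aware_placement(num_stripes, k, m, nodes):
    """Greedily place the k data + m parity blocks of each erasure-coded
    stripe on the currently least-loaded distinct nodes, so block counts
    (and hence reads) stay balanced across the cluster."""
    assert len(nodes) >= k + m, "each stripe needs k + m distinct nodes"
    load = {n: 0 for n in nodes}              # blocks currently held per node
    placement = defaultdict(list)             # stripe id -> [(node, block idx)]
    for stripe in range(num_stripes):
        chosen = heapq.nsmallest(k + m, nodes, key=lambda n: load[n])
        for block_idx, node in enumerate(chosen):
            placement[stripe].append((node, block_idx))
            load[node] += 1
    return placement, load

if __name__ == "__main__":
    nodes = [f"node{i:02d}" for i in range(12)]        # hypothetical cluster
    plan, load = ec_aware_placement(num_stripes=100, k=6, m=3, nodes=nodes)
    print(load)  # per-node block counts stay within one block of each other
```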
Résumé

If we stored all the words ever spoken by humankind, their volume would equal that of the data generated in only two days (1). The latest report of the International Data Corporation (IDC) even predicts that the amount of data generated will exceed 175 zettabytes (2) by 2025 [130]. These data are produced by various sources such as sensors, social media, and scientific simulations. This flood of data allows companies and universities to improve their services and their knowledge, with a direct impact on our lives.

For example, large Internet companies such as Google and Facebook analyze the data they collect daily to improve the user experience, while research institutes such as Argonne National Laboratory run complex simulations to push the boundaries of human knowledge [255]. Enabling efficient, large-scale data management is therefore essential to turn these gigantic volumes of data into usable information.

To cope with this enormous volume of data (so-called "Big Data"), large-scale infrastructures and data management techniques are needed. Data centers have thus become the platforms of choice for running Big Data applications: they give the illusion of infinite resources that can be accessed at low cost. Recently, cloud providers such as Amazon, Microsoft, and Google have equipped their infrastructures with millions of servers distributed around the world to ease the management of massive data. For example, Amazon Web Services (AWS) counts close to 5 million servers in total [144], hosted in hundreds of data centers spread over 5 continents [26], and millions of services are launched on them every day [35].

At the same time, large-scale storage systems have emerged to store and access data in these data centers. They run on thousands of machines that offer their aggregated storage capacity, in addition to providing reliable and fast access to data. For example, the Windows Azure Storage (WAS) cloud system hosts more than an exabyte of data [123] and processes more than two billion transactions per day [45]. The distributed nature of these infrastructures and storage systems makes them an ideal platform to support Big Data analytics frameworks such as Hadoop [19] or Spark [20] for efficiently running Big Data applications. For example, more than one million Hadoop jobs, processing tens of petabytes of data, are launched every month in Facebook's data centers [49].

In general, cloud services, including data analytics services (e.g., Amazon Elastic MapReduce [14] or Microsoft Azure HDInsight [112]), are deployed as virtual machines (VMs) or containers. Deploying a service to run cloud applications requires that the service image (an image that encapsulates the operating system and the service's software along with its dependencies) be stored locally on the host server. Otherwise, the corresponding image is transferred over the network to the destination server, which adds a delay to the deployment time. The growing demand for fast service provisioning, together with the growing size and number of service images (e.g., AWS provides more than 20,000 different images (3), and their sizes reach tens of gigabytes [35, 258]), makes service image management essential for service deployment in the cloud.

(1) An estimated 2.5 exabytes of data are produced every day [117], and all the words ever spoken by humankind amount to 5 exabytes [116].
(2) 1 zettabyte equals 10^21 bytes, i.e., 1 million petabytes.
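Since a service can start only once its image is available locally, where image replicas are placed directly determines the worst-case provisioning delay. The sketch below is a simple greedy (k-center-style) heuristic that picks replica sites so as to reduce the maximum retrieval time across sites; the site names, bandwidths, and image size are made up for illustration, and this is not one of the placement algorithms proposed in the thesis.

```python
def retrieval_time(site, holders, size_gb, bw):
    """Time (s) for `site` to fetch the image from its fastest replica holder."""
    if site in holders:
        return 0.0
    return min(size_gb / bw[site][h] for h in holders)

def greedy_image_placement(sites, size_gb, bw, replicas):
    """Pick replica sites one by one, each time choosing the site that most
    reduces the maximum retrieval time over all sites (a k-center-style
    heuristic for the 'minimize the worst provisioning delay' objective)."""
    holders = set()
    for _ in range(replicas):
        best_site, best_worst = None, float("inf")
        for cand in sites:
            if cand in holders:
                continue
            trial = holders | {cand}
            worst = max(retrieval_time(s, trial, size_gb, bw) for s in sites)
            if worst < best_worst:
                best_site, best_worst = cand, worst
        holders.add(best_site)
    return holders

if __name__ == "__main__":
    sites = ["edge1", "edge2", "edge3", "edge4"]
    # hypothetical site-to-site bandwidths in GB/s
    bw = {s: {t: 0.1 for t in sites} for s in sites}
    bw["edge1"]["edge2"] = bw["edge2"]["edge1"] = 1.0
    bw["edge3"]["edge4"] = bw["edge4"]["edge3"] = 1.0
    print(greedy_image_placement(sites, size_gb=4.0, bw=bw, replicas=2))
```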
Recommended publications
  • Achieving Superior Manageability, Efficiency, and Data Protection with Oracle's Sun ZFS Storage Software
An Oracle White Paper, December 2010: Achieving Superior Manageability, Efficiency, and Data Protection with Oracle's Sun ZFS Storage Software. Contents: Introduction; Oracle's Sun ZFS Storage Software; Simplifying Storage Deployment and Management (Browser User Interface (BUI), Built-in Networking and Security, Transparent Optimization with Hybrid Storage Pools, Shadow Data Migration, Third-party Confirmation of Management Efficiency); Improving Performance with Real-time Storage Profiling; Increasing Storage Efficiency (Data Compression, Data Deduplication, Thin Provisioning, Space-efficient Snapshots and Clones); Reducing Risk with Industry-leading Data Protection (Self-Healing ...)
  • What if what I need is not in PowerAI (yet)? What you need to know to build from scratch
IBM Systems: What if what I need is not in PowerAI (yet)? What you need to know to build from scratch. Jean-Armand Broyelle, June 2018.

Slide titles: Things to consider when you have to rebuild a framework; CUDA Downloads; CUDA 8 under Legacy Releases; CUDA 8 Install Steps; cuDNN and NVIDIA drivers; cuDNN v6.0 for CUDA 8.0; Prepare your environment.

Prepare your environment: when something goes wrong, it is better to remove the local Anaconda installation and reinstall it:
$ cd ~; rm -rf anaconda2 .conda
$ cd /tmp; wget https://repo.anaconda.com/archive/Anaconda2-5.1.0-Linux-ppc64le.sh
$ bash /tmp/Anaconda2-5.1.0-Linux-ppc64le.sh
Then activate PowerAI:
$ source /opt/DL/tensorflow/bin/tensorflow-activate
• When you ...
  • Red Hat Data Analytics Infrastructure Solution
TECHNOLOGY DETAIL: Red Hat Data Analytics Infrastructure Solution. Table of contents: 1 Introduction; 2 Red Hat Data Analytics Infrastructure Solution (2.1 The evolution of analytics infrastructure, 2.2 Benefits of a shared data repository on Red Hat Ceph Storage, 2.3 Solution components); 3 Testing Environment Overview; 4 Relative Cost and Performance Comparison (4.1 Findings summary, 4.2 Workload details, 4.3 24-hour ingest).

Sidebar callouts: Give data scientists and data analytics teams access to their own clusters without the unnecessary cost and complexity of duplicating Hadoop Distributed File System (HDFS) datasets. Rapidly deploy and decommission analytics clusters on ...
  • GlobalFS: A Strongly Consistent Multi-Site File System
GlobalFS: A Strongly Consistent Multi-Site File System. Leandro Pacheco (University of Lugano), Raluca Halalai (University of Neuchâtel), Valerio Schiavoni (University of Neuchâtel), Fernando Pedone (University of Lugano), Etienne Rivière (University of Neuchâtel), Pascal Felber (University of Neuchâtel).

Abstract: This paper introduces GlobalFS, a POSIX-compliant geographically distributed file system. GlobalFS builds on two fundamental building blocks, an atomic multicast group communication abstraction and multiple instances of a single-site data store. We define four execution modes and show how all file system operations can be implemented with these modes while ensuring strong consistency and tolerating failures. We describe the GlobalFS prototype in detail and report on an extensive performance assessment. We have deployed GlobalFS across all EC2 regions and show that the system scales geographically, providing performance comparable to other state-of-the-art distributed file systems for local commands and allowing for strongly consistent operations over the whole system.

... consistency, availability, and tolerance to partitions. Our goal is to ensure strongly consistent file system operations despite node failures, at the price of possibly reduced availability in the event of a network partition. Weak consistency is suitable for domain-specific applications where programmers can anticipate and provide resolution methods for conflicts, or work with last-writer-wins resolution methods. Our rationale is that for general-purpose services such as a file system, strong consistency is more appropriate as it is both more intuitive for the users and does not require human intervention in case of conflicts. Strong consistency requires ordering commands across replicas, which needs coordination among nodes at geographically distributed sites (i.e., regions). Designing strongly consistent distributed systems that provide good ...
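The strong-consistency argument above rests on applying commands in the same total order at every replica. The toy sketch below (not GlobalFS code) simulates that idea: an assumed atomic multicast is represented by a shared ordered log, and two replicas that apply it end up with identical file maps.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    """Toy state-machine replica: applying the same totally ordered command
    sequence on every replica yields identical states, which is the basis of
    strongly consistent replication."""
    files: dict = field(default_factory=dict)
    applied: int = 0   # sequence number of the last applied command

    def apply(self, seq, cmd, path, data=None):
        assert seq == self.applied + 1, "commands must be applied in total order"
        if cmd == "write":
            self.files[path] = data
        elif cmd == "delete":
            self.files.pop(path, None)
        self.applied = seq

# Commands as they would be delivered by an atomic multicast (assumed here,
# simulated as a plain list): every replica sees the same order.
log = [(1, "write", "/a", b"v1"), (2, "write", "/b", b"v2"), (3, "delete", "/a", None)]
r1, r2 = Replica(), Replica()
for entry in log:
    r1.apply(*entry)
    r2.apply(*entry)
assert r1.files == r2.files == {"/b": b"v2"}
```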
  • ReFlex: Remote Flash at the Performance of Local Flash
ReFlex: Remote Flash at the Performance of Local Flash. Ana Klimovic, Heiner Litz, Christos Kozyrakis. Flash Memory Summit 2017, Santa Clara, CA. Stanford MAST.

Flash in datacenters: NVMe Flash offers about 1,000x higher throughput than disk (1M IOPS) and 100x lower latency than disk (50-70 µs), but Flash is often underutilized due to imbalanced resource requirements. Example datacenter Flash use case: an application tier of clients issues get(k)/put(k, val) requests over TCP/IP to a datastore tier running key-value store software on servers whose hardware comprises NIC, CPU, RAM, and Flash.

Imbalanced resource utilization: sample utilization of Facebook servers hosting a Flash-based key-value store over 6 months shows that Flash capacity and bandwidth are underutilized for long periods of time [EuroSys'16: Flash storage disaggregation. Ana Klimovic, Christos Kozyrakis, Eno Thereska, Binu John, Sanjeev Kumar].

Solution: remote access to storage, improving resource utilization by sharing NVMe Flash between remote tenants. There are three main concerns: 1. performance overhead for remote access; 2. interference on shared Flash; 3. flexibility of Flash usage.

Issue 1: performance overhead. For 4 KB random reads (p95 read latency in µs versus IOPS in thousands, comparing local Flash, iSCSI on one core, and libaio+libevent on one core), remote access shows roughly a 4x throughput drop and a 2x latency increase. Traditional network/storage protocols and Linux I/O libraries (e.g. ...
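As a rough way to reproduce this kind of measurement on a local file or device, the sketch below times 4 KB random reads and reports the 95th-percentile latency. The path is a placeholder, reads may be served from the page cache unless caching is bypassed, and this is not ReFlex's benchmark harness.

```python
import os, random, time

def p95_random_read_latency_us(path, num_reads=2000, block=4096):
    """Measure the p95 latency (microseconds) of 4 KB random reads (POSIX).
    Note: results include page-cache hits unless caching is bypassed or a
    raw device is used; assumes the target is much larger than 4 KB."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
        blocks = size // block
        lats = []
        for _ in range(num_reads):
            off = random.randrange(blocks) * block
            t0 = time.perf_counter()
            os.pread(fd, block, off)
            lats.append((time.perf_counter() - t0) * 1e6)
        lats.sort()
        return lats[int(0.95 * len(lats)) - 1]
    finally:
        os.close(fd)

# Example (hypothetical path): print(p95_random_read_latency_us("/tmp/testfile"))
```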
  • Big Data Storage Workload Characterization, Modeling and Synthetic Generation
Big Data Storage Workload Characterization, Modeling and Synthetic Generation. By Cristina Lucia Abad. Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate College of the University of Illinois at Urbana-Champaign, 2014. Urbana, Illinois. Doctoral committee: Professor Roy H. Campbell (Chair), Professor Klara Nahrstedt, Associate Professor Indranil Gupta, Assistant Professor Yi Lu, Dr. Ludmila Cherkasova (HP Labs).

Abstract: A huge increase in data storage and processing requirements has led to Big Data, for which next-generation storage systems are being designed and implemented. As Big Data stresses the storage layer in new ways, a better understanding of these workloads and the availability of flexible workload generators are increasingly important to facilitate the proper design and performance tuning of storage subsystems like data replication, metadata management, and caching. Our hypothesis is that the autonomic modeling of Big Data storage system workloads through a combination of measurement, and statistical and machine learning techniques is feasible, novel, and useful. We consider the case of one common type of Big Data storage cluster: a cluster dedicated to supporting a mix of MapReduce jobs. We analyze 6-month traces from two large clusters at Yahoo and identify interesting properties of the workloads. We present a novel model for capturing popularity and short-term temporal correlations in object request streams, and show how unsupervised statistical clustering can be used to enable autonomic type-aware workload generation that is suitable for emerging workloads. We extend this model to include other relevant properties of storage systems (file creation and deletion, pre-existing namespaces and hierarchical namespaces) and use the extended model to implement MimesisBench, a realistic namespace metadata benchmark for next-generation storage systems.
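A minimal example of the synthetic-generation idea: many storage traces show Zipf-like object popularity, so a toy generator can draw object ranks from a Zipf distribution. The parameters below (catalog size, skew a=1.3) are illustrative assumptions, and the sketch ignores the temporal correlations that the dissertation's model also captures.

```python
from collections import Counter
import numpy as np

def synthetic_request_stream(num_objects=10_000, num_requests=100_000, a=1.3, seed=0):
    """Generate a synthetic object-request stream whose popularity follows a
    Zipf-like distribution; ranks beyond the catalog size are clipped back
    into the catalog. Rank 1 is the most popular object."""
    rng = np.random.default_rng(seed)
    ranks = rng.zipf(a, size=num_requests)
    ranks = np.clip(ranks, 1, num_objects)
    return [f"obj{r}" for r in ranks]

counts = Counter(synthetic_request_stream())
print(counts.most_common(3))   # a handful of objects dominate the stream
```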
  • Open Source Copyrights
Kuri App - Open Source Copyrights:
001_talker_listener-master_2015-03-02: source code at https://github.com/awesomebytes/python_profiling_tutorial_with_ros
001_talker_listener-master_2016-03-22: source code at https://github.com/ashfaqfarooqui/ROSTutorials
acl_2.2.52-1_amd64.deb: licensed under GPL 2.0; license terms at http://savannah.nongnu.org/projects/acl/
acl_2.2.52-1_i386.deb: licensed under LGPL 2.1; license terms at http://metadata.ftp-master.debian.org/changelogs/main/a/acl/acl_2.2.51-8_copyright
actionlib-1.11.2: licensed under BSD; source code at https://github.com/ros/actionlib; license terms at http://wiki.ros.org/actionlib
actionlib-common-1.5.4: licensed under BSD; source code at https://github.com/ros-windows/actionlib; license terms at http://wiki.ros.org/actionlib
adduser_3.113+nmu3ubuntu3_all.deb: licensed under GPL 2.0; license terms at http://mirrors.kernel.org/ubuntu/pool/main/a/adduser/adduser_3.113+nmu3ubuntu3_all.deb
alsa-base_1.0.25+dfsg-0ubuntu4_all.deb: licensed under GPL 2.0; license terms at http://mirrors.kernel.org/ubuntu/pool/main/a/alsa-driver/alsa-base_1.0.25+dfsg-0ubuntu4_all.deb
alsa-utils_1.0.27.2-1ubuntu2_amd64.deb: ...
  • Unlock Bigdata Analytic Efficiency with Ceph Data Lake
Unlock Bigdata Analytic Efficiency with Ceph Data Lake. Jian Zhang, Yong Fu, March 2018. Agenda: Background & Motivations; The Workloads, Reference Architecture Evolution and Performance Optimization; Performance Comparison with Remote HDFS; Summary & Next Step.

Challenges of scaling Hadoop storage: bounded storage and compute resources on Hadoop nodes bring challenges around data capacity, silos, costs, and performance & efficiency. Typical challenges include data/capacity growth, multiple storage silos, space, spend, power and utilization, upgrade cost, inadequate performance, and provisioning and configuration (source: 451 Research, Voice of the Enterprise: Storage Q4 2015).

Options to address the challenges: a larger cluster (lacks isolation, so noisy neighbors hinder SLAs; lacks elasticity, rigid cluster size; can't scale compute/storage costs separately), more clusters (cost of duplicating datasets across clusters; lacks on-demand provisioning; can't scale compute/storage costs separately), or compute and storage disaggregation (isolation of high-priority workloads; shared big datasets; on-demand provisioning; compute and storage costs scale separately). Compute and storage disaggregation provides simplicity, elasticity, and isolation.

Unified Hadoop file system and API for cloud storage: the Hadoop Compatible File System (HCFS) abstraction layer offers a unified storage API (for example, hadoop fs -ls s3a://job/) over back-ends such as adl://, oss://, s3n://, gs://, s3://, s3a://, and wasb://.

Proposal: Apache Hadoop with disaggregated object storage. SQL and other Hadoop services run on compute nodes (virtual machines, containers, or bare metal) and access, through HCFS, object storage services in a disaggregated object storage cluster, with the object storage co-located with a gateway, dynamic DNS or a load balancer in front, data protection via storage replication or erasure code, and storage tiering.
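A shared object store like this can also be reached from outside Hadoop through the S3 API. The sketch below uses boto3 against an S3-compatible endpoint such as a Ceph RADOS Gateway; the endpoint URL, credentials, bucket, and prefix are placeholders, not values from the slides.

```python
import boto3

# Hypothetical endpoint and credentials for a Ceph RADOS Gateway exposing the
# S3 API; analytics clusters can share such a bucket instead of duplicating
# datasets into per-cluster HDFS.
s3 = boto3.client(
    "s3",
    endpoint_url="http://ceph-rgw.example.com:8080",   # placeholder endpoint
    aws_access_key_id="ACCESS_KEY",                    # placeholder credentials
    aws_secret_access_key="SECRET_KEY",
)

# List objects under a dataset prefix that Hadoop/Spark jobs would read via s3a://
for obj in s3.list_objects_v2(Bucket="job", Prefix="dataset/").get("Contents", []):
    print(obj["Key"], obj["Size"])
```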
  • Multi-Resolution Storage and Search in Sensor Networks (Deepak Ganesan, University of Massachusetts Amherst)
University of Massachusetts Amherst, ScholarWorks@UMass Amherst, Computer Science Department Faculty Publication Series, 2003. Recommended citation: Ganesan, Deepak, "Multi-resolution Storage and Search in Sensor Networks" (2003), Computer Science Department Faculty Publication Series, 79, https://scholarworks.umass.edu/cs_faculty_pubs/79.

Authors: Deepak Ganesan (Department of Computer Science, University of Massachusetts, Amherst, MA 01003); Ben Greenstein and Deborah Estrin (Department of Computer Science, University of California, Los Angeeles, CA 90095); John Heidemann (University of Southern California/Information Sciences Institute, Marina Del Rey, CA 90292); Ramesh Govindan (Department of Computer Science, University of Southern California, Los Angeles, CA 90089).

Wireless sensor networks enable dense sensing of the environment, offering unprecedented opportunities for observing the physical world. This paper addresses two key challenges in wireless sensor networks: in-network storage and distributed search. The need for these techniques arises from the inability to provide persistent, centralized storage and querying in many sensor networks. Centralized storage requires multi-hop transmission of sensor data to internet gateways, which can quickly drain battery-operated nodes.
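The paper's approach builds wavelet-based multi-resolution summaries; the sketch below substitutes simple pairwise averaging to show the same storage trade-off: each coarser level halves the data volume, so coarse summaries are cheap to keep or replicate while fine-grained data can age out locally. It is an illustration of the idea, not the paper's algorithm.

```python
def multires_summaries(samples, levels=3):
    """Build progressively coarser summaries of a sensor time series by
    pairwise averaging (a simple stand-in for wavelet summaries): level 0 is
    the raw data, and each higher level halves the resolution and the cost."""
    pyramid = [list(samples)]
    for _ in range(levels):
        prev = pyramid[-1]
        coarser = [(prev[i] + prev[i + 1]) / 2 for i in range(0, len(prev) - 1, 2)]
        pyramid.append(coarser)
    return pyramid

readings = [21.0, 21.4, 22.1, 22.0, 23.5, 23.9, 22.8, 22.7]  # toy temperature trace
for level, data in enumerate(multires_summaries(readings)):
    print(level, data)   # coarse levels are cheap enough to store widely
```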
  • NetBackup™ Enterprise Server and Server 8.0 - 8.x.x OS Software Compatibility List, Created on September 08, 2021
Veritas NetBackup™ Enterprise Server and Server 8.0 - 8.x.x OS Software Compatibility List, created on September 08, 2021. The HTML version of this document is available at https://download.veritas.com/resources/content/live/OSVC/100046000/100046611/en_US/nbu_80_scl.html. Copyright © 2021 Veritas Technologies LLC. All rights reserved. Veritas, the Veritas Logo, and NetBackup are trademarks or registered trademarks of Veritas Technologies LLC in the U.S. and other countries. Other names may be trademarks of their respective owners.

Introduction: This Software Compatibility List (SCL) document contains information for Veritas NetBackup 8.0 through 8.x.x. It covers NetBackup Server (which includes Enterprise Server and Server), Client, Bare Metal Restore (BMR), Clustered Master Server Compatibility and Storage Stacks, Deduplication, File System Compatibility, NetBackup OpsCenter, NetBackup Access Control (NBAC), SAN Media Server/SAN Client/FT Media Server, Virtual System Compatibility, and NetBackup Self Service Support. It is divided into bookmarks on the left that can be expanded. IPv6 and dual-stack environments are supported from NetBackup 8.1.1 onwards with a few limitations; refer to the technote at http://www.veritas.com/docs/100041420 for additional information. For information about certain NetBackup features, functionality, 3rd-party product integration, Veritas product integration, applications, databases, and OS platforms that Veritas intends to replace with newer and improved functionality, or in some cases discontinue without replacement, please see the widget titled "NetBackup Future Platform and Feature Plans" at https://sort.veritas.com/netbackup. See the reference article https://www.veritas.com/docs/100040093 for links to all other NetBackup compatibility lists.
  • Latency-Aware, Inline Data Deduplication for Primary Storage
iDedup: Latency-aware, inline data deduplication for primary storage. Kiran Srinivasan, Tim Bisson, Garth Goodson, Kaladhar Voruganti. NetApp, Inc. {skiran, tbisson, goodson, ...}@netapp.com

Abstract: Deduplication technologies are increasingly being deployed to reduce cost and increase space-efficiency in corporate data centers. However, prior research has not applied deduplication techniques inline to the request path for latency sensitive, primary workloads. This is primarily due to the extra latency these techniques introduce. Inherently, deduplicating data on disk causes fragmentation that increases seeks for subsequent sequential reads of the same data, thus increasing latency. In addition, deduplicating data requires extra disk IOs to access on-disk deduplication metadata. In this paper, we pro- ...

... systems exist that deduplicate inline with client requests for latency sensitive primary workloads. All prior deduplication work focuses on either: i) throughput sensitive archival and backup systems [8, 9, 15, 21, 26, 39, 41]; or ii) latency sensitive primary systems that deduplicate data offline during idle time [1, 11, 16]; or iii) file systems with inline deduplication, but agnostic to performance [3, 36]. This paper introduces two novel insights that enable latency-aware, inline, primary deduplication. Many primary storage workloads (e.g., email, user directories, databases) are currently unable to leverage the benefits of deduplication, due to the associated latency costs. Since offline deduplication systems impact la- ...
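The core mechanism behind any deduplicating write path is a fingerprint index consulted inline. The sketch below shows that basic idea with SHA-256 fingerprints over fixed-size blocks; it deliberately omits iDedup's actual contribution (deduplicating only sufficiently long sequential runs and capping the in-memory index), so treat it as background rather than the paper's design.

```python
import hashlib

class BlockStore:
    """Minimal content-addressed block store illustrating inline deduplication:
    each write is hashed, and if the fingerprint is already known only a
    reference count is bumped; otherwise the block is stored."""
    def __init__(self):
        self.index = {}      # fingerprint -> block id
        self.blocks = {}     # block id -> bytes
        self.refcount = {}   # block id -> number of logical references

    def write(self, data: bytes) -> int:
        fp = hashlib.sha256(data).hexdigest()
        if fp in self.index:                 # duplicate: no new block stored
            bid = self.index[fp]
            self.refcount[bid] += 1
        else:
            bid = len(self.blocks)
            self.index[fp] = bid
            self.blocks[bid] = data
            self.refcount[bid] = 1
        return bid

store = BlockStore()
ids = [store.write(b"A" * 4096), store.write(b"B" * 4096), store.write(b"A" * 4096)]
assert len(store.blocks) == 2 and store.refcount[ids[0]] == 2
```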
  • Key Exchange Authentication Protocol for NFS Enabled HDFS Client
Journal of Theoretical and Applied Information Technology, 15th April 2017, Vol. 95, No. 7. © 2005 - ongoing JATIT & LLS. ISSN: 1992-8645, www.jatit.org, E-ISSN: 1817-3195.

Key Exchange Authentication Protocol for NFS Enabled HDFS Client. Nawab Muhammad Faseeh Qureshi (1), Dong Ryeol Shin (2, corresponding author), Isma Farah Siddiqui (3). (1, 2) Department of Computer Science and Engineering, Sungkyunkwan University, Suwon, South Korea; (3) Department of Software Engineering, Mehran UET, Pakistan.

Abstract: By virtue of its built-in processing capabilities for large datasets, the Hadoop ecosystem has been utilized to solve many critical problems. The ecosystem consists of three components: client, Namenode, and Datanode, where the client is a user component that requests cluster operations through the Namenode and processes data blocks at the Datanode enabled with the Hadoop Distributed File System (HDFS). Recently, HDFS has launched an add-on to connect a client through the Network File System (NFS) that can upload and download sets of data blocks over a Hadoop cluster. In this way, a client is not considered part of HDFS and can perform read and write operations through a contrasting file system. This HDFS NFS gateway has raised many security concerns, most particularly: no reliable authentication support for the upload and download of data blocks, no efficient local and remote client connectivity, and HDFS mounting durability issues over untrusted connectivity. To stabilize the NFS gateway strategy, we present in this paper a Key Exchange Authentication Protocol (KEAP) between the NFS enabled client and the HDFS NFS gateway. The proposed approach provides cryptographic assurance of authentication between clients and the gateway.
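As generic background for the kind of guarantee KEAP targets, the sketch below shows a standard HMAC challenge-response between a client and a gateway that share a pre-established key. The key setup and message shapes are assumptions for illustration; this is not the KEAP protocol itself.

```python
import hmac, hashlib, secrets

# Generic HMAC-based challenge-response between a client and a gateway sharing
# a pre-established key (assumed to be distributed out of band).
SHARED_KEY = secrets.token_bytes(32)

def gateway_issue_challenge() -> bytes:
    return secrets.token_bytes(16)          # fresh nonce per authentication attempt

def client_respond(challenge: bytes, key: bytes) -> bytes:
    return hmac.new(key, challenge, hashlib.sha256).digest()

def gateway_verify(challenge: bytes, response: bytes, key: bytes) -> bool:
    expected = hmac.new(key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)

challenge = gateway_issue_challenge()
response = client_respond(challenge, SHARED_KEY)
assert gateway_verify(challenge, response, SHARED_KEY)   # client is authenticated
```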