Inexpensive Storage

Design and build an inexpensive DFS
Fabrizio Manfredi Furuholmen, BeoLink.org
FrOSCon, August 2008

Agenda
• Overview
• Introduction
• Old way: openAFS
• New way: Hadoop, Ceph
• Conclusion

Overview: Why a Distributed File System?
• Handle terabytes of data
• Transparent to the final user
• Works in a WAN environment
• Good level of scalability
• Inexpensive
• Performance

Overview: Software vs Hardware
Centralized storage
• Block device (SAN)
• Shared file system (NAS)
• Simple system management
• Single point of failure
DFS
• Single file system across multiple computer nodes
• More complicated system management
• Scalable
• HA (sometimes)

Overview: DFS Advantages
• A small number of inexpensive file servers provides performance similar to client-side storage
• Increases in capacity are inexpensive
• Better manageability and redundancy

Overview: Inexpensive
Terabyte cost (SAS/FC)
• 18k $ NAS/SAN
• 5k $ DFS
Terabyte cost (SATA)
• 2.5k $ NAS/SAN
• 0.7k $ DFS
Disk type
• 500/1000 GB SATA disks reduce cost by more than 50%
Installation
• Space, network, power supply
Software
• Port extensions, special software for HA
Discount
• Dumping

12 TB storage (SATA)
  Device        Qty   Device price ($)   Total price ($)
  SAN            1    37,000             37,000
  File server    1    23,995             23,995
  DFS            3    2,500              7,500

96 TB storage (SATA)
  Device        Qty   Device price ($)   Total price ($)
  SAN            1    249,995            249,995
  DFS           16    4,500              72,000

Introduction: DFS Families
• Distributed file systems: NFS, CIFS, Netware..
• Distributed fault-tolerant file systems: CODA, MS DFS..
• Distributed parallel file systems: PVFS2, Lustre..
• Distributed parallel fault-tolerant file systems: Hadoop, GlusterFS, MogileFS..
• Peer-to-peer file systems: Ivy, Infinit..

openAFS: Introduction
Scalability
• Client caching
• Replication
• Load balancing among servers while data is in use
Transparent access and uniform namespace
• Cells
• Partitions and volumes
• Mount points
• In-use volume moves
Security
• Authentication and secure communication
• Authorization and flexible access control
System management
• Single system interface
• Administration tasks without system outage
• Delegation
• Backup

openAFS: Main Elements
Cell
• A cell is a collection of file servers and workstations
• The directories under /afs are cells, forming a single unique tree
• A file server contains volumes
Volumes
• Volumes are "containers", sets of related files and directories
• They have a size limit
• Three types: rw, ro, backup
Mount point directory
• Access to a volume is provided through a mount point
• A mount point looks and behaves just like a static directory (see the command sketch below)

openAFS: Server Types
File server
• Fileserver: delivers data files from the file server machine to workstations
• Volume Server (Vol Server): performs all types of volume manipulation
Database server
• Volume Location Server (VL Server): maintains the Volume Location Database (VLDB)
• Protection Server (Ptserver): lets users grant access to several other users
• Authentication Server (Kaserver): AFS version of Kerberos IV (deprecated)
• Backup Server (Buserver): stores information related to the Backup System
• Ubik: the distributed database used by the database servers
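The volume and mount-point model above maps onto a handful of administration commands. The sketch below drives the standard OpenAFS tools (vos, fs) from Python to create a volume, mount it, replicate it and release it; the cell name, server names and partition are placeholders, and the exact options should be checked against your OpenAFS release.

```python
#!/usr/bin/env python
# Illustrative OpenAFS workflow; cell, server and path names are made up.
# Assumes the vos/fs tools are installed and an admin token is already held.
import subprocess

def run(cmd):
    # Echo and execute one administration command, aborting on the first error.
    print("+ " + " ".join(cmd))
    subprocess.check_call(cmd)

CELL = "example.org"          # assumption: your AFS cell
SERVER = "afs1.example.org"   # assumption: file server holding the rw volume
PARTITION = "/vicepa"         # assumption: vice partition on that server

# 1. Create a read/write volume for application binaries (quota in KB).
run(["vos", "create", SERVER, PARTITION, "app.tools", "-maxquota", "2000000"])

# 2. Mount it under the read/write path of the cell so it can be populated.
run(["fs", "mkmount", "/afs/." + CELL + "/app/tools", "app.tools"])

# 3. Add read-only replica sites, then release: 'vos release' pushes a
#    consistent read-only snapshot of the rw volume to every site.
for replica in [SERVER, "afs2.example.org", "afs3.example.org"]:
    run(["vos", "addsite", replica, PARTITION, "app.tools"])
run(["vos", "release", "app.tools"])
```

Clients then reach the read-only copies through the normal /afs path; publishing an update is just a matter of re-running vos release after changing the read/write volume.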
openAFS: Implementation
Problem: company file system
• Shared documents
• User home directories
• Application file storage
• WAN environment
Solution
• openAFS: scalable, HA, good over WAN, inexpensive, more than 20 platforms supported
• Samba (gateway): the AFS Windows client is slow and a bit unstable, so Windows access is clientless
• Heimdal Kerberos (SSO): KA emulation, LDAP backend
• OpenLDAP: centralized identity storage

openAFS: Usage
Read/write volumes
• Shared development areas
• Documentation data storage
• User home directories
Read-only volumes
• Application deployment
• Application executables (binaries, libraries, scripts)
• Configuration files
• Documentation (models)

openAFS: Design
Scalability
• Storage scalability (file system layer)
• User scalability (Samba gateway layer)
Performance
• Load balancing
• Roaming users / branch offices
Clientless
• Windows clients
Centralized identity
• Kerberos
• LDAP

openAFS: Tricks
• Cache on a separate disk
• Plan the directory tree
• At least 3 servers
• Use volume names that explain the data
• Replicate read-only data as much as possible
• Use volume mount points
• Replicate the "mount point" volumes
• About 400 clients per server

openAFS: Environment
3 AFS servers (3 TB)
• Disks: 6 x 300 GB SAS, RAID 5
• 2 Gigabit Ethernet
• 2 Xeon processors
• 2 GB RAM
2 Samba servers
• Disks: 2 x 73 GB SAS, RAID 1
• 2 Gigabit Ethernet
• 2 Xeon processors
• 4 GB RAM
2 switches (backbone)
• 24 ports
Users
• 400 concurrent Unix
• 250 concurrent Windows

openAFS: Linux Performance
• Write: 20-35 MB/s
• Read: 35-100 MB/s warm cache, 30-45 MB/s cold cache

openAFS: Windows Performance (through Samba)
• Write: 18-25 MB/s
• Read: 20-50 MB/s

openAFS: Who Uses It?
Morgan Stanley IT
• Internal usage
• Storage: 450 TB (ro) + 15 TB (rw)
• Clients: 22,000
Pictage, Inc.
• Online picture albums
• Storage: 265 TB (planned growth to 425 TB in twelve months)
• Volumes: 800,000
• Files: 200,000,000
Embian
• Internet shared folders
• Storage: 500 TB
• Servers: 200 storage servers, 300 app servers
RZH
• Internal usage, 210 TB

openAFS: Good For
Good
• General purpose
• Wide area networks
• Heterogeneous systems
• Read operations > write operations
• Small files
Bad
• Locking
• Databases
• Unicode
• Performance (until OSD)

New Way
• Object-based storage
• Separation of file metadata management (MDS) from the storage of file data
• Object storage devices (OSDs) replace the traditional block-level interface with an object-based one

New Way: Streams and Chunks
Stream
• Multiple streams are parallel channels through which data can flow, improving the rate at which data can be written to the storage media
Chunk
• Files are striped across a set of nodes in order to facilitate parallel access
• Chunks simplify fault-tolerance operations

Hadoop: Introduction
• Scalable: can reliably store and process petabytes
• Economical: distributes the data and processing across clusters of commonly available computers ("moving computation is cheaper than moving data")
• Efficient: can process data in parallel on the nodes where the data is located
• Reliable: automatically maintains multiple copies of data and automatically redeploys computing tasks on failure

Hadoop: MapReduce
• MapReduce: a programming model and associated implementation for processing and generating large data sets
• Map: a function that processes a key/value pair to generate a set of intermediate key/value pairs
• Reduce: a function that merges all intermediate values associated with the same intermediate key

Hadoop: Map and Reduce
• Map: the input is split and mapped into key-value pairs
• Combine: for efficiency, the combiner works directly on the map outputs
• Reduce: the intermediate files are then merged, sorted and reduced (see the example below)
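To make the map and reduce roles concrete (and to anticipate the "grep MapReduce functions" used for log analysis later in this deck), here is a minimal Hadoop Streaming job in Python that counts, per host, the syslog lines matching a pattern. Hadoop Streaming lets any executable act as mapper or reducer, but the pattern, field layout and file names below are illustrative assumptions, not taken from the original setup.

```python
#!/usr/bin/env python
# Hadoop Streaming sketch: count matching syslog lines per host.
# The pattern and the position of the host field are assumptions for illustration.
import re
import sys

PATTERN = re.compile(r"Failed password")   # assumed search pattern

def mapper():
    # Emit "<host>\t1" for every input line that matches the pattern.
    for line in sys.stdin:
        if PATTERN.search(line):
            fields = line.split()
            host = fields[3] if len(fields) > 3 else "unknown"  # assumed syslog layout
            print("%s\t1" % host)

def reducer():
    # Streaming sorts map output by key, so equal hosts arrive adjacent: sum them.
    current, total = None, 0
    for line in sys.stdin:
        host, count = line.rstrip("\n").split("\t", 1)
        if host != current:
            if current is not None:
                print("%s\t%d" % (current, total))
            current, total = host, 0
        total += int(count)
    if current is not None:
        print("%s\t%d" % (current, total))

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```

Such a job would be submitted roughly as: hadoop jar hadoop-streaming.jar -file grep_job.py -mapper 'grep_job.py map' -reducer 'grep_job.py reduce' -input /logs/syslog -output /reports/failed-logins (jar name and paths depend on the installation).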
Hadoop: HDFS Architecture
(architecture diagram in the original slides)

Hadoop: Implementation
Problem: log centralization
• Centralized logs, keeping track of all system activity
• Search and statistics
Solution
• HDFS: scalable, HA, task distribution
• Heartbeat + DRBD: HA namenode
• syslog-ng: flexible and scalable
• Grep MapReduce functions: mail logs, firewall logs, web server logs, generic syslog

Hadoop: Solution
Scale on demand
• Increase the number of syslog concentrators
• Grow the Hadoop cluster size
Performance
• Dedicated MapReduce functions for reporting and search
• Parallel operation
High availability
• Internal replication
• Distribution across different shelves

Hadoop: Environment
2 log servers
• Disks: 2 x 143 GB SAS, RAID 1
• 2 Gigabit Ethernet
• 2 Xeon processors
• 4 GB RAM
2 switches (backbone)
• 24 Gigabit ports
Hadoop cluster
• 2 namenodes: 8 GB RAM, 300 GB disk, 2 Xeon
• 5 nodes: 4 GB RAM, 2 TB disk, 2 Xeon

Hadoop: Tricks
• As many servers as possible
• Parallel streams
• Block size
• Good Map/Reduce/partitioning functions
• Good network (Gigabit)
• No old hardware
• Simple software distribution

Hadoop: Who Uses It?
Yahoo!
• 2000 nodes (2 x 4-CPU boxes with 3 TB of disk each)
• Used to support research for ad systems and web search
A9.com (Amazon)
• Amazon's product search
Facebook
• Internal log storage
• Reporting/analytics and machine learning
• 320-machine cluster with 2,560 cores and about 1.3 PB of raw storage
Last.fm
• Chart calculation and web log analysis
• 25-node cluster (dual Xeon LV 2 GHz, 4 GB RAM, 1 TB/node storage)
• 10-node cluster (dual Xeon L5320 1.86 GHz, 8 GB RAM, 3 TB/node storage)

Hadoop: Good For
Good
• Task distribution (basic grid infrastructure)
• Distribution of content (high-throughput data access)
• Read operations >> write operations
Bad
• Not a general-purpose file system
• Not POSIX compliant
• Low granularity in security settings

Ceph: Next Generation
Ceph addresses three critical challenges of storage systems
• Scalability
• Performance
• Reliability

Ceph: Introduction
Capabilities
• POSIX semantics
• Seamless scaling from a few nodes to many thousands
• Gigabytes to petabytes
• High availability and reliability
• No single points of failure
• N-way replication of all data across multiple nodes
• Automatic rebalancing of data on node addition/removal to efficiently utilize device resources
• Easy deployment (userspace daemons)

Ceph: Architecture
• Clients
• Metadata cluster (MDS)
• Object storage cluster (OSDs)

Ceph: Architecture Differences
Dynamic distributed metadata
• Metadata storage
• Dynamic subtree partitioning
• Traffic control
Reliable autonomic distributed object storage
• Data distribution (illustrated below)
• Replication
• Data safety
• Failure detection
• Recovery and cluster updates
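A key point behind the "data distribution" bullet is that Ceph clients compute an object's location from the cluster map instead of asking a central table; the real algorithm is CRUSH, which also understands failure domains and weights. The sketch below is not CRUSH, only a minimal rendezvous-hashing stand-in that illustrates the idea: placement is a pure function of the object name and the OSD list, so adding or removing an OSD remaps only the objects that involve it.

```python
#!/usr/bin/env python
# Minimal placement sketch (NOT CRUSH): choose N OSDs per object name
# by rendezvous hashing, so any client computes the same answer.
import hashlib

def score(osd, name):
    # Deterministic pseudo-random score for an (osd, object) pair.
    digest = hashlib.md5(("%s:%s" % (osd, name)).encode("utf-8")).hexdigest()
    return int(digest, 16)

def place(name, osds, replicas=3):
    # Rank all OSDs for this object and keep the top 'replicas' of them.
    ranked = sorted(osds, key=lambda osd: score(osd, name), reverse=True)
    return ranked[:replicas]

cluster = ["osd.%d" % i for i in range(8)]
obj = "logs/2008-08-23/syslog.0"            # hypothetical object name
print(place(obj, cluster))                  # the three OSDs holding this object

# Removing an OSD only remaps objects whose replica set contained it;
# everything else keeps its current placement.
smaller = [osd for osd in cluster if osd != place(obj, cluster)[0]]
print(place(obj, smaller))
```

This computed-placement property is what makes the automatic rebalancing on node addition or removal, listed in the capabilities above, possible without a central allocation service.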