Irods and Spectrum Archive

Total Page:16

File Type:pdf, Size:1020Kb

Irods and Spectrum Archive Integration of iRODS with IBM Spectrum Archive™ Enterprise Edition A flexible tiered storage archiving solution Version 1.1 (02/17/2020) Nils Haustein IBM European Storage Competence Center Mauro Tridici Euro-Mediterranean Center on Climate Change (CMCC) Contents Introduction ............................................................................................................................................ 3 Executive summary ............................................................................................................................. 4 Tiered storage file systems ..................................................................................................................... 4 IBM Spectrum Scale with IBM Spectrum Archive ............................................................................... 5 Challenges with tiered storage file systems ....................................................................................... 6 iRODS ...................................................................................................................................................... 6 Integration of iRODS with IBM Spectrum Archive .................................................................................. 7 Typical workflows ............................................................................................................................... 8 Preventing transparent recalls ............................................................................................................ 9 Solution in action .......................................................................................................................... 12 Further work ..................................................................................................................................... 14 Quota ............................................................................................................................................ 14 Automated file tagging ................................................................................................................. 16 Display file state ............................................................................................................................ 18 References ............................................................................................................................................ 21 Disclaimer.............................................................................................................................................. 22 © IBM Corporation 2020 and © CMCC Foundation 2020 Page 2 Introduction Arching means storing data over long time periods while providing search capabilities to find and access the data. One of the key requirements for archiving is cost efficiency, especially when the volume of archive data that must be kept for long period of time grows into Petabyte ranges. Depending on the type of the archived data and its value the requirements for performance and cost of storage change over time. Different storage requirements can be addressed with different types of storage media, such as Flash and SSD, disk and tapes. As Figure 1 shows, Flash and SSD media provide fast access and is most expensive. Accessing data on hard disk is slower while disks are less expensive than Flash and SSD (considering enterprise grade Flash and disk). While tape storage is suitable for storing large volumes of data over long periods of time at lowest cost, access time to data on tape is significantly higher than to data on disk. Figure 1: Storage media characteristic Different storage media like Flash and SSD, disk and tapes can be combined in a tiered storage system. The tiered storage systems have the capability to automatically move archived data between the different storage tiers based on policies. This assures that archived data is stored on the most appropriate storage media over the lifecycle. Another key component for archiving is a data management system that allows to index archived data and allow the user to search, share and access the data he is authorized for. Data management capabilities can be provided by the open source data management software iRODS [1]. In this paper we describe the challenges of tiered storage systems using tape as a storage tier and how these challenges can be addressed when combining iRODS and IBM Spectrum Archive™ Enterprise Edition. We demonstrate how to prevent transparent recalls when migrated files are accessed through iRODS and instead perform tape optimized recalls with IBM Spectrum Archive. Furthermore, we demonstrate configuring quota in iRODS that applies to the disk and tape storage tiers. And we demonstrate harvesting metadata from the file content and using this for subsequent searches. © IBM Corporation 2020 and © CMCC Foundation 2020 Page 3 Executive summary The combination with iRODS and IBM Spectrum Archive Enterprise Edition allows to establish a tiered storage environment where transparent recalls triggered by the user accessing a migrated file are prevented. Instead migrated files are recalled in a tape optimized manner which is much faster than transparent recalls and more resource gentle. The user also has the possibility to check the state of a file in order to determine if the file is on disk or on tape of the tiered storage file system. Furthermore, this combination allows to enforce quota that applies for all user data stored in the tiered storage environment, regardless if the user data is stored on the disk or the tape tier. This prevents users from storing unlimited amount of data in a tiered storage environment. The automatic data harvesting and indexing capabilities of iRODS allow to automatically extract information from files after they have been stored in the iRODS system and before these files are migrated to tape. The harvested information can be used to discover and search for data based on metadata information. Finally, iRODS’ flexible plugin architecture, its data virtualization capabilities and its indexing and search capabilities paired with the cost efficiency of tiered storage file systems provided by IBM Spectrum Scale™ with IBM Spectrum Archive™ Enterprise Edition make this combination a perfect archiving solution for any types of data. There are many other use cases that can be addressed with the combination of iRODS and IBM Spectrum Archive Enterprise Edition. Tiered storage file systems A tiered storage file system (Figure 2) provides flash, disk and tape storage within a global file system namespace where a space management component transparently moves files between the storage tiers. The file system name space is accessible by users and applications through standard file system protocols such as NFS, SMB and POSIX. Figure 1: Tiered storage file system Tiered storage file systems with tape are primarily used for archiving large volumes of data for long period of time. An example of a tiered storage file system is IBM Spectrum Scale with IBM Spectrum Archive Enterprise Edition that is explained in the next section. © IBM Corporation 2020 and © CMCC Foundation 2020 Page 4 IBM Spectrum Scale with IBM Spectrum Archive IBM Spectrum Scale™ [2] is a software-defined scalable parallel file system providing tiered storage capabilities. The IBM Spectrum Scale file systems are provided by servers that are configured in a cluster. The storage used for the file system can be within the servers or externally connected to the servers. IBM Spectrum Scale also offers exporting the file systems via standardized protocols such as NFS, SMB, HDFS and Object. IBM Spectrum Archive Enterprise Edition [3] provides the tape tier within an IBM Spectrum Scale file system. IBM Spectrum Archive manages the tape library operations and stores the data on the tape tier in the Linear Tape File System (LTFS) format [4]. As shown in Figure 3, the combination of IBM Spectrum Scale with IBM Spectrum Archive provides a tiered storage file system with different storage media including Flash and SSD, disk and tape. While Flash and disk storage are managed by IBM Spectrum Scale directly, the tape storage is managed by Spectrum Archive. IBM Spectrum Scale provides a policy engine that allows to place the files on a storage tier upon file create and migrate the files to other storage tiers over the data lifecycle. Policies are defined and tested once and can then be configured to run automatically in the background. For example, a policy can places all new files on the disk storage tier of the IBM Spectrum Scale file system and if files have not been accessed for 30 days then migrate these files to tape storage. Figure 3: Combination of IBM Spectrum Scale with IBM Spectrum Archive tape tier After the migration of files to tape these files are still visible in the IBM Spectrum Scale file system and can be readily accessed. Upon access the files are automatically copied (or recalled) from tape back to disk storage. For this purpose, the tape-ID number is stored in the file metadata. There are two kind of recalls: transparent recall and tape optimized recall. A transparent recall is triggered transparently when a migrated file is accessed in the file system. IBM Spectrum Archive loads the tape-ID referenced in the file stub, positions to the file on tape and reads the file back to disk. When multiple migrated files are accessed at the same time, then multiple independent transparent recalls are triggered, each requiring a tape to be loaded, positioned and a file
Recommended publications
  • WP: ECS Overview & Architecture
    Technical White Paper ECS Overview and Architecture Abstract This document provides a technical overview and design of the Dell EMC™ ECS™ software-defined cloud-scale object storage platform. August 2021 h14071.19 Revisions Revisions Date Description December 2015 Initial release May 2016 Updated for 2.2.1 September 2016 Updated for 3.0 August 2017 Updated for 3.1 March 2018 Updated for 3.2 September 2018 Updated for Gen3 Hardware February 2019 Updated for 3.3 September 2019 Updated for 3.4 February 2020 Updated ECSDOC-628 changes May 2020 Updated for 3.5 November 2020 Updated for 3.6 February 2021 Updated for 3.6.1 August 2021 Updated for 3.6.2 and compression mechanism Acknowledgements This paper was produced by the following: Author: Zhu, Jarvis The information in this publication is provided “as is.” Dell Inc. makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any software described in this publication requires an applicable software license. This document may contain certain words that are not consistent with Dell's current language guidelines. Dell plans to update the document over subsequent future releases to revise these words accordingly. This document may contain language from third party content that is not under Dell's control and is not consistent with Dell's current guidelines for Dell's own content. When such third-party content is updated by the relevant third parties, this document will be revised accordingly.
    [Show full text]
  • Irods User Group Meeting 2016 Proceedings
    iRODS User Group Meeting 2016 Proceedings © 2016 All rights reserved. Each article remains the property of the authors. 8TH ANNUAL CONFERENCE SUMMARY The iRODS User Group Meeting of 2016 gathered together iRODS users, Consortium members, and staff to discuss iRODS-enabled applications and discoveries, technologies developed around iRODS, and future development and sustainability of iRODS and the iRODS Consortium. The two-day event was held on June 8th and 9th in Chapel Hill, North Carolina, hosted by the iRODS Consortium, with over 90 people attending. Attendees and presenters represented over 30 academic, government, and commercial institutions. Contents Listing of Presentations .......................................................................................................................................... 1 ARTICLES iRODS Audit (C++) Rule Engine Plugin and AMQP .............................................................................................. 5 Terrell Russell, Jason Coposky, Justin James - RENCI at UNC Chapel Hill Speed Research Discovery with Comprehensive Storage and Data Management .......................................... 21 HGST | PANASAS | iRODS Integrating HUBzero and iRODS ............................................................................................................................ 23 Rajesh Kalyanam, Robert A. Campbell, Samuel P. Wilson, Pascal Meunier, Lan Zhao, Elizabett A. Hillery, Carol Song - Purdue University An R Package to Access iRODS Directly ...........................................................................................................
    [Show full text]
  • Ios Forensics
    UNIVERSITY OF PIRAEUS Department of Digital Systems MSc in “Digital Systems Security” iOS Forensics Postgraduate student: Nikolaos Anagnostopoulos MTE 1202 Supervisor: Christos Xenakis Assistant Professor, University of Piraeus Academic Year: 2013-1014 Σ ΙΩ Α ΙΡ Ε Π ΙΟ Μ Η Τ Σ Ι Π Ε Ν Α Π THIS PAGE WAS INTENTIONALLY LEFT BLANK Table of Contents Acknowledgements ........................................................................................................ 1 Abstract .......................................................................................................................... 1 Introduction .................................................................................................................... 2 Digital Forensics ............................................................................................................ 3 The current state............................................................................................................ 4 Smartphones operating system market share ............................................................. 4 Smartphones Security ................................................................................................Σ . 5 Mobile Device Forensics ...............................................................................................Ω 6 Evidence of great interest ................................................................Ι ........................... 6 Live forensics ................................................................................................Α
    [Show full text]
  • Free License Key for Jalbum
    Free License Key For Jalbum Free License Key For Jalbum 1 / 4 2 / 4 Download jAlbum (64-bit) for Windows PC from FileHorse. 100% Safe and Secure ✓ Free Download 64-bit Latest Version 2021. ... This product is not just about presenting images, it's just as good at handling videos. ... Open Source software is software with source code that anyone can inspect, modify or .... Download JAlbum for Mac to create web photo albums with slideshows of your images. JAlbum has had ... Key Details of JAlbum. Create web ... During the installation of JAlbum for Mac the user is prompted to buy a product license. Support is .... Publishing your album to the web is easy; use your free Jalbum account or any other hosting provider. ... Fixed "Invalid code" error when activating license Choose the ad free Standard License for non-commercial purposes for just $12 ... code after completing your purchase to sign up at the jAlbum.net website to .. Full Cracked PC Software Product keys, License Key, Serial Free, ... jAlbum 23.2 (64-bit) Crack + License Key Latest 2021 jAlbum Crack into you can make web ... Groschengrab Deluxe 1.31 Cracked.epub EaseUS Data Recovery Wizard Pro 11 Activation Code, EaseUS Data ... Final Cut Library Manager 3.51 Mac Crack Free Download, ... jAlbum 15.2 Serial Key,. Check out this link on youtube it works ! Feb 6, 2013 Computers & Internet. 2 Answers. Adobe photoshop elements 11 serial number free. Telecharger Ali Mini Upgrade 6.10 Gratuit Star Timesl [FULL] Engineering Probability And Statistics D.k Murugesan 3 / 4 ReFX.Nexus.Dance.Orchestra.Expansion.Pack-DYNAMiCS.rar Compare the best jAlbum alternatives in 2021.
    [Show full text]
  • Cyverse Austria—A Local, Collaborative Cyberinfrastructure
    Mathematical and Computational Applications Communication CyVerse Austria—A Local, Collaborative Cyberinfrastructure 1, 1, , 2 3 Konrad Lang y , Sarah Stryeck * y , David Bodruzic , Manfred Stepponat , Slave Trajanoski 4 , Ursula Winkler 2 and Stefanie Lindstaedt 1,5,* 1 Institute for Interactive Systems and Data Science, Graz University of Technology, 8010 Graz, Austria; [email protected] 2 Server and Storage Systems, University of Graz, 8010 Graz, Austria; [email protected] (D.B.); [email protected] (U.W.) 3 Central Information Technology, Graz University of Technology, 8010 Graz, Austria; [email protected] 4 Core Facility Computational Bioanalytics, Medical University of Graz, 8010 Graz, Austria; [email protected] 5 Know-Center GmbH, 8010 Graz, Austria * Correspondence: [email protected] (S.S.); [email protected] (S.L.); Tel.: +43-316-873-30677 (S.S.); +43-316-873-30600 (S.L.) These authors contributed equally to this work. y Received: 25 May 2020; Accepted: 22 June 2020; Published: 24 June 2020 Abstract: Life sciences (LS) are advanced in research data management, since LS have established disciplinary tools for data archiving as well as metadata standards for data reuse. However, there is a lack of tools supporting the active research process in terms of data management and data analytics. This leads to tedious and demanding work to ensure that research data before and after publication are FAIR (findable, accessible, interoperable and reusable) and that analyses are reproducible. The initiative CyVerse US from the University of Arizona, US, supports all processes from data generation, management, sharing and collaboration to analytics.
    [Show full text]
  • Swiftstack and Cyberduck Summary Cyberduck Is an Open Source Client for Browsing Various Storage Back-Ends Including Openstack Swift
    SOLUTION BRIEF SwiftStack and Cyberduck Summary Cyberduck is an open source client for browsing various storage back-ends including OpenStack Swift. Combining Cyberduck with SwiftStack Object Storage enables users to quickly store and retrieve files in a cloud storage system. Like storage technology, Cyberduck has evolved in recent decades. Initially a tool for remotely browsing and facilitating transfers of local storage in stand-alone computers and servers (via FTP), functionality was added to support browsing network file shares (via SFTP and WebDAV) as centralized “filers” became the de facto standard for storage architecture. In recent years, however, storage administrators have “SwiftStack is the simplest way to deploy, found that NAS/SAN-based architectures cannot scale manage, and monitor Swift object storage to the capacities required by today’s applications and and undoubtedly offers the lowest TCO of users—resulting in multiple silos of storage. In addition, any private-cloud object storage on the these legacy disk-based systems are burdened with a market today—never mind the high-cost, very high cost per TB—making them impractical for highly distributed configurations spanning multiple low-scale legacy NAS vendors. With datacenters and geographies. Cyberduck, you have a free tool to view the contents of your SwiftStack object store, and To address today’s storage needs, object storage it only takes 30 seconds to set it up!” technologies have been introduced to enable private and public clouds using predominantly two APIs—OpenStack Chris Nelson, Director of Systems Engineering, Swift’s Swift API and Amazon’s S3 API, and Cyberduck SwiftStack now supports browsing cloud storage using both APIs.
    [Show full text]
  • Operators Manual
    ____________ _______________________________ OPERATORS MANUAL For use with Software Version 4.05.370 Image Server MAXX 6T January 2008 Copyright 2008, 360 Systems All rights reserved Printed in the United States of America 900-105-0002-07 6T UM ____________________________________________________________________________________ Page 2 MAXX 6T Owner’s Manual TABLE OF CONTENTS LIST OF FIGURES............................................................................................... 7 PREFACE............................................................................................................ 8 Software and Operations Manual Revisions................................................ 8 Typographical Conventions........................................................................ 8 SAFETY NOTICES .............................................................................................. 9 Safety Terms and Symbols.......................................................................... 9 General Safety Caution............................................................................... 9 Personal Injury Precautions ........................................................................ 9 PRODUCT DAMAGE PRECAUTIONS ............................................................. 10 Product Registration ................................................................................. 10 Product Improvements and Upgrades....................................................... 10 Trademarks .............................................................................................
    [Show full text]
  • Foss-2020-Spring
    CyVerse Documentation Release 0.2.0 CyVerse Jan 15, 2021 Contents 1 Expected outcomes: 3 2 Funding and Citations 155 i ii CyVerse Documentation, Release 0.2.0 Learning Center Home Foundational Open Science Skills (FOSS) is a novel, camp-style training designed to prepare principal investigators and their lab teams, both new and established, to meet the growing expectations of funding agencies, publishers, and research institutions for scientific reproducibility and data accessibility. Workshop Level There are no pre-requisites for FOSS, but the course will cover a lot of material in a short time. Participants who have limited computational experience should try to view the Software Carpentry Core Lessons before attending. Contents 1 CyVerse Documentation, Release 0.2.0 2 Contents CHAPTER 1 Expected outcomes: • Become familiar with productivity software for organizing your lab group, communications, and research • Learn how to scale out your computation from laptop to cloud and high performance computing (HPC) systems • Learn how to manage data for open science and reproducibility By working through an example project relevant to their interests, participants will practice open science skills using CyVerse, GitHub, R or Python, and other resources. At the end of the week, students will present a plan for how to integrate open science into their labs. Learning Center Home 1.1 Before you arrive Please endeavor to complete the pre-workshop setup before arriving at FOSS. The workshop will be run under a Code of Conduct. Please familiarize yourself with it prior to your arrival. There will be a webinar one week before camp to help answer questions about the setup, logistics, and materials.
    [Show full text]
  • Admin Tools for Wordpress Nicholas K
    Admin Tools for WordPress Nicholas K. Dionysopoulos Davide Tampellini Admin Tools for WordPress Nicholas K. Dionysopoulos Davide Tampellini Copyright © 2017-2021 Akeeba Ltd Abstract This book covers the use of the Admin Tools for WordPress site security plugin for WordPress™ -powered web sites. Both the free Admin Tools Core and the subscription-based Admin Tools Professional editions are completely covered. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the appendix entitled "The GNU Free Documentation License". Table of Contents 1. Getting Started .......................................................................................................................... 1 1. What is Admin Tools? ....................................................................................................... 1 1.1. Disclaimer ............................................................................................................. 1 1.2. The philosophy ....................................................................................................... 2 2. Server environment requirements ......................................................................................... 2 3. Installing Admin Tools ......................................................................................................
    [Show full text]
  • Introduction to Linux
    Introduction to Linux Augustine Abaris Research Computing Services Information Services & Technology Topics for Today ● Research Computing Services ● Linux Overview ● Linux Interaction - Shell and Commands ● I/O redirection (pipes, etc.) ● Navigating the file system ● Processes and job control ● Editors ● Creating and Running Code Research Computing Services Research Computing Services (RCS) A group within Information Services & Technology at Boston University provides computing, storage, and visualization resources and services to support research that has specialized or highly intensive computation, storage, bandwidth, or graphics requirements. Three Primary Services: ● Research Computation ● Research Visualization ● Research Consulting and Training RCS Team and Expertise Our Team Consulting Focus: ● Scientific Programmers ● Bioinformatics ● Systems Administrators ● Data Analysis / Statistics ● Service Management Specialists ● Molecular modeling ● Research Facilitators ● Geographic Information Systems ● Special Initiatives (Grants) ● Scientific/Engineering Simulation ● [email protected] ● Visualization Me ● Lead Systems Programmer/Administrator ● 20+ years of systems administration experience ● 15+ years at BU, contributed to design and deployment of SCC and 2 predecessor HPC clusters ● Contact: [email protected] You ● Who has experience programming? ● Using Linux? ● Using compute clusters? ● Using the Shared Computing Cluster (SCC)? Linux What is an operating system? What is Linux? ● Operating System ● Originated in early-90s ● Free ● Open Source Where is Linux? Why Linux ● Free and open-source. ● Powerful for research datacenters ● Personal for desktops and phones The most common OS used ● Universal by BU researchers when ● Community (and business) driven. working on a server or computer cluster Connecting Let’s use Linux Local System Remote Server Connection Protocols and Software Remote Connections: Remote Graphics: Data Transfer: Secure SHell X-Windowing Secure File Transfer Protocol (SSH) (X, X-Win) (SFTP) Other protocols too, but let’s start with these.
    [Show full text]
  • VSC HPC Tutorial for University of Antwerp Mac Users
    VLAAMS SUPERCOMPUTER Innovative Computing CENTRUM for A Smarter Flanders HPC Tutorial Last updated: September 29 2021 For Mac Users Authors: Franky Backeljauw5, Stefan Becuwe5, Geert Jan Bex3, Geert Borstlap5, Jasper Devreker2, Stijn De Weirdt2, Andy Georges2, Balázs Hajgató1,2, Kenneth Hoste2, Kurt Lust5, Samuel Moors1, Ward Poelmans1, Mag Selwa4, Álvaro Simón García2, Bert Tijskens5, Jens Timmerman2, Kenneth Waegeman2, Toon Willems2 Acknowledgement: VSCentrum.be 1Free University of Brussels 2Ghent University 3Hasselt University 4KU Leuven 5University of Antwerp 1 Audience: This HPC Tutorial is designed for researchers at the University of Antwerp and affiliated institutes who are in need of computational power (computer resources) and wish to explore and use the High Performance Computing (HPC) core facilities of the Flemish Supercomputing Centre (VSC) to execute their computationally intensive tasks. The audience may be completely unaware of the UAntwerpen-HPC concepts but must have some basic understanding of computers and computer programming. Contents: This Beginners Part of this tutorial gives answers to the typical questions that a new UAntwerpen- HPC user has. The aim is to learn how to make use of the HPC. Beginners Part Questions chapter title What is a UAntwerpen-HPC exactly? 1 Introduction to HPC Can it solve my computational needs? How to get an account? 2 Getting an HPC Account How do I connect to the UAntwerpen- 3 Connecting to the HPC infrastructure HPC and transfer my files and pro- grams? How to start background jobs? 4 Running batch jobs How to start jobs with user interaction? 5 Running interactive jobs Where do the input and output go? 6 Running jobs with input/output data Where to collect my results? Can I speed up my program by explor- 7 Multi core jobs/Parallel Computing ing parallel programming techniques? Troubleshooting 8 Troubleshooting What are the rules and priorities of 9 HPC Policies jobs? FAQ 10 Frequently Asked Questions The Advanced Part focuses on in-depth issues.
    [Show full text]
  • Introduction to Working in the Linux Shell Environment September 17, 2012
    Introduction to Working in the Linux Shell Environment September 17, 2012 Note: Do not attempt to copy/paste out of this document. Commands will likely not work! Some software you will need (Note: these are examples of free software, not endorsements): In order to From Mac or Linux From Windows Connect to submit (ssh) Terminal PuTTY Move files to/from submit FileZilla, CyberDuck (Mac) FileZilla, CyberDuck Edit text files TextWrangler (Mac) Notepad++ Getting around in Linux: • File paths (directories or folders): /, /home/magitz/, /scratch/hpc/magitz/ • pwd, cd, ls (Where am I, change directory, list directory) • cp, mv, rm (copy, move, delete) • more, less, head, tail, cat (examine files) • nano, vim (text editors in Linux) Making things easier: • Tab completion • history: redo something that you did before without retyping • man: getting help, also –h or --help flag Learning by doing: 1. Connect to HPC Center: ssh [email protected] a. Type your password and hit return (no characters display while you type). 2. Where are you when you login? pwd 3. What files are there? ls 4. Make a directory: mkdir test_script 5. Now what’s there? ls –l a. Linux commands usually have flags to change how they work b. man, –h or --help often give you help 6. Change into test_script directory: cd test_script or cd tes<tab> 7. Copy a sample file here (.): cp /project/bio/training/2012-09-17/simple.pbs . 8. Check that the copy worked: ls 9. Delete that file: rm simple.pbs a. That file is now GONE! Not in your recycle bin or trashcan, but gone! * *We may be able to retrieve some files, so if it’s an important file, let us know ASAP.
    [Show full text]