An Open Source Managed File Transfer Framework for Science Gateways


Dimuthu Wannipurage, Suresh Marru, Eroma Abeysinghe, Isuru Ranawaka, Anna Branam, and Marlon Pierce
Cyberinfrastructure Integration Research Center, Pervasive Technology Institute, Indiana University, Bloomington, IN 47408

Abstract—Managed File Transfer (MFT) systems are cyberinfrastructure that provide higher level functionalities on top of basic point-to-point data transfer protocols such as FTP, HTTP, and SCP. This paper presents an open source MFT system that incorporates science gateway requirements. We describe the system requirements, system architecture, and core capabilities.

Keywords—Science gateways, cyberinfrastructure, data management, managed file transfer, open source software, Apache Airavata

I. INTRODUCTION

Managing data transfers between distributed resources across administrative domains is a fundamental capability of cyberinfrastructure, with many long-standing solutions [1][2][3][4]. This problem appears in business-to-business and other domains as well [5], where it is commonly called Managed File Transfer (MFT).

MFT systems build on and provide a layer above point-to-point transfer protocols such as FTP, HTTP, and their secure equivalents. Examples of additional capabilities include separation of the control and data layers, end-to-end security for multi-hop transfers, optimized transfers using software-defined networking, support for extremely large data transfers, reliable data transfers, scheduled transfers, and centralized audit logging across many systems.

All of these capabilities can be customized extensively for different users and scenarios, and the exploration of their optimizations is a topic for distributed systems research. There is a gap in the cyberinfrastructure ecosystem today in providing an open platform for enabling community contributions, providing software-level and operational-level transparency, supporting distributed systems research, and supporting science-gateway specific user scenarios.

Science gateways provide science-centric interfaces to cyberinfrastructure. The key element of a science gateway system is that it integrates diverse backend resources that cross multiple administrative domains on behalf of communities of users. This leads to several issues that a gateway-focused MFT system must accommodate. First, authenticated user identities may not be communicated end-to-end to all systems; delegation, such as with community accounts, is common [6]. Second, gateway users need ways to grant fine-grained access to and resulting operational permissions on particular data sets to their collaborators. Third, science gateways are inherently ad-hoc collections of diverse resources that may include multiple cloud providers, so minimizing unnecessary data ingress and egress is important. Finally, science gateways can be used as mechanisms for providing controlled access to sensitive data of all types. Transfers of data to, from, and between resources need to account for security requirements arising from standards such as NIST 800-53 and NIST 800-171 that are used to establish HIPAA alignment and for handling controlled unclassified data, respectively.

Movement of data across data centers by MFT cyberinfrastructure must address multiple factors: 1) access protocols to data on different resources may be different; 2) transfers crossing multiple administrative domains need end-to-end security, including authentication, access control, and encryption; 3) data must be discoverable and accessible to the MFT system, even when the resources are on local or otherwise private resources; 4) transfer paths should be optimized to (depending on the scenario) reduce transfer times, provide required security, and minimize or eliminate costs such as egress charges from commercial clouds; 5) recurring transfers should be schedulable; 6) monitoring and control should be separate from the actual data transfer path; 7) the system should account for data replicas; 8) diverse storage systems with, for example, highly varying latencies and bandwidths, should be seamlessly integrated; and 9) the system should be inherently asynchronous.

II. PRIOR WORK

Globus [1] is well known for scientific data management, particularly in the transfer of very large data sets. Globus data transfer services utilize a high performance, GridFTP point-to-point data transfer protocol. The main advantage of this protocol is that the data and control paths are clearly separated. This way, data can be transferred directly from one point to another while keeping the control path for another third party. Globus software is closed source, and its operations are proprietary.

StorkCloud [2] was an open source MFT implementation that provided an extensible multi-protocol transfer job scheduler, a directory listing service for prefetching and caching remote directory metadata to minimize response time to users, a web API, pluggable transfer modules, and pluggable protocol-agnostic optimization modules which could be used to dynamically optimize various transfer settings to improve performance.

Rclone [7] is an open source project that supports multi-protocol data transfers, including SFTP, FTP, Google Drive, Box and Amazon S3. It has clean credential management APIs to keep track of all the credentials of different sources. However, the data and control paths are interleaved, so data must go through the Rclone server irrespective of the source and destination.

Alluxio [8] is a data orchestration framework that is being used heavily in "big data" and data analytics applications where the streaming performance is critical. Alluxio is a fully contained distributed system where it has its own data catalog and credential management system. This is good design from the product perspective, but it makes Alluxio challenging to integrate with other systems, such as Apache Airavata, that have their own catalog and credential management components.

Following the evaluation of these systems, we identified an opportunity to build a light-weight MFT system that could be open source, could accommodate our requirements, and could be integratable with other projects.

III. AIRAVATA MFT ARCHITECTURE

Airavata MFT has three design goals. First, the system should separate the data path and the control path to provide better flexibility for handling transfers and utilizing resources efficiently. Second, the systems should provide APIs that can be cleanly integrated with multiple systems, such as other Apache Airavata components and other frameworks. Finally, we should reuse and build on standard transfer protocols where possible. If the data source supports a standard transfer protocol, support it as it is and convert it into a common format inside the MFT engine.

To implement the design goals, we identified the following major components, illustrated in Fig. 1.

Fig. 1. MFT internal components. Numerical labels give steps in a data transfer, as described in the text.

Agent: This is the entity that handles the data path of transfers. An Agent should know the protocol to talk to the data source, but it does not store any resource information or credential locally to communicate with the data source.

Resource Service: This is the extension point for plugging the resource APIs of the external gateway system into MFT. The Agent fetches the resource metadata from this endpoint.

Secrets Service: This is the extension point for resource credentials. External systems can [...]

[...] science gateways and middleware systems that want to use Airavata MFT as a standalone service.

Consul: We use HashiCorp's Consul as a mediator for state changes, as described below.

Fig. 1 includes a detailed view of the total message path among all components of MFT:

1. A Client (such as a science gateway) submits a transfer request to the MFT API Service.
2. The API Service delivers the transfer request to the Consul message store.
3. The MFT Controller fetches the request, determines the target agent to do the transfer, and puts another message to the Consul message store for delivering it to that agent.
4. The target agent fetches the message and determines the resource IDs and secret IDs required for the transfer.
5. The Agent talks to the Resource Service to fetch resource information for the received IDs.
6. The Agent fetches the credentials required to talk to the resource endpoints by talking to the Secrets Service.
7. Once both resource and credential data are collected, the agent starts the transfer using compatible protocols.

IV. DATA TRANSFER SCENARIOS

Fig. 2 illustrates the main concepts of an Airavata MFT deployment: a controller communicates with agents to issue control methods, and the agents implement the data transfer. Agents can be deployed in several scenarios. Black solid lines indicate data paths while dotted lines indicate control paths. Any [...]
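The seven-step message path in Section III separates the control path (requests handed off through Consul) from the data path (the byte-level copy performed by an Agent). The following Python sketch is illustrative only: Consul is modeled as an in-memory dictionary, and every class, method, and key name used here (TransferRequest, MFTApiService, Controller, Agent, mft/controller/..., and so on) is a hypothetical stand-in rather than the actual Airavata MFT code base or gRPC API.

# Minimal, self-contained sketch of the seven-step control path from Section III.
# Consul is modeled as an in-memory dict; all names are hypothetical stand-ins,
# not the real Airavata MFT classes or gRPC API.
import uuid
from dataclasses import dataclass


@dataclass
class TransferRequest:
    source_resource_id: str
    dest_resource_id: str
    source_secret_id: str
    dest_secret_id: str


class ConsulStore:
    """Stand-in for the HashiCorp Consul key-value store used as a message mediator."""
    def __init__(self):
        self._kv = {}

    def put(self, key, value):
        self._kv[key] = value

    def pop_by_prefix(self, prefix):
        for key in list(self._kv):
            if key.startswith(prefix):
                return key, self._kv.pop(key)
        return None, None


class MFTApiService:
    """Steps 1-2: accept a client request and deliver it to the message store."""
    def __init__(self, consul):
        self.consul = consul

    def submit_transfer(self, request):
        transfer_id = str(uuid.uuid4())
        self.consul.put(f"mft/controller/{transfer_id}", request)
        return transfer_id


class Controller:
    """Step 3: fetch the request and re-post it for the chosen agent."""
    def __init__(self, consul):
        self.consul = consul

    def schedule(self, agent_id):
        key, request = self.consul.pop_by_prefix("mft/controller/")
        if request is not None:
            transfer_id = key.rsplit("/", 1)[-1]
            self.consul.put(f"mft/agents/{agent_id}/{transfer_id}", request)


class Agent:
    """Steps 4-7: resolve resource and secret IDs, then run the data path."""
    def __init__(self, agent_id, consul, resource_svc, secret_svc):
        self.agent_id = agent_id
        self.consul = consul
        self.resource_svc = resource_svc   # dicts stand in for the Resource and
        self.secret_svc = secret_svc       # Secrets Services the agent would call

    def poll_and_transfer(self):
        _, request = self.consul.pop_by_prefix(f"mft/agents/{self.agent_id}/")
        if request is None:
            return
        src = self.resource_svc[request.source_resource_id]    # step 5
        dst = self.resource_svc[request.dest_resource_id]
        src_cred = self.secret_svc[request.source_secret_id]   # step 6
        dst_cred = self.secret_svc[request.dest_secret_id]
        # Step 7: the data path. A real agent would open protocol-specific
        # connections (SCP, S3, ...) here; only the agent ever touches the bytes.
        print(f"copying {src} -> {dst} using credentials {src_cred}/{dst_cred}")


if __name__ == "__main__":
    consul = ConsulStore()
    api = MFTApiService(consul)
    controller = Controller(consul)
    agent = Agent(
        "agent-1", consul,
        resource_svc={"res-src": "scp://hpc.example.edu/data/in.dat",
                      "res-dst": "s3://example-bucket/in.dat"},
        secret_svc={"sec-src": "ssh-key-ref", "sec-dst": "s3-token-ref"},
    )
    api.submit_transfer(TransferRequest("res-src", "res-dst", "sec-src", "sec-dst"))
    controller.schedule("agent-1")   # control path only; no data moves here
    agent.poll_and_transfer()        # data path runs inside the agent

In an actual deployment the Controller and Agents would presumably run as long-lived services that watch Consul keys rather than being invoked in sequence within a single process, but the separation shown here, where only the Agent ever touches the transferred bytes, is the point of the design.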