CSC's Services for Bio-Users

A webinar on CSC's Services for Bio-users (23.03.2020)
CSC – Finnish ICT expertise centre for research, education, culture and public administration

Outline
• Accessing CSC Services
• CSC Supercomputing Environment (i.e., Puhti)
• CSC Data Storage Environment (i.e., Allas)
• CSC Cloud Services (e.g., cPouta)
• Other Relevant Services for Bio-users
• Take Home Message

Accessing CSC Services

How to get access?
Your Haka/Virtu user ID is your access to our services.
• Use the CSC customer portal MyCSC (https://my.csc.fi/welcome)
• Register to get a personal CSC user account
• If your organization does not have Haka, please contact our customer service

Customer service
• Support and guidance: [email protected]
• Weekdays 8.30–16.00

More about our customer portal – my.csc.fi
• Register as a CSC customer
• Manage your account
• Manage your personal information
• Create projects
• Apply for resources
• Add more services
• Add members to projects
Visit: https://docs.csc.fi/accounts/

CSC Supercomputing Environment
Visit: https://docs.csc.fi/computing/overview/

CSC supercomputing environment
Why:
• Heavy computational and memory requirements
• Limited scalability on local clusters
• Parallel computing is needed
• Time-consuming operations
• Non-optimised programs
CSC options:
• Puhti – the successor of Taito
  o Puhti – supercomputer with Intel CPUs
  o Puhti-ai – supercomputer with GPUs
• Mahti – the successor of Sisu (close to piloting phase)
• Lumi – EuroHPC (system installations: Q4/2020)

Some basic info on Puhti Supercomputer
• Pre-installed bio-software stack on Puhti is listed at: https://docs.csc.fi/apps/
• Understand the Puhti workspace directories, their default quotas and maximum number of files
  o HOME – user-specific / small data …
  o PROJAPPL – project-specific / your installations / sharing project code …
  o SCRATCH – project-specific / actual data / temporary space / automatic cleaning / billing units
• Support for module environment
  o module command module-name
• Slurm configuration for running batch jobs
  o note: #SBATCH --account=project_XXXXX
• Support for interactive jobs
• Support for Singularity containers

DEMO: Getting familiar with basic usage of Puhti

CSC Data Storage Environment (Allas)
https://docs.csc.fi/data/Allas/

Allas – object storage service
• Active data
• Project-based
• Sharing data

Allas – first steps for Puhti
• Use https://my.csc.fi to apply for Allas access for your project
  o Allas is not automatically available
• In Puhti and (in future) Mahti, set up the connection to Allas with the commands:
    module load allas
    allas-conf
• Refer to our manual pages and start using Allas with rclone or a-tools: https://docs.csc.fi/data/Allas/introduction/

Allas – a-tools
• A-tools provide an easy and safer way to use Allas
  o Developed for the CSC server environment (Puhti, Mahti), but you can install the tools on other Linux and Mac machines too.
  o Unlike rclone, a-tools do not overwrite or remove data without asking!
  o Automatic packing and compression.
  o Uses default bucket names based on the directories of Puhti
Visit: https://docs.csc.fi/data/Allas/using_allas/a_commands/

Example command with a-put
Command: a-put case1
• Puhti (/scratch/project_123): directory case1/ containing data1.txt, data2.txt and data3.txt
• Allas (quota for project_123): bucket 123-puhti-SCRATCH containing case1.tar.zst and case1.tar.zst_ameta

DEMO: Getting familiar with basic usage of Allas

CSC Cloud Services (e.g., Pouta)

Cloud computing use cases
• "We need root access"
• Deploying tools with web interfaces
• CSC private cloud (ePouta) for sensitive data
• You don't want to wait in batch queues for your jobs to run
• Advanced users who are able to manage servers
• Complex workflows that can't run on Puhti

CSC cloud service models
• Infrastructure as a Service (IaaS): CSC's ePouta/cPouta
• Platform as a Service (PaaS): CSC's Rahti, CSC's notebooks.csc.fi
• Software as a Service (SaaS): CSC's Chipster, …

DEMO: a few examples of deploying web tools on cPouta

ePouta IaaS Cloud for sensitive data
• ePouta is a cloud computing environment (Infrastructure as a Service, IaaS) designed for processing sensitive data
• It allows customers to access, use and manage virtualized infrastructure using a self-service model
• Further development is ongoing through ELIXIR activities

Other relevant services for bio-users

Notebooks
• Easy to use: no software installations, no firewall rules, no extra registrations; log in with your Haka account.
• Blueprints available:
  o Jupyter Notebooks: customize your own interactive working environment
  o RStudio servers: data analytics and visualization
  o Apache Spark: crunch your big data
  o TensorFlow and Keras: deep learning and data analytics
Visit: https://notebooks.csc.fi

Chipster
• Easy to use
• 450 analysis tools
  o Single cell RNA-seq
  o RNA-seq
  o miRNA-seq
  o 16S amplicon seq
  o ChIP-seq
  o etc.
• Tutorials on YouTube
• Log in with Haka
• https://chipster.csc.fi

Training portfolio
https://www.csc.fi/training
• Computing Platforms: Linux 1, 2 and 3; CSC Computing Platforms; Cloud computing; System workshops
• Methods & Software: Finite Element Methods (Elmer); Comp. Fluid Dynamics (OpenFOAM); Molecular Dynamics (Gromacs); Quantum Chemistry (GPAW); Next-Generation Sequencing; Visualisation
• Programming: Fortran; Python / R; Scripting; Parallel programming; PGAS languages
• High-Performance Computing: Parallel programming; Accelerators; Optimisation; Debugging; Parallel I/O
• Data: Data Intensive Computing; Data Management; Staging & Storage; Parallel I/O; Meta-data Repositories
• Networking: Network Administration; Network Technologies; Network Protocols; Network Services; Network Security
• IT Security: Secure IT Practices; Network Security; System Security
• Also: webinars on YouTube, e-learning material
• CSC Summer School in HPC; CSC Winter School in Bioinformatics; CSC Spring School in Comp. Chemistry

Learning materials for bio-users
• Course & eLearning materials, tutorials and webinar recordings for bioscientists:
  o https://research.csc.fi/bioscience-learning-materials
  o https://research.csc.fi/rnaseq-tutorial
• Chipster: YouTube channel & course material packages
  o Course materials available for:
    o RNA-seq data analysis
    o Single cell RNA-seq data analysis
    o Virus detection using small RNA-seq
    o Community analysis of amplicon sequencing data (16S)
    o Detection and annotation of genomic variants
    o ChIP-seq data analysis
    o Microarray data analysis
https://research.csc.fi/biosciences

Fairdata.fi
• National integrated services for storing, describing, sharing and preserving research data
• Provided by the Ministry of Education and Culture (MinEdu)
• Produced by CSC and the National Library of Finland
• Make your data safe, documented and citable
  o IDA – research data storage service
  o ETSIN – research data finder
  o QVAIN – research dataset metadata tool
  o FAIRDATA-PAS – digital preservation for research data

Take Home Message
• Manage your CSC services via our customer portal: my.csc.fi
• Make use of CSC resources for your research
  o Resources are (mostly) free for open science research
  o The CSC environment is different from a laptop or a single workstation
• Participate in CSC training, read the materials and watch webinars on YouTube
• CSC user documentation pages: docs.csc.fi
• Join the [email protected] e-mail list and get our bioNewsletter
• Support and guidance: [email protected]
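
Worked example: default a-put naming

The a-put example in the Allas section maps a directory on Puhti to one bucket and two objects in Allas. As a reading aid, that mapping can be written out as a small Python sketch. It assumes the single example on the slide generalises (bucket = project number, system name, upper-cased disk area), which the slides do not state. The function name sketch_a_put_target and its system parameter are hypothetical, and this is not how a-tools itself is implemented; the a-tools documentation page linked in that section describes the actual behaviour.

```python
from pathlib import PurePosixPath


def sketch_a_put_target(directory: str, system: str = "puhti") -> dict:
    """Mirror the naming shown in the slide's a-put example (illustrative only)."""
    path = PurePosixPath(directory)
    # Assumed layout: /<area>/project_<number>/<name>,
    # e.g. /scratch/project_123/case1
    parts = path.parts
    if len(parts) < 4 or not parts[2].startswith("project_"):
        raise ValueError(f"unexpected path layout: {directory}")
    area = parts[1].upper()                      # "scratch" -> "SCRATCH"
    project_number = parts[2][len("project_"):]  # "project_123" -> "123"
    archive = f"{path.name}.tar.zst"             # packed, compressed directory
    return {
        "bucket": f"{project_number}-{system}-{area}",
        # The slide shows a second object with the suffix "_ameta".
        "objects": [archive, f"{archive}_ameta"],
    }


print(sketch_a_put_target("/scratch/project_123/case1"))
# {'bucket': '123-puhti-SCRATCH', 'objects': ['case1.tar.zst', 'case1.tar.zst_ameta']}
```

The sketch only reproduces names; uploading still happens with the commands from the slides (module load allas, allas-conf, a-put case1).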