
A webinar on CSC’s Services for Bio-users (23.03.2020)

CSC – Finnish ICT competence centre for research, education, culture and public administration

Outline

• Accessing CSC Services
• CSC Supercomputing Environment (i.e., Puhti)
• CSC Data Storage Environment (i.e., Allas)
• CSC Cloud Services (e.g., cPouta)
• Other Relevant Services for Bio-users
• Take Home Message

Accessing CSC Services

How to get access?

Your Haka/Virtu user ID gives you access to our services.
• Use the CSC customer portal MyCSC (https://my.csc.fi/welcome)
• Register to get a personal CSC user account
• If your organization does not have Haka, please contact our customer service

Customer service
• Support and guidance: [email protected]
• Weekdays 8.30–16.00

More about our customer portal – my.csc.fi

• Register as a CSC customer
• Manage your account
• Manage your personal information
• Create projects
• Apply for resources
• Add more services
• Add members to projects

Visit: https://docs.csc.fi/accounts/

CSC Supercomputing Environment

Visit: https://docs.csc.fi/computing/overview/

CSC supercomputing environment

Why:
• Large computational and memory requirements
• Limited scalability on local clusters
• Parallel computing is needed
• Time-consuming operations
• Non-optimised programs

CSC options:
• Puhti – successor of Taito
  o Puhti – supercomputer with Intel CPUs
  o Puhti-ai – supercomputer with GPUs
• Mahti – successor of Sisu (close to piloting phase)
• Lumi – EuroHPC (system installations: Q4/2020)

Some basic info on the Puhti supercomputer

• Pre-installed bio-software stack on Puhti is listed at: https://docs.csc.fi/apps/
• Understand the Puhti workspace directories, their default quotas and maximum number of files
  o HOME – user-specific / small data …
  o PROJAPPL – project-specific / your installations / sharing project code …
  o SCRATCH – project-specific / actual data / temporary space / automatic cleaning / Billing units
• Support for the module environment
  o module command module-name
• Slurm configuration for running batch jobs
  o Note: #SBATCH --account=project_XXXXX
• Support for interactive jobs
• Support for Singularity containers
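To make the module and Slurm bullets above more concrete, here is a minimal sketch of a batch job file. It only writes the file and does not submit anything. The file name my_job.sh, the job name, the partition name small, and the time and memory values are assumptions chosen for illustration; project_XXXXX and module-name are the placeholders from the slide and must be replaced with your own project and a real module.

```shell
#!/bin/bash
# Sketch only: write a minimal Slurm batch script to a file without submitting it.
# Job name, partition, time limit and memory are illustrative assumptions.
cat > my_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=example_job
#SBATCH --account=project_XXXXX
#SBATCH --partition=small
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=1G

# Load software through the module environment (placeholder module name).
module load module-name

# Run the actual work as a job step.
srun echo "Hello from $(hostname)"
EOF

echo "Wrote my_job.sh"
```

On Puhti, a file like this would typically be submitted with sbatch my_job.sh once the placeholders have been replaced; check the partition names and limits in the CSC documentation before use.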

DEMO: Getting familiar with basic usage of Puhti

CSC Data Storage Environment (Allas)

https://docs.csc.fi/data/Allas/

Allas – object storage service

• Active data
• Project-based
• Sharing data

Allas – first steps for Puhti

• Use https://my.csc.fi to apply for Allas access for your project
  o Allas is not automatically available
• In Puhti and (in future) Mahti, set up the connection to Allas with the commands:
    module load allas
    allas-conf
• Refer to our manual pages and start using Allas, e.g. with a-tools:
  https://docs.csc.fi/data/Allas/introduction/

Allas – a-tools

• A-tools provide an easy and safer way to use Allas
  o Developed for the CSC server environment (Puhti, Mahti), but you can also install the tools on other Linux and Mac machines.
  o Unlike rclone, a-tools do not overwrite or remove data without asking!
  o Automatic packing and compression.
  o Uses default bucket names based on the directories of Puhti.

Visit: https://docs.csc.fi/data/Allas/using_allas/a_commands/

Example command with a-put

Puhti: /scratch/project_123 (SCRATCH)
  case1/
    data1.txt
    data2.txt
    data3.txt

Allas: quota for project_123
  123-puhti-SCRATCH
    case1.tar.zst
    case1.tar.zst_ameta

Command: a-put case1

DEMO: Getting familiar with basic usage of Allas
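The naming shown in the a-put example above can be sketched in a few lines of shell. This is not the a-put implementation: the helper default_allas_names is hypothetical and only reproduces the pattern visible in the slide (bucket named after the project number and SCRATCH, one packed object plus an _ameta object), assuming the directory sits directly under /scratch/project_NNN. The real tool may name things differently in other cases.

```shell
#!/bin/bash
# Sketch only: reproduce the default names from the slide's a-put example.
# Assumption: the directory is a direct child of /scratch/project_<number>.
default_allas_names() {
    local dir_path="$1"
    local project_dir dir_name project_number

    # /scratch/project_123/case1 -> project_123
    project_dir="$(basename "$(dirname "$dir_path")")"
    # project_123 -> 123
    project_number="${project_dir#project_}"
    # /scratch/project_123/case1 -> case1
    dir_name="$(basename "$dir_path")"

    echo "bucket: ${project_number}-puhti-SCRATCH"
    echo "object: ${dir_name}.tar.zst"
    echo "metadata: ${dir_name}.tar.zst_ameta"
}

default_allas_names /scratch/project_123/case1
```

For the path in the slide this prints the bucket 123-puhti-SCRATCH with the objects case1.tar.zst and case1.tar.zst_ameta, matching the example.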

CSC Cloud Services (e.g., Pouta)

Use cases

• "We need root access"
• Deploying tools with web interfaces
• CSC Private Cloud (ePouta) for sensitive data
• Don't want to wait in batch queues for jobs to run
• Advanced users – able to manage servers
• Difficult workflows – can't run on Puhti

CSC cloud service models

• Infrastructure as a Service (IaaS): CSC's ePouta/cPouta
• Platform as a Service (PaaS): CSC's RAHTI, CSC's notebooks.csc.fi
• Software as a Service (SaaS): CSC's Chipster, ...

DEMO: a few examples of deploying web tools on cPouta

ePouta IaaS Cloud for sensitive data

• ePouta is a cloud computing environment (Infrastructure as a Service, IaaS) designed for processing sensitive data
• It allows customers to access, use and manage virtualized infrastructure using a self-service model
• Further development is ongoing through ELIXIR activities

Other relevant services for bio-users

Notebooks

• Easy to use: no software installations, no firewall rules, no extra registrations. Log in with your Haka account.
• Blueprints available:
  o Jupyter Notebooks: customize your own interactive working environment
  o RStudio servers: data analytics and visualization
  o Apache Spark: crunch your big data
  o TensorFlow and Keras: deep learning & data analytics

Visit: https://notebooks.csc.fi

Chipster

• Easy to use
• 450 analysis tools
  o Single cell RNA-seq
  o RNA-seq
  o miRNA-seq
  o 16S amplicon seq
  o ChIP-seq
  o etc.
• Tutorials on YouTube
• Log in with Haka
• https://chipster.csc.fi

Training portfolio

https://www.csc.fi/training

Topic areas: Computing Platforms, Methods & Software, Programming, High-Performance Computing, Data, Networking, IT Security

Courses and topics:
• Linux 1, 2 and 3
• CSC Computing Platforms
• Cloud computing
• System workshops
• Finite Element Methods (Elmer)
• Comp. Fluid Dynamics (OpenFOAM)
• Molecular Dynamics (Gromacs)
• Quantum Chemistry (GPAW)
• Next-Generation Sequencing
• Visualisation
• Fortran
• Python / R
• Scripting
• PGAS languages
• Parallel programming
• Accelerators
• Optimisation
• Debugging
• Parallel I/O
• Data Intensive Computing
• Data Management
• Staging & Storage
• Meta-data Repositories
• Network Administration
• Network Technologies
• Network Protocols
• Network Services
• Network Security
• Secure IT Practices
• System Security

Watch webinars on YouTube; e-learning material is also available.

• CSC Summer School in HPC
• CSC Winter School in Bioinformatics
• CSC Spring School in Comp. Chemistry

Learning materials for bio-users

• Course & eLearning materials, tutorials and webinar recordings for bioscientists:
  o https://research.csc.fi/bioscience-learning-materials
  o https://research.csc.fi/rnaseq-tutorial
• Chipster: YouTube channel & course material packages
  o Course materials available for:
    o RNA-seq data analysis
    o Single cell RNA-seq data analysis
    o Virus detection using small RNA-seq
    o Community analysis of amplicon sequencing data (16S)
    o Detection and annotation of genomic variants
    o ChIP-seq data analysis
    o Microarray data analysis

https://research.csc.fi/biosciences

Fairdata.fi

• National integrated services for storing, describing, sharing and preserving research data

• Provided by MinEdu

• Produced by CSC and National Library of Finland

• Make your data safe, documented and citable
  o IDA – Research data storage service
  o ETSIN – Research data finder
  o QVAIN – Research dataset metadata tool
  o FAIRDATA-PAS – Digital preservation for research data

Take Home Message

• Manage your CSC services via our customer portal: my.csc.fi
• Make use of CSC resources for your research
  o Resources are (mostly) free for open science research
  o The CSC environment is different from a laptop or a single workstation
• Participate in CSC training, read the materials and watch webinars on YouTube
• CSC user documentation pages: docs.csc.fi
• Join the [email protected] e-mail list and get our bioNewsletter
• Support and guidance: [email protected]