Introduction to Pawsey

Advancing Science through Supercomputing

Course Objectives

After completing the session, you will:
• know some scenarios for researcher use of Pawsey services
• understand what the Pawsey Supercomputing Centre is and what it offers
• know how you can benefit from Pawsey services
• know how to get access to Pawsey resources

Computers in Research

How do researchers use computers?

Modelling and simulation: e.g. fluid dynamics, climate models

Finding patterns in data: genomics, finance, humanities

Collaboration: data sharing and data access portals

Visualisation: understanding complex, multidimensional data

Modelling and Simulation

Computer simulation is now a well-established technique for scientific research
• For problems that are too dangerous, too small, too big, or too expensive to run as experiments
• For too many scenarios to run as experiments
• Scientists are ever more ambitious in what they want to simulate

Case Study – Stents

• Study to model the blood flow before and after inserting stents

• CT scans used to create volume models of the patients

• Mesh of aorta and stents created

• Computational fluid dynamics used to simulate the pulsing flow through the model

• Researchers: Thanapong Chaichana & Zhonghua Sun

Data storage / management

CATAMI Classification Scheme

Problem: large amounts of data, unstructured, hard to find
Solution: standards, policy, storage infrastructure & access services

Analysing materials using CT scanning

Micro CT (Skyscan)

Source: http://vsg3d.com

Typical CT image set sizes:
• 512×512, 100–500 slices: 26 MB – 130 MB
• 1000×1000, 500–1000 slices: 500 MB – 1 GB
• 2048×2048, 1000–2000 slices: 16 GB – 32 GB

Why use a supercomputer?
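The CT image-set sizes quoted above follow directly from voxel count × bytes per voxel. A quick sanity check in plain Python; the byte depths per voxel (8-bit for the smaller scans, 32-bit for the 2048×2048 scans) are my assumption, chosen because they reproduce the quoted figures:

```python
def image_set_bytes(width, height, slices, bytes_per_voxel):
    """Uncompressed size of a CT image stack in bytes."""
    return width * height * slices * bytes_per_voxel

# 8-bit voxels (assumed) reproduce the smaller figures:
print(image_set_bytes(512, 512, 100, 1) / 1e6)     # ~26 MB
print(image_set_bytes(1000, 1000, 1000, 1) / 1e9)  # ~1 GB

# 32-bit voxels (assumed) reproduce the 2048x2048 figures:
print(image_set_bytes(2048, 2048, 1000, 4) / 1e9)  # ~16.8 GB
```

A few hundred such scans quickly pushes a study into the terabyte range.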

Problem is too big to process on a standard computer

“Too big” may refer to data and/or processing:
• Data size > a few terabytes
• Program memory > a few gigabytes
• Calculations that take weeks or months
• Too many scenarios or models

Or… may want to move to a more detailed model that will require the above.

Why use a data store?

Easily share data. Data is available and discoverable.

Data provenance ensures data is available for a long time.
• Helps researchers share data
• Allows citation of data sets
• Allows longitudinal data studies
• Verification of work carried out at a specific time (time stamp)
• Sometimes a publication or funding requirement
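Sharing aside, sheer volume matters: moving big data over an ordinary network link takes a long time. A rough estimate in plain Python (the data size and link speed below are illustrative assumptions, not Pawsey figures):

```python
def transfer_hours(size_bytes, link_bits_per_sec):
    """Hours to move a data set at the given line rate (protocol overhead ignored)."""
    return size_bytes * 8 / link_bits_per_sec / 3600

# Illustrative assumption: 20 TB of results over a 100 Mbit/s institutional link.
print(round(transfer_hours(20e12, 100e6)))  # ~444 hours, i.e. weeks of transfer
```

Estimates like this are why large data sets are stored next to where they will be processed.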

Data may need to be accessible by other researchers, or may simply be too big to move
• Store the data near where it will be processed
• Not practical or too expensive to transfer large volumes

Why use Remote Scientific Visualisation?
• Advanced computer algorithms can provide insight
• No need to transport large data sets
• Use low-specification devices such as laptops to view complex data

Pawsey Supercomputing Centre

Purpose
To be a supercomputing centre of international standing, with the infrastructure and people that enable solutions to big science problems, bring benefits to government, society and industry, and provide future prosperity for the State and Nation.

About Pawsey

Five Partners:
• CSIRO
• Curtin
• ECU
• Murdoch
• UWA

Funding:
• Partners
• State government
• Federal government

Pawsey History

2001–2009: state/partner funded
2009 onwards: federal/state/partner funded

• Larger supercomputers
• More competitive to get access to supercomputing resources
• Higher expectations on science outcomes and simulation sizes

iVEC, Pawsey, and Super Science

In 2009 the Federal Government chose iVEC to establish and manage the $80M Pawsey Centre project.
• The $80M included the data centre, supercomputers, data storage, visualisation, networking, and a pre/post-processing cluster
• iVEC was renamed the Pawsey Supercomputing Centre in 2014

Building a Science Engine

A science engine connects: data collection / generation; processing (supercomputing / cloud computing); analysis / visualisation; publication / archiving of products; and collaboration throughout.

From the data perspective

The data lifecycle: planning → creating → organising → analysis → sharing / storing → publication (DOI) → data re-use or disposal.

Example – ASKAP

Operations: instrument → supercomputer (processing) → archive at the Pawsey Centre → databases (sharing), with a vis cluster driving a remote display for quality control.
Research: supercomputer (analysis) → product archive / science databases (Pawsey Centre or elsewhere) → publication / display.

What is a Supercomputer?

Wikipedia - “A supercomputer is a computer at the frontline of current processing capacity, particularly speed of calculation”.

Many computers (nodes) connected and working together.

• Shared resource – submit jobs to a queue
• Accessed remotely

Pawsey Services
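Submitting to the queue means writing a short job script and handing it to the scheduler. A minimal sketch, assuming the SLURM scheduler (used on Pawsey systems); the partition, project code and program name are placeholders:

```shell
#!/bin/bash -l
# Minimal batch job sketch. All names below are placeholders.
#SBATCH --job-name=myjob
#SBATCH --nodes=4            # number of nodes requested
#SBATCH --time=01:00:00      # wall-time limit (hh:mm:ss)
#SBATCH --account=myproject  # project code the time is charged to
#SBATCH --partition=workq    # queue/partition name

# srun launches the program across the allocated nodes.
srun ./my_simulation
```

Submit with `sbatch myjob.sh`; `squeue --user=$USER` then shows the job waiting or running in the queue.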

• Compute: supercomputers, cloud computing
• Data: storage services, data analytics
• Remote visualisation
• Expertise – including workflows, software development, scientific visualisation
• Training
• Internships

Magnus Supercomputer
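The headline figures below are internally consistent; deriving the per-node and per-core numbers from them in plain Python (only the quoted numbers are used):

```python
# All figures come straight from the Magnus specifications.
nodes = 1488           # Cray XC40 nodes
total_cores = 35_712   # Intel Haswell cores
peak_flops = 1.1e15    # 1.1 PFLOPS peak performance

cores_per_node = total_cores // nodes            # cores per node (derived)
peak_per_core = peak_flops / total_cores / 1e9   # GFLOPS per core at peak

print(cores_per_node)        # 24
print(round(peak_per_core))  # ~31 GFLOPS
```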

Cray XC40 containing 1488 nodes; 35,712 Intel Haswell cores giving 1.1 PFLOPS peak performance.
• Cray Aries interconnect – runs jobs of 10,000s of cores efficiently
• Linux
• 3 PB high-speed temporary storage

Galaxy Supercomputer

• Cray XC30: 472 CPU-only nodes, 64 CPU+GPU nodes; 325 TFLOPS peak performance; Cray Aries interconnect; Linux
• In production since mid-2013; full capacity August 2014

For radio astronomy users – both operations and research.

Zeus Cluster

• Remote visualisation
• Pre/post-processing for Magnus
• Large shared memory – more suitable than a supercomputer for some bioinformatics codes and mesh generation

Zeus – a cluster of 40 nodes, each with 128, 256 or 512 GB memory and either an Nvidia Quadro K5000 or a K20 GPU.
Zythos – one large-memory node in the Zeus cluster: 6 TB memory, 4× K20 GPUs.

Pawsey Research Cloud

Pawsey Research Cloud (funded through NeCTAR). Suitable for data portals and long-running small simulations. E.g. R, Python, Java.

• Instances can have up to 16 cores and 64 GB memory (via a national allocation)
• A project trial allocation is 2 cores for 3 months – can be run as two single-core machines or one two-core machine
• Start from a bare Linux image and install your own software, or access virtual laboratories and tools where provided

A Pawsey Datastore

• Online storage of big data
• 20 PB storage
• Hard-disk cache (hierarchical storage)
• Tape is highly energy efficient!

Pawsey Storage Services

Available storage:
• Pure disk (low latency) storage
• Mixed disk and tape (medium latency) storage
• Pure disk has a fixed capacity and more stringent quota requirements

At Pawsey:
• Data management tools to host and share high-quality data collections
• Dedicated staff to support researchers using data services

Remote Visualisation

Visualise from the Zeus cluster.
• No need to move data outside the Pawsey Centre!

• Large memory (≥128 GB per node)
• Nvidia Quadro graphics
• Linux-only applications
• High-speed network (10 Gbps)

Supercomputing Expertise

• Parallel programming
• Supercomputing workflows
• Computational science
• GPU accelerators
• Cloud computing
• Scientific visualisation
• Data-intensive computing

Data Expertise

• Volume data transfers & movement (including staging)
• Data analytics (including Hadoop)
• Image compression
• Data asset management (via LiveARC), including access
• Data management
• Data provenance
• Data ethics & copyright
• Use of the Research Cloud for data-intensive applications

Visualisation Expertise

• Volume visualisation and segmentation
• High-resolution image capture and interactive display
• Image and volume processing
• 3D reconstruction from photographs (photogrammetry)
• Novel data presentation, including 3D printing and glasses-free 3D display
• Computational geometry

Australian Scientific Computing and eResearch Centres

Working with Pawsey

Access to Pawsey Supercomputers

Annual calls (open in Oct/Nov) for:
• NCMAS (25%)
• Energy & Resources Scheme (15%)
• Pawsey Partner Scheme (30%)
These are by competitive merit, and free.

Responsive-mode for:
• Director’s Share (5%) – smaller allocations of time to support development and initial experiments. Industry can buy time in this share.
• Astronomy share (25%) for telescope operations

Access to Pawsey Storage

Pawsey Data Portal (https://data.pawsey.org.au)
• Web interface (navigate, search, edit, drag-and-drop upload, download)
• Script upload (ashell.py)
• Online application process for data >5 TB* [*some exceptions]: https://data.pawsey.org.au/apply/
• Designed for collaboration, not for the ‘primary’ copy of data

Be part of the intern program

A workplace-based learning opportunity in the scientific computing space
• 10-week research project over summer for undergraduates
• $6,000 tax free

Annual call:
• Projects in June/July
• Students in August/September

You can:
• Propose a project to explore new areas
• Encourage your students to apply

Join the community

News, events and activities:
• Subscribe to the Pawsey Friends mailing list (via our website)
• Come to seminars
• Attend training
• Suggest training

Find new collaborators:
• Meet other researchers in WA and beyond
• We can introduce you to industry and other researchers
• We can co-host visitors

Documentation

https://portal.pawsey.org.au/

• Documentation
• Pawsey-supported software list
• Maintenance / outages

For application-specific issues not related to HPC, try Google and mailing lists.

Citing Pawsey

Publications:
• Acknowledge Pawsey, e.g. “The work was supported by Pawsey through the use of advanced computing resources”.

Grant applications:
• When applying for grants, add in-kind contributions from Pawsey. Contact us for the amounts to use.

Thank you – questions welcome

[email protected]

For more information see

http://www.pawsey.org.au/