Introduction to HPC in Canada

Erming Pei

Research Computing Group, UAlberta / Compute Canada / WestGrid

Outline & Schedule

• 10:00 Introduction to Compute Canada (15’)
• 10:15 Introduction to WestGrid (15’)
• 10:30 Q&A - 1 (5’)
• 10:35 Break (10’)
• 10:45 Introduction to HPC (40’)
• 11:25 Q&A - 2 (5’)

Introduction to Compute Canada

About Compute Canada

• Compute Canada integrates 4 regional HPC consortia across the country – provides a shared HPC/ARC infrastructure across Canada – supports world-class, leading-edge research activities.

• CC aggregates petaflops of computing power and petabytes of storage capacity over Canada's high-performance networks.

• CC provides overall services including infrastructure, applications, operations and user support for users nationwide. Compute Consortia

Previously, there were 7 consortia. • ACENET • CLUMEQ • RQCHP • HPCVL • SciNet • SHARCNET • WestGrid

These have since been consolidated into four consortia. • WestGrid • Compute Ontario • Calcul Québec • ACENET Existing Systems & Resources

• ~40 Universities • ~27 Data Centers • ~50 Systems • ~200,000 cores, 2 Pflops, 20PB • ~100 research software packages • ~200 experts in utilization of ARC for research

https://www.westgrid.ca/events/responding-to-canadas-research-computing-needs New CC Systems

• UVic, GP1 (Cloud) • SFU, GP2 (General Purpose) • UW, GP3 (General Purpose) • UofT, LP (Large Parallel) Schedule of New CC Systems

Site/Service | Description | Availability | Resource
• GP1 - UVic | Large OpenStack Cloud | Sept. 2016 | 3,000 cores + 40% expansion (2017)
• GP2 - SFU | General-purpose cluster + Cloud partition | Feb. 2017 | 18,000 cores + 40% expansion (2017), 192 GPU nodes
• GP3 - Waterloo | Ditto | May 2017 | 19,000 cores + 40% expansion (2017), 64 GPU nodes
• LP - UToronto | Large parallel | Dec. 2017 | 66,000 cores
• National Storage Infrastructure (all 4 sites) | HSM + Object Storage | Oct. 2016 | Dozens of PBs (10 PB to start)

https://www.computecanada.ca/renewing-canadas-advanced-research-computing-platform/new-systems-at-four-national-sites/ Continuing Development

• Consolidation by 2018 – 5-10 Data Centres – 300,000 cores, 12 Pflops, 50+ PB

• 2016-17: Commissioning new systems while decommissioning old systems CC New Organization Chart

[Organization chart: Administration; TLC, SC, SLC; national teams for GP1, GP2, GP3, LP, Cloud, MON, NW, PSNT, RS, Storage, EOT, VIZ, DH, Bio-M, Bio-Info, SPNT, SWG]

https://staff.computecanada.ca/national_teams/chart CC Cloud Service

• Compute Canada currently operates two main cloud systems: Cloud West and Cloud East. Access CC clouds

• Cloud East: http://east.cloud.computecanada.ca • Cloud West: http://west.cloud.computecanada.ca • Access with your CC account. OwnCloud

• A Dropbox-like cloud storage service – hosted by WestGrid • Can access with WestGrid user/password Globus Online

• High performance data service • https://globus.computecanada.ca Globus Online

• Needs MyProxy authentication (WestGrid login/passwd) • Can select existing endpoints (GridFTP service in sites) • Can create your personal endpoint with “Globus Connect Personal” Intro to WestGrid About WestGrid

• WestGrid is one of four regional HPC consortia of Compute Canada • WestGrid itself has 15 partner institutions across British Columbia, Alberta, Saskatchewan and Manitoba.

[Map of WestGrid's 15 partner institutions across British Columbia, Alberta, Saskatchewan and Manitoba]

Overall Resources

• To date, WestGrid has more than 40,000 compute cores and 9 PB of storage space. • About 1,000 Compute Canada users from 475 projects are currently using WestGrid systems.

[Chart: HQP* supported, 2012/13. *HQP stands for highly qualified personnel]

Text and image source: Lindsay Sill, Intro to WestGrid 2013 WestGrid Staff

• Executive Director (Lindsay Sill)

• Director of Operations (Patrick Mann)

• Collaboration Coordinator

• Visualization Coordinator

• Site Leads

• Programmers

• System Analysts • System Administrators

Text and image source: Lindsay Sill, Intro to WestGrid 2013 WestGrid Facilities, UofA (Jasper)

• Processors: 4160 cores – 240 nodes with Xeon X5675 processors, 12 cores (2 x 6) and 24 GB of memory. – 160 nodes with Xeon L5420 processors, 8 cores (2 x 4) and 16 GB of memory. – Interconnect: • Infiniband QDR, 40 Gbit/s, with a 1:1 blocking factor • Infiniband DDR, 20 Gbit/s, with a 2:1 blocking factor – Storage: ~830TB (356TB Lustre + 280 TB storage servers + 192TB IS10K) – Quickstart: http://www.westgrid.ca/support/quickstart/jasper WestGrid Facilities, UofA (Hungabee)

• Processors: 2048 cores. A shared-memory multiprocessor comprising an SGI UV100 login node and an SGI UV1000 computational node, with 16 TB of memory. • Interconnect: ccNUMA (cache-coherent non-uniform memory access), a combination of Intel's QuickPath and SGI's NUMAlink • Storage: 53TB NFS, and 356TB Lustre shared with Jasper • Quickstart: www.westgrid.ca/support/quickstart/hungabee WestGrid Facilities, UBC (Orcinus)

• Processors: 9600 cores (3072 Intel Xeon E5450 quad-core/16GB RAM + 6528 Xeon X5650 six-core/24GB RAM) • Storage: ~450TB, Lustre • QuickStart: www.westgrid.ca/support/quickstart/orcinus WestGrid Facilities, UofC (Breezy)

• Processors: 384 cores (16-node Appro AMD cluster with quad-socket, 6-core AMD Istanbul processors (24 cores @ 2.4 GHz) per node, 256GB RAM/node) • Interconnect: 4X DDR InfiniBand • Storage: ~450TB, IBRIX • Quickstart: http://www.westgrid.ca/support/quickstart/breezy WestGrid Facilities, UofC (Lattice)

• Processors: 4096 cores – 512 nodes, each with 8 Intel Xeon L5520 cores (two quad-core processors) and 12 GB of memory. – Interconnect: • InfiniBand 4X QDR (Quad Data Rate) 40 Gbit/s, 2:1 blocking – Storage: 160 TB shared with Parallel and Breezy – Quickstart: http://www.westgrid.ca/support/quickstart/lattice WestGrid Facilities, UofC (Parallel)

• Processors: 7056 cores – 528 standard nodes with 12 Xeon E5649 cores and 24 GB of RAM. – 60 special nodes with 3 GPGPUs each (NVIDIA Tesla M2070s, 5.5 GB of memory each). • Interconnect: – InfiniBand 4X QDR (Quad Data Rate) 40 Gbit/s, 2:1 blocking – Storage: 160 TB shared with Lattice and Breezy – Quickstart: http://www.westgrid.ca/support/quickstart/lattice WestGrid Facilities, UM (Grex)

• Processors: 3792 cores (a 316-node SGI Altix XE cluster, each node with two 6-core Intel Xeon X5650 2.66GHz processors and 48-96GB of RAM) • Interconnect: Non-blocking Infiniband 4X QDR • Storage: >100TB • Quickstart: www.westgrid.ca/support/quickstart/glacier WestGrid Facilities, UVic (Hermes/Nestor)

• Processors: 4416 cores [2112 (Hermes), 2304 (Nestor) ] – IBM iDataplex server with eight 2.67-GHz Xeon x5550 cores with 24 GB of RAM – Dell C6100 servers with twelve 2.66-GHz Xeon x5650 cores and 24 GB of RAM • Interconnect: – 84 Hermes nodes use two bonded Gigabit/s links – New Hermes 4X QDR non-blocking, 32-40Gb/s • Storage: 1.2PB, GPFS • Quickstart: www.westgrid.ca/support/quickstart/hermes_nestor WestGrid Facilities, SFU (Bugaboo)

• Processors: 4584 cores – 16 nodes with Intel Xeon E5430 4-core processors, 16GB/node; – 254 nodes with Xeon X5650 6-core processors, 24GB/node – 16 nodes with Xeon X5355 quad-core processors, 16GB/node • Interconnect: Infiniband using a 288-port QLogic switch • Storage: ~700TB • Quickstart: www.westgrid.ca/support/quickstart/bugaboo WestGrid Facilities, USask (Silo)

• Disk: 4.2 PB raw total, 3.15 PB usable – 600 x 1TB SATA drives, RAID 6 – 1800 x 2TB SATA drives, RAID 6 • Tape: IBM LTO 3584 tape library – ~3PB total, 1460 x LTO4 tapes, 920 LTO5 tapes. • Backup System: IBM Tivoli Storage Manager (TSM) – Quickstart: http://www.westgrid.ca/support/quickstart/silo Site Status

https://www.westgrid.ca/support/system_status Use CC/WestGrid

• Apply for a CC/WestGrid account • Get a Grid Certificate / Proxy • Existing Resource Classification • New Resource Allocation • Software • Site status • Technical Support CC/WestGrid Account

1. First ask your PI to apply for a Compute Canada account if they don't already have one.

2. Then apply for your own Compute Canada account as part of your PI's project.

3. Your PI approves your application

4. You apply for a consortium account, e.g. WestGrid or ACENET

Note: It takes a couple of days for your account to be created on all sites.

https://www.westgrid.ca/support/accounts/getting_account Grid Certificate

1. Log in to http://portal.westgrid.ca and “Request a Grid Certificate” 2. On the “My Account” webpage, you will see two buttons for downloading your Grid certificate and private key. Grid Proxy

• A Grid proxy is used when submitting Grid jobs or transferring files across the Grid. (It has a limited lifetime and limited privileges.)

• Users just need to log in to any WestGrid site and then run: – myproxy-logon Resource Classification

Program Type | Sites
• Serial | Bugaboo, Hermes, Jasper
• Parallel | Bugaboo, Nestor, Orcinus, Lattice, Parallel, Jasper, Grex
• SMP Parallel | Breezy, Hungabee
• Large memory | Grex, Breezy, Hungabee
• Visualization | Parallel
• Gaussian | Grex
• Matlab | Orcinus (Distributed Computing Toolbox), Jasper/Hungabee (UofA license), etc.
• Storage | Silo, Bugaboo

Software

• WestGrid has both free and commercial software. • You can use software packages installed on WestGrid – check the webpage below to see if a certain software release is already available on WestGrid • Software list webpage https://www.westgrid.ca/support/software WestGrid support

For any questions, you can email [email protected] New Resource Allocation

• RAC (Resource Allocation Competition) – https://www.westgrid.ca/support/accounts/resource_allocations

• RAC = RPP + RRG – RPP: Research Platforms and Portals (scientific/technical review needed)

– RRG: Resources for Research Groups (scientific/technical review needed)

• RAS: Rapid Access Service (formerly “Default Allocation”). No scientific/technical review needed

Email to:

[email protected] / [email protected] New RAC Schedule Introduction to HPC Outline

• What is HPC • Capability vs. Capacity • Programming model – Serial/Parallel • Architecture – SMP/DSM/MPP, UMA/NUMA/COMA • Interconnect – PCI(E)/Infiniband/NUMALink • Storage – RAID, Multipathing, Data – DAS/NAS/SAN – Parallel File Systems • Evolution of Computing – Mainframe, Cluster, Grid, Cloud, Big Data What is HPC?

• High Performance Computing – most generally refers to the practice of aggregating computing power in a way that delivers much higher performance than one could get out of a typical desktop computer or workstation in order to solve large problems in science, engineering, or business. Capability vs. Capacity

• Capability computing is typically thought of as using the maximum computing power to solve a single large problem in the shortest time. – e.g. A real-time weather simulation and prediction application.

• Capacity computing, in contrast, is typically thought of as using cost-effective computing power to solve a large number of small problems or a small number of big problems. – e.g. Many users accessing a web service simultaneously, or – analyzing a huge amount of HEP data by splitting it into many small pieces and distributing them across multiple cluster nodes.

Spectrum
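Before looking at where specific systems sit on this spectrum, here is a minimal Python sketch of the capacity approach just described: a large data set is split into independent chunks that are processed by several worker processes. The data and the analyse() function are placeholders, not a real HEP workflow.

```python
# Minimal sketch of capacity-style computing: many independent pieces of work
# spread over worker processes. The data and analyse() are placeholders.
from multiprocessing import Pool

def analyse(chunk):
    """Process one independent piece of the data set (stand-in computation)."""
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    events = list(range(1_000_000))                            # the "big" data set
    n_workers = 8
    chunks = [events[i::n_workers] for i in range(n_workers)]  # split into pieces

    with Pool(n_workers) as pool:          # each chunk is processed independently
        partials = pool.map(analyse, chunks)

    print("combined result:", sum(partials))
```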

• Capability → Capacity

• Hungabee: single system, 2048 cores, 16 TB of memory
• Breezy: 16 fat-node cluster, 256 GB/node
• Bugaboo: 256+ node cluster, 16-24 GB/node
• BlueGene/Q: 4096 low-power nodes, 65,536 processor cores, high-speed interconnect

Architectures

• By processor – SMP (Symmetric Multi-Processors) – DSM (Distributed Shared Memory) – MPP (Massively Parallel Processors) • By memory – UMA (Uniform Memory Access) – NUMA (Non-Uniform Memory Access) – COMA (Cache Only Memory Access) Evolution of Architectures

[Diagram showing the evolution of architectures: Message Passing, UMA, NUMA, COMA]

Programming Model

• Serial – Instructions are executed one after another on a single CPU. • Parallel – Computations are carried out concurrently on multiple processors. • SPMD: single program multiple data • MPMD: multiple programs multiple data Parallel Programming Paradigms/Tools
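As a concrete illustration of the SPMD model above, here is a minimal message-passing sketch using mpi4py (assumed to be installed alongside an MPI library such as Open MPI): every rank runs the same program on its own slice of the data, and the partial results are combined with a reduction.

```python
# spmd_sum.py -- minimal SPMD/message-passing sketch.
# Run with, e.g.:  mpirun -np 4 python spmd_sum.py
# Assumes mpi4py and an MPI implementation (e.g. Open MPI) are installed.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()          # this process's id
size = comm.Get_size()          # total number of processes

# Same program on every rank, but each rank works on its own slice of the data.
local_sum = sum(range(rank, 1_000_000, size))

# Combine the partial results on rank 0 (message passing happens under the hood).
total = comm.reduce(local_sum, op=MPI.SUM, root=0)

if rank == 0:
    print("total =", total)
```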

– Data Parallel • HPF (High Performance Fortran) – Task Parallel • OpenMP (Open Multi-Processing) – Message Passing • PVM (Parallel Virtual Machine) • MPI (Message Passing Interface) – MPICH, Open MPI, etc. – Hybrid (MPI+OpenMP, MPI+GPGPU) – Advanced: Chapel, PGAS(Partitioned Global Address Space) Interconnect

• PCI • PCI Express • Infiniband

• HyperTransport (AMD) • QPI/Omni-path (Intel) • NUMAlink (SGI) Serial vs. Parallel

• In the early days, serial connections were reliable but quite slow, so parallel connections were developed to send multiple pieces of data simultaneously. • Later it turned out that parallel connections have their own problems – electromagnetic interference between wires. • So the pendulum swung back to highly-optimized serial connections.

Serial → Parallel → Serial PCI/PCI-X

• PCI: Peripheral Component Interconnect (32bit) • PCI-X: PCI-eXtended (64bit)

Electromagnetic interference and signal degradation are common in parallel connections, which slows the connection down. The additional bandwidth of the PCI-X bus means it can carry more data but generates even more noise.

Image source: http://www.altera.com/products/ip/altera/t-alt-pci_soln.html PCI-Express • Instead of using parallel connections, PCI-E uses switch-controlled, point-to-point serial connections. • Every device has its own dedicated connection, so devices no longer share bandwidth like they do on a normal data bus.

A single PCI Express lane can handle about 200 MB/s in each direction, so a 16X PCI-E connector can reach about 6.4 GB/s in total.

Image source: http://computer.howstuffworks.com/pci-express2.htm Infiniband

Image source: http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Fall_2007/wiki4_001_a1 Infiniband

• The internal connections in most computers are inflexible and relatively slow. • As I/O increases, the existing bus system becomes a bottleneck. • Through InfiniBand switches, InfiniBand channels are created to connect hosts (HCAs, host channel adapters) and I/O targets (TCAs, target channel adapters). • Instead of sending data in parallel across the backplane bus, InfiniBand specifies a serial bus – The serial bus can also carry multiple channels of data at the same time in a multiplexed signal.

Infiniband theoretical throughput in Gb/s Infiniband vs. PCI/PCI-Express

http://www.mellanox.com/pdf/whitepapers/PCI_3GIO_IB_WP_120.pdf Storage • Storage Protocol • I/O BUS – Serial vs. Parallel • Redundancy – RAID (Redundant Array of Inexpensive Disks) – Multipathing (Redundant physical paths ) • Storage Attaching Approaches – DAS (Direct Attached Storage) – NAS (Network Attached Storage) – SAN (Storage Area Network) Storage Protocol

• CIFS/SMB (Common Internet File System) – an application-layer network protocol mainly used to provide shared access to files, printers, etc. between nodes • NFS (Network File System) – an application-layer protocol that allows access to files over an IP network. • SCSI/iSCSI (Internet Small Computer System Interface) – iSCSI is a mapping of the regular SCSI protocol over TCP/IP • FC (Fibre Channel) – a transport protocol which mainly transports SCSI commands over Fibre Channel networks • FCoE (Fibre Channel over Ethernet) – allows Fibre Channel to use 10 Gigabit Ethernet networks (or higher speeds) while preserving the Fibre Channel protocol. I/O Bus

• ATA → SATA – ATA (Advanced Technology Attachment) – SATA (Serial ATA) • SCSI → SAS – SCSI (Small Computer System Interface) – SAS (Serially Attached SCSI)

Parallel → Serial

– electromagnetic interference
– synchronization cost

http://www.denali.com/wordpress/index.php/dmr/2010/02/02/ssd-interfaces-and-performance-effects

RAID (Redundant Array of Independent Disks)
• RAID 0: Striping, without parity or mirroring.
• RAID 1: Mirroring, without parity or striping.
• RAID 2: Bit-level striping with dedicated parity.
• RAID 3: Byte-level striping with dedicated parity.
• RAID 4: Block-level striping with dedicated parity.
• RAID 5: Striping with single distributed parity.
• RAID 6: Block-level striping with double distributed parity.
• Nested RAID: RAID 10, RAID 50, RAID 60, etc.

Example: RAID 0, 1, 5
Example: Nested RAID 10, 50

Comparison
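To complement the level-by-level comparison, the toy Python sketch below shows the idea behind distributed parity in RAID 5/6: the parity block is the XOR of the data blocks in a stripe, so any single lost block can be rebuilt from the survivors. This only illustrates the principle; it is not a real RAID implementation.

```python
# Toy illustration of RAID 5 parity: parity = XOR of the data blocks in a stripe,
# so any single lost block can be rebuilt from the remaining blocks + parity.
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-sized byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

stripe = [b"disk0data", b"disk1data", b"disk2data"]   # equal-sized data blocks
parity = xor_blocks(stripe)

# Simulate losing disk 1 and rebuilding its block from the survivors + parity.
survivors = [stripe[0], stripe[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == stripe[1]
print("rebuilt block:", rebuilt)
```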

http://www.techwarelabs.com/10-things-to-consider-before-setting-up-raid/ Multipathing

• Multipath I/O – A fault-tolerance and performance-enhancement technique. – Creates multiple logical paths between the server and the storage devices • via adapters, cables, switches, etc. – In the event that one path fails, multipathing uses an alternate path so that applications can continue to access their data.

https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/DM_Multipath/ DAS/NAS/SAN

• DAS: storage directly attached; high cost of management; inflexible; expensive to scale
• NAS: storage access through Ethernet; scalable and flexible
• SAN: storage access through FC/IB; much better performance; more flexible and scalable; increases data availability

http://abdullrhmanfarram.wordpress.com/2013/04/08/storage-technologies-das-nas-and-san/ Parallel File System

• Distributes data across multiple storage nodes, accessed via a high-speed network. • Concurrent (often coordinated) access from many clients • Provides globally shared metadata (locations, file names, sizes, etc.) Parallel File Systems
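Before listing concrete parallel file systems, the following purely conceptual Python sketch illustrates the striping idea described above: file data is cut into fixed-size blocks, spread round-robin over several storage targets, and a small piece of metadata records the layout. Real systems such as Lustre or GPFS do this on dedicated servers, not in application code.

```python
# Conceptual sketch of parallel-file-system striping (not a real implementation):
# data is split into blocks, spread round-robin over storage targets ("OSTs"),
# and the metadata records the layout so clients can read blocks in parallel.
BLOCK_SIZE = 4          # tiny block size, just for the example
N_TARGETS = 3           # number of storage targets

def stripe(data: bytes):
    targets = [[] for _ in range(N_TARGETS)]
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    for idx, block in enumerate(blocks):
        targets[idx % N_TARGETS].append(block)     # round-robin placement
    metadata = {"size": len(data), "block_size": BLOCK_SIZE, "targets": N_TARGETS}
    return metadata, targets

def read_back(metadata, targets):
    n_blocks = -(-metadata["size"] // metadata["block_size"])   # ceiling division
    return b"".join(targets[i % metadata["targets"]][i // metadata["targets"]]
                    for i in range(n_blocks))

meta, osts = stripe(b"hello parallel file system!")
assert read_back(meta, osts) == b"hello parallel file system!"
```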

• Lustre • GPFS • Panasas • NFSv4?? Parallel File Systems

• Lustre • GlusterFS • OrangeFS • GPFS • IBRIX • CXFS • Panasas • PVFS2 • pNFS (NFSv4.1) • GoogleFS • Ceph Example: Lustre

Image source: http://wiki.lustre.org/manual/LustreManual18_HTML/figures/LustreArch.png Object storage

• Object storage appears as a collection of objects. • An object typically includes not only the data itself, but also extra information such as metadata, an object ID (OID), attributes, etc. • It moves lower-level functionality such as space management and security into the storage device itself, which is accessed through a standard object interface. • Especially good for storing unstructured data such as photos, songs, etc.
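Object stores are usually accessed through an HTTP API rather than through a file system. As a hedged sketch, assuming an S3-compatible endpoint and the boto3 library (the endpoint URL, credentials, and bucket name below are purely hypothetical), storing and inspecting an object together with its metadata looks roughly like this:

```python
# Hedged sketch of talking to an S3-compatible object store with boto3.
# The endpoint URL, credentials, and bucket name below are hypothetical.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://objectstore.example.org",   # hypothetical endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Each object = data + metadata, addressed by a key rather than a file path.
s3.put_object(
    Bucket="photos",
    Key="2016/conference/talk.jpg",
    Body=b"example image bytes",
    Metadata={"camera": "dslr", "event": "intro-to-hpc"},
)

info = s3.head_object(Bucket="photos", Key="2016/conference/talk.jpg")
print(info["Metadata"], info["ContentLength"])
```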

[Figure: block storage vs. object storage]

Comparison of 3 storage types

[File storage: NFS and SMB/CIFS; block storage: Fibre Channel/iSCSI; object storage: AWS S3]
https://insights.ubuntu.com/2015/05/18/what-are-the-different-types-of-storage-block-object-and-file/ Comparison of 3 storage types

http://blog.sungardas.com/CTOLabs/2015/10/object-storage-the-alternator-of-cloud-computing/ Evolution of Computing

• Mainframe: super power • Cluster: worker bees • Grid: global orchestration • Cloud: everything as a service • Big Data: find a needle in the sea Evolution of Computing

• Mainframe: single machine, shared memory
• PC/Cluster: multiple nodes, batch job scheduling
• Grid: multiple sites (geographically distributed), global scheduling, virtualized organization, transparent data access, unified security infrastructure
• Cloud: virtualized resources, elastic computing, build everything as a service!
• Big Data: big volume, big variety, big velocity, fast analysis/decision

Image and text source: http://www.wikipedia.org/ Mainframe

• Originally referred to large cabinets that housed a powerful CPU and shared memory.

• Modern design: – Redundant internal engineering resulting in high reliability and security – Extensive I/O facilities – High utilization rates – Uses virtualization technology to support massive throughput [Image: Amdahl 470V/6] Cluster

• Tightly connected computers that work together as a single system – Low cost – scalability – Flexibility

• Batch job scheduling/management

• Parallel computing Grid

• Grid computing is the coordination of massive computer resources from multiple locations to reach a common goal. The resources are: – loosely coupled – heterogeneous – geographically dispersed – dynamic

• Main features: – High level scheduling/Workload management – Unified security infrastructure – Global information system – Virtualized organization – Transparent data transfer interface Example: WLCG (Worldwide LHC Computing Grid) Cloud

• Initially – IAAS (Infrastructure as a Service) – PAAS (Platform as a Service) – SAAS (Software as a Service)

• Subsequently – HAAS (Hardware as a Service) – NAAS (Network as a Service) – DAAS (Database as a Service) – CAAS (Communication as a Service) – BPAAS (Business Process as a Service)

• Eventually – XAAS (Everything as a service!) Image source: www.telezent.com X

Image source: http://blueatoll.com/blog/the-next-generation-enterprise-business-as-a-service-in-the-cloud/ Big Data

• What is Big Data – refers to technologies for handling data that is too diverse, fast-changing or massive for conventional technologies to address efficiently. – Today, new technologies make it possible to realize value from Big Data. Big Data’s Four V’s

http://www.ibmbigdatahub.com/blog/how-big-data-and-cognitive-computing-are-transforming-insurance-part-2 Big Data: Core Technology

• Foundation stone – Google (GFS, MapReduce, Big Table)

• Free version – Apache (HDFS, YARN, Hbase, Hive, Pig…) Big Data: MapReduce
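The classic MapReduce example is a word count: the map step emits (word, 1) pairs, the shuffle groups the pairs by key, and the reduce step sums the counts per word. The minimal in-process Python sketch below shows that data flow; real frameworks such as Hadoop run the same phases distributed over many nodes.

```python
# Minimal in-process sketch of the MapReduce data flow (word count).
# Real frameworks (Hadoop, etc.) run the map and reduce phases on many nodes;
# here each phase is just a function so the data flow is easy to see.
from collections import defaultdict

def map_phase(documents):
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1              # emit (key, value) pairs

def shuffle(pairs):
    groups = defaultdict(list)                 # group values by key
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data is big", "map reduce splits big jobs"]
print(reduce_phase(shuffle(map_phase(docs))))
# {'big': 3, 'data': 1, 'is': 1, 'map': 1, 'reduce': 1, 'splits': 1, 'jobs': 1}
```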

Image source: http://www.slideshare.net/tothc/introduction-to-hadoop-and-map-reduce Big Data: Evolution

• New Troika – Google (Dremel, Pregel, Caffeine)

• Free version – Apache Drill, Apache Giraph, Stanford GPS

Image source: http://blog.mikiobraun.de/2013/02/big-data-beyond-map-reduce-googles-papers.html Example: MapReduce vs. Dremel

Query: SELECT SUM(CountWords(txtField)) / COUNT(*) FROM T1 (T1: 85 billion records, 87 TB, 3000 nodes)

Image source: http://www.cubrid.org/blog/dev-platform/meet-impala-open-source-real-time-sql-querying-on-hadoop/ Big Data Ecosystem Summary

• Introduced Compute Canada and its consortia • Introduced WestGrid and the member sites • Introduced high performance computing from different angles such as architecture, memory, interconnect, storage, file systems, etc. • Also briefly reviewed the evolution of computing technologies from mainframe, cluster, grid, and cloud to the current hot topic, Big Data. Follow-up Talks

• Sept.15, 2016 Tips for Submitting jobs & Moving Data (with Hands-on session) – Masao Fujinaga

• Sept. 27, 2016 Scheduling & Job Management (with Hands-on session) – Kamil Marcinkowski

See more details in: https://www.westgrid.ca/events Thanks! Questions?