
AI Breakthroughs and Initiatives at the Supercomputing Center

Paola Buitrago · AI & Big Data, PSC
Nick Nystrom · Interim Director, PSC
September 7, 2017

© 2017 Pittsburgh Supercomputing Center

PSC Overview

Joint effort of Carnegie Mellon University and the University of Pittsburgh, with 31+ years of national leadership in:
– High-performance and data-intensive computing
– 19 supercomputers, including 9 that were/are “serial #1”
– Software architecture, implementation, and optimization
– Networking and network optimization
– Enabling ground-breaking science, computer science, and engineering
– Leading research in AI, biology, neuroscience, filesystems, networking, HPC software engineering, chemistry, materials science, engineering, physics, statistics, …
Supported by: NSF, NIH, the Commonwealth of Pennsylvania, foundations, DOE, DoD, and industry

What’s Happening at PSC

Two unique HPC systems
– Bridges, which converges HPC, AI, and Big Data and emphasizes usability, flexibility, and interactivity to enable nontraditional HPC applications and communities
– Anton 2, a special-purpose supercomputer for molecular dynamics (MD) simulation designed by D. E. Shaw Research (DESRES)
Leadership transition
– Nick Nystrom is now Interim Director
New and expanding areas of focus
– Launched a new group addressing AI and Big Data, led by Paola Buitrago
– Introducing Computational Biology, directed by Phil Blood, encompassing Biomedical Applications and Public Health Applications
Strengthening ties with industry
– Initial emphasis on AI and the life sciences

Bridges, a new kind of supercomputer at PSC, converges HPC and Big Data, empowers new research communities, brings desktop convenience to high-end computing, expands remote access, and helps researchers to work more intuitively.
• Funded by NSF award #ACI-1445606 ($17.2M), Bridges emphasizes usability, flexibility, and interactivity
• Available at no charge for open research and course support, and by arrangement to industry
• Popular programming languages and applications: R, MATLAB, Python, Hadoop, Spark, …
• 846 compute nodes containing 128GB (800), 3TB (42), and 12TB (4) of RAM each
• 64 NVIDIA Tesla P100 GPUs + 32 NVIDIA Tesla K80 GPUs; TensorFlow, Caffe, PyTorch, Theano, …
• Dedicated nodes for persistent databases, web servers, and distributed services
• The first deployment of the Intel Omni-Path Architecture fabric

Converging HPC, AI, and Big Data

Prompted by focusing on users
– Convergence in Bridges was user-driven, ahead of industry
Tightly couple CPUs, accelerators, and storage on a unified, high-performance interconnect
– User-driven storage expansion to support Big Data
– Community data + analytics → BDaaS
Provide a very flexible software environment (see the sketch below)
– Python, R, MATLAB, Java, MPI+X, …
– Interactivity enables analytics and development
– Persistent database and web server nodes
– Gateways → HPCaaS, democratizing access
– Containers (Singularity, Docker) and virtualization
Build on PSC’s experience with new HPC communities
– Leverage coherent shared memory: 12TB, 3TB, 128GB
– Focused user support, leveraging XSEDE
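As one illustration of the flexible “MPI+X” Python environment listed above, here is a minimal mpi4py sketch. This is our illustration, not PSC courseware or documentation: the script name and the generic launch command mentioned afterward are assumptions.

```python
# hello_mpi.py -- illustrative sketch only; file name and launch details are assumed.
# Each rank reports itself, then rank 0 sums a value contributed by every rank,
# showing the "MPI+X" style of Python use on an HPC system.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

print(f"Hello from rank {rank} of {size}")

# Each rank contributes its rank number; rank 0 receives the total.
total = comm.reduce(rank, op=MPI.SUM, root=0)
if rank == 0:
    print(f"Sum of ranks 0..{size - 1} = {total}")
```

Such a script would typically be launched through the site’s MPI launcher (for example, something like `mpirun -n 4 python hello_mpi.py`); the exact modules and launcher configuration on Bridges are not specified here.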

Bridges architecture (diagram): purpose-built Intel® Omni-Path Architecture topology for data-intensive HPC
• 20 Storage Building Blocks implementing the Pylon parallel community storage system (10 PB usable) for project & community datasets
• 4 HPE Integrity Superdome X (12TB) compute nodes – large-memory Java & Python
• 42 HPE ProLiant DL580 (3TB) compute nodes – large-memory Java & Python
• 748 HPE Apollo 2000 (128GB) compute nodes – distributed training, Spark, etc.
• 32 HPE Apollo 2000 (128GB) RSM GPU nodes, each with 2 NVIDIA Tesla P100 GPUs – deep learning
• 16 HPE Apollo 2000 (128GB) RSM GPU nodes, each with 2 NVIDIA Tesla K80 GPUs – additional deep learning; ML, inferencing, DL development, Spark, HPC AI (Libratus)
• 12 HPE ProLiant DL380 database nodes and 6 HPE ProLiant DL360 web server nodes – user interfaces for AIaaS, BDaaS
• 6 “core” Intel® OPA edge switches (fully interconnected, 2 links per switch) and 20 “leaf” Intel® OPA edge switches, with Intel® OPA cables providing robust paths to parallel storage
• 2 gateway nodes, 4 MDS nodes, 2 front-end nodes, 2 boot nodes, and 8 management nodes
Bridges Virtual Tour: https://psc.edu/bvt

GPU Nodes

Bridges’ GPUs are accelerating both deep learning and simulation codes

Phase 1: 16 nodes, each with:
• 2 × NVIDIA Tesla K80 GPUs (32 total)
• 2 × Intel Xeon E5-2695 v3 (14c, 2.3/3.3 GHz)
• 128GB DDR4-2133 RAM
Kepler architecture (K80): 2496 CUDA cores (128/SM); 7.08B transistors on 561 mm² die (28nm); 2×24 GB GDDR5 at 2×240.6 GB/s; 562 MHz base – 876 MHz boost; 2.91 Tf/s (64b), 8.73 Tf/s (32b)

Phase 2: +32 nodes, each with:
• 2 × NVIDIA Tesla P100 GPUs (64 total)
• 2 × Intel Xeon E5-2683 v4 (16c, 2.1/3.0 GHz)
• 128GB DDR4-2400 RAM
Pascal architecture (P100): 3584 CUDA cores (64/SM); 15.3B transistors on 610 mm² die (16nm); 16GB CoWoS® HBM2 at 720 GB/s with ECC; 1126 MHz base – 1303 MHz boost; 4.7 Tf/s (64b), 9.3 Tf/s (32b), 18.7 Tf/s (16b); page migration engine improves unified memory
64 P100 GPUs → 600 Tf/s (32b)
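A quick back-of-envelope check of the aggregate figure quoted above (our arithmetic, not an official benchmark): 64 P100 GPUs at a peak of 9.3 Tf/s (32b) each give roughly 595 Tf/s, consistent with the ~600 Tf/s shown.

```python
# Back-of-envelope aggregate single-precision peak for the Phase 2 GPUs.
p100_fp32_tflops = 9.3   # per-GPU peak quoted on the slide above
num_p100 = 64
print(num_p100 * p100_fp32_tflops)  # 595.2, i.e. ~600 Tf/s (32b)
```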

A few DL examples

Allocations on Bridges (shown in red on the accompanying map): 301 institutions, 1008 projects, 3682 users, 101 fields of science
• Optimal Data Representation for Deep Learning for Computational Chemistry: Garrett Goh, PNNL
• Deep Convolutional Neural Networks for Generative Protein and Small Molecule Design: Joseph Jacobson, MIT
• Deep Learning the Gene Regulatory Code: Shaun Mahony, Penn State
• Convolutional Neural Network Approaches to Classification of Marine Zooplankton: Mark Ohman, Scripps Institution of Oceanography

Recent Success Story: Libratus (HPE Video)

AI for Strategic Reasoning: Beating Top Pros in Heads-Up No-Limit Texas Hold’em Poker Tuomas Sandholm and Noam Brown, Carnegie Mellon University

Imperfect-info games require different algorithms, but apply to important classes of real-world problems:
– Negotiation
– Strategic pricing
– Medical treatment planning
– Auctions
– Military allocation problems
Photo: Prof. Tuomas Sandholm watching one of the world’s best players compete against Libratus.

Heads-up no-limit Texas hold’em is the main benchmark for imperfect-info games
– 10^161 situations
– Libratus is the first program to beat top humans
– Beat 4 top pros playing 120,000 hands over 20 days
– Libratus won decisively: 99.98% statistical significance
Libratus improved upon previous best algorithms by incorporating real-time improvements in its strategy.
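For readers curious how a significance level like 99.98% can be attached to a match result, here is a minimal sketch of a one-sided z-test on per-hand winnings. This is our illustration, not the Libratus team’s actual statistical methodology; the `per_hand_winnings` data and its units (e.g., milli-big-blinds per hand) are assumed placeholders.

```python
# Minimal sketch of a one-sided z-test on per-hand match results.
# Illustrative only: the data, units, and test choice are assumptions,
# not the methodology actually used to report Libratus' 99.98% figure.
import math
from statistics import mean, stdev

def win_significance(per_hand_winnings):
    """Return the one-sided confidence that the true mean win rate exceeds 0."""
    n = len(per_hand_winnings)
    m = mean(per_hand_winnings)
    se = stdev(per_hand_winnings) / math.sqrt(n)  # standard error of the mean
    z = m / se
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical usage (placeholder data, not match data):
# winnings = [14.7, -3.0, 25.1, ...]  # e.g., mbb/hand over 120,000 hands
# print(win_significance(winnings))   # a value near 0.9998 would mean ~99.98%
```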

“The best AI's ability to do strategic reasoning with imperfect information has now surpassed that of the best humans.” — Tuomas Sandholm, Carnegie Mellon University

Bridges made this breakthrough possible through 19 million core-hours of computing and data storage for the 2.6 PB knowledge base that Libratus generated.

Libratus, under the Chinese name Lengpudashi, or “cold poker master”, also won a 36,000-hand exhibition in China in April 2017 against a team of six strong Chinese poker players. Most recently demonstrated at IJCAI-17 (Melbourne, August 2017).

Artificial Intelligence & Big Data (AI&BD) Group

Develop a comprehensive portfolio of services for academic & external users
• Advance and support convergence of HPC, AI, and Big Data
• Big Data as a Service (BDaaS), AI as a Service (AIaaS)
• Data repositories (Common Crawl, microscopy, genomics, …)
• Streamline the user environment: software, documentation, processes
Lead specification and acquisition of advanced technologies for AI and BD
• Maintain close ties to vendors and other developers of hardware and software technologies

Artificial Intelligence & Big Data (AI&BD) Group (continued)

Advance the regional ecosystem of universities and industry
• AI, life sciences, robotics
• Center for Machine Learning and Health (CMLH), Life Sciences PA, SMC, …
Lead and enable research
• Exploit experience with Bridges
Industrial initiative to help businesses get started with AI
Coordinate education and training
Communicate results and benefits of projects in AI and BD through publications, videos, and demonstrations

AI, BD, DS Project Life Cycle

Group approach:
• User-centric
• Optimize time-to-science
• Continuous improvement
• Offer tools and resources for all the steps in the life cycle of an AI project:
  • Workflow management
  • Data management
• Understand specific needs:
  • Framework, stack
  • Training vs. inference

IBM, Foundational Methodology for Data Science

Reproducibility, Data Provenance, Usability (Hui Miao, Ang Li, Larry S. Davis, Amol Deshpande, Deep Learning Modelling Lifecycle)

Deep Learning Frameworks on Bridges

Joel Welling

User-driven
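To illustrate the kind of deep-learning framework workloads Bridges supports (PyTorch is among the frameworks listed earlier in this deck), here is a minimal, self-contained training sketch. The toy data, model shape, and hyperparameters are ours, chosen only for illustration; this is not a Bridges-specific recipe.

```python
# Minimal PyTorch training sketch (illustrative toy example, not a Bridges recipe).
import torch
from torch import nn, optim

# Toy regression data: y = 3x + noise.
x = torch.randn(256, 1)
y = 3.0 * x + 0.1 * torch.randn(256, 1)

# Use a GPU if one is available (e.g., on a GPU node), else fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1)).to(device)
x, y = x.to(device), y.to(device)

loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

print(f"final loss: {loss.item():.4f}")
```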

National HPC & Big Data Training

XSEDE HPC Monthly Workshop Series
– 47 events held, totaling 7,435 participants
– Rotating topics: MPI, OpenMP, OpenACC, Big Data & DL
  • 8 Big Data workshops to date, serving 2,396 registered participants
  • May 18, 2017 Big Data workshop: 405 registered participants across 21 institutions
XSEDE HPC Summer Boot Camp – John Urbanic
– 4 days, covering MPI, OpenMP, OpenACC, and accelerators
International Summer School on HPC Challenges in Computational Sciences
– Sponsored by XSEDE, PRACE, NSF, RIKEN AICS, and Compute Canada
– 2017: Boulder, Colorado
– 2016: Ljubljana, Slovenia
Tom Maiden
For AI and BD training materials, go to https://www.psc.edu/training-education/training/hpc-big-data

Bridges Training Sites

Map legend:
● MSIs (Minority Serving Institutions)
● EPSCoR (Experimental Program to Stimulate Competitive Research)
● MSI and EPSCoR
● Other

National HPC & Big Data Training

XSEDE HPC Workshop: Big Data, September 12–13
– Content
  • Big Data: Hadoop + Spark (see the Spark sketch below)
  • AI: DL on TensorFlow
  • AI: DL on Bridges
  • Hands-on projects
– TAs, real-time support, HD video feed
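As a flavor of the Hadoop + Spark material covered in these workshops, below is a minimal PySpark word-count sketch. This is our illustration, not workshop courseware; the input path `input.txt` is a hypothetical placeholder.

```python
# Minimal PySpark word-count sketch (illustrative; not taken from the workshop materials).
from operator import add
from pyspark import SparkContext

sc = SparkContext(appName="WordCountSketch")

# "input.txt" is a placeholder path; in practice this could be HDFS or a shared filesystem.
counts = (sc.textFile("input.txt")
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(add))

for word, count in counts.take(10):
    print(word, count)

sc.stop()
```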

– Can accommodate people with site access issues (send email to [email protected]).
– Register at https://portal.xsede.org/course-calendar
Participating sites: Pittsburgh Supercomputing Center; North Carolina A&T State University; Kennesaw State University; Old Dominion University; Harvey Mudd College; Howard University; George Mason University; University of Nebraska-Lincoln; Tufts University; University of Colorado Boulder; Stony Brook University; University of Texas at El Paso; Oklahoma State University; University of Arizona; Georgia State University; Iowa State University; University of Utah; University of Tennessee, Knoxville - National Institute for Computational Sciences; University of Houston-Clear Lake; Purdue University; University of Notre Dame; University of California, Los Angeles; Pennsylvania State University

Support of ML/DL Courses

Spring 2017 Deep Reinforcement Learning and Control

• Big Data Systems in Practice (Spring 2016; Ravi Starzl, CMU)
• An Introduction to Knowledge-Based Deep Learning and Socratic Coaches (Spring 2017; James Baker, Bhiksha Ramakrishnan, Rita Singh, CMU)
• CMU 10-703: Deep Reinforcement Learning and Control (Spring 2017; Katerina Fragkiadaki, Ruslan Salakhutdinov, CMU)
• CMU Tepper School of Business 44-956: Unstructured Data & Big Data: Acquisition to Analysis (Spring 2017; Dokyun Lee, CMU Tepper School of Business)
• Big Data Mining (Spring 2017; Tabitha Samuel, U. of Tennessee Knoxville)
• Big Data and Large-Scale Computing (Fall 2017; Melody Fleishauer, CMU College)
• Computer Vision (Fall 2017; Remi Megret, University of Puerto Rico, Mayaguez)

2017 Jelinek Summer Workshop on Speech and Language Technology (JSALT) at CMU

Held at Carnegie Mellon University, July 3 – August 11, 2017
– Two-week summer school, followed by a six-week workshop
Topics and research teams:
– Neural Machine Translation with Minimal Parallel Resources
  Leader: George Foster, National Research Council Canada
– Enhancement and Analysis of Conversational Speech
  Leader: Mark Liberman, Linguistic Data Consortium
– The Speaking Rosetta Stone: Discovering Grounded Linguistic Units for Languages without Orthography
  Leader: Emmanuel Dupoux, École des Hautes Études en Sciences Sociales; Odette Scharenborg, Radboud University Nijmegen

Results at https://www.lti.cs.cmu.edu/frederick-jelinek-memorial-summer-workshop-closing-day-schedule.

Some Deep Learning Projects Using Bridges (1)

• Deep Learning for Drug-Protein Interaction Prediction, Gil Alterovitz (Harvard Medical School/Boston Children's Hospital)
• Modeling of Imaging and Genetics using a Deep Graphical Model, Nematollah Batmanghelich (University of Pittsburgh)
• Exploring Stability, Cost, and Performance in Adversarial Deep Learning, Matt Fredrikson (CMU)
• Inverse Graphics Engines for Visual Inference, Ioannis Gkioulekas (CMU)
• Preparing Grounds to Launch All-US Students Kaggle Competition on Drug Prediction, Gil Alterovitz (Harvard Medical School/Boston Children's Hospital)
• Identification of Aircraft Orientation using Deep Neural Net, Josh Bertram (Iowa State University)
• Optimal Data Representation for Deep Learning for Computational Chemistry, Garrett Goh (Pacific Northwest National Laboratory)
• Summarizing and Learning Latent Structure in Video, Jeff Boleng (CMU)
• Simulation of Amorphous Materials and Nanostructures for Clean Energy Applications, Nongnuch Artrith (University of California, Berkeley)
• Algorithms for Deep Learning, Roger Dannenberg (CMU)
• Deep Convolutional Neural Networks for Generative Protein and Small Molecule Design, Joseph Jacobson (MIT)
• Ligand Binding on the Atomic and Macromolecular Scales, Jacob Durrant (University of Pittsburgh)
• Course 11-364: Introduction to Deep Learning, James Baker (CMU)
• Enabling Robust Image Understanding Using Deep Learning, Adriana Kovashka (University of Pittsburgh)
• Request of GPU Resources for CMU Course Deep Reinforcement Learning, Katerina Fragkiadaki (CMU)
• Deep Reinforcement Learning Based Visual Tracker, CJ Barberan (Rice University)

Some Deep Learning Projects Using Bridges (2)

• Deep Recurrent Models for Fine-Grained Recognition, Michael Lam (Oregon State University)
• Deep Learning the Gene Regulatory Code, Shaun Mahony (Pennsylvania State University)
• Neural Networks with Structured Innerproduct Layers, Amy Nesky (University of Michigan)
• Simulations and Algorithms For Genetics and Epigenetics Using 2D Electronic Nanopores, Jean-Pierre Leburton (UIUC)
• Computer Vision Course, Remi Megret (University of Puerto Rico, Mayaguez)
• Machine Learning for Medical Image Analysis, Mai Nguyen (University of California, San Diego)
• Automatic Building of Speech Recognizers for Non-Experts, Florian Metze (Carnegie Mellon University)
• Convolutional Neural Network Approaches to Classification of Marine Zooplankton, Mark Ohman (Scripps Institution of Oceanography)
• Education Allocation for the Course Unstructured Data & Big Data: Acquisition to Analysis, Dokyun Lee (CMU)
• The 2017 Jelinek Memorial Summer School and Workshop on Speech and Language Technologies (JSALT), Florian Metze (CMU)
• Automatic Evaluation of Scientific Writing, Diane Litman (University of Pittsburgh)
• Quantifying California Current Plankton using Machine Learning, Mark Ohman (Scripps Institution of Oceanography)
• How do Recurrent Neural Networks Encode Grammar? A Case Study in Artificial Neuroscience, Joshua Michalenko (Rice University)
• Deciphering Cellular Signaling System by Deep Mining a Comprehensive Genomic Compendium, Xinghua Lu (University of Pittsburgh School of Medicine)
• Deep Learning Architecture and Application, Yi Pan (Georgia State University)

Some Deep Learning Projects Using Bridges (3)

• Automatic Pain Assessment, Michael Reale (SUNY Polytechnic Institute)
• Image Classification Applied in Economic Analysis, Param Singh (CMU)
• Atomistic Understanding of Liquid/Solid Interfaces: Applications of Machine Learning Force Fields, Yunpeng Wang (Vanderbilt University)
• Development of a Hybrid Computational Approach for Macro-scale Simulation of Exciton Diffusion in Polymer Thin Films, Based on Combined Machine Learning, Quantum-Classical Simulations and Master Equation Techniques, Peter Rossky (Rice University)
• ARIEL: Analysis of Rare Incident-Event Languages, Ravi Starzl (CMU)
• Convolution Neural Network Based Positron Emission Tomography Image Reconstruction, Jing Tang (Oakland University)
• Unsupervised Learning of Deep Generative Models in Vision, Ying Nian Wu (University of California, Los Angeles)
• Petuum, a Distributed System for High-Performance Machine Learning, Eric Xing (CMU)
• Live Song Identification Using Semantic Features, Timothy Tsai (Harvey Mudd College)
• Accelerating the Training Process of Deep Neural Networks, Xipeng Shen (North Carolina State University)
• Developing Large-Scale Distributed Deep Learning Methods for Protein Bioinformatics, Jinbo Xu (Toyota Technological Institute at Chicago)
• Carnegie Mellon University Data Science Club, John Urbanic (PSC)
• Deep Purple: Deep Purposeful Learning of Complex Dynamic Systems, Aarti Singh (CMU)
• Deep Learning of Game Strategies for RoboCup, Manuela Veloso (CMU)

Applying Deep Learning to Connectomics

Goal: Explore the potential of deep learning to automate segmentation of high-resolution scanning electron microscope (SEM) images of brain tissue and the tracing of neurons through 3D volumes, to automate generation of the connectome, a comprehensive map of neural connections.

Motivation: This project builds on an ongoing collaboration between PSC, Harvard University, and the Allen Institute for Brain Science, through which we have access to high-quality raw and labeled data. The SEM data volume for mouse cortex imaging is ~3 TB/day, and data processing is currently human-intensive. Forthcoming camera systems will increase data bandwidth by a factor of 65.

Datasets: Zebrafish larva (1024×1024×4900), mouse, …

Data volume: mouse brain, 430 mm³ → ~1.4 exabytes.

Collaborators: Ishtar Nyawĩra, Iris Qian, Annie Zhang, Joel Welling, John Urbanic, Art Wetzel, Nick Nystrom

Images courtesy of Florian Engert, David Hildebrand, and their students at the Center for Brain Science, Harvard.
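As a rough plausibility check on the exabyte-scale estimate above, here is a back-of-envelope calculation. The voxel size and bytes-per-voxel below are illustrative assumptions of ours, not the project's actual imaging parameters.

```python
# Back-of-envelope estimate of raw data volume for a whole mouse brain at SEM resolution.
# The voxel dimensions and bytes-per-voxel are illustrative assumptions,
# not the parameters used by the PSC/Harvard/Allen Institute collaboration.
brain_volume_mm3 = 430.0
voxel_nm3 = 3.0 * 3.0 * 30.0   # assumed ~3 nm x 3 nm x 30 nm voxels
bytes_per_voxel = 1            # assumed 8-bit grayscale

nm3_per_mm3 = (1e6) ** 3       # 1 mm = 1e6 nm, so 1 mm^3 = 1e18 nm^3
voxels = brain_volume_mm3 * nm3_per_mm3 / voxel_nm3
exabytes = voxels * bytes_per_voxel / 1e18

print(f"{exabytes:.1f} EB")    # ~1.6 EB, the same order of magnitude as the ~1.4 EB figure
```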

Thank you! Questions?
