Randomized Algorithms for Scientific Computing (RASC)

Randomized Algorithms for Scientific Computing (RASC) Report from the DOE ASCR Workshop held December 2–3, 2020, and January 6–7, 2021 Prepared for U.S. Department of Energy Office of Science Advanced Scientific Computing Research Program RASC Report Authors Aydın Bulu¸c(co-chair), Lawrence Berkeley National Laboratory Tamara G. Kolda (co-chair), Sandia National Laboratories Stefan M. Wild (co-chair), Argonne National Laboratory Mihai Anitescu, Argonne National Laboratory Anthony DeGennaro, Brookhaven National Laboratory John Jakeman, Sandia National Laboratories Chandrika Kamath, Lawrence Livermore National Laboratory Ramakrishnan (Ramki) Kannan, Oak Ridge National Laboratory Miles E. Lopes, University of California, Davis Per-Gunnar Martinsson, University of Texas, Austin Kary Myers, Los Alamos National Laboratory Jelani Nelson, University of California, Berkeley Juan M. Restrepo, Oak Ridge National Laboratory C. Seshadhri, University of California, Santa Cruz Draguna Vrabie, Pacific Northwest National Laboratory Brendt Wohlberg, Los Alamos National Laboratory Stephen J. Wright, University of Wisconsin, Madison Chao Yang, Lawrence Berkeley National Laboratory Peter Zwart, Lawrence Berkeley National Laboratory arXiv:2104.11079v1 [cs.AI] 19 Apr 2021 DOE/ASCR Point of Contact Steven Lee Randomized Algorithms for Scientific Computing i Additional Report Authors This report was a community effort, collecting inputs in a variety formats. We have taken many of the ideas in this report from the thesis statements collected from registrants of “Part 2” of the workshop. We further solicited input from community members with specific expertise. Much of the input came from discussions at the workshop, so we must credit the entire set of participants listed in full in AppendixB. Although we cannot do justice to the origination of every idea contained herein, we highlight those whose contributions were explicitly solicited or whose submissions were used nearly verbatim. Solicited and thesis statement inputs from Rick Archibald, Oak Ridge National Laboratory David Barajas-Solano, Pacific Northwest National Laboratory Andrew Barker, Lawrence Livermore National Laboratory Charles Bouman, Purdue University Moses Charikar, Stanford University Jong Choi, Oak Ridge National Laboratory Aurora Clark, Washington State University Victor DeCaria, Oak Ridge National Laboratory Zichao Wendy Di, Argonne National Laboratory Jack Dongarra, University of Tennessee, Knoxville Jed Duersch, Sandia National Laboratories Ethan Epperly, California Institute of Technology Benjamin Erichson, University of California, Berkeley Maryam Fazel, University of Washington, Seattle Andrew Glaws, National Renewable Energy Laboratory Carlo Graziani, Argonne National Laboratory Cory Hauck, Oak Ridge National Laboratory Paul Hovland, Argonne National Laboratory William Kay, Oak Ridge National Laboratory Nathan Lemons, Los Alamos National Laboratory Ying Wai Li, Los Alamos National Laboratory Dmitriy Morozov, Lawrence Berkeley National Laboratory Cynthia Phillips, Sandia National Laboratories Eric Phipps, Sandia National Laboratories Benjamin Priest, Lawrence Livermore National Laboratory Ruslan Shaydulin, Argonne National Laboratory Katarzyna Swirydowicz, Pacific Northwest National Laboratory Ray Tuminaro, Sandia National Laboratories ii Randomized Algorithms for Scientific Computing Contents Executive Summary iv 1 Introduction 1 2 Application Needs and Drivers4 2.1 Massive Data from Experiments, Observations, and Simulations . .4 2.2 Forward Models . .7 2.3 Inverse Problems . 12 2.4 Applications with Discrete Structure . 14 2.5 Experimental Designs . 17 2.6 Software and Libraries for Scientific Computing . 20 2.7 Emerging Hardware . 22 2.8 Scientific Machine Learning . 24 3 Current and Future Research Directions 27 3.1 Opportunities for Random Sampling Schemes . 27 3.2 Sketching for Linear and Nonlinear Problems . 29 3.3 Discrete Algorithms . 33 3.4 Streaming . 35 3.5 Complexity . 37 3.6 Verification and Validation . 40 3.7 Software Abstractions . 41 4 Themes and Recommendations 45 4.1 Themes . 45 4.2 Recommendations . 47 References 49 Appendices 61 A Workshop Information 61 B Workshop Participants 63 C Acknowledgments 74 Randomized Algorithms for Scientific Computing iii Executive Summary Randomized algorithms have propelled advances in artificial intelligence and represent a foundational research area in advancing AI for Science. Future advancements in DOE Office of Science priority areas such as climate science, astrophysics, fusion, advanced materials, combustion, and quantum computing all require randomized algorithms for surmounting challenges of complexity, robustness, and scalability. Advances in data collection and numerical simulation have changed the dynamics of scientific research and motivate the need for randomized algorithms. For instance, advances in imaging technologies such as X-ray ptychography, electron microscopy, electron energy loss spectrocopy, or adaptive optics lattice light-sheet microscopy collect hyperspectral imaging and scattering data in terabytes, at breakneck speed enabled by state-of-the-art detectors. The data collection is exceptionally fast compared with its analysis. Likewise, advances in high-performance architectures have made exascale computing a reality and changed the economies of scientific computing in the process. Floating-point operations that create data are essentially free in comparison with data movement. Thus far, most approaches have focused on creating faster hardware. Ironically, this faster hardware has exacerbated the problem by making data still easier to create. Under such an onslaught, scientists often resort to heuristic deterministic sampling schemes (e.g., low-precision arithmetic, sampling every nth element), sacrificing potentially valuable accuracy). Dramatically better results can be achieved via randomized algorithms, reducing the data size as much as or more than naive deterministic subsampling can achieve, while retaining the high accuracy of computing on the full data set. By randomized algorithms we mean those algorithms that employ some form of randomness in internal algorithmic decisions to accelerate time to solution, increase scalability, or improve reliability. Examples include matrix sketching for solving large-scale least-squares problems and stochastic gradient descent for training machine learning models. We are not recommending heuristic methods but rather randomized algorithms that have certificates of correctness and probabilistic guarantees of optimality and near-optimality. Such approaches can be useful beyond acceleration, for example, in understanding how to avoid measure zero worst-case scenarios that plague methods such as QR matrix factorization. data sketch Figure 1: Randomized sketching can dramatically reduce the size of massive data sets. Randomized algorithms have a long history. The Markov chain Monte Carlo method was central to the earliest computing efforts of DOE’s precursor (the Atomic Energy Commission). By the 1990s, randomized algorithms were deployed in regimes such as randomized routing in Internet protocols, the well-known quicksort algorithm, and polynomial factoring for cryptography. Starting in the mid-1990s, random forests and other ensemble classifiers have shown how randomization in machine iv Randomized Algorithms for Scientific Computing learning improve the bias-variance tradeoff. In the early 2000s, compressed sensing, based on random matrices and sketching of signals, dramatically changed signal processing. The 2010s saw a flurry of novel randomization results in linear algebra and optimization, accelerated by the pursuit of problems of growing scale in AI and bringing non-asymptotic guarantees for many problems. The accelerating evolution of randomized algorithms and the unrelenting tsunami of data from experiments, observations, and simulations have combined to motivate research in randomized algorithms focused on problems specific to DOE. The new results in artificial intelligence and elsewhere are just the tip of the iceberg in foundational research, not to mention specialization of the methods for distinctive applications. Deploying randomized algorithms to advance AI for Science within DOE requires new skill sets, new analysis, and new software, expanding with each new application. To that end, the DOE convened a workshop to discuss such issues. This report summarizes the outcomes of that workshop, “Randomized Algorithms for Scientific Computing (RASC),” held virtually across four days in December 2020 and January 2021. Partic- ipants were invited to provide input, which formed the basis of the workshop discussions as well as this report, compiled by the workshop writing committee. The report contains a summary of drivers and motivation for pursuing this line of inquiry, a discussion of possible research directions (summarized in Table1), and themes and recommendations summarized in the next two pages. Table 1: A sampling of possible research directions in randomized algorithms for scientific computing Analysis of random algorithms for production conditions Randomized algorithms cost and error models for emerging hardware Incorporation of sketching for solving subproblems Specialized randomization for structured problems Overcoming of parallel computational bottlenecks with probabilistic estimates Randomized optimization for DOE applications Computationally efficient sampling Stratified and topologically aware sampling Scientifically informed sampling Reproducibility Randomized

Randomized Algorithms for Scientific Computing (RASC)

Anna C. Gilbert

Martin J. Strauss April 22, 2015

Iasinstitute for Advanced Study

Curriculum Vitae

Focm 2017 Foundations of Computational Mathematics Barcelona, July 10Th-19Th, 2017 Organized in Partnership With

Dynamic Ham-Sandwich Cuts in the Plane∗

Jelani Nelson Current Position Previous Positions Education

Focm 2017 Foundations of Computational Mathematics Barcelona, July 10Th-19Th, 2017 Organized in Partnership With

40 IS10 Abstracts

Curriculum Vitae

Visions in Theoretical Computer Science

Future Directions in Compressed Sensing and the Integration of Sensing and Processing What Can and Should We Know by 2030?