<<

NIELS BOHR INSTITUTE FACULTY OF SCIENCE UNIVERSITY OF COPENHAGEN DENMARK

A High Performance Backend for Array-Oriented Programming on Next-Generation Processing Units

Simon Andreas Frimann Lund

Academic advisor: Brian Vinter

September 2015

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Submitted: September 2015
Defended:
Final version:

Advisor: Brian Vinter, University of Copenhagen, Denmark

Cover art: Photo from the city archives photographic atelier (starbas.net Ref. 2134/TMF07209) of the Copenhagen harbor and the Langebro bridge anno 1954. The bridge then carried commuters and freight trains to and from the factories along the docks of the harbor. The bridge opened several times an hour for ships supplying the factories with coal. The freight train bridge was later converted to carry pedestrians and bicyclists. Today the bridge opens a couple of times a week.

Abstract

The financial crisis, which started in 2008, spawned the HIPERFIT research center as a preventive measure against future financial crises. The goal of prevention is to be met by improving mathematical models for finance, the verifiable description of them in domain-specific languages, and the efficient execution of them on high performance systems. This work investigates the requirements for, and the implementation of, a high performance backend supporting these goals. This involves an outline of the hardware available today and in the near future, and of how to program it for high performance. The main challenge is to bridge the gaps between performance, productivity and portability. A declarative, high-level, array-oriented programming model is explored to achieve this goal, and a backend is implemented to support it. Different strategies for the backend design and the application of optimizations are analyzed and experimentally tested, resulting in the design and implementation of Bohrium, a runtime system for transforming, scheduling and executing array-oriented programs. Multiple interfaces for existing languages such as Python, C++, C#, and F# have been built which utilize the backend. A suite of benchmark applications, implemented in these languages, demonstrates the high-level declarative form of the programming model. Performance studies show that the high-level model can be used to not only match but also exceed the performance of hand-coded implementations in low-level languages.

Acknowledgements

With this thesis, I end my studies in science. Although, come to think of it, I probably won't. The curiosity that was awakened when I as a kid peered over my eldest brother Benjamin's shoulder and saw the wonder of an i486DX is not going away anytime soon. However, what will end are my studies as a Ph.D. student, thereby concluding a decade spent as a student at the University of Copenhagen. So for me this is the end of an era, a time that I am thankful for and have many to thank for. First and foremost I would like to thank my immediate family. Mor, Far, Benjamin, Zanne, Tobias, Heidi, my awesome nephews Sebastian, Valdemar, and Alexander: Thank you for your encouragement, your unconditional love, your support when challenges were plenty, for your patience and understanding when I was wired into another world, and for being there when I got back out. Also, thanks to my uncles, aunts, and cousins for following my work a bit. And even though you probably cannot read this, Søren: Thanks for being my uncle and for the positive impact you have had, and continue to have, on my life. Thanks also to my friends A.C., Johannes, Laura, Marko, Matias and Phie for reminding me to go outside once in a while. Thanks to Astrid for her encouragement in embarking on this voyage. Thanks to Frederik in all his capacities as a colleague, friend, and of course proofrederr. A special thanks to Lene for turning her head just when she did and letting her eyes meet mine. The decade of studies has gone by fast; I have had the opportunity to swim in a sea of knowledge, taking to shore different topics of interest. With my Ph.D. I have had the pleasure of diving deep into a topic, meeting very talented researchers, working with bright minds and exploring interesting subjects. For all this, I can thank my advisor Brian Vinter. Thanks to all the members of the distlab, image, and eScience groups for the entertainment and great work environment. Thanks to those I have worked closely with over the years: James Avery, Kenneth Skovhede, Klaus Birkelund Jensen, Mads R. B. Kristensen, Troels Blum, and Weifeng Liu. Although he puts curly braces in the wrong places, I especially want to thank Mads for his insight, his approach and for letting me pick his mind. Thanks to Bradford L. Chamberlain for hosting my visit to Cray Inc. and for his inspiring research on ZPL and Chapel. Thanks to the Chapel team: Ben Harshbarger, David Iten, Elliot Ronaghan, Greg Titus, Kyle Brady, Lydia Duncan, Mike Noakes, Sung-Eun Choi, Thomas Van Doren, Tim Tzakian, Tom Hildebrandt, Tom MacDonald, Vassily Litvinov, and thanks to the fascinating people of Seattle that I crossed paths with, especially Erika, Chris, David, and Monique. Thanks to the Danish Strategic Research Council, Program Committee for Strategic Growth Technologies, for the research center 'HIPERFIT: Functional High Performance Computing for Financial Information Technology' (hiperfit.dk) under contract number 10-092299, which has partially funded me. And thanks to the University of Copenhagen and the Niels Bohr Institute. And just because I can write anything in acknowledgements, I would like to thank music for keeping a link to the surface as I dove deep down in my bubble.
Especially the sounds of: A-Trak, Air, Alt-J, Amon Tobin, Apparat, Arcade Fire, Basement Jaxx, Beach House, Beirut, Birdy Nam Nam, Bjørn Svin, the Black Keys, Bloc Party, Blur, Boards of Canada, Bon Iver, Bonobo, Boys Noize, Caribou, the Chemical Brothers, Com Truise, the Cranberries, Crystal Castles, Daft Punk, Digitalism, Diplo, Dirty Projectors, Disclosure, DJ Shadow, Fever Ray, FKA Twigs, Florence + The Machine, Foo Fighters, Four Tet, Friendly Fires, the Fugees, the Glitch Mob, Gold Panda, Groundislava, Hawksley Workman, Hercules and Love Affair, Hot Chip, Interpol, Jamie Woon, Jamie XX, Jimi Hendrix, Joy Division, Justice, Kashmir, Kasper Bjørke, Keane, Kent, Kings of Convenience, Klaxons, the Knife, La Roux, LCD Soundsystem, Leonard Cohen, Little Dragon, the Low Frequency in Stereo, M.I.A., Malk De Koijn, Manic Street Preachers, Massive Attack, Metric, MGMT, Moderat, Modeselektor, Modest Mouse, Mogwai, Morcheeba, MSTRKRFT, Muse, Mø, the National, New Order, Nick Cave and The Bad Seeds, Nicolas Jaar, Nils Frahm, Nirvana, Nosaj Thing, Oasis, Orbital, the Pixies, PJ Harvey, Placebo, Portishead, Pretty Lights, Pulp, Queen, Queens of the Stone Age, the Raconteurs, Radiohead, Ratatat, RJD2, Royksopp, Santigold, SBTRKT, Schoolboy Q, Sia, Simian Mobile Disco, Slow Magic, the Smashing Pumpkins, Snow Patrol, Sort Sol, Soulwax, the Strokes, Suspekt, Tiga, Todd Terje, Tool, Travis, Trolle Siebenhaar, Turboweekend, UNKLE, the Verve, When Saints Go Machine, White Lies, the White Stripes, the Whitest Boy Alive, Yeah Yeah Yeahs, Yeasayer, Yuna and Kate Bush for making me keep running up the hill.

Contents

I Extended Abstract

1 Context
1.1 Structure of the Thesis
1.2 Computing Platforms and Architectures
1.3 Backends and Languages
1.4 Approach
1.5 Contributions

2 Programming Model
2.1 Expressions
2.2 Data Representation

3 Bohrium
3.1 Bytecode
3.2 Runtime components
3.3 Language Bridges

4 Ongoing and Future Work
4.1 Descriptive Power
4.2 Targeting accelerators with CAPE
4.3 BXX: Bohrium for C++
4.4 PyChapel
4.5 Benchpress

5 Conclusion

Bibliography

II Publications
6.1 Automatic mapping of array operations to specific architectures
6.2 cphVB: A System for Automated Runtime Optimization and Parallelization of Vectorized Applications
6.3 Doubling the Performance of Python/NumPy with less than 100 SLOC
6.4 Bypassing the Conventional Software Stack Using Adaptable Runtime Systems
6.5 Bohrium: a Virtual Machine Approach to Portable Parallelism
6.6 Just-In-Time Compilation of NumPy Vector Operations
6.7 Bohrium: Unmodified NumPy Code on CPU, GPU, and Cluster
6.8 NumCIL and Bohrium: High productivity and high performance
6.9 Separating NumPy API from Implementation
6.10 Prototyping for Exascale
6.11 Performance Through Interoperability
6.12 Fusion of Array Operations at Runtime

Part I

Extended Abstract

Chapter 1

Context

The financial crisis, which started in 2008, created a recession in the world economy which resulted in unemployment and instability. One of the triggering factors in the onset of this crisis was the poor performance of the banks in terms of pricing their financial products and assessing the risk of their financial transactions. National and international authorities have consequently decided to tighten regulations in the financial sector, for example by requiring that banks document the risk of every loan. Regulation and legislation are means of trying to prevent a new crisis, but they do not solve the problem that banks are performing poorly. Banks must directly address the problem at hand by strengthening their capabilities. A Danish initiative which focuses on this approach is the academic research center HIPERFIT. It is the goal of HIPERFIT to come up with new mathematical models within financial mathematics and to develop powerful and safe tools to evaluate them, thereby strengthening the banks' capabilities. The research areas of HIPERFIT are cross-disciplinary, as figure 1.1 illustrates.

Figure 1.1: Research areas of the HIPERFIT research center. The areas are Mathematical Finance (MF), Domain-Specific Languages (DSL), Functional Programming (FP), and High Performance Systems (HPS), connected through risk scenarios, model specification, financial information specification, extraction of parallelism, and high performance backends.

The area of Mathematical Finance explores financial models and methods with a focus on increasing the accuracy of models of financial phenomena and determining risks with greater precision. Increasing accuracy and precision implies increasing the number of parameters of a model, thereby increasing compute time and the complexity of the implementation. Domain-specific languages aid this description by providing high-level language constructs that are close to the domain of the mathematical models. Basing these languages on functional language design provides a means for formally verifying the implementation, thereby lowering the possibility of introducing errors at the implementation level. The last pillar, high performance systems, researches ways to support the high-level constructs and obtain efficient execution of mathematical models. This manuscript documents the work that I have done and collaborated upon during my Ph.D. studies within the area of high performance systems. The following section provides an overview of the project itself along with an outline of the remainder of the thesis.

1.1 Structure of the Thesis

In the thesis and the publications, terms such as high-level, low-level, portability, productivity, and performance are used. These terms can be perceived differently based on the experiences and perspective of the reader, so a default context for them will be established here. When referring to high-level languages, the reader should think of languages such as domain-specific, scripting, functional and, in general, declarative languages; in other words, languages that focus on expressing what the machine should compute and not how it computes it. High-level has historically been a fitting label for languages such as C, C++, and Fortran due to the abstractions they provide over machine code and assembly languages. These languages, however, provide capabilities for explicit control of hardware details, often require the programmer to use them, and are for this reason referred to as low-level. Portability is a broad term that can refer to concepts such as platform portability, where a given program can run on different platforms such as Unix, Linux, MacOSX and Windows. Another usage is machine portability, that is, a given program can run on different hardware architectures. Platform portability is of interest; however, the default context and primary focus is on machine portability. Productivity and performance, commonly perceived as two opposite and conflicting concepts, introduce a tradeoff: the more you have of one, the less you have of the other. High-level languages are associated with productivity as they abstract away details of the underlying machine and how to program it. This arguably increases the productivity of the programmer, since only the domain of interest needs to be expressed and not the concerns of mapping that domain to a specific machine architecture. Mapping the domain to a given machine thus becomes the responsibility of the language interpreter, compiler, runtime or, generally, the backend of the language. Performance is an abstract term but commonly associated with the efficient utilization of hardware. When the backends for high-level languages fail to deliver this, responsibility is put in the hands of the programmer and the use of low-level languages for instrumentation of hardware. Consequently, programs become an entanglement of the application domain and hardware instrumentation, requiring that the programmer has a full understanding of both. This thesis explores exactly the tension between these concepts, specifically bridging the gap between performance and productivity. How can the abstractions of high-level languages be maintained and the need for low-level languages avoided? How can a language backend exploit information from high-level abstractions as an aid to efficiently utilize hardware? My primary contribution to the study of these problems lies with the publications in part II. This part of the thesis serves the purpose of putting those publications into context, providing the background for the work, the approach to studying it and outlining directions for future work. The thesis is organized as follows. Sections 1.2 and 1.3 provide information about architectural traits of current and next-generation processing units as well as an overview of the tools and challenges for programming them. Section 1.4 describes the approach I have taken, which in short revolves around the design and implementation of Bohrium, a backend for array-oriented programming.

The information from sections 1.2 and 1.3 serves as motivation for several choices in the design and implementation of Bohrium. Bohrium is a collaborative effort; section 1.5 therefore provides an overview of my contributions to Bohrium as well as my contributions outside the context of Bohrium. Bohrium is language agnostic in the sense that it supports a programming model instead of a specific language. Chapter 2 describes this model. The programming model is array-oriented, for which an essential implementation concern is the supported data representation. Section 2.2 of chapter 2 provides a brief overview of the data representations that I have focused on. Chapter 3 provides the most recent description of the state of Bohrium. Chapter 4 describes ongoing and future work on Bohrium as well as another research direction for bridging the gap between performance and productivity. The final chapter, chapter 5, concludes the work.

1.2 Computing Platforms and Architectures

Computing systems have evolved in utilization, efficiency and availability. The results of the early advances can be summarized in the concepts of the Turing machine[68], the halting problem[68] and the von Neumann architecture[73]. These concepts provide the fundamental architecture of computing systems today, whether they are supercomputers or workstations. The computing industry produces microprocessors under the assumption that the number of transistors/components on a chip doubles somewhere between every 12[49] to 24[50] months. This trend in the fundamental building blocks for microprocessors has for a period of years had an associated doubling in clock frequency for general-purpose microprocessors. The frequency scaling of the general-purpose processor has provided scalable performance of applications: an application developer could expect that application performance would scale linearly with the frequency of the next generation of a general-purpose processor. The power wall and the memory wall are two factors breaking this convenient expectation. As the frequency increases, so does power consumption and the need for dissipating heat. The clock frequency of the general-purpose processor has peaked at around 4 GHz. Attempts at going above this boundary are mostly the domain of enthusiasts and involve clocking the processor to run above its specification and extreme cooling methods involving liquid nitrogen. Frequency scaling did have its issues prior to reaching the 4 GHz boundary due to a discrepancy between the speed of the processor and the speed of the memory system: production of processors has focused on increasing frequency, whereas production of memory has focused on increasing capacity. The memory wall or von Neumann bottleneck refers to this discrepancy in the hardware design. General-purpose processors have, despite this, been providing scalable performance in concert with frequency scaling by integrating a complex multi-layered cache hierarchy within the processor. The cache hierarchy also relies on complex on-chip prefetching units that preload data into cache, thereby lowering the cycles wasted while waiting for data from main memory. The following sections describe the current trends in the design of processing units, starting with the design of the latest generation of Central Processing Units (CPUs) in subsection 1.2.1, followed by the currently very popular graphics processing unit (GPU) in section 1.2.2. These two processing unit designs are currently the most widespread. Subsections 1.2.3 and 1.2.4 describe the emerging architectures of the Accelerated Processing Unit (APU) and the Many Integrated Cores (MIC).

1.2.1 Multi-core CPUs and ccNUMA

Vendors of general-purpose processors are using the growing number of transistors available to cram multiple cores onto the same chip. That is, instead of scaling up the core frequency, they scale out the number of cores. Higher performance can then be achieved by having multiple processes or threads of execution, thereby increasing the instruction throughput.

Current architectures provide symmetric multiprocessing (SMP) with shared memory. Another similar model uses simultaneous multithreading (SMT), where each core runs multiple hardware threads to better utilize the core. The term multi-core processor covers both. Shared memory architectures support a convenient programming model as they provide a single address space for all threads. This allows for efficient inter-process and inter-thread communication. A multi-core processor consists of homogeneous cores in which part of the memory hierarchy is private. When threads running on different cores update the same values, private cached copies must be updated or invalidated, a choice left to the cache coherence protocol of the processor. Scaling up the number of cores while maintaining cache coherence becomes increasingly difficult when expecting uniform memory access (UMA). Non-Uniform Memory Access (NUMA) is for this reason used in the latest generation of multi-core processors. Figure 1.2 illustrates two such designs.

(a) Two-socket configuration with two Intel Xeon E5-2650L processors. The processor package has a single die; the package is referred to as a NUMA node. Each node has twelve cores, each with its own private L1 and L2 cache. L3 cache is shared with all cores on the same NUMA node. Each core can run two hyperthreads.

(b) Two-socket configuration with two AMD Opteron 6272 processors. The processor package has two dies; each die is referred to as a NUMA node and is comprised of four bulldozer modules. The bulldozer module has two cores, each with its own private L1 data cache. L1 instruction cache and L2 cache is shared in the bulldozer module. L3 cache is shared with all cores on the same NUMA node.

Figure 1.2: Examples of multi-core processors with non-uniform memory access.

Obtaining efficiency from CPUs is not only about thread-level parallelism. For compute-oriented tasks CPUs rely on instruction level parallelism (ILP) by providing instruction extensions for carrying out a single instruction on multiple data (SIMD). Technology pioneered by supercomputers was adopted and integrated into mainstream general-purpose processors. Vendors like Intel, AMD, and Cyrix provide different SIMD-like extensions as a competitive advantage, each trying to provide the most compute power in their x86-compatible processors. Instruction set extensions to x86 started out with the MultiMedia eXtensions (MMX) by Intel and the extended MMX (eMMX) by Cyrix, which allowed a single instruction to be performed on multiple integers. AMD in response added 3DNow! as an MMX extension for instructions on multiple single-precision floating point numbers. As increased floating-point compute capability became important, Intel replied with the Streaming SIMD Extensions (SSE) supporting floating-point operations. With each new generation, Intel has continued expanding their vector extensions, labeling them SSE, SSE2, SSE3, SSE4 and now the Advanced Vector Extensions (AVX and AVX2). AMD has kept up and provides compatibility with these extensions, but also introduced a revised version of SSE5 in the form of their own extensions labeled XOP, FMA4, and CVT16. The vector-oriented extensions have been increasing in width and have added FMA instructions, which perform two operations, such as multiply-add and multiply-subtract, in a single instruction. The next generation of processors from Intel, Skylake and Cannonlake, will introduce 512-bit wide (AVX-512) instructions.
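
To make the SIMD and FMA notions concrete, the sketch below, which is my own illustration and not code from the thesis, computes y = a*x + y eight single-precision elements at a time using the AVX2/FMA intrinsics from immintrin.h; a scalar tail loop handles lengths that are not a multiple of eight.

    #include <immintrin.h>  // AVX2/FMA intrinsics

    // y[i] = a * x[i] + y[i]; requires a CPU and compiler flags with AVX2/FMA
    // support (e.g. -mavx2 -mfma).
    void saxpy_fma(float a, const float* x, float* y, int n) {
        __m256 va = _mm256_set1_ps(a);             // broadcast the scalar to all 8 lanes
        int i = 0;
        for (; i + 8 <= n; i += 8) {
            __m256 vx = _mm256_loadu_ps(x + i);    // load 8 floats
            __m256 vy = _mm256_loadu_ps(y + i);
            vy = _mm256_fmadd_ps(va, vx, vy);      // fused multiply-add: va*vx + vy
            _mm256_storeu_ps(y + i, vy);
        }
        for (; i < n; ++i)                         // scalar tail
            y[i] = a * x[i] + y[i];
    }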

Thread-level and instruction-level parallelism are great advances that increase the compute capability of CPUs. However, there is a downside, as these advances in compute capability lead to hitting the memory wall even harder. Current and next-generation multi-core processors are thus highly concerned with improving the memory subsystem. The performance consequences of the memory wall are best described with an example.

Figure 1.3: Benchmark illustrating challenges due to the memory wall and remote memory access on multi-core NUMA architectures. The graphs plot speedup relative to the serial implementation labeled SS against the number of threads. (a) Results from running a synthetic memory-bound benchmark on an Intel Xeon E5-2650L system with two NUMA nodes and a total of 24 cores; observed speedups: SS 1.0, SP 1.0-2.2, PP 1.0-6.0, PP/AL 0.9-5.9. (b) Results from running the same benchmark on an AMD Opteron 6274 system with four NUMA nodes and a total of 32 cores; observed speedups: SS 1.0, SP 1.0-1.5, PP 1.0-5.2, PP/AN 1.0-7.5.

Figure 1.3 provides speedup graphs for a synthetic memory-bound benchmark on two different multi-core processors with NUMA. The benchmark initializes an array and then updates it. There are three implementations of the benchmark: serial initialization with serial access (SS), serial initialization with parallel access (SP), and parallel initialization with parallel access (PP). The labels PP/AN and PP/AL denote two different strategies for controlling thread locality. Only the parallel access of the implementations is timed. The purpose of this experiment is to illustrate the scalability challenge of the memory wall and the effect of NUMA. Comparing SS to PP/AL on figure 1.3a and to PP/AN on figure 1.3b shows how severely the memory wall limits scalability. On the 24-core system (figure 1.3a) a best-case speedup of 5.9 is obtained. On the 32-core system (figure 1.3b) a speedup of 7.5 is obtained. The memory wall is here and it is severe. NUMA architectures aid scalability; however, there are pitfalls a programmer must be aware of. Compare SS to SP on figures 1.3a and 1.3b. Memory is initialized using memset, which means that memory will not be distributed among the NUMA nodes. The consequence is that the majority of threads will suffer the negative effect of NUMA, which is higher latency for retrieving data on another node. This completely trashes performance on all the processors, as the figures show. A programmer can easily step into the NUMA pitfall, even when aware of it, and must explicitly manage thread locality. Figure 1.3b illustrates this most significantly. Compare PP to PP/AN: here two identical implementations are executed, yet with different scalability. The difference is that for the results labeled PP/AN the threads are bound to cores in a manner that minimizes remote access and inter-node communication. Locality has become essential for multi-core performance due to NUMA.
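
To illustrate the initialization pitfall, the sketch below, my own illustration rather than the benchmark code itself, contrasts serial and parallel initialization under OpenMP. On a first-touch system, pages are placed on the NUMA node of the thread that first writes them, so initializing with the same static schedule as the later access keeps most accesses node-local; binding the threads to cores (for example with OMP_PROC_BIND=true or numactl) corresponds to the locality-controlled variants in the figure.

    #include <cstdlib>

    // Compile with OpenMP support, e.g. -fopenmp.
    int main() {
        const long n = 1L << 28;
        double* a = static_cast<double*>(std::malloc(n * sizeof(double)));

        // SP-style: serial initialization (or memset) places all pages on one node:
        //   for (long i = 0; i < n; ++i) a[i] = 0.0;

        // PP-style: parallel initialization distributes pages by first touch.
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < n; ++i)
            a[i] = 0.0;

        // Timed parallel access; with the same static schedule each thread mostly
        // touches pages resident on its own NUMA node.
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < n; ++i)
            a[i] = a[i] * 0.5 + 1.0;

        std::free(a);
        return 0;
    }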

1.2.2 General-Purpose Graphics Processing Units

Graphics Processing Units (GPUs) were historically devices dedicated to processing the graphics-intensive parts of an application, such as rendering in Computer Aided Design (CAD) and real-time graphics for computer games. As the graphics processors evolved and became programmable through DirectX and OpenGL, the raw performance of the GPU generated efforts in trying to use the GPU hardware for other areas than graphics.

At the time these efforts were painstakingly hard, as a simulation had to be expressed using the graphics pipeline and programmed with DirectX or OpenGL. This involved using pixel shaders as compute functions, setting up input as texture images and representing the output as a set of pixels generated by raster operations. The field was called GPGPU, for general-purpose computing on GPUs. With promising results within the field, GPU vendors started to support it, with the first major breakthrough being the Compute Unified Device Architecture (CUDA[55]) from NVIDIA. Today the major vendors NVIDIA and AMD/ATI provide a broad range of support for GPGPU. NVIDIA continues to promote CUDA as the primary means for GPGPU on NVIDIA GPUs. AMD/ATI had previously promoted their own programming model STREAM[7] but has deprecated it in favor of OpenCL[67] for general-purpose programming of their GPUs. Current GPU design still has hardware dedicated to graphics processing, but the overall GPU architecture has unified into components which are usable for both graphics and general-purpose applications. The latest shift from GPU vendor AMD was a complete change in GPU architecture: previous generations used a VLIW4 architecture, which has now been replaced by what they label the Graphics Core Next (GCN) architecture. This was done to increase performance for non-graphics workloads, namely GPGPU. General characteristics of current GPUs are that they contain small programmer-controlled local memories and caches to improve memory throughput. GPUs are high-latency, as they have simple control logic with no branch prediction or data forwarding. For compute power they contain a large number of long-latency, but heavily pipelined, ALUs for high throughput. The general design relies on a massive number of lightweight threads to compensate for the high latency. A key trait of a GPU is that even though it is capable of performing branch instructions, it is very poorly suited to do so due to high latency and the high costs incurred as a consequence of control divergence. The GPGPU term should therefore not be equated with the general-purpose programmability of a low-latency CPU. A key concern for efficiently utilizing a GPU is that it is connected to the CPU over the PCIe bus. The CPU and GPU are thus operating on physically separated memory systems. This is why GPUs are often referred to as accelerators or coprocessors, as they are used to offload specific tasks from the CPU to the GPU. As mentioned, the performance of the GPU is gained from workloads which can be split for execution on a large number of SIMD units in the case of an AMD GPU, and SIMT-based cores in the case of an NVIDIA GPU.

1.2.3 Accelerated Processing Units

The design of multi-core CPUs struggles to increase performance by adding special-purpose instructions to increase the compute capability of floating point operations and by extending the memory system with the ccNUMA architecture. The vendor AMD introduced a different design labeled the Accelerated Processing Unit (APU), where a GPU is integrated into the same die as the CPU. One can see the APU as a CPU which has replaced its floating point unit with a GPU. This is not entirely accurate, but the design idea is to gain compute performance from a GPU core dedicated to compute, where a multi-core CPU obtains compute performance by having multiple cores with advanced floating point units. As described in the previous section, the CPU and GPU operate on distinct memory spaces and one of the key challenges is to determine when to move data between them. The APU design solves this by providing a unified memory space for both the CPU and the GPU, thereby maintaining the convenient programming model of a shared memory space. This union of memory spaces has been labeled heterogeneous Uniform Memory Access (hUMA). hUMA has the additional trait that memory access times are uniform, in contrast to the NUMA memory system of current multi-core CPUs. The devices themselves are programmed using OpenCL, delivering the convenience of also having a single programming model regardless of whether the program is executed on the CPU or the GPU core.

When I began my Ph.D. studies, no processing units using the APU architecture existed. The APU efforts of AMD reached the market in 2011, which in the first iteration was mostly just the successful engineering feat of allowing the CPU and GPU cores to co-exist on the same die. When programming them using OpenCL, the memory spaces were still two distinct address spaces and the programmer had to manually map a pointer in the CPU memory space to that of the GPU's memory space. This is, however, just a mapping of address spaces and does not require copying of data. The first APU (codename Kaveri) to take full advantage of the Heterogeneous System Architecture (HSA) was released in early 2014. As of 2015 the roadmaps of AMD are pointing in the direction of APUs in a wealth of configurations for different applications.

1.2.4 Many Integrated Cores and Network On Chip

The architectures of the Many Integrated Cores (MIC) and Network on Chip (NOC) are somewhere in between the design space of CPUs and GPUs. The MIC architecture uses simpler processing cores, compared to the previously described multi-core processors, requiring fewer components and thereby allowing for a higher number of cores on the same die. The interconnect between the cores is the central focus of the architecture.

Xeon Phi is Intel's flagship product for their MIC design. The design features a throughput-oriented coprocessor with its own memory, connected to the system via the PCIe bus much like the GPUs. The instruction set is, unlike GPUs, based on x86. In this architecture, a high performance bidirectional ring network connects multiple in-order cores providing fully coherent L2 caches. Each core integrates a Vector Processing Unit (VPU) with 512-bit wide SIMD units supporting fused multiply-accumulate (FMA). Each core supports four threads using a simultaneous multithreading (SMT) model labeled Hyperthreading. Latency is hidden by switching execution when a cache miss occurs and by a hardware prefetcher that preloads data into cache to avoid cache misses. The latest product, the Xeon Phi 7120, scales the core count up to 61 cores running at 1.238 GHz with access to 16 GB of GDDR5 RAM.

Tilera is the producer of multiple MIC-based processors. These processors are, unlike the Xeon Phi, not designed to be coprocessors but rather full-featured processors in their own right, capable of efficiently running an operating system. Tilera provides their own instruction set for their processors, which are not x86 compatible. The Tilera design consists of identical cores placed in a mesh network, with L1 cache and a fully coherent L3 cache. Tilera has a specialized technology named Dynamic Distributed Cache (DDC) to facilitate the cache coherence, and claims that it accelerates coherent cache performance by a factor of two compared with other multi-core interconnects. Their latest product, the Tile-Gx8072, scales the core count up to 72 cores running at 1.0-1.2 GHz, with four integrated memory controllers capable of accessing up to 1 TB of DDR3 memory.

Epiphany is a MIC architecture which was designed from the ground up by Adapteva as a crowd-sourced project with the goal of creating a non-legacy architecture with vast scalability. The Epiphany[33] architecture is a coprocessor design much like the Xeon Phi, but features a mesh-based interconnect like the Tilera. In Epiphany, the mesh network connects nodes; each node contains a RISC CPU, a DMA engine, local memory and a node interconnect interface. The interesting architectural difference is that the Epiphany provides one big flat shared memory space, but there is no cache hierarchy and, therefore, no cache coherency to maintain. The first Epiphany-based product, named the Parallella, is a System on Chip (SOC) which became available in October 2013. It features a dual-core ARM A9 and a 16-core Epiphany coprocessor with access to 1 GB of DDR3 memory.


The MIC/NOC architectures do not distinguish between the chip being a coprocessor or a full-featured processor. Their general trait is the network-oriented interconnect between the cores. A recent example of this trend is that Intel has announced that the next generation of the Xeon Phi (Knights Landing) will be available both as a coprocessor and as a standalone CPU.

1.2.5 Supercomputers

In the search for an answer as to what the best performing next-generation processor will be, and thus the most interesting for the backend to target, one has historically been able to inspect the technological traits of the TOP500 list of supercomputing sites in the world. As of June 2015 the first place on the TOP500 list is held by the Tianhe-2 (MilkyWay-2) supercomputer. It uses an x86-based multi-core processor, the Intel Xeon E5-2692, along with the Intel Xeon Phi 31S1P coprocessor. Following it, at second place, is the Cray XK7 based supercomputer Titan. Titan features an AMD Opteron 6274 multi-core processor along with an NVIDIA K20X GPU. The computation nodes of current supercomputers have thus adopted mainstream general-purpose commercial off-the-shelf (COTS) components. While this does not provide much inspiration as to which features next-generation processors can adopt, it does testify that the current mainstream ccNUMA architecture is useful for high performance computing tasks insofar as it is used for these tasks. The most inspiring asset of today's two fastest supercomputers is that they feature heterogeneous nodes. The Tianhe-2 uses the Xeon Phi coprocessor, and the Titan uses an NVIDIA based GPU. If today's supercomputers are an indication of tomorrow's processors, then it is certain that the next generation of processors will be heterogeneous. One indication is that since today's top-performing supercomputers are clusters of COTS components, distributed memory programmed using MPI[66]/PGAS is an attractive means of obtaining scalable performance. Other approaches, such as the SGI Altix machines, have a completely different design which scales up into one huge machine programmable via shared memory, although with multiple layers of NUMA. The SGI machines can scale up to 2048 cores and 16 TB of memory; the core count is equivalent to that of 128 Cray XK7 nodes, and the memory of a single SGI machine is equivalent to that of 512 Cray XK7 nodes. The primary barriers regarding heterogeneity are the physical separation of memory spaces, the inconvenient handling of non-shared memory spaces and the performance penalty for transferring data between the device and main memory. The latest generation of the Xeon Phi integrates the coprocessor on the same die as the CPU cores, turning the Xeon Phi into a full-fledged processor. It thus seems likely that the next generation of processors will be heterogeneous, consisting of multiple coprocessors integrated on the same die.

1.2.6 Summary

The memory wall and the effects of NUMA are the key motivations for hardware vendors exploring different architectures and designs. It is hard to predict the future, and the current divergence in processor design from the major vendors does not make it easier. Thus, when designing a backend for data-centric applications on next-generation processing units, one cannot focus on targeting just a single architecture. The designs of multi-core CPUs, APUs, and MICs all point in the direction of a shared memory model featuring cache coherence with varying levels of non-uniform memory access. The most opposing trend in this regard is the physically separated memory spaces of accelerators such as discrete GPUs. These drag in another direction, pulling with the force that is their superior throughput in terms of floating point operations. Regardless of whether the memory system is shared or distributed, the common trend is that next-generation processing units and computing systems will be heterogeneous.

(The TOP500 list is updated every six months and is available online at http://www.top500.org/.)

It must be the focus of the backend to handle the current heterogeneity of computing systems, as it is a lasting trait of the processing units and computing systems of today and the foreseeable future.

1.3 Backends and Languages

This section outlines current approaches to programming computing systems for high performance. This includes the tools available for programming the TOP500 supercomputers as well as high-end workstations. The mainstream languages C and C++ have very limited support for expressing parallelism. One explanation for this is that the languages themselves predate thread-level parallel processing. Compilers and interpreters for these languages do to some extent provide ways of utilizing parallel hardware, but mainly for instruction-level parallelism. These languages are nevertheless popular weapons of choice when it comes to programming parallel hardware through the use of language extensions, libraries and runtime systems.

1.3.1 Low-level

pthreads: POSIX Threads

POSIX threads (Pthreads[53]) are a standardization of threading libraries for symmetric multiprocessors. Historically, vendors have each provided proprietary threading libraries for their own processing units, which hindered portability from one CPU to another and in some cases from one generation of a CPU to the next. The IEEE POSIX 1003.1c standard provides a standardized C language threads programming interface for Linux and other UNIX-like systems. Implementations that adhere to this standard are referred to as POSIX threads, or Pthreads. Pthread-compatible threading libraries provide the fundamental building block for parallel computation: the library contains functions for creating and terminating threads and for synchronization between the threads of a process. Pthreads portability breaks down when it comes to using multiple platforms. Synchronization primitives such as barriers are defined as optional, which leads to uncertainty as to whether or not synchronization barriers will be available when deploying an application based on Pthreads. The lack of operating system portability is due to the tight coupling between the multi-tasking support and the process/thread model of the operating system. A widespread, well-supported, and highly portable alternative for multithreading is available under the name Open Multi-Processing (OpenMP[22]). OpenMP provides a runtime API encapsulating the concepts of multithreading but without the concerns of mapping to the exact vendor-provided threading model. OpenMP is described in greater detail in section 1.3.2 on compiler directives.
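
The sketch below shows the fundamental building blocks mentioned above, creating a handful of POSIX threads and joining them again; it is a minimal example of my own, not code from the thesis.

    #include <pthread.h>
    #include <cstdio>

    const long NTHREADS = 4;

    // Each thread executes this function; the argument carries its identity.
    void* worker(void* arg) {
        long id = reinterpret_cast<long>(arg);
        std::printf("hello from thread %ld\n", id);
        return nullptr;
    }

    int main() {
        pthread_t threads[NTHREADS];
        for (long i = 0; i < NTHREADS; ++i)
            pthread_create(&threads[i], nullptr, worker, reinterpret_cast<void*>(i));
        for (long i = 0; i < NTHREADS; ++i)
            pthread_join(threads[i], nullptr);   // wait for each thread to finish
        return 0;
    }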

Qthreads: An API for Programming with Millions of Lightweight Threads

Qthreads[75] provides abstractions that enable the development of large-scale multithreaded applications on commodity architectures. Qthreads was designed and implemented to unify threading models for emerging architectures with large-scale hardware-supported multithreading. It provides a lighter threading model compared to other threading models such as Pthreads. A key feature of Qthreads is the ability to describe locality, encapsulated in the concept of shepherds, which, as described previously, is essential for performance on NUMA architectures.


CUDA: Compute Unified Device Architecture

Compute Unified Device Architecture (CUDA) is a platform and programming model introduced by NVIDIA for using their GPUs for GPGPU purposes. CUDA gives developers access to the virtual instruction set and memory of the parallel computational elements in CUDA GPUs. Using CUDA, the latest NVIDIA GPUs become accessible for computation like CPUs. The CUDA platform is accessible to software developers through C, C++ and Fortran libraries. The CUDA programming model revolves around kernels, which are self-contained functions implemented in CUDA C/C++ or CUDA Fortran. The CUDA API provides the means for JIT-compiling kernels using one of the supported compilers. The implementation of the kernels themselves revolves around a programming model NVIDIA has labeled Single Instruction Multiple Threads (SIMT). The model is closely related to SIMD but has, due to the CUDA architecture, essential low-level differences in relation to thread scheduling. The lowest scheduling unit in CUDA is a warp. A warp is a set of threads scheduled to execute in parallel, and all threads within the warp execute exactly the same instruction. It is thus not a single instruction mapped to multiple data elements (SIMD), but a single instruction mapped to multiple threads (SIMT); the threads themselves decide which data element to operate on. Each thread has a unique position (blockIdx, threadIdx) in a three-dimensional organization partitioned into blocks of threads. In the SIMT model, each thread uses its coordinate to map to the data elements. The warp size is supposed to be an implementation detail that the programmer should not be aware of, but for many optimizations on the CUDA platform knowing the warp size is essential for obtaining high performance. Another low-level concern is how and when to transport data between the main memory of the host and the device memory of the GPU.
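
A minimal CUDA C++ sketch of the SIMT model and the host-side data movement described above; it is my own illustration, not code from the thesis. Each thread derives a global index from its blockIdx/threadIdx coordinate and handles one element, and the host explicitly copies data to and from device memory around the kernel launch.

    #include <cuda_runtime.h>

    __global__ void vec_add(const float* a, const float* b, float* c, int n) {
        // Each thread computes its own element index from its block and thread coordinates.
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)                                   // the grid may be larger than n
            c[i] = a[i] + b[i];
    }

    void add_on_gpu(const float* a, const float* b, float* c, int n) {
        float *da, *db, *dc;
        size_t bytes = n * sizeof(float);
        cudaMalloc((void**)&da, bytes);
        cudaMalloc((void**)&db, bytes);
        cudaMalloc((void**)&dc, bytes);
        // Explicit transfers between host memory and device memory over PCIe.
        cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);
        int block = 256;
        int grid = (n + block - 1) / block;
        vec_add<<<grid, block>>>(da, db, dc, n);     // one thread per element
        cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost);
        cudaFree(da); cudaFree(db); cudaFree(dc);
    }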

OpenCL: Open Computing Language

Open Computing Language (OpenCL) can briefly be described as the open alternative to CUDA. OpenCL is the low-level API to use when targeting AMD GPUs. OpenCL includes a language (based on C99) for writing kernels (functions that execute on OpenCL devices), plus an API to declare which devices to use, to allocate and copy data on and between devices, and to execute kernel functions on the compute devices. The goals of OpenCL stretch a great deal further: OpenCL is a framework for writing programs that execute across heterogeneous platforms consisting not only of GPUs but also of CPUs, APUs, MICs, and Digital Signal Processors (DSPs). The goal of OpenCL is to provide a platform for parallel computing using task-based and data-based parallelism. Major vendors including Intel, Qualcomm, AMD, NVIDIA, Altera, Samsung, Vivante, and ARM have adopted and actively support OpenCL, although the support from NVIDIA seems to be faltering. The previously mentioned families of processing units, Opteron, Xeon Phi, Radeon HD, APU, and Epiphany, are all programmable using OpenCL. However, since OpenCL is a low-level interface, the programmer must still explicitly control the hardware to the extent that OpenCL allows. Therefore, it follows that the performance of a kernel written in OpenCL is not portable across processing units. One example is a kernel written for a GPU, which implies a kernel written to utilize around 1500 threads: instantiating 1500 threads on a CPU which at the hardware level only supports 16 threads will only ensure horrifically slow execution. OpenCL is nonetheless a very promising and convenient low-level API for instrumenting parallel execution.

MPI: Message Passing Interface

The TOP500 supercomputing sites have previously had a mixture of architectural approaches to obtaining scalability. Previously, SGI-like architectures dominated the TOP500 sites, but today the designs have converged towards large-scale, custom-built systems based on commercial off-the-shelf components, such as the systems designed by Cray, Inc.

11 Backends and Languages Context but today the design have converged towards large-scale, custom-build systems based on the commercial-of-the-shelf components such as the systems designed by Cray, Inc. Along with this convergence the Message Passing Interface (MPI) has become the de-facto standard for distributing work-loads among the nodes of a cluster regardless of its size. Message Passing is often used with the Single Program Multiple Data (SPMD) model. When using MPI a number of processes run and execute the same program. MPI / SPMD are thus a great deal coarser grained in its decomposition than the SIMT/SIMD models. The processes communicate by passing messages; there is no notion of shared memory. Each process has a which is much like the threadIdx/blockIdx of CUDA; it is a unique identifier of the process in the set of processes running the same program on multiple machines. The rank is thus the key used to direct the behavior of the program and decompose the problem domain.

PGAS: Partitioned Global Address Space

Partitioned Global Address Space (PGAS), also known as distributed shared memory, is a design built around communication primitives for maintaining a shared memory model in a distributed setting. This is in sharp contrast to MPI, in which all communication is based on passing messages and shared memory is not possible. The PGAS model is used in the same setting as MPI: large-scale computational clusters. Accessing a value in remote memory is significantly slower than accessing a value in local memory. The PGAS model encapsulates this by providing one big global address space, but conveniently provides constructs for determining whether a given operation will be accessing local or remote memory. The GASNet communication system is an example of a building block implementing this distributed memory model; it is, however, intended for use in compilers and runtime systems and not for end-users.

BLAS: Basic Linear Algebra Subprograms

BLAS[2] was originally a Fortran library containing 38 low-level subprograms for many of the basic operations within numerical linear algebra. The operations included dot product, elementary vector operations, Givens transformation, vector copy and swap, vector norm, vector scaling, and the determination of the index of the vector component of largest magnitude. BLAS has, since its first publication in 1979, gone from being just a Fortran library to becoming the de facto library interface for performing operations within the scope of linear algebra. The BLAS operations today consist mainly of three levels.

Level 1 Vector expressions of the form y ← ax + y, where x, y are vectors and a is a constant, as well as dot products and vector norms.

Level 2 Matrix/vector expressions of the form y ← αAx + βy, where A is a matrix, x, y are vectors and α, β are constants.

Level 3 Matrix/matrix operations of the form C ← αAB + βC, where A, B, C are matrices and α, β are constants.
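
For concreteness, the calls below show one representative operation from each level through the C interface (CBLAS); this is a sketch of my own, assuming a CBLAS implementation such as the Netlib reference, OpenBLAS, or MKL is linked in.

    #include <cblas.h>

    // x, y are vectors of length n; A, B, C are n-by-n row-major matrices.
    void blas_levels(int n, double alpha, double beta,
                     const double* x, double* y,
                     const double* A, const double* B, double* C) {
        // Level 1: y <- alpha*x + y
        cblas_daxpy(n, alpha, x, 1, y, 1);

        // Level 2: y <- alpha*A*x + beta*y
        cblas_dgemv(CblasRowMajor, CblasNoTrans, n, n, alpha, A, n, x, 1, beta, y, 1);

        // Level 3: C <- alpha*A*B + beta*C
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    n, n, n, alpha, A, n, B, n, beta, C, n);
    }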

The Netlib[2] provides a C reference implementation of BLAS, and there are optimized versions of the BLAS libraries for almost all hardware targets. Both enthusiasts and vendors supply implementations of these libraries. To name a few: Intel provides the Math Kernel Library (MKL[1]) targeting their multi-core CPUs and the MIC; AMD supplies the Core Math Library (ACML) targeting their Opteron CPUs and the Accelerated Parallel Processing Math Libraries (APPML) targeting their GPUs; NVIDIA provides cuBLAS[5], a complete BLAS implementation for dense matrices, and cuSPARSE[51], a dedicated subset for sparse matrices, both targeting CUDA-based GPUs. GotoBLAS[32]/OpenBLAS[77]/GotoBLAS2 are open-source initiatives currently led by the Texas Advanced Computing Center (TACC), targeting Intel Nehalem, Intel Atom, VIA Nano, AMD Shanghai, and AMD Istanbul.

There currently exists an abundance of BLAS libraries targeted at virtually any architecture. Another very popular BLAS approach, which has had significant impact on high performance computing, is Automatically Tuned Linear Algebra Software (ATLAS)[74]. Microbenchmarks help estimate attributes such as cache sizes, which can then be used to provide optimal values for block sizes for tiling. Additional micro-kernel benchmarks are executed to determine optimal values for each BLAS operation. The autotuning approach allows for highly tuned BLAS libraries, as even small variations within a CPU architecture can be measured, modeled and integrated with the optimized BLAS operations. Build To Order (BTO)[63] is another interesting approach to obtaining an efficient BLAS library implementation. BTO is a BLAS-expression compiler that generates high performance implementations of basic linear algebra kernels. The user of the Build to Order BLAS compiler writes down a specification for a sequence of matrix and vector operations together with a description of the input and output parameters. The BTO compiler then tries out many different choices of how to implement, optimize, and tune those operations for the available hardware. The result of this process is output as a C file containing a function that implements the specified composition of the operations described by the user. This approach is quite interesting, as it allows optimizing not only for particular hardware but also for whole expressions.

1.3.2 Compiler Directives

Libraries augment the functionality of a language. They do not extend the language itself, as they are not part of the grammar, but they do extend what can be done within the language, such as providing control over processing threads and co-accelerators as the low-level APIs from the previous sections do. This attribute is both a strength and a weakness. Another approach to extending a language is using compiler/interpreter directives. Directives are similar to a library in the sense that they are not part of the language grammar. Directives are hints, tags, decorators or pragmas inserted in source code, providing instructions for the compiler to follow or ignore. The following descriptions cover some widespread and popular directive-based frameworks.

OpenMP[22] is one of the most widespread and well-supported sets of compiler directives for C, C++, and Fortran compilers. OpenMP provides a convenient framework for orchestrating and synchronizing threads on multi-core processors. OpenMP supplies pragmas such as #pragma omp parallel, which, when placed in front of a code block, will execute the block on a team of threads. Another construct, #pragma omp parallel for, provides a simple means for expressing data-parallelism: when put in front of a for-loop, the compiler will attempt to perform a fork-join parallelization over the loop body (a sketch at the end of this section illustrates the directive style). OpenMP is a standardized API which is integrated with C/C++/Fortran through directives. Support for coprocessors and accelerators has been added in OpenMP 4.0.

OpenACC[76] is a collection of directives for code blocks and loop bodies similar to OpenMP. With OpenACC, these code blocks and loop bodies are targeted for execution on coprocessors and accelerators such as GPUs. The benefit of using OpenACC directives in comparison to OpenCL/CUDA is that they encapsulate device initialization and data movement between host and accelerator. This simplifies many of the tasks required for offloading computations to an accelerator/coprocessor.

LEO[52], Intel's Language Extensions for Offload, provide offload capabilities for Intel Graphics and the Xeon Phi accelerator and coprocessor. Adding #pragma offload target(mic|gfx) will execute the following code block on an accelerator or on Intel graphics. Adding #pragma offload_transfer target(mic) provides control over memory management on accelerators.

LEO works with OpenMP to provide both task-based and loop-centric parallelization when targeting an accelerator. This simplifies porting already OpenMP-parallelized code to run on the Xeon Phi. However, when used to target Intel graphics, only perfect loop nests are offloadable and OpenMP is not supported.

OmpSs[27] abstracts parallelization one level higher than OpenMP and OpenACC by providing a task-based programming model. OmpSs targets the programming of heterogeneous and multi-core architectures and extends OpenMP 3.0 by offering asynchronous parallelism in the execution of tasks. The main extension provided by OmpSs is the concept of data dependencies between tasks, rather than just the fork-join based parallelization of #pragma omp parallel for or #pragma acc loop. This is done through directives such as #pragma omp task in(...) out(...). This information is used during execution by the underlying OmpSs runtime to control the synchronization of the different instances of tasks by creating a dependence graph that guarantees the proper order of execution. This mechanism provides a simple way to express the order in which tasks must be executed, without needing to add explicit synchronization.

Directive-based programming is a very promising technology for dealing with heterogeneous architectures.
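
To make the directive style concrete, the sketch below, my own illustration and not code from the thesis, annotates the same loop for OpenMP fork-join parallelization and, in a second variant, for OpenACC offload with explicit data clauses. Each variant assumes a compiler with the corresponding support (e.g. -fopenmp, or an OpenACC-capable compiler); a compiler without it will simply ignore the unknown pragma and run the loop sequentially.

    // Element-wise vector addition; the sequential semantics are unchanged
    // if the directives are ignored.
    void vec_add_omp(const double* a, const double* b, double* c, long n) {
        #pragma omp parallel for
        for (long i = 0; i < n; ++i)
            c[i] = a[i] + b[i];
    }

    // OpenACC variant: the same loop offloaded to an accelerator, with the
    // data clauses describing movement between host and device.
    void vec_add_acc(const double* a, const double* b, double* c, long n) {
        #pragma acc parallel loop copyin(a[0:n], b[0:n]) copyout(c[0:n])
        for (long i = 0; i < n; ++i)
            c[i] = a[i] + b[i];
    }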

1.3.3 Libraries

BLAS is an example of a successful high performance library, to such an extent that today it is not just a library but the de facto interface for linear algebra. It conveniently hides the low-level details from the user by delegating the task of mapping BLAS operations to hardware to the library provider. However, BLAS does have its shortcomings when it comes to ease of use, and more importantly the BLAS interface is static: optimizing a composition of BLAS operations implies extending the interface with the composition, as in the approach of the BTO BLAS compiler. The following listing describes current approaches to maintaining the performance of libraries, and to some extent increasing it by allowing composition of operations.

Blitz++[71] is a C++ template library for array manipulation. It exposes a single class called blitz::Array, which provides a dynamically allocated N-dimensional array. The implementation is based on the utilization of expression templates for performance. Expression templates facilitate lazy evaluation at compile time: any expression performing operations on one or more blitz::Array instances generates a tree-like structure of expressions. The expression structure is evaluated at compile time and provides the means for applying optimizations such as loop fusion, unrolling, tiling/blocking, and specialization. The high-level abstraction of arrays provides a convenient declarative notation, effectively shielding the user from low-level optimizations. Blitz++ is being used in C++ projects but also as the backend for applications written in high-level languages such as R and Python.

Armadillo[60] is a C++ linear algebra library with a structure much like that of Blitz++, but with an emphasis on ease of use by providing syntax similar to that of Matlab and Octave. It uses expression templates similarly to Blitz++ to obtain performance, and also provides various matrix decompositions through integration with LAPACK[3] or one of its high performance drop-in replacements, such as MKL from Intel or ACML from AMD. Performance comparisons[60] suggest that the library is considerably faster than Matlab and Octave. Armadillo also provides integration with R. The integration was demonstrated at the R/Finance conference in Chicago, IL, USA in May 2013, showing a speedup of 66 on the same hardware for a Kalman filter application written in R.

(R/Finance: Applied Finance with R, http://www.rinfinance.com/)


Eigen[35] is yet another C++ library for linear algebra based on expression templates, with a focus on ease of use. The library provides very efficient utilization of a single CPU core. Current work on Eigen focuses on expanding support for multi-cores by delegating to BLAS libraries and, further down the road, orchestrating the parallelization within Eigen explicitly using OpenMP.

Thrust[8] is a C++ template library maintained by NVIDIA for parallel platforms based on the Standard Template Library (STL). Thrust allows the user to implement high performance parallel applications with minimal programming effort through a high-level interface that is fully interoperable with technologies such as C++, CUDA, OpenMP, and Threading Building Blocks (TBB). Thrust provides a collection of data-parallel primitives such as scan, sort, and reduce, which can be composed to implement complex algorithms with concise, readable source code. The user effectively delegates the task of selecting the most efficient implementation to Thrust by describing computations in terms of high-level abstractions. Thrust specifically provides an STL interface compatible with std::vector. The interface quite efficiently hides the low-level details of multi-core threading and GPU kernel-code generation. The STL interface is provided through two std::vector compatible containers: thrust::device_vector and thrust::host_vector. Thrust does not, however, hide the essential challenge of deciding when to move a problem to the GPU. Thrust is thus specifically designed for a coprocessor architecture in which the host can be a general-purpose multiprocessor and the device a coprocessor/accelerator. Users of Thrust must decide when to use the device/coprocessor and when to use the host/multi-core for executing the composed STL algorithm.

Bolt[59] is a C++ template library maintained3 by AMD which provides an STL compatible library of high level constructs for creating accelerated data-parallel applications. Bolt can briefly be described as the AMD equivalent to Thrust – targeting the same architectures – but using OpenCL instead of CUDA. The high-level abstraction is very similar and, like Thrust, Bolt also exposes the locality of the arrays by distinguishing between a host and a device vector.

Cilk Plus[17, 57] is not only a library but also an extension to C and C++ that offers a quick and easy way to harness the power of both multi-core and vector processors. Cilk Plus provides parallel constructs in the form of three keywords: cilk_spawn, cilk_sync, and cilk_for. The three keywords provide a simple model for explicit parallel programming, while the runtime and template libraries offer a well-tuned environment for building parallel applications. Cilk Plus consists of language extensions, a library and the runtime system to support it. Cilk Plus requires compiler support in order to provide an array notation of unprecedented simplicity compared to purely library-based approaches. The Intel compiler supports Cilk Plus and so do branches of gcc 4.8.

These libraries have seen widespread use, and their popularity can be credited to their ease of use, granted by the declarative programming style based on operator overloads in the case of Blitz++, Armadillo, and Eigen, and the STL-compatible interface of Thrust and Bolt. An observation regarding these libraries is that they fail to hide hardware specifics when it comes to utilizing separate memory spaces. Thrust and Bolt both expose this by requiring the user of the library to decide whether a given operation should operate on data on the device or on the host. Blitz++, Armadillo, and Eigen do not yet support execution on targets other than the CPU and main memory, and it is uncertain how these libraries will address this issue.

3Latest information available via https://github.com/HSA-Libraries/Bolt


1.3.4 Languages

The previous sections have mostly been concerned with C/C++/Fortran and the low-level APIs, libraries, and compiler directives available within them. The reason is that C is a fundamental building block. C is the language for systems programming. C provides the low-level control needed for constructing low-level APIs, libraries, and for controlling hardware. The C language provides structs to encapsulate related data, and macros for preprocessing and transforming source code. The C language thus has very limited means for providing high-level abstractions for the user, which makes it ill-suited for application programming. C++, with its integrated legacy support for C, has been, and still is, a popular choice for encapsulating the low-level details of C by means of classes, operator overloads, and STL-container interfaces. C++ is even popular within computational finance; one of the leading educational institutions, Carnegie Mellon, teaches C++ as part of the curriculum in their master program in Computational Finance. Fortran has been and still remains the weapon of choice within the natural sciences for expressing computational experiments such as simulations and model-testing.

The following describes the approaches of more high-level languages concerning productivity and performance. Substantial research effort has been put into increasing programmer productivity for large-scale clusters and supercomputers. The challenge was to replace a programming model based on X + MPI, where X is one of C, C++ or Fortran, with something that increased the productivity of the programmer. The earliest effort was High Performance Fortran (HPF), later ZPL[19]. The results of the efforts are the PGAS languages which provide convenient means for supporting the SPMD model in a distributed environment. PGAS is supported in different forms by Unified Parallel C (UPC[18]), CoArray Fortran (CAF[54]), IBM X10 (X10[61]), and Chapel[20].

HPF High Performance Fortran is a high-level data-parallel programming system based on Fortran. It was one of the first attempts at creating a high-level language abstracting away the details required by the X + MPI approach. Although not successful, HPF had a major impact on the design and implementation of data-parallel languages. A postmortem[37] of the language summarizes lessons learned from the rise and fall of HPF for future languages to consider in their design.

ZPL Z-level Programming Language is a parallel language designed from first principles for fast execution on both sequential and parallel computers. ZPL can achieve efficiency comparable to hand-coded message passing by exploiting the latent parallelism of array operations. ZPL is not a PGAS language, but concepts introduced by ZPL, such as regions, live on in Chapel.

UPC Unified Parallel C is one of the earliest examples of a PGAS language, which combines the convenience of a global address space with the locality control and scalability of user-managed data partitioning. Any code which forms a valid C program is also a valid UPC program. UPC is thus tightly coupled with the low-level nature of C. UPC provides high performance by hiding latency through single-sided communication but does little to advance the productivity of the programmer.

CAF CoArray Fortran is based on many iterations of work on Fortran. Fortran has long been a favorite among scientific programmers, and for that reason it has always been viewed as an important language to map to scalable parallel systems. CAF 2.0 is the current peak of the efforts based on the works of High Performance Fortran (HPF) and CoArray Fortran 1.0. CAF 2.0 is a partitioned global address space programming model based on one-sided communication, and is a coherent synthesis of concepts from MPI, Unified Parallel C, and IBM's X10 programming language.


X10 is from IBM and named after a mission statement from 2002, when IBM set out to develop a petaflop computer which could be programmed ten times (hence X10) more productively than a computer of similar scale. The roadmap of the X10 team was to develop a programming model for large scale, concurrent systems that could be used to program a wide variety of computational problems, and could be accessible to a large class of professional programmers. The result is X10: a modern language in the strongly typed, object-oriented programming tradition whose design fundamentally focuses on concurrency and distribution, and which is capable of running with good performance at scale. UPC came from C, CAF from Fortran, and X10 came from Java and carries a Java inheritance.

Chapel is an emerging parallel programming language that strives to improve the productivity of parallel programmers from desktops to supercomputers. It is the only one of the PGAS languages which provides its own grammar and does not carry a legacy such as C, Fortran or Java. It also strives to support more general styles of parallelism in software and hardware. Cray Inc. leads the design of Chapel in collaboration with members of academia, computing centers, and industry. It is developed in a portable, open-source manner under the BSD license. A key feature of this philosophy is that higher-level features are implemented within Chapel in terms of the lower-level concepts, ensuring that the various levels are compatible and at every level support composition of concepts. Chapel thus supports both low-level instrumentation of communication and locality while at the same time providing a convenient syntax and ease of use in terms of high-level language constructs. It is as such the best of both worlds.

The above-mentioned languages are based on PGAS and highly focused on large-scale computational clusters and supercomputers; they mainly target distributed memory systems. A wealth of other parallel programming languages exists which to a greater extent focus on obtaining high performance on a single node or workstation and utilizing parallelization at the node/workstation level. Such languages include, but are not limited to, Julia[10] and NESL[12]. NESL is a parallel language developed at Carnegie Mellon. The major focus of NESL is to provide a convenient and high-level data-oriented programming model. NESL advocates the expression of algorithms in a high-level form and thereby extracts nested data-parallelism. NESL consists of a runtime system which executes virtual machine code (VCODE[13]). The NESL project itself seems to have stopped development around 1995-2000. However, recent work has shown that NESL's model of nested data-parallelism is also highly applicable to efficient utilization of GPUs[9]. The group researching domain specific languages in the HIPERFIT context also investigates the use of NESL and nested parallelism targeting GPUs.

MATLAB is a high-level language and interactive environment for technical computing. Matlab has a long history dating back to the 1970s where it was introduced in academia as an alternative to Fortran and LINPACK[25]/EISPACK[31]. The name Matlab is an abbreviation of Matrix Laboratory, which emphasizes the use of an array-oriented programming model where all variables are arrays/matrices. Matlab thus provided a convenient matrix-notation for performing linear algebra and mapped these operations to Fortran and calls to LINPACK/EISPACK. Today Matlab has widespread use in academia, within domains such as economics and science, as well as in industry. It is today implemented in C, and LINPACK/EISPACK have been replaced by LAPACK[3].

IDL IDL is an interactive data language that has seen widespread use in areas such as astronomy, atmospheric physics and medical imaging. It features a high-level declarative syntax and applies well to interactive processing of large amounts of data. IDL is a commercial product, though free implementations of the language exist, most notably the GNU Data Language.


Julia is a high-level, high performance language for technical computing with syntax that is familiar to users of MATLAB, Python, and R. It provides a sophisticated LLVM-based compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library. The library, largely written in Julia itself, also integrates mature, best-of-breed C and Fortran libraries for linear algebra, random number generation, signal processing, and string processing. Julia provides parallel constructs such as @spawn and @parallel for local multi-core parallelization, and the constructs @distribute and @localize for distribution on clusters. Julia mainly targets multi-core processors and grids or clusters of multi-cores. However, as the CUDA compiler has now been contributed to the LLVM compiler project, Julia's compiler could, through its LLVM code generation, come to target CUDA-based GPUs.

Another approach to uniting productivity and performance is to map low-level libraries directly into high-level languages. Two examples of this approach are described below.

R is a programming language and software environment for statistical computing and graphics. It is widely used among statisticians and data miners for developing statistical software and data analysis. It also has widespread adoption within mathematical/computational finance at the University of Copenhagen, where it is used as an integrated teaching tool. The University of Washington similarly uses R as part of the curriculum of their Master of Computational Finance program. R can be regarded as a domain-specific language for statistical computing, and it has very little focus on generic parallel constructs or high performance. However, there are multiple efforts of integrating R, via C++[28], with libraries such as Eigen[6] and Armadillo[29] as a means of increasing the computational capabilities of applications written in R.

Python is a high-level, high productivity, dynamic, interpreted programming language. Python has a philosophy of batteries-included, which means that a rich standard library follows along with the distributions of the Python interpreter. The design of Python is thus to have a somewhat small and simple language with a high focus on productivity and easily readable code. The language itself does not contain parallel constructs but relies on libraries for these purposes, libraries which are included with the distribution of the interpreter. Python has a suite of libraries consisting of Numerical Python (NumPy[69]), Scientific Python (SciPy[36]), Interactive Python (IPython[56]), and matplotlib[34], which combined provide a high-level interface similar to that of MATLAB with a base abstraction of a multi-dimensional array structure. Python has gained much popularity due to its simplicity and well-supported libraries, especially within scientific computation communities. Python has also proven itself in finance communities. Python has multiple interpreters with different performance characteristics; the most popular is the reference implementation CPython. CPython integrates well with C libraries, and the NumPy library also gains its performance boost by providing an efficient backend implementation of the array operations exposed in the library.

This ends the background describing the architectural diversity of current and near-future processing units, and the multiple levels of abstraction at which the processing units, and the computing systems they are part of, can be programmed.

1.4 Approach

As I started my Ph.D. studies the HIPERFIT initiative had just begun and the different research areas ran in parallel. As a consequence, there was no domain specific language for the backend

to support. The backend instead became language agnostic by circumstance rather than design. However, what was given from the HIPERFIT context was the intent of creating a high-level domain specific language, most likely in the shape of a functional language. With that in mind I looked at related work on programming systems for high performance and productivity, as the previous two sections describe. The result was the following design criteria and goals.

• Support an array-oriented programming model

• Language integration via intermediate representation

• Target a performance that is comparable to straightforward hand-coded C/C++ for the same application

The motivation behind the focus on array-oriented programming is manifold. Considering productivity, the widespread and actively used high-level languages such as Python/NumPy, Matlab, R, and Julia all use arrays and a convenient declarative notation for manipulating them. Without knowing exactly what the domain-specific language would look like, it seemed safe to assume that it would provide abstractions similar to those just described. The existence of purely functional array-oriented languages such as Single-Assignment-C[62] also testifies that the model does not conflict with the intent of designing domain-specific languages using functional language design. From a performance perspective, history has shown that data-parallelism drives performance[19, 14]. In short, the array-oriented model allows for convenient notation within the language, and the backend can exploit the inherent data-parallelism of array operations for performance.

Focusing on a programming model instead of a language was a way to bootstrap the studies. What started out as a condition of circumstance became a novelty of the work. It allowed the exploration of using an intermediate representation for array operations as the bridge between language and backend, and thereby building the bridge between performance and productivity. The three criteria formed the thesis question: Is it possible to construct a language agnostic backend for high-level languages without sacrificing performance? I have applied an experimental[23, 24] approach to explore this question. The contributions I have made in this regard are described in the following section.

The experimental environment included the specification and construction of an eight-node Beowulf cluster. The specification was inspired by the current trends as described in section 1.2 and resulted in a node-configuration with two Opteron 6274 CPUs, 128GB of memory, three Radeon HD 7850 GPUs, and gigabit ethernet. This configuration encapsulates several programming challenges: shared memory multi-core processors with NUMA architecture, distributed memory, GPUs, and the general trend of heterogeneous architectures. As previously mentioned, the design and implementation of the backend is a collaborative effort. The cluster encapsulates several performance related challenges; the focus of my studies is the efficient utilization of shared memory multi-core processors with NUMA architecture. That means exploring whether declarative array-oriented programming provides sufficient information for a backend to efficiently manage non-uniform memory access without help from the programmer. Efficient utilization of other processing units as well as distributed memory is also of interest but out of scope for my work. Focusing on multi-core processors also aids experimental testing and performance evaluation as it allows for comparative studies with existing languages, libraries and tools. The cluster served as a laboratory, and to perform actual experiments in that laboratory I implemented a tool named Benchpress which is described in Chapter 4 section 4.5. Modern software engineering practices were applied to maintain good research conduct and laboratory practice, including providing all source code as open source in publicly available repositories and using continuous integration tools for correctness testing.


1.5 Contributions

The core of the thesis is based on the design, implementation and experimental evaluation of a backend for array-oriented programming, which resulted in 12 papers that I have co-authored throughout my studies: 10 of these are published, one is submitted but not yet published, and one is not yet submitted. My main contribution to this work is the C-targeting Array Processing Engine (CAPE) which I have described in the paper Automatic mapping of array operations to specific architectures[48] (see Part II section 6.1) submitted for publication in the Elsevier International Journal of Parallel Computing, ref: PARCO-D-15-00170. Together, the papers describe the incremental steps in implementing a prototype, evaluating the performance, making observations, getting feedback, revising the design, revising the implementation and then re-iterating. That is, they describe performance related issues with different approaches, culminating in the current state of the array-oriented backend Bohrium and the array processing engine CAPE. The remainder of this section will go through the publications and describe my role in the work. The end of the section describes a divergence exploring interoperability with the parallel programming language Chapel.

The first experimental prototype was named Copenhagen Vector Bytecode (cphVB[42], see Part II section 6.2). The work on cphVB led to the publication and presentation4 of a paper at the SciPy conference in Austin, Texas, June 2012. The contributions of cphVB are materializing the idea of using an intermediate representation (vector bytecode) as the mediator between a language and a backend, as well as exploring the use of the virtual machine approach for processing array operations and observing performance related challenges for future work. My contribution to cphVB was the implementation of the virtual machine using static dispatch for processing array operations on CPUs. The static dispatch processed one bytecode at a time, either performing a data management operation or executing an array operation.

Figure 1.4: Array-notation pseudo-code for computing N random numbers, dividing them by 100, adding 5 and calculating the sum: print sum((rand(N)/100)+5)

[Figure 1.5: the corresponding low-level loops; each array operation allocates a temporary array (t1 = malloc(N)), loops over its elements, and deallocates it again.]

4The presentation was recorded and can be seen at https://youtu.be/HFxn3mSp9ww


A key observation was made during the exploration of approaches to processing array operations. Figure 1.5 illustrates the low-level operations performed when processing the array program in figure 1.4. In the case of Python/NumPy the loops in figure 1.5 are ufuncs5 or calls to optimized libraries[1, 11, 2, 74, 43] when applicable and available. For every array operation the pattern consists of: [allocate]; execute; [deallocate]. For compound expressions, arrays are allocated to store intermediate values between array operations, as illustrated in figure 1.6. This pattern is common in languages that rely on library delegation for efficient implementation of array operations. There are multiple performance challenges with this approach, and improving them became the main focus of the studies.

The first step I took was experimenting with memory allocation, specifically a buffer allocation and reuse scheme dubbed the software victim cache. The work was done on a fork6 of the NumPy project, and the approach and findings are described in the paper Doubling the Performance of Python/NumPy with less than 100 SLOC[47] (see Part II section 6.3) submitted to and presented at the PyHPC workshop in conjunction with SC13. The approach improved performance as side-effects of memory allocation and first-touch page allocation were avoided, which, due to the [allocate]; execute; [deallocate] pattern on large arrays, gave a considerable improvement in terms of consumed wall-clock time. The work was a contribution to NumPy and generalizes to other languages to the extent that they apply the same array-processing pattern.

Python/NumPy served as a means to bootstrap the studies. However, the focus of the thesis is not on a high performance backend to Python/NumPy but rather a backend to array-oriented programming, of which Python/NumPy is one example. Work was put into expanding language and hardware support, resulting in the paper Bohrium: a Virtual Machine Approach to Portable Parallelism[41] (see Part II section 6.5) submitted to and presented at the HIPS workshop in conjunction with IPDPS14. In addition to the name change the entire backend was a complete re-implementation with language support for Python, C, C++, CIL-based languages (C#, F#, etc.) and hardware support for CPUs, GPUs, and distributed memory. The step from cphVB to Bohrium was a large collaborative effort in which my contribution was the C++ language bridge, and array processing on the CPU using the victim cache scheme. On the topic of other languages, the C++ and NumCIL integrations were only demonstrated and their inner workings not described.

[Figure 1.6: low-level loops for a compound expression; temporary arrays (t1, t2, t3) are allocated to hold the intermediate results between array operations.]

5http://docs.scipy.org/doc/numpy/reference/ufuncs.html
6victim_cache branch of: https://github.com/cphhpc/numpy/

A key goal thus became the ability to apply array contraction as illustrated in figure 1.8. Achieving this goal significantly reduces the need for memory allocation and in some cases, such as the example in figure 1.4, allows streaming computations, reducing memory consumption to a constant size, thereby transforming otherwise memory-bound array expressions into compute-bound ones and removing pressure from the largest bottleneck in a compute system. Composition and contraction are thus essential for performance and required to reach an efficiency level matching that of hand-coded implementations in a low-level language.

However, composition and contraction are out of reach for the static dispatch approach. The static approach has to have an "answer" to every array operation in the form of a function. This is required since the backend must support high-level languages, including interpreted languages, implying that the program is not known until runtime. This was implemented using C++ templates and instantiation of a massive library of about 900 functions. Having a function for every legal composition of array operations is infeasible. The idea was therefore to compile functions dynamically instead; the first iteration of this work led to the paper Just-In-Time Compilation of NumPy Vector Operations[44], see Part II section 6.6, which was submitted to the GSTF Journal on Computing. The contribution of the paper was the exploration of JIT-machinery and a demonstration of the feasibility of the approach by JIT-compiling a subset of bytecodes. My contribution to the work was working with Johannes Lund on the implementation.

An essential observation was made, namely that the internal program representation in Bohrium was insufficient for performing composition and contraction. This became an essential area of research for every contributor to Bohrium. The work in this direction and its challenges are described in greater detail in the not yet submitted paper Fusion of Array Operations at Runtime, see Part II section 6.12. My contribution to this paper primarily involves input to cost-functions and experimental validation of the theoretical models driving the construction of array operation composition and contraction.

The work on Bohrium's internal representation forked to explore different approaches; the first union of these efforts was realized with the paper Bohrium: Unmodified NumPy Code on CPU, GPU, and Cluster[40], see Part II section 6.7, which was submitted to and presented at PyHPC in conjunction with SC13. My contribution to the work was replacing the static dispatching for array processing on the CPU. Using lessons learned from previous work on JIT-compilation I reimplemented the CPU engine using the new program representation. However, numbers on parallelization were not provided in this paper, and the implemented JIT-machinery worked in a single-instruction mode which generated, specialized and compiled code at runtime but did not apply composition and contraction. As I continued work on these areas, the first stable implementation and realization of array composition and contraction was contributed to and published with the paper Prototyping for Exascale[72], see Part II section 6.10, submitted to and presented at the Exascale Applications and Software Conference (EASC15). The paper describes the role of Bohrium as a prototyping tool for scientific computation; my contribution on JIT-compilation and code generation demonstrated scalable performance on multi-core architectures.
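To make the effect of contraction concrete, the following NumPy-flavoured sketch contrasts naive evaluation, which allocates a full temporary array per operation, with a hand-contracted, block-streamed evaluation whose memory footprint stays constant. The sketch is only an illustration of the idea; Bohrium performs the contraction automatically at the code-generation level rather than in Python.

    import numpy as np

    N = 10000000

    def naive(N):
        # Each array operation allocates a temporary of N elements.
        t1 = np.random.rand(N)
        t2 = t1 / 100
        t3 = t2 + 5
        return t3.sum()

    def streamed(N, block=65536):
        # Contracted form: process fixed-size blocks so memory use stays constant.
        total = 0.0
        for start in range(0, N, block):
            chunk = np.random.rand(min(block, N - start))
            total += ((chunk / 100) + 5).sum()
        return total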
The latest advances in the JIT-machinery, code generation, and runtime instrumentation evolved into the C-targeting Array Processing Engine (CAPE) which was mentioned in the beginning of this section. The paper Automatic mapping of array operations to specific architectures (see Part II section 6.1) describes CAPE in greater detail and provides a performance study showing the realization of the goal of bridging performance and productivity via the array-programming model.

My studies diverged as I got the opportunity of visiting and working with Bradford L. Chamberlain and the Chapel Team at Cray Inc. Chapel is an ideal candidate as a backend language as it provides the features sought after, that is, support for an array-oriented programming model and parallel execution. Since Chapel also provides a convenient array-notation, it could potentially also be used in place of the domain specific language sought after. However, from this perspective, what Chapel lacks is an interactive environment, such as IPython, for prototyping. The idea was therefore to explore the interoperability features of Chapel and experiment with using IPython/Python/NumPy to provide the interactive environment while using Chapel under the hood for performance. To facilitate this idea I contributed to the work described in the paper Separating NumPy API from Implementation[39] (see Part II section 6.9) which was submitted to and presented at PyHPC in conjunction with SC14. The paper describes the Python module npbackend which factors out the work done in Bohrium of extracting the data-parallel operations of Python/NumPy into a self-contained component. The contribution of npbackend is that the backend implementing the NumPy array operations can be overloaded: by invoking the Python program as python -m npbackend program.py, the targeted backend can be controlled via an environment variable such as NPBE_TARGET with values such as bohrium, pygpu, or numexpr. That is, the NumPy backend implementation can be changed without modifying the program.

Work on interoperability between Python and Chapel materialized in the form of pyChapel78, briefly described in the extended abstract titled Scripting Language Performance Through Interoperability[45], see Part II section 6.11. The extended abstract was submitted to, and presented9 in more detail at, HPSL in conjunction with PPoPP15. PyChapel consists of a foreign-function interface for calling Chapel procedures from Python and inlining Chapel code in Python. PyChapel also provides a Chapel module-compiler which compiles Chapel modules into Python modules. The work on pyChapel also encapsulates a minor contribution to the 1.10.0 release of the Chapel compiler, which facilitated compilation of Chapel code into shared libraries. Additional work needs to be done to fully explore the potential of this Python and Chapel approach, which is outlined in section 4.4.

7http://pychapel.readthedocs.org/
8https://github.com/chapel-lang/pychapel
9http://prezi.com/rzfzev1fzgul/

Chapter 2

Programming Model

In this chapter an array-oriented programming model is introduced which encapsulates the high-level operations of linear algebra and generalizes them to multi-dimensional arrays. In the description of backends and libraries we saw the challenges of the low-level APIs and compiler directives. We saw the long-lived success of the BLAS libraries as well as their shortcomings in terms of compositional optimization, and how a BLAS compiler and template-oriented libraries remedy these by providing even higher-level abstractions. The problem with these approaches is that they break the strongest suit of BLAS: a stable interface. One cannot rely on a single interface for these operations but has to choose a specific binding such as Thrust to utilize multi-cores and NVIDIA GPUs or Bolt to utilize multi-cores and AMD GPUs. This breaks portability.

It is thus the purpose of the programming model to provide a stable high-level declarative interface which can be integrated into libraries and domain-specific languages, as well as a single interface for a backend to implement. The goal is hereby to achieve productivity for the programmer, portability of the application, and performance through the efficient implementation of the backend. Section 2.1 describes what can be expressed within the programming model. The description provided here is not a formal language definition; it is instead an abstract description of the model. Related work describing collection-oriented[64] languages provides examples and a comparison of their support for collection-oriented programming. Languages compared include APL, SETL, CM-Lisp, Paralation Lisp, and Fortran 90.

A key difference between the collection-oriented model and the array-oriented model is concerned with data representation. The collection-oriented model distinguishes between simple and nested collections. A simple collection is a list or vector of elements. A nested collection may contain collections as elements. In the array-oriented model these correspond to one-dimensional and multi-dimensional arrays; section 2.2 describes a key implementation concern, the representation of such arrays.

2.1 Expressions

The following subsections describe the initialization of multi-dimensional arrays and the operations that can be performed on and with them.

2.1.1 Initialization

Initializers or generators define arrays. The generators themselves are not concerned with the dimensionality of the array; they simply generate a flat array/vector of numbers with certain characteristics.


Zeros(n) generate n numbers with the value zero.

Ones(n) generate n numbers with the value one.

Value(n, k) generate n numbers with the value k, where k is a scalar.

Linear(start, end, n) Generate n numbers with values increasing linearly between start and end.

Uniform(n) generate n pseudo-random numbers with a uniform distribution.

Normal(n) generate n pseudo-random numbers with a normal/Gaussian distribution.

Generate(n, func) generate n numbers using a stateful function.

Once generated the vector can be transformed into the shape needed.

2.1.2 Array Transformations

Reshaping is the basic transformation, shaping data into a contextually useful form. Reshape is defined as reshape(x, n1, n2, . . . , nm), where x is an array, m is the rank or number of dimensions of the array, and n1, n2, . . . , nm define the length of the array in each dimension. It is the intent with these primitives to not map directly to a domain such as linear algebra but instead provide building blocks on which a domain such as linear algebra can be built. An example would be how the instantiation of a 3 × 3 matrix with uniformly distributed numbers could be built as:

randmat(m, n):
    x = uniform(m * n)
    return reshape(x, m, n)

In the example above m and n are integers denoting the m × n shape of the matrix x.
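In NumPy-like notation (used here purely as an illustration of the model, not as part of its definition), the same construction could be sketched as:

    import numpy as np

    def randmat(m, n):
        # Generate m*n uniformly distributed numbers, then shape them into an m x n matrix.
        x = np.random.uniform(size=m * n)
        return np.reshape(x, (m, n))

    A = randmat(3, 3)   # a 3 x 3 matrix of uniformly distributed numbers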

2.1.3 Operations and Operators

Element-wise, Scan, and Reduction are three fundamental operations which can be performed upon initialized arrays. A description of the operations is provided in the following subsections.

Element-wise

An element-wise operation applies an operator, such as those in table 2.1, to the elements of one or more arrays. Operators are either unary or binary. The element-wise operations make up most of the common expressions performed. An expression such as sin((x * y)/2), where x and y are array variables, consists of three element-wise operations: two using the binary operators * and /, and one using the unary operator sin.
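To make the operation count concrete, the expression can be written directly in NumPy-like notation; the snippet below is only an illustration and assumes nothing beyond standard NumPy.

    import numpy as np

    x = np.random.uniform(size=1000)
    y = np.random.uniform(size=1000)

    # Three element-wise operations: two binary (*, /) and one unary (sin).
    z = np.sin((x * y) / 2)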

Reductions

Reductions are operations such as computing the sum of all elements in an array A of length n: $\sum_{i=0}^{n} A_i$. However, reductions apply to any binary associative operator, such as those in table 2.1a. Reductions are often divided into partial and complete/full reductions. The summation just described is a complete reduction: it takes a multi-dimensional array of any dimension and reduces it to a scalar by applying the binary operator to every element. A formal definition is provided below.

Definition. The reduction operation takes a binary operator ⊕ and an ordered set of n values [a_1, a_2, ..., a_n] and returns the value a_1 ⊕ a_2 ⊕ ... ⊕ a_n.

A partial reduction reduces the dimensionality of an array with n dimensions to an array with n − k dimensions by applying the binary operator over an axis in the input array. Figure 2.1 illustrates three applications of a partial reduction with k = 1. A complete reduction is just a special case of a partial reduction where k = n − 1.

Figure 2.1: Three partial reductions with k = 1 on a 4 × 4 × 4 array.

Reductions can in the programming model be described in this general form. However, the following operations should for convenience be available.

Numeric sum(x), product(x), mean(x), max(x), min(x).

Boolean all(x), any(x), count(x).

These should be available both in complete form and for partial application, and a general construct should be provided for any binary associative operator or function: reduce(A, k, axis, operator).
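For illustration, a NumPy-flavoured sketch of complete and partial reductions follows; it assumes nothing beyond standard NumPy and is not part of the model's definition.

    import numpy as np

    a = np.arange(2 * 3 * 4).reshape(2, 3, 4)

    total = a.sum()            # complete reduction: 3-dimensional array -> scalar
    partial = a.sum(axis=0)    # partial reduction with k = 1: shape (2, 3, 4) -> (3, 4)

    # The general construct reduce(A, k, axis, operator) roughly corresponds to:
    partial2 = np.add.reduce(a, axis=1)   # reduce over axis 1 with the addition operator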

Scan

The scan operation is similar to a reduction in the sense that it is stateful and accumulates a result. Unlike the reduction, the scan operation does not reduce the dimensionality; it stores the accumulations instead.

Definition. The scan operation takes a binary operator ⊕ and an ordered set of n values [a_1, a_2, ..., a_n] and returns the ordered set of values [a_1, a_1 ⊕ a_2, ..., a_1 ⊕ a_2 ⊕ ... ⊕ a_n].

Definition. The exclusive scan operation takes a binary operator ⊕ and an ordered set of n values [a_1, a_2, ..., a_n] and returns the ordered set of values [i, a_1, a_1 ⊕ a_2, ..., a_1 ⊕ a_2 ⊕ ... ⊕ a_{n-1}], where i is the identity element of ⊕.

The scan operation is the general formulation; popular instances include prefix-sum or cumulative sum, which is the scan operation applied with the addition operator. An application of prefix-sum to the input array a = {1, 2, 3, 4, 5, 6} yields the result {1, 3, 6, 10, 15, 21}.
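The prefix-sum example can be reproduced in a short NumPy-flavoured sketch (illustration only; the exclusive variant is emulated by shifting in the identity element of addition):

    import numpy as np

    a = np.array([1, 2, 3, 4, 5, 6])

    inclusive = np.cumsum(a)                          # [ 1  3  6 10 15 21]
    exclusive = np.concatenate(([0], inclusive[:-1])) # [ 0  1  3  6 10 15]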

Gather/Scatter

The above-described operations apply operators in a pre-defined pattern on arrays. The gather and scatter operations, on the other hand, allow a description of which array elements to read (gather) and which to write (scatter). The selection is controlled by an array of values, also referred to as an index array.
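A NumPy-flavoured sketch of gather and scatter through an index array (illustration only, not part of the model's definition):

    import numpy as np

    data = np.array([10.0, 20.0, 30.0, 40.0])
    index = np.array([3, 0, 2])

    gathered = data[index]            # gather: read elements 3, 0 and 2 -> [40. 10. 30.]

    out = np.zeros(4)
    out[index] = np.array([1.0, 2.0, 3.0])   # scatter: write into positions 3, 0 and 2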


Operators

(a) Binary Associative, x ⊕ y:
    Addition                x + y
    Multiplication          x × y
    Minimum                 min(x, y)
    Maximum                 max(x, y)
    Logical And             x & y
    Logical Or              x | y
    Bitwise And             x && y
    Bitwise Or              x || y
    Logical Exclusive Or    x ∧ y
    Bitwise Exclusive Or    xor(x, y)

(b) Unary, ⊗ x:
    Absolute value          |x|
    Logical Not             !x
    The inverse             x^(-1)
    Exponential             exp(x)
    Twos exponential        2^x
    Base-2 Logarithm        log2(x)
    Base-10 Logarithm       log10(x)
    Square Root             sqrt(x)
    Round up                ceil(x)
    Remove decimal          trunc(x)
    Round down              floor(x)

(c) Binary Non-Associative, x ⊕ y:
    Subtraction             x − y
    Division                x / y
    Power                   x^y
    Greater Than            x > y
    Greater Than or Equal   x ≥ y
    Lesser Than             x < y
    Lesser Than or Equal    x ≤ y
    Equal                   x = y
    Not Equal               x ≠ y
    Left Shift              x ≪ y
    Right Shift             x ≫ y

(d) Unary Geometric Operations, ⊗ x:
    Sinus                       sin(x)
    Cosinus                     cos(x)
    Tangens                     tan(x)
    Inverse Sinus               asin(x)
    Inverse Cosinus             acos(x)
    Inverse Tangens             atan(x)
    Hyperbolic Sinus            sinh(x)
    Hyperbolic Cosinus          cosh(x)
    Hyperbolic Tangens          tanh(x)
    Inverse Hyperbolic Sinus    asinh(x)
    Inverse Hyperbolic Cosinus  acosh(x)
    Inverse Hyperbolic Tangens  atanh(x)

Table 2.1: Commonly supported binary and unary operators.

Array Broadcasting and Scalar Flooding

The binary element-wise operations above are only defined for arrays with an equal number of elements and dimensions. However, there are situations where arrays can be broadcast to allow the operation. The simplest instance is an operation with one operand being a scalar. In this case, the scalar is replicated to fit the shape of the other array. This simple scheme generalizes as long as an array of lower dimension can be replicated to match the shape of an array of higher dimension. Such as an element-wise operation X ⊗ Y where X is a matrix of shape 3 × 3 and Y is a vector with 3 elements. In this case, the vector is replicated to create the matrix Y′ of shape 3 × 3 in which each row of Y′ is a replica of the vector Y. A similar operation would be possible for the transpose Y^T, in which case Y′ would become a matrix of shape 3 × 3 in which each column is a replica of the vector Y^T.
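A NumPy-flavoured sketch of scalar flooding and broadcasting (illustration only, assuming standard NumPy broadcasting rules):

    import numpy as np

    X = np.arange(9.0).reshape(3, 3)   # a 3 x 3 matrix
    Y = np.array([1.0, 2.0, 3.0])      # a vector with 3 elements

    A = X + 5                  # scalar flooding: the scalar is replicated to the shape of X
    B = X * Y                  # Y is replicated row-wise into a 3 x 3 matrix Y'
    C = X * Y[:, np.newaxis]   # the transposed case: Y replicated column-wise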

Element-Access

The programming model focuses on extracting data-parallelism by providing a convenient high-level view of multi-dimensional arrays and performing operations upon them. The programming model thus highly discourages the use of idioms such as:

for i in range(0, length(a)):
    a[i] = random()


Such intent from the programmer should instead be expressed as:

a = random(number_of_elements)

However, there are situations in which the entire array is not of interest. Frequent patterns of this sort are the use of every nth element, every element but the first, every element but the last, or elements from n to m. The programming model supports such element access through the use of slicing. The slicing operation is defined as slice(i, j, s). Slicing selects a subset of the elements in a single dimension, starting from element i, ending with element j and including every sth element. Slicing multiple dimensions can be used for blocking up matrices and formulating stencil expressions.
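A NumPy-flavoured sketch of the slicing patterns listed above (illustration only):

    import numpy as np

    a = np.arange(20)

    every_second  = a[0:20:2]   # slice(i=0, j=20, s=2): every 2nd element
    all_but_first = a[1:]       # every element but the first
    all_but_last  = a[:-1]      # every element but the last
    n_to_m        = a[5:15]     # elements from index 5 to index 15

    M = np.arange(16).reshape(4, 4)
    block = M[0:2, 0:2]         # slicing in multiple dimensions: a 2 x 2 block of the matrix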

2.2 Data Representation

A central implementation consideration for a backend supporting an array-oriented programming model as described in section 2 is the representation of multi-dimensional arrays. Array representation is concerned with describing the dimensionality, the shape, and how the array is laid out in memory. Figure 2.2b illustrates four cases which motivate different layout schemes.

Figure 2.2: (a) Dense memory layouts, row-major and column-major, of a two-dimensional array. (b) Multi-dimensional array patterns: 1) Dense, 2) Strided, 3) Strided, 4) Sparse.

Dense representation is well-known as it is the default layout used by languages such as C and Fortran. As figure 2.2a illustrates, the elements of a dense array are laid out as a contiguous string of bytes, ordered by rows in C and by columns in Fortran. The dense layout is the simplest as the only meta-data needed is a flag defining element ordering. The main characteristic of a dense array, in contrast to other layouts, is that memory for every element is always allocated. The motivation for changing layout is concerned with lowering memory consumption, often at the cost of complexity.

Strided representation is a generalization of the dense layout. It allows for skipping and re-using addresses in the memory space. This is done by associating a stride with every dimension which defines how to compute the address of the next element within that dimension. The row-major layout from figure 2.2a is represented by the strides 4 × 1 and the column-major layout by the strides 1 × 4. Strided representation is useful for describing subsets of dense matrices, such as the second matrix in figure 2.2a, which is a subset of the first matrix and uses the data from the first matrix with the strides 16 × 2. Other uses include compressed representation of matrices which have a regular pattern in the coefficient values. Representing an 8 × 8 matrix where all elements have the value k can be achieved with the strides 0 × 0. Using 0 as the stride for a dimension is the general technique for re-using memory. The third 8 × 8 matrix in figure 2.2b has identical values in all rows; this can be represented with the strides 0 × 1.

Sparse representation allows for a compressed representation of a matrix. If the majority of the matrix elements are equal to a value k, then a sparse representation will only allocate memory for the elements which have a value different from k. There exists a wealth of sparse layouts with different characteristics and space-time trade-offs; the simplest are compressed-sparse-row and compressed-sparse-column.
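NumPy exposes the strided view mechanism directly, so the zero-stride technique described above can be sketched as follows (illustration only; note that NumPy expresses strides in bytes):

    import numpy as np
    from numpy.lib.stride_tricks import as_strided

    y = np.arange(1, 4, dtype=np.int64)   # the data of one row: [1, 2, 3]

    # Re-use the same three elements for every row by giving the row dimension a stride of 0.
    y_rep = as_strided(y, shape=(3, 3), strides=(0, y.itemsize))
    # [[1 2 3]
    #  [1 2 3]
    #  [1 2 3]]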


In the description of these different multi-dimensional array representations, examples were only given for arrays with two dimensions. The representations do, however, generalize to arrays with any number of dimensions. A concern in this regard is of practical importance, since a three-dimensional array of shape 1024 × 1024 × 1024 containing single-precision floating point elements consumes 4GB of memory. I have so far only worked with dense and strided array representations. This ends the introduction of the array-oriented programming model; the following chapter describes how the backend implements support for handling it.

Chapter 3

Bohrium

Bohrium is a high performance backend for array-oriented programming. It provides the mechanics to couple a programming language or library with an architecture-specific implementation seamlessly. These mechanics cover two conceptually different areas: programming language bridges (section 3.3) and runtime components (section 3.2), bound together by the program representation, vector bytecode (section 3.1). Described top-down, the language bridge maps array operations to vector bytecode. Once mapped, the Bohrium runtime components take responsibility for transforming, scheduling and executing the intermediate representation. These responsibilities are handled by different classes of components:

Filters perform analysis and transformation of bytecode, such as organizing sequences of bytecode into blocks for the purpose of composition and contraction.

Vector Engine Managers (VEM) schedule the execution of vector bytecode on a vector engine or delegate scheduling to another manager.

Vector Engines (VE) translate vector bytecode to native code for a specific piece of hardware and execute it.

Figure 3.1: Overview of Bohrium concepts.

Figure 3.1 illustrates the Bohrium concepts and how components in the runtime can be combined to match a computing system. The core support library provides routines for manipulating bytecode, data-structures, and instrumenting the runtime components.


3.1 Bytecode

Bytecode forms the basis for the intermediate representation in Bohrium. It is inspired by assembly and represented as an ordered list of instructions. The ordering of instructions in the list guarantees that if an instruction uses an operand as input then a previous instruction has used that operand as output. The intermediate representation in Bohrium thus simply consists of lists of bytecode instructions generated by the language bridge and sent to the runtime system for transformation, scheduling and execution. The ordering of instructions lays the ground for dependency analysis: e.g. if two consecutive instructions write to two different operands then the two instructions can be scheduled for execution out of the order of the instruction list. They could also be scheduled for execution in parallel by a vector engine targeting a multi-core processor. A substantial number of transformation and scheduling decisions are based on instruction dependencies; the primary motivations are composition and contraction. The vector engines in Bohrium have used the instruction list as the basis for analysis, and each vector engine has applied its own methods for representing and analyzing dependencies. The lessons learned, see Part II section 6.12, have merged into a unified representation and shared framework for analysis.

The bytecode is conceptually equivalent to a RISC-type instruction set with all instructions as three-address code (TAC) performing only memory, arithmetic, and logical operations. The vector bytecode has no conditional operators. This comparison highlights that the bytecode instruction-set is small, that instructions can have at most three operands, and notably that vector bytecode is not Turing complete. With this description in mind, a vector bytecode can be textually represented as ADD x,x,1, where ADD is the Opcode and x,x,1 are the operands. Sending this bytecode to Bohrium will eventually result in a vector engine executing it and computing: add 1 to x and store it in x. For this particular instruction, other ISAs have a specialized Opcode such as Increment. In Bohrium, it is up to the vector engine to do a value-specific optimization and determine if more efficient execution is possible. At this point, the equivalence ends, as instruction operands in bytecode are not registers or immediates but multi-dimensional arrays. Also, each instruction encodes its operands within the instruction; there are no separate instructions for moving operands between memory and registers, as RISC-type instruction sets typically have. The principle of a RISC instruction set is that there are few and simple instructions; this is valid for Bohrium bytecode when viewed in the context of operations on multi-dimensional arrays. However, when translated to x86 native code, the execution of a single bytecode instruction folds out into hundreds of thousands of instructions. From the perspective of native code, bytecode is extremely complex, but from the point of view of array operations it is indeed compact and reduced. Another trait of vector bytecode is that the type of an operand is encoded within the operand and is not part of the Opcode. A humanly readable and trivially parsed JSON[21] representation defines the bytecode and the allowed type and number of operands. The bytecode encapsulates the following classes of array operations:

Generate bytecodes are the source of structured data creation. These include random, range, and scalar flooding.
The Random123[58] library is used to ensure consistent random number generation regardless of the parallel architecture in use.

Element-wise bytecodes apply a unary or binary operator to all array elements. Bohrium currently supports 53 element-wise operators, e.g. addition, multiplication, square root, logical and, bitwise and, equal and less than. For element-wise operations, Bohrium only allows data overlap between the input and the output arrays if the access pattern is the same. Combined with the fact that all operators are stateless, this makes it straightforward to execute element-wise operations in parallel.

Reduction bytecodes reduce the dimensionality of an input array using a binary operator. Again,


Bohrium does not allow data overlap between the input and the output arrays and the operator must be associative1. Bohrium currently supports 10 operators, e.g. addition, multiplication and minimum. Even though none of them are stateless, the reductions are all straightforward to execute in parallel because of the non-overlap and associative properties.

Scan bytecodes accumulate an input dimension using a binary operation. Again, Bohrium does not allow data overlap between the input and the output arrays and the operator must be associative. Bohrium currently supports 10 operators for scan bytecodes, e.g. addition, multiplication and minimum.

Gather bytecodes perform indexed reads. That is, given an input, output, and index array, the values of the index array are used to decide which elements of the input array are written to the output.

Scatter bytecodes perform indexed writes. That is, given an input, output, and index array, the values of the index array are used to decide which elements of the output array the input is written to.

Data Management bytecodes determine the data ownership of arrays, and consists of three different bytecodes. The SYNC bytecode instructs a child component to place the array data in the address space of its parent component. The FREE bytecode instructs a child component to deallocate the data of a given array in the global address space. Finally, the DISCARD operator instructs a child component to deallocate any meta-data associated with a given array, and signals that any local copy of the data is now invalid. These three bytecodes enable lazy allocation where the actual array data allocation is delayed until it is used. Arrays are often created with a generator (e.g. random, scalar flooding) or with no data (e.g. intermediate), which may exist on the computing device exclusively. Thus, lazy allocation may save several memory allocations and copies.

Extension methods The bytecode classes mentioned above make up the bulk of a Bohrium execution. However, not all algorithms can be implemented efficiently in this way. In order to handle operations that would otherwise be inefficient or even impossible, we introduce a fourth type of bytecode: extension methods. We impose no restrictions on this generic operation; the extension writer has total freedom. However, Bohrium does not guarantee that all components support the operation. Initially, the user registers the extension method with paths to all component-specific implementations of the operation. The user then receives a new handle for this extension method and may use it subsequently as a vector bytecode. Matrix multiplication and fast Fourier transformation are examples of extension methods that are obviously needed. For matrix multiplication, a CPU specific implementation could simply call a native BLAS library and a Cluster specific implementation could call the ScaLAPACK[11] library.

This concludes the representation of array operations in Bohrium; the following section describes the runtime components for managing transformation, scheduling, and execution of the operations.

3.2 Runtime components

1 Mathematical associativity; we allow non-associativity because of floating point approximations.

In the context of array operations, the bytecode is compact. A reduction Opcode provides bytecode for doing reductions as described in section 2.1.3, but only in its partial form. This means that in order for a language bridge to provide the functionality of a full reduction it has to send n reductions, where n is the dimensionality of the array. It is one of the responsibilities of filters to analyse and transform the intermediate representation and rewrite such sequences of bytecode into a form which enables a better execution strategy for the vector engine. Filters can also do basic transformations such as POW x,x,2 which on many hardware devices executes more efficiently as MUL x,x,x. More interesting transformations involve detecting grounds for doing loop-fusion or streaming.
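To illustrate why a language bridge must issue one partial reduction per dimension, the following NumPy-flavoured sketch (an illustration only, not Bohrium code) builds a complete reduction out of n partial reductions:

    import numpy as np

    def complete_reduction(a, op=np.add):
        # One partial reduction removes one dimension; repeat until a scalar remains.
        while a.ndim > 0:
            a = op.reduce(a, axis=0)
        return a

    a = np.arange(2 * 3 * 4).reshape(2, 3, 4)
    assert complete_reduction(a) == a.sum()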

3.2.1 Filters

The classification of components is meant to organize responsibilities, not necessarily to restrict them. Filters in this sense serve as a wildcard for components which do not fit into the other classes. Bohrium currently consists of the following filters.

fuser encapsulates the dependency analysis for array operation composition and contraction. Multiple instances of the fuser exist, such as topological, greedy, singleton, and optimal, applying different algorithms for performing transformation of bytecode sequences. The work on fusers is described in greater detail in the paper Fusion of Array Operations at Runtime, see Part II 6.12.

bccon contracts sequences of bytecode. The purpose of the component is to detect bytecodes, and sequences thereof, for which more efficient but equivalent expressions exist. The simplest example would detect a bytecode such as ADD x,x,0 and replace it with NOP, or with a copy if the output is a different operand than the input. The detection of sequences can for instance find the expression of a matrix multiplication and map it to an extension bytecode, thereby delegating execution to a highly tuned library instead of the general operation composition.

bcexp expands a single bytecode to a sequence. Given the example above, this might seem detrimental to performance. However, one example of its application is to detect one-dimensional reductions and transform them into two-dimensional reductions which execute more efficiently on the GPU.

pprint is a debug filter; it dumps an ASCII representation of the bytecode to the filesystem. It can conveniently be used in the runtime-stack before and after another filter to manually inspect a transformation.

fuseprinter is a debug filter; it dumps a dot representation of the bytecode to the filesystem, visualizing the dependencies of the bytecode.

Another application of filters is the transformation of various corner cases, with regards to array representation, into a general case, thereby simplifying processing at a later stage such as code generation.

3.2.2 Vector Engine Managers

Vector engine managers are each responsible for one memory address space in the hardware configuration. Bohrium currently has the following three vector engine managers:

node is very simple since the hardware already provides a shared memory address space; hence, the Node-VEM can simply forward all instructions from its parent to its child components.

cluster handles the global distributed address space of a compute cluster. It is quite complex and its implementation is based on previous work[38].

proxy is the component I implemented to support the exploration of Bypassing the Conventional Software Stack Using Adaptable Runtime Systems[46], see Part II section 6.4.


3.2.3 Vector Engines

The primary responsibility of a vector engine is to execute the instructions it receives in an order that complies with the dependencies between instructions. Furthermore, it has to ensure that its parent component has access to the results as governed by the data management bytecodes: sync, free, and discard. Bohrium has these two vector engines:

gpu is an OpenCL-based implementation focusing on array processing for GPUs. It is based on the efforts of Troels Blum[15, 16].

CAPE is the C-targeting Array-Processing Engine and is the main tool for this thesis, see Part II section 6.1. The primary focus of CAPE is shared memory multi-core architectures with non-uniform memory access. Additionally, ongoing work is showing promising results in encapsulating the concerns of accelerators, which is described further in section 4.2.

3.2.4 Stack configuration

The runtime components can be combined to fit a given computing environment whether that is a laptop, a workstation, or a compute cluster. Configuration is done via a configuration file in an INI file format as shown in figure 3.2.

Stacks and components are identified by sections, and the entry type defines the type of the section. The type of a section can be any one of: stack, filter, ve, and vem. For stack-sections, the entries define the runtime configuration by chaining component identifiers. The stack_default in figure 3.2 is equivalent to the stack configuration visualized in figure 3.1 from the beginning of the chapter. The component-sections define instances of components. A component can have multiple sections with different options. E.g. the bcexp component has instances identified as bcexp, bcexp_cpu, and bcexp_gpu. These instances perform different transformations, as some are useful for one architecture but not on another. As a user of Bohrium one can either choose to modify the configuration to match their system or switch runtime stack via the environment variable BH_STACK. Re-targeting a high-level implementation is thus as simple as switching an environment variable; the application itself does not need any modification. Components are also controllable via environment variables, following the naming convention BH_SECTION_OPTION. E.g. the default thread-binding policy of the CAPE engine can be modified via BH_CAPE_BIND, the size of the software victim cache via BH_CAPE_VCACHE_SIZE, the optimization-level of the JIT-compiler via BH_CAPE_JIT_LEVEL, and the use of accelerator offloading via BH_CAPE_OFFLOAD.

[ stack_default ]
type = stack
stack_default = bcexp_cape
bcexp_cape = bccon
bccon = topological
topological = node
node = cape
...
[ cape ]
type = ve
impl = libbh_ve_cape.so
libs = libbh_visualizer.so, libbh_fftw.so
timing = false
bind = 1
vcache_size = 10
preload = true
jit_level = 3
jit_dumpsrc = true
jit_offload = 0
compiler_cmd="..."
compiler_inc="..."
compiler_ext="..."
object_path="..."
template_path="..."
kernel_path="..."

Figure 3.2: Default stack configuration for a desktop computing environment.
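The naming convention is simple enough that a component can resolve its options with a single environment lookup. The helper below is a minimal sketch of how this could be done; it is illustrative only and not taken from the Bohrium code base.

// Sketch: resolving an option following the BH_SECTION_OPTION convention.
#include <cstdlib>
#include <string>

// E.g. component_option("CAPE", "JIT_LEVEL", "3") consults BH_CAPE_JIT_LEVEL
// and falls back to "3" when the variable is unset.
static std::string component_option(const std::string& section,
                                    const std::string& option,
                                    const std::string& fallback)
{
    const std::string key = "BH_" + section + "_" + option;
    const char* value = std::getenv(key.c_str());
    return value ? std::string(value) : fallback;
}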


3.3 Language Bridges

It is the task of the language bridge to map high-level abstractions within the language to bytecode. The technology used to do so is up to the implementer of the bridge. The bridge can be built at multiple levels: as a library, as part of an interpreter, or as an extension of a compiler. Bohrium currently supports several languages, as visualized in figure 3.1 at the beginning of the chapter. A brief overview of the languages and their construction is provided below.

Python uses a library-based approach. The NumPy library is a fundamental building block for representing and manipulating multi-dimensional arrays in Python, and libraries such as SciPy and IPython build on top of it. The combination of these libraries offers a rich environment for scientific computation. By providing a NumPy-compatible library, a vast amount of existing applications can benefit from the Bohrium runtime system without modifying the Python interpreter or changing a single line of Python code.

C++ implements a domain-specific embedded language which is described in greater detail in section 4.3. It is also a fundamental building block for language interoperability via the C bridge.

C implements an encapsulation of vector bytecode as functions. It provides a set of functions of the form BH_BYTECODE_STRUCTURE_TYPES, such as bh_ewise_add_aaa_fff(...) for creating a bytecode representing the element-wise addition of three arrays of floating-point numbers. It is not intended for end-users but rather as an interoperability bridge between Bohrium and languages supporting C. The bridge is implemented as a thin wrapper on top of the C++ bridge: the C++ bridge performs the responsibilities of a language bridge for Bohrium, and the C bridge simply exposes a C-compatible interface to it.

CIL implements a bridge for the Common Intermediate Language (CIL) via the NumCIL library, facilitating the use of Bohrium from languages including C# and F#.

MiniMatlab is a current work in progress which implements an interpreter for a subset of the Matlab language. The interpreter relies entirely on the Bohrium runtime to process array-operations. A project for future work will offer a language bridge for GDL[4], the open-source implementation of IDL, the de facto language used for data analysis in astronomy. Extending the GDL compiler is in this case a good choice for integrating Bohrium, as arrays are first-class citizens in the GDL language.

Choosing a strategy for integrating a language with the Bohrium runtime system is thus highly dependent on the language itself and its users. The following subsection describes the technical details of interfacing with the runtime system, followed by a section describing the binary representation of vector bytecode along with the intermediate representation.

3.3.1 Responsibilities

Everything in the runtime system is a component, which means the interface is the same regardless of whether the component transforms, schedules, or executes bytecode. Listing 3.1 shows the four functions that make up the interface. Components are initialized recursively; the bridge only has to initialize itself and its most immediate child in the stack. Listing 3.2 provides a minimal code example, without error-checking, of using the core library for initializing the runtime system. The main task of a language bridge is to generate valid bytecode and send it to the runtime system via the C-interface. It is therefore the main concern of the language bridge to map language constructs to bytecode. The general idea is that if a certain task is beneficial in multiple languages then the runtime system should perform the task.


Listing 3.1: Component and Runtime Interface.

typedef bh_error (*bh_init)(const char *name);
typedef bh_error (*bh_shutdown)(void);
typedef bh_error (*bh_execute)(bh_ir *bhir);
typedef bh_error (*bh_extmethod)(const char *name, bh_opcode opcode);

Listing 3.2: Runtime Initialization and Use.

int64_t component_count;
bh_component **components;
bh_component *bridge, *runtime;

// Setup the bridge
bridge = bh_component_setup(NULL);
bh_component_children(bridge, &component_count, &components);

// Recursively initialize children
runtime = components[0];
runtime->init(runtime);    // Init runtime
...
// Program is running
...                        // Create bytecode
runtime->execute(...);     // Send bytecode
...

// Program end
runtime->shutdown();       // Terminate

A simple example is the translation of an element-wise array-operation such as x^2 into bytecode. In many languages, translating this into MUL t,x,x instead of POW t,x,2 is straightforward. However, doing the transformation at the language level requires that every language bridge implement it. A simpler solution is to implement the transformation within a filter-component and thus implement and maintain it in only one place. More importantly, it facilitates applying the transformation closer to the vector engine, which can decide whether a given transformation is beneficial for execution on the hardware in question. That is, the stack configuration could include a transforming filter on top of one vector-engine component but not on top of another. Also, if such a transformation is unwanted by the user, it can simply be switched on or off in the runtime configuration.

This design choice in Bohrium provides for the implementation of simple language-bridges which can naïvely translate into bytecode. It is, however, not meant to discourage optimization at the language level, but instead to consider the type of optimizations and transformations performed; for instance, utilizing the type checker of a statically typed language as a means of removing validation from the runtime. It is more interesting, and perhaps more fruitful, to consider language-based optimizations in the context of the high-level abstractions that the language offers its user before bytecode translation. The high-level abstractions provide information which is lost when translated into bytecode. An example would be a domain-specific language for linear algebra; in this context, utilizing identity, symmetry, and orthogonality provides for high-level transformations, for instance for evaluating whether a matrix is diagonalizable. Given the expression A·I·I·I·I·B, where A and B are regular matrices and I is the identity matrix, domain-specific knowledge within the language allows such an expression to be transformed into the single matrix product AB. Bohrium encourages such high-level transformations at the language level, sending only the final expression to the runtime for execution.

When generating bytecode the language bridge must ensure the generation of valid bytecode. A valid instruction consists of an Opcode with the number and type of operands as defined in the bytecode definition. Additionally, the shapes of the operands must be the same, or, in the

case of a partial reduction, the output operand must have one dimension less than the inputs. The runtime system cannot execute a bytecode such as ADD t,x,y if t, x, and y do not have the same shape. It is the responsibility of the bridge to transform the shapes as a means of allowing the execution. Likewise, transformation of operand meta-data is in general the responsibility of the language bridge.

The bh_view data-structure encapsulates the data-descriptor for arrays in Bohrium. Figure 3.3 illustrates how a three-dimensional operand shaped as 2 × 2 × 2 is represented in Bohrium, along with the layout of the operand in a linear memory space. The main purpose of bh_view is to represent a multi-dimensional array using a strided indexing scheme. The attributes bh_view.ndim and bh_view.shape are self-explanatory. The bh_view.base attribute points to a bh_base data-structure which describes a physical memory allocation in a linear memory space. The bh_base.type attribute dictates the primitive type the allocation should be used for, bh_base.data points to the beginning of the allocation, and bh_base.nelements states the number of elements of the given primitive type for which memory should be allocated. The data-structures bh_view and bh_base are kept separate to distinguish between the memory allocation, bh_base, and what the allocation represents, namely the bh_view. The allocation of the actual data belonging to an operand is often the responsibility of the vector engine, as decisions such as what, when, and how to allocate memory on a processing unit are inherently tightly coupled with the hardware and should not be a concern for the language bridge.
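The sketch below summarizes the two data-descriptors using the attribute names given above. It is a simplified illustration: the concrete field types and the fixed maximum dimensionality are assumptions made for this sketch, and the actual definitions in the Bohrium source differ.

// Simplified sketch of the data-descriptors; not the actual Bohrium structs.
#include <cstdint>

struct bh_base {
    int      type;        // primitive type of the allocation
    void*    data;        // start of the linear allocation (may be unallocated)
    int64_t  nelements;   // number of elements of 'type' to allocate
};

struct bh_view {
    bh_base* base;        // the allocation this view maps onto
    int64_t  ndim;        // number of dimensions
    int64_t  start;       // offset, in elements, of the first element
    int64_t  shape[16];   // extent of each dimension
    int64_t  stride[16];  // elements to skip per step in each dimension
};

// For the 2 x 2 x 2 operand of figure 3.3, element (i, j, k) is located at
// linear index: start + i*stride[0] + j*stride[1] + k*stride[2].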

[Figure 3.3 consists of two panels: "3D Array: Data Structures", showing a bh_view (ndim = 3, start = 0, shape = 2 × 2 × 2, stride = 8, 4, 1) referencing a bh_base (type int64_t, nelements = 16), and "3D Array: Data Layout in a Linear Memory Space", showing how the view's elements are laid out among the 16 elements of the allocation.]

Figure 3.3: Representation and memory layout of a three-dimensional operand in Bohrium.

However, there are cases where the language bridge performs the allocation. This is the case when data originates from the language and not as output from the execution of bytecode. One example would be reading an image file from disk; in this case, the language takes care of loading the image into memory and assigning the data pointer to the bh_base. The attributes bh_view.start and bh_view.stride make up the strided-indexing part of the multi-dimensional array representation. The listing below describes the general set of functionality that a language bridge must provide.

Aliases/Views: to support these, the language bridge must create a bh_view in which bh_view.base points to the bh_base of another operand.

Slicing should be provided in collaboration with views. When the user wants to define a subset of the values of an operand, the bridge must provide a convenient syntax for doing so and then create a bh_view of the sliced operand, modifying the shape and strides of the view to match the user-defined slice (a sketch of such view manipulation follows after this list).

Broadcasting is another operation which is based on manipulation of views. The bridge must, if possible, broadcast the operands of an expression such as X + v, where X is a matrix and v is a vector. This is done by instantiating a bh_view of v with strides set such that v is replicated to match the shape of X.

Reshaping is done by the bridge by manipulating the bh_view.ndim, bh_view.stride, and bh_view.shape of an operand. This should be supported for arbitrary shapes, including those that change the number of dimensions.
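As referenced under Slicing above, the following is a minimal sketch of how a bridge might derive the view of a slice purely by adjusting metadata. It reuses the simplified bh_view sketched earlier in this section; the slice() helper is illustrative and not an actual Bohrium function.

// Sketch: slicing one dimension over [begin, end) with a given step by
// rewriting view metadata only; no data is copied or moved.
static bh_view slice(bh_view in, int64_t dim,
                     int64_t begin, int64_t end, int64_t step)
{
    bh_view out = in;                                   // same base and ndim
    out.start       = in.start + begin * in.stride[dim];
    out.shape[dim]  = (end - begin + step - 1) / step;  // ceil((end - begin) / step)
    out.stride[dim] = in.stride[dim] * step;
    return out;
}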

This ends the description of Bohrium: how it represents array-operations in the intermediate representation, the data-descriptor for the arrays themselves, the runtime components, and the language bridges.

Chapter 4

Ongoing and Future Work

This chapter provides an overview of ongoing and future work. Section 4.1 describes the areas for further study of the array-oriented programming model and Bohrium's support for it. Section 4.2 details current efforts on support for coprocessors and accelerators in the C-targeting Array Processing Engine. Section 4.3 provides a lengthy description of the BXX DSEL, the C++ language bridge to Bohrium. Section 4.4 describes the potential areas of further exploration on interoperability between Python and Chapel. The last section, section 4.5, briefly introduces the Benchpress tool and outlines future work for facilitating comparative experiments.

4.1 Descriptive Power

Sections 2 and 3.1 describe the array-oriented programming model, Bohrium's representation of operations, and its support for the model. The model is applicable to a range of applications; however, since it is not Turing complete, it has limitations. Specifically, what do you do when the composition of generate, element-wise, reduction, scan, gather, and scatter operations is not sufficient to describe a given computation? Bohrium already has an answer to this question: simply take your data out of Bohrium and implement the operation within the host-language, or call a Bohrium extension method. However, there are operations which might be expected to be readily available and conveniently encapsulated, including:

Sort is a fundamental operation used in a wealth of algorithms; having sorting as a building block is quite essential for obtaining efficient performance.

Filter provides a way to conditionally select a subset of elements within an array and is another fundamental building block for the programming model.

Beyond expanding support for operations there is a need for expanding the operators. The most interesting challenge is to provide support for user-defined operators, that is, enabling the user to define an operator/function for application with generate, element-wise, reduction, or other operations. An area deserving of future work is, therefore, to expand the expressive completeness, or descriptive power, of the model and Bohrium's capabilities for processing such programs.

4.2 Targeting accelerators with CAPE

The performance study showed the viability of the approach applied to the design and implementation of the C-targeting Array Processing Engine, CAPE, for multi-core processors with a NUMA architecture. Ongoing work is exploring its applicability to accelerator architectures

such as GPUs and the Intel MIC. The two main areas to expand are code generation and runtime instrumentation. The code generator must emit loop constructs with OpenACC directives for GPUs, and parallel sections/loop constructs with LEO/OpenMP directives for MICs. Runtime instrumentation involves memory management on the accelerator device. Currently explored is a runtime extension using a simple alloc, free, push, and pull interface. The interface abstracts the details of the LEO #pragma offload_transfer and of OpenACC routines such as acc_[create|delete|update_device|update_self]. The extension manages relations between host pointers and device pointers, knows the state of buffers on host and device, and implements a scheme for data persistence.
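A minimal sketch of what such an alloc/free/push/pull extension could look like from the engine's point of view is given below. The names, signatures, and the opaque bookkeeping type are assumptions made for illustration and do not reflect the actual CAPE interface.

// Sketch of an accelerator buffer-management interface (assumed names).
#include <cstddef>

struct device_buffers;  // opaque bookkeeping of host-pointer/device-pointer relations

// Reserve device memory backing the host buffer 'host_ptr'.
void accel_alloc(device_buffers& db, void* host_ptr, size_t nbytes);
// Release the device buffer associated with 'host_ptr'.
void accel_free(device_buffers& db, void* host_ptr);
// Copy host data to the device, e.g. via LEO offload_transfer or acc_update_device.
void accel_push(device_buffers& db, void* host_ptr);
// Copy device data back to the host, e.g. via acc_update_self.
void accel_pull(device_buffers& db, void* host_ptr);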

4.3 BXX: Bohrium for C++

BXX, the Bohrium language bridge for C++, serves a dual purpose. It is the fundamental building block for other language bridges and a self-contained domain-specific embedded language (DSEL) for array programming in C++. As a building block, it serves a key role in Bohrium, and as a DSEL it contributes to array programming in C++. BXX can be considered finished work, although not yet published. Chapter 3 describes the role of BXX in Bohrium. This section briefly introduces the DSEL part of BXX, its contribution to array programming in C++, and what work needs to be done before it is ready for publication.

Work related to bridging the gaps between performance, productivity and portability in the context of C++ includes high-level abstractions such as vectors, matrices and tensors, and a convenient array-notation for performing operations on and with them. Scalar-multiplication and matrix addition such as Z = 42 · X + Y can for example be written as Z = 42*X+Y;. Libraries using this approach include Blitz++[71], Armadillo[60], and Eigen[35]. Users who come from the domain of mathematics find it familiar, as it is identical to the mathematical notation except for the semi-colon terminating the expression. The array-notation leverages the C++ language features of operator overloading and generic programming. Via these features it is possible to construct a domain-specific embedded language within C++. Expression templates (as defined by David Vandevoorde[70]) can at compile-time construct an expression tree of all operations performed on and with the library-defined types. This DSEL/library design is quite clever in that the operator overloads enable the convenient array-notation and the expression-tree provides a means to apply performance-enhancing optimizations such as array-operation composition.

Expression templates do have some negative traits. Compiling an invalid C++ program with an error in the expression-template produces multiple pages consisting of the entire expression tree, resulting in multiple screens of information that are hard to use for debugging; even finding a syntax error is troublesome. Another issue is that expressions cannot trivially be passed as input to a function or returned from a function, the reason being that it would require specification of the expression-tree within the function signature, which is extremely inconvenient. Eigen and Armadillo's solution is to provide a base class/type for all expressions. The most severe challenge for these libraries, even though they hide low-level details from the user, is that the implementations themselves are tightly bound to the underlying hardware and thereby not easily retargetable to different hardware architectures.

Purely library-based approaches based on expression templates thus have shortcomings in several areas. Intel Cilk Plus provides C/C++ language extensions for array-notation that patch a great deal of these issues by providing a very convenient array notation and better error messages, and by exploiting data-parallelism for performance. Language extensions such as Cilk Plus are advantageous but also quite invasive, as they require extending compiler support.

The BXX DSEL applies a similar approach to Blitz++, Armadillo, and Eigen for providing convenient array-notation, that is, operator overloading and generic programming. However, instead

of relying on expression templates for performance, BXX maps array operations to vector bytecode and thereby delegates scheduling and execution to the Bohrium runtime. The following subsection provides an overview of the look and feel of BXX.
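To make the contrast with expression templates concrete, the sketch below shows the shape of an operator overload that emits a bytecode instruction instead of building an expression tree. The class and the enqueue_add helper are illustrative stand-ins, not the actual BXX implementation.

// Sketch: operator+ that defers work to the runtime by emitting bytecode.
template <typename T>
struct multi_array_sketch { /* shape and base bookkeeping omitted */ };

template <typename T>
void enqueue_add(multi_array_sketch<T>& out,
                 const multi_array_sketch<T>& lhs,
                 const multi_array_sketch<T>& rhs)
{
    // In BXX this step would append an element-wise ADD instruction to the
    // batch of vector bytecode handed to the Bohrium runtime; nothing is
    // computed here.
    (void)out; (void)lhs; (void)rhs;
}

template <typename T>
multi_array_sketch<T> operator+(const multi_array_sketch<T>& lhs,
                                const multi_array_sketch<T>& rhs)
{
    multi_array_sketch<T> result;   // shape inherited from the operands
    enqueue_add(result, lhs, rhs);  // emit bytecode; execution is deferred
    return result;
}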

Notation and Use

C++ is notorious for its support of multiple programming paradigms. It is thus possible for a DSEL in C++ to be object-oriented, function-oriented, or a mixture of both. A functional approach integrates well with operator overloads, as it very closely mimics the syntax of mathematical expressions such as X = sin(Y) + Z. An object-oriented style also holds merit for mathematical expressions on operands, such as X^t, which can be represented as X.transpose(). I have opted for a functional paradigm as a means to keep the notation consistent and within a single paradigm. The following goes through the most common operations of the library and describes the notation by example.

// Declaration of variables (element types assumed to match the examples below)
multi_array<double> x, y, z;
multi_array<int64_t> q;

Declaring and defining array variables are separate operations. The declaration, as shown above, is only concerned with providing a name and a type for an array. The initialization defines the shape along with the actual data. In the example below, two one-dimensional arrays (vectors) are initialized with three pseudo-random numbers and three ones.

// Definition and initialization
x = random(3);   // x = [0.225, 0.456, 0.965]
y = ones(3);     // y = [1.0, 1.0, 1.0]

Variables can, once declared and defined, be used as input for operations such as element-wise addition or reduction. Variables can, once declared but not defined, be used to store the result of an operation. They will then inherit the shape based on the result of the operation when assigned.

// Element-wise operations
z = x + y;    // z = [1.225, 1.456, 1.965]
// Reduction
z = sum(z);   // z = [4.646]

Array variables reference either another array variable or the anonymous result of an operation. Directly assigning a variable to another will create a view/alias of the other variable. Given two variables x and y, where y is an alias of x, any operation on y will also affect x and vice versa, as illustrated in the example below.

// Aliasing
y = x;               // y is an alias of x
y += 1;
cout << x << endl;   // [1.225, 1.456, 1.965]

In case an actual copy of a variable is needed the user has to explicitly request a copy. Copies also occur implicitly when variables are type-cast. Both of these situations are illustrated below.

// Explicitly copy the elements of an array
z = copy(x);          // z = [1.225, 1.456, 1.965]
// Typecasting, copies implicitly (target element type assumed)
q = as<int64_t>(x);   // q = [1, 1, 2]


The definition/initialization assigns the shape of a variable. It can be changed at a later point in time as long as the number of elements remains the same. The code below provides a couple of shape-transformation examples.

multi_array<double> u;   // element type assumed
u = random(9);
...
u = reshape(u, 3, 3);    // Turn vector into a 3x3 matrix
u = transpose(u);        // Transpose the matrix
...
u = reshape(u, 9);       // Turn 3x3 matrix into a vector

We have so far covered the notation for aliases and explicit copies. This leaves the notation for updating an array. The code below shows how to update either a part of, or the entire, array.

y(x);                  // Update every element of y with the values from x
y[_(0, -1, 2)] = 42;   // Update every second element

The update of every second element in the example above introduces the slicing notation. This notation is the most brittle from a productivity perspective compared to the notations provided by languages such as Matlab, R, Python and Cilk Plus. However, it is as good as it gets when using a library-based approach. A sliced array variable can occur anywhere a regular array variable can occur, and there are a couple of shorthands that condense the slicing notation.

multi_array<double> grid;   // element type assumed
grid = random(9, 9);

// Sliced aliases
center = grid[_INNER][_INNER];

// Slicing shorthands
y[_ALL]     // All elements
y[_ABF]     // All but first
y[_ABL]     // All but last
y[_INNER]   // All but first and last

Further examples of the notation, as well as examples of applications such as Black-Scholes, Heat Equation, Rosenbrock, Leibnitz Pi, Monte Carlo Pi, and Shallow Water, can be inspected on the Benchpress website: benchpress.rtfd.org. The DSEL supports basic functionality for legacy interoperability with C++ in the form of an iterator interface for element-wise traversal. An overload of the shift-operator provides a convenient means of outputting the contents of an array.

for (multi_array<double>::iterator it = y.begin(); it != y.end(); ++it) {
    printf("%f ", *it);   // element type assumed to be double
}
...
cout << y << endl;

The use of the iterator is highly discouraged, as it forces synchronization of memory with the C++ memory space. In the example above, each element needs to be exposed and printed to screen. The iterator forces memory, which could reside in GPU device memory or be distributed across a cluster, to be copied back into main memory for the thread requesting the iterator. The iterator should for this reason only be used at the end of an application, when results from computations need to be reported back to the user of the application. The BXX library also provides export/import methods for retrieving and providing a pointer to the underlying storage used by the arrays. The export/import methods are intended for interoperability with visualization libraries and for integration with legacy code.


Future Work

In the paper introducing CAPE ("Automatic mapping of array operations to specific architectures", see the Publications chapter section 6.1), a performance study using BXX is provided, comparing its performance to hand-coded C++ implementations of a set of benchmarks using OpenMP. The results showed that BXX/CAPE can match and outperform these implementations. The extraction of the data-parallel array-operations via the array-notation provided by BXX, and the delegation of optimization, scheduling, and execution to Bohrium/CAPE, is the driver of performance. The main contribution of BXX is the exploration of a novel approach to high-performance DSEL implementation in C++. BXX provides convenient array-notation without the quirks of expression templates, without requiring compiler support, and with performance matching hand-coded implementations in C++ with OpenMP.

Future work will study the performance of BXX further; figure 4.1 shows a first step in this direction. The figure shows a graph of elapsed wall-clock time as a function of threads for the Black Scholes benchmark implemented with Armadillo, Blitz++, OpenMP (C++/P), and BXX with (BXX+O) and without (BXX-O) optimization. As the figure shows, the expression-template approach of Armadillo and Blitz++ is able to perform array-contraction and thereby obtain decent single-threaded performance. However, Blitz++ does not support parallelization, and Armadillo relies on BLAS libraries for parallelization. Parallelization is thus only successful when operations are directly mappable to BLAS, which fails for this benchmark. The focus for a future performance study is therefore Intel Cilk Plus, which should be able to utilize Intel MKL in a similar manner but also perform parallelization and optimization of general array operations. In addition to a performance study, usability features will be added, such as interface-compatibility with existing libraries, thereby enabling interoperability with Blitz++, Armadillo, and Eigen, or porting implementations that use these libraries. Also for usability, the set of functions for linear algebra will be expanded, and other convenient functions added.

Figure 4.1: Results from the Black Scholes benchmark reported as elapsed wall-clock in seconds. Four different C++ implementations using Armadillo, Blitz++, hand-coded parallelization with OpenMP, and BXX with and without optimization. Executed on a machine with two Opteron 6274 processors and a total of 32 cores.

4.4 PyChapel

The pyChapel module introduces interoperability between Python and Chapel. However, it can with little effort be expanded to support any language capable of interoperating with C, including but not limited to Fortran and Haskell, thereby providing a generic and simplified approach to foreign function interfaces in Python. The ideas I had for Python and Chapel interoperability were cut short as I prioritized the exploration of the Bohrium/CAPE approach. However, interoperability between Python and Chapel is a powerful combination that allows a study of language symbiosis. From Python's perspective, the interoperability allows access to a low-level language that supports the same abstraction level as NumPy. One project to pursue is finishing the work on pyChapel and the npbackend module, thereby transparently mapping Python/NumPy array operations to Chapel and effectively increasing the performance of the Python/NumPy program without changing a single line of code. When

those abstractions fail, Chapel is available, under the hood, for low-level instrumentation as a significantly more convenient alternative for parallel programming than C, C++, and Fortran. I also spent time creating a Chapel for Python Programmers1 guide, describing common Python idioms in Chapel terms. This work could be expanded to cover further aspects and also to create a pythonic Chapel module, that is, a module with functions, iterators, and constructs familiar to the Python programmer. From the Chapel perspective, an interesting development in interactive environments has happened which would be interesting to explore. The interactive environment provided by IPython/IPython Notebook has evolved into a self-contained system named Jupyter. Jupyter supports 40 different kernels to drive computation. The potential here is that Chapel could leverage the features of the interactive environment through a PyChapel kernel for Jupyter. In summary, my work exploring Python and Chapel interoperability stopped because of time constraints, not for lack of potential or interesting areas to explore.

4.5 Benchpress

Benchpress2 is a collection of tools and benchmark implementations I wrote for experimental validation and performance testing. It provides the following functionality/commands.

bp-run is the main driver; it polls the system for hardware and software information, but most importantly it executes suites of benchmark implementations and stores the results in a JSON file. It facilitates interaction with SLURM and various parameters, including the number of trials to run for each benchmark. It collects data via tools such as perf and time, in addition to the elapsed wall-clock time emitted by the benchmark itself. Hooks, that is pre/post-execution commands, can be applied for cleaning up data and side-effects before and after a benchmark run.

bp-info provides information about the location of benchmarks, hooks, commands, and the module itself.

bp-times renders a bp-run result file in plain-text formats such as ASCII and CSV.

bp-grapher renders a bp-run result file in different formats such as PNG and HTML, displaying benchmark results using either a generic renderer or one specialized to an experiment.

bp-compile, as the name suggests, compiles benchmarks that require compilation. It is a minimal compilation framework ensuring that miscellaneous flags are equivalent for C and C++ implementations.

The tool itself is finished work and has aided experimental testing throughout my studies. What is left to be done is expanding the set of benchmarks. Currently, a rich collection of Python implementations and a good selection of C/C++ implementations exist. However, for future work these must be expanded to include hand-coded implementations to serve for comparative studies of CAPE and accelerators.

1http://chapel-for-python-programmers.readthedocs.org/ 2http://benchpress.readthedocs.org/

Chapter 5

Conclusion

I set out to explore the thesis: It is possible to construct a language-agnostic backend for high-level declarative languages without sacrificing performance. The approach for this thesis involved an investigation of related work and the construction of a backend prototype. Related work on parallel programming languages for large-scale distributed memory architectures revealed data-parallelism as the driver for performance. Parallel languages such as HPF, ZPL, and Chapel inspired the idea of exploring the construction of a backend that could exploit the inherent data-parallelism of array operations for performance. A backend that would aid productivity by supporting the high-level declarative array-notation in use by languages such as Python, Matlab, and R. The backend would have the additional benefit of facilitating language integration via vector bytecode, an intermediate representation encapsulating array operations and data management. The exploration thus turned into the construction and experimental evaluation of a backend that would:

• Support an array-oriented programming model

• Enable language integration via an intermediate representation

• Target performance that is comparable to straightforward hand-coded C/C++ for the same application

The backend materialized through collaborative efforts in the form of Bohrium, a high performance backend for array-oriented programming. Bohrium features integration with Python, C, C++, C#, and F#. Integration with these languages is facilitated via NumPy for Python, via the BXX DSEL for C++, and via the NumCIL library for the CIL-based languages C# and F#. The C language integration serves as a language interoperability interface encapsulating the primitives for the vector bytecode intermediate representation.

The parallel languages inspiring this work focused on delivering performance for large-scale distributed memory. In contrast to those efforts, the performance context of this thesis is the challenges presented by current and next-generation processing units consisting of heterogeneous architectures with complex memory hierarchies at the node level, with a focus on managing non-uniform memory access on multi-core shared memory processors.

Bohrium serves as the framework for language integration and program representation. Managing the concern of efficient hardware utilization at the node level is materialized in the construction of the C-targeting Array Processing Engine (CAPE). CAPE envelops machinery instrumenting parallel runtimes, buffer management, and JIT-compilation with a C-targeting code generator emitting C99 code with OpenMP, Intel LEO, and OpenACC directives. Considerations for non-uniform memory access involve explicit work distribution in the emitted code and thread locality, managed in concert by the CAPE runtime.


Benchmarks were implemented using CAPE via Bohrium in Python with NumPy, in C++ with BXX, and without CAPE in C++ with OpenMP. These benchmarks were the basis for a performance evaluation comparing high-level declarative implementations based on the array notation to explicit low-level parallel code. They form an experimental evaluation of the thesis.

Results showed that the implementations using CAPE via Python/NumPy outperformed the low-level code by 20.1%, and the C++/BXX implementation outperformed the low-level code by 69.5%. These results included benchmarks favoring optimizations performed by CAPE. Results on the same benchmarks with those optimizations disabled showed that the implementations using CAPE via Python/NumPy performed 12.0% worse than the low-level implementation, whereas the C++/BXX implementation performed 1.5% better than the low-level implementation.

What these results describe is that the CAPE JIT-compiler can exploit high-level information to safely apply optimizations that a low-level compiler cannot. A Python implementation can thereby outperform a low-level implementation in C++ when high-level optimizations are applicable. When they are not, the convenience of Python introduces an overhead of 12.0% in performance. Consider the result of the BXX DSEL without the optimizations: in this case, the performance matches the low-level implementation, so the convenience provided by BXX does not incur a tradeoff in performance. The information conveyed by this result is two-fold. First, it reveals a difference between the two language bridges, Python/NumPy and the BXX DSEL; this is due to a vulnerability of the algorithm for contraction and composition, and solutions to this problem are provided by ongoing work on fusion at runtime. Second, and most importantly, the CAPE JIT-compiler and runtime can manage the low-level concerns of the architecture required to obtain efficient utilization of a multi-core processor with non-uniform memory access. This result supports my thesis and shows the potential of Bohrium, CAPE, and the array-oriented programming model.

Bibliography

[1] Intel Math Kernel Library. http://software.intel.com/en-us/articles/intel-mkl/.

[2] An updated set of basic linear algebra subprograms (BLAS). ACM Trans. Math. Softw., 28(2):135–151, June 2002.

[3] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users' Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, third edition, 1999.

[4] Sylwester Arabas, Marc Schellens, Alain Coulais, Joel Gales, and Peter Messmer. Gnu data language (gdl)-a free and open-source implementation of idl. In EGU General Assembly Conference Abstracts, volume 12, page 924, 2010.

[5] Sergio Barrachina, Maribel Castillo, Francisco D Igual, Rafael Mayo, and Enrique S Quintana-Orti. Evaluation and tuning of the level 3 CUBLAS for graphics processors. In Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on, pages 1–8. IEEE, 2008.

[6] Douglas Bates and Dirk Eddelbuettel. Fast and elegant numerical linear algebra using the rcppeigen package. Journal of Statistical Software, 52(5):1–24, 2 2013.

[7] Amr Bayoumi, Michael Chu, Yasser Hanafy, Patricia Harrell, and Gamal Refai-Ahmed. Sci- entific and engineering computing using ati stream technology. Computing in science & engineering, 11(6):92–97, 2009.

[8] Nathan Bell and Jared Hoberock. Thrust: A parallel template library. GPU Computing Gems Jade Edition, page 359, 2011.

[9] Lars Bergstrom and John Reppy. Nested data-parallelism on the GPU. In Proceedings of the 17th ACM SIGPLAN International Conference on Functional Programming, pages 247–258. ACM, 2012.

[10] Jeff Bezanson, Stefan Karpinski, Viral B. Shah, and Alan Edelman. Julia: A fast dynamic language for technical computing. September 2012.

[11] Laura Susan Blackford. ScaLAPACK. In Proceedings of the 1996 ACM/IEEE Conference on Supercomputing (CDROM) - Supercomputing '96, page 5, 1996.

[12] Guy E Blelloch. Nesl: A nested data-parallel language.(version 3.1). Technical report, DTIC Document, 1995.

[13] Guy E Blelloch and Siddhartha Chatterjee. Vcode: A data-parallel intermediate language. In Frontiers of Massively Parallel Computation, 1990. Proceedings., 3rd Symposium on the, pages 471–480. IEEE, 1990.

[14] Guy E Blelloch and Gary W Sabot. Compiling collection-oriented languages onto massively parallel computers. In Frontiers of Massively Parallel Computation, 1988. Proceedings., 2nd Symposium on the Frontiers of, pages 575–585. IEEE, 1988.


[15] Troels Blum, Mads R. B. Kristensen, and Brian Vinter. Transparent GPU Execution of NumPy Applications. In Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2014 IEEE 28th International. IEEE, 2014.

[16] Troels Blum and Brian Vinter. Code Specialization of Auto Generated GPU Kernels. In Communicating Process Architectures 2015. IOS Press, 2015.

[17] Robert D Blumofe, Christopher F Joerg, Bradley C Kuszmaul, Charles E Leiserson, Keith H Randall, and Yuli Zhou. Cilk: An efficient multithreaded runtime system, volume 30. ACM, 1995.

[18] William W Carlson, Jesse M Draper, David E Culler, Kathy Yelick, Eugene Brooks, and Karen Warren. Introduction to UPC and language specification. Center for Computing Sciences, Institute for Defense Analyses, 1999.

[19] Bradford L Chamberlain. The design and implementation of a region-based parallel programming language. PhD thesis, University of Washington, 2001.

[20] Bradford L Chamberlain, David Callahan, and Hans P Zima. Parallel programmability and the chapel language. International Journal of High Performance Computing Applications, 21(3):291–312, 2007.

[21] Douglas Crockford. The application/json media type for JavaScript Object Notation (JSON). 2006.

[22] L. Dagum and R. Menon. OpenMP: an industry standard API for shared-memory programming. Computational Science & Engineering, IEEE, 5(1):46–55, 1998.

[23] Peter J Denning. ACM president's letter: What is experimental computer science? Communications of the ACM, 23(10):543–544, 1980.

[24] Peter J Denning. Is computer science science? Communications of the ACM, 48(4):27–31, 2005.

[25] Jack J Dongarra, James R Bunch, Cleve B Moler, and Gilbert W Stewart. LINPACK users’ guide, volume 8. Siam, 1979.

[26] Matthew GF Dosanjh, Patrick G Bridges, Suzanne M Kelly, and James H Laros. A peer-to-peer architecture for supporting dynamic shared libraries in large-scale systems. In Parallel Processing Workshops (ICPPW), 2012 41st International Conference on, pages 55–61. IEEE, 2012.

[27] Alejandro Duran, Eduard Ayguadé, Rosa M Badia, Jesús Labarta, Luis Martinell, Xavier Martorell, and Judit Planas. Ompss: a proposal for programming heterogeneous multi-core architectures. Parallel Processing Letters, 21(02):173–193, 2011.

[28] Dirk Eddelbuettel. Rcppeigen. In Seamless R and C++ Integration with Rcpp, volume 64 of Use R!, pages 177–192. Springer New York, 2013.

[29] Dirk Eddelbuettel and Conrad Sanderson. RcppArmadillo: Accelerating R with high-performance C++ linear algebra. Computational Statistics & Data Analysis, 71:1054–1063, 2014.

[30] Wolfgang Frings, Dong H Ahn, Matthew LeGendre, Todd Gamblin, Bronis R de Supinski, and Felix Wolf. Massively parallel loading. In Proceedings of the 27th international ACM conference on International conference on supercomputing, pages 389–398. ACM, 2013.

[31] Burton S Garbow, James M. Boyle, Cleve B Moler, and Jack J Dongarra. Matrix eigensystem routines eispack guide extension. 1977.

[32] Kazushige Goto and Robert A Geijn. Anatomy of high-performance matrix multiplication. ACM Transactions on Mathematical Software (TOMS), 34(3):12, 2008.


[33] Linley Gwennap. Adapteva: More flops, less watts. Microprocessor Report, 6(13):11–02, 2011.

[34] John D Hunter. Matplotlib: A 2d graphics environment. Computing in Science & Engineering, pages 90–95, 2007.

[35] B Jacob and G Guennebaud. Eigen is a c++ template library for linear algebra: Matrices, vectors, numerical solvers, and related algorithms, 2012.

[36] Eric Jones, Travis Oliphant, and Pearu Peterson. Scipy: Open source scientific tools for python. http://www.scipy.org/, 2001.

[37] Ken Kennedy, Charles Koelbel, and Hans Zima. The rise and fall of high performance fortran: an historical object lesson. In Proceedings of the third ACM SIGPLAN conference on History of programming languages, pages 7–1. ACM, 2007.

[38] Mads R. B. Kristensen. Towards Parallel Execution of Sequential Scientific Applications. PhD thesis, University of Copenhagen, Niels Bohr Institute, Denmark, Blegdamsvej 17, 2100 Copenhagen, Denmark, 9 2012.

[39] Mads R. B. Kristensen, Simon A. F. Lund, Troels Blum, and Kenneth Skovhede. Separating NumPy API from Implementation. In 5th Workshop on Python for High Performance and Scientific Computing (PyHPC’14), 2014.

[40] Mads R. B. Kristensen, Simon A. F. Lund, Troels Blum, Kenneth Skovhede, and Brian Vinter. Bohrium: Unmodified NumPy Code on CPU, GPU, and Cluster. In 4th Workshop on Python for High Performance and Scientific Computing (PyHPC’13), 2013.

[41] Mads R. B. Kristensen, Simon A. F. Lund, Troels Blum, Kenneth Skovhede, and Brian Vinter. Bohrium: a Virtual Machine Approach to Portable Parallelism. In Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International, pages 312–321. IEEE, 2014.

[42] Mads R. B. Kristensen, Simon A. F. Lund, Troels Blum, and Brian Vinter. cphvb: A system for automated runtime optimization and parallelization of vectorized applications. In Proceedings of The 11th Python In Science Conference (SciPy’12). Austin, Texas, USA., 2012.

[43] Chuck L Lawson, Richard J. Hanson, David R Kincaid, and Fred T. Krogh. Basic linear algebra subprograms for fortran usage. ACM Transactions on Mathematical Software (TOMS), 5(3):308–323, 1979.

[44] Johannes Lund, Mads R. B. Kristensen, Simon A. F. Lund, and Brian Vinter. Just-in-time compilation of numpy vector operations. Journal on Computing (JoC), 3(3), 2014.

[45] Simon A. F. Lund, Bradford L. Chamberlain, and Brian Vinter. Scripting Language Performance Through Interoperability. http://polaris.cs.uiuc.edu/hpsl/abstracts/a6-lund.pdf, 2015. [Online; accessed September 2015].

[46] Simon A. F. Lund, Mads R. B. Kristensen, Brian Vinter, and Dimitrios Katsaros. Bypassing the conventional software stack using adaptable runtime systems. In Euro-Par 2014: Parallel Processing Workshops, volume 8806 of Lecture Notes in Computer Science, pages 302–313. Springer International Publishing, 2014.

[47] Simon A. F. Lund, Kenneth Skovhede, Mads R. B. Kristensen, and Brian Vinter. Doubling the Performance of Python/NumPy with less than 100 SLOC. In 4th Workshop on Python for High Performance and Scientific Computing (PyHPC’13), 2013.

[48] Simon A. F. Lund and Brian Vinter. Automatic mapping of array operations to specific architectures. submitted to the International Journal on Parallel Computing, m(n):x–y, 2015.


[49] Gordon E Moore et al. Cramming more components onto integrated circuits, 1965.

[50] Gordon E Moore et al. Progress in digital integrated electronics. SPIE MILESTONE SERIES MS, 178:179–181, 2004.

[51] Maxim Naumov. Incomplete-lu and cholesky preconditioned iterative methods using cus- parse and cublas. Nvidia white paper, 2011.

[52] C.J. Newburn, S. Dmitriev, R. Narayanaswamy, J. Wiegert, R. Murty, F. Chinchilla, R. Deodhar, and R. McGuire. Offload compiler runtime for the intel xeon phi coprocessor. In Parallel and Distributed Processing Symposium Workshops PhD Forum (IPDPSW), 2013 IEEE 27th International, pages 1213–1225, May 2013.

[53] Bradford Nichols, Dick Buttlar, and Jacqueline Farrell. Pthreads programming: A POSIX standard for better multiprocessing. O’Reilly Media, Inc., 1996.

[54] Robert W Numrich and John Reid. Co-arrays in the next fortran standard. In ACM SIGPLAN Fortran Forum, volume 24, pages 4–17. ACM, 2005.

[55] NVIDIA. Compute unified device architecture. http://docs.nvidia.com/cuda/. [On- line; accessed June 2013].

[56] Fernando Perez and Brian E Granger. Ipython: a system for interactive scientific computing. Computing in Science & Engineering, 9(3):21–29, 2007.

[57] Arch D Robison. Composable parallel patterns with intel cilk plus. Computing in Science and Engineering, 15(2):66–71, 2013.

[58] John K. Salmon, Mark A. Moraes, Ron O. Dror, and David E. Shaw. Parallel random numbers: As easy as 1, 2, 3. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’11, pages 16:1–16:12, New York, NY, USA, 2011. ACM.

[59] Ben Sander. Bolt: A c++ template library for heterogeneous computing. AMD Fusion Developer Summit, 12, 2012.

[60] Conrad Sanderson et al. Armadillo: An open source c++ linear algebra library for fast prototyping and computationally intensive experiments. Technical report, Technical report, NICTA, 2010.

[61] Vijay Saraswat, George Almasi, Ganesh Bikshandi, Calin Cascaval, David Cunningham, David Grove, Sreedhar Kodali, Igor Peshansky, and Olivier Tardieu. The asynchronous partitioned global address space model. In Proceedings of The First Workshop on Advances in Message Passing, 2010.

[62] Sven-Bodo Scholz. Single assignment c: Efficient support for high-level array operations in a functional setting. J. Funct. Program., 13(6):1005–1059, November 2003.

[63] Jeremy G Siek, Ian Karlin, and Elizabeth R Jessup. Build to order linear algebra kernels. In Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on, pages 1–8. IEEE, 2008.

[64] Jay M Sipelstein and Guy E Blelloch. Collection-oriented languages. Proceedings of the IEEE, 79(4):504–523, 1991.

[65] Kenneth Skovhede and Simon A. F. Lund. Numcil and bohrium: High productivity and high performance. In Proceedings of the 11th International Conference on Parallel Processing and Applied Mathematics (PPAM), LNCS. Springer, 2015.


[66] M. Snir, S. Otto, S. Huss-Lederman, D. Walker, and J. Dongarra. MPI: The Complete Reference (Vol. 1): Volume 1-The MPI Core, volume 1. MIT press, 1998.

[67] J.E. Stone, D. Gohara, and G. Shi. OpenCL: A parallel programming standard for heteroge- neous computing systems. Computing in science & engineering, 12(3):66, 2010.

[68] Alan M Turing. On computable numbers, with an application to the entscheidungsproblem. Proceedings of the London mathematical society, 42(2):230–265, 1936.

[69] S. Van Der Walt, S.C. Colbert, and G. Varoquaux. The numpy array: a structure for efficient numerical computation. Computing in Science & Engineering, 13(2):22–30, 2011.

[70] David Vandevoorde and Nicolai M Josuttis. C++ templates: the Complete Guide. Addison-Wesley Professional, 2002.

[71] Todd L. Veldhuizen. Arrays in Blitz++. In Denis Caromel, Rodney R. Oldehoeft, and Marydell Tholburn, editors, Computing in Object-Oriented Parallel Environments, volume 1505 of Lecture Notes in Computer Science, pages 223–230. Springer Berlin Heidelberg, 1998.

[72] Brian Vinter, Mads R. B. Kristensen, Simon A. F. Lund, Troels Blum, and Kenneth Skovhede. Prototyping for exascale. In Proceedings of the 3rd International Conference on Exascale Applications and Software, EASC ’15, pages 77–81, Edinburgh, Scotland, UK, 2015. University of Edinburgh.

[73] John von Neumann. First draft of a report on the EDVAC. Annals of the History of Computing, IEEE, 15(4):27–75, 1993.

[74] R. Clint Whaley and Jack Dongarra. Automatically tuned linear algebra software. In SuperComputing 1998: High Performance Networking and Computing, 1998. CD-ROM Proceedings. Winner, best paper in the systems category. URL: http://www.cs.utsa.edu/~whaley/papers/atlas_sc98.ps.

[75] Kyle B Wheeler, Richard C Murphy, and Douglas Thain. Qthreads: An api for programming with millions of lightweight threads. In Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on, pages 1–8. IEEE, 2008.

[76] Sandra Wienke, Paul Springer, Christian Terboven, and Dieter an Mey. Openacc - first experiences with real-world applications. In Euro-Par 2012 Parallel Processing, pages 859–870. Springer, 2012.

[77] Zhang Xianyi, Wang Qian, and Zhang Yunquan. Model-driven level 3 blas performance optimization on loongson 3a processor. In Parallel and Distributed Systems (ICPADS), 2012 IEEE 18th International Conference on, pages 684–691. IEEE, 2012.

Part II

Publications


6.1 Automatic mapping of array operations to specific architectures

Elsevier Editorial System(tm) for Parallel Computing Manuscript Draft

Manuscript Number:

Title: Automatic mapping of array operations to specific architectures

Article Type: Research Paper

Keywords: High-level languages; Array programming; Just-in-time compilation; Code generation; Parallelization; non-uniform memory access

Corresponding Author: Mr. Simon Andreas Frimann Lund, M.Sc.

Corresponding Author's Institution: University of Copenhagen

First Author: Simon Andreas Frimann Lund, M.Sc.

Order of Authors: Simon Andreas Frimann Lund, M.Sc.

Abstract: Array-oriented programming has been around for about thirty years and provides a fundamental abstraction for scientific computing. However, a wealth of popular programming languages in existence fail to provide convenient high-level abstractions and exploit parallelism. One reason being that hardware is an ever-moving target.

For this purpose, we introduce CAPE, a C-targeting Array Processing Engine, which manages the concerns of optimizing and parallelizing the execution of array operations. It is intended as a backend for new and existing languages and provides a portable runtime with a C-interface.

The performance of the implementation is studied in relation to high-level implementations of a set of applications, kernels and synthetic benchmarks in Python/NumPy as well as low-level implementations in C/C++. We show the performance improvement over the high-productivity environment and how close the implementation is to handcrafted C/C++ code.

Cover letter

Please consider this manuscript for publication in Journal of Parallel Computing.

Title: Automatic mapping of array operations to specific architectures

Affiliations: Simon Andreas Frimann Lund, Niels Bohr Institute, University of Copenhagen, Denmark

Brian Vinter, Niels Bohr Institute, University of Copenhagen, Denmark

Problem: Mapping high-level language constructs, and operations upon them, to specific hardware architectures. Specifically, mapping representations of multi-dimensional arrays, and operations with and upon them, to heterogeneous hardware configurations with a focus on non-uniform memory access.

Approach: Integrate with existing languages via vector bytecode, an intermediate representation for array operations. The focus here is on the transparent support of operations via array-notation, that is, entirely declarative expressions without any explicit terms or hints from the application programmer. The focus is on presenting and experimentally evaluating the performance of an array processing engine, which manages memory and specializes, JIT-compiles, and executes array-operations.

Contributions:

• Code generator for array operations with parallelization and composition of multiple array operations
• Caching JIT-compiler and object storage for array operation kernels
• Runtime instrumenting compilation, buffer management, array operation scheduling and execution

Overlap: Brian Vinter, Mads R. B. Kristensen, Simon A. F. Lund, Troels Blum, and Kenneth Skovhede. 2015. Prototyping for Exascale. In Proceedings of the 3rd International Conference on Exascale Applications and Software (EASC '15), A. Gray, L. Smith, and M. Weiland (Eds.). University of Edinburgh, Edinburgh, Scotland, UK, 77-81.
In the Prototyping for Exascale paper, an early iteration of the implementation was used in a performance evaluation; however, the paper does not discuss the design and implementation of the contributions presented above, nor does it discuss the results in relation to the goals of the manuscript submitted for consideration of publication.

Closest prior: Newburn, Chris J., et al. "Intel's Array Building Blocks: A retargetable, dynamic compiler and embedded language." Code Generation and Optimization (CGO), 2011 9th Annual IEEE/ACM International Symposium on. IEEE, 2011.
The paper stated above is the closest related prior work exploring the use of retargetable and dynamic compilation via an intermediate representation. The work presented in the manuscript submitted for consideration of publication improves upon prior work in several areas. Support for multiple languages is presented, as diverse as Python and C++. Support for, and consideration of, non-uniform memory access and accelerators is presented. NUMA is evaluated and compared to hand-coded implementations. A different design of the dynamic compilation approach is presented.

*Highlights (for review)

* Code generator for array operations with parallelization and composition of multiple array operations
* Caching JIT-compiler and object storage for array operation kernels
* Optimized memory allocation scheme for array buffers
* Runtime instrumenting compilation, buffer management, array operation scheduling and execution
* Benchmark applications comparing Python/NumPy applications with equivalent serial and parallel C/C++ implementations

*Manuscript

Automatic mapping of array operations to specific architectures

Simon A. F. Lund, Brian Vinter Niels Bohr Institute, University of Copenhagen, Copenhagen, Denmark

Abstract

Array-oriented programming has been around for about thirty years and provides a fundamental abstraction for scientific computing. However, a wealth of popular programming languages in existence fail to provide convenient high-level abstractions and to exploit parallelism, one reason being that hardware is an ever-moving target. For this purpose, we introduce CAPE, a C-targeting Array Processing Engine, which manages the concerns of optimizing and parallelizing the execution of array operations. It is intended as a backend for new and existing languages and provides a portable runtime with a C-interface. The performance of the implementation is studied in relation to high-level implementations of a set of applications, kernels, and synthetic benchmarks in Python/NumPy as well as low-level implementations in C/C++. We show the performance improvement over the high-productivity environment and how close the implementation is to handcrafted C/C++ code.

Keywords: high-level languages, array programming, just-in-time compilation, code generation, parallelization, non-uniform memory access

1. Introduction

Scientific computing demands high performance to enable processing of real-world datasets and increasing simulation accuracy. Traditional languages used in High-Performance Computing (HPC) such as C, C++, and Fortran provide low-level control over the hardware: control that enables the programmer to utilize the available memory hierarchy and processing units efficiently with the goal of maximizing program throughput.

A key concern for programs written in low-level languages is that performance is inherently non-portable due to the tight coupling between application logic and hardware instrumentation. Thus re-targeting a program written in a low-level language requires re-implementation.

Different layers of abstraction can aid performance portability, such as using optimizing compilers to abstract differences in instruction set architecture.

Directive-based programming, as facilitated by OpenMP[1], Intel LEO[2], and OpenACC[3], encapsulates parallelization on multi-core and many-core architectures. Other approaches include OpenCL[4], which provides a common framework for expressing parallel programs on both multi-core and many-core architectures. Despite these advances in abstraction and encapsulation for traditional HPC languages, expressing on-node parallelism remains tightly coupled, as OpenMP is preferable for targeting multi-core CPUs, LEO for Intel accelerators and graphics, and OpenACC for NVIDIA graphics.

Applying low-level languages to scientific computing thus seems like a necessary evil to meet performance requirements. However, high performance is not the only demand; exploratory scientific computing requires rich interactive environments for data exploration facilitated by visualization, experimentation, and prototyping. High-level languages such as Python/NumPy[5], Matlab[6], R[7], Octave[8], and IDL[9] cater to these demands. The open-source packages Python, IPython[10], NumPy[5], and SciPy[11] are an example of an increasingly popular software stack for scientific computing: a stack based on an array-oriented programming model facilitated by NumPy.

The declarative expression form of array-oriented programming allows for convenient abstractions from low-level concerns, albeit at the cost of trading in the performance obtainable from explicitly instrumenting hardware. Python/NumPy has, despite the performance trade-off, made its way into large compute-sites due to projects such as PyOpenCL[12], PyCUDA[12], and PyMIC[13]. These libraries offer interoperability with low-level languages and APIs. Python/NumPy in conjunction with such libraries provides the rich environment required for exploratory scientific computing and a more convenient approach to low-level programming.

Low-level programming in a high-level language thus seems to be a favorable approach for scientific computing when compared to a purely low-level approach. However, it has its shortcomings, since it is as vulnerable to the concerns of performance portability as a purely low-level implementation, and the high-level abstraction and encapsulation is lost. The quest for performance is thus a resource-demanding task that does not align well with the needs of exploratory scientific computing.

Subsection 1.1 exemplifies the low-level pitfalls a programmer might get stuck in when faced with low-level concerns. Subsection 1.2 exemplifies some of the performance challenges that high-level declarative approaches face. Subsection 1.3 provides an overview of the contributions of the work in this paper and how it provides a means to overcome the high-level performance challenges and low-level pitfalls.

1.1. Low-Level pitfalls on NUMA architectures

Parallel programming has well-known pitfalls such as race conditions and deadlocks: pitfalls that do not reveal themselves until a program enters a certain state at runtime and are thus hard to reproduce and debug. Multi-core architectures with non-uniform memory access (NUMA) introduce performance pitfalls in addition to the challenge of implementing a correct parallel program.

The severity of the performance degradation is best illustrated with an example. Figure 1 provides pseudo-code for a synthetic benchmark simulating a common workload: first a grid is initialized, and secondly it is accessed in order to update it.

[Figure 1 listings (abridged); the loop bodies are omitted. Three variants of the same synthetic workload are shown:
  SS: serial initialization, memset(grid, 0, sizeof(*grid)*nelements), followed by a serial access loop over all elements;
  SP: the same serial initialization followed by an access loop parallelized with #pragma omp parallel for;
  PP: both the initialization loop and the access loop parallelized with #pragma omp parallel for.]
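A minimal sketch of what the three variants plausibly look like in C99 with OpenMP; the per-element update (here a simple increment) and the loop bounds are assumptions, as the full loop bodies are not shown above:

    #include <stddef.h>
    #include <string.h>

    /* SS: serial initialization and serial access */
    void ss(double *grid, size_t nelements) {
        memset(grid, 0, sizeof(*grid) * nelements);   /* all pages committed by one thread */
        for (size_t eidx = 0; eidx < nelements; ++eidx)
            grid[eidx] += 1.0;
    }

    /* SP: serial initialization, parallel access */
    void sp(double *grid, size_t nelements) {
        memset(grid, 0, sizeof(*grid) * nelements);   /* pages land on a single NUMA node */
        #pragma omp parallel for
        for (size_t eidx = 0; eidx < nelements; ++eidx)
            grid[eidx] += 1.0;                        /* most threads now touch remote memory */
    }

    /* PP: parallel initialization and parallel access */
    void pp(double *grid, size_t nelements) {
        #pragma omp parallel for
        for (size_t eidx = 0; eidx < nelements; ++eidx)
            grid[eidx] = 0.0;                         /* first-touch commits pages near each thread */
        #pragma omp parallel for
        for (size_t eidx = 0; eidx < nelements; ++eidx)
            grid[eidx] += 1.0;
    }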

Figure 1: C-like pseudo-code for the synthetic scientific workload; grid is a data pointer and nelements quantifies the number of elements that grid points to. Full source code is available at http://benchpress.readthedocs.org/benchmarks/synth_init.html

The benchmark demonstrates the scalability challenges due to remote access on NUMA architectures; figure 2 provides a speedup graph using the serial implementation SS as the baseline.

[Figure 2 (graph): Ten timesteps updating an array of 100000000 doubles; effects of ccNUMA on scalability.]

The configuration used to run the benchmark had a total of 32 cores and 128GB of shared memory. The configuration partitions these resources across two sockets, each with an AMD Opteron 6272 processor; each processor has two NUMA-nodes. Each NUMA-node consists of eight cores and an on-chip memory controller with two channels to 32GB of memory.

First, we compare SS to SP in figure 2. The best speedup of 1.5× is achieved with eight threads; at 32 threads performance degrades to 1.3×. Obtaining a mere 1.3× speedup on a system with 32 cores illustrates how devastating NUMA effects can be to performance. The reason is that the serial initialization commits pages on the NUMA-node on which the single thread is running, forcing the majority of threads in the parallel loop to access remote memory.

Secondly, we compare SS to PP in figure 2. The synthetic PP implementation is intended to illustrate that the programmer has parallelized the initialization. With 32 threads the best speedup of 5.2× is obtained. What is noticeable here is that linear speedup is obtained up to four threads. Up to four threads the operating system kernel is able to dynamically schedule threads to avoid remote access. Exceeding four threads, the dynamic scheduling pays a performance hit, as threads either have to access remote memory or migrate to a different core on a NUMA-node to which the data is local.

Third and last, we compare SS to PP/AN; in this case, the best speedup of 7.5× is obtained with 32 threads. The point worth noting for this result is that the implementation of PP/AN is identical to PP. The difference is that the runtime parameter GOMP_CPU_AFFINITY is set to control thread-affinity. Binding threads is advantageous when it matches the parallel loop distribution. The static distribution of the parallel loop divides iterations into chunks; for the results provided in figure 2, the 100000000 iterations are divided with a chunk size of loop_count / thread_count = 3125000 when executing with 32 threads. Thread0 is thus responsible for executing loop-iterations 1-3125000 and thereby implicitly array elements with index 0-3124999. Binding thread0 to core0, which is on NUMA-node0, ensures that thread0 will be assigned the same sequence of iterations across executions of parallel loop constructs, thereby avoiding remote access since the array elements are available on the NUMA-node on which the thread is executing.

Obtaining a best-case speedup of 7.5× on a 32-core system might be disappointing. However, not in this case, since the non-tuned synthetic benchmark is inherently memory-bound, as it barely performs any compute work for each update of the array. The number of compute cores in the machine is not the bottleneck; the main-memory bandwidth is. The execution of PP/AN with 32 threads achieves a throughput of 30.04GB/s. For comparison, the tuned STREAM[14] (Triad) benchmark was executed and obtained a throughput of 43.1GB/s on the same machine.

The serial initialization vs parallel access is labeled a performance pitfall since profiling the code will show the data-access loop as the one consuming the most time. A programmer might therefore not consider how data is initialized, and less synthetic applications might use a random number generator for initialization or load data from a file.

Another point worth mentioning, which the speedup graph and discussion of results above do not reveal, is that the parallel execution without an affinity policy has a high deviation from the mean in terms of elapsed wall-clock time. The deviation is due to the dynamic scheduling of threads performed by the operating system.

Thread scheduling and data locality are thus essential concerns for the programmer to achieve reasonable throughput and predictable performance.
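For concreteness, a small sketch of the static chunking arithmetic described above (illustrative only; the thread and iteration counts are taken from the benchmark configuration):

    #include <stddef.h>
    #include <stdio.h>

    /* Under OpenMP's default static schedule, and assuming the iteration count divides
       evenly, thread t gets one contiguous chunk of loop_count / thread_count iterations. */
    int main(void)
    {
        const size_t loop_count = 100000000, thread_count = 32;
        const size_t chunk = loop_count / thread_count;   /* = 3125000 */

        for (size_t t = 0; t < thread_count; ++t)
            printf("thread %zu: elements %zu .. %zu\n", t, t * chunk, (t + 1) * chunk - 1);
        return 0;
    }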

1.2. Performance challenges for High-Level Approaches

Listing 1 provides an example of a Python/NumPy implementation of a heat equation solver. The code exemplifies the declarative expression-form that Python/NumPy and other array-oriented languages provide. In addition to the syntax and expression-form, the code illustrates some performance-enhancing features. Lines 3-7 construct array views, which are references to subsets of the grid elements, equivalent to those illustrated in figure 3. The use of views facilitates a significant reduction in required memory in contrast to copying the subsets. Another feature of NumPy is the use of operator overloads; as a consequence, the operations *, +, -, abs, and sum are delegated to routines implemented in C. The data-descriptor for NumPy arrays is also an efficient representation for multi-dimensional arrays when compared to using built-in Python data structures such as lists.

[Figure 3: Array views for five-point stencil. Panels: Center, North, South, East, West.]

 1 import numpy as np
 2 def solve(grid, epsilon):
 3     center = grid[1:-1, 1:-1]
 4     north  = grid[:-2, 1:-1]
 5     south  = grid[2:, 1:-1]
 6     east   = grid[1:-1, :-2]
 7     west   = grid[1:-1, 2:]
 8     delta = epsilon + 1
 9     while delta > epsilon:
10         tmp = 0.2 * (
11             north + south +
12             east + west +
13             center
14         )
15         delta = np.sum(
16             np.abs(tmp - center)
17         )
18         center[:] = tmp

Listing 1: Python/NumPy implementation of the heat equation solver. The grid is a two-dimensional NumPy array and the epsilon is a Python scalar.

However, porting the example from listing 1 to a language such as C can still yield a performance improvement of an order of magnitude, even without considering parallelization. The main obstacle lies with the interpreted nature of Python: when each array operation in the loop body is executed, it materializes an array to store the result. A consequence is that Python/NumPy becomes inherently memory-bound. Considering parallelization would introduce an even larger performance gap. A low-level implementation will store the results of the heat equation in registers and thus latency-hide the load/store of grid values. Even if the NumPy array operations were implemented as parallel routines in C, they would remain memory-bound due to the materialization of the tmp array.
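To make the cost of materializing temporaries concrete, here is a rough sketch of a fused low-level update step in C99; this is illustrative only, not the code CAPE generates, and the row-major layout and traversal order are assumptions:

    #include <math.h>
    #include <stddef.h>

    /* One fused stencil sweep: the weighted sum, the absolute difference, and the
       reduction into delta all happen in registers, with no temporary arrays. */
    double heat_step(const double *grid, double *out, size_t rows, size_t cols)
    {
        double delta = 0.0;
        for (size_t i = 1; i < rows - 1; ++i) {
            for (size_t j = 1; j < cols - 1; ++j) {
                const double c = grid[i * cols + j];
                const double tmp = 0.2 * (grid[(i - 1) * cols + j] + grid[(i + 1) * cols + j] +
                                          grid[i * cols + (j - 1)] + grid[i * cols + (j + 1)] + c);
                delta += fabs(tmp - c);
                out[i * cols + j] = tmp;
            }
        }
        return delta;
    }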

1.3. Our contributions

We introduce CAPE, a C-targeting Array Processing Engine. The purpose of CAPE is to obtain the best of both worlds, that is, the performance from explicitly instrumenting array operations in a low-level language combined with the productivity of the high-level abstractions. To this end, CAPE supports high-level declarations of array operations, dynamically generates and compiles efficient code for them, manages memory by allocating supporting buffers backing the arrays, and executes the operations. The contributions of this work are:

• Code generator for array operations with parallelization and composition of multiple array operations (Section 3.2.1);

• Caching JIT-Compiler and object storage for array operation kernels (Section 3.2);

• Optimized memory allocation scheme for array buffers (Section 3.3);

• Runtime instrumenting compilation, buffer management, array operation scheduling and execution (Section 3);

• Benchmark applications for comparing Python/NumPy applications with equivalent serial and parallel C/C++ implementations (Section 4).

This paper integrates previous work on Bohrium[15] (Section 2), a framework and runtime for extracting array operations from languages. Bohrium allows for experimentation with subprogram optimization, specifically of the data-parallel array operations.

This paper focuses on introducing the design of CAPE and restricts the scope of the implementation and performance study to multi-core CPUs with ccNUMA architecture. Although support for other compute-oriented architectures, such as Intel MIC, GPUs, and APUs, is an integral part of the design of CAPE, it is out of scope for this paper to describe that implementation. Section 5, on ongoing and future work, provides an overview of how CAPE addresses these architectures. Throughout the description of the CAPE implementation, attention will be given to these architectures, yet a comparative performance study is out of scope for this paper.

2. Bohrium

Bohrium is a framework which significantly reduces the costs associated with high-performance program development. Bohrium provides the mechanics to couple a programming language or library with an architecture-specific implementation seamlessly. These mechanics cover two conceptually different areas: programming language bridges (subsection 2.2) and runtime components (subsection 2.3). These two areas are bound together by the program representation, vector bytecode (subsection 2.1).

2.1. Vector Bytecode

A vital part of Bohrium is the Vector Bytecode that constitutes the link between the high-level user language and the low-level execution engine. The bytecode is designed with the declarative vector programming model in mind, where the bytecode instructions operate on input and output vectors. To avoid excessive memory copying, the vectors can also be shaped into multi-dimensional vectors. These reshaped vector views are then not necessarily comprised of elements that are contiguous in memory. Each dimension comprises a stride and size, such that any regularly shaped subset of the underlying data can be accessed.

[Figure 4 diagram: an array descriptor with the fields base, type (float64), ndim (3), start (0), shape (2, 2, 2), stride (7, 3, 1), and a data pointer, shown next to the corresponding projection of a flat buffer onto the inner, middle, and outer dimensions of the resulting 3d-array.]

Figure 4: Descriptor for n-dimensional vector and corresponding interpretation

We have chosen to focus on a simple, yet flexible, data structure that allows us to express any regularly distributed vectors. Figure 4 shows how the shape is implemented and how the data is projected. The aim is to have a vector bytecode that supports data parallelism implicitly and thus makes it easy for the bridge to translate the user language into the bytecode efficiently. Additionally, the design enables the VE to exploit data parallelism. In the following we will go through the six types of vector bytecodes in Bohrium.

Element-wise bytecodes perform a unary or binary operation on all vector elements. Bohrium currently supports 53 element-wise operations, e.g. addition, multiplication, square root, logical and, bitwise and, equal and less than. For element-wise operations, we only allow data overlap between the input and the output vectors if the access pattern is the same, which, combined with the fact that they are all stateless, makes it straightforward to execute them in parallel.

Reduction bytecodes reduce an input dimension using a binary operator. Again, we do not allow data overlap between the input and the output vectors, and the operator must be associative¹. Bohrium currently supports 10 reductions, e.g. addition, multiplication and minimum. Even though none of them are stateless, the reductions are all straightforward to execute in parallel because of the non-overlap and associativity properties.

Scan bytecodes accumulate an input dimension using a binary operation. Again, we do not allow data overlap between the input and the output vectors, and the operator must be associative². Bohrium currently supports 10 scans, e.g. addition, multiplication and minimum.

¹ Mathematical associativity; we allow non-associativity because of floating-point approximations.

Gather/Scatter bytecodes perform indexed reads and writes.

Data Management bytecodes determine the data ownership of vectors and consist of three different bytecodes. The SYNC bytecode instructs a child component to place the vector data in the address space of its parent component. The FREE bytecode instructs a child component to deallocate the data of a given vector in the global address space. Finally, the DISCARD operator instructs a child component to deallocate any meta-data associated with a given vector, and signals that any local copy of the data is now invalid. These three bytecodes enable lazy allocation, where the actual vector data allocation is delayed until it is used. Often vectors are created with a generator (e.g. random, constants) or with no data (e.g. temporary), which may exist on the computing device exclusively. Thus, lazy allocation may save several memory allocations and copies.

Extension methods. The above five types of bytecode make up the bulk of a Bohrium execution. However, not all algorithms may be efficiently implemented in this way. In order to handle operations that would otherwise be inefficient or even impossible, we introduce the sixth type of bytecode: extension methods. We impose no restrictions on this generic operation; the extension writer has total freedom. However, Bohrium does not guarantee that all components support the operation. Initially, the user registers the extension method with paths to all component-specific implementations of the operation. The user then receives a new handle for this extension method and may use it subsequently as a vector bytecode. Matrix multiplication and FFT are examples of extension methods that are obviously needed. For matrix multiplication, a CPU-specific implementation could simply call a native BLAS library and a Cluster-specific implementation could call the ScaLAPACK library[16].
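For exposition, a hedged C sketch of an array descriptor along the lines of figure 4 and the stride-based addressing it enables; the field and type names below are illustrative, not Bohrium's actual definitions:

    #include <stdint.h>

    #define MAX_NDIM 16

    typedef struct {
        void    *base;              /* underlying data buffer */
        int      type;              /* element type, e.g. float64 */
        int64_t  ndim;              /* rank, e.g. 3 */
        int64_t  start;             /* offset (in elements) into the buffer */
        int64_t  shape[MAX_NDIM];   /* extent per dimension, e.g. {2, 2, 2} */
        int64_t  stride[MAX_NDIM];  /* elements to skip per step, e.g. {7, 3, 1} */
    } view_sketch;

    /* Element index of a coordinate within the buffer: start plus the sum of
       stride[r] * coord[r] over all dimensions r. */
    static inline int64_t element_index(const view_sketch *v, const int64_t *coord)
    {
        int64_t idx = v->start;
        for (int64_t r = 0; r < v->ndim; ++r)
            idx += v->stride[r] * coord[r];
        return idx;
    }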

2.2. Language Bridges

Language bridges are responsible for providing convenient high-level abstractions within a given language and mapping these abstractions to Vector Bytecode for processing by the Bohrium runtime.

Currently Bohrium has a bridge for Python via npbackend[17], which provides a transparent mapping of Python/NumPy array operations. The CIL-based languages are supported via the NumCIL[18][19] library. A C-interface exists but is not intended for the application programmer; it is instead provided as a building block for implementing language bridges. C++ is supported via the BXX library, which provides a multidimensional array that can be utilized conveniently via operator overloads and a suite of math and linear algebra functions.

² Mathematical associativity; we allow non-associativity because of floating-point approximations.

 1  ADD        t1, center, north
 2  ADD        t2, t1, south
 3  FREE       t1
 4  DISCARD    t1
 5  ADD        t3, t2, east
 6  FREE       t2
 7  DISCARD    t2
 8  ADD        t4, t3, west
 9  FREE       t3
10  DISCARD    t3
11  ...
12  ...
13  MUL        tmp, t4, 0.2
14  FREE       t4
15  DISCARD    t4
16  MINUS      t5, tmp, center
17  ABS        t6, t5
18  FREE       t5
19  DISCARD    t5
20  ADD_REDUCE t7, t6
21  FREE       t6
22  ...
23  ...
24  DISCARD    t6
25  ADD_REDUCE delta, t7
26  FREE       t7
27  DISCARD    t7
28  COPY       center, tmp
29  FREE       tmp
30  DISCARD    tmp
31  SYNC       delta

Figure 5: Bytecode generated in each iteration of the Python/NumPy implementation of the heat equation solver (Listing 1). Note that the SYNC instruction at line 31 transfers the scalar delta from the Bohrium address space to the NumPy address space in order for the Python interpreter to evaluate the while condition (Listing 1, line 9).

Figure 5 illustrates the list of vector bytecode that the NumPy Bridge will generate when executing one of the iterations in the Python/NumPy implementation of the heat equation solver (Listing 1). The example demonstrates the nearly one-to-one mapping from the NumPy vector operations to the Bohrium vector bytecode. The code generates seven temporary arrays (t1,...,t7) that are not specified in the code explicitly but are a result of how Python interprets the code. In a regular NumPy execution, the seven temporary arrays translate into seven memory allocations and de-allocations, thus imposing an extra overhead.

2.3. Runtime Components

Regardless of the language bridge used, the Bohrium runtime processes array operations based on its stack configuration. The stack configuration consists of an ordering of Bohrium runtime components. A language bridge sends Vector Bytecode to the topmost component and it trickles down the stack through the different components. Components are labeled by their role, that is, by the task they are intended to perform. The most essential component types are:

Filter components are used for transforming Vector Bytecode. A simple runtime-value optimizing filter could for example map pow to mul and sqrt for suitable values of the exponent.

Vector Engine Manager (VEM) components are responsible for one memory address space in the hardware configuration. The current version of Bohrium implements two VEMs: the Node-VEM that handles the local address space of a single machine and the Cluster-VEM that handles the global distributed address space of a compute cluster. The Node-VEM is very simple since the hardware already provides a shared memory address space; hence, the Node-VEM can simply forward all instructions from its parent to its child components. The Cluster-VEM, on the other hand, has to distribute all vectors between Node-VEMs in the cluster.

Vector Engine (VE) components have to execute the instructions they receive in an order that complies with the dependencies between instructions.

[Figure 6 diagram (CAPE Program Representation): a Bohrium Vector Bytecode program is mapped to the CAPE program representation, consisting of a program, a symbol table, and a sequence of blocks.]

Figure 6: Mapping Bohrium Vector Bytecode to CAPE TAC program representation.

Furthermore, they have to ensure that their parent component has access to the results as governed by the Data Management bytecodes. Vector engine components are thus responsible for implementing the low-level details for a given hardware target.

Bohrium consists of a number of components that communicate by exchanging Vector Bytecode. Because they all use the same communication protocol they can be combined in various ways. The idea is to make it possible to combine components in a setup that matches a specific execution environment. Re-targeting an application using Bohrium therefore does not require changing application code; the runtime stack-configuration is simply changed to match the execution environment. Changing the stack configuration can be done by modifying the default configuration via the Bohrium configuration file. Re-targeting can alternatively be done by selecting a predefined configuration via the environment variable BH_STACK.

3. CAPE

CAPE is, in Bohrium terms, a vector engine. That is, it is the component in the Bohrium stack which is responsible for managing the low-level concerns of mapping array operations, represented as vector bytecode, to a specific architecture.

3.1. Program Representation

The first task for the CAPE runtime, when executing a vector bytecode program, is to run two passes over the bytecode instructions to construct auxiliary information about instructions and operands, and to organize instructions into executable blocks.

The first pass constructs an array of instructions represented as CAPE three-address-code (tac / kp_tac) and an array of CAPE operands (kp_operand). We refer to the former as the program and the latter as the symbol table. A kp_tac is equivalent to a Bohrium bytecode except that it uses symbolic names for operands, specifically numerical values that can be used to look up the operand array descriptor in the symbol table.

[Figure 7 diagram (CAPE Compiler Machinery): a Block is passed to the TAC-to-C code generator, which emits C code; a standard C compiler and linker produce a shared object; object storage loads the shared object and returns a pointer to the block function.]

Figure 7: JIT-compiler machinery

The array-descriptor, kp_operand, is similar to the data structure described in figure 4 with an additional attribute named layout. A kp_operand describes how to access an underlying buffer (kp_buffer).

The second pass constructs CAPE blocks (kp_block). A block consists of a subset of tacs in the program, an iteration space (kp_iterspace), a unique identifier, and a hashed representation of the identifier. A block additionally maintains a local scope; that is, the symbol table is regarded as globally scoped operands, and the operands used by instructions in the block have an additional symbolic value which is "local" to the block. Blocks form the basic unit for scheduling and executing array operations. How the JIT-compiler and execution model use this program representation is described in the following sections.
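To make the representation concrete, a hedged sketch of what the tac and block structures could look like; the fields and names below are guesses for exposition, not the actual kp_tac/kp_block definitions:

    #include <stdint.h>
    #include <stddef.h>

    /* A three-address-code instruction: an opcode plus symbolic operand indices
       referring into the globally scoped symbol table (illustrative only). */
    typedef struct {
        int     op;                 /* operation, e.g. ADD, MUL, ADD_REDUCE */
        int64_t out;                /* symbol-table index of the output operand */
        int64_t in1, in2;           /* symbol-table indices of the input operands */
    } tac_sketch;

    /* A block: the subset of tacs fused together, their shared iteration space,
       and an identifier used for JIT-cache lookups (illustrative only). */
    typedef struct {
        tac_sketch *tacs;           /* instructions belonging to this block */
        size_t      ntacs;
        int64_t     nelem;          /* iteration-space size, standing in for kp_iterspace */
        uint64_t    ident;          /* unique block identifier */
        char        hash[33];       /* hashed representation of the identifier */
    } block_sketch;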

3.2. JIT-Compiler

The JIT-compiler handles the construction of executable code for a given block. The machinery to facilitate this purpose is illustrated in figure 7. Given a block, it returns a block function. To do so it generates C source code using a TAC-to-C code generator (Section 3.2.1). The generated C source code is compiled using a standard C compiler & linker to produce a shared object containing the block function. Object storage (Section 3.2.2) dynamically loads the object and returns a pointer to the block function.

3.2.1. Code Generator

CAPE is C-targeting; the meaning of this phrasing is that it generates C source code, specifically C99. It is also meant to convey that CAPE could instrument any hardware target that is addressable with C99 and a standard compiler.

Other choices for code targets include CUDA C and translation with the nvcc CUDA compiler driver, thereby also limiting hardware targets to CUDA architectures. Alternatively, by targeting OpenCL C and letting the OpenCL compiler driver translate generated code, one could provide support for a wider range of hardware targets using the same programming model for generated code.

 1  void KP_IDENTHASH( kp_buffer    ** buffers,
 2                     kp_operand   ** operands,
 3                     kp_iterspace *  iterspace )
 4  {
 5      // Buffers (data pointers)
 6      double * buf0_data = buffers[0]->data;
 7      ...
 8
 9      // Operands (strides and buffer offset)
10      const int64_t opd0_start = operands[0]->start;
11
12      // Iterspace (shape, layout, #elements)
13      const int64_t iterspace_nelem = iterspace->nelem;
14      ...
15
16      // Parallel section - prequel
17
18      // Parallel section - entry
19      {
20          // - operand to buffer mapping
21          double * restrict opd0 = buf0_data + opd0_start;
22          double opd1;
23          // - work distribution
24          // - initialize accumulator variables
25
26          // Parallel section - loop constructs
27          {
28              ... array operations ...
29
30          }   // Parallel section - exit
31
32      }
33      // Parallel section - sequel
34  }

Figure 8: C-like pseudocode skeleton for block functions.

Using OpenCL is indeed enticing, and initial prototyping experimented with OpenCL, but it was discarded primarily for matters of practical concern, although also for performance considerations. Matters such as a series of oddities with different OpenCL implementations led to significant compilation overhead, and brittle drivers caused unstable and curious behavior. A performance consideration was the lack of control over thread-locality: there was no means to perform thread binding, which, as illustrated earlier, is essential to performance on NUMA architectures. This was prior to the introduction of Device Fission in OpenCL 1.2. The oddities of OpenCL implementations outweighed the convenience of a unified programming model, which led to the choice of generating C99 and using OpenMP, Intel LEO, and OpenACC directives for parallelization and HWLOC to control locality.

Figure 8 illustrates the general structure and signature of a block function. A block function only performs compute-oriented array operations and is in this sense somewhat similar to a CUDA or OpenCL kernel. The CAPE runtime handles data management.

The first part of a block function "unpacks" arguments, "unpacking" in the sense that the struct attributes of buffers, operands, and the iteration space are declared as "flat" local variables. Examples of some of the unpacking are provided in figure 8, lines 6, 10, and 13. Exactly which attributes are unpacked depends on the requirements of the code in the parallel section. The motivation for this unpacking is to reduce the amount of data transferred into the parallel section.

Line 21 in figure 8 illustrates a similar concern. Here the mapping of an operand opd0 to its supporting buffer buf0 with offset start is declared and annotated with restrict when applicable, such that the C compiler can enforce strict aliasing rules and reduce alias checking at runtime.

The actual computations on array elements are performed in the loop constructs of the parallel section. The multi-dimensional array descriptor used by operands and described in figure 4 drives these loop constructs. The descriptor uses the following coordinate-based formula to calculate the address of an element:

    addr = base_addr + Σ_{r=0}^{rank-1} stride[r] × coord[r]

A loop construct implementing this formula is quite inefficient, as it must use memory for maintaining the coordinate and additionally perform an excessive amount of calculations per element.

To circumvent this issue, the code generator emits specialized loop constructs. Loop constructs are specialized using the KP_LAYOUT values for operands and the iteration space, as well as the dimensionality/rank. The most significant values are: KP_SCALAR, KP_CONTRACTABLE, KP_CONTIGUOUS, KP_CONSECUTIVE, and KP_STRIDED.

The KP_SCALAR layout hints that the operand is either an actual scalar or a broadcasted/flooded scalar. Broadcasted scalars occur in an array expression such as a + 42.0, where a is an array. In that case the scalar 42.0 would be elevated to an array described with the same shape as a and stride = 0 in each dimension. By using the layout information the general formula can be entirely discarded.

The KP_CONTRACTABLE layout is one of the most significant drivers for emitting efficient code. It signifies that the operand is only used within the block, that is, to store intermediate values. The code generator only needs to emit a placeholder variable such as the one on line 22 in figure 8.

The KP_CONTIGUOUS and KP_CONSECUTIVE layouts are also essential means of avoiding the implementation of the general formula. Both signify that the following linear formula can be used instead:

    addr = base_addr + k × element_number

where k is a constant, k = 1 in the case of KP_CONTIGUOUS and k > 1 for KP_CONSECUTIVE. This information allows the code generator to translate into code that increments operand data pointers using either opd0++ or opd0 += constant_stride.

There is, however, a caveat to exploiting the information in this way, as it forces a specific ordering on the traversal of elements. If the operations performed on the arrays are element-wise, or the arrays are one-dimensional, then it is safe. Conversely, if the arrays have rank larger than one, a partial reduction or a scan is performed as one of the operations in the block, and one of these operations is not performed in the same order as the constant increment imposes, then the optimization must be discarded.

The last layout value, KP_STRIDED, is the least attractive, as it signifies that the stride information from the array descriptor must be used. However, a last effort is attempted to avoid the general formula. That is, a loop nest is added for each dimension, and a step-value for each dimension is precalculated. The step-value is equal to the stride in the given dimension minus the step-values of the loop nest inside it. Using the step-value, a step in the current loop nest can increment the operand with code such as opd0 += step_value.

The specialization of the pointer arithmetic and loop constructs serves the purpose of avoiding the computationally and memory-expensive general formula. Even more importantly, they are generated to serve the specialized work distribution in the entry of the parallel section. The specialized work distribution targets an even distribution of work between NUMA nodes, to optimize for memory throughput. The flattening of loop constructs and the division of the strided loop constructs minimize remote access.

The current implementation is thus primarily focused on emitting efficient code for multi-core NUMA architectures with regards to work distribution and loop constructs. However, the careful unpacking of meta-data and the restricted transfer of meta-data to parallel sections are also performed with accelerators in mind, as parallel sections on these architectures mean transferring not only the buffer data, as handled by the runtime, but also the meta-data to the device. The area left for future work is thus the specialization of loop constructs.

The result of the code generator is a string in memory; this string is either piped to the C compiler or stored in the filesystem using the naming convention KP_IDENTHASH.c, where IDENTHASH is an ASCII representation of the block identifier.
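As a sketch of the kind of specialization described above (illustrative only, not CAPE's generated code; the names and the element-wise operation are assumptions), the contiguous and strided cases differ roughly as follows:

    #include <stdint.h>

    /* Contiguous (KP_CONTIGUOUS-like) case: a single pointer walk suffices. */
    void add_contiguous(double *restrict out, const double *restrict in, int64_t nelem)
    {
        for (int64_t i = 0; i < nelem; ++i)
            *out++ = *in++ + 1.0;               /* opd0++ style increment */
    }

    /* Strided (KP_STRIDED-like) case for rank 2: one loop per dimension with a
       precomputed step per dimension (stride minus the steps of the inner nest). */
    void add_strided_2d(double *restrict out, const double *restrict in,
                        const int64_t shape[2], const int64_t stride[2])
    {
        const int64_t inner_step = stride[1];
        const int64_t outer_step = stride[0] - shape[1] * stride[1];
        for (int64_t i = 0; i < shape[0]; ++i) {
            for (int64_t j = 0; j < shape[1]; ++j) {
                *out = *in + 1.0;
                out += inner_step;
                in  += inner_step;
            }
            out += outer_step;                  /* advance to the next row */
            in  += outer_step;
        }
    }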

3.2.2. Object Storage

Generating source code, invoking the C compiler, and loading the generated code consumes about 70-80 milliseconds of wall-clock time for each block function. Executing the block function can take anywhere from a few microseconds to several seconds of wall-clock time. These numbers vary based on a long list of parameters including C compiler flags, the specific compiler used, complexity of the kernel function, I/O throughput of the system, and the number of elements in the arrays when executing the block function.

A worst-case scenario for a JIT-compilation approach is that the time spent on JIT-compilation and execution exceeds what would otherwise be spent on executing statically compiled code. In the literature and in production systems, different policies of 'if' and 'when' to JIT-compile are used. CAPE applies an aggressive policy; to minimize compilation overhead a caching object storage is used instead. The cache persists to the filesystem and is reused and shared with other processes via the filesystem. The CAPE policy is to always compile, unless the block function already exists in the cache.

process(bytecode)
{
    program, symbol_table, blocks[] = map(
        bytecode
    );

    thread_manager.bind();

    for block in blocks {
        block_function = jit_compiler(block);
        memory_manager(block);    // Allocation
        block_function(block);    // Execution
        memory_manager(block);    // De-allocation
    }
}

Figure 9: Pseudo-code illustration of the bytecode processing performed by the CAPE runtime.

The design of the CAPE compiler allows for compiling block functions ahead of time in addition to just in time, thereby allowing execution to overlap the compilation phase. However, in the current implementation the runtime does not exploit this, but runs in a simpler lookup/compile/execute mode.

Objects are stored in the filesystem using two naming conventions. The first names objects as KP_IDENTHASH.ext, where IDENTHASH is an ASCII representation of the block identifier and .ext is the extension name used for shared libraries on the given platform. This naming convention is used when objects are created at runtime by the JIT-compiler. The second convention names objects as KP_COLLECTION.ext, which is accompanied by another file named KP_COLLECTION.index. This naming convention is used for pre-compilation, and the COLLECTION object therefore contains multiple block functions, the identifiers of which are stored in the .index file.

Since object storage is shared via the filesystem, care is taken to protect against the races that can occur when multiple processes are running or when using a shared filesystem. Additionally, care is taken to avoid sharing objects compiled for different hardware targets.
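A minimal sketch of the lookup/compile/load path implied by the naming convention above; the cache directory, the compiler invocation, and the symbol naming are assumptions for illustration, not CAPE's actual implementation:

    #include <dlfcn.h>
    #include <stdio.h>
    #include <stdlib.h>

    typedef void (*block_func_t)(void);

    /* Reuse a cached shared object named after the block's identifier hash if it
       exists on disk; otherwise compile the generated C file for it first. */
    block_func_t get_block_function(const char *identhash)
    {
        char object_path[256], command[512], symbol[256];
        snprintf(object_path, sizeof(object_path), "./cache/KP_%s.so", identhash);

        void *handle = dlopen(object_path, RTLD_NOW);
        if (handle == NULL) {                       /* cache miss: compile the block */
            snprintf(command, sizeof(command),
                     "cc -O3 -march=native -std=c99 -fopenmp -shared -fPIC "
                     "./cache/KP_%s.c -o %s", identhash, object_path);
            if (system(command) != 0)
                return NULL;
            handle = dlopen(object_path, RTLD_NOW);
            if (handle == NULL)
                return NULL;
        }
        snprintf(symbol, sizeof(symbol), "KP_%s", identhash);
        return (block_func_t)dlsym(handle, symbol); /* pointer to the block function */
    }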

3.3. Execution Model

As Section 3.2 describes, block functions execute the compute-oriented array operations, while data management operations, integration with other runtimes and libraries, and the JIT-compilation machinery are the responsibility of the CAPE runtime. Figure 9 demonstrates the general execution model used by CAPE.

Thread Manager. Figure 2 demonstrated the need for managing thread affinity to avoid remote access on NUMA architectures. In the example from the figure, thread affinity is controlled by instrumenting the GNU OpenMP runtime via the environment variable GOMP_CPU_AFFINITY. Such an approach is not desirable for the CAPE runtime for several reasons.

First, the concerns of the hardware architecture should be hidden from the end-user, be it a programmer/scientist doing exploratory scientific computing or a user running the program written by the scientist. The JIT-compiler machinery can use any standard C compiler and, as a consequence thereof, different OpenMP runtimes. Different OpenMP runtimes have different environment variables for managing thread affinity, such as KMP_AFFINITY for the Intel compiler and MP_BIND/MP_BLIST for the Portland Group compiler. Delegating control over thread affinity to the end-user would thus expose not only the hardware architecture but also the CAPE compiler machinery.

Secondly, the applied thread binding has performance consequences for the loop constructs in block functions. It is thus essential for performance that the CAPE runtime and the code generator complement each other in the efforts of load distribution of the parallel sections and binding of threads.

The CAPE runtime, therefore, implements thread binding. The HWLOC[20] library is used to traverse the topology of processing units on the system and to bind threads. The thread manager is invoked prior to processing blocks, as figure 9 illustrates. Threads are bound using a single affinity policy. The code generator is aware of the thread policy and generates code optimized for the policy. An area left for future work is the exploration of dynamically specializing the affinity policy and re-binding threads for a given block.

Memory Manager. Allocation of buffers is performed by a memory manager in the CAPE runtime. The memory manager integrates an approach from previous work[21], which investigated an expansion of the memory-allocation routines of the NumPy library. The general idea is to apply a buffer-reuse scheme for large memory allocations. The scheme, named victim-caching, was shown to provide a speedup of up to 2.29, on average 1.32, across a benchmark suite of 15 applications. Most importantly, at no time did the victim-cache scheme perform worse than unmodified NumPy.

The results of previous work motivated the implementation of an equivalent victim-cache scheme for allocation of buffers in CAPE. In addition to the victim-cache scheme, the memory manager allocates page-aligned memory via mmap on the host. The memory manager is responsible for all aspects of allocating buffers for array-operation operands. This also includes the allocation of buffers on accelerator devices; it is left for future work to describe this part of the memory manager. The memory manager is invoked before and after the execution of a block function, as figure 9 illustrates.
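A minimal sketch of a victim-cache style buffer allocator, assuming a small fixed-size cache keyed by exact allocation size; this is illustrative and simpler than CAPE's memory manager, which additionally restricts the scheme to large allocations and uses page-aligned mmap memory:

    #include <stdlib.h>

    #define VCACHE_SLOTS 16

    /* Hypothetical victim cache: freed buffers are parked here and handed back to
       later allocations of the same size instead of going through malloc/free. */
    static struct { void *ptr; size_t nbytes; } vcache[VCACHE_SLOTS];

    void *vc_alloc(size_t nbytes)
    {
        for (int i = 0; i < VCACHE_SLOTS; ++i) {
            if (vcache[i].ptr && vcache[i].nbytes == nbytes) {
                void *hit = vcache[i].ptr;     /* reuse a previously freed buffer */
                vcache[i].ptr = NULL;
                return hit;
            }
        }
        return malloc(nbytes);                 /* miss: fall back to the system allocator */
    }

    void vc_free(void *ptr, size_t nbytes)
    {
        for (int i = 0; i < VCACHE_SLOTS; ++i) {
            if (vcache[i].ptr == NULL) {       /* park the buffer for later reuse */
                vcache[i].ptr = ptr;
                vcache[i].nbytes = nbytes;
                return;
            }
        }
        free(ptr);                             /* cache full: release to the system */
    }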

4. Performance Study

This section describes various performance aspects of CAPE. Section 4.1 provides numbers for a set of Python/NumPy benchmarks. These numbers show relative speedup, using the regular NumPy implementation as a baseline, for the same applications but where the NumPy array operations are extracted using npbackend/Bohrium and CAPE is used for processing them.

Machine:
  Processor:  AMD Opteron 6272
  Clock:      2.1 GHz
  L3 Cache:   16MB
  Memory:     128GB DDR3
  Network:    Gigabit Ethernet
  Compiler:   GCC 4.8.4
  Software:   Ubuntu 14.04, Linux 3.13, Python 2.7.6, NumPy 1.8.2

Table 1: Machine Specifications

The section will thus show the speedup obtainable by the Bohrium/CAPE approach. An additional set of numbers is provided in the section, namely using the Bohrium/CAPE approach but with the optimizations for array-operation composition disabled. Those numbers will show what is gained by the Bohrium/CAPE approach in contrast to simply implementing a parallel version of the regular NumPy library.

Section 4.2 provides relative speedup for a subset of the benchmarks in 4.1. The baseline used for these numbers is a serial implementation in C. For comparison, parallel implementations in C/C++ are also provided. These numbers thus provide data for a comparative study of how far from, or how close to, handwritten low-level implementations the Bohrium/CAPE approach is.

Environment. The benchmark implementations are part of an open-source benchmark tool and suite named Benchpress. The source code for the implementations and the Benchpress tool is available online at http://benchpress.readthedocs.org/. For reproducibility, the exact version used can be obtained from the source code repository at https://github.com/bh107/benchpress.git, revision 0aa2942. The implementation of CAPE, BXX, Bohrium, and npbackend is similarly available at https://github.com/bh107/bohrium.git; the numbers produced used revision b4d3586.

Source code including the raw numbers from the performance study is also available at the University of Copenhagen Electronic Research Data Archive as a public archive with id YXJjaGl2ZS0xSWhQSmU=, http://www.erda.dk/public/archives/YXJjaGl2ZS0xSWhQSmU=/published-archive.html.

Each benchmark was executed five times and the elapsed wall-clock time was recorded. The best and the worst result, that is, the one consuming the least and the one consuming the most time, were discarded, and an average of the remaining three samples was used for computing relative numbers. Table 1 provides information about the machine on which the benchmarks were executed.

4.1. Python/NumPy

Table 2 provides numbers for the performance gained from the Bohrium/CAPE approach. The numbers show speedup using the regular NumPy library as a baseline; the raw samples of elapsed wall-clock time can be inspected in the file numpy.txt in the frozen data archive.

                           CAPE-AC          CAPE
Threads                     1      32       1       32
Synthetic Inplace Update    1.4     8.6    14.1    236.9
Synthetic Stream Ones       1.5     9.7    10.0    137.3
Synthetic Stream Range      0.9     8.0     1.8     45.5
Synthetic Stream Random     1.0    18.0     1.0     29.8
1D Stencil                  0.7     8.5     1.1     21.8
2D Stencil                  0.6     8.8     3.6     53.9
3D Stencil                  0.8    11.0     1.8     48.6
27 Point Stencil            1.0     5.5     1.2     13.6
Jacobi                      1.4    10.3     3.9     76.2
Gauss Elimination           0.7     1.6     2.1      3.8
LU Factorization            0.7     1.6     1.9      3.4
Leibnitz PI                 1.0     6.1     2.2     48.3
Monte Carlo PI              1.0    17.7     1.0     30.1
Matrix Multiplication       0.9    10.1     2.1     47.3
Rosenbrock                  1.5     9.9    12.5    169.4
Black Scholes               3.4    27.4     6.0     84.9
Game of Life v1             1.2     8.6     1.5     32.2
Game of Life v2             1.2     8.0     2.2     39.6
Heat Equation               1.3     9.1     4.4     41.0
Lattice Boltzmann 3D        0.8     4.0     0.8      5.3
NBody                       4.7    17.2     5.1     23.6
NBody Nice                  1.2     7.3     0.4      8.1
SOR                         1.4     8.3     1.9     20.1
Shallow Water               1.5     9.2     6.9     62.2
Water-Ice Simulation        0.8     5.7     1.6     10.9
kNN Naive 1                 0.7    10.2     1.0     26.1

Table 2: Benchmark results, regular NumPy library used as baseline.

Two different configurations of CAPE are used. The label CAPE denotes the default configuration. In the default configuration all safe optimizations are applied. This includes the victim-cache buffer allocation scheme, bytecode transformations, and, most importantly, the array-operation composition and array-contraction optimization. The label CAPE-AC denotes a modification of the default configuration; specifically, the array-operation composition and array-contraction optimizations are disabled. The table additionally shows numbers for these two configurations executing using a single thread and 32 threads.

Discussion. On the majority of benchmarks, CAPE can manage load distribution for parallelization, avoid materialization of intermediate results, and apply runtime-value optimizations such as transforming operations into equivalent, but more efficient to compute, expressions. CAPE is thereby able to deliver considerable performance improvements over the regular NumPy library.

A key contributor to the obtained performance improvement is the ability of CAPE to avoid materialization of arrays for intermediate results. One can compare which is most efficient: array-contraction without parallelization (CAPE with a single thread) or parallelization without array-contraction (CAPE-AC with 32 threads). If only a single optimization is applied, then parallelization is for the most part favorable when considering elapsed wall-clock time. However, for a few benchmarks a single thread is able to outperform 32 threads thanks to array-contraction.

Only considering parallelization, without array-contraction, severely limits scalability, as the materialization of intermediate results makes any benchmark inherently memory-bound. The symbiosis between parallelization and array-contraction is what enables efficient utilization of the hardware in these benchmarks. As array-contraction reduces pressure on memory bandwidth, scalability improves.

Array-operation composition and array-contraction are key drivers for exploiting locality and reducing memory-bandwidth pressure, and they consistently yield improvements in throughput. The last statement is, however, contradicted by the NBody Nice benchmark. The exact cause of the performance penalty is left for future work; current exploration has revealed that using certain commercial compilers does not exhibit the same behavior. Indicators seem to point toward runtime pointer alias checks, but experiments are currently inconclusive.

4.2. C/C++

Table 3 provides numbers for the performance gained from the Bohrium/CAPE approach in relation to handwritten implementations in C/C++. The numbers show speedup using a serial implementation of each benchmark in C as a baseline. The raw samples of elapsed wall-clock time can be inspected in the file c_cpp.txt in the frozen data archive. Numbers are provided for two different Bohrium language bridges, Python/NumPy and C++/BXX, using the default configuration of CAPE.

                     C++/OpenMP     NumPy/CAPE     C++/CAPE
Threads               1     32       1     32       1      32
Black Scholes +O      0.9   29.1     4.8   67.3     4.9   118.9
Black Scholes -O      0.9   29.1     1.1   26.1     1.1    31.7
Heat Equation         0.6    7.1     0.7    7.0     0.8     7.6
Leibnitz PI           1.0   22.6     0.6   14.6     0.6    15.2
Monte Carlo PI        1.0   29.8     1.0   27.8     0.9    28.2
Mxmul                 1.0    9.5     1.0   14.9     1.1    15.1
Rosenbrock            1.0   21.0     1.2   15.8     1.2    21.4
Shallow Water         0.5    9.1     0.7    6.6     0.7    10.9

Table 3: Benchmark results, serial C implementation used as baseline.

The C baseline, the parallel C++ implementations, and the CAPE JIT-compiler use the same version of the GNU compiler collection as well as the same optimization flags, specifically: -O3 -march=native. The serial C implementation and the CAPE JIT-compiler additionally used -std=c99, and the C++ implementations used -std=c++11. The CAPE JIT-compiler and the C++ implementations additionally used -fopenmp.

The label C++/OpenMP denotes the handwritten implementation of the benchmark in C++ using OpenMP for parallelization; thread binding was managed via GOMP_CPU_AFFINITY=0-31. The label NumPy/CAPE denotes that the benchmark was implemented in Python/NumPy and array processing delegated to CAPE via the npbackend language bridge. The label C++/CAPE denotes that the benchmark was implemented in C++ and array processing delegated to CAPE via the BXX language bridge. The table additionally shows numbers for these configurations executing using a single thread and 32 threads.

Discussion. The C implementation of the Black Scholes benchmark is compute-bound, as the C++/OpenMP implementation shows by achieving a near perfect linear speedup using 32 threads. The numbers reported by the Python/NumPy and C++ implementations using CAPE show super-linear speedups of 67.3× and 118.9× using 32 threads and a speedup of about 4.8× using a single thread/core. How C++/CAPE and NumPy/CAPE are able to achieve such results requires a good explanation. In the case of the Black Scholes benchmark, the reason is quite simple: CAPE performs a runtime-value optimization of the math-function pow. The function is the largest hot-spot in the benchmark and CAPE transforms it into sequences of multiplications, which are vastly faster to compute. For comparison, results are also provided where the pow optimization is switched off.
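As an illustration of the kind of runtime-value rewrite described above, expressed as a runtime check rather than the code-generation-time transformation CAPE performs (a sketch only; the exponent threshold is an assumption):

    #include <math.h>

    /* If the exponent happens to be a small non-negative integer at runtime,
       evaluate by repeated multiplication instead of calling pow(). */
    static double pow_specialized(double base, double exponent)
    {
        if (exponent >= 0.0 && exponent < 8.0 && exponent == (double)(int)exponent) {
            double result = 1.0;
            for (int i = 0; i < (int)exponent; ++i)
                result *= base;
            return result;
        }
        return pow(base, exponent);   /* fall back to the general math routine */
    }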

Comparing the results from the two language bridges, NumPy and BXX, with 32 threads, it is clear that the BXX language bridge consistently provides better results than the NumPy bridge. Once the language bridge has mapped operations within the host language to Bohrium bytecode, the knowledge of where the bytecode originated from is lost. There is no special treatment of bytecode from different language bridges. However, there is variation in how bytecode is created in the host language.

The measurable difference is related to how CAPE constructs blocks and does array-contraction/array-operation composition. The algorithm used to construct blocks and annotate KP_CONTRACTABLE operands is sensitive to the order of instructions in the bytecode program. The BXX language bridge emits data management bytecodes immediately after the last array-operation bytecode that uses a given array. This is advantageous for the block-construction algorithm. The Python/NumPy bridge emits data management bytecodes when variables exit scope or are for other reasons garbage collected; data management bytecodes might thus not be emitted immediately after the last use. This is disadvantageous for the block-construction algorithm. As a consequence, the NumPy language bridge loses some opportunities for adding instructions to blocks and for annotating operands for contraction (KP_CONTRACTABLE), which leads to worse scalability due to increased memory-bandwidth pressure. Future work will explore means of mitigating such issues.

5. Future Work

An area of continued investigation is the exploitation of runtime values for optimization. In this context, the information used for constructing Blocks is investigated. The current implementation does a single pass over the program and adds tacs to a block as long as adding the tac does not break a set of rules. If the rules are broken, the block is ended, and a new block created. This naive approach is cheap to compute at runtime and fairly good at producing blocks that allow for composing array operations. However, it is vulnerable to the order of instructions. An approach is being investigated which builds a graph of data dependencies between instructions. The data dependencies can then be used to drive the construction of blocks, instead of the instruction order.

For CAPE to fully utilize accelerator architectures such as GPUs and Intel MIC, two main areas need expansion: the code generator and the runtime. The code generator must emit loop constructs with OpenACC directives for GPUs and parallel sections/loop constructs with LEO/OpenMP directives for MICs. Additionally, the CAPE runtime must manage memory on the accelerator device. Currently explored is a runtime extension using a simple alloc, free, push, and pull interface. The interface abstracts the details of the LEO #pragma offload transfer and OpenACC routines such as acc_[create|delete|update_device|update_self]. The extension manages relations between host pointers and device pointers, knows the state of buffers on host and device, and focuses on data persistence on the device to minimize data transfers between host and device.

The performance study revealed two benchmarks for which CAPE did not manage load distribution well, namely Gauss Elimination and LU Factorization. Those benchmarks execute well-known operations from linear algebra for which highly tuned libraries exist. Future work will explore the integration of such libraries without losing opportunities for array operation composition. Bohrium and CAPE already support third-party library integration, although the approach is in an explicit form. That is, the language bridge must send a bytecode extension opcode signaling a mapping between the extension opcode and a library function. Future work in this area will explore transparent detection of bytecode sequences that are equivalent to well-known, highly specialized functions and perform the mapping automatically. This will let CAPE remain responsible for the processing of general array operations and for detecting specific functions, while delegating the responsibility of processing specific functions to a highly tuned library.

6. Related Work

Intel Array Building Blocks[22], a retargetable dynamic compilation framework, is closely related in its goal of closing the gap between high productivity and high performance. Its design is also similar, consisting of dynamic compilation of high-level array operations to low-level code. However, ArBB has been discontinued and replaced with the slightly more explicit and C/C++-only approach of Intel Cilk Plus[23] and Intel Threading Building Blocks.

Other approaches providing a C-based interface include StarPU, which in its design also seeks to address the challenge of performance portability, that is, removing the details of the executing hardware from the programmer. StarPU provides extensions to gcc and focuses on supporting the C-language family and traditional HPC languages. The work in this paper focuses on a transparent approach where the programmer is only provided with the array operations and sequential semantics. Bohrium/CAPE is as such targeted towards applications written in non-traditional HPC languages: Python/NumPy, CIL, and C++. The use of StarPU tasks is somewhat equivalent to a CAPE Block. A key difference is that CAPE Blocks are generated by the runtime based on a sequence of array operations, using different code generators based on the required target. StarPU tasks are instead user-defined, that is, the user writes the different implementations.

A wealth of language extensions for expressing parallelism in C exists, including OpenMP, OpenACC, Intel LEO, StarPU, and StarSS, to name a few. Bohrium and CAPE address a slightly different audience and intent, which is to provide the machinery to support high-level array abstractions with serial semantics which execute efficiently on target hardware. The importance of the CAPE C-interface is for interoperability and ABI-compatibility with other low-level runtimes as well as the integration with existing languages.

Very closely related to this work is the work on Transparent GPU Execution of NumPy Applications[24]. The main difference is the code-generator target and runtime: the work in [24] uses an OpenCL-based code generator and runtime for targeting GPUs, whereas the work on CAPE comes from previous work on Bohrium targeting multi-core CPUs, emitting C99 code using parallel directives and runtimes instead of OpenCL.

7. Conclusion

This paper introduced CAPE, a C-targeting Array Processing Engine, which manages the concerns of optimizing and parallelizing the execution of array operations. CAPE enables automatic mapping of array operations to specific architectures through the language bridges/integrations provided by Bohrium.
A performance study showed super-linear speedup on a set of Python/NumPy benchmarks executed on a 32-core processor with NUMA architecture. The main contributing factor to the obtained speedup is the composition of array-oriented expressions, which enables array contraction, that is, avoiding the materialization of arrays that store intermediate results. The second performance-enhancing feature is the NUMA-aware code generator and runtime, which enables efficient parallelization of array expressions on this architecture.
The performance study also compared the performance of CAPE to seven benchmarks hand-written in C and C++ using OpenMP. Results showed that CAPE, used via the Bohrium C++ library BXX, was able to match the performance of the hand-written parallel C++ implementation on five benchmarks and outperform the hand-written code on two benchmarks. Additionally, results showed that CAPE used via the Python language bridge was able to match the performance of the hand-written parallel C++ implementation on three out of seven benchmarks. It underperformed by 30-50 percent on two benchmarks and outperformed the hand-written code on two benchmarks.

References

[1] L. Dagum, R. Menon, OpenMP: an industry standard API for shared-memory programming, Computational Science & Engineering, IEEE 5 (1) (1998) 46-55.
[2] C. Newburn, S. Dmitriev, R. Narayanaswamy, J. Wiegert, R. Murty, F. Chinchilla, R. Deodhar, R. McGuire, Offload compiler runtime for the Intel Xeon Phi coprocessor, in: Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International, 2013, pp. 1213-1225. doi:10.1109/IPDPSW.2013.251.
[3] CAPS Enterprise, Cray Inc., NVIDIA, The Portland Group, The OpenACC Application Programming Interface (v2.0), OpenACC-Standard.org (June 2013).

[4] The Khronos Group, SYCL: C++ single-source heterogeneous programming for OpenCL, https://www.khronos.org/sycl (2014).
[5] S. Van Der Walt, S. Colbert, G. Varoquaux, The NumPy array: a structure for efficient numerical computation, Computing in Science & Engineering 13 (2) (2011) 22-30.
[6] Matlab Programming Environment, http://www.mathworks.com/products/matlab/, [Online; accessed March 13th 2015].
[7] R. Ihaka, R. Gentleman, R: a language for data analysis and graphics, Journal of Computational and Graphical Statistics 5 (3) (1996) 299-314.
[8] J. W. Eaton, D. Bateman, S. Hauberg, GNU Octave, Network Theory, 1997.
[9] K. P. Bowman, An Introduction to Programming with IDL: Interactive Data Language, Academic Press, 2006.
[10] F. Perez, B. E. Granger, IPython: a system for interactive scientific computing, Computing in Science & Engineering 9 (3) (2007) 21-29.
[11] E. Jones, T. Oliphant, P. Peterson, SciPy: Open source scientific tools for Python, http://www.scipy.org/.
[12] A. Klöckner, N. Pinto, Y. Lee, B. Catanzaro, P. Ivanov, A. Fasih, PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation, Parallel Computing 38 (3) (2012) 157-174. doi:10.1016/j.parco.2011.09.001.
[13] M. Klemm, J. Enkovaara, pyMIC: A Python offload module for the Intel Xeon Phi coprocessor.
[14] J. D. McCalpin, STREAM: Sustainable memory bandwidth in high performance computers, Tech. rep., University of Virginia, Charlottesville, Virginia, a continually updated technical report (1991-2007). URL http://www.cs.virginia.edu/stream/
[15] M. R. B. Kristensen, S. A. F. Lund, T. Blum, K. Skovhede, B. Vinter, Bohrium: a Virtual Machine Approach to Portable Parallelism, in: Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International, IEEE, 2014, pp. 312-321.
[16] L. S. Blackford, ScaLAPACK, in: Proceedings of the 1996 ACM/IEEE Conference on Supercomputing (CDROM) - Supercomputing '96, 1996, p. 5. doi:10.1145/369028.369038.
[17] M. R. B. Kristensen, S. A. F. Lund, T. Blum, K. Skovhede, Separating NumPy API from Implementation, in: 5th Workshop on Python for High Performance and Scientific Computing (PyHPC'14), 2014.

[18] K. Skovhede, B. Vinter, NumCIL: Numeric operations in the Common Intermediate Language, Journal of Next Generation Information Technology 4 (1).
[19] K. Skovhede, S. A. F. Lund, NumCIL and Bohrium: High productivity and high performance, in: Proceedings of the 11th International Conference on Parallel Processing and Applied Mathematics (PPAM), LNCS, Springer, 2015.
[20] F. Broquedis, J. Clet-Ortega, S. Moreaud, N. Furmento, B. Goglin, G. Mercier, S. Thibault, R. Namyst, hwloc: a Generic Framework for Managing Hardware Affinities in HPC Applications, in: PDP 2010 - The 18th Euromicro International Conference on Parallel, Distributed and Network-Based Computing, Pisa, Italy, 2010. doi:10.1109/PDP.2010.67. URL https://hal.inria.fr/inria-00429889
[21] S. A. F. Lund, K. Skovhede, M. R. B. Kristensen, B. Vinter, Doubling the Performance of Python/NumPy with less than 100 SLOC, in: 4th Workshop on Python for High Performance and Scientific Computing (PyHPC'13), 2013.
[22] Intel Array Building Blocks Virtual Machine, http://software.intel.com/sites/default/files/m/b/6/a/arbb_vm.pdf, [Online; accessed 12 March 2013].
[23] R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, Y. Zhou, Cilk: An efficient multithreaded runtime system, Vol. 30, ACM, 1995.
[24] T. Blum, M. R. B. Kristensen, B. Vinter, Transparent GPU Execution of NumPy Applications, in: Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2014 IEEE 28th International, IEEE, 2014.

6.2 cphVB: A System for Automated Runtime Optimization and Parallelization of Vectorized Applications


cphVB: A System for Automated Runtime Optimization and Parallelization of Vectorized Applications

Mads Ruben Burgdorff Kristensen*†, Simon Andreas Frimann Lund†, Troels Blum†, Brian Vinter†


Abstract—Modern processor architectures, in addition to having still more cores, also require still more consideration to memory-layout in order to run at full capacity. The usefulness of most languages is deprecating as their abstractions, structures or objects are hard to map onto modern processor architectures efficiently. The work in this paper introduces a new abstract machine framework, cphVB, that enables vector oriented high-level programming languages to map onto a broad range of architectures efficiently. The idea is to close the gap between high-level languages and hardware optimized low-level implementations. By translating high-level vector operations into an intermediate vector bytecode, cphVB enables specialized vector engines to efficiently execute the vector operations. The primary success parameters are to maintain a complete abstraction from low-level details and to provide efficient code execution across different, modern, processors. We evaluate the presented design through a setup that targets multi-core CPU architectures. We evaluate the performance of the implementation using Python implementations of well-known algorithms: a Jacobi solver, a kNN search, a shallow water simulation and a synthetic stencil simulation. All demonstrate good performance.

Index Terms—runtime optimization, high-performance, high-productivity

Introduction

Obtaining high performance from today's computing environments requires both a deep and broad working knowledge on hardware architectures, communication paradigms and programming interfaces. Today's computing environments are highly heterogeneous, consisting of a mixture of CPUs, GPUs, FPGAs and DSPs orchestrated in a wealth of architectures and lastly connected in numerous ways.

Utilizing this broad range of architectures manually requires programming specialists and is a very time-consuming task - time and specialization a scientific researcher typically does not have. A high-productivity language that allows rapid prototyping and still enables efficient utilization of a broad range of architectures is clearly preferable. There exist high-productivity languages and libraries that automatically utilize parallel architectures [Kri10], [Dav04], [New11]. They are however still few in numbers and have one problem in common: they are closely coupled to both the front-end, i.e. programming language and IDE, and the back-end, i.e. computing device, which makes them interesting only to the few using the exact combination of front- and back-end.

A tight coupling between front-end technology and back-end presents another problem; the usefulness of the developed program expires as soon as the back-end does. With the rapid development of hardware architectures, the time spent on implementing optimized programs for specific hardware is lost as soon as the hardware product expires.

In this paper, we present a novel approach to the problem of closing the gap between high-productivity languages and parallel architectures, which allows a high degree of modularity and reusability. The approach involves creating a framework, cphVB* (Copenhagen Vector Bytecode). cphVB defines a clear and easy to understand intermediate bytecode language and provides a runtime environment for executing the bytecode. cphVB also contains a protocol to govern the safe and efficient exchange, creation, and destruction of model data.

cphVB provides a retargetable framework in which the user can write programs utilizing whichever cphVB-supported programming interface they prefer and run the program on their own workstation while doing prototyping, such as testing correctness and functionality of their programs. Users can then deploy exactly the same program in a more powerful execution environment without changing a single line of code and thus effectively solve greater problem sets.

The rest of the paper is organized as follows. In Section Programming Model we describe the programming model supported by cphVB. The section following gives a brief description of Numerical Python, which is the first programming interface that fully supports cphVB. Sections Design and Implementation cover the overall cphVB design and an implementation of it. In Section Performance Study, we conduct an evaluation of the implementation. Finally, in Section Future Work and Conclusion we discuss future work and conclude.

* Corresponding author: [email protected]
† University of Copenhagen
*. Open Source Project - Website: http://cphvb.bitbucket.org.
Copyright © 2012 Mads Ruben Burgdorff Kristensen et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Related Work retargetability aspect of Intel ArBB is represented in cphVB as a plain and simple configuration file that define the cphVB The key motivation for cphVB is to provide a framework runtime environment. Intel ArBB provides a high performance for the utilization of heterogeneous computing systems with library that utilizes a heterogeneous environment and hides the the goal of obtaining high-performance, high-productivity and 3 low-level details behind a vector oriented programming model high-portability (HP ). Systems such as pyOpenCL/pyCUDA similar to cphVB. However, ArBB only provides access to the [Klo09] provides a direct mapping from front-end language to programming model via C++ whereas cphVB is not biased the optimization target. In this case, providing the user with towards any one specific front-end language. direct access to the low-level systems OpenCL [Khr10] and On multiple points cphVB is closely related in functionality CUDA [Nvi10] from the high-level language Python [Ros10]. and goals to the SEJITS [Cat09] project. SEJITS takes a The work in [Klo09] enables the user to write a low-level different approach towards the front-end and programming implementation in a high-productivity language. The goal is model. SEJITS provides a rich set of computational kernels in similar to cphVB – the approach however is entirely different. a high-productivity language such as Python or Ruby. These cphVB provides a means to hide low-level target specific code kernels are then specialized towards an optimality criteria. This behind a programming model and providing a framework and approach has shown to provide performance that at times out- runtime environment to support it. performs even hand-written specialized code towards a given Intel Math Kernel Library [Int08] is in this regard more architecture. Being able to construct computational kernels is comparable to cphVB. Intel MKL is a programming library a core issue in data-parallel programming. providing utilization of multiple targets ranging from a single- The programming model in cphVB does not provide this core CPU to a multi-core shared memory CPU and even to kernel methodology. cphVB has a strong NumPy heritage a cluster of computers all through the same programming which also shows in the programming model. The advantage is API. However, cphVB is not only a programming library it easy adaptability of the cphVB programming model for users is a runtime system providing support for a vector oriented of NumPy, Matlab, Octave and R. The cphVB programming programming model. The programming model is well-known model is not a stranger to computational kernels – cphVB from high-productivity languages such as MATLAB [Mat10], deduce computational kernels at runtime by inspecting the [Rrr11], [Idl00], GNU Octave [Oct97] and Numerical Python vector bytecode generated by the Bridge. (NumPy) [Oli07] to name a few. cphVB provides in this sense a virtual machine optimized cphVB is more closely related to the work described in for execution of vector operations, previous work [And08] was [Gar10], here a compilation framework is provided for exe- based on a complete virtual machine for generic execution cution in a hybrid environment consisting of both CPUs and whereas cphVB provides an optimized subset. GPUs. Their framework uses a Python/NumPy based front-end that uses Python decorators as hints to do selective optimiza- tions. 
cphVB similarly provides a NumPy based front-end and Numerical Python equivalently does selective optimizations. However, cphVB Before describing the design of cphVB, we will briefly uses a slightly less obtrusive approach; program selection go through Numerical Python (NumPy) [Oli07]. Numerical hints are sent from the front-end via the NumPy-bridge. This Python heavily influenced many design decisions in cphVB – approach provides the advantage that any existing NumPy it also uses a vector oriented programming model as cphVB. program can run unaltered and take advantage of cphVB NumPy is a library for numerical operations in Python, without changing a single line of code. Whereas unPython which is implemented in the C programming language. NumPy requires the user to manually modify the source code by provides the programmer with a multidimensional array object applying hints in a manner similar to that of OpenMP [Pas05]. and a whole range of supported array operations. By using This non-obtrusive design at the source level is to the author’s the array operations, NumPy takes advantage of efficient C- knowledge novel. implementations while retaining the high abstraction level of Microsoft Accelerator [Dav04] introduces ParallelArray, Python. which is similar to the utilization of the NumPy arrays in NumPy uses an array syntax that is based on the Python list cphVB but there are strict limitations to the utilization of syntax. The arrays are indexed positionally, 0 through length – ParallelArrays. ParallelArrays does not allow the use of direct 1, where negative indexes is used for indexing in the reversed indexing, which means that the user must copy a ParallelArray order. Like the list syntax in Python, it is possible to index into a conventional array before indexing. cphVB instead multiple elements. All indexing that represents more than one allows indexed operations and additionally supports array- element returns a view of the elements rather than a new copy views, which are array-aliases that provide multiple ways to of the elements. It is this view semantic that makes it possible access the same chunk of allocated memory. Thus, the data to implement a stencil operation as illustrated in Figure 1 and structure in cphVB is highly flexible and provides elegant demonstrated in the code example below. In order to force programming solutions for a broad range of numerical algo- a real array copy rather than a new array reference NumPy rithms. Intel provides a similar approach called Intel Array provides the ”copy” method. Building Blocks (ArBB) [New11] that provides retargetability In the rest of this paper, we define the array-base as the and dynamic compilation. It is thereby possible to utilize originally allocated array that lies contiguously in memory. heterogeneous architectures from within standard C++. The In addition, we will define the array-view as a view of the CPHVB: A SYSTEM FOR AUTOMATED RUNTIME OPTIMIZATION AND PARALLELIZATION OF VECTORIZED APPLICATIONS 3

Fig. 1: Matrix expression of a simple 5-point stencil computation example. See line eight in the code example for the Python expression.

elements in an array-base. An array-view is usually a subset of the elements in the array-base or a re-ordering such as the reverse order of the elements or a combination. 1 center= full[1:-1, 1:-1] 2 up = full[0:-2, 1:-1] 3 down = full[2:, 1:-1] 4 left = full[1:-1, 0:-2] 5 right = full[1:-1, 2:] 6 while epsilon < delta: 7 work[:] = center 8 work += 0.2 * (up+down+left+right) 9 center[:] = work

Target Programming Model To hide the complexities of obtaining high-performance from Fig. 2: cphVB design idea. a heterogeneous environment any given system must provide a meaningful high-level abstraction. This can be realized in the form of domain specific languages, embedded languages, Programming Interface language extensions, libraries, APIs etc. Such an abstraction The programming language or library exposed to the serves two purposes: 1) It must provide meaning for the end- user. cphVB was initially meant as a computational user such that the goal of high-productivity can be met with back-end for the Python library NumPy, but we have satisfaction. 2) It must provide an abstraction that consists of generalized cphVB to potential support all kinds a sufficient amount of information for the system to optimize of languages and libraries. Still, cphVB has design its utilization. decisions that are influenced by NumPy and its cphVB is not biased towards any specific choice of abstrac- representation of vectors/matrices. tion or front-end technology as long as it is compatible with Bridge a vector oriented programming model. This provides means The role of the Bridge is to integrate cphVB into ex- to use cphVB in functional programming languages, provide isting languages and libraries. The Bridge generates a front-end with a strict mathematic notation such as APL the cphVB bytecode that corresponds to the user- [Apl00] or a more relaxed syntax such as MATLAB. code. The vector oriented programming model encourages ex- Vector Engine pressing programs in the form of high-level array operations, The Vector Engine is the architecture-specific imple- e.g. by expressing the addition of two arrays using one high- mentation that executes cphVB bytecode. level function instead of computing each element individually. Vector Engine Manager The NumPy application in the code example above figure 1 The Vector Engine Manager manages data location is a good example of using the vector oriented programming and ownership of vectors. It also manages the distri- model. bution of computing jobs between potentially several Vector Engines, hence the name. An overview of the design can be seen in Figure 2. Design of cphVB The key contribution in this paper is a framework, cphVB, Configuration that support a vector oriented programming model. The idea To make cphVB as flexible a framework as possible, we of cphVB is to provide the mechanics to seamlessly couple a manage the setup of all the components at runtime through programming language or library with an architecture-specific a configuration file. The idea is that the user can change the implementation of vectorized operations. setup of components simply by editing the configuration file cphVB consists of a number of components that communi- before executing the user application. Additionally, the user cate using a simple protocol. Components are allowed to be only has to change the configuration file in order to run the architecture-specific but they are all interchangeable since all application on different systems with different computational uses the same communication protocol. The idea is to make resources. The configuration file uses the ini syntax, an exam- it possible to combine components in a setup that perfectly ple is provided below. match a specific execution environment. cphVB consist of the # Root of the setup following components: [setup] 4 PROC. OF THE 11th PYTHON IN SCIENCE CONF. 
(SCIPY 2012) bridge= numpy Bridge debug= true The Bridge is the bridge between the programming interface, # Bridge for NumPy e.g. Python/NumPy, and the Vector Engine Manager. The [numpy] Bridge is the only component that is specifically implemented type= bridge children= node for the programming interface. In order to add cphVB support to a new language or library, one only has to implement the # Vector Engine Manager for a single machine bridge component. It generates bytecode based on program- [node] type= vem ming interface and sends them to the Vector Engine Manager. impl= libcphvb_vem_node.so children= mcore Vector Engine Manager

# Vector Engine using TLP on shared memory Instead of allowing the front-end to communicate directly with [mcore] the Vector Engine, we introduce a Vector Engine Manager type= ve (VEM) into the design. It is the responsibility of the VEM to impl= libcphvb_ve_mcore.so manage data ownership and distribute bytecode instructions to This example configuration provides a setup for utilizing a several Vector Engines. It is also the ideal place to implement shared memory machine with thread-level-parallelism (TLP) code optimization, which will benefit all Vector Engines. on one machine by instructing the vector engine manager to To facilitate late allocation, and early release of resources, use a single multi-core TLP engine. the VEM handles instantiation and destruction of arrays. At array creation only the meta data is actually created. Bytecode Often arrays are created with structured data (e.g. random, constants), with no data at all (e.g. empty), or as a result of The central part of the communication between all the compo- calculation. In any case it saves, potentially several, memory nents in cphVB is vector bytecode. The goal with the bytecode copies to delay the actual memory allocation. Typically, array language is to be able to express operations on multidi- data will exist on the computing device exclusively. mensional vectors. Taking inspiration from single instruction, In order to minimize data copying we introduce a data multiple data (SIMD) instructions but adding structure to the ownership scheme. It keeps track of which components in data. This, of course, fits very well with the array operations cphVB that needs to access a given array. The goal is to in NumPy but is not bound nor limited to these. allow the system to have several copies of the same data while We would like the bytecode to be a concept that is easy ensuring that they are in synchronization. We base the data to explain and understand. It should have a simple design ownership scheme on two instructions, sync and discard: that is easy to implement. It should be easy and inexpensive Sync to generate and decode. To fulfill these goals we chose is issued by the bridge to request read access to a a design that conceptually is an where data object. This means that when acknowledging a the operands are multidimensional vectors. Furthermore, to sync request, the copy existing in shared memory simplify the design the assembly language should have a one- needs to be the most resent copy. to-one mapping between instruction mnemonics and opcodes. Discard In the basic form, the bytecode instructions are primitive is used to signal that the copy in shared memory has operations on data, e.g. addition, subtraction, multiplication, been updated and all other copies are now invalid. division, square root etc. As an example, let us look at Normally used by the bridge to upgrading a read addition. Conceptually it has the form: access to a write access. add $d, $a, $b The cphVB components follow the following four rules when implementing the data ownership scheme: Where add is the opcode for addition. After execution $d will contain the sum of $a and $b. 1. The Bridge will always ask the Vector Engine The requirement is straightforward: we need an opcode. Manager for access to a given data object. It will The opcode will explicitly identify the operation to perform. send a sync request for read access, followed by a Additionally the opcode will implicitly define the number of release request for write access. 
The Bridge will not operands. Finally, we need some sort of symbolic identifiers keep track of ownership itself. for the operands. Keep in mind that the operands will be 2. A Vector Engine can assume that it has write multidimensional arrays. access to all of the output parameters that are refer- enced in the instructions it receives. Likewise, it can assume read access on all input parameters. Interface 3. A Vector Engine is free to manage its own copies The Vector Engine and the Vector Engine Manager exposes of arrays and implement its own scheme to mini- simple API that consists of the following functions: initial- mize data copying. It just needs to copy modified ization, finalization, registration of a user-defined operation data back to share memory when receiving a sync and execution of a list of bytecodes. Furthermore, the Vector instruction and delete all local copies when receiving Engine Manager exposes a function to define new arrays. a discard instruction. CPHVB: A SYSTEM FOR AUTOMATED RUNTIME OPTIMIZATION AND PARALLELIZATION OF VECTORIZED APPLICATIONS 5

4. The Vector Engine Manager keeps track of array In two circumstances, it is possible for an array to transfer ownership for all its children. The owner of an array from one address space to the other implicitly at runtime. has full (i.e. write) access. When the parent com- 1. When an operation accesses an array ponent of the Vector Engine Manager, normally the in the cphVB address space but it is not Bridge, request access to an array, the Vector Engine possible for the bridge to translate the Manager will forward the request to the relevant operation into cphVB code. In this case, child component. The Vector Engine Manager never the bridge will synchronize and move the accesses the array itself. data to the NumPy address space. For ef- Additionally, the Vector Engine Manager needs the capabil- ficiency no data is actually copied instead ity to handle multiple children components. In order to max- the bridge uses the mremap† function to imize parallelism the Vector Engine Manager can distribute re-map the relevant memory pages. workload and array data between its children components. 2. When an operations access arrays in different address spaces the Bridge will Vector Engine transfer the arrays in the NumPy space to Though the Vector Engine is the most complex component of the cphVB space. Afterwards, the bridge cphVB, it has a very simple and a clearly defined role. It has will translate the operation into bytecode to execute all instructions it receives in a manner that obey the that cphVB can execute. serialization dependencies between instructions. Finally, it has In order to detect direct access to arrays in the cphVB to ensure that the rest of the system has access to the results address space by the user, the original NumPy implementation, as governed by the rules of the sync, release, and discard a Python library or any other external source, the bridge instructions. protects the memory of arrays that are in the cphVB address space using mprotect‡. Because of this memory protection, Implementation of cphVB subsequently accesses to the memory will trigger a segmen- tation fault. The Bridge can then handle this kernel signal by In order to demonstrate our cphVB design we have imple- transferring the array to the NumPy address space and cancel mented a basic cphVB setup. This concretization of cphVB is the segmentation fault. This technique makes it possible for the by no means exhaustive. The setup is targeting the NumPy Bridge to support all valid Python/NumPy application since it library executing on a single machine with multiple CPU- can always fallback to the original NumPy implementation. cores. In this section, we will describe the implementation In order to gather greatest possible information at runtime, of each component in the cphVB setup – the Bridge, the the Bridge will collect a batch of instructions rather than Vector Engine Manager, and the Vector Engine. The cphVB executing one instruction at a time. The Bridge will keep design rules (Sec. Design) govern the interplay between the recording instruction until either the application reaches the components. end of the program or untranslatable NumPy operations forces the Bridge to move an array to the NumPy address space. Bridge When this happens, the Bridge will call the Vector Engine The role of the Bridge is to introduce cphVB into an already Manager to execute all instructions recorded in the batch. existing project. 
In this specific case NumPy, but could just as well be R or any other language/tool that works primarily on Vector Engine Manager vectorizable operations on large data objects. The Vector Engine Manager (VEM) in our setup is very simple It is the responsibility of the Bridge to generate cphVB because it only has to handle one Vector Engine thus all instructions on basis of the Python program that is being run. operations go to the same Vector Engine. Still, the VEM The NumPy Bridge is an extension of NumPy version 1.6. It creates and deletes arrays based on specification from the uses hooks to divert function call where the program access Bridge and handles all meta-data associated with arrays. cphVB enabled NumPy arrays. The hooks will translate a given function into its corresponding cphVB bytecode when Vector Engine possible. When it is not possible, the hooks will feed the In order to maximize the CPU cache utilization and enables function call back into NumPy and thereby forcing NumPy parallel execution the first stage in the VE is to form a to handle the function call itself. set of instructions that enables data blocking. That is, a The Bridge operates with two address spaces for arrays: set of instructions where all instructions can be applied on the cphVB space and the NumPy space. All arrays starts one data block completely at a time without violating data in the NumPy space as a default. The original NumPy im- dependencies. This set of instructions will be referred to as a plementation handles these arrays and all operations using kernel. them. It is possible to assign an array to the cphVB space The VE will form the kernel based on the batch of in- explicitly by using an optional cphVB parameter in array structions it receives from the VEM. The VE examines each creation functions such as empty and random. The cphVB instruction sequentially and keep adding instruction to the bridge implementation handles these arrays and all operations kernel until it reaches an instruction that is not blockable with using them. the rest of the kernel. In order to be blockable with the rest 6 PROC. OF THE 11th PYTHON IN SCIENCE CONF. (SCIPY 2012)

Processor Intel Core i5-2510M A simulation that simulates a • Shallow Water Clock 2.3 GHz system governed by the shallow water equa- Private L1 Data Cache 128 KB tions. It is a translation of a MATLAB applica- Private L2 Data Cache 512 KB tion by Burkardt [Bur10] (Fig.5). Shared L3 Cache 3072 KB Synthetic Stencil A synthetic stencil simulation Memory Bandwidth 21.3 GB/s • the code relies heavily on the slicing operations Memory 4GB DDR3-1333 Compiler GCC 4.6.3 of NumPy. (Fig.6).

Discussion TABLE 1: ASUS P31SD. The jacobi solver shows an efficient utilization of data- blocking to an extent competing with using multiple proces- of the kernel an instruction must satisfy the following two sors. The score engine achieves a 1.42x speedup in comparison . sec . sec properties where A is all instructions in the kernel and N is to NumPy (3 98 to 2 8 ). the new instruction. On the other hand, our naive implementation of the k Nearest Neighbor search is not an 1. The input arrays of N and the output array of A do problem. However, it has a of O(n2) when not share any data or represents precisely the same the number of elements and the size of the query set is n, thus data. the problem should be scalable. The result of our experiment 2. The output array of N and the input and output is also promising – with a performance speedup of of 3.57x arrays of A do not share any data or represents (5.40sec to 1.51sec) even with the two single-core engines and precisely the same data. a speed-up of nearly 6.8x (5.40sec to 0.79) with the multi-core engine. When the VE has formed a kernel, it is ready for execution. The Shallow Water simulation only has a time complexity Since all instruction in a kernel supports data blocking the of O(n) thus it is the most memory intensive application in VE can simply assign one block of data to each CPU-core in our benchmark. Still, cphVB manages to achieve a perfor- the system and thus utilizing multiple CPU-cores. In order to mance speedup of 1.52x (7.86sec to 5.17sec) due to memory- maximize the CPU cache utilization the VE may divide the allocation optimization and 2.98x (7.86sec to 2.63sec) using instructions into even more data blocks. The idea is to access the multi-core engine. data in chunks that fits in the CPU cache. The user, through an environment variable, manually configures the number of Finally, the synthetic stencil has an almost identical per- data blocks the VE will use. formance pattern as the shallow water benchmark the score engine does however give slightly better results than the simple engine. Score achieves a speedup of 1.6x (6.60sec to 4.09sec) Performance Study and the mcore engine achieves a speedup of 3.04x (6.60sec In order to demonstrate the performance of our initial cphVB to 2.17sec). implementation and thereby the potential of the cphVB de- It is promising to observe that even most basic vector engine sign, we will conduct some performance benchmarks using (simple) shows a speedup and in none of our benchmarks NumPy§. We execute the benchmark applications on ASUS a slowdown. This leads to the promising conclusion that P31SD with an Intel Core i5-2410M processor (Table1). the memory optimizations implemented outweigh the cost of The experiments used the three vector engines: simple, score using cphVB. Adding the potential of speedup due to data- and mcore and for each execution we calculate the relative blocking motivates studying further optimizations in addition speedup of cphVB compared to NumPy. We perform strong to thread-level-parallelization. The mcore engine does provide scaling experiments, in which the problem size is constant speedups, the speedup does however not scale with the number though all the executions. For each experiment, we find the of cores. 
This result is however expected as the benchmarks block size that results in best performance and we calculate are memory-intensive and the memory subsystem is therefore the result of each experiment using the average of three the bottleneck and not the number of computational cores executions. available. The benchmark consists of the following Python/NumPy applications. All are pure Python applications that make use Future Work of NumPy and none uses any external libraries. The future goals of cphVB involves improvement in two An implementation of an major areas; expanding support and improving performance. • Jacobi Solver iterative jacobi solver with fixed iterations in- Work has started on a CIL-bridge which will leverage the stead of numerical convergence. (Fig.3). use of cphVB to every CIL based programming language A naive implementation of a k Nearest which among others include: C#, F#, Visual C++ and VB.NET. • kNN Neighbor search (Fig.4). Another project in current progress within the area of support is a C++ bridge providing a library-like interface to cphVB †. The function mremap() in GNU C library 2.4 and greater. ‡. The function mprotect() in the POSIX.1-2001 standard. §. NumPy version 1.6.1. CPHVB: A SYSTEM FOR AUTOMATED RUNTIME OPTIMIZATION AND PARALLELIZATION OF VECTORIZED APPLICATIONS 7

[Figures 3 and 6: bar charts of speedup relative to NumPy for the numpy, simple, score and mcore vector engines; y-axis: speedup in relation to NumPy.]
Fig. 3: Relative speedup of the Jacobi Method. The job consists of a vector with 7168x7168 elements using four iterations.
Fig. 6: Relative speedup of the synthetic stencil code. The job consists of a vector with 10240x1024 elements that simulate 10 time steps.

using operator overloading and templates to provide a high-level interface in C++.

To improve both support and performance, work is in progress on a vector engine targeting OpenCL compatible hardware, mainly focusing on using GPU-resources to improve performance. Additionally, support for program execution using distributed memory is in progress. This functionality will be added to cphVB in the form of a vector engine manager.

In terms of pure performance enhancement, cphVB will introduce JIT compilation in order to improve memory intensive applications. The current vector engine for multi-core CPUs uses data blocking to improve cache utilization, but as our experiments show, the memory intensive applications still suffer from the von Neumann bottleneck [Bac78]. By JIT compiling the instruction kernels, it is possible to improve cache utilization drastically.

[Figures 4 and 5: bar charts of speedup relative to NumPy for the numpy, simple, score and mcore vector engines; y-axis: speedup in relation to NumPy.]
Fig. 4: Relative speedup of the k Nearest Neighbor search. The job consists of 10.000 elements and the query set also consists of 1K elements.
Fig. 5: Relative speedup of the Shallow Water Equation. The job consists of 10.000 grid points that simulate 120 time steps.

Conclusion

The vector oriented programming model used in cphVB provides a framework for high-performance and high-productivity. It enables the end-user to execute vectorized applications on a broad range of hardware architectures efficiently without any hardware specific knowledge. Furthermore, the cphVB design supports scalable architectures such as clusters and supercomputers. It is even possible to combine architectures in order to exploit hybrid programming where multiple levels of parallelism exist. The authors in [Kri11] demonstrate that combining shared memory and distributed memory parallelism through hybrid programming is essential in order to utilize the Blue Gene/P architecture fully.

In a case study, we demonstrate the design of cphVB by implementing a front-end for Python/NumPy that targets multi-core CPUs in a shared memory environment. The implementation executes vectorized applications in parallel without any user intervention, thus showing that it is possible to retain the high abstraction level of Python/NumPy while fully utilizing the underlying hardware. Furthermore, the implementation demonstrates scalable performance - a k-nearest neighbor search purely written in Python/NumPy obtains a speedup of more than five compared to a native execution.

Future work will further test the cphVB design model as new front-end technologies and heterogeneous architectures are supported.

REFERENCES

[Kri10] M. R. B. Kristensen and B. Vinter, Numerical Python for Scalable Architectures, in Fourth Conference on Partitioned Global Address Space Programming Model, PGAS{’}10. ACM, 2010. [Online]. Available: http://distnumpy.googlecode.com/files/kristensen10.pdf [Dav04] T. David, P. Sidd, and O. Jose, Accelerator : Using Data Par- allelism to Program GPUs for General-Purpose Uses, October. [Online]. Available: http://research.microsoft.com/apps/pubs/default. aspx?id=70250 [New11] C. J. Newburn, B. So, Z. Liu, M. Mccool, A. Ghuloum, S. D. Toit, Z. G. Wang, Z. H. Du, Y. Chen, G. Wu, P. Guo, Z. Liu, and D. Zhang, Intels Array Building Blocks : A Retargetable , Dynamic Compiler and Embedded Language, Symposium A Quarterly Journal In Modern Foreign Literatures, pp. 1–12, 2011. [Online]. Available: http://software.intel.com/en-us/blogs/wordpress/ wp-content/uploads/2011/03/ArBB-CGO2011-distr.pdf [Klo09] A. Kloeckner, N. Pinto, Y. Lee, B. Catanzaro, P. Ivanov, o and A. Fasih, PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation, Brain, vol. 911, no. 4, pp. 1–24, 2009. [Online]. Available: http://arxiv.org/abs/0911.3456 [Khr10] K. Opencl, W. Group, and A. Munshi, OpenCL Specification, ReVi- sion, pp. 1–377, 2010. [Online]. Available: http://scholar.google.com/ scholar?hl=en&btnG=Search&q=intitle:OpenCL+Specification#2 [Nvi10] N. Nvidia, NVIDIA CUDA Programming Guide 2.0, pp. 1–111, 2010. [Online]. Available: http://developer.download.nvidia.com/ compute/cuda/32prod/toolkit/docs/CUDACProgrammingGuide.pdf [Ros10] G. V. Rossum and F. L. Drake, Python Tutorial, History, vol. 42, no. 4, pp. 1–122, 2010. [Online]. Available: http://docs.python.org/ tutorial/ [Int08] Intel, Intel Math Kernel Library (MKL), pp. 2–4, 2008. [Online]. Available: http://software.intel.com/en-us/articles/intel-mkl/ [Mat10] MATLAB, version 7.10.0 (R2010a). Natick, Massachusetts: The MathWorks Inc., 2010. [Rrr11] R Development Core Team, R: A Language and Environment for Sta- tistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2011. [Online]. Available: http://www.r-project.org [Idl00] B. A. Stern, Interactive Data Language, ASCE, 2000. [Oct97] J. W. Eaton, GNU Octave, History, vol. 103, no. February, pp. 1–356, 1997. [Online]. Available: http://www.octave.org [Oli07] T. E. Oliphant, Python for Scientific Computing, Comput- ing in Science Engineering, vol. 9, no. 3, pp. 10–20, 2007. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper. htm?arnumber=4160250 [Gar10] R. Garg and J. N. Amaral, Compiling Python to a hybrid execution environment, Computing, pp. 19–30, 2010. [Online]. Available: http: //portal.acm.org/citation.cfm?id=1735688.1735695 [Pas05] R. V. D. Pas, An Introduction Into OpenMP, ACM SIGARCH Com- puter Architecture News, vol. 34, no. 5, pp. 1–82, 2005. [Online]. Available: http://portal.acm.org/citation.cfm?id=1168898 [Cat09] B. Catanzaro, S. Kamil, Y. Lee, K. Asanov’i, J. Demmel, c K. Keutzer, J. Shalf, K. Yelick, and O. Fox, SEJITS: Getting Productivity and Performance With Selective Embedded JIT Specialization, in Proc of 1st Workshop Programmable Models for Emerging Architec- ture PMEA, no. UCB/EECS-2010-23, EECS Department, University of California, Berkeley. Citeseer, 2009. [Online]. Available: http: //www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-23.html [And08] R. Andersen and B. 
Vinter, The Scientific Byte Code Virtual Machine, in Proceedings of the 2008 International Conference on Grid Computing & Applications, GCA2008: Las Vegas, Nevada, USA, July 14-17, 2008. CSREA Press, 2008, pp. 175-181. [Online]. Available: http://dk.migrid.org/public/doc/published_papers/sbc.pdf
[Apl00] "Why APL?" [Online]. Available: http://www.sigapl.org/whyapl.htm
[Sci02] R. Pozo and B. Miller, SciMark 2.0, 2002. [Online]. Available: http://math.nist.gov/scimark2/
[Bur10] J. Burkardt, Shallow Water Equations, 2010. [Online]. Available: http://people.sc.fsu.edu/~jburkardt/m_src/shallow_water_2d/
[Bac78] J. Backus, Can Programming be Liberated from the von Neumann Style?: A Functional Style and its Algebra of Programs, Communications of the ACM, vol. 16, no. 8, pp. 613-641, 1978.
[Kri11] M. Kristensen, H. Happe, and B. Vinter, Hybrid Parallel Programming for Blue Gene/P, Scalable Computing: Practice and Experience, vol. 12, no. 2, pp. 265-274, 2011.

6.3 Doubling the Performance of Python/NumPy with less than 100 SLOC

Doubling the Performance of Python/NumPy with less than 100 SLOC

Simon A. F. Lund, Kenneth Skovhede, Mads R. B. Kristensen, and Brian Vinter Niels Bohr Institute, University of Copenhagen, Denmark safl/skovhede/madsbk/vinter @nbi.dk { } Abstract—A very simple, and outside NumPy, commonly used The rest of this paper is comprised as follows; related work trick of buffer-reuse is introduced to the NumPy library to speed since this is not a new idea outside Python, a section on the up the performance of scientific applications in Python/NumPy. implementation details, then the benchmarks are introduced The implementation, which we name software victim-caching, is and results are presented. very simple. The code itself consists of less than 100 lines of code, and took less than one day to add to NumPy, though it should be noted that the programmer was very familiar with II.RELATED WORK the inner workings of NumPy. The result is an improvement of In computer architecture, a victim-cache is a small fully- as much as 2.29 times speedup, on average 1.32 times speedup across a benchmark suite of 15 applications, and at no time did associative cache where any evicted cache-line is stored and the modification perform worse than unmodified NumPy. thus granted an extra chance for remaining in the cache, before being finally evicted[3]. At the CPU level victim-caching is particularly efficient at masking cache-line tag conflicts. Since I.INTRODUCTION NumPy does not have any cache, the victim cache may appear unrelated, but the idea of a fully associative cache that holds Python/NumPy is gaining momentum in high performance buffers a little while until they are fully evicted, is very similar. computing, often as a glue language between high performance libraries, but increasingly also with all or parts of the func- In functional languages a similar buffer reuse scheme, copy tionality written directly in Python/NumPy. Python/NumPy collections, is found efficient in [4]. In this work, the buffer represents an easy transition from Matlab prototypes, to is very large, and numerous techniques for buffer location and the extent where we observe scientists working directly in replacement are considered; most of this is similar to page Python/NumPy since their productivity is as high as in Matlab. replacement algorithms at the operating system level. While Python/NumPy is still not as efficient as C++ or Fortran, Keeping control of buffers in relational is fairly which are the more common HPC languages, the productivity closely related to maintain NumPy buffers, since relational of the higher-level language often becomes the choice of the databases also have a high locality of similar sized buffers[5] programmer. As a rule of thumb, we expect Python/NumPy but dissimilar to NumPy, the space available for buffers is very to be approximately four to five times slower than C, and the high, and a more advanced replacement algorithm is needed balance in choosing a programming language is thus often since databases are multiuser systems, and the buffer patterns a balance between faster programming or faster execution is thus less simple than what we can observe in NumPy. and is stands to reason that, as Python/NumPy solutions close the performance gap to compiled languages, the higher Even though the victim cache technique itself is not related productivity language will gain further traction. In our work to garbage collection, the idea of memory reuse is very similar. 
to improve the performance of NumPy[1] we came across a Within a runtime with managed garbage collection, memory al- behavior which we initially attributed to our work on cache locations are pooled to avoid repeated requests to the operating optimizations, turned out to be the effects of a far simpler system[6]. While this is useful for repeated small allocations, scheme where by temporary array allocations in NumPy are most implementations assume that large allocations will stay more efficiently reused. in memory. The amount of memory that is reserved for buffer-space is naturally defined by the user through a standard environment III.SOFTWARE VICTIM-CACHING variable. In this work, we experiment with three fixed buffer- We have dubbed the adopted technique software victim- sizes 100, 512 and 1024 mega bytes. Programmers can exper- caching, since the basic functionality is very similar to victim- iment with different buffer-sizes, however, very large buffers caching as it is known in computer architecture. The idea is rarely make an impact. very simple; when NumPy releases an array we do not release the memory immediately, but keep the buffer in a victim-cache, The resulting changes to NumPy, less than 100 lines in when NumPy issues a new array allocation we first do a lookup total, counted using SLOCCount[2], provides advantages over in the victim-cache, and if a matching array is found, it is conventional NumPy from none, but never worse, to 2.29 times returned rather than a new array allocation. speedup. Our suite of 15 benchmarks has an average speedup of 1.32, and thus, with no requirement to the application Different matching and eviction algorithms have been ex- programmer closes the gap to compiled languages a little perimented with, see section III-C for further details. Note that further. only full allocations are returned from the victim-cache, we do not try to use partial arrays or merge arrays to find a match, implementation is available as a github-fork1 of NumPy 1.7 or in any other way attempt conventional management. on the branches victim cache and victim cache clean. The ˜ branch victim cache contains the implementation featuring the The logic behind this very naA¯ve approach is fairly simple multiple algorithms which are described in the following sec- as well; scientific applications are most often comprised of tion. The branch victim cache clean contains the cleaned up, dense loops any temporary array allocation is due to this very less than 100 source-lines of code, implementation featuring likely to be observed again very soon after being released. In a single strategy and the possibility of enabling/disabling the addition, the temporary arrays that are allocated for different victim cache via environment-options. operations on the same user defined array are likely to be of identical dimensions as well. C. Algorithms A. Temporary Arrays Buffer management algorithms are a well researched area[7]. However, for many scenarios a simple solution is as Temporary arrays are instantiated by NumPy whenever good, or better, than advanced adaptive algorithms. For that an intermediate result is needed. The general case is; an reason, we limit the experiments in this work to six well- expression consisting of more than a single operator and known algorithms. Three for matching buffers to requirements thereby creating a complex expression. 
As an example, assume and three for selecting a buffer to eliminate when the allocated we wish to calculate the distance from (0, 0) for a set of (X,Y ) buffer space is saturated. coordinates in NumPy we write: For matching buffers, we use three very simple algorithms, Exact, First, and Best. Exact will only return a fit if the distance = numpy.sqrt(X**2 + Y**2) requested buffer-size is exactly the same size as the buffer in the victim cache. First will return the first of the tested buffers This operation will create three temporary arrays, plus a large enough to hold the requested buffer. Best will search all non temporary which is returned to distance, the X2,Y 2, and buffers in the victim-cache and return the buffer that is as large + operations will each allocate a temporary array which is as the requested size, and with as little extra space as possible. discarded after the line is executed, the square root operation If an exact match is found, it is returned immediately. also allocates an array, which is returned to the distance array. In this case, the first two temporary arrays are released once If the maximum allowed buffer size would be exceeded the fourth allocation is called, and one of the first two could by adding an allocation, an existing buffer in the cache must be used since they match the allocation perfectly. be evicted. Choosing one can also be done in many ways including Round-Robin, Second-Chance, and Random. Round- B. Implementation Robin will evict buffers from the victim-cache in the order they are added. If a buffer is selected for reuse, it will be The implementation is interface-compatible with malloc. added to the end once it is released again. It could also be This allows for a very low-intrusion integration by only described as evicting the oldest cache-line first. Second-chance changing 10 lines of code in the NumPy codebase. The imple- is well known from demand-paging in operating systems and mentation of the victim cache itself, including all matching and is aptly named. If a buffer is next to be evicted it will be eviction strategies mention in section III-C, is a total of 237 marked at ready-to-evict, but the algorithm will in-fact move lines of code, where the simple version with only one strategy on in the list, only if a buffer is revisited, i.e. when a buffer is 81 lines. Figure 1 illustrates the data-structures, which are is marked as ready and is next to be evicted, will it actually maintained. be selected for eviction. The worst case scenario is that all buffers must be visited once before one can be chosen for eviction, but in reality is will be more like round-robin, but Victim Cache line bytes pointer with a second chance for some buffers. Random selection 0 10485760 0xe3a3 is extremely simple; a random buffer in the list is selected 1 5242880 0xa2d3 for eviction, while this may appear as a strange choice this 2 71303168 0x3133 approach has the advantage that it will not fall into a pattern line = 4 3 13631488 0x42a3 where the same set of buffers are continuously evicted. bytes_used = 100663296 4 0 0x0 5 0 0x0 As the applications that NumPy is commonly used for, are bytes_max = 536870912 6 0 0x0 highly regular in their execution pattern, we expect the simplest algorithms to perform very well, i.e. Exact-fit for matching and Fig. 1: Illustration of the victim-cache data-structures for a victim-cache with round-robin for eviction. 
If this is the case, there is no reason seven cache-lines, a maximum size of 512MB and currently populated with to keep the more advanced algorithms in a final version and four entries consuming 96MB. the codebase can be kept very small indeed.

IV. COMPARISON The simplest implementation maintains the currently con- To evaluate the performance of the victim-cache, we have sumed bytes used of the victim-cache, the bytes max max- chosen 15 different benchmarks, that use a broad range of imum number of bytes allowed for consumption, and the NumPy functionality. We have chosen some benchmarks that currently used line in the victim-cache. The following section describes different strategies of using the victim-cache. The 1http://github.com/cphhpc/numpy/ Benchmark Problemsize Iterations operate on one-dimensional arrays and perform typical Monte Black Scholes 8 106 5 · Carlo simulations. For two-dimensional benchmarks, we use Bolzmann 3D 120 100 100 5 × × Bolzmann D2Q9 800 800 5 a selection of classic physics based computational kernels, for × Cloth 3000 3000 1 higher dimensions, we use a 3D Lattice-Boltzmann simulation × FFT 18 N/A and an n-body simulation. To show that the approach is also Jacobi Stencil 10000 4000 10 10 × × valid in other scenarios, we also use na¨ıve implementations of KNN 2 106 10 3 · × LU Factor. 500 500 N/A FFT, LU, and matrix multiplication. The source-code for the × Matrix Mul 800 N/A benchmarks are available for closer inspection in the github- Monte Carlo PI 2 107 10 2 · fork on the victim cache branch in the benchmark/Python/ NBody 3000 1 1 × Shallow Water 3000 3000 5 folder. × SOR 4000 4000 5 × Swaption 1000 N/A Wire World 5000 5000 5 × A. One-dimensional benchmarks TABLE I: Overview of benchmarks and problem sizes. For testing with one-dimensional applications, we have chosen three different Monte Carlo based implementations. The simplest version is the original Monte Carlo Pi simulation D. Kernel benchmarks that derives the value of π through a series of simulated dart To broaden the experiment we have chosen a set of kernels throws. The other two benchmarks are taken from financial that are traditionally implemented in external libraries and analysis domain and attempt to price a set of stock options, implemented them in NumPy. The kernels comprise na¨ıve using the Black-Scholes model for European pricing and swap- versions of matrix multiplication, LU factorization, and Fast tions in the LIBOR market model, respectively. The Monte Fourier Transformation (FFT). The FFT kernel generates a low Carlo Pi simulation generates only a few temporary arrays amount of temporary arrays. The where the matrix multipli- in each iteration, whereas the Black-Scholes implementation cation and LU kernels generate a large amount of temporary generates as much as 67 temporary arrays in an iteration. arrays, where the arrays generated by the LU kernel are small, The number of temporary arrays generated by the Swaption and the ones generated by the matrix multiplication are large. implementation varies with the input data and the amount of elements in each temporary array is relatively small. Table I provides the full list of benchmarks along with the parameters for their execution.

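As a point of reference for the kind of code these benchmarks contain, a Monte Carlo Pi estimation in NumPy can be written as below. This is only an illustrative sketch, not the benchmark's actual source; note how every whole-array expression allocates temporary arrays, which is precisely the allocation pattern the victim cache targets.

    import numpy as np

    def monte_carlo_pi(samples=2 * 10**7):
        x = np.random.random(samples)
        y = np.random.random(samples)
        # x*x, y*y, their sum, and the comparison each produce a temporary array.
        inside = (x * x + y * y) <= 1.0
        return 4.0 * inside.sum() / samples

    print(monte_carlo_pi())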
B. Two-dimensional benchmarks

For two-dimensional applications we have chosen a common Jacobi five-point stencil application, a successive over-relaxation (SOR), a shallow water simulation, a WireWorld simulation, a Lattice-Boltzmann simulation and a cloth physics simulation. The Jacobi stencil is chosen for its simplicity, where the others are chosen because they are larger applications, which would be hard to optimize by hand. The SOR simulation essentially does the same as the Jacobi stencil, but is implemented with a red/black update scheme and includes a global delta calculation. The Lattice-Boltzmann, shallow water, and cloth simulations all simulate movement in a two-dimensional space with different models for force propagation. The Jacobi stencil code is fairly compact but still generates 9 temporary arrays in each iteration. The other benchmarks generate a larger number of temporary arrays that are candidates for optimization from the victim cache.

C. Higher-dimensional benchmarks

To show the effects of the victim-cache on problems that have multiple dimensions, we have chosen some classic computational kernels, namely a naïve n-body simulation, a k-nearest-neighbor search, and a Lattice-Boltzmann simulation in 3D space. The k-nearest-neighbors search has a low amount of temporary arrays, while the n-body simulation and the Lattice-Boltzmann simulation have a moderate amount of temporary arrays.

D. Kernel benchmarks

To broaden the experiment we have chosen a set of kernels that are traditionally implemented in external libraries and implemented them in NumPy. The kernels comprise naïve versions of matrix multiplication, LU factorization, and Fast Fourier Transformation (FFT). The FFT kernel generates a low amount of temporary arrays, whereas the matrix multiplication and LU kernels generate a large amount of temporary arrays; the arrays generated by the LU kernel are small, and the ones generated by the matrix multiplication are large.

E. Results

As the victim cache mitigates the work related to allocating array memory from the operating system, there is a clear relation between the number of temporary arrays and the gained speedup. Figure 2 shows the speedups obtained from running the same NumPy code with three different sizes of the victim cache. Each benchmark has been set up with parameters that cause the benchmark to run around 10 seconds with no victim cache. Each benchmark is then executed with the same input data and varying sizes of the victim cache, and the wall-clock times are used to compute the speedup. All benchmarks are executed on an AMD Opteron 6272 CPU with 128 GB of memory, running Ubuntu 12.04.2 LTS. The execution times were stable, with a maximum wall-clock time deviation of 0.08 seconds.

Fig. 2: Speedup of the victim-cache in relation to unmodified Python/NumPy for the 15 benchmark applications, with victim-cache sizes of 100MB, 512MB, and 1GB.

We can see that some experiments gain no speedup at all, but none of the experiments show any slowdown from the victim cache. For our problem sizes, a moderate victim cache of 512 MB is sufficient to gain the maximum performance speedup, except for the Jacobi example, which shows a large speedup when utilizing 1GB of victim cache memory.

The SOR, shallow water, and Jacobi benchmarks show as much as 2.3 times speedup from using the victim cache, which we consider a significant result. The following section provides further analysis of the results.

F. Analysis

The results are quite convincing; while a few benchmarks do not show any improvement in performance, most do, and no less than four show a performance increase of more than 50%, two of more than 100%. At a first glance, it is hard to see why a simple victim-cache can improve performance that much. The explanation is found in the way Linux handles memory allocations. Any allocation larger than 128KB3 is allocated using mmap rather than sbrk[8]. The consequence is that memory in an area allocated with mmap will be released to the operating system when free is called. This means that the memory is not actually kept for reuse by glibc, as opposed to memory allocated with sbrk. The consequence of this is that the many, large, temporary array allocations in NumPy are returned to the operating system, and the allocation calls move back and forth between user space and kernel space. The actual system call overhead represents the majority of the time spent, but the operating system must also zero the memory to stop information from leaking between processes. Thus, the big advantage of the victim cache model is that it keeps the temporary array memory in user space, which in effect saves both the zeroing of the memory and a write to the array memory.

3 128KB by default; the threshold may be changed by the user.

To verify that the above description is in fact the reason for the observed performance improvements, the benchmarks were repeated using the time tool in order to measure the time spent in user-level and kernel-level respectively. Figure 3 shows the time spent in kernel-level for each benchmark, for the standard NumPy implementation (Native) and the three different victim-cache sizes. In this figure, lower means less time spent in system. There is an obvious correlation between the observed drop in time spent in kernel-level and the improvement in overall runtime. Moreover, the only experiment where going from 512MB to 1GB of victim-cache shows a further significant improvement is the Jacobi Stencil benchmark, where the time spent in kernel-level is reduced by more than half. A manual increase to 2GB showed no improvement in runtime.

Fig. 3: Time spent in system/kernel-level reported in percentage of total wall-clock time, for the native NumPy implementation and the three victim-cache sizes.

We go on to experiment with the three different matching algorithms, best-fit, first-fit, and exact-fit. Best-fit requires an inspection of each element of the cache for each lookup, the consequence of which is clearly shown in figure 4. The difference between first-fit and exact-fit is marginal and varies across the benchmark suite.

Figure 5 illustrates the results of experiments with different eviction strategies. The random eviction-scheme, which is well known to work for page-replacement in the operating system, does not apply for the victim-cache. This result was expected since most of the benchmarks demonstrate a high degree of regularity. Second-chance and oldest-first are indistinguishable.
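A more thorough alternative to caching the temporaries is to avoid creating them, which plain NumPy can approximate with preallocated output buffers. The following sketch, with arbitrary array sizes, contrasts the two forms; it is an illustration of the trade-off only and not part of the victim-cache patch.

    import numpy as np

    n = 10**7
    X = np.random.random(n)
    Y = np.random.random(n)

    # Expression form: allocates temporaries for X**2, Y**2 and their sum,
    # triggering the mmap/munmap round trips analyzed above.
    distance = np.sqrt(X**2 + Y**2)

    # Buffer-reusing form: the same result with two preallocated arrays.
    tmp = np.empty(n)
    out = np.empty(n)
    np.multiply(X, X, out=out)
    np.multiply(Y, Y, out=tmp)
    np.add(out, tmp, out=out)
    np.sqrt(out, out=out)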

Fig. 4: Comparison of fitting strategies (best-fit, first-fit, and exact-fit).

Fig. 5: Comparison of eviction strategies (second-chance, random, and oldest-first).

V. FUTURE WORK

Given the interface compatibility and the previous description of malloc, a valid question is: "why not just parameterize malloc correctly?". Because the behavior affects not just the zeroing of memory but all allocations above MMAP_THRESHOLD bytes, one could consider simply increasing MMAP_THRESHOLD. However, glibc does not allow increasing MMAP_THRESHOLD beyond MMAP_THRESHOLD_MAX, which on 64-bit systems is 4MB. Changing the maximum value would require recompiling glibc and is thus a fairly intrusive approach. A more drastic approach would be to re-compile the Linux kernel with __GFP_ZERO defined to zero, which would completely eliminate the zeroing of pages across the operating system. However, unlike the victim cache, such a change requires intrusive modifications to the system software.

The solution itself is quite simple, but the implementation may be subjected to further refinements. In the current implementation, a rather simple list-based lookup search is performed, yielding a runtime complexity of O(n) over the number of entries. Many better strategies exist, such as tree-based lookups, that can reduce this overhead.

The results presented in this article show that there is indeed an overhead involved in the generation of temporary arrays and that a victim cache can reduce some of that overhead by reusing the extra memory. However, a more thorough approach is to avoid creating the temporary arrays completely. We are actively investigating this approach in many places within the NumPy libraries as part of the Bohrium runtime system[1].

We have produced a cleaned up version of the changes, which only supports exact matching and round-robin eviction. This version reduces the victim-cache C-code to 81 lines; combined with the ten lines of changes in the NumPy multiarraymodule, this brings the total lines of code in the victim-cache implementation up to 91. We plan to submit this patch upstream to the NumPy developers so NumPy users can reap the discovered benefits.

VI. CONCLUSION

We have implemented a very simple and non-intrusive victim-cache in NumPy, and evaluated the effects on a variety of different benchmarks. The experiments clearly show that the victim-cache is able to reduce much of the overhead that occurs when NumPy allocates memory from the operating system. In no case did we see an actual slowdown. Generally we see an average improvement of 32% across the benchmark suite; if we only cherry-pick the benchmarks where we see an improvement, the average speedup is 52%. The best observed speedup is 230%, for the Jacobi Stencil benchmark. A victim-cache of 512MB was sufficient to harvest all the gains of victim-caching in all benchmarks except one, the Jacobi Stencil.

We have experimented with three different matching strategies and three different eviction strategies; however, the high degree of regularity in the benchmark suite meant that the simplest algorithms, exact-fit matching and oldest-first eviction, performed as good as or better than the more advanced strategies.

Overall we conclude that the victim-cache is a nearly cost-free optimization that potentially benefits all NumPy applications, without any requirements towards the programmer. The implementation comprises ten lines of changes to the NumPy multiarraymodule and 81 lines for the victim-cache itself, in total 91 lines of code.

4 The Cloth simulation is ported to Python from a Java based implementation written by Jared Counts: http://gamedev.tutsplus.com/tutorials/implementation/simulate-fabric-and-ragdolls-with-simple-verlet-integration/
5 The Black-Scholes simulation is based on vectorized code by Espen Haug: http://www.espenhaug.com/black_scholes.html
6 The 2D Lattice-Boltzmann simulation is based on an example from Jonas Latt: http://wiki.palabos.org/_media/numerics:cylinder.pdf
7 The 3D Lattice-Boltzmann simulation is based on code by Iain Haslam: http://www.exolete.com/images/lbm3d.m
8 The Swaption pricing in the LIBOR market model is based on code from http://quantess.net/; the LU factorization is based on the LU code in the SciMark2 suite.

ACKNOWLEDGMENTS

This research has been partially supported by the Danish Strategic Research Council, Program Committee for Strategic Growth Technologies, for the research center 'HIPERFIT: Functional High Performance Computing for Financial Information Technology' (hiperfit.dk) under contract number 10-092299. This research has been partially supported by the Danish Strategic Research Council, Program Committee for Strategic Growth Technologies, for the 'High Performance High Productivity' project under contract number 09-067060.

REFERENCES

[1] M. Kristensen, S. Lund, K. Skovhede, T. Blum, and B. Vinter, "Bohrium: a virtual machine approach to portable parallelism," in submission to ICPADS13, 2013.
[2] D. A. Wheeler, "Sloccount, a set of tools for counting physical source lines of code (sloc)," URL http://www.dwheeler.com/sloccount, 2004.
[3] N. P. Jouppi, "Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers," in Computer Architecture, 1990. Proceedings., 17th Annual International Symposium on. IEEE, 1990, pp. 364–373.
[4] P. R. Wilson, "Some issues and strategies in heap management and memory hierarchies," ACM SIGPLAN Notices, vol. 26, no. 3, pp. 45–52, 1991.
[5] T. Lang, C. Wood, and E. B. Fernandez, "Database buffer paging in virtual storage systems," ACM Transactions on Database Systems (TODS), vol. 2, no. 4, pp. 339–351, 1977.
[6] P. R. Wilson, M. S. Johnstone, M. Neely, and D. Boles, "Dynamic storage allocation: A survey and critical review," in Memory Management. Springer, 1995, pp. 1–116.
[7] ——, "Dynamic storage allocation: A survey and critical review," in Memory Management. Springer, 1995, pp. 1–116.
[8] G. Insolvibile, "Advanced memory allocation," Linux Journal, vol. 2003, no. 109, p. 7, 2003.

6.4 Bypassing the Conventional Software Stack Using Adaptable Runtime Systems


Simon A. F. Lund, Mads R. B. Kristensen, Brian Vinter, and Dimitrios Katsaros

Niels Bohr Institute, University of Copenhagen, Denmark — {safl/madsbk/vinter}@nbi.dk
Computer Science Department, University of Copenhagen, Denmark — [email protected]

Abstract. High-level languages such as Python offer convenient language constructs and abstractions for readability and productivity. Such features and Python's ability to serve as a steering language, as well as a self-contained language for scientific computations, have made Python a viable choice for high-performance computing. However, the Python interpreter's reliance on shared objects and dynamic loading causes scalability issues that at large scale consume hours of wall-clock time just for loading the interpreter.
The work in this paper explores an approach to bypass the conventional software stack by replacing the Python interpreter on compute nodes with an adaptable runtime system capable of executing the compute-intensive portions of a Python program. This allows for a single instance of the Python interpreter, interpreting the user's program, and additionally moves program interpretation off the compute nodes, thereby avoiding the scalability issue of the interpreter as well as providing a means of running Python programs on restrictive compute nodes which are otherwise unable to run Python.
The approach is experimentally evaluated through a prototype implementation of an extension to the Bohrium runtime system. The evaluation shows promising results as well as identifying issues for future work to address.

Keywords: Scalability, Python, import problem, dynamic loading

1 Introduction

Python is a high-level, general-purpose, interpreted language. Python advocates high-level abstractions and convenient language constructs for readability and productivity. The reference implementation of the Python interpreter, CPython, provides rich means for extending Python with modules implemented in lower-level languages such as C and C++. Lower-level implementations can be written from scratch and conveniently mapped to Python data-structures through Cython[4], exposed as function wrappers to existing libraries through SWIG[3, 2], or accessed using the Python ctypes1 interface.

The features of the language itself and its extensibility make it attractive as a steering language for scientific computing, which the existence of Python at high-performance compute sites confirms. Furthermore, there exists a broad range of Python wrappers to existing scientific libraries and solvers[11, 20, 13, 8, 9].

Python transcends its utilization as a steering language. SciPy2 and its accompanying software stack[17, 18, 12] provide a powerful environment for developing scientific applications. The fundamental building block of SciPy is the multidimensional arrays provided by NumPy[17]. NumPy expands Python by providing a means of doing array-oriented programming using array-notation with slicing and whole-array operations. The array-abstractions offered by NumPy provide the basis for a wealth of existing[6] and emerging[19, 21, 14] approaches that increase the applicability of Python in an HPC environment. Even though advances are made within these areas, a problem commonly referred to as the import problem[1, 15, 22] still persists at large-scale compute sites. The problem revolves around dynamic loading of CPython itself, built-in modules, and third party modules. Recent numbers reported on Hopper[22] show that startup time scales linearly with the number of cores, amounting to 400 seconds on 1024 cores and one hour for 8000 cores.

The approach in this paper explores a simple idea to avoid such expensive startup costs: execute one instance of the Python interpreter regardless of the cluster size. Furthermore, we allow the Python interpreter to run on an external machine that might not be part of the cluster. The machine can be any one of: the user's own laptop/workstation, a frontend/compile node, or a compute node, i.e. any machine that is accessible from the compute-site. A positive complementary effect, as well as a goal in itself, is that the Python interpreter and the associated software stack need not be available on the compute nodes.

The work in this paper experimentally evaluates the feasibility of bypassing the conventional software stack, by replacing the Python interpreter on the compute nodes with an adaptable runtime system capable of executing the computationally heavy part of the user's program. The approach facilitates the use of Python at restrictive compute-sites and thereby broadens the application of Python in HPC.

1 http://docs.python.org/2/library/ctypes.html
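As a small illustration of the ctypes interface mentioned above, the snippet below calls sqrt from the C math library directly from Python; it assumes a system where find_library can locate libm (e.g. a typical Linux installation):

    import ctypes
    import ctypes.util

    # Load the C math library and describe the signature of sqrt().
    libm = ctypes.CDLL(ctypes.util.find_library("m"))
    libm.sqrt.argtypes = [ctypes.c_double]
    libm.sqrt.restype = ctypes.c_double

    print(libm.sqrt(2.0))  # 1.4142135623730951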

2 Related Work

The work within this paper describes, to the authors' knowledge, a novel approach for handling the Python import problem. This section describes other approaches to meeting the same end. Python itself supports a means for doing a user-level override of the import mechanism3 and work from within the Python community has improved upon

the import system from version 2.6 to 2.7 and 3.0. In spite of these efforts, the problem persists.

One aspect of the import problem is the excessive stress on the IO-system caused by the object-loader traversing the filesystem looking for Python modules. Path caching through collective operations is one approach to lowering this overhead. The mpi4py[7] project implements multiple techniques for path caching where a single node traverses the file-system and broadcasts the information to the remaining N−1 nodes. The results of this approach show significant improvements to startup times, from hours to minutes, but it relies on the mpi4py library and requires maintenance of the Python software-stack on the compute-nodes.

Scalable Python4, first described in [9], addresses the problem at a lower level. Scalable Python, a modification of CPython, seeks to address the import problem by implementing a parallel IO layer utilized by all Python import statements. By doing so only a single process, in contrast to N processes, performs IO. The result of the IO operation is broadcast to the remaining N−1 nodes via MPI. The results reported in [9] show significant improvements to the time consumed by Python import statements, at the expense of maintaining a custom CPython implementation.

Relying on dynamically loaded shared objects is a general challenge for large-scale compute-sites with a shared filesystem. SPINDLE[10] provides a generic approach to the problem through an extension to the GNU Loader.

The above described approaches apply different techniques for improving the performance of dynamic loading. A different strategy, which in this respect is thematically closer to the work within this paper, is to reduce the use of dynamic loading. The work in [15] investigates such a strategy by replacing as much dynamic loading as possible with statically compiled libraries. Such a technique can, in a Python context, be applied through the use of Python freeze5, and additional tools6 exist to support it.

2 http://www.scipy.org/stackspec.html
3 http://legacy.python.org/dev/peps/pep-0302/
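The collective path-caching idea can be pictured with a few lines of mpi4py: one rank inspects the filesystem and the result is broadcast to the rest. This is only a sketch of the pattern, not mpi4py's actual import machinery:

    import os
    from mpi4py import MPI

    comm = MPI.COMM_WORLD

    # Only rank 0 touches the shared filesystem; the other N-1 ranks
    # receive the result via a broadcast instead of probing directories.
    if comm.Get_rank() == 0:
        search_path = [d for d in os.environ.get("PYTHONPATH", "").split(":")
                       if os.path.isdir(d)]
    else:
        search_path = None

    search_path = comm.bcast(search_path, root=0)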

3 The Approach

The previous sections describe and identify the CPython import system as the culprit guilty of limiting the use of Python / NumPy at large-scale compute sites. Dynamic loading and excessive path searching are accomplices to the havoc raised. The crime committed is labelled as the Python import problem. Related work let the culprit run free and implement techniques to handling the havoc raised. The work within this paper focuses on restricting the culprit and thereby preventively avoiding the problem. The idea is to run a single instance of the Python interpreter, thereby keep- ing the overhead constant and manageable. The remaining instances of the in- terpreter are replaced with a runtime system capable of efficiently executing the portion of the Python / NumPy program responsible for communication and

computation, leaving the task of interpreting the Python / NumPy program, conditionals, and general program flow to the interpreter. The computationally heavy parts are delegated to execution on the compute nodes through the runtime system.

4 https://gitorious.org/scalable-python
5 https://wiki.python.org/moin/Freeze
6 https://github.com/bfroehle/slither

3.1 Runtime System

The runtime used in this work is part of the Bohrium[19] project7. The Bohrium runtime system (BRS) provides a backend for mapping array operations onto a number of different hardware targets, from multi-core systems to clusters and GPU enabled systems. It is implemented as a virtual machine capable of making runtime decisions instead of a statically compiled library. Any programming language can use BRS in principle; in this paper though, we will use the Python / NumPy support exclusively.

Fig. 1. Illustration of communication between the runtime system components without the use of the proxy component.

The fundamental building block of BRS is the representation of programs in the form of vector bytecode. A vector bytecode is a representation of an operation acting upon an array. This can be one of the standard built-in operations such as element-wise addition of arrays, function promotion of trigonometric functions over all elements of an array, or in functional terms: map, zip, scan and reduction, or an operation defined by a third party.

BRS is implemented using a layered architecture featuring a set of interchangeable components. Three different types of components exist: filters, managers, and engines. Figure 1 illustrates a configuration of the runtime system configured for execution in a cluster of homogeneous nodes. The arrows represent vector bytecode sent through the runtime system in a top-down fashion, possibly altering it on its way.

Each component exposes the same C-interface for initialization, shutdown, and execution, thus basic component interaction consists of regular function calls. The component interface ensures isolation between the language bridge that runs the CPython interpreter and the rest of Bohrium. Thus, BRS only runs a single instance of the CPython interpreter no matter the underlying architecture – distributed or otherwise.

Above the runtime, a language bridge is responsible for mapping language constructs to vector bytecode and passing it to the runtime system via the C-interface.

Managers manage a specific memory address space within the runtime system and decide where to execute the vector bytecode. In figure 1 a node manager manages the local address space (one compute-node) and a cluster-manager handles data distribution and inter-node communication through MPI. At the bottom of the runtime system, we have the execution engines, which are responsible for providing efficient mapping of array operations down to a specific processing unit such as a CPU or a GPU.

7 http://www.bh107.org
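A heavily simplified picture of a vector bytecode and of the component interface could look as follows in Python; the opcodes, fields, and method names are hypothetical and do not correspond to Bohrium's actual encoding:

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass
    class VectorBytecode:
        opcode: str                # e.g. "MULTIPLY", "ADD", "SQRT", "REDUCE_ADD"
        operands: Tuple[str, ...]  # symbolic names of the arrays involved

    # distance = sqrt(x*x + y*y) expressed as a list of whole-array operations.
    program = [
        VectorBytecode("MULTIPLY", ("t0", "x", "x")),
        VectorBytecode("MULTIPLY", ("t1", "y", "y")),
        VectorBytecode("ADD",      ("t2", "t0", "t1")),
        VectorBytecode("SQRT",     ("dist", "t2")),
    ]

    class Component:
        # Mirrors the init/execute/shutdown C-interface described above.
        def init(self): ...
        def execute(self, bytecodes): ...
        def shutdown(self): ...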

3.2 Proxy Manager

Currently, all Bohrium components communicate using local function calls, which translates into shared memory communication. Figure 1 illustrates the means of communication within the BRS prior to the addition of the proxy component. As a result, the language bridge, which runs a CPython interpreter, must execute on one of the cluster-nodes.

In order to circumvent this problem, we introduce a new proxy component. This new component acts as a network proxy that enables Bohrium components to exchange vector bytecode across a network. Figure 2 illustrates the means for communication which the Proxy component provides. By using this functionality, separation can be achieved between the implementation of any application using Bohrium and the actual hardware on which it runs. This is an important property when considering cases of supercomputers or clusters, which define specific characteristics for the execution of tasks on them.

Fig. 2. Illustration of communication between the runtime system components with the use of the proxy component.

The proxy component is composed of two parts – a server and a client. The server exposes the component interface (init, execute, and shutdown) to its parent component in the hierarchy whereas the client uses its child component interface. When the parent component calls execute with a list of vector bytecodes, the server serializes and sends the vector bytecodes to the client, which in turn uses its child component interface to push the vector bytecodes further down the Bohrium hierarchy.

Besides the serialized list of vector bytecodes, the proxy component needs to communicate array-data in two cases: when the CPython interpreter introduces existing NumPy arrays and Python scalars to a Bohrium execution, which typically happens when the user application loads arrays and scalars initially; and when the CPython interpreter accesses the result of a Bohrium execution directly, which typically happens when the user application evaluates a loop-condition based on some array and scalar data.

Both the server and the client maintain a record of array-data locations, thus avoiding unnecessary array-data transfers. Only when the array-data is involved in a calculation at the client-side will the server send the array-data. Similarly, only when the CPython interpreter requests the array-data will the client send the array-data to the server.

In practice, when the client sends array-data to the server it is because the CPython interpreter needs to evaluate a scalar value before continuing. In this case, the performance is very latency sensitive since the CPython interpreter is blocking on the scalar value. Therefore, it is crucial to disable Nagle's TCP/IP algorithm[16] in order to achieve good performance. Additionally, the size of the vector bytecode lists is significantly greater than the TCP packet header, thus limiting the possible advantage of Nagle's TCP/IP algorithm. Therefore, when the proxy component initiates the TCP connection between server and client it sets the TCP_NODELAY socket option.
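The client side of such a proxy can be sketched in a few lines of Python: the list of bytecodes is serialized, length-prefixed, and pushed over a TCP connection with Nagle's algorithm disabled. The wire format and function below are illustrative only, not Bohrium's actual protocol:

    import pickle
    import socket
    import struct

    def send_bytecodes(host, port, bytecodes):
        sock = socket.create_connection((host, port))
        # Disable Nagle's algorithm so small, latency-critical messages
        # (e.g. scalar read-backs) are not delayed.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        payload = pickle.dumps(bytecodes)
        sock.sendall(struct.pack("!I", len(payload)) + payload)  # length-prefixed
        sock.close()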

4 Evaluation

The basic idea of the approach is to have a single instance of CPython interpreting the user's program, as figure 2 illustrates. With a single isolated instance of the interpreter, the import problem is solved by design. The second goal of the approach is to facilitate execution of a Python program in a restricted environment where the Python software stack is not available on the compute nodes.

Fig. 3. Octuplets and DCSC, two physically and administratively disjoint clusters of eight and sixteen nodes. Octuplets is a small-scale research-cluster managed by the eScience group at the Niels Bohr Institute. DCSC is a larger compute-site for scientific computation in Denmark. Gbit ethernet facilitates the connection between Manjula and the octuplet cluster, and 100Mbit ethernet between Manjula and DCSC.

The potential Achilles heel of the approach is its singularity: with a single remote instance of the interpreter, network latency and bandwidth limitations potentially limit application of the approach. Network latency can stall execution of programs when the round-trip-time of transmitting vector bytecode from the interpreter-machine to the compute node exceeds the time spent computing on previously received vector bytecode. Bandwidth becomes a limiting factor when the interpreted program needs large amounts of data for evaluation to proceed with interpretation and transmission of vector bytecode. The listing below contains descriptions of the applications used as well as their need for communication between interpreter and runtime. The sourcecode is available for closer inspection in the Bohrium repository8.
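The latency condition can be made concrete with a back-of-envelope model: if the interpreter must block on a scalar once per time-step, each step pays at least one round trip. The numbers below are made up purely for illustration:

    def slowdown(compute_per_step_s, rtt_s, syncs_per_step=1):
        # Ratio of elapsed time with blocking round trips to pure compute time.
        return (compute_per_step_s + syncs_per_step * rtt_s) / compute_per_step_s

    print(slowdown(0.5, 0.0001))  # ~1.0 : a 0.1 ms RTT is negligible
    print(slowdown(0.5, 0.2))     #  1.4 : a 200 ms RTT per step is clearly visible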

Black Scholes implements a financial pricing model using a partial differential equation, calculating price variations over time[5]. At each time-step the interpreter reads out a scalar value from the runtime representing the computed price at that time.

Heat Equation simulates the heat transfer on a surface represented by a two-dimensional grid, implemented using jacobi-iteration with numerical convergence. The interpreter requires a scalar value from the runtime at each time-step to evaluate whether or not the simulation should continue. Additionally, when executed with visualization, the entire grid is required.

N-Body simulates the interaction of bodies according to the laws of Newtonian physics. We use a straightforward algorithm that computes all body-body interactions, O(n^2), with collision detection. The interpreter only needs data from the runtime at the end of the simulation to retrieve the final position of the bodies. However, the interpreter will at each time-step, when executed for visualization purposes, request the coordinates of the bodies.

Shallow Water simulates a system governed by the Shallow Water equations. The simulation initiates by placing a drop of water in a still container. The simulation then proceeds, in discrete time-steps, simulating the water movement. The implementation is a port of the MATLAB application by Burkardt9. The interpreter needs no data from the runtime to progress the simulation at each time-step. However, the interpreter will at each time-step, when executed for visualization purposes, request the current state of the simulated water.

8 http://bitbucket.org/bohrium/bohrium
9 http://people.sc.fsu.edu/~jburkardt/m_src/shallow_water_2d/
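For example, the per-time-step convergence test in the Heat Equation is the kind of scalar read-out that forces the interpreter to wait on the runtime. The sketch below is illustrative only; the grid size and tolerance are made up and it is not the benchmark's exact code:

    import numpy as np

    def heat_equation(n=1000, epsilon=0.005, max_iter=1000):
        grid = np.zeros((n, n))
        grid[0, :] = 100.0                      # a hot boundary
        for _ in range(max_iter):
            center = 0.25 * (grid[:-2, 1:-1] + grid[2:, 1:-1] +
                             grid[1:-1, :-2] + grid[1:-1, 2:])
            delta = np.abs(grid[1:-1, 1:-1] - center).max()
            grid[1:-1, 1:-1] = center
            if delta < epsilon:                 # scalar read-out every time-step
                break
        return grid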

We benchmark the above applications on two Linux-based clusters (Fig. 3). The following subsections describe the experiments performed and report the performance numbers.

4.1 Proxy Overhead

We begin with figure 4, which shows the results of running the four benchmark applications on the octuplet cluster using eight compute nodes and two different configurations:

With Proxy: The BRS configured with the proxy component and the interpreter running on Manjula. This configuration is equivalent to the one illustrated in figure 2.

Without Proxy: The BRS configured without the proxy component. The interpreter is running on the first of the eight compute nodes. This setup is equivalent to the one illustrated in figure 1.

Fig. 4. Elapsed wall-clock time in seconds of the four applications on the octuplet compute nodes with and without the proxy component.

We cannot run Python on the DCSC cluster for the simple reason that the software stack is too old to compile Python 2.6 on the DCSC compute nodes. Thus, it is not possible to provide comparable results of running with and without the Proxy component.

The purpose of this experiment is to evaluate the overhead of introducing the proxy component in a well-behaved environment. There were no other users of the network, filesystem, or machines. The round-trip-time between Manjula and the first compute node averaged 0.07ms during the experiment. The error bars show two standard deviations from the mean. The overhead of adding the proxy component is within the margin of error and thereby unmeasurable.

4.2 Latency Sensitivity

Fig. 5. Slowdown of the four applications as a function of injected latency between Manjula and the octuplet compute node.

Fig. 6. Slowdown of the four applications as a function of injected latency between Manjula and the DCSC compute node.

We continue with figures 5 and 6. The BRS is configured with the proxy component, running the interpreter on Manjula; figure 2 illustrates the setup. The purpose of the experiment is to evaluate the approach's sensitivity to network latency. Latencies of 50, 100, 150, and 200ms are injected between Manjula and the compute node running the proxy client. The figures show the slowdown of the applications as a function of the injected latency.

The applications Shallow Water and N-body are nearly unmeasurably affected by the injected latency. The observed behavior is as expected since the interpreter does not need any data to progress interpretation. It is thereby possible to overlap transmission of vector bytecode from the interpreter-machine with computation on the compute nodes.

The injected latency does, however, affect the applications Heat Equation and Black Scholes. The observed behavior is as expected since the interpreter requires a scalar value for determining the convergence criteria for Heat Equation and for sampling the pricing value for Black Scholes. Network latency affects the results from the DCSC cluster the most, with a worst-case of a 2.8x slowdown. This is due to the elapsed time being lower when using the sixteen DCSC compute nodes: since less time is spent computing, relatively more time is spent waiting, and the sensitivity to network latency is therefore larger.

4.3 Bandwidth Sensitivity

Fig. 7. Elapsed wall-clock time of the four applications with and without visualization on the octuplet compute nodes.

Fig. 8. Elapsed wall-clock time of the four applications with and without visualization on the DCSC compute nodes.

The last experiment sought to evaluate the sensitivity to high network bandwidth utilization. Figures 7 and 8 show the results of an experiment where the four applications were running with visualization updated at each time-step. The BRS is configured with the proxy component; Manjula is running the Python interpreter. Figure 2 illustrates the setup.

When executing with visualization, the interpreter requires a varying (depending on the application) amount of data to be transmitted from the compute nodes to the interpreter-machine at each time step, thereby straining the available bandwidth between the interpreter-machine and the compute node running the proxy-client.

Black-Scholes, although sensitive to latency due to the need of transmitting the computed price at each time-step, does not require any significant amount of data to be transferred for visualization; neither does the N-Body simulation. However, the two other applications, Heat Equation and Shallow Water, require transmission of the entire state to visualize the dissipation of heat on the plane and the current movement of the water. These two applications are sufficient to observe a key concern of the approach.

We observe a slowdown of about 1260x (Heat Equation) and 257x (Shallow Water) when running on the DCSC nodes. We observe a slowdown of about 32.8x (Heat Equation) and 8.5x (Shallow Water) when running on the octuplet nodes. These results clearly show that network bandwidth becomes a bottleneck, with disastrous consequences in terms of execution time and thus a limiting factor for applying the approach for such use. The slowdown is much worse when running on the DCSC compute nodes compared to the slowdown on the octuplet nodes. This is due to the interconnect being 100Mbit ethernet to DCSC, in relation to the 1Gbit ethernet connection to the octuplet nodes.

5 Future Work

The evaluation revealed bandwidth bottlenecks when the machine running the interpreter requests data for purposes such as visualization. The setup in the evaluation was synthetic and forced requests of the entire data-set at each time-step without any transformation of the data; it can, therefore, be regarded as a worst-case scenario. One could argue that the setup is not representative of user behaviour and instead assume that the user would only need a snapshot of the data at every K'th time-step and with lowered resolution, such as every I'th datapoint, thus drastically lowering the bottleneck (see the sketch below). However, to address the issue, future work will involve compressed encoding of the transmitted data as well as suitable downsampling for the visualization purpose.

The primary focus point for future work is now in progress and relates to the effective throughput at each compute-node. The current implementation of the execution engine uses a virtual-machine approach for executing array operations. In this approach the virtual machine delegates execution of each vector bytecode to a statically compiled routine. Within this area, a wealth of optimizations are applicable by composing multiple operations on the same data and thereby fusing array operations together. Random-number generators, linear spaces of data, and iotas, when combined with reductions, are another common source for optimization of memory utilization and locality. Obtaining such optimizations within the runtime requires the use of JIT compilation techniques and potentially increases the use of dynamic loading of optimized codes. The challenge for this part of future work involves exploration of how to obtain such optimizations without losing the performance gained to runtime and JIT compilation overhead.
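The downsampling mitigation mentioned above could, for instance, take the following shape; K, I, and the transmit callback are placeholders and not part of the current implementation:

    import numpy as np

    def maybe_visualize(grid, step, transmit, K=10, I=4):
        # Only ship a reduced snapshot every K'th time-step, keeping every
        # I'th datapoint in each dimension, instead of the full state.
        if step % K == 0:
            snapshot = np.ascontiguousarray(grid[::I, ::I])
            transmit(snapshot)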

6 Conclusions

The work in this paper explores the feasibility of replacing the Python interpreter with an adaptable runtime system, with the purpose of avoiding the CPython scalability issues and providing a means of executing Python programs on restrictive compute nodes which are otherwise unable to run the Python interpreter.

The proxy component, implemented as an extension to the Bohrium runtime system (BRS), provides the means for the BRS to communicate with a single remote instance of the Python interpreter. The prototype implementation enabled evaluation of the proposed approach of the paper.

Allowing the interpreter to execute on any machine, possibly the user's own workstation/laptop, enables a Python user to utilize a cluster of compute nodes or a supercomputer with direct realtime interaction. However, it also introduces concerns with regards to the effect of network latency and available bandwidth, between the machine running the interpreter and the compute node running the proxy client, on program execution. These concerns were the themes for the conducted evaluation.

Results showed that the overhead of adding the proxy component, and thereby the ability for the BRS to use a remote interpreter, was not measurable in terms of elapsed wall-clock time, as results were within two standard deviations of the measured elapsed wall-clock. The results additionally showed a reasonable tolerance to high network latency: at 50ms round-trip-time, the slowdown ranged from not being measurable to 1.3x-1.4x. In the extreme case of 200ms, the slowdown ranged from not being measurable to 1.9x-2.8x.

The primary concern, and focus for future work, presented itself during evaluation of bandwidth requirements. If the Python program requests large amounts of data, then the network throughput capability becomes a bottleneck, severely impacting elapsed wall-clock time as well as saturating the network link, potentially disrupting other users.

The results show that the approach explored within this paper does provide a possible means to avoid the scalability issues of CPython, allowing direct user interaction and enabling execution of Python programs in restricted environments that are otherwise unable to run interpreted Python programs. The approach is, however, restricted to transmission of data such as vector bytecode, scalars for evaluation of convergence criteria, boolean values, and low-volume data-sets between the interpreter-machine and runtime. This does, however, not restrict processing of large-volume datasets within the runtime on and between the compute nodes.

7 Acknowledgments

This research has been partially supported by the Danish Strategic Research Council, Program Committee for Strategic Growth Technologies, for the research center ’HIPERFIT: Functional High Performance Computing for Financial In- formation Technology’ (hiperfit.dk) under contract number 10-092299.

References

1. A. Ahmadia. Solving the import problem: Scalable Dynamic Load- ing Network File Systems. Technical report, Talk at SciPy confer- ence, Austin, Texas, July 2012, 2012. http://pyvideo.org/video/1201/ solving-the-import-problem-scalable-dynamic-load. 2. D. M. Beazley. Automated scientific software scripting with SWIG. volume 19, pages 599–609. Elsevier, 2003. 3. D. M. Beazley et al. SWIG: An easy to use tool for integrating scripting languages with C and C++. In Proceedings of the 4th USENIX Tcl/Tk workshop, pages 129–139, 1996. 4. S. Behnel, R. Bradshaw, C. Citro, L. Dalcin, D. S. Seljebotn, and K. Smith. Cython: The best of both worlds. Computing in Science & Engineering, 13(2):31–39, 2011. 5. F. Black and M. Scholes. The pricing of options and corporate liabilities. The journal of political economy, pages 637–654, 1973. 6. J. Daily and R. R. Lewis. Using the toolkit to reimplement numpy for distributed computation. In Proceedings of the 10th Python in Science Conference, 2011. 7. L. Dalcin, R. Paz, M. Storti, and J. D Elia. MPI for Python: Performance im- provements and MPI-2 extensions. Journal of Parallel and , 68(5):655–662, 2008. 8. L. Drummond, V. Galiano, V. Migallon, and J. Penades. High-Level User Interfaces for the DOE ACTS Collection. In B. Kagstrom, E. Elmroth, J. Dongarra, and J. Wasniewski, editors, Applied Parallel Computing. State of the Art in Scientific Computing, volume 4699 of Lecture Notes in Computer Science, pages 251–259. Springer Berlin Heidelberg, 2007. 9. J. Enkovaaraa, M. Louhivuoria, P. Jovanovicb, V. Slavnicb, and M. R¨annarc.Op- timizing GPAW. Partnership for Advanced Computing in Europe, September 2012. Online: http://www.prace-ri.eu/IMG/pdf/Optimizing_GPAW.pdf. 10. W. Frings, D. H. Ahn, M. LeGendre, T. Gamblin, B. R. de Supinski, and F. Wolf. Massively Parallel Loading. In Proceedings of the 27th International ACM Con- ference on International Conference on Supercomputing, ICS ’13, pages 389–398, New York, NY, USA, 2013. ACM. 11. K. Gawande and C. Webers. PyPETSc User Manual (Revision 1.0). Technical re- port, NICTA, 2009. http://elefant.developer.nicta.com.au/documentation/ userguide/PyPetscManual.pdf. 12. J. D. Hunter. Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering, 9(3):90–95, 2007. 13. D. I. Ketcheson, K. T. Mandli, A. J. Ahmadia, A. Alghamdi, M. Quezada de Luna, M. Parsani, M. G. Knepley, and M. Emmett. PyClaw: Accessible, Extensi- ble, Scalable Tools for Wave Propagation Problems. SIAM Journal on Scientific Computing, 34(4):C210–C231, Nov. 2012. 14. M. R. B. Kristensen and B. Vinter. Numerical Python for scalable architectures. In Proceedings of the Fourth Conference on Partitioned Global Address Space Pro- gramming Model, PGAS ’10, pages 15:1–15:9, New York, NY, USA, 2010. ACM. 15. P. Marion, A. Ahmadia, and B. M. Froehle. Import without a filesystem: scientific Python built-in with static linking and frozen modules. Technical report, Talk at SciPy conference, Austin, Texas, July 2012, 2013. https://www.youtube.com/ watch?v=EOiEIWMYkwE. 16. J. Nagle. Congestion Control in IP/TCP Internetworks. RFC 896, Jan. 1984. 17. T. E. Oliphant. Python for Scientific Computing. Computing in Science & Engi- neering, 9(3):10–20, 2007. 18. F. Prez and B. E. Granger. IPython: A System for Interactive Scientific Computing. Computing in Science & Engineering, 9(3):21–29, 2007. 19. M. R. B. Kristensen, S. A. F. Lund, T. Blum, K. Skovhede, and B. Vinter. Bohrium: a Virtual Machine Approach to Portable Parallelism. 
In Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2014 IEEE 28th International. IEEE, 2014.
20. M. Sala, W. Spotz, and M. Heroux. PyTrilinos: High-Performance Distributed-Memory Solvers for Python. ACM Transactions on Mathematical Software (TOMS), 34, March 2008.
21. K. Smith, W. F. Spotz, and S. Ross-Ross. A Python HPC Framework: PyTrilinos, ODIN, and Seamless. In High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion:, pages 593–599. IEEE, 2012.
22. Z. Zhao, M. Davis, K. Antypas, Y. Yao, R. Lee, and T. Butler. Shared Library Performance on Hopper. In CUG 2012, Greengineering the Future, Stuttgart, Germany, 2012.

6.5 Bohrium: a Virtual Machine Approach to Portable Parallelism


Mads R. B. Kristensen, Simon A. F. Lund, Troels Blum, Kenneth Skovhede, and Brian Vinter
Niels Bohr Institute, University of Copenhagen, Denmark — {madsbk/safl/blum/skovhede/vinter}@nbi.dk

Abstract— In this work we introduce Bohrium, a runtime-system for mapping vector operations onto a number of different hardware platforms, from simple multi-core systems to clusters and GPU enabled systems. In order to make efficient choices Bohrium is implemented as a virtual machine that makes runtime decisions, rather than a statically compiled library, which is the more common approach. In principle, Bohrium can be used for any programming language but for now, the supported languages are limited to Python, C++ and the .Net framework, e.g. C# and F#. The primary success parameters are to maintain a complete abstraction from low-level details and to provide efficient code execution across different, current and future, processors. We evaluate the presented design through a setup that targets a single-core CPU, an eight-node Cluster, and a GPU, all preliminary prototypes. The evaluation includes three well-known benchmark applications, Black Scholes, Shallow Water, and N-body, implemented in C++, Python, and C# respectively.

I. INTRODUCTION

Finding the solution for computational scientific and engineering problems often requires experimenting with various algorithms and different parameters with feedback in several iterations. Therefore, the ability to quickly prototype the solution is critical to timely and successful scientific discovery. In order to accommodate these demands, the scientific community makes use of high-productivity programming languages and libraries. Particularly of interest are languages and libraries that support a declarative vector programming style, such as HPF[1], MATLAB[2], NumPy[3], Blitz++[4], and ILNumerics.Net[5].

In this context declarative means the ability to specify an operation, e.g. addition of two vectors, as a full-vector operation, a + b, instead of explicitly specifying looping and element-indexing: for i in n: a[i]+b[i]. Vector programming, also known as array programming, is of particular interest since full-vector operations are closer to the domain of the application-programmer.

The performance of a high-productivity programming language and/or library is often insufficient to handle problem sizes required in the research. Thus, we see the scientific community reimplement the prototype using another more high-performance framework, which exposes both the complexity and the performance-potential of the underlying hardware. This reimplementation is very time-consuming and a source of errors in the scientific code, especially when the computing environments are highly heterogeneous and require both parallelism and hardware architecture expertise.

Bohrium is a framework that circumvents the need for reimplementation completely. Instead of manually parallelizing the scientific applications for a specific hardware component, the Bohrium framework seamlessly interprets several high-productivity languages and libraries while transparently utilizing the parallel potential of the underlying hardware. The expressed goal of Bohrium is to achieve 80% of the achievable performance compared to a highly optimized implementation. The version of Bohrium we present in this paper is a proof-of-concept that supports three languages; Python, C++, and Common Intermediate Language (CIL)1, and three computer architectures, CPU, Cluster, and GPU. Bohrium defines an intermediate vector bytecode language specialized for the declarative vector programming model and provides a runtime environment for executing the bytecode. The intermediate vector bytecode makes Bohrium a retargetable framework where the front-end languages and the back-end architectures are fully interchangeable.

1 also known as Microsoft .NET

II. RELATED WORK

The key motivation for Bohrium is to provide a framework for the utilization of diverse and complex computing systems with the goal of obtaining high-performance, high-productivity and high-portability, HP3. Systems such as pyOpenCL/pyCUDA[6] provide tools for interfacing a high abstraction front-end language with kernels written for specific, potentially exotic, hardware. In this case, lowering the bar for harvesting the power of modern GPU's, by letting the user write only the GPU-kernels as text strings in the host language Python. The goal is similar to that of Bohrium – the approach however is entirely different. Bohrium provides a means to hide low-level target-specific code behind a programming model, providing a framework and runtime environment to support it.

Bohrium is more closely related to the work described in [7], where a compilation framework, unPython, is provided for execution in a hybrid environment consisting of both CPUs and GPUs. The framework uses a Python/NumPy based front-end that uses Python decorators as hints to do selective optimizations. Bohrium performs data-centric optimizations on vector operations, which can be viewed as akin to selective optimizations, in the respect that we do not optimize the program as a whole. However, the approach used in the
Thus, the data structure in Bohrium is highly flexible and provides elegant programming solutions for a broad range of numerical algorithms. Together with the slicing known from [1], [2], [13], [14], the programming model is very powerful as a high-level, high-productivity programming model (Fig. 1).

Intel provides a similar approach called Intel Array Building Blocks (ArBB) [9] that provides retargetability and dynamic compilation. It is thereby possible to utilize heterogeneous architectures from within standard C++. The retargetability aspect of Intel ArBB is represented in Bohrium as a plain and simple configuration file that defines the Bohrium runtime environment. Intel ArBB provides a high performance library that utilizes a heterogeneous environment and hides the low-level details behind a declarative vector programming model similar to Bohrium. However, ArBB only provides access to the programming model via C++, whereas Bohrium is not limited to any one specific front-end language.

On multiple points Bohrium is closely related in functionality and goals to the SEJITS [10] project. SEJITS takes a different approach towards the front-end and programming model. SEJITS provides a rich set of computational kernels in a high-productivity language such as Python or Ruby. These kernels are then specialized towards an optimality criterion. The programming model in Bohrium does not provide this kernel methodology; instead Bohrium deduces computational kernels at runtime by inspecting the flow of vector bytecode. Bohrium provides in this sense a virtual machine optimized for execution of vector operations; previous work [11] was based on a complete virtual machine for generic execution, whereas Bohrium provides an optimized subset.

III. FRONT-END LANGUAGES

To hide the complexities of obtaining high performance from the diverse hardware making up modern compute systems, any given framework must provide a meaningful high-level abstraction. This can be realized in the form of domain-specific languages, embedded languages, language extensions, libraries, APIs, etc. Such an abstraction serves two purposes: (1) it must provide meaning for the end-user such that the goal of high productivity can be met with satisfaction; (2) it must provide the underlying system with sufficient information to enable an efficient, hardware-specific execution.

In this work, we will not introduce a whole new programming language; instead we introduce bridges that integrate existing languages into the Bohrium framework. The current prototype implementation of Bohrium supports three popular languages: C++, .NET, and Python. Thus, we have three bridges – one for each language.

The C++ and .NET bridges provide a new array library for their respective languages that utilizes the Bohrium framework by mapping array operations to vector bytecode. The new array libraries do not attempt to be compatible with any existing libraries, but rather provide an intuitive interface to the Bohrium functionality. The Python bridge makes use of NumPy, which is the de facto library for scientific computing in Python. The Python bridge implements a new version of NumPy that uses the Bohrium framework for all N-dimensional array computations.

Brief descriptions of how each of the three bridges can be used are given in the following. The Jacobi stencil expression from Figure 1 is used as a running example for each language.

A. C++

The C++ bridge provides an interface to Bohrium as a domain-specific embedded language (DSEL) providing a declarative, high-level programming model. Related libraries and DSELs include Armadillo [15], Blitz++ [4], Eigen [16] and Intel Array Building Blocks [9]. These libraries have similar traits: a declarative programming style through operator overloading, template metaprogramming, and lazy evaluation for applying optimizations and late instantiation.

A key difference is that the C++ bridge applies lazy evaluation at runtime by delegating all operations on arrays to the Bohrium runtime environment, whereas the other libraries apply lazy evaluation at compile time via expression templates. This is a general design choice in Bohrium – evaluation is improved by a single component and not in every language bridge. A positive side effect of avoiding expression templates is shorter compilation times and simpler compile-time error messages for the application programmer.

    double solve(multi_array<double> grid, size_t iterations)
    {
        multi_array<double> center, north, south, east, west;
        center = grid[_(1,-1,1)][_(1,-1,1)];
        north  = grid[_(0,-2,1)][_(1,-1,1)];
        south  = grid[_(2, 0,1)][_(1,-1,1)];
        east   = grid[_(1,-1,1)][_(2, 0,1)];
        west   = grid[_(1,-1,1)][_(0,-2,1)];
        for (size_t i = 0; i < iterations; ++i)
            center(center + 0.2*(north + south + east + west));
    }

Fig. 2: Jacobi stencil computation expressed in C++.

The function "_(start, end, skip)" creates a slice of every skip element from start to (but not including) end. The generated slice is then passed to the overloaded "operator[]" to create a segmented view of the operand. Overloading of "operator=" creates aliases to avoid copying; to explicitly copy an operand the programmer must use a "copy(...)" function. Overloading of "operator()" allows for updating an existing operand, as can be seen in the loop body.

B. CIL

The NumCIL library introduces the declarative vector programming model to the CIL languages [17] and, like ILNumerics.Net, provides an array class that supports full-array operations. In order to utilize Bohrium, the CIL bridge extends NumCIL with a new Bohrium back-end.

The Bohrium extension to NumCIL, and NumCIL itself, is written in C# but with consideration for other languages. Example benchmarks are provided that show how to use NumCIL with other popular languages, such as F# and IronPython. An additional IronPython module is provided which allows a subset of NumPy programs to run unmodified in IronPython with NumCIL. Due to the nature of the CIL, any language that can use NumCIL can also seamlessly utilize the Bohrium extension. The NumCIL library is designed to work with an unmodified compiler and runtime environment and supports Windows, Linux and Mac. It provides both operator overloads and function-based ways to utilize the library.

Where the NumCIL library executes operations when requested, the Bohrium extension uses both lazy evaluation and lazy instantiation. When a side effect can be observed, such as accessing a scalar value, any queued instructions are executed. To avoid problems with garbage collection and memory limits in CIL, access to data is kept outside CIL. This allows lazy instantiation, and allows the Bohrium runtime to avoid costly data transfers.

    using NumCIL.Double;
    using R = NumCIL.Range;

    double Solve(NdArray grid, int iterations)
    {
        var center = grid[R.Slice(1, -1), R.Slice(1, -1)];
        var north  = grid[R.Slice(0, -2), R.Slice(1, -1)];
        var south  = grid[R.Slice(2,  0), R.Slice(1, -1)];
        var east   = grid[R.Slice(1, -1), R.Slice(2,  0)];
        var west   = grid[R.Slice(1, -1), R.Slice(0, -2)];
        for (var i = 0; i < iterations; i++)
            center[R.All] = center + 0.2 * (north + south + east + west);
    }

Fig. 3: Jacobi stencil computation expressed in CIL/C#.

The usage of NumCIL with the C# language is shown in Figure 3. The NdArray class is a typed version of a general multidimensional array, from which multiple views can be extracted. In the example, the Range class is used to extract views on a common base. The notation for views is influenced by Python, in which slices can be expressed as a three-element tuple of offset, length and stride. If the stride is omitted, as in the example, it will have the default value of one. The length will default to zero, which means "the rest", but can also be set to negative numbers, which will be interpreted as "the rest minus N elements". The benefit of this notation is that it becomes possible to express views in terms of relative sizes, instead of hardcoding the sizes.

In the example, the one-line update actually reads multiple data elements from the same memory region and writes the result back. This use of views simplifies concurrent access and removes all problems related to handling boundary conditions and manual pointer arithmetic. The special use of indexing on the target variable is needed to update the contents of the variable, instead of replacing the variable.

C. Python

The Python Bridge is an extension of the scientific Python library, NumPy version 1.6 (Fig. 4). The Bridge seamlessly implements a new array back-end for NumPy and uses hooks to divert function calls where the program accesses Bohrium-enabled NumPy arrays. The hooks will translate a given function into its corresponding Bohrium bytecode when possible. When it is not possible, the hooks will feed the function call back into NumPy and thereby force NumPy to handle the function call itself.

    center = grid[1:-1, 1:-1]
    north  = grid[0:-2, 1:-1]
    south  = grid[2:  , 1:-1]
    east   = grid[1:-1, 0:-2]
    west   = grid[1:-1, 2:  ]
    for i in xrange(iterations):
        center[:] += 0.2 * (north + south + east + west)

Fig. 4: Jacobi stencil computation expressed in Python/NumPy.

The Bridge operates with two address spaces for arrays: the Bohrium space and the NumPy space. The user can explicitly assign new arrays to either the Bohrium or the NumPy space through a new array creation parameter. In two circumstances, it is possible for an array to transfer from one address space to the other implicitly at runtime:

1) When an operation accesses an array in the Bohrium address space but it is not possible for the bridge to translate the operation into Bohrium bytecode. In this case, the bridge will synchronize and move the data to the NumPy address space. For efficiency, no data is actually copied; instead the bridge uses the mremap function to re-map the relevant memory pages when the data is already present in main memory.

2) When an operation accesses arrays in different address spaces, the Bridge will transfer the arrays in the NumPy space to the Bohrium space.

In order to detect direct access to arrays in the Bohrium address space by the user, the original NumPy implementation, a Python library or any other external source, the bridge protects the memory of arrays that are in the Bohrium address space using mprotect. Because of this memory protection, subsequent accesses to the memory will trigger a segmentation fault. The Bridge can then handle this kernel signal by transferring the array to the NumPy address space and cancelling the segmentation fault. This technique makes it possible for the Bridge to support all valid Python/NumPy applications, since it can always fall back to the original NumPy implementation. Similarly to the other Bridges, the Bohrium Bridge uses lazy evaluation, where it records instructions until a side effect can be observed.

Fig. 5: Bohrium Overview. The Bridge is the language bindings and interface to Bohrium, currently for NumPy, C++, and CIL. The Vector Engine Manager (VEM) has a simple interface and can support hierarchical setups; it can distribute and load-balance as required, while a node-level VEM knows about hardware features and schedules operations optimally on the hardware. The Vector Engines (VEs) are the workhorses and know how to implement element-wise and composite operations, currently on CPU and GPU.
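The array-creation parameter mentioned above can be illustrated with a small sketch; the keyword name bohrium= is hypothetical here, chosen only to show the idea of selecting an address space at creation time, and is not necessarily the parameter used by the actual bridge.

    import numpy as np   # with the Bohrium bridge loaded in place of plain NumPy

    # Hypothetical creation parameter selecting the address space:
    a = np.ones((1000, 1000), bohrium=True)    # lives in the Bohrium space
    b = np.ones((1000, 1000), bohrium=False)   # lives in the NumPy space

    c = a + a        # translated to vector bytecode and executed by Bohrium
    d = a + b        # mixed spaces: b is implicitly moved to the Bohrium space
    print(d.sum())   # observing a scalar forces the queued instructions to run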

IV. THE BOHRIUM RUNTIME SYSTEM

The key contribution in this work is a framework, Bohrium, that significantly reduces the costs associated with high-performance program development. Bohrium provides the mechanics to couple a programming language or library with an architecture-specific implementation seamlessly.

Bohrium consists of a number of components that communicate by exchanging a Vector Bytecode. Components are allowed to be architecture-specific, but they are all interchangeable since all use the same communication protocol. The idea is to make it possible to combine components in a setup that matches a specific execution environment. Bohrium consists of the following three component types (Fig. 5):

Bridge – The role of the Bridge is to integrate Bohrium into existing languages and libraries. The Bridge generates the Bohrium bytecode that corresponds to the user code.

Vector Engine Manager (VEM) – The role of the VEM is to manage data location and ownership of vectors. It also manages the distribution of computing jobs between potentially several Vector Engines, hence the name.

Vector Engine (VE) – The VE is the architecture-specific implementation that executes Bohrium bytecode.

When using the Bohrium framework, at least one implementation of each component type must be available. However, the exact component setup depends on the runtime system and what hardware to utilize; e.g., executing NumPy on a single machine using the CPU would require a Bridge implementation for NumPy, a VEM implementation for a machine node, and a VE implementation for a CPU. Now, in order to utilize a GPU instead, we can exchange the CPU-VE with a GPU-VE without having to change a single line of code in the NumPy application. This is exactly the key contribution of Bohrium – the ability to change the execution hardware without changing the user application.

A. Configuration

To make Bohrium as flexible a framework as possible, we manage the setup of all the components at runtime through a configuration file. The idea is that the user or system administrator can specify the hardware setup of the system through an ini-file (Fig. 6). Thus, it is just a matter of editing the configuration file when changing or moving to a new hardware setup, and there is absolutely no need to change the user applications.

    # Bridge for NumPy
    [numpy]
    type = bridge
    children = node

    # Vector Engine Manager for a single machine
    [node]
    type = vem
    impl = libbh_vem_node.so
    children = gpu

    # Vector Engine for a GPU
    [gpu]
    type = ve
    impl = libbh_ve_gpu.so

Fig. 6: This example configuration provides a setup for utilizing a GPU on one machine by instructing the Vector Engine Manager to use the GPU Vector Engine implemented in the shared library libbh_ve_gpu.so.

B. Vector Bytecode

A vital part of Bohrium is the Vector Bytecode that constitutes the link between the high-level user language and the low-level execution engine. The bytecode is designed with the declarative vector programming model in mind, where the bytecode instructions operate on input and output vectors. To avoid excessive memory copying, the vectors can also be shaped into multi-dimensional vectors. These reshaped vector views are then not necessarily comprised of elements that are contiguous in memory. Each dimension is described with a stride and size, such that any regularly shaped subset of the underlying data can be accessed. We have chosen to focus on a simple, yet flexible, data structure that allows us to express any regularly distributed vectors. Figure 7 shows how the shape is implemented and how the data is projected.

The aim is to have a vector bytecode that supports data parallelism implicitly and thus makes it easy for the bridge to translate the user language into the bytecode efficiently. Additionally, the design enables the VE to exploit data parallelism through SIMD (Single Instruction, Multiple Data) and the VEM through SPMD (Single Program, Multiple Data). In the following we will go through the four types of vector bytecode in Bohrium.
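To make the descriptor of Figure 7 concrete, the following is a minimal Python mirror of the vector-view idea, assuming descriptive field names rather than Bohrium's actual struct members. It shows how start, shape and stride map an n-dimensional index onto the flat base data.

    import numpy as np

    class VectorView:
        def __init__(self, base, dtype, start, shape, stride):
            self.base = base          # flat base data (allocated lazily in Bohrium)
            self.dtype = dtype        # e.g. float64
            self.start = start        # offset of the first element
            self.shape = shape        # size per dimension
            self.stride = stride      # elements skipped per step in each dimension

        def element(self, index):
            # Map an n-dimensional index to an offset into the flat base data.
            offset = self.start + sum(i * s for i, s in zip(index, self.stride))
            return self.base[offset]

    base = np.arange(14, dtype=np.float64)   # 0, 1, ..., 13 as in Fig. 7
    view = VectorView(base, np.float64, start=0, shape=(2, 2, 2), stride=(7, 3, 1))
    print(view.element((1, 1, 0)))           # base[0 + 1*7 + 1*3 + 0*1] = 10.0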

Fig. 7: Descriptor for an n-dimensional vector and corresponding interpretation. The descriptor holds a pointer to the base data, the data type (here float64), the number of dimensions (ndim = 3), a start offset (start = 0), and a shape and stride per dimension (shape = (2, 2, 2), stride = (7, 3, 1)); the outer dimension skips 7 elements of the base data per step, the middle dimension 3, and the inner dimension 1, producing the seen 3-d array.

1) Element-wise: Element-wise bytecodes perform a unary or binary operation on all vector elements. At the moment Bohrium supports 53 element-wise operations, e.g. addition, multiplication, square root, equal, less than, logical and, bitwise and, etc. For element-wise operations, we only allow data overlap between the input and the output vectors if the access pattern is the same, which, combined with the fact that they are all stateless, makes it straightforward to execute them in parallel.

2) Reduction: Reduction bytecodes reduce an input dimension using a binary operator. Again, we do not allow data overlap between the input and the output vectors, and the operator must be associative. At the moment Bohrium supports 10 reductions, e.g. addition, multiplication, minimum, etc. Even though none of them are stateless, the reductions are all straightforward to execute in parallel because of the non-overlap and associativity properties.

3) Data Management: Data Management bytecodes determine the data ownership of vectors. This type consists of a synchronization operator that orders a child component to place the vector data in the address space of its parent component; a free operator that orders a child component to free the data of a given vector in the global address space; and a discard operator that orders a child component to free the meta-data associated with a given vector and signals that any local copy of the data is invalidated. The three bytecodes enable lazy allocation, where the actual vector data allocation is delayed until it is used. Often vectors are created with a generator (e.g. random, constants) or with no data (e.g. temporaries), which may exist on the computing device exclusively. Thus, lazy allocation may save several memory allocations and copies.

4) User-extensions: The above three types of bytecode make up the bulk of a Bohrium execution. However, they do not constitute a Turing complete instruction set: the instruction set produces executions with a deterministic, finite execution time and memory use, and thus cannot be Turing complete. In order to handle operations that would otherwise be impossible, we introduce the fourth type of bytecode: user-extensions. We impose no restrictions on this generic operation; the extension writer has total freedom. However, Bohrium does not guarantee that all components support the operation. Initially, the user registers the user-extension with paths to all component-specific implementations of the operation. The user then receives a new handle for this user-defined bytecode and may use it subsequently. Matrix multiplication is a typical example of a user-extension: a CPU-specific implementation could simply call a native BLAS library, and a Cluster-specific implementation could call the ScaLAPACK library [18].

C. Bridge

The Bridge component is the bridge between the programming interface, e.g. Python/NumPy, and the VEM. The Bridge is the only component that is specifically implemented for the user programming language. In order to add Bohrium support to a new language or library, only the bridge component needs to be implemented. The bridge component generates bytecode based on the user application and sends it to the underlying VEM.

D. Vector Engine Manager

Rather than allowing the Bridge to communicate directly with the Vector Engine, we introduce a Vector Engine Manager into the design. The VEM is responsible for one memory address space in the hardware configuration. The current version of Bohrium implements two VEMs: the Node-VEM that handles the local address space of a single computer and the Cluster-VEM that handles the global distributed address space of a computer cluster.

The Node-VEM is very simple, since the hardware already provides a shared memory address space; hence, the Node-VEM can simply forward all instructions from its parent to its child component. The Cluster-VEM, on the other hand, has to distribute all vectors between the Node-VEMs in the cluster.

1) Cluster Architectures: In order to utilize scalable architectures fully, distributed memory parallelism is mandatory. The current Cluster-VEM implementation is quite naïve; it uses the bulk-synchronous parallel model [19] with static data decomposition and no communication latency hiding. Bohrium implements all communication through the MPI-2 library and uses a process hierarchy that consists of one master-process and multiple slave-processes. The master-process executes a regular Bohrium setup with the Bridge, Cluster-VEM, Node-VEM, and VE. The slave-processes, on the other hand, execute the same setup but without the Bridge and thus without the user application. Instead, the master-process will broadcast vector bytecode and vector meta-data to the slave-processes throughout the execution of the user application.

Bohrium uses a data-centric approach where a static decomposition dictates the data distribution between the MPI processes. Because of this static data decomposition, all processes have full knowledge of the data distribution and need not exchange data location meta-data. Furthermore, the task of computing vector operations is also statically distributed, which means that all processes can calculate exactly what they need to send, receive, and compute. Meta-data communication is only needed when broadcasting vector bytecode and creating new vectors – a task that has an asymptotic complexity of O(log2 n).
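The master/slave, bulk-synchronous pattern described above can be sketched in a few lines of Python with mpi4py. This is an illustration only, not Bohrium's actual implementation; produce_batch is a hypothetical stand-in for the Bridge producing vector bytecode.

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    def produce_batch(step):
        # Hypothetical: only the master has a Bridge producing vector bytecode.
        return ["ADD", "MUL", "SYNC"] if step < 10 else None

    step = 0
    while True:
        batch = produce_batch(step) if rank == 0 else None
        batch = comm.bcast(batch, root=0)   # broadcast bytecode and meta-data
        if batch is None:                   # no more work: every process stops
            break
        # Every process now executes the same batch on its statically
        # assigned partition of the distributed vectors.
        step += 1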

E. Vector Engine

The Vector Engine (VE) is the only component that actually does the computations specified by the user application. It has to execute the instructions it receives in an order that complies with the dependencies between instructions. Furthermore, it has to ensure that its parent VEM has access to the results as governed by the Data Management bytecodes.

1) CPU: The CPU-VE utilizes a single core on a regular CPU using a straightforward implementation that handles instructions in a sequential manner without any instruction reordering. The implementation is in C++ and uses templated for-loops to implement vector operations. A precompiled switch interprets the vector bytecode and invokes the for-loop with the relevant type signature. All operations are inlined as functors, thus removing the need for function calls within the for-loop.

2) GPU: To harness the computational power of the modern GPU we have created the GPU-VE for Bohrium. Since Bohrium imposes a vector-oriented style of programming on the user, which directly maps to data-parallel execution, Bohrium bytecode is a perfect match for a modern GPU. We have chosen to implement the GPU-VE in OpenCL over CUDA. This was the natural choice since one of the major goals of Bohrium is portability, and OpenCL is supported by more platforms.

The GPU-VE currently uses a simple kernel building and code generation scheme: it will keep adding instructions to the current kernel for as long as the shape of the instruction output matches that of the current kernel. Input parameters are registered so they can be read from global memory. Similarly, output parameters are registered to be written back to global memory.

The GPU-VE implements a simple method for temporary vector elimination when building kernels:
• If the instruction input is already read by the kernel, or it is generated within the kernel, it will not be read from global memory.
• If the instruction output is not needed later in the instruction sequence – signaled by a discard – it will not be written back to global memory.

This simple scheme has proven very efficient. However, the efficiency is closely linked to the ability of the bridge to send discards close to the last usage of a vector. The code generation in the GPU-VE simply translates every Bohrium instruction into exactly one line of OpenCL code.

V. PRELIMINARY RESULTS

In order to demonstrate our Bohrium design we have implemented a basic Bohrium setup. This concretization of Bohrium is by no means exhaustive but only a proof-of-concept implementation. It supports the three popular languages C++, CIL, and Python, and the three computer architectures CPU, GPU, and Cluster – all of which are preliminary implementations that have a high degree of further optimization potential. We conduct a preliminary performance study of the implementation that consists of the following three representative scientific application kernels:

Shallow Water – A simulation of a system governed by the shallow water equations. A drop is placed in a still container and the water movement is simulated in discrete time steps. It is a Python/NumPy implementation of a MATLAB application by Burkardt [20]. We use this benchmark for studying the Python/NumPy performance in Bohrium and compare the performance of the same implementation using the Bohrium back-end and the native NumPy back-end.

Black Scholes – The Black-Scholes model is a partial differential equation, which is used in finance for calculating price variations over time. In order to study the performance of Bohrium in C++, we compare two C++ implementations of this benchmark, one using Bohrium and one using Blitz++. The two implementations are very similar and both use vector operations almost exclusively.

N-Body – A Newtonian N-body simulation is one that studies how bodies, represented by a mass, a location, and a velocity, move in space according to the laws of Newtonian physics. We use a straightforward algorithm that computes all body-body interactions, O(n^2), with collision detection. It is a C# implementation that uses the NumCIL vector library [17]. We compare the performance of the implementation using the Bohrium back-end and the native NumCIL back-end.
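To give a concrete impression of the vector-oriented style used in these benchmarks, the following is a minimal NumPy sketch of the Black-Scholes call-price computation. It is illustrative only – the benchmark in this study is a C++ implementation – and it assumes SciPy is available for the vectorized error function.

    import numpy as np
    from scipy.special import erf

    def cdf(x):
        # Standard normal cumulative distribution function, vectorized.
        return 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

    def black_scholes_call(S, K, T, r, sigma):
        # S: spot prices (array), K: strike, T: time to maturity,
        # r: risk-free rate, sigma: volatility.
        d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
        d2 = d1 - sigma * np.sqrt(T)
        return S * cdf(d1) - K * np.exp(-r * T) * cdf(d2)

    prices = black_scholes_call(np.random.uniform(58.0, 62.0, 10**6),
                                K=60.0, T=1.0, r=0.01, sigma=0.2)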

We execute all three applications using three different hardware setups: one using a single CPU, one using an eight-node cluster, and one using a GPU. The single CPU setup uses one of the cluster nodes, whereas the GPU setup uses a new machine (Table I). For each benchmark/language we compare the Bohrium execution with the baseline execution and calculate the speedup based on the average wall-clock time of five executions. When executing on the single CPU, we use one CPU core; likewise, when executing on the eight-node cluster, we use one CPU core per node. The input and output data is 64-bit floating point for all executions.

TABLE I: Machine Specifications

    Machine:    8-node Cluster      GPU Host
    Processor:  AMD Opteron 6272    AMD Opteron 6274
    Clock:      2.1 GHz             2.2 GHz
    L3 Cache:   16 MB               16 MB
    Memory:     128 GB DDR3         128 GB DDR3
    Compiler:   GCC 4.6.3           GCC 4.6.3 & OpenCL 1.1
    GPU:        N/A                 Nvidia GeForce GTX 680 (2 GB DDR5)
    Software:   Linux 3.2, Mono Compiler 2.10, Python 2.7, NumPy 2.6, Blitz++ 0.9

A. Discussion

The Shallow Water application is memory intensive and uses many temporary vectors. This is clear when comparing the Bohrium execution with the native NumPy execution on a single CPU: the Bohrium execution is 2.10 times faster than the native NumPy execution, primarily because of memory allocation reuse. The Cluster setup demonstrates good scalable performance as well. Even without communication latency hiding, it achieves a speedup of 6.37 compared to the single-CPU setup and 13.27 compared to native NumPy. Finally, the GPU shows an impressive 143.68 speedup, which demonstrates the efficiency of parallelizing vector operations on a vector machine.

Fig. 8: Relative speedup of the Shallow Water application. For the CPU and Cluster, the application simulates a 2D domain with 25k^2 value points in 10 iterations. For the GPU, it is a 4k^2 domain in 100 iterations. (Speedups: CPU 2.10, Cluster 13.27, GPU 143.68.)

The Black Scholes application is computation intensive and embarrassingly parallel, which is evident in the benchmark results. The Cluster setup achieves a speedup of 6.93 compared to Blitz++ and an almost linear speedup of 7.93 compared to the single CPU. The GPU achieves a slightly lower speedup, mainly because it is compared to a faster baseline, i.e. Blitz++ is faster than NumPy, resulting in a speedup of 82.65 on the GPU.

Fig. 9: Relative speedup of the Black Scholes application. For the CPU and Cluster, the application generates 10m element vectors using 10 iterations. For the GPU, it generates 64m element vectors using 50 iterations. (Speedups: CPU 0.87, Cluster 6.93, GPU 82.65.)

The Cluster performance of the N-Body application is not very impressive – a speedup of 4.21 compared to NumCIL and 2.18 compared to Bohrium using a single CPU. The problem is the use of communication-intensive transpose operations that translate into all-to-all communication and hurt scalability, especially since the Cluster does not implement communication latency hiding. The Bohrium execution shows a speedup of 1.93 compared to the NumCIL execution on a single CPU; this is because Bohrium uses inline function calls when traversing a computation loop – a technique that is not possible in the CIL framework. Finally, the GPU demonstrates a good speedup of 133.66 compared to NumCIL.

Fig. 10: Relative speedup of the N-Body application. For the CPU and Cluster, the application simulates 15k bodies in 10 iterations. For the GPU, it is 3200 bodies and 50 iterations. (Speedups: CPU 1.93, Cluster 4.21, GPU 133.66.)

VI. FUTURE WORK

From the experiments we can see that the performance is generally quite good. There is room for improvement when distributing transposed data in the cluster, which is a deficiency that stems from the need to broadcast all elements to all nodes. This problem is not trivial to solve, but we have previously shown an efficient solution in the DistNumPy [21], [22] project.

Despite the good results, we are convinced that we can improve these results significantly. We are currently working on an internal representation for bytecode dependencies, which will enable us to rearrange the instructions and eliminate the use of temporary storage. In the article describing Intel Array Building Blocks, they report that the removal of temporary arrays is the single optimization that yields the greatest performance improvement. Internal testing with manual removal of temporary storage shows an order of magnitude improvement, even for simple benchmarks.

The GPU vector engine already uses a simple scanning algorithm that detects some instances of temporary vector usage, as that is required to avoid exhausting the limited GPU memory. However, the internal representation will enable better detection of temporary storage, but also enable loop detection and improve kernel generation and kernel reusability.

This internal representation will also allow pattern matching, which will allow selective replacement of parts of the instruction stream with optimized versions. This can be used to detect cases where the user is calculating a scalar sum using a series of reductions, or to detect matrix multiplications. By implementing efficient micro-kernels for known computations, we can improve the execution significantly. As these kernels are implemented, it is simple to offer them as function calls in the bridges. The bridge implementation can then simply implement the functionality by sending a pre-coded sequence of instructions.

We are also investigating the possibility of implementing a Bohrium Processing Unit, BPU, on FPGAs. With a BPU, we expect to achieve performance that rivals the best of today's GPUs, but with a lower power consumption. As the FPGAs come with a built-in Ethernet socket, they can also provide significantly lower latency, possibly providing real-time data analysis.

Finally, the ultimate goal of the Bohrium project is to support clusters of heterogeneous computation nodes where components specialized for GPUs, NUMA (Non-Uniform Memory Access) aware multi-core CPUs, and Clusters work together seamlessly.

VII. CONCLUSION

The declarative vector-programming model used in Bohrium provides a framework for high performance and high productivity. It enables the end-user to execute vectorized applications on a broad range of hardware architectures efficiently without any hardware-specific knowledge. Furthermore, the Bohrium design supports scalable architectures such as clusters and supercomputers. It is even possible to combine architectures in order to exploit hybrid programming where multiple levels of parallelism exist, which is essential when fully utilizing supercomputers such as the Blue Gene/P [23].

In this paper, we introduce a proof-of-concept implementation of Bohrium that supports three front-end languages – Python, C++ and .NET – and three back-end hardware architectures – single-core CPUs, distributed-memory clusters, and GPUs. The preliminary results are very promising – a Shallow Water simulation achieves a 143.68 speedup when comparing a native NumPy execution and a Bohrium execution that utilizes the GPU back-end.

REFERENCES

[1] D. Loveman, "High performance Fortran," Parallel & Distributed Technology: Systems & Applications, IEEE, vol. 1, no. 1, pp. 25–42, 1993.
[2] W. Yang, W. Cao, T. Chung, and J. Morris, Applied Numerical Methods Using MATLAB. Wiley-Interscience, 2005.
[3] T. Oliphant, A Guide to NumPy. Trelgol Publishing USA, 2006, vol. 1.
[4] T. Veldhuizen, "Arrays in Blitz++," in Computing in Object-Oriented Parallel Environments, ser. Lecture Notes in Computer Science, D. Caromel, R. Oldehoeft, and M. Tholburn, Eds. Springer Berlin Heidelberg, 1998, vol. 1505, pp. 223–230.
[5] "ILNumerics," http://ilnumerics.net/, [Online; accessed 12 March 2013].
[6] A. Klöckner, N. Pinto, Y. Lee, B. Catanzaro, P. Ivanov, and A. Fasih, "PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation," Parallel Computing, vol. 38, no. 3, pp. 157–174, 2012.
[7] R. Garg and J. N. Amaral, "Compiling Python to a hybrid execution environment," in Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, ser. GPGPU '10. New York, NY, USA: ACM, 2010, pp. 19–30.
[8] D. Tarditi, S. Puri, and J. Oglesby, "Accelerator: Using data parallelism to program GPUs for general-purpose uses," SIGARCH Comput. Archit. News, vol. 34, no. 5, pp. 325–335, Oct. 2006.
[9] C. Newburn, B. So, Z. Liu, M. McCool, A. Ghuloum, S. Toit, Z. G. Wang, Z. H. Du, Y. Chen, G. Wu, P. Guo, Z. Liu, and D. Zhang, "Intel's Array Building Blocks: A retargetable, dynamic compiler and embedded language," in Code Generation and Optimization (CGO), 2011 9th Annual IEEE/ACM International Symposium on, 2011, pp. 224–235.
[10] B. Catanzaro, S. Kamil, Y. Lee, K. Asanovic, J. Demmel, K. Keutzer, J. Shalf, K. Yelick, and A. Fox, "SEJITS: Getting productivity and performance with selective embedded JIT specialization," Programming Models for Emerging Architectures, 2009.
[11] R. Andersen and B. Vinter, "The Scientific Byte Code Virtual Machine," in GCA'08, 2008, pp. 175–181.
[12] K. E. Iverson, A Programming Language. New York, NY, USA: John Wiley & Sons, Inc., 1962.
[13] B. Mailloux, J. Peck, and C. Koster, "Report on the algorithmic language ALGOL 68," Numerische Mathematik, vol. 14, no. 2, pp. 79–218, 1969. [Online]. Available: http://dx.doi.org/10.1007/BF02163002
[14] S. Van Der Walt, S. Colbert, and G. Varoquaux, "The NumPy array: A structure for efficient numerical computation," Computing in Science & Engineering, vol. 13, no. 2, pp. 22–30, 2011.
[15] C. Sanderson et al., "Armadillo: An open source C++ linear algebra library for fast prototyping and computationally intensive experiments," Technical report, NICTA, 2010.
[16] "Eigen," http://eigen.tuxfamily.org/, [Online; accessed 12 March 2013].
[17] K. Skovhede and B. Vinter, "NumCIL: Numeric operations in the Common Intermediate Language," Journal of Next Generation Information Technology, vol. 4, no. 1, 2013.
[18] L. S. Blackford, "ScaLAPACK," in Proceedings of the 1996 ACM/IEEE Conference on Supercomputing, 1996, p. 5.
[19] L. G. Valiant, "A bridging model for parallel computation," Commun. ACM, vol. 33, no. 8, pp. 103–111, Aug. 1990.
[20] J. Burkardt, "Shallow water equations," people.sc.fsu.edu/~jburkardt/m_src/shallow_water_2d/, [Online; accessed March 2010].
[21] M. R. B. Kristensen and B. Vinter, "Numerical Python for scalable architectures," in Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model, ser. PGAS '10. New York, NY, USA: ACM, 2010, pp. 15:1–15:9.
[22] M. R. B. Kristensen, Y. Zheng, and B. Vinter, "PGAS for distributed numerical Python targeting multi-core clusters," Parallel and Distributed Processing Symposium, International, vol. 0, pp. 680–690, 2012.
[23] M. Kristensen, H. Happe, and B. Vinter, "GPAW optimized for Blue Gene/P using hybrid programming," in Parallel & Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on, 2009, pp. 1–6.


6.6 Just-In-Time Compilation of NumPy Vector Operations

GSTF Journal on Computing (JoC), Vol. 3 No. 3, Dec 2013

Just-In-Time Compilation of NumPy Vector Operations

Johannes Lund, Mads R. B. Kristensen, Simon A. F. Lund, and Brian Vinter
Niels Bohr Institute, University of Copenhagen, Denmark
[email protected] and {madsbk/safl/vinter}@nbi.dk

Abstract—In this paper, we introduce JIT compilation for the high-productivity framework Python/NumPy in order to boost its performance significantly. The JIT compilation of Python/NumPy is completely transparent to the user – the runtime system will automatically JIT compile and execute the NumPy instructions encountered in a Python application. In other words, we introduce a framework that provides the high productivity of Python while maintaining the high performance of a low-level, compiled language.

We transform NumPy vector instructions into an Abstract Syntax Tree representation that creates the basis for further optimizations. From the AST we auto-generate C code, which we compile into computational kernels and execute. These incorporate temporary array removal and loop fusion, which are the main benefactors in the achieved speedups. In order to amortize the overhead of creation, we also implement a cache for the compiled kernels.

We evaluate the JIT compilation by executing several scientific computing benchmarks on an AMD CPU. Compared to NumPy, we achieve speedups of a factor 4.72 for an N-Body application and 7.51 for a Jacobi Stencil application executing on a single CPU core.

Keywords—JIT, automatic, dynamic, runtime

I. INTRODUCTION

Many scientific algorithms can be expressed by using vector operations and linear algebra. These are easily expressed in specialized high-level languages such as the NumPy library for Python. However, their performance is often significantly lower than when implemented and computed in a low-level language. A common workflow is therefore to use the high-level language for prototyping and to re-implement the found solution in a low-level language when it is required to run on actual-size data.

Expressing the data and calculations efficiently in a low-level language such as C is far from a trivial task. It requires an in-depth understanding to implement this efficiently on heterogeneous hardware architectures. We wish to bridge the gap between the two extremes by allowing scientists to express their problems in a favorable high-level language and at the same time achieve the performance of a complex low-level language implementation. Thus, the goal of this paper is to improve the performance of Python/NumPy applications to a degree that makes it similar to low-level languages such as C or C++. We do not expect it to outperform hand-optimized C code; as long as it demonstrates similar performance while retaining the high productivity of the Python language, we are satisfied.

In order to improve the performance of Python/NumPy, we introduce a Just-In-Time (JIT) compiler backend for the NumPy library. In order to hook into the NumPy library, we make use of the Bohrium runtime system [1], which translates NumPy vector operations into an intermediate vector bytecode suitable for JIT compilation. Because Python is an interpreted language, we use lazy evaluation of vector instructions in order to have multiple instructions available to analyze, optimize, and JIT compile.

The following methods constitute the key contributions to the performance improvement of Python/NumPy applications using our JIT compiler backend:
• Removal of temporary arrays
• Loop fusion
• Compiled kernel caching

II. RELATED WORK

The key motivation for our JIT back-end is to automatically transform high-level Python/NumPy applications into compiled executable kernels, with the goal of obtaining high performance, high productivity and high portability, HP3.

Our work is closely related to the work described in [2], where a compilation framework, unPython, compiles Python code into C. The framework uses Python decorators as hints to do selective optimizations. Particularly, the user must annotate variables with C data types. Because of the Bohrium runtime system, our JIT backend does not require any modifications to the Python code.

Systems such as pyOpenCL/pyCUDA [3] provide tools for interfacing with the OpenCL and CUDA frameworks directly from Python. They lower the bar for harvesting the power of modern systems by letting the user write CPU or GPU kernels as text strings in Python. Still, the user needs knowledge of the underlying hardware and must modify existing Python code in order to utilize them.

III. ANALYSIS

In this section we present an analysis of the requirements and solutions for using JIT compilation of NumPy vector instructions. There are many different elements required in a framework for JIT compilation, which all must be designed and implemented.
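As a small illustration of the first two contributions (an assumed example, not taken from the paper's benchmarks): in plain NumPy an expression such as the one below materializes an intermediate array for every binary operation, and each operation is a separate pass over memory; temporary-array removal combined with loop fusion corresponds to evaluating the whole expression in a single loop without intermediates.

    import numpy as np

    a = np.random.rand(10**6)
    b = np.random.rand(10**6)
    c = np.random.rand(10**6)

    # Plain NumPy: (a + b) allocates a temporary array, which is then
    # multiplied by c in a second pass over memory.
    d = (a + b) * c

    # The effect of temporary-array removal and loop fusion corresponds to
    # evaluating the expression element by element in one pass
    # (conceptually one generated C loop):
    d_fused = np.empty_like(a)
    for i in range(a.size):
        d_fused[i] = (a[i] + b[i]) * c[i]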



A. Creating composite expressions

At runtime we have a list of vector instructions available which contains information about the relations between the instructions. We can use this relational information to build composite expressions and create computational kernels based on these. In this subsection we investigate methods to extract and analyze the information from the instruction list.

1) Naive approach: The naive approach involves folding the instructions into larger expressions. Examining the instruction list in order, by comparing the output of an instruction with the input of the next, it can be determined whether the first is a subexpression of the latter. As many expressions are organized as a chain of instructions, this naive method would work well on many of the expressions.

Creating composite expressions and computational kernels directly from the list would be straightforward. In the case where the output is not used, a new composite expression is built. For each of the following expressions which uses the previous output in its input, the expression grows by substituting the new input with the expression. With this approach the code for the kernels could be created directly in the first pass through the instruction list. With the order of execution being the same as for the instructions, the data dependencies between the instructions are not relevant.

This approach only handles relations in chains and would not combine expressions where both the left and right inputs are results of prior instructions. The only information used would be that found between two instructions; we have information about the relations between all instructions in the list, and we should use it.

2) Abstract Representation: The abstract approach targets the shortcomings of the naive approach and is the method used. This is initially done by splitting the creation of kernels from the data extraction and analysis. The instruction list is translated into an abstract representation in which the information between all instructions is represented.

The expressions are created as Abstract Syntax Trees (ASTs) from the mathematical expressions in the instruction list. The transformation from a batch of instructions results in a set of ASTs, which are later converted into computational kernels. It is a more complex solution compared to the above and requires the use of compiler techniques such as Static Single Assignment (SSA) and the creation of dependency graphs. We use the AST as our working representation of the instructions for the following reasons:
1) It is not sensitive to the order of instructions but to the semantic meaning: semantically equal expressions result in the same ASTs or can easily be transformed into them.
2) The tree data structure is well known and easy to work with, analyze and optimize.
3) In the creation of an AST, temporary arrays are syntactically removed.
4) The general structure can be used to represent more complicated ASTs, which ensures that later extensions to the form are possible.

B. Execution Orchestration

With the change of representation from a list of instructions into a set of expressions, it must be determined how these are to be executed. In the list the order was given, but with the ASTs the choice is not as simple. We present different orchestration methods for the ASTs and the resulting kernels and argue for the methods used in our solution. The orchestration of the ASTs for execution can take the form of a list or a graph:
• The list execution follows the sequential execution of the kernels of the ASTs. The ASTs are all rooted in an array assignment, which originally is represented as an instruction. The ASTs are arranged and executed in this order. This approach is well suited for single-core execution, as all operations must be performed in sequential order. It can be seen as a flattened graph, since the graph information is still available within the ASTs.
• Orchestration based on the dependencies between the ASTs, represented as a graph. The dependencies between ASTs result in sequential paths, and the graph represents the relations between the ASTs. From this it can be determined whether ASTs are independent of each other and thus whether they can be executed in parallel.

The list model is chosen for its simplicity and because the framework targets a single CPU core. The relations between the ASTs have purposes in different optimization methods, which are relevant for both single- and multi-core scenarios. Within the ASTs the relational information is used to discover dependency violations and to secure correctness among the ASTs. The correct initial dependencies are vital for checking data dependencies spanning multiple ASTs.

C. Representation of Kernel Functions

To execute the ASTs we must represent them in the form of a programming language which we can execute. This could in principle be any language, but there is a clear demand for a highly efficient language, which narrows down the field of candidates. The considered options were the following:
• C/C++
• Assembler

The C language was chosen as it is supported everywhere and can be very efficient. The choice of language is connected to the choice of compiler as well; for the C language there is a number of compilers available. The kernels are pretty simple and require no extended functionality for which C++ would be of value.

Using Assembler to create the kernels would move the implementation closer to the metal than with C and potentially perform better. The kernels are rather simple, as they consist of a traversal of a multidimensional array and an equation, so for the simple uses the assembler version would be fairly straightforward to implement. Being close to the metal also has its drawbacks: in order to take advantage of the different architectures, their special instructions must be used, which requires multiple implementations to work efficiently in a heterogeneous hardware environment.
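A minimal sketch of the naive chain-folding idea described above is given below; the instruction format and helper names are illustrative, not the framework's actual data structures.

    from collections import namedtuple

    Instr = namedtuple("Instr", "opcode out in1 in2")   # e.g. ('+', 't1', 'a', 'b')

    def fold_chain(instructions):
        """Fold a chain of instructions into nested composite expressions."""
        expr = {}   # maps array name -> expression string computing it
        for ins in instructions:
            lhs = expr.get(ins.in1, ins.in1)   # substitute a prior result, if any
            rhs = expr.get(ins.in2, ins.in2)
            expr[ins.out] = "(%s %s %s)" % (lhs, ins.opcode, rhs)
        return expr

    batch = [Instr("+", "t1", "a", "b"),
             Instr("*", "d",  "t1", "c")]
    print(fold_chain(batch)["d"])    # prints "((a + b) * c)"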


The machine code produced by modern compilers is very efficient. The newest advances in CPU design are integrated in these optimizations, which can include the use of vendor-specific optimizations, SSE instructions, function inlining or even loop unrolling. These and many other optimizations are available in modern compilers such as GCC [4] and Clang/LLVM [5]. Achieving these benefits with hand-written machine-code kernels would not be a practical solution.

An alternative to native code is the Intermediate Language (IL) used internally in LLVM. The ASTs and traversal would be expressed in the IL language and compiled with LLVM. With this comes the possibility to apply specific optimizations to the code in the compilation phase.

A second alternative could be OpenCL code targeted at the CPU. The language is based on C99 with a few extensions and can be compiled for both CPUs and GPUs. The OpenCL framework targets multi- and many-core architectures where concurrency and parallelism are in focus.

C is chosen for the kernel representation, as it is best suited for the task and it will be reasonably fast to investigate future optimizations to the kernels.

D. Kernel Compilation

The choice of compiler is strongly coupled with the choice of language. With C chosen, there are still different approaches to take on compilation:
• Command-line compilation and dynamic linking: write the kernel program to a file, which is then compiled with a compiler from the command line.
• In-memory compilation: compile from a library function, where the kernel function code is read as strings and the result is a function pointer.

The command-line method is a simple approach, as most Linux systems have access to a C compiler such as GCC. It is required to write the function code to a file, as this is the input. By compiling the code into a shared object file, it can be linked into the running program; this is done with the dlopen() function.

The method of using an in-memory compiler eliminates the need to perform disk I/O operations and system calls; it is all handled in memory and within the program execution. The Tiny C Compiler [6] (TCC) is such a library, which offers the ability to compile a string of C code into machine code and return function pointers to the compiled functions. TCC is a very small and very simple library which is easy to use. Unfortunately, the quality of the resulting machine code is far from that of GCC.

The library for Clang is part of LLVM. Here the C code would be read and compiled into the LLVM IL language, and from this into machine code using the LLVM backend. The LLVM and Clang libraries are both very large and complex APIs to work with.

TCC was initially used but was replaced with GCC when it became clear that performance was an issue; the quality of the generated code is more important than the compile time. The initial investigation into LLVM revealed a large and complex framework, of which only the compilation part was of interest. As GCC and LLVM generally produce code of equal quality [7], the expected outcome of using LLVM over GCC is to reduce the compilation time and make the implementation prettier. The GCC method is chosen as the kernel compiler due to its simple approach, availability and execution performance.

E. Cache

Caching in the JIT framework is related to the computational kernels. Many of the same instructions are reoccurring as a result of loops in the host programs; the same ASTs are created and thus the same kernels. The use of caches is based on the assumption that each kernel is used multiple times, which is the case in most scientific NumPy applications.

The reason to use a cache for the JIT-compiled kernels is to reduce the time spent on creation and compilation of the kernels for every AST. There will be an overhead of creating the kernels, but the overall effect can be reduced significantly by using a cache for the kernels.

The number of unique kernels depends on the number of unique ASTs created. We do not expect a large number of kernels, as many scientific applications use the same computations multiple times. Running the Shallow Water benchmark, which produces the most kernels of the benchmarks used, only 11 kernels are created in total.

We have decided to use a non-persisted cache for the kernels, where the kernels are created and used as the program executes. The on-the-fly strategy fits the needs of the JIT framework very well: creating the kernels is fast compared to the execution time and only a few must be created. The small time difference between loading the kernels or creating them is insignificant compared to the runtime. The benefit from the cache is the large number of identical kernels fetched from memory as opposed to being created.

IV. DESIGN AND IMPLEMENTATION

The goal of the system is to transform the list of unary and binary vector instructions into computational kernels and execute these instead of a series of individual instructions. We described the various parts in the analysis section and here define four phases in which the various parts are organized. The phases, which are illustrated in Figure 1, are:
1) AST creation: information gathering, analysis and creation of composite expressions.
2) Optimize and Orchestrate: determining the flow of the execution and performing optimizations on its abstract form.
3) Kernel creation: code generation and compilation.
4) Execution and caching.

In the first phase the instruction list is transformed into a set of ASTs which are organized in a Nametable. The initial ASTs are analyzed for base-array dependencies, as these are not reflected in the Nametable after the initial creation. After the dependency corrections have been performed, the collection of ASTs is a valid representation of the instruction list.

In phase two the forest of ASTs is orchestrated into an execution list. This phase is the place for AST-based analysis and optimizations such as grouping of multiple ASTs into one, dead-code elimination or other optimizations. Preparations for more advanced code generation methods would be part of this phase as well.

In phase three we transform the ASTs into kernel functions, which are compiled and linked into the running program. This involves code generation and compilation.

Phase four handles the execution of the kernels, either directly handed down from phase three or extracted from the cache.

Fig. 1. JIT Framework design overview: AST creation, then Optimize and Orchestrate, then Code Generation with a Cache lookup, followed by Compile and Execute; compiled kernels are stored in and fetched from the Kernel Cache.
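The command-line strategy described above can be illustrated with a small Python sketch; the real framework performs these steps from C/C++, and the kernel source and names here are made up for the illustration.

    import ctypes, os, subprocess, tempfile
    import numpy as np

    kernel_src = r"""
    #include <stddef.h>
    void kernel_add_mul(double *a, double *b, double *c, double *out, size_t n)
    {
        for (size_t i = 0; i < n; ++i)
            out[i] = (a[i] + b[i]) * c[i];
    }
    """

    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "kernel.c")
    lib = os.path.join(workdir, "kernel.so")
    with open(src, "w") as f:
        f.write(kernel_src)

    # Compile to a shared object with GCC, then load it (dlopen under the hood).
    subprocess.check_call(["gcc", "-O3", "-fPIC", "-shared", src, "-o", lib])
    kernel = ctypes.CDLL(lib).kernel_add_mul

    a, b, c = (np.random.rand(1000) for _ in range(3))
    out = np.empty_like(a)
    ptr = lambda arr: arr.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
    kernel(ptr(a), ptr(b), ptr(c), ptr(out), ctypes.c_size_t(a.size))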


A. Abstract Syntax Trees for the vector expressions

ASTs are generally used in compilers and interpreters to represent the abstract syntax of the program. This representation follows the Concrete Syntax Tree (CST), which is a representation of the program text. We focus the AST of the JIT framework on representing the mathematical expressions found in the instruction lists, and we use a list to represent the order of the execution.

We define the AST used to represent the mathematical expressions as depicted in Figure 2. Formally, the AST includes two types of components, Statements and Expressions. A Statement assigns the value of an Expression to an array. The Expression can take many forms, as is the case with mathematical expressions. The simplest forms are the array and the constant. These are used with unary and binary operators as well as user-defined functions. These operations define the recursive nature of the data structure, as they themselves are Expressions and take Expressions as input. The Opcode is a basic mathematical operator, such as add, multiply or sine.

    Assignment : Array = Expr
    Expr       : Array
               : Constant
               : UnaryOperation
               : BinaryOperation
    UnaryOperation  : Opcode Expr
    BinaryOperation : Opcode Expr Expr
    UserFunc        : Expr ... Expr

Fig. 2. AST definition for NumPy instructions.

With this definition we are able to express the mathematical expressions of the bytecode. An Assignment is an instruction, and thus a program consists of a series of assignments which assign the value of a simple array, a constant, or a unary or binary expression to an array. In the case of a constant assignment, the constant is broadcast to all elements of the array.

We view a bytecode instruction as an assignment with a left and a right side. The right side is the Expr and the left is the array. When an expression consists of more than a single unary or binary operation, we label it a composite expression, as it is composed of multiple expressions. To manage the assignment of expressions to arrays, a set of data structures is used.

1) Data structures to manage the AST: We present the three data structures we use to create and manage the AST representation of a list of instructions:
• BaseUsageTable
• SSAMap
• Nametable

The BaseUsageTable is used to register when base arrays are written to. With this information we can determine dependency violations within the ASTs and ensure correct execution. In NumPy the use of a slice of data is represented as a view of the original data. This view is called an array and will always be present. Multiple arrays can reference the same underlying data, called a base-array, so operations on different arrays can alter the same base-array. We register which arrays use the same base-arrays to ensure that data is written to the base-array before it is used in a new expression. The BaseUsageTable is implemented as a map of lists, {base-array: [nt_index, ...], ...}, where for each base-array there is a list of references to the Nametable entries in which the array is used.

We use the Static Single Assignment map (SSAMap) in the creation phase of the Nametable and ASTs. We build the Nametable in SSA form, where each assignment has its own unique name, to ease later analysis. The SSAMap registers all arrays in a version list, {array: [nt_index, ...], ...}, which works as a translation table from name to array version.

The Nametable is used to store the AST representation and associated meta-information, such as traversal states and dependencies. With the SSAMap, all arrays are assigned a new name, an integer. These names are assigned in order of appearance while the Nametable is being built. The Nametable can be seen as a mapping between Name, Array and AST and holds the following information:
• a reference to the AST;
• the array the AST is assigned to, the Target Array (TArray);
• a reference to the instruction in the instruction batch;
• a list of Depend-ON and Dependent-TO references (DON and DTO);
• when the TArray is discarded and freed;
• if the name points to a user-defined function, additional information is kept.

All the information from the instruction list is kept in the Nametable (see Figure 3).
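The structures above can be mirrored in a few lines of Python for illustration; the field names are chosen for readability and are not the framework's actual identifiers.

    from dataclasses import dataclass, field
    from typing import List, Optional, Union

    @dataclass
    class ArrayRef:            # a view; several views may share one base-array
        name: int              # SSA name assigned via the Nametable
        base: str              # identifier of the underlying base-array

    @dataclass
    class Constant:
        value: float

    @dataclass
    class Operation:           # unary or binary expression node
        opcode: str                     # e.g. "add", "mul", "sqrt"
        operands: List["Expr"]          # one or two child expressions

    Expr = Union[ArrayRef, Constant, Operation]

    @dataclass
    class NametableEntry:
        ast: Expr                       # the expression assigned
        target: ArrayRef                # the Target Array (TArray)
        instr_index: int                # reference into the instruction batch
        depends_on: List[int] = field(default_factory=list)    # DON
        dependent_to: List[int] = field(default_factory=list)  # DTO
        freed_at: Optional[int] = None  # when the TArray is discarded/freed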


We do not register constants in the Nametable, as these cannot be assigned an array and have no importance without a relation to an array. Constants are inserted directly into the auto-generated code and are only used there.

The Nametable is implemented as a vector where the name corresponds to the index. We register all array assignments in the Nametable in the same order they appear in the instruction batch. This means that we preserve the execution order of the created ASTs in the Nametable structure. When performing the later dependency analysis, the order can be determined by a comparison on names.

Fig. 3. Nametable relation to instructions: a small NumPy program (A = np.ones((3,3)); B = np.ones((2,3)); A[0:2] = B + 4) is translated into an instruction batch (add, free, discard, sync), and each array assignment is registered in the Nametable with its AST, target array and SSA version, referenced by the BaseUsageTable and SSAMap.

a) Static Single Assignment: SSA is used as a step to encode the relationships between variables into the naming of the code. It is done by only allowing assignment to a variable once. In the literature [8], if a variable is assigned more than once, a new variable is created with a subscripted number: the first assignment is subscripted with a 0, the second with a 1, and so forth. The relationship to previously used variables is thus encoded into the naming, since the name change of a variable is applied to the remaining operations in the list. There is no such thing as an overwrite of a variable.

Let us consider the example listed in Figure 4, which can be viewed as a list of instructions. It involves a reassignment of A, which we need to represent in the AST. We could do this by only keeping references to the arrays by name, but then we would be required to find the correct value of A through a liveness analysis. We use SSA form for the Nametable to remove this necessity.

    Normal form        SSA form
    A = 10             A0 = 10
    C = A - 5          C0 = A0 - 5
    A = 2              A1 = 2
    O = A + C          O0 = A1 + C0

Fig. 4. Static Single Assignment.

In this form all assignments are made to new variables, which makes determining the origin of a value much easier, as it is now encoded into the name. In traditional languages, SSA form handles control flow by introducing phi-functions to represent changes performed in a branch. As there are no such control flow elements in the bytecode, the SSA form for the ASTs is very simple.

We introduce integers as new names, where every assignment is given a new incremented number. We thus lose the direct relation between names in the naming scheme. In the SSAMap we keep the information on versions and their mapping to the new names. This is also used to determine which version of an array should be used when it is referenced.

2) Creating the ASTs: As for most ASTs, the ASTs in the JIT framework are built bottom up, starting with the lowest expressions and using these as subtrees in the following expressions. In short, this is done by iterating through the instruction list in the order of execution, creating ASTs from the instructions and building larger and larger ASTs while registering the relations in the Nametable and BaseUsageTable. This subsection describes the design and implementation of the algorithms used in this process of creating the ASTs and filling the Nametable with them.

The instructions are transformed from start to end and analyzed in this order. This results in a bottom-up approach to AST creation, where an array used in an AST is either new or references an existing AST. While building the Nametable and the ASTs we only look back, appending to the existing structure and preserving the order through the naming scheme.

The creation steps of the ASTs are best described through an example. Depicted in Figure 5, we show the creation of a very simple program which performs the following equation: O = (B + 5) * C. The program is described by the add and multiply instructions along with a Free, a Discard and a Sync instruction.

Fig. 5. AST creation illustrated for O = (B + 5) * C. An underscore (_) indicates an empty value. The arrow numbering shows the order of the creation process from 1 to 16. The illustration does not cover all the elements of the creation process; SSAMap, BaseUsageTable and Dependency Graph updates, among others, are not included.

(1) We start from the top of the instruction list, looking at the first instruction. (2–3) We extract the left operand B. As this is an array, we check whether we have a reference to it. (4) We store B in the Nametable as 0. (5) We extract the second operand; as this is a constant, we do nothing else. (6) We then extract the operator and create the AST. (7) We store the AST in the Nametable as an assignment to A, as name 1.
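The SSA renaming of Figure 4 can be sketched in a few lines of Python; this is an illustration of the renaming over a straight-line instruction list (no control flow), not the framework's actual implementation.

    def to_ssa(program):
        version = {}                        # current version number per variable
        ssa = []
        for target, expr in program:        # expr: list of operand tokens
            renamed = [tok + str(version[tok]) if tok in version else tok
                       for tok in expr]
            version[target] = version.get(target, -1) + 1
            ssa.append((target + str(version[target]), renamed))
        return ssa

    prog = [("A", ["10"]),
            ("C", ["A", "-", "5"]),
            ("A", ["2"]),
            ("O", ["A", "+", "C"])]
    for lhs, rhs in to_ssa(prog):
        print(lhs, "=", " ".join(rhs))
    # A0 = 10
    # C0 = A0 - 5
    # A1 = 2
    # O0 = A1 + C0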

©The Author(s) 2013. This article is published with open access by the GSTF GSTF JOURNAL ON COMPUTING GSTF Journal on Computing (JoC), Vol. 3 No. 3, Dec 2013 6 nametable as a assignment to A¯, as 1. and the depend-on dependencies from all leaf-nodes, as the (8) We are now done with the first instruction and move on dependencies of the AST. to the next in the batch. (9) We lookup the first operand A¯ in Analysis of this dependency graph could be used to paral- the Nametable, which is one we previously created and (10) lelize independent AST on different CPU cores or to group extract the corresponding AST. (11) We lookup and create C¯ different sets of dependent AST’s into single larger kernels. as it is a new array and (12) stores it in the Nametable. (13) This is touches more in the future work section. we extact the multiply operator and create the composite AST by combining the AST’s of A¯ and C¯. A¯ (15) we extract free operation for , which we register in B. Code Generation the Nametable. (16) We do the same for the Discard operation. The Sync is ignored as we execute everything in batch. This section describes the transformation from AST’s to As the example shows many elements are in play to trans- autogenerated C code and computational kernels. To achieve form the instruction batch to a naive forest of AST’s. It is performance close to that of optimized C code the generated not the end of the creation phase as their are still base array code must be of equal quality. We will in this section cover dependencies to handle as well as sub expression elimination to the information extraction and code generation. perform. This ties into the orchestration phase as dependencies We will take a look at the possibilities for further with may results in spilling AST’s into multiples. runtime generation of kernels. 3) Expression Orchestration: The orchestration of AST re- 1) Kernel Function: A kernel function is a C function quires knowledge of dependencies between the AST’s. In the with a defined signature that perform a series of operations rather simple process of building the AST’s we do not check corresponding to one or more instructions and is created from for dependencies. As part if the orchestration phase the AST’s a AST. When compiled it is defined as a computational kernel are dependency validated and violatoins are resolved. or just kernel. The resulting set of AST are a result of the following A kernel function consist of three logical elements: reasons: The Input 1) Different expressions used in the program. • 2) Varying sizes of arrays used in the computations as a Traversal method • result slicing. The computation • 3) Use of arrays across instruction batches result in a The input consists of all the distinct arrays and constants unknown case of multiple use. used by in the AST. 4) Base array dependency violation. The reason we pas array distinct is to reduce the number 5) AST subexpressions. of arrays used in the computation. The computation requires The program is a list of batches of instructions. A batch is computing the element-position in the matrix’s to retrive the thus a sublist og instructions. These batches is a result of the correct values. By removing the dublicate arrays and using the interpreter which at the end of a batch required the evaluation same element-position computations multiple time we reduce of the expressoin. This could be the result of a print statement the number of calculations required. or other points in the Python code where evaluation is required. 
3) Expression Orchestration: The orchestration of the ASTs requires knowledge of the dependencies between them. In the rather simple process of building the ASTs we do not check for dependencies. As part of the orchestration phase the ASTs are dependency-validated and violations are resolved. The resulting set of ASTs is a result of the following:
1) Different expressions used in the program.
2) Varying sizes of the arrays used in the computations, as a result of slicing.
3) Use of arrays across instruction batches, resulting in an unknown case of multiple use.
4) Base-array dependency violations.
5) AST subexpressions.

The program is a list of batches of instructions; a batch is thus a sublist of instructions. These batches are a result of the interpreter, which at the end of a batch requires the evaluation of the expression. This could be the result of a print statement or other points in the Python code where evaluation is required. With the Free and Discard instructions we know when the interpreter no longer holds a reference. In the case where the Free/Discard operation for an array is not in the batch, we treat the array as having multiple dependencies in the following batch.

The result of the Nametable creation is a set of distinct ASTs with an internal relationship. The orchestration is highly influenced by the single-core target, as the ASTs are arranged in a list structure. This is done by the order of the Nametable, which in effect is a sort of the ASTs by name. The AST root with the smallest name is executed first, continuing upwards to the AST with the highest name. We know that the dependencies are acyclic, meaning that dependencies between ASTs only point to lower names. There are no dependency violations, as these have been resolved by splitting ASTs into smaller ones. The set of ASTs can be viewed as a graph of expressions dependent on each other.

Analysis of this set prior to code generation could hold possibilities for grouping ASTs into even larger kernels, further exploiting the benefits of loop fusion. Converting the dependency information from a single AST into a single unit would be done by using the root node's dependent-to relations and its registration in the Nametable as an assignment to Ā, as in (1), with the depend-on dependencies from all leaf nodes as the dependencies of the AST. Analysis of this dependency graph could be used to parallelize independent ASTs on different CPU cores or to group different sets of dependent ASTs into single larger kernels. This is touched on more in the future work section.

B. Code Generation

This section describes the transformation from ASTs to autogenerated C code and computational kernels. To achieve performance close to that of optimized C code, the generated code must be of equal quality. We will in this section cover the information extraction and code generation, and take a look at the possibilities for further optimization through runtime generation of kernels.

1) Kernel Function: A kernel function is a C function with a defined signature that performs a series of operations corresponding to one or more instructions; it is created from an AST. When compiled, it is referred to as a computational kernel, or just a kernel. A kernel function consists of three logical elements:
• The input
• The traversal method
• The computation

The input consists of all the distinct arrays and constants used in the AST. The reason we pass the arrays as a distinct set is to reduce the number of arrays used in the computation. The computation requires computing the element position in the matrices to retrieve the correct values; by removing the duplicate arrays and reusing the same element-position computations multiple times, we reduce the number of calculations required. For constants this is not an issue, as they are used only once. We pass these to the kernel as an array.

We know in which context the kernel functions are to be used and thus have no need for size parameters for the input arrays. We use arrays because we must handle inputs of different sizes depending on the kernel, while all kernel functions must have the same function signature.

Kernel functions are named by the hash of the AST they are based on. This creates unique names based on the signature of the AST, used both for naming and later for caching. The hash is based on an AST signature, retrieved by performing a depth-first, left-to-right traversal, where the opcodes of the nodes and leaves along with their types are used to create the signature.

The traversal method is how the arrays are traversed as part of the computation. The traversal used in the kernels is pointer incrementation, where only additions are used to determine the position of the data to work on in the matrices.

The computation part is the calculation performed on the arrays. In the creation phase this is called the compute string, and it is the computation of a series of values which produces a result.
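The following sketch illustrates how such a signature hash could be computed from a depth-first, left-to-right traversal. The tuple-based AST encoding and the exact layout of the signature string are assumptions for illustration, not the framework's actual format.

import hashlib

def signature(ast):
    # depth-first, left-to-right flattening of opcodes and operand types
    opcode, dtype, children = ast
    parts = [f"{opcode}:{dtype}"] + [signature(c) for c in children]
    return "(" + " ".join(parts) + ")"

def kernel_name(ast):
    # the hash of the signature gives the kernel a unique, cacheable name
    digest = hashlib.sha1(signature(ast).encode()).hexdigest()
    return "kernel_" + digest

# abs(B * C), all float64 element-wise operations
ast = ("abs", "float64",
       [("multiply", "float64",
         [("array", "float64", []),
          ("array", "float64", [])])])
print(kernel_name(ast))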


2) Input extraction and equation creation: The ASTs are traversed in a depth-first search, left to right. This is the case for all AST traversals in the JIT framework. The recursive traversal method used in the creation phase of the kernel extracts the distinct arrays and all constants, and builds the compute string. The compute string is created by combining one or two inputs with an operator, based on the opcode of the AST node; combined with the string names of the inputs, these are merged into the final composite expression. The left-hand side of the created equation is retrieved from the target array of the AST, defined in the Nametable.

The computational order from the instructions is kept in the AST structure, and no further actions are needed to ensure the order in the kernel creation phase. To ensure the order in the created equation, parentheses are added around each unary or binary expression.
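A minimal sketch of this recursive extraction is shown below. The tuple encoding of the AST, the opcode-to-C-operator mapping, and the input naming scheme are invented for illustration; they are not the framework's actual representation.

C_OPS = {"add": "+", "multiply": "*", "subtract": "-"}

def compute_string(ast, inputs):
    kind = ast[0]
    if kind == "array":                       # a distinct input array
        name = ast[1]
        inputs.setdefault(name, len(inputs))  # register once, reuse its position
        return f"a{inputs[name]}[i]"
    if kind == "const":
        return repr(ast[1])
    opcode, operands = ast
    args = [compute_string(op, inputs) for op in operands]
    if len(args) == 1:                        # unary operator
        return f"({opcode}({args[0]}))"
    return f"({args[0]} {C_OPS[opcode]} {args[1]})"  # parenthesize every expression

inputs = {}
expr = ("multiply", [("const", 0.2),
                     ("add", [("array", "center"), ("array", "north")])])
print(compute_string(expr, inputs))   # (0.2 * (a0[i] + a1[i]))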

C. Execution and Kernel Cache

Caching is an important part of ensuring a reasonable runtime for the JIT, as we will show in Section V-A1. We perform caching to reduce the number of kernels we create, in effect caching the JIT optimizations for later use. The compiled kernel is just a function pointer which must be called with a specific number of arguments. We store the arrays and constants used as input together with the compiled kernel function in an execution-kernel data structure. This structure can hold compiled kernels, instructions, or user-defined function instructions, and is thus a wrapper around an element to execute.

The caching model starts with an orchestrated set of ASTs:
• A hash of the AST is created and checked against the cache.
• The AST is compiled into a computational kernel.
• The kernel's inputs are filled based on a traversal of the AST.
• The kernel is executed.
• The kernel is cached with the hash of the AST as key.

The kernels are compiled and executed directly, in the order they have been orchestrated. This means that when the same kernels are used multiple times, only a single kernel is created. The hash of the AST is computed from a left-to-right, depth-first search, which produces a vector representation of the structure. This can be viewed as a flattening of the operators, types, and expression types, which forms the input to a cryptographic hashing algorithm.

V. EVALUATION

In this section we present a performance evaluation of our JIT implementation running on an AMD machine (see Table I).

TABLE I. BENCHMARK CONFIGURATION
CPU        AMD Opteron(TM) Processor 6274
Frequency  2.20 GHz
Layout     2 CPUs, 16 cores per CPU
RAM        128 GB
Software   Linux 3.2.0-25-generic, Python 2.7.3, GCC 4.6.3

The framework testing is as follows:
• Each benchmark is the average of three runs for each configuration.
• The benchmark scripts are written in Python.
• We use the system GCC compiler for the compilation of both the C/C++ implementations and the computational kernels. All compilations use -O2 as the optimization flag.
• In the benchmarks we measure the computation time only; the time to initialize the arrays is not included.

A. Jacobi

The Jacobi benchmark is an implementation which solves the heat equation iteratively using the Jacobi method. We use this benchmark to show the effect of kernel caching, the relation to problem size, and a comparison with C/C++ implementations.

1) Caching: We evaluate the effect of kernel caching by comparing the execution with and without caching enabled. By disabling the cache, all ASTs result in the creation and compilation of a kernel. The runtime graph depicted in Figure 6 clearly shows the overhead of creation and compilation of the kernels and how this overhead is amortized over time.

Fig. 6. Effect of the kernel cache when running the Jacobi benchmark. The matrix input and output size is fixed at 4k by 4k elements.

2) Problem Size: Figure 7 shows the runtime of the first iteration of the Jacobi benchmark. The graphs show that there is a strong correlation between the benefit of the JIT methods and the size of the data. As the data grows, the runtime follows in a similar quadratic way, illustrating the significant difference in growth between the JIT and the NumPy execution. With problems larger than 2k-by-2k elements, the JIT is faster than NumPy even in the first iteration. This will for the most part be the case for Jacobi or programs with similar computational complexity, as this includes the creation of all the computational kernels. The first iterations will be the most expensive, as caching removes the overhead of kernel creation in the later iterations.

3) C/C++ Comparison: We wish to bridge the gap between low-level languages and Python/NumPy, and thus we compare the NumPy implementation with a range of different C/C++ implementations, as the methods used have a great impact on the performance.
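The caching model described above can be sketched as the following execute-with-cache loop. The helpers passed in (compile_kernel, extract_inputs) stand in for the real code generation and input collection and are assumptions for illustration only.

import hashlib

def ast_key(ast):
    # flatten the AST (here simply via repr) and hash it as the cache key
    return hashlib.sha1(repr(ast).encode()).hexdigest()

def execute_batch(asts, cache, compile_kernel, extract_inputs):
    for ast in asts:                        # ASTs in orchestrated order
        key = ast_key(ast)
        kernel = cache.get(key)
        if kernel is None:                  # miss: generate and compile once
            kernel = compile_kernel(ast)
            cache[key] = kernel
        kernel(*extract_inputs(ast))        # reused on every later hit

# toy demonstration: the "kernel" just sums its inputs
cache = {}
execute_batch([("add", "A", "B")] * 3, cache,
              compile_kernel=lambda ast: lambda *xs: sum(xs),
              extract_inputs=lambda ast: (1, 2))
print(len(cache))   # 1 -- the kernel was compiled once and then reused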



Fig. 7. Effect of the problem size when running the first iteration of the Jacobi benchmark. The matrix input and output size grows from 1k by 1k to 10k by 10k.

Fig. 8. Comparison of C/C++ and NumPy implementations of the Jacobi benchmark. The matrix input and output size is fixed at 4k by 4k elements.

Depicted in Figure 8 is the speedup graph of the Jacobi implementations. The C/C++ versions are:
• Naive: A naive implementation where indexing into the matrices is done by multiplying a column count with the row length to get the index of an element.
• Tuned: Only pointers are used to index the matrices. For each row iteration, only pointer incrementation is needed. To change columns, a second addition is done.
• VTuned: An optimization of Tuned (thus "VeryTuned") where columns are handled slightly more efficiently.
• Boost: Uses the Boost 2D C++ library.

These four implementations can be seen as four different approaches, or stages, of a C implementation based on a MATLAB or NumPy prototype. The Naive approach, or using a library such as Boost, would be a common first step, and for many the only step. Using pointer incrementation instead of coordinate calculations requires a thorough understanding of C and could be seen as the next step. The Tuned implementation divides the normal programmer from the specialist, and the step further to VTuned pushes the expertise needed even further.

We observe a surprising ordering, where the C versions are not all gathered at the top. The JIT implementation is significantly faster than the Naive approach as well as the implementation using the Boost library. The pointer-based implementations are grouped at the top with very high execution speeds. In Figure 8 the higher speed of the pointer-based solutions is clearly visible. The fluctuations of Tuned and VTuned are a result of normal noise, as the difference between them is in the +/-0.2 second range.

B. Black Scholes

The Black-Scholes method is used to determine the price of European options. The algorithm is run over the data a number of times, which represent the iterations done. The input data is a single-dimensional vector and the operations performed are highly dependent on scalars. The C version is implemented as a double for loop summing the intermediate results together. The execution consists of 200K elements and uses from 10 to 100 iterations. Figure 9 shows the speedup compared to NumPy. Comparing the result of the NumPy implementation that uses the JIT with the C implementation, we observe that the C execution is much faster at few iterations. However, at 100 iterations the C implementation is only slightly faster.

Fig. 9. Comparison of C/C++ and NumPy implementations of the Black Scholes benchmark using a 200k element data set.

C. K-Nearest Neighbor

The K-Nearest Neighbor is an algorithm which finds the K closest neighbors to a given point. This means that the distance between all points must be calculated to determine which are the closest. This learning algorithm is often used to determine the classification of elements in a dataset, which can consist of multi-dimensional data elements. The implementation is made in NumPy without any loops, resulting in 4 kernels, of which 2 are user-defined functions and the remaining 2 are computational kernels. The test is run on a varying number of elements K ranging from 10 to 100. Each element can be seen as a point in a 50000-dimensional space. Figure 10 shows the same trend of increasing speedup over iterations. At 140 iterations we see a speedup greater than 7.


Fig. 10. Comparison of two NumPy executions – one using the regular NumPy implementation and one using our JIT backend – running the KNN benchmark. The data set consists of 50k elements.

D. N-Body

The N-Body test is a simulation of elements and how gravitational forces affect their movement. We use a naive method where the effect of all elements is applied to all elements. The simulation is run with 1000 elements for 100 iterations, where each iteration is a timestep in the algorithm. As the algorithm contains no natural batching, a manual flush has been inserted in the NumPy code to break the instruction list into batches.

In Figure 11 we show the results of running the N-Body benchmark. We see the same trends again, where the performance of the JIT methods increases over time as the initial overhead is amortized, along with a small difference between the JIT methods.

Fig. 11. Comparison of two NumPy executions – one using the regular NumPy implementation and one using our JIT backend – running the N-Body benchmark. The data set consists of 100 bodies.

E. Summary

Our benchmark results show a solid speedup across all tests. This is a result of both loop fusion and temporary-array removal, as well as the implemented kernel cache. This combination shows significant improvement.

We clearly see that the number of iterations has a significant impact on the speedup. Performance decreases are seen in the first iteration of all tests, showing that a large part of the base speedup is a result of the cache. With this we see the overhead of the initial kernel creations amortized over the iterations.

We see a correlation between the problem size and the complexity, which both affect the potential speedup. As the problem size, the complexity, or both rise, the speedup compared to normal NumPy follows. This is reasonable, as the effect of temporary-array removal and loop fusion is per element, and the complexity of the program is reflected in the number of arrays fused together in the kernels. A large and complex problem will get the largest benefit from the JIT framework.

We are very close to, or better than, the naive C implementation, but as shown in the Jacobi examples there is still room for improvement. We do not expect to reach the computation times of optimized C code, due to the overhead of NumPy and the JIT framework. Comparing with the naive and/or Boost-based C implementations shows that we are clearly within range of these.

VI. FUTURE WORK

In this section we take a look into the future of the JIT framework. This includes interesting areas for further investigation as well as possible optimizations. The section follows the phases of the framework, as there are paths to investigate in most of the JIT stages.

A. AST

The ASTs are now focused on the mathematical equations and represent these very well, but there are other elements related to the vector operations which could be represented in the same structure. In many other bytecode formats, control operations are part of the representation. Operations for reduction and broadcasting could be added, allowing ASTs to include different shapes and provide more information about their use. Optimizations based on the ASTs could also be introduced prior to kernel creation. This could be redefinitions of the equations which could lead to reduced computational complexity.

B. Code Generation

In the code generation phase there is a range of areas to investigate further. The implementation presented has focused on building the framework and investigating many areas of JIT kernel creation. It is clear that optimizations to the kernel code, and to the method by which kernels are created, are a significant part of nearing the runtime of optimized C. The following optimizations can be applied to the kernels to achieve increased performance on the single-core architecture:
• Loop unrolling: Perform multiple operations in the innermost loop to reduce the number of index calculations needed in the traversal.


• Use machine-specific information: Utilize available information such as cache size or architecture in the creation of the code. This could be to use special libraries for certain operations, to take better advantage of the cache, or to use special compiler flags or specific compilers.

The first optimization is straightforward to implement in the existing framework by extending the code generation. Taking advantage of architecture specifics will require substantially more work, as the optimizations will be built on more advanced technologies. This could be multicore, NUMA, cache tiling, or SSE instructions.

Another direction is creating larger kernels based on multiple ASTs. Combining multiple ASTs into a single kernel has multiple advantages. By grouping together ASTs whose output is used in multiple other ASTs, additional temporary arrays can be removed. Using traversal calculations on multiple unrelated ASTs in parallel will reduce the time spent on index calculations, taking further advantage of loop fusion. This would also be the case for reductions, as these could become part of the execution loop. In the case of a reduction, it could be applied directly to the result of the computation, eliminating the need to store the result in a temporary array only to perform a reduce afterwards.

The use of LLVM and Clang to compile the kernel functions should also be investigated. This would enable the kernel creation to be done in memory by using the available libraries. As opposed to creating C code and compiling it, it could be more beneficial to create the intermediate representation of LLVM and use its compiler to generate executable code. This could bring down the overhead of compilation, making the approach of JIT compilation more attractive for programs which are translated into many distinct kernels.

VII. CONCLUSION

We have implemented a JIT framework for Python/NumPy that allows NumPy instructions to be expressed in an abstract form using Abstract Syntax Trees. This has allowed for a set of optimizations to the computations on NumPy vector operations and enables further optimizations.

Our approach of transforming the NumPy instructions into ASTs is well suited to composing bytecode instructions into composite expressions. We show that this composition results in loop fusion and temporary-array removal when transformed into computational kernels. The process of creating ASTs is a non-trivial task because arrays may share the same underlying data structure and dependencies. By performing dependency analysis and breaking the initial ASTs into smaller trees, we transform the instruction batch into a forest of connected trees, which expresses the syntax of the batch much more cleanly than a single instruction list.

We show that the use of a kernel cache provides a significant increase in performance in real-world tests. This effect is largest for small problem sizes, where the overhead of kernel creation is expensive; at larger problem sizes this becomes insignificant, as shown in Fig. 8. The combined effect of temporary-array removal, loop fusion, and caching shows significant speedups. Our benchmarks of Jacobi and N-Body present speedups compared to NumPy of 7.51 and 4.72, respectively. Comparing to the C versions, we observe that we are close to, or achieve better performance than, naive implementations with or without the Boost library, but we are still orders of magnitude slower than optimized C.

We achieve these results with a combined set of unoptimized and in many cases naive implementations. The JIT framework allows many more interesting optimizations that have yet to be applied. In the future work section, we outlined a set of optimizations that bring the performance even closer to an optimized C implementation.

This research has been partially supported by the Danish Strategic Research Council, Program Committee for Strategic Growth Technologies, for the research center 'HIPERFIT: Functional High Performance Computing for Financial Information Technology' (hiperfit.dk) under contract number 10-092299.

REFERENCES
[1] M. R. B. Kristensen, S. A. F. Lund, T. Blum, K. Skovhede, and B. Vinter, "Bohrium: Unmodified NumPy Code on CPU, GPU, and Cluster," in Python for High Performance and Scientific Computing (PyHPC 2013), 2013.
[2] R. Garg and J. N. Amaral, "Compiling python to a hybrid execution environment," in Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, ser. GPGPU '10. New York, NY, USA: ACM, 2010, pp. 19–30.
[3] A. Klöckner, N. Pinto, Y. Lee, B. Catanzaro, P. Ivanov, and A. Fasih, "PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation," Parallel Computing, vol. 38, no. 3, pp. 157–174, 2012.
[4] "GNU C Compiler."
[5] C. Lattner and V. Adve, "LLVM: A compilation framework for lifelong program analysis & transformation," in Code Generation and Optimization, 2004. CGO 2004. International Symposium on. IEEE, 2004, pp. 75–86.
[6] F. Bellard, "TCC: Tiny C Compiler," URL: http://fabrice.bellard.free.fr/tcc, 2003.
[7] "LLVM Clang 3.1 vs GCC 4.7 Intel Core i7 benchmarks," http://openbenchmarking.org/result/1204215, 2012.
[8] R. Cytron, J. Ferrante, B. Rosen, M. Wegman, and F. Zadeck, "Efficiently computing static single assignment form and the control dependence graph," ACM Transactions on Programming Languages and Systems (TOPLAS), vol. 13, no. 4, pp. 451–490, 1991.

This article is distributed under the terms of the Creative Commons Attribution License, which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Johannes Lund: Master in Computer Science from the Department of Computer Science, University of Copenhagen (DIKU). The main focus of the Master's has been on Distributed Systems and High-Performance Computing. The Master's thesis investigated methods to improve performance for matrix operations in Bohrium on a single CPU core.


Mads R. B. Kristensen: PostDoc at the Niels Bohr Institute, University of Copenhagen. His primary research areas are High Performance Computing and PGAS languages/libraries. He has developed algorithms and frameworks targeting supercomputers, including the Cray XE6 and Blue Gene/P.

Simon A. F. Lund: PhD student at the Niels Bohr Institute, University of Copenhagen. The title of his PhD project is "A High-Performance Backend for Computational Finance on Next-Generation Processing Units", which revolves around mapping high-level language constructs to hardware, with a focus on efficient utilization of SIMD units and multi-core architectures.

Brian Vinter: Professor at the Niels Bohr Institute, University of Copenhagen. His primary research areas are Grid Computing, Supercomputing, and Many-core architectures. He has done research in the field of High Performance Computing since 1994. Current research includes methods for seamless utilization of parallelism in scientific applications.


6.7 Bohrium: Unmodified NumPy Code on CPU, GPU, and Cluster


Mads R. B. Kristensen, Simon A. F. Lund, Troels Blum, Kenneth Skovhede, and Brian Vinter
Niels Bohr Institute, University of Copenhagen, Denmark
{madsbk/safl/blum/skovhede/vinter}@nbi.dk

Abstract—In this paper we introduce Bohrium, a runtime system for mapping array operations onto a number of different hardware platforms, from multi-core systems to clusters and GPU-enabled systems. As a result, the Bohrium runtime system enables NumPy code to utilize CPU, GPU, and Clusters. Bohrium integrates seamlessly into NumPy through the implicit data parallelization of array operations, which are called Universal Functions in NumPy. Bohrium requires no annotations or other code modifications besides changing the original NumPy import statement to: "import bohrium as numpy". We evaluate the presented design through a setup that targets a multi-core CPU, an eight-node Cluster, and a GPU, all implemented as preliminary prototypes. The evaluation includes three well-known benchmark applications, Black Scholes, Shallow Water, and N-body, implemented in Python/NumPy.

I. INTRODUCTION

The popularity of the Python programming language is growing in the HPC community. Python is a high-productivity programming language that focuses on high productivity rather than high performance, thus it might seem paradoxical that such a language would gain popularity in HPC. However, Python is easily extensible with libraries implemented in high-performance languages such as C and FORTRAN, which makes Python a great tool for gluing high-performance libraries together[1].

NumPy is the de-facto standard for scientific applications written in Python[2]. It provides a rich set of high-level numerical operations and introduces a powerful array object. NumPy supports a declarative vector programming style where numerical operations operate on full arrays rather than scalars. This programming style is often referred to as vector or array programming and is commonly used in programming languages and libraries that target the scientific community, e.g. HPF[3], MATLAB[4], Armadillo[5], and Blitz++[6].

A major shortcoming of Python/NumPy is the lack of thread-based concurrency. The de-facto Python interpreter, CPython, uses a Global Interpreter Lock to serialize concurrent execution of Python bytecode, thus parallelism is restricted to external libraries. Similarly, NumPy does not parallelize array operations but might use external libraries, such as BLAS or FFTW, that do support parallelism.

The result is that Python/NumPy is great for gluing HPC code together, but often it cannot stand by itself. In this paper, we introduce a framework that addresses this issue. We introduce a runtime system, Bohrium, which seamlessly executes NumPy array operations in parallel. Through Bohrium, it is possible to utilize CPU, GPU, and Clusters without changing the original Python/NumPy code besides adding the import statement: "import bohrium as numpy".

In order to couple NumPy with the execution back-end, Bohrium uses an intermediate vector bytecode that corresponds to the NumPy array operations. The execution back-end is then able to execute the intermediate vector bytecode without any Python/NumPy knowledge, which also makes Bohrium usable for any programming language. Additionally, the intermediate vector bytecode solves the Python import problem where the "import numpy" instruction overwhelms the file system in supercomputers[7], [8]. With Bohrium, only a single node needs to run the Python interpreter; the remaining nodes execute the intermediate vector bytecode directly.

The version of Bohrium we present in this paper is a proof-of-concept implementation that supports the Python programming language through a Bohrium implementation of NumPy (the implementation is open-source and available at www.bh107.org). However, the Bohrium project also supports additional languages, such as C++ and Common Intermediate Language (CIL, also known as Microsoft .NET), which we have described in previous work [?]. The proof-of-concept implementation supports three computer architectures: CPU, GPU, and Cluster.

II. RELATED WORK

The key motivation for Bohrium is to provide a framework for the utilization of diverse and complex computing systems, with the goal of obtaining high performance, high productivity and high portability, HP³. Systems such as pyOpenCL/pyCUDA[9] provide tools for interfacing a high-abstraction front-end language with kernels written for specific, potentially exotic, hardware, in this case lowering the bar for harvesting the power of modern GPUs by letting the user write only the GPU kernels as text strings in the host language Python. The goal is similar to that of Bohrium – the approach however is entirely different. Bohrium provides a means to hide low-level target-specific code behind a programming model, providing a framework and runtime environment to support it.

Bohrium is more closely related to the work described in [10], where a compilation framework, unPython, is provided for execution in a hybrid environment consisting of both CPUs and GPUs. The framework uses a Python/NumPy-based front-end that uses Python decorators as hints to do selective optimizations. Bohrium performs data-centric optimizations on vector operations, which can be viewed as akin to selective optimizations, in the respect that we do not optimize the program as a whole. However, we find that the approach used in the Bohrium Python interface is much less intrusive. All arrays are by default handled by Bohrium – no decorators are needed or used. This approach provides the advantage that any existing NumPy program can run unaltered and take advantage of Bohrium without changing a single line of code. In contrast, unPython requires the user to modify the source code manually, by applying hints in a manner similar to that of OpenMP. The proposed non-obtrusive design at the source level is to the authors' knowledge novel.

Microsoft Accelerator [11] introduces ParallelArray, which is similar to the utilization of the NumPy arrays in Bohrium, but there are strict limitations to the utilization of ParallelArrays. ParallelArrays do not allow the use of direct indexing, which means that the user must copy a ParallelArray into a conventional array before indexing. Bohrium instead allows indexed operations and additionally supports vector views, which are vector aliases that provide multiple ways to access the same chunk of allocated memory. Thus, the data structure in Bohrium is highly flexible and provides elegant programming solutions for a broad range of numerical algorithms. Intel provides a similar approach called Intel Array Building Blocks (ArBB) [12] that provides retargetability and dynamic compilation. It is thereby possible to utilize heterogeneous architectures from within standard C++. The retargetability aspect of Intel ArBB is represented in Bohrium as a simple configuration file that defines the Bohrium runtime environment. Intel ArBB provides a high-performance library that utilizes a heterogeneous environment and hides the low-level details behind a declarative vector-programming model similar to Bohrium. However, ArBB only provides access to the programming model via C++ whereas Bohrium is not limited to any one specific front-end language.

On multiple points, Bohrium is closely related in functionality and goals to the SEJITS [13] project, but takes a different approach towards the front-end and programming model. SEJITS provides a rich set of computational kernels in a high-productivity language such as Python or Ruby. These kernels are then specialized towards an optimality criterion. The programming model in Bohrium does not provide this kernel methodology, but deduces computational kernels at runtime by inspecting the flow of vector bytecode.

Bohrium provides, in this sense, a virtual machine optimized for execution of vector operations. Previous work [14] was based on a complete virtual machine for generic execution, whereas Bohrium provides an optimized subset.

III. THE FRONT-END LANGUAGE

To hide the complexities of obtaining high performance from the diverse hardware making up modern computer systems, any given framework must provide a meaningful high-level abstraction. This can be realized in the form of domain-specific languages, embedded languages, language extensions, libraries, APIs etc. Such an abstraction serves two purposes: (1) It must provide meaning for the end-user such that the goal of high productivity can be met with satisfaction. (2) It must provide an abstraction that consists of a sufficient amount of information for the system to optimize its utilization.

Bohrium does not introduce a new programming language and is not biased towards any specific choice of abstraction or front-end technology. However, the front-end must be compatible with the declarative vector programming model and support vector slicing, also known as vector or matrix slicing [3], [4], [15], [16]. Bohrium introduces bridges that integrate existing languages into the Bohrium runtime system.

1  import bohrium as numpy
2  def solve(grid, epsilon):
3      center = grid[1:-1, 1:-1]
4      north  = grid[:-2, 1:-1]
5      south  = grid[2:, 1:-1]
6      east   = grid[1:-1, :-2]
7      west   = grid[1:-1, 2:]
8      delta  = epsilon + 1
9      while delta > epsilon:
10         tmp = 0.2 * (center + north + south + east + west)
11         delta = numpy.sum(numpy.abs(tmp - center))
12         center[:] = tmp

Fig. 1: Python/NumPy implementation of the heat equation solver. The grid is a two-dimensional NumPy array and the epsilon is a Python scalar. Note that the first line of code imports the Bohrium module instead of the NumPy module, which is all the modification needed in order to utilize the Bohrium runtime system.

The Python Bridge is an extension of NumPy version 1.6, which seamlessly implements a new array back-end that inherits the manipulation features, such as slice, reshape, offset, and stride. As a result, the user only needs to modify the import statement of NumPy (Fig. 1) in order to utilize Bohrium.

The Python Bridge uses hooks to divert function calls where the program accesses Bohrium-enabled NumPy arrays. The hooks will translate a given function into its corresponding Bohrium bytecode when possible. When it is not possible, the hooks will feed the function call back into NumPy and thereby force NumPy to handle the function call itself. The Bridge operates with two address spaces for arrays: the Bohrium space and the NumPy space. The user can explicitly assign new arrays to either the Bohrium or the NumPy space through a new array creation parameter. In two circumstances, it is possible for an array to transfer from one address space to the other implicitly at runtime.

1) When an operation accesses an array in the Bohrium address space but it is not possible for the bridge to translate the operation into Bohrium bytecode. In this case, the bridge will synchronize and move the data to the NumPy address space. For efficiency, no data is actually copied. Instead, the bridge uses the mremap function to re-map the relevant memory pages when the data is already present in main memory.
2) When an operation accesses arrays in different address spaces, the Bridge will transfer the arrays in the NumPy space to the Bohrium space.

In order to detect direct access to arrays in the Bohrium address space by the user, the original NumPy implementation, a Python library, or any other external source, the bridge protects the memory of arrays that are in the Bohrium address space using mprotect. Because of this memory protection, subsequent accesses to the memory will trigger a segmentation fault. The Bridge can then handle this kernel signal by transferring the array to the NumPy address space and cancelling the segmentation fault. This technique makes it possible for the Bridge to support all valid Python/NumPy applications, since it can always fall back to the original NumPy implementation.

To reduce the overhead related to generating and processing the bytecode, the Bohrium Bridge uses lazy evaluation, recording instructions until a side effect can be observed.
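As an illustration of this lazy-evaluation scheme, the sketch below records one bytecode per array operation and only flushes the recorded batch when a value is observed. The class and method names are invented for illustration and are not the actual NumPy-bridge API.

class LazyBridge:
    def __init__(self, backend):
        self.backend = backend      # callable that executes a list of bytecodes
        self.batch = []

    def record(self, opcode, out, *operands):
        # one bytecode per array operation, no execution yet
        self.batch.append((opcode, out, operands))

    def sync(self, array):
        # a value is observed by the interpreter: flush everything recorded
        self.record("SYNC", array)
        self.backend(self.batch)
        self.batch = []

bridge = LazyBridge(backend=lambda batch: print(f"executing {len(batch)} bytecodes"))
bridge.record("ADD", "t1", "center", "north")
bridge.record("MUL", "tmp", "t1", 0.2)
bridge.sync("tmp")     # only now is the recorded batch executed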

IV. THE BOHRIUM RUNTIME SYSTEM

The key contribution in this work is a framework, Bohrium, which significantly reduces the costs associated with high-performance program development. Bohrium provides the mechanics to couple a programming language or library with an architecture-specific implementation seamlessly.

Bohrium consists of a number of components that communicate by exchanging a Vector Bytecode (the name vector is roughly the same as the NumPy array type, but from a computer architecture perspective vector is a more precise term). Components are allowed to be architecture-specific, but they are all interchangeable since all use the same communication protocol. The idea is to make it possible to combine components in a setup that matches a specific execution environment. Bohrium consists of the following three component types (Fig. 2):

Bridge: The role of the Bridge is to integrate Bohrium into existing languages and libraries. The Bridge generates the Bohrium bytecode that corresponds to the user code.
Vector Engine Manager (VEM): The role of the VEM is to manage data location and ownership of arrays. It also manages the distribution of computing jobs between potentially several Vector Engines, hence the name.
Vector Engine (VE): The VE is the architecture-specific implementation that executes Bohrium bytecode.

Fig. 2: Bohrium Overview. The Bridge is the language bindings and interface to Bohrium, currently for NumPy. The VEM has a simple interface and can support hierarchical setups; it can distribute and load-balance as required. A node-level VEM knows about hardware features and schedules operations optimally on the hardware. The VEs are the workhorses and know how to implement element-wise and composite operations, currently on CPU and GPU.

When using the Bohrium framework, at least one implementation of each component type must be available. However, the exact component setup depends on the runtime system and what hardware to utilize, e.g. executing NumPy on a single machine using the CPU would require a Bridge implementation for NumPy, a VEM implementation for a machine node, and a VE implementation for a CPU. Now, in order to utilize a GPU instead, we can exchange the CPU-VE with a GPU-VE without having to change a single line of code in the NumPy application. This is a key contribution of Bohrium: the ability to change the execution hardware without changing the user application.

A. Configuration

To make Bohrium as flexible a framework as possible, we manage the setup of all the components at runtime through a configuration file. The idea is that the user or system administrator can specify the hardware setup of the system through an ini-file (Fig. 3). Thus, it is just a matter of editing the configuration file when changing or moving to a new hardware setup, and there is no need to change the user applications.

# Bridge for NumPy
[numpy]
type = bridge
children = node

# Vector Engine Manager for a single machine
[node]
type = vem
impl = libbh_vem_node.so
children = gpu

# Vector Engine for a GPU
[gpu]
type = ve
impl = libbh_ve_gpu.so

Fig. 3: This example configuration provides a setup for utilizing a GPU on one machine by instructing the Vector Engine Manager to use the GPU Vector Engine implemented in the shared library libbh_ve_gpu.so.
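The following is a minimal sketch of how such an ini-file could be read and turned into a component chain. The configparser module is standard Python, and the section and key names follow the example configuration above, but the stack-building logic is illustrative and not Bohrium's actual component loader.

import configparser

CONFIG = """
[numpy]
type = bridge
children = node
[node]
type = vem
impl = libbh_vem_node.so
children = gpu
[gpu]
type = ve
impl = libbh_ve_gpu.so
"""

def build_stack(text, root="numpy"):
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    stack, name = [], root
    while name:                                   # follow the children chain
        section = cfg[name]
        stack.append((name, section["type"], section.get("impl", "builtin")))
        name = section.get("children")
    return stack

for name, kind, impl in build_stack(CONFIG):
    print(f"{kind:>6}  {name:<6} ({impl})")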

The key contribution in this work is a framework, Boh- shape 2 2 2 rium, which significantly reduces the costs associated with Middle dimension 7 8 high-performance program development. Bohrium provides the stride 7 3 1 0 1 mechanics to couple a programming language or library with * 10 11 an architecture-specific implementation seamlessly. data 3 4 Bohrium consists of a number of components that com- Seen 3d-array municate by exchanging a Vector Bytecode3. Components are Fig. 4: Descriptor for n-dimensional array and corresponding interpretation allowed to be architecture-specific but they are all interchange- able since all uses the same communication protocol. The idea to change the execution hardware without changing the user is to make it possible to combine components in a setup that application. match a specific execution environment. Bohrium consist of the following three component types (Fig. 2): A. Configuration Bridge The role of the Bridge is to integrate Bohrium into To make Bohrium as flexible a framework as possible, we existing languages and libraries. The Bridge generates the manage the setup of all the components at runtime through Bohrium bytecode that corresponds to the user-code. a configuration file. The idea is that the user or system Vector Engine Manager (VEM) The role of the VEM is administrator can specify the hardware setup of the system to manage data location and ownership of arrays. It through an ini-file (Fig. 3). Thus, it is just a matter of editing also manages the distribution of computing jobs between the configuration file when changing or moving to a new potentially several Vector Engines, hence the name. hardware setup and there is no need to change the user Vector Engine (VE) The VE is the architecture-specific im- applications. plementation that executes Bohrium bytecode. When using the Bohrium framework, at least one imple- B. Vector Bytecode mentation of each component type must be available. However, the exact component setup depends on the runtime system and A vital part of Bohrium is the Vector Bytecode that consti- what hardware to utilize, e.g. executing NumPy on a single ma- tutes the link between the high-level user language and the chine using the CPU would require a Bridge implementation low-level execution engine. The bytecode is designed with for NumPy, a VEM implementation for a machine node, and the declarative array-programming model in mind where the a VE implementation for a CPU. Now, in order to utilize a bytecode instructions operate on input and output arrays. To GPU instead, we can exchange the CPU-VE with a GPU-VE avoid excessive memory copying, the arrays can also be shaped without having to change a single line of code in the NumPy into multi-dimensional arrays. These reshaped array views are application. This is a key contribution of Bohrium: the ability then not necessarily comprised of elements that are contiguous in memory. Each dimension comprises a stride and size, such 3The name vector is roughly the same as the NumPy array type, but from that any regularly shaped subset of the underlying data can be a computer architecture perspective vector is a more precise term accessed. We have chosen to focus on a simple, yet flexible, data structure that allows us to express any regularly distributed specific implementation could simply call a native BLAS arrays. Figure 4 shows how the shape is implemented and how library and a Cluster specific implementation could call the the data is projected. ScaLAPACK library[17]. 
The aim is to have a vector bytecode that supports data parallelism implicitly and thus makes it easy for the bridge to translate the user language into the bytecode efficiently. Additionally, the design enables the VE to exploit data parallelism through SIMD (Single Instruction, Multiple Data) and the VEM through SPMD (Single Program, Multiple Data).

In the following, we will go through the four types of vector bytecodes in Bohrium.

1) Element-wise: Element-wise bytecodes perform a unary or binary operation on all array elements. Bohrium currently supports 53 element-wise operations, e.g. addition, multiplication, square root, equal, less than, logical and, bitwise and, etc. For element-wise operations, we only allow data overlap between the input and the output arrays if the access pattern is the same, which, combined with the fact that they are all stateless, makes it straightforward to execute them in parallel.

2) Reduction: Reduction bytecodes reduce an input dimension using a binary operator. Again, we do not allow data overlap between the input and the output arrays, and the operator must be associative. Bohrium currently supports 10 reductions, e.g. addition, multiplication, minimum, etc. Even though none of them are stateless, the reductions are all straightforward to execute in parallel because of the non-overlap and associativity properties.

3) Data Management: Data Management bytecodes determine the data ownership of arrays and consist of three different bytecodes. The synchronization bytecode orders a child component to place the array data in the address space of its parent component. The free bytecode orders a child component to free the data of a given array in the global address space. Finally, the discard operator orders a child component to free the meta-data associated with a given array, and signals that any local copy of the data is now invalid. These three bytecodes enable lazy allocation where the actual array data allocation is delayed until it is used. Often arrays are created with a generator (e.g. random, constants) or with no data (e.g. temporary), which may exist on the computing device exclusively. Thus, lazy allocation may save several memory allocations and copies.

4) Extension methods: The above three types of bytecode make up the bulk of a Bohrium execution. However, not all algorithms may be efficiently implemented in this way. In order to handle operations that would otherwise be inefficient or even impossible, we introduce the fourth type of bytecode: extension methods. We impose no restrictions on this generic operation; the extension writer has total freedom. However, Bohrium does not guarantee that all components support the operation. Initially, the user registers the extension method with paths to all component-specific implementations of the operation. The user then receives a new handle for this extension method and may use it subsequently as a vector bytecode. Matrix multiplication and FFT are examples of extension methods that are obviously needed. For matrix multiplication, a CPU-specific implementation could simply call a native BLAS library, and a Cluster-specific implementation could call the ScaLAPACK library[17].

C. Bridge

The Bridge component is the bridge between the programming interface, e.g. Python/NumPy, and the VEM. The Bridge is the only component that is specifically implemented for the user programming language. In order to add Bohrium support to a new language or library, only the bridge component needs to be implemented. The bridge component generates bytecode based on the user application and sends it to the underlying VEM.

D. Vector Engine Manager

Rather than allowing the Bridge to communicate directly with the Vector Engine, we introduce a Vector Engine Manager into the design. The VEM is responsible for one memory address space in the hardware configuration. The current version of Bohrium implements two VEMs: the Node-VEM that handles the local address space of a single machine and the Cluster-VEM that handles the global distributed address space of a computer cluster.

The Node-VEM is very simple since the hardware already provides a shared memory address space; hence, the Node-VEM can simply forward all instructions from its parent to its child components. The Cluster-VEM, on the other hand, has to distribute all arrays between Node-VEMs in the cluster.

1) Cluster Architectures: In order to utilize scalable architectures fully, distributed memory parallelism is mandatory. The current Cluster-VEM implementation is quite naïve; it uses the bulk-synchronous parallel model[18] with static data decomposition and no communication latency hiding. We know from previous work that such optimizations are possible[19].

Bohrium implements all communication through the MPI-2 library and uses a process hierarchy that consists of one master process and multiple worker processes. The master process executes a regular Bohrium setup with the Bridge, Cluster-VEM, Node-VEM, and VE. The worker processes, on the other hand, execute the same setup but without the Bridge and thus without the user applications. Instead, the master process will broadcast vector bytecode and array meta-data to the worker processes throughout the execution of the user application.

Bohrium uses a data-centric approach where a static decomposition dictates the data distribution between the MPI processes. Because of this static data decomposition, all processes have full knowledge of the data distribution and need not exchange data-location meta-data. Furthermore, the task of computing array operations is also statically distributed, which means that any process can calculate locally what needs to be sent, received, and computed. Meta-data communication is only needed when broadcasting vector bytecode and creating new arrays – a task that has an asymptotic complexity of O(log₂ n), where n is the number of nodes.

E. Vector Engine

The Vector Engine (VE) is the only component that actually does the computations specified by the user application. It has to execute the instructions it receives in an order that complies with the dependencies between instructions. Furthermore, it has to ensure that its parent VEM has access to the results as governed by the Data Management bytecodes.

1) CPU: The CPU-VE utilizes all cores available on the given CPU. The CPU-VE is implemented as an in-order interpreter of bytecode. It features dynamic compilation for single-expression just-in-time optimization, which allows the engine to perform runtime value optimization, such as specialized interpretation based on the shape and rank of operands, as well as parallelization using OpenMP.

Dynamic memory allocation on the heap is a time-consuming task. This is particularly the case when allocating large chunks of memory because of the involvement of the system kernel. Typically, NumPy applications use many temporary arrays and thus use many consecutive, equally sized memory allocations and de-allocations. In order to reduce the overhead associated with these memory allocations and de-allocations, we make use of a reusing scheme similar to a Victim Cache[20]. Instead of de-allocating memory immediately, we store the allocation for later reuse. If we, at a later point, encounter a memory allocation of the same size as the stored allocation, we can simply reuse the stored allocation. In order to have an upper bound on the extra memory footprint, we have a threshold for the maximum memory consumption of the cache. When allocating memory that does not match any cached allocations, we de-allocate a number of cached allocations such that the total memory consumption of the cache is below the threshold. Previous work has proven this memory-reusing scheme very efficient for Python/NumPy applications[21].

2) GPU: To harness the computational power of the modern GPU we have created the GPU-VE for Bohrium. Since Bohrium imposes an array-oriented style of programming on the user, which directly maps to data-parallel execution, Bohrium bytecode is a perfect match for a modern GPU. We have chosen to implement the GPU-VE in OpenCL over CUDA. This was the natural choice since one of the major goals of Bohrium is portability, and OpenCL is supported by more platforms.

The GPU-VE currently uses a simple kernel building and code generation scheme: it will keep adding instructions to the current kernel for as long as the shape of the instruction output matches that of the current kernel and adding it will not create a data hazard. Input parameters are registered so they can be read from global memory. Similarly, output parameters are registered to be written back to global memory. The GPU-VE implements a simple method for temporary array elimination when building kernels:

• If the kernel already reads the input, or it is generated within the kernel, it will not be read from global memory.
• If the instruction output is not needed later in the instruction sequence – signaled by a discard – it will not be written back to global memory.

This simple scheme has proven fairly efficient. However, the efficiency is closely linked to the ability of the bridge to send discards close to the last usage of an array in order to minimize the active memory footprint, since this is a very scarce resource on the GPU. The code generation we have in the GPU-VE simply translates every Bohrium instruction into exactly one line of OpenCL code.

F. Example

Figure 5 illustrates the list of vector bytecode that the NumPy Bridge will generate when executing one of the iterations in the Jacobi Method code example (Fig. 1). The example demonstrates the nearly one-to-one mapping from the NumPy vector operations to the Bohrium vector bytecode. The code generates seven temporary arrays (t1,...,t7) that are not specified in the code explicitly but are a result of how Python interprets the code. In a regular NumPy execution, the seven temporary arrays translate into seven memory allocations and de-allocations, thus imposing an extra overhead. On the other hand, a Bohrium execution with the Victim Cache will only use two memory allocations, since six of the temporary arrays (t1,...,t6) will use the same memory allocation. However, no writes to memory are eliminated. In the GPU-VE the source code generation eliminates the memory writes altogether; (t1,...,t5) are stored only in registers. Without this strategy the speedup gain would not be possible on the GPU due to the memory bandwidth bottleneck.

1  ...
2  ADD t1, center, north
3  ADD t2, t1, south
4  FREE t1
5  DISCARD t1
6  ADD t3, t2, east
7  FREE t2
8  DISCARD t2
9  ADD t4, t3, west
10 FREE t3
11 DISCARD t3
12 MUL tmp, t4, 0.2
13 FREE t4
14 DISCARD t4
15 MINUS t5, tmp, center
16 ABS t6, t5
17 FREE t5
18 DISCARD t5
19 ADD_REDUCE t7, t6
20 FREE t6
21 DISCARD t6
22 ADD_REDUCE delta, t7
23 FREE t7
24 DISCARD t7
25 COPY center, tmp
26 FREE tmp
27 DISCARD tmp
28 SYNC delta
29 ...

Fig. 5: Bytecode generated in each iteration of the Jacobi Method code example (Fig. 1). Note that the SYNC instruction at line 28 transfers the scalar delta from the Bohrium address space to the NumPy address space in order for the Python interpreter to evaluate the condition in the Jacobi Method code example (Fig. 1, line 9).
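To make the allocation-reuse (Victim Cache) idea concrete, the following is a small Python sketch of such an allocator: instead of freeing a buffer immediately, it is kept for a later allocation of the same size, bounded by a total-bytes threshold. The class name, the eviction policy shown, and the use of bytearray buffers are assumptions for illustration, not Bohrium's implementation.

class VictimCache:
    def __init__(self, threshold_bytes):
        self.threshold = threshold_bytes
        self.cached = {}            # size -> list of retired buffers
        self.total = 0

    def alloc(self, nbytes):
        bucket = self.cached.get(nbytes)
        if bucket:                              # reuse a retired buffer
            self.total -= nbytes
            buf = bucket.pop()
            if not bucket:
                del self.cached[nbytes]
            return buf
        return bytearray(nbytes)                # fall back to a real allocation

    def free(self, buf):
        nbytes = len(buf)
        self.cached.setdefault(nbytes, []).append(buf)
        self.total += nbytes
        while self.total > self.threshold:      # evict until under the threshold
            size = next(iter(self.cached))
            victim = self.cached[size].pop()
            if not self.cached[size]:
                del self.cached[size]
            self.total -= len(victim)

cache = VictimCache(threshold_bytes=1 << 20)
a = cache.alloc(8192); cache.free(a)
b = cache.alloc(8192)                # same size: the retired buffer is reused
print(a is b)                        # True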


Fig. 6: Relative speedup of the Shallow Water application. For the CPU and Cluster, the application simulates a 2D domain with 25k² value points in 10 iterations. For the GPUs, it is a 2k × 4k domain in 100 iterations.

Fig. 7: Relative speedup of the Black Scholes application. For the CPU and Cluster, the application generates 10m element arrays using 10 iterations. For the GPUs, it generates 32m element arrays using 50 iterations.

Fig. 8: Relative speedup of the N-Body application. For the CPU and Cluster, the application simulates 25k bodies in 10 iterations. For the GPUs, it is 1600 bodies and 50 iterations.

TABLE I: Machine Specifications
             8-node Cluster       GPU Host
Processor:   AMD Opteron 6272     AMD Opteron 6274
Clock:       2.1 GHz              2.2 GHz
L3 Cache:    16MB                 16MB
Memory:      128GB DDR3           128GB DDR3
Compiler:    GCC 4.6.3            GCC 4.6.3 & OpenCL 1.1
Network:     Gigabit Ethernet     N/A
Software:    Linux 3.2, Python 2.7, NumPy 1.6.1

TABLE II: GPU Specifications
                          AMD/ATI              NVIDIA
Processor:                ATI Radeon HD 7850   GeForce GTX 680
#Cores:                   1024                 1536
Core clock:               900 MHz              1006 MHz
Memory:                   1GB DDR5             2GB DDR5
Memory bandwidth:         153 GB/s             192 GB/s
Peak (single-precision):  1761 GFLOPS          3090 GFLOPS
Peak (double-precision):  110 GFLOPS           128 GFLOPS

V. PRELIMINARY RESULTS

In order to demonstrate our Bohrium design we have implemented a basic Bohrium setup. This concretization of Bohrium is by no means exhaustive but only a proof-of-concept. The implementation supports Python/NumPy when executing on CPU, GPU, and Clusters. However, the implementation is preliminary and has a high degree of further optimization potential. In this section, we present a preliminary performance study of the implementation that consists of the following three representative scientific application kernels:

Shallow Water: A simulation of a system governed by the shallow water equations. A drop is placed in a still container and the water movement is simulated in discrete time steps. It is a Python/NumPy implementation of a MATLAB application by Burkardt [22].
Black Scholes: The Black-Scholes pricing model is a partial differential equation, which is used in finance for calculating price variations over time[23]. This implementation uses a Monte Carlo simulation to calculate the Black-Scholes pricing model.
N-Body: A Newtonian N-body simulation is one that studies how bodies, represented by a mass, a location, and a velocity, move in space according to the laws of Newtonian physics. We use a straightforward algorithm that computes all body-body interactions, O(n²), with collision detection.

We execute all three applications using four different hardware setups: one using two CPUs, one using an eight-node cluster, one using an AMD GPU, and one using an NVIDIA GPU. The dual-CPU setup uses one of the cluster nodes, whereas the two GPU setups use a similar AMD machine (Table I, II). For each benchmark/language, we compare the Bohrium execution with a native NumPy execution and calculate the speedup based on the average wall clock time of five executions. When executing on the CPU, we use all CPU cores available; likewise, when executing on the eight-node cluster, we use all CPU cores available on the cluster nodes. The input and output data is 64-bit floating point for all executions. While measuring the performance, the variation of the timings did not exceed 1%.

The data set sizes are chosen to represent realistic workloads for a cluster and GPU setup respectively. The speedups reported are obtained by comparing the wall clock time of the original NumPy execution with the wall clock time of executing the same Python program, with the same size of dataset.
A. Discussion

The Shallow Water application is memory intensive and uses many temporary arrays. This is clear when comparing the Bohrium execution with the native NumPy execution on a single CPU. The Bohrium execution is 2.18 times faster than the native NumPy execution, primarily because of memory allocation reuse. The Cluster setup demonstrates good scalable performance as well. Even without communication latency hiding, it achieves a speedup of 6.07 compared to the CPU setup and 13.2 compared to native NumPy. Finally, the two GPUs show an impressive 89 and 140 times speedup, which demonstrates the efficiency of parallelizing array operations on a vector machine. NVIDIA is roughly one and a half times faster than AMD, primarily because of the higher floating-point performance and memory bandwidth.

The Black Scholes application is computationally intensive and embarrassingly parallel, which is evident in the benchmark result. The Cluster setup achieves a speedup of 10.1 compared to native NumPy and an almost linear speedup of 7.91 compared to the CPU. Again, the performance of the GPUs is superior, with speedups of 130 and 181.

The N-Body application is memory intensive but does not use many temporary arrays, thus the speedup of the CPU execution compared with the native NumPy execution is only 1.29. However, the application scales well on the Cluster, with a speedup of 9.0 compared to the native NumPy execution and a speedup of 7.96 compared to the CPU execution. Finally, the two GPUs demonstrate a good speedup of 41.3 and 77.1 compared to the native NumPy execution.

VI. FUTURE WORK

From the experiments, we can see that the performance is generally good. There is much room for further improvements when executing on the Cluster. Communication techniques, such as communication latency hiding and message aggregation, should improve performance[24], [25] further.

Despite the good results, we are convinced that we can still improve these results significantly. We are currently working on an internal representation for bytecode dependencies, which will enable us to rearrange the instructions and eliminate the use of temporary storage. In the article describing Intel Array Building Blocks, the authors report that the removal of temporary arrays is the single optimization that yields the greatest performance improvement. Informal testing with manual removal of temporary storage shows an order of magnitude improvement, even for simple benchmarks.

The GPU vector engine already uses a simple scanning algorithm that detects some instances of temporary vector usage, as that is required to avoid exhausting the limited GPU memory. However, the internal representation will enable a better detection of temporary storage, but also enable loop detection and improve kernel generation and kernel reusability.

This internal representation will also allow pattern matching, which will allow selective replacement of parts of the instruction stream with optimized versions. This can be used to detect cases where the user is calculating a scalar sum, using a series of reductions, or to detect matrix multiplications. By implementing efficient micro-kernels for known computations, we can improve the execution significantly. Once these kernels are implemented, it is simple to offer them as function calls in the bridges. The bridge implementation can then simply implement the functionality by sending a pre-coded sequence of instructions.

We are also investigating the possibility of implementing a Bohrium Processing Unit, BPU, on FPGAs. With a BPU, we expect to achieve performance that rivals the best of today's GPUs, but with lower power consumption. As the FPGAs come with built-in Ethernet support, they can also provide significantly lower latency, possibly providing real-time data analysis.

Finally, the ultimate goal of the Bohrium project is to support clusters of heterogeneous computation nodes where components specialized for GPUs, NUMA (Non-Uniform Memory Access) aware multi-core CPUs, and Clusters work together seamlessly.

VII. CONCLUSION

The declarative array-programming model used in Bohrium provides a framework for high performance and high productivity. It enables the end-user to execute regular Python/NumPy applications efficiently on a broad range of hardware architectures without any hardware-specific knowledge. Furthermore, the Bohrium design supports scalable architectures such as clusters and supercomputers. It is even possible to combine architectures in order to exploit hybrid programming where multiple levels of parallelism exist, which is essential when fully utilizing supercomputers such as the Blue Gene/P[26].

In this paper, we introduce a proof-of-concept implementation of Bohrium that supports the Python programming language through a Bohrium implementation of NumPy and three computer architectures: CPU, GPU, and Cluster. The preliminary results are very promising – a Black Scholes computation achieves 181 times speedup for the same code, when comparing a native NumPy execution and a Bohrium execution that utilizes the GPU back-end.

The results are sufficiently good that we remain optimistic that we can reach a level where a pure Python/NumPy application offers sufficient performance on its own.

REFERENCES
[1] G. van Rossum, "Glue it all together with python," in Workshop on Compositional Software Architectures, Workshop Report, Monterey, California, 1998.
[2] T. E. Oliphant, A Guide to NumPy. Trelgol Publishing USA, 2006, vol. 1.
[3] D. Loveman, "High performance fortran," Parallel & Distributed Technology: Systems & Applications, IEEE, vol. 1, no. 1, pp. 25–42, 1993.
[4] W. Yang, W. Cao, T. Chung, and J. Morris, Applied numerical methods using MATLAB. Wiley-Interscience, 2005.
[5] C. Sanderson et al., "Armadillo: An open source c++ linear algebra library for fast prototyping and computationally intensive experiments," Technical report, NICTA, Tech. Rep., 2010.
[6] T. Veldhuizen, "Arrays in Blitz++," in Computing in Object-Oriented Parallel Environments, ser. Lecture Notes in Computer Science, D. Caromel, R. Oldehoeft, and M. Tholburn, Eds. Springer Berlin Heidelberg, 1998, vol. 1505, pp. 223–230.
[7] J. Brown, W. Scullin, and A. Ahmadia, "Solving the import problem: Scalable dynamic loading network file systems," in Talk at SciPy conference, Austin, Texas, July 2012.
[8] J. Enkovaara, N. A. Romero, S. Shende, and J. J. Mortensen, "GPAW - massively parallel electronic structure calculations with python-based software," Procedia Computer Science, vol. 4, pp. 17–25, 2011.
[9] A. Klöckner, N. Pinto, Y. Lee, B. Catanzaro, P. Ivanov, and A. Fasih, "PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation," Parallel Computing, vol. 38, no. 3, pp. 157–174, 2012.
[10] R. Garg and J. N. Amaral, "Compiling python to a hybrid execution environment," in Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, ser. GPGPU '10. New York, NY, USA: ACM, 2010, pp. 19–30.
[17] L. S. Blackford, "ScaLAPACK," in Proceedings of the 1996 ACM/IEEE Conference on Supercomputing (CDROM) - Supercomputing 96, 1996, p. 5.
[18] L. G. Valiant, "A bridging model for parallel computation," Commun. ACM, vol. 33, no. 8, pp. 103–111, Aug. 1990.
[19] M. Kristensen and B. Vinter, "Managing communication latency-hiding at runtime for parallel programming languages and libraries," in High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS), 2012 IEEE 14th International Conference on, 2012, pp. 546–555.
[20] N. Jouppi, "Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers," in Computer Architecture, 1990. Proceedings., 17th Annual International Symposium on, May 1990, pp. 364–373.
[21] S. A. F. Lund, K. Skovhede, M. R. B. Kristensen, and B. Vinter, "Doubling the Performance of Python/NumPy with less than 100 SLOC," in 4th Workshop on Python for High Performance and Scientific Computing (PyHPC'13), 2013.

[11] D. Tarditi, S. Puri, and J. Oglesby, “Accelerator: using data parallelism [22] J. Burkardt, “Shallow water equations,” people.sc.fsu.edu/ ∼jburkardt/ to program gpus for general-purpose uses,” SIGARCH Comput. Archit. m src/shallow water 2d/, [Online; accessed March 2010].\ \ \ \ News, vol. 34, no. 5, pp. 325–335, Oct. 2006. [23] F. Black and M. Scholes, “The pricing of options and corporate [12] C. Newburn, B. So, Z. Liu, M. McCool, A. Ghuloum, S. Toit, Z. G. liabilities,” The journal of political economy, pp. 637–654, 1973. Wang, Z. H. Du, Y. Chen, G. Wu, P. Guo, Z. Liu, and D. Zhang, [24] M. R. B. Kristensen and B. Vinter, “Numerical python for scalable “Intel’s array building blocks: A retargetable, dynamic compiler and architectures,” in Proceedings of the Fourth Conference on Partitioned embedded language,” in Code Generation and Optimization (CGO), Global Address Space Programming Model, ser. PGAS ’10. New York, 2011 9th Annual IEEE/ACM International Symposium on, 2011, pp. NY, USA: ACM, 2010, pp. 15:1–15:9. 224–235. [25] M. R. B. Kristensen, Y. Zheng, and B. Vinter, “Pgas for distributed nu- [13] B. Catanzaro, S. Kamil, Y. Lee, K. Asanovic, J. Demmel, K. Keutzer, merical python targeting multi-core clusters,” Parallel and Distributed J. Shalf, K. Yelick, and A. Fox, “Sejits: Getting productivity and Processing Symposium, International, vol. 0, pp. 680–690, 2012. performance with selective embedded jit specialization,” Programming Models for Emerging Architectures, 2009. [26] M. Kristensen, H. Happe, and B. Vinter, “GPAW Optimized for Blue Gene/P using Hybrid Programming,” in Parallel Distributed Processing, [14] R. Andersen and B. Vinter, “The scientific byte code virtual machine,” 2009. IPDPS 2009. IEEE International Symposium on, 2009, pp. 1–6. in GCA’08, 2008, pp. 175–181. [15] B. Mailloux, J. Peck, and C. Koster, “Report on the algorithmic language ,” Numerische Mathematik, vol. 14, no. 2, pp. 79–218, 1969. [Online]. Available: http://dx.doi.org/10.1007/BF02163002 [16] S. Van Der Walt, S. Colbert, and G. Varoquaux, “The numpy array: a structure for efficient numerical computation,” Computing in Science & Engineering, vol. 13, no. 2, pp. 22–30, 2011. NumCIL and Bohrium: High productivity and high performance

6.8 NumCIL and Bohrium: High productivity and high performance

NumCIL and Bohrium: High productivity and high performance

Kenneth Skovhede and Simon Andreas Frimann Lund

Niels Bohr Institute, University of Copenhagen, Denmark
{skovhede,safl}@nbi.ku.dk

Abstract. In this paper, we explore the mapping of the NumCIL C# vector library where operations are offloaded to the Bohrium runtime system and evaluate the performance gains. By using a feature-rich language, such as C#, we argue that productivity can be increased. The use of the Bohrium runtime system allows all vector operations written in C# to be executed efficiently on multi-core systems. We evaluate the presented design through a setup that targets a 32 core machine. The evaluation includes well-known benchmark applications, such as Black-Scholes, 5-point stencil, Shallow Water, and N-body.

Keywords: C# · NumCIL · Bohrium · High Performance · High Productivity · Vector Programming · Array Programming

1 Introduction

We have previously introduced the NumCIL library[13] for performing linear algebra in C#, using an approach known as vector programming, array programming or collection programming[12]. In such an approach, the programmer writes high-level operations on multidimensional vectors rather than looping over the individual elements. One of the primary benefits of such an approach is that it leaves the program more readable because it is more of a description of what should be done, rather than how it should be done. This approach can greatly speed up the development cycle, as the developer can focus on the structure of compact expressions, rather than explicitly specify details such as loop indices.

The Bohrium runtime system[10] is a related project aiming to deliver architecture specific optimizations. In Bohrium, a program will use the C or C++ interface to describe multidimensional vectors and request various operations on these. The execution of these operations is deferred until the program requires access to the result. This lazy evaluation approach enables the Bohrium runtime to collect a number of scheduled instructions and perform optimizations on these. The optimizations are an ongoing research project. Since Bohrium uses a common intermediate representation of the scheduled operations, it is possible to apply different optimization strategies to different execution targets. The Bohrium intermediate representation also enables execution of Bohrium bytecode on multi-core CPUs, GPGPUs and even cluster setups.
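To make the contrast concrete, the following minimal sketch shows the same computation written element-by-element and as a single array expression. It is given in NumPy-style Python purely for brevity (the equivalent NumCIL code uses NdArray operators, cf. Table 1); the array size and the expression are illustrative only.

import numpy as np

n = 1_000_000
a = np.random.rand(n)
b = np.random.rand(n)

# "How it should be done": an explicit loop with index bookkeeping.
c_loop = np.empty(n)
for i in range(n):
    c_loop[i] = 2.0 * a[i] + b[i]

# "What should be done": one declarative array expression.
c_vec = 2.0 * a + b

assert np.allclose(c_loop, c_vec)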

In this article, we only evaluate the performance using a multi-core CPU. A more detailed description of the Bohrium system is available in M. Kristensen et al.[10]. By adding an extension to the NumCIL library, the vector operations ex- pressed in C# can be forwarded to the Bohrium runtime system. This enables the programmer to have a rapid development cycle, without even having Bohrium installed. Once the program is tested for correctness, the unmodified program can then be executed with Bohrium support, such that all vector operations are executed with an efficient multi-core implementation.

2 Related Work

The array programming approach is in widespread use over a number of different programming languages, including Ada[5], CoArray Fortran[8], Chapel[3], NumPy[9] and numerous others. The NumPy approach differs in that it has no explicit support in Python but is implemented using Pythonic constructs in such a way that it seems natural to Python programmers. This approach means that nothing needs to change, in the Python programmer's toolchain, to take advantage of the array programming found in NumPy. This non-intrusive approach with a natural language integration is the inspiration for the NumCIL library.

The idea of using language features to add support for vector programming instead of modifying the language is also found in the C++ libraries Armadillo[11] and Blitz++[16]. The Armadillo library leverages existing linear algebra systems to achieve high performance but does so at template instantiation time, rather than at runtime.

The RyuJIT[4] compiler adds support for smaller vectors by converting vector operations to SIMD instructions. This approach helps in handling memory access and accelerates the execution time, but does require changes to the runtime system and does not offer any features for larger arrays. The RyuJIT is scheduled to ship with Microsoft's .Net framework 5[4]. The Mono runtime[17] offers the Mono.Simd library with similar capabilities, implemented as a library with special support from the runtime[18].

The ideas for providing an intermediate representation of the requested operations, and performing optimizations on this, are also found in the, now discontinued, Intel Array Building Blocks (ArBB) project[7]. The ArBB system relies on a special compiler and an extended C++ syntax to describe computational kernels. When executing a batch of instructions, a number of optimization techniques are applied, such as removal of scratch memory, loop fusion, etc.

The Bohrium runtime system[10] is similar to ArBB and Chapel, in that the programmer uses vectors and describes what should be done, rather than how it is done. Internally this is achieved by means of a vector-oriented byte-code, i.e. simple instructions for a pseudo vector processing system. This abstraction allows Bohrium to be programming language agnostic, and is used to express a flat C API. With this API, it is possible to support a number of programming languages, such as Python, C++ and C#, in which the developer uses some array-library to interact with Bohrium.

The programming model used by Bohrium and NumCIL is very similar to the one found in NumPy[9], for which there also exists a Bohrium interface. In that sense, NumCIL fills the same role as NumPy, by providing an abstraction for interacting with Bohrium.

3 Implementation

The NumCIL library consists of three main item types: multidimensional views, data storage and operators. The views are applied to the data storage to select a subset of the flat data storage, and project it into multiple dimensions, using offset, stride and skip values. Applied operators affect only the subset of the data that the view projects, which greatly reduces the need for copying data into appropriately sized containers. The implementation of the multidimensional views found in NumCIL is compatible with NumPy's ndarrays[9] and also the Bohrium data views.

The primary design goal for the Bohrium extension to NumCIL has been to allow a non-intrusive addition. This allows code already written and tested with NumCIL to use the Bohrium runtime system without any changes. The non-intrusive design is achieved by hooking into the DataAccessor class, which is normally a simple wrapper for an array. By replacing the NumCIL factory instance that produces DataAccessor items, it becomes possible to provide Bohrium enabled data accessors.

Table 1 shows a simple multidimensional program written with NumCIL. It illustrates how a flat array can be projected into multiple dimensions, and how the data can be broadcasted into larger dimensions. The program can be executed in Bohrium, simply by adding the statement NumCIL.Bohrium.Utility.Activate(); prior to running the code. If the program in table 1 is executed with Bohrium loaded, the variable "a" will not be allocated until it is needed in the very last line. In that very last line, the allocation, multiplication, addition and summation are executed in Bohrium as a single instruction batch. Depending on the Garbage Collector, the batch may or may not contain instructions to deallocate the memory as well.

When a Bohrium enabled data accessor is created, it can be created with or without existing data. If there is no existing data, as with "a", an empty array is allocated by the Bohrium system and a handle for this is maintained by the data accessor. If existing data is already present, as with "c", the data accessor behaves as a non-Bohrium enabled data accessor facilitating access to the array data. This ensures that data is always kept where it is already allocated and not copied needlessly.

When an operation is applied to a multidimensional view that is referencing a Bohrium enabled data accessor, such as the multiplication, the views involved are created in Bohrium and an instruction matching the requested operation is emitted to the Bohrium runtime system.
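The view mechanism can be illustrated with plain NumPy, whose ndarrays the NumCIL views are compatible with. The sketch below is not NumCIL code: it only shows how shape and stride values project a flat storage into a matrix, and how a sliced view writes straight through to that storage; NumCIL's additional skip value is omitted.

import numpy as np
from numpy.lib.stride_tricks import as_strided

flat = np.arange(12, dtype=np.float32)     # the flat data storage

# Project the storage as a 3x4 matrix: row stride 4 elements, column stride 1.
itemsize = flat.itemsize
mat = as_strided(flat, shape=(3, 4), strides=(4 * itemsize, itemsize))

# Select every other column of the middle row without copying any data.
view = mat[1, ::2]     # offset 4 elements, stride 2 elements
view[:] = -1           # writes through to the flat storage
print(flat)            # [ 0.  1.  2.  3. -1.  5. -1.  7.  8.  9. 10. 11.]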

C# code                                    Resulting data
using NumCIL.Float32; ...
var a = Generate.Range(3);                 [0, 1, 2]
var b = a[Range.All, Range.NewAxis];       [[0], [1], [2]]
var data = new float[] { 2, 3 };           [2, 3]
var c = new NdArray(data);                 [2, 3]
var d = b * c;                             [[0, 0], [1, 1], [2, 2]]
                                           * [[2, 3], [2, 3], [2, 3]]
                                           = [[0, 0], [2, 3], [4, 6]]
Console.WriteLine((d + 1).Sum());          [[0, 0], [2, 3], [4, 6]]
                                           + [[1, 1], [1, 1], [1, 1]]
                                           = [[1, 1], [3, 4], [5, 7]]
                                           = 15

Table 1: A simple vector program with NumCIL

However, emitting the operation does nothing more than adding the operations to the current batch. Since the CLR is using a garbage collected approach, there is a chance that the GC will run before the operations are executed. If the GC runs, it can reuse the memory occupied by non-referenced items, and it may also choose to move existing data to a new location, thus invalidating a pointer to the data. This problem is exacerbated by the introduction of temporary storage when compiling a composite statement, as shown in table 2.

Composite expression                Expansion to single expressions
var e = Generate.Range(10);         var e = Generate.Range(10);
var f = ((e + 10) * e) - 1;         var t0 = e + 10;
                                    var t1 = t0 * e;
                                    var f = t1 - 1;

Table 2: A composite expression and the equivalent single expression version

All of the temporary variables shown in table 2 will be short-lived and eliminated when the GC runs. In order to avoid issues with the GC, it is possible to Pin the memory when obtaining a pointer to the data. As long as the pointer is Pinned, the GC will not attempt to move or reuse the data. Since multiple multidimensional views may point to the same data, as with "a" and "b", a reference counting scheme is used to defer the Unpinning until the last reference is out of scope. This further ensures that data is not copied but used where it is located, with minimal overhead.
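The deferred unpinning can be sketched as a small reference-counting scheme. The Python class below is illustrative only and not the actual NumCIL/CIL implementation: the data block is pinned when the first view needs a stable pointer and unpinned only when the last reference is released.

class SharedPin:
    """Illustrative reference-counted pin/unpin of a shared data block."""

    def __init__(self, data):
        self.data = data
        self.refs = 0
        self.pinned = False

    def acquire(self):
        # Called when a view needs a stable pointer to the data.
        if self.refs == 0:
            self.pinned = True      # pin once, on first use
        self.refs += 1

    def release(self):
        # Called when a view goes out of scope.
        self.refs -= 1
        if self.refs == 0:
            self.pinned = False     # unpin only after the last reference

block = SharedPin(bytearray(1024))
block.acquire()    # view "a" needs the data
block.acquire()    # view "b" shares the same data
block.release()    # "a" is gone; the data stays pinned
block.release()    # "b" is gone; the data is unpinned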

When a Bohrium enabled data accessor is created without existing data, only the view data is initialized, and the data storage is kept uninitialized. When the operations eventually execute, the Bohrium runtime will allocate only the needed data. This allows for using more memory than what is physically available on the machine with no side effects.

When data is requested by the CIL, i.e. for the summation operation which returns a scalar, all pending operations need to execute to ensure that the data observed by the application contains the expected results. This is accomplished by performing a Bohrium sync command on the target data, and then requesting a flush of all pending instructions. In the case where the data being requested is not backed by a CIL array, an extra instruction is inserted that will copy the data allocated by Bohrium into a freshly created CIL array. This copy operation is done prior to the sync and flush commands, such that the intermediate storage can easily be eliminated by the Bohrium runtime system, thus allowing the results to be written directly to the CIL array.

If the user is only requesting a single element from the data, the entire data stays in the Bohrium allocated memory region, and only the requested element is copied into a CIL variable. This greatly reduces memory usage if only single elements are requested in a large array, such as when reading only the border values. If the user is writing to a single element in data that is not backed by a CIL array, all pending operations are flushed before writing the element directly into the memory region allocated by Bohrium. Table 3 shows the different states the DataAccessor goes through.

Created with   Used in operation                     Access element                Access array
No data        create new array → create bh handle   flush → read from pointer     emit copy → flush → free bh handle → convert to CIL → return array
CIL array      pin → create bh handle                flush → unpin → read array    flush → unpin → return array

Table 3: State flow for a Bohrium enabled NumCIL DataAccessor

4 Results

To evaluate the performance of the library, we have implemented a number of computational cores for classic simulations. The benchmarks are all implemented in C# and run using Mono 3.2.8 on Ubuntu 14.04.02. In order to provide a reasonable baseline, the benchmarks are also implemented with NumPy 1.8.2 and executed with Python 2.7.6. The hardware platform has two AMD Opteron 6272 CPUs with a total of 32 cores and 128 GB DDR3 memory with 4 memory busses. GCC version 4.8.2 was used to compile the Bohrium runtime. Source code is available for the Bohrium and NumCIL packages[15], as well as for the benchmarks[14].

Various options were used when executing the C# benchmarks. The basic Managed mode is using only C# and CIL functionality. The Unsafe configuration utilizes the option to bypass array bounds checks within the CIL runtime, by accessing the data through memory pointers, with so-called unsafe code. The Unsafe configuration does not appear to influence the execution times on Mono, but is shown here as it does have an effect on the Microsoft .Net runtime[13]. When executing the benchmarks with Bohrium enabled, the number of utilized threads is varied to give an indication of the scalability.

In general, the NumPy versions execute faster than the C# versions of the same code. There are two main reasons for this. Firstly, the NumPy implementation is written mostly in C, which means that none of the Python overhead is present. Secondly, the Mono JIT compiler does not perform various optimizations, such as efficient function inlining. When using the Microsoft .Net runtime, the execution times are roughly half of the Mono results, and approximately 20% faster than the NumPy code[13]. On the Windows platform, we would expect NumCIL by itself to perform roughly 50% faster than the reported Mono results, but when coupled with Bohrium, only metadata is handled by the .Net runtime, and thus the obtained execution times would be the same.

4.1 General observations

The speedup does not exceed a factor of 4, even when using 32 cores. This limitation stems from the current execution mode in Bohrium, where each operation is executed individually. This approach has the effect that each operation will read all memory inputs and write all memory outputs for each operation, even if the inputs or outputs are needed for other operations. As the inputs and outputs are vectors, the caches are not utilized, effectively limiting the throughput to the bandwidth of the memory system. This issue, and many other performance issues, can be mitigated through a technique known as loop fusion, where loop traversals are transposed, such that less memory access is required. Even though these optimizations are not yet implemented in Bohrium, we still see speedups. Once these optimizations are fully implemented in Bohrium, the NumCIL library will automatically perform even better.
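The memory-traffic argument can be made concrete with a small sketch. Evaluating a compound expression one operation at a time materializes full-length temporaries, whereas a fused evaluation touches each element once. The sketch below uses the Numexpr library purely as a stand-in for the kind of fused execution Bohrium is working towards; the expression and array size are illustrative, and Numexpr must be installed for the fused variant to run.

import numpy as np
import numexpr as ne

n = 10_000_000
a = np.random.rand(n)
b = np.random.rand(n)

# Operation-at-a-time evaluation: every step reads and writes whole arrays,
# so the memory system is traversed once per operation.
t0 = a + b
t1 = t0 * a
r_unfused = t1 - 1.0

# Fused evaluation: the whole expression becomes a single loop that reads
# a and b once and writes the result once.
r_fused = ne.evaluate("(a + b) * a - 1.0")

assert np.allclose(r_unfused, r_fused)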

4.2 Black-Scholes model

The Black-Scholes model is a financial method for estimating the price of stock options[1]. It can be considered an embarrassingly parallel computation kernel, similar to Monte-Carlo π, but with a heavier computational workload. As shown in figure 1, the performance gains from the Bohrium runtime are fairly low, due to the current configuration not being able to efficiently fuse the operations, causing a high load on the memory system. As the memory system is saturated, adding execution units does not improve performance.
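For reference, the kind of workload this benchmark represents can be sketched as a Monte Carlo estimate of a European call option price. The sketch is written in NumPy-style Python and is not the benchmark's actual C#/NumCIL code; the parameter values are illustrative. Every step is a whole-array operation, which is what makes the kernel embarrassingly parallel.

import numpy as np

def mc_call_price(s0=100.0, k=110.0, t=1.0, r=0.05, sigma=0.2, n=3_200_000):
    """Monte Carlo estimate of a European call price under Black-Scholes."""
    z = np.random.standard_normal(n)                    # one large array of samples
    st = s0 * np.exp((r - 0.5 * sigma ** 2) * t + sigma * np.sqrt(t) * z)
    payoff = np.maximum(st - k, 0.0)                    # element-wise payoff
    return np.exp(-r * t) * payoff.mean()               # discounted average

print(mc_call_price())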

Fig. 1: BlackScholes 3200000, 36 iterations. (Bar chart of wall-clock time in seconds for Numpy, Unsafe, Managed, and Bohrium with 1, 2, 4, 8, 16, and 32 threads.)

4.3 Heat Equation

The Heat Equation benchmark is implemented as a 5-point stencil and simulates thermal dispersion in a material. The stencil is applied ten times to a 5000x5000 element array of single precision floating point numbers. The computation is simple additions, using multiple parallel accesses to the same memory. Even though the Mono implementation has some drawbacks and performs significantly slower than NumPy, the Bohrium runtime can re-use memory allocations, which allows for significant speedups[6]. Despite the low computational complexity, the Bohrium runtime can speed up execution when using all cores.
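The stencil itself amounts to five shifted whole-array views of the grid. A minimal NumPy-style sketch of one such Jacobi-style sweep is shown below; the benchmark is the NumCIL/C# analogue of this, and the boundary value is illustrative.

import numpy as np

grid = np.zeros((5002, 5002), dtype=np.float32)   # 5000x5000 interior plus border
grid[0, :] = 40.0                                 # an illustrative hot boundary

for _ in range(10):                               # ten sweeps, as in the benchmark
    center = grid[1:-1, 1:-1]
    north  = grid[0:-2, 1:-1]
    south  = grid[2:,   1:-1]
    west   = grid[1:-1, 0:-2]
    east   = grid[1:-1, 2:]
    grid[1:-1, 1:-1] = 0.2 * (center + north + south + east + west)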

4.4 n-body simulation

The n-body simulation is implemented in a naive manner, yielding O(N²) complexity. For each time-step, the forces of all bodies on all bodies are computed, and their velocities and positions are updated. The Mono runtime slightly outperforms the NumPy version for this benchmark. When the Bohrium runtime is activated, it is capable of memory re-use and runs over twice as fast on a single core, with speedup on up to 16 threads.
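A minimal NumPy-style sketch of the naive all-pairs step is shown below. It is not the benchmark's actual code: the gravitational constant is set to one, a small softening term avoids division by zero, and the body count is reduced so the (n, n, 3) temporaries stay modest.

import numpy as np

def nbody_step(pos, vel, mass, dt=0.01, eps=1e-9):
    # One naive O(n^2) time step: all pairwise forces via broadcasting.
    d = pos[np.newaxis, :, :] - pos[:, np.newaxis, :]    # (n, n, 3) separations
    r2 = (d ** 2).sum(axis=2) + eps                      # softened squared distances
    inv_r3 = r2 ** -1.5
    np.fill_diagonal(inv_r3, 0.0)                        # remove self-interaction
    acc = (d * (mass[np.newaxis, :] * inv_r3)[:, :, np.newaxis]).sum(axis=1)
    vel += dt * acc                                      # update velocities ...
    pos += dt * vel                                      # ... then positions

n = 1000                                                 # illustrative size
pos = np.random.rand(n, 3)
vel = np.zeros((n, 3))
mass = np.ones(n)
for _ in range(10):
    nbody_step(pos, vel, mass)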

Fig. 2: HeatEquation 5000x5000, 10 iterations. (Bar chart of wall-clock time in seconds for Numpy, Unsafe, Managed, and Bohrium with 1, 2, 4, 8, 16, and 32 threads.)

Fig. 3: nBody 5000, 10 iterations. (Bar chart of wall-clock time in seconds for Numpy, Unsafe, Managed, and Bohrium with 1, 2, 4, 8, 16, and 32 threads.)

4.5 Shallow water

The Shallow Water simulation[2] is performed on a grid of 5000 by 5000 single precision numbers, over ten discrete timesteps, simulating water movements. Many independent computations on each element dominate the computations, yielding irregular memory accesses. The NumPy implementation is more than twice as fast as the Mono version. The Bohrium runtime can improve this even further, by almost a factor of two, yielding a total speedup of four times, compared to the basic Mono performance.

Fig. 4: Shallow water 5000x5000, 10 iterations. (Bar chart of wall-clock time in seconds for Numpy, Unsafe, Managed, and Bohrium with 1, 2, 4, 8, 16, and 32 threads.)

5 Conclusion

We have implemented and evaluated an extension to the NumCIL library, which enables completely transparent support for execution of existing programs with the Bohrium runtime system. From the benchmarks, it is clear that even with the sub-par performance from the Mono JIT compiler, the Bohrium runtime system can deliver substantial speedups. Given the high-level language features in C#, it is clear that the NumCIL library can be used for rapid development, and when paired with the Bohrium runtime, it also yields high performance. Even with the speedups reported here, a number of additional optimiza- tions are being developed for the Bohrium runtime, including loop fusion and NUMA-aware memory handling. Once these optimizations are implemented in Bohrium, the loosely coupled approach used in NumCIL will automatically give even greater performance boosts.

Acknowledgment

This research was supported by grant number 131-2014-5 from Innovation Fund Denmark. This research has been partially supported by the Danish Strategic Research Council, Program Committee for Strategic Growth Technologies, for the research center ’HIPERFIT: Functional High Performance Computing for Fi- nancial Information Technology’ (hiperfit.dk) under contract number 10-092299.

References

1. Black, F., Scholes, M.: The pricing of options and corporate liabilities. The journal of political economy pp. 637–654 (1973)
2. Burkardt, J.: Shallow water equations. http://people.sc.fsu.edu/~jburkardt/m_src/shallow_water_2d/, [Online; accessed May 2015]
3. Chamberlain, B., Vetter, J.S.: An introduction to Chapel: Cray Cascade's high productivity language. In: AHPCRC DARPA Parallel Global Address Space (PGAS) Programming Models Conference, Minneapolis, MN (2005)
4. Frei, K.: RyuJIT CTP3: How to use SIMD. http://blogs.msdn.com/b/clrcodegeneration/archive/2014/04/03/ryujit-ctp3-how-to-use-simd.aspx, [Online; accessed May 2015]
5. Ichbiah, J.D., Krieg-Brueckner, B., Wichmann, B.A., Barnes, J.G., Roubine, O., Heliard, J.C.: Rationale for the design of the Ada programming language. ACM Sigplan Notices 14(6b), 1–261 (1979)
6. Lund, S.A., Skovhede, K., Kristensen, M.R.B., Vinter, B.: Doubling the performance of Python/NumPy with less than 100 SLOC. In: IEEE International Conference on Performance, Computing, and Communications (2013)
7. Newburn, C.J., So, B., Liu, Z., McCool, M., Ghuloum, A., Toit, S.D., Wang, Z.G., Du, Z.H., Chen, Y., Wu, G., et al.: Intel's array building blocks: A retargetable, dynamic compiler and embedded language. In: Code Generation and Optimization (CGO), 2011 9th Annual IEEE/ACM International Symposium on, pp. 224–235. IEEE (2011)
8. Numrich, R.W., Reid, J.: Co-array Fortran for parallel programming. In: ACM Sigplan Fortran Forum. vol. 17, pp. 1–31. ACM (1998)
9. Oliphant, T.E.: Python for scientific computing. Computing in Science & Engineering 9(3), 10–20 (2007), http://scitation.aip.org/content/aip/journal/cise/9/3/10.1109/MCSE.2007.58
10. Kristensen, M.R.B., Lund, S.A.F., Blum, T., Skovhede, K., Vinter, B.: Bohrium: a virtual machine approach to portable parallelism. In: Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2014 IEEE 28th International. IEEE (2014)
11. Sanderson, C.: Armadillo: C++ linear algebra library. http://arma.sourceforge.net/, [Online; accessed May 2015]
12. Sipelstein, J.M., Blelloch, G.E.: Collection-oriented languages. Proceedings of the IEEE 79(4), 504–523 (1991)
13. Skovhede, K., Vinter, B.: NumCIL: Numeric operations in the common intermediate language. Journal of Next Generation Information Technology 4(1), 9–18 (2013)
14. Team, B.: Benchpress source code. https://github.com/bh107/benchpress, [Online; accessed May 2015; used revision 349ce5c1a69bb723a76783f7720c6ff0874519af]
15. Team, B.: Bohrium source code. https://github.com/bh107/bohrium, [Online; accessed May 2015; used revision 6f27c1fb3ae46c9b2541ba6d15b44e4a02e2cb01]
16. Veldhuizen, T., Cummings, J.: Blitz++. http://blitz.sourceforge.net/, [Online; accessed May 2015]
17. Xamarin: Mono: Cross platform, open source .NET framework. http://www.mono-project.com/, [Online; accessed May 2015]
18. Xamarin: Mono.Simd namespace: Hardware accelerated SIMD-based primitives. http://api.xamarin.com/index.aspx?link=N%3AMono.Simd, [Online; accessed May 2015]

6.9 Separating NumPy API from Implementation

Separating NumPy API from Implementation

Mads R. B. Kristensen, Simon A. F. Lund, Troels Blum, and Kenneth Skovhede Niels Bohr Institute, University of Copenhagen, Denmark madsbk/safl/blum/skovhede @nbi.dk { } Abstract—In this paper, we introduce a unified back- such as multi-core CPUs, GPUs, and Clusters. The framework end framework for NumPy that combine a broad range of exposes NumPy applications as a stream of abstract array Python code accelerators with no modifications to the user operations that architecture-specific computation backends can Python/NumPy application. Thus, a Python/NumPy application execute in parallel without the need for modifying the original can utilize hardware architecture such as multi-core CPUs and NumPy application. GPUs and optimization techniques such as Just-In-Time compi- lation and loop fusion without any modifications. The backend The aim of this new unified NumPy backend is to provide framework defines a number of primitive functions, including support for a broad range of computation architectures with all existing ufuncs in NumPy, that a specific backend must minimal or no changes to existing NumPy applications. Fur- implement in order to accelerate a Python/NumPy application. thermore, we insist on legacy support (at least back to version The framework then seamlessly translates the Python/NumPy 1.6 of NumPy), thus we will not require any changes to the application into a stream of calls to these primitive functions. NumPy source code itself. In order to demonstrate the usability of our unified backend framework, we implement and benchmark four different back- II.RELATED WORK end implementations that use four different Python libraries: NumPy, Numexpr, libgpuarray, and Bohrium. The results are Numerous projects strive to accelerate Python/NumPy ap- very promising with a speedup of up to 18 compared to a pure plications through very different approaches. In order to utilize NumPy execution. the performance of existing programming languages, projects such as Cython[13], IronPython[14], and Jython[15], introduce I.INTRODUCTION static source-to-source compilation to C, .NET, and Java, respectively. However, none of the projects are seamlessly Python is a high-level, general-purpose, interpreted lan- compatible with Python – Cython extends Python with static guage. Python advocates high-level abstractions and conve- type declarations whereas IronPython and Jython do not sup- nient language constructs for readability and productivity port third-party libraries such as NumPy. rather than high-performance. However, Python is easily ex- PyPy[16] is a Python interpreter that makes use of Just- tensible with libraries implemented in high-performance lan- in-Time (JIT) compilation in order to improve performance. guages such as C and FORTRAN, which makes Python a great PyPy is also almost Python compliant, but again PyPy does tool for gluing high-performance libraries together[1]. NumPy not support libraries such as NumPy fully and, similar to is the de-facto standard for scientific applications written in IronPython and Jython, it is not possible to fall back to Python[2] and contributes to the popularity of Python in the the original Python interpreter CPython when encountering HPC community. NumPy provides a rich set of high-level unsupported Python code. numerical operations and introduces a powerful array object. 
The array object is essential for scientific libraries, such as Alternatively, projects such as Weave[17], Numexpr[18], SciPy[3] and matplotlib[4], and a broad range of Python and Numba[19] make use of JIT compilation to accelerate wrappers of external scientific libraries[5], [6], [7]. NumPy parts of the Python application. Common for all of them is supports a declarative vector programming style where numer- the introduction of functions or decorators that allow the user ical operations applies to full arrays rather than scalars. This to specify acceleratable code regions. programming style is often referred to as vector or array pro- gramming and is commonly used in programming languages In order to utilize GPGPUs the PyOpenCL and PyCUDA and libraries that target the scientific community, e.g. HPF[8], projects enable the user to write GPU kernels directly in ZPL[9], MATLAB[10], Armadillo[11], and Blitz++[12]. Python[20]. The user writes OpenCL[21] or CUDA[22] spe- cific kernels as text strings in Python, which simplifies the NumPy does not make Python a high-performance lan- utilization of OpenCL or CUDA compatible GPUs but still guage but through array programming it is possible to achieve requires OpenCL or CUDA programming knowledge. Less in- performance within one order of magnitude of C. In contrast trusively, libgpuarray, which is part of the Theano[23] project, to pure Python, which typically is more than hundred if introduces GPU arrays on which all operations execute on the not thousand times slower than C. However, NumPy does GPU. The GPU arrays are similar to NumPy arrays but are not utilize parallel computer architectures when implementing not a drop-in replacement. basic array operations; thus only through external libraries, such as BLAS or FFTW, is it possible to utilize data or task III.THE INTERFACE parallelism. The interface of our unified NumPy backend (npbackend) In this paper, we introduce a unified NumPy backend that consists of two parts: a user interface that facilitates the enables seamless utilization of parallel computer architecture end NumPy user and a backend interface that facilitates the 1 import npbackend as np 2 import matplotlib . pyplot as plt 3 4 def solve ( height , width , epsilon = 0 . 0 0 5 ) : 5 grid = np . zeros (( height+2 ,width+2) ,dtype=np . float64 ) 6 grid [:,0] = -273.15 7 grid [:,-1] = -273.15 8 grid [-1,:] = -273.15 9 grid [ 0 , : ] = 4 0 . 0 10 center = grid [1:-1,1:-1] 11 north = grid [ : - 2 , 1 : - 1 ] 12 south = grid [ 2 : , 1 : - 1 ] Fig. 1: The Software Stack 13 east = grid [ 1 : - 1 , : - 2 ] 14 west = grid [ 1 : - 1 , 2 : ] . 15 delta = epsilon+1 16 while delta > epsilon : 17 tmp = 0 . 2 * ( center+north+south+east+west ) backend writers (Fig. 1). The source code of both interfaces 18 delta = np .sum( np .abs ( tmp - center )) and all backend implementations is an available at the Bohrium 19 center [ : ] = tmp project’s website1 for further inspection. In the following two 20 plt . matshow ( center , cmap='hot' ) subsections, we present the two interfaces. 21 plt . show ()

Fig. 2: Python implementation of a heat equation solve that uses the finite- A. The User Interface difference method to calculate the heat diffusion. Note that we could replace the first line of code with “import numpy as np” and still utilize The main design objective of the user interface is easy npbackend through the command line argument “-m”, e.g. “python -m transition from regular NumPy code to code that utilizes a npbackend heat2d.py” unified NumPy backend. Ideally, there should be no difference between NumPy code with or without a unified NumPy Figure 2, is an implementation of a heat equation solver backend. Through modifications of the NumPy source code, that imports the npbackend module explicitly at the first the DistNumPy[24] and Bohrium[25] projects demonstrate line and a popular visualization module, Matplotlib, at the that it is possible to implement an alternative computation second line. At line 5, the function zeros() creates a new backend that does not require any changes to the user’s NumPy npbackend-array that overloads the arithmetic operators, such code. However, it is problematic to maintain a parallel version as * and +. Thus, at line 17 the operators use npbackend of NumPy that contains complex modifications to numerous rather than NumPy. However, in order to visualize (Fig. 3) parts of the project, particularly when we have to fit each the center array at line 20, Matplotlib accesses the memory modification to a specific version of NumPy (version 1.6 of center directly. through 1.9). Now, in order to explain what we mean by directly, we As a consequence, instead of modifying NumPy, we in- have to describe some implementation details of NumPy. A troduce a new Python module npbackend that implements an NumPy ndarray is a C implementation of a Python class array object that inherit from NumPy’s ndarray. The idea is that exposes a segment of main memory through both a C that this new npbackend-array can be a drop-in replacement and a Python interface. The ndarray contains metadata that of the numpy-array such that only the array object in NumPy describes how the memory segment is to be interpreted as a applications needs to be changed. Similarly, the npbackend multi-dimensional array. However, only the Python interface module is a drop-in replacement of the NumPy module. seamlessly interprets the ndarray as a multi-dimensional array. The user can make use of npbackend through an ex- The C interface provides a C-pointer to the memory segment plicit and an implicit approach. The user can explicitly im- and lets the user handle the interpretation. Thus, with the port npbackend instead of NumPy in the source code e.g. word directly we mean that Matplotlib accesses the memory “import npbackend as numpy” or the user can alias segment of center through the C-pointer. In which case, the NumPy imports with npbackend imports globally through only option for npbackend is to make sure that the computed the -m interpreter argument e.g. “python -m npbackend values of center are located at the correct memory segment. user_app.py”. Npbackend is oblivious to the actual operations Matplotlib performs on center. Even though the npbackend is a drop-in replacement, the backend might not implement all of the NumPy API, in Consequently, the result of the Matplotlib call is a Python which case npbackend will gracefully use the original NumPy warning explaining that npbackend will not accelerate the implementation. 
Since npbackend-array inherits from numpy- operation on center at line 20; instead the Matplotlib im- array, the original NumPy implementation can access and plementation will handle the operation exclusively. apply operations on the npbackend-array seamlessly. The result is that a NumPy application can utilize an architecture-specific B. The Backend Interface backend with minimal or no modification. However, npback- end does not guarantee that all operations in the application The main design objective of the backend interface is will utilize the backend — only the ones that the backend to isolate the calculation-specific from the implementation- support. specific. In order to accomplish this, we translate a NumPy execution into a sequence of primitive function calls, which 1http://bh107.org the backend must implement. 1 ”””Abstract module for computation backends””” 2 3 class base (object): 4 ”””Abstract base array handle (an array has only one - base array)””” ← 5 def __init__ ( self , size , dtype ): 6 self . size = size #Total number of elements 7 self . dtype = dtype # Data t y p e 8 9 class view (object): 10 ”””Abstract array view handle””” 11 def __init__ ( self , ndim , start , shape , stride , base ): 12 self . ndim = ndim #Number of dimensions 13 self . shape = shape #Tuple of dimension sizes 14 self . base = base #The base array this view refers to 15 self . start = start*base . dtype . itemsize #Offset from - base (in bytes) ← 16 self . stride = [ x*base . dtype . itemsize f o r x i n stride ] - #Tuple of strides (in bytes) ← 17 18 def get_data_pointer ( ary , allocate=False , nullify=False ): 19 ”””Return a C-pointer to the array data (as a Python - Fig. 3: The matplotlib result of executing the heat equation solver from figure ← 2: solve(100,100) integer)””” 20 raise NotImplementedError () 21 . 22 def set_data_from_ary ( self , ary ): 23 ”””Copy data from 'ary' into the array 'self'””” Figure 4 is the abstract Python module that a npbackend 24 raise NotImplementedError () must implement. It consists of two Python classes, base 25 and view, that represent a memory sequence and a multi- 26 def ufunc ( op , *args ): 27 ”””Perform the ufunc 'op' on the 'args' arrays””” dimensional array-view thereof. Since this is the abstract 28 raise NotImplementedError () Python module, the base class does not refer to any physical 29 30 def reduce ( op , out , a , axis ): memory but only a size and a . In order to implement 31 ”””Reduce 'axis' dimension of 'a' and write the result - a backend, the base class could, for example, refer to the t o o u t ””” ← main memory or GPU memory. Besides the two classes, 32 raise NotImplementedError () 33 the backend must implement eight primitive functions. Seven 34 def accumulate ( op , out , a , axis ): 35 ”””Accumulate 'axis' dimension of 'a' and write the - of the functions are self-explanatory (Fig. 4), however the ← extmethod() function requires some explanation. In order result to out””” 36 raise NotImplementedError () to support arbitrary NumPy operations, npbackend introduces 37 an Extension Method that passes any operations through to 38 def extmethod ( name , out , in1 , in2 ): 39 ”””Apply the extended method 'name'””” the backend. 
For example, it is not convenient to implement 40 raise NotImplementedError () operations such as matrix multiplication or FFT only using 41 ufuncs; thus we define an Extension Method called matmul 42 def range ( size , dtype ): 43 ”””Create a new array containing the values [0:size[””” that corresponds to a matrix multiplication. Now, if a backend 44 raise NotImplementedError () knows the matmul operation it should perform a matrix mul- 45 tiplication. On the other hand, if the backend does not know 46 def random ( size , seed ): 47 ”””Create a new random array””” matmul it must raise a NotImplementedError exception. 48 raise NotImplementedError ()

IV. THE IMPLEMENTATION Fig. 4: The backend interface of npbackend. The implementation of npbackend consists primarily of the new npbackend-array that inherits from NumPy’s numpy-array. of the npbackend backend since many of them uses the C- The npbackend-array is implemented in C and uses the Python- interface to access the array memory directly. In order to C interface to inherit from numpy-array. Thus, it is possible address this problem, npbackend has to re-implement much to replace npbackend-array with numpy-array both in C and of the NumPy API, which is a lot of work and is prone to in Python — a feature npbackend must support in order to error. However, we can leverage the work by the PyPy project; support code such as the heat equation solver in figure 2. PyPy does not support the NumPy C-interface either but they As is typical in object-oriented programming, the have re-implemented much of the NumPy API already. Still, npbackend-array exploits the functionality of numpy-array as the problem goes beyond NumPy; any library that makes use much as possible. The original numpy-array implementation of the NumPy C-interface will have to be rewritten. handles metadata manipulation, such as slicing and trans- The result is that the npbackend implements all array posing; only the actual array calculations will be handled creation functions, matrix multiplication, random, FFT, and all by the npbackend. The npbackend-array overloads arithmetic ufuncs for now. All other functions that access array memory operators thus an operator on npbackend-arrays will call the directly will simply get unrestricted access to the memory. backend function ufunc (Fig. 4 Line 26). Furthermore, since npbackend-arrays inherit from numpy-array, an operator on a A. Unrestricted Direct Memory Access mix of the two array classes will also use the backend function. In order to detect and handle direct memory access to However, NumPy functions in general will not make use arrays, npbackend uses two address spaces for each array memory: a user address space visible to the user interface and a backend address space visible to the backend interface. 1 import numpy 2 import backend Initially, the user address space of a new array is memory 3 import os protected with mprotect such that subsequent accesses to the 4 memory will trigger a segmentation fault. In order to detect and 5 VCACHE_SIZE = int ( os . environ . get ( ”VCACHE SIZE” , 10) ) 6 vcache = [ ] handle direct memory access, npbackend can then handle this 7 class base ( backend . base ): kernel signal by transferring array memory from the backend 8 def __init__ ( self , size , dtype ): address space to the user address space. In order to get access 9 super( base , self ). __init__ ( size , dtype ) 10 size *= dtype . itemsize to the backend address space memory, npbackend calls the 11 for i ,( s , m ) in enumerate( vcache ): get_data_pointer() function (Fig. 4, Line 18). Simi- 12 if s == size : 13 self . mmap = m larly, npbackend calls the set_data_from_ary() function 14 vcache . pop ( i ) (Fig. 4, Line 22) when the npbackend should handle the array 15 return again. 16 self . mmap = mmap . mmap ( - 1 , size ) 17 def __str__ ( self ): In order to make the transfer between the two address 18 return ””%self . mmap 19 def __del__ ( self ): spaces, we use mremap rather than the more expensive 20 iflen ( vcache ) < VCACHE_SIZE : memcpy. However, mremap requires that the source and 21 vcache . append (( self . size*self . dtype . itemsize , - self . 
mmap )) ← destination are memory page aligned. That is not a problem 22 at the backend since the backend implementer can simply use 23 class view ( backend . view ): mmap when allocating memory; on the other hand, we cannot 24 def __init__ ( self , ndim , start , shape , stride , base ): 25 super( view , self ). __init__ ( ndim , start , shape , stride , - change how NumPy allocates its memory at the user address base ) ← space. The solution is to re-allocate the array memory when 26 buf = np . frombuffer ( self . base . mmap , dtype=self . dtype , - ← the constructor of npbackend-array is called using mmap. This offset=self . start ) 27 self . ndarray = np . lib . stride_tricks . as_strided ( buf , - introduces extra overhead but will work in all cases with no shape , self . stride ) ← modifications to the NumPy source code. 28 29 def get_data_pointer ( ary , allocate=False , nullify=False ): 30 return ary . ndarray . ctypes . data V. BACKEND EXAMPLES 31 32 def set_data_from_ary ( self , ary ): In order to demonstrate the usability of npbackend, we im- 33 d = get_data_pointer ( self , allocate=True , nullify=False ) 34 ctypes . memmove ( d , ary . ctypes . data , ary . dtype . itemsize * - plement four backends that use four different Python libraries: ary . size ) ← NumPy, Numexpr, libgpuarray, and Bohrium, all of whom are 35 36 def ufunc ( op , *args ): standalone Python libraries in their own right. In this section, 37 args = [ a . ndarray f o r a i n args ] we will describe how the four backends implement the eight 38 f = eval(”numpy.%s” %op ) functions that make up the backend interface (Fig. 4). 39 f(*args [ 1 : ] , out=args [ 0 ] ) 40 41 def extmethod ( name , out , in1 , in2 ): A. NumPy Backend 42 ( out , in1 , in2 ) = ( out . ndarray , in1 . ndarray , in2 . ndarray - ) ← In order to explore the overhead of npbackend, we 43 if name == ”matmul” : 44 out [ : ] = np . dot ( in1 , in2 ) implement a backend that uses NumPy i.e. NumPy uses 45 else: NumPy through npbackend. Figure 5 is a code snippet 46 raise NotImplementedError () of the implementation that includes the base and view classes, which inherit from the abstract classes in figure Fig. 5: A code snippet of the NumPy backend. Note that the backend module 4, the three essential functions get_data_pointer(), refers to the implementation in figure 4. set_data_from_ary(), and ufunc(), and the Exten- sion Method function extmethod(). recognizes the matmul method and calls NumPy’s dot() The NumPy backend associates a NumPy view function. (.ndarray) with each instance of the view class and an mmap object for each base instance, which enables memory B. Numexpr Backend allocation reuse and guarantees memory-page-aligned allocations. In [26] the authors demonstrate performance In order to utilize multi-core CPUs, we implement a back- improvement through memory allocation reuse in NumPy. end that uses the Numexpr library, which in turn utilize Just- The NumPy backend uses a similar technique2 where it In-Time (JIT) compilation and shared-memory parallelization preserves a pool of memory allocations for recycling. The through OpenMP. constructor of base will check this memory pool and, if the Since Numexpr is compatible with NumPy ndarrays, size matches, reuse the memory allocation (line 11-15). the Numexpr backend can inherit most functionality from the The get_data_pointer() function simply returns a NumPy backend; only the ufunc() implementation differs. C-pointer to the ndarray data. 
The set_data_from_ary() function memmoves the data from the ndarray ary to the view self. The ufunc() function simply calls the NumPy library with the corresponding ufunc. Finally, the extmethod() function implements extension methods, such as matmul, through the corresponding NumPy functionality.

Figure 6 is a code snippet that includes the ufunc() implementation, where it uses numexpr.evaluate() to evaluate a ufunc operation. Now, this is a very naïve implementation since we only evaluate one operation at a time. In order to maximize the performance of Numexpr, we could collect as many ufunc operations as possible into one evaluate() call, which would enable Numexpr to fuse multiple ufunc operations together into one JIT-compiled computation kernel. However, such work is beyond the focus of this paper – in this paper we map the libraries directly.

 1 ufunc_cmds = {'add':      "i1+i2",
 2               'multiply': "i1*i2",
 3               'sqrt':     "sqrt(i1)",
 4               # ...
 5
 6              }
 7 def ufunc(op, *args):
 8     args = [a.ndarray for a in args]
 9     i1 = args[1]
10     if len(args) > 2:
11         i2 = args[2]
12     numexpr.evaluate(ufunc_cmds[op],
13                      out=args[0], casting='unsafe')

Fig. 6: A code snippet of the Numexpr backend.

C. Libgpuarray Backend

In order to utilize GPUs, we implement a backend that makes use of libgpuarray, which introduces a GPU-array that is compatible with NumPy's ndarray. For the two classes, base and view, we associate a GPU-array that points to memory on the GPU; thus the user address space lies in main memory and the backend address space lies in GPU-memory. Consequently, the implementation of the two functions get_data_pointer() and set_data_from_ary() uses asarray() to copy between main memory and GPU-memory (Fig. 7 Line 14 and 15). The implementation of ufunc() is very similar to the Numexpr backend implementation since GPU-arrays support ufuncs directly. However, note that libgpuarray does not support the output argument, which means we have to copy the result of a ufunc operation into the output argument. The extmethod() recognizes the matmul method and calls libgpuarray's blas.gemm() function.

 1 import pygpu
 2 import backend_numpy
 3 class base(backend_numpy.base):
 4     def __init__(self, size, dtype):
 5         self.clary = pygpu.empty((size,), dtype=dtype, cls=elemary)
 6         super(base, self).__init__(size, dtype)
 7
 8 class view(backend_numpy.view):
 9     def __init__(self, ndim, start, shape, stride, base):
10         super(view, self).__init__(ndim, start, shape, stride, base)
11         self.clary = pygpu.gpuarray.from_gpudata(base.clary.gpudata,
           offset=self.start, dtype=base.dtype, shape=shape,
           strides=self.stride, writable=True, base=base.clary, cls=elemary)
12
13 def get_data_pointer(ary, allocate=False, nullify=False):
14     ary.ndarray[:] = np.asarray(ary.clary)
15     return ary.ndarray.ctypes.data
16
17 def set_bhc_data_from_ary(self, ary):
18     self.clary[:] = pygpu.asarray(ary)
19
20 def ufunc(op, *args):
21     args = [a.ndarray for a in args]
22     out = args[0]
23     i1 = args[1]
24     if len(args) > 2:
25         i2 = args[2]
26     cmd = "out[:] = %s" % ufunc_cmds[op]
27     exec cmd
28
29 def extmethod(name, out, in1, in2):
30     (out, in1, in2) = (out.clary, in1.clary, in2.clary)
31     if name == "matmul":
32         pygpu.blas.gemm(1, in1, in2, 1, out, overwrite_c=True)
33     else:
34         raise NotImplementedError()

Fig. 7: A code snippet of the libgpuarray backend (the Python binding module is called pygpu). Note that the backend_numpy module refers to the implementation in figure 5 and that ufunc_cmds is from figure 6.

D. Bohrium Backend

Our last backend implementation uses the Bohrium runtime system to utilize both CPU and GPU architectures. Bohrium supports a range of frontend languages including C, C++, and CIL (Common Intermediate Language), and a range of backend architectures including multi-core CPUs through OpenMP and GPUs through OpenCL. The Bohrium runtime system utilizes the underlying architectures seamlessly; thus, as a user, we use the same interface whether we utilize a CPU or a GPU. The interface of Bohrium is very similar to NumPy – it consists of a multidimensional array and the same ufuncs as in NumPy.

The Bohrium backend implementation uses the C interface of Bohrium, which it calls directly from Python through SWIG [27]. The two base and view classes point to a Bohrium multidimensional array called .bhc_obj (Fig. 8). In order to use the Bohrium C interface through SWIG, we dynamically construct a Python string that matches a specific C function in the Bohrium C interface.

The set_bhc_data_from_ary() function is identical to the one in the NumPy backend. However, get_data_pointer() needs to synchronize the array data before returning a Python pointer to the data. This is because the Bohrium runtime system uses lazy evaluation in order to fuse multiple operations into single kernels. The synchronize function (Fig. 8 Line 34) makes sure that all pending operations on the array have been executed and that the array data is in main memory, e.g. copied from GPU-memory. The implementations of ufunc() and extmethod() simply call the Bohrium C interface with the Bohrium arrays (.bhc_obj).

 1 import backend
 2 import backend_numpy
 3 import numpy
 4
 5 def dtype_name(obj):
 6     """Return name of the dtype"""
 7     return numpy.dtype(obj).name
 8
 9 class base(backend.base):
10     def __init__(self, size, dtype, bhc_obj=None):
11         super(base, self).__init__(size, dtype)
12         if bhc_obj is None:
13             f = eval("bhc.bh_multi_array_%s_new_empty" % dtype_name(dtype))
14             bhc_obj = f(1, (size,))
15         self.bhc_obj = bhc_obj
16
17     def __del__(self):
18         exec "bhc.bh_multi_array_%s_destroy(self.bhc_obj)" % dtype_name(self.dtype)
19
20 class view(backend.view):
21     def __init__(self, ndim, start, shape, stride, base):
22         super(view, self).__init__(ndim, start, shape, stride, base)
23         dtype = dtype_name(self.dtype)
24         exec "base = bhc.bh_multi_array_%s_get_base(base.bhc_obj)" % dtype
25         f = eval("bhc.bh_multi_array_%s_new_from_view" % dtype)
26         self.bhc_obj = f(base, ndim, start, shape, stride)
27
28     def __del__(self):
29         exec "bhc.bh_multi_array_%s_destroy(self.bhc_obj)" % dtype_name(self.dtype)
30
31 def get_data_pointer(ary, allocate=False, nullify=False):
32     dtype = dtype_name(ary)
33     ary = ary.bhc_obj
34     exec "bhc.bh_multi_array_%s_sync(ary)" % dtype
35     exec "bhc.bh_multi_array_%s_discard(ary)" % dtype
36     exec "bhc.bh_runtime_flush()"
37     exec "base = bhc.bh_multi_array_%s_get_base(ary)" % dtype
38     exec "data = bhc.bh_multi_array_%s_get_base_data(base)" % dtype
39     if data is None:
40         if not allocate:
41             return 0
42         exec "data = bhc.bh_multi_array_%s_get_base_data_and_force_alloc(base)" % dtype
43         if data is None:
44             raise MemoryError()
45     if nullify:
46         exec "bhc.bh_multi_array_%s_nullify_base_data(base)" % dtype
47     return int(data)
48
49 def set_bhc_data_from_ary(self, ary):
50     return backend_numpy.set_bhc_data_from_ary(self, ary)
51
52 def ufunc(op, *args):
53     args = [a.bhc_obj for a in args]
54     in_dtype = dtype_name(args[1])
55     f = eval("bhc.bh_multi_array_%s_%s" % (dtype_name(in_dtype), op.info['name']))
56     exec f(*args)
57
58 def extmethod(name, out, in1, in2):
59     f = eval("bhc.bh_multi_array_extmethod_%s_%s_%s" % (dtype_name(out),
           dtype_name(in1), dtype_name(in2)))
60     ret = f(name, out, in1, in2)
61     if ret != 0:
62         raise NotImplementedError()

Fig. 8: A code snippet of the Bohrium backend. Note that the backend module refers to the implementation in figure 4 and that the backend_numpy module is figure 5.

VI. BENCHMARKS

In order to evaluate the performance of npbackend, we perform a number of performance comparisons between a regular NumPy execution, referred to as Native, and the four backend implementations: NumPy, Numexpr, libgpuarray, and Bohrium, referred to by their name.

We run all benchmarks on an Intel Xeon machine with a dedicated Nvidia graphics card (Table I). Not all benchmark executions utilize the whole machine; Table II shows the specific setup of each benchmark execution. For each benchmark, we report the mean of ten execution runs and an error margin of two standard deviations from the mean. We use 64-bit double floating-point precision for all calculations, and the size of the memory allocation pool (vcache, a victim cache) is 10 entries when applicable.

Processor:   Intel Xeon E5640
Clock:       2.66 GHz
L3 Cache:    12MB
Memory:      96GB DDR3
GPU:         Nvidia GeForce GTX 460
GPU-Memory:  1GB DDR5
Compiler:    GCC 4.8.2 & OpenCL 1.2
Software:    Linux 3.13, Python 2.7, & NumPy 1.8.1

TABLE I: The Machine Specification.

             Hardware utilization   Matrix multiplication software
Native       1 CPU-core             ATLAS v3.10
NumPy        1 CPU-core             ATLAS v3.10
Numexpr      8 CPU-cores            ATLAS v3.10
libgpuarray  1 GPU                  clBLAS v2.2
Bohrium-CPU  8 CPU-cores            O(n³) implementation
Bohrium-GPU  1 GPU                  O(n³) implementation

TABLE II: The benchmark execution setup. Note that Native refers to a regular NumPy execution, whereas NumPy refers to the backend implementation that makes use of the NumPy library.

We use three Python applications that use either the NumPy module or the npbackend module. The source codes of the benchmarks are available at the Bohrium project's website, http://www.bh107.org:

Heat Equation simulates the heat transfer on a surface represented by a two-dimensional grid, implemented using jacobi-iteration with numerical convergence (Fig. 2).

Shallow Water simulates a system governed by the Shallow Water equations. The simulation commences by placing a drop of water in a still container and then proceeds, in discrete time-steps, simulating the water movement. The implementation is a port of the MATLAB application by Burkardt (http://people.sc.fsu.edu/~jburkardt/m_src/shallow_water_2d/).

Snakes and Ladders is a simple children's board game that is completely determined by dice rolls with no player choices. In this benchmark, we calculate the probability of ending the game after the k-th iteration through successive matrix multiplications. The implementation is by Natalino Busa (https://gist.github.com/natalinobusa/4633275).

Heat Equation

Figure 9 shows the result of the Heat Equation benchmark, where the Native NumPy execution provides the baseline. Even though npbackend inevitably introduces an overhead, the NumPy backend outperforms the Native NumPy execution, which is the result of the memory allocation reuse (vcache). Numexpr achieves a 2.2× speedup compared to Native NumPy, which is disappointing since Numexpr utilizes all eight CPU-cores. The problem is twofold: we only provide one ufunc for Numexpr to JIT compile at a time, which hinders loop fusion, and secondly, since the problem is memory bound, the benefit of utilizing eight CPU-cores through OpenMP is limited.
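The one-operation-at-a-time limitation is a property of the backend sketch in Figure 6 rather than of Numexpr itself. The following is a minimal sketch, not part of the npbackend code, of how several pending ufunc operations could be collected into a single numexpr.evaluate() call so that Numexpr can JIT compile them as one fused kernel; the queue_ufunc()/flush() helpers and the 'prev' placeholder convention are assumptions made purely for illustration.

import numexpr
import numpy as np

_pending = []   # queued (expression, operands) pairs

def queue_ufunc(expr, **operands):
    """Queue a ufunc expressed as a Numexpr string, e.g. "a * b"."""
    _pending.append((expr, operands))

def flush(out):
    """Combine all queued expressions into one evaluate() call.

    Chained element-wise operations can simply be substituted into each
    other, yielding a single fused Numexpr kernel."""
    expr, operands = _pending[0]
    for next_expr, next_ops in _pending[1:]:
        next_expr = next_expr.replace("prev", "(%s)" % expr)  # splice previous result in
        operands.update(next_ops)
        expr = next_expr
    del _pending[:]
    numexpr.evaluate(expr, local_dict=operands, out=out)

# Usage: two ufuncs evaluated as one kernel, i.e. sqrt(a * b) in a single pass.
n = 10**6
a = np.arange(n, dtype=np.float64)
b = np.ones(n)
out = np.empty(n)
queue_ufunc("a * b", a=a, b=b)
queue_ufunc("sqrt(prev)")   # 'prev' refers to the previously queued result
flush(out)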

Fig. 9: The Heat Equation benchmark; the domain size is 3000² and the number of iterations is 100 (the plot reports wall-clock time in seconds for the Native, NumPy, Numexpr, Bohrium-CPU, libgpuarray, and Bohrium-GPU executions).

The Bohrium-CPU backend achieves a speedup of 2.6× while utilizing eight CPU-cores as well. Finally, the two GPU backends, libgpuarray and Bohrium-GPU, achieve speedups of 5.6× and 18×, respectively. Bohrium-GPU performs better than libgpuarray primarily because of loop fusion and array contraction [28], which is possible since Bohrium-GPU uses lazy evaluation to fuse multiple ufunc operations into single kernels.

Shallow Water

Figure 10 shows the result of the Shallow Water benchmark. This time the Native NumPy execution and the NumPy backend perform the same; thus the vcache still hides the npbackend overhead. Again, Numexpr and Bohrium-CPU achieve a disappointing speedup of 2× compared to Native NumPy, which translates into a CPU utilization of 25%. Finally, the two GPU backends, libgpuarray and Bohrium-GPU, achieve speedups of 3.7× and 12×, respectively. Again, Bohrium-GPU outperforms libgpuarray because of loop fusion and array contraction.

Fig. 10: The Shallow Water benchmark; the domain size is 2000² and the number of iterations is 100 (wall-clock time in seconds for the same six executions).

Snakes and Ladders

Fig. 11: The Snakes and Ladders benchmark; the domain size is 1000² and the number of iterations is 10 (wall-clock time in seconds for the same six executions).

Figure 11 shows the result of the Snakes and Ladders benchmark, where the performance of matrix multiplication dominates the overall performance. This is apparent when examining the results of the first three executions, Native, NumPy, and Numexpr, which all make use of the matrix multiplication library ATLAS (Table II). The Native execution outperforms the NumPy and Numexpr executions with a speedup of 1.1× because of reduced overhead.

The performance of the Bohrium-CPU execution is significantly slower than the other CPU executions, which is due to the naïve O(n³) matrix multiplication algorithm and the lack of clever cache optimizations. Finally, the two GPU backends, libgpuarray and Bohrium-GPU, achieve speedups of 1.5× and 1.9×, respectively. It is a bit surprising that libgpuarray does not outperform Bohrium-GPU since it uses the clBLAS library, but we conclude that Bohrium-GPU, with its loop fusion and array contraction, matches clBLAS in this case.

Fallback Overhead: In order to explore the overhead of falling back to the native NumPy implementation, we execute the Snakes and Ladders benchmark where the backends do not support matrix multiplication. In order for native NumPy to perform the matrix multiplication, each time the application code uses matrix multiplication npbackend will transfer the array data from the backend address space to the user address space and vice versa. However, since npbackend uses the mremap() function to transfer array data, the overhead is only around 14% (Fig. 12) for the CPU backends. The overhead of libgpuarray is 60% because of multiple memory copies when transferring to and from the GPU (Fig. 7 Line 13-18). Contrarily, the Bohrium-GPU backend only performs one copy when transferring to and from the GPU, which results in an overhead of 23%.

Fig. 12: The Snakes and Ladders benchmark where the backends do not have matrix multiplication support; the domain size is 1000² and the number of iterations is 10.
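Conceptually, the fallback path consists of three steps: move the operands to the user address space, run the unsupported operation in native NumPy, and hand the data back to the backend. The sketch below illustrates these steps only; it is not the npbackend implementation, and the to_user_space()/to_backend_space() helpers are simplified stand-ins for the mremap()-based transfer described above.

import numpy as np

# Stand-ins for the real address-space transfers; in npbackend these would
# move the array data between the backend and user address spaces.
def to_user_space(ary):
    return np.asarray(ary)

def to_backend_space(ary):
    return ary

def fallback(op, *arrays):
    """Run an operation that the backend does not support in native NumPy."""
    np_arrays = [to_user_space(a) for a in arrays]   # backend -> user address space
    result = op(*np_arrays)                          # execute with native NumPy
    for a in arrays:                                 # hand the data back to the backend
        to_backend_space(a)
    return result

A = np.random.rand(100, 100)
B = np.random.rand(100, 100)
C = fallback(np.dot, A, B)   # e.g. matmul when a backend lacks the extension method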

Overhead

In the benchmarks above, the overhead of npbackend is very modest, and in the case of the Heat Equation and Shallow Water benchmarks the overhead is completely hidden by the memory allocation pool (vcache). Thus, in order to measure the precise overhead, we deactivate the vcache and re-run the three benchmarks with the NumPy backend (Fig. 13). The ratio between the number of NumPy operations and the quantity of the operations dictates the npbackend overhead. Thus, the Heat Equation benchmark, which has a domain size of 3000², has a lower overhead than the Shallow Water benchmark, which has a domain size of 2000². The Snakes and Ladders benchmark has an even smaller domain size, but since the matrix multiplication operation has an O(n³) time complexity, its overhead lies between the two other benchmarks.

Fig. 13: Overhead of npbackend, where we compare the NumPy backend with the native NumPy execution from the previous benchmarks (overhead in relation to Native NumPy for the Heat Equation, Shallow Water, and Snakes and Ladders benchmarks).

VII. FUTURE WORK

An important improvement of the npbackend framework is to broaden the support of the NumPy API. Currently, npbackend supports array creation functions, matrix multiplication, random, FFT, and all ufuncs; thus many more functions remain unsupported. Even though we can leverage the work by the PyPy project, which re-implements a broad range of the NumPy API in Python (http://buildbot.pypy.org/numpy-status/latest.html), we still have to implement Extension Methods for the part of the API that is not expressed well using ufuncs.

Currently, npbackend supports CPython versions 2.6 to 2.7; however, there is no technical reason not to support version 3 and beyond, thus we plan to support version 3 in the near future.

The implementation of the backend examples we present in this paper has a lot of optimization potential. The Numexpr and libgpuarray backends could use lazy evaluation in order to compile many ufunc operations into single execution kernels and gain performance results similar to those of the Bohrium CPU and GPU backends.

Current ongoing work explores the use of Chapel [29] as a backend for NumPy, providing transparent mapping (facilitated by npbackend) of NumPy array operations to Chapel array operations, thereby facilitating the parallel and distributed features of the Chapel language. Finally, we want to explore other hardware accelerators, such as the Intel Xeon Phi coprocessor, or distribute the calculations through MPI on a computation cluster.

VIII. CONCLUSION

In this paper, we have introduced a unified NumPy backend, npbackend, that unifies a broad range of Python code accelerators. Without any modifications to the original Python application, npbackend enables backend implementations to improve the Python execution performance. In order to assess this claim, we use three benchmarks and four different backend implementations along with a regular NumPy execution. The results show that the overhead of npbackend is between 2% and 21%, but with a simple memory allocation reuse scheme it is possible to achieve overall performance improvements. Further improvements are possible when using JIT compilation and utilizing multi-core CPUs: a Numexpr backend achieves a 2.2× speedup and a Bohrium-CPU backend achieves a 2.6× speedup. Even further improvement is possible when utilizing a dedicated GPU: a libgpuarray backend achieves a 5.6× speedup and a Bohrium-GPU backend achieves an 18× speedup. Thus, we conclude that it is possible to accelerate Python/NumPy applications seamlessly using a range of different backend libraries.

REFERENCES

[1] G. van Rossum, "Glue it all together with python," in Workshop on Compositional Software Architectures, Workshop Report, Monterey, California, 1998.
[2] T. E. Oliphant, A Guide to NumPy. Trelgol Publishing USA, 2006, vol. 1.
[3] E. Jones, T. Oliphant, and P. Peterson, "Scipy: Open source scientific tools for python," http://www.scipy.org/, 2001.
[4] J. D. Hunter, "Matplotlib: A 2d graphics environment," Computing in Science & Engineering, vol. 9, no. 3, pp. 90–95, 2007.
[5] M. Sala, W. Spotz, and M. Heroux, "PyTrilinos: High-performance distributed-memory solvers for Python," ACM Transactions on Mathematical Software (TOMS), vol. 34, March 2008.
[6] D. I. Ketcheson, K. T. Mandli, A. J. Ahmadia, A. Alghamdi, M. Quezada de Luna, M. Parsani, M. G. Knepley, and M. Emmett, "PyClaw: Accessible, Extensible, Scalable Tools for Wave Propagation Problems," SIAM Journal on Scientific Computing, vol. 34, no. 4, pp. C210–C231, Nov. 2012.
[7] J. Enkovaara, M. Louhivuori, P. Jovanovic, V. Slavnic, and M. Rännar, "Optimizing gpaw," Partnership for Advanced Computing in Europe, September 2012.
[8] D. Loveman, "High performance fortran," Parallel & Distributed Technology: Systems & Applications, IEEE, vol. 1, no. 1, pp. 25–42, 1993.
[9] B. Chamberlain, S.-E. Choi, C. Lewis, C. Lin, L. Snyder, and W. Weathersby, "Zpl: a machine independent programming language for parallel computers," Software Engineering, IEEE Transactions on, vol. 26, no. 3, pp. 197–211, Mar 2000.
[10] W. Yang, W. Cao, T. Chung, and J. Morris, Applied numerical methods using MATLAB. Wiley-Interscience, 2005.
[11] C. Sanderson et al., "Armadillo: An open source c++ linear algebra library for fast prototyping and computationally intensive experiments," Technical report, NICTA, Tech. Rep., 2010.
[12] T. Veldhuizen, "Arrays in Blitz++," in Computing in Object-Oriented Parallel Environments, ser. Lecture Notes in Computer Science, D. Caromel, R. Oldehoeft, and M. Tholburn, Eds. Springer Berlin Heidelberg, 1998, vol. 1505, pp. 223–230.
[13] S. Behnel, R. Bradshaw, C. Citro, L. Dalcin, D. S. Seljebotn, and K. Smith, "Cython: The best of both worlds," Computing in Science & Engineering, vol. 13, no. 2, pp. 31–39, 2011.
[14] M. Foord and C. Muirhead, IronPython in Action. Greenwich, CT, USA: Manning Publications Co., 2009.
[15] S. Pedroni and N. Rappin, Jython Essentials: Rapid Scripting in Java, 1st ed. Sebastopol, CA, USA: O'Reilly & Associates, Inc., 2002.
[16] A. Rigo and S. Pedroni, "Pypy's approach to virtual machine construction," in Companion to the 21st ACM SIGPLAN Symposium on Object-oriented Programming Systems, Languages, and Applications, ser. OOPSLA '06. New York, NY, USA: ACM, 2006, pp. 944–953. [Online]. Available: http://doi.acm.org/10.1145/1176617.1176753
[17] E. Jones and P. J. Miller, "Weave: inlining c/c++ in python," O'Reilly Open Source Convention, 2002.
[18] D. Cooke and T. Hochberg, "Numexpr: fast evaluation of array expressions by using a vector-based virtual machine."
[19] T. Oliphant, "Numba python bytecode to llvm translator," in Proceedings of the Python for Scientific Computing Conference (SciPy), 2012.
[20] A. Klöckner, N. Pinto, Y. Lee, B. Catanzaro, P. Ivanov, and A. Fasih, "PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation," Parallel Computing, vol. 38, no. 3, pp. 157–174, 2012.
[21] A. Munshi et al., "The OpenCL Specification," Khronos OpenCL Working Group, vol. 1, pp. l1–15, 2009.
[22] C. Nvidia, "Programming guide," 2008.
[23] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio, "Theano: a CPU and GPU math expression compiler," in Proceedings of the Python for Scientific Computing Conference (SciPy), Jun. 2010, oral presentation.
[24] M. R. B. Kristensen and B. Vinter, "Numerical python for scalable architectures," in Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model, ser. PGAS '10. New York, NY, USA: ACM, 2010, pp. 15:1–15:9.
[25] M. R. B. Kristensen, S. A. F. Lund, T. Blum, K. Skovhede, and B. Vinter, "Bohrium: Unmodified NumPy Code on CPU, GPU, and Cluster," in 4th Workshop on Python for High Performance and Scientific Computing (PyHPC'13), 2013.
[26] S. A. F. Lund, K. Skovhede, M. R. B. Kristensen, and B. Vinter, "Doubling the Performance of Python/NumPy with less than 100 SLOC," in 4th Workshop on Python for High Performance and Scientific Computing (PyHPC'13), 2013.
[27] D. M. Beazley et al., "Swig: An easy to use tool for integrating scripting languages with c and c++," in Proceedings of the 4th USENIX Tcl/Tk workshop, 1996, pp. 129–139.
[28] V. Sarkar and G. R. Gao, "Optimization of array accesses by collective loop transformations," in Proceedings of the 5th International Conference on Supercomputing, ser. ICS '91. New York, NY, USA: ACM, 1991, pp. 194–205.
[29] D. Callahan, B. L. Chamberlain, and H. P. Zima, "The Cascade High Productivity Language," in 9th International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS 2004). IEEE Computer Society, April 2004, pp. 52–60.

6.10 Prototyping for Exascale


Brian Vinter, Mads R. B. Kristensen, Simon A. F. Lund, Troels Blum, and Kenneth Skovhede
Niels Bohr Institute, University of Copenhagen, Denmark
safl@nbi.ku.dk

ABSTRACT

Prototyping is a common approach to developing new scientific codes. Through prototypes a scientist builds confidence that an idea can in fact work for a scientific challenge, and the prototype also acts as the definitive design for a final implementation. As supercomputing approaches exaflops performance, the teraflops platforms that are available for prototyping become increasingly distant from the target performance, and new tools are needed to help close the performance gap between high-productivity prototyping and high-performance end-solutions. In this work we propose Bohrium, an open-source just-in-time compiler, as a possible solution to the prototyping problem. We will show how the same Python program can execute seamlessly on single-core, multi-core, GPGPU and cluster architectures, thus eliminating the need for parallel programming in the prototype stage. We will show how the same, unmodified, Python implementation of a Jacobi solver, a Black-Scholes pricing, an O(n²) complexity n-body simulation, and a Shallow-Water simulation scales to a 32-core machine with 50.1, 29.8, 17.3, and 44.4 speedups compared to the NumPy execution, while the same Python benchmarks run on a NVidia GTX 680 achieve speedups of 55.7, 43.0, 77.1, and 140.2, and an eight-node cluster with gb-ethernet interconnect (256 cores in total) obtains speedups of 4.1, 7.9, 6.6 and 6.4, compared to a single 32-core node.

Figure 1: Prototyping workflow (Idea → Paper → Matlab prototype within days → C++ full version within months).

1. INTRODUCTION

While achieving exascale-computing in itself is a huge technical task, bringing scientific users to a competence level where they can utilize an exascale machine is likely to pose problems of the same scale. While large codes, maintained by a research community, are likely to make the transition from peta- to exascale as a natural evolution in the code, smaller teams will be hard pressed to make the move to exascale. One of the challenges that face researchers that write their own codes is that of prototyping. Today most teams will move from idea to code via a prototype, typically in Matlab, IDL, Python or a similar high-productivity programming language.

Prototyping is an essential tool for testing the scientific hypothesis in small scale before spending more time on an actual implementation, since many scientific expressions do not easily translate into algorithms, and issues such as numerical stability, etc. are often not investigated formally. Thus the actual workflow for scientific codes is often iterative, as shown in figure 1. Scientists will test their idea in a high-productivity environment, using a small dataset, typically a ratio of one to a thousand, on a conventional, but large, computer, before moving on to an actual supercomputer for the real-scale experiments.

With respect to exascale computing this approach poses a significant challenge: while exascale machines will be built by scaling supercomputers to a core count two orders of magnitude higher, in the order of 100 million, no such explosion in high-end servers is guaranteed, or even likely. Thus, while researchers today are expected to move three orders of magnitude, from teraflops to petaflops, when moving from prototype to final implementation, prototyping is likely to remain at teraflop when supercomputers move to exaflops, and researchers will have to move six orders of magnitude. Even scaling three orders of magnitude as is done today is non-trivial, and often problems arise that were not detected by the prototype; when this challenge is increased another three orders of magnitude it is unlikely that the current approach will be sustainable.

Thus it makes sense to investigate new, scalable, approaches to prototyping. The successful prototyping tool must be highly productive and allow descriptive representations of an algorithm, both of which are met by today's use of Matlab. A future prototyping tool must however be much faster than Matlab, it must have a better single-core performance, and it must be able to run on multicores, accelerators, and

Proceedings of the 3rd International Conference on Exascale Applications and Software 77 Prototyping for Exascale Vinter, Kristensen, Lund, Blum & Skovhede cluster-computeres alike in order to satisfy the requirement CIL Python of doing prototyping at the petaflops-scale in order to re- Bridge Bridge duce the distance to state-of-the-art exaflops machines to C#, F#, VB, IronPython, … three orders of magnitude, as it is today. C In the following, we introduce the Bohrium just-in-time com- Bridge piler which, combined with the Numerical Python library NumPy[5], may provide a solution to next-generation-proto- C++ New language typing. Bohrium is not developed for prototyping, but rather Bridge Bridge for rapid solutions on parallel hardware, but non the less it matches the requirements that we believe are essential for prototyping in for the exascale. Vector Engine Manager 2. THE BOHRIUM RUNTIME SYSTEM The open-source project Bohrium1 is a runtime system for Cluster high-performance high-productivity development[4, 3]. Bohr- CPU GPGPU FPGA Vector Engine Vector Engine Vector Engine Vector Engine ium provides the mechanics to couple an array-programming Manager language or library with an architecture-specific implemen- tation seamlessly.

CPU GPGPU Bohrium consists of a number of components that commu- Vector Engine Vector Engine nicate by exchanging a hardware agnostic array bytecode. Components can be architecture-specific but they are all in- terchangeable since all uses the same bytecode and commu- Figure 2: Component Overview nication protocol. This design makes it possible to com- bine components in a setup that match a specific execu- system administrator can specify the hardware setup of the tion environment without changing the original user appli- system through an configuration file (Fig. 3). Thus, it is cation. Bohrium consist of the following three component just a matter of editing the configuration file when changing types (Fig. 2): or moving to a new hardware setup and there is no need to change the user applications.

Bridge The role of the Bridge is to integrate Bohrium into 2.2 Vector Bytecode existing languages and libraries. The Bridge generates A vital part of Bohrium is the array bytecode that consti- array bytecode that corresponds to the user-code. tutes the link between the high-level user language and the Vector Engine Manager (VEM) The role of the VEM low-level execution engine. The bytecode is designed with is to manage data location and ownership of arrays. the declarative array-programming model in mind where the It also manages the distribution of computing jobs be- bytecode instructions operate on input and output arrays. tween potentially several Vector Engines and thus mul- To avoid excessive memory copying, the arrays can also be tiple processors. shaped into multi-dimensional arrays. These reshaped array views are then not necessarily comprised of elements that are Vector Engine (VE) The VE is the architecture-specific contiguous in memory. Each dimension comprises a stride implementation that executes array bytecode. and size, such that any regularly shaped subset of the un- derlying data can be accessed. We have chosen to focus on a simple, yet flexible, data structure that allows us to express When using the Bohrium framework, at least one implemen- any regularly distributed arrays. Figure 4 shows how the tation of each component type must be available. However, shape is implemented and how the data is projected. the exact component setup depends on the runtime system and what hardware to utilize, e.g. executing NumPy on a The aim is to have an array bytecode that support data single machine using the CPU would require a Bridge imple- parallelism implicitly and thus makes it easy for the bridge mentation for NumPy, a VEM implementation for a machine to translate the user language into the bytecode efficiently. node, and a VE implementation for a CPU. Now, in order to Additionally, the design enables the VE to exploit data par- utilize a GPU instead, we can exchange the CPU-VE with allelism through SIMD2 and the VEM through SPMD3. a GPU-VE without having to change a single line of code in the NumPy application. This is a key feature of Bohr- ium: the ability to change the execution hardware without 2.3 Bridge The Bridge component is the bridge between the program- changing the user application. ming interface. In order to interface with the frontend lan- guage, the language-specific bridge component translates ar- 2.1 Configuration ray operations into Bohrium array bytecode lazily. That To make Bohrium as flexible a framework as possible, Bohr- is, the bridge collects array operations until it encounter ium manage the setup of all the components at runtime a language condition, in which case it sends the collected through a configuration file. The idea is that the user or 2Single Instruction, Multiple Data 1Available at http://www.bh107.org. 3Single Program, Multiple Data


operations to the underlying Bohrium components. Consequently, Bohrium only handles a subset of the frontend language – namely the array operations. The frontend language handles all non-deterministic aspects of the program, such as conditional branches and loops, by itself.

import numpy as np

def solve(height, width, epsilon=0.005):
    grid = np.zeros((height+2, width+2), np.float64)
    grid[:, 0]  = -273.15
    grid[:, -1] = -273.15
    grid[-1, :] = -273.15
    grid[0, :]  = 40.0
    center = grid[1:-1, 1:-1]
    north  = grid[:-2, 1:-1]
    south  = grid[2:, 1:-1]
    east   = grid[1:-1, :-2]
    west   = grid[1:-1, 2:]
    delta  = epsilon + 1
    while delta > epsilon:
        tmp = 0.2 * (center + north + south + east + west)
        delta = np.sum(np.abs(tmp - center))
        center[:] = tmp

Figure 5: Python implementation of a heat equation solver that uses the finite-difference method to calculate the heat diffusion. Note that in order to utilize Bohrium, we use the command line argument "-m", e.g. "python -m npbackend heat2d.py".

# Bridge for NumPy
[numpy]
type = bridge
children = node

# Vector Engine Manager for a single machine
[node]
type = vem
impl = libbh_vem_node.so
children = gpu

# Vector Engine for a GPU
[gpu]
type = ve
impl = lbbh_ve_gpu.so

Figure 3: This example configuration provides a setup for utilizing a GPU on one machine by instructing the Vector Engine Manager to use the GPU Vector Engine implemented in the shared library lbhvb_ve_gpu.so.

An example of a Bohrium bridge is the Python/NumPy- bridge that seamlessly integrates with NumPy. The bridge is a drop-in replacement of NumPy thus without changing a single line of code, it is possible to utilize Bohrium (Fig. 5).
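To make the drop-in property concrete, the following is a small, hypothetical NumPy script; nothing in it refers to Bohrium, yet it can be executed through the bridge simply by changing the interpreter invocation as described in the caption of Figure 5 (e.g. "python -m npbackend script.py").

# script.py -- plain NumPy code, no Bohrium-specific imports or calls
import numpy as np

a = np.ones((1000, 1000))
b = np.ones((1000, 1000))
c = (a * b + 4.2).sum()    # ordinary array operations, handled by the bridge
print(c)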

2.4 Vector Engine Manager In order to utilize scalable architectures fully, distributed memory parallelism is mandatory. The Cluster component

in Bohrium is currently quite naïve; it uses the bulk-synchronous parallel model [6] with static data decomposition and no communication latency hiding. We know from previous work that such optimizations are possible [2].

Bohrium implements all communication through the MPI library and uses a process hierarchy that consists of one master-process and multiple worker-processes. The master-process executes a regular Bohrium setup with the Bridge, Cluster-VEM, Node-VEM, and VE. The worker-processes, on the other hand, execute the same setup but without the Bridge and thus without the user applications. Instead, the master-process will broadcast array bytecode and array meta-data to the worker-processes throughout the execution of the user application.

Figure 4: Descriptor for an n-dimensional array and its corresponding interpretation (the descriptor holds a base pointer, type=float64, ndim=3, start=0, shape=(2,2,2), stride=(7,3,1), and a data pointer; the figure shows how the strided view selects elements of the underlying contiguous buffer as the seen 3-d array).

Bohrium uses a data-centric approach where a static decomposition dictates the data distribution between the MPI-processes. Because of this static data decomposition, all processes have full knowledge of the data distribution and need not exchange data location meta-data. Furthermore, the task of computing array operations is also statically distributed, which means that any process can calculate locally what needs to be sent, received, and computed.
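As a concrete illustration of how such a descriptor is interpreted, the sketch below (plain Python, not Bohrium code) uses the field values shown in Figure 4 to map an n-dimensional index to an offset in the flat base buffer and to materialize the 2x2x2 view.

# Illustration of the array descriptor from Figure 4.
ndim   = 3
start  = 0
shape  = (2, 2, 2)
stride = (7, 3, 1)          # elements to skip per dimension
base   = list(range(14))    # the flat base buffer

def offset(index):
    """Map an n-dimensional index to an offset into the base buffer."""
    return start + sum(i * s for i, s in zip(index, stride))

# Materialize the "seen" 3-d array by walking every index of the view.
view = [[[base[offset((i, j, k))] for k in range(shape[2])]
         for j in range(shape[1])]
        for i in range(shape[0])]
print(view)   # [[[0, 1], [3, 4]], [[7, 8], [10, 11]]]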

Meta-data communication is only needed when broadcasting array bytecode and creating new arrays – a task that has an asymptotic complexity of O(log2 n), where n is the number of nodes.

Processor:  AMD Opteron 6272
Clock:      2.1 GHz
Cores:      32
L3 Cache:   16MB
Memory:     128GB DDR3
Compiler:   GCC 4.6.3
Network:    Gigabit Ethernet
Software:   Linux 3.13, Python 2.7, NumPy 1.8.2

Table 1: Multi-Core Processor Specification

2.5 Vector Engine

The Vector Engine (VE) is the only component type that actually performs array operations. Bohrium implements two VEs, the CPU-VE and GPU-VE, that utilize multi-core CPUs and GPGPUs respectively. Through the use of Just-In-Time (JIT) compilation, both VEs compile the array bytecode received from the Node-VEM into architecture-specific binary kernels. In order to utilize multi-core CPUs, the CPU-VE feeds the JIT-compiler with OpenMP-annotated ANSI C source code, whereas the GPU-VE generates OpenCL source code in order to utilize GPGPUs from both Nvidia and AMD.

3. BENCHMARKS

In order to evaluate the performance of Bohrium, we will perform a series of benchmarks that compare Bohrium against Python/NumPy. For each benchmark, we report the mean of five execution runs, all within 10% deviation from the mean. We use 64-bit double floating-point precision for all calculations, and all speedup results are strong scaling where the data size is fixed. The benchmarks consist of the following four applications:

Black Scholes The Black-Scholes pricing model is a par- Figure 6: Relative speedup utilizing 32-cores com- tial differential equation, which is used in finance for pared to a sequential Python/NumPy execution. calculating price variations over time[1]. This imple- significant performance boost compared to Python/NumPy mentation uses a Monte Carlo simulation to calculate and is even competitive to handwritten compiled C/C++ the Black-Scholes pricing model. code. Heat Equation simulates the heat transfer on a surface represented by a two-dimensional grid, implemented GPGPU using jacobi-iteration with numerical convergence (Fig. Figure 7 shows that results of running the four applications 5). on a GPGPU (Table 2). Compared to the multi-processor, the GPGPU takes the performance boost even further. No- N-Body Nice The Nice variation of the newtonian n-body ticeable is the Black Scholes results with more than 300 simulation is used to model larger galaxies with a large times speedup compared to Python/NumPy. number of asteroids. The mass of the asteroids is small enough that their gravitational pull is insignificant. Thus, only the force of the planets are applied to the Cluster asteroids. The planets exchange forces similar to a Figure 8 shows that results of running the four applications regular n-body simulation. on an eight-node cluster where each node is the multi-code processor from Table 1 connected through Gigabit Ether- Shallow Water simulates a system governed by the Shal- low Water equations. The simulation commences by Processor: Intel Core i7-3770 Clock: 3.4 GHz placing a drop of water in a still container. The simu- Cores: 4 lation then proceeds, in discrete time-steps, simulating L3 Cache: 16MB the water movement. The implementation is a port of Memory: 128GB DDR3 the MATLAB application by Burkardt4. Compiler: GCC 4.6.3 Network: Gigabit Ethernet

Multi-Core Processor GPGPU: AMD HD 7970 Figure 6 shows that results of running the four applications Clock: 1000 MHz on 32 CPU-cores (Table 1). Beside comparing Bohrium ver- Memory: 3GB GDDR5 sus Python/NumPy, the results of the Heat Equation in- -bandwidth: 288 GB/s cludes two handwritten parallel implementations – one in Software: Linux 3.13, Python 2.7, NumPy 1.8.2, OpenCL 2.1 ANSI-C and one in C++11 – both using OpenMP. The re- sults clearly shows that the CPU-VE of Bohrium achieve a 4http://people.sc.fsu.edu/˜jburkardt/m src/shallow water 2d/ Table 2: GPGPU Specification


 4. CONCLUSIONS In this work we have shown how Python/Numpy in com-  bination with the Bohrium just-in-time compiler, offers an attractive work-to-performance ratio. While better perfor-  mance can be had from expert implementations, a Python/ Numpy implementation is fully on-par with other high-pro-  ducitivity languages with respect to the the effort the scien- tist has to put in, and the performance is on-par, or close  to, that of an straight forward C++ implementation. The end result is that a scientist may move seamlessly from a  laptop version of a code to a large, heterogeneous, parallel machine, without any changes to the code. In fact, we will  show that the scientist can continue to work from a laptop, with interactive graphics if needed, while the contracted ar- ray operations are all executed on a remote machine, includ- I 9   . { b b  { í ing supercomputers, granted that the scientist is willing to wait online for the job to be scheduled at the SC site. The descriptive approach not only makes scientists more pro- ductive, but also reduces the number of errors as no explicit Figure 7: Relative speedup utilizing GPGPU com- parallelism is expressed, and synchronization requirements pared to a sequential Python/NumPy execution. are fully derived from the descriptive implementation of the algorithm. 

  5. REFERENCES [1] F. Black and M. Scholes. The pricing of options and  corporate liabilities. The journal of political economy, pages 637–654, 1973.  [2] M. Kristensen and B. Vinter. Managing communication  latency-hiding at runtime for parallel programming languages and libraries. In High Performance  Computing and Communication 2012 IEEE 9th International Conference on Embedded Software and  Systems (HPCC-ICESS), 2012 IEEE 14th International Conference on, pages 546–555, 2012. [3] M. R. Kristensen, S. A. Lund, T. Blum, K. Skovhede, I 9  .  {  b b {  í " and B. Vinter. Bohrium: a virtual machine approach to portable parallelism. In Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 Figure 8: Relative scalability on an eight-node clus- IEEE International, pages 312–321. IEEE, 2014. ter. We compared the utilization of one node (32- [4] M. R. B. Kristensen, S. A. F. Lund, T. Blum, cores) against the utilization of eight node (256- K. Skovhede, and B. Vinter. Bohrium: Unmodified cores). NumPy Code on CPU, GPU, and Cluster. In 4th net. We run a MPI-process per cluster node and use the Workshop on Python for High Performance and CPU-VE benchmark (here, we use an older version of the Scientific Computing (PyHPC’13), 2013. CPU-VE implementation than the one previously used) in [5] T. E. Oliphant. Python for Scientific Computing. Bohrium to utilize the 32-cores on each node. In order to Computing in Science and Engg., 9(3):10–20, May 2007. show scalability, we compare 32-cores executions with 256- [6] L. G. Valiant. A bridging model for parallel cores executions. The scalability goes from 50% to 95% computation. Commun. ACM, 33(8):103–111, Aug. speedup utilization. 1990.


6.11 Scripting Language Performance Through Interoperability


Simon A. F. Lund, Niels Bohr Institute, University of Copenhagen, safl@nbi.dk
Bradford L. Chamberlain, Cray, Inc.
Brian Vinter, Niels Bohr Institute, University of Copenhagen

1. Introduction enable the programmer to peel off layers of the language abstrac- Attractiveness of programming in scripting languages can arguably tions and take control. be attributed to language features, removal of responsibilities from Current approaches for doing so in Python include writing the programmer, and a rich execution environment. Attractive lan- Python extensions, using SWIG or ctypes. In addition low-level guage features include dynamic typing and type inference support- APIs such as pyCUDA, pyOpenCL, pyMIC, and MPI4PY can be ing generic and less verbose code. Dynamic, managed memory, and used. garbage collection removes the error-prone tasks of allocating, re- These approaches successfully provide the means of recover- allocating and freeing memory from the programmer. Interactive ing the majority of the lost performance and have contributed to interpreters facilitates experimentation and rapid application devel- Python’s popularity in HPC environments serving as a steering- opment. The lack of features such as parallel language constructs language for scientific computing. Language interoperability is the gives the programmer straightforward sequential semantics with- driver for the realization of these tools and libraries. out concern to the hazards related with parallel execution. Scripting However, they come at the price of sacrificing all of the conve- languages are often labeled as high-level since they remove these niences described earlier. The following sections introduce a differ- responsibilities from the programmer, allowing for code to evolve ent approach to foreign-function interfaces and interface generation around manipulating abstractions closer to the application domain. which leverages the advantages of interoperability without sacrific- The price for these conveniences is often paid with lowered ing the attractive qualities of the scripting language. hardware utilization since the programmer only expresses what is to be computed not how to compute it. With a lack of control and no means of obtaining it, it becomes the task of the interpreter to map high-level application code to hardware efficiently. 2. Approach One approach is to rely on language interoperability to increase The general idea is to provide a less verbose foreign function application performance. Using Python as an example, the CPython interface (FFI) by using introspection on the function definition interpreter allows for interoperability with C/C++, either via lan- and function decoration to construct a function prototype which guage extensions or by providing access to libraries. can either be implemented inline, from file, or mapped to a library. Python has seen widespread use and popularity using this ap- In order to maintain the high-level abstractions, the foreign proach, specifically the NumPy/SciPy/iPython software stack gives language must support them. C does not meet this requirement, the programmer a rich interactive environment for scientific com- however, Chapel provides an ideal target. It supports high-level puting. This is achieved by maintaining high-level abstractions and operations on arrays and also supports peeling off layers of the enabling significant performance improvements by implementing abstraction allowing the programmer to take control if they have the computationally demanding portions in C and providing access sufficient motivation for doing so. to them via high-level data structures and operations upon them. 
This becomes more challenging (or important) as efficient uti- lization of hardware becomes increasingly complex with continu- ing developments in hardware architectures such as increasing core counts on CPUs with NUMA architectures, distinct address spaces in accelerators such as GPUs, MICs, FPGAs. Interpreters are rarely able to keep up with the developments in hardware, and the ab- stractions provided by scripting languages will fail to deliver the potential use of available hardware. At this point, the need arises to

Figure 1. Computationally expensive Python function.

Figure 2. Python function rewritten in Chapel using an inlined foreign function body.
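Figures 1 and 2 are reproduced as images in the published paper and are not recoverable here. The following is only a hedged sketch of what they depict, based on the surrounding description: a computationally heavy pure-Python function, and the same function mapped to an inlined Chapel body via a pyChapel-style decorator. The @Chapel decorator name is taken from Section 3; the import path and the way the inline Chapel body is passed are assumptions made for illustration.

# Sketch of Figure 1: a computationally expensive pure-Python function.
def quant(n):
    s = 0.0
    for i in range(1, n + 1):
        s += (i * 0.5) ** 0.5
    return s

# Sketch of Figure 2: the same function with its body supplied as inlined
# Chapel code through a pyChapel-style decorator (hypothetical API details).
from pych.extern import Chapel    # assumed module path

@Chapel(inline="""
    var s = 0.0;
    for i in 1..n do
        s += sqrt(i * 0.5);
    return s;
""")
def quant_chpl(n):
    pass    # the body is provided by the inlined Chapel code above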
The primary target for the prototype is Chapel, however, C is The module can then be accessed from Python as illustrated in currently also supported, and further work can with little effort figure 4. expand support for any language capable of interoperating with C. Chapel is the primary target as it maintains the attractive qualities of the scripting languages such as high-level array operations for scientific computing. Initial results indicate that the performance improvement of targeting Chapel is equivalent to that of targeting C.

Acknowledgments The Chapel Team at Cray, Inc. provided valuable insights dur- Figure 4. Accessing Chapel module HelloLib from Python. ing the development of PyChapel especially Elliot Ronaghan, Ben Harshbarger, Thomas Van Doren, and Lydia Duncan. When using the pyChapel module compiler, code will be gener- This research has been partially supported by the Danish Strate- ated and compiled prior to runtime. gic Research Council, Program Committee for Strategic Growth Technologies, for the research center ’HIPERFIT: Functional High 3. Results Performance Computing for Financial Information Technology’ (hiperfit.dk) under contract number 10-092299. Two synthetic Python applications, hereafter referred to as finance This research has been partially supported by the Danish and scicomp, were used to get an initial impression of the perfor- Strategic Research Council, Program Committee for Strategic mance gained by mapping the computationally expensive portion Growth Technologies, for the ’High Performance High Produc- of the application code to Chapel. The code can inspected in the tivity’ project under contract number 09-067060. pyChapel online documentation2. Two versions of each applica- tion were implemented, a reference implementation and a modifi- cation mapping the expensive functions (simulation and quant) References to Chapel using techniques illustrated in figure 1 and figure 2. The [1] O. Kiselyov, C.-c. Shan, and Y. Kameyama, “Bridging the theory of rewrites were implemented without any in or- staged programming languages and the practice of highperformance der to maintain the abstraction-level of the code. computing,” Tech. Rep., 2012. [2] M. R. B. Kristensen, S. A. F. Lund, T. Blum, and K. Skovhede, Sepa- 1 http://pychapel.rtfd.org rating NumPy API from Implementation, 2014. 2 http://pychapel.readthedocs.org/usage_examples.html# accelerate-your-numpy-code Fusion of Array Operations at Runtime

6.12 Fusion of Array Operations at Runtime


Mads R. B. Kristensen, Simon A. F. Lund, Troels Blum, and James Avery
Niels Bohr Institute, University of Copenhagen, Denmark
{madsbk/safl/blum/avery}@nbi.dk

Abstract—In this paper, we address the problem of fusing array operations based on criteria, such as compatibility, data communication, and/or reusability. We formulate the problem as

#define N 1000
double A[N], B[N], T[N];
// Array expression: A += B*A
for( int i=0; i

The asymptotic complexity of FUSE is O(Eˆd +Eˆf +Eˆw$) where $ is the complexity of calculating a weight edge. In Bohrium $ equals the set of vertices within the WSP problem V thus we get O(Eˆd + Eˆf + EˆwV ). In order to simplify, hereafter when reporting complexity we use O(V ) to denote O(V + Vˆ ) and O(E) to denote O(Eˆd + Eˆf +Eˆw +Ed +Ef ) where Ed,Ef are the set of edges in the WSP problem. Therefore, the complexity of FUSE in Bohrium is simply O(VE). Furthermore, the FUSE function is commutaive: Corollary 3. Given a partition graph Gˆ and two vertices u,ˆ vˆ Gˆ, the function FUSE(G,ˆ u,ˆ vˆ) is commutaive. ∈ Proof. This is because FUSE is basically a vertex contraction and an union of the vertices within u,ˆ vˆ both of which are commutative operations[9].

However, it is not always legal to fuse over a weighted edge because of the preservation of the partial order of the vertices. That is, given three vertices, a, b, c, and two dependency edges, (a, b) and (b, c); it is illegal to fuse a, c without also fusing b. We call such an edge between a, c a transitive weighted edge and we must ignore them. Now we have: Fig. 5. A partition graph of the Python application in Fig. 3. For illustrative Lemma 2. Given a basic non-fused partition graph, proposes, the graph does not include ignored weight edges (cf. Fig. 4). ˆ G1, there exist a sequences of weighted edge fusions, C. Greedy Fusion FUSE(Gˆ1, uˆ1, vˆ1), FUSE(Gˆ2, uˆ2, vˆ2), ..., FUSE(Gˆn, uˆn, vˆn), Fig. 6 shows a greedy fuse algorithm. It uses the function where (ˆui, vˆi) Ew[Gˆi] and (ˆui, vˆi) is non-transitive for ∈ FIND-HEAVIEST to find the edge in E with the great- i = 1, 2, ..., n, for any legal partition graph Gˆn. w est weight and either remove it or fuse over it. Note that Proof. This follows directly from Corollary 3 and the build of FIND-HEAVIEST must search through Ew in each iteration the basic non-fused partition graph, which has weight-edges since FUSE might change the weights. between all pairs of fusible vertices. The fact that we ignore The number of iterations in the while loop (line 2) is O(E) transitive weighted edges does not preclude any legal partition. since minimum one weight edge is removed in each iteration either explicitly (line 5) or implicitly by FUSE (line 7). The complexity of finding the heaviest (line 3) is O(E), calling Bohrium implements, IGNORE(G,ˆ e), which given a weight IGNORE is O(E + V ), and calling FUSE is O(VE) thus the edge, e E [Gˆ], determines whether the edge should be overall complexity is O(VE2). ∈ w ignored or not (Fig. 4). The search of the longest path (line Fig. 7 shows a greedy partition of the Python example. The 3) dominates the complexity of this function thus the overall partition cost is 46, which is a significant improvement over complexity is O(V + E). no fusion. However, it is not the optimal partitioning, as we 1: function GREEDY(G) 2: while Ew[G] = do 6 ∅ 3: (u, v) FIND-HEAVIEST(Ew[G]) ← 4: if IGNORE(G, (u, v)) then 5: Remove edge (u, v) from Ew 6: else 7: G FUSE(G, u, v) ← 8: end if 9: end while 10: return G 11: end function

Fig. 6. The greedy fusion algorithm that greedily fuses the vertices connected with the heaviest weight edge in G.

Fig. 8. A partition graph of the unintrusive fusion of the graph in Fig. 5. two conditions exists: 1) There exist a vertex x both in G and G0 that are fusible with u or v but not z and the cost saving of fusing z, u or z, v is greater than the cost saving of fusing u, v. 2) There exist two non-fusible vertices x, y both in Gˆ and Gˆ0 in which the cost saving of fusing x, u and fusing v, y is not greater than the cost saving of fusing u, v. Since we have that θ[z] = θ[v] = θ[u], condition (1) cannot exist and since either u or v is a pendant vertex condition (2) Fig. 7. A partition graph of the greedy fusion of the graph in Fig. 5. cannot exist. Thus, we have that the fusion of u, v is always beneficial and is part of an optimal solution, which concludes shall see later. the proof. D. Unintrusive Fusion Fig. 9 shows the unintrusive fusion algorithm. It uses a help function, FINDCANDIDATE, to find two vertices that are In order to reduce the size of the partition graph, we unintrusively fusible. The complexity of FINDCANDIDATE is apply an unintrusive strategy where we fuse vertices that are O(E(E + V )), which dominates the while-loop in UNINTRU- guaranteed to be part of an optimal solution. Consider the two SIVE thus the overall complexity of the unintrusive fusion vertices, a, e, in Fig. 5. The only beneficial fusion possibility algorithm is O(E2(E + V )). Note that there is no need to a has is with e thus if a is fused in the optimal solution, it is further optimize UNINTRUSIVE since we only use it as a with e. Now, since fusing a, e will not impose any restriction preconditioner for the optimal solution, which will dominate to future possible vertex fusions in the graph. The two vertices the computation time anyway. are said to be unintrusively fusible: Fig. 8 shows an unintrusive partition of the Python example Theorem 3. Given a partition graph, Gˆ, that, through the with a partition cost of 62. However, the significant improve- fusion of vertices u, v V [Gˆ] into z V [Gˆ0], transforms ment is the reduction of the number of weight edges in the ∈ ∈ into the partition graph Gˆ0; the vertices u, v is said to be graph. As we shall see next, in order to find an optimal graph unintrusively fusible when the following conditions holds: partition in practical time, the number of weight edges in the 1) θ[z] = θ[v] = θ[u], i.e. the set of non-fusibles must not graph must be very modest. changes after the fusion. E. Optimal Fusion 2) Either u or v is a pendant vertex in graph (V [Gˆ], e { ∈ Generally, we cannot hope to solve the WSP problem in E [Gˆ] IGNORE(G,ˆ e) ), i.e. the degree of either u or w |¬ } polynomial time because of the NP-hard nature of the problem. v must be 1 in respect to the weigh edges in Gˆ that are In worse case, we have to search through all possible fuse not ignored. combinations of which there are 2E. However, in some cases Proof. The sequences of fusions that obtain an optimal par- we may be able to solve the problems within reasonable time tition solution cannot include the fusion of u, v into z when through a carefully chosen search strategy. For this purpose, 1: function FINDCANDIDATE(G) . Help function 2: for (v, u) Ew[G] do ← 3: if IGNORE(G, (u, v)) then 4: Remove edge (u, v) from Ew 5: end if 6: end for 7: for (v, u) Ew[G] do ← 8: if the degree is less than 2 for either u or v when only counting edges in Ew[G] then 9: if θ[u] = θ[v] then 10: return (u, v) 11: end if 12: end if 13: end for 14: return (NIL, NIL) 15: end function 16: Fig. 10. 
A branch-and-bound of the unintrusively fused partition 17: function UNINTRUSIVE(G) graph (Fig. 8). Each vertex lists a sequences of vertex fusions that build a 18: while (u, v) FINDCANDIDATE(G) = (NIL, NIL) do specific graph partition. The grayed out area indicates the part of the search ← 6 tree that a depth-first-search can skip because of the cost bound. 19: G FUSE(G, u, v) ← 20: end while Fig. 13 show that result of partitioning the Python example 21: return G 22: end function with a cost of 50. G. Fuse Cache Fig. 9. The unintrusive fusion algorithm that only fuse unintrusively fusible In order to amortize the runtime of applying the fuse vertices. algorithms, Bohrium implements a fuse cache of previously we implement a branch-and-bound algorithm that explores the found partitions of array operation lists. It is often the case that monotonic decreasing property of the partition cost (Lemma scientific applications use large calculation loops such that an 1). iteration in the loop corresponds to a list of array operations. Consider the result of the unintrusive fusion algorithm (Fig. Since the loop contains many iterations, the cache can amortize 8). In order to find the optimal solution, we start a search the overall runtime time. down through a tree of possible partitions. At the root level of the search tree, we check the legality of a partition that fuses V. EVALUATION over all weigh edges. If the partition graph is legal, i.e. it did In this section, we will evaluate the different partition not fuse vertices connected with fuse-preventing edges, than algorithm both theoretically and practically. We execute a it follows from Lemma 1 that the partition is optimal. If the range of scientific Python benchmarks, which are part of an partition is not legal, we descend a level down the tree and try open source benchmark tool and suite named Benchpress3. to fuse over all but one weight edge. We continue this process Table I shows the specific benchmarks that we uses and Table such that for each level in the search tree, we fuse over one II specifies the host machine. When reporting runtime results, less weight edge. We do this until we find a legal partition we use the results of the mean of eight identical executions (Fig. 10). as well as error bars that shows two standard deviations from Furthermore, because of Lemma 1, we can bound the search the mean. using the cheapest legal partition already found. Thus, we We would like to point out that even though we are ignore sub-trees that have a cost greather than the cheapest using benchmarks implemented in pure Python/NumPy, the already found. performance is comparable to traditional high-performance Fig. 11 shows the implementation and Fig. 12 shows an languages such as C and Fortran. This is because Bohrium optimal partition of the Python example with a partition cost overloads NumPy array operations[8] in order to JIT compile of 34. and execute them in parallel seamlessly[cite simon]. 1) Theoretical Partition Cost: Fig. 14 shows that theoret- F. Na¨ıve Fusion ical partition cost (Def. 10) of the four different partition For completeness, we also implement a partition algorithm algorithms previously presented. Please note that the last five that does not use a graph representation. In our na¨ıve approach, benchmarks do not show an optimal solution. 
G. Fuse Cache

In order to amortize the runtime of applying the fuse algorithms, Bohrium implements a fuse cache of previously found partitions of array operation lists. It is often the case that scientific applications use large calculation loops such that an iteration in the loop corresponds to a list of array operations. Since the loop contains many iterations, the cache can amortize the overall runtime.

V. EVALUATION

In this section, we evaluate the different partition algorithms both theoretically and practically. We execute a range of scientific Python benchmarks, which are part of an open source benchmark tool and suite named Benchpress (available at http://benchpress.bh107.org; for reproducibility, the exact version can be obtained from the source code repository at https://github.com/bh107/benchpress.git, revision 01e84bd995). Table I shows the specific benchmarks that we use and Table II specifies the host machine. When reporting runtime results, we use the mean of eight identical executions as well as error bars that show two standard deviations from the mean.

We would like to point out that even though we are using benchmarks implemented in pure Python/NumPy, the performance is comparable to traditional high-performance languages such as C and Fortran. This is because Bohrium overloads NumPy array operations [8] in order to JIT compile and execute them in parallel seamlessly [cite simon].

1) Theoretical Partition Cost: Fig. 14 shows the theoretical partition cost (Def. 10) of the four different partition algorithms previously presented. Please note that the last five benchmarks do not show an optimal solution. This is because the associated search trees are too large for our branch-and-bound algorithm to solve. For example, the size of the search tree for the Lattice Boltzmann benchmark is 2^664, which is simply too large even when the bound can cut much of the search tree away.

function FUSEBYMASK(G, M)                          ⊳ Help function
    f ← true                                       ⊳ Flag that indicates fusibility
    for i ← 0 to |Ew[G]| − 1 do
        if M_i = 1 then
            (u, v) ← the i'th edge in Ew[G]
            if not FUSIBLE(G, u, v) then
                f ← false
            end if
            G ← FUSE(G, u, v)
        end if
    end for
    return (G, f)
end function

function OPTIMAL(G)
    G ← UNINTRUSIVE(G)
    for (u, v) ∈ Ew[G] do
        if IGNORE(G, (u, v)) then
            remove edge (u, v) from Ew
        end if
    end for
    B ← GREEDY(G)                                  ⊳ Initially best partitioning
    M_{0..|Ew[G]|} ← 1                             ⊳ Fill the mask array M with ones
    o ← 0                                          ⊳ The mask offset
    Q ← ∅
    ENQUEUE(Q, (M, o))
    while Q ≠ ∅ do
        (M, o) ← DEQUEUE(Q)
        (G′, f) ← FUSEBYMASK(G, M)
        if cost(G′) < cost(B) then
            if f and G′ is acyclic then
                B ← G′                             ⊳ New best partitioning
            end if
        end if
        for i ← o to |M| − 1 do
            M′ ← M
            M′_i ← 0
            ENQUEUE(Q, (M′, i + 1))
        end for
    end while
    return B
end function

Fig. 11. The optimal fusion algorithm that optimally fuses the vertices in G. The function cost(G) returns the partition cost of the partition graph G.

Fig. 12. A partition graph of the optimal fusion of the graph in Fig. 5.

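The following is a compact, runnable Python companion to the OPTIMAL pseudocode in Fig. 11, written under simplifying assumptions: the partition cost is a caller-supplied function of the block labelling (standing in for Def. 10), legality is reduced to keeping fuse-preventing pairs in different blocks, and the acyclicity test is omitted. It is a sketch of the branch-and-bound idea, not Bohrium's implementation.

from collections import deque

def components(n, edges):
    # Union-find: label each of the n vertices by its component after contracting 'edges'.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        parent[find(u)] = find(v)
    return [find(x) for x in range(n)]

def optimal(n, weight_edges, preventing_edges, cost):
    # Branch-and-bound over which weight edges to contract (fuse over).
    best_cost, best_labels = float("inf"), list(range(n))
    queue = deque([(tuple([1] * len(weight_edges)), 0)])  # root: fuse over every weight edge
    while queue:
        mask, offset = queue.popleft()
        chosen = [e for e, bit in zip(weight_edges, mask) if bit]
        labels = components(n, chosen)
        legal = all(labels[u] != labels[v] for u, v in preventing_edges)
        c = cost(labels)
        if c < best_cost:          # Lemma 1: fusing over fewer edges never lowers the cost,
            if legal:              # so subtrees at or above the bound can be skipped entirely.
                best_cost, best_labels = c, labels
            for i in range(offset, len(mask)):   # branch: clear one more mask bit
                child = list(mask)
                child[i] = 0
                queue.append((tuple(child), i + 1))
    return best_cost, best_labels

# Toy example: four operations in a chain; operations 0 and 3 must not share a block.
n = 4
weight_edges = [(0, 1), (1, 2), (2, 3)]
preventing = [(0, 3)]
block_count = lambda labels: len(set(labels))   # stand-in cost: number of blocks
print(optimal(n, weight_edges, preventing, block_count))   # cost 2, e.g. blocks {0} and {1, 2, 3}

The (mask, offset) queue entries mirror the (M, o) pairs in Fig. 11: the offset guarantees that each subset of weight edges is generated exactly once.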
Fig. 13. A partition graph of a naïve partition of the Python example (Fig. 3).

TABLE I
BENCHMARK APPLICATIONS

Benchmark                      Input size (in 64-bit floats)        Iterations
Black Scholes                  1.5 × 10^6                           20
Game of Life                   10^8                                 20
Heat Equation                  1.44 × 10^8                          20
Leibnitz PI                    10^8                                 20
Gauss Elimination              2800                                 2799
LU Factorization               2800                                 2799
Monte Carlo PI                 10^8                                 20
27 Point Stencil               4.2875 × 10^7                        20
Shallow Water                  1.024 × 10^7                         20
Rosenbrock                     2 × 10^8                             20
Successive over-relaxation     1.44 × 10^8                          20
NBody                          6000                                 20
NBody Nice                     40 planets, 2 × 10^6 asteroids       20
Lattice Boltzmann D3Q19        3.375 × 10^6                         20
Water-Ice Simulation           6.4 × 10^5                           20

TABLE II
SYSTEM SPECIFICATIONS

Processor:           Intel Core i7-3770
Clock:               3.4 GHz
#Cores:              4
Peak performance:    108.8 GFLOPS
L3 Cache:            16MB
Memory:              128GB DDR3
Operating system:    Ubuntu Linux 14.04.2 LTS
Software:            GCC v4.8.4, Python v2.7.6, NumPy 1.8.2

Fig. 14. Theoretical cost of the different partition algorithms (y-axis: fuse price, ×10^11). NB: the last five benchmarks (Lattice Boltzmann, NBody, NBody Nice, SOR, and Water-Ice Simulation) do not show an optimal solution.

Fig. 15. Runtime of the different partition algorithms using a warm fuse cache (y-axis: elapsed time).

Fig. 16. Runtime of the different partition algorithms using a cold fuse cache (y-axis: elapsed time).

Fig. 17. Runtime of the different partition algorithms using no fuse cache (y-axis: elapsed time).

As expected, we observe that the three fusion algorithms, Naïve, Greedy, and Optimal, have a significantly smaller cost than the non-fusing Singleton approach. The difference between Naïve and Greedy is significant in some of the benchmarks, but the difference between Greedy and Optimal almost does not exist.

2) Practical Runtime Cost: In order to evaluate the full picture, we do three runtime measurements: one with a warm fuse cache, one with a cold fuse cache, and one with no fuse cache. Fig. 15 shows the runtime when using a warm fuse cache; thus we can compare the theoretical partition cost with the practical runtime without the overhead of running the partition algorithm. Looking at Fig. 14 and Fig. 15, it is evident that our cost model, the number of unique array accesses (Def. 10), compares well to the practical runtime results in this specific benchmark setup. However, there are some outliers: the Monte Carlo Pi benchmark has a significantly smaller theoretical partition cost when using the Greedy and Optimal algorithms, but the practical runtime does not improve correspondingly. This is because the execution becomes computation bound rather than memory bound, so a further reduction in memory accesses does not improve the performance. Similarly, in the 27 Point Stencil benchmark the theoretical partition cost is identical for the Naïve, Greedy, and Optimal algorithms, but in practice the Optimal partition is marginally better; this is an artifact of our cost model, which defines the cost of reads and writes identically.

Fig. 16 shows the runtime when using a cold fuse cache, such that the partition algorithm runs once in the first iteration of the computation. The results show that the 20 iterations, which most of the benchmarks use, are enough to amortize the overhead of running the partition algorithm. Whereas, when running with no fuse cache (Fig. 17), the partition algorithm runs in each iteration, in which case the Naïve partition algorithm outperforms both the Greedy and the Optimal algorithm because of its smaller time complexity.
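To connect the warm-, cold-, and no-cache measurements with the fuse cache of Section G, here is a minimal Python sketch of such a cache. Keying the cache on a structural signature of the incoming array-operation list is an assumption made for illustration; it is not necessarily how Bohrium derives its cache key.

fuse_cache = {}

def cache_key(operations):
    # Illustrative key: the structural signature of the operation list.
    return tuple((op["name"], op["shape"]) for op in operations)

def partition_with_cache(operations, partitioner):
    # Return a previously found partition if one exists; otherwise compute and store it.
    key = cache_key(operations)
    if key not in fuse_cache:
        fuse_cache[key] = partitioner(operations)   # e.g. the naive or the optimal algorithm
    return fuse_cache[key]

# Example: the second, identical iteration of a loop hits the cache and skips partitioning.
ops = [{"name": "add", "shape": (10, 10)}, {"name": "mul", "shape": (10, 10)}]
one_block = lambda operations: [list(operations)]   # stand-in partitioner
print(partition_with_cache(ops, one_block) is partition_with_cache(ops, one_block))   # True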

REFERENCES

[1] G. Gao, R. Olsen, V. Sarkar, and R. Thekkath, "Collective loop fusion for array contraction," in Languages and Compilers for Parallel Computing, ser. Lecture Notes in Computer Science, U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, Eds. Springer Berlin Heidelberg, 1993, vol. 757, pp. 281–295.
[2] D. Loveman, "High Performance Fortran," IEEE Parallel & Distributed Technology: Systems & Applications, vol. 1, no. 1, pp. 25–42, 1993.
[3] B. Chamberlain, S.-E. Choi, C. Lewis, C. Lin, L. Snyder, and W. Weathersby, "ZPL: A machine independent programming language for parallel computers," IEEE Transactions on Software Engineering, vol. 26, no. 3, pp. 197–211, Mar 2000.
[4] M. R. B. Kristensen, S. A. F. Lund, T. Blum, K. Skovhede, and B. Vinter, "Bohrium: a Virtual Machine Approach to Portable Parallelism," in Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International. IEEE, 2014, pp. 312–321.
[5] K. Kennedy and K. McKinley, "Maximizing loop parallelism and improving data locality via loop fusion and distribution," in Languages and Compilers for Parallel Computing, ser. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 1994, vol. 768, pp. 301–320.
[6] E. Dahlhaus, D. S. Johnson, C. H. Papadimitriou, P. D. Seymour, and M. Yannakakis, "The complexity of multiway cuts," in Proceedings of the twenty-fourth annual ACM symposium on Theory of Computing. ACM, 1992, pp. 241–251.
[7] M. R. B. Kristensen, S. A. F. Lund, T. Blum, K. Skovhede, and B. Vinter, "Bohrium: Unmodified NumPy Code on CPU, GPU, and Cluster," in 4th Workshop on Python for High Performance and Scientific Computing (PyHPC'13), 2013.
[8] M. R. B. Kristensen, S. A. F. Lund, T. Blum, and K. Skovhede, "Separating NumPy API from Implementation," in 5th Workshop on Python for High Performance and Scientific Computing (PyHPC'14), 2014.
[9] T. Wolle, H. L. Bodlaender et al., "A note on edge contraction," Technical Report UU-CS-2004, Tech. Rep., 2004.