UNIVERSITY OF COPENHAGEN FACULTY OF SCIENCE

PhD thesis Troels Henriksen — [email protected]

Design and Implementation of the Futhark Programming Language

Supervisors: Cosmin Eugen Oancea and Fritz Henglein

August, 2017

Abstract

In this thesis we describe the design and implementation of Futhark, a small data-parallel purely functional array language that offers a machine-neutral programming model, and an optimising compiler that generates efficient OpenCL code for GPUs. The overall philosophy is based on seeking a middle ground between functional and imperative approaches. The specific contributions are as follows:

First, we present a moderate flattening transformation aimed at enhancing the degree of parallelism, which is capable of exploiting easily accessible parallelism. Excess parallelism is efficiently sequentialised, while keeping access patterns intact, which then permits further locality-of-reference optimisations. We demonstrate this capability by showing instances of automatic loop tiling, as well as optimising memory access patterns.

Second, to support the flattening transformation, we present a lightweight system of size-dependent types that enables the compiler to reason symbolically about the sizes of arrays in the program, and that reuses general-purpose compiler optimisations to infer relationships between sizes.

Third, we furnish Futhark with novel parallel combinators capable of expressing efficient sequentialisation of excess parallelism, as well as their fusion rules.

Fourth, in order to express efficient programmer-written sequential code inside parallel constructs, we introduce support for safe in-place updates, with type system support to ensure referential transparency and equational reasoning.

Fifth, we perform an evaluation on 21 benchmarks that demonstrates the impact of the language and compiler features, and shows application-level performance that is in many cases competitive with hand-written GPU code.

Sixth, we make the Futhark compiler freely available with full source code and documentation, to serve both as a basis for further research into functional array programming, and as a useful tool for parallel programming in practice.

Resumé

This thesis describes the design and implementation of Futhark, a simple data-parallel, side effect-free, functional array language that offers a machine-neutral programming model. We likewise describe the accompanying optimising Futhark compiler, which produces efficient OpenCL code targeted at execution on GPUs. The overall design philosophy is to exploit both functional and imperative approaches. Our concrete contributions are as follows:

First, we present a moderate flattening transformation, which is able to exploit just the degree of parallelism that is necessary or easily accessible, and to turn excess parallelism into efficient sequential code. This sequential code retains the original memory access pattern information, which permits further memory access optimisations. We demonstrate the usefulness of this property by giving examples of automatic loop tiling, as well as the restructuring of memory access patterns such that the memory system of the GPU is used as well as possible.

Second, in order to support the flattening transformation, we describe a simple system of size-dependent types that allows the compiler to reason symbolically about the sizes of the arrays in the program under compilation. Our approach permits the reuse of the general repertoire of compiler optimisations in questions of equality between sizes.

Third, we equip Futhark with a number of novel parallel combinators that permit efficient sequentialisation of unnecessary parallelism, together with their fusion rules.

Fourth, in order to support efficient sequential code embedded in the parallel language constructs, we introduce support for in-place updates of array values. This support is secured by a type system that guarantees that the effect cannot be observed, and that equational reasoning remains possible.

Fifth, we carry out a performance comparison comprising 21 programs, in order to demonstrate the practical usefulness of the language and the impact of the compiler optimisations. Our results show that the overall performance of Futhark is in many cases competitive with hand-written GPU code.

Sixth, we make the Futhark compiler freely available, including all source code and extensive documentation, so that it can serve both as a starting point for further research in functional array programming, and as a practically usable tool for parallel programming.

Contents

Preface

Part I Land, Logic, and Language

1 Introduction
2 Background and Philosophy
3 An Array Calculus
4 Parallelism and Hardware Constraints

Part II An Optimising Compiler

5 Overview and Uniqueness Types
6 Size Inference
7 Fusion
8 Moderate Flattening and Kernel Extraction
9 Optimising for Locality of Reference
10 Empirical Validation

Part III Closing Credits

11 Conclusions and Future Work

Bibliography

Preface

This thesis is submitted in fulfillment of the PhD programme in computer science (Datalogi) at the University of Copenhagen, for Troels Henriksen, under the supervision of Cosmin Eugen Oancea and Fritz Henglein.

Publications

Of the peer-reviewed papers I have published during my studies, the following contribute directly to this thesis:

Henriksen, Troels, Martin Elsman, and Cosmin E. Oancea. “Size slicing: a hybrid approach to size inference in Futhark”. In: Proceedings of the 3rd ACM SIGPLAN Workshop on Functional High-performance Computing. ACM, 2014, pp. 31–42

Henriksen, Troels, Ken Friis Larsen, and Cosmin E. Oancea. “Design and GPGPU Performance of Futhark’s Redomap Construct”. In: Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming. ARRAY 2016. Santa Barbara, CA, USA: ACM, 2016, pp. 17–24

Henriksen, Troels, Niels G. W. Serup, Martin Elsman, Fritz Henglein, and Cosmin E. Oancea. “Futhark: purely functional GPU-programming with nested parallelism and in-place array updates”. In: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, 2017, pp. 556–571

While the following do not:

Henriksen, Troels and Cosmin E. Oancea. “Bounds checking: An instance of hybrid analysis”. In: Proceedings of the ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming. ACM, 2014, p. 88


Henriksen, Troels, Martin Dybdal, Henrik Urms, Anna Sofie Kiehn, Daniel Gavin, Hjalte Abelskov, Martin Elsman, and Cosmin Oancea. “APL on GPUs: A TAIL from the Past, Scribbled in Futhark”. In: Proceedings of the 5th International Workshop on Functional High-Performance Computing. FHPC ’16. Nara, Japan: ACM, 2016, pp. 38–43

Larsen, Rasmus Wriedt and Troels Henriksen. “Strategies for Regular Segmented Reductions on GPU”. In: Proceedings of the 6th ACM SIGPLAN Workshop on Functional High-performance Computing. FHPC ’17. New York, NY, USA: ACM, 2017

Acknowledgements

Language design and compiler hacking easily becomes a lonely endeavour. I would in particular like to thank Cosmin Oancea, Martin Elsman, and Fritz Henglein for being willing to frequently and vigorously discuss language design issues. I am also grateful to all the students who contributed to the design and implementation of Futhark. In no particular order: Niels G. W. Serup, Chi Pham, Maya Saietz, Daniel Gavin, Hjalte Abelskov, Mikkel Storgaard Knudsen, Jonathan Schrøder Holm Hansen, Sune Hellfritzsch, Christian Hansen, Lasse Halberg Haarbye, Brian Spiegelhauer, William Sprent, René Løwe Jacobsen, Anna Sofie Kiehn, Henrik Urms, Jonas Brunsgaard, and Rasmus Wriedt Larsen.

Development of the compiler was aided by Giuseppe Bilotta’s advice on OpenCL programming, and James Price’s work on oclgrind [PM15]. I am also highly appreciative of the work of all who contributed code or feedback to the Futhark project on GitHub1, including (but not limited to): David Rechnagel Udsen, Charlotte Tortorella, Samrat Man Singh, maccam912, mrakgr, Pierre Fenoll, and titouanc.

I am also particularly grateful for Cosmin Oancea’s choice to devise a research programme that has allowed me the opportunity to spend three years setting up GPU drivers on . And relatedly, I am grateful to NVIDIA for donating the K40 GPU used for most of the empirical results in this thesis. And of course, I am grateful that Lea was able to put up with me for these three and a half years, even if I still do not understand why.

The cover image depicts the common European hedgehog (Erinaceus europaeus), and is from Iconographia Zoologica, a 19th century collection of zoological prints. Like hedgehogs, functional languages can be much faster than they might appear.

1 https://github.com/diku-dk/futhark

Part I

Land, Logic, and Language

Chapter 1

Introduction

This thesis describes the design and implementation of Futhark, a data-parallel functional array language. Futhark, named after the first six letters of the Runic alphabet, is a small programming language that is superficially similar to established functional languages such as OCaml and Haskell, but with restrictions and extensions meant to permit compilation into efficient parallel code. While this thesis contains techniques that could be applied in other settings, Futhark has been the overarching context for our work. We demonstrate the applicability and efficiency of the techniques by their application in the Futhark compiler, and by the performance of the resulting code. Apart from serving as a vehicle for compiler research, Futhark is also a programming language that is useful in practice for high-performance parallel programming.

It is the task of the compiler to map the high-level portable parallelism to low-level parallel code that can be executed by some machine. For example, a parallel language may support nested parallelism, while most machines efficiently support only flat parallelism. One problem is that not all parallelism is born equal, and the optimal depth of parallelisation depends not just on the static structure of the program, but also on the problem size encountered at run-time.

When writing a program in a data-parallel language, the program tends to contain a large amount of parallelism; frequently much more than is necessary to fully exploit the target machine. For example, the program may contain two nested parallel loops, where executing the second loop in parallel carries significant overhead. If the outer loop contains enough parallel iterations to fully saturate the machine, then it is better to execute the inner loop sequentially. On the other hand, if the outer loop contains comparatively few iterations, then it may be worth paying the overhead of also executing the innermost loop in parallel. This decision cannot be made at compile-time, and thus the compiler should produce code for both versions, and pick between them at run-time, based on the problem size.
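As an illustrative sketch of such a situation (invented here for exposition, using constructs introduced in Chapter 2), consider summing the rows of a matrix:

map (\row -> reduce (+) 0 row) xss

The outer map is parallel, and each inner reduce is itself a parallel operation. If xss has many short rows, it is likely best to run each reduce sequentially inside the parallel map; if it has few but very long rows, parallelising the inner reductions as well may be worth the overhead.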

Many implementation techniques for data-parallelism have been developed. Guy Blelloch’s seminal full flattening algorithm [Ble+94] shows how to transform all the parallelism in a program written in a nested data-parallel language to flat parallelism. While full flattening is guaranteed to preserve the asymptotic cost and parallelism of the program, its practical performance is often poor, because some of the parallelism may carry heavy overhead to exploit, and is unnecessary in practice. An example of this could be parallel code hidden behind branches. Furthermore, the code generated by full flattening has an un-analysable structure, preventing locality-of-reference optimisations such as loop tiling. For these reasons, flattening is not used much in practice.

Instead, we find techniques that tend to exploit only easily accessible parallelism. One common approach is to exploit only top-level parallelism, with nested parallelism exploited only in limited ways, for example by always mapping it to a specific layer of the parallel hierarchy of the machine [GS06; Lee+14; McD+13; Ste+15]. While the resulting code tends to exhibit little overhead, the lack of flexibility makes it hard to express algorithms whose natural expression contains nested parallelism. This is particularly problematic when exploiting the inner levels of parallelism is critical to obtaining performance in practice. More powerful techniques exist: dynamic techniques for detecting parallelism [OM08] can in principle also exploit partial parallelism, but at significant run-time overhead, and static analyses such as the polyhedral model [Pou+11; RKC16; Ver+13; CSS15] are able to support some forms of nested parallelism, but are typically limited to affine code, and their success is not guaranteed by the language semantics.

In this thesis, we bridge the imperative and functional approaches: we seek to exploit only that parallelism which is necessary or easily accessible, and we work on the foundation of an explicitly parallel language. We are inspired by the systematic approach of flattening, but we wish to employ full flattening only as a last resort, when absolutely all parallelism must be exploited. Our goal is pragmatic: we wish to write Futhark programs in a high-level, hardware-agnostic, and modular style, using nested parallelism where convenient. The compiler should then translate the Futhark programs to low-level GPU code whose run-time performance rivals that of hand-written code.

First Contribution Our key contribution is moderate flattening (Chapter 8). This algorithm, which is inspired by both full flattening and loop distribution, is capable of restricting itself to exploiting only that parallelism which is cheap to access, or of more aggressively exploiting more of the available parallelism. Excess parallelism is turned into efficient sequential code, with enough structure retained to perform important locality-of-reference optimisations (Chapter 9). The moderate flattening algorithm is also capable of improving the degree of available parallelism by rewriting the program. For example, loop interchange is used to move parallel loop levels next to each other, even if they were separated by a sequential loop in the source program. Finally, the moderate flattening algorithm is capable of data-sensitive parallelisation, where different parallelisations are constructed for the same program, and the optimal one picked at run-time.


Second Contribution The moderate flattening algorithm requires significant infrastructure to function. While full flattening transforms everything into flat bulk parallel constructs, within which all variables are scalars, we wish to permit just partial parallelisation. As a result, some variables in the sequentialised excess parallelism may be arrays, and the compiler must have a mechanism for reasoning about their sizes. One contribution here is a technique for size inference on multidimensional arrays (Chapter 6), which gives us the property that for every array in the program, there is a variable (or constant) in scope that describes the size of each dimension of the array. This allows the compiler to reason about the variance of arrays in a symbolic and simple way. Prior work tends to either restrict the language to ensure that all symbolic sizes can be statically determined [Jay99], or requires extensive annotation by the programmer [Bra13; BD09]. Our technique for size inference is a pragmatic design that does not restrict the language (by forbidding, for example, filtering), and does not require any annotations from the programmer, although it can benefit from them. We achieve this by using a simple form of existential types.
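As a sketch of this property at the source level (using the size-parameter notation introduced in Section 2.5; the function names are invented for illustration):

-- The size parameter [n] names the size of xs, so the result of the
-- map below is also known to consist of n elements.
let add_two [n] (xs: [n]i32): [n]i32 = map (\x -> x + 2) xs

-- The size of the result of filter cannot be known statically; in the
-- compiler's core language it is bound existentially, so a variable
-- describing it is still in scope.
let evens (xs: []i32): []i32 = filter (\x -> x % 2 == 0) xs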

Third Contribution We present an extension to existing work on fusion (Chapter 7). The primary novelty is the creation of new array combinators that permit a greater degree of fusibility. Because of the moderate flattening technique, the fusion algorithm cannot know whether some piece of code will eventually be parallelised or sequentialised. We present constructs that have good fusion properties, but also allow both the recovery of all parallelism, and the transformation into efficient sequential code. These are not properties provided by prior work on fusion. We also show how “horizontal” fusion can be considered a special case of “vertical” (producer/consumer) fusion, given sufficiently powerful combinators, and can be an enabler of vertical fusion.
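For intuition, here is a small sketch of the two kinds of fusion, where f and g stand for arbitrary scalar functions:

-- Vertical (producer/consumer) fusion: ys is an intermediate array.
let ys = map f xs
let zs = map g ys    -- fusable into: map (\x -> g (f x)) xs

-- Horizontal fusion: two traversals of the same input become one.
let as = map f xs
let bs = map g xs    -- fusable into: unzip (map (\x -> (f x, g x)) xs)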

Fourth Contribution Even in a parallel language, it is sometimes useful to implement efficient sequential algorithms, often applied in parallel to parts of a data set. Most functional parallel languages do not support in-place updates at all, which hinders the efficient implementation of such sequential algorithms. While imperative languages obviously support in-place updates efficiently, they do not guarantee safety in the presence of parallelism. We present a system of uniqueness types (Section 2.5.1), and a corresponding formalisation (Section 5.3), that permits a restricted form of in-place updates, and that provides cost guarantees without compromising functional purity or parallel semantics. While more powerful uniqueness type systems [BS93], and affine and linear types [TP11; FD02], are known, ours is the first application that directly addresses map-style parallel constructs, and shows how in-place updates can be supported without making evaluation order observable.

Our design for in-place updates is similar to static single assignment form (SSA), and maintains explicit data dependencies in the compiler IR. We show that the addition of uniqueness types does not overly burden the implementation of compiler optimisations.
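As a sketch of what such an update looks like at the source level (the function is invented for illustration; syntax as in the Futhark of this thesis, where * marks a “unique” array that the function is allowed to consume):

-- Because xs is unique, the caller may not reuse it after the call,
-- so the update can safely be performed in place.
let bump (xs: *[]i32) (i: i32): *[]i32 =
  let xs[i] = xs[i] + 1
  in xs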

Fifth Contribution We demonstrate the GPU performance of code generated by the Futhark compiler on a set of 21 nontrivial problems adapted from published benchmark suites. 16 of the benchmarks are compared to hand-written implementations, and we also include five programs ported from Accelerate [McD+13], a mature Haskell library for parallel array programming. Our results show that the Futhark compiler generates code that performs comparably with hand-written code in most cases, and generally outperforms Accelerate. The results indirectly validate the low overhead of size analysis, and also show the impact of both uniqueness types and the various optimisations performed by the Futhark compiler.

Sixth Contribution The Futhark compiler is implemented in approximately 45,000 lines of Haskell, and is available under a free software license at https://github.com/diku-dk/futhark/. The compiler and language are sufficiently documented for usage by third parties. The compiler performs several operations whose implementations took significant engineering effort, but which are not covered in this thesis due to lack of scientific contribution. These include defunctorisation for an ML-style higher-order module system, hoisting, variance/data-dependency analysis, memory management, memory expansion, OpenCL code generation, and a plethora of classical optimisations. We believe that the Futhark compiler can serve as a starting point for future research in optimising compilers for explicitly parallel languages, as well as a tool for implementing data-parallel algorithms.

Closing At a high level, we argue that techniques from imperative compilers can be lifted to a functional and explicitly parallel setting, where analyses are driven by high-level properties of language constructs, rather than complex and low-level analysis, and hence prove more robust. We show that, using these techniques, a compiler can be written that generates highly performant code for non-contrived programs. We demonstrate the resulting performance compared to hand-written code (Chapter 10), where Futhark in most cases approaches or even exceeds the performance of hand-written low-level code. Most of this work has previously been published in FHPC 2013 [HO13] (during the author’s master’s studies), FHPC 2014 [HEO14], ARRAY 2016 [HLO16], and PLDI 2017 [Hen+17].

Chapter 2

Background and Philosophy

In physics, the speed of light, denoted c, is the ultimate speed limit of the universe. Likewise in programming, “as fast as C” is often used as an indication that some programming language is as fast as it can possibly be. In theory it makes no sense to say that a given programming language is “slow” or “fast”, as these are merely properties of a particular implementation of the programming language running on a specific computer. But in practice, it is clear that the design of a programming language has an overwhelming influence on the ease with which a performant implementation can be constructed.

For decades, the design of languages such as C has permitted the implementation of compilers that generate efficient code. One important reason is that the abstract machine model underlying C, a sequential random-access register machine, can be mapped easily to mainstream computers with little overhead. This means that a programmer can write code in C and have a reasonable idea of the performance of the resulting code. It also means that even a naive non-optimising C compiler can generate code with good performance, although it may require somewhat more care from the C programmer. Indeed, C has often been described as “portable assembly code”.1

In contrast, languages whose abstract model is more different from physical machines, so-called “high-level languages”, cannot be as easily mapped to real machines. Compilers must invest considerable effort in mapping, for example, the call-by-need lambda calculus of Haskell to the sequential register machine [Jon92]. Even after decades of work, high-level languages struggle to catch up to the performance of C on the sequential random-access register machine.

Of course, a cursory study of modern hardware designs shows that the sequential random-access register machine is an illusion. The abstract model of C may match a 70s minicomputer reasonably well, but it is increasingly diverging from how modern computers are constructed. This is not despite lack of effort on behalf of the hardware designers: due to the massive popularity (and therefore economic significance) of C-like languages, manufacturers have attempted to prolong the illusion of the sequential random-access machine for as long as possible.

1 I will be using C as the main point of reference in the following sections, but the problems I point out are common to any sequential language.

Unfortunately, while C may be exceeded (as we hope to show in this thesis), c is a tougher nut to crack. The machine on which this paragraph is being typed contains a CPU with a clock frequency of 2.3GHz, corresponding to a clock cycle taking 0.43ns. Given that c is approximately 3 · 10⁸ m/s, light moves approximately 13cm in the time the CPU takes to perform a clock cycle, thus physically limiting the distance a single piece of information can travel in a single cycle. Another, even greater, physical limitation is power dissipation: the power usage of the transistors in a processor is roughly proportional to the square of the clock frequency [DAS12]. This prevents an increase in clock frequency unless we can compensate with an increase in the efficiency of the transistors.

For a sequential machine that executes instructions in exactly the order they are given, the only way to speed up execution of a program is to increase the rate at which instructions are executed. If we can double the rate at which instructions are processed, we in effect halve the time it takes to execute some program (assuming no memory bottlenecks, which we will discuss later). In the popular vernacular, Moore’s Law is typically rendered as “computers double in speed every two years”. But actually, the law states that transistor density doubles every two years (implying that transistors become smaller). Moore’s Law does not state that the enlarged transistor budget translates straightforwardly into improved performance, although that was indeed the case for several decades. This is due to another law, Dennard scaling, which roughly states that as transistors get smaller, their power density stays constant. Taken together, Moore’s Law and Dennard scaling roughly say that we can put ever more transistors into the same physical area, with the same power usage (and thus heat generation).

The reduction in power usage per transistor granted by Dennard scaling permitted a straightforward increase in CPU clock frequency. Roughly, every time we cut the power consumption of a transistor in half via shrinkage, we can increase the clock frequency by 50%. For a sequential computer, this translates into a 50% performance increase. We can also partially circumvent the limits of c by using techniques such as pipelining: instruction latency may not be decreased by pipelining, but throughput can increase significantly.

While it is clear that Moore’s Law must eventually stop, or else transistors would eventually be smaller than a single atom, this is not the issue that has hindered the illusion of the sequential machine. The problem is that Dennard scaling began to break down around 2006. Physical properties of the circuit material lead to increased current leakage at small sizes, causing the chip to heat up. Simply pushing up the clock frequency is no longer viable. Instead, chip manufacturers use the transistor budget on increasing the amount of work that can be done in a clock cycle, through various means:


Increasing the size of caches. One significant problem with modern computer designs is that processors are significantly faster than the memory from which they get their data. As a rule of thumb, accessing main memory incurs a latency of 100ns, likely hundreds of clock cycles on a current CPU. This so-called memory wall can be circumvented by the addition of caches that store small subsets of the larger memory. Heuristics, usually based on temporal locality, are used to determine which parts of the larger memory are stored in the caches. High-performance CPUs can use several layers of caches, each layer being larger and slower than the previous one.

Inferring instruction-level parallelism. While the semantics of a sequential CPU dictate that instructions are executed one after another, it is often the case that two instructions do not have dependencies on one another, and thus can be executed in parallel. CPUs even perform out-of-order execution, where a later instruction is executed before an earlier one, provided the later instruction has no dependence on the earlier.

Explicit parallelism at the hardware level. The two former techniques try to disguise the fact that the sequential machine model is increasingly physically untenable. Another approach is to explicitly provide programmers with hardware support for parallel execution. This most famously takes the form of adding additional CPU cores, each of which processes a sequential stream of instructions, but an equally important technique is to add vector instructions that operate on entire vectors of data, rather than single values. As an example, the newest Intel AVX-512 vector instruction set provides instructions that operate on entire 512-bit vectors (for example, 16 single precision floating point values). Such instructions dramatically increase the amount of work that is done in a single clock cycle.

These techniques are all based on merely extending and refining the classic CPU design. While code may have to be re-written to take advantage of certain new hardware features (such as multiple cores or vector instructions), the programming experience is not too dissimilar from programming sequential machines. This is not necessarily a bad thing: sequential machines are easy to reason about, generous in their support of complex control flow, and have a significant existing base of programmer experience.

However, the notion that C is “portable assembly” begins to crack noticeably even at this point. C itself has no built-in notion of multi-threaded programming, nor of vector operations, and thus vendors have to provide new programming APIs (or language extensions) for programmers to take advantage of the new (performance-critical) machine features. Alternatively, C compilers must perform significant static analysis of the sequential C code to find opportunities for vectorisation or multithreading. In essence, the compiler must reverse engineer the sequential statements written by the programmer to reconstruct whatever parallelism may be present in the original algorithm.

The problem is that C was carefully designed for a particular class of machines; a class no longer resembled by modern high-performance computers. Thus, the impedance mismatch between C and hardware continues to grow, with little sign of stopping. This mismatch becomes acute when we move away from the comparatively benign world of multicore CPUs and into the realm of massively parallel processors.

In this chapter we shall discuss the problems caused by the impedance mismatch, and what a better programming model may look like. We begin in Section 2.1 by giving an overview of massively parallel machines, in particular the now-ubiquitous GPUs. Section 2.2 presents data-parallel functional programming, a programming model suited for massively parallel machines, through Futhark, the programming language that has been developed as part of this thesis. In Sections 2.3 and 2.4 we argue for the importance of fusion and nested parallelism for modular parallel programming. In Section 2.5 we defend the decision to introduce yet another programming language, rather than re-use an existing one, by arguing that certain features of existing functional languages hinder efficient parallel execution. Section 2.6 shows how Futhark, a restricted data-parallel language, can be used in practice, by demonstrating interoperability with mainstream and not-so-mainstream technologies. Finally, Section 2.7 shows in practice the difficulties involved in parallelising legacy sequential code, and Section 2.8 discusses other approaches to data-parallel functional programming.

2.1 Massively Parallel Machines

To a large extent, mainstream CPUs favour programming convenience, familiarity, and backwards compatibility over raw performance. We must look elsewhere for examples of the kind of hardware that could be built if we were willing to sacrifice traditional notions of sequential programming.

For as long as there have been computers, there have been parallel computers. Indeed, if we think back to the very earliest computers, the human ones made of flesh and blood, parallelism was the only way to speed up a computation. You were unlikely to make a single human computer much faster through training, but you could always pack more of them into a room. However, with the rise of electronic computers, and in particular the speed with which sequential performance improved, parallelism dwindled into the background. Certainly, you could have two computers cooperate on solving some problem, but the programming difficulties involved made it more palatable to simply wait for a faster computer to enter the market. Only the largest computational problems were worth the pain of parallel programming.2

2 Concurrency was alive and well, however, due to its importance in multiuser operating systems and multitasking in general. But most concurrent programs were executed on sequential machines, with the illusion of concurrency formed through multiplexing. Many of the techniques developed for concurrent programming are also applicable to parallel programming, although concurrency tends to implicitly focus on correctness over performance on some specific machine.


Several interesting parallel computers were designed for high-performance computing (HPC) applications. One of the earliest and most influential was the Cray-1 from 1976, which was the first vector processor. About 80 were sold, which was a large number for a high-end supercomputer. A different design was the CM-1 from 1985, which was built from a large number of simple 1-bit microprocessors. The CM-1 proved difficult to program and was not a success in the market, but was an early example of a massively parallel machine [Hil89].

In the 90s, consumer demand for increasing visual fidelity in video games led to the rise of the graphics processing unit (GPU), for accelerating graphical operations that would be too slow if executed on the CPU. Initially, GPUs were special-purpose non-programmable processors that could only perform fixed graphical operations. Over time, the need for more flexibility in graphical effects led to the development of programmable pixel shaders, first seen in the NVIDIA GeForce 3 in 2000. Roughly, pixel shaders allowed an effect to be applied to every pixel of an image; for example, looking at every pixel and adding a reflection effect to those that represent water. Graphical operations such as these tend to be inherently parallel, and as a result, GPU hardware evolved to support efficient execution of programs with very simple control flow, but a massive amount of fine-grained parallelism. Eventually, GPU manufacturers started providing APIs that allowed programmers to exploit the parallel computational power of the now-mainstream GPUs even for non-graphics workloads, so-called general-purpose GPU programming (GPGPU). The most popular such APIs are CUDA [17] from NVIDIA, and OpenCL [SGS10].

GPUs are not the only massively parallel machines in use. However, their great success, their availability, the difficulty in programming them, and their potential compute power make them excellent objects of study for researchers of parallel programming models. In this work, I focus on the details of GPUs over other parallel machines. However, a central thesis of the work is that modern hardware is too complicated and too diverse for low-level programming by hand to be viable. I will introduce a high-level hardware-agnostic programming model, and describe its efficient mapping to GPUs. The implication is that if the model can be mapped efficiently to a platform as restricted as GPUs, it can probably also be mapped to other parallel platforms, such as multicore CPUs, clusters, or maybe even FPGAs.

2.1.1 Basic Properties of GPUs

The performance characteristics and programming model of modern GPUs are covered in greater detail in Chapter 4, but a basic introduction is given here, in order to give an idea of the difficulties involved in retrofitting sequential languages for GPU execution.

GPUs derive their performance from an execution model called single instruction multiple thread (SIMT), which is very similar to the single instruction multiple data (SIMD) model. Roughly, threads are not fully independent, but grouped into bundles that all execute the same operations on different parts of a large data set.

For example, on an NVIDIA GPU, threads are bundled into warps of 32 threads each, which execute in lockstep and form the unit of scheduling. This execution model is highly suited for dense and regular computations, such as the ones found in linear algebra, but less so for irregular and branch-heavy computations such as graph algorithms.

A GPU is still subject to the laws of physics, and there is therefore a significant latency between issuing a memory read and actually receiving the requested data (the memory wall). On a CPU, a hierarchy of caches is used to decrease the latency, but a GPU uses aggressive simultaneous multithreading, where a thread that is waiting for a memory operation to complete is de-scheduled and another thread is run in its place. This scheduling is done entirely in hardware, and thus does not carry the usual overhead of context switches. The scheduling is not done on each thread in isolation, but on entire 32-thread warps. Thus, while a GPU may only have enough computational units (such as ALUs) to execute a few thousand parallel instructions per clock cycle, tens of thousands of threads may be necessary to avoid the computational units sitting idle while waiting for memory operations to finish. As a result of this design, memory can also be optimised for very high bandwidth at the expense of latency, and it is not unusual for GPU memory buses to support bandwidth in excess of 300GiB/s (although as we shall discuss in Chapter 4, specific access patterns must be followed to reach this performance).

GPUs have many limitations that hinder traditional programming techniques and languages:

• GPUs function as co-processors, where code is explicitly uploaded and invoked. A GPU program is typically called a GPU kernel, or just kernel.

• The number of threads necessary implies that each thread can use only relatively few registers and little memory (including stack space).

• The lockstep execution of warps, as well as the very small per-thread stack size, prevents function pointers and recursion.

• GPUs cannot directly access CPU memory, and so data must manually be copied to and from the GPU.3

• The memory hierarchy is explicit. While a small amount of cache memory is often present, obtaining peak performance depends on judicious use of programmer-managed on-chip memory.

• Memory allocation is typically not possible while executing a GPU program. All necessary memory must be allocated in advance, before GPU execution starts.

• Generally no support for interrupts or signals.

3 This is changing with the newer generation of GPUs.


The mismatch between the abstract machine model of C and GPU architectures is clear. While current GPU programming APIs do in fact use a restricted subset of C (or C++ in the case of CUDA) for expressing the per-thread code, it is the programmer’s responsibility to orchestrate their execution. This involves organising communication between the tens of thousands of threads that are needed to saturate the hardware, with both correctness and performance easily suffering under even small mistakes; a tall order, and the compiler offers little help. In practice, this renders GPU programming the domain of experts, and off-limits to most programmers, except through the use of restricted libraries offering pre-defined primitives.

2.2 A Parallel Programming Language

We need a better programming model; one that does not have as fundamental a mismatch between its execution model and parallel hardware. This does not mean we need a low-level language, or a specialised GPU language. Rather, we should increase the level of abstraction, and stop overspecifying details like iteration order and sequencing, unless we really need something to execute in some specific order. In a C program, everything is required to execute in exactly the order written by the programmer, and the C compiler must perform significant work to determine which of the sequencing restrictions are essential, and which are accidental. In a (pure) functional language, only data dependencies affect ordering. Furthermore, the language should be safe for parallel execution by construction, without the risk of race conditions or deadlocks.

In the following, I will introduce the Futhark programming language, a purely functional ML-like array language that has been developed as part of my PhD work4. Futhark is statically typed and eagerly evaluated. The full Futhark language supports both parametric polymorphism and an advanced higher-order module system inspired by Standard ML [MTM97], but for this thesis we shall restrict ourselves to the simple monomorphic subset. Futhark is not a general-purpose programming language. Instead, Futhark is intended for relatively small and simple high-performance programs, which can form the computational core of a larger program written in a conventional language. This perspective is elaborated in Section 2.6. This thesis is not intended as a full guide to Futhark programming; the Futhark reference manual5 serves that purpose well enough. Therefore, we explain the syntax and semantics of Futhark on a case-by-case basis by example. We shall, however, be more precise when we describe the core language used by the compiler.

A reasonable high-level programming model for parallel computers is one based on bulk operations, where we program in terms of transformations on entire collections of data. Suppose we wish to increment every element in a vector by 2.

4 Futhark has its roots in an earlier language, L0, which I helped develop during my master’s studies.
5 https://futhark.readthedocs.io

The functional programming tradition comes with an established vocabulary of such bulk operations, which we can use as inspiration. For this task, we use the map construct, which takes a function of type α → β and an array of values of type α, and produces an array of values of type β:

map (\x -> x + 2) xs

The above is a valid expression in the Futhark programming language. We use the notation (\x -> ...), taken from Haskell, to express an anonymous function with a parameter x. This expression does not specify how the computation is to be carried out, and it does not imply any accidental ordering constraints. Indeed, the only restriction is that this expression can only be evaluated after the expression that produces xs.

In this section, and those that follow, Futhark will be used to demonstrate the qualities of parallel programming models. Futhark is by no means the first parallel programming language, nor even the first parallel functional language. That title likely belongs to the venerable APL, which was first described in 1962 [Ive62]. Futhark is not even the first parallel language in a λ-calculus style, as it is predated by NESL [Ble96] by some twenty years. Indeed, as we shall see, Futhark is not even the most expressive such language. Futhark has more similarities with other parallel functional languages than differences. The main contributions of this thesis are the implementation techniques that have been developed for Futhark (its efficient mapping to GPU hardware), as well as those parts of its language design that enable said implementation. It seems likely that the implementation techniques could be applied to other functional languages of roughly the same design. For more on why I chose to construct a new programming language rather than use an existing design, see Section 2.5.

Let us return to the map expression:

map (\x -> x + 2) xs

The style of parallelism used by Futhark is termed explicit data parallelism. It is explicit because the user is required to use special constructs (here, map) to inform the compiler of where the parallelism is located, and it is data parallel because the same operation is applied to different pieces of data. In contrast, thread-based programming is based on task parallelism, where different threads perform different operations on different pieces of data (or a shared piece of data, if one enjoys debugging race conditions). Task parallelism does not necessarily imply low-level and racy code with manual synchronisation and message passing, but can be given safe structure through, for example, futures or fork/join patterns.

One important property of functional data parallelism is that the semantics are sequential: the program can be understood entirely as a serial sequence of expressions evaluating to values, with parallel execution (if any) not affecting the result in any way.

This decoupling of semantics and operations is typical of functional languages, and is key to enabling aggressive automatic transformation by a compiler. It is also easier for programmers to reason sequentially than in parallel. It is still important to the programmer that an operation such as map is operationally parallel, however, and this can be made precise through parallel cost models. One such cost model was described and subsequently proven implementable for the data-parallel language NESL [BG96]. While a formal cost model for Futhark is outside the scope of this thesis, the similarity of Futhark to NESL suggests that a similar approach would be viable. Instead, we use the intuitive notion that certain constructs (such as map) may be executed in parallel. However, it is not always efficient to translate all potential parallelism into realised parallelism on a concrete machine (see Section 2.2.2).
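For intuition, a language-level cost model in the style of NESL [BG96] might assign map the following work (W) and depth (D); this is a sketch of the general idea, not a formal model defined in this thesis:

W(map f xs) = O(n) + W(f xs[0]) + ... + W(f xs[n-1])
D(map f xs) = O(1) + max(D(f xs[0]), ..., D(f xs[n-1]))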

2.2.1 Choice of Parallel Combinators

A persistent question when designing a programming language is which constructs to build in, and which to derive from more primitive forms. In functional languages, we usually prefer to include just a few powerful foundational constructs, on which the rest of the language can be built. In principle, map can be used to express (almost) all other data-parallel constructs. For example, we can write a summation as follows, where we assume for simplicity that the size of the input array is a power of two:

let sum (xs: []i32): i32 =
  let ys' = loop ys = xs while length ys > 1 do
              let n = length ys / 2
              in map (\(x, y) -> x + y) (zip ys[0:n] ys[n:2*n])
  in ys'[0]

An explanation of the syntax is necessary. In Futhark, both functions and local bindings are defined using the let keyword. The sum function takes a single parameter, xs, of type []i32; we write the type of arrays containing elements of type t as []t. The loop construct is specialised syntax for expressing sequential loops. Here, ys is the variant parameter, which is initialised with the value xs, and receives a new value after every iteration of the loop body. The array slicing syntax ys[n:2*n] takes the elements starting at index n and up to (but excluding) index 2*n. The summation proceeds by repeatedly cutting the array in twain, adding each half to the other, until only a single element is left.

We can characterise the performance of sum with a work/depth parallel cost model [BG95]. If we suppose that map runs in work O(n) and depth O(1), the function sum runs in work O(n) and depth O(log n). This is asymptotically optimal for physically reasonable machine models. However, if executed straightforwardly on a GPU, this function will be far from reaching peak potential performance.

One reason is that it is very expensive to create the intermediate ys arrays, compared to the time taken to add the numbers. While a “sufficiently smart” compiler may be able to rewrite the program to avoid some of this overhead (and perhaps even take advantage of hardware-specific features for performing summations), this is contrary to the philosophy behind Futhark. Although we do not mind aggressive optimisation, the transformations we perform should arise naturally out of the constructs used by the programmer, and not depend on subtle and fragile analyses.

As a consequence, we provide several parallel constructs in Futhark that, while expressible in an asymptotically optimal form using map and sequential loops, can be mapped by the compiler to far superior low-level code. We still wish to keep the number of constructs small, because each requires significant effort to implement and integrate in the compiler, particularly with respect to optimisations such as fusion (Chapter 7). The main constructs we have chosen to include are the following, which closely resemble higher-order functions found in most functional languages:

• map : (α → β) → []α → []β
  map f xs ≡ [f xs[0], f xs[1], ..., f xs[n-1]]
  Applies the function f to every element of the array xs, producing a new array.

• reduce : (α → α → α) → α → []α → α
  reduce f v xs ≡ f (... (f (f v xs[0]) xs[1]) ...) xs[n-1]
  Reduces the array xs with the function f. The function must take two arguments and be associative, with v as the neutral element. If xs is empty, v is produced. This is similar to the fold function found in functional languages, but the extra requirements permit a parallel execution, which is not guaranteed for an arbitrary fold.

• scan : (α → α → α) → α → []α → []α
  scan f v xs ≡ [reduce f v xs[0:1], reduce f v xs[0:2], ..., reduce f v xs[0:n]]
  Computes an inclusive scan (sometimes called a generalised prefix sum) of the array xs. As with reduction, the function must be associative and have v as its neutral element. The resulting array has the same size as xs.

• filter : (α → bool) → []α → []α
  filter f xs ≡ [x | x <- xs, f x]
  Produces an array consisting of those elements in xs for which the function f returns true.
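To make the semantics above concrete, here are small usage examples together with their results (illustrative only, written with 32-bit integer literals):

map (\x -> x * 2) [1, 2, 3]          -- produces [2, 4, 6]
reduce (+) 0 [1, 2, 3]               -- produces 6
scan (+) 0 [1, 2, 3]                 -- produces [1, 3, 6]
filter (\x -> x % 2 == 0) [1, 2, 3]  -- produces [2]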

Together, constructs such as these are called second-order array combinators (or SOACs). Futhark does not presently permit the programmer to write their own higher-order functions, so new SOACs cannot be defined (this can be worked around via higher-order modules; see Section 2.5).

Futhark also contains two more exotic SOACs, stream_red and stream_map, which will be discussed in Section 2.5.2.

Requiring associativity and neutral elements for the functional arguments of reduce and scan is what enables a parallel implementation. For reductions, performance can be improved further if the operator is also commutative (see Section 4.5). For simple functions, the Futhark compiler can detect commutativity, but otherwise the programmer can use a special reduce_comm SOAC that behaves exactly like reduce, but carries the promise that the function is commutative. There is no such variant for scan, because scans do not benefit from commutative operators. In all cases, it is the programmer’s responsibility to ensure that the functions have the required properties; the compiler will not check them. Indeed, checking such semantic (as opposed to syntactic) properties is in general undecidable, as shown by Rice’s Theorem [Ric53]. If an invalid function is provided (such as subtraction, which is not associative), the program may produce nondeterministic results.

In some parallel languages or libraries, reductions and scans are restricted to a small collection of standard operators, such as addition or maximum. This makes the language safer, but we have found many examples of problems that can be solved efficiently with nonstandard reductions. For example, consider the problem of finding the index of the largest element in an array of floating-point numbers. In many parallel libraries, this is a primitive operation. In Futhark, we can express it as:

let index_of_max [n] (xs: [n]f32): i32 =
  let (_, i) =
    reduce_comm (\(x, xi) (y, yi) ->
                   if xi < 0 then (y, yi)
                   else if yi < 0 then (x, xi)
                   else if x < y then (y, yi)
                   else if y < x then (x, xi)
                   else if xi < yi then (y, yi)
                   else (x, xi))
                (0.0, -1)
                (zip xs [0...n-1])
  in i

First, another note on syntax. The size parameter [n] indicates that the function is polymorphic in some size n. This parameter is in scope as a variable of type i32 in the remaining parameters and body of the function. We use it to indicate that the array parameter xs has n elements, and to construct an array of the integers from 0 to n-1 with the expression [0...n-1]. We will return to size parameters in Section 2.5.

The index_of_max function works by pairing each element in xs with its index, then performing a reduction over the resulting array of pairs.

As a result, the two parameters taken by the reduction function are themselves pairs of values. We consider the neutral element to be any pair whose index is negative. To make the operator commutative, we use the indices as a tie-breaker in cases where multiple elements of xs have the same value (the element with the greater index is chosen). We use reduce_comm instead of plain reduce to inform the compiler that the operator is indeed commutative.
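To illustrate the unchecked associativity requirement (a deliberately broken sketch; do not write the second expression):

-- Fine: (+) is associative, and 0 is its neutral element.
reduce (+) 0 xs

-- Wrong, but accepted by the compiler: (-) is not associative, so
-- the result may depend on how the reduction is split across threads.
reduce (-) 0 xs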

2.2.2 Efficient Sequentialisation In the literature, a parallelising compiler is a compiler that takes as input a pro- gram written in some sequential language, typically C or Fortran, and attempts to automatically deduce (sometimes with the help of programmer-given annotations) which loops are parallel, and how best to exploit the available parallelism. A large variety of techniques exist, ranging from sophisticated static approaches based on loop index analysis [Pou+11], to speculative execution that assumes all loops are parallel, and dynamically falls back to sequential execution of the assumption fails at runtime [OM08]. The Futhark compiler is not such a parallelising compiler. In- stead, we assume that the programmer has already made all parallelism explicit via constructs such as map (and others we will cover). In this style of programming, it is likely that the program contains excess parallelism, that is, more parallelism than the machine needs. As almost all parallelism comes with a cost in terms of overhead, one of the main challenges of the Futhark compilers is to determine how much of this parallelism to actually take advantage of, and how much to turn into low-overhead sequential code via efficient sequentialisation. It is therefore more correct to say that the Futhark compiler is a sequentialising compiler. For example, let us consider the reduce construct, which is used for transforming an array of elements of type α into a single element of type α:

reduce (\x y -> x + y) 0 xs

We require that the functional argument is an associative function, and that the second argument is a neutral element for that function. The conventional way to illustrate a parallel reduction is via a tree, as in Figure 1. To reduce an n-element array [x_1, ..., x_n] using operator ⊕, we launch n/2 threads, with thread i computing x_{2i} ⊕ x_{2i+1}. The initial n elements are thus reduced to n/2 elements. The process is repeated until just a single value is left: the final result of the reduction. We perform O(log n) partial reductions, each of which is perfectly parallel, resulting in a depth of O(log n).

Tree reduction is optimal on an idealised perfectly parallel machine, but on real hardware, such as GPUs, it is inefficient. The inefficiency is caused by exposing more parallelism than is needed to fully exploit the hardware. The excess parallelism means we pay an unnecessary overhead due to communication cost between threads.



Figure 1: Summation as a tree reduction.


Figure 2: Summation as a chunked tree reduction.

Figure 3: The runtime of a summation implemented as either a tree reduction or a chunked tree reduction.

For a summation, the overhead of communicating intermediate results between processors significantly dominates the cost of a single addition. Efficient parallel execution relies on exposing as much parallelism as is needed to saturate the machine, but no more. The optimal amount of parallelism depends on the hardware and the exact form of the reduction, but suppose that k parallel threads are sufficient. Then, instead of spawning a number of threads dependent on the input size n, we always spawn k threads. Each thread sequentially reduces a chunk of the input consisting of n/k elements, producing one intermediate result per thread. We then launch a second reduction over all these intermediate results. This second reduction can also be done in parallel, or can be sequential if k is sufficiently small. This approach is shown in Figure 2. A rough performance comparison between sample implementations of the two approaches to reduction is shown in Figure 3. On a sequential machine, we can simply set k = 1, and not exploit any parallelism at all.

Efficient sequentialisation is particularly important (and also more difficult) when it comes to handling nested parallelism, as it enables locality-of-reference optimisations, as we shall see in Section 2.4. Efficient sequentialisation is not a single implementation technique, but a general implementation philosophy, which gives rise to various techniques and design choices. It is a principle that I shall often return to during this thesis, as it has proven critical for executing Futhark efficiently on real hardware. In essence, efficient sequentialisation is used to bridge the impedance mismatch between the “perfectly parallel” abstract machine assumed by Futhark, and real machines that all have only limited parallelism. As we shall see, using efficient sequentialisation to move from perfect parallelism to limited parallelism is much easier than the efforts parallelising compilers go through when converting sequential code to parallel code.
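Returning to the chunked reduction of Figure 2, the strategy can itself be expressed as a Futhark-level sketch (for illustration only; the compiler generates this pattern internally, and we assume for simplicity that k divides the input size):

let chunked_sum (k: i32) (xs: []i32): i32 =
  let m = length xs / k
  -- Each of the k conceptual threads sequentially reduces its own
  -- chunk of m elements.
  let partials = map (\t -> reduce (+) 0 xs[t*m : (t+1)*m]) [0...k-1]
  -- A second, small reduction combines the k partial results.
  in reduce (+) 0 partials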

2.3 Fusion for Modular Programming

One of the most important goals for most programming languages is to support modular and abstract programming. A modular program is composed of nominally independent subcomponents, which are composed to form a full program. The simplest feature that supports modular programming is perhaps the procedure. Almost

all programming languages support the subdivision of a program into procedures, although most languages also support higher-level abstractions. The most important property of a procedure is that it can be understood in terms of its specification, rather than its implementation. This simplifies reasoning by abstracting away irrelevant details. Futhark is a purely functional language, and its procedures are called functions. To support a programming style based on the writing of small, reusable components, it is important that there is little run-time overhead to the use of functions. Function inlining is a well-established technique for removing the overhead of function calls, although at the cost of an increase in code size. Another useful property of inlining is that it enables further optimisation: when an opaque function call is replaced with the function body, further simplification may be possible. While widespread techniques such as copy propagation, constant folding, and dead code removal remain useful in a data-parallel language such as Futhark, other, more sophisticated, transformations are also important. One such transformation is loop fusion, which removes intermediate results by combining several loops into one. Chapter 7 discusses in more detail the technique by which fusion is implemented in the Futhark compiler. The remainder of this section discusses the intuition and motivation behind loop fusion, as well as showing how fusion is significantly easier in the functional setting than for imperative languages. Let us consider two Futhark functions on arrays:

let arr_abs (xs: []i32) = map i32.abs xs

let arr_incr (xs: []i32) = map (+1) xs

The function arr_abs applies the function i32.abs (absolute value of a 32-bit integer) to every element of the input array. The function arr_incr increases every element of the input by 1. We use a shorthand for the functional argument: (+1) is equivalent to \x -> x + 1, similarly to the operator sections of Haskell. Consider now the following expression:

let ys = arr_abs xs in arr_incr ys

If we inline arr_abs and arr_incr we obtain:

let ys = map i32.abs xs in map (+1) ys

If we suppose a straightforward execution, the first map will read each element of xs from memory, compute its absolute value, then write the results back to memory as the array ys. The second map will then read back the elements of the ys array, perform the


(+1) operation, and place the result somewhere else in memory. If xs has n elements, the result is a total of 4n memory operations. Given that memory access is often the bottleneck in current computer systems, this is wasteful. Instead, we should read each element of the array xs, apply the combined function (\x -> i32.abs x + 1), then write the final result, for a total of 2n memory operations. We could write such a map manually, but we would lose modularity, as the program would no longer be structured as a composition of re-usable functions. For such simple functions as are used in this example, the loss is not great, but the issue remains for more complicated functions. The compiler employs producer-consumer loop fusion to combine the two map operations into one. The validity of fusion is in this case justified by the algebraic rule

map f ◦ map g = map (f ◦ g)

This permits the Futhark compiler to automatically combine the two maps and produce the following program:

map (\x -> let y = i32.abs x in y + 1) xs

Fusion is the core implementation technique that permits code to be written as a composition of simple parallel operations, without having to actually manifest most of the intermediate results. It is worth noting that the fusion algorithm used by the Futhark compiler respects the asymptotic behaviour of programs. Thus, a program that is fully fused will only be a constant factor faster than one that is not fused at all. This is in fact a feature, as it means the tractability of a program does not depend on a compiler optimisation. Chapter 7 goes into more detail. While the Futhark programming language does not correspond directly to any specific calculus, it is heavily based on the array combinator calculus discussed in Chapter 3. This calculus serves as inspiration and justification for rewrite rules that are exploited to transform user-written programs into forms that are more efficient. As we shall see, the usual array combinators (map, reduce, scan, etc.) are not adequate for fusion purposes. Using these, we cannot even perform the classic case of map-reduce fusion, as we would invariably break either the type rules for the reduce operator, or the requirement of associativity. In Section 7.1 we introduce a new set of internal combinators that are used to perform aggressive fusion.
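To make the associativity problem concrete, consider the following sketch (the squaring operator is purely illustrative):

  -- The composition
  --   reduce (+) 0 (map (\x -> x*x) xs)
  -- cannot naively be fused into
  --   reduce (\x y -> x + y*y) 0 xs
  -- because the operator \x y -> x + y*y is not associative: over [1,2,3],
  -- one association order gives (1 + 2*2) + 3*3 = 14, while the other
  -- gives 1 + (2 + 3*3)*(2 + 3*3) = 122, so the fused reduce would have
  -- unspecified behaviour.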

2.3.1 How Combinators Aid Fusion

While loop fusion is not an unknown technique in compilers for imperative languages, it is significantly more complicated to implement there. One major problem is that imperative languages do not have map as a fundamental construct. Hence, index analysis is first needed to determine that some loop in fact encodes a map operation, as the following imperative pseudocode demonstrates:


for i < n:
  ys[i] <- f(xs[i])

for i < n:
  zs[i] <- g(ys[i])

While index analysis can easily become undecidable, it is feasible for simple cases, such as this one. A bigger problem is that there is no guarantee that the loops can be executed in parallel. For example, the functions f and g may have arbitrary side effects, which means that they must be executed in order. Many functions written in an imperative language, even those that are externally pure, use side effects internally, for example for storage management or accumulator variables in loops. It can be difficult for a compiler to automatically determine that a function is indeed pure. A solution is to have the programmer manually add a purity annotation to the function, which is then trusted by the compiler. There are two problems with this technique: first, the programmer may be wrong, which may result in unpredictably wrong code, depending on how the optimiser exploits the information. Second, optimising compilers are notoriously bad at providing feedback about when insufficient information inhibits an optimisation, and how the programmer can rectify the problem. A performance-conscious programmer may end up liberally sprinkling purity annotations on most of their functions in the hope of helping the optimiser, thus exacerbating the first problem. Even if we can somehow determine f and g to be pure, the in-place assignment to ys may have an effect if xs and ys are aliases of each other (i.e., overlap in memory). Alias analysis is one of the great challenges for compiler optimisation in imperative languages [HG92]; indeed, aliasing guarantees are one of the performance benefits Fortran has over languages such as C. Modern languages tend to support annotations by which the programmer can indicate that some array has no aliases in scope.6 These annotations, while useful, have the same issues as the purity annotations discussed above. In a purely functional language, we avoid these issues by construction. This allows the compiler (and the compiler writer!) to focus on exploiting parallel properties, rather than proving them.

2.4 Nested Parallelism

In a parallel expression map f xs, the function f can in principle be anything. In particular, f can contain more parallelism. When one parallel construct can be nested inside of another, we call it nested parallelism. The need for nested parallelism arises naturally out of our desire to support modular programming. We should be able to map any function f, even if f is parallel

6 The restrict keyword in C99.

itself. Furthermore, the parallelism inside of f should also be utilised; it is not enough to exploit only the outermost level of parallelism, as that may not be enough to saturate the hardware. Unfortunately, it turns out that nested parallelism is difficult to implement efficiently in its full generality. The reason is that it is hard to map arbitrary nested parallelism to current parallel hardware, which efficiently supports only a fixed number of levels of parallelism (and typically with harsh restrictions on the size of each level beyond the first; see Chapter 4). This is an example of an impedance mismatch between free-form nested parallel programs and the real hardware we have available to us. Fortunately, there are ways to bridge the divide that follow straightforwardly from the construction of data-parallel functional programs in general, and Futhark programs in particular. Guy Blelloch's seminal work on NESL demonstrated how to handle arbitrary nested parallelism via full flattening [Ble+94]. The flattening algorithm transforms arbitrary nested data parallelism into flat data parallelism, which can be easily mapped onto most machines. The key technique is vectorisation, by which each function f is lifted to a vectorised version f̂ that applies to segments of some larger array. While flattening is useful for its universal applicability, it has three main problems:

1. All parallelism is exploited, even that which is expensive to exploit (perhaps hidden behind branches) and not necessary to take full advantage of the hardware.

2. The vectorisation transformation forces all sequential loops to the outermost level, thus preventing low-overhead sequential loops inside threads. This is particularly harmful for programs that are “almost flat”, such as a map whose function simply performs a sequential loop. Flattening would transform this into a sequential loop that contains a map, thus forcing an array of intermediate results to be written after every map. In contrast, the original loop may have been able to run using just registers.

3. The structure of the original program is heavily modified, destroying much information and rendering optimisations based on access pattern information (such as loop tiling) infeasible.

Work is still progressing on adapting and improving the flattening transformation. For example, [Kel+12] shows how to avoid vectorisation in places where it produces only overhead with little gain in parallelism, particularly addressing problem (2) above. Flattening remains the only technique to have demonstrated universal applicability, and is thus useful as a "last resort" for awkward programs that admit no other solution. However, many interesting programs exhibit only limited nested parallelism. Specifically, they exhibit only regular nested parallelism, which is significantly easier to map to hardware.


Nested parallelism is regular if the amount of parallelism in its inner parallel loops is invariant to its outer parallel loops, and otherwise irregular. For example, the following Futhark expression contains irregular parallelism:

map (\i -> reduce (+) 0 [1...i]) [1...n]

While this one does not:

map (\i -> reduce (+) 0 (map (+i) [1...n])) [1...n]

In the former program, the inner parallel reduce operates on an array containing i elements, where i is bound by the function of the map. In the latter case, the reduce operates on an array of size n, where n is bound outside the expression. The parameter i is still used, but it does not contribute to the size of any array, only to their values. While the Futhark language does support irregular nested parallelism, as demonstrated above, the current implementation is not able to exploit it. Should it prove necessary, the Futhark compiler could be modified to incorporate the flattening algorithm as well, but for this thesis, I have focused on developing implementation techniques that are more limited in scope, but produce faster code. The limitation to regular nested parallelism is not as onerous as it may seem. First, many interesting problems are naturally regular (see Chapter 10 for examples). Second, we can always manually apply the flattening algorithm to our program to the degree necessary to remove irregular nested parallelism. This may require manual inlining and thus breaking modularity. Third, many irregular programs can be modified in an ad-hoc fashion to become regular. For example, the irregular program shown above can be rewritten to

map (\i -> reduce (+) 0 (map (\x -> if x > i then 0 else x) [1...n])) [1...n]

But this method is not mechanical: a unique approach is required based on the algorithm in question, in contrast to flattening, which is general. The largest problem with the restriction to only regular parallelism is that it inhibits modularity. Consider a function that computes the sum of the first n positive integers for some n:

let sum_nats (n: i32): i32 = reduce (+) 0 [1...n]

The type of this function is merely i32 -> i32, and so we should be able to map it over an array of values of type i32:


map sum_nats [1...100]

However, this gives rise to irregular parallelism, because the parallelism of the reduce inside the definition of sum_nats depends on the integer argument to sum_nats. This is not a problem that can be detected merely by the type of sum_nats. It remains future work to investigate language mechanisms that allow programmers to reason about the regularity of parallelism in a modular fashion. The restriction to regular parallelism is an artifact of the current Futhark implementation, not of Futhark as a programming language. Therefore, if necessary, an implementation could be constructed that avoids this issue by using full flattening. However, the performance advantage of regular nested parallelism still motivates a language mechanism for modular reasoning. A sketch of what recovering regularity costs in this example is shown below.
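The following sketch (names hypothetical) shows one such ad-hoc regular version: every inner reduction runs over a fixed-size range, masking contributions beyond n to zero, at the cost of the bound becoming part of the function's type.

  let sum_nats_upto (bound: i32) (n: i32): i32 =
    reduce (+) 0 (map (\x -> if x > n then 0 else x) [1...bound])

  -- Regular counterpart of map sum_nats [1...100]:
  let sums = map (sum_nats_upto 100) [1...100]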

2.5 Why a New Language?

Most of the preceding discussion involves concepts and constructs that are common to many functional languages. Indeed, it is frequently claimed that pure functional programming makes parallel execution trivial. Why, then, do I propose an entirely new language, rather than applying my techniques to existing and reasonably popular languages such as Haskell, OCaml, or F#? What does Futhark have that these languages lack? The answer lies in the inverse question: what has been excluded from Futhark to permit efficient parallel execution? While all the mentioned languages contain the essentials for data-parallel programming—bulk operations with pure functions—they also contain features and promote programming styles that complicate the task of writing a compiler capable of generating high-performance code for restricted parallel machines, such as GPUs. This is not a task that needs further complication. The following discusses common problematic traits, and how Futhark circumvents them. We do not claim that these problems cannot be solved, merely that they present significant challenges that would have distracted from the core goal of our work, which is developing compilation techniques for programs written as compositions of bulk parallel operators.

Emphasis on recursion: Recursion is an inherently sequential process, and is accordingly of less general usefulness in a parallel language. However, even for those cases where we do desire sequential loops, free-form recursive functions prove challenging to implement efficiently on a GPU. On the GPU, threads do not possess a stack in the conventional sense. While stacks can be emulated (slowly), hardware limitations force a small, static size. For example, a GPU may need 100,000 parallel threads to be saturated. On a GPU with 8 GiB of memory (not insignificant for the current generation), each thread could at most have a stack size of 83 KiB—and that is if we use all


available memory just for stacks. Recursion inside code running on a GPU is thus a bad idea. A language could conceivably be defined where the only form of recursion permitted is tail recursion, which does not suffer from this problem. In the interest of expedience, Futhark entirely bans ordinary recursive functions, and provides special syntax (the loop construct) for the equivalent of tail recursion. In the future, this restriction could be loosened to support (mutually) tail-recursive functions. Another approach, which is in practice what is done by the flattening transformation, is to interchange the recursion outside of the parallel sections. This comes at a significant cost in both memory usage and control-flow overhead.
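As an illustration of the loop construct mentioned above, the following minimal sketch computes a factorial; the accumulator acc plays the role of the parameter of a tail-recursive function:

  let factorial (n: i32): i32 =
    loop (acc = 1) for i in [1...n] do acc * i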

Sum types and recursive types: Sum types by themselves are only problematic in that they necessarily imply control flow to handle the different cases. While control flow is not hard to map to a GPU, branching can be expensive. Nevertheless, the use of a sum type is an explicit choice made by the programmer, presumably with an understanding of the costs involved, and so sum types are likely to be supported by Futhark in the future. Recursive types (such as linked lists) usually imply pointer-chasing at runtime, which fits poorly with the very regular access patterns required to obtain peak GPU performance (see Section 4.3.1). Constructing complex data types usually requires allocation, which is also generally not possible on a GPU. While it has been shown that these problems can be solved in some cases, for example by region inference in the Harlan programming language [Hol+14], it is not clear that the performance of the resulting code is good. As Futhark is a language primarily focused on obtaining good performance, we have left out features that—while possible to map to GPUs—should probably not be used in a program we wish to be fast.

Lazy evaluation: Lazy evaluation is essentially shared state plus runtime effects—two of the most difficult things to map to GPUs. As allocation is not generally possible on the GPU, it is a serious problem that evaluation of any term could potentially require allocating an arbitrary amount of space. It is telling that data-parallel extensions to Haskell, such as Accelerate [McD+13] and Data Parallel Haskell [Cha+07], are both strict, even though Haskell itself heavily emphasises laziness.

Side effects: While all functional languages emphasise programming with pure func- tions, most languages in the ML family tend to permit essentially unrestricted side effects. This is traditionally not a problem when only “benign effects” are used. For example, a function could internally use mutable references for


performance reasons, such as to implement memoisation. As long as the effects are not visible to callers of the function, the programmer can still perform black-box reasoning as if the function were pure. However, a compiler does not have this freedom. Even if we are willing to trust some annotation that a function implemented with effects really behaves as an extensionally pure function, this does not extend to its definition. A compiler cannot treat a function as a black box, but may need to perform significant transformations on its definition to produce efficient code. As a result, Futhark permits no side effects, except for nontermination and other error cases. To address some of the performance issues associated with expressing efficient sequential algorithms in a pure functional language, Futhark supports uniqueness types, which are used to permit in-place updates of arrays under some circumstances. These are discussed in Section 2.5.1.

First-class functions: GPUs do not support function pointers (related to the absence of a stack), which immediately renders most conventional implementation techniques for first-class functions impractical. We can employ defunctionalisation [Rey72], where every function term in the program is identified with a unique integer, and a large branch statement is used to select the corresponding function at runtime. This is the approach taken by Harlan, but the heavy use of branching renders it inefficient on GPUs. In particular, this approach has the unfortunate consequence that whenever the programmer adds a function, all other function calls become slower (except those that can be statically resolved). While fully first-class functions are not viable, limited support for higher-order functions is possible. Whenever we can statically determine the form of the functional arguments to a function, we can generate a specialised version of the function at compile time, where the concrete functional argument is embedded. This is not yet implemented in Futhark, and so user-defined higher-order functions are not supported. It is, however, possible to use the higher-order module system to imitate higher-order functions, albeit at some cost in boilerplate. An example is shown on Figure 4. As the module system is outside the scope of this thesis, we shall not delve further into the merits and demerits of this approach.
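A minimal sketch of the defunctionalisation idea described above, with illustrative tags and function bodies: every function value is represented by an integer, and application becomes a branch over all known tags, so each added function grows the branch and slows every dynamically dispatched call.

  let apply (tag: i32) (x: i32): i32 =
    if      tag == 0 then x + 1    -- encodes \x -> x + 1
    else if tag == 1 then x * x    -- encodes \x -> x * x
    else                  x        -- hypothetical fallback case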

The primary motivation behind Futhark was to create a language whose efficient implementation is not hindered by complicated features. For the most part, Futhark resembles a least-common-denominator functional language. However, we have taken the opportunity to add various language features specialised for the task of high-performance parallel array programming. In particular, the fact that Futhark is not intended for constructing advanced user-defined data types, but instead expects


-- A parametric module (functor) taking as argument
-- a module, and producing a module.
module map2(fm: {
    type c_t -- "closure" type
    type x_t -- element type
    val f: c_t -> x_t -> x_t
  }) = {
  let g [n] (c: fm.c_t) (xs: [n]fm.x_t): [n]fm.x_t =
    map (\x -> fm.f c (fm.f c x)) xs
}

module increment_twice = map2 {
  type c_t = i32
  type x_t = i32
  let f (c: i32) (x: i32) = x + c
}

module scale_twice = map2 {
  type c_t = f64
  type x_t = i32
  let f (c: f64) (x: i32) = i32 (f64 x * c)
}

let foo = increment_twice.g 2 [1...5]
-- foo == [5i32, 6i32, 7i32, 8i32, 9i32]

let bar = scale_twice.g f64.pi [1...5]
-- bar == [9i32, 18i32, 28i32, 37i32, 47i32]

Figure 4: Using higher-order modules to imitate higher-order functions in Futhark. The map2 parametric module produces a module containing a function g, which applies a provided function twice. The parameter of type c_t is used to emulate a closure environment.

let dotprod [n] (xs: [n]i32) (ys: [n]i32): i32 =
  reduce (+) 0 (map (*) (zip xs ys))

(a) Dot product of integer vectors in Futhark.

let matmul [n][m][p] (xss: [n][m]i32) (yss: [m][p]i32): [n][p]i32 =
  map (\xs -> map (dotprod xs) (transpose yss)) xss

(b) Multiplication of integer matrices in Futhark.

Figure 5: Two examples of using size parameters to encode size constraints on Futhark functions. Presently the constraints are checked at run-time, not compile-time, and thus serve more as documentation than a safety mechanism.

programmers to express their programs in terms of arrays, allows us to add specialised constructs and notation to aid in array programming. We have already seen an example in the form of size parameters, which allow the programmer to express size constraints on function parameters. For example, the dot product function on Figure 5a requires that the two input arrays have the same length, while the matrix multiplication on Figure 5b specifies the usual size constraints of matrix multiplication. Apart from the usual parallel combinators, Futhark also supports two somewhat more exotic features, both of which are introduced for performance reasons. These are discussed below.

2.5.1 In-Place Updates

While Futhark is a data-parallel language, and expects programs to exhibit large amounts of parallelism, it also supports the implementation of limited imperative algorithms in an efficient manner. Specifically, Futhark supports in-place updates, which allow one to create an array that is semantically a copy of another array, but with new values at specified positions. This can be done in a purely functional manner by simply copying the array, but that would cause the asymptotic cost of the update to be proportional to the full size of the array, not just the part that we are updating. With an in-place update, we pay only for the part of the array that we are updating. This section describes how Futhark supports in-place updates without violating the functional purity that we depend on for safe parallel programming. Before further discussion, we must justify why in-place updates are needed. Our motivation is twofold. First, it is not uncommon for a parallel program to consist of a number of parallel loops surrounding inner sequential code. The simplest

expression of this pattern is applying a function f, which internally performs side effects but is externally pure, to every element of an array. It is just as important that the inner sequential code is efficient as it is that we execute the outer loop in parallel; both influence the final program performance. Our second piece of motivation is the notion of efficient sequentialisation. In Section 2.5.2 we shall see language constructs that permit the programmer to describe a computation comprising both an efficient sequential part, as well as a description of how to combine sequentially produced results in parallel. The utility of such language constructs hinges entirely on the ability to actually express said efficient sequential code. In-place updates are key to this ability. The main language construct that permits array updates is quite simple:

a with [i] <- v

This expression semantically produces an array that is identical to a, except with the element at position i replaced by v. The compiler then verifies that no alias of a (including a itself) is used on any execution path following the in-place update. If the compiler had to perform a full copy of a, the asymptotic cost would be O(n) (where n is the size of a). Using an in-place update, the cost is O(1) (assuming that v has constant size).
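For example, the following sketch (the function is illustrative; the asterisk in the type is the uniqueness annotation explained below) doubles a single element. Semantically the result is a copy of xs with position i doubled, but no copying actually takes place:

  let double_at [n] (xs: *[n]i32) (i: i32): *[n]i32 =
    xs with [i] <- xs[i] * 2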

Uniqueness Types

The safety of in-place updates is supported in Futhark through a type-system feature called uniqueness types. This system is similar to, but simpler than, the system found in Clean [BS93; BS96], where the primary motivation is modelling IO. Our use is reminiscent of the ownership types of Rust [Hoa13]. Alongside a relatively simple, conservative, and intra-procedural aliasing analysis in the type checker, this approach is sufficient to determine at compile time whether an in-place modification is safe, and to signal an error otherwise. This section describes uniqueness types in intuitive terms. A more precise formalisation on the Futhark core language can be found in Section 5.3. An important concept used in our uniqueness type system is aliasing. Each array-typed variable is associated with a set of other variables in scope, which it may alias. Operationally, aliasing models the possibility that two arrays are backed by the same memory location. A freshly created array (as by map or an array literal) has no aliases. Some language constructs, like array slicing, may produce arrays that are aliased with the source array. The result of an if expression aliases the union of the aliases of both branches. When an array is modified in place, all of its aliases are marked as unusable, and a compile-time error occurs if they are used afterwards. This prevents the in-place modification from being observable, except through its performance effects. The

larger problem is how to ensure safety in the interprocedural case, which is where uniqueness types enter the picture. We now introduce uniqueness types through the example below, which shows a function definition:

let modify [n] (a: *[n]int) (i: int) (x: [n]int): *[n]int =
  a with [i] <- (a[i] + x[i])

A call modify a i x returns a, but where a[i] has been increased by x[i]. In the parameter declaration (a: *[n]int), the asterisk (*) means that modify has been given "ownership" of the array a: the caller of modify will never reference array a after the call. As a consequence, modify can change the element at index i in place, without first copying the array; i.e., modify is free to do an in-place modification. Further, the result of modify is also unique: the * in the return type declares that the function result will not be aliased with any of the non-unique parameters (it may alias a, but not x). Finally, the call modify a i x is valid only if neither a nor any variable that aliases a is used on any execution path following the call to modify. We say that an array is consumed when it is the source of an in-place update or is passed as a unique parameter to a function call; for instance, a is consumed in the expression a with [i] <- x. Past the consumption point, neither a nor its aliases may be used again. From an implementation perspective, this contract is what allows type checking to rely on simple intra-procedural analysis, both in the callee and in the caller, as described in the following sections.
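A sketch of the caller-side rule, using the modify function from above (the surrounding function is hypothetical): b aliases a, and a is consumed by the call to modify, so the subsequent use of b is a compile-time error.

  let caller [n] (a: *[n]int) (i: int) (x: [n]int): int =
    let b = a              -- b aliases a
    let a' = modify a i x  -- consumes a (and thereby b)
    in b[0]                -- error: b used after a is consumed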

Parallel Scatter

The in-place update construct seen thus far is satisfactory for sequential programming. However, the system of uniqueness types also lets us define an efficient construct for parallel scatter. A scatter operation, whose nomenclature is taken from vector processing, takes an array a, an array of indices js, and an array of values vs, and writes each value to the position in the array given by the corresponding index. The operation can be explained in imperative pseudocode as follows:

for i in 0...n-1:
  a[js[i]] = vs[i]

In Futhark, the scatter construct is used as follows:

scatter a js vs

Semantically, scatter returns a new array. We treat the array a as consumed, which operationally permits the compiler to perform the operation in-place, thus

making the cost of scatter proportional to the size of vs, not a. The scatter construct is fully parallel, and therefore it is unspecified what happens if we try to write distinct values to the same index. Other parallel languages, for example Accelerate [McD+13], contain a variation of scatter that requires the specification of an associative and commutative combination function, which is used to combine duplicate writes. While this carries nontrivial extra run-time overhead, it makes the construct safer and more flexible, and will likely be added to Futhark in the future. The scatter construct is primarily used as an "escape hatch" for encoding irregular problems that are not a good fit for any of the conventional SOACs. An example is discussed in Section 10.2. In vector programming, scatter is often paired with a matching "gather" construct. There is no need for this in Futhark, as a gather is easily expressible as a map over an index space, as follows:

map (\i -> a[i]) is
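As an example of scatter in use, the following sketch (the function is illustrative) reverses an array. The indices are all distinct, so the unspecified behaviour for duplicate writes does not arise, and copy provides a fresh destination that scatter may consume:

  let reverse [n] (a: [n]i32): [n]i32 =
    scatter (copy a) (map (\i -> n - 1 - i) [0...n-1]) a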

2.5.2 Streaming SOACs

Most of the array combinators supported by Futhark are familiar to users of existing functional languages. However, we found it useful to add two novel SOACs that directly address our focus on efficient sequentialisation. This section introduces the streaming SOACs, stream_red and stream_map, by demonstrating their application to k-means clustering and the generation of Sobol numbers. The idea behind the streaming SOACs is to allow the programmer to provide an efficient sequential algorithm which is applied in parallel to separate chunks of the input, with the per-chunk results combined via a programmer-provided function. This is useful when the hand-written sequential algorithm is more efficient than simply executing the parallel algorithm sequentially.

k-means Clustering

In multidimensional cluster analysis, one widely used algorithm is k-means clustering. The goal is to assign n points in a d-dimensional space to one of k clusters, with each point belonging to the cluster with the closest centre. The centre of a cluster is the mean of all points belonging to the cluster. Often, k is small (maybe 5), while d can be larger (27 in one of our data sets), and n usually much larger (hundreds of thousands). The algorithm is typically implemented by initially assigning points randomly to clusters, then recomputing cluster means and re-assigning points until a fixed point is reached. In this section, we explain how one part of the algorithm—the computation of cluster centres—can be efficiently expressed using stream_red, and its performance advantage over a traditional map-reduce formulation. The problem


1  let add_centroids [d] (x: [d]f32, y: [d]f32)
2      : *[d]f32 =
3    map (+) (zip x y)
4
5  let cluster_sums_seq [k] [n] [d]
6      (counts: [k]i32)
7      (points: [n]([d]f32,i32))
8      : [k][d]f32 =
9    loop (acc = replicate k (replicate d 0.0))
10     for (p,c) in points do
11       let p' = map (/f32(counts[c])) p
12       in acc with [c] <- add_centroids acc[c] p'

Figure 6: Sequential calculation of means.

is taken from the k-means benchmark from the Rodinia benchmark suite [Che+09], which is discussed further in Chapter 10. As a first approximation—and also as further introduction to Futhark—we will discuss a (mostly) sequential solution to the problem. This will also be a useful building block in our final solution. The code is shown on Figure 6. On lines 1–3 we define a function for adding d-dimensional points, which we represent as vectors of floating-point values. Element-wise addition is done by passing both vectors to map with the addition function as the operator. We mark the return value as unique, meaning that it does not alias either of the input parameters. The need for this will be apparent shortly. On lines 9–12, we define the main loop. This loop proceeds by maintaining an accumulator acc that contains the cluster centres as they have been computed thus far. The initial value is constructed via the replicate array constructor, whose first argument is the number of times to replicate the second argument. In this case, two nested replicates are used to construct a two-dimensional array. We then iterate across all points, and for each point we add its contribution to the corresponding cluster. Line 12, which updates acc, is particularly interesting, as it performs an in-place update. Semantically, we construct a new array that is a copy of acc, except that the value at the given index has been replaced by the result of the call to add_centroids, and bind this array to a new variable named acc (shadowing the old). However, we are guaranteed that no copying takes place, and that the cost is proportional only to the size of the value we are writing (an array of size d), rather than the array we are updating (of size k × d). It is important that we know that the result of add_centroids cannot alias its arguments, acc[c] or p', and this is

exactly the property that is ensured by marking the return value of add_centroids as unique. If the return value of add_centroids did alias the acc array, then the in-place update into acc might not be safe, as we would be simultaneously reading and writing from the same array. The cluster_sums_seq function forms the sequential implementation of cluster summation. While a small amount of map-parallelism is present in the function add_centroids, the main bulk of the work is the outer loop of n iterations. Even though the implementation performs O(n · d) work, which is efficient, the sequential depth is O(n), which is not satisfactory. A fully parallel implementation is shown in Figure 7. The algorithm computes, for each point, an increment matrix, which is an array of type [k][d]f32 containing all zeroes, except at the row corresponding to the cluster for the point in question. This computation is done on lines 5–9. In-place updates are used only in a trivial form. The result is an array of matrices, which we then sum on lines 11–13 to get the final result. The rearrange construct is a generalisation of transposition, and produces a view of the given array where the dimensions have been reordered by the given permutation. For example, if b = rearrange (1,2,0) a, then b[i0,i1,i2] == a[i1,i2,i0]. In the program, the rearrange construct is used to bring the dimension we wish to reduce across innermost, which allows us to avoid a reduction where the operator operates on large arrays.7 This implementation exploits all degrees of parallelism in the problem. Unfortunately, it is not work-efficient. Constructing the increment matrices involves O(n · k · d) work, as does their eventual reduction, while the sequential version requires only O(n · d) work. If executed on a system capable of exploiting all available parallelism, this might be an acceptable tradeoff, but real hardware is limited in the amount of parallelism it can take advantage of. We need a language construct that can expose enough parallelism to take full advantage of the machine, but that will run efficient sequential code within each thread. The stream_red SOAC provides just this functionality. The stream_red construct builds on the property that any fold with an associative operator ⊙ can be rewritten as a fold over chunks of the input, followed by a fold over the per-chunk results:

fold ⊙ xs = fold ⊙ (map (fold ⊙) (chunk xs))

By selecting the number of chunks such that we obtain enough parallelism from the outer map, we can implement the innermost fold as a work-efficient sequential function. In Futhark, we let the programmer specify this chunk function directly.
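As a minimal illustration (the function name is hypothetical), summation can be written with stream_red as follows: the first argument, (+), combines the per-chunk results, and the second is the sequential chunk function.

  let sum_stream [n] (xs: [n]i32): i32 =
    stream_red (+)
               (\[chunk] (c: [chunk]i32): i32 ->
                  loop (acc = 0) for x in c do acc + x)
               xs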

7 Chapter 8 shows that the Futhark compiler can do this automatically, but here we do it by hand for clarity.


1  let cluster_sums_par [k] [n] [d]
2      (counts: [k]i32)
3      (points: [n]([d]f32,i32))
4      : *[k][d]f32 =
5    let increments: [n][k][d]f32 =
6      map (\(p, c) ->
7             let a = replicate k (replicate d 0.0)
8             in a with [c] <- map (/f32 counts[c]) p)
9          points
10   let increments': [k][d][n]f32 =
11     rearrange (1,2,0) increments
12   in map (\x -> map (\y -> reduce (+) 0.0 y) x)
13        increments'

Figure 7: Parallel calculation of means (types of increments and increments' annotated for clarity).

1  let cluster_sums_stream [k] [n] [d]
2      (counts: [k]i32)
3      (points: [n][d]f32)
4      (membership: [n]i32)
5      : [k][d]f32 =
6    stream_red
7      (\(x: [k][d]f32) (y: [k][d]f32) ->
8         map add_centroids (zip x y))
9      (cluster_sums_seq counts)
10     (zip points membership)

Figure 8: Chunked parallel calculation of means.

As shown in Figure 8, stream_red is given the associative reduction operator (lines 7–8), together with the chunk function (line 9). Because we wish to process two arrays—both points and membership—we use zip to combine the two arrays into one array of pairs (line 10). In this case, the reduction operator is matrix addition, and the chunk function applies the sequential cluster_sums_seq function (partially applied to counts) previously shown in Figure 6. It is the programmer’s responsibility to ensure that the provided reduction function is associative, and that the result of stream_red is the same no matter how the input is partitioned among chunks. The stream_red formulation can be automatically transformed into the fully


Program         Version                 Runtime    Speedup
Cluster means   Chunked (parallel)      17.6ms     ×7.60
                Fully parallel          134.1ms
                Chunked (sequential)    98.3ms     ×0.92
                Fully sequential        90.7ms
Sobol numbers   Chunked (parallel)      3.9ms      ×11.13
                Fully parallel          43.4ms
                Chunked (sequential)    129.7ms    ×1.00
                Fully sequential        129.1ms

Table 1: Speedup of chunking SOACs versus fully sequential and fully parallel implementations. For comparing sequential performance, a compiler generating single-threaded CPU code has been used, and the code runs on an Intel Xeon E6-2570. For comparing parallel performance, OpenCL code is generated and executed on an NVIDIA Tesla K40 GPU. We generate 30,000,000 Sobol numbers, and compute cluster means with k = 5, n = 10,000,000, and d = 3.

parallel implementation from Figure 7 by setting the chunk size to 1, or into the sequential implementation from Figure 6 by setting the chunk size to the full size of the input (followed by simplification). It thus allows the compiler or runtime system to generate code that best fits the problem and target machine. The compiler is free to exploit the nested parallelism inside both the reduction function and the fold function. In general, there is no one-size-fits-all, and different parallelisation patterns are best for different data sets. Handling this issue is future work (see Section 8.3); for now, the Futhark compiler uses various hardcoded heuristics to determine how parallelisation is best done (Chapter 8). For this program, the reduction function will be fully parallelised as a segmented reduction, and the body of the chunk function will be sequentialised. The performance of the stream_red version compared to explicitly parallel and sequential code is shown in Table 1. We see that the fully parallel version running on a GPU is in fact slower than the sequential version, because it does significantly more work. The stream_red version performs well both when compiled to parallel code, and when compiled to sequential code.

Sobol Sequences

This section introduces the stream_map SOAC. The goal is the same as previously: we wish to formulate our program in a way that gives the compiler freedom to exploit exactly as much parallelism as is profitable on the target hardware. The problem is as follows: we wish to generate n entries from a Sobol sequence [BF88], and compute their sum. Sobol sequences are quasi-random low-discrepancy sequences frequently used in Monte Carlo algorithms. While the summation is contrived, the need to efficiently generate Sobol sequences comes from


1  let gray_code (x: i32): i32 =
2    (x >> 1) ^ x
3
4  let test_bit (x: i32) (ind: i32): bool =
5    (x & (1 << ind)) == (1 << ind)
6
7  let sobol_ind [n] (dir_v: [n]i32) (x: i32): i32 =
8    let reldv_vals = map (\dv i ->
9                            if test_bit (gray_code x) i
10                           then dv else 0)
11                        dir_v [0...n-1]
12   in reduce (^) 0 reldv_vals

Figure 9: map-parallel calculation of Sobol numbers.

the OptionPricing benchmark presented in Section 10.1.3. Sobol numbers can be computed by a map-parallel formula, or by a cheaper recurrence formula, which however requires a scan [And+16; Oan+12]. This property can be expressed elegantly with stream_map, whose function accepts an input chunk, and must produce an output chunk of the same size. Chunks are processed in parallel. For this program, the chunk function applies the map-parallel formula once, then applies the scan formula. An implementation of the map-parallel formula is shown in Figure 9. The [x...y] range notation used here and in the following produces the array of the integers from x to y, inclusive.


1  let index_of_least_significant_0 (x: i32): i32 =
2    loop (i = 0) while i < 32 && ((x>>i)&1) != 0 do i + 1
3
4  let rec_m [n] (dir_v: [n]i32) (i: i32): i32 =
5    let bit = index_of_least_significant_0 i
6    in dir_v[bit]
7
8  let sobol_chunk [n]
9      (dir_v: [n]i32)
10     (x: i32)
11     (chunk: i32)
12     : [chunk]f32 =
13   let sob_beg = sobol_ind dir_v (x+1)
14   let contrbs = map (\i -> if i==0 then sob_beg
15                            else rec_m dir_v (i+x))
16                     [0...chunk-1]
17   let vct_ints = scan (^) 0 contrbs
18   in map (\y -> f32 y / (2.0 ** f32 n)) vct_ints
19
20 let sobol_stream [n] (k: i32) (dir_v: [n]i32): f32 =
21   let sobol_nums = stream_map
22     (\[chunk] (xs: [chunk]i32): [chunk]f32 ->
23        sobol_chunk dir_v xs[0] chunk)
24     [0...k-1]
25   in reduce (+) 0.0 sobol_nums

Figure 10: Chunked calculation of Sobol numbers.

The chunked formulation in Figure 10 gives the compiler freedom to decide how much parallelism to exploit. Note that the two formulae are algorithmically very different. There is little chance that a compiler could efficiently derive one from the other. The performance of the stream_map version compared to explicitly parallel and sequential code is shown in Table 1. The significant speedup of the chunked over the fully parallel implementation is due to the inner parallelism being efficiently sequentialised through the "stream_seq fusion" technique discussed in Section 7.3.2. This changes the per-chunk memory footprint from O(chunksize) to O(1), effectively transforming the problem from memory-bound to compute-bound.



Figure 11: The run-time structure of a program using OpenCL. The "host" is typically a CPU, and is entirely in control of the "device", which might for example be a GPU, but could be anything (even the same CPU as the host).

2.6 Interoperability

Futhark is not a general-purpose programming language. As a purely functional language, Futhark cannot interact with the outside world; as an array-oriented language, complicated data structures are awkward or impossible to express. Futhark is at its core designed around trading flexibility for the ability to automatically obtain good runtime performance. As a result, writing full applications in Futhark is impossible. While the Futhark compiler can compile a Futhark program into an executable, the resulting program simply executes the Futhark main function, while reading input data from standard input and writing the result on standard output. This is useful for testing, but not particularly useful for an application programmer. Futhark is intended to be used only for the computationally intensive parts of a larger application, with the main part of the application written in a general-purpose language. It is therefore important that Futhark code can be easily integrated with code written in other languages. Fortunately, the design of OpenCL [SGS10], the API used by the Futhark compiler to communicate with GPUs and other accelerators, fits this use case well. The overall structure of OpenCL is shown on Figure 11. The CPU, which in OpenCL nomenclature is called the host, is in control. The host communicates with one or more devices, which act as coprocessors. The device can for example be a GPU, but any kind of computational resource can be presented to OpenCL via the device abstraction. The host issues commands to the devices, which may involve uploading code (kernels) or data, initiating execution of uploaded code, or retrieving data stored on the device in response to executions. The code uploaded to the device must be written in the OpenCL C kernel language, a severely restricted variant of C that lacks support for features such as function pointers, recursion, or dynamic allocation. The key property that we exploit for interoperability purposes is that the code running on the host is required only for managing the device, and thus need not be particularly efficient, as the vast majority of the computation takes place inside OpenCL kernels. This allows us to specify the host-level code in ways that maximise interoperability, without worrying overmuch about performance losses.


(a) Mandelbrot (b) N-body (c) Fluid (d) Crystal

Figure 12: Visualisations of four benchmark programs implemented in Futhark, and called from Python.

As a concrete example, the Futhark compiler contains a code generator backend where the host code is generated in the Python programming language, and interacts with OpenCL through the PyOpenCL library [Klö+12]. The result of compiling a Futhark program is a Python module that defines a class, the methods of which correspond to Futhark functions. These methods accept and return Python values (which may wrap on-GPU data), and feel to the programmer like calling any other Python library. But behind the scenes, data and code are submitted to the GPU. This allows us to use Futhark for the heavy lifting, while making use of Python's wealth of libraries for such things as user interaction and data format serialisation. As an example, several of the benchmark programs we discuss in Chapter 10 have been furnished with interactive visualisations via the Pygame library. The Python program reads input events from the user, and the Futhark program creates screen images (in the form of two-dimensional arrays of integers), which are then blitted to the screen via Pygame. Examples of the resulting visualisations can be seen on Figure 12. While Python is a worthwhile target language due to its popularity for scientific computing, it is not realistic to construct code generators for all languages of interest. Even a short list might include Ruby, R, C#, Java, Swift, and more. While writing a host-code backend for the Futhark compiler requires only a few thousand lines of Haskell code, we do not wish to bear the burden of maintaining a dozen such backends. Rather, we should invest effort in improving the C backend, such that it can generate code that can be called via the foreign function interfaces (FFIs) of other languages. Due to its ubiquity, most languages make it easy to call C code. However, the Python backend already demonstrates clearly how Futhark can be useful in an application development context. An entirely different approach is to use Futhark as a target language for a compiler. In fact, this was the original purpose of Futhark, but it has receded into the background as the Futhark source language grew more pleasant to use. We have, however, cursorily investigated this approach by using Futhark as the target language for an APL compiler [Hen+16], where the generated code performed comparably with hand-written Futhark.
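As a sketch of what the Futhark side of such an application might look like (the entry point and its image formula are purely illustrative), a function like the following could be compiled to a Python class method returning a two-dimensional array for blitting:

  -- Produces a w×h image as an array of integers; host-side Python
  -- code would call this and hand the result to Pygame.
  let main (w: i32) (h: i32) (t: i32): [h][w]i32 =
    map (\y -> map (\x -> (x ^ y) * t) [0...w-1]) [0...h-1]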


2.7 Related Work on Automatic Parallelisation

This section discusses three simplified Fortran 77 examples from the DYFESM and BDNA benchmarks of the PERFECT club suite [Ber+88], whose automatic parallelisation requires complex compiler analysis [OR12; OR15; OR11]—for example based on (i) inter-procedural summarization of array references, (ii) extraction of runtime (sufficient) conditions for safe parallelization, and (iii) special treatment of a class of conditional induction variables (CIV) that cannot be expressed as a closed-form formula in terms of loop indices. Such analyses are beyond the capability of common programmers and commercial compilers, and they would not be necessary if the application parallelism were expressed explicitly by means of bulk-parallel operators, as in a data-parallel language. The first example, presented in Section 2.7.1, semantically corresponds to a map applied to an irregular two-dimensional array, but the low-level implementation—which uses indirect arrays, unspecified preconditions, and array reshaping at call sites—complicates analysis. The second example, presented in Section 2.7.2, shows an example in which the same loop may encode two semantically different parallel operators: a map and a generalized reduction. Further difficulties arise from the compiler having to reverse-engineer user optimizations, such as mapping two semantically different arrays to the same memory block. The third example, presented in Section 2.7.3, shows a loop that semantically corresponds to the sequential composition of a filter and a map. In this case the analysis has to be extended to (i) accommodate conditional induction variables (CIV), and to (ii) reverse-engineer a composition of parallel operators that is semantically equivalent to the original sequential loop.

2.7.1 Example 1: Difficult-to-Parallelise Map

Figure 13 shows a simplified version of loop solvh_do20 from the DYFESM benchmark of the PERFECT club benchmark suite. The aim is to prove that the (outermost) loop solvh_do20 is parallel, in the sense that no dependencies exist between loop iterations—in essence the loop is semantically an irregular map, each iteration producing non-overlapping subsets of the elements of the XE and HE arrays. The reasoning necessary for proving loop solvh_do20 parallel is nontrivial [OR12], and in fact impossible using purely static techniques. The presence of exploitable parallelism depends on the statically unknown accesses made through the indirect arrays IA and IB, which semantically model irregular 2D arrays. The reasoning can proceed by viewing XE and HE as one-dimensional arrays, and by observing that XE is written in subroutine geteu on all indices belonging to the interval [1,16*NP] and is read in matmult on indices in [1,NS].


      SUBROUTINE solvh(HE,XE,IA,IB)
      DIMENSION HE(32, *), XE(*)
      READ(*,*) SYM, NS, NP, N
CCC   SOLVH_do20
      DO i = 1, N, 1
        DO k = 1, IA(i), 1
          id = IB(i) + k - 1
          CALL geteu (XE, SYM, NP)
          CALL matmult(HE(1,id),XE,NS)
          CALL solvhe (HE(1,id), NP)
        ENDDO
      ENDDO
      END

      SUBROUTINE geteu(XE,SYM,NP)
      DIMENSION XE(16,*)
      DO i = 1, NP, 1
        DO j = 1, 16, 1
          XE(j, i) = ...
        ENDDO
      ENDDO
      END

      SUBROUTINE matmult(HE,XE,NS)
      DIMENSION HE(*), XE(*)
      DO j = 1, NS, 1
        HE(j) = XE(j)
        XE(j) = ...
      ENDDO
      END

      SUBROUTINE solvhe(HE,NP)
      DIMENSION HE(8, *)
      DO j = 1, 3, 1
        DO i = 1, NP, 1
          HE(j, i) = HE(j, i) + ..
        ENDDO
      ENDDO
      END

Figure 13: Simplified loop SOLVH_DO20 from DYFESM.

Similarly, HE is written in matmult on all indices in the interval [τ+1,τ+NS], and read and written in solvhe on a subset of the indices in the interval [τ+1,τ+8*NP-5], where τ=32*(id-1) is the array offset of the parameter HE(1,id). Read-After-Write independence of the outermost loop can be established by showing that the per-iteration read sets of XE and HE are covered by their per-iteration write sets. This corresponds to solving the interval-inclusion equations

[1,NS] ⊆ [1,16*NP]   and   [τ+1,τ+8*NP-5] ⊆ [τ+1,τ+NS],

yielding the predicate NS ≤ 16*NP ∧ 8*NP-5 ≤ NS, which can be verified at runtime.


The rationale for the write-after-write independence of array HE is more complicated: the write set of an iteration i of loop solvh_do20, denoted WF_i, can be overestimated by the interval of indices

[32*(IB(i)-1), 32*(IB(i)+IA(i)-2)+NS-1]

The aim is to show that ∀i ≠ j, WF_i ∩ WF_j = ∅. A sufficient condition [OR11] that works in practice is derived by checking the monotonicity of these intervals, i.e., that the upper bound of WF_i is less than the lower bound of WF_{i+1}, for all i. This results in the predicate

∧_{i=1}^{N-1} ( NS ≤ 32*(IB(i+1)-IA(i)-IB(i)+1) ),

which verifies output independence under O(N) runtime complexity.

2.7.2 Example 2: Reduction or Map?

Imperative techniques typically identify generalized-reduction patterns on an array (or scalar) variable A by checking that all its uses inside the target loop are of the form A[x] = A[x] ⊕ ..., where x is an arbitrary expression and ⊕ is a (known) binary associative operator. The code below is an example of a generalized reduction, and can be parallelized by computing the changes to A locally on each processor and by merging (adding) them in parallel at the end of the loop.

      DO i = 1, N, 1
        A(B(i)) = A(B(i)) + C(i)
      ENDDO

However, if B is an injective mapping of indices, this treatment is unnecessary, because each iteration reads and writes distinct elements of A, and thus each processor can safely work directly on the shared array A. In Futhark terms, in the latter case the loop is actually a composition of the parallel operators map and scatter. Identifying the case when a generalized reduction is in fact a map requires summarization of the iteration-wise read-write sets of the target array A, denoted RW_i, and checking that they do not overlap. This can be stated as the following equality:

∪_{i=1}^{N} ( RW_i ∩ ∪_{k=1}^{i-1} (RW_k) ) = ∅

Finally, sufficient conditions for identifying the map case can be extracted from this equation, for example by sorting the RW_i sets (as intervals), and then checking non-overlap. Such analysis is known as runtime reduction (RRED) in the literature [OR12],

but it would not be necessary if the map-scatter parallelism were explicitly expressed in the first place. Another significant challenge to automatic parallelization are the memory optimizations performed by the programmer. The classical example is privatization: an analysis that determines whether it is safe to move the declaration of a variable from outside to inside the loop. Another example is when two semantically different arrays are combined into the same array, as illustrated in the example below.

      DO i = 1, N, 1
S1:     A(i) = C(i)*2
S2:     A(B(i)) = A(B(i)) + C(i)
      ENDDO

The statements S1 and S2 semantically build two different arrays: one which is constructed by a map operation, and another by a generalized-reduction operation. To disambiguate such cases, compiler analysis needs to support an extended reduction pattern (EXT-RRED), where the target array A is allowed to be (only) written outside reduction statements, as long as the write accesses are not preceded on any control-flow-graph path by a reduction statement. Note that instances of EXT-RRED have non-empty per-iteration write-first (WF_i) and read-write (RW_i) sets and an empty read-only set. Enabling parallel execution in this case requires proving:

• Read-After-Write independence: ∪_{i=1}^{N}(WFi) ∩ ∪_{i=1}^{N}(RWi) = ∅, and either

• Write-After-Write independence: ∪_{i=1}^{N}(WFi ∩ ∪_{k=1}^{i−1}(WFk)) = ∅, or

• privatisation by static-last value: ∪_{i=1}^{N}(WFi) ⊆ WFN.

Loops MXMULT_DO10 and FORMR_DO20, which cover almost 55% of DYFESM's sequential runtime, exhibit both patterns discussed in this section.

2.7.3 Example 3: Conditional Induction Variables

Another challenge for automatic parallelisation is the so-called "conditional induction variables" (CIVs): scalars that do not form a uniform recurrence. For example, their increment may not be constant across the iteration space of the analysed loop.


civ@1 = 0
DO i = 1, N, 1
  civ@2 = ϕ(civ@1, civ@4)
  IF C(i) .GT. 0 THEN
    DO j = 1, C(i), 1
      X(j+civ@2) = ...
    ENDDO
    civ@3 = C(i) + civ@2
  ENDIF
  civ@4 = ϕ(civ@3, civ@2)
ENDDO
civ@5 = ϕ(civ@4, civ@1)

The code above shows a simplified version of loop CORREC_do401 from the BDNA benchmark, as an example of a non-trivial loop that uses both CIV-based and affine subscripts. Variable civ is written directly in (gated) static single-assignment (SSA) notation. For example, the statement civ@2=ϕ(civ@1,civ@4) has the semantics that variable civ@2 takes either the value of civ@1 in the first iteration of the loop, or the value of civ@4 in all other iterations. The gist of the parallelisation technique [OR15] is to aggregate symbolically the CIV references on every control-flow path of the analysed loop, in terms of the CIV values at the entry and end of each iteration or loop. The analysis succeeds if (i) the symbolic-summary results are identical on all paths, and (ii) they can be aggregated across iterations in the interval domain. Parallelisation requires two main steps:

1. proving the absence of cross-iteration dependencies on array X;

2. computing in parallel the values of civ at the beginning of each iteration, i.e., the values of civ@2.

Step 1. The write accesses of the inner loop can be summarized by the interval

WF_{inner-loop} = [civ@2, civ@2+C(i)-1].

On the path on which condition C(i).GT.0 holds, we have civ@4 = civ@3 = civ@2+C(i), and the path summary can be rewritten as

Wi^{THEN} = [civ@2, civ@4-1].

The other path neither updates X nor increments civ, hence it can use the same symbolic summary:

Wi^{ELSE} = [civ@2, civ@4-1] = ∅

because on that path civ@2 = civ@4 > civ@4-1, and an interval whose lower bound is greater than its upper bound is considered empty. It follows that the summaries of the THEN and ELSE branches can be unified, the iteration summary being

Wi = [civ@2,civ@4-1].

Loop independence can now be proven by verifying the set equation

∪_{i=1}^{N}(Wi ∩ ∪_{k=1}^{i−1}(Wk)) = ∅,

which holds because ∪_{k=1}^{i−1}(Wk) can be symbolically computed to be [1, civ@2^{i} − 1] by using:

• the implicit invariant civ@4^{i−1} = civ@2^{i}, i.e., the civ value at the end of one iteration equals the civ value at the entry of the next iteration, and

• the monotonicity of the civ@2 values (because they are incremented by C(i) only when C(i) is positive).

Step 2. While the accesses to X have been disambiguated, civ still remains a source of cross-iteration dependences. The CIV values at the beginning of each iteration can be computed in parallel by the following technique: First, the slice that computes civ in an iteration is extracted, and it is checked that civ appears only in reduction statements within it. Second, civ@2 is initialized (in the slice) to the neutral element of the reduction operator. Third, the obtained slice is mapped across the original iteration space, and the result is subjected to an exclusive scan. The Futhark code below illustrates the result of this technique on our example8:

scan (+) 0 (map (\i -> if C[i] > 0 then C[i] else 0) [1...N])

We conclude by remarking that in addition to this "heroic" analysis, which is necessary for proving the absence of cross-iteration dependences for array X, whose accesses are CIV-based, the compiler also had to re-engineer the computation of the civ values from inherently sequential to parallel. We argue that, if parallel execution is desired, the code should have been written from the very beginning in terms of data-parallel operators rather than sequential loops.

8Except that scan in Futhark is inclusive, not exclusive.


2.8 Related Work on Data-Parallel Languages

Fortunately, not all work on parallelism involves Fortran 77. There is a rich body of literature on embedded array languages and libraries targeting GPUs. Imperative solutions include Copperhead [CGK11], Accelerator [TPO06], and deep-learning DSLs such as Theano [Ber+10] and Torch [CKF11]. Purely functional languages include Accelerate [McD+13], Obsidian [CSS12], and NOVA [Col+14]. These languages support neither arbitrary nested regular parallelism, nor explicit indexing and efficient sequential code inside their parallel constructs.

A number of dataflow languages aim at efficient GPU compilation. StreamIt supports a number of static optimizations on various hardware. For example, its GPU optimizations [Hor+11] include memory-layout selection (shared/global memory), resolving shared-memory bank conflicts, increasing the granularity of parallelism by vertical fusion, and utilizing unused registers by software prefetching and loop unrolling, while its multicore optimizations [GTA06] are aimed at finding the right mix of task, data, and pipeline parallelism.

Recent work has been done on ameliorating the downsides of full flattening. This includes data-only flattening [Ber+13] and streaming [MF16]. The streaming approach uses the type system to classify sequences into arrays and streams, where the latter do not support random access, but only structured operations such as map, reduce, and scan. This allows an incremental dataflow-oriented execution model that avoids the explosion in memory usage that often accompanies full flattening. Despite the similarity in naming, Futhark's stream_map and stream_red operators do not directly support this notion of streaming, as they do not ban random access into arbitrary arrays (including the array being "streamed") from within the chunk function.

Chapter 3

An Array Calculus

The theoretical foundation of Futhark is the list homomorphisms of Bird and Meertens [Bir89], realised in the form of purely functional parallel second-order array combinators (SOACs). Their rich equational theory for semantics-preserving transformations is employed in Futhark for fusion, streaming, and flattening of parallelism. This chapter discusses a simple array combinator calculus that is used to convey the intuition behind certain useful transformations on arrays and, more importantly, on functions that operate on arrays. In Section 4.5 we shall also see that the primitives defined here have an efficient implementation on the GPU. The calculus presented here is simpler than the full Futhark programming language, yet also more flexible. While the names of the combinators intentionally resemble those of Futhark (both the source language and the core language we will see later in the thesis), they are not exactly identical.

3.1 Array Combinator Calculus

The array combinator calculus, a modified lambda calculus, describes terms that operate on array values. The syntax is summarised on Figure 14. We make a distinction between ground values v, which cannot contain functionals, and terms e, which can. An array value is written as a comma-separated sequence of values enclosed in square brackets, as in [v1, v2, v3]. Arrays may contain scalar values, written as t (which are not important for the calculus), other arrays, or tuples of values.


α, β, τ ::= t | (τ1, ..., τn) | [n]τ
v ::= t | (v1, ..., vn) | [v1, ..., vn]
α̂, β̂, τ̂ ::= τ | α̂ → β̂ | Πn.τ̂
e ::= v | (e1, ..., en) | [e1, ..., en] | e1 e2 | λ(x1, ..., xn).e

Figure 14: Syntax of the Array Combinator Calculus.

A tuple is written as a comma-separated sequence of values enclosed in parentheses.

Array values are classified by array types. We write [n]τ to denote the type of arrays with n elements of type τ. This effectively bans irregular arrays, as there is no way to provide n, m such that the value [[1, 2], [3]] is classified by a type [n][m]i32. The type of a tuple is written as the types of its components separated by commas and enclosed in parentheses. Primitive (non-array) types are written as t. Similarly to other published array formalisms, such as MOA [HM93], we treat the curry/uncurry-isomorphic types [m]([n]τ) ≅ [m×n]τ as interchangeable. This isomorphic treatment is justified because both streaming and indexed access to either type can be efficiently implemented without explicitly applying the isomorphism and materializing (storing) the result first. This is different from the Futhark source language, where for example [1][n]τ and [n][1]τ are distinct types. Likewise, a singleton array [1]τ is distinct from a scalar of type τ.

Higher-order types τ̂ for classifying functions are a superset of value types. Specifically, we do not permit first-class functions in the calculus, although we do permit functional arguments in function types. Function types support both ordinary value abstraction α̂ → β̂, as well as size abstraction, written Πn.τ̂. This allows us to specify a function that can operate on an array of any size and returns an array of that same size, which could have the following type:

Πn.[n]τ → [n]τ

The size of an array returned by a function must be expressible in terms of the size of its input. This restriction could be lifted by supporting ∃-quantification in types, but we omit this for the array calculus. We treat the zip/unzip-isomorphic types [n](τ1, ..., τk) ≅ ([n]τ1, ..., [n]τk) as interchangeable in any context. This is justified as any array of tuples can be converted into a corresponding tuple of arrays, as shown on Figure 15. The function CV(v) rewrites a value v such that no arrays of tuples appear. Likewise, the function CT(τ) rewrites a type. Intuitively, the rewriting considers an array of tuples as an augmented array with an extra dimension, followed by transposing out that extra dimension. Actually constructing this augmented array would not be well-typed, as all elements of an array must have the same type, whereas tuples can contain elements of differing types.
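In the Futhark source language, this isomorphism is witnessed explicitly by the functions zip and unzip; a trivial round-trip illustrates the correspondence (a sketch in present-day source syntax):

-- zip builds a conceptual array of pairs; unzip recovers the pair of arrays.
-- Internally, both sides are represented as the same tuple of arrays.
let roundtrip [n] (xs: [n]i32) (ys: [n]f32): ([n]i32, [n]f32) =
  unzip (zip xs ys)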

3.2 Basic SOACs

We first describe the basic SOACs and later introduce the streaming combinators. The basic SOACs include (i) map, which constructs an array by applying its function argument to each element of the input array, (ii) reduce, which applies a binary associative operator ⊕ to all elements of the input, and (iii) scan, which computes all


CV ([(v(1,1),..., v(1,m)),..., (v(n,1),..., v(n,m))]) =

(CV ([v(1,1),..., v(n,1)]),..., CV ([v(1,m),..., v(n,m)]))

CT([k](τ1, . . ., τn)) =

(CT([k]τ1),..., CT([k]τn))

Figure 15: Converting arrays of tuples to tuples of arrays. The function CV transforms values, and CT transforms types. The equalities are applied recursively until a fixed point is reached. We claim the property that if v : τ, then CV(v) : CT(τ).

prefix sums of the input array elements. Their types and semantics are shown below:

map : (α → β) → Πn.[n]α → [n]β
map f [a1, ..., an] = [f a1, ..., f an]

reduce : (α → α → α) → α → Πn.[n]α → α
reduce ⊕ 0⊕ [a1, ..., an] = 0⊕ ⊕ a1 ⊕ ... ⊕ an

scan : (α → α → α) → α → Πn.[n]α → [n]α
scan ⊕ 0⊕ [a1, ..., an] = [a1, ..., a1 ⊕ ... ⊕ an]

We can view the Πn notation as indicating where the size n becomes fixed; it indicates, for instance, that we can partially apply map to a function and apply the resulting function to arrays of different sizes. The SOAC semantics enables powerful rewrite rules. For example, mapping an array with a function g and then mapping the result with a function f gives the same result as mapping the original array with the composition of f and g:

(map f ) ◦ (map g) ≡ map ( f ◦ g)

Applied from left to right and from right to left, this rule corresponds to producer-consumer (vertical) fusion and fission, respectively. Horizontal fusion/fission refers to the case when the two maps are independent (i.e., not in any producer-consumer relation), as in the equation below:

(map f x, map g y) ≡ map (λ(a, b).( f a, g b)) (x, y)
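On source-level Futhark programs, vertical fusion looks as follows (a sketch of our own; the compiler actually applies these rules on the core language):

-- Vertical fusion: the producer-consumer composition traverses xs twice;
-- the fused form traverses it once. Both compute the same result.
let unfused (xs: []i32): []i32 = map (\x -> x + 1) (map (\x -> x * 2) xs)
let fused   (xs: []i32): []i32 = map (\x -> x * 2 + 1) xs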

The rest of this chapter shows how map and reduce are special cases of a more general bulk-parallel operator named redomap, which (i) can represent (fused) compositions of map and reduce operators and, as such, (ii) can itself be decomposed into a map-reduce composition. Similarly, we introduce the parallel operator sFold, which generalizes Futhark's streaming operators.


3.2.1 Notation

We denote array concatenation by # and the empty array by ϵ; inj(a) is the single-element array containing a. A partitioning of an array v is a sequence of arrays v1, ..., vk such that v1 # ... # vk = v. Given binary operations f and g, their product f ∗ g is defined by component-wise application, i.e., (f ∗ g)(x) = (f x, g x).

3.2.2 The Parallel Operator redomap

Many array operations are monoid homomorphisms, which conceptually allows for splitting an array into two parts, applying the operation recursively, and combining the results using an associative operation ⊕. For example, a map-reduce composition can be formally transformed, via the list homomorphism promotion lemma [Bir87], to an equivalent form:

reduce ⊕ 0⊕ ◦ map f ≡ reduce ⊕ 0⊕ ◦ map (reduce ⊕ 0⊕ ◦ map f) ◦ split_p

where the original array is partitioned into p chunks. Operationally, we could imagine that p processors will each perform the map-reduce composition sequentially, followed by a parallel combination of the per-processor results. Hence, the inner map-reduce can be written as a left fold:

reduce ⊕ 0⊕ ◦ map f ≡ reduce ⊕ 0⊕ ◦ map (foldl g 0⊕) ◦ split_p

for an appropriately defined function g. Thus, every monoid homomorphism is uniquely determined by (⊕, 0⊕) and a function g. The combinator redomap1 thus expresses all such homomorphisms:

redomap : (α → α → α, α) → (β → α) → Πn.([n]β → α)
redomap (⊕, 0⊕) g [b1, ..., bn] = 0⊕ ⊕ (g b1) ⊕ ... ⊕ (g bn)

The redomap combinator can decompose the previously seen SOACs:

map g = redomap (#, ϵ) (inj ◦ g)
reduce (⊕, 0⊕) = redomap (⊕, 0⊕) id

and redomap can itself be decomposed by the equation

redomap (⊕, 0⊕) g = reduce (⊕, 0⊕) ◦ map g.
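As a concrete instance, a sum of squares is a reduce-of-map composition that the compiler fuses into a single redomap traversal; in source Futhark (the combinator itself is internal and never written by hand):

-- Fusion rewrites this into one redomap traversal; in calculus terms,
-- sumsq = redomap ((+), 0) (\x -> x * x).
let sumsq (xs: []i32): i32 =
  reduce (+) 0 (map (\x -> x * x) xs)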

3.2.3 Streaming Combinators

A key aspect of Futhark is partitioned implementations of redomap, which split vectors into chunks, apply the operation to each chunk individually, and eventually combine the per-chunk results:

sFold : (α → α → α) → (Πm.([m]β → α)) → Πn.[n]β → α
sFold (⊕) f (v1 # ... # vk) = (f ϵ) ⊕ (f v1) ⊕ ... ⊕ (f vk)

1The name is a portmanteau of reduce ◦ map.


Because a vector can have multiple partitionings, sFold is well-defined (it gives the same result for all partitionings) if and only if f is equivalent to a redomap with ⊕ as the combining operator. Futhark assumes such properties to hold; they are not checked at run-time, but are a programmer responsibility. The streaming combinators permit Futhark to choose freely any suitable partitioning of the input vector. In the source language, Futhark uses specialized versions of sFold:

stream_map f = sFold (#) f
stream_red (⊕) f = sFold ((⊕) ∗ (#)) f

Fusion and fission transformations are based on the universal properties of redomap (and sFold); for example, horizontal (parallel) fusion is expressed by the “banana split theorem” [MFP91], read as a transformation from right to left:

redomap ((⊕, 0⊕) ∗ (⊗, 0⊗)) ( f, g) = (redomap (⊕, 0⊕) f, redomap (⊗, 0⊗) g)
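For example, computing the sum and the maximum of an array in a single traversal is an instance of this theorem; a sketch in source Futhark (i32.lowest, the assumed standard-library name for the smallest i32, is the neutral element of max):

-- One traversal computes both results; the operator and neutral element
-- are the products of the two original ones, as in the banana split theorem.
let sum_and_max (xs: []i32): (i32, i32) =
  let max a b = if a > b then a else b
  in reduce (\(s1, m1) (s2, m2) -> (s1 + s2, max m1 m2))
            (0, i32.lowest)
            (map (\x -> (x, x)) xs)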

The map-map rule map (g ◦ f) = map g ◦ map f is the functorial property of arrays; it is used for fusion from right to left and eventually, as a fission rule, from left to right as part of flattening nested parallelism (see Chapter 8). The flattening rule map (map f) ≅ map f eliminates nested parallel maps by mapping the argument function over the product index space (an isomorphism modulo the curry/uncurry isomorphism of types). Finally, sequential (de)composition of the discussed SOACs can similarly be expressed in terms of the general iterative operator sFold; for example:

map f = sFold (#) (map f)
reduce ⊕ 0⊕ = sFold (⊕) (reduce ⊕ 0⊕)

Chapter 4

Parallelism and Hardware Constraints

There exists a large diversity of parallel machines: far too many to list exhaustively, and far too different in capability and performance characteristics for one set of program transformations and optimisations to apply universally. In this thesis, we focus our discussion on one type of architecture: the modern Graphics Processing Unit (GPU), which can be used for non-graphics computation under the term General-Purpose GPU computing (GPGPU). Specifically, we focus on the Single-Instruction Multiple-Thread (SIMT) model used by NVIDIA, and also seen in recent chips from AMD and Intel.

This chapter presents a simplified abstract GPU machine model that we will use to justify the design of the Futhark compiler. In particular, a machine model is needed to argue for the optimising transformations that we perform, and the model defined here serves this purpose. As an example, Section 4.3.1 describes how matrix transposition is implemented efficiently on GPUs. This is crucial, since Futhark uses transpositions to optimise for spatial locality (Section 9.1). Section 4.5 shows how selected important primitive constructs from the array calculus (Chapter 3) can be mapped to GPU code. This is of particular interest since Chapter 8 shows how nested parallelism is transformed into a form very similar to these primitives.

When we use the term "GPU", we will refer to the model defined herein. Our chosen terminology is primarily taken from OpenCL, an open standard for programming accelerator devices, of which GPUs are one example. Unfortunately, NVIDIA's API for GPGPU computing, CUDA, uses the same terms in some cases, but with different definitions. Section 4.4 describes how our GPU model maps to CUDA terms, and how our simplifications map to real hardware.


thread(in, out):
  gtid <- get_global_id()
  x <- in[gtid]
  y <- x + 2
  out[gtid] <- y

Figure 16: Each thread reads an integer, adds two, and writes the result back to some other memory location.

4.1 Kernels and the Thread Space

A GPU is an accelerator device attached to a conventional CPU-based computer, which is called the host system. A GPU program is called a kernel, and is launched by the host. A running kernel comprises some quantity of independent threads, each of which is executing the same sequential program. When a kernel is launched, the host system dictates how many threads are to be used, and how they are to be organised. These parameters can vary from one launch of the same kernel to the next. Furthermore, each thread may also be passed some parameters containing values or memory addresses, the same for every thread.

The threads are organised into equally sized workgroups (or just groups), within which communication between threads is possible. The total number of threads launched is thus always a multiple of the group size. The hardware limits both the maximum group size (typically 1024 or less) and the maximum number of groups, although the latter is rarely relevant in practice. For kernels in which each thread is fully independent (what we call map-parallelism), the group size is mostly irrelevant. As each thread runs a sequential program, the number of threads launched for a kernel is the sole way that we can express parallelism. Nested parallelism is not supported in this model.

Each thread in a running kernel can be uniquely identified by its global thread ID (gtid). Furthermore, each thread belongs to a group, identified by a group ID (gid) that is shared by all threads in the group. Within a group, each thread has a local thread ID (ltid) that is unique only within the workgroup. If the group size is gsize, then

gtid = gid · gsize + ltid

holds for every thread. The thread index space is single-dimensional, and each ID is a single integer. An example in imperative pseudocode is shown on Figure 16, which demonstrates how threads use their identifiers to read input and write output. The in and out parameters to the thread are addresses of arrays in memory.


thread(arr):
  gtid <- get_global_id()
  x <- arr[gtid]
  if (x < 0):
    x <- -x
  arr[gtid] <- x

Figure 17: A kernel that computes the absolute value in-place on the elements of an array.

4.2 The GPU Model

When we launch a kernel on the GPU, we specify a group size and the desired number of groups. This does not mean that every thread is actually running concurrently. The GPU hardware has some quantity of resources available, for this discussion most notably registers and local memory space. A group requires some kernel-dependent amount of registers for each of its threads, as well as some (potentially zero) amount of local memory. The GPU may be able to fit anywhere from zero to all groups concurrently. Every thread in a group must be able to fit on the GPU simultaneously; fractional groups are not permitted. If not all groups fit at the same time, then as many as possible will be launched, with remaining groups being launched once earlier ones finish (the order in which groups are launched is undefined). This is another reason why communication between groups is difficult: unless we have detailed low-level information about both GPU and kernel, we cannot know whether the group we wish to communicate with has even been launched yet! In principle, the GPU may even be running groups from different kernel launches concurrently, perhaps even belonging to different users on the host system. Due to these difficulties, we generally take the view that communication is only possible within a workgroup.

GPUs execute neighbouring threads in lockstep. The threads within a group are divided into warps, the size of which is hardware-dependent. These warps form the basis of execution and scheduling. All threads in a warp execute the same instruction in the same clock cycle. Branches are handled via masking: all threads in a warp follow every conditional branch for which the condition is true for at least a single thread in the warp. For those threads where the condition is false, a flag is set that makes the GPU ignore the instructions being executed. This means that we cannot hide expensive but infrequent computation behind branches, the way we typically do on CPUs: if just one thread has to enter the expensive region, then we pay the cost for every thread in the warp. An example of a kernel that contains branches is shown on Figure 17. Even if the condition on line 4 is false for a given thread, it may still execute line 5. But if so, the mask bit will be set, and the write to x will be ignored.
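At the language level, this matters for per-element branching inside a map. A sketch in Futhark (our own example; the branch bodies are arbitrary stand-ins for expensive and cheap code):

-- When mapped over an array, warps whose elements disagree on the
-- condition execute both branches under masking, paying for the
-- expensive one regardless.
let f (x: f64): f64 =
  if x < 0 then f64.sqrt (-x) + f64.sin x  -- expensive path
  else x + 1                               -- cheap path

let branching (xs: []f64): []f64 = map f xs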


4.3 Memory Spaces

The GPU has several kinds of memory, each with its own properties and performance considerations. As the limiting factor in GPU performance is typically memory access times, proper use of the various kinds of memory is important. The following kinds of memory are used in our GPU model:

Registers, which are private to each thread. Registers are scarce but fast, and while they can be used to hold small arrays, they are typically used to hold scalar values. Overuse of registers can lead to fewer threads being able to run concurrently, or perhaps to the kernel being unable to run at all. The number of registers used by a thread is a function solely of its code, and cannot depend on kernel-launch parameters.

Local memory, which is fast but size-constrained on-chip memory, shared by all threads in a group. When a kernel is launched, some quantity of local memory can be requested per group (the same amount for each). The contents of a local memory buffer become invalidated once its associated group terminates, and hence local memory cannot be used for passing results back to the CPU, or for storing data from one kernel launch to the next.

Global memory, which is large off-chip memory, typically several GiB in size. Although much slower than registers or local memory, global memory is typically still much faster than CPU RAM. The contents of global memory are visible to all groups, and persist across kernel launches. Thus, global memory is the only way to pass initial data to a kernel, and for a kernel to return a result. Global memory can be read from or written to by the CPU, although at much lower speeds than the GPU itself is able to. Global memory does not have the many layers of caching that we are used to from the CPU. Instead, we must make sure to access global memory only with efficient access patterns (see Section 4.3.1). Global memory must be allocated by the host prior to kernel execution. Memory allocation is not possible from within a running kernel.

Threads communicate with each other by reading and writing values in memory (typically local memory). Since GPUs use a weakly consistent memory model, we must use barrier instructions to ensure that the memory location we wish to read from has already been written to by the responsible thread. When a thread executes a barrier instruction, it will wait until every other thread in the group has also reached the barrier instruction, at which point they will all continue execution. The fact that barriers have an effect only within a group is why we say that communication can only happen between threads in the same group.


4.3.1 Coalesced Memory Accesses

The connection from GPU to global memory is via a fast and wide bus that is able to fetch several words in a single transaction. As an example, many NVIDIA GPUs use a 512-bit bus that can fetch 16 32-bit words in one transaction, but only if the words are stored adjacently in memory and do not cross a cache line. We will ignore the cache-line concern in the following, as it is a secondary effect.

This memory-bus behaviour has important performance implications. Fetching 16 neighbouring words can be done in one memory transaction, whilst fetching 16 widely spaced words requires a transaction per word, thus utilising only a sixteenth of the available memory bandwidth. As the performance of many GPU programs depends on how quickly they can access memory (they are bandwidth-bound), making efficient use of the memory bus has high priority.

When a warp issues a memory operation such that adjacent threads access adjacent locations in memory, we say that we have a coalesced memory access. The term comes from the notion that several memory operations coalesce into one. On some GPUs, the addresses accessed by the threads in a warp must be ascending by one word per thread, but on newer GPUs, the access can be any permutation we wish, as long as every address falls within the same 512-bit memory block. On an NVIDIA GPU, in the best case, a 32-thread warp can store 32 32-bit words using two memory transactions.

The optimal access pattern is thus a bit counter-intuitive compared to what we are used to from CPU programming. On the GPU, a single thread should access global memory with a stride, something that is known to exhibit bad cache behaviour on a CPU. An example is shown on Figure 18. In both of the kernels shown, each thread sums n elements of some array. The difference lies in the access pattern. On Figure 18a, when thread j accesses the element at index j·n+i, thread j+1 will be accessing the element at index (j+1)·n+i. Unless n is small, these two addresses are far from each other, and will require a memory transaction each. In contrast, Figure 18b shows a program where thread j accesses index i·n+j while thread j+1 accesses index i·n+j+1. These are adjacent locations in memory, and can likely be fetched in just one memory operation. In practice, the latter program will run significantly faster; a factor 10 difference is not at all unlikely.

GPUs have little cache in the way we know from CPUs. While we can use local and private memory as a manually managed cache, most of our memory performance will come from rapidly reading and writing from global memory via coalesced memory operations.
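The Futhark-level analogue of Figure 18 is a map of sequential summations; whether the generated code behaves like 18a or 18b is a question of how the two-dimensional input is laid out in memory, which the compiler controls (Section 9.1). A sketch:

-- Each thread sums one row. With row-major storage, a direct translation
-- gives the uncoalesced pattern of Figure 18a; operating on the transposed
-- layout gives the coalesced pattern of Figure 18b.
let rowsums [n][m] (xss: [n][m]i32): [n]i32 =
  map (\row -> reduce (+) 0 row) xss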

A coalescing example: transposition

As an example of how coalescing can be resolved in a nontrivial case, this section will present how two-dimensional matrix transposition can be implemented in the GPU model. The basic idea is to launch a kernel that contains one thread for every element


thread(in, out, n):
  gtid <- get_global_id()
  sum <- 0
  for i < n:
    sum <- sum + in[gtid*n + i]
  out[gtid] <- sum

(a) Uncoalesced access.

thread(in, out, n):
  gtid <- get_global_id()
  sum <- 0
  for i < n:
    sum <- sum + in[i*n + gtid]
  out[gtid] <- sum

(b) Coalesced access.

Figure 18: Examples of programs with uncoalesced and coalesced memory accesses.

thread(in, out, n, m):
  gtid <- get_global_id()
  i <- gtid / n
  j <- gtid % m

  index_in <- i * n + j
  index_out <- j * m + i

  out[index_out] <- in[index_in]

Figure 19: Naive GPU implementation of matrix transposition.

in the input matrix. For simplicity, we assume that the total number of input elements is divisible by our desired workgroup size. A naive attempt can be seen on Figure 19. Because the GPU model does not have a notion of multi-dimensional arrays, we assume the nominally n × m matrix is represented as a flat array in row-major order. The write to the out array is fine, as it is coalesced between neighbouring threads. However, the read from in is problematic. Suppose n = m = 100. If we perform the


gtid  i  j  index_in  index_out
0     0  0  0         0
1     0  1  1         100
2     0  2  2         200
3     0  3  3         300
...
100   1  0  100       1
101   1  1  101       101
102   1  2  102       201

(a) Tabulation of computed indexes for the transposition kernel on Figure 19. We assume n = m = 100.

gtid  g_i  g_j  l_i  l_j  index_in  index_out
0     0    0    0    0    0         0
1     0    1    0    1    1         1
2     0    2    1    0    2         2
3     0    3    1    1    3         3
...
100   1    0    0    0    100       100
101   1    1    0    1    101       101
102   1    2    1    0    102       102

(b) Tabulation of computed indexes for the transposition kernel on Figure 20. We assume n = m = 100, and TILE_SIZE = 2 (this is an implausibly small number; 16 is more realistic).

Table 2: Tables of indices computed for the two transposition kernels.

index arithmetic by hand, we obtain the threads and indexes shown on Table 2a.

The solution is to perform a tiled transposition. Every workgroup will read a chunk of the input array into a local memory array, and then write back that chunk in transposed form. The trick is that the element that a given thread reads is not the same as the element that it writes. We could say that we perform the transposition in local memory, rather than in global memory. The approach is shown on Figure 20. We assume some constant TILE_SIZE, and that the workgroup size is exactly TILE_SIZE · TILE_SIZE. We further assume the presence of a local memory array block with room for one element for every thread in the workgroup. Recall that local memory arrays are local to the workgroup, and visible to all threads in the workgroup. We use a memory barrier between reading the input and writing the output. This is necessary to ensure that the element we read from block has been written by the responsible thread, as GPUs do not guarantee execution order outside of warps.


local block[TILE_SIZE*TILE_SIZE]

thread(in, out, n, m):
  gtid <- get_global_id()
  g_i <- gtid / n
  g_j <- gtid % m

  ltid <- get_local_id()
  l_i <- ltid / TILE_SIZE
  l_j <- ltid % TILE_SIZE

  index_in <- g_i * m + g_j
  index_out <- g_i * n + g_j

  block[l_j*TILE_SIZE + l_i] <- in[index_in]

  barrier()

  out[index_out] <- block[l_i*TILE_SIZE + l_j]

Figure 20: A GPU implementation of matrix transposition that ensures coalesced access to memory. We assume that the workgroup size is TILE_SIZE · TILE_SIZE.

not much slower than an ordinary memory copy. Thus we can treat transposition as a fast primitive. It is also not hard to extend the transposition kernel such that it transposes not just a single two-dimensional array, but rather the inner two dimensions of a three-dimensional array. This is useful for implementing Futhark's rearrange construct. We have two final remarks on transposition:

1. For this kernel, we assume that n and m are divisible by TILE_SIZE in both dimensions. This assumption can be lifted by the liberal application of bounds checks before accessing global memory. However, such checks may make the kernel inefficient for pathological cases where n or m is very small (corresponding to very “fat” or “skinny” matrices), as many threads per workgroup will be idle. Specialised kernels should be used for these cases.

2. The way the kernel on Figure 20 accesses local memory is susceptible to bank conflicts, where several threads in the same memory clock cycle attempt to access different addresses within the same memory bank. We do not address bank conflicts further here, except to note that they are solvable by padding the local array such that it has size TILE_SIZE*(TILE_SIZE+1).
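At the Futhark source level, all of this machinery is hidden behind a primitive; the programmer (or the compiler, when it rewrites layouts) simply writes transpose, which the code generator implements with kernels like the one on Figure 20. A trivial example:

-- Compiled to a tiled, coalesced transposition kernel.
let t [n][m] (xss: [n][m]i32): [m][n]i32 = transpose xss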


Our term          CUDA term
Workgroup         Block
Private memory    Local memory
Local memory      Shared memory

Table 3: Thesis terms to CUDA terms.

4.4 The Model in the Real World

The model presented in this chapter is intended as a tool for justifying the transformations performed by the Futhark compiler. But as our eventual goal is efficient execution on real silicon, the model is useless if it does not permit an efficient mapping to reality. In this section, we explain how the GPU model maps to the popular CUDA [17] model supplied by NVIDIA. One issue is that of terminology. This thesis uses terms from the OpenCL standard, which overlap in incompatible ways with terms defined by CUDA. Table 3 summarises the differences. In this section we will continue using the same terms as in the rest of the thesis, even when talking about CUDA.

The most significant difference between the GPU model and CUDA is dimensionality. In CUDA, groups and threads are organised into a three-dimensional grid (x, y, z), while our model contains only a single dimension. This is primarily a programming convenience for problems that can be decomposed spatially. However, the grid has hardware-specific size constraints: on recent NVIDIA hardware, the number of threads in a workgroup along the x and y dimensions can be at most 1024, and along the z dimension at most 64. Also, the total number of threads in a workgroup may not exceed 1024.

The one-dimensional space of our GPU model can be mapped to CUDA by using only the x dimension, and leaving the y and z dimensions of size 1. However, CUDA imposes more restrictions: we can have at most 2^16 − 1 = 65535 workgroups in one dimension. Usually this will not be a problem, as no GPU needs anywhere near this many threads to saturate its computational resources. If it were to become a problem in the future, we could simply map the flat group ID space onto CUDA's three-dimensional space by computing, for any gid:

x = ⌊gid / 2^16⌋, y = gid mod 2^16, z = 0

In general, flat and nested spaces can be mapped to each other at the cost of a modest number of division and remainder operations. The only place where the layout of the thread space has semantic or performance consequences is within workgroups. As the size limit for these (1024) is the same as the maximum number of threads along the x dimension, we can always treat them as one-dimensional.

In the extreme case, we can also virtualise workgroups by adding a loop to each physical workgroup. This loop runs the body multiple times with different values for

gid and gtid. This works because we assume communication between workgroups to be impossible, and hence the order of evaluation to be irrelevant.

4.4.1 Excluded Facilities

Real GPU hardware possesses some capabilities that are not represented in our model. Some of these are unimportant, but others do offer performance opportunities that are, for this work, lost to us.

Atomic Operations

One important missing facility is atomic global memory operations, by which a thread can modify a location in memory in a race-free manner. The hardware supports a fixed number of atomic operators, such as addition, subtraction, exchange, or taking the minimum or maximum. Crucially, there is also an atomic compare-and-swap instruction, on top of which other synchronisation primitives could be built [Fra04]. Using atomic addition, we could implement a summation kernel that needs only one kernel launch instead of two (see Section 4.5 for a general treatment of how reduction is implemented).

The reason we do not exploit these atomic operations is their lack of generality. To the Futhark compiler, there is nothing special about a summation: it is treated like any other commutative reduction. Could the code generator, or some intermediate optimisation pass, employ pattern matching to recognise known patterns that have particularly efficient realisations? Certainly, and perhaps we will do so in the future, but for the present work we have chosen to focus on the general case, in the interest of simplicity. Using compare-and-swap, it is feasible to implement single-stage reduction for arbitrary operators, but it is unclear whether it is worth the cost in complexity, as the memory traffic would not be smaller than for the two-stage kernel. The primary savings would be in avoiding the overhead of launching the second (single-group) kernel, which by itself is not all that expensive.

Atomic operations are most useful when used to implement communication between workgroups. An implementation of prefix sum (scan) in CUDA has been developed which performs just one full pass over the input [MYB16]. In contrast, the best version available to us requires two. However, the single-stage scan implementation depends on intimate knowledge of the target hardware; in particular, it launches no more groups than will fit concurrently. This is not just a question of knowing the hardware, but also its users. If some other program is also using the GPU, we will have room for fewer concurrent groups than expected, potentially resulting in deadlock. We have judged the fragility of such code unacceptable for the output of a compiler, especially since the compiler is intended to shield its users from having to possess deep hardware knowledge. In contrast to languages that employ full


flattening (such as NESL [Ble96]), Futhark performance is also less dependent on scan performance.

More Memory Spaces

Real GPUs support more kinds of memory than we have included in our GPU model:

Constant memory is a size-limited read-only on-chip cache. When we declare an array input to a kernel function, we can mark it as constant, which will cause it to be copied to the cache just before the kernel begins execution. Constant memory is useful when many threads are reading the same memory in the same clock cycle, as accessing the constant cache is significantly faster (in terms of latency) than accessing global memory. Unfortunately, this facility is difficult to exploit for a compiler. The reason is that the size of the constant cache is quite small (usually not more than 64KiB), and the kernel launch will fail if the arrays passed for constant parameters do not fit in the cache. In general, the specific sizes of arrays are not known until run-time. The choice of whether to use constant memory is thus a compile-time decision, while the validity of the choice can only be known at run-time.

Texture memory is physically global memory, but marked as read-only and spatially cached, and, in contrast to the other kinds of memory, multi-dimensional and with assumptions on the type of elements stored. Texture memory is useful for access patterns that exhibit spatial locality in a multi-dimensional space. For example, for a two-dimensional array a, the elements a[i,j] and a[i+1,j+1] are distant from each other no matter whether a is represented in row- or column-major form, but they are spatially close in a two-dimensional space. Texture memory is cached in such a way that elements are stored together with spatially nearby elements. When texture memory is allocated, we must specify its dimensionality, which in extant hardware must be 1, 2, or 3 dimensions. There are also restrictions on the size of individual dimensions. On an NVIDIA GTX 780, 1D buffers are limited to 2^27 elements, 2D buffers to 2^16 · 2^16 elements, and 3D buffers to 2^12 · 2^12 · 2^12 elements.

Texture memory also supports other convenient operations with full hardware acceleration. For example, out-of-bounds accesses can be handled transparently via index wraparound, or by yielding a specified "border" value. Linear interpolation between neighbouring values is also provided. However, for non-graphics workloads, the spatial-locality caching model is the most interesting feature. Unfortunately, it is difficult for a compiler to exploit. The size limitations are particularly inconvenient for "skinny" arrays that are much larger in one dimension than in the others.


As with constant memory, the use of texture memory is thus a compile-time decision whose validity depends on run-time properties of the program input.

Dynamic Parallelism

Another core assumption of our GPU model is that there are only two levels of parallelism: we launch a number of groups, and each group contains a number of threads. Recent GPUs support dynamic parallelism, in which new kernels may be launched by any thread in a running kernel, after which the thread may wait for completion of the new kernel instance. This is an instance of the well-known fork-join model of parallelism.

Dynamic parallelism has interesting use cases. One can imagine a search algorithm that starts by performing a coarse analysis of regions of the input data, then launches new kernels for those parts that appear interesting. Very irregular algorithms, such as graph processing, may also be convenient to implement in this fashion. Unfortunately, dynamic parallelism often suffers from performance problems in practice [DT13; WY14], and fruitful applications are not as prevalent as might have been expected. Efficient usage of dynamic parallelism centers around collecting the desired new kernel launches and merging them together into as few launches as possible, to amortise the overhead. From a functional programming point of view, it seems a better strategy to implement irregular problems via flattening (either manual or automatic), rather than via dynamic parallelism. For these reasons, dynamic parallelism is not part of our GPU model, and is not employed by the Futhark compiler.1

Intra-Group Communication

Most GPUs support efficient "swizzle" operations for exchanging values between threads in a warp. However, despite the hardware supporting this as a general capability, current programming interfaces provide only specialised variants, for example for summation inside a warp, with each thread providing an integer. In our GPU model, we thus require all cross-thread communication to be done through local memory. This is not a restriction with deep ramifications: should more efficient intra-group communication primitives become available, supporting them would have no great overall impact on our compilation strategy.

1Another, more pragmatic, reason is that dynamic parallelism is not supported by the OpenCL implementation provided by NVIDIA.


thread(in, out, n):
  gtid <- get_global_id()

  if gtid < n:
    x <- in[gtid]
    y <- f(x)
    out[gtid] <- y

Figure 21: A GPU implementation of map f a. We assume the array a is stored in memory pointed at by the parameter in.

4.5 GPU Implementation of Selected Parts of the Array Calculus

The combinators presented in Chapter 3 permit an efficient mapping to GPUs. Indeed, we shall see that "kernel extraction", the process of generating flat parallel GPU kernels from a nested parallel Futhark program, is primarily a process of rewriting the program into a composition of recognisable primitives. This section shows how a subset of these are handled with respect to the GPU model introduced above. In order to focus on mapping the parallelism and exploiting certain crucial optimisations, we will ignore some technicalities.

The simplest case is implementing map f a. This can be done with a kernel that conceptually runs one thread for every element in a, which applies f and writes the result. Suppose that a contains n elements, and that our desired workgroup size is w. We then use n/w workgroups (rounding up). Since the size of a may not be divisible by the optimal group size for the GPU, the last workgroup might have threads that are idle. The kernel is shown on Figure 21. We assume that f is merely executed sequentially within each thread.

Mapping a reduction such as reduce ⊕ 0⊕ a is more complicated. Again supposing that a contains n elements, one solution is to launch n/2 threads. Each thread reads 2 elements, applies ⊕, and writes its result. The process is then recursively applied on the resulting n/2 elements until only a single one is left. This implementation is shown on Figure 22.

The reduction thus computed is correct and fully parallel, but it requires manifesting O(log n) intermediate results in global memory. The time taken to shuffle memory around is likely to significantly dominate the cost of computing ⊕. Further, it is awkward if we want to support not just ordinary reductions, but the redomap construct, where the reduction is preceded by a fold. A better solution, one that takes more direct advantage of the GPU architecture, is needed.

An implementation of redomap ⊕ f 0⊕ a is shown on Figure 23. The idea is to divide the n input elements into chunks, where each thread processes a chunk of the input sequentially using the f function, followed by all threads within a workgroup


thread(in, out, n):
  gtid <- get_global_id()

  z <- 0⊕

  if gtid < n:
    x <- in[gtid*2]
    y <- in[gtid*2+1]
    z <- x ⊕ y

  out[gtid] <- z

Figure 22: A GPU implementation of reduce ⊕ 0⊕ a. We assume the array a is stored in memory pointed at by the parameter in. The kernel is invoked several times while swapping in and out.

collaboratively reducing together, yielding one result per workgroup (which here is written by thread 0 of the group). These per-workgroup results can then be reduced using a simple single-workgroup reduction to a single, final result. The result is that reduction of an arbitrarily-sized input array can be done using two GPU kernel invocations.

Assume that we are launching k workgroups, each of size w, for a total of k·w threads. For simplicity we assume that n is divisible by k·w, which means that each thread processes exactly n/(k·w) elements. This assumption can be removed by having each thread explicitly compute its chunk size. We also assume that w is a power of two, as this simplifies the intra-group reduction.

Unfortunately, the kernel shown on Figure 23 has a significant problem: the memory accesses performed during the sequential per-chunk processing are non-coalesced. This is because the chunk processed by a given thread is contiguous in memory. At an abstract level, we are treating the input as a two-dimensional array of shape (k·w) × (n/(k·w)), represented in row-major order. A simple solution is to stride the chunks, such that consecutive elements within a chunk are separated by k·w elements in the array as a whole. Using the matrix metaphor, this corresponds to representing the input in column-major order. This approach is shown in Figure 24.

Unfortunately, this solution changes the order in which ⊕ is applied, which means that unless ⊕ is commutative, the result of the reduction is now wrong. We can solve this by treating the input as a two-dimensional array and pre-transposing it, after which the "wrong" iteration order will actually restore the original order. As shown in Section 4.3.1, transpositions can be implemented efficiently on GPUs, so this extra memory traffic is a small price to pay if the alternative is non-coalesced access. In practice, most operators we wish to use in reductions are commutative, in


local block[]

thread(in, out, k, w, n):
  gtid <- get_global_id()
  ltid <- get_local_id()
  gid <- get_group_id()

  chunk_size <- n/(k*w)

  -- Compute a fold of the chunk.
  i <- 0
  x <- 0⊕
  while i < chunk_size:
    y <- f(in[gtid*chunk_size + i])
    x <- x ⊕ y
    i <- i + 1

  block[ltid] <- x
  barrier()

  i <- w/2
  while i > 0:
    if ltid < i:
      x <- block[ltid*2]
      y <- block[ltid*2+1]
      z <- x ⊕ y
    barrier()
    if ltid < i:
      block[ltid] <- z
    barrier()
    i <- i / 2

  if ltid == 0:
    out[gid] <- block[0]

Figure 23: A GPU implementation of redomap ⊕ f 0⊕ a. We assume the array a is stored in memory pointed at by the parameter in. The result is one partial result per workgroup, which will have to be processed by a final (much smaller) reduction.


local block[]

thread(in, out, k, w, n):
  ...

  -- Compute a fold of the chunk.
  i <- 0
  x <- 0⊕
  while i < chunk_size:
    y <- f(in[gtid + i*(k*w)])
    x <- x ⊕ y
    i <- i + 1

...

Figure 24: A rewriting of the per-chunk folding of Figure 23 such that memory accesses are coalesced. This only works if the reduction operator is commutative.

which case the pre-transposition is not necessary. A generalisation of this technique is used by the Futhark compiler, as discussed in Section 9.1.

Part II

An Optimising Compiler

Chapter 5

Overview and Uniqueness Types

This chapter contains three primary elements: First, an overview of the compiler design, with an emphasis on how the program is transformed as it proceeds through the pipeline (Section 5.1). Second, a simplified Futhark core language that will be used in the following chapters to describe the operations performed by the Futhark compiler (Section 5.2). Third, a formalisation of the uniqueness type system, expressed on the core language (Section 5.3).

The simplification engine encompasses well-established optimisations such as inlining, copy propagation, constant folding, common-subexpression elimination, dead-code elimination, and hoisting of invariant terms out of recurrences in loops and SOACs. The simplification rules are critical for optimising the result of higher-level optimisations, such as producer-consumer fusion [Hen14; HO13], hoisting, and size analysis, which is the subject of Chapter 6. For example, size analysis is implemented via high-level transformation rules that only guarantee that the asymptotic complexity (in number of operations) of the original program is preserved, but rely on the simplification engine to reduce the overhead to negligible levels in the common case.

5.1 Compiler Design

The Futhark compiler is implemented in Haskell, and is designed as a conventional syntax-directed translator. The compiler is composed of a number of passes, which can be seen as functions that accept a program as input and produce a program as output. Some passes perform rewrites of a program in some representation, while other passes lower the program to a more specialised or lower-level representation. The compiler contains two different pipelines: one for generating a sequential program, and one for generating a program with parallel code. Parallelism in the generated code is expressed as OpenCL [SGS10] kernels. While OpenCL is a hardware-agnostic API, and we generate portable OpenCL code, many of the optimisations we perform make assumptions that are specific to GPUs. We might not achieve the desired performance if we executed the generated OpenCL code on, for

example, FPGAs or other such exotic platforms. We have, however, noted decent performance on multicore CPUs, but this is not a direction we shall investigate in this thesis.

The initial passes of the sequential and parallel pipelines are identical, but they differ after a certain point. In this section, and in the thesis as a whole, we will focus on the parallel compilation pipeline, as it is by far the more complicated of the two.

The intermediate representation used by Futhark is typed, and to ensure robustness, full type-checking (including verifying safety of in-place updates) is performed between all compiler passes. The type-checking rules become more lenient as the representation is gradually lowered, and the later stages perform only cursory checking. This type-checking carries a significant compilation overhead (approximately 20%), so we expect to disable it for production releases of the compiler. The pipeline proceeds with the following passes, in order:

Internalisation and Desugaring The source Futhark language is translated into the core language (Section 5.2). In particular, modules and polymorphic functions are removed through specialisation (outside the scope of this thesis), all tuples are flattened, and arrays of tuples are transformed into tuples of arrays, using the technique shown on Figure 15. This is also the pass where size inference is performed (Chapter 6).

Inlining and Dead Function Removal All functions are inlined. As the Futhark language presently does not support recursive functions, this is always possible. This may result in significant growth in code size, so future work should investigate more conservative approaches. Inlining is critical to making subsequent optimisations possible, however. On GPUs, our main target platform, all functions inside kernels are automatically inlined in any case, so we do not lose much by inlining at the Futhark level as well.

SOAC Fusion SOACs in a producer-consumer relationship, or simply adjacent and looping over arrays of identical size, are fused (Chapter 7).

Kernel Extraction A moderate flattening algorithm is applied to the program, which transforms nests of SOACs into flat-parallel kernels (Chapter 8). The kernels produced here are not yet low-level GPU kernels, but a higher-level representation that still abstracts over many details, in particular issues of memory allocation. Nested parallelism is gone after this pass.

Coalescing Fixes The kernels generated in the previous pass are inspected, and non-coalesced array access patterns are resolved by semantically transposing the arrays in question (Section 9.1).

Loop Tiling Loop tiling is performed inside the generated kernels (Section 9.2). This is the last pass we discuss in detail in this thesis; the remainder are concerned


with various technical minutiae, which, while extremely time-consuming and challenging to implement, are of lesser scientific importance.

Memory Allocation Explicit allocations are inserted in the program, in a scheme similar to region inference [Tof+04]. Every array becomes associated with a "memory block", the allocation of which can be hoisted out of the recurrence or branch where the array itself is constructed. To handle cases where pre-allocation is not possible (such as a loop where the array size changes for every iteration), the existential-types mechanism also employed for size inference (Chapter 6) is used. The ability of the compiler to manipulate memory allocations as program constructs is important. We use it to arrange the program such that all memory is pre-allocated before entering GPU kernels, inside which allocation is not supported.

Double Buffering We perform double buffering to enable the hoisting of memory allocations out of loops (in particular loops that are inside parallel kernels). For technical reasons, we currently do not perform double buffering in the usual way, with pointer swapping, but instead insert extra copies for every iteration of the loop. In Section 10.1.1 we shall see an example where this significantly impacts our performance.

Memory Expansion It is not safe to naively hoist allocations out of parallel kernels. When hoisting such allocations, we first have to perform expansion, such that one per-thread private allocation of b bytes becomes one shared allocation of n × b bytes, where n is the number of threads. This requires all allocations to be at the top level of kernels, and their sizes to be expressible in terms of (or at least bounded by) variables whose values are known before the kernel.

Imperative IR Code Generation The flat-parallel Futhark program with explicit allocations is transformed into a low-level imperative intermediate representation (IR). This IR is similar to C, but significantly simpler and much more explicit. No optimisations are performed on this representation. The IR has special constructs for representing GPU kernels, whose bodies are also expressed in the imperative IR. This pass fails if any allocations remain inside parallel kernels.

Final Code Generation The imperative IR is transformed into host-level (CPU) code and actual OpenCL kernels. The host-level code is either C or Python, with calls to the OpenCL library, while OpenCL kernels are in OpenCL C (a restricted dialect of C). We write the result to one or more files, and, depending on the compilation options, may invoke a C compiler to generate an executable.


5.1.1 Simplification Passes The preceding list of passes is incomplete. Between almost every pair of passes, we perform a range of simplifications and conventional optimisations. They are too numerous, and individually too uninteresting, to discuss in detail, but a few deserve mention:

Copy propagation, constant folding, and CSE: The usual bevy of scalar optimisations is performed, with range analysis and propagation employed to support the elimination of (some) bounds and size checks.

Aggressive hoisting: The Futhark compiler aggressively hoists expressions out of loops and branches, where possible and safe. Care is taken to avoid hoisting potentially expensive constructs out of branches.

Dead code removal: We remove not merely expressions whose results are never used, but also rewrite expressions where only part of the result is used. Consider for example an if that returns a pair, where only one component of the pair is subsequently used (see the sketch after this list).

Removal of single-iteration loops and SOACs: Any loop with a single iteration, or a SOAC whose input array has a constant size of 1, can be trivially substituted with its body. This is used to enable efficient sequentialisation (see Section 7.3.2 for an example).

Un-existentialisation: As we shall see in Chapter 6, Futhark uses an existential type system with size-dependent types. Simplification rules are used for making existential sizes concrete where possible.
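The following minimal sketch (illustrative names, not compiler output) shows the partial-dead-code case mentioned under dead code removal above: only the first component of the pair returned by the if is used, so the expression is rewritten to return just that component:

let main (c: bool) (x: i32) (y: i32): i32 =
  let (a, b) = if c then (x + 1, x * 2) else (y - 1, y * 2)
  in a

-- b is dead, so the binding is rewritten to, conceptually:
-- let a = if c then x + 1 else y - 1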

Most of the simplifications are performed by a general simplification engine, which traverses the program and applies a collection of rewrite rules to every expression. The process is repeated until a fixed point is reached. This approach is fairly similar to the one used by the Glasgow Haskell Compiler (GHC) [JTH01]. The main difference is that in GHC the rules are expressed as user-provided equalities, whereas the simplification rules in Futhark are hardwired into the compiler in the form of monadic Haskell functions. This allows the Futhark rewrite rules to be significantly more powerful, as they can inspect the surrounding environment, but they are not user-extensible. As an example, Figure 25 defines a rule that removes results of a map that are not used subsequently. This rule depends on information about what happens after the expression, and is thus applied during a bottom-up traversal. This is the sole exception to our general policy of not showing compiler implementation code in this thesis.


removeDeadMapping :: (MonadBinder m, Op (Lore m) ~ SOAC (Lore m)) =>
                     BottomUpRule m
removeDeadMapping (_, used) (Let pat _ (Op (Map cs width fun arrs))) =
  let ses = bodyResult $ lambdaBody fun
      isUsed (bindee, _, _) = (`UT.used` used) $ patElemName bindee
      (pat', ses', ts') = unzip3 $ filter isUsed $
                          zip3 (patternElements pat) ses $
                          lambdaReturnType fun
      fun' = fun { lambdaBody = (lambdaBody fun) { bodyResult = ses' }
                 , lambdaReturnType = ts' }
  in if pat /= Pattern [] pat'
     then letBind_ (Pattern [] pat') $ Op $ Map cs width fun' arrs
     else cannotSimplify
removeDeadMapping _ _ = cannotSimplify

Figure 25: An example of a simplification rule from the Futhark compiler (slightly reformatted for presentation purposes). This rule removes map results that are never used.

5.2 Abstract Syntax of the Core IR

This section describes the abstract syntax of a simplified form of the Futhark intermediate representation (IR).

5.2.1 Notation for sequences Whenever z is an object of some kind, we write z̄ to range over sequences of objects of this kind. When we want to be explicit about the size (n) of a sequence z̄ = z1, ··· , zn, we often write it on the form z̄^(n), and we write z, z̄ to denote the sequence z, z1, ··· , zn. Depending on context, the elements of the sequence may or may not have separators, or be merely juxtaposed. For example, we may use the same notation to shorten a function application

  f v̄^(n) ≡ f v1 ··· vn

a tuple

  (v̄^(n)) ≡ (v1, ..., vn)

or a function type

  τ̄^(n) → τn+1 ≡ τ1 → ··· → τn → τn+1.


It is always clear from the context which separator (if any) is intended. On occasion we use more complicated sequences, where not all terms under the bar are variant. In this case, the variant term is subscripted with i. For example,

  ([d]vi)^(n) = ([d]v1, ..., [d]vn)    and    ([di]vi)^(n) = ([d1]v1, ..., [dn]vn)

5.2.2 AST definition Figure 26 shows the (simplified) core IR of Futhark, which will be used for all compiler chapters of this thesis. The syntax is intentionally similar to the source language. Notable details include:

• Variable names are ranged over by d, x, y, and z, and we use f to range over function names.

• A variable may have a scalar type (i.e., bool, i32, f64), or a multidimensional (regular) array type, such as [][]f64, which is the type of a matrix in which all rows have the same size.

• Named and unnamed functions use a syntax similar to the source language, but unnamed functions may appear only as immediate arguments to SOACs, such as map, reduce, filter, and so on. Moreover, named functions may only be defined at top-level. Recursion is not permitted.

When discussing properties of the language, we will often assume that expressions (e) in the intermediate language are in A-normal form [SF92]. That is, all subexpressions are variable names (except for the special cases loop, if, and let). However, for readability, we shall often stray from strict A-normal form when showing examples. Tuple-typed variables do not exist in the IR, except as a syntactical construct for some operators. A let-bound pattern (p) consists of one or more variables and associated types that bind the result of an expression (e); intuitively equivalent to a tuple pattern in the source language. Types are syntactically differentiated into two sorts: τ, without uniqueness attributes, and τˆ, which permit an optional uniqueness attribute (an asterisk). Uniqueness attributes are only used when specifying the parameter and return types of top-level functions and for-loops—patterns in let bindings and anonymous functions possess no uniqueness attributes.
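As a hedged illustration of A-normal form (f, g, and h are hypothetical functions), the nested application below is flattened so that every function is applied only to variables, with fresh names a and b introduced for the intermediate results:

-- not in A-normal form:
-- let r = f (g x) (h y) in r

-- in A-normal form:
let a = g x
let b = h y
let r = f a b
in r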

Syntactic Conveniences We permit the omission of in when it immediately precedes a let. This allows us to write


f ::= id                          (Function names)
d, x, y, z ::= id                 (Variable names)
c ::= const                       (Constant value)
t ::= bool | i32 | f32 | ...      (Built-in types)

τ ::= t | []τ                     (Scalar/array type)
τˆ ::= τ | *τ                     (Nonunique/unique type)
ρ ::= (τ1, ..., τn)               (Tuple types)
ρˆ ::= (τˆ1, ..., τˆn)            (Nonunique/unique tuple types)
ϕ ::= ρ1 → ··· → ρn               (Function type)

p ::= (x1 : τ1) ... (xn : τn)     (let or λ pattern)
pˆ ::= (x1 : τˆ1) ... (xn : τˆn)  (Function pattern)

fun ::= let f pˆ : τˆ = e         (Named function)
P ::= ϵ | fun P                   (Program)

e ::= (x1, ..., xn)               (n-tuple)
    | let p = e in b              (Let binding)
    | k                           (Constant)
    | x                           (Variable)
    | e ⊙ e                       (Scalar binary operator)
    | if e then e else e          (Branch)
    | x[x1, ..., xn]              (Array indexing)
    | x with [e1, ..., en] ← e    (In-place update)
    | loop (pˆ) = (x1, ..., xn)   (Loop)
        for x < x do e
    | f x1 ... xn                 (Function call)
    | op a1 ... an                (Operator call)

a ::= x                           (Simple argument)
    | (x1, ..., xn)               (Tuple argument)
    | (λp : ρ → e)                (Function argument)

Figure 26: Grammar for the core Futhark IR. Some compiler stages may impose additional constraints on the structure of ASTs, in particular by requiring size annotations to be present, or banning certain operations. The definition of op in particular may differ between stages. Some constructs, such as while loops, have been elided for simplicity. The elided constructs would have no influence on the development of the thesis.


let x = 2
let y = 3
in x + y

instead of

let x = 2
in let y = 3
   in x + y

which quickly adds up for larger expressions. This convenience is also present in the Futhark source language. For patterns in let-bindings, we will also write

(x1 : τ1, ..., xn : τn) instead of (x1 : τ1) ... (xn : τn) as strictly required by the syntax, in order to match the source language. For brevity, we will also sometimes elide some type annotations. In such cases we may also elide the parentheses when only a single name is bound by the pattern. We will often stray from the strict syntax for lambdas, and write for example (+) for (λx y → x+y), or (+2) instead of (λx → x+2). On occasion, we will need to bind variables whose names we do not care about. In these cases we will use an underscore to indicate that the name is irrelevant. For example:

map (λ_ y _ → y + 2) xs ys zs

5.2.3 Array Operators An array operator (op) is one of several constructs that operate on arrays. These include SOACs, but also such operations as iota and reshape. An array operator can be applied to multiple arguments (a), where an argument can be a variable in scope, but may also be a tuple of variables, or an anonymous function. For giving concise types to operators, we use a notion of extended types that supports polymorphism in types and for which arguments to functions may themselves be functions:

τ ::= α | (τ1, ..., τn) | τ1 → τ2     (Extended types)
ϕ ::= ∀ᾱ.τ                            (Extended type schemes)

Extended types (τ) and extended type schemes (ϕ) are used only for the treatment of operators, and we shall be implicit about converting types and type schemes to and from their extended counterparts. We treat the single-element tuple (τ) as equivalent to τ. A substitution (S) is a mapping from type variables (and, later, term variables) to extended types.


op a1 ... an            Description

size c x                Returns the size of dimension c of x, where c must
                        be a non-negative integer literal.
iota x                  Returns the vector [0, ..., x − 1].
replicate y x           Returns an array of rank one higher than x’s rank,
                        containing a y-times replication of x.
reshape (y1, ..., yn) x Returns an array of shape y1 × ... × yn containing
                        the row-major elements of x. It is a dynamic error
                        if x does not contain exactly y1 × ... × yn elements.
rearrange (c1, ..., cn) x
                        Rearranges the order of dimensions of the
                        n-dimensional array x. The reordering c1, ..., cn
                        must be a permutation of 1 ... n.

Figure 27: Description of array operators. SOACs are described on Figure 28.

op ā                    Description

map λ x̄^(n)             Apply the n-ary function λ simultaneously to
                        consecutive elements of x1, ..., xn, producing an
                        array of the results. If λ returns k values, k
                        arrays are returned.
scatter x y z           For all i, write z[i] to position y[i] in x.
                        Consumes x and semantically returns a new array.
                        The result is unspecified if different values are
                        written to the same position. Out-of-bounds writes
                        have no effect.
reduce λ (ȳ^(n)) x̄^(n)  Perform a reduction of arrays x1, ..., xn via the
                        2n-ary function λ, producing n values. The values
                        y1, ..., yn constitute a neutral argument for λ.
scan λ (ȳ^(n)) x̄^(n)    Perform an inclusive prefix scan of arrays
                        x1, ..., xn via the 2n-ary function λ, producing n
                        values. The values y1, ..., yn constitute a neutral
                        argument for λ.
filter λ x̄^(n)          Produce n arrays, containing only those elements
                        with index j for which λ x1[j] ... xn[j] is true.
stream_map λ x̄^(n)      See Section 2.5.2.
stream_red λc λf x̄^(n)  See Section 2.5.2.

Figure 28: Description of SOAC operators.


op                 TySch(op)

size               : ∀α. i32 → α → i32
iota               : i32 → []i32
replicate          : ∀α. i32 → α → []α
reshape            : ∀α. (i32^(n)) → []1···[]m α → []1···[]n α
rearrange (c̄^(n))  : ∀α. []1···[]n α → []1···[]n α
map                : ∀ᾱ^(n) β̄^(m). (ᾱ^(n) → (β̄^(m))) → ([]αi)^(n) → ([]βi)^(m)
scatter            : ∀ᾱ^(n). ([]αi)^(n) → ([]i32)^(n) → ([]αi)^(n) → ([]αi)^(n)
reduce             : ∀ᾱ^(n). (ᾱ^(n) → ᾱ^(n) → (ᾱ^(n))) → (ᾱ^(n)) → ([]αi)^(n) → (ᾱ^(n))
scan               : ∀ᾱ^(n). (ᾱ^(n) → ᾱ^(n) → (ᾱ^(n))) → (ᾱ^(n)) → ([]αi)^(n) → ([]αi)^(n)
filter             : ∀ᾱ^(n). (ᾱ^(n) → bool) → ([]αi)^(n) → ([]αi)^(n)
stream_map         : ∀ᾱ^(n) β̄^(m). (([]αi)^(n) → ([]βi)^(m)) → ([]αi)^(n) → ([]βi)^(m)
stream_red         : ∀ᾱ^(n) β̄^(m). (β̄^(m) → β̄^(m) → (β̄^(m))) → (([]αi)^(n) → (β̄^(m)))
                     → ([]αi)^(n) → (β̄^(m))

Figure 29: Size-agnostic type schemes for operators, including various SOACs.

We write substitutions ⟨x ↦ y⟩. Applying a substitution S to some object B, written S(B), has the effect of simultaneously applying S to variables in B (being the identity outside its domain). An extended type τ′ is an instance of an extended type scheme ϕ = ∀ᾱ.τ, written ϕ ≥ τ′, if there exists a substitution S such that S(τ) = τ′. Type schemes for operators, including a representative subset of the SOAC operators, are given in Figure 29, and an informal description is given in Figures 27 and 28. The SOACs of the intermediate language (such as map) are tuple-of-arrays versions of the user-language SOACs. The SOACs of the intermediate language receive an arbitrary number of array arguments and produce a tuple of arrays as their result. The semantics of a SOAC operator can be intuitively understood as a composition of unzip, the user-language SOAC (such as map), and zip, where the unnamed function is suitably modified to work with the flat sequence of array arguments.

5.2.4 Typing Rules In the type rules, we permit the implicit transformation of uniqueness types and patterns τˆ/ρˆ/pˆ to their corresponding τ/ρ/p, where uniqueness attributes have simply been removed. This simplifies the type rules, which can then be stated separately from the rules for checking uniqueness properties. Type environments (Γ) are finite maps from program variables to types or function types. Looking up the type of x in Γ is written Γ(x). When Γ is a type environment and p is a pattern (x1 : τ1, ··· , xn : τn), we write Γ, p to denote the typing environment


Γ, x1 : τ1, ··· , xn : τn. Typing rules for the core language are given in Figures 30 and 31. The rules allow inferences among sentences of the following forms:

⊢p p : ρ, read “the pattern p is matched by expressions of tuple type ρ.”

⊢pˆ pˆ : ρˆ, read “the pattern with uniqueness attributes pˆ is matched by expressions of tuple type ρˆ.”

Γ ⊢a a : τ/ϕ, read “under the assumptions Γ, the operator argument a has type τ or function type ϕ.”

Γ ⊢ e : ρ, read “under the assumptions Γ, the expression e has tuple type ρ.”

Γ ⊢P P, read “under the assumptions Γ, the program P is well-typed.”


Patterns   ⊢p p : ρ

  ───────────────────────────────────────────── (5.1)
  ⊢p (x1 : τ1, ··· , xn : τn) : (τ1, ··· , τn)

Operator arguments   Γ ⊢a a : τ/ϕ

  ──────────────── (5.2)
  Γ ⊢a x : Γ(x)

  ⊢p p : ρ    Γ, p ⊢ b : ρ′
  ───────────────────────────── (5.3)
  Γ ⊢a (λp : ρ′ → b) : ρ → ρ′

  ───────────────────────────────────────── (5.4)
  Γ ⊢a (x1, ..., xn) : (Γ(x1), ..., Γ(xn))

Parameters   ⊢pˆ pˆ : ρˆ

  ───────────────────────────────────────────────── (5.5)
  ⊢pˆ (x1 : τˆ1, ··· , xn : τˆn) : (τˆ1, ··· , τˆn)

Programs   Γ ⊢P P

  ⊢pˆ pˆ : (τ1, ..., τn)    Γ, pˆ ⊢ e : ρ
  Γ(f) = τ1 → ... → τn → ρˆ    Γ ⊢P P
  ────────────────────────────────────── (5.6)
  Γ ⊢P let f pˆ : ρˆ = e P

  ──────── (5.7)
  Γ ⊢P ϵ

Figure 30: Typing rules for Futhark’s core IR, excluding expressions, which are shown on Figure 31.


Expressions   Γ ⊢ e : ρ

  ⊢p p : ρ    Γ ⊢ e1 : ρ    Γ, p ⊢ e2 : ρ′
  ──────────────────────────────────────── (5.8)
  Γ ⊢ let p = e1 in e2 : ρ′

  ───────────────────────────────────────── (5.9)
  Γ ⊢ (x1, ··· , xn) : (Γ(x1), ··· , Γ(xn))

  Γ(x) = []τ    Γ(s) = i32
  ───────────────────────── (5.10)
  Γ ⊢ x[s] : τ

  ConstType(c) = τ
  ───────────────── (5.11)
  Γ ⊢ c : τ

  Γ ⊢a ai : τi    i ∈ {1, ..., n}    TySch(op) ≥ τ̄^(n) → ρ
  ────────────────────────────────────────────────────────── (5.12)
  Γ ⊢ op a1 ... an : ρ

  Γ(xi) = τi    i ∈ {1, ..., n}    lookupfun(f) = τ1 → ··· → τn → ρ
  ────────────────────────────────────────────────────────────────── (5.13)
  Γ ⊢ f x1 ... xn : ρ

  Γ(s) = bool    Γ ⊢ e1 : ρ    Γ ⊢ e2 : ρ
  ──────────────────────────────────────── (5.14)
  Γ ⊢ if s then e1 else e2 : ρ

  ⊢pˆ pˆ : ρ    Γ ⊢ e1 : ρ    Γ ⊢ e2 : i32    Γ ⊢ e3 : ρ
  ─────────────────────────────────────────────────────── (5.15)
  Γ ⊢ loop pˆ = e1 for x < e2 do e3 : ρ

Figure 31: Typing rules for expressions in Futhark’s core IR. The definition of TySch is found on Figure 29.


5.3 Checking Uniqueness Types

An in-place update a with [i] ← v is safe if and only if:

1. The array a (or any of its aliases) is not used on any execution path following the in-place update.

2. v does not alias a.

To check for safety statically, we require two things: aliasing information for all variables in the program (Section 5.3.1), and a way to check for how and when variables are used (Section 5.3.2). In the formalisation that follows, alias analysis is separate from safety checking. In an implementation, the two can be done in a single pass.
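A minimal sketch of the two conditions, using illustrative names (a is a vector, xss a matrix):

let b = a with [0] ← 1   -- consumes a
let x = a[1]             -- violates condition 1: a is used after its consumption

let ys = xss with [0] ← xss[1]
                         -- violates condition 2: the slice xss[1] aliases xss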

5.3.1 Alias Analysis We perform alias analysis on a program that we assume to be otherwise type-correct. Our presentation uses an inference rule-based approach in which the central judgment takes the form Σ ⊢ e ⇒ ⟨σ1, . . ., σn⟩, which asserts that, within the context Σ, the expression e produces n values, where value number i has the alias set σi. An alias set is a subset of the variable names in scope, and indicates which variables an array value (or variable) may be aliased with. The context Σ maps variables in scope to their alias sets. The aliasing rules are listed in Figures 32 and 33. The Alias-Var rule defines the aliases of a variable expression to be the alias set of the variable joined with the name of the variable itself; this is because v ∉ Σ(v), as can be seen from Alias-LetPat. Alias sets for values produced by SOACs such as map are empty. We can imagine the arrays produced as fresh, although the compiler is of course free to reuse memory if it can do so safely. The Alias-IndexArray rule tells us that a scalar read from an array does not alias its origin array, but Alias-SliceArray dictates that an array slice does. This fits our intuition of how an implementation might realise these cases: scalars are read into registers, whereas array slicing is just offsetting a pointer. The most interesting aliasing rules are the ones for function calls (Alias-Apply-Nonunique and Alias-Apply-Unique). Since our alias analysis is intra-procedural, we are forced to be conservative. There are two rules, corresponding to functions returning unique and non-unique arrays, respectively. When the result is unique the alias set is empty; otherwise the result conservatively aliases all non-unique parameters.
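For illustration, a hedged sketch of the index-versus-slice distinction, assuming a two-dimensional array xss in scope:

let x = xss[0, 0]  -- scalar read: x aliases nothing (Alias-IndexArray)
let r = xss[0]     -- slice: r aliases xss (Alias-SliceArray)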

5.3.2 In-Place Update Checking In our implementation, alias computation and in-place update checking are performed at the same time, but they are split here for expository purposes. In the following, let aliases(v) be the alias set of the variable v, which we assume has been computed as in the previous section. We denote by O the set of the variables observed (used) in some expression e, and by C the set of variables consumed through function calls and in-place updates. Together, the pair ⟨C, O⟩ is called an occurrence trace. Figure 34 defines a sequencing judgment between two occurrence traces, which takes the form

⟨C1, O1⟩ ≫ ⟨C2, O2⟩ : ⟨C3, O3⟩

and which can be derived if and only if it is acceptable for ⟨C1, O1⟩ to happen first, then ⟨C2, O2⟩, giving the combined occurrence trace ⟨C3, O3⟩. We formulate this as a judgment because sequencing is sometimes not derivable, for example in the case where an array is used after it has been consumed. The judgment is defined by a single inference rule, which states that two occurrence traces can be sequentialised if and only if no array consumed in the left-hand trace is used in the right-hand trace.


Expression aliases   Σ ⊢ e ⇒ ⟨σ1, . . ., σn⟩

  ────────────────────────────────────────────────────── (Alias-Tuple)
  Σ ⊢ (x1, ..., xn) ⇒ ⟨{x1} ∪ Σ(x1), ..., {xn} ∪ Σ(xn)⟩

  Σ ⊢ e1 ⇒ ⟨σ̄^(n)⟩    Σ, (xi ↦ σi)^(n) ⊢ e2 ⇒ ⟨σ̄′^(n)⟩
  ───────────────────────────────────────────────────── (Alias-LetPat)
  Σ ⊢ let (x : τ)^(n) = e1 in e2 ⇒ ⟨σ̄′^(n)⟩ \ {x̄^(n)}

  ──────────────── (Alias-Const)
  Σ ⊢ k ⇒ ⟨∅⟩

  ──────────────────────── (Alias-Var)
  Σ ⊢ x ⇒ ⟨{x} ∪ Σ(x)⟩

  ──────────────────── (Alias-ScalarBinOp)
  Σ ⊢ x ⊙ y ⇒ ⟨∅⟩

  Σ ⊢ e2 ⇒ ⟨σ′1, ..., σ′n⟩    Σ ⊢ e3 ⇒ ⟨σ′′1, ..., σ′′n⟩
  ────────────────────────────────────────────────────────── (Alias-If)
  Σ ⊢ if x1 then e2 else e3 ⇒ ⟨σ′1 ∪ σ′′1, ..., σ′n ∪ σ′′n⟩

  x is of rank n
  ───────────────────────── (Alias-IndexArray)
  Σ ⊢ x[x1, ..., xn] ⇒ ⟨∅⟩

  x is of rank > n
  ──────────────────────────────────── (Alias-SliceArray)
  Σ ⊢ x[x1, ..., xn] ⇒ ⟨{x} ∪ Σ(x)⟩

  ────────────────────────────────────────────────── (Alias-Update)
  Σ ⊢ xa with [e1, ..., en] ← ex ⇒ ⟨{xa} ∪ Σ(xa)⟩

Figure 32: Aliasing rules for simple expressions. For simplicity, we treat only the single-parameter case for loops and function calls.


Expression aliases (continued)   Σ ⊢ e ⇒ ⟨σ1, . . ., σn⟩

  τ is not of form *τ′
  Σ ⊢ y2 ⇒ ⟨σ⟩    Σ, x1 ↦ σ ⊢ e ⇒ ⟨σ′⟩
  ────────────────────────────────────────────────────── (Alias-DoLoop-Nonunique)
  Σ ⊢ loop (x1 : τ) = y2 for z1 < z2 do e ⇒ ⟨σ′ \ {x1}⟩

  ───────────────────────────────────────────────── (Alias-DoLoop-Unique)
  Σ ⊢ loop (x1 : *τ) = y2 for z1 < z2 do e ⇒ ⟨∅⟩

  lookupfun(f) = τˆ1 → ··· → τˆn → τ    τ is not of form *τ′
  Σ ⊢ xi ⇒ ⟨σi⟩    σ = ∪ {σi | τˆi is not of form *τ′}
  ──────────────────────────────────────────────────────────── (Alias-Apply-Nonunique)
  Σ ⊢ f x1 ... xn ⇒ ⟨σ⟩

  lookupfun(f) = τˆ1 → ··· → τˆn → *τ
  ───────────────────────────────────── (Alias-Apply-Unique)
  Σ ⊢ f x1 ... xn ⇒ ⟨∅⟩

  ──────────────────────── (Alias-Size)
  Σ ⊢ size c x ⇒ ⟨∅⟩

  ────────────────────── (Alias-Iota)
  Σ ⊢ iota x ⇒ ⟨∅⟩

  ───────────────────────────── (Alias-Replicate)
  Σ ⊢ replicate x y ⇒ ⟨∅⟩

  Σ ⊢ y ⇒ ⟨σ⟩
  ──────────────────────────────────── (Alias-Reshape)
  Σ ⊢ reshape (x1, ..., xn) y ⇒ ⟨σ⟩

  Σ ⊢ y ⇒ ⟨σ⟩
  ────────────────────────────────────── (Alias-Rearrange)
  Σ ⊢ rearrange (c1, ..., cn) y ⇒ ⟨σ⟩

  ──────────────────────────────────────────────── (Alias-Map)
  Σ ⊢ map (λp : (τ̄^(m)) → e) x̄^(n) ⇒ ⟨∅^(m)⟩

  ────────────────────────────────────────── (Alias-Scatter)
  Σ ⊢ scatter x y z ⇒ ⟨{x} ∪ Σ(x)⟩

  ──────────────────────────────────────── (Alias-Reduce)
  Σ ⊢ reduce a (ȳ^(n)) x̄^(n) ⇒ ⟨∅^(n)⟩

  ────────────────────────────────────── (Alias-Scan)
  Σ ⊢ scan a (ȳ^(n)) x̄^(n) ⇒ ⟨∅^(n)⟩

  ──────────────────────────────── (Alias-Filter)
  Σ ⊢ filter a x̄^(n) ⇒ ⟨∅^(n)⟩

Figure 33: More aliasing rules.

Some of the most important inference rules for checking whether an expression e is functionally safe with respect to in-place updates are shown in Figure 34, where the central judgment is e ▷ ⟨C, O⟩. We assume that the program is in strict A-normal form. The rule for an in-place update va with [v̄^(n)] ← vv gives rise to an occurrence trace indicating that we have observed vv and consumed va. The indices v̄^(n) are ignored, as they are necessarily scalar variables, which cannot be consumed. Another interesting rule concerns checking the safety of map expressions. We do not wish to permit the function of a map to consume any array bound outside of it, as that would imply the array may be consumed by more than one iteration of the map. However, the function may consume its parameters, which should be seen as the map expression as a whole consuming the corresponding input array. This restriction also preserves the parallel semantics of map, because different rows of a matrix can be safely updated in parallel. An example can be seen on Figure 36, which shows an in-place update nested inside a map. To express this restriction, we define an auxiliary judgment: P ⊢ ⟨C1, O1⟩ △ ⟨C2, O2⟩

Here, P is a mapping from parameter names to alias sets. Any variable v in O1 that has a mapping in P is replaced with P[v] to produce O2. If no such mapping exists, v is simply included in O2. Similarly, any variable v in C1 that has a mapping in P is replaced with the variables in the set P[v] (taking the union of all such replacements), producing C2. However, if v does not have such a mapping, the judgment is not derivable. The precise inference rules are shown on Figure 35. Do-loops and function declarations can be checked for safety in a similar way. A function is safe with respect to in-place updates if its body consumes only those of the function’s parameters that are unique (rule Safe-Fun on Figure 35). For a function call, care is taken to ensure that the argument passed for a consumed parameter does not alias any other argument.

5.4 Related Work on Uniqueness Types

Futhark’s uniqueness type system is similar to, but simpler than, the system found in Clean [BS93; BS96], where the primary motivation is modeling IO. Our use is more reminiscent of the ownership types of Rust [Hoa13]. The fact that Futhark is restricted in its support for user-defined data types, in particular not supporting pointer structures and instead emphasising the use of arrays, makes alias analysis tractable, and even possible to expose as a user-language feature. While more powerful uniqueness type systems [BS93], and affine and linear types [TP11; FD02], are known, ours is the first application that directly addresses map-style parallel constructs, and shows how in-place updates can be supported


Validity of sequencing   ⟨C1, O1⟩ ≫ ⟨C2, O2⟩ : ⟨C3, O3⟩

  (O2 ∪ C2) ∩ C1 = ∅
  ──────────────────────────────────────────────── (Occurrence-Seq)
  ⟨C1, O1⟩ ≫ ⟨C2, O2⟩ : ⟨C1 ∪ C2, O1 ∪ O2⟩

Uniqueness safety for expressions   e ▷ ⟨C, O⟩

  ────────────────────────── (Safe-Var)
  v ▷ ⟨∅, aliases(v)⟩

  ──────────────── (Safe-Const)
  k ▷ ⟨∅, ∅⟩

  e1 ▷ ⟨C1, O1⟩    e2 ▷ ⟨C2, O2⟩
  ⟨C1, O1⟩ ≫ ⟨C2, O2⟩ : ⟨C3, O3⟩
  ──────────────────────────────────────── (Safe-LetPat)
  let v1 ... vn = e1 in e2 ▷ ⟨C3, O3⟩

  v1 ▷ ⟨C1, O1⟩    e2 ▷ ⟨C2, O2⟩    e3 ▷ ⟨C3, O3⟩
  ⟨C1, O1⟩ ≫ ⟨C2, O2⟩ : ⟨C′2, O′2⟩
  ⟨C1, O1⟩ ≫ ⟨C3, O3⟩ : ⟨C′3, O′3⟩
  ─────────────────────────────────────────────────── (Safe-If)
  if v1 then e2 else e3 ▷ ⟨C′2 ∪ C′3, O′2 ∪ O′3⟩

  ──────────────────────────────────────────────────── (Safe-Update)
  va with [v̄^(n)] ← vv ▷ ⟨aliases(va), aliases(vv)⟩

  eb ▷ ⟨C, O⟩    (pi ↦ aliases(vi))^(n) ⊢ ⟨C, O⟩ △ ⟨C′, O′⟩
  ────────────────────────────────────────────────────────── (Safe-Map)
  map (λp̄^(n) : t̄^(m) → eb) v̄^(n) ▷ ⟨C′, O′⟩

  lookupfun(f) = τˆ1 → ··· → τˆn → τˆr
  ∀i. (τˆi = *τ) ⇒ (∀j. j = i ∨ xi ∉ aliases(xj))
  C′ = ∪ {aliases(xi) | τˆi = *τ}
  O′ = ∪ {aliases(xi) | i ∈ {1, ..., n}} − C′
  ──────────────────────────────────────────────── (Safe-Apply)
  f x1 ... xn ▷ ⟨C′, O′⟩

  e ▷ ⟨C, O⟩    ∀v ∈ C. ∃i. xi = v ∧ τˆi = *τ
  ∀i. (τˆi = *τ) ⇒ (∀j. j = i ∨ vi ∉ aliases(vj))
  C′ = ∪ {aliases(vi) | τˆi = *τ}
  O′ = ∪ {aliases(vi) | i ∈ {1, ..., n}} − C′
  ───────────────────────────────────────────────────────── (Safe-Loop)
  loop (xi : τˆi)^(n) = v̄^(n) for z1 < z2 do e ▷ ⟨C′, O′⟩

Figure 34: Checking safety of consumption.


Uniqueness safety for function definitions   ▽ fun

  e ▷ ⟨C, O⟩    ∀v ∈ C. ∃i. xi = v ∧ τˆi = *τ
  ─────────────────────────────────────────── (Safe-Fun)
  ▽ let f (xi : τˆi)^(n) : ρˆ = e

Validity of parameter consumption   P ⊢ ⟨C1, O1⟩ △ ⟨C2, O2⟩

  ────────────────────────── (Observe-BaseCase)
  P ⊢ ⟨∅, ∅⟩ △ ⟨∅, ∅⟩

  v ∈ P    P ⊢ ⟨∅, O⟩ △ ⟨∅, O′⟩
  ────────────────────────────────────────── (Observe-Param)
  P ⊢ ⟨∅, {v} ∪ O⟩ △ ⟨∅, P[v] ∪ O′⟩

  ¬(v ∈ P)    P ⊢ ⟨∅, O⟩ △ ⟨∅, O′⟩
  ────────────────────────────────────────── (Observe-NonParam)
  P ⊢ ⟨∅, {v} ∪ O⟩ △ ⟨∅, {v} ∪ O′⟩

  v ∈ P    P ⊢ ⟨C, O⟩ △ ⟨C′, O′⟩
  ──────────────────────────────────────────── (Consume-Param)
  P ⊢ ⟨{v} ∪ C, O⟩ △ ⟨P[v] ∪ C′, O′⟩

Figure 35: Checking parameter consumption.

-- This one is OK and considered to consume 'as'.
let bs = map (λa → a with [0] ← 2) as

let d = iota m
-- This one is NOT safe, since d is not a formal parameter.
let cs = map (λi → d with [i] ← 2) (iota n)

Figure 36: Examples of maps with in-place updates.

without making evaluation order observable. Our system is also somewhat more “friendly”, as the former approaches deal with more complicated systems and rely on significantly more complex analyses, where for example aliasing of affine variables is banned; in our case, unique arrays may be used multiple times before being consumed, with aliasing information used to flag the use-after-consumption cases. Indeed, the uniqueness type system shown here only affects function definitions and function calls, with alias analysis alone used for the intraprocedural case. Some prior work does exist for destructively updating memory in the presence of bulk parallel operators. For example, [Les09] discusses fusing away intermediate copies when composing bulk updates in Haskell. In contrast, our uniqueness types are (i) intended for sequential parts of the code, and (ii) able to guarantee the absence of any copying whatsoever. For example, in the cited work, the expression xs//vs1/vs2, which first updates xs with vs1 and then updates the result with vs2, is fused to perform just a single copy of xs instead of two; but the copy of xs cannot be avoided, because there is no way in Haskell to enforce that some array variable is no longer used.

Chapter 6

Size Inference

A great many optimisations and safety checks in Futhark depend on how the shape of two arrays relate to each other, or at which point the shape of an array can be known (especially if that point is much earlier than the point at which the values of the array can be known). Especially the latter is important for the moderate flattening transformation (Chapter 8) that constitutes one of the main contributions of the thesis. Nested parallelism supports the construction of arrays whose values are dependent on some outer parallel construct. However, for regular nested parallelism, the shapes of those arrays can be computed invariant to all parallel loops. It is crucial for efficient execution that we can hoist the computation of such sizes out of parallel loops. This requires us to reify the notion of an array shape, and the computation of that shape, in the IR. In the Futhark compiler, we treat size computations like any other expression, which allows us to use our general compiler optimisation repertoire to optimise and simplify the computation of sizes. The main contribution of this chapter is an IR design that maintains the invariant that for any array-typed variable in scope, each dimension of that array corresponds to some i32-typed variable also in scope. For expressions where the shape of the result cannot be computed in advance (consider a filter or a function call), we use a lightweight mechanism based on existential types. This chapter also discusses how we move from the unsized IR to the sized IR (analogous to “typed” versus “untyped”), and how most size calculations can subsequently be optimised away. An important technique is function slicing, which we use to derive functions that precompute the size of values returned by a function. Section 6.1 introduces the syntax of the IR, which is a slight extension of the one presented in the previous chapter. In Section 6.2 we give an intuition for how size inference is accomplished, via an example of how a program progresses from having no inferred sizes, to being highly existential, to finally having most sizes resolved statically. Section 6.3 presents a subset of the type rules for the sized IR, and Section 6.4 shows some of the transformations used to add size information to an unsized program.


τ ::= t | [d]τ                                      (Scalar/array type)
ϕ ::= (x1 : τ1) → ··· → (xn : τn) → ∃d̄.ρ            (Sizes of results ∈ d̄)
τ ::= α | (x1 : τ1, ..., xn : τn) | (x : τ) → ∃d̄.τ  (Sizes of results ∈ d̄)
ϕ ::= ∀ᾱ.ϕ
fun ::= let f pˆ : d̄.τˆ = e                         (Sizes of results ∈ d̄)

Figure 37: Types with embedded size information and named parameters. The remaining syntax definitions remain unchanged, except that they refer to the new definitions above.

6.1 A Sized IR

One important observation about the IR presented in the previous chapter is that some operator-semantics invariants, related to array regularity, are guaranteed to hold by construction, while several other invariants are only “assumed”; that is, they have not been verified (made explicit) in the IR:

• iota and replicate assume a non-negative first argument, and the size of the resulting array is exactly the value of the first argument.

• map is guaranteed to receive arguments of identical outermost size, which also matches the outermost size of all result arrays¹. However, map assumes that its function argument produces arrays of identical shape for each element of the input array.

• filter receives arguments of identical outermost size and produces results of identical outermost size (and the outermost size of the argument is not smaller than that of the result).

• reduce and scan receive arguments of identical outermost size, and scan results have outermost size equal to that of the input. The semantics for reduce and scan assumes that the two arguments and result of the binary associative operator have identical shapes.

Figure 37 shows an extended type system in which (i) sizes are encoded in each array type, that is, [d]τ represents the type of an array in which the outermost dimension has size d, and in which (ii) function/lambda types use universal quantifiers for the sizes of the array parameters, and existential quantifiers for the sizes of the result arrays. Function types now also contain named parameters, supporting a simple variant of dependent types. For function parameters where the name is irrelevant, we shall elide the name and use the same notation as previously.

¹ In the user language zip accepts an arbitrary number of array arguments that are required to have the same outermost size.


op                 TySch(op)

iota               : (d : i32) → [d]i32
replicate          : ∀α. (d : i32) → α → [d]α
reshape            : ∀d̄^(m) α. (x1 : i32, ..., xn : i32) → [d1]···[dm]α → [x1]···[xn]α
rearrange (c̄^(n))  : ∀d̄^(n) α. [d1]···[dn]α → [dp(1)]···[dp(n)]α
                     where p(i) is the result of applying the permutation
                     induced by c1, ..., cn.
map                : ∀d ᾱ^(n) β̄^(m). (ᾱ^(n) → (β̄^(m))) → ([d]αi)^(n) → ([d]βi)^(m)
scatter            : ∀d x̄^(n) ᾱ^(n). ([xi]αi)^(n) → ([d]i32)^(n) → ([d]αi)^(n) → ([xi]αi)^(n)
reduce             : ∀d ᾱ^(n). (ᾱ^(n) → ᾱ^(n) → (ᾱ^(n))) → (ᾱ^(n)) → ([d]αi)^(n) → (ᾱ^(n))
scan               : ∀d ᾱ^(n). (ᾱ^(n) → ᾱ^(n) → (ᾱ^(n))) → (ᾱ^(n)) → ([d]αi)^(n) → ([d]αi)^(n)
filter             : ∀d1 ᾱ^(n). (ᾱ^(n) → bool) → ([d1]αi)^(n) → ∃d2.([d2]αi)^(n)
stream_map         : ∀d ᾱ^(n) β̄^(m). ((x : i32) → ([x]αi)^(n) → ([x]βi)^(m))
                     → ([d]αi)^(n) → ([d]βi)^(m)
stream_red         : ∀d ᾱ^(n) β̄^(m). (β̄^(m) → β̄^(m) → (β̄^(m)))
                     → ((x : i32) → ([x]αi)^(n) → (β̄^(m)))
                     → ([d]αi)^(n) → (β̄^(m))

Figure 38: Dependent-size types for various SOACs.

Figure 38 shows that this extension allows us to encode most of the aforementioned invariants into size-dependent types. The requirement for non-negative input to iota and replicate remains only dynamically checked. We see that most parameters remain unnamed, but names are crucially used to encode the shape properties of iota and replicate. The type of map is interesting because the result array types do not follow immediately from the input array types. Instead, it is expected that the functional argument declares the result type (including sizes) in advance. Operationally, we can see this as being able to “pre-allocate” space for the result. However, the return size cannot in general be known in advance without evaluating the function. The main difference between the size-dependent typing of the core language and the size annotations of the source language is that the latter are optional and checked dynamically, while the former are mandatory and enforced statically. As shown on Figure 38, the reshape construct functions as an “escape hatch”, by which we can arbitrarily transform the shape of an array. An implicit run-time check enforces that the original shape and the desired shape have the same total number of elements. This is used to handle cases that would otherwise not be well-typed.
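A minimal sketch of the escape hatch (illustrative names): flattening a matrix with reshape, where the implicit run-time check fails unless the element counts match:

let main (n: i32) (m: i32) (xss: [n][m]i32): []i32 =
  let flat = reshape (n*m) xss  -- dynamically checked: n*m elements
  in flat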


let concat (xs: []f64) (ys: []f64): []f64 =
  let a = size 0 xs
  let b = size 0 ys
  let c = a+b
  let (is: []i32) = iota c
  let (os: []f64) = map (λi → if i < a then xs[i] else ys[i+b]) is
  in os

let f (vss: [][]f64): []f64 =
  let (ys: []f64) (zs: []f64) =
    map (λ(vs: []f64) →
           let ys = reduce (+) 0.0 vs
           let zs = reduce (*) 1.0 vs
           in (ys, zs))
        vss
  let (rs: []f64) = concat ys zs
  in rs

let main (vsss: [][][]f64): [][]f64 =
  let (rss: [][]f64) =
    map (λ(vss: [][]f64): []f64 →
           let rs = f vss
           in rs)
        vsss
  in rss

Figure 39: Running example: program in un-sized IR.

6.2 Size Inference by Example

This section demonstrates, by example, the code transformation that (i) makes explicit in the code the shape-dependent types and verifies the assumed invariants, and (ii) in many cases optimises away the existential types. Our philosophy is to initially use existential types liberally, with the expectation that inlining and simplification will eventually remove almost all of them. The program in Figure 39 receives as input a three-dimensional array vsss, and produces a two-dimensional array, by mapping the elements of the outermost dimension of vsss with the function f.


let concat (n: i32) (m: i32) (xs: [n]f64) (ys: [m]f64): d.[d]f64 =
  let a = n
  let b = m
  let c = a+b
  let (is: [c]i32) = iota c
  let (os: [c]f64) = map (λi → if i < n then xs[i] else ys[i+a]) is
  in os

let f (m: i32) (k: i32) (vss: [m][k]f64): d.[d]f64 =
  let (ys: [m]f64) (zs: [m]f64) =
    map (λ(vs: [k]f64) →
           let ys = reduce (+) 0.0 vs
           let zs = reduce (*) 1.0 vs
           in (ys, zs))
        vss
  let (l: i32) (rs: [l]f64) = concat m m ys zs
  in rs

let main (n: i32) (m: i32) (k: i32) (vsss: [n][m][k]f64): d1 d2.[d1][d2]f64 =
  let l = if n != 0
          then let (d: i32) (ws: [d]f64) = f m k vsss[0]
               in d
          else 0
  let (rss: [n][l]f64) =
    map (λ(vss: [m][k]f64): [l]f64 →
           let (d: i32) (w: [d]f64) = f m k vss
           let vs = reshape l w
           in vs)
        vsss
  in rss

Figure 40: Running example: ∃-quantified target IR. Changes compared to Figure 39 highlighted in red.

The first stage, demonstrated in Figure 40, transforms the program into an unoptimised version in which (i) all arrays have shape-dependent types, which may be existentially quantified, and (ii) all “assumed” invariants are explicitly checked. This is achieved by:


• Extending the function signatures to encompass also the shape information for each array argument. For example, f takes additional parameters m and k that specify the shape of array argument vss,

• Representing a function’s array results via existentially-quantified shape-dependent types. For example, the return type of f is specified as d.[d]f64, indicating an existential size d.

• Modifying let patterns to also explicitly bind any existential sizes returned by the corresponding expression. For example, the binding of the result of concat now also includes a variable l, representing the result size.

• For the map in main, we need to make a “guess” at the size of the array being returned, which we store as the variable l. This guess is made by applying the map function to vsss[0], which produces both a size and an array, of which we use just the size. This corresponds to slicing the map function. If vsss is empty (that is, if n is zero), the guess is zero. There is a risk here: if the map function consumes any of its parameters via in-place updates, the slice will do so as well, and since vsss[0] aliases vsss, we would end up consuming vsss twice. As a result, we must copy any inputs that are consumed in the sliced function, although for this example, there are none. Since f returns an existential result, and the lambda must return an array of exactly type [l]f64, we use a reshape to obtain this desired type. Since the reshape fails if d != l, this effectively ensures that the map produces a regular array.

• Since all arrays in scope also have variables in scope describing their sizes, replacing all uses of size with references to those variables.

It is important to note that this transformation asymptotically preserves the number of operations of the original program. However, it performs a significant amount of redundant computation. To compute l, we compute the entire result, only to throw most of it away. General-purpose optimisation techniques can be employed to eliminate the overhead. On Figure 41 we see the result of inlining all functions, followed by straightforward simplification, dead code removal, and hoisting of the computation of c. The result is that all arrays constructed inside the maps have a size that can be computed before the maps are entered. From an operational perspective, this lets us pre-allocate memory before executing the maps on a GPU. This is essential for GPU execution because dynamic allocation and assertions are typically not well suited for accelerators, hence the shapes of the result and of various intermediate arrays need to be computed (or at least overestimated) and verified before the kernel is run. However, there is still a problem with the code shown on Figure 41. The issue is that the compiler cannot statically see that l==c, and thus has to maintain the reshape and perform a dynamic safety check at run-time.

let main (n: i32) (m: i32) (k: i32) (vsss: [n][m][k]f64): d1 d2.[d1][d2]f64 =
  let c = m+m
  let l = if n != 0 then c else 0
  let (rss: [n][l]f64) =
    map (λ(vss: [m][k]f64): [l]f64 →
           let (ys: [m]f64) (zs: [m]f64) =
             map (λ(vs: [k]f64) →
                    let ys = reduce (+) 0.0 vs
                    let zs = reduce (*) 1.0 vs
                    in (ys, zs))
                 vss
           let (is: [c]i32) = iota c
           let (os: [c]f64) = map (λi → if i < m then ys[i] else zs[i+m]) is
           let rs = reshape l os
           in rs)
        vsss
  in rss

Figure 41: After inlining all functions and performing CSE, dead-code elimination, and simplification; no existential quantification is left.

This is because the computation of l is hidden behind a branch. The branch was conservatively inserted because we could not be sure that the value of vsss[0] (on Figure 40) would not be used for computing the size (size analysis is intraprocedural, and so by the time we first computed the slice, we had no insight into the definition of f). The branch was inserted for safety reasons, but is now a hindrance to further simplification. There are at least two possible solutions, both of which are used by the present Futhark compiler:

1. Give the programmer the ability to annotate the original lambda (in the source language) with the return type, including expected size. This effectively lets the programmer make the guess for us, with no branch required. The result is still checked by a reshape, but in most cases the guess will be statically the same as the computed size, and the reshape can thus be simplified away.

2. Somehow “mark” the branch as being a size computation. Then, after inlining and simplification, we can recognise such branches, and simplify them to their “true” branch, if that branch contains only “safe” expressions, where a safe expression is one whose evaluation can never fail. We have to wait until after inlining and simplification, as a function call can never be considered safe. This solution has the downside that it may affect whether an inner size of an empty array is zero or nonzero.
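A hedged sketch of the first solution on a simplified variant of the running example (concat here is the source-language concatenation; the names are illustrative). The annotated return type [c]f64 on the lambda is the programmer's declared size, so no slicing branch is needed:

let main (n: i32) (m: i32) (xss: [n][m]f64): [][]f64 =
  let c = m+m
  in map (λ(xs: [m]f64): [c]f64 → concat xs xs) xss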

let main (n: i32) (m: i32) (k: i32) (vsss: [n][m][k]f64): d1 d2.[d1][d2]f64 =
  let l = if n != 0 then m+m else 0
  let c = m+m
  let cert = assert(n == 0 || l==c)
  let (rss: [n][l]f64) =
    map (λ(vss: [m][k]f64): [l]f64 →
           let (ys: [m]f64) (zs: [m]f64) =
             map (λ(vs: [k]f64) →
                    let ys = reduce (+) 0.0 vs
                    let zs = reduce (*) 1.0 vs
                    in (ys, zs))
                 vss
           let (is: [c]i32) = iota c
           let (os: [c]f64) = map (λi → if i < m then ys[i] else zs[i+m]) is
           let rs = reshape l os
           in rs)
        vsss
  in rss

Figure 42: Separating the size-checking of a reshape from the reshape itself.


In practice, the first solution is preferable in the vast majority of cases, as it also serves as useful documentation of programmer intent in the source program. A third solution is to factor out the “checking” part of the reshape operation. This approach is sketched on Figure 42. Here, we use a hypothetical assert expression for computing a “certificate”, on which the reshape expression itself is predicated. To enable the check to be hoisted safely out of the outer map, the condition also succeeds if the outer map contains no iterations (n == 0). The Futhark compiler currently makes only limited use of this technique, as the static equivalence of sizes is a more powerful enabler of other optimisations. One may observe that in the resulting code, the shape and regularity of rss are computed and verified, respectively, before the definition of rss, and, most importantly, that the size computation and assumed-invariant verification introduce negligible overhead, i.e., O(1) operations.

let concat_shape (n: i32) (m: i32) (xs: [n]f64) (ys: [m]f64): i32 =
  n+m

let concat_value (n: i32) (m: i32) (c: i32) (xs: [n]f64) (ys: [m]f64): [c]f64 =
  let (is: [c]i32) = iota c
  let (zs: [c]f64) = map (λi → if i < n then xs[i] else ys[i+n]) is
  in zs

let f (m: i32) (k: i32) (vss: [m][k]f64): d.[d]f64 =
  let (ys: [m]f64) (zs: [m]f64) =
    map (λ(vs: [k]f64) →
           let ys = reduce (+) 0.0 vs
           let zs = reduce (*) 1.0 vs
           in (ys, zs))
        vss
  let (c: i32) = concat_shape m m ys zs
  let (rs: [c]f64) = concat_value m m c ys zs
  in rs

Figure 43: An example of applying the slicing approach to concat.

Note that the code shown on Figure 40 is the only required step we have to perform. Subsequent optimisation to eliminate existential quantification could be done in any way that is found desirable, perhaps via interprocedural analysis or slicing. Previously, we experimented with a technique based on slicing, where a function g is divided into two functions g_shape and g_value, the first of which computes the sizes of all (top-level) arrays in the latter, including the result [HEO14]. An example is shown on Figure 43, which contains a portion of the running example. The concat function has been split into concat_shape and concat_value, and the call to concat has been likewise split. In practice, sophisticated dead code removal may be necessary to obtain efficient shape functions (for example, we will need to remove shape-invariant loops), and thus our approach generally requires the compiler to possess an effective simplification engine. The Futhark compiler currently does not use this approach of splitting functions, partly because of the risk of asymptotic slowdown in the presence of recursion (which was supported at the time), and partly because mere inlining plus simplification is easier to implement and performed equally well for our purposes.


Existential subtyping   ∃x̄.ρ′ <: ∃ȳ.ρ

  ───────────────────────── (6.1)
  ∃x̄^(n).ρ <: ∃x̄^(n).ρ

  ∃z̄^(l).ρ′′ <: ∃ȳ^(m).ρ′    ∃ȳ^(m).ρ′ <: ∃x̄^(n).ρ
  ───────────────────────────────────────────────── (6.2)
  ∃z̄^(l).ρ′′ <: ∃x̄^(n).ρ

  S is bijective from x̄^(n) to ȳ^(n)    None of ȳ^(n) used free in ρ
  ────────────────────────────────────────────────────────────────── (6.3)
  ∃ȳ^(n).S(ρ) <: ∃x̄^(n).ρ

  y used free in ρ
  ───────────────────────────── (6.4)
  ∃y x̄^(n).ρ <: ∃x̄^(n).ρ

Figure 44: The subtyping relationship for existential types. The four rules describe reflexivity, transitivity, renaming, and weakening.

6.3 New Type Rules

The addition of size-dependent types requires an extension of the typing rules. The main problem is the handling of an existential context in the type of an expression. First, we define a subtyping relation on existential types. The judgment and its associated rules are shown on Figure 44. Apart from the usual rules for reflexivity and transitivity, we define alpha substitution (it is valid to change the names bound by the existential context), and weakening. In our context, weakening corresponds to making the size of an array type less concrete. For example, we can weaken the type [x]i32 to ∃x.[x]i32 (we would usually also rename x to avoid confusion). The need for weakening arises in particular due to the typing rules for let and if, as we shall see. We extend the type judgment for expressions such that it now returns a type with an existentially quantified part (which may be empty). The most interesting rules are shown on Figure 45 and discussed below. We ignore uniqueness attributes here, as they have no influence on size-dependent typing. The rule for array operators performs some abuse of notation to construct the substitution S; the intent is to construct a substitution from names used in the type scheme (even the ∀-quantified context) to the operator arguments supplying the concrete value. This is used in, for example, the size-dependent typing of iota.


Expressions   Γ ⊢ e : ∃d̄.ρ

  Γ ⊢ e : ∃x̄.ρ    ∃ȳ.ρ′ <: ∃x̄.ρ
  ──────────────────────────────── (6.5)
  Γ ⊢ e : ∃ȳ.ρ′

  ⊢p p : ((di : i32)^(l), ρ)    Γ ⊢ e1 : ∃d̄^(l).ρ
  Γ, p ⊢ e2 : ∃d̄r^(k).ρ′    No name bound by p used in ρ′
  ───────────────────────────────────────────────────────── (6.6)
  Γ ⊢ let p = e1 in e2 : ∃d̄r^(k).ρ′

  lookupfun(f) = (pi : τi)^(n) → ∃d̄^(l).τ̄′^(m)
  S = ⟨pi ↦ xi⟩^(n)    ∀i ∈ {1, ..., n}. Γ(xi) = S(τi)
  ───────────────────────────────────────────────────── (6.7)
  Γ ⊢ f x1 ... xn : ∃d̄^(l).S(τ̄′^(m))

  Γ ⊢a ai : τpi    i ∈ {1, ..., n}
  TySch(op) = (pi : τi)^(n) → ∃d̄^(l).τ̄′^(m)
  S = ⟨pi ↦ xi | where xi = ai⟩    ∀i ∈ {1, ..., n}. τpi = S(τi)
  ──────────────────────────────────────────────────────────────── (6.8)
  Γ ⊢ op ā^(n) : ∃d̄^(l).S(τ̄′^(m))

  Γ(s) = bool    Γ ⊢ e1 : ∃x̄.ρ    Γ ⊢ e2 : ∃x̄.ρ
  ──────────────────────────────────────────────── (6.9)
  Γ ⊢ if s then e1 else e2 : ∃x̄.ρ

  ⊢p p : ∃d̄^(n).ρ    Γ ⊢ e1 : ∃d̄^(n).ρ    Γ ⊢ e2 : i32    Γ ⊢ e3 : ∃d̄^(n).ρ
  ──────────────────────────────────────────────────────────────────────────── (6.10)
  Γ ⊢ loop (di : i32)^(n) p = e1 for x < e2 do e3 : ∃d̄^(n).ρ

Figure 45: Size-aware typing rules for interesting expressions. The subtyping relationship is defined on Figure 44.


The first rule states that if we can derive expression e to have some existential type ∃d̄.ρ, then e also has any existential type that is a subtype of what we derived. This rule is what allows us to weaken the typing for an expression, which is necessary in many cases. The rule for let requires that the names bound may not be present in the type of the returned value. As an example, consider an expression replicate x v. Supposing v : τ, this expression has type [x]τ while let x = y in replicate x v has type ∃d.[d]τ (with d picked arbitrarily). It is crucial that we are able to use weakening to loosen the type of the body of the let-binding, or else we could not type it. It is also in the rule for let-bindings that we bind the existential sizes to concrete variables. For example, if the return type of the function contains l existential sizes, then we require that the pattern begins with l names of type i32. Operationally, these will be bound to the actual sizes of the value returned by the function. As a slight hack, we require that these names match exactly the corresponding existential context. We can always use alpha substitution to ensure a match. For example, given a function f : (x : i32) → (y : τ) → ∃n.[x][n]τ we can derive a typing judgment for the let-binding

let (m : i32, v : [a][m]τ) = f a b in e

if we assume that a : i32 and e is well-typed, but

let (m : i32, v : [m][a]τ) = f a b in e

is a type error (note the swapped dimensions in the type of v). However, we can always further weaken the type and make all sizes existential, as in

let (m : i32, k : i32, v : [m][k]τ) = f a b in e

and in fact this is how size information is initially inserted by the size inference procedure described in Section 6.4. Using weakening, a pattern can also contain existential parts that are not immediately existential in the type. This is useful for supporting gradual simplification of existential sizes, without having the intermediate steps be ill-typed. For example, consider this expression:

let (d: i32) (a: [d]i32) = (let y = x in replicate y 0)
in e

Since the size y is bound inside of the outer let-binding, we have no choice but to leave the size d existential. If the compiler then performs copy-propagation, we obtain the following expression:


let (d: i32) (a: [d]i32) = replicate x 0
in e

The type of replicate x 0 is [x]i32, but since copy propagation is not concerned with modifying the patterns of let-bindings, we still existentially quantify the size of the result. A subsequent simplification can then be applied to remove unnecessary existential parts of patterns, by noting that d will always be x, yielding:

let (a: [x]i32) = replicate x 0
in e

The vast majority of the existential-related simplification we perform is of the trivial nature above, where gradual rewrites eventually bring the program to a form where the existential sizes can be statically resolved. From a compiler engineering point of view, it would be awkward if copy propagation (or other optimisations) were also responsible for fixing any changes to existential sizes that were caused by their simplifications. It is cleaner to separate this into multiple transformations, but this requires that the intermediate forms are well-typed. This fits well with the general philosophy in the Futhark compiler of gradual refinement, as was also shown in the previous section. This leniency will also help us when first inserting the existential information, as shown in the next section. However, making a type more existential would still require us to fix up the relevant patterns. Another interesting case is for functions and operators, due to named parameters. We assume that all arguments to the function are variable names. Intuitively, we then construct a substitution S from the parameter names in the type of the function to the concrete names of the arguments, and apply this substitution before checking whether the argument types match the parameter types. The substitution is also applied to the result type. We use a similar trick for if expressions, where both branches must return existential types with identical existential contexts. For example, consider the following expression:

if c then replicate x (iota y)
     else replicate y (iota x)

The “true” branch has type [x][y]i32, while the “false” branch has type [y][x]i32. The least existential type that can encompass both of these is ∃n m.[n][m]i32, which must thus be the type of the if expression.

6.4 Size Inference

This section presents a set of syntax-directed rules for transforming an un-annotated Futhark program into an annotated Futhark program, where all types have size information. Concretely, we are performing a syntax-directed translation that carries out the following tasks:


1. Add existential contexts to function return types to match the number of (dis- tinct) array dimensions.

2. Add extra parameters to all functions corresponding to the number of (distinct) array dimensions in their parameters.

3. Add existential contexts to all let-patterns corresponding to the existential type of the bound expression.

4. Insert reshape expressions where necessary (such as for inputs to map) to ensure that size-related restrictions are maintained.

5. Amend any type []τ such that it has form [d]τ, where d is some variable in scope. After the preceding steps have been accomplished, this can be done by looking at the context in which the type occurs and following the type rules.

We will ignore the distinction between unique and non-unique types, as these have no influence on size inference. We write d(τ) to indicate the rank (number of dimensions) of a type τ. We may also write d(v), where v is not itself a type, but something which has a type, such as a variable.

6.4.1 Fundamental Transformation For each function f in the original program, we generate the existential function fext, which returns the values returned by f, with all shapes in the return type being existentially quantified. Specifically, if the return type of a function f is τ̄^(n), then the return type of fext is ∃d̄1^(d(τ1)) ··· d̄n^(d(τn)). (τ′1, ..., τ′n), where d(τ) is the rank of τ, and τ′j is τj shape-annotated with d̄j^(d(τj)). For example, if f returns type [][]i32, fext will return type ∃d1 d2.[d1][d2]i32. Thus, after transformation, the shape of the return value of a function will be existentially quantified. Furthermore, the parameters of f are likewise annotated. An explicit i32 parameter is added for every dimension of an array parameter, with the array parameter itself annotated to refer to the corresponding i32 parameter. For example, if f takes a single parameter p of type [][]i32, then fext will take three parameters (n: i32), (m: i32), and (p: [m][n]i32). In this chapter we assume for simplicity that the original program contains no pre-existing size information, as in the un-sized IR presented in the previous chapter. This is not congruent with the actual Futhark source language, where user-provided shape invariants may be present. In the implementation, these are handled by imposing constraints on the dimensions we generate. For example, if the source program contains information relating certain dimensions of the function return type to specific function parameters, we do not generate existential dimensions, but instead refer directly to the function parameters.
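As a hedged sketch of this transformation (g and g_ext are hypothetical, not part of the running example):

-- original, un-sized:
-- let g (p: [][]i32): [][]i32 = p

-- existential version:
let g_ext (n: i32) (m: i32) (p: [m][n]i32): d1 d2.[d1][d2]i32 =
  p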


T([]1 ··· []n t, s̄^(n)) = [s1] ··· [sn] t

Figure 46: Annotating types with shapes.

Likewise, if the source program specifies that certain dimensions are identical to each other, we simply generate a single size parameter and use that for all the dimensions in question. We will use the function T(τ, s̄) defined in Figure 46 to annotate a type τ with the shapes in s̄, which is a sequence of variable names whose length is equal to the rank of τ. As an example, T([][]i32, n, m) = [m][n]i32.

6.4.2 Transformation Functions The function AexpΣ(b) computes the annotated version of the body b in the environment Σ, and returns shapes as well as values (that is, its type will contain existentials). The environment Σ is a mapping from names of arrays to lists of variable names, where element i of the list denotes the size of dimension i of the corresponding array. We will use conventional head, tail, and drop operations to manipulate these lists, as well as bracket notation for arbitrary indexing; we write the “cons” operation as x :: xs. The function Afun(f) computes the existentially-quantified function fext, by using AexpΣ to annotate the function’s body (and result) with shape information, and by modifying the function’s type as described in the beginning of Section 6.4.1. We also have a function AlamΣ(λ, r, p). This is similar to the function transformation, except that (i) we work within an environment Σ, and (ii) we know in advance the intended shape of the result (r) and parameters (p), because the result shape of an anonymous function is never existentially quantified.

6.4.3 Simple Rules

This section describes cases for the function Aexp_Σ(e). The core rules are listed in Figure 47, including the rule for let-patterns, which is likely the most important one. Here we create an existential size for every dimension of the bound variables. It is assumed that subsequent simplification will remove those that can be resolved statically. Observe how size expressions are completely removed from the annotated program, and instead replaced with the variable storing the desired size. The rule for if is completely straightforward, and simply applies the transformation to the subexpressions. Most of the rules are of this form, and have been elided. A call to a function f becomes a call to the function fext, where argument sizes are passed explicitly.


Aexp_Σ(let (vi : τi)^(n) = e1 in e2) =
  let (sizes(τi))^(n) (vi : T(τi, sizes(τi)))^(n) = Aexp_Σ(e1)
  in Aexp_Σ′(e2)
where sizes(τj) = dj^(1), ..., dj^(d(τj))   (fresh variables)
      Σ′ = (vi ↦ sizes(τi))^(n), Σ

Aexp_Σ(let (v : i32) = size k a in e) =
  let (v : i32) = Σ(a)[k] in Aexp_Σ(e)

Aexp_Σ(if e1 then e2 else e3) =
  if Aexp_Σ(e1) then Aexp_Σ(e2) else Aexp_Σ(e3)

Aexp_Σ(f x^(n)) =
  fext (Σ(xi))^(n) x^(n)

Figure 47: Simple transformation rules for expressions.

6.4.4 SOAC-related Rules

There are two issues to be dealt with when size-transforming SOACs. The first is that the type rules require the outer dimension of all input arrays to be identical. We deal with this by using reshape to give all input arrays the same outermost size as the first input array.

The second, and bigger, problem with annotating SOACs is that, in Futhark, anonymous functions cannot be existentially quantified. Hence, before evaluating the SOAC, we must know the shape of its return value. For this section, we assume that given an anonymous function λ, it is possible to compute λshape, which returns the shape of the values that would normally be returned by λ. Similarly to function calls, when transforming map λ a, there are two possible avenues of attack.

Pre-assertion First, calculate map λshape a, and assert, for each returned array, that all of its elements are identical. That is, check in advance that the map expression results in a regular array. After this check, we know with certainty the shape of the values returned by the original map computation (and that it will be regular), which we can then use to annotate the map computing the values. This strategy would require an extension with an explicit assert expression; a sketch is shown below.
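A minimal sketch of the pre-assertion strategy, assuming a hypothetical assert construct, and writing λannot for λ annotated with the now-known result shape s:

let ss = map λshape a    -- one result shape per input element
let s = ss[0]
let ok = assert (reduce (&&) true (map (λs’ → s’ == s) ss))
in map λannot a          -- now known to produce a regular array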

Intra-assertion Alternatively, we can compute λshape a[0], the shape of the first element of the result. Then, we modify λ to explicitly reshape its result to the same shape as that computed for the first element; we call this modified version λchecking.


Aexp_Σ(map λ a^(n)) =
  let (s1^(2) ··· s1^(d(τ1)), ..., sm^(2) ··· sm^(d(τm))) = λshape (ai[0])^(n)
  in map λchecking (shapeup(ai))^(n)
where (τ^(m)) = return type of λ
      shapeup(ai) = reshape (head(Σ(a1)) :: tail(Σ(ai))) ai
      ri = si^(2) :: ··· :: si^(d(τi))
      pi = tail(Σ(ai))
      λchecking = Alam_Σ(λ, r^(m), p^(n))

Figure 48: Transformation for map using the intra-assertion strategy.

This is the approach we used in Figure 40, and which we describe in Figure 48. First, the shape slice of λ is applied to the first element of the input, which yields one result for every dimension of every array-typed return value of λ. (For brevity we have elided the branch guarding against the case where the input arrays are empty.) This produces a prediction of the result size of λ, which is incorporated into the construction of λchecking, which explicitly reshapes its result to match the prediction.

Then we apply λchecking to the original input, although with every input array reshaped such that the outer dimension matches the outer dimension of the first input array, and the other dimensions unchanged. This is to respect the type rule that all input arrays to map (and the other SOACs) must have the same outer size.
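As an illustrative instance with hypothetical names, assume Σ(xss) = [n, m] and a simple nested map; the intra-assertion strategy produces code along these lines:

let s = m                             -- λshape applied to xss[0] yields m
let rss = map (λ(rs: [m]i32) →
                 let rs’ = map (+1) rs
                 in reshape (s) rs’)  -- λchecking asserts the predicted shape
              (reshape (n, m) xss)    -- shapeup forces the common outer size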

The former approach is only efficient if λshape is cheap compared to λ; in practice, it must not contain any loops. The latter approach is limited in that it forces shape checking inside the value computation, which means that we do not know in advance whether the shape annotation is correct. However, in practice, the reshape operations often end up being statically simplified away. Hence, our compiler always applies the intra-assertion rule, with the expectation that a later optimisation will remove the assertions, if possible.

Similar approaches are used for the remaining SOACs. The cases for reduce, scan, scatter, and stream_red are simple because the correct return shape for the anonymous functions can be deduced from the inputs to the SOAC. The case for stream_map is simple because the return shape must match the chunk size. These cases are all handled by inserting reshape expressions in the functions.


6.5 Related Work

An important piece of related work is the FISh programming language [Jay99], which uses partial evaluation and program specialization to resolve shape information at compile time; the semantics of FISh guarantees that this is possible. A similar approach is used in F̃ [Sha+17], with the specific motivation of being able to efficiently pre-allocate memory. Futhark uses approximately the same strategy, but adds existential types to handle constructs such as filter, at the cost of no longer being able to fully resolve shape information statically in all cases.

Much work has also gone into investigating expressive type systems, based on dependent types, which allow for expressing more accurately the assumptions of higher-order operators for array operations [TC13; TG11; TG09]. Compared to the present work, such type systems may give the programmer certainty about particular execution strategies implemented by a backend compiler. The expressiveness, however, comes at a price: advanced dependent type systems are often difficult to program with, and modularity and reusability of library routines require the end programmer to grasp, in detail, the underlying and often complicated type system. Computer algebra systems [Wat+90; Wat03] have also long provided a compelling application of dependent types for expressing accurately the rich mathematical structure of applications, but inter-operating across such systems remains a significant challenge [Chi+04; OW05].

A somewhat orthogonal approach has been to extend the language operators such that size and bounds-checking invariants always hold [ED14], the downside being that non-affine indexing might appear. The Futhark strategy is instead to rely on advanced program analysis and compilation techniques to implement a pay-as-you-go scheme for programming massively parallel architectures.

Another strand of related work is the work on effect systems for region-based memory management [Tof+04], in particular the work on multiplicity inference and region representation analysis in terms of physical-size inference [Vej94; BTV96]. Whereas the goal of multiplicity inference is to determine an upper bound on the number of objects stored into a region at runtime, physical-size inference seeks to compute an upper bound on the number of bytes stored in a single write. Compared to the present work, multiplicity inference and physical-size inference are engineered to work well for common objects such as pairs and closures, but the techniques work less well for objects whose sizes are determined dynamically.

Chapter 7

Fusion

This chapter describes the approach taken by the Futhark compiler to perform loop fusion on SOACs. Our approach is capable of handling both vertical fusion, where two SOACs are in a producer-consumer relationship, and horizontal fusion, where two otherwise independent SOACs take the same array as input. Like all other optimisations in the Futhark compiler, the approach is based on syntactic rewriting of abstract syntax trees. The core of the technique has been previously described in my master’s thesis [Hen14]. The current thesis makes three new contributions:

1. A simple technique for integrating horizontal fusion in the existing framework. The previously published work did not support horizontal fusion at all.

2. Vertical fusion of map into scan and map into scatter.

3. Fusion rules for the streaming SOACs stream_red and stream_map.

However, before we can move on to the new contributions, we must establish the basic fusion algorithm. The algorithm identifies pairs of SOACs that can be fused. The rules by which we combine SOACs through fusion are called the fusion algebra. The fusion algebra used by the Futhark compiler has as its central goals to never duplicate computation and never reduce available parallelism. This ensures that the asymptotics of the program under optimisation are not affected. Instead, the goal of fusion is to eliminate the overhead of storing intermediate results in memory.

Most fusion algorithms in the literature are unable to handle fusion across zip/unzip, and more generally the case where the output of a producer is used by several consumers. The algorithm used by the Futhark compiler is capable of fusing such cases whenever this is possible without duplicating computation, as demonstrated in Figure 49. The linchpin of this capability is our choice of internal representation, in which arrays of tuples (and therefore zip/unzip) do not occur.


(a) Before fusion:

let b = map (+1) a
let c = map (+2) b
let d = map (+3) b
in map (+) c d

(b) After fusion:

map (λx → let b = x + 1
          let c = b + 2
          let d = b + 3
          in c + d)
    a


Figure 49: Fusing multiple consumers without duplicating computation.

This chapter covers two main themes: Section 7.2 describes our aggressive fusion algorithm, which in particular supports fusion for producers whose results are used by multiple consumers, without duplicating computation. Section 7.3 describes the rewrite rules used to fuse two SOACs. However, first we define an extended set of SOACs with better fusion properties than the SOACs used in previous chapters.

7.1 New SOACs

The SOACs we have used so far are a close match to the SOACs of the source Futhark language. However, they do not permit a fusion algebra as rich as we desire. Using the previously shown SOACs, there is not even a way to fuse a composition of map and reduce. This section will introduce a new set of SOACs that, while perhaps not as aesthetically pleasing and orthogonal as those in the source language, have two important qualities:

1. Their fusion properties are much more powerful, permitting, for example, both vertical and horizontal fusion of map with reduce or scan.

2. They permit an efficient mapping to parallel code. This requirement prevents us from defining overly complicated SOACs that fuse with everything, but no longer have parallel semantics. This issue is examined in more depth in Chapter 8.

The types of the new SOACs are shown in Figure 50. An individual description now follows, which also shows how instances of the previous SOACs are mapped to the new ones. It may prove useful to refer to the types while reading the descriptions.

map is unchanged from before.

scatter now takes a functional argument, which maps n input arrays to m pairs of indexes and values, which are written to the m destination arrays if the indexes


are in bounds. What we previously wrote as

scatter dest is vs

we would now write as

scatter (λi v → (i, v)) (dest) is vs

The dest is in parentheses to distinguish the destination arrays from the input arrays.

redomap is a composition of map and reduce that permits a particularly efficient implementation. It corresponds to a variation of the redomap operator from Section 3.2.2. A redomap has two functional arguments: an associative reduction operator and a fold function. The fold function takes as parameters a current accumulator and an element of the input, and returns m + l values. The first m are called the reduced results, and are reduced with the reduction operator, while the remaining l mapped results are simply returned as an array in the final result. We shall soon see how this enables horizontal fusion. The relationship between the source-language reduce and redomap is given by the following equivalence:

reduce f x a = redomap f f x a

The way we construct the fold function in a redomap ensures that the mapped results do not depend on the accumulator. This trivially holds when a redomap is first constructed from a reduce (because there are no mapped results), and it is maintained by the fusion algebra. This is crucial for generating parallel code from redomap. As an example of redomap fusion, consider the following expression fragment:

let (ys: [n]i32) = map (λx → 0 - x) xs
let (zs: [n]i32) = map (λx → x + 1) xs
let (sum: i32) = reduce (+) (0) xs
in (ys, zs, sum)

This can be fused into a single redomap:


let (sum: i32) (ys: [n]i32) (zs: [n]i32) =
  redomap (+)
          (λa x → (a + x,  -- reduced result
                   0 - x,  -- mapped result
                   x + 1)) -- mapped result
          (0) xs
in (ys, zs, sum)

This fusion operation would not be possible without the notion of mapped results.

scanomap is similar to redomap, but performs a scan instead of a reduction. The following equivalence holds:

scan f x a = scanomap f f x a

stream_par functions as a unification of the stream_map and stream_red constructs from the source language. As with redomap, the chunk operator returns m + l values, of which the first m take part in the reduction, and the latter l are simply returned. One restriction is that the latter l values must all be arrays of size x, where x is the size of the chunk of the input given to the operator.

stream_seq is a SOAC used to fuse otherwise unfusible cases, without losing potential parallelism, by performing what is semantically a fold over chunks of the input array. The chunk operator takes the current values of the accumulator, as well as chunks of the n input arrays, and returns new values for the accumulator. We can recover all parallelism in the chunk operator by applying it to a “chunk” containing the full input arrays, or fully sequentialise by adding an outer sequential loop and setting the chunk size to 1; a sketch of these two extremes follows below. This aids in efficient sequentialisation. We discuss stream_seq in greater detail in Section 7.3.2.
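As a rough sketch of why stream_seq aids efficient sequentialisation (illustrative, not the compiler’s actual rewrite; blank is a hypothetical preallocated output), the two execution extremes of let (acc, ys) = stream_seq f (v) xs, for xs of size n, are:

-- (1) Recover all parallelism: a single chunk containing all of xs.
let (acc, ys) = f n v xs

-- (2) Fully sequentialise: chunk size 1 under a sequential do-loop.
let (acc, ys) =
  loop (a, out) = (v, blank) for i < n do
    let (a’, y) = f 1 a xs[i:i+1]
    in (a’, out with [i:i+1] ← y)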

We exclude filter from our discussion of fusion. While filter does have useful fusion properties (in particular with reduce/redomap), the current implementation in the Futhark compiler implements filter by a decomposition into scan and scatter, as sketched below.
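A minimal sketch of that decomposition, for a hypothetical predicate p, dummy fill value, and xs of size n; it relies on scatter ignoring out-of-bounds indexes:

let flags = map (λx → if p x then 1 else 0) xs
let offs = scan (+) 0 flags                  -- inclusive prefix sum
let k = offs[n-1]                            -- number of elements kept
let is = map (λf o → if f == 1 then o - 1 else -1) flags offs
in scatter (λi v → (i, v)) (replicate k dummy) is xs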

7.2 The Overall Fusion Algorithm

While the foundations of the fusion algorithm were developed prior to this the- sis [HO13], we will summarise the basic technique. The following is adapted from the author’s master’s thesis [Hen14].


op : TySch(op)

map        : ∀d ᾱ^(n) β̄^(m).
             (αi^(n) → (βi^(m))) → ([d]αi)^(n) → ([d]βi)^(m)

scatter    : ∀d x̄^(m) ᾱ^(n) β̄^(m).
             (α^(n) → ((i32, βi)^(m))) → ([xi]βi)^(m) → ([d]αi)^(n) → ([xi]βi)^(m)

redomap    : ∀d ᾱ^(m) β̄^(n) γ̄^(l).
             (α^(m) → α^(m) → (ᾱ^(m))) → (α^(m) → β^(n) → (ᾱ^(m), γ̄^(l)))
             → (α^(m)) → ([d]βi)^(n) → (ᾱ^(m), ([d]γi)^(l))

scanomap   : ∀d ᾱ^(m) β̄^(n) γ̄^(l).
             (α^(m) → α^(m) → (ᾱ^(m))) → (α^(m) → β^(n) → (ᾱ^(m), γ̄^(l)))
             → (α^(m)) → ([d]βi)^(n) → (([d]αi)^(m), ([d]γi)^(l))

stream_par : ∀d x ᾱ^(m) β̄^(n) γ̄^(l).
             (α^(m) → α^(m) → (α^(m)))
             → ((x : i32) → ([x]βi)^(n) → (ᾱ^(m), ([x]γi)^(l)))
             → ([d]βi)^(n) → (ᾱ^(m), ([d]γi)^(l))

stream_seq : ∀d x ᾱ^(m) β̄^(n) γ̄^(l).
             ((x : i32) → α^(m) → ([x]βi)^(n) → (ᾱ^(m), ([x]γi)^(l)))
             → (α^(m)) → ([d]βi)^(n) → (ᾱ^(m), ([d]γi)^(l))

Figure 50: Extended SOACs used for fusion and later stages of the compiler.

The entire algorithm consists of two distinct stages:

1. Traverse the program, collecting SOAC expressions and fusing producers into consumers where possible. The end result is a mapping from SOACs in the original program to replacement SOAC expressions (the results of fusion operations). This is called the gathering phase.

2. Traverse the program again, using the result of the gathering phase to replace SOAC expressions with their fully fused forms. This may lead to dead code, as the output variables of producers that have been fused into their consumers are no longer used. These can be removed using standard dead code removal.

Futhark, as a block-structured language, is suited to region-based analysis, and the fusion algorithm is indeed designed as a reduction of a dataflow graph. Our structural analysis is inspired by the T1-T2-reduction [Aho+07]. We say that a flow graph is reducible if it can be reduced to a single node by the following two transformations:

T1: Remove an edge from a node to itself.

T2: Combine two nodes m and n, where m is the single predecessor of n, and n is not the entry of the flow graph.



Figure 51: T1-T2-reduction.

Figure 51 shows a small flow graph, highlighting instances where the two reductions can apply. The overall idea is to construct a flow graph of the Futhark program, reduce it to a single node, and at each reduction step combine the information stored at the nodes being combined. By construction, Futhark always produces a reducible graph. Each node corresponds to an expression, with the successors of a node being its subexpressions. This means that we can implement the reduction simply as a bottom-up traversal of the Futhark syntax tree.

Figure 52 depicts the intuitive idea on which our fusion transformation is based. The top-left figure shows the dependency graph of a simple program, where an arrow points from the consumer to the producer. The main point is that all SOACs that appear inside the dashed circle can be fused without duplicating any computation, even if several of the to-be-fused arrays are used in different SOACs. For example, y1 is used to compute both (z1,z2) and (q1,q2).¹ This is accomplished by means of T2 reduction on the dependency graph: the rightmost child, i.e., map g ..., of the root SOAC (map h ...) has only one incoming edge, hence it can be fused. This is achieved by:

1. Replacing in the root SOAC the child’s output with the child’s input arrays

2. Inserting a call to the child’s function in the root’s function, which computes the per-element output of the child,

3. Removing duplicate input arrays of the resulting SOAC.

This is the procedure for fusing two map SOACs. Section 7.3 gives the rules used for other cases. The top-right part of Figure 52 shows the result of the first fusion, after slight simplification of the lambda bodies via copy propagation. In the new graph, the leftmost child of the root, i.e., the one computing (z1,z2), has only one incoming edge and can be fused.

¹ Note also that not all input arrays of a SOAC need be produced by the same SOAC.


[Figure 52: dependency graphs and programs for four stages of fusion: the original program (top left); after fusing map g into the root map h (top right); after fusing map f2 (bottom left); and after fusing map f1, leaving a single map nest (bottom right).]

Figure 52: Fusion by T2-transformation on the dependency graph.

The resulting graph, shown in the bottom-left figure, can be fused again, resulting in the bottom-right graph of Figure 52. At this point no further T2 reduction is possible, because the SOAC computing x1 has two incoming edges. This example demonstrates a key benefit of removing zip/unzip and using our tupleless SOAC representation: there are no intermediate nodes in the data-dependency graph between a fusible producer and consumer.

7.2.1 Optimality of Fusion

With a sufficiently rich fusion algebra, it is sometimes the case that the data-dependency graph can be reduced in multiple different ways. Consider the following example from [RLK14].


[Figure 53: dependency graphs for the three fusion possibilities.]

(a) No fusion. (b) Horizontal fusion of map and reduce into redomap. (c) Vertical fusion of the two map operations.

Figure 53: Three fusion possibilities for the example program. Figure taken from [Dyb17] with permission.

let ys = map (λx → x + 1.0) xs
let sum = reduce (+) (0.0) xs
let zs = map (λy → y / sum) ys
in zs

There are three mutually incompatible ways to fuse in this expression.

• No fusion (Figure 53a).

• Fuse the two maps together, yielding a single map that has a dependency on the reduce through sum (Figure 53c).

• Horizontally fuse the first map with the reduce, yielding a redomap. This redomap cannot be fused with the final map (Figure 53b).

The crossed-out edge indicates an incontractable edge: while there is a data dependency between the reduce and the final map, it is not of a form that permits fusion. Looking at the expression in isolation, we might be able to use hardware heuristics to determine whether it is better to fuse one way or the other. However, for larger expressions, the choice made might affect the fusibility of other SOACs that are in a producer or consumer relationship with those fused. Determining optimal fusion is equivalent to graph clustering, an NP-hard problem, although practical solutions based on integer linear programming [MS97] or graph partitioning [Kri+16] have been developed.

The Futhark compiler presently uses a greedy approach: the first successful candidate for fusion is used (where the iteration order is essentially just the bottom-up traversal of the AST). We have not yet in practise observed cases where the greedy approach leads to suboptimal code, but if such cases were to become prevalent, the techniques mentioned above could readily be adopted.


7.2.2 Invalid fusion

We must be careful not to violate the uniqueness rules when performing fusion. For example, consider the following program.

let b = map f a
let c = a with [i] ← x
in map g b

Without the constraints imposed upon us by the semantics of in-place modification, we could fuse to the following program.

let c = a with [i] ← x
in map (g ◦ f) a

However, this results in a violation of Uniqueness Rule 1, because now a is used after it has been consumed, and the resulting program is thus invalid. In general, we must track the possible execution paths from the producer SOAC to the consumer SOAC, and only fuse if none of the inputs of the producer have been consumed (in the uniqueness type sense of the word) by a let-with or function call on any possible execution path. This is easier than it may appear at first glance, as the fusion algorithm will only fuse when the consumer is within the lexical scope of the producer anyway. Another, subtler, issue is that a fused SOAC may consume more of its input than the original constituent SOACs. Consider the following expression.

let b = map (λx → x) a
in map g b

Suppose that the function g consumes its input. This means that in the expression as a whole, the array b is being consumed, but a is not (by the fusion rules, the array produced by a map has no aliases). After map-map fusion (and copy propagation), we obtain

map (λx → g x) a

This expression does consume a. While this example is contrived, it is not impossible for the mapped function to be one whose output aliases its input, which would give rise to the same problem. The solution is to track which inputs are consumed, for each SOAC that contributed to the final SOAC produced at the end of fusion. For each array that the final SOAC consumes, yet which was not consumed by any of the original SOACs, we add a copy. In the example above, this would correspond to the following.


let a’ = copy a
in map (λx → g x) a’

This ensures that the result of fusion never consumes more than the original expression(s). For this example, it also makes fusion ineffective, as the copy is just as slow as an identity map. However, this is the worst-case situation: we add at most one copy per fused SOAC input, which will never be worse than not fusing in the first place, though it also might not be an improvement.

7.2.3 When to fuse

Even when fusion is possible, it may not be beneficial, and may even be harmful to overall performance, as in the following cases.

Computation may be duplicated. In the program

let xs = map f as
let ys = map g xs
let zs = map h xs
in (ys,zs)

fusing the xs-producer into the two consumers will double the number of calls to the function f, which might be expensive. The implementation in the Futhark compiler will currently only fuse if absolutely no computation is duplicated, although this is likely too conservative. Duplicating cheap work, for example functions that use only primitive operations on scalars, is probably not harmful to overall performance, although we have not investigated this fully.

In general, in the context of GPUs, the tradeoff between duplicating computation and increasing communication is not an easy problem to solve. Accessing global memory can be more than a hundred times slower than accessing local (register) memory, hence duplicating computation may in some cases be preferable. The fusion algorithm used by the Futhark compiler is careful never to duplicate computation, although some simplification rules may duplicate small amounts of scalar computation.

Note that the expression above is a problem only when considering just vertical fusion; using horizontal fusion we can fuse it into a single map (see Section 7.3.3, where we use exactly this example).

Can reduce memory locality. Consider a simple case of fusing

map f ◦ map g


When g is executed for an element of the input array, neighboring elements may be brought into the cache, making them faster to access. This exhibits good data locality. In contrast, the composed function f ◦ g performs more work after accessing a given input element, increasing the risk that the input array is evicted from the cache before the next element is used. On GPUs, there is the added risk that the fused kernel exerts additional register pressure, which may reduce hardware occupancy (and thus latency hiding) by leaving fewer computational cores active. In this case, it may be better to execute each of the two maps as a separate kernel.

The fusion algorithm used by the Futhark compiler is not concerned with these issues. The risk of cache eviction is not very relevant on GPUs, and determining the proper size of GPU kernels is a job better left to kernel extraction (Chapter 8).

7.3 Fusion Rules

The rules of the fusion algebra describe the fusion of two SOACs with associated patterns, termed A and B, where A produces some arrays that are inputs to B. Syntactically, we write the SOACs as a let-binding without the in part, as let p = e. This is important because fusion not only rewrites the SOAC itself, but also the pattern to which it is bound. A crucial property is that, when fusing SOACs A and B, the resulting SOAC C binds (at least) the same names as A and B. This permits us to remove A entirely, and replace B with C. It is likely that some of these names will be dead after fusion, but they will be removed by subsequent simplification, not by the fusion algorithm itself.

For simplicity, when fusing A into B, we will assume that the inputs to B have been arranged such that the arrays produced by A come first. This causes no loss of generality, as such rearranging is always possible. We commit some abuse of notation when describing the composition of the lambda functions, as we invoke lambdas as if they were functions. This is not strictly permitted by the grammar we use for the core language, but it avoids a significant amount of tedious bookkeeping in the rules.

We do not explicitly describe rules for all compositions of SOACs, even those that are in principle fusible. Since any map can be transformed into a scanomap or redomap by the equivalence in Figure 54, and any redomap can be transformed into a stream_par by the equivalence in Figure 55, we elide those cases that can be handled by first transforming the SOAC into a more general form. This does, for example, mean that by the rules, map-map fusion produces a redomap, not a map, but a map can be recovered by exploiting the equivalence in the opposite direction. Reverting to less general constructs is necessary because the moderate flattening algorithm of Chapter 8 depends crucially on being able to recognise map nestings, and can also benefit from recognising simple cases of reduce and scan.


Given a SOAC

let ys = map f xs

the following SOACs are both equivalent:

let ys = scanomap nil f () xs

and

let ys = redomap nil f () xs

where nil is the anonymous function that accepts zero arguments and returns zero values (which would otherwise be written (λ →)).

Figure 54: Transforming a map to a scanomap or redomap.

Given a SOAC

let y ys = redomap ⊕ f (vs) xs

the following SOAC is equivalent:

let y ys = stream_par ⊕ h xs

where

h = λc xs′ → let y′ ys′ = redomap ⊕ f (vs) xs′
             in (y′, ys′)

Figure 55: Transforming a redomap to a stream_par. Note that the function of the produced stream_par itself contains a redomap; an implementation must take care not to apply this transformation when unnecessary for fusion, or risk nontermination. This transformation relies on the guarantee that the mapped results of a redomap cannot depend on the accumulator.

Recovering a redomap from a stream_par is more difficult, and is not attempted by the Futhark compiler. In general, a stream_par occurs only if the original source program contained a stream_red. All SOACs can be transformed into stream_seq, the advantages of which we discuss in Section 7.3.2.

Not all pairs of SOACs are covered by the following fusion rules, even when permitting transformation into stream_par. These cases are handled by transforming the SOACs involved into stream_seq, as discussed in Section 7.3.2.


7.3.1 List of Fusion Rules

map-scatter

SOAC A: let ys = map f xsA

SOAC B: let zs = scatter g (vs) ys xsB

Fuses to SOAC C: let zs = scatter h (vs) xsA xsB

where h = λx y → let y′ = f x
                 let z = g y′ y
                 in z

Note that, in contrast to other fusion rules, all outputs of SOAC A must be used by SOAC B. In the future, it is likely that scatter will gain mapped results, like redomap, to enable greater fusibility.
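A small illustrative instance of the rule, with hypothetical arrays is, vs, and dest:

let js = map (λi → i + 1) is
let r = scatter (λj v → (j, v)) (dest) js vs

-- fuses to

let r = scatter (λi v → (i + 1, v)) (dest) is vs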

redomap-redomap

SOAC A: let yr ysm yss = redomap ⊕ f (vA) xsA

SOAC B: let zr zss = redomap ⊗ g (vB) ysm xsB

Fuses to SOAC C: let yr zr zss ysm yss = redomap ⊙ h (vA, vB) xsA xsB

where h = λaA aB x y → let cA ym ys = f aA x
                       let cB z = g aB ym y
                       in (cA, cB, z, ym, ys)

      ⊙ = λaA aB bA bB → let cA = ⊕ aA bA
                         let cB = ⊗ aB bB
                         in (cA, cB)

The outputs of SOAC A are divided into three parts: yr are the reduction results. These are not used by SOAC B at all, or the fusion algorithm would not have attempted fusion in the first place (there would be an incontractable edge in the graph). The map results are divided into ysm, which are input to SOAC B, and yss, which are not. SOAC C must still produce both of these, even though it is likely that ysm will not be used afterwards.


Note also how we combine the reduction operators ⊕ and ⊗ to produce ⊙. We do this through no particular cleverness: we simply apply ⊕ and ⊗ to those parameters that correspond to the original values from SOACs A and B. This operation carries a risk, as it conceptually increases the size of the values involved in the reduction. The implementation of reductions shown in Section 4.5 uses GPU local memory to communicate between threads during the reduction, and for sufficiently large values we may eventually run out of local memory. In such cases we either have to communicate between threads in (slow) global memory, or split the reduction into multiple independent passes. The Futhark compiler currently does not handle this problem. Since fusion is performed statically (as opposed to dynamically; see Section 7.4 for alternatives), this risk is not dataset-dependent, but only proportional to the number of independent reductions present in the program being compiled. We have not yet observed this problem in practise.

scanomap-scanomap

Similar to redomap-redomap.

stream_par-stream_par

SOAC A: let yr ysm yss = stream_par ⊕ f xsA

SOAC B: let zr zs = stream_par ⊗ g ysm xsB

Fuses to SOAC C: let yr zr zs ysm yss = stream_par ⊙ h xsA xsB

where h = λc xsA′ xsB′ → let yA′ ysm′ yss′ = f c xsA′
                         let zB′ zs′ = g c ysm′ xsB′
                         in (yA′, zB′, zs′, ysm′, yss′)

      ⊙ = λaA aB bA bB → let cA = ⊕ aA bA
                         let cB = ⊗ aB bB
                         in (cA, cB)

Fusion of stream_par is very similar to redomap fusion, and carries the same potential problem regarding the combined reduction operator.

stream_seq-stream_seq

SOAC A: let yr ysm yss = stream_seq f (vA) xsA

SOAC B: let zr zs = stream_seq g (vB) ysm xsB

Fuses to SOAC C: let yr zr zs ysm yss = stream_seq h (vA, vB) xsA xsB


where h = λc aA aB xsA′ xsB′ → let yA′ ysm′ yss′ = f c aA xsA′
                               let zB′ zs′ = g c aB ysm′ xsB′
                               in (yA′, zB′, zs′, ysm′, yss′)

7.3.2 Sequential Fusion via stream_seq

There are some producer-consumer relationships that cannot be fused by the above rules; for example, a scanomap whose output is used as input to a map. This is because there is no SOAC that is able to describe the resulting composition without duplicating computation. We could fuse and produce a sequential do-loop as a result, but this would lose parallelism, which is not acceptable in general. However, if the scanomap-map composition occurs at a place that will eventually be turned into sequential code by the moderate flattening algorithm described in Chapter 8, then such fusion would be desirable. But since fusion occurs at a stage where it is not yet known which parts of the program will be executed in parallel and which will be sequential, we have a conundrum.

The solution is to fuse to a SOAC that permits the recovery of all original parallelism, yet can also be turned into efficient sequential code. We use stream_seq for this purpose. Any SOAC can be transformed into a stream_seq, so by transforming two SOACs that cannot otherwise be fused by the fusion algebra into stream_seqs, then using the rule for stream_seq fusion, we can in effect fuse any two SOACs.

For example, the transformation from a scanomap into a stream_seq is shown in Figure 56. The idea is for each chunk to perform a scanomap, then add to the result the carry produced by processing the previous chunk. The carry for a chunk corresponds to the last element of the scanned chunk; this works only because we are guaranteed that the scan operator ⊕ is associative. The initial value of the carry is the neutral element for ⊕.

The utility of stream_seq is that we can recover the parallelism of the original formulation simply by setting the chunk size to the full size of the input array, resulting in just one chunk. While the transformation in Figure 56 has introduced additional maps, these do not affect how much parallelism is available. Furthermore, since the carry will be set to the neutral element, frequently a constant, basic simplification may be able to remove the maps entirely.

As an example, consider the following expression fragment, where we suppose the size of xs is given by n:

let (ys: [n]i32) = scanomap (+) (+) (0) xs
let (zs: [n]i32) = map (+2) ys


Given a SOAC

let ys = scanomap ⊕ f (v) xs

the following SOAC is equivalent:

let d ys = stream_seq g (v) xs

where d are the unused final carry values and

g = λc a xs′ → let xs′′ = scanomap ⊕ f (v) xs′
               let ys1′ = map (⊕ a1) xs1′′
               ...
               let ysn′ = map (⊕ an) xsn′′
               let a1′ = ys1′[c − 1]
               ...
               let an′ = ysn′[c − 1]
               in (a′, ys′)

Figure 56: Transforming a scanomap to a stream_seq. We assume here that chunks may never be empty. Other SOACs can be transformed using a similar technique.

The above can be transformed to the following stream_seqs:

let (unused: i32) (ys: [n]i32) =
  stream_seq (λ(c: i32) (a: i32) (xs’: [c]i32): (i32, [c]i32) →
                let xs’’ = scanomap (+) (+) 0 xs’
                let ys’ = map (+ a) xs’’
                let a’ = ys’[c-1]
                in (a’, ys’))
             (0) xs
let (zs: [n]i32) =
  stream_seq (λ(c: i32) (ys’: [c]i32): [c]i32 →
                let zs’ = map (+2) ys’
                in zs’)
             () ys

Note that we use mapped results to return the chunks making up the ys array, and discard the final accumulator. The two stream_seqs can be fused as follows:


let (unused: i32) (ys: [n]i32) (zs: [n]i32) =
  stream_seq (λ(c: i32) (a: i32) (xs’: [c]i32): (i32, [c]i32, [c]i32) →
                let xs’’ = scanomap (+) (+) 0 xs’
                let ys’ = map (+ a) xs’’
                let a’ = ys’[c-1]
                let zs’ = map (+2) ys’
                in (a’, ys’, zs’))
             (0) xs

We can recover the original parallelism simply by inlining the body of the anonymous function, preceded by explicit bindings of c, a, and xs’ to values corresponding to a single chunk of size n covering all of xs, and followed by a binding of ys to the result:

let c = n
let a = 0
let (xs’: [c]i32) = reshape (c) xs
let (xs’’: [c]i32) = scanomap (+) (+) (0) xs’
let ys’ = map (+ a) xs’’
let a’ = ys’[c-1]
let zs’ = map (+2) ys’
let (ys: [n]i32) = reshape (n) ys’
let (zs: [n]i32) = reshape (n) zs’

Note the need for reshapes to make the size-dependent typing work out. An application of copy propagation, constant folding, and other straightforward simplifications, in particular removal of identity reshapes, then recovers the original expression fragment. On the other hand, we can also transform the above stream_seq into a sequential loop by forcing the chunk size to unit and inserting an enclosing sequential do-loop:


let ys_blank = replicate n 0
let zs_blank = replicate n 0
let (ys, zs) =
  loop (a, ys_out, zs_out) = (0, ys_blank, zs_blank) for i < n do
    let c = 1
    let xs’ = xs[i:i+c]
    let xs’’ = scanomap (+) (+) 0 xs’
    let ys’ = map (+ a) xs’’
    let a’ = ys’[c-1]
    let zs’ = map (+2) ys’
    let ys_out’ = ys_out with [i:i+c] ← ys’
    let zs_out’ = zs_out with [i:i+c] ← zs’
    in (a’, ys_out’, zs_out’)

(We cheat a bit notation-wise with the construction of arrays such as xs’ and ys_out’: according to the grammar used thus far, indexing can only produce single elements, but here we extract an entire slice, and likewise perform an in-place update of an entire range of an array. These can be turned into do-loops if desirable.) Assuming straightforward simplification rules, including the removal of SOACs on single-element input arrays, as well as of unused loop results, we obtain the following:

let ys_blank = ...
let zs_blank = ...
let (ys, zs) =
  loop (a, ys_out, zs_out) = (0, ys_blank, zs_blank) for i < n do
    let x = xs[i]
    let y = a + x
    let z = 2 + y
    let ys_out’ = ys_out with [i] ← y
    let zs_out’ = zs_out with [i] ← z
    in (y, ys_out’, zs_out’)

This loop corresponds to a sequential fusion of the original scanomap-map composition, with no intermediate arrays.

7.3.3 Horizontal Fusion

Horizontal fusion can be treated as a special case of vertical fusion, with the same fusion rules. We simply consider the set of names in the producer-consumer relationship to be empty, and only fuse with respect to the unconsumed names. For example, for the map-map fusion rule, we let ysm be empty. Two SOACs can be horizontally fused if their inputs have the same outer size, there are no data dependencies between

them, and there are no uniqueness issues that would prevent the result of fusion from being safe. This is integrated easily with the fusion algorithm, simply by adding edges in the graph between SOACs that have the same outer size and are not separated by being in different loop or if expressions.

While horizontal fusion provides a minor optimisation in reducing array traversals, its major purpose is as an enabler of vertical fusion. For example, consider the previous example of an expression that cannot be fused, using just vertical fusion, without duplicating computation:

let xs = map f as
let ys = map g xs
let zs = map h xs
in (ys,zs)

Using horizontal fusion, we can fuse the expressions producing ys and zs as follows:

let xs = map f as
let (ys, zs) = map (λx → (g x, h x)) xs
in (ys, zs)

This then permits vertical fusion with the expression producing xs:

let (ys, zs) = map (λa → let x = f a
                         in (g x, h x))
                   as
in (ys, zs)

7.4 Related Work

Loop fusion is an old technique, dating back at least to the seventies [Che77], with loop fusion in a parallel setting being covered in [MP90]. In imperative languages, the term “fusion” typically refers to horizontal, not vertical, fusion. Most functional languages primarily make use of vertical fusion, because the rewrite rules are more local. Single Assignment C [GS06] (SaC) is, however, a prominent example of a functional language that incorporates horizontal fusion.

Data-Parallel Haskell (DPH) [Cha+07] makes use of aggressive inlining and rewrite rules to perform fusion, including expressing array operations in terms of streams [CSL07], which have previously been shown to be easily fusable. While DPH obtains good results, this form of rewrite rule is quite limited: it provides an inherently local view of the computation, and would be unable to cope with the restrictions imposed by in-place array updates, or to fuse when an array operation is used multiple

times. The Glasgow Haskell Compiler itself also bases its list fusion on rewrite rules and cross-module inlining [JTH01].

The Repa [Kel+10] approach to fusion is based on a delayed representation of arrays at run-time, which models an array as a function from index to value. With this representation, fusion happens automatically through function composition, although this can cause duplication of work in many cases. To counteract this, Repa lets the user force an array, by which it is converted from the delayed representation to a traditional sequence of values. The pull arrays of Obsidian [CSS12] use a similar mechanism. Accelerate [McD+13] uses an elaboration of the delayed-arrays representation from Repa, and in particular manages to avoid duplicating work. All array operations have a uniform representation as constructors for delayed arrays, on which fusion is performed by tree contraction. Accelerate supports multiple arrays as input to the same array operation (using a zipWith construct). Although arrays are usually used at least twice (once for getting the size, once for the data), it does not seem that Accelerate handles the difficult case where the output of an array operation is used as input to two other array operations.

NESL has been extended with a GPU backend [BR12], for which the authors note that fusion is critical to the performance of the flattened program. The NESL approach is to use a form of copy-propagation on the intermediary code, and to lift the resulting functions to work on entire arrays. Their approach only works for what we would term map-map fusion, however.

The with-loops of SaC can fulfill the same role as redomap [GHS06], although the fusion algorithm in which they are used is very different: in SaC, producer-consumer and horizontal fusion are combined in a general framework of with-loop fusion. The with-loops cannot, however, fulfill the role taken by stream_seq or stream_par in permitting efficient sequentialisation.

Chapter 8

Moderate Flattening and Kernel Extraction

This chapter presents a transformation that aims to enhance the degree of statically-exploitable parallelism by reorganizing imperfectly nested parallelism into perfect SOAC nests. The outer levels of the resulting nestings correspond to map operators, which are trivial to map to GPUs, and the innermost one is an arbitrary SOAC or sequential code. In essence, the transformation seeks to rewrite the program to express patterns of parallelism that have a known efficient implementation on GPUs. The simplest such pattern is a perfect nest of maps, where the innermost function contains arbitrary sequential code. This pattern maps directly to a single GPU kernel, but there are other interesting patterns that correspond to, for example, segmented scans and reductions.

In a purely functional setting, Blelloch’s transformation [Ble90] flattens all available parallelism, while asymptotically preserving the depth and work of the original nested-parallel program. In our setting, this corresponds to interchanging all sequential loops outwards, and forming map-nests that contain only simple scalar code. While asymptotically efficient, the approach is arguably inefficient in practise [BR12], for example because it does not account for locality of reference, and pays a potentially large cost in memory usage to extract parallelism that may not be necessary. For example, a fully flattened implementation of matrix multiplication requires O(n³) storage, where the usual implementation uses only O(n²).

Our algorithm, presented below, builds on map-loop interchange and map distribution, exploiting the property that it is always safe to interchange inwards or to distribute a parallel loop [KA02]. Our algorithm attempts to exploit some of the efficient top-level parallelism. Two important limitations are:

1. We do not exploit parallelism inside if branches, as this generally requires expensive filter operations.


(a) Program before distribution:

let (asss, bss) =
  map (λps: ([m][m]i32, [m]i32) →
        let ass = map (λp: [m]i32 →
                        let cs = scan (+) 0 (iota p)
                        let r = reduce (+) 0 cs
                        let as = map (+r) ps
                        in as)
                      ps
        let bs = loop (ws=ps) for i < n do
                   let ws’ = map (λas w: i32 →
                                   let d = reduce (+) 0 as
                                   let e = d + w
                                   let w’ = 2 * e
                                   in w’)
                                 ass ws
                   in ws’
        in (ass, bs))
      pss

(b) Program after distribution:

let rss = map (λps: [m]i32 →
                map (λp: i32 →
                      let cs = scan (+) 0 (iota p)
                      let r = reduce (+) 0 cs
                      in r)
                    ps)
              pss
let asss = map (λps rs: [m][m]i32 →
                 map (λr → map (+r) ps)
                     rs)
               pss rss
let bss = loop (wss=pss) for i < n do
            let dss = map (λass: [m]i32 →
                            map (λas: i32 → reduce (+) 0 as)
                                ass)
                          asss
            in map (λws ds: [m]i32 →
                     map (λw d: i32 →
                           let e = d + w
                           let w’ = 2 * e
                           in w’)
                         ws ds)
                   wss dss

Figure 57: Extracting kernels from a complicated nesting. We assume pss : [m][m]i32. Exploitable (outermost or perfectly nested) levels of parallelism are highlighted in blue.

2. We terminate distribution when it would introduce irregular arrays, as these obscure access patterns and prevent further spatial- and temporal-locality opti- mizations.

Our algorithm is less general than full flattening, but generates more analysable code in the common case, for programs that do not require the generality of flattening. This enables the optimisations covered in Chapter 9. Multi-versioned code, discussed in Section 8.3, could be used to apply full flattening as a fallback for those cases where moderate flattening is incapable of extracting sufficient parallelism.
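To make the earlier storage claim concrete, matrix multiplication can be written with nested SOACs as follows (a sketch; ass and bss are assumed to be n × n matrices):

map (λas →
      map (λbs → reduce (+) 0 (map (*) as bs))
          (transpose bss))
    ass

Full flattening would manifest the O(n³) intermediate products computed by the innermost map (*) as bs, whereas moderate flattening can distribute only the two outer maps and sequentialise the inner reduce and map, keeping storage at O(n²).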

8.1 Example of Moderate Flattening

Figures 57a and 57b show the application of our algorithm to a contrived but illustrative example that demonstrates many of the flattening rules exploited in the generation of efficient code for the various benchmark programs. The original

program consists of an outer map that encloses (i) another map operator implemented as a sequence of maps, reduces, and scans, and (ii) a loop containing a map whose implementation is given by a reduce and some scalar computation. As written, only one level of parallelism (for example, the outermost) can be statically mapped to GPGPU hardware.

Our algorithm distributes the outer map across the enclosed map and loop bindings, performs a map-loop interchange, and continues distribution. The result consists of four perfect nests: a map-map and a map-map-map nest at the outer level, and a map-map-reduce (segmented reduction) and a map-map nest contained inside the loop. In the first map-map nest, the scan and reduce are sequentialized because further distribution would generate an irregular array, as the size p of cs is variant to the second map.

8.2 Rules for Moderate Flattening

Figure 58 lists the rules that form the basis of the flattening algorithm. We use Σ to denote map nest contexts, which are sequences of map contexts, written M x y, where x denotes the variables bound by a map operator over the arrays held in y. The flattening rules, which take the form Σ ⊢ e ⇒ e′, specify how a source expression e may be translated into an equivalent target expression e′ in the given map nest context Σ. Several rules may apply in a given situation. The particular algorithm used by Futhark bases its decisions on crude heuristics related to the structure of the map nest context and the inner expression. Presently, nested stream_pars are sequentialised, while nested maps, scans, and reduces are parallelised. These rules were mainly chosen to exercise the code generator, but sequentialising stream_par is the right thing to do for most of the data sets we use in Chapter 10. For transforming the program, the flattening algorithm is applied (in the empty map nest context) to each map nest in the program.

Rule G1 (together with rule G3) allows for manifestation of the map nest context Σ over e. Whereas rule G1 can be applied for any e, the algorithm makes use of this rule only when no other rules apply. Given a map nest context Σ and an instance of a map SOAC, rule G2 captures the map SOAC in the map nest context. This rule is the only rule that extends the map nest context.

Rule G4 allows for map fission (map (f ◦ g) ⇒ map f ◦ map g), in the sense that the map nest context can be materialized first over e1 and then over e2, with appropriate additional context to allow for access to the now array-materialized values that were previously referenced through the let-bound variables a0. The rule can be applied only if the intermediate arrays formed by the transformation are ensured to be regular, which is enforced by a side condition in the rule. To avoid unnecessary flattening of scalar computations, let-expressions are rearranged using a combination of let-floating [PPS96] and tupling, grouping scalar code together in a single let-construct. In essence, inner SOACs are natural splitting points for fission. For example,


Basic Flattening Rules: Σ ⊢ e ⇒ e′

  Σ ⊢ map (λx → e) y ⇒ e′
  ───────────────────────── (G1)
  Σ, M x y ⊢ e ⇒ e′

  Σ, M x y ⊢ e ⇒ e′
  ───────────────────────── (G2)
  Σ ⊢ map (λx → e) y ⇒ e′

  ───────────────────────── (G3)
  ∅ ⊢ e ⇒ e

  Σ = M xp yp, ..., M x1 y1
  Σ′ = M (xp, ap−1) (yp, ap), ..., M (x1, a0) (y1, a1)
  ap, ..., a1 fresh names
  size of each array in a0 invariant to Σ
  Σ ⊢ e1 ⇒ e1′        Σ′ ⊢ e2 ⇒ e2′
  ─────────────────────────────────────────── (G4)
  Σ ⊢ let a0 = e1 in e2 ⇒ let ap = e1′ in e2′

  g = reduce (λy^(2p) → e) n^(p)
  f = map (λy^(2p) → e)
  Σ ⊢ map g (transpose z0) ··· (transpose zp−1) ⇒ e′
  ─────────────────────────────────────────── (G5)
  Σ ⊢ reduce f (replicate k n)^(p) z^(p) ⇒ e′

  Σ ⊢ rearrange (0, 1 + k0, ..., 1 + kn−1) y ⇒ e
  ─────────────────────────────────────────── (G6)
  Σ, M x y ⊢ rearrange k^(n) x ⇒ e

  Σ′ = Σ, M (x, y) (xs, ys)        ({n} ∪ q) ∩ (x, y) = ∅
  m = outer size of each of xs and ys
  f contains exploitable (regular) inner parallelism
  Σ ⊢ loop (zs′ = replicate m z, ys′ = ys) for i < n do map (f i q) xs ys′ zs′ ⇒ e
  ─────────────────────────────────────────── (G7)
  Σ′ ⊢ loop (z′ = z, y′ = y) for i < n do f i q x y′ z′ ⇒ e

Figure 58: The flattening rules that form the basis of the moderate flattening algorithm.


let b = x+1
let a = b+2
in replicate n a

is conceptually split as

let a = (let b = x+1 in b+2)
in replicate n a

Rule G5 allows for a reduce-map interchange, where it is assumed that the source neutral reduction element is a replicated value. The original pattern appears in k-means (see Figure 8) as a reduction with an array operator, which is inefficient on a GPU if executed as such. The interchange results in a segmented-reduce operator (applied on equally-sized segments), at the expense of transposing the input array(s). This rule demonstrates a transformation of both the schedule and (if the transposition is manifested) the data of the program being optimized; a small instance is shown at the end of this section.

Rule G6 allows for distributing a rearrange construct by rearranging the outer array (input to the map nest) with an expanded permutation. The semantics of rearrange p a is that it returns a with its dimensions reordered by a statically-given permutation p. For instance, the expression rearrange (2,1,0) a reverses the dimensions of the three-dimensional array a. For convenience, we use transpose a as syntactic sugar for rearrange (1,0,...) a, which swaps the two outermost dimensions. Similar rules can be added to handle other expressions that have a particularly efficient formulation when distributed on their own, such as concatenation (not covered in this thesis).

Finally, rule G7 implements a map-loop interchange. The simple intuition is that

map (λx → loop (x’=x) for i < n do (f x’)) xs

is equivalent to

loop (xs’=xs) for i < n do (map f xs’)

because they both produce [f^n(xs[0]), ..., f^n(xs[m-1])]. The rule is sufficiently general to deal with all variations of variant and invariant variables in the loop body. The side condition in the rule ensures that z ⊆ q are free variables and thus invariant to Σ. The rule is applied only if the body of the loop contains inner parallelism, such as maps; otherwise its application is not beneficial (as an example, it would change the Mandelbrot benchmark from Section 10.1.4 to have a memory-bound rather than a compute-bound behavior). However, rule G7 is essential for efficient execution of the LocVolCalib benchmark (Section 10.1.3), because a loop separates the outer map from four inner maps.
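Returning to rule G5, a minimal instance with p = 1 (illustrative; zss is assumed to be an array of rows of length k):

-- A reduction with an array operator: element-wise summation of
-- the rows of zss.
reduce (λxs ys → map (+) xs ys) (replicate k 0) zss

-- Rule G5 interchanges it into a segmented reduction over the
-- transposed input, computing the same per-column sums.
map (λcol → reduce (+) 0 col) (transpose zss)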

We conclude by remarking that some of the choices made in the flattening rewrite rules, about how much parallelism to exploit and how much to sequentialize efficiently, are arbitrary, because there is no size that fits all. For example, we currently sequentialize a stream_par if it is inside a map nest, but the algorithm can easily be made more aggressive. To see the problem, consider an expression that computes the sum of the rows of a matrix:

map (λxs → reduce (+) 0 xs) xss

Assume that xss is of shape n × m. If n is sufficiently large (tens of thousands on a GPU), then it is most efficient to turn the reduce into a sequential loop. If n is very small, then perhaps it is best to sequentialise the map and keep the reduce parallel. Or perhaps we must exploit the parallelism of both the map and the reduce to fully saturate the parallel capacity of the target hardware. Since n and m are not known at compile-time, we cannot generate code that is optimal for all cases. A more general solution would be to generate all three possible code versions, and to discriminate between them at runtime based on dynamic predicates that test whether the exploited parallelism is enough to fully utilize the hardware. This idea is discussed further below.

8.3 Multi-Versioned Code

The flattening rules of the preceding section are able to directly decompose map nestings via loop distribution. However, they are not in general able to exploit parallelism nested within the functional argument of a non-map SOAC. Some rules, for example G5, which performs reduce-map interchange, expose maps that can then be further handled by the moderate flattening algorithm. This section presents further rules that decompose SOACs into maps and other SOACs. Furthermore, we also introduce a rule for multi-versioned code, by which a single SOAC may be flattened in two different ways, with one of the two alternatives picked at runtime via a branch. While the Futhark compiler contains a prototype implementation of multi-versioned code, it is not yet used by default. Hence, the ideas in this section are not as empirically proven as the rest of the thesis.

Multi-versioned code does not change the overall framework of moderate flattening. Figure 59 simply provides additional rules that are more aggressive in decomposing SOACs into constituent parts. The most crucial rule is E1, which states that if some expression e can be flattened in two distinct ways e1′ and e2′, then e can be flattened to an if expression that uses some runtime predicate to determine which of e1′ or e2′ to apply. We do not specify the form of the predicate in detail here. One common case is a comparison of the amount of parallelism exploited by e1′ against some runtime-given constant that characterises the hardware the program is executing on. Determining the right form of the predicates, and the constants they may employ, is in practise a form of auto-tuning.

Rule E2 describes how a redomap can be taken apart into a map and a reduce, which is then (potentially) further flattened. The map uses a modified function f′, in which we fix the first n parameters of the original function f (corresponding to the accumulator) to the neutral value. Since redomap returns both reduced and mapped results, we are careful to pass only the former to the reduce. An almost identical rule can be defined for scanomap.
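For instance, rule E1 might produce code of the following shape, where t is a hypothetical runtime-given threshold, n is the amount of outer parallelism, and e1’ and e2’ are the two flattened versions:

let v = n >= t   -- is the outer parallelism alone enough for the hardware?
in if v then e1’ -- version exploiting only outer parallelism
   else e2’      -- version also exploiting inner parallelism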


More Flattening Rules: Σ ⊢ e ⇒ e′

(E1)
    Σ ⊢ e ⇒ e₁′      Σ ⊢ e ⇒ e₂′      e₁′ ≠ e₂′      v is fresh
    ──────────────────────────────────────────────────────────────────
    Σ ⊢ e ⇒ let v = predicate in if v then e₁′ else e₂′

(E2)
    a⁽ⁿ⁾ b⁽ᵐ⁾ c⁽ⁿ⁾ are fresh      f′ = f v⁽ⁿ⁾
    op₁ ≡ map f′ x      op₂ ≡ reduce ⊕ (v⁽ⁿ⁾) a⁽ⁿ⁾
    Σ ⊢ let (a⁽ⁿ⁾, b⁽ᵐ⁾) = op₁ in let c⁽ⁿ⁾ = op₂ in (c⁽ⁿ⁾, b⁽ᵐ⁾) ⇒ e′
    ──────────────────────────────────────────────────────────────────
    Σ ⊢ redomap ⊕ f (v⁽ⁿ⁾) x ⇒ e′

(E3)
    a⁽ⁿ⁾ b⁽ᵐ⁾ c⁽ⁿ⁾ are fresh
    op₁ ≡ map (unchunk(n, m, f)) x      op₂ ≡ reduce ⊕ (v⁽ⁿ⁾) a⁽ⁿ⁾
    Σ ⊢ let (a⁽ⁿ⁾, b⁽ᵐ⁾) = op₁ in let c⁽ⁿ⁾ = op₂ in (c⁽ⁿ⁾, b⁽ᵐ⁾) ⇒ e′
    ──────────────────────────────────────────────────────────────────
    Σ ⊢ stream_par ⊕ f x ⇒ e′

(E4)
    w is the size of any xᵢ
    Σ ⊢ let c = w in let vᵢ = xᵢ in e_f ⇒ e′
    ──────────────────────────────────────────────────────────────────
    Σ ⊢ stream_seq (λc v⁽ᵏ⁾ → e_f) (v⁽ⁿ⁾) x⁽ᵏ⁾ ⇒ e′

Figure 59: Flattening rules that are more aggressive in exploiting inner parallelism. The definition of unchunk can be found on Figure 60.

unchunk(n, m, λc v⁽ᵏ⁾ : (τ⁽ⁿ⁾, ([c]τ)⁽ᵐ⁾) → e) =
    λv′⁽ᵏ⁾ : (τ⁽ⁿ⁾, τ⁽ᵐ⁾) →
      let c = 1
      let vᵢ = [vᵢ′]
      let (a⁽ⁿ⁾, b⁽ᵐ⁾) = e
      let bᵢ′ = bᵢ[0]
      in (a⁽ⁿ⁾, b′⁽ᵐ⁾)

Figure 60: Turning a chunk function into a non-chunked function by setting the chunk size to unit, and turning the resulting unit-size map-out results into single values.

results, we are careful to only pass the former to the reduce. An almost identical rule can be defined for scanomap.

Rule E3 is similar to E2, but applies to stream_par, and recovers all available parallelism by splitting it into a map and a reduce. The main difference compared to E2 is that computing the function for the map is somewhat more complicated, as shown on Figure 60.

Rule E4 is used to exploit any parallelism inside of a stream_seq. This is done by fixing the chunk size c to the full size w of the input arrays, and then inlining the function body.
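As a concrete (hypothetical) instance of rule E2, consider a redomap that sums an array while also producing the squares of its elements. The decomposition fixes the accumulator parameter of the function to the neutral element 0, and reduces the map-out sums afterwards:

  -- redomap (+) (λacc a → (acc + a, a*a)) (0) as
  -- is decomposed by rule E2 into:
  let (sums, squares) = map (λa → (0 + a, a*a)) as   -- f′ = f 0
  let total = reduce (+) (0) sums
  in (total, squares)

The map and the reduce may then be flattened further by the remaining rules.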

8.4 Kernel Extraction

The moderate flattening algorithm, as presented in this chapter, does not directly transform SOACs into flat-parallel kernels suitable for GPU execution. Instead, it merely rearranges the SOACs into simpler forms that are passed to a simple pattern matcher, which recognises forms of top-level parallelism and transforms them into an explicitly flat kernel representation of GPU kernels. In this context, top-level parallelism is a SOAC that is not embedded inside the function of another SOAC. We will not describe this operation, or the full kernel representation used by the Futhark compiler, in detail, as it pertains to straightforward but tedious book-keeping, but some details are worth mentioning.

In the implementation, there is no distinction between moderate flattening and kernel extraction: they are interleaved in the same pass. Conceptually, when rule G1 first fires, it will sequentialise the expression e, and then manifest the entire map context Σ at once, in the form of a flat parallel kernel.

Not all kernels are extracted from the simple form of a map nest containing a sequential function. Other patterns that give rise to a GPU kernel are:

• redomap or scanomap can be transformed into two GPU kernels via the techniques used in Section 4.5.

• stream_par can be transformed into two GPU kernels similarly to a redomap. Any parallelism inside the functional arguments is sequentialised.

• redomap or scanomap perfectly nested within one or more maps can be transformed into kernels implementing segmented reduction or prefix scan. The number of kernels depends on the exact implementation strategy, which is outside the scope of this thesis. The implementation strategy for segmented reductions used by the Futhark compiler is presented in [LH17].

• scatter can be transformed directly into a single GPU kernel similar to map.
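To illustrate the last case, the correspondence for scatter is direct. The following is a sketch of the semantics as used in this thesis, not the generated kernel code:

  -- scatter becomes a single map-like kernel with one thread per
  -- index/value pair: thread t writes vs[t] to position is[t] of xs.
  -- Out-of-bounds indices (such as -1) cause the write to be ignored.
  let ys = scatter xs is vs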


The bodies of the kernels produced by kernel extraction remain Futhark code, and kernel extraction maintains the original structure of the sequential code. Importantly, even though nested SOACs may not have had their potential for parallelism exploited, they remain high-level descriptions of loops, which can be used for further optimisation. We shall see examples in Chapter 9.

Later in the compiler, the high-level kernel representation is transformed into a low-level imperative IR, and from there to actual OpenCL code. However, these topics are outside the scope of this thesis.

8.5 Related Work

Futhark builds on previous work in type systems and parallel compilation techniques. Recent work [Ste+15] shows that stochastic combinations of rewrite rules open the door to autotuning. Work has been done on designing purely functional representations for OpenCL kernels [SRD17], which could in principle be targeted by our moderate flattening algorithm. The Futhark compiler presently uses a similar (but simpler) representation, the full details of which are outside the scope of this thesis.

Halide [Rag+13] uses a stochastic approach for finding optimal schedules for fusing stencils by a combination of tiling, sliding window and work replication. This is complementary to Futhark, which does not optimise stencils, nor uses autotuning techniques, but could benefit from both. In comparison, Futhark supports arbitrary nested parallelism and a flattening transformation, together with streaming SOACs that generically encode strength-reduction invariants.

Delite uses rewrite rules to optimize locality in NUMA settings [Bro+16] and proposes techniques [Lee+14] for handling simple cases of nested parallelism on GPUs by mapping inner parallelism to CUDA block and warp level. We use transposition to handle coalescing (see Chapter 9), and, to our knowledge, no other compiler matches our AST-structural approach to kernel extraction, except for those that employ full flattening, such as NESL [Ble+94; BR12], which often introduces inefficiencies and does not support in-place updates. Data-only flattening [Ber+13] shows how to convert from a nested to a flat representation of data, without affecting program structure. This would be a required step in extending Futhark to exploit irregular parallelism. In comparison, Futhark flattens some of the top-level parallelism control, while preserving the inner structure and opportunities for locality-of-reference optimizations.

Imperative GPU compilation techniques rely on low-level index analysis ranging from pattern-matching heuristics [Yan+10; Dub+12] to general modeling of affine transformations by polyhedral analysis [Ver+13; Pou+11]. Since such analyses often fight the language, solutions rely on user annotations to improve accuracy. For example, OpenMP annotations can be used to enable transformations of otherwise unanalyzable patterns [CSS15], while PENCIL [Bag+15] provides a restricted C99-like low-level language that allows the (expert) user to express the parallelism of loops and provide additional information about memory access patterns and dependencies.


The moderate flattening transformation resembles the tree-of-bands construc- tion [Ver+13] in that it semantically builds on interchange and distribution, but we use rules that directly exploit properties of the language. For example, imperative ap- proaches would implement a reduction with a vectorized operator via a histogram-like computation [RKC16], which is efficient only when the histogram size is small. In comparison, rule G5 on Figure 58 transforms a reduction with a vectorized operator to a (regular) segmented reduction, which always has an efficient implementation.

Chapter 9

Optimising for Locality of Reference

While GPUs possess impressively fast memory, the ratio of computation to memory speed is even more lopsided than on a CPU. As a consequence, accessing memory efficiently is of critical importance to obtaining high performance. Furthermore, while CPUs have multiple layers of automatic caching to help mitigate the effect of the memory wall, GPUs typically have only a single very small level-1 cache.

This chapter presents how two well-known locality-of-reference optimisations can be applied to the kernels produced by moderate flattening. The optimisations improve the memory access patterns of Futhark programs by optimising both spatial and temporal locality of reference.

Section 9.1 shows how arrays traversed in a kernel can have their representation in memory modified to ensure that the traversal is efficient. This form of optimisation for spatial locality of reference, which changes not just the code, but also the data, is not usually used by compilers.

Section 9.2 shows a technique for temporal locality of reference, via loop tiling, that uses the local memory of the GPU as a cache to minimise the amount of traffic to global memory. While loop tiling is an established optimisation technique, our approach can handle even indirect accesses. These are not supported by conventional techniques for loop tiling, including polyhedral approaches such as the ones discussed in [CSS15].

To express the optimisations we have to extend the core Futhark IR, yet again, with a few more operations. Their types are shown on Figure 61, and their semantics below.

manifest (c⁽ⁿ⁾) x is used to force a particular representation of an n-dimensional array in memory. The permutation (c₁,...,cₙ) indicates the order in which dimensions should occur. Thus, for the two-dimensional case, manifest (0,1) xss corresponds to a row-major (the default) copy of the array xss, while manifest (1,0) xss is a column-major copy. Note that the type of the result of manifest is the same as that of the input array. The only change is in how the array is located in memory, and thus the memory address computed when indexing an element at a given index. As with rearrange, the permutation must be a compile-time constant. The manifest construct differs from rearrange in that manifest creates an actual array in memory, while rearrange is merely a symbolic index space transformation.

kernel (d⁽ⁿ⁾) (λv⁽²ⁿ⁾ : τ⁽ᵐ⁾ → e) models an n-dimensional GPU kernel, loosely corresponding to a flattened form of n map-nests, where nest number i (counted from outside-in) has size dᵢ. Semantically, the kernel function is executed for every thread, and passed n global thread indices, one for every dimension, and n local thread indices, again one for every dimension (see Section 4.1). In many cases we will not be making use of the local thread indices, in which case we will simply put an underscore for the parameter names. The group size is not explicitly stated, but may be implicitly bounded through the use of local (see below). If the kernel function returns m values, then the kernel construct as a whole returns m arrays, each of shape [d₁]⋯[dₙ]. This corresponds to each thread returning a single value.

local c⁽ⁿ⁾ x may only occur within kernels, and produces an array of shape [c₁]⋯[cₙ] where every thread contributes a value x. Operationally, c⁽ⁿ⁾ corresponds to the group size of the kernel (and any kernel containing a local is thus implicitly forced to have that group size), and the resulting array is located in local memory. The subtleties of this construct are detailed further in Section 9.2.
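To summarise the distinction between rearrange and manifest drawn above, the following sketch (our illustration; yss is a name we introduce here) contrasts the two on the same array:

  -- rearrange: a symbolic index-space transformation; no data is moved.
  let yss = rearrange (1,0) xss
  -- yss[j][i] accesses the same element as xss[i][j].

  -- manifest: copies the data into the requested layout; the type and the
  -- elements are unchanged, only the flat addresses differ.
  let xss’ = manifest (1,0) xss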

op                      TySch(op)
manifest (c₁,…,cₙ)      ∀d⁽ⁿ⁾α. [d₁]⋯[dₙ]α → [d₁]⋯[dₙ]α
kernel                  ∀α⁽ᵐ⁾. (dᵢ : i32)⁽ⁿ⁾ → (i32⁽²ⁿ⁾ → α⁽ᵐ⁾) → ([d₁]⋯[dₙ]α)⁽ᵐ⁾
local                   ∀α. (cᵢ : i32)⁽ⁿ⁾ → α → [c₁]⋯[cₙ]α

Figure 61: The types of the language constructs used to express kernel-level locality of reference optimisations.
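As a small illustration of the kernel construct (our example, not taken from the compiler), a two-dimensional kernel computing an n × m multiplication table can be written as follows; each thread computes one element from its global indices, and the local indices are unused:

  -- Each of the n·m threads returns a single value, so the kernel as a
  -- whole returns an array of shape [n][m].
  let tab = kernel (n,m) (λi j _ _ → i * j)
  -- tab[i][j] = i * j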


9.1 Transposing for Coalesced Memory Access

Ensuring coalesced accesses to global memory is critical for GPU performance. Several of the benchmarks discussed in Chapter 10, such as FinPar's LocVolCalib, Accelerate's n-body, and Rodinia's CFD, k-means, Myocyte, and LavaMD, exhibit kernels in which one or several innermost dimensions of arrays are processed sequentially inside the kernel. In the context of the moderate flattening algorithm, this typically corresponds to the case where rule G1 has been applied with e being a SOAC. For this discussion, we assume that the nested SOAC is transformed to a stream_seq, removing parallelism but retaining access-pattern information.

A naive translation of a nested stream_seq would lead to consecutive threads accessing global memory with a stride equal to the size of the inner (non-parallel) array dimensions, which may generate one-order-of-magnitude slowdowns on a GPU. The Futhark compiler solves this by, intuitively, transposing the non-parallel dimensions of the array outermost, and doing the same for the result and all intermediate arrays created inside the kernel. For this thesis, we only discuss the first case, of arrays that are sequentially iterated inside a kernel, as the others are not a question of transforming an AST, but simply a question of how we allocate arrays in the first place. Our approach is guaranteed to resolve coalescing if the sequential-dimension indices are invariant to the parallel array dimensions. For example, consider the following expression:

kernel (n) (λi _ → stream_seq f (0) xss[i])

Assuming that f performs a sequential in-order traversal of its input, the expression is optimized by changing the representation of xss to be column major (the default is row major), via transposition in memory, as follows:

let xss’ = manifest (1,0) xss in kernel (n) (λi _ → stream_seq f (0) xss’[i])

The type of xss’ is the same as that of xss. The difference between xss and xss’ can be seen by computing the flat index corresponding to an index expression. Suppose xss has shape [n][m], i is the parallel (thread) index, j is the counter of a sequential loop, and flat(x) is a function that returns a one-dimensional view of an array x; then

  xss[i][j] = flat(xss)[i · m + j]    versus    xss’[i][j] = flat(xss’)[j · n + i]

It is clear that the latter gives rise to coalesced memory accesses, while the former does not (assuming non-pathological values of m).

Concretely, the compiler inspects index expressions x[y⁽ⁿ⁾] inside of kernels, where x is a rank-m array that is created prior to the kernel. The inspection may yield


a permutation c⁽ᵐ⁾ that is used to construct an array

  x′ = manifest (c⁽ᵐ⁾) x

after which x is replaced by x′ in the original index expression. Two patterns are recognised by the inspection:

Complete Index
The indices y⁽ⁿ⁾ contain at least some of the global thread indices, and the innermost global thread index can be moved to the last position in the indices via the permutation c⁽ᵐ⁾. We then copy the array with permutation c⁽ᵐ⁾. An example is a kernel

kernel (n,m) (λi j _ _ → xss[j][i] + 2)

which is transformed into

let xss’ = manifest (1,0) xss
in kernel (n,m) (λi j _ _ → xss’[j][i] + 2)

For this example, we could also have transposed the kernel dimensions themselves, followed by transposing the result:

let r = kernel (m,n) (λj i _ _ → xss[j][i] + 2)
in rearrange (1,0) r

This saves on memory traffic, since rearrange is a non-manifest index space transformation. However, this transformation affects the memory layout of the result of the kernel, which may result in non-coalesced accesses in later expressions. As a result, we prefer using manifest, which has purely local effects. Note that the pattern also applies to the case where the found permutation is the identity permutation. A useful optimisation is to remove those manifests where it can be determined that the array is already in the desired representation, either because of a rearrange that was already present in the source program, or because the optimal representation is row-major, which is generally the default for Futhark expressions.

Incomplete Index
The indices y⁽ⁿ⁾ do not fully index the array (n < m), and at least one of the indices is variant to the thread indices. An array index yᵢ is variant to a thread index if there is a (conservatively estimated) data dependency from one of


the thread indices passed to the kernel function to yᵢ. In such cases, we use manifest with the permutation (n, ..., m − 1, 0, ..., n − 1), corresponding to interchanging the sequentially traversed dimensions outermost. The (potentially wrong) assumption is that the dimensions that are not indexed will be traversed sequentially. It remains future work to perform a more detailed analysis of how the resulting array slice is traversed.

The requirement that at least one of y⁽ⁿ⁾ must be thread-variant is to avoid the pattern triggering on cases where an array is being indexed identically by all threads, such as xss[0]. Such indexes do not give rise to non-coalesced memory access.
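As an illustration of the incomplete-index pattern (our example; f is an assumed sequential chunk function), consider a three-dimensional array sliced by the thread index, so that n = 1 and m = 3, and the permutation becomes (1,2,0):

  -- Each thread slices out a matrix that we assume is traversed
  -- sequentially, so the two non-indexed dimensions go outermost.
  kernel (k) (λi _ → stream_seq f (0) xsss[i])

  -- is rewritten to
  let xsss’ = manifest (1,2,0) xsss
  in kernel (k) (λi _ → stream_seq f (0) xsss’[i])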

Note that the transformation is on index expressions, not on arrays. The same array xss may be indexed in different ways within the same kernel, and each distinct index expression may give rise to a unique new manifest array. However, if the same array is iterated in the same way in multiple places within the kernel, this will still give rise only to a single array. Our approach is not necessarily optimal—there are cases where several distinct manifestations (as given by the rules above) all solve the specific cases of uncoalesced access, but so might a combined single manifestation, with an appropriately selected permutation. This would require the algorithm to take a more “global” view, and has not yet been done.
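As a small illustration of these points (our example), consider a kernel that indexes the same array in two ways. Only the non-coalesced index expression gives rise to a manifest copy, since the permutation found for the other one is the identity and its manifest is removed by the clean-up described above:

  kernel (n,m) (λi j _ _ → xss[j][i] + xss[i][j])

  -- xss[i][j] already has the innermost thread index last (identity
  -- permutation), so only xss[j][i] causes a copy:
  let xss’ = manifest (1,0) xss
  in kernel (n,m) (λi j _ _ → xss’[j][i] + xss[i][j])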

9.1.1 Viability of the Optimisation

This optimisation requires additional runtime work and extra memory space to perform transpositions before the kernel is invoked. The Futhark compiler presently always applies the optimisation whenever it detects an array indexing of the appropriate form. A relevant question to ask is whether there are cases where this is not an optimisation, but rather a pessimisation. The short answer: in some cases, yes, but usually not.

First we remark that transpositions are quite fast on GPUs, running at close to the speed of a memory copy. Therefore, even in pathological cases where one of the dimensions is unit, or the inner sequential loop is so short that the uncoalesced accesses have no great impact, the cost of the transpositions is not catastrophic.

A greater concern is having to allocate extra memory to store the transposed copies. In the worst case, this may require temporarily migrating the original arrays to CPU memory in order to fit the new copies. The CPU-GPU bandwidth is very low (current PCIe 4.0 x16 runs at 31.51GiB/s in the best case), so this could lead to significant slowdown. In the future, we intend to investigate how to “back-propagate” the manifest operations to the point at which the array is originally created, and store it in the desired representation in the first place.


9.1.2 Indirect Accesses

The coalescing transformation applies even in cases where indirect accesses are involved, as it depends only on variance information. For example, consider this variation on the earlier example, where an array js of indices is used to index xss.

let (js: [n]i32) = ...
let (xss: [m][k]i32) = ...
in kernel (m) (λi _ →
     loop acc = 0 for p < n do
       let j = js[p]
       let x = xss[i][j]
       in acc + x)

To the coalescing optimisation, this kernel contains two array indices: js[p] and xss[i,j]. The former is not touched, since it does not involve any thread indices, and so does not meet either of the rules. However, the indexing expression xss[i,j] does match the rule for complete indexes, since the global thread index i can be moved to the innermost position via the permutation (1,0). Thus, the kernel is transformed to the following, where all accesses are coalesced.

let (js: [n]i32) = ...
let (xss: [m][k]i32) = ...
let (xss’: [m][k]i32) = manifest (1,0) xss
in kernel (m) (λi _ →
     loop acc = 0 for p < n do
       let j = js[p]
       let x = xss’[i][j]
       in acc + x)

Note that the p index in js[p] is invariant to the kernel, and as such leads to effective caching (each thread in a warp accesses the same js[p] in the same SIMD instruction).

9.2 Automatic Loop Tiling

The Futhark compiler performs simple block tiling in cases where several threads are traversing the same array. The tiling algorithm transforms a traversal to be over chunks of the input, where each chunk is collectively copied to local memory before it is traversed. The result is that the number of accesses to global memory is reduced. The algorithm is driven by recognising arrays that are inputs to stream_seq constructs and are invariant to one or two of the kernel's innermost dimensions. The tiling algorithm proceeds by traversing the body of a kernel and looking

for an appropriate stream_seq instance, which is then transformed using either one-dimensional or two-dimensional tiling, as detailed below. In both cases, the stream_seq is transformed into two nested stream_seqs, with the outer one iterating across chunks, and the inner iterating across a single chunk. We tile at most one stream_seq per kernel, as each instance of tiling may impose constraints on the group size of the GPU kernel, and it is not clear whether multiple instances of tiling would lead to conflicting constraints.

In the following, we say that an array is invariant to a parallel dimension if there is no data dependency between the array and the thread index corresponding to that dimension. To simplify the exposition, we are ignoring a significant amount of tedious complexity with computing GPU group sizes and handling cases where the tiled arrays have sizes that are not divisible by the optimal tile size.

9.2.1 One-Dimensional Tiling

One-dimensional tiling is performed when the input arrays of a stream_seq are invariant to the same dimension. We exploit a stripmining property of stream_seq, by which any stream_seq can be transformed into two nested stream_seqs. For example, the kernel

  kernel (n) (λi li → stream_seq (f i) (v) ps)

can have its stream_seq stripmined to obtain the following:

kernel (n) (λi li →
  stream_seq (λq a (ps’: [q]int) →
                stream_seq (f i) (a) ps’)
             (v) ps)

This kernel exhibits an optimization opportunity for the streamed array ps, as it is invariant to the first (and only) dimension. Concretely, the threads within a workgroup can collectively copy the chunk ps’ into local memory, which we write as


kernel (n) (λi li →
  stream_seq (λq a (ps’: [q]int) →
                let x = ps’[li]
                let ps’’ = local q x
                in stream_seq (f i) (a) ps’’)
             (v) ps)

Recall that li is the local thread index. Then, ps’’ is a local memory array created by collective copying (local), and used instead of ps’ in f. The size q is then bounded (at runtime) by the group size used for this kernel, such that the array ps’’ will fit in local memory. For simplicity, we assume that the size of the array ps is divisible by some reasonable group size. The trick is that every thread within a workgroup reads a distinct element from ps’, by using the local thread index. The array ps’’ is visible to all threads within the workgroup, so local in effect constitutes intra-group communication.

When tiling along a dimension i (1 in the above example), it must be ensured that each consecutive q threads along that dimension belong to the same GPU group. In general, this is done by forcing the group size to unit along all non-tiled dimensions, although this is not necessary in the case of a unidimensional kernel.

One-dimensional tiling is used for the n-body benchmark discussed in Section 10.1.1. Note that the dimensionality of the kernel need not be restricted to one for one-dimensional tiling to apply. Section 9.2.3 contains an example of one-dimensional tiling applied in a two-dimensional kernel.

9.2.2 Two-Dimensional Tiling

Futhark also supports two-dimensional tiling, where two streamed arrays are invariant to different parallel dimensions. A matrix multiplication that exhibits the pattern appears as follows:

let yss’ = rearrange (1,0) yss
in kernel (n,l) (λi j li lj →
     let xs = xss[i]
     let ys = yss’[j]
     in stream_seq f (0) xs ys)

where
  f = λc acc xs’ ys’ →
        loop (acc) for i < c do acc + xs’[i] * ys’[i]

The chunk function f computes the dot product of its two input arrays and adds it to the accumulator. Prior to kernel extraction, the stream_seq was in the form of a redomap. The arrays xs and ys are both invariant to at least one dimension of the kernel, so they can be tiled as follows:

let yss’ = rearrange (1,0) yss
in kernel (n,l) (λi j li lj →
     let xs = xss[i]
     let ys = yss’[j]
     in stream_seq (g li lj) (0) xs ys)

where
  g = λli lj q acc xs’ ys’ →
        let x = xs’[lj]
        let y = ys’[li]
        let xss’ = local q q x
        let xs’’ = xss’[li]
        let yss’’ = local q q y
        let yss_tr = rearrange (1,0) yss’’
        let ys’’ = yss_tr[lj]
        in stream_seq f acc xs’’ ys’’
  f = as before

Operationally, the local expressions create a two-dimensional array in local memory, which is then immediately indexed with the local thread index along dimension 1 (or 2), resulting in a one-dimensional array. The core IR used in this thesis does not support fixing any but the first dimension; instead we transpose before indexing.
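A rough account of the benefit of two-dimensional tiling (our back-of-the-envelope estimate, not a measurement from the thesis): without tiling, each of the n · l threads reads 2k elements from global memory, for n · l · 2k global reads in total. With q × q tiles, every element brought into local memory is reused by q threads, so the number of global reads drops to roughly n · l · 2k / q, at the cost of extra local-memory traffic and synchronisation.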

9.2.3 A Complicated Instance of Tiling

The LavaMD benchmark from Section 10.1.1 exhibits an interesting tiling pattern, in which the to-be-tiled array is the result of an indirect index computed by a function f, all nested inside of a sequential loop. A simplified reproduction follows.

kernel (n,m) (λi j li lj →
  loop (outer_acc) = (...) for l < k do
    let i’ = f l i
    let xs = xss[i’]
    in stream_seq (h j) outer_acc xs)

where
  f l i = let p = if 0 < l then ps[l,i] else i
          in js[p]
  h j = some function

Since the computation of the array slice xs is invariant to the second dimension of the kernel (of size m), it can be tiled as follows:


kernel (n,m) (λi j li lj →
  loop (outer_acc) = (...) for l < k do
    let i’ = f l i
    let xs = xss[i’]
    in stream_seq (g j lj) outer_acc xs)

where
  f l i = let p = if 0 < l then ps[l,i] else i
          in js[p]
  g = λj lj q acc xs’ →
        let x = xs’[lj]
        let xss’ = local 1 q x
        let xs’’ = xss’[0]
        in stream_seq (h j) acc xs’’
  h j = some function

Note that the tiling operation itself is not affected at all by the convoluted computation of the index i’. All that we need is the ability to compute which kernel dimensions i’ is invariant to, which is reasonably simple in a pure language such as Futhark. Again we assume that the second dimension (of size m) can be divided by a reasonable group size (say, 256). This assumption can be removed by the addition of various tedious bounds checks and branches. The use of local 1 q forces the two-dimensional group size to always be 1 along the outermost dimension, which maximises the ratio of local-to-global memory accesses for the array xs.

9.3 Related Work

The intent of this chapter is not to argue that Futhark's scheme for tiling and achieving coalesced memory accesses is superior to other techniques. Rather, we have merely shown that the moderate flattening algorithm does not actively prevent us from performing such optimisations, as is the case for full flattening.

For example, in comparison to Futhark, imperative analyses [Yan+10; Ver+13] are superior at performing all kinds of tiling, for example hexagonal time tiling [Gro+14], and at achieving memory coalescing by semantically transposing arrays on the fly (via tiling). However, non-affine array indexes may restrict applicability: for example, indirect-array accesses would prevent them from optimising memory coalescing for the OptionPricing benchmark (Section 10.1.3), where Futhark's simpler, transposition-based approach succeeds.

Chapter 10

Empirical Validation

When a new programming model is proposed, it must be demonstrated that interesting programs can be written within the model. While the model espoused by this thesis—functional array programming—is not new, our claim that it is a good foundation for obtaining parallel performance requires evidence.

This chapter serves as an empirical validation of the compilation strategy discussed in this thesis, and of the practical usability of the Futhark compiler as a whole (the fifth contribution from the introduction). We discuss several programs implemented in Futhark, with a focus on their runtime performance. While we cannot claim that the current Futhark compiler embodies the full potential of functional array programming, the results presented here constitute at least a lower bound on that potential.

The chapter is divided into two main parts: first, we have manually translated a range of benchmark programs from other languages or libraries to Futhark (Section 10.1). Second, we perform a deeper analysis of the performance of an irregular program that, due to dataset-sensitivity, is not as easy to benchmark (Section 10.2). We will use the term reference implementation to refer to programs that were not written by us, and Futhark implementation to refer to our translated Futhark programs. Any Futhark code examples will use the source language, not the IR we used to discuss the design of the compiler.

10.1 Empirical Validation in Bulk

Most of the programs presented in this section are from the published benchmark suites Rodinia 3.1 [Che+09], Parboil 2.5 [Str+12], and FinPar [And+16]. The comparison with hand-written implementations is to show the potential cost of using a high-level language. However, as we shall see, even published code often contains significant inefficiencies, which occasionally leads to Futhark outperforming the reference implementations. A handful of benchmarks are from Accelerate 1.0 [McD+13],


Runtime in milliseconds ("Ref." is the reference implementation, "Fut." is Futhark):

                                                       NVIDIA K40        AMD W8100
Benchmark      Dataset                               Ref.     Fut.     Ref.     Fut.

Rodinia
Backprop       Input layer size equal to 2²⁰         52.2     21.2    233.2      9.8
CFD            fvcorr.domn.193K                    2893.6   3487.5   1637.8   1955.5
HotSpot        1024 × 1024; 360 iterations            51.6     60.3   3178.4     54.4
k-means        kdd_cup                              1680.3    664.2   1377.6   1264.4
LavaMD         boxes1d = 10                            7.1      9.4      5.8      7.4
LUD            2048                                   40.6    111.1     28.7    132.2
Myocyte        workload = 2¹⁶; xmax = 3             3881.5    806.2      —    1621.0
NN             Default duplicated 20 times           182.9     13.4    331.0     60.3
Pathfinder     Array of size 10⁵                      26.1      9.4     96.3      2.7
SRAD           502 × 458; 100 iterations              21.8     22.1     49.9     41.8

Parboil
MRI-Q          large                                  27.6     22.8     15.6     20.1
SGEMM          medium                                  3.4      7.2      3.2      4.7
Stencil        default                               179.0    253.7     97.8    223.7
TPACF          medium                                655.3    873.4    272.0    557.9

FinPar
LocVolCalib    small                                 495.8    281.4    442.8    248.1
LocVolCalib    medium                                319.9    190.0    214.0    139.7
LocVolCalib    large                                1484.2   2038.5   1678.1   3279.9
OptionPricing  small                                   4.7      7.3      4.2      6.1
OptionPricing  medium                                 10.4     17.7      6.7     13.6
OptionPricing  large                                  97.1    158.6     83.2    175.3

Accelerate
Crystal        Size 2000, degree 50                   27.9     11.2      —      10.3
Fluid          3000 × 3000; 20 iterations             15.0     19.9      —      14.5
Mandelbrot     4000 × 4000; 255 limit                  6.1      6.0      —       5.0
N-body         N = 10⁵                               559.0    127.6      —     141.7
Tunnel         4000 × 4000                            94.2     72.5      —      89.8

Table 4: Benchmark datasets and average runtimes, computed over ten executions of every benchmark. See Figures 62 to 65 for a visualisation of the results. Missing entries indicate that a benchmark implementation was not runnable for a given configuration.

a Haskell eDSL for data-parallel array programming. These are included to show the performance of Futhark compared to an existing mature GPU language.

All programs have been manually ported to Futhark, and compiled and run with the default settings. We show how the performance of the Futhark code compares to the performance of the original reference implementations. In some cases, we also discuss the programming techniques used to obtain efficient Futhark implementations. For most benchmarks, we use only a single dataset, as the relative performance for these is not dataset-sensitive (except for pathological cases).

While some of the benchmarks are small, others are decidedly nontrivial. For example, Stencil from Parboil is implemented as 13 non-blank non-comment lines of Futhark, while Myocyte from Rodinia requires 883. The total number of non-blank non-comment lines in the Futhark implementations is 2984, with a median of 98.

We have evaluated the performance of the generated code on two different test systems:

1. An NVIDIA K40 GPU installed in a server-grade system with an Intel Xeon E5-2560 CPU and 128GiB of RAM.

2. An AMD W8100 GPU installed in a desktop-class system with an Intel Core 2 Quad CPU and 4GiB of RAM.

The intent is to show that the OpenCL code generated by the Futhark compiler works on both NVIDIA and AMD GPUs. However, note that the two systems are very dissimilar—the AMD GPU is installed in a rather old computer, and it is possible that the slow CPU and anaemic amount of system memory may bottleneck the GPU.¹ This may explain some of the unusual behaviour we will see on the AMD GPU, particularly for reference implementations. In particular, three reference implementations from Rodinia—Backprop, HotSpot, and Pathfinder—exhibit extreme and inexplicable slowdown on the AMD GPU. We consider these results anomalous, and will discard them from further discussion (although they are included in the graphs and tables). Ideally, the NVIDIA and AMD GPUs would be installed in systems that are otherwise similar, but such were not available for the writing of this thesis.

CUDA, as an API proprietary to NVIDIA, can be executed only on the NVIDIA K40 GPU. Thus, when selecting reference implementations, we have picked those written in OpenCL, to allow us to run the same code on both platforms. Reference implementations in OpenCL were not available for all benchmarks; these will be discussed on a case-by-case basis. All Futhark implementations executed successfully on both test systems.

We invoke the Futhark compiler with no benchmark-specific flags—no tuning is done, and the exact same code is executed on both of our test systems. It is likely that approximately 10% speedup could be obtained by tweaking GPU group sizes.

¹ Unusually, the GPU actually has more memory than the CPU; a reversal of the usual situation.


Figure 62: Relative speedup compared to reference implementations for Rodinia (the bar for NN is truncated for space reasons).

The compiler inserts instrumentation that records total runtime minus the time taken for (i) loading program input onto the GPU, (ii) reading final results back from the GPU, and (iii) OpenCL context creation and kernel build time. Excluding these overheads emphasises the performance differences. Any other host-device communication/copying is measured. When necessary, the reference implementations have been modified to time similarly. When in doubt, we have erred in favour of the reference implementations.

The results are shown on Table 4, and discussed in greater detail below. Most of our slowdown is related to generic issues of unnecessary copying and missing micro-optimisation that are common to compilers for high-level languages. Section 10.1.5 contains a brief discussion of how the major optimisations performed by the Futhark compiler affect the performance of the benchmarks.

Futhark implementations of all benchmarks discussed here, as well as several more, are publicly available at the following URL:

https://github.com/diku-dk/futhark-benchmarks

10.1.1 Ten Benchmarks from Rodinia

Rodinia is a popular benchmark suite that contains approximately 21 benchmarks. Of these, we have selected the 10 benchmarks that look the most friendly from a data-parallel perspective, and were expressible in terms of a nested composition of map, reduce, scan, stream_red, and stream_map.

There is great variation in the performance of the Rodinia reference implementations. Some, such as Myocyte or NN, contain oversights that negatively affect GPU performance. Others, such as LUD or LavaMD, are based on clever algorithms and carefully implemented. Some Rodinia implementations exhibit massive slowdown on the AMD GPU for reasons that we cannot determine. We conjecture that this is related to the under-powered CPU on the machine hosting the AMD GPU, but this is only a suspicion. The speedups are shown on Figure 62.

The speedup on Backprop seems related to a reduction that Rodinia has left

sequential. Running time of the training phase is roughly equal in Rodinia and Futhark (∼ 10 ms).

Rodinia's implementation of HotSpot, a two-dimensional stencil code, uses time tiling [Gro+14], which seems to pay off on the NVIDIA GPU, but not on AMD. On the NVIDIA GPU, the majority of Futhark's slowdown is due to how memory management is done for the arrays involved in the time series loop in the stencil. At a high level, the stencil is a sequential loop containing a nested map over the m × m iteration space:

loop s = s0 for i < n do
  let s’ = map (λy → map (f s y) [0...m-1])
               [0...m-1]
  in s’

Two distinct memory blocks are necessary, because several elements of s are combined to construct one element of s’. The reference implementation uses two pre-allocated buffers, and switches between them by swapping the pointers; this is a common technique for implementing stencils. Instead of swapping pointers, the Futhark compiler copies the entire intermediate result, and these copies account for 30% of the runtime.

Our speedup on k-means is due to Rodinia not parallelising the computation of the new cluster centres, which is semantically a histogram, and can be implemented as a segmented reduction.

The default dataset for Myocyte was expanded because its degree of parallelism was one (workload = 1). We used Rodinia's CUDA implementation rather than its OpenCL implementation, as the latter is not fully parallelised. We attribute our speedup to automatic coalescing optimisations, which are tedious to do by hand on such large programs.

Our speedup on NN is due to Rodinia leaving 100 reduce operations for finding the nearest neighbours sequential on the CPU. This is possibly because the reduce operator is atypical: it computes both the minimal value and the corresponding index, much like the example in Section 2.2.1. Speedup is less impressive on the AMD GPU, due to higher kernel launch overhead—this benchmark is dominated by frequent launches of short kernels.

For Pathfinder, Rodinia uses time tiling, which, unlike for HotSpot, does not seem to pay off on the tested hardware.

The reference implementation of LUD uses a clever block-based algorithm that makes efficient use of local memory on the GPU. The Futhark implementation is similar, but the Futhark compiler is not yet able to map it as efficiently to the GPU hardware. The LUD algorithm uses block decomposition, and the block size (not to be confused with the CUDA term “thread block size”, which is something else) is a programmer-given constant. Both the reference and the Futhark implementation simply hard-code a constant that appears to give good performance in practise: 32 for Futhark, and 16 for the reference implementation.


Figure 63: Relative speedup compared to reference implementations for Parboil.

10.1.2 Four Benchmarks from Parboil

The Parboil benchmark suite is smaller than Rodinia, containing 11 benchmarks, but the implementations are generally of higher quality. Furthermore, Parboil comes with excellent built-in support for instrumentation and validation of results. The Parboil benchmarks tend to be more difficult than those found in Rodinia, so we have only ported four of them to Futhark. The speedups are shown on Figure 63.

MRI-Q is semantically an outer map surrounding an inner map-reduce operation on an array that is invariant to the outer map. The Parboil implementation is based on heavily unrolling the inner loop, while the Futhark compiler performs one-dimensional tiling in local memory (Section 9.2.1). This appears to be the superior strategy on the NVIDIA GPU, but not on the AMD GPU.

SGEMM is an implementation of a common matrix primitive that computes

  C ← α × A × B + β × C

where A, B, C are matrices and α, β are scalars (a source-level Futhark sketch of this computation is shown below). The Futhark compiler performs two-dimensional tiling (Section 9.2.2) in local memory, while the Parboil implementation performs more sophisticated register tiling. SGEMM is a well-studied primitive, and it is hard for a compiler to match a hand-tuned implementation.

Stencil is a straightforward stencil code. The Futhark implementation suffers due to poor memory management that induces extra copies, as with Rodinia's HotSpot.

TPACF is semantically a histogram, which Parboil implements using clever use of local memory. In Futhark, we implement it using a segmented reduction, which is not as efficient.
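For orientation, the following is our own source-level Futhark rendering of the SGEMM computation, not the benchmark's actual code; the size parameters n, k and m are bound in the function head:

  -- C ← alpha·A·B + beta·C, with a: [n][k], b: [k][m], c: [n][m].
  let sgemm [n][k][m] (alpha: f32) (beta: f32)
            (a: [n][k]f32) (b: [k][m]f32) (c: [n][m]f32): [n][m]f32 =
    map (λ(a_row, c_row) →
           map (λ(b_col, c_el) →
                  -- dot product of a row of A with a column of B
                  let dot = reduce (+) 0.0f32 (map (λ(x,y) → x*y)
                                                   (zip a_row b_col))
                  in alpha * dot + beta * c_el)
               (zip (transpose b) c_row))
        (zip a c)

The two-dimensional tiling described in Section 9.2.2 applies directly to the redomap that this map-reduce composition fuses into.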

10.1.3 Two Benchmarks from FinPar

FinPar is a small benchmark suite translated from real-world financial kernels. The problems are sophisticated, with nontrivial use of nested parallelism, and the reference

implementations are well implemented. We have ported only two out of the three FinPar benchmarks to Futhark, as the third one (InterestCalib) contains (limited) irregular parallelism. Due to the small size of FinPar, we have evaluated each benchmark on all three available data sets. The speedups are shown on Figure 64.

Figure 64: Relative speedup compared to reference implementations for FinPar. Each benchmark is applied to three different data sets.

OptionPricing is essentially a map-reduce composition. The benchmark primarily measures how well the Futhark compiler sequentialises excess parallelism inside the complex map function. We see approximately equal relative performance for each of the three data sets.

LocVolCalib from FinPar is an outer map containing a sequential for-loop, which itself contains several more maps. Exploiting all parallelism requires the compiler to interchange the outer map and the sequential loop. The slowdown on the AMD GPU is due to transpositions, inserted to fix coalescing, being relatively slower than on the NVIDIA GPU.

The small and medium datasets for LocVolCalib have the interesting property that they do not provide enough parallelism in the outer loops to fully saturate the GPU. Note the absolute runtime numbers on Table 4, where the runtime for the small dataset is greater than that for the medium dataset, despite the latter actually requiring more work. Two distinct LocVolCalib implementations are provided by FinPar: one that exploits only outer-level parallelism, and one that exploits all available parallelism. The former is the right choice for the large dataset, and it is the one we compare against here, as it is also roughly how the current Futhark compiler parallelises LocVolCalib. However, the FinPar implementation that fully exploits all parallelism outperforms Futhark on the small and medium datasets. Handling cases such as this transparently is the main motivation for multi-versioned code (Section 8.3).

10.1.4 Five Benchmarks from Accelerate

Accelerate is a mature Haskell-embedded language for data-parallel array programming. The benchmarks picked here are presented as “examples”, and are not part of a benchmark suite as such. While they are written by the Accelerate authors


themselves, and contain built-in support for benchmarking, it is possible that performance has been sacrificed in the interest of readability.

We use the recently released llvm-ptx GPU backend, which performs significantly better than the old cuda backend. Unfortunately, this backend is still NVIDIA-specific, so we were unable to execute Accelerate on the AMD GPU. The benchmarks were all straightforward to translate to Futhark, as Accelerate does not support nested parallelism or low-level GPU programming.

One major and interesting source of performance differences is that Accelerate is a Just-In-Time (JIT) compiler for an embedded language, while Futhark is a conventional Ahead-Of-Time (AOT) compiler. As a result, Accelerate has access to dynamic information that can be used to perform code specialisation, but its embedded nature limits the compiler's ability to see the program as a whole. On the other hand, the Futhark compiler can perform more global optimisations, but cannot inline dataset-specific constants for loop iteration counts and similar. The trade-offs between JIT and AOT compilation are a fascinating and deep subject that we will not delve further into here, but one certainly worthy of further study. Speedups are shown on Figure 65.

Figure 65: Relative speedup compared to reference implementations for Accelerate.

Crystal is a straightforward map containing an inner map-reduce over a small array. It is unclear why Futhark outperforms Accelerate on this benchmark.

Fluid is a two-dimensional stencil code with complicated edge conditions. As with Rodinia's HotSpot, Futhark's approach to memory management causes unnecessary copies for each of the outer sequential loops, giving Accelerate the edge.

The Mandelbrot benchmark is a straightforward map containing a while loop, with little opportunity for optimisation.

N-body is operationally a map containing an inner map-reduce over an array invariant to the outer map. The Futhark compiler performs one-dimensional tiling in local memory to optimise access to this array. Accelerate does not perform tiling, which gives the edge to Futhark.

Tunnel is similar to Crystal, and contains a straightforward map containing an inner map-reduce over a small array. Performance is similar between Futhark and Accelerate.


10.1.5 Impact of Optimisations

Impact was measured by turning individual optimisations off and re-running the benchmarks on the NVIDIA GPU. We report only where the impact is non-negligible.

Fusion (Chapter 7) has a significant impact on K-means (×1.42), LavaMD (×4.55), Myocyte (×1.66), SRAD (×1.21), Crystal (×10.1), and LocVolCalib (×9.4). Without fusion, OptionPricing, N-body, MRI-Q, and SGEMM fail due to increased storage requirements.

In the absence of in-place updates, we would have to implement K-means as on Figure 7—the resulting program is slower by ×8.3. Likewise, LocVolCalib would have to implement its central tridag procedure via a less efficient scan-map composition, causing a ×1.7 slowdown. OptionPricing uses an inherently sequential Brownian Bridge computation that is not expressible without in-place updates.

The coalescing-by-transposition transformation (Section 9.1) has a significant impact on k-means (×9.26), Myocyte (×4.2), OptionPricing (×8.79), and LocVolCalib (×8.4). Loop tiling (Section 9.2) has an impact on LavaMD (×1.35), MRI-Q (×1.33), SGEMM (×2.3), N-body (×2.29), and LUD (×1.10).

10.2 Benchmarking an implementation of Breadth-First-Search

The Rodinia benchmark suite contains a benchmark, BFS, that implements breadth-first search. This problem is highly irregular, and its performance is very dataset-sensitive. As a result, we have excluded it from the presentation of the general benchmark results, and instead dedicate this section to discussing the issues. The following serves as an example of a benchmark where measuring performance is not quite straightforward, and incidentally as an example of how to implement irregular problems in Futhark.

Figure 66 shows an excerpt of Rodinia's imperative-but-parallel implementation of the breadth-first search algorithm, which traverses the graph and records in the cost array the breadth level of each graph node. The code excerpt processes in parallel all the nodes, indexed by src_id, on the current breadth level (i.e., the ones with mask[src_id] set). For each such node, the breadth levels of all its unvisited neighbours (i.e., those with !visited[dst_id]) are updated to cost[src_id]+1 in the sequential (inner) loop at line 11. The problem with this code is that the update of cost potentially introduces data races in the case when two nodes on the same level are connected to a common (unvisited) node. What makes it still safe is the property that the output dependences are idempotent (i.e., the updated value is the same across different outermost iterations).

Figure 67 shows a simple, albeit work-inefficient, translation to Futhark of the discussed imperative code, which uses padding to satisfy array regularity. First, the indices of the active nodes (i.e., the ones on the current breadth level) are identified by the filter operation on line 8. This contributes significantly to final performance.


Second, the maximal number of edges of these nodes, denoted by e_max, is computed via a map-reduce composition on lines 10–12. Third, the map nest computes two two-dimensional arrays of innermost size equal to e_max, in which the first one corresponds to the indices of the unvisited neighbours, and the second one to their breadth level (costs). The indices that are outside the edge degree of a node are marked with out-of-bounds indices (-1), and similarly for the nodes that were already visited. Fourth, the index-value arrays are flattened (at lines 30–31) and the scatter bulk operator is used to update the cost and updating_mask arrays at lines 32–37.

Table 5 shows the running time of the Futhark and Rodinia implementations on four datasets. The Rodinia implementation exploits only the parallelism of the outer loop, and it wins on the datasets in which the graph is large enough to fully utilise the hardware, while Futhark exploits both levels of parallelism and wins on smaller graphs with large edge degree. Interestingly, this is not simply a question of how much to parallelise, but a fundamental algorithmic design decision. The unsolved challenge is a representation and technique that can fuse the Futhark code in Figure 67 into something resembling the one-level-parallel C code in Figure 66, and furthermore generate both versions of the code.

1  // n is the number of graph nodes
2  for (int src_id = 0; src_id < n; src_id++) { // parallel
3    if (mask[src_id] == true) {
4      mask[src_id] = false;
5      for (int i = nodes[src_id].starting;     // sequential
6           i < (nodes[src_id].num_edges
7                + nodes[src_id].starting);
8           i++) {
9        int dst_id = edges_dest[i];
10       if (!visited[dst_id]) {
11         cost[dst_id] = cost[src_id] + 1;
12         updating_mask[dst_id] = true;
13       }
14     }
15   }
16 }

Figure 66: Imperative code for breadth-first search.

Finally, we remark that the presented Futhark code is work-inefficient due to the “padding” of the edge-degree of a node, but the scatter construct also enables a work-efficient implementation (not shown), which would correspond to applying flattening by hand. In our tests, this is slower than the padded one due to the overhead


1  let step [n][e] (cost: [n]i32)
2           (nodes_starting: [n]i32)
3           (nodes_num_edges: [n]i32)
4           (edges_dest: [e]i32)
5           (visited: [n]bool)
6           (mask: [n]bool) (updating_mask: [n]bool)
7           : ([n]bool, [n]i32, [n]bool) =
8    let active_indices = filter (λi → mask[i]) (iota n)
9    let n_indices = length active_indices
10   let e_max =
11     reduce i32.max 0
12            (map (λi → nodes_num_edges[i]) active_indices)
13   let (node_ids_2d, costs_2d) =
14     map (λsrc_id: ([e_max]i32, [e_max]i32) →
15            let s_index = nodes_starting[src_id]
16            let n_edges = nodes_num_edges[src_id]
17            let edge_indices = map (+s_index) (iota e_max)
18            let node_ids =
19              map (λi →
20                     if i < s_index + n_edges
21                     then let dst_id = edges_dest[i]
22                          in if !visited[dst_id]
23                             then dst_id else -1
24                     else -1)
25                  edge_indices
26            let costs = replicate e_max (cost[src_id] + 1)
27            in (node_ids, costs))
28         active_indices
29   let flat_len = e_max * n_indices
30   let node_ids = reshape flat_len node_ids_2d
31   let costs = reshape flat_len costs_2d
32   let mask’ =
33     scatter mask active_indices (replicate n_indices false)
34   let cost’ =
35     scatter cost node_ids costs
36   let updating_mask’ =
37     scatter updating_mask node_ids (replicate flat_len true)
38   in (mask’, cost’, updating_mask’)

Figure 67: Work-inefficient Futhark code for breadth-first search.

of multiple scans.


Dataset                              Version    Runtime    Speedup

2000 nodes, 1000 edges each          Futhark     4.0ms     ×1.9
                                     Rodinia     7.6ms
1000 nodes, 10–1000 edges each       Futhark     1.7ms     ×3.7
                                     Rodinia     6.3ms
100,000 nodes, 6 edges each          Futhark     6.7ms     ×0.25
                                     Rodinia     1.7ms
100,000 nodes, 10–500 edges each     Futhark   153.1ms     ×0.43
                                     Rodinia    66.5ms

Table 5: Performance of Rodinia and Futhark breadth-first search implementations on various datasets. Executed on an NVIDIA GTX 780 Ti with random graphs (uniform distribution). Reported runtime is average over 10 runs.

Part III

Closing Credits

Chapter 11

Conclusions and Future Work

We have presented a fully automatic optimizing compiler for Futhark, a pure functional array language. Futhark is a simple language that supports just a few core concepts, yet is more expressive than prior parallel languages of similar performance, in particular by supporting nested parallelism. While more flexible parallel languages exist (notably NESL), these have not yet been shown to obtain good GPU performance in practise.

We have demonstrated a combination of imperative and functional concepts by supporting in-place updates with safety guaranteed by uniqueness types rather than complicated index-based analysis. We support not just efficient top-level imperative code with in-place updates, but also sequential code nested inside parallel constructs, all without violating the functional properties on which we depend for safe parallel execution.

We have also shown how size-dependent array types can be inferred from a size-agnostic source program, via a combination of type-driven rules and function slicing. Our slicing technique results in negligible overhead in practise.

By introducing novel parallel streaming constructs, we provide better support for efficient sequential execution of excess parallelism than is possible with the classical parallel combinators. We have also shown how these constructs permit highly general fusion rules, in particular permitting sequential fusion without losing the potential for parallel execution.

All of these build up to our primary result, the moderate flattening algorithm, which allows the efficient exploitation of easily accessible “common case” parallelism. The moderate flattening algorithm sequentialises excess parallelism while keeping intact the high-level invariants provided by the original parallel and purely functional formulation, which permits further locality-of-reference optimisations. We have demonstrated this capability by showing how to automatically repair some cases of non-coalesced memory accesses, as well as performing simple block tiling of loops. We argue that the moderate flattening algorithm can be developed further into a gradual flattening algorithm, which uses multi-versioned code to exploit a varying

amount of parallelism of the program, dependent on the characteristics of the input data encountered at runtime.

To support our claims, we have validated our approach on 21 benchmark programs, which are compiled to GPU code via OpenCL. Compared to reference implementations, the performance ranges from a ×0.21 slowdown to a ×13 speedup, and is competitive on average. Our results show that while the ease of high-level structural transformation permitted by a functional language is powerful, attention must still be paid to low-level issues such as communication costs and memory access patterns.

We have made the developed compiler and all benchmark programs freely available for reproduction, study, or further development.

11.1 Limitations and Future Work

The primary limitation of Futhark as a language is the lack of support for irregular arrays. Some problems, such as graph algorithms, are naturally irregular, and transforming them to a regular formulation is tedious and error-prone. Worse, such transformation is essentially manual flattening, which tends to result in code that is hard for the compiler to analyse and optimise. It remains to be seen how we can either modify the algorithms in question to exhibit less irregularity, or extend Futhark with lightweight support for some cases of irregular parallelism, without sacrificing the ability of the compiler to generate efficient code. After all, Futhark's entire raison d'être is performance—there are already many functional languages that are far more expressive, so improving flexibility at great cost in runtime performance is not a goal.

However, even with the current language, irregularity still rears its ugly head for the compiler. It is not hard to write a program that, although it uses only regular arrays, implicitly expresses irregular parallelism, which cannot in general be handled by our current approach. The reason is that we expect to be able to pre-allocate memory before entering GPU kernels, but it is not hard to write a program in which the size of intermediate arrays is thread-variant. For example, consider the following expression.

map (λi → reduce (+) 0 (iota i)) is

The size of the array produced by iota i may differ for each element of the array is, which means the total memory requirement of the map cannot be immediately known. The problem has multiple solutions. In the most general case, we can apply full flattening—this removes all forms of irregularity, but the overhead can be significant. In other cases, a slicing approach can be used. For example, we can precompute the total memory required, allocate one large slab of memory, and use a scan to compute an offset for each iteration of the map:


let os = scan (+) 0 is
in map (λ(i,o) → -- iota i located at offset ’o’
                 -- in some memory block.
                 reduce (+) 0 (iota i))
       (zip is os)
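
For concreteness, with is = [2,3,1] the inclusive scan above yields os = [2,5,6], pairing each i with the end of its own segment; the start offsets [0,2,5] would be given by an exclusive scan, or by shifting the inclusive result right by one. In the most general case mentioned above, full flattening instead turns the nested expression into a segmented reduction over the concatenated iotas. The sketch below is illustrative only: mk_flags, flat_iotas, and segmented_reduce are hypothetical helpers, not Futhark primitives.

-- Fully flattened sketch of the example (hypothetical helpers).
-- For is = [2,3,1]:
--   vals  = [0,1, 0,1,2, 0]   -- iota 2 ++ iota 3 ++ iota 1
--   flags = [T,F, T,F,F, T]   -- ’T’ marks the start of each segment
let flags = mk_flags is
let vals  = flat_iotas is
in segmented_reduce (+) 0 flags vals
-- yields [1,3,0], matching the original nested expression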

This is not a language issue—our sequential compiler pipeline is able to compile any valid Futhark program to sequential code—but a limitation of our compilation strategy for parallel code. Our strategy has been to focus on developing a technique for exploiting “common case” parallelism efficiently, and only later to extend it to support more exotic cases.
