Phd Thesis Troels Henriksen — [email protected]
Total Page:16
File Type:pdf, Size:1020Kb
UNIVERSITY OF COPENHAGEN FACULTY OF SCIENCE PhD thesis Troels Henriksen — [email protected] Design and Implementation of the Futhark Programming Language (Revised) Supervisors: Cosmin Eugen Oancea and Fritz Henglein December, 2017 Abstract In this thesis we describe the design and implementation of Futhark, a small data-parallel purely functional array language that offers a machine-neutral pro- gramming model, and an optimising compiler that generates efficient OpenCL code for GPUs. The overall philosophy is based on seeking a middle ground between functional and imperative approaches. The specific contributions are as follows: First, we present a moderate flattening transformation aimed at enhancing the degree of parallelism, which is capable of exploiting easily accessible par- allelism. Excess parallelism is efficiently sequentialised, while keeping access patterns intact, which then permits further locality-of-reference optimisations. We demonstrate this capability by showing instances of automatic loop tiling, as well as optimising memory access patterns. Second, to support the flattening transformation, we present a lightweight system of size-dependent types that enables the compiler to reason symboli- cally about the size of arrays in the program, and that reuses general-purpose compiler optimisations to infer relationships between sizes. Third, we furnish Futhark with novel parallel combinators capable of ex- pressing efficient sequentialisation of excess parallelism, as well as their fusion rules. Fourth, in order to express efficient programmer-written sequential code inside parallel constructs, we introduce support for safe in-place updates, with type system support to ensure referential transparency and equational reason- ing. Fifth, we perform an evaluation on 21 benchmarks that demonstrates the impact of the language and compiler features, and shows application-level per- formance that is in many cases competitive with hand-written GPU code. Sixth, we make the Futhark compiler freely available with full source code and documentation, to serve both as a basis for further research into functional array programming, and as a useful tool for parallel programming in practice. ii Resumé Denne afhandling beskriver udformningen og implementeringen af Futhark, et enkelt data-parallelt, sideeffekt-frit, og funktionsorienteret geledsprog, der frembyder en maskinneutral programmeringsmodel. Vi beskriver ligeledes den tilhørende optimerende Futhark-oversætter, som producerer effektiv OpenCL- kode målrettet afvikling på GPUer. Den overordnede designfilosofi er at yd- nytte både functionsorienterede og imperative fremgangsmåder. Voreskonkrete bidrag er som følger: For det første præsenterer vi en moderat fladningstransformering, der er i stand til at udnytte blot den grad af parallelisme som er nødvendig eller lettil- gængelig, og omdanne overskydende parallelisme til effektiv sekventiel kode. Denne sekventielle kode bibeholder oprindelig lageradgangsmønsterinforma- tion, hvilket tillader yderligere lagertilgangsforbedringer. Vi demonstrerer nyt- ten af denne egenskab ved at give eksempler på automatisk blokafvikling af løkker, samt ændring af lageradgangsmønstre således at GPUens lagersystem udnyttes bedst muligt. For det andet beskriver vi, med henblik på understøttelse af fladningstrans- formeringen, et enkelt typesystem med størrelses-afhængige typer, der tillader oversætteren at ræsonnere symbolsk om størrelsen på geledder i programmet under oversættelse. Vores fremgangsmåde tillader genbrug af det almene reper- toire af oversætteroptimeringer i spørgsmål om ligheder mellem størrelser. For det tredje udstyrer vi Futhark med en række nyskabedne parallelle kom- binatorer der tillader effektiv sekventialisering af unødig parallelisme, samt disses fusionsregler. For det fjerde indfører vi, med henblik på at understøtte effektiv sekven- tiel kode indlejret i de parallelle sprogkonstruktioner, understøttelse for direkte ændringer i geledværdier. Denne understøttelse sikres af et typesystem der garanterer at effekten ikke kan observeres, og at lighedsbaseret ræsonnering stadigvæk er muligt. For det femte foretager vi en ydelsessammenlining indeholdende 21 pro- grammer, med henblik på at demonstrere sprogets praktiske anvendelighed og oversætteroptimeringernes indvirkning. Vores resultater viser at Futhark’s overordnede ydelse i mange tilfælde er konkurrencedygtig med håndskreven GPU-kode. For det sjette gør vi Futhark-oversætteren frit tilgængelig, inklusive al kilde- tekst og omfattende dokumentation, således at den kan tjene både som et udgangspunkt for yderligere forskning i funktionsorienteret geledprogrammering, samt som et praktisk andvendeligt værktøj til parallelprogrammering. iii Contents Preface v Part I Land, Logic, and Language 1 Introduction 2 2 Background and Philosophy 6 3 An Array Calculus 48 4 Parallelism and Hardware Constraints 53 Part II An Optimising Compiler 5 Overview and Uniqueness Types 70 6 Size Inference 90 7 Fusion 108 8 Moderate Flattening and Kernel Extraction 129 9 Optimising for Locality of Reference 139 10 Empirical Validation 149 Part III Closing Credits 11 Conclusions and Future Work 162 Bibliography 165 iv Preface This thesis is submitted in fulfillment of the PhD programme in computer science (Datalogi) at the University of Copenhagen, for Troels Henriksen, under the supervi- sion of Cosmin Eugen Oancea and Fritz Henglein. Publications Of the peer-reviewed papers I have published during my studies, the following con- tribute directly to this thesis: HENRIKSEN, TROELS, MARTIN ELSMAN, and COSMIN EOANCEA. “Size slicing: a hybrid approach to size inference in Futhark”. In: Proc. of the 3rd ACM SIGPLAN workshop on Functional high-performance computing. ACM. 2014, pp. 31–42 HENRIKSEN, TROELS, KEN FRIIS LARSEN, and COSMIN E. OANCEA. “Design and GPGPU Performance of Futhark’s Redomap Construct”. In: Proceedings of the 3rd ACM SIGPLAN International Workshop on Li- braries, Languages, and Compilers for Array Programming. ARRAY 2016. Santa Barbara, CA, USA: ACM, 2016, pp. 17–24 HENRIKSEN, TROELS, NIELS GW SERUP, MARTIN ELSMAN, FRITZ HENGLEIN, and COSMIN EOANCEA. “Futhark: purely functional GPU-programming with nested parallelism and in-place array updates”. In: Proceedings of the 38th ACM SIGPLAN Conference on Program- ming Language Design and Implementation. ACM. 2017, pp. 556–571 While the following do not: HENRIKSEN, TROELS and COSMIN EOANCEA. “Bounds checking: An instance of hybrid analysis”. In: Proceedings of ACM SIGPLAN In- ternational Workshop on Libraries, Languages, and Compilers for Array Programming. ACM. 2014, p. 88 v CONTENTS HENRIKSEN, TROELS, MARTIN DYBDAL, HENRIK URMS, ANNA SOFIE KIEHN, DANIEL GAVIN, HJALTE ABELSKOV, MARTIN ELSMAN, and COSMIN OANCEA. “APL on GPUs: A TAIL from the Past, Scribbled in Futhark”. In: Procs. of the 5th Int. Workshop on Functional High- Performance Computing. FHPC’16. Nara, Japan: ACM, 2016, pp. 38– 43 LARSEN, RASMUS WRIEDT and TROELS HENRIKSEN. “Strategies for Regular Segmented Reductions on GPU”. in: Proceedings of the 6th ACM SIGPLAN Workshop on Functional High-performance Computing. FHPC ’17. New York, NY, USA: ACM, 2017 Acknowledgements Language design and compiler hacking easily becomes a lonely endeavour. I would in particular like to thank Cosmin Oancea, Martin Elsman, and Fritz Henglein for being willing to frequently and vigorously discuss language design issues. I am also grateful to all students who contributed to the design and implementation of Futhark. In no particular order: Niels G. W. Serup, Chi Pham, Maya Saietz, Daniel Gavin, Hjalte Abelskov, Mikkel Storgaard Knudsen, Jonathan Schrøder Holm Hansen, Sune Hellfritzsch, Christian Hansen, Lasse Halberg Haarbye, Brian Spiegelhauer, William Sprent, René Løwe Jacobsen, Anna Sofie Kiehn, Henrik Urms, Jonas Brunsgaard, and Rasmus Wriedt Larsen. Development of the compiler was aided by Giuseppe Bilotta’s advice on OpenCL programming, and James Price’s work on oclgrind [PM15]. I am also highly appreciative of the work of all who contributed code or feedback to the Futhark project on Github1, including (but not limited to): David Rechnagel Ud- sen, Charlotte Tortorella, Samrat Man Singh, maccam912, mrakgr, Pierre Fenoll, and titouanc I am also particularly grateful for Cosmin Oancea’s choice to devise a research programme that has allowed me the opportunity to spend three years setting up GPU drivers on Linux. And relatedly, I am grateful to NVIDIA for donating the K40 GPU used for most of the empirical results in this thesis. And of course, I am grateful that Lea was able to put up with me for these three and a half years, even if I still do not understand why. For this revised version of my thesis, I am also grateful to the feedback of my as- sessment committee—Torben Mogensen, Mary Sheeran, and Alan Mycroft—whose careful reading resulting in many improvements to the text. The cover image depicts the common European hedgedog (Erinaceus europaeus), and is from Iconographia Zoologica, a 19th century collection of zoological prints. Like hedgehogs, functional languages can be much faster than they might appear. 1https://github.com/diku-dk/futhark vi Part I Land, Logic, and Language 1 Chapter 1 Introduction This thesis describes the design and implementation of Futhark, a data parallel func- tional array language. Futhark, named after the first six letters of the Runic alphabet, is a small programming language that is superficially similar to established functional languages such as OCaml and Haskell, but with