Machine Learning in Compiler Optimisation Zheng Wang and Michael O’Boyle
arXiv:1805.03441v1 [cs.PL] 9 May 2018

Abstract—In the last decade, machine learning based compilation has moved from an obscure research niche to a mainstream activity. In this article, we describe the relationship between machine learning and compiler optimisation and introduce the main concepts of features, models, training and deployment. We then provide a comprehensive survey and a road map for the wide variety of different research areas. We conclude with a discussion of open issues in the area and potential research directions. This paper provides both an accessible introduction to the fast moving area of machine learning based compilation and a detailed bibliography of its main achievements.

Index Terms—Compiler, Machine Learning, Code Optimisation, Program Tuning

I. INTRODUCTION

"Why would anyone want to use machine learning to build a compiler?" It is a view expressed by many colleagues over the last decade. Compilers translate programming languages written by humans into binaries executable by computer hardware. It is a serious subject, studied since the 50s [1], [2], [3], where correctness is critical and caution is a by-word. Machine learning, on the other hand, is an area of artificial intelligence aimed at detecting and predicting patterns. It is a dynamic field looking at subjects as diverse as galaxy classification [4] and predicting elections based on Twitter feeds [5]. When an open-source machine learning compiler was announced by IBM in 2009 [6], some wry Slashdot commentators picked up on the AI aspect, predicting the start of sentient computers, a global net and the war with machines from the Terminator film series. In fact, as we will see in this article, compilers and machine learning are a natural fit and have developed into an established research domain.

A. It's all about optimization

Compilers have two jobs – translation and optimisation. They must first translate programs into binary correctly. Secondly, they have to find the most efficient translation possible. There are many different correct translations whose performance varies significantly. The vast majority of research and engineering practice is focussed on this second goal of performance, traditionally misnamed optimisation. The goal was misnamed because, until recently, finding an optimal translation was in most cases dismissed as too hard and an unrealistic endeavour^1. Instead, the field focussed on developing compiler heuristics to transform the code in the hope of improving performance, but which could in some instances damage it.

Machine learning predicts an outcome for a new data point based on prior data. In its simplest guise it can be considered a form of interpolation. This ability to predict based on prior information can be used to find the data point with the best outcome, and is closely tied to the area of optimisation. It is at this overlap, viewing code improvement as an optimisation problem and machine learning as a predictor of the optima, where we find machine-learning compilation.

Optimisation as an area, machine-learning based or otherwise, has been studied since the 1800s [8], [9]. An interesting question is therefore why the convergence of these two areas has taken so long. There are two fundamental reasons. Firstly, despite the year-on-year increasing potential performance of hardware, software is increasingly unable to realise it, leading to a software gap. This gap has yawned right open with the advent of multi-cores (see also Section VI-B). Compiler writers are looking for new ways to bridge this gap.

Secondly, computer architecture evolves so quickly that it is difficult to keep up. Each generation has new quirks, and compiler writers are always trying to play catch-up. Machine learning has the desirable property of being automatic. Rather than relying on expert compiler writers to develop clever heuristics to optimise the code, we can let the machine learn how to optimise a compiler to make the machine run faster, an approach sometimes referred to as auto-tuning [10], [11], [12], [13]. Machine learning is, therefore, ideally suited to making any code optimization decision where the performance impact depends on the underlying platform. As described later in this paper, it can be used for topics ranging from selecting the best compiler flags to determining how to map parallelism to processors.

Machine learning is part of a tradition in computer science and compilation of increasing automation. The 50s to 70s were spent trying to automate compiler translation, e.g. lex for lexical analysis [14] and yacc for parsing [15]; the last decade, by contrast, has focussed on trying to automate compiler optimisation. As we will see, it is not "magic" or a panacea for compiler writers; rather, it is another tool that allows the automation of tedious aspects of compilation, providing new opportunities for innovation. It also brings compilation nearer to the standards of evidence-based science. It introduces an experimental methodology where we separate out evaluation from design and consider the robustness of solutions. Machine learning based schemes in general have the problem of relying on black boxes whose workings we do not understand and hence trust. This problem is just as true for machine learning based compilers. In this paper we aim to demystify machine learning based compilation and show that it is a trustworthy and exciting direction for compiler research.

Z. Wang is with MetaLab, School of Computing and Communications, Lancaster University, U.K. E-mail: [email protected]
M. O'Boyle is with the School of Informatics, University of Edinburgh, U.K. E-mail: [email protected]
^1 In fact the term superoptimizer [7] was coined to describe systems that tried to find the optimum.

[Fig. 1: A generic view of supervised machine learning in compilers. (a) Feature engineering: program properties such as #inst., #load, #branch and cache miss rate are summarised into a feature vector. (b) Learning a model: the features of training programs, together with their optimal options, are fed to a supervised learner. (c) Deployment: the learned model predicts options for new programs.]

The remainder of this article is structured as follows. We first give an intuitive overview of machine learning in compilers in Section II. We then describe how machine learning can be used to search for, or to directly predict, good compiler optimizations in Section III. This is followed in Section IV by a comprehensive discussion of the wide range of machine learning models that have been employed in prior work. Next, in Section V, we review how previous work chooses quantifiable properties, or features, to represent programs. We discuss the challenges and limitations of applying machine learning to compilation, as well as open research directions, in Section VII, before we summarise and conclude in Section VIII.

II. OVERVIEW OF MACHINE LEARNING IN COMPILERS

Given a program, compiler writers would like to know what compiler heuristic or optimisation to apply in order to make the code better. Better often means execute faster, but it can also mean smaller code footprint or reduced power. Machine learning can be used to build a model, used within the compiler, that makes such decisions for any given program.

There are two main stages involved: learning and deployment. The first stage learns the model based on training data, while the second uses the model on new, unseen programs. Within the learning stage, we need a way of representing programs in a systematic way. This representation is known as the program features [16].

Figure 1 gives an intuitive view of how machine learning can be applied to compilers. This process, which includes feature engineering, learning a model and deployment, is described in the following sub-sections.

A. Feature engineering

Before we can learn anything useful about programs, we first need to be able to characterise them. Machine learning relies on a set of quantifiable properties, or features, to characterise the programs (Figure 1a). Standard machine learning algorithms typically work on fixed-length inputs, so the selected properties are summarised into a fixed-length feature vector. Each element of the vector can be an integer, real or Boolean value. The process of feature selection and tuning is referred to as feature engineering. This process may need to be performed iteratively multiple times to find a set of high-quality features with which to build an accurate machine learning model. In Section V, we provide a comprehensive review of feature engineering for the topic of program optimisation.

B. Learning a model

The second step is to use training data to derive a model using a learning algorithm. This process is depicted in Figure 1b. Unlike other applications of machine learning, we typically generate our own training data using existing applications or benchmarks. The compiler developer will select training programs which are typical of the application domain. For each training program, we calculate the feature values, compile the program with different optimisation options, and run and time the compiled binaries to discover the best-performing option. This process produces, for each training program, a training instance that consists of the feature values and the optimal compiler option for the program.

The compiler developer then feeds these examples to a machine learning algorithm to automatically build a model. The learning algorithm's job is to find, from the training examples, a correlation between the feature values and the optimal optimisation decision. The learned model can then be used to predict, for a new set of features, what the optimal optimisation option should be. Because the performance of the learned model strongly depends on how well the features and training programs are chosen, the processes of feature engineering and training data generation often need to be repeated multiple times.
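To make the feature-engineering step concrete, the sketch below summarises a program as a fixed-length numeric feature vector, as described above. It is a minimal illustration only: the feature names and the crude source-text counting heuristics are assumptions made for this example, not the feature extractors used in the surveyed work, which typically run over a compiler's intermediate representation rather than source text.

```python
# A sketch of feature engineering: summarise a program into a
# fixed-length feature vector. Feature names and the token-counting
# heuristics are illustrative assumptions, not a real compiler pass.
import re

FEATURE_NAMES = ["num_statements", "num_loads", "num_branches", "num_loops"]

def extract_features(source: str) -> list:
    """Return a fixed-length vector of integer features for a program."""
    return [
        source.count(";"),                        # rough statement count
        len(re.findall(r"\bload\w*\b", source)),  # hypothetical load markers
        source.count("if"),                       # crude branch count
        source.count("for") + source.count("while"),  # crude loop count
    ]

program = "for (i = 0; i < n; i++) { if (a[i] > 0) s += a[i]; }"
vec = extract_features(program)  # same length for every input program
```

Whatever the counting method, the key property is that every program, regardless of its size, maps to a vector of the same length, so that a standard learning algorithm can consume it.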
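The learning and deployment stages described in Section II-B can likewise be sketched in a few lines. Here a 1-nearest-neighbour rule stands in for whatever learning algorithm a real system would use, and the feature vectors, runtimes and flag combinations are invented for illustration; the only element taken from the text is the workflow itself: time each training program under each option, label it with the best one, then predict for unseen feature vectors.

```python
# A sketch of learning (Fig. 1b) and deployment (Fig. 1c).
# Training data pairs each program's feature vector with the option
# that gave the lowest measured runtime; all numbers are invented.

def best_option(timings: dict) -> str:
    """Pick the optimisation option with the lowest measured runtime."""
    return min(timings, key=timings.get)

# (feature vector, {option: runtime in seconds}) per training program
measurements = [
    ([12, 4, 1], {"-O2": 1.40, "-O3": 1.10, "-O3 -funroll-loops": 0.95}),
    ([3, 1, 0],  {"-O2": 0.30, "-O3": 0.32, "-O3 -funroll-loops": 0.33}),
    ([20, 9, 2], {"-O2": 2.50, "-O3": 2.10, "-O3 -funroll-loops": 1.80}),
]
training_set = [(f, best_option(t)) for f, t in measurements]

def predict(features: list, training_set: list) -> str:
    """Deployment: return the best option of the nearest training program."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(training_set, key=lambda ex: dist(ex[0], features))
    return label
```

At deployment time only `predict` runs inside the compiler; the expensive compile-run-time loop that produced `training_set` happens once, offline, which is what makes the approach practical.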