Designing Algorithms for Machine Learning and Data Mining

Antoine Cornuéjols and Christel Vrain

Abstract Designing Machine Learning algorithms implies answering three main questions: First, what is the space H of hypotheses or models of the data that the algorithm considers? Second, what is the inductive criterion used to assess the merit of a hypothesis given the data? Third, given the space H and the inductive criterion, how is the exploration of H carried out in order to find as good a hypothesis as possible? Any learning algorithm can be analyzed along these three questions. This chapter focusses primarily on unsupervised learning, on one hand, and supervised learning, on the other hand. For each, the foremost problems are described as well as the main existing approaches. In particular, the interplay between the structure that can be endowed over the hypothesis space and the optimisation techniques that can in consequence be used is underlined. We cover especially the major existing methods for clustering: prototype-based, generative-based, density-based, spectral-based, hierarchical, and conceptual, and visit the validation techniques available. For supervised learning, the generative and discriminative approaches are contrasted and a wide variety of linear methods, in which we include the Support Vector Machines and Boosting, are presented. Multi-layer neural networks and deep learning methods are discussed. Some additional methods are illustrated, and we describe other learning problems including semi-supervised learning, active learning, online learning, transfer learning, learning recommendations, and identifying causal relationships. We conclude this survey by suggesting new directions for research.

1 Introduction

Machine Learning is the science of, on one hand, discovering the fundamental laws that govern the act of learning and, on the other hand, designing machines that learn from experience, in the same way as physics is both the science of uncovering the laws of the universe and of providing knowledge to make, in a very broad sense, machines. Of course, "understanding" and "making" are tightly intertwined, in that a progress in one aspect generally benefits the other aspect. But a Machine Learning scientist can feel more comfortable and more interested in one end of the spectrum that goes from 'theorizing Machine Learning' to 'making Machine Learning'.

Machine Learning is tied to data science, because it is fundamentally the science of induction, which tries to uncover general laws and relationships from some set of data. However, it is interested as much in understanding how it is possible to use very few examples, like when you learnt how to avoid a "fork" in chess from one experience only, as in how to make sense of large amounts of data. Thus, "big data" is not synonymous with Machine Learning. In this chapter, we choose not to dwell upon the problems and techniques associated with gathering data and realizing all the necessary preprocessing phases. We will mostly assume that this has been done in such a way that looking for patterns in the data will not be too compromised by imperfections of the data at hand. Of course, any practitioner of Machine Learning will know that this is a huge assumption and that the corresponding work is of paramount importance.

Before looking at what can be a science of designing learning algorithms, it is interesting to consider basic classical Machine Learning scenarios.

A. Cornuéjols (B): UMR MIA-Paris, AgroParisTech, INRA, Université Paris-Saclay, 75005 Paris, France. e-mail: [email protected]
C. Vrain: LIFO, EA 4022, University of Orléans, 45067 Orléans, France. e-mail: [email protected]
© Springer Nature Switzerland AG 2020. P. Marquis et al. (eds.), A Guided Tour of Artificial Intelligence Research, https://doi.org/10.1007/978-3-030-06167-8_12

2 Classical Scenarios for Machine Learning

A learning scenario is defined by the exchanges between the learner and its environment. Usually, this goes hand in hand with the target task given to the system.

In supervised learning, the learner receives from the environment a set of examples S = {(x_i, y_i)}_{1≤i≤m}, each composed of a set of explanatory or input variables x_i and of output variable(s) y_i, of which the value must be predicted when the explanatory variables are observed. The goal for the learner is to be able to make predictions about the output values given input values. For example, the learner may receive data about patients registered in a hospital, in the form of pairs (measures made on the patient, diagnostic), and aims at being able to give a correct diagnostic for new arriving patients on which measurements are available.

By contrast, the objective of unsupervised learning is not to make predictions from input values to output values, but to reveal possible hidden structures in the data set, S = {x_1, ..., x_m}, or to detect correlations between the variables. If these putative structures or regularities may sometimes be extrapolated to other data collections, this is not the primary goal of unsupervised learning.

A third type of learning, of growing importance, is reinforcement learning (see the corresponding chapter of Volume 1). There, the learner acts in the environment and therefore must be able to decide on the action to take in each successive state encountered in its peregrinations. The trick is that the learner receives reinforcement signals, positive or negative, from time to time, sometimes long after the action that triggered it has been performed. It is therefore not easy to determine which actions are the best in each possible state. The goal of the learner is to maximize the cumulated reinforcement over time even though the credit assignment problem is hard to solve.
Reinforcement learning is at the core of the famous AlphaGo system that beat one of the strongest Go players in the world in March 2016, and is now believed to far outclass any human player (see chapter "Artificial Intelligence for Games" of this volume).

One important distinction is between descriptive learning and predictive learning. Descriptive learning aims at finding regularities in the data in the hope that they may help to better understand the phenomenon under study. For instance, descriptive learning may uncover different groups in a population of customers, which may in turn help to understand their behavior and suggest different marketing strategies. Descriptive learning is strongly linked to unsupervised learning. Predictive learning is concerned with finding rules that allow one to make predictions when given a new instance. Predictive learning is therefore inherently extrapolative. Its justification is in making predictions for new instances, while descriptive learning is turned towards the examples at hand, and not, at least directly, towards telling something about new instances. Predictive learning is tightly associated with supervised learning.

Sometimes, a third type of learning, called prescriptive learning, is mentioned. The goal there is to extract information about what can be levers or control actions that would allow one to alter the course of some phenomenon (e.g. climatic change, the diet of the general population). Generally, gaining control means that causal factors have been identified. And this is not the same as being able to predict some events based on the occurrence of some other ones, which might be done by discovering correlations between events. Therefore, special techniques and some specific forms of knowledge have to be called upon to confront this challenge.

2.1 The Outputs of Learning

It is useful to clarify what the output of learning is. It can indeed change depending on the application. A learning algorithm A can be seen as a machine that takes as input a data set S and produces as output either a model (loosely speaking) of the world M or a decision procedure h, formally A: S ↦ M or h. Let us consider this last case, where the decision procedure h is able to associate an output y ∈ Y to any input x ∈ X. Then we have h: x ∈ X ↦ y ∈ Y.

So, we see that we naturally speak of different outputs (either a model M or a function h) without using different words. And the decision function h outputs a prediction y given any input x. The right interpretation of the word output is provided by the context, and the reader should always be careful about the intended meaning. In the remainder of this section, output will mean the output of the learning algorithm A.

One important distinction is between the generative and the discriminative models or decision functions. In the generative approach, one tries to learn a (parametric) probability distribution p_X over the input space X. If learning a precise enough probability distribution is successful, it becomes possible in principle to generate further examples x ∈ X of which the distribution is indistinguishable from the true underlying distribution.

Using the learnt distribution p_X, it is possible to use it as a model of the data in the unsupervised regime, or as a basis for a decision function using the maximum a posteriori criterion (see Sect. 3.3 below). Some say that this makes the generative approach "explicative". This is only true as far as a distribution function provides an explanation, and not everyone would agree on this.

The discriminative approach does not try to learn a model that allows the generation of more examples. It contents itself with providing either means of deciding, in the supervised mode, or means to express some regularities in the data set, in the unsupervised mode. These regularities or decision functions can be expressed as logical rules, graphs, neural networks, etc. While they do not allow one to generate new examples, they nonetheless can be much more interpretable than probability distributions. Furthermore, as Vapnik, a giant in this field, famously said: "If you are limited to a restricted amount of information, do not solve the particular problem you need by solving a more general problem" (Vapnik 1995, p. 169). This can be paraphrased as: if you need to make predictions or to summarize a data set, it might not be advisable to look first for a generative model, something that generally requires large quantities of data.

Now, a decision function might provide a yes or no answer when someone would like to have an associated confidence score. The generative techniques are generally able to provide this uncertainty level rather naturally, while the discriminative techniques must be adapted, often through some heuristic means.
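As an illustration of the generative route to a decision function, the following sketch fits a class prior and a one-dimensional Gaussian per class, then decides by maximum a posteriori. It is a minimal, hypothetical example: the toy data, the function names `fit_generative` and `map_predict`, and the 1-D Gaussian assumption are ours, chosen only to make the idea concrete.

```python
import math

# Toy 1-D training sample: two classes with different means (invented data).
data = [(1.0, 'a'), (1.2, 'a'), (0.8, 'a'), (3.0, 'b'), (3.2, 'b'), (2.9, 'b')]

def fit_generative(sample):
    """Estimate a class prior and a 1-D Gaussian per class (the generative model)."""
    params = {}
    n = len(sample)
    for label in {y for _, y in sample}:
        xs = [x for x, y in sample if y == label]
        mu = sum(xs) / len(xs)
        var = sum((x - mu) ** 2 for x in xs) / len(xs)
        params[label] = (len(xs) / n, mu, max(var, 1e-9))
    return params

def map_predict(params, x):
    """Maximum a posteriori decision: argmax_y p(y) * p(x | y), in log space."""
    def log_score(p):
        prior, mu, var = p
        return (math.log(prior)
                - 0.5 * math.log(2 * math.pi * var)
                - (x - mu) ** 2 / (2 * var))
    return max(params, key=lambda y: log_score(params[y]))

model = fit_generative(data)
print(map_predict(model, 1.1))  # a point near the first class
print(map_predict(model, 3.1))  # a point near the second class
```

Because the model estimates p(x | y), it could also be sampled to generate new artificial points, which is exactly what a purely discriminative decision function cannot do.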

2.2 The Inputs of Learning

According to the discussion about the possible outputs of learning, the inputs can be either a whole data set S, or a particular instance x for which one wants a prediction y. Here, we list some possible descriptions that the elements x ∈ X can take depending on the application domain.

1. Vectorial. This is generally the case when the data is taken from a relational database. In this case, the descriptors are considered as dimensions of an input space, which is often considered as a vectorial space. In addition, when a distance is defined, we get a normed vectorial space: this is very convenient since many mathematical techniques have been devised for such spaces.
2. Non vectorial. This is the case for instance when the number of internal elements of an example is not fixed. For instance, genomes or documents have an undefined number of elements (e.g. nucleotides or words). Then, it is more difficult to define proper distances, but in most cases, adapted distances have been defined.
3. Structured data. In this case, one can exploit the internal structure of the data points, thus adding new structural features. It is possible to further distinguish:

• Sequential data. In sequential data, the description of a data point is composed of ordered elements: the individual measurements cannot be exchanged without changing its information content. A time series for instance can be characterized by some trend, or some periodic variations.
• Spatial data. Spatial data exhibit a spatial structure, expressing some dependencies between adjacent or distant elements of the description.
• Graphical data. Some data, like social networks, are best described as graphs with directed or undirected, weighted or not, edges.
• Relational data. More generally, complex structures, like molecules or textual documents, may need to be described, relying on formalisms close to first order logic.

Some data, like videos, share several descriptions, for instance being both sequentially and spatially organized.

It must be emphasized that finding the appropriate description of the data is very often a tricky task, which requires skilled experts. This must be done before some extra techniques that seek to massage the data further are put to work, like, for instance, identifying the principal components, or selecting the most informative descriptors.

3 Designing Learning Algorithms

3.1 Three Questions that Shape the Design of Learning Algorithms

Although predictive and descriptive algorithms have different goals and different success criteria, the overall approach to their design is similar, and it can be cast as the answer to three main questions.

1- What type of regularities is of interest for the expert in the data? In unsupervised learning, this question is paramount since the goal of descriptive learning is to uncover structures in the data. This question seems less central in supervised learning, where the first concern is to make predictions, and, it could be said, whatever the means. However, even in the supervised setting, the type of decision function that one is ready to consider to make predictions determines the type of algorithm that is adapted.

2- What is the performance criterion that one wants to optimize? In the early days of machine learning, algorithms were mostly designed in an algorithmic way. The algorithm had to fulfill a function, and if it did, if possible with smart procedures, all was good. The approach, nowadays, is different. One starts by specifying a performance criterion. It evaluates the quality of the model or of the hypothesis learned. In supervised learning, the criterion takes into account the fit to the training data plus a component that expresses how much the hypothesis satisfies some prior bias. In unsupervised learning, the criterion conveys how much the structure discovered in the data matches the kind of regularities one is expecting.

3- How to organize the search in the space of possible structures? Learning is viewed as a search procedure in a space of possible structures, given a performance criterion. Once a space of possibilities and a performance measure have been decided upon, it is time to devise an algorithm that is able to search efficiently the space of possibilities, called the search space, in order to find a structure that has the best, or at least a good, performance measure. This is where computer science comes to the fore.

In the remainder of this section, we stay at a general level of description. Details about the choice of responses for these questions will be given in later sections.

3.2 Unsupervised Learning

3.2.1 The Problems of Unsupervised Learning

Unsupervised learning works on a dataset described in an input space X and aims at understanding the underlying structure of the data and of the representation space w.r.t. this data. Two kinds of problems are considered: the first one, clustering, aims at finding an organization of the data into classes, whereas the second one aims at finding dependencies, such as correlations, between variables.

• Given a dataset, unsupervised classification, also called clustering, aims at finding a set of compact and well separated clusters, which means that objects in a same cluster are similar (their pairwise distance is low) and objects in different clusters are dissimilar (their pairwise distance is large). This set of clusters can form a partition of the data (partitioning problem) or it can be organized into a hierarchy (hierarchical clustering). The problem is usually modeled by an optimization criterion specifying properties the output must satisfy. For instance, in partitioning problems, the most used optimization criterion, popularized by the k-means algorithm, is the minimization of the sum of the squared distances of the points to the center of the cluster they belong to. Given m objects x_1, …, x_m, usually in R^d, let {C_1, ..., C_k} denote the k clusters and let μ_j denote the centroid of the cluster C_j; the criterion is written

∑_{j=1}^{k} ∑_{x_i ∈ C_j} ‖x_i − μ_j‖²₂

This domain has received a lot of attention, and the framework has been extended in several directions. For instance, requiring a partition of the data can be too restrictive. As an example, in document classification, a text can be labelled with several topics and thus could be classified into several clusters. This leads to soft clustering (in opposition to hard clustering), including fuzzy clustering, where a point belongs to a cluster with a degree of membership, or overlapping clustering, where an object can belong to several clusters. Fuzzy c-means (Dunn 1973; Bezdek 1981) for instance is an extension of k-means for fuzzy clustering. Data can also be

described by several representations (for instance, texts and images for a webpage), thus leading to multi-view clustering, which aims at finding a consensus clustering between the different views of the data.

Classic clustering methods are usually heuristic and search for a local optimum, and different local optima may exist. Depending on the representation of the data, on the choice of the dissimilarity measure, on the chosen methods and on the parameter setting, many different results can be produced. Which one is the correct partition, the one the expert expects? To overcome this problem, two directions have been taken. The first one, initiated in (Wagstaff and Cardie 2000), integrates user knowledge, written as constraints, in the clustering process. Constraints can be put on pairs of points: a must-link constraint between two points requires these two points to be in the same cluster, whereas a cannot-link constraint between two points requires these two points to be in different clusters. This leads to a new field, called Constrained Clustering, that is presented in chapter "Constrained Clustering: Current and New Trends" of this volume. The second solution is to generate many partitions and then to combine them into a hopefully more robust partition; this research direction is called cluster ensemble (Vega-Pons and Ruiz-Shulcloper 2011).

Another domain related to clustering is bi-clustering, also called co-clustering: given a matrix M, bi-clustering aims at finding simultaneously a clustering of rows and of columns. Its goal is therefore to identify blocks or biclusters, composed of rows and columns, satisfying a given property: for instance, the elements in a block are constant, or the elements in each row of the block are constant. See for instance (Busygin et al. 2008; Madeira and Oliveira 2004) for an introduction to co-clustering.
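The k-means criterion above is typically minimized by Lloyd's iterative heuristic, which alternates an assignment step and a centroid-update step and converges to a local optimum depending on the initialization. A minimal sketch (the toy data, iteration count and seed are ours, for illustration):

```python
import random

def kmeans(points, k, n_iter=20, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid updates,
    each of which can only decrease the sum of squared distances."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda j: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[j])))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        for j, c in enumerate(clusters):
            if c:
                centroids[j] = tuple(sum(xs) / len(c) for xs in zip(*c))
    return centroids, clusters

# Two well-separated blobs in R^2.
pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids, clusters = kmeans(pts, k=2)
```

Running the sketch from a different random seed may yield a different local optimum, which is precisely the sensitivity to initialization discussed above.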
• Mining interesting patterns was introduced in the 90s by (Agrawal and Srikant 1994) and has known a growing interest since then. The initial problem was to find association rules, modeling a dependency (A_1 ∧ ··· ∧ A_p) → (B_1 ∧ ··· ∧ B_q) between two sets of variables. The interest of the dependency was measured by two criteria: the support, defined as the proportion of the population satisfying both sets of variables, and the confidence, an estimation of P(B_1 ∧ ··· ∧ B_q | A_1 ∧ ··· ∧ A_p), measured by the proportion of the population satisfying the conclusion of the rule among those satisfying the conditions.

Unsupervised learning has to face two important problems: controlling the complexity of the algorithms and ensuring the validation of the results. Indeed, the number of partitions of a set of m elements is given by the Bell number B_m and, when the number k of clusters is fixed, it is given by the Stirling number of the second kind S(m, k).¹ When mining frequent patterns, the complexity is linked to the size of the search space (in case of a boolean dataset, the size of the search space is 2^d, where d is the number of boolean attributes) and to the size of the dataset (computing the frequency of a pattern requires considering all points of the dataset). This explains why many works in pattern mining have tried to reduce the complexity by pruning

the search space and/or changing the representation of the database. From the perspective of finding an actual solution, clustering is often defined as an optimization problem, e.g., finding the best partition given an optimization criterion, whereas pattern mining is an enumeration problem, e.g., finding all the patterns satisfying some given constraints.

Another important difficulty for unsupervised learning is the problem of validation. It is well known that the notion of confidence for association rules can be misleading: it only measures the probability of the conclusion of the rule given the condition (P(B|A) for a rule A → B) but it does not measure the correlation between A and B, and it is possible to have a rule A → B with a high confidence despite a negative correlation between A and B. Therefore, different measures have been introduced for assessing the interest of a rule (see for instance Han et al. 2011). Moreover, this domain has to confront the large amount of patterns that can be generated, thus making an expert evaluation difficult. Regarding clustering, the problem is aggravated: how can we validate an organization of data into classes, while, in addition, it can depend on points of view? Classically, two kinds of evaluation are performed: either relying on an internal criterion or evaluating the classification with regard to a ground truth.

¹ S(m, k) = (1/k!) ∑_{j=0}^{k} (−1)^j C(k, j) (k − j)^m.
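The combinatorial explosion invoked above is easy to make concrete: the inclusion-exclusion formula for the Stirling number S(m, k) can be evaluated directly, and summing it over k gives the Bell number. A small illustrative script (function names are ours):

```python
from math import comb, factorial

def stirling2(m, k):
    """S(m, k): number of partitions of m objects into exactly k
    non-empty clusters, via inclusion-exclusion."""
    return sum((-1) ** j * comb(k, j) * (k - j) ** m
               for j in range(k + 1)) // factorial(k)

def bell(m):
    """B_m: total number of partitions of m objects (any number of clusters)."""
    return sum(stirling2(m, k) for k in range(1, m + 1))

print(stirling2(10, 3))  # partitions of only 10 points into 3 clusters: 9330
print(bell(15))          # all partitions of 15 points: already over a billion
```

Even for a handful of points, exhaustive enumeration of partitions is hopeless, which is why exact clustering methods are rare and heuristics dominate.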

3.2.2 Approaches to Unsupervised Learning

Pattern mining and clustering are two distinct tasks, with their own families of methods. The first one works at the attribute level whereas the latter works at the data level. Nevertheless, they interact in the domain of conceptual clustering, which addresses the problem of finding an organization of data into concepts, where a concept is defined by two components: the extent of the concept, a subset of observations belonging to this concept, and the intent of the concept, a subset of attributes satisfied by elements of the concept.

Distance-based/similarity-based clustering. In distance-based clustering, the notion of dissimilarity between pairs of points is fundamental. When points are described by real features, the Euclidean distance (‖.‖₂) is generally considered, but when dealing with more complex data (texts, images) the identity (d(x, y) = 0 if and only if x = y) and the triangle inequality properties can be difficult to enforce, and therefore a dissimilarity measure must be defined.

Many clustering tasks are defined as an optimization problem. But because of the complexity of most optimization criteria, there exist only few exact methods, either exploring the search space by Branch and Bound strategies or based on declarative frameworks, such as Integer Linear Programming or Constraint Programming (see Sect. 4.1). Methods are mainly heuristic and search for a local optimum. The heuristic nature of the methods depends on several factors:

• Initialization of the method: some methods, such as k-means or k-medoids, start from a random initialization and search for a local optimum that depends on the initial choice.

• Search strategy: usually, the optimization of the criterion relies on a gradient descent procedure which is prone to converge to a local optimum.

Another family of clustering methods is no longer based on an optimization criterion but on the notion of density (see Sect. 4.4). This is illustrated by DBSCAN (Ester et al. 1996), which relies on the notion of core points: a core point is a point whose neighborhood is dense. The important notions are then the neighborhood, defined as a ball of a given radius around the point, and the density, specified by a minimum number of points in the neighborhood. Relying on core points, dense regions can be built. One interest of such approaches is that they allow finding clusters of various shapes.

Spectral clustering (see Sect. 4.5) is the most recent family of methods. It is based on a similarity graph, where each point is represented by a vertex in the graph and edges between vertices are weighted by the similarity between the points. The graph is not fully connected: for instance, edges between nodes can be removed when their similarity is less than a given threshold. The unnormalized graph Laplacian L of the graph has an interesting property: the multiplicity of the eigenvalue 0 of L is equal to the number of connected components in the similarity graph. This property leads to different algorithms. Classically, the k first eigenvectors of the Laplacian are computed, inducing a change in the data representation, and then a k-means procedure is applied on the new representation. It has been shown that there are some tight links between spectral clustering, Non-negative Matrix Factorization, kernel k-means and some variants of min-cut problems (see Sect. 4.5).

Finally, generative models that aim at modeling the underlying distribution p(x) of the data can be applied. For clustering, the data are modeled by a mixture of Gaussians and the parameters are learned, usually by maximizing the log-likelihood of the data, under the assumption that the examples are i.i.d.
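The Laplacian property invoked above can be checked directly on a toy similarity graph: with L = D − W (D the diagonal degree matrix), the indicator vector of each connected component is an eigenvector of L for eigenvalue 0, so the multiplicity of 0 counts the components. A small pure-Python sketch (the eigendecomposition and k-means steps of full spectral clustering are omitted; the graph and helper names are ours):

```python
def laplacian(W):
    """Unnormalized graph Laplacian L = D - W of an adjacency/similarity matrix."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0) - W[i][j] for j in range(n)]
            for i in range(n)]

def matvec(M, v):
    """Plain matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Similarity graph with two connected components: {0, 1} and {2, 3}.
W = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]
L = laplacian(W)

# The indicator vector of each component is mapped to the zero vector by L,
# i.e. it is an eigenvector for eigenvalue 0.
print(matvec(L, [1, 1, 0, 0]))  # [0, 0, 0, 0]
print(matvec(L, [0, 0, 1, 1]))  # [0, 0, 0, 0]
```

In a full spectral clustering algorithm, one would compute the first k eigenvectors of L (e.g. with a numerical library) and run k-means on the rows of the resulting matrix.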
The EM algorithm is the most widely used algorithm in this context.

Conceptual clustering. Conceptual clustering was first introduced in (Michalski 1980). The aim is to learn concepts, where a concept is defined by a set of objects (extent of the concept) and a description (intent of the concept). It is well illustrated by the system Cobweb (Fisher 1987): it incrementally builds a hierarchy of concepts where a concept C is described by the quantities P(X_i = v | C) for each feature X_i and each possible value v this feature can take.

The interest in conceptual clustering has been revived for a decade now with the emergence of pattern mining. Indeed, an important notion in itemset mining is the notion of closed itemsets, where a closed itemset is a set of items that forms a concept, as defined in Formal Concept Analysis (see Sect. 7.1.4). Therefore, once interesting closed itemsets, and therefore concepts, have been found, it becomes natural to study the organization of (some of) these concepts in a structure, leading to a partition of the data or to a hierarchical organization of the data.

Pattern mining. The problem introduced in (Agrawal and Srikant 1994) was mining association rules in the context of transactional databases: given a predefined set of items I, a transaction is defined by a subset of items, called an itemset. A
The first step is the most time-consuming, with a complexity depending on the number of features and on the number of observations: in the Boolean case, the complexity of the search space is 2d ,whered is the number of Boolean features and the evaluation of the patterns, (e.g. computing their supports) require to go through the entire database. Many algorithms, as for instance Eclat (Zaki 2000), FP-Growth (Han et al. 2000) or LCM (Uno et al. 2004), have been developed to improve the efficiency of pat- tern mining, relying on different representations of the database or/and on different search strategies. The closedness of an itemset is an important property, since the set of frequent closed itemsets forms a condensed representation of the set of frequent itemsets, requiring less memory to store it. Some methods find all the frequent pat- terns, some searches only for maximal frequent itemsets or only for closed frequent itemsets (Bastide et al. 2000). Pattern mining has also been developed for handling more complex databases containing structured data such as sequences or graphs.

3.3 Supervised Learning

Supervised learning aims at finding prediction rules from an input space X (the description of examples, or situations of the world) to an output space Y (the decisions to be made). The goal is to make predictions about the output values given input values, and this is done through the learning of a decision procedure h: X → Y using a training sample S = {(x_1, y_1), . . . , (x_m, y_m)}.

Both the input space and the output space can take various forms according to the task at hand. Classically, the input space often resulted from extractions from a relational database, therefore taking the form of vectorial descriptions (e.g. age, gender, revenue, number of dependents, profession, …). Recently, non vectorial input spaces have become fashionable, like texts, images, videos, genomes, and so on. Similarly, the output space may vary from binary labels (e.g. likes, dislikes), to a finite set of categories (e.g. the set of possible diseases), or to the set of real numbers. When the output space is a finite set of elements, the learning task is known as classification, while, when it is infinite, it is called regression. In the case of multivariate supervised learning, the output space is the cartesian product of several spaces (e.g. one may want to predict both the profession and the age of a customer). Some learning tasks involve structured outputs. For instance, one may want to infer the structure of a molecule given a set of physical and chemical measurements, or to infer the grammatical structure of an input sentence.

3.3.1 The Problems of Supervised Learning

In the following, we focus on the classification task and do not deal directly with regression.

A supervised learning algorithm A takes as input the learning set S and a space of decision functions or hypotheses H, and it must output a decision function h: X → Y. The search for a good hypothesis h can be done directly by an exploration of the space H. This is called the discriminative approach. Or it can be done indirectly, by first inferring a joint probability distribution p_XY over X × Y, estimating both p_Y and p_{X|Y}, and then computing the likelihood p(y|x) for all possible values of y ∈ Y and choosing the most probable one. This approach is known as generative since, in principle, it is possible to generate artificial data points x_i for all classes y ∈ Y using the estimated distribution p_{X|Y}. This is not possible if one has learned a decision function like, for instance, a logical rule or a hyperplane in the input space X separating two classes.

The generative viewpoint lies at the core of Bayesian learning, while the discriminative one is central to the machine learning approach. We adopt the latter perspective in the following.

The Inductive Criterion

The learning algorithm A must solve the following problem: given a training sample S = {(x_1, y_1), . . . , (x_m, y_m)} and a hypothesis space H, what is the optimal hypothesis h ∈ H?

Under the assumption of a stationary environment, the best hypothesis is the one that will minimize the expected loss over future examples. This expected loss, also called the true risk, writes as:

R(h) = ∫_{X×Y} ℓ(h(x), y) p_XY(x, y) dx dy

where ℓ(h(x), y) measures the cost of predicting h(x) instead of the true label y, while p_XY is the joint probability governing the world and the labeling process. A best hypothesis is thus: h* = argmin_{h∈H} R(h). However, the underlying distribution p_XY is unknown, and it is therefore not possible to estimate R(h) and thus to determine h*.

Short of being able to compute the true risk of any h ∈ H, it is necessary to resort to a proxy for the true risk R(h). This is called the inductive criterion. One of the best known inductive criteria is the empirical risk. It consists in replacing the expected loss by an empirical measure: the mean loss computed on the training set:

R̂(h) = (1/m) ∑_{i=1}^{m} ℓ(h(x_i), y_i)

Searching for the best hypothesis using ĥ = argmin_{h∈H} R̂(h) is called the Empirical Risk Minimization (ERM) principle.
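The ERM principle can be illustrated on a toy hypothesis space of one-dimensional threshold classifiers, scoring each candidate by its mean 0-1 loss on the training sample. The sample and the threshold grid below are invented for the example:

```python
# Hypothesis space H: threshold classifiers h_t(x) = 1 if x >= t else 0.
sample = [(0.5, 0), (1.5, 0), (2.5, 1), (3.5, 1), (1.8, 1)]

def empirical_risk(t, sample):
    """Mean 0-1 loss of the threshold classifier h_t on the training sample."""
    return sum(1 for x, y in sample if (x >= t) != (y == 1)) / len(sample)

# ERM: pick the hypothesis in H with the smallest empirical risk.
thresholds = [0.0, 1.0, 2.0, 3.0]
best = min(thresholds, key=lambda t: empirical_risk(t, sample))
print(best, empirical_risk(best, sample))
```

Here the empirical risk is only a proxy: whether the selected threshold also has a low true risk depends on how representative the sample is of the unknown distribution p_XY.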

The ground for using an inductive criterion, such as the ERM, and the guarantees it offers in order to find an hypothesis with a true risk not too far from the true risk of the best hypothesis h* is the object of statistical learning theory (see chapter “Statistical Computational Learning” of Volume 1). Under the assumption that the data points are identically and independently distributed, the theory is able to show that the ERM must be altered with the incorporation of a bias on the hypotheses to be considered by the learner. Indeed, if no such bias is imposed, then it is always possible to find hypotheses that have low empirical risk (they fit the training data very well) while their true risk is high, meaning that they behave badly over instances that do not belong to the training set. This is called overfitting.
The hypothesis space H must consequently be limited in its capacity to accommodate any target concept, or, alternatively, the empirical risk must be “regularized” with the addition of a term that imposes a cost over hypotheses not well behaved according to some prior preference. For instance, one may suppose that the target concept obeys some smoothness over the input space, which can be translated into penalizing functions h with high values of their second derivative. Another way of limiting the space of hypotheses considered by the learner is to prevent it from searching the whole space H by stopping the search process early on. This is for instance the role of the “early stopping rule” known in artificial neural networks.
The challenge in supervised induction is therefore to identify an hypothesis space rich enough so that a good hypothesis may be found (no underfitting) but constrained enough so that overfitting can be controlled. Sophisticated learning algorithms are able to automatically adjust the “capacity” of the hypothesis space in order to balance optimally the two competing factors.
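The idea of a regularized empirical risk can be made concrete with a small, hedged sketch: an L2, ridge-style penalty added to the squared loss of a linear model. The names, data, and penalty weight are illustrative, not taken from the chapter.

```python
def regularized_risk(weights, sample, lam):
    """Empirical squared loss of a linear model h(x) = w0 + w1*x,
    plus an L2 penalty lam * ||w||^2 that imposes a cost over
    'complex' (large-weight) hypotheses."""
    w0, w1 = weights
    m = len(sample)
    empirical = sum((w0 + w1 * x - y) ** 2 for x, y in sample) / m
    penalty = lam * (w0 ** 2 + w1 ** 2)
    return empirical + penalty

sample = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
# With lam = 0, the perfect-fit weights (0, 1) are optimal ...
risk_fit = regularized_risk((0.0, 1.0), sample, lam=0.0)
# ... while under a strong penalty, smaller weights can score better
# overall even though they fit the training data less well.
risk_pen = regularized_risk((0.0, 1.0), sample, lam=10.0)
risk_small = regularized_risk((0.0, 0.5), sample, lam=10.0)
```

The last two lines show the trade-off: the penalized criterion can prefer a hypothesis with worse training fit but lower "complexity".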
Once the inductive criterion is set, it remains to explore the hypothesis space in order to find an hypothesis that optimizes the inductive criterion as well as possible.
Controlling the Search in the Hypothesis Space and the Optimization Strategy
Finding the hypothesis (or hypotheses) that optimizes the inductive criterion can be done analytically only in very specific cases. Typically, learning algorithms implement methods that update estimates of the solution via an iterative process. These processes include optimization techniques, solving systems of linear equations, and searching lattice-like state spaces. Usually, the learning algorithms require large amounts of numerical and other operations, and it is therefore of foremost concern to control the computational and space complexities of these processes, as well as to ensure that the floating-point approximations of real numbers stay accurate.
When the hypothesis space is the space of vectors of real numbers R^n (n ∈ N), as is the case for artificial neural networks for instance, gradient-based methods are the method of choice for optimizing a numerical criterion. When the hypothesis space is discrete, for instance when it is a space of logical expressions, it is important to find operators between expressions that render the exploration of the hypothesis space amenable to classical artificial intelligence search techniques, and in particular to efficient pruning of the search space. This is central to the version space learning algorithm (see Mitchell 1982, 1997) and to the search for typical patterns in databases (see Agrawal and Srikant 1994; Aggarwal 2015).

3.3.2 Approaches to Supervised Learning

There exist a wide variety of supervised learning methods, with new methods, or at least variations of methods, invented almost daily. It is nonetheless possible to group these methods into broad categories.
Parametric Methods
It often happens that the expert knows in advance precisely the type of regularities he/she is looking for in the data. For instance, one may want to fit a set of data points with a given family of functions. In this case, the learning problem is generally to estimate the coefficients, or parameters, of the model.
E.g., in a linear regression in R^n, h(x) = Σ_{i=0}^n w_i x^(i), where the x^(i) are the coordinates of the input x (with, by convention, x^(0) = 1 for the intercept), the n + 1 parameters w_i (0 ≤ i ≤ n) must be estimated.
Parametric methods include most of the generative models (which hypothesize some probability distributions over the data), linear and generalized linear models, and simple neural network models.
Non Parametric Methods
The difference between parametric methods and non parametric methods is not as clear cut as the terms would suggest. The distinction is between families of models that are constrained by having a limited number of parameters and those which are so flexible that they can approximate almost any posterior probabilities or decision functions. This is typically the case of learning systems that learn a number of prototypes that is not fixed a priori and use a nearest neighbors technique to decide the class of a new input. These systems are ready to consider any number of prototypes as long as it allows them to fit the data well. The Support Vector Machines (SVM) fall into this category since they adjust the number of support vectors (learning examples) in order to fit the data. Deep neural networks are in between parametric methods and non parametric methods. They often have a fixed number of parameters, but this number is usually so large that the system has the ability to learn any type of dependency between the input and the output.
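As a minimal sketch of the parametric setting (assuming a single input variable, so only two parameters w_0 and w_1; names and data are illustrative), the least-squares estimates can be computed in closed form:

```python
# Hedged sketch: fit h(x) = w0 + w1 * x by ordinary least squares.
def fit_linear(points):
    """Estimate the two parameters (w0, w1) from (x, y) pairs."""
    m = len(points)
    mean_x = sum(x for x, _ in points) / m
    mean_y = sum(y for _, y in points) / m
    # Closed-form least-squares solution for one input variable.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    w1 = cov_xy / var_x
    w0 = mean_y - w1 * mean_x
    return w0, w1

w0, w1 = fit_linear([(0, 1), (1, 3), (2, 5), (3, 7)])  # data on y = 1 + 2x
```

Once the family of models is fixed (here, affine functions), learning reduces to this parameter estimation step.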
Other non parametric methods include decision tree learners, and more generally all learning systems that partition the input space into a set of “boxes” whose structure depends on the learning data. Systems that learn logical descriptions of the data, often in the form of a collection of rules, are equally non parametric, and may be seen as variants of techniques that partition the input space into a set of categories. Finally, ensemble methods, such as bagging, boosting and random forests, are also non parametric since the number of base hypotheses that are combined to form the final hypothesis is determined during learning.
It is worth noting that methods that use regularized inductive criteria or that adjust the hypothesis space to the learning task at hand (like the Structural Risk Minimization (SRM) principle of Vapnik) may be seen as belonging to an intersection between parametric and non parametric methods. They are parametric because they impose a constrained form on the learning hypotheses, but they adjust the number of non null parameters as a function of the learning problem.

Changes of Representations in Supervised Learning
Many learning methods rely at their core on some change of representation of the input space, so that the structure of the data or the decision function becomes easy to discover. These changes of representation may be obtained through a preprocessing step, when e.g. a Principal Component Analysis (PCA), a Singular Value Decomposition (SVD) or a Non Negative Matrix Factorization (NMF) is performed. They can also result from the learning process itself, as when using multi-layer and deep neural networks, where the first layers adapt their weights (parameters) so that the top layer of neurons can implement a linear decision function. Likewise, regularized techniques, like LASSO, that select the descriptive features that play a role in the final hypothesis, can be considered as methods for changing the representation of the input space.
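To make such a change of representation concrete, here is a small, hedged sketch (not from the chapter): for 2-D data, the direction of the first principal axis of a PCA admits a closed form in terms of the covariance entries; all names and data are illustrative.

```python
import math

def principal_axis_2d(points):
    """Return the angle (radians) of the first principal component
    of centered 2-D data, via the closed form for a 2x2 covariance."""
    m = len(points)
    mx = sum(x for x, _ in points) / m
    my = sum(y for _, y in points) / m
    # Entries of the (unnormalized) covariance matrix.
    cxx = sum((x - mx) ** 2 for x, _ in points)
    cyy = sum((y - my) ** 2 for _, y in points)
    cxy = sum((x - mx) * (y - my) for x, y in points)
    # Angle of the leading eigenvector of [[cxx, cxy], [cxy, cyy]].
    return 0.5 * math.atan2(2 * cxy, cxx - cyy)

# Data lying almost on the line y = x: the axis is close to 45 degrees.
angle = principal_axis_2d([(0, 0), (1, 1), (2, 2), (3, 3.1)])
```

Projecting the data onto this axis is the one-component PCA representation of the inputs.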

3.4 The Evaluation of Induction Results

Most methods in inductive learning entail building a model of the relationships between the attributes and the outcome (supervised learning) or between the attributes themselves (unsupervised learning), with a penalty term that penalizes the complexity of the model. In order to select the best model, and therefore the optimal complexity parameter and the best meta-parameters of the methods (e.g. the architecture of the neural networks), which ensure the best predictive performance without overfitting, one has to be able to evaluate the model's performance.
The biggest hurdle in evaluating the value of a learning model is that one does not know the future events that the system will have to process. One has therefore to rely on the known (training) instances and on some a priori assumptions about the relationship between the training data and the future environment in which the system will have to perform. One such assumption is that the world is stationary.
Learning entails the optimization of two different sets of parameters: the parameters that define one specific hypothesis within the model, the model being the class of possible hypotheses (e.g. the weights of a neural network whose architecture is given), and the parameters, aka meta-parameters, that control the model (e.g. the architecture of the neural network, the learning step and other such choices that govern the optimization process). In order for these two optimization problems to be properly carried out, it is essential that the training data used for these two optimization tasks be different, to obviate the risk of obtaining optimistic and biased results.
Thus, ideally, the available data is split into three different subsets: the learning set, used to set the parameters of the hypothesis; the validation set, used both to evaluate the generalization performance of the hypothesis and to learn the meta-parameters in order to optimize the model; and the test set, used just once, at the end of the whole learning procedure, in order to estimate the true value of the final hypothesis. When data is abundant, it is possible to reserve a significant part of the sample for each of the three subsets. Often, however, data is scarce, and methods such as cross-validation must be used, which, in essence, use repeated training and testing runs on different subsets of the training sample in order to compensate for the lack of data.
Evaluating the performance of a learned model differs in supervised learning and in unsupervised learning. We first look at evaluation in supervised learning.
In supervised learning, the estimation of the rate of error in generalization is sometimes insufficient as a way to evaluate the value of the learned hypothesis. Thus, for example, it may be useful to estimate more precisely the rates of false positives and false negatives, or of precision and recall. Confusion matrices are then a useful tool. It is even possible to obtain curves of the evolution of the learning performance when some meta-parameters are varied. The ROC curve (Receiver Operating Characteristic, a term inherited from the development of radars during the Second World War) is the best known example. Typically, it is sought to optimize the area under the ROC curve, which characterizes the discriminating power of the classification method employed. The book (Japkowicz 2011) is a good source of information about evaluation methods for supervised classification.
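The cross-validation procedure can be sketched as follows (a hedged, minimal k-fold version; the `majority_rule` scorer is a toy placeholder for a real learner, and all names are illustrative):

```python
def k_fold_cross_validation(data, k, train_and_test):
    """Average test score over k train/test splits of `data`.
    `train_and_test(train, test)` is any user-supplied routine
    returning a score on `test`."""
    folds = [data[i::k] for i in range(k)]  # k disjoint subsets
    scores = []
    for i in range(k):
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        scores.append(train_and_test(train, test))
    return sum(scores) / k

# Toy stand-in for a learner: "score" = fraction of test labels equal
# to the majority training label.
def majority_rule(train, test):
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return sum(1 for _, y in test if y == majority) / len(test)

data = [(x, 0) for x in range(7)] + [(x, 1) for x in range(3)]
score = k_fold_cross_validation(data, 5, majority_rule)
```

Each example serves once as test data and k − 1 times as training data, which is what compensates for data scarcity.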
In unsupervised learning, the goal is not to be able to predict the value of the output for some new input, but to uncover structures of interest in the data. Therefore, the error in generalization is replaced by other criteria that assess the extent to which the structures discovered in the data fit the expectations set by the user. For instance, if one wishes to uncover subgroups or clusters in the data, performance criteria will be based on measures of the compactness of each cluster together with a measure of the dissimilarity between clusters. One can then choose, for instance, the number of clusters that maximizes the chosen criterion. However, in contrast with supervised learning, where the ground truth is known on the training data, in unsupervised learning the estimation criteria are very dependent on a priori assumptions about the type of patterns present in the world, and this can easily give very misleading results if these assumptions do not hold.

3.5 Types of Algorithms

The presentation of learning algorithms can be organized along several dimensions:
1. The optimization method used by the learner. For instance, gradient-based, divide-and-conquer, and so on. As an illustration, many recent learning algorithms have been motivated by the allure of convex optimization.
2. The types of hypotheses that are considered for describing the world or to make predictions. For instance, linear models, non linear models, models that partition the input space, etc.
3. The type of structure of the hypothesis space. For instance, vectorial spaces, Galois lattices, and so on.
Obviously, the optimization methods and the structure of the hypothesis space are closely interrelated, while the type of hypotheses considered determines, in large part, the structure of the hypothesis space.

A specialist of Machine Learning tries to solve inductive problems, that is, discovering general patterns or models from data by processes that can be automatized. The first questions asked are: what are the available data? What is the task? How does one expect to measure the quality or performance of the learned regularities? What kind of regularities are of interest? Only after these questions have answers does the problem of actually searching the space of possible regularities arise. Of course, Machine Learning is also concerned with feasibility issues, all the more so as data becomes increasingly massive and the regularities considered become more complex. Therefore the space and time requirements of the computations involved in learning are also a factor in choosing or devising a learning method. This is where Machine Learning blends with Computer Science. However, if a specialist of Machine Learning has to be aware of the computational demands of various types of algorithms, this need not be his/her foremost forte, in the same way that a specialist in Artificial Intelligence is centrally interested in knowledge representation and automatic reasoning, and less centrally, even though this is important, in computational issues.
In accordance with these considerations, in the following, the chapter is structured around families of regularities that current learning algorithms are able to uncover. For each of these types, however, algorithmic and computational issues will be addressed.

4Clustering

Clustering aims at finding the underlying organization of a dataset S = {x_1, ..., x_m}. An observation is usually represented by the values it takes on a set of descriptors {X_1, ..., X_d}, thus defining the input space X. Most methods require the definition of either a dissimilarity measure, denoted by dis, between pairs of objects, or a similarity (also called affinity) measure, denoted by sim. Clustering has attracted significant attention since the beginning of exploratory data analysis, and many methods have been developed. In this chapter we focus on partitioning methods, which aim at building a partition of S (see Sects. 4.1–4.5), and hierarchical methods, which aim at organizing data in a hierarchical structure (see Sects. 4.6 and 4.7). The methods differ according to the way clustering is modeled. Prototype-based methods (Sect. 4.2) seek representative points of the clusters, density-based methods (Sect. 4.4) assume that a cluster is built from connected dense regions, whereas generative methods (Sect. 4.3) assume that data has been generated by a mixture of Gaussians. Spectral clustering (Sect. 4.5) relies on a similarity measure and the construction of a graph reflecting the links, in terms of similarity, between objects. We present in this section a few well-known methods that are representative of the various existing algorithms. More complete overviews of clustering and analyses of clustering methods can be found in (Bishop 2006; Hastie et al. 2009).
Let us also mention that outlier detection is a domain close to clustering that we do not address in this chapter. An outlier is defined in (Hawkins 1980) as an observation which deviates so much from the other observations as to arouse suspicions that it

Table 1 Criteria on a cluster C with centroid µ: homogeneity (left) / separation (right)

Homogeneity of C, to be minimized:
- diameter (diam): max_{o_i, o_j ∈ C} dis(o_i, o_j)
- radius (r): min_{o_i ∈ C} max_{o_j ∈ C} dis(o_i, o_j)
- star (st): min_{o_i ∈ C} Σ_{o_j ∈ C} dis(o_i, o_j)
- normalized star: st(C)/|C|
- clique (cl): Σ_{o_i, o_j ∈ C} dis(o_i, o_j)
- normalized clique: cl(C)/(|C| × (|C| − 1))
- sum of squares (ss)^b: Σ_{o_i ∈ C} ||o_i − µ||₂²
- variance (var): (Σ_{o_i ∈ C} ||o_i − µ||₂²) / |C|

Separation of C, to be maximized:
- split^a: min_{o_i ∈ C, o_j ∉ C} dis(o_i, o_j)
- cut: Σ_{o_i ∈ C} Σ_{o_j ∉ C} dis(o_i, o_j)
- ratio_cut: cut(C)/|C|
- normalized_cut^c: cut(C)/(|C| × (n − |C|))

^a Or margin
^b Also called the error sum of squares, for instance in Ward (1963)
^c Given a weighted similarity graph, the normalized cut is defined differently by cut(C)/vol(C), with vol(C) = Σ_{o_i ∈ C} d_i

was generated by a different mechanism. Some clustering methods, as for instance density-based methods, allow the detection of outliers during the clustering process, whereas other methods, as for instance k-means, need the outliers to be detected before the clustering process. An overview of outlier detection methods can be found in (Aggarwal 2015).
Defining a metric between objects is fundamental in clustering. We discuss in Sect. 5 the kernel trick, which allows embedding data in a higher dimension without making the new representation explicit. Many clustering methods relying on metrics have been “kernelized”.
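Two of the criteria of Table 1 translate directly into code (an illustrative, hedged fragment, with dis taken to be the Euclidean distance; data is made up):

```python
import math

def dis(u, v):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def diameter(cluster):
    """Homogeneity: max pairwise dissimilarity inside the cluster."""
    return max(dis(u, v) for u in cluster for v in cluster)

def split(cluster, rest):
    """Separation: min dissimilarity to any object outside the cluster."""
    return min(dis(u, v) for u in cluster for v in rest)

cluster = [(0.0, 0.0), (1.0, 0.0)]
rest = [(5.0, 0.0), (6.0, 0.0)]
```

A good cluster has a small diameter (compactness) and a large split (separation), which is exactly the tension the objective functions of Sect. 4.1 formalize.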

4.1 Optimization Criteria and Exact Methods

Clustering looks for a partition of data made of compact and well separated clusters. Several criteria have been defined for assessing the quality of a single cluster. A large list is given in (Hansen and Jaumard 1997), and some examples are given in Table 1, assuming a dissimilarity measure dis between pairs of objects, or the Euclidean distance ||.||₂.
Once criteria have been defined for assessing the quality of a cluster, we can define different objective functions specifying the properties of the expected output partition C = (C_1, ..., C_k). For instance, the minimization of the maximal diameter (defined by max_{j=1..k} diam(C_j)) and the minimization of the within-clusters sum of squares (WCSS), defined in Eq. 1, rely on the notion of compactness. On the other hand, the maximization of the minimal margin (defined by min_{j=1..k} split(C_j)) emphasizes the notion of separation. The first problem is polynomial only for k = 2, the second one is NP-hard, the third one is polynomial. A criterion can be to maximize the cut (Σ_{j=1}^k cut(C_j)), which can be solved efficiently for k = 2, but which leads to unbalanced clusters. This is why the normalized cut or the ratio cut are usually preferred. In case we have a similarity matrix between points (see Sect. 4.5), the criterion becomes to minimize the cut or the ratio cut.
Because of the complexity of the problem, there are few exact methods. (Hansen and Delattre 1978) presents a method based on graph coloring for the minimization of the maximum diameter. Branch and bound algorithms (see for instance Brusco and Stahl 2005) have been proposed for different criteria. Clustering has also been modeled in Integer Linear Programming (Rao 1969; Klein and Aronson 1991; du Merle et al. 1999), and there exists a new stream of research on using declarative frameworks (Integer Linear Programming, Constraint Programming, SAT) for clustering, as for instance (Aloise et al. 2012; Dao et al. 2017), and for constrained clustering (see chapter “Constrained Clustering: Current and New Trends” of this volume).

4.2 K-Means and Prototype-Based Approaches

The most well-known algorithm for clustering is k-means (Lloyd 1982; Forgy 1965). The main ideas are that a cluster can be characterized by its centroid (the most central point in the cluster) and that an observation x is assigned to the closest cluster, as measured by the distance between x and the centroid of the cluster. The number k of clusters must be provided a priori. First, k observations µ_1^0, ..., µ_k^0 are chosen as initial seeds, and then the algorithm alternates two phases until convergence (i.e. until the partition no longer changes): it first assigns each observation to the cluster whose centroid is the closest, and then computes the new centroid of each cluster from the observations assigned to it. Let µ_1^t, ..., µ_k^t denote the centroids at iteration t; the main scheme of the algorithm is given in Algorithm 1.

Algorithm 1: k-means algorithm
  Initialization: choose k observations µ_1^0, ..., µ_k^0 as initial seeds; t ← 0
  repeat
      For each point x_i, i = 1 ... m, assign x_i to the cluster C_l with l = argmin_{j=1...k} ||x_i − µ_j^t||₂
      For each j, j = 1, ..., k, compute the new centroid: µ_j^{t+1} = (1/|C_j|) Σ_{x ∈ C_j} x
      t ← t + 1
  until convergence
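Algorithm 1 can be sketched in plain Python as follows (a hedged, minimal version: fixed initial seeds rather than a random choice, tuples instead of a linear-algebra library; all names are illustrative):

```python
import math

def kmeans(points, seeds, max_iter=100):
    """Lloyd's iteration: alternate assignment and centroid update.
    `points` and `seeds` are lists of (x, y) tuples; returns centroids."""
    centroids = list(seeds)
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)),
                    key=lambda j: math.dist(p, centroids[j]))
            clusters[j].append(p)
        # Update step: recompute each centroid as the cluster mean.
        new_centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # convergence: partition unchanged
            break
        centroids = new_centroids
    return centroids

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
cents = kmeans(pts, seeds=[(0, 0), (10, 10)])
```

On these two well-separated groups the iteration converges in two passes, returning one centroid per group.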

As already stated, clustering is often viewed as an optimization problem specifying the quality of the desired partition. In this case, the optimization criterion is called the within-clusters sum of squares, and it is defined by

WCSS = Σ_{j=1}^k Σ_{x ∈ C_j} ||x − µ_j||₂²   (1)

where ||.||₂ denotes the Euclidean distance. It is equivalent to

WCSS = Σ_{j=1}^k (Σ_{{u,v} ⊆ C_j} ||u − v||₂²) / |C_j|

where the inner sum runs over the unordered pairs {u, v} of C_j. The complexity of the algorithm is O(mkt), where m is the number of observations, k the number of clusters and t the number of iterations.
Let us notice two important points. First, to be applicable, the number of classes must be given, and computing the means of a subset of observations must be feasible. This means that all the features must be numeric. K-means can be adapted to handle non numeric attributes, for instance by replacing the centroid by the most central observation (the observation minimizing the star, i.e. defined by argmin_{o_i ∈ C} Σ_{o_j ∈ C} dis(o_i, o_j)). But in case of non numeric attributes, the most used method consists in looking for k observations, called medoids, that are the most representative of the clusters, as for instance in the system PAM (Kaufman and Rousseeuw 1990): the search space is the space of all k-subsets of the observations; the search consists in considering pairs of objects composed of a medoid and a non medoid and analyzing whether a swap between them would improve WCSS. It has been further extended to improve its efficiency (see for instance Ng and Han 1994; Park and Jun 2009).
The second observation is that the optimization criterion is not convex, and therefore the initialization step is fundamental: the results depend heavily on it. Several methods have been proposed for the initialization, as for instance K-means++ (Arthur and Vassilvitskii 2007).
K-means is an iterative alternate optimization method. When looking at Eq. 1, it can be seen that it depends on two parameter sets: the assignment of points to clusters and the centroids of the clusters. Let us define a boolean variable X_ij, i = 1, ..., m, j = 1, ..., k, which is true when the observation x_i belongs to cluster C_j. We have, for all i, Σ_{j=1}^k X_ij = 1, that is, each point is assigned to one and only one cluster. Then Eq. 1 can be written:

WCSS = Σ_{i=1}^m Σ_{j=1}^k X_ij ||x_i − µ_j||₂²   (2)

This form highlights the two sets of parameters: X_ij, i = 1, ..., m, j = 1, ..., k (assignment of points to clusters) and µ_j, j = 1, ..., k (determination of the centroids). In fact, the two alternate steps correspond to (1) fixing µ_j, j = 1, ..., k in Eq. 2 and optimizing X_ij, i = 1, ..., m, j = 1, ..., k, and (2) fixing X_ij, i = 1, ..., m, j = 1, ..., k and optimizing µ_j, j = 1, ..., k. For solving (1), we can notice that the terms involving X_ij are independent from those involving X_i'j, i' ≠ i, and can be optimized independently, thus giving X_il = 1 iff l = argmin_{j=1...k} ||x_i − µ_j||₂. The variables X_ij being fixed, finding the best µ_j, j = 1, ..., k, is a quadratic optimization problem that can be solved by setting the derivatives with respect to µ_j equal to 0, leading to µ_j = (Σ_{i=1}^m X_ij x_i) / (Σ_{i=1}^m X_ij). For more details, see (Bishop 2006).
K-means relies on the Euclidean distance and makes the underlying assumption that the different clusters are linearly separable. As many clustering algorithms, it has been extended to kernel k-means, thus mapping points into a higher dimensional space using a non linear function Φ. Kernel k-means relies on the fact that ||u − v||² = <u, u> − 2<u, v> + <v, v> and that the dot product can be replaced by a kernel function. Compared to k-means, the first part of the iteration, i.e. the assignment of a point x to a cluster, has to be changed: argmin_{j=1...k} ||Φ(x) − µ_j||, with µ_j = (1/|C_j|) Σ_{u ∈ C_j} Φ(u). Replacing µ_j by its value and developing the norm using the dot product allows one to express ||Φ(x) − µ_j||² in terms of the kernel, thus avoiding the computation of Φ(x).
Self-organizing maps (SOM) are also based on prototypes. The simplest approach, called vector quantization (VQ), follows the same philosophy as k-means, in the sense that k prototypes are initially chosen at random in the input space, each point is assigned to the cluster of its closest prototype, and prototypes are updated. Nevertheless, at each iteration of VQ a single point x is chosen at random and the closest prototype is updated so as to move closer to x. Let us call m_j^t the prototype of cluster j at iteration t. If x is assigned to C_j, then its prototype is updated by the formula (ε is a fixed parameter):

m_j^{t+1} ← m_j^t + ε(x − m_j^t)

thus m_j^{t+1} − x = (1 − ε)(m_j^t − x). Self-organizing maps (Kohonen 1997) add a structure (often a grid) on the prototypes, and at each iteration the closest prototype and its neighbors are updated. The prototypes and their structure can be seen as a neural network, but learning can be qualified as competitive since at each iteration only the winning prototype and its neighbors are updated.
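The VQ update rule can be sketched in a few lines (hedged: fixed ε, one-dimensional prototypes, synthetic data; all names are illustrative):

```python
import random

def vq_step(prototypes, x, eps=0.1):
    """One vector-quantization iteration: move the closest prototype
    a fraction eps of the way toward the sampled point x (1-D case)."""
    j = min(range(len(prototypes)), key=lambda j: abs(x - prototypes[j]))
    prototypes[j] += eps * (x - prototypes[j])
    return prototypes

random.seed(0)
protos = [0.0, 10.0]
data = [random.gauss(1, 0.5) for _ in range(200)] + \
       [random.gauss(9, 0.5) for _ in range(200)]
random.shuffle(data)
for x in data:
    vq_step(protos, x)
```

After the pass over the data, each prototype has drifted toward the center of one of the two synthetic clusters (near 1 and near 9), which is the competitive-learning behavior described above.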

4.3 Generative Learning for Clustering

In Sect. 2.1, we have mentioned that there are two approaches in Machine Learning: generative versus discriminative. Generative methods aim at learning a (parametric) probability distribution p_X over the input space X. On the other hand, clustering as an exploratory process aims at discovering the underlying structure of the data, and this structure can naturally be modeled by a probability distribution p_X. This leads to the use of generative methods for clustering. The model that is chosen is usually a linear combination of k Gaussian densities (intuitively, one Gaussian density for each cluster):

p(x) = Σ_{j=1}^k π_j N(x | µ_j, Σ_j)   (3)

with, for all j, π_j ≥ 0 and Σ_{j=1}^k π_j = 1   (4)

The coefficients π_j are called the mixing coefficients and can be interpreted as the prior probability for a point to be generated from the jth cluster; N(x | µ_j, Σ_j) is the probability of the point given that it is drawn from the jth component.

This can also be formulated by introducing a k-dimensional binary random variable z = (z_1, ..., z_k), intuitively representing the assignment of a point to a cluster (z_j = 1 if x is assigned to cluster j). Therefore, only one z_j is equal to 1 and the other ones are equal to 0 (Σ_{j=1}^k z_j = 1). The marginal distribution of z is given by p(z_j = 1) = π_j, the conditional probability of x given z is given by p(x | z_j = 1) = N(x | µ_j, Σ_j), and since p(x) = Σ_j p(z_j = 1) p(x | z_j = 1), we fall back on Eq. 3. The k-dimensional variable z is called a latent variable. It can be seen as a hidden set of variables: observations are described in the input space (X, Z) of dimension d + k, where the hidden variables in Z give the assignment of points to clusters.
Given the observations S, learning a generative model for S is classically achieved by maximizing the log likelihood of the data, defined as ln(p(S | π, µ, Σ)), where π, µ, Σ denote respectively π = (π_1, ..., π_k), µ = (µ_1, ..., µ_k) and Σ = (Σ_1, ..., Σ_k). Supposing that the examples have been drawn independently from the distribution, we get

ln(p(S | π, µ, Σ)) = Σ_{i=1}^m ln(Σ_{j=1}^k π_j N(x_i | µ_j, Σ_j))   (5)

Maximizing the log likelihood, or, in other words, learning the coefficients (π_j, µ_j, Σ_j), j = 1, ..., k, is usually achieved by the Expectation-Maximization (EM) algorithm. Roughly, after having initialized the parameters to learn, EM iterates two phases: an expectation phase, computing the probability of the assignment of a point to a cluster given that point (p(z_j = 1 | x_i)), and a maximization step that, given the assignments of points to clusters, optimizes the parameters (π_j, µ_j, Σ_j) so as to maximize the likelihood. The algorithm for learning mixtures of Gaussians is given in Algorithm 2 (Bishop 2006).

Algorithm 2: EM algorithm for learning Gaussian mixtures (Bishop 2006)

  Initialize, for all j in 1, ..., k, the means µ_j, the covariances Σ_j and the mixing coefficients π_j;
  Evaluate the log likelihood using Eq. 5;
  repeat
      E step: Evaluate the responsibilities γ(z_ij) = p(z_j = 1 | x_i):
          γ(z_ij) = π_j N(x_i | µ_j, Σ_j) / Σ_{l=1}^k π_l N(x_i | µ_l, Σ_l)
      M step: Re-estimate the parameters:
          µ_j^new = (1/m_j) Σ_{i=1}^m γ(z_ij) x_i, with m_j = Σ_{i=1}^m γ(z_ij)
          Σ_j^new = (1/m_j) Σ_{i=1}^m γ(z_ij)(x_i − µ_j^new)(x_i − µ_j^new)^t
          π_j^new = m_j / m
      Evaluate the log likelihood using Eq. 5;
  until convergence

The structure of this algorithm looks similar to that of the k-means algorithm (Algorithm 1). Let us notice that in k-means a point is assigned to a single cluster (hard assignment), whereas with this approach a point is assigned to each cluster with a probability (soft assignment). A more detailed presentation of EM can be found in (Bishop 2006).
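A compact, hedged sketch of Algorithm 2 for a two-component mixture in one dimension (fixed initialization, plain Python instead of a numerical library, scalar variances instead of covariance matrices; all names and data are illustrative):

```python
import math

def normal_pdf(x, mu, var):
    """Density N(x | mu, var) of a 1-D Gaussian."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_1d(data, mu, var, pi, n_iter=50):
    """EM for a 2-component 1-D Gaussian mixture.
    mu, var, pi are length-2 lists of initial parameters."""
    for _ in range(n_iter):
        # E step: responsibilities gamma[i][j] = p(z_j = 1 | x_i)
        gamma = []
        for x in data:
            p = [pi[j] * normal_pdf(x, mu[j], var[j]) for j in range(2)]
            s = sum(p)
            gamma.append([pj / s for pj in p])
        # M step: re-estimate the parameters from the responsibilities
        for j in range(2):
            mj = sum(g[j] for g in gamma)
            mu[j] = sum(g[j] * x for g, x in zip(gamma, data)) / mj
            var[j] = sum(g[j] * (x - mu[j]) ** 2
                         for g, x in zip(gamma, data)) / mj
            pi[j] = mj / len(data)
    return mu, var, pi

data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
mu, var, pi = em_1d(data, mu=[0.0, 6.0], var=[1.0, 1.0], pi=[0.5, 0.5])
```

On these two tight groups the responsibilities quickly become nearly hard, and the means converge to the group centers with equal mixing weights.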

4.4 Density-Based Clustering

Density-based methods are based on the assumption that clusters correspond to high-density subspaces. The first method was proposed in the system DBSCAN (Ester et al. 1996): it relies on the search for core points, that are points with a dense neighborhood. The notion of density is based on two parameters: the radius ε, defining the neighborhood of a point, and a density threshold MinPts, defining the minimum number of points that must be present in the neighborhood of a point for the density around this point to be high. More formally, the ε-neighborhood of a point u is defined by:

Nε(u) = {v ∈ S | dis(u, v) ≤ ε}

and a core point is a point u satisfying |Nε(u)| ≥ MinPts. A point v is linked to a core point u when there exists a sequence of core points u_1, ..., u_n such that u_1 = u, for all i = 2, ..., n, u_i ∈ Nε(u_{i−1}), and v ∈ Nε(u_n). A dense cluster is a maximal set of connected objects, that is, objects that are linked to a same core point. It is thus composed of core points (the internal points of the cluster) and non core points (the border points of the cluster). The method is described in Algorithms 3 and 4: roughly, for each point that has not yet been visited (marked Uncl) and that is a core point, a new cluster is built, and this cluster is iteratively expanded by considering the core points in its neighborhood (Ester et al. 1996).

Algorithm 3: Outline of DBSCAN algorithm
  Data: SetOfPoints, ε, MinPts
  ClustId ← first(ClusterId)                      // First cluster label
  for each point x do
      if Mark(x) = Uncl then                      // x unclassified
          if Expand(SetOfPoints, x, ClustId, ε, MinPts) then
              ClustId ← next(ClusterId)           // New cluster label
          end
      end
  end

Algorithm 4: Expand(SetOfPoints, x, ClustId, ε, MinPts)

  N ← Nε(x);
  if |N| < MinPts then                            // Not a core point
      Mark(x) ← Noise and Return False            // Mark may change later
  else
      for all points z in N: Mark(z) ← ClustId    // Init the cluster with Nε(x)
      Delete x from N;
      while N ≠ ∅ do
          Let s be the first element of N;
          if |Nε(s)| ≥ MinPts then                // Expand the cluster
              for all points z in Nε(s) do
                  if Mark(z) = Uncl then add z to N;
                  if Mark(z) = Uncl or Noise then Mark(z) ← ClustId;
              end
          end
          Delete s from N;
      end
      Return True;
  end

The average complexity of this algorithm is O(n log n) with adapted data structures, such as R*-trees. Density-based clustering allows the detection of outliers, which are defined in DBSCAN as points that are connected to no core point. The second important property is that, by merging neighborhoods, density-based clustering allows one to find clusters of arbitrary shape. On the other hand, the drawback is that the output is highly dependent on the parameters ε and MinPts, although heuristics have been proposed to set these parameters. Moreover, the algorithm cannot handle cases where the density varies from one cluster to another. Some extensions, as for instance OPTICS proposed in (Ankerst et al. 1999), have been defined to take into account learning in the presence of clusters with different densities.
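To make the neighborhood-expansion mechanism of Algorithms 3 and 4 concrete, here is a minimal sketch in Python. It is ours, not the authors' code: it uses a naive O(n²) neighborhood search rather than an R*-tree, and all function and variable names are illustrative.

```python
# Minimal DBSCAN sketch (naive neighborhood search; names are ours).
import math

def dbscan(points, eps, min_pts):
    """Return a cluster label per point; -1 marks noise (outliers)."""
    n = len(points)
    def region(i):                       # the eps-neighborhood N_eps(i)
        return [j for j in range(n)
                if math.dist(points[i], points[j]) <= eps]
    labels = [None] * n                  # None plays the role of "Uncl"
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:         # not a core point
            labels[i] = -1               # may be relabeled later (border point)
            continue
        for j in seeds:                  # init the cluster with N_eps(i)
            if labels[j] is None or labels[j] == -1:
                labels[j] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:                     # iteratively expand the cluster
            s = queue.pop()
            neigh = region(s)
            if len(neigh) >= min_pts:    # s is a core point
                for j in neigh:
                    if labels[j] is None:
                        queue.append(j)
                    if labels[j] is None or labels[j] == -1:
                        labels[j] = cluster_id
        cluster_id += 1
    return labels
```

On two dense groups plus an isolated point, the isolated point is labeled -1 (noise) and each group forms one cluster.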

4.5 Spectral Clustering, Non Negative Matrix Factorization

A clear and self-contained introduction to spectral clustering can be found in von Luxburg (2007). Spectral clustering takes as input a similarity graph G = (S, E): the nodes S of the graph are the observations and E is a set of weighted edges, where the weight w_ij, w_ij ≥ 0, between two nodes x_i and x_j represents the similarity between x_i and x_j (w_ij = 0 when there is no edge between x_i and x_j or when the similarity is null). When the input is a pairwise dissimilarity or a distance dis between pairs of points, a function transforming the distance into a similarity must first be applied;

for instance it can be a Gaussian similarity function: sim(x_i, x_j) = exp(−‖x_i − x_j‖²₂ / (2σ²)), where σ is a parameter. Several methods have been proposed to build the graph. All nodes can be connected and the edges weighted by the similarity. It is also possible to use an unweighted graph, connecting points whose distances are less than a given parameter ε.
In the following, Sim = (sim_ij)_{i,j=1...m} denotes a similarity matrix with sim_ij the similarity between x_i and x_j, W = (w_ij)_{i,j=1...m} denotes the weight matrix and D the degree matrix. D is a diagonal matrix whose diagonal terms are the degrees d_i of the points, defined by d_i = Σ_{j=1}^m w_ij, i = 1, ..., m.
Given this graph, a Laplacian matrix is built. The Laplacian matrix is particularly interesting since the multiplicity of the eigenvalue 0 of the Laplacian matrix is equal to the number of connected components of the graph. If the multiplicity is equal to k, then the partition of the observations into k clusters is naturally given by the k connected components of the graph. More precisely, several graph Laplacians can be built: the unnormalized graph Laplacian L, or the normalized graph Laplacians, L_sym or L_rw, defined by:

L = D − W,    L_sym = D^{−1/2} L D^{−1/2}    and    L_rw = D^{−1} L.

Laplacians have important properties. For instance, considering L, we have:

∀x = (x_1, ..., x_m)^T ∈ R^m,    x^T L x = (1/2) Σ_{i,j=1}^m w_ij (x_i − x_j)²

The definition of L and this property allow one to show that L is a symmetric, positive semi-definite matrix; it has m non-negative, real-valued eigenvalues 0 ≤ λ_1 ≤ ... ≤ λ_m, and 0 is the smallest eigenvalue, with as eigenvector the unit vector 1 whose components are all equal to 1. This leads to two kinds of approaches:
• Spectral clustering, that, given the graph Laplacian L, computes the first k eigenvectors u_1, …, u_k of L. Considering U = [u_1, ..., u_k], each element u_ij, i = 1...m, j = 1...k, can be seen as the degree of belonging of x_i to C_j. K-means is then applied on the rows to obtain a partition. See Algorithm 5.
• Structured graph learning (Nie et al. 2014), that aims at modifying the similarity matrix Sim so that the multiplicity of 0 in its graph Laplacian is equal to k. This leads to the following minimization problem, where a regularization term is introduced to avoid a trivial solution (‖.‖_F denotes the Frobenius norm):

min_Sim  Σ_{i,j=1}^m ‖x_i − x_j‖²₂ sim_ij + µ ‖Sim‖²_F
s.t.  Sim · 1 = 1,  sim_ij ≥ 0,  rank(L_Sim) = m − k.

Algorithm 5: Unnormalized spectral clustering (von Luxburg 2007)
Input: a similarity matrix Sim = (s_ij)_{i,j=1...m} ∈ R^{m×m}, a number k of classes
Output: k clusters C_1, …, C_k
Compute the similarity graph G = (S, E);
Compute the unnormalized Laplacian L, L ∈ R^{m×m};
Compute the first k eigenvectors u_1, …, u_k of L;
Let U = [u_1, ..., u_k], the matrix whose columns are the u_i (U ∈ R^{m×k});
Let z_i, i = 1, ..., m, be the vectors corresponding to the rows of U;
Cluster the points (z_i)_{i=1,...,m} into k clusters A_1, …, A_k;
Return for all j, j = 1...k, C_j = {x_i | z_i ∈ A_j}
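The key property exploited by Algorithm 5, that the multiplicity of the eigenvalue 0 of L = D − W counts the connected components of the graph, can be checked numerically. The toy weight matrix below (a triangle plus a disconnected pair) is ours, not from the chapter, and numpy is assumed to be available:

```python
# Multiplicity of eigenvalue 0 of the unnormalized Laplacian L = D - W
# equals the number of connected components of the similarity graph.
import numpy as np

# Toy graph: nodes {0,1,2} form a triangle, nodes {3,4} a separate edge.
W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))       # degree matrix
L = D - W                        # unnormalized graph Laplacian

eigvals, eigvecs = np.linalg.eigh(L)   # L is symmetric PSD, use eigh
n_components = int(np.sum(np.abs(eigvals) < 1e-9))
print(n_components)              # prints 2: the graph has 2 components
```

In the general case, where the graph is connected but has a cluster structure, the small eigenvalues are no longer exactly zero; one then clusters the rows of the matrix of the first k eigenvectors with k-means, as in Algorithm 5.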

Spectral clustering can be seen as a relaxation of the problem of finding a partition C = (C_1, ..., C_k) minimizing the ratioCut of C, defined by ratioCut(C) = Σ_{j=1}^k cut(C_j)/|C_j|. In fact, it can be shown that ratioCut(C) can be rewritten by introducing an m × k matrix H defined by h_ij = 1/√|C_j| if x_i ∈ C_j and h_ij = 0 if x_i ∉ C_j:

ratioCut(C) = Trace(H^T L H)

H satisfies H^T H = I. Each row of H contains a single non-zero value, denoting the cluster the point belongs to, whereas each column of H is an indicator vector representing the composition of the corresponding cluster. The relaxation is performed by allowing H to take arbitrary real values. The minimization problem:

minimize_{H ∈ R^{m×k}} Trace(H^T L H)    under the condition H^T H = I

has as solution the matrix U = [u_1, ..., u_k] composed of the first k eigenvectors of L (theorem of Rayleigh–Ritz). U has to be transformed to model a partition, and this is achieved by applying k-means. Other relations between clustering frameworks have been addressed: Dhillon et al. (2004) also show a relation between a weighted version of kernel k-means, spectral clustering as defined in Ng et al. (2001), and normalized cut; Ding and He (2005) have also shown the relation between spectral clustering, kernel k-means and Non-negative Matrix Factorization of W. Let us recall that Non-negative Matrix Factorization of W consists in finding a matrix H, H ∈ R₊^{n×k}, minimizing ‖W − HH^T‖_F, where ‖.‖_F denotes the Frobenius norm.

4.6 Hierarchical Clustering

Searching for an organization of data into a single partition requires choosing the level of granularity, defined by the number of clusters in the partition. Hierarchical clustering solves this problem by looking for a sequence of nested partitions.

Hierarchical clustering (Ward 1963; Johnson 1967) aims at building a sequence of nested partitions: the finest one is the partition P = {{x_1}, ..., {x_m}}, where each point is put in a cluster reduced to this point, and the coarsest one is Q = {{x_1, ..., x_m}}, composed of a single class containing all the points. There exist two families of algorithms. Divisive algorithms start with the partition composed of a single class and iteratively divide a cluster into 2 or more smaller clusters, thus getting finer partitions. On the other hand, agglomerative algorithms start with the finest partition P, composed of m clusters, each cluster being reduced to a single point. The two closest clusters are merged and the process is iterated until getting the partition Q composed of a single cluster. A sketch of these algorithms is given in Algorithm 6.

Algorithm 6: Agglomerative hierarchical clustering

Compute dis(x_i, x_j) for all pairs of points x_i and x_j in S;
Let Π = {{x} | x ∈ S};
Let D be a dendrogram with m nodes at height 0, one node for each element in Π;
while |Π| > 1 do
    Choose two clusters C_u and C_v in Π so that dis(C_u, C_v) is minimal;
    Remove C_u and C_v from Π and add C_u ∪ C_v;
    Add a new node to the dendrogram labeled by C_u ∪ C_v at height dis(C_u, C_v);
    Compute dis(C_u ∪ C_v, C_w) for all C_w ∈ Π;
end

This algorithm is parameterized by the choice of the dissimilarity between two clusters. For instance, Ward (1963) proposes to optimize the variance, defined in Table 1. The best-known strategies for defining a dissimilarity between two clusters are:
• single linkage (nearest-neighbor strategy): dis(C_u, C_v) = min{dis(u, v) | u ∈ C_u, v ∈ C_v} (the split between C_u and C_v);
• average linkage: dis(C_u, C_v) = mean{dis(u, v) | u ∈ C_u, v ∈ C_v};
• complete linkage (furthest-neighbor strategy): dis(C_u, C_v) = max{dis(u, v) | u ∈ C_u, v ∈ C_v}.
Single linkage suffers from the chain effect (Johnson 1967): it merges clusters with a minimal split, but it can iteratively lead to clusters with large diameters. On the other hand, complete linkage aims at each step at minimizing the diameter of the resulting clusters and thus at finding homogeneous clusters, but quite similar objects may be classified in different clusters, in order to keep the diameters small. Average linkage tends to balance both effects.
Lance and Williams (1967) review different criteria used in hierarchical clustering and propose a general formula allowing one to update dis(C_u ∪ C_v, C_w) from dis(C_u, C_v), dis(C_u, C_w) and dis(C_v, C_w).
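Algorithm 6, instantiated with the single-linkage dissimilarity, can be sketched as follows. This is our illustrative code (it stops when k clusters remain instead of building the full dendrogram, and recomputes distances naively rather than using the Lance–Williams update):

```python
# Single-linkage agglomerative clustering sketch (names are ours).
import math

def single_linkage(points, k):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [[i] for i in range(len(points))]   # finest partition P

    def dis(cu, cv):
        # nearest-neighbor strategy: minimum over cross-cluster pairs
        return min(math.dist(points[i], points[j]) for i in cu for j in cv)

    while len(clusters) > k:
        u, v = min(
            ((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
            key=lambda p: dis(clusters[p[0]], clusters[p[1]]))
        clusters[u] = clusters[u] + clusters[v]    # merge C_u and C_v
        del clusters[v]
    return clusters
```

On two well-separated pairs of points, the two pairs are merged first, yielding the expected 2-cluster partition.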

4.7 Conceptual Clustering

Conceptual clustering was introduced in (Michalski 1980), (Michalski and Stepp 1983). The main idea of conceptual clustering is that, in order to be interesting, a cluster must be a concept, where a concept can be defined by characteristic properties (properties satisfied by all the elements of the concept) and discriminant properties (properties satisfied only by elements of the concept).² The system Cobweb, proposed in (Fisher 1987), is original in the sense that it learns probabilistic concepts organized into a hierarchy, and it is incremental, that is, examples are sequentially processed and the hierarchy is updated given a new example. More precisely, each concept C is described by the probability of the concept P(C) (estimated by the rate of observations in this concept w.r.t. the total number of observations) and, for each attribute X and each value v, P(X = v | C) (estimated by the proportion of observations in the concept satisfying X = v).
Given a hierarchy of concepts and a new observation x, the aim is to update the hierarchy taking into account this new observation. The new observation is first inserted at the root of the hierarchy and then iteratively integrated at the different levels of the tree. The algorithm relies on four main operators; the last two operators aim at repairing the bad choices that could have been made, because of the sequential processing of the observations:
• Creating a new concept: when, at a given level, x seems too different from the existing concepts, a new concept is created, reduced to this observation, and the process stops.
• Integrating x in an existing concept C: when this concept is composed of a single observation, x is added to C (thus getting a concept with 2 observations) and two leaves are created, one for each observation. Otherwise, the probabilities of C are updated and the process goes on, considering the sons of C as the new level.
• Merging two concepts: when x seems close to two concepts, the two concepts are merged, x is integrated in the new resulting concept and the process goes on.
• Splitting a concept: the concept is removed and its descendants are put at the current level.
The choice of the best operator relies on a criterion, called category utility, for evaluating the quality of a partition. It aims at maximizing both the probability that two objects in the same category have values in common and the probability that objects in different categories have different property values. The sum is taken across all categories C_j, all features X_i and all feature values v_il:

Σ_j Σ_i Σ_l P(X_i = v_il) P(X_i = v_il | C_j) P(C_j | X_i = v_il)

² If P is a property and C is a concept, the property is characteristic if C → P and discriminant if P → C.

• P(X_i = v_il | C_j) is called predictability. It is the probability that an object has the value v_il for feature X_i given that the object belongs to category C_j.
• P(C_j | X_i = v_il) is called predictiveness. It is the probability with which an object belongs to the category C_j given that it has the value v_il for feature X_i.
• P(X_i = v_il) serves as a weight. Frequent features have a stronger influence.
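The triple sum above can be computed directly from categorical data. The sketch below is ours and computes only this sum (the full Cobweb criterion additionally normalizes and subtracts a baseline, which we omit here); all names are illustrative:

```python
# Category-utility sum sketch for categorical data (names are ours).
from collections import Counter

def category_utility_sum(data, partition):
    """Sum over clusters, features and values of
    P(X_i=v) * P(X_i=v | C_j) * P(C_j | X_i=v)."""
    n = len(data)
    n_features = len(data[0])
    total = 0.0
    for cluster in partition:
        for i in range(n_features):
            global_counts = Counter(x[i] for x in data)
            cluster_counts = Counter(data[idx][i] for idx in cluster)
            for v, cnt in cluster_counts.items():
                p_v = global_counts[v] / n             # weight P(X_i = v)
                p_v_given_c = cnt / len(cluster)       # predictability
                p_c_given_v = cnt / global_counts[v]   # predictiveness
                total += p_v * p_v_given_c * p_c_given_v
    return total
```

As expected, a partition whose clusters are pure in their feature values scores higher than one that mixes values across clusters.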

4.8 Clustering Validation

Validation of a clustering process is a difficult task, since clustering is by nature exploratory, aiming at understanding the underlying structure of the data, which is unknown. Thus, contrary to supervised learning, we have no ground truth for assessing the quality of the result. Several approaches have been developed:
• deviation from a null hypothesis reflecting the absence of structure (Jain and Dubes 1988): for instance, samples can be randomly generated and the result is compared to the output computed on real data;
• comparison of the result with some prior information on the expected results. This information can be formulated in terms of the expected structure of the clustering, as for instance getting compact and/or well-separated clusters, or in terms of an expected partition. This method is mostly used for assessing the quality of a new clustering method, by running it on supervised benchmarks in which the true partition is given by the labels. It can also be used when only partial information is given;
• stability measures that study the change in the results, either when the parameters of the clustering algorithm (number of clusters, initial seeds, …) vary or when the data is slightly modified.
In all methods, performance indexes must be defined, either measuring the quality of the clustering by itself (called internal indexes) or comparing the result with other clusterings (either obtained from a ground truth, or on randomly generated data, or with different settings of the parameters) (called external indexes). There are also some relative indexes that compare the results of several clusterings.

4.8.1 Internal Indexes

Such indexes allow measuring the intrinsic quality of the partition. There are many indexes (a list can be found in Halkidi et al. 2002). They tend to integrate in a single measure the compactness of the clusters and their separation: the first one must be minimized whereas the second one must be maximized, and usually they are aggregated using a ratio. Some criteria for assessing the compactness of a cluster or its separation from other clusters are given in Table 1; this list is far from being exhaustive. For example, the Davies–Bouldin index is defined by

(1/k) Σ_{i=1}^k max_{j, j≠i} (δ_i + δ_j) / Δ_ij

In this expression, δ_i is the average distance of the objects of cluster i to the centroid µ_i; it measures the dispersion of cluster i (to be minimized for compactness). Δ_ij is the distance between the centroid of cluster i and the centroid of cluster j (dis(µ_i, µ_j)); it measures the dissimilarity between the two clusters (to be maximized for cluster separation). The term (δ_i + δ_j)/Δ_ij represents a kind of similarity between clusters C_i and C_j. Clustering aims at finding dissimilar clusters, and therefore the similarity of a cluster with the other ones must be small. The Davies–Bouldin index averages the similarity of each cluster with its most similar one; the quality is higher when this index is small.
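The definition above translates directly into code. The following sketch is ours (Euclidean distances, centroids as coordinate means; names are illustrative):

```python
# Davies-Bouldin index sketch (lower is better; names are ours).
import math

def davies_bouldin(points, clusters):
    """clusters is a list of lists of point indices."""
    centroids, deltas = [], []
    dim = len(points[0])
    for c in clusters:
        mu = tuple(sum(points[i][d] for i in c) / len(c) for d in range(dim))
        centroids.append(mu)
        # delta_i: average distance of the cluster's objects to its centroid
        deltas.append(sum(math.dist(points[i], mu) for i in c) / len(c))
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # similarity of cluster i with its most similar other cluster
        total += max((deltas[i] + deltas[j]) / math.dist(centroids[i], centroids[j])
                     for j in range(k) if j != i)
    return total / k
```

As a sanity check, the same partition scores better (lower) when the two clusters are far apart than when they almost touch.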

4.8.2 External Indexes

We suppose that the result of the clustering is a partition C = (C_1, ..., C_k) and that we already know a partition P = (P_1, ..., P_k), which is called the reference partition. The external indexes compare the situation of pairs of points (x, y) in each clustering: do they belong to the same cluster in both partitions? Do they belong to different clusters in both partitions? More precisely, comparing the two partitions can involve the following numbers:
• a: number of pairs of points belonging to a same cluster in both partitions;
• b: number of pairs of points belonging to a same cluster in C but not in P;
• c: number of pairs of points belonging to a same cluster in P but not in C;
• d: number of pairs of points belonging to different clusters in both partitions.
This leads to the definition of the Rand Index, defined by

RI = (a + d) / (n(n − 1)/2)

where a + d represents the number of agreements between C and P: when the partitions are identical, the Rand Index is equal to 1. The Adjusted Rand Index (ARI) is usually preferred. It corrects the Rand Index by comparing it to its expected value: when ARI = 0, the learned partition is not better than a random partition, whereas if ARI = 1, the two partitions are identical.
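The Rand Index can be computed by iterating over all pairs of points; here is a minimal sketch (ours), with partitions given as per-point label lists:

```python
# Rand Index sketch: fraction of point pairs on which two partitions
# agree (together in both, or apart in both). Names are ours.
from itertools import combinations

def rand_index(labels_c, labels_p):
    pairs = list(combinations(range(len(labels_c)), 2))
    agree = 0
    for i, j in pairs:
        same_c = labels_c[i] == labels_c[j]
        same_p = labels_p[i] == labels_p[j]
        if same_c == same_p:      # counts both the "a" and the "d" pairs
            agree += 1
    return agree / len(pairs)     # (a + d) / (n(n-1)/2)
```

Identical partitions give 1; maximally disagreeing ones on four points, for instance [0,0,1,1] vs [0,1,0,1], give 1/3 (only the two fully "apart/apart" pairs agree).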

5 Linear Models and Their Generalizations

We now turn to various hypothesis spaces, which can be used either in the unsupervised context or in the supervised one. Most of the following presentation, however, is set in the supervised setting. We start with the simplest models: the linear ones.

When the input space X is viewed as a vector space, for instance R^d, where d is the dimension of this space, it becomes natural to think of regularities in the data as geometric structures in this input space. The simplest such structures are linear ones, like lines or (hyper)planes. Their appeal for inductive learning comes from their simplicity, which helps in understanding or interpreting what they represent, and from the property of the associated inductive criteria, which are often convex and therefore lead to efficient ways of approximating their global optimum. Additionally, because these models have limited expressive power, they are not prone to overfitting and are stable under small variations of the training data. Typically, the expressions considered for representing regularities are of the form:

h(x) = f( Σ_{i=1}^n w_i g_i(x) + w_0 )

1. When the g_i(·) are the projections on the ith coordinate of X and f is the identity function, we have the classical linear regression model.
2. When f is the sign function, we have a binary linear classification model, where Σ_{i=1}^n w_i g_i(x) + w_0 = 0 is the equation of the separating hyperplane.
3. Mixture models in the generative approach are also often expressed as linear combinations of simple density distributions, like mixtures of Gaussians: p(x) = Σ_{i=1}^n π_i p_i(x | θ_i).
4. When the functions g_i(·) are themselves non-linear transformations of the input space, we get the so-called generalized linear models.
5. Even though the Support Vector Machine is a non-parametric method for classification (meaning that its number of parameters depends on the training data), it can also be cast as a linear model of the form:

h(x) = Σ_{i=1}^n α_i κ(x, x_i) y_i

where the kernel function κ measures a "similarity" between instances in the input space, and can be seen as a special case of the functions g_i(·), with each g_i(x) = κ(x_i, x).
6. Some ensemble learning methods, such as boosting for binary classification, are equally members of the linear models family. In these methods, the hypotheses generally take the form:

H(x) = sign( Σ_{i=1}^n α_i h_i(x) )

where the number n of base (or "weak") hypotheses controls the expressive power of the hypothesis, and hence the risk of overfitting the data.

The first problem to solve in learning these models is to choose the "dictionary" of functions g_i(·). The second one is to estimate the parameters w_i (1 ≤ i ≤ n).
The choice of the dictionary is either trivially done using classical base functions, like the splines for linear regression, or is the result of the expert's insights. When there exists a large amount of training data, methods for "dictionary learning" can be used. The design of these methods, however, remains largely a research problem, because the search for a good hypothesis within a hypothesis space is now compounded by the problem of constructing the hypothesis space itself. Dictionary learning methods have been mostly used in the context of vision systems and scene analysis (Qiu et al. 2012; Rubinstein et al. 2010; Tosic and Frossard 2011).
The estimation of the parameters demands first that a performance criterion be defined, so that it can be optimized by controlling their values.
In regression, the problem is to learn a function h : X → R from a set of examples (x_i, y_i)_{1≤i≤m}. The differences between the actual and estimated function values on the training examples are called residuals: ε_i = y_i − h(x_i). The least-squares method adopts as the empirical risk the sum of the squared residuals Σ_{i=1}^m ε_i², and, according to the ERM principle, the best hypothesis ĥ is the one minimizing this criterion. One justification for this criterion is to consider that the true target function is indeed linear, but that the observed y_i values are contaminated with Gaussian noise.
The problem of optimizing the empirical risk can be solved by computing a closed-form solution. The matrix inversion of X^⊤X is needed, where X denotes the m-by-d data matrix containing the m instances in rows described by d features in columns. Unfortunately, this can be prohibitive in high-dimensional feature spaces and can be sensitive to small variations of the training data. This is why iterative gradient descent methods are usually preferred.
It is also in order to control this instability, and the overfitting behavior it can denote, that regularization is called for. The idea is to add penalties on the parameter values. In shrinkage, the penalty is on the squared norm of the weight vector w:

w* = argmin_w { (y − Xw)^⊤ (y − Xw) + λ ‖w‖² }

This favors weights that are on average small in magnitude. In Lasso (least absolute shrinkage and selection operator), the penalty is on the absolute values of the weights w:

w* = argmin_w { (y − Xw)^⊤ (y − Xw) + λ ‖w‖₁ }

Lasso uses what is called the L1 norm and favors sparse solutions, in that it favors solutions with zero values for as many weights as possible while still trying to fit the data.
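For the shrinkage (ridge) case, the penalized least-squares problem has the closed-form solution w* = (X^⊤X + λI)^{-1} X^⊤y. The sketch below is ours (synthetic data, numpy assumed available) and checks the characteristic effect: the regularized weights have a smaller norm than the ordinary least-squares ones:

```python
# Ridge (shrinkage) regression sketch: closed-form solution
# w* = (X^T X + lambda I)^{-1} X^T y. Data and names are ours.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                 # 50 instances, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=50)  # linear target + Gaussian noise

lam = 0.1
# solve (X^T X + lam I) w = X^T y instead of inverting the matrix
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
w_ols = np.linalg.solve(X.T @ X, X.T @ y)    # unregularized least squares
# w_ridge is close to true_w but shrunk slightly toward 0 relative to w_ols
```

The Lasso penalty, by contrast, has no closed form; it is typically solved by coordinate descent or proximal gradient methods.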

We now look at two simple, yet still widely used, discriminative methods: logistic regression and the perceptron.
Logistic regression assumes that the decision function is linear and that the distance in the input space of an example from the decision boundary is indicative of the probability of the example belonging to the class associated with this side of the boundary. In fact, the method assumes that the histogram of these distances follows a normal distribution. If d(x) is the distance of x to the decision function, we have:

p̂(class(x) = '+' | d(x)) = exp(w · x − w_0) / (1 + exp(w · x − w_0)) = 1 / (1 + exp(−(w · x − w_0)))

Because the model is based on generative assumptions (i.e. the model is able to generate data sets, in contrast to, say, decision functions), one can associate a likelihood function to the training data:

L(w, w_0) = Π_i P(y_i | x_i) = Π_i p̂(x_i)^{y_i} (1 − p̂(x_i))^{1−y_i}

We then want to maximize the log-likelihood with respect to the parameters, which means that all partial derivatives must be zero:

∇_w L(w, w_0) = 0,    ∂L(w, w_0)/∂w_0 = 0
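These conditions are typically reached iteratively, by gradient ascent on the log-likelihood (for the sigmoid, the gradient has the simple form Σ_i (y_i − p̂(x_i)) x_i). The sketch below is ours: a minimal batch gradient-ascent trainer on toy data, with labels in {0, 1} and all names illustrative:

```python
# Logistic-regression sketch: batch gradient ascent on the
# log-likelihood (data and names are ours, for illustration only).
import math

def train_logistic(xs, ys, lr=0.1, epochs=2000):
    d = len(xs[0])
    w, w0 = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, g0 = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            z = sum(wi * xi for wi, xi in zip(w, x)) + w0
            p = 1.0 / (1.0 + math.exp(-z))      # predicted P(class = 1 | x)
            for i in range(d):
                gw[i] += (y - p) * x[i]          # d log L / d w_i
            g0 += (y - p)                        # d log L / d w_0
        w = [wi + lr * gi for wi, gi in zip(w, gw)]
        w0 += lr * g0
    return w, w0
```

On one-dimensional separable data the learned weight is positive and the predicted probabilities cross 0.5 at the boundary between the two classes.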

The corresponding weight parameters w and w_0 can be obtained through a gradient descent procedure applied to the negative log-likelihood.
The logistic regression algorithm is based on the assumption that the distances from the decision function in the input space X follow a normal distribution. If this assumption is not valid, it is possible that a linear separation exists between the two classes, but that logistic regression outputs a decision function that does not separate them properly. By contrast, the perceptron algorithm guarantees that if a linear separation exists between the classes, it will output a linear decision function making no errors on the training set. The perceptron considers the training examples one at a time, and updates its weight vector every time the current hypothesis h_t misclassifies the current example x_i, according to the following equation:

w_{t+1} = w_t + η y_i x_i     (6)

The algorithm may cycle several times through the training set. Learning stops when no training example is misclassified anymore. The perceptron algorithm is simple to implement and is guaranteed to converge in a finite number of steps if the classes are linearly separable.
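A minimal sketch of this update rule (ours, not the chapter's code): labels are in {−1, +1}, the bias is folded into the inputs as a constant feature, and the names are illustrative.

```python
# Perceptron sketch implementing w_{t+1} = w_t + eta * y_i * x_i
# (labels in {-1, +1}; the bias is an extra constant input feature).
def perceptron(xs, ys, eta=1.0, max_epochs=100):
    d = len(xs[0])
    w = [0.0] * d
    for _ in range(max_epochs):
        errors = 0
        for x, y in zip(xs, ys):
            score = sum(wi * xi for wi, xi in zip(w, x))
            if y * score <= 0:                       # misclassified example
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                errors += 1
        if errors == 0:       # no misclassified example left: stop
            break
    return w
```

On linearly separable data the loop terminates with a weight vector that classifies every training example correctly, as the convergence theorem guarantees.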

Fig. 1 A complex decision function in the input space can be made linear in a feature space with an appropriate mapping

All methods described above are limited to finding linear models within the input space. One way to circumvent this serious limitation is to change the input space. This is what the so-called generalized linear models do, by using sets of non-linear basis functions g_i defined over X. In the new description space spanned by the functions g_i, a linear separation can thus be associated with a non-linear separation in the input space X (Fig. 1).
Aside from its limited expressive power, the perceptron has another unsatisfactory property: it outputs a linear decision function as soon as it has found one between the training data. However, intuitively, some linear separations can be better than others, and it would be preferable to output the best one(s) rather than the first one discovered. This realization is the basis of methods that attempt to maximize the "margin" between examples of different classes (Fig. 2).
Suppose we call the margin of an example x with respect to a linear decision function defined by its weight vector w the distance of this example from the decision function: w · x. Then we want all the training examples to be on the good side of the decision function that is learned (i.e. all the positive instances on one side, and the negative ones on the other side), and the smallest margin over the positive training examples and over the negative ones to be as large as possible (see Fig. 2). In this way, the decision boundary is the most robust to small changes of the training points, and accordingly, it can be expected to be the best separator for new unseen data that follow the same distribution as the training set. This leads to a quadratic constrained optimization problem:

w* = argmin_w (1/2) ‖w‖²    subject to    y_i (w · x_i) ≥ 1,  1 ≤ i ≤ m

This optimization problem is usually solved using the method of Lagrange multipliers. Ultimately, it can be shown that the solution only depends on the set SS of the so-called support vectors, which are the training examples nearest to the decision boundary, i.e. with the smallest margin. Each support vector x_i is associated with

Fig. 2 A linear decision function between two classes of data points can be defined by the support vectors that are on the margin. Here only three points suffice to determine the decision function

a weight α_i. The decision function then becomes:

h(x) = sign( Σ_{x_i ∈ SS} α_i y_i ⟨x_i · x⟩ )

where ⟨x_i · x⟩ is the dot product of x_i and x in the input space X. The dot product can be considered as a measure of resemblance between x_i and x. Accordingly, the SVM can be seen as a kind of nearest-neighbor classifier, where a new example x is labeled by comparison to selected neighbors, the support vectors x_i, weighted by the coefficients α_i.
However, even though SVMs output decision functions that are likely to be better than the ones produced by perceptrons, they would still be limited to producing linear boundaries in the input space. A fundamental realization by Vapnik and his co-workers (Cortes and Vapnik 1995) has changed the expressive power of the SVM, and indeed of many linear methods, such as linear regression, Principal Component Analysis, Kalman Filters, and so on.
Indeed, an essential problem of the methods based on examples is the choice of an appropriate measure of similarity. Thus, any chess player is aware that two exactly identical game situations, but for the position of a pawn, may actually lead to completely different outcomes of the game. The same can happen when comparing various objects such as molecules, texts, images, etc. In every case, the choice of the right similarity measure is crucial for methods that decide on the basis of similarities to known examples.
If mathematics provides us with many measures suited to vector spaces, for example in the form of distances, the problem is much more open when the data involve symbolic descriptors and/or are defined in non-vector spaces, such as texts. This is why a significant part of current machine learning contributions is the definition and testing of new similarity measures appropriate for particular data types: sequences, XML files, structured data, and so on.
The invention and popularization of the SVM by Vapnik and his co-workers have been very influential in reviving and renewing the problem of the design of appropriate similarity functions, and this can be seen especially with the rise of the so-called kernel methods (Schölkopf and Smola 2002).
If one wants to adapt linear methods to learn non-linear decision boundaries, a very simple idea comes to mind: transform the data non-linearly into a new feature space in which linear classification can be applied. The problem, of course, is to find a suitable change of representation, from the input space to an appropriate feature space. We will see that one approach is to learn such a change of representation in a multilayer neural network (see Sect. 6). But one remarkable thing is that, in many cases, the feature space does not have to be explicitly defined. This is the core of the so-called "kernel trick".
Suppose that you must compare two points in R²: u = (u_1, u_2) and v = (v_1, v_2). One way to measure their distance is through their dot product u · v = u_1 v_1 + u_2 v_2. Suppose now that you decide to consider the mapping (x, y) → (x², y², √2 xy), which translates points in R² into points in R³. Then, the points u and v are respectively mapped to u′ = (u_1², u_2², √2 u_1 u_2) and v′ = (v_1², v_2², √2 v_1 v_2). The dot product of these two vectors is:

u′ · v′ = u_1² v_1² + u_2² v_2² + 2 u_1 u_2 v_1 v_2 = (u_1 v_1 + u_2 v_2)² = (u · v)²

In other words, by squaring the dot product in the input space, we obtain the dot product in the new 3-dimensional space without having to actually compute it explicitly. And one can see in this example that the mapping is non-linear.
A function that computes the dot product in a feature space directly from the vectors in the input space is called a kernel. In the above example, the kernel is κ(u, v) = (u · v)². Many kernel functions are known, and because kernel functions are closed under simple operations (such as sums and products), it is easy to define new ones as wished.
In order to be able to use the kernel trick to transform a linear method into a non-linear one, the original linear method must be entirely expressible in terms of dot products in the input space. Many methods can be expressed that way, using a "dual form". This is for instance the case of the perceptron, which can thus be transformed into a kernel perceptron able to separate non-linearly separable data in the input space. A kernel perceptron that maximizes the margin between the decision function and the nearest examples of each class is called a Support Vector Machine. The general form of the equation of Support Vector Machines is:

h(x) = sign( Σ_{i=1}^m α_i κ(x, x_i) y_i )

where m is the number of training data. However, it suffices to use the support vectors (the nearest data points to the decision function) in the sum, which most often drastically reduces the number of comparisons to be made through the kernel function.
It is worth stressing that kernel functions have been devised for discrete structures, such as trees, graphs and logical formulae, thus extending the range of geometrical models to non-numerical data. A good introductory book is (Shawe-Taylor and Cristianini 2004).
The invention of the boosting algorithm, another class of generalized linear classifiers, is due to Shapire and Freund in the early 1990s (Shapire and Freund 2012). It stemmed from a theoretical question about the possibility of learning "strong hypotheses", ones that perform as well as possible on test data, using only "weak hypotheses" that may perform barely above random guessing on training data.
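Returning to the kernel trick above, the identity ⟨φ(u), φ(v)⟩ = (u · v)² for φ(x, y) = (x², y², √2 xy) can be checked numerically; the function names below are ours:

```python
# Kernel-trick check: the squared dot product in R^2 equals the dot
# product after the explicit mapping phi(x, y) = (x^2, y^2, sqrt(2)*x*y).
import math

def phi(p):
    x, y = p
    return (x * x, y * y, math.sqrt(2) * x * y)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

u, v = (1.0, 2.0), (3.0, 0.5)
lhs = dot(phi(u), phi(v))      # dot product computed in the R^3 feature space
rhs = dot(u, v) ** 2           # kernel kappa(u, v) = (u . v)^2, computed in R^2
# lhs and rhs agree (up to floating-point rounding): both are 16 here
```

The right-hand side never materializes the 3-dimensional vectors, which is the whole point: with higher-degree polynomial or Gaussian kernels, the implicit feature space can be huge or infinite-dimensional.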

Algorithm 7: The boosting algorithm
Input: training data set S; number of combined hypotheses T; weak learning algorithm A

Initialization of the distribution on the training set: w_{i,1} = 1/|S| for all x_i ∈ S;
for t = 1, ..., T do
    Run A on S with weights w_{i,t} to produce a hypothesis h_t;
    Calculate the weighted error ε_t of h_t on the weighted data set;
    if ε_t ≥ 1/2 then
        set T ← t − 1 and break
    end
    α_t ← (1/2) log((1 − ε_t)/ε_t)    (confidence for the hypothesis h_t);
    w_{i,t+1} ← w_{i,t} / (2 ε_t) for the instances x_i misclassified at time t;
    w_{j,t+1} ← w_{j,t} / (2 (1 − ε_t)) for the instances x_j correctly classified at time t;
end

Output:thefinalcombinedhypothesisH X Y : : → T H(x) sign αt ht (x) (7) = ,t 1 - !=

The idea is to find a set of basis decision functions hi X 1, 1 such that used collectively in a weighted linear combination, they: can→ provide{− + a} high- performing decision function H(x).Howtofindsuchasetofbasisfunctionsor weak hypotheses? The general principle behind boosting and other ensemble learning methods is to promote diversity in the basis functions so as to eliminate accidental fluctuations and combine their strengths (Zhou 2012). The way boosting does that is by modifying the training set between each stage of the iterative algorithm. At each stage, a weak hypothesis ht is learnt, and the training set is subtly modified so that ht does not perform better than random guessing on the new data set. This way, at the next stage, one is guaranteed that if a better than random hypothesis can be learnt, Designing Algorithms for Machine Learning and Data Mining 375 it has used information not used by ht and therefore brings new leverage on the rule to classify the data points. In boosting, the training set is modified at each stage by changing the weights of the examples. Specifically, at start the weights of all training examples are uniformly set to 1/ S .Atthenextstep,halfofthetotalweightisassignedtothemisclassified examples,| and| the other half to the correctly classified ones. Since the sum of the current weight of the misclassified examples is the error rate εt ,theweightsofthe misclassified examples must be multiplied by 1/2εt .Likewisetheweightsofthe correctly classified examples must be multiplied by 1/2(1 εt ).Thatway,atstage t 1, the total weight of the misclassified examples is equal− to the total weight of the+ other examples, and equal to 1/2. In each successive round, the same operation is carried out. Finally, a confidence is computed for each learned weak hypothesis ht .Ananalysisoftheminimizationoftheempiricalrisk(withaexponentialloss 1 1 εi function) leads to associate αt log − to each ht .Thebasicalgorithmis 2 2 εi given in Algorithm 7. 
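The weight-update scheme of Algorithm 7 can be sketched in a few lines of Python. The toy 1-d dataset and the threshold-stump weak learner below are illustrative choices, not part of the original algorithm:

```python
# A minimal sketch of Algorithm 7 on a toy 1-d dataset; the weak
# learner A is an exhaustive search over threshold "stumps".
import math

X = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
Y = [+1, +1, -1, -1, +1, -1]          # not separable by a single threshold

def weak_learner(w):
    # A: return the threshold/polarity stump with the lowest weighted error
    best = None
    for thr in X:
        for pol in (+1, -1):
            h = lambda x, t=thr, p=pol: p if x <= t else -p
            err = sum(wi for wi, xi, yi in zip(w, X, Y) if h(xi) != yi)
            if best is None or err < best[1]:
                best = (h, err)
    return best

T, hyps, alphas = 3, [], []
w = [1.0 / len(X)] * len(X)           # w_{i,1} = 1/|S|
for t in range(T):
    h, eps = weak_learner(w)
    if eps >= 0.5 or eps == 0.0:      # stop if weaker than random (or perfect)
        break
    alphas.append(0.5 * math.log((1 - eps) / eps))   # confidence alpha_t
    hyps.append(h)
    # Reweight: misclassified examples receive half the total mass,
    # correctly classified ones the other half
    w = [wi / (2 * eps) if h(xi) != yi else wi / (2 * (1 - eps))
         for wi, xi, yi in zip(w, X, Y)]

def H(x):
    # Final combined hypothesis, Eq. (7)
    s = sum(a * h(x) for a, h in zip(alphas, hyps))
    return 1 if s >= 0 else -1

print([H(x) for x in X])  # -> [1, 1, -1, -1, 1, -1], all correct
```

After three rounds the weighted vote classifies all six points correctly, even though no single stump can.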
From Eq. 7, it can be seen that boosting realizes a linear classification in a feature space made of the values of the learned weak hypotheses (see Fig. 3). It can therefore be considered as a generalized linear method which learns its feature map.

While not strictly linear in a geometrical sense, another method, the Random Forest, belongs to the family of ensemble learning, like boosting (Breiman 2001). There, the idea is to foster diversity in the basis hypotheses by changing both the training data and the hypothesis space at each round of the algorithm. The training set is changed at each time step using a bootstrap strategy, by sampling with replacement m training examples (not necessarily different) from the initial training set S of size m. The hypothesis space is made of decision trees, and in order to accentuate diversity, a subset of the attributes is randomly drawn at each step, forcing the individual "weak decision trees" to be different from each other. Random forests have been the winning method in numerous competitions on data analysis, and while they are no longer well publicized since the advent of the deep neural networks, they remain a method of choice for their simplicity and their propensity to reach good levels of performance.
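The two randomization mechanisms just described, bootstrap sampling of the m training examples and random attribute subsets, can be sketched as follows (the tiny dataset and the size of the attribute subset are illustrative):

```python
# Sketch of the two sources of diversity in a Random Forest:
# bootstrap sampling and random attribute subsets. The dataset
# and the subset size k=2 are illustrative choices.
import random

random.seed(0)
S = [((1, 0, 1), +1), ((0, 1, 1), -1), ((1, 1, 0), +1), ((0, 0, 0), -1)]
m = len(S)
n_attributes = 3

def bootstrap_round(S):
    # Sample m examples with replacement (some repeated, some left out)...
    sample = [random.choice(S) for _ in range(m)]
    # ...and draw a random subset of attributes for this tree to use
    attrs = random.sample(range(n_attributes), k=2)
    return sample, attrs

for t in range(3):
    sample, attrs = bootstrap_round(S)
    print(len(sample), sorted(attrs))
```

Each round would then feed its bootstrap sample and attribute subset to a decision-tree learner; the trees' majority vote forms the forest's prediction.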


Fig. 3 Boosting in effect maps the data points into a feature space with T dimensions corresponding to the values of the weak hypotheses h_t(·). In this new feature space the final hypothesis H(x) = sign(Σ_{t=1}^{T} α_t h_t(x)) corresponds to a linear classifier

Linear methods are said to be shallow since they combine only one "layer" of descriptors, and, supposedly, this limits their expressive power. Methods that change the representation of the data, and especially deep neural methods, are motivated by overcoming this limitation.

6 Multi-layer Neural Networks and Deep Learning

The preceding section has introduced us to linear models and ways to learn them. Among these models, the perceptron was introduced by Frank Rosenblatt in the late 1950s with the idea that it was a good model of learning perceptual tasks such as speech and character recognition. The learning algorithm (see Eq. 6) was general, simple and yet efficient. However, it was limited to learning linear decision functions with respect to the input space. This limitation was forcibly put forward in the book Perceptrons by Marvin Minsky and Seymour Papert in 1969 (see Minsky and Papert 1988) and this quickly translated into a "winter period" for connectionism. Actually, Minsky and Papert acknowledged in their book that layers of interconnected neurons between the input and the last linear decision function could perform some sort of representation change that would allow the system to solve non-linear decision tasks, but they did not see how it would be possible to disentangle the effects of modifications in the various weights attached to the connections, and thus to learn these weights. They believed that these first layers would have to be hand-coded, which would be a daunting task for all but the simplest perceptual problems.

For more than ten years, no significant activity occurred in the study of artificial neural networks, without wanting to lessen the works of some courageous scientists during this period. In 1982, a change happened, coming from a field foreign to Artificial Intelligence. John Hopfield, a solid-state physicist, noticed a resemblance between the problem of finding the state of lowest energy in spin glasses and the problem of pattern recognition in neural networks (Hopfield 1982). In each case, there are "basins of attraction", and changing the weights of the connections between atoms or neurons can alter the basins of attraction, which translates into modifying the ideal patterns corresponding to imperfect or noisy inputs.
For the first time, a non-linear function from the input space to the output one was shown to be learnable with a neural network. This spurred the development of other neural models, notably the "Boltzmann machine", unfortunately slow and impractical, and the multi-layer perceptron.

6.1 The Multi-layer Perceptron

In multi-layer perceptrons, the signal is fed to an input layer made of as many neurons (see Fig. 4) as there are descriptors or dimensions in the input space, and is then propagated through a number of "hidden layers" up until a final output layer which computes the output corresponding to the pattern presented to the input layer.

Fig. 4 Model of a formal neuron. The inputs x_1, ..., x_d, together with a bias input weighted by w_{0i}, are summed into σ(i) = Σ_{j=0}^{d} w_{ji} x_j, and the neuron outputs y_i = g(σ(i))

Fig. 5 The back-prop algorithm illustrated. Here the desired output u is compared to the output y produced by the network, and the error is back-propagated in the network using local equations: at an output unit, δ_s = g′(a_s)(u − y_s(x)); at a hidden unit, δ_j = g′(a_j) Σ_{i ∈ sources(j)} w_{ij} δ_i; and each weight is updated as, e.g., w_{is}(t + 1) = w_{is}(t) − η(t) δ_s a_i

The hidden layers are in charge of translating the input signal in such a way that the last, output, layer can learn a linear decision function that solves the learning problem. The hidden layers thus act as a non-linear mapping from the input space to the output one. The question is how to learn this mapping. It happens that the solution to this problem was found the very same year that the book "Perceptrons" was published, by Arthur Bryson and Yu-Chi Ho, and then again in 1974 by Paul Werbos in his PhD thesis. However, it is only in 1986 that the now famous "back-propagation algorithm" was widely recognized as the solution to the learning of hidden layers of neurons and credited to David Rumelhart, Geoff Hinton and Ronald Williams (Rumelhart et al. 1987), while Yann Le Cun, independently, in France, homed in on the same algorithm too (Le Cun 1986) (see Fig. 5).

To discover the learning algorithm, neurons could no longer be seen as logical gates, with a {0, 1} or {False, True} output, but had to be treated as continuous functions of their inputs: the famous 'S'-shaped logistic function or the hyperbolic tangent. This allowed the computation of the gradient of the signal error (the difference between the desired output and the computed one) with respect to each connection in the network, and thus the computation of the direction of change for each weight in order to reduce this prediction error.
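A minimal sketch of these gradient computations, for a network with one hidden layer of two sigmoid units and a linear output unit, might look as follows (the sizes, data, number of steps and learning rate are all illustrative choices):

```python
# Sketch of back-propagation for one hidden layer, in pure Python.
# One training pair (x, u): 2 inputs -> 2 sigmoid hidden units -> 1 linear output.
import math, random

random.seed(1)

def sig(a):
    return 1.0 / (1.0 + math.exp(-a))

x, u = [0.5, -0.3], 0.8
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]  # input->hidden
W2 = [random.uniform(-1, 1) for _ in range(2)]                      # hidden->output
eta = 0.1

def forward(x):
    h = [sig(sum(W1[j][i] * x[i] for i in range(2))) for j in range(2)]
    y = sum(W2[j] * h[j] for j in range(2))
    return h, y

def backprop_step():
    global W1, W2
    h, y = forward(x)
    delta_out = u - y                       # error signal at the output
    # Back-propagate through the sigmoid: delta_j = sig'(a_j) * w_j * delta_out
    delta_h = [h[j] * (1 - h[j]) * W2[j] * delta_out for j in range(2)]
    W2 = [W2[j] + eta * delta_out * h[j] for j in range(2)]
    W1 = [[W1[j][i] + eta * delta_h[j] * x[i] for i in range(2)] for j in range(2)]

err0 = (u - forward(x)[1]) ** 2
for _ in range(100):
    backprop_step()
err1 = (u - forward(x)[1]) ** 2
print(err0, err1)  # the squared error decreases
```

The key point is locality: each weight update only needs the activation at its source and the back-propagated δ at its target.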

The invention of the back-propagation algorithm arrived at a timely moment in the history of Artificial Intelligence when, on one hand, it was felt that expert systems were decidedly difficult to code because of the knowledge bottleneck and, on the other hand, symbolic learning systems started to show brittleness when fed with noisy data, something that was not a problem for neural networks. Terry Sejnowski and Charles Rosenberg showed to stunned audiences how the system NETtalk could learn to correctly pronounce phonemes according to the context, reproducing the same stages in learning as exhibited by children, while other scientists applied the new technique with success to problems of speech recognition, prediction of stock market prices or hand-written character recognition. In fact, almost a decade before the DARPA Grand Challenge on autonomous vehicles, a team at Carnegie Mellon successfully trained a neural network to drive a car, or, more modestly but still, to appropriately turn the steering wheel by detecting where the road was heading in video images recorded in real time. This car was able to self-direct itself almost 97% of the time when driving from the East coast of the United States to the West coast in 1997 (see https://www.theverge.com/2016/11/27/13752344/alvinn-self-driving-car-1989-cmu-navlab).

However, rather rapidly, in the mid-90s, multilayer perceptrons appeared limited in their capacity to learn complex supervised learning tasks. When hidden layers were added in order to learn a better mapping from the input space to the output one, the back-propagated error signal would rapidly become too spread out to induce significant and well-informed changes in the weights, and learning would not succeed. Again, connectionism subsided and yielded to new learning techniques such as Support Vector Machines and Boosting.

6.2 Deep Learning: The Dream of the Old AI Realized?

Very early, researchers in artificial intelligence realized that one crucial key to successful automatic reasoning and problem solving was how knowledge was represented. Various types of logics were invented in the 60s and 70s; Saul Amarel, in a famous paper (Amarel 1968), showed that a problem became trivial if a good representation of the problem was designed; and Levesque and Brachman in the 80s strikingly exhibited the tradeoff between representation expressiveness and reasoning efficiency (Levesque and Brachman 1987), demonstrating the importance of the former.

Likewise, it was found that the performance of many machine learning methods is heavily dependent on the choice of representation they use for describing their inputs. In fact, most of the work done in machine learning involves the careful design of feature extractors and generally of preprocessing steps in order to get a suitable redescription of the inputs so that effective machine learning can take place. The task can be complex when vastly different inputs must be associated with the same output while subtle differences might entail a difference in the output. This is often the case in image recognition, where variations in position, orientation or illumination of an object should be irrelevant, while minute differences in the shape or texture can be significant. The same goes for speech recognition, where the learning system should be insensitive to pitch or accent variations while being attentive to small differences in prosody for instance. Until 2006, and the first demonstration of successful "deep neural networks", it was believed that these clever changes of representation necessary in some application areas could only be hand-engineered.
In other words, the dream of AI, automatically finding good representations of inputs and/or knowledge, could only be realized for "simple" problems, as was done, in part, by multilayer perceptrons. If it was known that representations using hierarchical levels of descriptors could be much more expressive than shallow representations given the same number of features, it was also believed that learning such multiple levels of raw features and more abstract descriptors was beyond the reach of algorithmic learning from data. All that changed in 2006 when a few research groups showed that iteratively stacking unsupervised representation learning algorithms could yield multiple levels of representation, with higher-level features being progressively more abstract descriptions of the data. These first successes made enough of an impression to rekindle interest in artificial neural nets. This interest spectacularly manifested itself in the unexpectedly large audience of the "Deep Learning Workshop: Foundations and Future Directions" organized alongside the NIPS-2007 conference. Since then, there has been an increasing number of scientific gatherings on this subject: notably one workshop every year at the NIPS and ICML conferences, the two major conferences in machine learning, and a new specialized conference created in 2013: ICLR, the International Conference on Learning Representations.

But then, what was the novelty with respect to the previous multilayer perceptrons? It was not a revolution, but still there were three significant new ingredients that changed the realm of possibilities. The first one was related to the fact that it would help learning tremendously if the weights of the connections were not initialized randomly but in a cleverer way. The new idea was to train each layer one by one with unsupervised learning, and then to finish with a round of supervised learning with the standard back-propagation technique.
In the early stages of deep learning, this was done by variants of the Boltzmann machines invented in the 1980s; later it was carried out with autoencoders that learn to associate each input with itself but with a limited bandwidth represented by a constrained hidden layer of neurons. That way, enormous volumes of unsupervised training data could be put to use in order to learn the initial weights of the network, and more importantly to learn meaningful, hierarchically organized descriptors of the data. The second ingredient was the use of vastly more computing power than a decade earlier. To learn the millions of weights typical of the new deep networks, classical CPUs, even with parallelism, were not enough. The computing power of Graphics Processing Units (GPUs) had to be harnessed. This was quickly realized and led to new record-breaking achievements in machine learning challenges. Thus the possibility of using vastly larger sets of training data and correspondingly much faster computation led to new horizons. Indeed, the use of very large data sets is instrumental in avoiding overfitting.

Table 2 Some usual activation functions in artificial neurons. The Heaviside activation function was used in the perceptron

Name                  Equation                              Derivative
Heaviside             f(x) = 0 if x < 0; 1 if x ≥ 0         f′(x) = 0 if x ≠ 0; undefined if x = 0
Logistic (sigmoid)    f(x) = 1 / (1 + e^{−x})               f′(x) = f(x)(1 − f(x))
ReLU                  f(x) = 0 if x < 0; x if x ≥ 0         f′(x) = 0 if x < 0; 1 if x ≥ 0

However, a third ingredient was also of considerable importance, and it was related to the analysis of the inefficiencies of the back-propagation algorithm when the number of hidden layers exceeded a small value, like 2 or 3. This analysis led to two major findings. First, that the non-linear activation function in neurons has a big impact on performance, and that the classical 'S'-shaped ones, like the sigmoid function, easily result in vanishing gradients and thus in very slow or inexistent learning. It was found that rectified linear units (ReLU), which implement simply the half-wave rectifier g(z) = max(z, 0), albeit not strictly differentiable, lead to far more efficient back-propagation of the error signal (see Table 2). The second finding was that to better channel the error signal, it was possible to randomly "paralyse" a proportion of neurons during each back-propagation learning step. This trick, called "dropout", not only improved the back-propagation of the error signal in the deep networks, but it also proved to be related to the bagging technique that enlists an ensemble of classifiers in order to reduce the variance of the learned models, and therefore leads to increased learning performance. A good book about deep learning is (Goodfellow et al. 2016).
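The vanishing-gradient argument can be illustrated numerically: even in the best case, each sigmoid layer multiplies the back-propagated signal by at most 1/4, whereas active ReLU units pass it unchanged (the depth chosen below is an illustrative value):

```python
# Why sigmoid derivatives make deep error signals vanish while ReLU
# ones do not. Depth and activation values are illustrative choices.
import math

def sigmoid_deriv(a):
    s = 1.0 / (1.0 + math.exp(-a))
    return s * (1 - s)          # at most 1/4, attained at a = 0

def relu_deriv(a):
    return 1.0 if a >= 0 else 0.0

depth = 10
# Best case for the sigmoid: every unit sits at a = 0, derivative exactly 1/4
sig_signal = sigmoid_deriv(0.0) ** depth   # (1/4)^10, below one millionth
relu_signal = relu_deriv(1.0) ** depth     # stays 1.0 through active units
print(sig_signal, relu_signal)
```

In a real network the weights also enter the product, but the factor-of-4 shrinkage per sigmoid layer is what the ReLU removes.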

6.3 Convolutional Neural Networks

While the early revival of research in neural networks, in 2006 and later, was largely spurred by the realization that the construction of deep neural networks was possible by using iterative unsupervised learning of each individual layer of neurons, another competing neural architecture soon regained attention. This architecture was the Convolutional Neural Network, on which Yann Le Cun had been working as early as 1990 (Le Cun et al. 1990, 1995).

Deep neural networks in general exploit the idea that many natural signals are hierarchical and compositional, in which higher-level features are composed of lower-level ones. For instance, in images, low-level detectors may attend to edges and local contrasts, those being combined into motifs in higher levels, motifs that are then combined into parts and parts finally into objects. Similar hierarchies exist in speech or in texts and documents. Whereas the deep architectures obtained from iteratively using unsupervised learning trust the successive layers to encode such hierarchical descriptors, the convolutional neural networks are designed to realize some invariance operations, through convolutions, that lead to such hierarchies. Four key ideas are drawn upon in ConvNets: local connections, shared weights, pooling and the use of many layers. The first few stages of a typical ConvNet are composed of two types of layers: convolutional layers and pooling layers (see Fig. 6). The input to a ConvNet must be thought of as arrays (one-dimensional or multi-dimensional). The units in convolutional layers see only a local patch in the input array, either the raw input in the first layer, or an array from the previous layer in later layers. Each unit is connected to its input patch through a set of weights called a filter bank. The result of this local weighted sum is then passed through the activation function, usually a ReLU.
Units in convolutional layers are organized in several feature maps, and all units in the same feature map share the same filter bank (aka weight sharing). This way, each feature map specializes in the detection of some local motif in the input, and these motifs can be detected anywhere in the input. Mathematically, this amounts to a discrete convolution of the input. Then, the role of the pooling layer is to merge

local motifs into one. One typical pooling unit may compute the maximum of the output of the local detectors realized by the convolutional units, thus claiming that some motif has been found anywhere in the array if this maximum is high enough. This encodes a detector invariant by translation. Pooling units may also be used to reduce the dimension of the input, or to create invariance to small shifts, rotations or distortions. Typically, several stages of convolutions, non-linearity and pooling are stacked, followed by more convolutional and fully-connected layers. The back-propagation algorithm, possibly using dropout and other optimization tricks, is used, allowing all the weights in all the filter banks to be trained. Recent convolutional neural networks have more than 10 layers of ReLU, and hundreds of millions, sometimes billions, of weights.

Fig. 6 A convolutional network taking as input an image of a dog, transformed into RGB input arrays. Each rectangular image is a feature map corresponding to the output for one of the learned features detected at each of the image positions. Information flows bottom up, with ever more combined features as the signal goes up toward the last layer, where a probability is assigned to each class of object. (image taken from Le Cun et al. 2015)

A ConvNet takes an input expressed as an array of numbers and returns the probability that the input, or some part of it, belongs to a particular class of objects or patterns. ConvNets were applied to speech recognition, with time-delay neural networks, and hand-written character recognition in the early 1990s. More recently, ConvNets have yielded impressive levels of performance, sometimes surpassing human performance, in the detection, segmentation and recognition of objects and regions in images, like traffic signal recognition, the segmentation of biological images, the detection of faces in photos, and so on.
These networks have also been instrumental in the success of the AlphaGo program, which now routinely beats all human champions at the game of Go (Silver et al. 2016, 2017). Other areas where ConvNets are gaining importance are speech recognition and natural language understanding.
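The convolution, non-linearity and pooling stages described in this section can be sketched on a tiny example (the 4×4 "image" and the 2×2 edge filter below are illustrative):

```python
# Sketch of one convolution + ReLU + pooling stage of a ConvNet,
# in pure Python. The "image" and the filter are illustrative.

image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
# A 2x2 vertical-edge filter: one filter bank shared across all positions
filt = [[-1, 1],
        [-1, 1]]

def convolve(img, f):
    n, k = len(img), len(f)
    out = []
    for r in range(n - k + 1):
        row = []
        for c in range(n - k + 1):
            s = sum(f[i][j] * img[r + i][c + j]
                    for i in range(k) for j in range(k))
            row.append(max(s, 0))      # ReLU non-linearity
        out.append(row)
    return out

def max_pool(fm):
    # Pooling: here, a global max claiming the motif occurs somewhere
    return max(v for row in fm for v in row)

fm = convolve(image, filt)
print(fm)            # -> [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
print(max_pool(fm))  # -> 2: the edge is detected, wherever it lies
```

Weight sharing is visible here: the same `filt` is applied at every position, and the pooled output is unchanged if the edge is shifted, which is the translation invariance discussed above.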

6.4 The Generative Adversarial Networks

The preceding sections have dealt mainly with supervised learning: learning to associate inputs to outputs from a training sample of pairs (input, output). In 2014, Ian Goodfellow and co-workers (Goodfellow et al. 2014) presented an intriguing and seductive idea by which deep neural networks could be trained to generate synthetic examples that would be indistinguishable from the training examples. For instance, given a set of images of interiors of apartments, a network can be trained to generate other interiors that are quite reasonable for apartments. Or a network could be trained to generate synthetic paintings in the style of Van Gogh.

The idea is the following. In GANs, two neural networks are trained in an adversarial way (see Fig. 7). One neural network, G, is the generator network. It produces synthetic examples using a combination of latent, or noise, variables as input, and its goal is to produce examples that are impossible to distinguish from the examples in the training set. For this, it learns how to combine the latent input variables through its layers of neurons. The other neural network, D, is the discriminator. Its task is to try to recognize when an input is coming from the training set or from the generator G. Each neural network evolves in turn: G learns to fool D, while D learns to discriminate the training inputs from the generated ones. Learning is successful when D is no longer able to do so.

Fig. 7 A typical GAN network

6.5 Deep Neural Networks and New Interrogations on Generalization

A recent paper (Zhang et al. 2016) has drawn attention to a glaring lack of understanding of the reasons for the success of the deep neural networks. Recall that supervised learning is typically done by searching for the hypothesis function that optimizes as well as possible the empirical risk, which is the cost of using one hypothesis on the training set. In order for this empirical risk minimization principle to be sound, one has to constrain the hypothesis space H considered by the learner. The tighter the constraint, the tighter the link between the empirical risk and the real risk, and the better the guarantee that the hypothesis optimizing the empirical risk is also a good hypothesis with regard to the real risk. If no such constraint on the hypothesis space exists, then there is no evidence whatsoever that a good performance on the training set entails a good performance in generalization. One way to measure the constraint, or capacity, of the hypothesis space is to measure to which extent one can find a hypothesis in H that is able to fit any training set. If, for any training set, with arbitrary inputs and arbitrary outputs, one can find a hypothesis with low, or null, empirical risk, then there is no guaranteed link between the measured empirical risk and the real one. But this is exactly what happens with deep neural networks. It has been shown in several studies that typical deep neural networks, used successfully for image recognition, can indeed be trained to have a quasi-null empirical risk on any training set. Specifically, these neural nets could learn perfectly images where the pixels and the outputs were randomly set. For all purposes, there was no limit to the way they could fit any training set. According to the common wisdom acquired from statistical learning theory, they should overfit severely any training set. Why, then, are they so successful on "natural" images?
A number of papers have been hastily published since this finding. They offer, at the time of this writing, only what seem to be partial explanations. One observation is that learning does not proceed in the same way when learning, by heart, a set of "random" data and when learning a set of "reasonable" data. Further studies are needed, and it is expected that they will bring new advances in the understanding of what makes successful induction. More information about deep neural networks can be found in Le Cun et al. (2015), Goodfellow et al. (2016).

7 Concept Learning: Structuring the Search Space

Concept learning is deeply rooted in Artificial Intelligence, from the time when knowledge appeared to be the key for solving many AI problems. The idea was then to learn concepts defined by their intent, and thus meaningful models easily understandable by experts of the domain. Logic (propositional or first-order logic) was therefore the most used representation formalism. Nevertheless, modeling the world requires dealing with uncertainty and thus studying formalisms beyond classical logic: fuzzy logic, probabilistic models, …Learning a representation of the world then becomes much more difficult, since two learning problems must be faced: learning the structure (a set of rules, dependencies between variables, …) and learning the parameters (probabilities, fuzzy functions, …). We focus in this section on learning the structure in classic logic, and probabilistic learning will be addressed in Sect. 8. The structure depends on the input data and on the learning task. Data can be given in a table, expressing the values taken by the objects on attributes or features, or it can be more complex, with a set of objects described by attributes and linked by relations, such as a relational database or a social network. Data can also be sequential or structured in a graph. Depending on the data types, the outputs vary from a set of patterns (conjunctions of properties, rules, …), often written in logics, to graphical models (automata, Bayesian networks, conditional random fields, …).

Before going further, let us specify some basic notions about the different representation languages used in this section. We focus mainly on rule learning. A rule is composed of two parts: the left-hand side of the rule expresses the conditions that must be satisfied for the rule to be applied, and the right-hand side, or conclusion, specifies what becomes true when the conditions are realized.
Several languages can be used to express rules: propositional logic, composed only of propositional symbols, as for example in the rule vertebrate ∧ tetrapod ∧ winged → bird, or the attribute-value representation, as for instance temperature ≥ 37.5 → fever. First-order logic allows one to express more complex relations between objects, such as father(X, Y) ∧ father(Y, Z) → grandFather(X, Z). Sometimes, when the concept to be learned is implicit (for example, learning the concept mammal), the conclusion of the rule is omitted and only the conjunction of properties defining the concept is given. More complex expressions can be written, like general clauses allowing one to specify alternatives in the conclusion or to introduce negation. Learning knowledge expressed in first-order logic is the field of study of Inductive Logic Programming (Raedt 2008; Dzeroski and Lavrac 2001). Another representation formalism is the family of graphical models (Koller and Friedman 2009): they are based on graphs representing dependencies between variables. We distinguish mainly the Bayesian networks, oriented graphs associating to each node the conditional probability of this node given its parents, and the Markov models, non-oriented graphs in which a variable is independent of the others given its neighbors. A research stream called Statistical Relational Learning (Raedt et al. 2008; Getoor and Taskar 2007) tends to couple Inductive Logic Programming with probabilistic approaches.
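The first-order rule above can be illustrated with a small sketch that applies it by joining a set of father facts on the shared variable Y (the names are of course illustrative):

```python
# Applying the rule father(X, Y) ∧ father(Y, Z) → grandFather(X, Z)
# to a toy set of ground facts. Names are illustrative.

father = {("john", "paul"), ("paul", "mary"), ("paul", "tom")}

def grandfathers(father_facts):
    # Join the two conditions of the rule on the shared variable Y:
    # every (X, Y) and (Y, Z) pair of facts yields a conclusion (X, Z)
    return {(x, z) for (x, y1) in father_facts
                   for (y2, z) in father_facts if y1 == y2}

print(sorted(grandfathers(father)))  # -> [('john', 'mary'), ('john', 'tom')]
```

This join over a shared variable is exactly what makes first-order rules more expressive than propositional ones, and also what makes their learning search space much larger.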

Finally, grammatical inference (de la Higuera 2010 or Miclet 1990) aims at learning grammars or languages from data: the family of models considered is often that of automata, possibly probabilistic ones.

Let us notice that often we do not look for a single hypothesis but for a set of hypotheses, for instance a disjunction of hypotheses. Consider learning a concept from positive and negative examples. It may be unrealistic to look for a single rule covering the positive examples and rejecting the negative ones, since this would assume that all positive examples follow the same pattern: we then look for a set of rules. However, extending the model to a set of hypotheses introduces again the necessary compromise between the inductive criterion and the complexity of the hypothesis space. Indeed, the disjunction of the positive examples, x1 ∨ ··· ∨ xm, represents a set of hypotheses that covers all positive examples and rejects all negatives, under the condition that the negative examples differ from the positive ones. We then have an arbitrarily complex hypothesis, varying according to the learning examples, and with a null empirical error. This is called learning by heart, involving no generalization. In order to solve this problem, constraints on the hypotheses must be introduced, that is, regularization must be performed, as mentioned in Sect. 3.3.

7.1 Hypothesis Space Structured by a Generality Relation

7.1.1 Generalization and Coverage

When the hypothesis space H is no longer parameterized, the question is how to perform an informed exploration of the hypothesis space. The notion of a hypothesis space structured by a generality relation was first introduced for concept learning, where the aim is to learn a definition of a concept in the presence of positive and negative examples of this concept. A hypothesis describes a part of the space X, and we search for a hypothesis that covers the positive examples and excludes the negative ones. Two hypotheses can then be compared according to the subspace of X they describe, or according to the observations they cover. We define the coverage of a hypothesis as the set of observations of X satisfied, or covered, by this hypothesis. A hypothesis h1 is more general than another one h2, denoted by h1 ≥ h2, if the coverage of the former contains the coverage of the latter (h1 ≥ h2 iff coverage(h2) ⊆ coverage(h1)).

The inclusion relation defined on X thus induces a generality relation on H, as illustrated in Fig. 8, which is a partial preorder. It is not antisymmetric, since two different hypotheses may cover the same subspace of X, but it can be transformed into an order relation by defining two hypotheses as equivalent when they cover the same subspace of X and by considering the quotient set of H with respect to this equivalence relation. Thus only one representative hypothesis among all the equivalent ones has to be considered.

The coverage relation is fundamental for induction, because it satisfies an important property: when a hypothesis is generalized (resp. specialized), its coverage becomes larger (resp. smaller). Indeed, when learning a concept, an inconsistent hypothesis covering negative examples must be specialized to reject these negative examples, whereas an incomplete hypothesis that does not cover all known positive examples has to be generalized in order to cover them. In other words, an inconsistent hypothesis when generalized remains inconsistent, whereas an incomplete hypothesis when specialized remains incomplete. The induction process is therefore guided by the notion of coverage, which allows one to define quantitative criteria, such as the number of covered positive examples and the number of covered negative examples, useful for guiding the search and for pruning the search space.

Fig. 8 The inclusion relation in X induces a generalization relation in H. It is a partial preorder: h1 and h2 are not comparable, but both are more specific than h3
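The coverage and generality notions above can be sketched in a few lines of code. The following is our own minimal illustration (not the authors' code), assuming the simplest setting where a hypothesis is the set of boolean properties it requires and an example is the set of properties it satisfies:

```python
# Minimal sketch of coverage and of the induced generality relation,
# for boolean-feature data: a hypothesis covers an example iff all the
# properties it requires hold for that example.

def coverage(hypothesis, examples):
    """Set of example names covered by a hypothesis."""
    return {name for name, props in examples.items() if hypothesis <= props}

def more_general(h1, h2, examples):
    """h1 >= h2 iff coverage(h2) is included in coverage(h1)."""
    return coverage(h2, examples) <= coverage(h1, examples)

# Toy zoo-like data (our own assumption, loosely inspired by the text).
examples = {
    "bear": {"hair", "milk", "four_legs"},
    "cat": {"hair", "milk", "four_legs", "domestic"},
    "dolphin": {"milk"},
}

h1 = {"milk"}                       # a general hypothesis
h2 = {"hair", "milk", "four_legs"}  # a more specific hypothesis

print(coverage(h1, examples))       # all three animals
print(coverage(h2, examples))       # bear and cat only
print(more_general(h1, h2, examples))  # True
```

Note that `more_general` is only a preorder: two syntactically different hypotheses may have identical coverages, as discussed above.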

7.1.2 Generalization in Logic

This notion of generalization is defined in terms of the input space X, used to describe the observations, while learning is performed by exploring the hypothesis space H. From the point of view of search, it is more efficient to define operators working directly in the set of hypotheses H, but respecting the generality relation defined on H in terms of the inclusion relation in X. Let us notice that in some cases the representation language of the examples is a subset of H, thus making the computation of the coverage easier.

To illustrate this point, let us consider a dataset describing animals, inspired by the zoo dataset of the UCI Machine Learning Repository.3 An observation is described by boolean properties (presence of feathers, for instance) that can be true or false. In propositional logic, it can be represented by a conjunction of literals, where a literal is either a property or the negation of a property. Given a set of animals, it may be interesting to find the properties they have in common. This set of properties can still be expressed as a conjunction of literals: X and H share the same representation space, a conjunction of literals or, equivalently, a set of literals. In this context, a hypothesis is more general than another when it contains fewer properties (h1 ≥ h2 if h1 ⊆ h2). It is easy to show that if h1 ⊆ h2, then coverage(h2) ⊆ coverage(h1). The first definition is more interesting than the latter, since comparisons are made in H and do not involve X.

3http://archive.ics.uci.edu/ml/datasets/zoo.

While defining a generality relation is quite easy in propositional logic, it becomes more problematic in predicate logic. Indeed, in first-order logic, a natural definition would be the logical implication between two formulas, but this problem is not decidable, and this is why Plotkin (1970) introduced the notion of θ-subsumption between clauses: given two clauses A and B, A is more general than B if there exists a substitution θ such that Aθ ⊆ B.4 Let us consider four people, John (jo), Peter (pe), Ann (an) and Mary (ma), and the two rules:

father(X, Y), father(Y, Z) → grandFather(X, Z)
father(jo, pe), father(pe, ma), mother(an, ma) → grandFather(jo, ma)

They can be transformed into clauses:

¬father(X, Y) ∨ ¬father(Y, Z) ∨ grandFather(X, Z)
¬father(jo, pe) ∨ ¬father(pe, ma) ∨ ¬mother(an, ma) ∨ grandFather(jo, ma)

Consider these clauses as sets of literals. If we instantiate X by jo, Y by pe and Z by ma, the first instantiated clause is included in the second; the first clause is therefore more general w.r.t. θ-subsumption. The θ-subsumption test involves potentially costly comparisons between hypotheses: computing the l.g.g. of two clauses under θ-subsumption is O(n^2), where n is the size of the clauses, and computing the l.g.g. of s clauses under θ-subsumption is O(n^s). In this example, we have made the assumption that the examples are described in the same representation space as the hypotheses: an example is a fully instantiated clause and a hypothesis is a clause. This is called the single representation trick, and it generally makes the coverage test simpler. We could also have given a set of atoms describing many people, including John, Peter, Mary and Ann, and the relations between them; the subsumption test would then have been even more costly, since many comparisons would have had to be performed with irrelevant information. The definition of θ-subsumption can be extended to take into account knowledge on the domain, for example that a father or a mother is a parent, but then reasoning mechanisms must be introduced at the price of complexity. It is important to realize that the complexity of the coverage test is often a determining factor for choosing the representation space X and the hypothesis space H.

The representation language chosen for the hypothesis space is thus essential for determining a generality relation allowing an efficient exploration of the hypothesis space. Among the possible order relations, a special interest has been devoted to the relations that form a lattice: in this case, given two hypotheses hi and hj, there exists a least upper bound and a greatest lower bound, called the least general generalization, lgg(hi, hj), and the least specific specialization, lss(hi, hj). θ-subsumption induces a lattice on the space of clauses, whereas this is not true for logical implication. These notions are easily extended to more than two hypotheses, leading to lgg(hi, hj, hk, ...) and lss(hi, hj, hk, ...). Finally, it is assumed that the lattice is bounded: there exists a hypothesis, denoted by ⊤, that is more general than all the others, and a hypothesis, denoted by ⊥, that is more specific than all the others.

4A clause is a disjunction of literals, assimilated in this definition to a set of literals.
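While the l.g.g. under θ-subsumption is costly, in the propositional, set-of-literals representation discussed earlier the lattice operations are trivial. The following sketch (our own illustration, with hypothetical zoo-like hypotheses) shows them, recalling that h1 ≥ h2 iff h1 ⊆ h2:

```python
# In the propositional case, a hypothesis is a set of literals and
# h1 >= h2 iff h1 is a subset of h2. The least general generalization
# (lgg) is then the set intersection, and the least specific
# specialization (lss) the set union; the lattice is bounded by
# top (the empty conjunction) and bottom (the full set of literals).

def lgg(*hypotheses):
    """Least general generalization: properties shared by all hypotheses."""
    return set.intersection(*hypotheses)

def lss(*hypotheses):
    """Least specific specialization: union of all required properties."""
    return set.union(*hypotheses)

h_bear = {"hair", "milk", "four_legs"}
h_cat = {"hair", "milk", "four_legs", "domestic"}
h_dolphin = {"milk"}

print(lgg(h_bear, h_cat))             # shared properties of bear and cat
print(lgg(h_bear, h_cat, h_dolphin))  # only 'milk' is shared by all three
print(lss(h_bear, h_dolphin))
```

Both operations extend to any number of hypotheses, as the text notes for lgg(hi, hj, hk, ...).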

7.1.3 Exploration of the Search Space

The generality relation allows one to structure the hypothesis space, and generalization/specialization operators can be defined to explore the search space. Let us consider a quasi-ordering5 ≥ on H (h2 ≥ h1 if h2 is more general than h1). A downward refinement operator, or specialization operator, is a function ρ from H to 2^H such that for all h in H, ρ(h) ⊆ {h′ ∈ H | h ≥ h′}, whereas an upward refinement operator, or generalization operator, is a function ρ such that for all h in H, ρ(h) ⊆ {h′ ∈ H | h′ ≥ h}.

Let us consider the zoo example and let us assume that the hypothesis space is composed of all the conjunctions of literals in propositional logic. A hypothesis could be: hair ∧ milk ∧ four_legs. It covers all the animals satisfying these properties, for instance a bear or a cat. Let us consider now two simple operators that consist in adding/removing a literal to/from a hypothesis. Removing a literal, for instance, allows producing the hypothesis hair ∧ milk, which covers more instances and is therefore more general. Removing a literal (resp. adding a literal) produces a more general (resp. a more specific) hypothesis.

Refinement operators coupled with a coverage measure can be used to guide efficiently the exploration of the hypothesis space. Indeed, if a hypothesis does not cover all the positive examples, a generalization operator must be used to produce more general hypotheses covering the missed positive examples, whereas, conversely, if negative examples are covered, specialization operators must be considered. This leads to different strategies for exploring the hypothesis space.

The generalization/specialization operators depend on the description language of the hypotheses. While they are simple enough in propositional logic, they become more complicated in the presence of domain knowledge or in more complex languages such as first-order logic.
In this case, different generalization operators can be defined, such as deleting a literal, transforming a constant into a variable, or transforming two occurrences of the same variable into different variables.

To be interesting, a refinement operator must satisfy some properties: it must be locally finite (it generates a finite and computable number of refinements), proper (each element in ρ(h) is different from h) and complete (for a generalization operator, this means that for all h, if h′ > h, then h′ or a hypothesis equivalent to h′ can be reached by applying ρ a finite number of times to h; the definition is similar for a specialization operator). Such an operator is called ideal. See van der Laag and Nienhuys-Cheng (1998) for a discussion on the existence of ideal refinement operators.

5A reflexive and transitive relation.
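For the propositional zoo example, the add-a-literal and remove-a-literal operators can be sketched as below. This is our own toy illustration (the vocabulary is an assumption, not the full zoo attribute set):

```python
# Sketch of the two refinement operators of Sect. 7.1.3 for conjunctions
# of positive propositional literals, each hypothesis being the set of
# literals it conjoins. VOCABULARY is a hypothetical, reduced attribute set.

VOCABULARY = {"hair", "milk", "four_legs", "domestic"}

def specializations(h):
    """Downward refinement: add one literal, yielding more specific hypotheses."""
    return [h | {lit} for lit in VOCABULARY - h]

def generalizations(h):
    """Upward refinement: remove one literal, yielding more general hypotheses."""
    return [h - {lit} for lit in h]

h = {"hair", "milk"}
print(specializations(h))  # adds four_legs or domestic
print(generalizations(h))  # drops hair or milk
```

These operators are locally finite and proper; they are also complete on this space, since any subset (resp. superset) of h can be reached by removing (resp. adding) literals one at a time.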

7.1.4 Formal Concept Analysis

As we have seen, there is a duality between the observation space X and the hypothesis space H. This duality is formalized by the notion of Galois connection, which forms the basis of Formal Concept Analysis (Ganter et al. 1998, 2005). Let us consider a set O of objects (animals in the zoo dataset), a set A of propositional descriptors and a binary relation r on O × A such that r(o, p) is true when object o satisfies property p. The connection between the objects and the properties is expressed by two operators, denoted here by f and g. The first one corresponds to the notion of coverage: given a subset A of descriptors, f(A) returns the extent of A, i.e., the set of objects satisfying A. The second one corresponds to the notion of generalization: given a subset O of objects, g(O) returns the intent of O, i.e., the set of descriptors that are true for all objects of O.

Let us consider again the concept of animals, described by only 4 properties (hair, milk, four_legs, domestic), and consider only 3 objects (a bear, a cat and a dolphin):

name      hair  milk  four_legs  domestic
bear      T     T     T          F
cat       T     T     T          T
dolphin   F     T     F          F

The extent of the descriptor milk contains the three animals, whereas the extent of hair and milk contains two animals, namely the bear and the cat. The intent of the first two animals, i.e., the set of descriptors they have in common, is composed of hair, milk and four_legs.

These two operators form a Galois mapping satisfying the following properties: (i) f and g are anti-monotonous (if A1 ⊆ A2 then f(A2) ⊆ f(A1), and if O1 ⊆ O2 then g(O2) ⊆ g(O1)); (ii) any set of descriptors is included in the intent of its extent (for any set of descriptors A, A ⊆ g(f(A))) and every set of objects is included in the extent of its intent (for any set of objects O, O ⊆ f(g(O))); and (iii) f(A) = f(g(f(A))) and g(O) = g(f(g(O))). The operator g ∘ f has received particular attention in the field of frequent pattern mining and is called the closure of a set of descriptors.

A concept is then defined as a pair (O, A) of objects and descriptors such that O is the extent of A and A is the intent of O. In the previous example, the pair ({bear, cat}, {hair, milk, four_legs}) is a concept. Let us notice that if (O, A) is a concept, then A is a closed set (A = g(f(A))), and the set of closed sets of descriptors ordered by ⊆ is a lattice. This notion is particularly useful in frequent itemset mining, since the set of frequent closed itemsets is a condensed representation of the set of frequent itemsets.
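The f (extent), g (intent) and g ∘ f (closure) operators can be sketched directly on the bear/cat/dolphin context of the table. This is a minimal illustration of ours, not library FCA code:

```python
# Sketch of the Galois connection operators on the toy formal context
# of the text: objects mapped to the set of descriptors they satisfy.

CONTEXT = {
    "bear": {"hair", "milk", "four_legs"},
    "cat": {"hair", "milk", "four_legs", "domestic"},
    "dolphin": {"milk"},
}

def extent(descriptors):
    """f: objects satisfying all the given descriptors."""
    return {o for o, props in CONTEXT.items() if descriptors <= props}

def intent(objects):
    """g: descriptors shared by all the given (non-empty set of) objects."""
    return set.intersection(*(CONTEXT[o] for o in objects))

def closure(descriptors):
    """g o f: the closure of a set of descriptors."""
    return intent(extent(descriptors))

print(extent({"milk"}))           # all three animals
print(extent({"hair", "milk"}))   # bear and cat
print(intent({"bear", "cat"}))    # hair, milk, four_legs
print(closure({"hair"}))          # closing {hair} yields {hair, milk, four_legs}
```

The last line recovers the concept ({bear, cat}, {hair, milk, four_legs}) of the text: {hair} and {hair, milk, four_legs} have the same extent, hence the same closure.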

For more details on Formal Concept Analysis, see chapter "Formal Concept Analysis: From Knowledge Discovery to Knowledge Processing" of this volume.

7.1.5 Learning the Definition of a Concept

name      hair  four_legs  domestic  class
bear      T     T          F         mammal
cat       T     T          T         mammal
dolphin   F     F          F         mammal
honeybee  T     F          F         insect
moth      T     F          F         insect

Let us suppose now that we want to learn a definition of the concept mammal, given the three positive examples (bear, cat, dolphin) and the negative examples (honeybee, moth), with only three attributes, namely hair, four_legs and domestic. The generalization of the three positive examples is empty: they share no common properties. In fact these three examples do not follow a single pattern and must be split into two groups for common properties to emerge. To learn a concept from positive and negative examples, the most common method is to learn a hypothesis covering some positive examples and then to iterate the process on the remaining examples: this is a divide and conquer strategy. Alternative methods exist, such as first separating the positive examples into classes (unsupervised classification) and then learning a rule for each class.

This leads to the problem of constructing a rule covering positive examples. Several strategies can be applied: a greedy approach, which builds the rule iteratively by adding conditions, as in the Foil system (Quinlan 1996), or a strategy driven by examples, as for instance in the Progol system (Muggleton 1995). Progol starts from a positive example, constructs a rule covering this example and as many other positive examples as possible, while remaining consistent, and iterates with the positive examples not yet covered by a rule. Other approaches propose to search exhaustively for all the rules with sufficient support and confidence (one finds again the problem of association rule mining) and to construct a classifier from these rules (see for instance Liu et al. 1998; Li et al. 2001). For example, to classify a new example, each rule applicable to this example votes for its class with a weight depending on its confidence.
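The divide and conquer (sequential covering) strategy can be sketched as follows. This is our own simplified illustration on the mammal-vs-insect toy data, not the Foil or Progol algorithms themselves (their heuristics and representations are richer):

```python
# Sequential covering sketch: greedily grow one conjunctive rule until it
# rejects all negatives, remove the positives it covers, and iterate.
# Examples are sets of the boolean properties they satisfy.

def learn_rules(positives, negatives, vocabulary):
    rules = []
    remaining = dict(positives)
    while remaining:
        rule = set()
        uncovered_neg = dict(negatives)
        # Specialize the rule until no negative example is covered.
        while uncovered_neg and vocabulary - rule:
            best = max(
                vocabulary - rule,
                key=lambda lit: sum(lit in p for p in remaining.values())
                              - sum(lit in n for n in uncovered_neg.values()),
            )
            rule.add(best)
            uncovered_neg = {n: pr for n, pr in uncovered_neg.items() if rule <= pr}
        covered = {p for p, pr in remaining.items() if rule <= pr}
        if uncovered_neg or not covered:
            break  # no consistent rule found for the remaining positives
        rules.append(rule)
        remaining = {p: pr for p, pr in remaining.items() if p not in covered}
    return rules

positives = {"bear": {"hair", "four_legs"},
             "cat": {"hair", "four_legs", "domestic"},
             "dolphin": set()}
negatives = {"honeybee": {"hair"}, "moth": {"hair"}}
vocab = {"hair", "four_legs", "domestic"}
print(learn_rules(positives, negatives, vocab))  # [{'four_legs'}]
```

On this toy data a single rule, four_legs → mammal, is found for bear and cat, while dolphin remains uncovered: with these three attributes no consistent rule separates it from the insects, echoing the observation above that the three positives share no common properties.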

7.1.6 Extensions

An area in which learning is based on a generality relation in the hypothesis space is the induction of languages or grammars from examples of sequences. In grammatical inference, the description of hypotheses generally takes the form of automata. Most of the work has focused on the induction of regular languages, which correspond to finite state automata. It turns out that the use of this representation naturally leads to an operator associated with a generality relation between automata. Without entering into formal details, considering a finite state automaton, if we merge in a correct way two states of this automaton, we obtain a new automaton that accepts at least all the sequences accepted by the first one; it is therefore more general. Using this generalization operator, most methods of grammatical inference thus start from a particular automaton, accepting exactly the positive sequences, and generalize it by a succession of merging operations, stopping either when a negative sequence is covered or when a stop criterion on the current automaton is verified. de la Higuera (2010) describes thoroughly all the techniques of grammatical inference.

Sometimes data result from inaccurate or approximate measurements, and it is possible that the precision on the values of the attributes is unrealistic. In this case, it may be advantageous to change the representation to take into account a lower accuracy, and possibly to obtain more comprehensible concept descriptions. The rough sets formalism (Suraj 2004) provides a tool for approximate reasoning, applicable in particular to the selection of informative attributes and to the search for classification or decision rules. Without going into details, the rough sets formalism makes it possible to seek a redescription of the example space taking into account equivalence relations induced by the descriptors on the available examples.
Then, given this new granularity of description, the concepts can be described in terms of lower and upper approximations. This leads to new definitions of the coverage notion and of the generality relation between concepts.

7.2 Four Illustrations

7.2.1 Version Space and the Candidate Elimination Algorithm

In the late 1970s, Tom Mitchell showed the interest of generalization for concept learning. Given a set of positive and negative examples, the version space (Mitchell 1982, 1997) is defined as the set of all the hypotheses that are consistent with the known data, that is to say, that cover all the positive examples (completeness) and cover no negative examples (consistency). The empirical risk is therefore null. Under certain conditions, the set of hypotheses consistent with the learning data is bounded by two sets in the generalization lattice defined on H: one, called the S-set, is the set of the most specific hypotheses covering the positive examples and rejecting the negative ones, whereas the G-set is the set of maximally general hypotheses consistent with the learning data.

Tom Mitchell then proposed an incremental algorithm, called the candidate elimination algorithm: it considers the learning examples sequentially and, at each presentation of a new example, updates the two frontiers accordingly. It can be seen as a bidirectional breadth-first search, updating incrementally, after each new example, the S-set and the G-set: an element of the S-set that does not cover a new positive example is generalized if possible (i.e., without covering a negative example), whereas an element of the G-set that covers a new negative example is specialized if possible.

Inconsistent hypotheses are removed. Assuming that the language of hypotheses is perfectly chosen, and that the data are sufficient and not noisy, the algorithm can in principle converge towards a unique hypothesis, the target concept. An excellent description of this algorithm can be found in Chap. 2 of Mitchell's book (Mitchell 1997). While this technique, at the basis of many algorithms for learning logical expressions as well as finite state automata, experienced a decline in interest with the advent of more numerical methods in Machine Learning, it is enjoying a renewal of interest in the domain of Data Mining, which tends to explore more discrete structures.
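A much simplified candidate elimination sketch follows, for the restricted case of conjunctions of positive boolean literals. It is our own illustration (the S-set is then a single hypothesis, the lgg of the positives, and we omit the pruning of non-maximal or duplicate elements of G that a full implementation performs):

```python
# Simplified candidate elimination: a hypothesis h covers an example iff
# h is a subset of the example's properties. S is the single most specific
# consistent hypothesis; G holds (possibly redundant) maximally general ones.

def covers(h, props):
    return h <= props

def candidate_elimination(examples, vocabulary):
    S = None      # most specific boundary (None until the first positive)
    G = [set()]   # the empty conjunction: the most general hypothesis (top)
    for props, positive in examples:
        if positive:
            S = set(props) if S is None else S & props   # generalize S (lgg)
            G = [g for g in G if covers(g, props)]       # drop g rejecting a positive
        else:
            new_G = []
            for g in G:
                if not covers(g, props):
                    new_G.append(g)
                    continue
                # Minimal specializations: add one literal the negative lacks,
                # while staying more general than S.
                for lit in vocabulary - props:
                    if S is None or lit in S:
                        new_G.append(g | {lit})
            G = new_G
    return S, G

examples = [
    ({"hair", "milk", "four_legs"}, True),              # bear, positive
    ({"hair"}, False),                                  # honeybee, negative
    ({"hair", "milk", "four_legs", "domestic"}, True),  # cat, positive
]
vocab = {"hair", "milk", "four_legs", "domestic"}
S, G = candidate_elimination(examples, vocab)
print(S)  # lgg of bear and cat
print(G)  # maximally general hypotheses rejecting the negative
```

Here S converges to {hair, milk, four_legs}, while G contains {milk} and {four_legs}: any consistent conjunction lies between these two frontiers.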

7.2.2 Induction of Decision Trees

The induction of decision trees is certainly the best known case of learning models with variable structure. However, contrary to the approaches described above, the learning algorithm does not rely on the exploitation of a generality relation between models; it uses a greedy strategy constructing an increasingly complex tree, corresponding to an iterative division of the space X of the observations.

A decision tree is composed of internal nodes and leaves: a test on a descriptor is associated to each internal node, as for instance size > 1.70 m, while a class label is associated to each leaf. The number of edges starting from an internal node is the number of possible responses to the test (yes/no, for instance). The test differs according to the type of attribute. For a binary attribute, it has two values; for a qualitative attribute, there can be as many branches as there are values in the domain of the attribute, or the test can be transformed into a binary one by splitting the domain into two subsets. For a quantitative attribute, it takes the form A < θ, and the difficulty is to find the best threshold θ. Given a new observation to classify, the test at the root of the tree is applied to that observation and, according to the response, the corresponding branch towards one of the subtrees is followed, until arriving at a leaf, which then gives the predicted label. It is noteworthy that a decision tree can be expressed as a set of classification rules: each path from the root to a leaf expresses a set of conditions, and the conclusion of the rule is given by the label of the leaf. Figure 9 gives the top of a decision tree built on the zoo dataset.

A decision tree is thus the symbolic expression of a partition of the input space. We can only obtain partitions parallel to the axes, insofar as the tests on numerical variables are generally of the form X ≥ θ. Each subspace in the partition is then labelled, usually by the majority class of the observations in this subspace.
Fig. 9 The top of a decision tree on the zoo dataset: the root tests Milk (leading to a Mammal leaf), then Feathers (leading to a Bird leaf), then Backbone

When building a decision tree, each new test performed on a node refines the partition by splitting the corresponding subspace. Learning consists in finding such a partition. The global inductive criterion is replaced by a local criterion, which optimizes the homogeneity of the nodes of the partition, where homogeneity is measured in terms of the proportion of observations of each class. The most used criteria are certainly the information gain, based on the notion of entropy and used in C5.0 (Quinlan 1993; Kotsiantis 2007), and the Gini index, used in Cart (Breiman et al. 1984). Techniques for pruning the built tree are then applied in order to avoid overfitting. This algorithm has a reduced complexity: O(m · d · log(m)), where m is the size of the learning sample and d the number of descriptive attributes. Moreover, a decision tree is generally easy to interpret (although this depends on the size of the tree). This is a typical example of a divide and conquer algorithm with a greedy exploration.
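The entropy-based information gain used to select a split can be sketched as follows. The tiny zoo-like dataset and the code are our own toy illustration, not C5.0's actual implementation:

```python
# Information gain for a boolean split attribute: the reduction of the
# class entropy obtained by partitioning the sample on that attribute.
from collections import Counter
from math import log2

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(rows, attr, target="class"):
    """Entropy reduction obtained by splitting `rows` on boolean `attr`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in (True, False):
        subset = [r[target] for r in rows if r[attr] == value]
        if subset:
            remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

rows = [
    {"milk": True, "feathers": False, "class": "mammal"},
    {"milk": True, "feathers": False, "class": "mammal"},
    {"milk": False, "feathers": True, "class": "bird"},
    {"milk": False, "feathers": False, "class": "insect"},
]
print(information_gain(rows, "milk"))      # 1.0: isolates the mammals
print(information_gain(rows, "feathers"))  # lower gain
```

The greedy tree builder evaluates this gain for every attribute at a node and splits on the best one; here milk wins, matching the root of Fig. 9.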

7.2.3 Inductive Logic Programming

Inductive Logic Programming (ILP) has been the subject of particular attention since the 1980s. Initially studied for the synthesis of logic programs from examples, it has gradually been oriented towards learning knowledge from relational data, thus extending the classical framework of data described in vector spaces and making it possible to take relations in the data into account. When ILP was developed for the synthesis of logic programs, a key problem was learning recursive or mutually recursive concepts, whereas learning knowledge requires taking into account quantitative and uncertain information. One interest of ILP is the possibility of integrating domain knowledge, allowing more interesting concept definitions to be obtained. For example, if one wishes to learn the concept of grandfather from examples of persons linked by father and mother relationships, introducing the concept of parent will allow a more concise definition of the concept. The definition of θ-subsumption given in Sect. 7.1.2 must then be modified accordingly. One of the major challenges ILP has to face is the complexity due to the size of the search space and to the coverage test. To overcome this problem, syntactic or semantic biases have been introduced, thus allowing the search space to be reduced.

Another idea is to take advantage of the work carried out in propositional learning. In this context, a rather commonly used technique, called propositionalization, consists in transforming the first-order learning problem into a propositional, or attribute-value, problem (see an example in Zelezný and Lavrac 2006). The difficulty then lies in the construction of relevant characteristics reflecting the relational character of the data and minimizing the loss of information.

The best known ILP systems are Foil (Quinlan 1996) and Progol (Muggleton 1995). As already mentioned, Foil iteratively builds rules covering positive examples and rejecting the negative ones. For building a rule, it adopts a top-down strategy: the most general clause (a rule without conditions) is successively refined so as to reject all negative examples. It is based on a greedy strategy relying on a heuristic close to the information gain: it takes into account the number of instantiations that cover positive, respectively negative, examples. The quality of each refinement is measured and the best one is chosen, without backtracking. This strategy suffers from a well known problem: it may be necessary to add refinements corresponding to functional dependencies (for example, introducing a new object in relation with the target object), which are necessary to construct a consistent clause but have a null information gain (they are true for all positive and all negative examples). Progol differs in the way clauses are built: the construction of a clause is driven by a positive example. More precisely, Progol chooses a positive example and constructs the most specific clause covering this example, called the saturation of the example; search is then performed in the generalization space of this saturated clause. The results produced by Progol depend on the order in which positive examples are processed. This approach was implemented in the system Aleph (see http://www.cs.ox.ac.uk/activities/machinelearning/Aleph/aleph).
During the 1990s, it was shown that a large class of constraint satisfaction problems present a phase transition phenomenon, namely a sudden variation in the probability of finding a solution when the parameters of the problem vary. It was also noted that finding a hypothesis covering (in terms of θ-subsumption) the positive examples but no negative ones can be reduced to a constraint satisfaction problem. It was then shown empirically that a phase transition phenomenon may actually appear in ILP. This discovery has been extended to certain types of problems in grammatical inference. (Saitta et al. 2011) is devoted to the study of phase transitions in Machine Learning.

Finally, it should be noted that while supervised classification has long been the main task studied in ILP, there is also work on searching for frequent patterns in relational databases, or on subgroup discovery. Important references include (Lavrac and Dzeroski 1994; Dzeroski and Lavrac 2001; Raedt 2008; Fürnkranz et al. 2012). Inductive Logic Programming is still an active research field, mainly in recent years in Statistical Relational Learning (Koller and Friedman 2009; Raedt et al. 2008). The advent of deep learning has launched a new research stream aiming at encoding relational features into neural networks, as shown in the last International Conferences on Inductive Logic Programming (Lachiche and Vrain 2018).

7.2.4 Mining Frequent Patterns and Association Rules

Searching for frequent patterns is a very important Data Mining problem, which has been receiving a lot of attention for twenty years. One of its applications is the discovery of interesting association rules, where an association rule is a rule I → J, with I and J two disjoint patterns. The notion of pattern is fundamental: it depends on the input space X, and more precisely on the representation language of X. A pattern may be a set of items, called an itemset, a list of items when considering sequential data, or a more complex structure, such as a graph. A pattern is said to be frequent if it occurs in the database a number of times greater than a given threshold (called the minimal support threshold).

As in the version space, it is quite natural to rely on a generality relation between patterns. A pattern is more general than another when it occurs in more examples in the database. This is again the notion of coverage defined in Sect. 7.1. When a pattern is a conjunction (or a set) of elementary expressions taken in a fixed vocabulary set V (V can be a set of boolean features, a set of pairs (attribute, value), …), the generalization (resp. specialization) operator is the removal (resp. addition) of a term in the pattern. The pattern space is then ordered by the inclusion relation, modeling the generality relation (h1 ≥ h2 if h1 ⊆ h2), and it has a lattice structure. This ordering induces an anti-monotonicity property, which is fundamental for pruning the search space: a specialization of a non-frequent pattern cannot be frequent; in other terms, when a pattern is not frequent, it is useless to explore its specializations, since they will be non-frequent too.

This observation is the basis of the Apriori algorithm (Agrawal and Srikant 1994), the first and certainly the most well-known algorithm for mining frequent itemsets and solid association rules. It is decomposed into two steps: mining frequent itemsets and then building from these itemsets association rules, which are frequent by construction.
The confidence of the association rules can then be computed from the frequency of the itemsets. The first step is the most difficult one, since the search space has size 2^d (with d = |V|) and computing the frequency of itemsets requires going through the entire database. Apriori performs a breadth-first search in the lattice, first considering 1-itemsets, then 2-itemsets and so on, where an l-itemset is an itemset of size l. At each level, the support of all l-itemsets is computed through a single pass over the database. To prune the search space, the anti-monotonicity property is used: when an itemset is not frequent, its successors can be discarded. The complexity depends on the size of the database and on the number of times it is necessary to go through the database. It depends also on the threshold: the lower the minimum support threshold, the less pruning. This complexity is too high to be applicable to large data with a low minimum support threshold, and therefore new algorithms have been developed, based either on new search strategies (for instance, partitioning the database, or sampling), or on new representations of the database, as for instance in FP-Growth (Han et al. 2004) or in LCM (Uno et al. 2004).

Another source of complexity is the number of frequent itemsets that are generated. Condensed representations have been studied; they are usually based on closed patterns, where a pattern is closed if the observations that satisfy this pattern share only the elements of this pattern.
This corresponds to the definition of concepts underlying Formal Concept Analysis, described in Sect. 7.1.4. We can notice that the support of an itemset can be defined as the cardinality of its extent (the number of elements in the database that satisfy it), and we have interesting properties stating, for instance, that two patterns that have the same closure have the same support, or that if a pattern is included in another pattern and has the same support, then these two patterns have the same closure. These properties allow one to show that the set of closed patterns with their supports is a condensed representation of the set of all itemsets with their supports, and therefore only closed patterns have to be stored in memory (see for instance Zaki 2000; Bastide et al. 2000).

Itemset mining has been extended to deal with structured data, such as relational data, sequential data or graphs. All these structures are discrete. A remaining difficult problem is pattern mining in the context of numeric data, since learning expressions such as (age > 60) ∧ (HDL-cholesterol > 1.65 mmol/L) requires learning the thresholds (60 and 1.65, for instance) in a context where the structure is not fixed and has to be learned simultaneously.
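The level-wise search of Apriori, with its anti-monotonicity pruning, can be sketched compactly. This is our own simplified illustration of the idea of Agrawal and Srikant, not their implementation (in particular, supports are recomputed naively rather than counted in one optimized pass per level):

```python
# Apriori sketch: breadth-first search over the itemset lattice, pruning
# with anti-monotonicity (a superset of a non-frequent itemset is discarded).
from itertools import combinations

def apriori(transactions, min_support):
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(itemset <= t for t in transactions)

    items = sorted({i for t in transactions for i in t})
    frequent = {}                               # itemset -> support
    level = [frozenset([i]) for i in items]     # candidate 1-itemsets
    k = 1
    while level:
        level = [c for c in level if support(c) >= min_support]
        frequent.update({c: support(c) for c in level})
        # Generate (k+1)-candidates all of whose k-subsets are frequent.
        candidates = set()
        for a in level:
            for b in level:
                u = a | b
                if len(u) == k + 1 and all(
                    frozenset(s) in frequent for s in combinations(u, k)
                ):
                    candidates.add(u)
        level = list(candidates)
        k += 1
    return frequent

transactions = [{"hair", "milk", "four_legs"},
                {"hair", "milk", "four_legs", "domestic"},
                {"milk"}]
freq = apriori(transactions, min_support=2)
print(freq)  # 7 frequent itemsets; 'domestic' is pruned at level 1
```

On this toy database, {domestic} has support 1 and is discarded at the first level, so none of its supersets is ever generated, which is exactly the pruning described above.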

8 Probabilistic Models

So far we have mainly addressed the geometric and symbolic views of Machine Learning. There is another important paradigm in Machine Learning, based on a probabilistic modeling of data. It is called generative in the sense that it aims at inducing the underlying distribution of the data; given that distribution, it is then possible to generate new samples following the same law. We have addressed it in the context of clustering in Sect. 4.3, but it is also widely used in supervised learning with the naive Bayesian classifier. Generative and discriminative approaches differ since, as stated in (Sutton and McCallum 2012), a discriminative model is given by the conditional distribution p(y|x), whereas a generative model is given by p(x, y) = p(y|x) × p(x).

Let us consider a sample S = {(x1, y1), ..., (xm, ym)}, where xi is described by d features Xl, l = 1...d, with domains Dl, and yi belongs to a discrete set Y (the set of labels or classes). The features Xl, l = 1...d, and the feature Y corresponding to the class of the observations can be seen as random variables. Let X denote the set of features Xl. Given a new observation x = (x1, ..., xd), the Bayesian classifier assigns to x the class y in Y that maximizes p(Y = y | X = x). Thus we have:

h : x ∈ X ↦ argmax_{y ∈ Y} p(Y = y | X = x)     (8)

Using Bayes' theorem, p(Y = y | X = x) can be written p(X = x | Y = y) p(Y = y) / p(X = x). Since the denominator is constant given x, it can be dropped from the argument of the argmax, thus leading to the equivalent formulation

h : x ∈ X ↦ argmax_{y ∈ Y} p(X = x | Y = y) p(Y = y)     (9)

Learning consists in inferring an estimation of these probabilities from the data. When the input space X is described by d discrete features Xl, l = 1...d, with domains Dl, this means learning p(X1 = x1, ..., Xd = xd | Y = y) for all possible tuples and p(Y = y) for all y, leading to (|D1| × ··· × |Dd| + 1) × |Y| probabilities to estimate. This would require many observations to obtain reliable estimates. An assumption is needed to make the problem feasible: the features are assumed to be independent conditionally on the class, which means:

p(X_1 = x_1, . . . , X_d = x_d | Y = y) = p(X_1 = x_1 | Y = y) × · · · × p(X_d = x_d | Y = y)

This leads to the definition of the naive Bayesian classifier:

h : x = (x_1, . . . , x_d) ↦ argmax_{y ∈ Y} Π_{l=1}^{d} p(X_l = x_l | Y = y) × p(Y = y)    (10)

Nevertheless this assumption is often too simple to accurately model the data. More complex models, grouped under the term graphical models (Koller and Friedman 2009), have been introduced to take into account more complex dependencies between variables. Dependencies between variables are represented by a graph whose nodes are the variables and whose edges model the dependencies. There exist two main families of models: Bayesian networks, which are acyclic oriented graphs associating to each node the conditional probability of this node given its parents, and Conditional Random Fields (CRF), which are non-oriented graphs in which a variable is independent from the other ones, given its neighbors. The naive Bayesian classifier is a special case of a Bayesian network, as illustrated by Fig. 10. A third model, the Hidden Markov Model (HMM), is also frequently used for sequential data. A HMM is defined by a directed graph, each node of which represents a state and can emit a symbol (taken from a predefined set of observable symbols); two families of probabilities are associated to each state: the probability of emitting a symbol s given this state (P(s|n)) and the probability of moving to state n′ given state n (P(n′|n)). A HMM models a Markovian process: the state in which the process is at time t depends only on the state reached at time t − 1 (P(q_t = n′ | q_{t−1} = n)); it also assumes that the symbol emitted at time t depends only on the state the automaton

Fig. 10 Dependency between variables for the naive Bayesian classifier

has reached at time t. Let us notice that a HMM can be modeled by a CRF: indeed, the probability distribution in a CRF is often represented by a product of factors defined on subsets of variables, and the probabilities P(n′|n) and P(s|n) are easily converted into factors. CRFs, by means of factors, allow the definition of more complex dependencies between variables, and this explains why they are now preferred to HMMs in natural language processing. An introduction to CRFs and a comparison with HMMs can be found in (Sutton and McCallum 2012).

Learning graphical models can be decomposed into two subproblems: either the structure is already known and the problem is then to learn the parameters, that is, the probability distribution, or the structure is not known and the problem is to learn both the structure and the parameters (see Koller and Friedman 2009, and Pearl 1988). The first problem is usually addressed by methods such as likelihood maximization or Bayesian maximum a posteriori (MAP) estimation. Different methods have been developed for learning the structure. For more information, the reader should refer to chapter “Belief Graphical Models for Uncertainty Representation and Reasoning” of this volume, devoted to graphical models.

Statistical Relational Learning (Raedt et al. 2008; Getoor and Taskar 2007) is an active research stream that aims at linking Inductive Logic Programming and probabilistic models.
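As an illustration, the counting-based estimation of the probabilities and the decision rule of Eq. (10) can be sketched in a few lines of Python. The weather-style toy data below is invented, and add-one (Laplace) smoothing is an extra assumption used here to avoid zero probabilities for unseen feature values.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(samples, labels):
    """Estimate p(Y = y) and p(X_l = x_l | Y = y) by frequency counts,
    with add-one smoothing, and return the classifier of Eq. (10)."""
    class_counts = Counter(labels)
    feat_counts = Counter()        # (feature index, value, class) -> count
    domains = defaultdict(set)     # observed domain D_l of each feature
    for x, y in zip(samples, labels):
        for l, v in enumerate(x):
            feat_counts[(l, v, y)] += 1
            domains[l].add(v)
    m = len(labels)

    def classify(x):
        best, best_score = None, float("-inf")
        for y, c_y in class_counts.items():
            # log p(Y = y) + sum_l log p(X_l = x_l | Y = y)
            score = math.log(c_y / m)
            for l, v in enumerate(x):
                score += math.log((feat_counts[(l, v, y)] + 1)
                                  / (c_y + len(domains[l])))
            if score > best_score:
                best, best_score = y, score
        return best

    return classify

# Invented toy data: two discrete features, two classes
samples = [("sunny", "hot"), ("sunny", "mild"),
           ("rainy", "mild"), ("rainy", "cold")]
labels = ["out", "out", "in", "in"]
classify = train_naive_bayes(samples, labels)
```

Working in log space is the usual way to avoid numerical underflow when the product of Eq. (10) involves many features.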

9 Learning and Change of Representation

The motivation for changing the input space X is to make the search for regularities or patterns more straightforward. Changes can be obtained through unsupervised learning or through supervised learning, guided by the predictive task to solve.

Unsupervised learning is often used in order to estimate density in the input space, to cluster the data into groups, to find a manifold near which most of the data lies, or to denoise the data in some way. The overall principle is generally to find a representation of the training data that is the simplest while preserving as much information about the examples as possible. Of course, the notion of “simplicity” is multifarious. The most common ways of defining it are: lower dimensional representations, sparse representations, and independent representations.

When looking for lower dimensional representations, we are interested in finding smaller representations that keep as much useful information as possible about the data. This is advantageous because it tends to remove redundant information and generally allows for more efficient processing. There are several ways to achieve this. The most straightforward is to perform feature selection. Another one is to change the representation space and to project the data into a lower dimensional space.

An altogether different idea is to use a high dimensional representation space, but to make sure that each piece of data is expressed using as few of these dimensions, or descriptors, as possible. This is called a sparse representation. The idea is that each input should be expressible using only a few “words” in a large dictionary. These dictionaries are sometimes called overcomplete representations from earlier studies on the visual system (Olshausen and Field 1996).
Independent representations seek to identify the sources of variations, or latent variables, underlying the data distribution, so that these variables are statistically independent in some sense.
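As a concrete instance of projection into a lower dimensional space, Principal Component Analysis is one standard choice (an assumed example here, one among several possible). The three-dimensional toy data below is invented and lies almost on a line, so a single dimension suffices:

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X onto its k principal directions,
    obtained from the SVD of the centered data matrix."""
    Xc = X - X.mean(axis=0)                       # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                          # k-dimensional coordinates

# Invented toy data lying almost on a line in R^3: one dimension suffices
rng = np.random.default_rng(0)
t = rng.normal(size=100)
X = np.column_stack([t, 2 * t, -t]) + 0.01 * rng.normal(size=(100, 3))
Z = pca_project(X, 1)
```

Here the first principal component captures nearly all of the variance, so the one-dimensional coordinates Z retain almost all the information in X.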

10 Other Learning Problems

10.1 Semi-supervised Learning

Learning to make predictions, that is to associate an input with an output, requires a training set with labelled inputs, of the form (x_i, y_i), 1 ≤ i ≤ m. The larger the training set, the better the final prediction function produced by the learning algorithm. Unfortunately, obtaining labels for a large set of inputs is often costly. Think of patients at the hospital. Determining the right pathology from the symptoms exhibited by a patient requires a lot of expertise and often costly medical examinations. However, it is easy to get a large data set comprised of the descriptions of patients and their symptoms, without a diagnostic. Should we ignore this (potentially large) unsupervised data set? Examining Fig. 11 suggests that we should not. Ignoring the unlabelled examples would lead a linear SVM to learn the decision function of Fig. 11 (left). But this decision function feels inadequate in view of the unlabelled data points in Fig. 11 (right). This is because it seems reasonable to believe that similarly labelled data points lie in “clumps”, and that a decision function should go through low density regions of the input space. If we accept this assumption as a prior bias, then it becomes possible to use unlabelled data in order to improve learning. This is the basis of semi-supervised learning (Chapelle et al. 2009).

Fig. 11 Given a few labelled data points, a linear SVM would find the decision function on the left. When unlabelled data points are added, it is tempting to change the decision function to better reflect the low density region between the apparent two clouds of points (right)

Semi-supervised learning is based on the assumptions that the unlabelled points are drawn from the same distribution as the labelled ones, and that the decision function lies in a low density region. If any of these assumptions is erroneous, then semi-supervised learning can deteriorate the learning performance as compared to learning from the labelled data points alone.
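One simple scheme that exploits the “clumps” assumption is greedy self-training, sketched below under that assumption (it is one possible instantiation, not a method prescribed by the text): unlabelled points are labelled one at a time, each receiving the label of its nearest already-labelled neighbour, so labels propagate through dense regions.

```python
def self_train(labelled, unlabelled):
    """Greedy self-training on points of R^2: repeatedly give the
    unlabelled point closest to an already-labelled one the label
    of that nearest labelled neighbour."""
    labelled = list(labelled)             # list of (point, label) pairs
    pool = list(unlabelled)
    while pool:
        # closest (unlabelled, labelled) pair, squared Euclidean distance
        u, p = min(((u, p) for u in pool for p, _ in labelled),
                   key=lambda t: (t[0][0] - t[1][0]) ** 2
                               + (t[0][1] - t[1][1]) ** 2)
        label = next(lab for q, lab in labelled if q == p)
        labelled.append((u, label))
        pool.remove(u)
    return dict(labelled)

# Two clumps on a line; only the extreme points are labelled (invented data)
result = self_train([((0, 0), '-'), ((10, 0), '+')],
                    [(1, 0), (2, 0), (8, 0), (9, 0)])
```

On this toy data the labels of the two extreme points spread inwards through each clump; if the two-clumps assumption were false, the same mechanism would happily propagate wrong labels.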

10.2 Active Learning

The learning scenarios we have presented so far suppose that the learner passively receives the training observations. This hardly corresponds to learning in natural species, and in humans in particular, who are seekers of information and new sensations; and it seems wasteful when, sometimes, a few well chosen observations could bring as much information as a lot of random ones. Why not, then, design learning algorithms that would actively select the examples that seem the most informative? This is called active learning.

Suppose that inputs are one-dimensional real values in the range [0, 100], and that you know that their label is decided with respect to a threshold: all data points of the same class (‘+’ or ‘−’) being on one side of the threshold (see Fig. 12). If you can ask the class of any data point in [0, 100], then you start by asking the class of the point ‘50’ and of the point ‘0’. If they are of the same class, you should now concentrate on the interval (50, 100] and ask for the class of the point ‘75’; otherwise concentrate on the interval (0, 50) and test the point ‘25’. By systematically halving the interval at each question, you can be ε-close to the threshold with O(log_2(1/ε)) questions, whereas you would need O(1/ε) random questions in order to have the same precision on the threshold.

In this example, active learning can bring an exponential gain over a passive learning scenario. But is this representative of what can be expected with active learning? In fact, four questions arise:

1. Is it possible to learn with active learning concept classes that cannot be learned with passive learning?
2. What is the expected gain, if any, in terms of number of training examples?
3. How to select the best (most informative) training examples?
4. How to evaluate learning if the assumption of equal input distribution in learning and in testing is no longer valid, while it is the foundation of the statistical theory of learning?
The answer to question (1) is that the class of concepts learnable in the active learning scenario is the same as in the passive scenario. For question (2), we have

Fig. 12 Active learning on a one-dimensional input space, with the target function defined by h_w

exhibited, in the example, a case with an exponential gain. However, there exist cases with no gain at all. On average, it is expected that active learning provides an advantage in terms of training examples. This advantage, however, should be weighed against the computational cost: searching for the most informative examples can be costly. This is related to question (3). There exist numerous heuristics to select the best examples. They all rely on some prior assumptions about the target concept. They also differ in their approach to measuring the informativeness of the examples. Question (4) itself necessitates that further assumptions be made about the environment. There are thus different results for various theoretical settings (see chapter “Statistical Computational Learning”).
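The halving strategy of the one-dimensional example above can be sketched as follows; the oracle and the hidden threshold value are invented for the demonstration:

```python
def locate_threshold(oracle, lo=0.0, hi=100.0, eps=1e-3):
    """Halving strategy: query the label of the midpoint at each step,
    keeping the half-interval that must contain the threshold.
    Uses O(log2(1/eps)) queries instead of O(1/eps) random ones."""
    queries = 0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        queries += 1
        if oracle(mid) == '-':        # '-' labels lie below the threshold
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2, queries

theta = 37.2                          # hidden threshold, invented for the demo
est, n = locate_threshold(lambda x: '-' if x < theta else '+')
```

With ε = 10⁻³ on an interval of length 100, the loop needs only ⌈log₂(10⁵)⌉ = 17 queries, versus on the order of 10⁵ random queries for the same precision.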

10.3 Online Learning

In the batch learning scenario, the learner is supposed to have access to all the training data at once, and therefore to be able to look at it at will in order to extract decision rules or correlations. This is not so in online learning, where the data arrives sequentially and the learner must take decisions at each time step. This is what happens when learning from data streams (Gama 2010). In most cases, the learner cannot store all the past data and must therefore compress it, usually with loss. Consequently, the learner must both be able to adapt its hypothesis or model of the world iteratively, after each new example or piece of information, and be able to decide what to forget about the past data. Often, the learner throws out each new example as soon as it has been used to compute the new hypothesis.

Online learning can be used in stationary environments, when data arrives sequentially or when the data set is so large that the learner must cope with it in piecemeal fashion. It can as well be applied in non-stationary environments, which complicates things since, in this case, the examples no longer have the same importance according to their recency. Because there can no longer be a notion of expectation, since the environment may be changing with time, the empirical risk minimization principle, or the maximization of likelihood principle, can no longer be used as the basis of induction.

In the classical “batch scenario”, one tries several values of the hyperparameters that govern the hypothesis space being explored, and for each of them records the best hypothesis, the one minimizing:

ĥ = argmin_{h ∈ H} R_Reg(h) = (1/m) Σ_{i=1}^{m} ℓ(h(x_i), y_i) + Ω_hyperparameters(H)

where Ω_hyperparameters(H) penalizes the hypothesis space according to some prior bias encoded by the hyperparameters. The best hyperparameters are found using the validation set (see Sect. 3.4).

In online learning, it is no longer the best hypothesis space that is looked for, but rather the best adaptive algorithm, the one that best modifies the current hypothesis after each new arriving piece of information. Let us call this algorithm A_adapt. Then the best adaptive algorithm is the one having the best performance on the last T inputs, if T is supposedly a relevant window size:

A*_adapt = argmin_{A_adapt ∈ A} (1/T) Σ_{t=1}^{T} ℓ(h_t(x_t), y_t)

In fact, A*_adapt is found using several “typical” sequences of length T, that are supposed to be representative of the sequences the learning system will encounter. Numerous heuristic online learning systems are based on this general principle.

But how do you get by if no regularity is assumed about the sequence of arriving data? Can you still devise a learning algorithm that can cope with any sequence of data whatsoever, even if it is ruled by an adversary who tries to maximize your error rate? And, if yes, can you still guarantee something about the performance of such an online learning system? This is the province of online learning theory. Now, the performance criterion is called a regret. What we try to achieve is to have the learner be competitive with the best fixed predictor h* ∈ H. The regret measures how “sorry” the learner is, in retrospect, not to have used the predictions of the best hypothesis h* in H:

R_T = Σ_{t=1}^{T} ℓ(ŷ_t, y_t) − min_{h ∈ H} Σ_{t=1}^{T} ℓ(h(x_t), y_t)

where ŷ_t is the guess of the online learner for the input x_t. Surprisingly, it is found that it is possible to devise learning algorithms with guarantees, meaning bounds, on the regret, whatever the sequence fed to the learner. One good reference about these algorithms is (Cesa-Bianchi and Lugosi 2006).
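One classical algorithm with such a regret guarantee is the exponentially weighted average forecaster, discussed at length in (Cesa-Bianchi and Lugosi 2006). The sketch below uses the squared loss and two invented constant experts; the learning rate η = 0.5 is an illustrative choice:

```python
import math

def exp_weights(expert_preds, outcomes, eta=0.5):
    """Exponentially weighted average forecaster: keep one weight per expert,
    multiply it by exp(-eta * loss) after each round, and predict with the
    weighted average of the experts' predictions (squared loss here)."""
    n = len(expert_preds[0])                 # number of experts
    w = [1.0] * n
    learner_loss, expert_losses = 0.0, [0.0] * n
    for preds, y in zip(expert_preds, outcomes):
        total = sum(w)
        y_hat = sum(wi * p for wi, p in zip(w, preds)) / total
        learner_loss += (y_hat - y) ** 2
        for i, p in enumerate(preds):
            loss_i = (p - y) ** 2
            expert_losses[i] += loss_i
            w[i] *= math.exp(-eta * loss_i)
    # regret: the learner's cumulative loss minus the best expert's
    regret = learner_loss - min(expert_losses)
    return learner_loss, regret

# Two invented constant experts; the outcome is always 1,
# so the first expert is the best fixed predictor in hindsight
losses, regret = exp_weights([(1.0, 0.0)] * 20, [1.0] * 20)
```

Note that the regret stays bounded by a constant here (no dependence on the sequence length), whatever order the adversary presents the outcomes in: the weight of a bad expert decays exponentially fast.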

10.4 Transfer Learning

In the classical supervised learning setting, one seeks to learn a good decision function h from the input space X to the output space Y using a training set S = {(x_i, y_i)}, 1 ≤ i ≤ m. The basis of the inductive step is to assume that the training data and future test data are governed by the same distribution P_XY. Often, however, the distributions are different. This may happen for instance when learning to recognize spam using data from a specific user and trying to adapt the learned rule to another user. The resulting learning problem is called Domain Adaptation. A further step is taken when one wishes to profit from a solved learning task in order to facilitate another, different, learning task, possibly defined on another input space. For instance, a system that knows how to recognize poppy fields in satellite images might learn more efficiently to recognize cancerous cells in biopsy images than a system that must learn this from scratch. This is known as transfer learning (Pan and Yang 2010).

Formally, in transfer learning, there is a Source domain D_S defined as the product of a source input space and a source output space: X_S × Y_S. The source information can come either through a source training set S_S = {(x_i^S, y_i^S)}, 1 ≤ i ≤ m, or through a decision function h_S : X_S → Y_S, with or without a training set. If only the decision function h_S is available, this is called hypothesis transfer learning. Similarly, the Target domain D_T is defined as the product of a target input space and a target output space: X_T × Y_T. Often, it is assumed that the available target training data S_T = {(x_i^T, y_i^T)}, 1 ≤ i ≤ m, is too limited to allow a learning algorithm to yield a good decision function h_T : X_T → Y_T. In some scenarios, the target data is assumed to be unlabeled: S_T = {x_i^T}, 1 ≤ i ≤ m.

Two questions arise then.
First, can the knowledge of the source hypothesis h_S help in learning a better decision function in D_T than would be possible with the training set S_T alone? Second, if yes, how can this be achieved? Transfer learning is becoming a hot topic in machine learning, both because a growing number of applications could benefit from it, and because it demands new theoretical developments adapted to non-stationary environments.
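One simple way the source hypothesis can help, sketched here as an assumed illustration rather than a method from the chapter, is biased regularization, a classical hypothesis transfer scheme: fit the scarce target data while penalizing the distance of the target weights to the source weights.

```python
import numpy as np

def transfer_ridge(X_t, y_t, w_source, lam=1.0):
    """Biased regularization: minimize
        ||X_t w - y_t||^2 + lam * ||w - w_source||^2,
    pulling the target weights toward the source hypothesis (closed form)."""
    d = X_t.shape[1]
    A = X_t.T @ X_t + lam * np.eye(d)
    b = X_t.T @ y_t + lam * w_source
    return np.linalg.solve(A, b)

# Invented source weights and a single target example
w_s = np.array([1.0, -1.0])
X_t = np.array([[1.0, 0.0]])
y_t = np.array([2.0])
w_t = transfer_ridge(X_t, y_t, w_s)
```

With a single target example, the second coordinate is entirely unconstrained by the data, and the regularizer keeps it at its source value; the first coordinate moves partway toward the target evidence.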

10.5 Learning to Rank

Learning to rank (Li 2011) is related to descriptive learning in that the goal is not to make predictions about new unknown instances, but to order the set of available data according to some “relevance”. It is however related to supervised learning in that, usually, there is supervised information in the training data, for instance that one instance is preferred to some other one.

A typical application is the ordering of the results of a search engine according to their relevance to a query, and, possibly, to a user. One approach is to first define a loss function that measures the difference between the true ranking of a set of instances and the one produced by the system, and then to apply the empirical risk minimization (ERM) principle to find a good ranking function on the training sets (several sets for which the true ranking is known). For instance, linear predictors for ranking can be used. In this technique, assuming that X ⊂ R^d, for any vector w ∈ R^d a ranking function can be defined as:

h_w(x_1, . . . , x_r) = (⟨w, x_1⟩, . . . , ⟨w, x_r⟩)

The elements x_i (1 ≤ i ≤ r) can then be ordered according to the values ⟨w, x_i⟩.

There are other learning algorithms, some of which are based on learning binary classifiers that take two instances x_i and x_j and return the output ‘+’ if the first argument x_i is ranked before x_j, and ‘−’ otherwise.
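The linear ranking function h_w above amounts to scoring each item with an inner product and sorting by decreasing score; the two-dimensional items and weight vector below are invented for illustration:

```python
def rank(w, items):
    """Score each item with the inner product <w, x> and sort by
    decreasing score, as the linear ranking function h_w does."""
    score = lambda x: sum(wi * xi for wi, xi in zip(w, x))
    return sorted(items, key=score, reverse=True)

# Invented 2-dimensional items and weight vector
w = (2.0, 1.0)
items = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
ordered = rank(w, items)
```

Learning then consists in choosing w so that the induced order agrees with the supervised ranking information in the training sets.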

10.6 Learning Recommendations

Learning to make recommendations is somewhat related to learning to rank, but aims at extrapolating the relevance to new instances, not already seen by the user. For instance, a system can learn to make recommendations about movies that should interest someone based on his/her history of seen movies and the appreciations that he/she reported.

One way to do that is to describe the items to be rated (e.g. movies), and thus recommended or not, on the basis of attributes, and then to learn to associate the resulting descriptions with appreciations, or grades. This is called content-based recommendation. One problem is to find relevant attributes. Another is that such recommending systems can only use the past history of each customer or user. Information is not shared among users.

Another approach is to suppose that if another user has expressed ratings that are close to mine for a set of items that we both rated, then it is likely that I would rate other items, that I have not yet seen, similarly to this user. In this way, it is possible to capitalize on the vast data sets of preferences expressed by the whole community of users. The idea is to compute the similarity (or dissimilarity) with other users based on sets of items rated in common, and then to extrapolate the way I would rate new items based on these similarities and the rates that these other users gave to the new items. This is called collaborative filtering.

Among the many techniques dedicated to solving the problem of collaborative filtering, the prominent one currently operates using matrix completion. The idea is to consider the matrix R defined over users × items, with each element R(i, j) of the matrix containing the rate given by user_i to item_j. When no rate has been given, the element contains 0. This matrix is usually very sparse (few values R(i, j) ≠ 0) since users have generally tested only a few tens or at most hundreds of items. The goal is then to complete this matrix, filling the missing values.
A range of techniques can be used for this. The most classical relies on the Singular Value Decomposition (SVD), which, in a way, expresses the fact that the columns (or the rows) of this matrix are not independent. While these techniques are rather powerful, they nonetheless give results that are somewhat less than satisfactory. This is due to several factors: the missing values are not randomly distributed, as SVD would demand; performance measures such as the root mean square error (RMSE) give the same importance to all entries R(i, j), while users are interested in high value ratings, and in fact are not interested in accurate ratings but rather in the respective ratings given to the most interesting items. Furthermore, recommendations are highly context dependent: for instance, the same item can be quite interesting one day, and not the day after, because a similar item has been purchased. For all these reasons, new approaches for recommendation systems are to be expected in the years to come (see for instance Jannach et al. 2016).
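A naive sketch of SVD-based completion (hard imputation) illustrates the idea; the tiny two-user rating matrix is invented, and real recommender systems are considerably more elaborate:

```python
import numpy as np

def complete(R, k=1, iters=50):
    """Naive hard-imputation matrix completion: fill missing entries (0 here)
    with the mean observed rating, then alternate a rank-k SVD reconstruction
    with re-imposing the observed entries."""
    observed = R != 0
    X = np.where(observed, R, R[observed].mean())
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = U[:, :k] * s[:k] @ Vt[:k]       # best rank-k approximation
        X = np.where(observed, R, X)        # keep known ratings intact
    return X

# Invented ratings: two users with identical tastes; user 1 never rated item 1
R = np.array([[5.0, 1.0, 4.0],
              [5.0, 0.0, 4.0]])
R_hat = complete(R, k=1)
```

Since the two users agree on every item they both rated, the rank-1 model imputes for user 1 a rating close to the one user 0 gave. Using 0 for “missing” is itself a modelling shortcut, precisely the kind of assumption criticized above.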

10.7 Identifying Causality Relationships

It might be easy to come up with correlations, such as: people who eat ice-cream wear swimming suits. But should we rely on a rule that says “to make people eat ice-cream, make them wear swimming suits”? Clearly, correlations are not causal relationships, and believing they are can lead to disastrous conclusions. In order to be able to “act” on some phenomenon, it is crucial to identify the causes of the phenomenon. Unfortunately, almost all current predictive learning systems are geared to discover correlations, but not causal links, and going from correlations to causality is not a trivial problem. It is indeed mostly an open problem.

Judea Pearl has shown that the standard concepts, and notations, of statistics are not able to capture causality (Pearl 2009). Some approaches suppose that a graph of potential causal links is provided by an expert before a learning algorithm tries to identify the true causal links together with their directions and intensities (see chapter “A Glance at Causality Theories for Artificial Intelligence” of Volume 1 for an extended discussion of causality within Artificial Intelligence, and the book by Peters et al. (2017) for a thorough discussion of causality and machine learning). Recently, an interesting idea has been proposed in which the data points corresponding to some phenomenon are analyzed by a classifier (in this work, a deep neural network) which, after learning on synthetic data reflecting causality between variables, can recognize whether some variable is likely to cause the value of some other one (Lopez-Paz et al. 2016).

11 Conclusion

Machine learning is all the rage in artificial intelligence at the moment. Indeed, because it promises to eliminate the need to explicitly code machines by hand, when it suffices to feed them with examples to allow them to program themselves, machine learning seems the solution for obtaining high performing systems in many demanding tasks such as understanding speech, playing (and winning) games, problem solving, and so on.

And, truly, machine learning has demonstrated impressive achievements in recent years, reaching superhuman performances in many pattern recognition tasks or in game playing. Autonomous vehicles are, so to speak, around the corner, while Watson, from IBM, and other similar automatic assistant systems that can sift through millions of documents and extract information in almost no time are deemed to profoundly change the way even high level professions, such as law making or medicine, will be carried out in the years to come.

Still, for all these breakthroughs and the accompanying hype, today's machine learning is above all the great triumph of pattern recognition. Not much progress has been made in the integration of learning and reasoning since the 1980s. Machines are very limited in learning from unguided observations and, unlike children less than two years old, they are very task-focused, lack contextual awareness, and look for statistical correlations where we look for causal relationships. Progress in reinforcement learning and in neural networks has been highly touted, and in part rightly so, but the improvements in performance are due in large part to gains in computational power and in the quantity of training examples, rather than to new breakthroughs in concepts, even though it is indisputable that new ideas have been generated. The field of machine learning is therefore anything but a finished, polished, or even mature domain. Revolutionary ideas are yet to come. They have to come.
Accordingly, let us conclude this chapter with an intriguing idea. Maybe the key to true intelligence is not learning per se, but teaching. Infants, and, later, adults, share knowledge out of an innate compulsion. We share information, and we share it with no immediate gain other than to remedy the gap of knowledge we perceive in others. It starts when a one-year-old child sees an object fall behind a piece of furniture and points to it for an adult who did not see where the object fell. Teaching is a universal instinct among humans. It is certainly intricately linked to our capacity for learning. We have models of what others know or ignore, and we act, we teach, accordingly. In order to do so, we need to recognize the state of the world and the state of others, and we need to reason. Is the teaching ability, or rather the teaching instinct, the true frontier that machine learning must conquer? Learning/teaching: do we need to look at the two faces of the same coin in order to understand each one? Intriguing idea, isn't it, Watson!

References

Aggarwal CC (2015) Data mining: the textbook. Springer Publishing Company Incorporated, Berlin
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Very large data bases (VLDB-94), Santiago, Chile, pp 487–499
Aloise D, Hansen P, Liberti L (2012) An improved column generation algorithm for minimum sum-of-squares clustering. Math Program 131(1–2):195–220
Amarel S (1968) On representations of problems of reasoning about actions. Mach Intell 3(3):131–171
Ankerst M, Breunig MM, Kriegel H, Sander J (1999) OPTICS: ordering points to identify the clustering structure. In: SIGMOD 1999, proceedings ACM SIGMOD international conference on management of data, June 1–3, 1999, Philadelphia, Pennsylvania, USA, pp 49–60. https://doi.org/10.1145/304182.304187
Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms, SODA 2007, New Orleans, Louisiana, USA, January 7–9, 2007, pp 1027–1035. http://dl.acm.org/citation.cfm?id=1283383.1283494
Bastide Y, Pasquier N, Taouil R, Stumme G, Lakhal L (2000) Mining minimal non-redundant association rules using frequent closed itemsets. In: Computational logic, pp 972–986
Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Kluwer Academic Publishers, Norwell
Bishop CM (2006) Pattern recognition and machine learning. Springer, Secaucus
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Breiman L, Friedman J, Olshen R, Stone CJ (1984) Classification and regression trees. Wadsworth and Brooks/Cole Advanced Books and Software
Brusco M, Stahl S (2005) Branch-and-bound applications in combinatorial data analysis (Statistics and computing), 1st edn. Springer, Berlin
Busygin S, Prokopyev OA, Pardalos PM (2008) Biclustering in data mining. Comput OR 35:2964–2987
Cesa-Bianchi N, Lugosi G (2006) Prediction, learning, and games. Cambridge University Press, Cambridge
Chapelle O, Scholkopf B, Zien A (2009) Semi-supervised learning (Chapelle O, et al eds; 2006). IEEE Trans Neural Netw 20(3):542
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
Dao T, Duong K, Vrain C (2017) Constrained clustering by constraint programming. Artif Intell 244:70–94. https://doi.org/10.1016/j.artint.2015.05.006
de la Higuera C (2010) Grammatical inference: learning automata and grammars. Cambridge University Press, Cambridge
Dhillon IS, Guan Y, Kulis B (2004) Kernel k-means: spectral clustering and normalized cuts. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining, Seattle, Washington, USA, August 22–25, 2004, pp 551–556. https://doi.org/10.1145/1014052.1014118
Ding CHQ, He X (2005) On the equivalence of nonnegative matrix factorization and spectral clustering. In: Proceedings of the 2005 SIAM international conference on data mining, SDM 2005, Newport Beach, CA, USA, April 21–23, 2005, pp 606–610. https://doi.org/10.1137/1.9781611972757.70
du Merle O, Hansen P, Jaumard B, Mladenovic N (1999) An interior point algorithm for minimum sum-of-squares clustering. SIAM J Sci Comput 21(4):1485–1505
Dunn JC (1973) A fuzzy relative of the isodata process and its use in detecting compact well-separated clusters. J Cybern 3(3):32–57. https://doi.org/10.1080/01969727308546046
Dzeroski S, Lavrac N (eds) (2001) Relational data mining. Springer, Berlin
Ester M, Kriegel H, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the second international conference on knowledge discovery and data mining (KDD-96), Portland, Oregon, USA, pp 226–231. http://www.aaai.org/Library/KDD/1996/kdd96-037.php
Fisher DH (1987) Knowledge acquisition via incremental conceptual clustering. Mach Learn 2(2):139–172. https://doi.org/10.1007/BF00114265
Forgy E (1965) Cluster analysis of multivariate data: efficiency versus interpretability of classification. Biometrics 21(3):768–769
Fürnkranz J, Gamberger D, Lavrac N (2012) Foundations of rule learning. Springer, Berlin
Gama J (2010) Knowledge discovery from data streams. Chapman & Hall
Ganter B, Wille R, Franke C (1998) Formal concept analysis: mathematical foundations. Springer, Berlin
Ganter B, Stumme G, Wille R (eds) (2005) Formal concept analysis: foundations and applications. Springer, Berlin
Getoor L, Taskar B (eds) (2007) An introduction to statistical relational learning. MIT Press, Cambridge
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ (eds) Advances in neural information processing systems 27, Curran Associates, Inc., pp 2672–2680. http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
Halkidi M, Batistakis Y, Vazirgiannis M (2002) Clustering validity checking methods: part II. SIGMOD Rec 31(3):19–27. https://doi.org/10.1145/601858.601862
Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. SIGMOD Rec 29(2):1–12. https://doi.org/10.1145/335191.335372
Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Discov 8(1):53–87
Han J, Kamber M, Pei J (2011) Data mining: concepts and techniques, 3rd edn. Morgan Kaufmann Publishers Inc., San Francisco
Hansen P, Delattre M (1978) Complete-link cluster analysis by graph coloring. J Am Stat Assoc 73:397–403
Hansen P, Jaumard B (1997) Cluster analysis and mathematical programming. Math Program 79(1–3):191–215
Hastie T, Tibshirani R, Friedman JH (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer series in statistics. Springer, Berlin
Hawkins D (1980) Identification of outliers. Monographs on applied probability and statistics. Chapman and Hall. https://books.google.fr/books?id=fb0OAAAAQAAJ
Hopfield JJ (1982) Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci 79(8):2554–2558
Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall
Jannach D, Resnick P, Tuzhilin A, Zanker M (2016) Recommender systems: beyond matrix completion. Commun ACM 59(11):94–102
Japkowicz N (2011) Evaluating learning algorithms: a classification perspective. Cambridge University Press
Johnson S (1967) Hierarchical clustering schemes. Psychometrika 32(3):241–254
Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley, New York
Klein G, Aronson JE (1991) Optimal clustering: a model and method. Nav Res Logist 38(3):447–461
Kohonen T (ed) (1997) Self-organizing maps. Springer-Verlag New York Inc., Secaucus
Koller D, Friedman N (2009) Probabilistic graphical models: principles and techniques. MIT Press
Kotsiantis SB (2007) Supervised machine learning: a review of classification techniques. Informatica 31:249–268
Lachiche N, Vrain C (eds) (2018) Inductive logic programming - 27th international conference, ILP 2017, Orléans, France, September 4–6, 2017, revised selected papers. Lecture notes in computer science, vol 10759. Springer. https://doi.org/10.1007/978-3-319-78090-0
Lance GN, Williams WTA (1967) A general theory of classificatory sorting strategies: 1. Hierarchical systems. Comput J 9(4):373–380
Lavrac N, Dzeroski S (1994) Inductive logic programming: techniques and applications. Ellis Horwood series in artificial intelligence. Ellis Horwood
Le Cun Y (1986) Learning process in an asymmetric threshold network. In: Disordered systems and biological organization. Springer, Berlin, pp 233–240
Le Cun Y, Boser BE, Denker JS, Henderson D, Howard RE, Hubbard WE, Jackel LD (1990) Handwritten digit recognition with a back-propagation network. In: Advances in neural information processing systems, pp 396–404
Le Cun Y, Bengio Y et al (1995) Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks 3361(10):1995
Le Cun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444. https://doi.org/10.1038/nature14539
Levesque HJ, Brachman RJ (1987) Expressiveness and tractability in knowledge representation and reasoning. Comput Intell 3(1):78–93
Li H (2011) A short introduction to learning to rank. IEICE Trans Inf Syst 94(10):1854–1862
Li W, Han J, Pei J (2001) CMAR: accurate and efficient classification based on multiple class-association rules. In: Proceedings of the 2001 IEEE international conference on data mining, 29 November–2 December 2001, San Jose, California, USA, pp 369–376. https://doi.org/10.1109/ICDM.2001.989541
Liu B, Hsu W, Ma Y (1998) Integrating classification and association rule mining. In: Proceedings of the fourth international conference on knowledge discovery and data mining, AAAI Press, KDD'98, pp 80–86. http://dl.acm.org/citation.cfm?id=3000292.3000305
Lloyd SP (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28(2):129–136. https://doi.org/10.1109/TIT.1982.1056489
Lopez-Paz D, Nishihara R, Chintala S, Schölkopf B, Bottou L (2016) Discovering causal signals in images. arXiv:1605.08179
Madeira SC, Oliveira AL (2004) Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans Comput Biol Bioinform 1:24–45. https://doi.org/10.1109/TCBB.2004.2
Michalski RS (1980) Knowledge acquisition through conceptual clustering: a theoretical framework and an algorithm for partitioning data into conjunctive concepts. Int J Policy Anal Inf Syst 4:219–244
Michalski RS, Stepp RE (1983) Automated construction of classifications: conceptual clustering versus numerical taxonomy. IEEE Trans Pattern Anal Mach Intell 5(4):396–410. https://doi.org/10.1109/TPAMI.1983.4767409
Miclet L (1990) Grammatical inference. In: Bunke H, Sanfeliu A (eds) Syntactic and structural pattern recognition theory and applications. World Scientific, Singapore
Minsky ML, Papert S (1988) Perceptrons, expanded edn. MIT Press, Cambridge
Mitchell T (1982) Generalization as search. Artif Intell J 18:203–226
Mitchell T (1997) Machine learning. McGraw-Hill
Muggleton S (1995) Inverse entailment and progol.
New Gener Comput 13(3&4):245–286 Ng AY, Jordan MI, Weiss Y (2001) On spectral clustering: analysis and an algorithm. In: Advances in neural information processing systems 14 [neural information processing systems: natural and synthetic, NIPS 2001, December 3–8, 2001, Vancouver, British Columbia, Canada], pp 849–856. http://papers.nips.cc/paper/2092-on-spectral-clustering-analysis-and-an-algorithm Ng RT, Han J (1994) Efficient and effective clustering methods for spatial data mining. In: VLDB’94, proceedings of 20th international conference on very large data bases, September 12–15, 1994, Santiago de Chile, Chile, pp 144–155. http://www.vldb.org/conf/1994/P144.PDF Nie F, Wang X, Huang H (2014) Clustering and projected clustering with adaptive neighbors. In: The 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’14, New York, NY, USA - August 24–27, 2014, pp 977–986. https://doi.org/10.1145/2623330. 2623726 Olshausen BA, Field DJ (1996) Natural image statistics and efficient coding. Netw Comput Neural Syst 7(2):333–339 Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345– 1359 Park HS, Jun CH (2009) A simple and fast algorithm for k-medoids clustering. Expert Syst Appl 36:3336–3341 Pearl J (1988) Probabilistic reasoning in intelligent systems: networks of plausible inference. Mor- gan Kaufmann Pearl J (2009) Causal inference in statistics: an overview. Statist Surv 3:96–146. https://doi.org/10. 1214/09-SS057 Peters J, Janzing D, Schölkopf B (2017) Elements of causal inference: foundations and learning algorithms. MIT Press Plotkin G (1970) A note on inductive generalization. In: Machine intelligence, vol 5. Edinburgh University Press, pp 153–163 Qiu Q, Patel VM, Turaga P, Chellappa R (2012) Domain adaptive dictionary learning, pp 631–645 410 A. Cornuéjols and C. Vrain

Quinlan J (1993) C4.5: programs for machine learning. Morgan Kauffman Quinlan JR (1996) Learning first-order definitions of functions. CoRR. arXiv:cs.AI/9610102 Raedt LD (2008) Logical and relational learning. Springer, Berlin Raedt LD, Frasconi P, Kersting K, Muggleton S (eds) (2008) Probabilistic inductive logic program- ming - theory and applications. Lecture notes in computer science, vol 4911. Springer, Berlin Rao M (1969) Cluster analysis and mathematical programming 79:30 Rubinstein R, Bruckstein AM, Elad M (2010) Dictionaries for sparse representation modeling. Proc IEEE 98(6):1045–1057 Rumelhart DE, McClelland JL, Group PR et al (1987) Parallel distributed processing, vol 1. MIT Press, Cambridge Saitta L, Giordana A, Cornuéjols A (2011) Phase transitions in machine learning. Cambridge University Press Schölkhopf B, Smola A (2002) Learning with kernels. MIT Press Shapire R, Freund Y (2012) Boosting: foundations and algorithms. MIT Press Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M et al (2016) Mastering the game of go with deep neural networks and tree search. Nature 529(7587):484–489 Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hubert T, Baker L, Lai M, Bolton A et al (2017) Mastering the game of go without human knowledge. Nature 550(7676):354 Suraj Z (2004) An introduction to rough sets theory and its applications: a tutorial. In: ICENCO’2004, Cairo, Egypt Sutton C, McCallum A (2012) An introduction to conditional random fields. Found Trends Mach Learn 4(4):267–373. https://doi.org/10.1561/2200000013 Tosic I, Frossard P (2011) Dictionary learning. IEEE Signal Process Mag 28(2):27–38 Uno T, Kiyomi M, Arimura H (2004) LCM ver. 2: efficient mining algorithms for fre- quent/closed/maximal itemsets. 
In: FIMI ’04, proceedings of the IEEE ICDM workshop on frequent itemset mining implementations, Brighton, UK, November 1, 2004. http://ceur-ws.org/ Vol- 126/ uno.pdf van der Laag PR, Nienhuys-Cheng SH (1998) Completeness and properness of refinement operators in inductive logic programming. J Log Program 34(3):201–225. https://doi.org/10.1016/S0743- 1066(97)00077-0, http://www.sciencedirect.com/science/article/pii/S0743106697000770 Vapnik V (1995) The nature of statistical learning theory. Springer, Berlin Vega-Pons S, Ruiz-Shulcloper J (2011) A survey of clustering ensemble algorithms. IJPRAI 25(3):337–372. https://doi.org/10.1142/S0218001411008683 von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416. https://doi. org/10.1007/s11222-007-9033-z Wagstaff K, Cardie C (2000) Clustering with instance-level constraints. In: Proceedings of the 17th international conference on machine learning, pp 1103–1110 Ward JH (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58(301):236–244. https://doi.org/10.1080/01621459.1963.10500845, http://www.tandfonline. com/doi/abs/10.1080/01621459.1963.10500845 Zaki MJ (2000) Generating non-redundant association rules. In: Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, Boston, MA, USA, August 20–23, KDD, pp 34–43 ZeleznýF, Lavrac N (2006) Propositionalization-based relational subgroup discovery with rsd. Mach Learn 62(1–2):33–63 Zhang C, Bengio S, Hardt M, Recht B, Vinyals O (2016) Understanding deep learning requires rethinking generalization. arXiv:161103530 Zhou ZH (2012) Ensemble methods: foundations and algorithms. CRC Press