Machine Learning for Sociology^a

Mario Molina ([email protected])*
Filiz Garip ([email protected])*

*Department of Sociology, Cornell University, Ithaca, NY, 14853, USA

January 9, 2019

^a This manuscript is forthcoming in the Annual Review of Sociology, vol. 45, 2019.

Abstract

Machine learning is a field at the intersection of statistics and computer science that uses algorithms to extract information and knowledge from data. Its applications increasingly find their way into economics, political science, and sociology. We offer a brief introduction to this vast toolbox, and illustrate its current uses in the social sciences, including distilling measures from new data sources, such as text and images; characterizing population heterogeneity; improving causal inference; and offering predictions to aid policy decisions and theory development. In addition to serving similar purposes in sociology, we argue that ML tools can speak to long-standing questions on the limitations of the linear modeling framework; the criteria for evaluating empirical findings; transparency around the context of discovery; and the epistemological core of the discipline.

Keywords: supervised learning, unsupervised learning, causal inference, prediction, heterogeneity, discovery

Introduction

Machine learning (ML) seeks to automate discovery from data. It represents a breakthrough in computer science, where intelligent systems had typically involved fixed algorithms (logical sets of instructions) that code the desired output for all possible inputs. Now, intelligent systems 'learn' from data and estimate complex functions that discover representations of some input (X), or link the input to an output (Y) in order to make predictions on new data (Jordan & Mitchell, 2015). ML can be viewed as an offshoot of non-parametric statistics (Kleinberg et al., 2015).
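
The contrast drawn above between a fixed algorithm and a system that learns a function linking X to Y can be illustrated with a minimal sketch. It is not taken from the manuscript: the toy data, the threshold in `fixed_rule`, and the helper `knn_predict` (a k-nearest-neighbors vote, chosen here as one simple non-parametric estimator) are all hypothetical and serve only to make the idea concrete.

```python
# Illustrative sketch only: the toy data and all names are hypothetical.

def fixed_rule(x):
    """A fixed algorithm: the output for every input is coded by hand."""
    return 1 if x >= 5.0 else 0


def knn_predict(train_x, train_y, x_new, k=3):
    """A learned function: predict Y for a new X as the majority label
    among the k nearest training inputs (a simple non-parametric estimator)."""
    # Rank training examples by their distance to the new input, keep the k closest.
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x_new))[:k]
    votes = sum(label for _, label in nearest)
    # Majority vote over binary labels.
    return 1 if votes * 2 > k else 0


# Toy training data: inputs X with observed binary outputs Y.
train_x = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]

# Predictions on new data that were not part of the training set.
new_points = [2.5, 7.5]
predictions = [knn_predict(train_x, train_y, x) for x in new_points]
```

In this sketch the hand-coded rule embeds its threshold in the code, whereas the learned predictor derives its decisions entirely from the training pairs, so changing the data changes its predictions without touching the code.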