NBER WORKING PAPER SERIES TEXT AS DATA Matthew Gentzkow Bryan T. Kelly Matt Taddy Working Paper 23276 http://www.nber.org/papers/w23276 NATIONAL BUREAU OF ECONOMIC RESEARCH 1050 Massachusetts Avenue Cambridge, MA 02138 March 2017 The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research. At least one co-author has disclosed a financial relationship of potential relevance for this research. Further information is available online at http://www.nber.org/papers/w23276.ack NBER working papers are circulated for discussion and comment purposes. They have not been peer-reviewed or been subject to the review by the NBER Board of Directors that accompanies official NBER publications. © 2017 by Matthew Gentzkow, Bryan T. Kelly, and Matt Taddy. All rights reserved. Short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source. Text as Data Matthew Gentzkow, Bryan T. Kelly, and Matt Taddy NBER Working Paper No. 23276 March 2017 JEL No. C1 ABSTRACT An ever increasing share of human interaction, communication, and culture is recorded as digital text. We provide an introduction to the use of text as an input to economic research. We discuss the features that make text different from other forms of data, offer a practical overview of relevant statistical methods, and survey a variety of applications. Matthew Gentzkow Matt Taddy Department of Economics Microsoft Research New England Stanford University 1 Memorial Drive 579 Serra Mall Cambridge MA 02142 Stanford, CA 94305 and University of Chicago Booth School of Business and NBER
[email protected] [email protected] Bryan T.