Big Data: New Tricks for Econometrics†

Total Page:16

File Type:pdf, Size:1020Kb

Big Data: New Tricks for Econometrics† Journal of Economic Perspectives—Volume 28, Number 2—Spring 2014—Pages 3–28 Big Data: New Tricks for Econometrics† Hal R. Varian oomputersmputers aarere nnowow iinvolvednvolved iinn mmanyany eeconomicconomic ttransactionsransactions aandnd ccanan ccaptureapture ddataata aassociatedssociated wwithith tthesehese ttransactions,ransactions, whichwhich ccanan thenthen bbee manipulatedmanipulated C aandnd aanalyzed.nalyzed. CConventionalonventional sstatisticaltatistical aandnd econometriceconometric techniquestechniques suchsuch aass rregressionegression ooftenften wworkork well,well, bbutut ttherehere aarere iissuesssues uuniquenique ttoo bbigig datasetsdatasets thatthat maymay rrequireequire ddifferentifferent ttools.ools. FFirst,irst, tthehe ssheerheer ssizeize ooff tthehe ddataata iinvolvednvolved mmayay rrequireequire mmoreore ppowerfulowerful ddataata mmanipulationanipulation ttools.ools. SSecond,econd, wewe maymay hhaveave mmoreore ppotentialotential ppredictorsredictors tthanhan aappro-ppro- ppriateriate fforor eestimation,stimation, ssoo wwee needneed toto dodo somesome kindkind ofof variablevariable selection.selection. Third,Third, llargearge ddatasetsatasets mmayay aallowllow fforor mmoreore fl eexiblexible relationshipsrelationships tthanhan simplesimple linearlinear models.models. MMachineachine llearningearning ttechniquesechniques ssuchuch aass ddecisionecision ttrees,rees, ssupportupport vvectorector machines,machines, nneuraleural nnets,ets, ddeepeep llearning,earning, aandnd soso onon maymay allowallow forfor moremore effectiveeffective waysways toto modelmodel ccomplexomplex rrelationships.elationships. IInn tthishis eessay,ssay, I wwillill ddescribeescribe a ffewew ooff tthesehese ttoolsools fforor mmanipulatinganipulating aandnd aanalyzingnalyzing bbigig ddata.ata. I bbelieveelieve tthathat tthesehese mmethodsethods hhaveave a llotot ttoo oofferffer aandnd sshouldhould bbee mmoreore wwidelyidely kknownnown aandnd uusedsed bbyy eeconomists.conomists. IInn ffact,act, mmyy sstandardtandard aadvicedvice ttoo ggraduateraduate sstudentstudents tthesehese ddaysays iiss ggoo toto tthehe ccomputeromputer sciencescience ddepartmentepartment andand taketake a classclass inin machinemachine llearning.earning. TTherehere hhaveave bbeeneen veryvery fruitfulfruitful ccollaborationsollaborations betweenbetween computercomputer scien-scien- ttistsists aandnd sstatisticianstatisticians iinn tthehe llastast ddecadeecade oorr sso,o, andand I expectexpect collaborationscollaborations betweenbetween ccomputeromputer sscientistscientists aandnd eeconometriciansconometricians wwillill aalsolso bbee pproductiveroductive iinn tthehe ffuture.uture. ■ Hal Varian is Chief Economist, Google Inc., Mountain View, California, and Emeritus Professor of Economics, University of California, Berkeley, California. His email address is [email protected]. † To access the Appendix and disclosure statements, visit http://dx.doi.org/10.1257/jep.28.2.3 doi=10.1257/jep.28.2.3 4 Journal of Economic Perspectives Tools to Manipulate Big Data EEconomistsconomists hhaveave hhistoricallyistorically dealtdealt wwithith ddataata thatthat fi tsts inin a spreadsheet,spreadsheet, butbut thatthat iiss cchanginghanging asas nnewew mmore-detailedore-detailed ddataata becomesbecomes availableavailable (see(see EinavEinav andand LevinLevin 2013,2013, fforor sseveraleveral eexamplesxamples aandnd discussion).discussion). IfIf youyou havehave moremore thanthan a millionmillion oror soso rowsrows iinn a sspreadsheet,preadsheet, yyouou pprobablyrobably wantwant toto storestore itit inin a relationalrelational database,database, suchsuch asas MMySQL.ySQL. RRelationalelational ddatabasesatabases offeroffer a fl exibleexible wayway toto store,store, manipulate,manipulate, andand retrieveretrieve ddataata uusingsing a SStructuredtructured QQueryuery LLanguageanguage (SQL),(SQL), whichwhich isis easyeasy toto learnlearn andand veryvery uusefulseful fforor ddealingealing wwithith mmedium-sizededium-sized ddatasets.atasets. HHowever,owever, iiff yyouou hhaveave sseveraleveral ggigabytesigabytes ofof ddataata oror sseveraleveral millionmillion observations,observations, sstandardtandard rrelationalelational ddatabasesatabases bbecomeecome uunwieldy.nwieldy. DatabasesDatabases toto managemanage datadata ofof thisthis ssizeize aarere ggenericallyenerically kknownnown aass “NoSQL”“NoSQL” ddatabases.atabases. TThehe termterm iiss uusedsed rratherather lloosely,oosely, bbutut iiss ssometimesometimes iinterpretednterpreted aass mmeaningeaning ““notnot oonlynly SSQL.”QL.” NNoSQLoSQL ddatabasesatabases aarere mmoreore pprimitiverimitive tthanhan SSQLQL ddatabasesatabases iinn ttermserms ooff ddataata mmanipulationanipulation ccapabilitiesapabilities butbut ccanan hhandleandle llargerarger aamountsmounts ooff ddata.ata. DDueue ttoo tthehe rriseise ooff ccomputer-mediatedomputer-mediated ttransactions,ransactions, mmanyany ccompaniesompanies hhaveave ffoundound iitt nnecessaryecessary ttoo ddevelopevelop ssystemsystems ttoo pprocessrocess billionsbillions ofof ttransactionsransactions pperer dday.ay. ForFor eexample,xample, aaccordingccording ttoo SSullivanullivan ((2012),2012), GGoogleoogle hhasas sseeneen 330 trillion0 trillion UURLs,RLs, ccrawlsrawls ooverver 220 billion0 billion ooff tthosehose a dday,ay, aandnd aanswersnswers 1100 billion00 billion ssearchearch queriesqueries a mmonth.onth. AnalyzingAnalyzing eevenven oone day’sne day’s wworthorth ooff datadata ooff thisthis sizesize isis virtuallyvirtually impossibleimpossible withwith conventionalconventional ddatabases.atabases. TThehe cchallengehallenge ooff ddealingealing wwithith ddatasetsatasets ooff tthishis ssizeize lleded ttoo tthehe ddevelopmentevelopment ooff sseveraleveral ttoolsools ttoo mmanageanage aandnd aanalyzenalyze bbigig ddata.ata. A nnumberumber ooff tthesehese ttoolsools aarere pproprietaryroprietary ttoo Google,Google, butbut havehave beenbeen describeddescribed inin aacademiccademic ppublicationsublications inin ssuffiuffi ccientient detaildetail thatthat open-sourceopen-source implementationsimplementations havehave bbeeneen ddeveloped.eveloped. TTable 1able 1 ccontainsontains bbothoth thethe GoogleGoogle namename andand thethe namename ofof relatedrelated oopen-sourcepen-source ttools.ools. FFurtherurther ddetailsetails ccanan bbee ffoundound iinn tthehe WWikipediaikipedia eentriesntries aassociatedssociated wwithith tthehe ttoolool nnames.ames. TThoughhough tthesehese ttoolsools ccanan bbee rrunun oonn a ssingleingle ccomputeromputer fforor llearningearning ppurposes,urposes, rrealeal aapplicationspplications uusese llargearge cclusterslusters ooff ccomputersomputers ssuchuch aass tthosehose pprovidedrovided bbyy AAmazon,mazon, GGoogle,oogle, MMicrosoft,icrosoft, aandnd ootherther ccloud-computingloud-computing pproviders.roviders. TThehe aabilitybility ttoo rrentent rratherather tthanhan bbuyuy ddataata sstoragetorage aandnd pprocessingrocessing hhasas tturnedurned wwhathat wwasas ppreviouslyreviously a fi xxeded ccostost ooff ccomputingomputing iintonto a vvariableariable ccostost aandnd hhasas lloweredowered tthehe bbarriersarriers ttoo eentryntry fforor wworkingorking wwithith bbigig ddata.ata. Tools to Analyze Data TThehe ooutcomeutcome ooff tthehe bbig-dataig-data pprocessingrocessing ddescribedescribed aabovebove iiss ooftenften a ““small”small” ttableable ooff ddataata tthathat mmayay bbee ddirectlyirectly hhumanuman rreadableeadable oorr ccanan bbee lloadedoaded iintonto aann SSQLQL ddatabase,atabase, a sstatisticstatistics ppackage,ackage, oorr a spreadsheet.spreadsheet. IfIf thethe extractedextracted datadata isis stillstill inconvenientlyinconveniently llarge,arge, iitt iiss ooftenften ppossibleossible ttoo selectselect a subsamplesubsample forfor statisticalstatistical analysis.analysis. AtAt Google,Google, forfor eexample,xample, II have have ffoundound tthathat rrandomandom ssamplesamples onon tthehe oorderrder ooff 0.1 percent0.1 percent wworkork fi nene fforor aanalysisnalysis ooff bbusinessusiness ddata.ata. OOncence a datasetdataset hhasas bbeeneen eextracted,xtracted, itit isis ooftenften necessarynecessary toto dodo somesome explor-explor- aatorytory ddataata aanalysisnalysis alongalong withwith cconsistencyonsistency aandnd ddata-cleaningata-cleaning tasks.tasks. ThisThis isis somethingsomething Hal R. Varian 5 Table 1 Tools for Manipulating Big Data Google name Analog Description Google File System Hadoop File System This system supports fi les so large that they must be distributed across hundreds or even thousands of computers. Bigtable Cassandra This is a table of data that lives in the Google File System. It too can stretch over many computers. MapReduce Hadoop This is a system for accessing and manipulating data in large data structures such as Bigtables. MapReduce allows you to access the data in parallel, using hundreds or thousands of machines to extract the data you are interested in. The query is “mapped” to the machines and is then applied in parallel to different shards of the data. The partial calculations are then combined (“reduced”) to create the summary table you are interested in. Sawzall Pig This is a language for creating MapReduce jobs. Go None Go is fl exible open-source, general-purpose computer language that makes it easier to do parallel data processing. Dremel, BigQuery Hive, Drill, Impala This is a tool that allows data queries to be written in a simplifi ed form of of Structured Query Language (SQL). With Dremel it is possible to run an SQL
Recommended publications
  • Eliciting Informative Feedback: the Peer$Prediction Method
    Eliciting Informative Feedback: The Peer-Prediction Method Nolan Miller, Paul Resnick, and Richard Zeckhauser January 26, 2005y Abstract Many recommendation and decision processes depend on eliciting evaluations of opportu- nities, products, and vendors. A scoring system is devised that induces honest reporting of feedback. Each rater merely reports a signal, and the system applies proper scoring rules to the implied posterior beliefs about another rater’s report. Honest reporting proves to be a Nash Equilibrium. The scoring schemes can be scaled to induce appropriate e¤ort by raters and can be extended to handle sequential interaction and continuous signals. We also address a number of practical implementation issues that arise in settings such as academic reviewing and on-line recommender and reputation systems. 1 Introduction Decision makers frequently draw on the experiences of multiple other individuals when making decisions. The process of eliciting others’information is sometimes informal, as when an executive consults underlings about a new business opportunity. In other contexts, the process is institution- alized, as when journal editors secure independent reviews of papers, or an admissions committee has multiple faculty readers for each …le. The internet has greatly enhanced the role of institu- tionalized feedback methods, since it can gather and disseminate information from vast numbers of individuals at minimal cost. To name just a few examples, eBay invites buyers and sellers to rate Miller and Zeckhauser, Kennedy School of Government, Harvard University; Resnick, School of Information, University of Michigan. yWe thank Alberto Abadie, Chris Avery, Miriam Avins, Chris Dellarocas, Je¤ Ely, John Pratt, Bill Sandholm, Lones Smith, Ennio Stachetti, Steve Tadelis, Hal Varian, two referees and two editors for helpful comments.
    [Show full text]
  • Annual Report 2018
    2018Annual Report Annual Report July 1, 2017–June 30, 2018 Council on Foreign Relations 58 East 68th Street, New York, NY 10065 tel 212.434.9400 1777 F Street, NW, Washington, DC 20006 tel 202.509.8400 www.cfr.org [email protected] OFFICERS DIRECTORS David M. Rubenstein Term Expiring 2019 Term Expiring 2022 Chairman David G. Bradley Sylvia Mathews Burwell Blair Effron Blair Effron Ash Carter Vice Chairman Susan Hockfield James P. Gorman Jami Miscik Donna J. Hrinak Laurene Powell Jobs Vice Chairman James G. Stavridis David M. Rubenstein Richard N. Haass Vin Weber Margaret G. Warner President Daniel H. Yergin Fareed Zakaria Keith Olson Term Expiring 2020 Term Expiring 2023 Executive Vice President, John P. Abizaid Kenneth I. Chenault Chief Financial Officer, and Treasurer Mary McInnis Boies Laurence D. Fink James M. Lindsay Timothy F. Geithner Stephen C. Freidheim Senior Vice President, Director of Studies, Stephen J. Hadley Margaret (Peggy) Hamburg and Maurice R. Greenberg Chair James Manyika Charles Phillips Jami Miscik Cecilia Elena Rouse Nancy D. Bodurtha Richard L. Plepler Frances Fragos Townsend Vice President, Meetings and Membership Term Expiring 2021 Irina A. Faskianos Vice President, National Program Tony Coles Richard N. Haass, ex officio and Outreach David M. Cote Steven A. Denning Suzanne E. Helm William H. McRaven Vice President, Philanthropy and Janet A. Napolitano Corporate Relations Eduardo J. Padrón Jan Mowder Hughes John Paulson Vice President, Human Resources and Administration Caroline Netchvolodoff OFFICERS AND DIRECTORS, Vice President, Education EMERITUS & HONORARY Shannon K. O’Neil Madeleine K. Albright Maurice R. Greenberg Vice President and Deputy Director of Studies Director Emerita Honorary Vice Chairman Lisa Shields Martin S.
    [Show full text]
  • The Information Revolution Reaches Pharmaceuticals: Balancing Innovation Incentives, Cost, and Access in the Post-Genomics Era
    RAI.DOC 4/13/2001 3:27 PM THE INFORMATION REVOLUTION REACHES PHARMACEUTICALS: BALANCING INNOVATION INCENTIVES, COST, AND ACCESS IN THE POST-GENOMICS ERA Arti K. Rai* Recent development in genomics—the science that lies at the in- tersection of information technology and biotechnology—have ush- ered in a new era of pharmaceutical innovation. Professor Rai ad- vances a theory of pharmaceutical development and allocation that takes account of these recent developments from the perspective of both patent law and health law—that is, from both the production side and the consumption side. She argues that genomics has the po- tential to make reforms that increase access to prescription drugs not only more necessary as a matter of equity but also more feasible as a matter of innovation policy. On the production end, so long as patent rights in upstream genomics research do not create transaction cost bottlenecks, genomics should, in the not-too-distant future, yield some reduction in drug research and development costs. If these cost re- ductions are realized, it may be possible to scale back certain features of the pharmaceutical patent regime that cause patent protection for pharmaceuticals to be significantly stronger than patent protection for other innovation. On the consumption side, genomics should make drug therapy even more important in treating illness. This reality, coupled with empirical data revealing that cost and access problems are particularly severe for those individuals who are not able to secure favorable price discrimination through insurance, militates in favor of government subsidies for such insurance. As contrasted with patent buyouts, the approach favored by many patent scholars, subsidies * Associate Professor of Law, University of San Diego Law School; Visiting Associate Profes- sor, Washington University, St.
    [Show full text]
  • Ei-Report-2013.Pdf
    Economic Impact United States 2013 Stavroulla Kokkinis, Athina Kohilas, Stella Koukides, Andrea Ploutis, Co-owners The Lucky Knot Alexandria,1 Virginia The web is working for American businesses. And Google is helping. Google’s mission is to organize the world’s information and make it universally accessible and useful. Making it easy for businesses to find potential customers and for those customers to find what they’re looking for is an important part of that mission. Our tools help to connect business owners and customers, whether they’re around the corner or across the world from each other. Through our search and advertising programs, businesses find customers, publishers earn money from their online content and non-profits get donations and volunteers. These tools are how we make money, and they’re how millions of businesses do, too. This report details Google’s economic impact in the U.S., including state-by-state numbers of advertisers, publishers, and non-profits who use Google every day. It also includes stories of the real business owners behind those numbers. They are examples of businesses across the country that are using the web, and Google, to succeed online. Google was a small business when our mission was created. We are proud to share the tools that led to our success with other businesses that want to grow and thrive in this digital age. Sincerely, Jim Lecinski Vice President, Customer Solutions 2 Economic Impact | United States 2013 Nationwide Report Randy Gayner, Founder & Owner Glacier Guides West Glacier, Montana The web is working for American The Internet is where business is done businesses.
    [Show full text]
  • The Next Frontier for Innovation, Competition, and Productivity
    McKinsey Global Institute June 2011 Big data: The next frontier for innovation, competition, and productivity The McKinsey Global Institute The McKinsey Global Institute (MGI), established in 1990, is McKinsey & Company’s business and economics research arm. MGI’s mission is to help leaders in the commercial, public, and social sectors develop a deeper understanding of the evolution of the global economy and to provide a fact base that contributes to decision making on critical management and policy issues. MGI research combines two disciplines: economics and management. Economists often have limited access to the practical problems facing senior managers, while senior managers often lack the time and incentive to look beyond their own industry to the larger issues of the global economy. By integrating these perspectives, MGI is able to gain insights into the microeconomic underpinnings of the long-term macroeconomic trends affecting business strategy and policy making. For nearly two decades, MGI has utilized this “micro-to-macro” approach in research covering more than 20 countries and 30 industry sectors. MGI’s current research agenda focuses on three broad areas: productivity, competitiveness, and growth; the evolution of global financial markets; and the economic impact of technology. Recent research has examined a program of reform to bolster growth and renewal in Europe and the United States through accelerated productivity growth; Africa’s economic potential; debt and deleveraging and the end of cheap capital; the impact of multinational companies on the US economy; technology-enabled business trends; urbanization in India and China; and the competitiveness of sectors and industrial policy. MGI is led by three McKinsey & Company directors: Richard Dobbs, James Manyika, and Charles Roxburgh.
    [Show full text]
  • Spring 2020 CWRU Economics Dept Newsletter
    Spring 2020 CWRU Economics Dept Newsletter Hal Varian gives 2019 McMyler Memorial Lecture September 24, 2019: Hal Varian, Chief Economist at Google, visited campus to deliver the annual Howard T. McMyler Memorial Lecture. Varian talked about the future of work, focusing on how shifting demographics and declining popula- tion growth will continue to create a tight labor market even in the face of increased automation. He also met with a small group of economics majors before his presentation, where he answered their questions and offered advice. Pictured are economics majors at the meeting: back row: Aayush Parikh ('20), Hersh Bhatt ('20), Kareem Agag ('20), (Hal Vari- an), James Jung ('22), and Marcus Daly ('20); front row: Alec Hoover ('20), Alexandra Stevens ('20), Claire Jeffress ('22), Lee Radics ('20), Jon Powell ('21), Earl Hsieh ('21), and Peter Durand ('20). Women in Economics club approved Professor Roman Sheremeta gave two TED talks in Women in Economics (WE) aims to empower women by hosting 2019: one in Ukraine at events that foster connections within and beyond CWRU, as well TEDxIvano-Frankivsk over the summer, and one as assist women in professional development to increase their on campus at TEDxCWRU in November. His representation in the economics community. Students from any TEDxCWRU talk, “The Pursuit of Happiness: major or gender may join and/or attend events. Advice from a Behavioral Economist,” is now Their first event will be March 24, 4:00 p.m. in PBL 203: a available on the TEDx Talks YouTube channel. general introduction and social event for anyone interested in participating in the club.
    [Show full text]
  • Google's Economic Impact
    Google’s Economic Impact United States | 2012 The web is working for American businesses. And Google is helping. Google is well-known for helping people search for and find the information they want. Through our search and advertising programs, businesses find customers, publishers earn money from their online content and nonprofits get donations and volunteers. These tools are how we make money, and they’re how millions of businesses do, too. It’s these solutions that make Google an engine for economic growth. According to BIA/Kelsey, 97% of Internet users in the U.S. look online for local products and services, and hundreds of thousands of businesses use the Internet— and Google tools—to reach these potential customers. This economic report details Google’s economic impact across the country, including state-by-state numbers of advertisers, publishers, and non-profits who use Google every day. But there are real stories behind all those numbers, so we’ve included examples of businesses that are using the web, and Google, to succeed online. Google was a small business not that long ago and we are proud to share the tools of our success with other businesses that want to grow and thrive in the age of the Internet. Sincerely, Allan Thygesen Vice President, Global SMB Sales Google’s Economic Impact Where we get the numbers Aside from being a well-known search engine, Google is also a successful advertising company. We make most of our revenue from the ads shown next to our search results, on our other websites, and on the websites of our partners.
    [Show full text]
  • Hidden Value: How Consumer Learning Boosts Output
    Hidden Value: How Consumer Learning Boosts Output BY LEONARD NAKAMURA phones. Ipads. Wikipedia. Google Maps. Yelp. TripAdvisor. product exists and then what its char- New digital devices, applications, and services offer advice acteristics and performance are like. and information at every turn. The technology around This information acquisition in turn I us changes fast, so we are continually learning how best lowers the risk associated with any to use it. This increased pace of learning enhances the given purchase and, on average, will satisfaction we gain from what we buy and increases its value to us over raise the amount of pleasure or use we time, even though it may cost the same — or less. However, this effect get from it. of consumer learning on value makes inflation and output growth more Consider all the information avail- difficult to measure. As a result, current statistics may be undervaluing able to help us decide to see a movie. household purchasing power as well as how much our economy We can look at trailers in the theater or online; we can read reviews and produces, leading us to believe that our living standards are declining compare the number of stars the movie when they are not. gets from critics or fellow moviegoers; and we can ask our friends. Similarly, when deciding on a restaurant, we can This disconnect has implications Then we will turn to theories of con- consult online sources like Yelp, Zagat, for policy. Economists are more famil- sumer preferences and behavior that or Chowhound; we can examine the iar with how learning makes us better take learning into account.
    [Show full text]
  • Eliciting Informative Feedback: the Peer-Prediction Method
    Eliciting Informative Feedback: The Peer-Prediction Method Nolan Miller, Paul Resnick, and Richard Zeckhauser∗ June 15, 2004† Abstract Many recommendation and decision processes depend on eliciting evaluations of opportu- nities, products, and vendors. A scoring system is devised that induces honest reporting of feedback. Each rater merely reports a signal, and the system applies proper scoring rules to the implied posterior belief about another rater’s report. Honest reporting proves to be a Nash Equilibrium. The scoring schemes can be scaled to induce appropriate effort by raters and can be extended to handle sequential interaction and continuous signals. We also address a number of practical implementation issues that arise in settings such as academic reviewing and on-line recommender and reputation systems. ∗Miller and Zeckhauser, Kennedy School of Government, Harvard University; Resnick, School of Information, University of Michigan. †We thank Alberto Abadie, Chris Avery, Miriam Avins, Chris Dellarocas, Jeff Ely, John Pratt, Bill Sandholm, Lones Smith, Ennio Stachetti, Steve Tadelis, Hal Varian, two referees and two editors for helpful comments. We gratefully acknowledge financial support from the National Science Foundation under grant numbers IIS-9977999 and IIS-0308006, and the Harvard Business School for hospitality (Zeckhauser). 1Introduction We frequently draw on the experiences of multiple other individuals when making decisions. The process can be informal. Thus, an executive deciding whether to invest in a new business opportu- nity will typically consult underlings, each of whom has some specialized knowledge or perspective, before committing to a decision. Similarly, journal editors secure independent reviews of papers, and grant review panels and admissions and hiring committees often seek independent evaluations from individual members who provide input to a group decision process.
    [Show full text]
  • Big Data: New Tricks for Econometrics
    Big Data: New Tricks for Econometrics Hal R. Varian∗ June 2013 Revised: April 14, 2014 Abstract Nowadays computers are in the middle of most economic transactions. These \computer-mediated transactions" generate huge amounts of data, and new tools can be used to manipulate and analyze this data. This essay offers a brief introduction to some of these tools and meth- ods. Computers are now involved in many economic transactions and can cap- ture data associated with these transactions, which can then be manipulated and analyzed. Conventional statistical and econometric techniques such as regression often work well but there are issues unique to big datasets that may require different tools. First, the sheer size of the data involved may require more powerful data manipulation tools. Second, we may have more potential predictors than appropriate for estimation, so we need to do some kind of variable selection. Third, large datasets may allow for more flexible relationships than simple ∗Hal Varian is Chief Economist, Google Inc., Mountain View, California, and Emeritus Professor of Economics, University of California, Berkeley, California. Thanks to Jeffrey Oldham, Tom Zhang, Rob On, Pierre Grinspan, Jerry Friedman, Art Owen, Steve Scott, Bo Cowgill, Brock Noland, Daniel Stonehill, Robert Snedegar, Gary King, Fabien Curto- Millet and the editors of this journal for helpful comments on earlier versions of this paper. 1 linear models. Machine learning techniques such as decision trees, support vector machines, neural nets, deep learning and so on may allow for more effective ways to model complex relationships. In this essay I will describe a few of these tools for manipulating and an- alyzing big data.
    [Show full text]
  • Don't Be Evil
    220 Chapter 13 “Don’t Be Evil” and Beyond for High Tech Organizations: Ethical Statements and Mottos (and Responsibility) Jo Ann Oravec University of Wisconsin – Whitewater, USA ABSTRACT Societal pressures on high tech organizations to define and disseminate their ethical stances are increasing as the influences of the technologies involved expand. Many Internet-based businesses have emerged in the past decades; growing numbers of them have developed some kind of moral declaration in the form of mottos or ethical statements. For example, the corporate motto “don’t be evil” (often linked with Google/ Alphabet) has generated considerable controversy about social and cultural impacts of search engines. After addressing the origins of these mottos and statements, this chapter projects the future of such ethi- cal manifestations in the context of critically-important privacy, security, and economic concerns. The chapter analyzes potential influences of the ethical expressions on corporate social responsibility (CSR) initiatives. The chapter analyzes issues of whether “large-grained” corporate mottos can indeed serve to supply social and ethical guidance for organizations as opposed to more complex, detailed codes of ethics or comparable attempts at moral clarification. INTRODUCTION Evil is whatever Sergey [Brin] says is evil. - Eric Schmidt, former Executive Chairman of Google, as quoted in Vise and Malseed (2005) How do organizations make sense of the panoply of ethical issues they face, especially in rapidly-changing technological and social environments? Challenges are expanding for high tech research and development organizations as their technologies grow in societal impact (Broeders & Taylor, 2017), from consider- ing the problems of young people confronting cyberbullies (Oravec, 2012) to the use of social media DOI: 10.4018/978-1-5225-4197-4.ch013 Copyright © 2018, IGI Global.
    [Show full text]
  • Search, Antitrust, and the Economics of the Control of User Data
    Search, Antitrust, and the Economics of the Control of User Data Nathan Newmant This Article argues for reorienting many antitrust investigations-and more generally regulatory approaches-tofocus on how control of personal data by corporations can entrench monopoly power in an economy shaped increasingly by the power of "big data." What is largely missed in analyses defending Google from antitrust action is how Google's ever expanding control of user personal data and its critical value to online advertisers creates an insurmountable barrierto entry for new competition. Moreover, Google did not just inherit that business advantage through its innovation in search engine technology. Rather, Google has aggressively expanded its control of user data both by moving into new product sectors to collect additionaluser data and by invading users'privacy. Introduction: The Undermining of Competition in Online Markets...............402 A. How Google Uses Controlof User Information to Strengthen its Monopoly........... ................... ............. 405 B. The Antitrust Case Against Google ........................... 407 C. Remedies and the Argumentfor Aggressive and Earlier Regulatory Intervention.......................... 411 I. How Google Dominates the Search Advertising Market.. ............. 412 A. How the Search Advertising Market Works ......... ........ 413 B. Measuring Google's Dominance.....................414 C. Why Search Advertising is the Relevant Marketfor Antitrust Analysis ......................................... 416 D. Google's
    [Show full text]