<<

Downloaded by guest on September 28, 2021 elt h vlaino ag opr fnw tre vralong scale a over not stories do of but corpora large valuable of (e.g., are evaluation the searches that to keyword well 11)—approaches or content and coders 4 investigative human refs. measuring on investigative relied to of largely approaches have production date, the this To for measuring news. have are to (8–10), and ill-suited work weather previous reason, or media in sports of extensively pandemics, employed topic categories been wars, fixed latent as labeling and such for phrases, coverage well work predetermined which or new models, methods, entities light Clustering of to public. previously counts bring not articles was that investigative information of definition, measure by some challenging, because is requires content investigative however, Measuring investigativeness. content, investigative for investigative of be production therefore continued disap- might the The decades content. imperil recent 7). to the in (6, emer- revenues expected on revenues ad the depended of advertising Historically, press pearance “watchdog” (5). of that independent mid-2000s by compet- growth an revenues the exacerbated a of advertising since is in gence in that faced undersupplied declines worry be have press this steep (4)—a to the that marketplace likely the of concerns news is functions raised content itive public long of important have investment— kind most scholars the and (3). of knowledge one local is often deep which and interest, requires public of information new uncovers journalism—reporting that Investigative coverage. newspaper of tent period. this throughout there production place the society, took in changes that democratic the news a of of review in systematic a reporting been news not has of role (2). the discussion public staffing about extensive overall inspired dozens in have changes with declines these along steady Although and (1), 1,800 changes States nearly ownership United been of the have in there closures 2004, Since newspaper are adjust. to local technologies forced media, being different emerging to consumption As news shift announcements. event community L impact journalistic a to corresponding decade, the layoffs. of newsroom y of 2 wave last but recent the examined, in period investiga- decline time of past notable the of a quantity the most the over over in produced States articles stability tive United surprising pro- the find news in We decade. in newspapers local patterns by cross-sectional method duction our and use over-time We examine articles. arti- subsequent to on on based that influence articles and reporting to text news method of cle of a content types propose investigative we the the paper, on measure this In changes produce. these newsrooms print of local character- to impact from attempts the systematic ize away few been decades. have few reorientation there last However, the a in commonplace become and 2021) advertising—have newsrooms—ownership 25, March local layoffs, review of for operation (received restructuring, 2021 the 24, to May approved changes and Major MA, Cambridge, University, Harvard Waters, C. Mary by Edited a Turkel Eray in newspapers journalism local investigative measuring for method A PNAS rdaeSho fBsns,Safr nvriy tnod A94305 CA Stanford, University, Stanford Business, of School Graduate nesadn httecagsi h esidsr mean industry news the in changes the what Understanding con- investigative the measuring on focus we article, this In rudraigivsiain olclsot oeaeand coverage sports from ranging local reporting to of investigations array groundbreaking an provide newsrooms ocal 01Vl 1 o 0e2105155118 30 No. 118 Vol. 2021 a,1 | ns Saha Anish , oa news local | ahn learning machine a ht asnOwen Carson Rhett , a rgr .Martin J. Gregory , doi:10.1073/pnas.2105155118/-/DCSupplemental ls epciey diinldtiso u aaadmdlaeprovided arti- are investigative model labeled and 119 data and our pages. hand-selected 213 on in front details had were Additional sets newspapers’ 3) respectively. newsletter test cles, local or weekly and from contest; validation the The reporting for IRE reporters investigative journalism Reporters regular of Investigative relevant showcases a team the a for of a for “investigative” database by (IRE) runner-up as the Editors or into labeled place entered and were were first that 2) won articles award; 1) 562 they used because we (n (n training, tuning 2019 and our hyperparameter in 2010 and published between (n validation articles published for training used Articles for three were into used testing: data were and our split 2017 training We content. for investigative groups predict to model work of effect characteristic measurable prevalent a stories—a had news that future news. articles investigative and identify discourse us (13). public articles help on scientific they of context, impact scholarly our evaluate been In to have work models additional previous influence classifier in Document of used the alone. influences text provides the measured which beyond the features, information additional used as and unsu- topics article month, on an each subsequent articles trained newspaper’s the we of each in Finally, Freedom of discussed (4). model the cases) influence of document court mentions pervised and (e.g., known audits, writing are Act, investigative that Information in terms of common groups be fea- specific to custom of extracted occurrence we the a measuring Second, using tures (12). article model each embedding of word representations which frequency pretrained high-dimensional article), each create document in to used a used phrases infor- built we two-word are and first that (words n-grams features We of descriptive content. matrix of investigative set rich about a mative generate to article each for States United content. the investigative of producing regions of selection different history a a in by have located and 2020 are and that 2010 newspapers text between 50 full of published the and articles collect collects Drawing for we y. that newspapers, metadata database from 10 and articles news past of a the versions NewsBank, digital over published by archives States provided articles United archive news an the of from across corpus newspapers comprehensive local a by on relies classifier Our Methods and Materials analysis. the our call for criterion we (which predicted investigative or the is is and “score” article which given stories, classifier, a news our that of future probability output in The an discussed content. on topics text based on investigativeness impact predict stories. to article’s trained news is investigative classifier unsuper- identify Our and to supervised approaches develop learning mixes we vised which challenge, algorithm measurement classification this a address To span. time ulse uy1,2021. 19, July Published at online information supporting contains article This 1 under distributed BY).y is (CC article access open This interest.y competing performed no declare research, authors designed The paper.y S.V. the wrote and and G.J.M., data, analyzed R.C.O., research, A.S., E.T., contributions: Author owo orsodnemyb drse.Eal [email protected] Email: addressed. be may correspondence whom To IAppendix SI sn h ulsto etrsa nus etandanua net- neural a trained we inputs, as features of set full the Using metadata and text raw the processed we classifier, the train to order In a n hsaaVasserman Shoshana and , p . hogotteppr,i sda h evaluation the as used is paper), the throughout https://doi.org/10.1073/pnas.2105155118 = 0,3)wr sdfrtsigol.For only. testing for used were 409,233) = ,0,9) rilspbihdi 2018 in published articles 5,005,696); raieCmosAtiuinLcne4.0 License Attribution Commons Creative . y https://www.pnas.org/lookup/suppl/ a which Matters, Local = 1,3) and 511,834); | f3 of 1

SOCIAL SCIENCES BRIEF REPORT ventions across newspapers, front page news sections fea- ture the most investigative articles by a large margin, fol- lowed by local/state and national sections. Similarly, the leading authors are all distinguished investigative journal- ists with lengthy portfolios of investigations spanning fraud,

Fig. 1. Validation and case studies. (i) Most-occurring section names and authors in predicted investigative articles, among articles for which we have section name and author information. (ii) Case studies in unseen (post-2018) validation and test data. We identify investigative articles on various topics. A: Rich Rodriguez and Don Shooter scandals; B: Tucson housing crisis; C, D: US/Mexico border wall and family separations along the border; E: Buffalo Water Authority and Percoco corruption cases; F, G, H: Buffalo Diocese and Boy Scout Organization sex scandals; I, K: foster care in Florida, and political campaign spending; J: use of DNA evidence in criminal courts; L: California Camp Fire; M: Gilroy mass shooting, and criminal investigations around the Golden State Killer case. Links to all articles are provided in SI Appendix.

Results Validation. We show that our classifier does well at identifying several hallmarks of investigative quality. First, our classifier suc- cessfully predicts articles in unseen data that were handpicked for the Local Matters newsletter and/or were ultimately recog- nized with investigative journalism awards. Using a threshold value of p = 0.9 in the test set, our model correctly identifies 80/119 award winners (for a recall value of 0.66), classifies 4,218 other non−award-winning articles as investigative, and classifies 404,894 articles as not investigative. Our classifier systematically identifies highly productive authors and assigns high average scores to sections and out- ∗ lets that specialize in investigative work. Fig. 1, i presents Fig. 2. Descriptive analysis. Major layoff events are marked in black on the x the authors and section names with the highest numbers of axis. Plotted values are 6-mo rolling averages. (i) Counts of articles that have articles that are predicted to be investigative by the classi- a high score (p > 0.9), grouped by newspaper origin. Twenty-eight newspa- fier. Although there is significant variation in naming con- pers in our dataset originated from large metropolitan areas (>1 million metro population), 20 are from small metropolitan areas (<1 million pop- ulation), and 2 were published online nationally. (ii) Eight newspapers that *Information about author identities is not an input to the predictive model, and thus have been acquired by “investment” firms according to the UNC newspapers this constitutes a validation check on the model predictions. database (1). Red lines represent the date of the ownership change.

2 of 3 | PNAS Turkel et al. https://doi.org/10.1073/pnas.2105155118 A method for measuring investigative journalism in local newspapers Downloaded by guest on September 28, 2021 Downloaded by guest on September 28, 2021 ihawv flyf vns(lte ttebto fFg 2, Fig. of bottom 2019. the into coincides continued at and drop (plotted mid-2018 This events in began layoff respectively. that of 0.65), wave = a (SD with 0.36 and are The 1.38), outlets papers. national small- large-metro and large-metro, metro, for the coefficients trend at time monthly concentrated post-2019 2019, in starting 4. epciey lowt ml oiietm rns((6.7 trends time positive small inves- with be = also to respectively, (SD predicted are 0.7 that is stories large-metro, tigative news outlets. national for of for share 0.025) 0.08) average = The (SD = 0.08 and (SD small-metro, reveals of for 0.85 trend 0.04) most time of for monthly coefficient overall stories a an on a investigative analysis find of Regression we output period. period, the the this in during trend industry upward news the Public consolida- in and for turmoil tion Center the given focus the into surprisingly, and that Perhaps Integrity). newspapers (ProPublica publications content of online-only investigative national sample on specialist the two newspapers, split and large-metro We newspapers, small-metro size. groups: and three area structure metro industry by newspaper in changes to news staffing. relates investigative of it production the as in trends broad examine we Analysis. Descriptive care Buf- foster Florida the investigating in articles facilities. abuse seven and sexual Diocese investigating falo 2018–2019 in published ehdfrmauigivsiaiejunls nlclnewspapers local in journalism investigative measuring for method A al. et Turkel exam- Some 1, Fig. investment. in plotted fixed investigative series large investigative for of a format ples requires common are series—a that that multipart corruption, reporting articles in a identifies abuse, crises reliably of sex also as part classifier separation, such Our 1, cor- family crime. topics Fig. and overwhelmingly rights, on in peaks gun stories the counted housing, investigative winner,” peak. articles to “award the the responded an to plot- of as contributed peaks which one tagged the articles only of data the test Although each examined and For validation we To newspapers. the ted, dataset. in four 0.9 our for than in (post-2018) higher label score “winning” a 1, received a Fig. in receive this, nature, not demonstrate in to did investigative able clearly that is are but it that articles, articles investigative classify correctly award-winning of set defined hazards, environmental more. and misconduct, prosecutorial corruption, .M erv,Nwppr n ate:Hwavriigrvne rae nindepen- an created revenues advertising How parties: and Newspapers Petrova, M. 7. Hamilton, T. J. 6. Cag J. Angelucci, C. 5. Hamilton, T. J. 4. Glasser, T. Ettema, coverage. S. news J. political affect 3. resources reporting How cuts: and Paper Innovation Peterson, for E. Center 3, 2. rep. (Tech. desert” news expanding “The Abernathy, P. 1. 8 oee,Fg 2, Fig. However, i.2, Fig. utemr,atog u lsie stando narrowly a on trained is classifier our although Furthermore, etpress. dent 2011). Microecon. 2016). MA, Cambridge, Press, University (Harvard Virtue Sci. Pol. J. 2018). Media, Local in Sustainability × 10 Clmi nvriyPes 1998). Press, University (Columbia −5 i 4–5 (2021). 443–459 65, 0.7% ihSD with , m oi.Si Rev. Sci. Polit. Am. 1–6 (2019). 319–364 11, hw h vrl eeso u esr,aggregated measure, our of levels overall the shows l h esTa’ i oSell to Fit That’s News the All eorc’ eetvs h cnmc fIvsiaieJournalism Investigative of Economics The Detectives: Democracy’s and ,Nwppr ntmso o detsn revenues. advertising low of times in Newspapers e, ´ utdaso osine netgtv oraimadPublic and Journalism Investigative Conscience: of Custodians sa lutaino h tlt fordataset, our of utility the of illustration an As < 0.5% i lososapeiiosdo noutput in drop precipitous a shows also 10 9–0 (2011). 790–808 105, −6 nlreadsalmtonewspapers, metro small and large in epo h on fatce that articles of count the plot we ii, nboth). in −7. PictnUiest rs,Pictn NJ, Princeton, Press, University (Princeton S 1.32), = (SD ii nld 8articles 48 include −4.2 × m cn J. Econ. Am. 10 S = (SD −5 i and was Am. i) 3 .Grw .H,J odGae,D .Be,J .Eas esrn icrieinfluence discursive Measuring Evans, A. J. Blei, M. D. Boyd-Graber, J. Hu, Y. Gerow, A. 13. local Joulin producing A. “Who’s 12. McCullough, K. Weber, M. Napoli, P. Wang, Q. Mahone, J. 11. Cag J. 10. ttshsbe eoie nHradDtvre(10.7910/DVN/HSZ2QL). Dataverse winning Harvard award in and deposited scores, been predicted has inputs, status network neural dates, names, the of Availability. Data organization the consequences. on public questions its and of industry variety news researchers a to useful in publicly be will interested is dataset this scores that expect impact investigativeness We full available. predicted with their articles our million seen and 5.9 of have metadata dataset not article-level most complete may Our its we yet. slow- are of and restructuring also processes, one and it conse- moving downsizing However, on feared. that as the caution industry, bad the as news that suggests be not the hope may in outputs, some important changes offers of evidence quences descriptive This peers. its Discussion to relative shift measure our noticeably in not output average did paper’s con- employees, the Times-Union, paper, part-time Florida that among the at centrated news at investigative layoffs the precipi- of more-limited of a measure whereas employees our by in 200+ followed decline was all tous 2018 to in offered American-Statesman buyout Austin The 2, a metric. equal; Fig. accom- our created often in in declines which plots of events, outlet-level predictive no layoff are articles. have that acquisitions, investigative newspapers note pany of also most production we cases, their However, some in in by change followed metric noticeable are our changes rela- ownership in strong while a drops visible; Overall, not change. ownership is 0.22 an tionship by that after decreases reveals 0.485) month = per newspapers (SD articles eight investigative of these number in the news- status and ownership averages, regional paper trends, time Regression sus- monthly content. to using investigative led of analysis output companies the ownership investment in declines changed by tained papers that of dataset acquisitions that our category. group” in “investment the papers into eight for level in finan- journalism on to investigative leading weight in newsrooms. group−owned (1), investments more investment benefits shrinking place community of scholars to groups worries Some relative such funds. profitability equity that cial private argued and have funds hedge groups: † ote“netetgop aeoyi hspro,adte dnie custo dates acquisition identified then changed reports. and that period, news newspapers this contemporaneous identify in from to category ownership group” “investment of the (1) to database Media’s Local in ability .G ig .Sher .Wie o h esmdaatvt ulcepeso and expression public activate media news the How White, A. Schneer, B. King, FindMyPast G. Cristianini; N. 9. Lewis, J. Thompson, J. Sudhahar, S. Lansdall-Welfare, T. 8. erle nteUiest fNrhCrln UC etrfrInvto n Sustain- and Innovation for Center (UNC) Carolina North of University the on relied We i.2, Fig. investment by newspapers of acquisitions at look next We cosscholarship. across 2021). March 1 (Accessed https://arxiv.org/abs/1612.03651 2019). (2016). University, Duke at 1, Democracy rep. & (Tech. Media types” for outlet Center Wallace different DeWitt across output journalistic Assessing journalism? Stud. Econ. agendas. national influence periodicals. British U.S.A. of Sci. years 150 of analysis Content Team, Newspaper ,N Herv N. e, ´ ii tal et 47E6 (2017). E457–E465 114, hw iesre lt formauea h monthly the at measure our of plots series time shows 1626 (2020). 2126–2164 87, ,Fstx.i:Cmrsigtx lsicto oes ri [Preprint] arXiv models. classification text Compressing Fasttext.zip: ., ,M-.Vad h rdcino nomto na nieworld. online an in information of production The Viaud, M.-L. e, ´ h ril-ee aae ihatos ils newspaper titles, authors, with dataset article-level The rc al cd c.U.S.A. Sci. Acad. Natl. Proc. Science 7–8 (2017). 776–780 358, ii ugs htntallyfsare layoffs all not that suggest https://doi.org/10.1073/pnas.2105155118 3831 (2018). 3308–3313 115, † efidlmtdevidence limited find We PNAS rc al Acad. Natl. Proc. | f3 of 3 Rev.

SOCIAL SCIENCES BRIEF REPORT