Value of the Online Newspaper Collection at the University of Illinois at Urbana-Champaign Library
Kirk Hess, Digital Humanities Specialist
Sarah Hoover, History, Philosophy, and Newspaper Library Graduate Assistant
University Library, University of Illinois at Urbana-Champaign

Research Questions
How did our patrons access digitized newspaper content in 2012 and 2013?
Can we develop a reliable metric to measure the relative usefulness of each title and/or aggregator?
  Utility
  Cost
  Quality

Online Newspaper Sources Included in Study
EbscoHost: Newspaper Source
Gale: newspaper and newswire titles from Academic OneFile, Opposing Viewpoints in Context, and Biography in Context
LexisNexis Academic: news sources
NewsBank: Access World News, Archives of Americana, and World Newspaper Archive
ProQuest: newspaper, historical newspaper, and wire feed titles
Library PressDisplay

Factor One: Utility Measure
Most vendors provided COUNTER JR1 reports, which track successful full-text article requests by title
LexisNexis does not provide JR1 reports and has its own article-level usage statistics
PressDisplay is tracked by issues read rather than by article
Filtered by type of content
  Newspapers
  Newswires
  Historical newspapers

Vendor-Level Usage
Newspaper Article Requests by Vendor, 2012-2013
[Figure: successful full-text article requests by vendor (y-axis 0 to 100000); series: EbscoHost, Gale, LexisNexis, NewsBank, ProQuest. Annotations on three peaks:]
  This point includes data mining of a number of UK newspaper titles
  This point includes data mining of the San Jose Mercury News (CA) and San Mateo County Times (CA)
  This point includes data mining of the Wall Street Journal, Chicago Defender, and Financial Times (London)

PressDisplay Usage, 2012-2013
[Figure: PressDisplay usage; y-axis: Issues Read, 0 to 700]

EbscoHost Usage by Title (Log scale)
[Figure: two panels, 2012 and 2013; y-axis: Successful Full-Text Article Requests (log scale); x-axis: Individual Titles]

Gale Usage by Title (Log scale)
[Figure: two panels, 2012 and 2013; y-axis: Successful Full-Text Article Requests (log scale); x-axis: Individual Titles]

LexisNexis Usage by Title (Log scale)
[Figure: two panels, 2012 and 2013; y-axis: Successful Full-Text Article Requests (log scale); x-axis: Individual Titles and Aggregate Categories]

NewsBank Usage by Title (Log scale)
[Figure: two panels, 2012 and 2013; y-axis: Successful Full-Text Article Requests (log scale); x-axis: Individual Titles]

ProQuest Usage by Title (Log scale)
[Figure: two panels, 2012 and 2013; y-axis: Successful Full-Text Article Requests (log scale); x-axis: Individual Titles]

PressDisplay Usage by Title (Log scale)
[Figure: two panels, 2012 and 2013; y-axis: Issues Read (log scale); x-axis: Individual Titles]

Title Usage by Vendor, 2012-2013
[Figure: number of titles (x-axis 0 to 10000) for EbscoHost, Gale, LexisNexis, NewsBank, ProQuest, and PressDisplay in 2012 and 2013, split into: Titles with 2 or more requests, Titles with 1 request, Titles with 0 requests]
*Source of data for total titles varies by vendor; some title counts post-date the year of usage, thus making the total number of titles with 0 requests an approximation.

Title-Level Usage Coverage Notes
Coverage ranges for individual titles from different vendors vary
ProQuest titles are separated in the graphs into ProQuest and ProQuest Historical based on the designation within ProQuest databases, but there is generally an overlap in coverage:
  ProQuest Historical: ~19th century - 1990s or 2000s
  “Current” ProQuest: ~1980s – current
Coverage for Gale, LexisNexis, NewsBank generally starts in the 1980s

Title-Level Methods and Notes
Analysis brought together separate data points for print and online versions as well as varied date ranges from ProQuest Historical titles under a single title heading
Variations in vendor naming practices limited the effectiveness of automatic matching for identifying title overlaps
Individual newspaper titles within LexisNexis are also part of aggregated sets in LexisNexis, with no data available for article views from individual titles within those sets. Examples include:
  Major World Publications
  Combined Newspapers
  All Full-text English
  US Newspapers and Wires

Most-Used Newspaper Titles
[Figure: successful full-text article requests (y-axis 0 to 140000) for each of the most-used titles in 2012 and 2013, broken down by vendor]
Titles shown: San Jose Mercury News (CA), San Mateo County Times (CA), The New York Times, Wall Street Journal, Chicago Tribune, PR Newswire, Chicago Defender, Business Wire, Washington Post, Financial Times (London, England)
Data table rows (one 2012 and one 2013 column per title; blank cells omitted):
ProQuest Historical 33532 27511 2412 1821 24543 13628 7078 17035 6176 5298
ProQuest 28533 81118 3259 2536 7589 5633 1857 305 7584 3180 114 1511 16537
NewsBank 128927 42 86748 5 231 106 472 401 6 4 455 321 4274 108
LexisNexis 10 17655 5455 833 52 430 358 240 50
Gale 6964 6005 67 442 51 119 270 382
EbscoHost 113 83 1 1 5 4 454 379

Anomalous Activity Rates in Most-Requested Titles, 2012
Title                       Vendor      Months of high activity  Peak requests in a single month  Average requests per month  Median requests per month
San Jose Mercury News (CA)  NewsBank    May-July 2012            65639                            10743.9                     17.5
San Mateo County Times (CA) NewsBank    May-July 2012            39002                            7229                        85
New York Times              LexisNexis  Feb-March 2012           7714                             1471.3                      570
PR Newswire                 ProQuest    December 2012            5906                             632.4                       142
Business Wire               ProQuest    December 2012            5906                             632                         131.5
Financial Times (London)    NewsBank    March-April 2012         2985                             356.2                       17.5
The Times (London)          NewsBank    March-April 2012         1557                             267.9                       5.5
The Guardian (London)       NewsBank    March-April 2012         1841                             249.8                       8
Chicago Defender            ProQuest    May 2012                 1782                             154.8                       4

All of these cases appear to represent some degree of data mining.

Anomalous Activity Rates in Most-Requested Titles, 2013
Title                          Vendor    Months of high activity  Peak requests in a single month  Average requests per month  Median requests per month
Wall Street Journal            ProQuest  March 2013               28767                            6759.8                      3779
Chicago Defender (historical)  ProQuest  March 2013               10796                            1419.6                      426.5
Financial Times (London)       ProQuest  March 2013               11051                            1378.1                      278
PR Newswire                    ProQuest  February 2013            2858                             469.4                       167
Business Wire                  ProQuest  February 2013            1305                             265                         114.5
Los Angeles Sentinel           ProQuest  March 2013               1256                             142.6                       8.5
Sacramento Observer            ProQuest  March 2013               1449                             160.4                       4
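
The rows in these tables share one pattern: the peak month is far above the median month. A minimal sketch of a screen for that pattern follows. It is not the method used in the study; the function name, the threshold of 5, and the sample series are all hypothetical, chosen only for illustration.

```python
from statistics import mean, median

def flag_spike(monthly_requests, ratio_threshold=5.0):
    """Summarize one title's monthly request counts and flag a likely spike.

    ratio_threshold is an illustrative assumption, not a value from the study.
    Returns (peak, average, median, flagged).
    """
    peak = max(monthly_requests)
    avg = mean(monthly_requests)
    med = median(monthly_requests)
    # Use at least 1 as the baseline so a zero median does not flag every title.
    flagged = peak > ratio_threshold * max(med, 1)
    return peak, avg, med, flagged

# Hypothetical 12-month series: a quiet title with one burst, and a steady title.
quiet_with_burst = [15, 20, 12, 18, 4000, 17, 16, 19, 14, 21, 13, 18]
steady = [300, 320, 310, 290, 305, 315, 295, 300, 310, 320, 305, 300]

burst_summary = flag_spike(quiet_with_burst)
steady_summary = flag_spike(steady)
```

A flag from a screen like this would only mark a title for manual review; deciding whether the activity is data mining still needs a look at the underlying requests.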
All of these cases appear to represent some degree of data mining.

Top 20 Titles out of Total Requests, 2012 (641436 requests)
[Figure: share of total requests by title]
All other titles (~6326*)  36%
San Jose Mercury News (CA)  20%
San Mateo County Times (CA)  14%
Next 15 titles  12%
The New York Times  9%
Wall Street Journal  5%
Chicago Tribune  4%
Titles not used: ~11520*

Next 15 titles:
PR Newswire  1.40%
Chicago Defender  1.39%
Business Wire  1.25%
Washington Post  1.11%
Financial Times (London)  0.98%
Los Angeles Times  0.96%
The Baltimore Sun  0.73%
The Guardian (London)  0.70%
The Times (London)  0.57%
The News-Gazette (Champaign-Urbana, IL)  0.57%
The State Journal-Register (Springfield, IL)  0.46%
New York Tribune  0.42%
San Francisco Chronicle  0.40%
The Daily Telegraph/Sunday Telegraph (London)  0.35%
The Daily Mail/Mail on Sunday (London)  0.35%

* These numbers are high, as title de-duplication (both between and within vendors) was done manually for the top 20 titles and could not be accurately done automatically

Top 20 Titles out of Total Requests, 2013 (384224 requests)
[Figure: share of total requests by title]
All other titles (~4742*)  42%
Wall Street Journal  22%
Next 15 titles  13%
The New York Times  10%
Chicago Defender  5%
Financial Times (London)  4%
Chicago Tribune  4%
Titles not used: ~12649*

Next 15 titles:
PR Newswire  1.70%
Washington Post  1.63%
Los Angeles Times  1.33%
New York Tribune  1.07%
The State Journal-Register (Springfield, IL)  0.96%
The News-Gazette (Champaign-Urbana, IL)  0.95%
Business Wire  0.91%
The Baltimore Sun  0.84%
The Boston Globe  0.70%
The Guardian (London)  0.67%
Los Angeles Sentinel  0.59%
Times of India  0.55%
McClatchy-Tribune Business News  0.53%
Sacramento Observer  0.50%
Sun Reporter (San Francisco)  0.48%

* These numbers are high, as title de-duplication (both between and within vendors) was done manually for the top 20 titles and could not be accurately done automatically

Usage of Top Ten Major US Newspapers (by circulation), 2012-2013
[Figure: successful full-text article requests (y-axis 0 to 90000) for each newspaper in 2012 and 2013, broken down by vendor]
Titles shown: The Wall Street Journal, The New York Times, USA Today, Daily News of New York, The Washington Post, Chicago Tribune, Los Angeles Times, New York Post, Chicago Sun-Times, The Denver Post
Data table rows (one 2012 and one 2013 column per title; blank cells omitted):
ProQuest Historical 2412 1821 33532 27511 6094 4894 6176 5298 24543 13628
ProQuest 28533 81118 35 8 8 114 7 3259 2536
NewsBank 675 88 6 16 210 20 177 37 6 1273 1306 153 38 231 106
LexisNexis 17655 5455 167 199 77 37 4 9 27 8 430 316 58 32 5
Gale 6964 6005 11 29 4 1 51 119
EbscoHost 4 5 554 414 322 379 1 3
Average weekday circulation: 2,378,827 1,865,318 1,674,306 653,868 516,165 500,521 474,767 470,548 416,676 414,930
Circulation statistics: Alliance for Audited Media. Average weekday circulation October 2012-March 2013. Includes digital editions such as those on tablet computers or restricted websites as well as branded editions, which include regional editions or those tailored for commuters.

Top Ten Titles (Successful Full-Text Article Requests) per Vendor, 2012-2013
EbscoHost Newspaper Source, 2012
M2PressWIRE  689
Toronto Star  624
USA Today  554
The San Francisco Chronicle (San Francisco, CA)  425
Christian Science Monitor  406
Times, The (United Kingdom)  374
Arabia 2000  358
Washington Post  322
Philadelphia Inquirer, The (PA)  276
Evening Standard  274
Total for all journals: 14738

Gale, 2012
The New York Times  6964
The Financial Times  270
States News Service  176
PR Newswire  67
The Times (London England)  60
The Washington Post  46
Sunday Times (London England)  39
CNN Wire  31
BBC Monitoring International Reports  15
The Age (Melbourne Australia)  14
Total for all titles: 7855

EbscoHost Newspaper Source, 2013
M2PressWIRE  498
USA Today  400
Toronto Star  387
Arabia 2000  337
Washington Post  337
Times, The (United Kingdom)  330
Christian Science Monitor  328
All Things Considered (NPR)  209
Morning Edition (NPR)  188
Philadelphia Inquirer, The (PA)  159
Total for all journals: 10077

Gale, 2013
The New York Times  6005
States News Service  988
The Financial Times  382
The Times (London England)  231
UWIRE Text  208
Sunday Times (London England)  127
The Hindu (English)  127
International Business Times - US ed.  121
CNN Wire  114
The Washington Post  110
Total for all titles: 9052

LexisNexis, 2012
The New York Times  17655
Major World Publications  11351
All Full Text, English News  6123
Combined Newspapers  3046
Miscellaneous  2686
International News, Company Info, Business Opps & Analysis, Country Analysis & Legal Info  2613
The State Journal-Register  1041
PR Newswire  833
US News (Papers & Wires)  676
NRC Handelsblad  407
Total for all content: 55159

NewsBank, 2012
San Jose Mercury News (CA)  128927
San Mateo County Times (CA)  86748
Early American Imprints, Series 1  8177
Early American Imprints, Series II. Shaw-Shoemaker  7416
Financial Times (London, England)  4274
Times, The (London, England)  3215
Guardian, The (London, England)  2998
News-Gazette, The (Champaign-Urbana, IL)  2820
Daily Telegraph, The/The Sunday Telegraph (London, England)  2231
Daily Mail, The / The Mail on Sunday  2194
Total for all titles: 394274

NewsBank, 2013
Early American Imprints, Series 1  4214
Early American Imprints, Series II. Shaw-Shoemaker  4161
State Journal-Register, The (Springfield, IL)  3686
Undefined  2988
News-Gazette, The: Web Edition Articles (Champaign-Urbana, IL)  1955
News-Gazette, The (Champaign-Urbana, IL)  1660
Chicago Sun-Times (IL)  1212
Targeted News Service (USA)  1199
US Fed News (USA)  931
Tombstone Epitaph Prospector*  758
Total for all titles: 60717
*all but four requests from this title are from a single month

LexisNexis, 2013
Major World Publications  31773
All Full Text, English News  15492
The New York Times  5455
Combined Newspapers  2183
Miscellaneous  1157
Le Monde  692
US News (Papers & Wires)  513
Daily Deal/The Deal  463
Combined Wire Services  280
USA Today  199
Total for all content: 63888

ProQuest, 2012
Wall Street Journal Eastern edition  23201
New York Times (1923-Current file)  22931
Chicago Daily Tribune (1923-1963)  10993
New York Times (1857-1922)  9373
PR Newswire  7589
Business Wire  7584
Chicago Daily Tribune (1872-1922)  5925
Chicago Tribune (1963-Current file)  5658
Wall Street Journal (Online)  5332
Los Angeles Times (1923-Current File)  4867
Total for all titles: 169410

PressDisplay, 2012 (measured by issues read)
Chicago Tribune  1090
The Washington Post  455
The Moscow Times  346
The Wall Street Journal Europe  321
South China Morning Post  212
Daily Mail  180
Brasil Economico  167
The Washington Post Sunday  130
Valor Economico  130
USA TODAY US Edition  118
Total for all titles: 4909

ProQuest, 2013
Wall Street Journal  65459
New York Times (1923-Current file)  22309
Financial Times  16117
Wall Street Journal (Online)  15659
Chicago Daily Defender (Daily Edition) (1960-1973)  9250
PR Newswire  5633
Chicago Daily Tribune (1923-1963)  5361
The Chicago Defender (National edition) (1921-1967)  5216
New York Times (1857-1922)  4916
Los Angeles Times (1923-Current File)  4085
Total for all titles: 240490

PressDisplay, 2013 (measured by issues read)
Chicago Tribune  774
The Washington Post  220
Brasil Economico  216
NRC Handelsblad  204
The Moscow Times  192
Chicago Sun-Times.com  102
Los Angeles Times  82
USA TODAY International Edition  72
USA TODAY US Edition  67
International New York Times  66
Total for all titles: 3343

Usage Rank and Score
Usage Score
i: row entry
n: newspaper title
v: vendor

Rank score
For each month, get rank score for each title
For each vendor, get monthly average rank score
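
The two rank-score steps can be sketched in Python as follows. The slides do not define the rank score itself, so this sketch assumes it is a title's rank by request count within a month (1 is the most-requested title, tied titles share a rank). That assumption, the function names, and the sample rows are hypothetical.

```python
from collections import defaultdict

def monthly_rank_scores(rows):
    """rows: iterable of (month, vendor, title, requests).

    ASSUMPTION: rank score = dense rank by requests within each month,
    where rank 1 is the most-requested title.
    Returns {(month, vendor, title): rank}.
    """
    by_month = defaultdict(list)
    for month, vendor, title, requests in rows:
        by_month[month].append((vendor, title, requests))

    ranks = {}
    for month, entries in by_month.items():
        # Distinct request counts, highest first, give the dense ranks.
        distinct = sorted({requests for _, _, requests in entries}, reverse=True)
        rank_of = {requests: i + 1 for i, requests in enumerate(distinct)}
        for vendor, title, requests in entries:
            ranks[(month, vendor, title)] = rank_of[requests]
    return ranks

def vendor_monthly_average(ranks):
    """Average the title rank scores for each (month, vendor) pair."""
    totals = defaultdict(lambda: [0, 0])  # [sum of ranks, number of titles]
    for (month, vendor, _title), rank in ranks.items():
        totals[(month, vendor)][0] += rank
        totals[(month, vendor)][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Hypothetical usage rows, not data from the study.
rows = [
    ("2012-01", "VendorA", "Title 1", 500),
    ("2012-01", "VendorA", "Title 2", 20),
    ("2012-01", "VendorB", "Title 3", 120),
    ("2012-01", "VendorB", "Title 4", 20),
    ("2012-02", "VendorA", "Title 1", 10),
    ("2012-02", "VendorB", "Title 3", 300),
]
ranks = monthly_rank_scores(rows)
averages = vendor_monthly_average(ranks)
```

Under this assumption a lower average means a vendor's titles sit nearer the top of the monthly ranking.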
[Figures: 2012 Usage Score, 2012 Rank Score, 2013 Usage Score, 2013 Rank Score]

Factor Two: Cost
Cost per Use
Vendor                                                                       Use (2012)  Cost per Use  Rank
EbscoHost Newspaper Source                                                   14738       $0.00         1
Gale (Academic OneFile, OVC, Biog; newspaper, newswire formats only)         7855        $0.08         2
NewsBank (Access World News, World Newspaper Archive, Archives of Americana) 394274      $0.22         3
LexisNexis Academic (news sources)                                           55159       $0.36         4
ProQuest (newspapers, newswires, historical newspapers)                      169410      $0.85         5
PressDisplay                                                                 4909        $2.15         N/A

Problems with Cost Analysis
EBSCO cost is > $0
Calculating costs for news content within mixed databases: Gale, LexisNexis, and ProQuest
Measurement issues: PressDisplay, LexisNexis
Usage statistics calendar year, cost by Fiscal Year
Each title or database has a cost in addition to the access fee, which was not calculated
Need for more Cost data (FY13 & FY14)
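
One way to handle the calendar-year versus fiscal-year mismatch is to prorate fiscal-year costs onto the calendar year before dividing by use. The sketch below is not the calculation used in the study. It assumes a July-to-June fiscal year labeled by the year in which it ends, and costs spread evenly across months; the dollar amounts and use count are hypothetical.

```python
def cost_per_use(cost, uses):
    """Cost divided by use; None when there was no recorded use."""
    return None if uses == 0 else cost / uses

def calendar_year_cost(fy_costs, year, fy_start_month=7):
    """Prorate fiscal-year costs onto one calendar year.

    ASSUMPTIONS: fy_costs maps a fiscal-year label (the calendar year in
    which that fiscal year ends) to its cost, and cost accrues evenly by month.
    """
    # With a July start, January-June of `year` falls in the fiscal year ending
    # in `year`, and July-December falls in the fiscal year ending in `year + 1`.
    months_in_first = fy_start_month - 1
    months_in_second = 12 - months_in_first
    return (fy_costs[year] * months_in_first
            + fy_costs[year + 1] * months_in_second) / 12

# Hypothetical costs and use count, not figures from the study.
fy_costs = {2012: 10000.0, 2013: 12000.0}
cost_2012 = calendar_year_cost(fy_costs, 2012)
per_use_2012 = cost_per_use(cost_2012, 55000)
```

Returning None for zero use keeps unused titles out of the ranking instead of giving them an undefined cost.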
Factor Three: Quality
Quality
How to measure without objective metrics?
  No Impact Factor
  Newspapers ≠ Journals
Current news: circulation?
  WSJ, NYT have high use, high circulation (#1 & #2)
  USA Today, NY Daily News, low use, high circulation (#3, #5)
Historical news: subjective
  WSJ, NYT still important news sources
Data Mining?

Conclusions
Findings
Measured Utility and Cost, but not Quality.
Ranked titles and vendors on usage, vendors on cost
Certain titles drive use… New York Times, Wall Street Journal
Many titles, but most are never used
  Why are so many newspapers never used?
  Is something wrong with our discovery systems (SFX)?
  Should we pay for content which is never used?
Some titles are duplicated across vendors
Data mining exists: problems, outreach to users?
Need to be able to see usage grouped by purchase group vs. individual titles only in order to be able to match the usage with the costs paid
COUNTER-like statistics needed for non-journal serials that have articles
Need more quality metrics for individual historical newspaper titles
Lack of good metadata hinders analysis (identifiers, uniform title, coverage)
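
The uniform-title problem shows up in the vendor lists, where the same paper appears as "Times, The (United Kingdom)", "The Times (London England)", and "Times, The (London, England)". A minimal sketch of a normalization step that would bring such variants closer together is below. It is an illustration only, not the matching used in the study, and the function name and rules are hypothetical.

```python
import re

def normalize_title(raw):
    """Split a vendor-supplied title into (match key, place qualifier).

    Hypothetical rules: drop a leading or inverted "The", pull a trailing
    parenthetical out as the qualifier, lowercase, and collapse punctuation.
    """
    qualifier_match = re.search(r"\(([^)]*)\)\s*$", raw)
    if qualifier_match:
        qualifier = qualifier_match.group(1).strip()
        name = raw[:qualifier_match.start()]
    else:
        qualifier = ""
        name = raw
    name = name.strip()
    name = re.sub(r",\s*The\b", "", name)   # "Times, The" -> "Times"
    name = re.sub(r"^The\s+", "", name)     # "The Times"  -> "Times"
    key = re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()
    return key, qualifier

ebsco = normalize_title("Times, The (United Kingdom)")
gale = normalize_title("The Times (London England)")
newsbank = normalize_title("News-Gazette, The (Champaign-Urbana, IL)")
plain = normalize_title("The Washington Post")
```

The keys for the two Times variants agree, but their qualifiers ("United Kingdom" and "London England") do not, so a step like this narrows the candidates and still leaves the final match to manual review.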

Next steps
Collect 2014 Usage data
  Coverage analysis
  Language (spec. non-English)
  Rank by Quintile (0-4)
Improve cost analysis
  Collect FY13 & FY14 acquisition data
  Investigate additional costs
  Rank by Quartile (0-3)
Develop Quality metric
  Consult with faculty/staff to develop list
  Boolean (0-1)
Weighted Value Analysis

Literature
Breakstone, Elizabeth R. 2010. “Now how much of your print collection is really online? An analysis of the overlap of print and digital holdings at the University of Oregon Law Library.” Legal Reference Services Quarterly 29 (4): 255–275. doi:10.1080/0270319X.2010.527781.
Cheney, Debora. 2013. “Text mining newspapers and news content: new trends and research methodologies.” Paper presented at: IFLA World Library and Information Congress, 17-23 August 2013, Singapore.
Cooper, Mindy M. 2007. “The importance of gathering print and electronic journal use data: getting a clear picture.” Serials Review 33 (3) (September): 172–174. doi:10.1016/j.serrev.2007.06.001.
Way, Doug. 2010. “The impact of web-scale discovery on the use of a library collection.” Serials Review 36 (4) (December): 214–220. doi:10.1016/j.serrev.2010.07.002.
Wilson, Jacqueline and Li, Chan. “Calculating scholarly journal value through objective metrics.” CDLINFO News. http://www.cdlib.org/cdlinfo/2012/02/13/calculating-scholarly-journal-value-through-objective-metrics
Wood, Elizabeth H. 2006. “Measuring journal usage: add a survey to the statistics?” Journal of Electronic Resources in Medical Libraries 3 (1) (January): 57–61. doi:10.1300/J383v03n01_06.
Zappen, Susan H. 2010. “Managing resources to maximize serials access: the case of the small liberal arts college library.” Serials Librarian 59 (3/4) (October): 346–359. doi:10.1080/03615261003623104.