
Value of the Online Collection at the University of Illinois at Urbana-Champaign Library

Kirk Hess, Digital Humanities Specialist
Sarah Hoover, History, Philosophy, and Newspaper Library Graduate Assistant

University Library
University of Illinois at Urbana-Champaign

Research Questions

• How did our patrons access digitized newspaper content in 2012 and 2013?
• Can we develop a reliable metric to measure the relative usefulness of each title and/or aggregator?
  ◦ Utility
  ◦ Cost
  ◦ Quality

Online Newspaper Sources Included in Study

• EbscoHost: Newspaper Source
• Gale: newspaper and newswire titles from Academic OneFile, Opposing Viewpoints in Context, and Biography in Context
• LexisNexis Academic: news sources
• NewsBank: Access World News, Archives of Americana, and World Newspaper Archive
• ProQuest: newspaper, historical newspaper, and wire feed titles
• Library PressDisplay

Factor One: Utility

Measure

• Most vendors provided COUNTER JR1 reports, which track successful full-text article requests by title
• LexisNexis does not provide JR1 reports and has its own article-level usage statistics
• PressDisplay is tracked by issues read rather than by article
• Filtered by type of content
  ◦ Newswires
  ◦ Historical newspapers

Vendor-Level Usage

[Figure: Newspaper Article Requests by Vendor, 2012-2013. Y-axis: Successful Full-Text Article Requests. Series: EbscoHost, Gale, LexisNexis, NewsBank, ProQuest. Chart annotations: "This point includes data mining of a number of UK newspaper titles"; "This point includes data mining of the San Jose Mercury News (CA) and San Mateo County Times (CA)"; "This point includes data mining of the Journal, Chicago Defender, and (London)".]

PressDisplay Usage, 2012-2013

[Figure: PressDisplay usage, 2012-2013. Y-axis: Issues Read.]

EbscoHost Usage by Title (Log scale)

[Figure: two panels, 2012 and 2013. X-axis: Individual Titles. Y-axis (log scale): Successful Full-Text Article Requests.]

Gale Usage by Title (Log scale)

[Figure: two panels, 2012 and 2013. X-axis: Individual Titles. Y-axis (log scale): Successful Full-Text Article Requests.]

LexisNexis Usage by Title (Log scale)

[Figure: two panels, 2012 and 2013. X-axis: Individual Titles and Aggregate Categories. Y-axis (log scale): Successful Full-Text Article Requests.]

NewsBank Usage by Title (Log scale)

[Figure: two panels, 2012 and 2013. X-axis: Individual Titles. Y-axis (log scale): Successful Full-Text Article Requests.]

ProQuest Usage by Title (Log scale)

[Figure: two panels, 2012 and 2013. X-axis: Individual Titles. Y-axis (log scale): Successful Full-Text Article Requests.]

PressDisplay Usage by Title (Log scale)

[Figure: two panels, 2012 and 2013. X-axis: Individual Titles. Y-axis (log scale): Issues Read.]

Title Usage by Vendor, 2012-2013

[Figure: bar chart of title counts for each vendor and year (EbscoHost, Gale, LexisNexis, NewsBank, ProQuest, PressDisplay; 2012 and 2013). X-axis: number of titles, 0 to 10000. Categories: titles with 2 or more requests, titles with 1 request, titles with 0 requests.]

*Source of data for total titles varies by vendor; some title counts post-date the year of usage, so the total number of titles with 0 requests is an approximation.

Title-Level Usage

Coverage Notes

• Coverage ranges for individual titles from different vendors vary
• ProQuest titles are separated in the graphs into ProQuest and ProQuest Historical based on the designation within ProQuest databases, but there is generally an overlap in coverage:
  ◦ ProQuest Historical: ~19th century - 1990s or
  ◦ “Current” ProQuest: ~1980s – current
• Coverage for Gale, LexisNexis, NewsBank generally starts in the 1980s

Title-Level Methods and Notes

• Analysis brought together separate data points for print and online versions, as well as varied date ranges from ProQuest Historical titles, under a single title heading
• Variations in vendor naming practices limited the effectiveness of automatic matching for identifying title overlaps
• Individual newspaper titles within LexisNexis are also part of aggregated sets in LexisNexis, with no data available for article views from individual titles within those sets. Examples include:
  ◦ Major World Publications
  ◦ Combined Newspapers
  ◦ All Full-text English
  ◦ US Newspapers and Wires

Most-Used Newspaper Titles

[Figure: bar chart of the most-used newspaper titles, 2012 and 2013, by vendor. Y-axis: Successful Full-Text Article Requests. Titles shown: San Jose Mercury News (CA), San Mateo County Times (CA), The New York Times, Wall Street Journal, Chicago Tribune, PR Newswire, Chicago Defender, Business Wire, Washington Post, Financial Times (London, England).]

Data values by vendor, in the order extracted (empty cells are not marked):
ProQuest Historical: 33532 27511 2412 1821 24543 13628 7078 17035 6176 5298
ProQuest: 28533 81118 3259 2536 7589 5633 1857 305 7584 3180 114 1511 16537
NewsBank: 128927 42 86748 5 231 106 472 401 6 4 455 321 4274 108
LexisNexis: 10 17655 5455 833 52 430 358 240 50
Gale: 6964 6005 67 442 51 119 270 382
EbscoHost: 113 83 1 1 5 4 454 379

Anomalous Activity Rates in Most-Requested Titles, 2012

Title | Vendor | Months of high activity | Peak requests in a single month | Average requests per month | Median requests per month
San Jose Mercury News (CA) | NewsBank | May-July 2012 | 65639 | 10743.9 | 17.5
San Mateo County Times (CA) | NewsBank | May-July 2012 | 39002 | 7229 | 85
Times | LexisNexis | Feb-March 2012 | 7714 | 1471.3 | 570
PR Newswire | ProQuest | December 2012 | 5906 | 632.4 | 142
Business Wire | ProQuest | December 2012 | 5906 | 632 | 131.5
Financial Times (London) | NewsBank | March-April 2012 | 2985 | 356.2 | 17.5
(London) | NewsBank | March-April 2012 | 1557 | 267.9 | 5.5
(London) | NewsBank | March-April 2012 | 1841 | 249.8 | 8
Chicago Defender | ProQuest | May 2012 | 1782 | 154.8 | 4

All of these cases appear to represent some degree of data mining.

Anomalous Activity Rates in Most-Requested Titles, 2013

Title | Vendor | Months of high activity | Peak requests in a single month | Average requests per month | Median requests per month
Wall Street Journal | ProQuest | March 2013 | 28767 | 6759.8 | 3779
Chicago Defender (historical) | ProQuest | March 2013 | 10796 | 1419.6 | 426.5
Financial Times (London) | ProQuest | March 2013 | 11051 | 1378.1 | 278
PR Newswire | ProQuest | February 2013 | 2858 | 469.4 | 167
Business Wire | ProQuest | February 2013 | 1305 | 265 | 114.5
Los Angeles Sentinel | ProQuest | March 2013 | 1256 | 142.6 | 8.5
Sacramento Observer | ProQuest | March 2013 | 1449 | 160.4 | 4

All of these cases appear to represent some degree of data mining.

Top 20 Titles out of Total Requests, 2012 (641436 requests)

[Figure: pie chart of each title's share of total requests, 2012.]

Titles not used: ~11520*

Share of total requests:
• All other titles (~6326*): 36%
• San Jose Mercury News (CA): 20%
• San Mateo County Times (CA): 14%
• Wall Street Journal: 5%
• Next 15 titles: 12%
• Two further slices, 9% and 4% (labels missing in the source)

Next 15 titles:
• PR Newswire: 1.40%
• Chicago Defender: 1.39%
• Business Wire: 1.25%
• Washington Post: 1.11%
• Financial Times (London): 0.98%
• (title missing in the source): 0.96%
• The Baltimore Sun: 0.73%
• The Guardian (London): 0.70%
• The Times (London): 0.57%
• The News-Gazette (Champaign-Urbana, IL): 0.57%
• The State Journal-Register (Springfield, IL): 0.46%
• New York Tribune: 0.42%
• San Francisco Chronicle: 0.40%
• /Sunday Telegraph (London): 0.35%
• The /Mail on Sunday (London): 0.35%

* These numbers are high, as title de-duplication (both between and within vendors) was done manually for the top 20 titles and could not be done accurately by automatic means.

Top 20 Titles out of Total Requests, 2013 (384224 requests)

[Figure: pie chart of each title's share of total requests, 2013.]

Titles not used: ~12649*

Share of total requests:
• All other titles (~4742*): 42%
• Wall Street Journal: 22%
• The New York Times: 10%
• Financial Times (London): 4%
• Chicago Tribune: 4%
• Next 15 titles: 13%
• One further slice, 5% (label missing in the source)

Next 15 titles:
• PR Newswire: 1.70%
• Washington Post: 1.63%
• Los Angeles Times: 1.33%
• New York Tribune: 1.07%
• The State Journal-Register (Springfield, IL): 0.96%
• The News-Gazette (Champaign-Urbana, IL): 0.95%
• Business Wire: 0.91%
• The Baltimore Sun: 0.84%
• Chicago Defender: 0.70%
• The Guardian (London): 0.67%
• Los Angeles Sentinel: 0.59%
• Times of: 0.55%
• McClatchy-Tribune Business News: 0.53%
• Sacramento Observer: 0.50%
• Sun Reporter (San Francisco): 0.48%

* These numbers are high, as title de-duplication (both between and within vendors) was done manually for the top 20 titles and could not be done accurately by automatic means.

Usage of Top Ten Major US Newspapers (by circulation), 2012-2013

[Figure: bar chart of usage of the top ten US newspapers by circulation, 2012 and 2013, by vendor. Y-axis: Successful Full-Text Article Requests.]

Data values by vendor, in the order extracted (empty cells are not marked):
ProQuest Historical: 2412 1821 33532 27511 6094 4894 6176 5298 24543 13628
ProQuest: 28533 81118 35 8 8 114 7 3259 2536
NewsBank: 675 88 6 16 210 20 177 37 6 1273 1306 153 38 231 106
LexisNexis: 17655 5455 167 199 77 37 4 9 27 8 430 316 58 32 5
Gale: 6964 6005 11 29 4 1 51 119
EbscoHost: 4 5 554 414 322 379 1 3

Title | Average weekday circulation
The Wall Street Journal | 2,378,827
The New York Times | 1,865,318
USA Today | 1,674,306
Los Angeles Times | 653,868
Daily News of New York | 516,165
New York Post | 500,521
The Washington Post | 474,767
Chicago Sun-Times | 470,548
The Denver Post | 416,676
Chicago Tribune | 414,930

Circulation statistics: Alliance for Audited Media. Average weekday circulation, October 2012-March 2013. Includes digital editions, such as those on tablet computers or restricted websites, as well as branded editions, which include regional editions or those tailored for commuters.

Top Ten Titles (Successful Full-Text Article Requests) per Vendor, 2012-2013

EbscoHost Newspaper Source, 2012
Title | Requests
M2PressWIRE | 689
Toronto Star | 624
USA Today | 554
The San Francisco Chronicle (San Francisco, CA) | 425
Christian Science Monitor | 406
Times, The (United Kingdom) | 374
Arabia 2000 | 358
Washington Post | 322
Philadelphia Inquirer, The (PA) | 276
Evening Standard | 274
Total for all journals | 14738

Gale, 2012
Title | Requests
The New York Times | 6964
The Financial Times | 270
States News Service | 176
PR Newswire | 67
The Times (London England) | 60
(title missing in the source) | 46
Sunday Times (London England) | 39
CNN Wire | 31
BBC Monitoring International Reports | 15
The Age (Melbourne) | 14
Total for all titles | 7855

EbscoHost Newspaper Source, 2013
Title | Requests
M2PressWIRE | 498
USA Today | 400
Toronto Star | 387
Arabia 2000 | 337
Washington Post | 337
Times, The (United Kingdom) | 330
Christian Science Monitor | 328
All Things Considered (NPR) | 209
Morning Edition (NPR) | 188
Philadelphia Inquirer, The (PA) | 159
Total for all journals | 10077

Gale, 2013
Title | Requests
The New York Times | 6005
States News Service | 988
The Financial Times | 382
The Times (London England) | 231
UWIRE Text | 208
Sunday Times (London England) | 127
The Hindu (English) | 127
International Business Times - US ed. | 121
CNN Wire | 114
The Washington Post | 110
Total for all titles | 9052

LexisNexis, 2012
Title | Requests
The New York Times | 17655
Major World Publications | 11351
All Full Text, English News | 6123
Combined Newspapers | 3046
Miscellaneous | 2686
International News, Company Info, Business Opps & Analysis, Country Analysis & Legal Info | 2613
The State Journal-Register | 1041
PR Newswire | 833
US News (Papers & Wires) | 676
NRC Handelsblad | 407
Total for all content | 55159

NewsBank, 2012
Title | Requests
San Jose Mercury News (CA) | 128927
San Mateo County Times (CA) | 86748
Early American Imprints, Series 1 | 8177
Early American Imprints, Series II. Shaw-Shoemaker | 7416
Financial Times (London, England) | 4274
Times, The (London, England) | 3215
Guardian, The (London, England) | 2998
News-Gazette, The (Champaign-Urbana, IL) | 2820
Daily Telegraph, The/The Sunday Telegraph (London, England) | 2231
Daily Mail, The / The Mail on Sunday | 2194
Total for all titles | 394274

LexisNexis, 2013
Title | Requests
Major World Publications | 31773
All Full Text, English News | 15492
The New York Times | 5455
Combined Newspapers | 2183
Miscellaneous | 1157
Le Monde | 692
US News (Papers & Wires) | 513
Daily Deal/The Deal | 463
Combined Wire Services | 280
USA Today | 199
Total for all content | 63888

NewsBank, 2013
Title | Requests
Early American Imprints, Series 1 | 4214
Early American Imprints, Series II. Shaw-Shoemaker | 4161
State Journal-Register, The (Springfield, IL) | 3686
Undefined | 2988
News-Gazette, The: Web Edition Articles (Champaign-Urbana, IL) | 1955
News-Gazette, The (Champaign-Urbana, IL) | 1660
Chicago Sun-Times (IL) | 1212
Targeted News Service (USA) | 1199
US Fed News (USA) | 931
Tombstone Epitaph Prospector* | 758
Total for all titles | 60717

*All but four requests for this title are from a single month.

ProQuest, 2012
Title | Requests
Wall Street Journal Eastern edition | 23201
New York Times (1923-Current file) | 22931
Chicago Daily Tribune (1923-1963) | 10993
New York Times (1857-1922) | 9373
PR Newswire | 7589
Business Wire | 7584
Chicago Daily Tribune (1872-1922) | 5925
Chicago Tribune (1963-Current file) | 5658
Wall Street Journal (Online) | 5332
Los Angeles Times (1923-Current File) | 4867
Total for all titles | 169410

PressDisplay, 2012 (measured by issues read)
Title | Issues read
Chicago Tribune | 1090
The Washington Post | 455
The Moscow Times | 346
(title missing in the source) | 321
South Morning Post | 212
Daily Mail | 180
Brasil Economico | 167
The Washington Post Sunday | 130
Valor Economico | 130
USA TODAY US Edition | 118
Total for all titles | 4909

ProQuest, 2013
Title | Requests
Wall Street Journal | 65459
New York Times (1923-Current file) | 22309
Financial Times | 16117
Wall Street Journal (Online) | 15659
Chicago Daily Defender (Daily Edition) (1960-1973) | 9250
PR Newswire | 5633
Chicago Daily Tribune (1923-1963) | 5361
The Chicago Defender (National edition) (1921-1967) | 5216
New York Times (1857-1922) | 4916
Los Angeles Times (1923-Current File) | 4085
Total for all titles | 240490

PressDisplay, 2013 (measured by issues read)
Title | Issues read
Chicago Tribune | 774
The Washington Post | 220
Brasil Economico | 216
NRC Handelsblad | 204
The Moscow Times | 192
Chicago Sun-Times.com | 102
Los Angeles Times | 82
USA TODAY International Edition | 72
USA TODAY US Edition | 67
International New York Times | 66
Total for all titles | 3343

Usage Rank and Score

Usage Score

i: row entry

n: newspaper title

v: vendor

Rank score

• For each month, get rank score for each title
• For each vendor, get monthly average rank score
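The two rank-score steps above can be sketched in Python. The slides do not reproduce the scoring formula, so this is only an illustration under stated assumptions: a title's rank score is taken to be its position among all titles in a month when sorted by descending requests (1 = most requested), tied titles share a rank, and the vendor figure is the plain mean of its titles' ranks. The function names and the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def monthly_rank_scores(rows):
    """Rank every title within each month by descending request count.

    rows: iterable of (month, vendor, title, requests) tuples.
    Returns {(month, vendor, title): rank}, where rank 1 is the most
    requested title of that month. Ties share a rank (assumption).
    """
    by_month = defaultdict(list)
    for month, vendor, title, requests in rows:
        by_month[month].append((requests, vendor, title))

    ranks = {}
    for month, entries in by_month.items():
        entries.sort(key=lambda entry: entry[0], reverse=True)
        prev_requests, prev_rank = None, 0
        for position, (requests, vendor, title) in enumerate(entries, start=1):
            rank = prev_rank if requests == prev_requests else position
            ranks[(month, vendor, title)] = rank
            prev_requests, prev_rank = requests, rank
    return ranks


def vendor_monthly_average(ranks):
    """Average the title ranks of each vendor within each month."""
    grouped = defaultdict(list)
    for (month, vendor, _title), rank in ranks.items():
        grouped[(month, vendor)].append(rank)
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical rows, not taken from the study data.
sample_rows = [
    ("2012-01", "Vendor A", "Title 1", 50),
    ("2012-01", "Vendor A", "Title 2", 10),
    ("2012-01", "Vendor B", "Title 3", 50),
    ("2012-01", "Vendor B", "Title 4", 5),
]
sample_ranks = monthly_rank_scores(sample_rows)
sample_averages = vendor_monthly_average(sample_ranks)
print(sample_averages)
```

Under this convention a lower average means a vendor's titles sit nearer the top of the monthly ranking; a different tie rule or a per-vendor ranking would change the numbers.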

[Figures: 2012 Usage Score; 2012 Rank Score; 2013 Usage Score; 2013 Rank Score.]

Factor Two: Cost

Cost per Use

Vendor | Use (2012) | Cost per Use | Rank
EbscoHost Newspaper Source | 14738 | $0.00 | 1
Gale (Academic OneFile, OVC, Biog; newspaper, newswire formats only) | 7855 | $0.08 | 2
NewsBank (Access World News, World Newspaper Archive, Archives of Americana) | 394274 | $0.22 | 3
LexisNexis Academic (news sources) | 55159 | $0.36 | 4
ProQuest (newspapers, newswires, historical newspapers) | 169410 | $0.85 | 5
PressDisplay | 4909 | $2.15 | N/A

Problems with Cost Analysis

• EBSCO cost is > $0
• Calculating costs for news content within mixed databases: Gale, LexisNexis, and ProQuest
• Measurement issues: PressDisplay, LexisNexis
• Usage statistics are by calendar year; costs are by fiscal year
• Each title or database has a cost in addition to the access fee, which was not calculated
• Need for more cost data (FY13 & FY14)
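As a rough illustration of the cost-per-use figures and of the calendar-year versus fiscal-year mismatch listed above, the Python sketch below divides a cost by a use count and prorates two fiscal-year costs onto one calendar year. The slides do not give the underlying costs or the fiscal calendar, so every dollar amount and use count here is hypothetical, and the July-June fiscal year is an assumption.

```python
def cost_per_use(cost, uses):
    """Return cost divided by uses, or None when nothing was used."""
    if uses <= 0:
        return None
    return cost / uses


def calendar_year_cost(fy_ending_in_year, fy_starting_in_year, months_in_first=6):
    """Prorate two fiscal-year costs onto a single calendar year.

    Assumes (hypothetically) a July-June fiscal year, so that six months
    of the calendar year fall in each of the two fiscal years.
    """
    share = months_in_first / 12
    return fy_ending_in_year * share + fy_starting_in_year * (1 - share)


# Hypothetical figures, not the Library's actual costs or usage.
blended_cost = calendar_year_cost(10000.0, 11000.0)
example_cpu = cost_per_use(blended_cost, 1000)
print(f"${example_cpu:.2f} per use")
```

Returning `None` for zero use avoids a division error and keeps unused resources visibly separate from cheap ones.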

Factor Three: Quality

Quality

• How to measure without objective metrics?
  ◦ No Impact Factor
  ◦ Newspapers ≠ Journals
• Current news: circulation?
  ◦ WSJ, NYT have high use, high circulation (#1 & #2)
  ◦ USA Today, NY Daily News, low use, high circulation (#3, #5)
• Historical news: subjective
  ◦ WSJ, NYT still important news sources
• Data Mining?

Conclusions

Findings

• Measured Utility and Cost, but not Quality.
• Ranked titles and vendors on usage, vendors on cost
• Certain titles drive use…
  ◦ New York Times, Wall Street Journal
• Many titles, but most are never used
  ◦ Why are so many newspapers never used?
  ◦ Is something wrong with our discovery systems (SFX)?
  ◦ Should we pay for content which is never used?
• Some titles are duplicated across vendors
• Data mining exists: problems, outreach to users?
• Need to be able to see usage grouped by purchase group vs. individual titles only in order to be able to match the usage with the costs paid
• COUNTER-like statistics needed for non-journal serials that have articles
• Need more quality metrics for individual historical newspaper titles
• Lack of good metadata hinders analysis (identifiers, uniform title, coverage)
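The data-mining finding rests on comparing a title's peak month with its average and median months, as in the anomalous-activity tables. A minimal Python sketch of that comparison follows. The slides do not state how the anomalous titles were identified, so the peak-to-median ratio test and its threshold of 10 are assumptions made for illustration, and the monthly counts are hypothetical.

```python
from statistics import mean, median


def activity_summary(monthly_requests):
    """Return the peak, average, and median of a title's monthly counts."""
    return {
        "peak": max(monthly_requests),
        "average": mean(monthly_requests),
        "median": median(monthly_requests),
    }


def looks_anomalous(monthly_requests, ratio=10):
    """Flag a title whose peak month is at least `ratio` times its median.

    The ratio test and its default threshold are illustrative assumptions.
    """
    summary = activity_summary(monthly_requests)
    if summary["median"] == 0:
        return summary["peak"] > 0
    return summary["peak"] / summary["median"] >= ratio


# Hypothetical twelve-month series, not taken from the study data.
spiky = [10, 12, 9, 11, 4000, 3500, 10, 8, 12, 11, 9, 10]
steady = [100, 110, 90, 105, 95, 100, 115, 98, 102, 97, 108, 101]
print(looks_anomalous(spiky), looks_anomalous(steady))
```

A single ratio is a coarse screen: a heavily used title can have a real spike well below ten times its median, so the threshold would need tuning against known cases.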

Next steps

• Collect 2014 Usage data
  ◦ Coverage analysis
  ◦ Language (spec. non-English)
  ◦ Rank by Quintile (0-4)
• Improve cost analysis
  ◦ Collect FY13 & FY14 acquisition data
  ◦ Investigate additional costs
  ◦ Rank by Quartile (0-3)
• Develop Quality metric
  ◦ Consult with faculty/staff to develop list
  ◦ Boolean (0-1)
• Weighted Value Analysis

Literature

• Breakstone, Elizabeth R. 2010. “Now how much of your print collection is really online? An analysis of the overlap of print and digital holdings at the University of Law Library.” Legal Reference Services Quarterly 29 (4): 255–275. doi:10.1080/0270319X.2010.527781.
• Cheney, Debora. 2013. “Text mining newspapers and news content: new trends and research methodologies.” Paper presented at: IFLA World Library and Information Congress, 17-23 August 2013.
• Cooper, Mindy M. 2007. “The importance of gathering print and electronic journal use data: getting a clear picture.” Serials Review 33 (3) (September): 172–174. doi:10.1016/j.serrev.2007.06.001.
• Way, Doug. 2010. “The impact of web-scale discovery on the use of a library collection.” Serials Review 36 (4) (December): 214–220. doi:10.1016/j.serrev.2010.07.002.
• , Jacqueline and Li, Chan. “Calculating scholarly journal value through objective metrics.” CDLINFO News. http://www.cdlib.org/cdlinfo/2012/02/13/calculating-scholarly-journal-value-through-objective-metrics
• Wood, Elizabeth H. 2006. “Measuring journal usage: add a survey to the statistics?” Journal of Electronic Resources in Medical Libraries 3 (1) (January): 57–61. doi:10.1300/J383v03n01_06.
• Zappen, Susan H. 2010. “Managing resources to maximize serials access: the case of the small liberal arts college library.” Serials Librarian 59 (3/4) (October): 346–359. doi:10.1080/03615261003623104.