Europeana Newspapers Collections Plan
D2.4: EUROPEANA NEWSPAPERS COLLECTIONS PLAN

Europeana DSI 2 - Access to Digital Resources of European Heritage

DELIVERABLE D2.4: EUROPEANA NEWSPAPERS COLLECTIONS PLAN

Revision: 1.0
Date of submission: 28 February 2017
Author(s): Clemens Neudecker (SBB)
Dissemination Level: Public

REVISION HISTORY AND STATEMENT OF ORIGINALITY

Revision History

Revision No. | Date | Author | Organisation | Description
0.1 | 17-10-2016 | Clemens Neudecker | Berlin State Library | Initial draft
0.2 | 26-10-2016 | Clemens Neudecker | Berlin State Library | Revised draft
0.3 | 10-01-2017 | Clemens Neudecker | Berlin State Library | New structure
0.4 | 23-01-2017 | Clemens Neudecker | Berlin State Library | Populated new structure
0.5 | 13-02-2017 | Clemens Neudecker | Berlin State Library | Consolidated draft
0.6 | 14-02-2017 | Nienke van Schaverbeke, Adrian Murphy, David Haskiya | Europeana Foundation | Comments and suggestions for revision
0.7 | 21-02-2017 | David Haskiya | Europeana Foundation | Section on product design and development added
0.8 | 22-02-2017 | Clemens Neudecker | Berlin State Library | Comments and suggestions addressed
0.9 | 23-02-2017 | Douglas McCarthy | Europeana Foundation | Final review
1.0 | 27-02-2017 | Clemens Neudecker | Berlin State Library | Finalised document

Statement of originality: This deliverable contains original unpublished work except where clearly indicated otherwise. Acknowledgement of previously published material and of the work of others has been made through appropriate citation, quotation or both.

The sole responsibility of this publication lies with the author. The European Union is not responsible for any use that may be made of the information contained therein.

‘Europeana DSI is co-financed by the European Union's Connecting Europe Facility’

Table of Contents

1. Purpose of this document
1.1 Relationship to other documents
2. Elevator Pitch
3. Introduction to Europeana Newspapers
4. Business Model
4.1 Audiences
4.2 Value proposition
4.3 Market landscape
4.3.1 Market trends
4.3.2 SWOT analysis
4.4 Channels and networks
4.4.1 Advisory board
4.4.2 Events and workshops
4.4.3 Meetings and conferences
4.5 Key activities
4.5.1 Themes / Exhibitions
4.5.2 Browse entry points
4.5.3 Blog
4.5.4 Social media
4.5.5 User engagement activities
4.5.6 Activities timeline
4.6 Resources
4.7 Impact
5. Product design and development
5.1 Scope of product development
5.2 Evaluation
5.3 Product development methodology and policies
5.4 Roadmap

1. Purpose of this document

This plan covers the ambitions and high level milestones for the Europeana Newspapers thematic collection over the duration of DSI-2 and beyond. It will be evaluated on an ongoing basis against implementation, results and actual landscape to ensure effectiveness.

1.1 Relationship to other documents

This document follows strategy, direction and broad activities described in the Europeana Collections Plan [1] (Deliverable 2.1).

2. Elevator Pitch

“Europeana must now be regarded as the first port of call for anybody who wishes to explore the continent’s newspapers” - British Library Labs prize-winner Dr. Bob Nicholson, in his review of Europeana Newspapers for “Reviews in History”. [2]

As Dr. Nicholson notes in his review, Europeana Newspapers has improved access to Europe’s digital historical newspapers dramatically. Europeana Newspapers is currently the largest single collection of digitised historical newspapers from Europe, and the only online platform to provide a single entry point to digital newspaper full-text content covering four centuries, and more than a dozen languages, from major national and research libraries in Europe.
Furthermore, Europeana Newspapers raises the bar for open library data: the majority of the content (images and OCRed text) is available for anyone to use and reuse without restriction thanks to public domain licensing, and all metadata is released under CC0. Thanks to its volume, variety and availability, the Europeana Newspapers collection is of tremendous value to a broad community of users. It can help enable new approaches in research and education, and support the further standardisation and adoption of best practices for the large-scale digitisation and online presentation of historical newspapers in Europeana.

3. Introduction to Europeana Newspapers

Europeana Newspapers is expected to launch in early 2018 as a thematic collection on Europeana and will be managed by the Berlin State Library [3]. Providing access to around 20 million pages, 12 million of them fully searchable, of digital historical newspaper content from 23 libraries in Europe, it forms one of the largest collections of Europe’s historical newspapers currently available online.

[1] http://pro.europeana.eu/files/Europeana_Professional/Projects/Project_list/Europeana_DSI-2/Deliverables/d2.1-europeana-collections-plan.pdf
[2] http://www.history.ac.uk/reviews/review/1894
[3] http://staatsbibliothek-berlin.de/

The Europeana Newspapers thematic collection will build on the results of the highly successful 2012-2015 ICT-PSP project Europeana Newspapers (Ref. 297380) [4]. The Europeana Newspapers project was a joint effort of 18 partners that aimed to make a significant amount of Europe’s historical newspapers fully searchable by refining scanned newspaper pages from libraries throughout Europe with OCR and, to some extent, article separation.
At the close of the project, the historical newspapers portal prototype developed by The European Library (TEL) [5] provided access to more than 20 million pages of historical newspapers from as early as 1618 up to the 21st century, equivalent to more than 1,000 newspaper titles with around 3.6 million issues. As indicated below (4.7), the prototype has seen a very promising degree of use, with around 1.3 million visitors to the site in 2016 and an exceptionally long user session duration of over 16 minutes, which suggests that users engage deeply with the full-text content of the newspapers.

Access to the collection is possible via a variety of entry points, such as:
● full-text search (string search), incl. autosuggestions
● search by newspaper title
● search by date of issue using a calendar
● search by country of publication on a map

Additionally, filters and facets like “language”, “content provider” or “decade” allow further fine-tuning of the result set. A carousel with browsable title pages of newspapers published “on this day” completes the landing page of the portal.

Usability testing has been done for the TEL portal, and the full report [6] is available online.

Three datasets have been released that compile subsets of the Europeana Newspapers collection targeted at specific audiences:
● ENP Ground Truth dataset [7]
● ENP Named Entity corpora for French, Dutch, German [8]
● Plain text downloads of OCRed text [9]

Finally, work is under way to further develop a Europeana Newspapers API that will allow programmatic access to metadata, images and OCRed text.

Europeana aims to replicate the full feature set of the TEL portal prototype described above for the Europeana Newspapers thematic collection. Features and functionalities will be released in phases, starting with a public alpha release with a limited feature set in January 2018 and expanding from there based on user feedback and needs.

4. Business Model

The business model canvas of Europeana Newspapers is shown below. The thematic collection is central to its activities and scope, as it is the main channel through which it can reach its target audience and engage its partners.

[4] http://www.europeana-newspapers.eu/
[5] http://www.theeuropeanlibrary.org/tel4/newspapers
[6] http://www.europeana-newspapers.eu/wp-content/uploads/2014/05/The-European-Library-Newspaper-Archive-Usability-testing-Report-April-2014.pdf
[7] http://primaresearch.org/datasets/ENP
[8] https://github.com/EuropeanaNewspapers/ner-corpora; due for publication in ELRA http://catalog.elra.info/
[9] http://research.europeana.eu/itemtype/newspapers

Business model canvas for Europeana Newspapers; https://canvanizer.com/canvas/rkch03Nv5PazM

4.1 Audiences

As a series of interviews conducted in 2014 and 2015 [10] shows, this large and diverse historical newspaper collection has already attracted a broad user base. This includes scholars in the humanities and social sciences, researchers in computer science and language processing, and teachers in schools and universities, as well as important stakeholders such as libraries and other cultural heritage organisations holding historical newspaper collections.

The core audiences for Europeana Newspapers can therefore be roughly prioritised into the following groups:

● Research

Researchers from the humanities and social sciences are particularly interested in historical newspapers, as they allow them to study public opinion and culture in the past and provide a perspective on the daily life of ordinary people that is often not accounted for by history textbooks.

Researchers in language studies, computational linguistics and philology also show great interest in historical newspapers. A full-text corpus covering four centuries, like Europeana Newspapers, enables them to study the use and development of language over time.
Historical newspapers are also one of the main sources for genealogists or people wanting to perform research into their family history.

[10] http://www.europeana-newspapers.eu/category/interviews-with-researchers/

Finally, the pattern recognition community is researching novel algorithms for OCR and document analysis. Many of the unsolved challenges in this domain lie in the area of historical documents or the complex layout typically found in newspaper pages. The Europeana Newspapers data and ground-truth [11] open up many possibilities for the development and testing of innovative approaches and technologies for the recognition and analysis of historical documents based on a real-world representative dataset.