National Library of Australia - National Digital Services - CAUL 21-22 Sept 2017

NATIONAL DIGITAL SERVICES

SHAPING AND TRANSFORMING THE DIGITAL LANDSCAPE:

COLLABORATION WITH THE HIGHER EDUCATION SECTOR

ALISON DELLIT, ASSISTANT DIRECTOR-GENERAL, NATIONAL COLLECTIONS ACCESS &

AILEEN WEIR, DIRECTOR, DIGITAL BUSINESS

PAPER FOR CAUL COUNCIL - 21-22 SEPTEMBER 2017

This paper outlines the approach that the National Library of Australia is taking to transition towards a National Digital Services Agreement, which will replace existing collaborative arrangements and establish more holistic and inclusive management of digital library services, Trove, cataloguing and inter-library loan services, data aggregation, metrics and other services. Universities are central stakeholders in Australia’s data, publishing and archival landscape, and the Library anticipates working closely and collaboratively with the higher education sector as we transition towards the new framework.

The NLA and CAUL share common goals and values including a commitment to digital dexterity and fair, affordable and open access to knowledge. Collectively, we want to ensure our collections remain discoverable, accessible, and preserved for future generations. In an environment of diminishing budgets and expanding costs, collaboration can provide efficiencies and mutual support while showcasing the research value of aggregation.

The importance of Trove

Trove has played a key role in achieving these aspirations for nearly a decade. Built on the spine of the nation’s bibliographic database, Libraries Australia, and enriched by the addition of full-text content, Trove has become a well-known, often passionately loved, platform of services. Researchers, along with users from other sectors, have embraced Trove, using it to find resources, develop new knowledge and share their work. One example of the way Trove assists in the research lifecycle is the work of Dr Katherine Bode, Associate Professor, Literary and Textual Studies at the Australian National University.

Dr Bode devised a paratextual method to automatically identify and harvest fictional content scattered throughout Trove’s collection of 19th-century newspapers. She discovered over 16,500 fictional works, identified a host of new Australian works and authors, and unearthed previously unlisted fiction by notable authors including Catherine Martin and Jessie Mabel Waterhouse. Her discoveries will be published in a new book, A World of Fiction: Digital Collections and the Future of Literary History, due in 2018. Trove is now working with Dr Bode to develop an automated way to generate records for these hidden works of fiction and integrate them back into the database.

Other examples include:


• The Prosecution Project – exploring the history of the criminal trial in Australia. Dr Mark Finnane, Professor of History in the School of Humanities, Griffith University, is analysing patterns of crime, prosecution and punishment in Australia from 1850 to 1960 using digitised Supreme Court registers and newspapers.
• Impact of the 1918–19 influenza epidemic on the development of South Australian health services. Dr Mayumi Kako, School of Nursing, Flinders University, is using historical newspapers to investigate the extent of the flu outbreak and the public health response.
• The Australasian Language Technology Association is using the Trove Optical Character Recognition (OCR) software as a teaching tool to enable computer programming students to improve the quality of OCR text conversion.

Trove operates on an unparalleled scale in Australia, showcasing content from over 1400 different organisations including every Australian university, over 40 government departments, over 50 museums and archives and more than 500 public and special libraries. Trove combines the metadata and holdings of all these collections with its growing corpus of digitised and born-digital content to present a comprehensive overview of Australian cultural heritage. The social history value found in the hundreds of digitised Australian newspapers and Government Gazettes is a goldmine for academic research and attracts thousands of users to Trove every day.

Access to content in Trove is free to the end-user, a key to its success and a principle the National Library is committed to maintaining. Embracing the principle of open access, the National Library released the Trove Application Programming Interface (API) in 2012, inviting users to freely manipulate data and content. Since its release, the Library has issued over 1500 API keys, enabling researchers and other institutions to generate new interfaces and applications and interrogate Trove data in unique ways that support research. With additional investment, an improved API would offer greater potential to slice and dice Trove data, present customised views and answer research questions.

Years of investment by the National Library in the digital infrastructure underpinning Trove are beginning to pay dividends. Trove is a world leader in managing large-scale web content, and its combination of metadata, digitised content, born-digital full text and web archives, all accessible through a single platform, is unique in the world.

Trove’s future

Trove has been a great success, but managing online content at this scale is outstripping the resources of a single institution. The National Library knows that end-users want more functionality, that contributors want more participative governance, greater recognition and better metrics, and that the API needs attention.

Trove’s potential to enable data analysis and research collaboration remains largely untapped. As outlined in the accompanying paper, additional investment in Trove would enhance the platform to provide more data, control and decision-making information to university librarians and research managers.

The National Library’s vision for Trove is in direct alignment with CAUL’s stated aims to build digital dexterity in our communities and promote fair, affordable and open access to knowledge. Working with us to innovate, build and improve Trove will enable better integration and visibility of university library and repository content and strengthen its academic research potential. Some of the improvements the NLA has on its wish list include:

• Improved metrics – Because Trove aggregates content from all universities, it becomes possible not only to summarise the totality of activity emanating from a particular institution but also to generate cross-institutional comparative data. Sophisticated measures presented as a ‘statistics dashboard’ would assist CAUL members to meet CAUL’s goals to “facilitate assessment and evaluation of library services that provide evidence of impact and value” and “maintain a useful and relevant collection of statistics”1.
• Customised views – Filters that limit searches to research outputs, theses, datasets and other content relevant to each university would enable a holistic view of institutional output. As described in the accompanying paper, this could ultimately lead to a ‘Trove Research Portal’ that aggregates content held across all Australian university repositories and collections.
• Enhanced access to full-text – The software and expertise gained from indexing full-text digitised newspapers, archived websites, podcasts and other content could be applied to the full-text content held in repositories. This has the potential not only to enhance cross-institutional research access to this rich content but also to reduce the costs of creating human-generated metadata.
• Improved Application Programming Interface (API) – A more robust API would facilitate access to large-scale digital data and enable computationally intensive research across collections, institutions, disciplines and formats.
• Digitisation – The National Library has been digitising newspapers and other content for years and has developed considerable expertise that university libraries could leverage to capture and expose their unique collections and “build capacity in publishing and digitisation”2. The Library could potentially act as a digitisation clearing house by overseeing and coordinating digitisation projects to “facilitate the exposure of digitised special collections in Australian university libraries”.
• Support for ERA, ARC and NHMRC reporting and compliance – The inclusion of ORCID persistent identifiers, Australian Research Council and other research project identification tags would enable institutions and individual researchers to track their outputs and open-access obligations through Trove. Developing reporting and metrics that support ERA and other research compliance frameworks would “strengthen the role of university libraries as a partner in the research process”3.
• Teacher toolkits – In addition to supporting the research agenda, the diverse content on Trove could be packaged to create teaching resources that promote university library collections and digitised content relevant to particular courses, “supporting transformations in the role of university libraries as partners in learning and teaching”4.
• Leverage international relationships – Trove enjoys a global reputation and adheres to international standards of interoperability that align with comparator services such as . Trove’s established international relationships and profile would assist CAUL to “build and maintain relationships with other peer groups in the library and information science sector, nationally and internationally”5.

If funding to support new Trove services is secured, the National Library would welcome suggestions from CAUL members about other services that would allow them to support the research agenda and gain efficiencies in their own libraries.

1 CAUL Strategic Directions 2014-2016 #5 ‘Sharing Knowledge and Data’
2 CAUL Strategic Directions 2014-2016 #4 ‘Access to Information’
3 CAUL Strategic Directions 2014-2016 #3 ‘Research’
4 CAUL Strategic Directions 2014-2016 #2 ‘Learning and Teaching’
5 CAUL Strategic Directions 2014-2016 #1 ‘Engagement and Influence’


Why change things now?

The Australian digital landscape is changing. Metadata aggregation and resource sharing are now just two activities among many. Searchable full-text resources and citation searches have changed discovery, and discovery itself now happens mostly outside library platforms. The proliferation of digitised and born-digital content, the rise of big data and digital humanities, and the strategic importance of open access research outputs and repository content are redefining how academics conduct research and what they expect university libraries to deliver.

Heritage libraries, archives and museums are prioritising scarce resources to digitise and promote their own unique collections, and higher education institutions are similarly resourcing the support, promotion and measurement of their researchers’ outputs. Exposure of resources, measuring usage, and facilitating collaboration among creators and researchers are now core business for all of us.

We all recognise that, although the Australian National Bibliographic Database (ANBD) is the backbone of Trove, it is no longer the whole story of sector collaboration, and is no longer the highest priority for many of our collaborators. By bringing all the collaborative activities the Library facilitates under a single tent, we intend to develop a clearer capacity to prioritise resources according to the needs of the broader sector, as well as to expand what we are able to achieve collectively.

Efficiencies and economies of scale

Storage, management, preservation, delivery and discovery of digital content are all costly activities. The National Library has invested heavily in infrastructure and established a strong foundation for managing digital content at scale. Trove comprises a complex and sophisticated ecosystem of highly interconnected hardware and software that supports 250,000 searches a day across more than 500 million resources. The Library houses more than 4 petabytes of data on 300 virtual servers and has made significant investment in digital preservation and data security.

It makes sense to leverage this investment for the collective benefit of the tertiary sector. There are benefits and economies of scale to be gained. By joining forces, we give greater visibility to our freely available content in a highly competitive online environment, strengthen the position and bargaining power of libraries within the research sector and safeguard the longevity and sustainability of services on which Australian researchers depend.

Keeping Trove sustainable

Following years of reduced budgets, the Commonwealth government announced that Trove would receive $16.4m over four years from 2016-17 as part of its modernisation program. This targeted, one-off funding has been granted to the Library for a specific purpose: to digitise material, sustain Trove’s critical infrastructure and better position Trove for the future. There is a clear expectation from Government that, by the end of this period, the Library will have developed a sustainable business model to keep Trove operational. Over the next few years, the Library is using these funds to address some of the most pressing user interface challenges that Trove’s loving-yet-frustrated users face. Government funding to aggregate metadata from large organisations – which includes all university research repositories – is available for this financial year. This places urgency on the development of a new funding model.


The Library’s strategy

The Library is responding to this issue by developing a new agreement with content contributors to Trove, intended to facilitate a sustainable service infrastructure for digital collections. The proposed National Digital Services Agreement (NDSA) would encompass the full range of digital services accessible through Trove and reliant on the same technical infrastructure:

• Libraries Australia – bibliographic and holdings records of over 1100 libraries, capturing over 50 million holdings, and associated services
• Trove discovery platform – supporting more than 500 million resources
• Trove content repository – for the preservation, management and discovery of digital and born-digital material, which will become increasingly valuable to researchers
• Digitised newspapers, government gazettes and journals – over 20 million pages of content

The NDSA will invite all contributing libraries and collecting organisations that have an interest in the survival of these services to invest in the national system, building a sustainable economic future for our collective digital services.

Under the NDSA, the Library will establish a new, more inclusive governance model that will encompass all aspects of Trove, not just Libraries Australia. The Library recognises that co-investors and collaborative partners will want to actively shape Trove and have genuine involvement in strategic planning. The Library envisions that this could be achieved through an overarching Advisory Council that would provide advice and guidance on new service offerings, future technical developments and policy issues. Like the Libraries Australia Advisory Council, the new Council would have broad, cross-sectoral representation and could be supported by a sub-committee structure. CAUL and NSLA would play a central leadership role, supplemented by representatives from the tertiary/research sector, public/special libraries and galleries/archives/museums communities.

Funding model

In practical terms, the Library anticipates that all services would be unified under a single fee structure. Membership fees will be set strictly on a cost-recovery basis. Costs will be allocated to fees on a sliding scale, very similar to the current model used for Libraries Australia, aligned to the size and type of institution, the size of its digital collection and its level of engagement with different services. The Library is undertaking a forensic analysis of its costs to isolate the elements directly attributable to running collaborative services from other components of the Library’s core business, ensuring that fees recoup only the costs of those activities. The resulting cost object – a detailed description of what is included in the cost-recovery analysis – will be shared with partners.

The Library is cognisant that fees must deliver value to members for this initiative to succeed. Ensuring that services align with benefits for members is the central objective of this round of consultation. The incoming governance group for the new service may also face tough decisions, as full cost recovery may not be immediately achievable. We believe these decisions will be more strategic for all Australians if made together.

Seeking membership funds from contributors to Trove is not the only tactic the Library is pursuing. The Library will continue to build a case for Government to provide longer-term funding for these services, both through portfolio funding and through the 2016 National Research Infrastructure Roadmap6 process. The Roadmap report states that the current small-scale national research infrastructure that includes Trove, Australian Data and the Atlas of Living Australia should be leveraged, along with the use of digitisation, to meet future research needs that demand increased discoverability, accessibility and use of innovative technologies (p. 35).

The Library will also pursue other potential revenue streams, such as commercial partnerships. Any extra source of revenue for Trove will reduce the proportion of costs recovered through the National Digital Services Agreement, allowing either reduced fees or expanded services.

Trove and branding

While Trove has always been a collaborative effort of Australian collecting institutions, the National Library has funded and maintained the service infrastructure for Trove’s first 10 years. As we move into a different life-cycle for Trove, work will be needed to ensure that the Trove brand is immediately recognised as a collaboration. This discussion has started with the National and State Libraries Australasia, whose members have indicated a willingness to be involved in this process. On 20 October, the National Library will take part in a workshop with GLAM Peak to better understand their needs and discuss how the brand might be transitioned to better reflect the multiplicity of contributing partners. We would welcome opportunities to discuss this process with our higher education collaborators as we move forward.

What next?

The National Library would like to establish a dialogue with the higher education sector about the issues this paper raises. As discussed in the accompanying paper, the Library would especially welcome feedback on which activities provide value, and how collaboration could better meet the needs of librarians, repository managers, researchers and other stakeholders. The Library is actively pursuing conversations with leaders and stakeholders across the higher education and research sectors and welcomes all input as our plans to formalise a more comprehensive and inclusive collaborative arrangement progress.

6 2016 National Research Infrastructure Roadmap. Available from: https://docs.education.gov.au/node/43736 [accessed 24 Aug 2017]
