Preserving large-scale cultural heritage: the case for collaboration

Anne-Marie Schwirtlich, Director-General, National Library of Australia

It is a great honour and pleasure to be addressing you today and I pass on the greetings of Australian colleagues. Australia has been a keen observer and participant in CONSAL meetings since the very first conference in Singapore in 1970. How far we have all come since then!

Introduction

The large-scale loss of cultural heritage materials can happen for a number of reasons. It can happen when important cultural heritage content is lost to society because it was never collected and stored by the appropriate institution, whether library, archive, museum or gallery. It can happen when institutions themselves fail to take appropriate care of their collections which become degraded and inaccessible. It can happen when even the most careful attention cannot prevent the natural degradation of physical materials caused by time itself, such as the crumbling pages of thread-bound books or brittle newsprint.

Cultural heritage is also lost when collections are damaged and destroyed by natural disasters, such as earthquake, tsunami or fire. [slides 2,3,4] These events are beyond our control, although much can be done to protect our buildings and collections from the risk of natural disaster. Worst of all perhaps [slide 5], cultural heritage is lost at times of war and conflict due to looting and deliberate destruction, as we are seeing now in some parts of the world. Tragically, we are powerless to prevent this.

What can heritage collecting institutions do, then, to preserve their collections on a large scale?

My presentation focuses on two strategies. Firstly, large-scale collecting to protect digital heritage materials that are outside library collections and in danger of loss, namely web-based publications and websites. And, secondly, employing large-scale digitisation to preserve and make accessible fragile collection content, focusing on historical Australian newspapers. Overall, I want to emphasise the role of collaboration and cooperation in the successful achievement of these important tasks.

Setting the scene

At the National Library of Australia, we build and manage a set of rapidly growing and complex digital collections. [slide 6] In June 2015 these collections comprised about 3,500 terabytes of data. [slide 7] They include archived copies of Australian websites; digitised copies of historic Australian newspapers; digitised copies of oral history and other audio files; digitised copies of analogue collection items such as pictures, music scores, maps, and manuscripts, and a small collection of digital photographs and personal archives.

In addition to the digital collections we also build and manage physical collections of over 6 million volumes, housed in a heritage building in Canberra and two off-site storage repositories.

Our staff numbers are around 410 Full Time Equivalents, and have been reducing each year in line with government budget requirements. It is therefore critical for us to make the best use of our scarce resources as we take on the ever-growing demands of both digital and physical collection management and preservation.

So why is collaboration such an important principle?

To explain this, I’d like to share with you a comment made by the International Internet Preservation Consortium some years ago about the obstacles to preserving the web. [Slide 8] The Consortium noted that:

The task is too large for individual institutions to undertake in isolation and the resources required for successful and sustained archiving are too great to make duplication of effort a tenable position.

We agree!

Models for collaboration

Thinking of digital collections, what types of collaboration are of the most use to us in the long term? Broadly speaking, we see four groupings where collaboration most fruitfully occurs [Slide 9]:

• Content custodians (such as national and research libraries, major archives, universities that maintain repositories of research outputs, and data archive centres) who are committed to long term preservation, including tackling the problem of obsolescence;
• Communities of practice and information exchange (standards bodies, digital preservation experts, relevant professional associations);
• Providers of services (such as infrastructure providers, software developers, registry services, identifier resolution services); and
• Capacity building organisations (such as development organisations, funders of research, curriculum developers, non-government organisations).

Since we began to build our digital collections, we have found many opportunities to collaborate with institutions facing similar challenges, and with other stakeholders. Over the years these partners have included overseas national libraries, other Australian cultural institutions, university libraries, agencies developing relevant software, and standards bodies. Our experience has shown that collaborations between different groupings can have interesting and useful outcomes, such as a partnership between the National Library and a university-funded research project to develop a standard or pilot a new service. These have often provided us with new information and learning which has deepened our knowledge or informed our practice.

Some collaborations within groupings, such as between the National Library and its state library counterparts in Australia, have had long-lasting strategic outcomes. The more successful of these have, in fact, been transformational, and I would like to look at two in detail.

Web archiving

The need to collect and retain a record of a society’s cultural expression on the web is now well understood. The medium is highly vulnerable, in respect to content at least. Content can disappear without trace. There is no physical artefact or remnant for later, retrospective, collecting. As an example, the following sites archived by the National Library through a selective permissions-based approach have now disappeared from the live web:

• [Slide 10] Most sites associated with the 2000 Sydney Olympic Games, including the official Sydney Organising Committee for the Olympic Games site.
• [Slide 11] The website of John Howard, while he was our Prime Minister between 1998 and 2007
• [Slide 12] A number of e-journals including:
  o Digital Technology Law Journal (1999-2004)
  o Asian Linguistics & Language Teaching (2002-2006)
  o The Zeitgeist Gazette
  o Journal of Sports Marketing (1998-2001)
• [Slide 13] The APEC Australia 2007 website (2007)
• [Slide 14] The Aboriginal and Torres Strait Islander Commission website (2006)
• Treatynow.org (2002-2005)
• Forgotten Australians (2009)
• Most sites associated with the 1998 Australian Constitutional Convention and 1999 Australian Republic Referendum.
• Most sites associated with the Centenary of Federation (2001)
• Most campaign websites of Federal and State elections from 1996 onwards (e.g. jeff.com.au, Kevin07).
• Online Australia – the Commonwealth Government’s first initiative to build online communities (1998-2000); also, GovOnline.gov.au (2002) and Culture.gov.au : Australia’s Culture Portal (2010).
• The Australian Firearms Buyback website (2000)
• The Paralysis Tick of Australia (2001-2002); and
• The Jabiluka Uranium Mine Blockade website (1999)

Because the web is a publishing medium that more and more people can engage in, collecting web materials allows us to understand our society over time in ways that have not been so possible in the past – provided we collect and preserve the record.

More and more ‘grey literature’ – that is, documents produced by entities that are not in the commercial business of publishing, that support and inform policy and research – is published only online, because of the cheap and convenient means of publishing afforded by the web. Without web archiving we run the risk of allowing the proverbial ‘digital black hole’ to prevail in our cultural, social and intellectual memory.

Never has there been larger-scale cultural heritage that urgently requires preservation. But how are we as library and collection institutions to deal with it?

The PANDORA Project

The need to cooperate with other collecting institutions to achieve effective results was very much the thinking behind the National Library’s development of the PANDORA web archive in 1996. [Slide 15] PANDORA is a selective web archive, prioritising content with a high research value while also ensuring we collect and preserve a broad sample of material representing the range of online culture and publication relating to Australia and Australians. One reason for taking a selective approach is that it lets us manage the negotiation of permissions, so that all the content collected for the PANDORA Archive can be made freely available to the public.

Because of the high cost of selective web archiving, it makes sense for a lead agency (such as a national library) to develop both the expertise and the infrastructure for web archiving, and for other agencies to leverage off this investment. Accordingly, PANDORA is a collaborative activity, as the archive is built by the Australian state libraries and some other cultural institutions in addition to the National Library. This activity is an example of collaboration between content custodians.

Today, 11 participants, including the National Library, jointly curate the PANDORA Archive. [slide 16]

This includes all state and territory libraries, with the exception of the Australian Capital Territory library service and the state library of Tasmania. Tasmania has maintained its own web archive for nearly as long as PANDORA. The state libraries take responsibility for collecting resources that specifically relate to their state or local jurisdictions.

In addition to state libraries, four other heritage collecting institutions participate. These institutions take responsibility for developing the collection in their areas of expertise. These are the National Film and Sound Archive, The , the Australian Institute of Aboriginal and Torres Strait Islander Studies and the National Gallery of Australia.

The PANDORA collaboration involves a shared software and database platform, and agreement on non-overlapping collection responsibilities. The entire technical infrastructure along with the actual archival content is maintained centrally at the National Library of Australia in Canberra.

The partners work together on the basis of a formal memorandum of understanding made between the National Library and each of the participants. This covers mutual responsibilities and adherence to common policies and procedures. However, each participant agency is free to develop its own selection guidelines. Approaches to selection are not standardised across all participants. We see this flexibility as contributing to the success of the collaboration, with each agency free to collect content that meets its collecting aims. As a result, the PANDORA archive has a good representation of content acquired over a number of years by each contributing partner [slide 17]. The National Library is the largest contributor, as it takes a national approach to selection of content, while the partners contribute what they can in their areas of interest. Over the years this has led to a substantial representation of content from the largest mainland states, where publishing activity is greater.

Whole domain harvest

Since 2005, the National Library has also completed an annual large-scale harvest of the Australian web domain. This activity was the result of a long-standing aspiration to complement the selective PANDORA approach with a whole domain approach. [Slide 18] The Library contracted the Internet Archive to do the whole domain harvest. Effectively, the Internet Archive acts as both a supplier and partner to the National Library. This provides an example of the third mode of collaboration, with providers of services.

As the National Library does not yet have permission to make the content of the whole domain archive freely available, it is a closed archive. Nevertheless it stands as a record gathered in anticipation of the day when it can be opened up for public access. Meanwhile, it makes up a substantial part of our 300 terabytes of web collection storage. [slide 19]

The Australian Government Web Archive

Of all the published Australian content collected by the National Library, we see the greatest change in the record of government publishing. [slide 20] Put simply, government publishing in Australia has migrated online at a far faster rate than any other sector, with the result that our receipts of print publishing have fallen sharply over the years.

In 2010 the Library advocated for and won permission to collect, preserve and make accessible national government publications on a ‘whole-of-government’ basis. It developed in-house a purpose-built web archive for the delivery of archival web content, following the model of other web archives. A fully functioning access and search portal for the Australian government web collections was released in March 2014. [slide 21]

I believe this audience well understands that government publications represent the foundation of the 'grey literature' that articulates, informs and influences public policy. As such, there is a great public interest in ongoing access to the historic documents of government that have a bearing on social, economic, cultural, commercial, legal and other developments. As government publishing moves online as the primary means of dissemination, we need to meet the challenge of collecting and preserving this content, to continue the great collections of government print material we already hold.

The Australian Government Web Archive includes content dating back to 2005, amounting to around 144 million files or 9 terabytes of data. At present it only includes national Government websites, which are collected through bulk harvests of nearly 1000 seed URLs. The scheduling of the harvests is not yet routinely established, but harvests are being conducted roughly three times per year.

At present the Australian Government Web Archive represents Commonwealth government web publishing. We are now hard at work developing a collaborative model in consultation with National and State Libraries Australasia to enable the state and territory libraries to contribute state-based government content to the central archive, with the National Library taking responsibility for the storage and delivery of content and ultimately, its preservation.

The Archive is already proving its worth as government officials, researchers and journalists use it to find content that has disappeared from department or ministerial websites. One comment left on a blog posting recently said:

Keep up this brilliant work and especially with the new infrastructure archiving platform. Like the archiving mirror in Alexandria, Egypt, Pandora and its successors will be much valued by historians and researchers of today and the future.

Digitisation

I’d now like to consider the role of digitisation as a large-scale preservation strategy for paper-based materials. Done well, digitisation achieves the reformatting of physical items to extend their life or the information they contain. This is particularly the case when the items are fragile, rare or unique. It is an added benefit that digitisation also enables much wider access to the item than was previously possible. That being the case, the National Library determined some years ago that digitisation rather than microfilming was an appropriate preservation strategy for its historical Australian newspaper collection [Slide 22].

It is important to observe that creation of the surrogate copy is the first step in preservation of the original newspaper; the second step is to preserve the digital file itself in the years to come. In short, digitisation is a necessary but not a sufficient condition for true preservation of the original work. I will not be discussing digital preservation in this paper.

Many libraries and cultural institutions have now established their own in-house digitisation activities, or have outsourced all or some of the process.

However, we have observed that while digitisation itself is relatively easy, providing the systems which deliver content to users and enable them to discover it is expensive and difficult. The digitisation of newspapers is a good example.

Over the last six years the National Library has made a major investment in its newspaper digitisation program, building the infrastructure to process content and deliver it through the Trove platform. An investment on this scale is difficult for many libraries to justify. For this reason, the Australian state and territory libraries have partnered with the National Library to leverage off its investment. Where these libraries can generate funding to digitise titles of interest to them, such as regional newspapers, these funds are contributed to the Library to digitise and process the additional titles and mount the titles on the Trove service.

Six years after the program began, we are delivering 17 million pages of newspapers through Trove [Slide 23], the largest digitised newspaper collection in the world. 1,000 newspapers have been completely digitised out of a total of (we think) about 7,700 newspapers ever published in Australia. We have strong partnerships with the state and territory libraries, which have contributed their own funds to the project and coordinated a myriad of small contributors such as local libraries, historical societies and local councils. We have collectively invested some $30 million in Trove, with around half of that figure relating to service infrastructure, and half to production of the digital content that has made Trove so successful. In a sign of the importance and centrality of this collaborative endeavour, state and territory libraries have funded approximately half of Trove’s digital newspaper content. The initiative has been so successful that no historical newspaper is being digitised in Australia outside our program. Meanwhile, the volume of Trove use, the wide audiences we have reached, and the volume and variety of complimentary feedback from researchers and users tell us every day that this initiative has had massive public impact.

However, the seeds of the success of the newspaper digitisation program were sown many years ago, before digitisation was ever thought of.

How is this so?

The Australian Newspaper Digitisation Program actually has its roots in a cooperative newspaper microfilming program that was initiated by the Library in 1991 to preserve Australian newspapers that were fragile, in heavy demand, or otherwise ‘at risk’. The Library worked with the Australian state and territory libraries to save these rich documentary sources with each partner firstly cataloguing their newspapers and making them known, and then microfilming them with financial support from the National Library. The collaboration was formalised under the name of the Australian Newspaper Plan [Slide 24], which extended to all aspects of newspaper collecting, preservation and access.

At that time preservation microfilming was a vital strategy for preserving permanent access to Australian newspapers. For some Australian Newspaper PLAN libraries it still is, as they continue to preserve their newspaper collections through filming, re-filming poorly filmed material and copying from acetate microfilm to stable polyester microfilm.

The cooperative newspaper microfilming program ceased in June 2010, by which time it had laid the foundation for the success of the newspaper digitisation program. This happened in two ways. Firstly, it created a large body of microfilm content that could be efficiently scanned and enabled optical character recognition. This was far quicker than scanning from hard copy. Secondly, the Australian Newspaper PLAN collaboration over newspaper collection management had built up a level of trust and goodwill between partners, so that the transition to a collaborative digitisation program could be based on the same relationships forged by the Cooperative Microfilming Program. It is clear to all that the individual libraries by themselves could never have accomplished the mass digitisation and preservation of newspapers on the scale achieved by their collaboration to date. The availability of the Trove service and the existing cooperative program enabled the Australian Newspaper PLAN libraries to seamlessly transition to a forward-looking and world-leading collaborative digitisation program. Importantly, it proved to be attractive to government funding authorities. The State Library of NSW was able to harness substantial funding for collection digitisation through the NSW Government’s Digital Excellence Program, which has contributed significantly to the program. In the last year alone, we have digitised and delivered a record 4 million more newspaper pages.

The digital future

At the time of writing, the National Library is awaiting news that the Australian Parliament has passed a piece of legislation now before it mandating the legal deposit of electronic publications [Slide 25]. If and when it becomes law, the new legal deposit arrangements will enable the Library to fulfil its mandate to collect and preserve all Australian publications, whether print or digital.

Our state and territory library counterparts either have or are seeking similar legislation from their respective governments. We have begun to explore with them the feasibility of sharing infrastructure and processes to collect and manage the Australian digital imprint. We see three guiding principles for such a collaboration: to provide publishers with a seamless and efficient way to meet their legal deposit obligations; to leverage each institution’s resource commitment to maximise the benefit for all; and ultimately, to ensure the long term management and preservation of the national collection.

In conclusion, preserving large-scale cultural heritage is everyone’s business [Slide 26]. It requires an integrated, collaborative approach on a number of levels: within institutions to ensure proper care for collections; between institutions to accomplish together what one institution cannot achieve by itself; between heritage institutions and their natural stakeholders, whether creators or supporters; and from governments themselves, to ensure that the cultural record of a society is collected, protected and preserved for future generations. [Slide 27]

Thank you and I hope that over today and tomorrow I will learn about the innovative and successful collaborations that you run in order to preserve large scale national collections.
