Stories, sources and new formats: the challenges of digitising large archive source collections

In the last two decades or so there has been a movement towards digitising large collections of original sources. These projects have had a range of purposes, approaches and target audiences but there can be little doubt that they have had a profound impact on the practice of history in the academic, educational and public domains. In this article Ben Walsh and Andrew Payne attempt to survey the broad landscape of the scale and impact of digitisation and assess how the digital landscape has affected the historical landscape.

28 The Historian – Spring 2016 The National Archives website: www.nationalarchives.gov.uk

The digitisation of history collections

The development of digital technologies has opened up tremendous new possibilities for the study of history. Digital imaging and digital storage have enabled the creation of vast digital libraries enabling academic historians, educators and the general public relatively easy access to very large collections of original documents and other sources from archive collections.

The process began in the 1990s with the digitisation of document collections on to CD-ROM. One of the earliest instances of this was The British Library’s series of CD-ROMs Sources in History. Since then digital imaging and digital storage have developed spectacularly such that high-resolution copies of documents, maps, cartoons, census and other official records and countless other types of documents are now available to users in universities, schools and homes.

And the digital collection continues to grow. Last year The National Archives delivered more than 640,000 documents to readers in their reading rooms at Kew and over 200 million downloads via their website and licensed associate partnerships. Clearly digitisation offers archives a powerful way to open up their collections to academic researchers, journalists and family historians as well as history teachers and students. Indeed, in many ways the internet was made for archives, allowing users to search and download copies of documents from anywhere in the world.

The National Archives has transformed its online catalogue Discovery into the single point of access to 32 million descriptions of records; 22 million of these are in their own collection and over 10 million records in 2,500 other archives across the country. The vast majority of its collection, however, remains available only for physical inspection in the reading rooms at Kew, with around 7% of records having been digitised for online access.

Digitisation has been driven by four key factors: revenue generation, preservation, access and digital re-unification.

Revenue generation has become an important motive as archives have sought to widen their funding base. Name-rich collections offer an attractive option with a growing audience interested in family history and genealogy. By partnering with commercial organisations such as Ancestry.co.uk and FindMyPast.co.uk, it has been possible for The National Archives to provide much wider access to the census, military service and migration records to researchers around the world. At the same time the licensing agreements have generated income to allow for further investment in access and services. Digitisation also allows fragile and damaged records, such as the ‘Burnt Collection’ of First World War service records, to be preserved through reduced handling. Finally digitisation has allowed for the bringing together of physically separated content to create comprehensive online collections which allow for ‘Big Data’ projects such as the University of Sussex project on the household expenditure surveys of the early twentieth century which are held at Bangor University and The National Archives.

How is the process of digitisation contributing to academic scholarship?

In addition to providing access to collections of source material which might be otherwise beyond reach, digital technologies allow historians to conduct new types of research. Digital resources offer up tremendous possibilities for searching, sorting, categorising and searching for patterns.

The British Living Standards Project

Professor Ian Gazeley of the University of Sussex has described how digitisation has transformed his research practice, particularly in the British Living Standards Project, based on British government household surveys:1

‘The digitalisation of twentieth-century British household expenditure records has allowed me to tackle bigger questions than was possible hitherto. I’ve used the results of household budgets studies in my research since my PhD thesis 30 years ago, but these were all small-scale enquiries that did not present significant issues with respect to capturing the information they recorded. The 1953/4 Household Expenditure Survey is immense by comparison and was designed to be a stratified random sample of all British households. It is quite rare to find a large survey where all the original records survived. Capturing the expenditure information recorded on a daily basis for three weeks by all individuals in 13,000 households would not have been feasible without modern digitisation techniques, as there are about one million hand-written individual daily expenditure records in the survey. We have digitised the material to be in keeping with the original records as faithfully as possible. The data we have extracted from this survey has allowed us to answer a number of questions relating to the standard of living in Britain at mid-twentieth century, and now that this data is in the public domain, I’m sure that in the future it will be used to address questions that we did not foresee at the point of digitisation.’

England’s medieval immigrants

Similarly, Professor Mark Ormrod of the University of York and his colleagues have reconstructed the lives of numerous high-profile immigrants in England in the medieval period as a result of the digitisation of the medieval Alien Subsidy Returns and Letters of Denization in the England’s Immigrants project.2 Perhaps more interesting, they have used a prosopographical approach to collect the scraps of information in these sources about more humble individuals to develop a fascinating new insight into patterns and processes of migration in the medieval period. As Professor Ormrod himself observes:

‘Electronic resources have transformed access to historical records for an increasingly diverse range of users, and thus made it possible, as never before, for people to understand the evidence on which we base – and extend – our knowledge of the past. The opportunity to reach public audiences has been especially important for medieval and early modern historians, because the records we use are written in hands and languages that are difficult to read. So electronic delivery of document facsimiles, modern English translations and summaries, and databases of standardised information have helped to revive interest and confidence in working with medieval records. Among some of the recent key examples here are the Fine Rolls of Henry III,3 England’s Immigrants, 1330-1550 and the York Cause Papers project.4 Electronic access gives history a bright future!’

Robot historians?

It may be that digitisation will continue to transform the discipline of history still further. Canadian historians Geoff Keelan and Kirk Goodlet have speculated a future in which historians will be aided and abetted by robots.5 This does not mean a metal historian! It is entirely plausible that before long software will be able to read difficult handwriting and convert to more readable forms. Texts in modern and ancient foreign languages might be translated with sufficient accuracy. As more and more transactions are digital, so the ability of software to archive these transactions becomes feasible. Of course this may be a curse rather than a blessing as we are overwhelmed with data. Having said that, robot readers could conceivably ‘understand’ texts to the extent that they might be able to direct historians’ energies by recommending them to study particular source materials. Artificial Intelligence may seem the stuff of science fiction but it is a rapidly-developing field and may open up new approaches to the study of history.

An example of a return for a railway platelayer in Kilburn, North London, in 1904. Records became more detailed in subsequent surveys. While individual records provide a fascinating insight into the lives of families, the digitisation of large surveys opens up the prospect of new levels of investigation and interpretation based on thousands of such records.
The British Living Standards Project, University of Sussex, funded by ESRC Research Grant RES-062-23-2054

Visualisations from the England’s Immigrants project datasets. The graph on the left illustrates the points of origin of the majority of medieval immigrants to England for whom nationality is known. On the right is a representation of the number of Icelanders living in England in the medieval period. www.englandsimmigrants.com/
England’s Immigrants 1330-1550 (www.englandsimmigrants.com/, version 1.0, 29 January 2016)
England’s Immigrants 1330-1550 (www.englandsimmigrants.com/, version 1.0, 2 April 2015)

What are the challenges in digitising a collection of sources from an archive and making them available?

The process of digitisation itself is well practised and understood, with four main steps to get documents from artefact to screen.

1 Preparation of material to ensure the best image possible can be obtained. This may require documents to undergo conservation work.
2 Photographing is usually done in high-resolution TIF format which ensures the digital image files contain as much data as possible.
3 Processing the image files into output formats to meet the requirements of the project. These will usually be either JPEG image files or PDF which can allow for optical character recognition (OCR) to aid with the transcription of documents if required. This stage also involves developing appropriate file structures and metadata which will help users to work with the material.
4 The final step is making material searchable to facilitate research. This can include any combination of tagging, transcribing, OCR and crowd-sourcing to make machine-searchable formats.

This final stage often poses the biggest challenge as the nature of the documents can determine how it is undertaken.

‘Structured data’ such as forms are often easier to work with once the key fields have been identified to aid with searching. Not everything needs to be made searchable – usually only key data fields such as name, address, date of birth, service unit etc, as with the First World War service records.

‘Unstructured data’ such as letters, minutes and reports are more challenging because what needs to be transcribed will depend upon the viewpoint of the user. This usually means transcribing everything.

In the case of the Cabinet Papers (www.nationalarchives.gov.uk/cabinetpapers) the typed minutes and memoranda could be transcribed using OCR and then a selection checked for quality control. The handwritten Cabinet Secretary notebooks have to be transcribed by hand, however, which is time consuming and therefore expensive. One solution to this is to use crowd-sourcing projects such as Zooniverse, which bring large numbers of digital volunteers to work on records by tagging terms such as names, locations and key events. The WO95 Operation War Diary has used this approach to create metadata about the official diaries kept by each battalion commander during the First World War.6

Bringing digital collections to wider audiences

So far we have focused on what academic historians have gained, and might further gain, from the process of digitisation. But the National Archives and most other holders of collections do not want to reserve their collections purely for academic use. The last few decades have seen an explosion in public interest in history. Probably the largest single element of public activity has been in tracing family histories. In this respect the digitisation of large collections such as census returns and military records, particularly from the First World War, has opened up historical research to a completely new audience.

History teaching has also benefited from the opening up of collections in digital form. Primary and secondary schools make extensive use of original historical documents in their teaching of history. Some believe that school students cannot make proper use of such materials. We do not, alas, have space in this article to explain why we believe such naysayers are wrong, but they would be welcome to visit The National Archives, The […] and countless other local and national institutions to be proved wrong in person. To avoid any possibility of misunderstanding, we are not advocating a replacement of traditional teacher-led exposition or the demise of textbooks. We see the use of archival documents and other sources as one aspect of a good history education, complementing and enhancing the many other ways of teaching history which are available and which inform and enthuse students in the subject.

On the other hand, such materials as the Cabinet Papers or […] cannot simply be placed in front of young students. So a challenge which follows on from the digitisation of collections of documents is how to make these materials accessible to a general and/or a school audience. There are a number of key challenges. Perhaps the most formidable is language. Early documents are generally in Latin or perhaps medieval French. Even contemporary documents are couched in terms which can be inaccessible simply because of the formality of their language or because they are full of technical terms. Appearance is another potential barrier. The dense text of medieval scribes was once referred to as ‘nothing more than scribble on a page’ in one particular meeting relating to medieval collections at the National Archives!

Digitisation and technology can help with this set of challenges. At the simplest level, technology makes it relatively easy to make transcripts (and/or translations) available. Technology can also make different levels of transcript or translation available, with transcripts and simplified transcripts, for example. But technology can also help by making it relatively easy to provide additional supporting materials such as audio versions of documents. Technology can also provide digital equivalents to ‘notepads’ allowing students to record and share thoughts and ideas and questions easily in a medium with which they are comfortable and familiar.

There are many examples where digital technologies have made apparently inaccessible materials accessible to a wide range of school students from all over the world. They range from the Norman Conquest to the Cold War and beyond.

Magna Carta

In March 2015, The National Archives launched ‘Magna Carta and the emergence of Parliament’, a digital resource for school students developed in partnership with Parliament.7 Part of the 2015 Parliament in the Making programme, it marks the 800th anniversary of the sealing of Magna Carta and the 750th anniversary of Simon de Montfort’s parliament, and sets these events within the wider context of the struggle between kings and barons from 1066 through to the end of the thirteenth century.

The National Archives Education resource on Magna Carta.

The resource requires students to work with over 30 original documents from the period to investigate how and why Magna Carta is issued, re-issued and evolves over time. Medieval documents present real challenges for engaging teachers and students, however, as they appear little more than illegible text.

The solution was to adopt an unconventional approach which requires students to work independently to write a comprehensive account of the struggle between kings and barons over a 250-year period. This requires them to read, understand, draw evidence from and substantiate judgements about four key points in time – 1215, 1225, 1265 and 1297 – before wrapping the whole thing up in the style of a medieval chronicle. The opening video presents the mystery of different versions of Magna Carta issued at different times by different people for different reasons and then sets the enquiry: why does Magna Carta keep coming back?

A cast of figures from the period draws students in, inviting them to participate in a journey of discovery, using ancient maps to track down parchment texts and to press well-intentioned, if occasionally impertinent, questions upon powerful people. The great monk chronicler, Matthew Paris, acts as a guide and mentor, setting tasks, selecting documents, helping with translations and explaining their meaning, but always recognising that ultimately the student is actually the master in the relationship. For while Paris, as with all of his contemporary chroniclers, is a dab hand at recording the events of the period, he is less capable of explaining why they occur and judging what their significance may be.

Students are rewarded throughout the resource with badges for visiting locations, reading documents, interviewing characters and completing chronicle chapters.

• Explorator (Explorer)
• Inquisitor (Researcher)
• Interrogator (Interviewer)
• Chronicator (Chronicler)

If they complete all this they will become Magister Chronicator (Master Chronicler) but more importantly they will have developed an incomparable knowledge and understanding of the complexity of how Magna Carta evolved and how it is linked to the […], the genesis of our rights, the origins of Parliament and the foundation of our constitution.

The Churchill Archive

The Churchill Archive holds the personal archive of Winston Churchill.8 By the standards of many archives it is quite small. But that still means around 800,000 private letters, speeches, telegrams, manuscripts, government transcripts and other key historical documents. These range from the iconic, such as the handwritten drafts and annotations of wartime speeches, to the mundane, such as the letters from Churchill’s constituents on all manner of issues great and small. There is also the delightful, such as the exchange between officials concerning the Prime Minister’s whisky. So the first challenge was to find a way to make this overwhelming amount of material navigable.

In order to make this collection accessible to teachers and students in school, Bloomsbury publishers commissioned a team of teachers to create investigations based on selected sources. The sources were presented in their original forms, something we believe is very important. An official document may appear intimidating but when it is stamped ‘Top Secret’ and the list of people to whom it was circulated includes the monarch, the Prime Minister and numerous other important people then it can generate intrigue. The sources were also supported by transcripts and simplified versions.

Another challenge with the Churchill Archive was the perception that it could only be of value for studying Churchill himself or for great events. This was tackled through a series of short articles providing advice on how the collection could be harnessed for other histories, particularly by focusing on the letters written to Churchill by his constituents. These letters provided a treasure trove of insights into the preoccupations, perceptions and viewpoints of the ordinary people Churchill represented in his Parliamentary career.

A letter from the Churchill Archive concerning the contents of Churchill’s whisky bottle. By kind permission of Curtis Brown, the Churchill Archives Centre and Bloomsbury Publishing. www.churchillarchive.com www.churchillarchiveforschools.com

REFERENCES
1 British Living Standards Project www.sussex.ac.uk/britishlivingstandards/protect
2 England’s Immigrants 1330 – 1550 Resident Aliens in the Late Middle Ages www.englandsimmigrants.com/
3 Fine Rolls of Henry III www.finerollshenry3.org.uk
4 York Cause Papers www.hrionline.ac.uk/causepapers/
5 Robot historians and the future of history http://clioscurrent.com/blog/2015/6/29/robot-historians-and-the-future-of-history
6 WO95 Operation War Diary www.operationwardiary.org
7 National Archives Magna Carta www.nationalarchives.gov.uk/education/medieval/magna-carta/
8 Churchill Archive for Schools www.churchillarchiveforschools.com/

Ben Walsh is Associate Vice President of the Historical Association. He is a textbook author, trainer and senior examiner.

Andrew Payne is Head of Education and Outreach at The National Archives where he leads on the development and delivery of services for schools, teachers and communities.
