How reproductive is a reproduction? Digital transmission of text-based documents

Lars Björk

Swedish School of Library and Information Science, University of Borås, 2015
Cover photo: Lars Björk
Image: Andrea Davis Kronlund
Portrait: Jens Östman
Printed: Responstryck AB, Borås, 2015
Series: Skrifter från Valfrid, no. 59
ISBN 978-91-981654-6-3 (printed version)
ISBN 978-91-981654-7-0 (pdf)
ISSN 1103-6990

1. INTRODUCTION 1
1.1. The problem area 3
1.1.1. Historical framing 6
1.1.2. The digital document in the heritage sector 9
1.1.3. Different approaches to the digital document within the library sector 16
1.1.4. Conceptualising digitisation 23
1.1.5. Summing up the problem area 29
1.2. Positioning 30
1.3. Previous work 34
1.4. Objectives and relevance 39
1.5. My background 41
1.6. Terms, concepts and delimitations 42
1.7. The research questions and outline of the thesis 47
1.7.1. Outline of the thesis 50
2. THEORY 51
2.1. Documents and information 52
2.2. Production and reproduction of documents 59
2.3. Basic principles of transmission 63
2.3.1. Pre-digital methods 65
2.3.2. Digital methods 67
2.4. Informative configurations 71
2.5. Fixity and flexibility 76
2.6. Transmissional alterations 81
2.6.1. Loss and gain 86
2.6.2. From binary to analogue 90
2.7. Modes of representation 92
2.8. Following the document into the digital 100
2.8.1. Boundary object theory 101
2.8.2. Boundary objects in LIS and document studies 103
2.8.3. The digital resource as a boundary object 106
2.9. Summary 109
3. METHODOLOGY 111
3.1. The digitisation process 111
3.1.1. The guidelines 112
3.1.2. Comments on the empirical material 121
3.2. The capacity of documents in digital format 125
3.2.1. The informants 126
3.2.2. Reflections on the informants and the interviews 130
3.3. Method 131
3.4. Summary 133
4. THE DIGITISATION PROCESS 135

4.1. The analytical procedure 137
4.1.1. Basic structure 141
4.1.2. Modelling the process 142
4.2. The stages in the process 145
4.2.1. Selection 146
4.2.2. Information capture 148
4.2.3. Processing 151
4.2.4. Archiving 153
4.2.5. Presentation 155
4.3. The process and its context 157
4.4. The process as the link between the source document and the reproduction 161
4.5. Summary 166
5. THE DOCUMENT IN DIGITAL FORMAT 169
5.1. The analytical procedure 170
5.2. Informative configurations 174
5.3. Summary 190
6. DISCUSSION 193
6.1. Framing the digital resource 194
6.2. Representing the source document 202
6.2.1. Form 203
6.2.2. Sign 206
6.2.3. Context 209
6.2.4. Maintaining flexibility 210
6.3. Negotiating boundaries 212
6.3.1. Displacements and overlaps 212
6.3.2. Tracing the roots 216
6.4. Summary 219
7. CONCLUSIONS 221
7.1. The research questions 223
7.1.1. Theoretical framework 223
7.1.2. The digitisation process 224
7.1.3. The capacity of documents in digital format 225
7.2. Reflections on the study 228
7.3. Contributions 230
7.4. Future work 231
ABSTRACT 235
SAMMANFATTNING (Abstract in Swedish) 239
REFERENCES 243
APPENDIX 1: Technical standards 261

APPENDIX 2: Interview protocol 262
APPENDIX 3: Recording and transcribing the interviews 263
APPENDIX 4: Processing the interviews 264
APPENDIX 5: Mail to the respondents 267
APPENDIX 6: Publications 1980-2010 269

LIST OF FIGURES

Figure 1. Facsimile and TEI-encoded representations, ver. 1 17
Figure 2. Facsimile and TEI-encoded representations, ver. 2 18
Figure 3. Visual representation of the binding structure of Codex Gigas 19
Figure 4. The collation formula of the Codex Gigas 20
Figure 5. The Codex Gigas during examination 22
Figure 6. The relation between capture, transmission, and representation 64
Figure 7. Capturing a text as individual signs 64
Figure 8. Capturing a text as image 65
Figure 9. Digital methods for capture and representation 68
Figure 10. Modelling of the informative dimensions of the document 73
Figure 11. The model used to indicate a focus on the dimension SIGN 74
Figure 12. The model used to indicate a focus on the dimension CONTEXT 75
Figure 13. The model used to indicate a focus on the dimension FORM 75
Figure 14. Linné, Fauna Svecica, as represented on a computer display 91
Figure 15. Linné, Fauna Svecica, emphasis on the dimension SIGN 94
Figure 16. Linné, Fauna Svecica, emphasis on the dimension FORM 95
Figure 17. Linné, Fauna Svecica, emphasis on the dimension CONTEXT 96
Figure 18. Linné, Fauna Svecica, images of two pages 97
Figure 19. Linné, Fauna Svecica, untrimmed and trimmed pages 98
Figure 20. Linné, Fauna Svecica, two types of bindings 99
Figure 21. Linné, Fauna Svecica, untrimmed and trimmed edges 99

Figure 22. The main steps in the analysis of the guidelines 137
Figure 23. Three units of analysis, ver. 1 139
Figure 24. Three units of analysis, ver. 2 140
Figure 25. The relation between capture and representation 141
Figure 26. The digitisation process modelled 141
Figure 27. Two flows of documents in the digitisation process 142
Figure 28. The core sequence of stages in the digitisation process 143
Figure 29. A conceptual model of the digitisation process 145
Figure 30. The relation between the guidelines in the sample 160
Figure 31. Examples of information transfer during the transmission 165
Figure 32. The digital resource with potential points of engagement 166
Figure 33. The main steps in the analysis of the interviews 171
Figure 34. Informative configurations as observed in the interview study 173
Figure 35. Respondent 7’s engagement with the document 174
Figure 36. Respondent 1’s engagement with the document 175
Figure 37. Respondent 13’s engagement with the document 176
Figure 38. Respondents 4, 5, 8, and 9’s engagement with the document 177
Figure 39. Respondent 1’s engagement with the document 178
Figure 40. Respondent 10’s engagement with the document 179
Figure 41. Respondent 3’s engagement with the document 181
Figure 42. Respondent 6’s engagement with the document 182
Figure 43. Respondent 8’s engagement with the document 185
Figure 44. Respondent 8’s engagement with the document 187
Figure 45. Respondent 2’s engagement with the document 188
Figure 46. Respondent 15’s engagement with the document 189
Figure 47. Tarte’s model 199
Figure 48. The informative dimensions in digital format 200
Figure 49. The model of the informative dimensions 201
Figure 50. The dimension FORM as rendered in digital format 205
Figure 51. The dimension SIGN as rendered in digital format 208
Figure 52. The dimension CONTEXT as rendered in digital format 210
Figure 53. Various ways of representing Linné, Fauna Svecica 211

LIST OF TABLES

Table 1. Seven punctuations of equilibria 60
Table 2. The informative dimensions of a document 72
Table 3. Three types of media literacy 85
Table 4. Error model for digitised books 87
Table 5. Some parameters that affect the transmission 89
Table 6. Keywords used in the literature search 112
Table 7. The documents in the sample 115
Table 8. A comparison between some features of the bokhylla.no and VD16 128
Table 9. The respondents in the interview study 129
Table 10. Modes of representation 202

ACKNOWLEDGEMENTS

It all started at the end of the 1990’s, at a conference, when I first met my supervisor-to-be. I think it was our mutual interests in documents (Mats’ as a researcher in the field of critical editing and mine as a conservator) that formed the starting point for our first discussion. At that time I was head of preservation at the National Library and digitisation was the hot topic in the library sector, but there was very little attention to the relation between text and materiality in the digital domain. Mats was an inspiring contact and subsequent discussions introduced me to fields of research that enabled my base in conservation to be combined with excursions into the digital. Mats, I am deeply grateful to you for putting me on track.

Intrigued by the potential of digital technology in relation to text-carrying documents, I wanted to dig deeper and was accepted as a doctoral student at the Swedish School of Library and Information Science. Thank you Gunnar Sahlin, Magdalena Gram, and Margareta Lundberg Rodin for providing the practical circumstances that made it possible to combine doctoral studies with my work at the National Library.

A big thank you goes to my two brilliant supervisors, the above-introduced Mats Dahlström and Pelle Snickars, for being such an awesome and challenging combination. You certainly encouraged and guided my explorations, fostered my critical thinking, and showed great patience when the investigations sometimes took surprising directions.

I would like to thank my fellow PhD students at the SSLIS for discussions and seminars that have contributed to the development of this thesis. My thanks also go to the Division of ALM and History at the Department of Arts and Cultural Sciences, Lund University, for inviting me to present a text and for offering valuable feedback at a research seminar.

Keeping the desk at my ordinary work at the National Library in Stockholm during this period of studies enabled me to use the library as a lab, a test bed for some of the concepts that were developed in the thesis. This also meant that I could tap into the expertise available among the library staff. I would like to thank my friends and colleagues, former and present, at the department of preservation, the department of manuscripts, prints, and drawings, the specialists on imaging and audiovisual conversion at the digitisation department, the group, and the specialists in metadata, for sharing your knowledge. A special thank you goes to my KB think tank: Katinka Ahlbom, Charlotte Ahlgren, Johanna Fries (now at the National Archives), Olof Halldin (now at Stockholm University of the Arts), and Torsten Johansson.

I have had the opportunity to discuss some of the ideas that form the basis of this thesis with scholars within the field of document studies: Niels Windfeld Lund, Johanna Drucker, Kiersten Latham, and Joacim Hansson. Parts of the thesis, in various stages of maturity, have also been presented and discussed at the Document Academy meeting at Linköping 2011, the UNESCO conference on digital heritage in Vancouver 2012, and at the LIDA conference in Croatia the same year. There are a number of researchers who have read and commented on the text as it developed: Helena Francke and Jan Nolin chaired the mid-term seminar, Paul Conway contributed with valuable feedback on a later version, and Laura Skouvig did a thorough job during the final seminar. Jan Nolin has provided very helpful and constructive comments throughout the years of writing and he also scrutinised the final stage of the thesis together with Mikael Gunnarsson. I would like to thank all of you for your constructive criticism and encouragement.

All images are taken by myself except figure 5, which was captured by Ingrid Christina Svensson at the National Library. Andrea Davis Kronlund did an invaluable job in editing these images to compensate for my lack of photographic skills. Frances Hultgren read and commented on the English text. Andrej Markiewicz provided valuable last-minute support. Ségolène M. Tarte and Paul Conway kindly allowed me to reproduce details of their work. Thank you all.

Finally, I abandon this chronological approach and return to those who have been there since before this enterprise started. Thanks to my parents, Monica and Karl Erik (who did not live long enough to see the final punctuation mark), Eva and Olle Holmlöv (for their encouragement and support), my wonderful children Miriam and Isak (who also happen to be my two best teachers), and finally my Anna, who in various, and sometimes astonishing ways, through thick and thin, has always been the stable point during this endeavour, and still is my best friend.

1. INTRODUCTION

The management of books and manuscripts as information resources involves a seeming paradox. Given that storage conditions meet established standards, the most effective way to extend the lifespan of text-carrying documents is to minimise usage and circulation. But while the actions taken to ensure the preservation of physical documents thus generally focus on establishing stable conditions, the survival of the works carried by these artefacts relies on movement. Texts have throughout history been transferred from carrier to carrier in order to reach readers and as a method to ensure their survival beyond the life span of the supporting substrate. Transmission has always been an integral part of the life cycle of documents. But the strategies applied vary. Materials such as clay, stone, papyrus, parchment, and paper have been used to carry inscriptions. With the invention of printing, the paper-based codex became the default format for storage and transmission of texts in the Western world.

We are now in the midst of a new paradigmatic media shift as digitisation has become an integral component of management in the cultural heritage sector. Documents and objects, previously accessed on location in heritage institutions, are now transferred into digital formats and managed, distributed, and engaged with as binary-encoded representations. The ubiquitous presence of digital technology in our everyday lives has led us to accept – sometimes unreflectingly – these representations as substitutes, replacing access to source documents.

The growing reliance on digital reproductions raises questions regarding the role and function of the documents and collections at the managing institutions. Activities and development within the field of digitisation in the cultural heritage sector tend to focus on the production and administration of information resources, e.g. infrastructure for production and dissemination, long-term sustainability, and the capacity to manage and process large data sets. But the relation between the source document and its reproduction, the actual transmission from the analogue to the digital domain, has garnered considerably less attention. As resources that are created and maintained within the digital domain are incorporated within the holdings of the heritage institutions, the distinction between born digital and digitised should be of some concern, and consequently also the relation between source document and reproduction. To a great extent it is this relation that conditions the informative capacity of the reproduction, that is, its capacity to serve as a substitute for the source document or as an enrichment, a pointer, or a comment on a particular document or collection. The notion of informative capacity will be further developed in chapter 2 as part of the theoretical framework that underpins this thesis.

The objects in focus for this thesis – text-carrying documents – are part of material culture. The digitisation process relies on technical equipment, and the resulting digital document consists of strings of binary code and is manifested through the interaction of hardware and software. There are consequently several aspects of the relation between source documents and their reproductions that are quantifiable. This relation can thus be described (at least to some extent) in terms of the resolution of the capture device, parameters such as bit depth and colour space, the associated metadata, file formats, and interface design, to name a few relevant factors.
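The quantifiable side of this relation can be made concrete with a small calculation. The sketch below is a hypothetical illustration, not drawn from the thesis or any digitisation guideline: it estimates the uncompressed size of a single page capture from three of the parameters just mentioned – physical page size, capture resolution, and bit depth. The function name and the example figures are the author’s own assumptions.

```python
# Hypothetical illustration: how quantifiable parameters of the
# transmission (resolution, bit depth) determine the size of a capture.

def capture_size_bytes(width_mm: float, height_mm: float,
                       dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of one captured page image."""
    mm_per_inch = 25.4
    width_px = round(width_mm / mm_per_inch * dpi)   # pixels across
    height_px = round(height_mm / mm_per_inch * dpi)  # pixels down
    return width_px * height_px * bits_per_pixel // 8

# An A4-sized page (210 x 297 mm) captured at 400 dpi in 24-bit colour:
size = capture_size_bytes(210, 297, 400, 24)
print(f"{size / 1_000_000:.0f} MB per page, uncompressed")
# prints: 46 MB per page, uncompressed
```

Real workflows add compression, colour management, and embedded metadata, so actual file sizes differ; the point is only that this layer of the source–reproduction relation is fully expressible in numbers, unlike the conceptual layer discussed next.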

However, if we regard the informative capacity of a document not as an intrinsically fixed entity but rather as the result of the interaction between its constituent informative dimensions and the user, the relation between the source document and its reproduction can no longer be defined by relying on quantifiable parameters only. The way we engage with documents is based on their characteristics as products of technical processes, but also on our assumptions regarding their capacity as providers of text- and/or image-based inscriptions as well as their functions within a social context. The roles we ascribe to them are situation dependent. In this way the digitisation process, in itself a drastic procedure of mediation, also reflects our expectations of the capacity of the resulting products.

The digital document exists as a definite piece of materia, i.e. as hardware, software and code, but also as a conceptual formation, subjected to multiple, and sometimes conflicting, interests. As a binary formation the digital conversion can be described with great accuracy by defining technical procedures and sequences of code. But as a conceptual formation this transmission is harder to grasp; the conversion to digital format destabilises boundaries and exposes the document to acts of reconfiguration and interpretation that have no equivalent in the analogue domain. Still, our understanding of the digital resource as a provider of information is often shaped by a rather uncritical acceptance of the reproduction at “face value”, thereby ignoring both the technical particulars of the binary format and the consequences of the digital transmission on a conceptual level.

The focus of this thesis is thus summarised in its title: “How reproductive is a reproduction?”

1.1. The problem area

The topic of this inquiry is the digitisation of printed documents. The relation between the source document and its digital reproduction constitutes the core area of research. As this relation is conditioned both by technical parameters and theoretical assumptions, there are several complementary aspects that have to be considered and discussed in order to establish the research questions that guide the thesis. In this section the area of inquiry will be outlined.

The product of the digitisation process, the reproduction, always represents an individual copy. It will be argued that the characteristics of this relation, between the source document and its representation, are factors that strongly influence the informative capacity of digital resources. This aspect is frequently overlooked in the general “digitisation discourse”, where digital representations tend to be assessed at face value and the particulars of the source documents, as well as the consequences of the process of transmission, are often trivialised.

The Internet portal Europeana serves as one of many such examples.1 The interface offers access to the digitised collections of approximately 2,300 museums, libraries, archives, and galleries across Europe. The intention to offer a pan-European perspective on cultural heritage is formulated by the executive director Jill Cousins: “Europeana is a platform created with the European user in mind, where others can build on and create new ways of seeing our cultural and scientific heritage.”2

As long as we remain within the domain of Europeana we face a uniform presentation of the contributing institutions’ collections. The quadratic thumbnails homogenise the different types of objects retrieved in the search query into vast arrays of unified heritage units. But as soon as we try to trace the individual document, via its link back to the holding institution, we are faced with increasing degrees of heterogeneity. What we see (when accessing the Europeana website) is far

1 This and all subsequent references to internet-based resources were accessed 23 May, 2015. 2 Jill Cousins, Executive Director of Europeana, interviewed in LIBISzine, June 2014.

from what we get (when we enter the domains of the individual institutions that manage the collections). There is an overwhelming variation in every possible parameter regarding the digital resource, e.g. metadata, type of images, contextualisation, levels of description, and linking with the source document/object. The European cultural heritage in digital format is, on the local level, anything but homogeneous.

This example is not intended to suggest that the encounter with collections and institutions in “real life” would make a less disparate impression; rather the contrary. The example serves as an illustration of the capacity of digitisation as a tool for shaping (and even recreating), and of the potential risks involved in uncritically accepting what you are presented with.

Part of the reason behind this readiness to use and accept digital resources at face value might be ascribed to the apparent complexity and opacity of digital reproduction technology. This understanding often causes digitisation to be regarded as a kind of black box: an incomprehensible procedure where documents are entered and refined into pure information in a transformation of almost alchemical proportions. The chain metaphor of the digitisation process, the “digitisation chain”, is often seen as verifying this conception of an almost linear and causal relation between input and output in the digitisation process.3 This understanding of the process, a flow of documents that are turned into information, inhibits an understanding of the capacity of digital reproduction. A critical analysis will thus also have to consider the challenges to the conceptual consistency of the document posed by the conversion from analogue to digital format, as the defining parameters of the document in digital format differ considerably from those of the printed book.

The characteristics of the digital resource are not only affected by the technology employed to capture and represent documents. Underpinning the design and management of the transmission process are assumptions regarding the informative potential of the documents involved and the principles according to which this potential is digitally rendered. Supporting any digitisation activity there are consequently institutional praxes, ideals, and theoretical frameworks that regulate what constitutes the informative capacity of the source document in a given instance of use, and how this potential should be rendered in digital format. These aspects can be seen as representing a kind of tacit knowledge on an institutional level – guiding

3 An early example of this use can be found in S.D. Lee, Scoping the Future of the University of Oxford’s Collections, Final Report, 1999.

principles that are taken for granted and rarely documented – and in this way further contribute to the obscurity surrounding the digitisation process and its final product in the form of the digital reproduction.

If the complexities of the transmission are neglected and the reproduction is uncritically regarded as capable of replacing direct access to collections and documents, we risk basing our knowledge and assumptions on sources that have a restricted capacity to convey the information potential of the document.4 Specific concern has also been voiced in relation to the operations of Google Books.5

So far, the general area of inquiry addressed in the thesis has been outlined. The topic of research has also been indicated as concerning the relation between the digitisation process and the documents involved. However, the formulation of the specific research questions that guide this thesis will be discussed through a more extensive elaboration of the problem description. The following four sections will provide such an examination by applying different perspectives on the role of digitisation and digital reproductions as parts of institutional infrastructures within the heritage sector at large and the library field in particular.

When we engage with a document in situ at an institution we become part of a setting with a number of influencing factors, e.g. infrastructure and knowledge organisation, catalogues, indexes, and the competence of the staff. This interaction establishes an institutional embedding that often supports the informed use of documents and collections. What, then, is the role of such contextual framings when documents are accessed in digital format?

4 Various aspects of the question of trustworthiness in relation to digital representation are covered by e.g. R. Bradley, ‘Digital Authenticity and Integrity: Digital Cultural Heritage Documents as Research Resources’, portal: Libraries and the Academy, vol. 5, no. 2, 2005, pp. 165–175; R. Bee, ‘The importance of preserving paper-based artifacts in a digital age’, Library Quarterly, vol. 78, no. 2, 2008, pp. 179–194; S. Bayne, J. Ross & Z. Williamson, ‘Objects, subjects, bits and bytes: learning from the digital collections of the National Museums’, Museum and Society, vol. 7, no. 2, 2009, pp. 110–124; L. Duranti, ‘From Digital Diplomatics to Digital Records Forensics’, Archivaria, vol. 68, fall 2009, pp. 39–66; R. Gordon, ‘“The paradigmatic art experience?” Reproductions and their effect on the experience of the “authentic” artwork’, in E. Hermens & T. Fiske (eds.), Art, Conservation and Authenticities: Material, Concept, Context. Proceedings of the International Conference Held at the University of Glasgow, 12-14 September 2007, London, Archetype Publications Ltd., 2009, pp. 259–266. 5 See e.g. P. Duguid, ‘Inheritance and loss? A brief survey of Google Books’, First Monday, vol. 12, 2007; R. B. Townsend, ‘Google Books: Is It Good for History?’, in Perspectives on History – The News Magazine of the American Historical Association, 2007; and J.-N. Jeanneney, Google and the myth of universal knowledge: a view from Europe, Chicago, University of Chicago Press, 2007.

We will also, in the following sections, scrutinise some consequences of the application of digital technology in the management of collections, as well as their function as tools in systems of knowledge organisation. The lack of a consistent terminology will also be addressed, thereby introducing the question of how the digitisation process and its products are conceptualised. This will set the ground for a more focussed discussion of digital transmission in chapter 2. In order to provide a background for this discussion, the introduction of digital technology in the management of documents within the library sector will be briefly outlined.

1.1.1. Historical framing

The reproduction of text-based documents has always been a central activity within the library domain, although methods have changed. The technology used for reproduction is closely related to (and is frequently the same as) the methods used for production.6 The development of photography and subsequent photo-mechanical methods had a substantial impact on the library domain and the management of documents at large.7 Gitelman provides an extensive historical overview of document production in relation to media technology from the 1870’s and onwards.8

Volume 15 of The Papers of the Bibliographical Society of America (1921) dedicates a large section to the presentations and discussions held at the society’s annual meeting in 1921; the theme was the application of photographic methods of reproduction in bibliographical work. The discussions mainly focused on issues relating to costs and cooperation. However, Bendikson brought attention to some of the limitations of the photostatic method when used for scholarly purposes.9 He problematised

6 The technological aspects of transmission are discussed in section 2.3. 7 Vannevar Bush’s proposal regarding the MEMEX – an electromechanical memory extender based on microfilm – is an example of a more speculative approach. V. Bush, ‘As We May Think’, Originally published in Atlantic Monthly, vol. 176, no. 1, 1945, pp. 101-108, reprinted in N. Wardrip-Fruin & N. Montfort (eds.), The New Media Reader, Cambridge, MA, MIT Press, 2003, pp. 37–47. 8 L. Gitelman, Paper knowledge: toward a media history of documents, Durham, Duke Univer- sity Press, 2014. 9 L. Bendikson, ‘Photographic copying and reproducing’, The Papers of the Bibliographical Soci- ety of America, vol. 15, 1921, pp. 24–34.

the use of photostatic reproduction technology as it made it difficult to distinguish the typographical inscriptions as separate from, e.g., discolouration of the paper or specks and erratic typographical artefacts caused by faults in the printing process. The introduction of microfilming rendered methods such as photostatic copying and duplex cameragraphy obsolete.10 The use of microfilm for accessing source material for research was, in turn, questioned on grounds similar to those raised against the photostatic methods, as early as 1940 by Jackson, and even more disparagingly 50 years later by Tanselle.11 Still, the advantages outweighed the downsides, and microfilming remained the predominant method for reproduction of text-based documents within the library sector until well into the 2000’s.

Digital technology, as part of the general trend of automation within office management, was introduced in the library sector in the mid-1970’s.12 At that time the “MAchine Readable Cataloging format” (MARC), which enabled computer-supported treatment of bibliographical data, was already being implemented. Online Public Access Catalogues (OPAC) began to replace or, at least, complement the card catalogue in the early 1980’s.13 However, digital technology was still mainly used in the administration of bibliographical information:

The use of the computer in library environments remains generally limited to automated management of acquisition, cataloguing, and circulation information which libraries possessed and administered before computers – albeit more slowly. The arrival of the computer has yet to change the nature of our knowledge of the book; it has

10 The commercial introduction of electrostatic copying in the 1960’s marked the return of photo-mechanical methods of reproduction within the library field, although the conditions differed from those of microfilming. Attempts were made to launch photocopying for preservation purposes but the technology never threatened the position taken by microfilm. The use for preservation purposes today tends to be focussed on replacing damaged or missing pages in books that are part of circulating library collections. See e.g. American Library Association/Association for Library Collections & Technical Services: Guidelines for Preservation Photocopying of Replacement Pages. 11 See W. A. Jackson, ‘Some limitations of Microfilm’, Papers of the Bibliographical Society of America, vol. 35, 1941, pp. 281–288, and G. T. Tanselle, ‘Reproductions and Scholarship’, Studies in Bibliography, vol. 42, 1989, pp. 25–54. 12 See e.g. T. Haigh, ‘Remembering the Office of the Future: The Origins of Word Processing and Office Automation’, IEEE Annals of the History of Computing, vol. 28, no. 4, 2006, pp. 6–31. 13 See K. Butterfield, ‘Online Public Access Catalogs (OPACs)’, Encyclopedia of Library and Information Sciences, third ed., published online: Dec. 2009, New York, Taylor and Francis, 2009, pp. 3992–3996.

only mechanized what we already knew. However, the applicability of the computer to the questions posed by the physical book may turn out to be, by definition, limited to the collection and analysis of data created through other means.14

Digitisation as a method for the visual representation of text-based documents, as opposed to representing their metadata as in the OPACs, was initially used as a technique for the conversion of microfilm, but methods were gradually developed to enable the direct capture of text-carrying documents.15 The capacity of digitisation as a technology for accessing source documents was acknowledged by the Council on Library and Information Resources16 in the late 1980’s and resulted in a series of publications addressing the upcoming need within the library sector to facilitate the understanding of “options, trade-offs, economics, and suitability of current technologies for preservation purposes.”17 The need to establish a common ground within the field of digitisation in the library sector, in particular to enable and maintain the interoperability that is a key feature of digital technology, became evident at an early stage of digital development. One example is the taxonomy commissioned by the Commission on Preservation and Access (CPA) in 1989 in order to address the “rather complex field of digitization and electronic encoding”.18 The potential of this technology was soon realised:

Storing page images of books permits rapid transfer of books from library to library (much simpler and faster than copying microfilm)... At present the handling of these images still requires special skills and equipment few libraries possess, but there is rapid technological progress in the design of disk drives, displays, and printing devices. Imaging technology will be within the reach of most libraries within a decade.19

14 J. Abt, ‘Objectifying the Book: The Impact of Science on Books and Manuscripts’, Library Trends, vol. 36, no. 4, 1987, p. 35. 15 For an overview of the historical developments of digital imaging within the heritage sector, with a particular focus on the library domain in the UK and US, see M. M. Terras, ‘The Rise of Digitization: An Overview’ in Digitisation Perspectives, R. Rikowski (ed.), Dordrecht, Springer, 2011, pp. 3–20. 16 . 17 CPA, The Commission on Preservation and Access, CPA Newsletter #18, Nov–Dec 1989, Washington, DC, Commission on Preservation and Access. 18 CPA Newsletter #18. 19 M. Lesk, Report: Image Formats for Preservation and Access. A Report of the Technology Assessment Advisory Committee to the Commission on Preservation and Access, Washington, DC, Commission on Preservation and Access, July 1990.

In 1994 the US National Science Foundation decided to fund a substantial research project to address the potential of digital technology in the management of library collections. This initiative led to the development of the “Digital Library” as a concept encompassing the organisation and dissemination of digital resources within the library domain.20

Nearly five years later, Lorcan Dempsey of the Online Computer Library Center21 (OCLC) observes:

Memory institutions are actively connecting their collections to these emerging knowledge networks. They are creating innovative network services based on digital surrogates of their current collections in rich interactive digital environments.22

We are now fully encompassed by this global digital space. Cameras, cell phones, computers and various types of wearable technology have turned into mobile conversion equipment, used to capture, process and transfer sound, video sequences, images and texts back and forth between analogue and digital format – between “In Real Life” and “The Cloud” – as a salient feature of our everyday lives. Heritage institutions increasingly rely on the digital format for access, use and re-use of their collections. Digital technology thus enables means of production and distribution that challenge the traditional role of the library institution as an authoritative repository of text-based information.

1.1.2. The digital document in the heritage sector

If we rely on the digital reproduction as a supplement to or a substitute for the source document, we have to address how the capacity of the digital document compares with that of the documents that are managed “hands on”. In this section it will be argued that the status of the digital document differs depending on institutional contexts and frameworks of knowledge organisation, thereby underlining the significance of the environment for the ways in which documents are treated.

20 H. Besser, ‘The Next Stage: Moving from Isolated Digital Collections to Interoperable Digital Libraries’, First Monday, vol. 7, no. 6, 2002, no. pag. 21 . 22 L. Dempsey, ‘Scientific, Industrial, and Cultural Heritage: A Shared Approach’, Ariadne, issue 22, 1999.

Cameron, who analyses the character of the digital object within the museum domain, argues along the lines of media theorists like Manovich and Bolter & Grusin that the digital object has a natural position as a museum object.23 According to Cameron, the hesitance towards including digital objects within the heritage discourse lies with the ambiguous materiality characterising digital objects:

Significantly, the digital historical object has been undervalued and subject to suspicion because its labor of production has been concealed and therefore bears less evidence of authorship, provenance, originality, and other commonly accepted attributed [sic] to analog objects. For these reasons the digital object’s materiality is not well understood.24

To counter the position in which she refers to the digital object as being regarded a “terrorist”,25 she proposes that it would be more correct to view the “real object”, i.e. the material artefact, not as challenged but as acting as an alibi for the “virtual object”, the reproduction. The source document thereby provides the materiality and situatedness required for the digital reproduction to be accepted as part of the cultural heritage. The digital and the analogue are thus interdependent. Their relation can even be described in terms of a response. Miller emphasises this dimension of the relation between copies and originals, pointing to the bidirectional aspects of the term replication:

Replication as differentiated from mere reproduction. Replication as it stands derived from “reply”: the copies transcend the originals, the original is nothing but a collection of previous cultural movements. Flow... The turntable’s needle in dj culture acts as a kind of mediator between self and the fictions of the external world. With the needle the dj weaves the sounds together. Do you get my drift?26

23 See F. Cameron, ‘Beyond the Cult of the Replicant – Museums and Historical Digital Objects: Traditional Concerns, New Discourses’, in F. Cameron & S. Kenderdine (eds.), Theorizing Digital Cultural Heritage: A Critical Discourse, Cambridge, MA, MIT Press, 2007, pp. 49–75, L. Manovich, The Language of New Media, Cambridge, MA, MIT Press, 2001, and J. D. Bolter & R. Grusin, Remediation: Understanding New Media, Cambridge, MA, MIT Press, 1999. 24 Cameron, ‘Beyond the Cult of the Replicant’, p. 70. 25 Cameron refers to the arguments formulated by Jean Baudrillard regarding the threat posed by the possibility of 3D representation overturning the status of the “original”. See Cameron, ‘Beyond the Cult of the Replicant’, p. 51. 26 P. D. Miller / DJ Spooky, Dark Carnival.

The concept of replication, as defined above by Miller in relation to DJ culture, spans from vinyl into the digital format. The aesthetic use of replication is also encountered in other areas of digital culture, e.g. glitch art,27 and in the growing trend among museums (e.g. the Metropolitan Museum of Art, New York, and the Powerhouse Museum, Sydney) to offer 3D files of museum objects as downloads for further creative exploration.28

In comparison with the museum sector, we find almost the opposite stance regarding documents within the bibliographical context and their capacity as digital surrogates. The hesitance regarding the status of the digital resource is here exchanged for a ready acceptance of the capacity of the reproduction. Printed documents, being the results of mechanical processes, are inherently associated with technologies of mass production.

Consequently, the introduction of digital technology within the library sector, where concepts such as “unicity” and “originality” are not of key significance, did not invoke the kind of hesitation referred to above in relation to the museum domain. This may be accounted for by the already extensive use of microfilming and photo-mechanical methods to reproduce and preserve printed documents. Early in the introduction period, digitisation was found to offer appealing options both for the purposes of preservation and of dissemination. Given the characteristics of microfilm (the default method of reproduction for preservation purposes up until the expansion of digitisation), the digital option provides unquestionable advantages with regard to the characteristics of the representation, the methods of search and dissemination, and the potential of research strategies based on algorithmic analysis.

27 Glitch art refers to the introduction of external code, or the corruption of existing code, in image or video files in order to introduce and aestheticize malfunctions in existing imagery or video sequences. See e.g. R. Menkman, The Glitch Moment(um), Network Notebooks 04, Amsterdam, Institute of Network Cultures, 2011. 28 The Media Lab at the Metropolitan Museum, New York, offers training in 3D scanning, hacking and printing as part of their “Digital Underground” program; . The Powerhouse, Sydney hosted an event in 2014 called “Hack the collection” whereby 10 designers were invited to manipulate objects from the collections; .

The association with technologies of mass production often obscures the fact that each individual copy has to be regarded as a unique document, at least in a hand-press production run, due to the variables in the printing process. This was the case practically up to the beginning of the nineteenth century and the introduction of industrialised printing processes.29 Sassoon has, for example, questioned Benjamin’s30 dismissal of photography as an art form.31 Her argument is in line with the above observation regarding the products of printing processes. Benjamin’s critique is based on the capacity to produce series of copies from one photographic negative, as opposed to the “Aura” of the unique presence of art objects, e.g. an oil painting.32 Sassoon claims that Benjamin’s understanding of the concept of “original” is limited, as each copy represents an individual instantiation33 that results in multiple original photographs, and argues that although the negative in a technical sense might be regarded as the “original”, the photographic message is actually conveyed by the individual copy – the mass-produced object.34 Parallels can here be drawn to the role of the printing forme within letterpress printing and its relation to the resulting copies within a printed edition. In this perspective, it is reasonable to view the final composition of types in the forme (or the offset plate) as on a par with the photographic negative.

29 See D. McKitterick, Print, Manuscript and the Search for Order, 1450–1830, Cambridge, Cambridge University Press, 2003, for a discussion of the extent and consequences of printing variations. 30 W. Benjamin, ‘The Work of Art in the Age of Mechanical Reproduction’ in Illuminations, Walter Benjamin, edited and with an introduction by Hannah Arendt, translated by Harry Zorn, London, Pimlico, 1999. Benjamin’s essay was originally published in German in 1936. 31 See J. Sassoon, ‘Photographic meaning in the age of digital reproduction’, LASIE: Library Automated Systems Information Exchange, vol. 29, no. 4, 1998, pp. 5–15, where she discusses digital representation of photographic collections in the light of Benjamin’s notion of “Aura” as defined in Benjamin, ‘The Work of Art’. 32 Latour and Lowe expand on this line of reasoning and claim that the production of copies in fact constitutes a prerequisite for the notion of the “Original” – the status of uniqueness is bestowed on the object that is subjected to being reproduced. B. Latour & A. Lowe, ‘The Migration of the Aura, or How to Explore the Original through Its Facsimiles’ in T. Bartscherer & R. Coover (eds.), Switching Codes: Thinking Through Digital Technology in the Humanities and the Arts, Chicago, University of Chicago Press, 2011, pp. 275–297. 33 This can be compared with the variations introduced in the non-industrialised production of photography as the consequences of manual copying procedures and the material irregularities of photographic paper and emulsion. 34 Sassoon problematises Benjamin’s, in her opinion, uninitiated use of the concepts of singularity and authenticity: “However, Benjamin fails to identify that it is the ownership of the printing press which determined the politics of the use of and access to the images, rather than the mere production and acquisition of multiple reproductions.” Sassoon, ‘Photographic meaning in the age of digital reproduction’, p. 9.

Generally, the issue raised within the library sector regarding the endorsement of digital technology concerns whether documents in digital format can be identified, retrieved, and accessed with sufficient efficiency. Hesitance is thus rarely based on scepticism regarding the potential of the digital medium as such. As demonstrated, such doubts are more often articulated within the museum domain. The evidentiary character of the archival record also raises concern in the archival domain. Archives represent an intermediate position in relation to libraries and museums, as the capacity of the archival record lies in its materiality, i.e. in providing proof of a transaction or an agreement, e.g. through the inscriptions it carries, which provide the text-based information relating to the transaction or agreement. Concepts such as trustworthiness and reliability thus tend to be more emphasised within the archival sector than in libraries.35

Part of the differences between institutional types with regard to the role of digitisation can be explained by the framework proposed by Maroevic regarding the “phenomenon of cultural heritage and the definition of a unit of material”.36 He argues that the significance of the museum object is based on a complex interaction between its artefactual status, involving form and shape, and its meaning. The significance of the library object (the document) is primarily based on its meaning, i.e. its text-based content, whereas its status as an artefact is given less priority. However, if the book is incorporated into museum discourse this balance will shift: its artefactual significance will increase at the cost of its capacity as a provider of text-based information. The mode of interaction would consequently change from the praxis of reading to the book being observed in a display case. These positions are reflected in the use of technologies for reproduction, where the library domain more readily accepts the use of surrogates than is the case in museums. It should be noted that there is a long tradition of copying within the museum domain in order to protect artefacts, as a means of visualisation, for educational purposes or as a method of collection development.37 The use of reproduction for reconstructing historical milieus or demonstrating the functionality of mechanical equipment provides other examples.

35 Digital forensics is rapidly evolving as a discipline, primarily in relation to the juridical status of documents that are created and managed in the digital domain, e.g. emails and tweets. Kirschenbaum et al. provide a useful introduction to this field of research: M. G. Kirschenbaum et al., Digital Forensics and Born-Digital Content in Cultural Heritage Collections, CLIR Publication 149, Washington, DC, Council on Library and Information Resources, 2010. 36 I. Maroevic, ‘The phenomenon of cultural heritage and the definition of a unit of material’, Nordisk Museologi, vol. 2, 1998, pp. 135–142.

The potential of digitisation within the library field tends to be evaluated on the capacity to process and present large volumes of resources, whereas the “item by item” approach is more in line with the default presentation mode of museums: the exhibition. The underlying reason is probably to be found in the fields’ operational principles, as libraries by tradition are concerned with more or less mass-produced documents,38 whereas the museum sector is generally regarded as dealing with individual objects such as works of art or artefacts with sufficient historical and/or cultural value to be granted status as museum objects. The status of the digital document can thus be seen as varying with institutional type.

Digitisation also has an impact on the operations of the institutions that manage collections within the heritage sector. The e-book as a resource to be managed within the lending infrastructure has caused considerable problems for individual institutions, as the licensing agreements for e-resources do not necessarily follow the principles established for printed collections. Similar obstacles might also be encountered in relation to the actual dissemination of reproductions. Microfilms of newspapers can, according to Swedish legislation, be distributed within the library sector, but when digitised (although for the very same preservation purpose for which they were previously microfilmed) the newspapers cannot be made accessible without entering into elaborate licensing agreements.39

There is also a noticeable effect of digital technology that counteracts the consequences of the institutional characteristics. France et al. refer to digital development within the heritage sector as leading to increasing institutional convergence as the role of the institutions changes “... from repositories of objects to repositories of knowledge”.40 New avenues of cooperation are created within the fields of research and education. Similar arguments have been raised by e.g. Bak & Armstrong and Kalfatovic et al.41

37 D. C. Kurtz, The reception of classical art in Britain: an Oxford story of plaster casts from the Antique, British Archaeological Reports, British Series, 0143-3032; 308, Oxford, Archaeopress, 2000. 38 I.e. printed books as products of mechanised processes. 39 This is the situation in Sweden in 2015. 40 F. G. France, D. Emery & M. B. Toth, ‘The convergence of information technology, data, and management in a library imaging program’, Library Quarterly, vol. 80, no. 1, 2010, p. 42, emphasis added.

Given and McTavish adopt a historical perspective on the developments ensuing from digital technology and argue that this trend towards convergence and homogenisation should more rightly be described as a re-establishment of the historically close relation between libraries and museums, which share the same epistemological foundation.42

The conversion from analogue to digital format is, in itself, an act of levelling, and the potential of interoperability is based on standards and mutually agreed protocols that tend to prioritise the general at the expense of the particular. It can thus be argued that the use of digital technology for transmission purposes is characterised by a tension between, on the one hand, the capacity to transcend the ramifications of the document and, on the other hand, the need to apply generally agreed standards. Such conflicts may occur in digitisation projects involving institutions with mutually incompatible systems of knowledge organisation, e.g. cross-domain projects within the cultural heritage sector.

The ambition to use the potential of digital technology to disseminate a wide range of collection types by including them within the same framework might counteract that very purpose if the metadata schemas used are required to be based on the “lowest common denominator” in order to embrace heterogeneity, thus omitting the individual characteristics of particular collections or objects.43

The examples discussed in this section demonstrate how the institutional context is an important factor here, as the status of the digital reproduction varies between institutions in the heritage sector. But potentially conflicting ideals of knowledge organisation can be encountered even within the same institutional framework, as will be demonstrated in the following section.

41 See G. Bak & P. Armstrong, ‘Points of convergence: seamless long-term access to digital publications and archival records at Library and Archives Canada’, Archival Science, vol. 8, no. 4, 2008, pp. 279–293, and M. R. Kalfatovic et al., ‘Smithsonian Team Flickr: a library, archives, and museums collaboration in web 2.0 space’, Archival Science, vol. 8, 2009, pp. 267–277. 42 L. M. Given & L. McTavish, ‘What’s old is new again: the reconvergence of Libraries, Archives, and Museums in the digital age’, The Library Quarterly, vol. 80, no. 1, 2010, pp. 7–32. 43 Kjellman analyses knowledge organisation in such a perspective: U. Kjellman, ‘Digitalisering som standardisering, verktyg i konciperingen av kulturarv’ in N. D. Lund (ed.), Digital Formidling Af Kulturarv: Fra Samling Til Sampling, København, Multivers, 2009, pp. 219–237. A focus on digitisation in a knowledge organisation perspective is also applied by Dahlström et al., who address digitisation as a practice within the heritage sector and how it affects the ideals and roles of heritage institutions: M. Dahlström, J. Hansson & U. Kjellman, ‘“As We May Digitize” — Institutions and Documents Reconfigured’, LIBER Quarterly, vol. 21, 2012, pp. 455–474.

1.1.3. Different approaches to the digital document within the library sector

Digital technology can be employed as a tool to focus specifically on a document or a collection as part of a knowledge organisation strategy. In order to exemplify the consequences of such decisions, this section will discuss two examples drawn from databases with different scholarly foci: the Deutsches Textarchiv (DTA)44 and the presentation of the Codex Gigas (CG)45 at the Kungl. biblioteket (the National Library of Sweden).

The DTA is an example of a digital archive that applies a transmission strategy focussing on the text-based content of the documents. The documents in the database can be viewed in parallel representations (see figure 1): as a facsimile and in any of five formats of transcription.46

44 . 45 . I was part of the CG project management, but at the time of my entering the project the web presentation was already designed and implemented. See section 1.5 for a declaration of my professional background at the Kungl. biblioteket. 46 HTML with original letterforms, HTML with normalised letterforms, XML, plain text, and CAB (Cascade Analysis Broker) normalised words. In each of these presentation formats, text sequences can be copied and pasted directly from the interface. In addition, the works can be downloaded in their entirety in these formats as well as in a number of text-analytical modes of representation.

Figure 1. Left: facsimile representation. Right: TEI-encoded representation. Seume, Johann Gottfried, Spaziergang nach Syrakus im Jahre 1802. Braunschweig u. a., 1803, (p. III).47

The focus of the DTA is strictly on the text. Elements that are considered not directly related to the text-based content are given less priority, regardless of their informative capacity as constituent parts of the document. Figure 2 illustrates the consequences of this approach, where the information content of the source document is conveyed in a mode that does not match the principles that guide the actual digital transmission.

47 Accessed at .

Figure 2. Left: facsimile representation. Right: TEI-encoded representation. Seume, Johann Gottfried, Spaziergang nach Syrakus im Jahre 1802. Braunschweig u. a. 1803, (n. pag.)48

Here we find that the illustration (an image-based inscription) is omitted in the TEI-encoded transcription as a result of the priority given to the text (as alphanumeric inscription). What remains of the illustration is the line of code enclosing the caption “Cura fugit Multo diluiturque mero.” that is placed below the etching. The visual complement to the informative capacity of the document provided by the illustration goes unnoticed. Neither the material features nor the visual aspects of the document are represented, as the priorities of the transmission process lie with the text-based content.
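As a hypothetical illustration (this is a sketch, not the DTA’s actual markup), a TEI encoding that retains only the caption of the etching might look like the following, where the image itself leaves no trace beyond its caption:

```xml
<!-- Hypothetical TEI fragment, not taken from the DTA:
     the illustration survives only as its caption text. -->
<figure>
  <head>Cura fugit Multo diluiturque mero.</head>
  <!-- no <graphic> element: the etching itself is not represented -->
</figure>
```

In TEI P5, an image would normally be referenced by a `<graphic>` element within `<figure>`; omitting it reduces the illustration to its alphanumeric caption, which is precisely the kind of selectivity discussed here.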

However, the DTA provides a highly detailed (and editable) representation of the text-based content of the source document. The appreciation of the document as a codex with an illustration accompanying the title page is lost.

The attention to the textual inscriptions can thus result in a downplaying of the artefactual integrity of the document. In a similar manner, however, we find that a narrow focus on aspects relating to the materiality and structural composition will also influence the ontological consistency of the document. Such an effect can be observed on the website dedicated to the presentation of the manuscript Codex Gigas49 at the Kungl. biblioteket (KB).

48 Accessed at .

This website addresses the historical aspects of the document and its text and images. It is designed with the intention of offering alternative modes of presentation that reflect different aspects of the structural make-up of the manuscript. The document can be browsed according to two principles: either through the organisation of the text-based content (i.e. in sections and chapters) or in a viewing mode (figure 3) where the folios are grouped according to the structure and sequence of the quires that make up the binding.

Figure 3. Visual representation of the binding structure, each quire represented as a folder. Clicking on the folder reveals the thumbnail images, clicking on a thumbnail opens the image in a full-screen single fol. view mode.50

49 MS A 148, aka “the Devil’s Bible” due to the myths that accompany one of the illustrations in the manuscript. 50 Accessed at the Codex Gigas website at KB. .
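The two browsing principles described above can be sketched as alternative groupings of one and the same sequence of page images. The folio numbers, quire size and section names below are invented for illustration and do not reflect KB’s actual implementation:

```python
# Toy sketch (invented data): the same folio images grouped either by
# the physical structure (quires) or by the textual structure (sections).

folios = [f"fol. {n}" for n in range(1, 17)]  # 16 page images

# Physical principle: quires of 8 leaves each.
quires = [folios[i:i + 8] for i in range(0, len(folios), 8)]

# Textual principle: hypothetical section boundaries.
sections = {"Prologue": folios[0:3], "Book I": folios[3:16]}
```

Both groupings are complete and accurate, yet each foregrounds one aspect of the document at the expense of the other.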

This approach results in a very close and critical attention to the structural make-up of the manuscript. But the digital rendering, while still being a highly accurate representation of the source document, does not bear much resemblance to the physical appearance of a manuscript with a codex structure. We can also observe the differences in capacity between the image-based representation (figure 3) and the codicological description of the text-block: the collation formula (figure 4).

a (f. 1, the paste-down) + i8 with 1 and 8 lost (ff. 2-7), ii8 (ff. 9-16), iii8 (ff. 17-24), iv8 (ff. 25-32), v8 (ff. 33-40), vi8 (ff. 41-48), vii8 (ff. 49-56), viii8 (ff. 57-64), ix8 (ff. 65-72), x8 (ff. 73-80), xi8 (ff. 81-88), xii8 (ff. 89-96), xiii8 (ff. 97-104), xiv8 (ff. 105-112), xv8 (ff. 113-120), xvi8 (ff. 121-128), xvii8 (ff. 129-136), xviii8 (ff. 137-144), xix8 (ff. 145-152), xx8 (ff. 153-160), xxi8 (ff. 161-168), xxii8 (ff. 169-176), xxiii8 (ff. 177-184), xxiv8 (ff. 185-192), xxv8 (ff. 193-200), xxvi8 (ff. 201-208), xxvii8 (ff. 209-216), xxviii8 (ff. 217-224), xxix8 (ff. 225-232), xxx8 with 8 cut out (ff. 233-239), xxxi8 (ff. 240-241, 142, 243-247), xxxii7 2, 4 and 5 are singletons and 6 and 7 have been cut out (ff. 248-252), xxxiii8 (ff. 253-260), xxxiv8 (ff. 261-268), xxxv8 + 1 5 is a slip sewn to 6 (ff. 269-277), xxxvi8 (ff. 278-285), xxxvii8 (ff. 286-293), xxxviii8 (ff. 294-301), xxxix6 4, 5 and 6 cut out (ff. 302-304) and xl8 (ff. 305-312).

Figure 4. The collation formula of the Codex Gigas.51

Although the foci differ (the CG presentation is concerned with the physical make-up, whereas in the previous example, the DTA, the text is addressed), it can be noted that both examples are conditioned on similar grounds. The browsing options would in both cases be very difficult to provide in any medium other than the digital. The scholarly representations, on the other hand (the collation formula of the CG and the linguistic encoding in the DTA), require a specific competence to provide meaningful information. The consequences of the conversion to digital format will thus be substantially different depending on the particular intent of the process. In both examples we can observe how the consistency of the document is challenged.

Smith discusses the consequences of the digital transmission of text documents:

It is this aspect of authenticity that appears to be most problematic in the emerging digital landscape – the quality of an artifact that pro- duces a characteristic but perhaps unquantifiable affect in the user

51 Accessed at the Codex Gigas website at KB. .

through its physical presence … Anything that is intrinsic to the physical presence is lost.52

So, using Smith’s words: which intrinsic aspects are lost in these representations?

In the DTA, the codex structure of the source document “dissolves” as a result of the focus on the inscriptions it carries. But even this narrowed focus is selective, as the inscriptions rendered only include the alphanumeric format. The inscription in the form of an illustration does not fit in this category, although the caption in the lower part of the etching is rendered in the representation. The architecture of the reproduced document remains as a ghostly presence in the TEI encoding.

We can observe a similar type of displacement in the representation of the Codex Gigas. The dimensions of the manuscript are an obvious aspect of the loss of the intrinsic aspects referred to by Smith in the quote above. From the additional information available in KB’s web presentation we learn that the manuscript measures 89x49 cm, but its thickness is not provided. The visual presentation on the website is based on a single folio view, i.e. it is not possible to view the document as it was read, in double folio openings. It can further be argued that aspects relating to its imposing physical presence are missing, e.g. the smell of parchment, the intrinsic resistance to handling, the creaking sound that is emitted when the boards and gatherings are manoeuvred, and its sheer size, which requires the assistance of two preservation staff to turn the folios in a controlled manner. Perhaps the aspects lost in transmission are best documented by a visual representation such as figure 5, which documents the work situation during the condition assessment in preparation for the digitisation process?

52 A. Smith, ‘Authenticity and Affect: When Is a Watch Not a Watch?’, Library Trends, vol. 52, no. 1, 2003, pp. 173–174.

Figure 5. The Codex Gigas is examined by staff from the preservation department at KB, Stockholm, February 2007.

The difference in the capacity of the visual representation is apparent if we compare the presentation of the materiality as a binding structure (figure 3) or as a collation formula (figure 4) with materiality as a factor that regulates the entire interaction with the document (figure 5). However, the contextuality conveyed in the image taken during handling is difficult to capture and document. Even though most aspects relating to this manuscript could be covered by relying on relevant metadata schemas, such a representation would still be limited. In many ways this experience is more or less restricted to the direct encounter with the materiality and contextual setting of the document, thus representing features that many respondents in the study argued are irrecoverably lost in the transmission. The ability to focus on specific aspects thus has the potential both to enhance and to obscure the informative capacity of the document.

It has been argued that the application of digitisation can be designed to meet the requirements of different institutional contexts. The examples from the CG and the DTA further demonstrate the capacity of digitisation as a scholarly tool and illustrate the variations in applications of digitisation even within the same institutional framework.53

These two examples – and several more could be listed – demonstrate how the digital reproduction of the document is guided by the intention that underpins the process, and thereby places less emphasis on aspects that fall outside the primary focus. There are numerous ways of designing and describing digital processes and their products. In the examples in this section, the application of digital technology was closely related to scholarly procedures. In this sense these processes are relatively well defined. But if we look at the general application of digital technology within the heritage sector at large, we find a bewildering variation of conceptualisations and descriptions.

We will now proceed to address some conceptions related to the digitisation process and discuss their merits as definitions of processes as well as the characteristics of their resulting products.

1.1.4. Conceptualising digitisation

Previous sections demonstrated how digitisation could be applied for purposes relating to institutional practice as well as to scholarly applications. This is reflected in the diversity of the taxonomy used to define digitisation processes and to describe their products, a diversity that may contribute to obscuring what these processes actually encompass.

It can be observed that many of the terminological concepts employed are based on rhetorical dichotomies such as preservation vs. access or quantity vs. quality. There is also a tendency to apply the prefix “e-” in references to digital versions of document types that are rooted in print culture, e.g. “e-book”. In this section we will take a closer look at some of these conceptualisations.

53 The CG presentation is hosted by KB and the DTA, although managed by the Language Research Centre at the Berlin-Brandenburg Academy of Sciences and Humanities, is linked to by the Bayerische Staatsbibliothek database. Both these databases and presentations can thus be regarded as resources within the library sector.

The rhetoric following the use of descriptive definitions frequently results in what appear to be conflicting interests; e.g. digitisation is carried out either for preservation purposes or for access purposes.54

The conceptual tension between preservation and access pre-dates the development of digitisation within the library sector, but the concepts are reused in the digital domain as indicators of types of processes. A problem with using “preservation” or “access” to classify digitisation processes is that neither of these terms conveys any significant information about the characteristics of the digital reproduction. The concepts derive from the internal administrative principles of individual institutions regarding the status of surrogacy and preservation measures within their operations. There are no generally agreed standards defining the implications of these concepts. Digitisation does not, in itself, carry any benefits from a preservation perspective. Its potential lies in the status of the digital resource, i.e. how the digital reproduction is used vis-à-vis the reproduced source document. The balance between preservation and access consequently has to be based on the collection management of the institution carrying out the process.

The characteristics of products of digitisation processes can be shaped on the basis of a range of presentation strategies, from rendering documents as images as completely as possible, to representing them as text files or even as sets of computable data.

Digitisation can also be based on strategies aiming at revealing features in the document that are not readily visible when engaged with “hands on”. Such approaches might rely on filtering techniques, alternating light angles, IR/UV exposure, or X-ray radiography to reveal inscriptions or details that are obscured due to discolouration, partial removal (palimpsests) or by being covered with new inscriptions.55 3D-scanning

54 There is a degree of ambiguity caused by the concept “digital preservation” being used in reference to the management of digital information as well as to strategies that involve conversion of information from analogue to digital format. 55 Examples of such applications can be found in e.g. M. M. Terras, Image to Interpretation: An Intelligent System to Aid Historians in Reading the Vindolanda Texts, Oxford, Oxford University Press, 2006, and in R. Boyle & H. Hiary, ‘Seeing the Invisible: Computer Science for Codicology’ in W. van Peursen, E. D. Thoutenhoofd & A. van der Weel (eds.), Text Comparison and Digital Creativity – The Production of Presence and Meaning in Digital Text Scholarship, Leiden, Brill, 2010, pp. 129–148. Another example is F. G. France, D. Emery & M. B. Toth, ‘The convergence of information technology, data, and management in a library imaging program’, Library Quarterly, vol. 80, no. 1, 2010, pp. 33–59.

techniques have also been used to reveal inscriptions by surface analysis.56

It can be argued that the application of digitisation within the field of scholarly editing, in the production of facsimiles and transcriptions (and combinations thereof), also represents such strategies of access. The XML encoding involved in the markup of texts has features similar to a preservation strategy, as the text-based content is transferred to a platform-independent format.

Digital techniques can further be used to restore or reconstitute documents. Such is the case in the German project where the contents of Stasi57 archives that were torn to pieces or fed through paper shredders around the fall of the Berlin Wall are digitally reconstituted by digitisation processes relying on pattern-recognition software.58 The Codex Sinaiticus is another example of how digital technology can be used to assemble scattered fragments of a document in an attempt to re-present the document’s initial appearance.59 Along similar lines, digitisation can also be used to assemble physically dispersed documents in order to reconstitute scattered collections, so-called “virtual reunification”. The International Dunhuang Project is an example of such an approach.60

The dichotomy of “preservation” and “access” is reflected in the use of indicators based on “quantity” vs. “quality”. “Mass digitisation” is one such example, often used in relation to projects such as Google Books.61 “Large scale digitisation” is another quantitatively based definition. The implications of this type of indicator are not clearly defined. Rieger, in a report for the Council on Library and Information Resources (CLIR), states that: “The primary motivation of all partners in LSDIs [Large Scale Digitization Initiatives] is to make it easier to find and access books” and she sees no reason to differentiate between “Large scale” and “Mass” approaches to digitisation.62 Coyle, on the contrary, upholds this distinction on the basis of selection

56 See e.g. J. Rowe, A. Razdan & J. Femiani, ‘Applying 3-Dimensional Modeling Tools to Analysis of Handwritten Manuscripts’, RLG DigiNews, vol. 8, no. 4, 2004, and The Great Parchment Book project: http://www.greatparchmentbook.org/the-project/. 57 Ministerium für Staatssicherheit (STASI) – the secret police of the former East Germany. 58 Spiegel, ‘New Computer Program to Reassemble Shredded Stasi Files’, Spiegel Online, 10 May 2007. 59 . 60 . 61 . 62 O. Y. Rieger, Preservation in the Age of Large-Scale Digitization: A White Paper, CLIR Publication (vol. 141), Washington, DC, Council on Library and Information Resources, 2008, p. 4. The LSDIs in Rieger’s report include Google Book Search (today Google Books), Microsoft

procedures, where she defines mass digitisation as digitisation on an “industrial scale” without regard to individual selection, whereas “large scale digitisation” refers to the creation of distinct collections.63

Definitions of processes based on notions of “quality” are even more evasive. High demands on the “quality” of the reproductions are often regarded as contradictory to a high production rate. Erway and Schaffner of the Online Computer Library Center (OCLC) refer to exaggerated cataloguing praxis as leading to such a conflict in relation to digitisation within the library sector.64 Referring to highly detailed recommendations regarding the cataloguing of special collections in preparation for digitisation, published by the Research Libraries Group some years earlier,65 they articulate the need to change priorities; to increase production volumes:

Minimal description will not restrict use as much as limiting access to those who can show up [at the library] in person. We must stop our slavish devotion to detail; the perfect has become the enemy of the possible.66

But streamlining the process in order to increase production does not only involve minimising the resources devoted to bibliographic metadata; the handling of the source documents can also be regarded as an obstacle in the management of the digitisation process. Verheusen states:

To lower the cost and increase the pace of digitisation it is necessary to limit the time spent on selection and preparation of materials… […] This will have consequences for the final quality of the digital files, but a comparative assessment has to be made to balance between reasonable quality and perfect solutions.67

Live Books, Open Content Alliance, Million Book Project, and a group of research libraries. Since the report was published in 2008 some of these initiatives have ceased to operate or changed scope. 63 K. Coyle, ‘Mass Digitization of Books’, The Journal of Academic Librarianship, vol. 32, no. 6, 2006, pp. 641–642. 64 R. Erway & J. Schaffner, Shifting Gears: Gearing Up to Get Into the Flow, OCLC Programs and Research, 2007. 65 Preparing Digital Surrogates for RLG Cultural Materials, n.d., available via OCLC: . 66 Erway & Schaffner, Shifting Gears, p. 6. 67 A. Verheusen, ‘Mass Digitisation by Libraries: Issues concerning Organisation, Quality and Efficiency’, Liber Quarterly, vol. 18, no. 1, 2008, p. 34.

Verheusen’s argument points to the consequences of relying on “quality” as a defining parameter for digitisation processes. The concept of “quality” in reference to digitisation processes is often used in a rather ambiguous manner. “High quality” obviously indicates something preferable to “low quality”. However, without defining what actual aspect is referred to, the concept of quality will not impart any information on the particulars of the process, nor on the characteristics of the reproduction.

A digitisation project with low demands on the extent of metadata and visual representation, such as Google Books, may nevertheless meet high demands on searchability, volume and delivery. This is in line with more relational definitions based on intended use; e.g. Conway refers to an “absence of error related to a given use scenario”.68 A similar definition is proposed by Terras with the formulation “fitness for purpose”.69 Using the claims made above by Erway & Schaffner and Verheusen, it could thus be argued that the quality factor of the Google Books project lies in its capacity to generate quantity. It follows that the notion of quality is of little use as a descriptive distinction unless conditioned by the particular aspect referred to.

Dahlström has proposed a constructive approach to this taxonomical dilemma.70 The term “critical digitisation” is here suggested to characterise approaches to digitisation that focus on the individual and particular aspects of the source documents. Attention to detail is favoured, with the ambition of producing reproductions with a high degree of similarity to the source document. In contrast, Dahlström positions “mass digitisation” as a large scale approach, aiming at streamlining processes as far as possible, sometimes at the cost of capturing documents only partially, and selecting aspects that are regarded as informative on a general level.71

68 P. Conway, ‘Archival quality and long-term preservation: a research framework for validating the usefulness of digital surrogates’, Archival Science, vol. 11, no. 3-4, 2011, p. 300. 69 M. M. Terras, Digital Images for the Information Professional, p. 191. 70 M. Dahlström, ‘Critical Editing and Critical Digitization’ in W. T. van Peursen, E. Thoutenhoofd & A. van der Weel (eds.), Text Comparison and Digital Creativity – The Production of Presence and Meaning in Digital Text Scholarship, Leiden, Brill, 2010, pp. 79–97. 71 The juxtaposition with mass approaches to digitisation is potentially problematic, as definitions based on quantification do not convey any specific information regarding the characteristics of the products. Dahlström acknowledges this dilemma, suggesting that “critical” and “mass” should be seen as the end points of a continuum rather than mutually exclusive positions.

The question is whether a conflict between “mass digitisation” and demands on quality is inevitable. Due to technical developments the demarcation line between these two positions is gradually eroding. “Page turning scanner robots”72 and scanning equipment with double cameras that enable the capture of openings in one go, scripts that perform image cropping for access files, file management routines that automatise the import of bibliographic metadata, and the processing of pagination information for purposes of quality control are examples of measures that enable high demands on the characteristics of images and metadata while maintaining high-volume output. One such example is the digitisation centre at the Bayerische Staatsbibliothek:

The Munich Digitization Centre makes no difference between high quality digitization and mass digitization. Both is possible due to 10 years of expertise and the presence of a high quality scanning equipment. The ambition is to scan every book or manuscript once in the best quality possible and to use the produced images for different purposes such as web presentations, facsimile production as well as the production of classic microfilm copies.73
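Routines of the kind mentioned above, such as the processing of pagination information for quality control, are straightforward to automate. The following Python sketch is purely illustrative (the filenames and the naming convention are hypothetical, not drawn from any institution discussed here): it checks that the image files produced for a volume form an unbroken pagination sequence, flagging possibly skipped openings.

```python
import re

def missing_pages(filenames):
    """Return the page numbers missing from a scanned volume.

    Assumes (hypothetically) that access files are named like
    'volume_0001.tif', where the trailing digits encode page order.
    """
    pages = sorted(
        int(m.group(1))
        for name in filenames
        if (m := re.search(r"_(\d+)\.tif$", name))
    )
    if not pages:
        return []
    # Any gap in the numbering flags a possibly skipped opening.
    return [p for p in range(pages[0], pages[-1] + 1) if p not in set(pages)]

print(missing_pages(["volume_0001.tif", "volume_0002.tif", "volume_0004.tif"]))  # → [3]
```

A check of this kind costs nothing per volume and is therefore compatible with high-volume output, which is precisely the point made above about the eroding line between mass digitisation and quality demands.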

A possible reason behind this general lack of taxonomical rigour is the rate of development within the digitisation field, where advances in the terminological apparatus have not kept pace with technical progress. An example of this terminological immaturity is the readiness with which the prefix “e-” is applied within the digital domain in attempts to establish a (sometimes questionable) resemblance to document formats of an “analogue” origin. A web search will retrieve examples such as “e-book”, “e-brochure”, “e-poster”, and even “e-flyer”.

Concepts such as “preservation”, “access”, “quality”, and “quantity” can thus be used to establish points of reference that convey some intelligibility to the otherwise hard-to-define capacity of the digitisation process. We can here observe parallels with the effects observed in relation to early printed book production, where elements from manuscript culture were retained in the products of print culture. The introduced technology tends to re-use or replicate elements from existing media. Bolter and Grusin refer to this tendency as “remediation”, a concept that will be

72 Scanners fitted with mechanisms that enable the automatic turning of pages during the process of capture. 73 From Münchener Digitalisierungszentrum, Digitization Policy, , italics in original.

further discussed in section 2.2.74 Another way of articulating this phenomenon is expressed by McLuhan in his statement: “the ‘content’ of any medium is always another medium.”75 The book in this way can be regarded as the content of the e-book.

This tendency might also be caused by the wish to adapt the management of digital representations to already established frameworks for distribution, legal issues, and knowledge organisation. However, this lack of format specificity has in many instances proven counterproductive. This has been demonstrated, for example, by the difficulty of applying existing legislation, based on the conditions of print culture, to issues raised by, and questions related to, copyright in the digital domain.

As seen in this section, the usefulness of the terminology generally employed in relation to digitisation processes can be questioned. There is an understandable tendency to base distinctions on concepts that are already established, albeit in domains other than the digital. This further contributes to the difficulties in establishing the capacity of the digital reproduction.

1.1.5. Summing up the problem area

We have examined the problem area by applying several complementary perspectives on the central topic of research in this thesis: the relation between the source document and its reproduction as conditioned by the digitisation process. The historical framing in section 1.1.1 demonstrated the impact of digital technology within the library sector and also touched upon the use of reproduction technology in general. In the following section the perspective was widened to address the status of digital reproductions within the heritage domain, outlining the role of the institutional context in this respect. Digitisation as a tool for knowledge organisation was taken up in section 1.1.3 and exemplified with two different applications for scholarly purposes. Finally, the somewhat bewildering digital taxonomy associated with the processes and their products was addressed in section 1.1.4.

74 J. D. Bolter & R. Grusin, Remediation: Understanding New Media, Cambridge, MA, MIT Press, 1999. 75 M. McLuhan, Understanding media: the extensions of man, Sphere Books, London, Routledge, 1964, p. 8.

The purpose of discussing these perspectives has been to demonstrate the complexity of the area of research and also to exemplify some of the issues that will be approached in the following chapters. The role of the process and the status of the documents involved are clearly dependent on the setting wherein the process is performed. Two contrasting (and sometimes conflicting) tendencies can be described. The principles of knowledge organisation tend to emphasise institutional particularities, while the “momentum” in the creation of digital resources moves in the other direction – towards large scale solutions where heritage portals aim at providing single points of access to collections of different types of material and contexts. Digital technology also enables a focus on isolated features, which makes it possible to design scholarly applications that explore the specific characteristics of documents. It can be noted that the higher the particularity of focus, the less consistent the document will appear.

With this as a background we can summarise the problem that underpins this thesis as follows: How can the relation between the source document and its reproduction, as conditioned by the digitisation process, be problematised and conceptualised? Having thus outlined the problem area we will now continue by discussing the positioning, objectives and relevance of this thesis.

1.2. Positioning

Altheide and Schneider argue that documents do not exist in isolation but should be seen as “part of the ecology of communication”.76 Documents are, in this respect, always involved in (and interlinked with) processes of production, dissemination, reception and preservation.77

Documents are a prerequisite for the management and communication of information and are consequently part of various social contexts. As such they contribute to the infrastructure of the academic world at large as well as being the actual objects

76 D. L. Altheide & C. J. Schneider, Qualitative Media Analysis, 2nd ed., Thousand Oaks, CA, Sage, 2013, p. 6. 77 The first edition of Altheide’s work on document studies was published in 1996 (D. L. Altheide, Qualitative Media Analysis, Thousand Oaks, CA, Sage, 1996). The second edition (Altheide & Schneider, 2013) is expanded to cover the rapid development within the field of media. Despite the primary focus on mass media, the authors present a methodological approach that is applicable also to studies of documents of other types than newspapers and TV.

of research in several academic disciplines within the humanities and social sciences. While acknowledging this multidisciplinary perspective, the present study is positioned within the field of Library and Information Science (LIS). However, references to theoretical and practical aspects of digitisation as carried out in institutions outside the library, and even in the larger cultural heritage sector, will be made where fitting. Digital transmission is a complex topic and a study of its transformative capacity will benefit from the application of perspectives that supplement the focus on the library domain.

The thesis has a close affinity with the field of knowledge organisation (KO) within the domain of LIS, as digitisation is integrated in activities that are performed in order to describe, systematise, organise and retrieve documents (and/or information provided by these types of artefacts and transferred to digital format in databases).

KO also relates to issues and conceptualisations similar to those that surface in the area of digitisation. Smiraglia discusses notions of representation and instantiation and outlines a framework for “content genealogy”.78 The relation between work and document is analysed by Tanselle and also by Smiraglia.79 There are also questions of bibliographical description of documents, e.g. as formulated by Tanselle, and definitions of text as discussed by DuRietz.80

There is an obvious affinity with the field of document studies and this relationship will be explored in more detail in chapter 2. The research topic further establishes links with neighbouring disciplines such as scholarly editing, museology and conservation, as the study has to consider the informative significance of structure and the spatial aspects of documents as well as the information provided by alphanumerical inscriptions. This relation will also be returned to in chapter 2.

78 R. P. Smiraglia, ‘Instantiation: Toward a Theory’ in L. Vaughan (ed.), Proceedings of the Annual Conference of CAIS/Actes du Congrès Annuel de l’ACSI, Data, Information, and Knowledge in a Networked World, 2–4 June 2005, The University of Western Ontario, Canadian Association for Information Science, L’Association canadienne des sciences de l’information, 2005, pp. 1–8, and R. P. Smiraglia, ‘Content Metadata – An Analysis of Etruscan Artifacts in a Museum of Archeology’, Cataloging & Classification Quarterly, vol. 40, 2005, pp. 135–151. 79 See Tanselle, ‘Reproductions and Scholarship’, and R. P. Smiraglia, The nature of ‘a work’: implications for the organization of knowledge, Lanham, MD, Scarecrow, 2001. 80 See T. G. Tanselle, Bibliographical Analysis: A Historical Introduction, Cambridge, Cambridge University Press, 2009, T. G. Tanselle, Literature and Artifacts, Charlottesville, Bibliographical Society of the University of Virginia, 1998, and R. E. DuRietz, Den tryckta skriften: termer och begrepp: grunderna till bibliografin: för biblioteken och antikvariaten, för bibliografer och textutgivare, för bokhistoriker och boksamlare, Uppsala, Dahlia Books, 1999.

The relation to the research field of LIS is distinct. However, the work of libraries is in various ways associated with the governance of cultural heritage. UNESCO describes the scope of cultural heritage as:

the entire corpus of material signs – either artistic or symbolic – handed on by the past to each culture and, therefore, to the whole of humankind.

and also:

the non-physical cultural heritage, which includes the signs and symbols passed on by oral transmission, artistic and literary forms of expression, languages, ways of life, myths, beliefs and rituals, value systems and traditional knowledge and know-how.81

Libraries are, together with archives and museums, closely associated with the management of cultural heritage. The term “memory institutions” is sometimes used in reference to this mission.

It might appear rational to base the distinction between these institutions on the format of the objects typically encountered in the collections, i.e. documents such as books and manuscripts, archival records, and objects of historical, symbolic and/or artistic value. But the principles of building collections, describing them and making them accessible are not only dependent on the institutional context; they change with historical period and socio-cultural setting. The role of the institution as a specific place is also challenged, as access to and use of collections does not necessarily have to involve physical presence. The present distinction between museums, libraries, and archives, as based on the type of collections, is being gradually destabilised by the introduction of digital technology within the heritage sector. Collections are increasingly seen as digital resources and stripped of their physical characteristics for the benefit of the interaction offered by the browser interface. Cross-institutional portals, such as Europeana, emphasise this trend, as one and the same interface is used for search and presentation regardless of whether the document in question is a printed book, manuscript, painting, sculpture or an audiovisual recording.82

81 See Unesco draft medium term plan, 1990-1995. Accessed at . 82 .

This development is understandable given the potential of the digital format. It is also a pre-requisite for certain types of research strategies. Still, the effects on the collections are not obvious. It has, however, been observed that collections that are not accessible in digital format, or where the catalogue information is not web-based, risk being relegated to oblivion:

What is now clear is that large-scale information providers are competitors with the cultural heritage community in defining what preservation means in the future. In the age of Google, nondigital content does not exist, and digital content with no impact is unlikely to survive.83

Digital transmission, seen as an intervention in the knowledge organisation, consequently also affects the relation between a collection and its organisational context.

Cultural heritage is a somewhat ambiguous description, loaded with sentimental and symbolic values as well as conditioned by the material and social context wherein the concept is used. With the introduction of digital technology the defining parameters have had to expand further, as physical characteristics can no longer be regarded as the defining criteria of the heritage object. The concept of “digital heritage” has been used to further widen its scope.84

We can observe how the role of digitisation as an activity within a library organisation varies with the type of library. Libraries with a mission explicitly relating to the management of cultural heritage, for example legal deposit libraries whose task it is to create collections that represent a national production of printed matter, or libraries that manage special types of collections of specific historical or cultural significance, might use digital technology to enhance access to their catalogues and collections, thereby contributing to the availability of primary resources for research. Research libraries also have a role in ensuring access to scholarly and scientific publications by providing access points to databases and journals. It can, furthermore, be argued that public libraries, by providing public spheres and engaging in outreach activities, take part both in the production and dissemination of cultural heritage.

83 P. Conway, ‘Preservation in the age of Google: Digitization, digital preservation, and dilemmas’, Library Quarterly, vol. 80, no. 1, 2010, p. 64. 84 See Unesco Charter on the preservation of the digital heritage, 2003. Accessed at <http://portal.unesco.org/en/ev.php-URL_ID=17721&URL_DO=DO_TOPIC&URL_SECTION=201.html>.

The positioning of the present thesis should in this perspective be seen as representing one of many possible perspectives on the application of digitisation within the library sector.

1.3. Previous work

In section 1.1.5 the general topic of research for this thesis was formulated as concerning the relation between the source document and its reproduction as conditioned by the digitisation process. The sections leading up to this one have demonstrated some of the complexities involved in the digital transmission of text carriers. Given the multidisciplinary character of this research area it is necessary to procure examples of previous studies in various academic fields. In this section some of the work that this thesis has built on will be outlined.

The application of digital technology within the humanities dates back to early experiments with hypertext (e.g. as advocated by Ted Nelson and the Project Xanadu85) in the 1980s and the development of the Text Encoding Initiative (TEI86) in the 1990s. The term “Humanities computing” was initially used in reference to this developing area of digital technology but came to be replaced by “Digital humanities” in order to emphasise the connection with research methodology. There are several introductions available to the field of digital humanities. Dalbello surveys literature and undertakings within the field since the 1980s and produces a useful genealogy of digital humanities.87 Its relevance for the purpose of this research is underlined by its publication in one of the leading journals within the field of LIS: the Journal of Documentation. Burdick et al. provide a supplementary view of the

85 From the Project Xanadu home page <http://xanadu.com/>: “We foresaw in 1960 that all document work would migrate to the interactive computer screen, so we could write in new ways - - paper enforces sequence - we could escape that! - paper documents can’t be connected - we could escape that! - this means a different form of writing - this means a different form of - this means a different document format, to send people and to archive.” 86 The Text Encoding Initiative is a consortium that manages a standard for representing texts in digital format. It thereby provides a resource for the online dissemination of text sources within the heritage sector for the purpose of research and teaching. See . 87 M. Dalbello, ‘A genealogy of digital humanities’, Journal of Documentation, vol. 67, no. 3, 2011, pp. 480–506.

domain.88 Terras et al. give a historic overview of some of the texts that defined the field of digital humanities.89

If we consider the relation between the process and the documents on purely technical grounds there is a wealth of resources available. Guidelines and recommendations relating to aspects of the digitisation of collections within the heritage sector (e.g. quality control, long term sustainability, digital preservation) can be found at organisations such as the Federal Agencies Digitization Guidelines Initiative90, Digital Preservation Coalition91, Centre92 and Digital Preservation Europe93. There are also international initiatives that address specific aspects of the digital process, such as IMPACT,94 which focuses on making text-based resources available on a large scale by initiating innovation within the fields of Optical Character Recognition and language technology. The Digital Repository Audit Method Based on Risk Assessment (DRAMBORA) offers tools for internal audits of digital storage systems.95 PRONOM is an online registry of file formats that provides information regarding file types and software.96

Several national and international standards that cover technical aspects of the digitisation process are published by organisations such as the International Organization for Standardization (ISO) and the National Information Standards Organization (NISO).97 There are also numerous studies that further explore these aspects in depth. There is an ongoing development of technologies for the digitisation of text-carrying documents so that they can be used as primary resources, and the expectations for such applications within the research community are high. The field of LIS research has also long since particularly focused on information needs and information seeking practices. But the actual informative capacity of digital resources

88 A. Burdick et al., Digital humanities, Cambridge, MA, MIT Press, 2012. 89 M. M. Terras, J. Nyhan & E. Vanhoutte, Defining Digital Humanities: A Reader, Farnham, Ashgate, 2013. 90 . 91 . 92 . 93 . 94 . 95 . 96 . 97 ISO , NISO . See appendix 1 for some examples of standards.

and its relation to the source document have garnered considerably less interest. Still, there are examples where this capacity has been in focus.

One such study is LAIRAH (Log Analysis of Internet Resources in the Arts and Humanities), in which usage data from three main portals for the digital humanities in the UK were analysed.98 Among the results it was found that a comparatively large proportion of the resources, 30–35%, remained unaccessed, that users placed high demands on the resources in terms of interface, content, and trustworthiness, and that few of the projects in the study provided documentation of the processes and procedures used in the transmission.

The approach adopted by Conway also addresses questions of informative capacity. In his study the focus led to an experimental set-up where the actual documents scanned were retrieved and used as references in the evaluation of the digital resources. By devising an error model for digitised books and four use cases (“reading online images”, “reading volumes printed on demand”, “processing full-text data”, and “managing collections”), the project examined approximately 5,000 scanned volumes and their digital representations.99 This “hands-on” evaluation of the characteristics of the digital resource, as a representation of the source document, is uncommon within the field. One exception is Henry & Smith, who adopted an approach similar to Conway’s in their evaluation of text digitisation projects for scholarly purposes, but with a significantly smaller sample and less structured parameters of evaluation.100

In section 1.1 it was argued that the relation between the process and the documents is not only a technical conversion but is also conditioned by assumptions regarding the informative capacity of the source documents, the interests and ideals of the institutions acting as producers, the particular features of the source documents, and the capacity of the equipment used and the staff involved. This area is rather under-researched. There are, however, some studies that in various ways have inspired this thesis, and they will be discussed below.

98 C. Warwick et al., The LAIRAH Project: Log Analysis of Digital Resources in the Arts and Humanities Final Report to the Arts and Humanities Research Council, School of Library, Archive and Information Studies, University College London, 2006. 99 Conway, ‘Archival quality and long-term preservation’, p. 302. 100 C. Henry, & K. Smith, ‘Ghostlier Demarcations: Large-Scale Text-Digitization Projects and Their Utility for Contemporary Humanities Scholarship’ in The Idea of Order: Transforming Research Collections for 21st Century Scholarship. Washington, DC, Council on Library and Information Resources, 2010, pp. 106–115.

The material implications of the digital format have been addressed from different perspectives by scholars such as Kirschenbaum, who applied bibliographical theories to the computer as a text-producing mechanism, and then extended this research into the field of digital forensics.101 Hayles positioned the text and the phenomenon of the work at the intersection between materiality and binary code.102 Drucker applied a similar perspective in her conceptualisation of the codex in the digital domain.103 Manoff examined the digital format as conditioned by the materiality of hard- and software.104 Sassoon discusses photography and the digital transmission of photographic collections in the light of Benjamin’s notion of aura.105 Dahlström places a critical focus on the specifics of the transmission of text-based documents and outlines a framework for two main strategies: critical digitisation and mass digitisation.106 Kichuk uses the digital conversion of microfilms as a starting point for a discussion on remediation in relation to early printed books.107 Questions of similarity and sameness in relation to the concept of the “archival record” are discussed thoroughly by Yeo.108

Both Terras and Tarte describe digitally supported research procedures based on the approaches observed among scholars in their respective studies.109 Terras also addresses the potential risks involved in relying on digital representations of source documents for scholarly purposes without considering how the digital technology

101 M. G. Kirschenbaum, Mechanisms: New Media and the Forensic Imagination, Cambridge, MA, MIT Press, 2008. M. G. Kirschenbaum et al., Digital Forensics and Born-Digital Content in Cultural Heritage Collections, CLIR Publication 149, Washington, DC, Council on Library and Information Resources, 2010. 102 N. K. Hayles, ‘Translating media: Why We Should Rethink Textuality’, The Yale Journal of Criticism, vol. 16, no. 2, 2003, pp. 263–290. 103 J. Drucker, ‘Entity to Event: From Literal, Mechanistic Materiality to Probabilistic Materiality’, Parallax, vol. 15, no. 4, 2009, pp. 7–17. J. Drucker, ‘Humanities Approaches to Interface Theory’, Culture Machine, vol. 12, 2011, pp. 1–20. 104 M. Manoff, ‘The Materiality of Digital Collections: Theoretical and Historical Perspectives’, portal: Libraries and the Academy, vol. 6, no. 3, 2006, pp. 311–325. 105 Sassoon, ‘Photographic meaning in the age of digital reproduction’. 106 Dahlström, ‘Critical Editing and Critical Digitization’. 107 D. Kichuk, ‘Metamorphosis: Remediation in Early English Books Online (EEBO)’, Literary and Linguistic Computing, vol. 22, no. 3, 2007, pp. 291–303. 108 G. Yeo, ‘“Nothing is the same as something else”: significant properties and notions of identity and originality’, Archival Science, vol. 10, 2010, pp. 85–116. 109 See Terras, ‘Image to Interpretation’, and S. M. Tarte, ‘Digitizing the act of papyrological interpretation: negotiating spurious exactitude and genuine uncertainty’, Literary and Linguistic Computing, vol. 26, no. 3, 2011, pp. 349–358.

“normalises” the information during the conversion from analogue to binary format.110 Audenaert and Furuta analysed how scholars within the humanities use source material (physical documents as well as digital reproductions).111 They argue that their observations revealed researchers participating in a “complex ecosystem of inquiry” rather than the ordered consultation of individual documents.112 The actual understanding of the text in focus is just one of several aspects, e.g. its production, transmission and function, that are central to the use of source material. Rimmer et al. studied the differences between physical and digital resources as experienced by humanities scholars.113 As in Audenaert and Furuta’s114 study, their findings emphasise that the shift to digital format influences notions of the context of the documents as well as of their trustworthiness. They also observed how the changing modes of interaction, for example browsing on physical shelves as opposed to browsing on a screen, negatively affected the scholars’ appreciation of searching for information. However, their report concludes by emphasising how digital and physical documents supplement each other, as the digital format enables strategies for search and use that add value to the collections in traditional formats.

Conway’s and Conway & Punzalan’s studies, despite addressing photographic collections and not text-based documents, are both of relevance for this thesis.115 Drawing on the image-theoretical framework outlined by W. J. T. Mitchell,116 Conway set out to develop a framework for visual literacy based on “modes of seeing” as observed among the users of digital archives in his study.117 Based on a study of users of digitised

110 M. M. Terras, ‘Artefacts and Errors: Acknowledging Issues of Representation in the Digital Imaging of Ancient Texts’ in Kodikologie und Paläographie im digitalen Zeitalter 2 – Codicology and Palaeography in the Digital Age 2 [...], F. Fischer, C. Fritze & G. Vogeler (eds.), BoD, Norderstedt, 2010, pp. 43–61. 111 N. Audenaert & R. Furuta, ‘What humanists want: how scholars use source materials’ in Proceedings of the 10th Annual Joint Conference on Digital Libraries JCDL ’10, New York, NY, ACM, 2010, pp. 283–292. 112 Audenaert & Furuta, p. 290. 113 J. Rimmer et al., ‘An examination of the physical and the digital qualities of humanities research’, Information Processing & Management, vol. 44, no. 3, 2008, pp. 1374–1392. 114 N. Audenaert & R. Furuta, ‘What humanists want’. 115 See P. Conway, ‘Modes of Seeing: Digitized Photographic Archives and the Experienced User’, The American Archivist, vol. 73, fall/winter 2010, pp. 425–462, and P. Conway & R. Punzalan, ‘Fields of Vision: Toward a New Theory of Visual Literacy for Digitized Archival Photographs’, Archivaria, vol. 71, spring 2011, pp. 63–97. 116 W. J. T. Mitchell, ‘Visual literacy or literal visualcy?’, in J. Elkins (ed.), Visual Literacy, New York, Routledge, 2008, pp. 11–19. 117 Conway, ‘Modes of Seeing’, p. 436.

photographic collections he defines three such modes: “seeing images”, “seeing pictures”, and “seeing archives”. The distinction between image and picture is defined as that between the visual content and the photograph as a physical artefact. The archive mode addresses the properties relating to the contextual dimension of photographs as being part of collections and historical holdings. It is argued that all these aspects are affected when photographic material is transferred to digital archives. In a subsequent article, Conway & Punzalan further analyse and conceptualise the notion of visual literacy.118

With respect to guidelines for digitisation, not much research has been undertaken. Lopatin examines a selection of guidelines from the library sector during the period 2000–2005. She defines her focus as project management, funding, selection, metadata creation, legal issues, interoperability, and preservation issues, but does not carry out any deeper analysis of the institutional underpinnings.119 Puglia & Rhodes survey the period 1995–2006. Their interest lies primarily in imaging standards and they do not draw any conclusions as to the design of the actual processes.120 Conway takes a structured approach to the question of guidelines and outlines a methodology for the analysis of guidelines for the digitisation of photographic collections, “codifying intent through digitization guidelines”.121 Although the digitisation of photographic collections presents specific challenges that are not directly applicable to the digitisation of text-based documents, his methodology has inspired the design of the empirical study of digitisation guidelines in this thesis.

1.4. Objectives and relevance

As argued in previous sections, digitisation is an activity used within the library sector as a means both of enhancing access and of preserving collections. The increasing

118 Conway & Punzalan, ‘Fields of Vision’. 119 L. Lopatin, ‘Library digitization projects, issues and guidelines: A survey of the literature’, Library Hi Tech, vol. 24, 2006, pp. 273–289. 120 S. Puglia & E. Rhodes, ‘Digital Imaging – How Far Have We Come and What Still Needs to be Done?’, RLG DigiNews, vol. 11, 2007. 121 P. Conway, ‘Building Meaning in Digitized Photographs’, Journal of the Chicago Colloquium on Digital Humanities and Computer Science, vol. 1, 2009, pp. 1–18.

use of digital technology has directed attention to the question of trustworthiness, especially among users with an interest in assessing primary sources.

Deegan and Sutherland formulate this concern: “…the relationship between any original object and a reproduction is massively compromised by the sheer impossibility of exact iteration.”122 This thesis addresses the ambiguous relation between the source document and the reproduction. Digital transmission is an act of interpretation: the relation between the source document and its reproduction is never a 1:1 situation; alterations are always introduced in the process. The prospect of the well-informed use of reproduced documents is consequently dependent on the user’s comprehension of how the process has shaped the reproduction. The issue, then, is what purposes can be met through the digital reproduction.

The aim of this thesis is to establish a clearer understanding of the conditions for both the source document and the digital representation and how the process of digitisation influences the informative capacity of the reproduction.

The thesis will present a conceptual framework for the digitisation process and its possible implications for the documents involved. Such a framework will facilitate an analysis of the relation between the factors in the process and the informative capacity of the reproduction, i.e. the extent to which a digital reproduction can fulfil an intended informative purpose.

The survival of texts is dependent on measures that ensure the preservation, storage and distribution of the documents that are a prerequisite for their existence. As tools of communication they also rely on an infrastructure that connects the writer with the audience. Documents thus exist within complex environments that involve authors and other producers, publishers, printers, retailers, readers and societal institutions such as libraries, archives, schools and universities. Digital technology affects all of these components and alters the mutual interrelations established around the circulation of printed documents.

The study focuses on issues that are central to academic disciplines where the document and the theory and practice of transmission represent the primary topic of research. As such, the study will contribute to the development of theory in relation to the application of digital technology in several areas within LIS, as well as in disciplines related to the heritage sector, such as museology and archival studies.

122 M. Deegan & K. Sutherland, Transferred Illusions: Digital technology and the forms of print, Farnham, Ashgate, 2009, p. 157.

The findings of the study are also practically applicable in the design and implementation of digitisation processes and will thus be of relevance to professionals involved in the management, dissemination and transmission of documents in the heritage domain and, particularly, in the library sector.

Before we proceed to discuss some of the terminological aspects of this study, it is time to describe my experience and background in the field of digitisation as this inevitably affects my pre-understanding of the area of research and also guides some of the strategies chosen in the study.

1.5. My background

I have, as a bench conservator, as head of preservation and subsequently as a preservation coordinator, been engaged in various aspects of digitisation at the Swedish National Library since the end of the 1990s. As I was involved in the planning and management of a number of digitisation projects, I have gained considerable practical as well as theoretical experience of digitisation within a national library setting.123 As part of the present research a series of field studies were carried out in which digitisation operations of different types were documented. These visits included the digitisation department at the National Library of Norway (Mo i Rana),124 the Media Conversion Center (MKC) of the Swedish National Archives (Fränsta),125 and the eBooks on Demand (EoD) project at the Umeå University Library.126 Prior to the initiation of this research, I also visited the Munich Digitization Center (MDZ), a division of the Bayerische Staatsbibliothek (München),127 and the Internet Archive128 project at the Natural History Museum (London),129 where botanical documents were digitised in cooperation with the Biodiversity Heritage Library (BHL).130 My awareness of developments within the field is continuously updated via my professional network and the study of relevant literature.

123 This also implies a familiarity with digitisation guidelines as a genre, thereby further supporting the analytical process. 124 . 125 . 126 . 127 . 128 . 129 . 130 .

It can be argued that my long-time professional involvement in the field of digitisation may prove to be a double-edged sword in the progress of my research. My experiences may affect the lucidity of the investigation, as my professional background inevitably influences my role as researcher.131 However, this background and my training as a conservator have also provided me with an analytical approach that is reflected in my understanding of the digitisation process and the relation between material artefacts and their binary counterparts.

1.6. Terms, concepts and delimitations

Specialised and context-specific terminology will be used in this study. Although terms will generally be explained as they are introduced, there are some concepts that have to be established initially in order to avoid ambiguity. This section will provide such elucidation. The discussions here will in this way also contribute to the framing and delimitation of the topic of research. We will return to some of these terms and concepts in more detail in chapter 2.

Information is a contested concept. For the purpose of this study, information will be regarded not as an intrinsically fixed feature of a document such as a printed book, but rather as a phenomenon that is situationally conditioned, i.e. the informative capacity of the printed book is dependent on the intent of the user in a given situation of use. The consequences of different approaches to this concept will be discussed in section 2.1. The thesis will maintain a close relation between information and the concept of document, inspired by Lund, who claims that you cannot have information without documents.132

The concept of document is also addressed in section 2.1, where various positions will be discussed. Suffice it to state here that this thesis adheres to a definition based on situatedness, thus following the approach of Briet and subsequent scholars such as

131 This background is also the reason why the interview study, which forms part of the empirical foundation for this study, was based on users of non-Swedish resources, in order to avoid any interference with the activities conducted by my own institution. 132 N. W. Lund, ‘Documentation in a Complementary Perspective’ in W. B. Rayward (ed.), Aware and Responsible: Papers of the Nordic-International Colloquium on Social and Cultural Awareness and Responsibility in Library, Information and Documentation Studies (SCARLID), Lanham, MD, Scarecrow Press, 2004, pp. 93–102.

Buckland.133 Consequently, the characteristics of the document are not restricted to specific material configurations but are rather related to its potential as a provider of information, conditioned by the requirements in a given situation of use. The digital conversion will further destabilise this conceptual fixity due to the distributed state that characterises the document in digital format – as binary code. This aspect of the digitisation process will be covered in sections 2.5 and 2.6.

Informative configuration refers to the way in which the user engages with a document or a digital representation of a source document, given the modes of representation enabled by the digitisation process. In other words, it refers to how the user’s interest in the source document (whether placed on its text-based content, its material features or its status in a sociocultural context, or combinations thereof, given that these aspects are not mutually exclusive but rather complementary) is reflected in the engagement with the digital resource. As the concept refers to the interaction between the user and the document, it can also be used to describe the guiding principles in the digitisation process, e.g. whether the transmission is narrowly focused on the text in isolation or has adopted a more inclusive approach to the document, encompassing aspects that relate to its physical make-up. The concept is introduced in section 2.4.

Informative capacity is used in relation to a document or a digital reproduction of a document. The concept refers to the capacity of such a resource to be informative, given the specific interest of a user.

Modes of representation refers to the ways in which the source document is represented given the parameters of the digitisation process. The concept is introduced in section 2.7.

The term transmission is in this study used in reference to the transfer of information from a source document to a target document.134 The procedures involved

133 See S. Briet et al., What is documentation? English translation of the classic French text, Lanham, MD, Scarecrow Press, 2006 (Briet’s text was originally published in French in 1951), and M. K. Buckland, ‘Information as Thing’, Journal of the American Society for Information Science, vol. 42, no. 5, 1991, pp. 351–360. 134 The term translation is suggested both in Hayles, ‘Translating media’, and in M. Dahlström, ‘How Reproductive is a Scholarly Edition?’, Literary and Linguistic Computing, vol. 19, no. 1, 2004, pp. 17–33. The term has the merit of indicating digitisation as an interpretative process. However, it has a connotation of text-relatedness. Therefore the term conversion will be used when referring specifically to the documents involved in the process of transmission. It should be noted that the term “target” has also been used in relation to reproduction

are discussed in section 2.3. Such a transmission can be performed either for the purpose of dissemination, e.g. to facilitate access, or as a method of preservation – in order to ensure the survival of a text beyond the lifespan of its carrier (or as a combination of both these motives, as they are not mutually exclusive). However, it has to be noted that digital transmission does not, in itself, carry a specific intention; it is rather the context wherein the conversion is performed that frames the consequences of the transmission. The methods used for the transmission of text-based information, e.g. manual writing, printing, photography, photomechanical and digital methods, generally reflect the status of the technological paradigm in society at any given time in history.

The term reproduction process is here understood as a subset of transmission procedures, distinguished from other methods by the capacity of the process to reproduce a source document that exists independently of the process itself. This excludes methods such as pantography, carbon paper and various types of spirit duplication processes, where the creation of the source document – the original – is an integrated part of the actual reproduction process.135 The term reproduction can also be used in reference to specific products of such processes, e.g. a digital reproduction.

within art theory, referring to the object that is reproduced, e.g. R. Cummins, Representations, Targets, and Attitudes, Cambridge, MA, MIT Press, 1996. In order to avoid the ambiguous connotations that accompany the term original in relation to the transmission of text-carrying documents, the term source document is used. The terms “source document” and “product” are used in A. Vickery & B. C. Vickery, Information Science in Theory and Practice, 3rd ed., München, K. G. Saur, 2004. 135 There are several less commonly used techniques for the reproduction of text, e.g. tracing, frottage, and pricking, but they will, due to their limited use, not be referred to in this study. Some processes, e.g. diazo and reflectography, rely on the translucency of the original and are consequently of limited use for the reproduction of text-based documents, as they cannot process documents with double-sided inscriptions; however, diazo has been used extensively for the reproduction of technical and architectural drawings. The history of office reproduction processes is covered in L. Nadeau, ‘Chapter 16 – Office copying & printing processes’, Guide to the Identification of Prints and Photographs, Featuring a Chronological History of Reproduction Technologies, New Brunswick, Atelier Luis Nadeau, 2002. The term ”reproduction process” is also used in the printing industry in reference to the preparation of printing masters, see H. Kiphan (ed.), Handbook of print media: technologies and production methods, Berlin, Springer, 2001.

Digitisation and digitisation process refer to transmission methods relying on digital technology to capture information content in analogue format and transform it to digital, i.e. binary encoded, format.136

The potential of digitisation and digital processing appears almost unlimited. Within the heritage sector digital technology can be applied to improve access to documents and collections (or parts of documents, such as individual newspaper articles or sections of a TV broadcast), or to enable access where documents are difficult to handle, e.g. due to oversize formats or because they require specific equipment for playback. Digitisation can be used as a part of preservation planning, by allowing documents and collections to be taken out of circulation due to present or predicted degradation, or by migrating information from storage formats that will become obsolete when no longer supported by hardware and software. It can also be part of measures to enhance searchability, to encourage public interaction (by tagging, for example), or to improve the rendition of images, moving images and sound by digital processing, and even to reveal details indiscernible to ordinary vision by the use of alternative radiation sources such as UV, IR or X-ray radiography. The technique can further be applied to virtually reconstitute collections that are divided between different locations or to render documents that are fragmented. Applications also include tools for creating “packages” for educational purposes, 3D-modelling and other types of data visualisation, data mining, computer-supported linguistic analysis and critical text transmission such as scholarly editing.

For the purpose of this study the concept of digitisation will mainly involve activities in relation to text-carrying documents. Although the focus is placed on printed books, there will also be references to theoretical and practical aspects concerning the transmission of manuscripts. It should be observed that the features distinguishing printed documents from handwritten manuscripts with regard to formal standardisation and predictability are less prominent in relation to early printed books. In many cases comparisons between the two document types will contribute to the understanding of the transmission process.

136 “Digital reformatting” (A. Smith, Strategies for Building Digitized Collections, Washington, DC, Digital Library Federation, Council on Library and Information Resources, 2001), “media translation” (Hayles, ‘Translating media’) and “digital surrogacy” (Conway, ‘Preservation in the age of Google’) have also been used to describe this process. Binary encoding, strictly speaking, also includes Morse code, Braille script and punch cards, as these methods also rely on digital encoding. However, these methods will not be addressed here.

It is worth noting that the traditional modes of scholarly text transmission – facsimile (the text treated as a graph) and transcription (the text treated as individual signs) – are applicable also within the digital domain, where more integrated approaches are facilitated. Greg137 and Redgrave138 provide early examples of discussions on this distinction, and Weitenkamp139 further expands on various types of facsimiles and their production processes. Tanselle adopts a wider perspective on the field of scholarly editing and provides an overview of strategies and concepts.140

Digital information capture of text-carrying documents can be performed with a camera (e.g. a digital SLR or a camera body fitted with a digital back) or with dedicated scanning equipment (e.g. a flatbed, sheet-feed or overhead/planetary scanner). Digital information capture also includes manual procedures, i.e. keying via a computer keyboard. The digitisation process thus refers to a set of activities engaged in order to transfer information from analogue to digital format.

The division between digital and analogue processes of transmission is indistinct in relation to the methods used for capture, storage and presentation. The two formats (digital and analogue) can be combined in both “directions”, i.e. digital capture generating analogue output (e.g. Computer Output Microfilm)141 and, in reverse, information stored in analogue format converted to digital format (e.g. by scanning a microfilm). This study will not specifically address processes that relate to documents that are ”born digital” and subsequently managed in the digital domain.142 It should

137 W. W. Greg, ‘“Facsimile” reprints of old books. IV. Type facsimiles and others’, The Library, vol. 6, no. 4, 1926, pp. 321–326. 138 G. R. Redgrave, ‘“Facsimile” reprints of old books. II. Photographic facsimile’, The Library, vol. 6, no. 4, 1926, pp. 313–317. 139 F. Weitenkamp, ‘What is a facsimile?’, The Papers of the Bibliographical Society, vol. 37, 1943, pp. 114–130. 140 G. T. Tanselle, ‘The Varieties of Scholarly Editing’ in D. C. Greetham (ed.), Scholarly Editing: A Guide to Research, New York, The Modern Language Association of America, 1995, pp. 9–32. 141 Digitally stored data converted to microfilm. 142 The glossary available at the Federal Agencies Digital Guidelines Initiative (FADGI) defines “born digital” as “Digital content that originated as a digital product. Born digital content is distinct from digital content, which is created through the digitization of analogue content.” www.digitizationguidelines.gov. Erway offers a complementary definition: “Born-digital resources are items created and managed in digital form”, R. Erway, Defining ‘Born Digital’: an Essay by Ricky Erway, OCLC Research, 2010. . It should further be noted that, by and large, all contemporary printed documents are born digital, i.e. are produced using digital equipment. Owens argues that there does not exist any information in digital format that is not born digital, as the digitisation process always results

be observed that the transmission process from the analogue source document to the user of the digital reproduction actually involves a series of transmissions: for example, conversion from analogue to digital format, transformation between several digital storage and transmission instances (e.g. servers, access points, and clients), and finally conversion back to analogue format in order to reach the user, e.g. in the form of a display or as printed output.143 The source document/reproduction relation can thus be used on a ”macro level” to cover an entire process, and it can also be used with reference to individual sequences within that process.
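By way of illustration, the principle that each step in such a chain of conversions introduces alterations can be sketched in a few lines of code. The sketch below is mine and is not drawn from the empirical material of this study; its parameters (256 tonal levels, an arbitrary source value) are illustrative assumptions only. A continuous “analogue” value is quantised to one of a fixed number of discrete levels at capture and then reconstructed for display; the reconstruction is close to, but never identical with, the source value.

```python
def quantise(value, levels=256):
    """Map a continuous value in [0.0, 1.0] to one of a fixed number of levels."""
    return min(int(value * levels), levels - 1)

def reconstruct(level, levels=256):
    """Convert a discrete level back to an approximate continuous value."""
    return (level + 0.5) / levels

source = 0.7071                    # an arbitrary "analogue" tone value
level = quantise(source)           # analogue -> digital (capture)
rendered = reconstruct(level)      # digital -> analogue (display or print)
error = abs(source - rendered)     # the alteration introduced by the round trip
print(level, rendered, error)
```

The error is bounded by the resolution chosen at capture, which is one reason why capture parameters (bit depth, sampling resolution) are so prominent in digitisation guidelines.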

It has to be observed that what is transmitted in the digital transmission process are the instructions that control the text’s performance, not the actual textual signs. This relation between visual appearance and hidden structure will be referred to in various ways throughout the thesis. Section 2.3.2 provides a detailed discussion of the transmission as a binary formation and section 2.5 addresses the conceptual formation, i.e. the document.
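This distinction between the transmitted instructions and the rendered signs can likewise be made concrete with a small sketch (again my own illustration, not part of the studied material): what travels in a digital transmission of a string of text is a sequence of character codes encoded as bytes, not the visual letterforms; how the glyphs finally appear depends on the software and fonts at the receiving end.

```python
text = "Björk"

code_points = [ord(ch) for ch in text]   # abstract character codes
encoded = text.encode("utf-8")           # the bytes actually stored and sent
decoded = encoded.decode("utf-8")        # reconstruction at the receiver

print(code_points)      # [66, 106, 246, 114, 107]
print(list(encoded))    # 'ö' occupies two bytes in UTF-8
assert decoded == text  # the codes survive; the visual form is recreated locally
```

Note that the character ‘ö’ is one code point but two bytes in the transmitted stream: even at this elementary level the stored configuration differs from the perceived one.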

1.7. The research questions and outline of the thesis

The thesis will demonstrate how digital transmission actually involves two sets of conversions: from analogue format to digital during capture, and from digital to analogue as the resource is engaged with by a user (since we cannot interact directly with binary code). It will be argued that this sequence of deconstruction and reconfiguration is a key feature of the digitisation process, a feature that inevitably shapes the characteristics of the representation. The particulars of the digitisation process thus condition the capacity of the reproduction to provide information and, consequently, govern its parameters of use. Section 2.6 will argue that such alterations, given the intended use of the reproduction, can be perceived as enrichment as well as loss in informative capacity vis-à-vis the source document. The digitisation process is not a simple cloning operation but should rather be characterised as an interpretative activity with several influencing factors. Depending on the use of the

in a new document, T. Owens, ‘All Digital Objects are Born Digital Objects’, [Blog] The Signal: Digital Preservation, Library of Congress, May 15, 2012, . 143 For a comprehensive discussion of the implications of information transfer between analogue and digital formats see H. M. Gladney & J. L. Bennett, ‘What Do We Mean by Authentic? What’s the Real McCoy?’, D-Lib Magazine, vol. 9, no. 7/8, 2003.

reproduced sources of information, the consequences of these influencing factors will be of more or less concern. But, as the use of digital representations increasingly replaces access to primary sources in situ at managing institutions, there seem to be good reasons for scrutinising and problematising the process of digitisation.
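The kind of alteration at stake can be illustrated with a deliberately simplified sketch (my own illustration, with invented values; the thesis does not prescribe any particular capture settings): when a continuous tone on a page is captured at a given bit depth, the conversion rounds it to the nearest available level, and that rounding cannot be undone when the file is later converted back to analogue form for display or print.

```python
# Illustrative sketch of quantisation during capture: a continuous value
# is rounded to the nearest of a fixed number of levels, and the exact
# original value cannot be recovered afterwards.
def quantise(value: float, levels: int) -> float:
    """Map an intensity in [0, 1] to the nearest of `levels` evenly spaced steps."""
    step = 1 / (levels - 1)
    return round(value / step) * step

original = 0.37                   # an idealised tone on the source page
coarse = quantise(original, 4)    # 2-bit capture: only 4 levels survive
fine = quantise(original, 256)    # 8-bit capture: 256 levels survive

print(coarse)  # roughly 0.333 - clearly not the original tone
print(fine)    # roughly 0.369 - much closer, but still not identical
```

Whether such a deviation counts as a loss depends entirely on the intended use of the reproduction, which is the point argued in section 2.6.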

The digitisation process is based on assumptions regarding the information potential of the source documents and their reproductions as well as the procedures required to manage and convert information back and forth between analogue and binary format. The transmission is thus conditioned by technology as well as by the interests and ideals guiding the people and institutions involved. The central issue of research was summarised in section 1.1.5 as concerning the conditions of the source document and the digital representation, and how the procedures involved in digitisation (which actually encompass the conversion from analogue to digital format during capture and the conversion from digital to analogue during the representation) influence the informative capacity of the reproduction.

As demonstrated in section 1.1, the problem area is highly complex as it involves institutional ideals and interests, theoretical frameworks concerning documents and information, and the technology used in the process, as well as the interests of the users who engage with digital resources. Three research questions (RQ) are formulated in order to cover this wide field of theory and practice.

A study of the process, its relation to the source document, and the resulting products will have to be underpinned by a framework that can handle technology as well as less structured and definable parameters such as the concepts of “documents” and “information”. This forms the base for the first question:

RQ1: Which theoretical frameworks are applicable when analysing the digitisation process and defining its parameters, and how can they help us understand and conceptualise the relation between the digitisation process and the digital reproduction?

Although the thesis is not designed to address the technical aspects of the digital conversion per se, it is nevertheless necessary to establish a basic comprehension of the process of transmission and how it affects the characteristics of the resulting products. This forms the base of the second question:

RQ2: Which factors in the digitisation process condition the characteristics of the reproduction and its capacity to complement or replace access to the source document?

The digitisation process exposes the source document to a radical reconstruction. Some aspects are emphasised while others are given less priority. The document as a coherent entity is challenged as it is rendered in digital format. How does this affect its informative capacity? This forms the base of the third question:

RQ3: How is the informative capacity of the source document affected when made accessible in digital format, and how does this influence the ways in which a group of scholarly users engage with and assess the potential of the digital reproduction?

Two empirical studies are designed on the basis of these research questions. The empirical studies are supplemented with a critical examination of relevant theory and conceptualisations, primarily within the field of document studies, but also within LIS in general as well as from neighbouring disciplines such as museology and conservation. Boundary Object theory will provide a structural underpinning for the studies in this thesis, underlining the correspondence between the parameters of the digitisation process, the characteristics of the source document and its digital reproduction.

The examination of the concept of “document”, its defining characteristics and capacity to provide information, is carried out in chapter 2. This chapter establishes a theoretical foundation for this research and a framework for the analysis of the results of the two empirical studies.

The first of the empirical studies, The digitisation process (chapter 4), consists of an analysis of conceptions regarding the digitisation process and how they are manifested in a number of recommendations, “best practices”, and guidelines produced by institutions and organisations with a high impact on the field of digitisation within the library sector. The study will investigate the components of the process and their interrelation in order to analyse how the digitisation process affects the conceptual consistency of the document.

The second empirical study, The capacity of the document in digital format (chapter 5), is based on a series of interviews where researchers describe their interactions with documents in digital format. These respondents all rely on digital reproductions of documents, texts and associated metadata on a professional basis, either as researchers within the humanities and social sciences engaged in text encoding and the production of scholarly editions, or by using digital sources as a complementary informative resource in their working situation, for example as a library information officer or an antiquarian bookseller. The intention is to investigate the points

of interaction between users and documents, thereby gaining an understanding of the characteristics of the digital reproduction and its capacity to replace access to the reproduced source document.

1.7.1. Outline of the thesis

The thesis is structured as follows:

Chapter 1 introduces the topic of research, outlines the problem area and the contextual setting of the thesis. Three research questions are formulated.

Chapter 2 provides the conceptual and theoretical framework of the study. Drawing on relevant literature, it consists of a critical analysis of the concept of document and its relation to the notion of information. The principles of production and reproduction of documents are discussed together with the consequences incurred through digital technology for these activities. Two concepts are developed as a result of the analysis, informative configuration and modes of representation, and both will be reused in the empirical studies. The chapter also outlines the role of Boundary Object theory in the study.

Chapter 3 presents the methodology applied, outlines the analytical approach, and reflects upon the material that provides the basis for the two empirical studies.

Chapters 4 and 5 present the findings from the two empirical studies together with a detailed account of the analytical procedures. Chapter 4 covers the analysis of the digitisation process and chapter 5 covers the interview study.

In chapter 6, the results of the two studies, together with the conceptual and theoretical notions developed in chapter 2, are discussed and related to the research questions.

Chapter 7 summarises the conclusions and contributions, reflects upon the study as a whole, and provides suggestions for further research.

2. THEORY

Documents and information are two phenomena of central importance in this study. In order to set the scene for the analysis of the digitisation process we will have to devote some attention to the basic principles that governed the production and reproduction of documents prior to the introduction of digital technology.144 This analysis, with a focus on how information is captured and represented, will form a theoretical base, and a conceptual model, that will be returned to in the study of digital methods of transmission.

The chapter is comprehensive in size as it serves two purposes: it provides the theoretical framework that underpins the thesis and also develops two key concepts that will be used in the two empirical studies. The concept of informative configuration is introduced referring to the capacity of a document to provide information, i.e. to be informative given the specific interest of a user. In this way the chapter will emphasise the link with the field of document studies, thus strengthening the positioning of this thesis within the framework of LIS. The chapter will finally argue a) that the digitisation process involves two sets of format change, from analogue to digital during capture and back to analogue during presentation, and b) that the ways in which these format changes are carried out will affect the options available to the user for her engagement with the digital representation of the source document. The concept of modes of representation is therefore introduced in reference to how

144 The distinction between digital culture on the one hand and print culture on the other is often used to justify general statements regarding the characteristics of documents, based on, for instance, production technology, assigning a homogeneous set of characteristics. Gitelman cautions against such ill-considered use as it ignores the multitude of processes that have been (and still are) used to produce and reproduce various types of documents without involving digital technology, see Gitelman, Paper knowledge. Keeping Gitelman’s observation in mind, it will nonetheless be argued here that the impact of digital technology cannot be fully comprehended without understanding the basic principles behind the production and reproduction of documents based on non-digital techniques such as writing, typing, photography and various mechanical and photomechanical printing processes, i.e. those often referred to as typical for print culture.

51︱Theory the digital resource is presented. Both informative configurations and modes of representation will be reused as analytical tools in the examination of the document as rendered in digital format.

2.1. Documents and information

Document might seem like a term that does not give much room for interpretation, at least not within the field of Library & Information Science (LIS). However, the different theoretical frameworks relating to the concept reveal that there is far from consensus on what actually constitutes a document. Lund describes two issues that have a profound impact on the field of document theory: the ramifications of the document concept itself (i.e. whether the definition should strive to include a wide range of phenomena or if it is more fruitful to adopt a narrow definition) and the actual definition of the components that make up the document.145 The introduction of digital technology exposes this framing to further tensions. Regardless of format, the concept of document has a long tradition of use within LIS and this section will take a closer look at the nature of this phenomenon.

Paul Otlet, one of the forerunners in the field of information science, saw the task of documentation as that of organising what he referred to as “graphic memory”, i.e. “written or illustrated documents”, as well as “original or reproduced drawings, and photographs of real objects.”146 He saw in the science of bibliography a means of systematising documentation with the aim of creating a “machine for exploring time and space”. His approach was practical, relying on organising facts found in documents using a hands-on “cut & paste” procedure. By organising and interlinking facts in this way, his goal was to create the “universal book” where all facts could be organised regardless of origin in the museum, archive or library sector. For this reason he is often acknowledged as outlining the principles of databases and hypertext long before the development of digital technology. Perhaps more importantly, he developed principles of organising library collections with the purpose of facilitating collaboration and dissemination.

145 N. W. Lund, ‘Document theory’, Annual Review of Information Science and Technology, vol. 43, 2009, pp. 399–432. 146 P. Otlet & W. B. Rayward, International organisation and dissemination of knowledge: selected essays of Paul Otlet, FID Publ., Amsterdam, Elsevier, 1990, p. 86.

Both Hjørland and Lund refer to the symbolical implication of the decision taken by the American Documentation Institute in 1968 (some thirty years after Otlet’s major work on documentation) to change its name to the American Society for Information Science.147 They both refer to this name-change as reflecting a shift within the library community from “document retrieval” to “information retrieval” – from working with documents to working with information.

The phrase “the document turn” has been used to describe the (at least in some areas of LIS) re-emergence of the document as a central concept and topic of research.148 This should not primarily be seen as a counter-reaction to the notion of information but rather an indication of the need, brought by digital technology, to re-address and reconsider core phenomena within the field such as text, document, and information.

The type of documents in focus for this thesis is generally referred to as printed books. This definition involves two defining aspects: printed indicates the technology used to transmit the text-based information for which the document acts as a carrier.149 Book refers to the structural make-up, and to some extent to the material composition, of the document. A printed book is consequently a definition of an artefact used for the storage and distribution of textual inscriptions. The definition of this type of document can thus be based on its structural characteristics. Resource Description and Access (RDA) defines the book (the “volume”150) on the grounds of such structural features: “One or more sheets bound or fastened together to form a single unit.”151 A book can consequently be studied using methods applicable in other studies of material culture objects. Needham takes such an approach:

It is not too much to say that we now understand that each copy of an edition may be looked upon as a small archaeological site, which may be studied stratum by stratum.152

147 See B. Hjørland, ‘Documents, memory institutions and information science’, Journal of Documentation, vol. 56, no. 1, 2000, pp. 27–41, and Lund, ‘Document theory’. 148 Various aspects of this approach are discussed in R. Skare, N. W. Lund & A. Vårheim, A document (re)turn: contributions from a research field in transition, Frankfurt am Main, Lang, 2007. 149 The information potential of the printed book is, of course, not limited to the inscribed signs. This issue will be further discussed here and in section 2.4. 150 Within an analytical bibliographical context this structure would be referred to as a codex. 151 Resource Description and Access toolkit. Accessed at . 152 P. Needham, ‘Copy-specifics in the Printing Shop’ in B. Wagner & R. Marcia (eds.), Early Printed Books as Material Objects: Proceedings of the Conference Organized by the IFLA Rare Books and Manuscripts Section, Munich, 19–21 August 2009, Berlin, De Gruyter Saur, 2010, p. 9.

The archaeological perspective is also adopted by Meadows in a study of document architecture, and by Dalbello-Lovric in her application of archaeological theory to the analysis of bibliographical documents.153 The significance of the material features of the document as a provider of information is also emphasised by Pellegram in a study of paper-based office practice, and by Wildemuth as an important dimension of document studies.154

The book can thus be defined as a material artefact. Moreover, the book also (in most cases) carries inscriptions.155 Buckland argues that the aspect that distinguishes the book from other material objects such as archaeological findings, zoological specimens, antiquities and artwork is its function as a provider of text-based information. In his influential study of the concept of information this distinction – the intentionality to act as a “discursive object” by virtue of carrying textual inscriptions – is used as a defining criterion of the book.156 However, as the informative capacity is contextually based and not exclusively restricted to text-based content, his definition also includes artefacts that are not produced as text carriers and even objects that are not primarily intended to be informative at all. The informative capacity is here based on the extent to which an object can be used as a provider of information.

Buckland’s definition thus encompasses several categories where the text-carrying document is an obvious but not exclusive member. Although the textual inscription is a significant feature of the “discursive object”, its ramifications and relation to the document are not self-evident. Scott also refers to “text” and intentionality but adopts a somewhat less inclusive approach by making a distinction between “documents”, where the textual inscription is a central feature, and “inscribed objects”

153 See A. J. Meadows, ‘The scientific paper as an archaeological artefact’, Journal of Information Science, vol. 11, no. 27, 1985, pp. 27–30, and M. Dalbello-Lovric, ‘The Case for Bibliographical Archeology’, Analytical and Enumerative Bibliography, vol. 10, 1999, pp. 1–20. 154 See A. Pellegram, ‘The message in paper’ in D. Miller (ed.), Material Cultures: Why Some Things Matter, London, UCL Press, 1998, pp. 103–120, and B. M. Wildemuth, ‘Existing Documents and Artifacts as Data’ in B. M. Wildemuth (ed.), Applications of Social Research Methods to Questions in Information and Library Science, Westport, CT, Libraries Unlimited, 2009, pp. 158–165. 155 That is, if we refer to the book as a product of a process of transmission. However, looking at the book as a physical structure, a method of holding sheets together, we also find examples without inscriptions, e.g. a notebook. 156 Buckland, ‘Information as Thing’, p. 351. Three perspectives are applied on the phenomenon of information: 1. “Information-as-process” (when someone is informed, what they know is changed), 2. “Information-as-knowledge” (the knowledge communicated), and 3. “Information-as-thing” (objects, such as data or documents, that are regarded as being informative).

e.g. coins, clocks, and cars where the actual inscription is “peripheral to their main significance”.157

Gunder, on the other hand, suggests that “text” should not be used solely in reference to textual signs with an explicit communicative or informative purpose.158 She proposes a broader understanding based on text seen as “signifying elements in any form arranged in any way”.159 In this definition she includes text-carrying documents as well as artefacts in a broader sense (any products of material culture) and even “buildings and gardens”.160 Tanselle widens the scope even further as he states that the “textual approach” should not be exclusively reserved for objects within manuscript/print culture but rather that it is a method of analysis applicable to all works of artistic expression, equating, for example, a change of “texture” in an artwork with the editing of a text.161

This all-inclusive view is not uncontested. The museologist Hooper-Greenhill argues that positioning material structure on par with text will result in a prioritisation of “discursive meaning” at the cost of “material meaning”.162 She sees text as a phenomenon with limited material implications:

Although it could be argued that words necessarily have a material character (they may be expressed in book form, or written around a piece of pottery), the material form rarely comprises part of the meaning of the text.163

It can be noted that Du Rietz, in his taxonomy of the text, refers to the actual deposit of ink or dye that makes up the shape of the characters (on the sheet) as “material text”.164 Du Rietz thus places a focus on the materiality of documents and the text. He thereby actually enables a structural relation between the notion of text

157 J. Scott, A Matter of Record: Documentary Sources in Social Research, Cambridge, Polity Press, 1990, p. 5. 158 A. Gunder, ‘Forming the Text, Performing the Work: Aspects of Media, Navigation, and Linking’, Human IT, no. 1–3, 2001, pp. 81–206. 159 Gunder, ‘Forming the Text’, p. 91. 160 Gunder, ‘Forming the Text’, p. 91. 161 G. T. Tanselle, ‘The Textual Criticism of Visual and Aural Works’, Studies in Bibliography, vol. 57, 2005, p. 2. 162 E. Hooper-Greenhill, Museums and the Interpretation of Visual Culture, Museum Meanings, London, Routledge, 2000. 163 Hooper-Greenhill, Museums and the Interpretation of Visual Culture, p. 115. 164 ”Materiell text”. Du Rietz, Den tryckta skriften, p. 53.

(as manifest in the document) and the hierarchical levels of description within reference bibliography (as expressed in the FRBR terminology165) and descriptive bibliography (as formulated by Bowers166). For the purpose of this thesis, the textual inscription will be considered as one of several significant features that contribute to the capacity of the document to provide information (this issue will be expanded upon in the following section).

Buckland also emphasises the situational aspect as a criterion in his definition of the document:

At quite specific situations and points in time an object or event may actually be informative, i.e., constitute evidence that is used in a way that affects someone’s beliefs; and [… s]ince the use of evidence is predictable, albeit imperfectly, the term “information” is commonly and reasonably used to denote some population of objects to which some significant probability of being usefully informative in the future has been attributed.167

Printed books can in this way be seen as a subset of documents with a potentially informative capacity. Buckland builds on Suzanne Briet, who in her pioneering work on documentation, Qu’est-ce que la documentation?, based her definition of the document on its potential to provide evidence, i.e. the document is a representation or a proof of a “physical or intellectual phenomenon”.168

Her well-known example is how an antelope does not qualify as a document as long as it is grazing on the savannah, but once it is introduced into a knowledge organisation (KO) context (in her example a zoological garden) it gains the status of a document.169 She continues:

165 Functional Requirements for Bibliographical Records <http://www.ifla.org/publications/functional-requirements-for-bibliographic-records> 166 F. Bowers, Principles of bibliographical description, Winchester, St. Paul’s , 1994. 167 Buckland, ‘Information as Thing’, p. 357. 168 S. Briet et al., What is documentation? English translation of the classic French text, Lanham, MD, Scarecrow Press, 2006, p. 10. 169 A similar evidentiary perspective has been applied within museology where van Mensch describes the process of musealisation: “Museum objects are objects separated from their original (primary) context and transferred to a new, museum reality in order to document the reality from which they were separated. A museum object is not just an object in a museum. It is a collected (selected), classified, conserved, and documented object. As such it has become either a source for research or an exhibit when put on display.” P. van Mensch, Towards a methodology of museology, PhD thesis, University of Zagreb, 1992, Zagreb, University of Zagreb, 1992, n.pag., section 12, <http://www.muuseum.ee/uploads/files/mensch12.htm>.

Is a star a document? Is a pebble rolled by a torrent a document? Is a living animal a document? No. But the photographs and the catalogues of stars, the stones in a museum of mineralogy, and the animals that are catalogued and shown in a zoo, are documents.170

It can thus be argued that the intentionality and situatedness of documents might provide a more constructive definition of documents than a strict adherence to material characteristics, such as when Ranganathan, for instance, refers to a “more or less flat surface or on a surface admitting of being spread flat when required”171 or Shera, who refers to “graphic records”172.

As demonstrated, there is a mutual interdependency between conceptions of information and documents. Several approaches can be taken in relation to the informative capacity of documents. The document might be considered as a material carrier of fixed and defined information. Forbes-Pitt refers to the document as a “wrapper for content”.173 A similar view is reflected in the definition proposed by the International Organization for Standardization (ISO): “recorded information or object which can be treated as a unit.”174 On the other hand we find approaches that emphasise the interpretative aspect of the phenomenon:

My claim is that information is properly seen not as an objective independent entity as part of a ”real world,” but that it is a human artefact, constructed and reconstructed within social situations.175

It can be observed that the use of zoological rhetoric such as Briet’s antelope is encountered also in this context. The museologist Kenneth Hudson remarks: “…a tiger in a museum is not a tiger but a museum tiger”, quoted in F.-X. Néve, ‘Consciousness, Mankind, Museums’, the 2011 Micheletti Award Conference and Ceremony, Dortmund, April 9th, 2011, Dortmund, 2011. 170 S. Briet et al., What is documentation?, p. 10. 171 S. R. Ranganathan, Documentation and Its Facets: Being a Symposium of Seventy Papers by Thirty-two Authors, Sarada Ranganathan Endowment for Library Science (vol. 10), Bombay; New York, Asia Pub. House, 1963, p. 40. 172 J. H. Shera, The foundations of education for librarianship, New York, 1972, p. 197. 173 K. Forbes-Pitt, ‘A document for document’s sake: A possible account for document system failures and a proposed way forward’, Records Management Journal, vol. 16, 2006, p. 13. 174 ISO 15489-1, 2001. 175 I. Cornelius, ‘Information and Interpretation’ in P. Ingwersen & N. O. Pors (eds.), Integration in Perspective: CoLIS 2, Second International Conference on Conceptions of Library and Information Science, October 13–16, 1996, København, Royal School of Librarianship, 1996, p. 19.

The theoretical approaches to the informative capacity of documents can be described as spanning the two positions that Furner refers to as the objectivist and the subjectivist stance, i.e. information considered either as a property inherent in a medium or as a property assigned to a medium.176 Latour’s notion of “immutable mobiles” can be seen as an attempt to bridge the multifaceted aspects of the document and its functions. The document ensures a degree of fixity for the inscribed information while at the same time providing it with mobility.177 Prior further emphasises this aspect as he refers to the document as a receptacle “of instructions, commands, wishes, reports, etc” and also as an agent in its own right with the potential of being manipulated, used as an ally or as the base for subsequent action.178 These two approaches add extra dimensions to the question of the capacity of the reproduction.

The debate (in the format of an exchange of articles and open letters in the Journal of the American Society for Information Science and Technology, 2006–2009) between two renowned scholars within the field of LIS – Marcia Bates, who outlines a framework that combines a subjective and an objective approach to the concept of information, and Birger Hjørland, who argues strongly against such a proposal – is indicative of the complexity of this topic.179

176 J. Furner, ‘Information Studies Without Information’, Library Trends, vol. 52, no. 3, 2004, p. 440. 177 B. Latour, ‘Visualisation and cognition: drawing things together’ in H. Kuklick (ed.), Knowledge and Society: Studies in the Sociology of Culture Past and Present, Jai Press, 1986, vol. 6, pp. 1–40, . Latour’s reference to “inscriptions” primarily refers to how scientific diagramming can be used as a tool of power. Brown & Duguid widen the concept of the “immutable mobile” with a reference to documents with text-based inscriptions in general; J. S. Brown & P. Duguid, ‘The Social Life of Documents’, First Monday, vol. 1, no. 1, 1996. Levy takes this metaphoric approach a step further and refers to documents as “those things we create to speak for us, on our behalf and in our absence”; D. M. Levy, Scrolling Forward: making sense of documents in the digital age, New York, Arcade Pub., 2001, p. 26. Italics original. 178 L. Prior, Using Documents in Social Research, London, Sage, 2003, p. 3. 179 M. J. Bates, ‘Fundamental Forms of Information’, Journal of the American Society for Information Science and Technology, vol. 57, no. 8, 2006, pp. 1033–1045. B. Hjørland, ‘Information: Objective or Subjective/Situational?’, Journal of the American Society for Information Science and Technology, vol. 58, no. 10, 2007, pp. 1448–1456. M. J. Bates, ‘Hjørland’s Critique of Bates’ Work on Defining Information’, Journal of the American Society for Information Science and Technology, vol. 59, no. 5, 2008, pp. 842–844. B. Hjørland, ‘The Controversy Over the Concept of “Information”: A Rejoinder to Professor Bates’, Journal of the American Society for Information Science and Technology, vol. 60, no. 3, 2009, p. 643.

The relation between the document and its informative capacity is ambiguous, and Buckland, acknowledging the problems associated with trying to establish an uncontested definition, argues that anything, given the contextual situation, can be considered informative.180 He also points out that this is a position which, if applied unconditionally, risks rendering the concept of information meaningless. Again we find that a situational approach offers a constructive way forward; for instance, Furner suggests:

we might decide that it is more useful to consider (i) not only that all signals are potentially informative, but also that all messages are potentially relevant and (ii) that potential informativeness and potential relevance are matters of degree.181

We have here addressed some approaches to the concept of document and outlined a definition that will be used in this thesis. It is based on relational aspects and situatedness rather than predefined characteristics. The capacity to provide information, to be informative, is a distinguishing feature here. Still, the relation between the document and its capacity to provide information has to be further scrutinised, as assumptions regarding this correlation will affect the approaches taken in the design of the transmission process. However, in order to address transmission, we will have to take a closer look at documents as material artefacts, i.e. how they are produced and reproduced, and also consider the consequences of the application of digital technology in these processes.

2.2. Production and reproduction of documents

Lund defines the document as the result of a process of documentation.182 Drawing on the Latin term doceo he suggests that the process of creating a document involves four components: human actors, means of production, the modes (the ways the means are applied) and the final product, the document.183 He adopts an inclusive approach encompassing speech, performances based on music and/or movement as well as the production of text-based documents. The central issue is the emphasis

180 Buckland, ‘Information as Thing’. 181 Furner, ‘Information Studies Without Information’, p. 441. 182 N. W. Lund, ‘Document, text and medium: concepts, theories and disciplines’, Journal of Documentation, vol. 66, no. 5, 2010, pp. 734–749. 183 Lund, p. 742.

on the “close relationship between process and product”.184 The document is the result of a series of activities involving human cognition as well as technology.

The document, when seen as the result of a production process, is consequently conditioned by the technical paradigm in a given historical period. Kilgour lists developments leading up to the establishment of the book in the codex format and subsequently what he describes as “the electronic book”.185 He argues that this historical course of events is characterised by a number of “punctuations of equilibria”, i.e. technical innovations that have been of vital importance in the shaping of text carriers, their production and distribution.186 Table 1 presents the steps that are outlined in this process:

Clay tablet
Papyrus Roll
Codex
Printing
Steam Power
Offset printing
Electronic book

Table 1. ”Seven punctuations of equilibria of the book over forty-five hundred years”. From Kilgour.187

In his description, Kilgour includes changes in the structural composition of the documents, for example the transition from roll to codex format, as well as changes in the actual text-production methods, such as the shift from writing to printing. These two aspects, the constituting components of the text-carrier and the technology used to produce the text, are obviously closely interlinked. His account reflects an almost evolutionary approach to the history of the book, as each one of the seven “punctuations of equilibria” represents a gradual advance in technology and automation.188

184 Lund, p. 744. 185 F. G. Kilgour, The Evolution of the Book, New York, Oxford University Press, 1998, p. 5. The term ”electronic book” is somewhat significant of the early stage of digital technology. The contemporary term ”e-book” has more far-reaching implications for modes of distribution and use than was the case at the time of Kilgour’s writing. 186 Kilgour uses the concept of punctuated equilibria originally introduced by Gould and Eldredge in the field of evolutionary biology (S. J. Gould & N. Eldredge, ‘Punctuated equilibria: the tempo and mode of evolution reconsidered’, Paleobiology, vol. 3, no. 3, 1977, pp. 115–151). The term is reused here in order to establish a chronological framing of the discussion regarding the production of text-carrying documents. 187 Kilgour, The Evolution of the Book, p. 5.

However, if Kilgour’s account of progress is used to define the introduction of technology specifically suitable for reproduction purposes, i.e. the ability to reproduce a source document as opposed to producing multiple copies of a document, only two discrete “punctuations of equilibria” are apparent, despite his reference to a progression.

The first is the change from methods based on capturing and representing the individual signs of the text to methods based on transmitting an image of the text. A prerequisite for this step was the development of photographic technology.189 Gaskell stresses the importance of this technical achievement:

… the application of photography to printing introduced a method of reproduction that was potentially as important as printing itself, and which was eventually to penetrate if not to control every department of book production.190

Photography was gradually adapted for printing purposes in methods like lithography and subsequently offset printing.191

The second step was the introduction of digital technology (what Kilgour refers to as the “electronic book”192). Chartier discusses the impact of digital technology in an equally drastic wording:

Our current revolution is obviously more extensive than Gutenberg’s. It modifies not only the technology for reproduction of the text, but even the materiality of the object that communicates the

188 To counterbalance this vision of constant progress driven by technology, it can be observed that technical media platforms, rather than replacing each other, generally coexist over time. See e.g. D. E. Nye, Technology Matters: Questions to Live With, Cambridge, MA, MIT Press, 2006, and O. Larsmo, ‘The temptations of the dinosaur theory’, Eurozine, 2007-08-03. An alternative to technological determinism is proposed by R. Williams, ‘The Technology and the Society’, in N. Wardrip-Fruin & N. Montfort (eds.), The New Media Reader, Cambridge, MA, MIT Press, 2003, pp. 291–300. 189 Stereography is a non-photographic method for reproduction of letterpress printing formes in their entirety. The method is described in section 2.3.1. 190 P. Gaskell, A New Introduction to Bibliography, reprinted with corrections, Oxford, Clarendon Press, 1979, p. 266. 191 McKitterick discusses the impact of photography on the scholarly representation of early printed books, as the process of producing facsimiles up to that moment in history had relied on manual methods. D. McKitterick, Old Books, New Technologies: The Representation, Conservation and Transformation of Books since 1700, Cambridge, Cambridge University Press, 2013. 192 Kilgour, The Evolution of the Book.

text to the readers. Until now, the printed book has been heir to the manuscript in its organization by leaves and pages… The substitution of screen for codex is a far more radical transformation because it changes the methods of organization, structure, consultation, even the appearance of the written word.193

These comments by Chartier point at some distinguishing aspects of digital technology in relation to print and manuscript production that will be further elaborated in the following sections.

Although the rhetoric can convey a different impression, history shows that the introduction of new media platforms within the field of communication does not seem to have an instantaneous impact; new technologies tend to co-exist for shorter or extended periods of time with the ones they are expected to replace.194 The rigidity of already established technological infrastructure might be one reason, but perhaps also an initial skepticism regarding the capacity of new media.195

Bolter and Grusin introduce the concept of “remediation” with reference to this phenomenon.196 They describe the effects of media transition by defining a range of strategies through which the “new” and “old” media interact. A classical example is how the early printed books retained features of the handwritten manuscript.197 We find today examples of interface designs that attempt to mimic the experience of

193 R. Chartier, Forms and Meanings: Texts, Performances, and Audiences from Codex to Computer, New Cultural Studies, Philadelphia, Univ. of Pennsylvania Press, 1995, p. 15. 194 Ogburn suggests the term “cultural lag” in respect to this phenomenon. See W. F. Ogburn, Social Change with Respect to Culture and Original Nature, New York, B. W. Huebsch, Inc, 1922. 195 Despite the high expectations placed on the “paperless office” we find that today, some 40 years after the introduction of the concept, a large proportion of office management still involves paper-based documents. See A. J. Sellen & R. H. R. Harper, The Myth of the Paperless Office, Cambridge, MA, MIT Press, 2002. The death of the book was proclaimed already in 1835 as a result of the expansion of the newspaper industry, as discussed in L. Price, ‘The Death of the Book Through the Ages’, The New York Times, 10 August 2012, section Books / Sunday. The librarian (as a profession) was placed on the “zombie list” (living-dead phenomena) published by the Copenhagen Institute for Future Studies in an article in DR Nyheder: M. A. Kristensen, ‘Pølsevognen er døden nær’, DR.DK, 2009. 196 J. D. Bolter & R. Grusin, Remediation: Understanding New Media, Cambridge, MA, MIT Press, 1999. 197 A peculiar example of a remediation (in reverse) is offered in fig. 3, p. 19. It shows the interface of the Codex Gigas presentation, where the quires of the medieval manuscript are symbolised as folders, a generic (and contemporary) device in Graphical User Interface design.

turning pages, e.g. in relation to presentation of documents in codex format.198 Bolter and Grusin describe a set of remediation strategies spanning from “immediacy” to “hypermediacy”, i.e. from striving to render the interface transparent and invisible to the other extreme, where the digital resource appears fragmented and engagement is conditioned by a multitude of frames and windows. These two positions are not exclusive but often appear in combination.199

It can be noted that the mediation enabled through microfilm, for a long period the standard reproduction method, does not leave the user in any doubt that she is engaging with a reproduction. The digital format, on the other hand, offers a mode of interaction with the potential to transcend the limitations of the equipment used for viewing. High-resolution imaging, zooming, and instant access create an apparent intimacy with the document that is rarely encountered in relation to the microfilm reader.

There are also examples of quite the opposite dynamic, especially in relation to scholarly applications, where the focus on specific details or aspects narrows the interaction to such an extent that the document appears to be more or less dissolved into fragments.

2.3. Basic principles of transmission

The features that distinguish digital reproduction technology are best comprehended using the basic principles of transmission of text-based documents as a background. This section will therefore focus on how information is captured and represented.

198 “Touch & turn” is both a literal description of the functionality and the trademark of one of the producers of such an application; the company has now ceased its operations. The Dead Sea Scrolls web site, developed by Google, is an example of another type of document interaction translated to the screen, this time as “virtual scrolling”. For a discussion on the emulation of the codex format in digital media, see K. Lundblad, ‘Kodexsimulationer’, in P. S. Ridderstad & K. Lundblad (eds.), Bokhistorier: Studier Tillägnade Per S. Ridderstad, Stockholm, Signum, 2007, pp. 95–103. 199 It can be noted that RDA uses the term “unmediated” in relation to “Media used to store content designed to be perceived directly through one or more of the human senses without the aid of an intermediating device”, e.g. printed books, as opposed to formats such as microform, video, computer, projected, stereographic, microscopic, audio. See Resource Description and Access toolkit.

In figure 6 the transmission process is outlined according to the basic steps involved.

Figure 6. The relation between capture, transmission and representation.

“Capture” indicates the procedures whereby the information is captured, “transmission” refers to the technology employed to process the captured information and “representation” indicates the modes in which the information is presented.

Prior to the introduction of digital technology, text remained a relatively fixed200 phenomenon within each individual instantiation, i.e. the internal relation between the inscription and the substrate does not alter unless the substrate is manipulated or deteriorates. Photographic technology did not affect these principles of fixity of representation but radically changed the conditions for capture, as sequences of letters could be captured in a single instance. Schwartz refers to this shift as the difference between capturing texts “/s/t/r/o/k/e/b/y/s/t/r/o/k/e/”201 (figure 7) and capturing texts as images, in their “entirety”202 (see figure 8).

Figure 7. Capturing a text as individual signs, stroke by stroke, and representing it as fixed text.

200 Conditioned by the characteristics of the substrate on which the inscriptions were produced. Long-term stability is of course subject to the parameters of storage and handling of the documents. Fixed text and unfixed text are here used in reference to the stability of the transmitted text in its individual instantiation, i.e. text printed on a sheet of paper as opposed to text presented on a computer display. This use of fixity should not be confused with the notion of “typographical fixity” as used by Eisenstein, which carries other implications; E. L. Eisenstein, The Printing Press as an Agent of Change: Communications and Cultural Transformations in Early-Modern Europe, London, Cambridge University Press, 1979. The modelling used in this chapter to demonstrate different approaches to the transmission is my own. 201 H. Schwartz, The Culture of the Copy: Striking Likenesses, Unreasonable Facsimiles, New York, Zone Books, 1996, p. 223. 202 Schwartz, The Culture of the Copy, p. 223.

Figure 8. Capturing a text as an image, in its entirety, and representing it as fixed text.

The difference between these approaches within the pre-digital technical paradigm will be addressed in some detail in the following sections as a preparation for the subsequent discussion of digital technology for production and reproduction.

2.3.1. Pre-digital methods

Although the pen (or the typewriter) functions as a technical mediator, it does not rely on an intermediate stage in the process. Whether the writer is copying a text or producing it on the basis of her own cognitive capacity, the tool and substrate involved constitute a relatively instant sequence of transmission. Letterpress printing technology introduces a mechanical, rather than cognitive, connection between the two stages of capture and representation.

Stroke by stroke

Prior to the introduction of letterpress printing in Western Europe, text reproduction mainly implied manual transcription.203 In the instances where text was transmitted mechanically, capture was primarily based on the physical modelling of wood, metal or another suitable substrate to act as a printing master (in intaglio or relief).204 The invention of letterpress printing changed the conditions for text

203 Texts had been transmitted by mechanical processes prior to Gutenberg’s invention, for instance through relief and intaglio printing (see following footnote), but the system based on printing with movable types represented a fundamental shift in text-production technology. Printing with separate types had, at the time of Gutenberg’s invention, already been practiced in China for 400 years. See J. Needham & T. Tsien, Science and Civilisation in China. Vol. 5, Chemistry and Chemical Technology, Pt. 1, Paper and Printing, Cambridge, Cambridge University Press, 1985, pp. 201–222. 204 The ink in the intaglio printing process is deposited in the grooves of the substrate whereas, in the relief process, it is located on the raised areas. Methods used include engraving and etching (intaglio), woodcut (relief), and lithography (planography). Lithography is based on the principle of attraction and repulsion between wetting agents and printing pigments; primarily used for the production of illustrations and artwork, it was occasionally utilised for the production of text. See B. Gascoigne, How to Identify Prints: A Complete Guide to Manual and Mechanical Processes from Woodcut to Ink Jet, London, Thames and Hudson, 1986.

transmission as this technology, by relying on machinery rather than the manual dexterity of scribes, enabled the production of one and the same text in large numbers.205 But the basic premises remained the same; although the volume of the output increased, the text was still captured by manual means and represented as a sequence of signs on a substrate of choice. During the composition process, the text was manually transferred from the manuscript to a printing forme206, although the subsequent fixation of textual signs onto a new carrier, the printed sheet, was mechanised. The move to industrialised printing did not alter this basic premise. In the late 1800s, type-setting machines such as the Linotype and Monotype, where the compositor worked at a keyboard rather than at the type case with a composing stick, were introduced. However, despite the mechanisation of the composition process, the actual procedure of information capture was still based on the text being keyed, entered manually ”stroke by stroke”, and resulted in a text presented as a fixed representation.

In this respect, stereography takes an intermediate stage in the process as its starting point. The composed printing forme is here reproduced by being moulded in its entirety to enable the casting of new printing formes (now as reproductions in entirety) that consequently can be used to produce parallel sequences of ”the same” printing set-up.207

Entirety

The invention of photographic technology enabled the transmission of texts to be carried out on the basis of text pages being captured in their entirety, as images, rather than as individual signs. Methods based on photography were gradually introduced in the printing industry during the final decades of the 19th century.208

205 The textual signs were transferred to paper not by the manual movement of the quill but by the mechanised movement of the inked types of the (printing) forme. 206 The composition procedure included the assembly of printing types to set the text and arrange the layout of the printed page. The types were organised and secured in the printing forme. 207 See Gaskell, A New Introduction to Bibliography, pp. 201–205. 208 The daguerreotype was, shortly after its invention in 1839, used for microphotography of texts, based on the application of a microscope. For a historical perspective on the use of microformats, see A. M. Meckler, Micropublishing: A History of Scholarly Micropublishing in

Hence, it can be argued that the term “reproduction”, in the sense of an activity defined on the basis of the capturing technique, is not relevant prior to the invention of photography. Up to that point in history, the same methods, whether manual or mechanical, were employed regardless of the purpose of the transmission.

It has to be kept in mind that the products of a specific method can be reused within a different technological framework; for example, files created by digital photography can be used to produce offset prints, or a manually keyed text can be distributed as a digital file. However, while techniques used for reproduction can generally be employed for other transmission purposes, not all methods of transmission are suitable for reproducing documents. It was not until the end of the 1920s, with the development of microfilming, that image-based methods of capture were used for reproduction purposes on a large scale. Ever since, until the introduction of digital technology, microfilming has been the main method of reproducing text-carrying documents.

2.3.2. Digital methods

With reference to reproduction technology, the second of these “punctuations of equilibria” is the introduction of digital technology.

With respect to the two principles of capture discussed in the previous section, digital technology does not represent an alternative approach. However, the fundamental significance of digitisation in comparison with methods based on analogue technology lies in its capacity to employ either of the two principles, stroke by stroke

America, 1938-1980, CT, Greenwood Press, 1982. The daguerreotype process was also used for producing printing plates. With the invention of the calotype in the 1840s (based on a negative/positive process) photographic technology could be used to produce several instances of the same image. Photoengraving came into commercial use in the 1890s and offset printing, as based on the lithographic process, was introduced in the printing industry in the early 20th century. The shift in reproduction technology caused by the invention of photography was neither momentous nor exclusive within the printing industry. Hence we find that photographic methods were gradually being applied within all types of printing processes. For the history and technical specificities of printing see e.g. Gaskell, A New Introduction to Bibliography, Kilgour, The Evolution of the Book, Kipphan (ed.), Handbook of Print Media, and J. Drucker & E. McVarish, Graphic Design History: A Critical Guide, Upper Saddle River, NJ, Prentice Hall, 2008.

or entirety, for the information capture and, due to digital conversion, the option of representing the text as fixed or unfixed, regardless of the principle of capture.

Digital methods, by relying on code rather than mechanical or chemical processes, enable the production of a representation in the form of a digital file and the option of choosing a format, either with editable properties (e.g. .txt) or relatively fixed properties (e.g. .pdf)209 (figure 9).

Figure 9. Digital technology enables the modes of capture and representation to be combined.
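The combinability that figure 9 illustrates can be stated compactly. The following sketch (in Python, with labels of my own; purely illustrative and not part of any digitisation guideline) simply enumerates the four pairings of capture principle and representational mode that digital technology makes available:

```python
from itertools import product

# The two principles of capture and the two modes of representation
# discussed above; digital technology decouples them, so either capture
# principle can feed either representation.
CAPTURE = ("stroke by stroke", "entirety")
REPRESENTATION = ("unfixed/editable", "fixed")

combinations = list(product(CAPTURE, REPRESENTATION))
print(len(combinations))  # 4: all pairings are available
```

Within the analogue paradigm, by contrast, only the diagonal pairings were available: keyed text yielded fixed print, and photographed pages yielded fixed images.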

This also means that a text captured as an image, for instance in JPEG format, can be transferred to editable format by the process of Optical Character Recognition (OCR).210 Conversely, a text in readily editable format, such as .doc, can be represented in a format that imposes a degree of stability, for instance, by converting it to Pdf.
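The round trip from image back to editable text that OCR performs can be illustrated with a toy model. The sketch below is not how any real OCR engine works (production systems involve scanning hardware, pattern recognition and probabilistic matching); the three-character bitmap "font" is invented for the example, and recognition is reduced to exact template matching:

```python
# Toy glyph "font": each character maps to a 3x3 bitmap,
# each row stored as a 3-bit integer (1 = ink, 0 = substrate).
GLYPHS = {
    "A": (0b010, 0b111, 0b101),
    "B": (0b110, 0b111, 0b111),
    "C": (0b011, 0b100, 0b011),
}

def capture_in_entirety(text):
    """'Photograph' a text: render it as one bitmap (a tuple of pixel rows)."""
    rows = [[], [], []]
    for ch in text:
        glyph = GLYPHS[ch]
        for r in range(3):
            rows[r].extend((glyph[r] >> (2 - c)) & 1 for c in range(3))
    return tuple(tuple(r) for r in rows)

def toy_ocr(bitmap):
    """Recover the character sequence by template matching, 3 columns at a time."""
    text = ""
    for col in range(0, len(bitmap[0]), 3):
        patch = tuple(
            sum(bitmap[r][col + c] << (2 - c) for c in range(3)) for r in range(3)
        )
        for ch, glyph in GLYPHS.items():
            if glyph == patch:
                text += ch
                break
    return text

image = capture_in_entirety("ABC")   # capture "in entirety", as an image
print(toy_ocr(image))                # back to editable text: ABC
```

The point of the sketch is the asymmetry discussed in the main text: the image representation stores only pixels, and the sign sequence has to be reconstructed by interpretation, a step that in real systems is probabilistic and error-prone.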

209 Digitally coded information is inherently prone to manipulation. The reference to the stability of the Pdf (Portable Document Format) should therefore be seen in relation to other file formats where editing is an explicit purpose. The Pdf was developed by Adobe Systems and is now standardised as part of the ISO framework. It can be regarded as a kind of envelope that enables the performance of its content to be stabilised according to specification; for example, a formatted text page or an illustration based on vector graphics will look the same regardless of representational technique (e.g. printer or display) or platform (e.g. type of operating system or application). It has to be noted that there are several options available to annotate and, in other ways, edit a Pdf-file. 210 OCR is a method whereby text captured as image is converted to text format by algorithmic processing, see M. C. Mariner, ‘Optical Character Recognition (OCR)’, in Encyclopedia of Library and Information Sciences, third ed., published online 31 Aug 2011, New York, Taylor and Francis, 2011, pp. 4037–4044. Note that the conceptual level is also of importance here. The process can be described as text on a page→image→text file. A closer look will, however, reveal several intermediate procedures of conversion: text on a page→CCD (Charge-Coupled Device, the image sensor in the information capturing apparatus)→image→OCR (algorithmic pattern recognition and probabilistic matching via thesauri)→text file. This more detailed approach also exemplifies some “soft” variables in this process; the conversion from text is conditioned by the interaction between the CCD and the source document and consequently affected by the characteristics of the source document, e.g. irregularities in the types or discolouration of the substrate. The OCR

A characteristic aspect of digital transformation concerns the modes in which text-based information is coded during storage and display. In the case of the printed book, the letters on the pages function both as storage and presentation signs. Digitally formatted text, on the other hand, has to be decoded from storage signs, binary code stored on a hard drive, for instance, to presentation signs readable to the human eye, for instance, alphanumeric code presented on a display.211 This observation is in line with Kirschenbaum’s use of ”formal materiality” (the conceptual modelling of the digital document, the representation on the display) and “forensic materiality” (the digital document at the bit level).212 The digitisation process produces a reproduction whose formal materiality is conditioned by the ontology of the computer hard-/software, the representation on the display, whereas its forensic materiality exists as sequences of binary code. The digitisation process consequently has to be addressed on two levels: a conceptual and a binary. This dual concept will be used in this thesis as its connotations are more in line with the intention of the study than Kirschenbaum’s “formal”/“forensic” dyad.
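The distinction between storage signs and presentation signs can be made concrete at the level of character encoding. A deliberately minimal Python sketch (actual storage of course involves further layers of file system and hardware below the byte level):

```python
text = "codex"

# Binary (forensic) level: the text as stored, a sequence of bytes (here UTF-8).
stored = text.encode("utf-8")
as_bits = " ".join(f"{b:08b}" for b in stored)
print(as_bits)  # 01100011 01101111 01100100 01100101 01111000

# Conceptual (formal) level: decoding the storage signs back into
# presentation signs readable to the human eye.
displayed = stored.decode("utf-8")
assert displayed == text
```

The printed page needs no such decoding step: its inscriptions serve simultaneously as storage and presentation signs.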

At the binary (forensic) level there is no clear-cut distinction between the text captured as individual signs and the text captured as an image. Biggs addresses this potential dilemma and argues that the distinction between information in graphic and linguistic/textual format cannot be based solely on technical characteristics; it requires an understanding of the specific “content conditions” applied in each case, i.e. the distinction between, on the one hand, inscriptions as linguistic content and, on the other hand, inscriptions as graphical elements.213 The difference between the two is not clear-cut.

From an ontological perspective, this can be described as a shift from a document where storage and display take place in the same unit, for example, on the pages of a printed book, to a distributed mode of instantiation where these two functions

process can further be supplemented with the use of a thesaurus to enhance the matching of words, based on e.g. vocabulary relating to a specific topic or historical period. 211 There are, however, instances where binary code fulfils both the task of storage and of presentation, e.g. Morse signals and documents with Braille script inscriptions. The need for dedicated presentation technology is not, however, restricted to digitally coded information. Information carriers relying on analogue formats, e.g. magnetic tape and vinyl records, will also require specific technology for access. Microfilm and microfiche arguably belong to the same category. 212 Kirschenbaum, Mechanisms, p. 40ff. 213 M. A. R. Biggs, ‘What Characterizes Pictures and Text?’, Literary and Linguistic Computing, vol. 19, no. 3, 2004, p. 265.

are performed in separate units. A text in digital format can serve as an example; storage is assigned to a storage device (e.g. a hard disc drive) and a screen (e.g. computer or mobile) serves as the display, which does not necessarily need to be in physical proximity to the storage device.214

However, this split between storage and display is accompanied by a simultaneous amalgamation of the functions of medium and tool. While they represent discrete sets of technology within the analogue context, e.g. quill and parchment or printing forme and paper, it is not meaningful to separate them in the digital domain. The computer is at the same time both a storage device and a writing tool, and Lund thus questions attempts to distinguish between the document and the program used to produce it.215 Along similar lines, Allison observes that the delivery mechanism is becoming as important as the information it is set to deliver.216 The development of methods of data storage and file compression, coupled with advances in wireless distribution and GPS technology, has resulted in a variety of devices based on flash memory and touch screens that gradually merge functions that used to be assigned to separate types of equipment, e.g. the camera and the music player. The mobile phone is becoming a smartphone whose capacity is enhanced by location-based interaction or biofeedback, such as Spotify Running217, a feature of the Spotify app that selects tunes with a bpm218 that syncs with the runner’s stride.219 The term “wearables” indicates yet another assimilation in functionality, where such devices become closely integrated with their users, e.g. as bracelets or integrated in clothing.

214 Although media formats such as microfilm and microfiche also require dedicated display devices (microfilm and microfiche readers), text stored on these types of carriers does not undergo a binary reformatting. The text is readable to the human eye without any process of decoding. 215 Lund, ‘Document, text and medium: concepts, theories and disciplines’. 216 A. Allison et al., ‘Digital Identity Matters’, Journal of the American Society for Information Science and Technology, vol. 56, no. 4, 2005, pp. 364–372. The change from browser to dedicated applications also enabled new approaches to software development and changed the economic models of revenue. The growing “app market” for mobiles, with e.g. Google Play and App Store, is an example of this trend. 217 https://www.spotify.com/se/running/ 218 Beats per minute. 219 An iconic representative of this trend, the iPhone, is subjected to a thorough analysis in P. Snickars & P. Vonderau (eds.), Moving Data: The iPhone and the Future of Media, New York, Columbia University Press, 2012. The MP3 format of file compression that set off the trend in dedicated music players, which has now more or less converged with today’s smartphones, is discussed in J. Sterne, MP3: The Meaning of a Format, Duke University Press, 2012.

This ontological restructuring not only impedes a clear-cut delimitation of the document when rendered in digital format. It also affects the capacity of the digital reproduction to act as a supplement, a reference or a surrogate. The issue of whether a document can be seen as informative is, in this way, further complicated.

This section has addressed the production and reproduction of documents and also discussed how digital technology challenges the concept of the document as based on features originating in printing technology. We will now return to the question of the informative capacity of documents.

2.4. Informative configurations

The concept of informative configuration is here introduced to demonstrate how the capacity of the document to provide information is situationally conditioned. We will devote some space to this introduction as the concept will be used later on in the analysis of the digital resource.

The consequences of the various theoretical positions regarding the concept of information are accentuated in relation to digital transmission. We will here address how the informative capacity of documents can be conceptualised in order to be used in the analysis of the process and of the documents involved.

The situational approach suggested by Furner has influenced the present thesis as it offers a way of avoiding the conceptual drawbacks that might arise from adhering to definitions of the informative capacity of documents based on predefined characteristics or specific modes of practice.

Such an approach is encountered in several disciplines that address the informative potential of material artefacts. Maroevic’s framework, used to define the informative capacity of cultural heritage objects as constituted by the institutional context (the archival, museum, and library discourse respectively), was discussed in the previous chapter, p. 13.

The objective of the research project Interpares2 is to define the archival resource in digital format.220 The project adopts a multivariate position and outlines a number of the significant features of the archival record: “a fixed document form, a stable

220 The International Research on Permanent Authentic Records in Electronic Systems.

content, an archival bond with other resources, and an identifiable context”.221 Within the field of archaeology, Caple defines three aspects that characterise the material object: it can be seen as an instrument (referring to its function), it carries a symbolic value (referring to its meaning), and it also has a documentary aspect (referring to its historical significance).222

These views, although applying different perspectives, have in common that they regard informative capacity not as a fixed property of the artefact but rather as situationally dependent, while still retaining the actual object as a prerequisite in the process of conveying information. In this way, they share the principles behind the definition of the informative capacity of documents as discussed in section 2.1.

The French cross-disciplinary research group “Rtp-doc”, which publishes its results under the pseudonym Roger T. Pédauque, has engaged this multifariousness of documents by outlining a set of constitutive dimensions that in combination with one another account for the characteristics of a document.223

Rtp-doc defines the document as consisting of three informative dimensions:

FORM – material characteristics: the physical structure and material composition.
SIGN – text-based information content, stored in sequences of alphanumeric signs.
MEDIUM – the document as a means of communication and provider of information: the functions and relations that are ascribed to the document as part of culturally established systems.

Table 2. The informative dimensions of a document as outlined in Pédauque.

These dimensions are not delimited by distinct and clearly discernible boundaries; they should rather be regarded as corresponding and complementary aspects that in different configurations can be used to describe how the informative capacity of documents varies with the situation.224 For the purpose of this thesis, the Rtp-doc term “medium” will be replaced by the term “context”.225

221 .
222 C. Caple, Objects: reluctant witnesses to the past, London, Routledge, 2006, p. 8ff.
223 R. T. Pédauque, Document: Form, Sign and Medium, As Reformulated for Electronic Documents. Ver 3, STIC-CNRS, 2003.
224 Pédauque, p. 3.
225 This change is made in order to avoid the connotations of the French term “medium” when used in an English-speaking context. The meaning will remain the same, and indicative of

The relation between these dimensions will here be modelled graphically using the diagram in figure 10.

Figure 10. A conceptual model of the informative dimensions of the document.226

For this purpose, the concept of informative configuration is here introduced to describe the interaction between these dimensions that takes place as a user engages with the document.

To demonstrate the applicability of this concept, let us consider a commonplace document such as a telephone directory. If it is used to provide a specific telephone number, engagement with it will emphasise its function as a carrier of text-based information in the dimension SIGN (e.g. names, addresses, and telephone numbers). This capacity is also based on the physical organisation of the textual inscriptions – FORM (e.g. indexing based on alphabetical order). Further, the directory has to be socially sanctioned in order to be considered a reliable source of information, thus pointing at the dimension of CONTEXT (e.g. by being published by an authorised body, such as a telecom administration). Figure 11 demonstrates this configuration;

the sociocultural context. Salaün, one of the founding members of the Pédauque research group, returned to these dimensions ten years after the first article was published and discusses “medium” in terms of “mediation”. He reflects: “The third dimension is that of mediation. Whatever its form and content, the document must have some social function.” (J.-M. Salaün, ‘Why the Document Is Important . . . and How It Is Becoming Transformed’, The Monist, vol. 97, no. 2, 2014, p. 192.) In the light of his comments my decision to use the term “context” as a replacement for “medium” seems justified.
226 This and the following diagrams, used to demonstrate the concept of informative configuration, are my own.

the dimension SIGN is emphasised and the remaining two, although of importance, are given less prominence.

Figure 11. The conceptual document model used to indicate a focus on the directory as a provider of contact information.

If the directory is instead engaged with from a socio-cultural perspective, e.g. in a study of the historical development of telecommunication in a specific geographical region, the dimension CONTEXT will receive primary attention, emphasising the document as “a vector of message between people”,227 while the text-based content, SIGN, and material characteristics, FORM, although still representing constitutive factors of the document in this instance, will be considered as complementary aspects. Figure 12 demonstrates a focus on socio-cultural aspects – the dimension CONTEXT.

227 Pédauque, Document: Form, Sign and Medium, p. 17.

Figure 12. The conceptual document model used to indicate a focus on the directory as an object within a social context.

The directory might also be engaged with as an example of functional design, thus placing a focus on the dimension FORM. However, such an approach will consequently have to involve the text-based content, SIGN, as an intrinsic part of the aesthetic ramifications, together with the considerations placed on the document as part of a public communication strategy – the dimension CONTEXT. Figure 13 demonstrates a focus on the dimension FORM.

Figure 13. The conceptual document model used to indicate a focus on the design of the document.

The notion of informative configuration can thus be used to conceptualise the potential of a document in a given situation of use while still emphasising the situational dependency of this capacity. We will refer to this concept in the analysis of the

digital resource, but first we will take a look at some consequences of the conversion of the document from analogue to digital format.

2.5. Fixity and flexibility

Digitisation affects the characteristics of the document as the transmission to a new format can be seen as imposing a degree of instability that is not encountered in documents produced with technologies based on non-digital methods. On the other hand, the digital format can also impose a structural stability that enables a text or an image to be rendered on a wide variety of devices without losing its visual characteristics. We will here examine notions of stability and instability as applied in relation to digital resources.

Manovich, in his discussion of the features of new media, defines “modularity” and “numerical representation” as the two basic principles of “new media”.228 Digital code is a form of numerical representation, which means that digital objects, as described mathematically, can be subjected to algorithmic manipulation. The modularity refers to the “fractal structure of new media”,229 whereby digital objects can be seen as consisting of modules that can be managed independently or even rearranged into new configurations without losing their inherent identity. Manovich’s observations are firmly related to the conditions of the digital format, i.e. to documents as managed within the digital domain. In his framework, the link between the digital document and the source document has been cut. However, if we widen the scope of digital reproduction to include the relation between the source document and the reproduction, the signification of these principles, of “modularity” and “numerical representation”, will differ.
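These two principles can be sketched in code. The example below is my own illustration, not drawn from Manovich: a small greyscale “image” represented purely as numbers, which can be transformed algorithmically (numerical representation) and from which a self-contained block can be extracted and handled independently (modularity).

```python
# Illustration (my own, not Manovich's): a 4x4 greyscale "image"
# represented purely as numbers (0 = black, 255 = white).
image = [
    [0,   0,   255, 255],
    [0,   0,   255, 255],
    [128, 128, 64,  64],
    [128, 128, 64,  64],
]

def invert(img):
    """Numerical representation: pixels are numbers, so the whole
    image can be transformed algorithmically (here: v -> 255 - v)."""
    return [[255 - v for v in row] for row in img]

def module(img, top, left, size):
    """Modularity: a square block can be cut out and managed as an
    independent unit without losing its identity."""
    return [row[left:left + size] for row in img[top:top + size]]

print(invert(image)[0][0])          # 255: black has become white
print(module(image, 0, 2, 2))       # [[255, 255], [255, 255]]
```

The point of the sketch is that neither operation makes any reference to a source document; the numbers are all there is.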

The seeming inertness of text-based inscription in documents such as printed books often leads to the conclusion that the text itself might exist independently of its substrate. The multitude of devices that can be used to access web-based information has further nurtured such conceptions. However, this inertness is conditioned,

228 The remaining three features are all consequences of this modularity and numerical representation: “Automation” – the possibility to automate operations such as manipulation and editing; “Variability” – a consequence of the mutable condition of binary code; “Transcoding” – the integration of computer technology into the cultural domain, thereby affecting cultural concepts and categories. Manovich, The Language of New Media, pp. 49-65.
229 Manovich, p. 51.

not absolute. On these lines Bolin disputes the notion of platform independency, as can be noted in the following quotation:

In educational jargon (or is it industrial jargon?) this is called ‘platform independent’ production, which means that you produce texts that fit many, possibly all, platforms. However, on the contrary, such production is in fact platform dependent, as it is dependent on the weakest link of textual presentation. When television companies adjust their web pages to mobile phone displays, as BBC and Swedish SVT have, they have had to adjust it to fit a small screen. This means that it cannot take advantage of the larger screens of television sets or computers.230

The premise behind the notion of platform independency bears the sign of the conduit metaphor of communication, according to which information can be moved from sender to receiver without its capacity being affected.231 This also relates to Manovich’s concept of modularity and the assumptions based on information represented as algorithms. But not even the web page is immaterial. As it is engaged with, it relies on a structure in the same way as the printed text page does, albeit with the function of cellulose fibres replaced by digital code, thus manifesting a spatial dimension that has to be related to the characteristics of the device/interface where it is rendered.

The perceived immateriality of the digital format, in combination with the resilience of the text, i.e. the inscription, as a carrier of linguistic information, is often taken as justification for regarding it as an entity that can be moved without alteration between different forms of manifestation.

However, the materiality of the text is a prerequisite for information conveyed by the inscriptions to be perceived by the human sensory system. As we can see, even the web page reclaims its space.

Digital restructuring can thus be seen as exposing the source document to destabilisation and even fragmentation. The conversion to digital format can, on the other hand, be used to establish a high degree of stability as, depending on the code and soft-/hardware involved, the same text can be given an identical appearance

230 G. Bolin, ‘Media Technologies, Transmedia Storytelling and Commodification’ in T. Storsul & D. Stuedahl (eds.), Ambivalence towards Convergence: Digitalization and Media Change, Göteborg: Nordicom, Göteborgs universitet, 2007, p. 243.
231 The conduit metaphor will be discussed in section 2.6.

regardless of display type or device. The fragility often said to characterise the digital format when compared with printed documents can be countered by pointing at the resilience of information in binary form. Kirschenbaum’s references to the procedures prescribed by the US National Security Agency in order to eradicate magnetic storage media are illustrative, involving various protocols for overwriting hard drives, exposing them to strong magnetic fields, and even resorting to physical demolition.232
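This resilience can be illustrated with a short sketch of my own (not drawn from Kirschenbaum): a cryptographic checksum makes it possible to verify that two digital copies are bit-identical, and that even a single flipped bit is detectable – a form of control that has no counterpart when comparing printed documents.

```python
# Illustration (my own, not from Kirschenbaum): a cryptographic
# checksum verifies that two digital copies are bit-identical,
# and that even a single flipped bit is detectable.
import hashlib

def digest(data):
    """Return the SHA-256 fingerprint of a byte sequence."""
    return hashlib.sha256(bytes(data)).hexdigest()

original = b"How reproductive is a reproduction?"
copy     = b"How reproductive is a reproduction?"
altered  = bytearray(copy)
altered[0] ^= 0x01                     # flip one bit in the first byte

print(digest(original) == digest(copy))     # True: bit-identical
print(digest(original) == digest(altered))  # False: one bit differs
```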

Levy’s article “Fixed or Fluid? Document Stability and New Media”, published in 1994, is an early example of a balanced approach to the question of the status of digital media.233 It was written as a response to Bolter, who regarded the digital document as representing the “fluid world” as opposed to the “fixed world” governed by print on paper.234 Levy argues that fixity and fluidity are features which, at certain periods in their life span, characterise both the printed and the digital document. A printed (and thereby fixed) document might lose its stability by being annotated and edited while, on the contrary, a hypertext might preserve its interactiveness across platforms thanks to the underlying code.

As initially demonstrated in this section, a comparison between the digital and the analogue without considering the specifics of each media format will not contribute to our understanding of their nature. This points at the importance of considering media specificity in the discussion and analysis of the characteristics of documents and their digital representations. The question of stability and instability is thus not simply a matter of media format but rather one of perspective and context. The time scale is a contributing factor:

This openness, an outgrowth of the iterative and (almost) infinitely mutable and expansive nature of digital media, stands in contrast to inherited notions of “writing” or “picture-making” or “printing” – all of which are stabilizing practices with slow refresh rates.235

232 Kirschenbaum, Mechanisms, p. 60.
233 D. M. Levy, ‘Fixed or Fluid? Document Stability and New Media’, ECHT ’94: Proceedings of the 1994 European Conference on Hypermedia Technologies, Edinburgh, Scotland, UK, September 19-23, 1994, New York, ACM, 1994, pp. 24–31.
234 J. D. Bolter, Writing Space: The Computer, Hypertext, and the History of Writing, Hillsdale NJ, Erlbaum, 1991.
235 Burdick et al., Digital Humanities, p. 15.

The notion of refresh rate adds an important dimension to the issue of media specificity and also provides a way of relating the characteristics of products of print culture to those of digital resources. The perceived difference in stability is, in this perspective, a matter of focus. This conception will facilitate a more nuanced comprehension of the characteristics of different media formats. We will return to this observation and the associated consequences of screen essentialism in the discussion in chapter 6.

The difference between the digital and the analogue document is, however, not solely a matter of degree; it is also a matter of kind, as we cannot engage directly with binary code. In order for the digitally formatted information to be accessible it has to be re-coded into analogue format, e.g. as visual representations (the formation of inscriptions on a display), as sound (the movements of a loudspeaker membrane), or as tactility (the physical impulses provided by a braille-reading device).

It is this analogisation, or transfer from binary code to analogue format, that to a great extent conditions our perception of and interaction with digital resources, a fact that often tends to be overlooked in the digital discourse, where the concept of virtual information is taken to indicate the total obsolescence of materiality and physical interface. In order to understand how this fluctuation between formats affects the document we have to change perspective and address the particulars of the process.
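A minimal sketch of my own may illustrate the point: a sequence of bytes carries no inscription by itself; only a decoding convention re-codes it into an analogue, perceptible form, and a different convention yields a different inscription from the “same” bits.

```python
# Illustration (my own): four bytes carry no inscription by themselves;
# only a decoding convention turns them into something perceptible,
# and a different convention yields a different result.
data = bytes([0x48, 0x65, 0x6A, 0x21])

print(data.decode("ascii"))      # 'Hej!' when read as ASCII text
print(data.decode("utf-16-le"))  # two quite different characters as UTF-16
print(list(data))                # [72, 101, 106, 33] as bare numbers
```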

Seen as a binary formation, the transmission (from source document to digital representation) can be described as a sequence of deconstruction and reconfiguration. The relation between the source document and its representation is consequently governed by two processes of conversion, one from analogue to digital format and a subsequent one from digital to analogue. Both these transmissions will introduce alterations that in various ways will affect the characteristics of the reproduction in relation to the source document.

The relation between the reproduction and the document being reproduced, however, also influences the conceptual formations of the notions of document and information. Drucker refers to the evasive nature of the informative capacity of documents by using the notion of “n-dimensionality”:

N-dimensionality is a feature of interpretation premised on the idea that any given point in any artifact (print or digital) is available for interpretation along an infinite, inexhaustible number of lines of inquiry. […] Words, graphics, images, texts, notes, smudges, marks, white

space – in short, any materially present bit of the artifact can serve as a provocation for interpretation.236

This interpretative flexibility is also referred to by Hjørland, who argues that any document is capable of providing an infinite number of “epistemological potentialities”.237 Duranti observes that any archival record has the evidentiary capacity to support an “infinite number of facts through its content, its form, its archival bond and its context”.238 Also within museology we find references to the material object as “a limitless source of information”.239

This corresponds with the view put forward by MacNeil & Mak, who use “modes of stabilization” to describe how the institutional framework (theoretically as well as practically) defines and shapes the function of documents:

It is indeed the place of librarians and archivists to place a structure of stability over what seems to be an endless flow of infinite possibilities. Some resources will require different modes of stabilization than others. Some resources may require more stabilization than others; this will depend on what the material in question is, and wherein its capacity to generate consequences is located.240

The “capacity to generate consequences” referred to by MacNeil and Mak is here determined by the particulars of the digitisation process and the ideals and interests of the producing institution.241 We will now address how such aspects might influence the characteristics of the reproduction.

236 J. Drucker, Developing Interpretation in E-Space: Humanities ‘Tools’ in Digital Environments, Humanities Tools Part 3.1, 11/17/2008, UCLA, 2008, p. 4.
237 B. Hjørland, ‘Theory and metatheory of information science: a new interpretation’, Journal of Documentation, vol. 54, no. 5, 1998, p. 610.
238 L. Duranti, ‘More than Information, other than Knowledge: the Nature of Archives in the Digital Era’, Cadernos BAD, vol. 2, 2003, p. 8.
239 van Mensch, Towards a methodology of museology, chapter 12, <http://www.muuseum.ee/uploads/files/mensch12.htm>.
240 H. MacNeil & B. Mak, ‘Constructions of Authenticity’, Library Trends, vol. 56, no. 1, 2007, p. 46.
241 This will consequently also include transmission procedures in relation to text carrying documents, e.g. scholarly editing (as discussed in P. Eggert, Securing the Past: Conservation in Art, Architecture and Literature, Cambridge, UK, Cambridge University Press, 2009) and even conservation activities (S. Muñoz Viñas, Contemporary Theory of Conservation, Oxford, Elsevier Butterworth-Heinemann, 2005), although relying on other methods of stabilisation than digital technology. The processes of musealisation and archival appraisal could, in this light, be seen as equivalent procedures within the museum and archival domain.

2.6. Transmissional alterations

The conduit metaphor is a communication model that is commonly applied to the digitisation process.242 This approach is often seen as the practical application of A Mathematical Theory of Communication – or signal theory – developed by Shannon, where he describes a statistical solution to the problem of disturbances – “the noise” – that occur during the transfer of signals in electronic media.243 When this approach is applied to communication, the conduit metaphor, it refers to information (in this instance in the form of messages) as a fixed and discrete entity that can be transferred from sender to receiver under controlled conditions that ensure that their meaning is unaltered. In relation to digital transmission the metaphor is often seen as a pretext for the conception of information as an entity that can be stored and transmitted without altering its capacity. The use of terms such as information content, or references to documents as carriers and containers of information, are indications of such a conception.

Following the assumptions underlying Shannon’s signal theory, the transmission process can be seen as affecting the information capacity of reproductions in a negative way. Yeo’s comment is indicative of this: “But whatever their form, copies and descriptions, like all representations, introduce some loss. … Every link in the chain adds more distortion…”.244 If the digitisation process is understood in terms of the

242 The conduit as a linguistic metaphor is discussed by M. J. Reddy, ‘The conduit metaphor: A case of frame conflict in our language about language’ in A. Ortony (ed.), Metaphor and Thought, 2nd ed., Cambridge, Cambridge Univ. Press, 1993, pp. 164–201. Krippendorff further distinguishes between the “conduit metaphor” (the idea that communication is based on sending messages with a specific inert content) and the “container metaphor” (the idea that communication functions as a channel, subjected to the interference of noise – a continuous flow of information, as opposed to the discrete units of communication in the conduit metaphor). K. Krippendorff, ‘Major Metaphors of Communication and some Constructivist Reflections on their use’, Cybernetics & Human Knowing, vol. 2, no. 1, 1993, pp. 3–25.
243 C. E. Shannon, ‘A Mathematical Theory of Communication’, republished with corrections from The Bell System Technical Journal, July and October 1948, vol. 27, pp. 379-423 and 623-656, by the Mathematical Sciences Center of Bell Laboratories, Lucent Technologies, to celebrate the 50th anniversary of its publication, The Bell System Technical Journal, 1998.
244 G. Yeo, ‘“Nothing is the same as something else”: significant properties and notions of identity and originality’, Archival Science, vol. 10, 2010, p. 341.

conduit model, the option of creating reproductions that are more or less identical with the source document might appear within reach, simply being a matter of removing the distortion – the noise in the channel.
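Yeo’s image of distortion accumulating along the chain can be illustrated with a small simulation of my own (the flip probability, chain length, and message are arbitrary choices): each link in a chain of noisy channels flips bits with a small probability, and the errors compound over successive transmissions.

```python
# Illustration (my own; flip probability and chain length are arbitrary):
# a 1000-bit "document" passed through five noisy channels in sequence.
import random

random.seed(1)  # fixed seed so the sketch is repeatable

def noisy_channel(bits, flip_probability):
    """One link in the chain: each bit is flipped independently
    with the given probability."""
    return [b ^ 1 if random.random() < flip_probability else b
            for b in bits]

source = [0, 1] * 500
signal = source
for link in range(5):                     # five successive transmissions
    signal = noisy_channel(signal, 0.02)  # 2% bit errors per link

errors = sum(a != b for a, b in zip(source, signal))
print(errors)  # a growing number of bits now differ from the source
```

The conduit ideal corresponds to setting the flip probability to zero; the point at issue in this section is that historical transmission processes never offered that option.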

But if we regard digitisation as on par with other strategies for transmission that have been used historically (e.g. manual transcription and letterpress printing), we have to consider the changes introduced in the process, i.e. transmissional alterations, in a more inclusive way.

Pearsall argues in relation to the transmission of documents in manuscript format that:

…each act of copying was to a large extent an act of recomposition, and not an episode in a process of decomposition from an ideal form.245

Pearsall’s approach represents a standpoint opposite to the conduit metaphor; the process is seen as an act of “recomposition”. Digitisation is in this sense not a channel but rather a process of creation, or at least re-creation.

Yeo observes that the products of transmission processes, although always differing in some aspects in relation to the source document, eventually tend to receive the status of originals themselves, as a result of gradually being integrated in institutional practice.246 We can return to the concept of ”refresh rate” (see Burdick et al. in section 2.5, p. 78) and argue that the lower this refresh rate, the more readily the document in question is given the status of original. The medieval manuscript is such an example; it is, at the same time, both a source document and a product within a process of reproduction. When taken out of the loop of manual transcription and included in a library’s rare collections, such a document will be perceived as an “original”.

This process of inclusion will also affect the way alterations in the transmission process are perceived. Rather than obscuring capacity, such alterations may, with

245 D. Pearsall, ‘Texts, Textual Criticism, and Fifteenth Century Manuscript Production.’ in R. F. Yeager (ed.), Fifteenth-Century Studies: Recent Essays, Hamden, CT, Archon Books, 1984, p. 127. 246 Yeo, ‘Nothing is the same as something else’, p. 112.

the passage of time or as a result of trends or ideals, be regarded as representative features of a specific transmission configuration:247

we need to acknowledge that the defence of any older communication system may be grounded in a not necessarily illogical desire to hold on to the noise we have in favour of the noise we may get.248

They might even be used in order to create specific aesthetic references, such as the sound effect that can be applied in music post-production in order to imitate the noise of scratchy vinyl records.249 Other examples are the filter options in applications such as Instagram that emulate the Kodacolor colour space or various states of degradation typically affecting old colour photographs.250

Alterations caused by the process of transmission will become integrated aspects of the resulting reproductions. Deegan & Sutherland illustrate this phenomenon in their reference to Early English Books Online, a database dedicated to early printed books, where details of the bindings and other structural characteristics of the displayed documents were frequently omitted in the visual representations:

the screen image appears as the real thing – to such a degree that the absence of binding, front and end pages, blanks, special front matter (so much of the material that constitutes a book as distinct from its text) goes unnoticed.251

Bolter & Gromala go as far as referring to the design of interfaces for visual representation as an “exercise in mythology” – the myth of transparency.252 This observation can be seen as a reference to the effects of remediation (as discussed in section 2.2, p. 62), as the influences of immediacy are obvious in the quote above. The fact that parts of the binding are missing would presumably have been experienced as more obtrusive if the books had been engaged with “hands on” rather than as digital renderings.

247 The highly pixelated characteristics of Minecraft represent a more complex approach, as the “low resolution aesthetics” in this game are combined with very elaborate 3D-rendering.
248 P. Levinson, The Soft Edge: A Natural History and Future of the Information Revolution, London, Routledge, 1997, pp. 53-54.
249 .
250 .
251 Deegan & Sutherland, Transferred Illusions, p. 133.
252 Bolter & Gromala, Windows and Mirrors, p. 48.

How, then, can the capacity of a reproduction be judged or evaluated? Visual similarity has traditionally been used as a criterion. Similarity is, however, a complex issue. Yeo observes:

The sameness of discrete entities, the concept of significance and the methods by which sameness or significance might be assessed are all open to dispute; opinions will inevitably depend on the contexts in which judgements are made.253

Within the digital domain this complexity is accentuated. One reason for this is what Ekbia refers to as the “dubious ontology” of digital artefacts.254 The defining criteria normally used in relation to material objects, for example clearly discernible boundaries and ramifications, do not apply to binary coded representations. The consequences of this ontological disposition for the digital document in a heritage context were discussed in section 1.1.2.

Similar arguments are raised by Montfort, who points at the risks posed by “screen essentialism”, i.e. the tendency to base the evaluation of the capacity of digital media on similarity – on the face value of the presented image.255 Such an approach will neglect the structure that lies “behind” the visual representation, such as the mechanisms of storage, processing, and representation that are intrinsically linked with the actual manifestation.256

Allison argues that definitions of identity within the heritage sector have to be adjusted to fit the requirements posed by documents in digital format.257 The factors used to identify material artefacts are generally based on “exogenous attributes”, e.g. artistic, legal, aesthetical, or historical aspects. The “internal logical relations”

253 Yeo, ‘Nothing is the same as something else’, p. 85.
254 H. R. Ekbia, ‘Digital Artifacts as Quasi-Objects: Qualification, Mediation, and Materiality’, Journal of the American Society for Information Science and Technology, vol. 60, no. 12, 2009, p. 2554.
255 N. Montfort, ‘Continuous Paper: Print interfaces and early computer writing’, ISEA 2004, the annual International Symposium on Electronic Art, Helsinki, 2004.
256 The consequences of media specificity – as implied by the notion of “screen essentialism” – can be elucidated by observing how this concept has been explored for aesthetic purposes. See e.g. the Rhizome Artbase, a curated collection of digital media art: . The initial trials with hypertext in the 1980s marked the starting point for a whole field of experimental writing and exploration into the capacity of digital media, in a similar way as the concept of “artists’ books” explored printing and binding in relation to print culture. See e.g. J. Drucker, The Century of Artists’ Books, New York, Granary Books, 1995.
257 Allison et al., ‘Digital Identity Matters’.

that regulate the interrelation between artefacts, such as chronology, development, adaptation and change, are often taken for granted and thus neglected in everyday use.258

This observation is taken a step further by Blanchette, who argues that: “without modes of analysis grounded in the stuff of computing, we shall find ourselves in the awkward situation of resorting to theories that account for embodied subjects situated and interacting in environments curiously lacking specific material constraints”.259

The question of media specificity is also related to the characteristics of the modes of representation. Digital malleability complicates notions of similarity, as these can be based on appearance and/or functionality. The division between storage and representation (as discussed in section 2.3.2, p. 69), together with the digital coding and decoding, introduces variables that are not comparable to the conditions for documents within print culture.

Within the field of media studies, Meyrowitz widens the scope by introducing the concept of “media literacy” to conceptualise the various ways in which media is comprehended and engaged with.260 He describes three types of such literacy:

Media as message – drawing on the conceptions of media as a container.
Media as grammar – referring to the variables that affect the perception of media, such as the effects of typography and layout on the text.
Media as environment – referring to media-specific aspects, e.g. differences in modes of interaction between radio and television.

Table 3. Three types of media literacy as outlined in Meyrowitz.261

Meyrowitz’s three types of literacy also contribute a perspective on the digitisation process, as they outline aspects that affect the transmission of documents by addressing the transferability of specific properties. Some aspects, such as text, are

258 Allison et al., p. 364. Her argument resembles that of Cameron (Cameron, ‘Beyond the Cult of the Replicant’) regarding the obscure materiality of the digital object.
259 J.-F. Blanchette, ‘A Material History of Bits’, Journal of the American Society for Information Science and Technology, vol. 62, no. 6, 2011, p. 1055.
260 J. Meyrowitz, ‘Multiple Media Literacies’, Journal of Communication, vol. 48, no. 1, 1998, pp. 96–108.
261 J. Meyrowitz, ‘Multiple Media Literacies’.

relatively easy to move between different formats – media as message. Some features can be manipulated to produce different effects during the process, e.g. the typeface – media as grammar. There are also aspects, such as the physical interaction with the book – holding it and turning its pages – that are difficult to emulate in other formats than the codex – media as environment.

It can be argued, consequently, that transmission processes have to be understood as interacting with the institutional contexts where they are performed, as well as with the personnel and technology involved in the transmission, rather than as isolated instances of technology. In this perspective, the alterations introduced in the process thus represent an inevitable and intrinsic contribution to any transmission.

Levy summarises this position:

Differences will always be introduced in copying; the trick is to regulate the process sufficiently so that the resulting differences are of little or no consequence and that the properties of greatest consequence are shared. Determinations of which properties matter are made in the context of purpose and use.262

Changes, loss as well as gain, will simultaneously affect different aspects of the reproduction, thus resulting in complex, and not always easily discernible, patterns of modification. In this respect, transmissional alterations represent a phenomenon that will inevitably characterise the reproduction and consequently define its relation to the source document.

2.6.1. Loss and gain

The changes that take place in the process can be structured according to how they affect the perceived informative capacity; whether they are considered as contributing to or decreasing the information potential of the reproduction.263

262 D. M. Levy, ‘Where’s Waldo? Reflections on Copies and Authenticity in a Digital Environment’ in Authenticity in a Digital Environment, Washington DC, Council on Library and Information Resources, 2000, pp. 24–31. 263 The question of loss or gain in informative capacity is of course not a fixed factor but rather dependent on the intentions in a given situation of use.

Documentation of the quality control in a digitisation process will in this respect serve as an indication of the alterations expected to cause loss. Table 4 presents the error model described by Conway, where alterations are structured according to the conceptual level affected.264

Level 1: data/information
1.1 Image: thick [character fill, excessive bolding, indistinguishable characters]
1.2 Image: broken [character breakup, unresolved fonts]
1.3 Full text: OCR errors per page image
1.4 Illustration: scanner effects [moire patterns, halftone gridding, lines]
1.5 Illustration: tone, brightness, contrast
1.6 Illustration: color imbalance, gradient shifts
Level 2: entire page
2.1 Blur [movement]
2.2 Warp [text alignment, skew]
2.3 Crop [gutter, text block]
2.4 Obscured/cleaned [portions not visible]
2.5 Colorization [text bleed, low text to carrier contrast]
2.6 Full text: patterns of errors at the page level (e.g., indicative of cropping errors in digitization processing)
Level 3: whole volume
3.1 Order of pages [original source or scanning]
3.2 Missing pages [original source or scanning]
3.3 Duplicate pages [original source or scanning]
3.4 False pages [images not contained in source]
3.6 Full text: patterns of errors at the volume level (e.g., indicative of OCR failure with non-Roman alphabets)

Table 4. Error model for digitised books, adapted from Conway265

We can observe three main areas of error: system-related errors that appear during processing, errors caused at the moment of capture, and, finally, errors relating to the status of the source document (e.g. missing pages).266

264 Conway, ‘Archival quality and long-term preservation’, p. 301. The example refers to a study of quality aspects in relation to large-scale book digitisation. 265 P. Conway, ‘Archival quality and long-term preservation: a research framework for validating the usefulness of digital surrogates’, p. 301. 266 Although it can be argued that the incomplete status of the source document in such a case is still “as is” and does not represent an error in the sense of incompleteness of capture or system failure.

The types of alterations that fall outside of accepted parameters in this example include aspects of forensic and formal materiality, i.e. relating to the binary code and its manifestation, as well as factors relating to the source document, such as pages omitted during the stage of information capture. Depending on the type of source documents, the design of the process, and the intended use of the reproduction, the deviations that fall outside the acceptable parameters of quality management will vary.

Certain types of loss of potential information can also be an integrated part of the process design, e.g. the use of file formats that rely on destructive compression algorithms, or the exclusion of the border areas of the document, either during capture or as part of subsequent image processing.

However, the reproduction process can also be used to increase the informative potential of the digital resource. The JISC programme “Enriching digital resources” presents several examples of how the informative capacity of source documents and collections is enhanced, for example, by the development of thematic clusters, information extraction, transcription and OCR processing, metadata enhancement, improved interaction, the creation of user generated content, data sharing, and the development of pedagogical tools and functionalities such as markup for semantic analysis.

Alterations can also be structured according to intentionality. Examples of unintended (or at least unforeseeable) enrichment can be found in crowdsourcing activities such as public participation in OCR correction processes and metadata enrichment, modelled as a computer game267 or as user generated additions to catalogue posts and interaction with content via open APIs.268 Unintended loss occurs when the characteristics of the products of the digitisation process do not meet the targets set by quality management. Intended alterations consequently encompass variations in the informative capacity planned and performed within controlled parameters. Transmissional alterations can thus be systematised according to whether the changes involve loss or gain of the perceived informative capacity (given the intended functionality) and whether such changes fall within or outside the planned parameters of the process. The alterations can also be structured by addressing the

267 D. Paraschakis & M. Gustafsson Friberger, ‘Playful crowdsourcing of archival metadata through social networks’ in Proceedings of ASE SocialCom-Stanford, ASE, 2014. 268 Powerhouse Museum API. Accessed at .

process of transmission as an activity that by necessity involves several areas apart from the purely technical.

In a broader perspective, transmissional alterations can be related to the contextual setting where the process takes place, i.e. the influences of factors other than the strictly technical. Table 5 presents the inclusive approach adopted by Dahlström when outlining some sources of alterations that might affect the digital transmission:

• the socio-cognitive, psychological, linguistic particulars of the individual(s) responsible for carrying out the translation,
• socio-cultural and socio-technical particulars of the situation in which the translation takes place (e.g. culture and tradition, purpose, specific audience, media environment),
• the material and technological particulars of the departure and target media (such as supporting matter, longevity, compatibility, document architecture),
• physical or symbolic tools at use in the process (such as practices and techniques, software, platforms, requirements, regulations and rules), and so on.

Table 5. Some parameters that affect the digital transmission, as outlined by Dahlström269

The application of this sociotechnical perspective makes it possible to indicate how different approaches to digitisation might affect the characteristics of the resulting surrogates, and hence the options of the user to engage with these products for specific purposes. This perspective also allows for a more nuanced understanding of the process as it indicates a range of more or less regulated parameters. Each area encompasses certain aspects that are standardised and others that might be subject to conventions, ideals, personal expertise, or even enduring habits. The procedures during capture provide an example: the technical parameters of exposure, bit depth, and white balance (to name a few) are defined in standards, but the actual handling during camera work, or the definition of the significant features of the source document, is less regulated. We find similar tensions in each of the aspects outlined by Dahlström.270

The informative capacity can then be seen as the proficiency of the process to facilitate a required informative configuration. Digitisation can, in this perspective,

269 M. Dahlström, ‘The Compleat Edition’ in Text Editing, Print and the Digital World, M. Deegan & K. Sutherland (eds.), Aldershot, Ashgate, 2009, p. 32, italics in original. 270 Dahlström, ‘The Compleat Edition’.

be described as a reproduction process that produces documents by using digital technology to stabilise captured information according to specific interests or ideals. Some aspects of the source document will inevitably be emphasised in this act of stabilisation while others, although also potentially informative, will not appear prominently. The variety of possible forms of stabilisation in relation to the characteristics of the digital surrogate is immense. However, the total information potential of a document cannot be captured (unless we could clone both documents and their contextual settings).

2.6.2. From binary to analogue

The previous section demonstrated how the concept of informative configuration can be used to describe the capacity of the document in relation to its intended function as it is engaged by a user. In a similar way, the digitisation process focuses on certain aspects of the source document, e.g. text based information content, material features, or bibliographical and historical information. But the notion of informative configurations does not cover the digitisation process sufficiently, as it does not address the fundamental restructuring the document is subjected to during the transmission from analogue to digital and back to analogue format. If the ontological implications of this conversion were ignored, the digitisation process would have drastic consequences: a focus placed on the text based information content would, for instance, result in a reproduction consisting only of textual inscriptions and lacking all material and social characteristics. Digitisation would then have the characteristics of a distillation process, producing free-floating information. Such a result is obviously not possible. In order to avoid the potential dilemma of reductionism we have to widen the approach.

In section 2.4 the situational quality of the document was discussed. Using a telephone directory as an example, it was argued that the document, regardless of whether it is read, studied as a unit of material culture, or referred to as a social object, has to be regarded at one and the same time as an information carrier, an artefact, and an object within a social context. The book, being read, is still no more a sequence of textual signs than it is a material artefact or a socio-culturally situated object.

In a similar way, despite the apparent dematerialisation, the digitally formatted information is subject to the same set of prerequisites as the printed book: materiality, textual inscription, and social context, although in the digital instantiation it is

represented by both hardware and software, interfaces, networks, and the transfer of text-based information between states of binary and alphanumeric encoding. The digital resource thus “reclaims” these three exclusive, co-existing dimensions FORM, SIGN, and CONTEXT, as demonstrated in figure 14, where the digitised book is conditioned by the plastics and metal of the computer, the visual representation of the text, and the network of digital infrastructure.

Figure 14. Representation of Linné (1761) on a computer display, demonstrating how the digital format conditions the informative dimensions SIGN, FORM, and CONTEXT.

The framework presented by Lund offers such a complementary perspective as the document is defined as consisting of three coexistent but mutually exclusive “closures”. Drawing on Bohr’s principle of complementarity, Lund argues:

Instead of discussing whether a book is more a material phenomenon than a social or a mental phenomenon, one can talk like Bohr about three complementary, but exclusive features of the description of

the book. You are not making a synthesis, but three complementary closures around the book, making a joint completion of the description.271

This notion of complementarity supplements the widened perspective required in order to avoid ending up in the “distillation fallacy” referred to above.

The process of digitisation results in a new document based on the parameters of the digitisation process; a document that is subsequently exposed to new sets of interpretation, either through direct use or via continuous remediation processes. Digital transmission can thus be described as exposing the document to tensions on a binary as well as a conceptual level. The consequences of the transmission are complex as it involves two points of conversion: from analogue to digital during capture, and from digital to analogue during the representation of the digital resource.

A central factor is the way in which the information in binary format is represented. In the next section we will focus on the principles according to which the source document is rendered in digital format; the modes of representation.

2.7. Modes of representation

As argued earlier, we cannot interact directly with binary code. Digitally formatted information has to be rendered in analogue format in order to be engaged with. The concept of modes of representation is introduced here with respect to the form in which the digital resource is presented. The mode of representation is dependent on the characteristics of the source document and the intended type of engagement. Thus, although the theoretical options are more or less unrestricted, text-carrying documents are in most cases represented as image (the text as a graph), transcription (the text as individual signs), and metadata (information about the source document and the digital representation).

In this way, digital transmission can be seen to adhere to the technical principles of critical text transmission – typographical vs. photographical reproduction – outlined already in 1926 by Greg.272 However, the fundamental difference in comparison with other methods of transmission is the option of editing and processing enabled by the digital format. The ramifications of catalogue information are also affected by the digital format as metadata can be applied, and extracted, at any stage during the transmission process. The bibliographical label is, in this perspective, simply one of several types of metadata that might be associated with the digital resource.

271 N. Lund, ‘Documentation in a Complementary Perspective’, p. 6. Niels Bohr was a Danish physicist who used the concept of complementarity to describe phenomena, such as light, that can be observed either as a stream of particles or as a waveform depending on the theoretical standpoint chosen. Bohr defined an approach whereby such a phenomenon can be addressed despite mutually conflicting theoretical frameworks.

These three modes of representation interact with the informative dimensions of the document but the relation between these entities is not causal. The differences in the modes of representation can be demonstrated using the concept of informative configuration as introduced in section 2.4. As argued, informative configuration describes the focus of engagement with a document or a digital resource and can accordingly be used to define the characteristics of the digitisation process, i.e. whether focussing on text, material features or the contextual aspects of the source document (or these dimensions in combination). The 1761 edition of Linné’s Fauna Svecica will here be used to exemplify the effects.273

In the first example, focus is on the dimension SIGN. This will result in a representation of text, either as image or as transcription, as in figure 15.

272 Greg, ‘Type facsimiles and others’. 273 Linné, C.V. (1761). Fauna Svecica: sistens animalia Svecica regni: mammalia, aves, amphibia, pisces, insecta, vermes: distributa per classes & ordines, genera & species... Stockholm: Laurentius Salvius.

[Figure content: transcription of the Latin text of the præfatio, Linné, Fauna Svecica, p. 3, as rendered in figure 15.]

Figure 15. Linné, Fauna Svecica, p. 3. Emphasis on the dimension SIGN – the text based information content. Left image: represented as image. Right image: represented as transcription.

It has to be observed that the informative dimensions cannot be rendered in isolation; the image still conveys details relating to the dimension FORM (e.g. the layout of the text) and CONTEXT (e.g. the information relating to genre and scholarly influences discerned in language style and spelling conventions).

The image renders the text as it is situated on the page, while the transcription, regardless of whether it is manually keyed or the result of an OCR process, is dependent on textual inscriptions and markup (metadata). However, as will be discussed in the interview study, the process of transcription (whether manual or automatised) introduces a degree of ambiguity in the reproduction process.

If the dimension FORM is prioritised, the image can be used to convey features relating to the structural composition and material make-up of the document, as in figure 16.

Figure 16. Linné, Fauna Svecica. Emphasis on the dimension FORM – the material characteristics represented as image.

These aspects will primarily involve representations relying on image although there is terminology that can be used to describe these features as metadata. Depending on the encoding protocol, the application of TEI – a method of transcription – also has the potential to convey aspects relating to the dimension FORM.

The dimension relating to the CONTEXT of the document involves aspects such as the bibliographical information, technical and administrative metadata, and historical and provenancial information, as in figure 17.

Figure 17. Linné, Fauna Svecica. Emphasis on the dimension CONTEXT – the contextual information rendered as metadata (a MARC catalogue post).

The metadata, here in the form of the MARC post, is an obvious example as it has the potential capacity to cover a wide range of aspects. The image can render details regarding ownership and provenance as well as bibliographical identity (e.g. title, author, date and location of printing). This information can also be conveyed as transcription.

As demonstrated, the options of applying digital technology to represent a document, in this case the Linné 1761, are extensive. The technical parameters during the transmission process will affect the informative capacity of the resulting image and there might be conflicting interests that have to be balanced. If the visual representation is based on a narrow approach, such as a focus on a single dimension, for example SIGN – e.g. by restricting the resolution and colour depth (perhaps in order to save digital storage space) – the informative capacity might be reduced to such an extent that significant details are removed. Continuing to use the 1761 edition of Linné as an example, figure 18 demonstrates such an instance.

Figure 18. Linné, Fauna Svecica, p. 3. Images of two pages with a similar appearance.

The two images in figure 18 appear similar. However, a heavier emphasis on the dimension FORM, for instance by shifting from binary to full colour representation, increases the range of information, and we find that we are facing two different copies: Linné (1761 a) and (1761 b). Although they seem to be part of the same edition, figure 19 demonstrates that they have been treated in very different ways.

Figure 19. Linné, Fauna Svecica, p. 3. Left image: ex. a, trimmed pages. Right image: ex. b, untrimmed pages.

We find that one copy, 1761 b, is untrimmed (the outer edges of the pages are left intact and the deckle edges274 are visible). The other copy, 1761 a, has been trimmed. If we further accentuate the dimension FORM, e.g. by supplementing images with alternative views, the representation will convey information that relates to the contextual setting of the document (figures 20 and 21). It emerges that one of the copies, 1761 b, is retained in the publisher’s binding while the other, 1761 a, has been bound in a gilt leather binding. Although this added visual information does not directly affect the text-based content of these two documents, it nevertheless provides insights into their sociocultural status and possibly also the impact of the texts they carry.

274 Deckle edges: the uncut, raw edges of the sheet, whose unevenness is the result of the process of formation in the paper-making frame. Depending on the bibliographical format of the codex, i.e. the number of times the printed sheet has been folded prior to being bound in sections, these raw edges might appear on one or more sides of the page of an untrimmed codex, thus providing information regarding the production process. See D. Hunter, Papermaking: the history and technique of an ancient craft, New York, Dover, 1978, pp. 227–228.

Figure 20. Linné, Fauna Svecica, ex. a, gilt leather binding [on top], ex. b, publisher’s binding [below].

Figure 21. Linné, Fauna Svecica, [on top] ex. a, trimmed fore edge and foot, [below] ex. b, untrimmed fore edge and foot.

This section has demonstrated the mode of representation as a factor that conditions the capacity of the reproduction. As demonstrated, the relation between the modes of representation and the informative dimensions of the document is not causal.

2.8. Following the document into the digital

This chapter has so far served to frame the “document” and its capacity to be informative as subjected to various interests and influences within the context wherein it is used. It has also been demonstrated that the transmission to digital format increases this variability. Despite shifts in attention within the fields of sociology and the humanities, indicated by notions such as the “material turn” and the “document turn”, the prevailing approach in the LIS area is to focus more on how information is retrieved and used with the help of digital technology. The actual transfer between the analogue and digital format, i.e. the intersection between document and digital technology, has attracted less interest. Still, as the previous sections have demonstrated, this transfer is significant for the characteristics of the digital resource. The question is: can the concept of document be framed in a way that allows it to be “in focus” through the process of digitisation?

The alternation between fixity and flexibility is an inherent characteristic of the document in the digitisation process. Seen as a binary formation, the document undergoes a series of conversions, i.e. from analogue to digitally formatted information during capture and from digital to analogue format during the representation of the information. Seen as a conceptual formation, these conversions cause the document to fluctuate back and forth between states of fixity and flexibility. The capacity of the source document is conditioned by the ways in which it is represented, the modes of representation, and the options of engagement offered to the users. The limitation that comes with modelling the transmission on a fixed relation between the document and its informative capacity has been discussed in previous sections. A study of the digitisation process and the documents involved will thus require a theoretical framework that is flexible enough to encompass these alterations while at the same time having the capacity to maintain the notion of the document throughout the transmission.

Boundary Object Theory (BOT) offers such a potential. The Boundary Object (BO) will, for the purpose of this thesis, be supplemented with the concepts of informative configuration and modes of representation, as introduced in sections 2.4 and 2.7 respectively. The next section will outline the role of the BO framework,

followed by its use in previous research within the field of LIS (at large) and, finally, how it will be applied within the present thesis.

2.8.1. Boundary object theory

BOT was introduced by Star & Griesemer as a result of a study carried out at Berkeley’s Museum of Vertebrate Zoology.275 The authors analysed the function of scientific objects (zoological specimens) and their associated metadata. They observed how different functions were assigned to these items depending on where in the organisation they were used: the museum curator saw the samples as a means of reinforcing the regional relevance of the museum collection, whereas the local hunter, who delivered the specimen, regarded it as an economic transaction, and the museum director saw the expansion of the collections as a way of strengthening the status of the institution. It was found that these objects, despite their heterogeneous functionalities, acted as a cohesive structure within the organisation, and also as tools of communication with communities outside the museum. This capacity is a key feature of BOs as they are:

plastic enough to adapt to local needs and the constraints of the several parties employing them, yet robust enough to maintain a common identity across sites. They are weakly structured in common use, and become strongly structured in individual site use.276

BOT builds on Actor Network Theory (ANT) and more specifically the concept of translation as employed in ANT.277 Translation is here a central feature and refers to the mobilisation of actors in order to reach agreements concerning, for example, scientific facts and discoveries. Agreements are necessary for getting work done and they reinforce positions of power. ANT describes this process as consisting of establishing an obligatory passage point on which the engagement of the groups of actors

275 S. L. Star & J. R. Griesemer, ‘Institutional Ecology, “Translations” and Boundary Objects: Amateurs and Professionals in Berkeley’s Museum of Vertebrate Zoology, 1907–39’, Social Studies of Science, vol. 19, 1989, pp. 387–420. 276 Star & Griesemer, ‘Institutional Ecology’, p. 393. 277 See M. Callon, ‘Some elements of a sociology of translation: domestication of the scallops and the fishermen of St Brieuc Bay’ in J. Law (ed.), Power, Action, and Belief: A New Sociology of Knowledge?, Routledge & Kegan Paul, 1986, pp. 196–233, and B. Latour, Science in action: how to follow scientists and engineers through society, Cambridge, MA, Harvard University Press, 1987.

is focused, thus creating a black box that enables a process of translation to run more or less automatically. Within BOT, the importance ascribed to the interaction between actors and systems is retained although a different perspective is adopted, underlining the collective aspect of this relation. The establishing of passage points is thus, in BOT, seen as an important aspect of the collaborative work process rather than as a means of establishing scientific “truths”. Star and Griesemer refer to the ANT approach, where groups are mobilised to support a unified, single view, as a “funnelling practice”278 and propose instead a “many-to-many”279 perspective:

The n-way nature of the interessement (or let us say, the challenge intersecting social worlds pose to the coherence of translations) cannot be understood from a single viewpoint.280

This multi-perspective is a key characteristic of BOT. Another central aspect is the “methods standardisation” that forms the base for the establishment of a BO. The concept of BO gains its applicability from the tension between the object and the boundaries it crosses. Standards are thus necessary characteristics of the object. Without a standardised object there will be no defined boundaries to cross.

As several passage points can be established in relation to one and the same resource, interpretative flexibility is maintained within the system where the resource is managed. BOT thus offers potential as a theoretical framework for a study of the digitisation process and, specifically, for addressing the fluctuations and the seemingly conflicting states in which documents are encountered as parts of an institutional infrastructure.

However, the concept of BO is not uncontested. Huvila, for instance, points to the BO as a tool of power, arguing that despite the neutrality and cooperative aspects ascribed to the BO by Star and Griesemer (as opposed to the focus on “fact stabilisation” as a driving force, which characterises ANT), the BO can still be utilised in the struggle between conflicting interests.281 There will always be groups that are in

278 Star & Griesemer, ‘Institutional Ecology’, p. 389. 279 ibid. 280 ibid., italics in original. 281 I. Huvila, ‘The Politics of Boundary Objects: Hegemonic Interventions and the Making of a Document’, Journal of the American Society for Information Science and Technology, vol. 62, no. 12, 2011, pp. 2528–2539.

a more advantageous negotiating position than others in determining the significance of the BO.

A critical distance to the concept was also expressed by Star herself. Reflecting on her original article from 1989, wherein the concept was introduced, and on the extensive application of BOT in a number of research directions, she poses the rhetorical question “What is NOT a Boundary Object?”, thereby addressing the risk of using the malleability of the concept as an excuse to uncritically apply it in any situation where objects are exposed to conflicting or competing interpretations.282

Taking Star’s lead, Lee observes that the concept of BO “has been used as a catchall for artifacts that fit uncomfortably within the definition”.283 In her view, the concept loses much of its strength as an analytical tool if it is not related to the other major component of the theoretical framework, i.e. the “methods standardisation”: the procedures and agreements that define the BO. As the concept of “methods standardisation” has a somewhat lesser intuitive impact than “boundary object” it has not been as readily adopted, but it is, nevertheless, crucial to the successful application of BOT. Lee further argues that standardisation will be more or less evident depending on the actual work process in question. In situations with a lower degree of regulation, the relevance of the BO might therefore be questioned.

2.8.2. Boundary objects in LIS and document studies

BOT is primarily used within organisation management and computer supported cooperative work. However, given the ambiguous nature of the concept of document (as discussed initially in this chapter) it is not surprising to find that BOT has been applied in several studies within (or with a close affinity to) the field of LIS.

Davies and McKenzie studied the role of documentary practice in two professional settings: a theatre production and a midwifery clinic. They argue that activities at both workplaces were characterised by distinct textual dynamics, based on the prompt book at the theatre and on the antenatal records at the clinic. These two

282 S. L. Star, ‘This is Not a Boundary Object: Reflections on the Origin of a Concept’, Science, Technology, & Human Values, vol. 35, no. 5, 2010, p. 612. 283 C. P. Lee, ‘Boundary Negotiating Artifacts: Unbinding the Routine of Boundary Objects and Embracing Chaos in Collaborative Work’, Computer Supported Cooperative Work (CSCW), vol. 16, no. 3, 2007, p. 308.

types of documents served as BOs in each setting by providing a timeline for activities (involving different types of professional staff) leading up to what the authors refer to as “the opening night”: the first public performance of the theatre play and the moment of birth, respectively.284

Yeo addresses the problem of the fuzzy boundary of the record concept as he combines prototype theory285 (based on the assumption that conceptual categories usually have prototypes or “mental mappings of typical features”286) with the concept of BO, as outlined by Star and Griesemer. Yeo demonstrates that the tension between the archival record seen, on the one hand, as a “prototype” and, on the other, as an “information product”, can be resolved by using the notion of BO, as it allows one and the same document to be interpreted in different ways.

Winget, in addressing the annotation practices of musicians, argues that the musical score functions as an action-oriented boundary object.287 The annotations represent the interaction, i.e. individual interpretations of the score that are shared within the ensemble or orchestra. The results of her work provide a framework for the study of annotations of primary sources, as well as potential for further research into human-information interaction.

The communicative practices of the emergency room involve different interests based on the professional communities involved, e.g. nurses and doctors. Østerlund uses BOT in his study of the methods used to track emergency patients. His focus is placed on the consequences for practical day-to-day work of moving patient management from a whiteboard to a computer-based application.288

284 E. Davies & P. J. McKenzie, ‘Preparing for opening night: temporal boundary objects in textually-mediated professional practice’, Information Research, vol. 10, no. 1, 2004. 285 As presented in E. Rosch, ‘Principles of Categorization’, in E. Rosch & B. B. Lloyd (eds.), Cognition and Categorization, Hillsdale, N.J.: Lawrence Erlbaum, 1978. 286 G. Yeo, ‘Concepts of Record (2): Prototypes and Boundary Objects’, The American Archivist, vol. 71, 2008, p. 119. 287 M. A. Winget, ‘Annotations on Musical Scores by Performing Musicians: Collaborative Models, Interactive Methods, and Music Digital Library Tool Development’, Journal of the American Society for Information Science and Technology, vol. 59, no. 12, 2008, pp. 1878–1897. 288 C. Østerlund, ‘The Materiality of Communicative Practices’, Scandinavian Journal of Information Systems, vol. 20, no. 1, 2008, pp. 7–40.

Conway draws on BOT and its application within the archival context, as discussed by Yeo,289 in his analysis of the use of digital reproductions of archival photographs.290 He uses the concept of “Modes of seeing” to explain how the material, visual and archival properties are valued differently by the users of digital collections of photography.

Østerlund & Crowston study documentary practices within online communities, comparing how the state of knowledge in the user groups (whether symmetrically or asymmetrically distributed) affects the characteristics of the documents exchanged.291

Huvila uses the concept of BO to demonstrate how documents, in this case archaeological reports, can be used as means of exercising control and maintaining power both within communities and in relation to external groups.292 In a subsequent paper he investigates the documentary boundary object and how the process of attribution and authorship affects its function.293

Of these authors, only Conway294 applies BO in the analysis of the implications of digital transmission – in his case the digitisation of photographs.295 As argued initially, the concept of BO sits comfortably within the field of document theory. The application in the present thesis differs slightly from these examples. BOT will here be used to frame the document in a process that not only exposes it to different interests but also reconfigures it on a binary as well as a conceptual level.

We will now continue with a discussion on the application of BOT in the present thesis.

289 Yeo, ‘Concepts of Record (2): Prototypes and Boundary Objects’. 290 P. Conway, ‘Modes of Seeing: Digitized Photographic Archives and the Experienced User’, The American Archivist, vol. 73, fall/winter 2010, pp. 425–462. 291 C. Østerlund & K. Crowston, ‘Boundary-spanning documents in online communities’, International Conference on Information Systems 2011, Shanghai, 2011. 292 Huvila, ‘The Politics of Boundary Objects’. 293 I. Huvila, ‘Authorship and documentary boundary objects’, The 45th Hawaii International Conference on System Sciences (HICSS), Maui, IEEE Computer Society, 2012, pp. 1636–1645. 294 Conway, ‘Modes of Seeing’. 295 His research was discussed in section 1.3, p. 34.

2.8.3. The digital resource as a boundary object

As we saw in the previous section, the concept of BO is applicable in several respects to documents as artefacts – objects that carry inscriptions with the purpose of gathering, storing, and disseminating information. Once transformed to digital format, the document is less distinct as an entity. This affects both its characteristics as a provider of information and the borders it might cross.

In section 2.5 it was argued that the digitisation process can be described as consisting of two interlinked conversions – from analogue to digital format during capture and from digital to analogue format during representation. The reconfiguration caused by these changes in format is a striking aspect of digitisation and a feature that distinguishes it from other methods of reproduction. In addition, it was observed that the conversion involves two “levels”: the bit level (the document as binary code) and the conceptual level (the document as constituted by the three informative dimensions; see section 2.4, p. 72). The BO framework accommodates these fluctuations of the document between states of flexibility (while in a distributed digital format) and fixity (the stabilising procedure whereby digitally formatted information is engaged as analogue manifestations).
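The two interlinked conversions can be loosely illustrated by ordinary text encoding. This is an analogy only – plain UTF-8 round-tripping, not a description of any actual digitisation workflow – but it shows how one and the same inscription exists as distributed binary code at the bit level and is re-engaged as a perceivable form:

```python
# Analogy only: UTF-8 round-tripping as a stand-in for the two interlinked
# conversions. "Capture" turns an inscription into binary code (the bit
# level); "representation" re-engages that code as a perceivable form.
inscription = "boundary object"

captured = inscription.encode("utf-8")   # analogue -> digital: the distributed state
print(type(captured), len(captured))     # the document as binary code

rendered = captured.decode("utf-8")      # digital -> analogue: fixity re-established
assert rendered == inscription           # the same inscription, re-presented
```

The analogy is deliberately minimal; its only point is that the captured code itself does not dictate a single representation, which is the flexibility the BO framework is used to accommodate.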

The notion of informative configuration was introduced in reference to the capacity of a document to be informative given the specific interest of a user. It was argued that the concept can be used as an analytical tool also in relation to engagement in the digital representation of a document. The concept of passage points, as defined in boundary object theory, is here applied in reference to the engagement of the users of digital resources – to how they engage with the informative dimensions of the document in order to establish the required informative configuration. Passage points can thus be used in reference to a wide spectrum of engagement, and the concept allows for several alternative, non-exclusive informative configurations to be established in relation to a digital resource. In this way, the digital resource, or boundary object, can be seen as crossing different borders depending on the interest in a given situation of use. The digital resource is reconfigured during transmission and can, as accessed by a user, be represented in several different formats or modes of representation.

The user’s options in establishing an informative configuration according to her interests are, to a large extent, conditioned by the particulars of the digitisation process and the modes of representation enabled. In section 2.5 the digital transmission was described in terms of a process of stabilisation whereby a digital resource is shaped according to specific interests or ideals. This shaping, into a mode of representation, will frame the alternatives available to the user.

The notion of standardisation is accentuated here, as the transmission process from analogue to digital format and back to analogue requires a substantial amount of control. Several of the technical parameters in the digitisation process are covered by internationally agreed standards, and the digital format itself can be seen as imposing a degree of regulation. Standardisation is, in other words, an intrinsic aspect of the digital document. At the same time, we find that the digital resource is an ambiguous entity where different aspects of a source document can be emphasised or subdued according to the ideals and interests of the producer. This is in stark contrast to the document engaged in a printed codex (or any other “physical” format), where the representation has already been defined through the features and functionality of the material artefact.

The modes of representation can thus be seen as the results of activities that are standardised according to specific interests and ideals. References to a document as a BO moving between different interest groups can, in the digital domain, be equivalent to a digital resource being presented as visual representation, transcription and metadata – the three primary modes of representation of text-based documents in digital format.296 Following the framework of BOT, these modes can be regarded as the boundaries that the digital resource traverses as a result of the digitisation process. However, the notion of modes of representation also points to a significant difference between the document IRL (e.g. a codex, to choose one of several possibilities) and the document in digital format. The book (the material artefact) can be seen as moving between different groups, maintaining its distinctiveness as an artefact despite the varying focus of different groups of users. Once in digital format, however, the very same book gains the capacity to be represented in different ways. These modes of representation are internalised as part of the digital resource. It is no longer a matter of crossing boundaries as much as establishing them. Digital technology also introduces an additional defining dimension. The capacity of digital technology to focus on specific aspects of the document, or to apply a wide,

296 As discussed in relation to the empirical material of this study, see chapters 4 and 5.

inclusive focus on the document as a whole, means that the boundaries between these different modes of representation are ambiguous and exposed to negotiation based on institutional preferences and the interests of users. In this way, the modes represent a range of features that might contribute to, or counteract, the capacity of the document to be informative, i.e. the options of a user to establish a required informative configuration. Lee suggests the concept of “Boundary Negotiating Artifact” to describe objects that, rather than crossing boundaries, actually gain their capacity by relocating borders:

I suggest that artifacts can serve to establish and destabilize protocols themselves and that artifacts can be used to push boundaries rather than merely sailing across them.297

The document in digital format should therefore not be regarded as a static entity that gains relevance as it moves between the different interests posed by its users. Rather, it is a highly dynamic phenomenon whose capacity is based on negotiations between the users’ interests and the characteristics of the process of production.

The somewhat explorative application of the concept of BO in the present study seems to be encouraged by Star, who emphasised that it should not be regarded as a fixed concept but rather that there is large potential for further development.298 In an interview in 2008 Star returns to this issue:

The original article was meant to start a kind of catalog of some of the characteristics of boundary objects, but those were only some of the forms that it could take. It wasn’t meant to be exhaustive in any sense. As things have evolved, most people haven’t even attended to the differences that were there in the original article.299

The application of BOT, supplemented by the concepts of informative configurations and modes of representation, is intended to provide a contribution in this direction.

297 Lee, ‘Boundary Negotiating Artifacts’, p. 307. 298 Star, ‘This is Not a Boundary Object’. 299 M. Zachry, ‘An Interview with Susan Leigh Star’, Technical Communication Quarterly, vol. 17, no. 4, 2008, p. 453.

2.9. Summary

The increasing pervasiveness of digital technology within the heritage sector, as a means of preserving and disseminating collections and objects, has fostered a seemingly simplified understanding of what digital transmission actually implies. Supported by conceptualisations based on the conduit metaphor of communication, the digitisation process is frequently envisaged as a comparatively uncomplicated operation.300 Such a view of the process might restrict our understanding of the capacity of the digital resource.

This chapter has covered much ground and laid out the theoretical foundation of this thesis. It has addressed the document in terms of conceptual ramifications, methods of production and transmission. It has also examined the relation between the document and information, i.e. the capacity of a document to be informative. It was further argued that digital technology affects both the document and the notion of information by displacing the relation between these two phenomena. Two central concepts were introduced to manage this displacement and its consequences. Informative configuration refers to the way in which the user engages in the document or its digital representation given a specific interest. Modes of representation refers to the way in which the digital resource is represented as engaged with by the user.

BOT offers a framework that enables a focus on the document in its states of fixity and flexibility. The concept of BO is useful in addressing the digital resource as it can be seen as crossing (or even relocating) borders established by different user interests, e.g. accessing texts, material features and metadata in relation to a source document.

As has been demonstrated, there is no causal relation between the source document and its representation, i.e. a text can be represented both as a visual representation and as a transcription. A visual representation of a document can convey its material features and bibliographical identity as well as the inscriptions it carries. From this it follows that the digitisation process cannot be framed and defined using the

300 See section 2.6, p. 81.

references that apply to non-digital methods of reproduction. On a binary level, it is possible to define technical parameters and metadata that enable the management of the digital resource. On a conceptual level, things are more complicated. Where is the document? How does it relate to other documents, to the collection, to the holding institution?

The digitisation process represents the relation between the source document and its digital reproduction, a relation that is often regarded as trivial but whose consequences, as we have seen in this chapter, have implications for how the digital resource can be used and interpreted.

3. METHODOLOGY

An inquiry into the digitisation process and the capacity of digital reproduction has to consider two aspects. There are the procedures involved in the transmission and the underpinning assumptions regarding the information potential of the documents involved. We also have to address the factors that condition the engagement of the users in the digital reproduction. This thesis is consequently based on two empirical studies in addition to the analysis in chapter 2 concerning information and document theory.

This chapter will outline the research design and methods used in the two empirical studies: The digitisation process (chapter 4) and The capacity of the document in digital format (chapter 5). The sampling and selection procedures will be described and the empirical material will be introduced. The empirical material is made up of the guidelines that form the basis for the analysis of the digitisation process (section 3.1) and interviews with respondents whose accounts are analysed as part of the study of the capacity of digital resources (section 3.2). The methodological approach is outlined in section 3.3, and a detailed description of the analytical procedures will be given in conjunction with their respective presentation in subsequent chapters.

3.1. The digitisation process

The first study, whose findings are presented in chapter 4, identifies the components that make up the digitisation process and describes their interrelation. This is achieved by analysing a selection of normative guidelines, standardised procedures and recommendations relating to the design and management of digitisation processes.301 As discussed in the previous chapter, the concept of Boundary object will be used as a framework in this thesis. The digital resource is here seen as a Boundary object. The boundaries it relates to consist of the modes of representation in which

301 The documents within the sample will hereafter be referred to in short as the guidelines.

the digital resource is presented (as discussed in section 2.7). The capacity of the digital resource to move between different modes of representation resides in the unstructured state that is caused by the deconstruction that follows from the conversion from analogue to digital format.302

The products of digitisation processes are characterised by the technical parameters that guide the digital conversion, parameters that, in turn, are based on institutional ideals and assumptions regarding the information potential of documents in analogue and digital format. Knowledge of the particulars of the digitisation process will thus facilitate understanding of the capacity of the digital document. With the digital resource seen as a BO, it is also relevant to investigate the relation between the actual process and the institutional framework wherein the transmission is performed. A study of guidelines will reveal both the parameters of the process and possible variations in ideals and interests, as explicitly declared by the producers of these guidelines or observed in the analysis of their statements.

The study results in a conceptual model that outlines the general structure of the digitisation process. The study also positions the digital resource as a BO, characterised, on the one hand, by the different interests and ideals posed by the institutions producing these guidelines and, on the other, by the standardising process of its creation. The findings of the study are discussed in chapter 5 and will provide a base for the study of the factors that condition the informative capacity of the digital resource, which is the topic of the second empirical study (chapter 5).

3.1.1. The guidelines

A multitude of process mappings, flowcharts and workflow models relating to the digitisation process have been published and made available both on the web and as printed documents. In order to select the empirical material, a series of literature searches were carried out using the following key words:

digitisation, digitization, ”digiti* guidelines”, ”digiti* workflow”, ”digital preservation”, ”digital conversion”, ”digital reformatting”, ”digital surrogacy”

Table 6. Keywords used in the literature search.
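For readers who wish to replay the search, the keyword set can be URL-encoded the way a web search engine expects its query parameter. The sketch below is a hypothetical illustration only: the parameter name `q` and the idea of scripting the search are assumptions, as the searches in the study were carried out through the engines’ ordinary interfaces.

```python
from urllib.parse import urlencode

# Hypothetical illustration: the Table 6 keywords prepared as query strings.
# The parameter name "q" is an assumption; the study's searches were manual.
keywords = [
    'digitisation', 'digitization', '"digiti* guidelines"',
    '"digiti* workflow"', '"digital preservation"', '"digital conversion"',
    '"digital reformatting"', '"digital surrogacy"',
]

for kw in keywords:
    # Quotation marks survive percent-encoding, so exact-phrase searches
    # are preserved when the string is pasted into a URL.
    print(urlencode({"q": kw}))
```

The encoding step matters only for the quoted phrases; the single-word keywords pass through unchanged.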

302 A central characteristic of the process of conversion. See the discussion on the distributed state of the digital resource in section 2.3.2.

The web searches relied primarily on the Google search engine, although the Bing search engine was used occasionally to double-check results.303 Literature searches were also carried out via Libris, the joint catalogue of the Swedish academic and research libraries,304 using the same set of key words.

The majority of the documents retrieved in the literature search were project reports and process descriptions produced by digitising library institutions and other organisations within the heritage and educational sector.

A preliminary examination of the retrieved documents revealed that they could broadly be divided into two groups:

• Normative descriptions – e.g. guidelines, standards, and recommendations regarding how processes should be designed and managed.

• “De facto” descriptions – e.g. project reports, documentation, and summaries describing how projects had been designed and carried out.

From the results, documents with a normative approach – e.g. of the type “Guidelines”, “Good practices”, and “Handbook” – were selected. The purpose of this criterion was to enable the analysis to be based on documents with a prescriptive character rather than on descriptions of how processes “de facto” had been carried out.

Another criterion for inclusion in the study was that the documentation covered the transmission in its entirety – from the document being selected for conversion to its being represented as a digital surrogate – as the purpose of the study was to examine the digitisation process as a complete sequence.
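The two inclusion criteria amount to a simple conjunctive filter over the retrieved documents. The sketch below is illustrative only; the attribute names are invented stand-ins for judgements that were in practice made by reading each document:

```python
from dataclasses import dataclass

# Illustrative sketch of the two inclusion criteria; the boolean fields are
# hypothetical encodings of judgements made while reading each document.
@dataclass
class Retrieved:
    title: str
    normative: bool            # guideline / good practice / handbook?
    covers_full_process: bool  # from selection through presentation?

def include(doc: Retrieved) -> bool:
    # Both criteria must hold for a document to enter the sample.
    return doc.normative and doc.covers_full_process

docs = [
    Retrieved("Good Practices Handbook", True, True),
    Retrieved("Scanning project final report", False, True),
    Retrieved("Image capture benchmark", True, False),
]
print([d.title for d in docs if include(d)])  # only the first passes both criteria
```

The conjunction explains why, for instance, detailed capture manuals fall outside the sample despite being normative: they do not cover the process end to end.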

The guidelines are here presented in chronological order, which represents only one of several possible ways to systematise the sample. Alternatively, focus could have been placed on type of institutional affiliation or intended audience. All guidelines, apart from the two web resources, are (or have been, as some are out of print) published as printed publications. The inclusion below of a URL for download does not indicate that the document is unavailable in printed format.

The following thirteen guidelines were included in the study:305

303 Google: Bing: The first approximately 100 hits generated by each key word were examined for documents of relevance. 304 . 305 In order to distinguish the sample of guidelines from the other literary sources in this thesis, the guidelines will be referred to by institution or author(s) and year of publication enclosed by square brackets. The web-based guidelines [Cornell, 2003] and [JISC], as well as the downloaded pdf file [NEDCC, 2000], lacked pagination. Quotes from these sources will be referenced based on chapters and sections.

[RLG, 2000] – Kenney, A. R., & Rieger, O. Y., Moving theory into practice: digital imaging for libraries and archives. Mountain View, Calif.: Research Libraries Group, 2000. 189 p.
[NEDCC, 2000] – Sitts, M. K. (ed.), Handbook for Digital Projects: A management tool for preservation and access. Andover, Massachusetts: Northeast Document Conservation Center, 2000. Download: https://www.nedcc.org, pdf not paginated.
[NPO, 2001] – Youngs, K., Managing the Digitisation of Library, Archive and Museum Materials. National Preservation Office, Preservation Guidance, Preservation Management Series, 2001. 24 p. Download: http://www.collectionstrust.org.uk.
[IFLA, 2002] – Guidelines for digitization projects for collections and holdings in the public domain, particularly those held by libraries and archives. International Federation of Library Associations and Institutions, 2002. 59 p. Download: http://www.ifla.org.
[Cornell, 2003] – Cornell University Library, Moving Theory into Practice, Digital Imaging Tutorial, 2003. URL: https://www.library.cornell.edu/preservation/tutorial/contents.html
[Hughes, 2004] – Hughes, L. M., Digitizing collections: strategic issues for the information manager. London: Facet, 2004. 327 p.
[Minerva, 2004] – Good Practices Handbook, version 1.3. Minerva Working Group 6, 2004. 60 p. Download: http://www.minervaeurope.org/
[Zhang & Gourley, 2008] – Zhang, A. B., & Gourley, D., Creating digital collections: a practical guide. Oxford: Chandos, 2008. 234 p.
[FADGI, 2009] – Digitization Activities, Project Planning and Management Outline, Version 1.0. Federal Agencies Digitization Guidelines Initiative, 2009. 23 p. Download: www.digitizationguidelines.gov.

[Bülow & Ahmon, 2011] – Bülow, A. E., & Ahmon, J., Preparing Collections for Digitization. London: Facet, in association with the National Archives, 2011. 184 p.
[DFG, 2013] – Practical Guidelines on Digitisation. Scientific Library Services and Information Systems (LIS), Deutsche Forschungsgemeinschaft, 2013. 70 p. Download: http://www.dfg.de/
[JISC] – JISC Digital Media, Infokit: Still image digitisation. Joint Information Systems Committee, n.d. URL: www.jiscdigitalmedia.ac.uk/.
[IFLA, 2014] – Guidelines for planning the Digitization of Rare Book and Manuscript Collections. International Federation of Library Associations and Institutions, 2014. 19 p. Download: http://www.ifla.org.

Table 7. The documents in the sample.

The guidelines are given a brief presentation below:

[RLG, 2000] – Kenney & Rieger, Moving theory into practice

A pioneering introduction to digitisation published by the Research Libraries Group. The handbook is designed as a “self-help” reference to aid libraries and archives in their digitisation efforts. The authors adopt an integrated approach to the digitisation process, ranging from the decision to digitise a specific document or collection to the actual presentation of the digital resource. It is based on the concept of the “management wheel of digital imaging”, which refers to the highly interdependent nature of the activities involved in the process; every decision within the process affects its outcome. The aim of the handbook is explicitly stated as fostering critical thinking in relation to the application of digital technology in order to develop institution-specific procedures and applications. It is argued that “… the best way to ensure good decisions is to become a knowledgeable consumer of the technology.”306

306 [RLG, 2000, p. 3]

[NEDCC, 2000] – Sitts (ed.), Handbook for digital projects

This publication draws on the experiences of the conference series “School for scanning”, organised by the Northeast Document Conservation Center since 1996. The organisation has functioned as an independent paper and book conservation studio since the early 1970s and has also offered advice, training, and services within the field of microfilm surrogacy. The handbook is written by the specialists who were part of the conference and covers different aspects of the digitisation process and related issues, offering “information on best practices and summarizes lesson learned from many experiences”.307 Applying a management perspective, the handbook provides advice based on practical experience. The target groups include professionals within the whole range of institutions in the heritage sector.

[NPO, 2001] – Youngs, Managing the Digitisation of Library, Archive and Museum Materials

The National Preservation Office, UK, was jointly financed by a number of research libraries and run as a separate organisation until 2009, when it was integrated with the British Library Collection Care Centre and renamed the Preservation Advisory Centre. The purpose of the operations was to provide an independent resource in the field of preservation and access in relation to library and archive material in collections in the UK and Ireland. The leaflet, originally distributed both in printed format and as a PDF, was part of the “NPO Preservation Guidance, Preservation Management Series”. Its focus was on issues relating to the management of processes, projects, and image archives. The publication represents an effort to create a very concise, condensed introduction to the field of digitisation, aimed at managers and staff who face the challenge of planning and managing a digitisation project.

307 [NEDCC, 2000, chapter I]

[IFLA, 2002] – Guidelines for Digitization projects

The guidelines were produced jointly by IFLA, ICA308 and UNESCO309. In addition to providing guidance to the library and archival community at large, there is an explicit intention to meet the needs of institutions in countries in the developing world. The guidelines cover the digitisation of library and archive collections, referred to as “the paper based heritage”,310 and offer a compilation of best practices based on already available information. The working group behind these guidelines comments that there are plenty of websites and publications available, but few that address the conditions that characterise the situation in the developing world.

[Cornell, 2003] – Digital Imaging Tutorial

Cornell University offers a web tutorial that takes the user step by step through the digitisation process as it relates to text- and image-based collections in heritage institutions. The aim is to provide “base-level information on the use of digital imaging to convert and make accessible cultural heritage materials”.311 Its focus is on a “best practice approach”, describing practical procedures that will assist in maintaining required quality levels throughout the process. The tutorial is referred to as a companion to [RLG, 2000].

[Hughes, 2004] – Digitizing collections

In this publication the digitisation process is addressed from a strategic and managerial point of view, emphasising the relation between long-term value and the required financial investment: ”The main factor in determining the right time to digitize is, ultimately, cost”.312 It covers the entire process of digital conversion with

308 International Council on Archives, . 309 The United Nations Educational, Scientific and Cultural Organization, . 310 [IFLA, 2002, p. 5] 311 [Cornell, 2003, preface] 312 [Hughes, 2004, p. 4]

a focus on the cost-benefit aspect of running such projects in the library sector. It draws attention to the fact that expenditure in relation to digitisation processes often represents a new range of activities that should be evaluated on their own merits, not in relation to the economics of already existing operations in the organisation.313

[Minerva, 2004] – Good Practices Handbook

Minerva was an EU initiative with the purpose of harmonising digitisation activities among the member states by establishing a common platform for the digitisation of cultural heritage collections across Europe. The initiative was taken as a result of the Lund principles agreement in 2001, which produced several guidelines and recommendations.314 As such, these guidelines do not represent a profession-driven interest, nor an institutional effort, but rather a political endeavour centred on the “unique value of European cultural heritage” as well as its relation to the growing “digital content industry” and the need to “increase the level of synthesis and synergy between and among digitisation initiatives”.315 Although its scope is described as far-reaching, the document’s aim is to provide practical rule-of-thumb advice to be used by the “hands-on project team” involved in digitisation projects within the heritage sector. A number of EU working groups have contributed to the document, which is structured according to a “life-cycle” approach to the digitisation process, thus describing a progression of interlinked activities.316

[Zhang & Gourley, 2008] – Creating digital collections

This guideline is part of a series of books aimed at the "busy information professional" in order to provide "an authoritative view of current thinking".317 A significant part of the book is devoted to Digital Content Management Systems; their

313 This comment can be seen as a reminder of the general state of digital development within the library sector at the time of publication.
314 See .
315 [Minerva, 2004, p. 9] The web portal Europeana can be seen as an explicit result of this ambition to harmonise a pan-European approach to the idea of cultural heritage. http://www.europeana.eu/
316 [Minerva, 2004, p. 9]
317 [Zhang & Gourley, 2008, editorial introduction]

implementation, management, and evaluation, as well as suitable software solutions. The intention is to offer practical advice on the creation of digital resources within the library and archive sector. The introduction emphasises that this is achieved by giving "real examples", "useful tips", examples of "best practices" and "case studies" rather than drawing on current research and theory.318

[FADGI, 2009] – Digitization Activities

The FADGI consists of US federal organisations that are involved in the production, archiving and dissemination of digital resources. Several of the contributing member institutions have a long track record of advocacy within the field and have published guidelines previously, e.g. the Library of Congress, the National Archives and Records Administration, and the Smithsonian Institution. The content is also based on the analysis of management documentation from a number of library and archival institutions worldwide. The guideline is one within a set of resources for the digitisation of historical content, covering two major areas: Still Image and Audio-Visual digitisation within the heritage sector.

[Bülow & Ahmon, 2011] – Preparing Collections for Digitization

The handbook, published by the UK National Archives, has an obvious focus on archival collections although the guidance is generally applicable to the digitisation of handwritten or printed collections regardless of institutional type. The title indicates the scope, i.e. the procedures and measures taken in order to enable source documents to be safely digitised without risking their condition. The main author, Bülow, is a conservator and this preservation perspective on digitisation is maintained throughout the handbook. This also positions the publication somewhere in-between the guideline and the specialist manual. However, the target reader is indicated to be a person responsible for the preservation of the collection, and therefore referred to as the collection manager, regardless of professional affiliation.

318 [Zhang & Gourley, 2008, p. xiii]

[DFG, 2013] – Practical Guidelines on Digitisation

The German Research Foundation is a major German foundation that, under its library sector programme, provides financial support for digitisation projects directed at collections of interest to the research community. The guidelines are intended to define the standards expected from applicants as well as to encourage the interoperability of digitisation projects and the dissemination of their products. Two basic premises underlie their funding: that digital resources are made accessible free of charge to scientists and academics, and that arrangements are made to maintain long-term sustainability.

[JISC] – Infokit: Still image digitisation

JISC Digital Media is a web-based resource aimed at supporting and encouraging the use of digital media within the UK education sector. The website offers a large number of guides, "toolkits", and "infokits" that cover various aspects of the production, management, and dissemination of digital resources. There is the option of browsing these sections or following compilations arranged according to specific topics, types of media – e.g. video, audio, still images – or stages in the production process. The guidance on still image digitisation is intended to assist the user in the planning and implementation of digitisation processes.

[IFLA, 2014] – Guidelines for planning the Digitization

The 2014 IFLA guidelines were produced by the IFLA Rare Books and Manuscripts Section with the declared intention to address the digitisation of rare and unique materials and non-print formats that require other approaches than those enabled by mass digitisation processes. The guidelines are declared to cover the conceptual planning of such projects rather than the technical aspects of the process.

The producers of the documents in this sample had in common an explicit mission within the library setting, either as regulators or as providers of guidance or training in the planning and management of digitisation projects. They can be grouped as follows:

Organisations
• North East Document Conservation Center – [NEDCC, 2000]
• Research Libraries Group – [RLG, 2000]
• National Preservation Office UK – [NPO, 2001]
• International Federation of Library Associations – [IFLA, 2002], [IFLA, 2014]
• European Union/the Minerva Project – [Minerva, 2004]
• Deutsche Forschungsgemeinschaft – [DFG, 2013]
• Joint Information Systems Committee – [JISC]

Institutions
• Cornell University Library – [Cornell, 2003]
• Federal Agencies Digitization Guidelines Initiative – [FADGI, 2009]
• National Archives, UK – [Bülow & Ahmon, 2011]

Authors/Publishers with specific profiles
• Facet Publishing/Digital Futures Series – [Hughes, 2004]
• Chandos/Chandos Information Professional Series – [Zhang & Gourley, 2008]

3.1.2. Comments on the empirical material

Using documents as a primary source of information in research might involve a great variety of types and genres. In order to ensure the qualitative consistency of such information sources as empirical material, Scott defines four criteria that should be met:319
• Authenticity – is the document genuine?
• Credibility – is the content genuine?
• Representativeness – is the document representative?

319 Scott, A Matter of Record.

• Meaning – does the interpretation of the content make sense (in reference to the intended, received and internal meaning of the content)?

Scott’s criteria were used to assess the present sample of guidelines:

Authenticity

Authenticity is a potentially problematic criterion in relation to documents distributed in digital format. Using digital resources always involves an element of trust. Scott refers to procedures related to establishing differences between originals and copies in order to trace corrupted versions and to define the "soundness" of the document. Another part of establishing authenticity is to identify the producer of the document. The guidelines in the sample are produced either by well-established organisations/institutions or by known publishers and are accessible, either in print, as downloads or as web resources, from these producers/publishers. As they are related to authoritative bodies within the field of digitisation (and are referenced in independent publications) it can be assumed that these documents fulfil the first of Scott's criteria.

Credibility

The credibility of the documents can be established on the basis of the status of the publishing organisation and the references made to complementary sources of information, e.g. standards, collaborations and cooperative activities, and research. The genuineness of the guidelines can thus be regarded as high, as they communicate and convey their information according to the operational characteristics of their respective producer.

Representativeness

The issue of representativeness cannot be disputed, as the documents are selected on the basis of being guidelines and recommendations, i.e. the category under focus in this study. It can be noted that the sample is rather restricted in size and spans a relatively extensive time period. Close to the notion of representativeness lies the question of relevance; the grounds on which it can be assumed that the sample represents sources with influence and impact. A way of estimating this would have been to carry out a citation analysis, e.g. by examining project reports and institutional documentation for references to the guidelines in the sample. Such an analysis was not

carried out. However, a citation check performed via Google Scholar320 revealed 150 citations for [RLG, 2000], 82 for [Hughes, 2004], and 66 for [NEDCC, 2000]. The remaining guidelines have between 0 and 12 citations.

Meaning

The guidelines were produced in English and their content was in most cases clearly conveyed. In instances where I was in doubt regarding the meaning of details in the documents, e.g. technical descriptions or references to standards or procedures beyond my personal knowledge, it was possible to verify intent by using external references.

However, the selection of documents included in this study is not unproblematic. There are publications with a high impact factor that have not been included in the sample, e.g. Deegan and Tanner321, Terras322, and Schreibman et al.323 The reason behind this exclusion is that these sources, although covering the digitisation process, apply a critical and analytical perspective rather than the primarily normative approach that characterises the guidelines in the sample.

With regard to guidelines and recommendations as a genre, it can be assumed that searching through mainstream and freely available channels, such as Google Books and Libris, will identify the relevant resources. There are of course a multitude of articles published in academic and research journals that cover various aspects of the process in depth. However, normative and prescriptive guidelines referring to the process in its entirety would not be expected to appear as articles in such journals.

It can be argued that this sampling procedure will bypass relevant information, such as research articles and similar texts that are published in scholarly journals and thus not

320 , the citation check was performed in March 2014.
321 M. Deegan & S. Tanner, Digital futures: strategies for the information age, New York, Neal-Schuman, 2002, and M. Deegan & S. Tanner, Digital preservation, London, Facet, 2006.
322 Terras, Digital images for the information professional.
323 S. Schreibman, R. Siemens & J. Unsworth, A Companion to Digital Humanities (Blackwell Companions to Literature and Culture), Oxford, Blackwell Publishing Professional, 2004. <http://www.digitalhumanities.org/companion/>

necessarily indexed in the Swedish union catalogue – Libris – nor accessible via Google. The significance of such an omission is difficult to estimate. It could be argued that this approach is actually more in line with how information seeking outside of the scholarly community is performed, thus resulting in a more realistic sample, with respect to impact on the design of the digitisation process, than might have been the case if sampling had included academic databases.

The empirical material consists of guidelines that are part of the corpus of literature that has influenced, and to some extent probably even shaped, the development of digitisation processes within the library sector since the mid-1990s. The normative documents included in this study can, according to the typology suggested by Scott, be classified as official and open-published, as they are produced by official organisations with the intention of free dissemination and unrestricted use in order to communicate their content.324 The documents analysed span the period from 2000 – [RLG, 2000] and [NEDCC, 2000] – until today – [JISC] (a continuously updated web site). The selection can thus be regarded as representative in a chronological perspective. Besser confirms the significance of this time period by referring to 1995 as the starting point for the large-scale financing of digital library research in the US.325 This is also the year when the North East Document Conservation Center initiated the "School for Scanning" program, upon which the handbook [NEDCC, 2000] is based. A review of the bibliography in Bailey verifies that the number of publications in the English language relating to the field of digitisation demonstrated a significant increase from 1995 (see appendix 6).326

The guidelines demonstrated large variations in terminology, scope and delimitation. Given that these publications were produced in different institutional and organisational settings, and thus aimed at a wide range of target groups, such variation was anticipated.

It can be argued that the predominance of web-based sources might result in a biased sample but, as demonstrated, a majority of the guidelines accessed digitally are (or have been) also published in printed format. However, a bias of more acute concern is the fact that the selection is restricted to documents produced in English.

324 Scott, A Matter of Record.
325 H. Besser, ‘The Next Stage: Moving from Isolated Digital Collections to Interoperable Digital Libraries’, in First Monday, vol. 7, no. 6, 2002.
326 C. W. Bailey, Digital Curation and Preservation Bibliography, Houston, TX, Digital Scholarship, 2012.

The reason for this is the limited linguistic capacity of the author of this thesis. The findings in this study will be presented in chapter 4.

3.2. The capacity of documents in digital format

In the second study, the perspective shifts from the process towards the user, as it examines how the product of the digitisation process, the reproduction, is engaged with as a source of information. The empirical material in this study consists of interviews with researchers who, in various ways, rely on digital resources (in this instance reproductions of text-carrying documents) in their research.

Chapter 2 introduced the concept of informative configuration as an analytical tool to describe the interpretative nature of the users' engagement with the document as rendered in digital format. Given that this group of researchers rely on digital reproductions as a replacement for and/or complement to direct access to source documents, it can be assumed that the informative dimensions characterising the capacity of the source document (see section 2.4) can also be used to describe the capacity of the digital representation of the document.

The notion of informative configuration will consequently be re-used in this study. Given the framework provided by BOT, the informative configuration will be seen as a passage point.327 BOT emphasises the multitude of passage points that can be applied to a boundary object. With the modes of representation defining the boundaries that frame the digital resource, this multitude is clearly a key characteristic of the digital resource. It will be demonstrated in the interview study that these boundaries are not rigid but cover a range of aspects and are influenced by factors such as alternative types of metadata schema, various types of granularity of OCR processing and levels of encoding, and differences in visual quality and in the detail of images.

327 The application of Boundary object theory, and particularly the concept of passage points, is discussed in sections 2.8.1 (p. 101) and 2.8.3 (p. 106).

3.2.1. The informants

The category of informants was restricted to professional users, i.e. users who on a regular basis accessed digital reproductions of text-carrying documents in their working situation. This restriction was motivated by the wish to interview users with significant experience of working with digital resources. The assumption is that such users would be capable of describing their experiences in a more systematic manner than might be the case for people using such databases on a more occasional basis.

It was also assumed that professional users would have the capacity to position their engagement with digital resources in relation to more or less explicit research strat- egies, thus further contributing towards the aims of this study.

The approach taken to recruit the desired category of informants consisted of identifying databases whose users could be assumed to meet the criteria for inclusion. Two databases were selected in order to ensure a variety of respondents, i.e. VD16328 and bokhylla.no329. Both databases focus on book collections and are managed by library organisations but apply different approaches to the digitisation process.

The bokhylla.no is hosted by Nasjonalbiblioteket330 (the Norwegian National Library) with the objective of covering the Norwegian legal deposit collection in its entirety. As a result of an agreement between the library and the involved stakeholders (Kopinor331), the legal obstacles regarding making literature within the copyright realm available on the web have been resolved, and all content in the database is freely accessible on computers with a Norwegian IP address.

The VD16 is managed by the Bayerische Staatsbibliothek (BSB) and financed in cooperation with the Deutsche Forschungsgemeinschaft (DFG).332 The database is part of the framework of the “Bibliography of books published in German-speaking

328 . 329 330 331 332 BSB , DFG

countries in the 16th century", i.e. it is not limited to the holdings of the BSB but has a nationwide scope. From this mission ensue high demands on bibliographical accuracy, the extent of metadata, and the quality of imaging processes and presentation.

The approach of bokhylla.no to digitisation is, to a large extent, based on the capacity of using high-speed image processing. A significant proportion of the material available in bokhylla.no originates from copies that have been spine-cut and captured in sheet-fed duplex scanners. This approach is enabled by the high number of duplicates within the legal deposit system, allowing for procedures based on the removal of the binding structure. Documents that cannot be identified in the required quantity to qualify for destructive scanning are captured in a manual production line.

The organisation of the VD16 is more decentralised as there is no central digitisation facility. Instead, the participating libraries provide the digital content. However, digitisation has to be performed in accordance with the guidance of the DFG, who also provide a browser where all content in VD16 can be accessed. Table 8 outlines the characteristics of the two databases.

Organisation
  bokhylla.no: The Norwegian National Library scanning center in Mo i Rana.
  VD16: The Munich Digitization Center handles the digitisation of the collections managed at the Bayerische Staatsbibliothek. Other contributing libraries perform their processes locally.

Scope
  bokhylla.no: The goal of bokhylla.no is to establish a unified digital library for all media (books, newspapers, periodicals, photos, films, music or broadcasting, and since 2005 the web domain .no) in the collections of the Norwegian National Library.
  VD16: The goal of VD16 is to create a retrospective national bibliography for 16th century German book production. The project started in 1969. The books catalogued in the VD16 have been digitised since 2006. As a rule, only one copy per edition is digitised. The digital copies in the VD16 are allocated an individual VD16 number.

Legal aspects
  bokhylla.no: A licence agreement with the Norwegian rights holder organisation enables free access for users with a Norwegian IP address.
  VD16: The material in the database is out of copyright.

Content
  bokhylla.no: (2007) +25,000 books
  VD16: (2012) 110,000 catalogue entries (approx. 33,000 books completely digitised)

Goal
  bokhylla.no: 450,000 books
  VD16: 111,000 books

Capture resolution/colour depth
  bokhylla.no: 400 dpi/24 bits
  VD16: 300–600 dpi/24 bits

Preservation format
  bokhylla.no: JPEG2000 lossless
  VD16: TIFF uncompressed

Methods of capture
  bokhylla.no: Books available in a sufficient number of copies are processed in sheet-fed scanners and discarded. Books are scanned manually if there are fewer copies.
  VD16: Books are scanned manually or by using page-turning scanning robots.

OCR
  bokhylla.no: All scanned books are OCR processed; a fully automated process without any manual editing.
  VD16: OCR processing is a local decision for the contributing libraries.

Selection/prioritising
  bokhylla.no: The oldest material is prioritised in order to process collections out of copyright first.
  VD16: 16th century books according to the national bibliography.

Table 8. A comparison between some main features of the bokhylla.no and VD16.
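The capture resolution and preservation format rows in Table 8 have direct consequences for storage. As a rough, hypothetical illustration (the page dimensions and the 2:1 lossless compression ratio are my assumptions, not figures from either project), the uncompressed size of a page follows from width, height, resolution and colour depth:

```python
# Rough storage estimate for one scanned page, illustrating the
# difference between the preservation formats in Table 8.
# Assumed values (NOT from bokhylla.no or VD16): a page of
# 6 x 9 inches and a 2:1 lossless compression ratio for JPEG2000.

def page_bytes(width_in, height_in, dpi, bits_per_pixel=24):
    """Uncompressed raster size of a single scanned page, in bytes."""
    pixels = (width_in * dpi) * (height_in * dpi)
    return pixels * bits_per_pixel // 8

tiff = page_bytes(6, 9, 400)       # uncompressed TIFF at 400 dpi, 24 bits
jp2 = page_bytes(6, 9, 400) // 2   # JPEG2000 lossless, assumed 2:1 ratio

print(f"TIFF (uncompressed): {tiff / 1e6:.1f} MB per page")
print(f"JPEG2000 (lossless, ~2:1): {jp2 / 1e6:.1f} MB per page")
```

At these assumed dimensions a single 400 dpi/24-bit page occupies roughly 26 MB uncompressed, which indicates why the choice of preservation format matters at the scale of hundreds of thousands of books.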

Users of these two databases were initially identified through personal contacts within the relevant academic communities, e.g. book history, history, (digital) humanities, and the social sciences. As potential participants were approached with a request to take part in the study, they were also asked to suggest additional candidates for interviews.333 Using this method, a total of 15 informants were included in the study, five women and ten men (table 9). The respondents are listed together with the database they were initially associated with.

Code  Database     Profession                                         Academic profile
R1    VD16         Researcher                                         PhD, Historical theology
R2    VD16         Special librarian, rare books collection           Book history
R3    VD16         Researcher, librarian                              PhD, Book history, sociology of books
R4    bokhylla.no  Editor, digital critical editions                  Literature, philology
R5    bokhylla.no  Editor, e-book project, digital critical editions  Literature, philology
R6    bokhylla.no  Lexicographer                                      PhD, Literature, philology
R7    bokhylla.no  Archivist                                          Philology
R8    bokhylla.no  Research librarian, literary critic                PhD, Literature
R9    bokhylla.no  System developer                                   TEI, text databases, philology
R10   bokhylla.no  PhD student                                        Literature
R11   VD16         Associate Professor                                PhD, History
R12   VD16         Associate Professor                                PhD, History of ideas
R13   VD16         Associate Professor                                PhD, History of ideas
R14   VD16         Antiquarian bookseller                             History
R15   VD16         Research librarian                                 PhD, Cultural history

Table 9. The respondents in the interview study

The informants' use of digital resources followed, to some extent, their professional profiles; for example, in their accounts, the book historians described using the resource for establishing a bibliographical identification or retrieving bibliographical metadata. Several of the respondents, with backgrounds in philology and literature, used the database as a reference in their work with establishing text editions. The

333 See appendix 5 for the request that was sent to potential participants asking them to take part in the interview study.

historians tended to refer to the database as providing convenient and well-structured access to sources. However, as will be discussed in chapter 5, the findings provide a more complex picture of usage, revealing how strategies tend to be based on combining different aspects of the representation rather than on focussing on text, image or metadata in isolation.

Using VD16 and bokhylla.no as reference points in the selection of respondents resulted in a group of users that displayed a high level of experience in engaging with digital resources. During the interview process, it soon became evident that the information given by the respondents had a much wider application than solely in relation to the VD16 and bokhylla.no respectively. The researchers' use of databases tended to involve the consultation of several resources. The original intention in this study was to relate the experiences of the respondents to these two databases, but the level of competence displayed by this group made it possible to relate their accounts to the use of digital resources on a more general level. The original scope of the study could thus be widened. Rather than correlating the respondents' accounts to the specific technical aspects of databases and digitisation processes, the analysis will focus on their engagement with the digital resource; the document in digital format.

3.2.2. Reflections on the informants and the interviews

The informants can, to some extent, be grouped into two broad categories: those with a primary interest in the texts of the documents and those who refer to the artefactual aspects of the represented documents. The distinction is, however, not absolute; e.g. the respondents involved with text editing – and having a closer affinity with bokhylla.no – had, of necessity, considerable experience of working directly with documents (and of applying a copy-specific approach to working with surrogates). Among the users of VD16 there were several respondents with a primary interest in accessing text-based content as part of research on religious and political influences on historical persons.

The sequence of interviews was decided for practical reasons, e.g. timing and the need to travel. In hindsight, the process of interviewing could have benefited from different planning. Two of the initial informants were professional colleagues of mine, which gave rise to the inclusion of unrelated material in the interviews, mainly due to my difficulty in keeping focus. One of these interviews was discarded as a result of a technical problem with the recording equipment. The quality of the

remaining interviews improved quickly as a result of my having to reconsider my interviewing strategies.

During the interviews I regularly made sure that I had understood the accounts of the respondents correctly. This could be done by rephrasing their narratives in order to get their response or by repeating questions in a slightly different manner. The respondents were invited to get back to me if they wanted to add comments or restructure their accounts. None of the respondents used this possibility.

Several of the informants spoke Norwegian, which is not my first language. However, as the Norwegian language is closely related to Swedish, this did not pose any direct problem for me or for the informants. A factor that further reduced the language barrier was the common ground established by the topic of the interviews, involving specialised terminology that was already familiar as a result of my own professional experience from the field. In a different discursive setting, e.g. relating to everyday vernacular, the question of language would probably have had to be dealt with as a problematic issue.

In instances of language-based uncertainty, I asked the respondent for further clarification. During the transcription phase, some difficulties arose but could be resolved by consulting a dictionary.

The findings in this study will be presented in chapter 5.

3.3. Method

The empirical material in the two studies consists of texts, albeit of different types. The analysis is directed towards the interpretation of procedures described either in published guidelines (in the study of the digitisation process, chapter 4), or in transcriptions of interviews where researchers describe their strategies in the use of digital surrogates (in the study of the capacity of digital documents, chapter 5).

Content Analysis (CA) with a qualitative approach forms the basis of the analysis in both these studies. CA enables a systematic inquiry into the meanings of texts (e.g. published documents, memos, interview transcripts, administrative records, and letters).

The basic procedure of CA consists of selecting a sample, which is coded according to criteria that can either be based on applicable theories or derived directly from the material itself. Subsequently, the analysis can rely on a quantitative (e.g. based

on frequencies of words and expressions) or a more interpretative approach similar to that described in Julien whereby attention is focused on "categorizing qualitative textual data into clusters of similar entities, or conceptual categories".334
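The quantitative variant, analysis based on frequencies of words and expressions, can be illustrated with a minimal sketch; the sample texts and the list of terms below are invented for illustration, not taken from the empirical material:

```python
from collections import Counter
import re

# Minimal sketch of quantitative content analysis: frequencies of
# selected terms across a small sample of documents. The sample texts
# and the term list are invented examples.
sample = [
    "Selection of source documents precedes image capture.",
    "Image capture and processing determine the quality of the reproduction.",
    "Archiving the master file secures long-term access.",
]
terms = ["capture", "processing", "archiving", "selection"]

# Tokenise each text into lower-case words and accumulate counts.
tokens = Counter()
for text in sample:
    tokens.update(re.findall(r"[a-z]+", text.lower()))

frequencies = {term: tokens[term] for term in terms}
print(frequencies)
```

The qualitative approaches discussed next start from the same segmentation of the material, but replace the counting step with interpretative categorisation.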

CA has its roots in methods for quantitative newspaper analysis that were developed during the first decades of the 20th century.335 The method was mainly driven by journalistic interests in assessing trends and opinions as communicated in printed news media. Krippendorff lists a number of factors that contributed to the development of CA into a standardised research method, e.g. the need to address the new media formats that developed with the introduction of television, and the widening of scope to include the analysis of war propaganda during the Second World War. Concerns were gradually raised that the quantitative approach would result in a limited understanding of the analysed texts. Berelson argues that this limitation was caused by the focus on the manifest and quantifiable content of a text at the cost of the latent content, which evades such a method of analysis.336 As a result of such concerns, several qualitatively oriented applications were developed, sharing an interpretative approach to the text as a carrier of meaning, as opposed to the focus on, for example, word frequency and the spatial measurements of headlines that characterised the quantitative version of CA. Zhang & Wildemuth outline its application within LIS337, where it has been used to analyse already existing texts, e.g. White & Marsh338, as well as interview transcripts, e.g. Bradley339. Schilling argues that the strength of qualitative CA lies in the combination of a structured procedure and an interpretative approach to the analytical part of the investigation, thereby enabling a "systematic and rule-based process of analysing verbal and textual data".340

334 H. Julien, ‘Content Analysis’ in L. M. Given (ed.), The SAGE Encyclopedia of Qualitative Research Methods, Vols. 1 & 2, London, Sage Publications, 2008, p. 120.
335 K. Krippendorff, Content Analysis: An Introduction to its Methodology, 2nd ed., Thousand Oaks, CA, Sage, 2004.
336 B. Berelson, Content analysis in communication research, Glencoe, Ill., Free Press, 1952, p. 16.
337 Y. Zhang & B. M. Wildemuth, ‘Qualitative Analysis of Content’ in B. M. Wildemuth (ed.), Applications of Social Research Methods to Questions in Information and Library Science, Westport, CT, Libraries Unlimited, 2009, pp. 308–319.
338 M. D. White & E. E. Marsh, ‘Content Analysis: A Flexible Methodology’, Library Trends, vol. 55, no. 1, 2006, pp. 22–45.
339 J. Bradley, ‘Methodological Issues and Practices in Qualitative Research’, The Library Quarterly, vol. 63, no. 4, 1993, pp. 431–449.
340 J. Schilling, ‘On the Pragmatics of Qualitative Assessment – Designing the Process for Content Analysis’, European Journal of Psychological Assessment, vol. 22, no. 1, 2006, p. 28.

There are different approaches that can be taken within the framework of qualitative CA. A major distinction concerns the approach to the formation of the categories used in the analysis of the sample – a key activity in the CA process. This formation can either be applied to the sample as predetermined assumptions, based on applicable theories, or derived from the empirical material itself.341 The analysis in both of the empirical studies in this thesis did not rely on pre-formulated categories or criteria. However, it employed the procedure of a "coding frame", as described in Schreier, as a method of focusing the analysis on the main categories of relevance for the study, i.e. descriptions of the digitisation process in the guidelines (the study of digitisation processes) and the researchers' accounts of engagement with digital resources (the interview study).342 The analytical procedures will be discussed in detail in relation to the findings of the two studies, i.e. in chapters 4 and 5.
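The mechanics of a coding frame can be sketched as follows; this is a deliberately simplified, hypothetical illustration in which the two main categories echo those used in this thesis, but the indicator keywords and the interview fragments are invented, and actual qualitative coding relies on interpretation rather than on keyword matching:

```python
# Hypothetical sketch of applying a coding frame: each segment of the
# material is assigned to the main categories it touches. The indicator
# keywords and the segments are invented examples; real qualitative
# coding is interpretative, not mechanical matching.
coding_frame = {
    "process description": ["selection", "capture", "processing", "archiving"],
    "engagement with resource": ["search", "zoom", "download", "metadata"],
}

segments = [
    "I usually start by searching the metadata before opening the image.",
    "The capture stage decides what the archiving stage has to preserve.",
]

coded = []
for segment in segments:
    lowered = segment.lower()
    # A segment is tagged with every category whose indicators occur in it.
    categories = [cat for cat, keywords in coding_frame.items()
                  if any(word in lowered for word in keywords)]
    coded.append((segment, categories))

for segment, categories in coded:
    print(categories, "<-", segment)
```

The point of the frame is the first step, restricting attention to the main categories of relevance, while the interpretative work lies in deciding what in each segment actually counts as an instance of a category.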

3.4. Summary

Both studies apply a document perspective, i.e. the focus is placed on the activities that directly involve the source document and its digital reproduction. The first study addresses the procedures during the digitisation process that involve the document or its reproduction. The second study analyses the accounts of researchers regarding their engagement with documents (as digital representations). This is in line with the approach recommended by Prior in relation to document-based research:

People do things with documents. The task of the researcher should therefore be to follow a document in use.343

The studies are not designed to test the validity of predefined criteria but are rather performed with the intention of identifying underlying structures in the digitisation process and the use of digital reproductions. In the study of guidelines, the aim is to identify conceptualisations of the digitisation process. In the interview study, the

341 There are several alternative approaches to the formation of coding criteria apart from the two mentioned here. See M. Schreier, Qualitative content analysis in practice, Los Angeles, CA, SAGE, 2012. 342 M. Schreier, Qualitative content analysis in practice. 343 Prior, Using Documents in Social Research.

intention is to use the researchers’ accounts of interaction as a means of identifying the characteristics of the digital resource and the ways in which it can be engaged.

In the first study, the empirical material can be considered “pre-existing” in the sense that the guidelines making up the sample were produced independently of the research process. The second study is based on the informants’ own accounts of their engagement, not on my observation of their actual working process. The empirical material is, consequently, “co-produced” by myself during the process of interviewing and subsequent transcription. This approach obviously risks introducing subjective bias based on my own pre-understanding.344 The process of transcription and repeated reading of the transcripts during the analytical phase provided an opportunity to identify instances where I had asked leading questions or commented in a biased manner. The interview material produced in relation to such instances was not used in the analysis.

The protocol-based approach provided me, as a researcher, with a procedure that facilitated the analytical process by establishing a cognitive distance between the analysis of the empirical material and the influence of my pre-understanding.

This chapter has presented the research design of the two empirical studies and outlined the procedures involved in selecting the guidelines and the informants. The following chapter (chapter 4) will report the first of the empirical studies – The digitisation process – and the procedures involved in the analysis of the sample of guidelines. The second empirical study will be presented in the same way in chapter 5.

The findings in both studies should be regarded as mutually supplemental and will be further discussed in chapter 6.

344 Such a risk is, of course, evident also in the analysis of the guidelines.

4. THE DIGITISATION PROCESS

This chapter presents the first of the two empirical studies, that of the digitisation process, based on an analysis of a sample of guidelines relating to the design, implementation and operation of digitisation processes within the library sector.

The relation between the source document and its digital reproduction might at first glance appear to be an uncomplicated bond. The everyday presence of digital technology and infrastructure for the capture and dissemination of information in digital format – smartphones and applications such as Facebook, Instagram, and Twitter – has fostered a view of digitisation as a trivial activity. This view has also influenced common perceptions of digitisation as applied in institutions within the heritage sector. A closer look, however, reveals the digitisation process as a complex structure that involves technology (in a state of constant development), procedures (with various degrees of standardisation), and terminology (that in some instances is “born digital” but often originates in the analogue domain and is reapplied in the digital). We also find various assumptions regarding the information potential of documents in analogue and digital formats, i.e. their capacity to be informative and to provide information given the specific interest of a user.

The framework of BOT enables a multifaceted perspective on the digital resource as characterised by the modes of representation discussed in section 2.7. The digitisation process can be described in technical terms as a standardising procedure – a feature central to the boundary object – and this perspective is, of course, also fundamental in the analysis of the digital reproduction as a product of technology.

The technical parameters of the process do not, however, appear out of the blue; rather, they are the results of decisions and considerations made in the design and management of the operation, factors that ultimately relate to the interests and ideals of the institutions that carry out the conversion. The process is consequently characterised by the institutional setting where the transmission is carried out, the features of the source documents, the intended functionality of the digital reproductions, and the capacity of the people involved in the process, as well as by the technology employed.

A study of digital transmission will thus have to involve two levels of analysis: on the one hand, the process and its parameters and, on the other, the contextual setting wherein the conversion takes place and how it influences the relation between the source document and its reproduction.

The study of guidelines and recommendations was designed on the basis of this scope. The chapter is structured as follows:

The first section in this chapter – 4.1 – describes the analytical procedure. Section 4.2 focuses on the structure of the digitisation process and the relation between the activities involved. Section 4.3 widens the perspective and addresses how the contextual setting, as defined in the guidelines, influences the activities in the process. The final section – 4.4 – summarises the study and discusses the findings.

4.1. The analytical procedure The main steps in the analysis of the guidelines are described in figure 22.

Figure 22. The main steps in the analysis of the guidelines, based on a model adapted from Mayring345

In the following, we describe the analysis in more detail.

The guidelines in the sample provided rather varied presentations of the process of transmission, shaped by the interests of the publishing institution or (in cases where the guideline was not produced by an organisation) the intention of the author(s). As the guidelines differed in scope and outline, the initial step in this process consisted of normalising the material by identifying the units of analysis – the actual descriptions of the digitisation process.

345 P. Mayring, ‘Qualitative Content Analysis’, Forum Qualitative Sozialforschung / Forum: Qualitative Social Research, vol. 1, no. 2, 2000.

The procedure used in this stage of the analysis consisted of scrutinising the guidelines for descriptions of the digitisation process. The guiding criterion was to identify a sequence starting with the document being selected for digitisation and ending with the presentation of the digital representation.346

The initial survey of the sample revealed that the descriptions of processes were generally structured according to either one of two principles:

• The digitisation process is described by an initial listing, outlining the actions involved (as headings or titles), which are then described in detail in subsequent sections, e.g. [Minerva, 2004], using the concept of a “Life cycle” of the digitisation process, and [FADGI, 2009], referring to “a generic sequence of high-level activities for planning and management purposes”.

• The digitisation process is described in running sections, each of which is titled according to the actions covered, e.g. [JISC] and [Cornell, 2003].

The descriptions were compiled in separate text files, retaining accounts of actions as they occurred (without altering the wording and the internal relation between them). These compilations are referred to as units of analysis. Figure 23 presents three such units of analysis.

346 In keeping with the guiding principle of “follow the document” as suggested in Prior, Using Documents in Social Research.

Figure 23. Three units of analysis; [Cornell, 2003], [IFLA, 2002], and [JISC].

It can be noted that all three units in figure 23 describe a process starting with documents being selected for digitisation and ending with the management and presentation of the digital resource. It can also be noted that there is great variation in detail and scope between the three units.

During the process of clustering actions, the entire sample was iteratively worked through several times, noting the emerging categories. Figure 24 exemplifies this part of the analytical process and the approach taken to handle the heterogeneity within the sample.

Figure 24. Three units of analysis - [Cornell, 2003], [IFLA, 2002], and [JISC] - during the analysis of the main stages in the digitisation process.

It can here be observed that [IFLA, 2002] defines the process under six headings whereas [Cornell, 2003] uses ten and [JISC] eleven. As the example demonstrates, in the case of [JISC] the headings describe the actions involved in the process in some detail whereas in [IFLA, 2002] the process is described at a fairly condensed level. The analysis consequently had to include an examination of the texts associated with each heading in order to identify the actions included.

4.1.1. Basic structure The digitisation process demonstrates several similarities with, but also some profound differences from, methods of transmission based on analogue techniques.347 The modelling used in section 2.3 to demonstrate the basic principles of production and reproduction of documents, based on a conceptualisation of the relation between capture and representation (figure 25), will here be expanded to encompass the procedures encountered in digital transmission.

Figure 25. A conceptual modelling of the relation between capture and representation.

This model, based on the sequence of capture–transmission–representation, is consequently reformulated in figure 26 to describe the relation between the source document and the reproduction as conditioned by the digitisation process.

Figure 26. The digitisation process modelled as conditioning the relation between the source document and the reproduction.

An initial finding is that the digitisation process, rather than being a single transformation from document to reproduction, is better described as involving two partly parallel flows of documents: that of the source documents from which the reproductions are derived, and that of the digital representations, the products of the process (see figure 27).

347 A distinguishing feature of digital reproduction technology is the option of processing information independently of the physical aspects of the source document.

Figure 27. Two flows of documents in the digitisation process.

The extent of this parallelism varies with process design. The source documents usually remain within the confines of the process until the relevant quality controls are cleared. Depending on the type of process, the source documents are subsequently returned to the collection from which they originate or discarded as waste.

4.1.2. Modelling the process After the first passes through the guidelines it was evident that the actions compiled actually constituted a mixture of, on the one hand, descriptions of process-related steps and, on the other, procedures performed in relation to these steps. This finding was reflected in the subsequent revisions, where it was decided that the term stage should be used in reference to the main steps of the digitisation process and that the term activities should refer to the actions carried out at each of these stages.

The analysis identified five stages that emerged in all guidelines. The stages were labelled selection, information capture,348 processing, archiving, and presentation, and in figure 28 they are modelled as representing the “core sequence” of the digitisation process.

348 Although imaging is the primary mode of capture in relation to text-carrying documents, the term information capture is used as it provides a technique-independent description, thus also including text-based methods of capture such as manual keying.

Figure 28. The core sequence of the digitisation process.

It should be noted that in ten of the fifteen analysed guidelines, the stages of information capture and processing of information are positioned under the same heading, e.g. data capture or conversion. The reason behind their being combined presumably lies with the ideals regarding the division of labour in the process organisation – the staff that handle information capture frequently also manage the post-capture processing of the digital files. In some instances, depending on the design of the content management system in place, the production of certain file types takes place as an integrated process within the digital repository rather than as a separate pre-archiving procedure. However, in keeping with the principle of “follow the document”, these stages can be clearly distinguished, as the information capture is based on the source document whereas the processing of this information relates to the digital file. Consequently, information capture and processing are here represented as two separate stages in the core sequence of the process.

Another variation concerns the sequential relation between archiving and presentation, which in some guidelines is reversed. The reason for this is that accounts of the process in some guidelines refer to the digital infrastructure (and hence the long-term management of digital resources) as a matter of responsibility within an institutional framework, whereas in other cases it is considered an issue for the digitisation project organisation (where the project is seen as delivering digital products to the library institution). Still, adhering to the document perspective, it is reasonable to assume that the digital reproduction is archived prior to its presentation and that the sequence of the core process, as represented in figure 28, is valid.
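As a schematic summary, the core sequence and the two partly parallel document flows can be sketched as a simple pipeline. The stage names follow the analysis above; the function and data structures are merely illustrative and do not appear in any of the analysed guidelines.

```python
# A schematic model of the five-stage core sequence. Everything here is
# an illustrative sketch of the structure identified in the analysis,
# not code drawn from any guideline.

CORE_SEQUENCE = [
    "selection",
    "information capture",
    "processing",
    "archiving",
    "presentation",
]

def digitise(source_document):
    """Trace which document each stage operates on: the source document
    up to and including capture, and the digital file thereafter -- the
    two partly parallel flows of documents."""
    stages = []
    digital_file = None
    for stage in CORE_SEQUENCE:
        operates_on = digital_file if digital_file else source_document
        stages.append((stage, operates_on))
        if stage == "information capture":
            # the master file is created at the capture stage
            digital_file = "master file of " + source_document
    return stages
```

The sketch encodes the distinction argued for above: selection and information capture operate on the source document, while processing, archiving and presentation operate on the digital file.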

It was further demonstrated that this core sequence could be structured according to administrative requirements. One example is [NPO, 2001], which refers to three phases of digitisation: 1) Data capture and creation, 2) Data access and delivery, and 3) Managing the digital collection. [Bülow & Ahmon, 2011] adopt a similar structure but add long-term sustainability as a separate fourth phase. [FADGI, 2009] also structures the process in four phases but applies a project-oriented perspective: “project planning”, “pre-digitisation activities”, “digitisation activities”, and “post-digitisation activities”. However, these meta-level structures observed in the sample of guidelines do not alter the notion of stages in the process. [FADGI, 2009] adds a further dimension as it distinguishes between three levels of activities within the organisation wherein the digitisation takes place: “program assessment”, “operational”, and “management”. This approach was followed in the present study (although with different categories) and the activities observed in the guidelines were organised according to how they relate to the organisation wherein the process is performed – the institutional context. Three such organisational “strata” were outlined: activities relating to institutional management, activities relating to the process/project organisation, and activities directly relating to the source documents and the resulting files.349

Institutional management refers to the institutional policies, knowledge organisation, programs for dissemination and long-term preservation of digital information and source documents.

Process management covers measures such as planning, technical infrastructure, human resources, quality management, tracking of documents and files, and web-related activities.

Document includes activities that directly relate to source documents, files, and metadata management.
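The three strata and examples of the activities assigned to each can be summarised schematically as follows. The dictionary form and the helper function are illustrative conveniences only, not part of the analytical apparatus; some activity labels are paraphrased.

```python
# The three organisational strata with examples of the activities the
# analysis assigns to each, summarised as a dictionary. The helper
# function is a convenience for illustration only.

STRATA = {
    "institutional management": [
        "institutional policies",
        "knowledge organisation",
        "dissemination programs",
        "long-term preservation",
    ],
    "process management": [
        "planning",
        "technical infrastructure",
        "human resources",
        "quality management",
        "tracking of documents and files",
        "web-related activities",
    ],
    "document": [
        "source document handling",
        "file management",
        "metadata management",
    ],
}

def stratum_of(activity):
    """Return the stratum an activity belongs to, or None if unlisted."""
    for stratum, activities in STRATA.items():
        if activity in activities:
            return stratum
    return None
```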

We have thus outlined the digitisation process as consisting of activities that can be seen as forming a progression of stages (from selection to presentation) while at the same time affecting the organisational structure on several levels.

Figure 29 demonstrates the “big picture” of the digitisation process: the sequence of stages, the activities involved, and the institutional framework.

349 The difference between institutional management and process organisation will be altered in situations where the digitisation operation is outsourced (e.g. run by third-party operators in-house or at an external location).

Figure 29. A conceptual model of the digitisation process outlining the stages in the core process and three sets of strata: Process management, Institutional management, and Document.

4.2. The stages in the process This section will discuss how the characteristics of the digital resource are influenced by the activities in the process. There are several components at play here, e.g. the individuals taking part in the process, the institutional context wherein the conversion takes place, the tools used, and the characteristics of the source documents.350 The purpose of this section is to demonstrate some variables (on the basis of these components) that affect the informative capacity of the representation. This will be achieved by addressing the stages in the process and how they are described in the guidelines. In section 4.3 the perspective will be widened to include the institutional setting and contextual framework wherein the process is performed.

350 The components referred to here are inspired by the description in Dahlström, ‘The Compleat Edition’, of factors that might affect the transmission and the capacity of the digital resource, see tab. 5, p. 89.

4.2.1. Selection The choice of documents or collections to be digitised represents the obvious starting point in any digitisation process once the decision to initiate such a venture is taken. “Selection” is addressed as a separate stage in all of the guidelines analysed, but although it might seem a comparatively uncomplicated decision to make, the actual approaches taken in the analysed documents differ considerably.351 [DFG, 2013] reflects:

In general it should be noted that the technical aspects of digitisation can be planned quite well, while the intellectual effort required to select the right items is hard to calculate. [DFG, 2013, p. 5]

[DFG, 2013] outlines three key features that should influence decisions on selection: digital technology can be used to protect source documents by minimising handling, it can be used to enhance access, and it can even transcend the source document through the application of image processing or OCR conversion.

In this perspective, the question of selection also concerns measures to ensure that the applicable legal, preservational and technical conditions are met. The selection stage consequently has to consider priorities and the allocation of funding in the short term – to run the actual process – as well as long-term measures to ensure sustainability. [RLG, 2000] observes:

Each stage in a digital collection’s life affects what comes after. For example, decisions made at the point of selection affect decisions governing digitization, which in turn can affect processing capabilities and preservation strategies. [RLG, 2000, p. 6]

351 This is, of course, a matter of focus and scale. In a more inclusive perspective the concept of the digitisation process could be expanded to include a societal dimension whereby e.g. political decisions to strengthen the education sector as part of a program for social change could form the initiative for a digitisation venture. Issues relating to democratic development, for example, questions of race and gender as well as climate change, might also influence decisions on digitisation. The JISC Digital Media guide: ‘Digitisation and the Environment’ addresses aspects relating to the impact of digitisation on the environment.

If digital transmission is carried out as part of a preservation program, for example involving the withdrawal of collections from circulation and referring users to digital reproductions, the selection process obviously has to be linked with the institutional preservation management. This is also the case in operations involving material whose condition might call for the planning and implementation of treatment before, during, and/or after information capture. There is a strong connection between the decisions made in relation to selection and the general planning of the digitisation process (e.g. equipment, staffing, localities and technical infrastructure).

Chapman, in [NEDCC, 2000], observes that the digitisation project itself might expose the collection to extensive physical handling; if the process is preservation-driven, motivated for instance by the need to reduce handling of source documents, measures have to be established to ensure that the process itself does not counteract this purpose. He remarks that such considerations usually result in higher costs.352

Although content, use, and condition are core parameters in the selection process, special collections might be candidates for digitisation based on their potential from an outreach perspective. Digital technology can in this way be used to attract new groups of users who do not tend to visit the library as a physical place. [Bülow & Ahmon, 2011] refer to the institutional website as a “shop window” in this respect.353 The technology can also be used as part of a strategy to enhance an institutional profile, or even raise its prestige, by making collections and individual documents available on the web [Hughes, 2004, p. 13].

Vogt-O’Connor, in [NEDCC, 2000], refers to an elaborate three-step procedure involving institutional representatives, user groups and stakeholders in order to establish a firm alignment with institutional policies.354 A similar stand is taken by [FADGI, 2009], which reflects the institutional approach that permeates the organisation. [IFLA, 2002] supports digitisation initiatives as part of its focus on developing countries. This can be discerned in its principles of selection, which include “Extending the availability of material in support of educational and outreach projects”.355 [JISC] takes a similar approach, specifically aiming at the production of resources for the education sector in the UK. A societal dimension of the question of selection is also emphasised by Vogt-O’Connor in [NEDCC, 2000], who argues

352 [NEDCC, 2000, chapter III] 353 [Bülow & Ahmon, 2011, p. 58] 354 [NEDCC, 2000, chapter IV] 355 [IFLA, 2002, p. 11]

that it is the responsibility of institutions financed by public money to make their collections available according to the needs of the community.356

It should be noted that the particulars of the selection stage are also related to the size and degree of automation involved in the process. For projects such as Google Books, the selection process involves choices at library or collection level. In operations of this scale, selection procedures relating to individual items, e.g. to avoid duplication, are usually considered uneconomical. The opposite position is taken in the [DFG, 2009] guidance, which requires that relevant databases be checked in order to make sure that items or collections are not already available online – a logical stipulation in these guidelines given their status as a basis for funding applications.

4.2.2. Information capture The stage of information capture involves direct engagement with the source document. Although all of the guidelines refer to issues relating to the handling and preparation of documents, the actual procedures involved and their extent vary with the context of the guidance in question. This stage also marks the beginning of the sequence of quality management procedures with reference to the technical parameters of the digital file. Such procedures follow all subsequent stages in the process.

The master file, which will be the starting point for all subsequent derivations, is created at this stage. The limits of the capacity of the reproduction are, to a large extent, defined here; the image quality can be scaled downwards but can never exceed the parameters defined at capture.357
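This asymmetry – quality can be scaled downwards but never restored beyond the parameters defined at capture – can be illustrated with a toy one-dimensional example. The sketch is purely illustrative; real imaging pipelines are considerably more involved.

```python
# A toy illustration of why the capture parameters set a ceiling on
# image quality: once a signal has been downsampled, upsampling cannot
# recover the discarded detail. Purely illustrative; not taken from
# any of the guidelines.

def downsample(pixels, factor):
    """Average each block of `factor` pixel values into one."""
    return [
        sum(pixels[i:i + factor]) // factor
        for i in range(0, len(pixels), factor)
    ]

def upsample(pixels, factor):
    """Repeat each pixel value `factor` times (nearest-neighbour)."""
    return [value for value in pixels for _ in range(factor)]

captured = [10, 200, 10, 200, 10, 200, 10, 200]  # fine alternating detail
reduced = downsample(captured, 2)                # averages the detail away
restored = upsample(reduced, 2)                  # same length, detail gone
```

The restored signal has the same length as the captured one but the fine alternation is irretrievably lost, which is why the parameters chosen at capture bound everything derived later.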

The assumptions regarding the concepts of “information” and “document” underpinning the design of the digitisation process are manifested at the stage of capture. Two main approaches to how the source documents are referred to are noted in the analysis of the guidelines. These approaches can be seen as representing two extremes on a scale where the document is regarded either as a material object,

356 [NEDCC, 2000, chapter IV] 357 The informative capacity of the reproduction can be enhanced in other ways, e.g. by filtering techniques or by adding information layers such as transcription. However, imaging factors such as resolution and bit depth cannot be adjusted “upwards” once the capture is carried out.

with an integrated informative capacity (consisting not only of inscriptions but also including physical characteristics and a social dimension), or as a carrier of information that can be extracted and distributed to other carriers. The question of how the source document is referred to is also of relevance as it can be seen as an indication of the assumptions regarding the capacity of the digital transmission.

An example of the former can be found in [IFLA, 2014], which refers to the document as an artefact ”bearing intrinsic historical evidence”358 as well as containing “intellectual content”359. In this view, the physical document is not solely regarded as a passive substrate for inscriptions but is also considered an informative source (a provider of evidence) in its own right. It could, however, be argued that the notion of “intellectual” content is slightly imprecise, especially as it is used in phrases such as “it is always preferable to digitize a complete intellectual entity, rather than part of it”360 in reference to recommendations to digitise a book in its totality rather than individual chapters or pages. Given the focus of [IFLA, 2014] on the digitisation of rare books and manuscripts, this interest in the source document as a material and historical artefact is not surprising.

A similar approach is found in the emphasis placed by [DFG, 2013] on the academic or bibliographic competence of the staff in checking completeness, collations and catalogue entries in the preparation of source documents. Further, [DFG, 2013] also recommends involving conservation specialists to assess the condition of collections, even if such measures reduce the total volume of processed documents and increase the total project budget.

In contrast to this emphasis on artefactual aspects, an approach based on the document regarded as a container of information can also be identified. Along similar lines, focus may be placed on the identification of isolated features connected to the source document and perceived as significant for its informative capacity. [Minerva, 2004] refers to the “most significant details of the source document”361 and [IFLA, 2002] to “the significant informational content”362 and “the attributes”363 of the document.

358 [IFLA, 2014, p.4] 359 [IFLA, 2014, p.4] 360 [IFLA, 2014, p.9] 361 [Minerva, 2004, p. 25] 362 [IFLA, 2002, p. 17] 363 ibid

Potentially informative attributes in this guideline include physical dimensions and tonal range as well as the way in which the document has been produced, i.e. the characteristics of manual writing, typing, printing or other means of inscription. [IFLA, 2002] does not refer to the actual physical state and condition of the document as informative attributes; rather, these aspects are seen as factors that impede the procedure of scanning and therefore might require preparatory treatment. It is further stated that if the source document has “artifactual value”364, it should be assessed by a conservator prior to capture. However, the notion of “artifactual value” is not developed.

“Informational content” is also used as a concept in [RLG, 2000], where two approaches are suggested for the assessment of the source document: evaluation can either be based on the features of the document or on the parameters of human perception. If based on the properties of the source document, it is suggested that the capacity to represent the characteristics of the process used to create the document might be used as a benchmark. Following this approach – for example, by digitising photographs on the basis of being able to represent the individual silver grains in the emulsion, or capturing a copper engraving in such a way as to display the minute details of individual cuts – might result in very large image files that strain the resources used in the process. The other option discussed is to rely on human visual perception as a standard, i.e. not using a higher resolution than can be perceived by the human eye. It is noted that this approach is not based on objective features but represents a subjective measurement. An important factor in this instance is therefore whether the key purpose of the process is to achieve legibility or fidelity. Given the characteristics of the source document, for instance whether it carries text- or image-based information, these parameters will vary considerably. The perception-driven approach also has to consider the viewing mode of the source document, i.e. whether it is viewed from a distance (as in the case of a poster) or at a normal reading distance (as a printed document).

Another type of two-part distinction is made by Chapman in [NEDCC, 2000]: between, on the one hand, the expectations of the users, e.g. modes of access, factors that affect legibility, image manipulation to enhance characteristic features, etc., and, on the other hand, the requirements of the managing institution to represent

364 [IFLA, 2002, p. 18]

collections in their actual state as well as their position within the knowledge organisation system in place.365

[DFG, 2013] addresses the question of the informative capacity of the representation by referring to the smallest significant informative feature of the inscription/image of the source document. The guidance provides detailed parameters in relation to the technical aspects of the capture. The purpose is to “reproduce the original material as faithfully as possible, according to scientific requirements”366 and the guidelines go to some length in stating the required minimum technical parameters of information capture with regard to different types of source documents. The recommendations are generally based on the size of the smallest part of an individual character (in relation to texts) or the smallest significant characteristics of a specific artistic technique (in relation to graphic work). It is commented that a machine-readable full text should not be seen as an alternative, but rather as a supplement, to an image-based representation of the same text, as much of the information cannot be rendered in transcribed format. The legibility of the text and the capacity to reproduce the characteristics of artistic techniques are the deciding factors for resolution, not the size of the document, and advice is provided on how to extrapolate new parameters if the recommended standard settings do not sufficiently capture particular features of the source document.367
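As an illustration of such feature-based reasoning about resolution: if the smallest significant feature of the source document is s millimetres wide and should be sampled by at least n pixels, the required resolution is roughly n × 25.4 / s pixels per inch. The rule of thumb and the sample values below are assumptions for the purpose of illustration, not parameters taken from [DFG, 2013].

```python
import math

# An illustrative feature-based resolution calculation. The rule of
# thumb (at least n pixels across the smallest significant feature)
# and the sample values are assumptions for illustration, not
# parameters taken from [DFG, 2013].

MM_PER_INCH = 25.4

def required_ppi(smallest_feature_mm, pixels_across):
    """Resolution (pixels per inch) at which a feature of the given
    width is sampled by at least `pixels_across` pixels."""
    return math.ceil(pixels_across * MM_PER_INCH / smallest_feature_mm)

# e.g. a 0.25 mm character stroke that should span at least 3 pixels:
stroke_ppi = required_ppi(0.25, 3)
```

The sketch makes the stated point concrete: the deciding factor is the size of the smallest significant feature, not the overall size of the document.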

4.2.3. Processing

All guidelines recommend the production of a master file that is archived and subsequently used to derive files for specific purposes. However, the guidelines vary in the number of “generations” between the archived file and the derived files (e.g. the JPEG produced for display purposes). One such example is [JISC], which recommends archiving the uncompressed RAW file as a “master” archive file and the subsequent creation of an “optimised” master file (in a non-proprietary format) from which subsequent purpose-made images are created.
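The notion of “generations” between the archived master and its derivatives can be sketched as a simple provenance chain. The class and file names below are hypothetical illustrations of a [JISC]-style workflow (RAW archive master → optimised master → purpose-made derivative), not code taken from any of the guidelines.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ImageFile:
    name: str
    fmt: str
    parent: Optional["ImageFile"] = None  # the file this one was derived from

    def generation(self) -> int:
        """Number of derivation steps separating this file from the archive master."""
        return 0 if self.parent is None else self.parent.generation() + 1

# Hypothetical chain following the [JISC] recommendation:
raw_master = ImageFile("ms001.raw", "RAW")                     # archived as-is
optimised = ImageFile("ms001.tif", "TIFF", parent=raw_master)  # non-proprietary master
display = ImageFile("ms001.jpg", "JPEG", parent=optimised)     # purpose-made derivative

print(display.generation())  # 2
```

Recording the parent of each file preserves the link back to the archived master, so that any derivative can be regenerated, or its provenance audited, without touching the archival copy.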

365 It could be argued that these two perspectives, the “user’s” and the “custodian’s” (Chapman in [NEDCC, 2000, chapter III]), do not necessarily represent contradicting views, at least not today when technical developments enable parallel modes of representation. 366 [DFG, 2013, p. 7] 367 [DFG, 2013] refers in this instance to the recommendations made in the Federal Agencies Digitization Initiative publication: Still Image Working Group, Technical Guidelines for Digitizing Cultural Heritage Materials: Creation of Raster Image Master Files, 2010.

The stage of processing might include activities such as image enhancement, scaling and cropping, as well as the creation of derivative files (e.g. for display or web delivery) and OCR processing.368 With regard to image enhancement, two approaches are noted in the guidelines. On the one hand, we find recommendations to avoid manipulation and correction, e.g. [DFG, 2013]: “Every effort should be made to avoid object deformations, adding or deleting parts of an object, and special effects such as the use of modifying filters.”369 On the other hand, e.g. [Bülow & Ahmon, 2011] recommends post-processing in order to correct images “should the digitisation process not have produced a satisfactory result”.370

[Zhang & Gourley, 2008] outline a number of options in the manipulation of digital image information for enhancing the characteristics of the output, as opposed to addressing the features of the source document. They also advocate a rather unorthodox approach to the issue of quality, arguing that “you cannot decide on level until you present the material on line and evaluate what the users want”.371 They conclude that decisions on quality can be revisited, i.e. by suggesting that the digitisation of a source document or collection should be redone if the characteristics of the reproduction do not meet the requirements.372

Since the guidelines in the study focus on text-carrying documents, the application of OCR is a commonly encountered method of processing, although objectives vary with the context of each individual guideline. On the one hand, we find [Minerva, 2004], which refers to the OCR process simply as a “popular way to extract the information of scanned documents”.373 On the other hand, we have [RLG, 2000], with its focus fixed on digitisation as a tool in the research community, which therefore covers the technical aspects of OCR processing in depth.

368 Both [RLG, 2000] and [NEDCC, 2000] dedicate a substantial part of their content to issues relating to image processing. These two guidelines were published to meet the needs of the first generation of digitising librarians within the US library sector and their impact is still noticeable (as will be discussed in section 4.3). Processing is also an area within the field of digitisation with a high pace of development. This may account for why the guidelines of a younger date in the sample do not devote much attention to this topic and tend to refer to external sources of information. 369 [DFG, 2013, p. 14] 370 [Bülow & Ahmon, 2011, p. 39] 371 [Zhang & Gourley, 2008, p. 198] 372 They also deal with the issue of cataloguing in a rather pragmatic way: The traditional content rules, such as the AACR2, consist of extremely detailed rules: pages and pages on how to determine the actual title of a book or how to arrange names in alphabetical order. Based on recent drafts, RDA does not appear to be any easier to use. [Zhang & Gourley, 2008, p. 75] The guideline hints at the benefits of developing local metadata schemas in order to speed up the production of digital resources.

Processing also involves tailoring resources to meet specific ideals, for example, by marking up a text in order to maintain the document architecture of the source document regardless of the format of display, or in order to prepare it for a specific publishing format, e.g. ePub or DjVu. The markup capacity also implies that the digital transmission can be used as a tool in text-related disciplines such as scholarly editing.
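The markup principle can be illustrated with a minimal, TEI-flavoured fragment (the element names follow common TEI conventions, but the fragment is a simplified, hypothetical sketch rather than a complete, valid TEI document). The elements record the architecture of the source document, while the rendering is left to whatever publishing format the file is later converted into:

```xml
<text>
  <body>
    <div type="chapter" n="4">
      <head>The digitisation process</head>
      <p>First paragraph of the chapter,
         <pb n="136"/>continuing across a page break in the source.</p>
    </div>
  </body>
</text>
```

Because the markup states what each unit is rather than how it should look, the same file can be transformed into ePub, HTML or print layouts without touching the encoded text.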

4.2.4. Archiving

The necessity of establishing robust and well-documented procedures for archiving and for the maintenance of repositories is underlined in all guidelines. This aspect not only relates to the management of the digital repository but also includes standards for quality control and assessment, as well as decisions on metadata schemas. It can thus be argued that the preservation of digital resources is an activity that is initiated at the starting point of the process:

Each stage in a digital collection’s life affects what comes after. For example, decisions made at the point of selection affect decisions governing digitization, which in turn can affect processing capabilities and preservation strategies. [RLG, 2000, p. 6]

Preservation activities relating to products of manuscript and print culture can start once the documents are incorporated into the institutional framework of the library. The digital file, on the other hand, requires constant maintenance in order to ensure functionality and the consistency of the information. Chapman in [NEDCC, 2000] remarks that the expected lifespan of the resources should be estimated and made part of the planning, as “longevity is not a physical attribute of digital reproductions, but an assigned lifespan”374 – in contrast to the case of collections in traditional formats.

The production of digital resources represents an economic investment that must also justify the cost involved in maintaining their long-term sustainability:

373 [Minerva, 2004, p. 36] 374 [NEDCC, 2000, section II]

The work to design and create a digital product adds value to the information contained in the documents that serve as sources. The value added to a digital product must ultimately result in a product that is an essential and vital capital resource to the institution that has chosen to create it in the first place. Conway in [NEDCC, 2000, section II]

[Hughes, 2004] cautions that expenditure on digitisation processes often represents a new range of activities that should be evaluated on their own merits and not in relation to the economics of already existing activities in the organisation.375 Although expectations regarding the capacity of digitisation are high within the library sector, she stresses that there are no short-term benefits with regard to economic resources. Two reasons are given: the high start-up costs of digitisation projects (the investment in technology, infrastructure and staff) and the short lifespan of technical equipment and systems (leading to a need for constant upgrading of technology and strategies for long-term accessibility).

[Bülow & Ahmon, 2011] argue that the digital file – due to questions concerning long-term digital archiving – cannot be regarded as a preservation copy on equal terms with microfilm, and [Hughes, 2004] stresses that digitisation should not be seen as an alternative to preservation activities directed at the collections.376 Both [DFG, 2013] and [FADGI, 2009] underline that the strategic aspects of digital storage have to be regarded as an institutional commitment rather than as an issue for the actual management of the individual process. The endorsement of the standardised framework for digital archiving and management, the Open Archival Information System (OAIS), can be seen as such a commitment.377

The choice of file format for archival storage represents another institutional decision that impacts on long-term sustainability. Uncompressed Tiff was the default format for archiving in the early part of the period covered by the guidelines. Jpeg2000 has been under development since its first release in 2000, promising as an alternative for archival storage as it lessens the strain on storage space. Its gradual introduction as an alternative to Tiff can be discerned when the guidelines are read chronologically, although the file format has not had the general breakthrough

375 This comment might be seen as a reminder of the general state of digital development within the library sector at the time of publication. It is, however, still a valid remark. 376 [Bülow & Ahmon, 2011, p. 8] 377 Reference Model for an Open Archival Information System (OAIS), CCSDS 650.0-B-1, 2002. Council of the Consultative Committee for Space Data Systems (CCSDS).

initially expected, perhaps partly due to its incompatibility with several open source platforms for access.378
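The vulnerability of compressed formats noted in the guidelines (see note 378) can be demonstrated in miniature. The sketch below uses Python's zlib as a stand-in for any compression scheme, not the actual Jpeg2000 codec: a single flipped bit in uncompressed data damages exactly one byte, while the same fault in a compressed stream typically renders the whole file undecodable.

```python
import zlib

original = bytes([0x80]) * 10_000          # stand-in for uncompressed image data

# Case 1: flip one bit in the uncompressed data -> exactly one byte differs.
damaged = bytearray(original)
damaged[5_000] ^= 0x01
diff = sum(1 for a, b in zip(original, damaged) if a != b)
print(diff)  # 1

# Case 2: flip one bit in the middle of the compressed stream.
compressed = bytearray(zlib.compress(original))
compressed[len(compressed) // 2] ^= 0x01
try:
    recovered = zlib.decompress(bytes(compressed))
    survived = recovered == original
except zlib.error:                          # stream no longer decodable
    survived = False
print(survived)
```

In the uncompressed case the error stays local; in the compressed case the flipped bit either breaks the decoder outright or yields data that no longer matches the original, which is the migration risk the footnote describes for Jpeg2000 versus uncompressed Tiff.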

4.2.5. Presentation

The Research Library Group claimed in 2000 that libraries and archives are “losing the paper chase and in the digital realm many, especially the young, see them as the information provider of last resort”.379 They formulated two principal reasons for engaging in digital technology: firstly, the long-term value of being able to provide resources in digital format for the users, and secondly, the requirement of a digital presence to ensure that their services remain relevant in the future.

Twelve years later, the Deutsche Forschungsgemeinschaft states that digitisation is not solely to be regarded as a method of reproduction but a vital research tool contributing to an infrastructure that turns the internet into a “research space”. It is proposed that digitisation should be considered a “standard service for scientific information centres [hereby including museums, archives and libraries]”.380 The creation of a virtual infrastructure for research should be the ultimate objective of digitisation initiatives. [DFG, 2013] further suggests that digitisation can be used as a way of creating format-independent corpora by linking online resources with collections in libraries and archives.

A similar sense of urgency is expressed in the [IFLA, 2014] guidelines. It is argued that libraries have a “responsibility”381 to make their digital collections available globally, as access to digital resources is not a matter of physical location but rather a question of digital discoverability. It is also stressed that new modes of research, based on the capacity of the digital format, make it possible to approach source documents and collections in other ways than is possible “hands on” in situ. It is

378 The major benefit of Jpeg2000 is its lossless compression, which results in a substantial reduction of storage space. Its stability in comparison with the previous Jpeg format is much greater. However, there are still doubts regarding its suitability for long-term storage, as formats based on compression are more vulnerable to loss than other formats. A bit-level error, e.g. as a result of faults during migration, might have a limited effect on an uncompressed Tiff but will corrupt a compressed file, such as the Jpeg2000. 379 [RLG, 2000, p. 1] 380 [DFG, 2013, p. 5] 381 [IFLA, 2014, p. 4]

even claimed that if not digitised, “rare and special collections remain obscure and hidden”.382

Within the early guidelines we find hesitation regarding the impact of the new technology, e.g. [NPO, 2001], which refers to the risks of jumping on a “technological bandwagon”383 simply because the funding is available, or as a way “to appear ’state of the art’ and just to keep up with other institutions”.384 Although remarks like these might appear symptomatic of the early days of digitisation within the heritage sector, it could be worth considering whether activities within the field of digitisation today are less sensitive to the allure of technical progress.

There are also early examples of quite the opposite view of the technological momentum. [IFLA, 2002] warns against “delusive reliance on the ’wait-and-see’ approach”.385 It is argued that the field is characterised by a constant state of change and development, and that the important issue is thus not to miss any means of wider dissemination and access.

[Bülow & Ahmon, 2011] argue that today’s users interact with online resources in the same way as previous generations engaged with physical documents “hands on” in reading rooms; instant searchability and access are now seen as the default state of operations. In this respect, digitisation fits into the framework of collection management and should be included in such strategies. [Hughes, 2004] underlines the importance of maintaining the “serendipity of browsing the stacks for primary source materials”386 even in the digital realm, i.e. the capacity to enable resource discovery on grounds similar to facing the book in situ on the shelf. She argues that if this capacity is not maintained, digital resources will lose some of their attraction and value.

This raises the question of what types of material to present in this virtual reading room. A terminological bewilderment similar to the one that emerged in relation to the information potential of the source document (at the stage of information capture) can be noted at this stage with regard to the nature of the digital reproduction. [NPO, 2001] refers to the digitisation process as “scanning to make a digital copy”387

382 ibid 383 [NPO, 2001, p. 9] 384 [NPO, 2001, p. 8] 385 [IFLA, 2002, p. 8] 386 [Hughes, 2004, p. 284] 387 [NPO, 2001, p. 3]

and to produce a “digital image”.388 [Zhang & Gourley, 2008] elaborate further and describe how the digital object is characterised by containing an image that represents “a photograph, a drawing, a manuscript, a postcard, a newspaper clipping or other visualisation”.389 They use similar wordings in relation to other types of digital resources; a digital object may include a sound recording or represent a piece of music. A different view is expressed by [IFLA, 2014], which refers to the digital reproduction process in terms of publishing, resulting in a “new edition” that should be documented in the same way as a physical collection.390 [IFLA, 2014] is, however, the only example among the guidelines that explicitly refers to the digital reproduction process in terms of the creation of new documents.

It has here been demonstrated how digital transmission, although based on a consistent core sequence of stages, is, in various ways, influenced by the particular interests and ideals that characterise the institutional setting of the transmission.

We will now examine some of these contextual aspects in more detail.

4.3. The process and its context

The previous section demonstrated how the digitisation process, despite the standardised progression of stages, is influenced by the institutional setting. This is also true in reverse: digital technology affects the identity of the institutions involved. Both [Hughes, 2004] and [Bülow & Ahmon, 2011] make references to the institutional prestige involved in displaying valuable collections and unique objects on the web. Digital technology is not a neutral tool; it can be perceived as a sign of advancement with positive as well as negative connotations. The references to the “technological

388 [NPO, 2001, p. 12] 389 [Zhang & Gourley, 2008, p. 2, my italics] 390 [IFLA, 2014, p. 17]

bandwagon”391 and libraries “losing the paper chase”392 can be seen as indicative of such symbolic aspects of digital technology.393

As the identity of the institution is affected, the introduction of digital technology might also influence the division of labour within the organisation, and even the self-image of the staff involved in operations. Such concern is expressed in [RLG, 2000]:

… to avoid major upsets in the existing socio-cultural system. For example, new approaches will be required, managers may feel threatened by their employees gaining new skills, and curators of existing collections, who rely on orthodox means for managing resources, could feel that their function will become obsolete. [RLG, 2000, p. 7]

Throughout history, similar situations have probably accompanied all paradigmatic shifts in technology, as new sets of skills have replaced the traditional skills.

Interestingly, this very type of concern returns two years later in the [IFLA, 2002] guidelines, although this time as a cautionary observation concerning the introduction of digital technology in libraries in the developing world:

The level of seniority that is age-related in the traditional societies of developing countries has no place in the digital arena, where individuals must be fearless of risk and change, and be self-motivated in learning the limits and opportunities of information technology and communication. [IFLA, 2002, p. 42]

We can assume that this fear of risk and change in relation to digital technology was also encountered in many institutions in the so-called developed countries at that time (and still today).

The chronological factor is evident in the guidelines; it can be noted in the descriptions of the state of digital technology and infrastructure, in the introduction of the OAIS, and in the references to Google Books (as a method of checking redundancy in [DFG, 2013] and as a method of dissemination in [JISC]). There is also a gradual change in the attitude towards digital archiving as a long-term preservation option.

391 [NPO, 2001, p. 9] 392 [RLG, 2000, p. 1] 393 We can note, both in the manufacturer’s advertising and the institution’s communication, overtones of technical advancement in reference to the page-turning scanning robots that today are becoming symbols of progress among the digitising library institutions.

But the main differences, when comparing the guidelines, seem to reside with the institutional type and the intended audience. An example is the way the topic of digitisation is introduced in [NPO, 2001]:

You may have been asked to ’get to grips with the technology called digitisation because everyone else is doing it!’ and now feel the need to research and understand the issues … [NPO, 2001, p. 4]

This view of digitisation can be contrasted with [RLG, 2000], which already a year earlier refers to an emerging digital ecology where digital files can be “re-purposed or recycled beyond their original circumstances”.394

The two quotes above reflect two very different takes on digital technology, based on their respective intended audiences. [Youngs, 2001] targets the heritage sector (the library, archive and museum professional), where digital technology at that time was gradually being introduced, apparently for purposes similar to other types of reproduction technologies – as a tool for making collections accessible (and with a potential for preservation).395 [RLG, 2000], on the other hand, is rooted within the academic community, where the potential of the technology as a tool for research purposes was already well established.

[IFLA, 2014] remarks in relation to digitisation guidelines as a genre: “Each successive document tends to enhance the previous ones rather than completely replacing them.”396 A similar observation is made by [Bülow & Ahmon, 2011]. As demonstrated in figure 30 (indicating how the guidelines in the sample cite or make references to other guidelines in their respective reference lists), this remark is valid.

394 [RLG, 2000, p. 6] 395 [RLG, 2000, p. 3] 396 [IFLA, 2014, p. 4]

Figure 30. The relation between the guidelines based on how they cite each other.

We find that the earliest two guidelines are referred to by a majority of the documents in the sample. This impact is not surprising given that both [RLG, 2000] and [NEDCC, 2000] represent pioneering work within the field. It is worth noting that even the most recent publications make references to these two. This is not to indicate that there is no historical development in the way digital technology is applied and utilised within the heritage sector, but rather that the contextual ramifications have been strong throughout the analysed sample of guidelines.397

It can also be remarked that some of the recommendations made in the early guidelines are still valid. The comment in [Minerva, 2004] that one should “if possible, include at least one person with appropriate information technology skills in the project team”398 still today reflects the harsh reality facing many smaller institutions with regard to staffing levels and the prospect of initiating the digitisation of their holdings.

Digitisation is, in the guidelines, generally seen as part of the activity referred to as collection management, thus indicating benefits from a preservation point of view (by decreasing handling) as well as for the purpose of enhancing access. In this respect, digitisation offers a transition in library operations, from collection policies based on building collections “in situ” towards delivering resources upon request.

397 The recent (and rapid) development regarding the semantic web and linked data has a profound impact on use and dissemination but not necessarily on the process itself. The requirement of [DFG, 2013], as a funding organisation, is significant for this realisation of the research potential. It is stated that applications for projects including text-based documents must always consider OCR processing in their planning. Full-text production is mandatory for projects directed at printed material produced in 1850 or later. 398 [Minerva, 2004, p. 55]

[Hughes, 2004] refers to this as the shift from a “just in case” approach to a “just in time” approach, where the library acts in response to users’ requests.399

From this it follows that the actual ownership of collections is no longer the single measure of an institution’s “value” from a research perspective. What counts is rather its capacity to provide the users with trustworthy and value-added services. Digital technology thus alters the role of the library institutions and the ways in which documents are engaged with.

The guidelines in the study originate in the library sector, where the default approach to the document as a provider of information is to make a distinction between the artefact – the carrier – and the information content – the text.400 With the exception of [IFLA, 2014], all guidelines adopt a more or less similar approach. A typical example is [IFLA, 2002], where digitisation is referred to as a method of preservation due to its capacity for “… separating the informational content from the degradation of the physical medium.”401 However, as observed in [NEDCC, 2000]:

Digital imaging is not simply another reformatting option in the preservation tool kit. Digital imaging involves transforming the very concept of format. Conway in [NEDCC, 2000, chapter II]

In section 2.5 it was argued that the digital resource is characterised by a tension between, on the one hand, a standardising procedure of production and, on the other, the option of tailoring the reproduction to meet a wide variety of requirements with regard to performance and dissemination. This indeed represents a profound change of format in comparison with other, non-digital methods of reproduction.

4.4. The process as the link between the source document and the reproduction

Digital technology consequently also affects notions of similarity. We find again that the criteria used in relation to digital reproductions will have to be based on other aspects than those used for, e.g., microfilm. The presentation of digital resources involves the transfer of information from digital storage and binary code to analogue manifestation, e.g. the display of visual signs on a computer screen. The capacity of a reproduction to replace access to the reproduced document has traditionally been based on notions of similarity or other aspects relating to visual appearance. The comment in [IFLA, 2002] is typical in this respect:

399 [Hughes, 2004, p. 13] 400 The approaches to the informative capacity of documents and objects in collections of different types of heritage institutions were discussed in section 2.1.2. 401 [IFLA, 2002, p. 8]

Ultimately, the higher optical quality of the digital surrogate together with the convenience of online access satisfies the research requirements of the user, and results in reduced handling of original material. [IFLA, 2002, p. 51]

As methods of processing and presentation have advanced, the visual resemblance of the reproduction can no longer be regarded as the sole criterion. As has been argued earlier, digital transmission might produce representations whose capacity is not based on visual criteria at all but rather related to algorithmic processing, or to the capacity of the reproduction as a reference or a pointer to its source document.

The use of digital reproductions for research places specific demands on the reliability of the resources.402 The question of trustworthiness has received significantly higher interest in the archive sector (where the emphasis is placed on the evidentiary capacity of the document) than among libraries.403

The [IFLA, 2002] guidelines discuss the issue of trustworthiness and claim that the surrogate, in order to be acceptable as a replacement for the source document, must have a guaranteed authenticity and be part of a preservation plan. In such a case it must be made clear that the source document has no “intrinsic value” and that the informational content of the documents (and if needed their ”physical appearance”) has been captured [IFLA, 2002, p. 33]. The approach to the question of authenticity can be seen as symptomatic of the interests in the legal aspects of documents as encountered within the archives sector rather than in libraries. Such an emphasis on the importance of the relation between reproduction and source document is not generally encountered within the library sector apart from in reference to the digitisation of documents that belong to the category “rare books and manuscripts”

402 Note that the concept of reliability shifts within the digital realm. Kirschenbaum refers to Wikipedia as a scholarly resource, especially in relation to information technology. He argues that the strong interest-driven community serves as a guarantee of trustworthiness. The option to follow the change history of entries adds further to the credibility of this resource. Kirschenbaum, Mechanisms, p. xvii. 403 See section 1.1.2.

and where the document has received the status of an artefact rather than one of several instances of a work.

The option of citing and referencing is a key requirement if a digital resource is to be useful for research purposes:

an internet resource is not just a copy of the original, which can be treated and hence quoted like the original, but rather an independent object in a dynamic integral research space. [DFG, 2013, p. 35]

[DFG, 2013] argues that this aspect was overlooked in the early days of digital technology but is today a central stipulation if the digital resource is to be of relevance in a research context. This citeability in relation to the document, and its individual pages, presupposes a robust link between the reproduction and the source document.

In this respect, the library sector differs from museums, where the level of description corresponds with the level of visual manifestation. The library document is typically catalogued on the manifestation level, whereas the digital representation as image is generally based on the item level.404 In most cases this conceptual gap is of no concern. But as the digitisation process produces representations that are used as information resources in their own right, while at the same time “referring back” to a source document, the resilience of this relation becomes increasingly important.

Moreover, we can here observe different approaches embedded in the guidelines. A factor that influences this relation is the set of assumptions according to which the information is captured – the definition of the informative capacity of the document. We have seen how this definition, to a large extent, is based on notions of information content. In many cases, the recommendation is to define the “good enough” level, representing the minimum level of capture. The argument here is that crossing the level of good enough will strain the technical processes and increase the requirements on storage capacity, and thus impede the outcome of the process. The general suggestion here is to match the process of transmission to a predefined informative content of the source document.

404 According to the principles outlined in FRBR, the catalogue information generally refers to the printed document on a manifestation level; it does not describe the individual book but rather refers to all books that have the same characteristics.

There are other approaches. [RLG, 2000] uses the concept “rich masters” in reference to the recommended practice of creating high-quality master files by setting standards higher than present use implies. Such files will be useful over time, i.e. worth being preserved. The underlying principle is “scan once, use many times” and, for that purpose, to produce a “purpose blind” master file.405 [JISC] recommends that capture should be performed at the highest colour setting and the highest resolution necessary (or higher than current needs) as a means of “future-proofing” the digital resource.406

[Hughes, 2004] argues that the development of digital resources should not primarily be focused on simply converting documents from analogue to digital format but should rather “push the boundaries of what is possible in research or access”.407 This approach will require a certain amount of information redundancy to allow for future research potential. We will return to the consequences of such approaches to information redundancy in the interview study (chapter 5) and the discussion in chapter 6.

Section 4.1.1 demonstrated that the chain metaphor, although indicating a process of interlinked activities, is not sufficient to represent the complexity of the digital transmission seen as a whole. Figure 29 (p. 151) is an attempt to present a bird’s-eye view of the process. Where the digitisation chain indicates that there is a starting point and a final product, it can be argued that potential products are produced at each stage in the process.

The findings, therefore, provide an alternative to the conception of a process with a distinct input and output. Figure 31 illustrates how each point in this series of exchanges can be seen as representing a potential informative resource and, at the same time, providing an interface for access.

405 [RLG, 2000, p. 25] 406 [JISC] 407 [Hughes, 2004, p. 9]

Figure 31. Some examples of information transfer that take place during the digital transmission.

This structured view of the process can be supplemented with the different intentions and principles of the digitising institutions and organisations, as expressed in the guidelines. The digital resource can thus be described as highly structured and, at the same time, able to restructure according to different interests and ideals.

By referring to the digital resource in terms of a boundary object, we can accommodate this twofold aspect of the document; it is both conditioned by a standardising technical system that mediates the information, and can form the structure required for its identity and management.

The digitisation process can, in this way, be seen as an extensive system of information transfer with the potential of creating several types of products, modes of reproduction, as defined in section 2.7. The image on the computer screen should therefore more truly be regarded as one (albeit the primary one) of several informative products that are generated in the digitisation process. In the guidelines analysed in this study, transcriptions (OCR processed) and metadata (catalogue information) are the primary complements to the visual representation. The digital resource can thus be envisaged as a system of binary encoded information that enables a vast number of potential engagements.

However, the span of potential points of engagement is structured according to the modes of representation as enabled by the design of the digitisation process in question. Figure 32 demonstrates this relation.

Figure 32. The digital resource with potential points of engagement indicated in relation to the modes of representation. Observe that the metadata potentially covers the entire resource whereas transcription and image represent more restricted aspects.

In the model, the digital resource is represented as mirroring the process of digitisation. Each point of information exchange in the process (as modelled in figure 31) also represents an aspect that can be used as a point of engagement, either as a link for further exploration or as an informative resource on its own. The model also demonstrates the capacity of the modes of representation (as encountered in this study). The metadata, at least in theory, covers every aspect of the digital resource. The image-based representation and the transcription represent only limited aspects of the resource.

4.5. Summary

The findings indicate that there is a core sequence of stages in the process, encountered in all the guidelines in the study. However, the apparent linearity of the digitisation process is challenged, as it has been shown that information is produced as well as introduced at each stage in the process. The stage of selection encompasses both the transfer of bibliographical information (e.g. the identification and location of documents to be digitised) and preservation information (the condition of documents and possible conservation treatment required as a preparation for capture). During capture the master file is produced together with structural and technical metadata as well as information regarding the identification of the digital file and its relation to the source document. The processing stage creates new sets of metadata and files, derived from the master file for specific purposes (e.g. low-resolution display files, compressed image files for download, thumbnails for orientation, files in pdf format to be printed locally). The derived files can be processed to produce text files (e.g. via OCR) that can be used as a base for further editing or algorithmic analysis. Such processing, in combination with segmentation, can also be applied to further enhance the searchability and usability of documents represented as images. Archiving involves the creation of administrative and preservational metadata and database structures as well as mapping with existing catalogues and indexes. In the stage of presentation the digital resource is introduced into systems of retrieval and display. This stage might also involve linking with other resources internal and/or external to the institutional infrastructure, as well as enabling public interaction via tagging or crowdsourcing-related activities.
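The core sequence of stages and their informative products can be sketched schematically. The following Python sketch is purely illustrative – the class and field names are my own and are not drawn from any actual digitisation system or guideline – but it makes concrete the observation that every stage, not only the last, yields potential informative products:

```python
from dataclasses import dataclass

# Illustrative model of the five-stage core sequence found in the
# guidelines. All names are hypothetical, for demonstration only.

@dataclass
class Stage:
    name: str
    outputs: list  # informative products generated at this stage

PIPELINE = [
    Stage("selection", ["bibliographical information", "preservation information"]),
    Stage("capture", ["master file", "structural metadata", "technical metadata"]),
    Stage("processing", ["derivative files", "OCR text", "derivative metadata"]),
    Stage("archiving", ["administrative metadata", "preservation metadata"]),
    Stage("presentation", ["retrieval and display systems", "links", "user tags"]),
]

def products(pipeline):
    """Map each stage to its outputs: every stage is a potential point
    of information exchange, not merely a step towards a final image."""
    return {stage.name: stage.outputs for stage in pipeline}

print(list(products(PIPELINE)))
```

In this reading, the dictionary returned by `products` corresponds to figure 31: each key is a point of exchange that can serve as an interface for access.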

We have thus established a model of the digitisation process and also, based on this understanding of the process, conceptualised the digital resource. Although the core sequence of the process is stable and remains the same throughout the guidelines in the analysis, we find that the parameters of the activities in the process vary considerably.

In terms of a boundary object, the digital resource can be seen as framed by three modes of representation: it can be represented as metadata, transcription, or image. But the analysis of the guidelines has also served to demonstrate that these modes are not fixed as to their capacity but rather represent a span of possible engagements based on the interests of the user.

With this as a foundation we will now proceed to the second empirical study where this understanding of the process and of the resulting digital resource will inform the analysis of the informative capacity of documents in digital format.

5. THE DOCUMENT IN DIGITAL FORMAT

This chapter presents the findings from the second empirical study. It consists of a series of interviews with users of digital reproductions of text documents. The study of the digitisation process (chapter 4) and the examination of document theory in chapter 2 demonstrate how the transmission from analogue to digital format results in a document that can be described as existing in a distributed state, open to editing and processing – a boundary object. It was also argued that the rendering of this object is conditioned by the modes of representation, i.e. the way in which the document is represented, given the design of the digitisation process. The modes of representation available to the group of users in this study consisted of visual representations (image), OCR-processed and manually transcribed texts (transcription), and bibliographical information provided by the databases (metadata).

The modes of representation set the conditions for engagement with the reproduced document in digital format. So, for instance, if the user’s primary interest is to read a particular text, the associated catalogue information might provide means of bibliographical identification and contextualisation but it will not present the actual literary text. Conversely, accessing a transcribed text might be of little use if the user’s intention is to examine the typeface design and page layout chosen for a specific edition.408 However, as will be demonstrated, accessing a digital resource generally involves combinations of these modes of representation in ways that meet the requirements of the user.

The capacity of the reproduction to represent the source document can, in this perspective, be seen as residing in the digitisation process. However, as has been argued in the discussion on the informative capacity of documents in section 2.1., the informativeness of the reproduction is also dependent on the user’s engagement with the digital resource, that is, the situational aspect of a document being informative. While the previous study addressed the digitisation process, the purpose of the interview study is to focus on this second perspective, the factors that condition this interaction. It will do so by examining the strategies chosen by a group of researchers in their use of digital reproductions of text-based documents.

408 Unless of course the transcription is enriched with a substantial degree of encoding, thus providing the required information.

The first section in this chapter, 5.1, covers the procedures of analysis. The following section, 5.2, addresses the concept of informative configuration as applied in the in- terview study. The third and final section, 5.3, summarises the interview study. The findings are discussed in chapter 6.

5.1. The analytical procedure

The analysis of the interviews focused on identifying and describing the engagement of the user with the digital resource. Based on the notion of passage points, as defined in BOT, the concepts of modes of representation and informative configuration were used to model this interaction in the analysis.409

Figure 33 outlines the basic procedure of analysis of the interview transcripts based on the approach suggested by Mayring.410

409 The application of BOT and the associated concepts is discussed in sections 2.8.1. (p. 101) and 2.8.3. (p. 106).
410 Mayring, ‘Qualitative Content Analysis’.

Figure 33. The main steps in the analysis of the interviews, based on a model adapted from Mayring.

We will now proceed with a more detailed description of the analytical process. The unit of analysis was defined as covering accounts of interaction between the researcher and the digital resource. Two characteristics of such interactions were outlined:

• Descriptions of, and references to, procedures used to identify and engage with information resources.

• Descriptions of, or references to, factors that affect (enhance or constrain) the capacity of the user to establish a required informative configuration.

The process of analysis started by reading the interview transcripts in order to identify sequences of relevance. The informants had been asked to describe typical working procedures. A formal interview protocol was not relied on, and accounts of relevance for the study were therefore not standardised. In order to identify instances where informative configurations were established, the interview transcripts were examined for accounts of using digital resources in order to access features of a source document, and for accounts of factors that enhanced or impeded this capacity. Several readings of the transcripts were required to identify all sequences in which the respondents described the details of their interaction with the digital resource.

The following aspects formed the basis for the analysis of the interviews:

• “interest” – the informant’s reason for using a digital resource.
• “strategy” – the means whereby the informant establishes an informative configuration.
• “mode of representation” – the way in which the digital resource is represented.
• factors that influence the establishment of the required informative configuration.

See appendix 2 for an example of the coding process.

As demonstrated in section 2.4 the informative dimensions of a document are coexistent and interact. This observation has consequences for an examination of how the document is rendered in digital format. Basing the analysis on informative dimensions in isolation (e.g. trying to analyse the text-based content without considering its relation to the material structure of the document or the document’s status in a social context) would consequently not only prove counterproductive but also align the study with the theoretical framework associated with the conduit metaphor of communication – an approach that the study set out to avoid at an early stage.411

Therefore, the approach taken here consists of studying how the user establishes an informative configuration according to a specific interest in a given situation of use, i.e. the interaction between the user and the informative dimensions of a source document as conveyed in the modes of representation. This relation (seen as a strategy) will be modelled according to the principles outlined in figure 34.

411 See the discussion in section 2.6.

Figure 34. The generic model used to demonstrate the concept of informative configurations as observed in the interview study. The grey lines indicate relations and interactions.

The example above describes a fictitious situation where a user engages with the metadata to access information as a means of identifying a book, for instance by relying on the catalogue information (the dimension CONTEXT). The text-based content of the book (the dimension SIGN) is rendered as a visual representation, an image. Using the terminology of BOT, this interaction can be described as the creation of a passage point in relation to the digital resource that enables the user to access the text of the requested book.

The purpose of this modelling is twofold. It demonstrates that the concept of informative configuration can be used to describe the interaction between the modes of representation (in the above example metadata and image) and the informative dimensions of the source document (in the above example SIGN and CONTEXT). It can also be used to demonstrate, as we will see later on in this chapter, that the transmission results in displacements and overlaps – that there is no 1:1 relation between source document and reproduction. As argued previously, the modes are not isolated and exclusive but rather encompass a range of potential points of engagement. However, some aspects of the source document are always prioritised in the transmission at the cost of others in the conversion between analogue and digital formats. The study shows that in some instances this is not a matter of concern, while in others measures are taken by the users to compensate for this loss.

We will now apply this model in the analysis of the interviews.

5.2. Informative configurations

Several examples of use described by the respondents consisted of identifying a specific document using information provided by the library catalogue. The account of R7 – working with reader services – is representative of such an engagement.

on such requests I usually carry out a search in bokhylla.no … but occasionally I use Google when I’m looking for that type of information Respondent 7.

This interaction can be modelled as the user relying on metadata to identify the documents and to access the content of the book via the visual representation of the text. Although this interaction appears relatively straightforward we will see how the details of the configuration vary with the intent of the user (see figure 35).

Figure 35. Respondent 7’s engagement with the document as cited above.

In the example provided by R7, any copy of the novel requested (and perhaps even any edition) would do. The user simply wishes to access a text with a specific title, written by a specific author. In other instances, more particular demands may be placed on the sought document and the procedure of identification would, consequently, have to be more precise.

Respondent R1 exemplifies how the VD16 database is used to analyse political and religious influences on a historical person on the basis of reconstructed book inventories.

… if I now proceed by clicking the link provided by the catalogue to … the digital image I can recreate what [naming a historical person that is the topic of R1’s enquiry] whom I know owned the book, has read or might have read … and consequently, I can indicate the underlying information that this person had access to Respondent 1.

Similar approaches were described by R13 and R11 who, in the same way as R1, were able to track a flow of ideas and beliefs by scrutinising the literature available to a given historical person. Here, the interest in the source document is consequently focussed not only on its text but also on the fact that the edition has been in the possession of a specific person at a given time in history. This leads to an informative configuration (figure 36) based on how bibliographical identity (the dimension CONTEXT) provides information on the edition and on the text-based content (SIGN).

Figure 36. Respondent 1’s engagement with the document as cited above.

The researcher in this example is interested in accessing a distinct version of a text. The mode involves the same type of interaction as in the previous example (figure 35) but the requirements placed on the configuration vary. The demands on the capacity of the metadata (the library catalogue) are raised, as the actual edition of the work is of relevance here.

There were several examples in the interviews where the process of inquiry led to the need to examine the specific provenance of individual copies. In such cases the focus was generally not on the text-based content of the document itself but rather on the annotations and inscriptions that could be associated with a particular owner:

When I wrote my dissertation on [naming a historical person] I read his personal copies with annotations and comments and this helped me to better understand another person’s reading habits. In this sense books are not solely providers of texts; they are artefacts as well… and unique. Respondent 13.

Figure 37 demonstrates how the critical aspects of the informative configuration in this example concern two factors: the capacity of the image to render the written annotations in the document and the accuracy of the associated metadata.

Figure 37. Respondent 13’s engagement with the document as cited above.

The requirements in this configuration are detailed with regard to the individual document, especially concerning its provenance and information on historical ownership.

None of the users of bokhylla.no referred to the use of visual representations of the physical features of source documents, e.g. binding structure or design. However, several of these respondents (R4, R5, R8, and R9) described instances where visual representations of text pages, accessed via bokhylla.no, were used in the process of collation. In such cases the image was used as a reference in the preparation of critical editions. The informative configuration is based on the visual representation in order to access text-based information as well as to verify the physical make-up of the document, e.g. the spatial organisation of the text (figure 38).

Figure 38. Respondents 4, 5, 8, and 9’s engagement with the document as referred to above.

The four examples above of informative configurations are based on the interaction between user, metadata, and image, but as demonstrated, the requirements on the particulars of each configuration vary with the particular interests of the researchers.

When asked about factors that influence the notion of trustworthiness, some users, e.g. R1, R2, R3, and R14, declared a high degree of confidence in relation to the VD16 database. Respondent R1 described this trustworthiness as the result of the commitment of the VD16 as a long-term bibliographical endeavour based on strict adherence to bibliographical procedures. The actual format of dissemination, whether as a database or a printed publication, is in this case of lesser importance than the fact that it is based on historical continuity and firmly established bibliographical practice.

If they [the VD16] state that the Stockholm copy is the same as the St Gallen copy then I can take this information to be true. Respondent 1.

Here the database is used as a reference to verify the bibliographical status of a document, as demonstrated in figure 39.

Figure 39. Respondent 1’s engagement with the document as cited above.

The reliability of VD16 was compared with the Swedish union catalogue (Libris) by two of the respondents, R2 and R14. Both maintained that Libris was significantly less accurate and offered less bibliographical detail. They pointed out that the Libris cataloguers, in most cases, were not specialists but librarians with only a general competence in the field of descriptive bibliography.

A consequence of not trusting the reliability of the digital resource was described by R10, a researcher in the field of comparative literature. This respondent was engaged in a project where the texts of published first editions were compared with the manuscripts submitted for printing. The objective was to identify textual alterations that occurred outside the proof-reading procedures. This consequently required the verification of the bibliographical status (in this instance the specific edition) of the documents that were to be examined. Bokhylla.no offered, to this end, an alternative to working in situ in the reading room of the Norwegian National Library, as the database presents documents digitised in their entirety, i.e. the digitisation process was designed in such a way that digital capture and representation included the information regarding publisher, printer and date. Asked to elaborate on these requirements R10 explained:

… but I need all the pages … if they’d begun on the first page of text I wouldn’t have been able to use it
Interviewer: Why not?
Because I wouldn’t have been able to tell which document it was … even if it had been described […] even if it really was a specific edition and this information was provided by those who had carried out the digitisation, I would not have been able to trust it completely … In such a case I would have had to find the physical copy and check it … Respondent 10.

In this instance, the image provides access to the text-based content (SIGN) while at the same time confirming its bibliographical identity. This visual representation also establishes the identification of the document with the required degree of certainty that, according to the standards set by the respondent, would not have been achieved by relying solely on the catalogue information. Figure 40 demonstrates this interaction.

Figure 40. Respondent 10’s engagement with the document as cited above.

The critical aspect of this configuration lies in the image-based representation of the information that confirms the bibliographical identity of the document (together with the edition, date of printing and printer)412 and the representation of the literary text.

Demands on visual representation vary with regard to the verifiability of the document’s identity. The capacity of the images is also related to the completeness of capture and representation. Several of the respondents had experienced obstacles caused by having to rely on digital reproductions where parts of the document were missing. Such exclusion could either be the result of insufficient quality management in the process or simply of details being considered uninformative and therefore omitted during capture or trimmed during the post-capture processing of the image files.

412 This information will typically be found on the title page of the book (as an imprint) or at the end of the book (as a colophon).

It was observed that tolerance for missing details, such as the edges of the text page not being visible, or omitted fly leaves413, is higher in relation to printed documents than is the case in similar instances concerning manuscript material. Given the particular focus of R9 (philology and TEI encoding), this researcher would not have accepted decisions to exclude areas of a manuscript folio (such as edges without inscriptions) on the grounds of being uninformative. In this perspective, the visual representation is seen as verifying its own trustworthiness.

… it has to do with being sure that nothing is omitted Interviewer: And you don’t trust the decision of those carrying out the digitisation process? … In the case of a printed text I would normally have no problem with it but in relation to a manuscript … I wouldn’t have accepted it in relation to a manuscript page … there could have been something missed. Respondent 9.

A reason for this is presumably that the printed book is a relatively standardised document format compared to the manuscript, which has a lower degree of structural consistency. Many respondents referred to how the procedures in the digitisation process during the stages of capture and processing can be used to enhance the usability of the reproduction, e.g. by including margins or increasing the resolution and bit depth. Such measures increase the interpretative flexibility and, potentially, the usability of the resource, albeit at the cost of a larger file size.
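The file-size cost of raising resolution and bit depth can be illustrated with a back-of-the-envelope calculation. The sketch below uses uncompressed figures and illustrative page dimensions (an A4 page at 300 and 600 ppi); it is not drawn from any guideline in the study:

```python
def uncompressed_size_mb(width_px, height_px, bit_depth_per_channel, channels=3):
    """Approximate uncompressed image size in megabytes (MiB)."""
    bits = width_px * height_px * bit_depth_per_channel * channels
    return bits / 8 / 1024 / 1024

# An A4 page at 300 ppi (~2480 x 3508 px), 8 bits per channel, RGB:
print(round(uncompressed_size_mb(2480, 3508, 8), 1))   # roughly 25 MB
# The same page at 600 ppi (~4960 x 7016 px) and 16 bits per channel:
print(round(uncompressed_size_mb(4960, 7016, 16), 1))  # roughly 200 MB
```

Doubling the resolution quadruples the pixel count, and doubling the bit depth doubles the size again: an eightfold increase for one capture decision, which is why the guidelines weigh interpretative flexibility against storage cost.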

Many respondents refer to using the visual representation as a method to establish the bibliographical identification of documents. Such approaches are usually based on the comparison of details that are difficult to describe in bibliographic terminology, e.g. elements of typographical decoration or variations in composition and page layout.

Comparison based on matching an unknown document with the visual characteristics of an image in the database is an approach used in several accounts in relation to the VD16 database (R1, R2, R3, and R14). VD16 lacks any automatised pattern recognition features and this approach is consequently, to a great extent, dependent on the associative capacity of the users – on the knowledge and experience that guide the researcher in the right direction while maintaining a manageable selection of possible visual matches. Such a process is described by R3:

413 The fly leaves are sheets in a binding that precede and follow the printed sections in the book. They are inserted as a structural component during the process of binding and are not part of the printer’s production.

… When I worked with the 15th century imprints and the collection of fragments … this was a typical procedure … I had a fragment of a printed text without any title page or colophon … The fragments didn’t usually give any clues, any indications at all […] Well, I have to work as a classical book history researcher … What typeface is it? How can it be dated? And if I then find examples in databases on the web that also provide images … and in the correct dimensions […] and that you even can measure and estimate the size … this is a tremendous help in establishing an identification … Respondent 3.

The image of the text in the database is here used to enable identification by matching the typographical features of an unidentified document with a relevant reference. Figure 41 demonstrates how the representation of the physical characteristics of individual types and design features, based on the layout and disposition of the document as a whole, are significant factors in this configuration.

Figure 41. Respondent 3’s engagement with the document as cited above.

The procedures in the transmission process during capture and processing, particularly with reference to maintaining the dimensional stability of the digital reproduction, are crucial here, as is the relation between visual representation and bibliographic information.

Visual representation can also provide information regarding the “contextual embedding” of a transcribed text sequence. One respondent, a lexicographer, R6, referred to resorting to pdf files in order to verify the genre of texts retrieved in a text corpus containing OCR-processed daily newspapers.

… in such a case I would refer to the pdf image in order to see the actual page because it’s more reliable when you can evaluate yourself. Whether this is correct … regarding the context of the article … is it an image caption? … Does it originate from the family pages? … If you have a grandmother that turns 70 ... This is not the kind of source we would use … primarily, use editorial texts Respondent 6.

Although the category of the newspaper section from which the text is captured, such as an editorial or the family page, could be worked out by other means, an examination of the visual representation of the actual page in pdf format proved a more efficient solution to the need to establish the relation with the source document, as demonstrated in figure 42. The configuration here is focused on assessing the text-based content, but relying solely on the transcription will not provide the required information regarding the identity of the source – its context. The visual representation is required to establish the genre of the particular text sequence and consequently its suitability for the intended purpose.

Figure 42. Respondent 6’s engagement with the document as cited above.

With the text as the main interest, a set of variables is introduced. The text can be represented either as transcription or as an image. These approaches emphasise two – sometimes conflicting – aspects of the text: the text represented as a graph (as image) or represented as individual signs (as transcription).

The text represented as image emphasises its material dimension, e.g. as an inscription on a page in a book. As long as the identity of the image is established and its relation to the source document is robust, its trustworthiness can generally not be questioned. The transcribed text offers possibilities to use research procedures based on algorithmic processing or enrichment based on its markup. At the same time, such procedures introduce variables that are more or less difficult to control. Respondent R3 refers to how, depending on the database used, spelling variations have to be considered, i.e. whether the “V” in the book title has been transcribed as a “U” in the bibliographical metadata due to normalising conventions or if the original spelling is retained. If the catalogue adheres to the original spelling, catalogue searches can be based on the spelling and wording as encountered in the source document.414 Similar approaches were described by R1 and R14.

Automatised transcription processes, such as OCR, are commonly used to enable full-text searches. Such procedures can also form the base for full-text searches in image-based reproductions. The ASCII code created in the OCR process can be used to generate a visual indication in the “image layer”, indicating a match with the search criterion used.415 This functionality is enabled by adding coordinates to the OCR-processed image, thus establishing the spatial location needed to identify a letter or a text sequence in relation to its visual representation as an image.416 This type of application requires a fixed relation between the image layer – the visual representation of the text – and the transcription, which serves as the machine-processable reference to the visual representation. Any subsequent processing of files, e.g. compression, will have to consider the dimensional integrity of the image in order to maintain the registration between text and image. The process of OCR is thus dependent on the dual characteristics of the text: as a graph and as a sequence of signs.

414 The use of authority files provides another way of addressing the problems that might arise as a result of alternative spelling conventions.
415 “Layer” is here, in the discussion on OCR, used metaphorically rather than indicating any spatial relation.
416 Digital technology thus facilitates the interaction between the two principal modes of text representation: facsimile and transcription. The Samuel Beckett Digital Manuscript Project is an example of a highly creative approach to the presentation of manuscripts. Here the facsimile (the images of pages with Beckett’s writing) can be overlaid with a “looking glass” that acts as a dynamic transcription device and represents the handwriting in machine-readable transcription as it is manoeuvred on top of the image of the manuscript.
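The principle of registering the text layer against the image layer by means of coordinates can be illustrated with a minimal sketch. The data model below is hypothetical (real systems typically use formats such as ALTO XML); it only shows the core idea that each recognised word carries a bounding box in the pixel space of the captured image, so that a full-text match can be indicated visually, and that the box must be rescaled whenever a derivative image of different dimensions is displayed:

```python
from dataclasses import dataclass

@dataclass
class OcrWord:
    """One recognised word with its position on the page image."""
    text: str
    x: int  # left edge, in pixels of the master image
    y: int  # top edge
    w: int  # width
    h: int  # height

def find_highlights(words, query, scale=1.0):
    """Return bounding boxes for matches of `query`. The `scale`
    factor compensates for derivative images resized after OCR,
    preserving the registration between text and image layers."""
    hits = []
    for word in words:
        if word.text.lower() == query.lower():
            hits.append((round(word.x * scale), round(word.y * scale),
                         round(word.w * scale), round(word.h * scale)))
    return hits

# Two words on a fictitious newspaper page image:
page = [OcrWord("Kristiania", 120, 80, 260, 40), OcrWord("1885", 400, 80, 90, 40)]
print(find_highlights(page, "kristiania", scale=0.5))  # box for a half-size display file
```

If the display file is compressed or cropped without adjusting these coordinates, the highlight no longer lands on the word – which is why the guidelines treat the dimensional integrity of the image as a constraint on all post-capture processing.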

However, as R5, who uses such text files as a base for further manual editing in the production of critical editions, observed: processing text on the condition of maintaining its dimensional stability (in order to match the ASCII code layer with the visual representation of the text in the JPEG code layer) to some extent counteracts the intent of marking up the text in order to treat it as a linguistic structure (in this specific case, hyphens at the end of the text lines in the source document had been removed during the conversion, resulting in fragmented text sequences).

… There is a lot of encoding in the images … it is very much based on WHERE in the page we are … but I’m more concerned with […] WHAT it is. Is this a section? Is this a heading? I’m more concerned with the structure of the text […] … they [the producers of the OCR processed text] are concerned with the structure of the page … it is a sort of conflict of interests. Respondent 5.

Two respondents described their uncertainty regarding the quality (the extent of correct conversion) of the OCR processing of the material as a factor impeding their use of the full capacity of free-text searches to identify exact sequences of text.
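One common way of mitigating uncertain OCR quality, though not one mentioned by the respondents, is approximate string matching, which tolerates characteristic recognition errors instead of requiring exact sequences. The following is a minimal sketch using Python’s standard library; the word list is a hypothetical OCR output.

```python
import difflib

def fuzzy_find(words, query, cutoff=0.8):
    """Return words in the OCR output that approximately match the query.
    Tolerates typical recognition errors, e.g. 'w' read as 'vv'."""
    return difflib.get_close_matches(query.lower(),
                                     [w.lower() for w in words],
                                     n=5, cutoff=cutoff)

# OCR output containing a characteristic misrecognition.
ocr_words = ["Kristiania", "nevvspaper", "modem", "archive"]

print(fuzzy_find(ocr_words, "newspaper"))   # finds 'nevvspaper' despite the error
```

The cutoff parameter trades recall against precision: lowering it surfaces more damaged words but also more false matches, which mirrors the respondents’ dilemma of not knowing how far to trust the converted text.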

However, R8 described how OCR processing had opened up new forms of inquiry. Following a more explorative approach to information seeking, this researcher haphazardly used the option of full-text search for a digitised daily newspaper, thereby identifying an article that provided valuable information.

I would not have found it … it’s a coincidence, isn’t it? … it could very well have been that this newspaper wasn’t digitised … but if I’d sat down with the official newspaper … “Kristiania” from the 19th century with a Gothic typeface … and working through a volume, page by page … even if the newspapers were small in size, perhaps only four, five printed pages, it’s a tremendous undertaking … and it’s a lucky coincidence that these newspapers were OCR processed … I don’t think I would have had the energy to search manually … I probably wouldn’t have thought there was anything to search for … I wouldn’t have thought it had any relevance … Respondent 8.

The user’s interest lies here in identifying possible occurrences of references to a specific historical event. The transcription, provided by the OCR processing, enables the application of free-text searches as a research strategy. This approach is also followed by R11, R12, and R13. In terms of informative configuration, figure 43 demonstrates that this can be described as the interplay between text as transcription and text conveyed as image. The capacity of this configuration is largely dependent on two factors: the quality of the OCR processing, i.e. whether the conversion from image to text-based information has been reliable, and the topographical matching between the text layer (in which the search is performed) and the image layer (in which the result is indicated) of the representation.

Figure 43. Respondent 8’s engagement with the document as cited above.

In this instance, the OCR processing facilitates a more associative research approach than is possible by relying strictly on the search options enabled by the available catalogue information. Researcher R8 was not prepared to set aside the time and effort required to examine the newspapers in print format (based on the assumption that this would not provide any information of relevance). The full-text search afforded the possibility of examining the newspapers more efficiently, the quality of the OCR process being good enough to enable a sufficient matching between search criteria and document.

The role of image-based reproductions varied with the individual research interests. The demand for visual representation might be based on requirements regarding, for example, the copy specificity of associated metadata, the capacity to render specific details, or the readability of text-based information content.

R4 described the limits of a visual representation in enabling access to the material characteristics of a document of interest.

… but you can’t measure, you can’t use the image to estimate the size … how big is the book? … Is it heavy? … Light? … You can’t estimate the quality of the paper because you can’t touch it … is it thick or thin? Is it fragile?

Respondent 4.

The required information could, to some extent at least, have been rendered via the metadata associated with the digital resource, but there were still aspects that remained undefined.

The transmission to digital format can in this respect, by withdrawing the document’s material presence, be seen as separating the document from the user. However, there are also examples where the digital representation is perceived as actually closing this distance. R8 describes such a situation, where the visual representation of a document contributes to an enhanced experience of materiality. Referring to the experience of visiting an exhibition, where a medieval manuscript in a display case was accompanied by digital close-ups, one of them revealing the traces of a fingerprint, R8 commented:

As the resolution was very high, you could almost get a different layer of information from the material manuscript … Fingerprints that you perhaps would have overlooked were suddenly visible. […] I believe this is an important aspect of the materiality […] Simply the fingerprint will make you think of the person that has been sitting there writing. […] It is not that the digitised manuscript is a reduction of the manuscript … it can, on the contrary, be an enhancement […]. The understanding of a manuscript might be facilitated by being able to come very close … All this experience might be enhanced by being transposed in a different medium […] a different materiality […].

Respondent 8.

This configuration is primarily concerned with the material features of the document that are open to visual examination (figure 44). It can be assumed that the fingerprint that triggered R8’s imagination would not have had the same impact had it instead been described in a metadata field.

Figure 44. Respondent 8’s engagement with the document as cited above.

This respondent used the notion of “speculative flight” earlier in the interview in reference to the associations that were triggered by digital technology.

In this type of use, the respondent has to rely on personal expertise and experience in order to use the visual representation as an indication of the direction the inquiry should take. This associative dimension is accounted for in many of the interviews. It appears that efficiency in using images as a method for accessing informative resources relies, to a high degree, on the user’s pre-understanding and expertise, as opposed to relying on search strategies using metadata or the capacity of full-text databases. R8 reflected on this type of interaction.

Yes … the question remains what you should call it […] The bookshelf in the stacks enables a metonymical serendipity whereas the digital display has a more metaphorical effect […] The principle of proximity is exchanged with the principle of similarity… Respondent 8.

This exploratory approach, following a thread of enquiry to see where it leads rather than working towards a predefined goal, is referred to by several respondents. The impact of digital technology on research strategies obviously varies with research profile. When asked whether digital technology had affected research strategies, R12 reflects:

even if you now can browse around on the Internet … you can access a database and find stuff that seems interesting to an extent that was not possible before, simply because you didn’t have the time to order the material … but I think that you mostly work in the same way as before

… you use the secondary literature, previous work, find topics, and consult a specific work as a primary source … of course, you occasionally stumble over things that you previously wouldn’t have noticed … but, all in all, I would say that I do my research in pretty much the same way as before … Respondent 12.

The type of use described above refers to finding works to be consulted (as carriers of text). With a different research focus, the options of accessing material on the Internet radically change the way the research is carried out. R2 describes the revelations afforded by a catalogue search in VD16.

… What thrilled me incredibly was that they had images of the binding and these proved that the binding in our collection was identical to their binding. So it was a publisher’s binding, although from the 1490’s […] how could you possibly have discovered this from a detailed catalogue description? … to describe the binding in such detail that you can state that this copy is the same as mine … that’s not possible […] You understand that they probably produced perhaps […] 10, 20 copies in identical bindings that by the time were disseminated all round the world in various directions … This is tremendously interesting information … Respondent 2.

Here, the respondent uses the image as a reference when addressing the sociocultural status of a document, establishing that it is part of a bookbinder’s production run by observing that it shares significant features with the reference in the database. The digital resource is, in this example, engaged with as image and metadata. Figure 45 demonstrates such a configuration, based on the material characteristics of the source document – the style of binding – and its bibliographical identity.

Figure 45. Respondent 2’s engagement with the document as cited above.

However, in this instance, the focus is on the shape and structure of the document as an artefact, not on the layout of the text it carries. Similar use is also accounted for by the book historian, R3. The critical aspects of this configuration are the extent and relevance of the metadata, together with the completeness of the capture: the degree to which the required details of the document are represented, and the particulars of the procedures used, e.g. camera angles and type of lighting. This configuration requires the document to be rendered as a physical artefact with regard to the completeness of representation, but it also emphasises the consistency of the bibliographical metadata, i.e. that it relates to the individual copy.

A similar associative use of visual representations was referred to by R15, who addressed the strengthening of contextual aspects that can be achieved with the help of images.

It is rather the object’s ”Sitz im Leben” that is interesting … it can be interesting that there is a chain … that the book has been chained somewhere … Respondent 15.

Figure 46 demonstrates how this respondent uses the representation of material features of the document as a starting point for further explorations.

Figure 46. Respondent 15’s engagement with the document as cited above.

A different research approach is described by R1, who sees the database primarily as a provider of trustworthy metadata.

The main argument is not that it enables me to access collections without having to travel physically, but that it provides me with highly structured information. In this respect, the catalogue information represents the

primary interest and the visual information provides some added benefits. This information has enabled me to falsify several assumptions that have been taken for granted in my research community. Respondent 1.

References were also made by R4, R11, and R12 to such approaches where large amounts of material could conveniently be surveyed in order to rationalise their research.417 The methodology in these cases also frequently involved the falsification or verification of conclusions drawn in previous research, typically carried out prior to the introduction of digital technology.

R15 expressed concern regarding this type of research approach and the (in this respondent’s opinion) general tendency within the library sector to emphasise second-hand knowledge about a document at the expense of first-hand experience of its intellectual content, meaning, and impact. The actual encounter with the document is often accorded less importance than its position within the network of metadata. R15 describes this reliance on metadata metaphorically, as a conflict between accessing the actual document and relying on conclusions drawn on the basis of its description and classification:

With the metadata you also give me the proof: X is comparable to the copy in the diocesan library of Passau where blah, blah, blah … and I say: Wait! I want to read the text [demonstrates] … but you say: Look, it’s the same copy as in Passau. Fantastic proof!! [laughing] you say: At last we’re able to prove that these books were owned by the same person […] and I tell you that I want to open it, to read it … but you create a new … you enter these facts in the system directly without allowing anyone to simply interact on their own account … I don’t know what this object is … I don’t know who created the binding … I just want to look at it … Respondent 15.

5.3. Summary

The modelling used in section 5.2 for the analysis of the interviews enables a close focus on the user’s engagement with the digital resource. It uses the notion of passage points, as developed in boundary object theory, to analyse how the user establishes an informative configuration, given an interest in a specific situation of use. It is demonstrated that the informative configuration can be seen as a strategy which relies on the interaction between the user, the modes of representation, and the dimensions of the source document. It is also revealed that although configurations might involve the same modes of representation, their particulars can differ considerably due to the range of aspects that can be conveyed by these modes. Boundary object theory underlines the multiplicity of perspectives: the engagement with the object is not limited to a single viewpoint but is subjected to parallel negotiations. The particulars of the modes of representation, together with the expertise and experience of the user, make each such interaction more or less distinctive. The concept of informative configuration should in this respect not be seen as referring to a pre-defined set of engagements. The concept provides instead a way of structuring the components that influence the capacity of the digital reproduction to be informative.

417 One respondent referred to this practice as “ticking off” alternatives according to the hypothesis tested.

We have seen how the digital resource in this study can be described as moving between different modes of representation as it is engaged with by the users, with respect to their interests in the source document. Seen as a boundary object, the digital resource not only crosses the boundaries of representations but also influences their characteristics.

6. DISCUSSION

The analysis of the document- and information-theoretical underpinnings of the thesis is to be found in the theoretical study in chapter 2. The study of the digitisation process is the basis of chapter 4, and the study of the informative capacity of digital representations is covered in chapter 5. The purpose of this chapter is to return to the research questions and discuss them in light of the findings of the theoretical analysis and the two empirical studies.

The problem addressed in this thesis was formulated in section 1.1.5 as: How can the relation between the source document and its reproduction, as conditioned by the digitisation process, be problematised and conceptualised? This problem generated three research questions that frame the area of research and point to the key issues that have to be addressed:

RQ1: Which theoretical frameworks are applicable when analysing the digitisation process and defining its parameters, and how can they help us understand and conceptualise the relation between the digitisation process and the digital reproduction?

RQ2: Which factors in the digitisation process condition the characteristics of the reproduction and its capacity to complement or replace access to the source document?

RQ3: How is the informative capacity of the source document affected when made accessible in digital format and how does this influence the ways in which a group of scholarly users engage with and assess the potential of the digital reproduction?

The first research question is covered in section 6.1 and addresses the modelling of the informative dimensions of the digital resource. In section 6.2 RQ1 is further elaborated on in a discussion of the relation between the two concepts informative configuration and modes of representation. The theoretical framework, which is based on these concepts and the notion of boundary object, is returned to in section 6.3, as it guides the analysis of the interviews.

The second question, RQ2, concerns the process and how it influences the characteristics of the digital resource. The study of the digitisation process in chapter 4 demonstrates how this process is influenced by the tension between a standardising technology, the contextual setting wherein the transmission is performed, and the intended function of the resulting products. These features consequently influence the characteristics of the document in digital format. The concept of boundary object enables the document to be followed through the conversion back and forth between analogue and digital states. Section 6.1 introduces a modelling that will be used to support the understanding of the digital transmission as a process of interpretation. In section 6.2 we will return to the concept of modes of representation to demonstrate the various ways in which the source document can be represented.

RQ3 addresses the informative capacity of the digital resource. Section 6.3 builds on the findings in the interview study (chapter 5) and discusses the concept of informative configuration as it structures the relation between the way the digital resource is presented (the mode of representation) and the strategies taken by the user when engaging with this resource.

This chapter will thus progress from an initial focus on the theoretical framework and the modes of representation towards the characteristics of the digital reproduction and its capacity as a provider of information.

6.1. Framing the digital resource

The relation between the concepts of document and information was examined in section 2.1, and it was argued that these two phenomena are not fixed entities but should rather be understood as situationally conditioned. It was also demonstrated that the digitisation process further destabilises this relation, with consequences both on a conceptual level (the document as a distinguishable entity) and at a bit level (the document as binary code).

The digital transmission results in a deconstruction whereby the source document is initially converted to binary code in a distributed state, according to the parameters of the process, and subsequently reconfigured in order to be engaged with by the user. In section 2.7, the concept of modes of representation was introduced, referring to the principles according to which the digital resource is presented. It was argued that the digitisation of text-carrying documents commonly involves three such modes: image (the visual representation of a text or the material characteristics

of a document), transcription (the representation of a text as a sequence of signs, either by manual keying/encoding or automatised processes), and metadata (e.g. structural, administrative, technical, bibliographical metadata).

The modes of representation will consequently affect the capacity to render the document. The information that potentially can be gained from the digital representation of a document is, in theory, unrestricted. However, the characteristics of the modes of representation are subjected to the procedures applied during transmission and therefore conditioned by the parameters of the process.

The interview study demonstrates how a group of users interact with different modes of representation when accessing a digital resource. When transferred to digital format and managed within a digital infrastructure, the document can be seen as existing in a distributed state.418 When the digitised document is engaged with by a user, it is stabilised as part of establishing the fixity required for access and use. The user does not interact directly with binary code but rather with one or several modes of representation. The digital reproduction, as a product conditioned by the digitisation process, is consequently characterised by a high degree of flexibility.

Using the boundary object framework, the document can be described as fluctuating between states of flexibility (as binary code within a digital system) and fixity (fixed in modes of representation as engaged with by a user). In this way the notion of boundary object provides a constructive approach to the analysis of digital resources. The notion of boundary negotiating artefact implies that the object is not merely an inert phenomenon that acquires relevance by its ability to cross various boundaries but is rather an entity that reconfigures depending on the requirements in a given situation of engagement.419

In order to keep the document in focus throughout the process, boundary object theory is supplemented with the concept of informative configurations. This concept was introduced in section 2.4 to describe how a document’s informative capacity is based on the three dimensions of the document – FORM, SIGN, and CONTEXT. The underlying argument is that the information provided by a document is situationally conditioned, i.e. dependent on the intentions of the user in a given situation. This

418 See the discussion in the theory chapter on some key characteristics of the digital format, p. 67ff. 419 See the discussion on the application of boundary object theory in the theory chapter, section 2.8.3 (p. 106).

approach is taken in order to avoid the potential limitations caused by adhering to information theories based on fixed and predefined definitions of documents and their informative capacity.

The interview study (chapter 5) illustrates the relevance of this concept also in the digital domain, and we can consequently use the dimensions of the document to track how a user establishes a required informative configuration while engaging with the digital resource. How satisfactorily this configuration is set up varies according to the characteristics of the document, the particulars of the process, and the interests of the user. The concept of informative configuration can thus be seen as representing the strategy taken by the user in her engagement with the document in digital format.

In section 2.5 the document in digital format was scrutinised with respect to aspects such as “fixity” and “flexibility”.420 It was demonstrated that the digitisation process actually consists of two conversions – from analogue to digital format during capture and back to analogue when represented. These conversions influence the document on two levels. There is the bit level, i.e. where information changes status as it moves back and forth between binary code and analogue format. There is also the conceptual level, where the document, e.g. as textual inscription on paper, is transferred to a structure based on the digital logic of storage and display as separate entities.

RQ2 required an examination of the digitisation process, and resulted in the study presented in chapter 4. It was revealed that the five stages of the process are generally described in similar ways in the guidelines, despite variations in contextual settings and intended products. As these five stages are encountered in all of the analysed guidelines, they underline the relevance of the notion of digitisation chain. The differences between the various types of digital products described in the guidelines reside mainly in the parameters of each given stage of the process (rather than having to do with differences in the type of stage as such). These parameters are based on factors such as type of equipment, technical values, method of processing, and applied metadata schema.

Although many of these parameters are well defined in standards and best practices, the analysis revealed that there are also factors in the process that are considerably less regulated in the guidelines, e.g. guiding principles of selection, preparation,

420 See p. 76ff.

procedures during capture, definitions of the information potential of the source documents, and the bibliographical relation between the source document and the reproduction.

The interest and ideals of the digitising institution obviously influence the characteristics of the digital resource. The documents involved in the digitisation process are consequently exposed to two sets of destabilisation. As a binary formation, there is the deconstruction caused by the conversion into digital code, i.e. the transformation into numerical values, which is a central characteristic of “new media” according to Manovich.421 This transfer forms the basis of the modularity that characterises the digital resource. We can here observe the dissolution of the notion of document, seen as a conceptual formation, into a predefined set of modes of representation: image, transcription and metadata.422

As a result, it follows that an image-based representation can provide magnified images of cellulose fibres, close-up details of letter shapes, and page layout, as well as rendering the object as a whole artefact. Similarly, a transcribed text can be presented as a linguistic sequence, prepared for semantic analysis, or, when layered “beneath” the image layer of a visual representation of the text page, used for searching and navigating within the text-based content. The metadata, depending on the schema adhered to, has the potential capacity to cover almost every aspect of the document.

Digital transmission makes it possible to emphasise particular aspects of a document in isolation. It can be tempting to structure these aspects independently, for example by ordering the levels of different types of engagement from broad to narrow, from the document as a material artefact to its structural features and material makeup. It might appear feasible to use the potential inherent in the digital fragmentisation as a verification of the principle of “modularity” – referred to by Manovich as the digital format’s capacity to restructure and combine elements of media while maintaining their independence and identity.423 To uncritically take such an approach in relation to a digitised document will, however, result in a reductionistic and static conceptualisation of the source document, neglecting the interplay

421 Manovich, The Language of New Media, p. 49ff. 422 These modes are those typically encountered within the library context. It has to be noted that a different type of digitisation process, or a different institutional context, would obviously have generated another set of modes of representation. 423 Manovich, The Language of New Media, p. 51.

between the informative dimensions that constitute the document and how they interact in the digital resource.

Tarte suggests a structural approach to digital methodology that offers a way to address the conceptual space wherein the transmission takes place.424 She presents a model that originates in a computer-supported research process of decoding inscriptions on a collection of papyri fragments.425 Her model is presented in figure 47 in a slightly modified version. The research process is referred to as a series of interacting conceptual segments represented on a diagonal axis, leading from the artefact to the meaning of the document. On the right side of this axis, we find computer-supported research strategies, ranging from image capture via various types of image processing to contextual encoding. On the left side, the methodological approaches applied by the researchers in her study are listed.

424 Tarte, ‘Digitizing the act of papyrological interpretation’. Her work draws on the research presented in Terras, Image to Interpretation, 2006. 425 Tarte, p. 351.

Figure 47. Tarte’s model as adapted from ‘From artefact to meaning’.426 The focus of the research process is indicated on the diagonal axis, the research approaches are noted to the left of this axis, the computer-supported strategies to the right.

Tarte counteracts the risk of reductionism (modularisation) by referring to the segments in terms of intersections between “real world methodologies” (skilled vision, palaeography, linguistics, philology, and history) as deployed by the papyrologists in her study, and the application of digital technology in the process of interpretation (image capture, image enhancement/feature detection, letter shape ontology, knowledge bases, and contextual encoding). During the process of analysis, focus moves back and forth between these conceptual segments. The strength of this modelling is that it demonstrates the complex relation between the research procedure and the digitisation strategy.

Tarte addresses the meaning of the document, in the sense of the document as a carrier of text-based information. Her model has, for the purposes of this study, been re-used and modified. In section 2.4 a conceptualisation of the document, where its informative capacity is situationally conditioned rather than a fixed feature, was introduced.427 It was demonstrated that one and the same document could, depending on the interest of the user, be seen as fulfilling various informative purposes based on its three constituent informative dimensions. This conceptualisation forms the basis of the modification of Tarte’s model to include these three dimensions of the document, as they each (and in combination) can be seen as representing potential points of engagement with the document as represented in digital format. Figure 48 demonstrates the relation between the modelling of the document in section 2.4 and the “new” model based on Tarte.

426 Ibid.

Figure 48. (Left) The constituent dimensions of the source document as conceptualised in section 2.4 and (right) these dimensions as conceptualised in the digital resource.

This model transfers the notion of informative dimensions into digital format and underlines that the capacity of the digital resource to be informative (in the same way as the source document) is not a fixed property but rather a matter of negotiation. Figure 49 indicates some of the informative features (conceptual segments) as identified in the accounts of the respondents. The dimensions of the source document, as transferred into digital format, can be seen as consisting of a span of such segments.

427 See p. 71ff.

Figure 49. The model of the informative dimensions of a document with some conceptual segments indicated.

The strategies of the respondents in the study, as expressed in the informative configurations established and discussed in the previous chapter, can in this way be related to the informative dimensions of the document.

There are no finite demarcations to these dimensions, as the type of engagement can, depending on the user’s interest, involve a wide range of features related to the source document: from the physical artefact to the molecular composition of the cellulose fibres, the semantics of the text, or the position of the work within a citation network or societal structure. The document, as a conceptual unit, might consequently appear more or less rigid depending on the actual informative configuration. The interaction between FORM, SIGN, and CONTEXT thus challenges the ontological consistency of the document.

In the study of processes (chapter 4) it was observed that the parameters of the digital transmission can be designed to widen or narrow focus on distinct aspects of the source document in order to produce representations tailored to meet specific interests or ideals, e.g. of the producing institutions. It was also demonstrated that the more particular the focus, the higher the challenge posed to the conceptual consistency of the document.

The model will in this way support understanding of the factors that influence how users engage with digital reproductions. In the following section it will be used to demonstrate the lack of a causal match between the informative dimensions of the source document and the ways in which the document is presented in digital format – the modes of representation.

6.2. Representing the source document

The concept of the boundary object is, in this thesis, used to envision digital transmission as a fluctuation between states of higher and lower degrees of stability, influencing the binary as well as the conceptual formation. The modelling of the digitisation process in chapter 4 as a series of information exchanges that take place at each stage in the process further underlines this conceptualisation of digital transmission.

The various forms in which information can be rendered as products of the digital transmission process are potentially unlimited. However, as demonstrated in section 2.7, there are primarily three modes of representation employed in relation to text-carrying documents within the library context:

METADATA – bibliographical, historical, contextual, structural, administrative, and process-technical information relating to individual documents or collections.

IMAGE – visual representations of the texts as well as the binding and material structure of documents, either in their entirety or in detail.

TRANSCRIPTION – the text rendered in transcribed format, e.g. by OCR processing (at various levels of editing) or manually keyed.

Table 10. Modes of representation

Note that these modes should not be seen as three fixed formats but rather as encompassing a span of possible points of engagement with the source document, given the specific interests of the user. Digital technology enables a narrow or a broad focus on the source document. The digital resource can thus be reconfigured according to the boundaries it crosses, i.e. represented as an image, a transcription

or a set of metadata, but also with regard to a range of features within each mode. The model developed in the previous section will here be used to demonstrate the capacity of the modes of representation.

6.2.1. Form

The materiality of a document is arguably the dimension that is most obviously affected when the document is transferred to digital format. Figure 50 demonstrates how the capacity of the modes of representation differs with respect to representing material characteristics. The actual shape and structural composition of the source document might, depending on the design of the process, be rendered as an image. However, the material presence of the document, e.g. touch, smell, and weight, is not as readily conveyed. The physical features of the document are exchanged for the metal and plastics of the apparatus of access; the mode of interaction shifts from using the fingers to turn the pages to scrolling or clicking on a display.

However, the digital presentation can also be used to emphasise aspects that are not as discernible when engaging with the “physical” document. Several respondents referred to the digital technology as enabling a close focus that can enhance material features not readily perceived during “ordinary” handling and interaction with the documents. The account of R8 regarding the discovery of a fingerprint on the page of a manuscript, presented as a close-up, provides such an example.428

Digitisation processes designed with a “one size fits all” approach will always risk being considered insufficient, and lacking significant aspects, in the light of specialised interests. Several of the respondents in the study used Google Books for accessing references and text passages. R12 noted that:

Google Books is the best thing that’s happened in the field of digitisation ever … because it’s fast … the quality is crappy but that doesn’t matter … the texts are there, that’s how we [researchers] want it Respondent 12.

428 See R8 in the interview study, p. 190.

However, further on in the interview R12 added:

Let’s say I’m writing a monograph on a person where all the material … in manuscript format … actually is digitised … I would of course visit that library in order to touch it, smell it, see the originals […] there is an emotional dimension in this … Respondent 12.

Aspects of this type – the touch and smell – are referred to by several respondents as contributing to the informative capacity of the document. The “hands-on” experience is, however, difficult to pinpoint in terms of the exact implications of such characteristics for the design of the research process or the quality of its results.

Figure 50. The dimension FORM as rendered in digital format; the potential capacity of the modes of representation is indicated.

The material aspects of a document can also be conveyed as metadata. Apart from the metric information regarding the dimensions there is also specific terminology to consider, for example terminology from the field of analytical bibliography, which can be used to describe the bibliographic format and the structural make-up of a document in some detail.429 Similar terminologies exist within the discipline of book history for describing bindings430, and within conservation for recording the condition and state of a document431.

429 F. Bowers, Principles of Bibliographical Description. 430 J. A. Szirmai, The archaeology of medieval bookbinding, Aldershot, Ashgate, 1999.

Markup formats such as TEI provide the option of linking the transcription with the document architecture. In this way, the spatial organisation of the inscriptions and even their relation to the structure of the document can be integrated in the presentation of the text in transcribed format.
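Such a linking between transcription and document architecture can be sketched as a minimal TEI fragment. The file name, zone coordinates, and text below are invented for illustration; actual projects define considerably richer structures.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- The facsimile section describes the digital image(s) of the source -->
  <facsimile>
    <surface xml:id="page1">
      <graphic url="page1.jpg"/>
      <!-- A zone delimits a region of the image, here one line of writing -->
      <zone xml:id="line1" ulx="120" uly="80" lrx="900" lry="140"/>
    </surface>
  </facsimile>
  <text>
    <body>
      <p>
        <!-- The facs attribute ties the transcribed line to its position on the page -->
        <lb facs="#line1"/>First line of the transcribed text …
      </p>
    </body>
  </text>
</TEI>
```

In this way the transcription retains a pointer back to the spatial organisation of the inscriptions on the source page, rather than rendering the text as a bare sequence of signs.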

6.2.2. Sign

As in the case of the material aspects, there are several options for conveying the text-based information content of a document (figure 51).

The features of the inscriptions that make up the text might be captured and represented graphically – as an image – on a display. The visual representation’s efficiency in conveying text-based information is conditioned by how well the image renders the shapes of the inscriptions. Sometimes (especially in relation to handwriting), this also includes the capacity to represent the sequence of the strokes that have formed the letters and other features that are intrinsic to the manual writing process, e.g. the style and formation of letters and words, the spatial distribution of sentences, and even the physical indentation in the paper caused by the writing tool. These features might be crucial for the understanding of the text. R13 reflects:

It’s easier to read a printed book in digital format than a digital reproduction of a manuscript. In a manuscript you’re dependent on fine details like nuances in colour or the characteristics of the ink in order to be able to read difficult passages. Respondent 13.

The text-based content of a document can also be represented as a transcription – either by manual keying or by relying on automated procedures.432 Whereas an image makes it possible to retain in detail the characteristics of the text as instantiated on the page, the transcription will emphasise the text primarily as a sequence of signs.433 R7 refers to such aspects of possible significance for the meaning of a text:

431 N. Ash et al., Descriptive terminology for works of art on paper: guidelines for the accurate and consistent description of the materials and techniques of drawings, prints, and collages, R. Wolcott (ed.), Philadelphia, PA, Philadelphia Museum of Art, 2014. 432 The basic principles of transmission are discussed in section 2.3, p. 63. 433 As argued earlier, procedures involving markup will make it possible to render the structural and spatial aspects of the text also when presented as a transcription.

If the text has been written in a hurry or if it’s hardly discernible, in such a case the visual representation of the text might give you a feeling … you might be able to follow the mood of the writer … the colour of the ink … Respondent 7.

But this potential increase in interpretative flexibility provided by the visual representation comes at the expense of other aspects that are key features when the text is rendered in transcribed format, such as editability and the option of algorithmic analysis.

It can be argued that the manual writing process introduces so many variables that it is not comparable to products of print culture and, consequently, that the transcription would provide a more transparent transmission when performed on printed documents. However, even the products of printing processes, and particularly those from the pre-industrial period, display more or less significant individual variation, thus requiring more detailed attention to the characteristics of the individual copy.

It can further be argued that the markup procedure, in itself, can be seen as a conversion of the text into metadata. The resulting rendering, depending on the software used and the level of markup, can be designed to convey the text according to ideals concerning either the visual resemblance with the source document, e.g. by mimicking the spatial organisation of the text page, or the maintenance of the document architecture, by marking up the text elements. Such a procedure will also enable continuous rearrangement and editing of the text in ways not necessarily related to the appearance of the source document, such as data visualisation or re-use for aesthetic purposes.

The process of markup, regardless of type of source document, further introduces a cognitive variable into the transmission process. R4 reflected on the experiences of producing transcriptions from a collection of manuscripts in preparation for a scholarly edition:

We used to joke that we could recognise who in our team had done the encoding [in a specific text sequence], well, it’s not that subjective, of course, but there is a degree of interpretation involved, but that is also true when you read a printed book, there are layers of interpretation embedded in the typography as well. Respondent 4.

Algorithmically generated transcriptions – OCR – also represent an interpretative activity: the procedure is based on probabilistic components such as pattern recognition, sometimes also on linguistic matching through the use of thesauri, and on decisions concerning the parameters of the software involved.
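The probabilistic character of OCR is visible in the output formats themselves. The hOCR convention, for example, records a confidence value for every recognised word; the fragment below is a schematic illustration with invented coordinates and confidence values.

```xml
<div class="ocr_page" title="image 'page1.png'; bbox 0 0 2480 3508">
  <span class="ocr_line" title="bbox 210 340 1820 398">
    <!-- x_wconf is the engine's word-level confidence (0–100) -->
    <span class="ocrx_word" title="bbox 210 340 395 398; x_wconf 96">Fauna</span>
    <span class="ocrx_word" title="bbox 420 340 680 398; x_wconf 61">Svecica</span>
  </span>
</div>
```

A low confidence value does not identify the correct reading; it merely flags that the algorithmic interpretation is uncertain at that point, leaving the resolution to a human reader or a post-processing step.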

Figure 51. The dimension SIGN as rendered in digital format; the potential capacity of the modes of representation is indicated.

6.2.3. Context

Figure 52 outlines the dimension CONTEXT as rendered in the modes of representation.

As demonstrated in the interview study, the identity (title page, imprint/colophon), provenance (annotations and signatures) and history (signs of wear and use) of the document can be conveyed in a visual representation. The image can therefore be seen as acting as a supplement to the information available in the associated catalogue entry. Some of the aspects that relate to the context of the document are based on inscriptions (e.g. the basic bibliographical identification: title, author, date, and place) and can, consequently, also be represented as transcription. However, most of the aspects associated with this dimension can be provided as metadata. As an informative resource, metadata provides bibliographical, structural, administrative and technical information relating to the source document as well as to the digital representation. Depending on the schema adhered to, these sets of metadata can be used to define significant aspects and have the potential to cover almost all the informative dimensions of the document. Metadata is also a means of identification of, and access to, resources, e.g. via the search interface of a database. As metadata can cover almost every potentially informative aspect of the document and the digital resource, it follows that all activities in the digitisation process have to be considered as potential providers of metadata.
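As a schematic illustration, a minimal record combining Dublin Core and DCMI Terms elements might relate bibliographical identification, provenance and technical information in one metadata set. All values below are invented for the purpose of the sketch.

```xml
<record xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:dcterms="http://purl.org/dc/terms/">
  <!-- Bibliographical identification of the source document -->
  <dc:title>Fauna Svecica</dc:title>
  <dc:creator>Linné, Carl von</dc:creator>
  <dc:date>1761</dc:date>
  <!-- Provenance and physical description (invented values) -->
  <dcterms:provenance>Ownership inscription on title page</dcterms:provenance>
  <dcterms:extent>8:o</dcterms:extent>
  <!-- Technical information about the digital representation -->
  <dc:format>image/tiff</dc:format>
  <dc:identifier>urn:example:dig-0001</dc:identifier>
</record>
```

Even this minimal example shows how one record can simultaneously act as surrogate (the descriptive fields), supplement (provenance) and pointer (the identifier).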

Figure 52. The dimension CONTEXT as rendered in digital format; the potential capacity of the modes of representation is indicated.

6.2.4. Maintaining flexibility

As we have seen, Tarte’s modified model can be used to demonstrate the lack of a causal relation between the modes of representation and the informative dimensions of the document. The modes in which the document is presented can cover a wide range of aspects within each of the informative dimensions as well as narrow down on isolated features.

Figure 53 demonstrates such a range of options in relation to the 1761 edition of Linné’s Fauna Svecica.434

Figure 53. Various ways of presenting Linné (1761) (from left to right): cellulose fibres, close-up of a printed letter, page view, transcription, and MARC record.

However, there are aspects of the document that either cannot be covered at all or that require a complex interrelation between the modes in order to be engaged with by the users. The material aspects of the source document, the dimension FORM, are obvious examples of the effects of such loss in transmission. The text-based content, SIGN, can, to a certain extent, be encoded and reproduced without having to consider the material constraints imposed by the paper and the ink (provided that aspects of paper and ink are not significant for understanding the text). The document’s contextual status, CONTEXT, might also be affected when rendered in digital format. It can be represented “as is” – with no supplementary information – or it can be contextualised by establishing a robust link to the source document, e.g. by enclosing relevant metadata such as information regarding identity, provenance and bibliographic status. The digital transmission will thus influence the contextual setting of the source document by incorporating it into a digital network; a setting that differs considerably from the library as a physical place.

434 The Fauna Svecica was explored in the introduction of the concept of modes of representation in section 2.7 (p. 92).

6.3. Negotiating boundaries

In the previous section, it was demonstrated that the modes of representation are not unambiguous but encompass a range of potentially informative features. The displacement between the informative dimensions and the modes of representation, as discussed earlier in this chapter, can be perceived as restricting the capacity of the resource. However, this displacement can also be seen as creating an “overlap” that might be explored in order to compensate for features that are missing (given the specifics of the user’s interest in the digital representation).

6.3.1. Displacements and overlaps

Lee argues that boundary object theory presupposes the existence of a standardised procedure for the creation of a boundary object.435 The mere fact that an entity moves between different interest groups does not automatically turn it into a boundary object. In situations characterised by a lower degree of standardisation, the boundary object can thus be regarded not as crossing but rather as creating or even dislocating boundaries. The digitisation process, as examined in this thesis, can indeed be characterised as highly regulated on a binary level, e.g. via standards and recommendations regarding capture, processing, and representation. This was demonstrated in the description of the core stages of the digitisation process and discussed in relation to the study of digitisation processes in chapter 4. At the same time, this process also relies on activities on a conceptual level that are much less standardised, e.g. assumptions regarding the informative capacity of the source document, the degree of copy specificity in bibliographical metadata, and procedures during capture. These factors all affect the informative capacity of the digital resource and were discussed in relation to the interview study in chapter 5. The boundary object can, in this way, be described as subjected to negotiations based on the way it is represented. If we see the digital resource as a boundary object we have to pay attention not only to the apparent flexibility of the document when

435 Lee, ‘Boundary Negotiating Artifacts’.

transferred to digital format but also to its capacity to reposition the boundaries it crosses.

Section 2.7 argued that the concept of modes of representation can be considered as framing the digital resource as it is engaged with by a user. These modes correspond with established, “traditional” ways of interacting with documents; i.e. we read texts, examine the material features of documents, and consult catalogue entries online as well as hands-on. The interests might be the same in the digital domain as in the library reading room, but the practices of engagement differ. One such fundamental difference, when comparing web-based access with engaging with the “physical” document, is the capacity of the digital technology to focus on specific aspects of the source document. The digitisation process does not produce clones but rather digital representations according to predefined ideals and interests.

The process of digitisation might, therefore, interfere with certain aspects of the source document, thus affecting the capacity intended by the producers of the digital resource.

Respondent R4 in the interview study describes being confronted with such a situation during the preparatory work on a personal archive that was to be digitised as part of a critical digital publication. A series of letters, written in response to a question of a delicate political nature, were selected for digitisation.

The author of these letters apparently felt highly ambivalent regarding the most appropriate way to respond. This hesitance resulted in twenty draft replies being written, each with slightly different wording in relation to the sensitive topic. Only one of these was finally sent, while the remaining letters were kept in the author’s personal files. As long as this collection is handled as a group of physical documents, the author’s uncertainty can be perceived simply by opening the actual folder and placing the letters side by side on the reading desk, thereby following the anxiety of the writer. But the digital format, once the documents are captured, imposes an additional layer of interpretation, enforcing a specific sequential and spatial order according to which the collection has to be approached, thereby potentially restricting interpretative flexibility.436

436 This is, of course, also the case in relation to any institutionalised praxis. That very same collection of letters would be subjected to different arrangements depending on whether it was systematised according to the knowledge organisational principles of an archive, a library or a museum. However, the additional structure imposed by the actual act of digitisation might not be as noticeable.

The actual conversion of these letters to digital format is not the critical issue in question here, but rather their presentation and the options available for conveying the conditions under which they were written, i.e. their contextual interrelation. The author’s anxiety, rather than the specific textual variations, is a significant aspect of their informative capacity. R4 reflected:

… In such an instance, the main thing is not to decide on a final version … interest doesn’t lie there but rather on the agony revealed in the trouble he took to produce a response … he sat there and wrote and wrote and wrote before he could send a letter … Respondent 4.

The individual letters could be reproduced in extremely high detail with the help of digital technology but in this case their contextual interrelation proved so much more difficult to convey.

The order imposed by the conversion to digital format can here be seen as having an effect on the collection level as well as on the individual items in the collection. In this example, the information provided by manually sifting through these letters in their folder contributed to their capacity to provide information. The transfer to digital format affected this interpretative openness.

The interview study also provided examples of how the transfer to digital format enabled strategies for maintaining the interpretative flexibility. One such case is respondent R9 who, in reference to the detailed TEI-coding protocols, argued that there is a break-even point where it is no longer meaningful to try to convey detailed characteristics of the document in code and where an image simply would prove more efficient437 (jokingly referring to the option of encoding coffee stains in authors’ manuscripts): “... am I encoding the text or the document?”

Another such instance is recounted by R6, who uses images (visual representations of the source document) in order to establish the editorial origin of texts that are acquired from full-text databases where the actual “genre” of the text sequence cannot be discerned from the transcription.438 The provision of a visual representation offered an efficient way of contextualising the article, and establishing its origin in relation to the remaining content of the journal. This strategy could be described as

437 The guidelines by the Text Encoding Initiative (TEI) offer a high degree of detail with regard to the particularities of the document subjected to markup. See 438 See R6 in the interview study, p. 186.

a renegotiation of the boundaries in order to increase the information redundancy of the resource by using visual representation to compensate for a lack of relevant metadata.

Such strategies can also include the use of visual characteristics in combination with metadata, e.g. browsing images of title pages in order to match decorative elements or variants in spelling as a method of establishing the identity of an unknown document. R14 frequently used visual comparison as a means of identification and argued that:

It’s a much faster method … because you can see variations that are not text variations but alterations in decorative elements … the image registers the difference between “this little curve” [points in one direction] as opposed to “that type of curve” [points in the opposite direction] Respondent 14.

To describe such variations in a metadata format is, of course, likely to be a far more resource-demanding approach and will, in many cases, prove less precise.

Examples were also given of procedures that enhance the trustworthiness of the representation by maintaining a degree of information redundancy, for instance by striving to capture the documents as completely as possible. One such measure is the inclusion of both sides when capturing single-sheet source documents, e.g. maps and photographs, despite the fact that the focus frequently is only on the image-based information content.439 Capturing a bound document by double-page openings rather than single-page views is another approach that makes it possible to visually ensure that the sequence of digital images is consistent with the document structure. If the interface does not permit the display of openings, the inclusion of the gutter of the binding establishes a similar trustworthiness regarding the completeness of the representation.440

439 One respondent observed that this practice facilitated the identification of illustrations and maps that had been cut out of bindings, as it enabled the matching of text sequences on the recto of the map with uncut copies of the binding. 440 These observations are related to a research interest primarily focussing on the textual or pictorial content of the document. A book historian might consequently emphasise other aspects of the document, e.g. sewing structure, collation or elements related to the binding.

6.3.2. Tracing the roots

The wider the range of potentially informative aspects of the source document, given the interests of its users, the higher the demands on the design and performance of the digitisation process. It was argued in the discussion on document and information theory that the variety of information that might be gained from a document is, at least in theory, limitless.441 For practical purposes it is neither realistic nor feasible to design processes with the aim of meeting such a goal. However, as digitisation processes are planned and performed according to specific ideals and interests, it is essential that these intentions are made explicit in order to enable informed use of the digital resource. The documentation and declaration of the parameters and regulations according to which the transmission is performed is a quality indicator in relation to digitisation projects in general, and to scholarly editing projects in particular.442

There are several technical standards covering the technical aspects of the capture and processing of image-based information, but few publications relate to critical aspects such as lighting procedures, the handling and positioning of documents, the completeness of capture etc. It can be observed that the procedures regarding text transcription and markup in critical projects such as the Deutsches Textarchiv (see section 1.1.3, p. 16) are standardised and documented in considerably more detail than procedures during capture and representation in image-based formats.

The relation between the information potential and the digital reproduction is addressed by Audenaert and Furuta, who use the notion of “visual complexity” to describe the interplay between the level of standardisation, e.g. the degree to which the layout of a document can be predicted, and the significance of the individual inscriptions as providers of meaning.443 They demonstrate how visual complexity is reduced with rising levels of standardisation on a document level (e.g. less variation

441 See section 2.1., p. 52. 442 Such aspects are discussed in the report from the LAIRAH project, see C. Warwick et al., The LAIRAH Project. 443 N. Audenaert & R. Furuta, ‘Annotated Facsimile Editions: Defining macro-level structure for image-based electronic editions’, Literary and Linguistic Computing, vol. 24, no. 2, 2009, p. 150.

in document architecture) as well as in relation to the inscriptions (e.g. uniformity of typefaces). The reflections of R9, regarding the acceptance of missing edges in the reproduction of printed documents but not in the reproduction of manuscripts444, are in line with the observations of Audenaert and Furuta.445 The printed codex represents a standardised format to a much higher degree than the manually written manuscript. It follows that the reproduction of the manuscript, in order to maintain a degree of interpretative flexibility, will have to enable a higher level of information redundancy, as the layout and graphical characteristics are less predictable than is the case in relation to printed books. It might be argued that the lower the standardisation of the document, the higher the demands on information redundancy in the process. This assumption is in line with the arguments formulated by Terras and Tarte, who in various ways explore the possibility of maintaining, in the digital format, the degree of uncertainty that is a central characteristic of the process of decoding ancient text fragments.446 Such an approach can obviously be extended to also include the material features and the contextual setting of the document in question, thereby allowing for a more unforeseeable engagement with the document.

Although visual resemblance is in many instances a relevant criterion in its own right, it is only one of several possible parameters that can be used to describe the degree of correspondence between a digital reproduction and its source document. The question “how reproductive is a reproduction?” consequently has to be supplemented by another inquiry: “which aspects of the source document are assumed to be informative given the intended use?” Puglia and Rhodes point to the risks of basing our understanding of the capacity of the reproduction solely on the features used for interaction with documents in the analogue domain, as technological developments will open new ways of engaging with the digital resources: “These reproductions [good data sets] may not look like the originals we are copying, but can serve as of yet unknown needs, such as scientific analysis and research”.447 Despite the varying capacities of the modes of representation, the findings in the interview study

444 See R9 in the interview study, p. 184. 445 Audenaert & Furuta, ‘Annotated Facsimile Editions’. 446 See e.g. Terras, Image to Interpretation and Tarte, ‘Digitizing the act of papyrological interpretation’. 447 S. Puglia & E. Rhodes, ‘Digital Imaging – How Far Have We Come and What Still Needs to be Done?’, RLG DigiNews, vol. 11, no. 1, 2007, n. pag.

demonstrate that a factor of crucial importance for the representations’ capacity is the linking between the surrogate and the source document.

Such a proposal, that the informative potential of a digital reproduction resides in its capacity to establish links with the source document, might seem to have alarmingly tautological connotations, as if based on a simplified notion of similarity. But linking with the source document can also be understood as striving towards re-establishing the informative dimensions that are not directly accessible via the digital representation.

The capacity of the image to establish the relation with the source document is exposed to possible disruptions due to the characteristics of the digitisation process. Copy specificity is one such factor, being subject to the robustness of the relation between image, bibliographic and provenancial metadata. The completeness and extent of the reproduction is another influencing factor. If the focus of transmission is placed on text-based information content, this also entails distinguishing between textual matter and the possible interference caused by distortions in the transmission or by discolourations, damages and similar aspects relating to the source document. The capacity to render details is also a significant factor in research that focusses on annotations, markings and signs of wear and degradation that have been added or have happened to the document during its course of use.448

Similarly, the transcription is intrinsically linked to the materiality of the document – the physical inscriptions. The quality of the result of the OCR processing is, in turn, related to the characteristics of the image file that is converted, with regard to technical aspects such as resolution and colour management, and also to the features of the document (e.g. the physical structure, extent of discolourations and damages, the layout of the document and the typefaces used in its production) and the way in which the documents are handled during capture, e.g. the alignment of text lines.

The process of segmentation establishes yet another structural link between the document and the reproduction as it reconstructs the organisation of textual inscriptions in digital format. TEI encoding can be used to further emphasise the structural relation with the document. This mode of representation is also exposed to variables in the transmission process that affect the OCR conversion, e.g. the feasibility of using thesauri to support the conversion (depending on the type of inquiry).

448 This approach resembles research strategies found in the study of material artefacts, e.g. archaeology or conservation in relation to the analysis of surface erosion or accretions.

The metadata also represents the source document by providing bibliographical details, provenance and information regarding physical features (depending on the schema adhered to). If this information, such as title, author, printers and place of printing, is provided in original form (with respect to spelling convention and name format) as opposed to normalised form (e.g. relying on authorised name forms), we touch upon the distinction between metadata and transcription. The metadata thus clearly displays features as a surrogate, a supplement and a pointer. It also acts as a link between the representation and the source document, e.g. by establishing the integral relation within a sequence of image files such as the individual pages of a book.
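
The linking function of metadata – binding a description to an ordered sequence of image files – can be sketched as a simple lookup. The record structure and field names below are hypothetical, chosen only to show the distinction between normalised and original forms alongside the item-specific image sequence.

```python
record = {
    # Description, potentially at manifestation level, in normalised form:
    "title_normalised": "Hamlet",
    # Title as printed on the item, original spelling retained:
    "title_original": "The Tragicall Historie of Hamlet",
    # Item-specific, ordered image sequence (one file per page):
    "images": ["0001.tif", "0002.tif", "0003.tif"],
}

def image_for_page(record, page_number):
    """Resolve a page reference through the record's ordered sequence;
    the ordering is what keeps the page images integral to one item."""
    return record["images"][page_number - 1]

print(image_for_page(record, 2))   # 0002.tif
```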

6.4. Summary

The discussion of the findings of the two empirical studies, together with the application of the concepts developed in the critical examination of document and information theory, has demonstrated that the digital representation challenges, in different ways, the contextual integrity of the document. Methods used for reproduction such as microfilm and photocopying result in products that, due to modes of interaction (e.g. browsing on a microfilm reader) or appearance (e.g. the black/white photocopy), do not leave the user in doubt as to whether she is accessing a surrogate or not. With digital technology, this distinction is easily transcended.449

The impact of the digital image is such that the visual representation of a document is often accepted at face value – on the basis of its apparent similarity. To uncritically adhere to assumptions based on “what-you-see-is-what-you-get” might lead to misunderstandings regarding the informative capacity of the digital resource. It has also been demonstrated that the re-contextualisation of the document – the negotiation of the borders of the boundary object – can be used as a strategy to compensate for the digital displacement. Depending on the interest of the user and the characteristics of the digital transmission, the measures taken in such a strategy might involve different aspects of one and the same mode of representation.

The notion of informative configuration is related to the constituent dimensions of the source document and can be used to describe the assumptions guiding the user’s

449 It can of course be argued that manual transmission also represents such an illusion of originality, considering that even the medieval manuscript is the product of a reproduction process, although as a historical object it has been conferred the status of an original.

engagement, i.e. whether focus is primarily on material features – FORM, text-based content – SIGN, or on contextual identity and function – CONTEXT (bearing in mind that these are not mutually exclusive aspects). Similarly, the concept can be used in relation to the focus of the digitisation process and the intended characteristics of its products.

The interview study demonstrated that there is no causal relation between these dimensions and the modes of representation. The informative configuration, regarded as a strategy, can thus be seen as indicating the negotiation of boundaries that takes place as users engage with digital resources.

The concepts of informative configuration and modes of representation, guided by the framework established by BOT, provide useful tools in the analysis of the digitisation process and the documents involved. These concepts will be reusable for further research into the theoretical aspects of digital transmission. They will also support the design and management of such processes by providing a method to structure (and communicate) the intended capacity of the digital products.

7. CONCLUSIONS

If the editor does his duty by stating exactly what changes he has made, and if the purchaser does his duty by reading carefully, and remembering, the editor’s explanations, all will be well. A solid advantage will have been gained, and no harm done.450

This remark, made by Alfred W. Pollard, one of the early pioneers of bibliography, points at the dilemma that lies at the heart of this thesis: the transmission of text-based documents and the consequences this might have for the use of the resulting products. The survival of texts relies to a large extent on their capacity to be transmitted.451 Although the methods that were available at the time of Pollard might be conceived of as having less drastic consequences than today’s digital technology, the quote bears witness to the negotiated relation between the source document and its representation.452 More importantly, the remark underlines a main topic of this thesis: the options for structuring this relation in a way that makes it manageable.

Transmission is inherent to the management of text-carrying documents. In the “document environment” production and reproduction are important parts of the life cycle, although the technology used has changed throughout history. The application of digital technology is, in this perspective, merely the most recent in a series of more or less paradigmatic shifts in media technology. Still, it would be misleading

450 A. W. Pollard, ‘“Facsimile” reprints of old books. 1. Preliminary survey’, in The Library, vol. 6, no. 4, 1926, p. 306. 451 Differences in the “refresh rate” (see the discussion on fixity and flexibility, p. 78) will of course be of consequence here. A text to be read on the display of a digital device will present alternative strategic choices for transmission in comparison to a text chiselled in stone. 452 This relation was, needless to say, subjected to considerations long before the 1920s. Brown observes: “in manuscript culture, of course, there is no access to auctor except through the scribe.” C. Brown, ‘Remember the Hand: Bodies and Bookmaking in Early Medieval Spain’, in Word & Image, vol. 27, 2011, p. 263, italics in original.

to consider the effects of digitisation solely as a matter of degree rather than as a matter of type.

Part of the apparent readiness to accept digital surrogates within the library sector can presumably be explained by the fact that textual inscriptions, which in most instances represent the primary focus of the transmission, are regarded as stable and fixed phenomena, and thereby possible to capture and move to alternative carriers and modes of representation without affecting their informative capacity. Textual signs are seen as establishing a link between the source document and the reproduction, thereby seemingly confirming the existence of the source document. As emerged in the analysis of the guidelines in chapter 4, the general rhetoric regarding digitisation in the library sector often adopts such a viewpoint. This approach is understandable in the light of the principles of knowledge organisation in the library sector, i.e. where documents generally are described on a manifestation level rather than as individual items. This practice will, if uncritically applied to the digital transmission, potentially introduce some ambiguity, as the visual representation – the text represented as image – tends to relate to an individual item, whereas the transcription and the metadata can be based on different types of representational strategies, in most cases at a higher bibliographical level.453 This is an aspect that distinguishes the digitisation of text-carrying documents from other types of collections in heritage institutions.

Digitisation thus adds an extra dimension to the question: “If Mona Lisa is in the Louvre, where are Hamlet and Lycidas?”.454 Depending on the reason for engaging with a digital resource, such an uncertainty might be of greater or lesser consequence. However, the question remains: what is the relation between the digital representation and the document it is said to represent, refer to, enrich or even replace? We can confirm the presence of La Joconde via the Louvre’s website, but what conclusions can be drawn from the hundreds of representations of the works of Shakespeare and Milton that can be found on the web? Are Hamlet and Lycidas still around?

453 There are examples where digital technology is used to enable visual comparisons between, for example, different editorial decisions on transcription by rearranging the inscriptions of a manuscript. The Electronic Beowulf enables such an approach. See <http://ebeowulf.uky.edu/> 454 D. C. Greetham, Theories of the text, Oxford, Oxford Univ. Press, 1999, p. 37, italics in original. Greetham is here rephrasing a formulation in F. W. Bateson, Essays in critical dissent, London, 1972.

I have in this thesis set out to explore possible ways of framing and structuring this relation, in order to enhance the understanding of the digital transmission and the characteristics of its products.

During the course of writing my approach to this topic changed, from a focus on outlining significant characteristics of different types of documents, and the prospect of rendering them in digital format, towards addressing the actual conversion back and forth between analogue and digital format. My assumption is that an enhanced understanding of how these changes influence the document, as a conceptual as well as a binary formation, will provide a more fruitful approach to the question of the relation between source document and reproduction.

7.1. The research questions

As stated in the outline of the problem area in section 1.1, the digitisation process is conditioned by:
• assumptions regarding the informative capacity of documents and their representations (the analysis of document and information theory),
• the technology used for its production (the topic of the first empirical study, covering guidelines), and
• the interests of the users engaging with it (the topic of the second empirical study, based on interviewing users).

The scope of the thesis is far-reaching and three research questions were thus formulated to cover a wide area of theory and practice. The thesis is based on two empirical studies supported by a critical analysis of document and information theory. The considerations in this section follow the order in which these questions were originally introduced, but there will be overlaps and interconnections.

7.1.1. Theoretical framework

The first of the research questions concerned a theoretical framework for analysing the digitisation process and the documents involved. The analysis of document and information theory resulted in the formulation of two basic premises:

• The capacity of the document as a provider of information is situationally conditioned, as opposed to residing in fixed and intrinsic features.

• The digital transmission affects the document as a conceptual as well as a binary formation.

Boundary object theory was utilised as a framework for the thesis in order to follow the document through the process of digitisation. At first glance, the application of boundary object theory seemed rather straightforward, but it turned out that the double processes of conversion (from analogue to digital and back to analogue), together with the effects on the binary as well as on the conceptual level, complicated the analysis. The notion of boundary object was therefore supplemented with two concepts developed in the analysis of document and information theory: Modes of representation, referring to the ways in which the document is represented in digital format, and Informative configuration, which describes the engagement of a user with the digital representation.

The analysis of the process of transmission and the documents involved could, consequently, be designed in such a way that it was possible to keep focus on the document while maintaining the fuzziness of its boundaries as it moves between analogue and digital format.

This modelling also demonstrated the digital resource as an entity that gains its relevance not merely by crossing boundaries but also by re-positioning these borders when engaged with by a user.

7.1.2. The digitisation process

The second research question referred to the process of digitisation. The analysis of digitisation guidelines in chapter 4 revealed that the core stages in the digitisation process were consistent despite variations in the types of product and differences in contexts. What affects the characteristics of the reproductions are the variations of the parameters in the activities that take place at each of these stages.

During the digitisation process certain aspects of the source document are captured. The representation of this captured information, together with possible additional enrichment such as OCR processing, markup, and image enhancement, constitutes the information that will serve as a surrogate, a complement, or a reference to the source document (depending on its use). In this way, the digitisation process results in a new document with an informative capacity that more or less resembles the source document. However, the process always introduces changes and alterations that might be perceived as gains as well as losses (or of no importance at all). The reproduction

– the re-presentation in digital format – is thus conditioned by the characteristics of the source document, the particulars of the process, and the nature of the presentation format. Interaction between these factors is complex.

The consequences of these alterations are related to the interest of the user. Although this observation can be seen as contradictory, referring at the same time to loss and gain of informative capacity, it actually describes a central phenomenon: the relation between the source document and the reproduction is never a 1:1 affair.455

Boundary object theory provided a foundation for this modelling of the process. It further contributed to the understanding that the digital document, rather than being a well-defined and distinct entity, should more rightly be regarded as a system whose boundaries are reconfigured according to the modes of representation enabled in the process of transmission. Based on this modelling, the characteristics of the resulting product, the digital resource, can be seen as mirroring the process of its production. The information exchange and transfer that takes place at each stage of the process represents potential points of engagement with the digital resource.

7.1.3. The capacity of documents in digital format

The third research question addressed the capacity of the digital reproduction to be informative, its informative capacity, given the specific interests of a user. The interview study demonstrated how the use of digital technology, as a method of reproduction, repositions the conceptual boundaries of the document in a way that has no equivalent in other, non-digital methods of reproduction.

As long as the document is managed within a digital infrastructure, it remains open to reconfiguration, i.e. the actual mode of representation does not have to be defined until the moment of retrieval and access. The digital resource, in terms of binary code, can thus be described as “unstabilised”. Returning to the notion of boundary negotiations, this state of instability can be described as enabling multiple sets of boundaries by combining different modes of representation.
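
The “unstabilised” character of the digital resource – the mode of representation being settled only at the moment of retrieval – can be sketched as a late-binding lookup over a single stored formation. The resource fields, mode names and renderers below are invented for the purpose of illustration.

```python
# One stored binary formation serving several modes of representation.
resource = {
    "transcription": "To be, or not to be",
    "image_file": "f2r.tif",
}

# Each mode reconfigures the same underlying resource differently.
renderers = {
    "text":      lambda r: r["transcription"],
    "facsimile": lambda r: "display image: " + r["image_file"],
    "search":    lambda r: r["transcription"].lower(),
}

def access(resource, mode):
    """The representation is not settled until a mode is chosen here."""
    return renderers[mode](resource)

print(access(resource, "facsimile"))   # display image: f2r.tif
```

The same stored bytes thus yield different boundary configurations depending on which mode the user invokes, which is the sense in which the resource remains open to reconfiguration.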

455 Hayles claims, on a similar note, that there is no proportional relation between binary code and the manifestation of alphanumerical text on a display. The relation between these two instances is more complex than simply being a matter of encryption. See Hayles, ‘Translating media’.

The interview study demonstrated that the ways in which the digital document can be accessed, that is, the ways in which a specific informative configuration is established according to the given interests of a user, vary considerably. The approaches observed included queries based on different types of bibliographical information, on visual comparison and the browsing of images, on using transcriptions for free text searches or semantic analysis, or on utilising specific metadata protocols.

The standardisation of the digitisation process, for example, regulations for the capture, processing, and presentation, frames the options of interpretation. This framing is already in effect when the user engages with the digital resource; the source document is beyond reach and the user is faced with its representation. However, as indicated in the interview study, the informative capacity of the digital representation might be based on the user accessing or combining factors in the digital resource that do not represent the primary intention of the producer.

There is no causal relation between the modes of representation and the informative capacity of the source document; there are displacements and overlaps. The interview study reveals that if the process has maintained a sufficient degree of information redundancy, the required informative configuration can be based on strategies that compensate for the alterations caused by the digital transmission.

We have observed that digital technology destabilises the document by challenging the consistency that is taken for granted when it is engaged with in the analogue domain. It has also been argued that this destabilisation is characterised by two propensities with potentially conflicting effects on the conceptual level. There is the tendency towards deconstruction implied in the consequences of the binary encoding. This tendency is further emphasised by the application of filters, layers, scripts, style sheets, and the distributed states of storage and display, whereby particular features of the document (e.g. text-based content) can be emphasised at the cost of other aspects such as the consistency of the document as a material unit.456

456 This modularity – one of the significant characteristics of “new media” (Manovich, 2001) – will have different consequences depending on whether the digital resource is “born digital” or is a representation of a “physical” source document. The framework of “new media” should consequently not be applied to a digital manifestation without considering its circumstances.

However, the digital transmission can also be seen as inflicting a degree of homogenisation, i.e. the effect of the analogue realm being transferred to digital format. The “binary grid” created through the digital transfer imposes a structure that might (or might not) interfere with the interests of the user. The document, when transferred into code-state – into 1s and 0s – undergoes a process of normalisation.
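
The normalising effect of the “binary grid” can be illustrated by simple quantisation: distinct analogue intensities collapse onto the same discrete code point. The sketch below assumes an 8-bit greyscale grid; it is an illustration of the principle, not of any particular capture device.

```python
def quantise(value, levels=256):
    """Map a continuous intensity in [0, 1] onto a discrete grid of
    `levels` code points (8-bit greyscale by default)."""
    return round(value * (levels - 1))

# Two distinct analogue tones become indistinguishable after capture:
print(quantise(0.5010), quantise(0.5014))   # 128 128
```

Whatever variation falls between two code points is flattened by the grid; this is the homogenisation that the transmission imposes regardless of the interests of the user.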

The digitisation process will assign a degree of predictability to the digital reproduction, both during the binary formatting and during the reformatting back to analogue, thereby risking a reduction of the information redundancy that is a central feature of any consultation of primary sources in analogue format.

The question of granularity – the conceptual level at which the engagement with the represented document takes place – will thus have to be addressed in order to define the ramifications of its informative capacity. In this respect, the adaptation of Tarte’s model in section 6.2, to relate the modes of representation to the informative dimensions of the source document, provides a useful conceptualisation of the relation between the process and the capacity of its products. It demonstrates the dual tendency of homogenisation (as imposed by the modes of representation) and deconstruction (caused by the capacity of digital technology to focus on specific aspects of the source document in isolation).

It is evident from the findings of the interview study that the capacity of the digital representation is influenced by factors such as the extent of information capture and, in relation to image-based methods of representation, the redundancy of capture, i.e. the visual “cues” in the images, for example, the inclusion of the colophon, paratexts, or details of gutter, visible edges of pages, as well as measures to ensure the dimensional stability of the images and the management of image quality. Similarly, the reliability of the OCR process and the transparency of encoding are also factors that can be regarded as conditioning the relation between the reproduction and the document represented. The efficiency of the digital resource as a provider of information can thus be seen as conditioned by the balance between, on the one hand, its capacity to convey specific features of the source document and, on the other hand, the maintenance of a degree of information redundancy, thereby ensuring an adequate level of interpretative flexibility.

The tension between hypermediacy and immediacy, as outlined in Bolter and Grusin’s notion of remediation, can be observed here in the relation between the bibliographical description and the digital reproduction. In practice, the representation as image always depicts a unique copy (the item), whereas the transcription might refer either to a specific item or to an ideal version thereof. The associated

bibliographical metadata can relate to item-specific aspects or refer to a more abstract level – e.g. the manifestation. As a consequence, the measures required to re-establish the integrity of the source document will have to increase in complexity according to the particularity of focus.

7.2. Reflections on the study

Two areas of empirical inquiry were outlined in the initial problem description: the digitisation process and the capacity of the digital resource. In addition, these studies were complemented by an examination of document and information theory.

The analysis of the digitisation process was based on published sources of information, consisting of normative and prescriptive guidelines. The advantage of relying on such empirical material, as opposed to reports and documentation from actual projects, is that it enables the focus of the analysis to be placed on the ideals that underpin the design of digitisation processes. As argued in the reflections on the methodology in section 3.1.2, project reports were deemed to be unreliable sources of information in this thesis as they serve managerial and administrative purposes. The particulars of the actual processes of digitisation are rarely in focus.

However, the use of normative and prescriptive guidelines also creates a potential problem as it introduces a “gap” between the analysis of the process, as based on the “idealistic” guidelines, and the interview study with the users of the resources, where the respondents refer to the “de facto” results of the process. In order to narrow this gap, the interview study also covered the respondents’ general experiences of working with different digital resources as well as their use of documents and collections “hands on”.

Relying on the users’ own accounts of their engagement with digital resources, as opposed to observing their actual behaviour while using these resources, introduces another such gap. The loosely structured interview protocol allowed for questions to be rephrased and topics to be returned to for clarification. Based on the respondents’ descriptions of their use of digital resources, and my own experience from the field, the interview material can still be seen as providing a reliable sample.

An alternative setting for the study could have included users interacting with pre-defined or even tailor-made digital resources in order to establish a more controlled experimental set-up. This would also have permitted a wider perspective than the primary focus on text-based sources used in the present study. A study

could theoretically have taken a naturalistic approach, for example, by observing scholars in their engagement with digital resources. Such a method would, however, have proved highly ineffective as the consultation of bibliographical databases tends to be carried out in a task-oriented fashion, not as a regular activity according to a fixed schedule.

It has to be noted that the sample of guidelines in the study of the process (chapter 4) and the number of informants in the study of the capacity of the digital resources (chapter 5) are too restricted to allow for any broad generalisations. It also has to be observed that the interests of the respondents were primarily in documents as text-carriers and historical artefacts. The inclusion of applications based on data mining or technology for linguistic analysis would have broadened the relevance of the findings. A larger and more varied group of informants would have further enhanced the usability of the studies’ results.

The conceptualisation of the digital resource as a boundary object made it possible to follow the document in the process of transmission without having to resort to static definitions of its ramifications. Boundary object theory thus offers a fruitful approach to comprehending the document in digital format. However, the concept has an intuitive allure and the risk of uncritically applying it has been voiced.457 Its application in the present study required several reconsiderations, and the development of two supporting concepts – modes of representation and informative configurations. With a different scope, boundary object theory could have been applied to address a wider take on the phenomenon of documents and digital transmission technology. Such a widened approach could potentially also have included my own double roles – as researcher and practitioner – thereby addressing the concerns formulated in section 1.5 regarding the consequences of my professional background.

The focus on the relation between source document and reproduction narrowed the perspective on the digitisation process to mainly involve the conceptual and the binary levels. The thesis could have benefited from a more inclusive approach, which would have involved an institutional, and perhaps even a societal dimension, addressing the effects of digital technology on the circulation of documents. The application of boundary object theory holds promise here. A similar concern can be formulated regarding the binary level where a distance has been maintained to the strictly technical aspects of transmission and digital format, thereby occasionally

457 See section 2.8.1, p. 101.

neglecting the depths of code and software. This thesis can thus be criticised for applying a restricted view on both these aspects of the digitisation process – the social and the technical. The resulting “middle ground” approach was adopted in order to focus on the central problem, which was to establish a firm base for explorations into the transmission between analogue and digital format. Consequently, there is room for supplementary studies and further research, as will be discussed in section 7.4.

7.3. Contributions

This thesis has demonstrated that the information potential of the source document, when transferred to digital format, remains in a state of flexibility and can be manifested in several forms until stabilised, in accordance with the function intended by the producer. The digitisation process itself inflicts a degree of ambiguity as the digital format “dissolves” the document and exposes it to processing, restructuring and finally a procedure of stabilisation. This stabilisation results in a new document in digital format, based on the parameters of the digitisation process, a document that subsequently is exposed to new sets of interpretation either through direct use or via further remediation processes. The possibility, enabled by digital technology, to apply a wide or narrow perspective on specific aspects thus has the potential both to enhance and to obscure the informative capacity of the source document, given a specific interest. The transmission, regardless of perspective, will always interfere with the ontological consistency of the source document. Such alterations can be explored in order to develop methods of dissemination, interaction and research.

The thesis thus contributes to the understanding of the document within the field of LIS. It also indicates some of the possibilities and limitations of boundary object theory as applied to the study of documents in digitisation processes. The concepts of modes of representation and informative configurations establish an understanding of the digital resource as a boundary object in which the different ways of presentation do not merely reside in differences in perspective but rather are the result of the profound restructuring enabled by the digital format. The users’ engagement with the digital resource is thus characterised by boundaries being repositioned as well as crossed.

Digital technology tends to emphasise information as a phenomenon that can be managed independently of a document. This understanding of information in relation to digital technology is also evident in the field of LIS. However, the process of

digitisation as defined in this thesis inevitably involves the assumptions regarding the document that is digitised as well as the digital representation thereof, thereby advocating a more nuanced understanding of the informative capacity of the digital resource. It has further been demonstrated that the digitisation process can actually be described as two procedures of conversion: that of digitisation (during the information capture) and that of analogisation (as the document in digital format is rendered e.g. on a screen or printed on paper). The modelling of the process of transmission and the conceptualisations developed in this thesis thus contribute to the development of theory within LIS in those areas that relate to digital transmission and the management of documents in digital infrastructures.

The thesis also illustrated that it is hardly possible, and presumably not even meaningful, to try to maintain a rigid border between the actual information resource in digital format and the technology used for its identification and retrieval. In the analogue domain these functions can be defined as two separate entities (as in the relation between the card file catalogue and the book to be retrieved). In the digital domain, they are intrinsically linked, resulting in the bi-directionality of the digital resource. Features that can be used for the identification of a document, such as a text sequence in a full text search, might also represent the actual informative capacity of that document, for instance, the text-based information content (given the interest of the user).

It was further argued that the digitisation process is characterised by a tension between, on the one hand, the wish to establish specific modes of representation and, on the other hand, the management of alterations that unavoidably ensue from the transmission. The concept of informative configuration can in this respect be seen as a strategy employed by the user as she engages with the digital resource.

The modelling developed in the thesis will contribute to further research into the specifics of the relation between process, source documents, and reproductions and, potentially, into other strands of inquiry in relation to digital technology and material artefacts as well. On a more practical level, these conceptualisations will prove useful as tools in the design and management of digitisation processes.

7.4. Future work

In the discussion on the basic principles of transmission it was argued that the methods used for transmission reflect the general technological standard of society at the time of transmission. It could be argued that technical equipment also sets

the limits for the types of documents that might be processed and, consequently, also shapes the output of libraries. Technical development within the field has progressed in a more or less stepwise fashion. The introduction of microfilming workstations, overhead book scanners, book cradles, narrow-angle scanning equipment, and page-turning book scanners has had consequences for the types of collections considered for reproduction as well as for the intended uses of such collections. This has presumably left the library community with large series of reproductions where each period, due to the nature of the equipment used, has produced its characteristic traces. The technical equipment in the digitisation process is an important factor when deciding which collections to reproduce and, consequently, which profile (or perhaps even status) the library attains among its peer institutions. This is also an important factor in the political dimension of library operations, where visibility, either through exclusiveness of collections or sheer quantity of items on display, is considered a mark of success. A study into these aspects of the relation between technology, documents and users could provide valuable insights into the patterns of dissemination and use of library collections.

It could in this respect also be worth looking into the consequences of relying on external contractors, as opposed to carrying out digitisation as part of the internal operations of an institution. There are strong economic incentives at play here, but we know little of the effects of this trend on researchers and other users of heritage collections.

The interview study revealed several instances where reliance on metadata (catalogue information) was supplemented by using the visual features of the reproduced document. A typical example is that it is much easier and more efficient to evaluate typographical details, such as the particulars of the title page, in a visual representation than in a description in bibliographical terminology. A question that might be worth pursuing is whether there is a breaking point where visual representation can replace the need for deep metadata in the cataloguing process. It is noted that certain types of collections are difficult to describe on the item level. Printed ephemera is just such a genre, where bibliographical identifiers such as title, author, year, and publisher often are missing or incomplete. In such a case, a basic catalogue description accompanied by a visual representation, even in low resolution, might provide sufficient identification. A similar approach could be adopted in relation to the variations encountered in printed collections from the hand press era. There might even be possibilities to identify signifying characteristics within collections of manuscripts, for example, style of handwriting, hue and composition

of ink and colourants, surface characteristics of paper and parchment, types of binding, and patterns of pricking and ruling458 that can be used to provide a visual overview for the experienced user. The cataloguing of collections, especially of manuscript material, represents a significant undertaking and is often seen as a major obstacle in the planning of digitisation processes. A systematic image-based support could in this respect provide a valuable solution to the problem of strained resources as well as offering enhanced access.

The modelling of the informational dimensions of the document and the concept of modes of representation also open a way of defining the approach taken to the source document in the digitisation process. They can be used to indicate the focus of transmission, e.g. whether a collection is digitised with the primary intention of rendering the text, reproducing the material features of the documents, or generating metadata (or all three in various combinations). Although devised primarily for collections of printed documents, this conceptualisation might be applicable to other types of collections within the heritage sector. It could thus serve as a pedagogical and communicative tool both in the planning of processes and in the dissemination of digital resources within the heritage sector at large.

458 Procedures that were part of the traditional preparation of the manuscript sheets, whereby the lines that position and guide the inscriptions are marked using a sharp point – pricking – and blind tooling – ruling.

ABSTRACT

We are today fully encompassed by a global digital space. Cameras, cell phones, computers and various other digital devices have been transformed into mobile digitisation equipment, used to capture, process, store, and transfer sound, video sequences, images, and texts back and forth between analogue and digital formats – between “In Real Life” and “The Cloud” – as a salient feature of our everyday lives. Heritage institutions increasingly rely on digital formats for access, use and re-use of their collections. Documents and objects, previously accessed on location in heritage institutions, are now transferred into digital formats and managed, distributed, and engaged with as binary-encoded representations. The ubiquitous presence of digital technology in our everyday lives has led us to accept – sometimes unreflectingly – these representations as substitutes that replace access to the source documents. Digital technology thus enables means of production and distribution that challenge the traditional role of the library institution as an authoritative repository of text-based information.

Texts have throughout history been transferred from carrier to carrier in order to reach readers and as a method to ensure their survival beyond the life span of the supporting substrate. Transmission is inherent to the management of text-carrying documents. In the “document environment”, production and reproduction are important parts of the life cycle, although the technology used has changed throughout history.

The methods used for the transmission of text-based information can build on concrete sequences of action, such as inked types pressed against the surface of paper, or on less tangible processes, like the transformation of information from analogue format into binary code, stored as electrically charged sequences on a computer hard drive. The application of digital technology is, in this perspective, merely the most recent in a series of more or less paradigmatic shifts in media technology. Still, it would be misleading to consider the effects of digitisation solely as a matter of degree rather than as a matter of type.

The thesis addresses the relation between the source document and its reproduction, as conditioned by the digitisation process. This process can be described as

based on two sequences of conversion: from analogue to digital format during the procedure of capture, and back to analogue as the digital resource is represented. In a study of the digitisation process it might therefore be tempting to focus on the technical aspects. However, the thesis demonstrates that underpinning the digitisation process are assumptions and ideals concerning the defining features of documents and their capacity to be informative. An analysis of the relation between source document and reproduction, as conditioned by the digitisation process, must consequently consider both the technical parameters in place during the conversion and the document- and information-theoretical assumptions that have guided its design and operation.

The thesis is based on two empirical studies: an analysis of the digitisation process as defined and described in a number of normative guidelines and recommendations, and an interview study involving a group of researchers who use digital representations of text-based documents in their work. Boundary object theory provides a theoretical framework and is supported by two concepts that are developed in the critical analysis of document and information theory: modes of representation (referring to the ways in which the source document is represented as a digital resource) and informative configuration (referring to the ways in which the users engage with the digital resource, given a specific interest).

Digital technology enables several possible ways of representation. However, as the focus lies on text-based documents, three modes of representation are prioritised in this thesis: image, transcription, and metadata. The first of the empirical studies results in a conceptual model of the digitisation process and how it affects the characteristics of the resulting reproduction. The interview study demonstrates that the strategies used by the researchers in their engagement with the digital resources (e.g. as a substitute for direct access, as a supplement, or as a pointer to other sources) are based on a complex interaction between these three forms of representation. The relation between source document and reproduction is intricate; it is not a 1:1 affair.

The thesis contributes to the theoretical development within those fields of library and information science that are concerned with document and information theory, but also to areas such as knowledge organisation and the use of digital technology within the library sector at large. The thesis also indicates some of the possibilities and limitations of boundary object theory as applied to the study of documents in digitisation processes. The concepts of modes of representation and informative configuration establish an understanding of the digital resource as a boundary object in which the different ways of presentation do not merely reside in differences in

perspective but rather are the result of the profound restructuring enabled by the digital format. The users’ engagement with the digital resource is thus characterised by boundaries being repositioned as well as crossed.

There are obvious connections with other research areas that involve digital resources and their relation to phenomena such as materiality, documents, and information. On a more practical level, the modelling and conceptualisations will prove useful as tools in the design and management of digitisation processes.

SAMMANFATTNING (Abstract in Swedish)

Den digitala tekniken är idag konstant närvarande, information förs fram och tillbaka mellan analogt och digitalt format – mellan ”In Real Life” och ”the Cloud” – som en självklar del av vår vardag. Kameror, mobiltelefoner, datorer, surfplattor och andra digitala don används som mobila digitaliseringsutrustningar för att fånga, bearbeta, lagra och överföra ljudupptagningar, videoklipp, bilder och texter.

Överföring har alltid varit en integrerad del i de textbaserade dokumentens livscykel, även om tekniken har varierat. Texter har genom historien förts från bärare till bärare, både som ett sätt att nå läsare och som ett sätt att säkra att de överlever sina bärares livslängd. Metoderna som används kan bygga på konkreta procedurer, till exempel infärgade blytyper som pressas mot papper. Det kan också handla om mer abstrakta processer som överföringen av information från analogt format till binär kod, lagrad som elektriska spänningssekvenser på en dators hårddisk. Digitalisering kan således i detta perspektiv beskrivas som ytterligare en metod i raden av tekniska lösningar för reproduktion av textbärare. Men att oreflekterat omfamna den digitala tekniken utan att ta hänsyn till de aspekter som på ett avgörande sätt särskiljer den från tidigare reproduktionsmetoder riskerar att leda till en begränsad förståelse för teknikens kapacitet och förutsättningar. Kulturarvsinstitutionerna använder i allt högre grad det digitala formatet för tillhandahållande och bevarande. Dokument, föremål och samlingar som tidigare behövde uppsökas på plats hos de förvaltande institutionerna distribueras idag över nätet som binärt kodade reproduktioner. Utvecklingen påverkar också kulturarvsinstitutionernas, och därmed också bibliotekens, traditionella roll som garanter för trovärdighet och källkritik.

Denna avhandling tar sin utgångspunkt i frågan om hur relationen mellan källdokumentet och dess digitala reproduktion kan struktureras och förstås. Digitaliseringsprocessen kan beskrivas som baserad på två sekvenser av överföring: från analogt till digitalt format under informationsfångsten, och sedan tillbaka till analogt format när reproduktionen tillhandahålls. I en analys av processen kan det därför vara frestande att enbart ta fasta på de tekniska aspekterna. Avhandlingen visar dock att processen också styrs av uppfattningar och ideal avseende både definitioner av dokument och av deras informativa kapacitet. En studie av digitaliseringsprocessen och relationen mellan källdokument och reproduktion behöver således både ta fasta på de tekniska parametrarna vid överföringen och de dokument- och informationsteoretiska antaganden som ligger till grund för processens utformning.

Avhandlingen baseras på två empiriska studier: dels en studie av digitaliseringsprocessen, baserad på en kritisk genomgång av publicerade riktlinjer för utformningen av processer och deras beståndsdelar, dels en intervjustudie där en grupp forskare beskriver sitt användande av digitala reproduktioner inom sina respektive områden. Boundary object theory används som ett ramverk i studien och till stöd för analysarbetet introduceras två koncept som utvecklats genom en kritisk genomgång av dokument- och informationsteori: modes of representation (vilket refererar till det sätt på vilket den digitala resursen representerar källdokumentet) och informative configuration (det sätt på vilket användaren nyttjar den digitala resursen, givet ett specifikt syfte).

Den digitala tekniken möjliggör en mängd presentationsformer. Denna avhandling inriktas mot textdokument och därför är det huvudsakligen tre typer av modes of representation som behandlas: bildmässig återgivning, transkription och metadata. Den första studien resulterar i en konceptuell modell av digitaliseringsprocessen och hur den påverkar reproduktionen. Intervjustudien visar att forskarnas strategi för sitt användande av digitala resurser (antingen som ersättning för direkt tillgång till källdokument, som ett komplement eller som hänvisning) baseras på ett komplext samspel mellan dessa tre representationsformer. Relationen mellan källdokument och reproduktion är mångfacetterad; det föreligger inget 1:1-förhållande.

Avhandlingen bidrar till den teoretiska grunden inom de delar av det biblioteks- och informationsvetenskapliga området som berör dokument- och informationsteori men också till områden som kunskapsorganisation och användning av digital teknologi inom biblioteksområdet i stort. Avhandlingen påvisar också möjligheter och begränsningar i användandet av boundary object theory i studier av dokument och digitaliseringsprocesser. Introduktionen av de två koncepten modes of representation och informative configuration innebär att den digitala resursens olika representationsformer inte behöver ses som en fråga om olika perspektiv utan snarare som en konsekvens av den grundläggande omstrukturering som följer på överföringen till digitalt format.

Det finns här även tydliga kopplingar till andra forskningsområden som omfattar digitala resurser och deras relation till fenomen som materialitet, dokument och information. På ett mer praktiskt plan kan de konceptuella modelleringar och den begreppsapparat som introduceras i avhandlingen underlätta arbetet med att planera, implementera och utvärdera digitaliseringsprocesser, inte bara inom biblioteksområdet utan inom kulturarvssektorn i stort.

241︱Abstract 242︱Abstract REFERENCES All web-based sources were checked 23 May, 2015

Abt, J., ‘Objectifying the Book: The Impact of Science on Books and Manuscripts’, Librar y Trends, vol. 36, no. 4, 1987, pp. 23–38. Allison, A., J. Currall, M. Moss, & S. Stuart, ‘Digital Identity Matters’, Journal of the American Society for Information Science and technology, vol. 56, no. 4, 2005, pp. 364–372. Altheide, D.L., Qualitative media analysis., Thousand Oaks, CA, Sage, 1996. Altheide, D.L., & C.J. Schneider, Qualitative Media Analysis, 2nd ed., Thousand Oaks, CA, Sage, 2013. Ash, N., S. Homolka, S. Lussier, R. Pollak, & E. Spaulding, Descriptive terminology for works of art on paper: guidelines for the accurate and consistent description of the materials and techniques of drawings, prints, and collages. R. Wolcott (ed.), Philadelphia, PA, Philadelphia Museum of Art, 2014, . Audenaert, N., & R. Furuta, ‘Annotated Facsimile Editions: Defining macro-level structure for image-based electronic editions’, Literary and Linguistic Comput- ing, vol. 24, no 2, 2009, pp. 143–151. Audenaert, N., & R. Furuta, ‘What humanists want: how scholars use source materi- als’, Proceedings of the 10th Annual Joint Conference on Digital Libraries JCDL ’10, New York, NY, ACM, 2010, pp. 283–292. Bailey, C. W., Digital Curation and Preservation Bibliography, Houston, TX, Digital Scholarship, 2012. Bak, G., & P. Armstrong, ‘Points of convergence: seamless long-term access to digital publications and archival records at library and archives Canada’, Archival Science, vol. 8, no. 4, 2008, pp. 279–293. Bates, M. J., ‘Fundamental Forms of Information’, Journal of the American Society for Information Science and Technology, vol. 57, no. 8, 2006, pp. 1033–1045.

243︱References Bates, M. J., ‘Hjørland’s Critique of Bates’ Work on Defining Information’. inJournal of the American Society for Information Science and Technology, vol. 59, no. 5, 2008, pp. 842–844. Bayne, S., J. Ross, & Z. Williamson, ‘Objects, subjects, bits and bytes: learning from the digital collections of the National Museums’, Museum and Society, vol 7, no. 2, 2009, pp. 110–124. Bee, R., ‘The importance of preserving paper-based artifacts in a digital age’, Library Quarterly, vol. 78, no. 2, 2008, pp. 179–194. Bendikson, L., ‘Photographic copying and reproducing’. The Papers of the Bibliograph- ical Society of America, vol. 15, 1921, pp. 24–34. Benjamin, W., ‘The Work of Art in the Age of Mechanical reproduction’ in Illumina- tions, Walter Benjamin, Edited and with an Introduction by Hannah Arendt, Translated by Harry Zorn. London, Pimlico, 1999. Berelson, B., Content analysis in communication research, Glencoe, IL., Free Press, 1952. Besser, H., ‘The Next Stage: Moving from Isolated Digital Collections to Interopera- ble Digital Libraries’, First Monday, vol. 7, no. 6, 2002. Biggs, M. A. R., ‘What Characterizes Pictures and Text?’ Literary and Linguistic Com- puting, vol. 19, no. 3, 2004, pp. 265–272. Blanchette, J.-F., ‘A Material History of Bits’ Journal of the American Society for Infor- mation Science and Technology, vol. 62, no. 6, 2011, pp. 1042–1057. Bolin, G., ‘Media Technologies, Transmedia Storytelling and Commodification’, in T. Storsul & D. Stuedahl (eds.), Ambivalence towards Convergence: Digitaliza- tion and Media Change, Göteborg: Nordicom, Göteborgs universitet, 2007, pp. 237–248. Bolter, J. D., Writing space : the computer, hypertext, and the history of writing. Hillsdale NJ, Erlbaum, 1991. Bolter, J. D., & R. Grusin, Remediation: Understanding New Media, Cambridge, MA, MIT Press, 1999. Bolter, J. D., & D. Gromala, Windows and Mirrors: Interaction Design, Digital Art, and the Myth of Transparency, Leonardo, Cambridge, MA, MIT Press, 2003. 
Bos, W., & C. Tarnai, ‘Content analysis in empirical social research’, International Journal of Educational Research, vol. 31, no. 8, 1999, pp. 659–671.

244︱References Bowers, F, Principles of bibliographical description. Winchester, St. Paul’s Bibliogra- phies, 1994. Boyle, R., & H. Hiary, ‘Seeing the Invisible: Computer Science for Codicology’ in W. van Peursen, E. D. Thoutenhoofd & A. van der Weel (eds.), Text Comparison and Digital Creativity – The Production of Prescence and Meaning in Digital Text Scholarship, Leiden, Brill, 2010, pp. 129–148. Bradley, J., ‘Methodological Issues and Practicies in Qualitative Research’, The Li- brary Quarterly, vol. 63, no. 4, 1993, pp. 431–449. Bradley, R., ‘Digital Authenticity and Integrity: Digital Cultural Heritage Docu- ments as Research Resources’, portal: Libraries and the Academy, vol. 5, no. 2, 2005, pp. 165–175. Briet, S., R. E. Day, L. Martinet, & H. G. B. Anghelescu, What is documentation? Eng- lish translation of the classic French text. Lanham, MD, Scarecrow Press, 2006, Brown, C, ‘Remember the Hand: Bodies and Bookmaking in Early Medieval Spain’. Word & Image, 27, 2011, pp. 262–278. Brown, J. S., & P. Duguid, ‘The Social Life of Documents’, First Monday, vol 1, no. 1, 1996. . Buckland, M. K., ‘Information as Thing’, Journal of the American Society for Informa- tion Science, vol. 42, no. 5, 1991, pp. 351–360. Burdick, A., J. Drucker, P. Lunefeld, T. Presner, & J. Schnapp, Digital humanities, Cambridge, MA, MIT Press, 2012. Bush, V., ‘As We May Think’, Originally printed in Atlantic Monthly, vol. 176, no. 1, pp. 101-108, reprinted in N. Wardrip-Fruin & N. Montfort (eds.), The New Media Reader, Cambridge, MA, MIT Press, 2003, pp. 37–47. Butterfield, K., ‘Online Public Access Catalogs (OPACs)’Encyclopedia of Library and Information Sciences, third ed., published online: Dec. 2009, New York, Taylor and Francis, 2009, pp. 3992–3996. . Callon, M., ‘Some elements of a sociology of translation: domestication of the scal- lops and the fishermen of St Brieuc Bay’ in J. 
Law (ed.),Power, Action, and Belief: A New Sociology of Knowledge?, London, Routledge & Kegan Paul, 1986, pp. 196–233.

245︱References Cameron, F., ‘Beyond the Cult of the Replicant – Museums and Historical Digital Objects: Traditional Concerns, New Discourses’, in F. Cameron & S. Kend- erdine (eds.), Theorizing Digital Cultural Heritage: A Critical Discourse, Cam- bridge, MA, MIT Press, 2007, pp. 49–75. Caple, C., Objects: reluctant witnesses to the past. London, Routledge, 2006. Chartier, R., Forms and meanings: texts, performances, and audiences from codex to computer. New Cultural Studies, Philadelphia, PA, Univ. of Pennsylvania Press, 1995. Conway, P., ‘Building meaning in Digitized Photographs’. Journal of the Chicago Col- loquium on Digital Humanities and Computer Science, 1, 2009, pp. 1–18. Conway, P., ‘Preservation in the age of Google: Digitization, Digital preservation, and Dilemmas’, Library Quarterly, vol. 80, no. 1, 2010, pp. 61–79. Conway, P., ‘Modes of Seeing: Digitized Photographic Archives and the Experienced User’, The American Archivist, vol. 73, fall/winter 2010, pp. 425–462. Conway, P., ‘Archival quality and long-term preservation: a research framework for validating the usefulness of digital surrogates’. Archival Science, vol. 11, no. 3-4, 2011, pp. 293–309. Conway, P., & R. Punzalan, ‘Fields of Vision: Toward a New Theory of Visual Lit- eracy for Digitized Archival Photographs’, Archivaria, vol. 71, spring 2011, pp. 63–97. Cornelius, I., ‘Information and Interpretation’ in P. Ingwersen & N. O. Pors (eds.), Integration in Perspective, the CoLIS 2, Second international Conference on Conceptions of Library and Information Science, October 13-16, 1996, København, Royal Scool of Librarianship, 1996, pp. 11–21. Coyle, K., ‘Mass Digitization of Books’, The Journal of Academic Librarianship, vol. 32, no. 6, 2006, pp. 641–645. CPA, The Commission on Preservation and Access, CPA Newsletter #18, Nov-Dec 1989. Washington, DC, Commission on Preservation and Access, December 1989. Cummins, R., Representations, Targets, and Attitudes. Cambridge, MA, MIT Press, 1996. 
Dahlström, M., ‘How Reproductive is a Scholarly Edition?’, Literary and Linguistic Computing, vol. 19, no 1, 2004, pp. 17–33.

246︱References Dahlström, M., ‘The Compleat Edition’, in M. Deegan & K. Sutherland (eds.), Text Editing, Print and the Digital World, Aldershot, Ashgate, 2009, pp. 27–44. Dahlström, M., ‘Critical Editing and Critical Digitization’ in W. T. van Peursen, E. Thoutenhoofd & A. van der Weel (eds.), Text Comparison and Digital Creativ- ity – The Production of Prescence and Meaning in Digital Text Scholarship, Brill, 2010, pp. 79–97 Dahlström, M., J. Hansson, & U. Kjellman, ‘“As We May Digitize” — Institutions and Documents Reconfigured’.LIBER Quarterly, 21, 2012, 455–474. Dalbello, M., ‘A genealogy of digital humanities’, Journal of Documentation, vol 67, no. 3, 2011, pp. 480–506. Dalbello-Lovric, M., ‘The Case for Bibliographical Archeology’. Analytical and Enu- merative Bibliography, vol. 10, 1999, pp. 1–20. Davies, E., & P. J. McKenzie, ‘Preparing for opening night: temporal boundary ob- jects in textually-mediated professional practice’, Information Research, vol. 10, no. 1, 2004. Deegan, M., & K. Sutherland, Transferred Illusions: Digital technology and the forms of print, Farnham, Ashgate, 2009. Deegan, M., & S. Tanner, Digital futures: strategies for the information age. New York, Neal-Schuman, 2002. Deegan, M., & S. Tanner, Digital preservation. London, Facet, 2006. Dempsey, L., ‘Scientific, Industrial, and Cultural Heritage: A Shared Approach’.Ari - adne, issue 22, 1999. Drucker, J., The century of artists’ books, New York, Granary Books, 1995. Drucker, J., Developing Interpretation in E-Space: Humanities ‘Tools’ in Digital Environ- ments, Humanities Tools Part 3. 1, 11/17/2008, UCLA, 2008. . Drucker, J., ‘Entity to Event: From Literal, Mechanistic Materiality to Probabilistic Materiality’, Parallax, vol. 15, no 4, 2009, pp. 7–17. Drucker, J, ‘Humanities Approaches to Interface Theory’, Culture Machine, vol. 12, 2011, pp. 1–20. Drucker, J., & E. McVarish, Graphic Design History: A Critical Guide, Upper Saddle River, NJ, Prentice Hall, 2008. 
Duguid, P., ‘Inheritance and loss? A brief survey of Google Books’, First Monday, vol. 12, no 8, 2007. .

247︱References Duranti, L., ‘More than Information, other than Knowledge: the Nature of Archives in the Digital Era’, Cadernos BAD, vol. 2, 2003, pp. 6–16. Duranti, L., ‘From Digital Diplomatics to Digital Records Forensics’, Archivaria, vol. 68, fall 2009, pp. 39-66. Du Rietz, R. E., Den tryckta skriften: termer och begrepp: grunderna till bibliografin: för biblioteken och antikvariaten, för bibliografer och textutgivare, för bokhis- toriker och boksamlare, Uppsala, Dahlia Books, 1999. Eggert, P., ‘Text-encoding, Theories of the Text, and the ‘Work-Site’’, Literary and Linguistic Computing, vol. 20, no. 4, 2005, pp. 425–435. Eggert, P., Securing the Past: Conservation in Art, Architecture and Literature, Cambridge, UK, Cambridge University Press, 2009. Eisenstein, E. L., The printing press as an agent of change: Communications and cultural transformations in early-modern Europe, London, Cambridge Uni- versity Press, 1979. Ekbia, H. R., ‘Digital Artifacts as Quasi-Objects: Qualification, Mediation, and Materiality’Journal of the American Society for Information Science and Technolog y, vol. 60, no. 12, 2009, pp. 2554–2566. Erway, R., Defining ‘Born Digital’: an Essay by Ricky Erway. OCLC research, 2010. . Erway, R., & J. Schaffner,Shifting Gears: Gearing Up to Get Into the Flow, OCLC Programs and Research, 2007. Federal Agencies Digitization Initiative – Still Image Working Group, (FADGI), Technical Guidelines for Digitizing Cultural Heritage Materials: Creation of Raster Image Master Files. 2010, . Forbes-Pitt, K., ‘A document for document’s sake: A possible account for document system failures and a proposed way forward’, Records Management Journal, vol. 16, no. 1, 2006, pp. 13–20. France, F. G., D. Emery, & M. B. Toth, ‘The convergence of information technology, data, and management in a library imaging program’. Library Quarterly, vol. 80, no. 1, 2010, pp. 33–59. Furner, J., ‘Information Studies Without Information’, Librar y Trends, 52, 2004, pp. 427–446.

248︱References Gascoigne, B., How to Identify Prints: A Complete Guide to Manual and Mechanical Processes from Woodcut to Ink Jet, London, Thames and Hudson, 1986. Gaskell, P., A New Introduction to Bibliography. Oxford, Oxford Univ. Press, 1972. Gitelman, L., Paper knowledge: toward a media history of documents, Durham, Duke University Press, 2014. Given, L. M., & L. McTavish, ‘What’s old is new again: the reconvergence of Librar- ies, Archives, and Museums in the digital age’, The Library Quarterly, vol. 80, no. 1, 2010, pp. 7–32. Gladney, H. M., & J. L. Bennett, ‘What Do We Mean by Authentic? What’s the Real McCoy?’, D-Lib Magazine, vol. 9, no. 7/8, 2003. . Gordon, R., ‘The “paradigmatic art experience”? Reproductions and their effect on the experience of the “authentic” artwork’, in E. Hermens & T. Fiske (eds.), Art, Conservation and Authenticities: Material, Concept, Context. Proceedings of the International Conference Held at the University of Glasgow, 12-14 September 2007, London, Archetype Publications Ltd., 2009, pp. 259–266. Gould, S. J., & N. Eldredge, ‘Punctuated equilibria: the tempo and mode of evo- lution reconsidered’, Paleobiology, vol. 3, no. 3, 1977, pp. 115–151. Greetham, D. C., Theories of the text. Oxford, Oxford University Press, 1999. Greg, W. W., ‘Type facsimiles and others’. The Library, vol. 6, no. 4, 1926, pp. 321–326. Gunder, A., ‘Forming the Text, Performing the Work: Aspects of media, Navigation, and Linking’, Human IT, no.1-3, 2001, pp. 81–206. Haigh, T., ‘Remembering the Office of the Future: The Origins of Word Processing and Office Automation’,IEEE Annals of the History of Computing, vol. 28, no. 4, 2006, pp. 6–31. Hayles, N. K., ‘Translating media: Why We Should Rethink Textuality’, The Yale Journal of Criticism, vol. 16, no. 2, 2003, pp. 263–290. Henry, C., & K. 
Smith, ‘Ghostlier Demarcations: Large-Scale Text-Digitization Pro jects and Their Utility for Contemporary Humanities Scholarship’ in The Idea of Order: Transforming Research Collections for 21st Century Scholars- hip. Washington, DC, Council on Library and Information Resources, 2010, pp. 106–115. Hjørland, B., ‘Theory and metatheory of information science: a new interpretation’, Journal of Documentation, vol. 54, no. 5, 1998, pp. 606–621.

249︱References Hjørland, B., ‘Documents, memory institutions and information science’, Journal of Documentation, vol. 56, no. 1, 2000, pp. 27–41. Hjørland, B., ‘Information: Objective or Subjective/Situational?’, Journal of the Amer- ican Society for Information Science and Technology, vol. 58, no. 10, 2007, pp. 1448–1456. Hjørland, B., ‘The Controversy Over the Concept of “Information”: A Rejoinder to Professor Bates’, Journal of the American Society for Information Science and Technolog y, vol. 60, no. 3, 2009, p. 643. Hooper-Greenhill, E., Museums and the Interpretation of Visual Culture, Museum Meanings, London, Routledge, 2000. Hunter, D., Papermaking: The history and technique of an ancient craft. Republ. of the 2nd ed. 1947, New York, Dover Publ., 1978. Huvila, I., ‘The Politics of Boundary Objects: Hegemonic Interventions and the Making of a Document’, Journal of the American Society for Information Sci- ence and technology, vol. 62, no. 12, 2011, pp. 2528–2539. Huvila, I., ‘Authorship and documentary boundary objects’, The 45th Hawaii Interna- tional Conference on System Science (HICSS), Maui, IEEE Computer Society, 2012, pp. 1636–1645. Jackson, W. A., ‘Some limitations of Microfilm’,Papers of the bibliographical Society of America, vol. 35, 1941, pp. 281–288. Jeanneney, J.-N., Google and the myth of universal knowledge: a view from Europe. Chi- cago, University of Chicago Press, 2007. Julien, H., ‘Content Analysis’ in L. M. Given (ed.), The SAGE Encyclopedia of Qual- itative Research Methods, Vols. 1 & 2., London, Sage Publications, 2008, pp. 120–122. Kalfatovic, M. R., E. Kapsalis, K. P. Spiess, A. Camp, & M. Edson, ‘Smithsonian Team Flickr: a library, archives, and museums collaboration in web 2.0 space’, Archi- val Science, vol. 8, 2009, pp. 267–277. Kichuk, D., ‘Metamorphosis: Remediation in Early English Books Online (EEBO)’, Literary and Linguistic Computing, vol. 22, no. 3, 2007, pp. 291–303. Kilgour, F. G., The Evolution of the Book. 
New York, Oxford University Press, 1998. Kiphan, H., (ed.), Handbook of print media: technologies and production methods, Ber- lin, Springer, 2001.

Kirschenbaum, M. G., Mechanisms: New Media and the Forensic Imagination, Cambridge, MA, MIT Press, 2008.
Kirschenbaum, M. G., R. Ovenden, G. Redwine, & R. Donahue, Digital Forensics and Born-Digital Content in Cultural Heritage Collections, CLIR Publication 149, Washington, DC, Council on Library and Information Resources, 2010.
Kjellman, U., 'Digitalisering som standardisering, verktyg i konciperingen av kulturarv' in N. D. Lund (ed.), Digital Formidling Af Kulturarv: Fra Samling Til Sampling, København, Multivers, 2009, pp. 219–237.
Krippendorff, K., 'Major Metaphors of Communication and some Constructivist Reflections on their use', Cybernetics & Human Knowing, vol. 2, no. 1, 1993, pp. 3–25.
Krippendorff, K., Content Analysis: An Introduction to its Methodology, 2nd ed., Thousand Oaks, CA, Sage, 2004.
Kristensen, M. A., 'Pølsevognen er døden nær - dr.dk/Nyheder/Penge', DR.DK 2006-01-09, 2009.
Kurtz, D. C., The reception of classical art in Britain: an Oxford story of plaster casts from the Antique, British Archaeological Reports, British Series, 0143-3032; 308, Oxford, Archaeopress, 2000.
Larsmo, O., 'The temptations of the dinosaur theory', Eurozine 2007-08-03, 2007.
Latour, B., 'Visualisation and cognition: drawing things together' in H. Kuklick (ed.), Knowledge and Society: Studies in the Sociology of Culture Past and Present, JAI Press, 1986, vol. 6, pp. 1–40.
Latour, B., Science in action: how to follow scientists and engineers through society, Cambridge, MA, Harvard University Press, 1987.
Latour, B., & A. Lowe, 'The Migration of the Aura, or How to Explore the Original through Its Facsimiles' in T. Bartscherer & R. Coover (eds.), Switching Codes: Thinking Through Digital Technology in the Humanities and the Arts, Chicago, University of Chicago Press, 2011, pp. 275–297.
Lee, C. P., 'Boundary Negotiating Artifacts: Unbinding the Routine of Boundary Objects and Embracing Chaos in Collaborative Work', Computer Supported Cooperative Work (CSCW), vol. 16, no. 3, 2007, pp. 307–339.
Lee, S. D., 'Scoping the Future of the University of Oxford's Digital Library Collections, Final Report', 1999.

Lesk, M., Report: Image Formats for Preservation and Access. A Report of the Technology Assessment Advisory Committee to the Commission on Preservation and Access, Washington, DC, Commission on Preservation and Access, July 1990.
Levinson, P., The soft edge: a natural history and future of the information revolution, London, Routledge, 1997.
Levy, D. M., 'Fixed or Fluid? Document Stability and New Media', European Conference on Hypermedia Technology 1994 Proceedings, ECHT94, Edinburgh, Scotland, New York, ACM, 1994, pp. 24–31.
Levy, D. M., 'Where's Waldo? Reflections on Copies and Authenticity in a Digital Environment', Authenticity in a Digital Environment, Washington, DC, Council on Library and Information Resources, 2000, pp. 24–31.
Levy, D. M., Scrolling Forward: making sense of documents in the digital age, New York, Arcade Pub., 2001.
Lonsdale, H., C. Jensen, E. Wynn, & N. J. Dedual, 'Cutting and Pasting Up: Documents and Provenance in a Complex Work Environment', The 43rd Hawaii International Conference on System Sciences (HICSS), Hawaii, 2010.
Lopatin, L., 'Library digitization projects, issues and guidelines: A survey of the literature', Library Hi Tech, vol. 24, 2006, pp. 273–289.
Lund, N. W., 'Documentation in a Complementary Perspective' in W. B. Rayward (ed.), Aware and Responsible: Papers of the Nordic-International Colloquium on Social and Cultural Awareness and Responsibility in Library, Information and Documentation Studies (SCARLID), Lanham, MD, Scarecrow Press, 2004, pp. 93–102.
Lund, N. W., 'Document theory', Annual Review of Information Science and Technology, vol. 43, 2009, pp. 399–432.
Lund, N. W., 'Document, text and medium: concepts, theories and disciplines', Journal of Documentation, vol. 66, no. 5, 2010, pp. 734–749.
Lundblad, K., 'Kodexsimulationer', in P. S. Ridderstad & K. Lundblad (eds.), Bokhistorier: Studier Tillägnade Per S. Ridderstad, Stockholm, Signum, 2007, pp. 95–103.
MacNeil, H., & B. Mak, 'Constructions of Authenticity', Library Trends, vol. 56, no. 1, 2007, pp. 26–52.
Manoff, M., 'Hybridity, mutability, multiplicity: theorizing electronic library collections', Library Trends, vol. 48, no. 4, 2000, pp. 857–876.

Manoff, M., 'The Materiality of Digital Collections: Theoretical and Historical perspectives', portal: Libraries and the Academy, vol. 6, no. 3, 2006, pp. 311–325.
Manovich, L., The language of New Media, Cambridge, MA, MIT Press, 2001.
Mariner, M. C., 'Optical Character Recognition (OCR)', Encyclopedia of Library and Information Sciences, 3rd ed., published online 31 Aug 2011, New York, Taylor and Francis, 2011, pp. 4037–4044.
Maroevic, I., 'The phenomenon of cultural heritage and the definition of a unit of material', Nordisk Museologi, vol. 2, 1998, pp. 135–142.
Mayring, P., 'Qualitative Content Analysis', Forum Qualitative Sozialforschung / Forum: Qualitative Social Research, vol. 1, no. 2, 2000.
McKitterick, D., Old books, new technologies: the representation, conservation and transformation of books since 1700, Cambridge, Cambridge University Press, 2013.
McKitterick, D., Print, Manuscript and the Search for Order, 1450-1830, Cambridge, Cambridge University Press, 2003.
McLuhan, M., Understanding media: the extensions of man, Sphere Books, London, Routledge, 1964.
Meadows, A. J., 'The scientific paper as an archaeological artefact', Journal of Information Science, vol. 11, no. 27, 1985, pp. 27–30.
Meckler, A. M., Micropublishing: A History of Scholarly Micropublishing in America, 1938-1980, Westport, CT, Greenwood Press, 1982.
Menkman, R., The Glitch Moment(um), Network Notebooks 04, Amsterdam, Institute of Network Cultures, 2011.
Meyrowitz, J., 'Multiple Media Literacies', Journal of Communication, vol. 48, no. 1, 1998, pp. 96–108.
Miller, P. D. / DJ Spooky, Dark Carnival.
Mitchell, W. J. T., 'Visual literacy or literal visualcy?', in J. Elkins (ed.), Visual Literacy, New York, Routledge, 2008, pp. 11–19.

Montfort, N., 'Continuous Paper: Print interfaces and early computer writing', ISEA 2004 International, the annual International Symposium on Electronic Art, Helsinki, 2004.
Muñoz Viñas, S., Contemporary Theory of Conservation, Oxford, Elsevier Butterworth-Heinemann, 2005.
Nadeau, L., 'Chapter 16 – Office copying & printing processes', Guide to the Identification of Prints and Photographs, Featuring a Chronological History of Reproduction Technologies, New Brunswick, Atelier, 2002.
Needham, P., 'Copy-specifics in the Printing Shop' in B. Wagner & R. Marcia (eds.), Early Printed Books as Material Objects: Proceedings of the Conference Organized by the IFLA Rare Books and Manuscripts Section, Munich, 19-21 August 2009, Berlin, De Gruyter Saur, 2010, pp. 9–20.
Needham, J., & T. Tsien, Science and civilisation in China. Vol. 5, Chemistry and chemical technology, Pt. 1, Paper and printing, Cambridge, Cambridge University Press, 1985.
Néve, F.-X., 'Consciousness, Mankind, Museums', The 2011 Micheletti Award Conference and Ceremony, Dortmund, April 9th, 2011.
Spiegel, 'New Computer Program to Reassemble Shredded Stasi Files', Spiegel Online 10 May 2007, 2007.
Nye, D. E., Technology Matters: Questions to Live With, Cambridge, MA, MIT Press, 2006.
OAIS, Reference Model for an Open Archival Information System (OAIS), CCSDS 650.0-B-1, Council of the Consultative Committee for Space Data Systems (CCSDS), 2002.
Ogburn, W. F., Social change with respect to culture and original nature, New York, B. W. Huebsch, Inc., 1922.
Østerlund, C., 'The Materiality of Communicative Practices', Scandinavian Journal of Information Systems, vol. 20, no. 1, 2008, pp. 7–40.
Østerlund, C., & K. Crowston, 'Boundary-spanning documents in online communities' in International Conference on Information Systems, 2011.
Owens, T., 'All Digital Objects are Born Digital Objects', [Blog] The Signal: Digital Preservation, Library of Congress, 15 May 2012.

Paraschakis, D., & M. Gustafsson Friberger, 'Playful crowdsourcing of archival metadata through social networks', Proceedings of ASE SocialCom-Stanford, ASE, 2014.
Pearsall, D., 'Texts, Textual Criticism, and Fifteenth Century Manuscript Production' in R. F. Yeager (ed.), Fifteenth-Century Studies: Recent Essays, Hamden, CT, Archon Books, 1984, pp. 121–136.
Pédauque, R. T., Document: Form, Sign and Medium, As Reformulated for Electronic Documents, ver. 3, STIC-CNRS, 2003.
Pellegram, A., 'The message in paper', in D. Miller (ed.), Material Cultures: Why Some Things Matter, London, UCL Press, 1998, pp. 103–120.
Pollard, A. W., '"Facsimile" reprints of old books. I. Preliminary survey', The Library, vol. 6, 1926, pp. 307–313.
Price, L., 'The Death of the Book Through the Ages', The New York Times 10 August 2012, Sunday Book Review.
Prior, L., Using Documents in Social Research, London, Sage, 2003.
Puglia, S., & E. Rhodes, 'Digital Imaging - How Far Have We Come and What Still Needs to be Done?', RLG DigiNews, vol. 11, no. 1, 2007.
Ranganathan, S. R., Documentation and Its Facets: Being a Symposium of Seventy Papers by Thirty-two Authors, Sarada Ranganathan Endowment for Library Science (vol. 10), Bombay; New York, Asia Pub. House, 1963.
Reddy, M. J., 'The conduit metaphor: A case of frame conflict in our language about language', in A. Ortony (ed.), Metaphor and Thought, 2nd ed., Cambridge, Cambridge University Press, 1993, pp. 164–201.
Redgrave, G. R., '"Facsimile" reprints of old books. II. Photographic facsimile', The Library, vol. 6, no. 4, 1926, pp. 313–317.
Rieger, O. Y., Preservation in the Age of Large-Scale Digitization: A White Paper, CLIR Publication (vol. 141), Washington, DC, Council on Library and Information Resources, 2008.
Rimmer, J., C. Warwick, A. Blandford, J. Gow, & G. Buchanan, 'An examination of the physical and the digital qualities of humanities research', Information Processing & Management, vol. 44, no. 3, 2008, pp. 1374–1392.
Rosch, E., & B. B. Lloyd, Cognition and categorization, Hillsdale, NJ, Erlbaum, 1978.

Rowe, J., A. Razdan, & J. Femiani, 'Applying 3-Dimensional Modeling Tools to Analysis of Handwritten Manuscripts', RLG DigiNews, vol. 8, no. 4, 2004.
Salaün, J.-M., 'Why the Document Is Important... and How It Is Becoming Transformed', The Monist, vol. 97, no. 2, 2014, pp. 187–199.
Sassoon, J., 'Photographic meaning in the age of digital reproduction', LASIE: Library Automated Systems Information Exchange, vol. 29, no. 4, 1998, pp. 5–15.
Schilling, J., 'On the Pragmatics of Qualitative Assessment – Designing the Process for Content Analysis', European Journal of Psychological Assessment, vol. 22, no. 1, 2006, pp. 28–37.
Schreibman, S., R. Siemens, & J. Unsworth, Companion to Digital Humanities (Blackwell Companions to Literature and Culture), Oxford, Blackwell Publishing Professional, 2004.
Schreier, M., Qualitative content analysis in practice, Los Angeles, CA, SAGE, 2012.
Schwartz, H., The Culture of the Copy: Striking Likenesses, Unreasonable Facsimiles, New York, Zone Books, 1996.
Scott, J., A Matter of Record: Documentary Sources in Social Research, Cambridge, Polity Press, 1990.
Sellen, A. J., & R. H. R. Harper, The Myth of the Paperless Office, Cambridge, MA, MIT Press, 2002.
Shannon, C. E., 'A Mathematical Theory of Communication (Republished with corrections from "The Bell System Technical Journal" by the Mathematical Sciences Center of Bell Laboratories, Lucent Technologies, to celebrate the 50th anniversary of its publication.)', The Bell System Technical Journal, vol. 27, 1948, pp. 379–423 and 623–656.
Shera, J. H., The foundations of education for librarianship, New York, 1972.
Skare, R., N. W. Lund, & A. Vårheim, A document (re)turn: contributions from a research field in transition, Frankfurt am Main, Lang, 2007.
Smiraglia, R. P., The nature of 'a work': implications for the organization of knowledge, Lanham, MD, Scarecrow, 2001.
Smiraglia, R. P., 'Instantiation: Toward a Theory', in L. Vaughan (ed.), Proceedings of the Annual Conference of CAIS/Actes du Congrès Annuel de l'ACSI: Data, Information, and Knowledge in a Networked World, 2–4 June 2005, The University of Western Ontario, Canadian Association for Information Science / L'Association canadienne des sciences de l'information, 2005, pp. 1–8.

Smiraglia, R. P., 'Content Metadata – An Analysis of Etruscan Artifacts in a Museum of Archeology', Cataloging & Classification Quarterly, vol. 40, 2005, pp. 135–151.
Smith, A., Strategies for Building Digitized Collections, Washington, DC, Digital Library Federation, Council on Library and Information Resources, 2001.
Smith, A., 'Authenticity and Affect: When Is a Watch Not a Watch?', Library Trends, vol. 52, no. 1, 2003, pp. 172–182.
Snickars, P., & P. Vonderau (eds.), Moving data: the iPhone and the future of media, New York, Columbia University Press, 2012.
Star, S. L., 'This is Not a Boundary Object: Reflections on the Origin of a Concept', Science, Technology, & Human Values, vol. 35, no. 5, 2010, pp. 601–617.
Star, S. L., & J. R. Griesemer, 'Institutional Ecology, "Translations" and Boundary Objects: Amateurs and Professionals in Berkeley's Museum of Vertebrate Zoology, 1907-39', Social Studies of Science, vol. 19, 1989, pp. 387–420.
Sterne, J., MP3: The meaning of a format, Duke University Press, 2012.
Tanselle, G. T., 'Reproductions and Scholarship', Studies in Bibliography, vol. 42, 1989, pp. 25–54.
Tanselle, G. T., 'The Varieties of Scholarly Editing' in D. C. Greetham (ed.), Scholarly Editing: A Guide to Research, New York, The Modern Language Association of America, 1995, pp. 9–32.
Tanselle, G. T., Literature and artifacts, Charlottesville, Bibliographical Society of the Univ. of Virginia, 1998.
Tanselle, G. T., 'The Textual Criticism of Visual and Aural Works', Studies in Bibliography, vol. 57, 2005, pp. 1–37.
Tanselle, G. T., Bibliographical Analysis: A Historical Introduction, Leiden, Cambridge University Press, 2009.
Tarte, S. M., 'Digitizing the act of papyrological interpretation: negotiating spurious exactitude and genuine uncertainty', Literary and Linguistic Computing, vol. 26, no. 3, 2011, pp. 349–358.
Terras, M. M., Image to Interpretation: An Intelligent System to Aid Historians in Reading the Vindolanda Texts, Oxford, Oxford University Press, 2006.
Terras, M. M., Digital images for the information professional, Aldershot, England, Ashgate, 2008.
Terras, M. M., 'Artefacts and Errors: Acknowledging Issues of Representation in the Digital Imaging of Ancient Texts', in F. Fischer, C. Fritze & G. Vogeler (eds.),

Kodikologie und Paläographie im digitalen Zeitalter 2 – Codicology and Palaeography in the Digital Age 2 [...], Norderstedt, BoD, 2010, pp. 43–61.
Terras, M. M., 'The Rise of Digitization: An Overview' in R. Rikowski (ed.), Digitisation Perspectives, Dordrecht, Springer, 2011, pp. 3–20.
Terras, M. M., J. Nyhan, & E. Vanhoutte, Defining digital humanities: a reader, Farnham, Ashgate, 2013.
Townsend, R. B., 'Google Books: Is It Good for History?', Perspectives on History – The News Magazine of the American Historical Association, September 2007.
Van Mensch, P., Towards a methodology of museology, PhD thesis, Zagreb, University of Zagreb, 1992.
Warwick, C., M. Terras, P. Huntington, N. Pappa, & I. Galina, The LAIRAH Project: Log Analysis of Digital Resources in the Arts and Humanities, Final Report to the Arts and Humanities Research Council, School of Library, Archive and Information Studies, University College London, 2006.
Warwick, C., I. Galina, J. Rimmer, M. Terras, A. Blandford, J. Gow, et al., 'Documentation and the users of digital resources in the humanities', Journal of Documentation, vol. 65, no. 1, 2009, pp. 33–57.
Verheusen, A., 'Mass Digitisation by Libraries: Issues concerning Organisation, Quality and Efficiency', Liber Quarterly, vol. 18, no. 1, 2008, pp. 28–38.
Vickery, A., & B. C. Vickery, Information Science in Theory and Practice, 3rd ed., München, K. G. Saur, 2004.
Weitenkampf, F., 'What is a facsimile?', The Papers of the Bibliographical Society of America, vol. 37, 1943, pp. 114–130.
White, M. D., & E. E. Marsh, 'Content Analysis: A Flexible Methodology', Library Trends, vol. 55, no. 1, 2006, pp. 22–45.
Wildemuth, B. M. (ed.), Applications of Social Research Methods to Questions in Information and Library Science, Westport, CT, Libraries Unlimited, 2009.
Wildemuth, B. M., 'Existing Documents and Artifacts as Data', in B. M. Wildemuth (ed.), Applications of Social Research Methods to Questions in Information and Library Science, Westport, CT, Libraries Unlimited, 2009, pp. 158–165.

Williams, R., 'The Technology and the Society' in N. Wardrip-Fruin & N. Montfort (eds.), The New Media Reader, Cambridge, MA, MIT Press, 2003, pp. 291–300.
Yeo, G., 'Concepts of Record (1): Evidence, Information, and Persistent Representations', The American Archivist, vol. 70, 2007, pp. 315–343.
Yeo, G., 'Concepts of Record (2): Prototypes and Boundary Objects', The American Archivist, vol. 71, 2008, pp. 118–143.
Yeo, G., '"Nothing is the same as something else": significant properties and notions of identity and originality', Archival Science, vol. 10, 2010, pp. 85–116.
Zachry, M., 'An Interview with Susan Leigh Star', Technical Communication Quarterly, vol. 17, no. 4, 2008, pp. 435–454.
Zhang, Y., & B. M. Wildemuth, 'Qualitative Analysis of Content' in B. M. Wildemuth (ed.), Applications of Social Research Methods to Questions in Information and Library Science, Westport, CT, Libraries Unlimited, 2009, pp. 308–319.

Appendix 1 – Technical standards

Some standards relating to imaging are available from the International Organization for Standardization (ISO) web site: www.iso.org. The Federal Agencies Digitization Guidelines Initiative (FADGI) provides supplementary information; see A Resource List for Standards Related to Digital Imaging of Print, Graphic, and Pictorial Materials.

ISO 12231 – Electronic still picture imaging: Vocabulary
ISO 12233:2014 – Electronic still picture imaging: Resolution and spatial frequency responses
ISO 12646 – Graphic technology: Displays for colour proofing, characteristics and viewing conditions
ISO 14524 – Electronic still-picture cameras: Methods for measuring opto-electronic conversion functions
ISO 14545 – Opto-electronic conversion function (tone response curve)
ISO 14721:2012 – The Open Archival Information System (OAIS) reference model
ISO 15489-1 – Information and documentation: Records management, Part 1: General
ISO 15739:2013 – Electronic still-picture imaging: Noise measurements
ISO 16067-1:2003 – Spatial resolution measurements of electronic scanners for photographic images, Part 1: Scanners for reflective media
ISO 3664 – Viewing conditions: For graphic technology and photography
NISO Z39.50 – Communication protocol for bibliographic information

Appendix 2 – Interview protocol

• Describe your professional background.

• Describe your experiences of working with digital resources in general.

• What is your experience of working with bokhylla.no/VD16?

• Describe “typical working procedures” involving the use of digital information resources accessed via databases on the internet.

• What aspects of the digital representation are important for your feelings of confidence and trust in the representation?

• Has the use of digital resources affected your research strategies? If so, how?

• Give some examples of digital resources (databases, portals) that you appreciate. Why are these useful, in your opinion?

Appendix 3 – Recording and transcribing the interviews

All interviews except one were carried out at the respondent's workplace.458 The interviews lasted 45–65 minutes and followed a loosely structured format based on open questions.

The interviews were recorded on an iPhone and an iPad mini simultaneously in order to avoid losses due to technical failure. The iOS recording software FiRe459 was used for the recordings, and the application SpeedUp Player,460 which enables low-speed playback, was used for transcription.

The interviews were transcribed in their entirety, adding time stamps and line numbering to the transcription in order to facilitate subsequent navigation. In some instances minor sections of the interviews were not transcribed, as they focused on issues not related to the study. In these cases the start and end times of the omitted passages were noted in the transcripts.

458 The exception being a hotel foyer en route to work for one of the respondents. 459 460

Appendix 4 – Processing the interviews

Table 4 exemplifies the coding process as applied to a unit of analysis (informant R14, extracts from lines 218-230).

Example of the process of analysis, extract from interview with R14

Lines 218–225
"... we have mainly used them to identify our copies ... in order to establish the exact edition and ... they are so accurate – the VD16 and VD17 – that you can use them on the level of variants of a specific title ... and in addition use them to find out what the copy should contain ... how many pages there should be ... whether plates are missing, in other words if the copy in our holding is complete and which variant it is ..."
Coding: interest: identification; interest: to establish editions; interest: to establish variants; interest: to establish completeness

[lines 226–230 omitted]

Lines 231–234
"... copy or if all libraries have one ... because that gives you some kind of indication regarding its rarity ... so we use the database both to identify the copy, to establish its completeness and to get an indication whether it's rare or not ..."
Coding: interest: as indication of rarity

Lines 235–238
"... these are the main uses ... and that is more or less the same usage as the ordinary bibliographies ... it's the same function ... It's just that you use the digital database instead"
Coding: strategy: bibliographical id

Line 239 (interviewer)
"do you find it easier?"

[lines 240–241 omitted]

Lines 242–243
"It's easier and most importantly there is such an incredible volume ..."
Coding: Observation: easy use

[lines 244–257 omitted]

Lines 258–259
"... but it also fascinates me ... the amount of information available ... information that has been gathered ... that you can find out the ..."
Coding: Observation: positive aspect/the amount of information

[lines 260–277 omitted]

Lines 278–280
"we very rarely take advantage of an entire book being available in digital format ... it's enough if there is a collation indicating the number of pages ..."
Coding: Observation: rarely use of complete representation; mode of representation: metadata

Lines 281–285
"but what is very useful is that they have digitised ... that they have these sample images of the title page ... the dedication and perhaps the first text page ... it is extremely useful that they add these examples to every entry in the catalogue ..."
Coding: mode of representation: image; facilitating: sample images

[lines 286–287 omitted]

Lines 288–289
"it speeds up the work ... because in this way you will find the versions that are not text variants"
Coding: Observation: efficiency

Lines 290–292
"but variants based on variation in typographical ornaments ... because even such details are registered, for example that there is 'such as bend instead of this type of bend' [gesticulates]"

Lines 293–294 (interviewer)
"is that not registered in the metadata? How consistent is that terminology?"

Lines 296–300
"if there is a variation in ornamentation on the title page ... say if the line that separates the impressum from the title in one copy is this long [gesticulates] and in the other copy this long [gesticulates], they will separate them in two bibliographical entries ... there will be two variants ..."
Coding: strategy: typographical variation; mode of representation: metadata
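The coding recorded in Table 4 amounts to mapping line-numbered transcript segments onto analytical codes. Purely as an illustrative sketch – not part of the study's actual procedure, and with segment boundaries and wording paraphrased from the R14 extract – such coded units, and a tally of the code families they belong to, could be represented as follows:

```python
# Illustrative sketch only: a minimal data structure for coded meaning
# units (cf. Table 4) and a tally of the code families assigned to them.
# The segments and codes below paraphrase the R14 extract; nothing here
# reproduces the study's actual analytical procedure.
from collections import Counter
from dataclasses import dataclass


@dataclass
class CodedUnit:
    lines: tuple          # (start, end) line numbers in the transcript
    meaning_unit: str     # the emphasised passage
    code: str             # the analytical code assigned to it


units = [
    CodedUnit((219, 219), "identify our copies", "interest: identification"),
    CodedUnit((220, 220), "establish the exact edition", "interest: to establish editions"),
    CodedUnit((221, 222), "use them on the level of variants", "interest: to establish variants"),
    CodedUnit((223, 225), "what the copy should contain", "interest: to establish completeness"),
    CodedUnit((231, 234), "indication regarding its rarity", "interest: as indication of rarity"),
]

# Group codes by their family, i.e. the label before the first colon
# ("interest", "strategy", "Observation", "mode of representation", ...).
families = Counter(unit.code.split(":")[0] for unit in units)
print(families)  # Counter({'interest': 5})
```

A frequency overview of this kind is simply one way such coded material can be summarised once the units and codes have been recorded; it is not drawn from the thesis itself.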

Appendix 5 – Mail to respondents

This mail was sent to all potential candidates and refers to either ”digital bokhylla” (bokhylla.no) or VD16:

Hej,

XXXX XXXX vid XXXX XXXX tipsade mig om att kontakta dig i följande fråga; Jag arbetar med en avhandling där jag studerar användningen av digitala representationer av tryckta texter. Det norska nationalbibliotekets presentation av norsk litteratur “digital bokhylla” ingår bland de digitaliseringsverksamheter som jag undersöker. Detta omfattar intervjuer med användare och det är därför jag vänder mig till dig. Använder du (har du använt) dig av “digital bokhylla” inom ramen för ditt arbete/forskning? Skulle du i så fall vara intresserad av att ställa upp på en intervju (ca 30 min) kring dina erfarenheter av att använda denna resurs?

Jag vore också tacksam för tips på andra användare som du tror skulle kunna vara intresserade av att medverka i en intervju.

Varma hälsningar

Lars Björk

______
Lars Björk
Senior Conservator, Research Dpt.
National Library of Sweden

PhD Student, Swedish School of Library & Information Science
UC Borås/Univ. of Gothenburg, Sweden

Mail: National Library of Sweden Box 5039, 102 41 Stockholm, Sweden

In a subsequent mail the respondents who had agreed to participate were informed that the interview would require more time (although in hindsight still short). One such example below:

Jag försöker planera några besök i XXXX för att intervjua användare av bokhylla och har några prel datum som jag skulle vilja stämma av med dig.. (eftersom jag hoppas kunna intervjua ett antal personer vid varje besök i XXXX tänkte jag försöka börja med att hitta några datum som fungerar för att sedan hitta tidpunkter… detta kan bli spännande och rörigt) :-) Det dagar som jag räknar med är fred 22/3, månd 25/3, tisd 26/3 (samt möjligtvis ons 13/3) Hur ser det ut för dig dessa dagar? Intervjun tar ungefär 45 min..

Varma hälsningar//Lars

The respondents were asked for consent regarding the recording of interviews. Below is an example of mail communication with one of the respondents regarding this matter:

Hej, Ja det är skönt att ha en fast punkt. Jag skulle vilja spela in vårt samtal för att inte vara så upptagen av antecknandet, hoppas att det är ok med dig. Det finns kanske ngt tyst hörn på kaféet som passar. Vi ses där och då.

Det kan vara bra om vi har möjlighet till smskontakt ifall ngt inträffar … mitt mob.nr: +46 761 858562

Vänliga hälsningar Lars

Appendix 6 – Publications 1980–2010

Diagram indicating the number of publications in English with relevance to the planning and implementation of digitisation processes within the library sector during the period 1980-2010. The information is based on counting relevant entries in C. W. Bailey, Digital Curation and Preservation Bibliography. Houston, TX: Digital Scholarship, 2012.

Publikationer i serien Skrifter från VALFRID

Enmark, Romulo (red.) (1990). Biblioteksstudier. Folkbibliotek i flervetenskaplig belysning. (ISBN 91-971457-0-X) Skriftserien ; 1

Enmark, Romulo (red.) (1991). Biblioteken och framtiden. [Bok 1], Framtidsdebatten i nordisk bibliotekspress. (ISBN 91-971457-1-8) Skriftserien ; 2

Selden, Lars (red.) (1992). Biblioteken och framtiden. [Bok 2], Nordisk idédebatt : konferens i Borås 11-13 november 1991. (ISBN 91-971457- 2-6) Skriftserien ; 3

Hjørland, Birger (1993). Emnerepræsentation og informationssøgning : bidrag til en teori på kundskabsteoretisk grundlag. 2. uppl. med register. (ISBN 91-971457-4-2) Skriftserien ; 4

Höglund, Lars (red.) (1995). Biblioteken, kulturen och den sociala intelligensen : aktuell forskning inom biblioteks- och informationsvetenskap. (ISBN 91-971457-5-0) Skriftserien ; 5

Hjørland, Birger (1995). Faglitteratur : kvalitet, vurdering og selektion : grundbog i materialevalg. (ISBN 91-971457-6-9) Skriftserien ; 6

Limberg, Louise (1996). Skolbiblioteksmodeller. Utvärdering av ett utvecklingsprojekt i Örebro län. (ISBN 91-971457-7-7) Skriftserien ; 7

Hjørland, Birger (1997). Faglitteratur : kvalitet, vurdering og selektion. Bd1, Materialevalgets almene teori, metoder og forudsætninger. (ISBN 91-971457-8-5) Skriftserien ; 8

Pettersson, Rune (1997). Verbo-visual Communication - Presentation of Clear Messages for Information and Learning. (ISBN 91-971457-9-3) Skriftserien ; 9

Pettersson, Rune (1997). Verbo-visual Communication - 12 Selected Papers. (ISBN 91-973090-0-1) Skriftserien ; 10

Limberg, Louise. (1997). Skolbiblioteksmodeller : utvärdering av ett utvecklingsprojekt i Örebro län. (ISBN 91-973090-1-X) Skriftserien ; 11 (nytryck av nr 7, bilaga inne i boken)

Eliasson, Anette, Lööf Staffan, Rydsjö, Kerstin (red.) (1997). Barnbibliotek och informationsteknik : elektroniska medier för barn och ungdomar på folkbibliotek. (ISBN 91-973090-2-8) Skriftserien ; 12

Klasson, Maj (red.) (1997). Folkbildning och bibliotek? : på spaning efter spår av folkbildning och livslångt lärande i biblioteksvärlden. (ISBN 91-973090-3-6) Skriftserien ; 13

Zetterlund, Angela (1997). Utvärdering och folkbibliotek : en studie av utvärderingens teori och praktik med exempel från folkbibliotekens förändrings- och utvecklingsprojekt. (ISBN 91-973090-4-4) Skriftserien ; 14

Myrstener, Mats (1998). På väg mot ett stadsbibliotek : folkbiblioteksväsendets framväxt i Stockholm t o m 1927. (ISBN 91- 973090-5-2) Skriftserien ; 15

Limberg, Louise (2001). Att söka information för att lära : en studie av samspel mellan informationssökning och lärande. (ISBN 91-973090-6- 0) Första uppl. 1998. Skriftserien ; 16

Hansson, Joacim (1998). Om folkbibliotekens ideologiska identitet : en diskursstudie. (ISBN 91-973090-7-9) Skriftserien ; 17

Gram, Magdalena (2000). Konstbiblioteket: en krönika och en fallstudie. (ISBN 91-973090-8-7) Skriftserien ; 18

Hansson, Joacim (1999). Klassifikation, bibliotek och samhälle : en kritisk hermeneutisk studie av ”Klassifikationssystem för svenska bibliotek”. (ISBN 91-973090-9-5) Skriftserien ; 19

Seldén, Lars (2004). Kapital och karriär : informationssökning i forskningens vardagspraktik. (ISBN 91-89416-08-2) Första uppl. 1999. Skriftserien ; 20

Edström, Göte (2000). Filter, raster, mönster : litteraturguide i teori- och metodlitteratur för biblioteks- och informationsvetenskap och angränsande ämnen inom humaniora och samhällsvetenskap. (ISBN 91-89416-01-5) Skriftserien ; 21

Klasson, Maj (red.) (2000). Röster : biblioteksbranden i Linköping. (ISBN 91-89416-02-3) Skriftserien ; 22

Stenberg, Catharina (2001). Litteraturpolitik och bibliotek : en kulturpolitisk analys av bibliotekens litteraturförvärv speglad i Litteraturutredningen L 68 och Folkbiblioteksutredningen FB 80. (ISBN 91-89416-03-1) Skriftserien ; 23

Edström, Göte (2002). Filter, raster, mönster : litteraturguide i teori- och metodlitteratur för biblioteks- och informationsvetenskap och angränsande ämnen inom humaniora och samhällsvetenskap. Andra aktualiserade och utökade upplagan. (ISBN 91-89416-05-8) Skriftserien ; 24

Sundin, Olof (2003). Informationsstrategier och yrkesidentiteter - en studie av sjuksköterskors relation till fackinformation vid arbetsplatsen. (ISBN 91-89416-06-6) Skriftserien ; 25

Hessler, Gunnel (2003). Identitet och förändring : en studie av ett universitetsbibliotek och dess självproduktion. (ISBN 91-89416-07-4) Skriftserien ; 26

Zetterlund, Angela (2004). Att utvärdera i praktiken : en retrospektiv fallstudie av tre program för lokal folkbiblioteksutveckling. (ISBN 91- 89416-09-0) Skriftserien ; 27

Ahlgren, Per (2004). The effects on indexing strategy-query term combination on retrieval effectiveness in a Swedish full text database. (ISBN 91-89416-10-4) Skriftserien ; 28

Thórsteinsdóttir, Gudrun (2005). The information seeking behaviour of distance students : a study of twenty Swedish library and information science students. (ISBN 91-89416-11-2) Skriftserien ; 29

Jarneving, Bo (2005). The combined application of bibliographic coupling and the complete link cluster method in bibliometric science mapping. (ISBN 91-89416-12-0) Skriftserien ; 30

Limberg, Louise, Folkesson, Lena (2006). Undervisning i informationssökning: slutrapport från projektet Informationssökning, didaktik och lärande (IDOL). (ISBN 91-89416-13-9) Skriftserien ; 31

Johannisson, Jenny (2006) Det lokala möter världen: kulturpolitiskt förändringsarbete i 1990-talets Göteborg. (ISBN 91-89416-14-7) Skriftserien ; 32

Gärdén, Cecilia, Eliasson, Anette, Flöög, Eva-Maria, Persson, Christina, Zetterlund, Angela (2006). Folkbibliotek och vuxnas lärande: förutsättningar, dilemman och möjligheter i utvecklingsprojekt. (ISBN 91-89416-15-5) Skriftserien ; 33

Dahlström, Mats (2006). Under utgivning : den vetenskapliga utgivningens bibliografiska funktion. (ISBN 91-89416-16-3) Skriftserien ; 34

Nowé, Karen (2007). Tensions and Contradictions in Information Management: an activity-theoretical approach to information activities in a Swedish youth/peace organisation. (ISBN 978-91-85659-08-1) Skriftserien ; 35

Francke, Helena (2008). (Re)creations of Scholarly Journals : document and information architecture in open access journals. (ISBN 978-91- 85659-16-6) Skriftserien ; 36

Hultgren, Frances (2009). Approaching the future: a study of Swedish school leavers’ information related activities. (ISBN 978-91-89416-18- 5) Skriftserien ; 37

Söderlind, Åsa (2009). Personlig integritet som informationspolitik : debatt och diskussion i samband med tillkomsten av Datalag (1973:289). (ISBN 978-91-89416-20-8) Skriftserien ; 38

Nalumaga, Ruth (2009). Crossing to the mainstream : information challenges and possibilities for female legislators in the Ugandan parliament. (ISBN 91-89416-20-1) Skriftserien ; 39

Johannesson, Krister (2009). I främsta rummet : planerandet av en högskolebiblioteksbyggnad med studenters arbete i fokus. (ISBN 978-91-89416-21-5) Skriftserien ; 40

Kawalya, Jane (2009). The National Library of Uganda : its inception, challenges and prospects, 1997-2007. (ISBN 91-89416-22-8) Skriftserien ; 41

Gärdén, Cecilia (2010). Verktyg för lärande : informationssökning och informationsanvändning i kommunal vuxenutbildning. (ISBN 978-91-98416-23-9) Skriftserien ; 42

Ponti, Marisa (2010). Actors in collaboration : sociotechnical influence on practice-research collaboration. (ISBN 978-91-89416-24-6) Skriftserien ; 43

Jansson, Bertil (2010). Bibliotekarien: om yrkets tidiga innehåll och utveckling. (ISBN 978-91-89416-25-3) Skriftserien ; 44

Olson, Nasrine (2010). Taken for granted : the construction of order in the process of library management system decision making. (ISBN 978-91-89416-26-0) Skriftserien ; 45

Gunnarsson, Mikael (2011). Classification along genre dimensions: exploring a multidisciplinary problem. (ISBN 978-91-85659-72-2), Skriftserien ; 46

Lundh, Anna (2011). Doing research in primary school : information activities in project-based learning. (ISBN 978-91-89416-28-7), Skriftserien ; 47

Dolatkhah, Mats (2011). Det läsande barnet: minnen av läspraktiker, 1900–1940. (ISBN 978-91-89416-29-5), Skriftserien ; 48

Frenander, Anders (red.) (2011). Arkitekter på armlängds avstånd? Att studera kulturpolitik. (ISBN 978-91-978768-0-3), Skriftserien ; 49

Frenander, Anders & Lindberg, Jenny (red.) (2011). Styra eller stödja: svensk folkbibliotekspolitik under hundra år. (ISBN 978-91-978768-1-0), Skriftserien ; 50

Isah, Esther Ebole (2012). Physicians’ Information Practices: A Case Study of a Medical Team at a Teaching Hospital. (ISBN 978-91-978768-2-7), Skriftserien ; 51

Johansson, Veronica (2012). A Time and Place for Everything? Social Visualisation Tools and Critical Literacies. (ISBN 978-91-978768-3-4), Skriftserien ; 52

Maurin Söderholm, Hanna (2013). Emergency Visualized: Exploring Visual Technology for Paramedic-Physician Collaboration in Emergency Care. (ISBN 978-91-978768-5-8), Skriftserien ; 53

Lindsköld, Linnéa (2013). Betydelsen av kvalitet: en studie av diskursen om statens stöd till ny, svensk skönlitteratur 1975-2009. (ISBN 978-91-978768-5-6), Skriftserien ; 54

Pilerot, Ola (2014). Design Researchers’ Information Sharing: The Enactment of a Discipline. (ISBN 978-91-978768-8-9), Skriftserien ; 55

Lassi, Monica (2014). Facilitating collaboration: Exploring a socio-technical approach to the design of a collaboratory for Library and Information Science. (ISBN 978-91-981654-0-1), Skriftserien ; 56

Dessne, Karin (2014). In a world of values and views: Information and learning activities in a military setting. (ISBN 978-91-981654-2-5), Skriftserien ; 57

Lindberg, Jenny (2015). Att bli bibliotekarie: informationssökning och yrkesidentitet hos B & I-studenter och nyanställda högskolebibliotekarier. (ISBN 978-91-981654-4-9), Skriftserien ; 58