AI-Generated Literature and the Vectorized Word by Judy Heflin

Total Page:16

File Type:pdf, Size:1020Kb

AI-Generated Literature and the Vectorized Word by Judy Heflin AI-Generated Literature and the Vectorized Word by Judy Heflin B.A. Comparative Literature and Culture, Yonsei University (2015) Submitted to Program in Comparative Media Studies/Writing in Partial Fulfillment of the Requirements for the Degree of… Master of Science in Comparative Media Studies at the Massachusetts Institute of Technology May 2020 ©2020 Judy Heflin. All rights reserved. The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created. Signature of Author: Department of Comparative Media Studies May 8, 2020 Certified By: Nick Montfort Professor of Digital Media Thesis Supervisor Certified By: Lillian-Yvonne Bertram Assistant Professor of Creative Writing Director, MFA in Creative Writing, UMass Boston Thesis Supervisor Accepted By: Eric Klopfer Professor of Comparative Media Studies Director of Graduate Studies AI-Generated Literature and the Vectorized Word by Judy Heflin Submitted to the Program in Comparative Media Studies/Writing on May 8, 2020 in Partial Fulfillment of the Requirements for the Degree of Master of Science in Comparative Media Studies. ABSTRACT This thesis focuses on contemporary AI-generated literature that has been traditionally published in the form of a printed book, labeled and interpreted as something written by “artificial intelligence,” and that necessarily depends on vector representations of linguistic data. This thesis argues that AI-generated literature is not a monolithic practice that erases the role of the author, but rather encompasses a diversity of practice that includes divergent artistic and writerly perspectives. Through an in-depth look at three books of contemporary AI-generated literature and discussions with their human authors, this thesis details the authorial and writerly labor throughout the stages of datafication, vectorization, generation, and pagination in order to categorize the writerly practices that are involved in the creation of this type of work. This thesis also considers how these practices are preceded by “analog” types of writing, compares divergent authorial perspectives, and discusses ways of reading AI-generated literature, including a material analysis of how AI-generated text interacts with the printed book, how authors and publishers use paratextual elements to guide readings, along with additional points of entry for analyzing literary AI-generated texts. Thesis Supervisor: Nick Montfort Thesis Supervisor: Lillian-Yvonne Bertram ​ ​ ​ Title: Professor of Digital Media Title: Assistant Professor of Creative Writing ​ ​ ​ Director, MFA in Creative Writing, UMass Boston 2 ACKNOWLEDGEMENTS Institutional: This thesis could not have been written without the guidance of Nick Montfort and ​ Lillian-Yvonne Bertram. I am so grateful for their mentorship and constantly inspired by their creative and scholarly work. Thank you to Nick for supervising this thesis and taking me on as both a research and teaching assistant in what people close to me have begun referring to as the “wax on, wax off” of computational literature. Over the past two years I’ve learned more than I ever thought I could, and more than I ever knew there was to know. Highlights from the Trope Tank include: learning to write a BASIC program on a ZX Spectrum and save it to tape, learning how to properly use a tape recorder in the first place, some of the nerdiest rap battles I have ever had the pleasure of witnessing, and a glorious collection of computer-generated books in print. My infinite gratitude goes to Allison Parrish, Ross Goodwin, and David Jhave Johnston for entertaining all of my questions, allowing me to publish our conversations, and inspiring me in so many ways. Personal: Thank you to my mom, for giving me such a welcoming place to finish my thesis ​ during the global pandemic and for the late-night espressos and unconditional love. Thank you to Dave whose rigorous work ethic and commitment to the grueling process of digging a basement by hand motivated me to continue writing. Thanks to my dad for supporting my education in so many ways, for the hours of Myst-playing and puzzle-solving, and for teaching me that I could survive at a place like MIT. Thanks to Julia, whose visits, memes, Duo calls, and friendship I could not have survived the last two years without. A big thank you to Marro, for getting me through my deepest pits of despair and self-doubt during graduate school and believing in me to the point of delusion. Thank you to my amazing cohort: Anna, Annie, Ben, Han, Iago, Lizzie, and Sam. I’m so proud to know every one of you. 3 TABLE OF CONTENTS ABSTRACT 2 ACKNOWLEDGEMENTS 3 TABLE OF CONTENTS 4 LIST OF FIGURES 5 GLOSSARY OF TERMS 6 1. AI-GENERATED LITERATURE 8 INTRODUCTION 8 CONCEPTS AND TERMS 13 METHODOLOGY 23 LITERATURE REVIEW 34 2. PRIMARY SOURCES AND SYSTEMS 34 ARTICULATIONS 35 1 THE ROAD 52 RERITES 64 3. WAYS OF WRITING 77 WRITERLY PRACTICES 77 DIVERGENT THREADS OF PRACTICE 81 4. WAYS OF READING 88 MATERIAL, PARATEXTUAL, INTENTIONAL, TEXTUAL 89 AI-GENERATED LITERATURE IN PRINT 95 5. CONCLUSION 101 INITIAL INSIGHTS INTO RELATED QUESTIONS 104 FINAL THOUGHTS 108 APPENDICES 110 APPENDIX A 110 EXCERPTS FROM CONVERSATION WITH ALLISON PARRISH // FEBRUARY 20, 2020 APPENDIX B 120 EXCERPTS FROM CONVERSATION WITH ROSS GOODWIN // FEBRUARY 25, 2019 APPENDIX C 131 EXCERPTS FROM CONVERSATION WITH DAVID JHAVE JOHNSTON // MARCH 24, 2020 ​ ​ BIBLIOGRAPHY 139 4 LIST OF FIGURES Figure 1. Map of definitions and AI-generated books ……………………………………….…11 ​ Figure 2. Co-occurrence matrix …………………………………………………………..….…17 ​ Figure 3. Word embedding vector ………………………………………………………...……19 ​ Figure 4. Most similar tokens ……………………………………………………………..……20 ​ Figure 5. Excerpt from Project Gutenberg Etext …………………………….…………………38 ​ Figure 6. Definition of a poem in A Gutenberg Poetry Corpus ……………...…………………41 ​ Figure 7. Definition of a line in A Gutenberg Poetry Corpus ………………………….……….42 ​ Figure 8. Wordfilter encryption ……………………………………………….…………...…...44 ​ Figure 9. Surveillance car as pen ………………………………………………………….……55 ​ Figure 10. WordCamera class …………………………………………………………..………58 ​ Figure 11. ReRites limited edition box set ………………………………………...……………67 ​ ​ Figure 12. PyTorch seed code ……………………………………………………………..……71 ​ Figure 13. Definition of a poem in ReRites …………………………………………….………73 ​ ​ ​ Figure 14. Inside cover of ReRites paperback ……………………………………..……………75 ​ ​ ​ Figure 15. Articulations Excerpt …………………………………………………………..……83 ​ ​ Figure 16. 1 the Road Excerpt ……………………………………………………………….…84 ​ ​ Figure 17. ReRites excerpts ……………………………………………………….…….………85 ​ ​ Figure 18. One Hundred Million Million Poems ……………………….…..…………………99 ​​ ​ 5 GLOSSARY OF TERMS Artificial Intelligence (AI): Computer programs with the ability to learn and reason like humans ​ Convolutional Neural Network (CNN): A type of neural network that is ideal for processing ​ image data and handling classification tasks Computer-generated literature: Literary text that has been generated in some way through ​ running a computer program Corpus: A collection of written texts used as source material or data for a work of ​ computer-generated literature Corpora: Plural of corpus ​ Dense Vector: A vector in which most of the indices are nonzero numbers ​ Dimensionality: The number of dimensions in a vector space. We can easily visualize ​ 2-dimensional or 3-dimensional spaces, but the text-generation systems that we will look at utilize very high dimensions that are not easy to visualize. Embedding: The process by which text is given numerical representation in a vector space ​ Language Model (LM): A computational representation of language that encodes some sort of ​ linguistic understanding Long-short-term-memory (LSTM): A type of recurrent neural network with the added ability ​ to store past information at each processing step in order to improve modeling and prediction of sequences of data Machine Learning (ML): Algorithms that are able to draw conclusions from data without being ​ explicitly programmed Neural Network: A computational model inspired by the structure of the human brain that can ​ automatically detect patterns in large amounts of data Natural Language Generation (NLG): A subfield of natural language processing that focuses ​ on computationally generating text Natural Language Processing (NLP): A subfield of artificial intelligence that focuses on the ​ computational manipulation of linguistic data Recurrent Neural Network (RNN): A type of neural network that is used to model, classify, ​ 6 and predict sequences of data such as text Rule-based System: A text-generation system wherein linguistic understanding and ​ relationships are explicitly encoded Scalar: A quantity with magnitude but no direction that can be measured by a real number ​ Seed: Data that initializes some sequence of generation powered by a neural network ​ Sparse Vector: A vector in which most of the indices are zero ​ Token: Every word in a corpus, including repeated words ​ Training: The computational process in which the weights and biases of a neural network are ​ changed in order to automatically encode meaningful patterns in data Type: Every individual word in a corpus, not including repeated words ​ Vector: A quantity with magnitude and direction, a sequence of numbers used to describe a ​ point in a high-dimensional space Vector Space: A high-dimensional space that includes a set of vectors that represent data in ​ some dataset. In
Recommended publications
  • Complete Issue 40:3 As One
    TUGBOAT Volume 40, Number 3 / 2019 General Delivery 211 From the president / Boris Veytsman 212 Editorial comments / Barbara Beeton TEX Users Group 2019 sponsors; Kerning between lowercase+uppercase; Differential “d”; Bibliographic archives in BibTEX form 213 Ukraine at BachoTEX 2019: Thoughts and impressions / Yevhen Strakhov Publishing 215 An experience of trying to submit a paper in LATEX in an XML-first world / David Walden 217 Studying the histories of computerizing publishing and desktop publishing, 2017–19 / David Walden Resources 229 TEX services at texlive.info / Norbert Preining 231 Providing Docker images for TEX Live and ConTEXt / Island of TEX 232 TEX on the Raspberry Pi / Hans Hagen Software & Tools 234 MuPDF tools / Taco Hoekwater 236 LATEX on the road / Piet van Oostrum Graphics 247 A Brazilian Portuguese work on MetaPost, and how mathematics is embedded in it / Estev˜aoVin´ıcius Candia LATEX 251 LATEX news, issue 30, October 2019 / LATEX Project Team Methods 255 Understanding scientific documents with synthetic analysis on mathematical expressions and natural language / Takuto Asakura Fonts 257 Modern Type 3 fonts / Hans Hagen Multilingual 263 Typesetting the Bangla script in Unicode TEX engines—experiences and insights Document Processing / Md Qutub Uddin Sajib Typography 270 Typographers’ Inn / Peter Flynn Book Reviews 272 Book review: Hermann Zapf and the World He Designed: A Biography by Jerry Kelly / Barbara Beeton 274 Book review: Carol Twombly: Her brief but brilliant career in type design by Nancy Stock-Allen / Karl
    [Show full text]
  • The Fontspec Package Font Selection for XƎLATEX and Lualatex
    The fontspec package Font selection for XƎLATEX and LuaLATEX Will Robertson and Khaled Hosny [email protected] 2013/05/12 v2.3b Contents 7.5 Different features for dif- ferent font sizes . 14 1 History 3 8 Font independent options 15 2 Introduction 3 8.1 Colour . 15 2.1 About this manual . 3 8.2 Scale . 16 2.2 Acknowledgements . 3 8.3 Interword space . 17 8.4 Post-punctuation space . 17 3 Package loading and options 4 8.5 The hyphenation character 18 3.1 Maths fonts adjustments . 4 8.6 Optical font sizes . 18 3.2 Configuration . 5 3.3 Warnings .......... 5 II OpenType 19 I General font selection 5 9 Introduction 19 9.1 How to select font features 19 4 Font selection 5 4.1 By font name . 5 10 Complete listing of OpenType 4.2 By file name . 6 font features 20 10.1 Ligatures . 20 5 Default font families 7 10.2 Letters . 20 6 New commands to select font 10.3 Numbers . 21 families 7 10.4 Contextuals . 22 6.1 More control over font 10.5 Vertical Position . 22 shape selection . 8 10.6 Fractions . 24 6.2 Math(s) fonts . 10 10.7 Stylistic Set variations . 25 6.3 Miscellaneous font select- 10.8 Character Variants . 25 ing details . 11 10.9 Alternates . 25 10.10 Style . 27 7 Selecting font features 11 10.11 Diacritics . 29 7.1 Default settings . 11 10.12 Kerning . 29 7.2 Changing the currently se- 10.13 Font transformations . 30 lected features .
    [Show full text]
  • Translate's Localization Guide
    Translate’s Localization Guide Release 0.9.0 Translate Jun 26, 2020 Contents 1 Localisation Guide 1 2 Glossary 191 3 Language Information 195 i ii CHAPTER 1 Localisation Guide The general aim of this document is not to replace other well written works but to draw them together. So for instance the section on projects contains information that should help you get started and point you to the documents that are often hard to find. The section of translation should provide a general enough overview of common mistakes and pitfalls. We have found the localisation community very fragmented and hope that through this document we can bring people together and unify information that is out there but in many many different places. The one section that we feel is unique is the guide to developers – they make assumptions about localisation without fully understanding the implications, we complain but honestly there is not one place that can help give a developer and overview of what is needed from them, we hope that the developer section goes a long way to solving that issue. 1.1 Purpose The purpose of this document is to provide one reference for localisers. You will find lots of information on localising and packaging on the web but not a single resource that can guide you. Most of the information is also domain specific ie it addresses KDE, Mozilla, etc. We hope that this is more general. This document also goes beyond the technical aspects of localisation which seems to be the domain of other lo- calisation documents.
    [Show full text]
  • Junicode 1234567890 ¼ ½ ¾ ⅜ ⅝ 27/64 ⅟7⒉27 (Smcp) C2sc Ff Ffi Ffl Fi Fl
    Various glyphs accessed through OpenType tags in five different fonts: Junicode 1234567890 ¼ ½ ¾ ⅜ ⅝ 27/64 ⅟7⒉27 (smcp) c2sc ff ffi ffl fi fl Qu Th ct ſpecieſ ſſ ſſi ſſlſiſl ⁰¹²³⁴⁵⁶⁷⁸⁹⁺-⁼⁽⁾ⁿ ₀₁₂₃₄₅₆₇₈₉₊-₌₍₎ H₂O EB Garamond 1234567890 1234567890 1234567890 1234567890 1⁄4 1⁄2 3⁄4 3⁄8 5⁄8 27⁄64 1⁄72.27 small caps (smcp) SMALL CAPS (c2sc) ff ffi ffl fi fl Quo Th ct ſt ſpecies ſs ſſa ſſi ſſl ſi ſl 0123456789+-=()n 0123456789+-=() 1st 2nd 3rd H2O H2SO4 Sorts Mill Goudy 1234567890 1234567890 1⁄4 1⁄2 3⁄4 3⁄8 5⁄8 27⁄64 1⁄72.27 small caps (smcp) small caps (c2sc) ff ffi ffl fi fl QuThctſt ſpecieſ ſſ ſſi ſſ l ſi ſ l 0123456789+-=()n 0123456789+-=() H2O IM FELL English 1234567890 small caps (smcp) ff ffi ffl fi fl Qu Th ct ſtſpecies ß ſſa ſſi ſſl ſiſl Cardo 1234567890 1234567890 Sᴍᴀᴌᴌ ᴄᴀᴘS (smcp) ff ffi ffl fi fl Qu ſt ſpecieſ ſſ ſſi ſſlſiſl Not all the fonts have the same tags e.g. EB Garamond is the only font with four numeral variants and the ordn tag, Junicode and Cardo don’t have the sinf tag. IM Fell and EB Garamond are clever enough to distinguish between initial and medial ſ and terminal s. Junicode and Cardo are the only two fonts in which all the glyphs survive being copied and pasted into a word processor.* * In Word 2003 running on Windows 7 at least. 1 Other methods of setting fractions ⒤ using TEX math mode (ii) using the Eplain \frac macro (iii) using the font’s own pre-composed action glyphs (iv) using the numr and dnom OpenType tags (if the font has these) ⒱ using the OpenType frac tag as in the previous examples.
    [Show full text]
  • MUFI Character Recommendation V. 3.0: Code Chart Order
    MUFI character recommendation Characters in the official Unicode Standard and in the Private Use Area for Medieval texts written in the Latin alphabet ⁋ ※ ð ƿ ᵹ ᴆ ※ ¶ ※ Part 2: Code chart order ※ Version 3.0 (5 July 2009) ※ Compliant with the Unicode Standard version 5.1 ____________________________________________________________________________________________________________________ ※ Medieval Unicode Font Initiative (MUFI) ※ www.mufi.info ISBN 978-82-8088-403-9 MUFI character recommendation ※ Part 2: code chart order version 3.0 p. 2 / 245 Editor Odd Einar Haugen, University of Bergen, Norway. Background Version 1.0 of the MUFI recommendation was published electronically and in hard copy on 8 December 2003. It was the result of an almost two-year-long electronic discussion within the Medieval Unicode Font Initiative (http://www.mufi.info), which was established in July 2001 at the International Medi- eval Congress in Leeds. Version 1.0 contained a total of 828 characters, of which 473 characters were selected from various charts in the official part of the Unicode Standard and 355 were located in the Private Use Area. Version 1.0 of the recommendation is compliant with the Unicode Standard version 4.0. Version 2.0 is a major update, published electronically on 22 December 2006. It contains a few corrections of misprints in version 1.0 and 516 additional char- acters (of which 123 are from charts in the official part of the Unicode Standard and 393 are additions to the Private Use Area). There are also 18 characters which have been decommissioned from the Private Use Area due to the fact that they have been included in later versions of the Unicode Standard (and, in one case, because a character has been withdrawn).
    [Show full text]
  • Fontspec.Pdf
    The fontspec package Font selection for X LE ATEX and LuaLATEX Will Robertson and Khaled Hosny [email protected] 2015/09/24 v2.4e Contents 6.4 Priority of feature selection 15 6.5 Different features for dif- 1 History3 ferent font shapes . 16 6.6 Different features for dif- 2 Introduction3 ferent font sizes . 17 2.1 About this manual.... 3 2.2 Acknowledgements.... 3 7 Font independent options 18 7.1 Colour ........... 18 3 Package loading and options4 7.2 Scale............. 19 3.1 Maths fonts adjustments. 4 7.3 Interword space . 19 3.2 Configuration....... 5 7.4 Post-punctuation space . 20 3.3 Warnings.......... 5 7.5 The hyphenation character 20 7.6 Optical font sizes . 20 I General font selection5 II OpenType 22 4 Font selection6 4.1 By font name........ 6 8 Introduction 22 4.2 By file name........ 6 8.1 How to select font features 22 5 New commands to select font 9 Complete listing of OpenType families7 font features 23 5.1 More control over font 9.1 Ligatures.......... 23 shape selection....... 8 9.2 Letters ........... 23 5.2 Specifically choosing the 9.3 Numbers.......... 24 NFSS family . 10 9.4 Contextuals . 25 5.3 Choosing additional NFSS 9.5 Vertical Position . 25 font faces.......... 10 9.6 Fractions.......... 26 5.4 Math(s) fonts . 11 9.7 Stylistic Set variations . 27 5.5 Miscellaneous font select- 9.8 Character Variants . 27 ing details ......... 12 9.9 Alternates ......... 28 9.10 Style............. 29 6 Selecting font features 13 9.11 Diacritics.........
    [Show full text]
  • Openoffice Review
    Open Office The Open Source Alternative What you get ● A cross-platform office suite – A word processor (Writer) – A spreadsheet (Calc) – A slide-show editor (Impress) – A drawing editor (Draw) – A database package (Base) – but not in my 2.3 ● No clip art – but plenty available on the web. Why Might You Want It? That enterpreneur fellow Gates Is richer than many small states Which is simply absurd Because Windows and Word Is software that everyone hates. How Good Is It? ● MS Office is the benchmark (although I hate to admit it) – But MS try hard to make each new version incompatible with the previous one. – And change the user interface as well! – doesn't handle large documents well ● Word Perfect, Aldus FrameMaker/PageMaker Open Office ● Open Office is quite good – Not quite as user friendly in some areas – Can be a bit flakey - I have crashed it by selecting some menu items (especially in the dictionary/languages area) – The price is very competitive By The Way ● Lunux Magazine recently had an Ubuntu special – With a writeup of ● Writer – the Word Processor ● Calc – the spreadsheet ● Inpress – the slide presentation package ● Base – the database package List start with Impress ● Rather like MS Powerpoint ● Doesn't come with any clip-art ● The “insert clip art” function has a preview button, but you do not always get a preview. Impress ● Easy to use ● Will import Power Point slideshows – But there may be issues with fonts ● Has lots of standard layouts – plus you can design your own ● In two screen mode, it did not cope well with me editing a slide while displaying the slideshow on the other screen Calc ● A quite good spreadsheet program ● Will import most Excel spreadsheets ● Doesn't handle spanned columns as well as Excel ● Wide range of functions Base ● In earlier releases on Windows ● Should be in release 2.4 (I run 2.3) ● I have not tried it ● Looks like a MS Access clone ● Probably OK if you want a simple WIZIWIG database ● If you need a real database, use MySql.
    [Show full text]
  • Typesetting Phonology Papers
    April 16, 2015 Michael Becker, [email protected] Typesetting phonology papers 1 Choosing your font Most modern fonts conform to Unicode specifications, which means that they can be used for a very wide variety of alphabets, including IPA. Most fonts, however, only cover some of Unicode, so it’s important to choose a font that has good IPA coverage. Here are some fonts with fairly good IPA coverage (look out for quirks and missing characters in some of them, though). ey are all free, and work on OS X, Windows, and Linux. ere is rarely any reason to use more than one font in any given paper, so just choose one font and go with it. Bold Italics Bold italics S C Linux Libertine yes yes yes yes Junicode yes yes yes yes Times New Roman yes yes yes yes¹ DejaVu Serif yes yes yes no Cardo yes yes no yes Charis SIL yes yes yes yes Doulos SIL no no no yes Gentium no yes no no Bold and italics are good for the usual reasons. Small Caps are needed for the names of constraints in constraint-based theories. Note that Microso Word will fake missing features from fonts that lack them; the result is usually less than aesthetically pleasing. ¹Times New Roman doesn’t actually have small caps, but if you are using XƎLATEX, there is a trick. First, download TeX Gyre Termes, which looks a lot like Times New Roman, and has small caps, but doesn’t have good IPA support. en pair the two fonts: \setromanfont[SmallCapsFont={TeX Gyre Termes},SmallCapsFeatures={Leers=SmallCaps}]{Times New Roman}.
    [Show full text]
  • Old English Newsletter
    OLD ENGLISH NEWSLETTER Published for The Old English Division of the Modern Language Association of America by The Department of English, University of Tennessee, Knoxville VOLUME 40 NUMBER 1 Fall 2006 ISSN 0030-1973 Old English Newsletter Volume 40 Number 1 Fall 2006 Editor R. M. Liuzza, University of Tennessee, Knoxville Associate Editors Year’s Work in Old English Studies: Daniel Donoghue, Harvard University Bibliography: Thomas Hall, University of Notre Dame Contributing Editors Research in Progress: Heide Estes, Monmouth University Conference Abstracts: Robert Butler, Alcorn State University Bibliography: Melinda Menzer, Furman University Editorial Board Patrick W. Conner, West Virginia University Antonette diPaolo Healey, Dictionary of Old English David F. Johnson, Florida State University Catherine Karkov, University of Leeds Ursula Lenker, University of Munich Mary Swan, University of Leeds Assistant to the Editor: Jonathan Huffstutler The Old English Newsletter (ISSN 0030-1973) is published for the Old English Division of the Modern Lan- guage Association by the Department of English, University of Tennessee, 301 McClung Tower, Knoxville, TN, 37996-0430; email [email protected]. The generous support of the International Society of Anglo- Saxonists and the Department of English at The University of Tennessee is gratefully acknowledged. Subscriptions: The rate for institutions is $20 US per volume; the rate for individuals is $15 per volume, but in order to reduce administrative costs the editors ask individuals to pay for two volumes at once at the discounted rate of $25. Individual back issues can be ordered for $5 each. All payments must be made in US dollars. A subscription form is online at http://www.oenewsletter.org/OEN/subscription_form.pdf.
    [Show full text]
  • Module 4 How to Transform Incunabula: Theory
    Module 4 How to transform incunabula: Theory Uwe Springmann Centrum für Informations- und Sprachverarbeitung (CIS) Ludwig-Maximilians-Universität München (LMU) 2015-09-14 Uwe Springmann Module 4 How to transform incunabula: Theory 2015-09-14 1 / 21 Why incunabula? oldest documents in modern printing (1450-1500) contain many abbreviations from manuscript era each printer (had to) cut his own types reason 1: if we can do incunabula, we can do anything else that comes later period of societal change: reformation, renaissance, age of discovery (America) printing was quickly embraced as a method for the dissemination of ideas reason 2: incunabula are primary sources (often never reprinted) of a key period of modern society Uwe Springmann Module 4 How to transform incunabula: Theory 2015-09-14 2 / 21 Steps in model training The special incunabula glyphs need to be explicitly taken into account: ground truth production choose some (8-13) representative pages to transcribe get an overview of the glyph repertoire explicitly state your transcription guidelines transcribe the chosen pages set a few pages (10%) apart as test pages, the rest are your training pages training a model segment the test and training pages into single lines train for 10 to 100 epochs (1 epoch = total number of training lines) stop when error no longes decreases (levels off and fluctuates) test your models on test pages choose model with least error on test pages if accuracy is ok, recognize all preprocessed pages (total book) if not, add a few more transcribed pages to your
    [Show full text]
  • The Fontspec Package Font Selection for X LE ATEX and Lualatex
    The fontspec package Font selection for X LE ATEX and LuaLATEX WILL ROBERTSON With contributions by Khaled Hosny, Philipp Gesang, Joseph Wright, and others. http://wspr.io/fontspec/ 2020/02/21 v2.7i Contents I Getting started 5 1 History 5 2 Introduction 5 2.1 Acknowledgements ............................... 5 3 Package loading and options 6 3.1 Font encodings .................................. 6 3.2 Maths fonts adjustments ............................ 6 3.3 Configuration .................................. 6 3.4 Warnings ..................................... 6 4 Interaction with LATEX 2ε and other packages 7 4.1 Commands for old-style and lining numbers ................. 7 4.2 Italic small caps ................................. 7 4.3 Emphasis and nested emphasis ......................... 7 4.4 Strong emphasis ................................. 7 II General font selection 8 1 Main commands 8 2 Font selection 9 2.1 By font name ................................... 9 2.2 By file name ................................... 10 2.3 By custom file name using a .fontspec file . 11 2.4 Querying whether a font ‘exists’ ........................ 12 1 3 Commands to select font families 13 4 Commands to select single font faces 13 4.1 More control over font shape selection ..................... 14 4.2 Specifically choosing the NFSS family ...................... 15 4.3 Choosing additional NFSS font faces ....................... 16 4.4 Math(s) fonts ................................... 17 5 Miscellaneous font selecting details 18 III Selecting font features 19 1 Default settings 19 2 Working with the currently selected features 20 2.1 Priority of feature selection ........................... 21 3 Different features for different font shapes 21 4 Selecting fonts from TrueType Collections (TTC files) 23 5 Different features for different font sizes 23 6 Font independent options 24 6.1 Colour .....................................
    [Show full text]
  • Junicode the Font for Medievalists by Peter S
    Junicode the font for medievalists by Peter S. Baker specimens and user’s guide The design of Junicode is based on scans of George Hickes, Linguarum vett. septentrionalium thesaurus grammatico-criticus et archaeologicus (Oxford: Sheldonian Theatre, 1703–5). This massive two-volume folio is a fine example of the work of the Oxford University Press at this period: printed in multiple types (for every language had to have the type that was proper to it) and lav- ishly illustrated with engravings of manuscript pages, coins and artifacts. An excellent facsimile of volume two of this work, the famous (to medievalists) catalogue of manuscripts containing Anglo-Saxon by Humfrey Wanley, can be seen at the Hathi Trust. The type used for Hickes’s Thesaurus may be one of those as- sembled by John Fell (1625–86) and bequeathed by him to the Uni- versity of Oxford. To my eye, however, this type looks more like the “Pica Roman” purchased by the University in 1692 than like any of those bequeathed by John Fell. For printing in Old En- glish, this type was supplemented by the “Pica Saxon” commissioned by the early Anglo-Saxonist Franciscus Junius (1591–1677) and bequeathed by him to the University. Specimens of both can be found in A Specimen of the Several Sorts of Letter Given to the University by Dr. John Fell, Sometime Lord Bishop of Oxford. To Which Is Added the Letter Given by Mr. F. Junius (Oxford, 1693). It seems likely that Junius’s Pica Saxon was mixed pretty freely with Pica Roman in printing the Thesaurus.
    [Show full text]