AI-Generated Literature and the Vectorized Word

by Judy Heflin

B.A. Comparative Literature and Culture, Yonsei University (2015)

Submitted to the Program in Comparative Media Studies/Writing in Partial Fulfillment of the Requirements for the Degree of Master of Science in Comparative Media Studies at the Massachusetts Institute of Technology

May 2020

©2020 Judy Heflin. All rights reserved. The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.

Signature of Author: Department of Comparative Media Studies, May 8, 2020

Certified By: Nick Montfort, Professor of Digital Media, Thesis Supervisor

Certified By: Lillian-Yvonne Bertram, Assistant Professor of Creative Writing; Director, MFA in Creative Writing, UMass Boston; Thesis Supervisor

Accepted By: Eric Klopfer, Professor of Comparative Media Studies, Director of Graduate Studies

AI-Generated Literature and the Vectorized Word

by Judy Heflin

Submitted to the Program in Comparative Media Studies/Writing on May 8, 2020 in Partial Fulfillment of the Requirements for the Degree of Master of Science in Comparative Media Studies.

ABSTRACT

This thesis focuses on contemporary AI-generated literature that has been traditionally published in the form of a printed book, labeled and interpreted as something written by “artificial intelligence,” and that necessarily depends on vector representations of linguistic data. This thesis argues that AI-generated literature is not a monolithic practice that erases the role of the author, but rather encompasses a diversity of practices and divergent artistic and writerly perspectives. Through an in-depth look at three books of contemporary AI-generated literature and discussions with their human authors, this thesis details the authorial and writerly labor throughout the stages of datafication, vectorization, generation, and pagination in order to categorize the writerly practices involved in the creation of this type of work. This thesis also considers how these practices are preceded by “analog” types of writing, compares divergent authorial perspectives, and discusses ways of reading AI-generated literature, including a material analysis of how AI-generated text interacts with the printed book, an account of how authors and publishers use paratextual elements to guide readings, and additional points of entry for analyzing literary AI-generated texts.

Thesis Supervisor: Nick Montfort
Title: Professor of Digital Media

Thesis Supervisor: Lillian-Yvonne Bertram
Title: Assistant Professor of Creative Writing; Director, MFA in Creative Writing, UMass Boston

ACKNOWLEDGEMENTS

Institutional: This thesis could not have been written without the guidance of Nick Montfort and Lillian-Yvonne Bertram. I am so grateful for their mentorship and constantly inspired by their creative and scholarly work. Thank you to Nick for supervising this thesis and taking me on as both a research and teaching assistant in what people close to me have begun referring to as the “wax on, wax off” of computational literature. Over the past two years I’ve learned more than I ever thought I could, and more than I ever knew there was to know.
Highlights from the Trope Tank include: learning to write a BASIC program on a ZX Spectrum and save it to tape, learning how to properly use a tape recorder in the first place, some of the nerdiest rap battles I have ever had the pleasure of witnessing, and a glorious collection of computer-generated books in print. My infinite gratitude goes to Allison Parrish, Ross Goodwin, and David Jhave Johnston for entertaining all of my questions, allowing me to publish our conversations, and inspiring me in so many ways.

Personal: Thank you to my mom, for giving me such a welcoming place to finish my thesis during the global pandemic and for the late-night espressos and unconditional love. Thank you to Dave, whose rigorous work ethic and commitment to the grueling process of digging a basement by hand motivated me to continue writing. Thanks to my dad for supporting my education in so many ways, for the hours of Myst-playing and puzzle-solving, and for teaching me that I could survive at a place like MIT. Thanks to Julia, whose visits, memes, Duo calls, and friendship I could not have survived the last two years without. A big thank you to Marro, for getting me through my deepest pits of despair and self-doubt during graduate school and believing in me to the point of delusion. Thank you to my amazing cohort: Anna, Annie, Ben, Han, Iago, Lizzie, and Sam. I’m so proud to know every one of you.

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
GLOSSARY OF TERMS
1. AI-GENERATED LITERATURE
   INTRODUCTION
   CONCEPTS AND TERMS
   METHODOLOGY
   LITERATURE REVIEW
2. PRIMARY SOURCES AND SYSTEMS
   ARTICULATIONS
   1 THE ROAD
   RERITES
3. WAYS OF WRITING
   WRITERLY PRACTICES
   DIVERGENT THREADS OF PRACTICE
4. WAYS OF READING
   MATERIAL, PARATEXTUAL, INTENTIONAL, TEXTUAL
   AI-GENERATED LITERATURE IN PRINT
5. CONCLUSION
   INITIAL INSIGHTS INTO RELATED QUESTIONS
   FINAL THOUGHTS
APPENDICES
   APPENDIX A: EXCERPTS FROM CONVERSATION WITH ALLISON PARRISH // FEBRUARY 20, 2020
   APPENDIX B: EXCERPTS FROM CONVERSATION WITH ROSS GOODWIN // FEBRUARY 25, 2019
   APPENDIX C: EXCERPTS FROM CONVERSATION WITH DAVID JHAVE JOHNSTON // MARCH 24, 2020
BIBLIOGRAPHY

LIST OF FIGURES

Figure 1. Map of definitions and AI-generated books
Figure 2. Co-occurrence matrix
Figure 3. Word embedding vector
Figure 4. Most similar tokens
Figure 5. Excerpt from Project Gutenberg Etext
Figure 6. Definition of a poem in A Gutenberg Poetry Corpus
Figure 7. Definition of a line in A Gutenberg Poetry Corpus
Figure 8. Wordfilter encryption
Figure 9. Surveillance car as pen
Figure 10. WordCamera class
Figure 11. ReRites limited edition box set
Figure 12. PyTorch seed code
Figure 13. Definition of a poem in ReRites
Figure 14. Inside cover of ReRites paperback
Figure 15. Articulations excerpt
Figure 16. 1 the Road excerpt
Figure 17. ReRites excerpts
Figure 18. One Hundred Million Million Poems

GLOSSARY OF TERMS

Artificial Intelligence (AI): Computer programs with the ability to learn and reason like humans

Convolutional Neural Network (CNN): A type of neural network that is ideal for processing image data and handling classification tasks

Computer-generated literature: Literary text that has been generated in some way through running a computer program

Corpus: A collection of written texts used as source material or data for a work of computer-generated literature

Corpora: Plural of corpus

Dense Vector: A vector in which most of the entries are nonzero numbers

Dimensionality: The number of dimensions in a vector space. We can easily visualize 2-dimensional or 3-dimensional spaces, but the text-generation systems that we will look at use spaces of much higher dimension that are not easy to visualize.

Embedding: The process by which text is given a numerical representation in a vector space

Language Model (LM): A computational representation of language that encodes some sort of linguistic understanding

Long Short-Term Memory (LSTM): A type of recurrent neural network with the added ability to store past information at each processing step in order to improve modeling and prediction of sequences of data

Machine Learning (ML): Algorithms that are able to draw conclusions from data without being explicitly programmed

Neural Network: A computational model inspired by the structure of the human brain that can automatically detect patterns in large amounts of data

Natural Language Generation (NLG): A subfield of natural language processing that focuses on computationally generating text

Natural Language Processing (NLP): A subfield of artificial intelligence that focuses on the computational manipulation of linguistic data

Recurrent Neural Network (RNN): A type of neural network that is used to model, classify, and predict sequences of data such as text

Rule-based System: A text-generation system wherein linguistic understanding and relationships are explicitly encoded

Scalar: A quantity with magnitude but no direction that can be measured by a real number

Seed: Data that initializes some sequence of generation powered by a neural network

Sparse Vector: A vector in which most of the entries are zero

Token: An individual occurrence of a word in a corpus; repeated words are counted each time they appear

Training: The computational process in which the weights and biases of a neural network are changed in order to automatically encode meaningful patterns in data

Type: A distinct word in a corpus; repeated words are counted only once

Vector: A quantity with magnitude and direction; a sequence of numbers used to describe a point in a high-dimensional space

Vector Space: A high-dimensional space containing a set of vectors that represent data in some dataset (several of these terms are illustrated in the brief sketch following this glossary)
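To make a few of these definitions concrete, here is a minimal Python sketch. It is not drawn from the thesis or from any of the systems it discusses; the toy corpus and the 3-dimensional vector values are invented purely for illustration. It shows tokens versus types, a sparse one-hot vector, dense vectors, and similarity in a small vector space.

    import math

    # A toy corpus: nine tokens, six types.
    corpus = "the road is long and the road is dark"
    tokens = corpus.split()          # every occurrence of every word
    types = sorted(set(tokens))      # every distinct word, counted once
    print(len(tokens), len(types))   # 9 6

    # Sparse (one-hot) vector: one entry per type, mostly zeros.
    def one_hot(word):
        return [1.0 if t == word else 0.0 for t in types]

    print(one_hot("road"))  # [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]

    # Dense vectors: mostly nonzero entries. Real embeddings are
    # learned during training; these 3-dimensional values are made
    # up here only to show the arithmetic.
    dense = {
        "road": [0.8, 0.1, 0.3],
        "path": [0.7, 0.2, 0.4],
        "dark": [0.1, 0.9, 0.2],
    }

    # Cosine similarity: vectors pointing in similar directions score
    # near 1, so "road" comes out closer to "path" than to "dark".
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    print(round(cosine(dense["road"], dense["path"]), 2))  # 0.98
    print(round(cosine(dense["road"], dense["dark"]), 2))  # 0.29

In the systems examined in later chapters, the dense vectors are learned during training rather than written by hand, and the vector spaces have hundreds of dimensions rather than three, but the underlying arithmetic of representing and comparing words is the same.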