TwitSong: A current events computer poet and the thorny problem of assessment.

by

Carolyn Elizabeth Lamb

A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Doctor of Philosophy in Computer Science

Waterloo, Ontario, Canada, 2018

© Carolyn Elizabeth Lamb 2018

Examining Committee Membership

The following served on the Examining Committee for this thesis. The decision of the Examining Committee is by majority vote.

External Examiner: Ruli Manurung, Google Japan

Supervisor(s): Daniel G. Brown, Professor and Director, School of Computer Science, University of Waterloo
Charles L.A. Clarke, Professor, School of Computer Science, University of Waterloo

Internal Members: Edith Law, Assistant Professor, School of Computer Science, University of Waterloo
Dan Vogel, Associate Professor, School of Computer Science, University of Waterloo

Internal-External Member: Dave DeVidi, Professor, Dept. of Philosophy, University of Waterloo

This thesis consists of material all of which I authored or co-authored: see Statement of Contributions included in the thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public.

Statement of Contributions

Several parts of this thesis were previously published as academic conference or journal papers with more than one author. In every case, I was the first author listed. The overall design of the research, the literature review, the implementation of systems, the design and implementation of experiments, the interpretation of results, and the writing component were consistently done by me. My supervisors' role was to provide guidance and discussion, to occasionally point me in the direction of additional literature, to help provide structure in order to get me to finish my work on time, and to suggest revisions to my drafted work. No other authors besides myself and my supervisors were included in any publication.

Sections 2, 3.1, 4, 5.1, and 5.2 were written and published with me as the first author and both Dan Brown and Charlie Clarke supervising as described in the previous paragraph. Sections 3.2 and 5.3 describe previously unpublished research undertaken by me, with Dr. Brown in the supervisory role previously described. Sections 1, 6, and 7 were written entirely by me, with Dr. Brown and Dr. Clarke's role limited to reading a draft and suggesting revisions. Notwithstanding the above, the academic "we" is used throughout this thesis for style and consistency.

Abstract

This thesis is driven by the question of how computers can generate poetry, and how that poetry can be evaluated. We survey existing work on computer-generated poetry and interdisciplinary work on how to evaluate this type of computer-generated creative product. We perform experiments illuminating issues in evaluation which are specific to poetry. Finally, we produce and evaluate three versions of our own generative poetry system, TwitSong, which generates poetry based on the news, evaluates the desired qualities of the lines that it chooses, and, in its final form, can make targeted and goal-directed edits to its own work. While TwitSong does not turn out to produce poetry comparable to that of a human, it represents an advancement on the state of the art in its genre of computer-generated poetry, particularly in its ability to edit for qualities like topicality and emotion.

Acknowledgements

Thank you to Dan and Charlie, my endlessly patient supervisors; to Edith, Dan, Dave, and Ruli, my thesis committee (especially to Ruli, who Skyped in to my defense all the way from Jakarta); to Ming Li for chairing my defense; and to OGS, NSERC, Google, and the University of Waterloo for scholarship and grant money which variously supported me and my supervisors during the research that resulted in this thesis. I also want to thank the ICCC community, particularly Anna Jordanous, Gillian Smith, and Hannu Toivonen, for their enthusiastic engagement and support as I pursued this research.

Table of Contents

List of Tables

List of Figures

1 Introduction

2 Related work: Evaluating computational creativity
2.1 Introduction
2.2 Theories of creativity
2.3 Person perspective
2.4 Process perspective
2.4.1 Conceptual space
2.4.2 Stage- and loop-based theories
2.4.3 The process of professional artists
2.4.4 Autonomy
2.4.5 Specific evaluation techniques
2.5 Product perspective
2.5.1 Novelty and value
2.5.2 Other criteria
2.5.3 The modified Turing test
2.5.4 Consensual assessment
2.5.5 Product evaluation, mk. II: Computational ...
2.6 Press perspective
2.6.1 The Creative Tripod

2.6.2 Impact on the domain and field
2.6.3 Measures of audience impact
2.6.4 Interactive art
2.6.5 Creativity support tools
2.6.6 Artificial social systems
2.6.7 Cultural success
2.7 Arguments against evaluating creativity
2.7.1 Domain specificity
2.7.2 Other arguments
2.8 Issues in computational creativity evaluation
2.8.1 Implementations of models and ad hoc tests
2.8.2 Opinion surveys, non-expert judges, and bias
2.8.3 Meta-evaluation
2.9 Conclusion: Best practices for the assessment of creativity in computational systems
2.9.1 Person
2.9.2 Process
2.9.3 Product
2.9.4 Press
2.9.5 Best practices regardless of perspective
2.9.6 Deviations from best practice

3 Related Work in Computational Poetry
3.1 A taxonomy of generative poetry techniques
3.1.1 Introduction
3.1.2 Mere Generation
3.1.3 Human Enhancement
3.1.4 Computer Enhancement
3.1.5 Separation of generative poetry communities
3.1.6 Generalization and comparison with music
3.1.7 Conclusion
3.2 State of the art in Computer Enhanced poetry
3.2.1 Optimization / Filtration
3.2.2 Knowledge representation / inception
3.2.3 Neural networks
3.2.4 Conclusion

4 Our experiments in poetry evaluation
4.1 Human competence in evaluating poetry
4.1.1 Introduction
4.1.2 Experiment I
4.1.3 Experiment II
4.1.4 Discussion
4.2 Poetry criteria derived from consensual assessment
4.2.1 Introduction
4.2.2 Background
4.2.3 Method
4.2.4 Results
4.2.5 Discussion

5 TwitSong: Developing a computational poetry system
5.1 Generation one: Line selection proof of concept
5.1.1 Introduction
5.1.2 Method
5.1.3 Results
5.1.4 Discussion
5.2 Generation two: Full automation
5.2.1 Introduction
5.2.2 How TwitSonnet works
5.2.3 Evaluating TwitSonnet
5.2.4 Discussion
5.3 Generation three: The editorial algorithm
5.3.1 Introduction
5.3.2 The mechanisms of generation three
5.3.3 Evaluation
5.3.4 Discussion
5.3.5 Conclusion

6 Discussion, limitations, and future work
6.1 Unsolved questions in poetry evaluation
6.2 Statistical considerations
6.3 Developing better line evaluation metrics
6.4 Developing intelligent editing strategies
6.5 Coherence
6.6 Black boxes vs white boxes
6.7 Interactivity

7 Conclusion

References

List of Tables

4.1 The questions used in our study for each of the four evaluation metrics tested.
4.2 Full list of the 15 magazines used in our Medium dataset. Note that these magazines were accessed in 2014; as of the completion of this thesis in 2018, many are now defunct, and some of these URLs no longer function. Given the short lives of most minor literary markets, this is not unexpected.
4.3 Average ratings, standard deviations, and F scores prior to Bonferroni correction for poem categories according to each metric. Each component is scored between -4 and +4. Significant results following Bonferroni correction are marked with a *, or ** if highly significant.
4.4 Average ratings, standard deviations, and t scores for children's poem categories according to each metric. There were no significant differences found after Bonferroni correction.
4.5 The 30 poems used in our experiment, ranked from highest to lowest average rating. The "Response" column lists how many lines of explanation, in total, were given for judges' ratings of the poem in part 2 of the study.
4.6 Categories derived from our qualitative data.

5.1 Examples of some of the highest and lowest-rated tweets for all three scoring metrics from the Olympics dataset. Theoretically possible Combined scores range from 3 to 15; other scores range from 1 to 5.
5.2 Excerpts from poetry used in our study. All poems are from the New Year's Day 2014 dataset. The Human poems were put together by a human from the tweets available, using TwitSong only to create sets of possible rhyming lines to choose from. The Control poems were put together by TwitSong through arbitrary selection of lines from these sets. Also shown are poems made by TwitSong using the Combined metric (the sum of the topicality, positive sentiment, and imagery scores), which performed well, and the negative sentiment metric, which performed very poorly. For space reasons, we include only a single stanza from each poem; the full poems are 14 lines long and in sonnet form.
5.3 Frequency of emotions from the NRC Hashtag Emotion Lexicon assigned to poems on different topics, from the group of 30 topics that were selected for the study. The topics in this table are sorted by associated emotion for ease of reading, and their order does not correspond to the ordering of topics in the study.

5.4 Example poems from the three experimental groups.
5.5 The poems most strongly preferred by experts on each of our six questions.
5.6 Correlations (Pearson's R) between answers to each of the six questions, as judged by experts.
5.7 The three poems that each received one retweet on Twitter during our Press evaluation. (The poem that received one Like is the Stephen Hawking poem which is reproduced in Table 5.5.)

List of Figures

3.1 A diagram illustrating our three-part taxonomy.
3.2 A diagram illustrating Ventura's taxonomy and its relation to ours. Outer labels ("Mere Generation" and "Computer Enhancement") are ours, not Ventura's, and are meant to illustrate areas of overlap between the two taxonomies, not to imply that Ventura would necessarily use the terms in this manner. Note that Ventura's taxonomy, unlike ours, explicitly proceeds in a hierarchy from the least creative (top) to the most creative (bottom) methods. It includes some categories which we have not observed in actual generative poetry systems, but does not include anything corresponding to Human Enhancement.
3.3 A screenshot of Gnoetry in action. Words in white have been selected by the human user as words to keep, while pink words will be replaced at the next generation step.

4.1 A sample poem from the Good dataset: "Flying Lesson" by Dolores Hayden.
4.2 A sample poem from the Medium dataset: "Stars Dream" by Craig W. Steele.
4.3 A sample poem from the Bad dataset: "On Dieing" by AsILiveAndBreathe.
4.4 A sample poem from the C-Good dataset: "December 26" by Ken Nesbitt.
4.5 A sample poem from the C-Bad dataset: "Ruby's Dream" by B4i8islept.
4.6 Sample scatterplots showing relationships between Novelty, Typicality, and Quality for poems in all of the data sets from both experiments.

5.1 Success rates for types of computationally generated poems in pairwise comparisons with other poems. The height of a given bar represents the number of times a poem from that category was selected in preference to any other poem, divided by the number of times a poem from that category appeared in a comparison. Error bars represent 95% confidence intervals.
5.2 A sample of TwitSonnet's output, regarding the movie "Doctor Strange". (The keyphrase used was "Doctor Strange", and the time range used was the movie's opening weekend.)
5.3 Experts' and non-experts' evaluations of TwitSonnet's poems. The X-axis shows the seven evaluation questions in the same order as they are listed in our Method section. The Y-axis shows answers on a 5-point scale, with 5 being the most positive response and 1 the least positive. Error bars show 95% confidence intervals.

5.4 A TwitSonnet poem posted online, using the keyword "debate", immediately after the 2016 U.S. presidential election debates.
5.5 An early example of a poem from a prototype Editorial Algorithm, using Bernie Sanders' lines from presidential debate transcripts as a source text. In this prototype, pairs of words were replaced during each edit. (An even earlier version, replacing single words, resulted in lines like "let wall wall wall wall wall wall street".) This problem was avoided by a later protocol in which the target word and everything after it in the line is re-generated at once.
5.6 Success rates for types of computationally generated poems in pairwise comparisons with other poems, judged by experts. The height of a given bar represents the number of times a poem from that category was selected in preference to any other poem. Error bars represent 95% confidence intervals, prior to Bonferroni correction.
5.7 Success rates for types of computationally generated poems in pairwise comparisons with other poems, judged by non-experts. The height of a given bar represents the number of times a poem from that category was selected in preference to any other poem. Error bars represent 95% confidence intervals, prior to Bonferroni correction.

Chapter 1

Introduction

Can a computer be creative? Could a computer create its own works, such as poetry, that are of interest to humans? And how would we know if it did? The computational creativity research community has been studying questions like these. Computational creativity is "the philosophy, science and engineering of computational systems which, by taking on particular responsibilities, exhibit behaviours that unbiased observers would deem to be creative" (Colton and Wiggins, 2012). Computational creativity is of interest not only for its potential applications, which include personalized art creation, new forms of art that are only possible with computers, and co-creative systems that enhance a human's creative learning and ability; but also because, by implementing theories about the cognitive science of creativity in a computational system, we learn about the underlying science of human creativity (Boden, 1990).

Our interest is specifically in computer-generated poetry, which has been produced in some form since the 1950s (Lutz, 1959) but which has been enhanced with artificial intelligence techniques more recently. We set out, at the beginning of our PhD studies, to create a system that generated poetry or song lyrics based on the news. Immediately we were confronted with several questions that arise as the natural result of this goal: How does a computer create poetry? What textual traits do we want a finished poem to have? How does a computer make decisions about these traits? If a computer is meant to generate poetry, how do we measure its success at this task? In particular, how do we tell if a generated poem is any "good"—and how do we tell if it is creative?

Searching for answers to these questions, we have performed a comprehensive survey of current knowledge about creativity that can, or should be, applied to computational creativity evaluation. We find that a bevy of proposed evaluation techniques exist, not all of which draw from the appropriate psychological or humanistic literature, and not all of which are carried out appropriately in practice. Moreover, there are several qualitatively different perspectives from which to judge creativity, including judgment of the creative agent itself, of the creative agent's output, of the process used to generate the output, and of the place of agent and output in the wider society.

We next survey specifically the techniques that are used to generate poetry. We find that many current researchers use AI techniques of optimization and knowledge representation to improve the existing, basic generative techniques for poetic text. Neural networks are another promising technique with which poetry is currently being written. Although many interesting and promising techniques exist, there are particular challenges for computational poetry, including coherence, for which few existing techniques produce good results.

Leaving these surveys with some questions about poetry still unanswered, we perform a few experiments in the evaluation of both human- and computer-generated poetry. Our experiments add some new discoveries to the computational creativity research field. In particular, we find that non-expert raters (even though they are frequently used to evaluate poetry in practice) are even more inappropriate for poetry evaluation than theory would suggest, largely because most non-expert raters have little exposure to modern contemporary poetry and great difficulty understanding it. We also find that expert raters have limited agreement when it comes to evaluating computer-generated poems. However, by studying the statements made by expert raters about computer-generated poems, we derive a domain-specific, evidence-based set of four criteria (with several sub-criteria) that we hypothesize describes the main desiderata for computer-generated poetry in expert minds.

Finally, we produce and evaluate three successive versions of our own generative poetry system, TwitSong, which writes poems based on the news. TwitSong functions by mining candidate lines from a source text and evaluating those lines on key metrics such as topicality, emotion, and imagery. The third generation of TwitSong is also able to edit its candidate lines based on specific goals, such as revising an off-topic line to be more topical (a simplified sketch of this select-and-score pipeline appears at the end of this chapter). While a few other poetry systems can edit their lines in this manner, editing based on semantic criteria such as topicality is an advancement on the state of the art. Results from the second and third generations are mixed, likely because our operationalizations of line evaluation metrics are not as sophisticated as human judgment. Nevertheless, our results show that TwitSong's approach works in principle, that our choices of criteria are appropriate and useful in principle, and that making targeted edits to the lines does improve them, though not always in the manner that we anticipated.

These various research topics are distributed throughout the rest of this thesis as follows. In Chapter 2, we have our survey of interdisciplinary methods of computational creativity evaluation, which is the most comprehensive survey on this topic to date. In Chapter 3, we have the remainder of our surveys of related work, organizing the field of computer-generated poetry into a rough taxonomy and giving many examples of the state of the art in the taxonomy's most relevant sections. In Chapter 4, we describe our original experiments in computational poetry evaluation, including a test of non-expert human raters using computational creativity evaluation techniques on human poetry, and a test of expert human raters using human evaluation techniques on computer-generated poetry. In Chapter 5, we discuss the development, design, and evaluation of all three versions of TwitSong, our generative poetry system. Finally, in Chapter 6, we reflect on what we have learned from all four of these branches of research, what still remains to be done, and what we would like to see in future similar poetry systems.
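To make the TwitSong pipeline described above concrete, the following is a deliberately simplified, hypothetical sketch, not the actual TwitSong implementation (which is detailed in Chapter 5): candidate lines are scored with toy stand-ins for topicality, sentiment, and imagery, and the highest-scoring line is selected for a slot in the poem. The lexicons and candidate lines below are invented for illustration.

    # Toy sketch of a TwitSong-style select-by-score step (not the real system).
    # All lexicons and candidate lines are invented for illustration.
    TOPIC_WORDS = {"election", "votes", "ballot"}          # hypothetical topic keywords
    POSITIVE_WORDS = {"hope", "bright", "win", "joy"}      # toy sentiment lexicon
    IMAGERY_WORDS = {"rain", "streets", "lights", "dawn"}  # toy concreteness lexicon

    def topicality(line: str) -> int:
        """Count overlaps with topic keywords (stand-in for a real topicality score)."""
        return len(TOPIC_WORDS & set(line.lower().split()))

    def sentiment(line: str) -> int:
        """Count positive-lexicon hits (stand-in for a real sentiment score)."""
        return len(POSITIVE_WORDS & set(line.lower().split()))

    def imagery(line: str) -> int:
        """Count concrete image words (stand-in for a real imagery score)."""
        return len(IMAGERY_WORDS & set(line.lower().split()))

    def combined(line: str) -> int:
        """Combined metric: the sum of the individual line scores."""
        return topicality(line) + sentiment(line) + imagery(line)

    def best_line(candidates: list[str]) -> str:
        """Select the highest-scoring candidate line for one slot in the poem."""
        return max(candidates, key=combined)

    # Hypothetical mined candidates that already share an end rhyme.
    candidates = [
        "the ballot boxes shine beneath the dawn",
        "my cat is sleeping on the lawn",
        "we count the votes with hope at dawn",
    ]
    print(best_line(candidates))  # prints the most topical, positive, image-rich line

In this sketch the system only selects among fixed candidates; the editing behaviour of the third-generation system additionally regenerates part of a low-scoring line and re-scores it, as described in Chapter 5.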

Chapter 2

Related work: Evaluating computational creativity

2.1 Introduction

If we are to build a creative system as anything other than an artistic project, we must have a way of evaluating it. That is to say, we must be able to state whether or not the system met its goals. The challenge of evaluating creative systems is a much-studied topic in computational creativity. In particular, computer scientists in the discipline of computational creativity have preoccupied themselves with the question of how to tell if a system is creative. Many creative software systems have been designed both for artistic creativity (including music, visual art, and storytelling) and for other forms of creativity (such as mathematics, design, and code generation (Loughran and O'Neill, 2017)), and all need to be evaluated. To address this question, we need both a definition of creativity and a methodology for assessing it.

The growing and varied computational creativity community has generated many theories and methodologies for the evaluation of creativity (Aguilar and Pérez y Pérez, 2014; Bown, 2014; Burns, 2015; Colton, 2008; Gervás, 2002; Jordanous, 2012a; Negrete-Yankelevich and Morales-Zaragoza, 2014; Pease et al., 2001; Ritchie, 2001). However, these many proposals contradict one another, and many of them suffer from a lack of reliability or validity. A major contributor to this situation is a lack of interdisciplinary knowledge about creativity. Psychologists, philosophers, cognitive scientists, and others have attempted to tackle the problem of creativity evaluation from their own perspectives, and computer scientists working on creative systems can learn from their efforts. At present, many proposals for evaluation are not well grounded in the psychology or philosophy of human creativity. (There is, of course, an argument that computational creativity need not resemble human creativity—an argument which we will explore in Section 2.7—but it is inadvisable to reject theories of human creativity without first understanding them.)

This chapter will guide the reader through the major proposals about how to evaluate creative software, with a particular focus on proposals from psychology, philosophy, and other fields that should be used to guide a programmer's decisions. It should be noted that this is not only a summary of existing work, but also a contribution in itself, as we have surveyed interdisciplinary research on the nature of creativity in humans, much of which is not used, or underused, in computational creativity, and have made recommendations regarding its implications for the field. A prior version of this chapter was published in ACM Computing Surveys in 2018 (Lamb et al., 2018).

In Sections 2.2 through 2.6, we introduce theories of creativity evaluation from four perspectives, including both human contexts and computational ones. We follow this, in Section 2.7, with a discussion of questions as to whether creativity can be evaluated at all. Finally, in Section 2.8, we discuss practical issues specific to computational creativity and some common pitfalls of evaluation in practice. Each section has specific takeaways for the researcher who wants to apply the ideas to their work. These takeaways will be summarized in the chapter's conclusion, Section 2.9.

2.2 Theories of creativity

A variety of theories of creativity evaluation exist. When comparing evaluations to each other, it is useful to group them based on their theoretical perspective. One useful taxonomy of theories, which we use to structure this chapter, is known as the four Ps: Person, Process, Product, and Press (Rhodes, 1961).

• Person is the human (or non-human agent) who is seen as creative. Person theories study what it is about the agent that makes them creative.

• Process is the set of internal and external actions the agent takes when producing a creative artifact. Process theories study what sort of actions are undertaken when creative work is done.

• Product is an artifact, such as an artwork or a mathematical theorem, which is seen as creative or as having been produced by creativity. Product theories study what it is about the product that makes it worthy of being called creative.

• Press is the surrounding culture which influences people, processes, and products and which judges them as creative or uncreative. Press theories study what it is that leads a culture to view something as creative.

These four Ps originate with Rhodes (1961) and were introduced to computational creativity by Jordanous (2016a). Jordanous suggests the use of the word "producer" in place of "person", to emphasize that the creative agent need not be a human. In this thesis, we use "person" so as to match the psychological literature.

We will describe evaluation methods from each of these perspectives in turn. Of course, it is possible to evaluate from more than one perspective. For example, Colton et al. (2011) introduce, in the same paper, both the FACE (process) model for assessing the creativity of an act, and the IDEA (press) model for assessing its impact. The SPECS model (Jordanous, 2012a), as we will discuss, contains criteria that arguably cover all four perspectives (Jordanous, 2016c). Many evaluations that are mainly in one perspective also incorporate ideas from others. However, in order to be clear about the ideas behind each perspective, we will for the most part discuss them separately.

2.3 Person perspective

The Person perspective is grounded in psychometrics (the measurement of human mental traits). The goal of the Person approach is to discover what personality, emotional, and cognitive traits distinguish a more creative person from a less creative one. The Person perspective is potentially useful for researchers who want their systems to be viewed as creative agents because of their inherent traits.

For humans, hundreds of psychometric tests for creativity exist (Plucker and Renzulli, 1999), the most famous perhaps being the Torrance Tests (Torrance, 1968). These tests prompt the person to generate many related ideas—for example, by asking how many ways one can use a chair. This constitutes a test of divergent thinking—the ability to come up with varied and unusual ideas—which is a popular Person-based definition of creativity (Cropley and Cropley, 2005). However, in many theories, divergent thinking is only useful when it alternates with convergent thinking—the use of conventional knowledge to evaluate and develop the ideas (Csikszentmihalyi, 1996). Aside from divergent and convergent thinking, Person-based creativity can also be measured through self-report, about a person's perception of their own creativity or about their actual creative achievements, or both. Several scales exist for measuring creativity through self-report, including the Kaufman Domains of Creativity Scale (Kaufman, 2012) and the Creative Achievement Questionnaire (Carson et al., 2005).

Computers can pass some psychometric creativity tests. Olteţeanu and Falomir's system comRAT-C solves a Remote Associates Test by using associative spreading through a database of common bigrams (Olteţeanu and Falomir, 2015). Their system OROC passes an Alternative Uses Test by comparing information about objects' uses and physical properties (Olteţeanu and Falomir, 2016). Gross et al. (2012) solve a Remote Associates Test using mined word associations, and discuss how this technique could be used in other creative tasks (a toy sketch of this association-based approach appears at the end of this section). Psychometric tests more distantly related to creativity, such as the analogical reasoning of Raven's Progressive Matrices, have also been the subject of computational creativity research (Johner et al., 2015). However, these systems do not do other creative work besides taking tests. Since the structure of a computer system differs from the structure of a human mind, it is not evident that the success of a computer at a psychometric test, per se, is a useful proxy for its general creative ability—nor are the above authors making that claim.

Another avenue of Person research is to study famous creative people. This is known as the historiometric approach. Historiometric research reveals multiple successful approaches to creativity (Policastro and Gardner, 1999), with a few general traits in common, such as an ability to distinguish promising avenues of work from less promising ones. Policastro and Gardner (1999) review historiometric studies and sort creators into four categories, based on two distinctions: creators who work with objects and symbols versus those who work with people, and those who strive for excellence within a domain versus those who challenge the domain's foundations. Each combination of these traits produces a creative personality type. Masters, who achieve excellence within an object- or symbol-based domain, include Mozart, Rembrandt, and Shakespeare. Makers, who create a new object- or symbol-based domain, include scientists like Einstein and Darwin as well as maverick artists like Stravinsky or Joyce.
Introspectors, who work for excellence in a people-based domain, use art to express their inner selves; these include Proust and Woolf. Their counterparts, the Influencers, work to change and challenge other people. Influencers include Gandhi, Mandela, and Eleanor Roosevelt. In Policastro and Gardner’s theory, those who challenge a domain’s foundations are not necessarily more creative than others, only creative in a different way. This is a point that we will return to in Section 2.5.1.1.

Measuring the personal traits of computers is conceptually difficult, but some researchers have attempted it. The Creative Tripod (Colton, 2008) works by attributing traits to a system, but since the Tripod's focus is on convincing people that the system has these characteristics, it will be discussed under Press in Section 2.6.1. A few researchers have attempted to model traits associated with creativity, such as curiosity (Grace et al., 2017). More commonly, as with the Tripod, attributing traits to a computer is something humans do as the result of information from one of the other three perspectives. Unless a system is designed to exhibit specific human cognitive traits, it is likely better not to use the Person perspective for computers.
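As promised above, here is a toy sketch of the word-association approach to the Remote Associates Test. This is our own illustration, not the comRAT-C or Gross et al. implementations: the association table is hand-made, whereas real systems mine large corpora and weight associations by frequency.

    # Toy Remote Associates Test solver: intersect the association sets of the cues.
    # The association table is invented; real systems use mined association databases.
    ASSOCIATIONS = {
        "cottage": {"cheese", "house", "garden"},
        "swiss": {"cheese", "alps", "watch"},
        "cake": {"cheese", "birthday", "chocolate"},
    }

    def solve_rat(cues: list[str]) -> set[str]:
        """Return the words associated with every cue (the candidate RAT answers)."""
        association_sets = [ASSOCIATIONS.get(cue, set()) for cue in cues]
        return set.intersection(*association_sets) if association_sets else set()

    print(solve_rat(["cottage", "swiss", "cake"]))  # {'cheese'}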

2.4 Process perspective

The Process perspective includes any theory of how creative products are made—that is, what cognitive steps must be taken in order for an activity to be creative. Many of these theories are descriptive rather than evaluative. Since they are created by researchers studying humans, some are built on the assumption that a human cognitive structure, including unconscious reasoning, is already in place. Thus, these ideas have not always been straightforwardly taken up in computational creativity. However, the Process perspective includes many important ideas that can and should influence the design of a creative system. The Process perspective is especially useful for researchers who want to model human creativity, or who want to make an argument that their system is creative because of the kinds of tasks it does.

2.4.1 Conceptual space

The most popular Process theory in computational creativity is Boden's (1990) theory of conceptual space. Boden divides creativity into three types: combinatorial, exploratory, and transformational. Combinatorial creativity occurs when two familiar ideas are put together in an unfamiliar way. The remaining types of creativity depend on the idea of a conceptual space: the space of all things that could be generated according to a set of rules. Exploratory creativity, by testing out the rules' implications, reaches accessible points in the space that have not been reached before. Transformational creativity changes the rules to reach points that were not accessible before. Wiggins (2006) refines Boden's idea of conceptual space and describes types of "aberration" which might lead one to revise the space and/or the way of searching.

Many Process theorists privilege transformational creativity. Several philosophers, for example, define originality as work that changes established rules (Bartel, 1985). To these philosophers, transformational creativity is more original than exploratory creativity. Dorin and Korb (2012) define creativity as "the introduction and use of a framework that has a relatively high probability of producing representations of patterns that can arise only with a smaller probability in previously existing frameworks" (glossed in symbols below). The more unlikely the new patterns were under previous frameworks, the more creative they are. Alternatively, a number of computational creativity researchers focus on combinatorial creativity, teaching a computer to blend seemingly unrelated concepts (Cunha et al., 2017; Gonçalves et al., 2017; Veale, 2013b).

Besold (2016) argues that certain processes in machine learning, such as Bayesian theory learning and inductive logic programming, are transformationally creative, as they involve the generation of novel (to the system) ideas. Besold argues that computational creativity, particularly scientific and problem-solving creativity, should use these processes.
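Read formally (our own gloss, not Dorin and Korb's exact notation), the definition says that a new framework F_new is creative to the extent that the patterns p it readily produces were improbable under every previously existing framework F_old:

    \[
        P(p \mid F_{\mathrm{new}}) \;\gg\; \max_{F_{\mathrm{old}}} P(p \mid F_{\mathrm{old}}),
    \]

with a larger gap between the two probabilities corresponding to greater creativity.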

2.4.2 Stage- and loop-based theories

The theories described above are vague as to how a human goes about transforming a conceptual space. However, more naturalistic models of the creative process are plentiful. One such theory is Wallas' (1926) four-stage theory: preparation, incubation, inspiration, and verification. The four-stage theory is based on case studies of how scientists have ideas. First, in preparation, the creative person gathers information related to their task. In incubation, the person ponders the information, makes connections, and often abandons the project for a time, while further connections are made unconsciously. In inspiration, something "clicks", and the person gets an idea based on a novel way of looking at the information. Then in verification, the person does a lot of hard work establishing, developing, and polishing their idea into a finished product.

Sadler-Smith (2015) suggests that the case studies on which the four-stage model is based also include a fifth stage, between incubation and inspiration: intimation. At the intimation stage, a solution to the problem exists in fringe consciousness, but has not yet been seized on by the conscious mind. There may be a feeling that a solution is coming, or an awareness of parts of the solution at the edge of the mind. According to Sadler-Smith, expertise helps a creative person progress to the inspiration stage by unconsciously evaluating their unconscious ideas. Csikszentmihalyi (1996) includes, between inspiration and verification, the stage of evaluation, in which the creative person uses their domain knowledge to decide if their idea is worth pursuing. In practice, these elements do not necessarily follow each other in a tidy order, but are mixed together (Beardsley, 1965) or can repeat "fractally" as a series of smaller and smaller inspirations about details of the work (Csikszentmihalyi, 1996). This mixing leads researchers to consider creativity as a loop or an iterative process—but what characterizes the iteration?

One iterative theory is BVSR—Blind Variation, Selective Retention. BVSR was first proposed by Campbell (1960) and was further developed by Simonton (2011). In BVSR, a creative agent blindly generates many possible ideas before reflecting and choosing the best ones. For generation to be "blind" means that the probability of generating an outcome is decoupled from its utility: an increase in the utility of an outcome does not cause an increase in the probability of generating that outcome. This is also called the Darwinian theory of creativity (Kronfeldner, 2010) because of its superficial resemblance to natural selection, in which random mutations and recombinations create organisms which are selected for their fitness. However, a close correspondence between BVSR and evolution is not strictly necessary (Simonton, 2011).

Other psychologists and philosophers (e.g., Dasgupta, 2011; Kronfeldner, 2010) have criticised BVSR. The main criticism involves the notion that generation is blind. Dasgupta (2011) outlines famous cases of creativity where candidate solutions were generated based on schemas, building on partially successful previous ideas to form a progressively more correct solution. These mechanisms are not explained by BVSR.
Mechanisms like analogical reasoning and spreading activation, which Simonton lists as examples of blind processes, are not actually blind: they depend on the structure of knowledge in an agent’s mind, and on the thinking the agent has already done about the problem, which is likely to guide them towards more useful solutions (Kronfeldner, 2010). Despite these problems with the concept of blindness, most theories of the creative process still involve a loop between generation and evaluation. (Note that the alternation between generation and evaluation is roughly equivalent to the alternation between divergent and convergent thinking mentioned in Section 2.3.)

Two cases in point are Ward, Smith, and Finke's (1999) Geneplore model and García et al.'s (2006) ER model. In Geneplore—a portmanteau of "generate" and "explore"—an agent generates preinventive structures through some means, not necessarily blind: synthesis, transformation, and exemplar retrieval are mentioned (Ward et al., 1999). The agent then evaluates the preinventive structures and explores their properties and implications. This exploration leads either to refining the preinventive structures or discarding them and generating new ones. In the ER model—standing for Engagement-Reflection—the agent brainstorms ideas for solving a problem in the Engagement stage, then evaluates them at the Reflection stage (García et al., 2006). Ideas at the Engagement stage are not necessarily blind, but they are not evaluated at this stage. Ideas at the Reflection stage are discarded if their prerequisites are not met, or modified to make them more acceptable. They then form a partial solution to the problem. This partial solution is fed back into Engagement for elaboration, which is again reflected on, and so on, until the solution is complete.

Dahlstedt (2012), similarly, defines the two phases as "implementation" and "re-conceptualization". During implementation, an artist has two ideas: a conceptual description of the desired end product, and an instruction for how to get there. In some genres, such as improvisatory theatre, the description may be missing. In re-conceptualization, the artist compares their work to their conceptual description, and changes either the product or the description to bring them closer to each other. These changes can take various forms, such as additions, expansions, generalizations, mutations, new constraints, replacements of material, or even a new conceptual representation from scratch. When the work is finished, the conceptual representation is hidden; each person in the audience then makes their own conceptual representation of what they think the art is about.

The cultural psychologist Glăveanu (2015) defines these loops as being based in perspective-taking: an artist must switch between the perspective of a creator and the perspective of an audience viewing the art. An artist has interactions with real audience members throughout their career, which allows them to broaden the range of perspectives they can take on when evaluating their work. Basing the creative loop on perspective-taking defines it as a social phenomenon which is inextricably connected to the Press perspective. A very simple form of perspective-taking takes place in systems which evaluate their work based on data about what humans prefer. More advanced social reasoning would take place in a system that can actively interact with critics; we describe a few such systems in Section 2.6.6.

An important thing to note about all of these loop-based theories is that the evaluate portion of the loop is assumed to constitute a decision-making process with some reasoning or inference involved. This reasoning may be relatively simple (as in the case of a genetic algorithm recombining prototype products based on their scores on a simplistic metric) or relatively advanced (as in Glăveanu's example). The fact that reasoning is involved - that is to say, that the system decides what to change or improve based on some logic, rather than arbitrarily - is the reason why these loops are considered creative and useful.
Amabile's (2012) Componential Theory breaks the creative process into five steps: Task Identification, Preparation, Response Generation, Response Validation, and Communication. Some or all steps can be repeated if more progress is needed, and each step is affected by the personality and environment of the creator.

Systems based on these loops are common. A genetic algorithm naturally switches between generation of artifacts and evaluation of the current population, and these algorithms are popular (Galanter, 2012). The ER model has been used to generate stories, do geometry problems, and model early sensorimotor development. A few researchers (Gervás, 2013a) incorporate more critical evaluation, attempting to detect specific mistakes in their work and correct them. This type of evaluation, while difficult to implement, is more sophisticated than the random recombination of a genetic algorithm, and we recommend its use when possible; we use it ourselves in one of the versions of our TwitSong system (Section 5.3). We are unaware of any systems that explicitly use Wallas' (1926), Sadler-Smith's (2015), or Amabile's (2012) multi-stage models. The results of a study replicating these methods in a computer system might be enlightening.
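As a schematic illustration of what such a generate-evaluate loop looks like in code, here is a minimal, hypothetical sketch. It is not any of the published systems above: the vocabulary, scoring function, and revision rule are invented stand-ins, but the structure (generate, evaluate against a criterion, then make a reasoned rather than random revision) is the shared skeleton of the models discussed in this section.

    # Minimal generate-evaluate-revise loop (a schematic sketch, not a published system).
    import random

    WORDS = ["river", "neon", "winter", "engine", "sorrow", "lantern"]  # toy vocabulary

    def generate(partial: list[str]) -> list[str]:
        """'Engagement': extend the partial artifact without evaluating it yet."""
        return partial + [random.choice(WORDS)]

    def evaluate(candidate: list[str]) -> float:
        """'Reflection': score the candidate (a stand-in metric that rewards variety)."""
        return len(set(candidate)) / max(len(candidate), 1)

    def revise(candidate: list[str]) -> list[str]:
        """Targeted edit: remove repeated words instead of recombining at random."""
        seen, revised = set(), []
        for word in candidate:
            if word not in seen:
                revised.append(word)
                seen.add(word)
        return revised

    def creative_loop(steps: int = 8, threshold: float = 0.9) -> list[str]:
        artifact: list[str] = []
        for _ in range(steps):
            artifact = generate(artifact)        # generation phase
            if evaluate(artifact) < threshold:   # evaluation phase against a criterion
                artifact = revise(artifact)      # reasoned revision, not arbitrary change
        return artifact

    print(creative_loop())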

2.4.3 The process of professional artists

These theories are supplemented by naturalistic studies of artists at work. Fayena-Tawil et al. (2011) summarize studies of artists and nonartists asked to create a drawing in the laboratory. Compared to nonartists, artists spend more time examining, selecting, and rejecting available objects; reworking their drawing; developing an overall composition; stating large-scale goals; and making large-scale evaluations. Nonartists spend most of their time trying to reproduce visual details. It should be noted that the creativity of the artists in this and similar studies is not measured; rather, it is assumed that working artists are creative. By the Four Cs model described in Section 2.5.1.1, working artists by definition are Pro-C creative, but not necessarily Big-C or H-creative. From most theoretical perspectives, there is no problem with this; however, from the perspective of some Person-based historiometric theories, which restrict themselves only to historically eminent creative people (Section 2.3), these results may or may not be relevant. Nor do these studies address the question of whether some working artists are more creative than others, and whether this difference is characterized by differences in process.

Mace and Ward (2002) study self-reports by artists making art for a gallery. They divide the process into four stages: Conception, Development, Creation, and Finishing. In Conception, the artist gets an idea, either spontaneously, or by expanding on previous ideas. In Development, the artist does preliminary sketches, and enriches the concept through further associations, until they have a specific sense of what the work will look like. They then, in Creation, physically construct the artwork itself. This involves a generate-evaluate loop, with frequent restructuring in response to mistakes or new ideas. Finally, in Finishing, the artist prepares the art for display by framing, mounting, etc. Each stage includes the possibility of shelving the idea or returning to earlier stages.

The process may be different for different domains. Bourgeois-Bougrine et al. (2014) study the process of French screenwriters. For these writers they identify three stages: Impregnation, Structure/Planning, and Writing/Rewriting. A screenwriter's Impregnation differs from an artist's Conception because screenwriters do not typically have the ideas for their own films: instead, a director hires them to create a screenplay on a specific topic. The Impregnation phase also includes rest or seemingly unrelated tasks. During this time, the screenwriter does not appear to be creating anything, but is actually allowing the vital incubation stage to happen. Next, during Structure/Planning, the writer decides on a tentative structure for the film. Finally, the writer actually writes the screenplay. Scenes are constantly rewritten as the writer finishes other scenes and traces their implications. Rewriting goes on throughout the filming, so there is no Finishing stage.

In spite of their differences, one can see a parallel in the progression from idea to structure to specific creation in both studies. It appears to be typical for most creative humans to progress in this way, beginning with a simple inspiration, then planning a structure, then implementing. However, this likely does not apply to every form of creative work. For example, improvisational performances may not involve a structuring or planning component (Kalonaris, 2018).

2.4.4 Autonomy

A creative system need not be a slavish imitation of human creativity. Researchers may instead build a system whose process emphasizes the strengths of computers (Gervás, 2010). But one must also address the weaknesses of computers. One issue here is autonomy: the ability of the machine to decide what to do for itself. A lack of autonomy is a major criticism both of general AI and computational creativity specifically.

Mumford and Ventura (2015), surveying public opinion about creative computers, found that autonomy was one of the biggest issues, particularly for skeptics. Current systems possess some autonomy—being able to add to their knowledge base, for instance—but cannot define or refine their own processes. Guckelsberger et al. (2017) describe a thought experiment asking a system why it made a creative decision—and then asking "why" again, recursively; all existing systems would eventually have to answer "because my programmer told me to".

Negrete-Yankelevich and Morales-Zaragoza (2014), in their Apprentice Framework, suggest moving a system through four stages, each with greater autonomy than the last: the toolkit (a set of processes to be used by human artists), the generator (a machine that can make finished or partially finished work), the apprentice (a generator which makes novel and/or valuable work at least sometimes, with some human curation), and the master (a generator which always makes finished, acceptable-to-experts work on its own).

Colton (2012) describes this autonomy-based progression as "climbing the meta-mountain". One looks at the decisions humans make for the system and automates those; then looks at the decisions humans make for the revised system, and automates those; and so on, potentially ad infinitum. Colton et al. (2014) suggest drawing diagrams of a system's process so that judges can see what the system is responsible for, what choices it makes, and what is left to humans. Current diagrams could be compared to older ones to show an increase in autonomy over time. Mogensen (2017) uses diagrams of this nature to analyze the creativity of the Voyager and Favola music systems, although his analysis does not involve judges and is not solely concerned with autonomy. Cook and Colton (2018b) go further, designing a video game creation system which demonstrates its autonomy by continuously working on multiple projects at once, choosing which project to work on at a given time, and communicating with the public about its choices (much as human artists sometimes do through blogging or tweeting about a work in progress). Communicating with the public in this way causes demonstrations of autonomy to veer over into the Press perspective, which is discussed more fully in Section 2.6.

Complete autonomy may not be readily achievable. Relevant to this discussion is Smithers (1997), who argues that the term "autonomy" in AI is misused. "Autonomy" in AI is used to describe mobility, lack of a direct controller, self-regulation in a narrow sense, or the ability to do certain things automatically. Smithers argues that this definition is too narrow: computer systems are not autonomous unless they possess self-lawmaking and self-identity. That is, autonomous systems are governed by rules that they create through interaction with their environment. Autonomy as self-lawmaking is congruent with the way the word "autonomy" is used in law, politics, medicine, and biology.
If we accept Smithers’ argument, then creating an autonomous creative system would require climbing many levels of Colton’s meta-mountain indeed.

Some researchers have taken up this quest for high autonomy. Guckelsberger et al. (2017) define adaptive creativity, in which a truly creative agent must develop its goals on its own the way living organisms do. The agent must be embodied and have a precarious existence, which requires it to continuously interact with its environment, preserving itself by modifying itself or the conditions around it. (It is unclear if a non-embodied agent could have this type of precarious existence; for the sake of argument, it is assumed that embodiment is necessary.) Only values generated through these attempts at self-preservation, in Guckelsberger et al.'s view, can truly belong to the agent. The most successful agents exhibit behavior that is novel and valuable (see Section 2.5.1.4): to survive, they must respond flexibly to unexpected threats.

As a metric for evaluation, Guckelsberger et al.'s ideas have some weaknesses. As Guckelsberger et al. admit, the set of systems which are adaptively embodied is quite different from the set of systems which most human observers would intuitively deem creative. In fact, it is uncertain if human creativity would meet Guckelsberger et al.'s standards. Humans are embodied agents with a precarious existence whose creative predilections arise from natural selection pressures. But human creativity is social in nature, and often far removed from preserving viability in the moment. Given the culturally constructed nature of many human endeavors, it is unclear if a human would pass Guckelsberger et al.'s recursive "why?" test; most humans do at least some things, including some creative things, because another human told them to. Creative actions in humans can also be a response to physical, psychological, or social limits which have little to do with viability as such. This does not mean that Guckelsberger et al.'s theory is not useful for low-level creativity in embodied agents. But it does rest on strict assumptions which may not be useful for every type of creativity. Researchers uninterested in building embodied agents can still achieve interesting things by using weaker autonomy requirements.

2.4.5 Specific evaluation techniques

All of this Process research provides sound advice for system design, but not all of it is clearly applicable to evaluation. Comparing one's system's processes to theoretical processes is useful as a check by the researcher, but it is not a formal evaluation. Fortunately, several evaluation methods exist which are process-based, in that the evaluator takes into account the system's inner workings.

One such method is the FACE model (Colton et al., 2011), which distinguishes different things that a system could create: concepts, expressions of concepts, aesthetic measures, framing information, or methods for generating any of these. Colton et al. suggest a number of ways FACE could be used to evaluate creativity, including a "cumulative way", in which a system is more creative if it performs more types of generative acts, thereby taking control of more of its own decisions. Or, in the "comparative way", some components are considered more creative than others. A third option is the "process way", in which a system with a more creative process is deemed more creative. What these authors think of as a more creative process is not clearly defined, although in one example, they state that they prefer more autonomy. This lack of definition means the process way verges on a tautology. In fact, all the ways of evaluating using FACE are vague. Jordanous (2012b) found that the FACE model ranked musical improvisation systems in the opposite order to other evaluation methods, and its categories are not necessarily relevant to what researchers want to achieve, which calls into question FACE's utility.

Jordanous's SPECS model (2012a) evaluates systems based on fourteen factors identified through study of how humans define creativity. These include active involvement and persistence, dealing with uncertainty, domain competence, general intellect, generation of results, independence and freedom, intention and emotional involvement, originality, progression and development, social interaction and communication, spontaneity and subconscious processing, thinking and evaluation, value, and variety, divergence, and experimentation. Many are impossible to evaluate without knowledge of a program's inner workings. (Bhattacharjya (2016) divides the SPECS criteria among all four Ps; Jordanous (2016c) says many criteria cross more than one perspective.) The SPECS model leaves it up to researchers how they will use the criteria, but Jordanous provides a case study in which musicians, after some training in how to think about creativity, were given information about musical improvisation systems and examples of products the systems had produced and asked to rate each program on the fourteen criteria. The criteria are not combined into a single creativity score, but instead provide a nuanced picture of what the system does and doesn't do well (Jordanous, 2012b).

A more hierarchical model is Ventura's (2016), which categorizes creative systems into seven categories, each more creative than the last. Randomization (in which output is completely random) and Plagiarization (in which output is copied from existing artifacts) are the least creative. In Memorization, the system modifies existing artifacts, and in Generalization, it creates new artifacts based on rules that can be programmed in or discovered by the system. In Filtration, the system can evaluate its output with a fitness function; any system using a generation-evaluation loop is at least at the Filtration level. Even more advanced, in Inception, the system uses a knowledge base to inject deeper meaning into its artifacts. Finally, in Creation, the system has perceptual abilities and is able to create artifacts based on what it sees, hears, and experiences. Ventura's model is not an evaluation technique, but a quick-and-dirty evaluation could be performed by asking an expert to place a system in one of the seven categories. The advantage of this model is that, being an ordinal scale of levels of creativity, it is very simple to use to argue that one system is more creative than another. The disadvantage is that it lacks nuance. If two systems are at the Filtration level, for instance, Ventura's model says nothing useful about the distinction between them. Ventura's model also cannot show that filtration (or any other process) is done well. As we saw in Section 2.4.4, further process evaluations can be done by placing a system in the Apprentice Framework (Negrete-Yankelevich and Morales-Zaragoza, 2014), or asking experts to analyze a diagram of the system's responsibilities (Colton et al., 2014).

Ventura's model and the FACE model are developed a priori from theory, and SPECS is based on a linguistic analysis of what humans associate with creativity. We do not know of an evaluation technique that comes at process from a psychological perspective—that is, one that systematically compares a process to the human creative processes described in Section 2.4.2. Although a creative computer need not exactly resemble a human, such a technique would still be an extremely interesting contribution. Care must be taken to match the theory underlying a system to the theory underlying its evaluation. For instance, if one's goal is to make a system as autonomous as possible, one may not be interested in Ventura's model, which deals only indirectly with autonomy.
Some process evaluations can, with a bit of tweaking, be used to distinguish creative from uncreative processes. For example, in a hierarchical model like Ventura’s or the Apprentice Framework, a researcher could state that the boundary between creative and uncreative processes lies along one of the boundaries between levels of the model. However, the best use of process theories may not be a binary evaluation of a system as creative or not; most forms of process theory do not work that way. Instead, process-based evaluation forms a nuanced picture of what a system does and does not do for itself.
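To illustrate why an ordinal placement of this kind is easy to compare but coarse, here is a toy encoding of Ventura's seven levels; the encoding and the example placements are our own invention, not part of Ventura's proposal.

    # Toy ordinal encoding of Ventura's levels; the example placements are invented.
    from enum import IntEnum

    class VenturaLevel(IntEnum):
        RANDOMIZATION = 1
        PLAGIARIZATION = 2
        MEMORIZATION = 3
        GENERALIZATION = 4
        FILTRATION = 5
        INCEPTION = 6
        CREATION = 7

    system_a = VenturaLevel.FILTRATION  # hypothetical system with a crude fitness function
    system_b = VenturaLevel.FILTRATION  # hypothetical system with a much richer evaluator

    print(system_a < VenturaLevel.INCEPTION)  # True: ordinal comparisons are trivial
    print(system_a == system_b)               # True: the scale cannot separate the two systems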

2.5 Product perspective

Rather than Person or Process (or Press), many evaluations in computational creativity focus on Product. Even a system with a very sophisticated process must at some point produce a creative artifact. Intuitively, if a system is creative, we would expect it to produce some artifact that is somehow “good”, interesting, or otherwise demonstrative of the system’s competence at creation. From some perspectives this is less important; for example, a cognitive scientist trying to simulate the creativity of everyday humans will be less interested in the quality of the product, and more interested in whether the system’s processes are psychologically sound (Pérez y Pérez, 2018). But for researchers with a more engineering-oriented approach, and for the public at large, product is undeniably important. We can evaluate the product itself as creative or uncreative by assessing its traits.

Note that, by “artifact”, we do not only mean artworks: a mathematical theorem, scientific hypothesis, business plan, or engineering design is an artifact just as poems, visual artworks, and pieces of music are. Even a way of adapting to the environment is arguably a creative artifact (Guckelsberger et al., 2017; Aguilar and Pérez y Pérez, 2014). However, it is assumed in the Product perspective that all creative systems, in some way, produce something. The Product perspective is especially useful for systems that have the goal of producing something appealing to humans.

2.5.1 Novelty and value

A very common pair of creativity criteria is novelty and value. That is to say, if and only if a product is both novel and valuable, it is considered creative. This is the leading definition of creativity among philosophers and psychologists, and has a long history (Gaut, 2010); it was introduced to computational creativity by Boden (1990). Sometimes, synonyms for novelty and value are used: “originality” and “quality” or “usefulness”, for example. Value is particularly prone to being represented by various synonyms, such as “appropriate”, “significant”, and “adaptive” (Jordanous, 2018), though the appropriateness of “usefulness” as a synonym for value may vary based on the domain (Cropley and Cropley, 2008). Regardless, definitions of creativity are typically considered equivalent to novelty and value if they boil down to the combination of two traits, which specify that the product should be somehow new and also somehow good.

Jordanous (2016a) points out that novelty and value can be applied to each of the four Ps: a system can employ a novel process, interact with the press in novel ways, or exhibit novel personal traits related to creativity. However, we will focus here on novel and valuable products. We will look at both concepts in depth before discussing how they have been brought together in practice.

2.5.1.1 Novelty

Novelty has several definitions. Boden (1990) distinguishes between P-creativity—that which is novel because its creator has not thought of it before—and H-creativity—that which is novel because no one has thought of it before. Boden also raises the issue of “mere novelty”—products which are novel, but in a trivial or uninteresting way. Boden suggests the use of surprise in addition to novelty: an idea is more surprising when it requires a more fundamental change to conceptual space. Note that this definition, like others, privileges transformational over exploratory creativity. Simonton (2011) similarly deals with mere

novelty by citing U.S. patent law: a product is creative when it is novel, valuable, and non-obvious (based on pre-existing domain-specific knowledge). The use of surprise or non-obviousness as a factor independent from novelty is supported by psychological studies (Acar et al., 2017).

Along with the distinction between P-creativity and H-creativity comes the finer-grained model of the Four Cs, developed by Kaufman and Beghetto (2009). This model splits creativity hierarchically into Big-C, Pro-C, little-c, and mini-c creativities. Big-C creativity is the H-creative work of master creators who are eminent in their field. Pro-C creativity is the work of creative professionals which is not historically significant, yet is successful enough to provide for a creative career. Little-c creativity is the work of ordinary people, inventively solving problems in daily life or producing creative artifacts as a hobby. Mini-c, finally, refers to the creative work of children. Each of these forms of creativity may be evaluated in a different way. Note that some of the four Cs imply something at work in the Press perspective: Big-C creative people are identified by their fame and influence, and Pro-C creativity depends on enough people buying a creative product to provide the creator with income. Without a Press to influence and reward them, all creative adults would be little-c creative. The four Cs also depend on an implicit Person perspective, since they are used to describe a creative agent and their entire present career (at least in a specific domain) rather than a single product or single instance of process.

Sternberg (2017) defines three types of creativity in terms of their type of novelty, or ‘defiance’: according to Sternberg, creativity can be self-defying, other-defying, or Zeitgeist-defying. An act of Zeitgeist-defying creativity is one that fundamentally changes the unconscious assumptions of the creative person’s surrounding culture (which, of course, is a Press-based measure). Zeitgeist-defying creativity is therefore a special case of H-creativity, which is similar to Big-C creativity, or to the domain-changing creativity in Csikszentmihalyi’s Press-based theories, which we will discuss in Section 2.6. The three types are also not fully independent; intuitively, since self and others are subject to the Zeitgeist, to defy the Zeitgeist really requires the creative person to defy all three.

Bartel (1985) objects to defining novelty as the transformation of conceptual space. He points out that one can imagine conceptual spaces being changed in trivial, random, or uninteresting ways. Bartel discusses the distinction between the terms “unique”, “different”, and “original”, and proposes a definition of original works as those which are an origin. That is, a work is original if it is the first to display some unique or different attribute which is then adopted by other works. (Again, this implies that Product-based novelty is dependent on actions taken by the Press.) Merely novel works are not copied because the first such work already exhausts its own interesting possibilities. Note that, while this definition suggests H-creativity, it is not difficult to adapt to P-creativity; artists can and do copy themselves. Dasgupta (2011) defines A-creativity (standing for antecedent, any H-creative product which has never been seen before) and C-creativity (standing for consequence, an H-creative product which influences others).
While A-creativity can be assessed immediately, assessing C-creativity requires a Press-based historical perspective.

2.5.1.2 Value

The idea of value comes with questions of its own. Who decides if a work is valuable or not, and on what grounds? An obvious answer is that the wider culture, the Press, will decide. But it is easy to think of creative people (e.g., van Gogh or Mendel) who were ignored in their day, and assigned great cultural value posthumously. Does this mean that they became creative posthumously? Some artists have multiple phases of greater or lesser popularity after death. It is difficult to imagine that changes in the dead artist’s actual creativity are behind these shifts (Weisberg, 2015). But Csikszentmihalyi (1996) argues that a

person’s creativity can and does change after their death—because Csikszentmihalyi views creativity as a Press phenomenon, residing not in the creative person but in their interactions with the wider culture, which naturally changes over time, even when the creative person is dead.

Dorin and Korb (2012) also bring up the example of cultural biases. Women’s art has historically been overlooked due to systemic sexism. Does this mean that women’s art is actually less valuable? Less creative?

Philosophers such as Gaut (2010) bring up the idea of “negative value”. For example, imagine a terrorist who comes up with a new, unusual, and effective way of carrying out attacks. Is the terrorist creative? If so, who is assigning value to the terrorist’s work? One solution is to deny that destructive acts are ever creative; another is to define them as valuable if they effectively serve the creator’s goals. Cropley et al. (2008), who study this type of malevolent creativity, define acts as “subjectively benevolent” if they are beneficial to one person or group at the expense of another. In other words, a creative terrorist attack is assigned value by the group of other terrorists sharing the same political aims, even if it is not valuable to society at large. (Fake news, creative ways of cheating on an exam, creative phishing schemes, and many other examples of malevolent creativity can be evaluated in a similar way.) “Value” is sometimes phrased as “usefulness” or “appropriateness to the task”, which accords with the intuition that even negative or pointless tasks could be done creatively (Kaufman and Baer, 2012).

Weisberg (2015) argues that value should not be a criterion for creativity. In addition to artists becoming more creative after their deaths, a changeable definition of “value” means that it is difficult to track the validity of creativity research. One generation might consider Group A’s work valuable, and Group B’s work not valuable. A second generation might reverse these judgments, considering only Group B’s work valuable. The previous generation’s creativity research therefore becomes invalid, unless it can be shown that the conclusions drawn about Group A also apply to Group B. Weisberg suggests defining creativity as “intentional novelty”. Harrington (2018) objects to Weisberg’s argument, stating that removing value from the definition of creativity does not reflect the typical goals of creators or audiences in regards to creative work. Also, removing value does not remove the problem of cultural or historical specificity: the way in which humans judge novelty, and the areas in which humans especially value novelty, may also change over time.

Bown (2012) also argues against the use of value, citing processes like plate tectonics, which produce novel artifacts without a goal. Bown calls these processes “generative creativity”. Humans arguably engage in generative creativity at times, such as when brainstorming. Psychologists also identify processes such as daydreaming, as well as ignorant mistakes, as involving novelty without value, sometimes referring to these processes with names like “pseudocreativity” or “quasicreativity” (Cropley and Cropley, 2008). For some applications, such as a system that engages in brainstorming, this kind of quasicreativity, judged by novelty alone, may be appropriate. However, our view is that most applications do require value.
A work of art or science is judged by its value when created by humans—so a computer system working in these domains should be held to the same standard.

2.5.1.3 Typicality and Ritchie’s Model

Ritchie (2007) argues that typicality should be used rather than novelty. His argument is that, while novel output of some type by a computer is conceivable, the computer would first need to generate products that are actually of the given type. A novel string of incomprehensible letters would be easy to produce, but if that was all the computer produced, it would be difficult to argue that it is successfully writing poetry. Therefore, creative systems should reliably generate both typical and valuable output. In Ritchie’s view,

the most useful way to measure this would be to look at statistical properties of a system’s output over time, and to compare them to the properties of an inspiring set—a set of exemplars the system uses to define its task. Ritchie lists many criteria that one could test for based on these statistical properties. For example, one might wish for the average typicality to be within a certain range, or for a certain proportion of the output to be above a given threshold for value. Which of these criteria, and what ranges and threshold values, are appropriate for a given system is an unsolved question. While typicality might appear to be the opposite of novelty, Ritchie’s framing of them as separate criteria is supported by experiments such as those of Hekkert et al. (2003), who find that novelty and typicality in design both separately contribute to human preference.

Ritchie’s model has fallen out of favour in recent years (Jordanous, 2012b), but it has been used to evaluate computational systems. Jordanous (2012b) found that Ritchie’s model was cumbersome to implement, because of the need to give ratings to a large number of outputs. Its results are abstract and need to be rephrased to be useful to the researcher. Pereira et al. (2005) report that appropriate threshold values are usually unknown, and the criteria involving threshold values are very sensitive to changes in these values. Poorly chosen threshold values can invalidate other criteria by, for example, creating a scenario in which there are no valuable but atypical items. Defining typicality is challenging in practice, as it can be based either on content or structure; it is unclear what to do with, for example, an unoriginal artifact with structural errors.

Ritchie (2007) is ambivalent about attempts to use his criteria, stating that many of them are implemented incorrectly, particularly in the inspiring set. Many studies either choose an inappropriate inspiring set, or invalidate many composite criteria by basing typicality scores on similarity to the inspiring set, so that the typicality of generated work and the typicality of the inspiring set cannot be compared. RASTER, a thought experiment meant to demonstrate the insufficiency of Ritchie’s criteria, similarly renders the criteria involving an inspiring set inapplicable (Ventura, 2008). Pereira et al. (2005) found that this invalidation occurs when there are no atypical items in the inspiring set. RASTER also defines typicality and quality identically, which invalidates even more criteria (Ventura, 2008).

When not partially invalidated in this way, there may still be some value to Ritchie’s model, and to the idea of typicality as a prerequisite to meaningful creativity. Other evaluations, often by authors who do not use the word “typicality” nor Ritchie’s mathematical criteria, implicitly contain the idea of typicality as necessary for creativity. For example, some music systems are evaluated by their adherence to music theory (Sturm and Ben-Tal, 2017; Lattner et al., 2018; Opolka et al., 2015; Scirea et al., 2015) or by the accuracy of machine learning at classifying them as a part of the appropriate genre (Videira et al., 2017). Certainly, it is intuitive that a creative system’s output should, in some way, be appropriate to its intended domain.
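As a rough illustration of the kind of statistical summaries Ritchie’s criteria involve, the sketch below assumes each output and each inspiring-set item has already been given typicality and value scores in [0, 1]; several of his criteria then reduce to means and proportions over those scores. This is our simplified sketch, not Ritchie’s formal definitions, and the threshold values are arbitrary.

```python
# Simplified sketch of Ritchie-style summary statistics (an illustration,
# not Ritchie's exact formalism). Scores are assumed to lie in [0, 1].

def mean(xs):
    return sum(xs) / len(xs)

def proportion_above(xs, threshold):
    return sum(1 for x in xs if x > threshold) / len(xs)

def ritchie_style_report(outputs, inspiring_set, typ_threshold=0.5, val_threshold=0.5):
    """outputs / inspiring_set: lists of (typicality, value) pairs."""
    out_typ = [t for t, _ in outputs]
    out_val = [v for _, v in outputs]
    insp_typ = [t for t, _ in inspiring_set]
    return {
        # e.g. "average typicality within a certain range"
        "mean_typicality": mean(out_typ),
        # e.g. "a certain proportion of the output above a value threshold"
        "proportion_valuable": proportion_above(out_val, val_threshold),
        # valuable-but-atypical items: the case Pereira et al. warn can vanish
        # under poorly chosen thresholds
        "valuable_but_atypical": proportion_above(
            [v for t, v in outputs if t <= typ_threshold] or [0.0], val_threshold),
        # comparison with the inspiring set's own typicality
        "mean_typicality_vs_inspiring_set": mean(out_typ) - mean(insp_typ),
    }

if __name__ == "__main__":
    outputs = [(0.8, 0.6), (0.4, 0.7), (0.9, 0.3)]
    inspiring = [(0.9, 0.8), (0.85, 0.9)]
    print(ritchie_style_report(outputs, inspiring))
```

Even this toy version makes the practical complaints above visible: the report is only as meaningful as the ratings and thresholds fed into it.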

2.5.1.4 Implementing novelty and value

If novelty and value are used to evaluate a system, then a specific procedure for their evaluation is desirable. It is especially desirable if novelty and value can be calculated by the computer itself—for use in the evaluation portion of a generate-evaluate loop (see Section 2.4.2)—although a computer system should not be judged only by its own self-assessment. Pease et al. (2001) describe a number of possible methods for calculating novelty and value. For novelty, this includes difference relative to an inspiring set, level of transformation of conceptual space, relative

complexity, membership in a fuzzy set based on an archetype, Bayesian improbability, and novelty perceived subjectively by humans. For value, it includes the emotional response of humans, or the degree to which the product achieves the creator’s goal.

França et al. (2016) define novelty as Bayesian surprise, and value as synergy—a metric based on expert judgments of the value of pairs of components. Bhattacharjya (2016) defines quality in terms of a preference model which mathematically combines different aspects of a subjective judgment. Elgammal and Saleh (2015) measure novelty in art history by creating a directed graph of influences based on semantic and perceptual traits—recalling Bartel’s definition of originality as the creation of an origin—and found good correspondence between their system’s behaviour and the opinions of art historians. Shrivastava et al. (2017) measure “influence” in film in a similar way, creating their graph based on features learned through applying word2vec and Principal Component Analysis to plot summaries and other textual information about the film. Shrivastava et al. do not equate novelty with influence; instead, their scoring system combines “novelty” (the presence of traits that haven’t been seen before), “influence,” and “unexpectedness” (the presence of traits that, based on a predictive model, were not likely to occur). Note that unexpectedness, in this definition, is distinct from novelty; a trait that has been used before can be used unexpectedly by using it in a time or context in which it is not typical. By combining novelty, influence, and unexpectedness, Shrivastava et al. generate ratings for films that correlate well with expert film ratings on Rotten Tomatoes and IMDB.

Novelty and value can also be based directly on human opinion. Some researchers in this vein (Llano et al., 2014; Riedl and Young, 2006; Veale, 2015) have found that human assessments of novelty or surprise are negatively correlated with most other desired criteria. Indeed, in Llano et al.’s study, random statements are judged more surprising than either human or computational outputs, yet do poorly on every other criterion. Such “mere novelty” suggests that Ritchie was correct to require typicality as a prerequisite for novelty, but these findings are inconsistently reproduced. Some researchers supplement novelty and value with surprise—Boden’s (1990) word for meaningful, significant novelty. Mckeown and Jordanous (2018) analyze the relationship between novelty, randomness, and creativity in narrative generation. They find that relaxing constraints leads to more novel output, which is rated by humans as more creative, up to a point—but that, if constraints are relaxed completely (to the point of violating human knowledge about how the world works), the resulting narratives are judged as much less creative.
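One of the simplest of the options listed above, novelty as difference relative to an inspiring set, can be sketched as a nearest-neighbour distance over feature vectors. The sketch below is a generic illustration under our own assumptions (a bag-of-words representation and cosine distance); it is not the specific metric of any of the systems cited above.

```python
import math

# Toy sketch: novelty as distance from the nearest inspiring-set item,
# using bag-of-words cosine similarity as an (assumed) feature representation.
# This illustrates the general idea only; the cited systems use richer features.

def bag_of_words(text):
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def cosine_similarity(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def novelty(candidate, inspiring_set):
    """1 minus similarity to the most similar inspiring-set item."""
    vec = bag_of_words(candidate)
    best = max(cosine_similarity(vec, bag_of_words(item)) for item in inspiring_set)
    return 1.0 - best

if __name__ == "__main__":
    inspiring = ["the moon rises over the river",
                 "quiet night on the water"]
    print(novelty("the moon rises over the river", inspiring))          # 0: not novel
    print(novelty("volcanoes argue with the tax office", inspiring))    # closer to 1
```

A score like this is exactly the sort of self-assessment that could feed a generate-evaluate loop, with the caveat noted above that a system should not be judged only by its own scores.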

2.5.1.5 Combining novelty and value

It is also worth asking whether novelty and value, when used together, are of equal importance, and how they should be combined. In Section 2.5.1.2 we have already seen theoretical justifications for why some researchers dispense with value and use novelty alone. Other researchers privilege novelty over value, while still requiring at least minimal value to exist. Smith and Smith (2017), for instance, suggest a “1.5 criterion” model: a creative product must be definitely novel, but it need not be definitely valuable, only potentially valuable. That is to say, a creative idea needs to make sense, and novel ideas that are obviously wrong are not creative; but there is no need to fully implement and test an idea, to see if it “really” works in a valuable way in practice, before deeming it a creative idea. This, of course, implies a Press perspective in which some group of humans or other agents use their intuition to judge an idea’s potential value.

Acar et al.’s (2017) study suggests that novelty is more important; ratings of novelty explain far more of the variance in ratings of creativity than ratings of value, aesthetics, or surprise. On divergent thinking tasks, novelty and value often negatively correlate. Diedrich et al. (2015) found that, on such tasks, human judges’ ratings of creativity correlated with novelty but not usefulness. Of course, this result may be specific to divergent thinking tasks, in which generating novel ideas without much evaluation of their usefulness is the whole point. Smith and Smith’s (2017) 1.5 criterion model, as well as Acar et al.’s, also draws largely from the results of divergent thinking tasks.

On the other hand, it appears to be common for some researchers to evaluate their systems only by the value of the output and not by its novelty. Jordanous (2018), surveying recent computational creativity papers, finds that evaluating in this way is more common than using both novelty and value, yet that researchers often still refer to themselves as evaluating the system’s creativity. Unless the researchers have a strong theoretical justification for this, it is to be avoided.

2.5.2 Other criteria

Besides simple novelty and value, other product-based sets of criteria exist. Cropley and Cropley (2005) define a hierarchical model specifically for “functional creativity”—the kind used in engineering. Their criteria are Effectiveness (the product, at a basic level, solves the problem it was intended to solve), Novelty, Elegance (the product is pleasing, or goes above and beyond mere correctness, such as by being more cost-effective than previous solutions), and Generalizability (the solution can easily be applied to additional problems). Cropley and Cropley’s model requires previous criteria in their list to be met before the next becomes relevant. Without novelty, an engineering product is not creative; but if a product is not effective, its novelty does not matter to an engineer. The remaining criteria of elegance and generalizability define higher levels of functional creativity. Cropley and Cropley later expand their model into one that they claim is domain general, changing Generalizability to Genesis and adding various sub-criteria. Several similar models, starting with novelty and value and adding other criteria, also exist in the psychological literature (Cropley and Cropley, 2008).

Domain general sets of criteria without novelty and value have been proposed. A popular set is Imagination, Appreciation, and Skill, the Creative Tripod (Colton, 2008). However, subtleties in the thinking behind the Tripod mean it fits best with the Press perspective, and will be discussed in Section 2.6.1. Lehman and Stanley (2012) evaluate products by their Impressiveness: products are impressive when they are easy to appreciate, but difficult to recreate.

Some Product-based criteria for creative systems are entirely domain-specific, such as Potash et al.’s (2016) proposed criteria for ghostwriting song lyrics: stylistic similarity to a source text (which is arguably a form of Typicality), combined with novel content; or Gervás’s evaluation of plot generation systems by their adherence to Vladimir Propp’s Morphology of the Folktale (Gervás, 2017). Criteria can also be specific to a generation method, such as cross-entropy and other statistical criteria that are used to test the output of many neural networks (Sturm and Ben-Tal, 2017; Wang et al., 2016a; Yan, 2016; Zhang and Lapata, 2014). Care should be taken with such methods, as some can fall afoul of the problem Jordanous (2018) points out, measuring only the output’s quality and not its creativity as such. Of course, that problem can be avoided if both a domain-specific quality criterion and a measure of novelty are used.

An extremely common technique is for researchers to use ad hoc domain-specific product criteria. The rationale for domain-specific criteria will be discussed in Section 2.7, and some examples given in Section

2.8.1. Unlike the domain-specific criteria discussed above, ad hoc criteria are not clearly grounded in a previously developed theory or in experimental evidence. This means that ad hoc criteria are not ideal from a validity perspective, but the intuition behind them is sound. A given system may have the goal of producing work with certain traits, such as an elegant theorem, a poem fitting a given form, or a humorous Internet meme. Criteria testing whether these goals have been met can easily be added to a more general test of creativity, or, if using novelty and value, criteria like these might be used to test value.
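As a small illustration of how a hierarchical model like Cropley and Cropley’s functional-creativity model, described earlier in this subsection, could be operationalized, the sketch below stops at the first unmet criterion, so that later criteria are only consulted once the earlier ones are satisfied. This is our own simplification, not their published instrument, and the boolean judgments are assumed to come from human raters.

```python
# Toy sketch of hierarchical gating in the spirit of Cropley and Cropley's
# functional-creativity model (a simplification, not their instrument).
# Each criterion is a boolean judgment supplied from outside (e.g. by raters).

CRITERIA = ["effectiveness", "novelty", "elegance", "generalizability"]

def functional_creativity_level(judgments):
    """Return the highest level reached, stopping at the first unmet criterion.

    judgments: dict mapping criterion name -> bool.
    An ineffective product scores 0 regardless of its novelty, matching the
    idea that novelty does not matter to an engineer if the product fails.
    """
    level = 0
    for criterion in CRITERIA:
        if judgments.get(criterion, False):
            level += 1
        else:
            break
    return level, CRITERIA[:level]

if __name__ == "__main__":
    print(functional_creativity_level(
        {"effectiveness": True, "novelty": True, "elegance": False}))
    # -> (2, ['effectiveness', 'novelty'])
```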

2.5.3 The modified Turing test

A simple product evaluation without criteria is the modified Turing test: human subjects are given human-created and computer-created products, and are challenged to figure out which is which. If they cannot do it, then the computer system is claimed to be creative. It is important to note that this is a modified Turing test. The original Turing test, for general intelligence, involves judges who can question the system however they like, while the modified test involves only static, finished products. Thus, it could not be said that a system which passed such an evaluation was intelligent, only that its products were indistinguishable from human products. However, the modified Turing test is frequently used in computational creativity (Schwartz and Laird, 2015; Elgammal et al., 2017; Pearce and Wiggins, 2001; Agirrezabal et al., 2013; Loller-Andersen and Gambäck, 2018). Some researchers also refer to their evaluations as modified Turing tests when they are not strictly a test of distinguishability: for example, when both human and computer products are rated on their enjoyability by an audience that doesn’t know which ones are computer products (Videira et al., 2017). Or these tests can be combined with a distinguishability test (Collins and Laney, 2017; Pearce and Wiggins, 2007; Rashel and Manurung, 2014a).

Pease and Colton (2011) criticize the modified Turing test on several grounds. First, the test encourages pastiche and superficial imitation of human work. Second, the lack of interaction between judge and creator calls into question the modified test’s validity. Third, the test limits access to framing information, such as how and why the product was created, which Pease and Colton consider an important part of evaluation. Finally, according to Pease and Colton, no current system can pass a modified Turing test, so evaluations are needed which can identify the strengths and weaknesses of systems even if they are only partially creative.

Other authors have also criticized the modified Turing test. Sturm and Ben-Tal (2017) say that the idea of computers trying to fool humans “trivializes” computational creativity. Fooling other people into mistaking them for a “real” artist is not generally the goal of creative humans. Additionally, it is not at all clear how to choose the judges for a modified Turing test, the human works against which the system’s products are tested, or the sample of the system’s products which are included in the test. Small changes in any of these parameters could drastically affect the test’s results. Similarly, Clements (2016) argues that some of the most interesting computer-generated creative products are recognizable as a computer’s products, using a computer’s strengths to create something in a computer-like way which is still nonetheless valuable to humans. Such products would be creative and desirable, but would fail a Turing test.

We have some quibbles with some of these arguments: for instance, pastiche may be the goal of some systems. However, we acknowledge that a modified Turing test is not appropriate unless the system is meant to create work which closely resembles existing work by humans.
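A minimal way to analyze a distinguishability-style modified Turing test is to check whether judges identify the computer’s products more often than chance. The sketch below uses an exact binomial test on a hypothetical set of judgments; it is our illustration, since the studies cited above vary in their exact designs and analyses.

```python
from math import comb

# Sketch of analyzing a modified Turing test as a binomial experiment
# (an illustration; the cited studies vary in their exact designs).
# Each trial: a judge labels one item as human-made or computer-made.

def binomial_p_value(correct, trials, chance=0.5):
    """One-sided exact test: P(X >= correct) if judges guess at `chance` rate."""
    return sum(comb(trials, k) * chance**k * (1 - chance)**(trials - k)
               for k in range(correct, trials + 1))

if __name__ == "__main__":
    # Hypothetical results: 60 judgments, 33 correct identifications.
    p = binomial_p_value(correct=33, trials=60)
    print(f"p = {p:.3f}")
    # A large p-value means judges did no better than chance, i.e. the
    # computer's products were not reliably distinguished from human ones.
```

Note that "judges performed at chance" is the strongest conclusion such an analysis supports; as argued above, it says nothing about intelligence or creativity as such.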

2.5.4 Consensual assessment

Another criteria-free Product evaluation has a much stronger pedigree. This is the Consensual Assessment Technique (CAT) developed by Amabile (1983b). The idea in consensual assessment is that accurate judgment of creativity is not formulaic but relies on the expertise of people who are recognized as creative in a domain. In a consensual assessment, human subjects create something—for example, a collage. The subjects should be matched for their level of experience with the chosen medium. A team of experts in the relevant field then looks at the products, and each expert rates how creative each product is. An important point is that the experts are not given criteria such as novelty and value, or other instructions in how to make their judgments, except that they must use the full scale. They also must make the judgments on their own, rather than consulting with each other. If most of the experts agree (that is, if they have good interrater reliability), the consensual assessment is successful.

Even though the judges may be making different kinds of assessment, the CAT has good interrater reliability in practice as long as the expertise of the judges is high enough (Kaufman and Baer, 2012). However, finding judges with such expertise can be difficult. Our own experiments with the CAT, described in Section 4.2, suggest that for computational creativity there are two major roadblocks to administering the CAT. A highly experimental and diverse domain such as computational art (or in our case, poetry) may not have a stable group of experts to evaluate it, and products made by wildly different computational processes may be difficult even for experts to compare. However, in stable and well-defined categories of computational product, such as Wiggins’ (2006) experiments with chorale melodies, the CAT remains a sound methodology.

The use of experts regardless of their preferred method of judgment accords with how creative artifacts are evaluated in the real world, by critics, editors, peer reviewers, gallery owners, prize committees, etc. (Kaufman and Baer, 2012)—though, of course, jurors, critics, and editors do talk to each other in the process of judging. Like the modified Turing test, a computer cannot use the CAT to evaluate its own work, since it requires a group of expert assessors.
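Interrater reliability in a CAT-style study is often summarized with a coefficient such as Cronbach’s alpha, treating the judges as the “items”. The sketch below computes alpha directly from its standard formula on made-up ratings; it is our illustration of one common choice, not a claim about which coefficient any particular study used.

```python
import statistics

# Sketch of interrater reliability for CAT-style ratings using Cronbach's
# alpha, treating each judge as an "item" (one common choice; studies also
# report other coefficients). The ratings below are made up for illustration.

def cronbach_alpha(ratings_by_judge):
    """ratings_by_judge: list of lists; ratings_by_judge[j][p] is judge j's
    rating of product p. All judges must rate all products."""
    k = len(ratings_by_judge)
    judge_variances = [statistics.variance(judge) for judge in ratings_by_judge]
    totals = [sum(judge[p] for judge in ratings_by_judge)
              for p in range(len(ratings_by_judge[0]))]
    total_variance = statistics.variance(totals)
    return (k / (k - 1)) * (1 - sum(judge_variances) / total_variance)

if __name__ == "__main__":
    # Three judges rating five products on a 1-7 creativity scale.
    ratings = [[6, 2, 5, 3, 7],
               [5, 3, 5, 2, 6],
               [6, 2, 4, 3, 7]]
    print(f"alpha = {cronbach_alpha(ratings):.2f}")  # high agreement -> alpha near 1
```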

2.5.5 Product evaluation, mk. II: Computational aesthetics

In the arts, especially the visual arts, many researchers have attempted to create domain-specific criteria with a deeper perceptual grounding. While being aesthetically pleasing may not be equivalent to being creative, it is still useful to use well-grounded aesthetic measurements as part of a judgment of value or similar components of creativity. This leads us to the field of computational and psychological aesthetics.

Galanter (2012) summarizes computational aesthetic research in visual art. Many of the aesthetic rules taught to humans in art schools, such as color theory, are difficult to encode in a way that matches human perception. However, many perceptual rules exist which are mathematical in nature, and can therefore be calculated. These include complexity, Zipf’s law, JPEG compressibility, and prototypicality. Many of these do correlate at least somewhat with human preferences. Alternatively, a neural net can be trained to evaluate art without explicit rules. This need not be a binary classification. Systems such as DARCI learn to recognize many subjective qualities in artwork, such as “happy”, “fiery”, or “lonely”, based on humans’ descriptions (Norton et al., 2010).

How humans process visual art is a challenging field of study for human psychologists. Human responses to art can be highly individual (Dubnov et al., 2016; Juslin et al., 2016) and affected by context (Leder

and Nadal, 2014). Art processing involves the entire brain, rather than a specific art processing region (Leder and Nadal, 2014). Still, psychologists are at work on theories outlining how this processing occurs. One of the theories with the most explanatory power is the theory of representational fit (Sammartino and Palmer, 2012). In this theory, images are preferred when they transparently reflect the work’s intended meaning. In simple cases, this means humans will prefer images that are easy to process; but representational fit also explains complex and difficult artworks, such as ambiguous photographs. If ambiguity is the best way to make a point, and the audience understands this point, then they will appreciate the ambiguous artwork. Alternately, Silvia (2005) suggests an emotional appraisal theory of aesthetics. If a human appraises an art object as novel and not understood, but believes themself capable of understanding it, they will take interest in the art. Appraisal theory explains why novices prefer art that is easier to process, but also why art experts prefer more difficult work. Leder et al. (2004) combine these and other theories into a model of the stages of aesthetic judgment. In Leder et al.’s model, the ultimate goal of looking at art is “cognitive mastery”, in which the audience feels they have figured out what the art means and what they think of it; different people will seek to achieve this mastery in different ways.

Art appraisal is also influenced by context. An artwork’s price, title, setting, the artist’s name, reported approval or disapproval of the artwork by other social groups, or even the viewer’s current physiological state can influence human processing of art (Lauring et al., 2016). Therefore, a full accounting of human processing of visual art would also have to include contextual and bodily effects as well as social factors.

Similar work is available for other domains, often indicating that different domains have different aesthetic rules. Augustin et al. (2012), for example, find that humans value different qualities in visual art, film, and music. Jacobs (2015) identifies two distinct aesthetic processes that occur in literary writing: a “background” process which creates suspense or empathy through the situations that are described, and a “foreground” process which creates aesthetic appeal through the style in which they are described. To facilitate this type of research across different domains, Blijlevens et al. (2017) have developed a reliability- and validity-tested scale for measuring aesthetic pleasure without making assumptions about its cause.

None of these theories can yet be encoded precisely enough for a computer. However, a researcher without intensive domain training might do well to refer to these or other theories of aesthetics when setting qualitative goals for the characteristics of their system’s output.
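Some of the mathematical measures Galanter surveys, such as compressibility as a proxy for visual complexity, are straightforward to compute. The sketch below is our illustration using zlib compression on raw grayscale bytes rather than JPEG, which the cited work uses; it simply contrasts a flat image with a noisy one.

```python
import zlib
import random

# Sketch of a compressibility-based complexity measure in the spirit of the
# measures Galanter (2012) surveys. zlib on raw grayscale bytes is used as a
# stand-in for JPEG compressibility.

def compression_ratio(pixels):
    """Lower ratio = more compressible = less complex image content."""
    raw = bytes(pixels)
    return len(zlib.compress(raw)) / len(raw)

if __name__ == "__main__":
    width, height = 64, 64
    flat = [128] * (width * height)                                  # uniform gray
    noise = [random.randrange(256) for _ in range(width * height)]   # white noise
    print(f"flat image:  {compression_ratio(flat):.3f}")   # near 0
    print(f"noisy image: {compression_ratio(noise):.3f}")  # near (or above) 1
```

As the text notes, measures like this correlate only somewhat with human preference; they are most useful as one component of a broader judgment of value.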

2.6 Press perspective

After a person uses a process to create a product, the Press—other people—then receive it. The Press perspective studies how this reception occurs, and what kind of social effect a product needs to have to be called creative. To some extent, a Press perspective is necessary to every other perspective: the act of judging people, processes, or products is always done by people in a cultural context. (Or, if it is done by a computer, that computer was programmed in a cultural context by people.) The pure Press perspective is especially useful for researchers who want their system to make an impact on society, influence human opinion, and be recognized for its work. In Rhodes’ (1961) formulation, Press refers not only to “the press”, as in cultural agents responding to a creative work, but to the general environment “pressing in” on the creative person. This includes social

responses to creative products and the social causes of their creation. The social environment influences the values and beliefs of a creative person, determines what forms of education and training are available, and can direct the form of creativity through incentives and commissions (Csikszentmihalyi, 1996). However, research from the Press perspective has increasingly focused on responses and judgments.

Many scholars argue that creativity cannot be studied independently of its context. There are weak and strong versions of this claim. The weak version, advanced by Boden (1990), is that certain criteria are contextual. For example, H-creativity is contextual since it depends on what other people did and didn’t do before, and value is subjective because it depends on what people find valuable. Nevertheless, we could objectively study value by, for example, making a computer model of the values of a certain group and testing how well new works fit the model.

The strong version, advocated by Csikszentmihalyi (1999)—also known as a systems perspective—situates creativity entirely outside of person, process, and product. Since all judgments are subjective, in the strong Press view, it is epistemically and perhaps ontologically impossible to separate creativity from people’s judgments of creativity. Therefore, creativity in practice is not separate from the people judging it. It is situated, not in the creator, but in an interaction between creator and audience. Even aspects such as skill and the ability to self-evaluate are in some sense internalizations of domain knowledge and field attitudes, which did not originate with that person (Csikszentmihalyi, 1996).

Csikszentmihalyi breaks down some factors involved in Press-based creativity. These include the individual, domain, and field. The domain is the cultural and symbolic aspect of creativity: for example, what works have been produced in this genre, and what are its conventions and rules. The field is the social aspect: fellow creators, editors, curators, and critics who serve as gatekeepers. The word “genre” is used here not only to denote artistic genres, but also branches of science, business, activism, or any other creative activity. Even abstract tests such as the Torrance Tests have a domain (questions about chairs, etc.) and field (the scientists designing and scoring the tests). To be successful, in Csikszentmihalyi’s view, a creative individual must change the way individuals in the field think, feel, or act (Csikszentmihalyi, 1999). This means that Csikszentmihalyi is primarily interested in Big-C, H-creativity, not in creativity’s everyday forms (Csikszentmihalyi, 1996).

As mentioned in Section 2.5.1.2, a field is not necessarily free from bias. Racism, sexism, and other prejudices on the part of the field could lead to creative work from certain groups being systemically underrepresented. Thus, such individuals could be “less creative” from the Press perspective, even if their Products, Processes, and Persons are as good as (or better than!) those of white men. For researchers such as Dorin and Korb (2012), this is a reason not to uncritically use the strong Press definition of creativity. However, if the strong Press definition is taken literally, then to call marginalized creators “less creative” is not a value judgment, but a factual description of what occurs when they attempt to interact with a prejudiced field.
The question of who performs the evaluation—of who constitutes the field—leads into the cross-cultural study of creativity. A review of cross-cultural studies by Lubart (1999) finds that not all human cultures agree on what constitutes creativity. He states that Westerners focus on originality and the production of something new, while Eastern cultures focus on the expression of insight, growth, and inner truth. He also describes differences in where creativity is thought to be situated. In Bali, for example, musical groups can be distinct from one another, but individual musicians within a group may not be; creativity is therefore thought to occur only at the group level. Blijlevens et al. (2017) find that parts of their model of aesthetics are weighted differently by participants in different countries, but the cause of this is not clear.

For computational creativity, Press evaluation is a challenge since most people are not used to seeing computers as creative. Some researchers study how exactly these people respond to creative artifacts made by a computer, or effective methods for convincing them that the computer’s work is worth consideration. These issues of perception will be discussed more fully in Section 2.8.2.

2.6.1 The Creative Tripod

Some researchers design systems with an explicit persuasive element, with the goal of convincing humans that the system is creative. Colton (2008) leads this trend. His Creative Tripod appears to be Person-focused at first: it presents several traits that a creative system should have, namely Skill, Imagination, and Appreciation. Colton’s assertion is not that a creative system must possess these qualities, but that a creative system must appear to possess these qualities. Much of Colton’s research involves making machines more persuasive in convincing an audience that they have these creative qualities. This includes work on the framing of artifacts, taking advantage of contextual effects that can sway human judgment (Charnley et al., 2012).

Certain objections to the Creative Tripod have been raised. Colton et al. found that, to overcome skeptics’ objections to computational creativity, the original three criteria were not enough. They added the new criteria of Learning, Intentionality, Accountability, Innovation, Subjectivity, and Reflection (Colton et al., 2014). Not much justification is given for these specific additions, except a statement that they addressed the most common objections to characterizing a system as creative. The same research team later added the much better-defined criterion of authenticity: to be authentic, a creative system must not be perceived as misleading its audience or as trying to write about matters that are too far outside its experience (Colton et al., 2018).

Bown (2014) objects to the Tripod because the terms are not given clear definitions; therefore, they “cannot be distinguished from trivial pseudo-versions of themselves.” A system can be argued to possess imagination, for example, if the programmer put something into it which the programmer believes is related to imagination—which bypasses any falsifiable inquiry into whether this actually constitutes imagination. Many papers perform exactly this kind of trivial evaluation, and even obviously uncreative systems can appear to pass the Tripod with the right argument. For example, Ventura’s thought experiment, RASTER, generates the pixels of images at random, and outputs the images if a similar image can be found online. Ventura (2008) describes RASTER as meeting Tripod criteria: imagination because it engages in random search, appreciation because it uses a (simplistic) fitness function, and skill merely because it produces images. (Incidentally, we have found that researchers using the FACE model, which was developed by the same research team, also tend to make trivial or rhetorical arguments as to why their systems satisfy the model, e.g., Colton et al., 2012; Misztal and Indurkhya, 2014; Oliveira and Alves, 2016.)

Smith, Huntze, and Ventura (2014) give their own working definitions to the Tripod. Skill is the ability to produce something useful; imagination is the ability to search the conceptual space and produce something novel; appreciation is the ability to self-assess and produce something of worth. (Note the implied links between the Tripod and a Product perspective, as well as Boden’s work.) Jordanous (2016b) identifies Colton’s tripod with three aspects of the SPECS model: Skill with domain competence; Imagination with variety, divergence, and experimentation; and Appreciation with thinking and evaluation. Jordanous (2012b) argues that evaluating work based on the tripod requires process information about the system’s behavior over time.
Product, Person, and Process concepts frequently creep into

Press evaluation, because the humans reacting to a creative system are assumed to be using these concepts themselves. There is nothing wrong with the Creative Tripod criteria when they are defined in this way. However, inappropriate uses of the Tripod serve as a cautionary tale: for any criteria, Press or otherwise, if we have not specified just what we mean by each criterion, our evaluation becomes meaningless.

2.6.2 Impact on the domain and field

Given Csikszentmihalyi’s (1999) contention that creativity must change the domain or field, one might reasonably ask how to measure such a change. As with Process creativity, several theories exist which place creative achievements into rough categories based on their type or amount of impact on their domain. Some of these theories were discussed in Section 2.5.1.1, as they can be conceptualized as either Press or Product measures. H-novelty is inherently a Press measure in some sense, as it depends on what has happened in the domain previously. So, for example, Sternberg’s (2017) classification of types of creative defiance is both a Product-based theory of novelty and a Press-based theory of divergence between a product and its existing domain. Similarly, Bartel’s (1985) definition of originality as an origin is heavily entwined both with a Product view of novelty and a Press view of impact on the domain. If Bartel’s theory is accepted, then measurements like Elgammal and Saleh’s (2015) in visual art, or Shrivastava et al.’s (2017) in film, can be made to precisely determine the amount of influence a given work has on subsequent works in its domain. However, making these measurements requires a historical view, as it will take time for works influenced by a current work to appear.

Sternberg et al. (2001) have also developed the propulsive theory of creativity, in which a domain and field are construed as moving in one direction, and art is placed into one of eight categories based on how it relates to that direction. For example, art can move forward by introducing new elements and ideas, or stay in place in order to develop the most excellent examples of the current state of the art, or it can radically reimagine where the domain is supposed to be. Sternberg et al. note that one of these categories is not necessarily more creative than another (recall Policastro and Gardner’s similarly value-neutral categorization in Section 2.3). Some are, on average, more novel than others, but the category system says nothing about the work’s value. Value is, instead, to be determined by cultural impact and reception, though for some forms of creativity, when an artist is ahead of their time, that reception will be delayed. Some further attempts at measuring impact on a domain are discussed in Section 2.6.7.
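The influence-graph idea behind measurements like Elgammal and Saleh’s can be sketched very simply: connect each work to later works that resemble it, then count the outgoing edges. The sketch below is a toy construction under our own assumptions (hand-listed traits and a fixed overlap threshold), not their actual algorithm, which uses learned visual features.

```python
# Toy sketch of an influence graph in the spirit of Elgammal and Saleh (2015):
# an edge runs from an earlier work to a later work whose features are similar.
# Each work here is just (year, set of hand-listed traits), purely for illustration.

def influence_graph(works, min_overlap=2):
    """works: dict name -> (year, set of traits). Returns name -> influenced names."""
    edges = {name: [] for name in works}
    for a, (year_a, traits_a) in works.items():
        for b, (year_b, traits_b) in works.items():
            if year_a < year_b and len(traits_a & traits_b) >= min_overlap:
                edges[a].append(b)
    return edges

def influence_score(edges):
    """Crude influence measure: number of later works a work is connected to."""
    return {name: len(targets) for name, targets in edges.items()}

if __name__ == "__main__":
    works = {
        "A": (1900, {"flat colour", "bold outline", "urban scene"}),
        "B": (1910, {"flat colour", "bold outline", "portrait"}),
        "C": (1920, {"flat colour", "bold outline", "collage"}),
        "D": (1920, {"pointillism", "landscape"}),
    }
    print(influence_score(influence_graph(works)))  # A influences B and C
```

The historical caveat above applies directly: the graph can only be built once enough later works exist to reveal which traits were actually taken up.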

2.6.3 Measures of audience impact

Instead of measuring a work’s long-term impact on its domain, one can also try to measure its immediate impact on an audience. Several researchers are interested in measuring audience impact. These include Colton et al. (2011), who describe the interaction between a creator’s work and an idealized audience with the IDEA model: Iterative Development, Execution, Appreciation. Colton et al. describe development moving through stages based on how novel it is, from completely derivative work to humanlike work to work so novel that humans cannot comprehend it. Audience impact at any stage is measured with two variables: wellbeing (how much the audience likes the work) and cognitive effort (how prepared the audience is to spend time trying to understand it).

Like Ritchie’s criteria, these variables can be combined in a variety of ways. For example, a work with a high standard deviation in wellbeing would be considered “divisive”. IDEA is a descriptive model, and it is up to the researcher to decide which of the possible adjectives applied to their system are desirable.

Burns’ (2015) EVE’ model measures the mental processes of the audience in another way, defining creativity as surprise with meaning. In the EVE’ model, which has been applied to jokes, simple visual art, advertisements, and poetry, an expectation (E) is set up, then violated (V). The violation is accompanied by a new explanation (E’) which accounts for the unexpected events. This explanation may be overt, or may happen implicitly as the audience retrieves contextual information from long-term memory (Dubnov et al., 2016). If the audience is surprised by a work, and can make meaningful sense of it, they approve. In experiments, ratings of surprise multiplied by ratings of meaning accounted for 70% of the variability in ratings of creativity (Burns, 2015).

There is some theoretical support for the EVE’ model. As Dubnov, Burns, and Kiyoki (2016) point out, it is compatible with the appraisal theory discussed in Section 2.5.5. A surprising stimulus is appraised by an audience as novel and not understood, and the subsequent explanation causes the audience to appraise themselves as able to understand it. According to Jacobs (2015), surprise followed by explanation is what constitutes foregrounding in literature. In a longer work, oscillation between surprise and explanation is constant. However, it remains to be seen experimentally if the EVE’ model can be applied straightforwardly to creative works with longer, more complex content.

Depending on the domain, any of these methods may be appropriate for testing if a creative system is having the desired effect on its audience.
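Both models suggest simple numerical summaries of audience ratings. The sketch below, our illustration with hypothetical ratings on a 1–7 scale, computes an IDEA-style wellbeing mean and spread (a large spread suggesting a “divisive” work) and an EVE’-style surprise-times-meaning score; both models are of course richer than these two numbers.

```python
import statistics

# Sketch of audience-impact summaries suggested by the IDEA and EVE' models
# (an illustration only; the ratings below are hypothetical, on a 1-7 scale).

def idea_summary(wellbeing, effort):
    """Mean wellbeing, its spread (high spread suggests a 'divisive' work),
    and mean cognitive effort."""
    return {
        "mean_wellbeing": statistics.mean(wellbeing),
        "wellbeing_spread": statistics.stdev(wellbeing),
        "mean_effort": statistics.mean(effort),
    }

def eve_prime_score(surprise, meaning):
    """EVE'-style score: surprise multiplied by meaning, averaged over raters."""
    return statistics.mean(s * m for s, m in zip(surprise, meaning))

if __name__ == "__main__":
    wellbeing = [7, 1, 6, 2, 7, 1]       # polarized audience -> "divisive"
    effort = [5, 2, 6, 3, 5, 2]
    print(idea_summary(wellbeing, effort))
    print(eve_prime_score(surprise=[6, 5, 6], meaning=[5, 6, 4]))
```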

2.6.4 Interactive art

Bown (2014) suggests that computational creativity researchers should perform evaluation through the lens of interaction design. The designers of a creative system must consider how the audience will interact with their system and what effect they wish it to have. Fortunately, there is already a great deal of research on creative interaction design in the domain of interactive art, studied by human-computer interaction researchers, museologists, and others.

Candy and Bilda (2009) describe three types of audience engagement: immediate (catching attention), sustained (attending to the art for a period of time), and creative (having a lasting effect that somehow changes the audience). Similarly, Bollo and Dal Pollozo’s (2005) model of museum exhibits involves variables such as Attraction (the percentage of visitors who look at an exhibit) and Holding Power (the amount of time an average visitor spends looking at the exhibit). Edmonds et al. (2006) relate these traits to parts of a display. Attractors increase attraction, Sustainers increase holding power, and Relators encourage the visitor to keep thinking about the artwork and return later. HCI researchers also make use of qualitative, descriptive methods in assessing audience response. Her et al. (2014) review many of these methods.

All of this work is helpful for researchers crafting an interactive system. It is more difficult to apply theories from HCI to a system that creates something static, such as a painting or poem. It is even more difficult to apply them to a system that performs scientific, mathematical, or some other form of creativity in which the goal is to soberly present a theory that experts recognize as meaningful. However, if interaction is one of a researcher’s goals, then care should be taken to consult the literature on interaction which already exists.

2.6.5 Creativity support tools

Another application of interaction design is to co-creativity or creativity support tools. Rather than producing an artifact by themselves, these tools make it easier for a human to be creative. It is possible to evaluate a creativity support tool using any of the four perspectives, but the method that usually makes sense is to evaluate the quality of the user’s interaction with the tool. This is Press evaluation, not necessarily in the sense of a tool becoming culturally successful, but in the sense of evaluating the tool by evaluating the interactions and impacts that it has on its users. Evaluation of creativity support tools is well studied, and includes empirically validated rating scales measuring the extent to which a tool supports a particular creative goal (Carroll and Latulipe, 2009). Apart from these scales, evaluation of co-creativity tools can focus on usability, enjoyability (Kantosalo et al., 2015; Waller et al., 2009), usefulness to creative professionals (DiPaola et al., 2013; Kantosalo et al., 2015), or the quality of output created using the system (Lee et al., 2016; Shibata and Hori, 2002; Waller et al., 2009). Kantosalo and Toivonen (2016) also classify co-creativity systems as alternating (where the human and computer take turns modifying an artifact) or task-divided (where the human and computer are responsible for different subtasks). A researcher making a co-creative system should generally use existing and well-validated scales to evaluate such work.

2.6.6 Artificial social systems

In Section 2.4.2 we mentioned Glăveanu’s (2015) comment that the evaluation phase of the human creative process is based in perspective-taking. Now that we have seen Csikszentmihalyi’s theory, we can be clearer about the humans whose perspectives are important: they constitute the field. Several researchers have asked what happens if creative systems serve as each other’s field. They have created multi-agent systems, either of robots or of software modules, which influence and learn from each other’s work (Hantula and Linkola, 2018; Kirke and Miranda, 2013; Linkola et al., 2016; Saunders et al., 2010). Corneli et al. (2015) imagine how a computer could go through a writing workshop, in which drafts are critiqued by other computers trained on similar tasks. Systems also exist which simulate groups of musical performers improvising together (Eigenfeldt et al., 2017; Puerto and Thue, 2017). The precise method for evaluating the interactions between systems in such groups remains unclear.

2.6.7 Cultural success

A final method for Press evaluation is to publish the creative product as a human would publish theirs. In visual art, this means submitting the work to an art exhibition or gallery. It is not uncommon for HCI researchers creating interactive art to do exactly this. The artwork is judged a success based on the opinion of the audience, gallery admissions, statements of interest by curators, or whether the artist was invited to submit work to further exhibitions (DiPaola et al., 2013; Sheridan et al., 2005; Tresset and Deussen, 2014). Researchers in creative music systems, similarly, can evaluate their work by performing it in concert (Sturm and Ben-Tal, 2017).

An artwork’s success with these gatekeepers can be a useful definition of Press-based creativity; it naturalistically reflects the metrics most working human artists use to judge their own success. Nor is this a method restricted to art: domain-appropriate methods may exist for other domains. Scientific

advancements are evaluated through peer review, publication, and citation; entertainment for a general audience is evaluated by its commercial success; Internet memes are evaluated by how often they are shared.

Gatekeepers accepting interactive art for an exhibition generally know the role of the computer in the art, but it has happened that computer-generated works have achieved cultural acceptance from gatekeepers who do not know that they were made by computers. For example, computer-generated poems have been accepted for publication by magazine editors who believed they were by a human (Clements, 2016). Fooling a gatekeeper in this way is sometimes referred to as “passing the Turing test”, though this is imprecise: it has even less to do with the methodology of the original Turing test than the modified Turing tests mentioned in Section 2.5.3. No experimental constraints are in place to test how often and to what extent the systems in these cases are capable of fooling people, and, as explored in Section 2.5.3, fooling people may not be the most appropriate creative goal. Still, being accepted by a gatekeeper in this manner is a measure of at least some demonstrable Press-based success.

Jordanous has quantified some of these forms of success, calculating a musician’s cultural value by the number of comments they receive on a digital music site (2015) and a computational creativity researcher’s impact by the number of non-self citations in the 5 years following publication (2016b). Interestingly, the number of non-self citations of a system “roughly aligns” with expert judgments of the system on metrics like Ritchie’s model or the Creative Tripod—but not with the judgments of experts who were asked “how creative is this?” Vartanian et al. (2017) measured architects’ creativity based on a poll of the popularity of buildings they had created, and found that these measurements correlate well with expert assessments of the same architects’ creativity that were made 50 years earlier. To some extent, these correlations may reflect a self-fulfilling process: if a creative product is accepted by experts through the measures that experts are used to, then it will be cited and promoted by more experts and its success will be more likely to eventually catch on with the general public.

Cultural success is the ultimate form of Press evaluation, and is very similar in principle, if not in methodology, to the Consensual Assessment Technique (Section 2.5.4). Cultural success is a goal that takes a creative product seriously in its entirety—and that subjects it, however indirectly, to the same career pressures that would be applied to a creative human. For systems that are meant to create work that fits into human movements and genres, a properly quantified measure of cultural success may be one of the most interesting evaluations possible.
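Jordanous’s citation-based measure requires only a small amount of bookkeeping. The sketch below, our illustration on made-up records, counts citations within five years of publication whose author lists do not overlap with the cited paper’s authors; the exact matching rules used in the original study may differ.

```python
# Sketch of a non-self-citation count in the spirit of Jordanous (2016b)
# (an illustration; the records below are made up).

def non_self_citations(paper, citations, window_years=5):
    """paper: {'year': int, 'authors': set}.
    citations: list of {'year': int, 'authors': set}.
    Counts citing papers within the window that share no authors with `paper`."""
    return sum(
        1 for c in citations
        if paper["year"] <= c["year"] <= paper["year"] + window_years
        and not (paper["authors"] & c["authors"])
    )

if __name__ == "__main__":
    paper = {"year": 2012, "authors": {"Smith"}}
    citations = [
        {"year": 2013, "authors": {"Jones"}},          # counts
        {"year": 2014, "authors": {"Smith", "Lee"}},   # self-citation, excluded
        {"year": 2019, "authors": {"Chen"}},           # outside the window
    ]
    print(non_self_citations(paper, citations))  # -> 1
```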

2.7 Arguments against evaluating creativity

Now that we have looked at all four perspectives, it is time to mention some important counterpoints from researchers who question whether creativity evaluation is possible at all.

2.7.1 Domain specificity

One surprising argument against the existence of creativity is Baer’s (2012) theory of domain specificity. Baer states that there is no such thing as creativity—or, rather, that there are many creative skills, but there is no underlying process which informs them all. Being creative in one domain does not imply the ability to be creative in other domains; therefore, to call a person or process creative without specifying the domain is not scientific. Baer writes,

"It is sometimes useful to group together beautiful artifacts, fascinating ideas, brilliant designs, and ingenious theories and call them all creative, but that does not mean that they share any underlying unity."

Baer's theory, while disheartening, is supported by plentiful evidence. The Consensual Assessment Technique requires domain-specific experts, and a subject's CAT results in one domain do not significantly intercorrelate with their results in another domain. The only times there have been even modest correlations are in highly related domains, like different kinds of stories or visual art (Baer, 2012). Similarly, the Torrance Tests predict creative achievement only in their associated domains; performance on the verbal Torrance Test does not correlate with the figural Torrance Test (Baer, 2012). Such tests also correlate only modestly with actual creative achievement, either at the time of taking the test or later in life (Baer, 2011). Personality-based correlates of creativity are different across domains (Baer, 2012), as is the influence of environment on creative performance (Sternberg, 2018). Direct measurements of creative achievement in different domains do not correlate (McKay et al., 2017). The relationship between divergent and convergent thinking, and between verbal and figural domains, also appears to be different in different cultural contexts (Storme et al., 2015). It is difficult to explain these results without acknowledging that some component of creative thinking is different for each domain.

Some results from previous sections support domain specificity. Readers will recall Augustin et al.'s (2012) discovery that different Product qualities are important in different artistic forms. Mace and Ward's (2002) and Bourgeois-Bougrine's (2014) studies of working artists, while roughly parallel, show differences in process, especially in how a work is completed. Product-based theories are frequently domain specific, such as Cropley and Cropley's (2005) theory of functional creativity, or Jacobs' (2015) theory of foregrounding and backgrounding in literary work. Press-based HCI studies of interactive gallery artwork are often difficult to apply to any other domain. Sternberg (2018), additionally, argues that creativity will become even more domain specific the more eminent and transformational it is.

There is also some evidence that aspects of creativity are domain general. Eminent human creativity often involves cross-applying knowledge from one domain to another (Csikszentmihalyi, 1996). Jordanous (2012b) found that different SPECS criteria are considered important in different domains, but four of the fourteen criteria are very important for every domain: Generation of Results, Originality, Spontaneity and Subconscious Processing, and Value. Since "novelty" and "originality" are synonyms, this would imply that the "novelty and value" definition of creativity, and perhaps others, are domain general. McKay et al. (2017) also found that Humor appears to be domain general. Jordanous (2012b) thinks of creativity as partly domain general and partly domain specific.

The evidence for at least partial domain specificity is very strong. Computational creativity researchers must take note of this, and ensure that evaluation techniques are appropriate to the domain. If possible, generalized evaluation techniques should be tested for their applicability to specific domains. However, it does not follow that creativity evaluation is impossible. Researchers can continue to evaluate systems that create art, music, mathematics, etc.—on the understanding that the evaluations for these different systems will also be different.

2.7.2 Other arguments

Beyond domain specificity, there are other arguments against measuring general creativity. One such argument is that creativity should not be quantified. Boden herself (1990) takes this angle, preferring

to ask "what parts are creative and why?" rather than "how creative?", as this produces more nuanced information for the system's creators. Nake (2012) argues that the quantification of creativity is an American invention, and risks commodifying creativity by framing it as an object one must have a certain amount of, rather than a quality that emerges in a social context. However, many computational creativity evaluations are not quantitative; SPECS is purely qualitative, for instance (Jordanous, 2012b).

Related to this is the argument that computational creativity should not be measured by human standards. Loughran and O'Neill (2016) typify this argument, stating that humans can already produce creative artifacts which are pleasing to humans. To Loughran and O'Neill, it is more interesting to see what computers produce according to their own, non-human standards. However, most advances in computational creativity will require at least some evaluation: otherwise it is difficult to show that an advance has been made. If one wants to judge computers by non-human standards, one can still make statements about the computer's success performing to these standards, perhaps by using an autonomy-based model such as Guckelsberger's (2017) or a categorization like Ventura's (2016) of the tasks the computer takes on.

Finally, some say creativity is inherently human and can never be present in computers. Boden (1990) lists and refutes lines of argument here, as does Minsky (1982). First, there is the argument that human creativity is an inexplicable gift which cannot be modeled computationally. Minsky (1982) convincingly refutes this argument, stating that creativity seems to be a combination of ordinary cognitive processes. Collecting domain knowledge, generating ideas, evaluating them, and revising can all in principle be done by a computer, as can the progression from an idea to a plan to a finished implementation. The luck and social factors that lead to a creative achievement being accepted by its field (Csikszentmihalyi, 1996) are also not impossible for a computer, assuming that the field is open to the possibility of a computer achieving something. If we accept for the sake of argument that human creativity is somehow not computational, we can still get useful results from computational creativity by focusing on product or press. If computers produce believable creative work and influence human culture, then they are doing something useful regardless of their means of doing so. Furthermore, computer systems can still provide evidence for and against hypotheses about parts of the process of human creativity (Boden, 1990).

Second, there is the argument that, due to a computer's lack of richly embodied life experience, its performance will never match that of the greatest creative humans. Boden (1990) agrees with this, but says there are many other reasons why modeling creativity with computers is useful. Mogensen (2017) avoids the problem of matching human levels of creativity by referring to creative systems as having "partial creativity"; the theorists mentioned above, who argue that computational creativity should not resemble human creativity, also avoid this problem. Several other proposed solutions exist. Colton (2008) lists the embodiment problem as a reason for providing framing information, to create the illusion of humanlike experience.
Moreover, a number of creative systems are actually embodied, either as robots (Infantino et al., 2016; Tresset and Deussen, 2014) or in a virtual world (Aguilar and Pérez y Pérez, 2014). Third, there are arguments that even if a computer had humanlike processes and products, it could not be "really" creative (Boden, 1990). These arguments can stem from appeals to the non-biological nature of computers, or to variants on Searle's (1980) Chinese Room argument, or to a lack of consciousness on the computer's part, or finally, to the simple belief that creativity is a property only of humans. Chinese Room-style arguments can be countered by the argument that understanding is an emergent property of the system as a whole, or Minsky's (1982) argument that humans do not "really" understand things either:

our commonsense knowledge consists of imprecisely defined concepts induced from sensory perception, and it is possible for a computer to do this sensory processing as well. As for consciousness, Minsky (1982) argues that this is merely an ability to monitor oneself. Many creative systems do have a rudimentary ability to monitor their own work, and in principle, nothing stops this self-monitoring from becoming more sophisticated. Linkola et al. (2017) discuss one possible framework for this kind of self-monitoring.

Arguments about "real" creativity, "real" understanding, and "real" consciousness can be pernicious due to the ill-defined nature of these terms. This lack of definition leads to moving goalposts and unfalsifiable arguments. McCormack and d'Inverno (2014) suggest that, once a computer can perform a task, humans will no longer see that task as creative—even when humans do it. As a result, creative humans will concentrate on whatever tasks computers have not yet achieved. There is some precedent for this in, for example, the movement away from photorealism in painting following the invention of the camera. In our opinion, none of these arguments are reasons to do away with evaluation. However, researchers should be aware that these viewpoints exist. For some members of the audience, "real" creativity may be a moving goal that a computer, no matter how sophisticated, can never quite meet.

2.8 Issues in computational creativity evaluation

So far, we have seen many theories of creativity and methods for evaluating creativity. We have seen potential issues arise, such as debates about definitions of terms, a lack of autonomy in existing systems, the cultural specificity of many judgments, and the potentially domain-specific nature of creativity. Some of these issues have obvious implications when applied to evaluation in practice. Others are unsolved problems. We now turn to some issues that arise due to the practicalities of computational creativity evaluation. Some of these have to do with problematic evaluation methods, or ones not well-supported by theory and evidence. Some are practical questions to think about, while others are common pitfalls. Discussing problematic evaluations necessitates discussion of meta-evaluation: how to evaluate evaluation methodologies.

2.8.1 Implementations of models and ad hoc tests

It is one thing to propose criteria for creativity, and another to operationalize them to be used in practice. A number of researchers have operationalized models such as Ritchie's criteria (Gervás, 2002; Tearse et al., 2011) or the Creative Tripod (Chan and Ventura, 2008; Monteith et al., 2010; Norton et al., 2010; Smith et al., 2014), either by creating a questionnaire based on the model's criteria, or by somehow automating the judgments. Others have created their own questionnaires ad hoc. Operationalizations of novelty and value (Section 2.5.1.4), Ritchie's criteria (Section 2.5.1.3), and the Creative Tripod (Section 2.6.1) have already been discussed; we now turn to the issue of questionnaires which are not, or only partly, based on such models. Some researchers evaluate only part of a model, or combine criteria from multiple models. Karampiperis et al. (2014) combine novelty and surprise (but not value) with Lehman and Stanley's (2012) impressiveness. A few systems are designed specifically for the Appreciation portion of the Creative Tripod (Norton et al., 2013).

Questionnaires used to evaluate creative systems do not necessarily adhere to an established model. Neither do the criteria used in a system's internal fitness measures, if applicable. More commonly researchers evaluate systems according to ad hoc criteria. We have encountered dozens of such criteria, and Jordanous' (2012b) meta-analysis describes them as one of the most common forms of evaluation, but for space reasons, we make no attempt to list them all. A few illustrative examples of domain-general ad hoc criteria are meaningfulness (Das and Gambäck, 2014), interestingness (Román and Pérez y Pérez, 2014), and coherence (Harmon, 2015); while domain-specific ad hoc criteria include grammaticality for poetry (McGregor et al., 2016) or whether a respondent would use a generated image as desktop wallpaper (Norton et al., 2013). Many of these studies include both domain-general and domain-specific criteria, as well as novelty, value, surprise, or a modified Turing test.

With ad hoc criteria, there is often little or no justification for why these criteria were used, except that they were the criteria the researcher happened to be interested in. Arguably, since part of creativity is likely to be domain specific, ad hoc criteria which are domain specific may be a better fit for a given project than standardized criteria. However, if a researcher argues that their system is creative—as opposed to merely retweetable or humorous—then such criteria must be based on experimental evidence with regard to the system's domain and rigorous theory. Both ad hoc and theoretical criteria should be tested for properties such as construct validity, in the same way as other psychological criteria. The only model we are aware of that has been validity tested in this way is Carroll and Latulipe's (2009) Creativity Support Index, which is used for creativity support tools and is inappropriate for non-co-creative systems. Some preliminary results indicate that most creativity criteria are not independent. Tapscott et al. (2016) found that measures of quality and narrative potential are interdependent; Pereira et al. (2005) found that ratings of typicality and quality under Ritchie's model are not independent. Lamb et al. (2015a) found similar results with Ritchie's model, the IDEA model, and the Creative Tripod when assessed by non-experts. A related problem is that some criteria are operationalized simplistically: "poeticness", for example, has been defined as the use of a meter and rhyme scheme (Das and Gambäck, 2014), when the majority of contemporary English poetry does not rhyme. Beginning with such goals in the early stages of a project is defensible, but for a system's output to be taken seriously, effort must be made to evolve towards criteria that resemble the actual criteria applied to human work.

2.8.2 Opinion surveys, non-expert judges, and bias

Aside from the criteria on a questionnaire, several other issues can arise. One issue is rater expertise. Intuitively, people who know little about a kind of creative artifact might not be good judges of those artifacts. Some researchers have expressed this intuition in their published work: for example, Gervás and Veale both independently worry that humans will rate their systems' output too highly because they do not understand it (Gervás, 2002; Veale, 2015). The intuition is supported by considerable evidence. The Consensual Assessment Technique requires domain expert judges for a reason: only domain experts can be trusted to judge artifacts in that domain (Kaufman and Baer, 2012). Non-expert judges lack interrater reliability. Even when they agree, the validity of their judgments is in serious question, because they fail to correlate well with the judgment of experts.

This should not be surprising; it is well known in cognitive science that experts perceive the subject of their expertise differently from novices, "chunking" and analyzing patterns that are invisible to a novice (Gobet and Simon, 1998). In art evaluation, experts evaluate art differently from novices in several ways. Photography experts carry out less simplistic evaluation than non-experts, and prefer more unfamiliarity and uncertainty (Galanter, 2012). The relationship between novelty and typicality is different for experts and non-experts, with experts showing a stronger preference for novelty (Hekkert et al., 2003). Art experts focus more than novices on relationships between properties, and less on the properties themselves; they rely more on domain-specific knowledge, while novices rely more on their general life experience (Kim et al., 2011). Visual art experts show less pronounced emotional responses to visual stimuli than novices, rely less on emotion in judging artworks, and are more tolerant of negative emotions in art (Leder et al., 2014). Experts and novices even move differently, with novices either losing interest or hovering around the art without a mental framework for interpretation (Ryokai et al., 2015). Rather than being a weaker version of expert judgment, novice judgment tends not to correlate with expert judgment at all (Kaufman et al., 2008) and can even run in the opposite direction (Lamb et al., 2015a).

Non-expert raters in computational creativity face an additional problem: they typically do not know how to apply the concept of creativity to a machine. As mentioned earlier, many people are reluctant to attribute creativity to machines; and experiments suggest that those who are willing are frequently unsure how to do it. Jordanous (2014) found that participants in a survey, asked how creative a system was, expressed confusion as to what definition of creativity to use, and admitted they were likely to conflate creativity with other factors. Norton et al. (2011) found that art students were reluctant to evaluate a computer's creativity without knowing more about its process. Some researchers (Colton, 2008) worry that this reluctance will lead to bias against creative computers. The evidence for such bias is patchy. Moffat and Kelly (2006) found that musicians and non-musicians are biased against computer-generated music, but their sample size is quite small, and other ways of analyzing their data did not yield this result. Other researchers have generally not reproduced Moffat and Kelly's results. Friedman and Taylor (2014), Norton et al. (2015), and Pasquier et al. (2016) found that, while individuals differ, there is little overall bias against computational creativity in the general population. Gade et al. (2017), in a more nuanced experiment, gave the same computer-generated art to two groups of participants, with one group being told a truthful account of its generation, and the other given deceptive framing information in which the artist was described as a specific human with specific reasons for creating the work. Both groups were given a detailed questionnaire of different statements about the art and the artist. Gade et al.
found that, overall, there was no statistically significant bias against work known to have been created by a computer—except for three questions that dealt with Person-based creativity, such as, "I would describe the creator of this piece as an artist." However, a minority of participants in the deceptive group changed their answers across the four Ps to be significantly more negative after it was revealed that the artist was not a human. The mixed results of these different researchers seem to indicate that bias against computers exists for some individuals, and in some types of judgment, but is not generally pervasive. However, the idea of pervasive bias against computers is still widely referenced. Colton et al. (2008) recommend the use of framing information to combat bias, but McGregor et al. (2016) found that framing information does not significantly affect human ratings of computer-generated artifacts, and Gade et al.'s study, as mentioned, found a significant difference only for Person-based ratings. A more subtle question is whether humans are biased towards familiar and humanlike forms of creativity. Guckelsberger et al. (2017) raise this question when discussing embodied creative agents whose bodies might be very non-anthropomorphic. Framing

information might be useful to help humans understand an agent who is unlike them; but this remains to be tested.

Some researchers argue against opinion surveys altogether. Colton (2012) worries that evaluating systems by surveying groups of humans would lead to "creativity by committee"—which would presumably be bland or otherwise undesirable. However, no real evidence underlies this claim, and some evidence—the success of the CAT, for instance—suggests that groups of experts can evaluate competently. Jordanous (2014) argues that a lack of reliability renders opinion surveys unsuitable as an evaluation method; she recommends a detailed evaluation by a single expert. However, Jordanous's argument is not as general as it seems: the "opinion surveys" she cites are based on asking the question "how creative is this?" to mixed expert and non-expert raters (Jordanous, 2012b). It does not follow that the same question asked in a structured way to experts (as in the CAT), or broken down into components (as in SPECS), will not provide suitable results. However, in any case involving human judges, care must be taken to avoid any problems caused by confused, inattentive, biased or inexperienced judges.

2.8.3 Meta-evaluation

Some researchers have turned to the question of meta-evaluation—that is, whether there is a systematic way to judge the merits of evaluation techniques. An early study in meta-evaluation was Pearce et al.'s (2002) study of music generation. Meta-analyzing papers in that field, they discovered a "methodological malaise." Most researchers neither clearly specify a purpose nor choose an appropriate evaluation method. According to Pearce et al. (2002), different purposes necessitate different kinds of evaluation. The purpose of a creative system might be to create art: that is, the developer wishes to express themselves using a computational system. Pearce et al. argue that, while there is nothing wrong with this, it is art and not science. It should not be published in a scientific journal unless the artist makes a scientific or technological advance in the course of their work. The art itself should be evaluated through Press: critical acclaim, popular appeal, placement in curated exhibits, etc. Indeed, as we noted in Section 2.6.7, these methods are used by many.

Another purpose might be the creation of a general-purpose creative system, either autonomous or with human collaboration. This is an engineering task, and should be subject to normal engineering process, including requirements analysis, specification, and testing. Researchers often fail to specify their engineering goals—in particular, to list practical scenarios in which their system would be useful, and to state the conditions under which they will deem the system successful.

The two other purposes cited by Pearce et al. (2002) are scientific: investigating a theory about an artistic matter (for example, a theory of musical style) or investigating a theory about the cognitive processes of human artists. These should be investigated using the scientific method: clearly stating a hypothesis, using methods derived directly from theory while minimizing confounding factors (for example, one should not go into the system and make ad hoc manual tweaks so that it sounds better), and systematically attempting to disprove the hypothesis. Subjective statements by the researcher, which the researcher does not make an attempt to falsify, should be avoided (Pearce et al., 2002).

Pérez y Pérez (2018) draws a similar distinction between two approaches in computational creativity: the engineering-mathematical approach in which systems are designed to create a creative product in the

most efficient or convincing manner, and the cognitive-social approach in which systems are designed to simulate the cognitive and social mechanisms of human creativity in the most faithful way. Rather than two binary camps, Pérez y Pérez posits these as a continuum, but he notes that applying evaluation methods without attention to the system's intended place on the continuum—for example, using engineering evaluations to evaluate a cognitive-social system—can lead to evaluations that miss the point. Pérez y Pérez also proposes a third branch of the continuum, for creative systems that are purely artistic in intent.

Jordanous (2014) proposes a set of five standards for evaluation methodologies. Good evaluations should be correct, in the sense of accurately and comprehensively portraying a system's creativity. Jordanous does not believe in ground truth about creativity, but correct feedback should be appropriate and realistic for the system being evaluated. Good evaluations should be useful for understanding and improving the system. They should faithfully capture creativity (as opposed to some other trait which could be conflated with creativity). They should be usable and easily applied. Finally, good evaluations should generalize across various types of creative system (note that this contradicts Baer's insistence on domain specificity). Jordanous states that a single methodology that works for every system probably does not exist; instead, we can use the five standards to talk about the strengths and weaknesses of a methodology and its suitability for a particular purpose. An exercise for the reader might be to choose some evaluation techniques discussed in this chapter, and perform one's own informal analysis of their merits based on these five standards.

2.9 Conclusion: Best practices for the assessment of creativity in computational systems

We have now seen the major theories from the four perspectives of what creativity is; the major ideas from the four perspectives about how to evaluate creativity; some counter-perspectives proposing that creativity is not one thing, or perhaps should not be evaluated at all; and some additional pitfalls that occur when evaluating computational creativity in practice. A reader of this thesis is now equipped to think about how to evaluate their (or another researcher’s) creative computational system in an evidence-based way. We now summarize our suggestions for best practice based on each of the four perspectives, and overall.

2.9.1 Person

Evaluating from the Person perspective is useful if you want your system to be viewed as a creative agent because of inherent cognitive traits that resemble a human’s. Existing tests which measure these traits in humans should be used to test the computational system. Outside of this narrow set of cases, the lack of close resemblance between a computer’s architecture and a human’s brain means that claims about a computer system should likely be based on another perspective.

2.9.2 Process

Evaluating from the Process perspective is useful if you want to model human creativity, or if you want to argue that your system is creative because of the kinds of tasks it does and how it does them. Process

evaluation is often not quantitative. Instead, it either places the system in a category or qualitatively describes the system's creative strengths and weaknesses. The Process perspective is most useful when it is taken into account at all stages of system design, as Process theories have a lot to say about how creative work should be done. Even systems that one does not intend to evaluate from the Process perspective can benefit from these process theories.

Researchers building a system should think about how the system explores its conceptual space and what the parameters of that space are. They should consider building the system's workings based on a generation-evaluation loop, in which the evaluation stage in the loop leads the system's work to incrementally improve, or on a more complex looping process. They can also consider building a system to move from inspiration to planning to full creation as working artists do, rather than trying to generate a full artifact all at once. Finally, researchers should consider the issue of autonomy. While extreme levels of autonomy may not be appropriate for all systems, it is good to be clear about which decisions are made by the computer and which by humans, both at the planning stage and when communicating with audiences and judges.
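For illustration, the generation-evaluation loop described above can be expressed as a minimal sketch in Python. This is not any particular system's implementation; the generate, evaluate, and revise functions are placeholders that a system designer would fill in with domain-specific logic.

    def generation_evaluation_loop(generate, evaluate, revise, iterations=100):
        # Start from one generated artifact and improve it incrementally.
        best = generate()
        best_score = evaluate(best)
        for _ in range(iterations):
            candidate = revise(best)       # make a targeted change to the current best
            score = evaluate(candidate)
            if score > best_score:         # keep only changes the evaluator prefers
                best, best_score = candidate, score
        return best, best_score

The important design decision is not the loop itself but the evaluate function: the Product criteria it encodes determine what "incrementally improve" means for the system.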

2.9.3 Product

Evaluating from the Product perspective is useful if the goal of your system is to produce something that is useful or pleasing to humans. Product evaluation is the category with the largest proliferation of specific criteria. Product criteria are also the easiest to encode in the system itself, if the system is meant to evaluate its own work (in a generation-evaluation loop or otherwise). The challenge for the researcher in a Product evaluation is to choose appropriate criteria for the system's design, domain, and goals. The most general criteria, with the widest theoretical support, are novelty and value. These criteria, however, have subtleties and pitfalls. Researchers should carefully consider what kind of novelty their system's products should have. Novelty should be distinguished from randomness, and novel output should still be appropriate for its intended domain. Value should be defined in terms of the specific audience for whom the system's products should be valuable. Other criteria besides novelty and value have been proposed, and Product evaluations also exist which lack criteria. The modified Turing test, which lacks criteria, is not appropriate unless the system is specifically meant to imitate existing styles of human work. The CAT is a very robust and valid criteria-free Product evaluation, but is more difficult than some other evaluations to perform, as we will see illustrated during our own attempts in Section 4.2.

2.9.4 Press

Evaluating from the Press perspective is useful if your system is intended to make an impact on society, influence human opinion, and be recognized for its work. A researcher should first decide what effect the system is intended to have on its audience, then attempt to produce this effect and assess their success. Certain well-tested Press evaluations already exist for certain specific types of creative work. Interactive systems should be evaluated according to the standards of interaction design, especially in the domain of interactive gallery artwork, which is very well-studied. Similarly, co-creative systems should be evaluated according to existing standards for creativity support tools. Finally, if a system is meant to create finished work on its own, an excellent Press evaluation method is to submit the work to the same gatekeepers who judge human artifacts and see what happens.

2.9.5 Best practices regardless of perspective

Care should always be taken to avoid unfalsifiable claims. Systems should never be evaluated solely through rhetorical argument by the researcher. Goals, including the conditions under which the system would be considered successful or unsuccessful, should be clearly stated before the evaluation begins. Any important terms used, such as the criteria used in a Product evaluation, should be clearly defined. Evaluation techniques should be based on existing evidence as much as possible, whether that evidence is experimental or theoretical. The techniques chosen must be appropriate to the given domain, as many aspects of creativity and its evaluation are domain-specific. Ideally, an evaluation technique should be tested for its reliability and validity in each domain for which it is used. However, for most domains in computational creativity, techniques that have been tested thoroughly in this manner do not yet exist. At minimum, the researcher should know if their system is art in itself, an engineering product, or a scientific experiment, and should choose a type of evaluation which is appropriate to the given mode of inquiry. Finally, since most evaluation across perspectives involves some form of human judgment, attention should be paid to the qualifications of the humans who are judging. Expert judges should be used whenever possible, since they have better interrater reliability than non-experts, and (at least in domains, such as “high art” and science, which are typically overseen by experts) better validity. Not all judges, even if they are experts in the given domain, will be comfortable with the task of judging a computer’s creativity. It should not be assumed that judges will be overall biased against computers, but they may struggle to know how to appropriately apply the concept of creativity to a machine, and should be guided accordingly.

2.9.6 Deviations from best practice

It is extremely common for researchers to deviate from best practices: for instance, to survey non-expert judges, perform modified Turing tests, or make rhetorical arguments. One common reason for these deviations is time and effort. Many theoretically sound methodologies, such as the CAT, are too complicated to always be practical. Even the use of expert judges requires time, effort, and expense, as experts willing to perform such evaluations can be difficult to find. In an experimental field—particularly when making computational art that does not exactly resemble human art—it may be difficult even to identify an expert. If sheer difficulty causes the disconnect between theory and practice, then new ways of making theory simpler to implement are desperately needed.

Another interpretation is that the goals of practical researchers diverge from those of theorists. For example, if one's goal is not to construct a "really" creative system but to convince the public that one's system is creative, then rhetorical argumentation and the use of non-expert raters may be very appropriate. More seriously, it is appropriate to survey non-experts about one's system if the system's goal is to satisfy non-experts. Similarly, ad hoc surveys can be defensible on the grounds of domain specificity (Baer, 2012). Apart from the CAT, most existing evaluation methodologies are not domain specific. And as we have seen in Section 2.8.1, many ad hoc surveys used by computational creativity researchers contain domain specific criteria. It may be that researchers avoid or modify general theories of creativity because they already understand that their task is domain specific. If this is the case, then what is needed is a proliferation of more rigorous methodologies appropriate to specific domains. If a researcher needs criteria specific to the needs of music, or mathematics, or Internet memes, then their need must be matched by systematic attention to the problem of which criteria represent

the specific needs of that domain. Such criteria can, one hopes, be based in existing scholarship pertinent to the domain in question, and solidified through experiments. In the next chapter, we will begin our study of the domain of poetry by surveying the computationally creative work that has been done in this specific domain.

Chapter 3

Related Work in Computational Poetry

3.1 A taxonomy of generative poetry techniques

3.1.1 Introduction

Poetry is not the most studied domain in computational creativity today (Loughran and O'Neill, 2017) but computer-generated (or "generative") poetry has a long history of study among various communities—from artists exploring the effects of algorithms on language, to Internet hobbyists, to computer scientists. Computer-generated poetry can be an intriguing curiosity, a satire, a meeting of AI and performance art, a serious artistic attempt to explore the possibilities of random and arbitrary text, an attempt to entertain and impress computer users, or an exploration towards concrete advances in computational natural language generation. This multiplicity of purposes leads to a wide variety of different authors in different communities, with different goals and interests, who do not necessarily communicate with communities other than their own. However, after examining many examples of computer-generated poetry, we find that the techniques used to generate such poetry can actually be boiled down to a few simple categories with well-defined relationships. We define these categories as Mere Generation, Human Enhancement, and Computer Enhancement. In mere generation, a computer produces text based on a random or arbitrary algorithm. Most systems use some form of mere generation as a baseline. However, in the remaining two categories, either the results of mere generation are modified and enhanced, or some meaningful enhancement is built into the mere generation model's parameters. This occurs either through interaction with a human (Human Enhancement), or through the use of optimization techniques and/or knowledge bases by the computer (Computer Enhancement). The results of mere generation can appear nonsensical, though this is not always a bad thing from an artistic perspective. By bringing in knowledge about words and the world, and by setting artistic goals, both human and computer enhancement drive generative poetry towards coherence and artistic style.

In the rest of this section, after discussing other categorizations and our view of poetry, we illustrate with examples the techniques used in our three categories. We explain why scholars have moved away from Mere Generation, and argue that Computer Enhancement, while pursued primarily by scientists rather than artists, has the potential to solve artistic dilemmas in generative poetry. We then briefly bring up the related field of generative music, to show that our taxonomy can be generalized to other creative tasks besides the generation of poetry. Prior versions of this section were published in BRIDGES 2016 (Lamb et al., 2016b) and in the Journal of Mathematics and the Arts (Lamb et al., 2017b).

3.1.1.1 The goals of poetry

As computer scientists, we should clarify what we mean when we talk about artistic perspective, artistic style, or artistic goals. Human poets write to satisfy a variety of complex goals, the full spectrum of which is beyond the scope of this chapter. According to the Princeton Encyclopedia of Poetry and Poetics (Abrams, 1974), a few of the most common poetic goals include:

• Describing or imitating reality (the Mimetic Theory)
• Having a specific effect, such as educating or inducing an emotion, on readers (the Pragmatic Theory)
• Communicating the poet's emotions (the Expressive Theory)
• Creating a work of art that exists for its own sake (the Objective Theory)

These goals are very broad and each can be conceptualized and implemented in a variety of ways. Most ways of achieving most of the goals will require, as a necessary but not sufficient aspect, that a poem be intelligible to an educated human reader. By understanding what is in the poem, the reader can more easily recognize the description of reality, understand the emotions of the author, experience the poem's intended effect, or appreciate the poem's inherent value. For the purposes of this chapter, when we speak of mainstream artistic goals, we are referring to this kind of poem in which the reader is meant to understand something meaningful. It is, of course, possible to write a poem which is valuable without being understood. Indeed, this was the explicit goal of several 20th century poetic movements, including the Dadaist movement, which used intentionally nonsensical text in order to express rebellion against language's rules and traditions (Balakian, 1974). Intuitively, computer-generated poetry might seem to be more compatible with these rebellious movements than with mainstream poetry. There is nothing wrong with rebellious poetic movements, but part of our argument is that computers using AI techniques have the potential to generate content that is meaningful to humans and thus enter the mainstream. We will return to this mainstream/rebellious distinction throughout this section.

Digital poetics encompasses a wider range of techniques than those described here. Computer systems allow human poets a variety of new techniques, including hypertext poetry, kinetic poetry, chatbots as art, interactive fiction (Douglass, 2014), multimedia poetry, and even poetry that presents itself as a game (Funkhouser, 2012). However, for the purposes of this thesis, we are interested only in poems representable by a static text file, in which the computer has a meaningful role in determining what the text will be. This is our working definition of 'generative poetry'. We will focus primarily, though not exclusively, on generative poetry in the English language.

Figure 3.1: A diagram illustrating our three-part taxonomy.

3.1.1.2 Prior work

Other taxonomies of generative poetry exist besides ours. Roque (2011) classifies poems according to the goals of their creators, while Funkhouser (2007) uses the categories of permutational, combinatorial, and template-based generation. Gervás (2002) divides what we would describe as the Computer Enhancement category into four subcategories based on the type of artificial intelligence techniques used. Oliveira (2017b) compares and contrasts Computer Enhancement poetry systems on a number of axes, including their language (e.g. English or Finnish), the poetic features that are emphasized, the method by which they select their content, the AI techniques used, the degree to which human-authored text is exploited or reused, and the evaluation techniques used. However, given this large number of axes and limited space, Oliveira's survey does not have much room for deep analysis.

These taxonomies are useful. However, our taxonomy serves needs that others do not. It includes and takes seriously generative poetry from a variety of sources, whether scientific, hobbyist, or artistic, and focuses not on technical descriptions of processes but on the uses to which the processes are put: to generate text, to optimize text, or to add knowledge of the real world into the generation process. The closest existing taxonomy to ours comes from Ventura's recent paper (Ventura, 2016), where he classifies computationally creative systems by the sophistication of the computational techniques used. Ventura's taxonomy is abstract and broad, and aimed at describing the degree to which a system is creative. Ours is focused specifically on poetry, including forms of poetry not made with computational creativity in mind. However, Ventura's taxonomy is a good counterpart to ours, and we will refer to it throughout this section.

3.1.2 Mere Generation

Figure 3.2: A diagram illustrating Ventura's taxonomy and its relation to ours. Outer labels ("Mere Generation" and "Computer Enhancement") are ours, not Ventura's, and are meant to illustrate areas of overlap between the two taxonomies, not to imply that Ventura would necessarily use the terms in this manner. Note that Ventura's taxonomy, unlike ours, explicitly proceeds in a hierarchy from the least creative (top) to the most creative (bottom) methods. It includes some categories which we have not observed in actual generative poetry systems, but does not include anything corresponding to Human Enhancement.

As Ventura points out, 'mere generation' is a term with a long and loaded history in computational creativity (Ventura, 2016). Many researchers claim that they want to transcend mere generation, but the term is often used without a precise definition. For the purposes of this chapter, when we refer to mere generation, we refer to systems in which the computational element of the system is either random or arbitrary. Humans may have painstakingly handcrafted the system to increase the likelihood that the random calculation results in a pleasing output, and the randomness may be taken from a non-uniform distribution; but the computational process used by the system is random or arbitrary. From a Process perspective, this means that the systems are not especially interesting; they do not use algorithms that represent an advance in artificial intelligence, and they do not model the cognitive steps theoretically associated with creativity. Indeed, this is a major reason why researchers state that mere generation is to be avoided. It is quite difficult to argue from a Process perspective that a mere generation system, as opposed to its human programmer, is creative. However, from a Product perspective, there is no a priori reason to assume that the poetry produced by mere generation will be inferior to the poetry produced in another category. Likewise, from a Press perspective, there is no reason to assume that the interaction between a mere generation system and the public is inferior to any other interaction. In fact, it can be argued from a Press perspective that the concept of mere generation does not even make sense, because creative systems involve a multiplicity of interesting interactions, not merely a single generation step (edde addad, 2018). If we wish to claim that there is anything superior about non-mere generation from Product or Press perspectives, we will have to look at the poems and their critical reception empirically.

Our definition of mere generation corresponds to the first few categories in Ventura's model: Randomization (in which elements of the creative product are chosen completely at random), Plagiarization (in which existing examples of the creative genre under consideration are copied), Memorization, and Generalization. In Memorization and Generalization, the system attempts to capture and replicate certain properties of a training set of human-made work, without duplicating the human-made work entirely, but no knowledge-based or goal-directed reasoning is performed. Note that in our research we have not come across any poetry systems simplistic enough to correspond to the Randomization or Plagiarization categories. By using the term 'mere generation', we do not mean to imply that there is necessarily anything 'mere' or trivial about either the poems or the human artistic process that goes into making them. In some domains of computational creativity, such as musical theatre (Colton et al., 2016), simply producing a result is a significant achievement. Even in the relatively simple domain of poetry, a poem which is Mere Generation from a computational standpoint may have more thought and artistic skill put into it by the poet than many poems from other categories. We use the term not to belittle but to distinguish these systems from more algorithmically complex ones.

In the rest of this subsection we will list methods used by mere generation systems. Note that not every system using one of these methods is a mere generation system. Systems that straightforwardly apply the methods below, without further processing and enhancement, are mere generation systems. This includes all the specific systems that we use below as examples. As we will describe in the following subsections, most Human Enhancement and Computer Enhancement systems begin with a mere generation method and modify or build on top of it.

3.1.2.1 Methods of Mere Generation

3.1.2.1.1 Templates. Template-based generation, also called slot-filling, has been common since the first generative poetry program, Theo Lutz’s ‘Stochastic Texts’ (Lutz, 1959). Template generation is one of the simplest means of constructing a poem. The basic steps are as follows:

1. Create lists of words or phrases in different categories, e.g. nouns or verbs (though the categories need not be based on a part of speech).
2. Create one or more line templates with slots into which a word from a given list can be inserted.
3. Randomly select a word from the appropriate list to fill each slot.
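These three steps can be sketched in a few lines of Python. The word lists and templates below are our own invented illustration, loosely echoing the vocabulary of the Lutz example quoted later in this subsection, not the content of any actual system.

    import random
    import re

    # Step 1: word lists in different categories.
    word_lists = {
        "NOUN": ["castle", "village", "stranger", "eye", "day"],
        "ADJ": ["dark", "late", "free", "deep", "old"],
    }

    # Step 2: line templates with slots naming the list to draw from.
    templates = [
        "NOT EVERY [NOUN] IS [ADJ].",
        "A [NOUN] IS [ADJ] AND EVERY [NOUN] IS [ADJ].",
    ]

    # Step 3: fill each slot with a word drawn at random from the matching list.
    def generate_line():
        template = random.choice(templates)
        return re.sub(r"\[(\w+)\]",
                      lambda match: random.choice(word_lists[match.group(1)]),
                      template)

    print(generate_line())

Every run fills the slots afresh, but the fixed templates mean that the underlying structure eventually becomes visible, which is the repetitiveness problem discussed below.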

This process corresponds to Ventura’s category of Generalization. Essentially, the template models poetry as a sequence of types of words in a particular order, and randomly creates new work to fit that model. ‘Stochastic Texts’ uses templates and word lists based on lines from Kafka’s ‘The Castle’, resulting in lines like the following (Lutz, 1959):

NOT EVERY LOOK IS NEAR. NO VILLAGE IS LATE. A CASTLE IS FREE AND EVERY FARMER IS FAR. EVERY STRANGER IS FAR. A DAY IS LATE. EVERY HOUSE IS DARK. AN EYE IS DEEP. NOT EVERY CASTLE IS OLD. EVERY DAY IS OLD.

Template poetry can draw its word lists from existing art in this way, or from dictionaries representing the whole of the language, or the lists can be handcrafted by the programmer. Slots in a template need not be filled by a single word; the word lists can just as easily contain phrases or other structures. A major issue in template poetry is repetitiveness. Often, running a template program several times will produce a repetitive effect in which the template’s structure becomes obvious, as in this example (from ‘The House’, by Alison Knowles and James Tenney (Knowles and Tenney, 1968)):

A HOUSE OF STEEL
IN A COLD, WINDY CLIMATE
USING ELECTRICITY
INHABITED BY NEGROES WEARING ALL COLORS

A HOUSE OF SAND
IN SOUTHERN
USING ELECTRICITY
INHABITED BY VEGETARIANS

A HOUSE OF PLASTIC
IN A PLACE WITH BOTH HEAVY RAIN AND BRIGHT SUN
USING CANDLES
INHABITED BY COLLECTORS OF ALL TYPES

Such repetitiveness may, as in the given example, be intentional. However, most poets want output that looks fresh each time the program is run. One way of achieving this is with templates that change over time. John Morris, for example, uses shifting templates to create haiku without obvious syntactic repetition (Morris, 1973):

Frogling, listen, waters
Insatiable, listen,
The still, scarecrow dusk.

Listen: I dreamed, was slain.
Up, battles! Echo these dusk
Battles! Glittering...

3.1.2.1.2 Markov chaining. A Markov chain is a statistical model applied to data in a sequence. Based on the last N entries, an Nth-order Markov chain calculates the probability distribution for the next entry. The sequence of N entries used to make the prediction is referred to as an n-gram; a critical feature of Markov chaining, which keeps computational costs low, is that no entries before the n-gram need to be considered. Based on the entries in the n-gram, the Markov chain calculates the probability of different entries appearing next. It then probabilistically generates the next entry, which produces a new n-gram (including the newly generated entry, but not the first entry in the old n-gram). Further entries can be generated in the same way indefinitely. For poetry generation, the entries in the n-gram can be characters or words. A probability model is generated either from a broad corpus or a specific work. Based on a few user-generated starting entries, the system repeatedly samples from the model to create the next entry. Using words as entries ensures that no novel or partial words appear in the output. Using characters results in many non-words and neologisms, which can be an intended effect, as in this example from Dissociated Press by R.W. Gosper (Roque, 2011):

book her sist be chin up seen a good deal uneasilent for coursation dropped, and the litter on,

The Queen was siliarly with them, the Footmance. Would guess, an't grom one foot to thistle, to keep and reachinah'll be she could not ever who had not atte-book hastily. Convers began to trings; into thing on wast the door, and the.

Markov chaining is also an example of Ventura's Generalization stage. In this case, instead of a specific ordering of types of words, the system's model of poetry is the set of probability distributions forming the Markov chain, which is generated by copying the probability distributions of an existing text. Markov chain poetry is related to Dadaist 'cut-ups', invented by Brion Gysin, in which a text is cut into N-character blocks and rearranged (Gysin et al., 1978). While Markov chains preserve many features of the input text, they fail to replicate grammar. Even with word-based models, output tends to meander incoherently from one topic to another. This wandering feature can be used to artistic effect. For example, a corpus built from two contrasting sources can produce language that switches strikingly from one style to another, as in the case of Charles Stross's Lovecraft.pl (Stross, 2013), which merges a Markov chain derived from the King James Bible with one derived from the horror fiction of H.P. Lovecraft.
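To make the mechanism concrete, here is a minimal sketch of a word-level Markov chain generator in Python. The training corpus is a toy placeholder of our own; a real system would train on a full text such as the ones mentioned above.

    import random
    from collections import defaultdict

    def train(words, n=1):
        # Map each n-gram to the list of words observed to follow it;
        # duplicate entries in the list preserve the observed frequencies.
        model = defaultdict(list)
        for i in range(len(words) - n):
            model[tuple(words[i:i + n])].append(words[i + n])
        return model

    def generate(model, seed, length=20):
        # Repeatedly sample the next word from the distribution for the current n-gram.
        output = list(seed)
        for _ in range(length):
            ngram = tuple(output[-len(seed):])
            followers = model.get(ngram)
            if not followers:
                break
            output.append(random.choice(followers))
        return " ".join(output)

    corpus = "every castle is old and every day is old and every house is dark".split()
    print(generate(train(corpus, n=1), seed=("every",)))

Because only the current n-gram is consulted, nothing constrains the output to stay on one topic or to form complete grammatical sentences, which is exactly the wandering behaviour described above.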

3.1.2.1.3 Context-free grammars. Poets who want something more grammatically correct sometimes turn to context-free grammars, a class of models from computational linguistics. A context-free grammar constructs sentences by recursively applying generation rules. It consists of terminal symbols (the actual words or phrases that occur in a finished poem) and non-terminal symbols. Each non-terminal symbol corresponds to a set of terminal and/or non-terminal symbols with which it can be replaced. A non-terminal symbol can also be replaced by a group of symbols. During generation, the system begins with a single non-terminal symbol. It then runs replacement operations repeatedly. At each step, every non-terminal symbol is replaced with an entry in its corresponding set of symbols. The process concludes when no non-terminal symbols remain. Compared to Markov models, context-free grammars are a better representation of the recursive structure of human language. For example, the non-terminal symbol [NOUN PHRASE] can be replaced with a noun, or with other structures that take the grammatical place of a noun. These structures can become arbitrarily complex ("the boat that the man in the green rowed across the river yesterday") and still be grammatically correct. By adding Markov chain-like random choice of rules to a context-free grammar, one can construct a stochastic context-free grammar which generates the most likely sentences based on some input corpus while remaining syntactically correct. Jim Carpenter's Electronic Text Composition uses such a probabilistic grammar, resulting in language which is semantically odd, yet more syntactically coherent than the output of a Markov chain (Carpenter, 2004):

The important statement, like one advance act. A spit goes eastern, showering. it perches on the branching foam The statement.
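The replacement process just described can be sketched in a few lines of Python. The toy grammar below is our own invention (its vocabulary loosely echoes the Carpenter excerpt above); a stochastic context-free grammar would additionally weight each production by corpus-derived probabilities rather than choosing uniformly.

    import random

    # Each non-terminal symbol maps to the set of productions that can replace it.
    grammar = {
        "S": [["NP", "VP", "."]],
        "NP": [["the", "N"], ["the", "ADJ", "N"]],
        "VP": [["V"], ["V", "NP"]],
        "N": [["statement"], ["spit"], ["foam"]],
        "ADJ": [["important"], ["branching"]],
        "V": [["perches"], ["goes"]],
    }

    def expand(symbol):
        # Terminal symbols are returned as-is; non-terminal symbols are replaced
        # recursively until no non-terminal symbols remain.
        if symbol not in grammar:
            return [symbol]
        production = random.choice(grammar[symbol])
        words = []
        for s in production:
            words.extend(expand(s))
        return words

    print(" ".join(expand("S")))

The recursion is what lets noun phrases nest inside other noun phrases, giving the grammatically correct but semantically empty sentences characteristic of this method.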

3.1.2.1.4 Found poetry. Another mere generation method is to skip the generation process and, instead, use a computer to harvest text written by humans. While most generation techniques make some use of pre-existing human language—as a corpus for calculation of N-gram frequencies, for example—found poetry preserves entire human-written sentences without significant modification. The computer's role is to select text which meets some constraint and present it, perhaps juxtaposed with other texts, outside its original context. Examples include Ranjit Bhatnagar's 'Pentametron', which constructs sonnets by assembling pairs of rhyming, 10-syllable posts on Twitter (Bhatnagar, 2012), and the New York Times Haiku project, which mines phrases with a 5-7-5 syllabic structure from that newspaper (Harris, 2013):

Surely that shower
couldn't have been going since
yesterday morning.
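The selection step in this kind of found poetry can be sketched as a simple filter. The sketch below is our own; the syllable counter is a crude vowel-group heuristic, and the projects cited above use far more careful syllable counting and much larger pools of candidate text.

    import re

    def count_syllables(word):
        # Crude heuristic: count groups of consecutive vowels.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def find_haiku(sentences, pattern=(5, 7, 5)):
        # Keep only sentences whose words split cleanly into 5-7-5 syllable lines.
        haiku = []
        for sentence in sentences:
            words = sentence.split()
            lines = []
            for goal in pattern:
                line, count = [], 0
                while words and count < goal:
                    count += count_syllables(words[0])
                    line.append(words.pop(0))
                lines.append((" ".join(line), count))
            if not words and [count for _, count in lines] == list(pattern):
                haiku.append("\n".join(line for line, _ in lines))
        return haiku

Sentences that do not split exactly into the target pattern are simply discarded: the computer selects but does not compose, which is what distinguishes found poetry from the other methods in this subsection.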

It is not clear exactly where found poetry would fit in Ventura's classification. Arguably, since the found poetry systems described here contain encoded knowledge about poetry (for instance, the number of syllables required), they too would belong in the Generalization category. However, if a found poetry system used actual poetry of the appropriate type as source material, it would quickly devolve into simply reproducing this poetry (Plagiarization from Ventura's classification) or, perhaps, vaguely reproducing it with some errors (which would be Memorization).

3.1.2.1.5 Miscellaneous methods. Some poetry is generated using other mere generation methods. Poetry can be constructed by, for example, selecting words which contain certain combinations of letters and arranging them on the screen (Funkhouser, 2012). Or the words in a short phrase can be permuted, as in Brion Gysin's 'I Am That I Am' (Funkhouser, 2007). Edde addad's poem 'Disappointment and Self-Delusion' is constructed around mistakes made by the speech recognition software Dragon NaturallySpeaking (Roque, 2011). We will not explore these methods in detail here. Since the linguistic processes done by the computer are random or arbitrary, they are still classed as mere generation.

3.1.2.2 The problem with Mere Generation

While the methods described above may seem endlessly flexible, there is a limit to what they can do. The frequency of words and the rules of grammar can be modeled, but semantic coherence is more difficult: the poems are valid sentences or language fragments, but do not really mean anything individually or as a whole. This is an obstacle to any poet who wants their poetry to meet a mainstream poetic goal and be understood by readers. For this reason, some pioneers and experts in generative poetry have grown discouraged with the form. Charles Hartman, for example, after years of creating generative poetry, concluded that even the most syntactically correct programs "did only a little to drive the random words toward sense" (Funkhouser, 2007). While poetry generated using templates can make sense, it is difficult for a template to retain interest and meaning after repeated use exposes its pre-set structure. Chris Funkhouser wrote in 2012 (Funkhouser, 2012):

Researching the topic for nearly twenty years, I have encountered dozens of poetry generators. Many have issued convincing poetry, but even the best of them fatigue the reader with blatant slotted structures and repetition.

Note that both of these poetry experts phrase their complaints in ways which imply that a defect in Process is responsible for a defect in Product. A process which doesn't take the semantics or pragmatics of language into account leads to a product which is less valuable because it doesn't make sense; a process which relies too heavily on template and repetition leads to a product which is seen as fatiguing.

A lack of coherence is not necessarily a problem for everyone. As mentioned, Dadaists and other poets are interested in generating nonsensical language (Balakian, 1974; Funkhouser, 2007). Similarly, Oulipians and conceptual poets (Goldsmith, 2009) are interested in inventing new poetry techniques, not in the content or quality of the poetry itself. This is only to mention a few of the rebellious schools of poetry for which a poem failing to make sense is not a problem. However, poetry that becomes mechanically repetitive will eventually fail to shock anyone. For this reason, many new media poets have turned their focus away from the generation of poetic text as such, and towards other uses of computers, such as multimedia poetry (Funkhouser, 2012). Others look for ways to improve on mere generation. If defects in process lead to defects in product, then from both perspectives, a possible solution is to find new processes which transcend mere generation's linguistic limits. One such method is for a human to edit the output; the other, more ambitious but still in its infancy, is to improve generation through artificial intelligence.

3.1.3 Human Enhancement

The most obvious way for a human to enhance computer-generated poetry is to edit the poetry generator's output. Some poets argue that this invalidates the generator's usefulness (Carpenter, 2007) (and it definitely poses problems for testing any scientific hypothesis involving the generator, as Pearce et al. describe (Pearce et al., 2002)), but among other poets it is an established practice. John Cage, for instance, removed unwanted words from the output of his algorithms (Funkhouser, 2007). Computational text generation is seen by many poets as a 'jumping-off point' (Carpenter, 2007) from which they acquire raw material. While almost any poetry generation system involves a human programmer selecting the best examples of output for publication, we define the Human Enhancement category as poetry in which post-algorithmic human involvement goes beyond selection and into actually modifying the output text. (Human-computer co-creativity is not mentioned in Ventura's taxonomy, as Ventura is concerned only with judging the sophistication of the computer.)

Human enhancement is a common technique for poetic groups outside the mainstream. One such group is the Flarf movement. Flarf revolves around intentionally inappropriate, lowbrow language harvested from the Internet. One Flarf technique, 'Google sculpting', consists of searching the Internet for particular terms, taking phrases from the results (a form of Found Poetry), and then recombining and modifying these phrases however the poet sees fit. The result is a distinctive, over-the-top poetics, as in this example, which was published in the prestigious magazine Poetry (Gordon, 2009):

Oddly enough, there is a ‘Unicorn Pleasure Ring’ in existence. Research reveals that Hitler lifted the infamous swastika from a unicorn emerging from a colorful rainbow.

Figure 3.3: A screenshot of Gnoetry in action. Words in white have been selected by the human user as words to keep, while pink words will be replaced at the next generation step.

Even more interesting is Gnoetry, an application for interactive text generation which allows decisions to cycle between the computer and a human user. The computer generates poetry based on n-grams from a user-provided corpus. The user can click on individual generated words, deciding which words and phrases are worth keeping and which should be generated again. The computer then generates new phrases to replace those the user did not find satisfactory. This repeats as many times as the user would like. An example of what Gnoetry looks like during the generation process is shown in Figure 3.3. Gnoetry turns generation into a dialogue between the human and computer. It is a dialogue that plays to the strengths of both participants: the computer can endlessly and tirelessly recombine words, while the human can use their superior judgment to select the most promising combinations. An online community, blog, and several chapbooks exist showcasing the work of various poets using Gnoetry. A favored technique, as with Markov chain poetry, is to mix together the styles of several contrasting works (eRoGK7 et al., 2011). The notion of human and computer cooperation also appears in the development of creativity support tools. Kantosalo et al., for example, design a system to help elementary school students write poetry, in which the computer suggests possible words in a magnetic poetry format (Kantosalo et al., 2014). The goal here, rather than showcasing the poetic skill of a computer, is to scaffold human learning of poetic skills by having a computer assist in some poetic decisions. A program like Gnoetry can also be used to test a human’s poetic skill by requiring them to work within the computer’s constraints.
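As a concrete illustration of this kind of dialogue, the following minimal sketch (in Python; the real Gnoetry draws replacement words from an n-gram model of a user-chosen corpus, whereas a flat word pool stands in here, and the kept positions are hard-coded rather than clicked) regenerates only the words the user has not marked as kept:

import random

WORD_POOL = ["river", "glass", "slowly", "burns", "the", "under", "light",
             "we", "remember", "nothing", "falls", "winter"]

def propose(length):
    return [random.choice(WORD_POOL) for _ in range(length)]

def regenerate(words, keep_indices):
    """Replace every word the human did not mark as kept."""
    return [w if i in keep_indices else random.choice(WORD_POOL)
            for i, w in enumerate(words)]

line = propose(6)
print("draft:   ", " ".join(line))

# In the real tool the user clicks words; here we pretend they kept 1 and 4.
kept = {1, 4}
for _ in range(3):                      # a few rounds of the dialogue
    line = regenerate(line, kept)
    print("revised: ", " ".join(line))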

3.1.4 Computer Enhancement

The other way to enhance computational poetry is to add more advanced concepts from computer science: not only generating words, but making sophisticated attempts to optimize the output. This set of methods comes not from the humanities but from scientists in the discipline of computational creativity. While a variety of AI techniques can be brought to bear on these problems, there are two main purposes to which these techniques are put. One is optimization of the system’s poetic output on some metric; the other is connection of the generation apparatus to underlying knowledge about the world. Nearly all computational poetry systems we analyzed begin with one of the mere generation techniques discussed above. Template generation is most common (Colton et al., 2012; Netzer et al., 2009; Toivanen et al., 2014, 2013, 2012), but the McGonagall system (Manurung et al., 2012, 2000; Manurung, 1999) uses a technique similar to context-free grammar. Barbieri et al.’s system uses an improvement on Markov chains (Barbieri et al., 2012), while DopeLearning (Malmi et al., 2015) is a found poetry system. What

distinguishes all these systems from mere generation systems is that something (optimization, a knowledge base, or both) is added to the basic technique. This addition can be incorporated into the mere generation step as guidance, or can be added after that step is complete. Ventura's classification refers to the use of optimization metrics as Filtration, and the use of a knowledge base as Inception. Ventura considers Inception to be a higher level of creative sophistication than Filtration (Ventura, 2016). Ventura also provides an additional category, Creation, in which a computer's work is informed by direct perceptual processing. To the best of our knowledge, no poetry system has yet been constructed in this way. A few neural networks, which we will discuss in Section 3.2.3, generate poems based on an image. This is partway towards Creation, but we have not seen any poems built by systems which directly perceive and comment on the world around them.

3.1.4.1 Data mining and knowledge representation

Merely generated poetry tends towards nonsense; whether desired by the programmer or not, this is a result of the computer's lack of real-world knowledge. By representing semantic facts, an artificially intelligent system can attempt to overcome this limitation. Just as Gnoetry puts human enhancement inside the generation process rather than adding it on after generation, a knowledge base can be put inside the generation process to guide its range of output. Creative humans, too, require knowledge about the world and their domain as groundwork for apparently spontaneous creative insights (Beardsley, 1965; Kaufman and Baer, 2012; Rhodes, 1961; Sadler-Smith, 2015).

One way of representing commonsense knowledge is to encode it in propositional logic. One experiment in this vein is Ruli Manurung's McGonagall system (Manurung, 1999). McGonagall uses the AI technique of chart generation which, given an input meaning, lexicon, and set of grammar rules, can exhaustively generate all grammatically correct representations of the input meaning. McGonagall generates poetry in a metrical form by encoding metrical constraints as grammar rules. When the input meaning is directly encoded by humans, this process results in metrically correct and very logically consistent poetry (Manurung, 1999):

The cat is the cat which is dead. The bread which is gone is the bread. The cat which consumed the bread is the cat which gobbled the bread which is gone.

However, propositional logic alone does not solve the sense/nonsense problem. McGonagall’s poetry only makes sense when the exact intended meaning of the poem is directly encoded by a human. When the system is allowed to construct its own propositions, it immediately lapses back into nonsense (Manurung et al., 2012):

They play. An expense is a . A lion, he dwells in a dish. He dwells in a skin. A sensitive child, he dwells in a child with a fish.

It is not enough for a computer to be able to represent facts; to be anything other than nonsense, these facts must have some relation to real-world experience. Real-world propositions could be added to a system like McGonagall through semantic parsing of a corpus of human-written texts. Some systems have made good strides expanding their bank of propositions by taking advantage of existing semantic knowledge bases such as ConceptNet and WordNet (Ramakrishnan A and Devi, 2010; Soo et al., 2015; Agirrezabal et al., 2013; Gabriel, 2016; Misztal and Indurkhya, 2014; Oliveira and Alves, 2016). But this is most commonly seen in non-English languages.

What is more common in English is parsing of word associations. By calculating the co-occurrence of different words in a source text through methods like tf-idf or Latent Semantic Analysis, computers can gain a sense of which words are and aren't related to a given topic. This can guide the computer towards the construction of lines and sentences that relate to each other in sensible ways. It is important to note that word associations in this sense are different from the N-grams used to create a Markov chain. Markov chaining only measures words that appear immediately after each other. Word association does not take into account word order, but instead looks at which meaningful words appear in a given chapter, article, or paragraph and distinguish it from other units of text. In this way, word association mining gives a sense of what words are topically associated with each other or have similar meanings.

Toivanen et al. create poetry using word associations mined from the Finnish Wikipedia (Toivanen et al., 2012) and news stories (Toivanen et al., 2014). Netzer et al. (Netzer et al., 2009) use a list of word association norms collected from human subjects, which they claim produces more intuitive, humanlike semantic connections than word associations from an encyclopedia. Combining these associations with syntactic templates results in plausible haiku (Netzer et al., 2009):

cherry tree poisonous flowers lie blooming
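As an illustration of document-level word association mining, the following minimal sketch (in Python, over a toy three-document "corpus"; real systems use much larger corpora and weightings such as tf-idf or log-likelihood rather than raw counts) asks which words co-occur with a topic word at the document level, ignoring word order:

from collections import Counter

documents = [
    "the cherry tree bloomed early and the petals fell in the park",
    "poisonous flowers grew by the river under the cherry tree",
    "the committee met to discuss the annual budget report",
]

def associations(topic, docs, stopwords=frozenset({"the", "and", "in", "by", "to", "a"})):
    counts = Counter()
    for doc in docs:
        words = set(doc.split())
        if topic in words:                 # document-level co-occurrence only
            counts.update(w for w in words if w != topic and w not in stopwords)
    return counts.most_common()

print(associations("cherry", documents))
# words that co-occur with "cherry" at the document level, e.g. ("tree", 2)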

Veale and Hao (Veale and Hao, 2011) search Google N-Grams for similes, which (as well as being a poetic device in their own right) contain implicit information about the properties of things in the real world. Their system then uses this implicit information to create new similes of its own. The Full-FACE system (Colton et al., 2012) modifies these similes to create full poems. Its poetry, while repetitive, is full of comparisons that make sense to a human (Colton et al., 2012):

the wild relentless attack of a snake a relentless attack, like a glacier the high-level function of eye sockets

a relentless attack, like a machine the low-level role of eye sockets a relentless attack, like the tick of a machine

3.1.4.2 Optimization.

The other mode of enhancement that computer science has to offer is optimization. Given some formal definition of the desired traits of a poem, a computer system can begin to, in some sense, think critically: testing different possibilities and choosing the ones which best fit its requirements. While this is a very

elementary form of critical thought, it is an important step towards true creativity on the part of computers: being able to understand one's own aesthetic and create work to match. In particular, some form of optimization is inherent in any creative system that uses a generation-evaluation loop. Optimization techniques applied to generative poetry include stochastic hill-climbing search (Manurung et al., 2000), generate-and-test (Netzer et al., 2009; Colton et al., 2012), genetic algorithms (Gervás, 2013b; Manurung et al., 2012), constraint satisfaction (Toivanen et al., 2013), case-based reasoning (Gervás, 2001), dynamic programming (Yan et al., 2013), statistical machine translation (He et al., 2012), and recurrent neural networks (Loller-Andersen and Gambäck, 2018; Xu et al., 2018; Yang et al., 2018), the last of which we will look at in more detail in a later section. Each of these methods starts with a mere generation technique and a goal. The system evaluates how close the generated text is to meeting its goal, and then re-generates or modifies its text repeatedly until it meets the goal as closely as possible. Any mere generation technique can be combined with most optimization techniques. A few optimizations are designed for a specific mere generation technique, such as elementary Markov constraints (Pachet and Roy, 2011) and constrained Markov processes (Barbieri et al., 2012), both of which allow a programmer to impose structural requirements onto a Markov model.

What goals do these poetry systems work towards? Many concentrate on basics such as meter, rhyme, and grammaticality (Manurung et al., 2000; Netzer et al., 2009; Toivanen et al., 2013) or fitting generated text to a rhythm (Gonçalo Oliveira, 2015). However, a more exciting possibility is setting goals for the subject matter, emotions, or artistic style of a poem: traits which can be measured through natural language processing techniques. It is important to note that if a computer is judging the subject matter of its generated text, then optimization has been combined with knowledge representation: without some form of implicit or explicit knowledge representation, it would be impossible to process a text's semantic traits. A great deal of Computer Enhanced poetry uses both techniques, and thus would be considered Inception under Ventura's model. As an example, ASPERA (Gervás, 2001) includes mood as one of its constraints. DopeLearning (Malmi et al., 2015) generates rap lyrics according to a variety of textual measures, including the maximization of internal and multisyllabic rhymes, and uses a neural net to represent semantic content. The Full-FACE system (Colton et al., 2012) chooses between possible goals, including relevance to the poem's topic, emotion, and some stylistic measures such as 'flamboyance' (the number of unusual words). Yan et al. (2013) maximize the relevance, importance, and coherence of individual words in each poem, while He et al. (2012) optimize coherence based on mutual information.

A particularly sophisticated optimization technique is that of Gervás's WASP system, which creates Spanish-language poetry through an evolutionary approach involving many modules which create, evaluate, and edit candidate poetic texts (Gervás, 2013b, 2016). "Babblers" are the system's mere generation layer, which create baseline text using an N-gram model. "Reviser" modules can make specific changes to the text aimed at correcting the problems detected by a "judge".
This is the closest to a human poet's revision process that we have encountered in the field of generative poetry. The WASP system's optimization goals include poem length, verse length, rhyme, stress pattern, similarity to source (poems that plagiarize the source material are rejected), and fitness for a specific poetic form. Some of WASP's poems are successful enough to have been published in a book about generative poetry (Gervás, 2016).
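As a concrete illustration of the generation-evaluation loop common to these systems, the following minimal sketch (in Python, with toy stand-ins for both the generator and the scoring function; it does not reproduce any published system) generates candidate lines and keeps the one that scores best against a simple goal:

import random

WORDS = ["storm", "window", "quiet", "glass", "breaks", "the", "over",
         "city", "lights", "fall", "night", "slow"]

def generate_line(n_words=5):
    # Mere generation step: an arbitrary sample of words.
    return random.sample(WORDS, n_words)

def score(line, topic="storm", target_length=5):
    """Higher is better: prefer the topic word and the target line length."""
    s = 1.0 if topic in line else 0.0
    s -= abs(len(line) - target_length) * 0.5
    s += 0.1 * len(set(line))               # mild reward for lexical variety
    return s

best, best_score = None, float("-inf")
for _ in range(500):                         # generate-and-test
    candidate = generate_line(random.randint(3, 7))
    c_score = score(candidate)
    if c_score > best_score:
        best, best_score = candidate, c_score

print(" ".join(best), f"(score={best_score:.2f})")

A genetic algorithm or hill-climbing search differs only in how the next candidates are produced: by recombining or mutating the best candidates so far rather than generating fresh ones.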

3.1.4.3 Neural networks

A final Computer Enhancement method, recurrent neural networks, is complex enough to deserve a discussion of its own. Neural networks are a machine learning technique in which the computer, by mimicking the activities of human neurons, learns its own high-level encoding of input data. Like a verbal equivalent of Google's Deep Dream (Mordvintsev et al., 2015), neural networks trained on poetry can generate a high-level, general encoding of the patterns in the poetry they have seen, and use that encoding to generate new work of their own. A key property of neural networks, however, is that their encoding is implicit. Information is contained in the strengths of thousands of connections between neuron-like nodes, rather than existing in a form which can be easily communicated to humans. Because neural networks are "black boxes" in this sense, it takes some thought to discern where in our taxonomy a neural network-based poetry system would belong.

Neural networks have been used to generate poetry in several languages, but are most prominent in the generation of Chinese classical poetry. A common form of neural network in poetry generation is a recurrent neural net or RNN, in which the connections between neurons can form cycles (Ghazvininejad et al., 2016; Goodwin, 2016; Zhang and Lapata, 2014); it is also common to use a paired encoder-decoder model (Ghazvininejad et al., 2016; Wang et al., 2016b; Yi et al., 2016; Xu et al., 2018; Yang et al., 2018). To this framework some researchers add features such as long short-term memory (Wang et al., 2016a; Goodwin, 2016) and an attention model (Wang et al., 2016a).

To constitute Computer Enhancement in our taxonomy, a neural network would need either to contain implicit knowledge representation or to be optimized on some metric. Most poetry neural networks are optimized during their training phase: the weights of connections in the network are repeatedly adjusted in order to produce output that most closely meets some metric chosen by the programmer. The most common such metric is cross-entropy, which is a statistical measure of the similarity of character distributions between the output and a source corpus (Wang et al., 2016a; Yan, 2016; Zhang and Lapata, 2014). The decoding portion of the network, which translates from connection patterns back into human language, can contain additional constraints for goals like rhyme, topicality, and tone pattern (Ghazvininejad et al., 2016; Zhang and Lapata, 2014). The argument regarding knowledge representation is slightly less obvious, but we would argue that if a neural network contains global information about word associations, then it contains knowledge representation, just as poetry systems do which acquire this information in another way. In practice, many systems use pre-existing word association corpora to process user-defined keywords before feeding them as input into the neural network (Ghazvininejad et al., 2016; Wang et al., 2016b; Zhang and Lapata, 2014).

However, it is possible to imagine a naive application of a neural network which would be Mere Generation. Imagine a neural network which is trained only to produce the next character of a sequence based on the last N characters. It uses an input representation similar to a Markov model's n-grams, without any global information, and beyond learning to imitate the character patterns of its corpus, its connection weights are not adjusted towards any higher-level goal; it simply discovers the patterns in these characters on its own.
Such a neural network would contain neither goal-directed optimization nor knowledge representation. It would be essentially just a very complicated Markov model, and one would expect its output to be similar to that of a Markov model. Indeed, simplistic applications of recurrent neural networks to poetry in English can strongly resemble the output of a Markov model (Karpathy, 2015):

O, if you were a feeble sight, the courtesy of your law, Your sight and several breath, will wear the gods

With his heads, and my hands are wonder’d at the deeds, So drop upon your lordship’s head, and your opinion Shall be against your honour.
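For readers unfamiliar with this kind of model, the following is a minimal sketch (in Python, assuming the PyTorch library and a hypothetical training file poems.txt) of such a naive character-level network, trained only to imitate the character distribution of its corpus, with no poem-level goal or external knowledge:

import torch
import torch.nn as nn

corpus = open("poems.txt").read()          # hypothetical training corpus of poetry
chars = sorted(set(corpus))
stoi = {c: i for i, c in enumerate(chars)}

class CharLM(nn.Module):
    def __init__(self, vocab, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, 64)
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, x, state=None):
        h, state = self.lstm(self.embed(x), state)
        return self.head(h), state

model = CharLM(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

seq_len = 64
data = torch.tensor([stoi[c] for c in corpus])
for step in range(1000):                   # toy training loop: predict the next character
    i = torch.randint(0, len(data) - seq_len - 1, (1,)).item()
    x = data[i:i + seq_len].unsqueeze(0)
    y = data[i + 1:i + seq_len + 1].unsqueeze(0)
    logits, _ = model(x)
    loss = loss_fn(logits.reshape(-1, len(chars)), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Sampling: feed the model its own output, one character at a time.
out, state = [stoi[corpus[0]]], None
for _ in range(200):
    logits, state = model(torch.tensor([[out[-1]]]), state)
    probs = torch.softmax(logits[0, -1], dim=0)
    out.append(torch.multinomial(probs, 1).item())
print("".join(chars[i] for i in out))

The systems discussed next go beyond this baseline by adding decoding constraints, planning stages, or word-association inputs on top of the same underlying architecture.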

The Chinese systems which do incorporate Computer Enhancement produce strikingly convincing output. At least one Chinese system (Wang et al., 2016b) produced results that non-expert Chinese speakers could not distinguish from classical Chinese poems. Another (Wang et al., 2016a) produced poems that human experts rated as more “poetic” (meeting the constraints of the chosen poetic form) than human poetry, although it did not meet human standards for fluency or meaningfulness. An expert in Chinese poetry whom we queried agreed that the best published computer-generated Chinese poems looked similar to something a human could write (Lingentfelter, 2017). The neural network aspect of these systems is often combined with other mechanisms to ensure that they meet all their constraints. In addition to the word association tools already mentioned, Zhang and Lapata's system (Zhang and Lapata, 2014) adds statistical machine translation to improve the poems' coherence. Also notable is the use of something resembling planning mechanisms in many systems. Several systems generate an outline of their poem, either using keywords (Wang et al., 2016b; Xu et al., 2018; Loller-Andersen and Gambäck, 2018), separate layers (Zhang and Lapata, 2014), or a separate neural network (Yan, 2016), before feeding this outline into the central RNN to generate actual lines. This is one of the few areas in which we have seen computers follow the path of a human artist from idea to structure to finished work (see Section 2.4.3). Few systems in languages other than Chinese have replicated these results. An exception is the recent work of Goodwin, who trained an LSTM-RNN (recurrent neural network with long short-term memory) on a corpus of contemporary English poetry (Goodwin, 2016):

And still I saw the Brooklyn stairs with the shit, the ground, the golden haze Of the frozen woods where the boat stood. When I thought of shame and silence, I was a broken skull; I was the world which I called it...

Goodwin reports that his results with the LSTM-RNN were uneven, and overall semantic coherence is still an issue; the poems are still arguably not ‘about’ anything. But in our opinion, the stylistic rendering is good enough to give the impression of images and moods that a human reader can understand, and of a writer with an ear for rhythm and phrasing, at a level which no other English poetry system we are aware of has approached. Goodwin's work has also received a favorable reception from human poets not involved in generative poetry (Goodwin, 2016), and he has since worked on other generative language projects, such as a Beat poetry project (McDowell, 2017).

3.1.4.4 Combined computer and human enhancement

It is possible to combine both computer and human enhancement. In a combined enhancement system, both the workings of an AI system and the judgment of a human are involved. One example of this type of

system is Barbieri et al.'s (2012), which generates lyrics using a constrained Markov process, but allows a human to perform the final step in the generation by choosing each verse from a set of five computer-generated candidates. The science of computational creativity is in its infancy, and these are small steps compared to the complex goal-setting process of skilled human poets. Nonetheless, further refinements in goal specification and knowledge representation could produce truly interesting generative poetry.

3.1.5 Separation of generative poetry communities

Researchers working on Computer Enhanced poetry, and artists working on Human Enhanced and Merely Generative poetry, often have little communication with each other. Computer Enhanced poetry, with the exception of recent work on recurrent neural networks, is typically created by members of the computational creativity research community. These researchers are interested specifically in the computational implementation of creative decisions, and thus, Merely Generative poetry is of little interest to them; in fact, for some researchers, mere generation is anathema to their stated goals (Ventura, 2016). Previous work by such researchers categorizing generative poetry, such as Gervás's taxonomy (Gervás, 2002), makes no mention of work outside the Computer Enhancement category at all. While this is logical for researchers with a technical focus, a lack of awareness of generative poetry created by poets and artists outside their field prevents computer science researchers from making compelling arguments about what their work contributes to the field artistically.

Some cultural critics and researchers in the humanities do a better job of presenting a broad perspective. Roque's work (Roque, 2011) places computational creativity projects in context with similar projects from outside that field. However, other important and detailed reviews, such as Funkhouser's (2012), make no mention of projects created by computer scientists. As Cook and Colton (2018a) explore, this lack of connection between compatible generative creativity communities is typical across all computational creativity domains, and even extends between computational creativity and other subdomains of computer science, such as the AI and machine learning communities.

Greater communication between computer scientists and poets would benefit both sides. There are many ways to gauge a computational creativity project's success, but producing credibly artistically successful work is one of the goals of most such projects. Cultural critics are ideally placed to provide feedback and critical analysis of computationally creative work on these grounds, provided there is communication between them and computer scientists and awareness of each other's existence. (Success with cultural critics is important from the Press perspective, and cultural critics also have informed comments to make from the Product perspective, making communication with such critics important for researchers who are interested in these perspectives.) Meanwhile, as explored above, computational creativity researchers potentially have something to give back to the humanities: by widening and deepening the scope of what generative poetry systems can achieve.

One recently emerging community which does bring both sides of the generative poetry community together is Google's Artists and Machine Intelligence program, including an informal conference which was initiated in 2016 (McDowell, 2016). This program is explicitly designed to bring together computer scientists, artists, neuroscientists, and psychologists to work together on generative art, including poetry. Institutions such as the University of Helsinki also have ongoing poetic collaborations between computer

scientists and working artists (University of Helsinki Computer Science Department, 2016). The #CreativeAI community and National Novel Generation Month (in which a 50,000-word computer-generated literary work is prepared in a month, often using poetic techniques) are examples of other adjacent communities into which researchers from computational creativity are making inroads (Cook and Colton, 2018a). With such multidisciplinary efforts becoming more mainstream and widespread, there is a great deal of reason to hope that researchers on both sides of the art/science divide will continue to come together to make ever more interesting generative poems.

3.1.6 Generalization and comparison with music

Our taxonomy is more interesting if its principles can be generalized to other domains. An obvious example would be other domains of digital poetry, such as those described by Douglass (2014) and Funkhouser (2012). Certainly many such poems combine the kind of generation that is of interest to us with hypermedia techniques, which can themselves be seen as a form of either human or computer enhancement, applied to the poem's mode of presentation rather than to its words. Nick Montfort and Stephanie Strickland's 'Sea and Spar Between', for example, uses a handcrafted grammar (a mere generation technique) to combine phrases from Emily Dickinson's and Herman Melville's work, and displays these phrases using a combinatorial framework which could only be accomplished using the calculating power of a computer (Montfort and Strickland, 2010).

However, in our specific research program we are more concerned with whether our taxonomy is generalizable to other domains of computational creativity. To investigate this question, we will look very briefly at the field of generative music. Our taxonomy of Mere Generation, Human Enhancement, and Computer Enhancement does apply to music, with only small modifications.

3.1.6.1 Mere Generation.

Like poetry, music can be composed using a Markov model (Collins and Laney, 2017; Eigenfeldt, 2015; McDonald, 2017; Percival et al., 2015; Kalonaris, 2018). In fact, Markov models are thought to represent musical style particularly well, at least over short sequences (Pachet and Roy, 2011). Other ways of generating music include cellular automata (Zareei et al., 2015), a random trajectory in a directed graph (Scirea et al., 2015), or even producing music from a visual image as if it were a spectrogram (Heep and Kapur, 2015). What these techniques have in common is that, as with mere generation in poetry, the creative choices made by the computer are either random or arbitrary. The program simply produces notes according to its rules.

3.1.6.2 Human Enhancement.

The Human Enhancement category in music is most noticeable in jazz improvisation. Computer improvisation systems are built to play alongside human improvisers. They need to process human musical input in real time and respond to that input with novel sequences of notes (Bown, 2015; Kalonaris, 2018) or with an appropriate synthesis of previously recorded notes and chords (Pachet et al., 2013). A more avant-garde alternative is the creation of digital instruments. An instrument's responses to human input

can be non-obvious or shifting, which results in unpredictable interactions between the musician and the instrument (Zappi and McPherson, 2015). Like Gnoetry, improvisation systems and digital instruments can create new works through interaction which neither the computer nor the human could have created on their own.

3.1.6.3 Computer Enhancement.

As with poetry, the output of mere generation can be fit to hard and soft optimization constraints using techniques such as answer set programming (Opolka et al., 2015), genetic algorithms (Scirea et al., 2015), principal component analysis (McDonald, 2017), self-organizing maps (Kalonaris, 2018), constraint satisfaction (Videira et al., 2017), or elementary Markov constraints, which were in fact specifically designed with music in mind (Pachet and Roy, 2011; Papadopoulos et al., 2016). Such constraints can be rooted in music theory (Lattner et al., 2018; Opolka et al., 2015; Scirea et al., 2015), or a specific goal, such as making cover songs incorporate features of the original (Percival et al., 2015) or imposing a melodic contour chosen by the user (Pachet and Roy, 2011). Music can also be generated using neural nets, which range from mere-generation-like results that produce a noisy copy of the input, to some of the most sophisticated existing results (Lattner et al., 2018; McDonald, 2017; Sturm and Ben-Tal, 2017).

It is harder to make a case for knowledge representation in music generation, due to the non-representational nature of music, but it is true that systems can use machine learning to absorb information about musical structure and style, just as some poetry systems use machine learning to mine word associations (Kalonaris, 2018); or the above-mentioned systems which constrain output based on music theory could be seen as encoding music theory as a form of knowledge. We also see music systems in which human and computer enhancement are combined. Systems can, for example, improvise with human partners and use machine learning to optimize that improvisation (Bown, 2015). Note that, if a system "hears" in real time by processing audio generated by humans or otherwise, and takes this hearing into account when producing its own output (as all co-creative improvisation systems do), then it is possible to argue that the system meets the requirements of Creation, the highest level in Ventura's (2016) hierarchy.

3.1.7 Conclusion

In our taxonomy, there are three areas of work in generative poetry: a mere generation technique, human extensions to the technique, and computer extensions to the technique. We believe that when critics such as Funkhouser (2012) declare that generative art has reached a plateau, it is because they are looking only at mere generation and not at the more powerful computational techniques from outside the field which can enhance it. Artistic optimization and knowledge representation techniques have not yet reached their full potential, but they have the power to push generative poetry forward towards the kinds of sense and style that are currently lacking. Far from being a played-out form, generative poetry is just getting started.

3.2 State of the art in Computer Enhanced poetry

In Section 3.1 we focused on illustrating our taxonomy, giving enough examples to properly define and instantiate each category. In this section, we look deeper into current trends in the Computer Enhancement category of poetry. Our focus here is primarily on academic work by computer scientists, with a bias towards more recent work. We describe work in different types of optimization, knowledge use, and neural network-based poetry. We then focus in more closely on a category that is especially relevant to the work we will do in Chapter 5: Twitter and social media-based poetry, including non-Computer Enhancement examples of these forms.

It is important to note that not all interesting poetry is easily categorizable as Mere Generation, Human Enhancement, or Computer Enhancement, because we do not always have access to the relevant information about how the poetry is created. Thus, famous computer systems such as RACTER (Chamberlain, 1984) and the Cybernetic Poet (Kurzweil, 2001) are hard to discuss because their inner workings are proprietary. We focus on systems whose authors have published a discussion of their process and architecture.

3.2.1 Optimization / Filtration

As mentioned previously, the two main areas of work in Computer Enhancement are optimization and the use of knowledge bases. As optimization occupies a lower level of Ventura's (2016) taxonomy, we will address it first. Methods of optimization differ widely across the field, but can be split for our purposes into three broad categories. One group of systems uses constraint satisfaction in order to methodically generate a poem that best fits a set of specifications for the desired output. Another uses genetic algorithms and other forms of stochastic search to explore a broader range of possible poems with a more loosely defined goal. Finally, there are systems that make specific edits to their output in order to ameliorate specific problems, often guided by the same kinds of constraint-based specifications used by the first group. We will explore each of these in turn.

3.2.1.1 Constraint Satisfaction

In constraint satisfaction, the essential idea is that candidate poems, or parts of poems (such as lines, or candidate words to fill a template), are created using a mere generation technique and then assessed by how well they meet certain computationally calculable constraints. The poem that best fits the group of constraints is then selected, or constructed out of the best-fitting parts. Rashel and Manurung's Pemuisi system (2014b) generates poetry in Indonesian in this manner. They use template-filling as their mere generation technique and a constraint solver taking into account the number of lines, number of words, number of keywords, number of syllables, and rhyme. Rashel and Manurung evaluate Pemuisi using a modified Turing test combined with a questionnaire on the poems' structure, diction, grammar, unity, message/theme, and expressiveness. However, they do not evaluate their results for statistical significance. Toivanen et al. (2013) similarly generate English poetry using template-filling and a constraint solver; in this case, their constraint solving technique is Answer Set Programming. Their constraints include poem length, number of lines, number of words per line, rhyme, a limit to the number of times a particular

word can be repeated, and some grammatical constraints. They do not formally evaluate their system. An example is included below:

Music swells, accent practises, traditionalism hears! Her devote narrations bent in her chord: - And then, vivaciously directly a universe she disappears! An anthem in the seasons of record!
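To make the template-and-constraints pattern concrete, the following minimal sketch (in Python, with a toy template, lexicon, and rhyme check; it does not reproduce Pemuisi or Toivanen et al.'s system) enumerates fillings of a two-line template and keeps only those that satisfy the constraints:

from itertools import product

TEMPLATE = ["the {noun1} {verb1} in the {noun2}",
            "and {verb2} like a {noun3}"]

LEXICON = {
    "noun1": ["music", "anthem", "river"],
    "noun2": ["chord", "record", "night"],
    "noun3": ["word", "bird", "stone"],
    "verb1": ["swells", "falls"],
    "verb2": ["sings", "burns"],
}

def rhymes(a, b):
    # Crude rhyme check (shared last two letters); a real system would use
    # a pronunciation dictionary such as CMUdict.
    return a[-2:] == b[-2:] and a != b

solutions = []
slots = sorted(LEXICON)
for choice in product(*(LEXICON[s] for s in slots)):
    binding = dict(zip(slots, choice))
    line1 = TEMPLATE[0].format(**binding)
    line2 = TEMPLATE[1].format(**binding)
    # Constraints: the line endings must rhyme and line 2 must stay short.
    if rhymes(binding["noun2"], binding["noun3"]) and len(line2.split()) <= 6:
        solutions.append((line1, line2))

print(solutions[0])
# e.g. ('the music swells in the chord', 'and sings like a word')

Answer Set Programming, as used by Toivanen et al., replaces this brute-force enumeration with a declarative solver, but the underlying generate-then-check idea is the same.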

Barbieri et al. (Barbieri et al., 2012) use a Constrained Markov Process in English, which fits a Markov chain-like process to constraints. They use rhyme, meter, and semantic relatedness to a title as their constraints (we will discuss semantic relatedness further in the next section). They also use lyrics from specific human songwriters as the base text for the Markov chain, so as to produce text in the style of an existing artist. There is a Human Enhancement element to their system, in that the system produces several candidate lines and a human user chooses between them. Barbieri et al. evaluate their system by asking human judges to rate the poems on syntactic correctness and semantic relatedness to the title; the poems generated by a Constrained Markov Process (such as the one below) outperform pure Markov poems as well as poems generated through pure constraint solving.

There is a note in his eyes He backs the beat of the key Down the song in my eyes You back the beat of the sea

McGovern and Scott's system EloquentRobot (2016) generates lines from an English Markov model and selects those that meet the system's constraints, which it learns based on studying the structure of poems in a corpus of haiku and limericks. These constraints include an appropriate number of syllables, rhyming or not rhyming with previous lines as appropriate for a line's position, and ending with a word from an appropriate part of speech. The Markov model is trained either on a general corpus or on specific corpora, such as political speeches or lists of insults. EloquentRobot is evaluated based on how well it matches the forms of the types of poems it has learned, based on the results of a naive Bayes classifier, a form of automated typicality judgment. Its output can be classified as either haiku or limericks with 100% accuracy. Of course, it is relatively easy to distinguish a poem with five lines from a poem with three.

forget why you entered a room scratches on your arm and assume May you receive nude you be so pursued when your parents enter the room

Singh et al.’s MABLE system (2017) converts the English stories generated by the MEXICA narrative system into lyrics for ballads. Odd-numbered lines are taken directly from MEXICA, while even-numbered lines are chosen from candidates generated by a Markov model. The system chooses lines that match the meter and rhymed ending of the previous line, and then selects from those lines the ones whose sentiments

best match the intended sentiments of the story. A final module changes words where necessary to ensure that the ballad is told in third person. MABLE is evaluated by a group of human judges who rate its output, along with the output of other systems and ballads by humans, on plot coherence, emotional engagement, and overall rating. MABLE is rated as more coherent and better overall than the other two systems in the study (Full-FACE and a rap lyrics generator), but does not approach any of the ratings of the human lyrics.

The priest was born under grace of the great god And evolving from the shadows lifted The lady was an inhabitant of the great city But just remember there’s a sign of intensity

Tobing and Manurung (Tobing and Manurung, 2015) use chart generation, a variation on context-free grammar, as their mere generation technique, writing in English, and attempt to fit the grammar's output to constraints using dynamic programming. Their constraints are purely metrical. Even with these simple constraints, their system is costly to run. Replacing full chart generation with a greedy algorithm reduced runtime, but also reduced quality (in the authors' opinions, at least; they do not formally evaluate their system). Examples with and without the greedy algorithm are reproduced below.

1. Ask in french surface Call her years, check her Think were toy tennis Skid in chase, land her

2. Is she Court were full even She were take couple She were a woman

According to Oliveira (2017b), Tobing and Manurung’s system illustrates a characteristic problem with constraint solving: with many constraints or many possible solutions to the constraints, these systems become computationally cumbersome. However, this depends on the method of constraint solving, and other systems using constraints have not necessarily encountered this difficulty. The InkWell system (Gabriel, 2016) uses a more complex constraint satisfaction system to write English haiku. A user inputs seed words, and InkWell adds more seed words based on a randomly selected section from a large corpus of (non-poetry) human source texts. A vector-like structure is constructed from all of the seed words, and is combined with a vector representing the important words from an existing haiku, which is used as a template. (The specifications in the template, such as “noun-animal”, are more semantically well-defined than typical templates in computer-generated poetry.) InkWell then assigns random weights to 32 haiku-writing constraints, including sophisticated constraints such as “personality traits”, which might be represented using links in WordNet or techniques such as bag-of-words machine learning to infer the characteristics of different writing styles. InkWell can also be given custom constraints,

such as a request to use unusual synonyms or to use n-grams that are like or unlike those of a particular human writer. It tries many words and phrases to fill the template before selecting the best ones based on this combination of constraints. InkWell was evaluated by bringing eighteen of its poems to a professional writers' workshop without disclosing that they were computer-generated. The participants in the workshop took the poems seriously and praised many of them, though they criticized others (as is typical at a serious workshop!). Although some writers in the workshop were aware that the author had experience using computers to generate text, they stated a belief that the poems were probably written by the human author, and were surprised when the truth was revealed. Moreover, contrary to some criticisms of Turing-style tests, the writers praised not only InkWell's ability to write a "good" poem, but also its ability to surprise them.

awake in the dark the edge of the water can spread in your presence

3.2.1.2 Genetic and Stochastic Algorithms

There can be many reasons to move from constraint satisfaction to genetic and stochastic algorithms. As Oliveira (2017b) mentions, some researchers find exhaustive search too slow and the constraint space too large; a genetic algorithm can allow exploration of the most promising paths in a space without being exhaustive. Other researchers might find that a genetic algorithm allows goals to be specified more broadly than the goals of a constraint solver.

Rahman and Manurung (2011) use the SPEA2 evolutionary search algorithm to evolve poetic representations of a semantic target in English, using a variation on context-free grammars as their mere generation technique. Their evolutionary algorithm chooses the best poems of each generation using a multiobjective function including grammaticality, meter, and similarity to the target semantics. They evaluate their system by showing that its output meets its own multiobjective function better than the output of prior experiments in the same vein.

ePoGeeS, a rare example of Computer Enhanced poetry that is not published as computer science research, uses both Human Enhancement and this type of search algorithm to generate English poems (Roque, 2011). ePoGeeS uses class-based n-grams, a variation on Markov models: instead of predicting which word will come after the current one, a class-based n-gram predicts the next part of speech. Thus, class-based n-grams occupy a middle ground between word-based Markov models and context-free grammar; they take grammar into account somewhat, without fully modeling language's recursive structure. The user controls the text from which the n-grams and words are taken, as well as the constraints on the form of each poem, such as rhyme, number of lines in a stanza, and number of words in a line. The user also sets variables to determine the desired phonemic properties of the poem: for example, emphasizing a certain letter or sound. As Roque points out, this is consistent with the "sound poetry" created by Dadaists and Futurists. The phonemic properties are the part of the poem to which Computer Enhancement is applied. ePoGeeS can either use a random search, in which many lines are randomly generated and the one with the best phonemic properties is selected, or a stochastic beam search in which several best lines are kept at each generation, and the next generation is created by changing a word in each line. The following example is generated by ePoGeeS optimizing for back phonemes (i.e. sounds created in the back of the mouth), and using Shakespeare sonnets as a base:

For thou thine eyes to whom all my song: and water for myself mine eye more aye thy worth to whom thy worst all forwards for her treasure thou truly write good allow.
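The following minimal sketch (in Python) illustrates the stochastic beam search pattern just described: several best lines are kept, each is mutated by changing one word, and lines are scored by a crude letter-count proxy for back-vowel content rather than ePoGeeS's actual phoneme model. The word list and scoring rule are invented for illustration.

import random

WORDS = ["thou", "worth", "eyes", "song", "water", "good", "allow",
         "write", "truly", "mine", "for", "all"]

def score(line):
    # Crude proxy: density of letters often realised as back vowels.
    return sum(line.count(ch) for ch in "ouaw") / max(1, len(line))

def mutate(line):
    # Change one randomly chosen word, as in the beam search described above.
    words = line.split()
    words[random.randrange(len(words))] = random.choice(WORDS)
    return " ".join(words)

beam = [" ".join(random.choices(WORDS, k=6)) for _ in range(5)]
for _ in range(50):
    candidates = beam + [mutate(line) for line in beam]
    beam = sorted(candidates, key=score, reverse=True)[:5]

print(beam[0])   # the highest-scoring line found by the search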

Kirke and Miranda (2013) use a different method to stochastically develop English poems. Rather than selecting and recombining the best poems at each generational step, their MASTER system instead simulates a society of artificial poets who are influenced by each other. Each simulated poet has an emotional state, which is initialized by the programmer but can change as a result of the output of other poets. The only explicit goal of each poet is that it use words which reflect its current emotional state. Kirke and Miranda do not formally evaluate MASTER, although they state that their intent is to imitate modernist poets such as Kurt Schwitters who used repetitive, non-humanlike syntax:

quiet book comet and fornicate quiet tourist ignite live quiet quiet book comet and wine ejaculate and boring welfare fire with fornicate quiet book comet and rape boring fatigued sadness it quiet tourist ignite live quiet quiet book comet and wine ejaculate and hysterical rage collaborations fornicate quiet tourist ignite live quiet quiet book comet and wine ejaculate and boring welfare fire with hysterical explosion sensations explosion explosion provoked explosion explosion prizes quiet quiet

Misztal and Indurkhya (Misztal and Indurkhya, 2014) generate English poems based on an unusual method. A user-provided source text is evaluated for its emotional state and most pertinent phrases. The poetry system generates a pool of words by searching for hypernyms, antonyms, and other words related to the selected user-provided phrases (a simple form of knowledge representation). The words are then extended into lines using a complex form of template generation, with different modules containing instructions for creating different sorts of phrases out of the topical words and the pool of words with specific relations to those words. The lines are constrained for number of syllables, grammatical form, and tense. A control module then rates all the lines and puts together a poem balancing two optimization constraints: use of the lines with the highest scores, and use of lines that were generated in a variety of ways.

I knew the undisrupted end I was like the various end As deep as a transformation O end the left extremity Objective undisrupted end I hated the choleric end O end the dead extremity

Misztal and Indurkhya evaluate their system using the cumulative interpretation of the FACE model (Colton et al., 2011), stating that it satisfies all of the model's requirements except perhaps framing information. They also subjectively evaluate their system using Manurung et al.'s criteria of Grammaticality, Meaningfulness, and Poeticness (Manurung et al., 2012), concluding that the system could improve with the use of more stylistic constraints and better awareness of the words' context.

3.2.1.3 Targeted edits

Constraint satisfaction approaches try different options before picking the best-performing ones, and evolutionary approaches combine aspects of the best-performing ones to see if they can be improved further. Some systems go a step further and make targeted edits which seek to ameliorate a specific problem, much like a human poet who looks at a poem draft critically and fiddles with the parts that aren't working. Indeed, Gervás (2013a) specifically promotes this type of edit, referring to mere optimization as "opportunistic".

Díaz-Agudo et al. (2002), with the Spanish COLIBRI system, have the goal of personalizing a poem for an occasion. Their mere generation method is template-filling, substituting words belonging to the appropriate parts of speech with words provided by the user. They use a case-based reasoning package called CBROnto to ensure that the poem still fits rhyme and metrical constraints, and to edit further if it does not. COLIBRI can identify several types of problem that could cause a poem not to meet its constraints, and has a specific strategy for attempting to fix each one. However, COLIBRI does not reason about the semantics of its chosen words. Díaz-Agudo et al. do not formally evaluate COLIBRI.

Gervás's WASP system (Gervás, 2013a,b, 2016) builds on the editing process of COLIBRI. As mentioned in Section 3.1.4.2, WASP contains independent modules for Spanish text generation ("babblers", which use a Markov model), evaluation ("judges"), and editing ("revisers"). Various versions of WASP can revise for rhyme, stress pattern, excessive similarity to the source text, sentence length, verse length, plausibility of sentence ending (e.g. not ending on a word like "and" or "to"), unacceptable foreign words (i.e. penalizing for words that cannot be easily fit into a Spanish metrical pattern), and control over repetition. Revisers can attempt to remove these problems by various strategies such as replacing individual words, changing the position of line breaks, or adding a new sentence. However, Gervás states that revisions made by randomly replacing a word are more likely to harm than help. Gervás (2016) also expresses a desire to optimize for topicality, but so far, a way of employing judges and revisers for this purpose has not yet been found. Instead, WASP is constrained for topicality by constraining the source text. In the 2013 version of WASP, the result is a verse several sentences long into which appropriate line breaks are inserted by a "poet" module. In the 2016 version, the results are single sentences, usually a line or two long, which are then put together with other lines into full poems by another module. Only the 2013 version of WASP has been seriously evaluated: in the case of that version, it was given a de facto Press evaluation by the fact that some of its poems were printed in a book about computer-generated poetry (Gervás, 2013a).
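To illustrate the judge/reviser division of labour in the abstract, the following minimal sketch (in Python; the defects and repair strategies are toy stand-ins and do not reproduce COLIBRI's or WASP's actual rules, which concern Spanish rhyme and stress) repeatedly asks a judge to name a problem and applies the matching reviser strategy until no problems remain:

FUNCTION_WORDS = {"and", "to", "of", "the", "a"}

def judge(line, max_words=8):
    """Return a list of named problems with the line."""
    words = line.split()
    problems = []
    if words and words[-1].lower() in FUNCTION_WORDS:
        problems.append("weak_ending")
    if len(words) > max_words:
        problems.append("too_long")
    return problems

def revise(line, problem):
    """Apply the strategy associated with one named problem."""
    words = line.split()
    if problem == "weak_ending":
        return " ".join(words[:-1])        # drop the trailing function word
    if problem == "too_long":
        return " ".join(words[:8])         # truncate at the limit
    return line

draft = "the cold light falls across the river and the"
while (problems := judge(draft)):
    draft = revise(draft, problems[0])
print(draft)   # "the cold light falls across the river"

The crucial difference from blind optimization is that each edit is targeted: the reviser knows which named defect it is trying to remove.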

3.2.2 Knowledge representation / inception

There are both strong and weak uses of knowledge representation in poetry, as the word “knowledge” can refer to several things. Not every use of a knowledge representation algorithm counts as knowledge

representation for our taxonomy's purpose. For instance, Díaz-Agudo et al. (Díaz-Agudo et al., 2002) use a case-based knowledge representation ontology to construct their poems, but the knowledge is solely about rhyme, meter, and parts of speech. For our purposes, a poetry system belongs to the knowledge representation category if it is able to gather, represent, or reason about semantic information. This information can be gathered in several ways. A human can hand-code a semantic representation of a specific poem. An existing, general-purpose knowledge base can be used, such as WordNet or ConceptNet. Semantic information can be mined using implicit word associations in a general-purpose corpus such as Wikipedia. Or some form of knowledge collection can take place. We examine examples of each of these in turn.

3.2.2.1 Hand-coded knowledge representation

As discussed in Section 3.1.4.1, Manurung's (2000; 2012) McGonagall system creates English poems out of small hand-coded semantic propositions. Often the purpose of these representations is to try to reproduce a specific existing poem (Rahman and Manurung, 2011; Manurung et al., 2012). However, when McGonagall is allowed to generate its own propositions, it creates lines that have nothing to do with the human meanings of words. Therefore, McGonagall serves as a motivating example for the rest of the systems in this section. The use of data mining or of a large general-purpose knowledge base is intended as a means for the system to obtain new propositions without divorcing itself entirely from the meanings of words.

3.2.2.2 General-purpose knowledge bases

General-purpose knowledge bases are those that explicitly encode semantic knowledge in a machine-readable form. The difference between these and the hand-coded knowledge bases mentioned above is a matter of scale. Most knowledge bases such as WordNet are hand-coded by humans, but are meant for a general purpose, apply to every sufficiently common word in the language, and are released to the public or to other researchers; the researchers using these knowledge bases to create poetry are not the same researchers who created them. However, the potential utility of such knowledge bases for poetry is clear.

PoeTryMe (Oliveira, 2012) uses CARTÃO, a Portuguese knowledge base containing information about types of relations (such as hypernymy, part-of, causation, and purpose) between different words. PoeTryMe searches for uses of each of these relations in existing Portuguese poetry, and then uses the syntactic templates extracted from these poems to define rules about how to use each type of relation in a line. With a context-free grammar as the mere generation technique, and a group of seed words whose CARTÃO relations are meant to be used in the poem, PoeTryMe generates lines and poems based on these rules, and scores them according to their number of syllables and rhyme. Oliveira does not formally evaluate PoeTryMe, but notes that it often fails to meet its own rhyme and metrical requirements.

Agirrezabal et al. (2013) create poetry through template-filling in Basque. The templates are taken from existing poems. They try several ways of filling the templates, and evaluate their results with a modified Turing test. The poems getting the best results are changed as little as possible from the originals: only nouns are changed, and only to their antonyms or hypernyms, based on the use of Basque WordNet. Misztal and Indurkhya's previously mentioned system (2014) similarly uses English WordNet to constrain the choice of words for template filling.
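The following minimal sketch (in Python, assuming the NLTK library with its WordNet data downloaded, and working in English rather than Basque) illustrates the hypernym-substitution strategy just described: a noun from an existing line is replaced by one of its WordNet hypernyms, leaving the rest of the line intact. The example line and the hard-coded noun set are invented; a real system would use a part-of-speech tagger.

# Requires: pip install nltk, then nltk.download("wordnet") once.
from nltk.corpus import wordnet as wn

def hypernym_of(noun):
    """Return a WordNet hypernym of the noun, or the noun itself if none is found."""
    for synset in wn.synsets(noun, pos=wn.NOUN):
        for hyper in synset.hypernyms():
            return hyper.lemma_names()[0].replace("_", " ")
    return noun

line = "the cat watches the sparrow from the window"
nouns = {"cat", "sparrow", "window"}        # stand-in for POS tagging
edited = " ".join(hypernym_of(w) if w in nouns else w for w in line.split())
print(edited)   # each noun is replaced by a WordNet hypernym, e.g. "cat" -> "feline"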

Das and Gambäck's (2014) system co-creatively generates lines in Bengali to follow a line input by a user. The system uses a support vector machine to predict the syllable sequence pattern of the next line, and candidate output words matching the pattern are selected from a syllable-marked word list. The system is constrained to only use words that are directly related to at least one of the input words in the ConceptNet knowledge base. Since the equivalent of ConceptNet does not exist in Bengali, the words are first automatically translated into English, then looked up in English ConceptNet, then translated back. Das and Gambäck's system is evaluated through active testing by three expert and five non-expert evaluators, who found that the system did reasonably well at creating good rhyme patterns, but had more mixed results in grammar and meaning.
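
The constraint stage can be sketched as follows. The Bengali-English word mappings and the "ConceptNet" relations here are toy stand-ins for illustration; the actual system performs automatic translation and queries the real ConceptNet knowledge base.

# Toy stand-ins: a real system would call a Bengali-English translator
# and query the ConceptNet knowledge base.
TO_ENGLISH = {"nodi": "river", "akash": "sky", "pakhi": "bird"}
CONCEPTNET_RELATED = {
    "river": {"water", "bank", "flow", "bird"},
    "sky": {"cloud", "bird", "blue"},
}

def related_to_input(candidate, input_words):
    """Keep a candidate word only if it is directly related to at least
    one input word in the (English) knowledge base."""
    cand_en = TO_ENGLISH.get(candidate, candidate)
    for w in input_words:
        w_en = TO_ENGLISH.get(w, w)
        if cand_en in CONCEPTNET_RELATED.get(w_en, set()):
            return True
    return False

input_line_words = ["nodi", "akash"]          # words from the user's line
candidates = ["pakhi", "train", "cloud"]      # syllable-matched candidates
allowed = [c for c in candidates if related_to_input(c, input_line_words)]
print(allowed)   # ['pakhi', 'cloud']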

3.2.2.3 General-purpose data mining

A researcher might, for many reasons, choose to mine semantic data from a large corpus instead of using a hand-coded knowledge base. One might be interested in topics (such as the news) or types of relations that are not present in such a knowledge base, or one might be interested in data mining for its own sake, as a process that gives a little bit more autonomy to the poetry-writing system. Data mining from a large, general-purpose corpus enables the system to determine its own semantic representations which still have some connection to the human meanings and connotations of each word. Wong and Chun (Wong et al., 2008) present an English found poetry system that uses light knowledge representation. Their system begins with a condensed list of keywords based on the 500 most common words in haiku, and uses a blog search engine to compile a list of candidate sentence fragments containing those words. The knowledge representation aspect comes into play when choosing sentence fragments to compile into poems. Using a simple vector space model, the system chooses the most semantically related fragments. This model is based on search engine results rather than, as is more common with vector space models, the full contents of a large corpus such as the Wikipedia. Wong and Chun do not evaluate their results except by measuring the cosine of the angle between its lines in the vector space.

The snowy mountains
Search field
Of the honeymoon night
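
The fragment-selection step amounts to choosing the candidate whose term vector lies closest, by cosine similarity, to the material already selected. A minimal sketch, with invented term vectors standing in for the search-engine-derived ones:

import math

def cosine(u, v):
    # Cosine of the angle between two sparse term vectors (dicts).
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = lambda x: math.sqrt(sum(val * val for val in x.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0

chosen = {"snow": 2, "mountain": 3, "night": 1}
candidates = {
    "the frozen peak at dusk": {"frozen": 1, "peak": 2, "dusk": 1, "mountain": 1},
    "a spreadsheet of expenses": {"spreadsheet": 2, "expense": 2},
}
best = max(candidates, key=lambda frag: cosine(chosen, candidates[frag]))
print(best)   # "the frozen peak at dusk"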

Toivanen et al. (Toivanen et al., 2012) use knowledge representation to constrain the choice of words for template-filling. Rather than a formal knowledge base, Toivanen et al. use a background graph mined from the Finnish Wikipedia. A single word is given by the user as a topic, and additional words are chosen based on their log-likelihood of co-occurring in the Wikipedia with the topic word. This system is evaluated by a group of non-expert judges who evaluate the poem on several criteria: whether it is a poem or not, as well as its typicality, understandability, quality of language, evocation of mental imagery, evocation of emotions, and how much the judge likes it. Human poems were rated significantly higher than the system's poems on every criterion, but the differences were not large and the ranges of scores often overlapped. The biggest difference between the human- and computer-generated poems was in their understandability.

A later version of Toivanen et al.'s system, P.O.Eticus, is modified to create poems on specific topics from the English news (Toivanen et al., 2014). In this version, topics are generated not based on words, but on news articles. The log-likelihood with which words are chosen to fill the templates is calculated using a model that gives higher values for words that co-occur frequently in the chosen news article but not in the Wikipedia background corpus. Toivanen et al. do not formally evaluate P.O.Eticus, but instead make the interesting choice of providing 18 uncurated poems at the end of their paper so that individual readers can "decide for themselves".

Tobing and Manurung (2015) and Rashel and Manurung (2014b), like Toivanen et al., use specific news articles in English and Indonesian as a source of topical semantic information. Barbieri et al. (2012), discussed above, use links in the Wikipedia (rather than the full text of Wikipedia articles) to determine semantic relatedness. Ramakrishnan et al. (2010) use a hybrid approach, referencing a hand-coded ontology for Tamil nouns and verbs but mining an unnamed text corpus for adjectives. Netzer et al. (2009) take a similar hybrid approach in English. A topic keyword is input by the user, and is clustered with words that are associated with it in a hand-coded psychological word association database. A template is then selected, and phrases that fit the template and contain one of the words from the topic cluster are retrieved from Google N-Grams. Thus, different knowledge representation approaches are taken at different stages of creation.

Droog-Hayes and Wiggins' IDyOT system (2015) applies semantic data mining to an English Markov-based generator. A vector space model is used to identify clusters of related words from a large general-purpose corpus. The Markov model's weights are adjusted so that it is biased towards words that are close to the previous word in the vector space model. IDyOT is evaluated by displaying pairs of haiku to human judges—one generated by the pure Markov model, and the other generated using the vector-space-biased weights—and asking which haiku is more meaningful. The poems generated using vector-space-biased weights are, on average, significantly more meaningful.

The Poet's Little Helper system (Astigarraga et al., 2017) uses Latent Semantic Analysis, a vector space model applied to a general-purpose corpus, and combines sentences from the corpus (that is, using found poetry as its mere generation technique) based on their closeness in the vector space as well as rhyme and metrical constraints. Poet's Little Helper can be applied to a corpus in any language, so long as rhyme and syllabification information for that language is provided, but is initially used in Basque. It also does exploratory analysis of a corpus to inform the user how easy or difficult it will be to use the corpus to create poems. Astigarraga et al. do not formally evaluate Poet's Little Helper, but they mention that it does not meet their personal expectations.

McGregor et al.'s system (2016) uses a model that can dynamically generate topical subspaces of an existing vector space. The underlying large vector space model is based on the English Wikipedia, while the subspaces are generated based on input keywords. A second model represents the phonological properties of poetry by counting the co-occurrences of sounds in English sonnets. A third is a class-based n-gram model which represents the frequency with which parts of speech follow each other in an English sonnet, and a fourth represents sentiment by using a corpus of sentiments related to topics in telephone conversations. The system chooses topics based on the fourth model, and uses the first model to generate a set of conceptually related keywords for each one.
The second model is then used to generate a poem template, and words that are as close as possible to the words in the topical subspace are chosen to fill it. Finally, each word in the poem is given a score based on its semantic appropriateness to the fourth model and its phonological appropriateness to the second one. The least appropriate words are selectively removed and replaced with more appropriate words, until the poem converges on a maximal score. McGregor et al.'s system therefore combines a mined semantic model with the targeted editing-based optimization of COLIBRI and WASP. McGregor et al. then evaluate the system using a survey in which participants rate the poems on creativity, meaningfulness, and quality. Three different experimental groups are used in which each group gets a different type of framing information. Regardless of group, the system's overall scores on all three criteria are relatively negative (all below 3.5, and mostly below 3, on a Likert scale ranging from one to seven). However, standard deviations are high. McGregor et al.'s work seems to be an example of a system with an interesting and relatively sophisticated Process, but which still has, at best, mixed results in terms of Product:

and wondered but talked me shifty Sinatra like hang says in current or that four man because this full gets really there makes both another golden way though your man

Gervás's SPAR system (2017) also uses a large vector space model, constructed from a corpus of English adventure novels from the public domain. Based on a user-provided seed word, it builds a set of keywords related to the seed word in the vector space. It then searches the original corpus for phrases of a suitable length that contain both a keyword and a potential rhyming word. Using these phrases and a Markov model, it then constructs candidate phrases that connect rhyming and topical words, and joins them together in a way that satisfies its rhyme scheme. Like WASP, SPAR is evaluated through its Press-based cultural success, with the poems being exhibited in a Spanish poetry festival. Gervás also evaluates SPAR through ad hoc automatic measurements meant to capture thematic cohesion and enjambment, two features upon which Gervás is especially focused with this system. SPAR outperforms samples from Full-FACE (Colton et al., 2012), Toivanen et al.'s system (2012), and PoeTryMe on both metrics, but does not approach the scores of human-authored samples. Stereotrope (Veale, 2013a) is also tested, and outperforms both SPAR and human poems on thematic cohesion, but its score for enjambment is zero, since each line in a Stereotrope poem is always a complete sentence.

The Poem Machine (Hämäläinen, 2018) generates poems in Finnish using semantic bigram data from the Finnish Internet Parsebank. The Poem Machine's goal is to bring standard practices from other domains of natural language generation into computational poetry: the typical four-stage NLG pipeline consists of content determination, sentence planning, surface generation, and morphology and formatting. During content determination and sentence planning, the Poem Machine selects words which have a metaphorical relation to each other: this is done by choosing words from certain parts of speech that appear together more than a minimum number of times, but less than a maximum number of times, in the data set. The remaining steps are done using a syntactic tool that the author developed himself for generating morphologically correct sentences in Finnish. This is similar to template generation, but somewhat more flexible, and allows words to be lemmatized and transformed into the appropriate case based on Finnish grammatical rules. The Poem Machine is evaluated by the same method as P.O.Eticus (Toivanen et al., 2014) and compared to a prior approach in which fragments of existing Finnish poems are combined. The non-expert judges agreed unanimously that The Poem Machine's outputs are poems. Additionally, The Poem Machine outperforms the prior approach on typicality and on how good the language is, but does worse than the prior approach on understandability.
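
The Poem Machine's word-selection rule during content determination (keep word pairs that co-occur more than a minimum but less than a maximum number of times) can be sketched as follows; the counts and thresholds are invented for illustration.

# Invented co-occurrence counts; the real system reads these from the
# Finnish Internet Parsebank.
cooccurrence = {
    ("moon", "silver"): 120,
    ("moon", "cheese"): 7,
    ("moon", "the"): 90000,
    ("heart", "stone"): 45,
}

def metaphorical_pairs(counts, low=5, high=500):
    """Pairs that appear together often enough to be meaningful,
    but not so often that the combination is merely conventional."""
    return [pair for pair, n in counts.items() if low < n < high]

print(metaphorical_pairs(cooccurrence))
# [('moon', 'silver'), ('moon', 'cheese'), ('heart', 'stone')]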

3.2.2.4 Bespoke data mining

In addition to representing semantic relations explicitly or mining them from a general-purpose corpus, some researchers also take pains to mine specific types of data that are especially relevant to their poetic goals.

The full-FACE system (Colton et al., 2012), discussed in Section 3.1.4.1, uses a corpus of English simile data gathered by the researchers. Similes mined from Google N-grams are processed into propositional logic statements about semantic properties of things, which are then combined into new similes from which the poetry system draws to fill its templates. Colton et al. do not formally evaluate their system, although it is emphasized that the system is designed to fill the requirements of the FACE model. Veale's Stereotrope system (2013a) also focuses on simile and metaphor. Stereotrope searches Google N-grams for the properties and actions that are stereotypically associated with concepts in English, such as "cops eat donuts". It encodes these properties propositionally and then searches for pairs of concepts that have similar or contrasting stereotypical properties. If two concepts are associated with similar properties, then one can be used as a metaphor for the other. Similarly, contrasting pairs of properties can be used to highlight the tension or contradiction between aspects of a concept. The Stereotrope system chooses a central metaphor for its poems, and then fills a series of line templates with words describing the conceptual properties underlying the metaphor. For example, the following poem revolves around a comparison between "marriage" and "prison":

The legalized regime of this marriage
My marriage is an emotional prison
Barred visitors do marriages allow
The most unitary collective scarcely organizes so much
Intimidate me with the official regulation of your prison
Let your sexual degradation charm me
Did ever an offender go to a more oppressive prison?
You confine me as securely as any locked prison cell
Does any prison punish more harshly than this marriage?
You punish me with your harsh security
The most isolated prisons inflict the most difficult hardships
O Marriage, you disgust me with your undesirable security
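
The property-matching step that grounds such metaphors can be sketched as follows; the property sets below are invented examples rather than Veale's mined data.

# Invented stereotypical properties; Stereotrope mines these from Google N-grams.
properties = {
    "marriage": {"binding", "intimate", "legal", "secure"},
    "prison":   {"binding", "legal", "secure", "punishing"},
    "picnic":   {"relaxed", "brief", "outdoor"},
}

def metaphor_candidates(props, min_shared=2):
    """Concept pairs that share enough properties to ground a metaphor."""
    concepts = sorted(props)
    pairs = []
    for i, a in enumerate(concepts):
        for b in concepts[i + 1:]:
            shared = props[a] & props[b]
            if len(shared) >= min_shared:
                pairs.append((a, b, shared))
    return pairs

for a, b, shared in metaphor_candidates(properties):
    print(f"{a} as {b}: {sorted(shared)}")
# marriage as prison: ['binding', 'legal', 'secure']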

Veale evaluates the Stereotrope system by measuring the pointwise mutual information of its N-grams, rather than the poems themselves.

Data mining directly from poetry, rather than from a corpus of general knowledge, is also possible. Yan et al.'s (2013) iPoet begins with a set of user-specified keywords in Chinese. iPoet retrieves a set of poems from a database which contain these keywords, and ranks them on their relevance and importance based on term frequency. Latent Dirichlet allocation is used to cluster the terms from each poem. iPoet then assigns a cluster to each line of the poem to be generated, and uses a generative summarization algorithm to compress the meaning of each of these clusters into a single meaningful line. The lines are constrained for number of characters, rhyme, and tonal pattern. Yan et al. evaluate iPoet using ROUGE, an evaluation methodology for summarization and translation; they also survey quasi-expert humans about the generated poems' fluency, rhyme, coherence, and meaning. iPoet's poems outperform four other sets of generated poems, including randomly generated poems, with which it is surveyed.

Oliveira and Alves (Oliveira and Alves, 2016) expand PoeTryMe to work with an English tool called TextStorm, which creates customized concept maps by parsing a written document. These customized maps are then used, instead of the general-purpose maps of CARTÃO, to generate a poem describing the information in whatever document is specified by the user. PoeTryMe is evaluated using the cumulative interpretation of the FACE model; Oliveira and Alves argue that it meets all four of the FACE model's criteria. However, they mention that the contents of the concept maps are not as good as they could be, as TextStorm does not perform anaphora resolution or named entity recognition.

why ask my tower? that old sight will swear
a name of weight; line little meter heir
thus the great people of almighty year
and elysées, and street shall disappear

3.2.3 Neural networks

We now turn to the topic of poetry generated using neural networks. As discussed in Section 3.1.4.3, neural networks are a popular area of research which are difficult to place into this taxonomy, due to their “black box” nature. However, in practice, most poetry neural networks contain both knowledge representation and optimization. Zhang and Lapata (Zhang and Lapata, 2014) generate classical Chinese quatrains based on a user- supplied keyword. Their system uses the ShiXueHanYing phrase taxonomy, a specialized word clustering created for Chinese human poets, to associate the supplied keyword with other words and phrases. It creates all possible lines combining these phrases that satisfy their tone and rhyme constraints, and then uses a recurrent neural network trained on other Chinese poems to select the phrase that is most likely to appear. The second, third, and fourth lines are similarly selected to be most likely based on the first line. They evaluate their system automatically using the machine translation evaluation metric BLEU, and manually by asking a group of experts in Chinese poetry to rate the poems’ fluency, coherence, meaningfulness, and poeticness as well as ranking the generated poems relative to each other. The experts are also given human-written poems and poems generated by other Chinese systems. Zhang and Lapata’s system significantly outperforms all the systems in the experiment except the human ones. Goodwin (Goodwin, 2016), as mentioned in Section 3.1.4.3, uses an LSTM-RNN to generate English poetry. Unlike many of the other neural networks discussed in this section, Goodwin’s system does not use a multi-part architecture, instead simply generating characters and using the long-term memory property of an LSTM-RNN to avoid the incoherence of Markovian character generation. Goodwin’s system has been used on modern poetry as excerpted in Section 3.1.4.3, but also on 19th-century poetry as well as image captions, dictionary definitions, and other forms of text. Wang et al. (Wang et al., 2016b) generate Chinese Song iambics, a form of poetry with lines of variable length but strict rhythm. They use an attention-based model, in which an encoder neural net and a decoder neural net convert lines of poetry back and forth into a hidden, semantic representation. The current status of the decoder (i.e. the previous character), plus the status of all the hidden variables, are used to predict the next character. A relevance factor, for which the attention-based model is named, also helps direct the model to the most relevant hidden variable at a given time. The encoding involves the use of a vector-based semantic model to reduce data sparsity, and the decoder is constrained to only select characters which fit the rhyme and tone constraints. Wang et al. test their system with BLEU, and by asking human experts to rate poems’ fluency, poeticness, and meaningfulness. It outperforms two other models, including Zhang and Lapata’s (2014), on all three metrics, and even outperforms human poems on poeticness, but human poems are significantly more fluent and meaningful.

Yi et al. (2016), like Zhang and Lapata, generate Chinese quatrains based on an input word. Like Wang et al., they use an encoder-decoder model with an attention mechanism. Three separate models of this nature are used: one to write a first line based on the input word, one to write the second line based on the first line, and one to write the subsequent lines based on the two lines preceding them. They test their system by using BLEU and by asking human experts to rate the poems' fluency, coherence, meaningfulness, poeticness, and a general impression. Compared against other systems and human poems, the system outperforms the other systems on all five metrics, but not the humans.

Ghazvininejad et al. (2016) use a recurrent neural network to generate poetry in English. Their system, Hafez, also begins with a user-supplied keyword. It uses this keyword and a word2vec semantic model trained on Wikipedia to generate a list of topically related words. Next, it sorts the topically related words into end rhyme classes. For each rhyming pair of lines in the poem, it chooses the rhyming pair of words that are most closely semantically related to the keyword. Hafez then builds a large finite-state acceptor containing all possible English word sequences that end with the chosen words, obeying the constraints of meter and ending with a comma or period. The number of possible paths through this acceptor is fantastically large, so one version of Hafez uses its recurrent neural network at this step, training it on the word sequences of a large corpus of English songs and using it to guide a beam search that selects the most likely sequences. The likelihood is further modified with penalties and bonuses to discourage repeated words and encourage topical ones. A different version also uses an encoder-decoder model more similar to the ones used by Yi et al. (2016) and Wang et al. (2016b), which is trained to generate English song lyrics based on their rhyming words. Ghazvininejad et al. use human judges to compare one version of the system to another. This testing confirms that Hafez generates better poetry when the bonuses to encourage topical words are in place, and that the encoder-decoder model performs better than the beam-search-only model.

Civil War

Creating new entire revolution,
An endless nation on eternal war,
United as a peaceful resolution,
Or not exist together any more.
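
A heavily simplified sketch of Hafez's re-scoring idea during beam search follows: each candidate continuation receives a language-model score (a stand-in function here), adjusted by a penalty for repeated words and a bonus for topical ones. The weights and the toy scoring function are our own assumptions.

import math

def lm_logprob(sequence, word):
    # Stand-in for the recurrent-network language model score.
    toy_freq = {"war": 5, "peace": 4, "nation": 3, "banana": 1}
    return math.log(toy_freq.get(word, 1)) - 0.1 * len(sequence)

def adjusted_score(sequence, word, topic_words,
                   repeat_penalty=2.0, topic_bonus=1.0):
    score = lm_logprob(sequence, word)
    if word in sequence:
        score -= repeat_penalty          # discourage repeated words
    if word in topic_words:
        score += topic_bonus             # encourage on-topic words
    return score

def beam_step(beams, vocabulary, topic_words, beam_width=2):
    """Extend each partial line by one word and keep the best few."""
    expanded = []
    for seq, score in beams:
        for w in vocabulary:
            expanded.append((seq + [w], score + adjusted_score(seq, w, topic_words)))
    expanded.sort(key=lambda x: x[1], reverse=True)
    return expanded[:beam_width]

beams = [(["eternal"], 0.0)]
vocab = ["war", "peace", "banana", "eternal"]
print(beam_step(beams, vocab, topic_words={"war", "nation"}))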

Schlegel et al.'s (2018) G-Rap system contains multiple long short-term memory RNNs, each of which produces the next line of an improvised English rap lyric based on user input. At each step, the user indicates which lyric they prefer, which can be used as feedback to the RNNs as well as providing a measure over time of which network is performing best. Schlegel et al. do not formally evaluate G-Rap, instead viewing it as an example of how different RNN structures built for the same task can be compared against each other.

For the love of money dollar bills
We just getting started dont panic
When I wake up in the morning
I can see the sun come up

Xu et al. (2018) apply the attention-based encoder-decoder networks which have been successful in previous poetry systems to the novel task of generating Chinese poetry from an image. Both visual features and linguistic keywords are extracted from the raw visual data of the image using a convolutional neural network before being fed into the encoder-decoder. In addition to an attention model, the encoder-decoder also maintains a latent representation of a topic which is derived from the group of keywords and held constant throughout the generation process. The poem is then generated character-by-character, based on the previous character, the attention model, and the topic. How to train this neural network for its novel task is a pertinent question. Xu et al. construct a large database of image-poem pairs by using an existing visual keyword extractor on a large database of images from the Internet and matching them automatically to the keywords of poems generated by Zhang and Lapata's (2014) system. After being trained on these image-poem pairs, Xu et al.'s system is evaluated by asking humans to judge the poems' poeticness, fluency, coherence, meaning, and consistency with the inspiring image. The system is judged against two partial versions of itself—one without the visual features, and one without the keywords—and three other poetry systems, including Zhang and Lapata's (2014) and Wang et al.'s (2016a). Image-poem consistency is also evaluated automatically by computing the recall rate of key concepts from each image. The full version of the system outperforms the other systems surveyed on all metrics.

Similarly, Loller-Andersen and Gambäck (2018) train a neural network to generate visual image-inspired poems in English. Their approach uses an existing convolutional neural network to identify objects in an image. ConceptNet is used to find lists of words related to the inspiring objects, which are then searched for rhyming pairs. The system then uses a technique similar to that of Hafez, in which a tree representation of many possible lines ending with the chosen rhyming pairs is generated, and the path through the tree is chosen which has the lowest weights based on connections between words in an LSTM trained on song lyrics. Loller-Andersen and Gambäck recruit non-expert judges to rate the system's grammaticality, poeticness, and meaningfulness, as well as performing a modified Turing test. Loller-Andersen and Gambäck do not evaluate these results for statistical significance or compare the results for grammaticality, poeticness, and meaningfulness to the results for a control group. They state that their results are inconsistent but sometimes good, and that poeticness is rated higher than the other two criteria.

The sun is in my big
I dont know what to do scapegoats
Im raining and it looks like rain
Theres so much for me to abstain
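
The rhyming-pair step (collect words related to the detected objects and keep the pairs that rhyme) can be sketched with a crude letter-suffix rhyme test; the related-word lists are invented, and a real system would compare phonemes rather than spellings.

from itertools import combinations

# Invented lists of ConceptNet-related words for two detected objects.
related = {"sun": ["light", "day", "sky", "ray"],
           "rain": ["cloud", "grey", "storm", "abstain"]}

def crude_rhyme(a, b, suffix_len=2):
    # Toy test: same final letters; a real system compares pronunciations.
    return a != b and a[-suffix_len:] == b[-suffix_len:]

words = sorted({w for ws in related.values() for w in ws})
rhyming_pairs = [(a, b) for a, b in combinations(words, 2) if crude_rhyme(a, b)]
print(rhyming_pairs)   # [('day', 'ray')]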

Yang et al. (2018) generate Chinese quatrains using a pair of encoder-decoder systems. Based on a user-provided query which can be anything from a word or sentence to a full document, the first encoder-decoder generates a series of four latent variables representing keywords for each line of the poem. The second translates from this keyword-based outline to a full poem. Yang et al.'s second encoder-decoder is trained not only "horizontally" (i.e. on the lines that come before and after a given line) but also based on "vertical" slices (i.e. a set of four characters taken from the nth character in each line) encoded in augmented word2vec, so as to encourage parallelism and rhythm between lines. This is a technique which is easier to do in Chinese quatrains than in some other languages and forms, since each line in a Chinese quatrain contains the same number of characters and each character has a discrete semantic meaning. Yang et al.'s system is evaluated by comparing it to Wang et al.'s (2016b) system and to partial versions of itself on a number of automated metrics, including BLEU and some other neural network-specific metrics, and a measure of Yang et al.'s devising which captures the rhythmic and tonal rules of Chinese quatrains. Yang et al. also survey human quasi-experts about their poems, asking about readability, consistency, aesthetic feeling, evocation of emotions, and an overall score. Their poems average above 3 (on a Likert scale of 1 to 5, with 5 being better) on all metrics, and 73% of individual poems have scores above 3. However, the human portion of the evaluation does not contain a control group to compare the poems against, nor does it otherwise test the significance of its results.

3.2.3.1 Twitter Poetry

We now turn to an area that more directly inspires our current research: poetry made from or with Twitter. Many inspiring examples in this area are mere generation. For example, Pentametron (Bhatnagar, 2012) is a found poetry system that posts pairs of rhyming English tweets from Twitter in iambic pentameter. Aside from measuring the rhyme and meter of each tweet, no optimization or use of knowledge is performed.

Cats fucking love lasagna. Holy shit.
I cannot stand a fucking hypocrite :(
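
A sketch of the kind of check involved, using the pronouncing package (a wrapper around the CMU Pronouncing Dictionary) to read off stress patterns. The matching rule below, with monosyllables treated as wildcards, is our own simplification rather than Bhatnagar's exact rule.

import re
import pronouncing   # pip install pronouncing

def stress_pattern(text):
    """One stress digit per syllable; monosyllables become wildcards ('x')
    since their stress depends on context. Returns None if any word is
    missing from the dictionary."""
    pattern = ""
    for word in re.findall(r"[a-zA-Z']+", text.lower()):
        phones = pronouncing.phones_for_word(word)
        if not phones:
            return None
        stresses = pronouncing.stresses(phones[0]).replace("2", "1")
        pattern += "x" if len(stresses) == 1 else stresses
    return pattern

def is_iambic_pentameter(text):
    pattern = stress_pattern(text)
    return (pattern is not None and len(pattern) == 10 and
            all(p in (t, "x") for p, t in zip(pattern, "0101010101")))

print(is_iambic_pentameter("I cannot stand a fucking hypocrite"))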

The Longest Poem in the World (Gheorghe, 2013) performs an even simpler calculation, only posting pairs of English tweets with the same end rhyme:

i turn 21 in 7 days
Trying to get through the phases, going through a phase

Wood’s (2013) Tweet Haiku bot searches Twitter for English tweets that have the appropriate number and grouping of syllables to be reformatted as a haiku:

All we have in this house is dark chocolate. This is indeed a problem.
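
The underlying test (can the words of a tweet be split, in order, into groups of 5, 7, and 5 syllables?) can be sketched as follows, with a crude vowel-counting heuristic standing in for a pronunciation dictionary and an invented example sentence.

import re

def count_syllables(word):
    # Crude heuristic: count vowel groups, ignoring a silent final 'e'.
    word = re.sub(r"e$", "", word.lower())
    return max(1, len(re.findall(r"[aeiouy]+", word)))

def as_haiku(text, shape=(5, 7, 5)):
    """Greedily split the words into 5/7/5-syllable lines, or return None
    if the syllable counts cannot be made to fit."""
    words = re.findall(r"[a-zA-Z']+", text)
    lines, i = [], 0
    for target in shape:
        line, total = [], 0
        while i < len(words) and total < target:
            total += count_syllables(words[i])
            line.append(words[i])
            i += 1
        if total != target:
            return None
        lines.append(" ".join(line))
    return lines if i == len(words) else None

print(as_haiku("dark winter morning the coffee pot is empty nobody told me"))
# ['dark winter morning', 'the coffee pot is empty', 'nobody told me']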

Poetweet (b arco cultural centre, 2013) generates poems based on the tweets of a specific Twitter account, thus allowing Twitter users to create a personalized poem out of their own activity. It captures fragments of tweets which have the correct rhyme and meter for a poetry form that the user chooses:

That Goes Like This" from Spamalot.
Is good because I am LE TIRED.
I think about this problem a lot.
Of course - I'd be honoured!

Twitter poetry also appears on occasion in the academic world. Mobtwit (Hartlová and Nack, 2013) uses tweets selected based on their location, with the intent of writing poems that summarize current happenings in a specific city. The poems are also optimized for a positive, negative, or mixed sentiment; a machine learning algorithm classifies tweets by sentiment in this way, training itself by measuring which words are associated with happy or sad emoticons.
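
The self-labelling trick (use emoticons as noisy sentiment labels and learn which words co-occur with them) can be sketched as a simple count-based classifier; the training tweets and smoothing constant are invented.

from collections import Counter
import math

# Invented training tweets; emoticons provide the (noisy) labels.
tweets = [
    ("great sunny day at the park :)", "pos"),
    ("love this city so much :)", "pos"),
    ("stuck in traffic again :(", "neg"),
    ("so tired of this rain :(", "neg"),
]

counts = {"pos": Counter(), "neg": Counter()}
for text, label in tweets:
    words = [w for w in text.split() if w not in (":)", ":(")]
    counts[label].update(words)

def sentiment_score(text, alpha=1.0):
    """Positive score means the words look more like :) tweets
    (naive-Bayes-style log-odds with add-alpha smoothing)."""
    score = 0.0
    for w in text.split():
        p = counts["pos"][w] + alpha
        n = counts["neg"][w] + alpha
        score += math.log(p / n)
    return score

print(sentiment_score("sunny day in the city"))   # > 0
print(sentiment_score("rain and traffic"))        # < 0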

Charnley et al. (2014) build a Twitter poetry generator in English as an illustration of their FloWr framework for computationally creative software design. The software selects an uncommon adjective with a negative emotional valence from a pre-built dictionary, and then searches Twitter for tweets containing this adjective. Keyphrases are extracted from the tweets. Tweets with inappropriate content, such as specific people's names, unpronounceable words, or strong profanity, are removed, as are duplicate tweets and tweets that do not contain a personal pronoun. A set of rhyme and meter-matching modules then groups the tweets into pairs in which all tweets end with the same sound and have the same number of syllables. These pairs are then brought together into a form specified by a template, optimizing for the tweets with the most negative emotions. Enjambment is also added to the lines. This generator is not formally evaluated, as it is intended merely as an illustration of the FloWr framework's functionality.

I hate the basement level of buildings. You always lose reception and its always quiet and eerie. This doesn’t quite capture the eerie pink glow of this morning. Is pop culture satanic? In a spiritual (not religious) sense? I dont really know. But man, there are some eerie parallels. It’s concerning. I find it very eerie when someone is tinkering with your teeth and telling jokes. Or is that just me?

Oliveira (2017a) builds a Portuguese Twitter poetry generator, O Poeta Artificial, with the PoeTryMe architecture. Like prior implementations of PoeTryMe, it uses a context-free grammar filled with words that are related, in a given semantic knowledge base, to a user-supplied seed word. O Poeta Artificial selects seed words by reading the most recent Portuguese tweets about a trending news item, extracting the content words from each tweet, and using the most commonly appearing of these words as its seeds, sometimes supplementing with related words from the Wikipedia. An earlier version of O Poeta Artificial experienced issues because the most relevant words to a trending news item were often slang or hashtags which did not appear in its knowledge base. The later version addresses these issues through several strategies. Content words' frequencies, in the selection process, are divided by their frequencies in a large general-purpose Portuguese corpus, using a tf/idf-like strategy to emphasize the most topical words. In addition to the context-free grammar, the later version of O Poeta Artificial also makes use of text fragments cut directly from trending tweets, sometimes with paraphrase techniques applied, and of templates designed to highlight the trend and some of its extracted semantic relations. O Poeta Artificial is not formally evaluated.

3.2.4 Conclusion

We have now seen in-depth examples of poetry systems illustrating what can be done in different parts of our taxonomy, including systems that specifically inspire the work we will do in the next sections. Although this is not a full list of all poetry generation systems that have been published academically in the past several years, it is enough to be a representative sample. We can infer a few things from our look at this sample. First, poetry systems have been reasonably successful in many corners of the taxonomy; there is no one true way, at this stage in the state of the art, to create a system. However, within a category, some approaches may appear to be more fruitful than others. For instance, in neural network poetry, there is a strong trend towards attention-based encoder-decoder systems as the most effective means of generation.

We have also looked at the different ways that existing poetry systems have been evaluated by their authors. We can see that, despite the emphasis in recent years on the need to formally evaluate creative systems, many researchers still do not formally evaluate their poetry systems. Others use methods whose effectiveness is questionable, such as modified Turing tests or surveys of non-experts, or ad-hoc tests whose validity is in question. However, many poetry researchers do use reasonably good evaluation methods. In particular, it is notable that the Chinese neural network poetry community has developed a somewhat consistent standard of evaluation, in which both the system being developed and relevant previous systems (and, often, human poetry) are evaluated both on an automatic machine learning metric and on an expert survey with fluctuating but somewhat consistent criteria. As these are the closest to standardized evaluations existing so far in computer-generated poetry, it is also notable that each system tested on them seems to consistently outperform the systems that came before. This may be an indication of consistent progress in this corner of the field, or perhaps of careful selection of systems to compare one's work against, or an unwillingness to publish when there are negative results.

When systems are not evaluated formally, one might intuitively suspect the researchers of trying to fool readers into thinking that their system is effective without evidence. But in practice, in systems that are not evaluated, there is a range of informal opinions by the researchers, including researchers who see their systems as underperforming, and researchers who see evaluation as irrelevant because their poetry system is a proof of concept, or a demonstration of one particular poetry-related ability, rather than a serious attempt at making finished poetry that will pass a Product evaluation. Although this section is not a formal survey or meta-analysis, perhaps these mixed examples can begin to shed light on some of the reasons why researchers, in practice, do not always follow the evaluation guidelines that are set out in theory.

Chapter 4

Our experiments in poetry evaluation

We have now surveyed enough work in computational creativity, and in poetry specifically, to have an idea of where some of the low-hanging fruit lies in this research field. In Chapter 5 we will discuss our actual attempts to build a poetry system, but in the current chapter, we attempt to make inroads on a more abstract concern: how to develop an evidence-based, domain-specific method of evaluating computational poetry. In particular, we will explore Product-based evaluations of poems.

A few specific research questions are addressed in this chapter. To what extent can non-expert humans reliably evaluate poems at all? Are existing domain-general Product metrics appropriate for poetry? How do expert humans evaluate poems, and can we extrapolate from this to develop better ideas for evaluating computational poetry? To investigate these questions, we perform two exploratory studies. First, we study what happens when non-expert humans use computational creativity metrics to evaluate human poems. Second, we study what happens when quasi-expert humans are asked to evaluate computer-generated poems, without being restricted to a specific set of metrics, and we derive our own tentative poetry evaluation metric from their comments.

It should be noted that, rather than cleanly completing Chapter 2, then Chapter 3, then this chapter, then Chapter 5, our research has actually meandered back and forth between these areas of research. Some studies in this and the following chapter, especially in early sections of these chapters, were done before we had finished our thorough survey of related work in computational creativity. We will point out, where necessary, aspects of our experimental research that could have been improved if we had already known at the time all the theoretical background that we presented in Chapters 2 and 3.

4.1 Human competence in evaluating poetry

4.1.1 Introduction

In our first study, we are interested in what happens when Product-based computational creativity metrics are applied to human-generated poetry. Intuitively, this serves as a type of preliminary validity test for these metrics. Assuming that there is some sort of ground truth about human-generated poetry - that is, that we can select some human-generated poems that we are quite sure are creative, and some others that we are quite sure are not as creative - then a metric that purports to measure creativity, by measuring the attributes of a creative product, should be able to identify the more creative poems. We test this idea by having non-expert human judges rate more- and less-creative human poems on four sets of criteria: novelty and value, typicality and quality (Ritchie, 2001), the Creative Tripod (Colton, 2008), and the IDEA model (Colton et al., 2011).

For our ground truth, we selected the published work of professional poets in the prestigious magazine "Poetry" as our more creative group. As our less creative group, we selected poems that were posted by beginners in an online poetry critique forum and that received negative comments from the forum's moderators. For a third, "medium" group, we selected poems that were published in more obscure magazines that pay authors a token amount. In other words, we made our selections by assuming that the Press evaluation of poetry (its commercial success and reception by editors/critics) is a reasonable stand-in for the poetry's creativity on a Product level.

To our surprise, we find that non-expert human judges are very bad at judging human poetry. In fact, on most sets of criteria, they significantly prefer the less-creative poetry. Thus, in poetry, it may be even more important to rely on experts than in other domains. We suspect that novices have more difficulty understanding the more creative poems. We tentatively support this suspicion by repeating the study with poems intended for children, for which the reverse preference effect goes away, although the non-experts still could not significantly distinguish more-creative children's poems from less-creative ones. We also find that the specific metric used (value, novelty, appreciation, skill, etc.) does not seem to make much difference to this result; most of the metrics are moderately or highly correlated with each other.

A possible objection to this method of inquiry is the argument, discussed in Section 2.7.2, that the products of computational creativity should not be measured by human standards. Therefore, the reverse is also true, and one should not expect the standards for computational creativity to apply to humans. We remain unconvinced by this argument for two reasons. First, if computational creativity does not have to be anything like human creativity, then we are left without a definition of creativity by which to evaluate our work at all—or, perhaps, an alternate definition that has no demonstrable link to what we mean by the word "creative" in any other context. Second, we have chosen methods of evaluation that do not require computational and human products to be exactly the same. Some methods of evaluation, such as the modified Turing test, do have a problem with encouraging close pastiche over other forms of creativity. But our four chosen sets of criteria are all relatively broad. If we define creativity as novelty and value, for instance, then it is possible that a creative human might be novel and valuable in one way, and a creative computer might be novel and valuable in another. Thus, our method does not require human and computational creativity to be exactly the same: it only requires that they be similar enough that the same basic evaluative concepts should apply to both. As far as we know, other validity testing of Ritchie's criteria, the Creative Tripod, and the IDEA model has not yet been done. That is, while each of these models appears plausible, it has not been empirically and falsifiably tested whether or not they work to separate more creative products from less creative products in practice. While our experiment is not a full validity test, we consider it an important preliminary step in that direction.

Our research suggests that the implementation of popular creativity metrics is less straightforward than intuition would suggest, and that using non-expert judges can not only produce less robust results, but can produce actively misleading ones. In a research climate where many poetry evaluations are still done by non-experts or groups of judges undifferentiated by expertise (Toivanen et al., 2012; Rashel and Manurung, 2014b; Barbieri et al., 2012; Singh et al., 2017; Hayes and Wiggins, 2015; McGregor et al., 2016; Loller-Andersen and Gambäck, 2018), or by the researchers themselves, who are not necessarily poetry experts (Tobing and Manurung, 2015; Misztal and Indurkhya, 2014; Oliveira, 2012; Astigarraga et al., 2017; Colton et al., 2012; Oliveira and Alves, 2016), these results are quite important. A prior version of this section appeared at the Sixth International Conference on Computational Creativity (Lamb et al., 2015a).

Ritchie's model
• This resembles other poems I have read. (Typicality)
• This is a high quality poem. (Quality)
• I don't think this is a very good poem. (Quality, reverse coded)
• This is not a poem. (Typicality, reverse coded)

Novelty and value
• This is a high quality poem. (Value)
• This poem is not like other poems I have seen before. (Novelty)
• I don't think this is a very good poem. (Value, reverse coded)
• This poem is clichéd. (Novelty, reverse coded)

Colton's Creative Tripod
• The author of this poem seems to have no trouble writing poetry. (Skill)
• The author of this poem is imaginative. (Imagination)
• The author of this poem understands how poetry works. (Appreciation)
• The author of this poem isn't very good at writing poetry. (Skill, reverse coded)
• The author of this poem isn't bringing anything new or different into the poem. (Imagination, reverse coded)
• The author of this poem doesn't really know anything about poetry. (Appreciation, reverse coded)

IDEA model
• I like this poem. (Wellbeing)
• I am willing to spend time trying to understand this poem. (Cognitive Effort)
• This poem makes me unhappy. (Wellbeing, reverse coded)
• This poem is not worth bothering with. (Cognitive Effort, reverse coded)

Table 4.1: The questions used in our study for each of the four evaluation metrics tested.

4.1.2 Experiment I

4.1.2.1 Method

We tested 4 common metrics for creativity evaluation: novelty and value, Ritchie’s typicality and quality model (Ritchie, 2001), the Creative Tripod (Colton, 2008), and the IDEA model (Colton et al., 2011). These metrics are easy to test on human poetry since they focus on the creative product and not on the process of its creation. Since none of these metrics have been put into a standardized questionnaire form, we constructed our own agreement scale for each, with responses ranging from 4 (strongly agree) to -4 (strongly disagree). The questions used are shown in Table 4.1.

Focus on the shapes. Cirrus, a curl, stratus, a layer, cumulus, a heap.

Humilis, a small cloud, cumulus humilis, a fine day to fly.

Incus, the anvil, stay grounded. Nimbus, rain, be careful,

don’t take off near nimbostratus, a shapeless layer

of rain, hail, ice, or snow. Ice weighs on the blades of your propeller,

weighs on the entering edge of your wings. Read a cloud,

decode it, a dense, chilly mass

can shift, flood with light. for clouds closing under you,

the sky opens in a breath, shuts in a heartbeat.

Figure 4.1: A sample poem from the Good dataset: "Flying Lesson" by Dolores Hayden.

Magazine            URL
Abyss & Apex        http://www.abyssapexzine.com/
Amethyst Arsenic    http://www.amethystarsenic.com/
Astropoetica        http://www.astropoetica.com/
GlassFire           http://www.peglegpublishing.com/glassfire.htm
Goblin Fruit        http://www.goblinfruit.net/
Ideomancer          https://www.ideomancer.com/
Neon                http://neonmagazine.co.uk/
Punchnels           http://www.punchnels.com/
Raleigh Review      https://www.raleighreview.org/
Rawboned            http://rawboned.org/
Silver Blade        https://www.silverblade.net/
Strong Verse        http://www.strongverse.org/
Through the Gate    http://throughthegate.net/
Vine Leaves         http://www.vineleavesliteraryjournal.com/
Writing Tomorrow    http://writingtomorrow.com/

Table 4.2: Full list of the 15 magazines used in our Medium dataset. Note that these magazines were accessed in 2014; as of the completion of this thesis in 2018, many are now defunct, and some of these URLs no longer function. Given the short lives of most minor literary markets, this is not unexpected.

I’m convinced stars dream of being planets that they would trade each roiling flare to taste the heft and feel the quake of granite;

to have their scalding plasma winds, each rife with tortured particles, replaced by winds supporting birds and bats and bugs above a scenery of life.

What might their fortunes be if stars could roll the cosmic dice? Most likely, they’d metamorphosize to silent, pockmarked rocks or coalesce as spheres of jellied gas or frothy ice.

The universe respects no guarantees, which may explain the mystery of why the stars continue burning in their vacuum seas,

resembling people in that way, safe in what they are and what they have, if not content, galled by millennia of constant chafe.

I wonder if our tortured Earth, within its roiling heart of molten sunfire, ever dreams it could escape us for eternity and be reborn a star?

Figure 4.2: A sample poem from the Medium dataset: "Stars Dream" by Craig W. Steele.

To die seems quite an adventure
One i will wait to see
Though death i do not desire
Nor dead i wish to be

The folds of hell seem far too deep
The gates of heavn too vast
Though part of me wishes to see
The place i'll rest at last

Figure 4.3: A sample poem from the Bad dataset: "On Dieing" by AsILiveAndBreathe.

4.1.2.2 Data

We used three hand-collected data sets of contemporary poetry written by humans. Each set contained 30 short poems in English of between 5 and 20 lines, inclusive. We stuck to contemporary poetry so as to avoid different eras of poetry becoming a confounding factor, and so as to minimize the probability that a study participant had already read the poems before, and we stuck to short poems in order to make it easier for participants to rate a large number of poems. In no case did more than one poem by a single author appear either in the same data set or across data sets.

The Good data set was composed of poems from Poetry Magazine (https://www.poetryfoundation.org/poetrymagazine) between November 2013 and April 2014. Poetry Magazine is a very long-established, professional magazine which can reasonably be considered to contain the work of the most critically acclaimed mainstream literary poets working in English today. In other words, a poem published in Poetry magazine can be considered successful from a Press perspective. All poems meeting the length and non-duplication requirements and appearing in the magazine during this time window were selected, with the exception of a few which were discarded due to complex visual formatting and two which were discarded due to experimenter discomfort with content. The remaining 30 poems comprised the Good data set. An example is given in Figure 4.1.

The Medium data set was composed of 2 poems each from 15 lesser-known online magazines. Some of these were magazines devoted exclusively to poetry while others were a combination of poetry and prose; some were devoted to a particular genre or subject matter while others were not. Each magazine pays a token amount (between US $5 and $10) for a poem of the required length. (See Table 4.2 for a full list.) For each magazine, the most recent 2 poems meeting length and author uniqueness requirements were chosen for the data set, with a single exception in which one of these poems was discarded for thematic unsuitability as with the Good data set and the third-most-recent poem chosen as a replacement. This added up to a Medium data set of 30 poems. An example is given in Figure 4.2.

The Bad data set was composed of poems by unskilled amateur poets. We chose these poems by going to the Newbie Stretching Room at the Poetry Free-For-All (http://www.everypoet.org/pffa/), an online poetry critique forum. This section is for newcomers who have not posted poetry on the forum before; both experienced moderators and other newcomers can comment on the poems. We chose only poems meeting the length and author uniqueness requirements from this section, and discarded any which had received any positive feedback or compliments from a moderator. Most of the chosen poems received generic comments from moderators instructing the author to read some introductory articles on how to improve their poetry; a few had more specific comments from moderators about why their poem was bad. (Example feedback from moderators on Bad poems includes: "This lacks any sort of imagery that will allow me to settle in and get a picture. Therefore, it is left as a mere statement of fact without any musicality, and nothing novel to chew on." Or, from a more acerbic moderator, "This is dreadfully bad beginner's doggerel that fails for many, many, many reasons." We did not factor any comments from novices into our analysis.) Selecting the most recently posted poems which fit these requirements resulted in a Bad data set of 30 poems. An example is given in Figure 4.3.

Finally, we collected a Test data set containing 6 texts which were the same length as the chosen poems, but were obviously not poems. 3 of these were snippets from business news, and 3 from sports news.

A note should be made about the two Good poems (and one Medium poem) that were excluded due to content. These were poems containing highly graphic sexual and/or violent themes. Excluding poems from a study due to content always raises a potential validity issue. For instance, the handling of potentially objectionable themes could itself be an important difference between the three poetic groups. We believe our exclusion of certain poems is justified for two reasons. First, the number of poems excluded is quite small. Second, requiring study participants to view offensive content is a potential ethical issue, and one we have had to handle carefully throughout all of the studies in this thesis; according to our office of research ethics, even mild profanity has required a warning in the study introduction and a provision for participants to leave the study if they become uncomfortable. Because of the small number of poems excluded, their exclusion mitigates this ethical issue but is unlikely to strongly affect the study's results.

4.1.2.3 Collection

We recruited study participants on Crowdflower, a crowdsourced microtasking website. In order to mini- mize cultural and linguistic difference as a confounding factor, participants were limited to those living in the United States. Each participant was given six poems at a time selected from any or all of the data sets, and asked to rate each poem based on one of the metrics. Each participant gave ratings using only one metric at a time. We collected enough responses to amass 20 responses on each metric for each poem - with 96 individual poems, six poems per participant, and four metrics, this resulted in a total of 1280 participants. Results for each poem in the data set were then run through a single-factor ANOVA with a Bonferroni correction for multiple hypotheses. Since there were three metrics being tested with two factors each, and one metric being tested with three factors, we corrected for nine hypotheses. The null hypothesis was that, for all metrics, participants’ responses to Good, Bad, and Medium poems would be drawn from an identical distribution. The alternative hypothesis was that, for at least one part of at least one metric, poems from one data set would be rated more highly than others.
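
The analysis described above corresponds to a standard one-way ANOVA per criterion, with the significance threshold divided by the number of hypotheses. A sketch using SciPy, with randomly generated placeholder ratings in place of our actual data:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Placeholder ratings on the -4..4 agreement scale (20 ratings per poem,
# 30 poems per group); the real analysis loads the collected data instead.
good   = rng.integers(-4, 5, size=600)
medium = rng.integers(-4, 5, size=600)
bad    = rng.integers(-4, 5, size=600)

n_hypotheses = 9                    # three metrics with two parts, one with three (see text)
alpha = 0.05 / n_hypotheses         # Bonferroni-corrected threshold

f_stat, p_value = stats.f_oneway(good, medium, bad)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}, significant: {p_value < alpha}")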

4.1.2.4 Results

Results for this experiment were the opposite of what we expected. For most criteria in most metrics, participants rated Bad poems significantly more highly than Medium ones, and Medium significantly more highly than Good. The exception was Novelty, in which Good poems were rated more highly than Medium and Medium more highly than Bad. Imagination (Colton's tripod) and Value showed Bad poems rated more highly than Medium and Medium more highly than Good, but not significantly. Exact F-values for each of these criteria are shown in Table 4.3.

Post hoc tests were performed using Tukey's method. Tukey's method showed that Medium poems were not significantly different from either Good or Bad poems on any of the criteria; only Good and Bad poems were significantly different from each other. This was a highly surprising result since it is not attributable to mere rater incompetence or failure to pay attention. Incompetent raters who failed to pay attention might easily give the same scores to all poems. Our raters, however, had significantly different reactions to the different groups, indicating that they can indeed differentiate between these groups—but that their preferences are largely the opposite of what we had imagined.

Model               Criterion         Good          Medium        Bad           F
Ritchie's model     Typicality        0.20 (1.09)   0.41 (1.10)   1.23 (1.06)   10.6**
                    Quality           0.23 (1.09)   0.67 (1.08)   1.40 (1.03)   10.2**
IDEA model          Wellbeing         0.78 (1.05)   1.14 (1.04)   1.54 (1.03)   13.9**
                    Cognitive Effort  0.60 (1.07)   0.94 (1.06)   1.46 (1.04)   14.9**
Creative Tripod     Imagination       0.75 (1.04)   1.16 (1.00)   1.07 (0.99)   2.3
                    Appreciation      0.67 (1.06)   1.11 (1.00)   1.68 (0.92)   8.3**
                    Skill             0.44 (1.06)   0.84 (1.02)   1.40 (0.98)   7.4*
Novelty and Value   Novelty           0.96 (1.02)   0.80 (1.00)   0.49 (1.04)   9.8**
                    Value             0.17 (1.04)   0.44 (1.03)   0.72 (1.02)   3.6

Table 4.3: Average ratings (with standard deviations in parentheses) and F scores prior to Bonferroni correction for poem categories according to each metric. Each component is scored between -4 and +4. Significant results following Bonferroni correction are marked with a *, or ** if highly significant.

A BB gun. A model plane.
A basketball. A 'lectric train.
A bicycle. A cowboy hat.
A comic book. A baseball bat.
A deck of cards. A science kit.
A racing car. A catcher's mitt.
So thats my list of everything
that Santa Claus forgot to bring.

Figure 4.4: A sample poem from the C-Good dataset: "December 26" by Kenn Nesbitt.

4.1.3 Experiment II

One potential explanation for why participants preferred Bad poems to Good ones was that the Bad poems were more accessible. Poems from a prestigious literary journal may be difficult for non-experts to understand due to heavy allusiveness, odd figures of speech, and other poetic conventions. This hypothesis is consistent with much of the work discussed in Sections 2.5.5 and 2.8.2, in which experts' evaluations of creative work are compared to non-experts' evaluations. On average, experts in many areas prefer more novel products, products that are more emotionally negative or challenging, and products that are more difficult to understand. Jacobs' (2015) theory of foregrounding, in particular, suggests that the use of turns of phrase which are difficult to understand is one of the key aesthetic features of creative literature. To an expert, these turns of phrase produce surprise followed by understanding, but to a non-expert, if the use of language is too novel, the understanding may not arrive. Thus, non-experts in poetry and other critically acclaimed literature could actively dislike such work due to their inability to understand it.

To test the inaccessibility hypothesis, we ran a second experiment focusing only on children's poems: poems written with children as the intended audience. The idea is that poems written for children, even when they are critically acclaimed and very creative, should be easy to understand. Thus, if our reverse preference results are due to non-experts' inability to understand creative poems for adults, then they should not appear when the same non-experts are tested on poems for children. Our experiment was identical to Experiment I, but with new data.

The C-Good data set was composed of children's poems found in the Children's Poetry section of the Poetry Foundation website (https://www.poetryfoundation.org/poems) in November 2014. The same selection constraints were used as with the first data set: poems were between 5 and 20 lines in length and no poet's work was used more than once. We also excluded poems written by poets born prior to the 20th century. We collected a total of 10 C-Good poems, by critically acclaimed children's authors such as Kenn Nesbitt and Shel Silverstein. An example is given in Figure 4.4.

The C-Bad data set was composed of poems posted in the "Poems for Children" section of the publicly accessible Family Friend Poems forum (http://forums.familyfriendpoems.com/) by amateur poets between September and November 2014, meeting the length and author uniqueness criteria. 10 such poems were selected. As there is no expectation of detailed critique or analysis of poems posted at Family Friend Poems, we did not filter poems by the critique that was given as we did with the Bad adult poems. In fact, most responses posted to these poems were brief and complimentary (e.g. "Brilliant. Loved it 10"), even when the poems made large and obvious mistakes with meter and rhyme. An example is given in Figure 4.5.

These poems were randomized and evaluated in the same way as the poems from Experiment I, on the criteria from the same four metrics. Since there are only two data sets in Experiment II, a t-test was performed on every criterion to detect differences in how the children's poems were rated.

Dollar went to dimes closet looking for change
Dollar fell apart when he got so disarranged
Larry met a dragon that was out of fire
He taught the dragon how to fly on a wire
Doughnut could never find its way home
Coffee took it in so it would not be alone

Bob the candy man was a real man of power
No matter how you looked at him he was sour
An old man whispered, no one heard what he said
Ruby woke up and found herself in bed

Figure 4.5: A sample poem from the C-Bad dataset: "Ruby's Dream" by B4i8islept.

3 https://www.poetryfoundation.org/poems
4 http://forums.familyfriendpoems.com/

Model               Criterion         C-Bad mean (SD)   C-Good mean (SD)   t
Ritchie’s model     Typicality        1.25 (1.03)       1.46 (1.03)        0.48
                    Quality           0.08 (1.04)       0.77 (1.04)        0.13
IDEA model          Wellbeing         1.30 (1.00)       1.61 (1.00)        0.35
                    Cognitive Effort  0.11 (1.07)       0.63 (1.10)        0.23
Creative Tripod     Imagination       0.65              0.80               0.74
                    Appreciation      1.12              1.50               0.42
                    Skill             0.84 (1.01)       1.20 (1.07)        0.40
Novelty and Value   Novelty           0.32 (1.06)       0.24 (1.05)        0.74
                    Value             0.11 (1.03)       0.34 (1.04)        0.62

Table 4.4: Average ratings, standard deviations, and t scores for children’s poem categories according to each metric. There were no significant differences found after Bonferroni correction.
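The per-criterion comparison with Bonferroni correction can be carried out with standard tools. The sketch below is a minimal illustration rather than the analysis code used for this study; the DataFrame layout and column names ("criterion", "category", "score") are assumptions, and Welch's t-test is used since the exact variant is not specified here.

import pandas as pd
from scipy import stats

def compare_categories(ratings: pd.DataFrame, alpha: float = 0.05):
    """One independent-samples t-test per criterion, Bonferroni-corrected."""
    n_tests = ratings["criterion"].nunique()
    corrected_alpha = alpha / n_tests          # Bonferroni: divide alpha by the number of tests
    results = {}
    for criterion, group in ratings.groupby("criterion"):
        good = group.loc[group["category"] == "C-Good", "score"]
        bad = group.loc[group["category"] == "C-Bad", "score"]
        t, p = stats.ttest_ind(good, bad, equal_var=False)   # Welch's t-test
        results[criterion] = (t, p, p < corrected_alpha)     # significant after correction?
    return results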

4.1.3.1 Results

The children’s poem results were the opposite of the adult poem results. Participants rated C-Good poems more highly than C-Bad poems on almost every criterion. However, none of these results were statistically significant. Using the highly accessible children’s poems seemingly removed any preference for bad poems, but did not introduce a significant preference for good poems.

Figure 4.6: Sample scatterplots showing relationships between Novelty, Typicality, and Quality for poems in all of the data sets from both experiments.

4.1.3.2 Correlations within and between metrics

One goal in the development of creativity evaluation models is to tease out different components of the creative process. We investigated this by combining the data from Experiments I and II. We generated scatterplots and correlation coefficients to examine the relationships between different criteria. With the exception of novelty and value, all criteria within the same model were strongly correlated with each other (0.82 < r(88) < 0.99), and scatterplots showed approximately linear relationships. Even when comparing criteria from different models, relatively high levels of correlation remained (0.65 < r(88) < 0.99; if both Novelty and Wellbeing are removed, then the lowest correlation coefficient jumps to 0.74). Scatterplots comparing Novelty to other criteria did not reveal anything interesting; Novelty seems to have no significant positive, negative, or non-linear relationship with any other criterion. Example scatterplots are given in Figure 4.6.

These results suggest that, although novelty and value are statistically different from each other, the different components in the other models studied here might not be. It is possible to hypothesize, from results like this, that the different components are not conceptually different: that is to say, rather than typicality and quality (for example) being two separate desirable traits, perhaps they are different ways of describing essentially the same thing. Or perhaps participants choose which poem they prefer, and reason post hoc that the poem they prefer must be more typical and have higher quality. However, since our participants are non-experts who have difficulty understanding creative poetry, it would be premature to generalize these results to experts. Perhaps experts have a more refined understanding of the different qualities of a creative poem, and the correlation between different criteria may be weaker for them.
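The correlation analysis itself is straightforward to reproduce in outline. The sketch below assumes a hypothetical DataFrame with one row per poem and one column per criterion, each cell holding that poem's mean rating on that criterion; the column names are placeholders, not the actual data format used in the study.

import pandas as pd

CRITERIA = ["Typicality", "Quality", "Wellbeing", "Cognitive Effort",
            "Imagination", "Appreciation", "Skill", "Novelty", "Value"]

def criterion_correlations(poem_means: pd.DataFrame) -> pd.DataFrame:
    """Pairwise Pearson correlations between criteria, computed across poems."""
    return poem_means[CRITERIA].corr(method="pearson")

# Scatterplots for visual inspection, e.g. Novelty against Typicality:
# poem_means.plot.scatter(x="Novelty", y="Typicality")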

4.1.4 Discussion

Our goal was to illustrate differences in effectiveness between different metrics, but we ended up finding something quite different. One potential purpose of using an evaluation metric, rather than directly asking “How creative is this?”, is to give raters an objective framework to use. However, none of the metrics tested were objective enough to prevent two basic problems. First, the inaccessibility of Good poems produced a strong preference against these poems among non-experts, even though it would seem intuitively obvious that a poet would prefer to learn to produce this type of poem than a Bad poem. The use of children’s poems removed this effect, but non-experts were still not discriminating enough to produce a statistically significant preference for Press-successful poems on any metric. Rather than arguing definitively for one metric over another, our work highlights the challenge in using humans to rate creativity through any metric.

4.1.4.1 On Novelty, Typicality, Quality and Value

Novelty could intuitively be seen as the opposite of Typicality, but this intuition is not supported by our research. Indeed, the correlation between Novelty and Typicality is nearly zero (R = −0.05). Poems with high Typicality may have high or low Novelty, and vice versa. Ritchie (2007) suggests that a computational system’s first struggle may be with Typicality—that is, with learning to produce outputs with the basic characteristics of the target class—with Quality becoming important once that initial goal has been reached. This may be true for a computer, but our results suggest that it is probably not true for a human.

The relationship between Typicality and Quality is linear, without any sign of a threshold at which the importance of Quality changes. Good poems are rated as more novel than Medium poems, and Medium poems more novel than Bad. Taken at face value, this would suggest that Novelty might be a better metric than others for measuring creativity. Certainly, the importance of novelty is underscored by research we discuss in Section 2.5.1.5. However, the effect for novelty reverses itself when applied to children’s poems; so novelty as measured by non-expert raters is not a reliable indicator of more creative poetry across the board. When combined with our other results, it is more parsimonious to suggest that non-experts rate certain poems as more novel, not necessarily because they are more creative, but because they are more difficult to understand.

4.1.4.2 On accessibility and the target audience

It is perhaps not surprising that critically acclaimed contemporary poems would be off-putting to an ordinary reader. The poems in Poetry Magazine are so complex that the magazine comes with its own explanatory Discussion Guide. Poems allude heavily to other poems and works and imply or illustrate things instead of stating them outright; some raise difficult philosophical questions such as “who is creating what, as well as who is inside the work and who is outside” (Poetry Foundation, 2014). Without a comprehensive education in contemporary English poetry, a reader may have difficulty understanding a poem.

Our results suggest that this off-putting effect may be stronger than any difference between skilled and unskilled poetry as such. To a general audience, skilled but inaccessible poems are less appealing than the poems of unskilled amateurs. Yet to an expert in poetry, it would be absurd to say that these poems are therefore of lower quality. It is not news that non-experts and experts evaluate creative work differently (cf. Section 2.8.2). Yet the strength of the effect here—not just negating but reversing expected trends—is surprising. It is all the more vital, in light of these results, to identify whether the judges used in computational poetry evaluations are experts, and what sort of expertise they possess.

Not every computational poetry system will necessarily have the goal of producing the sort of poetry that appears in Poetry Magazine. Some computational poetry systems are made with such lofty goals in mind, but others are built specifically to entertain or amuse non-experts. Rather than attempting to produce works which are somehow objectively valued, or to identify a single most desirable expert population, it may be more useful to first identify a target audience, according to the researcher’s goal, and then select evaluators and evaluation methods which work for that audience. However, if the target audience is non-experts and the goal of a creative system is to entertain the general population, a Product survey of non-experts may still be undesirable due to the poor interrater reliability of non-experts. For a system of this nature, researchers may instead wish to use a Press evaluation—or, if available, an expert rater who specializes in writing accessible poetry for a broad audience.

In the meantime, without an identifiable target audience, it may be very dangerous to talk about a criterion like quality, value, or even skill in computational creativity as though it is only one thing. The quality of popular appeal and the quality of appeal to experts may be diametrically opposed, and there may be other audiences with still other views of quality. Until such an audience is chosen and the choice justified, the notion of quality, without the notion of quality to whom, is operationally meaningless.

4.2 Poetry criteria derived from consensual assessment

4.2.1 Introduction

Our second line of inquiry into evaluation is to study the judgments of poetry experts more closely. After all, if non-experts are not up to the task of judging poetry accurately, then it is worth looking at domain experts and at how they evaluate computer-generated poetry. Additionally, in Chapter 2, we discussed the importance of domain-specific criteria for creativity evaluation. Some criteria, such as novelty and value, may be domain-general, but it is still useful to ask empirically what makes a poem novel or valuable in the eyes of experts. Some domain-specific criteria for computer-generated poetry already exist, such as Manurung et al.’s (2012) Meaningfulness, Grammaticality, and Poeticness, or the popular Chinese criteria (adapted from Manurung’s) of Meaningfulness, Fluency, Poeticness, and Coherence (Zhang and Lapata, 2014; Wang et al., 2016b; Yi et al., 2016). However, these criteria are not based on empirical study of how poetry experts respond to computer-generated poems, but rather on computer scientists’ ideas about poems. Therefore, a set of criteria derived directly from poetry experts’ methods of evaluation would be a new contribution to the field.

Our study consists of two parts, performed by the same participants on the same evening. First, we use the Consensual Assessment Technique (Amabile, 1983a) to gather a group of poetry experts’ evaluations of a diverse group of computer-generated poems and assess their interrater reliability. Second, we ask the experts for freeform written responses explaining some of their judgments. By analyzing these responses, we identify four major desiderata for digital poetry: Reaction, Meaning, Novelty, and Craft. Evaluating digital poetry may be more complicated than typical uses of the CAT (e.g. children’s collages) due to the heterogeneity and experimental nature of digital poetry, and some of our statistical results reflect this. However, our four desiderata, combined with process-based evaluation, can be used towards a more standardized, evidence-based evaluation of a complex field.

A prior version of this section appeared as a poster at the Seventh International Conference on Computational Creativity (Lamb et al., 2016a), with the full paper appearing in the proceedings.

4.2.2 Background

The Consensual Assessment Technique (Amabile, 1983a) has been discussed already in Section 2.5.4, but for ease of reading, we summarize it again here. The CAT is a means of judging Product creativity without the use of criteria, by asking experts to rate a group of creative artifacts in the same domain and seeing if they agree. Baer and McKool (2009) summarize current best practices:

• Judges must possess expertise in the domain being judged; novice judges have poor interrater reliability. What constitutes expertise is a matter for debate, and can vary depending on medium. Skilled novices can have decent reliability (Kaufman et al., 2009), but some theoretical experts, such as psychologists, do not.
• Judges make their judgment independently, without consulting other judges.
• Judges review the artifacts blindly, without knowing framing information such as the author’s identity.
• Judges are not told how to define creativity or asked to explain their ratings.
• Judges rate artifacts on a numerical scale with at least 3 points.
• Judges use the full scale. The most creative artifacts in the group should be at the top of the scale, and the least creative should be at the bottom.
• The number of judges varies from 2 to 40, with an average just over 10.
• Interrater reliability should be measured with Cronbach’s coefficient alpha, the Spearman-Brown prediction formula, or the intraclass correlation method (a computation sketch for Cronbach’s alpha follows this list).
• An interrater reliability of 0.7 or higher is considered good. Expert judges generally achieve interrater reliabilities between 0.7 and 0.9.
• While the CAT was designed for a homogeneous group of subjects, it works in practice even when the artifacts were made under different conditions.
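Of the reliability statistics named above, Cronbach's coefficient alpha is the simplest to compute directly. The sketch below is a generic textbook version, assuming a complete ratings matrix with one row per artifact and one column per judge; it is not code from Baer and McKool or from this study.

import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    """ratings: array of shape (n_artifacts, n_judges), no missing cells."""
    n_judges = ratings.shape[1]
    judge_vars = ratings.var(axis=0, ddof=1)        # variance of each judge's ratings
    total_var = ratings.sum(axis=1).var(ddof=1)     # variance of the artifacts' summed scores
    return (n_judges / (n_judges - 1)) * (1 - judge_vars.sum() / total_var)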

The CAT has become a gold standard for assessing human creativity (Baer and McKool, 2009). It has good reliability as long as the judges possess sufficient expertise. Obtaining experts is the major bottleneck in performing the CAT. However, the CAT is reliable even with relatively few judges (Baer and McKool, 2009). To our knowledge, CAT-related methods have not been used previously to assess computational poetry.

4.2.3 Method

For our study, we recruited graduate students from the University of Waterloo’s Experimental Digital Media program (XDM) as our judges. XDM includes digital poetry among a variety of other avant-garde, multimedia art forms, and students in the program produce such work. We judged XDM students likely to understand both the demands of poetry as a genre and the challenges of generating poetry with a computer. They are experienced with writing and critical analysis as well as with the use of technology for art. They are not as experienced as a professional digital media artist, but for the purposes of our study, they struck a good balance between level of expertise and ease of access as research subjects.

Judges were given a set of 30 poems, in randomized order. The poems are listed in Table 4.5. They evaluated each poem on a scale from 1 to 5, with 1 being “least creative” and 5 being “most creative”. They produced these ratings without group discussion. Judges were told that some poems were written by computers and others by humans using computers; they were not told which poems were which.

The 30 poems, although not labeled as such, came from three different groups. Ten (group A) were poems in which we judged that the authors were trying to create a relatively autonomous creative system; all but one of these poems were taken from published papers in the field of computational creativity. Another ten (group B) were poems in which the human exerted tighter artistic control (for example, by handcrafting templates), and the computer’s role was relatively limited. The final ten poems (group C) were generated using specific source material which remained recognizable in the final product—either “found” poetry or modifications to a well-known human poem.

Group  Title                                        Author                                                        Score  SD    Response
B      “Notes on the Voyage of Owl and Girl”        J.R. Carpenter                                                4.4    0.69  6
B      Excerpt from “Definitions II - Adjectives”   Allison Parrish                                               4.3    0.70  18
C      “Conditionals”                               Allison Parrish                                               4.0    1.04  5
B      “trans.mission [a.dialogue]”                 J.R. Carpenter                                                3.9    1.02  4
A      Untitled                                     Unnamed system (Toivanen et al., 2013)                        3.6    1.09  0
B      “Walks From City Bus Routes”                 J.R. Carpenter                                                3.6    1.18  11
B      Excerpt from “]]”                            Eric Goddard-Scovel and Gnoetry                               3.4    1.18  0
C      “St. Louis Blues 2011”                       Christopher Funkhouser                                        3.3    1.36  10
A      Untitled                                     P.O. Eticus (Toivanen et al., 2014)                           3.1    0.64  0
A      “Angry poem about the end”                   Unnamed system (Misztal and Indurkhya, 2014)                  3.1    0.87  0
B      “Good Sleep”                                 George Trialonis and Gnoetry                                  3.1    1.36  10
A      “Voicing an Autobot”                         Allison Parrish                                               2.9    1.43  4
B      “The Ephemerides”                            Allison Parrish                                               2.9    0.56  0
A      Untitled                                     IDyOt (Hayes and Wiggins, 2015)                               2.9    1.16  3
C      “HaikU”                                      Nanette Wylde                                                 2.8    0.75  5
C      Excerpt from “Dark Side of the Wall”         Bob Bonsall                                                   2.8    0.66  0
B      Excerpt from “Exit Ducky?”                   Christopher Funkhouser, James Bonnici, and Sonny Rae Tempest  2.7    1.03  9
C      “Spine Sonnets”                              Jody Zellen                                                   2.6    0.79  1
C      “Ezra Pound Sign”                            Mark Sample                                                   2.6    1.58  13
A      “Blue”                                       Full-FACE (Colton et al., 2012)                               2.6    1.32  5
C      “Regis Clones (Couplets from ZZT-OOP)”       Allison Parrish                                               2.6    1.40  6
A      “quiet”                                      MASTER (Kirke and Miranda, 2013)                              2.4    1.32  11
B      Excerpt from “Permutant”                     Zach Whalen                                                   2.2    0.96  12
A      Untitled limerick                            Unnamed system (Rahman and Manurung, 2011)                    2.2    1.36  3
A      “The legalized regime of this marriage”      Stereotrope (Veale, 2013a)                                    2.1    1.09  7
C      “Times Haiku”                                Jacob Harris                                                  2.1    1.03  10
C      Untitled                                     Mobtwit (Hartlová and Nack, 2013)                             2.1    1.36  15
A      Untitled                                     Unnamed system (Tobing and Manurung, 2015)                    1.9    0.83  4
B      “Rapbot”                                     Darius Kazemi                                                 1.9    0.99  0
C      “The Longest Poem in the World”              Andrei Gheorghe                                               1.7    0.67  7

Table 4.5: The 30 poems used in our experiment, ranked from highest to lowest average rating. The “Response” column lists how many lines of explanation, in total, were given for judges’ ratings of the poem in part 2 of the study.

All poems were published between 2010 and 2015. We presented the poems in plain text and without their titles. When the published work was a generator producing arbitrarily many poems (for example, a Twitter bot), we provided a single generated poem. In cases where the generated poem was excessively long, a 1-page excerpt was provided. While excerpting may bias judge responses to long poems, this is relatively unimportant to our analysis.

Once all 30 poems had been rated, we began the qualitative portion of the study. Each judge was asked to go back over the poems and, for at least 3 poems, write an explanation for their judgment. In order to adhere to CAT best practices, we did not present this request until the judges had made all 30 quantitative ratings and did not permit them to change their quantitative answers once the qualitative portion began; thus, the request for written explanations was prevented from affecting the results of the CAT itself. We obtained a total of 7 judges, which is within normal bounds for the CAT (Baer and McKool, 2009). Participation took 1 hour.

We analyzed our data in three steps. First, we calculated the intraclass correlation between the seven judges. Second, we used the Kruskal-Wallis test to see if there was a difference between ratings of poems from groups A, B, and C. Third, we used open coding to determine what major factors were used by our judges in their qualitative evaluations.
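For the first step, intraclass correlation can be computed from a two-way ANOVA decomposition of the poems-by-judges rating matrix. The sketch below shows one common variant (Shrout and Fleiss's ICC(2,1): two-way random effects, absolute agreement, single rater); the exact variant used in this study is not specified here, so this is an illustration rather than a reproduction.

import numpy as np

def icc_2_1(ratings: np.ndarray) -> float:
    """ratings: array of shape (n_poems, n_judges), no missing cells."""
    n, k = ratings.shape
    grand = ratings.mean()
    row_dev = ratings.mean(axis=1) - grand          # per-poem deviations from the grand mean
    col_dev = ratings.mean(axis=0) - grand          # per-judge deviations from the grand mean
    ms_rows = k * (row_dev ** 2).sum() / (n - 1)    # between-poem mean square
    ms_cols = n * (col_dev ** 2).sum() / (k - 1)    # between-judge mean square
    ss_error = ((ratings - grand) ** 2).sum() - k * (row_dev ** 2).sum() - n * (col_dev ** 2).sum()
    ms_error = ss_error / ((n - 1) * (k - 1))       # residual mean square
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)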

4.2.4 Results

4.2.4.1 Interrater reliability

Unfortunately, the intraclass correlation between our judges (a statistic that can range from -1.0 to 1.0) was only 0.18—far below the 0.7 to 0.9 standard for CAT results. A bootstrapped sample of 10,000 permuted versions of the data showed that interrater agreement hovered around zero, with a standard deviation of 0.04, meaning that if each judge gave their ratings by chance, there would be much less correlation between ratings than what our results show. The agreement between our judges, therefore, is statistically significant at p < 0.01, but the actual amount of the agreement is small. Therefore, it is not a strong enough agreement to be used for the usual applications of the CAT, such as judging admissions to academic programs in creative fields.

Looking at the data, some poems were rated very highly by nearly all judges, while others were rated very poorly, but a large mass of poems in the middle had inconsistent or inconclusive results. When the data is reduced to those poems with the highest and lowest average scores, intraclass correlation becomes very high. With the seven best and seven worst poems—nearly half of our original data set—the intraclass correlation is 0.73, and narrowing the number of poems raises that statistic still higher.

It is intuitive that poems with the highest and lowest ratings would have relatively good agreement, while poems about which judges disagreed would have average scores closer to the middle. However, when we looked at only the seven best and seven worst in each of our 10,000 bootstrap samples, the mean and standard deviation for intraclass correlation did not rise. Therefore, the agreement on the best and worst poems is not merely a statistical artifact; our judges really were able to agree on these ends of the spectrum, but simply disagreed about the rest of it.

It appears that our judges agree when selecting the best and worst poems in a group, but cannot reliably rank the group of poems as a whole. Calculating Kendall’s tau between pairs of judges did not result in any useful clusterings of the judges into factions.
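The chance-level comparison described above can be approximated by shuffling each judge's ratings independently many times and recording the agreement of each shuffled data set. In the sketch below, mean pairwise Spearman correlation stands in for the intraclass correlation actually used in this study; the point is the shape of the null distribution rather than the exact statistic.

import numpy as np
from itertools import combinations
from scipy.stats import spearmanr

def mean_pairwise_agreement(ratings: np.ndarray) -> float:
    """ratings: array of shape (n_poems, n_judges)."""
    pairs = combinations(range(ratings.shape[1]), 2)
    return float(np.mean([spearmanr(ratings[:, i], ratings[:, j])[0] for i, j in pairs]))

def agreement_null(ratings: np.ndarray, n_perm: int = 10_000, seed: int = 0) -> np.ndarray:
    """Agreement of n_perm data sets in which each judge's column is shuffled independently."""
    rng = np.random.default_rng(seed)
    null = np.empty(n_perm)
    for b in range(n_perm):
        shuffled = np.column_stack([rng.permutation(ratings[:, j])
                                    for j in range(ratings.shape[1])])
        null[b] = mean_pairwise_agreement(shuffled)
    return null   # compare null.mean() and null.std() with the observed agreement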

4.2.4.2 Kruskal-Wallis test

While a disproportionate number of the best-ranked poems were from Group B, a Kruskal-Wallis test showed that this difference was not significant (χ² = 3.8, 0.25 > p > 0.1). We used the Kruskal-Wallis test because we were interested in the poems’ ranks rather than their numerical scores. Both relatively creative (from a Product perspective) and relatively uncreative poems existed in groups A, B, and C, and no group did systematically better than others. The individual poems, their group membership, and their average scores are shown in Table 4.5.
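As an illustration, the test can be run on the per-poem average scores listed in Table 4.5. This study may have used a different data layout (for example, the raw per-judge ratings), so the statistic below need not match χ² = 3.8 exactly.

from scipy.stats import kruskal

# Average scores from Table 4.5, split by group
group_a = [3.6, 3.1, 3.1, 2.9, 2.9, 2.6, 2.4, 2.2, 2.1, 1.9]   # relatively autonomous systems
group_b = [4.4, 4.3, 3.9, 3.6, 3.4, 3.1, 2.9, 2.7, 2.2, 1.9]   # tighter human control
group_c = [4.0, 3.3, 2.8, 2.8, 2.6, 2.6, 2.6, 2.1, 2.1, 1.7]   # recognizable source material

h_statistic, p_value = kruskal(group_a, group_b, group_c)
print(h_statistic, p_value)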

Grouping   Category     % of comments   % Positive   % Negative
Reaction   Feeling      12%             68%          31%
Reaction   Comparison   10%             44%          44%
Reaction   Base/Other   12%             57%          29%
Meaning    Message      14%             60%          40%
Meaning    Coherence    7%              67%          33%
Meaning    Content      5%              89%          0%
Craft      Technique    12%             67%          29%
Craft      Imagery      7%              67%          33%
Craft      Form         4%              50%          50%
Craft      Skill        3%              20%          80%
Novelty    Novelty      15%             42%          58%

Table 4.6: Categories derived from our qualitative data.

4.2.4.3 Open coding of qualitative data

We performed open coding by giving each line of written response a content label and a valence (positive, negative, or neutral), then clustering the content labels into categories. Each category gave insight into the implicit values used by our judges. Overall there were 179 coded lines distributed over 63 explained judgments—an average of 2.8 lines per judgment and 9 judgments per judge. Table 4.6 shows the proportions of lines of each type and their valences.

One coder performed the initial clustering, while a second repeated the labeling to validate the first coder’s responses. Our two coders agreed on roughly half of the lines as to the exact categorization, and for two thirds of the lines agreed as to the general category. For half of the remaining third, the coders agreed that a line could easily be coded as having both reported categories, such as cliché imagery, which has to do with both novelty and imagery. For the rest, the coders did not initially agree, but were quickly convinced of one or the other categorization. Below we explain the meanings of each of our labels.

Reaction. 34% of lines described, structured, or contextualized the judge’s affective reaction to the poem.

Feeling. Statements about the emotions evoked in the judge by the poem.

• “This was just super fun to read.” • “Felt empty.”

Comparison. Lines that compared the poem to something else, including existing poems or poetry movements.

• “I like how this echoes, say, Siri giving instructions.” • “This reminds me of bad, early 2000s my space poetry... Angsty teens spewing ‘creativity’ on the world wide web.”

Base/Other. Base lines are statements that the poem is or is not creative, without immediate explanation. Often a judgment containing Base lines contained explanation in other lines, so a Base line can be thought of as a topic sentence, not necessarily an unsupported judgment. Lines coded “Other” similarly contain statements more to do with structuring a judgment than with the judgment itself, such as statements that the judge felt conflicted about the poem.

Meaning. 26% of lines described the meaning of the poem, the concepts involved, and their clarity.

Message. Statements about the idea that the judge believes the poet intended to communicate. Most judges made comments with negative valence, stating that a poem was not sufficiently meaningful. A major exception was judge 7, who left long positive comments closely interpreting the meaning of several poems.

• “It would be more creative/interesting if there were a distinct theme or repetition of some sort—some sort of message to the reader.” • “It begins with an opinion about a campy TV show and ends on gleeful nihilism. The real American Horror Story is the nuclear apocalypse, the end of the world effected by some hideous war games between two self-obsessed nations flexing their muscles at each other. (good twerking). It packs a lot into a very compressed collection of sentences, and also manages to serve as a brutal indictment of contemporary culture.”

Coherence. Nearly all of the lines we coded as Coherence referenced a lack of coherence, or nonsense.

• “It felt too disjointed.” • “This one is interesting because it doesn’t make any sense, yet I’m intrigued by it.”

Content. Statements about the characters, objects, or events in the poem. Judges mentioned this aspect of meaning less frequently than more abstract ideas.

• “It’s a complete narrative in just 3 very short lines.”

Craft. 20% of lines described the way in which the poem’s concept was executed.

Technique. Statements assessing specific literary techniques used in the poem. They include defamiliarization, enjambment, phrasing, repetition, rhyme, rhythm, vocabulary, and voice, as well as more general statements such as “playful use of language”. Poor technique was coded negative.

• “I like this one because I feel like I can hear a distinct voice.” • “I found the creative intentions - , quotation marks, the fragmentive narrative, the asterisks - forced and not really used well.”

Imagery. Statements commenting on the poem’s use of imagery. Imagery is a specific type of content involving direct sensory descriptions, and is important in contemporary poetry (Kao and Jurafsky, 2012).

• “Good consistent imagery and figurative language.” • “The imagery isn’t provocative.”

Form. Only a few poems received comments on their form. Three poems in inventive forms, such as imaginary dictionary entries, were praised for these concepts. Another received a comment that it was too short. In addition, two haikus received negative comments for lacking subtle features of the traditional haiku.

• “If there were another stanza I’d like it more.”

Skill. Statements assessing the poet’s skill or cleverness.

• “Very rudimentary and woe is me.”

Novelty. 15% of lines were statements about the poem’s novelty. Positive valence lines stated that the poem was unusual, unique, or subversive. Negative valence lines stated that the poem—or aspects of the poem—were obvious, derivative, unoriginal, trite, clichéd, banal, failed to push boundaries, or did not sufficiently change their source text.

• “I don’t think this is very creative because it doesn’t push the boundary of poetry in any way.” • “This is creative because its unique. I’ve never seen a poem like this before.”

Judges frequently disagreed on what traits a poem possessed, and on the valence assigned to those traits. A poem might be described as incoherent by one judge but interestingly disjointed by another, or banal by one judge but unexpected by another. An extreme example is a poem generated by Mobtwit (Hartlová and Nack, 2013). The poem was written by arranging tweets to generate emotional contrast. It was described as random and devoid of meaning by Judge 1 (who rated the poem as 1 out of 5), but Judge 7 rated the poem a 5 and gave a long exposition of its meaning (quoted above, under “Message”). Restricting the sample to the seven highest and seven lowest rated poems did not remove these qualitative disagreements.

There were slightly more positive (93) than negative (79) lines overall. There was a slight positive correlation between quantitative score and number of positive comments, a similar slight negative correlation between quantitative score and number of negative comments, and no correlation at all between quantitative score and total comments (r = 0.26, -0.27, and 0.008 respectively).

4.2.5 Discussion

Since we did not achieve the usual interrater reliability standard of the CAT, our method is not a finished evaluation. It is possible that the CAT will not provide standardized computational poetry evaluation at all. However, the qualitative portions of our study illuminate how judges with some expertise evaluate computational art, which leads us to a better understanding of what criteria could go into such an evaluation in the future.

4.2.5.1 Judge selection

Why did our judges disagree about poems in the middle of the set? Should we have chosen a different set of judges? We believe that our judges’ lack of interrater reliability speaks to something more complex than a simple lack of expertise.

The question of who, exactly, has sufficient expertise for the CAT is a difficult one. Kaufman et al. (2009) review prior work on the differences between expert CAT judges and novices. Novices lack the interrater reliability of experts, and their judgments only moderately correlate with those of expert judges. However, in many cases, gifted novices (which Kaufman et al. describe as “quasi-experts”) produce judgments that are more in line with those of experts than with the general population. Novices have fewer problems serving as judges when the art form in question is one that the general population encounters regularly: stories rather than poems, for instance. However, psychologists—even psychologists of creativity—are not experts; they perform as inconsistently as novices from the general population. The expertise necessary for the CAT seems to have more to do with experience in a specific creative field than with knowledge of the theory of creativity.

Pearce and Wiggins (2007) used both music researchers and music students as judges. Why did their experiment achieve close to the recommended interrater reliability while ours did not? One answer is that Pearce and Wiggins’ study was an evaluation of chorale melodies, which are simpler, less diverse, and defined by more well-established rules than computational poetry.

We argue that Experimental Digital Media students should be considered quasi-experts. Even more advanced than Kaufman et al.’s gifted novices, these students are more like experts-in-training, undergoing advanced education in how to produce and critique art in their field. However, the field of digital poetry is too new to be well-defined. It is also possible that the different poets in our study are performing different tasks that ought not to be grouped together. The CAT’s more typical uses revolve around homogeneous products, such as the poetry or collages of elementary school students. Mature artists and researchers, in a new field where a variety of movements, motivations, and techniques are still under development, likely produce a more complex and contentious body of work; in fact, artists from movements with very different goals (such as computational creativity researchers vs. poets who work in a humanities department) were combined into this one assessment.

A good idea for future work might be to replicate the CAT with other groups of experts and quasi-experts, or with a more homogeneous group of digital poems. Poets who have been paid for their published work, or participants in events such as the E-Poetry Festival (Glazier, 2016), might be appropriate experts. However, we strongly advise against the use of computational creativity researchers as expert judges unless they themselves are practicing artists in the field being studied. Computer researchers without such artistic experience are likely to have the same problem as psychologists judging human art. They may

thoroughly understand the theory, but they are unlikely to have an expert sense of the artistic aspects of their work. Moreover, because academic publishing depends heavily on theory and argumentation, and because the field of computational creativity is so new, computer researchers (including ourselves) are likely to be distracted from evaluations of specific products by our beliefs about where we would like the field to go.

4.2.5.2 Judge bias

Recalling the mixed results on the existence of bias against computational art, it is worth asking if our XDM students exhibited this bias. We did not attempt to empirically measure bias, but it is worth noting that, anecdotally, they did not appear to have one. At no point did a student express, in their freeform comments, that a poem was less creative because the author was not human. If anything, the bias was in the other direction. As one judge put it after the experiment, “Sometimes I wanted to say that a poem was childish, but then I thought, ‘What if a computer wrote it?’ and I didn’t want to hurt the computer’s feelings.”

Even where bias against computers exists, it is not relevant unless computer products are compared with the products of humans and the judges are somehow aware of which products are from which group. All the poems in our study had some involvement from both humans (who wrote a computer program) and computers (which put together words based on the program), and judges were not told what the computer’s specific Process role was. It is easy to imagine studies where the role of the computer is more homogeneous, or even studies which compare outputs from different versions of a single program.

4.2.5.3 Desiderata for domain-specific poetry evaluation

Baer argues that creativity is an umbrella term for a variety of independent domain-specific skills (Baer, 1998). If this is the case, then evaluations of computational poetry would be expected to contain criteria that apply only to poetry, perhaps only to computational poetry. Studies like ours are a step towards developing these criteria. Our study suggests a set of desiderata for poetry shared by most of our judges:

• Reaction. The poem should provoke feelings of enjoyment and/or interest from the reader.

• Meaning. The poem should intentionally convey a specific idea. Even if the poem is difficult to understand, its difficulty should enhance the underlying meaning. (For example, a Dadaist poem uses apparently meaningless text to illustrate ideas about how language and meaning work.)
• Novelty. The poem should be unusual or surprising in some way, and not merely repeat familiar tropes.
• Craft. The poem should make effective use of poetic techniques in service of the other three criteria. This can include form, imagery, auditory effects such as rhyme, psychological effects such as defamiliarization, visual effects such as enjambment, and verbal effects such as voice. Effective use of these techniques requires skill.

These desiderata are not straightforward. In particular, some of the literary techniques praised by our judges oppose each other. At least one poem received positive comments for its detailed imagery, while another received a positive comment for simplicity. Requiring detail and simplicity at the same time is a contradiction! We suggest viewing literary techniques as a toolbox of strategies for poetic success. Some may be more appropriate to a particular goal than others. The question asked to a judge about craft should not be, “How many times are literary techniques used?” It should be something more like, “What techniques are used, and how effective are they?” It should be assumed that such questions can only be answered by expert or quasi-expert judges.

4.2.5.4 Relations between our desiderata and existing theories

Our desiderata overlap with other evaluation theories, but are not identical to them. For example, Novelty and Value are frequently used to evaluate creativity. Our judges did emphasize Novelty, but Value either did not appear per se or was divided into many sub-criteria. Van der Velde et al. (2015) use word association to define creativity criteria. Our Novelty criterion corresponds to their Original and Novelty/Innovation, while their Skill and Craftsmanship correspond to our Craft. Van der Velde et al.’s other criteria are Emotion and Intelligence.

Ritchie (2001) suggests Typicality and Quality as criteria. A hint of Typicality can be seen in Comparison judgments. It appears that for a positive typicality judgment, a poem must strike the judge as not merely typical of poetry, but typical of good poetry. The “Dadaist dictionary” was rated highly, but poems typical of “the scrawl of a high school senior” were not. Groundedness in relevant poetic movements led to a positive response, but so did poems seen as entirely novel. Typicality as Ritchie conceives it may be neither necessary nor sufficient for computational poetry.

The Creative Tripod (Colton, 2008) consists of Skill, Imagination, and Appreciation. While Skill as such was a minor category for us, everything under the Craft grouping presumably requires skill. Imagination was rarely mentioned, but it could be argued, as by Smith et al. (2014), that Imagination is the underlying trait which allows for Novelty. This reading is supported by Van der Velde et al. (2015), who group “Imagination” under Novelty/Innovation. Appreciation is difficult to read into any of the coded comments. However, the highly fluid definitions of traits in the Tripod make it difficult to definitively state if they are present or not.

Manurung et al.’s (2012) criteria of Meaningfulness, Poeticness, and Grammaticality overlap with our desiderata. Meaningfulness and Meaning are synonymous; Poeticness and Craft are similar concepts. However, Manurung et al. operationalize Poeticness as meter and rhyme, while judges in our study had a more expansive view of Craft. Grammaticality was not emphasized; some poems received positive comments despite being quite ungrammatical. However, Grammaticality is related (though not identical) to Coherence, and judges did note when poems were incoherent or did not make sense.

Our Reaction criterion does not appear in many existing models, since most models focus only on qualities imputed to the poem or poet. However, it bears some resemblance to the Wellbeing and Cognitive Effort criteria of the IDEA model (Colton et al., 2011), which could perhaps be used to break judge reactions down more finely. In some sense, our Reaction criterion is a sign of a Press judgment, at the scale of the individual person, being incorporated into a task that we assumed would be about Product.

This porousness between Press and the other perspectives is inevitable in any human judgment, since humans are culturally situated agents who bring their emotions and complex full-brain reactions, to some degree, into any judgment they are asked to make.

4.2.5.5 The Product or the Process?

We believe that Product evaluation is an important part of creativity assessment. Nevertheless, it has a major drawback: it cannot differentiate between the creativity of the computer system and the creativity of the human who programmed it. Some examples from our data set illustrate this problem.

“Notes on the Voyage of Owl and Girl”, our most highly rated poem, is a Mere Generation poem based on a tightly handcrafted template. The human author provides a narrative structure which does not alter, and the computer selects details (from a human-curated list) to fill it in. In its original form, “Owl and Girl” exists on a web page and is periodically re-generated before the viewer’s eyes. “Owl and Girl” is interesting artistically, but its high ratings refer mostly to the creativity of the human author.

Conversely, “The legalized regime of this marriage”, created by Stereotrope, is among the most poorly rated. Stereotrope is an experiment in computational linguistics from a leading member of the computational creativity community. The system mines existing text for similes, produces a common-sense knowledge base using these similes, and uses the knowledge base to generate similes and metaphors of its own. The new similes and metaphors are then used to fill in templates and construct a poem. Our judges disliked this poem, calling it obvious, clichéd, unskilled, and uncreative. However, Stereotrope is doing something more computationally interesting than “Owl and Girl”.

We must ask what the goal is for a system like Stereotrope. Do we wish to construct a system whose use of simile and metaphor is artistically successful? Then Stereotrope—in its current form—fails. But if we wish to construct a system using humanlike simile and metaphor, then it is easy to argue that Stereotrope succeeds: its metaphors feel obvious because they are humanlike. Such a system might not be artistically creative from a Product perspective, but it might be a good model of the everyday creativity of non-artist humans expressing themselves. We will not know if a system has succeeded unless we know which of these goals it was aiming for. (Other goals than these are, of course, possible.)

This problem of evaluation recalls Pérez y Pérez’s distinction between the engineering-mathematical approach and the cognitive-social approach. From an engineering-mathematical perspective, it makes no sense to say that a poetry system is successful if its poetry, on a Product level, does not succeed. From a cognitive-social perspective, a system writing “bad” poetry could still make important contributions towards modeling and understanding the cognition of an ordinary human poet—in fact, it might be significantly more useful than a system that creates novel and valuable poetry through an inhuman process.

Our own belief is that, ultimately, computational creativity must succeed on both fronts. To set a Product goal while ignoring Process is to invite pastiche and handcrafting at a level which makes it difficult to argue that the computer (as opposed to its programmer) is creative, no matter how excellent its work. But to set a Process goal while ignoring Product is to fail to take seriously the very medium in which the computer is working. A system which fails to take art seriously can still have value as a cognitive model, but that model will not represent the cognition of skilled human artists. One could argue that a computer must first establish a humanlike process before refining that process to be more artistic.
This is reasonable, but debatable. It is also possible that producing good art and using a humanlike process are two tasks at which the computer can progress simultaneously. The learning process of an initially-uncreative computer may or may not look like the learning process of an initially-unskilled human, and setting a goal of behaving like an unskilled human may in some circumstances be counterproductive.

An interesting idea for future work would be to replicate the CAT study and present information about the specific tasks assigned to the computer, in a standardized form such as the diagrams suggested by Colton et al. (2014). CAT judges would then be asked how creative they believed the computer had been. An alternative would be to use one evaluation technique for Product, and another for Process.

Chapter 5

TwitSong: Developing a computational poetry system

After all of the prior theoretical work and investigations into the evaluation of creativity, we now turn our attention to our own computational poetry system, TwitSong. Although it makes the most sense conceptually to place this research after the other projects, in practice, all of them were going on at once and work on TwitSong began long before all of our theoretical framework was in place.

TwitSong has gone through three major incarnations, all different from each other. Our initial inspiration for the project was generation of song lyrics based on an arbitrary source text. Like others who have used the news as a starting point (Toivanen et al., 2014; Tobing and Manurung, 2015; Rashel and Manurung, 2014b), we were delighted by the design fiction of a system that could greet users by summarizing the morning’s news in a light-hearted, perhaps mildly satirical poem or song. Moreover, as one of our supervisors has previously done work in the area of social media (Yang et al., 2016; Tan et al., 2015, 2016), we liked the idea of using Twitter as a source text to summarize people’s varying reactions to the news, which are otherwise quite difficult to summarize, in poem form.

Several existing poets, both human and computer, served as inspiration for our work with TwitSong. The most notable is Ranjit Bhatnagar’s Pentametron (Bhatnagar, 2012), which creates sonnets out of tweets that happen to be in iambic pentameter. We wondered if we could improve on Pentametron by adding a layer of optimization to pick the tweets that not only have correct rhyme and meter, but are also more poetic or appropriate in other ways. Another inspiration was the New York Times Haiku project (Harris, 2013), which demonstrates that found poetry based on the news is viably interesting to readers.

5.1 Generation one: Line selection proof of concept

A prior version of this section appeared at the 2015 BRIDGES conference (Lamb et al., 2015b).

5.1.1 Introduction

At first, our goal with TwitSong was to improve on Pentametron by adding optimization. The basic algorithm we had in mind was as follows (a rough code sketch is given after the list):

• Collect a large corpus of Twitter data on some topic
• Select the lines from the corpus that have the appropriate number of syllables
• Group these lines into rhyme sets
• Rate the lines in each rhyme set on a combination of poetic criteria, such as emotion or imagery
• Create a poem out of the highest rated lines.
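A rough, self-contained sketch of this pipeline is given below. The syllable counter and rhyme key are crude stand-ins for the rhyme and meter analysis described later, and the scoring function stands in for the human (and, later, automatic) line ratings; none of this is TwitSong's actual code.

import re
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

def count_syllables(line: str) -> int:
    """Very rough syllable estimate: one syllable per vowel group in each word."""
    words = re.findall(r"[a-zA-Z']+", line)
    return sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)

def rhyme_key(line: str) -> str:
    """Crude rhyme class: the last word's final vowel group plus trailing consonants."""
    words = re.findall(r"[a-zA-Z']+", line.lower())
    if not words:
        return ""
    match = re.search(r"[aeiouy]+[^aeiouy]*$", words[-1])
    return match.group(0) if match else words[-1]

def build_couplets(tweets: List[str], score: Callable[[str], float],
                   min_syl: int = 10, max_syl: int = 11,
                   n_pairs: int = 7) -> List[Tuple[float, str, str]]:
    # Steps 1-2: keep candidate lines with an acceptable syllable count
    lines = [t for t in tweets if min_syl <= count_syllables(t) <= max_syl]
    # Step 3: group candidates into rhyme sets
    rhyme_sets: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        rhyme_sets[rhyme_key(line)].append(line)
    # Steps 4-5: keep the two best-scoring lines per set, then the best n_pairs pairs overall
    pairs = []
    for group in rhyme_sets.values():
        if len(group) >= 2:
            best_two = sorted(group, key=score, reverse=True)[:2]
            pairs.append((score(best_two[0]) + score(best_two[1]), best_two[0], best_two[1]))
    return sorted(pairs, reverse=True)[:n_pairs]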

However, we were concerned that, if this algorithm failed to make good poetry, we would not know how to interpret this result. Would it mean that the algorithm itself was unsuitable? Or that we had chosen the wrong poetic criteria? Or, perhaps, that we had simply operationalized our criteria ineffectively? In order to discern the suitability of our algorithm and criteria, we therefore decided not to automate our line ratings at first. Instead, the first generation of TwitSong used crowdsourced human judgments to produce line ratings. Evaluation of this generation had promising results. When we created the first generation of TwitSong, we had not yet derived our four-part Product model for evaluating poetry, which we discuss in Section 4.2. Thus, we somewhat arbitrarily chose three criteria:

• Topicality (the extent to which the line reflects the poem’s topic)
• Sentiment Polarity (the intensity of positive or negative emotions in the line)
• Imagery (the extent to which the line contains concrete, sensory information)

These criteria are not completely arbitrary. Topicality is logically an important quality for a poem that describes a topic. Expressing a coherent emotion may also seem like a logical poetic goal, although some researchers (Hartlová and Nack, 2013) prefer to select different contrasting emotions. Imagery is anecdotally one of the skills beginning poets are taught to improve. Kao and Jurafsky (2012), studying the differences between professional and amateur contemporary poetry, and Simonton (1990), studying the differences between famous Shakespeare sonnets and less successful ones, both found that concrete sensory imagery, along with lexical complexity, is one of the most important traits that explain this difference. However, one goal of this study was to test whether each of these three criteria did, in fact, contribute to a more successful poem in the context of found Twitter poetry.

We also created a Combined metric which was calculated simply by adding the normalized scores for Topicality, Sentiment Polarity, and Imagery together. Since Sentiment Polarity can be either positive or negative, we chose based on the topic which score to add to the Combined metric for each data set. We erred on the side of choosing positive sentiment, except for gloomy topics (such as climate change) for which a positive sentiment is inappropriate.

Rather than using our methods only on lines in iambic pentameter, we made use of all lines with 10-11 syllables. Our reason for this was that, when trying to select lines for strict iambic pentameter, too many tweets were removed and there was not enough data left over to perform the rest of the algorithm. The issue of meter is one to which we will return in Section 5.3.
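As a small illustration, the Combined score for a single line can be written as below. The exact normalization TwitSong used is not spelled out here, so this sketch simply sums the three 1-5 Crowdflower scores (matching the 3-15 range reported in Table 5.1) and, for gloomy topics, reverses the sentiment scale so that strongly negative lines score highly; the scale reversal is our assumption about how "choosing which score to add" might be implemented.

def combined_score(topicality: float, sentiment: float, imagery: float,
                   prefer_positive: bool = True) -> float:
    """All inputs are averaged 1-5 Crowdflower ratings for one candidate line."""
    sentiment_term = sentiment if prefer_positive else 6.0 - sentiment   # flip the 1-5 scale
    return topicality + sentiment_term + imagery

# e.g. combined_score(5, 4, 3, prefer_positive=True) == 12.0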

5.1.2 Method

5.1.2.1 Specifics of the TwitSong system

The first generation of TwitSong uses the Twitter API to gather tweets from a specific time period and filters them based on a topic keyword or other regular expression. The topic keyword is chosen by the programmer. (Another option would be to filter based on hashtags. We chose not to use this method because we estimated, based on our own Twitter experience, that there could be a large number of tweets about a specific topic which did not use the hashtag for that topic.) Tweets that are not in English are removed, and excessive hashtags and other non-syntactic features of tweets are removed from them.

It might seem odd that we are both selecting a topic by keyword and optimizing for topicality, but this appears to be necessary. For example, one of our data sets concerns the topic of New Year’s Day, 2014. All tweets in this data set were posted on December 31, 2013 or January 1, 2014 and all contained the string “2014”. However, not all tweets containing the string “2014” are actually tweets about New Year’s Eve celebrations. Also, sentences within a tweet, rather than the entire tweet, can be used, and excessive hashtags (including the hashtag #2014 and related ones) can be removed. Therefore, not every sentence processed by TwitSong contains the string “2014”. This is a good thing, as the poems would otherwise be quite repetitive. While the New Year’s data set contains many tweets about New Year’s Eve celebrations, it also contains sentences about other topics, spam, and meaningless strings of numbers. This is typical for our Twitter data sets across a number of topics. Therefore, optimizing for topicality is still useful.

Syllable count and rhyme endings are calculated using Hirjee and Brown’s (2010) RhymeAnalyzer, a set of very powerful rhyme and rhythm detectors which are built on top of the CMU Pronouncing Dictionary. We modify Hirjee and Brown’s algorithms slightly for use with Twitter. These modifications include shortening drawn-out words (e.g. “hiiiiiii”), detecting compounds, and using a hand-written supplement to the dictionary for topical or Twitter-specific words such as slang, brand names, common misspellings, and proper names. For words that cannot be pronounced using one of these methods, TwitSong rejects the tweet instead of trying to sound it out, even though RhymeAnalyzer has the capability to sound out arbitrarily complex strings of letters; we found that, if we used this capability, we ended up with too many meaningless strings of letters or numbers being rhymed with each other.

Tweets with the right number of syllables are grouped into RhymeSets of two or more rhyming lines. Long tweets with too many syllables are separated into their component sentences before processing. A RhymeSet is based on a seed line, plus any other line that is sufficiently close to the seed line in rhyme and meter. RhymeAnalyzer allows for close but potentially inexact matches (“slant” rhymes) between lines in the set; we specify the level of allowable slant rhyme by hand. The tweets in the RhymeSets are then scored by topicality, sentiment, and imagery. In this generation of TwitSong, the scoring is done via Crowdflower.

To create the final poem, TwitSong retrieves RhymeSets with the desired number of syllables and selects the seven best pairs of rhyming lines (i.e. the top two lines from the top seven RhymeSets). If the second-best line in a RhymeSet ends with an identical word to the best line, then TwitSong goes further down the list until it finds a more acceptable rhyme. These pairs are then assembled into the form of a Shakespearean sonnet (three quatrains and a couplet, with the rhyme scheme ABAB CDCD EFEF GG). The ordering of rhyming pairs within the sonnet is arbitrary; for the purposes of this generation, we place the best rhymes last.
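The final selection and assembly step can be sketched as follows. This is an illustration of the description above (top two lines per RhymeSet, no repeated end words within a pair, best rhymes placed last), not TwitSong's actual implementation; it assumes at least seven usable RhymeSets.

from typing import List, Optional, Tuple

def last_word(line: str) -> str:
    words = line.split()
    return words[-1].strip(".,!?\"'").lower() if words else ""

def pick_pair(ranked_lines: List[str]) -> Optional[Tuple[str, str]]:
    """ranked_lines: one RhymeSet's lines, best-scoring first."""
    best = ranked_lines[0]
    for candidate in ranked_lines[1:]:
        if last_word(candidate) != last_word(best):   # do not rhyme a word with itself
            return best, candidate
    return None

def assemble_sonnet(rhyme_sets: List[List[str]]) -> str:
    """rhyme_sets: RhymeSets ordered from worst to best; assumes at least seven usable sets."""
    pairs = [p for p in (pick_pair(rs) for rs in rhyme_sets) if p][-7:]   # best seven pairs, best last
    a, b, c, d, e, f, g = pairs
    stanzas = [[a[0], b[0], a[1], b[1]],   # quatrain, rhyme scheme ABAB
               [c[0], d[0], c[1], d[1]],   # CDCD
               [e[0], f[0], e[1], f[1]],   # EFEF
               [g[0], g[1]]]               # closing couplet, GG
    return "\n\n".join("\n".join(stanza) for stanza in stanzas)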

5.1.2.2 Line ratings on Crowdflower

Metric      Score  Highest rated                                           Score  Lowest rated
Topicality  5      2 teams to go #Sochi2014                                1      One day he gone say you crowding my space
            5      Way to go USA Men’s Hockey team                         1      73205
            5      Sochi Winter Olympics day six live                      1      i have done SOOOO much work this afternoon!!!
Sentiment   5      I smile when you smile...I love when you care. :)       1      i hate how people judge me on my size.
            5      Love this sport #Olympics2014                           1      WERE STUCK IN A SHITTY ANIME DEAN
            5      The Olympic free skating is so cool!!                   1      hey fuck Anthony , everyone hates him
Imagery     5      Food. Food. Food. Food. Food. Food. Food. I love food   1      ! ! !!!!!!! !!!! #2014 #sochi2014
            4.67   15 Pictures That Will Make Your Heart Stop              1      something George Costanza would think about.
            4.67   Sochi Olympic Park As Seen From Space                   1      You mess one section up and you pay f
Combined    12.67  Love this sport #Olympics2014                           4      hey fuck Anthony , everyone hates him
            12     The Olympic free skating is so cool!!                   4      Nobody owes anyone anything
            12     Figure Skating judges give it a 9                       4      but when I do it, I’m being a dick

Table 5.1: Examples of some of the highest and lowest-rated tweets for all three scoring metrics from the Olympics dataset. Theoretically possible Combined scores range from 3 to 15; other scores range from 1 to 5.

We use Crowdflower1, a crowdsourced microtasking service, to gather human judgments. Each tweet for each topic is scored by three Crowdflower workers (the number that Crowdflower’s documentation recommends for most tasks) on a five-point Likert scale on three metrics: sentiment polarity (very positive to very negative), topicality (very relevant to very irrelevant), and concreteness (very concrete to very abstract). We train the workers on the meanings of these terms, especially concreteness/abstractness, which is not a typical concept used by microtasking workers. The exact questions are as follows:

• Topicality: “How relevant is this tweet to the topic of [topic]?” (Very Irrelevant to Very Relevant)
• Sentiment: “How positive or negative are the sentiments in this tweet?” (Very Negative to Very Positive)
• Imagery: “How abstract or concrete is this tweet?” (Very Abstract to Very Concrete)

The 3 scores given to each tweet on each metric are then averaged. Table 5.1 shows examples of high and low-rated tweets on each of these metrics.

1 http://www.crowdflower.com

5.1.2.3 Experimental groups of sonnets

To test the suitability of our TwitSong algorithm, we generated sonnets using the processes above, based on four topical data sets. Our topics were New Year’s Day 2014, the 2014 Winter Olympics, the 2014 Oscars, and climate change. For each data set, we generated five different sonnets based on five different ways of rating the lines in the data set: Topicality only, Positive Sentiment only, Negative Sentiment only, Imagery only, and Combined. The result was a set of 4 × 5 = 20 computationally generated sonnets.

To this data, we then added two control poems for each topic. The first control poem, intended to serve as a lower bound, was constructed by TwitSong without using ratings. When ratings are not available, all potential lines are implicitly rated 0. Thus, TwitSong assembles the first seven valid pairs of rhyming lines it encounters without any attention to their content or meaning. These control poems are meant to be similar to the output of Pentametron, in that only rhyme and meter and not content are considered; however, in practice, since the lines are taken from a data set defined by topic keywords, they are likely to still be somewhat more cohesive in content than Pentametron’s poems. The second control poem, intended to serve as an upper bound, was made by the author of this thesis, who is a published poet, and who manually chose appropriate lines from the RhymeSets that were available.

This brought the total number of poems in the experiment to 4 × 7 = 28. Excerpts of TwitSong’s output, and of control poems, are given in Table 5.2.

Human                                            Control
In 2014 I’ll talk less and listen more           PLEASE FOLLOW ME ? ILY GUYSS 25
live a little more and stress a little less.     Skies the limit #NewYears #2014 #BelAir
Never give up,Do it better than before           I LOVE YOU VERY MUCH MY ANGEL <3 5
Oh and try to lose some weight in the process    Oh hey, it’s 2014. #Ireallydontcare

Combined                                         Negative
Hey Nashville...2014 is pretty awesome!          Nothing’s changing except people I fuck with
Happy 2014 friends! Be safe out there!!          2014 already took Uncle Phil
Had a great New Years Eve at Magic Kingdom       2014 already startin off with death
We started off 2014 with a prayer                started my 2014 off vomiting brill.

Table 5.2: Excerpts from poetry used in our study. All poems are from the New Year’s Day 2014 dataset. The Human poems were put together by a human from the tweets available, using TwitSong only to create sets of possible rhyming lines to choose from. The Control poems were put together by TwitSong through arbitrary selection of lines from these sets. Also shown are poems made by TwitSong using the Combined metric (the sum of the topicality, positive sentiment, and imagery scores), which performed well, and the negative sentiment metric, which performed very poorly. For space reasons, we include only a single stanza from each poem; the full poems are 14 lines long and in sonnet form.

5.1.2.4 Evaluation

From the beginning, we knew that rigorous evaluation based on falsifiable hypotheses would be important for the development of a poetry system. However, when we developed and tested the first generation of

TwitSong, we had not yet performed our survey of evaluation techniques, nor our study of non-expert poetry evaluation from Section 4.1. For the purpose of evaluation, we used a simple pairwise comparison metric. Human raters were recruited using Crowdflower, and each rater was given four pairs of poems. For each pair of poems, the rater was instructed to indicate which poem they preferred, and to justify their choice in one sentence or less.

To ensure data quality, a filtering mechanism was then used. Of each rater's four pairs of poems, two were control pairs—comparisons of a human-constructed Twitter poem with a control poem (made arbitrarily by TwitSong without using any ratings) on the same topic. If a rater did not prefer the human-constructed poem to the control poem in both of their control pairs, they were removed from the data. The intuition behind this filtering mechanism is that a human-constructed poem, in "ground truth", should be better than a poem of essentially random lines. We checked the validity of the control pairs as a data cleaning mechanism by asking the two supervisors of the author of this thesis to blindly judge each possible control pair. Both distinguished human-constructed poems from control poems with perfect accuracy, and both preferred the human-constructed poem in each control pair.

In theory, raters who were answering at random would still have a 25% chance of passing this test. In practice, our filtering mechanism removed about half of the data, which means that approximately two-thirds of the remaining data can be trusted to be non-random. That half of the raters did not perform to specifications is not overly surprising, given known issues with the reliability and performance of crowdsourced workers (see, for example, Kittur et al., 2013). There are other conceivable reasons, besides answering at random, why a particular rater might prefer the arbitrary control poem to a human-constructed poem. However, for the purposes of this study, we were interested in constructing computational poems which share the traits that make human-constructed poems more meaningful and entertaining than arbitrary assortments of tweets. Thus, we were interested only in the opinions of raters who showed a clear preference for human-constructed poems.

Following this data cleaning, we were left with a set of pairwise comparisons in which each computationally generated type of poem (those chosen for topicality, imagery, positive and negative sentiment, the combined metric, and the control poems) appeared between 100 and 120 times. For each individual poem, we counted the number of times that the poem was selected in preference to the one next to it, and divided this by the total number of times that the poem appeared. We then ran a one-way ANOVA to test for significant differences between the six varieties of poem.
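To make the data-cleaning and scoring procedure concrete, the following minimal sketch (in Python) illustrates it. The field names (is_control_pair, chose_human, chosen, other) are hypothetical stand-ins for the columns in the Crowdflower export; this is an illustration of the procedure described above, not our actual analysis script.

from collections import defaultdict

def filter_raters(judgments):
    """Keep only raters who preferred the human-constructed poem in
    both of their control pairs. `judgments` maps a rater id to a
    list of that rater's pairwise judgments (dicts)."""
    kept = {}
    for rater, rows in judgments.items():
        controls = [r for r in rows if r["is_control_pair"]]
        if len(controls) == 2 and all(r["chose_human"] for r in controls):
            kept[rater] = rows
    return kept

def preference_rates(judgments):
    """For each poem, the number of times it was preferred divided by
    the number of times it appeared in a non-control comparison."""
    wins, appearances = defaultdict(int), defaultdict(int)
    for rows in judgments.values():
        for r in rows:
            if r["is_control_pair"]:
                continue  # control pairs are used only for filtering
            wins[r["chosen"]] += 1
            appearances[r["chosen"]] += 1
            appearances[r["other"]] += 1
    return {poem: wins[poem] / appearances[poem] for poem in appearances}

The resulting per-poem preference rates are the values compared by the one-way ANOVA described above.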

5.1.3 Results

5.1.3.1 Scoring

One concern for us was the potential accuracy or inaccuracy of the line ratings from Crowdflower. We therefore ran these ratings through a few informal tests. First, we informally examined the tweets sorted from lowest to highest score on each metric. The distribution of which tweets were scored as most topical, off-topic, happy, sad, or neutral accorded very closely with our own intuitions. For example, tweets which consisted of arbitrary numbers were consistently rated as very off-topic, and tweets expressing joy or strong negative emotions were found at the appropriate ends

of the sentiment spectrum. The scores for abstract vs. concrete imagery also accorded somewhat with our intuitions, but we noticed some irregularities in the data. The ratings were heavily biased towards labelling tweets as abstract: the average rating for a tweet was below 2 on a scale of 1 to 5 (1 being most abstract, 5 being most concrete), with more than half of the tweets rated as 1, and very few tweets scored as highly concrete. Table 5.1 shows examples of the highest and lowest-scored tweets on each metric, using the Sochi Olympics dataset.

We also tested each scoring metric by taking a subset of tweets and calculating the correlation between the average Crowdflower rating and the manual rating of one of the authors. Again, crowdsourced workers gave results very similar to our own ratings for topicality and sentiment (0.77 < r < 0.81), but less so for imagery (r = 0.37).
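This agreement check is an ordinary Pearson correlation between the averaged Crowdflower scores and the manual scores for the same tweets. A minimal sketch follows; the numeric values are purely illustrative and are not from our data.

from scipy.stats import pearsonr

# crowd: mean of the three Crowdflower ratings per tweet (illustrative values)
# manual: the corresponding manual ratings for the same tweets
crowd = [4.3, 1.7, 2.0, 3.3, 4.7]
manual = [5, 2, 2, 3, 4]

r, p = pearsonr(crowd, manual)
print(f"r = {r:.2f}, p = {p:.3f}")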

5.1.3.2 Evaluation

Figure 5.1 shows the results of our pairwise evaluations for each poem. Human raters preferred the topical, positive sentiment, and combined poems to control poems. (Because the human poems had been used for data cleaning, we could not validly include the results for those poems.) Imagery-based poems were rated only slightly better than controls. To our surprise, negative sentiment poems performed worst of all, being selected in only 27% of pairs on average (compared with 41% for control poems).

A one-way ANOVA demonstrated that these differences were statistically significant, F(5, 18) = 5.79, p < 0.01. Post hoc tests were performed using Tukey's method, showing that positive emotion, topicality, and combined poems performed significantly better than control poems; imagery and negative emotion poems were not significantly different from controls. Although negative emotion in and of itself did not produce a significant effect, raters' written comments indicated that they often reacted very negatively to poems they saw as negative, depressing, angry, sarcastic, or crude. Comments like "Poem A is very negative and makes me angry reading it" and "poem A has too many offensive words" were common.

Figure 5.1: Success rates for types of computationally generated poems in pairwise comparisons with other poems. The height of a given bar represents the number of times a poem from that category was selected in preference to any other poem, divided by the number of times a poem from that category appeared in a comparison. Error bars represent 95% confidence intervals.

5.1.4 Discussion

Our work demonstrates quantitatively, and with reasonable controls in place, that selecting lines based on simple metrics like the ones we have chosen can significantly improve the appeal of a poem to a general audience, and that some metrics perform better than others. In other words, our proof of concept for TwitSong is successful, and we had every reason to believe that, with proper implementation, a version of TwitSong which algorithmically rated and selected its lines would also perform well.

Although it may seem obvious, it is notable that the Combined metric performs best. Colton et al. (2012) report anecdotally that combining more than one metric diluted the style of a poem. We did not experience this effect with TwitSong; instead, poems using a combined metric performed as well as poems using the highest performing single metric. One possible explanation for the difference between our results and Colton et al.'s is that TwitSong's form is already quite constrained in terms of style: a particular meter and rhyming scheme, only phrases that have already been used by a human on Twitter, and so on. The metrics that are applied in this case may be more general and less in conflict with each other than metrics that govern individual word choice. Alternatively, since Colton et al.'s assertion about style dilution was not tested or backed up with evidence, it may be a subjective judgment which is not replicable.

Poems selected by imagery did not perform significantly better than controls. We believe this is due to the poor performance of crowdsourced workers at correctly identifying tweets with concrete imagery. Workers on Crowdflower reported, to a much higher degree for this task than for the others, that it was too difficult. (The imagery task was rated an average of 3.125 and 3.175 out of 5 on Instructions Clear and Ease of Job, respectively, compared to 4.03 and 3.98 for topicality and 4.06 and 4.26 for sentiment.) Given that many beginning poets struggle with concrete imagery, it is perhaps not surprising that people without a poetic background could not be quickly taught to identify it. Such results run counter to our original assumption that crowdsourced workers would do better than a computer at scoring tweets based on their meaning. While the workers were good at identifying topicality and sentiment, it is plausible that a specialized resource, such as the dictionary of primary process imagery used by Simonton (1990), might do better at identifying concrete imagery than most humans. An alternate explanation for the poor performance of imagery might be the relative dearth of tweets with good imagery in them. For instance, in the Olympic data, only 96 out of 333 lines were rated above 3 out of 5 by Crowdflower workers, and only 61 were rated above 3 out of 5 when rated manually by the author. With few good imagery tweets to choose from, it may simply be more difficult to use imagery to put a good poem together.

The fact that we used non-experts as judges, given our results in Section 4.1, calls our evaluation's validity into question. Certainly, if we had performed that experiment before testing the first generation of TwitSong, we would have attempted to recruit poetry experts. However, we have reason to believe that our results are not a complete reversal of ground truth in the way that the results in Section 4.1 were.
First, TwitSong’s poems are not allusive or obscure in the way that the Good poems from Poetry Magazine were; they are created out of communicative text written by ordinary people and are intended for a general audience. Second, the data cleaning that we performed ensured that we only considered the judgments of people who consistently preferred a human-written found poem to an arbitrary assortment of tweets. Even though these people may have little expertise as such, they at least know enough to recognize and prefer some of the shades of meaning and emotion that appear in human work. Thus, while we would not expect these results to be as robust as results produced by expert judges, we suspect that they are basically correct.

The extremely poor performance of negative sentiment poems surprised us. It does not take much expertise to know of poets, such as Sylvia Plath, who are admired for their eloquence in expressing negative feelings. However, there are many possible explanations for why raters in this task would strongly dislike negative sentiment poems. Describing negative sentiments in an engaging manner may be more difficult than describing positive sentiments engagingly, and TwitSong may not be up to the task. Strong negative emotions may not have been a good fit for the subject matter or the casual tone of the poems. Or the raters on Crowdflower, likely to be ordinary people without much poetic background, may have feelings about negative or depressing poetry which differ from those of literary scholars. Although we did not know it at the time, this result is entirely consistent with the literature about differences between experts and non-experts (Section 2.8.2); non-experts in an art form are more reliant on emotion in their judgments, and are less able to tolerate artwork that expresses or induces negative emotions. We hypothesize, although it remains to be tested, that poetry experts would not dislike the negative sentiment poems as strongly.

5.2 Generation two: Full automation

5.2.1 Introduction

After the success of the first generation of TwitSong, we proceeded to implement line ratings as fully computational rather than crowdsourced human judgments. There are several reasons to pursue this as the next step in TwitSong’s development. First, it follows the principle—recall Colton’s “meta-mountain” (Colton, 2012)—of steadily giving more creative responsibility to the system itself. A system that judges the best lines for itself, based on a set of programmed rules, has more autonomy than a system that assembles lines based on ratings that were directly assigned by humans. It is still far from the autonomy of a human poet, but it is a small, sensible step in that direction. Second, automated line judgment allows poems to be created much more quickly and cheaply, thereby allowing for a greater volume of poems, greater variety of topics, and more room for informal experimentation on the part of the programmers. This incarnation of TwitSong was given the name TwitSonnet, because the name TwitSong was already being used on Twitter by a music app. A prior version of this section appeared at the Eighth International Conference on Computational Creativity (Lamb et al., 2017a).

5.2.2 How TwitSonnet works

TwitSonnet’s workings are very similar to TwitSong’s, with a few changes made:

1. Collect a large corpus of Twitter data on some topic. Previously, we were piggybacking on another research group's use of the Twitter API. For TwitSonnet, we use the Tweet Archivist service (https://www.tweetarchivist.com/) to easily retrieve a keyword-specific corpus of tweets from a specified time interval.

2. Select the lines from the corpus that have 10 or 11 syllables. Tweets with too many syllables can be split up not only into sentences, but based on other punctuation such as semicolons, colons, and commas if necessary.


3. Group these lines into rhyme sets. Although tweets are no longer necessarily in iambic pentameter, we still pay attention to syllable stresses when detecting rhymes (e.g. "bed" rhymes with "head", and "painted" rhymes with "acquainted", but "bed" does not rhyme with "painted", because the stresses do not line up even though the sounds are the same).

4. Rate the lines in each rhyme set. This is the stage at which we made the biggest changes, described below.

5. Create a poem out of the highest rated lines. Optionally, the selected lines can be re-ranked and placed in a meaningful order. For example, they could be ordered from the most abstract introductory statements (least imagery) to the strongest concluding image (most imagery). Otherwise, the tweets are ordered according to score, with the highest scoring couplet at the end. (A condensed sketch of steps 2-5 is given after this list.)
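The sketch below condenses steps 2-5 into a few Python functions. The helpers count_syllables, rhyme_key, and score_line are deliberately naive stand-ins for the CMU-dictionary syllable counter, the stress-aware rhyme detector, and the line ratings; they are assumptions for illustration only, and the real implementations are considerably more involved.

import re
from collections import defaultdict

VOWEL_GROUPS = re.compile(r"[aeiouy]+", re.I)

def count_syllables(text):
    # naive stand-in for the CMU pronouncing dictionary lookup
    return sum(max(1, len(VOWEL_GROUPS.findall(w))) for w in text.split())

def rhyme_key(line):
    # naive stand-in: the real version compares phonemes from the last
    # stressed syllable onward, so that stresses must line up
    return line.split()[-1].lower()[-3:]

def score_line(line):
    # placeholder for the topicality + sentiment + imagery rating
    return 0.0

def candidate_lines(tweets):
    """Step 2: split tweets on punctuation, keep 10-11 syllable pieces."""
    for tweet in tweets:
        for piece in re.split(r"[.;:,!?]", tweet):
            piece = piece.strip()
            if piece and 10 <= count_syllables(piece) <= 11:
                yield piece

def build_rhyme_sets(lines):
    """Step 3: group lines whose endings rhyme."""
    groups = defaultdict(list)
    for line in lines:
        groups[rhyme_key(line)].append(line)
    return [g for g in groups.values() if len(g) >= 2]

def build_sonnet(tweets):
    """Steps 4-5: take the two best lines from each of the seven
    highest-scoring rhyme sets, ending with the highest-scoring couplet."""
    pairs = [sorted(g, key=score_line, reverse=True)[:2]
             for g in build_rhyme_sets(candidate_lines(tweets))]
    pairs.sort(key=lambda p: score_line(p[0]) + score_line(p[1]))
    return [line for pair in pairs[-7:] for line in pair]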

TwitSonnet is a fully functional system which can create a sonnet out of any sufficiently large collection of tweets. From July through the end of October 2016, we posted several of TwitSonnet’s poems per week at http://twitsonnet.tumblr.com/.

5.2.2.1 Choosing poetic criteria

Unlike the first generation of TwitSong, TwitSonnet was developed after we did our experiments in poetry evaluation (Chapter 4) and developed our set of four criteria: Reaction, Meaning, Novelty, and Craft. Therefore, we had the opportunity to conceptually adapt the earlier TwitSong criteria (Topicality, Sentiment, and Imagery) to these criteria.

For the most part, the TwitSong criteria fit fairly well into the four-criteria model. That is to say, they do not express everything about the model, but they express things that are definitely a part of it. Imagery is contained as a subcategory under Craft, and Topicality fits very logically into Meaning. Sentiment and Reaction are not exactly the same, but we can assume for the sake of argument that a person is more likely to have an emotional reaction to a poem that adequately expresses an emotion. Other subcategories of Craft are also addressed by parts of TwitSong's process: since tweets are selected to fit into a sonnet form and to rhyme with each other, at least two other matters covered under Craft are accounted for. Novelty is a more difficult category to address, although we could imagine, for example, asking humans to select the tweets that are most different from the others.

It is important to note that this rhetorical analysis of a match between TwitSong's design and our four-criteria model does not constitute an evaluation of TwitSong. To evaluate the claims we have just made, we would need to empirically test whether or not its line selection strategies actually improve its performance on the appropriate parts of the model. We mention it here not as evidence that TwitSong is creative, but as evidence that we thought about the four-criteria model when designing the second-generation implementation, TwitSonnet, and re-evaluating its goals.

Since there is a reasonable correspondence between our existing TwitSong criteria and the four-criteria model, we decided to keep these criteria when implementing TwitSonnet. One advantage of this approach is that, by using the same criteria as the first-generation experiment, we know that these criteria will have a positive effect on the poem if implemented properly: readers preferred poems with higher (user-identified) topicality, imagery, and positive sentiment.

5.2.2.2 Operationalizing criteria

For Sentiment, we changed from measuring sentiment on a positive/negative axis to expressing specific emotions from the set of eight found in the NRC Hashtag Emotion Lexicon, which was created specifically for Twitter (Mohammad and Kiritchenko, 2015). The eight emotions in the lexicon are anticipation, anger, disgust, fear, joy, sadness, surprise, and trust. We chose a desired emotion for each poem by measuring which emotions were most prevalent in the gathered data, and then normalizing by the rate of emotion words in non-topical data (groups of tweets harvested from Tweet Archivist based on a neutral keyword such as "and").

For Topicality, in addition to choosing tweets based on time range and keyword as we did in the first generation, TwitSonnet creates a trigram frequency measure for each tweet corpus (trigram frequency being a common way of identifying the topic of a document) and gives higher scores to tweets consisting of trigrams with high frequency scores.

For Imagery, we give a higher score to tweets containing stronger primary process imagery, as measured using the Regressive Imagery Dictionary (Provalis, 1990). The Regressive Imagery Dictionary gives higher scores to "primary process" words relating to physical senses, experiences, drives, and the body, and lower scores to more abstract, "secondary process" words. The Regressive Imagery Dictionary is used directly by Simonton in his study of Shakespearean sonnets (1990); Kao and Jurafsky (2012) use a related measure.

For the purposes of this study, we did not find a satisfactory method of measuring Novelty. Some obvious attempts, such as selecting for unusual trigrams, seemed only to increase the number of off-topic, "random", and nonsensical tweets. In the context of our study of poetry experts, the category of Novelty refers to interesting juxtapositions, new thoughts, and subversions of existing concepts, not to this type of "mere novelty". We did, however, reduce repetitiveness by placing a limit on the number of times TwitSonnet was allowed to repeat a poem's keywords, replacing repetitive tweets with the highest ranked alternatives that did not contain the topic keywords.
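As a rough illustration, the three automated scores can be computed along the following lines. The data formats assumed here (a dictionary mapping (word, emotion) pairs to association weights, a Counter of trigrams, and a set of "primary process" words) are simplifications; the actual NRC Hashtag Emotion Lexicon and Regressive Imagery Dictionary files require their own parsing, and the exact weighting in TwitSonnet differs in detail.

from collections import Counter

def trigrams(line):
    words = line.lower().split()
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]

def build_trigram_counts(corpus_lines):
    return Counter(g for line in corpus_lines for g in trigrams(line))

def topicality_score(line, trigram_counts):
    grams = trigrams(line)
    return (sum(trigram_counts[g] for g in grams) / len(grams)) if grams else 0.0

def emotion_score(line, emotion, lexicon):
    words = line.lower().split()
    if not words:
        return 0.0
    return sum(lexicon.get((w, emotion), 0.0) for w in words) / len(words)

def imagery_score(line, primary_process_words):
    words = line.lower().split()
    if not words:
        return 0.0
    return sum(w in primary_process_words for w in words) / len(words)

def dominant_emotion(topic_emotion_counts, background_emotion_counts):
    """Pick the poem's emotion: the one most over-represented in the
    topical data relative to the neutral ("and") comparison corpus."""
    return max(topic_emotion_counts,
               key=lambda e: topic_emotion_counts[e] /
                             (background_emotion_counts.get(e, 0) + 1))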

5.2.2.3 Using the criteria

After rating tweets according to their Topicality, the chosen Emotion, and Imagery, TwitSonnet then combines the seven highest rated pairs of tweets into a sonnet. If seven valid pairs are not found in the data set, TwitSonnet can instead construct a couplet (one pair), a quatrain (two pairs), or a pair of quatrains (four pairs). A sample sonnet is given in Figure 5.2. In summary, our system is explicitly built to satisfy the domain-specific product-based criteria derived from our CAT experiment (Chapter 4). However, like any system, its success at satisfying them in practice needs to be tested empirically. We will now describe how we have evaluated TwitSonnet.

5.2.3 Evaluating TwitSonnet

We had two goals in evaluating the current version of TwitSonnet. First, we wanted to confirm that the effect of the automated scoring was similar to the effect of the crowdsourced scoring. Second, we wanted to improve on the methodology of the previous study by including expert raters, who are more consistent when rating creative artifacts than non-experts (Kaufman et al., 2008).

Review coming tomorrow/this afternoon.
doctor strange was amazing. cant wait for Thor.
closer look at the evolved hero
Visually stunning, left me wanting more
What is a Doctor Strange collector corps box?
Check out the latest new movie details!
So excited to see Marvel in the parks!
what was your first Doctor Strange comic? #StrangeTales
I have 10 more tickets to give away
Doctor Strange 8:45 Ill be there
Doctor Strange is pretty, and pretty OK:
gonna lowkey fall asleep in this chair
It better be worth slacking on my dreams!
Doctor Strange (with Christy at Platinum Screens

Figure 5.2: A sample of TwitSonnet’s output, regarding the movie “Doctor Strange”. (The keyphrase used was “Doctor Strange”, and the time range used was the movie’s opening weekend.)

5.2.3.1 Method

Domain experts can be difficult to recruit for studies. We recruited expert participants using snowball sampling on the social networks of the author of this thesis, who is a published poet under a pen name, and of both supervisors. Each participant was given CAD $10, or the equivalent in their local currency, as remuneration. Participants were asked demographic questions and classified as experts or non-experts. In keeping with the recommendations of Kaufman et al. (2008), we based our definition of expertise not on the study of poetry but on experience actively writing successful poetry. Participants who had published poetry in a magazine or collection, read their own poetry at a reading or slam, and/or published digital poetry were considered poetry experts.

The poetry experts consisted of 13 women, 12 men, and 11 non-binary-gendered poets. (While this is a serious overrepresentation of non-binary poets—likely an artifact of the snowball sampling method—we do not expect it to skew our results, as none of the poems in the study pertain to gender or queer/trans* issues.) The median age was 32, ranging from 17 to 56. All but two of the experts were native speakers of English. The non-experts consisted of 12 women, 19 men, three non-binary participants, and one non-expert who did not disclose their gender. The median age was 36, ranging from 21 to 70. 29 of the 35 non-experts were native speakers of English.

As a result of our snowball sampling, most of our "non-expert" participants could actually be considered quasi-experts: they reported that they were regular readers of poetry, had written unpublished poetry for pleasure, taken classes in poetry, listened to poetry podcasts, attended poetry readings, or taught poetry to K-12 students. (An additional form of experience, being a poetry editor for a magazine or other

publication, did not appear among non-experts. Seven of our 36 expert participants reported having been a poetry editor.) Only three participants had no significant experience with poetry, and one of these was a graduate of a prose creative writing program. Thus, we would expect less difference between the experts and non-experts in this study than we would see if the non-experts were completely inexperienced.

Each participant was shown 8 poems in a random order, drawn from the same selection of 8 current events topics and 8 emotions. The topics included three topics from recent movies and television, two astronomy topics, a ban on the "" in France, and two topics relevant to the recent 2016 Summer Olympics. Each topic was associated with an emotion from the NRC Hashtag Emotion Lexicon: anger, anticipation, disgust, fear, joy, sadness, surprise, or trust.

Each of these 8 poems was in turn drawn at random from one of three groups. In Group A, poems were generated using steps 1 and 2 from the TwitSonnet process, but not the remaining steps. In other words, these were our control poems, in which no filtering or reordering based on our four criteria was performed. Poems in Group B were generated using steps 1 through 4 (so they were generated and filtered using our four criteria, but not reordered), and poems in Group C used all five steps, including reordering.

For each of the 8 poems, participants were then asked the following questions, each on a 5-point scale:

1. "How much do you like this poem?" (Reaction)
2. "How creative is this poem?"
3. "How well does this poem express the emotion of [emotion]?" (Reaction)
4. "How meaningfully does this poem summarize its topic?" (Meaning)
5. "How new and different is this poem?" (Novelty)
6. "How successful is the imagery in this poem?" (Craft)
7. "How cohesive is the narrative of this poem?" (Meaning)

The answers provided at each point of the scale were

• Not at all
• Not much
• A little
• Somewhat
• Very much

Apart from “How creative is this poem?”—which we felt was an irresistible option in a computational creativity project—each of the questions is designed specifically to assess TwitSonnet’s success at one of our four domain-specific categories. Our hypothesis was that the poems from Groups B and C would score higher than Group A on at least some questions, and that Group C would score higher than Group B specifically for narrative cohesion. Participants were also given a freeform text box in which to write any other comments they had about the poems.

Figure 5.3: Experts' and non-experts' evaluations of TwitSonnet's poems. The X-axis shows the seven evaluation questions in the same order as they are listed in our Method section. The Y-axis shows answers on a 5-point scale, with 5 being the most positive response and 1 the least positive. Error bars show 95% confidence intervals.

5.2.3.2 Results

A one-way analysis of variance was performed for each of the seven questions using the survey data from experts, with a Bonferroni correction for seven hypotheses. This analysis showed that the effect of the group from which the poem was taken was not significant for any question, F(2, 271) = 0.26, 1.54, 0.22, 0.93, 0.36, 0.61, and 1.41 respectively, p > 0.05 in all cases. The results for non-experts were similar.

We compiled the most common freeform comments by experts and non-experts. Experts stated that the poems seemed random and choppy; often there would be small sections with a satisfying juxtaposition, but they would be mixed with other lines that did not fit. There was too much focus on rhyme and meter at the expense of content, with several experts stating that they would have preferred it if the poems did not rhyme. There were also too many lines that trailed off in the middle of a sentence or even a word. However, several experts said that they found the idea behind the project very interesting in spite of any criticism they might have of the poems.

Non-experts had fewer comments and responded more to surface features of the poems: for example, several non-experts said they would have preferred not to see hashtags in the poems, as well as typos, bad punctuation, and other errors. Non-experts also agreed with experts that the poems lacked coherence.
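For reference, the per-question test is a standard one-way ANOVA over the three groups, with the alpha level divided by the number of questions. The sketch below, using scipy, illustrates the calculation under the assumption that each group's ratings for one question are available as a simple list; it is not our analysis code.

from scipy.stats import f_oneway

ALPHA = 0.05 / 7  # Bonferroni correction for seven questions

def test_question(group_a_scores, group_b_scores, group_c_scores):
    """Each argument is a list of 1-5 ratings for one question."""
    f_stat, p_value = f_oneway(group_a_scores, group_b_scores, group_c_scores)
    return f_stat, p_value, p_value < ALPHA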

5.2.4 Discussion

The negative result here is surprising because, in our previous study, the difference between the equivalents of Group A and Group B was statistically significant (Lamb et al., 2015b). There are three possible explanations for this.

First, perhaps the difference is due to a difference in how we performed the evaluation this time (for example, numerical scales vs. pairwise preferences). The current study shows a striking lack of difference between groups, which is not attributable merely to the use of a less sensitive statistical method. However, the studies for generations one and three use a pairwise forced choice method and do show statistically significant results, even though freeform comments for generation three (see Section 5.3 below) describe a similar lack of subjective distinguishability between groups. It is possible that, if participants had been forced to express a preference between poems, preferences would have emerged which are too minuscule to appear on a 5-point scale. A related suggestion is that perhaps the current events topics chosen in this study were not the correct choices. For instance, raters might have had stronger opinions about the emotions expressed in a poem if the poem's topic was more "important" or more polarizing. Such polarizing topics are plentiful in current events, especially as the study was run during the lead-up to the divisive 2016 U.S. presidential election. TwitSonnet's online incarnation did create poems on divisive political topics: an example is shown in Figure 5.4. We chose not to include these poems in the study so as not to conflate a rater's political opinion with their artistic opinion of the poem. This may or may not have been the correct choice.

Second, the main change between the TwitSonnet generation and the first TwitSong generation is that TwitSonnet automates its line judgments. We have deliberately used computationally simple methods in order to process large numbers of tweets on a large number of topics. It is possible that the key flaw in TwitSonnet's system lies with these line judgment methods and that more sophisticated methods involving machine learning, syntactic parsing, or a knowledge base might get better results.

Final Presidential Debate (10/19)

Donald Trump is master of the head fake
This East Texas pole shows a leftward lean
but goodnight this all debate gave me headache
Much smarter than his brother Crooked John
An interesting debate is taking place
While America tuned in to watch Don Trump
doing the deniro mobster face
started by her very sleazy campaign
Debate Watch Party SAC 305
Donald Trump is LITERALLY insane
watching guy fieris diners drive-ins and dives
That was the sound of women everywhere
Its a humanitarian nightmare

Figure 5.4: A TwitSonnet poem posted online, using the keyword "debate", immediately after the 2016 U.S. presidential election debates.

Additionally, although we test based on the four criteria of Reaction, Meaning, Novelty, and Craft, our actual line ratings are based on only a subset of these criteria. There is more to Craft than rhyme, meter, and imagery, for instance; and just because a line expresses an emotion does not necessarily imply that it will induce that emotion in a reader.

Third, while the focus of this study was on the ranking and ordering steps, the initial filtering step has also improved since the previous study. Humans are unlikely to judge nonsensical tweets as being very topical or as having a clear emotion. Automated judgment is less sensitive to nonsense, and in addition, our filtering step has improved at automatically removing nonsense from both ranked and unranked poems. Thus, it is possible that some of the effect in the previous study was due to the ranking step reducing nonsensical tweets, and that this reduction is no longer noticeable in the current study, because the tweets are pre-chosen to avoid this. Filtering for rhyme and meter (Craft) and the use of keywords in data selection (Topicality) were already in place, in very close to their current form, in the first generation, so these steps alone cannot account for our current results; but it should be noted that, due to these techniques, even the poems in Group A are not "raw" control poems in the sense of having no attention paid to the four criteria. Neither, for example, would Pentametron's poetry be, since it too is selected for rhyme and meter (Bhatnagar, 2012). The use of a pure control group (for example, a completely random selection of English-language tweets) would likely produce something closer to a significant result. However, it would not tell us if our filtering techniques, specifically, were working as intended.

5.2.4.1 TwitSonnet as a negative result

As discussed in Chapters 2 and 3.2, falsifiable hypotheses are desirable in evaluating computational creativity, but are inconsistently used. Unfortunately, the testing of falsifiable hypotheses will sometimes

produce a negative result! A negative result does not necessarily invalidate the worth of the project, but it is a sign that the creative system in its current form is not performing as intended.

There are several possible responses to this specific negative result. First, we could try performing a different evaluation. Second, we could modify our line selection techniques and engage in further analysis of existing poems to see which techniques might be most promising. Third, we could step back and ask ourselves what goals we are working towards with TwitSonnet. A different methodology might serve those goals better. For example, if our goal is to teach a computer to identify poetic lines, we might consider using source text richer in poetic style and technique than Twitter. Even with human raters, generation one of TwitSong had difficulty generating poems with consistent imagery, due to a lack of concrete imagery in the source text. If our goal is to entertain with amusing poetic summaries of news events, we might ask if the present project is the best way to do that. In particular, it is notable that in both this and the previous study, Twitter's informality and conventions such as hashtags were off-putting to many participants. These may be aspects of Twitter which make it inherently more difficult as a repository for poetic speech. To verify this interpretation, one option would be to "clean" gathered tweets of hashtags, typos, and other traits that bothered the non-expert raters, before running the study again. In all cases, a negative result like this one points to a need to reassess and change some aspects of our project, to a greater or lesser degree, so that it fits more precisely with our actual research goals.

A negative result in computational creativity also underscores the need for all researchers in this field to use falsifiable hypotheses. If we had not tried to disprove our hypothesis, we might not have discovered that the second generation of TwitSonnet was failing to meet its design goals. We might have had a subjective sense that the poems were not yet as good as they could be—as, indeed, we did before running the study—but we would not have known if this subjective sense was correct. In particular, we would not have known the difference between a situation in which the line selection mechanisms were working as intended but were not conceptually sufficient for constructing a "good" poem, and a situation in which the line selection mechanisms were not improving the poems significantly at all.

5.3 Generation three: The editorial algorithm

5.3.1 Introduction

After the negative result of the second generation of TwitSonnet, we were left pondering what to do for the third generation. As discussed above, it seemed that sonnets made from Twitter might be inherently unsatisfying to readers, and our current work had not produced the hoped-for effect. Furthermore, another doubt had arisen: by this point in our research we had finished our survey of evaluation techniques and our taxonomy of poetry generation, and the first two generations of TwitSong are not as sophisticated as many existing poetry systems. There is a strong argument to be made that their Process is not as creative as that of other systems: instead of creating its own lines, TwitSonnet selects lines according to metrics it is given.

In order to make the third generation of TwitSong more creative, we were strongly interested in teaching it to edit its own work. Specifically, we wanted to design a system of targeted edits akin to that of the COLIBRI (2002) and WASP (Gervás, 2013a,b, 2016) systems. We admired WASP's ability to reason about the edits it was making in a more humanlike way, but we

noted that WASP cannot yet specifically edit its poetry to be more topical. Since TwitSong already incorporates optimization for emotion and topicality, a TwitSong which made targeted edits to its work for these properties would be an undeniable advancement in the state of the art.

5.3.2 The mechanisms of generation three

Generation three of TwitSong is built atop the same code for line representation, RhymeSet construction, line judging, and poem construction as the first two generations. Some tweaks were made to each of these, but there is largely continuity. The main change between TwitSonnet and generation three is that generation three uses a mechanism called the Editorial Algorithm, described below, to iteratively change and improve its top-rated lines. Generation three also draws from different source texts, largely news articles, to construct its lines.

We changed the poetic form with which the third generation of TwitSong works, out of a feeling that perhaps sonnets—thought of by the public as a serious, refined form of poetry—were not the best match for poetry that was meant to be topical and light-hearted. Instead, TwitSong 3.0 produces quatrains in Common Meter: an ABAB rhyme scheme with four iambs (eight syllables) in the A lines and three iambs (six syllables) in the B lines. This is the form of many hymns, including "Amazing Grace," as well as of other popular poems and songs.

We made two changes to the line rating algorithms in TwitSong 3.0. First, we were frustrated by the muddled meter of the sonnets, but reinstating a strict requirement for iambic pentameter meant discarding too many lines. So we added a fourth poetic criterion, the criterion of Meter. A line is given a score between 0 and 1 based on how closely each of its syllables adheres to an iambic stress pattern. Because the stresses of single-syllable words can be difficult to discern, and because the CMU pronouncing dictionary also includes secondary stresses, our Meter scores are not exact. Since human metrical poetry also contains occasional modifications to the meter, we choose to err on the side of permissiveness. However, in practice, selecting lines based on our measure of Meter tends to produce well-formed lines with a strongly iambic feel. This Meter score is then combined with the other normalized scores to produce a line's total score.

Second, we found that the trigram measure of Topicality tended to select for bland turns of phrase which were common in the source text but did not make it clear what the topic actually was. We modified the trigram measure so that, in addition to rewarding more common trigrams, it also rewards lines that contain the most topical words. These topical words are selected by dividing the frequency of each word in the source text by its frequency in a non-topical comparison text (in this case, a compilation of poems from Poetry Magazine; a factor is added to the frequency to prevent division by zero). The thirty most topical words according to this measure are then chosen, and a trigram containing one or more of these topical words receives a bonus to its topicality score, which is larger for the most topical words (i.e. the top ten or twenty) and for trigrams containing multiple topical words.
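A sketch of these two scoring changes follows. The smoothing constant and bonus sizes are illustrative assumptions rather than TwitSong's exact values, and stress patterns are assumed to arrive as lists of 0/1 values derived from the CMU pronouncing dictionary.

from collections import Counter

def topical_words(source_words, comparison_words, n=30, smooth=1.0):
    """Words most over-represented in the source text relative to the
    neutral comparison text (a compilation of Poetry Magazine poems)."""
    src, cmp_ = Counter(source_words), Counter(comparison_words)
    ratio = {w: src[w] / (cmp_[w] + smooth) for w in src}
    return [w for w, _ in sorted(ratio.items(), key=lambda kv: -kv[1])[:n]]

def trigram_topicality_bonus(trigram, top_words):
    """Bonus for trigrams containing topical words: larger for the very
    most topical words and for multiple hits (exact weights assumed)."""
    bonus = 0.0
    for w in trigram:
        if w in top_words:
            bonus += 2.0 if top_words.index(w) < 10 else 1.0
    return bonus

def meter_score(stresses):
    """Fraction of syllables matching a weak-strong (iambic) alternation;
    `stresses` is a list of 0/1 values, one per syllable."""
    if not stresses:
        return 0.0
    expected = [i % 2 for i in range(len(stresses))]  # 0, 1, 0, 1, ...
    return sum(s == e for s, e in zip(stresses, expected)) / len(stresses)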

5.3.2.1 The Editorial Algorithm

The Editorial Algorithm we use for TwitSong is a form of genetic algorithm. However, instead of randomly recombining the most successful candidates in each generation—a technique which bears little resemblance to how human poets revise their work—we use a targeted edit at each step, replacing the words in each line that contribute most to the line's worst-performing metric, out of the four metrics of

Topicality, Emotion, Imagery, and Meter. So, rather than adhering strictly to the biological metaphor of a genetic algorithm, we instead use the idea of a genetic algorithm as a base for a more artistically specific form of generation-evaluation loop. We detail this algorithm below.

1. Initialization. As with prior versions of TwitSong, the source text is read and the dictionaries used for each criterion are initialized, including the trigram frequency dictionary and identification of most topical words which are used for the Topicality score. We also initialize an interpolated Markov model (Salzberg et al., 1999) which can be used to generate arbitrary amounts of additional text in the style of the source text. The Markov model can be up to order 3, but can flexibly reduce its order if necessary. If the model generates no results, or only one result, for a 3-gram, then it reduces the 3-gram to a 2-gram or 1-gram. The Markov model is trained to recognize punctuation that could indicate the end of a sentence or line, and runs until it generates an “end of line” marker; in an earlier version that did not use these markers, it was too common for a line to end on a preposition or other unsuitable word. Because the “end of line” marker is not guaranteed to occur after exactly 6 or 8 syllables, the Markov model in practice is run repeatedly until it generates a line that happens to be of the right length. If 100 iterations fail to generate a line of the right length, no line is generated.

2. Line initialization. The source text is read again and this time divided into lines of 6 or 8 syllables, based on appropriate punctuation such as periods, question marks, colons, and commas. Each of these lines is sorted into an appropriate RhymeSet. A group of special lines is also generated purely based on the Markov model’s output when given the 30 most topical words and told to iterate on them until it reaches 6 or 8 syllables. Each of these lines is also sorted into a RhymeSet and is then subject to the same processes as all other lines.

3. Scoring. The lines in each RhymeSet are scored based on the four combined metrics, and each RhymeSet is scored based on its best two lines. RhymeSets with a single line are scored, but penalized.

4. Trimming. Each RhymeSet is trimmed in order to increase efficiency and reduce repetition of identical or nearly-identical lines. Any line that is identical to the RhymeSet's top scoring line, or that begins or ends with an identical word, is removed. Optionally, the programmer can also specify removal words that can only appear once in each RhymeSet; if the top scoring line contains one of these words, then any other line containing that word is removed. This is useful for preventing repetitive language. If more than 15 lines remain in the RhymeSet, it is then trimmed down to only its 15 highest scoring lines. The RhymeSets are then re-scored and the 50 highest scoring RhymeSets are kept for the next generation, with RhymeSets of only a single line being removed first.

5. Edit planning. This is the stage at which the Editorial Algorithm identifies which words have the greatest need to be replaced. Each line in the current group of RhymeSets is analyzed based on the four criteria. The criterion with the lowest normalized score, as well as any other criterion which is under a certain threshold, is selected for analysis. Each word in the line is then inspected for its contribution to this criterion, and the lowest performing word is selected for replacement. ("Stop words," such as "the" and "of," are not excluded from this process; the thinking is that, if a stop word is present, there is no a priori reason why an alternate version of the line might not use a different sentence structure and have a higher-scoring word there instead.) For example, if Imagery is selected, then words that are very abstract (or "secondary", according to the Regressive Imagery Dictionary) are selected.

6. Word replacement. The selected lines are sent to the Markov model, which generates candidate replacement lines, starting with the selected underperforming word and replacing it and all subsequent words. (An earlier prototype of TwitSong replaced only the underperforming word, but this led to choppy and repetitive lines which destroyed the surface continuity that would otherwise be provided by a Markov model; an example is given in Figure 5.5.) Because there is no guarantee that the replacement words will actually be better, the Markov chain generates many candidate replacement lines—20 for each selected starting word. These are then assigned to appropriate RhymeSets.

7. Successive generations. TwitSong repeats steps 3 through 6 to a maximum of 100 generations, or until the average score of the best ten lines stops increasing, whichever comes first. In practice, the program very rarely runs for more than 15-20 generations, and sometimes runs for as few as 3. (A condensed sketch of the edit-planning and word-replacement loop is given after this list.)

8. Poem construction. After there are no more word replacements to be made, the top two lines each from the two highest scoring RhymeSets are selected. These are arranged into a quatrain in Common Meter.

9. Title generation. TwitSong generates its own title for each of its poems, but the title generation mechanism is separate from the rest of the Editorial Algorithm. During the Line Initialization step, in addition to creating the initial RhymeSets out of lines 6 or 8 syllables long, TwitSong also gathers a set of lines from the source text of 3 to 5 syllables, without sorting these lines by length or grouping them into RhymeSets. These potential title lines are then scored based on the four combined metrics and checked against the list of most topical words. Ideally, lines containing the first most topical word are selected and the highest scoring such line becomes the title. If there are no such lines, TwitSong will iterate down the list of most topical words. If no potential title line contains any of the 30 most topical words, TwitSong will choose the highest scoring potential title line to be the title.
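The edit-planning and word-replacement steps (steps 5 and 6) are the heart of the Editorial Algorithm; a condensed sketch of one pass of the loop is shown below. Here `criteria` maps each criterion name to a line-scoring function, `word_contribution(word, criterion)` returns a word's contribution to a criterion, and `markov_complete(prefix)` stands in for the interpolated Markov model. All three helpers, the threshold, and the candidate count are hypothetical simplifications of the actual components; scoring, trimming, and the stopping rule from step 7 are omitted.

CANDIDATES_PER_START = 20   # replacement lines generated per flagged word
THRESHOLD = 0.3             # assumed cutoff for a "failing" criterion

def plan_edit(line, criteria, word_contribution):
    """Step 5: pick the weakest criteria, then the word that contributes
    least to them; return the index at which regeneration should start."""
    words = line.split()
    if not words:
        return 0
    scores = {name: fn(line) for name, fn in criteria.items()}
    weakest = min(scores, key=scores.get)
    targets = {weakest} | {n for n, s in scores.items() if s < THRESHOLD}
    return min(range(len(words)),
               key=lambda i: sum(word_contribution(words[i], t) for t in targets))

def propose_replacements(line, start, markov_complete):
    """Step 6: regenerate the line from the flagged word onward."""
    prefix = line.split()[:start]
    return [markov_complete(prefix) for _ in range(CANDIDATES_PER_START)]

def one_generation(lines, criteria, word_contribution, markov_complete):
    """One edit pass over the surviving lines (steps 5-6)."""
    new_lines = list(lines)
    for line in lines:
        start = plan_edit(line, criteria, word_contribution)
        new_lines.extend(propose_replacements(line, start, markov_complete))
    return new_lines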

let wall street start off wall detroit's
and wall street start wall voiced
let wall street start wall street wall point
wall street out loud wall point

Figure 5.5: An early example of a poem from a prototype Editorial Algorithm, using Bernie Sanders’ lines from presidential debate transcripts as a source text. In this prototype, pairs of words were replaced during each edit. (An even earlier version, replacing single words, resulted in lines like “let wall wall wall wall wall wall street”.) This problem was avoided by a later protocol in which the target word and everything after it in the line is re-generated at once.

5.3.2.2 Source Texts

TwitSong 3.0's architecture allows it to generate poems more quickly, and from shorter source texts, than any previous incarnation. In particular, the use of a Markov chain to generate both special seed lines in the first generation and replacement lines thereafter means that a relatively small source text can be used to generate poems. TwitSong 3.0's lower limit seems to be around 20 kilobytes of text, or sometimes a bit less, depending on the properties of the text. Therefore, it can be initialized with only a handful of news articles on a breaking news topic, instead of needing to wait days or weeks for tens of thousands of tweets containing a keyword to appear.

We generated a great number of poems using TwitSong 3.0, mostly based on news articles from the BBC (http://bbc.com/news), CBC (http://www.cbc.ca/news), Maclean's (http://www.macleans.ca), and The Guardian (https://www.theguardian.com/international). We chose these sources as our mainstays because they are mainstream, professional English-language news sources which operate without a paywall. Occasionally we veered into other sources. For instance, when blockbuster movies were released, we collected fan responses to the movies from Tor.com (https://www.tor.com/) and The Mary Sue (https://www.themarysue.com/). For major holidays, the Wikipedia article describing the holiday was used. We also used alternative, non-news sources for some poems, such as classic novels available on Project Gutenberg (http://www.gutenberg.org/).

5.3.2.3 The Editorial Algorithm in action

As an illustration, we show how the Editorial Algorithm uses its word replacement techniques on lines for a poem about the film Avengers: Infinity War. One of the starting lines for this poem is:

thanos to grow the universe

This line receives high scores for prosody and imagery, but a low score for topicality and a moderately low score for the chosen emotion, surprise. As both topicality and emotion are below their minimum thresholds, the Editorial Algorithm focuses in on both of these for word replacement. Since topicality is calculated based on trigrams, TwitSong splits this line into its component trigrams:

thanos to grow / to grow the / grow the universe

The first and last trigrams are selected because they are not found in the trigram dictionary (that is, in the source text; “to grow the” appears in the source text twice). Thus, TwitSong generates a set of candidate replacement lines starting at the beginning of the line, and a set of candidate replacement lines modifying only the last three words. For emotion, TwitSong splits the line into its component words:

thanos / to / grow / the / universe


None of these individual words are strongly associated with the emotion of surprise in the NRC Hashtag Emotion Lexicon, and some do not appear in the lexicon at all. Therefore, TwitSong flags all of them, and generates a maximal set of candidate replacement lines (a different set beginning the word replacement at each word). The completed poem from this run of TwitSong reads:

marvel had the fall of your mouth
luke of this journey through
infinity stone to point out
gags to where thor is too

5.3.3 Evaluation

As in the previous two generations, we ran a study to evaluate TwitSong. Specifically, our goal was to falsifiably test whether or not the Editorial Algorithm and its associated line rating techniques improved TwitSong’s poetry.

Emotion (Frequency): Topics

Disgust (7): Mueller investigation; the Parkland school shooting; Rex Tillerson; Doug Ford's election campaign in Ontario; March For Our Lives; the Stormy Daniels scandal; Viktor Orban's election in Hungary
Fear (6): Winter Olympics (2); Uber self-driving car crash; the Russian election; Austin bombing; NAFTA negotiations; Syrian chemical attack
Anticipation (5): Kim Jong Un's visit to China; Russian spy poisoning; US trade war; North Korea; Michael Cohen warrant
Anger (4): The Cambridge Analytica scandal; Facebook; Tim Hortons; Mark Zuckerberg
Joy (3): Winter Olympics (1); A Wrinkle in Time; Easter on April 1
Sadness (3): Stephen Hawking's death; Good Friday; Humboldt Broncos bus crash
Surprise (1): The Oscars
Trust (1): Black Panther

Table 5.3: Frequency of emotions from the NRC Hashtag Emotion Lexicon assigned to poems on different topics, from the group of 30 topics that were selected for the study. The topics in this table are sorted by associated emotion for ease of reading, and their order does not correspond to the ordering of topics in the study.

5.3.3.1 Method

We assembled three experimental groups of poems: Group A, Group B, and Group C.

Poems from Group A were generated according to the Editorial Algorithm described above. Lines were taken from a source text and generated using a Markov chain trained on the source text.

Group A: FOR CANADA (Olympics, joy)
hamelin pointing at the world
team made it would be fair
swiss stones for pavel is absurd
swiss stones for him and there

Group B: WHY IS TRUMP SILENT (Mueller investigation, disgust)
republican claims he will do
flynn pleaded not care less
committee has to look into
pleaded not to the press

Group C: WAKANDA (Black Panther, trust)
blackness as we love to her aid
killmonger's plan to come
conflict the atlantic slave trade
sword and it was awesome

Table 5.4: Example poems from the three experimental groups.

The best lines of each generation were then edited with the goal of increasing their score on our four criteria of Topicality, Emotion, Imagery, and Meter.

Poems from Group B were generated with a minimal version of the Editorial Algorithm. Lines were taken from a source text and generated based on a Markov chain trained on the source text. If this resulted in enough RhymeSets to produce a quatrain in Common Meter, the program was stopped there. Otherwise, it was allowed to iterate and perform the Editorial Algorithm for only enough generations to produce a valid quatrain. In either case, every line was then assigned a score of zero, and the lines for the quatrain were chosen arbitrarily, as in the control group for the first study (Section 5.1.2.3). Group B was meant as a control group in which the Editorial Algorithm did as little to improve the poems as possible, yet in which the poems were as similar as possible to the poems of Group A in every other respect.

Poems from Group C were generated with a reversed Editorial Algorithm. That is to say, the algorithm was run as in Group A, but both the line rating and the edit planning steps were programmed to minimize instead of maximize the poem's scores. So poems from Group C were the Editorial Algorithm's attempt to make poems that were off-topic, unrelated to the selected emotion, abstract and devoid of imagery, and that failed to conform to an iambic stress pattern.

We chose a set of 30 news topics that were current at the time of the study and generated a Group A, Group B, and Group C poem for each. We then constructed a test set for our study in which, for each of the 30 topics, two of the groups were selected. The order of the news topics was not randomized, but the order of pairings (A vs B, A vs C, B vs A, B vs C, C vs A, or C vs B) was randomized across the set of news topics. All eight of the emotions from the NRC Hashtag Emotion Lexicon were present in our set of poems, but we made no attempt to balance or equalize the appearance of different emotions; instead, we generally picked the emotion that was most prevalent in articles describing each topic according to the lexicon, with some normalization and some exceptions (see Table 5.3 for a full list of topics and emotions).

We recruited experimental subjects, as in the previous study, by snowball sampling, in order to include a reasonable number of poetry experts in our analysis. Each subject was directed to an online survey in which they were asked about their poetry expertise, presented with each of the 30 pairs of poems, and asked their opinions on the poems. Participants were also asked a few demographic questions and given a freeform text box at the end in which to write other comments about the poetry in the study. The full study, if completed, took about 40 minutes, and participants were given 10 Canadian dollars, or the equivalent in their local currency, as remuneration.

Mindful of the methodological issues of the study in Section 5.2, we returned to a pairwise forced choice paradigm for generation three, as we had previously done for the first generation. For each pair of poems in the study, participants were asked the following questions:

• Which poem do you prefer? (General/Reaction)
• Which poem is more creative? (General)
• Which poem does a better job expressing the emotion of [emotion]? (Reaction)
• Which poem does a better job describing the topic of [topic]? (Meaning)
• Which poem is more new and different? (Novelty)
• Which poem has better imagery? (Craft)

5.3.3.2 Results

5.3.3.2.1 Demographics

We divided our survey participants into experts and non-experts based on their self-reported experience with poetry. For ease of processing, we used a slightly more stringent (and simpler) definition of expertise: experts were defined as participants whose poetry had been published in a magazine, anthology, collection, etc.

32 poetry experts participated in our study. This included 11 men, 10 women, 9 non-binary poets, and two experts who did not disclose their gender. (As in the previous study, this is a serious overrepresentation of non-binary poets, but we do not expect it to affect our study results, as none of the poems in the sample discuss queer/trans* issues.) Their ages ranged from 22 to 57, averaging 38. 28 of the 32 experts were native English speakers. 49 non-experts participated in our study, including 17 men, 27 women, and 5 non-binary participants. Their ages ranged from 17 to 64, averaging 32. 37 of the 49 non-experts were native English speakers.

As in the previous study, many of our non-experts could be considered quasi-experts, as they had some experience with poetry despite not meeting the criteria for being considered an expert. Nearly all (45) of the non-experts had read poetry for pleasure. Many had written poetry for pleasure, read digital poetry, or studied poetry at university. A few non-experts had other significant forms of poetry expertise, including writing digital poetry, performing at a poetry slam, teaching poetry, being a poetry editor, publishing criticism/reviews of poetry, or creating an art installation involving poetry. Aside from having had their poetry published, experts reported higher experience than non-experts in nearly every category tested, but no other form of experience with poetry was exclusive to experts.

We also observed high attrition among our survey participants, as the survey was rather long. Only 18 experts and 28 non-experts managed to complete every question and make it to the end of the survey. However, since the order of appearance of poems from different groups was randomized, this still left us with a good number of pairwise comparisons between each possible pair of groups for each participant, and did not present a major statistical problem.

5.3.3.2.2 Group comparison We evaluated pairwise preferences between poems by treating the choices as draws from a binomial distribution; statistical significance is calculated from the cumulative binomial probability (an exact binomial test). The null hypothesis is that the probability of choosing a poem from one group over a poem from another, on any question, is 50%. As there are six questions, we applied a Bonferroni correction for multiple hypotheses, resulting in an alpha level of .0083 per test. Our results are shown in Figures 5.6 and 5.7.

As we had hoped, experts significantly preferred poems from Group A to poems from Group B on all six questions, p < 0.0083 for all. The differences between Groups B and C were not significant; more surprisingly, neither were any of the differences between Groups A and C.

Non-experts, like experts, significantly preferred poems from Group A to poems from Group B, p < 0.0083 for all questions. Unlike experts, non-experts significantly preferred Group C to Group B on all questions, p < 0.0083. The differences between Groups A and C were not significant for non-experts.

Rather than the hierarchy with A > B > C that we had expected, it seems that there is little difference between A and C. For experts there is some evidence of a possible hierarchy with A > C > B, but with the differences other than A > B too slight to be significant. For non-experts, Groups A and C seem to be genuinely statistically indistinguishable. This is illustrated in Figures 5.6 and 5.7.
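
For reference, the significance calculation described above can be reproduced with an exact two-sided binomial test under the 50% null hypothesis and the Bonferroni-corrected alpha of 0.05 / 6 ≈ 0.0083. The sketch below uses only the Python standard library; the counts shown are hypothetical placeholders, not our study data.

```python
# Minimal sketch (standard library only) of the exact two-sided binomial test used
# to compare groups, with the Bonferroni-corrected alpha. Counts are hypothetical.
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Two-sided exact binomial p-value for k successes in n trials."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Sum the probability of every outcome at least as unlikely as the observed one.
    return min(1.0, sum(pr for pr in pmf if pr <= observed + 1e-12))

ALPHA = 0.05 / 6  # Bonferroni correction across the six survey questions

k_wins, n_pairs = 70, 100  # e.g. Group A chosen 70 times out of 100 A-vs-B pairs
p_value = binomial_two_sided_p(k_wins, n_pairs)
print(p_value, p_value < ALPHA)  # True if significant at the corrected level
```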

Figure 5.6: Success rates for types of computationally generated poems in pairwise comparisons with other poems, judged by experts. The height of a given bar represents the number of times a poem from that category was selected in preference to any other poem. Error bars represent 95% confidence intervals, prior to Bonferroni correction.

Preference, Creativity, & Novelty:
PLAYERS WOULD NAP (group C, Humboldt Broncos bus crash, sadness)
players have donated the ones
injured but you know how
games to come playoff traditions
games and panic and now

Emotion:
TRUMP LEAVES HEADS SPINNING (group C, the Parkland school shooting, disgust)
shooting following his thursday
weapons sales to the core
weapons like his nra say
gun laws have one used for

Topic:
EASTERTIDE (Easter on April 1, group A, joy)
orthodox easter on sunday
liturgy of some sort
liturgy of april fools' day
easter and are ignored

Imagery:
STEPHEN HAWKING (Stephen Hawking, Group C, sadness)
stephen was a fun loving guy
research to discuss cash
if we find evidence of ai
rees said to have shed mass

Table 5.5: The poems most strongly preferred by experts on each of our six questions.

Figure 5.7: Success rates for types of computationally generated poems in pairwise comparisons with other poems, judged by non-experts. The height of a given bar represents the number of times a poem from that category was selected in preference to any other poem. Error bars represent 95% confidence intervals, prior to Bonferroni correction.

5.3.3.2.3 Favorite poems Out of curiosity, we calculated the experts' most highly rated poems in the data set on each of the six questions. Since our study uses pairwise comparisons rather than an "objective" quality measure, these are not necessarily participants' favorite poems, but rather the poems that most markedly outperformed the poems they were paired with. These are listed in Table 5.5. Interestingly, all but one of these outstanding poems was from Group C. Two of them are Group C poems which were paired with, and markedly outperformed, a poem from Group A.

             Preference  Creativity  Emotion  Topic  Novelty  Imagery
Preference   1
Creativity   0.855       1
Emotion      0.563       0.360       1
Topic        0.525       0.320       0.344    1
Novelty      0.839       0.861       0.351    0.227  1
Imagery      0.904       0.837       0.421    0.423  0.892    1

Table 5.6: Correlations (Pearson’s R) between answers to each of the six questions, as judged by experts.

5.3.3.2.4 Correlation between questions We also looked at the correlations between the answers to our different questions, to see if our questions were truly capturing different dimensions underlying Product creativity, or if participants were simply choosing the poem they preferred. The results, for experts, are in Table 5.6. All the correlations between questions are above 0, which is not worrisome in and of itself, since it is expected that a preference for a poem in some questions would have a priming effect on preferences in the other questions. However, some correlations are weak to moderate, while others are strong. There is a notable gap between the strongest moderate correlation (preference and emotion, Pearson’s r=0.56) and the weakest strong correlation (creativity and imagery, r=0.84). It appears from these results that the measures of preference, creativity, novelty, and imagery are all strongly intercorrelated with each other, while emotion and topic are more independent. This implies that experts evaluate the poems being studied on three basic dimensions. One is how well the poem represents the target topic; another is how well the poem expresses the target emotion; a third is a more nebulous measure of how “good” the poem is, which includes novelty, imagery, and overall preference. Non-experts exhibited the same pattern as experts, with three underlying dimensions.
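
As a minimal sketch of how correlations like those in Table 5.6 can be computed: each pairwise comparison is coded as 1 if the first-listed poem was chosen on a given question and 0 otherwise, and Pearson's r is taken between the resulting indicator vectors. The toy responses below are hypothetical, not our study data.

```python
# Minimal sketch of the between-question correlations (cf. Table 5.6): answers are
# coded 1 if the first-listed poem was chosen and 0 otherwise, per question, and
# Pearson's r is computed between the six indicator vectors. Toy data only.
import numpy as np

QUESTIONS = ["Preference", "Creativity", "Emotion", "Topic", "Novelty", "Imagery"]

def question_correlations(responses):
    """responses: (n_comparisons x 6) array of 0/1 choices; returns a 6x6 r matrix."""
    return np.corrcoef(np.asarray(responses, dtype=float), rowvar=False)

toy_responses = [[1, 1, 0, 1, 1, 1],
                 [0, 0, 1, 0, 0, 0],
                 [1, 1, 1, 1, 1, 1],
                 [0, 1, 0, 0, 1, 0],
                 [1, 0, 1, 1, 0, 1]]
r_matrix = question_correlations(toy_responses)  # analogous in shape to Table 5.6
```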

5.3.3.2.5 Freeform comments We also counted and categorized the freeform comments made by experts and non-experts. Experts commented more often than non-experts, but there was more unity in the types of comments made by non-experts. Several experts and several non-experts stated that the poems usually didn't represent the intended emotions very well. Both experts and non-experts also wished that there was a neutral/none/both option in the survey for times when neither poem met its targets particularly well.

Experts, but not non-experts, were concerned about the poems' coherence. Several stated that the poems were incoherent, or that they cared more about the poems' coherence than about the qualities the survey asked about. Two experts added that some lines were great, but that they were spoiled by their proximity to incongruous or "word salad" lines.

Non-experts were much more likely to make comments about the quality of the poems in general, although they were divided in their responses. Several said that the overall set of poems, or the idea for the study itself, was interesting or cool. Some said that the poems overall were not very good, while others said that some individual poems within the set were quite good. Several non-experts also indicated that the poems were hard to understand or didn't make sense, which may be the non-expert version of complaints about coherence.

One expert commented, "My god, that was awful. The poems were some of the worst computer-generated texts I've ever seen." In contrast, a non-expert said, "This is a really interesting study–I was trying to guess which poems were computer-generated as I did the survey, and I couldn't tell most of the time!" This latter comment is notable since we had intended for it to be clear that all poems in the study were computer-generated.

BORDER POST (group A, NAFTA, fear)
migrants are expected to hold
progress is very strong
mexico a limited role
they harass them along

AND BLACK FRIDAY. (group A, Good Friday, sadness)
generally closed for details
friday in the uk
liturgy of jesus with nails
easter falls on thursday

MR PUTIN (group C, Russian election, fear)
violations are putin's tune
russians share of which ones
kremlin said there is very soon
kremlin said relations

Table 5.7: The three poems that each received one retweet on Twitter during our Press evaluation. (The poem that received one Like is the Stephen Hawking poem which is reproduced in Table 5.5.)

5.3.3.3 Press evaluation

In addition to our survey, we also attempted a Press evaluation to gauge TwitSong's cultural success (see Section 2.6.7; Jordanous et al., 2015; Jordanous, 2016b; DiPaola et al., 2013; Sheridan et al., 2005; Tresset and Deussen, 2014). Over the two months from mid-March to mid-May, we posted several of TwitSong's poems per week on the Twitter account @uwtwitsong10. 54 topics were used, including all 30 of the topics from the Product survey as well as other news topics and some non-news topics such as novels. For each topic, three poems were posted between 9am and 5pm Eastern time on the same calendar day—one from Group A, one from Group B, and one from Group C. The exact time and order were randomized.

Our intent was to compare the number of Likes and Retweets gathered by each group, and to use this to gauge the relative Press success of each group. Unfortunately, although we promoted the @uwtwitsong account and gathered over two dozen followers, the poems received almost no Likes or Retweets. Exactly three poems were Retweeted once each, and one received a single Like. (These poems are reproduced in Table 5.7.) This is not enough data on which to

10http://www.twitter.com/uwtwitsong/

perform any statistical analysis, although two were from Group A and two from Group C. It is, perhaps, an indication that TwitSong is not especially successful from a Press perspective.

5.3.4 Discussion

We were quite surprised by our results. It seems that although the Editorial Algorithm does indeed improve poems, it does so even when told to make the poems worse. We can think of a few ways to interpret this result.

One is to conclude that our line rating metrics are useless and that something else about the Editorial Algorithm is what improves the poems. However, we are not sure what this would plausibly be. Although our line rating metrics did not seem to improve the poems in the previous study, that study was done with numeric scales, which are less able to pick up small but consistent differences between groups, and this one was done with the more appropriate technique of pairwise comparisons. Although participants complained in both studies that they did not see much difference between the groups, they nevertheless were able to detect, perhaps without realizing it, the difference that there was. This difference must be due to something related to the line rating metrics and how they were used, as there were no other consistent differences between the poems from the three groups. It is conceivable that replacing words in successive generations somehow inherently improves the poem even if the words and the reasons for replacing them are arbitrary, but we are not sure why this would be the case.

Another possible explanation is that, while the line rating metrics are useful, their reverse versions are also useful. This is most easily explained with the Meter metric. A line with a perfect score of 1.0 for meter is (as far as the computer can detect) a perfect iambic line. However, the opposite of an iambic line is not simply a line without any meter. Instead, the opposite of an iambic line—with unstressed syllables where the stressed syllables should be, and vice versa—is a trochaic line. But trochaic poetry is as metrically valid as iambic poetry. It is likely that, while lines from Group B had random stress patterns and lines from Group A were mostly iambic, lines from Group C were mostly trochaic, and there are good reasons why a human would prefer both Groups A and C to Group B on this metric. Indeed, many of the poems from Group C do contain trochaic or close to trochaic meter, with lines like "game that finish gave a doping" or "shooting following his thursday". This explanation is speculative due to a flaw in our experimental design, namely, that we did not include a question like "Which poem has better meter and rhythm?" in our survey—in spite of the fact that rhythm and meter, every bit as much as Imagery, is a valid subcategory of Craft.

If both Group A and Group C have good meter and Group B does not, then there are two possible explanations for the rest of the results. One is that the answers to the other questions are illusions—that survey participants prefer the poems with better meter, and that the meter tricks them into thinking that all of the other categories are also better. Another possible explanation is that other line rating metrics, besides meter, also exhibit this reverse effect. A poem with low Topicality might contain more unusual trigrams and, thus, more Novelty. A poem with a sufficiently low rating for one emotion might end up exhibiting another, equally interesting emotion. Lines with lower Imagery might use simpler language (stating a situation outright instead of representing it with a sensory image) and therefore be more coherent.
This explanation does not completely explain the data; for instance, it does not explain why Groups A and C are both more topical and more novel than Group B.
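The meter argument above can be illustrated with a small sketch. We assume, purely for illustration, that each line's stress pattern is already available as a string of '0' (unstressed) and '1' (stressed) syllables, as a CMU-dictionary-style lookup would provide; this is not TwitSong's exact implementation. Scoring a line as the fraction of syllables that match an alternating iambic template means that minimizing the score pushes a line toward the reversed, trochaic template rather than toward metrical randomness.

```python
# Minimal sketch (assumed stress strings, not TwitSong's implementation) showing
# why minimizing an iambic meter score yields trochaic rather than arrhythmic lines.

def meter_score(stress_pattern, start="0"):
    """Fraction of syllables matching an alternating template beginning with `start`."""
    template = [("0", "1")[(i + int(start)) % 2] for i in range(len(stress_pattern))]
    matches = sum(1 for s, t in zip(stress_pattern, template) if s == t)
    return matches / len(stress_pattern) if stress_pattern else 0.0

iambic = "01010101"      # te-TUM te-TUM ...
trochaic = "10101010"    # TUM-te TUM-te ...
print(meter_score(iambic))               # 1.0: a perfect iambic line
print(meter_score(trochaic))             # 0.0 on the iambic template...
print(meter_score(trochaic, start="1"))  # ...but 1.0 on the trochaic template
```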

We suspect that our results are due to a combination of both explanations. Both Group A and Group C contain improvements on the Group B poems to some extent, especially the very visible and visceral improvement of meter, but Group A is slightly more on target with regards to its other specified goals. Experts are more sensitive to this on-targetness, resulting in a ranking where Group A (slightly, non-significantly, but consistently) outperforms Group C, while non-experts are more fully swayed by meter and base their preferences much more on meter than on any other factor we can control. It should be noted that although the differences between Group A and Group C when judged by experts are not large enough to be significant, there is only a 1/64 chance that Group A would outperform Group C on all six questions if the data were truly random.

This explanation is tentative, however. If it is correct, then we would expect several predictions to come true in further experiments. First, we would expect that, if we did include a question about Meter (or measured Meter automatically), then Groups A and C would prove to have better meter than Group B, and that quite a few if not all of the other questions would be highly correlated with Meter, especially for non-experts. Second, if we were to somehow come up with a better implementation of our line ratings, then the difference between Group A and Group C, at least for experts, would increase.

We note that some of the results from our human poetry study (Section 4.1) were not replicated here. Specifically, non-experts were similar to experts rather than reversing their pattern, and novelty correlated strongly with preference instead of reversing it. We suspect that the similarity between experts and non-experts is due to two factors. First, although the poems are not coherent, they are also not intentionally allusive or obscure, and thus there is no cause for an effect where experts understand the poems and non-experts do not. Second, as mentioned, our non-experts are really mostly quasi-experts—more similar to the XDM students of Section 4.2 than to the completely naive judges of Sections 4.1 and 5.1—and would thus mimic expert opinion more closely than a sample of random people from a service like Crowdflower.

As for novelty, we suspect this may be due to the differences between human and computer-generated poetry. Anecdotally, we have found that the poetry of unskilled humans is easy to understand, but mawkish and stereotyped. This sense of obviousness, in our human poetry, was what led non-experts to prefer unskilled, non-novel poetry. However, unskilled computer poetry is usually not particularly obvious; instead, it tends towards incoherence. If all or almost all of the poetry in a data set is incoherent, then any discernible thought emerging from the chaos will be both novel and interesting in comparison, in ways that do not make the poetry less accessible to non-experts.

5.3.5 Conclusion

We have now seen three separate incarnations of TwitSong, each of which works on the same principle—find lines in existing text, rate them according to desired qualities, pick the best ones, and arrange them—but each of which has different capabilities and results. We have progressed from a proof of concept that outsources the line ratings to humans, to a more fully automated system generating Twitter sonnets, to a quatrain system which can edit its own work. It is time to reflect on how far we have come and what we have learned from constructing these three systems.

From a Product perspective, rigorous tests of TwitSong have yielded mixed results. It is clear, from generation one, that the line selection algorithm works in principle. However, our automated versions of line selection are not especially good. We will discuss this much more, along with possible lines of research for better forms of line selection, in Chapter 6.

The other major problem with TwitSong from a Product perspective is that its poetry is not very coherent. Coherence means more than simply sticking to a topic; coherence means being able to carry a thought from the beginning to the end of a poem without losing track of it. Coherence has to do with syntax and sentence structure as well as staying semantically consistent and avoiding non-sequiturs. In Chapter 6 we will discuss how coherence is a challenge for many other otherwise successful computational poetry systems, not only TwitSong. Operationally defining and achieving coherence in computational poetry remains a topic for future research.

From a Process perspective, TwitSong has not been formally evaluated. As described in Section 2.4.5, a formal Process evaluation would require bringing in an outside expert to thoroughly investigate TwitSong's workings, followed by placing it in a category based on its level of Process creativity, or qualitatively describing its strengths and weaknesses based on a model like SPECS, or both. We have not performed this evaluation. As TwitSong was initially conceived as a project closer to the engineering-mathematical side of Pérez y Pérez's (2018) continuum, we have always been more interested in Product and Press evaluation for this project than Process.

However, we can argue informally that in some sense, TwitSong has acquired more Process creativity in each incarnation. Using automated instead of human line selection gives more autonomy to generation two than generation one had, and allowing generation three to revise its work gives it yet more. TwitSong is, of course, quite far from being fully autonomous, as the metrics it uses, their weights, the source text chosen, the emotions to target, and all other aspects of its algorithm are still specified by the programmer. However, by revising its own work to improve semantic properties such as topicality and emotion, TwitSong's third generation surpasses WASP (Gervás, 2016), which was previously the state of the art in Process in terms of optimization.

In our Process-based taxonomy of generative poetry, TwitSong occupies the area of Computer Enhancement thanks to its use of optimization. Its use of knowledge / data mining is restricted to the very superficial counting of trigrams and of topical words in its source text. However, some other systems in the Knowledge Representation category use similarly superficial methods to gather topical words (Toivanen et al., 2012, 2014; Tobing and Manurung, 2015; Rashel and Manurung, 2014b). TwitSong's Process could be improved if it used a more sophisticated form of knowledge representation, using techniques like named entity recognition or a vector space model so as to match the current state of the art for knowledge representation in poetry. We feel that it is more important to improve the results of TwitSong's optimization techniques first, rather than adding knowledge representation for knowledge representation's sake (but it might be that better knowledge representation would enable better optimization for Topicality and Emotion).

Finally, from a Press perspective, our one attempt at a Press experiment shows that TwitSong is not especially successful from this perspective. A more coherent Product would likely enable TwitSong to attract more readers and attention, and thus improve its Press creativity.

Chapter 6

Discussion, limitations, and future work

6.1 Unsolved questions in poetry evaluation

Our set of four Product criteria—Reaction, Meaning, Novelty, and Craft—with their sub-criteria, is not an empirically validated poetry rating system. They are domain-specific criteria differentiated by what seem to us to be reasonable conceptual distinctions. Their direct basis in the responses of poetry experts to computer-generated poetry renders them more solidly evidence-based than other domain-specific poetry criteria currently in use. However, much work remains to be done before we can say that we have a valid, evidence-based set of criteria for poems.

In particular, our results from the third generation of TwitSong show that the criteria are not necessarily statistically distinct from each other, even for experts. More work needs to be done evaluating experts' ratings of poems on all of the various sub-criteria and seeing if there is a statistical grouping that fits the data better than our initial four-way grouping. After this is done, a standardized questionnaire should be developed which addresses all sub-criteria and which can be tested for its reliability and validity on diverse types of both human and computer-generated poetry.

The questions used in our existing studies are still essentially ad hoc, since they only address certain sub-criteria and only in ways which we happened to feel were relevant to the work we were doing. In particular, in the third study (Section 5.3) we failed to address Coherence, which was a major issue for both expert and non-expert raters. However, Coherence is a sub-criterion belonging to our four-criteria model, so this is not a deficiency of our model, merely a deficiency of how we chose to ask questions based on the model. A standardized and validated questionnaire would avoid this issue.

Many computer-generated poetry projects are not yet at a stage where the researchers intend to make finished poetry successfully satisfying all of the criteria. Rather, they are intended as proofs of concept or as explorations of a single criterion. This is not necessarily an argument against using a model such as our four criteria. One could imagine that, after a standardized questionnaire is developed, either the whole questionnaire or a part of the questionnaire could be used. For example, if a researcher was solely

attempting to devise a way of making computer-generated poetry more meaningful, and did not care about the other criteria, then they could still use the standardized questionnaire, but would simply evaluate only the questions assigned to the category of Meaning. Thus, a fully validated method of evaluating the Product creativity of poems would benefit everyone, even if their research goals were narrower than ours.

6.2 Statistical considerations

Although it is customary to describe the work in a thesis as a research program which was planned in advance and then executed, the nature of our research over the years of our thesis work was not quite like this. Instead, work in different areas of the thesis was undertaken concurrently, modified, influenced, and revisited in a nonlinear manner, and discoveries made partway through one area of the work shed light on how we should have, or could have, done other parts better. There are several purely mathematical and methodological aspects of our research that we would do differently if we were doing it over again.

The statistical methods used changed between different generations of TwitSong, which raises methodological questions. It is difficult to compare one generation of TwitSong to another when each generation is evaluated in a different way: sometimes numeric scales and sometimes pairwise choice, different numbers of options on the numeric scales, and different phrasings, groupings, or selections of questions for the judges surveyed. Also, each generation of TwitSong changes multiple aspects of the poems compared to the previous generation: not only the line generation methods but also the form of the poems, the source material used, and small ad hoc aspects of the program such as the methods used to identify unpronounceable lines, lines that stop in the middle of a word, etc.

This does not invalidate our entire research plan, since our aim was not, precisely speaking, to compare one generation of TwitSong to another. Instead, we viewed each generation as its own self-contained experiment. The goal of Generation Two was not to produce poems that could be directly compared with Generation One, but to demonstrate that the line rating methods used in Generation Two were effective compared to a version of Generation Two without them. Similarly, the goal of Generation Three was not to produce "better" poems than Generation Two, but to demonstrate that the Editorial Algorithm produced better poems than a version of Generation Three that did not edit its lines. However, aside from the fact that neither of these things was demonstrated in the way that we hoped, this lack of direct comparability between generations does limit the scope of the findings from each successive generation.

As for the use of different evaluation methods across generations, this is a limitation in research which is in large part meant to demonstrate the proper use of the principles of evaluation that we detail in Chapter 2. However, until a standardized method for computational poetry evaluation is fully developed and validated, it will continue to be the case that all evaluations of the type we have been doing will be in some sense ad hoc. Parameters such as the phrasing of the questions and the number of options on a numeric rating scale cannot be derived a priori from theory, and there will be no justification for standardizing them until validity and reliability testing has been performed. However, it is true that using different evaluations for different generations of TwitSong increases the problem of not being able to compare different generations to each other, and simply picking a detailed evaluation methodology and sticking to it would have prevented this. In particular, the choice of specific sub-sets of our four desiderata (imagery as a stand-in for all of Craft, for instance) is arbitrary and could have been improved. In many cases, we cannot justify the use of

particular parameters for our evaluation, such as the number of options on a numeric rating scale, because we do not remember why we chose the values for those parameters that we did.

6.3 Developing better line evaluation metrics

Our experiment with the first generation of TwitSong shows clearly that humans choosing lines based on topic and sentiment—and, to a lesser extent, imagery—measurably improves the resulting poems. Choosing lines based on meter, a simpler numerical calculation, also seems to improve them. (It should be noted that meter is not necessarily a mechanistic calculation; for instance, there are single-syllable words which can be stressed or unstressed depending on their context. However, meter can at least be estimated more easily than variables such as meaning.) But the second and third generation experiments show that our automated measures of topic and sentiment are not adequate to the task.

An urgent task for improving a system like TwitSong would therefore be to improve the automated measures. Detecting topic and sentiment in text is an open research problem, but more sophisticated efforts than ours certainly already exist; we deliberately chose simple methods in a "quick and dirty" frame of mind, hoping to generate many attempts at poems very rapidly, and it appears that we chose poorly. Among others, Ghosh et al.'s neural language model (Ghosh et al., 2017) is a more sophisticated example of generating text based on a chosen emotion.

• Concreteness. Compared to 19th century poetry, contemporary poetry has more object words, less abstractness, and more concreteness. Amateur poetry is less concrete and less imageable than the 19th century poetry. ("Abstractness", "concreteness", and "imageability" are measures based on various special-purpose dictionaries.)
• Emotion. Compared to 19th century poetry, contemporary poetry has fewer emotion words and lower emotional arousal. Amateur poetry has similar arousal to the 19th century poetry, but with even more emotion words and higher valence.
• Sound features. Compared to 19th century poetry, contemporary poetry has fewer perfect end rhymes and more assonance (repetition of vowel sounds). Amateur poetry has fewer perfect end rhymes as well, but also more identical end rhymes and less consonance.
• Vocabulary. Amateur poets use shorter words and have a smaller type-token ratio than 19th century poets. There were no differences between 19th century and contemporary professional poets on these axes.

Kao and Jurafsky attribute the differences between 19th century and contemporary poetry to the influence of Imagism, a movement emerging in the early twentieth century from which many of the axioms taught to beginning poets today derive, including the emphasis on concrete imagery (Kao and Jurafsky, 2015). From their results we see that stylistic differences between genres of poetry can be nuanced and involve

more than one axis. We also see that amateur poetry tends to be in some ways "outdated", resembling the linguistic features of older professional poetry more than it resembles the current century's professional poetry, but amateur poetry is also different from professional poetry in other, non-time-dependent ways.

Many other textual markers of "good" poetry are possible beyond Kao and Jurafsky's. Hirjee and Brown successfully use rhyme patterns detected by RhymeAnalyzer (Hirjee, 2010) to distinguish lyrics written by different rappers, which suggests that internal rhymes and other features detectable with this tool could also be used in other forms of poetry. Kaplan and Blei (2007), who are cited by Kao and Jurafsky, use a greater variety of textual measures, including orthographic measures (words per poem, words per line, lines per poem, stanzas per poem, word length, lines per stanza); POS frequency and other measures based on POS frequencies; and internal rhymes. (They use these measures to distinguish the styles of individual poets, rather than to comment on which measures contribute to "good" or professional poetry.)

For more semantically oriented measures, the Linguistic Inquiry and Word Count (LIWC) method (Pennebaker et al., 2001) can be used to measure word frequency in a variety of categories, including words relating to various emotions and to various topics such as the body, social relationships, and cognitive processes such as uncertainty, as well as detecting subtle patterns in function words which can be indicative of personality and emotional state. Dalvean (2015) uses LIWC, along with another set of psycholinguistic word norms, to analyze the difference between amateur and professional contemporary poets. Important characteristics of professional poetry in Dalvean's analysis, when compared to amateur poetry, include more concrete, demonstrative language; more unusual words; fewer emotion words; more number words; and fewer references to time. LIWC is also used to analyze the personality of writers (Tausczik and Pennebaker, 2010), which could lead to the creation of personality variables such as the ones used by InkWell (Gabriel, 2016).

It is worth asking, as many of our fellow researchers did upon seeing our experiment with human poetry (Section 4.1), whether or not the distinction between amateur and professional contemporary poetry is the distinction on which to base computationally creative optimization techniques. After all, an unskilled computer-generated poem may or may not resemble an unskilled human poem in any of these particular ways. Indeed, some researchers have said that they would be delighted for their computer to generate poetry resembling that of a human amateur. Such attitudes are understandable, because a computer that could emulate a human amateur would likely have solved several problems that are endemic to computer poets, especially the problem of coherence (see Section 6.5, below). However, it may not be the case that an unskilled computer is "more amateur" than an unskilled human. Instead, it may be that both the computer and the unskilled human have their own separate ways of differing from a skilled human. Additionally, different unskilled computers may diverge from the skills of professional humans in different ways. For instance, the characteristic patterns of text generated by a Markov model may differ from those of a template or other method.
With all of this in mind, we suggest the following procedure for determining line rating metrics to use in a large project like TwitSong:

1. Make an exhaustive list of all the potential metrics that might theoretically be of interest in your project. These metrics must include an implementation (e.g. “Imagery as measured by the Regressive Imagery Dictionary” as opposed to just “Imagery”). They might reasonably include all of the metrics discussed in the above studies, unless there are theoretical or practical reasons why some of them are not of interest.

2. Identify a reasonably large "base corpus" from which your poetic text will be materially derived and a reasonably large "target corpus" which exemplifies the poetic traits that you wish for your text to acquire. For instance, in the first and second generations of TwitSong, the "base corpus" would be an uncurated set of lines from Twitter; in the third generation, it would be a set of news articles. A target corpus for TwitSong is not expressly defined, but a corpus of contemporary English poetry would be a reasonable choice. One might alternately choose a corpus of, for example, humorous poetry in English, or of serious contemporary poetry specifically based on the news, such as the poetry published in Rattle's Poets Respond1.

3. Run a statistical regression similar to Dalvean's (2015) to identify which of these metrics are actually useful for distinguishing one corpus from another; a minimal sketch of this step appears after this list. (Care should be taken when selecting semantic metrics if the topics of the base and target corpus differ. For instance, when distinguishing news articles from contemporary English poetry, it might be expected that news articles would contain more words related to events, more present tense and third person, etc.—not because these words inherently make them less "poetic," but because these words are characteristic of how news is written and the target corpus is not news. Metrics that could provide confounding results in this way should be removed before performing the regression.)

4. Choose the metrics indicated by the regression model for purposes of line selection and editing in your system.
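
As referenced in step 3 above, the following is a minimal sketch of steps 3 and 4, under assumptions: candidate metrics are supplied as functions from a text to a number, the two corpora are lists of texts, scikit-learn is available, and a logistic regression in the spirit of Dalvean (2015) is used to rank metrics by how strongly they separate the corpora. The metric implementations shown are toy placeholders, not recommended metrics.

```python
# Minimal sketch (hypothetical metric functions; scikit-learn assumed available) of
# ranking candidate line metrics by how well they separate a base corpus from a
# target corpus, in the spirit of Dalvean's (2015) regression analysis.
import numpy as np
from sklearn.linear_model import LogisticRegression

def select_metrics(base_corpus, target_corpus, metric_functions, top_k=5):
    texts = list(base_corpus) + list(target_corpus)
    y = np.array([0] * len(base_corpus) + [1] * len(target_corpus))
    names = sorted(metric_functions)
    X = np.array([[metric_functions[name](t) for name in names] for t in texts])
    # Standardise features so coefficient magnitudes are roughly comparable.
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9)
    model = LogisticRegression(max_iter=1000).fit(X, y)
    ranked = sorted(zip(names, model.coef_[0]), key=lambda p: abs(p[1]), reverse=True)
    return ranked[:top_k]

# Toy placeholder metrics; real candidates would come from step 1 of the procedure.
metrics = {
    "avg_word_length": lambda t: sum(len(w) for w in t.split()) / max(len(t.split()), 1),
    "type_token_ratio": lambda t: len(set(t.split())) / max(len(t.split()), 1),
}
```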

This method would become cumbersome in some circumstances, such as when there were many different unrelated types of base or target corpus, but it would lead to better results. It would also not solve all line metric implementation problems with TwitSong. For example, it would be difficult to use this method to implement topicality, since the target corpus is likely chosen for stylistic reasons rather than for being more on-topic than the base corpus. Still, we frequently thought of doing an analysis like this for TwitSong but chose to work on other parts of the system instead. We regret this choice.

As a side note, Kao and Jurafsky's (2015) and Dalvean's (2015) analyses both point to fewer emotion words as a hallmark of better contemporary poetry, while TwitSong, as well as many other computational poetry systems, specifically seeks words related to a specific emotion. (Indeed, Kao, Jurafsky, and Dalvean would seem to contradict the non-expert raters who strongly preferred positive emotion in the first generation of TwitSong—though, of course, those were non-experts.) It may be that these goals can be reconciled somehow. As Kao and Jurafsky (2015) point out, the goal of Imagist poetry is not to be devoid of emotion, but rather, to express emotions implicitly by describing a concrete image which evokes them. A dictionary such as the NRC Hashtag Emotion Lexicon includes many such concrete words as well as directly emotional words. It may be that by maximizing an emotional measure that includes concrete words, but minimizing one more like the LIWC measure of Emotion Words—or by aiming for words within a certain, non-maximal emotional range—a computer could achieve this balance. It may also be that the two sides of such a paired implementation would simply cancel each other out. An analysis like the one described above would be necessary in order to discern whether or not a particular manipulation of dictionaries is a good practice.
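
A minimal sketch of the paired emotion measure speculated about above. The lexicons, weights, and names here are hypothetical stand-ins: nrc_scores plays the role of an NRC-Hashtag-style word-emotion association score (which includes concrete, evocative words), and explicit_emotion_words plays the role of a LIWC-style list of overt emotion terms. Whether such a combination helps or cancels out is exactly the open question raised above.

```python
# Minimal sketch of a hypothetical "paired" emotion score: reward words that evoke
# the target emotion, penalise overtly emotional vocabulary. Lexicons are assumed.

def paired_emotion_score(line, emotion, nrc_scores, explicit_emotion_words,
                         w_evoke=1.0, w_explicit=0.5):
    words = line.lower().split()
    evoke = sum(nrc_scores.get((w, emotion), 0.0) for w in words)
    explicit = sum(1 for w in words if w in explicit_emotion_words)
    return w_evoke * evoke - w_explicit * explicit
```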

1https://www.rattle.com/respond/

6.4 Developing intelligent editing strategies

As might be obvious from reading Section 5.3.2.3, our line editing strategies, independently of the metrics used to judge the lines, could use some work. Our current editing strategy is fairly liberal in what it selects to replace, and each candidate replacement line replaces both the selected word(s) and everything after them with arbitrarily generated Markovian text. The hope is that, with enough of these line regenerations, some of them will happen to have better scores on our line metrics. This strategy works adequately for line metrics, like meter, that are implemented well, but it is not especially sophisticated. A better strategy would be more selective—which would be easier to achieve with better line rating metrics. More importantly, an ideal strategy would have some way of ensuring that its replacements are actually an improvement, at least in some narrow way. For example, with imagery, one could imagine an alternate Markov chain which is constrained to generate only concrete words, or which is heavily weighted in favor of such words, as in the sketch below.
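
The sketch below illustrates the weighted-Markov-chain idea just mentioned; it is not TwitSong's implementation. The sampler re-weights each candidate next word by a hypothetical concreteness rating of the kind a psycholinguistic norms dictionary would provide.

```python
# Minimal sketch of a Markov sampler biased toward concrete words. `transitions`
# maps a word to {next_word: count}, and `concreteness` is a hypothetical
# word -> [0, 1] rating (e.g. from a norms dictionary).
import random

def sample_next(word, transitions, concreteness, bias=3.0, rng=random):
    candidates = transitions.get(word, {})
    if not candidates:
        return None
    words = list(candidates)
    # Multiply each candidate's count by a factor that grows with its concreteness.
    weights = [count * (1.0 + bias * concreteness.get(w, 0.0))
               for w, count in candidates.items()]
    return rng.choices(words, weights=weights, k=1)[0]
```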

6.5 Coherence

The biggest problem with TwitSong for many readers is not the individual line metrics, but the poems' overall coherence. While coherence was not directly addressed in the third generation's evaluation, it is a part of our set of four poetry criteria (a sub-criterion of Meaning). Coherence is a complex trait of a text with both syntactic features (e.g. grammatical correctness) and semantic ones ("making sense", or having the meanings of words relate to each other in a way that feels natural and easy to follow, as well as following the development of a single idea throughout the text).

TwitSong is not alone among poetry systems in struggling with coherence. While it is hard to assess the coherence of poetry in a language we do not speak, readers will note that nearly all of the English poetry excerpts in Section 3.2 are incoherent. Early versions of McGonagall (Manurung, 1999) are syntactically coherent, but are semantically coherent only when the semantics are specifically handcrafted. Stereotrope (Veale, 2013a) is coherent, but at the cost of feeling repetitive and trite, which caused poetry experts in Section 4.2 to dislike it more than most of the incoherent poems. InkWell (Gabriel, 2016) is coherent enough to pass for human with a group of poetry experts, but it is easier to write a passably coherent haiku than a coherent sonnet or quatrain, as human readers expect a haiku to be a short sequence of loosely connected, potentially fragmentary images. Goodwin's (2016) neural network poetry can be quite coherent, but Goodwin states that the results are uneven. Overall, coherence is still an important unsolved problem in computer-generated poetry.

TwitSong's first and second generations produce work that is reasonably grammatically correct, since its phrasings within a line are taken from tweets written by humans, though this does not produce overall coherence across the poem's different lines. TwitSong's third generation uses a Markov model, and produces text that feels notably Markovian and rambling. It could be made more grammatical by choosing a different mere generation method. For example, it could use a context-free grammar. Or, like ePoGeeS (Roque, 2011), it could use a class-based n-gram model to ensure that appropriate parts of speech follow each other. We would not expect this to solve the overall coherence problem, but there is such a thing as one version of a system having more or less coherence than another.

A knowledge base is another way to try to ensure coherence, though, as we have seen in Section 3.2.2, the use of knowledge representation does not in and of itself guarantee this. Using knowledge representation

would also represent a significant change to the architecture of TwitSong, as the knowledge would have to be factored in to the line generation and editing techniques somehow. A minimally invasive modification might be to use knowledge representation, either through a vector space model or through parsing and named entity recognition on individual base texts, as the measure of Topicality, but while this might easily improve Topicality line judgments, it would likely not make poems coherent on its own.

More interesting, and less well-explored, are methods for ensuring coherence across the different lines of a poem by investigating poetic structure. For example, one could augment Kao and Jurafsky's (2015) and Dalvean's (2015) methods of choosing desired line properties by also studying how these properties change across the course of a poem, particularly a longer poem like a sonnet. Does a human poem display about the same level of topicality throughout, or does its level of topicality tend to increase (or decrease, or move in some more complex but predictable pattern) as the poem goes on? Simonton (1990) applies this type of analysis to Shakespearean sonnets, finding that sonnets tend to increase in their number of unique words and number of syllables per word until the final couplet, where these measures (as well as primary process imagery) drop again; in the most successful sonnets, these effects are reduced. We have not encountered other work in the digital humanities which measures the progression of other metrics across the course of a poem, or which studies other (non-sonnet) forms of poetry in this way. A minimal sketch of this kind of progression analysis appears at the end of this section.

An even more promising way to create coherence across the lines of a poem might emerge from argumentation mining (Palau and Moens, 2009), which explores how different statements are used to support the main ideas of a text. Argumentation mining was developed for persuasive speech, but has also been applied to narrative (Bex and Bench-Capon, 2014). A poetic application of argumentation mining, whether for analysis or generation, would be extremely interesting. So would an application of computational rhetoric, which analyzes text for classic rhetorical structures such as anaphora and chiasmus (Harris and DiMarco, 2009).
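
As referenced above, here is a minimal sketch of the progression analysis: compute a per-line metric for each line of a poem and fit a linear trend across line positions, so that a corpus of human poems could be checked for systematic drift of the kind Simonton (1990) found in sonnets. The per-line metric used here is a toy type-token ratio; any line metric could be substituted.

```python
# Minimal sketch of measuring how a per-line metric drifts across a poem's lines.
# The metric here (type-token ratio) is a toy placeholder.
import numpy as np

def line_metric(line):
    words = line.lower().split()
    return len(set(words)) / len(words) if words else 0.0

def progression_slope(poem_lines):
    """Least-squares slope of the metric across line positions (0 if too short)."""
    values = [line_metric(line) for line in poem_lines]
    if len(values) < 2:
        return 0.0
    slope, _intercept = np.polyfit(np.arange(len(values)), values, 1)
    return slope
```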

6.6 Black boxes vs white boxes

Given the good results of Goodwin (2016) and the Chinese community, one might reasonably ask why we do not switch entirely to neural networks for poetry generation, and classify efforts like TwitSong as obsolete.

We feel that, with so many unsolved questions remaining in computational creativity, both neural network-based and rule-based approaches are useful in different ways for solving these problems. The advantage of rule-based systems is that they are not "black boxes". A system like TwitSong can (potentially) justify why it made the creative decisions that it did. This is not to say that TwitSong passes Guckelsberger's (2017) recursive "why?" test; merely that it is possible, as we did partially in Section 5.3.2.3, to trace the operations that are made on a particular line and to see precisely where a word was changed because it was considered not topical enough, not emotional enough, etc. A neural network cannot do this, and can largely only be judged on its output. Again referencing Pérez y Pérez (2018), this "black box" nature does not make work from the engineering-mathematical perspective impossible, but it certainly makes cognitive-social insights difficult.

Neural networks may at present give better results, but the gap is not yet large enough to justify abandoning rule-based systems altogether. Rule-based systems like InkWell (Gabriel, 2016), which make traceable decisions about lines in a manner analogous to TwitSong's, have also on occasion produced good

results. And even some very recent neural network-based poetry systems, such as Loller-Anderson and Gambäck's (2018), still struggle with the same basic problems of coherence as most rule-based systems. Both approaches have untapped promise, and both require further development.

6.7 Interactivity

A final pipe dream that we have had for this system is to make TwitSong or something like it into a fully interactive system with a graphical user interface. A user could choose a source text, choose among any other parameters that have multiple options (such as poetic form or emotion expressed), and also choose weights for the different line evaluation metrics. For example, a user could decide that they want to emphasize meter but do not care about imagery, or that emotion is more important than topic, or vice versa. This would likely be something to work on after improving and diversifying the line evaluation metrics as described above, but it would lead to a usable, useful co-creative system in which the human user has fine-grained control over what sort of thing is generated, while TwitSong uses its information processing capabilities to provide novel and valuable output that the human could not have composed on their own.

Our results with TwitSong were not advanced enough to allow us to usefully implement this dream. But we predict that, in the next 10 years, this type of system, with humans and computers working to augment each other's abilities, will be one of the most valuable products to come out of the computational creativity community.
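
A minimal sketch of the co-creative weighting idea just described, with hypothetical metric names: the user supplies a weight for each line evaluation metric, and candidate lines are ranked by the weighted sum.

```python
# Minimal sketch of user-weighted line ranking. The metric functions are
# hypothetical placeholders for TwitSong-style topicality, emotion, imagery,
# and meter scores, each mapping a line to a number.

def rank_lines(candidate_lines, metric_functions, user_weights):
    def combined_score(line):
        return sum(user_weights.get(name, 0.0) * fn(line)
                   for name, fn in metric_functions.items())
    return sorted(candidate_lines, key=combined_score, reverse=True)

# e.g. a user who cares about meter and emotion but not imagery might choose:
# weights = {"meter": 1.0, "emotion": 0.7, "topicality": 0.5, "imagery": 0.0}
```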

Chapter 7

Conclusion

We have now surveyed the theory and practice, from an interdisciplinary perspective, of how to evaluate computational creativity; surveyed current trends in computer-generated poetry and fit them into a larger taxonomy of poetry techniques; performed two important experiments on the role of expertise in poetry evaluation; and developed and tested three separate versions of a found poetry generation system.

Our evaluation survey is the most comprehensive survey of its type to date and provides ample material for future researchers to use, both in following current evaluation best practices and in using insights from psychology and philosophy to guide the future development of new evaluations. Our poetry taxonomy is broader in scope than other similar taxonomies and focuses on the artistic purposes of any computational techniques used, rather than on algorithmic minutiae. Our survey of the state of the art in poetry shows that many interesting Process techniques are being used, but only a few produce coherent and polished Products. There is still a great deal of room for the practice of computational poetry to evolve.

In our evaluation experiments, we have underscored the importance of expert judges in poetry evaluation. Experts are consistently better judges than non-experts, but in contemporary poetry the problem is particularly acute, with non-experts' preferences being the reverse of experts'. Most "good" contemporary poetry preferred by experts is hard for non-experts to understand, and non-experts therefore prefer more understandable "bad" work. This preference for understandable work and for less novelty overwhelms any other distinction between proposed poetry evaluation criteria, at least for non-experts.

Testing the preferences of expert judges on computer-generated poetry, we found that some agreement existed but that it was limited in scope. We used the written comments of the experts to identify a new set of criteria for computer-generated poetry: Reaction, Meaning, Novelty, and Craft, along with several sub-criteria for each of these. While this set of criteria has yet to be developed into a reliable and validity-tested evaluation tool, it provides a starting point which is more evidence-based than other proposed sets of evaluation criteria for poetry.

Finally, taking all this work into account, we have developed and tested three versions of our own poetry system, TwitSong. The first generation of TwitSong is a human-in-the-loop system and a proof of concept which shows that, by taking a source text and selecting lines that are better on certain metrics, a better poem can be constructed. The second and third generations build on this proof of concept by automating line judgments based on the same set of metrics. The second generation builds found poetry sonnets out

of Twitter, while the third generation builds quatrains out of news articles. The major innovation of the third generation is that it edits its own work, through an Editorial Algorithm which is inspired by WASP (Gervás, 2016) but which goes further than WASP does, as the third generation of TwitSong is able to make edits based on semantic traits such as emotion and topicality. The performance of both generations is mixed, both due to a lack of coherence and due to insufficiently sophisticated ways of operationalizing the line evaluation metrics. However, experiments with the third generation show that its Editorial Algorithm does improve the resulting poetry in at least some ways.

In a research domain as young as computer-generated poetry, there is always far more left undone than done, as even a successful line of research will yield more questions for further research than answers. We hope that our work in this thesis is a bountiful and productive source of such questions and of inspiration for future researchers in this field. We end by asking TwitSong itself to provide a closing statement, in the form of a joyful quatrain using the draft version of this thesis as a source text.

CREATIVE HUMANS
and the lines in the vector space
and palmer russian spy
based on the four combined in place
we will look very high

References

M.H. Abrams. Poetry, theories of. In Princeton Encyclopedia of Poetry and Poetics, Enlarged Edition, pages 639–649. Princeton University Press, 1974. Selcuk Acar, Cyndi Burnett, and John F. Cabra. Ingredients of creativity: Originality and more. Creativity Research Journal, 29(2):133–144, 2017. Manex Agirrezabal, Bertol Arrieta, Aitzol Astigarraga, and Mans Hulden. POS-tag based poetry gener- ation with WordNet. In Proceedings of the 14th European Workshop on Natural Language Generation, pages 162–166. Association for Computational Linguistics, 2013. Wendy Aguilar and R P´erezy P´erez. Criteria for evaluating early creative behavior in computational agents. In Proceedings of the Fifth International Conference on Computational Creativity, pages 284– 287, Ljubljana, Slovenia, 2014. Association for Computational Creativity. Teresa Amabile. Componential theory of creativity. Harvard Business School, Boston, MA, 2012. Teresa M Amabile. A consensual technique for creativity assessment. In The of Creativity, pages 37–63. Springer, 1983a. Teresa M Amabile. The social psychology of creativity: a componential conceptualization. Journal of Personality and Social Psychology, 45(2):357, 1983b. Aitzol Astigarraga, Jos´eMar´ıa Mart´ınez-Otzeta, Igor Rodriguez Rodriguez, Basilio Sierra, and Elena Lazkano. Poet’s little helper: A methodology for computer-based poetry generation. a case study for the Basque language. In Proceedings of the Workshop on Computational Creativity in Natural Language Generation (CC-NLG 2017), pages 2–10, 2017. M Dorothee Augustin, Claus-Christian Carbon, and Johan Wagemans. Artful terms: a study on aesthetic word usage for visual art versus film and music. i-Perception, 3(5):319, 2012. b arco cultural centre. Poetweet, 2013. http://poetweet.com.br/, accessed May 2018. John Baer. The case for domain specificity of creativity. Creativity Research Journal, 11(2):173–177, 1998. John Baer. How divergent thinking tests mislead us: are the Torrance Tests still relevant in the 21st century? the Division 10 debate. Psychology of Aesthetics, Creativity, and the Arts, 5(4):309, 2011. John Baer. Domain specificity and the limits of creativity theory. The Journal of Creative Behavior, 46 (1):16–29, 2012.

140 John Baer and Sharon S McKool. Assessing creativity using the consensual assessment technique. Handbook of Assessment Technologies, Methods and Applications in Higher Education, pages 65–77, 2009. Anna E. Balakian. Dadaism. In Princeton Encyclopedia of Poetry and Poetics, Enlarged Edition, page 180. Princeton University Press, 1974.

Gabriele Barbieri, Fran¸coisPachet, Pierre Roy, and Mirko Degli Esposti. Markov constraints for generating lyrics with style. In Proceedings of the 20th European Conference on Artificial Intelligence, pages 115– 120. IOS Press, 2012. Christopher Bartel. Originality and value. British Journal of Aesthetics, 25:169–184, 1985. Monroe C Beardsley. On the creation of art. Journal of Aesthetics and Art Criticism, 23(3):291–304, 1965.

Tarek R Besold. The unnoticed creativity revolutions: bringing problem-solving back into computational creativity. In Proceedings of the AISB 3rd International Symposium on Computational Creativity, pages 1–8, Sheffield, UK, 2016. CRC Press, Taylor & Francis Group. Floris Bex and Trevor JM Bench-Capon. Understanding narratives with argumentation. In COMMA, pages 11–18, 2014.

Ranjit Bhatnagar. Pentametron, 2012. http://pentametron.com, accessed December 9, 2015. Debarun Bhattacharjya. Preference models for creative artifacts and systems. In Proceedings of the Seventh International Conference on Computational Creativity, pages 52–59, Paris, France, 2016. Association for Computational Creativity.

Jannece Blijlevens, Clementine Thurgood, Paul Hekkert, Lin-Lin Chen, Helmut Leder, and T.W. Allan Whitfield. The aesthetic pleasure in design scale: The development of a scale to measure aesthetic pleasure for designed artifacts. Psychology of Aesthetics, Creativity, and the Arts, 11(1):86–98, 2017. Margaret A Boden. The Creative Mind: Myths and Mechanisms. Psychology Press, Hove, UK, 1990.

Alessandro Bollo and Luca Dal Pozzolo. Analysis of visitor behaviour inside the museum: an empiri- cal study. In Proceedings of the Eighth International Conference on Arts and Cultural Management, volume 2, Montreal, Canada, 2005. International Association of Arts and Cultural Management. Samira Bourgeois-Bougrine, Vlad Glaveanu, Marion Botella, Katell Guillou, Pierre Marc De Biasi, and Todd Lubart. The creativity maze: exploring creativity in screenplay writing. Psychology of Aesthetics, Creativity, and the Arts, 8(4):384, 2014. Oliver Bown. Generative and adaptive creativity: a unified approach to creativity in nature, humans and machines. In Computers and Creativity, pages 361–381. Springer, Berlin, 2012. Oliver Bown. Empirically grounding the evaluation of creative systems: incorporating interaction de- sign. In Proceedings of the Fifth International Conference on Computational Creativity, pages 112–119, Ljubljana, Slovenia, 2014. Association for Computational Creativity. Oliver Bown. Player responses to a live algorithm: conceptualising computational creativity without recourse to human comparisons? In Proceedings of the Sixth International Conference on Computational Creativity, pages 126–133. Association for Computational Creativity, 2015.

141 Kevin Burns. Computing the creativeness of amusing advertisements: a Bayesian model of Burma-Shave’s muse. Artificial Intelligence for Engineering Design, Analysis and Manufacturing, 29(01):109–128, 2015. Donald T Campbell. Blind variation and selective retentions in creative thought as in other knowledge processes. Psychological Review, 67(6):380, 1960.

Linda Candy and Zafer Bilda. Understanding and evaluating creativity. In Proceedings of the Seventh ACM Conference on Creativity and Cognition, pages 497–498, Georgia Tech University, USA, 2009. ACM.

Jim Carpenter. Public override void, 2004. https://slought.org/resources/public_override_void, accessed December 9, 2015.

Jim Carpenter. etc4, 2007. Blog post. http://theprostheticimagination.blogspot.ca/2007/07/etc4.html.
Erin A Carroll and Celine Latulipe. The creativity support index. In CHI’09 Extended Abstracts on Human Factors in Computing Systems, pages 4009–4014, Boston, US, 2009. ACM.

Shelley H. Carson, Jordan B. Peterson, and Daniel M. Higgins. Reliability, validity, and factor structure of the creative achievement questionnaire. Creativity Research Journal, 17(1):37–50, 2005.
William Chamberlain. The Policeman’s Beard is Half Constructed. Warner Books, 1984.
Heather Chan and Dan A Ventura. Automatic composition of themed mood pieces. In Proceedings of the Fifth International Joint Workshop on Computational Creativity, pages 109–115, Universidad Complutense de Madrid, Spain, 2008. Association for Computational Creativity.
John Charnley, Alison Pease, and Simon Colton. On the notion of framing in computational creativity. In Proceedings of the Third International Conference on Computational Creativity, pages 77–82, Dublin, Ireland, 2012. Association for Computational Creativity.

John William Charnley, Simon Colton, and Maria Teresa Llano. The FloWr Framework: Automated flowchart construction, optimisation and alteration for creative systems. In Proceedings of the Fifth International Conference on Computational Creativity. Association for Computational Creativity, 2014.
Wayne Clements. Poetry beyond the Turing test. In Proceedings of Electronic Visualization and the Arts (EVA 2016), 2016.

Tom Collins and Robin Laney. Computer-generated stylistic compositions with long-term repetitive and phrasal structure. Journal of Creative Music Systems, 1, 2017.
Simon Colton. Creativity versus the perception of creativity in computational systems. In AAAI Spring Symposium: Creative Intelligent Systems, pages 14–20, Palo Alto, US, 2008. Association for the Advancement of Artificial Intelligence.

Simon Colton. The painting fool: stories from building an automated painter. In Computers and Creativity, pages 3–38. Springer, Berlin, 2012.
Simon Colton and Geraint A Wiggins. Computational creativity: the final frontier? In Proceedings of the 20th European Conference on Artificial Intelligence, pages 21–26, Montpellier, France, 2012. IOS Press.

Simon Colton, A Pease, and J Charnley. Computational creativity theory: The FACE and IDEA descriptive models. In Proceedings of the Second International Conference on Computational Creativity, pages 90–95, Mexico City, Mexico, 2011. Association for Computational Creativity.
Simon Colton, Jacob Goodwin, and Tony Veale. Full face poetry generation. In Proceedings of the Third International Conference on Computational Creativity, pages 95–102. Association for Computational Creativity, 2012.
Simon Colton, Alison Pease, Joseph Corneli, Michael Cook, and Teresa Llano. Assessing progress in building autonomously creative systems. In Proceedings of the Fifth International Conference on Computational Creativity, pages 137–145, Ljubljana, Slovenia, 2014. Association for Computational Creativity.

Simon Colton, Teresa Llano, Rose Hepworth, John Charnley, Cat Gale, Archie Baron, François Pachet, Pierre Roy, Pablo Gervás, Nick Collins, et al. The Beyond the Fence musical and Computer Says Show documentary. 2016.
Simon Colton, Alison Pease, and Rob Saunders. Issues of authenticity in autonomously creative systems. In Proceedings of the Ninth International Conference on Computational Creativity. Association for Computational Creativity, 2018.

Michael Cook and Simon Colton. Neighbouring communities: Interaction, lessons and opportunities. 2018a.
Michael Cook and Simon Colton. Redesigning computationally creative systems for continuous creation. In Proceedings of the Ninth International Conference on Computational Creativity. Association for Computational Creativity, 2018b.
Joseph Corneli, Anna Jordanous, Rosie Shepperd, Maria Teresa Llano, Joanna Misztal, Simon Colton, and Christian Guckelsberger. Computational poetry workshop: making sense of work in progress. In Proceedings of the Sixth International Conference on Computational Creativity, pages 268–275, Park City, US, 2015. Association for Computational Creativity.

David Cropley and Arthur Cropley. Elements of a universal aesthetic of creativity. Psychology of Aesthetics, Creativity, and the Arts, 2(3):155–161, 2008.
David H Cropley, James C Kaufman, and Arthur J Cropley. Malevolent creativity: A functional model of creativity in terrorism and crime. Creativity Research Journal, 20(2):105–115, 2008.

DH Cropley and AJ Cropley. Engineering creativity: a systems concept of functional creativity. In Creativity Across Domains: Faces of the Muse, pages 169–185. Lawrence Erlbaum Associates, Inc, 2005.
Mihaly Csikszentmihalyi. Creativity: flow and the psychology of discovery and invention. Harper Collins, New York, 1996.

Mihaly Csikszentmihalyi. Implications of a systems perspective for the study of creativity. In Handbook of Creativity, pages 313–338. Cambridge University Press, Cambridge, UK, 1999.

João M Cunha, João Gonçalves, Pedro Martins, Penousal Machado, and Amílcar Cardoso. A pig, an angel and a cactus walk into a blender: a descriptive approach to visual blending. In Proceedings of the Eighth International Conference on Computational Creativity, pages 80–87, Atlanta, US, 2017. Association for Computational Creativity.
Palle Dahlstedt. Between material and ideas: a process-based spatial model of artistic creativity. In Computers and Creativity, pages 205–233. Springer, Berlin, 2012.
Michael Dalvean. Ranking contemporary American poems. Literary and Linguistic Computing, 30(1):6–19, 2015.
Amitava Das and Björn Gambäck. Poetic machine: computational creativity for automatic poetry generation in Bengali. In Proceedings of the Fifth International Conference on Computational Creativity, ICCC, pages 230–238, Ljubljana, Slovenia, 2014. Association for Computational Creativity.
Subrata Dasgupta. Contesting (Simonton’s) blind variation, selective retention theory of creativity. Creativity Research Journal, 23(2):166–182, 2011.
Belén Díaz-Agudo, Pablo Gervás, and Pedro A González-Calero. Poetry generation in COLIBRI. In Advances in Case-Based Reasoning, pages 73–87. Springer, 2002.
Jennifer Diedrich, Mathias Benedek, Emanuel Jauk, and Aljoscha C Neubauer. Are creative ideas novel and useful? Psychology of Aesthetics, Creativity, and the Arts, 9(1):35, 2015.
Steve DiPaola, Graeme McCaig, Kristin Carlson, Sara Salevati, and Nathan Sorenson. Adaptation of an autonomous creative evolutionary system for real-world design application based on creative cognition. In Proceedings of the Fourth International Conference on Computational Creativity, pages 40–47, Sydney, Australia, 2013. Association for Computational Creativity.
Alan Dorin and Kevin B Korb. Creativity refined: bypassing the gatekeepers of appropriateness and value. In Computers and Creativity, pages 339–360. Springer, Berlin, 2012.

Jeremy Douglass. Numeracy and electronic poetry. Journal of Mathematics and the Arts, 8(1-2):13–23, 2014.
Shlomo Dubnov, Kevin Burns, and Yasushi Kiyoki. Cross-cultural aesthetics: analyses and experiments in verbal and visual arts. In Cross-Cultural Multimedia Computing, pages 21–41. Springer, Switzerland, 2016.
edde addad. comments on “A taxonomy of generative poetry techniques”, 2018. https://gnoetrydaily.wordpress.com/2018/03/25/comments-on-a-taxonomy-of-generative-poetry-techniques/, accessed April 2018.
Ernest Edmonds, Lizzie Muller, and Matthew Connell. On creative engagement. Visual Communication, 5(3):307–322, 2006.

Arne Eigenfeldt. Generative music for live musicians: an unnatural selection. In Proceedings of the Sixth International Conference on Computational Creativity, pages 142–149. Brigham Young University, 2015.

Arne Eigenfeldt, Oliver Bown, Andrew R Brown, and Toby Gifford. Distributed musical decision-making in an ensemble of musebots: dramatic changes and endings. In Proceedings of the Eighth International Conference on Computational Creativity, pages 88–95, Atlanta, US, 2017. Association for Computational Creativity.
Ahmed Elgammal and Babak Saleh. Quantifying creativity in art networks. In Proceedings of the Sixth International Conference on Computational Creativity, pages 39–46, Park City, US, 2015. Association for Computational Creativity.
Ahmed Elgammal, Bingchen Liu, Mohamed Elhoseiny, and Marian Mazzone. CAN: Creative adversarial networks, generating “art” by learning about styles and deviating from style norms. In Proceedings of the Eighth International Conference on Computational Creativity, pages 96–103, Atlanta, US, 2017. Association for Computational Creativity.
eRoGK7 et al. Gnoetry daily, 2011. https://gnoetrydaily.wordpress.com, accessed December 9, 2015.
Frieda Fayena-Tawil, Aaron Kozbelt, and Lemonia Sitaras. Think global, act local: a protocol analysis comparison of artists’ and nonartists’ cognitions, metacognitions, and evaluations while drawing. Psychology of Aesthetics, Creativity, and the Arts, 5(2):135, 2011.
Celso França, Luís Fabrício W Góes, Alvaro Amorim, Rodrigo Rocha, and Alysson Ribeiro da Silva. Regent-dependent creativity: a domain independent metric for the assessment of creative artifacts. In Proceedings of the Seventh International Conference on Computational Creativity, pages 68–76, Paris, France, 2016. Association for Computational Creativity.
Ronald S Friedman and Christa L Taylor. Exploring emotional responses to computationally-created music. Psychology of Aesthetics, Creativity, and the Arts, 8(1):87, 2014.
Chris T Funkhouser. New directions in digital poetry. A&C Black, 2012.
Christopher Thompson Funkhouser. Prehistoric digital poetry: an archaeology of forms, 1959-1995. University of Alabama Press, 2007.
Richard P Gabriel. in the control room of the banquet. In Proceedings of the 2016 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software, pages 250–268. ACM, 2016.
Prasad Gade, Mary Galvin, James O’Sullivan, Paul Walsh, and Órla Murphy. Reactions to imagery generated using computational aesthetic measures. Leonardo, 50(5), 2017.
Philip Galanter. Computational aesthetic evaluation: past and future. In Computers and Creativity, pages 255–293. Springer, Berlin, 2012.
Rodrigo García, Pablo Gervás, Raquel Hervás, Rafael Pérez, and Fernando Arámbula. A framework for the E-R computational creativity model. In MICAI 2006: Advances in Artificial Intelligence, pages 70–80, Apizaco, Mexico, 2006. Springer.
Berys Gaut. The philosophy of creativity. Philosophy Compass, 5(12):1034–1046, 2010.
Pablo Gervás. An expert system for the composition of formal Spanish poetry. Knowledge-Based Systems, 14(3):181–188, 2001.

Pablo Gervás. Exploring quantitative evaluations of the creativity of automatic poets. In Workshop on Creative Systems, Approaches to Creativity in Artificial Intelligence and Cognitive Science, 15th European Conference on Artificial Intelligence, Lyon, France, 2002. European Association for Artificial Intelligence.
Pablo Gervás. Engineering linguistic creativity: Bird flight and jet planes. In Proceedings of the NAACL HLT 2010 Second Workshop on Computational Approaches to Linguistic Creativity, pages 23–30, Los Angeles, US, 2010. Association for Computational Linguistics.
Pablo Gervás. Computational modelling of poetry generation. In Artificial Intelligence and Poetry Symposium, AISB Convention, Exeter University, UK, 2013a. Society for the Study of Artificial Intelligence and the Simulation of Behaviour.

Pablo Gervás. Evolutionary elaboration of daily news as a poetic stanza. In Proceedings of the IX Congreso Español de Metaheurísticas, Algoritmos Evolutivos y Bioinspirados-MAEB, pages 229–238, 2013b.
Pablo Gervás. Constrained creation of poetic forms during theme-driven exploration of a domain defined by an n-gram model. Connection Science, 28(2):111–130, 2016.

Pablo Gervás. Comparative evaluation of elementary plot generation procedures. In Proceedings of the 6th International Workshop on Computational Creativity, Concept Invention, and General Intelligence, 2017.
Pablo Gervás. Template-free construction of rhyming poems with thematic cohesion. In Proceedings of the Workshop on Computational Creativity in Natural Language Generation (CC-NLG 2017), pages 21–28, 2017.
Marjan Ghazvininejad, Xing Shi, Yejin Choi, and Kevin Knight. Generating topical poetry. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1183–1191. Association for Computational Linguistics, 2016.

Andrei Gheorghe. The longest poem in the world, 2013. http://www.longestpoemintheworld.com/, accessed May 2018.
Sayan Ghosh, Mathieu Chollet, Eugene Laksana, Louis-Philippe Morency, and Stefan Scherer. Affect-LM: A neural language model for customizable affective text generation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 634–642, 2017.

Vlad Petre Glăveanu. Creativity as a sociocultural act. The Journal of Creative Behavior, 49(3):165–180, 2015.

Loss Pequeño Glazier. E-poetry: An international digital poetry festival, 2016. http://epc.buffalo.edu/e-poetry/archive/, accessed February 25, 2016.
Fernand Gobet and Herbert A Simon. Expert chess memory: revisiting the chunking hypothesis. Memory, 6(3):225–255, 1998.
Kenneth Goldsmith. Flarf is Dionysus. Conceptual Writing is Apollo. Poetry, July/August 2009.

Hugo Gonçalo Oliveira. Tra-la-lyrics 2.0: Automatic generation of song lyrics on a semantic domain. Journal of Artificial General Intelligence, 6(1):87–110, 2015.
João Gonçalves, Pedro Martins, and Amílcar Cardoso. Blend City, BlendVille. In Proceedings of the Eighth International Conference on Computational Creativity, pages 112–119, Atlanta, US, 2017. Association for Computational Creativity.
Ross Goodwin. Adventures in narrated reality. In Medium, 2016. Retrieved November 29, 2016.
Nada Gordon. Unicorn believers don’t declare fatwas. Poetry, July/August 2009.
Kazjon Grace, Mary Lou Maher, Maryam Mohseni, and Rafael Pérez y Pérez. Encouraging p-creative behaviour with computational curiosity. In Proceedings of the Eighth International Conference on Computational Creativity, pages 120–127, Atlanta, US, 2017. Association for Computational Creativity.
Oskar Gross, Hannu Toivonen, Jukka M Toivanen, and Alessandro Valitutti. Lexical creativity from word associations. In Proceedings of the Seventh International Conference on Knowledge, Information and Creativity Support Systems, pages 35–42, Melbourne, Australia, 2012. IEEE.
Christian Guckelsberger, Christophe Salge, and Simon Colton. Addressing the “why?” in computational creativity: a non-anthropocentric, minimal model of intentional creative agency. In Proceedings of the Eighth International Conference on Computational Creativity, pages 128–135, Atlanta, US, 2017. Association for Computational Creativity.
Brion Gysin et al. Cut-ups: A project for disastrous success. In The Third Mind, pages 42–51. 1978.
Mika Hämäläinen. Harnessing NLG to create Finnish poetry automatically. In Proceedings of the Ninth International Conference on Computational Creativity. Association for Computational Creativity, 2018.
Otto Hantula and Simo Linkola. Towards goal-aware collaboration in artistic agent societies. In Proceedings of the Ninth International Conference on Computational Creativity. Association for Computational Creativity, 2018.
Sarah Harmon. Figure8: a novel system for generating and evaluating figurative language. In Proceedings of the Sixth International Conference on Computational Creativity, pages 71–77, Park City, US, 2015. Association for Computational Creativity.
David M Harrington. On the usefulness of “value” in the definition of creativity: A commentary. Creativity Research Journal, 30(1):118–121, 2018.
Jacob Harris. Times haiku: serendipitous poetry from the New York Times, 2013. http://haiku.nytimes.com/, accessed December 9, 2015.
Randy Harris and Chrysanne DiMarco. Constructing a rhetorical figuration ontology. In Persuasive Technology and Digital Behaviour Intervention Symposium, pages 47–52, 2009.
Eliška Hartlová and F Nack. Mobile social poetry with tweets. Bachelor thesis, University of Amsterdam, 2013.
Maximilian Droog Hayes and Geraint A Wiggins. Adding semantics to statistical generation for poetic creativity. In Sixth International Conference on Computational Creativity. Association for Computational Creativity, 2015. Late-breaking abstract.

Jing He, Ming Zhou, and Long Jiang. Generating Chinese classical poems with statistical machine translation models. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence. AAAI Publications, 2012.
Eric Heep and Ajay Kapur. Extracting visual information to generate sonic art installation and performance. In Proceedings of the 21st International Symposium on Electronic Art, 2015.

Paul Hekkert, Dirk Snelders, and Piet CW van Wieringen. “Most advanced, yet acceptable”: typicality and novelty as joint predictors of aesthetic preference in industrial design. British Journal of Psychology, 94(1):111–124, 2003.
Jiun-Jhy Her. An analytical framework for facilitating interactivity between participants and interactive artwork: case studies in MRT stations. Digital Creativity, 25(2):113–125, 2014.
Hussein Hirjee. Rhyme, rhythm, and rhubarb: Using probabilistic methods to analyze hip hop, poetry, and misheard lyrics. Master’s thesis, University of Waterloo, 2010.
Hussein Hirjee and Daniel G Brown. Using automated rhyme detection to characterize rhyming style in rap music. Empirical Musicology Review, 2010.

I Infantino, A Augello, A Manfré, G Pilato, and F Vella. Robodanza: live performances of a creative dancing humanoid. In Proceedings of the Seventh International Conference on Computational Creativity, pages 388–395, Paris, France, 2016. Association for Computational Creativity.
Arthur M Jacobs. Towards a neurocognitive poetics model of literary reading. In Cognitive Neuroscience of Natural Language Use, pages 135–159. Cambridge University Press, Cambridge, UK, 2015.
DA Johner, D Bedwell, C Graham, W Lemmon, O Martinez, and AK Goel. Using human computation to acquire novel methods for addressing visual analogy problems on intelligence tests. In Proceedings of the Sixth International Conference on Computational Creativity, pages 23–30, Park City, US, 2015. Association for Computational Creativity.

Anna Jordanous. A standardised procedure for evaluating creative systems: computational creativity evaluation based on what it is to be creative. Cognitive Computation, 4(3):246–279, 2012a.
Anna Jordanous. Stepping back to progress forwards: setting standards for meta-evaluation of computational creativity. In Proceedings of the Fifth International Conference on Computational Creativity, pages 129–136, Ljubljana, Slovenia, 2014. Association for Computational Creativity.

Anna Jordanous. Four PPPPerspectives on computational creativity in theory and in practice. Connection Science, 28(2):194–216, 2016a.
Anna Jordanous. The longer term value of creativity judgements in computational creativity. In Proceedings of the AISB Symposium on Computational Creativity, pages 16–23, Sheffield, UK, 2016b. AISB.

Anna Jordanous. Personal communication on Twitter, 2016c. http://twitter.com/annajordanous/status/747766290648080384.
Anna Jordanous. Creativity vs quality: why the distinction matters when evaluating computational creativity systems. In The 5th Computational Creativity Symposium at the AISB Convention, 2018.

Anna Jordanous, Daniel Allington, and Byron Dueck. Measuring cultural value using social network analysis: a case study on valuing electronic musicians. In Proceedings of the Sixth International Conference on Computational Creativity, pages 110–117, Park City, US, 2015. Association for Computational Creativity.
Anna Katerina Jordanous. Evaluating computational creativity: a standardised procedure for evaluating creative systems and its application. PhD thesis, University of Sussex, 2012b.
Patrik N Juslin, Laura S Sakka, Gonçalo T Barradas, and Simon Liljeström. No accounting for taste? Idiographic models of aesthetic judgment in music. Psychology of Aesthetics, Creativity, and the Arts, 10(2):157, 2016.
Stefano Kalonaris. Satisficing goals and methods in human-machine music improvisations: Experiments with Dory. Journal of Creative Music Systems, 2(1):21, 2018.
Anna Kantosalo and Hannu Toivonen. Modes for creative human-computer collaboration: alternating and task-divided co-creativity. In Proceedings of the Seventh International Conference on Computational Creativity, pages 77–84, Paris, France, 2016. Association for Computational Creativity.
Anna Kantosalo, Jukka M Toivanen, Ping Xiao, and Hannu Toivonen. From isolation to involvement: Adapting machine creativity software to support human-computer co-creation. In Proceedings of the Fifth International Conference on Computational Creativity, pages 1–7. Association for Computational Creativity, 2014.
Anna Aurora Kantosalo, Jukka Mikael Toivanen, Hannu Tauno Tapani Toivonen, et al. Interaction evaluation for human-computer co-creativity: a case study. In Proceedings of the Sixth International Conference on Computational Creativity, pages 276–283, Park City, US, 2015. Association for Computational Creativity.
Justine Kao and Dan Jurafsky. A computational analysis of style, affect, and imagery in contemporary poetry. In NAACL Workshop on Computational Linguistics for Literature, pages 8–17, 2012.
Justine T Kao and Dan Jurafsky. A computational analysis of poetic style. LiLT (Linguistic Issues in Language Technology), 12, 2015.
David M Kaplan and David M Blei. A computational approach to style in American poetry. In Seventh IEEE International Conference on Data Mining, pages 553–558, 2007.
Pythagoras Karampiperis, Antonis Koukourikos, and Evangelia Koliopoulou. Towards machines for measuring creativity: the use of computational tools in storytelling activities. In Proceedings of the 14th International Conference on Advanced Learning Technologies, pages 508–512, Athens, Greece, 2014. IEEE.
Andrej Karpathy. The unreasonable effectiveness of recurrent neural networks, 2015. qtd. in (Goodwin, 2016).
James C. Kaufman. Counting the muses: Development of the Kaufman Domains of Creativity Scale (K-DOCS). Psychology of Aesthetics, Creativity, and the Arts, 6(4):298–308, 2012.
James C Kaufman and John Baer. Beyond new and appropriate: who decides what is creative? Creativity Research Journal, 24(1):83–91, 2012.

James C Kaufman and Ronald A Beghetto. Beyond big and little: the four c model of creativity. Review of General Psychology, 13(1):1, 2009.
James C Kaufman, John Baer, Jason C Cole, and Janel D Sexton. A comparison of expert and nonexpert raters using the consensual assessment technique. Creativity Research Journal, 20(2):171–178, 2008.

James C Kaufman, John Baer, and Jason C Cole. Expertise, domains, and the consensual assessment technique. The Journal of Creative Behavior, 43(4):223–233, 2009.
Kyungil Kim, Jinhee Bae, Myung-Woo Nho, and Chang Hwan Lee. How do experts and novices differ? Relation versus attribute and thinking versus feeling in language use. Psychology of Aesthetics, Creativity, and the Arts, 5(4):379, 2011.

Alexis Kirke and Eduardo Miranda. Emotional and multi-agent systems in computer-aided writing and poetry. In Proceedings of the Artificial Intelligence and Poetry Symposium, Exeter, UK, 2013. AISB.
Aniket Kittur, Jeffrey V. Nickerson, Michael Bernstein, Elizabeth Gerber, Aaron Shaw, John Zimmerman, Matt Lease, and John Horton. The future of crowd work. In Proceedings of the 2013 Conference on Computer Supported Cooperative Work, CSCW ’13, pages 1301–1318, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-1331-5. doi: 10.1145/2441776.2441923. URL http://doi.acm.org/10.1145/2441776.2441923.
Alison Knowles and James Tenney. A sheet from ‘The House’, a computer poem, 1968. qtd. in (Funkhouser, 2007).
Maria E Kronfeldner. Darwinian “blind” hypothesis formation revisited. Synthese, 175(2):193–218, 2010.

Ray Kurzweil. Ray Kurzweil’s cybernetic poet, 2001. http://www.kurzweilcyberart.com/poetry.
Carolyn Lamb, Daniel G Brown, and Charles LA Clarke. Human competence in creativity evaluation. In Proceedings of the Sixth International Conference on Computational Creativity, pages 102–109, Park City, US, 2015a. Association for Computational Creativity.

Carolyn Lamb, Daniel G Brown, and Charles LA Clarke. Evaluating digital poetry: Insights from the CAT. In Proceedings of the Seventh International Conference on Computational Creativity. Association for Computational Creativity, 2016a.
Carolyn Lamb, Daniel G Brown, and Charles LA Clarke. A taxonomy of generative poetry techniques. In Proceedings of Bridges 2016: Mathematics, Music, Art, Architecture, Culture, 2016b.

Carolyn Lamb, Daniel Brown, and Charles Clarke. Incorporating novelty, meaning, reaction and craft into computational poetry: a negative experimental result. In Proceedings of the Eighth International Conference on Computational Creativity. Association for Computational Creativity, 2017a.
Carolyn Lamb, Daniel G Brown, and Charles LA Clarke. A taxonomy of generative poetry techniques. Journal of Mathematics and the Arts, 11(3):159–179, 2017b.
Carolyn Lamb, Daniel G Brown, and Charles LA Clarke. Evaluating computational creativity: An interdisciplinary tutorial. ACM Computing Surveys (CSUR), 51(2):28, 2018.

Carolyn E Lamb, Daniel G Brown, and Charles LA Clarke. Can human assistance improve a computational poet? Proceedings of Bridges 2015: Mathematics, Music, Art, Architecture, Culture, pages 37–44, 2015b.
Stefan Lattner, Maarten Grachten, and Gerhard Widmer. Imposing higher-level structure in polyphonic music generation using convolutional restricted Boltzmann machines and constraints. Journal of Creative Music Systems, 2:31, 2018.

Jon O Lauring, Matthew Pelowski, Michael Forster, Matthias Gondan, Maurice Ptito, and Ron Kupers. Well, if they like it... effects of social groups’ ratings and price information on the appreciation of art. Psychology of Aesthetics, Creativity, and the Arts, 10(3):344, 2016.
Helmut Leder and Marcos Nadal. Ten years of a model of aesthetic appreciation and aesthetic judgments: the aesthetic episode–developments and challenges in empirical aesthetics. British Journal of Psychology, 105(4):443–464, 2014.
Helmut Leder, Benno Belke, Andries Oeberst, and Dorothee Augustin. A model of aesthetic appreciation and aesthetic judgments. British Journal of Psychology, 95(4):489–508, 2004.
Helmut Leder, Gernot Gerger, David Brieber, and Norbert Schwarz. What makes an art expert? Emotion and evaluation in art appreciation. Cognition and Emotion, 28(6):1137–1147, 2014.
John Lee, Ying Cheuk Hui, and Yin Hei Kong. Knowledge-rich, computer-assisted composition of Chinese couplets. Digital Scholarship in the Humanities, 31(1):152–163, 2016.
Joel Lehman and Kenneth O Stanley. Beyond open-endedness: quantifying impressiveness. In Artificial Life, volume 13, pages 75–82, Cambridge, US, 2012. MIT Press.

Andrea Lingentfelter. Personal communication, June 2017.
Simo Linkola, Tapio Takala, and Hannu Toivonen. Novelty-seeking multi-agent systems. In Proceedings of the Seventh International Conference on Computational Creativity, pages 1–8, Paris, France, 2016. Association for Computational Creativity.

Simo Linkola, Anna Kantosalo, Tomi Männistö, and Hannu Toivonen. Aspects of self-awareness: an anatomy of metacreative systems. In Proceedings of the Eighth International Conference on Computational Creativity, pages 189–196, Atlanta, US, 2017. Association for Computational Creativity.
Maria Teresa Llano, Rose Hepworth, Simon Colton, Jeremy Gow, John Charnley, N Lavrač, M Žnidaršič, Matic Perovšek, Mark Granroth-Wilding, and Stephen Clark. Baseline methods for automated fictional ideation. In Proceedings of the Fifth International Conference on Computational Creativity, pages 211–219, Ljubljana, Slovenia, 2014. Association for Computational Creativity.
Malte Loller-Andersen and Björn Gambäck. Deep learning-based poetry generation given visual input. In Proceedings of the Ninth International Conference on Computational Creativity. Association for Computational Creativity, 2018.

Róisín Loughran and Michael O’Neill. Generative music evaluation: why do we limit to human? In Proceedings of the First Conference on Computer Simulation of Musical Creativity, Huddersfield, UK, 2016. University of Huddersfield.

Róisín Loughran and Michael O’Neill. Application domains considered in computational creativity. In Proceedings of the Eighth International Conference on Computational Creativity, Atlanta, US, 2017. Association for Computational Creativity.
Todd I Lubart. Creativity across cultures. In Handbook of Creativity, pages 339–350. Cambridge University Press, Cambridge, UK, 1999.
Theo Lutz. Stochastische Texte. Augenblick, 4(1):3–9, 1959. qtd. in (Roque, 2011); translated to English by Helen MacCormack, 2005.
Mary-Anne Mace and Tony Ward. Modeling the creative process: A grounded theory analysis of creativity in the domain of art making. Creativity Research Journal, 14(2):179–192, 2002.
Eric Malmi, Pyry Takala, Hannu Toivonen, Tapani Raiko, and Aristides Gionis. DopeLearning: a computational approach to rap lyrics generation. arXiv preprint arXiv:1505.04771, 2015.
Hisar Manurung, Graeme Ritchie, and Henry Thompson. Towards a computational model of poetry generation. Technical report, The University of Edinburgh, 2000.
Hisar Maruli Manurung. Chart generation of rhythm-patterned text. In Proceedings of the First International Workshop on Literature in Cognition and Computers, pages 15–19, 1999.
Ruli Manurung, Graeme Ritchie, and Henry Thompson. Using genetic algorithms to create meaningful poetic text. Journal of Experimental and Theoretical Artificial Intelligence, 24(1):43–64, 2012.
Jon McCormack and Mark d’Inverno. On the future of computers and creativity. In AISB Symposium on Computational Creativity, pages 1–4, London, UK, 2014. Society for the Study of Artificial Intelligence and the Simulation of Behaviour.
Kyle McDonald. Neural nets for generating music. In Artists and Machine Intelligence, 2017. https://medium.com/artists-and-machine-intelligence/neural-nets-for-generating-music-f46dffac21c0, accessed April 2018.
Kenric McDowell. AI poetry hits the road. In Artists and Machine Intelligence, 2017. https://medium.com/artists-and-machine-intelligence/ai-poetry-hits-the-road-eb685dfc1544, accessed April 2018.
Kenric McDowell. Music, art & machine intelligence 2016 conference proceedings. In Medium, 2016. Retrieved December 2, 2016.
Jeffrey D McGovern and Gavin Scott. EloquentRobot: A tool for automatic poetry generation. In Proceedings of the Seventh ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, 2016.
Stephen McGregor, Matthew Purver, and Geraint Wiggins. Process based evaluation of computer generated poetry. In The INLG 2016 Workshop on Computational Creativity in Natural Language Generation, page 51, Edinburgh, UK, 2016. ACM SIGGEN.
Alexander S. McKay, Maciej Karwowski, and James C. Kaufman. Measuring the muses: Validating the Kaufman Domains of Creativity Scale (K-DOCS). Psychology of Aesthetics, Creativity, and the Arts, 11(2):216–230, 2017.

Lewis Mckeown and Anna Jordanous. An evaluation of the impact of constraints on the perceived creativity of narrative generating software. In Proceedings of the Ninth International Conference on Computational Creativity. Association for Computational Creativity, 2018.
Marvin L Minsky. Why people think computers can’t. AI Magazine, 3(4):3, 1982.

Joanna Misztal and Bipin Indurkhya. Poetry generation system with an emotional personality. In Proceedings of the Fifth International Conference on Computational Creativity, pages 72–81, 2014.
D. Moffat and M. Kelly. An investigation into people’s bias against computational creativity in music composition. In The Third Joint Workshop on Computational Creativity, ECAI 2006, Trento, Italy, August 2006.

René Mogensen. Evaluating an improvising computer-implementation as a partial creativity in a music performance system. Journal of Creative Music Systems, 2, 2017.
Saif M Mohammad and Svetlana Kiritchenko. Using hashtags to capture fine emotion categories from tweets. Computational Intelligence, 31(2):301–326, 2015. Full lexicon available at http://saifmohammad.com/WebPages/lexicons.html.
Kristine Monteith, Bruce Brown, Dan Ventura, and Tony Martinez. Automatic generation of music for inducing emotive response. In Proceedings of the First International Conference on Computational Creativity, pages 140–149, Lisbon, Portugal, 2010. Association for Computational Creativity.
Nick Montfort and Stephanie Strickland. Sea and spar between. Dear Navigator, 2, 2010.

Alexander Mordvintsev, Christopher Olah, and Mike Tyka. Inceptionism: going deeper into neural networks. Google Research Blog, 20, 2015.
John Morris. Haiku—at random, 1973. qtd. in (Funkhouser, 2007).
Martin Mumford and Dan Ventura. The man behind the curtain: overcoming skepticism about creative computing. In Proceedings of the Sixth International Conference on Computational Creativity, pages 1–7, Park City, US, 2015. Association for Computational Creativity.
Frieder Nake. Construction and intuition: creativity in early computer art. In Computers and Creativity, pages 61–94. Springer, Berlin, Germany, 2012.
Santiago Negrete-Yankelevich and Nora Morales-Zaragoza. The apprentice framework: planning, assessing creativity. In Proceedings of the Fifth International Conference on Computational Creativity, pages 280–283, Ljubljana, Slovenia, 2014. Association for Computational Creativity.
Yael Netzer, David Gabay, Yoav Goldberg, and Michael Elhadad. Gaiku: generating haiku with word associations norms. In Proceedings of the Workshop on Computational Approaches to Linguistic Creativity, pages 32–39. Association for Computational Linguistics, 2009.

David Norton, Derrall Heath, and Dan Ventura. Establishing appreciation in a creative system. In Proceedings of the First International Conference on Computational Creativity, pages 26–35, Lisbon, Portugal, 2010. Association for Computational Creativity.

David Norton, Derrall Heath, and Dan Ventura. An artistic dialogue with the artificial. In Proceedings of the 8th ACM Conference on Creativity and Cognition, pages 31–40, Atlanta, US, 2011. ACM.
David Norton, Derrall Heath, and Dan Ventura. Finding creativity in an artificial artist. The Journal of Creative Behavior, 47(2):106–124, 2013.
David Norton, Derrall Heath, and Dan Ventura. Accounting for bias in the evaluation of creative computational systems: an assessment of DARCI. In Proceedings of the Sixth International Conference on Computational Creativity, pages 31–38, Park City, US, June–July 2015. Association for Computational Creativity.
Hugo Gonçalo Oliveira. PoeTryMe: a versatile platform for poetry generation. Computational Creativity, Concept Invention, and General Intelligence, 1:21, 2012.
Hugo Gonçalo Oliveira. O Poeta Artificial 2.0: Increasing meaningfulness in a poetry generation Twitter bot. In Proceedings of the Workshop on Computational Creativity in Natural Language Generation (CC-NLG 2017), pages 11–20, 2017a.
Hugo Gonçalo Oliveira. A survey on intelligent poetry generation: Languages, features, techniques, re-utilisation and evaluation. In Proceedings of the 10th International Conference on Natural Language Generation, pages 11–20, 2017b.
Hugo Gonçalo Oliveira and Ana Oliveira Alves. Poetry from concept maps–yet another adaptation of PoeTryMe’s flexible architecture. In Proceedings of 7th International Conference on Computational Creativity, ICCC, 2016.
Ana-Maria Olteţeanu and Zoe Falomir. comRAT-C: A computational compound remote associates test solver based on language data and its comparison to human performance. Pattern Recognition Letters, 67:81–90, 2015.
Ana-Maria Olteţeanu and Zoe Falomir. Object replacement and object composition in a creative cognitive system: towards a computational solver of the alternative uses test. Cognitive Systems Research, 39:15–32, 2016.
Sarah Opolka, Philipp Obermeier, and Torsten Schaub. Automatic genre-dependent composition using answer set programming. In Proceedings of the 21st International Symposium on Electronic Art, 2015.
François Pachet and Pierre Roy. Markov constraints: steerable generation of Markov sequences. Constraints, 16(2):148–172, 2011.
François Pachet, Pierre Roy, Julian Moreira, and Mark d’Inverno. Reflexive loopers for solo musical improvisation. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 2205–2208. ACM, 2013.
Raquel Mochales Palau and Marie-Francine Moens. Argumentation mining: the detection, classification and structure of arguments in text. In Proceedings of the 12th International Conference on Artificial Intelligence and Law, pages 98–107. ACM, 2009.
Alexandre Papadopoulos, Pierre Roy, and François Pachet. Assisted lead sheet composition using flow-composer. In International Conference on Principles and Practice of Constraint Programming, pages 769–785. Springer, 2016.

Philippe Pasquier, Adam Burnett, and James Maxwell. Investigating listener bias against musical metacreativity. In Proceedings of the Seventh International Conference on Computational Creativity, pages 42–51, Paris, France, 2016. Association for Computational Creativity.
Marcus Pearce and Geraint Wiggins. Towards a framework for the evaluation of machine compositions. In Proceedings of the AISB’01 Symposium on Artificial Intelligence and Creativity in the Arts and Sciences, pages 22–32, York, UK, 2001. Society for the Study of Artificial Intelligence and the Simulation of Behaviour.
Marcus Pearce, David Meredith, and Geraint Wiggins. Motivations and methodologies for automation of the compositional process. Musicae Scientiae, 6(2):119–147, 2002.
Marcus T Pearce and Geraint A Wiggins. Evaluating cognitive models of musical composition. In Proceedings of the 4th International Joint Workshop on Computational Creativity, pages 73–80, 2007.
Alison Pease and Simon Colton. On impact and evaluation in computational creativity: A discussion of the Turing test and an alternative proposal. In Proceedings of the AISB symposium on AI and Philosophy, pages 15–22, York, UK, 2011. Society for the Study of Artificial Intelligence and the Simulation of Behaviour.
Alison Pease, Daniel Winterstein, and Simon Colton. Evaluating machine creativity. In Workshop on Creative Systems, 4th International Conference on Case Based Reasoning, pages 129–137, Vancouver, Canada, 2001. ACM.
James W Pennebaker, Martha E Francis, and Roger J Booth. Linguistic inquiry and word count: LIWC 2001. Mahwah: Lawrence Erlbaum Associates, 71(2001):2001, 2001.
Graham Percival, Satoru Fukayama, and Masataka Goto. Song2Quartet: a system for generating string quartet cover songs from polyphonic audio of popular music. In Proceedings of the 16th ISMIR conference, pages 114–120. The International Society of Music Information Retrieval, 2015.
Francisco C Pereira, Mateus Mendes, P Gervás, and Amílcar Cardoso. Experiments with assessment of creative systems: an application of Ritchie’s criteria. In Proceedings of the workshop on computational creativity, 19th international joint conference on artificial intelligence, page 05, Edinburgh, UK, 2005. ACM.
Rafael Pérez y Pérez. The computational creativity continuum. In Proceedings of the Ninth International Conference on Computational Creativity. Association for Computational Creativity, 2018.
Jonathan A Plucker and Joseph S Renzulli. Psychometric approaches to the study of human creativity. In Handbook of Creativity, pages 35–61. Cambridge University Press, Cambridge, UK, 1999.
Poetry Foundation. Poetry magazine discussion guide. http://www.poetryfoundation.org/poetrymagazine/guide/89, February 2014. Accessed: 2015-02-03.
Emma Policastro and Howard Gardner. From case studies to robust generalizations: an approach to the study of creativity. In Handbook of Creativity, pages 213–225. Cambridge University Press, Cambridge, UK, 1999.
Peter Potash, Alexey Romanov, and Anna Rumshisky. Evaluating creative language generation: The case of rap lyric ghostwriting. arXiv preprint arXiv:1612.03205, 2016.

Provalis. Regressive imagery dictionary. https://provalisresearch.com/products/content-analysis-software/wordstat-dictionary/regressive-imagery-dictionary/, 1990.
Oscar Puerto and David Thue. A model of inter-musician communication for artificial musical intelligence. In Proceedings of the Eighth International Conference on Computational Creativity, pages 221–228, Atlanta, US, 2017. Association for Computational Creativity.

Fahrurrozi Rahman and Ruli Manurung. Multiobjective optimization for meaningful metrical poetry. In Proceedings of the Second International Conference on Computational Creativity, pages 4–9, 2011.
Ananth Ramakrishnan A and Sobha Lalitha Devi. An alternate approach towards meaningful lyric generation in Tamil. In Proceedings of the NAACL HLT 2010 Second Workshop on Computational Approaches to Linguistic Creativity, pages 31–39. Association for Computational Linguistics, 2010.

Fam Rashel and Ruli Manurung. Pemuisi: a constraint satisfaction-based generator of topical Indonesian poetry. In Proceedings of the Fifth International Conference on Computational Creativity, pages 82–90, Ljubljana, Slovenia, 2014a. Association for Computational Creativity.
Fam Rashel and Ruli Manurung. Pemuisi: a constraint satisfaction-based generator of topical Indonesian poetry. In Proceedings of the Fifth International Conference on Computational Creativity, pages 82–90, 2014b.
Mel Rhodes. An analysis of creativity. Phi Delta Kappan, 42(7):305–310, 1961.
Mark O Riedl and R Michael Young. Story planning as exploratory creativity: techniques for expanding the narrative search space. New Generation Computing, 24(3):303–323, 2006.

Graeme Ritchie. Assessing creativity. In Proc. of AISB’01 Symposium, 2001.
Graeme Ritchie. Some empirical criteria for attributing creativity to a computer program. Minds and Machines, 17(1):67–99, 2007.
Iván Guerrero Román and Rafael Pérez y Pérez. Social Mexica: a computer model for social norms in narratives. In Proceedings of the Fifth International Conference on Computational Creativity, pages 192–200, Ljubljana, Slovenia, 2014. Association for Computational Creativity.
Antonio Roque. Language technology enables a poetics of interactive generation. Journal of Electronic Publishing, 14(2), 2011.

Kimiko Ryokai, Noriko Misra, and Yoshinori Hara. Artistic distance: body movements as launching points for art inquiry. In Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems, pages 679–686, Seoul, Korea, 2015. ACM.
Eugene Sadler-Smith. Wallas’ four-stage model of the creative process: more than meets the eye? Creativity Research Journal, 27(4):342–352, 2015.

Steven L Salzberg, Mihaela Pertea, Arthur L Delcher, Malcolm J Gardner, and Hervé Tettelin. Interpolated Markov models for eukaryotic gene finding. Genomics, 59(1):24–31, 1999.
Jonathan Sammartino and Stephen E Palmer. Aesthetic issues in spatial composition: representational fit and the role of semantic context. Perception, 41(12):1434, 2012.

Rob Saunders, Petra Gemeinboeck, Adrian Lombard, Dan Bourke, and A Baki Kocaballi. Curious whispers: an embodied artificial creative system. In Proceedings of the First International Conference on Computational Creativity, pages 100–109, Lisbon, Portugal, 2010. Association for Computational Creativity.
Udo Schlegel, Eren Cakmak, Juri Buchmüller, and Daniel A. Keim. G-rap: interactive text synthesis using recurrent neural network suggestions. In ESANN 2018 proceedings, 2018.

Oscar Schwartz and Benjamin Laird. bot or not. http://botpoet.com/, 2015. Accessed: 2015-09-08.
Marco Scirea, Peter Eklund, and Julian Togelius. Toward a context sensitive music generator for affective state expression. In Proceedings of the Sixth International Conference on Computational Creativity. Brigham Young University, 2015. Late-breaking abstract.
John R Searle. Minds, brains, and programs. Behavioral and Brain Sciences, 3(03):417–424, 1980.
Jennifer G Sheridan, Alan Dix, Simon Lock, and Alice Bayliss. Understanding interaction in ubiquitous guerrilla performances in playful arenas. In People and Computers XVIII—Design for Life: Proceedings of HCI 2004, pages 3–17. Springer, Bournemouth, UK, 2005.

Hirohito Shibata and Koichi Hori. A system to support long-term creative thinking in daily life and its evaluation. In Proceedings of the Fourth Conference on Creativity & Cognition, pages 142–149, Loughborough, UK, 2002. ACM.
Disha Shrivastava, Anirban Laha, and Karthik Sankaranarayanan. A machine learning approach for evaluating creative artifacts. arXiv preprint arXiv:1707.05499, 2017.

Paul J Silvia. Emotional responses to art: From collation and arousal to cognition and emotion. Review of General Psychology, 9(4):342, 2005.
Dean Keith Simonton. Lexical choices and aesthetic success: A computer content analysis of 154 Shakespeare sonnets. Computers and the Humanities, 24(4):251–264, 1990.

Dean Keith Simonton. Creativity and discovery as blind variation: Campbell’s (1960) BVSR model after the half-century mark. Review of General Psychology, 15(2):158, 2011.
Divya Singh, Margareta Ackerman, and Rafael Pérez y Pérez. A ballad of the Mexicas: Automated lyrical narrative writing. In Eighth International Conference on Computational Creativity, 2017.

Jeffrey K. Smith and Lisa F. Smith. The 1.5 criterion model of creativity: Where less is more, more or less. Journal of Creative Behavior, 51, 2017.
Michael R Smith, Ryan S Hintze, and Dan Ventura. Nehovah: a neologism creator nomen ipsum. In Proceedings of the Fifth International Conference on Computational Creativity, pages 173–181, Ljubljana, Slovenia, 2014. Association for Computational Creativity.

Tim Smithers. Autonomy in robots and other agents. Brain and Cognition, 34(1):88–106, 1997.
Von-Wun Soo, Tung-Yi Lai, Kai-Ju Wu, and Yu-Po Hsu. Generate modern style Chinese poems based on common sense and evolutionary computation. In 2015 Conference on Technologies and Applications of Artificial Intelligence (TAAI), pages 315–322. IEEE, 2015.

Robert J Sternberg. Whence creativity? The Journal of Creative Behavior, 51(4):289–292, 2017.
Robert J. Sternberg. What’s wrong with creativity testing? Journal of Creative Behavior, 0, 2018.
Robert J. Sternberg, James C. Kaufman, and Jean E. Pretz. The propulsion model of creative contributions applied to the arts and letters. Journal of Creative Behavior, 35(2):75–101, 2001.
Martin Storme, Todd Lubart, Nils Myszkowski, Ping Chung Cheung, Toby Tong, and Sing Lau. A cross-cultural study of task specificity in creativity. Journal of Creative Behavior, 51(3), 2015.
Charlie Stross. Lovecraft.pl, December 2013. Blog post. http://www.antipope.org/charlie/blog-static/2013/12/lovebiblepl.html, accessed December 21, 2016.
Bob L. Sturm and Oded Ben-Tal. Taking the models back to music practice: Evaluating generative transcription models built using deep learning. Journal of Creative Music Systems, 2(1), 2017.
Luchen Tan, Haotian Zhang, Charles Clarke, and Mark Smucker. Lexical comparison between Wikipedia and Twitter corpora by using word embeddings. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), volume 2, pages 657–661, 2015.
Luchen Tan, Adam Roegiest, Charles LA Clarke, and Jimmy Lin. Simple dynamic emission strategies for microblog filtering. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, pages 1009–1012. ACM, 2016.
A. Tapscott, J. Gómez, C. León, J. Smailović, M. Žnidaršič, and P. Gervás. Empirical evidence of the limits of automatic assessment of fictional ideation. In Proceedings of the Fifth International Workshop on Computational Creativity, Concept Invention, and General Intelligence at ESSLLI, pages 58–71, Bozen-Bolzano, Italy, 2016. Association for Logic, Language and Information.
Yla R Tausczik and James W Pennebaker. The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29(1):24–54, 2010.
Brandon Tearse, Peter Mawhorter, Michael Mateas, and Noah Wardrip-Fruin. Experimental results from a rational reconstruction of Minstrel. In Proceedings of the 2nd International Conference on Computational Creativity, pages 54–59, Mexico City, Mexico, 2011. Association for Computational Creativity.
Berty C Tobing and Ruli Manurung. A chart generation system for topical metrical poetry. In Proceedings of the Sixth International Conference on Computational Creativity, pages 308–314, 2015.
Jukka Toivanen, Hannu Toivonen, Alessandro Valitutti, Oskar Gross, et al. Corpus-based generation of content and form in poetry. In Proceedings of the Third International Conference on Computational Creativity, pages 211–215, Dublin, Ireland, 2012. Association for Computational Creativity.
Jukka M Toivanen, Matti Järvisalo, and Hannu Toivonen. Harnessing constraint programming for poetry composition. In Proceedings of the Fourth International Conference on Computational Creativity, pages 160–167. computationalcreativity.net, 2013.
Jukka M Toivanen, Oskar Gross, and Hannu Toivonen. The officer is taller than you, who race yourself! Using document specific word associations in poetry generation. In Proceedings of the Fifth International Conference on Computational Creativity, pages 355–359. Association for Computational Creativity, 2014.

Ellis Paul Torrance. Torrance tests of creative thinking. Personnel Press, Incorporated, Princeton, US, 1968.
Patrick Tresset and Oliver Deussen. Artistically skilled embodied agents. In Proceedings of the AISB Symposium on Computational Creativity, London, UK, 2014. Society for the Study of Artificial Intelligence and the Simulation of Behaviour.

University of Helsinki Computer Science Department. Works of artistic nature, 2016. https://www.cs.helsinki.fi/discovery/works-artistic-nature, accessed December 6, 2016.
Frank van der Velde, Roger A Wolf, Martin Schmettow, and Deniece S Nazareth. A semantic map for evaluating creativity. In Proceedings of the Sixth International Conference on Computational Creativity, page 94, 2015.

Oshin Vartanian, Alenoush Vartanian, Roger E. Beaty, Emily C. Nusbaum, and Kristen Blackler. Revered today, loved tomorrow: Expert creativity ratings predict popularity of architects’ works 50 years later. Psychology of Aesthetics, Creativity, and the Arts, 11:386–391, 2017.
Tony Veale. Less rhyme, more reason: Knowledge-based poetry generation with feeling, insight and wit. In Proceedings of the Fourth International Conference on Computational Creativity, pages 152–159, 2013a.
Tony Veale. Linguistic readymades and creative reuse. Journal of Integrated Design and Process Science, 17(4):37–51, 2013b.
Tony Veale. Game of tropes: exploring the placebo effect in computational creativity. In Proceedings of the Sixth International Conference on Computational Creativity, pages 78–85, Park City, US, 2015. Association for Computational Creativity.
Tony Veale and Yanfen Hao. Exploiting readymades in linguistic creativity: a system demonstration of the jigsaw bard. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Systems Demonstrations, pages 14–19. Association for Computational Linguistics, 2011.

Dan Ventura. Mere generation: essential barometer or dated concept? In Proceedings of the Seventh International Conference on Computational Creativity, pages 17–24, Paris, France, 2016. Association for Computational Creativity.
Dan A Ventura. A reductio ad absurdum experiment in sufficiency for evaluating (computational) creative systems. In Proceedings of the Fifth International Joint Workshop on Computational Creativity, pages 11–19, Madrid, Spain, 2008. Association for Computational Creativity.
Tiago Gonzaga Videira, Bruce Pennycook, and Jorge Martins Rosa. Formalizing fado: A contribution to automatic song-making. Journal of Creative Music Systems, 1(2), 2017.
Graham Wallas. The art of thought. London: Jonathan Cape, 1926.

Annalu Waller, Rolf Black, David A O’Mara, Helen Pain, Graeme Ritchie, and Ruli Manurung. Evaluating the STANDUP pun generating software with children with cerebral palsy. ACM Transactions on Accessible Computing (TACCESS), 1(3):16, 2009.

Qixin Wang, Tianyi Luo, Dong Wang, and Chao Xing. Chinese Song iambics generation with neural attention-based model. arXiv preprint arXiv:1604.06274, 2016a.
Zhe Wang, Wei He, Hua Wu, Haiyang Wu, Wei Li, Haifeng Wang, and Enhong Chen. Chinese poetry generation with planning based neural network. arXiv preprint arXiv:1610.09889, 2016b.
Thomas B Ward, Steven M Smith, and Ronald A Finke. Creative cognition. In Handbook of creativity, pages 189–212. Cambridge University Press, Cambridge, UK, 1999.
Robert W Weisberg. On the usefulness of value in the definition of creativity. Creativity Research Journal, 27(2):111–124, 2015.
Geraint A Wiggins. A preliminary framework for description, analysis and comparison of creative systems. Knowledge-Based Systems, 19(7):449–458, 2006.
M Tsan Wong, A Hon Wai Chun, Qing Li, SY Chen, and Anping Xu. Automatic haiku generation using VSM. In WSEAS International Conference. Proceedings. Mathematics and Computers in Science and Engineering. World Scientific and Engineering Academy and Society, 2008.
Brandon Wood. Tweet haikus, 2013. https://twitter.com/tweethaikuscom, accessed May 2018.
Linli Xu, Liang Jiang, Chuan Qin, Zhe Wang, and Dongfang Du. How images inspire poems: Generating classical Chinese poetry from images with memory networks. CoRR, abs/1803.02994, 2018. URL http://arxiv.org/abs/1803.02994.
Rui Yan. i, poet: Automatic poetry composition through recurrent neural networks with iterative polishing schema. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence. AAAI Press, 2016.
Rui Yan, Han Jiang, Mirella Lapata, Shou-De Lin, Xueqiang Lv, and Xiaoming Li. i, poet: Automatic Chinese poetry composition through a generative summarization framework under constrained optimization. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence. AAAI Press, 2013.
Hui Yang, Ian Soboroff, Li Xiong, Charles LA Clarke, and Simson L Garfinkel. Privacy-preserving IR 2016: Differential privacy, search, and social media. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, pages 1247–1248. ACM, 2016.
Xiaopeng Yang, Xiaowen Lin, Shunda Suo, and Ming Li. Generating thematic Chinese poetry using conditional variational autoencoders with hybrid decoders. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI-ECAI, 2018.
Xiaoyuan Yi, Ruoyu Li, and Maosong Sun. Generating Chinese classical poems with RNN encoder-decoder. arXiv preprint arXiv:1604.01537, 2016.
Victor Zappi and Andrew McPherson. The D-Box: how to rethink a digital musical instrument. In Proceedings of the 21st International Symposium on Electronic Art, 2015.
Mo H Zareei, Dale A Carnegie, and Ajay Kapur. Noise square: physical sonification of cellular automata through mechatronic sound-sculpture. In Proceedings of the 21st International Symposium on Electronic Art, 2015.

Xingxing Zhang and Mirella Lapata. Chinese poetry generation with recurrent neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 670–680. Association for Computational Linguistics, 2014.
