Read Me Like A Book:

Lessons in Affective, Topical and Personalized Computational Creativity

Tony Veale

School of Computer Science, University College Dublin, Dublin, Ireland.

Abstract For personalization, the footprint of a target user – Context is crucial to creativity, as shown by the signif- whether their official bio or their recent tweets – offers a icance we attach to the labels P- and H-Creativity. It is textual context in which to situate the generation process. context that allows a system to truly assess novelty, or For topical creativity, the aggregated timelines of an array to ensure that its topical artifacts really are topical. An of online news sources, from a Twitter-addicted president important but an often overlooked aspect of context is to the breaking headlines of mainstream media, provide a personality. A CC system that is designed to reflect a dynamic context for machine creativity. We explore three specific aspect of the creative temperament, whether it modes of CC via these systems: a marriage of linguistic is humour, arrogance or whimsy, must stay true to this and artistic creativity that maps the digital personality of a assumed personality in its actions. Likewise, a system user, as reflected in what they tweet, into metaphors that that creates artifacts that are rooted in emotion must be sensitive to the personality or mood of its audience. are both textual and visual; a topical creator that generates But here we must tread carefully, as the assessment of metaphors for news stories rather than news readers; and personal qualities often implies judgement, and so few a book recommender system that leavens its user-tailored of us like to be judged, especially by our machines. To suggestions with humour, and which invents its own book better understand the upsides and pitfalls of topicality- ideas to supplement the titles in its well-stocked database. and personality-based CC systems, we unpack three of The principle that unites all three systems is the role of these systems here, and explore the lessons they offer. information compression in CC. 
One space of information may be compressed or decompressed to yield others, and The Wonder of You produce insightful generalizations or vivid elaborations in the process. Thus, compression is required to map a news Creativity can be an intensely personal affair. We put our- story to a linguistic metaphor, as the metaphor need only selves into what we create, relying on our experiences and capture the gist of the story. In fact, such information loss values to build artifacts we hope others will value too. In is desirable when it leads to generalization and ambiguity, doing so, we reveal our personalities. When we create for as metaphors should be objects of profound wonderment. others and assimilate the values of an audience, creativity When moving from online user personalities to metaphors becomes personal and personalized. While it is a tenet of we require the opposite, decompression, to inflate a low- computational creativity (CC) that an agent need not be a dimensional space of personality types into an elaborate, person to be creative, a creative system may nonetheless high-dimensioanl space of possible character metaphors. need a personality, or an appreciation of the personalities Current sentiment analysis techniques can place a user in of others, before it can create like a human (Colton et al. a space of a dozen or so psychological dimensions, while 2008). ‘Software is not human,’ to quote the CC refrain, metaphors will occupy a space that – even after a process yet CC systems must appreciate what matters to a human. of dimension reduction – has hundreds of dimensions. In In any case, we can only know a CC system’s personality, fact, even the extraction of psychological dimensions is a and it can only know ours, by what we do or say, making compression process, since the textual timelines that feed personal/personalized CC a special case of contextual CC. 
into sentiment analysis are converted into high-dimension For CC systems that create in language, context is itself a distributed spaces built with word co-occurrence statistics. linguistic artifact, as rooted e.g., in our time- In the next section we focus on personality-driven CC lines. Here we describe how best to use linguistic context with a system that maps recent user moods into metaphors to deliver various forms of topical and personalized CC. and pictures. Our approach is data- and knowledge-based, Specifically, we will explore the role of linguistic con- marrying textual data from a user profile with a symbolic text in the operation of three Twitter-based CC systems, model of the cultural allusions that underpin a machine’s ranging from one that uses context to ensure topicality to metaphors. Following that, we give statistical form to the ones that view context as the imprint of a user personality. notion that metaphors reside in a space of possibilities, so as to re-imagine metaphor creation as a mapping from one The metaphor whose vector presents the smallest angle space, a topic model of the news, into another, a space of (the largest cosine) to an incoming news vector is chosen metaphors that shares exactly the same dimensions. These as the one with the most relevance to that news item. We are whimsical systems that make sport of news and mood, built our joint space by compressing 380,000 news items, so we present one more system, a CC book recommender 210,000 tweets (from sources including @nytimesworld, that uses simple information-retrieval techniques to guide @CNNbrk and @FOXnews) and 22,846,672 metaphors its suggestions, but which also uses machine creativity in from MetaphorMagnet (which were made available to us some unsubtle ways. This specific system was built for a on request) into the same LDA space of 100 dimensions. 
recent science communications event, and user feedback We used the gensim package of Řehůřek & Sojka (2010) offers us some lessons on the willingness of humans to be to build the space, and concatenated word lemmas to their judged by machines. Although whimsy can diminish the POS tags to provide a richer feature set to the model. severity of a perceived criticism, humour must be wielded The best pairings produced by this conflation of spaces with care by our autonomous CC systems, especially if it are tweeted hourly by our bot, called @MetaphorMirror. is unbidden, or used for the furtherance of serious goals. The thematic basis of the compression means that some pairings show more literal similarity that others, as in: Metaphor Mirror On The Wall From @WSJ: Sultan Abdullah of Pahang has been Consider the problem of generating apt metaphors for the chosen as Malaysia’s new king. news. As a story breaks and headlines stream on Twitter, ↑↓ we want our metaphor machine to pair an original and in- From @MetaphorMirror: What is a sultan but a sightful metaphor to each headline. So a headline about ruling crony? What is a crony but a subservient sultan? extreme weather might be paired with a metaphor about What drives ruling sultans to be toppled from thrones, nature’s destructive might, or a political scandal might be appointed by bosses and to become subvervient cronies? paired to a crime metaphor. As metaphor theorists often speak of multiple spaces – e.g. Koestler (1964), Lakoff & As is evident here, MetaphorMagnet’s hardboiled world- Johnson (1980) and Fauconnier & Turner (2002) all see view shines through in these pairings, offering meanings different viewpoints as different spaces – it is tempting to and perspectives that, while not actually present in a head- model each space in a metaphor with its own vector space line, can be read into the headline if one is so inclined. 
So model (VSM), by equating vector spaces with conceptual some pairings show that the VSM has learnt the lessons of spaces. Yet this analogy is misleading, as different VSMs history by reading the news, and this shines through too: – constructed from different text corpora – must have different dimensions (even if they share the same number From @AP: Congo's new President Felix of dimensions) and we cannot directly perform geometric Tshisekedi sworn into office; country's first peaceful comparisons between the vectors of two different VSMs. transfer of power since independence. Since the principal reason for building a VSM is the ease ↑↓ with which semantic tests can be replaced with geometric From @MetaphorMirror: How might an elected ones, we should build a single vector space that imposes incumbent become an unelected warlord? What if the same dimensions on each conceptual metaphor space. elected incumbents were to complete tenures, grab It is useful then to view news headlines and metaphors as power and become unelected warlords. comprising two overlapping subspaces of the same VSM. For a news subspace we collect a large corpus of news Any stereotyping in this response is a product of the VSM content from the Twitter feeds of CNN, Fox News, AP, and its large corpus of past news, rather than any bias in Reuters, BBC and New York Times, and use a standard MetaphorMagnet. Though the latter has a symbolic model compression technique – such as LDA (Latent Dirchlet of warlords and democrats, it is the news space that unites Allocation; Blei et al., 2003), LSA (Latent Semantic this generic model with the specific history of the Congo. Analysis; Landauer & Dumais, 1997) or Word2Vec (Mik- So even as a system strives for topicality, it must have one olov, 2013) – to generate a vector for each headline. We foot planted in the past if its outputs are to seem informed. 
additionally build a large metaphor corpus by running the Metaphor Magnet system of Veale (2015) on the Google n-grams (Brants & Franz, 2006), to give millions of meta- Fifty Shades of Dorian Gray phors that stretch across diverse topics. Rather than build Much research has been conducted on the analysis of hu- separate vector spaces for the news and metaphor corpora, man personality as reflected in our lexical choices. Chung we build a single vector space for both by appending one & Pennebaker (2008) describe a tool and a resource, the corpus onto the other before applying dimension comp- LIWC (or Linguistic Inquiry & Word Count), for estim- ression. Within this joint VSM, every past metaphor and ating personality traits such as anger, affability, positivity, future headline is assigned a vector of precisely the same topicality, excitability, arrogance, analyticity, awareness, dimensionality. It is now a simple matter to measure the worry, anxiety and social engagement from a writer’s text angle between the vector for an incoming headline and outputs. The web version of the tool, AnalyzeWords.com, those of previously encoded MetaphorMagnet metaphors. which calculates values for these 11 dimensions by anal- yzing one’s recent tweets, tells us that @Oprah is upbeat tweets) as an “optimistic machine.” As his AnalyzeWords and affable as a tweet writer, while @realDonaldTrump is profile also suggests the qualities laid-back, educated and upbeat but angry. To create metaphors for a specific per- scientific, the latter two of which are typical of research- son, such as Donald Trump or Oprah, a machine can treat ers, it can also describe Musk as a “laid-back researcher.” a recent AnalyzeWords profile as an 11-dimension vector in personality space, and seek to map this coarse vector to I painted “Optimistic Machine” from a higher-dimensional space of metaphorical possibilities. 
@elonmusk’s tweets with determined Given the disparity in dimensions between these spaces badger-grey, unfeeling Sith-black and (11 versus 100) and their different means of construction, dispassionate robot-silver-grey. we cannot build a joint vector space by just concatenating data. Lacking a dataset to train a neural network to do this mapping across the spaces, we use a symbolic approach to inflate the AnalyzeWords space into 100s of dimensions that capture the qualities highlighted in our metaphor set. So we inflate the smaller space by hand-crafting logical formulas – or transformulas – to estimate approximately 300 qualities as functions of the eleven core dimensions. Transformulas can conjoin, disjoin and negate these core dimensions. All core dimensions are mapped to the scale 0 to 1.0 (from 0 to 100), and so all transformulas calculate values in the range 0 to 1 also. The negation of a quality simply inverts this scale, so not angry can be calculated to be (1 - angry). Consider the transformula for neurotic: neurotic(u) = worried(u) × analytic(u) That is, since neurotics tend to overthink their worries, we estimate the neuroticism of user u to be the product of the Fig. 1. A personalized metaphor for @ElonMusk. core dimensions worried and analytic. Likewise, we can say that someone is narcissistic to the extent that they are These metaphors, as tweeted by the metaphor-generating arrogant and self-aware (given to talking about their own bot @BotOnBotAction, are shown in Figures 1 and 2. feelings), or creative to the extent they are analytical and upbeat. While transformulas do not reflect an empirical I made “Laid-back Researcher” from truth about a person, they codify a kind of ‘folk’ symbolic @elonmusk’s tweets with scientific Walter reasoning that lends itself to explicit verbal explanation. White, educated priest-black and laid- Importantly, they allow any Twitter user u to be described back Lebowski-weed-green. 
in terms of the vivid qualities that are used in the NOC list (Veale, 2016) to characterize its gallery of famous people. So, once our transformulas have mapped AnalyzeWords’s 11 dimensions into the rich voculabulary of the NOC list, a Twitter user can be compared and matched to its iconic membership. In this way, @ElonMusk may show a strong similarity to Walter White of Breaking Bad, while @real DonaldTrump might produce a match to Lex Luthor. Such metaphors are a reach – all good metaphors are – but each can be explained in symbolic terms using the logic of the transformulas that link them to their most recent tweets. As such, transformulas turn text-analytic calculations into talking points that a creative linguistic system can exploit. Consider again the example of @ElonMusk, engineer and entrepreneur. From an AnalyzeWords.com profile that places his tweets high on the core dimensions upbeat and analytic and low on the dimensions angry and self-aware, the transformula qualities optimistic (upbeat × analytic), Fig. 2. Alternate personalized metaphor for @ElonMusk. dispassionate (analytic × not angry), unfeeling (analytic × not sensory) and determined (upbeat and not angry) can The bot creates a new piece of visual art to complement be inferred. Since three of these transformula qualities – its metaphors, by creating a 1-dimensional 4-state cellular unfeeling, determined and dispassionate – are typical of automaton that unfurls over many rows/generations – and machines, and the fourth, optimistic, is not, our metaphor rendering its four states with colours chosen to match the generator might describe Musk (in light of his most recent highlighted qualities of the metaphor (see Veale & Cook, 2018). As a flourish, the bitmap of an Emoji annotated The bot also has two activation modes that do not obey with one of the words in the metaphor – so an atom for a strict opt-in policy. The first is perhaps partially opt-in, “scientific” in Fig. 
1, and a robot for “robot” in Fig. 2 – is insofar as one can request a recommendation for another. integrated into the image and coloured to suit its new con- In this mode, a user tweets #ReadHimLikeABook followed text. Using a colour lexicon in which 600 of the metaphor by the Twitter handle of a friend; the bot also accepts the machine’s stereotypes are mapped to apt RGB codes (e.g., tags #ReadHerLikeABook and #ReadThemLikeABook. As silver-grey for robots, black for priests), it is possible to with @BotOnBotAction, the second mode is a filler mode assign a specific hue to even the non-visual qualities (see for when the system finds itself between explicit requests. Veale & Alnajjar, 2016). Each metaphor is then framed so In this case, it exploits the fact that many of the authors in as to cement this link, so that the black of Fig. 1 is named its books database are themselves on Twitter, and so aims “educated priest black” while that of Fig. 2 is “unfeeling to start a conversation about modern literature that draws Sith black.” The tweeted metaphor, comprising four inter- contemporary writers into an online discussion of books. twined sub-metaphors, tells us what each of these colours Authors opt-out of this mode by simply blocking the bot. stands for, and suggests how we should feel about them. Recommender systems are typically either user-based It is important to note that @BotOnBotAction operates or content-based. In the former, a perceived similarity bet- on an opt-in basis for most users. The bot will not target ween users permits a system to recommend items favored them, or metaphorize them, unless explicitly asked to do by one to the other. In the latter, the similarity function is so with the hashtag #PaintMySoul. This kind of personal- defined over the items themselves, so a user that favors a ized computational creativity is not always flattering or given item is likely to favor a similar item too. 
These two welcome, and may even – in some cases – be considered modes are far from orthogonal, as a similarity function for abusive. The exception to this opt-in rule concerns high- users can be defined over the set of items they both favor, profile celebrity users of Twitter, who use the platform to whilst a similarity function for items can be defined over promote themselves to millions of followers. The bot uses the set of users that favor them both. In short, as a system the website TwitterCounter.com (now defunct) to obtain a learns more about its users, it learns more about the items list of the most well-known personalities on Twitter, such it has in its database of possible recommendations. Impor- as Mr. Musk, and freely generates metaphors and images tantly, @ReadMeLikeABot is not designed to track users, for these luminaries in the downtime between its explicit or to learn very much about them, other than that which is user commissions. Artists and satirists have always made public in their Twitter accounts. The bot remembers what targets of the powerful and famous, who hardly notice the it recommends simply to ensure that it does not make the impudence of a single provocateur; our bot is no different. same suggestion again in too short a timeframe. The bot’s user-based recommendations are personality-based, while its content-based recommendations are topic-based, where Making a Hash of Computational Creativity each is inferred on the basis of Twitter usage alone. Personalization and topicalization offer orthogonal means Recommendation systems are a practical application of of grounding the products of CC in the here and now of a AI, yet the task of suggesting existing items permits very user’s reality. Personalization shows that a creative syst- little in the way of novelty, no matter how insightful a re- em understands its users, whilst topicalization shows that commendation may be. 
Where then lies the computational it appreciates the current and historical context that conn- creativity of a system like @ReadMeLikeABot? We view ects them both. These alternate means of grounding inter- book recommendation not as a creative task in itself, but sect in the task of recommendation, for a good recomm- as an occasion for creativity that allows an expressive CC ender engine must understand both the personal dimens- system to demonstrate a witty and whimsical personality. ions of its users and the topics that matter most to them. Consider aspects of linguistic creativity such as metaphor In this section we describe the rationale and the mecha- and irony. While a bot like @MetaphorMagnet can gener- nisms of a recommender system for books as embodied in ate meaningful metaphors with a characteristic voice of its a Twitterbot named @ReadMeLikeABot. As with @BotOn own, its outputs are mostly apropos of nothing, for the bot BotAction, the bot obeys a mostly opt-in policy for its int- must rely on its readers to see a serendipitous relevance in eractions with users, who request ideas for new books to its outputs, in whatever context they consume them. Our read by using the hashtag #ReadMeLikeABook. When the @MetaphorMirror bot finds this relevance for itself in the bot is invoked in this way, it uses the text of the invoking topicality of the news, yet the bot remains a showcase for tweet as the basis for its recommendation. If this does not metaphorical capability rather than a practical application offer a foothold to the recommender engine, the bot looks in its own right. Linguistic creativity is a welcome season- instead at the short Twitter biography that each user def- ing for language, rather than the meal itself; it works best ines for their account. If this is empty or unrevealing, the when it augments rather than supplants our practical aims. 
bot finally considers the most recent tweets of the user as When viewed as a recommender of books, @ReadMeLike a source of topical material for its book suggestions. As in ABot is not a CC system. Yet the act of suggesting content @BotOnBotAction, those recent tweets also offer a basis to a user on the basis of its insights into the user’s person- for inferring something of the personality of a user, which ality allows a system to be creative in the expression of its may additionally colour the bot’s book recommendations. insights, and to find a genuine use for irony and metaphor. Unauthorized Autobiographies Barr, Bono, and Fredo Corleone. When the NOC entity is @ReadMeLikeABot maintains a tiered database of content fictional and has a known creator, this information is also to recommend. Its first tier contains 500 or so books that used in the generation of literary what-ifs. Consider these: are well known, highly regarded, and by authors of some Captain Ahab's Guide To Chasing a Great White Whale renown. Whenever a book from this tier is recommended, Dr. Stephen Strange's Guide to Performing Magic Tricks the system can be confident that the user has most likely Yoda’s Guide to Promoting Mysticism heard of it, and will likely see its relevance. Each book in this tier is also associated with a set of qualities that des- These books are credited to Herman Melville, Stan Lee cribe not just the book itself but the traits of the readers and George Lucas, respectively. What-ifs also give us the that are most likely to read it. We might assume, then, that opportunity to imagine incongruous pairings of authors: philosophical readers enjoy philosophical books. In this The Geek's Guide to Studying Science way, qualities such as smart, philosophical, warm, hostile The Psychiatrist's Guide to Probing the Mind and upbeat can be linked, via appropriate transformulas, The CEO's guide to Pioneering New Technologies to Twitter users who exhibit the same personality traits. 
The bot’s second tier is much larger, but also much less The first is credited to Peter Parker and Wesley Crusher; authoritative. Its 15,000 or so books have been extracted the second to Drs. Sigmund Freud and Frasier Crane; the from DBpedia.org using the website’s SPARQL endpoint. third to Tony Stark and Steve Jobs. In general, any lingu- We exploit the linguistic regularity of DBpedia’s category istic framing of pop-culture factoids that pokes fun at the terms to also extract a set of themes for every book. When book industry will suffice here. Publishers themselves see a book is listed under a semantic category with the label the value of parodic cash-grabs, and shelves already groan Xs_about_Y or Y_in_fiction, we extract Y as an apt theme. under fictive offerings like the following, by Pablo Esco- We also mine the hierarchical relations between DBpedia bar, Tyrion Lannister and Wile E. Coyote, respectively: categories to build a semantic network that relates these Lifting The Lid on The Medellín Cartel book themes to each other, such as Artificial Intelligence Exposed: The Secrets of The House Lannister to Neural Networks. This genre and theme network is then An Insider’s Guide to A.C.M.E. the basis of the bot’s content-based recommendations. The last tier, and certainly the most unusual, comprises Recall that such non-books are only ever recommended to 6000 or so humorous fabrications, wholly invented book the user when better matches from a higher tier cannot be ideas that wear their artifice on their sleeves. These titles found, or when all have been offered to that user already. show the usefulness of CC to a recommender system, for Their value is largely found in repetition, then: the more a when the system has no new content to recommend to a user interacts with the bot, the further down its tiers the user, it can always fall back on its own in-jokes to fill the bot must descend, and more the bot will reveal its sense of gap and keep the user engaged. 
These inventions must be humour, about books and about the book industry itself. seen as the literary jokes they are if the bot’s credibility as a recommender is not to be diminished in the process. To They Shall Not Grow Bold generate these witty fabrications, we use the NOC list of Much research has focused on the recognition of sarcasm Veale (2016), a large multifaceted database of pop-culture and irony in text, especially as it is used in social media. icons that provides vivid descriptions for over 1000 fam- This emphasis on detection is not surprising, given that so ous people, both real and fictional, modern and historical. much of the language that matters is created by humans. For each person, the NOC provides a set of categories – In contrast, very little research has addressed the creative e.g., billionaire for Donald Trump, or politican for Hillary task of generating irony and sarcasm, no doubt because Clinton – and a set of typical activities, such as building we already find our machines to be inscrutable enough in giant walls for Trump and tolerating adultery for Clinton. their dealings with humans. But more than that, sarcasm The NOC is a generic, application-neutral resource, but and irony cannot exist outside of a specific communicat- as these examples show, no little humour is baked into the ive goal: we can generate metaphors in a null context and database from the get-go. The NOC list was first built for leave it to the reader to unearth their implied meaning, but the WHIM project (the What-If Machine), and the task of irony and sarcasm require a firm context to push against. generating whimsical book ideas can be seen as a what-if In short, they need realistic expectations to bring to bear, scenario: what if Genghis Khan, or Bono, or Tony Stark, and a context that undermines them in ways for all to see. wrote a book and told us what they were really thinking? 
For a machine to generate irony and sarcasm well, it must What-if book generation is a simple task using the NOC: be given enough of these expectations to be versatile, and a system combines a famous person with an apt category an ability to identify those contexts that clearly fall short. and an associated activity, as in the following examples: Personality-driven recommendation supplies these exp- The Comedienne's Guide to Ranting About Liberals ectations in convenient qualitative and quantitative forms. The Rockstar’s Guide To Avoiding Taxes When the bot has a topic-based reason to recommend e.g., The Son’s Guide to Disappointing the Family an intellectual book to a reader who scores low on the an- alytic dimension, or is poorly scored by the transformula These faux books are credited to, respectively, Roseanne for intellectual, this mismatch between topic and person- ality is just the failure of expectation that irony demands. Note how the bot is forced to invent an author for its liter- In this case, topic-based recommendation creates the exp- ary in-joke, which it does by cutting up the author names ectation, and personality analysis defies it, giving the bot from its first tier of books. The third framing is especially a logical reason to snarkily poke fun at the disparity. Sup- apt when the bot’s tweet is accompanied by a graph of the pose the bot does suggest an intellectual book, on cosmo- user’s 11-dimension personality profile (see below), since logy, say, to a user with an avowed interest in cosmology it allows one to appreciate the basis for the bot’s response. that appears to fall well short of the intellectual bar; how But the second framing has another advantage, in that it should it wittily allude to this failure of expectations? The allows the bot to speak directly to the topic of the recom- bot can learn from how humans deal with disappointment mended book. 
Ironic similes can be found by looking to how we express our dissatisfaction through irony. So a web search for intellectual finds the following ironic similes: about as intellectual as a Cheez Doodle, as a cucumber, as a brush, a hole in the ground, a wart hog, a potted plant, a bulldog, an emu, and others too rude to repeat here. What links each of these mental images is not a shared feature but a common framing; in each case, the author prefaces a simile with "about" to signify the semantic imprecision of a creative liberty. We can exploit this framing device to seek out many other ironic similes on the Web for any quality one cares to undermine, and so give a bot a rich palette of ironic options to use on its own users.

When a user's Twitter profile scores low for a quality that is estimated either directly (using AnalyzeWords) or indirectly (via a transformula), the bot will dip into its bag of ironic similes for that quality. Choosing at random, it can frame what it retrieves in a variety of colourful ways. Consider this particular response to a user:

    On the prettiness theme, @anonymized, I used to be as attractive as a brown cardigan until I read "The Picture of Dorian Gray" by Oscar Wilde. How about you? I crunched your recent tweets:

Suppose the quality is philosophical, and the bot retrieves the simile about as philosophical as a bowel movement. Perhaps the bot also intends to recommend the philosophical novel Steppenwolf by Hermann Hesse, as the user's recent tweets mention loneliness and alone. It can frame this pairing of a book to a simile in the following ways:

    Hey @bookreader, if you're as philosophical as a bowel movement then maybe you should read 'Steppenwolf' by Hermann Hesse on the theme of solitude.

    Hey @bookreader, I used to be as philosophical as a bowel movement until I read 'Steppenwolf' by Hermann Hesse on the solitude theme.

    Hey @bookreader, given your personality profile I don't know which philosophical book is more you: 'Steppenwolf' by Hermann Hesse on the solitude theme, or 'The Bowel Movement' by Stephen Tolkein.

The first framing was used in early field tests of the bot, in its prelaunch in the weeks before the 2018 Science and Communication conference (for which the bot was commissioned). As might be expected, its in-your-face humour was not popular with everyone, and was a cause of some dismay to the event's organizers. The "If you're X" construction did little to salve the pain of a sudden insult from an abusive bot, even if the user had invoked it explicitly. The second framing proved more successful, since it turned the bot's humour inward, on itself, rather than outward on users who might see themselves as its victims. The third framing turns it outward again, on the user, but in a subtler guise that presents it not as a direct insult but as a playful joke at the expense of the book industry.

When the bot is between user requests, it attempts to start a conversation about books and their ideas. It does so by posing literary questions to its readers, as in this tweet:

    On the religion theme, which of these books is more provocative than the other? "The Satanic Verses" by @SalmanRushdie, or "Headscarves and Hymens: Why the Middle East Needs a Sexual Revolution" by @MonaEltahawy? I compared their recent tweets.

The question itself typically provokes much less conversation on Twitter than the side-by-side personality analyses that the bot provides of the authors' most recent tweets.

As You Like It: The Question of Evaluation

A Twitterbot that publicizes a user's psychological profile is rather like a public speak-your-weight machine. No one likes to be judged, least of all by a computer, and we can expect a wide diversity of views on the use of tools like AnalyzeWords to condition a bot's creative outputs. These range from "Awesome!" to "very creepy," with the rump of users taking their analyses as a starting point for further wit of their own. One user replied to the book bot's ironic confession 'I used to be as poetic as a donkey passing wind until I read "Romeo and Juliet" by William Shakespeare' with the wry remark "I'll have you know that I'm still as poetic as a donkey." Another user, for whom the bot recommended Alex Comfort's "The Joy of Sex" (on the love theme), replied "Wow wow, easy there bot ... buy me dinner first!" The author @MaggieEllen replied to the bot's comparative analysis of herself to @AmyTan with a trope from The Simpsons, "you my overlord now?", before addressing the specifics of the analysis with the remark "Just real glad to know @AmyTan and I are both on 300 mg extended release Welbutrin and equally depressed." The value of a creative agent lies as much in the creativity it fosters in others as in the creativity of its own outputs.

That said, users are more open to personalized outputs when they flatter their targets, and often chafe at the negative aspects of analyses that are otherwise quite positive. One famous comedian with a science background, whose Twitter feed reflects his TV presence – half comedy, half science – was described by @botOnBotAction as "the best of Peter Parker and the worst of Jim Carrey: scientific and intelligent yet cloying and insecure." The user's response was unforgiving: "Not so insecure that I post anonymously though." When a science book by the same user, a popular author, was promoted by the book bot via its AnalyzeWords comparison to a similar author, the mention earned it a comparable rebuke: "Not sure this analysis has a single thing to do with the books; but you enjoy yourself." When creative systems get personal, so too will their audiences, making it difficult to objectively evaluate their outputs.

This makes an extrinsic evaluation of such systems preferable. Ghosh & Veale (2017) explore the contribution of a user's Twitter profile – specifically, their AnalyzeWords profile – to the assessment of whether their most recent tweet is sarcastic or not. We expect mood and personality to be factors in the determination of a sarcastic mindset, as recent emotions – from anger to arrogance – will shape the perception of a user's intent in a given tweet. In that case, we expect a neural model of sarcasm detection to be improved by the addition of accurate personality features that are active in the relevant time frame. Ghosh & Veale report a statistically significant gain in detection accuracy, of approximately 6% to 7%, when AnalyzeWords features are incorporated into their neural architecture. If personality features can improve the appreciation of creative texts, they can certainly play a key role in their generation too.

Topicality-driven bots like @MetaphorMirror afford a more direct evaluation, since it is the news context, and not a specific user, that is directly addressed in the output. We used CrowdFlower (now Figure-Eight) to elicit human ratings for 90 metaphor/headline pairs from different models. A scale of 1 ... 5 was used for ratings on three dimensions: aptness, comprehensibility, and influence, the last of which marks the extent to which a metaphor shapes a rater's response to a headline. Six different models were used to select the 'best' metaphor for each of the 90 headlines: an LDA topic model built with a corpus of 380k news stories and 210k news tweets; an LDA model built with the 380k news stories but no tweets; an LSA model built with 380k stories and 210k tweets; an LSA model built with 380k stories but no tweets; a Word2Vec space of pretrained vectors, so no news stories or tweets were used; and a baseline model that pairs a random metaphor to each headline. For each variant of the LDA and LSA models, 22.84M MetaphorMagnet metaphors were concatenated to the news content (stories with/without tweets), so these models produced joint news + metaphor spaces.

Mean ratings for each dimension in different models were not very discriminating. In each case, LDA (stories+tweets) pipped all others to the top spot. For Aptness – how apt is a metaphor for a headline? – the means ranged from 2.95 (± standard dev. 1.27) for LDA (stories+tweets) down to 2.20 ± 1.2 (random baseline). Comprehensibility – the understandability of each pairing – ranged from 3.59 ± 1.05 (for LDA, stories+tweets) down to 2.54 ± 1.12 (random baseline), and Influence ranged from 3.01 ± 1.24 (LDA stories+tweets) to 2.09 ± 1.24 (random baseline). The differences across models were not statistically significant, except in comparison to the baseline. Yet mean performance disguises deeper differences. If we quantize the human ratings of aptness into four equal-sized buckets (Low, Average, Good and Very Good) so as to identify the model that places the most metaphor:headline pairs into the Good or Very Good buckets, we obtain the findings of Table 1. More than half of the pairings suggested by the LDA (stories+tweets) model end up in one of these top buckets, suggesting that this model produces the most apt results.

    Vector Space          Low     Avg.    Good    V.Good
    LDA stories+tweets    1.1%    47.8%   41.1%   10%
    LDA stories only      3.3%    65.6%   30%     1.1%
    LSA stories+tweets    10%     60%     30%     0%
    LSA stories only      17.8%   64.4%   16.7%   1.1%
    Word2Vec              10%     57.8%   32.2%   0%
    Random baseline       45.5%   46.7%   6.7%    1.1%

    Table 1. Distribution of Aptness by choice of model.

Conclusions: Don't Give Up The Day Job

Oscar Wilde once wrote that "art has as many meanings as man has moods." The point of affective computational creativity is not just to enlarge the space of artifacts that is explored by a CC system, or to make those artifacts more revealing about the processes that generated them; it is to make those artifacts more revealing about their audiences. This potential for personalization and topicalization has not gone unnoticed in other CC work. With regard to The Painting Fool, a versatile generator of portraits (and other painterly forms), Colton et al. (2008) built on the work of Pantic & Rothkrantz (2003) to give the Fool a sense of the mood of the subject it is painting, so that it might capture this understanding in its outputs. A linguistic tool such as AnalyzeWords is of little use when dealing with a video or a camera still, but Colton et al. note the value of FACS, the Facial Action Coding System of Ekman (2002). Some users of CC systems wear their emotions on their faces; others reflect them in their social-media communications. Depending on the modality of the interaction – and personality certainly turns CC into an interactive process, even if users are scarcely aware of their own contributions – a system must exploit whatever affective cues it can find.

Topicalization has also revealed a strong potential for CC exploitation. Like personalization, it makes the outputs of a generative system more relevant to the users for whom they are created. For example, the PoeTryMe poetry generator of Gonçalo Oliveira (2017) augments its core knowledge stores (such as semantic and conceptual networks) in a number of ways, including the use of live Twitter feeds to ground its outputs in the here-and-now of social media. By showing an awareness of users and their world, these systems present themselves as more self-aware too. They present themselves not as closed generative bubbles, like the imprisoned wretch of Searle's Chinese Room thought experiment (1980), but as agents of a wider world that can predict how their creative outputs will impinge on others.

If viewed as 'human' creators, CC systems such as The Painting Fool, PoeTryMe and MetaphorMagnet would all be seen as full-time creators whose work is their calling. Most CC systems conform to this all-or-nothing pattern; their creative work is everything, and the systems have no 'lives,' whatever this might mean, beyond their generative responsibilities. @ReadMeLikeABot is a useful exception to this generalization. To this CC system, as to most humans, creativity is merely a sideline to a 'day job' that is not in itself a creative exercise. Book recommendation is a task that requires AI but has little obvious use for CC, yet this bot shows that a system that benefits from an appreciation of a topical context, or of a user's personality, can also reap benefits from the creative framing of its outputs. Conversely, the CC component of these systems may also benefit from exposure to the stuff of its mundane day job, which gives it a contingent knowledge of possibilities that it might never recognize in a purely creative mode. We can go further, and argue that all CC systems can benefit from a day job that exposes them to the mundane concerns of the people they must serve, so as to later transmute those concerns into something both familiar and non-obvious.

References

Blei, D. M., Ng, A. Y. & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993–1022.

Brants, T. & Franz, A. (2006). Web 1T 5-gram Version 1. Linguistic Data Consortium.

Chung, C.K. & Pennebaker, J.W. (2008). Revealing dimensions of thinking in open-ended self-descriptions: An automated meaning extraction method for natural language. Journal of Research in Personality, 46, 96–132.

Colton, S., Valstar, M.F. & Pantic, M. (2008). Emotionally Aware Automated Portrait Painting. In Proc. of DIMEA'08, the 3rd Int. Conf. on Digital Interactive Media in Entertainment and Arts, Athens, Greece, September 10–12.

Ekman, P., Friesen, W. V. & Hager, J. C. (2002). Facial Action Coding System. Salt Lake City, Utah: A Human Face.

Fauconnier, G. & Turner, M. (2002). The Way We Think: Conceptual Blending and the Mind's Hidden Complexities. New York: Basic Books.

Ghosh, A. & Veale, T. (2017). Magnets for Sarcasm: Making Sarcasm Detection Timely, Contextual and Very Personal. In Proc. of EMNLP 2017, the Conf. on Empirical Methods in Natural Language Processing, 493–502, Copenhagen, September 7–11.

Gonçalo Oliveira, H. (2017). O Poeta Artificial 2.0: Increasing meaningfulness in a poetry generation Twitter bot. In Proc. of CC-NLG, the Workshop on Computational Creativity in Natural Language Generation, Santiago de Compostela, Spain, pp 11–20.

Koestler, A. (1964). The Act of Creation. Penguin Books.

Lakoff, G. & Johnson, M. (1980). Metaphors We Live By. University of Chicago Press.

Landauer, T.K. & Dumais, S. (1997). A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211–240.

Mikolov, T., Chen, K., Corrado, G. & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781 [cs], January.

Pantic, M. & Rothkrantz, L.J.M. (2003). Toward an affect-sensitive multimodal human-computer interaction. Proc. of the IEEE, 91(9):1370–1390.

Řehůřek, R. & Sojka, P. (2010). Software Framework for Topic Modeling with Large Corpora. In Proc. of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pp 45–50.

Searle, J. (1980). Minds, Brains and Programs. Behavioral and Brain Sciences, 3(3):417–457.

Veale, T. (2015). Unnatural Selection: Seeing Human Intelligence in Artificial Creations. Journal of Artificial General Intelligence, 6(1), special issue on Computational Creativity, Concept Invention, and General Intelligence, pp 5–20.

Veale, T. (2016). Round Up The Usual Suspects: Knowledge-Based Metaphor Generation. In Proc. of the Meta4NLP Workshop on Metaphor at NAACL-2016, the annual meeting of the North American Assoc. for Computational Linguistics, San Diego, CA.

Veale, T. & Alnajjar, K. (2016). Grounded for life: creative symbol-grounding for lexical invention. Connection Science, 28(2):139–154.

Veale, T. & Cook, M. (2018). Twitterbots: Making Machines that Make Meaning. Cambridge, MA: MIT Press.
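At selection time, each of these vector-space models reduces to the same operation: embed the headline and every candidate metaphor in a shared space, and return the metaphor nearest the headline by cosine similarity. A minimal sketch with hypothetical toy vectors follows (a real system would use spaces built with a toolkit such as that of Řehůřek & Sojka, 2010):

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def best_metaphor(headline_vec, metaphor_vecs):
    """Return the metaphor whose vector lies nearest the headline's vector."""
    return max(metaphor_vecs, key=lambda m: cosine(headline_vec, metaphor_vecs[m]))

# Hypothetical 3-d embeddings standing in for a joint news + metaphor space.
headline = [0.9, 0.1, 0.2]   # e.g. a vector for a story about a trade war
metaphors = {
    "war is a poker game":     [0.8, 0.2, 0.1],
    "love is a rollercoaster": [0.1, 0.9, 0.3],
}
print(best_metaphor(headline, metaphors))  # -> war is a poker game
```

The models differ only in how the vectors are induced (LDA topics, LSA dimensions, or Word2Vec embeddings); the nearest-neighbour selection step is common to all of them.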
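The bucket analysis behind Table 1 can be sketched as a quantization of each 1-to-5 rating into four ranges, followed by a per-model tally. The sample ratings below are hypothetical, and the equal-width bucket boundaries are one plausible reading of "equal-sized":

```python
BUCKETS = ("Low", "Average", "Good", "Very Good")

def bucket(rating):
    """Map a rating in [1, 5] to one of four equal-width buckets
    (an assumed reading of the paper's 'equal-sized' buckets)."""
    if rating < 2.0:
        return "Low"
    if rating < 3.0:
        return "Average"
    if rating < 4.0:
        return "Good"
    return "Very Good"

def distribution(ratings):
    """Percentage of a model's ratings that falls into each bucket."""
    counts = {b: 0 for b in BUCKETS}
    for r in ratings:
        counts[bucket(r)] += 1
    return {b: 100.0 * counts[b] / len(ratings) for b in BUCKETS}

# Hypothetical ratings standing in for one model's 90 human-scored pairings.
sample = [1.5, 2.5, 2.5, 3.5, 3.5, 3.5, 4.5, 4.5]
print(distribution(sample))
# -> {'Low': 12.5, 'Average': 25.0, 'Good': 37.5, 'Very Good': 25.0}
```

Comparing the Good plus Very Good shares of each model's distribution is what singles out LDA (stories+tweets) despite the undiscriminating means.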