The Computational Creativity Complex
Dan Ventura
Computer Science Department, Brigham Young University, e-mail: [email protected]

Abstract We briefly examine the subject of computational creativity through the lenses of three different systems for producing creative artifacts in three different domains: music, visual art and cookery. From these we attempt to abstract something of a "general-purpose" computationally creative agent and examine how this agent might behave in the context of an archetypical "algorithm" for creativity. Finally, we consider this agent's behavior from the point of view of the theory of (Turing) computability and suggest that computational creativity as a field provides an interesting opportunity for looking at computational complexity in new ways.

1 Inception

The question of computational creativity can be approached from several different angles. Here we assume that creativity is possible in computational systems and examine the idea of a general, abstract mechanism for computational creativity that exists somehow independently of any particular domain. This is a difficult question, and it is clear that much of creativity is not a domain-independent construct. However, we posit that there does exist some core abstraction or creativity "algorithm" that can be applied to any domain (with suitable domain-specific augmentation, of course). Given this hypothesis, we attempt an inductive approach to the problem by first examining three computationally creative systems in three different domains and then attempting to generalize from these specific examples an abstract model of a creative system.

As a complementary view of the problem, we also discuss an abstract "algorithm" for creativity and relate this "algorithm" to the abstract system, asking whether this "algorithm" could, in fact, become an algorithm in the formal sense, and attempt to reason about the answer to that question. In other words, we try to ascertain whether computational creativity is actually computable.

2 Three Blind Mice

Our three specific examples, from which we will try to generalize an abstract system, are DARCI, a computational artist that creates visualizations for communicating concepts [31, 32, 19, 33, 20]; CARL, a computational composer that discovers musical motifs in non-musical sources and composes music around them [22, 21]; and PIERRE, a computational chef that creates original slow cooker recipes [29]. Each of these systems has been presented in more detail elsewhere, and we only give enough detail here to support the generalization that is our goal.

2.1 DARCI

DARCI is a system for generating original images that convey intention and is inspired by other artistic image-generating systems such as AARON [28] and The Painting Fool [7]. Central to the design philosophy of DARCI is the notion that the communication of meaning in art is a necessary part of eliciting an aesthetic experience in the viewer [11], and it is unique in that it creates images that explicitly express a given concept using visual metaphor. This is currently done at two levels: using iconic nouns as surrogates for the target concept and using image filters to convey associated adjectival content.

DARCI is composed of two major subsystems, an image understanding component and an image generation component. The image understanding component learns how to associate images with concepts in the form of nouns and adjectives. The image generation component composes an original source image as a collage of iconic noun concepts and then uses a genetic algorithm, governed by the analysis component, to render this source image to visually convey an adjective. Figure 1 outlines this process of creating artifacts.

Fig. 1 A diagram outlining the two major components of DARCI. Image analysis learns how to annotate new images with adjectives using a series of appreciation networks trained with labeled images (outlined in blue). Image generation uses a semantic memory model to identify nouns and adjectives associated with a given concept. The nouns are composed into a source image that is rendered to reflect the adjectives, using a genetic algorithm that is governed by a set of evaluation metrics. The final product (outlined in red) is an image that communicates the given concept.

2.1.1 Image Understanding

DARCI's understanding of images is derived from two sources: a mapping from low-level image features to descriptive adjectives and semantic associations between linguistic concepts.

Visuo-Linguistic Association In order for DARCI to make associations between images and associated adjectives, the system learns a mapping from low-level computer vision features [18, 13, 25, 43, 23, 42] to words, using images that are hand-labeled with adjective tags. The use of WordNet's [17] database of adjective synsets allows images to be described by their affect, most of their aesthetic qualities, many of their possible associations, and even, to some extent, by their subject.

To collect training data we have created a public website for training DARCI (http://darci.cs.byu.edu), where users are presented with a random image and asked to provide adjectives that describe the image. When users input a word with multiple senses, they are presented with a list of the available senses, along with the WordNet gloss, and asked to select the most appropriate one. Additionally, for each image presented to the user, DARCI lists seven adjectives that it associates with the image. The user is then allowed to flag those labels that are not accurate.

Learning image-to-synset associations is a multi-label classification problem [39], meaning each image can be associated with more than one synset. To handle this, we use a collection of artificial neural networks (ANNs) that we call appreciation networks, each of which outputs a single real value, between 0 and 1, indicating the degree to which a given image can be described by the network's corresponding synset (adjective). An appreciation network is created for each synset that has a sufficient amount of training data, and as data is incrementally accumulated, new neural networks are dynamically added to the collection to accommodate any new synsets. There are currently close to 300 appreciation networks in the system.
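The chapter does not give implementation details for the appreciation networks, so the following is only a minimal sketch of the idea under stated assumptions: a fixed-length image feature vector, one small scikit-learn regressor per adjective synset, and an assumed minimum number of labeled examples before a network is trained. The class name, the MIN_EXAMPLES threshold, and the choice of MLPRegressor are illustrative and are not DARCI's actual code.

    # One small neural regressor per adjective synset, each mapping an image
    # feature vector to a score in [0, 1]; networks are added dynamically as
    # labeled data accumulates. Illustrative sketch only.
    from collections import defaultdict
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    MIN_EXAMPLES = 20  # assumed data threshold; the chapter gives no value

    class AppreciationNetworks:
        def __init__(self):
            self.data = defaultdict(list)   # synset -> [(features, label)]
            self.networks = {}              # synset -> trained regressor

        def add_example(self, synset, features, label):
            # Record one hand-labeled image and (re)train that synset's
            # network once enough data has accumulated.
            self.data[synset].append((np.asarray(features), float(label)))
            if len(self.data[synset]) >= MIN_EXAMPLES:
                X = np.stack([f for f, _ in self.data[synset]])
                y = np.array([l for _, l in self.data[synset]])
                net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=1000)
                self.networks[synset] = net.fit(X, y)

        def annotate(self, features):
            # Multi-label annotation: a [0, 1] score for every known synset.
            x = np.asarray(features).reshape(1, -1)
            return {s: float(np.clip(net.predict(x)[0], 0.0, 1.0))
                    for s, net in self.networks.items()}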
Semantic Memory Model The system also contains a simple cognitive model, built as a semantic network forming a graph of associations between words [37, 14]. These word associations are acquired in one of two ways: from people and by automatic inference from a corpus, with the idea being to use the human word associations to capture general knowledge and then to fill in the gaps using the corpus associations.

For the human word associations, we use two pre-existing databases of free association norms (FANs): the Edinburgh Associative Thesaurus [24] and the University of Florida's Word Association Norms [30]. These word associations were acquired by asking hundreds of human volunteers to provide the first word that comes to mind when given a cue word. This technique is able to capture many different types of word associations, including word co-ordination (pepper, salt), collocation (trash, can), super-ordination (insect, butterfly), synonymy (starving, hungry), and antonymy (good, bad). The association strength between two words is simply a count of the number of volunteers who said the second word when given the first word. FANs are considered to be one of the best methods for understanding how people, in general, associate words in their own minds [30].

For the corpus-based associations, we build a (term × term) co-occurrence matrix from a large corpus, in a manner similar to that employed in the Hyperspace Analog to Language (HAL) model [27]. For our corpus, we use the entire (English) text of Wikipedia, as it is large, easily accessible, and covers a wide range of human knowledge [15]. Once the co-occurrence matrix is built, we use the co-occurrence values themselves as association strengths between words. This approach works because we only care about the strongest associations between words, and it allows us to reduce the number of irrelevant associations by ignoring any word pairs with a co-occurrence count below some threshold.

Our final semantic network is a composition of the human- and corpus-based associations, which essentially merges the two separate graphs into a single network before querying it for associations. This method assumes that the human data contains more valuable word associations than the corpus data, because such human data is typically used as the gold standard in the literature. However, the corpus data does contain some valuable associations not present in the human data. To combine the graphs, we add the top n associations for each word from the corpus data to the human data, but weight the corpus-based association strengths lower than the human-based associations. This is beneficial for two reasons. First, if there are any associations that overlap, adding them again will strengthen the association in the combined network. Second, corpus-based associations not present in the human data will be added to the combined network and provide a greater variety of word associations. We keep the association strength low because we want the corpus data to reinforce, but not dominate, the human data.
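As a concrete illustration of this merging step, here is a minimal sketch under stated assumptions: both sources are represented as nested dictionaries of association strengths, and the function name and the values of n, the co-occurrence threshold, and the down-weighting factor are placeholders, since the chapter does not specify them.

    # Merge human free-association norms with thresholded, down-weighted
    # corpus co-occurrences: the human graph is the backbone, and only the
    # top-n corpus associations per word are added at reduced strength.
    from collections import defaultdict

    def merge_associations(fan, cooccur, n=5, threshold=50, corpus_weight=0.1):
        # fan, cooccur: dict[word] -> dict[word] -> strength (count)
        merged = defaultdict(dict)
        for cue, targets in fan.items():
            merged[cue].update(targets)          # human data kept as-is
        for word, counts in cooccur.items():
            strong = [(w, c) for w, c in counts.items() if c >= threshold]
            strong.sort(key=lambda wc: wc[1], reverse=True)
            for other, count in strong[:n]:      # only the top-n corpus links
                weight = corpus_weight * count
                # overlapping links are reinforced; new links enter weakly
                merged[word][other] = merged[word].get(other, 0.0) + weight
        return dict(merged)

Adding a down-weighted corpus link on top of an existing human link strengthens it, while links unique to the corpus enter the network with low weight, matching the two benefits described above.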
2.1.2 Image Generation

DARCI generates images in two stages: the creation of a source image composed of a collage of concept icons, and the rendering of this source image using various parameterized image filters. The collage generation is driven by the semantic network, while the filtered rendering is achieved using an evolutionary mechanism whose fitness function is defined in terms of the outputs of the visuo-linguistic association networks.
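To make the shape of this evolutionary rendering loop concrete, the sketch below evolves a vector of filter parameters whose fitness is the appreciation-network score of the rendered image for the target adjective synsets. The function names, the render and extract_features callables, the genome encoding, and the crossover and mutation operators are all placeholders; DARCI's actual filters, image features, and evaluation metrics are not reproduced here.

    # Evolve filter parameters so that the rendered image scores highly with
    # the appreciation network(s) for the target adjective(s). Sketch only.
    import random

    def evolve_rendering(source_image, target_synsets, networks, render,
                         extract_features, pop_size=30, generations=50,
                         genome_len=8):
        def fitness(genome):
            image = render(source_image, genome)            # apply filters
            scores = networks.annotate(extract_features(image))
            return sum(scores.get(s, 0.0) for s in target_synsets)

        population = [[random.random() for _ in range(genome_len)]
                      for _ in range(pop_size)]
        for _ in range(generations):
            population.sort(key=fitness, reverse=True)
            parents = population[: pop_size // 2]           # truncation selection
            children = []
            while len(parents) + len(children) < pop_size:
                a, b = random.sample(parents, 2)
                cut = random.randrange(1, genome_len)       # one-point crossover
                child = a[:cut] + b[cut:]
                child[random.randrange(genome_len)] = random.random()  # mutation
                children.append(child)
            population = parents + children
        return max(population, key=fitness)

In the actual system the role of extract_features would be played by the low-level computer vision features mentioned in Sect. 2.1.1, and the evaluation combines a set of metrics rather than the single summed score shown here.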