
DISTRIBUTED SEMANTICS, JUDGMENT, AND DECISION MAKING 1

Knowledge, , and everyday judgment:

An introduction to the distributed semantics approach

Russell Richie* & Sudeep Bhatia

University of Pennsylvania, USA

September 23, 2019

Author Note

* Correspondence regarding this article should be addressed to Russell Richie, 3720 Walnut St.,

Philadelphia, PA 19104, [email protected]

Introduction

Every day, we make thousands of judgments and decisions (Gilovich et al., 2002; Oppenheimer

& Kelso, 2015; Weber & Johnson, 2009). We may judge how tasty and nutritious a fruit is, which may influence whether we buy or eat it. We might consider two items, say ‘nurse’ and ‘journalist’, and judge how similar they are, potentially generalizing from one to the other on the basis of their similarity – or we may consider ‘brilliant’ and ‘smart’, and decide whether one is sufficiently synonymous with the other to replace it in a particular sentence. We may ponder questions like “Will there be an earthquake in California?” or “Is Donald Trump honest?” or “Should I spend money on a vacation or save up for a new television?”, consider various semantic aspects of these questions, and gradually decide our answers to them. We may perceive that the relationship between Tony and Maria in West Side Story is similar to that between Romeo and Juliet, and generalize what we know of the latter relationship to the former. To complicate it all, different people from different cultures and/or times may respond very differently to all these judgment problems.

In each of these examples, the judgments and decisions must be made on the basis of what the decision maker knows about the vast number of things in the world. For example, the taste judgment about a fruit likely depends on, e.g., what the decision maker thinks the sugar content of the fruit is.

The judgment of the similarity of ‘nurse’ and ‘journalist’ depends on the extent to which what is known of ‘nurse’ is identical to what is known of ‘journalist’. Thus, for psychologists to understand how people make the above kinds of judgments and decisions, we must understand both:

(a) What (different) people know about potential judgment targets, and

(b) How that knowledge is used to make a judgment (again, in potentially different ways for

different individuals)

In some studies of human judgment and decision making, and other areas of high-level cognition, artificial stimuli may be constructed such that experimenters can assume (to a first approximation) what participants know about the judgment stimuli. But of course, theories of judgment and decision making ought to apply not just in the lab and to artificial stimuli, but in the natural world, to the decision problems people face every day in the normal course of their lives. Thus psychologists need to know what people know and believe about natural entities in the real world. Indeed, it is only by modeling what people know and believe about entities in the world that psychological theories of judgment can be applied to areas of practical relevance, such as health policy, consumer behavior, or political psychology.

Techniques for uncovering what people know about such naturally occurring judgment targets have a long history, constituting a large part of the field of psychometrics. For example, participants may provide data on the similarity between items (words, images, etc.), usually through direct ratings of similarity on Likert scales. A matrix of similarities between all pairs of items can then be submitted to techniques like multidimensional scaling or additive clustering (Shepard, 1974; Shepard & Arabie,

1979), which obtain for each item a low dimensional spatial or featural representation, respectively

(representations which may have intuitive interpretations, e.g., scaling of emotion words may have a dimension corresponding to valence (Russell, 1980), or additive clustering of numbers may uncover features for evenness, primeness, etc.; Navarro & Lee, 2003). Or, participants may be simply asked to list the features they think are important for each item, producing so-called ‘feature norms’ (McRae,

Cree, Seidenberg, & McNorgan, 2005; Devereux, Tyler, Geertzen, & Randall, 2014; Buchanan, Valentine,

& Maxwell, 2019). For example, responding to ‘cat’, a participant may say ‘furry’, ‘chases mice’, and

‘aloof’. Unfortunately, despite the long success of these and related classical techniques for obtaining knowledge representations, they face a crucial shortcoming: they simply cannot scale to the number – effectively infinite – of natural entities in the real world that are potential judgment targets. And even for the relative handful of items for which representations could be obtained through classical techniques, the representations are typically relatively impoverished, possessing only a few dimensions or features

(whereas, as we will see, newer techniques can uncover much richer representations).

Fortunately, the last few years have seen the development of techniques that can deliver cheap, rich, and accurate representations for millions – and sometimes even an unbounded number – of entities that may be involved in everyday judgment. These techniques, and their underlying theoretical assumptions, are referred to by the term distributed semantics (DS), as they propose that semantic representations are reflected in, and can be recovered from, the statistical distribution of words in language. While these ideas have a long theoretical and applied history in psychology (Harris, 1954; Firth, 1957; Landauer and Dumais, 1997), three key advances have increased interest in distributed semantics: the availability of large-scale natural language corpora, increased computational power, and new algorithms for efficiently deriving distributed semantic representations. These advances now enable especially rich and comprehensive distributed semantic representations – commonly known as word vectors or word embeddings – for millions of words and common phrases (without the need for any explicit participant ratings data), and even, very recently, for longer, novel phrases, sentences, and documents (Turney &

Pantel, 2010; Mikolov, Sutskever, Chen, Corrado, & Dean, 2013; Devlin, Chang, Lee, & Toutanova,

2018). Distributed semantic representations thus provide a solution to (a) above, quantifying what people know about potential judgment targets.

Additionally, as most distributed semantic methods yield (high-dimensional) spatial or featural representations for judgment targets, these methods can be used in much the same way as the outputs of standard psychometric methods. Thus existing psychological solutions to (b) above (how people use knowledge to make a decision – see Oppenheimer & Kelso, 2015) can be combined with representations obtained through distributed semantics, opening up new avenues for studying naturalistic human judgment, and making it feasible to build computational models that represent knowledge, make evaluations and attributions, and give responses, in a human-like manner.

This chapter will provide a summary of distributed semantic models – primarily models of word representations, but briefly also novel phrase and sentence representations – and their applications to human judgment and psychological science, and is organized as follows. First, we describe distributed semantic models at a high level (again, focusing on DS models of words and frozen phrases).

Other recent reviews provide a more comprehensive overview of the statistical and computational underpinnings of DS models, the differences between different algorithms for building DS models, the technical steps necessary to apply these algorithms on natural language data, and applications of DS models to natural language processing and related tasks (e.g. Lenci, 2018; Turney &

Pantel, 2010). Second, we review applications of DS to judgment and decision making, proceeding from simpler to more complex applications. In each case, we will make explicit (b), the precise way in which DS representations are used, mathematically. A key point of this section and the chapter more broadly is that DS representations are not themselves a model of psychological process; mistaking them for one has, on occasion, invited certain criticisms against DS representations (and spatial models more generally). Finally, we suggest future work with distributed semantics. Here we will discuss (among other things) very recent models in natural language processing that have transformed that field, models like the Universal Sentence Encoder (Cer et al., 2018), BERT (Devlin et al., 2018), and ELMo

(Peters et al., 2018), which can deliver representations for phrases and sentences. These models may likewise transform the psychology of judgment and decision making.

Distributed Semantics

The main idea behind distributed semantics is that patterns of co-occurrence among words reveal word meaning. We illustrate with a toy ‘co-occurrence matrix’ (Table 1). Consider the words

‘spinach’, ‘banana’, ‘cake’, ‘apple’, and ‘computer’, and imagine that they occur in a large corpus

(collection of documents), and each co-occurs with the words ‘vitamin’, ‘sugar’, ‘fat’, ‘bake’, and

‘information’ with some particular frequency. We refer to the first set of words as target words and the second set of words as context words1. Each cell of Table 1 indicates how many times a target word occurred near (within a few words, or in the same document) a context word. For example, ‘spinach’ and ‘vitamin’ co-occur near each other 5 times, while ‘spinach’ and ‘sugar’ never co-occur. Examining these patterns of co-occurrence between target words and context words reveals that ‘banana’ and

‘apple’ are rather similar in meaning (right now, this similarity is only impressionistic, but in the following sections we will discuss how to quantify it), but both are also somewhat similar to ‘spinach’ and ‘cake’, in different ways (fruits being high in vitamins like ‘spinach’, but also having some sugar content like ‘cake’). Further, the four food-related words have a completely different profile of co- occurrence than does ‘computer’, reflecting the fact that ‘computer’ has little to do, semantically, with food.
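The counting procedure just described can be sketched in a few lines of Python. The toy corpus and word sets below are invented purely for illustration; real applications count over corpora of millions or billions of tokens:

```python
from collections import Counter, defaultdict

def cooccurrence_counts(corpus, targets, contexts, window=4):
    """Count how often each target word occurs within `window`
    words of each context word, across all sentences."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for i, word in enumerate(sentence):
            if word not in targets:
                continue
            lo = max(0, i - window)
            hi = min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i and sentence[j] in contexts:
                    counts[word][sentence[j]] += 1
    return counts

# A tiny invented corpus, purely for illustration.
corpus = [
    "the banana has sugar and vitamin c".split(),
    "bake the cake with sugar and fat".split(),
    "spinach is rich in vitamin k".split(),
]
counts = cooccurrence_counts(
    corpus,
    targets={"spinach", "banana", "cake"},
    contexts={"vitamin", "sugar", "fat", "bake"},
)
# e.g. counts["cake"]["sugar"] == 1, counts["spinach"]["vitamin"] == 1
```

Accumulating such counts over a large corpus yields a matrix like Table 1.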

Although raw co-occurrence matrices, like Table 1, can provide useful semantic representations, even better representations can be obtained from such matrices reduced in dimensionality via procedures like singular value decomposition. Dimensionality reduction, usually to between 50 and 300 dimensions, is performed both to make the resulting representations more compact (for practical concerns with computer memory and processing capacity), but also because thousands or millions of column dimensions (as there would be if every word type in a corpus had its own column) are simply not necessary to represent word meanings accurately for many applications. In some cases, the dimension reduction actually improves the quality of the vector representations, as it allows for the representations to encode higher-order statistical relationships between words. For example, ‘cake’ and

‘cookie’ may never directly co-occur, but dimensionality reduction on raw co-occurrence matrices would reveal a semantic relationship between these words as long as they co-occur systematically with a common set of words (such as ‘sugar’ and ‘bake’). A downside of this, unfortunately, is that the

resulting dimensions are no longer so easily interpreted in terms of particular co-occurrence frequencies. In any case, the vectors of (dimension-reduced) co-occurrence information derived from some corpus constitute a distributed semantic representation of a word, or the knowledge a person has about a word and may use when making judgments involving the word.2

1 Note that in many practical applications, context in such matrices consists not of individual words but rather the set of words that make up the sentence or document in which the target word occurs. We are simplifying here for expository convenience.

            vitamin  sugar  fat  bake  information  ...
spinach        5       0     0    1        0        ...
banana         3       2     0    2        0        ...
cake           0       4     3    4        0        ...
apple          3       1     0    3        0        ...
computer       0       0     0    0        5        ...

Table 1. A toy example of co-occurrence frequencies between target words (rows) and context words (columns). The patterns of co-occurrence with the context words reveal that ‘banana’ and ‘apple’ are highly similar in meaning, but both are also similar to ‘spinach’ and ‘cake’. The four food-related words have a completely different profile of co-occurrence than does ‘computer’, reflecting the fact that ‘computer’ has little to do, semantically, with food. The additional column with ellipses reflects the fact that typical raw co-occurrence matrices will have thousands (or more) contexts and hence columns.
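A minimal sketch of the dimensionality reduction step, applying a truncated singular value decomposition to the Table 1 counts with numpy (real models factorize far larger, reweighted matrices and keep 50–300 dimensions rather than 2):

```python
import numpy as np

# Rows: spinach, banana, cake, apple, computer (Table 1).
X = np.array([
    [5, 0, 0, 1, 0],
    [3, 2, 0, 2, 0],
    [0, 4, 3, 4, 0],
    [3, 1, 0, 3, 0],
    [0, 0, 0, 0, 5],
], dtype=float)

# Truncated SVD: keep only the k largest singular values/vectors.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
word_vectors = U[:, :k] * s[:k]   # one k-dimensional vector per target word

# The rank-k product approximately reconstructs the original counts.
X_hat = word_vectors @ Vt[:k, :]
```

Each row of `word_vectors` is a compact distributed representation of the corresponding target word.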

These basic ideas underlie the many different distributed models dating back decades (at least to

LSA; Landauer & Dumais, 1997). The details of these models, however, can differ somewhat (see e.g.

Bullinaria & Levy, 2007 for an early discussion). First, all DS implementations can vary in the size of the context window, i.e. a threshold level of distance in text that determines whether or not two words are seen as co-occurring with each other. Different methods also often implement different text preprocessing choices (e.g., whether to lemmatize words, converting, say, ‘cats’ to just ‘cat’), different numbers of dimensions to retain in the ultimate word representation, and other parameters. Beyond this, DS

2 A related class of models that learn lexical representations from text are known as topic models, the most prominent technique being latent Dirichlet allocation (Griffiths et al., 2007; see Blei, 2012 for review). Under this class of models, documents are distributions over topics, which are themselves distributions over words. We will not cover such models here, as they are no longer applied as much as other spatial DS models, likely owing to their difficulty with scaling to the massive corpora on which other co-occurrence models are trained. models generally fall into two classes. First are count models, in which a co-occurrence matrix like the one above is built and factorized explicitly (e.g., LSA, Landauer & Dumais, 1997; BEAGLE, Jones &

Mewhort, 2007; GloVe, Pennington, Socher, & Manning, 2014). Within this class, the cells of the co-occurrence matrix are often re-weighted, via term frequency-inverse document frequency, positive pointwise mutual information, or log-entropy (Mandera et al., 2017), before undergoing matrix factorization/dimensionality reduction.
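A sketch of the positive pointwise mutual information (PPMI) reweighting step, applied to the Table 1 counts (a minimal implementation for illustration; practical systems compute this over sparse matrices with millions of cells):

```python
import numpy as np

def ppmi(counts):
    """Positive pointwise mutual information reweighting of a raw
    co-occurrence count matrix (rows: targets, columns: contexts)."""
    total = counts.sum()
    p_xy = counts / total
    p_x = counts.sum(axis=1, keepdims=True) / total
    p_y = counts.sum(axis=0, keepdims=True) / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log2(p_xy / (p_x * p_y))
    pmi[~np.isfinite(pmi)] = 0.0   # zero co-occurrence counts -> 0
    return np.maximum(pmi, 0.0)    # clip negatives: "positive" PMI

X = np.array([[5, 0, 0, 1, 0],
              [3, 2, 0, 2, 0],
              [0, 4, 3, 4, 0],
              [3, 1, 0, 3, 0],
              [0, 0, 0, 0, 5]], dtype=float)
W = ppmi(X)
```

The reweighted matrix `W` (rather than the raw counts) is then factorized as described above.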

The other class is known as ‘predict’ models (word2vec, Mikolov et al., 2013; fastText,

Bojanowski, Grave, Joulin, & Mikolov, 2017). Word2vec is the most well-known of these models, and actually comprises two models, CBOW and skip-gram. In both models, a multilayer neural network slides over contiguous windows of text, and either attempts to predict the word in the center of the window from the words in the periphery of the window (CBOW), or predict the peripheral words from the word in the center (skip-gram). Via error-driven learning to predict (peripheral or central) words in this way, the weights of the network gradually encode information about the semantic relationships between words. In the case of skip-gram, the input layer has a single node for each target word, and the trained weights from a target word’s node to the hidden layer of the network correspond to the rows of the co-occurrence matrix above. In fact, it can be shown that skip-gram is equivalent to factorizing a word-context matrix whose cells are the pointwise mutual information between target and context words, shifted by a constant (Levy & Goldberg, 2014). Even in the case of CBOW, the result of the learning process is a set of neural-network weights assigned to each word, which, as with count methods, represent the word as a vector in a high-dimensional space. While this demonstrates a degree of equivalence between count and predict models, it is arguably the predict models in particular that have increased interest in text-based DS models in the last few years. There may be many reasons for this, but chief among them are likely that predict models scale to large corpora more efficiently than previous count-based models (Mikolov et al., 2013), predict models potentially reflect more psychologically realistic learning mechanisms3 (see, e.g., Mandera, Keuleers, & Brysbaert, 2017 for a comparison between word2vec and Rescorla-Wagner associative learning, and Jones, Willits, &

Dennis, 2015 for discussion of how text-based distributed models relate to models of semantic memory), and predict models can be trained incrementally, whereas a count model must process a single corpus at once and then cannot incorporate information from additional documents in a straightforward way. Understanding the differences (or lack thereof) between DS models, and when one particular implementation and combination of parameters leads to more effective representations than others, is an ongoing effort in natural language processing and cognitive science (e.g., Lenci,

2018; Mandera et al., 2017). However, when discussing applications of DS models to judgment and decision making, we will not – with a few exceptions – concern ourselves terribly with the particular variant of text-based distributed semantic representations, either the algorithm or the corpus, that different researchers have used. These issues are clearly important, but for the sake of brevity, we will mostly abstract away from them.
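To make the predict-model idea concrete, here is a minimal sketch of skip-gram with negative sampling on an invented toy corpus. Everything below – the corpus, vocabulary, and hyperparameters – is purely illustrative; real word2vec implementations add frequency-based subsampling, a unigram-table negative-sampling distribution, and train on billions of tokens:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus as token ids over an invented 6-word vocabulary.
vocab = ["spinach", "banana", "cake", "sugar", "vitamin", "computer"]
corpus = [[0, 4, 1, 4, 3], [1, 3, 2, 3], [2, 3, 2, 3], [5, 5, 5, 5]]

V, dim, window, k_neg, lr = len(vocab), 8, 2, 2, 0.05
W_in = rng.normal(scale=0.1, size=(V, dim))    # target ("input") vectors
W_out = rng.normal(scale=0.1, size=(V, dim))   # context ("output") vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(100):                           # a few passes over the corpus
    for sent in corpus:
        for i, center in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j == i:
                    continue
                # One positive (true context) pair plus k_neg random negatives.
                pairs = [(sent[j], 1.0)] + [
                    (int(rng.integers(V)), 0.0) for _ in range(k_neg)
                ]
                for w, label in pairs:
                    grad = sigmoid(W_in[center] @ W_out[w]) - label
                    d_in = lr * grad * W_out[w]
                    d_out = lr * grad * W_in[center]
                    W_in[center] -= d_in
                    W_out[w] -= d_out
```

After training, each row of `W_in` serves as the word's distributed semantic representation.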

Before moving to our review of DS applications to judgment and decision making, we briefly mention some available software packages and pre-trained distributed semantic representations that make applications of DS models to judgment and decision making research (and cognitive science more generally) accessible to psychologists. For a quick, hands-on experience with distributed semantic representations, see online tools like lsa.colorado.edu for exploration of latent semantic analysis, or http://bionlp-www.utu.fi/wv_demo/ for exploration of word2vec. For using DS models in a programmatic pipeline, we recommend two Python packages. The package

‘gensim’ (Řehůřek & Sojka, 2010) allows for training, or using pre-trained, vectors for word2vec, fastText, doc2vec, LSA, latent Dirichlet allocation, and more, while the newer package ‘magnitude’

3 However, the use of vectors from predict models to model judgment and decision making does not require predict models to be psychologically plausible models of learning. In fact, the typical size of corpora predict models are trained on (billions of word tokens) is much larger than most adults have been exposed to.

(Patel, Sands, Callison-Burch, & Apidianaki, 2018), also for Python, provides an even faster, lightweight tool for using pre-trained representations derived via GloVe, word2vec, and fastText (and some techniques for getting representations beyond words, which we will briefly discuss towards the end of the chapter). See Pereira, Gershman, Ritter, and Botvinick (2016) for a comparison of various off-the-shelf distributed semantic representations for modeling psychological data.

Applications

Dot-product and cosine similarity

The simplest and perhaps most common way DS models are used in psychological applications is computing the dot product or cosine similarity between two vectors (Lenci, 2018), and using this similarity as a predictor of some measure of behavior in a task. The dot product between two vectors u and v, denoted by u · v, is defined as the sum of the elementwise products of the two vectors: Σi ui*vi. See Table 2 for all pairwise dot products between the target words of Table 1.

          spinach  banana  cake  apple  computer
spinach      26      17      4     18       0
banana       17      17     16     17       0
cake          4      16     41     16       0
apple        18      17     16     19       0
computer      0       0      0      0      25

Table 2. Pairwise dot products between all target word vectors of Table 1.
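The dot products in Table 2 can be reproduced directly from the Table 1 count vectors:

```python
import numpy as np

# Target-word count vectors from Table 1.
vectors = {
    "spinach":  np.array([5, 0, 0, 1, 0]),
    "banana":   np.array([3, 2, 0, 2, 0]),
    "cake":     np.array([0, 4, 3, 4, 0]),
    "apple":    np.array([3, 1, 0, 3, 0]),
    "computer": np.array([0, 0, 0, 0, 5]),
}

def dot(u, v):
    # Sum of elementwise products, as in the definition above.
    return int((u * v).sum())

print(dot(vectors["banana"], vectors["apple"]))  # 17
print(dot(vectors["banana"], vectors["cake"]))   # 16
```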

For example, the dot product between the vectors for ‘banana’ and ‘apple’ in Table 1 is 3*3 + 2*1 + 0*0 + 2*3 + 0*0 = 17, whereas the dot product between ‘banana’ and ‘cake’ is 3*0 + 2*4 + 0*3 + 2*4 + 0*0 = 16. Thus, the difference in these two dot products roughly corresponds to the greater intuitive similarity between ‘banana’ and ‘apple’, compared to ‘banana’ and ‘cake’. However, the dot product between ‘spinach’ and ‘banana’ is 17, which is identical to the dot product between ‘banana’ and ‘apple’, likely contra intuitions about the relative similarity of ‘spinach’ and ‘banana’ compared to

‘apple’ and ‘banana’. This is simply because the vector for ‘spinach’ is longer than the vectors for

‘apple’ and ‘banana’ – longer not in its number of entries, but in terms of the Euclidean length of its vector, defined by:

||u|| = √(Σi ui²)

In particular, the Euclidean length (alternately, the L2-norm) of ‘spinach’ is 5.10, whereas the lengths of ‘banana’ and ‘apple’ are 4.12 and 4.36, respectively. One reason for such length differences between target words (although not exactly the culprit here) is that oftentimes the raw frequency of target words will vary, and, all else equal, more frequent words will have longer vectors, which will drive up the dot product with all other vectors. For these reasons, in general, the dot product can be a misleading measure of similarity between two vectors of co-occurrence frequencies. However, if we normalize the dot product by the product of each vector’s length, (u · v) / (||u||*||v||), we obtain more sensible similarity relations, as displayed in Table 3.

          spinach  banana  cake  apple  computer
spinach      1      0.81   0.12   0.81      0
banana      0.81     1     0.61   0.95      0
cake        0.12    0.61    1     0.57      0
apple       0.81    0.95   0.57    1        0
computer     0       0      0      0        1

Table 3. Pairwise cosine similarities between all target word vectors of Table 1.

This ‘length-normalized dot product’ is known as the cosine similarity, which, under a geometric interpretation, measures the cosine of the angle between two vectors. Cosine similarity ranges from 1, indicating perfect similarity, to 0, when two vectors are orthogonal, to -1, when they point in opposite directions (which can happen if vectors have negative values). As Table 3 illustrates, the similarity relations between our target words are now much more intuitive, ranked as follows, from most similar to least similar:

1. ‘banana’ vs ‘apple’: .95

2. ‘spinach’ vs ‘banana’, and ‘spinach’ vs ‘apple’: .81

3. ‘cake’ vs ‘banana’: .61

4. ‘cake’ vs ‘apple’: .57

5. ‘spinach’ vs ‘cake’: .12

6. ‘computer’ vs each of the four food words: 0
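The cosine similarities in Table 3, and the ranking above, can be reproduced from the Table 1 count vectors:

```python
import numpy as np

vectors = {
    "spinach":  np.array([5, 0, 0, 1, 0], dtype=float),
    "banana":   np.array([3, 2, 0, 2, 0], dtype=float),
    "cake":     np.array([0, 4, 3, 4, 0], dtype=float),
    "apple":    np.array([3, 1, 0, 3, 0], dtype=float),
    "computer": np.array([0, 0, 0, 0, 5], dtype=float),
}

def cosine(u, v):
    # Dot product normalized by the product of the vectors' Euclidean lengths.
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(round(cosine(vectors["banana"], vectors["apple"]), 2))   # 0.95
print(round(cosine(vectors["spinach"], vectors["cake"]), 2))   # 0.12
```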

Because it appropriately controls for differences in target word frequency and/or vector lengths, cosine similarity has been much more popular than simple dot product in applications of text-based distributed semantic representation4. Cosine similarity between text-based DS representations predicts

(1) correct choice of a synonym to a probe word in a multiple choice setting (Landauer & Dumais,

1997), (2) Likert scale judgments of a pair of words’ semantic relatedness and similarity (Hill, Reichart,

4 Although dot product and cosine similarity are the most common ways of measuring similarity between text-based distributed semantic representations, other measures are feasible. For example, Euclidean distance can be used to measure dissimilarity, and has been used in classic spatial models of cognition, e.g., Nosofsky’s Generalized Context Model of categorization. However, it is believed that Euclidean distance is inappropriate in high-dimensional spaces (which text-based DS models usually are), owing to the ‘curse of dimensionality’, where every item can be very far from every other item (Aggarwal, Hinneburg, & Keim, 2001). However, Nosofsky, Sanders, and McDaniel (2018a) and Nosofsky, Sanders, Meagher, & Douglas (2018b) successfully model similarity and categorization, respectively, with Euclidean distance in an 8-dimensional representational space derived via multidimensional scaling, and Peterson, Abbott, & Griffiths (2018) successfully modeled visual category learning based on k-means clustering, which uses Euclidean distance, of very high dimensional image vectors derived from deep neural networks. These three works suggest that Euclidean distance might be tested with text-based DS representations of at least moderate dimensionality. See Nosofsky et al. (2018a) for more discussion about the ‘curse of dimensionality’ with respect to high-dimensional spatial models of concepts.

& Korhonen, 2015), and (3) strength of semantic priming in, for example, lexical decision tasks5

(Jones, Kintsch, & Mewhort, 2006; Mandera et al., 2017). Many associative behaviors are also well explained with cosine similarity between word vectors. Association-based judgments in probability judgment, event forecasting, and factual judgment are well predicted with cosine similarities (Bhatia, 2017b;

Bhatia & Walasek, 2019), and interestingly, associations derived from DS models predict both when participants are likely to give correct judgments and when they are likely to make mistakes. Strength of social biases as measured in the Implicit Association Test, for a range of social variables, including age, gender, and race, can be predicted with simple measures based on cosine similarity between vectors (Bhatia, 2017a; Caliskan, Bryson, & Narayanan, 2017)6. Simple cosine similarity also has some success in predicting behavior in free word association tasks (e.g., Pereira et al., 2016) – but see the next section for challenges with this approach. Continuous ratings of semantic properties like the size of different animals, or the danger of different sports, can also be predicted by computing the dot product between a vector for a judgment target (say, ‘tiger’), and a vector representing the judgment dimension

(approximately, the vector for ‘large’ minus the vector for ‘small’; Grand et al., 2018).
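A sketch of this projection approach. The 4-dimensional embeddings below are invented purely for illustration (real applications project onto differences of, e.g., 300-dimensional word2vec or GloVe vectors):

```python
import numpy as np

# Hypothetical 4-dimensional embeddings, invented for illustration only.
emb = {
    "large": np.array([ 1.0, 0.2, 0.0, 0.1]),
    "small": np.array([-1.0, 0.1, 0.0, 0.2]),
    "tiger": np.array([ 0.8, 0.5, 0.3, 0.0]),
    "mouse": np.array([-0.7, 0.4, 0.2, 0.1]),
}

# The judgment dimension is the difference between the two pole vectors.
size_dim = emb["large"] - emb["small"]
size_dim = size_dim / np.linalg.norm(size_dim)

def project(word):
    """Position of a target word along the size dimension."""
    return float(emb[word] @ size_dim)
```

Under these toy vectors, `project("tiger")` exceeds `project("mouse")`, mirroring the intuition that tigers are judged larger than mice.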

Of course, as briefly alluded in the introduction, people in different places, times, and cultures may make different judgments in the same contexts, possibly because they have different (associative) knowledge of the judgment targets. We can model these knowledge differences and consequent judgments by training different DS models on corpora that are representative of different populations.

For example, to model liberals’ knowledge and associations, we might train a model on the New York

Times and MSNBC, and to model conservatives’, we might train on The New York Post and Fox

5 In the lexical decision task, participants are presented with a prime word and then a target word in rapid succession, and must indicate as quickly and accurately as possible whether the target is a real word. Participants are faster to respond to the target when the prime is semantically related, as measured by cosine similarity between word vectors. 6 An active area of research involves removing such undesirable social biases from distributed semantic models, since many of these same models are used in NLP and AI applications (Bolukbasi, Chang, Zou, Saligrama, & Kalai, 2016). To the extent that people do have such biases, the presence of such biases in text-based distributed representations actually makes these models more psychologically realistic. In other words, the DS models that psychologists use to describe human behavior perhaps ought not to be debiased.

News. Several recent studies have taken this approach to study differences in social, political, and moral associations pertaining to media bias and social media structure (Bhatia, Goodwin, & Walasek,

2018; Holtzman, Schott, Jones, Balota, & Yarkoni, 2011; Hopkins, 2018). Similarly, it is possible to train a DS model on different time slices of a corpus that extends over long periods of time, and examine dot-product or cosine-driven associations of words within a time period, to study changing gender, class, and ethnic associations over time (Garg, Schiebinger, Jurafsky, & Zou, 2018; Hamilton,

Leskovec, & Jurafsky, 2016).

Process models based on distributed semantic representations

“A cosine is not what people do in a task”

-Jones, Gruenenfelder, & Recchia (2018, pg. 2)

Despite the important successes of using cosine similarity (or just dot product) between vectors to predict human behavior in judgment and decision making settings and related psychological applications, as Jones et al. (2018) put it, “a cosine is not what people do in a task” (pg. 2). That is, cosine similarity – by itself – is not a model of the process by which people use distributed representations to perform judgment, decision making, and other behaviors. To the extent that researchers often simply correlate cosine similarity with some behavioral measure (e.g., Griffiths et al., 2007; for discussion, see Jones, Willits, & Dennis, 2015 and Jones, Hills, & Todd, 2016), certain criticisms have been leveled against DS models that are really criticisms about cosine similarity, dot product, or related similarity measures, and how these are applied to DS representations. For example, cosine similarity and dot product are symmetric measures. Yet, there is evidence of asymmetries in some of the experimental tasks discussed above. In free association tasks, for example, the probability of generating ‘baby’ as a response to ‘stork’ is much greater than the reverse (Griffiths et al., 2007; Jones et al., 2018).

Similar results appear in classic studies on similarity: Tversky (1977) found that North Korea was judged to be more similar to China than was China to North Korea (but see ManyLabs 2 for a failure to replicate these effects with a high-powered sample, Klein et al., 2018). Second, cosine similarity and dot product weight all dimensions equally. This may be problematic for, e.g., judgments of similarity, where some dimensions may matter more in some contexts or when judging the similarity of items in one domain vs. another (Medin & Schaffer, 1978; Medin & Smith, 1981; Nosofsky, 1984, 1986). A third, closely related point is that computing cosine similarity between a judgment target and an experimenter-selected word (or set of words) for a judgment dimension (e.g., tastiness), as Grand et al.,

(2018) do, may actually underestimate the utility of the information in DS representations for modeling semantic judgments. Instead of stipulating the vector to represent a judgment dimension, and then projecting a target item vector on that dimension, we may obtain a more accurate model of judgment by using human ratings on a judgment dimension to supervise learning of the vector representing the judgment dimension onto which a judgment target is projected. With these issues in mind, we review in this section some applications of DS models that build DS representations into more accurate mathematical, statistical, or computational models of judgment and decision making. We note that this issue with cosine and other symmetric, unweighted distance measures harkens back to old debates about spatial models of concepts (Tversky, 1977; Tversky & Gati, 1982). In response to these concerns, proponents of spatial models (Krumhansl, 1978; Holman, 1979; Nosofsky, 1991) proposed particular processes operating on spatial representations, to handle, for example, asymmetries in similarity judgments (and other seeming violations of ‘metric axioms’, like the triangle inequality).

In recent work that, if not explicitly inspired by the above classic work, is certainly its spiritual successor, Jones et al. (2018) demonstrated how text-based DS models could model asymmetries in free association data. Rather than simply using cosine as a predictor of the probability with which one word cued another, they proposed combining cosine similarity with a parameter-free version of the Luce Choice rule (Luce, 1959), such that (writing S(·, ·) for the cosine similarity between two words' vectors):

P(target | cue) = S(cue, target) / Σ_{w : S(cue, w) > τ} S(cue, w)

where τ is a minimum similarity threshold parameter, such that the denominator only considers other words sufficiently similar to the cue. Intuitively, this means that the probability of responding to a cue with a particular target word takes into account not just the associative strength between the cue and the target, but also the associative strength of competitor targets. Because the neighborhood of the target word will often not be the same as that of the cue word, asymmetries in associative strength are expected (as are violations of the triangle inequality), and in fact, this approach can predict asymmetries in free association norms about as accurately as competitor models like latent Dirichlet allocation (Griffiths et al., 2007). A similar application of the Luce choice rule to text-based DS representations also seems to describe well the semantic-clustering behavior observed in verbal fluency tasks, where, for example, participants must list as many examples of animals as they can in one or two minutes (Hills, Jones, & Todd, 2012; Johns et al., 2018). The success of this approach suggests that new applications of DS models could look to older work on spatial models of representation, and perhaps to simple, classic process models like the Luce Choice rule, to accommodate asymmetries and other putative challenges to text-based DS representations and spatial models more generally.
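A minimal sketch of this thresholded Luce rule, using toy two-dimensional vectors (hypothetical values chosen only so that the two words' neighborhoods differ):

```python
import numpy as np

# Toy vectors; values are hypothetical and chosen only for illustration.
vectors = {
    "china":       np.array([1.0, 0.0]),
    "north_korea": np.array([0.8, 0.6]),
    "japan":       np.array([0.95, 0.3]),
    "usa":         np.array([0.9, -0.4]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def luce_choice(cue, target, vectors, tau=0.0):
    """P(target | cue): cue-target cosine similarity, normalized over all
    non-cue words whose similarity to the cue exceeds the threshold tau."""
    sims = {w: cosine(vectors[cue], vectors[w]) for w in vectors if w != cue}
    denom = sum(s for s in sims.values() if s > tau)
    return sims[target] / denom
```

Although cosine itself is symmetric, `luce_choice("north_korea", "china", vectors)` and `luce_choice("china", "north_korea", vectors)` differ here, because the denominators reflect each cue's own neighborhood.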

Another group of recent applications of DS representations to judgment that goes beyond correlating cosine similarity with behavioral measures involves training supervised models to predict a judgment from either a word vector itself, or from a vector of features derived from a pair of word vectors. For example, if we treat Table 1 (or rather a dimension-reduced version of it) as a design matrix X, and collect continuous taste judgments for the target words in Table 1 into a vector y, we can fit a supervised model f(X) = y. This is the exact approach many researchers have taken to model judgments, from the taste and nutrition of foods to the masculinity and femininity of personality traits (Richie, Zou, & Bhatia, 2019), the concreteness, valence, arousal and dominance of a broad selection of words (Hollis, Westbury, & Lefsrud, 2017)7, risk perceptions of potential risk sources (Bhatia, 2019), maternal mortality rate predictions for countries and calorie predictions for foods (Zou & Bhatia, 2019), judgments of primitive semantic features of objects (Utsumi, 2020; Li & Summers-Stay, 2019), and judgments of the desirability of foods and movies (Bhatia & Stewart, 2018). In many cases, purely linear models make remarkably accurate predictions, often explaining over half the variance in judgments (Hollis et al., 2017; Richie et al., 2019; Bhatia, 2019; Zou & Bhatia, 2019; Bhatia & Stewart, 2018; Utsumi, 2020), and often beat out nonlinear models like k-nearest neighbors and support vector machines (Richie et al., 2019; Bhatia, 2019), suggesting that rather simple, purely linear processes operating on distributed representations may be a large, if not the largest, component of a theory of judgment and decision making in these settings. At the same time, Utsumi (2020) does show that a multilayer perceptron with a single hidden layer generally makes better predictions of objects' attributes from DS representations, beating a purely linear model by a few points in predicted-vs-actual correlations under leave-one-out cross-validation, suggesting that at least for some judgments, non-linear processes may also play some role.
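A minimal sketch of this supervised mapping, using synthetic stand-ins for word vectors and ratings (real studies fit regularized regressions to actual embeddings and human judgments, typically with cross-validation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for real data: 50 "word" vectors in 10 dimensions (X),
# with ratings (y) generated by a linear rule, unknown to the model, plus noise.
X = rng.normal(size=(50, 10))               # design matrix of word vectors
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=50)  # continuous judgment ratings

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression: w = (X'X + lam*I)^(-1) X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w_hat = fit_ridge(X, y)
r = np.corrcoef(X @ w_hat, y)[0, 1]  # in-sample fit; real studies cross-validate
```

Because the synthetic ratings really are a linear function of the vectors, the fitted model here correlates almost perfectly with the ratings; with human data, linear fits of this kind often explain over half the variance, as noted above.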

Further, in the case of Richie et al. (2019), the supervised approach was found to make more accurate predictions of continuous judgment ratings than the purely association- or similarity-based approach of Grand et al. (2018). Recall that Grand et al. found that judgment ratings correlated with the dot product between a vector for a judgment target (say, 'salmon') and a vector representing a judgment dimension like taste (which would be derived by subtracting vectors for words low in the dimension, like 'disgusting', from vectors for words high in the dimension, like 'tasty'). As mentioned above, Richie et al. (2019) found that the best performing supervised models that mapped word vectors to judgments were linear models, which simply entail a dot product between a design matrix of judgment target vectors and the vector of best-fitting coefficients, i.e., f(X) = X · w. Thus, the Grand et al. (2018) approach differs from supervised linear models only in that, for the former, researchers stipulate a (possibly suboptimal) vector, while for the latter, judgment ratings supervise learning of the optimal vector onto which a judgment target projects via dot product. The supervised mapping approach can thus avoid the third set of concerns with applications that naively use cosine or dot product: that they do not allow for the flexible, optimal use of the information in word vectors to predict semantic judgments.

7 Some of these authors do not necessarily interpret their models predicting judgments from DS representations as cognitive models of judgment and decision making (e.g., Hollis et al., 2017; Utsumi, 2020). They may instead, for example, simply be interested in using their models to extrapolate large-scale lexical norms for other psychological applications (Hollis et al., 2017). Nevertheless, we believe their statistical models can be considered and evaluated as cognitive models.

Finally, because the weight vector of a linear model is of the same dimensionality as word vectors, the weight vector can be interpreted as a vector in the same semantic space. We can then ask what other words, which may not be plausible judgment targets for a particular judgment dimension, are highly associated with the judgment dimension, by computing the dot product between the weight vector and these other words (i.e., pass these other words’ vectors through the learned linear model).

Richie et al. (2019) and Zou and Bhatia (2019) both took this approach, finding, for example, that words concerning work were most associated with masculine traits, whereas words concerning home and family were most associated with feminine traits. These results confirmed prior empirical work (Heilman, 2012), despite not explicitly eliciting judgments about psychological constructs such as work or home. Of course, it is also possible to use this approach to generate novel behavioral hypotheses about the psychological substrates of judgment, which can then be tested empirically.
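For instance, with a hypothetical learned weight vector w and toy vectors for words that were never rated, the ranking is just a dot product per word:

```python
import numpy as np

# Hypothetical learned weight vector for a judgment dimension (e.g., one fit to
# masculinity ratings) and toy vectors for unrated words; all values invented.
w = np.array([1.0, -0.5, 0.2])
vocab = {
    "office": np.array([ 0.9, -0.4, 0.1]),
    "home":   np.array([-0.8,  0.6, 0.0]),
    "career": np.array([ 0.7, -0.3, 0.3]),
    "family": np.array([-0.6,  0.5, 0.1]),
}

# Pass each word through the learned linear model (a dot product) and rank.
scores = {word: float(vec @ w) for word, vec in vocab.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

With these invented values, work-related words land at the top of the ranking and home-related words at the bottom, the pattern the studies above report for masculinity.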

While there are, by now, several studies that successfully train models that map from DS representations to judgments about single words, there is relatively less work that successfully learns mappings from DS representations of pairs of words to judgments about the relations that hold between words, like similarity ('dog'-'hound'), part-whole relations ('car'-'wheel'), or class inclusion ('hammer'-'tool'). Initial attempts at modeling such relations with DS representations used simple vector arithmetic. For example, it was found that v_queen - v_king + v_man results in a vector very close, in terms of cosine distance, to v_woman, suggesting that relations may be encoded by particular directions in vector space (Mikolov et al., 2013). However, this approach seems not to be generally applicable across a wide range of relation types (Chen, Peterson, & Griffiths, 2017; Rogers, Drozd, & Li, 2017), leading to alternative DS-based models like BART, the Bayesian Analogy with Relational Transformations, which learns a mapping from two words' vectors to a judgment about whether a particular relation holds between them (Lu, Chen, & Holyoak, 2012; Lu, Wu, & Holyoak, 2019). BART has multiple components, but its key idea is to learn a linear mapping to relation judgments from the concatenation of (a) the raw difference vector between two words' vectors (e.g., v_car - v_wheel), and (b) the sorted difference vector (sort(v_car - v_wheel)). Lu et al. (2019) explain that the function of sorting is to "highlight semantic features that tend to be aligned with respect to functionally relevant differences between the two words in a pair" (p. 4177). Predictions from this model, trained on positive and negative examples of relations like similarity, class inclusion, part-whole, and many more, achieve out-of-sample correlations with human typicality judgments for relations that approach inter-rater reliability. Further, by comparing (1) the predicted distribution over relations between a word pair A:B and (2) the predicted distribution over relations between another word pair C:D, BART can solve 2-AFC analogy problems of the form A:B::C:D vs. D', where the foil D' is closely semantically related to C, but does not bear the same relation to C as B does to A (e.g., "insect:bee::fish:halibut" vs. "fish:water").

Importantly, BART’s predictions vastly outperform the predictions of cosine similarity between two word’s vectors, again illustrating the power of DS models when they are combined with an appropriate process model. DISTRIBUTED SEMANTICS, JUDGMENT, AND DECISION MAKING 20

Despite the power and flexibility of the vector mapping approach, we do note that it is not always as effective as certain alternatives. For example, Derby, Miller, and Devereux (2019) show that feature norm representations (e.g., ducks possess the feature ‘can fly’) are not predicted especially well by partial least squares regression from a word vector to a feature norm vector. Slightly better predictions of feature norms can be made by learning vectors for features themselves, and taking features with a high cosine similarity to the target concept word (i.e., the vector for ‘duck’ and the vector for ‘can fly’ ought to have high cosine similarity). Derby et al. show that useful feature vectors can be learned by passing pre-trained GloVe representations for concepts and existing feature norm concept-by-feature matrices through the skip-gram architecture; the only difference between their application and the typical application of skip-gram is that instead of embeddings being learned for context words, embeddings are learned for features, and the target word embeddings are pre-specified and fixed.

Future work

Overall, it is our view that the promise of (text-based) distributed representations has yet to be fully realized. Reaching this potential will involve many avenues of research. The first is, as the previous section illustrated, continuing to combine DS representations with cognitively plausible process models. Some of the phenomena reviewed earlier, like Likert-scale similarity judgments or priming in a lexical decision task, are ripe for such modeling. For example, the BART model makes accurate judgments of the typicality of candidate word pairs for the 'similar' relation, but the relative handful of tested word pairs tend to be sampled more or less randomly from the entire lexicon, which does not require distinguishing, for example, the subtle difference in similarity between 'pear'-'apple' and 'pear'-'banana'. Thus, a more stringent test of BART would involve predicting similarity judgments among all pairs of items in a narrow domain, say, animals or tools, i.e., the kind of similarity judgments that might be collected for multidimensional scaling or additive clustering. As in Lu et al. (2019), BART could also be quantitatively compared with simpler models of similarity operating on text-based distributed representations, like dot product or cosine similarity, Euclidean distance, and other distance metrics. We have conducted work in this direction (Richie & Bhatia, under review), and find that not only cosine similarity and other unweighted distance functions, but also BART, do not predict continuous ratings of fine-grained, within-domain similarity as well as a linear model on the Hadamard (elementwise) product of two words' vectors8 (see Peterson et al. (2018) for a similar successful application of this model to similarity judgments of naturalistic images, using vector representations derived from deep neural networks9). We also find that a model trained to predict similarity in one set of domains (e.g., clothing, vegetables, and vehicles) does not generalize to new domains (e.g., tools), suggesting context-sensitive or domain-specific weighting of dimensions, and again demonstrating the shortcomings of treating cosine similarity as a measure of word-word similarity, as it weighs all dimensions equally in all domains.
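A sketch of this Hadamard-product model on synthetic data. As footnote 8 notes, a linear model on elementwise products is just a weighted dot product, and here ordinary least squares recovers the (hypothetical) generating weights:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic word pairs: U[i] and V[i] are the two words' 5-dimensional vectors.
U = rng.normal(size=(40, 5))
V = rng.normal(size=(40, 5))

# Hypothetical 'true' dimension weights: some dimensions matter more than
# others for this (synthetic) similarity judgment.
w_true = np.array([2.0, 1.0, 0.5, 0.0, 0.0])
ratings = (U * V) @ w_true    # weighted dot product: sum_i w_i * u_i * v_i

# The model: ordinary least squares on the Hadamard (elementwise) product.
F = U * V                     # one row of features per word pair
w_hat, *_ = np.linalg.lstsq(F, ratings, rcond=None)
```

The learned weights capture which dimensions matter for similarity in the fitted domain; the finding above that such weights do not transfer to new domains suggests the weighting itself is domain-specific.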

Other tests of BART’s power may be advisable. For one, it could be tested on large benchmark datasets of graded hypernymy judgments like HyperLex (Vulić, Gerz, Kiela, Hill, & Korhonen, 2017).

Future work could also attempt to directly integrate older cognitive process models of analogy and reasoning, such as the Structure Mapping Engine10 (Falkenhainer, Forbus, & Gentner, 1989), with representations for words, relations, and propositions, captured by BART and related techniques.

8 This is actually equivalent to a weighted dot product of two word vectors, Σi wi·ui·vi, or to a weighted cosine similarity measurement if the vectors are L2-normalized first.

9 There is, in fact, a similar but perhaps smaller literature in psychology modeling judgment and decision making about images from deep neural network representations. Besides Peterson et al. (2018), see, for example, Guest and Love (2019), Sanders and Nosofsky (2018), and Lake, Zaremba, Fergus, and Gureckis (2015).

10 BART complements older models such as the Structure Mapping Engine (SME) in that BART shows how it is possible to learn to do some relational reasoning with distributed representations whose acquisition is formally specified. SME and older models like it do not specify how the representations they operate over are acquired, and usually use hand-crafted representations. One criticism of SME is that its power comes from these representations, not its processes (Chalmers, French, & Hofstadter, 1992, but see Forbus, Gentner, Markman, & Ferguson, 1998).

The lexical decision task, on the other hand, has previously been modeled with the drift diffusion model (DDM; Ratcliff, Gomez, & McKoon, 2004). In a semantic priming setting, it ought to be possible to use DS models, and a (possibly asymmetric) measure of semantic priming based on DS models, to derive the DDM's drift rate parameter v, which encodes the strength of evidence accumulation. Alternatively, text-based distributed representations could be combined with attractor neural network models of semantic priming that have previously only been tested with hand-crafted, toy representations (Lerner, Bentin, & Shriki, 2012). This would allow quantitative evaluation on real semantic priming data (e.g., Hutchison et al., 2013), and possibly comparison to models of spreading activation on semantic networks built from free association data (Siew, 2019; Nelson, McEvoy, & Schreiber, 2004). Although it may be challenging to scale up such models, it will be critical to develop dynamical models that use DS representations to make predictions not just about the eventual judgment or decision, but also about its time course (Bhatia & Pleskac, 2019). Modeling judgments by learning mappings from word vectors directly to judgments could also be improved. In our own work, for example, we have noticed that purely linear models, while achieving the best overall performance, tend to make systematic errors, overestimating the low ends of judgment dimensions and underestimating the high ends, suggesting that these models need additional, non-linear transformations of their predictions.
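The drift-rate suggestion above (deriving the DDM's drift rate from a DS-based priming measure) can be sketched as follows; the linear linkage v = v0 + beta * cos(prime, target), and all numeric values, are our own hypothetical assumptions, not an established model:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def simulate_ddm(drift, boundary=1.0, noise=0.3, dt=0.01, seed=0, max_steps=100000):
    """One drift-diffusion trial: noisy evidence accumulates until a boundary
    is crossed; returns (hit_upper_boundary, reaction_time)."""
    rng = np.random.default_rng(seed)
    x, t = 0.0, 0.0
    for _ in range(max_steps):
        x += drift * dt + noise * np.sqrt(dt) * rng.normal()
        t += dt
        if abs(x) >= boundary:
            break
    return x >= boundary, t

# Hypothetical linkage: drift rate increases linearly with prime-target
# cosine similarity (toy two-dimensional vectors, invented values).
prime = np.array([1.0, 0.2])
related = np.array([0.9, 0.3])      # e.g., a semantically related target
unrelated = np.array([-0.1, 1.0])   # e.g., an unrelated target
v0, beta = 0.5, 1.0
drift_related = v0 + beta * cosine(prime, related)
drift_unrelated = v0 + beta * cosine(prime, unrelated)
```

Because the related target receives a higher drift rate, simulated trials for it will tend to reach the boundary faster, yielding the priming-style speedup this proposal is meant to capture.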

Second, several very recent developments in DS algorithms have yet to be taken up by judgment and decision making scholars. For one, computer scientists have recently developed 'dynamic word embeddings', which take corpora from different time periods, and learn a word vector space for each time period, such that the movement of word vectors from period to period reflects evolution in meaning11 (Yao, Sun, Ding, Rao, & Xiong, 2018; Bamler & Mandt, 2017). In principle, it should be possible to use such techniques to model the judgments people would have made in the past, by fitting models from current-day embeddings to current-day judgments, and then deploying such models on historical embeddings to predict historical judgments. Given that this mapping approach (Richie et al., 2019) predicts contemporary judgments more accurately than does relative similarity between judgment targets and words representing the judgment dimensions (Grand et al., 2018), this approach should make more accurate historical predictions than the similarity-based approaches of Garg et al. (2018) and Hamilton et al. (2016)12.

For another, computer scientists have also developed DS models such that only similar, but not merely related, words are near each other in vector space13 (Wieting, Bansal, Gimpel, & Livescu, 2015; Ponti, Vulić, Glavaš, Mrkšić, & Korhonen, 2018). Thus, word pairs like 'doctor' and 'nurse' would be nearby, but not pairs like 'doctor' and 'hospital'. These new techniques perform more accurately on benchmarks of similarity judgments that control for mere relatedness or association (Hill et al., 2015), and aid performance on certain downstream NLP tasks like lexical simplification. However, it is unclear whether these newer techniques generally perform as well on other cognitive tasks as more traditional DS algorithms do. For example, such representations may not account for priming behavior as well, since related but not similar word pairs ('doctor'-'hospital') can prime each other. Nor is it known whether such newer techniques can model the kind of more stringent, within-domain similarity data we mentioned in the previous paragraph. More generally, while the development of different kinds of DS representations for different tasks may be acceptable for engineering purposes (the typical NLP aim), it is unclear if we want to posit that people have different representations for a single word depending on the task. For the sake of parsimony, at least, it may be preferable to have a single representation for a word that can then be used in different ways depending on the task at hand (e.g., computing similarity but not relatedness, vs. semantic relatedness of any kind). Perhaps there ought to be a 'meta-model' that would select a particular set of task-appropriate processes, or alter word representations in task-appropriate ways (perhaps by biasing memory, or differentially activating certain features or dimensions).

11 This was not straightforward with earlier techniques, because the cost functions for training most DS models are invariant to rotation. As a result, if one trains a separate DS model for each time period, the learned vectors may not lie in the same latent space. It is possible to try alignment techniques like Procrustes analysis (Hamilton et al., 2016), but it is difficult to distinguish imperfections of the approximate rotation from true semantic drift (Bamler & Mandt, 2017).

12 Unfortunately, in our preliminary work in this direction, we have found that many word representations are not stable enough over time for this approach to work. For example, the representation for 'bed' changes dramatically over time, yielding very unstable risk perceptions, even though risk perceptions for 'bed' have almost certainly always been low. This may be due to the smaller amount of historical natural language data available. We expect this will be less of a problem for future researchers, as large amounts of historical language data are being digitized almost daily.

13 This approach typically uses instances of synonymy, antonymy, or paraphrases to retrofit pre-trained DS models or to jointly train DS models.

Probably the most notable recent advancements in text-based distributed representations, however, lie in so-called 'compositional distributed representations'. These are deep neural networks (USE, Cer et al., 2018; ELMo, Peters et al., 2018; BERT, Devlin et al., 2018; see Smith, 2019 for a review) that derive fixed-length vector representations for individual words contextualized by the rest of the sentence, and/or a vector for an entire sentence itself. These newer techniques have achieved unprecedented success on a range of natural language processing tasks, including question answering (given a question, selecting a span of text in a passage that answers it), textual entailment (judging whether a hypothesis follows from a premise), sentence acceptability (judging whether a string of words is an acceptable sentence), and sentiment analysis (labeling a sentence from very negative to very positive). These new techniques could deliver representations for cognitive models of judgments about words in context (e.g., otherwise ambiguous words like 'bank' that are disambiguated by context; Jamieson et al., 2018; Scott, Keital, Becirspahic, Yao, & Sereno, 2018), or judgments about phrases and sentences (e.g., the taste of 'curry chicken' vs. 'barbecue chicken'; see Shwartz and Dagan, 2019 for evaluations of these methods for predicting linguistic judgments about phrases and sentences). As a hypothetical example of the latter, it may be possible to use these models to obtain vector representations for survey prompts commonly used in judgment and decision making research, e.g., "I am likely to disagree with an authority figure" or "I am likely to sunbathe without sunscreen" (see Weber et al., 2002), and model participants' agree/disagree responses to such prompts. Inspired by both the verbal fluency literature described above and dynamical decision making models like the drift diffusion model, we have also been using these newer DS models to derive vector representations for sentences listed as thoughts in response to naturalistic decision prompts like "Is nuclear power safe?", and subsequently model the dynamics of memory search and decision making (Zhao, Richie, & Bhatia, revise/resubmit).

Conclusion

We began this chapter by stating that a formal understanding of judgment and decision making requires two components: (a) a theory of our knowledge of the things we make judgments and decisions about, and (b) a theory of how that knowledge is used to make judgments and decisions. We believe text-based distributed representations constitute the first half (a) of such an understanding.

While it is encouraging that simple measures that compare two item representations, like cosine similarity, roughly predict human behaviors like priming or association-based judgment, we, along with others (Jones et al., 2018), believe that this does not constitute a theory of (b), how our knowledge is used to make judgments and decisions, and that it actually underestimates the potential of DS representations for modeling judgment and decision making, and cognition more generally. Work that combines text-based DS representations with appropriate process models of judgment and decision making is still relatively new, and we believe this will be an active area of research, along with applications of compositional distributed semantics to the study of judgment and decision making about propositions and other complex representations.

Acknowledgments

Funding was received from the National Science Foundation grant SES-1847794 and the Alfred P. Sloan Foundation.

References

Aggarwal, C. C., Hinneburg, A., & Keim, D. A. (2001). On the surprising behavior of distance metrics in high dimensional space. In International Conference on Database Theory (pp. 420-434). Springer, Berlin, Heidelberg.

Bamler, R., & Mandt, S. (2017). Dynamic word embeddings. In Proceedings of the 34th International Conference on Machine Learning - Volume 70 (pp. 380-389). JMLR.org.

Bhatia, S. (2017a). The semantic representation of prejudice and stereotypes. Cognition, 164, 46-60.

Bhatia, S. (2017b). Associative judgment and vector space semantics. Psychological Review, 124, 1-20.

Bhatia, S. (2019). Predicting risk perception: New insights from data science. Management Science, 65(8), 3449-3947.

Bhatia, S., Goodwin, G., & Walasek, L. (2018). Trait associations for Hillary Clinton and Donald Trump in news media: A computational analysis. Social Psychological and Personality Science, 9, 123-130.

Bhatia, S., & Pleskac, T. J. (2019). Preference accumulation as a process model of desirability ratings. Cognitive Psychology, 109, 47-67.

Bhatia, S., & Stewart, N. (2018). Naturalistic multiattribute choice. Cognition, 179, 71-88.

Bhatia, S., & Walasek, L. (2019). Association and response accuracy in the wild. Memory & Cognition, 47(2), 292-298.

Blei, D. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77-84.

Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135-146.

Bolukbasi, T., Chang, K. W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Advances in Neural Information Processing Systems, 29, 4349-4357.

Bruni, E., Boleda, G., Baroni, M., & Tran, N. K. (2012, July). Distributional semantics in technicolor. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1 (pp. 136-145). Association for Computational Linguistics.

Buchanan, E. M., Valentine, K. D., & Maxwell, N. P. (2019). English semantic feature production norms: An extended database of 4436 concepts. Behavior Research Methods, 1-15. https://doi.org/10.3758/s13428-019-01243-z

Bullinaria, J. A., & Levy, J. P. (2007). Extracting semantic representations from word co-occurrence statistics: A computational study. Behavior Research Methods, 39, 510-526. http://dx.doi.org/10.3758/BF03193020

Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356, 183-186.

Cer, D., Yang, Y., Kong, S. Y., Hua, N., Limtiaco, N., John, R. S., ... & Sung, Y. H. (2018). Universal sentence encoder. arXiv preprint arXiv:1803.11175.

Chalmers, D. J., French, R. M., & Hofstadter, D. R. (1992). High-level perception, representation, and analogy: A critique of artificial intelligence methodology. Journal of Experimental & Theoretical Artificial Intelligence, 4(3), 185-211.

Chen, D., Peterson, J. C., & Griffiths, T. L. (2017). Evaluating vector-space models of analogy. arXiv preprint arXiv:1705.04416.

Devereux, B. J., Tyler, L. K., Geertzen, J., & Randall, B. (2014). The Centre for Speech, Language and the Brain (CSLB) concept property norms. Behavior Research Methods, 46(4), 1119-1127.

Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Falkenhainer, B., Forbus, K., & Gentner, D. (1989). The structure-mapping engine: Algorithm and examples. Artificial Intelligence, 41(1), 1-63.

Firth, J. R. (1957). A synopsis of linguistic theory, 1930-1955. In Studies in Linguistic Analysis (pp. 1-32). Oxford, UK: Blackwell.

Forbus, K. D., Gentner, D., Markman, A. B., & Ferguson, R. W. (1998). Analogy just looks like high level perception: Why a domain-general approach to analogical mapping is right. Journal of Experimental and Theoretical Artificial Intelligence, 10(2), 231-257.

Garg, N., Schiebinger, L., Jurafsky, D., & Zou, J. (2018). Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences, 115, E3635-E3644.

Gilovich, T., Griffin, D., & Kahneman, D. (Eds.). (2002). Heuristics and biases: The psychology of intuitive judgment. Cambridge University Press.

Grand, G., Blank, I. A., Pereira, F., & Fedorenko, E. (2018). Semantic projection: Recovering human knowledge of multiple, distinct object features from word embeddings. arXiv preprint arXiv:1802.01241.

Griffiths, T. L., Steyvers, M., & Tenenbaum, J. B. (2007). Topics in semantic representation. Psychological Review, 114, 211-244.

Guest, O., & Love, B. C. (2019). Levels of representation in a deep learning model of categorization. bioRxiv. https://doi.org/10.1101/626374

Hamilton, W. L., Leskovec, J., & Jurafsky, D. (2016). Diachronic word embeddings reveal statistical laws of semantic change. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (pp. 1489-1501).

Harris, Z. S. (1954). Distributional structure. Word, 10, 146-162.

Hill, F., Reichart, R., & Korhonen, A. (2015). SimLex-999: Evaluating semantic models with (genuine) similarity estimation. Computational Linguistics, 41, 665-695.

Hills, T. T., Jones, M. N., & Todd, P. M. (2012). Optimal foraging in semantic memory. Psychological Review, 119, 431-440.

Heilman, M. E. (2012). Gender stereotypes and workplace bias. Research in Organizational Behavior, 32, 113-135.

Hollis, G., Westbury, C., & Lefsrud, L. (2017). Extrapolating human judgments from skip-gram vector representations of word meaning. Quarterly Journal of Experimental Psychology, 70, 1603-1619.

Holman, E. W. (1979). Monotonic models for asymmetric proximities. Journal of Mathematical Psychology, 20, 1-15.

Holtzman, N. S., Schott, J. P., Jones, M. N., Balota, D. A., & Yarkoni, T. (2011). Exploring media bias with semantic analysis tools: Validation of the Contrast Analysis of Semantic Similarity (CASS). Behavior Research Methods, 43, 193-200.

Hopkins, D. J. (2018). The exaggerated life of death panels? The limited but real influence of elite rhetoric in the 2009-2010 health care debate. Political Behavior, 40, 681-709.

Hutchison, K. A., Balota, D. A., Neely, J. H., Cortese, M. J., Cohen-Shikora, E. R., Tse, C. S., ... & Buchanan, E. (2013). The semantic priming project. Behavior Research Methods, 45(4), 1099-1114.

Johns, B. T., Taler, V., Pisoni, D. B., Farlow, M. R., Hake, A. M., Kareken, D. A., ... Jones, M. N. (2018). Cognitive modeling as an interface between brain and behavior: Measuring the semantic decline in mild cognitive impairment. Canadian Journal of Experimental Psychology, 72, 117-126.

Jones, M. N., Gruenenfelder, T. M., & Recchia, G. (2018). In defense of spatial models of semantic representation. New Ideas in Psychology, 50, 54-60.

Jones, M. N., Hills, T. T., & Todd, P. M. (2015). Hidden processes in structural representations: A reply to Abbott, Austerweil, and Griffiths. Psychological Review, 122, 570-574.

Jones, M.N., Kintsch. W., & Mewhort, D.J.K. (2006). High-dimensional semantic space accounts of

priming. Journal of Memory and Language, 55, 534-552.

Jones, M.N., & Mewhort, D.J. (2007). Representing word meaning and order information in a

composite holographic lexicon. Psychological Review, 114, 1-37.

Jones, M.N., Willits, J.A., & Dennis, S. (2015). Models of semantic memory. In J. R. Busemeyer, Z.

Wang, J. T. Townsend, & A. Eidels (Eds.), Oxford Handbook of Computational and

Mathematical Psychology (pp. 232-254). Oxford University Press.

Klein, R. A., Vianello, M., Hasselman, F., Adams, B. G., Adams Jr, R. B., Alper, S., ... & Batra, R.

(2018). Many Labs 2: Investigating variation in replicability across samples and settings.

Advances in Methods and Practices in Psychological Science, 1(4), 443-490.

Krumhansl, C. L. (1978). Concerning the applicability of geometric models to similarity data: The

interrelationship between similarity and spatial density. Psychological Review, 85, 445-463.

Lake, B. M., Zaremba, W., Fergus, R., & Gureckis, T. M. (2015). Deep neural networks predict

category typicality ratings for images. In Proceedings of the 37th Annual Conference of the

Cognitive Science Society.

Landauer, T.K., & Dumais, S. (1997). A solution to Plato’s problem: the latent semantic analysis theory

of acquisition, induction, and representation of knowledge. Psychological Review, 104, 211-240.

Lenci, A. (2018). Distributional models of word meaning. Annual Review of Linguistics, 4, 151-171.

Lerner, I., Bentin, S., & Shriki, O. (2012). Spreading activation in an attractor network with latching

dynamics: automatic semantic priming revisited. Cognitive Science, 36(8), 1339-1382.

Levy, O., & Goldberg, Y. (2014). Dependency-based word embeddings. In Proceedings of the 52nd

Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers).

Vol. 2, pp. 302–308.

Li, D., & Summers-Stay, D. (2019). Mapping Distributional Semantics to Property Norms with Deep

Neural Networks. Big Data and Cognitive Computing, 3(2), 30.

Lu, H., Chen, D., & Holyoak, K. J. (2012). Bayesian analogy with relational transformations.

Psychological Review, 119(3), 617.

Lu, H., Wu, Y. N., & Holyoak, K. J. (2019). Emergence of analogy from relation learning. Proceedings

of the National Academy of Sciences, 116(10), 4176-4181.

Luce, R. D. (1959). Individual choice behavior: A theoretical analysis. New York: Wiley.

Mandera, P., Keuleers, E., & Brysbaert, M. (2017). Explaining human performance in psycholinguistic

tasks with models of semantic similarity based on prediction and counting: a review and

empirical validation. Journal of Memory and Language, 92, 57-78.

McRae, K., Cree, G.S., Seidenberg, M.S., & McNorgan, C. (2005). Semantic feature production norms

for a large set of living and nonliving things. Behavior Research Methods, 37, 547-559.

Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological

Review, 85, 207–238.

Medin, D. L., & Smith, E. E. (1981). Strategies and classification learning. Journal of Experimental

Psychology: Human Learning and Memory, 7, 241–253. http://dx.doi.org/10.1037/0278-

7393.7.4.241

Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., & Dean, J. (2013). Distributed representations of

words and phrases and their compositionality. Advances in Neural Information Processing

Systems, 3111-3119.

Navarro, D. J. & Lee, M. D. (2003). Combining dimensions and features in similarity-based

representations. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in Neural

Information Processing Systems (pp. 67-74).

Nelson, D. L., McEvoy, C. L., & Schreiber, T. A. (2004). The University of South Florida free

association, rhyme, and word fragment norms. Behavior Research Methods, Instruments, &

Computers, 36, 402–407. https://doi.org/10.3758/BF03195588.

Nosofsky, R. M. (1984). Choice, similarity, and the context theory of classification. Journal of

Experimental Psychology: Learning, Memory, and Cognition, 10, 104 –114.

http://dx.doi.org/10.1037/0278-7393.10.1.104

Nosofsky, R.M. (1986). Attention, similarity, and the identification-categorization relationship. Journal

of Experimental Psychology: General, 115, 39–57. http://dx.doi.org/10.1037/0096-

3445.115.1.39

Nosofsky, R.M. (1991). Stimulus bias, asymmetric similarity, and classification. Cognitive Psychology,

23(1), 94-140.

Nosofsky, R.M. (2011). The generalized context model: An exemplar model of classification. In E. M.

Pothos & A. J. Wills (Eds.), Formal approaches in categorization (pp. 18–39). New York, NY:

Cambridge University Press.

Nosofsky, R. M., Sanders, C. A., & McDaniel, M. A. (2018). Tests of an exemplar-memory model of

classification learning in a high-dimensional natural-science category domain. Journal of

Experimental Psychology: General, 147, 328-353.

Nosofsky, R.M., Sanders, C., Meagher, B.J., & Douglas, B.J. (2018). Toward the development of a

feature-space representation for a complex natural category domain. Behavior Research

Methods, 50, 530-556. doi:10.3758/s13428-017-0884-8.

Oppenheimer, D. M., & Kelso, E. (2015). Information processing as a paradigm for decision making.

Annual Review of Psychology, 66, 277-294.

Patel, A., Sands, A., Callison-Burch, C., & Apidianaki, M. (2018). Magnitude: A fast, efficient

universal vector embedding utility package. In Proceedings of the 2018 Conference on

Empirical Methods in Natural Language Processing: System Demonstrations (pp. 120-126).

Pennington, J., Socher, R., & Manning, C.D. (2014). GloVe: global vectors for word representation. In

Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing,

1532-1543.

Pereira, F., Gershman, S., Ritter, S., & Botvinick, M. (2016). A comparative evaluation of off-the-shelf

distributed semantic representations for modelling behavioural data. Cognitive

Neuropsychology, 33, 175-190.

Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep

Contextualized Word Representations. arXiv preprint arXiv:1802.05365.

Peterson, J. C., Abbott, J. T., & Griffiths, T. L. (2018). Evaluating (and improving) the correspondence

between deep neural networks and human representations. Cognitive Science, 42(8), 2648-2669.

Ponti, E. M., Vulić, I., Glavaš, G., Mrkšić, N., & Korhonen, A. (2018). Adversarial Propagation and

Zero-Shot Cross-Lingual Transfer of Word Vector Specialization. In Proceedings of the 2018

Conference on Empirical Methods in Natural Language Processing (pp. 282-293).

Ratcliff, R., Gomez, P., & McKoon, G. (2004). A diffusion model account of the lexical decision task.

Psychological Review, 111(1), 159.

Řehůřek, R., & Sojka, P. (2010). Software framework for topic modelling with large corpora. In

Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks.

Richie, R., & Bhatia, S. (under review). Similarity judgment within and across categories: A

comprehensive model comparison.

Richie, R., Zou, W., & Bhatia, S. (2019). Semantic representations extracted from large language corpora

predict high-level human judgment in seven diverse behavioral domains. Collabra: Psychology,

5(1), 50.

Rogers, A., Drozd, A., & Li, B. (2017). The (too many) problems of analogical reasoning with word

vectors. In Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*

SEM 2017) (pp. 135-148).

Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology,

39(6), 1161.

Sanders, C. A., & Nosofsky, R. M. (2018). Using deep-learning representations of complex natural

stimuli as input to psychological models of classification. Proceedings of the 40th Annual

Conference of the Cognitive Science Society.

Scott, G. G., Keitel, A., Becirspahic, M., Yao, B., & Sereno, S. C. (2019). The Glasgow Norms: Ratings

of 5,500 words on nine scales. Behavior Research Methods, 51(3), 1258-1270.

Shepard, R.N. (1974). Representation of structure in similarity data: problems and prospects.

Psychometrika, 39, 373-421.

Shepard, R. N., & Arabie, P. (1979). Additive clustering: Representation of similarities as combinations

of discrete overlapping properties. Psychological Review, 86(2), 87.

Shwartz, V., & Dagan, I. (2019). Still a Pain in the Neck: Evaluating Text Representations on Lexical

Composition. arXiv preprint arXiv:1902.10618.

Siew, C. S. (2019). spreadr: An R package to simulate spreading activation in a network. Behavior

Research Methods, 51(2), 910-929.

Smith, N. A. (2019). Contextual Word Representations: A Contextual Introduction. arXiv preprint

arXiv:1902.06006.

Turney, P. D., & Pantel, P. (2010). From frequency to meaning: Vector space models of semantics.

Journal of Artificial Intelligence Research, 37, 141-188.

Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327-352.

Tversky, A., & Gati, I. (1982). Similarity, separability and the triangle inequality. Psychological

Review, 89, 123-154.

Utsumi, A. (2020). Exploring what is encoded in distributional word vectors: A neurobiologically

motivated analysis. Cognitive Science, 44(6), e12844.

Vulić, I., Gerz, D., Kiela, D., Hill, F., & Korhonen, A. (2017). Hyperlex: A large-scale evaluation of

graded lexical entailment. Computational Linguistics, 43(4), 781-835.

Weber, E. U., Blais, A.-R., & Betz, N. (2002). A domain-specific risk-attitude scale: Measuring risk

perceptions and risk behaviors. Journal of Behavioral Decision Making, 15, 263-290.

Weber, E. U., & Johnson, E. J. (2009). Mindful judgment and decision making. Annual Review of

Psychology, 60, 53-85.

Wieting, J., Bansal, M., Gimpel, K., & Livescu, K. (2015). From paraphrase database to compositional

paraphrase model and back. Transactions of the Association for Computational Linguistics, 3,

345-358.

Yao, Z., Sun, Y., Ding, W., Rao, N., & Xiong, H. (2018). Dynamic word embeddings for evolving

semantic discovery. In Proceedings of the Eleventh ACM International Conference on Web

Search and Data Mining (pp. 673-681). ACM.

Zhao, W.J., Richie, R., & Bhatia, S. (revise/resubmit). Context in decisions from memory.

Zou, W., & Bhatia, S. (2019). Modeling judgment errors in naturalistic numerical estimation.

Proceedings of the 41st Annual Conference of the Cognitive Science Society, 3227-3233.