
The Effects of Negative Premises on Inductive Reasoning: A Psychological and Computational Modeling Study

Kayo Sakamoto ([email protected]) Tokyo Institute of Technology, 2-21-1 O-okayama, Meguro-ku, Tokyo, 152-8552 JAPAN

Masanori Nakagawa ([email protected]) Tokyo Institute of Technology, 2-21-1 O-okayama, Meguro-ku, Tokyo, 152-8552 JAPAN

Abstract

Various learning theories stress the importance of negative learning (e.g., Bruner, 1959; Hanson, 1959). However, the effects of negative premises have rarely been discussed in any detail within theories of inductive reasoning (with the exception of Osherson et al., 1990). Although Sakamoto et al. (2005) have proposed computational models that can cope with negative premises and have verified their psychological validity, they did not consider cases where category-based induction theory is ineffective, such as when the entities in both negative and positive premises belong to the same category. The present study was conducted to test the hypothesis that, even when negative and positive premises involve same-category entities, people can estimate the likeliness of a conclusion by comparing feature similarities. Based on this hypothesis, two computational models are proposed to simulate this cognitive mechanism. While both of these models were able to simulate the results obtained from a psychological experiment, a perceptron model could not. Finally, we argue that the mathematical equivalence (from a Support Vector Machine perspective) of these two models suggests that they represent a promising approach to modeling the effects of negative premises and, thus, to fully handling the complexities of feature-based induction on neural networks.

Introduction

This study is concerned with evaluating "arguments", such as:

    Collies produce phagocytes.
    Horses produce phagocytes.
    --------------------------------
    Shepherds produce phagocytes.

The propositions above the line are referred to as "premises", while the proposition below it is the "conclusion". The evaluation of an argument involves estimating the likelihood of the conclusion based on the premises. Osherson, Smith, Wilkie, Lopez, and Shafir (1990) refer to this kind of argument as a "categorical" argument, because the predicate (e.g., "produce phagocytes") in the premises and conclusion is attributed to one or more entities (e.g., "Collies", "Shepherds").

The premises can also be negative in form (e.g., "Penguins do not produce phagocytes"). Ever since classic studies of discrimination learning (e.g., Hanson, 1959) and of cognitive development (e.g., Bruner, 1959), the importance of negative examples has been widely recognized, and it has also been demonstrated in more recent studies, such as those on causal learning (e.g., Buehner and Cheng, 2005). However, the effects of negative premises have rarely been discussed in any detail in the context of inductive reasoning studies concerned with the evaluation of arguments.

Investigating the effects of negative premises can undoubtedly contribute to our understanding of inductive reasoning. For instance, cases where the entities in both negative and positive premises belong to the same category are clearly problematic for the category-based induction theory (Osherson et al., 1990), because it is impossible to distinguish between negative premises and positive premises from the categorical viewpoint. In contrast to the similarity-coverage model based on category-based induction theory, Sloman (1993) has proposed a feature-based model based on a simple perceptron. According to Sloman, knowledge of categories is not required for the evaluation of arguments. Rather, he assumes that argument evaluation is based on a simple computation of feature similarities between the entities of the premises and the conclusion. Thus, Sloman's feature-based model may be more effective at coping with negative premises than category-based induction theory. However, the psychological validity of Sloman's model has yet to be tested in terms of processing negative premises. Accordingly, this study examines the validity of feature-based induction theory in handling these cases that are so problematic for category-based induction theory.

In terms of model construction, what kind of model can adequately represent the cognitive process of feature-based induction, including negative premises? In addition to Sloman's (1993) model, Sakamoto, Terai and Nakagawa (2005) have also proposed a feature-based model. While structurally similar, their model extends the learning algorithm in order to cope with negative premises. Moreover, their model utilizes corpus-analysis results to compute feature similarities, rather than the results of the psychological evaluations used in Sloman's model. This means that Sakamoto et al.'s model is capable of simulating a far greater variety of entities than Sloman's model (over 20,000 compared to just 46), because the corpus analysis provides information for an enormous quantity of words. However, when induction of the appropriate category is difficult, the level of computation involved in the feature similarity comparisons far exceeds the computational capacity of simple perceptrons. This study therefore proposes a modified version of the Sakamoto et al. model, referred to as the "multi-layer feature-based neural network model", which is compared with the previous perceptron model.

A further alternative model for coping with negative premises would also seem to be possible. Osherson, Stern, Wilkie, Stob, and Smith (1991) have proposed a model that handles negative premises on the basis of feature similarity, involving more complex computation than is possible with perceptrons. However, Osherson et al.'s feature-based model requires knowledge of the relevant taxonomical categories and, like Sloman's model, it utilizes psychological evaluations. As the restricted number of available entities (46, similar to Sloman's model) makes it difficult to apply that model, this study also modifies the Osherson et al. feature-based model in order to handle greater numbers of entities and to eliminate the need for categorical knowledge.

The outline of this study is as follows. First, the corpus analysis is described. The results of the corpus analysis were utilized in creating clear category definitions and in constructing the models. Second, a psychological experiment is described, which was conducted to examine effects of negative premises that cannot be accounted for by the category-based induction theory. Third, two models (the multilayer feature-based model and a modified version of Osherson et al.'s feature-based model) are proposed and tested in terms of their psychological validity. Finally, this study argues that the mathematical equivalence (from a support vector machine perspective) of these two models suggests that they represent a promising approach to modeling the complexities of feature-based induction on neural networks.

A Corpus Analysis for Category Definitions and Model Construction

Categories used in this study are defined as latent semantic classes estimated from an analysis of the words in a Japanese corpus. The estimations were based on a soft clustering of words according to their modification frequencies in the corpus.
The soft-clustering results are represented as conditional probabilities of words given the latent classes. From these probabilities, the conditional probabilities of feature words or phrases given particular nouns are also computed. The models in this study applied these conditional probabilities of feature words as the strengths of the relationships between nouns (entities) and features.

The method of soft clustering was similar in structure to Pereira's method and to PLSI (Hofmann, 1999; Kameya & Sato, 2005; Pereira, Tishby, & Lee, 1993). This method assumes that the co-occurrence probability of a term N_i and a term A_j, P(N_i, A_j), can be represented as formula (1):

    P(N_i, A_j) = \sum_k P(N_i \mid C_k) \, P(A_j \mid C_k) \, P(C_k) ,   (1)

where P(N_i | C_k) is the conditional probability of the term N_i given the latent semantic class C_k. Each of the probabilistic parameters in the model, P(C_k), P(N_i | C_k), and P(A_j | C_k), is estimated as the value that maximizes the likelihood of the co-occurrence data measured from a corpus, using the EM algorithm (see Kameya & Sato, 2005). In this study, the term N_i represents a noun, and the term A_j represents a feature word, such as a predicate. The number of latent classes was fixed at 200.

For the actual estimation, the word co-occurrence frequencies were extracted from Japanese newspaper articles covering a ten-year span (1993-2002) of the Mainichi Shimbun. This co-occurrence frequency data comprises the combinations of 21,205 nouns and 83,176 predicates in modification relations. CaboCha (Kudo & Matsumoto, 2002), a Japanese analysis tool for modification relations, was used for the extraction.
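To make the estimation concrete, the following is a minimal sketch of the EM procedure for the aspect model of formula (1). It is illustrative only: the function and variable names are ours, the counts are assumed to fit in a dense NumPy array, and the full 21,205 x 83,176 co-occurrence data would require a sparse implementation instead.

```python
import numpy as np

def plsi_em(counts, n_classes=200, n_iter=100, seed=0):
    """EM for the aspect model of formula (1):
    P(N_i, A_j) = sum_k P(N_i|C_k) P(A_j|C_k) P(C_k),
    where counts[i, j] is the co-occurrence frequency of noun i
    and feature word (predicate) j."""
    rng = np.random.default_rng(seed)
    n_nouns, n_feats = counts.shape
    # Random, normalized initial parameters.
    p_c = rng.random(n_classes)
    p_c /= p_c.sum()
    p_n_c = rng.random((n_nouns, n_classes))
    p_n_c /= p_n_c.sum(axis=0)
    p_a_c = rng.random((n_feats, n_classes))
    p_a_c /= p_a_c.sum(axis=0)
    for _ in range(n_iter):
        # E-step: responsibilities P(C_k | N_i, A_j), shape (i, j, k).
        resp = np.einsum('ik,jk,k->ijk', p_n_c, p_a_c, p_c)
        resp /= resp.sum(axis=2, keepdims=True) + 1e-12
        # M-step: re-estimate the parameters from the expected counts
        # n(i, j) * P(C_k | N_i, A_j).
        exp_k = counts[:, :, None] * resp
        mass = exp_k.sum(axis=(0, 1))               # unnormalized P(C_k)
        p_n_c = exp_k.sum(axis=1) / (mass + 1e-12)  # P(N_i | C_k)
        p_a_c = exp_k.sum(axis=0) / (mass + 1e-12)  # P(A_j | C_k)
        p_c = mass / mass.sum()                     # P(C_k)
    return p_c, p_n_c, p_a_c
```

For example, plsi_em(np.random.poisson(1.0, (50, 80)).astype(float), n_classes=5) returns the three parameter arrays for a toy 50-noun, 80-predicate corpus.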
In order to test the assumption that the latent classes correspond to categories, it is important to identify the meanings of the classes. The meaning of a class can be identified from the conditional probability of the latent class C_k given a predicate A_j, P(C_k | A_j), and the probability of the latent class C_k given a noun N_i, P(C_k | N_i), both of which can be computed from the estimated parameters P(C_k), P(N_i | C_k), and P(A_j | C_k) by applying Bayes' theorem. The latter probability denotes the class membership of each word. Based on these estimations, most of the latent classes were identified as meaningful categories, as shown in Table 1.

Table 1. Examples of estimated classes and their representative members.

Class of Foods (c1)
        noun n_i            P(c1|n_i)   predicate a_j            P(c1|a_j)
   1    steak               0.876       boil down                0.987
   2    set meal            0.867       eat                      0.978
   3    grain foods         0.811       fill one's mouth with    0.937
   4    vegetable soup      0.817       want to eat              0.927
   5    meat                0.739       stew                     0.920
   6    curry               0.734       don't eat                0.918
   7    Chinese noodle      0.720       can eat                  0.913
   8    pizza               0.716       boil                     0.894
   9    barleycorn          0.594       clear one's plate        0.894
  10    rice cake           0.555       let's eat                0.892

Class of Valuable assets (c2)
        noun n_i            P(c2|n_i)   predicate a_j            P(c2|a_j)
   1    stock               0.929       issue                    0.929
   2    government bonds    0.862       list                     0.916
   3    place               0.791       increase                 0.899
   4    building estate     0.780       release                  0.884
   5    real estate         0.757       vend out                 0.877
   6    cruiser             0.662       sell                     0.852
   7    farmland            0.657       borrow on                0.841
   8    foreign bonds       0.628       not sell                 0.802
   9    house               0.594       buy and add              0.802
  10    currency            0.555       keep                     0.781

From the estimated parameters P(C_k), P(N_i | C_k), and P(A_j | C_k), it is also possible to compute the conditional probabilities of feature words given particular nouns, as follows:

    P(A_j \mid N_i) = \frac{\sum_k P(A_j \mid C_k) \, P(N_i \mid C_k) \, P(C_k)}{\sum_k P(N_i \mid C_k) \, P(C_k)} .   (2)

In this study, this conditional probability P(A_j | N_i) is taken to express the strength of the relationship between a feature and an entity: when a certain feature word has a high conditional probability given a particular noun, it is natural to assume that the entity denoted by the noun has the feature indicated by the feature word. This conditional probability was therefore applied in the models.
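Both the class-membership probabilities behind Table 1 and the feature strengths of formula (2) follow directly from the estimated parameters by Bayes' theorem. A minimal sketch, continuing the illustrative names of the previous sketch:

```python
import numpy as np

def class_given_noun(p_c, p_n_c, i):
    """Bayes' theorem: P(C_k | N_i) is proportional to
    P(N_i | C_k) P(C_k); this is the class membership of noun i
    used to interpret Table 1."""
    post = p_n_c[i] * p_c
    return post / post.sum()

def feature_given_noun(p_c, p_n_c, p_a_c, i):
    """Formula (2): P(A_j | N_i) as a mixture of P(A_j | C_k)
    weighted by P(N_i | C_k) P(C_k)."""
    w = p_n_c[i] * p_c            # numerator weights, shape (n_classes,)
    return (p_a_c * w).sum(axis=1) / w.sum()
```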

Problematic Experimental Data for the Category-based Induction Theory

This study hypothesizes that people are able to estimate the likeliness of an argument's conclusion, even when the entities of both negative and positive premises belong to the same category, by comparing feature similarities. This is something that the usual category-based induction theory cannot fully explain. In order to test this hypothesis, the following psychological experiment was conducted.

METHOD

Participants: Undergraduate students (N = 114) were randomly assigned to one of two inductive reasoning task lists presented in a questionnaire format; 59 students completed one task list, while 55 students completed the other.

Materials and Procedure: The questionnaire task lists required inductive reasoning. Each list consisted of three task sets of inductive reasoning arguments. Each set contained two positive premises whose entities belong to a particular category, one negative premise, and thirty conclusions that share the same premise statements (see Figure 1). In the within-category condition, the negative premise belongs to the same category as the positive premises. In contrast, in the between-category condition, the negative premise belongs to a different category from the positive premises (see Table 2).

Figure 1. Example of inductive reasoning tasks. [The questionnaire read: "Now you know the following premises: Mr. H likes '...'. Mr. H likes '...'. Mr. H doesn't like 'French'. Please estimate how likely the following conclusion is true given the above premises: Mr. H likes '...'." Responses were given on a 7-point scale: strongly likely, likely, relatively likely, neutral, relatively unlikely, unlikely, strongly unlikely.]

Table 2. Example of the inductive reasoning task sets (inductive reasoning about the category of learning subjects).

                      within-category condition     between-category condition
positive premises     "physics", "astronomy" (chosen from the category of
                      learning subjects)
negative premise      "French" (from the category   "shopping" (from the
                      of learning subjects)         category of leisure)
conclusions           chosen from the categories of learning subjects and leisure

Table 3. Example of two rating groups, based on the ratings in the within-category condition.

                                            ratings   ratings   P('learning
                                            within    between   subject' | n)
High-rating group   ""                      6.000     5.864     0.727
                    "arithmetic"            5.745     5.661     0.711
                    "chemistry"             5.236     5.492     0.677
                    "pharmacy"              4.655     4.712     0.701
Low-rating group    "Japanese literature"   3.836     3.729     0.717
                    "English"               3.036     4.136     0.699
                    "Chinese"               2.964     3.441     0.801
                    "Hangeul"               2.800     3.407     0.701

The premise and conclusion statements all consisted of a combination of a predicate ("Mr. H likes ~") and an entity (e.g., "curry"), such as "Mr. H likes curry." In the case of negative premises, the predicate involved a negative verbal form, such as "Mr. H doesn't like sports." As shown in Figure 1, the participants were asked to rate the likelihood of the conclusions on a 7-point scale, given the set of three premises presented above the conclusions. For each task list, the two conditions were counterbalanced among the three task sets.

RESULTS

The results for the two conditions were compared in terms of conclusion ratings. The conclusions were divided equally into a high-rating group and a low-rating group based on the rating scores in the within-category condition; Table 3 shows the members of each group. As shown in Figure 2, only the low-rating group differed significantly between the within-category and between-category conditions, whereas the high-rating group did not differ strongly between the two conditions.

Figure 2. Conclusion likelihood ratings (scale 0-7) as a function of the within-category and between-category conditions, plotted separately for the low-rating and high-rating groups defined in the within-category condition (**: p < 0.01). [Bar chart.]

For example, when the negative premise entity is "French", which belongs to the category of learning subjects, the rating of the conclusion entity "English" is significantly lower than when the negative premise is "shopping". However, the ratings of the conclusion entity "chemistry" under the negative premise "French" and under the negative premise "shopping" are not radically different. This suggests that "English" was judged to be similar to "French", and hence its rating with the negative premise "French" was lower than with the negative premise "shopping". As shown in Table 3, membership in the relevant category (P(C|N)) does not in itself yield a simple prediction of the similarity between negative premises and conclusions, which is problematic for the category-based induction theory. In the next section, we explore a solution to such complex similarity judgments within inductive reasoning by constructing models based on the feature-based induction theory.

Construction of the Models

Previous Perceptron Model: This study proposes two types of models. However, because their psychological validity is compared with that of the feature-based perceptron model proposed by Sakamoto et al. (2005), it is appropriate to start with a brief description of that model.

The feature-based perceptron model is an extended version of Sloman's model (Sloman, 1993). It consists of an input layer and one output node, and the weights between the input nodes and the output node are estimated by the usual delta rule, using the feature strengths of the positive and negative premises, which are computed as the conditional probabilities P(A_j | N_i) described above, according to the following formulas:

    W(N_i^{+}) = W(N_{i-1}) + [\,1 - O(N_i^{+})\,] \, I(N_i^{+}) ,   (3)

    W(N_i^{-}) = W(N_{i-1}) + [\,0 - O(N_i^{-})\,] \, I(N_i^{-}) ,   (4)

    W(N_i^{c}) = W(N_{i-1}) + [\,T_i - O(N_i^{c})\,] \, I(N_i^{c}) ,   (5)

    O(N_i) = \frac{W \cdot I(N_i)}{\lVert I(N_i) \rVert^{2}} , \quad N_i \in \{ N_i^{+}, N_i^{-}, N_i^{c} \} ,   (6)

where N_i^{+} is the ith positive premise entity, N_i^{-} is the ith negative premise entity, and N_i^{c} is the ith conclusion entity. W(N_i^{+}), W(N_i^{-}), and W(N_i^{c}) indicate the weights after N_i^{+}, N_i^{-}, and N_i^{c} are encoded as premises or as the conclusion, respectively. T_i denotes the target value for the ith conclusion, obtained from the psychological experiment. W represents the current weights when entity N_i is input, I(N_i) is the feature vector of N_i, whose elements are the values of P(A_j | N_i), and O(N_i) is the activation value of the output node in response to I(N_i). In the actual simulation, the number of vector elements was 20: the feature words most strongly related to the categories involved, including both the positive and the negative premise words, were selected. This selection is based on the assumption that only properties relevant to the context are used for induction (e.g., Shafto, Kemp, Baraff, Coley, & Tenenbaum, 2005).
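The following is a minimal sketch of this perceptron, with the normalized output of formula (6) and the delta-rule updates of formulas (3)-(5). The names, the sweep order over premises and conclusions, and the epoch count are our assumptions, not details fixed by the text.

```python
import numpy as np

def output(w, x):
    """Formula (6): O(N) = (W . I(N)) / |I(N)|^2."""
    return w @ x / (x @ x)

def train_perceptron(pos, neg, conc, targets, n_epochs=100):
    """Delta-rule updates of formulas (3)-(5). Each row of the arrays
    pos, neg, and conc is a 20-element feature vector I(N) holding
    P(A_j | N); targets holds the T_i from the conclusion ratings."""
    w = np.zeros(pos.shape[1])
    for _ in range(n_epochs):
        for x in pos:                       # formula (3): target 1
            w = w + (1.0 - output(w, x)) * x
        for x in neg:                       # formula (4): target 0
            w = w + (0.0 - output(w, x)) * x
        for x, t in zip(conc, targets):     # formula (5): target T_i
            w = w + (t - output(w, x)) * x
    return w
```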
Multilayer Neural Network Model: It is well known that perceptron models cannot solve complex problems, such as linearly inseparable ones. The similarity-based induction processing indicated by the experimental data would appear to be beyond the computational capacity of a perceptron-based model. Accordingly, this study modifies the previous perceptron model to create a multi-layer model. The structure of the multilayer model is shown in Figure 3, and it involves the following formulas:

    o^{k} = \sigma \Bigl( \sum_i W_i f_i^{k} \Bigr) ,   (7)

    f_i^{k} = \sigma \Bigl( \sum_j w_{ij} x_j^{k} \Bigr) ,   (8)

where o^{k} denotes the activation value of the output node when the pattern of the kth conclusion N_k^{c} is input, W_i indicates the weight between the ith middle-layer node and the output node, f_i^{k} represents the activation value of the ith middle-layer node, and x_j^{k} denotes the jth element of the kth input pattern, corresponding to P(A_j | N_k^{c}). The activation strength of the output node o^{k} represents the likelihood of the conclusion. An ordinary sigmoid function was adopted as the activation function σ, and the usual backpropagation method was employed as the learning rule. The premises and the conclusions were used for the learning process, in which the weight parameters are tuned so that the activation value of the output node equals 1 in the case of positive premises, equals 0 in the case of negative premises, and equals the value obtained from the conclusion ratings of the psychological experiment in the case of conclusions. The number of input nodes is 20, the same as for the prior perceptron model. The number of middle-layer nodes is set at 2, to keep the model as simple as possible.
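A sketch of this two-middle-node network with plain backpropagation on squared error is given below. The learning rate, epoch count, weight initialization, and the rescaling of the 7-point conclusion ratings into the unit interval are our assumptions; the paper specifies only the architecture and the targets.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(W, w, x):
    """Formulas (7) and (8): o = sigma(sum_i W_i f_i),
    f_i = sigma(sum_j w_ij x_j)."""
    f = sigmoid(w @ x)            # middle-layer activations, shape (2,)
    return sigmoid(W @ f), f      # output activation o

def train_mlp(patterns, targets, lr=0.5, n_epochs=5000, seed=0):
    """Backpropagation; targets are 1 for positive premises, 0 for
    negative premises, and conclusion ratings rescaled into (0, 1)
    otherwise (the rescaling is an assumption)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.5, size=(2, patterns.shape[1]))  # input -> middle
    W = rng.normal(scale=0.5, size=2)                       # middle -> output
    for _ in range(n_epochs):
        for x, t in zip(patterns, targets):
            o, f = forward(W, w, x)
            d_o = (o - t) * o * (1.0 - o)      # output error term
            d_f = d_o * W * f * (1.0 - f)      # middle-layer error terms
            W -= lr * d_o * f                  # update W_i
            w -= lr * np.outer(d_f, x)         # update w_ij
    return W, w
```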

Figure 3. Structure of the multilayer neural network model. [The output node o^k, representing the likelihood of the conclusion, receives the activations of the two middle-layer nodes f_1^k and f_2^k, which in turn receive the kth feature pattern x_1^k, x_2^k, x_3^k, ..., that is, P(a_1 | n_k^c), P(a_2 | n_k^c), P(a_3 | n_k^c), ....]

Regression Model Based on Similarity Distance: This study also proposes another model of complex induction based on feature similarities, an extension of the Osherson et al. (1991) similarity regression model. In that model, the likelihood of a certain conclusion is computed as a linear summation of two kinds of similarities: the similarities between the conclusion entity and the positive premises, and the similarities between the conclusion entity and the negative premises. These similarities are based on the features. The regression model proposed in this study has greater flexibility than the previous perceptron model and is therefore also capable of simulating human performance on the complex task of induction based on feature similarities.

In this model, the likelihood of a conclusion involving entity N_i^{c}, denoted v(N_i^{c}), is represented as follows:

    v(N_i^{c}) = a \, \mathrm{SIM}_{+}(N_i^{c}) + b \, \mathrm{SIM}_{-}(N_i^{c}) + \mathrm{const} ,   (9)

    \mathrm{SIM}_{+}(N_i^{c}) = \sum_{j=1}^{n^{+}} e^{-\beta d_{ij}^{+}} ,   (10)

    \mathrm{SIM}_{-}(N_i^{c}) = \sum_{j=1}^{n^{-}} e^{-\beta d_{ij}^{-}} ,   (11)

    d_{ij}^{+} = \sum_{k=1}^{m} \bigl( P(A_k \mid N_i^{c}) - P(A_k \mid N_j^{+}) \bigr)^{2} ,   (12)

    d_{ij}^{-} = \sum_{k=1}^{m} \bigl( P(A_k \mid N_i^{c}) - P(A_k \mid N_j^{-}) \bigr)^{2} ,   (13)

where a, b, and const are parameters estimated from the likelihood of the positive premises (defined as the value 7), the likelihood of the negative premises (defined as the value 1), and the likelihood of each conclusion (the value obtained from the experiment). N_j^{+} is the entity of the jth positive premise, and N_j^{-} is the entity of the jth negative premise. SIM_{+}(N_i^{c}) and SIM_{-}(N_i^{c}) are the original similarity functions of this model: SIM_{+}(N_i^{c}) represents the similarity between the conclusion entity N_i^{c} and the positive premise entities, while SIM_{-}(N_i^{c}) denotes the similarity between N_i^{c} and the negative premise entities. β is the only free parameter in these functions. d_{ij}^{+} and d_{ij}^{-} are likewise original word-distance functions based on the feature words (denoted A_k); the number of feature words m is fixed at 20, matching the other models in this study. A different similarity function was used in Osherson et al.'s model, but because it required knowledge about taxonomical categories and about the feature strengths of entities based on human ratings, that function would not allow the extended model to handle the vast quantities of features that change dynamically according to context.
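The following is a minimal sketch of formulas (9)-(13), estimating a, b, and const by ordinary least squares over the conclusions (observed ratings) and the premises anchored at likelihoods 7 and 1, as described above. Treating β as a fixed input is an assumption; the text does not say how it was chosen.

```python
import numpy as np

def sim(x_c, prem, beta):
    """Formulas (10)-(13): sum_j exp(-beta * d_j), where d_j is the
    squared Euclidean distance between the conclusion's feature
    vector and the jth premise's feature vector (elements P(A_k|N))."""
    d = ((prem - x_c) ** 2).sum(axis=1)
    return np.exp(-beta * d).sum()

def fit_regression(conc, ratings, pos, neg, beta=1.0):
    """Formula (9): v = a*SIM+ + b*SIM- + const, fitted by least
    squares; rows of the arrays conc, pos, neg are feature vectors."""
    X, y = [], []
    for x, t in ([(x, r) for x, r in zip(conc, ratings)]
                 + [(x, 7.0) for x in pos]      # positive premise anchor
                 + [(x, 1.0) for x in neg]):    # negative premise anchor
        X.append([sim(x, pos, beta), sim(x, neg, beta), 1.0])
        y.append(t)
    (a, b, const), *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    return a, b, const
```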
Evaluating the Model Simulations according to the Experimental Data

Simulations were executed for all three models. Table 4 shows the correlation coefficients between the simulation results and the results of the psychological experiment in the within-category condition, together with the F ratios (the fitness indices for the models). In the between-category condition, all correlation coefficients were larger than 0.7 and significant at p < 0.01, and all F ratios were also significant at p < 0.05. Considering these results, it is clear that the two models proposed in this study correlate well with both conditions, while the previous model correlates only with the between-category condition. These results indicate that the previous model is not able to simulate the experimental results obtained when the entities in positive and negative premises both belong to the same category.

Table 4. Correlation coefficients of the within-category condition.

                               set 1        set 2        set 3
Regression Model
  correlation coefficient      0.939**      0.841**      0.936**
  F ratio                      30.06**      9.71**       28.58**
Multilayer Model
  correlation coefficient      0.968**      0.816**      0.899**
  F ratio                      135.94**     17.93**      37.93**
Perceptron Model
  correlation coefficient      0.185 n.s.   -0.09 n.s.   0.36 n.s.
  F ratio                      0.32 n.s.    0.08 n.s.    1.43 n.s.

Discussion

The experimental results reported in this study are consistent with the hypothesis that people can estimate the likelihood of a conclusion, even when the entities in both positive and negative premises belong to the same category, based on comparisons of the similarities between the entities in positive premises and conclusions, and between those in negative premises and conclusions. Thus, these results provide verification of this hypothesis. The previous perceptron model, proposed by Sakamoto et al. (2005), was not able to simulate this experimental result.

From the comparisons of the simulation and experimental results, it is clear that the multilayer neural network model and the regression model based on similarity distance both correlated well with the results of the experiment, and that the performance of these models was better than that of the perceptron model. These results indicate that the computational capacity of a perceptron model is not sufficient to handle cases where the induction of the appropriate category is difficult. On the other hand, the fact that the two proposed models both correlated well with the experimental data would seem to imply that two quite different approaches can provide equally adequate accounts of the cognitive mechanisms underlying inductive reasoning. Despite their different theoretical underpinnings, however, the two proposed models can be represented in essentially identical ways in terms of support vector machines (SVMs) (Vapnik, 1995). SVMs are a kind of multilayer neural network that provides solutions to the problems typically associated with multilayer neural networks, such as determining the number of middle-layer nodes and convergence to local minima. To avoid such problems, SVMs map feature patterns onto another dimensional space in which they become linearly separable; the computation of this complex mapping can be achieved implicitly by a nonlinear function known as the kernel function. The kernel function is unconstrained except that it must satisfy the mathematical condition of 'positive definiteness'. Returning to the regression model based on similarity distance proposed in this study, formulas (10) and (11) correspond to a nonlinear mapping of the feature patterns P(A_k | N_i^{c}) and P(A_k | N_j). Moreover, since d_{ij} is the symmetric squared Euclidean distance between these feature patterns, e^{-β d_{ij}} is a symmetric Gaussian (RBF) kernel, which possesses 'positive definiteness' and thus satisfies the condition on kernel functions.
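This kernel reading of formulas (10) and (11) can be checked numerically: the Gram matrix built from e^{-β d} over any set of feature patterns is symmetric with non-negative eigenvalues. A small sketch, with β and the random feature patterns chosen arbitrarily for illustration:

```python
import numpy as np

# Gram matrix K_ij = exp(-beta * |x_i - x_j|^2): the Gaussian (RBF)
# kernel implicit in formulas (10) and (11).
rng = np.random.default_rng(0)
feats = rng.random((30, 20))       # 30 entities, 20 features P(A_k | N)
sq_dist = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(axis=2)
K = np.exp(-1.0 * sq_dist)         # beta = 1.0

print(np.allclose(K, K.T))                     # symmetric: True
print(np.linalg.eigvalsh(K).min() >= -1e-9)    # positive semidefinite: True
```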
This indicates that the regression model can be represented as an SVM, that is, as a multilayer neural network. Consequently, the two models proposed in this study are mathematically equivalent to the extent that, after the nonlinear mapping of the feature patterns, a linear partition, as expressed in formulas (7) and (8), can also be achieved in the case of the regression model. Thus, despite their surface differences, the two proposed models would both appear to be tapping into the same basic cognitive mechanisms underlying the complex processes of inductive reasoning involving feature comparisons. In conclusion, in certain circumstances, people are able to estimate the likelihood of an argument's conclusion through complex processing involving comparisons of feature similarities between the entities in positive premises and the conclusion, and between the entities in negative premises and the conclusion.

Acknowledgments

This study was supported by the Tokyo Institute of Technology 21COE Program, "Framework for Systematization and Application of Large-scale Knowledge Resources". The authors would like to thank Dr. T. Joyce, a post-doc with the 21COE-LKR, for his critical reading of the manuscript and his valuable comments on an earlier draft.

References

Bruner, J. (1959). Inhelder and Piaget's The growth of logical thinking. British Journal of Psychology, 50, 363-370.

Buehner, M. J., & Cheng, P. W. (2005). Causal learning. In K. J. Holyoak & R. G. Morrison (Eds.), The Cambridge Handbook of Thinking and Reasoning. New York, NY: Cambridge University Press.

Hanson, H. M. (1959). Effects of discrimination training on stimulus generalization. Journal of Experimental Psychology, 58, 321-334.

Hofmann, T. (1999). Probabilistic latent semantic indexing. Proceedings of the 22nd International Conference on Research and Development in Information Retrieval: SIGIR '99, 50-57.

Kameya, Y., & Sato, T. (2005). Computation of probabilistic relationship between concepts and their attributes using a statistical analysis of Japanese corpora. Proceedings of the Symposium on Large-scale Knowledge Resources: LKR2005.

Kudo, T., & Matsumoto, Y. (2002). Japanese dependency analysis using cascaded chunking. Proceedings of the 6th Conference on Natural Language Learning: CoNLL 2002, 63-69.

Osherson, D. N., Smith, E. E., Wilkie, O., Lopez, A., & Shafir, E. (1990). Category-based induction. Psychological Review, 97(2), 185-200.

Osherson, D. N., Stern, J., Wilkie, O., Stob, M., & Smith, E. E. (1991). Default probability. Cognitive Science, 15, 251-269.

Pereira, F., Tishby, N., & Lee, L. (1993). Distributional clustering of English words. Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, 183-190.

Sakamoto, K., Terai, A., & Nakagawa, M. (2005). Computational models of inductive reasoning and their psychological examination: Towards an induction-based search engine. Proceedings of the Twenty-Seventh Annual Conference of the Cognitive Science Society.

Shafto, P., Kemp, C., Baraff, L., Coley, J., & Tenenbaum, J. B. (2005). Context-sensitive induction. Proceedings of the Twenty-Seventh Annual Conference of the Cognitive Science Society.

Sloman, S. A. (1993). Feature-based induction. Cognitive Psychology, 25, 231-280.

Vapnik, V. (1995). The Nature of Statistical Learning Theory. New York: Springer.
