ConceptNet 3: a Flexible, Multilingual Semantic Network for Common Sense Knowledge

Catherine Havasi, Laboratory for Linguistics and Computation, Brandeis University, 415 South Street, Waltham, MA 02454, [email protected]

Robert Speer, Common Sense Computing Group, MIT Media Lab, 20 Ames Street, Cambridge, MA 02139, [email protected]

Jason B. Alonso, Tangible Media Group, MIT Media Lab, 20 Ames Street, Cambridge, MA 02139, [email protected]

Abstract

The Open Mind Common Sense project has been collecting common-sense knowledge from volunteers on the Internet since 2000. This knowledge is represented in a machine-interpretable semantic network called ConceptNet. We present ConceptNet 3, which improves the acquisition of new knowledge in ConceptNet and facilitates turning edges of the network back into natural language. We show how its modular design helps it adapt to different data sets and languages. Finally, we evaluate the content of ConceptNet 3, showing that the information it contains is comparable with WordNet and the Brandeis Semantic Ontology.

Keywords

Knowledge representation, common-sense reasoning, natural language processing

1 Introduction

Understanding language in any form requires understanding the connections among words, concepts, phrases and thoughts. Many of the problems we face today in artificial intelligence depend in some way on understanding this network of relationships, which represents the facts that each of us knows about the world. Researchers have looked for ways to discover such relationships automatically, but automatic methods can miss many basic relationships that are rarely stated directly in corpora. When people communicate with each other, their conversation relies on many basic, unspoken assumptions, and they often learn the basis behind these assumptions long before they can write at all, much less write the text found in corpora.

Grice's theory of pragmatics [5] states that when communicating, people tend not to provide information which is obvious or extraneous. If someone says "I bought groceries", he is unlikely to add that he used money to do so, unless the context made this fact surprising or in question. This means that it is difficult to extract common-sense statements from text automatically, and the results tend to be unreliable and need to be checked by a human. In fact, large portions of current lexical resources, such as WordNet, FrameNet, PropBank, SIMPLE and the BSO, are not collected automatically, but are created by trained knowledge engineers. This sort of resource creation is labor intensive and time consuming.

In 2000, the Open Mind Common Sense project began to collect statements from untrained volunteers on the Internet. Since then, it has amassed over 700,000 pieces of information from both free and structured text entry. This data has been used to automatically build a semantic network of over 150,000 nodes, called ConceptNet. In this paper we introduce ConceptNet 3, its newest version. We then compare the information in ConceptNet to two primarily hand-created lexical resources: the Generative Lexicon-inspired Brandeis Semantic Ontology [13] and WordNet [4].

2 The Open Mind Common Sense Project

The Open Mind Common Sense (OMCS) project serves as a distributed solution to the problem of common sense acquisition, by enabling the general public to enter common sense into the system with no special training or knowledge of computer science. The project currently has 14,000 registered English-language contributors.

OMCS collects data by interacting with its contributors in activities which elicit different types of common sense knowledge. Some of the data is entered free-form, and some was collected using semi-structured frames, where contributors were given sentences and would fill in a word or phrase that completed the sentence.

For example, given the frame "___ can be used to ___.", one could fill in "a pen" and "write", or more complex phrases such as "take the dog for a walk" and "get exercise".

Open Mind Commons [15] is a new interface for collecting knowledge from volunteers, built on top of ConceptNet 3, which allows its contributors to participate in the process of refining knowledge. Contributors can see the statements that have previously been entered on a given topic, and give them ratings to indicate whether they are helpful, correct knowledge or not. Also, Commons uses the existing knowledge on a topic to ask relevant questions. These questions help the system fill in gaps in its knowledge, and also help to show users what the system is learning from the knowledge they enter.

Each interface to OMCS presents knowledge to its users in natural language, and collects new knowledge in natural language as well. In order to use this knowledge computationally, it has to be transformed into a more structured representation.

2.1 The Birth of ConceptNet

ConceptNet is a representation of the Open Mind Common Sense corpus that is easy for a variety of applications to process. From the semi-structured English sentences in OMCS, we are able to extract knowledge into more computable representations. When the OMCS project began using the data set to improve intelligent user interfaces, we began employing extraction rules to mine the knowledge into a semantic network. The evolution of this process has brought us to ConceptNet 3.

In this version of ConceptNet, we focus on the usefulness of the data in the OMCS project to natural language processing and artificial intelligence as a whole. We have aimed to make ConceptNet modular in a way which enables us to quickly and easily make ConceptNets for other data sets such as the Brazilian Open Mind. To support this change of focus, improvements such as higher-order predicates, polarity, and improved weighting metrics have been introduced.

ConceptNet and OMCS are useful in a wide variety of applications where undisambiguated text is used. One example of this is improving the accuracy of speech recognition [8]. ConceptNet can also be used to help an intelligent user interface understand the user's goals and views of the world [9]. For the use of ConceptNet 3 as an evaluative tool, see [6]. An extensive summary of applications using the ConceptNet framework can be found in [10].

2.2 Multilingual Knowledge Collection

In 2005, a sister project to Open Mind Common Sense was established at the Universidade Federal de São Carlos, in order to collect common sense knowledge in Portuguese [2]. The Open Mind Commonsense no Brasil project has now collected over 160,000 statements from its contributors. GlobalMind [3], a project to collect similar knowledge in Korean, Japanese, and Chinese and to encourage users to translate knowledge among these languages and English, was launched in 2006. These projects expand the population that can contribute to Open Mind, and give us the potential to build connections between the knowledge bases of the different languages and study the cultural differences that emerge.

ConceptNet 3 is flexible enough with its natural language tools that it can build ConceptNets for multiple languages and synthesize them into the same representation. We have now done so with the Portuguese corpus, which is the most mature of OMCS' sister projects.

2.3 OMCS and Other Resources

2.3.1 Cyc

The Cyc project [7] is another attempt to collect common sense knowledge. Started by Doug Lenat in 1984, this project utilizes knowledge engineers who hand-craft assertions and place them in Cyc's logical frameworks, using a logical representation called CycL. To use Cyc for natural language tasks, one must translate text into CycL through a complex and difficult process, as natural language is ambiguous while CycL is logical and unambiguous.

2.3.2 WordNet

Princeton University's WordNet [4] is one of the most widely used natural language processing resources today. WordNet is a collection of words arranged into a hierarchy, with each word carefully divided into distinct "senses" with pointers to related words, such as antonyms, is-a superclasses, and words connected by other relations such as part-of. WordNet's popularity may be explained by the ease a researcher has in interfacing it with a new application or system. We have endeavored to accomplish this flexibility of integration with ConceptNet.

2.3.3 BSO

Currently being developed, the Brandeis Semantic Ontology (BSO) [13] is a large lexical resource based on James Pustejovsky's Generative Lexicon (GL) [12], a theory of lexical semantics that focuses on the distributed nature of compositionality in natural language. Unlike ConceptNet, however, the BSO focuses on type structure and argument structure as well as on relationships between words.

An important part of GL is its network of qualia relations that characterize the relationships between words in the lexicon, and this structure is significantly similar to the set of ConceptNet relations. There are four types of qualia relations: formal, the basic type distinguishing the meaning of a word; constitutive, the relation between an object and its parts; telic, the purpose or function of the object; and agentive, the factors involved in the object's origins [12].

We've noticed that these qualia relations line up well with ConceptNet 3 relations. IsA maps well to the formal qualia, PartOf to the constitutive, and UsedFor to the telic. The closest relation in ConceptNet 2 to the agentive relation was the CapableOfReceivingAction relation, but this is too general, as it describes many things that can happen to an object besides how it comes into being. In order to further this GL compatibility, we've added the CreatedBy relation and implemented targeted elicitation frames to collect statements that correspond with the agentive qualia.

3 The Design of ConceptNet 3

In developing ConceptNet 3, we drew on our experience working with ConceptNet as users and observed what improvements would make it easier to work with. The new architecture of ConceptNet is more suitable to being incrementally updated, to being populated from different data sources, and to searching in complex queries such as those that are necessary to discover common-sense analogies. We believe that these improvements make ConceptNet more accessible to a variety of developers of artificial intelligence applications.

    Relation     Example sentence pattern
    IsA          NP is a kind of NP.
    MadeOf       NP is made of NP.
    UsedFor      NP is used for VP.
    CapableOf    NP can VP.
    DesireOf     NP wants to VP.
    CreatedBy    You make NP by VP.
    InstanceOf   An example of NP is NP.
    PartOf       NP is part of NP.
    PropertyOf   NP is AP.
    EffectOf     The effect of VP is NP|VP.

Table 1: Some of the specific relation types in ConceptNet 3, along with an example of a sentence pattern that produces each type

3.1 Concepts

The basic nodes of ConceptNet are concepts, which are aspects of the world that people would talk about in natural language. Concepts correspond to selected constituents of the common-sense statements that users have entered; they can represent noun phrases, verb phrases, adjective phrases, or prepositional phrases (when describing locations). They tend to represent verbs only in complete verb phrases, so "go to the store" and "go home" are more typical concepts than the bare verb "go".

Although they are derived from constituents, concepts are not literal strings of text; a concept can represent many related phrases, through the normalization process described later.

3.2 Predicates

In a semantic network where concepts are the nodes, the edges are predicates, which express relationships between two concepts. Predicates are extracted from the natural language statements that contributors enter, and express types of relationships such as IsA, PartOf, LocationOf, and UsedFor. Our 21 basic relation types are not a closed class, and we plan to add more in the future.

In addition to these specific relation types, there are also some underspecified relation types such as ConceptuallyRelatedTo, which says that a relationship exists between two concepts, but we can't determine from the sentence what it is. Though they are vague, these connections can help to provide information about the context around a concept, and they provide a fallback for cases where the parser is unable to parse a sentence. They are also used in several current applications [10].

Predicates maintain a connection to natural language by keeping a reference to the original sentence that generated them, as well as the substrings of the sentence that produced each of their concepts. This way, if the computer generates a new predicate without human input, like when it forms a hypothesis based on other knowledge, it can follow the example of other predicates to express this new predicate in natural language.
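As an illustration of this data model, here is a minimal sketch in Python. The class and field names are our own invention for exposition, not ConceptNet 3's actual schema; they show how a concept groups many surface phrases and how a predicate keeps its link back to the original sentence.

    from dataclasses import dataclass, field

    @dataclass
    class Concept:
        """A node: all phrases that share one normalized text (see Section 4.2)."""
        normalized: str                # internal key, never displayed to users
        phrases: set = field(default_factory=set)   # e.g. {"a dog", "dogs"}

    @dataclass
    class Predicate:
        """An edge: a typed relation between two concepts."""
        relation: str                  # e.g. "IsA", "UsedFor", "ConceptuallyRelatedTo"
        left: Concept
        right: Concept
        sentence: str                  # the original contributed sentence
        left_text: str                 # substring of `sentence` that produced `left`
        right_text: str                # substring of `sentence` that produced `right`

    dog = Concept("dog", {"a dog", "dogs"})
    animal = Concept("anim", {"an animal", "animals"})
    isa = Predicate("IsA", dog, animal, "Dogs are a kind of animal.", "Dogs", "animal")

Keeping the sentence and its substrings around is what lets a machine-generated predicate be rendered back into natural language by analogy with predicates that came from real sentences.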
3.3 Modular Structure

ConceptNet 3 is built on top of the Common Sense Application Model of Architecture (CSAMOA) [1], a four-layer software design pattern intended to ease the development of common sense applications. By dividing components of common sense reasoning along consistent lines, CSAMOA encourages the development of reusable and interchangeable software components.

The layers of CSAMOA, in order, are the Corpus layer, which preserves the original, human representation of common sense knowledge; the Representation layer, which abstracts the knowledge into a machine-interpretable form; the Realm layer, which helps navigate or performs generic computations on the structure of the machine-interpretable representation; and the Application layer, which is devoted to processing all user interactions and performing all other operations pursuant to the particular application. ConceptNet 3 was developed as a Representation layer for use with OMCS as the Corpus layer.

The use of CSAMOA and its emphasis on modularity represents a major change in the design choices underlying ConceptNet. In particular, we want the various components of ConceptNet, such as the parsing or reasoning components, to be customizable for different applications. For instance, the parsing patterns can be changed to handle different forms of natural language input, and the NLP procedures themselves can be replaced in order to generate a ConceptNet in a language besides English.

The most notable improvements CSAMOA has brought to ConceptNet are in its processing-oriented architecture. ConceptNet's data, data models, and processing code are now clearly separated, which permitted many advances in adding multiple language capabilities and improving the extraction of knowledge from unparsed text. ConceptNet's role in larger applications is also more clearly defined, allowing for the simplification of the code base.
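CSAMOA is a design pattern rather than a concrete API, so the following Python sketch only illustrates how the four layers stack; all class and method names here are hypothetical.

    class CorpusLayer:
        """Preserves the original, human statements (OMCS plays this role)."""
        def sentences(self):
            yield "Dogs are a kind of animal."

    class RepresentationLayer:
        """Abstracts the corpus into machine-interpretable form (ConceptNet 3's role)."""
        def __init__(self, corpus):
            self.corpus = corpus
        def predicates(self):
            for s in self.corpus.sentences():
                yield ("IsA", "dog", "anim", s)   # extraction is sketched in Section 4

    class RealmLayer:
        """Generic computations over the representation, e.g. similarity and analogy."""
        def __init__(self, representation):
            self.representation = representation

    class ApplicationLayer:
        """Handles all user interaction (e.g. the Open Mind Commons web site)."""
        def __init__(self, realm):
            self.realm = realm

    app = ApplicationLayer(RealmLayer(RepresentationLayer(CorpusLayer())))

Because each layer depends only on the one below it, a layer can be swapped out, for example replacing the English corpus and extraction procedures with Portuguese ones, without rewriting the rest of the stack.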

4 Creating ConceptNet

4.1 Pattern Matching

Predicates in ConceptNet are created by a pattern-matching process, as they have been in previous versions [10]. We compare each sentence we have collected with an ordered list of patterns, which are regular expressions that can also include additional constraints on phrase types based on the output of a natural language tagger and chunker. These patterns represent sentence structures that are commonly used to express the various relation types in ConceptNet. Table 1 shows some examples of patterns that express different relations. The phrases that fill the slots in a pattern are the phrases that will be turned into concepts. Many of these patterns correspond to elicitation frames that were presented on the OMCS website for users to fill in; the fact that so many sentences were elicited with predictable sentence structures means that these sentences can be reliably turned into predicates.

Other patterns, such as "NP is a NP", represent sentence structures that contributors commonly used when entering knowledge as free text. For these patterns, the constraints on phrase types (such as NP) imposed by the chunker are particularly important to prevent false matches.

Before a sentence goes through the pattern-matching process, common typographical errors and spelling mistakes are corrected using a simple replacement dictionary. If the sentence is a complex sentence with multiple clauses, we use patterns to extract simpler sentences out of it to run through the process.
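A simplified sketch of this process in Python follows. These regular expressions are illustrative only: the real patterns also constrain the captured phrases to the right phrase types (NP, VP, AP) using a tagger and chunker, which plain regexes cannot express.

    import re

    # Illustrative patterns; order matters, since the first match wins.
    PATTERNS = [
        (re.compile(r"^(?P<a>.+) is a kind of (?P<b>.+)\.$", re.I), "IsA"),
        (re.compile(r"^(?P<a>.+) is used for (?P<b>.+)\.$", re.I), "UsedFor"),
        (re.compile(r"^(?P<a>.+) can (?P<b>.+)\.$", re.I), "CapableOf"),
    ]

    def extract_raw_predicate(sentence):
        """Return (relation, left text, right text) for the first matching pattern."""
        for pattern, relation in PATTERNS:
            match = pattern.match(sentence)
            if match:
                return relation, match.group("a"), match.group("b")
        return None

    print(extract_raw_predicate("A pen is used for writing."))
    # -> ('UsedFor', 'A pen', 'writing')

The two captured strings form a "raw predicate"; the normalization step described next turns them into concepts.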
4.2 Normalization

When a sentence is matched against a pattern, the result is a "raw predicate" that relates two strings of text. The normalization process determines which two concepts these strings correspond to, turning the raw predicate into a true edge of ConceptNet.

The following steps are used to normalize a string:

1. Remove punctuation.

2. Remove stop words.

3. Run each remaining word through a stemmer. We currently use Porter's Snowball stemmer, in both its English and Portuguese versions [11]. (For compatibility with previous work, we use the original version of the English Snowball stemmer, the one commonly called "the Porter stemmer", not the revised version.)

4. Alphabetize the remaining stems, so that the order of content words in the phrase doesn't matter.

A concept, then, encompasses all phrases that normalize to the same text. As normalization often results in unreadable phrases such as "endang plant speci" (from "an endangered species of plant"), the normalized text is only used to group phrases into concepts, never as an external representation. This grouping intentionally lumps together many phrases, even ones that are only related by accidents of orthography, because we have found this to be an appropriate level of granularity for reasoning about undisambiguated natural language text collected from people.
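The four steps above can be sketched directly in Python. This version uses NLTK's stemmers (an assumption on our part, since the paper does not name its implementation) and a deliberately tiny stop word list:

    import string
    from nltk.stem import PorterStemmer                  # original Porter algorithm
    from nltk.stem.snowball import SnowballStemmer

    STOPWORDS = {"a", "an", "the", "of", "to", "is", "for"}   # tiny illustrative list
    STEMMERS = {"en": PorterStemmer(), "pt": SnowballStemmer("portuguese")}

    def normalize(phrase, lang="en"):
        """Map a phrase to the normalized text that names its concept."""
        no_punct = phrase.lower().translate(str.maketrans("", "", string.punctuation))
        words = [w for w in no_punct.split() if w not in STOPWORDS]   # steps 1-2
        stems = [STEMMERS[lang].stem(w) for w in words]               # step 3
        return " ".join(sorted(stems))                                # step 4

    print(normalize("an endangered species of plant"))   # -> "endang plant speci"

Any two phrases that come out of this function identical are treated as the same concept.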

[Figure 1: the number of concepts (log scale) plotted against the number of content words in the concept, for English and Portuguese, with and without singletons]

Fig. 1: The number of words in the texts of concepts after normalization. The "without singletons" lines leave out sporadic concepts that only appear in one predicate, discarding many phrases that are too long to be useful concepts.

4.3 Open Mind Commons

ConceptNet would be nothing without the ability to collect knowledge from contributors on the Internet. The statements that currently comprise ConceptNet were collected from the Open Mind Common Sense web site, which used prompts such as "What is one reason that you would ride a bicycle?" to collect statements of common sense from its users.

Open Mind Commons [15] is an update of the original knowledge-collection website, OMCS 1, built on top of ConceptNet 3. The interface now includes activities that help refine its existing knowledge, by giving feedback to its users about what it already knows and what gaps seem to exist in its knowledge.

This feedback arises from a process that discovers analogies among the existing knowledge in ConceptNet. If concept X and concept Y appear in corresponding places in many equivalent predicates, they are considered to be similar concepts. Then, if concept X appears in a predicate that is not known about concept Y, Open Mind Commons can hypothesize that the same predicate is true for Y, and it can make this inference stronger by finding other similar concepts that lead to the same hypothesis. By following the links to natural language that are maintained in ConceptNet, it can turn the hypothesized predicate into a natural language question, which it asks to a user of the site.

Another kind of question that Open Mind Commons will ask based on analogy is a "fill in the blank" question: if it determines that it doesn't know enough predicates of a certain type about a concept, compared to what it knows about similar concepts, it will ask the user to fill in that predicate. Figure 2 shows Commons asking both kinds of questions about the topic ocean.

Asking questions based on analogies serves to make the database's knowledge more strongly connected, as it eliminates gaps where simply no one had thought to say a certain fact; it also helps to confirm to contributors that the system is understanding and learning from the data it acquires.

[Figure 2: the Open Mind Commons interface asking questions about the topic "ocean"]

Fig. 2: Open Mind Commons asks questions to fill gaps in its knowledge
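The analogy mechanism just described can be sketched as follows. This toy version treats each (relation, side, other concept) slot as a feature of a concept; the real system also weights predicates by score and aggregates evidence from many similar concepts.

    from collections import defaultdict

    def features(predicates):
        """Map each concept to the (relation, side, other concept) slots it fills."""
        feats = defaultdict(set)
        for rel, left, right in predicates:
            feats[left].add((rel, "left", right))
            feats[right].add((rel, "right", left))
        return feats

    def hypotheses(predicates, x, y, min_overlap=2):
        """Predicates known about x but not y, proposed when x and y look similar."""
        feats = features(predicates)
        if len(feats[x] & feats[y]) < min_overlap:   # arbitrary threshold for this sketch
            return []
        return sorted(feats[x] - feats[y])

    preds = [("CapableOf", "dog", "run"), ("CapableOf", "cat", "run"),
             ("IsA", "dog", "pet"), ("IsA", "cat", "pet"),
             ("CapableOf", "dog", "bark")]
    print(hypotheses(preds, "dog", "cat"))
    # -> [('CapableOf', 'left', 'bark')], i.e. ask the user "Can a cat bark?"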

4.4 Reliability of Assertions

In ConceptNet 3, each predicate has a score that represents its reliability. This score comes from two sources so far. A user on Open Mind Commons can evaluate an existing statement and increase or decrease its score by one point. The score can also be implicitly increased when multiple users independently enter sentences that map to the same predicate, and this is where the majority of scores come from so far.

The default score for a statement is 1: it is supported by one person, the person who entered it. Statements with zero or negative scores (because a user has decreased their score) are considered unreliable, and are not used for analogies in Open Mind Commons. Statements with positive scores contribute to analogies with a weight that scales logarithmically with their score.

In general, a significant number of predicates were asserted multiple times; Figure 3 shows the distribution of scores among all these predicates. Surprisingly, although the Portuguese corpus has been around for a shorter time and has fewer predicates, its predicates tend to have higher scores. The fact that all Portuguese statements were entered through structured templates, not through free text, may have caused them to coincide more often.

The highest-scored predicate in the English OMCS is "Dogs are a kind of animal", asserted independently by 101 different users. The highest-scored predicate in Portuguese is "Pessoas dormem quando elas estão com sono" ("People sleep when they are tired"), asserted independently by 318 users.

[Figure 3: the number of predicates (log scale) plotted against the number of times each predicate was asserted, for English and Portuguese]

Fig. 3: The distribution of scores among predicates extracted from Open Mind

4.5 Polarity

In ConceptNet 3, we have introduced the ability to represent negative assertions. This capability allows us to develop interfaces that may ask a question of a user and draw reasonable conclusions when the answer is "no."

To this end, we added a polarity parameter to our predicate models that can assume the values 1 and -1, and we introduced a collection of extraction patterns that mirror most of our other extraction patterns but detect negation, matching sentences that express the negation of one of our relation types. About 1.8% of the English predicates and 4.4% of the Portuguese predicates currently in ConceptNet 3 have a negative polarity.

Importantly, score and polarity are independent quantities. A predicate with a negative polarity can have a high, positive score, indicating that multiple users have attested the negative statement (an example is "People don't want to be hurt"). Predicates with a zero or negative score, meanwhile, are usually unhelpful or nonsensical statements such as "Joe is a cat" or "A garage is for asdfghjkl", not statements that are "false" in any meaningful way.
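A sketch of how score and polarity could be modeled follows. The class and method names, and in particular the exact logarithmic weighting formula, are our own guesses; the paper states only that analogy weight scales logarithmically with score.

    import math

    class ScoredPredicate:
        def __init__(self, relation, left, right, polarity=1):
            self.relation, self.left, self.right = relation, left, right
            self.polarity = polarity      # 1 for "X is Y", -1 for "X is not Y"
            self.score = 1                # supported by the user who entered it

        def reassert(self):
            """Another user independently entered an equivalent sentence."""
            self.score += 1

        def rate(self, delta):
            """An explicit +1 or -1 rating from an Open Mind Commons user."""
            self.score += delta

        def analogy_weight(self):
            if self.score <= 0:           # unreliable: excluded from analogies
                return 0.0
            return math.log(1 + self.score)   # assumed form of the log scaling

Note that polarity never enters the weight: a well-attested negative statement is just as reliable as a well-attested positive one.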

5 Evaluation

The quality of the data collected by OMCS was measured in a 2002 study [14]. Human judges evaluated a random sample of the corpus and gave positive results, judging three quarters of the assertions to be "largely true", over four fifths to be "largely objective and sensible", and 84% "common enough to be known by someone by high school".

Here, we evaluate ConceptNet 3 in a different way: by testing how often its assertions align with assertions in similar lexical resources. The structure of Cyc is not readily aligned with ConceptNet, but WordNet and the BSO both contain information that is comparable to a subset of ConceptNet. In particular, certain ConceptNet relations correspond to WordNet's pointers and the BSO's qualia, as follows:

    ConceptNet   WordNet    BSO
    IsA          Hypernym   Formal
    PartOf       Meronym    Constitutive
    UsedFor      none       Telic

The BSO's fourth qualia type, Agentive, corresponds to the ConceptNet relation CreatedBy, but this relation is new in ConceptNet 3 and we have not yet collected examples of it from the public.

In this evaluation, we examine IsA, PartOf, and UsedFor predicates in ConceptNet, and check whether an equivalent relationship holds between equivalent entries in WordNet and the BSO. The test set consists of all predicates of these types where both concepts normalize to a single word (that is, they each contain one non-stopword), as these are the concepts that are most likely to have counterparts in other resources. Such predicates make up 11.1% of the UsedFor relations, 21.0% of the IsA relations, and 31.2% of the PartOf relations in ConceptNet.

For each predicate, we determine whether there exists a connection between two entries in WordNet or the BSO that have the same normalized form (stem) and the appropriate part of speech (generally nouns, except that the second argument of UsedFor is a verb). This allows us to make comparisons between the different resources despite the different granularities of their entries. If such a connection exists, we classify the predicate as a "hit"; if no such connection exists between the corresponding entries, we classify it as a "miss"; and if no match is possible because a resource has no entries with one of the given stems, we classify it as "no comparison".

The criterion for determining whether "a connection exists" does not require the connection to be expressed by a single pointer or qualia. For example, the only direct hypernym of the first sense of "dog" in WordNet is "canine", but we want to be able to match more general statements such as "a dog is an animal". So instead, we check whether the target database contains the appropriate relation from the first concept to the second concept or to any ancestor of the second concept under the IsA relation (that is, the hypernym relation or the formal qualia). Under this criterion, ConceptNet's (IsA "dog" "anim") matches against WordNet, as "anim" is the Porter stem of "animal", WordNet contains a noun sense of "dog" that has a hypernym pointer to "canine", and a series of hypernym pointers can be followed from "canine" to reach a sense of "animal".
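The hypernym-chain criterion is easy to reproduce against WordNet, for example with NLTK. This is our own sketch, not the evaluation scripts used here, and stemming of the arguments is omitted for clarity:

    from nltk.corpus import wordnet as wn

    def wordnet_isa_hit(word_a, word_b):
        """True if some noun sense of word_a reaches a sense of word_b
        by following hypernym pointers upward."""
        targets = set(wn.synsets(word_b, pos=wn.NOUN))
        for sense in wn.synsets(word_a, pos=wn.NOUN):
            ancestors = set(sense.closure(lambda s: s.hypernyms()))
            if targets & ancestors:
                return True
        return False

    print(wordnet_isa_hit("dog", "animal"))   # True: dog -> canine -> ... -> animal
    print(wordnet_isa_hit("animal", "dog"))   # False: the relation is directional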
There are two major classes of "misses". Sometimes a ConceptNet predicate does not hold in another resource because the ConceptNet predicate is unreliable, vague, or misparsed; on the other hand, sometimes the ConceptNet predicate is correct, and the difference is simply a difference in coverage. We have assessed a sample of 10 misses between ConceptNet and WordNet in Table 3, and between ConceptNet and the BSO in Table 4.

We ran this evaluation independently for IsA, UsedFor, and PartOf predicates, against each of WordNet and the BSO (except that it is not possible to evaluate UsedFor against WordNet). As a control, to show that not too many hits arose from random noise, we also tested "randomized IsA predicates". These predicates were created by making random IsA predicates out of the shuffled arguments of the IsA predicates we tested, so that they would express nonsense statements such as "soy is a kind of peninsula". Indeed, few of these predicates were hits compared to real ConceptNet predicates, even though IsA predicates are the most likely to match by chance. Table 2 presents the results, and Figure 4 charts the success rates for each trial (the ratios of hits to hits plus misses).

    Resource   Type      Hit    Miss   No comparison
    WordNet    IsA       2530   3065   1267
    WordNet    PartOf     653   1344    319
    WordNet    Random     245   5272   1268
    BSO        IsA       1813   2545   2044
    BSO        PartOf      26     49   2241
    BSO        UsedFor    382   1584   3177
    BSO        Random     188   4456   2142

Table 2: The results of the evaluation. A "hit" is when the appropriate concepts exist in the target database and the correct relationship holds between them, a "miss" is when the concepts exist but the relationship does not hold, and "no comparison" is when one or both concepts do not exist in the target database.

    Missed predicate             Reason for difference
    Swordfish is a novel.        unreliable
    Bill is a name.              WordNet coverage
    Sam is a guy.                vague
    (offensive statement)        unreliable
    A gymnasium is a hall.       vague
    Babies are fun.              misparsed
    Newsprint is a commodity.    WordNet coverage
    Biking is a sport.           WordNet coverage
    Cats are predators.          WordNet coverage
    Seeds are food.              WordNet coverage

Table 3: A sample of ConceptNet predicates that do not hold in WordNet, with an assessment of whether the difference comes from unreliable or vague information in ConceptNet or a difference in coverage

    ConceptNet predicate          Reason for difference
    A contest is a game.          BSO coverage
    A spiral is a curve.          BSO coverage
    A robot is a worker.          vague
    A cookie is a biscuit.        BSO coverage; regional
    An umbrella is waterproof.    misparsed
    A peanut is a legume.         BSO coverage
    A hunter is a camper.         BSO coverage
    A clone is a copy.            BSO coverage
    The president is a liar.      unreliable
    People are hairdressers.      unreliable

Table 4: A sample of ConceptNet predicates that do not hold in the BSO, with an assessment of whether the difference is due to unreliable information in ConceptNet or a difference in coverage

[Figure 4: the percentage of hits (0% to 50%) for Random, IsA, PartOf, and UsedFor predicates, against WordNet and the BSO]

Fig. 4: When ConceptNet predicates can be mapped onto relations between WordNet and BSO entries, they match a significant percentage of the time.

A Pearson's chi-square test of independence showed that the difference in the hit vs. miss distribution between the real predicates and the randomly-generated ones is statistically very significant, with p < 0.001 (df = 1) for each relation type. Compared to random predicates, WordNet has χ² = 2465.3 for IsA predicates and χ² = 1112.7 for PartOf predicates; the BSO has χ² = 1834.0 for IsA, χ² = 159.8 for PartOf, and χ² = 414.7 for UsedFor.
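These statistics can be reproduced from the counts in Table 2, for example with SciPy (our sketch, not the original analysis code):

    from scipy.stats import chi2_contingency

    # Hits and misses for WordNet IsA (row 1) vs. randomized IsA (row 2), from Table 2
    table = [[2530, 3065],
             [245, 5272]]
    chi2, p, dof, _ = chi2_contingency(table, correction=False)
    print(round(chi2, 1), dof, p < 0.001)   # -> 2465.3 1 True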

6 Discussion

As a resource, ConceptNet differs from most available corpora in the nature and structure of its content. Unlike free text corpora, each sentence of OMCS was entered by a goal-directed user hoping to contribute common sense, resulting in a wealth of statements that focus on simple, real-world concepts that often go unstated.

Our evaluation has shown that our information frequently overlaps with two expert-created resources, WordNet and the Brandeis Semantic Ontology, on the types of predicates where they are comparable. The goal of ConceptNet is not just to emulate these other resources, though; it also contains useful information beyond what is found in WordNet or the BSO. For example, many "misses" in our evaluation are useful statements in ConceptNet that simply do not appear in the other resources we evaluated it against, such as "sauce is a part of pizza", "a son is part of a family", and "weekends are used for recovery".

In addition, ConceptNet expresses many important types of relations that we did not evaluate here, such as CapableOf ("fire can burn you", "birds can fly"), LocationOf ("you would find a stapler on a desk", "you would find books at a library"), and EffectOf ("an effect of opening a gift is surprise", "an effect of exercise is sweating"). All of these kinds of information are important in giving AI applications the ability to reason about the real world.

References

[1] J. Alonso. CSAMOA: A common sense application model of architecture. In Proceedings of the Workshop on Common Sense and Intelligent User Interfaces, 2007.

[2] J. Anacleto, H. Lieberman, M. Tsutsumi, V. Neris, A. Carvalho, J. Espinosa, and S. Zem-Mascarenhas. Can common sense uncover cultural differences in computer applications? In Proceedings of the IFIP World Computer Conference, Santiago, Chile, 2006.

[3] H. Chung. GlobalMind: bridging the gap between different cultures and languages with common-sense computing. PhD thesis, MIT Media Lab, 2006.

[4] C. Fellbaum, editor. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA, 1998.

[5] P. Grice. Logic and conversation. In Speech Acts. Academic Press, 1975.

[6] C. Havasi. An evaluation of the Brandeis Semantic Ontology. To appear in Proceedings of the Fourth International Workshop on Generative Approaches to the Lexicon, 2007.

[7] D. Lenat. Cyc: A large-scale investment in knowledge infrastructure. Communications of the ACM, 11:33-38, 1995.

[8] H. Lieberman, A. Faaborg, W. Daher, and J. Espinosa. How to wreck a nice beach you sing calm incense. In Proceedings of the 10th International Conference on Intelligent User Interfaces, 2005.

[9] H. Lieberman, H. Liu, P. Singh, and B. Barry. Beating some common sense into interactive applications. AI Magazine, 25(4):63-76, 2004.

[10] H. Liu and P. Singh. ConceptNet: a practical commonsense reasoning toolkit. BT Technology Journal, 2004.

[11] M. F. Porter. Snowball: a language for stemming algorithms. Snowball web site, 2001. http://snowball.tartarus.org/texts/introduction.html, accessed Jan. 31, 2007.

[12] J. Pustejovsky. The Generative Lexicon. MIT Press, Cambridge, MA, 1998.

[13] J. Pustejovsky, C. Havasi, R. Saurí, P. Hanks, and A. Rumshisky. Towards a generative lexical resource: The Brandeis Semantic Ontology. In Proceedings of the Fifth Language Resource and Evaluation Conference, 2006.

[14] P. Singh, T. Lin, E. T. Mueller, G. Lim, T. Perkins, and W. L. Zhu. Open Mind Common Sense: Knowledge acquisition from the general public. In Proceedings of the First International Conference on Ontologies, Databases, and Applications of Semantics for Large Scale Information Systems, Irvine, California, 2002.

[15] R. Speer. Open Mind Commons: An inquisitive approach to learning common sense. In Proceedings of the Workshop on Common Sense and Interactive Applications, 2007.