arXiv:2006.06114v2 [cs.AI] 22 Jun 2020

Consolidating Commonsense Knowledge

Filip Ilievski, Pedro Szekely
Information Sciences Institute, University of Southern California, Marina del Rey, CA, USA

Jingwei Cheng, Fu Zhang
School of Computer Science and Engineering, Northeastern University, Shenyang, China

Ehsan Qasemi
Information Sciences Institute, University of Southern California, Marina del Rey, CA, USA

Abstract

Commonsense reasoning is an important aspect of building robust AI systems and is receiving significant attention in the natural language understanding, computer vision, and knowledge graphs communities. At present, a number of valuable commonsense knowledge sources exist, with different foci, strengths, and weaknesses. In this paper, we list representative sources and their properties. Based on this survey, we propose principles and a representation model in order to consolidate them into a Common Sense Knowledge Graph (CSKG). We apply this approach to consolidate seven separate sources into a first integrated CSKG. We present statistics of CSKG, present initial investigations of its utility on four QA datasets, and list lessons learned.

1. Introduction

Capturing, representing, and leveraging commonsense knowledge has been a paramount goal for AI since its early days, cf. [McCarthy, 1960]. In the light of the modern large (commonsense) knowledge graphs and various neural advancements, the DARPA Machine Common Sense program [Gunning, 2018] represents a new effort to understand commonsense knowledge through question-answering evaluation benchmarks. An example of such a question from the SWAG dataset [Zellers et al., 2018] describes a woman who takes a seat at the piano:

Q: On stage, a woman takes a seat at the piano. She:
1. sits on a bench as her sister plays with the doll.
2. smiles with someone as the music plays.
3. is in the crowd, watching the dancers.
-> 4. nervously sets her fingers on the keys.

Realizing that the logical next step is her "nervously setting her fingers on the keys" is out of reach for typical information retrieval strategies, as there is no lexical overlap between the situation and the correct answer. Although language models [Devlin et al., 2018, Liu et al., 2019] capture linguistic patterns that allow them to perform well on many questions, they have no mechanism to fill gaps of knowledge in communication.¹ Filling such gaps requires more complex, situational reasoning, for which the language models need to be enriched with suitable background knowledge, as in [Lin et al., 2019].

¹ Recent work provides evidence that language models, while probably useful and often impressive, are non-robust when applied to semantic tasks, lacking mechanisms to understand plausibility or keep track of evolving states of events and entities [Marcus, 2020]. They struggle with a higher number of inference steps [Richardson and Sabharwal, 2019], role-based event prediction [Ettinger, 2020], as well as numeric, emotional, and spatial inference [Bhagavatula et al., 2019]. On the other hand, systems like KagNet [Lin et al., 2019] and HyKAS [Ma et al., 2019] have managed to enhance language models by combining them with background knowledge from ConceptNet [Speer et al., 2017].

Intuitively, graphs of (commonsense) knowledge contain such background knowledge that humans possess and apply, but machines cannot access or distill directly in communication. A number of such knowledge sources exist today, which presents a unique opportunity for reasoning in downstream tasks. Taxonomies, like WordNet [Miller, 1995], organize conceptual knowledge into a hierarchy of classes. An independent ontology, coupled with rich instance-level knowledge, is provided by Wikidata [Vrandečić and Krötzsch, 2014], a structured version of Wikipedia. FrameNet [Baker et al., 1998], on the other hand, defines an orthogonal structure of frames and roles, each of which can be filled with a WordNet/Wikidata class or instance. Sources like ConceptNet [Speer et al., 2017] or WebChild [Tandon et al., 2017] provide more 'episodic' commonsense knowledge, whereas ATOMIC [Sap et al., 2019a] captures pre- and post-situations for an event. Finally, image description datasets, like Visual Genome [Krishna et al., 2017], contain visual commonsense knowledge.

Considering the above example, ConceptNet's triples state that pianos have keys and are used to perform music, which supports the correct option and discourages answer 2. WordNet states specifically, though in natural language, that pianos are played by pressing keys. According to an image description in Visual Genome, a person could play piano while sitting and having their hands on the keyboard. In natural language, ATOMIC indicates that before a person plays piano, they need to sit at it, be on stage, and reach for the keys. ATOMIC also lists strong feelings associated with playing piano. FrameNet's frame of a performance contains two separate roles for the performer and the audience, meaning that these two are distinct entities, which can be seen as evidence against answer 3.

While these sources clearly provide complementary knowledge that can help commonsense reasoning, their representation formats, principles, and foci are different, making integration difficult. In this paper, we propose an approach for integrating these (and more) sources into a single Common Sense Knowledge Graph (CSKG). We start by surveying existing sources of commonsense knowledge to understand their particularities (section 2). We summarize key challenges and related efforts on consolidating commonsense knowledge in section 3. Based on the survey and the listed challenges, we devise five principles and a representation model for a consolidated CSKG (section 4). In section 5 we apply our approach to build the first version of CSKG, by combining seven complementary, yet disjoint, sources. Here, we also compare the evidence provided by CSKG to that provided by ConceptNet on four commonsense QA datasets. In section 6 we reflect on CSKG and discuss its role in future research. We conclude the paper in section 7.

2. Sources of Common Sense Knowledge

We survey existing commonsense knowledge sources: ConceptNet [Speer et al., 2017], WebChild [Tandon et al., 2017], ATOMIC [Sap et al., 2019a], Wikidata [Vrandečić and Krötzsch, 2014], CEO [Segers et al., 2018], WordNet [Miller, 1995], Roget [Kipfer, 2005], VerbNet [Schuler, 2005], FrameNet [Baker et al., 1998], Visual Genome [Krishna et al., 2017], ImageNet [Deng et al., 2009], and Flickr30k [Plummer et al., 2016].² Table 1 summarizes their content, creation method, size, external mappings, and example resources.

² Labels that refer to the same image object in Flickr30k were clustered by van Miltenburg [2016].

Table 1: Survey of existing sources of commonsense knowledge.

| source | describes | creation | size | mappings | examples |
|---|---|---|---|---|---|
| ConceptNet | everyday objects, actions, states, relations (multilingual) | crowdsourcing | 36 relations, 8M nodes, 21M edges | WordNet, DBpedia, OpenCyc, Wiktionary | /c/en/piano, /c/en/piano/n, /c/en/piano/n/wn, /r/relatedTo |
| WebChild | everyday objects, actions, states, relations | curated automatic extraction | 4 relation groups, 2M nodes, 18M edges | WordNet | hasTaste, fasterThan |
| ATOMIC | event pre/post-conditions | crowdsourcing | 9 relations, 300k nodes, 877k edges | ConceptNet, Cyc | wanted-to, impressed |
| Wikidata | instances, concepts, relations | crowdsourcing | 1.2k relations, 75M objects, 900M edges | various | wd:Q1234, wdt:P31 |
| CEO | event pre/post-conditions | manual | 121 properties, 223 events | FrameNet, SUMO | ceo:Damaging, hasPostSituation |
| WordNet | words, concepts, relations | manual | 10 relations, 155k words, 176k synsets | | dog.n.01, hypernymy |
| Roget | words, relations | manual | 2 relations, 72k words, 1.4M edges | | truncate, antonym |
| VerbNet | verbs, relations | manual | 273 top classes, 23 roles, 5.3k senses | FrameNet, WordNet | perform-v, performance-26.7-1 |
| FrameNet | frames, roles, relations | manual | 1.9k edges, 1.2k frames, 12k roles, 13k lexical units | | Activity, Change of leadership, New leader |
| Visual Genome | image objects, relations, attributes | crowdsourcing | 42k relations, 3.8M nodes, 2.3M edges, 2.8M attributes | WordNet | fire hydrant, white dog |
| ImageNet | image objects | crowdsourcing | 14M images, 22k synsets | WordNet | dog.n.01 |
| Flickr30k | image objects | crowdsourcing | 30k images, 750 objects | | her backyard, red bags |

Primarily, we observe that the commonsense knowledge is spread over a number of sources with different foci: commonsense knowledge graphs (e.g., ConceptNet), general-domain knowledge graphs (e.g., Wikidata), lexical resources (e.g., WordNet, FrameNet), taxonomies (e.g., Wikidata, WordNet), and visual datasets (e.g., Visual Genome). Therefore, these sources together cover a rich spectrum of knowledge, ranging from everyday knowledge, through event-centric knowledge and taxonomies, to visual knowledge. While the taxonomies have been created manually by experts, most of the commonsense and visual sources have been created by crowdsourcing or curated automatic extraction.³ Similarly, commonsense and general knowledge graphs tend to be relatively large, with millions of nodes and edges, whereas the taxonomies and the lexical sources are notably smaller.

³ Commonsense subsets of existing knowledge sources are sometimes also included, e.g., ConceptNet reuses knowledge from Wiktionary and DBpedia.

Despite the diverse nature of these sources, we observe that
