SemanticNet-Perception of Human Pragmatics
Amitava Das¹ and Sivaji Bandyopadhyay²
Department of Computer Science and Engineering
Jadavpur University
[email protected]¹ [email protected]²

Proceedings of the 2nd Workshop on Cognitive Aspects of the Lexicon (CogALex 2010), pages 2–11, Beijing, August 2010

Abstract

SemanticNet is a semantic network of lexicons to hold human pragmatic knowledge. So far, Natural Language Processing (NLP) research has relied heavily on manually augmented lexicon resources such as WordNet. But a small set of semantic relations such as hypernym, holonym, meronym and synonym is too narrow to capture the wide variations of human cognitive knowledge, and no such information can be retrieved from the available lexicon resources. SemanticNet is an attempt to capture the wide range of context dependent semantic inference among various themes which human beings perceive in their pragmatic knowledge, learned by day to day cognitive interactions with the surrounding physical world. SemanticNet holds human pragmatics with twenty well established semantic relations for every pair of lexemes. As the relation between every pair cannot be defined by a fixed number of semantic relation labels, contextual semantic affinity inference in SemanticNet can additionally be calculated by network distance and represented as a probabilistic score. SemanticNet is presently being developed for the Bengali language.

1 Historical Motivation

Semantics (from Greek "σημαντικός", semantikos) is the study of meaning, usually in language. The word "semantics" itself denotes a range of ideas, from the popular to the highly technical. It is often used in ordinary language to denote a problem of understanding that comes down to word selection or connotation. We studied various psycholinguistic experiments to understand how human natural intelligence helps in understanding general semantics from nature. Our study was to understand human psychology about semantics beyond language. We were hunting for the intellectual structure of the psychological and neurobiological factors that enable humans to acquire, use, comprehend and produce natural languages. Let us begin with an example of a simple conversation about movies between two persons.

Person A: Have you seen the movie 'No Man's Land'? How is it?
Person B: It is good, but you should see 'The Hurt Locker'.

The conversation may look very casual, but our intention was to find out the direction of the decision logic in Person B's brain. We started digging to find out the nature of human intelligent thinking. A prolonged discussion with Person B revealed that the decision logic path to recommend a good movie was as shown in Figure 1. The highlighted red paths are the shortest semantic affinity distances of the human brain.

We call it semantic thinking. The derivational path of semantic thinking is not as easy as we portray in Figure 1, but we keep it simple for understandability. In fact, a human tries to find the closest semantic affinity node in his pragmatic knowledge by natural intelligence. In the previous example, Person B finds out with his intelligence that No Man's Land is a war movie and won an Oscar award. Oscar awards are generally won by Hollywood movies, and thus Person B starts searching his pragmatic network for a movie that falls into the war genre, is from Hollywood, and may have won an Oscar award. Person B finds the movie The Hurt Locker at a near distance in his pragmatic knowledge network, which is an optimized recommendation that satisfies all the criteria. Noticeably, Person B did not choose the other paths such as Bollywood, foreign movies, etc.

Figure 1: Semantic Thinking

Thus our aim was to develop a computational lexicon structure for semantics as human pragmatic knowledge. We spent a long time finding the most robust structure to represent pragmatic knowledge properly; it should also be easily understandable for the next level of search and usability.

We looked into the literature for work in the direction of our thinking. We found that in 1996 Push Singh and Marvin Minsky argued that the field has shattered into subfields populated by researchers with different goals and who speak very different technical languages. Much has been learned, and it is time to start integrating what we've learned, but few researchers are widely versed enough to do so. They had a proposal for how to do so in their ConceptNet work. They developed lexicon resources like ConceptNet (Liu and Singh, 2004). ConceptNet is a large-scale semantic network (over 1.6 million links) relating a wide variety of ordinary objects, events, places, actions, and goals by only 20 different link types, mined from corpus.

The present task of developing SemanticNet is to capture semantic affinity knowledge of human pragmatics as a lexicon database. We extend our vision from human common sense (as in ConceptNet) to human pragmatics and have proposed semantic relations for every pair of lexemes that cannot be defined by a fixed number of semantic relation labels. Contextual semantic affinity inference in SemanticNet can be calculated by network distance and represented as a probabilistic score. SemanticNet is presently being developed for the Bengali language.

2 Semantic Roles

The study of semantic roles dates back ages to Panini's karaka theory, which assigns generic semantic roles to words in a natural language sentence. Semantic roles are generally domain specific in nature, such as FROM_DESTINATION, TO_DESTINATION, DEPARTURE_TIME etc. Verb-specific semantic roles have also been defined, such as EATER and EATEN for the verb eat. The standard datasets that are used in various English SRL systems are: PropBank (Palmer et al., 2005), FrameNet (Fillmore et al., 2003) and VerbNet (Kipper et al., 2006). These collections contain manually developed, well-trusted gold reference annotations of both syntactic and predicate-argument structures.

PropBank defines semantic roles for each verb. The various semantic roles identified (Dowty, 1991) are agent, patient, theme etc. In addition to verb-specific roles, PropBank defines several more general roles that can apply to any verb (Palmer et al., 2005).

FrameNet is annotated with verb frame semantics and supported by corpus evidence. The frame-to-frame relations defined in FrameNet are Inheritance, Perspective_on, Subframe, Precedes, Inchoative_of, Causative_of and Using. Frame development focuses on paraphrasability (or near paraphrasability) of words and multi-words.

VerbNet is annotated with thematic roles, which refer to the underlying semantic relationship between a predicate and its arguments. The semantic tagset of VerbNet consists of tags such as agent, patient, theme, experiencer, stimulus, instrument, location, source, goal, recipient, benefactive etc.

It is evident from the above discussion that no adequate semantic role set exists that can be defined across various domains. Hence the proposed SemanticNet does not rely only on a fixed set of semantic roles, as ConceptNet does. For semantic relations we followed the 20 relations defined in ConceptNet. Additionally, since the semantic relation between every pair of lexicons cannot be defined by an exact semantic role, we formulated a probabilistic score based technique. Semantic affinity in SemanticNet can be calculated by network distance. Details can be found in Section 8.

3 Corpus

The present SemanticNet has been developed for the Bengali language. Resource acquisition is one of the most challenging obstacles when working with electronically resource constrained languages like Bengali, although Bengali is the sixth¹ most popular language in the world, the second in India, and the national language of Bangladesh.

Another issue made us search for a long time for a proper corpus for the development of SemanticNet. As the notion is to capture and store human pragmatic knowledge, the hypothesis was that the chosen corpus should not be biased towards any specific domain knowledge, as human pragmatic knowledge is not restricted to any domain; rather, it ranges widely over anything related to the universe and life on earth. Additionally, it must be large in size to cover most of the available general concepts related to any topic. After a detailed analysis we decided that it is better to choose a NEWS corpus, as knowledge of various domains such as Politics, Sports, Entertainment, Social Issues, Science, Arts and Culture, Tourism, Advertisement, TV schedule, Tender, Comics and Weather can be found together only in a NEWS corpus.

Fortunately, such a corpus development could be found in (Ekbal and Bandyopadhyay, 2008) for Bengali. We obtained the corpus from the authors. The Bengali NEWS corpus consists of 4 consecutive years of NEWS stories with various sub domains as reported above. For the present task we have used the Bengali NEWS corpus, developed from the archive of a leading Bengali newspaper² available on the Web. The NEWS corpus is quite large in size, as reported in Table 1.

4 Annotation

From the collected document set, 200 documents have been chosen randomly for the annotation task. Three annotators (Mr. X, Mr. Y and Mr. Z) participated in the present task. Annotators were asked to annotate the theme words (topical expressions) which best describe the topical snapshot of the document. The agreement of annotations among the three annotators has been evaluated. The agreement of tag values at the theme word level is reported in Table 2.

Annotators    X vs. Y    X vs. Z    Y vs. Z    Avg
Percentage    82.64%     71.78%     80.47%     78.3%
All Agree     75.45%

Table 2: Agreement of annotators at theme words level

5 Theme Identification

Term Frequency (TF) plays a crucial role in identifying document relevance in Topic-Based Information Retrieval. The motivation behind developing the Theme detection technique is that in many documents relevant words may not