COLING 2010
23rd International Conference on Computational Linguistics
Proceedings of the 8th Workshop on Asian Language Resources
21-22 August 2010
Beijing, China

Produced by the Chinese Information Processing Society of China
All rights reserved for Coling 2010 CD production.
To order the CD of Coling 2010 and its Workshop Proceedings, please contact:
Chinese Information Processing Society of China
No. 4, Southern Fourth Street
Haidian District, Beijing, 100190
China
Tel: +86-010-62562916
Fax: +86-010-62562916
[email protected]
Preface
Language resources play a central role in statistical and learning-based approaches to natural language processing. Thus, recent research has put great emphasis on building these resources for target languages. Parallel resources across various languages are also being developed for multilingual processing; these include lexica and corpora with multiple levels of annotation. Though significant progress has been achieved in modeling a few of the Asian languages, with the wider spread of ICT use across the region there is growing interest in this field from other linguistic communities. As research in the field matures across Asia, there is a growing need to develop language resources. However, the region is not only short of linguistic resources for the more than 2,200 languages spoken in it; researchers also lack experience in developing such resources. As efforts to develop linguistic resources increase, there is also a need to coordinate them and to develop common frameworks and processes, so that these resources can be used equally effectively by various groups of researchers.
The workshop is organised by the Asian Language Resources Committee (ALRC) of the Asian Federation for Natural Language Processing (AFNLP). Its aims are to chart and catalogue the status of Asian language resources; to investigate and discuss problems related to standards and specifications for creating and sharing language resources at various levels; to promote a dialogue between developers and users of language resources in order to address gaps between available resources and practical applications, and to nurture collaboration in their development and use; and to provide opportunities for researchers from Asia to collaborate with researchers in other regions.
This is the eighth workshop in the series, and it has drawn a representative set of 35 submissions covering Asian languages including Bahasa Indonesia, Chinese, Dzongkha, Hindi, Japanese, Khmer, Sindhi, Sinhala, Thai, Turkish and Urdu, of which 22 were selected for presentation. We would like to thank the authors for their submissions and the Program Committee for their timely reviews. We hope that the ALR workshops will continue to encourage researchers to focus on developing and sharing resources for Asian languages, an essential requirement for research in NLP.
ALR8 Workshop Organizers
Organizers:
Sarmad Hussain, CLE-KICS, UET, Pakistan
Virach Sornlertlamvanich, NECTEC, Thailand
Hammam Riza, BPPT, Indonesia
(on behalf of ALRC, AFNLP)
Program Committee:
Mirna Adriani, Pushpak Bhatacharyya, Francis Bond, Miriam Butt, Thatsanee Charoenporn, Key-Sun Choi, Ananlada Chotimongkol, Jennifer Cole, Li Haizhou, Choochart Haruechaiyasak, Hitoshi Isahara, Alisa Kongthon, Krit Kosawat, Yoshiki Mikami, Cholwich Nattee, Rachel Roxas, Dipti Sharma, Kiyoaki Shirai, Thepchai Supnithi, Thanaruk Theeramunkong, Takenobu Tokunaga, Ruvan Weerasinghe, Chai Wutiwiwatchai, Yogendra Yadava
Table of Contents
A Thesaurus of Predicate-Argument Structure for Japanese Verbs to Deal with Granularity of Verb Meanings Koichi Takeuchi, Kentaro Inui, Nao Takeuchi and Atsushi Fujita ...... 1
Collaborative Work on Indonesian WordNet through Asian WordNet (AWN) Chairil Hakim, Budiono Budiono and Hammam Riza ...... 9
Considerations on Automatic Mapping Large-Scale Heterogeneous Language Resources: Sejong Semantic Classes and KorLex Heum Park, Ae sun Yoon, Woo Chul Park and Hyuk-Chul Kwon ...... 14
Sequential Tagging of Semantic Roles on Chinese FrameNet Jihong LI, Ruibo WANG and Yahui GAO ...... 22
Augmenting a Bilingual Lexicon with Information for Word Translation Disambiguation Takashi Tsunakawa and Hiroyuki Kaji ...... 30
Construction of bilingual multimodal corpora of referring expressions in collaborative problem solving Takenobu Tokunaga, Ryu Iida, Masaaki Yasuhara, Asuka Terai, David Morris and Anja Belz. .38
Labeling Emotion in Bengali Blog Corpus A Fine Grained Tagging at Sentence Level Dipankar Das and Sivaji Bandyopadhyay ...... 47
SentiWordNet for Indian Languages Amitava Das and Sivaji Bandyopadhyay ...... 56
Constructing Thai Opinion Mining Resource: A Case Study on Hotel Reviews Choochart Haruechaiyasak, Alisa Kongthon, Pornpimon Palingoon and Chatchawal Sangkeettrakarn ...... 64
The Annotation of Event Schema in Chinese Hongjian Zou, Erhong Yang, Yan Gao and Qingqing Zeng ...... 72
Query Expansion for Khmer Information Retrieval Channa Van and Wataru Kameyama ...... 80
Word Segmentation for Urdu OCR System Misbah Akram and Sarmad Hussain ...... 88
Dzongkha Word Segmentation Sithar Norbu, Pema Choejey, Tenzin Dendup, Sarmad Hussain and Ahmed Muaz ...... 95
Building NLP resources for Dzongkha: A Tagset and A Tagged Corpus Chungku Chungku, Jurmey Rabgay and Gertrud Faaß ...... 103
Unaccusative/Unergative Distinction in Turkish: A Connectionist Approach Cengiz Acarturk and Deniz Zeyrek ...... 111
A Preliminary Work on Hindi Causatives Rafiya Begum and Dipti Misra Sharma ...... 120
A Supervised Learning based Chunking in Thai using Categorial Grammar Thepchai Supnithi, Chanon Onman, Peerachet Porkaew, Taneth Ruangrajitpakorn, Kanokorn Trakultaweekoon and Asanee Kawtrakul ...... 129
A hybrid approach to Urdu verb phrase chunking Wajid Ali and Sarmad Hussain...... 137
Development of the Korean Resource Grammar: Towards Grammar Customization Sanghoun Song, Jong-Bok Kim, Francis Bond and Jaehyung Yang ...... 144
An Open Source Urdu Resource Grammar Shafqat Mumtaz Virk, Muhammad Humayoun and Aarne Ranta ...... 153
A Current Status of Thai Categorial Grammars and Their Applications Taneth Ruangrajitpakorn and Thepchai Supnithi ...... 161
Chained Machine Translation Using Morphemes as Pivot Language Li Wen, Chen Lei, Han Wudabala and Li Miao ...... 169
Conference Program
Saturday August 21, 2010
Semantics
9:00–9:25 A Thesaurus of Predicate-Argument Structure for Japanese Verbs to Deal with Granularity of Verb Meanings Koichi Takeuchi, Kentaro Inui, Nao Takeuchi and Atsushi Fujita
9:25–9:50 Collaborative Work on Indonesian WordNet through Asian WordNet (AWN) Chairil Hakim, Budiono Budiono and Hammam Riza
9:50–10:15 Considerations on Automatic Mapping Large-Scale Heterogeneous Language Resources: Sejong Semantic Classes and KorLex Heum Park, Ae sun Yoon, Woo Chul Park and Hyuk-Chul Kwon
10:15–10:40 Sequential Tagging of Semantic Roles on Chinese FrameNet Jihong LI, Ruibo WANG and Yahui GAO
10:40–11:00 Coffee Break
Semantics, Sentiment and Opinion
11:00–11:25 Augmenting a Bilingual Lexicon with Information for Word Translation Disambiguation Takashi Tsunakawa and Hiroyuki Kaji
11:25–11:50 Construction of bilingual multimodal corpora of referring expressions in collaborative problem solving Takenobu Tokunaga, Ryu Iida, Masaaki Yasuhara, Asuka Terai, David Morris and Anja Belz
11:50–12:15 Labeling Emotion in Bengali Blog Corpus A Fine Grained Tagging at Sentence Level Dipankar Das and Sivaji Bandyopadhyay
12:15–12:40 SentiWordNet for Indian Languages Amitava Das and Sivaji Bandyopadhyay
12:40–14:10 Lunch Break
Saturday August 21, 2010 (continued)
Opinion and Information Retrieval
14:10–14:35 Constructing Thai Opinion Mining Resource: A Case Study on Hotel Reviews Choochart Haruechaiyasak, Alisa Kongthon, Pornpimon Palingoon and Chatchawal Sangkeettrakarn
14:35–15:00 The Annotation of Event Schema in Chinese Hongjian Zou, Erhong Yang, Yan Gao and Qingqing Zeng
15:00–15:25 Query Expansion for Khmer Information Retrieval Channa Van and Wataru Kameyama
15:25–16:00 Coffee Break
Text Corpus
16:00–16:25 Word Segmentation for Urdu OCR System Misbah Akram and Sarmad Hussain
16:25–16:50 Dzongkha Word Segmentation Sithar Norbu, Pema Choejey, Tenzin Dendup, Sarmad Hussain and Ahmed Muaz
16:50–17:15 Building NLP resources for Dzongkha: A Tagset and A Tagged Corpus Chungku Chungku, Jurmey Rabgay and Gertrud Faaß
17:15–17:40 Unaccusative/Unergative Distinction in Turkish: A Connectionist Approach Cengiz Acarturk and Deniz Zeyrek
Sunday August 22, 2010
Grammars and Parsing
9:00–9:25 A Preliminary Work on Hindi Causatives Rafiya Begum and Dipti Misra Sharma
9:25–9:50 A Supervised Learning based Chunking in Thai using Categorial Grammar Thepchai Supnithi, Chanon Onman, Peerachet Porkaew, Taneth Ruangrajitpakorn, Kanokorn Trakultaweekoon and Asanee Kawtrakul
9:50–10:15 A hybrid approach to Urdu verb phrase chunking Wajid Ali and Sarmad Hussain
10:15–10:40 Development of the Korean Resource Grammar: Towards Grammar Customization Sanghoun Song, Jong-Bok Kim, Francis Bond and Jaehyung Yang
10:40–11:00 Coffee Break
Grammars and Applications
11:00–11:25 An Open Source Urdu Resource Grammar Shafqat Mumtaz Virk, Muhammad Humayoun and Aarne Ranta
11:25–11:50 A Current Status of Thai Categorial Grammars and Their Applications Taneth Ruangrajitpakorn and Thepchai Supnithi
11:50–12:15 Chained Machine Translation Using Morphemes as Pivot Language Li Wen, Chen Lei, Han Wudabala and Li Miao
12:15 Concluding Remarks and Discussion
A Thesaurus of Predicate-Argument Structure for Japanese Verbs to Deal with Granularity of Verb Meanings

Koichi Takeuchi, Okayama University, [email protected]
Kentaro Inui, Tohoku University, [email protected]
Nao Takeuchi, Free Language Analyst
Atsushi Fujita, Future University Hakodate, [email protected]
Abstract

In this paper we propose a framework of verb semantic description in order to organize different granularities of similarity between verbs. Since verb meanings highly depend on their arguments, we propose a verb thesaurus on the basis of possible shared meanings with predicate-argument structure. The motivations of this work are to (1) construct a practical lexicon for dealing with alternations, paraphrases and entailment relations between predicates, and (2) provide a basic database for statistical learning systems as well as for theoretical lexicon studies such as Generative Lexicon and Lexical Conceptual Structure. One of the characteristics of our description is that we assume several granularities of semantic classes to characterize verb meanings. The thesaurus form allows us to provide several granularities of shared meanings; thus, it gives us a basis for further revisions that apply more detailed analyses of verb meanings.

1 Introduction

In natural language processing, dealing with similarities and differences between verbs is essential not only for paraphrasing but also for textual entailment and QA systems, which are expected to extract more valuable facts from massively large texts such as the Web. For example, in a QA system, assuming that the body text says "He lent her a bicycle", the answer to the question "He gave her a bicycle?" should be "No", whereas the answer to "She rented the bicycle?" should be "Yes". Constructing a database of verb similarities and differences thus enables us to deal with detailed paraphrase/non-paraphrase relations in NLP.

From the view of current language resources, how can the shared and different meanings of "He lent her a bicycle" and "He gave her a bicycle" be described? The shared meaning of lend and give in the above sentences is that they are categorized as Giving Verbs, as in Levin's English Verb Classes and Alternations (EVCA) (Levin, 1993), while the different meaning is that lend does not imply ownership of the theme, i.e., the bicycle. One of the problematic issues in describing shared meaning among verbs is that semantic classes such as Giving Verbs depend on the granularity of meaning we assume. For example, the meanings of lend and give in the above sentences are not categorized into the same Frame in FrameNet (Baker et al., 1998). The reason for this different categorization can be considered to be that the granularity of the semantic class of Giving Verbs is larger than that of the Giving Frame in FrameNet[1]. From the view of natural language processing, especially when dealing with the propositional meaning of verbs, all of the above classes, i.e., the wider class of Giving Verbs containing lend and give as well as the narrower Giving Frame containing give and donate, are needed. Therefore, in this work, in order to describe verb meanings with several granularities of semantic classes, a thesaurus form is adopted for our verb dictionary.

[1] We agree with the concept of Frame and FrameElements in FrameNet, but what we propose in this paper is the necessity of granularities of Frames and FrameElements.
Based on this background, this paper presents a thesaurus of predicate-argument structure for verbs on the basis of a lexical decompositional framework such as Lexical Conceptual Structure (Jackendoff, 1990); thus our proposed thesaurus can deal with argument-structure-level alternations such as causative, transitive/intransitive, and stative. Besides, taking a thesaurus form enables us to deal with the shared and differentiated meanings of verbs consistently; e.g., a verb class node for "lend" and "rent" can be described in a more detailed layer under the node for "give".

We constructed this thesaurus for Japanese verbs, and its current status is as follows: we have analyzed 7,473 verb meanings (4,425 verbs) and organized the semantic classes in a five-layer thesaurus with 71 semantic role types. Below, we describe background issues, basic design issues, remaining problems, limitations, and perspectives on applications.

2 Existing Lexical Resources and Drawbacks

2.1 Lexical Resources in English

Several well-considered lexical databases are available for English, e.g., EVCA, Dorr's LCS (Dorr, 1997), FrameNet, WordNet (Fellbaum, 1998), VerbNet (Kipper-Schuler, 2005) and PropBank (Palmer et al., 2005). Besides, there is a research project (Pustejovsky and Meyers, 2005) that aims to find a general descriptional framework of predicate-argument structure by merging several lexical databases such as PropBank, NomBank, TimeBank and the Penn Discourse Treebank.

Our approach corresponds partly to each lexical database (i.e., FrameNet's Frames and FrameElements correspond to our verb classes and semantic role labels, and the way we organize verb similarity classes in a thesaurus corresponds to WordNet's synsets), but it is not exactly the same; namely, there is no lexical database describing several granularities of semantic classes between verbs with arguments. Of course, since the above English lexical databases have links with each other, it is possible to produce a verb dictionary with several granularities of semantic classes with arguments. However, the basic categories for classifying verbs would be a little different due to the different background theory of each English lexical database; it would not be easy to add another level of semantic granularity while keeping consistency across all the lexical databases; thus, a thesaurus form is needed as a core form for describing verb meanings[2].

2.2 Lexical Resources in Japanese

In previous studies, several Japanese lexicons were published. IPAL (IPA, 1986) focuses on morpho-syntactic classes, but IPAL is small[3]. EDR (Jap, 1995) consists of a large-scale lexicon and corpus (see Section 3.4). EDR is a well-considered, wide-coverage dictionary focusing on translation between Japanese and English, but EDR's semantic classes were not designed with linguistically motivated lexical relations between verbs, e.g., alternation, causative, transitive, and detransitive relations. We believe these relations must be key for dealing with paraphrase in NLP.

Recently, Japanese FrameNet (Ohara et al., 2006) and Japanese WordNet (Bond et al., 2008) have been proposed. Japanese FrameNet has currently published fewer than 100 verbs[4]. Japanese WordNet contains 87,000 words and 46,000 synsets; however, there are three major difficulties in dealing with paraphrase relations between verbs: (1) there is no argument information; (2) the many similar synsets force us to solve fine-grained disambiguation between verbs when we map a verb in a sentence to WordNet; (3) basic verbs of Japanese (i.e., highly ambiguous verbs) are wrongly assigned to unrelated synsets, because the synsets were constructed by translation from English to Japanese.

[2] As Kipper (Kipper-Schuler, 2005) showed in examples mapping between VerbNet and WordNet verb senses, most of the mappings are many-to-many relations; this indicates that two verbs grouped in the same semantic type in VerbNet can be categorized into different synsets in WordNet. Since WordNet has neither argument structure nor syntactic information, we cannot determine what the differentiating features between the synsets are.
[3] It contains 861 verbs and 136 adjectives.
[4] We are supplying our database to the Japanese FrameNet project.
3 Thesaurus of Predicate-Argument Structure

The proposed thesaurus of predicate-argument structure can deal with several levels of verb classes on the basis of the granularity of the defined verb meaning. In the thesaurus we incorporate an LCS-based semantic description for each verb class, which can provide several argument structures, in the spirit of construction grammar (Goldberg, 1995). This is a strong advantage for describing the differentiating factors from the view of not only syntactic functions but also internal semantic relations. Thus, this characteristic of the proposed thesaurus makes it a powerful framework for calculating similarity and difference between verb senses. In the following sections we explain the overall design of the thesaurus and its details.

3.1 Design of Thesaurus

The proposed thesaurus consists of a hierarchy of verb classes. A verb class, which is a conceptual class, contains verbs with a shared meaning. A parent verb class includes the concepts of its subordinate verb classes; thus a subordinate verb class is a concretization of its parent verb class. A verb class has a semantic description that is a kind of semantic skeleton inspired by lexical conceptual structure (Jackendoff, 1990; Kageyama, 1996; Dorr, 1997). Thus, a semantic description in a verb class describes the core semantic relations between arguments and shadow arguments for the shared meaning of the verb class. Since verbs can be polysemous, each verb sense is designated with example sentences. Verb senses with a shared meaning are assigned to a verb class. Every example sentence is analyzed into its arguments and semantic role types; the arguments are then linked to variables in the semantic description of the verb class. This means that one semantic description in a verb class can provide several argument structures on the basis of syntactic structure. This architecture is related to construction grammar.

Here we explain this structure using verbs such as rent, lend, give, hire, borrow and lease. We assume that each verb sense we focus on here is designated by example sentences, e.g., "Mother gives a book to her child", "Kazuko rents a bicycle from her friend", and "Taro lends a car to his friend". As Figure 1 shows, all of the above verb senses are involved in the verb class Moving of One's Possession[5]. The semantic description, which expresses the core meaning of the verb class Moving of One's Possession, is

([Agent] CAUSE) BECOME [Theme] BE AT [Goal].

Here the brackets [] denote variables that can be filled with arguments in example sentences; likewise, parentheses () denote an optional factor. "Agent" and "Theme" are semantic role labels that can be annotated to all example sentences.

Figure 1 shows that the children of the verb class Moving of One's Possession are the two verb classes Moving of One's Possession/Renting and Moving of One's Possession/Lending. In the Renting class there are rent, hire and borrow, while in the Lending class there are lend and lease. Both of the semantic descriptions in the children verb classes are more detailed than the parent's description.

[Figure 1: Example of verb classes and their semantic descriptions in parent-children relation; the diagram itself is not recoverable from the source text.]

[5] The name of a verb class reflects the hierarchy of the thesaurus; Figure 1 shows abbreviated verb class names. The full name of the verb class is Change of State/Change of Position (Physical)/Moving of One's Possession.
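To make the design of Section 3.1 concrete, the following sketch is illustrative only: the class and field names are ours, the Renting description is the one quoted below from the paper, and the Lending description is a guessed placeholder, since Figure 1 is not recoverable.

```python
# Illustrative sketch of verb classes carrying LCS-style semantic descriptions,
# following the thesaurus design described above (not the released resource).

class VerbClass:
    def __init__(self, name, description, verbs=None, parent=None):
        self.name = name                # e.g. "Moving of One's Possession/Renting"
        self.description = description  # semantic skeleton with [variables]
        self.verbs = verbs or []        # member verb senses
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

possession = VerbClass(
    "Change of State/Change of Position (Physical)/Moving of One's Possession",
    "([Agent] CAUSE) BECOME [Theme] BE AT [Goal]",
    verbs=["give", "rent", "lend", "hire", "borrow", "lease"],
)

renting = VerbClass(
    "Moving of One's Possession/Renting",
    "([Agent] CAUSE) (BY MEANS OF [Agent] renting [Theme]) "
    "BECOME [Theme] BE AT [Agent]",
    verbs=["rent", "hire", "borrow"],
    parent=possession,
)

lending = VerbClass(  # description below is a placeholder, not taken from the paper
    "Moving of One's Possession/Lending",
    "([Agent] CAUSE) BECOME [Theme] BE AT [Goal:person]",
    verbs=["lend", "lease"],
    parent=possession,
)
```

Each child class refines its parent's skeleton, so a more detailed layer can always be added below an existing node without disturbing the upper levels.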
A semantic description in the Renting class, i.e.,

([Agent] CAUSE) (BY MEANS OF [Agent] renting [Theme]) BECOME [Theme] BE AT [Agent],

describes the semantic relations between "Agent" and "Theme". Since semantic role labels are annotated to all of the example sentences, the variables in the semantic description can be linked to actual arguments in example sentences via the semantic role labels (see Figure 2).

[Figure 2: Linking between semantic description and example sentences; the diagram itself is not recoverable from the source text.]

3.2 Construction of Verb Class Hierarchy

To organize the hierarchical semantic verb classes, we take both a top-down and a bottom-up approach. For the bottom-up approach, we use verb senses defined by a dictionary as the most fine-grained meanings, and we then group verbs that can be considered to share some meaning. As the dictionary, we use the Lexeed database (Fujita et al., 2006), which consists of more than 20,000 verbs with explanations of word senses and example sentences.

For the top-down approach, we take three semantic classes, State, Change of State, and Activity, as the top-level semantic classes of the thesaurus, according to Vendler's aspectual analysis (Vendler, 1967) (see Figure 4). This is because these three classes are useful for dealing with the propositional, and especially the resultative, aspect of verbs. For example, "He threw a ball" can be an Activity and have no special result, but "He broke the door" can be a Change of State, from which we can infer a result, i.e., a broken door. When other verb senses can express the same result, e.g., "He destroyed the door," we would like to regard them as having the same meaning.

We define verb classes in the intermediate hierarchy by grouping verb senses on the basis of aspectual category (i.e., action, state, change of state), argument type (i.e., physical, mental, information), and more detailed aspects depending on the aspectual category. For example, walk the country, travel all over Europe and get up the stairs can be considered to be in the Move on Path class.

The verb class is as essential for dealing with verb meanings as synsets are in WordNet. Even if we have given an incorrect class name, the thesaurus will work well as long as the whole hierarchy keeps the is-a relation, namely, the hierarchy does not contain any multiple inheritance.

The most fine-grained verb class above the individual verb sense is a little wider than alternations. Currently, for the fine-grained verb classes, we are investigating what kinds of differentiated classes can be assumed (e.g., manner, background, presupposition, etc.).

3.3 Semantic Role Labels

The aim of describing the arguments of a target verb sense is (1) to link the same role arguments in a related verb sense and (2) to provide disambiguated information for mapping a surface expression to a verb sense. The Lexeed database provides a representative sentence for each word sense. The sentence is simple, without adjunctive elements such as unessential time, location or method. Thus, a sentence is broken down into subject and object, and semantic role labels are annotated to them (Figure 3).

Figure 3: An example of semantic role labels.
  ex.:    nihon-ga (NOM)   shigen-wo (ACC)   yunyuu-suru
  trans.: Japan            resources         import
  AS:     Agent            Theme

Of course, only one representative sentence may miss some essential arguments; also, we do not know how many arguments are enough. This can be solved by adding examples[6]; however, we consider the semantic role labels of each representative sentence in a verb class as an example of the assumed argument structure of that verb class. That is to say, we regard a verb class as a concept of an event and suppose it to be a fixed argument frame for each verb class. The argument frame is described as compositional relations.

[6] We are currently constructing an SRL-annotated corpus.
[Figure 4: Thesaurus and corresponding lexical decomposition. The figure pairs the hierarchical verb classes (Activity; Change of State, e.g., Change of Position (physical) / Move to Goal; State, e.g., Exist, Swell/Sag, Position, Attribute) with example verbs such as oku (put on), idou-suru (move to), aru (be) and sonzai-suru (exist), test sentences such as Ken-ga hon-wo tana-ni oku (Ken puts a book on a shelf), hon-ga tana-ni idou-suru (A book moves to a shelf) and hon-ga tana-ni aru (A book is on a shelf), and the corresponding lexical decomposition built from CAUSE, ACT ON, BECOME and BE AT super-events and sub-events over arguments such as [Ken]x, [book]y and [shelf]z.]
The principal function of the semantic role label name is to link arguments within a verb class. One exception is the Agent label, which can be a marker discriminating transitive and intransitive verbs. Since the semantic classes of the thesaurus focus on Change of State, transitive alternation cases such as "The president expands the business" and "The business expands" can be categorized into the same verb class. These two examples are then differentiated by the Agent label.

3.4 Compositional Semantic Description

As described in Section 3.1, we incorporate a compositional semantic structure into each verb class to describe syntactically motivated lexical semantic relations and entailed meanings, which will expand the thesaurus. The benefit of the compositional style is that entailed meanings can be linked in a compositional manner. As an example of entailment, Figure 5 shows that the verb class Move to Goal entails the Theme being at the Goal, and this corresponds to the verb class Exist.

[Figure 5: Compositional semantic description. Hierarchical verb classes (activity; change of position (physical) / move to goal; state / exist) are paired with compositional descriptions such as "([A] CAUSE) BECOME [T] BE AT [G]" and "[T] BE AT [G]", whose sub-events partially correspond across classes.]

In this verb thesaurus, differently from previous LCS studies, we try to ensure the compositionality of the semantic description as much as possible by linking each sub-event structure to both a semantic class and example sentences. Therefore, we believe that our verb thesaurus can provide a basic example database for LCS study.

3.5 Intrinsic Evaluation of Coverage

We carried out a manual evaluation of how well the proposed verb thesaurus covers verb meanings in news articles. The results on a Japanese news corpus show that the coverage of verbs is 84.32% (1825/2195) in 1000 sentences randomly sampled from Japanese news articles[7]. In addition, we took 200 sentences and checked whether the verb meanings in the sentences correspond to verb meanings in our thesaurus. The result shows that our thesaurus covers 99.5% of the verb meanings (199 of 200)[8].

[7] Mainichi news articles in 2003.
[8] This evaluation was done by one person. Of course, we need to check this with several annotators and measure inter-annotator agreement.
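The entailment linking of Section 3.4 can be sketched as follows; this is an illustrative toy using the two descriptions quoted above, not the released resource: a class entails another when the other class's description matches one of its sub-events, as in the Move to Goal / Exist pair of Figure 5.

```python
# Illustrative sketch: derive entailment between verb classes by checking
# whether another class's description matches a sub-event of this class.

class_descriptions = {
    "Move to Goal": "([Agent] CAUSE) BECOME [Theme] BE AT [Goal]",
    "Exist": "[Theme] BE AT [Goal]",
}

def sub_events(description):
    # Strip optional parentheses and split on BECOME to expose sub-events.
    core = description.replace("(", "").replace(")", "")
    return [part.strip() for part in core.split("BECOME")]

def entails(source, target, descriptions=class_descriptions):
    return descriptions[target] in sub_events(descriptions[source])

print(entails("Move to Goal", "Exist"))  # True: the result state matches Exist
```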
4 Discussion

4.1 Comparison with Existing Resources

Tables 1 and 2 show a comparison of statistical characteristics with existing resources. In the tables, WM and Exp denote the numbers of word meanings and example sentences, respectively, and SRL denotes the number of semantic role labels.

Table 1: Comparison with English resources

              FrameNet          WordNet           VerbNet
              (Ver 1.3, 2007)   (Ver 3.0, 2006)   (Ver 2.2, 2006)
  Concepts    825 (Frames)      N/A (Synsets)     237 (classes)
  Words       6,100             155,287           3,819
  WM          N/A               117,659           5,257
  Exp         13,500            48,349            N/A
  SRL         746               N/A               23
  POS         V,N,A,Ad          V,N,A,Ad          V
  Lang        E,O               E,O               E

Table 2: Comparison with Japanese resources

              EDR               JWordNet           Our Thesaurus
              (Ver 3.0, 2003)   (Ver 0.92, 2009)   (2008)
  Concepts    430,000           N/A                709 (classes)
  Words       410,000           92,241             4,425
  WM          270,000           56,741             7,473
  Exp         200,000           48,276             7,473
  SRL         28                N/A                71
  POS         all               V,N,A,Ad           V
  Lang        E,J               E,J                J

Looking at the number of concepts, our thesaurus has 709 types of concepts (verb classes), which is similar to FrameNet and more than VerbNet. This seems natural because FrameNet is also a language resource constructed on argument structure. Thanks to our thesaurus format, if we need more fine-grained concepts, we can expand the thesaurus by adding concepts as new nodes at the lowest layer of the hierarchy. As for the number of SRLs, FrameNet has many more types than our thesaurus, while the other resources, VerbNet and EDR, have fewer SRLs than our thesaurus. This comes from different design decisions about semantic role labels. In FrameNet, argument types are differentiated on the basis of the assumed concept, i.e., the Frame; in contrast, we try to merge arguments with the same type of meaning. VerbNet and EDR also define abstracted SRLs; the difference between those resources and our thesaurus is that our SRLs are defined taking into account the roles within the core concept, i.e., the verb class, while the SRLs in VerbNet and EDR do not depend on the verb's class.

Table 2 shows that our thesaurus does not have a large number of registered words and examples compared to EDR and JWordNet. As stated in Section 3.5, the coverage of our verb classes on newspaper articles is high, but we are adding examples by constructing a Japanese corpus annotated with SRLs and verb classes.

4.2 Limitations of the Developed Thesaurus

One of the difficulties of annotating the semantic class of a word sense is that a word sense can be considered as belonging to several semantic classes. The proposed verb thesaurus can handle multiple semantic classes for a verb sense by adding it to several nodes in the thesaurus. However, this does not seem to be the correct approach. For example, what kind of Change of State semantic class should be considered in the following sentence?

a. He took on a passenger.

Assuming that passenger is the Theme, Move to Goal could be possible if we regard the vehicle[9] as the Goal. Under another semantic class, Change State of Container could be possible if we regard the vehicle as a container. Currently, every verb sense is linked to only one semantic class, the one considered most closely related.

[9] The vehicle does not appear in the surface expression, but a vehicle can exist. We currently describe the shadow argument in the compositional description, but it would be hard to prove the existence of a shadow argument.
From the user's side, i.e., when dealing with the propositional meaning of sentence (a.), various meanings should be estimated. Consider the following sentence:

b. Thus, we were packed.

As the semantic class of sentence (a.), Change State of Container could better explain why they are packed in sentence (b.).

The other related issue is how we describe the scope, e.g.,

c. He is hospitalized.

If we take the meaning in a simple manner, Move to Goal can be its semantic class. This can be correct from the view of annotation, but we can guess that he cannot work or that he will have a tough time as following events. FrameNet seems able to deal with this by means of a special type of link between Frames.

Consequently, we think the above issues of semantic class depend on the demands of the application side. Since we do not currently know all of the requirements of NLP applications, it should be sufficient to provide an expandable descriptional framework of linguistically motivated lexical semantics.

4.3 Remaining and Conceivable Ways of Extension

One of the aims of the proposed dictionary is to identify sentences that have the same meaning among different expressions. One of the challenging paraphrase relations is between sentences expressed from different viewpoints. Given the buying and selling in Figure 6, a human can understand that both sentences denote almost the same event from different points of view. This indicates that sentences made by humans usually contain the point of view of the speaker. This is similar to a camera, and we need to normalize the expressions with respect to their original meaning.

[Figure 6: Requirement of normalization to deal with different expressions from different views. The figure depicts an exchange among A, B, C and D ("I sold C to B" / "I bought C from A"; give vs. present), where the difference of standpoint, essentially expressed by the speaker's estimation (camera = person), needs to be normalized.]

We consider that NLP application researchers need to relate these expressions. Logically, if we know that "buy" and "sell" share the meaning of giving and taking things, we can describe their relations with "or" in logical form. Therefore, finding and describing these verb relations will be essential for dealing with the propositional meanings of a sentence.

As a further application, event matching to find a similar situation in Web documents is supposed to be practical and useful. Assume that a user is confronted with the fact that the wireless LAN in the user's PC does not work, and the user wants to search for documents that provide a solution; the problem is that expressions of the situation will differ with the views of individual writers, e.g., "wireless LAN did not work" or "wireless LAN was disconnected". How can we find the same meaning in these expressions, and how can we extract the answers by finding the same situation in FAQ documents? To solve this, a lexical database describing the verb relations between "go wrong" and "disconnect" must be the basis for estimating how similar the expressions are. Therefore, constructing such a lexicon is worthwhile for developing NLP applications.
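A minimal sketch of this event-matching idea follows; the verb-to-class assignments are invented for illustration and are not entries of the thesaurus: two expressions are taken to describe the same situation when their main verbs share a verb class.

```python
# Illustrative sketch: judge whether two event expressions describe the same
# situation by comparing the verb classes of their main verbs.

verb_to_classes = {
    # hypothetical class assignments for the wireless-LAN example in the text
    "not work": {"Change of State/Loss of Function"},
    "go wrong": {"Change of State/Loss of Function"},
    "disconnect": {"Change of State/Loss of Function", "Change of Connection"},
}

def same_event(verb_a, verb_b):
    shared = verb_to_classes.get(verb_a, set()) & verb_to_classes.get(verb_b, set())
    return bool(shared)

print(same_event("not work", "disconnect"))  # True under these assumptions
```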
``I sold C to B’’ b. Thus, we were packed. give ``I bought C from A’’ present As the semantic class of the sentence (a.) difference of Change State of Container could better explain standing point is essentially A C B need to normalize why they are packed in the sentence (b.) expressed by estimation The other related issue is how we describe the D scope, e.g., Figure 6: Requirement of normalization to deal c. He is hospitalized. with different expressions from the different If we take the meaning as a simple manner, Move views. to Goal can be a semantic class. This can be correct from the view of annotation, but we can relations with “or” in logical form. Therefore, guess he cannot work or he will have a tough nding and describing these verb relations will time as following events. FrameNet seems to be be essential for dealing with propositional mean- able to deal with this by means of a special type ings of a sentence. of linking between Frames. For further view of application, event match- Consequently, we think the above issues of se- ing to nd a similar situation in Web documents mantic class should depend on the application is supposed to be a practical and useful appli- side’s demands. Since we do not know all of the cation. Assuming that a user is confronted with requirements of NLP applications currently, then the fact that wireless LAN in the user’s PC does it must be suf cient to provide an expandable de- not work, and the user wants to search for doc- scriptional framework of linguistically motivated uments that provide a solution, the problem is lexical semantics. that expressions of situations must be different 4.3 Remaining and Conceivable Ways of from the views of individual writers, e.g., “wire- Extension less LAN did not work” or “wireless LAN was disconnected”. How can we nd the same mean- One of the aims of the proposed dictionary is to ing in these expressions, and how can we ex- identify the sentences that have the same mean- tract the answers by nding the same situation ings among different expressions. One of the from FAQ documents? To solve this, a lexical challenging paraphrase relations is that the sen- database describing verb relations between “go tences expressed from the different view points. wrong” and “disconnect” must be the base for Given the buying and selling in Figure 6, a hu- estimating how the expressions can be similar. man can understand that both sentences denote Therefore, constructing a lexicon can be worth- almost the same event from different points of while for developing NLP applications. view. This indicates that the sentences made by humans usually contain the point of view of the 5Conclusion speaker. This is similar to a camera, and we need to normalize the expressions as to their original In this paper, we presented a framework of a verb meaning. dictionary in order to describe shared meaning as We consider that NLP application researchers well as to differentiate meaning between verbs need to relate these expressions. Logically, if we from the viewpoint of relating eventual expres- know “buy” and “sell” have shared meanings of sions of NLP. One of the characteristics is that giving and taking things, we can describe their we describe verb relations on the basis of several
Semantic granularity is the basis for how we categorize (or recognize) which semantic class relates to a verb meaning. We also examined the functions and limitations of semantic classes and argument structure from the viewpoint of dealing with paraphrases. That is, the required semantic classes will be highly dependent on applications; thus, the framework of the verb-sense dictionary should be expandable. The proposed verb thesaurus can take several semantic granularities; therefore, we hope the verb thesaurus will be applicable to NLP tasks[10].

In future work, we will continue to organize differentiated semantic classes between verbs and develop a system to identify the same event descriptions.

[10] The proposed verb thesaurus is available at: http://cl.cs.okayama-u.ac.jp/rsc/data/ (in Japanese).

Acknowledgments

We would like to thank NTT Communication Research Laboratory for allowing us to use the Lexeed database, and Professor Koyama for many useful comments.

References

Baker, Collin F., Charles J. Fillmore, and John B. Lowe. 1998. The Berkeley FrameNet project. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics, pages 86–90.

Bond, Francis, Hitoshi Isahara, Kyoko Kanzaki, and Kiyotaka Uchimoto. 2008. Construction of Japanese WordNet from multi-lingual WordNet. In Proceedings of the 14th Annual Meeting of Japanese Natural Language Processing, pages 853–856.

Dorr, Bonnie J. 1997. Large-scale dictionary construction for foreign language tutoring and interlingual machine translation. Machine Translation, 12(4):271–325.

Fellbaum, Christiane. 1998. WordNet: An Electronic Lexical Database. MIT Press.

Fujita, Sanae, Takaaki Tanaka, Francis Bond, and Hiromi Nakaiwa. 2006. An implemented description of Japanese: the Lexeed dictionary and the Hinoki treebank. In COLING/ACL 2006 Interactive Presentation Sessions, pages 65–68.

Goldberg, Adele E. 1995. Constructions. The University of Chicago Press.

IPA: Information-Technology Promotion Agency, Japan. 1986. IPA Lexicon of the Japanese Language for Computers.

Jackendoff, Ray. 1990. Semantic Structures. MIT Press.

Japan Electronic Dictionary Research Institute, Ltd. 1995. EDR: Electric Dictionary, the Second Edition.

Kageyama, Taro. 1996. Verb Semantics. Kurosio Publishers. (In Japanese).

Kipper-Schuler, K. 2005. VerbNet: A Broad-Coverage, Comprehensive Verb Lexicon. Ph.D. thesis, University of Pennsylvania.

Levin, Beth. 1993. English Verb Classes and Alternations. University of Chicago Press.

Ohara, Kyoko Hirose, Seiko Fujii, Toshio Ohori, Ryoko Suzuki, Hiroaki Saito, and Shun Ishizaki. 2006. Frame-based contrastive lexical semantics and Japanese FrameNet: the case of risk and kakeru. In Proceedings of the Fourth International Conference on Construction Grammar. http://jfn.st.hc.keio.ac.jp/ja/publications.html.

Palmer, Martha, Daniel Gildea, and Paul Kingsbury. 2005. The Proposition Bank: an annotated corpus of semantic roles. Computational Linguistics, 31(1):71–105.

Pustejovsky, J., M. Palmer, and A. Meyers. 2005. Merging PropBank, NomBank, TimeBank, Penn Discourse Treebank and coreference. In Proceedings of the Workshop on Frontiers in Corpus Annotation II: Pie in the Sky, pages 5–12.

Vendler, Zeno. 1967. Linguistics in Philosophy. Cornell University Press.
Collaborative Work on Indonesian WordNet through Asian WordNet (AWN)
Hammam Riza, Budiono, Chairil Hakim
Agency for the Assessment and Application of Technology (BPPT), Indonesia
[email protected], [email protected], [email protected]
Abstract

This paper describes collaborative work on developing the Indonesian WordNet within the AsianWordNet (AWN). We describe the method developed for collaborative editing to review and complete the translation of synsets. This work aims to create linkage among Asian languages by adopting the concepts of semantic relations and synsets expressed in WordNet.

1 Introduction

Multilingual lexicons are of foremost importance for intercultural collaboration to take place, as multilingual lexicons underlie several multilingual applications such as machine translation, terminology, and multilingual computing. WordNet is the resource used to identify shallow semantic features that can be attached to lexical units. The original WordNet is the English WordNet proposed and developed at Princeton University, the Princeton WordNet (PWN), built with the help of bilingual dictionaries. WordNet covers the vast majority of nouns, verbs, adjectives and adverbs of the English language. The words are organized in synonym sets called synsets. Each synset represents a concept, and WordNet includes an impressive number of semantic relations defined across concepts. The information encoded in WordNet is used in several stages of the parsing process. For instance, attribute relations, adjective/adverb classifications, and other semantic features are extracted from WordNet and stored together with the words, so that they can be used directly by the semantic parser.

The goal of the Indonesian AWN database management system is to share a multilingual lexical database of the Indonesian language, structured along the same lines as the AWN. AWN is the result of a collaborative effort in creating an interconnected WordNet for Asian languages. AWN provides a free and public platform for building and sharing among its members. A distributed database system and user-friendly tools have been developed for users, so AWN is easy to build and share.

This paper describes the manual interpretation method for the Indonesian part of AWN, based on a web services architecture focusing on cross-lingual distribution. We use a collective intelligence approach to build the English equivalents. In the sequel, Section 2 describes how the collaborating builders work on the web interface at www.asianwordnet.org. Section 3, Interpretation of Indonesian AWN, gives a short description of the progress of the English–Indonesian translation and the obstacles to translation.

2 Collaborative AWN

In the era of globalization, communication among languages becomes much more important. People have been hoping that natural language processing and speech processing can assist in smoothing the communication among people with different languages. However, especially for the Indonesian language, there has been only little research in the past.

The Princeton WordNet is one of the semantically organized English lexical banks containing semantic relationships between words. Concept mapping is a process of organizing concepts and forming meaningful relationships between them.
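As a rough illustration of the kind of record such a collaborative platform manages (the field names and values below are assumptions, not the actual AWN schema), a PWN synset can be paired with candidate Indonesian lemmas contributed and reviewed by the community.

```python
# Illustrative sketch: a PWN synset paired with community-contributed
# Indonesian translations awaiting review (schema is hypothetical).

synset_entry = {
    "pwn_offset": "02958343-n",          # hypothetical synset id for "car"
    "pos": "noun",
    "english_lemmas": ["car", "auto", "automobile", "motorcar"],
    "gloss": "a motor vehicle with four wheels",
    "indonesian_candidates": [
        {"lemma": "mobil", "votes": 3, "status": "approved"},
        {"lemma": "oto", "votes": 1, "status": "pending"},
    ],
}

def approved_translations(entry):
    return [c["lemma"] for c in entry["indonesian_candidates"]
            if c["status"] == "approved"]

print(approved_translations(synset_entry))  # ['mobil']
```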
To build language WordNet there are two main of discussion; the merge approach and the ex- NUM MCF pand approach. The merge approach is to build Mongolia Myanmar the taxonomies of the language (synset) using BPPT NAST English equivalent words from bilingual dictio- Indonesia Lao PDR naries. The expand approach is to map translate NICT AsianWordNet MPP local words the bilingual dictionaries. This ap- Japan Nepal proach show the relation between senses. The system manages the synset assignment accord- VAST UCSC Vietnam Sri Langka ing to the preferred score obtained from the revi- sion process. For the result, the community will TCL Thailand NE C- be accomplish into original form of WordNet database. The synset can generate a cross lan- Fig 1. Collaboration on Asian WordNet guage result. AWN also introduce a web-based collaborative 3 Interpretation of Indonesian AW workbench, for revising the result of synset as- signment and provide a framework to create Indonesian WordNet have been used as a gener- AWN via linkage through PWN synset. AWN al-purpose translation. Our approach was to enables to connect and collaborate among indi- generate the query for the web services engine in vidual intelligence in order accomplish a text English and then to translate every key element files. of the query (topic, focus, keywords) into Indo- At present, there are ten Asian language in the nesian without modifying the query. The dictio- community. The amount of the translated syn- nary is distinguished by set of entry word cha- sets had been increased. Many language have racteristic, clear definitions, its guidance on collaboration in AWN. usage. All dictionary information for entries is • Agency for the Assessment and structured such as entry word, multiple word Application of Technology (BPPT), entries, notes, contemporary definitions, deriva- Indonesia tions, example sentence, idioms, etc. All dictio- • National Institute of Information and nary are implemented as text-files and as lin- Communications Technology (NICT), guistic databases connected to Indonesian AWN. Japan The set of language tags consists of part of speech, case, gender, number, tense, person, • Thai Computational Linguistics voice, aspect, mood, form, type, reflexive, ani- Laboratory (TCL), Thailand mation. • National Electronics and Computer
Technology Center (NECTEC), 3.1 Progress English – Indonesian Thailand • National University of Mongolia Indonesian WordNet is used Word Net Man- (NUM), Mongolia agement System (WNMS) tools developed by • Myanmar Computer Federation (MCF), AsianWordNet to create web services among Myanmar Asia languages based on Princeton WordNet® • National Authority of Science and version 3.0, Co-operation by TCL and BPPT Technology (NAST), Lao PDR establish on October 2007. • Madan Puraskar Pustakalaya (MPP), As presented above, we follow the merge to Nepal create and share the Indonesian WordNet by • University of Colombo School of translating the each synonym translation. We Computing (UCSC), SriLanka expand an appropriate synset to a lexical entry • Vietnamese Academy of Science and by considering its English equivalent. Technology (VAST), Vietnam We plan to have reliable process to create and share Indonesian WordNet in AWN. We classify this work into four person AWN translators to participate in the project of Indonesian AWN.
10 Each person was given a target translator in a An automatic compilation of dictionary in AWN month should reach at least 3000 sense so that have a translational issues. There are many cases the total achievement 12000 senses in a month. in explanation sense. One word in English will From 117.659 senses that there is expected to be be translated into a lot of Indonesian words, completed within 10 months. On the process of glossary can be express more than one Indone- mapping, a unique word will be generated for sian word (Ex. 1). every lexical entry which contain. The grammat- One of the main obstacles in studying the ab- ical dictionaries contain normalized entry word sorption of English words in Indonesian words, with hyphenation paradigm plus grammatical is the fact that the original form of some words tags. that have been removed due to Indonesian reform process, in which some words have been Assignment TOTAL through an artificial process. There is no special March April May sense character in Indonesian word, especially in tech- nical word, so that means essentially the same as Noun 10560 14199 16832 82115 the English word (Ex. 2). verb 6444 6444 6499 13767 Adjective 1392 1392 1936 18156 Ex. 1. time frame Adverb 481 481 488 3621 POS noun time synset time_frame Total 18877 22516 25755 117659 gloss a time period during which Table 1. Statistic of synsets something occurs or is In the evaluation of our approach for synset as- expected to occur; signment, we selected randomly sense from the an agreement can the result of synset assignment to English – In- be reached in donesian dictionary for manually checking. The a reasonably random set cover all types of part-of-speech. short time frame" With the best information of English equivalents Indonesian jangka waktu, marked with CS=5. The word entry must be selang waktu translated into the appropriate words by meaning explanation. Ex. 2. resolution Table 1. presents total assignment translated POS noun phenomenon words into Indonesian for the second third synset resolution month. Following the manual to translate the gloss (computer science) English AWN to Indonesian, we resulted the the number progress of AWN at this time. of pixels per We start to translate or edit from some group of square inch on base type in “By Category”. These base types a computer are based on categories from PWN. There is on- generated display; ly 21.89% ( approved 25,755 of 117,659 senses ) the greater of the total number of the synsets that were able the resolution, the to be assigned to lexical entry in the Indonesian better the picture – English Dictionary. Indonesian resolusi
3.2 Obstacle of Indonesian Translation Using definitions from the WordNet electronic lexical database. A major problem in natural Wordnet has unique meaning of word which is language processing is that of lexical ambiguity, presented in synonym set. Each synset has glos- be it syntactic. Each single words must be con- sary which defines concept its representation. tainer for some part of the linguistic knowledge For examples word car, auto, automobile, and need to ambiguous wordnet sense. Therefore, motorcar has one synset.
11 not only a single heuristic translate Indonesian are selected in the glossaries of each noun, verb, words. The WordNet defined in some semantic and adjectives and its subordinates. relations, this categories using lexicographer file WordNet information, whose objective is to and glossary definitions relations are assigned to build designs for the association between sen- weight in the range. WordNet hierarchy for the tences and coherence relations as well as to find first sense of the word “ empty ” there are 10 lexical characteristics in coherence categories. synset words (I take three of ten) that are related WordNet became an ancillary tool for semantic to the meaning are the following in ( Ex. 3.) ontology design geared to high quality informa- Three concepts recur in WordNet literature that tion extraction from the web services. entail a certain amount of ambiguity : termino- A comparative analysis of trends in wordnet use logical distance, semantic distance and concep- : tual distance. Terminological distance, by con- trast, often appears to refer to the suitability of 1. Support for the design of grammatical the word selected to express a given concept. categories designed to classify Semantic distance is understood to mean the information by aspects and traits, but in contextual factor of precision in meaning. And particular to design and classify the conceptual distance between words, in which semantic ontologies. have relations proved. 2. Basis for the development of audio- visual and multi-media information Ex. 3. empty retrieval systems. POS noun art synset empty 4 Internet Viewer gloss a container that has been The pilot internet service based on Wordnet 3.0 emptied; "return is published at http://id.asianwordnet.org . all empties to the store" 5 Discussion and Conclusion Indonesian hampa Any multilingual process such as cross-lingual empty information must involve resources and lan- POS verb change guage pair. Language specific can be applied in synset empty, discharge parallel to achieve best result. gloss become empty or In this paper we describe manually sharing of void of its Indonesian in the AWN by using dictionaries. content; "The AWN provides a free and public platform for room emptied" building and sharing among AWN. We want Indonesian mengosongkan continue the work defined learning the service matching system. Our future work on AWN will empty focuses in development platform WordNet and POS adjectives all language technology web services. synset empty Although AWN application are going steadily, gloss emptied of emotion; the limitations are: "after the violent argument he felt 1. AWN designed for manual so empty" authenticity can not be a reference. Indonesian kosong, penat 2. Classification was performed manually, Disambiguation is unquestionably the most ab- which means that the reasons and depth undant and varied application. It precision and of classification may not be consistent. relevance in response to a query inconsisten- cies. Schematically the semantic disambiguation
12 References Valenina Balkova, Andrey Suhonogov, Sergey Yablonsky. 2004. Rusian WordNet: From UML-notation to Internet/Infranet Database Implementation. In Porceedings of the Second International WordNet Conference (GWC 2004), Riza, H., Budiono, Adiansya P., Henky M., (2008). I/ETS: Indonesian-English Machine Translation System using Collaborative P2P Corpus, Agency for the Assessment and Application of Technology (BPPT), Indonesia, University of North Texas. Shi, Lei., Rada Mehalcea, (2005), Putting Pieces Together : Combining FrameNet, VerbNet, and WordNet for Robust Semantic Parsing Thoongsup, S., Kergrit Robkop, Chumpol Mokarat, Tan Sinthurahat, (2009). Thai WordNet Construction. Thai Computational Linguistics Lab., Thailand Virach Sornlertlamvanich., The 5th International Conference of the Global WordNet Association (GWC-2010), Mumbai, India , 31st Jan. - 4th Feb., 2010. Fragos, Kostas, Yannis Maistros, Christos Skourlas, (2004). Word Sense Disambiguation using WORDNET relations. Dep. Of Computer Engineering NTUA, Greece. www.asianwordnet.org
Considerations on Automatic Mapping Large-Scale Heterogeneous Language Resources: Sejong Semantic Classes and KorLex
Heum Park, Center for U-Port IT Research and Education, Pusan National University, [email protected]
Aesun Yoon, LI Lab., Dept. of French, Pusan National University, [email protected]
Woo Chul Park and Hyuk-Chul Kwon*, AI Lab, Dept. of Computer Science, Pusan National University, [email protected]
Abstract

This paper presents an automatic mapping method among large-scale heterogeneous language resources: Sejong Semantic Classes (SJSC) and KorLex. KorLex is a large-scale Korean WordNet, but it lacks specific syntactic and semantic information. The Sejong Electronic Dictionary (SJD), whose semantic segmentation depends on SJSC, has much lower lexical coverage than KorLex but provides refined syntactic and semantic information. The goal of this study is to build a rich language resource for improving Korean semantico-syntactic parsing technology. We therefore consider integrating the two and propose an automatic mapping method with three approaches: 1) Information on Monosemy/Polysemy of Word senses (IMPW), 2) Instances between Nouns of SJD and Word senses of KorLex (INW), and 3) Semantically Related words between Nouns of SJD and Synsets of KorLex (SRNS). We obtain good performance using the three approaches combined: recall 0.837, precision 0.717, and F1 0.773.

1 Introduction

While remarkable progress has been made in Korean language engineering at the morphological level during the last two decades, syntactic and semantic processing has progressed more slowly. Syntactic and semantic processing requires 1) linguistically and formally well-defined argument structures with the selectional restrictions of each argument, 2) large and semantically well-segmented lexica, and 3) most importantly, interrelationships between the argument structures and the lexica. A couple of language resources have been developed or can be used for this end. The Sejong Electronic Dictionaries (SJD) for nouns and predicates (verbs and adjectives), along with semantic classes (Hong, 2007), were developed for syntactic and semantic analysis, but the current versions do not contain enough entries for concrete applications, and they show inconsistency problems. A Korean WordNet, named KorLex (Yoon et al., 2009), which was built on Princeton WordNet 2.0 (PWN) as its reference model, can provide means for shallow semantic processing, but it does not contain refined syntactic and semantic information specific to the Korean language. The Korean Standard Dictionary (STD) provides a large number of entries, but it lacks systematic description and formal representation of word senses, like other traditional dictionaries for humans. Given these resources, which were developed through long-term projects (5–10 years), integrating them should result in significant benefits to Korean syntactic and semantic processing.

The primary goal of our recent work, including the work reported in this paper, is to build a language resource which will improve Korean semantico-syntactic parsing technology. We proceed by integrating the argument structures as provided by SJD and the lexical-semantic hierarchy as provided by KorLex. SJD is a language resource in which all word senses are labeled according to Sejong semantic classes (SJSC), and in which selectional re-

* Corresponding Author
14 Proceedings of the 8th Workshop on Asian Language Resources, pages 14–21, Beijing, China, 21-22 August 2010. c 2010 Asian Federation for Natural Language Processing strictions are represented in SJSC as for the propose an automatic mapping method with argument structures of predicates. KorLex is a three approaches from semantic classes of SJD large scale language resource, of which the to synsets of KorLex. In Section 5, we com- lexical-semantic hierarchies and other lan- pare the results of automatic mapping with guage–independent semantic relations between those of manual mapping. In Section 6, we synsets (synonym sets) share with those of draw conclusions and future works. PWN, and of which Korean language specific information comes from STD. The secondary 2 Related Works goal is the improvement of three resources as a Most existing mappings of heterogeneous lan- result of comparing and integrating them. guage resources were conducted manually by In this paper, we report on one of the operat- language experts. The Suggested Upper ing steps toward to our goals. We linked each Merged Ontology (SUMO) had been fully word sense of KorLex to that of STD by hand, linked to PWN. For manual mapping of be- when the former was built in our previous tween PWN and SUMO, it was considered work (Yoon & al. 2009). All predicates in SJD synonymy, hypernymy and instantiation be- were mapped to those of STD on word sense tween synsets of PWN and concepts of SUMO, level by semi-automatic mapping (Yoon, and found the nearest instances of SUMO for 2010). Thus KorLexVerb and KorLexAdj have synsets of PWN. Because the concept items of syntactico-semantic information on argument SUMO are much larger than those of PWN, it structures via this SJD - STD mapping. How- could be mapped between high level concepts ever, the selectional restrictions provided by of PWN and synonymy concepts of SUMO SJD are not useful, if SJSC which represents easily. (Ian Niles et al 2003). Dennis Spohr the selectional restrictions in SJD is not linked (2008) presented a general methodology to to KorLex. We thus conduct two mapping me- mapping EuroWordNet to the SUMO for ex- thods between SJSC and upper nodes of Kor- traction of selectional preferences for French. LexNoun: 1) manual mapping by a PH.D in Jan Scheffczyk et al. (2006) introduced the computational semantics (Bae & al. 2010), and connection of FrameNet to SUMO. They pre- 2) automatic mapping. This paper reports the sented general-domain links between Frame- latter. Reliable automatic mapping methods Net Semantic Types and SUMO classes in among heterogeneous language resources SUOKIF and developed a semi-automatic, should be considered, since the manual map- domain-specific approach for linking Frame- ping among large-scale resources is a very Net Frame Elements to SUMO classes time and labor consuming job, and might lack (Scheffczyk & al. 2006). Sara Tonelli et al. consistency. Less clean resources are, much (2009) presented a supervised learning frame- harder and more confusing manual mapping is. work for the mapping of FrameNet lexical In this paper, we propose an automatic map- units onto PWN synsets to solve limited cover- ping method of those two resources with three age of semantic phenomena for NLP applica- approaches to determine mapping candidate tions. Their best results were recall 0.613, pre- synsets of KorLex to a terminal node of SJSC: cision 0.761 and F1 measure 0.679. 
1) using information of monosemy/polysemy Considerations on automatic mapping me- of word senses, 2) using instances between thods among language resources were always nouns of SJD and word senses of KorLex, and attempted for the sake of efficiency, using si- 3) using semantically related words between milarity measuring and evaluating methods. nouns of SJD and word senses of KorLex. We Typical traditional evaluating methods be- compared the results of automatic mapping tween concepts of heterogeneous language re- method with three approaches with those of sources were the dictionary-based approaches manual mapping aforementioned. (Kozima & al 1993), the semantic distance In the following Section 2, we discuss re- algorithm using PWN (Hirst & al 1998), the lated studies concerning language resources scaling method by semantic distance between and automatic mapping methods of heteroge- concepts (Sussna 1997), conceptual similarity neous language resources. In Section 3, we between concepts (Wu & al 1994), the scaled introduce KorLex and SJD. In Section 4, we
semantic similarity between concepts (Leacock Word Synsets Word
1998), the semantic similarity between con- Forms Trans Total Senses cepts using IS-A relation (Resnik 1995), the KorLexNoun 89,125 79,689 90,134 102,358 measure of similarity between concepts (Lin KorLexVerb 17,956 13,508 16,923 20,133 1998), Jiang and Conrath’s (1997) similarity KorLexAdj 19,698 18,563 18,563 20,905 computations to synthesize edge and node KorLexAdv 3,032 3,664 3,664 3,123 based techniques, etc. KorLexClas 1,181 - 1,377 1,377 Satanjeev et al. (2003) presented a new Total 130,992 115,424 130,661 147,896 measure of semantic relatedness between con- Table 1. Product of KorLex 1.5 cepts that was based on the number of shared words (overlaps) in their definitions (glosses) KorLexNoun includes 25 semantic domains for word sense disambiguation. The perfor- with 11 unique beginners with maximum 17 mances of their extended gloss overlap meas- levels in depth and KorLexVerb includes 15 ure with 3-word window were recall 0.342, semantic domains with 11 unique beginners precision 0.351 and F1 0.346. Siddharth et al. with maximum 12 levels in depth. Basically, (2003) presented the Adapted Lesk Algorithm KorLex synsets inherit the semantic informa- to a method of word sense disambiguation tion of PWN synsets mapped to them. The based on semantic relatedness. In addition, synset information of PWN consists of synset Alexander et al (2006) introduced the 5 exist- ID, semantic domain, POS, word senses, se- ing evaluating methods for PWN-based meas- mantic relations, frame information, and so on. ures of lexical semantic relatedness and com- We linked each word sense of KorLex 1.5 to pared the performance of typical five measures that of STD by hand, when the former was of semantic relatedness for NLP applications built in our previous work (Yoon & al. 2009). and information retrieval. Among them, Jiang- STD includes 509,076 word entries with about Conrath’s method showed the best perfor- 590,000 word senses. It contains a wide cover- mances: precision 0.247, recall 0.231 and F1 age for general words and a variety of example 0.211 for Detection. sentences for each meaning. More than 60% of In many studies, it was presented a variety word senses in KorLex 1.5 are linked to those of the adapted evaluating algorithms. Among of STD. KorLex 1.5, thus, inherits lexical rela- them, Jiang-Conrath’s method, Lin’s the tions described in STD, but both resources lack measure of similarity and Resnik’s the seman- refined semantic-syntactic information. tic similarity show good performances (Alex- 3.2 Sejong Electronic Dictionary ander & al 2006, Daniele 2009). SJD was developed during 1998-2007 manual- 3 Language resources to be mapped ly by linguists for a variety of Korean NLP application as a general-purpose machine read- 3.1 KorLex 1.5 able dictionary. Based on Sejong semantic KorLex 1.5 was constructed from 2004 to classes (SJSC), approximately 25,000 nouns 2007. Different from its previous version and 20,000 predicates (verbs and adjectives, (KorLex 1.0) which preserves all semantic SJPD) contain refined syntactic and semantic relations among synsets of PWN, KorLex 1.5 information. modifies them by deletion/correction of SJSC is a set of hierarchical meta-languages existing synsets, addition of new synsets and classifying word senses and it includes 474 conversion of hierarchical structure. Currently, terminal nodes and 139 non-terminal nodes, KorLex includes nouns, verbs, adjectives, and 6 unique beginners. 
Each unique beginner adverbs and classifiers: KorLexNoun, has levels from minimum 2 to maximum 7 le- KorLexVerb, KorLexAdj, KorLexAdv and vels in depth. Sejong Noun Dictionary (SJND) KorLexClas, respectively. Table 1 shows the contains 25,458 entries and 35,854 word size of KorLex 1.5, in which ‘Trans’ means the senses having lexical information for each en- number of synsets translated from PWN 2.0 try: semantic classes of SJSC, argument struc- and ‘Total’ is the number of manually added tures, selectional restrictions, semantically re- synsets including translated ones. lated words, derivatioinal relations/words et al.
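Before turning to the mapping method, it may help to fix the small amount of information that the two resources actually contribute to it: KorLex supplies synsets with their word senses, hierarchy links and STD examples, while SJND supplies noun entries with a Sejong semantic class, instances and semantically related words. The sketch below is illustrative only; the class and field names are invented for exposition and do not reproduce the released formats of KorLex, SJND or STD.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class KorLexSynset:
    """A KorLex noun synset, reduced to the fields the mapping method consults."""
    synset_id: str                          # e.g. '03281101'
    word_senses: List[str]                  # lemma/sense labels belonging to the synset
    hypernym: Optional[str] = None          # parent synset id in the KorLexNoun hierarchy
    hyponyms: List[str] = field(default_factory=list)
    std_examples: List[str] = field(default_factory=list)   # STD examples linked to the senses

@dataclass
class SJNDEntry:
    """A Sejong Noun Dictionary entry, reduced to the fields used for mapping."""
    lemma: str
    semantic_class: str                     # terminal node of SJSC the noun is classified under
    instances: List[str] = field(default_factory=list)       # example phrases for the noun
    related_words: List[str] = field(default_factory=list)   # synonyms/hypernyms/hyponyms/antonyms
```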
16 sets and mapping synsets among candidates. Finally, to link each semantic class of SJSC to all lower-level synsets of LUB synsets. 4.1 Finding matched word senses between synsets and nouns of SJND For a semantic class of SJSC, we first find Figure 1. Correlation of lexical information word senses and synsets from KorLex that among SJND, SJPD and SJSC matched with nouns of SJND classified to that semantic class. Figure 2 shows the matched Figure 1 shows the correlation of lexical in- word senses and synsets between nouns of formation among SJND, SJPD and SJSC. Cer- SJND, then synsets of KorLex for a semantic tainly, that information of SJD should be ap- class. The left side of Figure 2 shows nodes of plied to a variety of NLP applications: infor- semantic classes with hierarchical structure mation retrieval, text analysis/generation, ma- and the center box shows the matched words chine translations, and various studies and (bold ones) among nouns of SJND with word educations. However, SJD has much lower senses of synsets in KorLex, and the right side lexical coverage than KorLex. More serious shows matched word senses and synsets in problem is that SJND and SJPD are still noisy: KorLex’ hierarchical structure. internal consistency inside each dictionary and external interrelationship between SJND, SJPD, and SJSC need to be ameliorated, as indicated by dot line in Fig. 1. 4 Automatic Mapping from Semantic Class of SJSC to Synsets of KorLex KorLex and SJSC have different hierarchical structures, grain sizes, and lexical information as aforementioned. For example, the semantic Figure 2. Matched word senses and synsets classes of SJSC are much bigger concepts in with nouns of SJND for a semantic class grain size than the synsets of KorLex: 623 For example, a semantic class concepts in SJSC vs.130,000 synsets in Kor- ‘Atmospheric Phenomena’ (rectangle in the Lex. Determining their semantic equivalence left) has nouns of SJND (words in the center), thus needs to be firmly based on linguistic the bold words are the matched words with clues. word senses of synsets from KorLex, and the Using following 3 linguistic clues that we underlined synsets of the right side are the found, we propose an automatic mapping me- matched ones and synset IDs in KorLex. The thod from semantic classes of SJSC to synsets notations for automatic mapping process be- of KorLex with three approaches to determine tween semantic classes of SJSC and synsets of mapping candidate synsets: 1) Information of KorLex are as follows: noun of SJND is ns, Monosemy/Polysemy of Word senses (IMPW), matched noun ns , un-matched ns , semantic 2) Instances between Nouns of SJD and Word m u class of SJSC is sc, synset is ss and word sense senses of KorLex (INW), and 3) Semantically of a synset is ws in KorLex, and monosemy Related words between Nouns of SJD and word is w and polysemy word is w . Synsets of KorLex (SRNS). mono poly A semantic class sc has nouns ns of SJND For automatic mapping method, following having matched noun ns and un-matched ns processes were conducted. First, to find word m u by comparing with word senses ws of a synset senses of synsets that matched to nouns of ss in KorLex. Thus a synset has word senses as SJND for each semantic class. Second, to se- ss ={ws , ws , …, ws }={ ns , ns , …, ns , lect mapping candidate synsets among them 1 1 2 n m1 m2 mk ns , ns , …}. And nouns of SJND for a with three approaches aforementioned. 
Third, u k+1 u k+2 semantic class sc is presented ns(sc )={ns , to determine the least upper bound (LUB) syn- 1 1 m1
17 nsm2, …, nsmk, nsu k, nsu k+1, …}. Therefore, we stances between nouns of SJND and word can find the matched word senses nsm1 ~ nsmk senses of a synset. As for KorLex, we used the for a semantic class sc from nouns of SJND examples of STD linked to word senses of and word senses of a synset ss in KorLex. KorLex. Figure 3 shows instances of STD and SJND for a word sense ‘Apple’. 4.2 Selecting Mapping Candidate Synsets Using those matched synsets and word senses, we select mapping candidate synsets with three different approaches. 4.2.1 Using Information of Monosemy and Polysemy of KorLex
Using information of monosemy/polysemy of Figure 3. Instances of STD and SJND for word senses of a synset, the first approach eva- luates mapping candidate synsets. The candi- word sense ‘Apple’ date synsets are evaluated into three catego- We reformulated the Lesk algorithm (Lesk ries: mc(A) is a most relevant candidate synset, 1987, Banerjee and Pedersen 2002) for com- mc(B) is a relevant candidate synset and mc(C) paring instances and evaluating mapping can- is a reserved synset. Evaluation begins from didate synsets. The process of evaluating map- lowest level synsets to top-level beginner. The ping candidate synsets is as follows. process of first approach is as follows. 1) To compare instances of a noun ns of 1) For a synset which contains a single word SJND with examples of a word of STD sense, ss={ws1}, if the word sense is a mo- linked to word sense ws of a synset ss, and nosemy, it is categorized as a a candidate to compute the Relatedness-A(ns, ws) = synset mc(A). If it is a polysemy, categori- score(instance(ns), example(ws)). zation is postponed for evaluating related- 2) To compare all nouns ns of SJND for a ness among siblings: candidate mc(C). semantic class with all nouns in instances 2) In the case of a synset having more than of STD linked to word senses ws, and to one word sense, ss={ws1, ws2, …}, if the compute the Relatedness-B(ns, ws) = matched words nsm among word senses of a score(∀ns, nouns(example(ws))). synset are over 60%: Pss(ws)=(count(nsm)/ 3) If Relatedness-A(ns, ws) ≥ λ and Related- count(ws)) ≥ 0.6, we evaluate whether that 1 ness-B(ns, ws) ≥ λ2, a synset is evaluated as synset is mapping candidate in the next step. a candidate synset mc(A). If either Related- 3) If all matched words nsm of a synset are ness-A(ns, ws) ≥ λ1 or Relatedness-B(ns, monosemic, we categorize it as a candidate ws) ≥ λ , evaluated as a candidate synset synset mc(A). If monosemic words among 2 mc(B). When threshold λ1 and λ2 were 1~4, matched words are over 50%: Pss(wmono| we had good performances. nsm) ≥ 0.5, it is evaluated as a mc(B). A 4) To repeat from step 1) to 3) for all of syn- synset containing polysemies over 50%: sets, in order to determine mapping candi- Pss(wpoly|nsm) ≥ 0.5, categorization is post- date synsets. And then, to construct hierar- poned for evaluating relatedness among chical structure for all those synsets. siblings: candidate synset mc(C). 4) To repeat from step 1) to 3) for all of syn- 4.2.3 Using Semantically Relatedness be- sets, in order to evaluate mapping candidate tween Nouns of SJND and Synsets of synsets. And then, to construct hierarchical KorLex structure for all those synsets. The third approach is to evaluate mapping 4.2.2 Using Instances between Nouns of candidate synsets using comparison of seman- SJND and Word senses of KorLex tic relations and their semantically related words between a noun of SJND and word The second approach is to evaluate mapping senses of a synset. To compute the relatedness candidate synsets using comparison of in- between them, we reformulated the computa-
18 tional formula of relatedness based on the Lesk mc(C) selected from the processes of “4.2 algorithm (Lesk 1987, Banerjee & al 2002). Select Mapping Candidate Synsets”, to de- The process of evaluating mapping candidate termine whether it is a LUB or not and fi- synsets is as follows. nal mapping synsets. 2) Among sibling synsets, if the ratio of 1) To compare semantically related words: count(mc(A)) to count(mc(A)+mc(B)+ between synonyms, hypernyms, hyponyms mc(C)) is over 60%, the parent synset of and antonyms of a noun of SJND and those siblings is evaluated as a candidate synset of a synset of KorLex. To compute the Re- mc(A) and as a LUB. latedness-C(ns, ss) = score (relations(ns), 3) If the ratio of count(mc(A)+mc(B)) to relations(ss)). count(mc(A)+mc(B)+mc(C)) is over 70%, 2) To compare all nouns ns of SJND for a the parent of siblings is evaluated as a can- semantic class with synonyms, hypernyms didate synset mc(A) and as a LUB. If the and hyponyms of a synset of KorLex, and ratio of count(mc(A)+mc(B)) to count compute the Relatedness-D(ns, ss) = score (mc(A) +mc(B)+mc(C)) is between 50% (∀ns, relations(ss)). and 69%, the parent of siblings is evaluated 3) If Relatedness-C(ns, ss) ≥ λ3 and Related- as a candidate synset mc(B) and as a LUB. ness-D(ns, ss) ≥ λ4, a synset is evaluated as 4) And if the others, to stop finding LUB for a candidate synset mc(A). If either Related- that synset and to determine final mapping ness-C(ns, ss) ≥ λ3 or Relatedness-D(ns, ss) synsets with its own level of candidate. ≥ λ4, evaluated as a candidate synset mc(B). 5) To repeat from step 1) to 4) until finding When threshold λ3 and λ4 were 1~4, we LUB synsets and final mapping synsets. have good performances. 4) To repeat from step 1) to 3) for all of syn- sets, in order to determine mapping synsets. And then, to construct hierarchical struc- ture for all those synsets. 4.3 Determining Least Upper Bound (LUB) Synsets and Mapping Synsets Next, we determine the LUB synsets using mapping candidate synsets and hierarchical structure having semantic relations: parent, Figure 4. Hierarchical structure of mapping child and sibling. In order to determine LUB candidate synsets for a semantic class and mapping synsets, we begin evaluation with bottom-up direction. Using relatedness among Figure 4 shows hierarchical structure of child-sibling candidate synsets, we evaluated mapping candidate synsets for a semantic class whether their parent synset is a LUB synset or ‘Furniture’ and when candidate synsets’ ID are not. If the parent is a LUB synset, we evaluate ‘04004316’ (Chair & Seat): mc(B), ‘04209815’ its parent (grand-parent of the candidate) syn- (Table & Desk): mc(B), ‘14441331’ (Table): set using relatedness among its sibling synsets. mc(C), and ‘14436072’ (Shoe shelf & Shoe If the parent is not a LUB, the candidate syn- rack): mc(A), we determine whether their par- sets mc(A) or mc(B) are determined as map- ent synset ‘03281101’ (Furniture) is a LUB or ping synsets (or LUB) and stop finding LUB. not, and evaluate it as a candidate synset mc(A) For all semantic classes, we determine LUB or mc(B). In this case, synset ‘03281101’ (Fur- and mapping synsets. Finally, we link the LUB niture) is a candidate mc(A) and a LUB synset. and mapping synsets to each semantic class of For all semantic classes, we find their map- SJSC. The process of determining of LUB and ping LUB and mapping synsets using informa- mapping synsets is as follows. tion of hierarchical structure and candidate synsets. 
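Both relatedness scores used in Sections 4.2.2 and 4.2.3 are, in essence, Lesk-style overlap counts followed by a two-threshold decision. A minimal sketch of that decision, assuming plain token lists as input, is given below; the tokenisation, the concrete λ values and the function names are illustrative and are not taken from the authors' implementation.

```python
def overlap(tokens_a, tokens_b):
    """Lesk-style relatedness: the number of word types shared by two token lists."""
    return len(set(tokens_a) & set(tokens_b))

def categorize_word_sense(ns_tokens, all_ns_tokens, ws_tokens, lam1=2, lam2=2):
    """Label one KorLex word sense as mc(A), mc(B) or None, following the two-score scheme.

    ns_tokens     -- tokens from the instances of one SJND noun      (Relatedness-A / -C side)
    all_ns_tokens -- tokens pooled over all SJND nouns of the class  (Relatedness-B / -D side)
    ws_tokens     -- tokens from the STD examples (or semantic relations) of the word sense
    lam1, lam2    -- thresholds; the paper reports good performances for values between 1 and 4
    """
    rel_single = overlap(ns_tokens, ws_tokens)
    rel_pooled = overlap(all_ns_tokens, ws_tokens)
    if rel_single >= lam1 and rel_pooled >= lam2:
        return "mc(A)"        # most relevant candidate
    if rel_single >= lam1 or rel_pooled >= lam2:
        return "mc(B)"        # relevant candidate
    return None               # postponed / not a candidate at this step
```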
Finally, we link each semantic class of 1) Using candidate synsets and their sibling, SJSC to all lower level synsets of matched for all candidate synsets mc(A), mc(B) or LUB synsets.
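Read bottom-up, the LUB procedure of Section 4.3 is a vote among sibling candidates: a parent synset is promoted, and becomes a LUB, when enough of its children are good candidates, and the upward search stops otherwise. A sketch of one promotion step under that reading follows; the function name, the handling of borderline ratios, and the repetition of promotion up the hierarchy are simplifications rather than the authors' exact procedure.

```python
def promote_parent(sibling_labels):
    """Decide the parent's status from its children's labels ('mc(A)', 'mc(B)', 'mc(C)').

    Returns 'mc(A)' or 'mc(B)' if the parent becomes a LUB, or None to stop searching upward,
    mirroring the 60% / 70% / 50-69% ratios given in Section 4.3.
    """
    total = len(sibling_labels)
    if total == 0:
        return None
    share_a = sibling_labels.count("mc(A)") / total
    share_ab = (sibling_labels.count("mc(A)") + sibling_labels.count("mc(B)")) / total
    if share_a > 0.60:
        return "mc(A)"            # mostly mc(A) children: parent is mc(A) and a LUB
    if share_ab > 0.70:
        return "mc(A)"            # mc(A)+mc(B) clearly dominate: parent is mc(A) and a LUB
    if 0.50 <= share_ab <= 0.69:
        return "mc(B)"            # weaker majority: parent is mc(B) and a LUB
    return None                   # otherwise keep the children as the final mapping synsets

# Promotion is repeated upward (parent, grand-parent, ...) until promote_parent returns None;
# the synsets below the last promoted node are then linked to the semantic class.
```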
19 5 Experiments and Results present the ratio of the results of automatic mapping to original lexical data of Sejong and We experimented automatic mapping between KorLex: 474 semantic classes of SJSC, 25,245 623 semantic classes of SJSC and 90,134 noun nouns of SJND and 90,134 noun synsets and synsets of KorLex using the proposed automat- 147,896 word senses in KorLex. ic mapping method with three approaches. To SJD KorLex evaluate the performances, we used the results SC Nouns Word Approaches Synsets of manual mapping as correct answers, that (SJSC) (SJND) Senses was mapped 474 semantic classes (terminal 1) 473 18,575 54,943 69,970 2) 445 18,402 52,109 66,936 nodes) of SJSC to 65,820 synsets (73%) (in- 3) 413 18,047 49,768 64,003 clude 6,487 LUB) among total 90,134 noun 1)+2) 463 18,521 52,563 67,109 synsets of KorLex. We compared the results of 1)+3) 457 18,460 51,786 66,157 2)+3) 383 17,651 48,398 62,063 automatic mapping with those of manual map- 466 18,542 54,083 69,259 1)+2)+3) ping. For evaluation of performances, we em- (98.3%) (72.8%) (60%) (46.8%) ployed Recall, Precision and the F1 measure: Table 3. Numbers of semantic class, noun of F1 = (2*Recall*Precision)/(Recall+ Precision). SJD, synset and word sense of KorLex Approaches Recall Precision F1 1) 0.904 0.502 0.645 In manual mapping, we mapped 73% 2) 0.774 0.732 0.752 (65,820) synsets of KorLex for 474 semantic 3) 0.670 0.802 0.730 1)+2) 0.805 0.731 0.766 classes of SJSC. The 24,314 synsets was ex- 1)+3) 0.761 0.758 0.759 cluded in manual mapping among 90,134 total 2)+3) 0.636 0.823 0.718 nouns synsets. The reasons of excluded synsets 1)+2)+3) 0.838 0.718 0.774 in manual mapping were 1) inconsistency of Table 2. Performances of automatic mapping inheritance for lexical relations of parent-child with three approaches in SJSC or KorLex, 2) inconsistency between criteria for SJSC and candidate synsets, 3) Table 2 shows the performances of automat- candidate synsets belonging to more than two ic mapping with three approaches: 1) IMPW, semantic classes, 4) specific proper nouns 2) INW, and 3) SRNS. The ‘1)’, ‘2)’ or ‘3) in (chemical compound names), and 5) polysemic the Table present the results using for each abstract synsets (Bae & al. 2010). approach method and ‘1)+2)’, ‘1)+3)’ or In automatic mapping, we could map 60% ‘2)+3)’ present those of combining two ap- (54,083) synsets among total nouns synsets proaches. The ‘1)+2)+3)’ presents those of the (90,134) of KorLex, and it is 82.2% of the re- combining three approaches and we can see sults of manual mapping. The 11,737 synsets the best performances using the last approach was excluded in automatic mapping by com- among results: recall 0.837, precision 0.717 paring with manual mapping. Most of them and F1 0.773. The first approach ‘1)’ method were 1) tiny-grained synsets found in the low- shows high recall, but low precision and the est levels, 2) synsets having no matched word third approach ‘3)’ method present low recall senses with those of SJND, 3) synsets with and high precision. ‘1)+3)’ and ‘2)+3)’ shows polysemic word senses, 4) word senses having good performances overall. Thus, we could see poor instances in KorLex and in SJND, 5) good performances using the combined ap- word senses in SJND having poor semantic proach methods. relations. 
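Because the manual mapping is taken as the gold standard, the figures in Table 2 reduce to set comparisons between the automatically and the manually mapped (semantic class, synset) pairs. A minimal sketch, assuming both mappings are given as such sets of pairs:

```python
def evaluate_mapping(auto_pairs, gold_pairs):
    """Recall, precision and F1 of an automatic mapping against the manual (gold) mapping.

    Both arguments are sets of (semantic_class, synset_id) pairs.
    """
    correct = len(auto_pairs & gold_pairs)
    recall = correct / len(gold_pairs) if gold_pairs else 0.0
    precision = correct / len(auto_pairs) if auto_pairs else 0.0
    f1 = 2 * recall * precision / (recall + precision) if (recall + precision) else 0.0
    return recall, precision, f1
```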
Second, we compared the numbers of se- mantic classes, nouns entries of SJND, noun Level LUB Ratio Level LUB Ratio 1 18 0.6% 9 230 7.3% synsets and word senses of KorLex for each 2 18 0.6% 10 98 3.1% approach, after mapping processes. 3 174 5.5% 11 32 1.0% 4 452 14.3% 12 20 0.6% As shown in Table 3, we can see the most 5 616 19.5% 13 4 0.1% numbers of mapping synsets using the ‘1)’ ap- 6 570 18.0% 14 4 0.1% 7 486 15.4% 15 2 0.1% proach. The ‘1)+2)+3)’ shows the results simi- 8 442 14.0% 16-17 0 0% lar to ‘1)’, but has the best performances (see Table 4. Numbers and Ratio of LUB synsets Table 2). The percentages in the round bracket excluded in automatic mapping
20 Table 4 shows the numbers and ratio of the KorLex, 2007. Korean WordNet, Korean Language LUB synsets excluded in automatic mapping processing Lab, Pusan National University. for each level in depth. Most synsets are 4-8 Available at http://korlex.cs.pusan.ac.kr levels synsets among 17 levels in depth. C. Hong. 2007. The Research Report of Develop- ment 21th century Sejong Dictionary, Ministry 6 Conclusions of Culture, Sports and Tourism, The National In- stitute of the Korean Language. We proposed a novel automatic mapping me- thod with three approaches to link Sejong Se- Dennis Spohr. 2008. A General Methodology for mantic Classes and KorLex using 1) informa- Mapping EuroWordNets to the Suggested Up- tion of monosemy/polysemy of word senses, 2) per Merged Ontology, Proceedings of the 6th LREC 2008:1-5. instances of nouns of SJD and word senses of KorLex, 3) semantically related words of Alexander Budanitsky and Graeme Hirst. 2006. nouns of SJD and synsets of KorLex. To find Evaluating WordNet-based Measures of Lexi- common clues from lexical information among cal Semantic Relatedness, Computational Lin- those language resources is important process guistics,Vol 32: Issue 1:13- 47. in automatic mapping method. Our proposed Siddharth Patwardhan, Satanjeev Banerjee and Ted automatic mapping method with three ap- Pedersen. 2003. Using Measures of Semantic proaches shows notable performances by com- Relatedness for Word Sense Disambiguation, paring with other studies on automatic map- CICLing 2003, LNCS(vol 2588):241-257. ping among language resources: recall 0.837, Satanjeev Banerjee and Ted Pedersen. 2002. An precision 0.717 and F1 0.773. Therefore, from Adapted Lesk Algorithm for Word Sense Dis- those studies, we can improve Korean seman- ambiguation Using WordNet, Proceedings of tico-syntactic parsing technology by integrat- CICLing 2002, LNCS 2276:136-145 ing the argument structures as provided by SJD, Sara Tonelli and Daniele Pighin. 2009. New Fea- and the lexical-semantic hierarchy as provided tures for FrameNet -WordNet Mapping, Pro- by KorLex. In addition, we can enrich three ceedings of the 13th Conference on Computa- resources: KorLex, SJD and STD as results of tional Natural Language Learning: 219-227. comparing and integrating them. We expect to Aesun Yoon, Soonhee Hwang, E. Lee, Hyuk-Chul improve automatic mapping technology among Kwon. 2009. Consruction of Korean WordNet other Korean language resources through this ‘KorLex 1.5’, JourNal of KIISE: Sortware and study. Applications, Vol 36: Issue 1:92-108. Acknowledgement Soonhee Hwang, A. Yoon, H. Kwon. 2010. KorLex 1.5: A Lexical Sematic Network for Korean This work was supported by the National Re- Numeral Classifiers, JourNal of KIISE: Sort- search Foundation of Korea (NRF) grant ware and Applications, Vol 37: Issue 1:60-73. funded by the Korea government(MEST) (No. Sun-Mee Bae, Kyoungup Im, Aesun Yoon. 2010. 2007-0054887). Mapping Heterogeneous Ontologies for the HLT Applications: Sejong Semantic Classes References and KorLexNoun 1.5, Korean Journal of Cog- Jan Scheffczyk, Adam Pease, Michael Ellsworth. nitive Science. Vol. 21: Issue 1: 95-126. 2006. Linking FrameNet to the Suggested Upper Aesun Yoon. 2010. Mapping Word Senses of Ko- Merged Ontology. Proc of the 2006 conference rean Predicates Between STD(STandard Dic- on Formal Ontology in Information Systems tionary) and SJD(SeJong Electronic Dictio- (FOIS 2006): 289-300. nary) for the HLT Applications, Journal of the Ian Niles and Adam Pease. 2003. 
Linking lexicons Linguistic Society of Korea. No 56: 197-235. and ontologies: Mapping wordnet to the sug- Hyopil Shin. 2010. KOLON: Mapping Korean gested upper merged ontology. In Proceedings of Words onto the Microkosmos Ontology and the 2003 International Conference on Informa- Combining Lexical Resources. Journal of the tion and Knowledge Engineering (IKE 03). Linguistic Society of Korea. No 56: 159-196.
Sequential Tagging of Semantic Roles on Chinese FrameNet
Jihong LI, Ruibo WANG, Yahui GAO
Computer Center, Shanxi University
[email protected], {wangruibo,gaoyahui}@sxu.edu.cn
Abstract FrameNet (Baker et al., 1998), PropBank (Kings- bury and Palmer, 2002), NomBank (Meyers et al., In this paper, semantic role labeling(SRL) 2004), and so on. On this basis, several interna- on Chinese FrameNet is divided into the tional semantic evaluations have been organized, subtasks of boundary identification(BI) which include Senseval 3 (Litkowski, 2004), and semantic role classification(SRC). SemEval 2007 (Baker,et al., 2007), CoNLL These subtasks are regarded as the se- 2008 (Surdeanu et al., 2008), CoNLL 2009 (Hajic quential tagging problem at the word et al., 2009), and so on. level, respectively. We use the conditional The first SRL model on FrameNet was pro- random fields(CRFs) model to train and posed by Gildea and Jurafsky(2002). The model test on a two-fold cross-validation data consists of two subtasks of boundary identifica- set. The extracted features include 11 tion(BI) and semantic role classification(SRC). word-level and 15 shallow syntactic fea- Both subtasks were implemented on the pretreat- tures derived from automatic base chunk ment results of the full parsing tree. Many lex- parsing. We use the orthogonal array of ical and syntactic features were extracted to im- statistics to arrange the experiment so that prove the accuracy of the model. On the test data the best feature template is selected. The of FrameNet, the system achieved 65% precision experimental results show that given the and 61% recall. target word within a sentence, the best Most works on SRL followed Gildea’s frame- F-measures of SRL can achieve 60.42%. work of processing the SRL task on English For the BI and SRC subtasks, the best F- FrameNet and PropBank. They built their model measures are 70.55 and 81%, respectively. on the full parse tree and selected features using The statistical t-test shows that the im- various machine learning methods to improve the provement of our SRL model is not signif- accuracy of SRL models. Many attempts have icant after appending the base chunk fea- made significant progress, ssuch as the works of tures. Pradhan et al. (2005), Surdeanu et al. (2007), and so on. Other researchers regarded the task 1 Introduction of SRL as a sequential tagging problem and em- Semantic parsing is important in natural lan- ployed the shallow chunking technique to solve it, guage processing, and it has attracted an increas- as described by Marquez at al. (2008). ing number of studies in recent years. Cur- Although the SRL model based on a full parse rently, its most important aspect is the formaliza- tree has good performance in English, this method tion of the proposition meaning of one sentence of processing is not available in other languages, through the semantic role labeling. Therefore, especially in Chinese. A systemic study of Chi- many large human-annotated corpora have been nese SRL was done by Xue et al. (2008). Like constructed to support related research, such as the English SRL procedure, he removed many
22 Proceedings of the 8th Workshop on Asian Language Resources, pages 22–29, Beijing, China, 21-22 August 2010. c 2010 Asian Federation for Natural Language Processing uncorrelated constituents of a parse tree and re- a sequential tagging problem at the word level. lied on the remainder to identify the semantic Conditional random fields(CRFs) model is em- roles using the maximum entropy model. When ployed to train the model and predict the result human-corrected parse is used, the F-measures of the unlabeled sentence. To improve the accu- on the PropBank and NomBank achieve 92.0 and racy of the model, base chunk features are intro- 69.6%, respectively. However, when automatic duced, and the feature selection method involving full parse is used, the F-measures only achieve an orthogonal array is adopted. The experimen- 71.9 and 60.4%, respectively. This significant de- tal results illustrate that the F-measure of our SRL crease prompts us to analyze its causes and to find model achieves 60.42%. This is the best SRL re- a potential solution. sult of CFN so far. First, the Chinese human-annotated resources The paper is organized as follows. In Section of semantic roles are relatively small. Sun and 2, we describe the situation of CFN and introduce Gildea only studied the SRL of 10 Chinese verbs SRL on CFN. In Section 3, we propose our SRL and extracted 1,138 sentences in the Chinese Tree model in detail. In Section 4, the candidate feature Bank. The size of the Chinese PropBank and set is proposed, and the orthogonal-array-based Chinese NomBank used in the paper of Xue is feature selection method is introduced. In Sec- significantly smaller than the ones used in En- tion 5, we describe the experimental setup used glish language studies. Moreover, more verbs ex- throughout this paper. In Section 6, we list our ist in Chinese than in English, which increases the experimental results and provide detailed analy- sparsity of Chinese Semantic Role data resources. sis. The conclusions and several further directions The same problem also exists in our experiment. are given at the end of this paper. The current corpus of Chinese FrameNet includes about 18,322 human-annotated sentences of 1,671 2 CFN and Its SRL task target words. There is only an average of less than Chinese FrameNet(CFN) (You et al., 2005) is a re- 10 sentences for every target word. To reduce the search project that has been developed by Shanxi influence of the data sparsity, we adopt a two-fold University, creating an FN-styled lexicon for Chi- cross validation technique for train and test label- nese, based on the theory of Frame Semantics ing. (Fillmore, 1982) and supported by corpus evi- Second, because of the lack of morphological dence. The results of the CFN project include a clues in Chinese, the accuracy of a state-of-the-art lexical resource, called the CFN database, and as- parsing system significantly decreases when used sociated software tools. Many natural language for a realistic scenario. In the preliminary stage processing(NLP) applications, such as Informa- of building an SRL model of CFN, we employed tion Retrieval and Machine Translation, will ben- a Stanford full parser to parse all sentences in the efit from this resource. In FN, the semantic roles corpus and adopted the traditional SRL technique of a predicate are called the frame elements of a on our data set. However, the experiment result frame. A frame has different frame elements. A was insignificant. 
Only 76.48% of the semantic group of lexical units (LUs) that evokes the same roles in the data set have a constituent with the frame share the same names of frame elements. same text span in the parse tree, and the F-measure The CFN project currently contains more than of BI can only achieves 54%. Therefore, we at- 1,671 LUs, more than 219 semantic frames, and tempted to use another processing technique for has exemplified more than 18,322 annotated sen- SRL on CFN. We formalized SRL on CFN into a tences. In addition to correct segmentation and sequential tagging problem at the word level. We part of speech, every sentence in the database is first extracted 11 word features into the baseline marked up to exemplify the semantic and syntac- model. Then we added 15 additional base chunk tic information of the target word. Each annotated features into the SRL model. sentence contains only one target word. In this paper, the SRL task of CFN comprises (a). 23 结构/n > ;/w To avoid the problem of data sparsity, we use all The CFN Corpus is currently at an early stage, sentences in our train data set to train the model of and the available CFN resource is relatively lim- BI. ited, so the SRL task on CFN is described as fol- lows. Given a Chinese sentence, a target word, and its frame, we identify the boundaries of the 3.3 SRC frame elements within the sentence and label them with the appropriate frame element name. This is After predicting the boundaries of semantic role the same as the task in Senseval-3. chunks in a sentence, the proper semantic role types should be assigned in the SRC step. Al- 3 Shallow SRL Models though it can be easily modeled as a classification problem, we regarded it as a sequential tagging This section proposes our SRL model architec- problem at the word level. An additional con- ture, and describes the stages of our model in de- straint is employed in this step: the boundary tags tail. of the predicting sequence of this stage should be 3.1 SRL Model Architecture consistent with the the output of the BI stage. A family of SRL models can be constructed using One intuitive reason for this model strategy is only shallow syntactic information as the input. that the SRC step can use the same feature set as The main differences of the models in this family BI, and it can further prove the rationality of our mainly focus on the following two aspects. feature optimization method. i) model strategy: whether to combine the sub- tasks of BI and SRC? ii) tagging unit: which is used as the tagging 3.4 Postprocessing unit, word or chunk. The one-stage and two-stage models are two Not all predicted IOB2 sequences can be trans- popular strategies used in SRL tasks, as described formed to the original sentence correctly; there- by Sui et al. (2009). The word and the chunk are fore, they should satisfy the following compulsory regarded as the two different tagging units of the constraints. SRL task. (1) The tagging sequence should be regular. In our SRL model, we consider BI and SRC as ”I...”, ”... OI...”, ”I-X...”, ”... O-I-X...”, ”... B-X- two stages, and the word is always used as the tag- I-Y...”, and ”B-I-X-I-X-I-Y...” are not the regular ging unit. The detailed formalization is addressed IOB2 sequences. in the following subsections. (2) The tag for the target word must be ”O”. 3.2 BI We use the Algorithm 1 to justify whether the The aim of the BI stage is to identify all word IOB2 sequences are regular. 
spans of the semantic roles in one Chinese sen- Moreover, at the SRC stage, the boundary tags tence. It can be regarded as a sequential tagging of the IOB2 sequence must be consistent with the problem. Using the IOB2 strategy (Erik et al., given boundary tags. 1999), we use the tag set B,I,O to tag all words, { } where tag ”B” represents the beginning word of a For the BI stage, we firstly add an additional chunk, ”I” denotes other tokens in the chunk, and chunk type tag X to all ”B” and ”I” tags in the ”O” is the tag of all tokens outside any chunks. IOB2 sequences, and then use Algorithm 1 to jus- Therefore, the example sentence (a) can be repre- tify the regularity of the sequences. sented as follows: In the testing stage of the SRL model, we use (b). 第 1 章 介绍 算法 与 数据 the regular sequence with the max probability as |B |I |O |B |I |I 结构 ; the optimal output. |I |O 24 Algorithm 1. justify the regular IOB2 sequence pus and to extract several types of features from Input: (1) IOB2 sequence:S = (s1, .., sn) them. The automatically generated base chunks where si B X,I X,O , and 1 i n (2) The position∈ { − of target− word in} sentence≤ pt≤ of example sentences (a) are given as follows: 1, Initialization: (c).[mp-ZX 第 1/m 章/q ] [vp-SG 介绍/v ] [np- ct = NULL (1) Current chunk type: ; SG 算法/n ] 与/c [np-AM 数据/n 结构/n ] ;/w (2) Regularity of sequence: state =′ REG′; 2, Check the tag of target word: spt: (1) If spt ==′ O′: go to Step 3; 4.1 Candidate Feature Set (2) If spt <>′ O′: state =′ IRR′, and go to Step 4; 3,F or(i = 1; i n; i + +) Three types of features are given as follows: ≤ (1) If si ==′ B X′: ct =′ X′; Features at the word level: − (2) If si ==′ I X′ and ct <>′ X′: state =′ IRR′, Word: The current token itself; and go to Step 4;− (3) If si ==′ O′: ct = NULL; Part-of-Speech: The part of speech of the cur- 4, Stop rent token; Output: Variable state; Position: The position of the current word rela- tive to the target word(before, after, or on); 3.5 Why Word-by-word? Target word: The target word in the sentence; We ever tried to use the methods of constituent- Features at the base chunk level: by-constituent and chunk-by-chunk to solve our Syntactic label: The syntactic label of the cur- SRL task on CFN, but the experiment results il- rent token, such as, B-np,I-vp, etc; lustrate that they are not suitable to our task. Structural label: The structural label of the cur- We use the Stanford Chinese full parser to parse rent token, such as, B-SG, I-ZX, etc; all sentences in the CFN corpus and use the SRL Head word and its Part of Speech: The head model proposed by Xue et al.(2008) in our task. word and its part of speech of the base chunk; However, the results is insignificant. Only 66.72% Shallow syntactic path: The combination of of semantic roles are aligned with the constituents the syntactic tags from the source base chunk, of the full parse tree, and the F-measure of BI only which contains the current word, to the target base achieves 52.43%. The accuracy of the state-of- chunk, which contains the target word of the sen- the-art Chinese full parser is not high enough, so tence; it is not suitable to our SRL task. Subcategory: The combination of the syntactic Chunk-by-chunk is another choice for our task. 
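Algorithm 1 amounts to a single left-to-right scan: it rejects any I-X tag that does not continue an open chunk of the same type X, and it rejects a non-O tag on the target word. A re-implementation of that check is sketched below; the tag spelling ('B-Agt', 'I-Agt', ...) and the function name are illustrative rather than the authors' code.

```python
def is_regular_iob2(tags, target_index):
    """Return True iff an IOB2 tag sequence is 'regular' in the sense of Algorithm 1.

    tags         -- chunk-typed IOB2 tags such as 'B-X', 'I-X' or 'O'
    target_index -- position of the target word, whose tag must be 'O'
    """
    if tags[target_index] != "O":        # the target word may not lie inside a role chunk
        return False
    current_type = None                  # no chunk is open initially
    for tag in tags:                     # single left-to-right scan
        if tag == "O":
            current_type = None
        elif tag.startswith("B-"):
            current_type = tag[2:]
        elif tag.startswith("I-"):
            if current_type != tag[2:]:  # an I-X must continue an open chunk of type X
                return False
        else:
            return False                 # any other tag shape is irregular
    return True

# is_regular_iob2(['B-Agt', 'I-Agt', 'O', 'B-Msg'], 2) -> True
# is_regular_iob2(['I-Agt', 'O', 'B-Msg'], 1)          -> False (chunk opened with 'I')
```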
tags of the base chunk around the target word; When We use base chunk as the tagging unit of Other Features: our model, only about 15% of semantic roles did Named entity: The three types of named entities not align very well with the boundary of automati- are considered: person, location, and time. They cally generated base chunks, and the F-measure is can be directly mapped from the part of speech of significantly lower than the method of word-by- the current word. word, as described by Wang et al.(2009). Simplified sentence: A boolean feature. We use Therefore, words are chosen as the tagging unit the punctuation count of the sentence to estimate of our SRL model, which showed significant re- whether the sentence is the simplified sentence. sults from the experiment. Aside from the basic features described above, we also use combinations of these features, such 4 Feature Selection and Optimization as word/POS combination, etc. Word-level features and base-chunk features are used in our SRL research. 4.2 Feature Optimization Method Base chunk is a Chinese shallow parsing In the baseline model, we only introduce the fea- scheme proposed by Professor Zhou. He con- tures at the word level. Table 1 shows the candi- structed a high accuracy rule-based Chinese base date features of our baseline model and proposes chunk parse (Zhou, 2009), the F-measure of their optional sizes of sliding windows. which can achieve 89%. We use this parse to gen- For Table 1, we use the orthogonal array erate all base chunks of the sentences in our cor- L (49 24) to conduct 32 different templates. 32 × 25 The best template is chosen from the highest F- and the boundary tags are introduced as features measure for testing the 32 templates. The detailed during the SRC stage. orthogonal-array-based feature selection method The feature templates in Table 2 cannot con- was proposed by Li et al.(2010). tain the best feature template selected from Table 1. This is a disadvantage of our feature selection Table 1. Candidate features of baseline models Feature type Window size method. word [0,0] [-1,1] [-2,2] [-3,3] bigram of word - [-1,1] [-2,2] [-3,3] 5 Experimental Setup and Evaluation POS [0,0] [-1,1] [-2,2] [-3,3] Metrics bigram of POS - [-1,1] [-2,2] [-3,3] position [0,0] [-1,1] [-2,2] [-3,3] 5.1 Data Set bigram of position - [-1,1] [-2,2] [-3,3] word/POS - [0,0] [-1,1] [-2,2] The experimental data set consists of all sentences word/position - [0,0] [-1,1] [-2,2] of 25 frames selected in the CFN corpus. These POS/position - [0,0] [-1,1] [-2,2] trigram of position - [-2,0] [-1,1] [0,2] sentences have the correct POS tags and CFN se- word/target word - [0,0] mantic information; they are all auto parsed by target word [0,0] the rule-based Chinese base chunk parser. Table Compared with the baseline model, the features 3 shows some statistics on these 25 frames. at the word and base chunk levels are all consid- ered in Table 2. Table 3. Summary of the experimental data set Frame FEs Sents Frame FEs Sents 感受 6 569 因果 7 140 Table 2. 
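Each template in Table 1 pairs a token-level attribute with a window of offsets around the current position, so instantiating a template is mechanical. The sketch below shows one way such features could be generated for a CRF tagger; the attribute names, the padding symbol and the chosen windows are examples and do not reproduce the selected best template.

```python
def window_features(tokens, index, attrs=("word", "pos"), window=2):
    """Unigram and word-bigram features for one position, from a window of +/- `window` tokens.

    tokens -- list of per-token dicts, e.g. {'word': '介绍', 'pos': 'v', 'position': 'on'}
    attrs  -- which token attributes to expand over the window (cf. Table 1)
    """
    def attr_at(j, attr):
        return tokens[j][attr] if 0 <= j < len(tokens) else "<PAD>"

    feats = []
    for offset in range(-window, window + 1):        # unigram templates, e.g. word[-2..2]
        for attr in attrs:
            feats.append(f"{attr}[{offset}]={attr_at(index + offset, attr)}")
    for offset in range(-window, window):            # bigram templates, e.g. word[-1]|word[0]
        left = attr_at(index + offset, "word")
        right = attr_at(index + offset + 1, "word")
        feats.append(f"word[{offset}]|word[{offset + 1}]={left}|{right}")
    return feats
```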
Candidate features of the base chunk-based model 知觉特征 5 345 陈述 10 1,603 Feature type Window size 思想 3 141 拥有 4 170 word [0,0] [-1,1] [-2,2] 联想 5 185 适宜性 4 70 bigram of word - [-1,1] [-2,2] 自主感知 14 499 发明 12 198 POS [0,0] [-1,1] [-2,2] 查看 9 320 计划 6 90 bigram of POS - [-1,1] [-2,2] 思考 8 283 代表 7 80 position [0,0] [-1,1] [-2,2] 非自主感知 13 379 范畴化 11 125 bigram of position - [-1,1] [-2,2] 获知 9 258 证明 9 101 word/POS - [0,0] [-1,1] 相信 8 218 鲜明性 9 260 word/position - [0,0] [-1,1] 记忆 12 298 外观 10 106 POS/position - [0,0] [-1,1] 包含 6 126 属于某类 8 74 trigram of position - [-2,0] [-1,1] 宗教信仰 5 54 Totals 200 6,692 syntactic label [0,0] [-1,1] [-2,2] syn-bigram - [-1,1] [-2,2] Syn-trigram - [-1,1] [-2,2] 5.2 Cross-validation technique head word [0,0] [-1,1] [-2,2] In all our experiments, three groups of two-fold head word-bigram - [-1,1] [-2,2] POS of Head [0,0] [-1,1] [-2,2] cross-validation sets are used to estimate the per- POS-bigram of head - [-1,1] [-2,2] formance of our SRL model. All sentences in a syn/head word [0,0] [-1,1] [-2,2] frame are cut four-fold on average, where every stru/head word [0,0] [-1,1] [-2,2] shallow path - [0,0] [-1,1] two folder are merged as train data, and the other subcategory - [0,0] [0,0] two folds are used as test data. Therefore, we can named Entity - [0,0] [0,0] obtain three groups of two-fold cross-validation simplified Sentence - [0,0] [0,0] target word(compulsory) [0,0] data sets. Estimating the parameter of fold number is 1 25 The orthogonal array L54(2 3 ) is employed one of the most difficult problems in the cross- × to select the best feature template from all candi- validation technique. We believe that in the task of date feature templates in Table 2. To distinguish SRL, the two-fold cross validation set is a reason- it from the baseline model, we call the model able choice, especially when the data set is relative based on the table 2 as the ”base chunk-based SRL small. With a small data set, dividing it in half is model”. split of data set is the best approximation of the For both feature sets described above, the target real-world data distribution of semantic roles and word is the compulsory feature in every template, the sparse word tokens. 26 5.3 Classifiers where, K is the fold number of cross-validation CRFs model is used as the learning algorithm and n1 and n2 are the counts of training examples in our experiments. Previous SRL research has and testing examples. In our experimental setting, K = 2 and n2 1. Moreover, the estimation of demonstrated that CRFs model is one of the best n1 ≈ statistical algorithms for SRL, such as the works the variance of the total F-measure is as follows: of Cohn et al. (2005) and Yu et al. (2007). The crfpp toolkit1 is a good implementation of 1 V ar(F ) = V ar( (F1 + F2 + F3)) the CRF classifier, which contains three different 3 3 training algorithms: CRFL1, CRFL2, and MIRA. 1 = V ar(F ) We only use CRFL2 with Gaussian priori regular- 9 j ization and the variance parameter C=1.0. ∑j=1 5.4 Evaluation Metrics Using V ar(Fj) to estimate V ar(Fj), we can obtain: As described in SRL reseach, precision, recall, d and F-measure are also used as our evaluation 3 metrics. In addition, the standard deviation of the 1 V ar(F ) = V ar(F ) F-measure is also adopted as an important metric 9 j j=1 of our SRL model. 
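The data splits just described, in which every frame's sentences are cut into four folds, two folds are merged for training and two for testing, and three two-fold groups are obtained, can be sketched as follows. The round-robin assignment of sentences to folds and the particular pairing of folds into groups are one plausible reading; the paper does not spell them out.

```python
from itertools import chain

def cut_into_four_folds(sentences_by_frame):
    """Distribute the sentences of every frame over four folds of roughly equal size."""
    folds = [[], [], [], []]
    for frame, sentences in sentences_by_frame.items():
        for i, sentence in enumerate(sentences):
            folds[i % 4].append((frame, sentence))
    return folds

def three_twofold_groups(folds):
    """Pair the four folds into the three possible half/half splits.

    Each pairing yields one two-fold cross-validation group: train on one half, test on the
    other, then swap. The pairings {01|23}, {02|13}, {03|12} are the only ways to halve four
    folds, which is how 'three groups of two-fold cross-validation sets' is read here.
    """
    pairings = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
    groups = []
    for half_a, half_b in pairings:
        part_a = list(chain.from_iterable(folds[i] for i in half_a))
        part_b = list(chain.from_iterable(folds[i] for i in half_b))
        groups.append((part_a, part_b))
    return groups
```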
The computation method of ∑ d 3 d2 these metrics is given as follows: 1 = (F i F ) Let P i, Ri and F i be the precision, recall, and 6 j − j j j j j=1 i=1 F-measure of the jth group of the ith cross valida- ∑ ∑ tion set, where j = 1, 2, 3 and i = 1, 2. The final Finally, we can derive the standard deviation of precision(P ), recall(R), and F-measure(F ) of our the F-measure, that is, std(F ) = V ar(F ). SRL model are the expectation values of the P i, j √ i i Rj, and Fj , respectively. 5.5 Significance Test of Two SRLd Models The estimation of the variance of cross- To test the significance of SRL models A and B, validation is another difficult problem in the we use the following statistics S. cross-validation technique. Although it has been proven that the uniform and unbiased estimation F (A) F (B) of the variance of cross-validation does not ex- S = − ∼ t(n) ist (Yoshua et al., 2007), we adopted the method V ar(F (A)) + V ar(F (B)) proposed by Nadeau et al. (2007), to estimate the where√F (A) and F (B) are the F-measures of variance of the F-measure of cross-validation sets. models A and B, and n is the freedom degree of This method is proposed hereinafter. t-distribution, an integer nearest to the n′. Let Fj be the average F-measure of the j group 1 1 2 experiment, that is, Fj = (F + F ), where 2 j j 3(V ar(F (A)) + V ar(F (B)))2 j = 1, 2, 3. The proposed estimator of the vari- n′ = (V ar(F (A))2 + V ar(F (B))2) ance of Fj in the work of Nadeau et al. (2007) is as follows: We use the p value( ) to test the significance − · of SRL models A and B, which are given as fol- 2 lows: 1 n2 i V ar(Fj) = ( + ) (Fj Fj) K n1 − ∑i=1 2 p value(F (A),F (B)) = P (S t1 α/2(n)) d 1 − ≥ − = ( + 1) (F i F ) 2 j − j If p value(F (A),F (B)) 0.05, the differ- i=1 − ≤ ∑ ence of the F-measures between models A and B 1crfpp toolkit: http://crfpp.sourceforge.net/ is significant at 95% level. 27 6 Experimental Results and Discussion 7 Conclusions and Further Directions We summarized the experiment results of every The SRL of Chinese predicates is a challenging stage of our SRL model, that is, BI, SRC and a task. In this paper, we studied the task of SRL combination of these two steps (BI+SRC). on the CFN. We proposed a two-stage model and exploited the CRFs classifier to implement the au- 6.1 Baseline SRL Model tomatic SRL systems. Moreover, we introduced The results of the baseline model are given in Ta- the base chunk features and the OA-based method ble 4, which only uses the features in Table 1. to improve the performance of our model. Exper- Table 4. Results of the baseline model imental results shows that the F-measure of our P(%) R(%) F(%) std(F) best model achieves 60.42%, and the base chunk BI 74.42 66.80 70.40 0.0031 features cannot improve the SRL model signifi- SRC - - 80.32 0.0032 cantly. BI+SRC 62.87 56.44 59.48 0.0050 In the future, we plan to introduce unlabeled In Table 1, because the results of the SRC stage data into the training phase and use the EM- are based on human-corrected boundary informa- schemed semi-supervised learning algorithms to tion, the precision, recall, and F-measure of this boost the accuracy of our SRL model. stage are the same. Therefore, we only give the F-measure and its deviation at the SRC stage. Acknowledgement In the baseline model, the BI stage is the bot- The authors would like to thank Prof. Kaiying tleneck of our SRL model. Its F-measure only LIU for his comments and Prof. Qiang ZHOU for achieves 70.4%, and the recall is lower than the the base-chunk parser. 
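Under one reading of the (partly garbled) formulas above, the per-group variance follows Nadeau and Bengio's correction with K = 2 and n2/n1 ≈ 1, the model variance averages the three group variances, and two models are compared with a t-statistic using the stated approximate degrees of freedom. The sketch below follows that reading; it assumes SciPy for the t-distribution and reports a two-sided p-value, which simplifies the test as written.

```python
import math
from scipy.stats import t as t_dist   # assumed available; only used for the p-value

def corrected_variance(group_f_scores, n2_over_n1=1.0):
    """Nadeau-Bengio corrected variance of one cross-validation group of F-measures."""
    k = len(group_f_scores)
    mean = sum(group_f_scores) / k
    return (1.0 / k + n2_over_n1) * sum((f - mean) ** 2 for f in group_f_scores)

def model_f_and_variance(groups):
    """Overall F-measure and its estimated variance from the three two-fold groups.

    groups -- e.g. [[F1_1, F1_2], [F2_1, F2_2], [F3_1, F3_2]], the per-fold F-measures.
    """
    g = len(groups)
    overall_f = sum(sum(fs) / len(fs) for fs in groups) / g
    variance = sum(corrected_variance(fs) for fs in groups) / (g ** 2)
    return overall_f, variance

def p_value(f_a, var_a, f_b, var_b):
    """Two-sided p-value for the F-measure difference between models A and B."""
    s = (f_a - f_b) / math.sqrt(var_a + var_b)
    df = max(1, round(3 * (var_a + var_b) ** 2 / (var_a ** 2 + var_b ** 2)))
    return 2.0 * t_dist.sf(abs(s), df)
```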
precision. Moreover, the F-measure of the final model only achieves 59.48%, and its standard de- viation is larger than both stages. References Baker, C., Fillmore, C., and John B. 1998. The Berke- 6.2 Base chunk-based SRL Model ley Framenet project. In Proceedings of COLING- When base chunk features, proposed in Table 2, ACL, 86-90, Montreal, Canada. are employed in the SRL model, we can obtain Baker, C., Ellsworth, M., Erk, K. 2007. SemEval’07 the results summarized in Table 5. Task 19: Frame semantic structure extraction. Pro- Table 5. Results of the base chunk-based model ceedings of the 4th International Workshop on Se- mantic Evaluations, 99-104, Prague, Czech Repub- P(%) R(%) F(%) std(F) BI 74.69 66.85 70.55 0.0038 lic. SRC - - 81.00 0.0029 Cohn, T., Blunsom P. 2005. Semantic role labeling BI+SRC 63.97 57.25 60.42 0.0049 with tree conditional random fields. Proceedings of A comparison of Table 4 and Table 5 provides CoNLL 2005, ACL, 169-172. the following two conclusions. Erik F., and John V. 1999. Representing text chunks. (1) When base chunk features are used, all P, R, In Proceedings of EACL’99, 173-179. F at every stage slightly increase (< 1%). Fillmore, C. 1982. Frame Semantics. In The Linguis- (2) The significance test values between the tic Society of Korea, Seoul: Hanshin. baseline model and the base chunk-based model Gildea, D., and Jurafsky, D. 2002. Automatic label- are given in Table 6. For every stage, the perfor- ing for semantic roles. Computational Linguistics, mance boost after introducing the base chunk fea- 28(3):245-288. tures is not significant at 95% level. However, the Hajic, J., Ciaramita, M., Johansson, R., Kawahara, D., impact of base chunk features at the SRC stage is Marti, M., Marquez,` L., Meyers, A., Nivre, J., Pado´, larger than that at the BI stage. S., Steˇpa´nek, J., Stranak, P., Surdeanu, M., Nian- wen X., Zhang, Y. 2009. The CoNLL-2009 shared Table 6. Test values between two SRL models task: syntactic and semantic dependencies in multi- BI SRC BI+SRC ple languages. In Proceedings of CoNLL 2009, 1- p value 0.77 0.166 0.228 − 18, Boulder, CO, USA.. 28 Jihong, L., Ruibo, W., Weilin, W., and Guochen, L. Surdeanu, M., Marquez,` L., Carreras, X., Comas, P. 2010. Automatic Labeling of Semantic Roles on 2007. Combination strategies for semantic role la- Chinese FrameNet. Journal of Software, 2010, beling. Journal of Artificial Intelligence Research, 21(4):597-611. 29:105-151. Jiangde, Y., Xiaozhong, F., Wenbo, P., and Zhengtao, Yoshua, B., and Yves, G. 2004. No unbiased estimator Y. 2007. Semantic role labeling based on condi- of the variance of K-fold cross-validation Journal of tional random fields Journal of southeast university Machine Learning Research, 5:1089-1105. (English edition), 23(2):5361-364. Weiwei, S., Zhifang, S., Meng, W., and Xing, W. 2009. Liping, Y., and Kaiying, L. 2005. Building Chinese Chinese semantic role labeling with shallow pars- FrameNet database. In Proceedings of IEEE NLP- ing. In Proceedings of the 2009 Conference on KE’05 , 301-306. Empirical Methods in Natural Language Process- ing(EMNLP 2009), ACL, 1475-1483. Litkowski, K. 2004. Senseval-3 task automatic label- ing of semantic roles. Third International Workshop on the Evaluation of Systems for the Semantic Anal- ysis of Text, 9-12, Barcelona, Spain. Marquez,` L., Carreras, X., Litkowski, K., Stevenson, S. 2008. Semantic Role Labeling: An Introduc- tion to the Special Issue. Computational Linguis- tics, 34(2):145-159. 
Meyers, A., Reeves, R., Macleod, C., Szekely, R., Zielinska, V., Young, B., and Grishman, R. 2004. The NomBank Project: An interim report. In Pro- ceedings of the NAACL/HLT Workshop on Frontiers in Corpus Annotation, 24-31, Boston, MA, USA. Nadeau, C., and Bengio, Y. 2003. Inference for the generalization error. Machine Learning, 52: 239- 281. Nianwen, X. 2008. Labeling Chinese Predicates with Semantic Roles. Computational Linguistics, 2008, 34(2): 225-255. Paul, K., and Martha, P. 2002. From TreeBank to PropBank. In Proceedings of LREC-2002, Canary Islands, Spain. Pradhan, S., Hacioglu, K., Krugler, V., Ward, W., Mar- tin, J., Jurafsky, D. 2005. Support vector learn- ing for semantic argument classification. Machine Learning, 2005, 60(1):11-39. Qiang, Z. 2007. A rule-based Chinese base chunk parser. In Proc. of 7th International Conference of Chinese Computation (ICCC-2007), 137-142, Wuhan, China. Ruibo, W. 2004. Automatic Semantic Role Label- ing of Chinese FrameNet Based On Conditional Random Fields Model. Thesis for the 2009 Mas- ter’s Degree of Shanxi University, Taiyuan, Shanxi, China. Surdeanu, M., Johansson, R., Meyers, A., Marquez,` L., Nivre, J. 2008. The CoNLL 2008 shared task on joint parsing of syntactic and semantic depen- dencies. In Proceedings of CoNLL 2008, 159-177, Manchester, England, UK. 29 Augmenting a Bilingual Lexicon with Information for Word Translation Disambiguation Takashi Tsunakawa Hiroyuki Kaji Faculty of Informatics Faculty of Informatics Shizuoka University Shizuoka University [email protected] [email protected] the word inputted. This task is often called Abstract word translation disambiguation. In this paper, we describe a method for add- We describe a method for augmenting ing information for word translation disam- a bilingual lexicon with additional in- biguation into the bilingual lexicon. Compara- formation for selecting an appropriate ble corpora can be used to determine which translation word. For each word in the word associations suggest which translations source language, we calculate a corre- of the word (Kaji and Morimoto, 2002). First, lation matrix of its association words we extract word associations in each language versus its translation candidates. We corpus and align them by using a bilingual dic- estimate the degree of correlation by tionary. Then, we construct a word correlation using comparable corpora based on matrix for each word in the source language. these assumptions: “parallel word as- This correlation matrix works as information sociations” and “one sense per word for word translation disambiguation. association.” In our word translation We carried out word translation experiments disambiguation experiment, the results on two settings: English-to-Japanese and Chi- show that our method achieved 42% nese-to-Japanese. In the experiments, we tested recall and 49% precision for Japa- Dice/Jaccard coefficients, pointwise mutual nese-English newspaper texts, and 45% information, log-likelihood ratio, and Student’s recall and 76% precision for Chi- t-score as the association measures for extract- nese-Japanese technical documents. ing word associations. 1 Introduction 2 Constructing word correlation ma- The bilingual lexicon, or bilingual dictionary, trices for word translation disam- is a fundamental linguistic resource for multi- biguation lingual natural language processing (NLP). 
For 2.1 Outline of our method each word, multiword, or expression in the source language, the bilingual lexicon provides In this section, we describe the method for translation candidates representing the original calculating a word correlation matrix for each meaning in the target language. word in the source language. The correlation Selecting the right words for translation is a matrix for a word f consists of its association serious problem in almost all of multilingual words and its translation candidates. Among NLP. One word in the source language almost the translation candidates, we choose the most always has two or more translation candidates acceptable one that is strongly suggested by its in the target language by looking up them in association words occurring around f. the bilingual lexicon. Because each translation We use two assumptions for this framework: candidate has a distinct meaning and property, (i) Parallel word associations: we must be careful in selecting the appropriate Translations of words associated with translation candidate that has the same sense as each other in a language are also asso- ciated with each other in another language 30 Proceedings of the 8th Workshop on Asian Language Resources, pages 30–37, Beijing, China, 21-22 August 2010. c 2010 Asian Federation for Natural Language Processing (Rapp, 1995). For example, two English words “tank” and “soldier” are associated with each other and their Japanese trans- lations “ᡓ㌴ (sensha)” and “රኈ (hei- shi)” are also associated with each other. (ii) One sense per word association: A polysemous word exhibits only one sense of a word per word association (Yarowsky, 1993). For example, a poly- semous word “tank” exhibits the “military vehicle” sense of a word when it is asso- ciated with “soldier,” while it exhibits the “container for liquid or gas” sense when it is associated with “gasoline.” Under these assumptions, we determine which of the words associated with an input word suggests which of its translations by aligning word associations by using a bilingual dictio- nary. Consider the associated English words (tank, soldier) and their Japanese translations (ᡓ㌴ (sensha), රኈ (heishi)). When we translate the word “tank” into Japanese, the associated word “soldier” helps us to translate it into “ᡓ㌴ (sensha)”, not to translate it into “ࢱࣥࢡ (tanku)” which means “a storage tank.” This naive method seems to suffer from the following difficulties: A disparity in topical coverage between Figure 1. Overview of our method. two corpora in two languages A shortage in the bilingual dictionary tions. Then, we iteratively calculate a correla- The existence of polysemous associated tion matrix for each word in the source lan- words that cannot determine the correct guage. Finally, we select the translation with sense of the input word the highest correlation from the translation For these difficulties, we use the tendency that candidates of the input word and the the two words associated with a third word are co-occurring words. likely to suggest the same sense of the third For each input word in the source language, word when they are also associated with each we calculate correlation values between their other. For example, consider an English asso- translation candidates and their association ciated word pair (tank, troop). The word “troop” words. The algorithm is shown in Figure 2. 
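The final selection step of Section 2.1, namely taking, among the translation candidates of the input word, the one most strongly supported by association words actually occurring in its context, can be sketched as follows. The correlation matrix is assumed to be already computed; aggregating context support by the maximum correlation is an interpretive choice, and the toy values simply echo the tank/soldier illustration.

```python
def select_translation(corr_matrix, context_words):
    """Choose the translation candidate best supported by association words in the context.

    corr_matrix   -- dict mapping (association_word, translation_candidate) -> correlation value
                     (one row per association word of the input word f)
    context_words -- words co-occurring with f in the sentence to be translated
    Returns (best_translation, score), or (None, 0.0) if no association word is present.
    """
    candidates = {e for (_, e) in corr_matrix}
    best, best_score = None, 0.0
    for e in candidates:
        # take the strongest single supporter; summing over context words is a plausible variant
        score = max((corr_matrix.get((w, e), 0.0) for w in context_words), default=0.0)
        if score > best_score:
            best, best_score = e, score
    return best, best_score

# Toy illustration mirroring the tank/soldier example:
# corr = {('soldier', '戦車'): 0.9, ('soldier', 'タンク'): 0.1, ('gasoline', 'タンク'): 0.8}
# select_translation(corr, ['soldier', 'attack'])  ->  ('戦車', 0.9)
```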
cannot distinguish the different meanings be- In Algorithm 1, the initialization of correla- cause it can co-occur with the word “tank” in tion values is based on word associations, both senses of the word. The third word “sol- where D is a set of word pairs in the bilingual dier,” which is associated with both “tank” and dictionary, and Af and Ae are the sets of asso- “troop,” can suggest the translation “ ᡓ㌴ ciated word pairs. First, we retain associated (sensha).” words f’(i) when its translation e’ exists and The overview of our method is shown in when e’ is associated with e. In the iteration, Figure 1. We first extract associated word pairs the correlation values of associated words f’(i) in the source and target languages from com- that suggest e(j) increase relatively by using parable corpora. Using a bilingual dictionary, association scores ( ( ), ) and we obtain alignments of these word associa- 31 ( ( ), ). In our experiments, we set the Algorithm 1: number of iterations Nr to 10. Input: 2.2 Alternative association measures for f: an input word extracting word associations f’(1), …, f’(I): associated words of f e(1), …, e(J): translation candidates We extract co-occurring word pairs and calcu- of f late their association scores. In this paper, we Nr: number of iterations focus on some frequently used metrics for A bilingual lexicon finding word associations based on their oc- Word association scores for both currence/co-occurrence frequencies. languages Suppose that words x and y frequently Output: co-occur. Let n and n be the occurrence fre- 1 2 Cf = [Cf (f’(i), e(j))]: a correlation ma- quencies of x and y respectively, and let m be trix for f the frequency that x and y co-occur between w content words. The parameter w is a window ɑ 1: if ( ), ( ) ƕ0 then size that adjusts the range of co-occurrences. ɒ Let N and M be the sum of occur- Sęʠ ($), (%)ʡ ɑ( ) ( ) 2: , Ų ɒ rences/co-occurrences of all words/word pairs, Ğ Sę ($), ( ) respectively. The frequencies are summarized 3: else in Table 1. ɑ 4: ( ), ( ) Ų 0 The word association scores ( , ) are de- 5: end fined as follows: 6: Ų 0 Dice coefficient (Smadja, 1993) 7: while i < N 2 r Dice( , ) = (1) 8: Ų + 1 ɑ + 9: ( ), ( ) Ų ( ( ), ) Jaccard coefficient (Smadja et al., 1996) ( ( ), ) ( , ( )) × Jaccard( , ) = (2) max ( ( ), ) ( , ( )) + 10: end Pointwise mutual information (pMI) ( , ) = (Church and Hanks, 1990) 1 (Ǥ : ( , ) Ǩ ̻ ,( , )Ǩ̾)ʭ / Ƥ & pMI( , ) =log (3) 0 (otherwise) ( )( ) Log-likelihood ratio (LLR) (Dunning, (9 :=9{ ɑɑ|( , ɑɑ)Ǩę,( ɑ($), ɑɑ)Ǩę}) 1993) Figure 2. Algorithm for calculating correlation LLR( , ) matrices. = 2 logL( , , ) +logL( , , ) x x not Total logL( , , ) occur occur logL( , , ) ; (4) ( ) logL , , = y mn2 –m n2 log +( ) log (1 ), (5) occur = , = , = (6) y not n1 –m M–n1 N–n2 Student’s t-score (TScore) (Church et al., occur –n2 + m 1991) Total n N–n N 1 1 TScore( , ) = (7) We calculate association scores for all pairs Table 1. Contingency matrix of occurrence of words when their occurrence frequencies frequencies. are not less than a threshold Tf and when their 32 33 Training comparable corpora every Chinese-Japanese term pair (tC, tJ) when The New York Times texts from (tC, tE) and (tJ, tE) were present in the dictiona- English Gigaword Corpus ries for one or more English terms tE. This Fourth Edition (LDC2009T13): merged dictionary contains about two million 1.6 billion words term pairs. 
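For reference, the association measures listed in Section 2.2 can be computed directly from the contingency counts of Table 1 (n1, n2, m, N, M). The equations above are partly garbled in this copy, so the sketch below uses the standard textbook form of each measure; in particular the probability estimates inside pMI and the variance term of the t-score are assumptions rather than the authors' exact definitions.

```python
import math

def association_scores(n1, n2, m, N, M):
    """Association scores for a word pair (x, y).

    n1, n2 : occurrence frequencies of x and y
    m      : co-occurrence frequency of x and y within the window
    N, M   : totals of word occurrences and word-pair co-occurrences
    Assumes all counts are positive.
    """
    def logL(k, n, p):
        # Dunning-style log-likelihood term; clamp p away from 0 and 1.
        p = min(max(p, 1e-12), 1 - 1e-12)
        return k * math.log(p) + (n - k) * math.log(1 - p)

    dice = 2.0 * m / (n1 + n2)
    jaccard = m / (n1 + n2 - m)
    pmi = math.log((m / M) / ((n1 / N) * (n2 / N))) if m > 0 else 0.0

    p, p1, p2 = n2 / N, m / n1, (n2 - m) / (N - n1)
    llr = 2.0 * (logL(m, n1, p1) + logL(n2 - m, N - n1, p2)
                 - logL(m, n1, p) - logL(n2 - m, N - n1, p))

    t_score = (m - n1 * n2 / N) / math.sqrt(m) if m > 0 else 0.0
    return {"Dice": dice, "Jaccard": jaccard, "pMI": pmi,
            "LLR": llr, "TScore": t_score}
```

As in the experiments, scores would only be computed for pairs whose occurrence frequencies reach the threshold Tf, and associated words below the score threshold TA can later be cut off.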
While these Chinese-Japanese term The Mainichi Shimbun Corpus pairs include wrong translations, it was not a (2000-2005): 195 million words serious problem in our experiments because Test corpus wrong translations were excluded in the pro- A part of The New York Times cedure of our method. (January 2005): 157 paragraphs, We applied morphological analysis and 1,420 input words part-of-speech tagging by using TreeTagger Experiment B1 (Schmid, 1994) for English, JUMAN for Jap- Training comparable corpus anese, and mma (Kruengkrai et al., 2009) for In-house Chinese-Japanese pa- Chinese, respectively. rallel corpus in the environment In the test corpus, we manually annotated 2 domain: 53,027 sentence pairs1 reference translations for each target word. Test corpus The parameters we used were as follows: A part of the training data: 1,443 Experiment A: sentences, 668 input words Tf = 100 (Japanese), Tf = 1000 (English), Experiment B2 Tc = 4, w = 30, = 30. Training comparable corpus Experiment B1/B2: In-house Chinese-Japanese pa- Tf = 100, Tc = 4, w = 10, = 25. rallel corpus in the medical do- Some of the parameters were empirically ad- main: 123,175 sentence pairs justed. Test corpus In the experiments, the matrices could be obtained for 9103 English words (A), 674 A part of the training data: 940 sentences, 3,582 input words Chinese words (B1) and 1258 Chinese words Dictionaries (B2), respectively. In average one word had 3.24 (A), 1.15 (B1) and 1.51 (B2) translation Japanese-English bilingual dictiona- 3 ries: Total 333,656 term pairs candidates by using the best setting. Table 2 is the resulted matrix for the word “home” in EDR Electronic Dictionary the Experiment A. Eijiro, Third Edition EDICT (Breen, 1995) 4.2 Results of English-Japanese word Chinese-English bilingual dictionary translation Chinese-English Translation Table 3 shows the results of Experiment A. We Lexicon Version 3.0 (LDC classified the translation results for 1,420 2002L27): 54,170 term pairs target English words into four categories: True, Wanfang Data Chinese-English False, R, and M. When the translation was Science/Technology Bilingual output, the result was True if the output is Dictionary: 525,259 term pairs included in the reference translations, and it For the Chinese-Japanese translation, we gen- was False otherwise. The result was R when all erated a Chinese-Japanese bilingual dictionary the associated words in the correlation matrix by merging Chinese-English and Japa- nese-English dictionaries. The Chi- nese-Japanese bilingual dictionary includes 2 We prepared multiple references for several tar- get words. The average numbers of reference trans- lations for an input word are 1.84 (A), 1.50 (B1), 1 We could prepare only parallel corpora for Chi- and 1.48 (B2), respectively. nese-Japanese language pair as training corpora. 3 From each matrix, we cut off the columns with For our experiments, we assumed them as compa- translations that do not have the best scores for any rable corpora and did not use the correspondence of associated words, because such translations are sentence pairs. never selected. 34 did not occur around the input word. The result Score TA True False R M was M when no correlation matrix existed for Dice 0 588 627 90 115 the input word. We did not select a translation (41%/48%) output in these cases. The recall and precision 0.001 586 619 94 121 are shown in the parentheses below. 
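The Chinese-Japanese dictionary described above is obtained purely by pivoting through English: a pair (tC, tJ) is admitted whenever the Chinese-English and Japanese-English dictionaries share at least one English term tE. A minimal sketch of that merge, with an assumed list-of-pairs input format:

```python
from collections import defaultdict

def merge_via_english(zh_en_pairs, ja_en_pairs):
    """Build a Chinese-Japanese lexicon by pivoting through English terms."""
    en_to_ja = defaultdict(set)
    for t_ja, t_en in ja_en_pairs:
        en_to_ja[t_en].add(t_ja)

    zh_ja = set()
    for t_zh, t_en in zh_en_pairs:
        for t_ja in en_to_ja.get(t_en, ()):
            zh_ja.add((t_zh, t_ja))   # (tC, tJ) shares the pivot term tE
    return zh_ja
```

Noisy pairs introduced by the pivot are tolerated because, as noted in footnote 3, translations that never obtain the best score for any associated word are later cut from the correlation matrices.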
(41%/49%) Recall = (True) / (Number of input words) 0.01 479 507 243 191 (34%/49%) Precision = (True) / (True + False) Jacc- 0 594 621 90 115 Among the settings, we obtained the best ard (42%/49%) results, 42% recall and 49% precision, when 0.001 584 609 105 122 we used the Jaccard coefficient for association (41%/49%) scores and TA = 0, which means all pairs were 0.01 348 378 374 320 taken into consideration. Among other settings, (25%/48%) the Dice coefficient achieved a comparable pMI 0 292 309 704 115 performance with Jaccard. (21%/49%) 1 293 308 703 116 4.3 Results of Chinese-Japanese word (21%/49%) translation LLR 10 530 747 28 115 Tables 4 and 5 show the results of Experiment (37%/42%) B. In each domain, we tested only the settings 100 529 744 32 115 on Dice, pMI, and LLR with TA = 0. (37%/42%) In the environmental domain, the pointwise T- 1 486 793 26 115 mutual information score achieved the best Score (34%/38%) performance, 45% recall and 76% precision. 4 489 787 26 118 However, the Dice coefficient gave the best (34%/38%) recall (55%) for the medical domain. This re- Table 3. Results of English-Japanese word sult indicates that Experiment B1/B2 had translation (A) higher precision and more words without the correlation matrix than Experiment A had. Score TA True False R M Dice 0 1984 895 82 621 4.4 Discussion (55%/69%) As a result, we could generate bilingual lex- pMI 0 1886 804 271 621 icons with word translation disambiguation (53%/70%) information for 9103 English words and 1932 LLR 0 1652 1246 63 621 Chinese words. Although the number of words (46%/57%) might be augmented by changing the settings, Table 4. Results of Chinese-Japanese word the size does not seem to be sufficient as bi- translation for environmental domain (B1). lingual dictionaries. The availability of larger output should be investigated. Score TA True False R M The experimental results show that our me- Dice 0 277 124 14 253 thod selected correct translations for at least (41%/69%) half of the input words if a correlation matrix pMI 0 303 95 17 253 existed and if the associated words co-occur. (45%/76%) Among all input words, at least 40% of the LLR 0 269 131 15 253 input words can be translated. The bilingual (40%/67%) dictionaries included 24.4, 38.6, and 52.0 Table 5. Results of Chinese-Japanese word translation candidates for one input word in translation for medical domain (B2). Experiment A, B1, and B2, respectively. When we select the most frequent word, the preci- candidates in the correlation matrices for one sions were 7%, 1%, and 1%, respectively. input word are shown in Table 6. These indi- Meanwhile, the average numbers of translation cate that our method effectively removed noisy 35 translations from the Chinese-Japanese dictio- Score TA Exp.A Exp.B1 Exp.B2 nary merged Japanese-English and Chi- Dice 0 1.83 0.61 0.91 nese-English dictionaries, and that the associa- pMI 0 1.27 0.50 0.79 tion scores contributed word translation dis- LLR 10/0 1.34 1.48 2.75 ambiguation. Table 6. Average numbers of translation can- Among the settings, the Jaccard/Dice coef- didates in the correlation matrices for one input ficients were proven to be effective, although word. pointwise mutual information (pMI) was also effective for technical domains and the Chi- tactic co-occurrence, which is obtained by nese-Japanese language pair. Because the Jac- conducting dependency parsing of the results card/Dice coefficients were originally used for of a sentence. 
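Under the scoring scheme above, each test word falls into exactly one of the four categories True, False, R and M, and recall and precision follow from the True and False counts. The sketch below reuses the disambiguate helper and the translations table from the earlier sketch; the input formats are assumptions.

```python
def evaluate(test_items, matrices, references, translations):
    """test_items: (input word, context words) pairs; matrices: word -> correlation
    matrix; references: word -> set of acceptable translations."""
    counts = {"True": 0, "False": 0, "R": 0, "M": 0}
    for word, context in test_items:
        if word not in matrices:
            counts["M"] += 1            # no correlation matrix for this word
            continue
        C = matrices[word]
        active = [w for w in context if any(fp == w for (fp, _) in C)]
        if not active:
            counts["R"] += 1            # no associated word occurs around the input
            continue
        output = disambiguate(word, active, C, translations)
        counts["True" if output in references.get(word, set()) else "False"] += 1

    recall = counts["True"] / len(test_items)
    denom = counts["True"] + counts["False"]
    precision = counts["True"] / denom if denom else 0.0
    return counts, recall, precision
```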
The correlation between asso- measuring the proximity of sets, these might ciated words and translation candidates also be effective for collecting related words by needs to be re-examined. Similarly, we will using the similarity of kinds of co-occurring handle verbs as associated words to the input words. However, pMI tends to emphasize nouns by using syntactic co-occurrence. low-frequency words as associated words. The consequence of this tendency might be that 5 Related Work low-frequency associated words do not appear around the input word in the newspaper text.4 Statistical machine translation (Brown et al., In most metrics for the association score, the 1990) automatically acquires knowledge for lowest threshold value TA achieved the best word translation disambiguation from parallel performance. This result indicates that the corpora. Word translation disambiguation is cut-off of associated words by some thresholds based on probabilities calculated from the was not effective, although it requires more word alignment, phrase pair extraction, and the time and memory space to obtain correlation language model. However, much broad con- matrices without cut-off. How to optimize oth- text/domain information is not considered. er parameters in our method remains unsolved. Carpuat and Wu (2007) proposed con- More words without the correlation matrix text-dependent phrasal translation lexicons by were present in Experiment B1/B2 than in Ex- introducing context-dependent features into periment A because the input word was often a statistical machine translation. technical term that was not in the bilingual dic- Unsupervised methods using dictionaries tionary. The better recall and precision of Ex- and corpora were proposed for monolingual periment B1/B2 came from several reasons, WSD (Ide and Veronis, 1998). They used including difference of test sets and language grammatical information including pairs. In addition, it might have an impact on parts-of-speech, syntactically related words, this result that the fact that word translation and co-occurring words as the clues for the disambiguation of technical terms is easier WSD. Our method uses a part of the clues for than word translation disambiguation of com- bilingual WSD and word translation disam- mon words. biguation. We handled only nouns as input words and Li and Li (2002) constructed a classifier for associated words in this study. Considering word translation disambiguation by using a only the co-occurrence in a fixed window bilingual dictionary with bootstrapping tech- would be insufficient to apply this method to niques. We also conducted recursive calcula- the translation of verbs and other parts of tion by dealing with the bilingual dictionary as speech. In future work, we will consider syn- the seeds of the iteration. Vickrey et al. (2005) introduced a context as 4 a feature for a statistical MT system and they We limited the maximum number of association generated word-level translations. How to in- words for one word to 400 in descending order of troduce the word-level translation disambigua- their association scores because of restriction of tion into sentence-level translation is a consi- computational resources. In future work, we may alleviate the drawback of pMI by enlarging or de- derable problem. leting this limitation. 36 6 Conclusion Church, Kenneth W. and Patrick Hanks. 1990. Word association norms, mutual information, and lex- In this paper, we described a method for add- icography. 
Computational Linguistics, ing information for word translation disam- 16(1):22-29. biguation into the bilingual lexicon, by consi- Dunning, Ted. 1993. Accurate methods for the sta- dering the associated words that co-occur with tistics of surprise and coincidence. Computa- the input word. We based our method on the tional Linguistics, 19(1):61-74. following two assumptions: “parallel word Ide, Nancy and Jean Veronis. 1998. Introduction to associations” and “one sense per word associa- the special issue on word sense disambiguation: tion.” We aligned word associations by using a the state of the art. Computational Linguistics, bilingual dictionary, and constructed a correla- 24(1):1-40. tion matrix for each word in the source lan- guage for word translation disambiguation. Kaji, Hiroyuki and Yasutsugu Morimoto. 2002. Unsupervised word sense disambiguation using Experiments showed that our method was ap- bilingual comparable corpora. In Proc. of the plicable for both common newspaper text and 19th International Conference on Computational domain-specific text and for two language Linguistics, pages 411-417. pairs. The Jaccard/Dice coefficients were proven to be more effective than the other me- Kruengkrai, Canasai, Kiyotaka Uchimoto, Jun’ichi trics as word association scores. Future work Kazama, Yiou Wang, Kentaro Torisawa, and Hitoshi Isahara. 2009. An error-driven includes extending our method to handle verbs word-character hybrid model for joint Chinese as input words by introducing syntactic word segmentation and POS tagging. In Proc. of co-occurrence. The comparisons with other the Joint Conference of the 47th Annual Meeting disambiguation methods and machine transla- of the ACL and the 4th International Joint Con- tion systems would strengthen the effective- ference on Natural Language Processing of the ness of our method. We consider also evalua- AFNLP, pages 513-521. tions on real NLP tasks including machine Li, Cong and Hang Li. 2002. Word translation translation. disambiguation using bilingual bootstrapping. In Proc. of the 40th Annual Meeting of Association Acknowledgments for Computational Linguistics, pages 343-351. This work was partially supported by Japa- Rapp, Reinhard. 1995. Identifying word transla- nese/Chinese Machine Translation Project in tions in non-parallel texts. In Proc. of the 33rd Special Coordination Funds for Promoting Annual Meeting of the Association for Computa- Science and Technology (MEXT, Japan). tional Linguistics, pages 320-322. Schmid, Helmut. 1994. Probabilistic part-of-speech References tagging using decision trees. In Proc. of the 1st International Conference on New Methods in Brown, Peter F., John Cocke, Stephen A. Della Natural Language Processing. Pietra, Vincent J. Della Pietra, Fredrick Jelinek, John D. Lafferty, Robert L. Mercer, and Paul S. Smadja, Frank. 1993. Retrieving collocations from Roossin. 1990. A statistical approach to machine text: Xtract. Computational Linguistics, translation. Computational Linguistics, 19(1):143-177. 16(2):79-85. Smadja, Frank, Kathleen R. McKeown and Vasi- Carpuat, Marine and Dekai Wu. 2007. Con- leios Hatzivassiloglou. 1996. Translating collo- text-dependent phrasal translation lexicons for cations or bilingual lexicons: A statistical ap- statistical machine translation. In Proc. of Ma- proach. Computational Linguistics, 22(1):3-38. chine Translation Summit XI, pages 73-80. Vickrey, David, Luke Biewald, Marc Teyssier, and Church, Kenneth W., William Gale, Patrick Hanks Daphne Koller. 2005. 
Word-sense disambigua- and Donald Hindle. 1991. Using statistics in tion for machine translation. In Proc. of the lexical analysis. Lexical Acquisition: Using Conference on HLT/EMNLP, pages 771-778. On-line Resources to Build a Lexicon, pages Yarowsky, David. 1993. One sense per collocation. 115-164. In Proc. of ARPA Human Language Technology Workshop, pages 266-271. 37 Construction of bilingual multimodal corpora of referring expressions in collaborative problem solving TOKUNAGA Takenobu IIDA Ryu YASUHARA Masaaki TERAI Asuka take,ryu-i,yasuhara @cl.cs.titech.ac.jp [email protected] { } Tokyo Institute of Technology David MORRIS Anja BELZ [email protected] [email protected] University of Brighton Abstract reference resolution, particularly anaphora resolu- tion (Mitkov, 2002), has a long research history as This paper presents on-going work on far back as the mid-1970s (Hobbs, 1978). Much constructing bilingual multimodal corpora research has been conducted from both theoretical of referring expressions in collaborative and empirical perspectives, mainly concerning the problem solving for English and Japanese. identification of antecedents or entities mentioned The corpora were collected from dia- within the same text. This trend, targeting refer- logues in which two participants collab- ence resolution in written text, is still dominant in oratively solved Tangram puzzles with the language analysis, perhaps because such tech- a puzzle simulator. Extra-linguistic in- niques are intended for use in applications such as formation such as operations on puzzle information extraction. pieces, mouse cursor position and piece In contrast, in language generation research in- positions were recorded in synchronisa- terest has recently shifted from the generation of tion with utterances. The speech data one-off references to entities to generation of REs was transcribed and time-aligned with the in discourse context (Belz et al., 2010) and inves- extra-linguistic information. Referring tigating human referential behaviour in real world expressions in utterances that refer to puz- situations, with the aim of using such techniques zle pieces were annotated in terms of their in applications like human-robot interaction (Pi- spans, their referents and their other at- wek, 2007; Foster et al., 2008; Bard et al., 2009). tributes. The Japanese corpus has already been completed, but the English counter- In both analysis and generation, machine- part is still undergoing annotation. We learning approaches have come to replace rule- have conducted a preliminary comparative based approaches as the predominant research analysis of both corpora, mainly with re- trend since the 1990s. This trend has made anno- spect to task completion time, task suc- tated corpora an indispensable component of re- cess rates and attributes of referring ex- search for training and evaluating proposed meth- pressions. These corpora showed signif- ods. In fact, research on reference resolution has icant differences in task completion time developed significantly as a result of large scale and success rate. corpora, e.g. those provided by the Message Un- derstanding Conference (MUC)1 and the Auto- 1 Introduction matic Content Extraction (ACE)2 project. These corpora were constructed primarily for informa- A referring expression (RE) is a linguistic de- tion extraction research, thus were annotated with vice that refers to a certain object of interest (e.g. co-reference relations within texts. 
Also in the used in describing where the object is located in language generation community, several corpora space). REs have attracted a great deal of atten- tion in both language analysis and language gen- 1http://www.nlpir.nist.gov/related projects/muc/ eration research. In language analysis research, 2http://www.itl.nist.gov/iad/tests/ace/ 38 Proceedings of the 8th Workshop on Asian Language Resources, pages 38–46, Beijing, China, 21-22 August 2010. c 2010 Asian Federation for Natural Language Processing have been developed (Di Eugenio et al., 2000; By- nally, Section 6 concludes the paper and looks at ron, 2005; van Deemter et al., 2006; Foster and possible future directions. Oberlander, 2007; Foster et al., 2008; Stoia et al., 2008; Spanger et al., 2009a; Belz et al., 2010). Unlike the corpora of MUC and ACE, many are collected from situated dialogues, and therefore include multimodal information (e.g. gestures and eye-gaze) other than just transcribed text (Martin et al., 2007). Foster and Oberlander (2007) em- phasised that any corpus for language generation !'#%%*"#( should include all possible contextual information +')$#&!%#) # at the appropriate granularity. Since constructing a dialogue corpus generally requires experiments Figure 1: Screenshot of the Tangram simulator for data collection, this kind of corpus tends to be small-scale compared with corpora for reference resolution. 2 Data collection Against this background, we have been de- veloping multimodal corpora of referring expres- 2.1 Experimental set-up sions in collaborative problem-solving settings. We recruited subjects in pairs of friends and col- This paper presents on-going work of construct- leagues. Each pair was instructed to solve Tan- ing bilingual (English and Japanese) comparable gram puzzles collaboratively. Tangram puzzles corpora in this domain. We achieve our goal by are geometrical puzzles that originated in ancient replicating, for the English corpus, the same pro- China. The goal of a Tangram puzzle is to con- cess of data collection and annotation as we used struct a given goal shape by arranging seven sim- for our existing Japanese corpus (Spanger et al., ple shapes, as shown in Figure 1. The pieces in- 2009a). Our aim is to create bilingual multimodal clude two large triangles, a medium-sized trian- corpora collected from dialogues in dynamic situ- gle, two small triangles, a parallelogram and a ations. From the point of view of reference anal- square. ysis, our corpora contribute to augmenting the re- With the aim of recording the precise position sources of multimodal dialogue corpora annotated of every piece and every action the participants with reference relations which have been minor made during the solving process, we implemented in number compared to other types of text cor- a Tangram simulator in which the pieces can be pora. From the point of view of reference gen- moved, rotated and flipped with simple mouse op- eration, our corpora contribute to increasing the erations on a computer display. The simulator dis- resources available that can be used to further re- plays two areas: a goal shape area and a work- search of this kind. In addition, our corpora con- ing area where the pieces can be manupulated and tribute to comparative studies of human referential their movements are shown in real time. behaviour in different languages We assigned a different role to each participant The structure of the paper is as follows. 
Sec- of a pair: one acted as the solver and the other as tion 2 describes the experimental set-up for data the operator. The operator has a mouse for manip- collection which was introduced in our previous ulating Tangram pieces, but does not have a goal work (Spanger et al., 2009a). The setting is basi- shape on the screen. The solver has a goal shape cally the same for the construction of the English on the screen but does not have a mouse. This set- corpus. Section 3 explains the annotation scheme ting naturally leads to a situation where given a adopted in our corpora, followed by a description certain goal shape, the solver thinks of the neces- of a preliminary analysis of the corpora in sec- sary arrangement of the pieces and gives instruc- tion 4. Section 5 briefly mentions related work tions to the operator how to move them, while the to highlight the characteristics of our corpora. Fi- operator manipulates the pieces with the mouse 39 according to the solver’s instructions. Each participant pair was assigned 4 trials con- sisting of two symmetric and two asymmetric goal shapes as shown in Figure 3. In Cogni- tive Science, a wide variety of different kinds of puzzles have been employed extensively in the field of Insight Problem solving. This has been termed the “puzzle-problem approach” (Sternberg and Davidson, 1996; Suzuki et al., 2001) and in the case of physical puzzles has relatively of- ten involved puzzle tasks of symmetric shapes like the so-called T-puzzle, e.g. (Kiyokawa and Nakazawa, 2006). In more recent work Tangram puzzles have been used as a means to study var- ious new aspects of human problem solving ap- Figure 2: Picture of the experiment setting proaches, including collection of of eye-gaze in- formation (Baran et al., 2007). In order to col- As we mentioned in our previous lect data as broadly as possible in this context, we study (Spanger et al., 2009a), this interaction set up puzzle-problems including both symmetri- produces frequent use of referring expressions cal as well as asymmetrical ones as shown in Fig- intended to distinguish specific pieces of the ure 3. puzzle. In our Tangram simulator, all pieces are of the same color, thus color is not useful in identifying a specific piece, i.e. only size The participants exchanged their roles after two and shape are discriminative object-intrinsic trials, i.e. a participant first solves a symmetric and attributes. Instead, we can expect other attributes then an asymmetric puzzle as the solver and then such as spatial relations and deictic reference to does the same as the operator, and vice versa. The be used more often. order of the puzzle trials is the same for all pairs. Each pair of participants sat side by side as shown in Figure 2. Each participant had his/her own computer display showing the shared work- Before starting the first trial as the operator, ing area. A room-divider screen was set between each participant had a short training exercise in the solver (right side) and operator (left side) to order to learn how to manipulate pieces with the prevent the operator from seeing the goal shape on mouse. The initial arrangement of the pieces was the solver’s screen, and to restrict their interaction randomised every time. We set a time limit of 15 to speech only. minutes for the completion of each trial (i.e. con- struction of the goal shape). 
In order to prevent the solver from getting into deep thought and keeping silent, the simulator is designed to give a hint ev- ery five minutes by showing a correct piece posi- tion in the goal shape area. After 10 minutes have !% $".1-0%+- !& $#2%- passed, a second hint is provided, while the pre- vious hint disappears. A trial ends when the goal shape is complete or the time is up. Utterances by the participants are recorded separately in stereo through headset microphones in synchronisation !' $!*%+/$ !( $$)%$&.2, with the position of the pieces and the mouse op- erations. Piece positions and mouse actions were automatically recorded by the simulator at inter- Figure 3: The goal shapes given to the subjects vals of 10 msec. 40 Table 1: The ELAN Tiers of the corpus Table 2: Attributes of referring expressions Tier meaning dpr : demonstrative pronoun, e.g. “the same one”, “this”, “that”, “it” OP-UT utterances by the operator dad : demonstrative adjective, e.g. “that triangle” SV-UT utterances by the solver siz : size, e.g. “the large triangle” OP-REX referring expressions by the operator typ : type, e.g. “the square” OP-Ref referents of OP-REX dir : direction of a piece, e.g. “the triangle facing the OP-Attr attributes of OP-REX left”. SV-REX referring expressions by the solver prj : projective spatial relation (including directional SV-Ref referents of SV-REX prepositions or nouns such as “right”, “left”, SV-Attr attributes of SV-REX “above”. . . ) e.g. “the triangle to the left of the Action action on a piece square” Target the target piece of Action tpl : topological spatial relation (including non- Mouse the piece on which the mouse is hovering directional prepositions or nouns such as “near”, Indentation of Tier denotes parent-child relations. ∗ “middle”. . . ), e.g. “the triangle near the square” ovl : overlap, e.g. “the small triangle under the large one” 2.2 Subjects and collected data act : action on pieces, e.g “the triangle that you are holding now”, “the triangle that you just rotated” For our Japanese corpus, we recruited 12 Japanese cmp : complement, e.g. “the other one” sim : similarity, e.g. “the same one” graduate students of the Cognitive Science depart- num : number, e.g. “the two triangle” ment, 4 females and 8 males, and split them into 6 rpr : repair, e.g. “the big, no, small triangle” pairs. All pairs knew each other previously and err : obvious erroneous expression, e.g. “the square” referring to a triangle were of the same sex and approximately same nest : nested expression; when a referring expression age3. We collected 24 dialogues (4 trials by 6 includes another referring expression, only the pairs) of about 4 hours and 16 minutes. The av- outermost expression is annotated with this at- tribute, e.g. “(the triangle to the left of (the small erage length of a dialogue was 10 minutes 40 sec- triangle))” onds (SD = 3 minutes 18 seconds). meta: metaphorical expression, e.g. “the leg”, “the For the comparable English corpus, we re- head” cruited 12 native English speakers of various oc- cupations, 6 males and 6 females. Their aver- SLAT (Noguchi et al., 2008)4. Our target expres- age age was 30. There were 6 pairs all of whom sions in this corpus are referring expressions re- knew each other beforehand except for one pair. ferring to a puzzle piece or a set of puzzle pieces. 
Whereas during the creation of the Japanese cor- We do not deal with expressions referring to a lo- pus we had to give extra attention to ensuring that cation, a part of a piece or a constructed shape. social relationships did not have an impact on how These expressions are put aside for future work. the subjects communicated with one another, for The annotation of referring expressions is three- the English corpus there was no such concern. We fold: (1) identification of the span of expressions, collected 24 dialogues (4 trials by 6 pairs) of 5 (2) identification of their referents, and (3) assign- hours and 7 minutes total length. The average ment of a set of attributes to each referring expres- length of a dialogue was 12 minutes 47 seconds sion. (SD = 3 minutes 34 seconds). Using the multimodal annotation tool ELAN,5 the annotations of referring expressions were then 3 Annotation merged with extra-linguistic data recorded by the The recorded speech data was transcribed and Tangram simulator. The available extra-linguistic the referring expressions were annotated with information from the simulator consists of (1) the the Web-based multi-purpose annotation tool action on a piece, (2) the coordinates of the mouse cursor and (3) the position of each piece in the 3In Japan, the relationship of senior to junior or socially higher to lower placed might affect the language use. We 4We did not use SLAT for English corpus annotation. In- carefully recruited pairs to avoid the effects of this social re- stead, ELAN was directly used for annotating referring ex- lationship such as the possible use of overly polite and indi- pressions. rect language, reluctance to correct mistakes etc. 5http://www.lat-mpi.eu/tools/elan/ 41 Table 3: Summary of trials ID time success OP-REX SV-REX ID time success OP-REX SV-REX E01 15:00 J01 8:40 o 10 48 E02 15:00 J02 11:49 o 7 55 E03 15:00 J03 11:36 o 5 26 E04 15:00 J04 7:31 o 2 21 E05 15:00 J05 15:00 23 78 E06 15:00 J06 11:12 o 5 60 E07 15:00 J07 12:11 o 3 59 E08 15:00 J08 11:20 o 4 61 E09 10:39 o J09 14:59 o 36 84 E10 15:00 J10 6:20 o 3 47 E11 15:00 J11 5:21 o 2 14 E12 8:30 o J12 13:40 o 37 77 E13 14:33 o 8 95 J13 15:00 8 56 E14 7:27 o 1 62 J14 4:48 o 1 29 E15 14:02 o 16 127 J15 9:30 o 20 39 E16 3:57 o 1 31 J16 5:07 o 3 17 E17 13:00 o J17 13:37 o 10 46 E18 6:40 o J18 8:57 o 4 51 E19 15:00 J19 8:02 o 0 37 E20 12:32 o J20 11:23 o 1 59 E21 15:00 J21 10:12 o 7 71 E22 15:00 J22 10:24 o 9 64 E23 15:00 J23 15:00 0 69 E24 5:36 o J24 14:22 o 0 76 Ave. 12:47 6.5 78.8 Ave. 10:40 8.3 51.8 SD 3:34 7.14 41.4 SD 3:18 10.4 20.1 Total 5:06:56 10 26 315 Total 4:16:01 21 200 1,244 working area. Actions and mouse cursor positions Erroneous expressions are annotated with a • are recorded at intervals of 10 msec, and are ab- special attribute. stracted into (1) a time span labeled with an action An expression without a definite referent (i.e. • symbol (“move”, “rotate” or “flip”) and its target a group of possible referents or none) is as- piece number (1–7), and (2) a time span labeled signed a referent number sequence consist- with a piece number which is under the mouse ing of a prefix, followed by the sequence of cursor during that span. The position of pieces is possible referents as its referent, if any are updated and recorded with a timestamp when the present. position of any piece changes. Information about All expressions appearing in muttering to piece positions is not merged into the ELAN files • and is kept in separate files. As a result, we have oneself are excluded. 
11 time-aligned ELAN Tiers as shown in Table 1. Table 2 shows a list of attributes of referring Two annotators (two of the authors) first an- expressions used in annotating the corpus. notated four Japanese dialogues separately and The rest of the 20 Japanese dialogues were an- based on a discussion of discrepancies, decided notated by two of the authors and discrepancies on the following criteria to identify a referring ex- were resolved by discussion. Four English dia- pression. logues have been annotated so far by one of the The minimum span of a noun phrase in- authors. • cluding necessary information to identify a referent is annotated. The span might in- 4 Preliminary corpus analysis clude repairs with their reparandum and dis- We have already completed the Japanese corpus, fluency (Nakatani and Hirschberg, 1993) if which is named REX-J (2008-08), but only 4 out needed. of 24 dialogues have been annotated for the En- Demonstrative adjectives are included in ex- glish counterpart (REX-E (2010-03)). Table 3 • pressions. shows a summary of the trials. The horizontal 42 lines divide the trials by pairs, “o” in the “suc- shape as the dependent variable and the goal shape cess” column denotes that the trial was success- as the independent variable. The main effect was fully completed in the time limit (15 minutes), and not significant. the “OP-REX” and “SV-REX” columns show the In summary, we found a difference in the task number of referring expressions used by the op- performance between the languages in terms of erator and the solver respectively. The following the task completion time and the success rate, but subsections describe a preliminary comparison of no difference among the goal shapes. This dif- the English and Japanese corpora. ference could be explained by the diversity of the subjects rather than the difference of languages. Table 4: Task completion time The Japanese subject group consisted of univer- Lang. Shape (a) (b) (c) (d) sity graduate students from the same department English\ 832.0 741.2 890.3 605.8 (Cognitive Science) and roughly of the same age (105.4) (246.5) (23.7) (287.2) (Average = 23.3, SD = 1.5). In contrast, the En- Japanese 774.7 535.0 571.7 633.8 (167.3) (168.5) (242.2) (215.2) glish subjects have diverse backgrounds (e.g. high * Average (SD) school students, university faculty, writer, pro- grammer, etc.) and age (Average = 30.8, SD = 11.7). In addition, a familiarity with this kind of 4.1 Task performance geometric puzzle might have some effect. How- We conducted a two-way ANOVA with the task ever, we collected a familiarity with the puzzle completion time as the dependent variable, and only from the English subjects, we could not con- the goal shape and the language as the indepen- duct further analysis on this viewpoint. Anyhow, dent variables. Only the main effect of the lan- in this respect, the independent variable should guage was significant (F (1, 40) = 5.82, p < have been named “subject group” instead of “lan- 0.05). Table 4 shows the average and the standard guage”. deviation of the completion time. Note that we set a time limit (15 minutes) for solving the puzzle. 4.2 Referring expressions We considered the completion time as 15 minutes It is important to note that since we have only even when a puzzle was not actually solved in the completed the annotation of four dialogs, all by time limit. We also conducted a two-way ANOVA one pair of subjects, our analyses of referring ex- using only the successful cases. 
Both main effects pressions are tentative and pending further analy- and their interaction were not significant. sis. We then conducted an ANOVA with the num- We have 200 and 1,243 referring expressions by ber of successfully solved puzzles by each pair as the operator and the solver respectively, 1,444 in the dependent variable and the language as the in- total in the 24 Japanese dialogues. On the other dependent variable. The main effect was signifi- hand we have 26 (operator) and 315 (solver) re- cant (F (1, 10) = 6.79, p < 0.05). Table 5 shows ferring expressions in 4 English dialogues. The the average number of success goals per pair and average number of referring expressions per di- the success rate with their standard deviations in alogue in Table 3 suggests that English subjects parentheses. use more referring expressions than Japanese sub- Finally, we conducted an ANOVA with the jects. Since we have only the data from a single number of pairs who succeeded in solving a goal pair, we cannot say whether this tendency applies to the other pairs. We cannot draw a decisive con- clusion until we complete the annotation of the Table 5: The number of solved trials and success English corpus. rates Table 6 shows the total frequencies of the at- Lang. solved trials success rate [%] Japanese 3.50 (0.55) 87.5 (13.7) tributes and their frequencies per dialogue. The English 1.67 (1.63) 41.7 (40.8) table gives us an impression of significantly fre- * Average (SD) quent use of demonstrative pronouns (dpr) by the 43 as limited annotation. Table 6: Comparison of attribute distribution English Japanese The QUAKE corpus (Byron, 2005) and its suc- (4 dialogues) (24 dialogues) attribute frq frq/dlg frq frq/dlg cessor, the SCARE corpus (Stoia et al., 2008) deal dpr 226 56.5 678 28.3 with a more complex domain, where two partici- dad 29 7.3 178 7.4 siz 68 17.0 288 12.0 pants collaboratively play a treasure hunting game typ 103 25.8 655 27.3 in a 3-D virtual world. Despite the complexity dir 0 0 7 0.3 of the domain, the participants were only allowed prj 10 2.5 141 5.9 tpl 4 1 9 0.4 limited actions, e.g. moving step forward, pushing ovl 0 0 2 0.1 a button etc. act 5 1.3 103 4.3 cmp 17 4.3 33 1.4 As a part of the JAST project, the Joint Con- sim 0 0 7 0.3 num 22 5.5 35 1.5 struction Task (JCT) corpus was created based on rpr 0 0 1 0 dialogues in which two participants constructed a err 0 0 1 0 nest 1 0.3 31 1.3 puzzle (Foster et al., 2008). The setting of the meta 1 0.3 6 0.3 experiment is quite similar to ours except that both participants have even roles. Since our main concern is referring expressions, we believe our English subjects. The Japanese subjects use more asymmetric setting elicits more referring expres- attributes of projective spatial relations (prj) and sions than the symmetric setting of the JCT cor- 6 actions on the referent (act). The English subjects pus. use more complement attributes (cmp) as well as more number attributes (num). In contrast to these previous corpora, our cor- pora record a wide range of information useful 5 Related work for analysis of human reference behaviour in situ- ated dialogue. While the domain of our corpora is Over the last decade, with a growing recogni- simple compared to the QUAKE and SCARE cor- tion that referring expressions frequently appear pora, we allowed a comparatively large flexibil- in collaborative task dialogues (Clark and Wilkes- ity in the actions necessary for achieving the goal Gibbs, 1986; Heeman and Hirst, 1995), a num- shape (i.e. 
flipping, turning and moving of puzzle ber of corpora have been constructed to study the pieces at different degrees), relative to the com- nature of their use. This tendency also reflects plexity of the domain. Providing this relatively the recognition that this area yields both challeng- larger freedom of actions to the participants to- ing research topics as well as promising applica- gether with the recording of detailed information tions such as human-robot interaction (Foster et allows for research into new aspects of referring al., 2008; Kruijff et al., 2010). expressions. The COCONUT corpus (Di Eugenio et al., 2000) was collected from keyboard-dialogs be- As for a multilingual aspect, all the above cor- tween two participants, who worked together on pora are English. There have been several recent a simple 2-D design task, buying and arranging attempts at collecting multilingual corpora in situ- furniture for two rooms. The COCONUT cor- ated domains. For instance, (Gargett et al., 2010) pus is limited in annotations which describe sym- collected German and English corpora in the same bolic object information such as object intrinsic setting. Their domain is similar to the QUAKE attributes and location in discrete co-ordinates. As corpus. Van der Sluis et al. (2009) aim at a com- an initial work of constructing a corpus for collab- parative study of referring expressions between orative tasks, the COCONUT corpus can be char- English and Japanese. Their domain is still static acterised as having a rather simple domain as well at the moment. Our corpora aim at dealing with 6We called such expressions as action-mentioning expres- the dynamic nature of situated dialogues between sions (AME) in our previous work. very different languages, English and Japanese. 44 eration, and evaluation in different tasks (puzzles) Table 7: The REX-J corpus family as well. We are planning to distribute the REX-J name puzzle #pairs #dialg. #valid status T2008-08 Tangram 6 24 24 completed corpus family through GSK (Language Resources T2009-03 Tangram 10 40 16 completed Association in Japan)8, and the REX-E corpus T2009-11 Tangram 10 36 27 validating from both University of Brighton and GSK. N2009-11 Tangram 5 20 8 validating P2009-11 Polyomino 7 28 24 annotating D2009-11 2-Tangram 7 42 24 annotating References 6 Conclusion and future work Baran, Bahar, Berrin Dogusoy, and Kursat Cagiltay. 2007. How do adults solve digital tangram prob- This paper presented an overview of our English- lems? Analyzing cognitive strategies through eye tracking approach. In HCI International 2007 - 12th Japanese bilingual multimodal corpora of refer- International Conference - Part III, pages 555–563. ring expressions in a collaborative problem solv- ing setting. The Japanese corpus was completed Bard, Ellen Gurman, Robin Hill, Manabu Arai, and and has already been used for research (Spanger et Mary Ellen Foster. 2009. Accessibility and atten- tion in situated dialogue: Roles and regulations. In al., 2009b; Spanger et al., 2010; Iida et al., 2010), Proceedings of the Workshop on Production of Re- but the English counterpart is still undergoing an- ferring Expressions Pre-CogSci 2009. notation. We have also presented a preliminary comparative analysis of these corpora in terms of Belz, Anja, Eric Kow, Jette Viethen, and Albert Gatt. 2010. Referring expression generation in context: the task performance and usage of referring ex- The GREC shared task evaluation challenges. In pressions. 
We found a significant difference of the Krahmer, Emiel and Mariet¨ Theune, editors, Empir- task performance, which could be attributed to the ical Methods in Natural Language Generation, vol- difference in diversity of subjects. We have tenta- ume 5980 of Lecture Notes in Computer Science. tive results on the usage of referring expressions, Springer-Verlag, Berlin/Heidelberg. since only four English dialogues are available at Byron, Donna K. 2005. The OSU Quake 2004 cor- the moment. pus of two-party situated problem-solving dialogs. The data collection experiments were con- Technical report, Department of Computer Science ducted in August 2008 for Japanese and in March and Enginerring, The Ohio State University. 2010 for English. Between these periods, we Clark, H. Herbert. and Deanna Wilkes-Gibbs. 1986. conducted various data collections to build differ- Referring as a collaborative process. Cognition, ent types of Japanese corpora (March, 2009 and 22:1–39. November 2009). These experiments involve cap- Di Eugenio, Barbara, Pamela W. Jordan, Richmond H. turing eye-gaze information of participants during Thomason, and Johanna. D. Moore. 2000. The problem solving, and introducing variants of puz- agreement process: An empirical investigation of zles (Polyomino, Double Tangram and Tangram human-human computer-mediated collaborative di- without any hints7). They are also under prepa- alogues. International Journal of Human-Computer Studies, 53(6):1017–1076. ration for publication. Table 7 gives an overview of the REX-J corpus family, where “#valid” de- Foster, Mary Ellen and Jon Oberlander. 2007. Corpus- notes the number of dialogues with valid eye- based generation of head and eyebrow motion for gaze data. Eye-gaze data is difficult to capture an embodied conversational agent. Language Re- sources and Evaluation, 41(3–4):305–323, Decem- cleanly throughout a dialogue. We discarded di- ber. alogues in which eye-gaze was captured success- fully less than 70% of the total time of the dia- Foster, Mary Ellen, Ellen Gurman Bard, Markus Guhe, logue. Namely, we annotated or will annotate di- Robin L. Hill, Jon Oberlander, and Alois Knoll. 2008. The roles of haptic-ostensive referring ex- alogues with validated eye-gaze data only. pressions in cooperative, task-based human-robot These corpora enable research on utilising eye- dialogue. In Proceedings of 3rd Human-Robot In- gaze information in reference resolution and gen- teraction, pages 295–302. 7N2009-11 in Table 7 8http://www.gsk.or.jp/index e.html 45 Gargett, Andrew, Konstantina Garoufi, Alexander of referring expressions used in a situated collab- Koller, and Kristina Striegnitz. 2010. The give- oration task. In Proceedings of the 12th European 2 corpus of giving instructions in virtual environ- Workshop on Natural Language Generation (ENLG ments. In Proceedings of the Seventh conference on 2009), pages 110 – 113. International Language Resources and Evaluation (LREC 2010), pages 2401–2406. Spanger, Philipp, Masaaki Yasuhara, Ryu Iida, and Takenobu Tokunaga. 2009b. Using extra linguistic Heeman, Peter A. and Graeme Hirst. 1995. Collabo- information for generating demonstrative pronouns rating on referring expressions. Computational Lin- in a situated collaboration task. In Proceedings of guistics, 21:351–382. PreCogSci 2009: Production of Referring Expres- sions: Bridging the gap between computational and Hobbs, Jerry R. 1978. Resolving pronoun references. empirical approaches to reference. Lingua, 44:311–338. 
Spanger, Philipp, Ryu Iida, Takenobu Tokunaga, Iida, Ryu, Shumpei Kobayashi, and Takenobu Toku- Asuka Teri, and Naoko Kuriyama. 2010. Towards naga. 2010. Incorporating extra-linguistic informa- an extrinsic evaluation of referring expressions in tion into reference resolution in collaborative task situated dialogs. In Kelleher, John, Brian Mac dialogue. In Proceedings of 48th Annual Meeting Namee, and Ielka van der Sluis, editors, Proceed- of the Association for Computational Linguistics, ings of the Sixth International Natural Language pages 1259–1267. Generation Conference (INGL 2010), pages 135– 144. Kiyokawa, Sachiko and Midori Nakazawa. 2006. Ef- fects of reflective verbalization on insight problem Sternberg, Robert J. and Janet E. Davidson, editors. solving. In Proceedings of 5th International Con- 1996. The Nature of Insight. The MIT Press. ference of the Cognitive Science, pages 137–139. Stoia, Laura, Darla Magdalene Shockley, Donna K. Kruijff, Geert-Jan M., Pierre Lison, Trevor Ben- Byron, and Eric Fosler-Lussier. 2008. SCARE: jamin, Henrik Jacobsson, Hendrik Zender, and A situated corpus with annotated referring expres- Ivana Kruijff-Korbayova. 2010. Situated dialogue sions. In Proceedings of the Sixth International processing for human-robot interaction. In Cogni- Conference on Language Resources and Evaluation tive Systems: Final report of the CoSy project, pages (LREC 2008), pages 28–30. 311–364. Springer-Verlag. Suzuki, Hiroaki, Keiga Abe, Kazuo Hiraki, and Martin, Jean-Claude, Patrizia Paggio, Peter Kuehnlein, Michiko Miyazaki. 2001. Cue-readiness in in- Rainer Stiefelhagen, and Fabio Pianesi. 2007. Spe- sight problem-solving. In Proceedings of the 23rd cial issue on Mulitmodal corpora for modeling hu- Annual Meeting of the Cognitive Science Society, man multimodal behaviour. Language Resources pages 1012 – 1017. and Evaluation, 41(3-4). van Deemter, Kees, Ielka van der Sluis, and Albert Mitkov, Ruslan. 2002. Anaphora Resolution. Long- Gatt. 2006. Building a semantically transparent man. corpus for the generation of referring expressions. In Proceedings of the Fourth International Natural Nakatani, Christine and Julia Hirschberg. 1993. A Language Generation Conference, pages 130–132. speech-first model for repair identification and cor- rection. In Proceedings of 31th Annual Meeting of van der Sluis, Ielka, Junko Nagai, and Saturnino Luz. ACL, pages 200–207. 2009. Producing referring expressions in dialogue: Insights from a translation exercise. In Proceedings Noguchi, Masaki, Kenta Miyoshi, Takenobu Toku- of PreCogSci 2009: Production of Referring Ex- naga, Ryu Iida, Mamoru Komachi, and Kentaro pressions: Bridging the gap between computational Inui. 2008. Multiple purpose annotation using and empirical approaches to reference. SLAT – Segment and link-based annotation tool. In Proceedings of 2nd Linguistic Annotation Work- shop, pages 61–64. Piwek, Paul L. A. 2007. Modality choise for gen- eration of referring acts. In Proceedings of the Workshop on Multimodal Output Generation (MOG 2007), pages 129–139. Spanger, Philipp, Masaaki Yasuhara, Ryu Iida, and Takenobu Tokunaga. 2009a. A Japanese corpus 46 Labeling Emotion in Bengali Blog Corpus – A Fine Grained Tagging at Sentence Level Dipankar Das Sivaji Bandyopadhyay Department of Computer Science Department of Computer Science & Engineering, & Engineering, Jadavpur University Jadavpur University [email protected] [email protected] affective communication substrates to analyze Abstract the reaction of emotional catalysts. 
Among these media, blog is one of the communicative Emotion, the private state of a human and informative repository of text based emo- entity, is becoming an important topic tional contents in the Web 2.0 (Lin et al., in Natural Language Processing (NLP) 2007). with increasing use of search engines. Rapidly growing web users from multilin- The present task aims to manually an- gual communities focus the attention to im- notate the sentences in a web based prove the multilingual search engines on the Bengali blog corpus with the emotional basis of sentiment or emotion. Major studies components such as emotional expres- on Opinion Mining and Sentiment Analyses sion (word/phrase), intensity, associ- have been attempted with more focused per- ated holder and topic(s). Ekman’s six spectives rather than fine-grained emotions. emotion classes (anger, disgust, fear, The analyses of emotion or sentiment require happy, sad and surprise) along with some basic resource. An emotion-annotated three types of intensities (high, general corpus is one of the primary ones to start with. and low) are considered for the sen- The proposed annotation task has been car- tence level annotation. Presence of dis- ried out at sentence level. Three annotators course markers, punctuation marks, have manually annotated the Bengali blog sen- negations, conjuncts, reduplication, tences retrieved from a web blog archive1 with rhetoric knowledge and especially Ekman’s six basic emotion tags (anger (A), emoticons play the contributory roles disgust (D), fear (F), happy (H), sad (Sa) and in the annotation process. Different surprise (Su)). The emotional sentences are types of fixed and relaxed strategies tagged with three types of intensities such as have been employed to measure the high, general and low. The sentences of non- agreement of the sentential emotions, emotional (neutral) and multiple (mixed) cate- intensities, emotional holders and top- gories are also identified. The identification of ics respectively. Experimental results emotional words or phrases and fixing the for each emotion class at word level on scope of emotional expressions in the sen- a small set of the whole corpus have tences are carried out in the present task. Each been found satisfactory. of the emoticons is also considered as individ- ual emotional expressions. The emotion holder 1 Introduction and relevant topics associated with the emo- Human emotion described in texts is an impor- tional expressions are annotated considering tant cue for our daily communication but the the punctuation marks, conjuncts, rhetorical identification of emotional state from texts is structures and other discourse information. The not an easy task as emotion is not open to any knowledge of rhetorical structure helps in re- objective observation or verification (Quirk et moving the subjective discrepancies from the al., 1985). Emails, weblogs, chat rooms, online forums and even twitter are considered as the 1 www.amarblog.com 47 Proceedings of the 8th Workshop on Asian Language Resources, pages 47–55, Beijing, China, 21-22 August 2010. c 2010 Asian Federation for Natural Language Processing writer’s point of view. The annotation scheme happy, sad, positively surprised, negatively is used to annotate 123 blog posts containing surprised) to accomplish the emotion annota- 4,740 emotional sentences having single emo- tion task at sentence level. 
They have manually tion tag and 322 emotional sentences for mixed annotated 1580 sentences extracted from 22 emotion tagss along with 7087 neutral sen- Grimms’ tales. The present approach discusses tences in Bengali. Three types of standard the issues of annotating unstructured blog text agreement measures such as Cohen’s kappa considering rhetoric knowledge along with the ( ) (Cohen, 1960; Carletta, 1996), Measure of attributes, e.g. negation, conjunct, reduplica- Agreement on Set-valued Items (MASI) (Pas- tion etc. sonneau, 2004) and agr (Wiebe et al., 2005) Mishne (2005) experimented with mood metrics are employed for annotating the emo- classification in a blog corpus of 815,494 posts tion related components. The relaxed agree- from Livejournal ment schemes like MASI and agr are specially (http://www.livejournal.com), a free weblog considered for fixing the boundaries of emo- service with a large community. (Mihalcea and tional expressions and topic spans in the emo- Liu, 2006) have used the same data source for tional sentences. The inter annotator agreement classifying the blog posts into two particular of some emotional components such as senten- emotions – happiness and sadness. The blog tial emotions, holders, topics show satisfactory posts are self-annotated by the blog writers performance but the sentences of mixed emo- with happy and sad mood labels. In contrast, tion and intensities of general and low show the present approach includes Ekman’s six the disagreement. A preliminary experiment emotions, emotion holders and topics to ac- for word level emotion classification on a complish the whole annotation task. small set of the whole corpus yielded satisfac- (Neviarouskaya et al., 2007) collected 160 tory results. sentences labeled with one of the nine emo- The rest of the paper is organized as fol- tions categories (anger, disgust, fear, guilt, in- lows. Section 2 describes the related work. The terest, joy, sadness, shame, and surprise) and a annotation of emotional expressions, sentential corresponding intensity value from a corpus of emotion and intensities are described in Sec- online diary-like blog posts. On the other hand, tion 3. In Section 4, the annotation scheme for (Aman and Szpakowicz, 2007) prepare an emotion holder is described. The issues of emotion-annotated corpus with a rich set of emotional topic annotation are discussed in emotion information such as category, inten- Section 5. Section 6 describes the preliminary sity and word or phrase based expressions. The experiments carried out on the annotated cor- present task considers all the above emotion pus. Finally, Section 7 concludes the paper. information during annotation. But, the present annotation task additionally includes the com- 2 Related Work ponents like emotion holder, single or multiple topic spans. One of the most well known tasks of annotat- The emotion corpora for Japanese were built ing the private states in texts is carried out by for recognizing emotions (Tokuhisa et al., (Wiebe et al., 2005). They manually annotated 2008). An available emotion corpus in Chinese the private states including emotions, opinions, is Yahoo!’s Chinese news and sentiment in a 10,000-sentence corpus (the (http://tw.news.yahoo.com), which is used for MPQA corpus) of news articles. The opinion Chinese emotion classification of news readers holder information is also annotated in the (Lin, et al., 2007). 
3.1 Identifying Emotional Expressions for Sentential Emotion and Intensity

The identification of the emotion or affect affixed in text segments is a puzzle. But the puzzle can be solved partially using some lexical clues (e.g. discourse markers, punctuation marks (sym), negations (NEG), conjuncts (CONJ) and reduplication (Redup)), structural clues (e.g. rhetoric and syntactic knowledge) and especially some direct affective clues (e.g. emoticons (emo_icon)). The identification of structural clues indeed requires the identification of lexical clues.

Rhetorical Structure Theory (RST) describes the various parts of a text and how they can be arranged and connected to form a whole text (Azar, 1999). The theory maintains that consecutive discourse elements, termed text spans, which can be in the form of clauses, sentences, or units larger than sentences, are related by a relatively small set (20–25) of rhetorical relations (Mann and Thompson, 1988). RST distinguishes between the part of a text that realizes the primary goal of the writer, termed the nucleus, and the part that provides supplementary material, termed the satellite. The separation of nucleus from satellite is done based on punctuation marks (, ! @ ?), emoticons, discourse markers (jehetu [as], jemon [e.g.], karon [because], mane [means]), conjuncts (ebong [and], kintu [but], athoba [or]) and causal verbs (ghotay [caused]), if they are explicitly specified in the sentences.
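A minimal sketch of this clue-based separation is given below, assuming the cue words are matched in transliterated form. The function name and the exact splitting rules are our own simplification of the procedure described above, not the authors' implementation, and deciding which resulting span is the nucleus would still require further heuristics.

```python
import re

# Cue inventories taken from the examples above, in transliteration.
DISCOURSE_MARKERS = {"jehetu", "jemon", "karon", "mane"}  # as, e.g., because, means
CONJUNCTS = {"ebong", "kintu", "athoba"}                  # and, but, or
CAUSAL_VERBS = {"ghotay"}                                 # caused
CUES = DISCOURSE_MARKERS | CONJUNCTS | CAUSAL_VERBS

def split_text_spans(sentence):
    """Split a sentence into candidate text spans at explicit punctuation
    marks and cue words; emoticons are ignored here for brevity."""
    spans = []
    # First break on the punctuation marks mentioned above.
    for piece in re.split(r"[,!?@]", sentence):
        tokens, current = piece.split(), []
        for tok in tokens:
            # A cue word opens a new span and stays attached to it.
            if tok in CUES and current:
                spans.append(" ".join(current))
                current = []
            current.append(tok)
        if current:
            spans.append(" ".join(current))
    return [s for s in spans if s]
```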
Use of emotion-related words is not the sole means of expressing emotion. Often a sentence which otherwise may not have an emotional word may become emotion bearing depending on the context or the underlying semantic meaning (Aman and Szpakowicz, 2007). An empirical analysis of the blog texts shows two types of emotional expressions. The first category contains explicitly stated emotion words (EW) or phrases (EP) mentioned in the nucleus or in the satellite. The other category contains implicit emotional clues that are identified based on the context or from the metaphoric knowledge of the expressions. Sometimes the emotional expressions contain direct emotion words (EW) (koutuk [joke], ananda [happy], ashcharjyo [surprise]), reduplication (Redup) (sanda sanda [doubt with fear]), question words (EW_Q) (ki [what], kobe [when]), colloquial words (kshyama [pardon]) and foreign words (thanku [thanks], gossya [anger]). On the other hand, the emotional expressions may contain indirect emotion words such as proverbs and idioms (taser ghar [weakly built], grihadaho [family disturbance]) and emoticons.

A large number of the emoticons (emo_icon) present in the Bengali blog texts vary according to their emotional categories and slant. Each of the emoticons is treated as an individual emotional expression, and its corresponding intensity is set based on the image denoted by the emoticon. The labeling of the emoticons with Ekman's six emotion classes is verified through the same inter-annotator agreement that is considered for emotion expressions.

The intensifiers (khub [too much/very], anek [huge/large], bhishon [heavy/too much]) associated with the emotional phrases are also acknowledged in annotating sentential intensities. As the intensifiers depend solely on the context, their identification, along with the effects of negation and conjuncts, plays a role in annotating the intensity. Negations (na [no], noy [not]) and conjuncts occur freely in a sentence and change its emotion. For that very reason, a detailed analysis of negation and conjuncts is carried out both at the intra- and inter-phrase levels to obtain the sentential emotions and intensities. An example set of the annotated blog corpus is shown in Figure 1.

The inter-annotator agreement for emotions and intensities is measured using the standard Cohen's kappa coefficient (κ) (Cohen, 1960). The annotation agreement for emoticons is also measured using the kappa metric. Kappa is a statistical measure of inter-rater agreement for qualitative (categorical) items; it measures the agreement between two raters who separately classify items into some mutually exclusive categories.
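For completeness, the coefficient has its usual closed form: with observed agreement $P_o$ and chance agreement $P_e$ derived from each annotator's label distribution,

\[
\kappa = \frac{P_o - P_e}{1 - P_e}, \qquad P_e = \sum_{c} p_1(c)\, p_2(c),
\]

where $p_1(c)$ and $p_2(c)$ are the proportions of items that the two annotators assign to class $c$; κ is 1 for perfect agreement and 0 when the agreement is no better than chance.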
The agreement on classifying sentential intensities into the three classes (high, general and low) is also measured using kappa (κ). The intensities of mixed emotional sentences are also considered. Agreement results for emotional, non-emotional and mixed sentences and for emoticons, along with the results for each emotion class and intensity type, are shown in Table 1. Sentential emotions of the happy, sad or surprise classes produce comparatively higher kappa coefficients than the other emotion classes, as the emotional expressions of these types were explicitly specified in the blog texts. It has been observed that emotion pairs such as "sad-anger" and "anger-disgust" often cause trouble in distinguishing the emotion at the sentence level. The mixed emotion category and the general and low intensity types give